Re: [PATCH 1/1] commit c6825c0976fa7893692e0e43b09740b419b23c09 upstream.
On Wed, Nov 04, 2015 at 03:46:54PM -0800, Ani Sinha wrote:
> (removed a bunch of people from CC list)
>
> On Mon, Oct 26, 2015 at 1:06 PM, Pablo Neira Ayuso wrote:
> > Then we can review and, if no major concerns, I can submit this to
> > -stable.
>
> Now that Neal has sufficiently tested the patches, is it OK to apply
> to -stable or do you guys want me to do anything more?

I'll be passing this up to -stable asap.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/1] commit c6825c0976fa7893692e0e43b09740b419b23c09 upstream.
(removed a bunch of people from CC list)

On Mon, Oct 26, 2015 at 1:06 PM, Pablo Neira Ayuso wrote:
> Then we can review and, if no major concerns, I can submit this to
> -stable.

Now that Neal has sufficiently tested the patches, is it OK to apply
to -stable or do you guys want me to do anything more?
Re: [PATCH 1/1] commit c6825c0976fa7893692e0e43b09740b419b23c09 upstream.
> On Thu, Oct 29, 2015 at 6:21 PM, Neal P. Murphy wrote:
> > On Thu, 29 Oct 2015 17:01:24 -0700
> > Ani Sinha wrote:
> >
> > [...]
> >
> >> Should I resend the patch with a Tested-by: tag?
> >
> > ... Oh, wait. Not yet. The dawn just broke over ol' Marblehead here.
> > I only tested TCP; I need to hammer UDP, too.
> >
> > Can I set the timeouts to zero? Or is one as low as I can go?
>
> Any progress with testing ?

I applied the 'hammer' through a firewall with the patch. I used TCP, UDP
and ICMP. I don't know if the patch fixes the problem, but I'm reasonably
sure that it did not break normal operations.

To test a different problem I fixed (a memory leak in my 64-bit counter
patch for xt_ACCOUNT), I pushed 60,000 addresses (most of a /16) through
the firewall. Again, no troubles.

I observed only two odd things, both likely unrelated to your patch.
First, when I started the TCP test and then added the UDP test, only TCP
would come through; if I stopped and restarted the TCP test, only UDP
would come through. I suspect this is due to buffering; it's just a
behaviour I haven't encountered since I started using Linux many years
ago (around '98). Second, when I started the test, the firewall would
lose contact with the upstream firewall's apcupsd daemon; again, this is
likely due to the nature of the test: it likely floods the input and
output queues.

I'd say you can probably resend with Tested-by.

Neal
Re: [PATCH 1/1] commit c6825c0976fa7893692e0e43b09740b419b23c09 upstream.
On Thu, Oct 29, 2015 at 6:21 PM, Neal P. Murphy wrote:
> On Thu, 29 Oct 2015 17:01:24 -0700
> Ani Sinha wrote:
>
> [...]
>
>> Should I resend the patch with a Tested-by: tag?
>
> ... Oh, wait. Not yet. The dawn just broke over ol' Marblehead here.
> I only tested TCP; I need to hammer UDP, too.
>
> Can I set the timeouts to zero? Or is one as low as I can go?

I don't see any assertion or check against 0 sec timeouts. You can try.
Your conntrack entries will be constantly flushing.
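For reference, the timeouts being discussed here are ordinary sysctls; a minimal config sketch of how such a test might shorten them (paths as exposed by conntrack-enabled kernels of that era; the exact set of knobs varies by kernel version and configuration, so treat the names below as illustrative and check your own /proc):

```shell
# Shorten conntrack timeouts (in seconds) so entries expire almost
# immediately; values and exact sysctl names are illustrative.
sysctl -w net.netfilter.nf_conntrack_tcp_timeout_established=1
sysctl -w net.netfilter.nf_conntrack_tcp_timeout_time_wait=1
sysctl -w net.netfilter.nf_conntrack_udp_timeout=1

# Watch the table churn while the load generators run:
watch -n1 'cat /proc/sys/net/netfilter/nf_conntrack_count'
```

With 1-second timeouts and the generators running, entries are created and reaped continuously, which is exactly the recycle pressure the RCU race needs.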
Re: [PATCH 1/1] commit c6825c0976fa7893692e0e43b09740b419b23c09 upstream.
On Wed, Oct 28, 2015 at 11:40 PM, Neal P. Murphy wrote:
> On Wed, 28 Oct 2015 02:36:50 -0400
> "Neal P. Murphy" wrote:
>
> [...]
>
> I did not reproduce the problem these patches solve. But more
> importantly, I saw no problems at all. Each time I terminated a test,
> RAM usage returned to about that of post-boot; so there were no
> apparent memory leaks. No kernel messages and no netfilter messages
> appeared during the tests.
>
> [...]

Should I resend the patch with a Tested-by: tag?
Re: [PATCH 1/1] commit c6825c0976fa7893692e0e43b09740b419b23c09 upstream.
On Wed, 28 Oct 2015 02:36:50 -0400
"Neal P. Murphy" wrote:

> On Mon, 26 Oct 2015 21:06:33 +0100
> Pablo Neira Ayuso wrote:
>
> > Hi,
> >
> > On Mon, Oct 26, 2015 at 11:55:39AM -0700, Ani Sinha wrote:
> > > netfilter: nf_conntrack: fix RCU race in nf_conntrack_find_get
> >
> > [...]
> >
> > I'm including Neal P. Murphy, he said he would help testing these
> > backports, getting a Tested-by: tag usually speeds up things too.

I've probably done about as much seat-of-the-pants testing as I can. All
opening/closing the same destination IP/port.

Host: Debian Jessie, 8-core Vishera 8350 at 4.4 GHz, 16GiB RAM at (I
think) 2100MHz.

Traffic generator 1: 6-CPU KVM running 64-bit Smoothwall Express 3.1
(linux 3.4.109 without these patches), with 8GiB RAM and 9GiB swap.
Packets sent across PURPLE (to bypass NAT and firewall).

Traffic generator 2: 32-bit KVM running Smoothwall Express 3.1 (linux
3.4.110 with these patches), 3GiB RAM and minimal swap.

In the first set of tests, generator 1's traffic passed through generator
2 as a NATting firewall, to the host's web server. In the second set of
tests, generator 2's traffic went through NAT to the host's web server.

The load tests:
- 2500 processes using 2500 addresses and random src ports
- 2500 processes using 2500 addresses and the same src port
- 2500 processes using the same src address and port

I also tested using stock NF timeouts and using 1 second timeouts.

Bandwidth used got as high as 16Mb/s for some tests.

Conntracks got up to 200 000 or so, or bounced between 1 and 2, depending
on the test and the timeouts.

I did not reproduce the problem these patches solve. But more importantly,
I saw no problems at all. Each time I terminated a test, RAM usage
returned to about that of post-boot; so there were no apparent memory
leaks. No kernel messages and no netfilter messages appeared during the
tests.

If I have time, I suppose I could run another set of tests: 2500 source
processes using 2500 addresses times 200 ports to connect to 2500
addresses times 200 ports on a destination system. Each process opens 200
sockets, then closes them. And repeats ad infinitum. But I might have to
be clever since I can't run 500 000 processes; but I could run 20 VMs;
that would get it down to about 12 000 processes per VM. And I might have
to figure out how to allow processes on the destination system to open
hundreds or thousands of sockets.

N
Re: [PATCH 1/1] commit c6825c0976fa7893692e0e43b09740b419b23c09 upstream.
On Thu, 29 Oct 2015 17:01:24 -0700
Ani Sinha wrote:

> On Wed, Oct 28, 2015 at 11:40 PM, Neal P. Murphy wrote:
> > On Wed, 28 Oct 2015 02:36:50 -0400
> > "Neal P. Murphy" wrote:
> >
> > [...]
>
> Should I resend the patch with a Tested-by: tag?

... Oh, wait. Not yet. The dawn just broke over ol' Marblehead here. I
only tested TCP; I need to hammer UDP, too.

Can I set the timeouts to zero? Or is one as low as I can go?

N
Re: [PATCH 1/1] commit c6825c0976fa7893692e0e43b09740b419b23c09 upstream.
On Mon, 26 Oct 2015 21:06:33 +0100
Pablo Neira Ayuso wrote:

> Hi,
>
> On Mon, Oct 26, 2015 at 11:55:39AM -0700, Ani Sinha wrote:
> > netfilter: nf_conntrack: fix RCU race in nf_conntrack_find_get
>
> [...]
>
> I'm including Neal P. Murphy, he said he would help testing these
> backports, getting a Tested-by: tag usually speeds up things too.

I hammered it a couple nights ago. First test was 5000 processes on 6 SMP
CPUs opening and closing a port on a 'remote' host using the usual random
source ports. Only got up to 32000 conntracks. The generator was a 64-bit
Smoothwall KVM without the patch. The traffic passed through a 32-bit
Smoothwall KVM with the patch. The target was on the VM host. No problems
encountered. I suspect I didn't come close to triggering the original
problem.

Second test was a couple thousand processes all using the same source IP
and port and dest IP and port. Still no problems. But these were perl
scripts (and they used lots of RAM); perhaps a short C program would let
me run more.

Any ideas on how I might test it more brutally?

N
Re: [PATCH 1/1] commit c6825c0976fa7893692e0e43b09740b419b23c09 upstream.
Hi,

On Mon, Oct 26, 2015 at 11:55:39AM -0700, Ani Sinha wrote:
> netfilter: nf_conntrack: fix RCU race in nf_conntrack_find_get

Please, no need to Cc everyone here. Please submit your Netfilter
patches to netfilter-de...@vger.kernel.org.

Moreover, it would be great if the subject included something
descriptive about what you need; for this I'd suggest:

[PATCH -stable 3.4,backport] netfilter: nf_conntrack: fix RCU race in
nf_conntrack_find_get

I'm including Neal P. Murphy; he said he would help test these
backports. Getting a Tested-by: tag usually speeds things up too.

The burden is usually huge here; the easier you make it for us, the
better. Then we can review and, if no major concerns, I can submit this
to -stable.

Let me know if you have any other questions.

Thanks.
[PATCH 1/1] commit c6825c0976fa7893692e0e43b09740b419b23c09 upstream.
netfilter: nf_conntrack: fix RCU race in nf_conntrack_find_get

Let's look at destroy_conntrack:

hlist_nulls_del_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode);
...
nf_conntrack_free(ct)
	kmem_cache_free(net->ct.nf_conntrack_cachep, ct);

net->ct.nf_conntrack_cachep is created with SLAB_DESTROY_BY_RCU.

The hash is protected by rcu, so readers look up conntracks without
locks. A conntrack is removed from the hash, but at this moment a few
readers still can use the conntrack. Then this conntrack is released and
another thread creates a conntrack with the same address and the equal
tuple. After this a reader starts to validate the conntrack:
* It's not dying, because a new conntrack was created
* nf_ct_tuple_equal() returns true.

But this conntrack is not initialized yet, so it can not be used by two
threads concurrently. In this case BUG_ON may be triggered from
nf_nat_setup_info().

Florian Westphal suggested to check the confirm bit too. I think it's
right.

task 1			task 2			task 3
nf_conntrack_find_get
 nf_conntrack_find
			destroy_conntrack
			 hlist_nulls_del_rcu
			 nf_conntrack_free
			 kmem_cache_free
						__nf_conntrack_alloc
						 kmem_cache_alloc
						 memset(&ct->tuplehash[IP_CT_DIR_MAX],
 if (nf_ct_is_dying(ct))
 if (!nf_ct_tuple_equal()

I'm not sure that I have ever seen this race condition in real life.
Currently we are investigating a bug which is reproduced on a few nodes.
In our case one conntrack is initialized from a few tasks concurrently;
we don't have any other explanation for this.

<2>[46267.083061] kernel BUG at net/ipv4/netfilter/nf_nat_core.c:322!
...
<4>[46267.083951] RIP: 0010:[] [] nf_nat_setup_info+0x564/0x590 [nf_nat]
...
<4>[46267.085549] Call Trace:
<4>[46267.085622] [] alloc_null_binding+0x5b/0xa0 [iptable_nat]
<4>[46267.085697] [] nf_nat_rule_find+0x5c/0x80 [iptable_nat]
<4>[46267.085770] [] nf_nat_fn+0x111/0x260 [iptable_nat]
<4>[46267.085843] [] nf_nat_out+0x48/0xd0 [iptable_nat]
<4>[46267.085919] [] nf_iterate+0x69/0xb0
<4>[46267.085991] [] ? ip_finish_output+0x0/0x2f0
<4>[46267.086063] [] nf_hook_slow+0x74/0x110
<4>[46267.086133] [] ? ip_finish_output+0x0/0x2f0
<4>[46267.086207] [] ? dst_output+0x0/0x20
<4>[46267.086277] [] ip_output+0xa4/0xc0
<4>[46267.086346] [] raw_sendmsg+0x8b4/0x910
<4>[46267.086419] [] inet_sendmsg+0x4a/0xb0
<4>[46267.086491] [] ? sock_update_classid+0x3a/0x50
<4>[46267.086562] [] sock_sendmsg+0x117/0x140
<4>[46267.086638] [] ? _spin_unlock_bh+0x1b/0x20
<4>[46267.086712] [] ? autoremove_wake_function+0x0/0x40
<4>[46267.086785] [] ? do_ip_setsockopt+0x90/0xd80
<4>[46267.086858] [] ? call_function_interrupt+0xe/0x20
<4>[46267.086936] [] ? ub_slab_ptr+0x20/0x90
<4>[46267.087006] [] ? ub_slab_ptr+0x20/0x90
<4>[46267.087081] [] ? kmem_cache_alloc+0xd8/0x1e0
<4>[46267.087151] [] sys_sendto+0x139/0x190
<4>[46267.087229] [] ? sock_setsockopt+0x16d/0x6f0
<4>[46267.087303] [] ? audit_syscall_entry+0x1d7/0x200
<4>[46267.087378] [] ? __audit_syscall_exit+0x265/0x290
<4>[46267.087454] [] ? compat_sys_setsockopt+0x75/0x210
<4>[46267.087531] [] compat_sys_socketcall+0x13f/0x210
<4>[46267.087607] [] ia32_sysret+0x0/0x5
<4>[46267.087676] Code: 91 20 e2 01 75 29 48 89 de 4c 89 f7 e8 56 fa ff ff 85 c0 0f 84 68 fc ff ff 0f b6 4d c6 41 8b 45 00 e9 4d fb ff ff e8 7c 19 e9 e0 <0f> 0b eb fe f6 05 17 91 20 e2 80 74 ce 80 3d 5f 2e 00 00 00 74
<1>[46267.088023] RIP [] nf_nat_setup_info+0x564/0x590

Cc: Eric Dumazet
Cc: Florian Westphal
Cc: Pablo Neira Ayuso
Cc: Patrick McHardy
Cc: Jozsef Kadlecsik
Cc: "David S. Miller"
Cc: Cyrill Gorcunov
Signed-off-by: Andrey Vagin
Acked-by: Eric Dumazet
Signed-off-by: Pablo Neira Ayuso
Signed-off-by: Ani Sinha
---
 net/netfilter/nf_conntrack_core.c | 21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 9a46908..fd0f7a3 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -309,6 +309,21 @@ static void death_by_timeout(unsigned long ul_conntrack)
 	nf_ct_put(ct);
 }
 
+static inline bool
+nf_ct_key_equal(struct nf_conntrack_tuple_hash *h,
+		const struct nf_conntrack_tuple *tuple,
+		u16 zone)
+{
+	struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(h);
+
+	/* A conntrack can be recreated with the equal tuple,
+	 * so we need to check that the conntrack is confirmed
+	 */
+	return nf_ct_tuple_equal(tuple,
Re: [PATCH 1/1] commit c6825c0976fa7893692e0e43b09740b419b23c09 upstream.
Please refer to the thread "linux 3.4.43 : kernel crash at
__nf_conntrack_confirm" on netdev for context.

thanks

On Sat, Oct 24, 2015 at 11:27 AM, Ani Sinha wrote:
> netfilter: nf_conntrack: fix RCU race in nf_conntrack_find_get
>
> [...]