Re: [PATCH 1/1] commit c6825c0976fa7893692e0e43b09740b419b23c09 upstream.

2015-11-06 Thread Pablo Neira Ayuso
On Wed, Nov 04, 2015 at 03:46:54PM -0800, Ani Sinha wrote:
> (removed a bunch of people from CC list)
> 
> On Mon, Oct 26, 2015 at 1:06 PM, Pablo Neira Ayuso  
> wrote:
> 
> > Then we can review and, if no major concerns, I can submit this to
> > -stable.
> 
> Now that Neal has sufficiently tested the patches, is it OK to apply
> to -stable or do you guys want me to do anything more?

I'll be passing this up to -stable asap.

Thanks.


Re: [PATCH 1/1] commit c6825c0976fa7893692e0e43b09740b419b23c09 upstream.

2015-11-04 Thread Ani Sinha
(removed a bunch of people from CC list)

On Mon, Oct 26, 2015 at 1:06 PM, Pablo Neira Ayuso  wrote:

> Then we can review and, if no major concerns, I can submit this to
> -stable.

Now that Neal has sufficiently tested the patches, is it OK to apply
to -stable or do you guys want me to do anything more?


Re: [PATCH 1/1] commit c6825c0976fa7893692e0e43b09740b419b23c09 upstream.

2015-11-02 Thread Ani Sinha
> On Thu, Oct 29, 2015 at 6:21 PM, Neal P. Murphy
>  wrote:
> > On Thu, 29 Oct 2015 17:01:24 -0700
> > Ani Sinha  wrote:
> >
> >> On Wed, Oct 28, 2015 at 11:40 PM, Neal P. Murphy
> >>  wrote:
> >> > On Wed, 28 Oct 2015 02:36:50 -0400
> >> > "Neal P. Murphy"  wrote:
> >> >
> >> >> On Mon, 26 Oct 2015 21:06:33 +0100
> >> >> Pablo Neira Ayuso  wrote:
> >> >>
> >> >> > Hi,
> >> >> >
> >> >> > On Mon, Oct 26, 2015 at 11:55:39AM -0700, Ani Sinha wrote:
> >> >> > > netfilter: nf_conntrack: fix RCU race in nf_conntrack_find_get
> >> >> >
> >> >> > Please, no need to Cc everyone here. Please, submit your Netfilter
> >> >> > patches to netfilter-de...@vger.kernel.org.
> >> >> >
> >> >> > Moreover, it would be great if the subject includes something
> >> >> > descriptive on what you need, for this I'd suggest:
> >> >> >
> >> >> > [PATCH -stable 3.4,backport] netfilter: nf_conntrack: fix RCU race in 
> >> >> > nf_conntrack_find_get
> >> >> >
> >> >> > I'm including Neal P. Murphy, he said he would help testing these
> >> >> > backports, getting a Tested-by: tag usually speeds up things too.
> >> >>
> >> >
> >> > I've probably done about as much seat-of-the-pants testing as I can. All 
> >> > opening/closing the same destination IP/port.
> >> >
> >> > Host: Debian Jessie, 8-core Vishera 8350 at 4.4 GHz, 16GiB RAM at (I 
> >> > think) 2100MHz.
> >> >
> >> > Traffic generator 1: 6-CPU KVM running 64-bit Smoothwall Express 3.1 
> >> > (linux 3.4.109 without these patches), with 8GiB RAM and 9GiB swap. 
> >> > Packets sent across PURPLE (to bypass NAT and firewall).
> >> >
> >> > Traffic generator 2: 32-bit KVM running Smoothwall Express 3.1 (linux 
> >> > 3.4.110 with these patches), 3GiB RAM and minimal swap.
> >> >
> >> > In the first set of tests, generator 1's traffic passed through 
> >> > Generator 2 as a NATting firewall, to the host's web server. In the 
> >> > second set of tests, generator 2's traffic went through NAT to the 
> >> > host's web server.
> >> >
> >> > The load tests:
> >> >   - 2500 processes using 2500 addresses and random src ports
> >> >   - 2500 processes using 2500 addresses and the same src port
> >> >   - 2500 processes using the same src address and port
> >> >
> >> > I also tested using stock NF timeouts and using 1 second timeouts.
> >> >
> >> > Bandwidth used got as high as 16Mb/s for some tests.
> >> >
> >> > Conntracks got up to 200 000 or so or bounced between 1 and 2, depending 
> >> > on the test and the timeouts.
> >> >
> >> > I did not reproduce the problem these patches solve. But more 
> >> > importantly, I saw no problems at all. Each time I terminated a test, 
> >> > RAM usage returned to about that of post-boot; so there were no apparent 
> >> > memory leaks. No kernel messages and no netfilter messages appeared 
> >> > during the tests.
> >> >
> >> > If I have time, I suppose I could run another set of tests: 2500 source 
> >> > processes using 2500 addresses times 200 ports to connect to 2500 
> >> > addresses times 200 ports on a destination system. Each process opens 
> >> > 200 sockets, then closes them. And repeats ad infinitum. But I might 
> >> > have to be clever since I can't run 500 000 processes; but I could run 
> >> > 20 VMs; that would get it down to about 12 000 processes per VM. And I 
> >> > might have to figure out how to allow processes on the destination 
> >> > system to open hundreds or thousands of sockets.
> >>
> >> Should I resend the patch with a Tested-by: tag?
> >
> > ... Oh, wait. Not yet. The dawn just broke over ol' Marblehead here. I only 
> > tested TCP; I need to hammer UDP, too.
> >
> > Can I set the timeouts to zero? Or is one as low as I can go?
>
> Any progress with testing ?

I applied the 'hammer' through a firewall with the patch. I used TCP,
UDP and ICMP.

I don't know if the patch fixes the problem. But I'm reasonably sure
that it did not break normal operations.

To test a different problem I fixed (a memory leak in my 64-bit
counter patch for xt_ACCOUNT), I tested 60,000 addresses (most of a
/16) through the firewall. Again, no troubles.

I only observed two odd things which are likely completely unrelated
to your patch. When I started the TCP test, then added the UDP test,
only TCP would come through. If I stopped and restarted the TCP test,
only UDP would come through. I suspect this is due to buffering. It's
just a behaviour I haven't encountered since I started using Linux
many years ago (around '98). The second: when I started the test, the
firewall would lose contact with the upstream F/W's apcupsd daemon;
again, this is likely due to the nature of the test: it likely floods
input and output queues.


I'd say you can probably resend with Tested-by.

Neal

Re: [PATCH 1/1] commit c6825c0976fa7893692e0e43b09740b419b23c09 upstream.

2015-10-30 Thread Ani Sinha
On Thu, Oct 29, 2015 at 6:21 PM, Neal P. Murphy
 wrote:
> On Thu, 29 Oct 2015 17:01:24 -0700
> Ani Sinha  wrote:
>
>> On Wed, Oct 28, 2015 at 11:40 PM, Neal P. Murphy
>>  wrote:
>> > On Wed, 28 Oct 2015 02:36:50 -0400
>> > "Neal P. Murphy"  wrote:
>> >
>> >> On Mon, 26 Oct 2015 21:06:33 +0100
>> >> Pablo Neira Ayuso  wrote:
>> >>
>> >> > Hi,
>> >> >
>> >> > On Mon, Oct 26, 2015 at 11:55:39AM -0700, Ani Sinha wrote:
>> >> > > netfilter: nf_conntrack: fix RCU race in nf_conntrack_find_get
>> >> >
>> >> > Please, no need to Cc everyone here. Please, submit your Netfilter
>> >> > patches to netfilter-de...@vger.kernel.org.
>> >> >
>> >> > Moreover, it would be great if the subject includes something
>> >> > descriptive on what you need, for this I'd suggest:
>> >> >
>> >> > [PATCH -stable 3.4,backport] netfilter: nf_conntrack: fix RCU race in 
>> >> > nf_conntrack_find_get
>> >> >
>> >> > I'm including Neal P. Murphy, he said he would help testing these
>> >> > backports, getting a Tested-by: tag usually speeds up things too.
>> >>
>> >
>> > I've probably done about as much seat-of-the-pants testing as I can. All 
>> > opening/closing the same destination IP/port.
>> >
>> > Host: Debian Jessie, 8-core Vishera 8350 at 4.4 GHz, 16GiB RAM at (I 
>> > think) 2100MHz.
>> >
>> > Traffic generator 1: 6-CPU KVM running 64-bit Smoothwall Express 3.1 
>> > (linux 3.4.109 without these patches), with 8GiB RAM and 9GiB swap. 
>> > Packets sent across PURPLE (to bypass NAT and firewall).
>> >
>> > Traffic generator 2: 32-bit KVM running Smoothwall Express 3.1 (linux 
>> > 3.4.110 with these patches), 3GiB RAM and minimal swap.
>> >
>> > In the first set of tests, generator 1's traffic passed through Generator 
>> > 2 as a NATting firewall, to the host's web server. In the second set of 
>> > tests, generator 2's traffic went through NAT to the host's web server.
>> >
>> > The load tests:
>> >   - 2500 processes using 2500 addresses and random src ports
>> >   - 2500 processes using 2500 addresses and the same src port
>> >   - 2500 processes using the same src address and port
>> >
>> > I also tested using stock NF timeouts and using 1 second timeouts.
>> >
>> > Bandwidth used got as high as 16Mb/s for some tests.
>> >
>> > Conntracks got up to 200 000 or so or bounced between 1 and 2, depending 
>> > on the test and the timeouts.
>> >
>> > I did not reproduce the problem these patches solve. But more importantly, 
>> > I saw no problems at all. Each time I terminated a test, RAM usage 
>> > returned to about that of post-boot; so there were no apparent memory 
>> > leaks. No kernel messages and no netfilter messages appeared during the 
>> > tests.
>> >
>> > If I have time, I suppose I could run another set of tests: 2500 source 
>> > processes using 2500 addresses times 200 ports to connect to 2500 
>> > addresses times 200 ports on a destination system. Each process opens 200 
>> > sockets, then closes them. And repeats ad infinitum. But I might have to 
>> > be clever since I can't run 500 000 processes; but I could run 20 VMs; 
>> > that would get it down to about 12 000 processes per VM. And I might have 
> > to figure out how to allow processes on the destination system to 
>> > open hundreds or thousands of sockets.
>>
>> Should I resend the patch with a Tested-by: tag?
>
> ... Oh, wait. Not yet. The dawn just broke over ol' Marblehead here. I only 
> tested TCP; I need to hammer UDP, too.
>
> Can I set the timeouts to zero? Or is one as low as I can go?

I don't see any assertion or check against 0-second timeouts. You can
try; your conntrack entries will just be flushed constantly.

>
> N


Re: [PATCH 1/1] commit c6825c0976fa7893692e0e43b09740b419b23c09 upstream.

2015-10-29 Thread Ani Sinha
On Wed, Oct 28, 2015 at 11:40 PM, Neal P. Murphy
 wrote:
> On Wed, 28 Oct 2015 02:36:50 -0400
> "Neal P. Murphy"  wrote:
>
>> On Mon, 26 Oct 2015 21:06:33 +0100
>> Pablo Neira Ayuso  wrote:
>>
>> > Hi,
>> >
>> > On Mon, Oct 26, 2015 at 11:55:39AM -0700, Ani Sinha wrote:
>> > > netfilter: nf_conntrack: fix RCU race in nf_conntrack_find_get
>> >
>> > Please, no need to Cc everyone here. Please, submit your Netfilter
>> > patches to netfilter-de...@vger.kernel.org.
>> >
>> > Moreover, it would be great if the subject includes something
>> > descriptive on what you need, for this I'd suggest:
>> >
>> > [PATCH -stable 3.4,backport] netfilter: nf_conntrack: fix RCU race in 
>> > nf_conntrack_find_get
>> >
>> > I'm including Neal P. Murphy, he said he would help testing these
>> > backports, getting a Tested-by: tag usually speeds up things too.
>>
>
> I've probably done about as much seat-of-the-pants testing as I can. All 
> opening/closing the same destination IP/port.
>
> Host: Debian Jessie, 8-core Vishera 8350 at 4.4 GHz, 16GiB RAM at (I think) 
> 2100MHz.
>
> Traffic generator 1: 6-CPU KVM running 64-bit Smoothwall Express 3.1 (linux 
> 3.4.109 without these patches), with 8GiB RAM and 9GiB swap. Packets sent 
> across PURPLE (to bypass NAT and firewall).
>
> Traffic generator 2: 32-bit KVM running Smoothwall Express 3.1 (linux 3.4.110 
> with these patches), 3GiB RAM and minimal swap.
>
> In the first set of tests, generator 1's traffic passed through Generator 2 
> as a NATting firewall, to the host's web server. In the second set of tests, 
> generator 2's traffic went through NAT to the host's web server.
>
> The load tests:
>   - 2500 processes using 2500 addresses and random src ports
>   - 2500 processes using 2500 addresses and the same src port
>   - 2500 processes using the same src address and port
>
> I also tested using stock NF timeouts and using 1 second timeouts.
>
> Bandwidth used got as high as 16Mb/s for some tests.
>
> Conntracks got up to 200 000 or so or bounced between 1 and 2, depending on 
> the test and the timeouts.
>
> I did not reproduce the problem these patches solve. But more importantly, I 
> saw no problems at all. Each time I terminated a test, RAM usage returned to 
> about that of post-boot; so there were no apparent memory leaks. No kernel 
> messages and no netfilter messages appeared during the tests.
>
> If I have time, I suppose I could run another set of tests: 2500 source 
> processes using 2500 addresses times 200 ports to connect to 2500 addresses 
> times 200 ports on a destination system. Each process opens 200 sockets, then 
> closes them. And repeats ad infinitum. But I might have to be clever since I 
> can't run 500 000 processes; but I could run 20 VMs; that would get it down 
> to about 12 000 processes per VM. And I might have to figure out how to allow 
> processes on the destination system to open hundreds or thousands of 
> sockets.

Should I resend the patch with a Tested-by: tag?


Re: [PATCH 1/1] commit c6825c0976fa7893692e0e43b09740b419b23c09 upstream.

2015-10-29 Thread Neal P. Murphy
On Wed, 28 Oct 2015 02:36:50 -0400
"Neal P. Murphy"  wrote:

> On Mon, 26 Oct 2015 21:06:33 +0100
> Pablo Neira Ayuso  wrote:
> 
> > Hi,
> > 
> > On Mon, Oct 26, 2015 at 11:55:39AM -0700, Ani Sinha wrote:
> > > netfilter: nf_conntrack: fix RCU race in nf_conntrack_find_get
> > 
> > Please, no need to Cc everyone here. Please, submit your Netfilter
> > patches to netfilter-de...@vger.kernel.org.
> > 
> > Moreover, it would be great if the subject includes something
> > descriptive on what you need, for this I'd suggest:
> > 
> > [PATCH -stable 3.4,backport] netfilter: nf_conntrack: fix RCU race in 
> > nf_conntrack_find_get
> > 
> > I'm including Neal P. Murphy, he said he would help testing these
> > backports, getting a Tested-by: tag usually speeds up things too.
> 

I've probably done about as much seat-of-the-pants testing as I can. All 
opening/closing the same destination IP/port.

Host: Debian Jessie, 8-core Vishera 8350 at 4.4 GHz, 16GiB RAM at (I think) 
2100MHz.

Traffic generator 1: 6-CPU KVM running 64-bit Smoothwall Express 3.1 (linux 
3.4.109 without these patches), with 8GiB RAM and 9GiB swap. Packets sent 
across PURPLE (to bypass NAT and firewall).

Traffic generator 2: 32-bit KVM running Smoothwall Express 3.1 (linux 3.4.110 
with these patches), 3GiB RAM and minimal swap.

In the first set of tests, generator 1's traffic passed through Generator 2 as 
a NATting firewall, to the host's web server. In the second set of tests, 
generator 2's traffic went through NAT to the host's web server.

The load tests:
  - 2500 processes using 2500 addresses and random src ports
  - 2500 processes using 2500 addresses and the same src port
  - 2500 processes using the same src address and port

I also tested using stock NF timeouts and using 1 second timeouts.

Bandwidth used got as high as 16Mb/s for some tests.

Conntracks got up to 200 000 or so or bounced between 1 and 2, depending on the 
test and the timeouts.

I did not reproduce the problem these patches solve. But more importantly, I 
saw no problems at all. Each time I terminated a test, RAM usage returned to 
about that of post-boot; so there were no apparent memory leaks. No kernel 
messages and no netfilter messages appeared during the tests.

If I have time, I suppose I could run another set of tests: 2500 source 
processes using 2500 addresses times 200 ports to connect to 2500 addresses 
times 200 ports on a destination system. Each process opens 200 sockets, then 
closes them. And repeats ad infinitum. But I might have to be clever since I 
can't run 500 000 processes; but I could run 20 VMs; that would get it down to 
about 12 000 processes per VM. And I might have to figure out how to allow 
processes on the destination system to open hundreds or thousands of 
sockets.

N


Re: [PATCH 1/1] commit c6825c0976fa7893692e0e43b09740b419b23c09 upstream.

2015-10-29 Thread Neal P. Murphy
On Thu, 29 Oct 2015 17:01:24 -0700
Ani Sinha  wrote:

> On Wed, Oct 28, 2015 at 11:40 PM, Neal P. Murphy
>  wrote:
> > On Wed, 28 Oct 2015 02:36:50 -0400
> > "Neal P. Murphy"  wrote:
> >
> >> On Mon, 26 Oct 2015 21:06:33 +0100
> >> Pablo Neira Ayuso  wrote:
> >>
> >> > Hi,
> >> >
> >> > On Mon, Oct 26, 2015 at 11:55:39AM -0700, Ani Sinha wrote:
> >> > > netfilter: nf_conntrack: fix RCU race in nf_conntrack_find_get
> >> >
> >> > Please, no need to Cc everyone here. Please, submit your Netfilter
> >> > patches to netfilter-de...@vger.kernel.org.
> >> >
> >> > Moreover, it would be great if the subject includes something
> >> > descriptive on what you need, for this I'd suggest:
> >> >
> >> > [PATCH -stable 3.4,backport] netfilter: nf_conntrack: fix RCU race in 
> >> > nf_conntrack_find_get
> >> >
> >> > I'm including Neal P. Murphy, he said he would help testing these
> >> > backports, getting a Tested-by: tag usually speeds up things too.
> >>
> >
> > I've probably done about as much seat-of-the-pants testing as I can. All 
> > opening/closing the same destination IP/port.
> >
> > Host: Debian Jessie, 8-core Vishera 8350 at 4.4 GHz, 16GiB RAM at (I think) 
> > 2100MHz.
> >
> > Traffic generator 1: 6-CPU KVM running 64-bit Smoothwall Express 3.1 (linux 
> > 3.4.109 without these patches), with 8GiB RAM and 9GiB swap. Packets sent 
> > across PURPLE (to bypass NAT and firewall).
> >
> > Traffic generator 2: 32-bit KVM running Smoothwall Express 3.1 (linux 
> > 3.4.110 with these patches), 3GiB RAM and minimal swap.
> >
> > In the first set of tests, generator 1's traffic passed through Generator 2 
> > as a NATting firewall, to the host's web server. In the second set of 
> > tests, generator 2's traffic went through NAT to the host's web server.
> >
> > The load tests:
> >   - 2500 processes using 2500 addresses and random src ports
> >   - 2500 processes using 2500 addresses and the same src port
> >   - 2500 processes using the same src address and port
> >
> > I also tested using stock NF timeouts and using 1 second timeouts.
> >
> > Bandwidth used got as high as 16Mb/s for some tests.
> >
> > Conntracks got up to 200 000 or so or bounced between 1 and 2, depending on 
> > the test and the timeouts.
> >
> > I did not reproduce the problem these patches solve. But more importantly, 
> > I saw no problems at all. Each time I terminated a test, RAM usage returned 
> > to about that of post-boot; so there were no apparent memory leaks. No 
> > kernel messages and no netfilter messages appeared during the tests.
> >
> > If I have time, I suppose I could run another set of tests: 2500 source 
> > processes using 2500 addresses times 200 ports to connect to 2500 addresses 
> > times 200 ports on a destination system. Each process opens 200 sockets, 
> > then closes them. And repeats ad infinitum. But I might have to be clever 
> > since I can't run 500 000 processes; but I could run 20 VMs; that would get 
> > it down to about 12 000 processes per VM. And I might have to figure out 
> > how to allow processes on the destination system to open hundreds or 
> > thousands of sockets.
> 
> Should I resend the patch with a Tested-by: tag?

... Oh, wait. Not yet. The dawn just broke over ol' Marblehead here. I only 
tested TCP; I need to hammer UDP, too.

Can I set the timeouts to zero? Or is one as low as I can go?

N


Re: [PATCH 1/1] commit c6825c0976fa7893692e0e43b09740b419b23c09 upstream.

2015-10-28 Thread Neal P. Murphy
On Mon, 26 Oct 2015 21:06:33 +0100
Pablo Neira Ayuso  wrote:

> Hi,
> 
> On Mon, Oct 26, 2015 at 11:55:39AM -0700, Ani Sinha wrote:
> > netfilter: nf_conntrack: fix RCU race in nf_conntrack_find_get
> 
> Please, no need to Cc everyone here. Please, submit your Netfilter
> patches to netfilter-de...@vger.kernel.org.
> 
> Moreover, it would be great if the subject includes something
> descriptive on what you need, for this I'd suggest:
> 
> [PATCH -stable 3.4,backport] netfilter: nf_conntrack: fix RCU race in 
> nf_conntrack_find_get
> 
> I'm including Neal P. Murphy, he said he would help testing these
> backports, getting a Tested-by: tag usually speeds up things too.

I hammered it a couple nights ago. First test was 5000 processes on 6 SMP CPUs 
opening and closing a port on a 'remote' host using the usual random source 
ports. Only got up to 32000 conntracks. The generator was a 64-bit Smoothwall 
KVM without the patch. The traffic passed through a 32-bit Smoothwall KVM with 
the patch. The target was on the VM host. No problems encountered. I suspect I 
didn't come close to triggering the original problem. Second test was a couple 
thousand processes all using the same source IP and port and dest IP and port. 
Still no problems. But these were perl scripts (and they used lots of RAM); 
perhaps a short C program would let me run more.
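
A minimal sketch of such a C hammer -- destination address, port and worker
count below are made-up placeholders -- where each forked worker just
connects to the same destination and closes the socket again in a tight loop:

#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define WORKERS   500              /* processes to fork (placeholder) */
#define DEST_IP   "192.168.1.1"    /* host behind the firewall (placeholder) */
#define DEST_PORT 80

static void hammer(void)
{
        struct sockaddr_in dst;

        memset(&dst, 0, sizeof(dst));
        dst.sin_family = AF_INET;
        dst.sin_port = htons(DEST_PORT);
        inet_pton(AF_INET, DEST_IP, &dst.sin_addr);

        for (;;) {
                int fd = socket(AF_INET, SOCK_STREAM, 0);

                if (fd < 0)
                        continue;
                /* Connect and tear down as fast as possible; every cycle
                 * creates and destroys one conntrack entry on the firewall.
                 * Errors are ignored on purpose -- refused or timed-out
                 * connects still exercise conntrack. */
                connect(fd, (struct sockaddr *)&dst, sizeof(dst));
                close(fd);
        }
}

int main(void)
{
        int i;

        for (i = 0; i < WORKERS; i++)
                if (fork() == 0) {
                        hammer();
                        _exit(0);
                }

        while (wait(NULL) > 0)      /* parent just waits; ^C stops the test */
                ;
        return 0;
}

Forked workers need only a few pages each, so thousands fit where a few
hundred perl interpreters would not; swapping SOCK_STREAM for SOCK_DGRAM and
sending a byte after connect() gives the matching UDP hammer.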

Any ideas on how I might test it more brutally?

N


Re: [PATCH 1/1] commit c6825c0976fa7893692e0e43b09740b419b23c09 upstream.

2015-10-26 Thread Pablo Neira Ayuso
Hi,

On Mon, Oct 26, 2015 at 11:55:39AM -0700, Ani Sinha wrote:
> netfilter: nf_conntrack: fix RCU race in nf_conntrack_find_get

Please, no need to Cc everyone here. Please, submit your Netfilter
patches to netfilter-de...@vger.kernel.org.

Moreover, it would be great if the subject includes something
descriptive on what you need, for this I'd suggest:

[PATCH -stable 3.4,backport] netfilter: nf_conntrack: fix RCU race in 
nf_conntrack_find_get

I'm including Neal P. Murphy, he said he would help testing these
backports, getting a Tested-by: tag usually speeds up things too.

The burden here is usually huge; the easier you make it for us, the better.
Then we can review and, if no major concerns, I can submit this to
-stable.

Let me know if you have any other questions,
Thanks.


[PATCH 1/1] commit c6825c0976fa7893692e0e43b09740b419b23c09 upstream.

2015-10-26 Thread Ani Sinha
netfilter: nf_conntrack: fix RCU race in nf_conntrack_find_get

Lets look at destroy_conntrack:

hlist_nulls_del_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode);
...
nf_conntrack_free(ct)
kmem_cache_free(net->ct.nf_conntrack_cachep, ct);

net->ct.nf_conntrack_cachep is created with SLAB_DESTROY_BY_RCU.

The hash is protected by rcu, so readers look up conntracks without
locks.
A conntrack is removed from the hash, but in this moment a few readers
still can use the conntrack. Then this conntrack is released and another
thread creates conntrack with the same address and the equal tuple.
After this a reader starts to validate the conntrack:
* It's not dying, because a new conntrack was created
* nf_ct_tuple_equal() returns true.

But this conntrack is not initialized yet, so it can not be used by two
threads concurrently. In this case BUG_ON may be triggered from
nf_nat_setup_info().

Florian Westphal suggested to check the confirm bit too. I think it's
right.

task 1                  task 2                  task 3
                        nf_conntrack_find_get
                         nf_conntrack_find
destroy_conntrack
 hlist_nulls_del_rcu
 nf_conntrack_free
 kmem_cache_free
                                                __nf_conntrack_alloc
                                                 kmem_cache_alloc
                                                 memset(&ct->tuplehash[IP_CT_DIR_MAX],
                         if (nf_ct_is_dying(ct))
                         if (!nf_ct_tuple_equal()

I'm not sure, that I have ever seen this race condition in a real life.
Currently we are investigating a bug, which is reproduced on a few nodes.
In our case one conntrack is initialized from a few tasks concurrently,
we don't have any other explanation for this.

<2>[46267.083061] kernel BUG at net/ipv4/netfilter/nf_nat_core.c:322!
...
<4>[46267.083951] RIP: 0010:[]  [] 
nf_nat_setup_info+0x564/0x590 [nf_nat]
...
<4>[46267.085549] Call Trace:
<4>[46267.085622]  [] alloc_null_binding+0x5b/0xa0 
[iptable_nat]
<4>[46267.085697]  [] nf_nat_rule_find+0x5c/0x80 [iptable_nat]
<4>[46267.085770]  [] nf_nat_fn+0x111/0x260 [iptable_nat]
<4>[46267.085843]  [] nf_nat_out+0x48/0xd0 [iptable_nat]
<4>[46267.085919]  [] nf_iterate+0x69/0xb0
<4>[46267.085991]  [] ? ip_finish_output+0x0/0x2f0
<4>[46267.086063]  [] nf_hook_slow+0x74/0x110
<4>[46267.086133]  [] ? ip_finish_output+0x0/0x2f0
<4>[46267.086207]  [] ? dst_output+0x0/0x20
<4>[46267.086277]  [] ip_output+0xa4/0xc0
<4>[46267.086346]  [] raw_sendmsg+0x8b4/0x910
<4>[46267.086419]  [] inet_sendmsg+0x4a/0xb0
<4>[46267.086491]  [] ? sock_update_classid+0x3a/0x50
<4>[46267.086562]  [] sock_sendmsg+0x117/0x140
<4>[46267.086638]  [] ? _spin_unlock_bh+0x1b/0x20
<4>[46267.086712]  [] ? autoremove_wake_function+0x0/0x40
<4>[46267.086785]  [] ? do_ip_setsockopt+0x90/0xd80
<4>[46267.086858]  [] ? call_function_interrupt+0xe/0x20
<4>[46267.086936]  [] ? ub_slab_ptr+0x20/0x90
<4>[46267.087006]  [] ? ub_slab_ptr+0x20/0x90
<4>[46267.087081]  [] ? kmem_cache_alloc+0xd8/0x1e0
<4>[46267.087151]  [] sys_sendto+0x139/0x190
<4>[46267.087229]  [] ? sock_setsockopt+0x16d/0x6f0
<4>[46267.087303]  [] ? audit_syscall_entry+0x1d7/0x200
<4>[46267.087378]  [] ? __audit_syscall_exit+0x265/0x290
<4>[46267.087454]  [] ? compat_sys_setsockopt+0x75/0x210
<4>[46267.087531]  [] compat_sys_socketcall+0x13f/0x210
<4>[46267.087607]  [] ia32_sysret+0x0/0x5
<4>[46267.087676] Code: 91 20 e2 01 75 29 48 89 de 4c 89 f7 e8 56 fa ff ff 85 
c0 0f 84 68 fc ff ff 0f b6 4d c6 41 8b 45 00 e9 4d fb ff ff e8 7c 19 e9 e0 <0f> 
0b eb fe f6 05 17 91 20 e2 80 74 ce 80 3d 5f 2e 00 00 00 74
<1>[46267.088023] RIP  [] nf_nat_setup_info+0x564/0x590

Cc: Eric Dumazet 
Cc: Florian Westphal 
Cc: Pablo Neira Ayuso 
Cc: Patrick McHardy 
Cc: Jozsef Kadlecsik 
Cc: "David S. Miller" 
Cc: Cyrill Gorcunov 
Signed-off-by: Andrey Vagin 
Acked-by: Eric Dumazet 
Signed-off-by: Pablo Neira Ayuso 
Signed-off-by: Ani Sinha 
---
 net/netfilter/nf_conntrack_core.c | 21 +
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/net/netfilter/nf_conntrack_core.c 
b/net/netfilter/nf_conntrack_core.c
index 9a46908..fd0f7a3 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -309,6 +309,21 @@ static void death_by_timeout(unsigned long ul_conntrack)
nf_ct_put(ct);
 }
 
+static inline bool
+nf_ct_key_equal(struct nf_conntrack_tuple_hash *h,
+   const struct nf_conntrack_tuple *tuple,
+   u16 zone)
+{
+   struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(h);
+
+   /* A conntrack can be recreated with the equal tuple,
+* so we need to check that the conntrack is confirmed
+*/
+   return nf_ct_tuple_equal(tuple, 
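
The hunk is cut off at this point in the archive. As a sketch of how the rest
reads in upstream commit c6825c0976fa (reconstructed, not the verbatim patch
text; whitespace and context approximate):

static inline bool
nf_ct_key_equal(struct nf_conntrack_tuple_hash *h,
                const struct nf_conntrack_tuple *tuple,
                u16 zone)
{
        struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(h);

        /* A conntrack can be recreated with the equal tuple,
         * so we need to check that the conntrack is confirmed
         */
        return nf_ct_tuple_equal(tuple, &h->tuple) &&
               nf_ct_zone(ct) == zone &&
               nf_ct_is_confirmed(ct);
}

Both ____nf_conntrack_find() and the re-lookup in __nf_conntrack_find_get()
then call the helper instead of open-coding the tuple/zone comparison, e.g.:

        if (unlikely(!nf_ct_key_equal(h, tuple, zone))) {
                nf_ct_put(ct);
                goto begin;
        }

A conntrack that was freed and recreated for the same tuple is still
unconfirmed at this point, so the lookup retries instead of returning a
half-initialised entry.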

[PATCH 1/1] commit c6825c0976fa7893692e0e43b09740b419b23c09 upstream.

2015-10-24 Thread Ani Sinha
netfilter: nf_conntrack: fix RCU race in nf_conntrack_find_get

Lets look at destroy_conntrack:

hlist_nulls_del_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode);
...
nf_conntrack_free(ct)
kmem_cache_free(net->ct.nf_conntrack_cachep, ct);

net->ct.nf_conntrack_cachep is created with SLAB_DESTROY_BY_RCU.

The hash is protected by rcu, so readers look up conntracks without
locks.
A conntrack is removed from the hash, but in this moment a few readers
still can use the conntrack. Then this conntrack is released and another
thread creates conntrack with the same address and the equal tuple.
After this a reader starts to validate the conntrack:
* It's not dying, because a new conntrack was created
* nf_ct_tuple_equal() returns true.

But this conntrack is not initialized yet, so it can not be used by two
threads concurrently. In this case BUG_ON may be triggered from
nf_nat_setup_info().

Florian Westphal suggested to check the confirm bit too. I think it's
right.

task 1                  task 2                  task 3
                        nf_conntrack_find_get
                         nf_conntrack_find
destroy_conntrack
 hlist_nulls_del_rcu
 nf_conntrack_free
 kmem_cache_free
                                                __nf_conntrack_alloc
                                                 kmem_cache_alloc
                                                 memset(&ct->tuplehash[IP_CT_DIR_MAX],
                         if (nf_ct_is_dying(ct))
                         if (!nf_ct_tuple_equal()

I'm not sure, that I have ever seen this race condition in a real life.
Currently we are investigating a bug, which is reproduced on a few nodes.
In our case one conntrack is initialized from a few tasks concurrently,
we don't have any other explanation for this.

<2>[46267.083061] kernel BUG at net/ipv4/netfilter/nf_nat_core.c:322!
...
<4>[46267.083951] RIP: 0010:[]  [] 
nf_nat_setup_info+0x564/0x590 [nf_nat]
...
<4>[46267.085549] Call Trace:
<4>[46267.085622]  [] alloc_null_binding+0x5b/0xa0 
[iptable_nat]
<4>[46267.085697]  [] nf_nat_rule_find+0x5c/0x80 [iptable_nat]
<4>[46267.085770]  [] nf_nat_fn+0x111/0x260 [iptable_nat]
<4>[46267.085843]  [] nf_nat_out+0x48/0xd0 [iptable_nat]
<4>[46267.085919]  [] nf_iterate+0x69/0xb0
<4>[46267.085991]  [] ? ip_finish_output+0x0/0x2f0
<4>[46267.086063]  [] nf_hook_slow+0x74/0x110
<4>[46267.086133]  [] ? ip_finish_output+0x0/0x2f0
<4>[46267.086207]  [] ? dst_output+0x0/0x20
<4>[46267.086277]  [] ip_output+0xa4/0xc0
<4>[46267.086346]  [] raw_sendmsg+0x8b4/0x910
<4>[46267.086419]  [] inet_sendmsg+0x4a/0xb0
<4>[46267.086491]  [] ? sock_update_classid+0x3a/0x50
<4>[46267.086562]  [] sock_sendmsg+0x117/0x140
<4>[46267.086638]  [] ? _spin_unlock_bh+0x1b/0x20
<4>[46267.086712]  [] ? autoremove_wake_function+0x0/0x40
<4>[46267.086785]  [] ? do_ip_setsockopt+0x90/0xd80
<4>[46267.086858]  [] ? call_function_interrupt+0xe/0x20
<4>[46267.086936]  [] ? ub_slab_ptr+0x20/0x90
<4>[46267.087006]  [] ? ub_slab_ptr+0x20/0x90
<4>[46267.087081]  [] ? kmem_cache_alloc+0xd8/0x1e0
<4>[46267.087151]  [] sys_sendto+0x139/0x190
<4>[46267.087229]  [] ? sock_setsockopt+0x16d/0x6f0
<4>[46267.087303]  [] ? audit_syscall_entry+0x1d7/0x200
<4>[46267.087378]  [] ? __audit_syscall_exit+0x265/0x290
<4>[46267.087454]  [] ? compat_sys_setsockopt+0x75/0x210
<4>[46267.087531]  [] compat_sys_socketcall+0x13f/0x210
<4>[46267.087607]  [] ia32_sysret+0x0/0x5
<4>[46267.087676] Code: 91 20 e2 01 75 29 48 89 de 4c 89 f7 e8 56 fa ff ff 85 
c0 0f 84 68 fc ff ff 0f b6 4d c6 41 8b 45 00 e9 4d fb ff ff e8 7c 19 e9 e0 <0f> 
0b eb fe f6 05 17 91 20 e2 80 74 ce 80 3d 5f 2e 00 00 00 74
<1>[46267.088023] RIP  [] nf_nat_setup_info+0x564/0x590

Cc: Eric Dumazet 
Cc: Florian Westphal 
Cc: Pablo Neira Ayuso 
Cc: Patrick McHardy 
Cc: Jozsef Kadlecsik 
Cc: "David S. Miller" 
Cc: Cyrill Gorcunov 
Signed-off-by: Andrey Vagin 
Acked-by: Eric Dumazet 
Signed-off-by: Pablo Neira Ayuso 
Signed-off-by: Ani Sinha 
---
 net/netfilter/nf_conntrack_core.c | 21 +
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/net/netfilter/nf_conntrack_core.c 
b/net/netfilter/nf_conntrack_core.c
index 9a46908..fd0f7a3 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -309,6 +309,21 @@ static void death_by_timeout(unsigned long ul_conntrack)
nf_ct_put(ct);
 }
 
+static inline bool
+nf_ct_key_equal(struct nf_conntrack_tuple_hash *h,
+   const struct nf_conntrack_tuple *tuple,
+   u16 zone)
+{
+   struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(h);
+
+   /* A conntrack can be recreated with the equal tuple,
+* so we need to check that the conntrack is confirmed
+*/
+   return nf_ct_tuple_equal(tuple, 

Re: [PATCH 1/1] commit c6825c0976fa7893692e0e43b09740b419b23c09 upstream.

2015-10-24 Thread Ani Sinha
Please refer to the thread "linux 3.4.43 : kernel crash at
__nf_conntrack_confirm" on netdev for context.

thanks

On Sat, Oct 24, 2015 at 11:27 AM, Ani Sinha  wrote:
> netfilter: nf_conntrack: fix RCU race in nf_conntrack_find_get
>
> Lets look at destroy_conntrack:
>
> hlist_nulls_del_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode);
> ...
> nf_conntrack_free(ct)
> kmem_cache_free(net->ct.nf_conntrack_cachep, ct);
>
> net->ct.nf_conntrack_cachep is created with SLAB_DESTROY_BY_RCU.
>
> The hash is protected by rcu, so readers look up conntracks without
> locks.
> A conntrack is removed from the hash, but in this moment a few readers
> still can use the conntrack. Then this conntrack is released and another
> thread creates conntrack with the same address and the equal tuple.
> After this a reader starts to validate the conntrack:
> * It's not dying, because a new conntrack was created
> * nf_ct_tuple_equal() returns true.
>
> But this conntrack is not initialized yet, so it can not be used by two
> threads concurrently. In this case BUG_ON may be triggered from
> nf_nat_setup_info().
>
> Florian Westphal suggested to check the confirm bit too. I think it's
> right.
>
> task 1                  task 2                  task 3
>                         nf_conntrack_find_get
>                          nf_conntrack_find
> destroy_conntrack
>  hlist_nulls_del_rcu
>  nf_conntrack_free
>  kmem_cache_free
>                                                 __nf_conntrack_alloc
>                                                  kmem_cache_alloc
>                                                  memset(&ct->tuplehash[IP_CT_DIR_MAX],
>                          if (nf_ct_is_dying(ct))
>                          if (!nf_ct_tuple_equal()
>
> I'm not sure, that I have ever seen this race condition in a real life.
> Currently we are investigating a bug, which is reproduced on a few nodes.
> In our case one conntrack is initialized from a few tasks concurrently,
> we don't have any other explanation for this.
>
> <2>[46267.083061] kernel BUG at net/ipv4/netfilter/nf_nat_core.c:322!
> ...
> <4>[46267.083951] RIP: 0010:[]  [] 
> nf_nat_setup_info+0x564/0x590 [nf_nat]
> ...
> <4>[46267.085549] Call Trace:
> <4>[46267.085622]  [] alloc_null_binding+0x5b/0xa0 
> [iptable_nat]
> <4>[46267.085697]  [] nf_nat_rule_find+0x5c/0x80 
> [iptable_nat]
> <4>[46267.085770]  [] nf_nat_fn+0x111/0x260 [iptable_nat]
> <4>[46267.085843]  [] nf_nat_out+0x48/0xd0 [iptable_nat]
> <4>[46267.085919]  [] nf_iterate+0x69/0xb0
> <4>[46267.085991]  [] ? ip_finish_output+0x0/0x2f0
> <4>[46267.086063]  [] nf_hook_slow+0x74/0x110
> <4>[46267.086133]  [] ? ip_finish_output+0x0/0x2f0
> <4>[46267.086207]  [] ? dst_output+0x0/0x20
> <4>[46267.086277]  [] ip_output+0xa4/0xc0
> <4>[46267.086346]  [] raw_sendmsg+0x8b4/0x910
> <4>[46267.086419]  [] inet_sendmsg+0x4a/0xb0
> <4>[46267.086491]  [] ? sock_update_classid+0x3a/0x50
> <4>[46267.086562]  [] sock_sendmsg+0x117/0x140
> <4>[46267.086638]  [] ? _spin_unlock_bh+0x1b/0x20
> <4>[46267.086712]  [] ? autoremove_wake_function+0x0/0x40
> <4>[46267.086785]  [] ? do_ip_setsockopt+0x90/0xd80
> <4>[46267.086858]  [] ? call_function_interrupt+0xe/0x20
> <4>[46267.086936]  [] ? ub_slab_ptr+0x20/0x90
> <4>[46267.087006]  [] ? ub_slab_ptr+0x20/0x90
> <4>[46267.087081]  [] ? kmem_cache_alloc+0xd8/0x1e0
> <4>[46267.087151]  [] sys_sendto+0x139/0x190
> <4>[46267.087229]  [] ? sock_setsockopt+0x16d/0x6f0
> <4>[46267.087303]  [] ? audit_syscall_entry+0x1d7/0x200
> <4>[46267.087378]  [] ? __audit_syscall_exit+0x265/0x290
> <4>[46267.087454]  [] ? compat_sys_setsockopt+0x75/0x210
> <4>[46267.087531]  [] compat_sys_socketcall+0x13f/0x210
> <4>[46267.087607]  [] ia32_sysret+0x0/0x5
> <4>[46267.087676] Code: 91 20 e2 01 75 29 48 89 de 4c 89 f7 e8 56 fa ff ff 85 
> c0 0f 84 68 fc ff ff 0f b6 4d c6 41 8b 45 00 e9 4d fb ff ff e8 7c 19 e9 e0 
> <0f> 0b eb fe f6 05 17 91 20 e2 80 74 ce 80 3d 5f 2e 00 00 00 74
> <1>[46267.088023] RIP  [] nf_nat_setup_info+0x564/0x590
>
> Cc: Eric Dumazet 
> Cc: Florian Westphal 
> Cc: Pablo Neira Ayuso 
> Cc: Patrick McHardy 
> Cc: Jozsef Kadlecsik 
> Cc: "David S. Miller" 
> Cc: Cyrill Gorcunov 
> Signed-off-by: Andrey Vagin 
> Acked-by: Eric Dumazet 
> Signed-off-by: Pablo Neira Ayuso 
> Signed-off-by: Ani Sinha 
> ---
>  net/netfilter/nf_conntrack_core.c | 21 +
>  1 file changed, 17 insertions(+), 4 deletions(-)
>
> diff --git a/net/netfilter/nf_conntrack_core.c 
> b/net/netfilter/nf_conntrack_core.c
> index 9a46908..fd0f7a3 100644
> --- a/net/netfilter/nf_conntrack_core.c
> +++ b/net/netfilter/nf_conntrack_core.c
> @@ -309,6 +309,21 @@ static void death_by_timeout(unsigned long ul_conntrack)
> nf_ct_put(ct);
>  }
>
> +static inline