Re: [PATCH iptables] iptables: support insisting that the lock is held

2017-05-03 Thread Lorenzo Colitti
On Wed, May 3, 2017 at 11:51 PM, Aaron Conole  wrote:
> I've thought about it.  I think a better change that makes sense in the
> presence of concurrent updates would be to use the wait argument as a
> total time, and apply it to both the userspace lock, AND an EAGAIN from
> the kernel space side.  So if the kernel space says 'locked try again',
> and the user passed a -w option, we can simply keep retrying until the
> wait time expires.  Does that make sense and solve your issue, Lorenzo?

The lock is always taken regardless of whether -w is specified. The
only difference is that with -w userspace waits instead of exiting
with an error.

I'm with Pablo on this one - the behaviour is incorrect and should be
fixed. Admins shouldn't have to wait for a concurrent execution to
happen before they know that they misconfigured the lock. I think that
if the admin runs iptables and the lock is unusable, it's a
configuration error, and it should be reported as such immediately.

Also - even if we add a check for EAGAIN, how reliable is that
codepath going to be? Concurrent updates are already pretty rare, and
the problem we saw was pretty hard to track down. If we sprinkle
checks for EAGAIN all over the code, and a future change misses one of
them, how will anyone notice breakage?
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH iptables] iptables: support insisting that the lock is held

2017-05-03 Thread Lorenzo Colitti
On Wed, May 3, 2017 at 11:27 PM, Aaron Conole  wrote:
> I wouldn't say it that way.  I looked at this a while ago, and one thing
> to keep in mind is the if the particular prefix path in the filesystem
> (for instance /run) isn't available, then this will cause iptables to
> fail.  I'm not sure how many systems do provide /run - at the time it
> might have been more common.

That is a configuration error on the part of the distribution
maintainers. The location of the iptables lock is configurable at
compile time and if the distribution does not have /run, the
maintainers can use another path.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH iptables] iptables: support insisting that the lock is held

2017-05-03 Thread Lorenzo Colitti
On Wed, May 3, 2017 at 8:01 PM, Pablo Neira Ayuso  wrote:
> This sounds to me like a wrong design decision was made when
> introducing this userspace lock.

Looks like the lock was introduced by

93587a04d0f2 ("ip[6]tables: Add locking to prevent concurrent instances")

which made the lock optional for backwards compatibility:

   xt_socket = socket(AF_UNIX, SOCK_STREAM, 0);
   /* If we can't even create a socket, fall back to prior
(lockless) behavior */
   if (xt_socket < 0)
   return true;

That was almost four years ago. Sometime between 1.4 and 1.6, the lock
was switched to use flock() on /run/xtables.lock , but the fail-open
behaviour was preserved.

> I would like to skip this compile time switch, if the existing
> behaviour is broken, we should just fix it. What is the scenario that
> can indeed have an impact in terms of backward compatibility breakage?
> Does it really make sense to keep a buggy behaviour around?

I agree that the current behaviour is dangerous and error-prone. It's
bitten us badly several times now. The only use case I came up with is
running commands early on boot, from a read-only filesystem that does
not have /run/xtables.lock. If we change the behaviour, that system
might become unbootable on an iptables upgrade.

Another issue mentioned in the initial socket-based approach was that
no locking is necessary for iptables commands running in different
network namespaces. Someone with such a system might be relying on
this. The impact of this change would be to slow down concurrent
iptables commands.

These both seem like a very contrived cases though. I'll send out
another patch that changes the behaviour.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH iptables] iptables: support insisting that the lock is held

2017-05-03 Thread Aaron Conole
Aaron Conole  writes:

> Pablo Neira Ayuso  writes:
>
>> On Thu, Apr 20, 2017 at 06:23:33PM +0900, Lorenzo Colitti wrote:
>>> Currently, iptables programs will exit with an error if the
>>> iptables lock cannot be acquired, but will silently continue if
>>> the lock cannot be opened at all.
>>
>> This sounds to me like a wrong design decision was made when
>> introducing this userspace lock.
>
> I wouldn't say it that way.  I looked at this a while ago, and one thing
> to keep in mind is the if the particular prefix path in the filesystem
> (for instance /run) isn't available, then this will cause iptables to
> fail.  I'm not sure how many systems do provide /run - at the time it
> might have been more common.

Another issue is container systems.  Until recently, Kubernetes didn't
provide /run at all, and not all container orchestration tools will
provide this filesystem.

>>> This can cause unexpected failures (with unhelpful error messages)
>>> in the presence of concurrent updates.
>>> 
>>> This patch adds a compile-time option that causes iptables to
>>> refuse to do anything if the lock cannot be acquired. It is a
>>> compile-time option instead of a command-line flag because:
>>> 
>>> 1. In order to reliably avoid concurrent modification, all
>>>invocations of iptables commands must follow this behaviour.
>>> 2. Whether or not the lock can be opened is typically not
>>>a run-time condition but is likely to be a configuration
>>>error.
>>>
>>> Tested by deleting xtables.lock and observing that all commands
>>> failed if iptables was compiled with --enable-strict-locking, but
>>> succeeded otherwise.
>>> 
>>> By default, --enable-strict-locking is disabled for backwards
>>> compatibility reasons. It can be enabled by default in a future
>>> change if desired.
>>
>> I would like to skip this compile time switch, if the existing
>> behaviour is broken, we should just fix it. What is the scenario that
>> can indeed have an impact in terms of backward compatibility breakage?
>> Does it really make sense to keep a buggy behaviour around?
>
> I'm not sure about a change to the behavior, but I agree that a compile
> time switch is probably not the way to go.

I've thought about it.  I think a better change that makes sense in the
presence of concurrent updates would be to use the wait argument as a
total time, and apply it to both the userspace lock, AND an EAGAIN from
the kernel space side.  So if the kernel space says 'locked try again',
and the user passed a -w option, we can simply keep retrying until the
wait time expires.  Does that make sense and solve your issue, Lorenzo?
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH iptables] iptables: support insisting that the lock is held

2017-05-03 Thread Aaron Conole
Pablo Neira Ayuso  writes:

> On Thu, Apr 20, 2017 at 06:23:33PM +0900, Lorenzo Colitti wrote:
>> Currently, iptables programs will exit with an error if the
>> iptables lock cannot be acquired, but will silently continue if
>> the lock cannot be opened at all.
>
> This sounds to me like a wrong design decision was made when
> introducing this userspace lock.

I wouldn't say it that way.  I looked at this a while ago, and one thing
to keep in mind is the if the particular prefix path in the filesystem
(for instance /run) isn't available, then this will cause iptables to
fail.  I'm not sure how many systems do provide /run - at the time it
might have been more common.

>> This can cause unexpected failures (with unhelpful error messages)
>> in the presence of concurrent updates.
>> 
>> This patch adds a compile-time option that causes iptables to
>> refuse to do anything if the lock cannot be acquired. It is a
>> compile-time option instead of a command-line flag because:
>> 
>> 1. In order to reliably avoid concurrent modification, all
>>invocations of iptables commands must follow this behaviour.
>> 2. Whether or not the lock can be opened is typically not
>>a run-time condition but is likely to be a configuration
>>error.
>>
>> Tested by deleting xtables.lock and observing that all commands
>> failed if iptables was compiled with --enable-strict-locking, but
>> succeeded otherwise.
>> 
>> By default, --enable-strict-locking is disabled for backwards
>> compatibility reasons. It can be enabled by default in a future
>> change if desired.
>
> I would like to skip this compile time switch, if the existing
> behaviour is broken, we should just fix it. What is the scenario that
> can indeed have an impact in terms of backward compatibility breakage?
> Does it really make sense to keep a buggy behaviour around?

I'm not sure about a change to the behavior, but I agree that a compile
time switch is probably not the way to go.

-Aaron
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/16] Netfilter/IPVS/OVS fixes for net

2017-05-03 Thread David Miller
From: Pablo Neira Ayuso 
Date: Wed,  3 May 2017 11:31:55 +0200

> The following patchset contains a rather large batch of Netfilter, IPVS
> and OVS fixes for your net tree. This includes fixes for ctnetlink, the
> userspace conntrack helper infrastructure, conntrack OVS support,
> ebtables DNAT target, several leaks in error path among other. More
> specifically, they are:
 ...
> You can pull these changes from:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git

Pulled, thanks Pablo.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] netfilter: conntrack: Force inlining of build check to prevent build failure

2017-05-03 Thread Pablo Neira Ayuso
On Wed, May 03, 2017 at 09:55:16AM -0400, David Miller wrote:
> From: Geert Uytterhoeven 
> Date: Wed,  3 May 2017 14:18:43 +0200
> 
> > If gcc (e.g. 4.1.2) decides not to inline total_extension_size(), the
> > build will fail with:
> > 
> > net/built-in.o: In function `nf_conntrack_init_start':
> > (.text+0x9baf6): undefined reference to `__compiletime_assert_1893'
> > 
> > or
> > 
> > ERROR: "__compiletime_assert_1893" [net/netfilter/nf_conntrack.ko] 
> > undefined!
> > 
> > Fix this by forcing inlining of total_extension_size().
> > 
> > Fixes: b3a5db109e0670d6 ("netfilter: conntrack: use u8 for extension sizes 
> > again")
> > Signed-off-by: Geert Uytterhoeven 
> 
> Pablo, I'm going to apply this directly to my tree to fix this build
> failure, I hope you don't mind.

Acked-by: Pablo Neira Ayuso 
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] netfilter: conntrack: Force inlining of build check to prevent build failure

2017-05-03 Thread David Miller
From: Geert Uytterhoeven 
Date: Wed,  3 May 2017 14:18:43 +0200

> If gcc (e.g. 4.1.2) decides not to inline total_extension_size(), the
> build will fail with:
> 
> net/built-in.o: In function `nf_conntrack_init_start':
> (.text+0x9baf6): undefined reference to `__compiletime_assert_1893'
> 
> or
> 
> ERROR: "__compiletime_assert_1893" [net/netfilter/nf_conntrack.ko] 
> undefined!
> 
> Fix this by forcing inlining of total_extension_size().
> 
> Fixes: b3a5db109e0670d6 ("netfilter: conntrack: use u8 for extension sizes 
> again")
> Signed-off-by: Geert Uytterhoeven 

Pablo, I'm going to apply this directly to my tree to fix this build
failure, I hope you don't mind.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] netfilter: conntrack: Force inlining of build check to prevent build failure

2017-05-03 Thread Arnd Bergmann
On Wed, May 3, 2017 at 2:47 PM, Geert Uytterhoeven  wrote:
> Hi Arnd,
>
> On Wed, May 3, 2017 at 2:32 PM, Arnd Bergmann  wrote:
>> On Wed, May 3, 2017 at 2:18 PM, Geert Uytterhoeven  
>> wrote:
>>> If gcc (e.g. 4.1.2) decides not to inline total_extension_size(), the
>>> build will fail with:
>>>
>>> net/built-in.o: In function `nf_conntrack_init_start':
>>> (.text+0x9baf6): undefined reference to `__compiletime_assert_1893'
>>>
>>> or
>>>
>>> ERROR: "__compiletime_assert_1893" [net/netfilter/nf_conntrack.ko] 
>>> undefined!
>>>
>>> Fix this by forcing inlining of total_extension_size().
>>>
>>> Fixes: b3a5db109e0670d6 ("netfilter: conntrack: use u8 for extension sizes 
>>> again")
>>> Signed-off-by: Geert Uytterhoeven 
>>
>> I saw this as well when I tried building with "gcc-7 -Og", and came to the 
>> same
>> conclusion.
>
> Good^H^H^H^HBad to see it not only happens with ancient compilers ;-)

Right now we don't see it on newer compilers (I assume gcc-4.3 or up) as we
always build with either -O2 or -Os optimizations. I was playing with -Og the
other day to get faster builds, but that causes many build failures
and additional
warnings as the result of missing out on optimizations that we have come
to rely on.

It might be worth getting -Og to build if the compile time is
drastically faster, but
we probably have to completely do away with BUILD_BUG_ON() and similar
checks in that configuration, which in turn makes the build output less
valuable.

   Arnd
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] netfilter: conntrack: Force inlining of build check to prevent build failure

2017-05-03 Thread Geert Uytterhoeven
Hi Arnd,

On Wed, May 3, 2017 at 2:32 PM, Arnd Bergmann  wrote:
> On Wed, May 3, 2017 at 2:18 PM, Geert Uytterhoeven  
> wrote:
>> If gcc (e.g. 4.1.2) decides not to inline total_extension_size(), the
>> build will fail with:
>>
>> net/built-in.o: In function `nf_conntrack_init_start':
>> (.text+0x9baf6): undefined reference to `__compiletime_assert_1893'
>>
>> or
>>
>> ERROR: "__compiletime_assert_1893" [net/netfilter/nf_conntrack.ko] 
>> undefined!
>>
>> Fix this by forcing inlining of total_extension_size().
>>
>> Fixes: b3a5db109e0670d6 ("netfilter: conntrack: use u8 for extension sizes 
>> again")
>> Signed-off-by: Geert Uytterhoeven 
>
> I saw this as well when I tried building with "gcc-7 -Og", and came to the 
> same
> conclusion.

Good^H^H^H^HBad to see it not only happens with ancient compilers ;-)

> Acked-by: Arnd Bergmann 

Thanks!

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] netfilter: conntrack: Force inlining of build check to prevent build failure

2017-05-03 Thread Florian Westphal
Geert Uytterhoeven  wrote:
> If gcc (e.g. 4.1.2) decides not to inline total_extension_size(), the
> build will fail with:
> 
> net/built-in.o: In function `nf_conntrack_init_start':
> (.text+0x9baf6): undefined reference to `__compiletime_assert_1893'
> 
> or
> 
> ERROR: "__compiletime_assert_1893" [net/netfilter/nf_conntrack.ko] 
> undefined!
> 
> Fix this by forcing inlining of total_extension_size().

Sorry about that, thanks for the fix Geert.

Acked-by: Florian Westphal 
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] netfilter: conntrack: Force inlining of build check to prevent build failure

2017-05-03 Thread Arnd Bergmann
On Wed, May 3, 2017 at 2:18 PM, Geert Uytterhoeven  wrote:
> If gcc (e.g. 4.1.2) decides not to inline total_extension_size(), the
> build will fail with:
>
> net/built-in.o: In function `nf_conntrack_init_start':
> (.text+0x9baf6): undefined reference to `__compiletime_assert_1893'
>
> or
>
> ERROR: "__compiletime_assert_1893" [net/netfilter/nf_conntrack.ko] 
> undefined!
>
> Fix this by forcing inlining of total_extension_size().
>
> Fixes: b3a5db109e0670d6 ("netfilter: conntrack: use u8 for extension sizes 
> again")
> Signed-off-by: Geert Uytterhoeven 

I saw this as well when I tried building with "gcc-7 -Og", and came to the same
conclusion.

Acked-by: Arnd Bergmann 

With -Og, there were a couple of other instances of BUILD_BUG_ON() failing
to see a compile-time constant, presumably as it fails to inline any functions
that are not explicitly marked inline.

   Arnd
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] netfilter: conntrack: Force inlining of build check to prevent build failure

2017-05-03 Thread Geert Uytterhoeven
If gcc (e.g. 4.1.2) decides not to inline total_extension_size(), the
build will fail with:

net/built-in.o: In function `nf_conntrack_init_start':
(.text+0x9baf6): undefined reference to `__compiletime_assert_1893'

or

ERROR: "__compiletime_assert_1893" [net/netfilter/nf_conntrack.ko] 
undefined!

Fix this by forcing inlining of total_extension_size().

Fixes: b3a5db109e0670d6 ("netfilter: conntrack: use u8 for extension sizes 
again")
Signed-off-by: Geert Uytterhoeven 
---
 net/netfilter/nf_conntrack_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/nf_conntrack_core.c 
b/net/netfilter/nf_conntrack_core.c
index f9245dbfe4356da6..3c8f1ed2f5558fe0 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -1853,7 +1853,7 @@ EXPORT_SYMBOL_GPL(nf_conntrack_set_hashsize);
 module_param_call(hashsize, nf_conntrack_set_hashsize, param_get_uint,
  _conntrack_htable_size, 0600);
 
-static unsigned int total_extension_size(void)
+static __always_inline unsigned int total_extension_size(void)
 {
/* remember to add new extensions below */
BUILD_BUG_ON(NF_CT_EXT_NUM > 9);
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


crash on >= 4.9.0 kernel seems nf related

2017-05-03 Thread Bjørnar Ness
After running on kernel 4.9 for quite some time, we suddenly experienced
hangs. Tried compiling 4.11.0, but the problem remains. The only think
that changed was a couple
more rules in nft, traffic is the same.

4.8.6 kernel works fine (but has other problems). Attached is the
debugging output, hope it can
help you track down the issue:

May  3 12:46:59 fw01 kernel: [  271.875363] [ cut here ]
May  3 12:46:59 fw01 kernel: [  271.875369] WARNING: CPU: 0 PID: 3 at
net/sched/sch_generic.c:316 dev_watchdog+0x212/0x220
May  3 12:46:59 fw01 kernel: [  271.875370] NETDEV WATCHDOG: eth1
(igb): transmit queue 1 timed out
May  3 12:46:59 fw01 kernel: [  271.875370] Modules linked in:
May  3 12:46:59 fw01 kernel: [  271.875373] CPU: 0 PID: 3 Comm:
kworker/0:0 Not tainted 4.11.0 #1
May  3 12:46:59 fw01 kernel: [  271.875374] Hardware name: Dell Inc.
PowerEdge R210/05KX61, BIOS 1.3.4 05/24/2010
May  3 12:46:59 fw01 kernel: [  271.875377] Workqueue: events_long gc_worker
May  3 12:46:59 fw01 kernel: [  271.875378] Call Trace:
May  3 12:46:59 fw01 kernel: [  271.875380]  
May  3 12:46:59 fw01 kernel: [  271.875383]  dump_stack+0x4d/0x6e
May  3 12:46:59 fw01 kernel: [  271.875386]  __warn+0xcc/0xf0
May  3 12:46:59 fw01 kernel: [  271.875387]  warn_slowpath_fmt+0x4a/0x50
May  3 12:46:59 fw01 kernel: [  271.875390]  ? tick_program_event+0x3f/0x70
May  3 12:46:59 fw01 kernel: [  271.875391]  dev_watchdog+0x212/0x220
May  3 12:46:59 fw01 kernel: [  271.875392]  ?
dev_deactivate_queue.constprop.31+0x60/0x60
May  3 12:46:59 fw01 kernel: [  271.875395]  call_timer_fn+0x30/0x140
May  3 12:46:59 fw01 kernel: [  271.875397]  run_timer_softirq+0x1be/0x3f0
May  3 12:46:59 fw01 kernel: [  271.875399]  ? handle_irq_event_percpu+0x40/0x50
May  3 12:46:59 fw01 kernel: [  271.875401]  __do_softirq+0xbe/0x280
May  3 12:46:59 fw01 kernel: [  271.875404]  do_softirq_own_stack+0x1c/0x30
May  3 12:46:59 fw01 kernel: [  271.875404]  
May  3 12:46:59 fw01 kernel: [  271.875406]  do_softirq+0x42/0x50
May  3 12:46:59 fw01 kernel: [  271.875407]  __local_bh_enable_ip+0x75/0x80
May  3 12:46:59 fw01 kernel: [  271.875410]  _raw_spin_unlock_bh+0x15/0x20
May  3 12:46:59 fw01 kernel: [  271.875412]
nf_nat_cleanup_conntrack+0x1bd/0x230
May  3 12:46:59 fw01 kernel: [  271.875413]  ? nf_nat_l3proto_register+0x70/0x70
May  3 12:46:59 fw01 kernel: [  271.875415]  __nf_ct_ext_destroy+0x38/0x50
May  3 12:46:59 fw01 kernel: [  271.875417]  nf_conntrack_free+0x17/0x50
May  3 12:46:59 fw01 kernel: [  271.875418]  destroy_conntrack+0x74/0x90
May  3 12:46:59 fw01 kernel: [  271.875421]  nf_conntrack_destroy+0x12/0x20
May  3 12:46:59 fw01 kernel: [  271.875422]  nf_ct_gc_expired+0x45/0x90
May  3 12:46:59 fw01 kernel: [  271.875423]  gc_worker+0xb3/0x180
May  3 12:46:59 fw01 kernel: [  271.875425]  process_one_work+0x143/0x3e0
May  3 12:46:59 fw01 kernel: [  271.875426]  worker_thread+0x126/0x480
May  3 12:46:59 fw01 kernel: [  271.875429]  kthread+0x104/0x140
May  3 12:46:59 fw01 kernel: [  271.875430]  ? process_one_work+0x3e0/0x3e0
May  3 12:46:59 fw01 kernel: [  271.875431]  ? kthread_park+0x90/0x90
May  3 12:46:59 fw01 kernel: [  271.875432]  ret_from_fork+0x29/0x40
May  3 12:46:59 fw01 kernel: [  271.875434] ---[ end trace 3da51f3ef83370a7 ]---
May  3 12:47:03 fw01 kernel: [  275.629072] igb :01:00.0 eth0:
igb: eth0 NIC Link is Down
May  3 12:47:03 fw01 kernel: [  275.629135] igb :01:00.1 eth1: Reset adapter
May  3 12:47:03 fw01 kernel: [  275.629316] igb :01:00.0 eth0: Reset adapter
May  3 12:47:12 fw01 kernel: [  285.159798] igb :01:00.1 eth1:
igb: eth1 NIC Link is Up 1000 Mbps Half Duplex, Flow Control: RX/TX
May  3 12:47:12 fw01 kernel: [  285.159801] igb :01:00.1: EEE
Disabled: unsupported at half duplex. Re-enable using ethtool when at
full duplex.
May  3 12:47:15 fw01 kernel: [  287.842412] igb :01:00.1: exceed
max 2 second
May  3 12:48:21 fw01 kernel: [  353.875946] igb :01:00.0 eth0: Reset adapter

-- 
Bj(/)rnar
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: .nf_ct_iterate_cleanup panic

2017-05-03 Thread Florian Westphal
dalin liu  wrote:
> kernel version: 4.4.56
> 
> In my linux box,when some PPPoE link are disconnected,
> nf_ct_iterate_cleanup will be called.nf_ct_iterate_cleanup may have
> access to the wrong conntrack address:

Is this a new bug?  If so, what kernel version did not have this problem?
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH iptables] iptables: support insisting that the lock is held

2017-05-03 Thread Pablo Neira Ayuso
On Thu, Apr 20, 2017 at 06:23:33PM +0900, Lorenzo Colitti wrote:
> Currently, iptables programs will exit with an error if the
> iptables lock cannot be acquired, but will silently continue if
> the lock cannot be opened at all.

This sounds to me like a wrong design decision was made when
introducing this userspace lock.

> This can cause unexpected failures (with unhelpful error messages)
> in the presence of concurrent updates.
> 
> This patch adds a compile-time option that causes iptables to
> refuse to do anything if the lock cannot be acquired. It is a
> compile-time option instead of a command-line flag because:
> 
> 1. In order to reliably avoid concurrent modification, all
>invocations of iptables commands must follow this behaviour.
> 2. Whether or not the lock can be opened is typically not
>a run-time condition but is likely to be a configuration
>error.
> 
> Tested by deleting xtables.lock and observing that all commands
> failed if iptables was compiled with --enable-strict-locking, but
> succeeded otherwise.
> 
> By default, --enable-strict-locking is disabled for backwards
> compatibility reasons. It can be enabled by default in a future
> change if desired.

I would like to skip this compile time switch, if the existing
behaviour is broken, we should just fix it. What is the scenario that
can indeed have an impact in terms of backward compatibility breakage?
Does it really make sense to keep a buggy behaviour around?
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


.nf_ct_iterate_cleanup panic

2017-05-03 Thread dalin liu
kernel version: 4.4.56

In my linux box,when some PPPoE link are disconnected,
nf_ct_iterate_cleanup will be called.nf_ct_iterate_cleanup may have
access to the wrong conntrack address:

<1>[927268.772583] BUG: unable to handle kernel paging request at
88a0050402c7
<1>[927268.785578] IP: [] nf_ct_iterate_cleanup+0xa2/0x230
<4>[927268.792969] PGD 0
<4>[927268.800512] Oops:  [#1] SMP
<4>[927268.808463] Modules linked in: ipmi_watchdog iptable_raw xt_CT
nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_ftp nf_conntrack_sip
nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_ftp
ip_set_hash_netiface ip_set_hash_net ip_set_hash_ip xt_set ip_set
ixgbe(O) vxlan igb(O) e1000e(O)
<4>[927268.862764] CPU: 46 PID: 6186 Comm: pppd Tainted: G   O
   4.4.56 #58
<4>[927268.882176] Hardware name: Supermicro Super Server/X10DRL-i,
BIOS 2.0a 08/25/2016
<4>[927268.903142] task: 880472611c40 ti: 880223ed8000
task.ti: 880223ed8000
<4>[927268.925685] RIP: 0010:[]
[] nf_ct_iterate_cleanup+0xa2/0x230
<4>[927268.949862] RSP: 0018:880223edbc28  EFLAGS: 00010246
<4>[927268.962330] RAX: c90001eb1000 RBX: 8193a8d0 RCX:
8801f4a668a8
<4>[927268.988124] RDX: 0001 RSI: 0200 RDI:
82007af4
<4>[927269.015332] RBP: 880223edbc78 R08: 88047fc43f08 R09:
0101
<4>[927269.043272] R10:  R11:  R12:
6c89
<4>[927269.072536] R13: 820ac780 R14: 88a005040290 R15:
8801f56f2a80
<4>[927269.102576] FS:  7f1d30a3d720()
GS:88047fc4() knlGS:
<4>[927269.132067] CS:  0010 DS:  ES:  CR0: 80050033
<4>[927269.147159] CR2: 88a0050402c7 CR3: 00046e37e000 CR4:
001406e0
<4>[927269.177840] Stack:
<4>[927269.192958]  880223edbc38  013d
0013593d
<4>[927269.223469]  880223edbc88 0002 880223edbd20
fff0
<4>[927269.253778]  820b7050  880223edbc88
8193a940
<4>[927269.283402] Call Trace:
<4>[927269.297585]  [] masq_device_event+0x30/0x40
<4>[927269.311561]  [] notifier_call_chain+0x4b/0x70
<4>[927269.325391]  [] raw_notifier_call_chain+0x11/0x20
<4>[927269.338999]  [] call_netdevice_notifiers_info+0x3b/0x70
<4>[927269.352544]  [] call_netdevice_notifiers+0x11/0x20
<4>[927269.365945]  [] __dev_notify_flags+0x5f/0xb0
<4>[927269.379318]  [] dev_change_flags+0x54/0x70
<4>[927269.392522]  [] devinet_ioctl+0x5a1/0x670
<4>[927269.405462]  [] inet_ioctl+0x66/0x80
<4>[927269.418095]  [] sock_ioctl+0x66/0x260
<4>[927269.430430]  [] do_vfs_ioctl+0x81/0x4d0
<4>[927269.442515]  [] SyS_ioctl+0x47/0x80
<4>[927269.454330]  [] entry_SYSCALL_64_fastpath+0x12/0x6a
<4>[927269.465924] Code: 00 00 0f 83 54 01 00 00 49 8b 85 80 0c 00 00
4e 8b 34 f0 41 f6 c6 01 74 12 e9 3e 01 00 00 4d 8b 36 41 f6 c6 01 0f
85 31 01 00 00 <41> 80 7e 37 00 75 ec 4d 8d 7e f0 4c 89 e6 4c 89 ff ff
d3 85 c0
<1>[927269.506749] RIP  [] nf_ct_iterate_cleanup+0xa2/0x230
<4>[927269.519054]  RSP 
<4>[927269.531071] CR2: 88a0050402c7
<4>[927269.552110] ---[ end trace 01e3243232603b17 ]---
<0>[927270.934341] Kernel panic - not syncing: Fatal exception in interrupt


103c: 0f 85 31 01 00 00 jne1173 
if (NF_CT_DIRECTION(h) != IP_CT_DIR_ORIGINAL)
1042: 41 80 7e 37 00   cmpb   $0x0,0x37(%r14)
1047: 75 ec jne1035 
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 15/16] netfilter: update MAINTAINERS file

2017-05-03 Thread Pablo Neira Ayuso
Several updates on the MAINTAINERS section for Netfilter:

1) Add Florian Westphal, he's been part of the coreteam since October 2012.
   He's been dedicating tireless efforts to improve the Netfilter codebase,
   fix bugs and push ongoing new developments ever since.

2) Add http://www.nftables.org/ URL, currently pointing to
   http://www.netfilter.org.

3) Update project status from Supported to Maintained.

Signed-off-by: Pablo Neira Ayuso 
Acked-by: Florian Westphal 
---
 MAINTAINERS | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 38d3e4ed7208..fc95cb06fb29 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8701,14 +8701,16 @@ F:  drivers/net/ethernet/neterion/
 NETFILTER
 M: Pablo Neira Ayuso 
 M: Jozsef Kadlecsik 
+M: Florian Westphal 
 L: netfilter-devel@vger.kernel.org
 L: coret...@netfilter.org
 W: http://www.netfilter.org/
 W: http://www.iptables.org/
+W: http://www.nftables.org/
 Q: http://patchwork.ozlabs.org/project/netfilter-devel/list/
 T: git git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git
 T: git git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next.git
-S: Supported
+S: Maintained
 F: include/linux/netfilter*
 F: include/linux/netfilter/
 F: include/net/netfilter/
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 13/16] ipvs: explicitly forbid ipv6 service/dest creation if ipv6 mod is disabled

2017-05-03 Thread Pablo Neira Ayuso
From: Paolo Abeni 

When creating a new ipvs service, ipv6 addresses are always accepted
if CONFIG_IP_VS_IPV6 is enabled. On dest creation the address family
is not explicitly checked.

This allows the user-space to configure ipvs services even if the
system is booted with ipv6.disable=1. On specific configuration, ipvs
can try to call ipv6 routing code at setup time, causing the kernel to
oops due to fib6_rules_ops being NULL.

This change addresses the issue adding a check for the ipv6
module being enabled while validating ipv6 service operations and
adding the same validation for dest operations.

According to git history, this issue is apparently present since
the introduction of ipv6 support, and the oops can be triggered
since commit 09571c7ae30865ad ("IPVS: Add function to determine
if IPv6 address is local")

Fixes: 09571c7ae30865ad ("IPVS: Add function to determine if IPv6 address is 
local")
Signed-off-by: Paolo Abeni 
Acked-by: Julian Anastasov 
Signed-off-by: Simon Horman 
---
 net/netfilter/ipvs/ip_vs_ctl.c | 22 +-
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index 5aeb0dde6ccc..4d753beaac32 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -3078,6 +3078,17 @@ static int ip_vs_genl_dump_services(struct sk_buff *skb,
return skb->len;
 }
 
+static bool ip_vs_is_af_valid(int af)
+{
+   if (af == AF_INET)
+   return true;
+#ifdef CONFIG_IP_VS_IPV6
+   if (af == AF_INET6 && ipv6_mod_enabled())
+   return true;
+#endif
+   return false;
+}
+
 static int ip_vs_genl_parse_service(struct netns_ipvs *ipvs,
struct ip_vs_service_user_kern *usvc,
struct nlattr *nla, int full_entry,
@@ -3104,11 +3115,7 @@ static int ip_vs_genl_parse_service(struct netns_ipvs 
*ipvs,
memset(usvc, 0, sizeof(*usvc));
 
usvc->af = nla_get_u16(nla_af);
-#ifdef CONFIG_IP_VS_IPV6
-   if (usvc->af != AF_INET && usvc->af != AF_INET6)
-#else
-   if (usvc->af != AF_INET)
-#endif
+   if (!ip_vs_is_af_valid(usvc->af))
return -EAFNOSUPPORT;
 
if (nla_fwmark) {
@@ -3610,6 +3617,11 @@ static int ip_vs_genl_set_cmd(struct sk_buff *skb, 
struct genl_info *info)
if (udest.af == 0)
udest.af = svc->af;
 
+   if (!ip_vs_is_af_valid(udest.af)) {
+   ret = -EAFNOSUPPORT;
+   goto out;
+   }
+
if (udest.af != svc->af && cmd != IPVS_CMD_DEL_DEST) {
/* The synchronization protocol is incompatible
 * with mixed family services
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 09/16] netfilter: xt_socket: Fix broken IPv6 handling

2017-05-03 Thread Pablo Neira Ayuso
From: Peter Tirsek 

Commit 834184b1f3a4 ("netfilter: defrag: only register defrag
functionality if needed") used the outdated XT_SOCKET_HAVE_IPV6 macro
which was removed earlier in commit 8db4c5be88f6 ("netfilter: move
socket lookup infrastructure to nf_socket_ipv{4,6}.c"). With that macro
never being defined, the xt_socket match emits an "Unknown family 10"
warning when used with IPv6:

WARNING: CPU: 0 PID: 1377 at net/netfilter/xt_socket.c:160 
socket_mt_enable_defrag+0x47/0x50 [xt_socket]
Unknown family 10
Modules linked in: xt_socket nf_socket_ipv4 nf_socket_ipv6 nf_defrag_ipv4 [...]
CPU: 0 PID: 1377 Comm: ip6tables-resto Not tainted 4.10.10 #1
Hardware name: [...]
Call Trace:
? __warn+0xe7/0x100
? socket_mt_enable_defrag+0x47/0x50 [xt_socket]
? socket_mt_enable_defrag+0x47/0x50 [xt_socket]
? warn_slowpath_fmt+0x39/0x40
? socket_mt_enable_defrag+0x47/0x50 [xt_socket]
? socket_mt_v2_check+0x12/0x40 [xt_socket]
? xt_check_match+0x6b/0x1a0 [x_tables]
? xt_find_match+0x93/0xd0 [x_tables]
? xt_request_find_match+0x20/0x80 [x_tables]
? translate_table+0x48e/0x870 [ip6_tables]
? translate_table+0x577/0x870 [ip6_tables]
? walk_component+0x3a/0x200
? kmalloc_order+0x1d/0x50
? do_ip6t_set_ctl+0x181/0x490 [ip6_tables]
? filename_lookup+0xa5/0x120
? nf_setsockopt+0x3a/0x60
? ipv6_setsockopt+0xb0/0xc0
? sock_common_setsockopt+0x23/0x30
? SyS_socketcall+0x41d/0x630
? vfs_read+0xfa/0x120
? do_fast_syscall_32+0x7a/0x110
? entry_SYSENTER_32+0x47/0x71

This patch brings the conditional back in line with how the rest of the
file handles IPv6.

Fixes: 834184b1f3a4 ("netfilter: defrag: only register defrag functionality if 
needed")
Signed-off-by: Peter Tirsek 
Acked-by: Florian Westphal 
Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/xt_socket.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/xt_socket.c b/net/netfilter/xt_socket.c
index 770bbec878f1..e75ef39669c5 100644
--- a/net/netfilter/xt_socket.c
+++ b/net/netfilter/xt_socket.c
@@ -152,7 +152,7 @@ static int socket_mt_enable_defrag(struct net *net, int 
family)
switch (family) {
case NFPROTO_IPV4:
return nf_defrag_ipv4_enable(net);
-#ifdef XT_SOCKET_HAVE_IPV6
+#if IS_ENABLED(CONFIG_IP6_NF_IPTABLES)
case NFPROTO_IPV6:
return nf_defrag_ipv6_enable(net);
 #endif
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 03/16] netfilter: nf_ct_helper: permit cthelpers with different names via nfnetlink

2017-05-03 Thread Pablo Neira Ayuso
From: Liping Zhang 

cthelpers added via nfnetlink may have the same tuple, i.e. except for
the l3proto and l4proto, other fields are all zero. So even with the
different names, we will also fail to add them:
  # nfct helper add ssdp inet udp
  # nfct helper add tftp inet udp
  nfct v1.4.3: netlink error: File exists

So in order to avoid unpredictable behaviour, we should:
1. cthelpers can be selected by nft ct helper obj or xt_CT target, so
report error if duplicated { name, l3proto, l4proto } tuple exist.
2. cthelpers can be selected by nf_ct_tuple_src_mask_cmp when
nf_ct_auto_assign_helper is enabled, so also report error if duplicated
{ l3proto, l4proto, src-port } tuple exist.

Also note, if the cthelper is added from userspace, then the src-port will
always be zero, it's invalid for nf_ct_auto_assign_helper, so there's no
need to check the second point listed above.

Fixes: 893e093c786c ("netfilter: nf_ct_helper: bail out on duplicated helpers")
Signed-off-by: Liping Zhang 
Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/nf_conntrack_helper.c | 26 +-
 1 file changed, 21 insertions(+), 5 deletions(-)

diff --git a/net/netfilter/nf_conntrack_helper.c 
b/net/netfilter/nf_conntrack_helper.c
index 4eeb3418366a..99bcd44aac70 100644
--- a/net/netfilter/nf_conntrack_helper.c
+++ b/net/netfilter/nf_conntrack_helper.c
@@ -386,17 +386,33 @@ int nf_conntrack_helper_register(struct 
nf_conntrack_helper *me)
struct nf_conntrack_tuple_mask mask = { .src.u.all = htons(0x) };
unsigned int h = helper_hash(>tuple);
struct nf_conntrack_helper *cur;
-   int ret = 0;
+   int ret = 0, i;
 
BUG_ON(me->expect_policy == NULL);
BUG_ON(me->expect_class_max >= NF_CT_MAX_EXPECT_CLASSES);
BUG_ON(strlen(me->name) > NF_CT_HELPER_NAME_LEN - 1);
 
mutex_lock(_ct_helper_mutex);
-   hlist_for_each_entry(cur, _ct_helper_hash[h], hnode) {
-   if (nf_ct_tuple_src_mask_cmp(>tuple, >tuple, )) {
-   ret = -EEXIST;
-   goto out;
+   for (i = 0; i < nf_ct_helper_hsize; i++) {
+   hlist_for_each_entry(cur, _ct_helper_hash[i], hnode) {
+   if (!strcmp(cur->name, me->name) &&
+   (cur->tuple.src.l3num == NFPROTO_UNSPEC ||
+cur->tuple.src.l3num == me->tuple.src.l3num) &&
+   cur->tuple.dst.protonum == me->tuple.dst.protonum) {
+   ret = -EEXIST;
+   goto out;
+   }
+   }
+   }
+
+   /* avoid unpredictable behaviour for auto_assign_helper */
+   if (!(me->flags & NF_CT_HELPER_F_USERSPACE)) {
+   hlist_for_each_entry(cur, _ct_helper_hash[h], hnode) {
+   if (nf_ct_tuple_src_mask_cmp(>tuple, >tuple,
+)) {
+   ret = -EEXIST;
+   goto out;
+   }
}
}
hlist_add_head_rcu(>hnode, _ct_helper_hash[h]);
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 14/16] netfilter: x_tables: unlock on error in xt_find_table_lock()

2017-05-03 Thread Pablo Neira Ayuso
From: Dan Carpenter 

According to my static checker we should unlock here before the return.
That seems reasonable to me as well.

Fixes" b9e69e127397 ("netfilter: xtables: don't hook tables by default")
Signed-off-by: Dan Carpenter 
Acked-by: Florian Westphal 
Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/x_tables.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index 14857afc9937..f134d384852f 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -1051,8 +1051,10 @@ struct xt_table *xt_find_table_lock(struct net *net, 
u_int8_t af,
list_for_each_entry(t, _net.xt.tables[af], list) {
if (strcmp(t->name, name))
continue;
-   if (!try_module_get(t->me))
+   if (!try_module_get(t->me)) {
+   mutex_unlock([af].mutex);
return NULL;
+   }
 
mutex_unlock([af].mutex);
if (t->table_init(net) != 0) {
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 04/16] netfilter: nft_set_bitmap: free dummy elements when destroy the set

2017-05-03 Thread Pablo Neira Ayuso
From: Liping Zhang 

We forget to free dummy elements when deleting the set. So when I was
running nft-test.py, I saw many kmemleak warnings:
  kmemleak: 1344 new suspected memory leaks ...
  # cat /sys/kernel/debug/kmemleak
  unreferenced object 0x8800631345c8 (size 32):
  comm "nft", pid 9075, jiffies 4295743309 (age 1354.815s)
  hex dump (first 32 bytes):
f8 63 13 63 00 88 ff ff 88 79 13 63 00 88 ff ff  .c.c.y.c
04 0c 00 00 00 00 00 00 00 00 00 00 08 03 00 00  
  backtrace:
[] kmemleak_alloc+0x4a/0xa0
[] __kmalloc+0x164/0x310
[] nft_set_elem_init+0x3d/0x1b0 [nf_tables]
[] nft_add_set_elem+0x45a/0x8c0 [nf_tables]
[] nf_tables_newsetelem+0x105/0x1d0 [nf_tables]
[] nfnetlink_rcv+0x414/0x770 [nfnetlink]
[] netlink_unicast+0x1f6/0x310
[] netlink_sendmsg+0x306/0x3b0
  ...

Fixes: e920dde516088 ("netfilter: nft_set_bitmap: keep a list of dummy 
elements")
Signed-off-by: Liping Zhang 
Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/nft_set_bitmap.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/net/netfilter/nft_set_bitmap.c b/net/netfilter/nft_set_bitmap.c
index 8ebbc2940f4c..b988162b5b15 100644
--- a/net/netfilter/nft_set_bitmap.c
+++ b/net/netfilter/nft_set_bitmap.c
@@ -257,6 +257,11 @@ static int nft_bitmap_init(const struct nft_set *set,
 
 static void nft_bitmap_destroy(const struct nft_set *set)
 {
+   struct nft_bitmap *priv = nft_set_priv(set);
+   struct nft_bitmap_elem *be, *n;
+
+   list_for_each_entry_safe(be, n, >list, head)
+   nft_set_elem_destroy(set, be, true);
 }
 
 static bool nft_bitmap_estimate(const struct nft_set_desc *desc, u32 features,
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 00/16] Netfilter/IPVS/OVS fixes for net

2017-05-03 Thread Pablo Neira Ayuso
Hi David,

The following patchset contains a rather large batch of Netfilter, IPVS
and OVS fixes for your net tree. This includes fixes for ctnetlink, the
userspace conntrack helper infrastructure, conntrack OVS support,
ebtables DNAT target, several leaks in error path among other. More
specifically, they are:

1) Fix reference count leak in the CT target error path, from Gao Feng.

2) Remove conntrack entry clashing with a matching expectation, patch
   from Jarno Rajahalme.

3) Fix bogus EEXIST when registering two different userspace helpers,
   from Liping Zhang.

4) Don't leak dummy elements in the new bitmap set type in nf_tables,
   from Liping Zhang.

5) Get rid of module autoload from conntrack update path in ctnetlink,
   we don't need autoload at this late stage and it is happening with
   rcu read lock held which is not good. From Liping Zhang.

6) Fix deadlock due to double-acquire of the expect_lock from conntrack
   update path, this fixes a bug that was introduced when the central
   spinlock got removed. Again from Liping Zhang.

7) Safe ct->status update from ctnetlink path, from Liping. The expect_lock
   protection that was selected when the central spinlock was removed was
   not really protecting anything at all.

8) Protect sequence adjustment under ct->lock.

9) Missing socket match with IPv6, from Peter Tirsek.

10) Adjust skb->pkt_type of DNAT'ed frames from ebtables, from
Linus Luessing.

11) Don't give up on evaluating the expression on new entries added via
dynset expression in nf_tables, from Liping Zhang.

12) Use skb_checksum() when mangling icmpv6 in IPv6 NAT as this deals
with non-linear skbuffs.

13) Don't allow IPv6 service in IPVS if no IPv6 support is available,
from Paolo Abeni.

14) Missing mutex release in error path of xt_find_table_lock(), from
Dan Carpenter.

15) Update maintainers files, Netfilter section. Add Florian to the
file, refer to nftables.org and change project status from Supported
to Maintained.

16) Bail out on mismatching extensions in element updates in nf_tables.

You can pull these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git

Thanks!



The following changes since commit 94836ecf1e7378b64d37624fbb81fe48fbd4c772:

  Merge tag 'nfsd-4.11-2' of git://linux-nfs.org/~bfields/linux (2017-04-21 
16:37:48 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git HEAD

for you to fetch changes up to 9744a6fcefcb4d56501d69adb04c24559d353cad:

  netfilter: nf_tables: check if same extensions are set when adding elements 
(2017-05-03 10:58:00 +0200)


Dan Carpenter (1):
  netfilter: x_tables: unlock on error in xt_find_table_lock()

Dave Johnson (1):
  netfilter: Wrong icmp6 checksum for ICMPV6_TIME_EXCEED in reverse SNATv6 
path

Gao Feng (1):
  netfilter: xt_CT: fix refcnt leak on error path

Jarno Rajahalme (1):
  openvswitch: Delete conntrack entry clashing with an expectation.

Linus Lüssing (1):
  bridge: ebtables: fix reception of frames DNAT-ed to bridge device/port

Liping Zhang (7):
  netfilter: nf_ct_helper: permit cthelpers with different names via 
nfnetlink
  netfilter: nft_set_bitmap: free dummy elements when destroy the set
  netfilter: ctnetlink: drop the incorrect cthelper module request
  netfilter: ctnetlink: fix deadlock due to acquire _expect_lock twice
  netfilter: ctnetlink: make it safer when updating ct->status
  netfilter: ctnetlink: acquire ct->lock before operating nf_ct_seqadj
  netfilter: nft_dynset: continue to next expr if _OP_ADD succeeded

Pablo Neira Ayuso (3):
  Merge tag 'ipvs-fixes-for-v4.11' of http://git.kernel.org/.../horms/ipvs
  netfilter: update MAINTAINERS file
  netfilter: nf_tables: check if same extensions are set when adding 
elements

Paolo Abeni (1):
  ipvs: explicitly forbid ipv6 service/dest creation if ipv6 mod is disabled

Peter Tirsek (1):
  netfilter: xt_socket: Fix broken IPv6 handling

 MAINTAINERS|  4 +-
 include/uapi/linux/netfilter/nf_conntrack_common.h | 13 +++-
 net/bridge/netfilter/ebt_dnat.c| 20 +
 net/ipv6/netfilter/nf_nat_l3proto_ipv6.c   |  2 +-
 net/netfilter/ipvs/ip_vs_ctl.c | 22 --
 net/netfilter/nf_conntrack_helper.c| 26 +--
 net/netfilter/nf_conntrack_netlink.c   | 89 --
 net/netfilter/nf_tables_api.c  |  5 ++
 net/netfilter/nft_dynset.c |  5 +-
 net/netfilter/nft_set_bitmap.c |  5 ++
 net/netfilter/x_tables.c   |  4 +-
 net/netfilter/xt_CT.c  | 11 ++-
 net/netfilter/xt_socket.c  |  2 +-
 

[PATCH 16/16] netfilter: nf_tables: check if same extensions are set when adding elements

2017-05-03 Thread Pablo Neira Ayuso
If no NLM_F_EXCL is set and the element already exists in the set, make
sure that both elements have the same extensions.

Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/nf_tables_api.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 434c739dfeca..11a96e8dd3cd 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -3749,6 +3749,11 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct 
nft_set *set,
err = set->ops->insert(ctx->net, set, , );
if (err) {
if (err == -EEXIST) {
+   if (nft_set_ext_exists(ext, NFT_SET_EXT_DATA) ^
+   nft_set_ext_exists(ext2, NFT_SET_EXT_DATA) ||
+   nft_set_ext_exists(ext, NFT_SET_EXT_OBJREF) ^
+   nft_set_ext_exists(ext2, NFT_SET_EXT_OBJREF))
+   return -EBUSY;
if ((nft_set_ext_exists(ext, NFT_SET_EXT_DATA) &&
 nft_set_ext_exists(ext2, NFT_SET_EXT_DATA) &&
 memcmp(nft_set_ext_data(ext),
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 07/16] netfilter: ctnetlink: make it safer when updating ct->status

2017-05-03 Thread Pablo Neira Ayuso
From: Liping Zhang 

After converting to use rcu for conntrack hash, one CPU may update
the ct->status via ctnetlink, while another CPU may process the
packets and update the ct->status.

So the non-atomic operation "ct->status |= status;" via ctnetlink
becomes unsafe, and this may clear the IPS_DYING_BIT bit set by
another CPU unexpectedly. For example:
 CPU0CPU1
  ctnetlink_change_status__nf_conntrack_find_get
  old = ct->status  nf_ct_gc_expired
  - nf_ct_kill
  -  test_and_set_bit(IPS_DYING_BIT
  new = old | status; -
  ct->status = new; <-- oops, _DYING_ is cleared!

Now using a series of atomic bit operation to solve the above issue.

Also note, user shouldn't set IPS_TEMPLATE, IPS_SEQ_ADJUST directly,
so make these two bits be unchangable too.

If we set the IPS_TEMPLATE_BIT, ct will be freed by nf_ct_tmpl_free,
but actually it is alloced by nf_conntrack_alloc.
If we set the IPS_SEQ_ADJUST_BIT, this may cause the NULL pointer
deference, as the nfct_seqadj(ct) maybe NULL.

Last, add some comments to describe the logic change due to the
commit a963d710f367 ("netfilter: ctnetlink: Fix regression in CTA_STATUS
processing"), which makes me feel a little confusing.

Fixes: 76507f69c44e ("[NETFILTER]: nf_conntrack: use RCU for conntrack hash")
Signed-off-by: Liping Zhang 
Signed-off-by: Pablo Neira Ayuso 
---
 include/uapi/linux/netfilter/nf_conntrack_common.h | 13 ++---
 net/netfilter/nf_conntrack_netlink.c   | 33 --
 2 files changed, 33 insertions(+), 13 deletions(-)

diff --git a/include/uapi/linux/netfilter/nf_conntrack_common.h 
b/include/uapi/linux/netfilter/nf_conntrack_common.h
index 6a8e33dd4ecb..38fc383139f0 100644
--- a/include/uapi/linux/netfilter/nf_conntrack_common.h
+++ b/include/uapi/linux/netfilter/nf_conntrack_common.h
@@ -82,10 +82,6 @@ enum ip_conntrack_status {
IPS_DYING_BIT = 9,
IPS_DYING = (1 << IPS_DYING_BIT),
 
-   /* Bits that cannot be altered from userland. */
-   IPS_UNCHANGEABLE_MASK = (IPS_NAT_DONE_MASK | IPS_NAT_MASK |
-IPS_EXPECTED | IPS_CONFIRMED | IPS_DYING),
-
/* Connection has fixed timeout. */
IPS_FIXED_TIMEOUT_BIT = 10,
IPS_FIXED_TIMEOUT = (1 << IPS_FIXED_TIMEOUT_BIT),
@@ -101,6 +97,15 @@ enum ip_conntrack_status {
/* Conntrack got a helper explicitly attached via CT target. */
IPS_HELPER_BIT = 13,
IPS_HELPER = (1 << IPS_HELPER_BIT),
+
+   /* Be careful here, modifying these bits can make things messy,
+* so don't let users modify them directly.
+*/
+   IPS_UNCHANGEABLE_MASK = (IPS_NAT_DONE_MASK | IPS_NAT_MASK |
+IPS_EXPECTED | IPS_CONFIRMED | IPS_DYING |
+IPS_SEQ_ADJUST | IPS_TEMPLATE),
+
+   __IPS_MAX_BIT = 14,
 };
 
 /* Connection tracking event types */
diff --git a/net/netfilter/nf_conntrack_netlink.c 
b/net/netfilter/nf_conntrack_netlink.c
index e5f9b1f4..86deed6a8db4 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -1419,6 +1419,24 @@ ctnetlink_parse_nat_setup(struct nf_conn *ct,
 }
 #endif
 
+static void
+__ctnetlink_change_status(struct nf_conn *ct, unsigned long on,
+ unsigned long off)
+{
+   unsigned int bit;
+
+   /* Ignore these unchangable bits */
+   on &= ~IPS_UNCHANGEABLE_MASK;
+   off &= ~IPS_UNCHANGEABLE_MASK;
+
+   for (bit = 0; bit < __IPS_MAX_BIT; bit++) {
+   if (on & (1 << bit))
+   set_bit(bit, >status);
+   else if (off & (1 << bit))
+   clear_bit(bit, >status);
+   }
+}
+
 static int
 ctnetlink_change_status(struct nf_conn *ct, const struct nlattr * const cda[])
 {
@@ -1438,10 +1456,7 @@ ctnetlink_change_status(struct nf_conn *ct, const struct 
nlattr * const cda[])
/* ASSURED bit can only be set */
return -EBUSY;
 
-   /* Be careful here, modifying NAT bits can screw up things,
-* so don't let users modify them directly if they don't pass
-* nf_nat_range. */
-   ct->status |= status & ~(IPS_NAT_DONE_MASK | IPS_NAT_MASK);
+   __ctnetlink_change_status(ct, status, 0);
return 0;
 }
 
@@ -1628,7 +1643,7 @@ ctnetlink_change_seq_adj(struct nf_conn *ct,
if (ret < 0)
return ret;
 
-   ct->status |= IPS_SEQ_ADJUST;
+   set_bit(IPS_SEQ_ADJUST_BIT, >status);
}
 
if (cda[CTA_SEQ_ADJ_REPLY]) {
@@ -1637,7 +1652,7 @@ ctnetlink_change_seq_adj(struct nf_conn *ct,
if (ret < 0)
return ret;
 
-   ct->status |= IPS_SEQ_ADJUST;
+   set_bit(IPS_SEQ_ADJUST_BIT, 

[PATCH 02/16] openvswitch: Delete conntrack entry clashing with an expectation.

2017-05-03 Thread Pablo Neira Ayuso
From: Jarno Rajahalme 

Conntrack helpers do not check for a potentially clashing conntrack
entry when creating a new expectation.  Also, nf_conntrack_in() will
check expectations (via init_conntrack()) only if a conntrack entry
can not be found.  The expectation for a packet which also matches an
existing conntrack entry will not be removed by conntrack, and is
currently handled inconsistently by OVS, as OVS expects the
expectation to be removed when the connection tracking entry matching
that expectation is confirmed.

It should be noted that normally an IP stack would not allow reuse of
a 5-tuple of an old (possibly lingering) connection for a new data
connection, so this is somewhat unlikely corner case.  However, it is
possible that a misbehaving source could cause conntrack entries be
created that could then interfere with new related connections.

Fix this in the OVS module by deleting the clashing conntrack entry
after an expectation has been matched.  This causes the following
nf_conntrack_in() call also find the expectation and remove it when
creating the new conntrack entry, as well as the forthcoming reply
direction packets to match the new related connection instead of the
old clashing conntrack entry.

Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action")
Reported-by: Yang Song 
Signed-off-by: Jarno Rajahalme 
Acked-by: Joe Stringer 
Signed-off-by: Pablo Neira Ayuso 
---
 net/openvswitch/conntrack.c | 30 +-
 1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 7b2c2fce408a..c92a5795dcda 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -514,10 +514,38 @@ ovs_ct_expect_find(struct net *net, const struct 
nf_conntrack_zone *zone,
   u16 proto, const struct sk_buff *skb)
 {
struct nf_conntrack_tuple tuple;
+   struct nf_conntrack_expect *exp;
 
if (!nf_ct_get_tuplepr(skb, skb_network_offset(skb), proto, net, 
))
return NULL;
-   return __nf_ct_expect_find(net, zone, );
+
+   exp = __nf_ct_expect_find(net, zone, );
+   if (exp) {
+   struct nf_conntrack_tuple_hash *h;
+
+   /* Delete existing conntrack entry, if it clashes with the
+* expectation.  This can happen since conntrack ALGs do not
+* check for clashes between (new) expectations and existing
+* conntrack entries.  nf_conntrack_in() will check the
+* expectations only if a conntrack entry can not be found,
+* which can lead to OVS finding the expectation (here) in the
+* init direction, but which will not be removed by the
+* nf_conntrack_in() call, if a matching conntrack entry is
+* found instead.  In this case all init direction packets
+* would be reported as new related packets, while reply
+* direction packets would be reported as un-related
+* established packets.
+*/
+   h = nf_conntrack_find_get(net, zone, );
+   if (h) {
+   struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(h);
+
+   nf_ct_delete(ct, 0, 0);
+   nf_conntrack_put(>ct_general);
+   }
+   }
+
+   return exp;
 }
 
 /* This replicates logic from nf_conntrack_core.c that is not exported. */
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 11/16] netfilter: nft_dynset: continue to next expr if _OP_ADD succeeded

2017-05-03 Thread Pablo Neira Ayuso
From: Liping Zhang 

Currently, after adding the following nft rules:
  # nft add set x target1 { type ipv4_addr \; flags timeout \;}
  # nft add rule x y set add ip daddr timeout 1d @target1 counter

the counters will always be zero despite of the elements are added
to the dynamic set "target1" or not, as we will break the nft expr
traversal unconditionally:
  # nft list ruleset
  ...
  set target1 {
  ...
  elements = { 8.8.8.8 expires 23h59m53s}
  }
  chain output {
  ...
  set add ip daddr timeout 1d @target1 counter packets 0 bytes 0
   ^   ^
  ...
  }

Since we add the elements to the set successfully, we should continue
to the next expression.

Additionally, if elements are added to "flow table" successfully, we
will _always_ continue to the next expr, even if the operation is
_OP_ADD. So it's better to keep them to be consistent.

Fixes: 22fe54d5fefc ("netfilter: nf_tables: add support for dynamic set 
updates")
Reported-by: Robert White 
Signed-off-by: Liping Zhang 
Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/nft_dynset.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/net/netfilter/nft_dynset.c b/net/netfilter/nft_dynset.c
index 049ad2d9ee66..fafbeea3ed04 100644
--- a/net/netfilter/nft_dynset.c
+++ b/net/netfilter/nft_dynset.c
@@ -82,8 +82,7 @@ static void nft_dynset_eval(const struct nft_expr *expr,
nft_set_ext_exists(ext, NFT_SET_EXT_EXPIRATION)) {
timeout = priv->timeout ? : set->timeout;
*nft_set_ext_expiration(ext) = jiffies + timeout;
-   } else if (sexpr == NULL)
-   goto out;
+   }
 
if (sexpr != NULL)
sexpr->ops->eval(sexpr, regs, pkt);
@@ -92,7 +91,7 @@ static void nft_dynset_eval(const struct nft_expr *expr,
regs->verdict.code = NFT_BREAK;
return;
}
-out:
+
if (!priv->invert)
regs->verdict.code = NFT_BREAK;
 }
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 08/16] netfilter: ctnetlink: acquire ct->lock before operating nf_ct_seqadj

2017-05-03 Thread Pablo Neira Ayuso
From: Liping Zhang 

We should acquire the ct->lock before accessing or modifying the
nf_ct_seqadj, as another CPU may modify the nf_ct_seqadj at the same
time during its packet proccessing.

Signed-off-by: Liping Zhang 
Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/nf_conntrack_netlink.c | 21 +++--
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/net/netfilter/nf_conntrack_netlink.c 
b/net/netfilter/nf_conntrack_netlink.c
index 86deed6a8db4..78f8c9adbd3c 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -417,8 +417,7 @@ dump_ct_seq_adj(struct sk_buff *skb, const struct 
nf_ct_seqadj *seq, int type)
return -1;
 }
 
-static int ctnetlink_dump_ct_seq_adj(struct sk_buff *skb,
-const struct nf_conn *ct)
+static int ctnetlink_dump_ct_seq_adj(struct sk_buff *skb, struct nf_conn *ct)
 {
struct nf_conn_seqadj *seqadj = nfct_seqadj(ct);
struct nf_ct_seqadj *seq;
@@ -426,15 +425,20 @@ static int ctnetlink_dump_ct_seq_adj(struct sk_buff *skb,
if (!(ct->status & IPS_SEQ_ADJUST) || !seqadj)
return 0;
 
+   spin_lock_bh(>lock);
seq = >seq[IP_CT_DIR_ORIGINAL];
if (dump_ct_seq_adj(skb, seq, CTA_SEQ_ADJ_ORIG) == -1)
-   return -1;
+   goto err;
 
seq = >seq[IP_CT_DIR_REPLY];
if (dump_ct_seq_adj(skb, seq, CTA_SEQ_ADJ_REPLY) == -1)
-   return -1;
+   goto err;
 
+   spin_unlock_bh(>lock);
return 0;
+err:
+   spin_unlock_bh(>lock);
+   return -1;
 }
 
 static int ctnetlink_dump_id(struct sk_buff *skb, const struct nf_conn *ct)
@@ -1637,11 +1641,12 @@ ctnetlink_change_seq_adj(struct nf_conn *ct,
if (!seqadj)
return 0;
 
+   spin_lock_bh(>lock);
if (cda[CTA_SEQ_ADJ_ORIG]) {
ret = change_seq_adj(>seq[IP_CT_DIR_ORIGINAL],
 cda[CTA_SEQ_ADJ_ORIG]);
if (ret < 0)
-   return ret;
+   goto err;
 
set_bit(IPS_SEQ_ADJUST_BIT, >status);
}
@@ -1650,12 +1655,16 @@ ctnetlink_change_seq_adj(struct nf_conn *ct,
ret = change_seq_adj(>seq[IP_CT_DIR_REPLY],
 cda[CTA_SEQ_ADJ_REPLY]);
if (ret < 0)
-   return ret;
+   goto err;
 
set_bit(IPS_SEQ_ADJUST_BIT, >status);
}
 
+   spin_unlock_bh(>lock);
return 0;
+err:
+   spin_unlock_bh(>lock);
+   return ret;
 }
 
 static int
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 12/16] netfilter: Wrong icmp6 checksum for ICMPV6_TIME_EXCEED in reverse SNATv6 path

2017-05-03 Thread Pablo Neira Ayuso
From: Dave Johnson 

When recalculating the outer ICMPv6 checksum for a reverse path NATv6
such as ICMPV6_TIME_EXCEED nf_nat_icmpv6_reply_translation() was
accessing data beyond the headlen of the skb for non-linear skb.  This
resulted in incorrect ICMPv6 checksum as garbage data was used.

Patch replaces csum_partial() with skb_checksum() which supports
non-linear skbs similar to nf_nat_icmp_reply_translation() from ipv4.

Signed-off-by: Dave Johnson 
Signed-off-by: Pablo Neira Ayuso 
---
 net/ipv6/netfilter/nf_nat_l3proto_ipv6.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/netfilter/nf_nat_l3proto_ipv6.c 
b/net/ipv6/netfilter/nf_nat_l3proto_ipv6.c
index e0be97e636a4..69937b637ee5 100644
--- a/net/ipv6/netfilter/nf_nat_l3proto_ipv6.c
+++ b/net/ipv6/netfilter/nf_nat_l3proto_ipv6.c
@@ -235,7 +235,7 @@ int nf_nat_icmpv6_reply_translation(struct sk_buff *skb,
inside->icmp6.icmp6_cksum =
csum_ipv6_magic(>saddr, >daddr,
skb->len - hdrlen, IPPROTO_ICMPV6,
-   csum_partial(>icmp6,
+   skb_checksum(skb, hdrlen,
 skb->len - hdrlen, 0));
}
 
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 01/16] netfilter: xt_CT: fix refcnt leak on error path

2017-05-03 Thread Pablo Neira Ayuso
From: Gao Feng 

There are two cases which causes refcnt leak.

1. When nf_ct_timeout_ext_add failed in xt_ct_set_timeout, it should
free the timeout refcnt.
Now goto the err_put_timeout error handler instead of going ahead.

2. When the time policy is not found, we should call module_put.
Otherwise, the related cthelper module cannot be removed anymore.
It is easy to reproduce by typing the following command:
  # iptables -t raw -A OUTPUT -p tcp -j CT --helper ftp --timeout xxx

Signed-off-by: Gao Feng 
Signed-off-by: Liping Zhang 
Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/xt_CT.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/xt_CT.c b/net/netfilter/xt_CT.c
index b008db0184b8..81fdcdca7457 100644
--- a/net/netfilter/xt_CT.c
+++ b/net/netfilter/xt_CT.c
@@ -167,8 +167,10 @@ xt_ct_set_timeout(struct nf_conn *ct, const struct 
xt_tgchk_param *par,
goto err_put_timeout;
}
timeout_ext = nf_ct_timeout_ext_add(ct, timeout, GFP_ATOMIC);
-   if (timeout_ext == NULL)
+   if (!timeout_ext) {
ret = -ENOMEM;
+   goto err_put_timeout;
+   }
 
rcu_read_unlock();
return ret;
@@ -200,6 +202,7 @@ static int xt_ct_tg_check(const struct xt_tgchk_param *par,
  struct xt_ct_target_info_v1 *info)
 {
struct nf_conntrack_zone zone;
+   struct nf_conn_help *help;
struct nf_conn *ct;
int ret = -EOPNOTSUPP;
 
@@ -248,7 +251,7 @@ static int xt_ct_tg_check(const struct xt_tgchk_param *par,
if (info->timeout[0]) {
ret = xt_ct_set_timeout(ct, par, info->timeout);
if (ret < 0)
-   goto err3;
+   goto err4;
}
__set_bit(IPS_CONFIRMED_BIT, >status);
nf_conntrack_get(>ct_general);
@@ -256,6 +259,10 @@ static int xt_ct_tg_check(const struct xt_tgchk_param *par,
info->ct = ct;
return 0;
 
+err4:
+   help = nfct_help(ct);
+   if (help)
+   module_put(help->helper->me);
 err3:
nf_ct_tmpl_free(ct);
 err2:
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 06/16] netfilter: ctnetlink: fix deadlock due to acquire _expect_lock twice

2017-05-03 Thread Pablo Neira Ayuso
From: Liping Zhang 

Currently, ctnetlink_change_conntrack is always protected by _expect_lock,
but this will cause a deadlock when deleting the helper from a conntrack,
as the _expect_lock will be acquired again by nf_ct_remove_expectations:

 CPU0

  lock(nf_conntrack_expect_lock);
  lock(nf_conntrack_expect_lock);

  *** DEADLOCK ***
  May be due to missing lock nesting notation

  2 locks held by lt-conntrack_gr/12853:
  #0:  ([i].mutex){+.+.+.}, at: []
   nfnetlink_rcv_msg+0x399/0x6a9 [nfnetlink]
  #1:  (nf_conntrack_expect_lock){+.}, at: []
   ctnetlink_new_conntrack+0x17f/0x408 [nf_conntrack_netlink]

  Call Trace:
   dump_stack+0x85/0xc2
   __lock_acquire+0x1608/0x1680
   ? ctnetlink_parse_tuple_proto+0x10f/0x1c0 [nf_conntrack_netlink]
   lock_acquire+0x100/0x1f0
   ? nf_ct_remove_expectations+0x32/0x90 [nf_conntrack]
   _raw_spin_lock_bh+0x3f/0x50
   ? nf_ct_remove_expectations+0x32/0x90 [nf_conntrack]
   nf_ct_remove_expectations+0x32/0x90 [nf_conntrack]
   ctnetlink_change_helper+0xc6/0x190 [nf_conntrack_netlink]
   ctnetlink_new_conntrack+0x1b2/0x408 [nf_conntrack_netlink]
   nfnetlink_rcv_msg+0x60a/0x6a9 [nfnetlink]
   ? nfnetlink_rcv_msg+0x1b9/0x6a9 [nfnetlink]
   ? nfnetlink_bind+0x1a0/0x1a0 [nfnetlink]
   netlink_rcv_skb+0xa4/0xc0
   nfnetlink_rcv+0x87/0x770 [nfnetlink]

Since the operations are unrelated to nf_ct_expect, so we can drop the
_expect_lock. Also note, after removing the _expect_lock protection,
another CPU may invoke nf_conntrack_helper_unregister, so we should
use rcu_read_lock to protect __nf_conntrack_helper_find invoked by
ctnetlink_change_helper.

Fixes: ca7433df3a67 ("netfilter: conntrack: seperate expect locking from 
nf_conntrack_lock")
Signed-off-by: Liping Zhang 
Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/nf_conntrack_netlink.c | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/net/netfilter/nf_conntrack_netlink.c 
b/net/netfilter/nf_conntrack_netlink.c
index 48c184552de0..e5f9b1f4 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -1510,23 +1510,29 @@ static int ctnetlink_change_helper(struct nf_conn *ct,
return 0;
}
 
+   rcu_read_lock();
helper = __nf_conntrack_helper_find(helpname, nf_ct_l3num(ct),
nf_ct_protonum(ct));
-   if (helper == NULL)
+   if (helper == NULL) {
+   rcu_read_unlock();
return -EOPNOTSUPP;
+   }
 
if (help) {
if (help->helper == helper) {
/* update private helper data if allowed. */
if (helper->from_nlattr)
helper->from_nlattr(helpinfo, ct);
-   return 0;
+   err = 0;
} else
-   return -EBUSY;
+   err = -EBUSY;
+   } else {
+   /* we cannot set a helper for an existing conntrack */
+   err = -EOPNOTSUPP;
}
 
-   /* we cannot set a helper for an existing conntrack */
-   return -EOPNOTSUPP;
+   rcu_read_unlock();
+   return err;
 }
 
 static int ctnetlink_change_timeout(struct nf_conn *ct,
@@ -1945,9 +1951,7 @@ static int ctnetlink_new_conntrack(struct net *net, 
struct sock *ctnl,
err = -EEXIST;
ct = nf_ct_tuplehash_to_ctrack(h);
if (!(nlh->nlmsg_flags & NLM_F_EXCL)) {
-   spin_lock_bh(_conntrack_expect_lock);
err = ctnetlink_change_conntrack(ct, cda);
-   spin_unlock_bh(_conntrack_expect_lock);
if (err == 0) {
nf_conntrack_eventmask_report((1 << IPCT_REPLY) |
  (1 << IPCT_ASSURED) |
@@ -2342,11 +2346,7 @@ ctnetlink_glue_parse(const struct nlattr *attr, struct 
nf_conn *ct)
if (ret < 0)
return ret;
 
-   spin_lock_bh(_conntrack_expect_lock);
-   ret = ctnetlink_glue_parse_ct((const struct nlattr **)cda, ct);
-   spin_unlock_bh(_conntrack_expect_lock);
-
-   return ret;
+   return ctnetlink_glue_parse_ct((const struct nlattr **)cda, ct);
 }
 
 static int ctnetlink_glue_exp_parse(const struct nlattr * const *cda,
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 05/16] netfilter: ctnetlink: drop the incorrect cthelper module request

2017-05-03 Thread Pablo Neira Ayuso
From: Liping Zhang 

First, when creating a new ct, we will invoke request_module to try to
load the related inkernel cthelper. So there's no need to call the
request_module again when updating the ct helpinfo.

Second, ctnetlink_change_helper may be called with rcu_read_lock held,
i.e. rcu_read_lock -> nfqnl_recv_verdict -> nfqnl_ct_parse ->
ctnetlink_glue_parse -> ctnetlink_glue_parse_ct ->
ctnetlink_change_helper. But the request_module invocation may sleep,
so we can't call it with the rcu_read_lock held.

Remove it now.

Signed-off-by: Liping Zhang 
Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/nf_conntrack_netlink.c | 17 +
 1 file changed, 1 insertion(+), 16 deletions(-)

diff --git a/net/netfilter/nf_conntrack_netlink.c 
b/net/netfilter/nf_conntrack_netlink.c
index dc7dfd68fafe..48c184552de0 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -1512,23 +1512,8 @@ static int ctnetlink_change_helper(struct nf_conn *ct,
 
helper = __nf_conntrack_helper_find(helpname, nf_ct_l3num(ct),
nf_ct_protonum(ct));
-   if (helper == NULL) {
-#ifdef CONFIG_MODULES
-   spin_unlock_bh(_conntrack_expect_lock);
-
-   if (request_module("nfct-helper-%s", helpname) < 0) {
-   spin_lock_bh(_conntrack_expect_lock);
-   return -EOPNOTSUPP;
-   }
-
-   spin_lock_bh(_conntrack_expect_lock);
-   helper = __nf_conntrack_helper_find(helpname, nf_ct_l3num(ct),
-   nf_ct_protonum(ct));
-   if (helper)
-   return -EAGAIN;
-#endif
+   if (helper == NULL)
return -EOPNOTSUPP;
-   }
 
if (help) {
if (help->helper == helper) {
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html