Re: [PATCH iptables] iptables: support insisting that the lock is held
On Wed, May 3, 2017 at 11:51 PM, Aaron Conolewrote: > I've thought about it. I think a better change that makes sense in the > presence of concurrent updates would be to use the wait argument as a > total time, and apply it to both the userspace lock, AND an EAGAIN from > the kernel space side. So if the kernel space says 'locked try again', > and the user passed a -w option, we can simply keep retrying until the > wait time expires. Does that make sense and solve your issue, Lorenzo? The lock is always taken regardless of whether -w is specified. The only difference is that with -w userspace waits instead of exiting with an error. I'm with Pablo on this one - the behaviour is incorrect and should be fixed. Admins shouldn't have to wait for a concurrent execution to happen before they know that they misconfigured the lock. I think that if the admin runs iptables and the lock is unusable, it's a configuration error, and it should be reported as such immediately. Also - even if we add a check for EAGAIN, how reliable is that codepath going to be? Concurrent updates are already pretty rare, and the problem we saw was pretty hard to track down. If we sprinkle checks for EAGAIN all over the code, and a future change misses one of them, how will anyone notice breakage? -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH iptables] iptables: support insisting that the lock is held
On Wed, May 3, 2017 at 11:27 PM, Aaron Conolewrote: > I wouldn't say it that way. I looked at this a while ago, and one thing > to keep in mind is the if the particular prefix path in the filesystem > (for instance /run) isn't available, then this will cause iptables to > fail. I'm not sure how many systems do provide /run - at the time it > might have been more common. That is a configuration error on the part of the distribution maintainers. The location of the iptables lock is configurable at compile time and if the distribution does not have /run, the maintainers can use another path. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH iptables] iptables: support insisting that the lock is held
On Wed, May 3, 2017 at 8:01 PM, Pablo Neira Ayusowrote: > This sounds to me like a wrong design decision was made when > introducing this userspace lock. Looks like the lock was introduced by 93587a04d0f2 ("ip[6]tables: Add locking to prevent concurrent instances") which made the lock optional for backwards compatibility: xt_socket = socket(AF_UNIX, SOCK_STREAM, 0); /* If we can't even create a socket, fall back to prior (lockless) behavior */ if (xt_socket < 0) return true; That was almost four years ago. Sometime between 1.4 and 1.6, the lock was switched to use flock() on /run/xtables.lock , but the fail-open behaviour was preserved. > I would like to skip this compile time switch, if the existing > behaviour is broken, we should just fix it. What is the scenario that > can indeed have an impact in terms of backward compatibility breakage? > Does it really make sense to keep a buggy behaviour around? I agree that the current behaviour is dangerous and error-prone. It's bitten us badly several times now. The only use case I came up with is running commands early on boot, from a read-only filesystem that does not have /run/xtables.lock. If we change the behaviour, that system might become unbootable on an iptables upgrade. Another issue mentioned in the initial socket-based approach was that no locking is necessary for iptables commands running in different network namespaces. Someone with such a system might be relying on this. The impact of this change would be to slow down concurrent iptables commands. These both seem like a very contrived cases though. I'll send out another patch that changes the behaviour. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH iptables] iptables: support insisting that the lock is held
Aaron Conolewrites: > Pablo Neira Ayuso writes: > >> On Thu, Apr 20, 2017 at 06:23:33PM +0900, Lorenzo Colitti wrote: >>> Currently, iptables programs will exit with an error if the >>> iptables lock cannot be acquired, but will silently continue if >>> the lock cannot be opened at all. >> >> This sounds to me like a wrong design decision was made when >> introducing this userspace lock. > > I wouldn't say it that way. I looked at this a while ago, and one thing > to keep in mind is the if the particular prefix path in the filesystem > (for instance /run) isn't available, then this will cause iptables to > fail. I'm not sure how many systems do provide /run - at the time it > might have been more common. Another issue is container systems. Until recently, Kubernetes didn't provide /run at all, and not all container orchestration tools will provide this filesystem. >>> This can cause unexpected failures (with unhelpful error messages) >>> in the presence of concurrent updates. >>> >>> This patch adds a compile-time option that causes iptables to >>> refuse to do anything if the lock cannot be acquired. It is a >>> compile-time option instead of a command-line flag because: >>> >>> 1. In order to reliably avoid concurrent modification, all >>>invocations of iptables commands must follow this behaviour. >>> 2. Whether or not the lock can be opened is typically not >>>a run-time condition but is likely to be a configuration >>>error. >>> >>> Tested by deleting xtables.lock and observing that all commands >>> failed if iptables was compiled with --enable-strict-locking, but >>> succeeded otherwise. >>> >>> By default, --enable-strict-locking is disabled for backwards >>> compatibility reasons. It can be enabled by default in a future >>> change if desired. >> >> I would like to skip this compile time switch, if the existing >> behaviour is broken, we should just fix it. What is the scenario that >> can indeed have an impact in terms of backward compatibility breakage? >> Does it really make sense to keep a buggy behaviour around? > > I'm not sure about a change to the behavior, but I agree that a compile > time switch is probably not the way to go. I've thought about it. I think a better change that makes sense in the presence of concurrent updates would be to use the wait argument as a total time, and apply it to both the userspace lock, AND an EAGAIN from the kernel space side. So if the kernel space says 'locked try again', and the user passed a -w option, we can simply keep retrying until the wait time expires. Does that make sense and solve your issue, Lorenzo? -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH iptables] iptables: support insisting that the lock is held
Pablo Neira Ayusowrites: > On Thu, Apr 20, 2017 at 06:23:33PM +0900, Lorenzo Colitti wrote: >> Currently, iptables programs will exit with an error if the >> iptables lock cannot be acquired, but will silently continue if >> the lock cannot be opened at all. > > This sounds to me like a wrong design decision was made when > introducing this userspace lock. I wouldn't say it that way. I looked at this a while ago, and one thing to keep in mind is the if the particular prefix path in the filesystem (for instance /run) isn't available, then this will cause iptables to fail. I'm not sure how many systems do provide /run - at the time it might have been more common. >> This can cause unexpected failures (with unhelpful error messages) >> in the presence of concurrent updates. >> >> This patch adds a compile-time option that causes iptables to >> refuse to do anything if the lock cannot be acquired. It is a >> compile-time option instead of a command-line flag because: >> >> 1. In order to reliably avoid concurrent modification, all >>invocations of iptables commands must follow this behaviour. >> 2. Whether or not the lock can be opened is typically not >>a run-time condition but is likely to be a configuration >>error. >> >> Tested by deleting xtables.lock and observing that all commands >> failed if iptables was compiled with --enable-strict-locking, but >> succeeded otherwise. >> >> By default, --enable-strict-locking is disabled for backwards >> compatibility reasons. It can be enabled by default in a future >> change if desired. > > I would like to skip this compile time switch, if the existing > behaviour is broken, we should just fix it. What is the scenario that > can indeed have an impact in terms of backward compatibility breakage? > Does it really make sense to keep a buggy behaviour around? I'm not sure about a change to the behavior, but I agree that a compile time switch is probably not the way to go. -Aaron -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 00/16] Netfilter/IPVS/OVS fixes for net
From: Pablo Neira AyusoDate: Wed, 3 May 2017 11:31:55 +0200 > The following patchset contains a rather large batch of Netfilter, IPVS > and OVS fixes for your net tree. This includes fixes for ctnetlink, the > userspace conntrack helper infrastructure, conntrack OVS support, > ebtables DNAT target, several leaks in error path among other. More > specifically, they are: ... > You can pull these changes from: > > git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git Pulled, thanks Pablo. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] netfilter: conntrack: Force inlining of build check to prevent build failure
On Wed, May 03, 2017 at 09:55:16AM -0400, David Miller wrote: > From: Geert Uytterhoeven> Date: Wed, 3 May 2017 14:18:43 +0200 > > > If gcc (e.g. 4.1.2) decides not to inline total_extension_size(), the > > build will fail with: > > > > net/built-in.o: In function `nf_conntrack_init_start': > > (.text+0x9baf6): undefined reference to `__compiletime_assert_1893' > > > > or > > > > ERROR: "__compiletime_assert_1893" [net/netfilter/nf_conntrack.ko] > > undefined! > > > > Fix this by forcing inlining of total_extension_size(). > > > > Fixes: b3a5db109e0670d6 ("netfilter: conntrack: use u8 for extension sizes > > again") > > Signed-off-by: Geert Uytterhoeven > > Pablo, I'm going to apply this directly to my tree to fix this build > failure, I hope you don't mind. Acked-by: Pablo Neira Ayuso -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] netfilter: conntrack: Force inlining of build check to prevent build failure
From: Geert UytterhoevenDate: Wed, 3 May 2017 14:18:43 +0200 > If gcc (e.g. 4.1.2) decides not to inline total_extension_size(), the > build will fail with: > > net/built-in.o: In function `nf_conntrack_init_start': > (.text+0x9baf6): undefined reference to `__compiletime_assert_1893' > > or > > ERROR: "__compiletime_assert_1893" [net/netfilter/nf_conntrack.ko] > undefined! > > Fix this by forcing inlining of total_extension_size(). > > Fixes: b3a5db109e0670d6 ("netfilter: conntrack: use u8 for extension sizes > again") > Signed-off-by: Geert Uytterhoeven Pablo, I'm going to apply this directly to my tree to fix this build failure, I hope you don't mind. Thanks. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] netfilter: conntrack: Force inlining of build check to prevent build failure
On Wed, May 3, 2017 at 2:47 PM, Geert Uytterhoevenwrote: > Hi Arnd, > > On Wed, May 3, 2017 at 2:32 PM, Arnd Bergmann wrote: >> On Wed, May 3, 2017 at 2:18 PM, Geert Uytterhoeven >> wrote: >>> If gcc (e.g. 4.1.2) decides not to inline total_extension_size(), the >>> build will fail with: >>> >>> net/built-in.o: In function `nf_conntrack_init_start': >>> (.text+0x9baf6): undefined reference to `__compiletime_assert_1893' >>> >>> or >>> >>> ERROR: "__compiletime_assert_1893" [net/netfilter/nf_conntrack.ko] >>> undefined! >>> >>> Fix this by forcing inlining of total_extension_size(). >>> >>> Fixes: b3a5db109e0670d6 ("netfilter: conntrack: use u8 for extension sizes >>> again") >>> Signed-off-by: Geert Uytterhoeven >> >> I saw this as well when I tried building with "gcc-7 -Og", and came to the >> same >> conclusion. > > Good^H^H^H^HBad to see it not only happens with ancient compilers ;-) Right now we don't see it on newer compilers (I assume gcc-4.3 or up) as we always build with either -O2 or -Os optimizations. I was playing with -Og the other day to get faster builds, but that causes many build failures and additional warnings as the result of missing out on optimizations that we have come to rely on. It might be worth getting -Og to build if the compile time is drastically faster, but we probably have to completely do away with BUILD_BUG_ON() and similar checks in that configuration, which in turn makes the build output less valuable. Arnd -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] netfilter: conntrack: Force inlining of build check to prevent build failure
Hi Arnd, On Wed, May 3, 2017 at 2:32 PM, Arnd Bergmannwrote: > On Wed, May 3, 2017 at 2:18 PM, Geert Uytterhoeven > wrote: >> If gcc (e.g. 4.1.2) decides not to inline total_extension_size(), the >> build will fail with: >> >> net/built-in.o: In function `nf_conntrack_init_start': >> (.text+0x9baf6): undefined reference to `__compiletime_assert_1893' >> >> or >> >> ERROR: "__compiletime_assert_1893" [net/netfilter/nf_conntrack.ko] >> undefined! >> >> Fix this by forcing inlining of total_extension_size(). >> >> Fixes: b3a5db109e0670d6 ("netfilter: conntrack: use u8 for extension sizes >> again") >> Signed-off-by: Geert Uytterhoeven > > I saw this as well when I tried building with "gcc-7 -Og", and came to the > same > conclusion. Good^H^H^H^HBad to see it not only happens with ancient compilers ;-) > Acked-by: Arnd Bergmann Thanks! Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] netfilter: conntrack: Force inlining of build check to prevent build failure
Geert Uytterhoevenwrote: > If gcc (e.g. 4.1.2) decides not to inline total_extension_size(), the > build will fail with: > > net/built-in.o: In function `nf_conntrack_init_start': > (.text+0x9baf6): undefined reference to `__compiletime_assert_1893' > > or > > ERROR: "__compiletime_assert_1893" [net/netfilter/nf_conntrack.ko] > undefined! > > Fix this by forcing inlining of total_extension_size(). Sorry about that, thanks for the fix Geert. Acked-by: Florian Westphal -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] netfilter: conntrack: Force inlining of build check to prevent build failure
On Wed, May 3, 2017 at 2:18 PM, Geert Uytterhoevenwrote: > If gcc (e.g. 4.1.2) decides not to inline total_extension_size(), the > build will fail with: > > net/built-in.o: In function `nf_conntrack_init_start': > (.text+0x9baf6): undefined reference to `__compiletime_assert_1893' > > or > > ERROR: "__compiletime_assert_1893" [net/netfilter/nf_conntrack.ko] > undefined! > > Fix this by forcing inlining of total_extension_size(). > > Fixes: b3a5db109e0670d6 ("netfilter: conntrack: use u8 for extension sizes > again") > Signed-off-by: Geert Uytterhoeven I saw this as well when I tried building with "gcc-7 -Og", and came to the same conclusion. Acked-by: Arnd Bergmann With -Og, there were a couple of other instances of BUILD_BUG_ON() failing to see a compile-time constant, presumably as it fails to inline any functions that are not explicitly marked inline. Arnd -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] netfilter: conntrack: Force inlining of build check to prevent build failure
If gcc (e.g. 4.1.2) decides not to inline total_extension_size(), the build will fail with: net/built-in.o: In function `nf_conntrack_init_start': (.text+0x9baf6): undefined reference to `__compiletime_assert_1893' or ERROR: "__compiletime_assert_1893" [net/netfilter/nf_conntrack.ko] undefined! Fix this by forcing inlining of total_extension_size(). Fixes: b3a5db109e0670d6 ("netfilter: conntrack: use u8 for extension sizes again") Signed-off-by: Geert Uytterhoeven--- net/netfilter/nf_conntrack_core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c index f9245dbfe4356da6..3c8f1ed2f5558fe0 100644 --- a/net/netfilter/nf_conntrack_core.c +++ b/net/netfilter/nf_conntrack_core.c @@ -1853,7 +1853,7 @@ EXPORT_SYMBOL_GPL(nf_conntrack_set_hashsize); module_param_call(hashsize, nf_conntrack_set_hashsize, param_get_uint, _conntrack_htable_size, 0600); -static unsigned int total_extension_size(void) +static __always_inline unsigned int total_extension_size(void) { /* remember to add new extensions below */ BUILD_BUG_ON(NF_CT_EXT_NUM > 9); -- 2.7.4 -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
crash on >= 4.9.0 kernel seems nf related
After running on kernel 4.9 for quite some time, we suddenly experienced hangs. Tried compiling 4.11.0, but the problem remains. The only think that changed was a couple more rules in nft, traffic is the same. 4.8.6 kernel works fine (but has other problems). Attached is the debugging output, hope it can help you track down the issue: May 3 12:46:59 fw01 kernel: [ 271.875363] [ cut here ] May 3 12:46:59 fw01 kernel: [ 271.875369] WARNING: CPU: 0 PID: 3 at net/sched/sch_generic.c:316 dev_watchdog+0x212/0x220 May 3 12:46:59 fw01 kernel: [ 271.875370] NETDEV WATCHDOG: eth1 (igb): transmit queue 1 timed out May 3 12:46:59 fw01 kernel: [ 271.875370] Modules linked in: May 3 12:46:59 fw01 kernel: [ 271.875373] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.11.0 #1 May 3 12:46:59 fw01 kernel: [ 271.875374] Hardware name: Dell Inc. PowerEdge R210/05KX61, BIOS 1.3.4 05/24/2010 May 3 12:46:59 fw01 kernel: [ 271.875377] Workqueue: events_long gc_worker May 3 12:46:59 fw01 kernel: [ 271.875378] Call Trace: May 3 12:46:59 fw01 kernel: [ 271.875380] May 3 12:46:59 fw01 kernel: [ 271.875383] dump_stack+0x4d/0x6e May 3 12:46:59 fw01 kernel: [ 271.875386] __warn+0xcc/0xf0 May 3 12:46:59 fw01 kernel: [ 271.875387] warn_slowpath_fmt+0x4a/0x50 May 3 12:46:59 fw01 kernel: [ 271.875390] ? tick_program_event+0x3f/0x70 May 3 12:46:59 fw01 kernel: [ 271.875391] dev_watchdog+0x212/0x220 May 3 12:46:59 fw01 kernel: [ 271.875392] ? dev_deactivate_queue.constprop.31+0x60/0x60 May 3 12:46:59 fw01 kernel: [ 271.875395] call_timer_fn+0x30/0x140 May 3 12:46:59 fw01 kernel: [ 271.875397] run_timer_softirq+0x1be/0x3f0 May 3 12:46:59 fw01 kernel: [ 271.875399] ? handle_irq_event_percpu+0x40/0x50 May 3 12:46:59 fw01 kernel: [ 271.875401] __do_softirq+0xbe/0x280 May 3 12:46:59 fw01 kernel: [ 271.875404] do_softirq_own_stack+0x1c/0x30 May 3 12:46:59 fw01 kernel: [ 271.875404] May 3 12:46:59 fw01 kernel: [ 271.875406] do_softirq+0x42/0x50 May 3 12:46:59 fw01 kernel: [ 271.875407] __local_bh_enable_ip+0x75/0x80 May 3 12:46:59 fw01 kernel: [ 271.875410] _raw_spin_unlock_bh+0x15/0x20 May 3 12:46:59 fw01 kernel: [ 271.875412] nf_nat_cleanup_conntrack+0x1bd/0x230 May 3 12:46:59 fw01 kernel: [ 271.875413] ? nf_nat_l3proto_register+0x70/0x70 May 3 12:46:59 fw01 kernel: [ 271.875415] __nf_ct_ext_destroy+0x38/0x50 May 3 12:46:59 fw01 kernel: [ 271.875417] nf_conntrack_free+0x17/0x50 May 3 12:46:59 fw01 kernel: [ 271.875418] destroy_conntrack+0x74/0x90 May 3 12:46:59 fw01 kernel: [ 271.875421] nf_conntrack_destroy+0x12/0x20 May 3 12:46:59 fw01 kernel: [ 271.875422] nf_ct_gc_expired+0x45/0x90 May 3 12:46:59 fw01 kernel: [ 271.875423] gc_worker+0xb3/0x180 May 3 12:46:59 fw01 kernel: [ 271.875425] process_one_work+0x143/0x3e0 May 3 12:46:59 fw01 kernel: [ 271.875426] worker_thread+0x126/0x480 May 3 12:46:59 fw01 kernel: [ 271.875429] kthread+0x104/0x140 May 3 12:46:59 fw01 kernel: [ 271.875430] ? process_one_work+0x3e0/0x3e0 May 3 12:46:59 fw01 kernel: [ 271.875431] ? kthread_park+0x90/0x90 May 3 12:46:59 fw01 kernel: [ 271.875432] ret_from_fork+0x29/0x40 May 3 12:46:59 fw01 kernel: [ 271.875434] ---[ end trace 3da51f3ef83370a7 ]--- May 3 12:47:03 fw01 kernel: [ 275.629072] igb :01:00.0 eth0: igb: eth0 NIC Link is Down May 3 12:47:03 fw01 kernel: [ 275.629135] igb :01:00.1 eth1: Reset adapter May 3 12:47:03 fw01 kernel: [ 275.629316] igb :01:00.0 eth0: Reset adapter May 3 12:47:12 fw01 kernel: [ 285.159798] igb :01:00.1 eth1: igb: eth1 NIC Link is Up 1000 Mbps Half Duplex, Flow Control: RX/TX May 3 12:47:12 fw01 kernel: [ 285.159801] igb :01:00.1: EEE Disabled: unsupported at half duplex. Re-enable using ethtool when at full duplex. May 3 12:47:15 fw01 kernel: [ 287.842412] igb :01:00.1: exceed max 2 second May 3 12:48:21 fw01 kernel: [ 353.875946] igb :01:00.0 eth0: Reset adapter -- Bj(/)rnar -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: .nf_ct_iterate_cleanup panic
dalin liuwrote: > kernel version: 4.4.56 > > In my linux box,when some PPPoE link are disconnected, > nf_ct_iterate_cleanup will be called.nf_ct_iterate_cleanup may have > access to the wrong conntrack address: Is this a new bug? If so, what kernel version did not have this problem? -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH iptables] iptables: support insisting that the lock is held
On Thu, Apr 20, 2017 at 06:23:33PM +0900, Lorenzo Colitti wrote: > Currently, iptables programs will exit with an error if the > iptables lock cannot be acquired, but will silently continue if > the lock cannot be opened at all. This sounds to me like a wrong design decision was made when introducing this userspace lock. > This can cause unexpected failures (with unhelpful error messages) > in the presence of concurrent updates. > > This patch adds a compile-time option that causes iptables to > refuse to do anything if the lock cannot be acquired. It is a > compile-time option instead of a command-line flag because: > > 1. In order to reliably avoid concurrent modification, all >invocations of iptables commands must follow this behaviour. > 2. Whether or not the lock can be opened is typically not >a run-time condition but is likely to be a configuration >error. > > Tested by deleting xtables.lock and observing that all commands > failed if iptables was compiled with --enable-strict-locking, but > succeeded otherwise. > > By default, --enable-strict-locking is disabled for backwards > compatibility reasons. It can be enabled by default in a future > change if desired. I would like to skip this compile time switch, if the existing behaviour is broken, we should just fix it. What is the scenario that can indeed have an impact in terms of backward compatibility breakage? Does it really make sense to keep a buggy behaviour around? -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
.nf_ct_iterate_cleanup panic
kernel version: 4.4.56 In my linux box,when some PPPoE link are disconnected, nf_ct_iterate_cleanup will be called.nf_ct_iterate_cleanup may have access to the wrong conntrack address: <1>[927268.772583] BUG: unable to handle kernel paging request at 88a0050402c7 <1>[927268.785578] IP: [] nf_ct_iterate_cleanup+0xa2/0x230 <4>[927268.792969] PGD 0 <4>[927268.800512] Oops: [#1] SMP <4>[927268.808463] Modules linked in: ipmi_watchdog iptable_raw xt_CT nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_ftp nf_conntrack_sip nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_ftp ip_set_hash_netiface ip_set_hash_net ip_set_hash_ip xt_set ip_set ixgbe(O) vxlan igb(O) e1000e(O) <4>[927268.862764] CPU: 46 PID: 6186 Comm: pppd Tainted: G O 4.4.56 #58 <4>[927268.882176] Hardware name: Supermicro Super Server/X10DRL-i, BIOS 2.0a 08/25/2016 <4>[927268.903142] task: 880472611c40 ti: 880223ed8000 task.ti: 880223ed8000 <4>[927268.925685] RIP: 0010:[] [] nf_ct_iterate_cleanup+0xa2/0x230 <4>[927268.949862] RSP: 0018:880223edbc28 EFLAGS: 00010246 <4>[927268.962330] RAX: c90001eb1000 RBX: 8193a8d0 RCX: 8801f4a668a8 <4>[927268.988124] RDX: 0001 RSI: 0200 RDI: 82007af4 <4>[927269.015332] RBP: 880223edbc78 R08: 88047fc43f08 R09: 0101 <4>[927269.043272] R10: R11: R12: 6c89 <4>[927269.072536] R13: 820ac780 R14: 88a005040290 R15: 8801f56f2a80 <4>[927269.102576] FS: 7f1d30a3d720() GS:88047fc4() knlGS: <4>[927269.132067] CS: 0010 DS: ES: CR0: 80050033 <4>[927269.147159] CR2: 88a0050402c7 CR3: 00046e37e000 CR4: 001406e0 <4>[927269.177840] Stack: <4>[927269.192958] 880223edbc38 013d 0013593d <4>[927269.223469] 880223edbc88 0002 880223edbd20 fff0 <4>[927269.253778] 820b7050 880223edbc88 8193a940 <4>[927269.283402] Call Trace: <4>[927269.297585] [] masq_device_event+0x30/0x40 <4>[927269.311561] [] notifier_call_chain+0x4b/0x70 <4>[927269.325391] [] raw_notifier_call_chain+0x11/0x20 <4>[927269.338999] [] call_netdevice_notifiers_info+0x3b/0x70 <4>[927269.352544] [] call_netdevice_notifiers+0x11/0x20 <4>[927269.365945] [] __dev_notify_flags+0x5f/0xb0 <4>[927269.379318] [] dev_change_flags+0x54/0x70 <4>[927269.392522] [] devinet_ioctl+0x5a1/0x670 <4>[927269.405462] [] inet_ioctl+0x66/0x80 <4>[927269.418095] [] sock_ioctl+0x66/0x260 <4>[927269.430430] [] do_vfs_ioctl+0x81/0x4d0 <4>[927269.442515] [] SyS_ioctl+0x47/0x80 <4>[927269.454330] [] entry_SYSCALL_64_fastpath+0x12/0x6a <4>[927269.465924] Code: 00 00 0f 83 54 01 00 00 49 8b 85 80 0c 00 00 4e 8b 34 f0 41 f6 c6 01 74 12 e9 3e 01 00 00 4d 8b 36 41 f6 c6 01 0f 85 31 01 00 00 <41> 80 7e 37 00 75 ec 4d 8d 7e f0 4c 89 e6 4c 89 ff ff d3 85 c0 <1>[927269.506749] RIP [] nf_ct_iterate_cleanup+0xa2/0x230 <4>[927269.519054] RSP <4>[927269.531071] CR2: 88a0050402c7 <4>[927269.552110] ---[ end trace 01e3243232603b17 ]--- <0>[927270.934341] Kernel panic - not syncing: Fatal exception in interrupt 103c: 0f 85 31 01 00 00 jne1173if (NF_CT_DIRECTION(h) != IP_CT_DIR_ORIGINAL) 1042: 41 80 7e 37 00 cmpb $0x0,0x37(%r14) 1047: 75 ec jne1035 -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 15/16] netfilter: update MAINTAINERS file
Several updates on the MAINTAINERS section for Netfilter: 1) Add Florian Westphal, he's been part of the coreteam since October 2012. He's been dedicating tireless efforts to improve the Netfilter codebase, fix bugs and push ongoing new developments ever since. 2) Add http://www.nftables.org/ URL, currently pointing to http://www.netfilter.org. 3) Update project status from Supported to Maintained. Signed-off-by: Pablo Neira AyusoAcked-by: Florian Westphal --- MAINTAINERS | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index 38d3e4ed7208..fc95cb06fb29 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -8701,14 +8701,16 @@ F: drivers/net/ethernet/neterion/ NETFILTER M: Pablo Neira Ayuso M: Jozsef Kadlecsik +M: Florian Westphal L: netfilter-devel@vger.kernel.org L: coret...@netfilter.org W: http://www.netfilter.org/ W: http://www.iptables.org/ +W: http://www.nftables.org/ Q: http://patchwork.ozlabs.org/project/netfilter-devel/list/ T: git git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git T: git git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next.git -S: Supported +S: Maintained F: include/linux/netfilter* F: include/linux/netfilter/ F: include/net/netfilter/ -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 13/16] ipvs: explicitly forbid ipv6 service/dest creation if ipv6 mod is disabled
From: Paolo AbeniWhen creating a new ipvs service, ipv6 addresses are always accepted if CONFIG_IP_VS_IPV6 is enabled. On dest creation the address family is not explicitly checked. This allows the user-space to configure ipvs services even if the system is booted with ipv6.disable=1. On specific configuration, ipvs can try to call ipv6 routing code at setup time, causing the kernel to oops due to fib6_rules_ops being NULL. This change addresses the issue adding a check for the ipv6 module being enabled while validating ipv6 service operations and adding the same validation for dest operations. According to git history, this issue is apparently present since the introduction of ipv6 support, and the oops can be triggered since commit 09571c7ae30865ad ("IPVS: Add function to determine if IPv6 address is local") Fixes: 09571c7ae30865ad ("IPVS: Add function to determine if IPv6 address is local") Signed-off-by: Paolo Abeni Acked-by: Julian Anastasov Signed-off-by: Simon Horman --- net/netfilter/ipvs/ip_vs_ctl.c | 22 +- 1 file changed, 17 insertions(+), 5 deletions(-) diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c index 5aeb0dde6ccc..4d753beaac32 100644 --- a/net/netfilter/ipvs/ip_vs_ctl.c +++ b/net/netfilter/ipvs/ip_vs_ctl.c @@ -3078,6 +3078,17 @@ static int ip_vs_genl_dump_services(struct sk_buff *skb, return skb->len; } +static bool ip_vs_is_af_valid(int af) +{ + if (af == AF_INET) + return true; +#ifdef CONFIG_IP_VS_IPV6 + if (af == AF_INET6 && ipv6_mod_enabled()) + return true; +#endif + return false; +} + static int ip_vs_genl_parse_service(struct netns_ipvs *ipvs, struct ip_vs_service_user_kern *usvc, struct nlattr *nla, int full_entry, @@ -3104,11 +3115,7 @@ static int ip_vs_genl_parse_service(struct netns_ipvs *ipvs, memset(usvc, 0, sizeof(*usvc)); usvc->af = nla_get_u16(nla_af); -#ifdef CONFIG_IP_VS_IPV6 - if (usvc->af != AF_INET && usvc->af != AF_INET6) -#else - if (usvc->af != AF_INET) -#endif + if (!ip_vs_is_af_valid(usvc->af)) return -EAFNOSUPPORT; if (nla_fwmark) { @@ -3610,6 +3617,11 @@ static int ip_vs_genl_set_cmd(struct sk_buff *skb, struct genl_info *info) if (udest.af == 0) udest.af = svc->af; + if (!ip_vs_is_af_valid(udest.af)) { + ret = -EAFNOSUPPORT; + goto out; + } + if (udest.af != svc->af && cmd != IPVS_CMD_DEL_DEST) { /* The synchronization protocol is incompatible * with mixed family services -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 09/16] netfilter: xt_socket: Fix broken IPv6 handling
From: Peter TirsekCommit 834184b1f3a4 ("netfilter: defrag: only register defrag functionality if needed") used the outdated XT_SOCKET_HAVE_IPV6 macro which was removed earlier in commit 8db4c5be88f6 ("netfilter: move socket lookup infrastructure to nf_socket_ipv{4,6}.c"). With that macro never being defined, the xt_socket match emits an "Unknown family 10" warning when used with IPv6: WARNING: CPU: 0 PID: 1377 at net/netfilter/xt_socket.c:160 socket_mt_enable_defrag+0x47/0x50 [xt_socket] Unknown family 10 Modules linked in: xt_socket nf_socket_ipv4 nf_socket_ipv6 nf_defrag_ipv4 [...] CPU: 0 PID: 1377 Comm: ip6tables-resto Not tainted 4.10.10 #1 Hardware name: [...] Call Trace: ? __warn+0xe7/0x100 ? socket_mt_enable_defrag+0x47/0x50 [xt_socket] ? socket_mt_enable_defrag+0x47/0x50 [xt_socket] ? warn_slowpath_fmt+0x39/0x40 ? socket_mt_enable_defrag+0x47/0x50 [xt_socket] ? socket_mt_v2_check+0x12/0x40 [xt_socket] ? xt_check_match+0x6b/0x1a0 [x_tables] ? xt_find_match+0x93/0xd0 [x_tables] ? xt_request_find_match+0x20/0x80 [x_tables] ? translate_table+0x48e/0x870 [ip6_tables] ? translate_table+0x577/0x870 [ip6_tables] ? walk_component+0x3a/0x200 ? kmalloc_order+0x1d/0x50 ? do_ip6t_set_ctl+0x181/0x490 [ip6_tables] ? filename_lookup+0xa5/0x120 ? nf_setsockopt+0x3a/0x60 ? ipv6_setsockopt+0xb0/0xc0 ? sock_common_setsockopt+0x23/0x30 ? SyS_socketcall+0x41d/0x630 ? vfs_read+0xfa/0x120 ? do_fast_syscall_32+0x7a/0x110 ? entry_SYSENTER_32+0x47/0x71 This patch brings the conditional back in line with how the rest of the file handles IPv6. Fixes: 834184b1f3a4 ("netfilter: defrag: only register defrag functionality if needed") Signed-off-by: Peter Tirsek Acked-by: Florian Westphal Signed-off-by: Pablo Neira Ayuso --- net/netfilter/xt_socket.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/netfilter/xt_socket.c b/net/netfilter/xt_socket.c index 770bbec878f1..e75ef39669c5 100644 --- a/net/netfilter/xt_socket.c +++ b/net/netfilter/xt_socket.c @@ -152,7 +152,7 @@ static int socket_mt_enable_defrag(struct net *net, int family) switch (family) { case NFPROTO_IPV4: return nf_defrag_ipv4_enable(net); -#ifdef XT_SOCKET_HAVE_IPV6 +#if IS_ENABLED(CONFIG_IP6_NF_IPTABLES) case NFPROTO_IPV6: return nf_defrag_ipv6_enable(net); #endif -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 03/16] netfilter: nf_ct_helper: permit cthelpers with different names via nfnetlink
From: Liping Zhangcthelpers added via nfnetlink may have the same tuple, i.e. except for the l3proto and l4proto, other fields are all zero. So even with the different names, we will also fail to add them: # nfct helper add ssdp inet udp # nfct helper add tftp inet udp nfct v1.4.3: netlink error: File exists So in order to avoid unpredictable behaviour, we should: 1. cthelpers can be selected by nft ct helper obj or xt_CT target, so report error if duplicated { name, l3proto, l4proto } tuple exist. 2. cthelpers can be selected by nf_ct_tuple_src_mask_cmp when nf_ct_auto_assign_helper is enabled, so also report error if duplicated { l3proto, l4proto, src-port } tuple exist. Also note, if the cthelper is added from userspace, then the src-port will always be zero, it's invalid for nf_ct_auto_assign_helper, so there's no need to check the second point listed above. Fixes: 893e093c786c ("netfilter: nf_ct_helper: bail out on duplicated helpers") Signed-off-by: Liping Zhang Signed-off-by: Pablo Neira Ayuso --- net/netfilter/nf_conntrack_helper.c | 26 +- 1 file changed, 21 insertions(+), 5 deletions(-) diff --git a/net/netfilter/nf_conntrack_helper.c b/net/netfilter/nf_conntrack_helper.c index 4eeb3418366a..99bcd44aac70 100644 --- a/net/netfilter/nf_conntrack_helper.c +++ b/net/netfilter/nf_conntrack_helper.c @@ -386,17 +386,33 @@ int nf_conntrack_helper_register(struct nf_conntrack_helper *me) struct nf_conntrack_tuple_mask mask = { .src.u.all = htons(0x) }; unsigned int h = helper_hash(>tuple); struct nf_conntrack_helper *cur; - int ret = 0; + int ret = 0, i; BUG_ON(me->expect_policy == NULL); BUG_ON(me->expect_class_max >= NF_CT_MAX_EXPECT_CLASSES); BUG_ON(strlen(me->name) > NF_CT_HELPER_NAME_LEN - 1); mutex_lock(_ct_helper_mutex); - hlist_for_each_entry(cur, _ct_helper_hash[h], hnode) { - if (nf_ct_tuple_src_mask_cmp(>tuple, >tuple, )) { - ret = -EEXIST; - goto out; + for (i = 0; i < nf_ct_helper_hsize; i++) { + hlist_for_each_entry(cur, _ct_helper_hash[i], hnode) { + if (!strcmp(cur->name, me->name) && + (cur->tuple.src.l3num == NFPROTO_UNSPEC || +cur->tuple.src.l3num == me->tuple.src.l3num) && + cur->tuple.dst.protonum == me->tuple.dst.protonum) { + ret = -EEXIST; + goto out; + } + } + } + + /* avoid unpredictable behaviour for auto_assign_helper */ + if (!(me->flags & NF_CT_HELPER_F_USERSPACE)) { + hlist_for_each_entry(cur, _ct_helper_hash[h], hnode) { + if (nf_ct_tuple_src_mask_cmp(>tuple, >tuple, +)) { + ret = -EEXIST; + goto out; + } } } hlist_add_head_rcu(>hnode, _ct_helper_hash[h]); -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 14/16] netfilter: x_tables: unlock on error in xt_find_table_lock()
From: Dan CarpenterAccording to my static checker we should unlock here before the return. That seems reasonable to me as well. Fixes" b9e69e127397 ("netfilter: xtables: don't hook tables by default") Signed-off-by: Dan Carpenter Acked-by: Florian Westphal Signed-off-by: Pablo Neira Ayuso --- net/netfilter/x_tables.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c index 14857afc9937..f134d384852f 100644 --- a/net/netfilter/x_tables.c +++ b/net/netfilter/x_tables.c @@ -1051,8 +1051,10 @@ struct xt_table *xt_find_table_lock(struct net *net, u_int8_t af, list_for_each_entry(t, _net.xt.tables[af], list) { if (strcmp(t->name, name)) continue; - if (!try_module_get(t->me)) + if (!try_module_get(t->me)) { + mutex_unlock([af].mutex); return NULL; + } mutex_unlock([af].mutex); if (t->table_init(net) != 0) { -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 04/16] netfilter: nft_set_bitmap: free dummy elements when destroy the set
From: Liping ZhangWe forget to free dummy elements when deleting the set. So when I was running nft-test.py, I saw many kmemleak warnings: kmemleak: 1344 new suspected memory leaks ... # cat /sys/kernel/debug/kmemleak unreferenced object 0x8800631345c8 (size 32): comm "nft", pid 9075, jiffies 4295743309 (age 1354.815s) hex dump (first 32 bytes): f8 63 13 63 00 88 ff ff 88 79 13 63 00 88 ff ff .c.c.y.c 04 0c 00 00 00 00 00 00 00 00 00 00 08 03 00 00 backtrace: [] kmemleak_alloc+0x4a/0xa0 [] __kmalloc+0x164/0x310 [] nft_set_elem_init+0x3d/0x1b0 [nf_tables] [] nft_add_set_elem+0x45a/0x8c0 [nf_tables] [] nf_tables_newsetelem+0x105/0x1d0 [nf_tables] [] nfnetlink_rcv+0x414/0x770 [nfnetlink] [] netlink_unicast+0x1f6/0x310 [] netlink_sendmsg+0x306/0x3b0 ... Fixes: e920dde516088 ("netfilter: nft_set_bitmap: keep a list of dummy elements") Signed-off-by: Liping Zhang Signed-off-by: Pablo Neira Ayuso --- net/netfilter/nft_set_bitmap.c | 5 + 1 file changed, 5 insertions(+) diff --git a/net/netfilter/nft_set_bitmap.c b/net/netfilter/nft_set_bitmap.c index 8ebbc2940f4c..b988162b5b15 100644 --- a/net/netfilter/nft_set_bitmap.c +++ b/net/netfilter/nft_set_bitmap.c @@ -257,6 +257,11 @@ static int nft_bitmap_init(const struct nft_set *set, static void nft_bitmap_destroy(const struct nft_set *set) { + struct nft_bitmap *priv = nft_set_priv(set); + struct nft_bitmap_elem *be, *n; + + list_for_each_entry_safe(be, n, >list, head) + nft_set_elem_destroy(set, be, true); } static bool nft_bitmap_estimate(const struct nft_set_desc *desc, u32 features, -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 00/16] Netfilter/IPVS/OVS fixes for net
Hi David, The following patchset contains a rather large batch of Netfilter, IPVS and OVS fixes for your net tree. This includes fixes for ctnetlink, the userspace conntrack helper infrastructure, conntrack OVS support, ebtables DNAT target, several leaks in error path among other. More specifically, they are: 1) Fix reference count leak in the CT target error path, from Gao Feng. 2) Remove conntrack entry clashing with a matching expectation, patch from Jarno Rajahalme. 3) Fix bogus EEXIST when registering two different userspace helpers, from Liping Zhang. 4) Don't leak dummy elements in the new bitmap set type in nf_tables, from Liping Zhang. 5) Get rid of module autoload from conntrack update path in ctnetlink, we don't need autoload at this late stage and it is happening with rcu read lock held which is not good. From Liping Zhang. 6) Fix deadlock due to double-acquire of the expect_lock from conntrack update path, this fixes a bug that was introduced when the central spinlock got removed. Again from Liping Zhang. 7) Safe ct->status update from ctnetlink path, from Liping. The expect_lock protection that was selected when the central spinlock was removed was not really protecting anything at all. 8) Protect sequence adjustment under ct->lock. 9) Missing socket match with IPv6, from Peter Tirsek. 10) Adjust skb->pkt_type of DNAT'ed frames from ebtables, from Linus Luessing. 11) Don't give up on evaluating the expression on new entries added via dynset expression in nf_tables, from Liping Zhang. 12) Use skb_checksum() when mangling icmpv6 in IPv6 NAT as this deals with non-linear skbuffs. 13) Don't allow IPv6 service in IPVS if no IPv6 support is available, from Paolo Abeni. 14) Missing mutex release in error path of xt_find_table_lock(), from Dan Carpenter. 15) Update maintainers files, Netfilter section. Add Florian to the file, refer to nftables.org and change project status from Supported to Maintained. 16) Bail out on mismatching extensions in element updates in nf_tables. You can pull these changes from: git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git Thanks! The following changes since commit 94836ecf1e7378b64d37624fbb81fe48fbd4c772: Merge tag 'nfsd-4.11-2' of git://linux-nfs.org/~bfields/linux (2017-04-21 16:37:48 -0700) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git HEAD for you to fetch changes up to 9744a6fcefcb4d56501d69adb04c24559d353cad: netfilter: nf_tables: check if same extensions are set when adding elements (2017-05-03 10:58:00 +0200) Dan Carpenter (1): netfilter: x_tables: unlock on error in xt_find_table_lock() Dave Johnson (1): netfilter: Wrong icmp6 checksum for ICMPV6_TIME_EXCEED in reverse SNATv6 path Gao Feng (1): netfilter: xt_CT: fix refcnt leak on error path Jarno Rajahalme (1): openvswitch: Delete conntrack entry clashing with an expectation. Linus Lüssing (1): bridge: ebtables: fix reception of frames DNAT-ed to bridge device/port Liping Zhang (7): netfilter: nf_ct_helper: permit cthelpers with different names via nfnetlink netfilter: nft_set_bitmap: free dummy elements when destroy the set netfilter: ctnetlink: drop the incorrect cthelper module request netfilter: ctnetlink: fix deadlock due to acquire _expect_lock twice netfilter: ctnetlink: make it safer when updating ct->status netfilter: ctnetlink: acquire ct->lock before operating nf_ct_seqadj netfilter: nft_dynset: continue to next expr if _OP_ADD succeeded Pablo Neira Ayuso (3): Merge tag 'ipvs-fixes-for-v4.11' of http://git.kernel.org/.../horms/ipvs netfilter: update MAINTAINERS file netfilter: nf_tables: check if same extensions are set when adding elements Paolo Abeni (1): ipvs: explicitly forbid ipv6 service/dest creation if ipv6 mod is disabled Peter Tirsek (1): netfilter: xt_socket: Fix broken IPv6 handling MAINTAINERS| 4 +- include/uapi/linux/netfilter/nf_conntrack_common.h | 13 +++- net/bridge/netfilter/ebt_dnat.c| 20 + net/ipv6/netfilter/nf_nat_l3proto_ipv6.c | 2 +- net/netfilter/ipvs/ip_vs_ctl.c | 22 -- net/netfilter/nf_conntrack_helper.c| 26 +-- net/netfilter/nf_conntrack_netlink.c | 89 -- net/netfilter/nf_tables_api.c | 5 ++ net/netfilter/nft_dynset.c | 5 +- net/netfilter/nft_set_bitmap.c | 5 ++ net/netfilter/x_tables.c | 4 +- net/netfilter/xt_CT.c | 11 ++- net/netfilter/xt_socket.c | 2 +-
[PATCH 16/16] netfilter: nf_tables: check if same extensions are set when adding elements
If no NLM_F_EXCL is set and the element already exists in the set, make sure that both elements have the same extensions. Signed-off-by: Pablo Neira Ayuso--- net/netfilter/nf_tables_api.c | 5 + 1 file changed, 5 insertions(+) diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index 434c739dfeca..11a96e8dd3cd 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -3749,6 +3749,11 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set, err = set->ops->insert(ctx->net, set, , ); if (err) { if (err == -EEXIST) { + if (nft_set_ext_exists(ext, NFT_SET_EXT_DATA) ^ + nft_set_ext_exists(ext2, NFT_SET_EXT_DATA) || + nft_set_ext_exists(ext, NFT_SET_EXT_OBJREF) ^ + nft_set_ext_exists(ext2, NFT_SET_EXT_OBJREF)) + return -EBUSY; if ((nft_set_ext_exists(ext, NFT_SET_EXT_DATA) && nft_set_ext_exists(ext2, NFT_SET_EXT_DATA) && memcmp(nft_set_ext_data(ext), -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 07/16] netfilter: ctnetlink: make it safer when updating ct->status
From: Liping ZhangAfter converting to use rcu for conntrack hash, one CPU may update the ct->status via ctnetlink, while another CPU may process the packets and update the ct->status. So the non-atomic operation "ct->status |= status;" via ctnetlink becomes unsafe, and this may clear the IPS_DYING_BIT bit set by another CPU unexpectedly. For example: CPU0CPU1 ctnetlink_change_status__nf_conntrack_find_get old = ct->status nf_ct_gc_expired - nf_ct_kill - test_and_set_bit(IPS_DYING_BIT new = old | status; - ct->status = new; <-- oops, _DYING_ is cleared! Now using a series of atomic bit operation to solve the above issue. Also note, user shouldn't set IPS_TEMPLATE, IPS_SEQ_ADJUST directly, so make these two bits be unchangable too. If we set the IPS_TEMPLATE_BIT, ct will be freed by nf_ct_tmpl_free, but actually it is alloced by nf_conntrack_alloc. If we set the IPS_SEQ_ADJUST_BIT, this may cause the NULL pointer deference, as the nfct_seqadj(ct) maybe NULL. Last, add some comments to describe the logic change due to the commit a963d710f367 ("netfilter: ctnetlink: Fix regression in CTA_STATUS processing"), which makes me feel a little confusing. Fixes: 76507f69c44e ("[NETFILTER]: nf_conntrack: use RCU for conntrack hash") Signed-off-by: Liping Zhang Signed-off-by: Pablo Neira Ayuso --- include/uapi/linux/netfilter/nf_conntrack_common.h | 13 ++--- net/netfilter/nf_conntrack_netlink.c | 33 -- 2 files changed, 33 insertions(+), 13 deletions(-) diff --git a/include/uapi/linux/netfilter/nf_conntrack_common.h b/include/uapi/linux/netfilter/nf_conntrack_common.h index 6a8e33dd4ecb..38fc383139f0 100644 --- a/include/uapi/linux/netfilter/nf_conntrack_common.h +++ b/include/uapi/linux/netfilter/nf_conntrack_common.h @@ -82,10 +82,6 @@ enum ip_conntrack_status { IPS_DYING_BIT = 9, IPS_DYING = (1 << IPS_DYING_BIT), - /* Bits that cannot be altered from userland. */ - IPS_UNCHANGEABLE_MASK = (IPS_NAT_DONE_MASK | IPS_NAT_MASK | -IPS_EXPECTED | IPS_CONFIRMED | IPS_DYING), - /* Connection has fixed timeout. */ IPS_FIXED_TIMEOUT_BIT = 10, IPS_FIXED_TIMEOUT = (1 << IPS_FIXED_TIMEOUT_BIT), @@ -101,6 +97,15 @@ enum ip_conntrack_status { /* Conntrack got a helper explicitly attached via CT target. */ IPS_HELPER_BIT = 13, IPS_HELPER = (1 << IPS_HELPER_BIT), + + /* Be careful here, modifying these bits can make things messy, +* so don't let users modify them directly. +*/ + IPS_UNCHANGEABLE_MASK = (IPS_NAT_DONE_MASK | IPS_NAT_MASK | +IPS_EXPECTED | IPS_CONFIRMED | IPS_DYING | +IPS_SEQ_ADJUST | IPS_TEMPLATE), + + __IPS_MAX_BIT = 14, }; /* Connection tracking event types */ diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c index e5f9b1f4..86deed6a8db4 100644 --- a/net/netfilter/nf_conntrack_netlink.c +++ b/net/netfilter/nf_conntrack_netlink.c @@ -1419,6 +1419,24 @@ ctnetlink_parse_nat_setup(struct nf_conn *ct, } #endif +static void +__ctnetlink_change_status(struct nf_conn *ct, unsigned long on, + unsigned long off) +{ + unsigned int bit; + + /* Ignore these unchangable bits */ + on &= ~IPS_UNCHANGEABLE_MASK; + off &= ~IPS_UNCHANGEABLE_MASK; + + for (bit = 0; bit < __IPS_MAX_BIT; bit++) { + if (on & (1 << bit)) + set_bit(bit, >status); + else if (off & (1 << bit)) + clear_bit(bit, >status); + } +} + static int ctnetlink_change_status(struct nf_conn *ct, const struct nlattr * const cda[]) { @@ -1438,10 +1456,7 @@ ctnetlink_change_status(struct nf_conn *ct, const struct nlattr * const cda[]) /* ASSURED bit can only be set */ return -EBUSY; - /* Be careful here, modifying NAT bits can screw up things, -* so don't let users modify them directly if they don't pass -* nf_nat_range. */ - ct->status |= status & ~(IPS_NAT_DONE_MASK | IPS_NAT_MASK); + __ctnetlink_change_status(ct, status, 0); return 0; } @@ -1628,7 +1643,7 @@ ctnetlink_change_seq_adj(struct nf_conn *ct, if (ret < 0) return ret; - ct->status |= IPS_SEQ_ADJUST; + set_bit(IPS_SEQ_ADJUST_BIT, >status); } if (cda[CTA_SEQ_ADJ_REPLY]) { @@ -1637,7 +1652,7 @@ ctnetlink_change_seq_adj(struct nf_conn *ct, if (ret < 0) return ret; - ct->status |= IPS_SEQ_ADJUST; + set_bit(IPS_SEQ_ADJUST_BIT,
[PATCH 02/16] openvswitch: Delete conntrack entry clashing with an expectation.
From: Jarno RajahalmeConntrack helpers do not check for a potentially clashing conntrack entry when creating a new expectation. Also, nf_conntrack_in() will check expectations (via init_conntrack()) only if a conntrack entry can not be found. The expectation for a packet which also matches an existing conntrack entry will not be removed by conntrack, and is currently handled inconsistently by OVS, as OVS expects the expectation to be removed when the connection tracking entry matching that expectation is confirmed. It should be noted that normally an IP stack would not allow reuse of a 5-tuple of an old (possibly lingering) connection for a new data connection, so this is somewhat unlikely corner case. However, it is possible that a misbehaving source could cause conntrack entries be created that could then interfere with new related connections. Fix this in the OVS module by deleting the clashing conntrack entry after an expectation has been matched. This causes the following nf_conntrack_in() call also find the expectation and remove it when creating the new conntrack entry, as well as the forthcoming reply direction packets to match the new related connection instead of the old clashing conntrack entry. Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action") Reported-by: Yang Song Signed-off-by: Jarno Rajahalme Acked-by: Joe Stringer Signed-off-by: Pablo Neira Ayuso --- net/openvswitch/conntrack.c | 30 +- 1 file changed, 29 insertions(+), 1 deletion(-) diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c index 7b2c2fce408a..c92a5795dcda 100644 --- a/net/openvswitch/conntrack.c +++ b/net/openvswitch/conntrack.c @@ -514,10 +514,38 @@ ovs_ct_expect_find(struct net *net, const struct nf_conntrack_zone *zone, u16 proto, const struct sk_buff *skb) { struct nf_conntrack_tuple tuple; + struct nf_conntrack_expect *exp; if (!nf_ct_get_tuplepr(skb, skb_network_offset(skb), proto, net, )) return NULL; - return __nf_ct_expect_find(net, zone, ); + + exp = __nf_ct_expect_find(net, zone, ); + if (exp) { + struct nf_conntrack_tuple_hash *h; + + /* Delete existing conntrack entry, if it clashes with the +* expectation. This can happen since conntrack ALGs do not +* check for clashes between (new) expectations and existing +* conntrack entries. nf_conntrack_in() will check the +* expectations only if a conntrack entry can not be found, +* which can lead to OVS finding the expectation (here) in the +* init direction, but which will not be removed by the +* nf_conntrack_in() call, if a matching conntrack entry is +* found instead. In this case all init direction packets +* would be reported as new related packets, while reply +* direction packets would be reported as un-related +* established packets. +*/ + h = nf_conntrack_find_get(net, zone, ); + if (h) { + struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(h); + + nf_ct_delete(ct, 0, 0); + nf_conntrack_put(>ct_general); + } + } + + return exp; } /* This replicates logic from nf_conntrack_core.c that is not exported. */ -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 11/16] netfilter: nft_dynset: continue to next expr if _OP_ADD succeeded
From: Liping ZhangCurrently, after adding the following nft rules: # nft add set x target1 { type ipv4_addr \; flags timeout \;} # nft add rule x y set add ip daddr timeout 1d @target1 counter the counters will always be zero despite of the elements are added to the dynamic set "target1" or not, as we will break the nft expr traversal unconditionally: # nft list ruleset ... set target1 { ... elements = { 8.8.8.8 expires 23h59m53s} } chain output { ... set add ip daddr timeout 1d @target1 counter packets 0 bytes 0 ^ ^ ... } Since we add the elements to the set successfully, we should continue to the next expression. Additionally, if elements are added to "flow table" successfully, we will _always_ continue to the next expr, even if the operation is _OP_ADD. So it's better to keep them to be consistent. Fixes: 22fe54d5fefc ("netfilter: nf_tables: add support for dynamic set updates") Reported-by: Robert White Signed-off-by: Liping Zhang Signed-off-by: Pablo Neira Ayuso --- net/netfilter/nft_dynset.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/net/netfilter/nft_dynset.c b/net/netfilter/nft_dynset.c index 049ad2d9ee66..fafbeea3ed04 100644 --- a/net/netfilter/nft_dynset.c +++ b/net/netfilter/nft_dynset.c @@ -82,8 +82,7 @@ static void nft_dynset_eval(const struct nft_expr *expr, nft_set_ext_exists(ext, NFT_SET_EXT_EXPIRATION)) { timeout = priv->timeout ? : set->timeout; *nft_set_ext_expiration(ext) = jiffies + timeout; - } else if (sexpr == NULL) - goto out; + } if (sexpr != NULL) sexpr->ops->eval(sexpr, regs, pkt); @@ -92,7 +91,7 @@ static void nft_dynset_eval(const struct nft_expr *expr, regs->verdict.code = NFT_BREAK; return; } -out: + if (!priv->invert) regs->verdict.code = NFT_BREAK; } -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 08/16] netfilter: ctnetlink: acquire ct->lock before operating nf_ct_seqadj
From: Liping ZhangWe should acquire the ct->lock before accessing or modifying the nf_ct_seqadj, as another CPU may modify the nf_ct_seqadj at the same time during its packet proccessing. Signed-off-by: Liping Zhang Signed-off-by: Pablo Neira Ayuso --- net/netfilter/nf_conntrack_netlink.c | 21 +++-- 1 file changed, 15 insertions(+), 6 deletions(-) diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c index 86deed6a8db4..78f8c9adbd3c 100644 --- a/net/netfilter/nf_conntrack_netlink.c +++ b/net/netfilter/nf_conntrack_netlink.c @@ -417,8 +417,7 @@ dump_ct_seq_adj(struct sk_buff *skb, const struct nf_ct_seqadj *seq, int type) return -1; } -static int ctnetlink_dump_ct_seq_adj(struct sk_buff *skb, -const struct nf_conn *ct) +static int ctnetlink_dump_ct_seq_adj(struct sk_buff *skb, struct nf_conn *ct) { struct nf_conn_seqadj *seqadj = nfct_seqadj(ct); struct nf_ct_seqadj *seq; @@ -426,15 +425,20 @@ static int ctnetlink_dump_ct_seq_adj(struct sk_buff *skb, if (!(ct->status & IPS_SEQ_ADJUST) || !seqadj) return 0; + spin_lock_bh(>lock); seq = >seq[IP_CT_DIR_ORIGINAL]; if (dump_ct_seq_adj(skb, seq, CTA_SEQ_ADJ_ORIG) == -1) - return -1; + goto err; seq = >seq[IP_CT_DIR_REPLY]; if (dump_ct_seq_adj(skb, seq, CTA_SEQ_ADJ_REPLY) == -1) - return -1; + goto err; + spin_unlock_bh(>lock); return 0; +err: + spin_unlock_bh(>lock); + return -1; } static int ctnetlink_dump_id(struct sk_buff *skb, const struct nf_conn *ct) @@ -1637,11 +1641,12 @@ ctnetlink_change_seq_adj(struct nf_conn *ct, if (!seqadj) return 0; + spin_lock_bh(>lock); if (cda[CTA_SEQ_ADJ_ORIG]) { ret = change_seq_adj(>seq[IP_CT_DIR_ORIGINAL], cda[CTA_SEQ_ADJ_ORIG]); if (ret < 0) - return ret; + goto err; set_bit(IPS_SEQ_ADJUST_BIT, >status); } @@ -1650,12 +1655,16 @@ ctnetlink_change_seq_adj(struct nf_conn *ct, ret = change_seq_adj(>seq[IP_CT_DIR_REPLY], cda[CTA_SEQ_ADJ_REPLY]); if (ret < 0) - return ret; + goto err; set_bit(IPS_SEQ_ADJUST_BIT, >status); } + spin_unlock_bh(>lock); return 0; +err: + spin_unlock_bh(>lock); + return ret; } static int -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 12/16] netfilter: Wrong icmp6 checksum for ICMPV6_TIME_EXCEED in reverse SNATv6 path
From: Dave JohnsonWhen recalculating the outer ICMPv6 checksum for a reverse path NATv6 such as ICMPV6_TIME_EXCEED nf_nat_icmpv6_reply_translation() was accessing data beyond the headlen of the skb for non-linear skb. This resulted in incorrect ICMPv6 checksum as garbage data was used. Patch replaces csum_partial() with skb_checksum() which supports non-linear skbs similar to nf_nat_icmp_reply_translation() from ipv4. Signed-off-by: Dave Johnson Signed-off-by: Pablo Neira Ayuso --- net/ipv6/netfilter/nf_nat_l3proto_ipv6.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv6/netfilter/nf_nat_l3proto_ipv6.c b/net/ipv6/netfilter/nf_nat_l3proto_ipv6.c index e0be97e636a4..69937b637ee5 100644 --- a/net/ipv6/netfilter/nf_nat_l3proto_ipv6.c +++ b/net/ipv6/netfilter/nf_nat_l3proto_ipv6.c @@ -235,7 +235,7 @@ int nf_nat_icmpv6_reply_translation(struct sk_buff *skb, inside->icmp6.icmp6_cksum = csum_ipv6_magic(>saddr, >daddr, skb->len - hdrlen, IPPROTO_ICMPV6, - csum_partial(>icmp6, + skb_checksum(skb, hdrlen, skb->len - hdrlen, 0)); } -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 01/16] netfilter: xt_CT: fix refcnt leak on error path
From: Gao FengThere are two cases which causes refcnt leak. 1. When nf_ct_timeout_ext_add failed in xt_ct_set_timeout, it should free the timeout refcnt. Now goto the err_put_timeout error handler instead of going ahead. 2. When the time policy is not found, we should call module_put. Otherwise, the related cthelper module cannot be removed anymore. It is easy to reproduce by typing the following command: # iptables -t raw -A OUTPUT -p tcp -j CT --helper ftp --timeout xxx Signed-off-by: Gao Feng Signed-off-by: Liping Zhang Signed-off-by: Pablo Neira Ayuso --- net/netfilter/xt_CT.c | 11 +-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/net/netfilter/xt_CT.c b/net/netfilter/xt_CT.c index b008db0184b8..81fdcdca7457 100644 --- a/net/netfilter/xt_CT.c +++ b/net/netfilter/xt_CT.c @@ -167,8 +167,10 @@ xt_ct_set_timeout(struct nf_conn *ct, const struct xt_tgchk_param *par, goto err_put_timeout; } timeout_ext = nf_ct_timeout_ext_add(ct, timeout, GFP_ATOMIC); - if (timeout_ext == NULL) + if (!timeout_ext) { ret = -ENOMEM; + goto err_put_timeout; + } rcu_read_unlock(); return ret; @@ -200,6 +202,7 @@ static int xt_ct_tg_check(const struct xt_tgchk_param *par, struct xt_ct_target_info_v1 *info) { struct nf_conntrack_zone zone; + struct nf_conn_help *help; struct nf_conn *ct; int ret = -EOPNOTSUPP; @@ -248,7 +251,7 @@ static int xt_ct_tg_check(const struct xt_tgchk_param *par, if (info->timeout[0]) { ret = xt_ct_set_timeout(ct, par, info->timeout); if (ret < 0) - goto err3; + goto err4; } __set_bit(IPS_CONFIRMED_BIT, >status); nf_conntrack_get(>ct_general); @@ -256,6 +259,10 @@ static int xt_ct_tg_check(const struct xt_tgchk_param *par, info->ct = ct; return 0; +err4: + help = nfct_help(ct); + if (help) + module_put(help->helper->me); err3: nf_ct_tmpl_free(ct); err2: -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 06/16] netfilter: ctnetlink: fix deadlock due to acquire _expect_lock twice
From: Liping ZhangCurrently, ctnetlink_change_conntrack is always protected by _expect_lock, but this will cause a deadlock when deleting the helper from a conntrack, as the _expect_lock will be acquired again by nf_ct_remove_expectations: CPU0 lock(nf_conntrack_expect_lock); lock(nf_conntrack_expect_lock); *** DEADLOCK *** May be due to missing lock nesting notation 2 locks held by lt-conntrack_gr/12853: #0: ([i].mutex){+.+.+.}, at: [] nfnetlink_rcv_msg+0x399/0x6a9 [nfnetlink] #1: (nf_conntrack_expect_lock){+.}, at: [] ctnetlink_new_conntrack+0x17f/0x408 [nf_conntrack_netlink] Call Trace: dump_stack+0x85/0xc2 __lock_acquire+0x1608/0x1680 ? ctnetlink_parse_tuple_proto+0x10f/0x1c0 [nf_conntrack_netlink] lock_acquire+0x100/0x1f0 ? nf_ct_remove_expectations+0x32/0x90 [nf_conntrack] _raw_spin_lock_bh+0x3f/0x50 ? nf_ct_remove_expectations+0x32/0x90 [nf_conntrack] nf_ct_remove_expectations+0x32/0x90 [nf_conntrack] ctnetlink_change_helper+0xc6/0x190 [nf_conntrack_netlink] ctnetlink_new_conntrack+0x1b2/0x408 [nf_conntrack_netlink] nfnetlink_rcv_msg+0x60a/0x6a9 [nfnetlink] ? nfnetlink_rcv_msg+0x1b9/0x6a9 [nfnetlink] ? nfnetlink_bind+0x1a0/0x1a0 [nfnetlink] netlink_rcv_skb+0xa4/0xc0 nfnetlink_rcv+0x87/0x770 [nfnetlink] Since the operations are unrelated to nf_ct_expect, so we can drop the _expect_lock. Also note, after removing the _expect_lock protection, another CPU may invoke nf_conntrack_helper_unregister, so we should use rcu_read_lock to protect __nf_conntrack_helper_find invoked by ctnetlink_change_helper. Fixes: ca7433df3a67 ("netfilter: conntrack: seperate expect locking from nf_conntrack_lock") Signed-off-by: Liping Zhang Signed-off-by: Pablo Neira Ayuso --- net/netfilter/nf_conntrack_netlink.c | 24 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c index 48c184552de0..e5f9b1f4 100644 --- a/net/netfilter/nf_conntrack_netlink.c +++ b/net/netfilter/nf_conntrack_netlink.c @@ -1510,23 +1510,29 @@ static int ctnetlink_change_helper(struct nf_conn *ct, return 0; } + rcu_read_lock(); helper = __nf_conntrack_helper_find(helpname, nf_ct_l3num(ct), nf_ct_protonum(ct)); - if (helper == NULL) + if (helper == NULL) { + rcu_read_unlock(); return -EOPNOTSUPP; + } if (help) { if (help->helper == helper) { /* update private helper data if allowed. */ if (helper->from_nlattr) helper->from_nlattr(helpinfo, ct); - return 0; + err = 0; } else - return -EBUSY; + err = -EBUSY; + } else { + /* we cannot set a helper for an existing conntrack */ + err = -EOPNOTSUPP; } - /* we cannot set a helper for an existing conntrack */ - return -EOPNOTSUPP; + rcu_read_unlock(); + return err; } static int ctnetlink_change_timeout(struct nf_conn *ct, @@ -1945,9 +1951,7 @@ static int ctnetlink_new_conntrack(struct net *net, struct sock *ctnl, err = -EEXIST; ct = nf_ct_tuplehash_to_ctrack(h); if (!(nlh->nlmsg_flags & NLM_F_EXCL)) { - spin_lock_bh(_conntrack_expect_lock); err = ctnetlink_change_conntrack(ct, cda); - spin_unlock_bh(_conntrack_expect_lock); if (err == 0) { nf_conntrack_eventmask_report((1 << IPCT_REPLY) | (1 << IPCT_ASSURED) | @@ -2342,11 +2346,7 @@ ctnetlink_glue_parse(const struct nlattr *attr, struct nf_conn *ct) if (ret < 0) return ret; - spin_lock_bh(_conntrack_expect_lock); - ret = ctnetlink_glue_parse_ct((const struct nlattr **)cda, ct); - spin_unlock_bh(_conntrack_expect_lock); - - return ret; + return ctnetlink_glue_parse_ct((const struct nlattr **)cda, ct); } static int ctnetlink_glue_exp_parse(const struct nlattr * const *cda, -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 05/16] netfilter: ctnetlink: drop the incorrect cthelper module request
From: Liping ZhangFirst, when creating a new ct, we will invoke request_module to try to load the related inkernel cthelper. So there's no need to call the request_module again when updating the ct helpinfo. Second, ctnetlink_change_helper may be called with rcu_read_lock held, i.e. rcu_read_lock -> nfqnl_recv_verdict -> nfqnl_ct_parse -> ctnetlink_glue_parse -> ctnetlink_glue_parse_ct -> ctnetlink_change_helper. But the request_module invocation may sleep, so we can't call it with the rcu_read_lock held. Remove it now. Signed-off-by: Liping Zhang Signed-off-by: Pablo Neira Ayuso --- net/netfilter/nf_conntrack_netlink.c | 17 + 1 file changed, 1 insertion(+), 16 deletions(-) diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c index dc7dfd68fafe..48c184552de0 100644 --- a/net/netfilter/nf_conntrack_netlink.c +++ b/net/netfilter/nf_conntrack_netlink.c @@ -1512,23 +1512,8 @@ static int ctnetlink_change_helper(struct nf_conn *ct, helper = __nf_conntrack_helper_find(helpname, nf_ct_l3num(ct), nf_ct_protonum(ct)); - if (helper == NULL) { -#ifdef CONFIG_MODULES - spin_unlock_bh(_conntrack_expect_lock); - - if (request_module("nfct-helper-%s", helpname) < 0) { - spin_lock_bh(_conntrack_expect_lock); - return -EOPNOTSUPP; - } - - spin_lock_bh(_conntrack_expect_lock); - helper = __nf_conntrack_helper_find(helpname, nf_ct_l3num(ct), - nf_ct_protonum(ct)); - if (helper) - return -EAGAIN; -#endif + if (helper == NULL) return -EOPNOTSUPP; - } if (help) { if (help->helper == helper) { -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html