Re: [nft PATCH] doc: Fix for make distcheck

2018-10-31 Thread Pablo Neira Ayuso
On Wed, Oct 31, 2018 at 11:16:56PM +0100, Phil Sutter wrote:
> When building from a separate build directory, a2x did not find the
> source file nft.txt. Using '$<' instead fixes this.

Applied, thanks!


Re: [iptables PATCH v2] xtables: Fix for matching rules with wildcard interfaces

2018-10-31 Thread Pablo Neira Ayuso
On Wed, Oct 31, 2018 at 08:13:34PM +0100, Phil Sutter wrote:
> Due to xtables_parse_interface() and parse_ifname() being misaligned
> regarding interface mask setting, rules containing a wildcard interface
> added with iptables-nft could neither be checked nor deleted.
> 
> As suggested, introduce extensions/iptables.t to hold checks for
> built-in selectors. This file is picked up by iptables-test.py as-is.
> The only limitation is that iptables is being used for it, so no
> ip6tables-specific things can be tested with it (for now).

Applied, thanks Phil.


[nft PATCH] doc: Fix for make distcheck

2018-10-31 Thread Phil Sutter
When building from a separate build directory, a2x did not find the
source file nft.txt. Using '$<' instead fixes this.

Fixes: 3bacae9e4a1e3 ("doc: Review man page building in Makefile.am")
Signed-off-by: Phil Sutter 
---
 doc/Makefile.am | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/Makefile.am b/doc/Makefile.am
index 503d6cd80051c..01e1af90bbf0c 100644
--- a/doc/Makefile.am
+++ b/doc/Makefile.am
@@ -14,7 +14,7 @@ ASCIIDOC_INCLUDES = \
 ASCIIDOCS = ${ASCIIDOC_MAIN} ${ASCIIDOC_INCLUDES}
 
 nft.8: ${ASCIIDOCS}
-   ${AM_V_GEN}${A2X} ${A2X_OPTS_MANPAGE} nft.txt
+   ${AM_V_GEN}${A2X} ${A2X_OPTS_MANPAGE} $<
 
 .adoc.3:
${AM_V_GEN}${A2X} ${A2X_OPTS_MANPAGE} $<
-- 
2.19.0



[iptables PATCH v2] xtables: Fix for matching rules with wildcard interfaces

2018-10-31 Thread Phil Sutter
Due to xtables_parse_interface() and parse_ifname() being misaligned
regarding interface mask setting, rules containing a wildcard interface
added with iptables-nft could neither be checked nor deleted.

As suggested, introduce extensions/iptables.t to hold checks for
built-in selectors. This file is picked up by iptables-test.py as-is.
The only limitation is that iptables is being used for it, so no
ip6tables-specific things can be tested with it (for now).

Signed-off-by: Phil Sutter 
---
Changes since v1:
- Introduce extensions/iptables.t instead of (yet another) script in
  iptables/tests.
---
 extensions/iptables.t | 4 
 iptables/nft-shared.c | 2 +-
 2 files changed, 5 insertions(+), 1 deletion(-)
 create mode 100644 extensions/iptables.t

diff --git a/extensions/iptables.t b/extensions/iptables.t
new file mode 100644
index 0..65456ee9874d7
--- /dev/null
+++ b/extensions/iptables.t
@@ -0,0 +1,4 @@
+:FORWARD
+-i alongifacename0;=;OK
+-i thisinterfaceistoolong0;;FAIL
+-i eth+ -o alongifacename+;=;OK
diff --git a/iptables/nft-shared.c b/iptables/nft-shared.c
index 492e4ec124a79..7b8ca5e4becaf 100644
--- a/iptables/nft-shared.c
+++ b/iptables/nft-shared.c
@@ -249,7 +249,7 @@ static void parse_ifname(const char *name, unsigned int len, char *dst, unsigned
return;
dst[len++] = 0;
if (mask)
-   memset(mask, 0xff, len + 1);
+   memset(mask, 0xff, len - 2);
 }
 
 int parse_meta(struct nftnl_expr *e, uint8_t key, char *iniface,
-- 
2.19.0
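
As an illustration of the mask alignment described above, here is a minimal Python sketch of how an interface-name mask might be derived. It assumes the convention that an exact name is matched including its terminating NUL, while a wildcard name ending in '+' is matched only on the prefix before the '+'. The function name iface_mask and the Python model are hypothetical and only stand in for the C code touched by the patch.

```python
IFNAMSIZ = 16  # size of an interface name buffer, including the NUL byte


def iface_mask(name):
    """Return the IFNAMSIZ-byte match mask for an interface selector.

    Assumed convention (hypothetical model, not the iptables source):
      - "" or "+"  : match anything, mask is all zero
      - "eth+"     : wildcard, only the prefix "eth" is compared
      - "eth0"     : exact match, the trailing NUL is compared as well
    """
    if len(name) >= IFNAMSIZ:
        raise ValueError("interface name too long: %r" % name)
    if name in ("", "+"):
        covered = 0
    elif name.endswith("+"):
        covered = len(name) - 1   # do not compare the '+' itself
    else:
        covered = len(name) + 1   # include the NUL terminator
    return b"\xff" * covered + b"\x00" * (IFNAMSIZ - covered)


if __name__ == "__main__":
    for sel in ("eth+", "alongifacename0", "alongifacename+"):
        print(sel, iface_mask(sel).hex())
```

With this model, "eth+" covers three bytes, which corresponds to the `len - 2` in the fixed parse_ifname(), where len has already been advanced past the appended '+' and NUL.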



Re: [PATCH nft] json: fix json_events_cb() declaration when libjansson is not present

2018-10-31 Thread Phil Sutter
Hey Laura,

On Wed, Oct 31, 2018 at 12:54:18PM +0100, Laura Garcia Liebana wrote:
> When nftables is configured without libjansson support, the following
> compilation error is shown:
> 
> monitor.c: In function ‘netlink_echo_callback’:
> monitor.c:910:10: error: too many arguments to function ‘json_events_cb’
>return json_events_cb(nlh, &echo_monh);
>   ^~
> 
> This patch makes a declaration of the json_events_cb() function
> consistent.
> 
> Fixes: bb32d8db9a12 ("JSON: Add support for echo option")
> 
> Signed-off-by: Laura Garcia Liebana 

Oops, thanks for catching this!

Cheers, Phil


[PATCH nf 1/2] netfilter: nf_tables: don't skip inactive chains during update

2018-10-31 Thread Florian Westphal
There is no synchronization between packet path and the configuration plane.

The packet path uses two arrays with rules, one contains the current (active)
generation.  The other either contains the last (obsolete) generation or
the future one.

Consider:
cpu1   cpu2
   nft_do_chain(c);
delete c
net->gen++;
   genbit = !!net->gen;
   rules = c->rg[genbit];

cpu1 ignores c when updating if c is not active anymore in the new
generation.

On cpu2, we now use rules from wrong generation, as c->rg[old]
contains the rules matching 'c' whereas c->rg[new] was not updated and
can even point to rules that have been free'd already, causing a crash.

To fix this, make sure that the 'current' and the 'next' generation are
identical for chains that are going away so that c->rg[new] will just
use the matching rules even if genbit was incremented already.

Fixes: 0cbc06b3faba7 ("netfilter: nf_tables: remove synchronize_rcu in commit phase")
Signed-off-by: Florian Westphal 
---
 net/netfilter/nf_tables_api.c | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 42487d01a3ed..dd577e7d100c 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -6324,7 +6324,7 @@ static void nf_tables_commit_chain_free_rules_old(struct nft_rule **rules)
 	call_rcu(&old->h, __nf_tables_commit_chain_free_rules_old);
 }
 
-static void nf_tables_commit_chain_active(struct net *net, struct nft_chain *chain)
+static void nf_tables_commit_chain(struct net *net, struct nft_chain *chain)
 {
struct nft_rule **g0, **g1;
bool next_genbit;
@@ -6441,11 +6441,8 @@ static int nf_tables_commit(struct net *net, struct sk_buff *skb)
 
 	/* step 2.  Make rules_gen_X visible to packet path */
 	list_for_each_entry(table, &net->nft.tables, list) {
-		list_for_each_entry(chain, &table->chains, list) {
-			if (!nft_is_active_next(net, chain))
-				continue;
-			nf_tables_commit_chain_active(net, chain);
-		}
+		list_for_each_entry(chain, &table->chains, list)
+			nf_tables_commit_chain(net, chain);
 	}
 
/*
-- 
2.18.1
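
The race described in the commit message can be modelled with a small Python sketch. This is a toy model under stated assumptions, not kernel code: the Chain class, the commit() and packet_path() helpers and the STALE marker are hypothetical names, and the commit step simply copies the current rule array instead of building a new one.

```python
STALE = "stale (possibly freed)"


class Chain:
    """Toy chain with two rule arrays, one per generation bit."""

    def __init__(self, rules):
        # rg[0] holds the current generation, rg[1] an obsolete one.
        self.rg = [rules, STALE]
        self.active_next = True  # False once the chain is being deleted


def commit(chains, genbit, skip_inactive):
    """Publish rule arrays for the next generation and return its bit."""
    next_bit = genbit ^ 1
    for chain in chains:
        if skip_inactive and not chain.active_next:
            continue  # old behaviour: rg[next_bit] is left untouched
        chain.rg[next_bit] = chain.rg[genbit]
    return next_bit


def packet_path(chain, genbit):
    """What a CPU that already entered the chain would evaluate."""
    return chain.rg[genbit]


if __name__ == "__main__":
    for skip in (True, False):
        c = Chain(["counter", "return"])
        c.active_next = False            # "delete c"
        genbit = commit([c], 0, skip)    # generation is bumped
        print("skip_inactive=%s ->" % skip, packet_path(c, genbit))
```

When inactive chains are skipped, the packet path picks up the stale slot after the generation bump; when every chain is committed, both slots refer to the same rules.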



[PATCH nf 2/2] selftests: add script to stress-test nft packet path vs. control plane

2018-10-31 Thread Florian Westphal
Start flood ping for each cpu while loading/flushing rulesets to make
sure we do not access already-free'd rules from nf_tables evaluation loop.

Also add this to TARGETS so 'make run_tests' in selftest dir runs it
automatically.

This would have caught the bug fixed in previous change
("netfilter: nf_tables: do not skip inactive chains during generation update")
sooner.

Signed-off-by: Florian Westphal 
---
 tools/testing/selftests/Makefile  |  1 +
 tools/testing/selftests/netfilter/Makefile|  6 ++
 tools/testing/selftests/netfilter/config  |  2 +
 .../selftests/netfilter/nft_trans_stress.sh   | 78 +++
 4 files changed, 87 insertions(+)
 create mode 100644 tools/testing/selftests/netfilter/Makefile
 create mode 100644 tools/testing/selftests/netfilter/config
 create mode 100755 tools/testing/selftests/netfilter/nft_trans_stress.sh

diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index f1fe492c8e17..f0017c831e57 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -24,6 +24,7 @@ TARGETS += memory-hotplug
 TARGETS += mount
 TARGETS += mqueue
 TARGETS += net
+TARGETS += netfilter
 TARGETS += nsfs
 TARGETS += powerpc
 TARGETS += proc
diff --git a/tools/testing/selftests/netfilter/Makefile b/tools/testing/selftests/netfilter/Makefile
new file mode 100644
index ..47ed6cef93fb
--- /dev/null
+++ b/tools/testing/selftests/netfilter/Makefile
@@ -0,0 +1,6 @@
+# SPDX-License-Identifier: GPL-2.0
+# Makefile for netfilter selftests
+
+TEST_PROGS := nft_trans_stress.sh
+
+include ../lib.mk
diff --git a/tools/testing/selftests/netfilter/config b/tools/testing/selftests/netfilter/config
new file mode 100644
index ..1017313e41a8
--- /dev/null
+++ b/tools/testing/selftests/netfilter/config
@@ -0,0 +1,2 @@
+CONFIG_NET_NS=y
+CONFIG_NF_TABLES_INET=y
diff --git a/tools/testing/selftests/netfilter/nft_trans_stress.sh b/tools/testing/selftests/netfilter/nft_trans_stress.sh
new file mode 100755
index ..f1affd12c4b1
--- /dev/null
+++ b/tools/testing/selftests/netfilter/nft_trans_stress.sh
@@ -0,0 +1,78 @@
+#!/bin/bash
+#
+# This test is for stress-testing the nf_tables config plane path vs.
+# packet path processing: Make sure we never release rules that are
+# still visible to other cpus.
+#
+# set -e
+
+# Kselftest framework requirement - SKIP code is 4.
+ksft_skip=4
+
+testns=testns1
+tables="foo bar baz quux"
+
+nft --version > /dev/null 2>&1
+if [ $? -ne 0 ];then
+   echo "SKIP: Could not run test without nft tool"
+   exit $ksft_skip
+fi
+
+ip -Version > /dev/null 2>&1
+if [ $? -ne 0 ];then
+   echo "SKIP: Could not run test without ip tool"
+   exit $ksft_skip
+fi
+
+tmp=$(mktemp)
+
+for table in $tables; do
+   echo add table inet "$table" >> "$tmp"
+   echo flush table inet "$table" >> "$tmp"
+
+   echo "add chain inet $table INPUT { type filter hook input priority 0; }" >> "$tmp"
+   echo "add chain inet $table OUTPUT { type filter hook output priority 0; }" >> "$tmp"
+   for c in $(seq 1 400); do
+   chain=$(printf "chain%03u" "$c")
+   echo "add chain inet $table $chain" >> "$tmp"
+   done
+
+   for c in $(seq 1 400); do
+   chain=$(printf "chain%03u" "$c")
+   for BASE in INPUT OUTPUT; do
+   echo "add rule inet $table $BASE counter jump $chain" >> "$tmp"
+   done
+   echo "add rule inet $table $chain counter return" >> "$tmp"
+   done
+done
+
+ip netns add "$testns"
+ip -netns "$testns" link set lo up
+
+lscpu | grep ^CPU\(s\): | ( read cpu cpunum ;
+cpunum=$((cpunum-1))
+for i in $(seq 0 $cpunum);do
+   mask=$(printf 0x%x $((1<<$i)))
+ip netns exec "$testns" taskset $mask ping -4 127.0.0.1 -fq > /dev/null &
+ip netns exec "$testns" taskset $mask ping -6 ::1 -fq > /dev/null &
+done)
+
+sleep 1
+
+for i in $(seq 1 10) ; do ip netns exec "$testns" nft -f "$tmp" & done
+
+for table in $tables;do
+   randsleep=$((RANDOM%10))
+   sleep $randsleep
+   ip netns exec "$testns" nft delete table inet $table 2>/dev/null
+done
+
+randsleep=$((RANDOM%10))
+sleep $randsleep
+
+pkill -9 ping
+
+wait
+
+rm -f "$tmp"
+ip netns del "$testns"
-- 
2.18.1
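
To make the size of the generated ruleset and the per-CPU affinity masks easier to see, here is a Python sketch that mirrors the two loops of the shell script. The helper names build_ruleset and cpu_masks are hypothetical; the sketch only builds strings and does not call nft, ip or taskset.

```python
def build_ruleset(tables, nchains=400):
    """Return the nft commands the script writes into its temp file."""
    lines = []
    for table in tables:
        lines.append("add table inet %s" % table)
        lines.append("flush table inet %s" % table)
        lines.append("add chain inet %s INPUT "
                     "{ type filter hook input priority 0; }" % table)
        lines.append("add chain inet %s OUTPUT "
                     "{ type filter hook output priority 0; }" % table)
        chains = ["chain%03d" % c for c in range(1, nchains + 1)]
        for chain in chains:
            lines.append("add chain inet %s %s" % (table, chain))
        for chain in chains:
            # Each base chain jumps to every regular chain once.
            for base in ("INPUT", "OUTPUT"):
                lines.append("add rule inet %s %s counter jump %s"
                             % (table, base, chain))
            lines.append("add rule inet %s %s counter return"
                         % (table, chain))
    return lines


def cpu_masks(ncpus):
    """Return one single-bit hex mask per CPU, as passed to taskset."""
    return [hex(1 << i) for i in range(ncpus)]


if __name__ == "__main__":
    ruleset = build_ruleset("foo bar baz quux".split())
    print(len(ruleset), "commands")
    print(cpu_masks(4))
```

Per table this yields 2 table commands, 2 base chains, 400 regular chains and 1200 rules, so every packet on INPUT or OUTPUT traverses 400 jumps per table while the ruleset is being replaced.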



Re: stable regression: revert request for netfilter ipv6 defrag bug

2018-10-31 Thread Sasha Levin

On Wed, Oct 24, 2018 at 12:31:04PM +0200, Florian Westphal wrote:

> Hi,
> 
> please consider reverting
> 
> commit 84379c9afe011020e797e3f50a662b08a6355dcf
> netfilter: ipv6: nf_defrag: drop skb dst before queueing
> 
> It causes a kernel crash for locally generated ipv6 fragments
> when netfilter ipv6 defragmentation is used.
> 
> The faulty commit is not essential for -stable, it only
> delays netns teardown for longer than needed when that netns
> still has ipv6 frags queued.  Much better than crash :-/
> 
> commit ids are:
> 4.4.y: not affected (not backported)
> 4.9.y: backported as ad8b1ffc3efae2f65080bdb11145c87d299b8f9a
> 4.14.y: backported as 28c74ff85efd192aeca9005499ca50c24d795f61
> 4.18.y: (first affected kernel): 84379c9afe011020e797e3f50a662b08a6355dcf
> 
> For 4.19.y, you could also wait for a bug fix to hit Linus tree,
> I can ping you again once it's in:
> https://patchwork.ozlabs.org/patch/988233/


I've queued a revert for 4.18, 4.14, and 4.9. Thank you.

--
Thanks,
Sasha


Re: [nft PATCH] py: Adjust Nftables class to output flags changes

2018-10-31 Thread Pablo Neira Ayuso
On Wed, Oct 31, 2018 at 01:53:16PM +0100, Phil Sutter wrote:
> Introduce setter/getter methods for each introduced output flag. Ignore
> NFT_CTX_OUTPUT_NUMERIC_ALL for now since its main purpose is for
> internal use.
> 
> Adjust the script in tests/py accordingly: Due to the good defaults,
> only numeric proto output has to be selected - this is not a must, but
> allows for the test cases to remain unchanged.

Applied, thanks Phil.


Re: [PATCH nft] json: fix json_events_cb() declaration when libjansson is not present

2018-10-31 Thread Pablo Neira Ayuso
On Wed, Oct 31, 2018 at 12:54:18PM +0100, Laura Garcia Liebana wrote:
> When nftables is configured without libjansson support, the following
> compilation error is shown:
> 
> monitor.c: In function ‘netlink_echo_callback’:
> monitor.c:910:10: error: too many arguments to function ‘json_events_cb’
>return json_events_cb(nlh, &echo_monh);
>   ^~
> 
> This patch makes a declaration of the json_events_cb() function
> consistent.

Applied, thanks Laura.


[nft PATCH] py: Adjust Nftables class to output flags changes

2018-10-31 Thread Phil Sutter
Introduce setter/getter methods for each introduced output flag. Ignore
NFT_CTX_OUTPUT_NUMERIC_ALL for now since its main purpose is for
internal use.

Adjust the script in tests/py accordingly: Due to the good defaults,
only numeric proto output has to be selected - this is not a must, but
allows for the test cases to remain unchanged.

Signed-off-by: Phil Sutter 
---
 py/nftables.py   | 220 ---
 tests/py/nft-test.py |  29 +++---
 2 files changed, 153 insertions(+), 96 deletions(-)

diff --git a/py/nftables.py b/py/nftables.py
index d85bbb2ffeeda..6891cb1ce177b 100644
--- a/py/nftables.py
+++ b/py/nftables.py
@@ -33,11 +33,17 @@ class Nftables:
 "segtree":   0x40,
 }
 
-numeric_levels = {
-"none": 0,
-"addr": 1,
-"port": 2,
-"all":  3,
+output_flags = {
+"reversedns": (1 << 0),
+"service":(1 << 1),
+"stateless":  (1 << 2),
+"handle": (1 << 3),
+"json":   (1 << 4),
+"echo":   (1 << 5),
+"guid":   (1 << 6),
+"numeric_proto":  (1 << 7),
+"numeric_prio":   (1 << 8),
+"numeric_symbol": (1 << 9),
 }
 
 def __init__(self, sofile="libnftables.so"):
@@ -58,40 +64,12 @@ class Nftables:
 self.nft_ctx_new.restype = c_void_p
 self.nft_ctx_new.argtypes = [c_int]
 
-self.nft_ctx_output_get_handle = lib.nft_ctx_output_get_handle
-self.nft_ctx_output_get_handle.restype = c_bool
-self.nft_ctx_output_get_handle.argtypes = [c_void_p]
+self.nft_ctx_output_get_flags = lib.nft_ctx_output_get_flags
+self.nft_ctx_output_get_flags.restype = c_uint
+self.nft_ctx_output_get_flags.argtypes = [c_void_p]
 
-self.nft_ctx_output_set_handle = lib.nft_ctx_output_set_handle
-self.nft_ctx_output_set_handle.argtypes = [c_void_p, c_bool]
-
-self.nft_ctx_output_get_echo = lib.nft_ctx_output_get_echo
-self.nft_ctx_output_get_echo.restype = c_bool
-self.nft_ctx_output_get_echo.argtypes = [c_void_p]
-
-self.nft_ctx_output_set_echo = lib.nft_ctx_output_set_echo
-self.nft_ctx_output_set_echo.argtypes = [c_void_p, c_bool]
-
-self.nft_ctx_output_get_numeric = lib.nft_ctx_output_get_numeric
-self.nft_ctx_output_get_numeric.restype = c_int
-self.nft_ctx_output_get_numeric.argtypes = [c_void_p]
-
-self.nft_ctx_output_set_numeric = lib.nft_ctx_output_set_numeric
-self.nft_ctx_output_set_numeric.argtypes = [c_void_p, c_int]
-
-self.nft_ctx_output_get_stateless = lib.nft_ctx_output_get_stateless
-self.nft_ctx_output_get_stateless.restype = c_bool
-self.nft_ctx_output_get_stateless.argtypes = [c_void_p]
-
-self.nft_ctx_output_set_stateless = lib.nft_ctx_output_set_stateless
-self.nft_ctx_output_set_stateless.argtypes = [c_void_p, c_bool]
-
-self.nft_ctx_output_get_json = lib.nft_ctx_output_get_json
-self.nft_ctx_output_get_json.restype = c_bool
-self.nft_ctx_output_get_json.argtypes = [c_void_p]
-
-self.nft_ctx_output_set_json = lib.nft_ctx_output_set_json
-self.nft_ctx_output_set_json.argtypes = [c_void_p, c_bool]
+self.nft_ctx_output_set_flags = lib.nft_ctx_output_set_flags
+self.nft_ctx_output_set_flags.argtypes = [c_void_p, c_uint]
 
 self.nft_ctx_output_get_debug = lib.nft_ctx_output_get_debug
 self.nft_ctx_output_get_debug.restype = c_int
@@ -128,12 +106,77 @@ class Nftables:
 self.nft_ctx_buffer_output(self.__ctx)
 self.nft_ctx_buffer_error(self.__ctx)
 
+def __get_output_flag(self, name):
+flag = self.output_flags[name]
+return self.nft_ctx_output_get_flags(self.__ctx) & flag
+
+def __set_output_flag(self, name, val):
+flag = self.output_flags[name]
+flags = self.nft_ctx_output_get_flags(self.__ctx)
+if val:
+new_flags = flags | flag
+else:
+new_flags = flags & ~flag
+self.nft_ctx_output_set_flags(self.__ctx, new_flags)
+return flags & flag
+
+def get_reversedns_output(self):
+"""Get the current state of reverse DNS output.
+
+Returns a boolean indicating whether reverse DNS lookups are performed
+for IP addresses in output.
+"""
+return self.__get_output_flag("reversedns")
+
+def set_reversedns_output(self, val):
+"""Enable or disable reverse DNS output.
+
+Accepts a boolean turning reverse DNS lookups in output on or off.
+
+Returns the previous value.
+"""
+return self.__set_output_flag("reversedns", val)
+
+def get_service_output(self):
+"""Get the current state of service name output.
+
+Returns a boolean indicating whether service names are used for port
+numbers in output or not.
+"""
+return 
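
The flag handling introduced by this patch can be tried without libnftables using the following Python sketch. It reuses the output_flags values from the diff but replaces the nft context with a plain integer; the OutputFlags class and its get/set method names are hypothetical. Unlike __get_output_flag() in the patch, which returns the masked integer, the sketch wraps results in bool().

```python
OUTPUT_FLAGS = {
    "reversedns":     (1 << 0),
    "service":        (1 << 1),
    "stateless":      (1 << 2),
    "handle":         (1 << 3),
    "json":           (1 << 4),
    "echo":           (1 << 5),
    "guid":           (1 << 6),
    "numeric_proto":  (1 << 7),
    "numeric_prio":   (1 << 8),
    "numeric_symbol": (1 << 9),
}


class OutputFlags:
    """Stand-in for the flags word kept inside an nft context."""

    def __init__(self):
        self._flags = 0

    def get(self, name):
        """Return True if the named output flag is currently set."""
        return bool(self._flags & OUTPUT_FLAGS[name])

    def set(self, name, val):
        """Set or clear the named flag and return its previous state."""
        flag = OUTPUT_FLAGS[name]
        old = self._flags
        self._flags = (old | flag) if val else (old & ~flag)
        return bool(old & flag)


if __name__ == "__main__":
    out = OutputFlags()
    print(out.set("numeric_proto", True))   # previous state
    print(out.get("numeric_proto"))
    print(out.set("numeric_proto", False))  # previous state
```

Returning the previous value from the setter lets a caller restore the old state after a temporary change, which matches the "Returns the previous value" wording in the docstrings of the patch.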

[PATCH nft] json: fix json_events_cb() declaration when libjansson is not present

2018-10-31 Thread Laura Garcia Liebana
When nftables is configured without libjansson support, the following
compilation error is shown:

monitor.c: In function ‘netlink_echo_callback’:
monitor.c:910:10: error: too many arguments to function ‘json_events_cb’
   return json_events_cb(nlh, &echo_monh);
  ^~

This patch makes a declaration of the json_events_cb() function
consistent.

Fixes: bb32d8db9a12 ("JSON: Add support for echo option")

Signed-off-by: Laura Garcia Liebana 
---
 include/json.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/json.h b/include/json.h
index 8d45c3c..c724c29 100644
--- a/include/json.h
+++ b/include/json.h
@@ -239,7 +239,8 @@ static inline void monitor_print_rule_json(struct netlink_mon_handler *monh,
/* empty */
 }
 
-static inline int json_events_cb(const struct nlmsghdr *nlh)
+static inline int json_events_cb(const struct nlmsghdr *nlh,
+ struct netlink_mon_handler *monh)
 {
return -1;
 }
-- 
2.11.0



Segmentation fault when using ebtables

2018-10-31 Thread Dmitry Vinokurov
Hello,

I'm trying to execute "ebtables -A FORWARD -p arp -j DROP", compiled
for x86_64 from the ebtables-v2.0.10-4 sources, and get a segmentation
fault in libebtc.c at line 240 (
ebt_find_target(EBT_STANDARD_TARGET)->used = 1; ). It seems that
ebt_find_target returns NULL.

Could anybody help explain why it happens and how to solve it?

-- 
Best regards,
Dmitriy Vinokurov
+7 905 862 17 11
gim6...@gmail.com


[PATCH] netfilter: conntrack: fix cloned unconfirmed skb->_nfct race in __nf_conntrack_confirm

2018-10-31 Thread Chieh-Min Wang
From: Chieh-Min Wang 

For bridge (br_flood) or broadcast/multicast packets, an skb carrying an
unconfirmed conntrack can be cloned, which breaks the rule that an unconfirmed
skb->_nfct is never shared.
With nfqueue running on my system, the race can be easily reproduced and
produces the following warning calltrace:

[13257.707525] CPU: 0 PID: 12132 Comm: main Tainted: PW   4.4.60 #7744
[13257.707568] Hardware name: Qualcomm (Flattened Device Tree)
[13257.714700] [] (unwind_backtrace) from [] (show_stack+0x10/0x14)
[13257.720253] [] (show_stack) from [] (dump_stack+0x94/0xa8)
[13257.728240] [] (dump_stack) from [] (warn_slowpath_common+0x94/0xb0)
[13257.735268] [] (warn_slowpath_common) from [] (warn_slowpath_null+0x1c/0x24)
[13257.743519] [] (warn_slowpath_null) from [] (__nf_conntrack_confirm+0xa8/0x618)
[13257.752284] [] (__nf_conntrack_confirm) from [] (ipv4_confirm+0xb8/0xfc)
[13257.761049] [] (ipv4_confirm) from [] (nf_iterate+0x48/0xa8)
[13257.769725] [] (nf_iterate) from [] (nf_hook_slow+0x30/0xb0)
[13257.777108] [] (nf_hook_slow) from [] (br_nf_post_routing+0x274/0x31c)
[13257.784486] [] (br_nf_post_routing) from [] (nf_iterate+0x48/0xa8)
[13257.792556] [] (nf_iterate) from [] (nf_hook_slow+0x30/0xb0)
[13257.800458] [] (nf_hook_slow) from [] (br_forward_finish+0x94/0xa4)
[13257.808010] [] (br_forward_finish) from [] (br_nf_forward_finish+0x150/0x1ac)
[13257.815736] [] (br_nf_forward_finish) from [] (nf_reinject+0x108/0x170)
[13257.824762] [] (nf_reinject) from [] (nfqnl_recv_verdict+0x3d8/0x420)
[13257.832924] [] (nfqnl_recv_verdict) from [] (nfnetlink_rcv_msg+0x158/0x248)
[13257.841256] [] (nfnetlink_rcv_msg) from [] (netlink_rcv_skb+0x54/0xb0)
[13257.849762] [] (netlink_rcv_skb) from [] (netlink_unicast+0x148/0x23c)
[13257.858093] [] (netlink_unicast) from [] (netlink_sendmsg+0x2ec/0x368)
[13257.866348] [] (netlink_sendmsg) from [] (sock_sendmsg+0x34/0x44)
[13257.874590] [] (sock_sendmsg) from [] (___sys_sendmsg+0x1ec/0x200)
[13257.882489] [] (___sys_sendmsg) from [] (__sys_sendmsg+0x3c/0x64)
[13257.890300] [] (__sys_sendmsg) from [] (ret_fast_syscall+0x0/0x34)

The original code only triggered the warning and did nothing else. As a
result, the shared conntrack is moved to the dying list and the packet is
dropped (nf_ct_resolve_clash returns NF_DROP for a dying conntrack).

- Reproduce steps:

+----------------------------------+
|           br0 (bridge)           |
|                                  |
+--+-----+-----+-----+-----+-----+-+
   | eth0|     | eth1|     | eth2|
   |     |     |     |     |     |
   +--+--+     +--+--+     +--+--+
      |           |           |
      |           |           |
   +--+--+     +--+--+     +--+--+
   | PC1 |     | PC2 |     | PC3 |
   +-----+     +-----+     +-----+

iptables -A FORWARD -m mark --mark 0x100/0x100 -j NFQUEUE --queue-num 100 --queue-bypass
PS: Our nfq userspace program sets a mark on packets whose connection has
already been processed.

PC1 sends broadcast packets simulated by hping3:
hping3 --rand-source --udp 192.168.1.255 -i u100

- The broadcast racing flow chart is as follows:

br_handle_frame
  BR_HOOK(NFPROTO_BRIDGE, NF_BR_PRE_ROUTING, br_handle_frame_finish)
  // skb->_nfct (unconfirmed conntrack) is constructed at PRE_ROUTING stage
  br_handle_frame_finish
    // check if this packet is broadcast
    br_flood_forward
      br_flood
        list_for_each_entry_rcu(p, &br->port_list, list) // iterate through each port
          maybe_deliver
            deliver_clone
              skb = skb_clone(skb)
              __br_forward
                BR_HOOK(NFPROTO_BRIDGE, NF_BR_FORWARD,...)
                // queue in our nfq and received by our userspace program
                // goto __nf_conntrack_confirm with process context on CPU 1
    br_pass_frame_up
      BR_HOOK(NFPROTO_BRIDGE, NF_BR_LOCAL_IN,...)
      // goto __nf_conntrack_confirm with softirq context on CPU 0

Conntrack confirmation can happen at both the INPUT and the POSTROUTING stage,
so with NFQUEUE running, skbs sharing the same unconfirmed conntrack can race
on different cores.

Signed-off-by: Chieh-Min Wang 
---
 net/netfilter/nf_conntrack_core.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/net/netfilter/nf_conntrack_core.c 
b/net/netfilter/nf_conntrack_core.c
index ca1168d..1c16bd9 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -901,10 +901,17 @@ __nf_conntrack_confirm(struct sk_buff *skb)
 * REJECT will give spurious warnings here.
 */
 
-   /* No external references means no one else could have
-* confirmed us.
+   /* Another skb with the same unconfirmed conntrack may
+* win the race. This may happen for bridge(br_flood)
+* or broadcast/multicast packets do skb_clone with
+* unconfirmed conntrack.
 */
-   WARN_ON(nf_ct_is_confirmed(ct));
+   if (unlikely(nf_ct_is_confirmed(ct))) {
+   nf_conntrack_double_unlock(hash, reply_hash);
+