[PATCH nf] netfilter: nf_conncount: use rb_link_node_rcu() instead of rb_link_node()

2018-12-07 Thread Taehee Yoo
rbnode in insert_tree() is an RCU-protected pointer, so it must be
handled with the _rcu helper. rb_link_node_rcu() is the RCU version
of rb_link_node().

Fixes: 34848d5c896e ("netfilter: nf_conncount: Split insert and traversal")
Signed-off-by: Taehee Yoo 
---
 net/netfilter/nf_conncount.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/nf_conncount.c b/net/netfilter/nf_conncount.c
index b6d0f6deea86..9cd180bda092 100644
--- a/net/netfilter/nf_conncount.c
+++ b/net/netfilter/nf_conncount.c
@@ -427,7 +427,7 @@ insert_tree(struct net *net,
count = 1;
rbconn->list.count = count;
 
-   rb_link_node(&rbconn->node, parent, rbnode);
+   rb_link_node_rcu(&rbconn->node, parent, rbnode);
rb_insert_color(&rbconn->node, root);
 out_unlock:
spin_unlock_bh(&nf_conncount_locks[hash % CONNCOUNT_LOCK_SLOTS]);
-- 
2.17.1
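A userspace sketch of the difference, assuming C11 atomics as a stand-in for the kernel primitives. In the kernel, rb_link_node() stores the new node into the parent's child slot with a plain assignment. rb_link_node_rcu() first initializes the node's parent and child pointers and then publishes it with rcu_assign_pointer(), so a lockless reader that finds the node also sees its initialized fields. The types and functions below (`struct node`, `slot_t`, `link_plain()`, `link_rcu()`, `read_slot()`) are hypothetical. The release store models rcu_assign_pointer(), and the acquire load is a stronger-than-necessary stand-in for rcu_dereference().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct rb_node: parent plus two children. */
struct node {
	struct node *parent;
	struct node *left, *right;
	int key;
};

/* Hypothetical child slot that lockless readers may load concurrently. */
typedef _Atomic(struct node *) slot_t;

/* Like rb_link_node(): relaxed (plain) publication. A concurrent reader
 * gets no guarantee that it will see the initialized fields. */
static void link_plain(struct node *n, struct node *parent, slot_t *link)
{
	n->parent = parent;
	n->left = n->right = NULL;
	atomic_store_explicit(link, n, memory_order_relaxed);
}

/* Like rb_link_node_rcu(): initialize first, then publish with release
 * semantics, modelling rcu_assign_pointer(). */
static void link_rcu(struct node *n, struct node *parent, slot_t *link)
{
	n->parent = parent;
	n->left = n->right = NULL;
	atomic_store_explicit(link, n, memory_order_release);
}

/* Reader side, modelling rcu_dereference() with an acquire load. */
static struct node *read_slot(slot_t *link)
{
	return atomic_load_explicit(link, memory_order_acquire);
}
```

The single-threaded checks only confirm that both variants link the node. The ordering difference matters when readers walk the tree concurrently, which is why rbnode is treated as an RCU-protected pointer here.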



[PATCH nf] netfilter: nf_tables: deactivate expressions in rule replacement routine

2018-11-27 Thread Taehee Yoo
The rule replacement routine removes the old rule and then adds the new
one. Removing the old rule requires three steps: allocate a transaction,
deactivate the rule, and deactivate the rule's expressions. However, the
rule replacement routine has no expression deactivation step.

test commands:
   %nft add table ip filter
   %nft add chain ip filter c1
   %nft add chain ip filter c2
   %nft add rule ip filter c1 jump c2
   %nft replace rule ip filter c1 handle 3 accept
   %nft flush ruleset

The "jump c2" rule uses an immediate expression with the NFT_JUMP verdict
to chain c2. The reference count of chain c2 is increased when the rule
is added.

When the rule is deleted or replaced, the reference count of c2 should
be decreased. The decrement happens in nft_immediate_deactivate(), which
is called by nft_rule_expr_deactivate(). But the rule replacement
routine never calls nft_rule_expr_deactivate(), so the reference count
is not decreased. This eventually triggers the warning below.

splat looks like:
[  214.396453] WARNING: CPU: 1 PID: 21 at net/netfilter/nf_tables_api.c:1432 nf_tables_chain_destroy.isra.38+0x2f9/0x3a0 [nf_tables]
[  214.398983] Modules linked in: nf_tables nfnetlink
[  214.398983] CPU: 1 PID: 21 Comm: kworker/1:1 Not tainted 4.20.0-rc2+ #44
[  214.398983] Workqueue: events nf_tables_trans_destroy_work [nf_tables]
[  214.398983] RIP: 0010:nf_tables_chain_destroy.isra.38+0x2f9/0x3a0 [nf_tables]
[  214.398983] Code: 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 8e 00 00 00 48 8b 7b 58 e8 e1 2c 4e c6 48 89 df e8 d9 2c 4e c6 eb 9a <0f> 0b eb 96 0f 0b e9 7e fe ff ff e8 a7 7e 4e c6 e9 a4 fe ff ff e8
[  214.398983] RSP: 0018:8881152874e8 EFLAGS: 00010202
[  214.398983] RAX: 0001 RBX: 88810ef9fc28 RCX: 8881152876f0
[  214.398983] RDX: dc00 RSI: 111022a50ede RDI: 88810ef9fc78
[  214.398983] RBP: 111022a50e9d R08: 8000 R09: 
[  214.398983] R10:  R11:  R12: 111022a50eba
[  214.398983] R13: 888114446e08 R14: 8881152876f0 R15: ed1022a50ed6
[  214.398983] FS:  () GS:88811640() knlGS:
[  214.398983] CS:  0010 DS:  ES:  CR0: 80050033
[  214.398983] CR2: 7fab9bb5f868 CR3: 00012aa16000 CR4: 001006e0
[  214.398983] Call Trace:
[  214.398983]  ? nf_tables_table_destroy.isra.37+0x100/0x100 [nf_tables]
[  214.398983]  ? __kasan_slab_free+0x145/0x180
[  214.398983]  ? nf_tables_trans_destroy_work+0x439/0x830 [nf_tables]
[  214.398983]  ? kfree+0xdb/0x280
[  214.398983]  nf_tables_trans_destroy_work+0x5f5/0x830 [nf_tables]
[ ... ]

Fixes: bb7b40aecbf7 ("netfilter: nf_tables: bogus EBUSY in chain deletions")
Reported-by: Christoph Anton Mitterer
Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=914505
Link: https://bugzilla.kernel.org/show_bug.cgi?id=201791
Signed-off-by: Taehee Yoo 
---
 net/netfilter/nf_tables_api.c | 15 ---
 1 file changed, 4 insertions(+), 11 deletions(-)

diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index ddeaa1990e1e..2e61aab6ed73 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -2667,21 +2667,14 @@ static int nf_tables_newrule(struct net *net, struct sock *nlsk,
}
 
if (nlh->nlmsg_flags & NLM_F_REPLACE) {
-   if (!nft_is_active_next(net, old_rule)) {
-   err = -ENOENT;
-   goto err2;
-   }
-   trans = nft_trans_rule_add(&ctx, NFT_MSG_DELRULE,
-  old_rule);
+   trans = nft_trans_rule_add(&ctx, NFT_MSG_NEWRULE, rule);
if (trans == NULL) {
err = -ENOMEM;
goto err2;
}
-   nft_deactivate_next(net, old_rule);
-   chain->use--;
-
-   if (nft_trans_rule_add(&ctx, NFT_MSG_NEWRULE, rule) == NULL) {
-   err = -ENOMEM;
+   err = nft_delrule(&ctx, old_rule);
+   if (err < 0) {
+   nft_trans_destroy(trans);
goto err2;
}
 
-- 
2.17.1
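The reference-count imbalance described above can be illustrated with a toy model. Everything here (`struct chain_model`, `rule_add()`, `replace_old()`, `replace_fixed()`, `chain_destroy_ok()`) is hypothetical and far simpler than nf_tables. A rule's jump takes a reference on its target chain, and only deactivating the rule's expressions gives it back. The old replacement path skips that step, so c2 stays referenced after `jump c2` is replaced by `accept`. This sketch assumes that a non-zero use count is what the warning in nf_tables_chain_destroy() reports.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model, not nf_tables code: a chain's use count
 * counts the rules that jump to it. */
struct chain_model { int use; };

/* A rule with at most one jump (immediate NFT_JUMP) expression. */
struct rule_model { struct chain_model *jump_target; bool active; };

static void rule_add(struct rule_model *r, struct chain_model *target)
{
	r->jump_target = target;
	r->active = true;
	if (target)
		target->use++;		/* the jump takes a reference */
}

/* Stand-in for nft_rule_expr_deactivate() -> nft_immediate_deactivate(). */
static void rule_expr_deactivate(struct rule_model *r)
{
	if (r->jump_target)
		r->jump_target->use--;
}

/* Old replace path: deactivates the rule but not its expressions. */
static void replace_old(struct rule_model *old, struct rule_model *repl)
{
	old->active = false;
	rule_add(repl, NULL);		/* "accept" has no jump target */
}

/* Fixed path: deleting the old rule also deactivates its expressions,
 * as going through nft_delrule() does. */
static void replace_fixed(struct rule_model *old, struct rule_model *repl)
{
	old->active = false;
	rule_expr_deactivate(old);
	rule_add(repl, NULL);
}

/* Destroying a chain is only clean once nothing references it. */
static bool chain_destroy_ok(const struct chain_model *c)
{
	return c->use == 0;
}
```

Routing the old rule through the regular delete path, as the patch does with nft_delrule(), keeps the add and remove sides symmetric instead of duplicating part of the deletion logic.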



Re: [PATCH nf] netfilter: xt_TEE: fix build failure

2018-11-26 Thread Taehee Yoo
On Mon, 26 Nov 2018 at 20:28, Pablo Neira Ayuso  wrote:
>
> On Mon, Nov 26, 2018 at 06:39:28PM +0900, Taehee Yoo wrote:
> > Hi Pablo,
> >
> > According to Masahiro Yamada, this is Kconfig bug and he is fixing Kconfig.
> > https://lkml.org/lkml/2018/11/26/291
> >
> > So that I think this patch will be useless.
> > Could you check it up?
>
> OK, will keep back your patch by now, if this fix for Kbuild is still
> not fixing up the problem, then robots will spot this again.
>

Okay, thank you for checking!

> Thanks!
>
> > On Sun, 18 Nov 2018 at 23:39, Taehee Yoo  wrote:
> > >
> > > xt_TEE.c needs nf_dup_ipv6.c to support ipv6 packet duplication.
> > > So that if xt_TEE is enabled, nf_dup_ipv6 will be automatically selected.
> > > But there is build failure scenario.
> > >
> > > test config:
> > > CONFIG_NETFILTER_XT_TARGET_TEE=y
> > > CONFIG_NF_DUP_IPV6=m
> > >
> > > compile result:
> > > net/netfilter/xt_TEE.o: In function `tee_tg6':
> > > net/netfilter/xt_TEE.c:57: undefined reference to `nf_dup_ipv6'
> > >
> > > This patch forces to avoid above config.
> > >
> > > Fixes: 5d400a4933e8 ("netfilter: Kconfig: Change select IPv6 dependencies")
> > > Reported-by: Randy Dunlap 
> > > Reported-by: Stephen Rothwell
> > > Signed-off-by: Taehee Yoo 
> > > ---
> > >  net/netfilter/Kconfig | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
> > > index 2ab870ef233a..a0c2712290ea 100644
> > > --- a/net/netfilter/Kconfig
> > > +++ b/net/netfilter/Kconfig
> > > @@ -1011,7 +1011,7 @@ config NETFILTER_XT_TARGET_TEE
> > > depends on IPV6 || IPV6=n
> > > depends on !NF_CONNTRACK || NF_CONNTRACK
> > > select NF_DUP_IPV4
> > > -   select NF_DUP_IPV6 if IP6_NF_IPTABLES
> > > +   select NF_DUP_IPV6 if IP6_NF_IPTABLES != n
> > > ---help---
> > > This option adds a "TEE" target with which a packet can be cloned and
> > > this clone be rerouted to another nexthop.
> > > --
> > > 2.17.1
> > >


[PATCH nf] netfilter: nf_tables: fix suspicious RCU usage in nft_chain_stats_replace()

2018-11-26 Thread Taehee Yoo
basechain->stats is RCU-protected data, and its write-side critical
section is nft_chain_stats_replace(). That function runs in the commit
phase, so it is actually protected by the commit_mutex.
Hence rcu_dereference_protected() in nft_chain_stats_replace() should
use the commit_mutex lockdep check instead of the NFNL_SUBSYS_NFTABLES one.

With this patch, RCU APIs are used to access basechain->stats.

test commands:
   %iptables-nft -I INPUT
   %iptables-nft -Z
   %iptables-nft -Z

splat looks like:
[89279.358755] =
[89279.363656] WARNING: suspicious RCU usage
[89279.368458] 4.20.0-rc2+ #44 Tainted: GWL
[89279.374661] -
[89279.389520] net/netfilter/nf_tables_api.c:1404 suspicious rcu_dereference_protected() usage!
[89279.389520]
other info that might help us debug this:

[89279.398893]
rcu_scheduler_active = 2, debug_locks = 1
[89279.406556] 1 lock held by iptables-nft/5225:
[89279.411728]  #0: bf45a000 (&net->nft.commit_mutex){+.+.}, at: nf_tables_valid_genid+0x1f/0x70 [nf_tables]
[89279.424022]
stack backtrace:
[89279.429236] CPU: 0 PID: 5225 Comm: iptables-nft Tainted: GWL 4.20.0-rc2+ #44
[89279.430135] Call Trace:
[89279.430135]  dump_stack+0xc9/0x16b
[89279.430135]  ? show_regs_print_info+0x5/0x5
[89279.430135]  ? lockdep_rcu_suspicious+0x117/0x160
[89279.430135]  nft_chain_commit_update+0x4ea/0x640 [nf_tables]
[89279.430135]  ? sched_clock_local+0xd4/0x140
[89279.430135]  ? check_flags.part.35+0x440/0x440
[89279.430135]  ? __rhashtable_remove_fast.constprop.67+0xec0/0xec0 [nf_tables]
[89279.430135]  ? sched_clock_cpu+0x126/0x170
[89279.430135]  ? find_held_lock+0x39/0x1c0
[89279.430135]  ? hlock_class+0x140/0x140
[89279.430135]  ? is_bpf_text_address+0x5/0xf0
[89279.430135]  ? check_flags.part.35+0x440/0x440
[89279.430135]  ? __lock_is_held+0xb4/0x140
[89279.430135]  nf_tables_commit+0x2555/0x39c0 [nf_tables]

Fixes: f102d66b335a4 ("netfilter: nf_tables: use dedicated mutex to guard transactions")
Signed-off-by: Taehee Yoo 
---
 include/linux/netfilter/nfnetlink.h | 12 
 net/netfilter/nf_tables_api.c   | 21 +
 net/netfilter/nf_tables_core.c  |  2 +-
 3 files changed, 14 insertions(+), 21 deletions(-)

diff --git a/include/linux/netfilter/nfnetlink.h b/include/linux/netfilter/nfnetlink.h
index 4a520d3304a2..cf09ab37b45b 100644
--- a/include/linux/netfilter/nfnetlink.h
+++ b/include/linux/netfilter/nfnetlink.h
@@ -62,18 +62,6 @@ static inline bool lockdep_nfnl_is_held(__u8 subsys_id)
 }
 #endif /* CONFIG_PROVE_LOCKING */
 
-/*
- * nfnl_dereference - fetch RCU pointer when updates are prevented by subsys mutex
- *
- * @p: The pointer to read, prior to dereferencing
- * @ss: The nfnetlink subsystem ID
- *
- * Return the value of the specified RCU-protected pointer, but omit
- * the READ_ONCE(), because caller holds the NFNL subsystem mutex.
- */
-#define nfnl_dereference(p, ss)\
-   rcu_dereference_protected(p, lockdep_nfnl_is_held(ss))
-
 #define MODULE_ALIAS_NFNL_SUBSYS(subsys) \
MODULE_ALIAS("nfnetlink-subsys-" __stringify(subsys))
 
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index ddeaa1990e1e..e82ad1795194 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -1216,7 +1216,8 @@ static int nf_tables_fill_chain_info(struct sk_buff *skb, struct net *net,
if (nla_put_string(skb, NFTA_CHAIN_TYPE, basechain->type->name))
goto nla_put_failure;
 
-   if (basechain->stats && nft_dump_stats(skb, basechain->stats))
+   if (rcu_access_pointer(basechain->stats) &&
+   nft_dump_stats(skb, rcu_dereference(basechain->stats)))
goto nla_put_failure;
}
 
@@ -1392,7 +1393,8 @@ static struct nft_stats __percpu *nft_stats_alloc(const struct nlattr *attr)
return newstats;
 }
 
-static void nft_chain_stats_replace(struct nft_base_chain *chain,
+static void nft_chain_stats_replace(struct net *net,
+   struct nft_base_chain *chain,
struct nft_stats __percpu *newstats)
 {
struct nft_stats __percpu *oldstats;
@@ -1400,8 +1402,9 @@ static void nft_chain_stats_replace(struct nft_base_chain *chain,
if (newstats == NULL)
return;
 
-   if (chain->stats) {
-   oldstats = nfnl_dereference(chain->stats, NFNL_SUBSYS_NFTABLES);
+   if (rcu_access_pointer(chain->stats)) {
+   oldstats = rcu_dereference_protected(chain->stats,
+   lockdep_commit_lock_is_held(net));
rcu_assign_pointer(chain->stats, newstats);
synchronize_rcu();
free_pe
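A minimal sketch of what the lockdep condition in rcu_dereference_protected() expresses, assuming plain booleans in place of lockdep's lock tracking. The names `struct net_sketch`, `deref_protected()` and `stats_replace()` are hypothetical. The condition must name the lock that is actually held on the update side. During commit that lock is nft.commit_mutex, so a check against the nfnetlink subsystem mutex fails even though the update is correctly serialized, which matches the "suspicious RCU usage" splat above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical per-netns lock state; the flags stand in for what
 * lockdep tracks about nft.commit_mutex and the nfnetlink subsys mutex. */
struct net_sketch {
	bool commit_mutex_held;
	bool nfnl_mutex_held;
};

struct stats_sketch { unsigned long pkts, bytes; };
struct chain_sketch { struct stats_sketch *stats; };

/* Models rcu_dereference_protected(p, cond): returns the pointer with no
 * extra ordering, and reports whether the claimed lock is held (lockdep
 * would print "suspicious RCU usage" if it is not). */
static struct stats_sketch *deref_protected(struct stats_sketch *p,
					    bool cond, bool *ok)
{
	*ok = cond;
	return p;
}

/* Swap in new stats during commit. check_commit selects which lock the
 * dereference claims: false = old NFNL check, true = commit_mutex check.
 * Returns whether the lockdep-style check passed. */
static bool stats_replace(struct net_sketch *net, struct chain_sketch *c,
			  struct stats_sketch *newstats, bool check_commit)
{
	bool ok;
	bool held = check_commit ? net->commit_mutex_held
				 : net->nfnl_mutex_held;
	struct stats_sketch *old = deref_protected(c->stats, held, &ok);

	c->stats = newstats;	/* the kernel uses rcu_assign_pointer() */
	/* The kernel calls synchronize_rcu() before freeing; this
	 * single-threaded sketch has no concurrent readers to wait for. */
	free(old);
	return ok;
}
```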

Re: [PATCH nf] netfilter: xt_TEE: fix build failure

2018-11-26 Thread Taehee Yoo
Hi Pablo,

According to Masahiro Yamada, this is a Kconfig bug, and he is fixing Kconfig.
https://lkml.org/lkml/2018/11/26/291

So I think this patch is unnecessary.
Could you check it?

Thanks!

On Sun, 18 Nov 2018 at 23:39, Taehee Yoo  wrote:
>
> xt_TEE.c needs nf_dup_ipv6.c to support ipv6 packet duplication.
> So that if xt_TEE is enabled, nf_dup_ipv6 will be automatically selected.
> But there is build failure scenario.
>
> test config:
> CONFIG_NETFILTER_XT_TARGET_TEE=y
> CONFIG_NF_DUP_IPV6=m
>
> compile result:
> net/netfilter/xt_TEE.o: In function `tee_tg6':
> net/netfilter/xt_TEE.c:57: undefined reference to `nf_dup_ipv6'
>
> This patch forces to avoid above config.
>
> Fixes: 5d400a4933e8 ("netfilter: Kconfig: Change select IPv6 dependencies")
> Reported-by: Randy Dunlap 
> Reported-by: Stephen Rothwell
> Signed-off-by: Taehee Yoo 
> ---
>  net/netfilter/Kconfig | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
> index 2ab870ef233a..a0c2712290ea 100644
> --- a/net/netfilter/Kconfig
> +++ b/net/netfilter/Kconfig
> @@ -1011,7 +1011,7 @@ config NETFILTER_XT_TARGET_TEE
> depends on IPV6 || IPV6=n
> depends on !NF_CONNTRACK || NF_CONNTRACK
> select NF_DUP_IPV4
> -   select NF_DUP_IPV6 if IP6_NF_IPTABLES
> +   select NF_DUP_IPV6 if IP6_NF_IPTABLES != n
> ---help---
> This option adds a "TEE" target with which a packet can be cloned and
> this clone be rerouted to another nexthop.
> --
> 2.17.1
>


[PATCH nf] netfilter: nf_conncount: remove wrong condition check routine

2018-11-25 Thread Taehee Yoo
All lists passed to tree_nodes_free() already have a zero count and the
dead flag set, because they are selected by nf_conncount_gc_list(), which
is what sets the zero count and the dead flag.
So the if statement in tree_nodes_free() is unnecessary and wrong.

Fixes: 31568ec09ea0 ("netfilter: nf_conncount: fix list_del corruption in conn_free")
Signed-off-by: Taehee Yoo 
---
 net/netfilter/nf_conncount.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/net/netfilter/nf_conncount.c b/net/netfilter/nf_conncount.c
index 8acae4a3e4c0..b6d0f6deea86 100644
--- a/net/netfilter/nf_conncount.c
+++ b/net/netfilter/nf_conncount.c
@@ -323,11 +323,8 @@ static void tree_nodes_free(struct rb_root *root,
while (gc_count) {
rbconn = gc_nodes[--gc_count];
spin_lock(&rbconn->list.list_lock);
-   if (rbconn->list.count == 0 && rbconn->list.dead == false) {
-   rbconn->list.dead = true;
-   rb_erase(&rbconn->node, root);
-   call_rcu(&rbconn->rcu_head, __tree_nodes_free);
-   }
+   rb_erase(&rbconn->node, root);
+   call_rcu(&rbconn->rcu_head, __tree_nodes_free);
spin_unlock(&rbconn->list.list_lock);
}
 }
-- 
2.17.1
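The invariant the patch relies on can be written down as a small model, assuming (as the description states) that nf_conncount_gc_list() leaves every collected list with a zero count and the dead flag set. `struct list_sketch`, `gc_list_mark()`, `old_would_free()` and `new_would_free()` are hypothetical names. Under that assumption the removed condition `count == 0 && dead == false` can never be true for a collected node, so the old code would skip erasing and freeing it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a conncount list's bookkeeping. */
struct list_sketch { unsigned int count; bool dead; };

/* Stand-in for the state nf_conncount_gc_list() leaves behind:
 * an empty list gets marked dead before it is queued for freeing. */
static void gc_list_mark(struct list_sketch *l)
{
	if (l->count == 0)
		l->dead = true;
}

/* Removed check from tree_nodes_free(): only frees lists that are
 * not dead yet, which excludes every list the gc already marked. */
static bool old_would_free(const struct list_sketch *l)
{
	return l->count == 0 && l->dead == false;
}

/* Patched behaviour: every collected node is erased and freed. */
static bool new_would_free(const struct list_sketch *l)
{
	(void)l;
	return true;
}
```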



[PATCH nf v2 2/2] netfilter: nat: fix double register in masquerade modules

2018-11-22 Thread Taehee Yoo
The masquerade modules register notifiers, which must not be registered
twice, so these modules keep a reference counter. If the notifiers are
already registered, registration just returns success.
But there is an unsafe scenario.

test commands:

   while :
   do
   modprobe ip6t_MASQUERADE &
   modprobe nft_masq_ipv6 &
   modprobe -rv ip6t_MASQUERADE &
   modprobe -rv nft_masq_ipv6 &
   done

Numbers are the reference count.

CPU0        CPU1        CPU2        CPU3        CPU4
[insmod]    [insmod]    [rmmod]     [rmmod]     [insmod]

0->1
register    1->2
            returns     2->1
                        returns     1->0
                                                0->1
                                                register <--
                                    unregister


The unregistration on CPU3 should be processed before the
registration on CPU4.

To fix this, this patch replaces the atomic reference count with a
counter protected by a mutex.

splat looks like:
[  323.869557] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [modprobe:1381]
[  323.869574] Modules linked in: nf_tables(+) nf_nat_ipv6(-) nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 n]
[  323.869574] irq event stamp: 194074
[  323.898930] hardirqs last  enabled at (194073): [] trace_hardirqs_on_thunk+0x1a/0x1c
[  323.898930] hardirqs last disabled at (194074): [] trace_hardirqs_off_thunk+0x1a/0x1c
[  323.898930] softirqs last  enabled at (182132): [] __do_softirq+0x6ec/0xa3b
[  323.898930] softirqs last disabled at (182109): [] irq_exit+0x1a6/0x1e0
[  323.898930] CPU: 0 PID: 1381 Comm: modprobe Not tainted 4.20.0-rc2+ #27
[  323.898930] RIP: 0010:raw_notifier_chain_register+0xea/0x240
[  323.898930] Code: 3c 03 0f 8e f2 00 00 00 44 3b 6b 10 7f 4d 49 bc 00 00 00 00 00 fc ff df eb 22 48 8d 7b 10 488
[  323.898930] RSP: 0018:888101597218 EFLAGS: 0206 ORIG_RAX: ff13
[  323.898930] RAX:  RBX: c04361c0 RCX: 
[  323.898930] RDX: 126132ae RSI: c04aa3c0 RDI: c04361d0
[  323.898930] RBP: c04361c8 R08:  R09: 0001
[  323.898930] R10: 8881015972b0 R11: fbfff26132c4 R12: dc00
[  323.898930] R13:  R14: 1110202b2e44 R15: c04aa3c0
[  323.898930] FS:  7f813ed41540() GS:88811ae0() knlGS:
[  323.898930] CS:  0010 DS:  ES:  CR0: 80050033
[  323.898930] CR2: 559bf2c9f120 CR3: 00010bc8 CR4: 001006f0
[  323.898930] Call Trace:
[  323.898930]  ? atomic_notifier_chain_register+0x2d0/0x2d0
[  323.898930]  ? down_read+0x150/0x150
[  323.898930]  ? sched_clock_cpu+0x126/0x170
[  323.898930]  ? nf_tables_core_module_init+0xe4/0xe4 [nf_tables]
[  323.898930]  ? nf_tables_core_module_init+0xe4/0xe4 [nf_tables]
[  323.898930]  register_netdevice_notifier+0xbb/0x790
[  323.898930]  ? __dev_close_many+0x2d0/0x2d0
[  323.898930]  ? __mutex_unlock_slowpath+0x17f/0x740
[  323.898930]  ? wait_for_completion+0x710/0x710
[  323.898930]  ? nf_tables_core_module_init+0xe4/0xe4 [nf_tables]
[  323.898930]  ? up_write+0x6c/0x210
[  323.898930]  ? nf_tables_core_module_init+0xe4/0xe4 [nf_tables]
[  324.127073]  ? nf_tables_core_module_init+0xe4/0xe4 [nf_tables]
[  324.127073]  nft_chain_filter_init+0x1e/0xe8a [nf_tables]
[  324.127073]  nf_tables_module_init+0x37/0x92 [nf_tables]
[ ... ]

Fixes: 8dd33cc93ec9 ("netfilter: nf_nat: generalize IPv4 masquerading support for nf_tables")
Fixes: be6b635cd674 ("netfilter: nf_nat: generalize IPv6 masquerading support for nf_tables")
Signed-off-by: Taehee Yoo 
---

v2:
 - Add second patch
 - return success when notifier is already registered. (Florian Westphal)
v1: Initial patch

 net/ipv4/netfilter/nf_nat_masquerade_ipv4.c | 23 ++---
 net/ipv6/netfilter/nf_nat_masquerade_ipv6.c | 23 ++---
 2 files changed, 32 insertions(+), 14 deletions(-)

diff --git a/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c b/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c
index c7d7fa4fc369..41327bb99093 100644
--- a/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c
+++ b/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c
@@ -147,15 +147,17 @@ static struct notifier_block masq_inet_notifier = {
.notifier_call  = masq_inet_event,
 };
 
-static atomic_t masquerade_notifier_refcount = ATOMIC_INIT(0);
+static int masq_refcnt;
+static DEFINE_MUTEX(masq_mutex);
 
 int nf_nat_masquerade_ipv4_register_notifier(void)
 {
-   int ret;
+   int ret = 0;
 
+   mutex_lock(&masq_mutex);
/* check if the notifier was already set */
-   if (atomic_inc_return(&masquerade_notifier_refcount) > 1)
-   return 0;
+   if (++masq_refcnt > 1)
+   goto out_unlock;
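A userspace sketch of the mutex-protected reference count, assuming pthreads in place of the kernel's DEFINE_MUTEX() and two hypothetical stubs (`fake_register_notifier()`, `fake_unregister_notifier()`) in place of the real notifier registration calls. Because the increment, the check and the registration all happen under one lock, the 1->0 unregister on CPU3 and the 0->1 register on CPU4 in the scenario above can no longer interleave. The unregister side is an assumption about the shape of the truncated patch, not a copy of it.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins for the real notifier (un)registration calls:
 * `registered` counts how many times the notifier sits on the chain. */
static int registered;

static int fake_register_notifier(void)   { registered++; return 0; }
static void fake_unregister_notifier(void) { registered--; }

static int masq_refcnt;
static pthread_mutex_t masq_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Refcount update and registration are done under one mutex. */
static int masq_register(void)
{
	int ret = 0;

	pthread_mutex_lock(&masq_mutex);
	if (++masq_refcnt > 1)
		goto out_unlock;	/* already registered: success */

	ret = fake_register_notifier();
	if (ret)
		masq_refcnt--;		/* undo the increment on failure */
out_unlock:
	pthread_mutex_unlock(&masq_mutex);
	return ret;
}

/* The last user unregisters while still holding the same mutex. */
static void masq_unregister(void)
{
	pthread_mutex_lock(&masq_mutex);
	if (--masq_refcnt == 0)
		fake_unregister_notifier();
	pthread_mutex_unlock(&masq_mutex);
}
```

A mutex rather than a spinlock fits here because the real registration functions can sleep.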

[PATCH nf v2 0/2] netfilter: fix notifier registration bugs

2018-11-22 Thread Taehee Yoo
This patch series fixes notifier registration bugs.

The first patch adds error handling for notifier registration failures.
Notifier registration can fail, so error handling code is needed.

The second patch fixes a double-registration bug in the masquerade modules.
To prevent double registration, the masquerade modules keep a reference
count, but that is not enough.
So this patch uses a mutex instead of an atomic value.

v2:
 - Add second patch
 - return success when notifier is already registered. (Florian Westphal)
v1: Initial patch

Taehee Yoo (2):
  netfilter: add missing error handling code for register functions
  netfilter: nat: fix double register in masquerade modules

 .../net/netfilter/ipv4/nf_nat_masquerade.h|  2 +-
 .../net/netfilter/ipv6/nf_nat_masquerade.h|  2 +-
 net/ipv4/netfilter/ipt_MASQUERADE.c   |  7 ++-
 net/ipv4/netfilter/nf_nat_masquerade_ipv4.c   | 38 +++---
 net/ipv4/netfilter/nft_masq_ipv4.c|  4 +-
 net/ipv6/netfilter/ip6t_MASQUERADE.c  |  8 ++-
 net/ipv6/netfilter/nf_nat_masquerade_ipv6.c   | 49 ++-
 net/ipv6/netfilter/nft_masq_ipv6.c|  4 +-
 net/netfilter/nft_flow_offload.c  |  5 +-
 9 files changed, 89 insertions(+), 30 deletions(-)

-- 
2.17.1



[PATCH nf v2 1/2] netfilter: add missing error handling code for register functions

2018-11-22 Thread Taehee Yoo
register_{netdevice/inetaddr/inet6addr}_notifier() can return an error
value, so error handling code is needed.

Signed-off-by: Taehee Yoo 
---

v2:
 - Add second patch
 - return success when notifier is already registered. (Florian Westphal)
v1: Initial patch

 .../net/netfilter/ipv4/nf_nat_masquerade.h|  2 +-
 .../net/netfilter/ipv6/nf_nat_masquerade.h|  2 +-
 net/ipv4/netfilter/ipt_MASQUERADE.c   |  7 ++--
 net/ipv4/netfilter/nf_nat_masquerade_ipv4.c   | 21 +---
 net/ipv4/netfilter/nft_masq_ipv4.c|  4 ++-
 net/ipv6/netfilter/ip6t_MASQUERADE.c  |  8 +++--
 net/ipv6/netfilter/nf_nat_masquerade_ipv6.c   | 32 +--
 net/ipv6/netfilter/nft_masq_ipv6.c|  4 ++-
 net/netfilter/nft_flow_offload.c  |  5 ++-
 9 files changed, 63 insertions(+), 22 deletions(-)

diff --git a/include/net/netfilter/ipv4/nf_nat_masquerade.h b/include/net/netfilter/ipv4/nf_nat_masquerade.h
index cd24be4c4a99..13d55206bb9f 100644
--- a/include/net/netfilter/ipv4/nf_nat_masquerade.h
+++ b/include/net/netfilter/ipv4/nf_nat_masquerade.h
@@ -9,7 +9,7 @@ nf_nat_masquerade_ipv4(struct sk_buff *skb, unsigned int hooknum,
   const struct nf_nat_range2 *range,
   const struct net_device *out);
 
-void nf_nat_masquerade_ipv4_register_notifier(void);
+int nf_nat_masquerade_ipv4_register_notifier(void);
 void nf_nat_masquerade_ipv4_unregister_notifier(void);
 
 #endif /*_NF_NAT_MASQUERADE_IPV4_H_ */
diff --git a/include/net/netfilter/ipv6/nf_nat_masquerade.h b/include/net/netfilter/ipv6/nf_nat_masquerade.h
index 0c3b5ebf0bb8..2917bf95c437 100644
--- a/include/net/netfilter/ipv6/nf_nat_masquerade.h
+++ b/include/net/netfilter/ipv6/nf_nat_masquerade.h
@@ -5,7 +5,7 @@
 unsigned int
 nf_nat_masquerade_ipv6(struct sk_buff *skb, const struct nf_nat_range2 *range,
   const struct net_device *out);
-void nf_nat_masquerade_ipv6_register_notifier(void);
+int nf_nat_masquerade_ipv6_register_notifier(void);
 void nf_nat_masquerade_ipv6_unregister_notifier(void);
 
 #endif /* _NF_NAT_MASQUERADE_IPV6_H_ */
diff --git a/net/ipv4/netfilter/ipt_MASQUERADE.c b/net/ipv4/netfilter/ipt_MASQUERADE.c
index ce1512b02cb2..fd3f9e8a74da 100644
--- a/net/ipv4/netfilter/ipt_MASQUERADE.c
+++ b/net/ipv4/netfilter/ipt_MASQUERADE.c
@@ -81,9 +81,12 @@ static int __init masquerade_tg_init(void)
int ret;
 
ret = xt_register_target(&masquerade_tg_reg);
+   if (ret)
+   return ret;
 
-   if (ret == 0)
-   nf_nat_masquerade_ipv4_register_notifier();
+   ret = nf_nat_masquerade_ipv4_register_notifier();
+   if (ret)
+   xt_unregister_target(&masquerade_tg_reg);
 
return ret;
 }
diff --git a/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c b/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c
index a9d5e013e555..c7d7fa4fc369 100644
--- a/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c
+++ b/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c
@@ -149,16 +149,29 @@ static struct notifier_block masq_inet_notifier = {
 
 static atomic_t masquerade_notifier_refcount = ATOMIC_INIT(0);
 
-void nf_nat_masquerade_ipv4_register_notifier(void)
+int nf_nat_masquerade_ipv4_register_notifier(void)
 {
+   int ret;
+
/* check if the notifier was already set */
if (atomic_inc_return(&masquerade_notifier_refcount) > 1)
-   return;
+   return 0;
 
/* Register for device down reports */
-   register_netdevice_notifier(&masq_dev_notifier);
+   ret = register_netdevice_notifier(&masq_dev_notifier);
+   if (ret)
+   goto err_dec;
/* Register IP address change reports */
-   register_inetaddr_notifier(&masq_inet_notifier);
+   ret = register_inetaddr_notifier(&masq_inet_notifier);
+   if (ret)
+   goto err_unregister;
+
+   return ret;
+err_unregister:
+   unregister_netdevice_notifier(&masq_dev_notifier);
+err_dec:
+   atomic_dec(&masquerade_notifier_refcount);
+   return ret;
+   return ret;
 }
 EXPORT_SYMBOL_GPL(nf_nat_masquerade_ipv4_register_notifier);
 
diff --git a/net/ipv4/netfilter/nft_masq_ipv4.c b/net/ipv4/netfilter/nft_masq_ipv4.c
index f1193e1e928a..6847de1d1db8 100644
--- a/net/ipv4/netfilter/nft_masq_ipv4.c
+++ b/net/ipv4/netfilter/nft_masq_ipv4.c
@@ -69,7 +69,9 @@ static int __init nft_masq_ipv4_module_init(void)
if (ret < 0)
return ret;
 
-   nf_nat_masquerade_ipv4_register_notifier();
+   ret = nf_nat_masquerade_ipv4_register_notifier();
+   if (ret)
+   nft_unregister_expr(&nft_masq_ipv4_type);
 
return ret;
 }
diff --git a/net/ipv6/netfilter/ip6t_MASQUERADE.c b/net/ipv6/netfilter/ip6t_MASQUERADE.c
index 491f808e356a..29c7f1915a96 100644
--- a/net/ipv6/netfilter/ip6t_MASQUERADE.c
+++ b/net/ipv6/netfilter/ip6t_MASQUERADE.c
@@ -58,8 +58,12 @@ static int __init masquerade_tg6_init(void)
int err;
 
err = xt_register_target(&masquerade_tg6_reg);
-   if (er
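The error-unwinding pattern used in the patched register function can be exercised in isolation. In this sketch, `register_dev()`, `register_inet()`, `unregister_dev()` and the `fail_*` switches are hypothetical stand-ins with failure injection. The reference count is a plain int because concurrency is the subject of patch 2/2. Each failure undoes only the steps that already succeeded, in reverse order, and an already-registered notifier returns 0 rather than -EEXIST, following Florian Westphal's review.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical registration hooks with failure injection. */
static int dev_registered, inet_registered;
static int fail_dev, fail_inet;

static int register_dev(void)
{
	if (fail_dev)
		return -ENOMEM;
	dev_registered = 1;
	return 0;
}

static void unregister_dev(void) { dev_registered = 0; }

static int register_inet(void)
{
	if (fail_inet)
		return -ENOMEM;
	inet_registered = 1;
	return 0;
}

static int refcount;	/* single-threaded sketch: no locking */

/* Same shape as the patched function: goto labels unwind in reverse. */
static int register_notifiers(void)
{
	int ret;

	if (++refcount > 1)
		return 0;		/* already registered is not an error */

	ret = register_dev();
	if (ret)
		goto err_dec;
	ret = register_inet();
	if (ret)
		goto err_unregister;
	return 0;

err_unregister:
	unregister_dev();
err_dec:
	refcount--;
	return ret;
}
```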

Re: [PATCH nf-next] netfilter: add missing error handling code for register functions.

2018-11-19 Thread Taehee Yoo
On Tue, 20 Nov 2018 at 06:19, Florian Westphal  wrote:
>

Hi Florian!
Thank you for the review!

> Taehee Yoo  wrote:
> > register_{netdevice/inetaddr/inet6addr}_notifier returns value that
> > could be error value. so that error handling code are needed.
>
> Nothing should break without those notifiers in place though.
>
> >   /* check if the notifier was already set */
> >   if (atomic_inc_return(&masquerade_notifier_refcount) > 1)
> > - return;
> > + return -EEXIST;
>
> I don't think this is an error, it should return 0.
>
> > diff --git a/net/ipv4/netfilter/nft_masq_ipv4.c b/net/ipv4/netfilter/nft_masq_ipv4.c
> > index f1193e1e928a..6847de1d1db8 100644
> > --- a/net/ipv4/netfilter/nft_masq_ipv4.c
> > +++ b/net/ipv4/netfilter/nft_masq_ipv4.c
> > @@ -69,7 +69,9 @@ static int __init nft_masq_ipv4_module_init(void)
> >   if (ret < 0)
> >   return ret;
> >
> > - nf_nat_masquerade_ipv4_register_notifier();
> > + ret = nf_nat_masquerade_ipv4_register_notifier();
> > + if (ret)
> > + nft_unregister_expr(&nft_masq_ipv4_type);
>
> Else this would error out in case xtables masquerade module is already
> loaded.

I just tested this.
test commands:
modprobe ipt_MASQUERADE
modprobe nft_masq_ipv4 <-- FAIL

The second command fails because
nf_nat_masquerade_ipv4_register_notifier() returns -EEXIST.

Thanks a lot for finding this. I will send a v2 patch!


[PATCH nf-next] netfilter: add missing error handling code for register functions.

2018-11-19 Thread Taehee Yoo
register_{netdevice/inetaddr/inet6addr}_notifier() can return an error
value, so error handling code is needed.

Signed-off-by: Taehee Yoo 
---
 .../net/netfilter/ipv4/nf_nat_masquerade.h|  2 +-
 .../net/netfilter/ipv6/nf_nat_masquerade.h|  2 +-
 net/ipv4/netfilter/ipt_MASQUERADE.c   |  7 ++--
 net/ipv4/netfilter/nf_nat_masquerade_ipv4.c   | 21 +---
 net/ipv4/netfilter/nft_masq_ipv4.c|  4 ++-
 net/ipv6/netfilter/ip6t_MASQUERADE.c  |  8 +++--
 net/ipv6/netfilter/nf_nat_masquerade_ipv6.c   | 32 +--
 net/ipv6/netfilter/nft_masq_ipv6.c|  4 ++-
 net/netfilter/nft_flow_offload.c  |  5 ++-
 9 files changed, 63 insertions(+), 22 deletions(-)

diff --git a/include/net/netfilter/ipv4/nf_nat_masquerade.h b/include/net/netfilter/ipv4/nf_nat_masquerade.h
index cd24be4c4a99..13d55206bb9f 100644
--- a/include/net/netfilter/ipv4/nf_nat_masquerade.h
+++ b/include/net/netfilter/ipv4/nf_nat_masquerade.h
@@ -9,7 +9,7 @@ nf_nat_masquerade_ipv4(struct sk_buff *skb, unsigned int hooknum,
   const struct nf_nat_range2 *range,
   const struct net_device *out);
 
-void nf_nat_masquerade_ipv4_register_notifier(void);
+int nf_nat_masquerade_ipv4_register_notifier(void);
 void nf_nat_masquerade_ipv4_unregister_notifier(void);
 
 #endif /*_NF_NAT_MASQUERADE_IPV4_H_ */
diff --git a/include/net/netfilter/ipv6/nf_nat_masquerade.h b/include/net/netfilter/ipv6/nf_nat_masquerade.h
index 0c3b5ebf0bb8..2917bf95c437 100644
--- a/include/net/netfilter/ipv6/nf_nat_masquerade.h
+++ b/include/net/netfilter/ipv6/nf_nat_masquerade.h
@@ -5,7 +5,7 @@
 unsigned int
 nf_nat_masquerade_ipv6(struct sk_buff *skb, const struct nf_nat_range2 *range,
   const struct net_device *out);
-void nf_nat_masquerade_ipv6_register_notifier(void);
+int nf_nat_masquerade_ipv6_register_notifier(void);
 void nf_nat_masquerade_ipv6_unregister_notifier(void);
 
 #endif /* _NF_NAT_MASQUERADE_IPV6_H_ */
diff --git a/net/ipv4/netfilter/ipt_MASQUERADE.c b/net/ipv4/netfilter/ipt_MASQUERADE.c
index ce1512b02cb2..fd3f9e8a74da 100644
--- a/net/ipv4/netfilter/ipt_MASQUERADE.c
+++ b/net/ipv4/netfilter/ipt_MASQUERADE.c
@@ -81,9 +81,12 @@ static int __init masquerade_tg_init(void)
int ret;
 
ret = xt_register_target(&masquerade_tg_reg);
+   if (ret)
+   return ret;
 
-   if (ret == 0)
-   nf_nat_masquerade_ipv4_register_notifier();
+   ret = nf_nat_masquerade_ipv4_register_notifier();
+   if (ret)
+   xt_unregister_target(&masquerade_tg_reg);
 
return ret;
 }
diff --git a/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c b/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c
index a9d5e013e555..a6672c2be268 100644
--- a/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c
+++ b/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c
@@ -149,16 +149,29 @@ static struct notifier_block masq_inet_notifier = {
 
 static atomic_t masquerade_notifier_refcount = ATOMIC_INIT(0);
 
-void nf_nat_masquerade_ipv4_register_notifier(void)
+int nf_nat_masquerade_ipv4_register_notifier(void)
 {
+   int ret;
+
/* check if the notifier was already set */
if (atomic_inc_return(&masquerade_notifier_refcount) > 1)
-   return;
+   return -EEXIST;
 
/* Register for device down reports */
-   register_netdevice_notifier(&masq_dev_notifier);
+   ret = register_netdevice_notifier(&masq_dev_notifier);
+   if (ret)
+   goto err_dec;
/* Register IP address change reports */
-   register_inetaddr_notifier(&masq_inet_notifier);
+   ret = register_inetaddr_notifier(&masq_inet_notifier);
+   if (ret)
+   goto err_unregister;
+
+   return ret;
+err_unregister:
+   unregister_netdevice_notifier(&masq_dev_notifier);
+err_dec:
+   atomic_dec(&masquerade_notifier_refcount);
+   return ret;
+   return ret;
 }
 EXPORT_SYMBOL_GPL(nf_nat_masquerade_ipv4_register_notifier);
 
diff --git a/net/ipv4/netfilter/nft_masq_ipv4.c b/net/ipv4/netfilter/nft_masq_ipv4.c
index f1193e1e928a..6847de1d1db8 100644
--- a/net/ipv4/netfilter/nft_masq_ipv4.c
+++ b/net/ipv4/netfilter/nft_masq_ipv4.c
@@ -69,7 +69,9 @@ static int __init nft_masq_ipv4_module_init(void)
if (ret < 0)
return ret;
 
-   nf_nat_masquerade_ipv4_register_notifier();
+   ret = nf_nat_masquerade_ipv4_register_notifier();
+   if (ret)
+   nft_unregister_expr(&nft_masq_ipv4_type);
 
return ret;
 }
diff --git a/net/ipv6/netfilter/ip6t_MASQUERADE.c b/net/ipv6/netfilter/ip6t_MASQUERADE.c
index 491f808e356a..29c7f1915a96 100644
--- a/net/ipv6/netfilter/ip6t_MASQUERADE.c
+++ b/net/ipv6/netfilter/ip6t_MASQUERADE.c
@@ -58,8 +58,12 @@ static int __init masquerade_tg6_init(void)
int err;
 
err = xt_register_target(&masquerade_tg6_reg);
-   if (err == 0)
-   nf_nat_masquerade_ipv6_register_notifier();
+   if (err)
+   return err;
+
+

Re: [PATCH nf] netfilter: xt_TEE: fix build failure

2018-11-18 Thread Taehee Yoo
On Mon, 19 Nov 2018 at 02:15, Randy Dunlap  wrote:
>
> On 11/18/18 6:39 AM, Taehee Yoo wrote:
> > xt_TEE.c needs nf_dup_ipv6.c to support ipv6 packet duplication.
> > So that if xt_TEE is enabled, nf_dup_ipv6 will be automatically selected.
> > But there is build failure scenario.
> >
> > test config:
> > CONFIG_NETFILTER_XT_TARGET_TEE=y
> > CONFIG_NF_DUP_IPV6=m
> >
> > compile result:
> > net/netfilter/xt_TEE.o: In function `tee_tg6':
> > net/netfilter/xt_TEE.c:57: undefined reference to `nf_dup_ipv6'
> >
> > This patch forces to avoid above config.
> >
> > Fixes: 5d400a4933e8 ("netfilter: Kconfig: Change select IPv6 dependencies")
> > Reported-by: Randy Dunlap 
> > Reported-by: Stephen Rothwell
> > Signed-off-by: Taehee Yoo 
>
> Hi,

Hi!
Thank you for the review!

> This does fix the build error, so
> Acked-by: Randy Dunlap 
>
> The patch causes this:
>   CONFIG_NF_DUP_IPV6=m
> to become this:
>   CONFIG_NF_DUP_IPV6=y
>
> I understand how the above change fixes the build error, but I don't
> see how the change to the Kconfig below file causes the resulting
> .config file change above.
>
> Do you?  Can you explain it?  Thanks.
>

My understanding is that:
   1. select NF_DUP_IPV6 if IP6_NF_IPTABLES
   2. select NF_DUP_IPV6 if IP6_NF_IPTABLES != n

The first statement means that
NF_DUP_IPV6 can't be lower than IP6_NF_IPTABLES.
For example,
if IP6_NF_IPTABLES is 'm', NF_DUP_IPV6 can be 'm' or 'y'.
If IP6_NF_IPTABLES is 'y', NF_DUP_IPV6 can only be 'y'.

The second statement means that
NF_DUP_IPV6 will be set to the same value as NETFILTER_XT_TARGET_TEE
if IP6_NF_IPTABLES is not 'n'.
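To make the difference concrete, here is a simplified C model of the two rules. It assumes the tristate semantics described in the kernel's Kconfig language documentation: n < m < y, `&&` takes the minimum, a `!=` comparison yields only 'y' or 'n', and `select X if COND` inside symbol S forces X to at least min(S, COND). This is a sketch, not kconfig's implementation, and every name in it is hypothetical.

```c
#include <assert.h>

/* Simplified tristate values, ordered n < m < y (hypothetical model). */
enum tristate { TRI_N = 0, TRI_M = 1, TRI_Y = 2 };

/* '&&' on tristates evaluates to the minimum of both operands. */
static enum tristate tri_and(enum tristate a, enum tristate b)
{
	return a < b ? a : b;
}

/* 'A != B' evaluates to 'y' if the values differ, otherwise 'n'. */
static enum tristate tri_neq(enum tristate a, enum tristate b)
{
	return a != b ? TRI_Y : TRI_N;
}

/* "select X if COND" in symbol S: X gets a lower bound of (S && COND). */
static enum tristate select_floor(enum tristate selector, enum tristate cond)
{
	return tri_and(selector, cond);
}

/* Old rule: select NF_DUP_IPV6 if IP6_NF_IPTABLES */
static enum tristate dup_ipv6_floor_old(enum tristate tee, enum tristate ip6t)
{
	return select_floor(tee, ip6t);
}

/* New rule: select NF_DUP_IPV6 if IP6_NF_IPTABLES != n */
static enum tristate dup_ipv6_floor_new(enum tristate tee, enum tristate ip6t)
{
	return select_floor(tee, tri_neq(ip6t, TRI_N));
}
```

With TEE=y and IP6_NF_IPTABLES=m, the old rule only forces NF_DUP_IPV6 to 'm' (the failing config), while the new rule forces 'y', which matches the .config change described above.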

My understanding might be wrong. If so, please let me know!

Thanks!


> > ---
> >  net/netfilter/Kconfig | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
> > index 2ab870ef233a..a0c2712290ea 100644
> > --- a/net/netfilter/Kconfig
> > +++ b/net/netfilter/Kconfig
> > @@ -1011,7 +1011,7 @@ config NETFILTER_XT_TARGET_TEE
> >   depends on IPV6 || IPV6=n
> >   depends on !NF_CONNTRACK || NF_CONNTRACK
> >   select NF_DUP_IPV4
> > - select NF_DUP_IPV6 if IP6_NF_IPTABLES
> > + select NF_DUP_IPV6 if IP6_NF_IPTABLES != n
> >   ---help---
> >   This option adds a "TEE" target with which a packet can be cloned and
> >   this clone be rerouted to another nexthop.
> >
>
>
> --
> ~Randy


[PATCH nf] netfilter: xt_TEE: fix build failure

2018-11-18 Thread Taehee Yoo
xt_TEE.c needs nf_dup_ipv6.c to support IPv6 packet duplication,
so nf_dup_ipv6 is selected automatically when xt_TEE is enabled.
However, the following configuration causes a build failure.

test config:
CONFIG_NETFILTER_XT_TARGET_TEE=y
CONFIG_NF_DUP_IPV6=m

compile result:
net/netfilter/xt_TEE.o: In function `tee_tg6':
net/netfilter/xt_TEE.c:57: undefined reference to `nf_dup_ipv6'

This patch prevents the above configuration by selecting NF_DUP_IPV6
whenever IP6_NF_IPTABLES is not 'n'.

Fixes: 5d400a4933e8 ("netfilter: Kconfig: Change select IPv6 dependencies")
Reported-by: Randy Dunlap 
Reported-by: Stephen Rothwell 
Signed-off-by: Taehee Yoo 
---
 net/netfilter/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index 2ab870ef233a..a0c2712290ea 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -1011,7 +1011,7 @@ config NETFILTER_XT_TARGET_TEE
depends on IPV6 || IPV6=n
depends on !NF_CONNTRACK || NF_CONNTRACK
select NF_DUP_IPV4
-   select NF_DUP_IPV6 if IP6_NF_IPTABLES
+   select NF_DUP_IPV6 if IP6_NF_IPTABLES != n
---help---
This option adds a "TEE" target with which a packet can be cloned and
this clone be rerouted to another nexthop.
-- 
2.17.1



[PATCH nf] netfilter: xt_hashlimit: fix a possible memory leak in htable_create()

2018-11-16 Thread Taehee Yoo
In htable_create(), hinfo is allocated by vmalloc(),
so it must be freed when cfg_copy() fails.
This patch also removes the blank lines between each cfg_copy() call
and its error check.

Fixes: 11d5f15723c9 ("netfilter: xt_hashlimit: Create revision 2 to support higher pps rates")
Signed-off-by: Taehee Yoo 
---
 net/netfilter/xt_hashlimit.c | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/net/netfilter/xt_hashlimit.c b/net/netfilter/xt_hashlimit.c
index 3e7d259e5d8d..1ad4017f9b73 100644
--- a/net/netfilter/xt_hashlimit.c
+++ b/net/netfilter/xt_hashlimit.c
@@ -295,9 +295,10 @@ static int htable_create(struct net *net, struct hashlimit_cfg3 *cfg,
 
/* copy match config into hashtable config */
ret = cfg_copy(>cfg, (void *)cfg, 3);
-
-   if (ret)
+   if (ret) {
+   vfree(hinfo);
return ret;
+   }
 
hinfo->cfg.size = size;
if (hinfo->cfg.max == 0)
@@ -814,7 +815,6 @@ hashlimit_mt_v1(const struct sk_buff *skb, struct xt_action_param *par)
int ret;
 
ret = cfg_copy(, (void *)>cfg, 1);
-
if (ret)
return ret;
 
@@ -830,7 +830,6 @@ hashlimit_mt_v2(const struct sk_buff *skb, struct xt_action_param *par)
int ret;
 
ret = cfg_copy(, (void *)>cfg, 2);
-
if (ret)
return ret;
 
@@ -921,7 +920,6 @@ static int hashlimit_mt_check_v1(const struct xt_mtchk_param *par)
return ret;
 
ret = cfg_copy(, (void *)>cfg, 1);
-
if (ret)
return ret;
 
@@ -940,7 +938,6 @@ static int hashlimit_mt_check_v2(const struct xt_mtchk_param *par)
return ret;
 
ret = cfg_copy(, (void *)>cfg, 2);
-
if (ret)
return ret;
 
-- 
2.17.1
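The leak fixed above follows a common error-path pattern: anything allocated before a failing step must be released before returning. Below is a minimal userspace sketch of that pattern, assuming malloc()/free() in place of vmalloc()/vfree(); `cfg_copy_sketch()`, `htable_create_sketch()` and the allocation counter are hypothetical stand-ins, not the xt_hashlimit API.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the hashlimit config and table. */
struct cfg { unsigned int size; unsigned int max; };
struct htable { struct cfg cfg; };

static int live_tables; /* tables allocated but not yet freed, to expose leaks */

static struct htable *table_alloc(void)
{
	struct htable *t = malloc(sizeof(*t));

	if (t)
		live_tables++;
	return t;
}

static void table_free(struct htable *t)
{
	if (t) {
		live_tables--;
		free(t);
	}
}

/* Stand-in for cfg_copy(): rejects an unknown revision. */
static int cfg_copy_sketch(struct cfg *to, const struct cfg *from, int revision)
{
	if (revision < 1 || revision > 3)
		return -EINVAL;
	memcpy(to, from, sizeof(*to));
	return 0;
}

static int htable_create_sketch(const struct cfg *cfg, int revision,
				struct htable **out)
{
	struct htable *hinfo = table_alloc();
	int ret;

	if (!hinfo)
		return -ENOMEM;

	ret = cfg_copy_sketch(&hinfo->cfg, cfg, revision);
	if (ret) {
		table_free(hinfo); /* the fix: release hinfo on the error path */
		return ret;
	}
	*out = hinfo;
	return 0;
}
```

Without the table_free() call on the error path, a failed create would leave `live_tables` at 1, which is exactly the leak the patch closes.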



[PATCH nf-next 2/2] netfilter: nf_flow_table: simplify nf_flow_offload_gc_step()

2018-11-06 Thread Taehee Yoo
nf_flow_offload_gc_step() and nf_flow_table_iterate() are very similar,
so much of the duplicate code can be removed.
After this patch, nf_flow_offload_gc_step() is a simple callback of
nf_flow_table_iterate(), like nf_flow_table_do_cleanup().

Signed-off-by: Taehee Yoo 
---
 net/netfilter/nf_flow_table_core.c | 34 ++
 1 file changed, 7 insertions(+), 27 deletions(-)

diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c
index 58bb006cf1b8..fa0844e2a68d 100644
--- a/net/netfilter/nf_flow_table_core.c
+++ b/net/netfilter/nf_flow_table_core.c
@@ -286,33 +286,13 @@ static inline bool nf_flow_has_expired(const struct flow_offload *flow)
return (__s32)(flow->timeout - (u32)jiffies) <= 0;
 }
 
-static void nf_flow_offload_gc_step(struct nf_flowtable *flow_table)
+static void nf_flow_offload_gc_step(struct flow_offload *flow, void *data)
 {
-   struct flow_offload_tuple_rhash *tuplehash;
-   struct rhashtable_iter hti;
-   struct flow_offload *flow;
-
-   rhashtable_walk_enter(_table->rhashtable, );
-   rhashtable_walk_start();
+   struct nf_flowtable *flow_table = data;
 
-   while ((tuplehash = rhashtable_walk_next())) {
-   if (IS_ERR(tuplehash)) {
-   if (PTR_ERR(tuplehash) != -EAGAIN)
-   break;
-   continue;
-   }
-   if (tuplehash->tuple.dir)
-   continue;
-
-   flow = container_of(tuplehash, struct flow_offload, tuplehash[0]);
-
-   if (nf_flow_has_expired(flow) ||
-   (flow->flags & (FLOW_OFFLOAD_DYING |
-   FLOW_OFFLOAD_TEARDOWN)))
-   flow_offload_del(flow_table, flow);
-   }
-   rhashtable_walk_stop();
-   rhashtable_walk_exit();
+   if (nf_flow_has_expired(flow) ||
+   (flow->flags & (FLOW_OFFLOAD_DYING | FLOW_OFFLOAD_TEARDOWN)))
+   flow_offload_del(flow_table, flow);
 }
 
 static void nf_flow_offload_work_gc(struct work_struct *work)
@@ -320,7 +300,7 @@ static void nf_flow_offload_work_gc(struct work_struct *work)
struct nf_flowtable *flow_table;
 
flow_table = container_of(work, struct nf_flowtable, gc_work.work);
-   nf_flow_offload_gc_step(flow_table);
+   nf_flow_table_iterate(flow_table, nf_flow_offload_gc_step, flow_table);
queue_delayed_work(system_power_efficient_wq, _table->gc_work, HZ);
 }
 
@@ -504,7 +484,7 @@ void nf_flow_table_free(struct nf_flowtable *flow_table)
mutex_unlock(_lock);
cancel_delayed_work_sync(_table->gc_work);
nf_flow_table_iterate(flow_table, nf_flow_table_do_cleanup, NULL);
-   nf_flow_offload_gc_step(flow_table);
+   nf_flow_table_iterate(flow_table, nf_flow_offload_gc_step, flow_table);
rhashtable_destroy(_table->rhashtable);
 }
 EXPORT_SYMBOL_GPL(nf_flow_table_free);
-- 
2.17.1
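The refactoring above turns the GC step into a callback for a generic walker. A minimal userspace sketch of that shape follows, assuming a plain array instead of an rhashtable and one entry per flow (so there is no reply-direction filtering); every name here is hypothetical, not the nf_flow_table API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified flow and flow table. */
struct flow { uint32_t timeout; bool dying; bool deleted; };
struct flowtable { struct flow *flows; size_t n; uint32_t now; unsigned int deleted_count; };

/* Generic walker: hands every live flow to iter() together with data. */
static void flowtable_iterate(struct flowtable *ft,
			      void (*iter)(struct flow *flow, void *data),
			      void *data)
{
	for (size_t i = 0; i < ft->n; i++)
		if (!ft->flows[i].deleted)
			iter(&ft->flows[i], data);
}

/* Wrap-safe expiry check, in the style of nf_flow_has_expired(). */
static bool flow_has_expired(const struct flow *flow, uint32_t now)
{
	return (int32_t)(flow->timeout - now) <= 0;
}

/* The GC step reduced to a per-flow callback, as in the patch. */
static void gc_step(struct flow *flow, void *data)
{
	struct flowtable *ft = data;

	if (flow_has_expired(flow, ft->now) || flow->dying) {
		flow->deleted = true;
		ft->deleted_count++;
	}
}
```

The walking logic now lives in one place, and each caller (cleanup, GC) only supplies what to do with a single flow.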



[PATCH nf-next 1/2] netfilter: nf_flow_table: make nf_flow_table_iterate() static

2018-11-06 Thread Taehee Yoo
nf_flow_table_iterate() is only used in nf_flow_table_core.c,
so it can be made static and no longer exported.

Signed-off-by: Taehee Yoo 
---
 include/net/netfilter/nf_flow_table.h | 4 
 net/netfilter/nf_flow_table_core.c| 8 
 2 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h
index 77e2761d4f2f..7d5cda7ce32a 100644
--- a/include/net/netfilter/nf_flow_table.h
+++ b/include/net/netfilter/nf_flow_table.h
@@ -95,10 +95,6 @@ void flow_offload_free(struct flow_offload *flow);
 int flow_offload_add(struct nf_flowtable *flow_table, struct flow_offload *flow);
 struct flow_offload_tuple_rhash *flow_offload_lookup(struct nf_flowtable *flow_table,
 struct flow_offload_tuple *tuple);
-int nf_flow_table_iterate(struct nf_flowtable *flow_table,
- void (*iter)(struct flow_offload *flow, void *data),
- void *data);
-
 void nf_flow_table_cleanup(struct net_device *dev);
 
 int nf_flow_table_init(struct nf_flowtable *flow_table);
diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c
index b7a4816add76..58bb006cf1b8 100644
--- a/net/netfilter/nf_flow_table_core.c
+++ b/net/netfilter/nf_flow_table_core.c
@@ -247,9 +247,10 @@ flow_offload_lookup(struct nf_flowtable *flow_table,
 }
 EXPORT_SYMBOL_GPL(flow_offload_lookup);
 
-int nf_flow_table_iterate(struct nf_flowtable *flow_table,
- void (*iter)(struct flow_offload *flow, void *data),
- void *data)
+static int
+nf_flow_table_iterate(struct nf_flowtable *flow_table,
+ void (*iter)(struct flow_offload *flow, void *data),
+ void *data)
 {
struct flow_offload_tuple_rhash *tuplehash;
struct rhashtable_iter hti;
@@ -279,7 +280,6 @@ int nf_flow_table_iterate(struct nf_flowtable *flow_table,
 
return err;
 }
-EXPORT_SYMBOL_GPL(nf_flow_table_iterate);
 
 static inline bool nf_flow_has_expired(const struct flow_offload *flow)
 {
-- 
2.17.1



[PATCH nf-next 0/2] netfilter: nf_flow_table: remove duplicate code in nf_flow_table_core.c

2018-11-06 Thread Taehee Yoo
In this patch series, duplicate code in nf_flow_table_core.c is removed.

The first patch makes nf_flow_table_iterate() static because
it is only used locally.

The second patch simplifies nf_flow_offload_gc_step().
Both nf_flow_offload_gc_step() and nf_flow_table_iterate()
have the same rhashtable walk routine,
so the duplicate code in nf_flow_offload_gc_step() can be removed.

Taehee Yoo (2):
  netfilter: nf_flow_table: make nf_flow_table_iterate() static
  netfilter: nf_flow_table: simplify nf_flow_offload_gc_step()

 include/net/netfilter/nf_flow_table.h |  4 ---
 net/netfilter/nf_flow_table_core.c| 42 +++
 2 files changed, 11 insertions(+), 35 deletions(-)

-- 
2.17.1



[PATCH nf v3 4/4] netfilter: ipt_CLUSTERIP: check MAC address when duplicate config is set

2018-11-05 Thread Taehee Yoo
If a config with the same destination IP address already exists,
that config is reused, so the MAC address should also be the same.
However, there is no routine that checks the MAC address.
This patch adds one.

test commands:
   %iptables -A INPUT -p tcp -i lo -d 192.168.0.5 --dport 80 \
   -j CLUSTERIP --new --hashmode sourceip \
   --clustermac 01:00:5e:00:00:20 --total-nodes 2 --local-node 1
   %iptables -A INPUT -p tcp -i lo -d 192.168.0.5 --dport 80 \
   -j CLUSTERIP --new --hashmode sourceip \
   --clustermac 01:00:5e:00:00:21 --total-nodes 2 --local-node 1

After this patch, the second command is rejected because its
--clustermac differs from that of the existing config.

v3: add Fourth patch.
v2:
 - use spin_lock_bh() instead of spin_lock() (Pablo Neira Ayuso)
 - add missing dev_mc_add() and dev_mc_del().
 - add Third patch.
v1: Initial patch

Signed-off-by: Taehee Yoo 
---
 net/ipv4/netfilter/ipt_CLUSTERIP.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/netfilter/ipt_CLUSTERIP.c b/net/ipv4/netfilter/ipt_CLUSTERIP.c
index 7fd399751c2e..3cd237b42f44 100644
--- a/net/ipv4/netfilter/ipt_CLUSTERIP.c
+++ b/net/ipv4/netfilter/ipt_CLUSTERIP.c
@@ -509,7 +509,8 @@ static int clusterip_tg_check(const struct xt_tgchk_param *par)
if (IS_ERR(config))
return PTR_ERR(config);
}
-   }
+   } else if (memcmp(>clustermac, >clustermac, ETH_ALEN))
+   return -EINVAL;
 
ret = nf_ct_netns_get(par->net, par->family);
if (ret < 0) {
-- 
2.17.1
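The check added above can be modelled in userspace as "find the config by IP; if it exists, reuse it only when the MAC matches". The sketch below assumes a small fixed array in place of the per-net config list; `struct cluster_cfg` and `clusterip_check_sketch()` are hypothetical names, not ipt_CLUSTERIP code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

/* Hypothetical, simplified CLUSTERIP config. */
struct cluster_cfg {
	uint32_t clusterip;
	unsigned char clustermac[ETH_ALEN];
};

/* Returns 0 if the rule may use (or create) a config, -EINVAL on MAC mismatch. */
static int clusterip_check_sketch(struct cluster_cfg *table, size_t *n, size_t cap,
				  uint32_t ip, const unsigned char mac[ETH_ALEN])
{
	for (size_t i = 0; i < *n; i++) {
		if (table[i].clusterip != ip)
			continue;
		/* Existing config for this IP: reuse it only if the MAC matches. */
		return memcmp(table[i].clustermac, mac, ETH_ALEN) ? -EINVAL : 0;
	}
	if (*n == cap)
		return -ENOMEM;
	table[*n].clusterip = ip;
	memcpy(table[*n].clustermac, mac, ETH_ALEN);
	(*n)++;
	return 0;
}
```

Replaying the two test commands against this model, the first call creates the config and the second (same IP, MAC ending in :21) returns -EINVAL.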



[PATCH nf v3 3/4] netfilter: ipt_CLUSTERIP: fix sleep-in-atomic bug in clusterip_config_entry_put()

2018-11-05 Thread Taehee Yoo
proc_remove() can sleep, so it must not be called while holding a spin_lock.
Hence proc_remove() is moved outside of the spin_lock, and a mutex is
added to synchronize creation and removal of the proc entry (config->pde).

test commands:
SHELL#1
   %while :; do iptables -A INPUT -p udp -i enp2s0 -d 192.168.1.100 \
   --dport 9000  -j CLUSTERIP --new --hashmode sourceip \
   --clustermac 01:00:5e:00:00:21 --total-nodes 3 --local-node 3; \
   iptables -F; done

SHELL#2
   %while :; do echo +1 > /proc/net/ipt_CLUSTERIP/192.168.1.100; \
   echo -1 > /proc/net/ipt_CLUSTERIP/192.168.1.100; done

[ 2949.569864] BUG: sleeping function called from invalid context at kernel/sched/completion.c:99
[ 2949.579944] in_atomic(): 1, irqs_disabled(): 0, pid: 5472, name: iptables
[ 2949.587920] 1 lock held by iptables/5472:
[ 2949.592711]  #0: 8f0ebcf2 (&(>lock)->rlock){+...}, at: refcount_dec_and_lock+0x24/0x50
[ 2949.603307] CPU: 1 PID: 5472 Comm: iptables Tainted: GW 4.19.0-rc5+ #16
[ 2949.604212] Hardware name: To be filled by O.E.M. To be filled by O.E.M./Aptio CRB, BIOS 5.6.5 07/08/2015
[ 2949.604212] Call Trace:
[ 2949.604212]  dump_stack+0xc9/0x16b
[ 2949.604212]  ? show_regs_print_info+0x5/0x5
[ 2949.604212]  ___might_sleep+0x2eb/0x420
[ 2949.604212]  ? set_rq_offline.part.87+0x140/0x140
[ 2949.604212]  ? _rcu_barrier_trace+0x400/0x400
[ 2949.604212]  wait_for_completion+0x94/0x710
[ 2949.604212]  ? wait_for_completion_interruptible+0x780/0x780
[ 2949.604212]  ? __kernel_text_address+0xe/0x30
[ 2949.604212]  ? __lockdep_init_map+0x10e/0x5c0
[ 2949.604212]  ? __lockdep_init_map+0x10e/0x5c0
[ 2949.604212]  ? __init_waitqueue_head+0x86/0x130
[ 2949.604212]  ? init_wait_entry+0x1a0/0x1a0
[ 2949.604212]  proc_entry_rundown+0x208/0x270
[ 2949.604212]  ? proc_reg_get_unmapped_area+0x370/0x370
[ 2949.604212]  ? __lock_acquire+0x4500/0x4500
[ 2949.604212]  ? complete+0x18/0x70
[ 2949.604212]  remove_proc_subtree+0x143/0x2a0
[ 2949.708655]  ? remove_proc_entry+0x390/0x390
[ 2949.708655]  clusterip_tg_destroy+0x27a/0x630 [ipt_CLUSTERIP]
[ ... ]

v3: add Fourth patch.
v2:
 - use spin_lock_bh() instead of spin_lock() (Pablo Neira Ayuso)
 - add missing dev_mc_add() and dev_mc_del().
 - add Third patch.
v1: Initial patch

Fixes: b3e456fce9f5 ("netfilter: ipt_CLUSTERIP: fix a race condition of proc file creation")
Signed-off-by: Taehee Yoo 
---
 net/ipv4/netfilter/ipt_CLUSTERIP.c | 19 ++-
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/netfilter/ipt_CLUSTERIP.c b/net/ipv4/netfilter/ipt_CLUSTERIP.c
index e734a57cd9f1..7fd399751c2e 100644
--- a/net/ipv4/netfilter/ipt_CLUSTERIP.c
+++ b/net/ipv4/netfilter/ipt_CLUSTERIP.c
@@ -56,7 +56,7 @@ struct clusterip_config {
 #endif
enum clusterip_hashmode hash_mode;  /* which hashing mode */
u_int32_t hash_initval; /* hash initialization */
-   struct rcu_head rcu;
+   struct rcu_head rcu;/* for call_rcu_bh */
struct net *net;/* netns for pernet list */
char ifname[IFNAMSIZ];  /* device ifname */
 };
@@ -72,6 +72,8 @@ struct clusterip_net {
 
 #ifdef CONFIG_PROC_FS
struct proc_dir_entry *procdir;
+   /* mutex protects the config->pde*/
+   struct mutex mutex;
 #endif
 };
 
@@ -118,17 +120,18 @@ clusterip_config_entry_put(struct clusterip_config *c)
 
local_bh_disable();
if (refcount_dec_and_lock(>entries, >lock)) {
+   list_del_rcu(>list);
+   spin_unlock(>lock);
+   local_bh_enable();
/* In case anyone still accesses the file, the open/close
 * functions are also incrementing the refcount on their own,
 * so it's safe to remove the entry even if it's in use. */
 #ifdef CONFIG_PROC_FS
+   mutex_lock(>mutex);
if (cn->procdir)
proc_remove(c->pde);
+   mutex_unlock(>mutex);
 #endif
-   list_del_rcu(>list);
-   spin_unlock(>lock);
-   local_bh_enable();
-
return;
}
local_bh_enable();
@@ -278,9 +281,11 @@ clusterip_config_init(struct net *net, const struct ipt_clusterip_tgt_info *i,
 
/* create proc dir entry */
sprintf(buffer, "%pI4", );
+   mutex_lock(>mutex);
c->pde = proc_create_data(buffer, 0600,
  cn->procdir,
  _proc_fops, c);
+   mutex_unlock(>mutex);
if (!c->pde) {
err = -ENOMEM;
goto err;
@@ -832,6 +837,7 @@ static int clusterip_net_init(struct net *net)
pr_err("Unable to proc dir entry\n");
return -ENOMEM;
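The reordering in clusterip_config_entry_put() can be illustrated with a single-threaded model in which a counter stands for "inside a spinlock" and any sleeping call made while it is non-zero is recorded as a bug, the way might_sleep() reports it. This is a simplified sketch: the real code uses refcount_dec_and_lock(), local_bh_disable() and list_del_rcu(), and every name below is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

static int atomic_depth;         /* > 0 means a spinlock is held (atomic context) */
static int sleep_in_atomic_bugs; /* what might_sleep() would have reported */

static void fake_spin_lock(void)   { atomic_depth++; }
static void fake_spin_unlock(void) { atomic_depth--; }

/* mutex_lock() may sleep, so it is only legal outside atomic context. */
static void fake_mutex_lock(void)   { if (atomic_depth) sleep_in_atomic_bugs++; }
static void fake_mutex_unlock(void) { }

/* Stand-in for proc_remove(): it may sleep, like the real one. */
static void fake_proc_remove(bool *pde)
{
	if (atomic_depth)
		sleep_in_atomic_bugs++;
	*pde = false;
}

struct cfg_sketch { int entries; bool on_list; bool pde; };

/* Old ordering: the proc entry is removed while the spinlock is held. */
static void entry_put_old(struct cfg_sketch *c)
{
	fake_spin_lock();
	if (--c->entries == 0) {
		fake_proc_remove(&c->pde);
		c->on_list = false;
		fake_spin_unlock();
		return;
	}
	fake_spin_unlock();
}

/* Patched ordering: unlink under the spinlock, then remove under a mutex. */
static void entry_put_new(struct cfg_sketch *c)
{
	fake_spin_lock();
	if (--c->entries == 0) {
		c->on_list = false;
		fake_spin_unlock();
		fake_mutex_lock();
		fake_proc_remove(&c->pde);
		fake_mutex_unlock();
		return;
	}
	fake_spin_unlock();
}
```

The mutex is allowed to sleep because it is taken only after the spinlock is dropped, and it still serializes removal against proc_create_data() in clusterip_config_init().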

[PATCH nf v3 2/4] netfilter: ipt_CLUSTERIP: remove wrong WARN_ON_ONCE in netns exit routine

2018-11-05 Thread Taehee Yoo
When a network namespace is destroyed, both clusterip_tg_destroy() and
clusterip_net_exit() are called, and clusterip_net_exit() is called
before clusterip_tg_destroy().
Hence the cleanup check in clusterip_net_exit() doesn't make sense:
the configs have not been released yet when it runs.

test commands:
   %ip netns add vm1
   %ip netns exec vm1 bash
   %ip link set lo up
   %iptables -A INPUT -p tcp -i lo -d 192.168.0.5 --dport 80 \
-j CLUSTERIP --new --hashmode sourceip \
--clustermac 01:00:5e:00:00:20 --total-nodes 2 --local-node 1
   %exit
   %ip netns del vm1

splat looks like:
[  341.184508] WARNING: CPU: 1 PID: 87 at net/ipv4/netfilter/ipt_CLUSTERIP.c:840 clusterip_net_exit+0x319/0x380 [ipt_CLUSTERIP]
[  341.184850] Modules linked in: ipt_CLUSTERIP nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_tcpudp iptable_filter bpfilter ip_tables x_tables
[  341.184850] CPU: 1 PID: 87 Comm: kworker/u4:2 Not tainted 4.19.0-rc5+ #16
[  341.227509] Workqueue: netns cleanup_net
[  341.227509] RIP: 0010:clusterip_net_exit+0x319/0x380 [ipt_CLUSTERIP]
[  341.227509] Code: 0f 85 7f fe ff ff 48 c7 c2 80 64 2c c0 be a8 02 00 00 48 c7 c7 a0 63 2c c0 c6 05 18 6e 00 00 01 e8 bc 38 ff f5 e9 5b fe ff ff <0f> 0b e9 33 ff ff ff e8 4b 90 50 f6 e9 2d fe ff ff 48 89 df e8 de
[  341.227509] RSP: 0018:88011086f408 EFLAGS: 00010202
[  341.227509] RAX: dc00 RBX: 11002210de85 RCX: 
[  341.227509] RDX: 11002210de85 RSI: 880110813be8 RDI: ed002210de58
[  341.227509] RBP: 88011086f4d0 R08:  R09: 
[  341.227509] R10:  R11:  R12: 11002210de81
[  341.227509] R13: 880110625a48 R14: 880114cec8c8 R15: 0014
[  341.227509] FS:  () GS:88011660() knlGS:
[  341.227509] CS:  0010 DS:  ES:  CR0: 80050033
[  341.227509] CR2: 7f11fd38e000 CR3: 00013ca16000 CR4: 001006e0
[  341.227509] Call Trace:
[  341.227509]  ? __clusterip_config_find+0x460/0x460 [ipt_CLUSTERIP]
[  341.227509]  ? default_device_exit+0x1ca/0x270
[  341.227509]  ? remove_proc_entry+0x1cd/0x390
[  341.227509]  ? dev_change_net_namespace+0xd00/0xd00
[  341.227509]  ? __init_waitqueue_head+0x130/0x130
[  341.227509]  ops_exit_list.isra.10+0x94/0x140
[  341.227509]  cleanup_net+0x45b/0x900
[ ... ]

v3: add Fourth patch.
v2:
 - use spin_lock_bh() instead of spin_lock() (Pablo Neira Ayuso)
 - add missing dev_mc_add() and dev_mc_del().
 - add Third patch.
v1: Initial patch

Fixes: 613d0776d3fe ("netfilter: exit_net cleanup check added")
Signed-off-by: Taehee Yoo 
---
 net/ipv4/netfilter/ipt_CLUSTERIP.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/ipv4/netfilter/ipt_CLUSTERIP.c b/net/ipv4/netfilter/ipt_CLUSTERIP.c
index e6147ecb006b..e734a57cd9f1 100644
--- a/net/ipv4/netfilter/ipt_CLUSTERIP.c
+++ b/net/ipv4/netfilter/ipt_CLUSTERIP.c
@@ -845,7 +845,6 @@ static void clusterip_net_exit(struct net *net)
cn->procdir = NULL;
 #endif
nf_unregister_net_hook(net, _arp_ops);
-   WARN_ON_ONCE(!list_empty(>configs));
 }
 
 static struct pernet_operations clusterip_net_ops = {
-- 
2.17.1



[PATCH nf v3 1/4] netfilter: ipt_CLUSTERIP: fix deadlock in netns exit routine

2018-11-05 Thread Taehee Yoo
When a network namespace is destroyed, cleanup_net() is called.
cleanup_net() holds pernet_ops_rwsem and then calls each ->exit callback,
so clusterip_tg_destroy() is called from cleanup_net().
clusterip_tg_destroy() calls unregister_netdevice_notifier(), which
also takes pernet_ops_rwsem. Because cleanup_net() already holds
this lock, a deadlock occurs.

After this patch, only one notifier is registered, when the module is
loaded, and all configs are added to a per-net list.

test commands:
   %ip netns add vm1
   %ip netns exec vm1 bash
   %ip link set lo up
   %iptables -A INPUT -p tcp -i lo -d 192.168.0.5 --dport 80 \
-j CLUSTERIP --new --hashmode sourceip \
--clustermac 01:00:5e:00:00:20 --total-nodes 2 --local-node 1
   %exit
   %ip netns del vm1

splat looks like:
[  341.809674] 
[  341.809674] WARNING: possible recursive locking detected
[  341.809674] 4.19.0-rc5+ #16 Tainted: GW
[  341.809674] 
[  341.809674] kworker/u4:2/87 is trying to acquire lock:
[  341.809674] 5da2d519 (pernet_ops_rwsem){}, at: unregister_netdevice_notifier+0x8c/0x460
[  341.809674]
[  341.809674] but task is already holding lock:
[  341.809674] 5da2d519 (pernet_ops_rwsem){}, at: cleanup_net+0x119/0x900
[  341.809674]
[  341.809674] other info that might help us debug this:
[  341.809674]  Possible unsafe locking scenario:
[  341.809674]
[  341.809674]CPU0
[  341.809674]
[  341.809674]   lock(pernet_ops_rwsem);
[  341.809674]   lock(pernet_ops_rwsem);
[  341.809674]
[  341.809674]  *** DEADLOCK ***
[  341.809674]
[  341.809674]  May be due to missing lock nesting notation
[  341.809674]
[  341.809674] 3 locks held by kworker/u4:2/87:
[  341.809674]  #0: d9df6c92 ((wq_completion)"%s""netns"){+.+.}, at: process_one_work+0xafe/0x1de0
[  341.809674]  #1: c2cbcee2 (net_cleanup_work){+.+.}, at: process_one_work+0xb60/0x1de0
[  341.809674]  #2: 5da2d519 (pernet_ops_rwsem){}, at: cleanup_net+0x119/0x900
[  341.809674]
[  341.809674] stack backtrace:
[  341.809674] CPU: 1 PID: 87 Comm: kworker/u4:2 Tainted: GW 4.19.0-rc5+ #16
[  341.809674] Workqueue: netns cleanup_net
[  341.809674] Call Trace:
[ ... ]
[  342.070196]  down_write+0x93/0x160
[  342.070196]  ? unregister_netdevice_notifier+0x8c/0x460
[  342.070196]  ? down_read+0x1e0/0x1e0
[  342.070196]  ? sched_clock_cpu+0x126/0x170
[  342.070196]  ? find_held_lock+0x39/0x1c0
[  342.070196]  unregister_netdevice_notifier+0x8c/0x460
[  342.070196]  ? register_netdevice_notifier+0x790/0x790
[  342.070196]  ? __local_bh_enable_ip+0xe9/0x1b0
[  342.070196]  ? __local_bh_enable_ip+0xe9/0x1b0
[  342.070196]  ? clusterip_tg_destroy+0x372/0x650 [ipt_CLUSTERIP]
[  342.070196]  ? trace_hardirqs_on+0x93/0x210
[  342.070196]  ? __bpf_trace_preemptirq_template+0x10/0x10
[  342.070196]  ? clusterip_tg_destroy+0x372/0x650 [ipt_CLUSTERIP]
[  342.123094]  clusterip_tg_destroy+0x3ad/0x650 [ipt_CLUSTERIP]
[  342.123094]  ? clusterip_net_init+0x3d0/0x3d0 [ipt_CLUSTERIP]
[  342.123094]  ? cleanup_match+0x17d/0x200 [ip_tables]
[  342.123094]  ? xt_unregister_table+0x215/0x300 [x_tables]
[  342.123094]  ? kfree+0xe2/0x2a0
[  342.123094]  cleanup_entry+0x1d5/0x2f0 [ip_tables]
[  342.123094]  ? cleanup_match+0x200/0x200 [ip_tables]
[  342.123094]  __ipt_unregister_table+0x9b/0x1a0 [ip_tables]
[  342.123094]  iptable_filter_net_exit+0x43/0x80 [iptable_filter]
[  342.123094]  ops_exit_list.isra.10+0x94/0x140
[  342.123094]  cleanup_net+0x45b/0x900
[ ... ]

v3: add Fourth patch.
v2:
 - use spin_lock_bh() instead of spin_lock() (Pablo Neira Ayuso)
 - add missing dev_mc_add() and dev_mc_del().
 - add Third patch.
v1: Initial patch

Fixes: 202f59afd441 ("netfilter: ipt_CLUSTERIP: do not hold dev")
Signed-off-by: Taehee Yoo 
---
 net/ipv4/netfilter/ipt_CLUSTERIP.c | 155 -
 1 file changed, 87 insertions(+), 68 deletions(-)

diff --git a/net/ipv4/netfilter/ipt_CLUSTERIP.c b/net/ipv4/netfilter/ipt_CLUSTERIP.c
index 2c8d313ae216..e6147ecb006b 100644
--- a/net/ipv4/netfilter/ipt_CLUSTERIP.c
+++ b/net/ipv4/netfilter/ipt_CLUSTERIP.c
@@ -57,17 +57,14 @@ struct clusterip_config {
enum clusterip_hashmode hash_mode;  /* which hashing mode */
u_int32_t hash_initval; /* hash initialization */
struct rcu_head rcu;
-
+   struct net *net;/* netns for pernet list */
char ifname[IFNAMSIZ];  /* device ifname */
-   struct notifier_block notifier; /* refresh c->ifindex in it */
 };
 
 #ifdef CONFIG_PROC_FS
 static const struct file_operations clusterip_proc_fops;
 #endif
 
-static unsigned int clusterip_net_id __read_mostly;
-
 struct clusterip_net {
struct list_head configs;
/* lock protects the configs list */
@@ -78,16 +75,30 @@ struct c

[PATCH nf v3 0/4] netfilter: ipt_CLUSTERIP: fix bugs in ipt_CLUSTERIP

2018-11-05 Thread Taehee Yoo
This patchset fixes bugs in ipt_CLUSTERIP.

The first patch fixes a deadlock when a netns is destroyed.
When a netns is destroyed, cleanup_net() is called.
That function calls the ->exit callback of each pernet_ops.
The ->exit callback of ipt_CLUSTERIP takes the same lock as cleanup_net(),
so a deadlock occurs.

The second patch removes a wrong WARN_ON_ONCE() in clusterip_net_exit().
The WARN_ON_ONCE() in clusterip_net_exit() is meant to check that cleanup
has completed, but clusterip_net_exit() is called earlier than the
cleanup function (clusterip_tg_destroy()), so it can't check that.

The third patch fixes a sleep-in-atomic bug when a config structure is
destroyed. To synchronize creation and removal of the proc entry,
proc_remove() is called inside a spin_lock. But proc_remove() can sleep,
so it must not be called inside a spin_lock.

The fourth patch disallows incompatible MAC address config settings.
If a config with the same destination IP address already exists,
that config is reused, so the MAC address should also be the same.
However, there is no routine that checks the MAC address.

v3: add Fourth patch.
v2:
 - use spin_lock_bh() instead of spin_lock() (Pablo Neira Ayuso)
 - add missing dev_mc_add() and dev_mc_del().
 - add Third patch.
v1: Initial patch

Taehee Yoo (4):
  netfilter: ipt_CLUSTERIP: fix deadlock in netns exit routine
  netfilter: ipt_CLUSTERIP: remove wrong WARN_ON_ONCE in netns exit
routine
  netfilter: ipt_CLUSTERIP: fix sleep-in-atomic bug in
clusterip_config_entry_put()
  netfilter: ipt_CLUSTERIP: check MAC address when duplicate config is
set

 net/ipv4/netfilter/ipt_CLUSTERIP.c | 178 +
 1 file changed, 103 insertions(+), 75 deletions(-)

-- 
2.17.1



[PATCH nf v2 3/3] netfilter: nf_conncount: fix unexpected permanent node of list.

2018-11-04 Thread Taehee Yoo
When list->count is 0, the list is deleted by GC.
But list->count never reaches 0, because its initial value is 1
and it is increased when a node is inserted.
So the initial value of list->count should be 0.

Originally, GC only found zero-count lists while deleting nodes and
decreasing the count. As a result, it could not find lists into which
no node was ever inserted (because of an allocation failure, etc.).
To solve this problem, the GC routine now also finds zero-count lists
without deleting a node.

v2:
 - Use spin_lock_bh() in nf_conncount_add() (Pablo Neira Ayuso)
 - Add Third patch.
v1: Initial patch

Fixes: cb2b36f5a97d ("netfilter: nf_conncount: Switch to plain list")
Signed-off-by: Taehee Yoo 
---
 net/netfilter/nf_conncount.c | 17 ++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/net/netfilter/nf_conncount.c b/net/netfilter/nf_conncount.c
index cb33709138df..b5dc749112a1 100644
--- a/net/netfilter/nf_conncount.c
+++ b/net/netfilter/nf_conncount.c
@@ -144,8 +144,10 @@ static bool conn_free(struct nf_conncount_list *list,
list->count--;
conn->dead = true;
list_del_rcu(>node);
-   if (list->count == 0)
+   if (list->count == 0) {
+   list->dead = true;
free_entry = true;
+   }
 
spin_unlock_bh(>list_lock);
call_rcu(>rcu_head, __conn_free);
@@ -248,7 +250,7 @@ void nf_conncount_list_init(struct nf_conncount_list *list)
 {
spin_lock_init(>list_lock);
INIT_LIST_HEAD(>head);
-   list->count = 1;
+   list->count = 0;
list->dead = false;
 }
 EXPORT_SYMBOL_GPL(nf_conncount_list_init);
@@ -262,6 +264,7 @@ bool nf_conncount_gc_list(struct net *net,
struct nf_conn *found_ct;
unsigned int collected = 0;
bool free_entry = false;
+   bool ret = false;
 
list_for_each_entry_safe(conn, conn_n, >head, node) {
found = find_or_evict(net, list, conn, _entry);
@@ -291,7 +294,14 @@ bool nf_conncount_gc_list(struct net *net,
if (collected > CONNCOUNT_GC_MAX_NODES)
return false;
}
-   return false;
+
+   spin_lock_bh(>list_lock);
+   if (!list->count) {
+   list->dead = true;
+   ret = true;
+   }
+   spin_unlock_bh(>list_lock);
+   return ret;
 }
 EXPORT_SYMBOL_GPL(nf_conncount_gc_list);
 
@@ -417,6 +427,7 @@ insert_tree(struct net *net,
nf_conncount_list_init(>list);
list_add(>node, >list.head);
count = 1;
+   rbconn->list.count = count;
 
rb_link_node(>node, parent, rbnode);
rb_insert_color(>node, root);
-- 
2.17.1
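The counting bug can be seen with a tiny model of the list counter. This is a simplified single-threaded sketch, assuming only the count and dead fields matter; `struct cc_list` and the cc_* helpers are hypothetical stand-ins for nf_conncount_list and its functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of nf_conncount_list: only the counter and dead flag. */
struct cc_list { unsigned int count; bool dead; };

static void cc_list_init(struct cc_list *l, unsigned int initial)
{
	l->count = initial;
	l->dead = false;
}

/* Like nf_conncount_add(): one more tracked connection. */
static void cc_add(struct cc_list *l)
{
	l->count++;
}

/* Like conn_free(): drop one connection; the list dies when it empties. */
static bool cc_free(struct cc_list *l)
{
	l->count--;
	if (l->count == 0) {
		l->dead = true;
		return true;
	}
	return false;
}

/* GC pass after evicting stale nodes: also reclaim lists that are already empty. */
static bool cc_gc(struct cc_list *l)
{
	if (l->count == 0) {
		l->dead = true;
		return true;
	}
	return false;
}
```

Starting from 1, adding and then freeing one connection leaves the count at 1, so the list is never reclaimed; starting from 0 the same sequence empties it, and the extra GC check also catches a list that never received a node.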



[PATCH nf v2 2/3] netfilter: nf_conncount: fix list_del corruption in conn_free

2018-11-04 Thread Taehee Yoo
nf_conncount_tuple is an element of nft_connlimit and is deleted by
conn_free(). Elements can be deleted by both the GC routine and the
data path functions (nf_conncount_lookup(), nf_conncount_add()), and
all of them call conn_free() to free elements.
But conn_free() only protects the list, not each element,
so list_del corruption can occur.

conn_free() doesn't check whether an element is already deleted.
To protect elements, a dead flag is added.
When an element is deleted, its dead flag is set.
Only conn_free() can delete elements, so the list lock and the
dead flag together are enough to protect it.

test commands:
   %nft add table ip filter
   %nft add chain ip filter input { type filter hook input priority 0\; }
   %nft add rule filter input meter test { ip id ct count over 2 } counter

splat looks like:
[ 1779.495778] list_del corruption, 8800b6e12008->prev is LIST_POISON2 (dead0200)
[ 1779.505453] [ cut here ]
[ 1779.506260] kernel BUG at lib/list_debug.c:50!
[ 1779.515831] invalid opcode:  [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[ 1779.516772] CPU: 0 PID: 33 Comm: kworker/0:2 Not tainted 4.19.0-rc6+ #22
[ 1779.516772] Workqueue: events_power_efficient nft_rhash_gc [nf_tables_set]
[ 1779.516772] RIP: 0010:__list_del_entry_valid+0xd8/0x150
[ 1779.516772] Code: 39 48 83 c4 08 b8 01 00 00 00 5b 5d c3 48 89 ea 48 c7 c7 00 c3 5b 98 e8 0f dc 40 ff 0f 0b 48 c7 c7 60 c3 5b 98 e8 01 dc 40 ff <0f> 0b 48 c7 c7 c0 c3 5b 98 e8 f3 db 40 ff 0f 0b 48 c7 c7 20 c4 5b
[ 1779.516772] RSP: 0018:880119127420 EFLAGS: 00010286
[ 1779.516772] RAX: 004e RBX: dead0200 RCX: 
[ 1779.516772] RDX: 004e RSI: 0008 RDI: ed0023224e7a
[ 1779.516772] RBP: 88011934bc10 R08: ed002367cea9 R09: ed002367cea9
[ 1779.516772] R10: 0001 R11: ed002367cea8 R12: 8800b6e12008
[ 1779.516772] R13: 8800b6e12010 R14: 88011934bc20 R15: 8800b6e12008
[ 1779.516772] FS:  () GS:88011b20() knlGS:
[ 1779.516772] CS:  0010 DS:  ES:  CR0: 80050033
[ 1779.516772] CR2: 7fc876534010 CR3: 00010da16000 CR4: 001006f0
[ 1779.516772] Call Trace:
[ 1779.516772]  conn_free+0x9f/0x2b0 [nf_conncount]
[ 1779.516772]  ? nf_ct_tmpl_alloc+0x2a0/0x2a0 [nf_conntrack]
[ 1779.516772]  ? nf_conncount_add+0x520/0x520 [nf_conncount]
[ 1779.516772]  ? do_raw_spin_trylock+0x1a0/0x1a0
[ 1779.516772]  ? do_raw_spin_trylock+0x10/0x1a0
[ 1779.516772]  find_or_evict+0xe5/0x150 [nf_conncount]
[ 1779.516772]  nf_conncount_gc_list+0x162/0x360 [nf_conncount]
[ 1779.516772]  ? nf_conncount_lookup+0xee0/0xee0 [nf_conncount]
[ 1779.516772]  ? _raw_spin_unlock_irqrestore+0x45/0x50
[ 1779.516772]  ? trace_hardirqs_off+0x6b/0x220
[ 1779.516772]  ? trace_hardirqs_on_caller+0x220/0x220
[ 1779.516772]  nft_rhash_gc+0x16b/0x540 [nf_tables_set]
[ ... ]

v2:
 - Use spin_lock_bh() in nf_conncount_add() (Pablo Neira Ayuso)
 - Add Third patch.
v1: Initial patch

Fixes: 5c789e131cbb ("netfilter: nf_conncount: Add list lock and gc worker, and RCU for init tree search")
Signed-off-by: Taehee Yoo 
---
 net/netfilter/nf_conncount.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/nf_conncount.c b/net/netfilter/nf_conncount.c
index 71b1f4f99580..cb33709138df 100644
--- a/net/netfilter/nf_conncount.c
+++ b/net/netfilter/nf_conncount.c
@@ -49,6 +49,7 @@ struct nf_conncount_tuple {
struct nf_conntrack_zonezone;
int cpu;
u32 jiffies32;
+   booldead;
struct rcu_head rcu_head;
 };
 
@@ -106,6 +107,7 @@ nf_conncount_add(struct nf_conncount_list *list,
conn->zone = *zone;
conn->cpu = raw_smp_processor_id();
conn->jiffies32 = (u32)jiffies;
+   conn->dead = false;
spin_lock_bh(>list_lock);
if (list->dead == true) {
kmem_cache_free(conncount_conn_cachep, conn);
@@ -134,12 +136,13 @@ static bool conn_free(struct nf_conncount_list *list,
 
spin_lock_bh(>list_lock);
 
-   if (list->count == 0) {
+   if (conn->dead) {
spin_unlock_bh(>list_lock);
-return free_entry;
+   return free_entry;
}
 
list->count--;
+   conn->dead = true;
list_del_rcu(>node);
if (list->count == 0)
free_entry = true;
-- 
2.17.1
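Why the dead flag prevents the corruption can be shown with a hand-rolled doubly linked list: a second unlink of the same node would follow its cleared prev pointer, while the flag turns it into a no-op. This is a single-threaded userspace sketch; the kernel additionally holds list->list_lock and uses list_del_rcu(), and `conn_free_sketch()` and the list helpers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct node { struct node *prev, *next; bool dead; };
struct dlist { struct node head; unsigned int count; };

static void dlist_init(struct dlist *l)
{
	l->head.prev = l->head.next = &l->head;
	l->count = 0;
}

static void dlist_add_tail(struct dlist *l, struct node *n)
{
	n->prev = l->head.prev;
	n->next = &l->head;
	l->head.prev->next = n;
	l->head.prev = n;
	n->dead = false;
	l->count++;
}

/* Returns true if this call unlinked the node, false if it was already gone. */
static bool conn_free_sketch(struct dlist *l, struct node *n)
{
	/* The kernel takes list->list_lock here; this sketch is single-threaded. */
	if (n->dead)
		return false; /* already freed by another path: do nothing */
	n->dead = true;
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = NULL; /* list_del_rcu() poisons ->prev with LIST_POISON2 */
	l->count--;
	return true;
}
```

Without the `n->dead` check, the second call would dereference the cleared prev pointer, which is the userspace analogue of the LIST_POISON2 report in the splat.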



[PATCH nf v2 1/3] netfilter: nf_conncount: use spin_lock_bh instead of spin_lock

2018-11-04 Thread Taehee Yoo
conn_free() takes its lock with spin_lock(), and it is called by both
nf_conncount_lookup() and nf_conncount_gc_list().
nf_conncount_lookup() runs in bottom-half context while
nf_conncount_gc_list() runs in process context, so spin_lock() is not
safe: a softirq can interrupt the GC path while it holds the lock and
then spin on that same lock.
Hence conn_free() should use spin_lock_bh() instead of spin_lock().

test commands:
   %nft add table ip filter
   %nft add chain ip filter input { type filter hook input priority 0\; }
   %nft add rule filter input meter test { ip saddr ct count over 2 } \
   counter

splat looks like:
[  461.996507] 
[  461.998999] WARNING: inconsistent lock state
[  461.998999] 4.19.0-rc6+ #22 Not tainted
[  461.998999] 
[  461.998999] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
[  461.998999] kworker/0:2/134 [HC0[0]:SC0[0]:HE1:SE1] takes:
[  461.998999] a71a559a (&(&list->list_lock)->rlock){+.?.}, at: conn_free+0x69/0x2b0 [nf_conncount]
[  461.998999] {IN-SOFTIRQ-W} state was registered at:
[  461.998999]   _raw_spin_lock+0x30/0x70
[  461.998999]   nf_conncount_add+0x28a/0x520 [nf_conncount]
[  461.998999]   nft_connlimit_eval+0x401/0x580 [nft_connlimit]
[  461.998999]   nft_dynset_eval+0x32b/0x590 [nf_tables]
[  461.998999]   nft_do_chain+0x497/0x1430 [nf_tables]
[  461.998999]   nft_do_chain_ipv4+0x255/0x330 [nf_tables]
[  461.998999]   nf_hook_slow+0xb1/0x160
[ ... ]
[  461.998999] other info that might help us debug this:
[  461.998999]  Possible unsafe locking scenario:
[  461.998999]
[  461.998999]CPU0
[  461.998999]
[  461.998999]   lock(&(&list->list_lock)->rlock);
[  461.998999]   <Interrupt>
[  461.998999]     lock(&(&list->list_lock)->rlock);
[  461.998999]
[  461.998999]  *** DEADLOCK ***
[  461.998999]
[ ... ]

v2:
 - Use spin_lock_bh() in nf_conncount_add() (Pablo Neira Ayuso)
 - Add Third patch.
v1: Initial patch

Fixes: 5c789e131cbb ("netfilter: nf_conncount: Add list lock and gc worker, and RCU for init tree search")
Signed-off-by: Taehee Yoo 
---
 net/netfilter/nf_conncount.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/net/netfilter/nf_conncount.c b/net/netfilter/nf_conncount.c
index 02ca7df793f5..71b1f4f99580 100644
--- a/net/netfilter/nf_conncount.c
+++ b/net/netfilter/nf_conncount.c
@@ -106,15 +106,15 @@ nf_conncount_add(struct nf_conncount_list *list,
conn->zone = *zone;
conn->cpu = raw_smp_processor_id();
conn->jiffies32 = (u32)jiffies;
-   spin_lock(&list->list_lock);
+   spin_lock_bh(&list->list_lock);
if (list->dead == true) {
kmem_cache_free(conncount_conn_cachep, conn);
-   spin_unlock(&list->list_lock);
+   spin_unlock_bh(&list->list_lock);
return NF_CONNCOUNT_SKIP;
}
list_add_tail(&conn->node, &list->head);
list->count++;
-   spin_unlock(&list->list_lock);
+   spin_unlock_bh(&list->list_lock);
return NF_CONNCOUNT_ADDED;
 }
 EXPORT_SYMBOL_GPL(nf_conncount_add);
@@ -132,10 +132,10 @@ static bool conn_free(struct nf_conncount_list *list,
 {
bool free_entry = false;
 
-   spin_lock(&list->list_lock);
+   spin_lock_bh(&list->list_lock);
 
if (list->count == 0) {
-   spin_unlock(&list->list_lock);
+   spin_unlock_bh(&list->list_lock);
 return free_entry;
}
 
@@ -144,7 +144,7 @@ static bool conn_free(struct nf_conncount_list *list,
if (list->count == 0)
free_entry = true;
 
-   spin_unlock(&list->list_lock);
+   spin_unlock_bh(&list->list_lock);
call_rcu(&conn->rcu_head, __conn_free);
return free_entry;
 }
-- 
2.17.1
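Here is a toy, single-CPU Python model of the scenario that lockdep reports above. It shows why a plain spin_lock() in conn_free() can deadlock. This is only a sketch under stated assumptions:

- SimulatedCpu, gc_conn_free() and softirq_conn_free() are hypothetical names.
- threading.Lock stands in for list->list_lock.
- A non-blocking acquire reports the deadlock instead of spinning forever.
- The model captures only one rule: a softirq may run on the same CPU whenever bottom halves are enabled.

```python
import threading


class SimulatedCpu:
    """Toy single-CPU model: a softirq may run whenever BHs are enabled."""

    def __init__(self):
        self.list_lock = threading.Lock()  # stands in for list->list_lock
        self.bh_disabled = 0               # local_bh_disable() nesting depth
        self.pending_softirq = None        # packet-path work waiting to run

    def maybe_run_softirq(self):
        # Softirqs only run on this CPU while BHs are enabled.
        if self.pending_softirq and not self.bh_disabled:
            work, self.pending_softirq = self.pending_softirq, None
            return work()
        return "ok"

    def softirq_conn_free(self):
        # Packet path on the same CPU tries to take the same lock.
        # A real spinlock would spin forever; the non-blocking acquire lets
        # the model report the deadlock instead of hanging.
        if not self.list_lock.acquire(blocking=False):
            return "deadlock"
        self.list_lock.release()
        return "ok"


def gc_conn_free(cpu, use_bh):
    """Process-context caller (e.g. the GC worker) taking list_lock."""
    if use_bh:
        cpu.bh_disabled += 1                     # spin_lock_bh() disables BHs first
    cpu.list_lock.acquire()
    cpu.pending_softirq = cpu.softirq_conn_free  # a packet arrives mid-section
    result = cpu.maybe_run_softirq()             # softirq may preempt us here
    cpu.list_lock.release()
    if use_bh:
        cpu.bh_disabled -= 1                     # spin_unlock_bh() re-enables BHs
        result = cpu.maybe_run_softirq()         # deferred softirq runs now
    return result
```

With use_bh=False the model returns "deadlock". With use_bh=True the packet-path work is deferred until the lock is dropped, so it returns "ok". That deferral is what spin_lock_bh() provides.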



[PATCH nf v2 0/3] netfilter: nf_conncount: fix bugs in conn_free

2018-11-04 Thread Taehee Yoo
This patch series fixes three bugs in nf_conncount.

The first patch fixes an inconsistent lock state in conn_free().
conn_free() is called from both BH and process context, so
spin_lock_bh() should be used.

The second patch fixes an unsafe locking scenario for list elements.
conn_free() can't prevent a list element from being deleted twice,
so a dead flag is added.

The third patch fixes a list node that is unexpectedly never removed.
Nodes of an nf_conncount list should be removed by GC, but that never
happens: the initial count value is 1 and it never reaches zero,
so GC doesn't remove the node.

Common test commands:
   %nft add table ip filter
   %nft add chain ip filter input { type filter hook input priority 0\; }
   %nft add rule filter input meter test { ip saddr ct count over 2 } \
   counter

v2:
 - Use spin_lock_bh() in nf_conncount_add() (Pablo Neira Ayuso)
 - Add Third patch.
v1: Initial patch

Taehee Yoo (3):
  netfilter: nf_conncount: use spin_lock_bh instead of spin_lock
  netfilter: nf_conncount: fix list_del corruption in conn_free
  netfilter: nf_conncount: fix unexpected permanent node of list.

 net/netfilter/nf_conncount.c | 36 +---
 1 file changed, 25 insertions(+), 11 deletions(-)

-- 
2.17.1



Re: [PATCH nf] netfilter: xt_RATEEST: remove netns exit routine

2018-11-03 Thread Taehee Yoo
On Sat, 3 Nov 2018 at 22:47, Pablo Neira Ayuso  wrote:
>
> Hi Taehee!
>
> On Wed, Oct 31, 2018 at 03:22:22AM +0900, Taehee Yoo wrote:
> > On Tue, 30 Oct 2018 at 08:00, Pablo Neira Ayuso  wrote:
> > >
> >
> > Hi Pablo,
> > Thank you for review!
> >
> > > On Fri, Oct 19, 2018 at 12:27:57AM +0900, Taehee Yoo wrote:
> > > > xt_rateest_net_exit() was added to check whether rules are flushed
> > > > successfully. but ->net_exit() callback is called earlier than
> > > > ->destroy() callback.
> > > > So that ->net_exit() callback can't check that.
> > > >
> > > > test commands:
> > > >%ip netns add vm1
> > > >%ip netns exec vm1 iptables -t mangle -I PREROUTING -p udp \
> > > >  --dport  -j RATEEST --rateest-name ap \
> > > >  --rateest-interval 250ms --rateest-ewma 0.5s
> > > >%ip netns del vm1
> > >
> > > Hm, I cannot reproduce this here.
> > >
> > > I can see iptables-tests.py with -N fails to load entries:
> > >
> > > # ip netns exec test xtables-legacy-multi iptables -A INPUT -m 
> > > rateest --rateest RE1 --rateest-lt --rateest-bps 8bit
> > > iptables: No chain/target/match by that name.
> > >
> > > but not this warning, probably I'm missing instrumention, something
> > > not enabled here.
> > >
> >
> > I think you need RE1 RATEEST entry because rateest match needs RATEEST 
> > entry.
> > So that below command is needed.
> >%ip netns exec test xtables-legacy-multi iptables -t mangle -I
> > PREROUTING -p udp \
> > --dport  -j RATEEST --rateest-name RE1
> > --rateest-interval 250ms --rateest-ewma 0.5s
> > RE1 entry is created by above command.
> > Then, your command would not be failed.
>
> OK, running here:
>
> # iptables-tests.py -N
>
> [ After you fix for this for iptables-tests.py ;-) ]
>
> I don't hit this splat here, can you hit it there? Probably there's
> something in my testbed that makes thing behave differently. So I
> cannot still reproduce it, hm.

Oh, I'm sorry; my original test commands don't always trigger this splat.
I found a condition that triggers it.
The following command sequence triggers the splat:
   %modprobe -rv iptable_filter
   %modprobe -rv xt_RATEEST
   %modprobe iptable_filter
   %modprobe xt_RATEEST
   %iptables-test.py -N ./extensions/libxt_RATEEST.t

The following sequence, which loads xt_RATEEST before iptable_filter,
does not trigger it:
   %modprobe -rv iptable_filter
   %modprobe -rv xt_RATEEST
   %modprobe xt_RATEEST
   %modprobe iptable_filter
   %iptables-test.py -N ./extensions/libxt_RATEEST.t

Thanks!


[PATCH iptables] iptables: iptables-test: fix netns test

2018-11-01 Thread Taehee Yoo
The libxt_rateest test always fails because its dependent command is not
executed in the netns:
(@iptables -I INPUT -j RATEEST --rateest-name RE1 --rateest-interval \
 250.0ms --rateest-ewmalog 500.0ms)
After this patch, the netns is created first and the dependent command is
executed inside it. Then the test commands are executed.

Fixes: 0123183f43a9 ("iptables-test: add -N option to exercise netns removal path")
Reported-by: Pablo Neira Ayuso 
Signed-off-by: Taehee Yoo 
---
 iptables-test.py | 22 +-
 1 file changed, 9 insertions(+), 13 deletions(-)

diff --git a/iptables-test.py b/iptables-test.py
index 5e6bfb7e..331fe59d 100755
--- a/iptables-test.py
+++ b/iptables-test.py
@@ -147,12 +147,6 @@ def run_test(iptables, rule, rule_save, res, filename, lineno, netns):
 
     return delete_rule(iptables, rule, filename, lineno)
 
-def run_test_netns(iptables, rule, rule_save, res, filename, lineno):
-    execute_cmd("ip netns add iptables-container-test", filename, lineno)
-    ret = run_test(iptables, rule, rule_save, res, filename, lineno, True)
-    execute_cmd("ip netns del iptables-container-test", filename, lineno)
-    return ret
-
 def execute_cmd(cmd, filename, lineno):
     '''
     Executes a command, checking for segfaults and returning the command exit
@@ -207,6 +201,9 @@ def run_test_file(filename, netns):
     table = ""
     total_test_passed = True
 
+    if netns:
+        execute_cmd("ip netns add iptables-container-test", filename, 0)
+
     for lineno, line in enumerate(f):
         if line[0] == "#":
             continue
@@ -218,6 +215,8 @@ def run_test_file(filename, netns):
         # external non-iptables invocation, executed as is.
         if line[0] == "@":
             external_cmd = line.rstrip()[1:]
+            if netns:
+                external_cmd = "ip netns exec iptables-container-test " + EXECUTEABLE + " " + external_cmd
             execute_cmd(external_cmd, filename, lineno)
             continue
 
@@ -245,13 +244,8 @@ def run_test_file(filename, netns):
                 rule_save = chain + " " + item[1]
 
             res = item[2].rstrip()
-
-            if netns:
-                ret = run_test_netns(iptables, rule, rule_save,
-                                     res, filename, lineno + 1)
-            else:
-                ret = run_test(iptables, rule, rule_save,
-                               res, filename, lineno + 1, False)
+            ret = run_test(iptables, rule, rule_save,
+                           res, filename, lineno + 1, netns)
 
             if ret < 0:
                 test_passed = False
@@ -261,6 +255,8 @@ def run_test_file(filename, netns):
         if test_passed:
             passed += 1
 
+    if netns:
+        execute_cmd("ip netns del iptables-container-test", filename, 0)
     if total_test_passed:
         print filename + ": " + Colors.GREEN + "OK" + Colors.ENDC
 
-- 
2.17.1
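The iptables-test.py fix above establishes a fixed order of operations:

1. Create the netns once.
2. Run the "@" dependency commands inside it.
3. Run the rules.
4. Delete the netns.

That ordering can be sketched as a pure function that only plans commands and does not execute them. plan_commands() is a hypothetical helper, and its handling of rule lines is deliberately simplified. The real run_test() builds and checks each rule differently.

```python
NETNS = "iptables-container-test"


def plan_commands(test_lines, netns, executable="iptables"):
    """Return, in order, the commands a test file would run (hypothetical)."""
    cmds = []
    if netns:
        cmds.append("ip netns add " + NETNS)  # created once, up front
    for line in test_lines:
        if line.startswith("#"):
            continue
        if line.startswith("@"):
            cmd = line.rstrip()[1:]
            if netns:
                # The dependency must live in the same netns as the tests.
                cmd = "ip netns exec " + NETNS + " " + executable + " " + cmd
            cmds.append(cmd)
            continue
        # Simplified: treat every other line as a rule passed to the binary.
        rule = executable + " " + line.rstrip()
        if netns:
            rule = "ip netns exec " + NETNS + " " + rule
        cmds.append(rule)
    if netns:
        cmds.append("ip netns del " + NETNS)  # removed once, at the end
    return cmds
```

Before the fix, the "@" command ran in the initial netns while the rules ran in a freshly created one. Any rule that depended on it, such as `-m rateest --rateest RE1`, therefore could not find its entry.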



Re: [PATCH nf 1/2] netfilter: nf_conncount: use spin_lock_bh instead of spin_lock

2018-10-30 Thread Taehee Yoo
Thanks to all reviewers!

On Tue, 30 Oct 2018 at 08:41, Florian Westphal  wrote:
>
> Pablo Neira Ayuso  wrote:
> > On Thu, Oct 25, 2018 at 11:56:12PM +0900, Taehee Yoo wrote:
> > > conn_free() holds lock with spin_lock(). and it is called by both
> > > nf_conncount_lookup() and nf_conncount_gc_list().
> > > nf_conncount_lookup() is bottom-half context and nf_conncount_gc_list()
> > > is process context. so that spin_lock() is not safe.
> > > Hence conn_free() should use spin_lock_bh() instead of spin_lock().
> > >
> > > test commands:
> > >%nft add table ip filter
> > >%nft add chain ip filter input { type filter hook input priority 0\; }
> > >%nft add rule filter input meter test { ip saddr ct count over 2 } \
> > >counter
> > >
> > > splat looks like:
> > > [  461.996507] 
> > > [  461.998999] WARNING: inconsistent lock state
> > > [  461.998999] 4.19.0-rc6+ #22 Not tainted
> > > [  461.998999] 
> > > [  461.998999] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
> > > [  461.998999] kworker/0:2/134 [HC0[0]:SC0[0]:HE1:SE1] takes:
> > > [  461.998999] a71a559a (&(&list->list_lock)->rlock){+.?.}, at: conn_free+0x69/0x2b0 [nf_conncount]
> > > [  461.998999] {IN-SOFTIRQ-W} state was registered at:
> > > [  461.998999]   _raw_spin_lock+0x30/0x70
> > > [  461.998999]   nf_conncount_add+0x28a/0x520 [nf_conncount]
> > > [  461.998999]   nft_connlimit_eval+0x401/0x580 [nft_connlimit]
> > > [  461.998999]   nft_dynset_eval+0x32b/0x590 [nf_tables]
> > > [  461.998999]   nft_do_chain+0x497/0x1430 [nf_tables]
> > > [  461.998999]   nft_do_chain_ipv4+0x255/0x330 [nf_tables]
> > > [  461.998999]   nf_hook_slow+0xb1/0x160
> > > [ ... ]
> > > [  461.998999] other info that might help us debug this:
> > > [  461.998999]  Possible unsafe locking scenario:
> > > [  461.998999]
> > > [  461.998999]CPU0
> > > [  461.998999]
> > > [  461.998999]   lock(&(&list->list_lock)->rlock);
> > > [  461.998999]   <Interrupt>
> > > [  461.998999]     lock(&(&list->list_lock)->rlock);
> > > [  461.998999]
> > > [  461.998999]  *** DEADLOCK ***
> > > [  461.998999]
> > > [ ... ]
> >
> > nf_conncount_add() also holds spin_lock while allocate from there is
> > GFP_ATOMIC given this is called from packet path.
>
> Good catch, yes, this needs spin_lock_bh variant too.
>
> > tree_nodes_free() is also called from user context without _bh
> > disabled.
>
> This one is fine, both call sites hold spin_lock_bh(&nf_conncount_locks[x]).

I will test it and then send a v2 patch!

Thanks!


Re: [PATCH nf] netfilter: xt_RATEEST: remove netns exit routine

2018-10-30 Thread Taehee Yoo
On Tue, 30 Oct 2018 at 08:00, Pablo Neira Ayuso  wrote:
>

Hi Pablo,
Thank you for review!

> On Fri, Oct 19, 2018 at 12:27:57AM +0900, Taehee Yoo wrote:
> > xt_rateest_net_exit() was added to check whether rules are flushed
> > successfully. but ->net_exit() callback is called earlier than
> > ->destroy() callback.
> > So that ->net_exit() callback can't check that.
> >
> > test commands:
> >%ip netns add vm1
> >%ip netns exec vm1 iptables -t mangle -I PREROUTING -p udp \
> >  --dport  -j RATEEST --rateest-name ap \
> >  --rateest-interval 250ms --rateest-ewma 0.5s
> >%ip netns del vm1
>
> Hm, I cannot reproduce this here.
>
> I can see iptables-tests.py with -N fails to load entries:
>
> # ip netns exec test xtables-legacy-multi iptables -A INPUT -m rateest 
> --rateest RE1 --rateest-lt --rateest-bps 8bit
> iptables: No chain/target/match by that name.
>
> but not this warning, probably I'm missing instrumention, something
> not enabled here.
>

I think you need an RE1 RATEEST entry, because the rateest match requires
a RATEEST entry. So the command below is needed:
   %ip netns exec test xtables-legacy-multi iptables -t mangle -I \
   PREROUTING -p udp --dport  -j RATEEST --rateest-name RE1 \
   --rateest-interval 250ms --rateest-ewma 0.5s
This command creates the RE1 entry; after that, your command should no
longer fail.

Thanks!

> > splat looks like:
> > [  668.813518] WARNING: CPU: 0 PID: 87 at net/netfilter/xt_RATEEST.c:210 
> > xt_rateest_net_exit+0x210/0x340 [xt_RATEEST]
> > [  668.813518] CPU: 0 PID: 87 Comm: kworker/u4:2 Not tainted 4.19.0-rc7+ #21
> > [  668.813518] Workqueue: netns cleanup_net
> > [  668.813518] RIP: 0010:xt_rateest_net_exit+0x210/0x340 [xt_RATEEST]
> > [  668.813518] Code: 00 48 8b 85 30 ff ff ff 4c 8b 23 80 38 00 0f 85 24 01 
> > 00 00 48 8b 85 30 ff ff ff 4d 85 e4 4c 89 a5 58 ff ff ff c6 00 f8 74 b2 
> > <0f> 0b 48 83 c3 08 4c 39 f3 75 b0 48 b8 00 00 00 00 00 fc ff df 49
> > [  668.813518] RSP: 0018:8801156c73f8 EFLAGS: 00010282
> > [  668.813518] RAX: ed0022ad8e85 RBX: 880118928e98 RCX: 
> > 5db8012a
> > [  668.813518] RDX: 8801156c7428 RSI: cb1d185f RDI: 
> > 880115663b74
> > [  668.813518] RBP: 8801156c74d0 R08: 8801156633c0 R09: 
> > 1100236440be
> > [  668.813518] R10: 0001 R11: ed002367d852 R12: 
> > 880115142b08
> > [  668.813518] R13: 110022ad8e81 R14: 880118928ea8 R15: 
> > dc00
> > [  668.813518] FS:  () GS:88011b20() 
> > knlGS:
> > [  668.813518] CS:  0010 DS:  ES:  CR0: 80050033
> > [  668.813518] CR2: 563aa69f4f28 CR3: 000105a16000 CR4: 
> > 001006f0
> > [  668.813518] Call Trace:
> > [  668.813518]  ? unregister_netdevice_many+0xe0/0xe0
> > [  668.813518]  ? xt_rateest_net_init+0x2c0/0x2c0 [xt_RATEEST]
> > [  668.813518]  ? default_device_exit+0x1ca/0x270
> > [  668.813518]  ? remove_proc_entry+0x1cd/0x390
> > [  668.813518]  ? dev_change_net_namespace+0xd00/0xd00
> > [  668.813518]  ? __init_waitqueue_head+0x130/0x130
> > [  668.813518]  ops_exit_list.isra.10+0x94/0x140
> > [  668.813518]  cleanup_net+0x45b/0x900
> > [  668.813518]  ? net_drop_ns+0x110/0x110
> > [  668.813518]  ? swapgs_restore_regs_and_return_to_usermode+0x3c/0x80
> > [  668.813518]  ? save_trace+0x300/0x300
> > [  668.813518]  ? lock_acquire+0x196/0x470
> > [  668.813518]  ? lock_acquire+0x196/0x470
> > [  668.813518]  ? process_one_work+0xb60/0x1de0
> > [  668.813518]  ? _raw_spin_unlock_irq+0x29/0x40
> > [  668.813518]  ? _raw_spin_unlock_irq+0x29/0x40
> > [  668.813518]  ? __lock_acquire+0x4500/0x4500
> > [  668.813518]  ? __lock_is_held+0xb4/0x140
> > [  668.813518]  process_one_work+0xc13/0x1de0
> > [  668.813518]  ? pwq_dec_nr_in_flight+0x3c0/0x3c0
> > [  668.813518]  ? set_load_weight+0x270/0x270
> > [ ... ]
> >
> > Fixes: 3427b2ab63fa ("netfilter: make xt_rateest hash table per net")
> > Signed-off-by: Taehee Yoo 
> > ---
> >  net/netfilter/xt_RATEEST.c | 10 --
> >  1 file changed, 10 deletions(-)
> >
> > diff --git a/net/netfilter/xt_RATEEST.c b/net/netfilter/xt_RATEEST.c
> > index dec843cadf46..9e05c86ba5c4 100644
> > --- a/net/netfilter/xt_RATEEST.c
> > +++ b/net/netfilter/xt_RATEEST.c
> > @@ -201,18 +201,8 @@ static __net_init int xt_rateest_net_init(struct net *net)
> >   return 0;
> >  }
> >
> > -static void __net_exit xt_rateest_net_exit(struct net *net)
> > -{
> > - struct xt_rateest_net *xn = net_generic(net, xt_rateest_id);
> > - int i;
> > -
> > - for (i = 0; i < ARRAY_SIZE(xn->hash); i++)
> > - WARN_ON_ONCE(!hlist_empty(&xn->hash[i]));
> > -}
> > -
> >  static struct pernet_operations xt_rateest_net_ops = {
> >   .init = xt_rateest_net_init,
> > - .exit = xt_rateest_net_exit,
> >   .id   = &xt_rateest_id,
> >   .size = sizeof(struct xt_rateest_net),
> >  };
> > --
> > 2.17.1
> >


[PATCH nf 2/2] netfilter: nf_conncount: fix list_del corruption in conn_free

2018-10-25 Thread Taehee Yoo
nf_conncount_tuple is an element of nft_connlimit and is deleted by
conn_free(). Elements can be deleted by both the GC routine and the
data path functions (nf_conncount_lookup, nf_conncount_add), and both
call conn_free() to free elements.
But conn_free() only protects the list, not each element, so list_del
corruption can occur.

conn_free() doesn't check whether an element has already been deleted.
To protect elements, a dead flag is added; it is set when an element is
deleted. Only conn_free() can delete elements, so the list lock and the
dead flag together are enough to protect them.

test commands:
   %nft add table ip filter
   %nft add chain ip filter input { type filter hook input priority 0\; }
   %nft add rule filter input meter test { ip id ct count over 2 } counter

splat looks like:
[ 1779.495778] list_del corruption, 8800b6e12008->prev is LIST_POISON2 (dead0200)
[ 1779.505453] [ cut here ]
[ 1779.506260] kernel BUG at lib/list_debug.c:50!
[ 1779.515831] invalid opcode:  [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[ 1779.516772] CPU: 0 PID: 33 Comm: kworker/0:2 Not tainted 4.19.0-rc6+ #22
[ 1779.516772] Workqueue: events_power_efficient nft_rhash_gc [nf_tables_set]
[ 1779.516772] RIP: 0010:__list_del_entry_valid+0xd8/0x150
[ 1779.516772] Code: 39 48 83 c4 08 b8 01 00 00 00 5b 5d c3 48 89 ea 48 c7 c7 
00 c3 5b 98 e8 0f dc 40 ff 0f 0b 48 c7 c7 60 c3 5b 98 e8 01 dc 40 ff <0f> 0b 48 
c7 c7 c0 c3 5b 98 e8 f3 db 40 ff 0f 0b 48 c7 c7 20 c4 5b
[ 1779.516772] RSP: 0018:880119127420 EFLAGS: 00010286
[ 1779.516772] RAX: 004e RBX: dead0200 RCX: 
[ 1779.516772] RDX: 004e RSI: 0008 RDI: ed0023224e7a
[ 1779.516772] RBP: 88011934bc10 R08: ed002367cea9 R09: ed002367cea9
[ 1779.516772] R10: 0001 R11: ed002367cea8 R12: 8800b6e12008
[ 1779.516772] R13: 8800b6e12010 R14: 88011934bc20 R15: 8800b6e12008
[ 1779.516772] FS:  () GS:88011b20() 
knlGS:
[ 1779.516772] CS:  0010 DS:  ES:  CR0: 80050033
[ 1779.516772] CR2: 7fc876534010 CR3: 00010da16000 CR4: 001006f0
[ 1779.516772] Call Trace:
[ 1779.516772]  conn_free+0x9f/0x2b0 [nf_conncount]
[ 1779.516772]  ? nf_ct_tmpl_alloc+0x2a0/0x2a0 [nf_conntrack]
[ 1779.516772]  ? nf_conncount_add+0x520/0x520 [nf_conncount]
[ 1779.516772]  ? do_raw_spin_trylock+0x1a0/0x1a0
[ 1779.516772]  ? do_raw_spin_trylock+0x10/0x1a0
[ 1779.516772]  find_or_evict+0xe5/0x150 [nf_conncount]
[ 1779.516772]  nf_conncount_gc_list+0x162/0x360 [nf_conncount]
[ 1779.516772]  ? nf_conncount_lookup+0xee0/0xee0 [nf_conncount]
[ 1779.516772]  ? _raw_spin_unlock_irqrestore+0x45/0x50
[ 1779.516772]  ? trace_hardirqs_off+0x6b/0x220
[ 1779.516772]  ? trace_hardirqs_on_caller+0x220/0x220
[ 1779.516772]  nft_rhash_gc+0x16b/0x540 [nf_tables_set]
[ ... ]

Fixes: 5c789e131cbb ("netfilter: nf_conncount: Add list lock and gc worker, and RCU for init tree search")
Signed-off-by: Taehee Yoo 
---
 net/netfilter/nf_conncount.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/nf_conncount.c b/net/netfilter/nf_conncount.c
index 8c6ece33b31f..2d7527533cf6 100644
--- a/net/netfilter/nf_conncount.c
+++ b/net/netfilter/nf_conncount.c
@@ -49,6 +49,7 @@ struct nf_conncount_tuple {
struct nf_conntrack_zone zone;
int cpu;
u32 jiffies32;
+   bool dead;
struct rcu_head rcu_head;
 };
 
@@ -106,6 +107,7 @@ nf_conncount_add(struct nf_conncount_list *list,
conn->zone = *zone;
conn->cpu = raw_smp_processor_id();
conn->jiffies32 = (u32)jiffies;
+   conn->dead = false;
spin_lock(&list->list_lock);
if (list->dead == true) {
kmem_cache_free(conncount_conn_cachep, conn);
@@ -134,12 +136,13 @@ static bool conn_free(struct nf_conncount_list *list,
 
spin_lock_bh(&list->list_lock);
 
-   if (list->count == 0) {
+   if (conn->dead) {
spin_unlock_bh(&list->list_lock);
-return free_entry;
+   return free_entry;
}
 
list->count--;
+   conn->dead = true;
list_del_rcu(&conn->node);
if (list->count == 0)
free_entry = true;
-- 
2.17.1
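The dead-flag idea in this patch can be modelled in a few lines of Python. This is a hypothetical single-threaded sketch, not the kernel code:

- Conn and ConnList are illustrative names.
- The list lock is assumed to be held around every call, as conn_free() holds list->list_lock.
- Python's list.remove() stands in for list_del_rcu().

Without the dead check, a second conn_free() on the same element would call remove() again and raise ValueError. That is this model's analogue of the list_del corruption in the splat.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)        # compare elements by identity, like pointers
class Conn:
    key: str
    dead: bool = False      # models the new nf_conncount_tuple.dead field


@dataclass
class ConnList:
    conns: list = field(default_factory=list)
    count: int = 0

    def add(self, conn: Conn) -> None:
        conn.dead = False   # nf_conncount_add() initialises the flag
        self.conns.append(conn)
        self.count += 1


def conn_free(lst: ConnList, conn: Conn) -> bool:
    """Unlink conn and return True when the list became empty.

    Assumes the caller holds the list lock, as the kernel's conn_free() does.
    """
    if conn.dead:
        # Already unlinked by another path (GC or packet path): do nothing.
        return False
    lst.count -= 1
    conn.dead = True
    lst.conns.remove(conn)  # stands in for list_del_rcu()
    return lst.count == 0
```

The v1 code checked `list->count == 0` instead. That catches an empty list, but it does not stop the same element from being unlinked twice while other elements remain.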



[PATCH nf 0/2] netfilter: nf_conncount: fix bugs in conn_free

2018-10-25 Thread Taehee Yoo
This patch series fixes two bugs in nf_conncount.

The first patch fixes an inconsistent lock state in conn_free().
conn_free() is called from both BH and process context, so
spin_lock_bh() should be used.

The second patch fixes an unsafe locking scenario for list elements.
conn_free() can't prevent a list element from being deleted twice,
so a dead flag is added.

Taehee Yoo (2):
  netfilter: nf_conncount: use spin_lock_bh instead of spin_lock
  netfilter: nf_conncount: fix list_del corruption in conn_free

 net/netfilter/nf_conncount.c | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

-- 
2.17.1



[PATCH nf 1/2] netfilter: nf_conncount: use spin_lock_bh instead of spin_lock

2018-10-25 Thread Taehee Yoo
conn_free() takes the list lock with spin_lock(), and it is called by both
nf_conncount_lookup() and nf_conncount_gc_list().
nf_conncount_lookup() runs in bottom-half context, while
nf_conncount_gc_list() runs in process context, so spin_lock() is not safe.
Hence conn_free() should use spin_lock_bh() instead of spin_lock().

test commands:
   %nft add table ip filter
   %nft add chain ip filter input { type filter hook input priority 0\; }
   %nft add rule filter input meter test { ip saddr ct count over 2 } \
   counter

splat looks like:
[  461.996507] 
[  461.998999] WARNING: inconsistent lock state
[  461.998999] 4.19.0-rc6+ #22 Not tainted
[  461.998999] 
[  461.998999] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
[  461.998999] kworker/0:2/134 [HC0[0]:SC0[0]:HE1:SE1] takes:
[  461.998999] a71a559a (&(&list->list_lock)->rlock){+.?.}, at: conn_free+0x69/0x2b0 [nf_conncount]
[  461.998999] {IN-SOFTIRQ-W} state was registered at:
[  461.998999]   _raw_spin_lock+0x30/0x70
[  461.998999]   nf_conncount_add+0x28a/0x520 [nf_conncount]
[  461.998999]   nft_connlimit_eval+0x401/0x580 [nft_connlimit]
[  461.998999]   nft_dynset_eval+0x32b/0x590 [nf_tables]
[  461.998999]   nft_do_chain+0x497/0x1430 [nf_tables]
[  461.998999]   nft_do_chain_ipv4+0x255/0x330 [nf_tables]
[  461.998999]   nf_hook_slow+0xb1/0x160
[ ... ]
[  461.998999] other info that might help us debug this:
[  461.998999]  Possible unsafe locking scenario:
[  461.998999]
[  461.998999]CPU0
[  461.998999]
[  461.998999]   lock(&(&list->list_lock)->rlock);
[  461.998999]   <Interrupt>
[  461.998999]     lock(&(&list->list_lock)->rlock);
[  461.998999]
[  461.998999]  *** DEADLOCK ***
[  461.998999]
[ ... ]

Fixes: 5c789e131cbb ("netfilter: nf_conncount: Add list lock and gc worker, and RCU for init tree search")
Signed-off-by: Taehee Yoo 
---
 net/netfilter/nf_conncount.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/netfilter/nf_conncount.c b/net/netfilter/nf_conncount.c
index 02ca7df793f5..8c6ece33b31f 100644
--- a/net/netfilter/nf_conncount.c
+++ b/net/netfilter/nf_conncount.c
@@ -132,10 +132,10 @@ static bool conn_free(struct nf_conncount_list *list,
 {
bool free_entry = false;
 
-   spin_lock(&list->list_lock);
+   spin_lock_bh(&list->list_lock);
 
if (list->count == 0) {
-   spin_unlock(&list->list_lock);
+   spin_unlock_bh(&list->list_lock);
 return free_entry;
}
 
@@ -144,7 +144,7 @@ static bool conn_free(struct nf_conncount_list *list,
if (list->count == 0)
free_entry = true;
 
-   spin_unlock(&list->list_lock);
+   spin_unlock_bh(&list->list_lock);
call_rcu(&conn->rcu_head, __conn_free);
return free_entry;
 }
-- 
2.17.1



[PATCH nf] netfilter: xt_IDLETIMER: add sysfs filename checking routine

2018-10-20 Thread Taehee Yoo
When an IDLETIMER rule is added, a sysfs file is created under
/sys/class/xt_idletimer/timers/.
But some label names must not be used, such as
".", "..", "power", "uevent", "subsystem", etc.
So a sysfs filename checking routine is needed.

test commands:
   %iptables -I INPUT -j IDLETIMER --timeout 1 --label "power"

splat looks like:
[95765.423132] sysfs: cannot create duplicate filename '/devices/virtual/xt_idletimer/timers/power'
[95765.433418] CPU: 0 PID: 8446 Comm: iptables Not tainted 4.19.0-rc6+ #20
[95765.449755] Call Trace:
[95765.449755]  dump_stack+0xc9/0x16b
[95765.449755]  ? show_regs_print_info+0x5/0x5
[95765.449755]  sysfs_warn_dup+0x74/0x90
[95765.449755]  sysfs_add_file_mode_ns+0x352/0x500
[95765.449755]  sysfs_create_file_ns+0x179/0x270
[95765.449755]  ? sysfs_add_file_mode_ns+0x500/0x500
[95765.449755]  ? idletimer_tg_checkentry+0x3e5/0xb1b [xt_IDLETIMER]
[95765.449755]  ? rcu_read_lock_sched_held+0x114/0x130
[95765.449755]  ? __kmalloc_track_caller+0x211/0x2b0
[95765.449755]  ? memcpy+0x34/0x50
[95765.449755]  idletimer_tg_checkentry+0x4e2/0xb1b [xt_IDLETIMER]
[ ... ]

Fixes: 0902b469bd25 ("netfilter: xtables: idletimer target implementation")
Signed-off-by: Taehee Yoo 
---
 net/netfilter/xt_IDLETIMER.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/net/netfilter/xt_IDLETIMER.c b/net/netfilter/xt_IDLETIMER.c
index 5ee859193783..25453a16385e 100644
--- a/net/netfilter/xt_IDLETIMER.c
+++ b/net/netfilter/xt_IDLETIMER.c
@@ -116,6 +116,22 @@ static void idletimer_tg_expired(struct timer_list *t)
schedule_work(&timer->work);
 }
 
+static int idletimer_check_sysfs_name(const char *name, unsigned int size)
+{
+   int ret;
+
+   ret = xt_check_proc_name(name, size);
+   if (ret < 0)
+   return ret;
+
+   if (!strcmp(name, "power") ||
+   !strcmp(name, "subsystem") ||
+   !strcmp(name, "uevent"))
+   return -EINVAL;
+
+   return 0;
+}
+
 static int idletimer_tg_create(struct idletimer_tg_info *info)
 {
int ret;
@@ -126,6 +142,10 @@ static int idletimer_tg_create(struct idletimer_tg_info *info)
goto out;
}
 
+   ret = idletimer_check_sysfs_name(info->label, sizeof(info->label));
+   if (ret < 0)
+   goto out_free_timer;
+
sysfs_attr_init(&info->timer->attr.attr);
info->timer->attr.attr.name = kstrdup(info->label, GFP_KERNEL);
if (!info->timer->attr.attr.name) {
-- 
2.17.1
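Here is a Python model of the name check this patch adds to xt_IDLETIMER. check_proc_name() is an assumed restatement of what xt_check_proc_name() rejects:

- an empty name;
- a name that does not fit in the label buffer together with its terminating NUL;
- "." or "..";
- anything containing "/".

Verify this against the kernel source before relying on it. The 28-byte buffer size used below is also an assumption.

```python
import errno

# Names the patch rejects because they collide with entries the driver
# core already creates in the timers device directory.
RESERVED_SYSFS_NAMES = {"power", "subsystem", "uevent"}


def check_proc_name(name: str, size: int) -> int:
    """Assumed restatement of xt_check_proc_name(); returns 0 or -errno."""
    if not name:
        return -errno.EINVAL
    if len(name) >= size:          # no room left for the terminating NUL
        return -errno.ENAMETOOLONG
    if name in (".", "..") or "/" in name:
        return -errno.EINVAL
    return 0


def check_sysfs_name(name: str, size: int) -> int:
    """Mirrors idletimer_check_sysfs_name(): proc-name rules plus reserved names."""
    ret = check_proc_name(name, size)
    if ret < 0:
        return ret
    if name in RESERVED_SYSFS_NAMES:
        return -errno.EINVAL
    return 0
```

For example, check_sysfs_name("power", 28) returns -errno.EINVAL, while an ordinary label such as "timer0" returns 0.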



[PATCH nf] netfilter: xt_RATEEST: remove netns exit routine

2018-10-18 Thread Taehee Yoo
xt_rateest_net_exit() was added to check whether rules are flushed
successfully, but the ->net_exit() callback is called earlier than the
->destroy() callback, so ->net_exit() can't perform that check.

test commands:
   %ip netns add vm1
   %ip netns exec vm1 iptables -t mangle -I PREROUTING -p udp \
   --dport  -j RATEEST --rateest-name ap \
   --rateest-interval 250ms --rateest-ewma 0.5s
   %ip netns del vm1

splat looks like:
[  668.813518] WARNING: CPU: 0 PID: 87 at net/netfilter/xt_RATEEST.c:210 xt_rateest_net_exit+0x210/0x340 [xt_RATEEST]
[  668.813518] Modules linked in: xt_RATEEST xt_tcpudp iptable_mangle bpfilter ip_tables x_tables
[  668.813518] CPU: 0 PID: 87 Comm: kworker/u4:2 Not tainted 4.19.0-rc7+ #21
[  668.813518] Workqueue: netns cleanup_net
[  668.813518] RIP: 0010:xt_rateest_net_exit+0x210/0x340 [xt_RATEEST]
[  668.813518] Code: 00 48 8b 85 30 ff ff ff 4c 8b 23 80 38 00 0f 85 24 01 00 
00 48 8b 85 30 ff ff ff 4d 85 e4 4c 89 a5 58 ff ff ff c6 00 f8 74 b2 <0f> 0b 48 
83 c3 08 4c 39 f3 75 b0 48 b8 00 00 00 00 00 fc ff df 49
[  668.813518] RSP: 0018:8801156c73f8 EFLAGS: 00010282
[  668.813518] RAX: ed0022ad8e85 RBX: 880118928e98 RCX: 5db8012a
[  668.813518] RDX: 8801156c7428 RSI: cb1d185f RDI: 880115663b74
[  668.813518] RBP: 8801156c74d0 R08: 8801156633c0 R09: 1100236440be
[  668.813518] R10: 0001 R11: ed002367d852 R12: 880115142b08
[  668.813518] R13: 110022ad8e81 R14: 880118928ea8 R15: dc00
[  668.813518] FS:  () GS:88011b20() 
knlGS:
[  668.813518] CS:  0010 DS:  ES:  CR0: 80050033
[  668.813518] CR2: 563aa69f4f28 CR3: 000105a16000 CR4: 001006f0
[  668.813518] Call Trace:
[  668.813518]  ? unregister_netdevice_many+0xe0/0xe0
[  668.813518]  ? xt_rateest_net_init+0x2c0/0x2c0 [xt_RATEEST]
[  668.813518]  ? default_device_exit+0x1ca/0x270
[  668.813518]  ? remove_proc_entry+0x1cd/0x390
[  668.813518]  ? dev_change_net_namespace+0xd00/0xd00
[  668.813518]  ? __init_waitqueue_head+0x130/0x130
[  668.813518]  ops_exit_list.isra.10+0x94/0x140
[  668.813518]  cleanup_net+0x45b/0x900
[  668.813518]  ? net_drop_ns+0x110/0x110
[  668.813518]  ? swapgs_restore_regs_and_return_to_usermode+0x3c/0x80
[  668.813518]  ? save_trace+0x300/0x300
[  668.813518]  ? lock_acquire+0x196/0x470
[  668.813518]  ? lock_acquire+0x196/0x470
[  668.813518]  ? process_one_work+0xb60/0x1de0
[  668.813518]  ? _raw_spin_unlock_irq+0x29/0x40
[  668.813518]  ? _raw_spin_unlock_irq+0x29/0x40
[  668.813518]  ? __lock_acquire+0x4500/0x4500
[  668.813518]  ? __lock_is_held+0xb4/0x140
[  668.813518]  process_one_work+0xc13/0x1de0
[  668.813518]  ? pwq_dec_nr_in_flight+0x3c0/0x3c0
[  668.813518]  ? set_load_weight+0x270/0x270
[ ... ]

Fixes: 3427b2ab63fa ("netfilter: make xt_rateest hash table per net")
Signed-off-by: Taehee Yoo 
---
 net/netfilter/xt_RATEEST.c | 10 --
 1 file changed, 10 deletions(-)

diff --git a/net/netfilter/xt_RATEEST.c b/net/netfilter/xt_RATEEST.c
index dec843cadf46..9e05c86ba5c4 100644
--- a/net/netfilter/xt_RATEEST.c
+++ b/net/netfilter/xt_RATEEST.c
@@ -201,18 +201,8 @@ static __net_init int xt_rateest_net_init(struct net *net)
return 0;
 }
 
-static void __net_exit xt_rateest_net_exit(struct net *net)
-{
-   struct xt_rateest_net *xn = net_generic(net, xt_rateest_id);
-   int i;
-
-   for (i = 0; i < ARRAY_SIZE(xn->hash); i++)
-   WARN_ON_ONCE(!hlist_empty(&xn->hash[i]));
-}
-
 static struct pernet_operations xt_rateest_net_ops = {
.init = xt_rateest_net_init,
-   .exit = xt_rateest_net_exit,
.id   = &xt_rateest_id,
.size = sizeof(struct xt_rateest_net),
 };
-- 
2.17.1
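The ordering problem behind removing xt_rateest_net_exit() can be illustrated with a small Python model. tear_down_netns() is hypothetical. Its default ordering simply adopts the commit message's claim that ->net_exit() runs before the rules' ->destroy() callbacks; it is not a model of cleanup_net() itself.

```python
def tear_down_netns(estimators, exit_before_destroy=True):
    """Return the warnings a simulated netns teardown would emit."""
    table = set(estimators)  # per-netns RATEEST hash, one entry per rule
    warnings = []

    def net_exit():
        # What the removed xt_rateest_net_exit() did: complain about leftovers.
        if table:
            warnings.append("WARN_ON_ONCE: rateest hash not empty")

    def destroy_rules():
        # Each rule's ->destroy() drops its estimator from the hash.
        table.clear()

    if exit_before_destroy:
        net_exit()
        destroy_rules()
    else:
        destroy_rules()
        net_exit()
    return warnings
```

With the default ordering, the warning fires even though every entry is about to be destroyed. The check therefore cannot distinguish a real leak from rules that simply have not been torn down yet.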



[PATCH nf-next] netfilter: nfnetlink_log: remove empty nfnetlink_log.h header file

2018-10-18 Thread Taehee Yoo
The include/net/netfilter/nfnetlink_log.h file is empty,
so it can be removed.

Signed-off-by: Taehee Yoo 
---
 include/net/netfilter/nfnetlink_log.h | 1 -
 1 file changed, 1 deletion(-)
 delete mode 100644 include/net/netfilter/nfnetlink_log.h

diff --git a/include/net/netfilter/nfnetlink_log.h b/include/net/netfilter/nfnetlink_log.h
deleted file mode 100644
index ea32a7d3cf1b..
--- a/include/net/netfilter/nfnetlink_log.h
+++ /dev/null
@@ -1 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-- 
2.17.1



[PATCH nf v2 3/3] netfilter: ipt_CLUSTERIP: fix sleep-in-atomic bug in clusterip_config_entry_put()

2018-10-14 Thread Taehee Yoo
proc_remove() can sleep, so it can't be called while holding a spinlock.
Hence proc_remove() is moved outside of the spinlock, and a mutex is added
to synchronize creation and removal of the proc entry (config->pde).

test commands:
SHELL#1
   %while :; do iptables -A INPUT -p udp -i enp2s0 -d 192.168.1.100 \
   --dport 9000  -j CLUSTERIP --new --hashmode sourceip \
   --clustermac 01:00:5e:00:00:21 --total-nodes 3 --local-node 3; \
   iptables -F; done

SHELL#2
   %while :; do echo +1 > /proc/net/ipt_CLUSTERIP/192.168.1.100; \
   echo -1 > /proc/net/ipt_CLUSTERIP/192.168.1.100; done

[ 2949.569864] BUG: sleeping function called from invalid context at kernel/sched/completion.c:99
[ 2949.579944] in_atomic(): 1, irqs_disabled(): 0, pid: 5472, name: iptables
[ 2949.587920] 1 lock held by iptables/5472:
[ 2949.592711]  #0: 8f0ebcf2 (&(>lock)->rlock){+...}, at: 
refcount_dec_and_lock+0x24/0x50
[ 2949.603307] CPU: 1 PID: 5472 Comm: iptables Tainted: GW 
4.19.0-rc5+ #16
[ 2949.604212] Hardware name: To be filled by O.E.M. To be filled by 
O.E.M./Aptio CRB, BIOS 5.6.5 07/08/2015
[ 2949.604212] Call Trace:
[ 2949.604212]  dump_stack+0xc9/0x16b
[ 2949.604212]  ? show_regs_print_info+0x5/0x5
[ 2949.604212]  ___might_sleep+0x2eb/0x420
[ 2949.604212]  ? set_rq_offline.part.87+0x140/0x140
[ 2949.604212]  ? _rcu_barrier_trace+0x400/0x400
[ 2949.604212]  wait_for_completion+0x94/0x710
[ 2949.604212]  ? wait_for_completion_interruptible+0x780/0x780
[ 2949.604212]  ? __kernel_text_address+0xe/0x30
[ 2949.604212]  ? __lockdep_init_map+0x10e/0x5c0
[ 2949.604212]  ? __lockdep_init_map+0x10e/0x5c0
[ 2949.604212]  ? __init_waitqueue_head+0x86/0x130
[ 2949.604212]  ? init_wait_entry+0x1a0/0x1a0
[ 2949.604212]  proc_entry_rundown+0x208/0x270
[ 2949.604212]  ? proc_reg_get_unmapped_area+0x370/0x370
[ 2949.604212]  ? __lock_acquire+0x4500/0x4500
[ 2949.604212]  ? complete+0x18/0x70
[ 2949.604212]  remove_proc_subtree+0x143/0x2a0
[ 2949.708655]  ? remove_proc_entry+0x390/0x390
[ 2949.708655]  clusterip_tg_destroy+0x27a/0x630 [ipt_CLUSTERIP]
[ ... ]

v3: add Third patch.
v2:
 - use spin_lock_bh() instead of spin_lock() (Pablo Neira Ayuso)
 - add missing dev_mc_add() and dev_mc_del().
v1: Initial patch

Fixes: b3e456fce9f5 ("netfilter: ipt_CLUSTERIP: fix a race condition of proc file creation")
Signed-off-by: Taehee Yoo 
---
 net/ipv4/netfilter/ipt_CLUSTERIP.c | 19 ++-
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/netfilter/ipt_CLUSTERIP.c b/net/ipv4/netfilter/ipt_CLUSTERIP.c
index e734a57cd9f1..7fd399751c2e 100644
--- a/net/ipv4/netfilter/ipt_CLUSTERIP.c
+++ b/net/ipv4/netfilter/ipt_CLUSTERIP.c
@@ -56,7 +56,7 @@ struct clusterip_config {
 #endif
enum clusterip_hashmode hash_mode;  /* which hashing mode */
u_int32_t hash_initval; /* hash initialization */
-   struct rcu_head rcu;
+   struct rcu_head rcu;/* for call_rcu_bh */
struct net *net;/* netns for pernet list */
char ifname[IFNAMSIZ];  /* device ifname */
 };
@@ -72,6 +72,8 @@ struct clusterip_net {
 
 #ifdef CONFIG_PROC_FS
struct proc_dir_entry *procdir;
+   /* mutex protects the config->pde*/
+   struct mutex mutex;
 #endif
 };
 
@@ -118,17 +120,18 @@ clusterip_config_entry_put(struct clusterip_config *c)
 
local_bh_disable();
if (refcount_dec_and_lock(>entries, >lock)) {
+   list_del_rcu(>list);
+   spin_unlock(>lock);
+   local_bh_enable();
/* In case anyone still accesses the file, the open/close
 * functions are also incrementing the refcount on their own,
 * so it's safe to remove the entry even if it's in use. */
 #ifdef CONFIG_PROC_FS
+   mutex_lock(>mutex);
if (cn->procdir)
proc_remove(c->pde);
+   mutex_unlock(>mutex);
 #endif
-   list_del_rcu(>list);
-   spin_unlock(>lock);
-   local_bh_enable();
-
return;
}
local_bh_enable();
@@ -278,9 +281,11 @@ clusterip_config_init(struct net *net, const struct ipt_clusterip_tgt_info *i,
 
/* create proc dir entry */
sprintf(buffer, "%pI4", );
+   mutex_lock(>mutex);
c->pde = proc_create_data(buffer, 0600,
  cn->procdir,
  _proc_fops, c);
+   mutex_unlock(>mutex);
if (!c->pde) {
err = -ENOMEM;
goto err;
@@ -832,6 +837,7 @@ static int clusterip_net_init(struct net *net)
pr_err("Unable to proc dir entry\n");
return -ENOMEM;
}
+   mut

[PATCH nf v2 0/3] netfilter: ipt_CLUSTERIP: fix bugs in ipt_CLUSTERIP

2018-10-14 Thread Taehee Yoo
This patchset fixes bugs in ipt_CLUSTERIP.

The first patch fixes a deadlock that occurs when a netns is destroyed.
When a netns is destroyed, cleanup_net() is called, and it calls the
->exit callbacks of the registered pernet_ops. The ->exit path of
ipt_CLUSTERIP tries to take the same lock that cleanup_net() already
holds, so a deadlock occurs.

The second patch removes a wrong WARN_ON_ONCE() in clusterip_net_exit().
That WARN_ON_ONCE() is meant to check that cleanup has completed
successfully, but clusterip_net_exit() is called earlier than the cleanup
function (clusterip_tg_destroy()), so it cannot check that.

The third patch fixes a sleep-in-atomic bug when the config structure is
destroyed. In order to synchronize creation and removal of the proc entry,
proc_remove() was placed inside a spinlock. But proc_remove() can sleep,
so it must not be called inside a spinlock.

v3: add Third patch.
v2:
 - use spin_lock_bh() instead of spin_lock() (Pablo Neira Ayuso)
 - add missing dev_mc_add() and dev_mc_del().
v1: Initial patch

Taehee Yoo (3):
  netfilter: ipt_CLUSTERIP: fix deadlock in netns exit routine
  netfilter: ipt_CLUSTERIP: remove wrong WARN_ON_ONCE in netns exit
routine
  netfilter: ipt_CLUSTERIP: fix sleep-in-atomic bug in
clusterip_config_entry_put()

 net/ipv4/netfilter/ipt_CLUSTERIP.c | 175 +
 1 file changed, 101 insertions(+), 74 deletions(-)

-- 
2.17.1



[PATCH nf v2 1/3] netfilter: ipt_CLUSTERIP: fix deadlock in netns exit routine

2018-10-14 Thread Taehee Yoo
When a network namespace is destroyed, cleanup_net() is called.
cleanup_net() holds pernet_ops_rwsem and then calls each ->exit callback,
so clusterip_tg_destroy() ends up being called from cleanup_net().
clusterip_tg_destroy() in turn calls unregister_netdevice_notifier(),
which tries to take pernet_ops_rwsem again.

Since cleanup_net() already holds that lock, a deadlock occurs.

After this patch, only one notifier is registered, when the module is
inserted, and all configs are added to a per-net list.
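
The self-deadlock can be modeled in userspace as a sketch. It assumes, as the down_write() frame in the splat suggests, that unregister_netdevice_notifier() needs pernet_ops_rwsem for writing while cleanup_net() already holds it (modeled here as a read hold). All names are hypothetical, and where the kernel would block forever the model returns -EDEADLK instead.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Single-threaded model of a non-recursive rwsem. Where the kernel would
 * block forever (the caller already holds the lock), return -EDEADLK. */
struct rwsem_model { int readers; bool writer; };

static struct rwsem_model pernet_ops_rwsem_model;

static int down_read_model(struct rwsem_model *s)
{
    if (s->writer)
        return -EDEADLK;
    s->readers++;
    return 0;
}
static void up_read_model(struct rwsem_model *s) { s->readers--; }

static int down_write_model(struct rwsem_model *s)
{
    if (s->writer || s->readers > 0)
        return -EDEADLK;   /* the only holder in this model is ourselves */
    s->writer = true;
    return 0;
}
static void up_write_model(struct rwsem_model *s) { s->writer = false; }

/* unregister_netdevice_notifier() takes the rwsem for writing. */
static int unregister_notifier_model(void)
{
    int err = down_write_model(&pernet_ops_rwsem_model);

    if (err)
        return err;
    up_write_model(&pernet_ops_rwsem_model);
    return 0;
}

/* Before: every config owned a notifier, unregistered on rule destroy. */
static int tg_destroy_old(void) { return unregister_notifier_model(); }

/* After: one notifier per module, unregistered only at module exit, so
 * rule destruction no longer touches pernet_ops_rwsem. */
static int tg_destroy_new(void) { return 0; }

/* cleanup_net(): hold the rwsem while running the ->exit callbacks,
 * which end up destroying the rules of the dying netns. */
static int cleanup_net_model(int (*tg_destroy)(void))
{
    int ret = down_read_model(&pernet_ops_rwsem_model);

    if (ret)
        return ret;
    ret = tg_destroy();
    up_read_model(&pernet_ops_rwsem_model);
    return ret;
}
```

Moving the notifier registration to module init/exit removes the nested acquisition entirely instead of trying to annotate it.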

test commands:
   %ip netns add vm1
   %ip netns exec vm1 bash
   %ip link set lo up
   %iptables -A INPUT -p tcp -i lo -d 192.168.0.5 --dport 80 \
-j CLUSTERIP --new --hashmode sourceip \
--clustermac 01:00:5e:00:00:20 --total-nodes 2 --local-node 1
   %exit
   %ip netns del vm1

splat looks like:
[  341.809674] 
[  341.809674] WARNING: possible recursive locking detected
[  341.809674] 4.19.0-rc5+ #16 Tainted: GW
[  341.809674] 
[  341.809674] kworker/u4:2/87 is trying to acquire lock:
[  341.809674] 5da2d519 (pernet_ops_rwsem){}, at: unregister_netdevice_notifier+0x8c/0x460
[  341.809674]
[  341.809674] but task is already holding lock:
[  341.809674] 5da2d519 (pernet_ops_rwsem){}, at: cleanup_net+0x119/0x900
[  341.809674]
[  341.809674] other info that might help us debug this:
[  341.809674]  Possible unsafe locking scenario:
[  341.809674]
[  341.809674]CPU0
[  341.809674]
[  341.809674]   lock(pernet_ops_rwsem);
[  341.809674]   lock(pernet_ops_rwsem);
[  341.809674]
[  341.809674]  *** DEADLOCK ***
[  341.809674]
[  341.809674]  May be due to missing lock nesting notation
[  341.809674]
[  341.809674] 3 locks held by kworker/u4:2/87:
[  341.809674]  #0: d9df6c92 ((wq_completion)"%s""netns"){+.+.}, at: process_one_work+0xafe/0x1de0
[  341.809674]  #1: c2cbcee2 (net_cleanup_work){+.+.}, at: process_one_work+0xb60/0x1de0
[  341.809674]  #2: 5da2d519 (pernet_ops_rwsem){}, at: cleanup_net+0x119/0x900
[  341.809674]
[  341.809674] stack backtrace:
[  341.809674] CPU: 1 PID: 87 Comm: kworker/u4:2 Tainted: GW 4.19.0-rc5+ #16
[  341.809674] Workqueue: netns cleanup_net
[  341.809674] Call Trace:
[ ... ]
[  342.070196]  down_write+0x93/0x160
[  342.070196]  ? unregister_netdevice_notifier+0x8c/0x460
[  342.070196]  ? down_read+0x1e0/0x1e0
[  342.070196]  ? sched_clock_cpu+0x126/0x170
[  342.070196]  ? find_held_lock+0x39/0x1c0
[  342.070196]  unregister_netdevice_notifier+0x8c/0x460
[  342.070196]  ? register_netdevice_notifier+0x790/0x790
[  342.070196]  ? __local_bh_enable_ip+0xe9/0x1b0
[  342.070196]  ? __local_bh_enable_ip+0xe9/0x1b0
[  342.070196]  ? clusterip_tg_destroy+0x372/0x650 [ipt_CLUSTERIP]
[  342.070196]  ? trace_hardirqs_on+0x93/0x210
[  342.070196]  ? __bpf_trace_preemptirq_template+0x10/0x10
[  342.070196]  ? clusterip_tg_destroy+0x372/0x650 [ipt_CLUSTERIP]
[  342.123094]  clusterip_tg_destroy+0x3ad/0x650 [ipt_CLUSTERIP]
[  342.123094]  ? clusterip_net_init+0x3d0/0x3d0 [ipt_CLUSTERIP]
[  342.123094]  ? cleanup_match+0x17d/0x200 [ip_tables]
[  342.123094]  ? xt_unregister_table+0x215/0x300 [x_tables]
[  342.123094]  ? kfree+0xe2/0x2a0
[  342.123094]  cleanup_entry+0x1d5/0x2f0 [ip_tables]
[  342.123094]  ? cleanup_match+0x200/0x200 [ip_tables]
[  342.123094]  __ipt_unregister_table+0x9b/0x1a0 [ip_tables]
[  342.123094]  iptable_filter_net_exit+0x43/0x80 [iptable_filter]
[  342.123094]  ops_exit_list.isra.10+0x94/0x140
[  342.123094]  cleanup_net+0x45b/0x900
[ ... ]

v3: add Third patch.
v2:
 - use spin_lock_bh() instead of spin_lock() (Pablo Neira Ayuso)
 - add missing dev_mc_add() and dev_mc_del().
v1: Initial patch

Fixes: 202f59afd441 ("netfilter: ipt_CLUSTERIP: do not hold dev")
Signed-off-by: Taehee Yoo 
---
 net/ipv4/netfilter/ipt_CLUSTERIP.c | 155 -
 1 file changed, 87 insertions(+), 68 deletions(-)

diff --git a/net/ipv4/netfilter/ipt_CLUSTERIP.c b/net/ipv4/netfilter/ipt_CLUSTERIP.c
index 2c8d313ae216..e6147ecb006b 100644
--- a/net/ipv4/netfilter/ipt_CLUSTERIP.c
+++ b/net/ipv4/netfilter/ipt_CLUSTERIP.c
@@ -57,17 +57,14 @@ struct clusterip_config {
enum clusterip_hashmode hash_mode;  /* which hashing mode */
u_int32_t hash_initval; /* hash initialization */
struct rcu_head rcu;
-
+   struct net *net;/* netns for pernet list */
char ifname[IFNAMSIZ];  /* device ifname */
-   struct notifier_block notifier; /* refresh c->ifindex in it */
 };
 
 #ifdef CONFIG_PROC_FS
 static const struct file_operations clusterip_proc_fops;
 #endif
 
-static unsigned int clusterip_net_id __read_mostly;
-
 struct clusterip_net {
struct list_head configs;
/* lock protects the configs list */
@@ -78,16 +75,30 @@ struct clusterip_net 

[PATCH nf v2 2/3] netfilter: ipt_CLUSTERIP: remove wrong WARN_ON_ONCE in netns exit routine

2018-10-14 Thread Taehee Yoo
When a network namespace is destroyed, both clusterip_tg_destroy() and
clusterip_net_exit() are called, and clusterip_net_exit() is called
before clusterip_tg_destroy(). Hence the cleanup check in
clusterip_net_exit() doesn't make sense.
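
A small userspace model shows why an emptiness check at clusterip_net_exit() time fires even though nothing leaks. It assumes only the ordering stated above (net exit first, rule destructors afterwards); the names are hypothetical and the per-net list is reduced to a counter.

```c
#include <assert.h>
#include <stdbool.h>

struct clusterip_net_model { int nconfigs; };   /* length of cn->configs */

static bool warned;   /* models a WARN_ON_ONCE() splat */

/* clusterip_net_exit(); with_check selects the pre-patch behaviour. */
static void net_exit_model(struct clusterip_net_model *cn, bool with_check)
{
    if (with_check && cn->nconfigs != 0)
        warned = true;   /* WARN_ON_ONCE(!list_empty(&cn->configs)) */
}

/* clusterip_tg_destroy(): drops one config from the per-net list. */
static void tg_destroy_model(struct clusterip_net_model *cn)
{
    cn->nconfigs--;
}

/* Teardown order described in the commit message: clusterip_net_exit()
 * runs first, the rule destructors run afterwards. */
static void netns_del_model(struct clusterip_net_model *cn, bool with_check)
{
    net_exit_model(cn, with_check);
    while (cn->nconfigs > 0)
        tg_destroy_model(cn);
}
```

Both runs end with an empty list; only the variant with the check reports a warning, which is why the check is removed rather than moved.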

test commands:
   %ip netns add vm1
   %ip netns exec vm1 bash
   %ip link set lo up
   %iptables -A INPUT -p tcp -i lo -d 192.168.0.5 --dport 80 \
-j CLUSTERIP --new --hashmode sourceip \
--clustermac 01:00:5e:00:00:20 --total-nodes 2 --local-node 1
   %exit
   %ip netns del vm1

splat looks like:
[  341.184508] WARNING: CPU: 1 PID: 87 at net/ipv4/netfilter/ipt_CLUSTERIP.c:840 clusterip_net_exit+0x319/0x380 [ipt_CLUSTERIP]
[  341.184850] Modules linked in: ipt_CLUSTERIP nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_tcpudp iptable_filter bpfilter ip_tables x_tables
[  341.184850] CPU: 1 PID: 87 Comm: kworker/u4:2 Not tainted 4.19.0-rc5+ #16
[  341.227509] Workqueue: netns cleanup_net
[  341.227509] RIP: 0010:clusterip_net_exit+0x319/0x380 [ipt_CLUSTERIP]
[  341.227509] Code: 0f 85 7f fe ff ff 48 c7 c2 80 64 2c c0 be a8 02 00 00 48 c7 c7 a0 63 2c c0 c6 05 18 6e 00 00 01 e8 bc 38 ff f5 e9 5b fe ff ff <0f> 0b e9 33 ff ff ff e8 4b 90 50 f6 e9 2d fe ff ff 48 89 df e8 de
[  341.227509] RSP: 0018:88011086f408 EFLAGS: 00010202
[  341.227509] RAX: dc00 RBX: 11002210de85 RCX: 
[  341.227509] RDX: 11002210de85 RSI: 880110813be8 RDI: ed002210de58
[  341.227509] RBP: 88011086f4d0 R08:  R09: 
[  341.227509] R10:  R11:  R12: 11002210de81
[  341.227509] R13: 880110625a48 R14: 880114cec8c8 R15: 0014
[  341.227509] FS:  () GS:88011660() knlGS:
[  341.227509] CS:  0010 DS:  ES:  CR0: 80050033
[  341.227509] CR2: 7f11fd38e000 CR3: 00013ca16000 CR4: 001006e0
[  341.227509] Call Trace:
[  341.227509]  ? __clusterip_config_find+0x460/0x460 [ipt_CLUSTERIP]
[  341.227509]  ? default_device_exit+0x1ca/0x270
[  341.227509]  ? remove_proc_entry+0x1cd/0x390
[  341.227509]  ? dev_change_net_namespace+0xd00/0xd00
[  341.227509]  ? __init_waitqueue_head+0x130/0x130
[  341.227509]  ops_exit_list.isra.10+0x94/0x140
[  341.227509]  cleanup_net+0x45b/0x900
[ ... ]

v3: add Third patch.
v2:
 - use spin_lock_bh() instead of spin_lock() (Pablo Neira Ayuso)
 - add missing dev_mc_add() and dev_mc_del().
v1: Initial patch

Fixes: 613d0776d3fe ("netfilter: exit_net cleanup check added")
Signed-off-by: Taehee Yoo 
---
 net/ipv4/netfilter/ipt_CLUSTERIP.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/ipv4/netfilter/ipt_CLUSTERIP.c b/net/ipv4/netfilter/ipt_CLUSTERIP.c
index e6147ecb006b..e734a57cd9f1 100644
--- a/net/ipv4/netfilter/ipt_CLUSTERIP.c
+++ b/net/ipv4/netfilter/ipt_CLUSTERIP.c
@@ -845,7 +845,6 @@ static void clusterip_net_exit(struct net *net)
cn->procdir = NULL;
 #endif
nf_unregister_net_hook(net, _arp_ops);
-   WARN_ON_ONCE(!list_empty(>configs));
 }
 
 static struct pernet_operations clusterip_net_ops = {
-- 
2.17.1



[PATCH nf v2] netfilter: nf_flow_table: do not remove offload when other netns's interface is down

2018-10-11 Thread Taehee Yoo
When an interface goes down, the offload cleanup function
(nf_flow_table_do_cleanup()) is called, and it checks whether the
interface index of the offload matches the index of the interface that
went down. But checking only the interface index is not enough, because
the flowtable is not a per-netns list. So if an interface in another netns
that has the same index as the offload goes down, that offload is removed.
This patch adds a netns check to the offload cleanup routine.
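
The changed condition can be sketched with userspace stand-ins. This is a model, not the kernel structures: the struct names are hypothetical, comparing netns pointers mirrors net_eq(), and only the ifindex path of nf_flow_table_do_cleanup() is modeled (the dev == NULL teardown path is left out).

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-ins. Comparing netns pointers mirrors net_eq(). */
struct net_model { int id; };
struct dev_model {
    const struct net_model *net;   /* dev_net(dev) */
    int ifindex;                   /* dev->ifindex */
};
struct flow_model {
    const struct net_model *net;   /* nf_ct_net(e->ct) */
    int iifidx[2];                 /* tuplehash[0/1].tuple.iifidx */
    bool dead;                     /* set by flow_offload_dead() */
};

/* Condition before the fix: ifindex only. */
static bool matches_old(const struct flow_model *f, const struct dev_model *d)
{
    return f->iifidx[0] == d->ifindex || f->iifidx[1] == d->ifindex;
}

/* Condition after the fix: same netns and same ifindex. */
static bool matches_new(const struct flow_model *f, const struct dev_model *d)
{
    return f->net == d->net && matches_old(f, d);
}

static void do_cleanup_model(struct flow_model *f, const struct dev_model *d)
{
    if (matches_new(f, d))
        f->dead = true;
}
```

Interface indexes are only unique within a netns, so the netns comparison is what makes the ifindex comparison meaningful.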

Fixes: 59c466dd68e7 ("netfilter: nf_flow_table: add a new flow state for tearing down offloading")
Signed-off-by: Taehee Yoo 
---
v2: do not modify unnecessary code (Pablo Neira Ayuso)
v1: Initial patch

 net/netfilter/nf_flow_table_core.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c
index d8125616edc7..c188e27972c7 100644
--- a/net/netfilter/nf_flow_table_core.c
+++ b/net/netfilter/nf_flow_table_core.c
@@ -478,14 +478,17 @@ EXPORT_SYMBOL_GPL(nf_flow_table_init);
 static void nf_flow_table_do_cleanup(struct flow_offload *flow, void *data)
 {
struct net_device *dev = data;
+   struct flow_offload_entry *e;
+
+   e = container_of(flow, struct flow_offload_entry, flow);
 
if (!dev) {
flow_offload_teardown(flow);
return;
}
-
-   if (flow->tuplehash[0].tuple.iifidx == dev->ifindex ||
-   flow->tuplehash[1].tuple.iifidx == dev->ifindex)
+   if (net_eq(nf_ct_net(e->ct), dev_net(dev)) &&
+   (flow->tuplehash[0].tuple.iifidx == dev->ifindex ||
+flow->tuplehash[1].tuple.iifidx == dev->ifindex))
flow_offload_dead(flow);
 }
 
-- 
2.17.1



[PATCH nf-next] netfilter: nf_flow_table: remove unnecessary parameter of nf_flow_table_cleanup()

2018-10-11 Thread Taehee Yoo
The net parameter of nf_flow_table_cleanup() is not used,
so it can be removed.

Signed-off-by: Taehee Yoo 
---
 include/net/netfilter/nf_flow_table.h | 2 +-
 net/netfilter/nf_flow_table_core.c| 2 +-
 net/netfilter/nft_flow_offload.c  | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h
index 0e355f4a3d76..77e2761d4f2f 100644
--- a/include/net/netfilter/nf_flow_table.h
+++ b/include/net/netfilter/nf_flow_table.h
@@ -99,7 +99,7 @@ int nf_flow_table_iterate(struct nf_flowtable *flow_table,
  void (*iter)(struct flow_offload *flow, void *data),
  void *data);
 
-void nf_flow_table_cleanup(struct net *net, struct net_device *dev);
+void nf_flow_table_cleanup(struct net_device *dev);
 
 int nf_flow_table_init(struct nf_flowtable *flow_table);
 void nf_flow_table_free(struct nf_flowtable *flow_table);
diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c
index 185c633b6872..a3cc2ef8a48a 100644
--- a/net/netfilter/nf_flow_table_core.c
+++ b/net/netfilter/nf_flow_table_core.c
@@ -483,7 +483,7 @@ static void nf_flow_table_iterate_cleanup(struct nf_flowtable *flowtable,
flush_delayed_work(>gc_work);
 }
 
-void nf_flow_table_cleanup(struct net *net, struct net_device *dev)
+void nf_flow_table_cleanup(struct net_device *dev)
 {
struct nf_flowtable *flowtable;
 
diff --git a/net/netfilter/nft_flow_offload.c b/net/netfilter/nft_flow_offload.c
index d6bab8c3cbb0..e82d9a966c45 100644
--- a/net/netfilter/nft_flow_offload.c
+++ b/net/netfilter/nft_flow_offload.c
@@ -201,7 +201,7 @@ static int flow_offload_netdev_event(struct notifier_block *this,
if (event != NETDEV_DOWN)
return NOTIFY_DONE;
 
-   nf_flow_table_cleanup(dev_net(dev), dev);
+   nf_flow_table_cleanup(dev);
 
return NOTIFY_DONE;
 }
-- 
2.17.1



Re: [PATCH nf 2/2] netfilter: xt_TEE: add missing code to get interface index in checkentry.

2018-10-11 Thread Taehee Yoo
On Thu, 11 Oct 2018 at 19:17, Pablo Neira Ayuso  wrote:
>

Hi Pablo,

> On Wed, Oct 10, 2018 at 07:56:18PM +0200, Pablo Neira Ayuso wrote:
> > On Sun, Oct 07, 2018 at 12:09:32AM +0900, Taehee Yoo wrote:
> > > checkentry(tee_tg_check) should initialize priv->oif from dev if possible.
> > > But only netdevice notifier handler can set that.
> > > Hence priv->oif is always -1 until notifier handler is called.
> > >
> > > Fixes: 22265a5c3c10 ("netfilter: xt_TEE: resolve oif using netdevice 
> > > notifiers")
> >
> > I think this should be:
> >
> > Fixes: 9e2f6c5d78db ("netfilter: Rework xt_TEE netdevice notifier")
> >
> > since this one deleted the register_netdevice_notifier() call that was
> > setting the output interface index.
>
> Applied, thanks.
>
> Fixed it here before applying, I just hope my Fixes: tag is correct.
> It it is not, just let me know, will wait a bit to push out changes.

Thank you for reviewing and fixing the Fixes: tag!

I checked register_netdevice_notifier(), and it works as you described,
so your Fixes: tag is right.

Thanks!


Re: [PATCH nf] netfilter: nf_flow_table: do not remove offload when other netns's interface is down

2018-10-11 Thread Taehee Yoo
On Thu, 11 Oct 2018 at 03:09, Pablo Neira Ayuso  wrote:
>

Hi Pablo,

Thank you for review!

> On Tue, Oct 09, 2018 at 02:59:48AM +0900, Taehee Yoo wrote:
> > When interface is down, offload cleanup function(nf_flow_table_do_cleanup)
> > is called and that checks whether interface index of offload and
> > index of link down interface is same. but only interface index checking
> > is not enough because flowtable is not pernet list.
> > So that, if other netns's interface that has index is same with offload
> > is down, that offload will be removed.
> > This patch adds netns checking code to the offload cleanup routine.
> > And it also removes unnecessary parameter of nf_flow_table_cleanup().
> >
> > Fixes: 59c466dd68e7 ("netfilter: nf_flow_table: add a new flow state for 
> > tearing down offloading")
> > Signed-off-by: Taehee Yoo 
> > ---
> >  include/net/netfilter/nf_flow_table.h |  2 +-
> >  net/netfilter/nf_flow_table_core.c| 10 +++---
> >  net/netfilter/nft_flow_offload.c  |  2 +-
> >  3 files changed, 9 insertions(+), 5 deletions(-)
> >
> > diff --git a/include/net/netfilter/nf_flow_table.h 
> > b/include/net/netfilter/nf_flow_table.h
> > index 0e355f4a3d76..77e2761d4f2f 100644
> > --- a/include/net/netfilter/nf_flow_table.h
> > +++ b/include/net/netfilter/nf_flow_table.h
> > @@ -99,7 +99,7 @@ int nf_flow_table_iterate(struct nf_flowtable *flow_table,
> > void (*iter)(struct flow_offload *flow, void *data),
> > void *data);
> >
> > -void nf_flow_table_cleanup(struct net *net, struct net_device *dev);
> > +void nf_flow_table_cleanup(struct net_device *dev);
> >
> >  int nf_flow_table_init(struct nf_flowtable *flow_table);
> >  void nf_flow_table_free(struct nf_flowtable *flow_table);
> > diff --git a/net/netfilter/nf_flow_table_core.c 
> > b/net/netfilter/nf_flow_table_core.c
> > index d8125616edc7..88aae0ae499c 100644
> > --- a/net/netfilter/nf_flow_table_core.c
> > +++ b/net/netfilter/nf_flow_table_core.c
> > @@ -478,14 +478,18 @@ EXPORT_SYMBOL_GPL(nf_flow_table_init);
> >  static void nf_flow_table_do_cleanup(struct flow_offload *flow, void *data)
> >  {
> >   struct net_device *dev = data;
> > + struct flow_offload_entry *e;
> > +
> > + e = container_of(flow, struct flow_offload_entry, flow);
> >
> >   if (!dev) {
> >   flow_offload_teardown(flow);
> >   return;
> >   }
> >
> > - if (flow->tuplehash[0].tuple.iifidx == dev->ifindex ||
> > - flow->tuplehash[1].tuple.iifidx == dev->ifindex)
> > + if (net_eq(nf_ct_net(e->ct), dev_net(dev)) &&
> > + (flow->tuplehash[0].tuple.iifidx == dev->ifindex ||
> > +  flow->tuplehash[1].tuple.iifidx == dev->ifindex))
> >   flow_offload_dead(flow);
> >  }
> >
>
> These two chunks below doesn't belong here. I'd prefer this goes
> in a separated patch for nf-next.
>

I agree with that.
I will send two separate patches, one for nf and one for nf-next.

Thanks!

> Thanks.
>
> > @@ -496,7 +500,7 @@ static void nf_flow_table_iterate_cleanup(struct 
> > nf_flowtable *flowtable,
> >   flush_delayed_work(>gc_work);
> >  }
> >
> > -void nf_flow_table_cleanup(struct net *net, struct net_device *dev)
> > +void nf_flow_table_cleanup(struct net_device *dev)
> >  {
> >   struct nf_flowtable *flowtable;
> >
> > diff --git a/net/netfilter/nft_flow_offload.c 
> > b/net/netfilter/nft_flow_offload.c
> > index d6bab8c3cbb0..e82d9a966c45 100644
> > --- a/net/netfilter/nft_flow_offload.c
> > +++ b/net/netfilter/nft_flow_offload.c
> > @@ -201,7 +201,7 @@ static int flow_offload_netdev_event(struct 
> > notifier_block *this,
> >   if (event != NETDEV_DOWN)
> >   return NOTIFY_DONE;
> >
> > - nf_flow_table_cleanup(dev_net(dev), dev);
> > + nf_flow_table_cleanup(dev);
> >
> >   return NOTIFY_DONE;
> >  }
> > --
> > 2.17.1
> >


Re: [PATCH nf 1/2] netfilter: ipt_CLUSTERIP: fix deadlock in netns exit routine

2018-10-11 Thread Taehee Yoo
On Thu, 11 Oct 2018 at 02:32, Pablo Neira Ayuso  wrote:
>

Hi Pablo,

Thank you for review!

> On Sat, Oct 06, 2018 at 01:42:42AM +0900, Taehee Yoo wrote:
> > diff --git a/net/ipv4/netfilter/ipt_CLUSTERIP.c 
> > b/net/ipv4/netfilter/ipt_CLUSTERIP.c
> > index 2c8d313ae216..6ccabe6f74a6 100644
> > --- a/net/ipv4/netfilter/ipt_CLUSTERIP.c
> > +++ b/net/ipv4/netfilter/ipt_CLUSTERIP.c
> > @@ -59,7 +59,6 @@ struct clusterip_config {
> >   struct rcu_head rcu;
> >
> >   char ifname[IFNAMSIZ];  /* device ifname */
> > - struct notifier_block notifier; /* refresh c->ifindex in it */
> >  };
> >
> >  #ifdef CONFIG_PROC_FS
> > @@ -118,8 +117,6 @@ clusterip_config_entry_put(struct net *net, struct 
> > clusterip_config *c)
> >   spin_unlock(>lock);
> >   local_bh_enable();
> >
> > - unregister_netdevice_notifier(>notifier);
> > -
> >   return;
> >   }
> >   local_bh_enable();
> > @@ -181,32 +178,37 @@ clusterip_netdev_event(struct notifier_block *this, 
> > unsigned long event,
> >  void *ptr)
> >  {
> >   struct net_device *dev = netdev_notifier_info_to_dev(ptr);
> > + struct net *net = dev_net(dev);
> > + struct clusterip_net *cn = net_generic(net, clusterip_net_id);
> >   struct clusterip_config *c;
> >
> > - c = container_of(this, struct clusterip_config, notifier);
> > - switch (event) {
> > - case NETDEV_REGISTER:
> > - if (!strcmp(dev->name, c->ifname)) {
> > - c->ifindex = dev->ifindex;
> > - dev_mc_add(dev, c->clustermac);
> > - }
> > - break;
> > - case NETDEV_UNREGISTER:
> > - if (dev->ifindex == c->ifindex) {
> > - dev_mc_del(dev, c->clustermac);
> > - c->ifindex = -1;
> > - }
> > - break;
> > - case NETDEV_CHANGENAME:
> > - if (!strcmp(dev->name, c->ifname)) {
> > - c->ifindex = dev->ifindex;
> > - dev_mc_add(dev, c->clustermac);
> > - } else if (dev->ifindex == c->ifindex) {
> > - dev_mc_del(dev, c->clustermac);
> > - c->ifindex = -1;
> > + spin_lock(>lock);
>
> Do we need spin_lock_bh() here?

I checked that, and you're right.
The config is modified in BH context, so that should be spin_lock_bh().
I will send a v2 patch.

Thanks!


Re: [PATCH nf-next] netfilter: nf_flow_table: remove flowtable hook flush routine in netns exit routine

2018-10-09 Thread Taehee Yoo
On Tue, 9 Oct 2018 at 08:19, Pablo Neira Ayuso  wrote:
>
> Hi Taehee,
>

Hi Pablo,

Thank you for your review!

> I can reproduce it, so this is a bug :-). Still one question below:
>
> On Tue, Oct 02, 2018 at 02:17:14AM +0900, Taehee Yoo wrote:
> [...]
> > diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
> > index f0159eea2978..42487d01a3ed 100644
> > --- a/net/netfilter/nf_tables_api.c
> > +++ b/net/netfilter/nf_tables_api.c
> > @@ -7280,9 +7280,6 @@ static void __nft_release_tables(struct net *net)
> >
> >   list_for_each_entry(chain, >chains, list)
> >   nf_tables_unregister_hook(net, table, chain);
> > - list_for_each_entry(flowtable, >flowtables, list)
> > - nf_unregister_net_hooks(net, flowtable->ops,
> > - flowtable->ops_len);
>
> Hm, why do we still need for basechains with device, ie. from ingress?
> I might be missing something...
>

As far as I know, at this point all types of basechains (arp, ipv4,
ipv6, ...) are unregistered here. Ingress basechains have already been
unregistered by the notifier_call (nf_tables_netdev_event()), but other
types of basechains still exist in the chain list, so this code is
still needed.
But I might have misunderstood your point. If so, please let me know.

Thanks!

> >   /* No packets are walking on these chains anymore. */
> >   ctx.table = table;
> >   list_for_each_entry(chain, >chains, list) {
> > --
> > 2.17.1
> >


[PATCH nf] netfilter: nf_flow_table: do not remove offload when other netns's interface is down

2018-10-08 Thread Taehee Yoo
When an interface goes down, the offload cleanup function
(nf_flow_table_do_cleanup()) is called, and it checks whether the
interface index of the offload matches the index of the interface that
went down. But checking only the interface index is not enough, because
the flowtable is not a per-netns list. So if an interface in another netns
that has the same index as the offload goes down, that offload is removed.
This patch adds a netns check to the offload cleanup routine.
It also removes the unnecessary parameter of nf_flow_table_cleanup().

Fixes: 59c466dd68e7 ("netfilter: nf_flow_table: add a new flow state for tearing down offloading")
Signed-off-by: Taehee Yoo 
---
 include/net/netfilter/nf_flow_table.h |  2 +-
 net/netfilter/nf_flow_table_core.c| 10 +++---
 net/netfilter/nft_flow_offload.c  |  2 +-
 3 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h
index 0e355f4a3d76..77e2761d4f2f 100644
--- a/include/net/netfilter/nf_flow_table.h
+++ b/include/net/netfilter/nf_flow_table.h
@@ -99,7 +99,7 @@ int nf_flow_table_iterate(struct nf_flowtable *flow_table,
  void (*iter)(struct flow_offload *flow, void *data),
  void *data);
 
-void nf_flow_table_cleanup(struct net *net, struct net_device *dev);
+void nf_flow_table_cleanup(struct net_device *dev);
 
 int nf_flow_table_init(struct nf_flowtable *flow_table);
 void nf_flow_table_free(struct nf_flowtable *flow_table);
diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c
index d8125616edc7..88aae0ae499c 100644
--- a/net/netfilter/nf_flow_table_core.c
+++ b/net/netfilter/nf_flow_table_core.c
@@ -478,14 +478,18 @@ EXPORT_SYMBOL_GPL(nf_flow_table_init);
 static void nf_flow_table_do_cleanup(struct flow_offload *flow, void *data)
 {
struct net_device *dev = data;
+   struct flow_offload_entry *e;
+
+   e = container_of(flow, struct flow_offload_entry, flow);
 
if (!dev) {
flow_offload_teardown(flow);
return;
}
 
-   if (flow->tuplehash[0].tuple.iifidx == dev->ifindex ||
-   flow->tuplehash[1].tuple.iifidx == dev->ifindex)
+   if (net_eq(nf_ct_net(e->ct), dev_net(dev)) &&
+   (flow->tuplehash[0].tuple.iifidx == dev->ifindex ||
+flow->tuplehash[1].tuple.iifidx == dev->ifindex))
flow_offload_dead(flow);
 }
 
@@ -496,7 +500,7 @@ static void nf_flow_table_iterate_cleanup(struct nf_flowtable *flowtable,
flush_delayed_work(>gc_work);
 }
 
-void nf_flow_table_cleanup(struct net *net, struct net_device *dev)
+void nf_flow_table_cleanup(struct net_device *dev)
 {
struct nf_flowtable *flowtable;
 
diff --git a/net/netfilter/nft_flow_offload.c b/net/netfilter/nft_flow_offload.c
index d6bab8c3cbb0..e82d9a966c45 100644
--- a/net/netfilter/nft_flow_offload.c
+++ b/net/netfilter/nft_flow_offload.c
@@ -201,7 +201,7 @@ static int flow_offload_netdev_event(struct notifier_block *this,
if (event != NETDEV_DOWN)
return NOTIFY_DONE;
 
-   nf_flow_table_cleanup(dev_net(dev), dev);
+   nf_flow_table_cleanup(dev);
 
return NOTIFY_DONE;
 }
-- 
2.17.1



[PATCH nf-next] netfilter: nf_nat_snmp_basic: add missing helper alias name

2018-10-06 Thread Taehee Yoo
In order to load the helper module automatically, a helper alias name
is needed, so MODULE_ALIAS_NFCT_HELPER() should be added.
Unlike other NAT helper modules, nf_nat_snmp_basic can be used
independently.
The helper name is "snmp_trap", so MODULE_ALIAS_NFCT_HELPER("snmp_trap")
produces the alias name "nfct-helper-snmp_trap".
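
How the alias string is formed can be sketched with plain C string-literal concatenation. NFCT_HELPER_ALIAS_MODEL() below is a hypothetical stand-in, assuming (as the message describes) that the alias is the "nfct-helper-" prefix followed by the helper name; alias_matches_model() further assumes that the module request is built as "nfct-helper-<name>".

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for MODULE_ALIAS_NFCT_HELPER(): the helper name is
 * appended to an "nfct-helper-" prefix by string-literal concatenation. */
#define NFCT_HELPER_ALIAS_MODEL(helper) "nfct-helper-" helper

static const char snmp_alias[] = NFCT_HELPER_ALIAS_MODEL("snmp_trap");

/* Returns 1 if a request for helper `name` would match the alias above,
 * assuming the request is built as "nfct-helper-<name>". */
static int alias_matches_model(const char *name)
{
    char want[64];
    int n = snprintf(want, sizeof(want), "nfct-helper-%s", name);

    if (n < 0 || (size_t)n >= sizeof(want))
        return 0;   /* name too long for this sketch */
    return strcmp(want, snmp_alias) == 0;
}
```

Because the macro argument is pasted next to a string literal, it has to be a quoted string, which matches the MODULE_ALIAS_NFCT_HELPER("snmp_trap") line in the diff.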

test command:
   %iptables -t raw -I PREROUTING -p udp -j CT --helper snmp_trap
   %lsmod | grep nf_nat_snmp_basic

The nf_nat_snmp_basic module is now loaded automatically.

Signed-off-by: Taehee Yoo 
---
 net/ipv4/netfilter/nf_nat_snmp_basic_main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/ipv4/netfilter/nf_nat_snmp_basic_main.c b/net/ipv4/netfilter/nf_nat_snmp_basic_main.c
index ac110c1d55b5..a0aa13bcabda 100644
--- a/net/ipv4/netfilter/nf_nat_snmp_basic_main.c
+++ b/net/ipv4/netfilter/nf_nat_snmp_basic_main.c
@@ -60,6 +60,7 @@ MODULE_LICENSE("GPL");
 MODULE_AUTHOR("James Morris ");
 MODULE_DESCRIPTION("Basic SNMP Application Layer Gateway");
 MODULE_ALIAS("ip_nat_snmp_basic");
+MODULE_ALIAS_NFCT_HELPER("snmp_trap");
 
 #define SNMP_PORT 161
 #define SNMP_TRAP_PORT 162
-- 
2.17.1



[PATCH nf 1/2] netfilter: xt_TEE: fix wrong interface selection

2018-10-06 Thread Taehee Yoo
The TEE netdevice notifier handler checks only the interface name.
However, interfaces in different netns can have the same name, so an
interface from another netns could be selected.

test commands:
   %ip netns add vm1
   %iptables -I INPUT -p icmp -j TEE --gateway 192.168.1.1 --oif enp2s0
   %ip link set enp2s0 netns vm1

The above rule is in the root netns, but it could get the ifindex of
enp2s0 in vm1 through the notifier handler.

After this patch, TEE rules are added to a per-netns list.
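
A userspace sketch of the per-netns lookup, not xt_TEE itself: the names are hypothetical, a fixed array replaces the list_head, and dev->tn plays the role of net_generic(dev_net(dev), tee_net_id). A rule only ever sees devices registered in the netns it was created in.

```c
#include <assert.h>
#include <string.h>

#define MAX_RULES_MODEL 8

struct tee_priv_model {
    const char *oif_name;   /* tginfo->oif */
    int oif;                /* resolved ifindex, -1 if unknown */
};

/* Per-netns state, like struct tee_net (an array instead of a list_head). */
struct tee_net_model {
    struct tee_priv_model *rules[MAX_RULES_MODEL];
    int nrules;
};

struct dev_model {
    struct tee_net_model *tn;   /* net_generic(dev_net(dev), tee_net_id) */
    const char *name;
    int ifindex;
};

/* tee_tg_check(): add the rule to the list of the netns it was created in. */
static void tee_add_model(struct tee_net_model *tn, struct tee_priv_model *p)
{
    assert(tn->nrules < MAX_RULES_MODEL);
    p->oif = -1;
    tn->rules[tn->nrules++] = p;
}

/* NETDEV_REGISTER: only rules in the device's own netns are examined. */
static void tee_register_event_model(const struct dev_model *dev)
{
    struct tee_net_model *tn = dev->tn;

    for (int i = 0; i < tn->nrules; i++)
        if (strcmp(dev->name, tn->rules[i]->oif_name) == 0)
            tn->rules[i]->oif = dev->ifindex;
}
```

With this layout, moving enp2s0 into vm1 only touches vm1's rule list, so the root netns rule keeps oif at -1.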

Fixes: 9e2f6c5d78db ("netfilter: Rework xt_TEE netdevice notifier")
Signed-off-by: Taehee Yoo 
---
 net/netfilter/xt_TEE.c | 69 +++---
 1 file changed, 52 insertions(+), 17 deletions(-)

diff --git a/net/netfilter/xt_TEE.c b/net/netfilter/xt_TEE.c
index 0d0d68c989df..673ad2099f97 100644
--- a/net/netfilter/xt_TEE.c
+++ b/net/netfilter/xt_TEE.c
@@ -14,6 +14,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -25,8 +27,15 @@ struct xt_tee_priv {
int oif;
 };
 
+static unsigned int tee_net_id __read_mostly;
 static const union nf_inet_addr tee_zero_address;
 
+struct tee_net {
+   struct list_head priv_list;
+   /* lock protects the priv_list */
+   struct mutex lock;
+};
+
 static unsigned int
 tee_tg4(struct sk_buff *skb, const struct xt_action_param *par)
 {
@@ -51,17 +60,16 @@ tee_tg6(struct sk_buff *skb, const struct xt_action_param *par)
 }
 #endif
 
-static DEFINE_MUTEX(priv_list_mutex);
-static LIST_HEAD(priv_list);
-
 static int tee_netdev_event(struct notifier_block *this, unsigned long event,
void *ptr)
 {
struct net_device *dev = netdev_notifier_info_to_dev(ptr);
+   struct net *net = dev_net(dev);
+   struct tee_net *tn = net_generic(net, tee_net_id);
struct xt_tee_priv *priv;
 
-   mutex_lock(_list_mutex);
-   list_for_each_entry(priv, _list, list) {
+   mutex_lock(>lock);
+   list_for_each_entry(priv, >priv_list, list) {
switch (event) {
case NETDEV_REGISTER:
if (!strcmp(dev->name, priv->tginfo->oif))
@@ -79,13 +87,14 @@ static int tee_netdev_event(struct notifier_block *this, unsigned long event,
break;
}
}
-   mutex_unlock(_list_mutex);
+   mutex_unlock(>lock);
 
return NOTIFY_DONE;
 }
 
 static int tee_tg_check(const struct xt_tgchk_param *par)
 {
+   struct tee_net *tn = net_generic(par->net, tee_net_id);
struct xt_tee_tginfo *info = par->targinfo;
struct xt_tee_priv *priv;
 
@@ -106,9 +115,9 @@ static int tee_tg_check(const struct xt_tgchk_param *par)
priv->oif = -1;
info->priv= priv;
 
-   mutex_lock(_list_mutex);
-   list_add(>list, _list);
-   mutex_unlock(_list_mutex);
+   mutex_lock(>lock);
+   list_add(>list, >priv_list);
+   mutex_unlock(>lock);
} else
info->priv = NULL;
 
@@ -118,12 +127,13 @@ static int tee_tg_check(const struct xt_tgchk_param *par)
 
 static void tee_tg_destroy(const struct xt_tgdtor_param *par)
 {
+   struct tee_net *tn = net_generic(par->net, tee_net_id);
struct xt_tee_tginfo *info = par->targinfo;
 
if (info->priv) {
-   mutex_lock(_list_mutex);
+   mutex_lock(>lock);
list_del(>priv->list);
-   mutex_unlock(_list_mutex);
+   mutex_unlock(>lock);
kfree(info->priv);
}
static_key_slow_dec(_tee_enabled);
@@ -156,6 +166,21 @@ static struct xt_target tee_tg_reg[] __read_mostly = {
 #endif
 };
 
+static int __net_init tee_net_init(struct net *net)
+{
+   struct tee_net *tn = net_generic(net, tee_net_id);
+
+   INIT_LIST_HEAD(>priv_list);
+   mutex_init(>lock);
+   return 0;
+}
+
+static struct pernet_operations tee_net_ops = {
+   .init = tee_net_init,
+   .id   = _net_id,
+   .size = sizeof(struct tee_net),
+};
+
 static struct notifier_block tee_netdev_notifier = {
.notifier_call = tee_netdev_event,
 };
@@ -164,22 +189,32 @@ static int __init tee_tg_init(void)
 {
int ret;
 
-   ret = xt_register_targets(tee_tg_reg, ARRAY_SIZE(tee_tg_reg));
-   if (ret)
+   ret = register_pernet_subsys(_net_ops);
+   if (ret < 0)
return ret;
+
+   ret = xt_register_targets(tee_tg_reg, ARRAY_SIZE(tee_tg_reg));
+   if (ret < 0)
+   goto cleanup_subsys;
+
ret = register_netdevice_notifier(_netdev_notifier);
-   if (ret) {
-   xt_unregister_targets(tee_tg_reg, ARRAY_SIZE(tee_tg_reg));
-   return ret;
-   }
+   if (ret < 0)
+   goto unregister_targets;
 
return 0;
+
+unregister_targets:
+   xt_unregist

[PATCH nf 0/2] netfilter: xt_TEE: fix bugs in xt_TEE

2018-10-06 Thread Taehee Yoo
This patchset fixes bugs in xt_TEE.c.

The first patch fixes wrong interface selection.
In the netdevice notifier handler of xt_TEE, an interface belonging to
another netns could be selected, which is wrong behaviour.

The second patch adds the missing code that looks up the interface's
index (dev->ifindex) when a rule is inserted.
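A minimal user-space sketch of the idea behind the first patch, assuming a toy model in which every rule's private data sits on a list owned by exactly one namespace, and the device notifier only walks the list of the namespace the device belongs to. All names (`toy_net`, `toy_priv`, `toy_dev`, `toy_netdev_register`) are hypothetical and are not the xt_TEE API.

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX_PRIV 4

/* Hypothetical per-rule data: configured interface name and its index. */
struct toy_priv {
	char oif[16];
	int oif_index;		/* -1 while the interface is unknown */
};

/* Hypothetical per-namespace state, playing the role of struct tee_net. */
struct toy_net {
	struct toy_priv *privs[TOY_MAX_PRIV];
	int nr_privs;
};

/* Hypothetical device: it always belongs to one namespace. */
struct toy_dev {
	struct toy_net *net;
	char name[16];
	int ifindex;
};

/* A rule is added to the list of its own namespace only. */
static void toy_add_priv(struct toy_net *tn, struct toy_priv *priv)
{
	assert(tn->nr_privs < TOY_MAX_PRIV);
	tn->privs[tn->nr_privs++] = priv;
}

/* On device registration only the device's own namespace is searched,
 * so a same-named rule in another netns is never updated. */
static void toy_netdev_register(const struct toy_dev *dev)
{
	struct toy_net *tn = dev->net;

	for (int i = 0; i < tn->nr_privs; i++)
		if (strcmp(dev->name, tn->privs[i]->oif) == 0)
			tn->privs[i]->oif_index = dev->ifindex;
}
```

Registering `eth0` in namespace `b` updates only the rule stored in `b`; the identically named rule in `a` keeps `oif_index == -1`.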

Taehee Yoo (2):
  netfilter: xt_TEE: fix wrong interface selection
  netfilter: xt_TEE: add missing code to get interface index in checkentry.

 net/netfilter/xt_TEE.c | 76 --
 1 file changed, 59 insertions(+), 17 deletions(-)

-- 
2.17.1



[PATCH nf 2/2] netfilter: ipt_CLUSTERIP: remove wrong WARN_ON_ONCE in netns exit routine

2018-10-05 Thread Taehee Yoo
When a network namespace is destroyed, both clusterip_tg_destroy() and
clusterip_net_exit() are called, and clusterip_net_exit() is called
before clusterip_tg_destroy(). Hence the cleanup check in
clusterip_net_exit(), which expects the config list to be empty already,
does not make sense.
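A small model of the ordering described above, assuming (as the commit message states) that the pernet exit callback runs before the target destructor. The names `toy_cn`, `toy_net_exit_old`, `toy_tg_destroy` and `toy_netns_del` are hypothetical stand-ins, and the `warned` flag only imitates WARN_ON_ONCE().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-netns CLUSTERIP state. */
struct toy_cn {
	int nr_configs;		/* configs still on the per-net list */
	bool warned;		/* set when the old emptiness check fires */
};

/* Old exit callback: complains if any config is still listed. */
static void toy_net_exit_old(struct toy_cn *cn)
{
	if (cn->nr_configs != 0)
		cn->warned = true;	/* stands in for WARN_ON_ONCE() */
}

/* Target destructor: removes one config from the list. */
static void toy_tg_destroy(struct toy_cn *cn)
{
	assert(cn->nr_configs > 0);
	cn->nr_configs--;
}

/* Teardown in the order given in the commit message:
 * pernet exit first, rule destructors afterwards. */
static void toy_netns_del(struct toy_cn *cn)
{
	toy_net_exit_old(cn);
	while (cn->nr_configs > 0)
		toy_tg_destroy(cn);
}
```

Every config is eventually released, yet the check still fires because it runs too early, which is why the patch drops it rather than fixing a leak.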

test commands:
   %ip netns add vm1
   %ip netns exec vm1 bash
   %ip link set lo up
   %iptables -A INPUT -p tcp -i lo -d 192.168.0.5 --dport 80 \
 -j CLUSTERIP --new --hashmode sourceip \
 --clustermac 01:00:5e:00:00:20 --total-nodes 2 --local-node 1
   %exit
   %ip netns del vm1

splat looks like:
[  341.184508] WARNING: CPU: 1 PID: 87 at net/ipv4/netfilter/ipt_CLUSTERIP.c:840 clusterip_net_exit+0x319/0x380 [ipt_CLUSTERIP]
[  341.184850] Modules linked in: ipt_CLUSTERIP nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_tcpudp iptable_filter bpfilter ip_tables x_tables
[  341.184850] CPU: 1 PID: 87 Comm: kworker/u4:2 Not tainted 4.19.0-rc5+ #16
[  341.227509] Workqueue: netns cleanup_net
[  341.227509] RIP: 0010:clusterip_net_exit+0x319/0x380 [ipt_CLUSTERIP]
[  341.227509] Code: 0f 85 7f fe ff ff 48 c7 c2 80 64 2c c0 be a8 02 00 00 48 c7 c7 a0 63 2c c0 c6 05 18 6e 00 00 01 e8 bc 38 ff f5 e9 5b fe ff ff <0f> 0b e9 33 ff ff ff e8 4b 90 50 f6 e9 2d fe ff ff 48 89 df e8 de
[  341.227509] RSP: 0018:88011086f408 EFLAGS: 00010202
[  341.227509] RAX: dc00 RBX: 11002210de85 RCX: 
[  341.227509] RDX: 11002210de85 RSI: 880110813be8 RDI: ed002210de58
[  341.227509] RBP: 88011086f4d0 R08:  R09: 
[  341.227509] R10:  R11:  R12: 11002210de81
[  341.227509] R13: 880110625a48 R14: 880114cec8c8 R15: 0014
[  341.227509] FS:  () GS:88011660() knlGS:
[  341.227509] CS:  0010 DS:  ES:  CR0: 80050033
[  341.227509] CR2: 7f11fd38e000 CR3: 00013ca16000 CR4: 001006e0
[  341.227509] Call Trace:
[  341.227509]  ? __clusterip_config_find+0x460/0x460 [ipt_CLUSTERIP]
[  341.227509]  ? default_device_exit+0x1ca/0x270
[  341.227509]  ? remove_proc_entry+0x1cd/0x390
[  341.227509]  ? dev_change_net_namespace+0xd00/0xd00
[  341.227509]  ? __init_waitqueue_head+0x130/0x130
[  341.227509]  ops_exit_list.isra.10+0x94/0x140
[  341.227509]  cleanup_net+0x45b/0x900
[ ... ]

Fixes: 613d0776d3fe ("netfilter: exit_net cleanup check added")
Signed-off-by: Taehee Yoo 
---
 net/ipv4/netfilter/ipt_CLUSTERIP.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/ipv4/netfilter/ipt_CLUSTERIP.c b/net/ipv4/netfilter/ipt_CLUSTERIP.c
index 6ccabe6f74a6..20b452df856c 100644
--- a/net/ipv4/netfilter/ipt_CLUSTERIP.c
+++ b/net/ipv4/netfilter/ipt_CLUSTERIP.c
@@ -835,7 +835,6 @@ static void clusterip_net_exit(struct net *net)
cn->procdir = NULL;
 #endif
nf_unregister_net_hook(net, _arp_ops);
-   WARN_ON_ONCE(!list_empty(>configs));
 }
 
 static struct pernet_operations clusterip_net_ops = {
-- 
2.17.1



[PATCH nf 1/2] netfilter: ipt_CLUSTERIP: fix deadlock in netns exit routine

2018-10-05 Thread Taehee Yoo
When a network namespace is destroyed, cleanup_net() is called.
cleanup_net() holds pernet_ops_rwsem and then calls each ->exit callback,
so clusterip_tg_destroy() is reached from cleanup_net() (through
iptable_filter_net_exit(), as the backtrace below shows).
clusterip_tg_destroy() then calls unregister_netdevice_notifier().

But unregister_netdevice_notifier() tries to take pernet_ops_rwsem,
which cleanup_net() already holds, hence a deadlock occurs.

After this patch, only one notifier is registered, when the module is
inserted, and all configs are added to a per-net list.
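The lock recursion above can be modelled in user space with a toy non-recursive lock that reports a second acquisition instead of hanging, roughly what lockdep flags for pernet_ops_rwsem. This is a sketch under that assumption; every name (`toy_lock`, `toy_cleanup_net`, `toy_tg_destroy_old`, `toy_tg_destroy_new`) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical non-recursive lock. */
struct toy_lock {
	bool held;
};

/* Returns false instead of blocking forever when the lock is already
 * held; a real rwsem taken twice by the same task would deadlock. */
static bool toy_lock_acquire(struct toy_lock *l)
{
	if (l->held)
		return false;
	l->held = true;
	return true;
}

static void toy_lock_release(struct toy_lock *l)
{
	l->held = false;
}

static struct toy_lock toy_pernet_ops_rwsem;

/* Stand-in for unregister_netdevice_notifier(): it takes the lock. */
static bool toy_unregister_notifier(void)
{
	if (!toy_lock_acquire(&toy_pernet_ops_rwsem))
		return false;	/* recursive acquisition detected */
	toy_lock_release(&toy_pernet_ops_rwsem);
	return true;
}

/* Old behaviour: each target destructor unregisters its own notifier. */
static bool toy_tg_destroy_old(void)
{
	return toy_unregister_notifier();
}

/* New behaviour: the single notifier is handled at module init/exit,
 * so the destructor never touches the lock. */
static bool toy_tg_destroy_new(void)
{
	return true;
}

/* Stand-in for cleanup_net(): holds the lock while running ->exit. */
static bool toy_cleanup_net(bool (*tg_destroy)(void))
{
	bool ok;

	if (!toy_lock_acquire(&toy_pernet_ops_rwsem))
		return false;
	ok = tg_destroy();
	toy_lock_release(&toy_pernet_ops_rwsem);
	return ok;
}
```

With the old destructor the nested acquisition is detected; with the new one teardown completes, and unregistering outside the locked region still works.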

test commands:
   %ip netns add vm1
   %ip netns exec vm1 bash
   %ip link set lo up
   %iptables -A INPUT -p tcp -i lo -d 192.168.0.5 --dport 80 \
 -j CLUSTERIP --new --hashmode sourceip \
 --clustermac 01:00:5e:00:00:20 --total-nodes 2 --local-node 1
   %exit
   %ip netns del vm1

splat looks like:
[  341.809674] 
[  341.809674] WARNING: possible recursive locking detected
[  341.809674] 4.19.0-rc5+ #16 Tainted: GW
[  341.809674] 
[  341.809674] kworker/u4:2/87 is trying to acquire lock:
[  341.809674] 5da2d519 (pernet_ops_rwsem){}, at: unregister_netdevice_notifier+0x8c/0x460
[  341.809674]
[  341.809674] but task is already holding lock:
[  341.809674] 5da2d519 (pernet_ops_rwsem){}, at: cleanup_net+0x119/0x900
[  341.809674]
[  341.809674] other info that might help us debug this:
[  341.809674]  Possible unsafe locking scenario:
[  341.809674]
[  341.809674]CPU0
[  341.809674]
[  341.809674]   lock(pernet_ops_rwsem);
[  341.809674]   lock(pernet_ops_rwsem);
[  341.809674]
[  341.809674]  *** DEADLOCK ***
[  341.809674]
[  341.809674]  May be due to missing lock nesting notation
[  341.809674]
[  341.809674] 3 locks held by kworker/u4:2/87:
[  341.809674]  #0: d9df6c92 ((wq_completion)"%s""netns"){+.+.}, at: process_one_work+0xafe/0x1de0
[  341.809674]  #1: c2cbcee2 (net_cleanup_work){+.+.}, at: process_one_work+0xb60/0x1de0
[  341.809674]  #2: 5da2d519 (pernet_ops_rwsem){}, at: cleanup_net+0x119/0x900
[  341.809674]
[  341.809674] stack backtrace:
[  341.809674] CPU: 1 PID: 87 Comm: kworker/u4:2 Tainted: GW 4.19.0-rc5+ #16
[  341.809674] Workqueue: netns cleanup_net
[  341.809674] Call Trace:
[ ... ]
[  342.070196]  down_write+0x93/0x160
[  342.070196]  ? unregister_netdevice_notifier+0x8c/0x460
[  342.070196]  ? down_read+0x1e0/0x1e0
[  342.070196]  ? sched_clock_cpu+0x126/0x170
[  342.070196]  ? find_held_lock+0x39/0x1c0
[  342.070196]  unregister_netdevice_notifier+0x8c/0x460
[  342.070196]  ? register_netdevice_notifier+0x790/0x790
[  342.070196]  ? __local_bh_enable_ip+0xe9/0x1b0
[  342.070196]  ? __local_bh_enable_ip+0xe9/0x1b0
[  342.070196]  ? clusterip_tg_destroy+0x372/0x650 [ipt_CLUSTERIP]
[  342.070196]  ? trace_hardirqs_on+0x93/0x210
[  342.070196]  ? __bpf_trace_preemptirq_template+0x10/0x10
[  342.070196]  ? clusterip_tg_destroy+0x372/0x650 [ipt_CLUSTERIP]
[  342.123094]  clusterip_tg_destroy+0x3ad/0x650 [ipt_CLUSTERIP]
[  342.123094]  ? clusterip_net_init+0x3d0/0x3d0 [ipt_CLUSTERIP]
[  342.123094]  ? cleanup_match+0x17d/0x200 [ip_tables]
[  342.123094]  ? xt_unregister_table+0x215/0x300 [x_tables]
[  342.123094]  ? kfree+0xe2/0x2a0
[  342.123094]  cleanup_entry+0x1d5/0x2f0 [ip_tables]
[  342.123094]  ? cleanup_match+0x200/0x200 [ip_tables]
[  342.123094]  __ipt_unregister_table+0x9b/0x1a0 [ip_tables]
[  342.123094]  iptable_filter_net_exit+0x43/0x80 [iptable_filter]
[  342.123094]  ops_exit_list.isra.10+0x94/0x140
[  342.123094]  cleanup_net+0x45b/0x900
[ ... ]

Fixes: 202f59afd441 ("netfilter: ipt_CLUSTERIP: do not hold dev")
Signed-off-by: Taehee Yoo 
---
 net/ipv4/netfilter/ipt_CLUSTERIP.c | 71 +-
 1 file changed, 40 insertions(+), 31 deletions(-)

diff --git a/net/ipv4/netfilter/ipt_CLUSTERIP.c b/net/ipv4/netfilter/ipt_CLUSTERIP.c
index 2c8d313ae216..6ccabe6f74a6 100644
--- a/net/ipv4/netfilter/ipt_CLUSTERIP.c
+++ b/net/ipv4/netfilter/ipt_CLUSTERIP.c
@@ -59,7 +59,6 @@ struct clusterip_config {
struct rcu_head rcu;
 
char ifname[IFNAMSIZ];  /* device ifname */
-   struct notifier_block notifier; /* refresh c->ifindex in it */
 };
 
 #ifdef CONFIG_PROC_FS
@@ -118,8 +117,6 @@ clusterip_config_entry_put(struct net *net, struct clusterip_config *c)
spin_unlock(>lock);
local_bh_enable();
 
-   unregister_netdevice_notifier(>notifier);
-
return;
}
local_bh_enable();
@@ -181,32 +178,37 @@ clusterip_netdev_event(struct notifier_block *this, unsigned long event,
   void *ptr)
 {
struct net_device *dev = netdev_notifier_info_to_dev(ptr);
+   struct net *net = dev_net(dev);
+   struct clusterip_net *cn = net_generic(net, clusterip_net_id);
struct clusterip_config *c;
 
-   c = 

[PATCH nf 0/2] netfilter: ipt_CLUSTERIP: fix bugs in ipt_CLUSTERIP

2018-10-05 Thread Taehee Yoo
8/0x8
[  399.572624]  ? cyc2ns_read_end+0x10/0x10
[  399.577146]  ? save_trace+0x300/0x300
[  399.581379]  ? sched_clock_local+0xd4/0x140
[  399.586198]  ? find_held_lock+0x39/0x1c0
[  399.590730]  ? worker_thread+0x353/0x1120
[  399.595355]  ? lock_contended+0xdb0/0xdb0
[  399.599974]  ? __lock_acquire+0x4500/0x4500
[  399.604792]  ? do_raw_spin_trylock+0x101/0x1a0
[  399.609894]  ? do_raw_spin_lock+0x1f0/0x1f0
[  399.614728]  worker_thread+0x15d/0x1120
[  399.619177]  ? process_one_work+0x1de0/0x1de0
[  399.624187]  ? cyc2ns_read_end+0x10/0x10
[  399.628699]  ? save_trace+0x300/0x300
[  399.632922]  ? cyc2ns_read_end+0x10/0x10
[  399.637434]  ? kasan_kmalloc+0xa0/0xd0
[  399.641766]  ? sched_clock_local+0xd4/0x140
[  399.646638]  ? find_held_lock+0x39/0x1c0
[  399.651177]  ? check_flags.part.36+0x450/0x450
[  399.656284]  ? _raw_spin_unlock_irqrestore+0x32/0x50
[  399.661968]  ? __kthread_parkme+0x44/0x180
[  399.88]  ? __bpf_trace_preemptirq_template+0x10/0x10
[  399.672787]  ? __kthread_parkme+0xb6/0x180
[  399.677504]  ? process_one_work+0x1de0/0x1de0
[  399.682501]  kthread+0x322/0x3e0
[  399.686238]  ? kthread_create_worker_on_cpu+0xc0/0xc0
[  399.692027]  ret_from_fork+0x3a/0x50
[  399.696210] INFO: lockdep is turned off.


Taehee Yoo (2):
  netfilter: ipt_CLUSTERIP: fix deadlock in netns exit routine
  netfilter: ipt_CLUSTERIP: remove wrong WARN_ON_ONCE in netns exit routine

 net/ipv4/netfilter/ipt_CLUSTERIP.c | 72 +-
 1 file changed, 40 insertions(+), 32 deletions(-)

-- 
2.17.1



[PATCH nf-next] netfilter: nf_flow_table: remove flowtable hook flush routine in netns exit routine

2018-10-01 Thread Taehee Yoo
When a device is unregistered, the flowtable flush routine is called by
the notifier_call (nf_tables_flowtable_event). The exit callback of the
nftables pernet_operations (nf_tables_exit_net) also has a flowtable
flush routine. When a network namespace is destroyed, both the
notifier_call and the pernet_operations exit callback are called, so the
flowtable flush routine in the pernet_operations exit callback is
unnecessary.

test commands:
   %ip netns add vm1
   %ip netns exec vm1 nft add table ip filter
   %ip netns exec vm1 nft add flowtable ip filter w \
{ hook ingress priority 0\; devices = { lo }\; }
   %ip netns del vm1

splat looks like:
[  265.187019] WARNING: CPU: 0 PID: 87 at net/netfilter/core.c:309 nf_hook_entry_head+0xc7/0xf0
[  265.187112] Modules linked in: nf_flow_table_ipv4 nf_flow_table nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink ip_tables x_tables
[  265.187390] CPU: 0 PID: 87 Comm: kworker/u4:2 Not tainted 4.19.0-rc3+ #5
[  265.187453] Workqueue: netns cleanup_net
[  265.187514] RIP: 0010:nf_hook_entry_head+0xc7/0xf0
[  265.187546] Code: 8d 81 68 03 00 00 5b c3 89 d0 83 fa 04 48 8d 84 c7 e8 11 00 00 76 81 0f 0b 31 c0 e9 78 ff ff ff 0f 0b 48 83 c4 08 31 c0 5b c3 <0f> 0b 31 c0 e9 65 ff ff ff 0f 0b 31 c0 e9 5c ff ff ff 48 89 0c 24
[  265.187573] RSP: 0018:88011546f098 EFLAGS: 00010246
[  265.187624] RAX: 8d90e135 RBX: 110022a8de1c RCX: 
[  265.187645] RDX:  RSI: 0005 RDI: 880116298040
[  265.187645] RBP: 88010ea4c1a8 R08:  R09: 
[  265.187645] R10: 88011546f1d8 R11: ed0022c532c1 R12: 88010ea4c1d0
[  265.187645] R13: 0005 R14: dc00 R15: 88010ea4c1c4
[  265.187645] FS:  () GS:88011b20() knlGS:
[  265.187645] CS:  0010 DS:  ES:  CR0: 80050033
[  265.187645] CR2: 7fdfb8d0 CR3: 57a16000 CR4: 001006f0
[  265.187645] Call Trace:
[  265.187645]  __nf_unregister_net_hook+0xca/0x5d0
[  265.187645]  ? nf_hook_entries_free.part.3+0x80/0x80
[  265.187645]  ? save_trace+0x300/0x300
[  265.187645]  nf_unregister_net_hooks+0x2e/0x40
[  265.187645]  nf_tables_exit_net+0x479/0x1340 [nf_tables]
[  265.187645]  ? find_held_lock+0x39/0x1c0
[  265.187645]  ? nf_tables_abort+0x30/0x30 [nf_tables]
[  265.187645]  ? inet_frag_destroy_rcu+0xd0/0xd0
[  265.187645]  ? trace_hardirqs_on+0x93/0x210
[  265.187645]  ? __bpf_trace_preemptirq_template+0x10/0x10
[  265.187645]  ? inet_frag_destroy_rcu+0xd0/0xd0
[  265.187645]  ? inet_frag_destroy_rcu+0xd0/0xd0
[  265.187645]  ? __mutex_unlock_slowpath+0x17f/0x740
[  265.187645]  ? wait_for_completion+0x710/0x710
[  265.187645]  ? bucket_table_free+0xb2/0x1f0
[  265.187645]  ? nested_table_free+0x130/0x130
[  265.187645]  ? __lock_is_held+0xb4/0x140
[  265.187645]  ops_exit_list.isra.10+0x94/0x140
[  265.187645]  cleanup_net+0x45b/0x900
[ ... ]

This WARNING means that hook unregistration failed because all
flowtable hooks had already been unregistered by the notifier_call.

The network namespace exit routine guarantees that all devices are
unregistered first, and only then are the other pernet_operations exit
callbacks called. So removing the flowtable flush routine from the
pernet_operations exit callback (nf_tables_exit_net) does not leak
flowtables.
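A toy model of why the second unregistration fails, assuming the teardown order stated above (device notifier first, pernet exit second). The hook registry, the `warnings` counter that imitates the WARNING, and all function names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical registration state of one flowtable hook. */
struct toy_hook {
	bool registered;
	int warnings;	/* counts failed unregistrations, like the WARNING */
};

static int toy_unregister_hook(struct toy_hook *h)
{
	if (!h->registered) {
		h->warnings++;
		return -ENOENT;
	}
	h->registered = false;
	return 0;
}

/* Device notifier: the device goes away and takes its hooks with it. */
static void toy_flowtable_event(struct toy_hook *h)
{
	toy_unregister_hook(h);
}

/* Pernet exit: flush_hooks selects the old (true) or patched (false) code. */
static void toy_exit_net(struct toy_hook *h, bool flush_hooks)
{
	if (flush_hooks)
		toy_unregister_hook(h);
}

/* Netns teardown: devices are unregistered before pernet ->exit runs. */
static void toy_netns_del(struct toy_hook *h, bool flush_hooks)
{
	toy_flowtable_event(h);
	toy_exit_net(h, flush_hooks);
}
```

Only the old path tries to unregister an already-removed hook; the patched path still ends with the hook unregistered, so nothing leaks.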

Signed-off-by: Taehee Yoo 
---
 net/netfilter/nf_tables_api.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index f0159eea2978..42487d01a3ed 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -7280,9 +7280,6 @@ static void __nft_release_tables(struct net *net)
 
list_for_each_entry(chain, >chains, list)
nf_tables_unregister_hook(net, table, chain);
-   list_for_each_entry(flowtable, >flowtables, list)
-   nf_unregister_net_hooks(net, flowtable->ops,
-   flowtable->ops_len);
/* No packets are walking on these chains anymore. */
ctx.table = table;
list_for_each_entry(chain, >chains, list) {
-- 
2.17.1



[PATCH nf-next] netfilter: nf_tables: use rhashtable_lookup() instead of rhashtable_lookup_fast()

2018-09-24 Thread Taehee Yoo
Internally, rhashtable_lookup_fast() calls rcu_read_lock() and then
rhashtable_lookup(). So in places that are already protected by the
RCU read lock, rhashtable_lookup() is enough.
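The wrapper relationship described above, sketched in user space with a hypothetical read-side nesting counter and a linear scan in place of a real hash table. `toy_lookup` requires the caller to be inside a read-side section; `toy_lookup_fast` opens its own, which is redundant when the caller already holds one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical RCU read-side nesting depth, for illustration only. */
static int toy_rcu_depth;

static void toy_rcu_read_lock(void)   { toy_rcu_depth++; }
static void toy_rcu_read_unlock(void) { toy_rcu_depth--; }

/* Plain lookup: the caller must already be in a read-side section. */
static const int *toy_lookup(const int *table, size_t n, int key)
{
	assert(toy_rcu_depth > 0);
	for (size_t i = 0; i < n; i++)
		if (table[i] == key)
			return &table[i];
	return NULL;
}

/* "_fast" variant: wraps the plain lookup in its own read-side section. */
static const int *toy_lookup_fast(const int *table, size_t n, int key)
{
	const int *ret;

	toy_rcu_read_lock();
	ret = toy_lookup(table, n, key);
	toy_rcu_read_unlock();
	return ret;
}
```

Inside an existing read-side section both calls return the same entry; the `_fast` one merely adds a nested lock/unlock pair.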

Signed-off-by: Taehee Yoo 
---
 net/netfilter/nf_flow_table_core.c | 4 ++--
 net/netfilter/nft_set_hash.c   | 8 
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c
index da3044482317..185c633b6872 100644
--- a/net/netfilter/nf_flow_table_core.c
+++ b/net/netfilter/nf_flow_table_core.c
@@ -233,8 +233,8 @@ flow_offload_lookup(struct nf_flowtable *flow_table,
struct flow_offload *flow;
int dir;
 
-   tuplehash = rhashtable_lookup_fast(_table->rhashtable, tuple,
-  nf_flow_offload_rhash_params);
+   tuplehash = rhashtable_lookup(_table->rhashtable, tuple,
+ nf_flow_offload_rhash_params);
if (!tuplehash)
return NULL;
 
diff --git a/net/netfilter/nft_set_hash.c b/net/netfilter/nft_set_hash.c
index 4f9c01715856..339a9dd1c832 100644
--- a/net/netfilter/nft_set_hash.c
+++ b/net/netfilter/nft_set_hash.c
@@ -88,7 +88,7 @@ static bool nft_rhash_lookup(const struct net *net, const struct nft_set *set,
.key = key,
};
 
-   he = rhashtable_lookup_fast(>ht, , nft_rhash_params);
+   he = rhashtable_lookup(>ht, , nft_rhash_params);
if (he != NULL)
*ext = >ext;
 
@@ -106,7 +106,7 @@ static void *nft_rhash_get(const struct net *net, const struct nft_set *set,
.key = elem->key.val.data,
};
 
-   he = rhashtable_lookup_fast(>ht, , nft_rhash_params);
+   he = rhashtable_lookup(>ht, , nft_rhash_params);
if (he != NULL)
return he;
 
@@ -129,7 +129,7 @@ static bool nft_rhash_update(struct nft_set *set, const u32 *key,
.key = key,
};
 
-   he = rhashtable_lookup_fast(>ht, , nft_rhash_params);
+   he = rhashtable_lookup(>ht, , nft_rhash_params);
if (he != NULL)
goto out;
 
@@ -217,7 +217,7 @@ static void *nft_rhash_deactivate(const struct net *net,
};
 
rcu_read_lock();
-   he = rhashtable_lookup_fast(>ht, , nft_rhash_params);
+   he = rhashtable_lookup(>ht, , nft_rhash_params);
if (he != NULL &&
!nft_rhash_flush(net, set, he))
he = NULL;
-- 
2.17.1



[PATCH nf-next] netfilter: nf_flow_table: remove unnecessary nat flag check code

2018-09-24 Thread Taehee Yoo
nf_flow_offload_{ip/ipv6}_hook() check the NAT flags and then call
nf_flow_nat_{ip/ipv6}(), but those functions also check the NAT flags,
so the NAT flag checks in nf_flow_offload_{ip/ipv6}_hook() are
unnecessary.
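A sketch of why the caller-side check is redundant, assuming (as the commit message states) that the NAT helper returns early when neither flag is set. `TOY_SNAT`/`TOY_DNAT` are hypothetical stand-ins for FLOW_OFFLOAD_SNAT/FLOW_OFFLOAD_DNAT, and the return values 1/-1 stand in for NF_ACCEPT/NF_DROP.

```c
#include <assert.h>

#define TOY_SNAT 0x1	/* hypothetical stand-in for FLOW_OFFLOAD_SNAT */
#define TOY_DNAT 0x2	/* hypothetical stand-in for FLOW_OFFLOAD_DNAT */

static int toy_nat_mangled;	/* how often addresses/ports were rewritten */

/* Callee: already returns early when no NAT flag is set. */
static int toy_flow_nat(unsigned int flags)
{
	if (!(flags & (TOY_SNAT | TOY_DNAT)))
		return 0;
	toy_nat_mangled++;	/* the real helper would rewrite headers here */
	return 0;
}

/* Before the patch: the caller repeats the flag test. */
static int toy_hook_old(unsigned int flags)
{
	if ((flags & (TOY_SNAT | TOY_DNAT)) && toy_flow_nat(flags) < 0)
		return -1;	/* NF_DROP */
	return 1;		/* NF_ACCEPT */
}

/* After the patch: the duplicate test is gone. */
static int toy_hook_new(unsigned int flags)
{
	if (toy_flow_nat(flags) < 0)
		return -1;
	return 1;
}
```

For every flag combination both variants give the same verdict and mangle the same packets; the old one only skips a call that would have returned immediately.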

Signed-off-by: Taehee Yoo 
---
 net/netfilter/nf_flow_table_ip.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/net/netfilter/nf_flow_table_ip.c b/net/netfilter/nf_flow_table_ip.c
index 15ed91309992..1d291a51cd45 100644
--- a/net/netfilter/nf_flow_table_ip.c
+++ b/net/netfilter/nf_flow_table_ip.c
@@ -254,8 +254,7 @@ nf_flow_offload_ip_hook(void *priv, struct sk_buff *skb,
if (nf_flow_state_check(flow, ip_hdr(skb)->protocol, skb, thoff))
return NF_ACCEPT;
 
-   if (flow->flags & (FLOW_OFFLOAD_SNAT | FLOW_OFFLOAD_DNAT) &&
-   nf_flow_nat_ip(flow, skb, thoff, dir) < 0)
+   if (nf_flow_nat_ip(flow, skb, thoff, dir) < 0)
return NF_DROP;
 
flow->timeout = (u32)jiffies + NF_FLOW_TIMEOUT;
@@ -471,8 +470,7 @@ nf_flow_offload_ipv6_hook(void *priv, struct sk_buff *skb,
if (skb_try_make_writable(skb, sizeof(*ip6h)))
return NF_DROP;
 
-   if (flow->flags & (FLOW_OFFLOAD_SNAT | FLOW_OFFLOAD_DNAT) &&
-   nf_flow_nat_ipv6(flow, skb, dir) < 0)
+   if (nf_flow_nat_ipv6(flow, skb, dir) < 0)
return NF_DROP;
 
flow->timeout = (u32)jiffies + NF_FLOW_TIMEOUT;
-- 
2.17.1



[PATCH nf-next] netfilter: nf_tables: use rhashtable_walk_enter instead of rhashtable_walk_init

2018-09-13 Thread Taehee Yoo
rhashtable_walk_init() is deprecated, and rhashtable_walk_enter() can be
used instead. rhashtable_walk_init() is a wrapper around
rhashtable_walk_enter(), so the logic is actually the same.
But rhashtable_walk_enter() does not return an error, hence the error
path code can be removed.
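Once entering the walk cannot fail, the only errors left come from the walk itself. The loop shape used in the patch (skip -EAGAIN, which signals a concurrent table resize, and stop on any other error) is sketched here over a hypothetical array of pre-recorded walker results instead of a real rhashtable iterator.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical walker result: either a value or a negative errno. */
struct toy_step {
	int err;	/* 0, -EAGAIN, or another negative errno */
	int value;
};

/* Sums the values, continuing past -EAGAIN and stopping on any other
 * error, mirroring the loop structure in the patch. */
static int toy_walk(const struct toy_step *steps, size_t n, int *sum)
{
	int err = 0;

	*sum = 0;
	for (size_t i = 0; i < n; i++) {
		if (steps[i].err) {
			if (steps[i].err != -EAGAIN) {
				err = steps[i].err;
				break;
			}
			continue;
		}
		*sum += steps[i].value;
	}
	/* walk_stop()/walk_exit() would run here on every path */
	return err;
}
```

Because both exits of the loop fall through to the same cleanup point, the `out:` label in the original code is no longer needed.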

Signed-off-by: Taehee Yoo 
---
 net/netfilter/nf_flow_table_core.c | 35 ++
 net/netfilter/nft_set_hash.c   | 30 +++--
 2 files changed, 19 insertions(+), 46 deletions(-)

diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c
index d8125616edc7..c0ebebb303ec 100644
--- a/net/netfilter/nf_flow_table_core.c
+++ b/net/netfilter/nf_flow_table_core.c
@@ -254,20 +254,17 @@ int nf_flow_table_iterate(struct nf_flowtable *flow_table,
struct flow_offload_tuple_rhash *tuplehash;
struct rhashtable_iter hti;
struct flow_offload *flow;
-   int err;
-
-   err = rhashtable_walk_init(_table->rhashtable, , GFP_KERNEL);
-   if (err)
-   return err;
+   int err = 0;
 
+   rhashtable_walk_enter(_table->rhashtable, );
rhashtable_walk_start();
 
while ((tuplehash = rhashtable_walk_next())) {
if (IS_ERR(tuplehash)) {
-   err = PTR_ERR(tuplehash);
-   if (err != -EAGAIN)
-   goto out;
-
+   if (PTR_ERR(tuplehash) != -EAGAIN) {
+   err = PTR_ERR(tuplehash);
+   break;
+   }
continue;
}
if (tuplehash->tuple.dir)
@@ -277,7 +274,6 @@ int nf_flow_table_iterate(struct nf_flowtable *flow_table,
 
iter(flow, data);
}
-out:
rhashtable_walk_stop();
rhashtable_walk_exit();
 
@@ -290,25 +286,19 @@ static inline bool nf_flow_has_expired(const struct flow_offload *flow)
return (__s32)(flow->timeout - (u32)jiffies) <= 0;
 }
 
-static int nf_flow_offload_gc_step(struct nf_flowtable *flow_table)
+static void nf_flow_offload_gc_step(struct nf_flowtable *flow_table)
 {
struct flow_offload_tuple_rhash *tuplehash;
struct rhashtable_iter hti;
struct flow_offload *flow;
-   int err;
-
-   err = rhashtable_walk_init(_table->rhashtable, , GFP_KERNEL);
-   if (err)
-   return 0;
 
+   rhashtable_walk_enter(_table->rhashtable, );
rhashtable_walk_start();
 
while ((tuplehash = rhashtable_walk_next())) {
if (IS_ERR(tuplehash)) {
-   err = PTR_ERR(tuplehash);
-   if (err != -EAGAIN)
-   goto out;
-
+   if (PTR_ERR(tuplehash) != -EAGAIN)
+   break;
continue;
}
if (tuplehash->tuple.dir)
@@ -321,11 +311,8 @@ static int nf_flow_offload_gc_step(struct nf_flowtable *flow_table)
FLOW_OFFLOAD_TEARDOWN)))
flow_offload_del(flow_table, flow);
}
-out:
rhashtable_walk_stop();
rhashtable_walk_exit();
-
-   return 1;
 }
 
 static void nf_flow_offload_work_gc(struct work_struct *work)
@@ -514,7 +501,7 @@ void nf_flow_table_free(struct nf_flowtable *flow_table)
mutex_unlock(_lock);
cancel_delayed_work_sync(_table->gc_work);
nf_flow_table_iterate(flow_table, nf_flow_table_do_cleanup, NULL);
-   WARN_ON(!nf_flow_offload_gc_step(flow_table));
+   nf_flow_offload_gc_step(flow_table);
rhashtable_destroy(_table->rhashtable);
 }
 EXPORT_SYMBOL_GPL(nf_flow_table_free);
diff --git a/net/netfilter/nft_set_hash.c b/net/netfilter/nft_set_hash.c
index 015124e649cb..4f9c01715856 100644
--- a/net/netfilter/nft_set_hash.c
+++ b/net/netfilter/nft_set_hash.c
@@ -244,21 +244,15 @@ static void nft_rhash_walk(const struct nft_ctx *ctx, struct nft_set *set,
struct nft_rhash_elem *he;
struct rhashtable_iter hti;
struct nft_set_elem elem;
-   int err;
-
-   err = rhashtable_walk_init(>ht, , GFP_ATOMIC);
-   iter->err = err;
-   if (err)
-   return;
 
+   rhashtable_walk_enter(>ht, );
rhashtable_walk_start();
 
while ((he = rhashtable_walk_next())) {
if (IS_ERR(he)) {
-   err = PTR_ERR(he);
-   if (err != -EAGAIN) {
-   iter->err = err;
-   goto out;
+   if (PTR_ERR(he) != -EAGAIN) {
+   iter->err = PTR_ERR(he);
+   break;
}
 
continue;
@@ -275,13 +269,11 @@ static void nft_rhash_walk(const struct nft_ctx *ctx, struct nft_set *set,
 
iter->err = iter

[PATCH nf-next] netfilter: nat: remove duplicate skb_is_nonlinear() in __nf_nat_mangle_tcp_packet()

2018-09-12 Thread Taehee Yoo
__nf_nat_mangle_tcp_packet() and nf_nat_mangle_udp_packet() call
mangle_contents(), and both __nf_nat_mangle_tcp_packet() and
mangle_contents() check skb_is_nonlinear(). So the skb_is_nonlinear()
check in __nf_nat_mangle_tcp_packet() is unnecessary.
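A simplified sketch of the call structure: the linearity assertion lives once in the shared helper, so every caller is covered without repeating it. `toy_skb` is a hypothetical fixed-size buffer, not struct sk_buff, and the toy does not resize data the way the real helpers can.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical packet buffer standing in for struct sk_buff. */
struct toy_skb {
	bool nonlinear;
	char data[32];
};

/* Shared helper: the single place where linearity is asserted,
 * playing the role of SKB_LINEAR_ASSERT() in mangle_contents(). */
static void toy_mangle_contents(struct toy_skb *skb, size_t off,
				const char *rep)
{
	assert(!skb->nonlinear);
	memcpy(skb->data + off, rep, strlen(rep));
}

/* Both protocol-specific callers reach the check through the helper. */
static void toy_mangle_tcp(struct toy_skb *skb, size_t off, const char *rep)
{
	toy_mangle_contents(skb, off, rep);
}

static void toy_mangle_udp(struct toy_skb *skb, size_t off, const char *rep)
{
	toy_mangle_contents(skb, off, rep);
}
```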

Signed-off-by: Taehee Yoo 
---
 net/netfilter/nf_nat_helper.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/net/netfilter/nf_nat_helper.c b/net/netfilter/nf_nat_helper.c
index 99606baedda4..38793b95d9bc 100644
--- a/net/netfilter/nf_nat_helper.c
+++ b/net/netfilter/nf_nat_helper.c
@@ -37,7 +37,7 @@ static void mangle_contents(struct sk_buff *skb,
 {
unsigned char *data;
 
-   BUG_ON(skb_is_nonlinear(skb));
+   SKB_LINEAR_ASSERT(skb);
data = skb_network_header(skb) + dataoff;
 
/* move post-replacement */
@@ -110,8 +110,6 @@ bool __nf_nat_mangle_tcp_packet(struct sk_buff *skb,
!enlarge_skb(skb, rep_len - match_len))
return false;
 
-   SKB_LINEAR_ASSERT(skb);
-
tcph = (void *)skb->data + protoff;
 
oldlen = skb->len - protoff;
-- 
2.17.1



[PATCH nf-next] netfilter: nat: remove unnecessary rcu_read_lock in nf_nat_redirect_ipv{4/6}

2018-09-11 Thread Taehee Yoo
nf_nat_redirect_ipv4() and nf_nat_redirect_ipv6() are only called from
netfilter hooks, which already run under rcu_read_lock(). So the
rcu_read_lock() and rcu_read_unlock() calls in them are unnecessary.
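A minimal sketch of the invariant relied on here, assuming a hypothetical dispatcher that, like nf_hook(), wraps every hook invocation in one read-side critical section. The nesting counter and all names are illustrative only.

```c
#include <assert.h>

/* Hypothetical read-side nesting depth, for illustration only. */
static int toy_rcu_depth;

static void toy_rcu_read_lock(void)   { toy_rcu_depth++; }
static void toy_rcu_read_unlock(void) { toy_rcu_depth--; }

typedef int (*toy_hookfn)(void);

/* Dispatcher: every hook runs inside a single read-side section. */
static int toy_run_hook(toy_hookfn fn)
{
	int verdict;

	toy_rcu_read_lock();
	verdict = fn();
	toy_rcu_read_unlock();
	return verdict;
}

/* Redirect helper after the patch: takes no lock of its own and
 * reports whether it is already inside a read-side section. */
static int toy_redirect(void)
{
	return toy_rcu_depth > 0 ? 1 : 0;
}
```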

Signed-off-by: Taehee Yoo 
---
 net/netfilter/nf_nat_redirect.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/net/netfilter/nf_nat_redirect.c b/net/netfilter/nf_nat_redirect.c
index adee04af8d43..78a9e6454ff3 100644
--- a/net/netfilter/nf_nat_redirect.c
+++ b/net/netfilter/nf_nat_redirect.c
@@ -52,13 +52,11 @@ nf_nat_redirect_ipv4(struct sk_buff *skb,
 
newdst = 0;
 
-   rcu_read_lock();
indev = __in_dev_get_rcu(skb->dev);
if (indev && indev->ifa_list) {
ifa = indev->ifa_list;
newdst = ifa->ifa_local;
}
-   rcu_read_unlock();
 
if (!newdst)
return NF_DROP;
@@ -97,7 +95,6 @@ nf_nat_redirect_ipv6(struct sk_buff *skb, const struct nf_nat_range2 *range,
struct inet6_ifaddr *ifa;
bool addr = false;
 
-   rcu_read_lock();
idev = __in6_dev_get(skb->dev);
if (idev != NULL) {
read_lock_bh(>lock);
@@ -108,7 +105,6 @@ nf_nat_redirect_ipv6(struct sk_buff *skb, const struct nf_nat_range2 *range,
}
read_unlock_bh(>lock);
}
-   rcu_read_unlock();
 
if (!addr)
return NF_DROP;
-- 
2.17.1



[PATCH nf] netfilter: nft_set_rbtree: add missing rb_erase() in GC routine

2018-08-30 Thread Taehee Yoo
nft_set_gc_batch_check() checks whether the gc buffer is full.
If it is full, the buffer is released internally by
nft_set_gc_batch_complete().
In the rbtree case, rb_erase() must be called before
nft_set_gc_batch_complete(); otherwise the tree can still link to
released elements, which the next rb_next() then walks into.
Therefore rb_erase() must also be called before
nft_set_gc_batch_check().
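A simplified sketch of the ordering rule, with hypothetical stand-ins: a singly linked list replaces the rbtree, a two-slot array replaces struct nft_set_gc_batch, and freeing is immediate rather than deferred through RCU. Each expired element is unlinked (the rb_erase() step) before the batch is checked and possibly released, so no freed element stays reachable from the list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical set element kept on a singly linked list. */
struct toy_elem {
	struct toy_elem *next;
	bool expired;
};

#define TOY_GC_BATCH 2

/* Hypothetical fixed-size gc batch. */
struct toy_gc_batch {
	struct toy_elem *elems[TOY_GC_BATCH];
	int head;
};

/* Frees everything queued so far, like nft_set_gc_batch_complete(). */
static void toy_gc_complete(struct toy_gc_batch *gcb)
{
	for (int i = 0; i < gcb->head; i++)
		free(gcb->elems[i]);
	gcb->head = 0;
}

/* Removes and frees expired elements, returning how many were freed. */
static int toy_gc(struct toy_elem **list)
{
	struct toy_gc_batch gcb = { .head = 0 };
	struct toy_elem **pp = list;
	int freed = 0;

	while (*pp) {
		struct toy_elem *e = *pp;

		if (!e->expired) {
			pp = &e->next;
			continue;
		}
		*pp = e->next;			/* unlink first ...       */
		if (gcb.head == TOY_GC_BATCH)	/* ... then batch check   */
			toy_gc_complete(&gcb);
		gcb.elems[gcb.head++] = e;
		freed++;
	}
	toy_gc_complete(&gcb);
	return freed;
}
```

With four expired elements the batch fills and is released mid-walk, yet the walk never touches a freed node because each one was unlinked before being queued.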

test commands:
   table ip filter {
   set set1 {
   type ipv4_addr; flags interval, timeout;
   gc-interval 10s;
   timeout 1s;
   elements = {
   1-2,
   3-4,
   5-6,
   ...
   1-10001,
   }
   }
   }
   %nft -f test.nft

splat looks like:
[  430.273885] kasan: GPF could be caused by NULL-ptr deref or user memory access
[  430.282158] general protection fault:  [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[  430.283116] CPU: 1 PID: 190 Comm: kworker/1:2 Tainted: GB 4.18.0+ #7
[  430.283116] Workqueue: events_power_efficient nft_rbtree_gc [nf_tables_set]
[  430.313559] RIP: 0010:rb_next+0x81/0x130
[  430.313559] Code: 08 49 bd 00 00 00 00 00 fc ff df 48 bb 00 00 00 00 00 fc ff df 48 85 c0 75 05 eb 58 48 89 d4
[  430.313559] RSP: 0018:88010cdb7680 EFLAGS: 00010207
[  430.313559] RAX: 00b84854 RBX: dc00 RCX: 83f01973
[  430.313559] RDX: 0017090c RSI: 0008 RDI: 00b84864
[  430.313559] RBP: 8801060d4588 R08: fbfff09bc349 R09: fbfff09bc349
[  430.313559] R10: 0001 R11: fbfff09bc348 R12: 880100f081a8
[  430.313559] R13: dc00 R14: 880100ff8688 R15: dc00
[  430.313559] FS:  () GS:88011b40() knlGS:
[  430.313559] CS:  0010 DS:  ES:  CR0: 80050033
[  430.313559] CR2: 01551008 CR3: 5dc16000 CR4: 001006e0
[  430.313559] Call Trace:
[  430.313559]  nft_rbtree_gc+0x112/0x5c0 [nf_tables_set]
[  430.313559]  process_one_work+0xc13/0x1ec0
[  430.313559]  ? _raw_spin_unlock_irq+0x29/0x40
[  430.313559]  ? pwq_dec_nr_in_flight+0x3c0/0x3c0
[  430.313559]  ? set_load_weight+0x270/0x270
[  430.313559]  ? __switch_to_asm+0x34/0x70
[  430.313559]  ? __switch_to_asm+0x40/0x70
[  430.313559]  ? __switch_to_asm+0x34/0x70
[  430.313559]  ? __switch_to_asm+0x34/0x70
[  430.313559]  ? __switch_to_asm+0x40/0x70
[  430.313559]  ? __switch_to_asm+0x34/0x70
[  430.313559]  ? __switch_to_asm+0x40/0x70
[  430.313559]  ? __switch_to_asm+0x34/0x70
[  430.313559]  ? __switch_to_asm+0x34/0x70
[  430.313559]  ? __switch_to_asm+0x40/0x70
[  430.313559]  ? __switch_to_asm+0x34/0x70
[  430.313559]  ? __schedule+0x6d3/0x1f50
[  430.313559]  ? find_held_lock+0x39/0x1c0
[  430.313559]  ? __sched_text_start+0x8/0x8
[  430.313559]  ? cyc2ns_read_end+0x10/0x10
[  430.313559]  ? save_trace+0x300/0x300
[  430.313559]  ? sched_clock_local+0xd4/0x140
[  430.313559]  ? find_held_lock+0x39/0x1c0
[  430.313559]  ? worker_thread+0x353/0x1120
[  430.313559]  ? worker_thread+0x353/0x1120
[  430.313559]  ? lock_contended+0xe70/0xe70
[  430.313559]  ? __lock_acquire+0x4500/0x4500
[  430.535635]  ? do_raw_spin_unlock+0xa5/0x330
[  430.535635]  ? do_raw_spin_trylock+0x101/0x1a0
[  430.535635]  ? do_raw_spin_lock+0x1f0/0x1f0
[  430.535635]  ? _raw_spin_lock_irq+0x10/0x70
[  430.535635]  worker_thread+0x15d/0x1120
[ ... ]

Fixes: 8d8540c4f5e0 ("netfilter: nft_set_rbtree: add timeout support")
Signed-off-by: Taehee Yoo 
---
 net/netfilter/nft_set_rbtree.c | 28 ++--
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/net/netfilter/nft_set_rbtree.c b/net/netfilter/nft_set_rbtree.c
index 55e2d9215c0d..f95c7a9bdd69 100644
--- a/net/netfilter/nft_set_rbtree.c
+++ b/net/netfilter/nft_set_rbtree.c
@@ -356,11 +356,10 @@ static void nft_rbtree_walk(const struct nft_ctx *ctx,
 static void nft_rbtree_gc(struct work_struct *work)
 {
struct nft_set_gc_batch *gcb = NULL;
-   struct rb_node *node, *prev = NULL;
-   struct nft_rbtree_elem *rbe;
+   struct rb_node *node;
+   struct nft_rbtree_elem *rbe, *rbe_end = NULL, *rbe_prev = NULL;
struct nft_rbtree *priv;
struct nft_set *set;
-   int i;
 
priv = container_of(work, struct nft_rbtree, gc_work.work);
set  = nft_set_container_of(priv);
@@ -371,7 +370,7 @@ static void nft_rbtree_gc(struct work_struct *work)
rbe = rb_entry(node, struct nft_rbtree_elem, node);
 
if (nft_rbtree_interval_end(rbe)) {
-   prev = node;
+   rbe_end = rbe;
continue;
}
if (!nft_set_elem_expired(>ext))
@@ -379,29 +378,30 @@ static void nft_rbtree_gc(struct work_struct *work)
if (nft_set_elem_mark_busy(>ext))

[PATCH nf] netfilter: nf_tables: release chain in flushing set

2018-08-25 Thread Taehee Yoo
When an element of a verdict map is deleted, the delete routine should
release the chain it jumps to. However, the routine that flushes
verdict map elements does not release the chain.
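A toy model of the reference counting involved, assuming that a "jump c1" element holds one use of chain c1 and that chain destruction requires the use count to be zero (which is what the BUG() in the splat below appears to enforce). The deactivate step mirrors the role of nft_set_elem_deactivate() in the patch; all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical chain with a use counter. */
struct toy_chain {
	unsigned int use;
};

/* Hypothetical verdict-map element that jumps to a chain. */
struct toy_elem {
	struct toy_chain *jump;
	bool active;
};

static void toy_elem_activate(struct toy_elem *e)
{
	e->active = true;
	e->jump->use++;
}

/* Drops the element's reference on its target chain. */
static void toy_elem_deactivate(struct toy_elem *e)
{
	e->active = false;
	e->jump->use--;
}

/* Flushes all elements; 'fixed' selects the patched behaviour. */
static void toy_flush_map(struct toy_elem *elems, int n, bool fixed)
{
	for (int i = 0; i < n; i++) {
		if (!elems[i].active)
			continue;
		if (fixed)
			toy_elem_deactivate(&elems[i]);
		else
			elems[i].active = false;	/* reference leaked */
	}
}

/* Chain destruction is only legal once nothing uses the chain. */
static bool toy_chain_destroy_ok(const struct toy_chain *c)
{
	return c->use == 0;
}
```

Without the deactivate step the flushed map still pins its chain, so a later "flush ruleset" finds a chain that is in use.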

test commands:
   %nft add table ip filter
   %nft add chain ip filter c1
   %nft add map ip filter map1 { type ipv4_addr : verdict \; }
   %nft add element ip filter map1 { 1 : jump c1 }
   %nft flush map ip filter map1
   %nft flush ruleset

splat looks like:
[ 4895.170899] kernel BUG at net/netfilter/nf_tables_api.c:1415!
[ 4895.178114] invalid opcode:  [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[ 4895.178880] CPU: 0 PID: 1670 Comm: nft Not tainted 4.18.0+ #55
[ 4895.178880] RIP: 0010:nf_tables_chain_destroy.isra.28+0x39/0x220 [nf_tables]
[ 4895.178880] Code: fc ff df 53 48 89 fb 48 83 c7 50 48 89 fa 48 c1 ea 03 0f b6 04 02 84 c0 74 09 3c 03 7f 05 e8 3e 4c 25 e1 8b 43 50 85 c0 74 02 <0f> 0b 48 89 da 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 80 3c 02
[ 4895.228342] RSP: 0018:88010b98f4c0 EFLAGS: 00010202
[ 4895.234841] RAX: 0001 RBX: 8801131c6968 RCX: 8801146585b0
[ 4895.234841] RDX: 110022638d37 RSI: 8801191a9348 RDI: 8801131c69b8
[ 4895.234841] RBP: 8801146585a8 R08: 11002323526a R09: 
[ 4895.234841] R10:  R11:  R12: dead0200
[ 4895.234841] R13: dead0100 R14: a3638af8 R15: dc00
[ 4895.234841] FS:  7f6d188e6700() GS:88011b60() knlGS:
[ 4895.234841] CS:  0010 DS:  ES:  CR0: 80050033
[ 4895.234841] CR2: 7ffe72b8df88 CR3: 00010e2d4000 CR4: 001006f0
[ 4895.234841] Call Trace:
[ 4895.234841]  nf_tables_commit+0x2704/0x2c70 [nf_tables]
[ 4895.234841]  ? nfnetlink_rcv_batch+0xa4f/0x11b0 [nfnetlink]
[ 4895.234841]  ? nf_tables_setelem_notify.constprop.48+0x1a0/0x1a0 [nf_tables]
[ 4895.323824]  ? __lock_is_held+0x9d/0x130
[ 4895.323824]  ? kasan_unpoison_shadow+0x30/0x40
[ 4895.333299]  ? kasan_kmalloc+0xa9/0xc0
[ 4895.333299]  ? kmem_cache_alloc_trace+0x2c0/0x310
[ 4895.333299]  ? nfnetlink_rcv_batch+0xa4f/0x11b0 [nfnetlink]
[ 4895.333299]  nfnetlink_rcv_batch+0xdb9/0x11b0 [nfnetlink]
[ 4895.333299]  ? debug_show_all_locks+0x290/0x290
[ 4895.333299]  ? nfnetlink_net_init+0x150/0x150 [nfnetlink]
[ 4895.333299]  ? sched_clock_cpu+0xe5/0x170
[ 4895.333299]  ? sched_clock_local+0xff/0x130
[ 4895.333299]  ? sched_clock_cpu+0xe5/0x170
[ 4895.333299]  ? find_held_lock+0x39/0x1b0
[ 4895.333299]  ? sched_clock_local+0xff/0x130
[ 4895.333299]  ? memset+0x1f/0x40
[ 4895.333299]  ? nla_parse+0x33/0x260
[ 4895.333299]  ? ns_capable_common+0x6e/0x110
[ 4895.333299]  nfnetlink_rcv+0x2c0/0x310 [nfnetlink]
[ ... ]

Fixes: 8411b6442e59 ("netfilter: nf_tables: support for set flushing")
Signed-off-by: Taehee Yoo 
---
 net/netfilter/nf_tables_api.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 1dca5683f59f..2cfb173cd0b2 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -4637,6 +4637,7 @@ static int nft_flush_set(const struct nft_ctx *ctx,
}
set->ndeact++;
 
+   nft_set_elem_deactivate(ctx->net, set, elem);
nft_trans_elem_set(trans) = set;
nft_trans_elem(trans) = *elem;
list_add_tail(&trans->list, &ctx->net->nft.commit_list);
-- 
2.17.1
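
The one-line fix above makes the set flush path call nft_set_elem_deactivate() for each element, the same step the regular element-delete path takes. Below is a rough user-space model of why that matters. Every name in it (demo_chain, demo_elem, demo_flush and so on) is hypothetical and not kernel API. The model assumes that a verdict-map element which jumps to a chain holds one reference on that chain's use counter, and that chain destruction expects the counter to be zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical chain: 'use' counts jump references held by elements. */
struct demo_chain {
	unsigned int use;
};

/* Hypothetical verdict-map element whose data is "jump <chain>". */
struct demo_elem {
	struct demo_chain *jump_to;
	bool active;
};

static void demo_elem_activate(struct demo_elem *e, struct demo_chain *c)
{
	assert(!e->active);
	e->jump_to = c;
	e->active = true;
	c->use++;		/* the element now pins its jump target */
}

/* Deactivation is where the jump reference is released. */
static void demo_elem_deactivate(struct demo_elem *e)
{
	if (e->active && e->jump_to)
		e->jump_to->use--;
	e->active = false;
}

/*
 * Flush every element. do_deactivate == false models the pre-fix path:
 * elements are marked inactive but their chain references are never dropped.
 */
static void demo_flush(struct demo_elem *elems, size_t n, bool do_deactivate)
{
	for (size_t i = 0; i < n; i++) {
		if (do_deactivate)
			demo_elem_deactivate(&elems[i]);
		else
			elems[i].active = false;
	}
}

/* Destroying a chain that still has users is a bug (the splats in this
 * archive show nf_tables_chain_destroy() hitting a BUG in that situation). */
static bool demo_chain_destroyable(const struct demo_chain *c)
{
	return c->use == 0;
}
```

Flushing three elements without deactivation leaves the target chain's use count at 3, which is the kind of stale reference that later trips the chain destroy step; with deactivation the count returns to 0.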



[PATCH nf] netfilter: nft_set: fix allocation size overflow in privsize callback.

2018-07-25 Thread Taehee Yoo
les_set nf_tables nfnetlink ip_tables x_tables
[ 1239.670713] ---[ end trace 39375adcda140f11 ]---
[ 1239.676016] RIP: 0010:nft_hash_walk+0x1d2/0x310 [nf_tables_set]
[ 1239.682834] Code: 84 d2 7f 10 4c 89 e7 89 44 24 38 e8 d8 5a 17 e0 8b 44 24 38 48 8d 7b 10 41 0f b6 0c 24 48 89 fa 48 89 fe 48 c1 ea 03 83 e6 07 <42> 0f b6 14 3a 40 38 f2 7f 1a 84 d2 74 16
[ 1239.705108] RSP: 0018:8801118cf358 EFLAGS: 00010246
[ 1239.75] RAX:  RBX: 00020400 RCX: 0001
[ 1239.719269] RDX: 4082 RSI:  RDI: 00020410
[ 1239.727401] RBP: 880114d5a988 R08: 7e94 R09: 880114dd8030
[ 1239.735530] R10: 880114d5a988 R11: ed00229bb006 R12: 8801118cf4d0
[ 1239.743658] R13: 8801118cf4d8 R14:  R15: dc00
[ 1239.751785] FS:  7f5a8fe0b700() GS:88011b60() knlGS:
[ 1239.760993] CS:  0010 DS:  ES:  CR0: 80050033
[ 1239.767560] CR2: 7f5a8ecc27b0 CR3: 00010608e000 CR4: 001006f0
[ 1239.775679] Kernel panic - not syncing: Fatal exception
[ 1239.776630] Kernel Offset: 0x1f00 from 0x8100 (relocation range: 0x8000-0xbfff)
[ 1239.776630] Rebooting in 5 seconds..

Fixes: 20a69341f2d0 ("netfilter: nf_tables: add netlink set API")
Signed-off-by: Taehee Yoo 
---
 include/net/netfilter/nf_tables.h | 4 ++--
 net/netfilter/nf_tables_api.c | 2 +-
 net/netfilter/nft_set_bitmap.c| 6 +++---
 net/netfilter/nft_set_hash.c  | 8 
 net/netfilter/nft_set_rbtree.c| 4 ++--
 5 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index dc417ef..552bfbe 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -274,7 +274,7 @@ enum nft_set_class {
  * @space: memory class
  */
 struct nft_set_estimate {
-   unsigned int size;
+   u64 size;
enum nft_set_class  lookup;
enum nft_set_class  space;
 };
@@ -336,7 +336,7 @@ struct nft_set_ops {
   const struct nft_set_elem *elem,
   unsigned int flags);
 
-   unsigned int (*privsize)(const struct nlattr * const nla[],
+   u64 (*privsize)(const struct nlattr * const nla[],
const struct nft_set_desc *desc);
bool (*estimate)(const struct nft_set_desc *desc,
u32 features,
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index f5745e4c..bf2d577 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -3294,7 +3294,7 @@ static int nf_tables_newset(struct net *net, struct sock *nlsk,
struct nft_set *set;
struct nft_ctx ctx;
char *name;
-   unsigned int size;
+   u64 size;
bool create;
u64 timeout;
u32 ktype, dtype, flags, policy, gc_int, objtype;
diff --git a/net/netfilter/nft_set_bitmap.c b/net/netfilter/nft_set_bitmap.c
index 128bc16..f866bd4 100644
--- a/net/netfilter/nft_set_bitmap.c
+++ b/net/netfilter/nft_set_bitmap.c
@@ -248,13 +248,13 @@ static inline u32 nft_bitmap_size(u32 klen)
return ((2 << ((klen * BITS_PER_BYTE) - 1)) / BITS_PER_BYTE) << 1;
 }
 
-static inline u32 nft_bitmap_total_size(u32 klen)
+static inline u64 nft_bitmap_total_size(u32 klen)
 {
return sizeof(struct nft_bitmap) + nft_bitmap_size(klen);
 }
 
-static unsigned int nft_bitmap_privsize(const struct nlattr * const nla[],
-   const struct nft_set_desc *desc)
+static u64 nft_bitmap_privsize(const struct nlattr * const nla[],
+  const struct nft_set_desc *desc)
 {
u32 klen = ntohl(nla_get_be32(nla[NFTA_SET_KEY_LEN]));
 
diff --git a/net/netfilter/nft_set_hash.c b/net/netfilter/nft_set_hash.c
index 90c3e7e..015124e 100644
--- a/net/netfilter/nft_set_hash.c
+++ b/net/netfilter/nft_set_hash.c
@@ -341,8 +341,8 @@ static void nft_rhash_gc(struct work_struct *work)
   nft_set_gc_interval(set));
 }
 
-static unsigned int nft_rhash_privsize(const struct nlattr * const nla[],
-  const struct nft_set_desc *desc)
+static u64 nft_rhash_privsize(const struct nlattr * const nla[],
+ const struct nft_set_desc *desc)
 {
return sizeof(struct nft_rhash);
 }
@@ -585,8 +585,8 @@ static void nft_hash_walk(const struct nft_ctx *ctx, struct nft_set *set,
}
 }
 
-static unsigned int nft_hash_privsize(const struct nlattr * const nla[],
- const struct nft_set_desc *desc)
+stati
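
The privsize patch above widens the size type from unsigned int to u64. The sketch below illustrates the underlying C pitfall with hypothetical names throughout: a size computed in 32-bit arithmetic wraps modulo 2^32 and can come out far smaller than intended, whereas widening the operands before the multiplication keeps the true value. It also evaluates the nft_bitmap_size() formula from the diff in 64 bits; the restriction to 1- and 2-byte keys is our assumption about when bitmap sets are used, not something shown above.

```c
#include <assert.h>
#include <stdint.h>

/* nft_bitmap_size() formula from the diff, with a 64-bit base:
 * two bits for every possible key of klen bytes. */
static uint64_t demo_bitmap_size(uint32_t klen)
{
	assert(klen >= 1 && klen <= 2);	/* assumption: small keys only */
	return ((UINT64_C(2) << (klen * 8 - 1)) / 8) << 1;
}

/* Hypothetical hash-set header (16 bytes on common ABIs). */
struct demo_hash_hdr {
	uint32_t seed;
	uint32_t buckets;
	uint64_t pad;
};

/* Size computed entirely in 32 bits: silently wraps modulo 2^32. */
static uint32_t demo_privsize_u32(uint32_t nbuckets)
{
	return (uint32_t)sizeof(struct demo_hash_hdr) +
	       nbuckets * (uint32_t)sizeof(uint64_t);
}

/* Operands widened before multiplying: no wrap for any u32 count. */
static uint64_t demo_privsize_u64(uint32_t nbuckets)
{
	return (uint64_t)sizeof(struct demo_hash_hdr) +
	       (uint64_t)nbuckets * sizeof(uint64_t);
}
```

With 2^30 buckets the 32-bit version returns just the 16-byte header, while the 64-bit version returns 16 + 2^33 bytes. Changing only a function's return type to u64 does not help if the multiplication inside is still done in 32 bits; the operands have to be widened first, as demo_privsize_u64() does.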

Re: [PATCH V2 nf 3/3] netfilter: nf_tables: add default set size

2018-07-19 Thread Taehee Yoo
2018-07-19 1:44 GMT+09:00 Pablo Neira Ayuso :
> Hi,
>
> On Tue, Jul 10, 2018 at 11:22:36PM +0900, Taehee Yoo wrote:
>> To limit the number of elements in each set, the ->size member is used.
>> It used to be given by user space; if user space doesn't specify ->size,
>> the number of elements is unlimited, so an overflow can occur.
>>
>> After this patch,
>> if user space doesn't specify ->size, it is set to 65535.
>> All set types have the same default size.
>>
>> test commands:
>>%nft add table ip aa
>>%nft add map ip aa map1 { type ipv4_addr : verdict\; }
>>%nft list ruleset
>>
>> Before this patch:
>>table ip aa {
>>  map map1 {
>>  type ipv4_addr : verdict
>>  }
>>}
>>
>> After this patch:
>>table ip aa {
>>  map map1 {
>>  type ipv4_addr : verdict
>>  size 65535
>>  }
>>}
>>
>> V2:
>>  - Add a default set->size value instead of adding a set->size check routine.
>>   - Requested by Florian Westphal
>
> I agree with Florian in that we can simplify the code by always doing
> size accounting all over the place (I mean, remove the branches and do
> unconditional size accounting). So we do size accounting even if we
> don't need it.
>

Thank you for reviewing!

> Then, moving forward, if we go for default size for sets, we may need
> a way to signal the kernel that the hashtable is resizable, in case
> the user wants to dynamically update the maximum size (in such case,
> the rhashtable implementation would be still useful I think).
>

In my opinion, we can use the estimate callback.
If the user sets 'performance', the hashtable will be selected;
if 'memory' is set, rhashtable will be selected.

As far as I know, updating the flags and size value is not supported.
If we are going to support updating the maximum size
manually or dynamically, new flags should be added.

> Another possibility is that we can get rid of the rhashtable, and
> implement a more simple way to resize the existing fixed size
> hashtable, given that this will only happen from control plane and it
> should be a rare operation (just like conntrack resizing), but
> probably we may be re-inventing the wheel.
>
> Thoughts?
>

I would like to keep using rhashtable unless it reduces
lookup performance significantly, because I think rhashtable is
easier to use and makes the code more readable.

> In any case, this patch should go to nf-next, so we have time to discuss
> things. This series is applied, except this one ;-).
>
> Thanks!

I agree with that. Checked!

Thanks!
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V2 nf 2/3] netfilter: nft_set_rbtree: fix panic when destroying set by GC

2018-07-16 Thread Taehee Yoo
2018-07-17 1:09 GMT+09:00 Pablo Neira Ayuso :
> Hi Taehee,
>
> On Tue, Jul 10, 2018 at 11:22:01PM +0900, Taehee Yoo wrote:
>> This patch fixes the following:
>> 1. Check rb_next() for a NULL pointer.
>>  rb_next() can return NULL, so a NULL check is needed.
>> 2. Add rcu_barrier() to the destroy routine.
>>  GC uses call_rcu() to remove elements, but all elements must be
>>  removed before the set and chains are destroyed, so rcu_barrier() is added.
> [...]
>> diff --git a/net/netfilter/nft_set_rbtree.c b/net/netfilter/nft_set_rbtree.c
>> index 1f8f257..09e3a15 100644
>> --- a/net/netfilter/nft_set_rbtree.c
>> +++ b/net/netfilter/nft_set_rbtree.c
>> @@ -381,7 +381,7 @@ static void nft_rbtree_gc(struct work_struct *work)
>>
>>   gcb = nft_set_gc_batch_check(set, gcb, GFP_ATOMIC);
>>   if (!gcb)
>> - goto out;
>> + break;
>>
>>   atomic_dec(&set->nelems);
>>   nft_set_gc_batch_add(gcb, rbe);
>> @@ -390,10 +390,12 @@ static void nft_rbtree_gc(struct work_struct *work)
>>   rbe = rb_entry(prev, struct nft_rbtree_elem, node);
>>   atomic_dec(&set->nelems);
>>   nft_set_gc_batch_add(gcb, rbe);
>> + prev = NULL;
>>   }
>>   node = rb_next(node);
>> + if (!node)
>> + break;
>>   }
>> -out:
>>   if (gcb) {
>>   for (i = 0; i < gcb->head.cnt; i++) {
>>   rbe = gcb->elems[i];
>> @@ -440,9 +442,10 @@ static void nft_rbtree_destroy(const struct nft_set *set)
>>   struct rb_node *node;
>>
>>   cancel_delayed_work_sync(&priv->gc_work);
>> + rcu_barrier();
>>   while ((node = priv->root.rb_node) != NULL) {
>> - rb_erase(node, &priv->root);
>>   rbe = rb_entry(node, struct nft_rbtree_elem, node);
>> + rb_erase(node, &priv->root);
>
> Just to clarify: Do we have to reorder these lines? I mean, place
> rb_erase() after rb_entry(). Just asking because this is not described
> in the patch description - I just came back from a long trip, I'm tired, so I
> may be overlooking something. No need to resend in any case.
>
> Let me know, thanks!

Hi Pablo,

It doesn't change the actual logic and doesn't fix any problem.
I just prefer setting variables at the top of the while loop body,
so I moved it.
I'm sorry for the confusion.

Thanks for reviewing!
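
Taehee's answer rests on two facts: rb_entry() is container_of(), which only subtracts the offset of the embedded node from the node pointer, and rb_erase() unlinks the node without freeing the element. Calling rb_entry() before or after rb_erase() therefore yields the same pointer. A minimal user-space illustration, using a simplified container_of and hypothetical types and erase helper:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified container_of(); the kernel macro adds type checking. */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins for struct rb_node and an element embedding it. */
struct demo_rb_node {
	struct demo_rb_node *left, *right;
};

struct demo_elem {
	int key;
	struct demo_rb_node node;
};

/* Hypothetical erase: unlinks the node but, like rb_erase(), frees nothing. */
static void demo_erase(struct demo_rb_node *node)
{
	assert(node != NULL);
	node->left = NULL;
	node->right = NULL;
}
```

The pointer arithmetic never reads the tree linkage, so the order of the two calls in nft_rbtree_destroy() does not affect the result.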


[PATCH V2 nf] netfilter: nf_tables: fix jumpstack depth validation

2018-07-12 Thread Taehee Yoo
The level of struct nft_ctx is updated by nf_tables_check_loops()
and is used to validate the jumpstack depth. However, the jumpstack
validation routine neither updates nor validates the level recursively,
so in some cases the chain depth can exceed NFT_JUMP_STACK_SIZE.

After this patch, the jumpstack validation routine lives in
nft_chain_validate(). When new rules or new set elements are added,
nf_tables_newrule() and nf_tables_newsetelem() call
nft_table_validate(), which calls nft_chain_validate(); that
function visits all child chains recursively, so the depth of
every chain is updated reliably.
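
The recursive validation described above can be sketched as a depth-first walk that carries the current jump depth and rejects the ruleset once it exceeds the jump stack size. This is a simplified model, not nft_chain_validate(): the types and helpers are hypothetical, and the limit of 16 is our assumption for NFT_JUMP_STACK_SIZE. Because the depth grows by one on every jump, the walk also terminates on cyclic graphs (in nf_tables, loops are detected separately).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Assumed to match NFT_JUMP_STACK_SIZE; treat the value as illustrative. */
#define DEMO_JUMP_STACK_SIZE 16
#define DEMO_MAX_JUMPS 4

/* Hypothetical chain: only the jump targets of its rules matter here. */
struct demo_chain {
	struct demo_chain *jumps[DEMO_MAX_JUMPS];
	size_t njumps;
};

static void demo_add_jump(struct demo_chain *from, struct demo_chain *to)
{
	assert(from->njumps < DEMO_MAX_JUMPS);
	from->jumps[from->njumps++] = to;
}

/*
 * Depth-first walk from a base chain. 'level' is the number of jumps
 * taken to reach 'chain'. Every reachable chain is revisited on each
 * call, so no cached per-chain depth can go stale.
 */
static int demo_chain_validate(const struct demo_chain *chain, unsigned int level)
{
	if (level > DEMO_JUMP_STACK_SIZE)
		return -EMLINK;

	for (size_t i = 0; i < chain->njumps; i++) {
		int err = demo_chain_validate(chain->jumps[i], level + 1);

		if (err < 0)
			return err;
	}
	return 0;
}
```

Modelling the reproducer's layout, the walk from the base chain accepts input -> a1 -> ... -> a10 (depth 10), and rejects the ruleset with -EMLINK once the a10 -> a11 jump links in a11 ... a19, because the path then reaches depth 19.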

Reproducer:
   %cat ./test.sh
   #!/bin/bash
   nft add table ip filter
   nft add chain ip filter input { type filter hook input priority 0\; }
   for ((i=0;i<20;i++)); do
       nft add chain ip filter a$i
   done

   nft add rule ip filter input jump a1

   for ((i=0;i<10;i++)); do
       nft add rule ip filter a$i jump a$((i+1))
   done

   for ((i=11;i<19;i++)); do
       nft add rule ip filter a$i jump a$((i+1))
   done

   nft add rule ip filter a10 jump a11

Result:
[  253.931782] WARNING: CPU: 1 PID: 0 at net/netfilter/nf_tables_core.c:186 nft_do_chain+0xacc/0xdf0 [nf_tables]
[  253.931915] Modules linked in: nf_tables nfnetlink ip_tables x_tables
[  253.932153] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.18.0-rc3+ #48
[  253.932153] RIP: 0010:nft_do_chain+0xacc/0xdf0 [nf_tables]
[  253.932153] Code: 83 f8 fb 0f 84 c7 00 00 00 e9 d0 00 00 00 83 f8 fd 74 0e 83 f8 ff 0f 84 b4 00 00 00 e9 bd 00 00 00 83 bd 64 fd ff ff 0f 76 09 <0f> 0b 31 c0 e9 bc 02 00 00 44 8b ad 64 fd
[  253.933807] RSP: 0018:88011b807570 EFLAGS: 00010212
[  253.933807] RAX: fffd RBX: 88011b807660 RCX: 
[  253.933807] RDX: 0010 RSI: 880112b39d78 RDI: 88011b807670
[  253.933807] RBP: 88011b807850 R08: ed0023700ece R09: ed0023700ecd
[  253.933807] R10: 88011b80766f R11: ed0023700ece R12: 88011b807898
[  253.933807] R13: 880112b39d80 R14: 880112b39d60 R15: dc00
[  253.933807] FS:  () GS:88011b80() knlGS:
[  253.933807] CS:  0010 DS:  ES:  CR0: 80050033
[  253.933807] CR2: 014f1008 CR3: 6b216000 CR4: 001006e0
[  253.933807] Call Trace:
[  253.933807]  
[  253.933807]  ? sched_clock_cpu+0x132/0x170
[  253.933807]  ? __nft_trace_packet+0x180/0x180 [nf_tables]
[  253.933807]  ? sched_clock_cpu+0x132/0x170
[  253.933807]  ? debug_show_all_locks+0x290/0x290
[  253.933807]  ? __lock_acquire+0x4835/0x4af0
[  253.933807]  ? inet_ehash_locks_alloc+0x1a0/0x1a0
[  253.933807]  ? unwind_next_frame+0x159e/0x1840
[  253.933807]  ? __read_once_size_nocheck.constprop.4+0x5/0x10
[  253.933807]  ? nft_do_chain_ipv4+0x197/0x1e0 [nf_tables]
[  253.933807]  ? nft_do_chain+0x5/0xdf0 [nf_tables]
[  253.933807]  nft_do_chain_ipv4+0x197/0x1e0 [nf_tables]
[  253.933807]  ? nft_do_chain_arp+0xb0/0xb0 [nf_tables]
[  253.933807]  ? __lock_is_held+0x9d/0x130
[  253.933807]  nf_hook_slow+0xc4/0x150
[  253.933807]  ip_local_deliver+0x28b/0x380
[  253.933807]  ? ip_call_ra_chain+0x3e0/0x3e0
[  253.933807]  ? ip_rcv_finish+0x1610/0x1610
[  253.933807]  ip_rcv+0xbcc/0xcc0
[  253.933807]  ? debug_show_all_locks+0x290/0x290
[  253.933807]  ? ip_local_deliver+0x380/0x380
[  253.933807]  ? __lock_is_held+0x9d/0x130
[  253.933807]  ? ip_local_deliver+0x380/0x380
[  253.933807]  __netif_receive_skb_core+0x1c9c/0x2240


V2:
 - Add missing initialization code, requested by Pablo Neira Ayuso

Signed-off-by: Taehee Yoo 
---
 include/net/netfilter/nf_tables.h |  4 ++--
 net/netfilter/nf_tables_api.c | 11 ---
 net/netfilter/nft_immediate.c |  3 +++
 net/netfilter/nft_lookup.c| 13 +++--
 4 files changed, 20 insertions(+), 11 deletions(-)

diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index 08c005c..4e82a4c 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -150,6 +150,7 @@ static inline void nft_data_debug(const struct nft_data *data)
  * @portid: netlink portID of the original message
  * @seq: netlink sequence number
  * @family: protocol family
+ * @level: depth of the chains
  * @report: notify via unicast netlink message
  */
 struct nft_ctx {
@@ -160,6 +161,7 @@ struct nft_ctx {
u32 portid;
u32 seq;
u8  family;
+   u8  level;
bool report;
 };
 
@@ -865,7 +867,6 @@ enum nft_chain_flags {
  * @table: table that this chain belongs to
  * @handle: chain handle
  * @use: number of jump references to this chain
- * @level: length of longest path to this chain
  * @flags: bitmask of enum nft_chain_flags
  * @name: name of the chain

Re: [PATCH nf-next] netfilter: nf_tables: fix jumpstack depth validation

2018-07-12 Thread Taehee Yoo
2018-07-12 7:33 GMT+09:00 Pablo Neira Ayuso :
> On Mon, Jun 11, 2018 at 09:04:39PM +0900, Taehee Yoo wrote:
> [...]
>> diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
>> index 08c005c..a7d6476 100644
>> --- a/include/net/netfilter/nf_tables.h
>> +++ b/include/net/netfilter/nf_tables.h
>> @@ -150,6 +150,7 @@ static inline void nft_data_debug(const struct nft_data *data)
>>   *   @portid: netlink portID of the original message
>>   *   @seq: netlink sequence number
>>   *   @family: protocol family
>> + *   @level: depth of the chains
>>   *   @report: notify via unicast netlink message
>>   */
>>  struct nft_ctx {
>> @@ -160,6 +161,7 @@ struct nft_ctx {
>>   u32 portid;
>>   u32 seq;
>>   u8  family;
>> + u8  level;
>>   bool report;
>>  };
>

Thank you for reviewing!

> I think the chunk I'm attaching is missing, right?
>
> Other than that, rejecting this configuration from control plane - now
> that we don't crash anymore due to hitting BUG_ON from packet path -
> is indeed the way to go.
>
> Thanks.

Yes, I missed it.
I will send a v2 patch.

Thanks!


[PATCH V2 nf 3/3] netfilter: nf_tables: add default set size

2018-07-10 Thread Taehee Yoo
To limit the number of elements in each set, the ->size member is used.
It used to be given by user space; if user space doesn't specify ->size,
the number of elements is unlimited, so an overflow can occur.

After this patch, if user space doesn't specify ->size, it is set
to 65535. All set types have the same default size.

test commands:
   %nft add table ip aa
   %nft add map ip aa map1 { type ipv4_addr : verdict\; }
   %nft list ruleset

Before this patch:
   table ip aa {
   map map1 {
   type ipv4_addr : verdict
   }
   }

After this patch:
   table ip aa {
   map map1 {
   type ipv4_addr : verdict
   size 65535
   }
   }

V2:
 - Add a default set->size value instead of adding a set->size check routine.
  - Requested by Florian Westphal

Suggested-by: Florian Westphal 
Signed-off-by: Taehee Yoo 
---
 net/netfilter/nf_tables_api.c | 13 -
 net/netfilter/nft_dynset.c|  6 +-
 2 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 896d4a3..eb069b0 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -23,6 +23,8 @@
 #include 
 #include 
 
+#define NFT_DEFAULT_SET_SIZE   0xffff
+
 static LIST_HEAD(nf_tables_expressions);
 static LIST_HEAD(nf_tables_objects);
 static LIST_HEAD(nf_tables_flowtables);
@@ -3060,8 +3062,7 @@ static int nf_tables_fill_set(struct sk_buff *skb, const struct nft_ctx *ctx,
desc = nla_nest_start(skb, NFTA_SET_DESC);
if (desc == NULL)
goto nla_put_failure;
-   if (set->size &&
-   nla_put_be32(skb, NFTA_SET_DESC_SIZE, htonl(set->size)))
+   if (nla_put_be32(skb, NFTA_SET_DESC_SIZE, htonl(set->size)))
goto nla_put_failure;
nla_nest_end(skb, desc);
 
@@ -3437,7 +3438,10 @@ static int nf_tables_newset(struct net *net, struct sock *nlsk,
set->objtype = objtype;
set->dlen  = desc.dlen;
set->flags = flags;
-   set->size  = desc.size;
+   if (desc.size)
+   set->size  = desc.size;
+   else
+   set->size  = NFT_DEFAULT_SET_SIZE;
set->policy = policy;
set->udlen  = udlen;
set->udata  = udata;
@@ -4331,8 +4335,7 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set,
goto err5;
}
 
-   if (set->size &&
-   !atomic_add_unless(&set->nelems, 1, set->size + set->ndeact)) {
+   if (!atomic_add_unless(&set->nelems, 1, set->size + set->ndeact)) {
err = -ENFILE;
goto err6;
}
diff --git a/net/netfilter/nft_dynset.c b/net/netfilter/nft_dynset.c
index 27d7e459..c26970f 100644
--- a/net/netfilter/nft_dynset.c
+++ b/net/netfilter/nft_dynset.c
@@ -57,8 +57,7 @@ static void *nft_dynset_new(struct nft_set *set, const struct nft_expr *expr,
 err2:
nft_set_elem_destroy(set, elem, false);
 err1:
-   if (set->size)
-   atomic_dec(&set->nelems);
+   atomic_dec(&set->nelems);
return NULL;
 }
 
@@ -223,9 +222,6 @@ static int nft_dynset_init(const struct nft_ctx *ctx,
if (err < 0)
goto err1;
 
-   if (set->size == 0)
-   set->size = 0xffff;
-
priv->set = set;
return 0;
 
-- 
2.9.3



[PATCH V2 nf 2/3] netfilter: nft_set_rbtree: fix panic when destroying set by GC

2018-07-10 Thread Taehee Yoo
This patch fixes the following:
1. Check rb_next() for a NULL pointer.
 rb_next() can return NULL, so a NULL check is needed.
2. Add rcu_barrier() to the destroy routine.
 GC uses call_rcu() to remove elements, but all elements must be
 removed before the set and chains are destroyed, so rcu_barrier() is added.
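
Item 1 concerns a walk whose loop body advances the cursor itself (the diff context shows an extra rb_next() call at the end of the body) before the loop header advances it again. The sketch below shows that pattern on a plain singly linked list. It is not a reproduction of nft_rbtree_gc(), and all names are hypothetical. Without the NULL check, the header would evaluate node->next on a NULL pointer whenever the last element is the one being handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical list node; 'next' plays the role of rb_next(). */
struct demo_node {
	struct demo_node *next;	/* NULL after the last element */
	bool expired;
};

/* Count visited nodes; after an expired one, also step past its successor. */
static size_t demo_walk(struct demo_node *first)
{
	size_t seen = 0;

	for (struct demo_node *node = first; node != NULL; node = node->next) {
		seen++;
		if (!node->expired)
			continue;

		node = node->next;	/* extra advance inside the body */
		if (!node)
			break;		/* otherwise the header would read NULL->next */
	}
	return seen;
}
```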

test script:
   %cat test.nft
   table inet aa {
   map map1 {
   type ipv4_addr : verdict; flags interval, timeout;
   elements = {
   0-1 : jump a0,
   3-4 : jump a0,
   6-7 : jump a0,
   9-10 : jump a0,
   12-13 : jump a0,
   15-16 : jump a0,
   18-19 : jump a0,
   21-22 : jump a0,
   24-25 : jump a0,
   27-28 : jump a0,
   }
   timeout 1s;
   }
   chain a0 {
   }
   }
   flush ruleset
   table inet aa {
   map map1 {
   type ipv4_addr : verdict; flags interval, timeout;
   elements = {
   0-1 : jump a0,
   3-4 : jump a0,
   6-7 : jump a0,
   9-10 : jump a0,
   12-13 : jump a0,
   15-16 : jump a0,
   18-19 : jump a0,
   21-22 : jump a0,
   24-25 : jump a0,
   27-28 : jump a0,
   }
   timeout 1s;
   }
   chain a0 {
   }
   }
   flush ruleset

splat looks like:
[ 2402.419838] kasan: GPF could be caused by NULL-ptr deref or user memory access
[ 2402.428433] general protection fault:  [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[ 2402.429343] CPU: 1 PID: 1350 Comm: kworker/1:1 Not tainted 4.18.0-rc2+ #1
[ 2402.429343] Hardware name: To be filled by O.E.M. To be filled by O.E.M./Aptio CRB, BIOS 5.6.5 03/23/2017
[ 2402.429343] Workqueue: events_power_efficient nft_rbtree_gc [nft_set_rbtree]
[ 2402.429343] RIP: 0010:rb_next+0x1e/0x130
[ 2402.429343] Code: e9 de f2 ff ff 0f 1f 80 00 00 00 00 41 55 48 89 fa 41 54 55 53 48 c1 ea 03 48 b8 00 00 00 0
[ 2402.429343] RSP: 0018:880105f77678 EFLAGS: 00010296
[ 2402.429343] RAX: dc00 RBX: 8801143e3428 RCX: 11002287c69c
[ 2402.429343] RDX:  RSI: 0004 RDI: 
[ 2402.429343] RBP:  R08: ed0016aabc24 R09: ed0016aabc24
[ 2402.429343] R10: 0001 R11: ed0016aabc23 R12: 
[ 2402.429343] R13: 8800b6933388 R14: dc00 R15: 8801143e3440
[ 2402.534486] kasan: CONFIG_KASAN_INLINE enabled
[ 2402.534212] FS:  () GS:88011b60() knlGS:
[ 2402.534212] CS:  0010 DS:  ES:  CR0: 80050033
[ 2402.534212] CR2: 00863008 CR3: a3c16000 CR4: 001006e0
[ 2402.534212] Call Trace:
[ 2402.534212]  nft_rbtree_gc+0x2b5/0x5f0 [nft_set_rbtree]
[ 2402.534212]  process_one_work+0xc1b/0x1ee0
[ 2402.540329] kasan: GPF could be caused by NULL-ptr deref or user memory access
[ 2402.534212]  ? _raw_spin_unlock_irq+0x29/0x40
[ 2402.534212]  ? pwq_dec_nr_in_flight+0x3e0/0x3e0
[ 2402.534212]  ? set_load_weight+0x270/0x270
[ 2402.534212]  ? __schedule+0x6ea/0x1fb0
[ 2402.534212]  ? __sched_text_start+0x8/0x8
[ 2402.534212]  ? save_trace+0x320/0x320
[ 2402.534212]  ? sched_clock_local+0xe2/0x150
[ 2402.534212]  ? find_held_lock+0x39/0x1c0
[ 2402.534212]  ? worker_thread+0x35f/0x1150
[ 2402.534212]  ? lock_contended+0xe90/0xe90
[ 2402.534212]  ? __lock_acquire+0x4520/0x4520
[ 2402.534212]  ? do_raw_spin_unlock+0xb1/0x350
[ 2402.534212]  ? do_raw_spin_trylock+0x111/0x1b0
[ 2402.534212]  ? do_raw_spin_lock+0x1f0/0x1f0
[ 2402.534212]  worker_thread+0x169/0x1150

V2:
 - Do not add interval check routine in nft_set_rbtree.
  - Requested by Pablo Neira Ayuso

Fixes: 8d8540c4f5e0 ("netfilter: nft_set_rbtree: add timeout support")
Signed-off-by: Taehee Yoo 
---
 net/netfilter/nft_set_rbtree.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/net/netfilter/nft_set_rbtree.c b/net/netfilter/nft_set_rbtree.c
index 1f8f257..09e3a15 100644
--- a/net/netfilter/nft_set_rbtree.c
+++ b/net/netfilter/nft_set_rbtree.c
@@ -381,7 +381,7 @@ static void nft_rbtree_gc(struct work_struct *work)
 
gcb = nft_set_gc_batch_check(set, gcb, GFP_ATOMIC);
if (!gcb)
-   goto out;
+   break;
 
atomic_dec(&set->nelems);
nft_set_gc_batch_add(gcb, rbe);
@@ -390,10 +390,12 @@ static void nft_rbtree_gc(struct work_struct *work)
rbe = rb_entry(prev, struct nft_rbtree_elem, node);
atomi

[PATCH V2 nf 1/3] netfilter: nft_set_hash: add rcu_barrier() in the nft_rhash_destroy()

2018-07-10 Thread Taehee Yoo
The set GC uses call_rcu() to destroy elements, so elements may be
destroyed only after their sets and chains have been destroyed.
However, elements must be destroyed before the sets and chains.
To wait for the pending call_rcu() callbacks, an rcu_barrier() call is added.
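
The ordering problem can be modelled in a single thread: GC queues element-free callbacks (call_rcu() in the kernel), each of which drops a reference on the chain the element jumps to, and destroy must let all of them run before the chain goes away. In the sketch, demo_call_rcu() and demo_rcu_barrier() are hypothetical stand-ins. The real rcu_barrier() waits for already-queued callbacks to finish rather than running them inline.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_PENDING 32

struct demo_cb {
	void (*func)(void *arg);
	void *arg;
};

/* Hypothetical single-threaded stand-in for the RCU callback queue. */
struct demo_rcu {
	struct demo_cb pending[DEMO_MAX_PENDING];
	size_t npending;
};

/* Defer func(arg), as call_rcu() defers it past a grace period. */
static void demo_call_rcu(struct demo_rcu *rcu, void (*func)(void *), void *arg)
{
	assert(rcu->npending < DEMO_MAX_PENDING);
	rcu->pending[rcu->npending].func = func;
	rcu->pending[rcu->npending].arg = arg;
	rcu->npending++;
}

/* Model of rcu_barrier(): afterwards, every callback queued before it has run. */
static void demo_rcu_barrier(struct demo_rcu *rcu)
{
	for (size_t i = 0; i < rcu->npending; i++)
		rcu->pending[i].func(rcu->pending[i].arg);
	rcu->npending = 0;
}

struct demo_chain {
	unsigned int use;	/* jump references from set elements */
};

/* GC callback: freeing an element drops its reference on the jump target. */
static void demo_elem_free(void *arg)
{
	struct demo_chain *chain = arg;

	chain->use--;
}

static bool demo_chain_destroyable(const struct demo_chain *chain)
{
	return chain->use == 0;
}
```

Ten elements that jump to a0, queued for freeing by GC, keep a0 busy until the barrier has drained the queue; only then is destroying the chain safe.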

To test this correctly, the patch below must be applied first.
https://patchwork.ozlabs.org/patch/940883/

test scripts:
   %cat test.nft
   table ip aa {
   map map1 {
   type ipv4_addr : verdict; flags timeout;
   elements = {
   0 : jump a0,
   1 : jump a0,
   2 : jump a0,
   3 : jump a0,
   4 : jump a0,
   5 : jump a0,
   6 : jump a0,
   7 : jump a0,
   8 : jump a0,
   9 : jump a0,
   }
   timeout 1s;
   }
   chain a0 {
   }
   }
   flush ruleset

   [ ... ]

   table ip aa {
   map map1 {
   type ipv4_addr : verdict; flags timeout;
   elements = {
   0 : jump a0,
   1 : jump a0,
   2 : jump a0,
   3 : jump a0,
   4 : jump a0,
   5 : jump a0,
   6 : jump a0,
   7 : jump a0,
   8 : jump a0,
   9 : jump a0,
   }
   timeout 1s;
   }
   chain a0 {
   }
   }
   flush ruleset

Splat looks like:
[  200.795603] kernel BUG at net/netfilter/nf_tables_api.c:1363!
[  200.806944] invalid opcode:  [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[  200.812253] CPU: 1 PID: 1582 Comm: nft Not tainted 4.17.0+ #24
[  200.820297] Hardware name: To be filled by O.E.M. To be filled by O.E.M./Aptio CRB, BIOS 5.6.5 07/08/2015
[  200.830309] RIP: 0010:nf_tables_chain_destroy.isra.34+0x62/0x240 [nf_tables]
[  200.838317] Code: 43 50 85 c0 74 26 48 8b 45 00 48 8b 4d 08 ba 54 05 00 00 48 c7 c6 60 6d 29 c0 48 c7 c7 c0 65 29 c0 4c 8b 40 08 e8 58 e5 fd f8 <0f> 0b 48 89 da 48 b8 00 00 00 00 00 fc ff
[  200.860366] RSP: :880118dbf4d0 EFLAGS: 00010282
[  200.866354] RAX: 0061 RBX: 88010cdeaf08 RCX: 
[  200.874355] RDX: 0061 RSI: 0008 RDI: ed00231b7e90
[  200.882361] RBP: 880118dbf4e8 R08: ed002373bcfb R09: ed002373bcfa
[  200.890354] R10:  R11: ed002373bcfb R12: dead0200
[  200.898356] R13: dead0100 R14: bb62af38 R15: dc00
[  200.906354] FS:  7fefc31fd700() GS:88011b80() knlGS:
[  200.915533] CS:  0010 DS:  ES:  CR0: 80050033
[  200.922355] CR2: 557f1c8e9128 CR3: 00010688 CR4: 001006e0
[  200.930353] Call Trace:
[  200.932351]  ? nf_tables_commit+0x26f6/0x2c60 [nf_tables]
[  200.939525]  ? nf_tables_setelem_notify.constprop.49+0x1a0/0x1a0 [nf_tables]
[  200.947525]  ? nf_tables_delchain+0x6e0/0x6e0 [nf_tables]
[  200.952383]  ? nft_add_set_elem+0x1700/0x1700 [nf_tables]
[  200.959532]  ? nla_parse+0xab/0x230
[  200.963529]  ? nfnetlink_rcv_batch+0xd06/0x10d0 [nfnetlink]
[  200.968384]  ? nfnetlink_net_init+0x130/0x130 [nfnetlink]
[  200.975525]  ? debug_show_all_locks+0x290/0x290
[  200.980363]  ? debug_show_all_locks+0x290/0x290
[  200.986356]  ? sched_clock_cpu+0x132/0x170
[  200.990352]  ? find_held_lock+0x39/0x1b0
[  200.994355]  ? sched_clock_local+0x10d/0x130
[  200.999531]  ? memset+0x1f/0x40

Fixes: 9d0982927e79 ("netfilter: nft_hash: add support for timeouts")
Signed-off-by: Taehee Yoo 
---
 net/netfilter/nft_set_hash.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/netfilter/nft_set_hash.c b/net/netfilter/nft_set_hash.c
index 72ef35b..90c3e7e 100644
--- a/net/netfilter/nft_set_hash.c
+++ b/net/netfilter/nft_set_hash.c
@@ -387,6 +387,7 @@ static void nft_rhash_destroy(const struct nft_set *set)
struct nft_rhash *priv = nft_set_priv(set);
 
cancel_delayed_work_sync(&priv->gc_work);
+   rcu_barrier();
rhashtable_free_and_destroy(&priv->ht, nft_rhash_elem_destroy,
(void *)set);
 }
-- 
2.9.3



[PATCH V2 nf 0/3] netfilter: nf_tables: fix set destroying bugs

2018-07-10 Thread Taehee Yoo
This patch series fixes nft_set_hash and nft_set_rbtree bugs.

The first patch adds rcu_barrier() to nft_rhash_destroy() to wait for
completion of the call_rcu() callbacks issued by GC.

The second patch fixes bugs in nft_set_rbtree.c:
 - add a NULL check
 - add rcu_barrier() to the destroy routine

The last patch adds a default set->size value.
 - all set types get 65535 as the default size.

V2:
 - Drop patch "netfilter: nft_set_hash: fix panic when destroying set".
  Requested by Florian Westphal
 - Do not add an interval check routine in nft_set_rbtree.
  Requested by Pablo Neira Ayuso
 - Add a default set->size value instead of adding a set->size check routine.
  Requested by Florian Westphal

Taehee Yoo (3):
  netfilter: nft_set_hash: add rcu_barrier() in the nft_rhash_destroy()
  netfilter: nft_set_rbtree: fix panic when destroying set by GC
  netfilter: nf_tables: add default set size

 net/netfilter/nf_tables_api.c  | 13 -
 net/netfilter/nft_dynset.c |  6 +-
 net/netfilter/nft_set_hash.c   |  1 +
 net/netfilter/nft_set_rbtree.c |  9 ++---
 4 files changed, 16 insertions(+), 13 deletions(-)

-- 
2.9.3



Re: [PATCH nf 0/4] netfilter: nf_tables: fix set destroying bugs

2018-07-10 Thread Taehee Yoo
2018-07-09 22:56 GMT+09:00 Pablo Neira Ayuso :
> On Sun, Jul 01, 2018 at 08:43:16PM +0900, Taehee Yoo wrote:
>> This patch series fixes nft_set_hash and nft_set_rbtree bugs.
>>
>> First patch adds nft_rhash_iterate_destroy().
>> it walks and destroys all elements.
>>
>> Second patch adds rcu_barrier in the nft_rhash_destroy() to wait completion of
>> call_rcu by GC.
>>
>> Third patch reworks GC routine of nft_set_rbtree to fix bugs.
>>
>> Last patch adds set->size checking routine.
>
> Please, address feedback and send v2 on top of current nf.git.
>
> Thanks.

I have been testing with your patch for a while.
It works well, so I plan to send a v2 patch soon.

Thanks!


Re: [PATCH nf 3/4] netfilter: nft_set_rbtree: fix panic when destroying set by GC

2018-07-10 Thread Taehee Yoo
2018-07-09 22:48 GMT+09:00 Pablo Neira Ayuso :
> On Tue, Jul 03, 2018 at 11:40:06PM +0900, Taehee Yoo wrote:
>> 2018-07-03 19:20 GMT+09:00 Pablo Neira Ayuso :
>> > On Sun, Jul 01, 2018 at 08:44:52PM +0900, Taehee Yoo wrote:
>> >> This patch fixes the following:
>> >> 1. Check rb_next() for a NULL pointer.
>> >>  rb_next() can return NULL, so a NULL check is needed.
>> >> 2. Check whether the interval flag is set or not.
>> >>  If the interval flag is given, both the start node and the end node
>> >>  should be removed at once. If the interval flag is not given,
>> >>  it doesn't matter.
>> >
>>
>> Thank you for reviewing!
>>
>> > For #2, I would prefer we reject rbtree for single elements. I'm going
>> > to send a patch for this.
>> >
>> > Would you rebase 1. and 3. on top?
>> >
>> > Thanks!
>> >
>>
>> Of course!
>> Do you mean that the 'top' is current top? or next top?
>
> I mean, on top of this one:
>
> https://patchwork.ozlabs.org/patch/940650/
>
> which makes sure we cannot use the interval set with single elements,
> which I understand is one of the problems this patch is addressing.
>
> Thanks.

Thanks for letting me know.
I understand.

Thanks


Re: [PATCH nf 3/4] netfilter: nft_set_rbtree: fix panic when destroying set by GC

2018-07-03 Thread Taehee Yoo
2018-07-03 19:20 GMT+09:00 Pablo Neira Ayuso :
> On Sun, Jul 01, 2018 at 08:44:52PM +0900, Taehee Yoo wrote:
>> This patch fixes the following:
>> 1. Check rb_next() for a NULL pointer.
>>  rb_next() can return NULL, so a NULL check is needed.
>> 2. Check whether the interval flag is set or not.
>>  If the interval flag is given, both the start node and the end node
>>  should be removed at once. If the interval flag is not given,
>>  it doesn't matter.
>

Thank you for reviewing!

> For #2, I would prefer we reject rbtree for single elements. I'm going
> to send a patch for this.
>
> Would you rebase 1. and 3. on top?
>
> Thanks!
>

Of course!
Do you mean that the 'top' is current top? or next top?

Thanks!

>> 3. Add rcu_barrier() to the destroy routine.
>>  GC uses call_rcu() to remove elements, but all elements must be
>>  removed before the set and chains are destroyed, so rcu_barrier() is added.
>>
>> test script:
>>%cat test.nft
>>table inet aa {
>>  map map1 {
>>  type ipv4_addr : verdict; flags interval, timeout;
>>  elements = {
>>  0-1 : jump a0,
>>  3-4 : jump a0,
>>  6-7 : jump a0,
>>  9-10 : jump a0,
>>  12-13 : jump a0,
>>  15-16 : jump a0,
>>  18-19 : jump a0,
>>  21-22 : jump a0,
>>  24-25 : jump a0,
>>  27-28 : jump a0,
>>  }
>>  timeout 1s;
>>  }
>>  chain a0 {
>>  }
>>}
>>flush ruleset
>>table inet aa {
>>  map map1 {
>>  type ipv4_addr : verdict; flags interval, timeout;
>>  elements = {
>>  0-1 : jump a0,
>>  3-4 : jump a0,
>>  6-7 : jump a0,
>>  9-10 : jump a0,
>>  12-13 : jump a0,
>>  15-16 : jump a0,
>>  18-19 : jump a0,
>>  21-22 : jump a0,
>>  24-25 : jump a0,
>>  27-28 : jump a0,
>>  }
>>  timeout 1s;
>>  }
>>  chain a0 {
>>  }
>>}
>>flush ruleset
>>
>> splat looks like:
>> [ 2402.419838] kasan: GPF could be caused by NULL-ptr deref or user memory access
>> [ 2402.428433] general protection fault:  [#1] SMP DEBUG_PAGEALLOC KASAN PTI
>> [ 2402.429343] CPU: 1 PID: 1350 Comm: kworker/1:1 Not tainted 4.18.0-rc2+ #1
>> [ 2402.429343] Hardware name: To be filled by O.E.M. To be filled by O.E.M./Aptio CRB, BIOS 5.6.5 03/23/2017
>> [ 2402.429343] Workqueue: events_power_efficient nft_rbtree_gc [nft_set_rbtree]
>> [ 2402.429343] RIP: 0010:rb_next+0x1e/0x130
>> [ 2402.429343] Code: e9 de f2 ff ff 0f 1f 80 00 00 00 00 41 55 48 89 fa 41 54 55 53 48 c1 ea 03 48 b8 00 00 00 0
>> [ 2402.429343] RSP: 0018:880105f77678 EFLAGS: 00010296
>> [ 2402.429343] RAX: dc00 RBX: 8801143e3428 RCX: 11002287c69c
>> [ 2402.429343] RDX:  RSI: 0004 RDI: 
>> [ 2402.429343] RBP:  R08: ed0016aabc24 R09: ed0016aabc24
>> [ 2402.429343] R10: 0001 R11: ed0016aabc23 R12: 
>> [ 2402.429343] R13: 8800b6933388 R14: dc00 R15: 8801143e3440
>> [ 2402.534486] kasan: CONFIG_KASAN_INLINE enabled
>> [ 2402.534212] FS:  () GS:88011b60() knlGS:
>> [ 2402.534212] CS:  0010 DS:  ES:  CR0: 80050033
>> [ 2402.534212] CR2: 00863008 CR3: a3c16000 CR4: 001006e0
>> [ 2402.534212] Call Trace:
>> [ 2402.534212]  nft_rbtree_gc+0x2b5/0x5f0 [nft_set_rbtree]
>> [ 2402.534212]  process_one_work+0xc1b/0x1ee0
>> [ 2402.540329] kasan: GPF could be caused by NULL-ptr deref or user memory access
>> [ 2402.534212]  ? _raw_spin_unlock_irq+0x29/0x40
>> [ 2402.534212]  ? pwq_dec_nr_in_flight+0x3e0/0x3e0
>> [ 2402.534212]  ? set_load_weight+0x270/0x270
>> [ 2402.534212]  ? __schedule+0x6ea/0x1fb0
>> [ 2402.534212]  ? __sched_text_start+0x8/0x8
>> [ 2402.534212]  ? save_trac

Re: [PATCH nf 4/4] netfilter: nf_tables: check set->size before decreasing set->nelems

2018-07-03 Thread Taehee Yoo
2018-07-02 20:38 GMT+09:00 Florian Westphal :
> Taehee Yoo  wrote:
>> set->nelems is increased when set->size is given.
>> so that checking set->size routine should be added.
>
> Does it make sense to have sets with no upper size?
>
> I think it makes more sense to enforce an upper bound
> so that set->size is always nonzero.

Thank you for reviewing!

I agree.
In my opinion, a default value that depends on the set type should be given.
What do you think?

Thanks!


Re: [PATCH nf 1/4] netfilter: nft_set_hash: fix panic when destroying set

2018-07-02 Thread Taehee Yoo
2018-07-02 20:45 GMT+09:00 Florian Westphal :
> Taehee Yoo  wrote:
>> In order to destroy elements of set, a rhashtable_free_and_destroy()
>> is used. the rhashtable_free_and_destroy() cancels a re-hash deferred work
>> then walks and destroys elements. at this moment, some elements are
>> still in a future_tbl. that elements are not destroyed.
>
> Wait.  Isn't that a bug in rhashtable_free_and_destroy()?
>
> I'd rather see rhashtable_free_and_destroy() do it correctly if
> possible rather than asking users of rhashtable_free_and_destroy()
> to not use that function...
>

Thank you for reviewing!

I hadn't suspected that rhashtable_free_and_destroy() had a bug.
But on second thought, you're right: it seems to be a bug in
rhashtable_free_and_destroy().

> Or did i misunderstand and its a nft specific problem?

You understood it correctly.
In my opinion, it's a general case, not an nft-specific one.

Thanks!


[PATCH nf 3/4] netfilter: nft_set_rbtree: fix panic when destroying set by GC

2018-07-01 Thread Taehee Yoo
This patch fixes the following:
1. Check the return value of rb_next() for NULL.
 rb_next() can return NULL, so a NULL check should be added.
2. Check whether the interval flag is set.
 If the interval flag is given, both the start node and the end node
 should be removed at once. If the interval flag is not given,
 this doesn't matter.
3. Add rcu_barrier() in the destroy routine.
 GC uses call_rcu() to remove elements, but all elements should be
 removed before the set and chains are destroyed, so rcu_barrier() is added.
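Items 1 and 2 can be sketched in plain C. This is a hypothetical user-space model, not the nft_set_rbtree code: a singly linked list ordered by key stands in for the rbtree, the `next` pointer plays the role of rb_next() (and may be NULL), and each interval is assumed to be stored as a start element followed by its end element.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an rbtree element; not the kernel struct. */
struct elem {
	unsigned int key;
	bool interval_end;	/* models the interval-end flag */
	bool expired;
	bool removed;
	struct elem *next;	/* models rb_next(): NULL after the last element */
};

/*
 * Remove expired elements. When the set has the interval flag, an expired
 * start element and the end element that follows it are removed together.
 * Returns the number of elements removed.
 */
static int gc_walk(struct elem *first, bool interval)
{
	struct elem *e, *next;
	int removed = 0;

	for (e = first; e; e = next) {
		next = e->next;
		/* end elements are only removed together with their start */
		if (!e->expired || e->interval_end)
			continue;

		e->removed = true;
		removed++;

		if (!interval)
			continue;
		/* the "rb_next()" result may be NULL: check before using it */
		if (next && next->interval_end) {
			next->removed = true;
			removed++;
			next = next->next;
		}
	}
	return removed;
}
```

Without the `next &&` test, an expired start element at the end of the walk would dereference NULL, which is the kind of crash shown in the splat below.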

test script:
   %cat test.nft
   table inet aa {
   map map1 {
   type ipv4_addr : verdict; flags interval, timeout;
   elements = {
   0-1 : jump a0,
   3-4 : jump a0,
   6-7 : jump a0,
   9-10 : jump a0,
   12-13 : jump a0,
   15-16 : jump a0,
   18-19 : jump a0,
   21-22 : jump a0,
   24-25 : jump a0,
   27-28 : jump a0,
   }
   timeout 1s;
   }
   chain a0 {
   }
   }
   flush ruleset
   table inet aa {
   map map1 {
   type ipv4_addr : verdict; flags interval, timeout;
   elements = {
   0-1 : jump a0,
   3-4 : jump a0,
   6-7 : jump a0,
   9-10 : jump a0,
   12-13 : jump a0,
   15-16 : jump a0,
   18-19 : jump a0,
   21-22 : jump a0,
   24-25 : jump a0,
   27-28 : jump a0,
   }
   timeout 1s;
   }
   chain a0 {
   }
   }
   flush ruleset

splat looks like:
[ 2402.419838] kasan: GPF could be caused by NULL-ptr deref or user memory access
[ 2402.428433] general protection fault:  [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[ 2402.429343] CPU: 1 PID: 1350 Comm: kworker/1:1 Not tainted 4.18.0-rc2+ #1
[ 2402.429343] Hardware name: To be filled by O.E.M. To be filled by O.E.M./Aptio CRB, BIOS 5.6.5 03/23/2017
[ 2402.429343] Workqueue: events_power_efficient nft_rbtree_gc [nft_set_rbtree]
[ 2402.429343] RIP: 0010:rb_next+0x1e/0x130
[ 2402.429343] Code: e9 de f2 ff ff 0f 1f 80 00 00 00 00 41 55 48 89 fa 41 54 55 53 48 c1 ea 03 48 b8 00 00 00 0
[ 2402.429343] RSP: 0018:880105f77678 EFLAGS: 00010296
[ 2402.429343] RAX: dc00 RBX: 8801143e3428 RCX: 11002287c69c
[ 2402.429343] RDX:  RSI: 0004 RDI: 
[ 2402.429343] RBP:  R08: ed0016aabc24 R09: ed0016aabc24
[ 2402.429343] R10: 0001 R11: ed0016aabc23 R12: 
[ 2402.429343] R13: 8800b6933388 R14: dc00 R15: 8801143e3440
[ 2402.534486] kasan: CONFIG_KASAN_INLINE enabled
[ 2402.534212] FS:  () GS:88011b60() knlGS:
[ 2402.534212] CS:  0010 DS:  ES:  CR0: 80050033
[ 2402.534212] CR2: 00863008 CR3: a3c16000 CR4: 001006e0
[ 2402.534212] Call Trace:
[ 2402.534212]  nft_rbtree_gc+0x2b5/0x5f0 [nft_set_rbtree]
[ 2402.534212]  process_one_work+0xc1b/0x1ee0
[ 2402.540329] kasan: GPF could be caused by NULL-ptr deref or user memory access
[ 2402.534212]  ? _raw_spin_unlock_irq+0x29/0x40
[ 2402.534212]  ? pwq_dec_nr_in_flight+0x3e0/0x3e0
[ 2402.534212]  ? set_load_weight+0x270/0x270
[ 2402.534212]  ? __schedule+0x6ea/0x1fb0
[ 2402.534212]  ? __sched_text_start+0x8/0x8
[ 2402.534212]  ? save_trace+0x320/0x320
[ 2402.534212]  ? sched_clock_local+0xe2/0x150
[ 2402.534212]  ? find_held_lock+0x39/0x1c0
[ 2402.534212]  ? worker_thread+0x35f/0x1150
[ 2402.534212]  ? lock_contended+0xe90/0xe90
[ 2402.534212]  ? __lock_acquire+0x4520/0x4520
[ 2402.534212]  ? do_raw_spin_unlock+0xb1/0x350
[ 2402.534212]  ? do_raw_spin_trylock+0x111/0x1b0
[ 2402.534212]  ? do_raw_spin_lock+0x1f0/0x1f0
[ 2402.534212]  worker_thread+0x169/0x1150

Fixes: 8d8540c4f5e0 ("netfilter: nft_set_rbtree: add timeout support")
Signed-off-by: Taehee Yoo 
---
 net/netfilter/nft_set_rbtree.c | 34 ++
 1 file changed, 22 insertions(+), 12 deletions(-)

diff --git a/net/netfilter/nft_set_rbtree.c b/net/netfilter/nft_set_rbtree.c
index 7f3a9a2..1db52b0 100644
--- a/net/netfilter/nft_set_rbtree.c
+++ b/net/netfilter/nft_set_rbtree.c
@@ -356,27 +356,27 @@ static void nft_rbtree_walk(const struct nft_ctx *ctx,
 static void nft_rbtree_gc(struct work_struct *work)
 {
struct nft_set_gc_batch *gcb = NULL;
-   struct rb_node *node, *prev = NULL;
+   struct rb_node *node;
struct nft_rbtree_elem *rbe;
struct nft_rbtree *priv;
struct nft_set *set;
int i;
+   bool interv

[PATCH nf 2/4] netfilter: nft_set_hash: add rcu_barrier() in the nft_rhash_destroy()

2018-07-01 Thread Taehee Yoo
The set GC uses call_rcu() to destroy elements, so elements may be
destroyed after the sets and chains have already been destroyed.
However, elements must be destroyed before the sets and chains.
To wait for the pending call_rcu() callbacks to complete,
rcu_barrier() is added.
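The ordering problem can be modelled in user space. In this hypothetical sketch none of the names are kernel APIs: `defer_call()` stands in for call_rcu(), `wait_deferred()` for rcu_barrier(), and each deferred element destructor drops one jump reference on the chain, which may only be destroyed once its reference count is zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are kernel APIs. */
struct chain {
	int use;		/* jump references held by set elements */
};

typedef void (*deferred_fn)(void *arg);

struct deferred {
	deferred_fn fn;
	void *arg;
};

#define MAX_DEFERRED 16
static struct deferred queue[MAX_DEFERRED];
static size_t nqueued;

/* Models call_rcu(): the callback is queued and runs later, not now. */
static void defer_call(deferred_fn fn, void *arg)
{
	assert(nqueued < MAX_DEFERRED);
	queue[nqueued].fn = fn;
	queue[nqueued].arg = arg;
	nqueued++;
}

/* Models rcu_barrier(): return only after every queued callback has run. */
static void wait_deferred(void)
{
	for (size_t i = 0; i < nqueued; i++)
		queue[i].fn(queue[i].arg);
	nqueued = 0;
}

/* Deferred element destructor: drops the element's reference on its chain. */
static void elem_destroy(void *arg)
{
	struct chain *c = arg;

	c->use--;
}

/*
 * GC hands n expired elements (all jumping to c) to the deferred queue,
 * then the set and chain are torn down. Returns true if the chain had no
 * remaining references when it was destroyed (false models the BUG()).
 */
static bool teardown(struct chain *c, int n, bool wait_first)
{
	bool ok;

	for (int i = 0; i < n; i++)
		defer_call(elem_destroy, c);
	if (wait_first)
		wait_deferred();
	ok = (c->use == 0);
	wait_deferred();	/* drain anything still queued */
	return ok;
}
```

With `wait_first` false the reference check runs while the destructors are still queued, which corresponds to the kernel BUG in the splat below; waiting first makes the check pass.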

test scripts:
   %cat test.nft
   table ip aa {
   map map1 {
   type ipv4_addr : verdict; flags timeout;
   elements = {
   0 : jump a0,
   1 : jump a0,
   2 : jump a0,
   3 : jump a0,
   4 : jump a0,
   5 : jump a0,
   6 : jump a0,
   7 : jump a0,
   8 : jump a0,
   9 : jump a0,
   }
   timeout 1s;
   }
   chain a0 {
   }
   }
   flush ruleset

   [ ... ]

   table ip aa {
   map map1 {
   type ipv4_addr : verdict; flags timeout;
   elements = {
   0 : jump a0,
   1 : jump a0,
   2 : jump a0,
   3 : jump a0,
   4 : jump a0,
   5 : jump a0,
   6 : jump a0,
   7 : jump a0,
   8 : jump a0,
   9 : jump a0,
   }
   timeout 1s;
   }
   chain a0 {
   }
   }
   flush ruleset

Splat looks like:
[  200.795603] kernel BUG at net/netfilter/nf_tables_api.c:1363!
[  200.806944] invalid opcode:  [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[  200.812253] CPU: 1 PID: 1582 Comm: nft Not tainted 4.17.0+ #24
[  200.820297] Hardware name: To be filled by O.E.M. To be filled by O.E.M./Aptio CRB, BIOS 5.6.5 07/08/2015
[  200.830309] RIP: 0010:nf_tables_chain_destroy.isra.34+0x62/0x240 [nf_tables]
[  200.838317] Code: 43 50 85 c0 74 26 48 8b 45 00 48 8b 4d 08 ba 54 05 00 00 48 c7 c6 60 6d 29 c0 48 c7 c7 c0 65 29 c0 4c 8b 40 08 e8 58 e5 fd f8 <0f> 0b 48 89 da 48 b8 00 00 00 00 00 fc ff
[  200.860366] RSP: :880118dbf4d0 EFLAGS: 00010282
[  200.866354] RAX: 0061 RBX: 88010cdeaf08 RCX: 
[  200.874355] RDX: 0061 RSI: 0008 RDI: ed00231b7e90
[  200.882361] RBP: 880118dbf4e8 R08: ed002373bcfb R09: ed002373bcfa
[  200.890354] R10:  R11: ed002373bcfb R12: dead0200
[  200.898356] R13: dead0100 R14: bb62af38 R15: dc00
[  200.906354] FS:  7fefc31fd700() GS:88011b80() knlGS:
[  200.915533] CS:  0010 DS:  ES:  CR0: 80050033
[  200.922355] CR2: 557f1c8e9128 CR3: 00010688 CR4: 001006e0
[  200.930353] Call Trace:
[  200.932351]  ? nf_tables_commit+0x26f6/0x2c60 [nf_tables]
[  200.939525]  ? nf_tables_setelem_notify.constprop.49+0x1a0/0x1a0 [nf_tables]
[  200.947525]  ? nf_tables_delchain+0x6e0/0x6e0 [nf_tables]
[  200.952383]  ? nft_add_set_elem+0x1700/0x1700 [nf_tables]
[  200.959532]  ? nla_parse+0xab/0x230
[  200.963529]  ? nfnetlink_rcv_batch+0xd06/0x10d0 [nfnetlink]
[  200.968384]  ? nfnetlink_net_init+0x130/0x130 [nfnetlink]
[  200.975525]  ? debug_show_all_locks+0x290/0x290
[  200.980363]  ? debug_show_all_locks+0x290/0x290
[  200.986356]  ? sched_clock_cpu+0x132/0x170
[  200.990352]  ? find_held_lock+0x39/0x1b0
[  200.994355]  ? sched_clock_local+0x10d/0x130
[  200.999531]  ? memset+0x1f/0x40

Fixes: 9d0982927e79 ("netfilter: nft_hash: add support for timeouts")
Signed-off-by: Taehee Yoo 
---
 net/netfilter/nft_set_hash.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/netfilter/nft_set_hash.c b/net/netfilter/nft_set_hash.c
index 695d5e8..ef66824 100644
--- a/net/netfilter/nft_set_hash.c
+++ b/net/netfilter/nft_set_hash.c
@@ -415,6 +415,7 @@ static void nft_rhash_destroy(const struct nft_set *set)
struct nft_rhash *priv = nft_set_priv(set);
 
cancel_delayed_work_sync(&priv->gc_work);
+   rcu_barrier();
nft_rhash_iterate_destroy(set, &priv->ht);
rhashtable_destroy(&priv->ht);
 }
-- 
2.9.3



[PATCH nf 4/4] netfilter: nf_tables: check set->size before decreasing set->nelems

2018-07-01 Thread Taehee Yoo
set->nelems is only increased when set->size is given,
so set->size should also be checked before decreasing set->nelems.
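The invariant can be sketched as follows. This is a hypothetical model, not the nf_tables code: the counter is only touched when an upper bound is configured, so every decrement path has to test the same condition as the increment path, or an unbounded set would drive the counter negative.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a set's element accounting. */
struct set {
	unsigned int size;	/* 0: no upper bound was given */
	int nelems;		/* only maintained when size != 0 */
};

static bool set_insert(struct set *s)
{
	if (s->size) {
		if (s->nelems >= (int)s->size)
			return false;	/* set is full */
		s->nelems++;
	}
	return true;
}

static void set_remove(struct set *s)
{
	/* Mirror the insert path: only undo an increment that happened. */
	if (s->size)
		s->nelems--;
	assert(s->nelems >= 0);
}
```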

Signed-off-by: Taehee Yoo 
---
 net/netfilter/nf_tables_api.c  | 6 --
 net/netfilter/nft_set_hash.c   | 3 ++-
 net/netfilter/nft_set_rbtree.c | 7 ---
 3 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 896d4a3..99a85b6 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -6377,7 +6377,8 @@ static int nf_tables_commit(struct net *net, struct sk_buff *skb)
 &te->elem,
 NFT_MSG_DELSETELEM, 0);
te->set->ops->remove(net, te->set, &te->elem);
-   atomic_dec(&te->set->nelems);
+   if (te->set->size)
+   atomic_dec(&te->set->nelems);
te->set->ndeact--;
break;
case NFT_MSG_NEWOBJ:
@@ -6510,7 +6511,8 @@ static int __nf_tables_abort(struct net *net)
te = (struct nft_trans_elem *)trans->data;
 
te->set->ops->remove(net, te->set, &te->elem);
-   atomic_dec(&te->set->nelems);
+   if (te->set->size)
+   atomic_dec(&te->set->nelems);
break;
case NFT_MSG_DELSETELEM:
te = (struct nft_trans_elem *)trans->data;
diff --git a/net/netfilter/nft_set_hash.c b/net/netfilter/nft_set_hash.c
index ef66824..d736ab1 100644
--- a/net/netfilter/nft_set_hash.c
+++ b/net/netfilter/nft_set_hash.c
@@ -328,7 +328,8 @@ static void nft_rhash_gc(struct work_struct *work)
if (gcb == NULL)
goto out;
rhashtable_remove_fast(&priv->ht, &he->node, nft_rhash_params);
-   atomic_dec(&set->nelems);
+   if (set->size)
+   atomic_dec(&set->nelems);
nft_set_gc_batch_add(gcb, he);
}
 out:
diff --git a/net/netfilter/nft_set_rbtree.c b/net/netfilter/nft_set_rbtree.c
index 1db52b0..de2d6b6 100644
--- a/net/netfilter/nft_set_rbtree.c
+++ b/net/netfilter/nft_set_rbtree.c
@@ -382,8 +382,8 @@ static void nft_rbtree_gc(struct work_struct *work)
gcb = nft_set_gc_batch_check(set, gcb, GFP_ATOMIC);
if (!gcb)
goto out;
-
-   atomic_dec(&set->nelems);
+   if (set->size)
+   atomic_dec(&set->nelems);
nft_set_gc_batch_add(gcb, rbe);
 
if (interval) {
@@ -398,7 +398,8 @@ static void nft_rbtree_gc(struct work_struct *work)
}
if (nft_set_elem_mark_busy(&rbe->ext))
continue;
-   atomic_dec(&set->nelems);
+   if (set->size)
+   atomic_dec(&set->nelems);
nft_set_gc_batch_add(gcb, rbe);
}
}
-- 
2.9.3



[PATCH nf 1/4] netfilter: nft_set_hash: fix panic when destroying set

2018-07-01 Thread Taehee Yoo
To destroy the elements of a set, rhashtable_free_and_destroy() is used.
rhashtable_free_and_destroy() cancels the deferred rehash work, then walks
and destroys the elements. At that point, some elements may still be in
future_tbl, and those elements are not destroyed.

The new nft_rhash_iterate_destroy() walks all elements in both tbl and
future_tbl and destroys them.
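The difference can be sketched with a toy hash table caught in the middle of a rehash. This is a hypothetical model, not the rhashtable API: `tbl` holds elements that have not moved yet and `future_tbl` holds those already rehashed, so a teardown that frees only `tbl` leaks the rest, while walking both frees everything.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy table; this is not the rhashtable API. */
#define NBUCKETS 4

struct node {
	struct node *next;
};

struct table {
	struct node *buckets[NBUCKETS];
};

struct htable {
	struct table tbl;		/* elements not yet moved */
	struct table *future_tbl;	/* non-NULL while a rehash is in progress */
};

static void add(struct table *t, unsigned int key)
{
	struct node *n = malloc(sizeof(*n));

	if (!n)
		abort();
	n->next = t->buckets[key % NBUCKETS];
	t->buckets[key % NBUCKETS] = n;
}

/* Free every node in one table and return how many were freed. */
static int free_table(struct table *t)
{
	int freed = 0;

	for (int b = 0; b < NBUCKETS; b++) {
		struct node *n = t->buckets[b];

		while (n) {
			struct node *next = n->next;

			free(n);
			freed++;
			n = next;
		}
		t->buckets[b] = NULL;
	}
	return freed;
}

/* Walk both the current and the future table, so nothing is left behind. */
static int destroy_all(struct htable *ht)
{
	int freed = free_table(&ht->tbl);

	if (ht->future_tbl)
		freed += free_table(ht->future_tbl);
	return freed;
}
```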

test script:
   %cat test.nft
   table ip aa {
map map1 {
type ipv4_addr : verdict;
elements = {
   0 : jump a0,
   1 : jump a0,
   2 : jump a0,
   3 : jump a0,
   4 : jump a0,
   5 : jump a0,
   6 : jump a0,
   7 : jump a0,
   8 : jump a0,
   9 : jump a0,
}
}
chain a0 {
}
   }
   flush ruleset

   [ ... ]

   table ip aa {
map map1 {
type ipv4_addr : verdict;
elements = {
   0 : jump a0,
   1 : jump a0,
   2 : jump a0,
   3 : jump a0,
   4 : jump a0,
   5 : jump a0,
   6 : jump a0,
   7 : jump a0,
   8 : jump a0,
   9 : jump a0,
}
}
chain a0 {
}
   }
   flush ruleset

Splat looks like:
[  200.795603] kernel BUG at net/netfilter/nf_tables_api.c:1363!
[  200.806944] invalid opcode:  [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[  200.812253] CPU: 1 PID: 1582 Comm: nft Not tainted 4.17.0+ #24
[  200.820297] Hardware name: To be filled by O.E.M. To be filled by O.E.M./Aptio CRB, BIOS 5.6.5 07/08/2015
[  200.830309] RIP: 0010:nf_tables_chain_destroy.isra.34+0x62/0x240 [nf_tables]
[  200.838317] Code: 43 50 85 c0 74 26 48 8b 45 00 48 8b 4d 08 ba 54 05 00 00 48 c7 c6 60 6d 29 c0 48 c7 c7 c0 65 29 c0 4c 8b 40 08 e8 58 e5 fd f8 <0f> 0b 48 89 da 48 b8 00 00 00 00 00 fc ff
[  200.860366] RSP: :880118dbf4d0 EFLAGS: 00010282
[  200.866354] RAX: 0061 RBX: 88010cdeaf08 RCX: 
[  200.874355] RDX: 0061 RSI: 0008 RDI: ed00231b7e90
[  200.882361] RBP: 880118dbf4e8 R08: ed002373bcfb R09: ed002373bcfa
[  200.890354] R10:  R11: ed002373bcfb R12: dead0200
[  200.898356] R13: dead0100 R14: bb62af38 R15: dc00
[  200.906354] FS:  7fefc31fd700() GS:88011b80() knlGS:
[  200.915533] CS:  0010 DS:  ES:  CR0: 80050033
[  200.922355] CR2: 557f1c8e9128 CR3: 00010688 CR4: 001006e0
[  200.930353] Call Trace:
[  200.932351]  ? nf_tables_commit+0x26f6/0x2c60 [nf_tables]
[  200.939525]  ? nf_tables_setelem_notify.constprop.49+0x1a0/0x1a0 [nf_tables]
[  200.947525]  ? nf_tables_delchain+0x6e0/0x6e0 [nf_tables]
[  200.952383]  ? nft_add_set_elem+0x1700/0x1700 [nf_tables]
[  200.959532]  ? nla_parse+0xab/0x230
[  200.963529]  ? nfnetlink_rcv_batch+0xd06/0x10d0 [nfnetlink]
[  200.968384]  ? nfnetlink_net_init+0x130/0x130 [nfnetlink]
[  200.975525]  ? debug_show_all_locks+0x290/0x290
[  200.980363]  ? debug_show_all_locks+0x290/0x290
[  200.986356]  ? sched_clock_cpu+0x132/0x170
[  200.990352]  ? find_held_lock+0x39/0x1b0
[  200.994355]  ? sched_clock_local+0x10d/0x130
[  200.999531]  ? memset+0x1f/0x40

Signed-off-by: Taehee Yoo 
---
 net/netfilter/nft_set_hash.c | 42 +++---
 1 file changed, 35 insertions(+), 7 deletions(-)

diff --git a/net/netfilter/nft_set_hash.c b/net/netfilter/nft_set_hash.c
index 6f9a136..695d5e8 100644
--- a/net/netfilter/nft_set_hash.c
+++ b/net/netfilter/nft_set_hash.c
@@ -341,6 +341,39 @@ static void nft_rhash_gc(struct work_struct *work)
   nft_set_gc_interval(set));
 }
 
+static void nft_rhash_iterate_destroy(const struct nft_set *set,
+ struct rhashtable *ht)
+{
+   struct nft_rhash_elem *he, *phe = NULL;
+   struct rhashtable_iter hti;
+   struct nft_set *pset = (struct nft_set *)set;
+
+   rhashtable_walk_enter(ht, &hti);
+   rhashtable_walk_start(&hti);
+
+   while ((he = rhashtable_walk_next(&hti))) {
+   if (IS_ERR(he)) {
+   if (PTR_ERR(he) != -EAGAIN)
+   break;
+   continue;
+   }
+   if (nft_set_elem_mark_busy(&he->ext))
+   continue;
+
+   rhashtable_remove_fast(ht, &he->node, nft_rhash_params);
+   if (pset->size)
+   atomic_dec(&pset->nelems);
+   if (phe)
+   nft_set_elem_destroy(pset, phe, true);
+ 

[PATCH nf 0/4] netfilter: nf_tables: fix set destroying bugs

2018-07-01 Thread Taehee Yoo
This patch series fixes bugs in nft_set_hash and nft_set_rbtree.

The first patch adds nft_rhash_iterate_destroy(),
which walks and destroys all elements.

The second patch adds rcu_barrier() to nft_rhash_destroy() to wait for
completion of the call_rcu() callbacks queued by GC.

The third patch reworks the GC routine of nft_set_rbtree to fix bugs.

The last patch adds a set->size check.

Taehee Yoo (4):
  netfilter: nft_set_hash: fix panic when destroying set
  netfilter: nft_set_hash: add rcu_barrier() in the nft_rhash_destroy()
  netfilter: nft_set_rbtree: fix panic when destroying set by GC
  netfilter: nf_tables: check set->size before decreasing set->nelems

 net/netfilter/nf_tables_api.c  |  6 --
 net/netfilter/nft_set_hash.c   | 46 ++
 net/netfilter/nft_set_rbtree.c | 41 +++--
 3 files changed, 68 insertions(+), 25 deletions(-)

-- 
2.9.3



Re: [PATCH nf-next] netfilter: nft_reject_bridge: remove unnecessary ttl set

2018-06-27 Thread Taehee Yoo
2018-06-27 23:48 GMT+09:00 Pablo Neira Ayuso :
> On Tue, Jun 12, 2018 at 01:54:47AM +0900, Taehee Yoo wrote:
>> In the nft_reject_br_send_v4_tcp_reset(), a ttl is set by
>> the nf_reject_ip_tcphdr_put(). so, below code is unnecessary.
>
> Applied, thanks.
>
> BTW, it's nf_reject_iphdr_put() the one that sets ttl, not
> nf_reject_ip_tcphdr_put(). I have mangled this before applying, no
> problem.

Thank you for reviewing!


[PATCH nf-next] netfilter: nft_reject_bridge: remove unnecessary ttl set

2018-06-11 Thread Taehee Yoo
In nft_reject_br_send_v4_tcp_reset(), the TTL is already set by
nf_reject_iphdr_put(), so the code below is unnecessary.

Signed-off-by: Taehee Yoo 
---
 net/bridge/netfilter/nft_reject_bridge.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/bridge/netfilter/nft_reject_bridge.c b/net/bridge/netfilter/nft_reject_bridge.c
index eaf05de..e0b082c 100644
--- a/net/bridge/netfilter/nft_reject_bridge.c
+++ b/net/bridge/netfilter/nft_reject_bridge.c
@@ -89,8 +89,7 @@ static void nft_reject_br_send_v4_tcp_reset(struct net *net,
niph = nf_reject_iphdr_put(nskb, oldskb, IPPROTO_TCP,
   net->ipv4.sysctl_ip_default_ttl);
nf_reject_ip_tcphdr_put(nskb, oldskb, oth);
-   niph->ttl   = net->ipv4.sysctl_ip_default_ttl;
-   niph->tot_len   = htons(nskb->len);
+   niph->tot_len = htons(nskb->len);
ip_send_check(niph);
 
nft_reject_br_push_etherhdr(oldskb, nskb);
-- 
2.9.3



[PATCH nf-next] netfilter: nf_tables: use WARN_ON_ONCE instead of BUG_ON in nft_do_chain()

2018-06-11 Thread Taehee Yoo
When the chain depth exceeds NFT_JUMP_STACK_SIZE,
nft_do_chain() crashes on BUG_ON().
But there is no need to crash hard here; warn once and drop the packet instead.
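The warn-and-drop pattern can be sketched in user space. This is a hypothetical model: `warn_on_once()` is a plain function standing in for the kernel's WARN_ON_ONCE() macro, the verdict enum stands in for NF_DROP, and the limit of 16 is only assumed to match NFT_JUMP_STACK_SIZE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define JUMP_STACK_SIZE 16	/* stand-in for NFT_JUMP_STACK_SIZE (assumed) */

enum verdict { VERDICT_CONTINUE, VERDICT_DROP };

struct jump_frame {
	int chain;
	int rule;
};

/* User-space stand-in for WARN_ON_ONCE(): print on the first hit only. */
static bool warn_on_once(bool cond, bool *warned, const char *msg)
{
	if (cond && !*warned) {
		fprintf(stderr, "WARNING: %s\n", msg);
		*warned = true;
	}
	return cond;
}

/* Push a return point before jumping; drop instead of crashing on overflow. */
static enum verdict push_jump(struct jump_frame *stack, unsigned int *sp,
			      int chain, int rule)
{
	static bool warned;

	if (warn_on_once(*sp >= JUMP_STACK_SIZE, &warned, "jump stack overflow"))
		return VERDICT_DROP;
	stack[*sp].chain = chain;
	stack[*sp].rule = rule + 1;	/* resume after the jump rule on return */
	(*sp)++;
	return VERDICT_CONTINUE;
}
```

The stack pointer is never advanced past the array, so the overflow path leaves the state intact and only the offending packet is dropped.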

Suggested-by: Florian Westphal 
Signed-off-by: Taehee Yoo 
---
 net/netfilter/nf_tables_core.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/netfilter/nf_tables_core.c b/net/netfilter/nf_tables_core.c
index deff10a..8de912c 100644
--- a/net/netfilter/nf_tables_core.c
+++ b/net/netfilter/nf_tables_core.c
@@ -183,7 +183,8 @@ nft_do_chain(struct nft_pktinfo *pkt, void *priv)
 
switch (regs.verdict.code) {
case NFT_JUMP:
-   BUG_ON(stackptr >= NFT_JUMP_STACK_SIZE);
+   if (WARN_ON_ONCE(stackptr >= NFT_JUMP_STACK_SIZE))
+   return NF_DROP;
jumpstack[stackptr].chain = chain;
jumpstack[stackptr].rules = rules + 1;
stackptr++;
-- 
2.9.3



Re: [PATCH nf-next] netfilter: nf_tables: fix jumpstack depth validation

2018-06-11 Thread Taehee Yoo
Thank you for reviewing!

On Mon, Jun 11, 2018 at 9:14 PM, Florian Westphal wrote:
>
> Taehee Yoo  wrote:
> > The level of struct nft_ctx is updated by nf_tables_check_loops().
>
> [..]
>
> > [  168.803743] kernel BUG at net/netfilter/nf_tables_core.c:186!
>
> Could you also send a followup patch to replace this BUG_ON with
> WARN_ON_ONCE+ return NF_DROP?
>
> There is no need to crash hard here.

Okay, I will send a followup patch!

Thanks!


[PATCH nf-next] netfilter: nf_tables: fix jumpstack depth validation

2018-06-11 Thread Taehee Yoo
The level of struct nft_ctx is updated by nf_tables_check_loops()
and is used to validate the jumpstack depth. But the jumpstack
validation routine neither updates nor validates the depth recursively,
so in some cases the chain depth can exceed NFT_JUMP_STACK_SIZE.

After this patch, the jumpstack validation routine is located in
nft_chain_validate().
When new rules or new set elements are added, nft_table_validate()
is called by nf_tables_newrule() and nf_tables_newsetelem().
nft_table_validate() calls nft_chain_validate(), which visits all
child chains recursively, so the chain depth is always updated
correctly.
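The recursive idea can be sketched as follows. This is a hypothetical model, not nft_chain_validate() itself: each chain lists the chains it jumps to, the depth is passed down the recursion instead of being cached per chain, and the limit of 16 only stands in for NFT_JUMP_STACK_SIZE. Because everything below the base chain is revisited on each validation, adding the a10 -> a11 link from the reproducer is caught even though a11..a19 were linked earlier.

```c
#include <assert.h>
#include <errno.h>

#define MAX_DEPTH 16	/* stand-in for NFT_JUMP_STACK_SIZE (assumed) */
#define MAX_JUMPS 4

/* Hypothetical chain graph: each chain lists the chains it jumps to. */
struct chain {
	int njumps;
	const struct chain *jumps[MAX_JUMPS];
};

/*
 * Visit every child recursively, carrying the depth in the walk itself.
 * At most MAX_DEPTH nested jumps are allowed below the base chain.
 */
static int chain_validate(const struct chain *c, unsigned int level)
{
	if (level > MAX_DEPTH)
		return -EMLINK;
	for (int i = 0; i < c->njumps; i++) {
		int err = chain_validate(c->jumps[i], level + 1);

		if (err < 0)
			return err;
	}
	return 0;
}
```

Rebuilding the reproducer's graph: input jumps to a1, each a_i jumps to a_(i+1) except a10, and the check passes; once a10 -> a11 is added, the path reaches depth 19 and validation fails with -EMLINK.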

Reproducer:
   %cat ./test.sh
   #!/bin/bash
   nft add table ip filter
   nft add chain ip filter input { type filter hook input priority 0\; }
   for ((i=0;i<20;i++)); do
nft add chain ip filter a$i
   done

   nft add rule ip filter input jump a1

   for ((i=0;i<10;i++)); do
nft add rule ip filter a$i jump a$((i+1))
   done

   for ((i=11;i<19;i++)); do
nft add rule ip filter a$i jump a$((i+1))
   done

   nft add rule ip filter a10 jump a11

Result:
[  168.803743] kernel BUG at net/netfilter/nf_tables_core.c:186!
[  168.810881] invalid opcode:  [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[  168.812091] Modules linked in: nf_tables nfnetlink ip_tables x_tables
[  168.812091] CPU: 0 PID: 8 Comm: ksoftirqd/0 Not tainted 4.17.0-rc7+ #186
[  168.812091] RIP: 0010:nft_do_chain+0x9fe/0xf50 [nf_tables]
[  168.812091] RSP: :88011a5475b0 EFLAGS: 00010212
[  168.812091] RAX: fffd RBX: 88011a5476a0 RCX: 
[  168.812091] RDX: 0010 RSI: 880111d69bd8 RDI: 88011a5476b0
[  168.812091] RBP: 88011a547870 R08: ed00234a8ed6 R09: ed00234a8ed5
[  168.812091] R10: 88011a5476af R11: ed00234a8ed6 R12: 88011a5478b8
[  168.881353] R13: 880111d69be0 R14: 880111d69bc0 R15: dc00
[  168.887792] FS:  () GS:88011b60() knlGS:
[  168.887792] CS:  0010 DS:  ES:  CR0: 80050033
[  168.905313] CR2: 7ffd58ee6148 CR3: 67416000 CR4: 001006f0
[  168.917438] Call Trace:
[  168.917438]  ? __save_stack_trace+0x73/0xd0
[  168.922459]  ? __nft_trace_packet+0x1a0/0x1a0 [nf_tables]
[  168.922459]  ? save_stack+0x92/0xa0
[  168.922459]  ? ip_rcv+0x802/0xe10
[  168.922459]  ? sched_clock_cpu+0x144/0x180
[  168.922459]  ? sched_clock_local+0xe2/0x150
[  168.922459]  ? __lock_acquire+0xcea/0x4ed0
[  168.922459]  ? sched_clock_cpu+0x144/0x180
[  168.922459]  ? debug_check_no_locks_freed+0x280/0x280
[  168.922459]  ? nft_do_chain_ipv4+0x16f/0x1e0 [nf_tables]
[  168.922459]  nft_do_chain_ipv4+0x16f/0x1e0 [nf_tables]
[  168.922459]  ? nft_do_chain_arp+0xa0/0xa0 [nf_tables]
[  168.922459]  ? lock_acquire+0x193/0x380
[  168.922459]  ? lock_acquire+0x193/0x380
[  168.922459]  ? ip_local_deliver+0x1c6/0x3c0
[  168.922459]  nf_hook_slow+0xae/0x170
[  168.922459]  ip_local_deliver+0x293/0x3c0
[  168.922459]  ? ip_call_ra_chain+0x490/0x490
[  168.922459]  ? ip_rcv_finish+0x1910/0x1910
[  168.922459]  ip_rcv+0x802/0xe10
[ ... ]


Signed-off-by: Taehee Yoo 
---
 include/net/netfilter/nf_tables.h |  4 ++--
 net/netfilter/nf_tables_api.c | 10 +++---
 net/netfilter/nft_immediate.c |  3 +++
 net/netfilter/nft_lookup.c| 13 +++--
 4 files changed, 19 insertions(+), 11 deletions(-)

diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index 08c005c..a7d6476 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -150,6 +150,7 @@ static inline void nft_data_debug(const struct nft_data *data)
  * @portid: netlink portID of the original message
  * @seq: netlink sequence number
  * @family: protocol family
+ * @level: depth of the chains
  * @report: notify via unicast netlink message
  */
 struct nft_ctx {
@@ -160,6 +161,7 @@ struct nft_ctx {
u32 portid;
u32 seq;
u8  family;
+   u8  level;
boolreport;
 };
 
@@ -865,7 +867,6 @@ enum nft_chain_flags {
  * @table: table that this chain belongs to
  * @handle: chain handle
  * @use: number of jump references to this chain
- * @level: length of longest path to this chain
  * @flags: bitmask of enum nft_chain_flags
  * @name: name of the chain
  */
@@ -878,7 +879,6 @@ struct nft_chain {
struct nft_table*table;
u64 handle;
u32 use;
-   u16 level;
u8  flags:6,
genmask:2;
char*name;
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_ap

[PATCH nf] netfilter: nft_set_rbtree: fix parameter of __nft_rbtree_lookup()

2018-06-06 Thread Taehee Yoo
The variable 'this' doesn't carry a flags value, so it can't be passed
to nft_rbtree_interval_end(); the element 'rbe' must be used instead.

test commands:
   %nft add table ip filter
   %nft add set ip filter s { type ipv4_addr \; flags interval \; }
   %nft add element ip filter s {0-1}
   %nft add element ip filter s {2-10}
   %nft add chain ip filter input { type filter hook input priority 0\; }
   %nft add rule ip filter input ip saddr @s

Splat looks like:
[  246.752502] BUG: KASAN: slab-out-of-bounds in __nft_rbtree_lookup+0x677/0x6a0 [nft_set_rbtree]
[  246.752502] Read of size 1 at addr 88010d9efa47 by task http/1092

[  246.752502] CPU: 1 PID: 1092 Comm: http Not tainted 4.17.0-rc6+ #185
[  246.752502] Call Trace:
[  246.752502]  
[  246.752502]  dump_stack+0x74/0xbb
[  246.752502]  ? __nft_rbtree_lookup+0x677/0x6a0 [nft_set_rbtree]
[  246.752502]  print_address_description+0xc7/0x290
[  246.752502]  ? __nft_rbtree_lookup+0x677/0x6a0 [nft_set_rbtree]
[  246.752502]  kasan_report+0x22c/0x350
[  246.752502]  __nft_rbtree_lookup+0x677/0x6a0 [nft_set_rbtree]
[  246.752502]  nft_rbtree_lookup+0xc9/0x2d2 [nft_set_rbtree]
[  246.752502]  ? sched_clock_cpu+0x144/0x180
[  246.752502]  nft_lookup_eval+0x149/0x3a0 [nf_tables]
[  246.752502]  ? __lock_acquire+0xcea/0x4ed0
[  246.752502]  ? nft_lookup_init+0x6b0/0x6b0 [nf_tables]
[  246.752502]  nft_do_chain+0x263/0xf50 [nf_tables]
[  246.752502]  ? __nft_trace_packet+0x1a0/0x1a0 [nf_tables]
[  246.752502]  ? sched_clock_cpu+0x144/0x180
[ ... ]

Fixes: f9121355eb6f ("netfilter: nft_set_rbtree: incorrect assumption on lower interval lookups")
Signed-off-by: Taehee Yoo 
---
 net/netfilter/nft_set_rbtree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/nft_set_rbtree.c b/net/netfilter/nft_set_rbtree.c
index e6f08bc..26fa93b 100644
--- a/net/netfilter/nft_set_rbtree.c
+++ b/net/netfilter/nft_set_rbtree.c
@@ -65,7 +65,7 @@ static bool __nft_rbtree_lookup(const struct net *net, const struct nft_set *set
parent = rcu_dereference_raw(parent->rb_left);
if (interval &&
nft_rbtree_equal(set, this, interval) &&
-   nft_rbtree_interval_end(this) &&
+   nft_rbtree_interval_end(rbe) &&
!nft_rbtree_interval_end(interval))
continue;
interval = rbe;
-- 
2.9.3



[PATCH nf] netfilter: nft_reject_bridge: fix skb allocation size in nft_reject_br_send_v6_unreach

2018-06-01 Thread Taehee Yoo
When allocating the ICMPv6 skb, sizeof(struct ipv6hdr) should be used instead of sizeof(struct iphdr).

Signed-off-by: Taehee Yoo 
---
 net/bridge/netfilter/nft_reject_bridge.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/bridge/netfilter/nft_reject_bridge.c b/net/bridge/netfilter/nft_reject_bridge.c
index eaf05de..6de9812 100644
--- a/net/bridge/netfilter/nft_reject_bridge.c
+++ b/net/bridge/netfilter/nft_reject_bridge.c
@@ -261,7 +261,7 @@ static void nft_reject_br_send_v6_unreach(struct net *net,
if (!reject6_br_csum_ok(oldskb, hook))
return;
 
-   nskb = alloc_skb(sizeof(struct iphdr) + sizeof(struct icmp6hdr) +
+   nskb = alloc_skb(sizeof(struct ipv6hdr) + sizeof(struct icmp6hdr) +
 LL_MAX_HEADER + len, GFP_ATOMIC);
if (!nskb)
return;
-- 
2.9.3



[PATCH nf-next] netfilter: nf_tables: remove unused variables

2018-05-28 Thread Taehee Yoo
The comments array and trace_loginfo are no longer used.

Signed-off-by: Taehee Yoo <ap420...@gmail.com>
---
 net/netfilter/nf_tables_core.c | 16 
 1 file changed, 16 deletions(-)

diff --git a/net/netfilter/nf_tables_core.c b/net/netfilter/nf_tables_core.c
index d457d85..a1b93fa 100644
--- a/net/netfilter/nf_tables_core.c
+++ b/net/netfilter/nf_tables_core.c
@@ -23,22 +23,6 @@
 #include 
 #include 
 
-static const char *const comments[__NFT_TRACETYPE_MAX] = {
-   [NFT_TRACETYPE_POLICY]  = "policy",
-   [NFT_TRACETYPE_RETURN]  = "return",
-   [NFT_TRACETYPE_RULE]= "rule",
-};
-
-static const struct nf_loginfo trace_loginfo = {
-   .type = NF_LOG_TYPE_LOG,
-   .u = {
-   .log = {
-   .level = LOGLEVEL_WARNING,
-   .logflags = NF_LOG_DEFAULT_MASK,
-   },
-   },
-};
-
 static noinline void __nft_trace_packet(struct nft_traceinfo *info,
const struct nft_chain *chain,
enum nft_trace_types type)
-- 
2.9.3



[PATCH nf] netfilter: nf_tables: increase nft_counters_enabled in nft_chain_stats_replace()

2018-05-28 Thread Taehee Yoo
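When a chain is updated, a counter can be attached to it. If so,
nft_counters_enabled should be incremented, because nf_tables_chain_destroy()
later decrements it for every chain that has stats. Without the increment,
the count goes negative.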
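The imbalance can be modelled in userspace. This is a minimal sketch, not kernel code. A plain int stands in for the nft_counters_enabled static key. The names key_inc(), key_dec(), chain_stats_replace() and chain_destroy() are hypothetical simplifications. Destroying a chain that has stats always drops one reference, so attaching stats to a chain that had none must take one.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the nft_counters_enabled static key. */
static int counters_enabled;

static void key_inc(void)
{
	counters_enabled++;
}

/* Mimics the "jump label: negative count!" warning on underflow. */
static int key_dec(void)
{
	if (counters_enabled <= 0) {
		fprintf(stderr, "jump label: negative count!\n");
		return -1;
	}
	counters_enabled--;
	return 0;
}

struct chain {
	bool has_stats;
};

/* Attach stats to a chain; with_fix selects the patched behaviour. */
static void chain_stats_replace(struct chain *c, bool with_fix)
{
	if (c->has_stats)
		return;		/* old stats are only swapped: count unchanged */
	c->has_stats = true;
	if (with_fix)
		key_inc();	/* the reference this patch adds */
}

/* Destroying a chain with stats always drops one reference. */
static int chain_destroy(struct chain *c)
{
	return c->has_stats ? key_dec() : 0;
}
```

Without the fix, the destroy step underflows. With it, the count returns to zero.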

test commands:

   %nft add table ip filter
   %nft add chain ip filter input { type filter hook input priority 4\; }
   %iptables-compat -Z input
   %nft delete chain ip filter input

The following messages are then printed:

[  286.443720] jump label: negative count!
[  286.448278] WARNING: CPU: 0 PID: 1459 at kernel/jump_label.c:197 __static_key_slow_dec_cpuslocked+0x6f/0xf0
[  286.449144] Modules linked in: nf_tables nfnetlink ip_tables x_tables
[  286.449144] CPU: 0 PID: 1459 Comm: nft Tainted: GW 4.17.0-rc2+ #12
[  286.449144] RIP: 0010:__static_key_slow_dec_cpuslocked+0x6f/0xf0
[  286.449144] RSP: 0018:88010e5176f0 EFLAGS: 00010286
[  286.449144] RAX: 001b RBX: c0179500 RCX: b8a82522
[  286.449144] RDX: 0001 RSI: 0008 RDI: 88011b7e5eac
[  286.449144] RBP:  R08: ed00236fce5c R09: ed00236fce5b
[  286.449144] R10: c0179503 R11: ed00236fce5c R12: 
[  286.449144] R13: 88011a28e448 R14: 88011a28e470 R15: dc00
[  286.449144] FS:  7f0384328700() GS:88011b60() knlGS:
[  286.449144] CS:  0010 DS:  ES:  CR0: 80050033
[  286.449144] CR2: 7f038394bf10 CR3: 000104a86000 CR4: 001006f0
[  286.449144] Call Trace:
[  286.449144]  static_key_slow_dec+0x6a/0x70
[  286.449144]  nf_tables_chain_destroy+0x19d/0x210 [nf_tables]
[  286.449144]  nf_tables_commit+0x1891/0x1c50 [nf_tables]
[  286.449144]  nfnetlink_rcv+0x1148/0x13d0 [nfnetlink]
[ ... ]

Signed-off-by: Taehee Yoo <ap420...@gmail.com>
---
 net/netfilter/nf_tables_api.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index e53e2e5..f730283 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -1298,8 +1298,10 @@ static void nft_chain_stats_replace(struct nft_base_chain *chain,
rcu_assign_pointer(chain->stats, newstats);
synchronize_rcu();
free_percpu(oldstats);
-   } else
+   } else {
rcu_assign_pointer(chain->stats, newstats);
+   static_branch_inc(&nft_counters_enabled);
+   }
 }
 
 static void nf_tables_chain_destroy(struct nft_ctx *ctx)
-- 
2.9.3


[PATCH nf] netfilter: nf_tables: fix NULL-ptr in nf_tables_dump_obj()

2018-05-28 Thread Taehee Yoo
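The table member of nft_obj_filter is a pointer, not an array, and it is
NULL when no table name is given. Checking filter->table[0] then dereferences
a NULL pointer, so the pointer itself should be checked instead.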
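Here is a minimal userspace sketch of the corrected check. It assumes a simplified, hypothetical struct obj_filter whose table member is a char pointer that is NULL when no table name was requested. Testing filter->table, rather than filter->table[0], never dereferences a NULL pointer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified version of the dump filter. */
struct obj_filter {
	char *table;		/* NULL when no table name was requested */
	unsigned int type;
};

/* Returns true if the dump should skip a table with this name. */
static bool skip_table(const struct obj_filter *filter, const char *name)
{
	/* Check the pointer itself; filter->table[0] would fault on NULL. */
	return filter && filter->table && strcmp(filter->table, name) != 0;
}
```

A NULL table pointer means "dump every table", which is what `nft reset counters` asks for.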

test commands:

   %nft add table ip filter
   %nft add counter ip filter ct1
   %nft reset counters

The following messages are then printed:

[  306.510504] kasan: CONFIG_KASAN_INLINE enabled
[  306.516184] kasan: GPF could be caused by NULL-ptr deref or user memory access
[  306.524775] general protection fault:  [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[  306.528284] Modules linked in: nft_objref nft_counter nf_tables nfnetlink ip_tables x_tables
[  306.528284] CPU: 0 PID: 1488 Comm: nft Not tainted 4.17.0-rc4+ #17
[  306.528284] Hardware name: To be filled by O.E.M. To be filled by O.E.M./Aptio CRB, BIOS 5.6.5 07/08/2015
[  306.528284] RIP: 0010:nf_tables_dump_obj+0x52c/0xa70 [nf_tables]
[  306.528284] RSP: 0018:8800b6cb7520 EFLAGS: 00010246
[  306.528284] RAX:  RBX: 8800b6c49820 RCX: 
[  306.528284] RDX:  RSI: dc00 RDI: ed0016d96e9a
[  306.528284] RBP: 8800b6cb75c0 R08: ed00236fce7c R09: ed00236fce7b
[  306.528284] R10: 9f6241e8 R11: ed00236fce7c R12: 880111365108
[  306.528284] R13:  R14: 8800b6c49860 R15: 8800b6c49860
[  306.528284] FS:  7f838b007700() GS:88011b60() knlGS:
[  306.528284] CS:  0010 DS:  ES:  CR0: 80050033
[  306.528284] CR2: 7ffeafabcf78 CR3: b6cbe000 CR4: 001006f0
[  306.528284] Call Trace:
[  306.528284]  netlink_dump+0x470/0xa20
[  306.528284]  __netlink_dump_start+0x5ae/0x690
[  306.528284]  ? nf_tables_getobj+0x1b3/0x740 [nf_tables]
[  306.528284]  nf_tables_getobj+0x2f5/0x740 [nf_tables]
[  306.528284]  ? nft_obj_notify+0x100/0x100 [nf_tables]
[  306.528284]  ? nf_tables_getobj+0x740/0x740 [nf_tables]
[  306.528284]  ? nf_tables_dump_flowtable_done+0x70/0x70 [nf_tables]
[  306.528284]  ? nft_obj_notify+0x100/0x100 [nf_tables]
[  306.528284]  nfnetlink_rcv_msg+0x8ff/0x932 [nfnetlink]
[  306.528284]  ? nfnetlink_rcv_msg+0x216/0x932 [nfnetlink]
[  306.528284]  netlink_rcv_skb+0x1c9/0x2f0
[  306.528284]  ? nfnetlink_bind+0x1d0/0x1d0 [nfnetlink]
[  306.528284]  ? debug_check_no_locks_freed+0x270/0x270
[  306.528284]  ? netlink_ack+0x7a0/0x7a0
[  306.528284]  ? ns_capable_common+0x6e/0x110
[ ... ]

Signed-off-by: Taehee Yoo <ap420...@gmail.com>
---
 net/netfilter/nf_tables_api.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index f730283..31b5315 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -4745,7 +4745,7 @@ static int nf_tables_dump_obj(struct sk_buff *skb, struct netlink_callback *cb)
if (idx > s_idx)
memset(&cb->args[1], 0,
   sizeof(cb->args) - sizeof(cb->args[0]));
-   if (filter && filter->table[0] &&
+   if (filter && filter->table &&
strcmp(filter->table, table->name))
goto cont;
if (filter &&
@@ -5419,7 +5419,7 @@ static int nf_tables_dump_flowtable(struct sk_buff *skb,
if (idx > s_idx)
memset(&cb->args[1], 0,
   sizeof(cb->args) - sizeof(cb->args[0]));
-   if (filter && filter->table[0] &&
+   if (filter && filter->table &&
strcmp(filter->table, table->name))
goto cont;
 
-- 
2.9.3


[PATCH nf-next] netfilter: nft_meta: fix wrong value dereference in nft_meta_set_eval

2018-05-17 Thread Taehee Yoo
In nft_meta_set_eval(), the nftrace value is loaded from sreg as a u32,
but its correct type is u8, so an incorrect value is sometimes read.
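The effect can be reproduced with a minimal userspace sketch. As a simplifying assumption, an expression that writes a one-byte value only updates the first byte of the 32-bit register slot. reg_load8() reads just that byte, as nft_reg_load8() does. reg_store_byte_only(), nftrace_from_u32() and nftrace_from_u8() are hypothetical helpers. Reading the whole u32 picks up stale bytes, so !!value can be 1 even though the stored u8 is 0.

```c
#include <stdint.h>
#include <string.h>

/* Read only the first byte of a register slot (like nft_reg_load8()). */
static uint8_t reg_load8(const uint32_t *sreg)
{
	uint8_t v;

	memcpy(&v, sreg, sizeof(v));
	return v;
}

/* Hypothetical writer that stores a one-byte value and leaves the
 * remaining three bytes of the slot untouched. */
static void reg_store_byte_only(uint32_t *dreg, uint8_t v)
{
	memcpy(dreg, &v, sizeof(v));
}

/* Old behaviour: treat the whole slot as the nftrace value. */
static int nftrace_from_u32(const uint32_t *sreg)
{
	return !!*sreg;
}

/* Fixed behaviour: load the value as u8. */
static int nftrace_from_u8(const uint32_t *sreg)
{
	return !!reg_load8(sreg);
}
```

The result does not depend on byte order, because both the writer and the reader touch the byte at the lowest address.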

Steps to reproduce:

   %nft add table ip filter
   %nft add chain ip filter input { type filter hook input priority 4\; }
   %nft add rule ip filter input nftrace set 0
   %nft monitor

Sometimes trace messages are still printed, even though the rule sets nftrace to 0:

   trace id 16767227 ip filter input packet: iif "enp2s0"
   ether saddr xx:xx:xx:xx:xx:xx ether daddr xx:xx:xx:xx:xx:xx
   ip saddr 192.168.0.1 ip daddr 255.255.255.255 ip dscp cs0
   ip ecn not-ect ip
   trace id 16767227 ip filter input rule nftrace set 0 (verdict continue)
   trace id 16767227 ip filter input verdict continue
   trace id 16767227 ip filter input

Signed-off-by: Taehee Yoo <ap420...@gmail.com>
---
 net/netfilter/nft_meta.c | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/net/netfilter/nft_meta.c b/net/netfilter/nft_meta.c
index 5348bd0..1105a23 100644
--- a/net/netfilter/nft_meta.c
+++ b/net/netfilter/nft_meta.c
@@ -259,7 +259,7 @@ static void nft_meta_set_eval(const struct nft_expr *expr,
struct sk_buff *skb = pkt->skb;
u32 *sreg = &regs->data[meta->sreg];
u32 value = *sreg;
-   u8 pkt_type;
+   u8 value8;
 
switch (meta->key) {
case NFT_META_MARK:
@@ -269,15 +269,17 @@ static void nft_meta_set_eval(const struct nft_expr *expr,
skb->priority = value;
break;
case NFT_META_PKTTYPE:
-   pkt_type = nft_reg_load8(sreg);
+   value8 = nft_reg_load8(sreg);
 
-   if (skb->pkt_type != pkt_type &&
-   skb_pkt_type_ok(pkt_type) &&
+   if (skb->pkt_type != value8 &&
+   skb_pkt_type_ok(value8) &&
skb_pkt_type_ok(skb->pkt_type))
-   skb->pkt_type = pkt_type;
+   skb->pkt_type = value8;
break;
case NFT_META_NFTRACE:
-   skb->nf_trace = !!value;
+   value8 = nft_reg_load8(sreg);
+
+   skb->nf_trace = !!value8;
break;
default:
WARN_ON(1);
-- 
2.9.3


[PATCH nf] netfilter: nf_tables: fix NULL pointer dereference on nft_ct_helper_obj_dump()

2018-05-16 Thread Taehee Yoo
0246 R12: 7fff6367e110
[  916.655301] R13: 0020 R14: 7f57a153c610 R15: 562417258de0
[  916.655301] Code: ff ff ff 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 fa 53 48 c1 ea 03 48 b8 00 00 00 00 00 fc ff df 48 89 fd 48 83 ec 08 <0f> b6 04 02 48 89 fa 83 e2 07 38 d0 7f
[  916.655301] RIP: strlen+0x1a/0x90 RSP: 88010ff0f2f8
[  916.771929] ---[ end trace 1065e048e72479fe ]---
[  916.777204] Kernel panic - not syncing: Fatal exception
[  916.778158] Kernel Offset: 0x1400 from 0x8100 (relocation range: 0x8000-0xffffbfff)

Signed-off-by: Taehee Yoo <ap420...@gmail.com>
---
 net/netfilter/nft_ct.c | 20 
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/net/netfilter/nft_ct.c b/net/netfilter/nft_ct.c
index ea737fd..5c0de70 100644
--- a/net/netfilter/nft_ct.c
+++ b/net/netfilter/nft_ct.c
@@ -880,22 +880,26 @@ static int nft_ct_helper_obj_dump(struct sk_buff *skb,
  struct nft_object *obj, bool reset)
 {
const struct nft_ct_helper_obj *priv = nft_obj_data(obj);
-   const struct nf_conntrack_helper *helper = priv->helper4;
+   const struct nf_conntrack_helper *helper;
u16 family;
 
+   if (priv->helper4 && priv->helper6) {
+   family = NFPROTO_INET;
+   helper = priv->helper4;
+   } else if (priv->helper6) {
+   family = NFPROTO_IPV6;
+   helper = priv->helper6;
+   } else {
+   family = NFPROTO_IPV4;
+   helper = priv->helper4;
+   }
+
if (nla_put_string(skb, NFTA_CT_HELPER_NAME, helper->name))
return -1;
 
if (nla_put_u8(skb, NFTA_CT_HELPER_L4PROTO, priv->l4proto))
return -1;
 
-   if (priv->helper4 && priv->helper6)
-   family = NFPROTO_INET;
-   else if (priv->helper6)
-   family = NFPROTO_IPV6;
-   else
-   family = NFPROTO_IPV4;
-
if (nla_put_be16(skb, NFTA_CT_HELPER_L3PROTO, htons(family)))
return -1;
 
-- 
2.9.3


[PATCH nf 5/5] netfilter: nf_tables: add call validate callback.

2018-05-15 Thread Taehee Yoo
The ->validate callback is called just before the ->commit callback.
If validation fails, ->abort is called instead.
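The control flow can be sketched in userspace. struct subsys, batch_done() and the demo callbacks are hypothetical simplifications of struct nfnetlink_subsystem and the NFNL_BATCH_DONE branch of nfnetlink_rcv_batch(). The net and skb arguments are dropped. Validation is optional, and a failure routes the batch to ->abort() instead of ->commit().

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, trimmed-down subsystem descriptor. */
struct subsys {
	int (*validate)(void);	/* optional, may be NULL */
	int (*commit)(void);
	int (*abort)(void);
};

/* Runs once the whole batch has been parsed successfully. */
static int batch_done(const struct subsys *ss)
{
	if (ss->validate) {
		int err = ss->validate();

		if (err < 0) {
			ss->abort();	/* discard the whole transaction */
			return err;
		}
	}
	return ss->commit();
}

/* Demo callbacks that only count how often they run. */
static int commits, aborts;

static int validate_ok(void)  { return 0; }
static int validate_bad(void) { return -EINVAL; }
static int do_commit(void)    { commits++; return 0; }
static int do_abort(void)     { aborts++; return 0; }
```

Subsystems that leave ->validate NULL keep the old behaviour and commit directly.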

Signed-off-by: Taehee Yoo <ap420...@gmail.com>
---
 net/netfilter/nfnetlink.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/net/netfilter/nfnetlink.c b/net/netfilter/nfnetlink.c
index 03ead8a..b9b6401 100644
--- a/net/netfilter/nfnetlink.c
+++ b/net/netfilter/nfnetlink.c
@@ -441,8 +441,21 @@ static void nfnetlink_rcv_batch(struct sk_buff *skb, struct nlmsghdr *nlh,
kfree_skb(skb);
goto replay;
} else if (status == NFNL_BATCH_DONE) {
+   if (ss->validate) {
+   err = ss->validate(net);
+   if (err < 0) {
+   if (nfnl_err_add(&err_list, nlmsg_hdr(oskb),
+err, &extack) < 0) {
+   nfnl_err_reset(&err_list);
+   netlink_ack(oskb, nlmsg_hdr(oskb),
+   -ENOMEM, NULL);
+   }
+   goto abort;
+   }
+   }
ss->commit(net, oskb);
} else {
+abort:
ss->abort(net, oskb);
}
 
-- 
2.9.3


[PATCH nf 4/5] netfilter: nf_tables: use chain info to validate type and hook.

2018-05-15 Thread Taehee Yoo
After this patch, nft_chain_validate_dependency() and
nft_chain_validate_hooks() use the chain information array,
so that these functions can validate both base chains and non-base chains.
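With chain information available, both checks reduce to simple tests on a per-chain record. The sketch below mirrors the two functions from this patch in userspace. struct chain_info follows the patch. The HOOK_* constants are stand-ins that use the same numbering as the kernel's enum nf_inet_hooks. A chain passes the hook check only if every hook that can reach it is in the allowed mask.

```c
#include <errno.h>

/* Same numbering as enum nf_inet_hooks. */
enum { HOOK_PRE_ROUTING, HOOK_LOCAL_IN, HOOK_FORWARD, HOOK_LOCAL_OUT,
       HOOK_POST_ROUTING };

struct chain_info {
	unsigned char type;	/* chain type recorded by the validator */
	unsigned int hooknum;	/* bitmask of hooks that reach the chain */
};

static int validate_dependency(const struct chain_info *ci, unsigned char type)
{
	/* As in the patch, a zero type is never rejected. */
	if (ci->type && ci->type != type)
		return -EOPNOTSUPP;
	return 0;
}

static int validate_hooks(const struct chain_info *ci, unsigned int hook_flags)
{
	if (!hook_flags)
		return 0;
	/* Reject if any reaching hook is outside the allowed set. */
	if (ci->hooknum & ~hook_flags)
		return -EOPNOTSUPP;
	return 0;
}
```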

expr->ops->validate() must now be called from nf_tables_validate(),
because it relies on the chain information that nf_tables_validate() allocates.
As an exception, nf_tables_check_loops() may still call it when the
expression is "immediate".

nft_compat.c now uses the common validate routine instead of
nft_compat_chain_validate_dependency().

Signed-off-by: Taehee Yoo <ap420...@gmail.com>
---
 net/netfilter/nf_tables_api.c | 51 +++---
 net/netfilter/nft_compat.c| 73 ++-
 2 files changed, 42 insertions(+), 82 deletions(-)

diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 36d8fba..d902ef9 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -1899,26 +1899,13 @@ static int nf_tables_newexpr(const struct nft_ctx *ctx,
expr->ops = ops;
if (ops->init) {
err = ops->init(ctx, expr, (const struct nlattr **)info->tb);
-   if (err < 0)
-   goto err1;
-   }
-
-   if (ops->validate) {
-   const struct nft_data *data = NULL;
-
-   err = ops->validate(ctx, expr, &data);
-   if (err < 0)
-   goto err2;
+   if (err < 0) {
+   expr->ops = NULL;
+   return err;
+   }
}
 
return 0;
-
-err2:
-   if (ops->destroy)
-   ops->destroy(ctx, expr);
-err1:
-   expr->ops = NULL;
-   return err;
 }
 
 static void nf_tables_expr_destroy(const struct nft_ctx *ctx,
@@ -6397,13 +6384,12 @@ static const struct nfnetlink_subsystem nf_tables_subsys = {
 int nft_chain_validate_dependency(const struct nft_ctx *ctx,
  enum nft_chain_types type)
 {
-   const struct nft_base_chain *basechain;
+   struct net *net = ctx->net;
+   struct nft_chain *chain = ctx->chain;
+   struct nft_chain_info *cinfo = nft_get_chain_info(net, chain);
 
-   if (nft_is_base_chain(ctx->chain)) {
-   basechain = nft_base_chain(ctx->chain);
-   if (basechain->type->type != type)
-   return -EOPNOTSUPP;
-   }
+   if (cinfo->type && cinfo->type != type)
+   return -EOPNOTSUPP;
return 0;
 }
 EXPORT_SYMBOL_GPL(nft_chain_validate_dependency);
@@ -6411,17 +6397,14 @@ EXPORT_SYMBOL_GPL(nft_chain_validate_dependency);
 int nft_chain_validate_hooks(const struct nft_ctx *ctx,
 unsigned int hook_flags)
 {
-   struct nft_base_chain *basechain;
-
-   if (nft_is_base_chain(ctx->chain)) {
-   basechain = nft_base_chain(ctx->chain);
-
-   if ((1 << basechain->ops.hooknum) & hook_flags)
-   return 0;
+   struct net *net = ctx->net;
+   struct nft_chain *chain = ctx->chain;
+   struct nft_chain_info *cinfo = nft_get_chain_info(net, chain);
 
+   if (!hook_flags)
+   return 0;
+   if (cinfo->hooknum & ~hook_flags)
return -EOPNOTSUPP;
-   }
-
return 0;
 }
 EXPORT_SYMBOL_GPL(nft_chain_validate_hooks);
@@ -6479,12 +6462,14 @@ static int nf_tables_check_loops(const struct nft_ctx *ctx,
 
if (!expr->ops->validate)
continue;
+   if (strcmp(expr->ops->type->name, "immediate"))
+   continue;
 
err = expr->ops->validate(ctx, expr, &data);
if (err < 0)
return err;
 
-   if (data == NULL)
+   if (!data)
continue;
 
switch (data->verdict.code) {
diff --git a/net/netfilter/nft_compat.c b/net/netfilter/nft_compat.c
index 1d99a1ef..c7aad9c 100644
--- a/net/netfilter/nft_compat.c
+++ b/net/netfilter/nft_compat.c
@@ -54,23 +54,6 @@ static bool nft_xt_put(struct nft_xt *xt)
return false;
 }
 
-static int nft_compat_chain_validate_dependency(const char *tablename,
-   const struct nft_chain *chain)
-{
-   const struct nft_base_chain *basechain;
-
-   if (!tablename ||
-   !nft_is_base_chain(chain))
-   return 0;
-
-   basechain = nft_base_chain(chain);
-   if (strcmp(tablename, "nat") == 0 &&
-   basechain->type->type != NFT_CHAIN_T_NAT)
-   return -EINVAL;
-
-   return 0;
-}
-
 union nft_entry {
struct ipt_entry e4;
struct ip6t_entry e6;
@@ -311,

[PATCH nf 3/5] netfilter: nf_tables: add type and hook validate routine

2018-05-15 Thread Taehee Yoo
This patch adds a validate callback to nfnetlink_subsystem.
It validates the type and hook of both base chains and non-base chains.
To do so, it constructs a chain information array.
Like the loop detection routine, the validator walks each chain's rules,
follows jumps, and marks the type and hook values in each chain's entry
of the chain information array.

Example:

table ip test {
	chain prerouting {
		type nat hook prerouting priority 4;
		jump test1
	}
	chain postrouting {
		type nat hook postrouting priority 5;
		jump test1
	}
	chain input {
		type filter hook input priority 0;
		jump test1
	}
	chain output {
		type filter hook output priority 0;
		jump test2
	}
	chain test1 {
		jump test2
		counter
	}
	chain test2 {
		counter
	}
}

Chain test1 gets the following chain information:
type = NFT_CHAIN_T_MIX
hook = (1 << NF_INET_PRE_ROUTING | 1 << NF_INET_POST_ROUTING |
1 << NF_INET_LOCAL_IN)

Chain test2 gets the following chain information:
type = NFT_CHAIN_T_MIX
hook = (1 << NF_INET_PRE_ROUTING | 1 << NF_INET_POST_ROUTING |
1 << NF_INET_LOCAL_IN | 1 << NF_INET_LOCAL_OUT)

The new type NFT_CHAIN_T_MIX means that the chain is reached from base
chains of both the filter and the nat type.
Then, the validator calls expr->ops->validate().
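The marking step can be sketched as a depth-first walk over jump edges. This is a userspace model under stated assumptions, not the patch's code:

- Chains are an array with explicit jump lists.
- TYPE_NONE, TYPE_FILTER, TYPE_NAT and TYPE_MIX are hypothetical stand-ins for the NFT_CHAIN_T_* values, with an extra "not yet reached" state.
- Jumps only target regular chains.

A chain is walked again only when its information changes, so the walk terminates. Run on the example ruleset, it yields the same information for test1 and test2 as listed above.

```c
#include <assert.h>

/* Same numbering as the kernel's enum nf_inet_hooks. */
enum { HOOK_PRE_ROUTING, HOOK_LOCAL_IN, HOOK_FORWARD, HOOK_LOCAL_OUT,
       HOOK_POST_ROUTING };

/* Hypothetical types; TYPE_NONE means "not reached from any base chain". */
enum chain_type { TYPE_NONE, TYPE_FILTER, TYPE_NAT, TYPE_MIX };

#define MAX_CHAINS 16
#define MAX_JUMPS  4

struct chain {
	int is_base;
	enum chain_type base_type;	/* base chains only */
	int base_hook;			/* base chains only */
	int jumps[MAX_JUMPS];		/* indices of jump targets */
	int njumps;
};

struct chain_info {
	enum chain_type type;
	unsigned int hooks;		/* bitmask of 1 << hook */
};

static enum chain_type merge_type(enum chain_type a, enum chain_type b)
{
	if (a == TYPE_NONE || a == b)
		return b;
	if (b == TYPE_NONE)
		return a;
	return TYPE_MIX;
}

/* Push the information of chain idx into every chain it jumps to. */
static void propagate(const struct chain *chains, struct chain_info *info,
		      int idx)
{
	for (int j = 0; j < chains[idx].njumps; j++) {
		int t = chains[idx].jumps[j];

		assert(t >= 0 && t < MAX_CHAINS);
		struct chain_info old = info[t];

		info[t].type = merge_type(info[t].type, info[idx].type);
		info[t].hooks |= info[idx].hooks;
		/* Revisit only if something changed; this bounds the walk. */
		if (info[t].type != old.type || info[t].hooks != old.hooks)
			propagate(chains, info, t);
	}
}

/* Assumes jumps only target regular (non-base) chains. */
static void build_chain_info(const struct chain *chains, int n,
			     struct chain_info *info)
{
	assert(n <= MAX_CHAINS);
	for (int i = 0; i < n; i++)
		info[i] = (struct chain_info){ TYPE_NONE, 0 };
	for (int i = 0; i < n; i++) {
		if (!chains[i].is_base)
			continue;
		info[i].type = chains[i].base_type;
		info[i].hooks = 1u << chains[i].base_hook;
		propagate(chains, info, i);
	}
}

/* The example ruleset above: 0 prerouting, 1 postrouting, 2 input,
 * 3 output, 4 test1, 5 test2. */
static void example_ruleset(struct chain_info *info)
{
	static const struct chain chains[] = {
		{ 1, TYPE_NAT,    HOOK_PRE_ROUTING,  { 4 }, 1 },
		{ 1, TYPE_NAT,    HOOK_POST_ROUTING, { 4 }, 1 },
		{ 1, TYPE_FILTER, HOOK_LOCAL_IN,     { 4 }, 1 },
		{ 1, TYPE_FILTER, HOOK_LOCAL_OUT,    { 5 }, 1 },
		{ 0, TYPE_NONE,   0,                 { 5 }, 1 },
		{ 0, TYPE_NONE,   0,                 { 0 }, 0 },
	};

	build_chain_info(chains, 6, info);
}
```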

The next patch makes expr->ops->validate() use the chain information array
instead of the base chain's data.

Signed-off-by: Taehee Yoo <ap420...@gmail.com>
---
 include/linux/netfilter/nfnetlink.h |   1 +
 include/net/netfilter/nf_tables.h   |   1 +
 include/net/netns/nftables.h|   3 +
 net/netfilter/nf_tables_api.c   | 262 
 4 files changed, 267 insertions(+)

diff --git a/include/linux/netfilter/nfnetlink.h b/include/linux/netfilter/nfnetlink.h
index 34551f8..a641d52 100644
--- a/include/linux/netfilter/nfnetlink.h
+++ b/include/linux/netfilter/nfnetlink.h
@@ -29,6 +29,7 @@ struct nfnetlink_subsystem {
__u8 subsys_id; /* nfnetlink subsystem ID */
__u8 cb_count;  /* number of callbacks */
const struct nfnl_callback *cb; /* callback for individual types */
+   int (*validate)(struct net *net);
int (*commit)(struct net *net, struct sk_buff *skb);
int (*abort)(struct net *net, struct sk_buff *skb);
bool (*valid_genid)(struct net *net, u32 genid);
diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index 7eb4802..9959509 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -877,6 +877,7 @@ enum nft_chain_types {
NFT_CHAIN_T_DEFAULT = 0,
NFT_CHAIN_T_ROUTE,
NFT_CHAIN_T_NAT,
+   NFT_CHAIN_T_MIX,
NFT_CHAIN_T_MAX
 };
 
diff --git a/include/net/netns/nftables.h b/include/net/netns/nftables.h
index 29c3851..61e94e5 100644
--- a/include/net/netns/nftables.h
+++ b/include/net/netns/nftables.h
@@ -4,9 +4,12 @@
 
 #include 
 
+struct nft_chain_info;
+
 struct netns_nftables {
struct list_head        tables;
struct list_head        commit_list;
+   struct nft_chain_info   *chain_info;
unsigned int            base_seq;
u8  gencursor;
 };
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 13c2fc3..36d8fba 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -5841,6 +5841,267 @@ static void nf_tables_commit_release(struct net *net)
}
 }
 
+struct nft_chain_info {
+   u8  type;
+   unsigned int    hooknum;
+};
+
+static inline struct nft_chain_info *nft_get_chain_info(struct net *net,
+   struct nft_chain *chain)
+{
+   return net->nft.chain_info + chain->handle;
+}
+
+static int nft_validate_chain(struct net *net, struct nft_chain *chain)
+{
+   struct nft_table *table = chain->table;
+   struct nft_expr *expr, *last;
+   struct nft_rule *rule;
+   struct nft_ctx ctx;
+
+   list_for_each_entry(rule, &chain->rules, list) {
+   if (!nft_is_active_next(net, rule))
+   continue;
+   nft_rule_for_each_expr(expr, last, rule) {
+   const struct nft_data *data = NULL;
+   int err = 0;
+
+   if (!expr->ops->validate)
+   continue;
+
+   ctx.net = net;
+   ctx.family  = table->family;
+   ctx.table   = table;
+   ctx.chain   = chain;
+   err = expr->ops->validate(&ctx, expr, &data);
+   if (err < 0)
+   return err;
+
+  

[PATCH nf 2/5] netfilter: nf_tables: remove nft_af_info.

2018-05-15 Thread Taehee Yoo
struct nft_af_info has already been removed, so its forward declaration
in include/net/netns/nftables.h is no longer needed.

Signed-off-by: Taehee Yoo <ap420...@gmail.com>
---
 include/net/netns/nftables.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/include/net/netns/nftables.h b/include/net/netns/nftables.h
index 4813435..29c3851 100644
--- a/include/net/netns/nftables.h
+++ b/include/net/netns/nftables.h
@@ -4,8 +4,6 @@
 
 #include 
 
-struct nft_af_info;
-
 struct netns_nftables {
struct list_head        tables;
struct list_head        commit_list;
-- 
2.9.3


[PATCH nf 0/5] netfilter: nf_tables: add validate non-basechain ruleset routine

2018-05-15 Thread Taehee Yoo
[  411.663780]  __tty_check_change.part.1+0x103/0x360
[  411.671776]  n_tty_read+0x16e/0x14b0
[  411.675768]  ? __ldsem_down_read_nested+0xea/0x5d0
[  411.679774]  ? copy_from_read_buf+0x400/0x400
[  411.683777]  ? do_wait_intr_irq+0x270/0x270
[  411.687780]  tty_read+0x14a/0x220
[  411.691765]  __vfs_read+0xd2/0x580
[  411.695771]  ? SyS_copy_file_range+0x340/0x340
[  411.703763]  ? lock_acquire+0x380/0x380
[  411.707772]  ? lock_acquire+0x193/0x380
[  411.711762]  ? finish_task_switch+0xf4/0x560
[  411.715760]  ? _raw_spin_unlock_irq+0x29/0x40
[  411.719771]  ? _raw_spin_unlock_irq+0x29/0x40
[  411.723761]  ? finish_task_switch+0x122/0x560
[  411.731765]  ? finish_task_switch+0xf4/0x560
[  411.735764]  ? __schedule+0x582/0x19a0
[  411.739760]  ? lock_acquire+0x380/0x380
[  411.743930]  vfs_read+0x105/0x300
[  411.747763]  ? ksys_read+0x160/0x160
[  411.751772]  ksys_read+0xae/0x160
[  411.755763]  ? kernel_write+0x130/0x130
[  411.759761]  ? do_syscall_64+0x43/0x5b0
[  411.763762]  ? ksys_read+0x160/0x160
[  411.767769]  do_syscall_64+0x18f/0x5b0
[  411.771766]  entry_SYSCALL_64_after_hwframe+0x42/0xb7
[  411.779574] RIP: 0033:0x7f600be28870
[  411.783569] RSP: 002b:7ffe1e4f5aa8 EFLAGS: 0246 ORIG_RAX: 

[  411.791773] RAX: ffda RBX: 7f600c0f18c0 RCX: 7f600be28870
[  411.799763] RDX: 0001 RSI: 7ffe1e4f5ab7 RDI: 
[  411.807765] RBP: 7ffe1e4f5ab7 R08: 7f600c0f3750 R09: 7f600c733b40
[  411.815779] R10:  R11: 0246 R12: 
[  411.823763] R13: 004be7a0 R14:  R15: 0001
[  411.831763] Code: 00 fc ff df 48 89 e5 41 57 41 56 41 55 41 54 41 89 d5 53 48 89 fb 48 81 c7 48 05 00 00 48 89 fa 41 89 f6 48 c1 ea 03 48 83 ec 08 <80> 3c 02 00 0f 85 82 05 00 00 4c 8b a3
[  411.851766] RIP: inet_select_addr+0x37/0x620 RSP: 88011b807368
[  411.860979] ---[ end trace bf2aa3e38f77f7bf ]---
[  411.866242] Kernel panic - not syncing: Fatal exception in interrupt
[  411.867204] Kernel Offset: 0x0 from 0x8100 (relocation range: 0x8000-0xbfff)


To solve this, the patchset constructs chain information for all chains
and then validates every rule using that chain information.
If validation fails, the abort callback is called.

Before this patchset, a rule was added in the following order:
   1. select_ops()
   2. init()
   3. validate()
   4. call_batch()
   5. commit()/abort()

After this patchset, the order is:
   1. select_ops()
   2. init()
   3. call_batch()
   4. validate()
   5. commit()/abort()


Taehee Yoo (5):
  netfilter: nf_tables: use nft_ctx instead of nft_chain
  netfilter: nf_tables: remove nft_af_info.
  netfilter: nf_tables: add type and hook validate routine
  netfilter: nf_tables: use chain info to validate type and hook.
  netfilter: nf_tables: add call validate callback.

 include/linux/netfilter/nfnetlink.h  |   1 +
 include/net/netfilter/nf_tables.h|   5 +-
 include/net/netns/nftables.h |   3 +-
 net/bridge/netfilter/nft_reject_bridge.c |   4 +-
 net/netfilter/nf_tables_api.c| 317 +++
 net/netfilter/nfnetlink.c|  13 ++
 net/netfilter/nft_compat.c   |  73 +++
 net/netfilter/nft_fib.c  |   2 +-
 net/netfilter/nft_flow_offload.c |   2 +-
 net/netfilter/nft_masq.c |   4 +-
 net/netfilter/nft_meta.c |   4 +-
 net/netfilter/nft_nat.c  |   6 +-
 net/netfilter/nft_redir.c|   4 +-
 net/netfilter/nft_reject.c   |   2 +-
 net/netfilter/nft_rt.c   |   2 +-
 15 files changed, 340 insertions(+), 102 deletions(-)

-- 
2.9.3


[PATCH nf 1/5] netfilter: nf_tables: use nft_ctx instead of nft_chain

2018-05-15 Thread Taehee Yoo
This patch prepares for the following patches.
nft_chain_validate_hooks() and nft_chain_validate_dependency() are going to
need both the net and the nft_chain, so they now take a struct nft_ctx.

Signed-off-by: Taehee Yoo <ap420...@gmail.com>
---
 include/net/netfilter/nf_tables.h|  4 ++--
 net/bridge/netfilter/nft_reject_bridge.c |  4 ++--
 net/netfilter/nf_tables_api.c| 12 ++--
 net/netfilter/nft_fib.c  |  2 +-
 net/netfilter/nft_flow_offload.c |  2 +-
 net/netfilter/nft_masq.c |  4 ++--
 net/netfilter/nft_meta.c |  4 ++--
 net/netfilter/nft_nat.c  |  6 +++---
 net/netfilter/nft_redir.c|  4 ++--
 net/netfilter/nft_reject.c   |  2 +-
 net/netfilter/nft_rt.c   |  2 +-
 11 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index a1e28dd..7eb4802 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -903,9 +903,9 @@ struct nft_chain_type {
void(*free)(struct nft_ctx *ctx);
 };
 
-int nft_chain_validate_dependency(const struct nft_chain *chain,
+int nft_chain_validate_dependency(const struct nft_ctx *ctx,
  enum nft_chain_types type);
-int nft_chain_validate_hooks(const struct nft_chain *chain,
+int nft_chain_validate_hooks(const struct nft_ctx *ctx,
  unsigned int hook_flags);
 
 struct nft_stats {
diff --git a/net/bridge/netfilter/nft_reject_bridge.c b/net/bridge/netfilter/nft_reject_bridge.c
index eaf05de..f3b633b 100644
--- a/net/bridge/netfilter/nft_reject_bridge.c
+++ b/net/bridge/netfilter/nft_reject_bridge.c
@@ -357,8 +357,8 @@ static int nft_reject_bridge_validate(const struct nft_ctx *ctx,
  const struct nft_expr *expr,
  const struct nft_data **data)
 {
-   return nft_chain_validate_hooks(ctx->chain, (1 << NF_BR_PRE_ROUTING) |
-   (1 << NF_BR_LOCAL_IN));
+   return nft_chain_validate_hooks(ctx, (1 << NF_BR_PRE_ROUTING) |
+(1 << NF_BR_LOCAL_IN));
 }
 
 static int nft_reject_bridge_init(const struct nft_ctx *ctx,
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 3806db3..13c2fc3 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -6132,13 +6132,13 @@ static const struct nfnetlink_subsystem nf_tables_subsys = {
.valid_genid= nf_tables_valid_genid,
 };
 
-int nft_chain_validate_dependency(const struct nft_chain *chain,
+int nft_chain_validate_dependency(const struct nft_ctx *ctx,
  enum nft_chain_types type)
 {
const struct nft_base_chain *basechain;
 
-   if (nft_is_base_chain(chain)) {
-   basechain = nft_base_chain(chain);
+   if (nft_is_base_chain(ctx->chain)) {
+   basechain = nft_base_chain(ctx->chain);
if (basechain->type->type != type)
return -EOPNOTSUPP;
}
@@ -6146,13 +6146,13 @@ int nft_chain_validate_dependency(const struct nft_chain *chain,
 }
 EXPORT_SYMBOL_GPL(nft_chain_validate_dependency);
 
-int nft_chain_validate_hooks(const struct nft_chain *chain,
+int nft_chain_validate_hooks(const struct nft_ctx *ctx,
 unsigned int hook_flags)
 {
struct nft_base_chain *basechain;
 
-   if (nft_is_base_chain(chain)) {
-   basechain = nft_base_chain(chain);
+   if (nft_is_base_chain(ctx->chain)) {
+   basechain = nft_base_chain(ctx->chain);
 
if ((1 << basechain->ops.hooknum) & hook_flags)
return 0;
diff --git a/net/netfilter/nft_fib.c b/net/netfilter/nft_fib.c
index 21df8cc..47dbf94 100644
--- a/net/netfilter/nft_fib.c
+++ b/net/netfilter/nft_fib.c
@@ -59,7 +59,7 @@ int nft_fib_validate(const struct nft_ctx *ctx, const struct nft_expr *expr,
return -EINVAL;
}
 
-   return nft_chain_validate_hooks(ctx->chain, hooks);
+   return nft_chain_validate_hooks(ctx, hooks);
 }
 EXPORT_SYMBOL_GPL(nft_fib_validate);
 
diff --git a/net/netfilter/nft_flow_offload.c b/net/netfilter/nft_flow_offload.c
index b65829b..6165733 100644
--- a/net/netfilter/nft_flow_offload.c
+++ b/net/netfilter/nft_flow_offload.c
@@ -128,7 +128,7 @@ static int nft_flow_offload_validate(const struct nft_ctx *ctx,
 {
unsigned int hook_mask = (1 << NF_INET_FORWARD);
 
-   return nft_chain_validate_hooks(ctx->chain, hook_mask);
+   return nft_chain_validate_hooks(ctx, hook_mask);
 }
 
 static int nft_flow_offload_init(const struct nft_ctx *ctx,
diff --git a/net/netfilter/nft_masq.c b/net/netfilter/nft_masq.c
index 9d8655b..5a32260 100644
--- a/net

Re: [PATCH 1/3 nf-next] netfilter: nf_tables: add release callback in nft_expr_type

2018-04-30 Thread Taehee Yoo
2018-04-30 3:03 GMT+09:00 Florian Westphal <f...@strlen.de>:
> Taehee Yoo <ap420...@gmail.com> wrote:
>> This patch adds the new release callback to release resources
>> allocated in nft_expr_type->select_ops.
>> This release callback can be used by error path in the
>> nf_tables_newrule routine.
>> Only the select_ops of the nft_compat.c allocates memory and holds
>> modules so far.
>
> Wouldn't it be simpler to just add the missing nft_xt_put()
> in nft_target_init()?
>
Thank you for your review!

I don't think putting nft_xt_put() into nft_{target/match}_init() can solve
the problem scenario below, which is what I ran into.

Before:
   $rmmod nft_counter

Steps to reproduce:
   $iptables-compat -I OUTPUT -m cpu --cpu 0

When the above command is given, the netlink message carries two
expressions: the cpu compat match and nft_counter.
nft_expr_type_get() in nf_tables_expr_parse() succeeds for the first
expression, and then its select_ops callback is called
(it allocates memory and holds a module reference).
But the second nft_expr_type_get() in nf_tables_expr_parse()
returns -EAGAIN because of request_module().
At that point, 'goto err1' leads to
'module_put(info[i].ops->type->owner)' being called,
but nothing releases what select_ops allocated.
Adding nft_xt_put() to nft_{target/match}_init() still wouldn't solve
this scenario, because ->init() is never reached on this path.

To reproduce the above scenario, nft_counter must be unloaded.

Regarding the second patch, you said that you couldn't reproduce this problem.
With nft_counter unloaded, it should be reproducible.
Could you please test this?

Thank you!

[PATCH 2/3 nf-next] netfilter: fix error path of the nf_tables_newrule

2018-04-29 Thread Taehee Yoo
There is a module leak in the error path of nf_tables_newrule().
To solve this, a member 'const struct nft_expr_type *type' is added to
struct nft_expr_info, so that the error paths for nft_expr_ops and
nft_expr_type can be separated.
As a result, nf_tables_rule_destroy() is no longer used in the error path
of nf_tables_newrule().
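The separated error path can be sketched in userspace. This is a hypothetical model, not the patch itself:

- struct expr_type only counts module references.
- The initialised flag stands in for a successful ->init().
- build_rule() and its fail_at parameter are invented for illustration.
- The ->release() step for types such as nft_compat is omitted.

As in the patch, n_type counts looked-up types and n_ops counts initialised expressions. On failure, the unwind therefore destroys exactly n_ops expressions and drops exactly n_type references.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_EXPRS 8

/* Hypothetical expression type: only tracks its module refcount. */
struct expr_type {
	int module_refs;
};

struct expr_info {
	struct expr_type *type;
	bool initialised;
};

/* Look up n types, then initialise them in order. fail_at is the index
 * whose init fails (-1 for none). Returns 0 on success, -1 on failure. */
static int build_rule(struct expr_type **types, unsigned int n, int fail_at,
		      struct expr_info *info)
{
	unsigned int n_type, n_ops, i;

	assert(n <= MAX_EXPRS);
	for (n_type = 0; n_type < n; n_type++) {
		info[n_type].type = types[n_type];
		info[n_type].initialised = false;
		types[n_type]->module_refs++;		/* take a module reference */
	}

	for (n_ops = 0; n_ops < n_type; n_ops++) {
		if ((int)n_ops == fail_at)
			goto err;
		info[n_ops].initialised = true;		/* like ->init() */
	}
	return 0;

err:
	for (i = 0; i < n_ops; i++)
		info[i].initialised = false;		/* like ->destroy() */
	for (i = 0; i < n_type; i++)
		info[i].type->module_refs--;		/* like module_put() */
	return -1;
}
```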

Steps to reproduce:
   $iptables-compat -I OUTPUT -m cpu --cpu 0
   $iptables-compat -F
   $lsmod

   Module  Size  Used by
   xt_cpu 16384  1


Signed-off-by: Taehee Yoo <ap420...@gmail.com>
---
 net/netfilter/nf_tables_api.c | 46 +--
 1 file changed, 31 insertions(+), 15 deletions(-)

diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 9134cc4..981f35e 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -1814,6 +1814,7 @@ int nft_expr_dump(struct sk_buff *skb, unsigned int attr,
 
 struct nft_expr_info {
const struct nft_expr_ops   *ops;
+   const struct nft_expr_type  *type;
struct nlattr   *tb[NFT_EXPR_MAXATTR + 1];
 };
 
@@ -1853,6 +1854,7 @@ static int nf_tables_expr_parse(const struct nft_ctx *ctx,
ops = type->ops;
 
info->ops = ops;
+   info->type = type;
return 0;
 
 err1:
@@ -1895,9 +1897,14 @@ static int nf_tables_newexpr(const struct nft_ctx *ctx,
 static void nf_tables_expr_destroy(const struct nft_ctx *ctx,
   struct nft_expr *expr)
 {
+   struct module *module = expr->ops->type->owner;
+
if (expr->ops->destroy)
expr->ops->destroy(ctx, expr);
-   module_put(expr->ops->type->owner);
+   if (expr->ops->type->release)
+   expr->ops->type->release(expr->ops);
+
+   module_put(module);
 }
 
 struct nft_expr *nft_expr_init(const struct nft_ctx *ctx,
@@ -1922,9 +1929,11 @@ struct nft_expr *nft_expr_init(const struct nft_ctx *ctx,
 
return expr;
 err3:
+   if (info.type->release)
+   info.type->release(info.ops);
kfree(expr);
 err2:
-   module_put(info.ops->type->owner);
+   module_put(info.type->owner);
 err1:
return ERR_PTR(err);
 }
@@ -2258,7 +2267,7 @@ static int nf_tables_newrule(struct net *net, struct sock *nlsk,
struct nft_expr *expr;
struct nft_ctx ctx;
struct nlattr *tmp;
-   unsigned int size, i, n, ulen = 0, usize = 0;
+   unsigned int size, i, n_type, n_ops, ulen = 0, usize = 0;
int err, rem;
bool create;
u64 handle, pos_handle;
@@ -2307,20 +2316,20 @@ static int nf_tables_newrule(struct net *net, struct sock *nlsk,
 
nft_ctx_init(&ctx, net, skb, nlh, family, table, chain, nla);
 
-   n = 0;
+   n_type = 0;
size = 0;
if (nla[NFTA_RULE_EXPRESSIONS]) {
nla_for_each_nested(tmp, nla[NFTA_RULE_EXPRESSIONS], rem) {
err = -EINVAL;
if (nla_type(tmp) != NFTA_LIST_ELEM)
goto err1;
-   if (n == NFT_RULE_MAXEXPRS)
+   if (n_type == NFT_RULE_MAXEXPRS)
goto err1;
-   err = nf_tables_expr_parse(&ctx, tmp, &info[n]);
+   err = nf_tables_expr_parse(&ctx, tmp, &info[n_type]);
if (err < 0)
goto err1;
-   size += info[n].ops->size;
-   n++;
+   size += info[n_type].ops->size;
+   n_type++;
}
}
/* Check for overflow of dlen field */
@@ -2352,11 +2361,10 @@ static int nf_tables_newrule(struct net *net, struct sock *nlsk,
}
 
expr = nft_expr_first(rule);
-   for (i = 0; i < n; i++) {
-   err = nf_tables_newexpr(&ctx, &info[i], expr);
+   for (n_ops = 0; n_ops < n_type; n_ops++) {
+   err = nf_tables_newexpr(&ctx, &info[n_ops], expr);
if (err < 0)
goto err2;
-   info[i].ops = NULL;
expr = nft_expr_next(expr);
}
 
@@ -2397,11 +2405,19 @@ static int nf_tables_newrule(struct net *net, struct sock *nlsk,
 err3:
list_del_rcu(&rule->list);
 err2:
-   nf_tables_rule_destroy(&ctx, rule);
+   expr = nft_expr_first(rule);
+   for (i = 0; i < n_ops; i++) {
+   if (expr->ops && expr->ops->destroy)
+   expr->ops->destroy(&ctx, expr);
+   expr = nft_expr_next(expr);
+   }
+   kfree(rule);
+
 err1:
-   for (i = 0; i < n; i++) {
-   if (info[i].ops != NULL)
-   module_put(info[i].ops->type->owner);
+   for (i = 0; i < n_type; i++) {
+   if (info[i].type->rel
