Re: RFC: Designing per chain rule cache support in libnftnl

2018-11-22 Thread Florian Westphal
Phil Sutter  wrote:
> In order to improve performance in 'nft -f' as well as xtables-restore
> with very large rulesets, we need to store rules by chain they belong
> to. In order to avoid pointless code duplication, this should be
> supported by libnftnl.

Unfortunately we still need to change lookup algorithm as well
(hash, tree?), linear list scan is too expensive.

We might even need multiple internal ways to keep track of the chains,
e.g. to accelerate insert/delete-by-index :-/

> Looking into the topic, it seems like extending struct nftnl_chain is
> the most straightforward way to go. My idea is to embed an
> nftnl_rule_list in there, though I'm unsure how to best do that in
> practice:
> 
> We could either add a field of type struct nftnl_rule_list which would
> have to be initialized/cleared in nftnl_chain_alloc() and
> nftnl_chain_free(). This would be accompanied by a function to retrieve
> the pointer to that field so the existing rule_list routines may be used
> with it.
> 
> Another option would be to add a pointer to a struct nftnl_rule_list.
> Having a function to retrieve a pointer to that pointer, the rule_list
> could be initialized/cleared by users on demand.
> 
> What do you consider more practical? Is there a third option I didn't
> think of yet?

I'd vote for the former (embed nftnl_rule_list).

If user doesn't want it cleared at nftnl_chain_free() time they can
always allocate a new nftnl_rule_list and splice to that list.


Re: [iptables PATCH] arptables: Support --set-counters option

2018-11-22 Thread Florian Westphal
Phil Sutter  wrote:
> Relevant code for this was already present (short option '-c'), just the
> long option definition was missing.

Applied, thanks.


[iptables PATCH] arptables: Support --set-counters option

2018-11-22 Thread Phil Sutter
Relevant code for this was already present (short option '-c'), just the
long option definition was missing.

While being at it, add '-c' to help text.

Signed-off-by: Phil Sutter 
---
 iptables/xtables-arp.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/iptables/xtables-arp.c b/iptables/xtables-arp.c
index 5a9924ca56442..2f369d9aadb01 100644
--- a/iptables/xtables-arp.c
+++ b/iptables/xtables-arp.c
@@ -144,6 +144,7 @@ static struct option original_opts[] = {
{ "help", 2, 0, 'h' },
{ "line-numbers", 0, 0, '0' },
{ "modprobe", 1, 0, 'M' },
+   { "set-counters", 1, 0, 'c' },
{ 0 }
 };
 
@@ -481,7 +482,7 @@ exit_printhelp(void)
 "  --line-numbers  print line numbers when listing\n"
 "  --exact -x  expand numbers (display exact values)\n"
 "  --modprobe=try to insert modules using this 
command\n"
-"  --set-counters PKTS BYTES   set the counter during insert/append\n"
+"  --set-counters -c PKTS BYTESset the counter during insert/append\n"
 "[!] --version -V  print package version.\n");
printf(" opcode strings: \n");
 for (i = 0; i < NUMOPCODES; i++)
-- 
2.19.0



[PATCH nf v2 2/2] netfilter: nat: fix double register in masquerade modules

2018-11-22 Thread Taehee Yoo
masquerade modules register notifier and that should not be
double-registered. so that these modules manage reference counter.
If already notifiers are registered, it just return success.
But there is unsafe scenario.

test commands:

   while :
   do
   modprobe ip6t_MASQUERADE &
   modprobe nft_masq_ipv6 &
   modprobe -rv ip6t_MASQUERADE &
   modprobe -rv nft_masq_ipv6 &
   done

numbers are reference count.

CPU0CPU1CPU2CPU3CPU4
[insmod][insmod][rmmod] [rmmod] [insmod]

0->1
register1->2
returns 2->1
returns 1->0
0->1
register <--
unregister


The unregistation of CPU3 should be processed before the
registration of CPU4.

In order to fix this, mutex can be used.
So that this patch uses it.

splat looks like:
[  323.869557] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [modprobe:1381]
[  323.869574] Modules linked in: nf_tables(+) nf_nat_ipv6(-) nf_nat 
nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 n]
[  323.869574] irq event stamp: 194074
[  323.898930] hardirqs last  enabled at (194073): [] 
trace_hardirqs_on_thunk+0x1a/0x1c
[  323.898930] hardirqs last disabled at (194074): [] 
trace_hardirqs_off_thunk+0x1a/0x1c
[  323.898930] softirqs last  enabled at (182132): [] 
__do_softirq+0x6ec/0xa3b
[  323.898930] softirqs last disabled at (182109): [] 
irq_exit+0x1a6/0x1e0
[  323.898930] CPU: 0 PID: 1381 Comm: modprobe Not tainted 4.20.0-rc2+ #27
[  323.898930] RIP: 0010:raw_notifier_chain_register+0xea/0x240
[  323.898930] Code: 3c 03 0f 8e f2 00 00 00 44 3b 6b 10 7f 4d 49 bc 00 00 00 
00 00 fc ff df eb 22 48 8d 7b 10 488
[  323.898930] RSP: 0018:888101597218 EFLAGS: 0206 ORIG_RAX: 
ff13
[  323.898930] RAX:  RBX: c04361c0 RCX: 
[  323.898930] RDX: 126132ae RSI: c04aa3c0 RDI: c04361d0
[  323.898930] RBP: c04361c8 R08:  R09: 0001
[  323.898930] R10: 8881015972b0 R11: fbfff26132c4 R12: dc00
[  323.898930] R13:  R14: 1110202b2e44 R15: c04aa3c0
[  323.898930] FS:  7f813ed41540() GS:88811ae0() 
knlGS:
[  323.898930] CS:  0010 DS:  ES:  CR0: 80050033
[  323.898930] CR2: 559bf2c9f120 CR3: 00010bc8 CR4: 001006f0
[  323.898930] Call Trace:
[  323.898930]  ? atomic_notifier_chain_register+0x2d0/0x2d0
[  323.898930]  ? down_read+0x150/0x150
[  323.898930]  ? sched_clock_cpu+0x126/0x170
[  323.898930]  ? nf_tables_core_module_init+0xe4/0xe4 [nf_tables]
[  323.898930]  ? nf_tables_core_module_init+0xe4/0xe4 [nf_tables]
[  323.898930]  register_netdevice_notifier+0xbb/0x790
[  323.898930]  ? __dev_close_many+0x2d0/0x2d0
[  323.898930]  ? __mutex_unlock_slowpath+0x17f/0x740
[  323.898930]  ? wait_for_completion+0x710/0x710
[  323.898930]  ? nf_tables_core_module_init+0xe4/0xe4 [nf_tables]
[  323.898930]  ? up_write+0x6c/0x210
[  323.898930]  ? nf_tables_core_module_init+0xe4/0xe4 [nf_tables]
[  324.127073]  ? nf_tables_core_module_init+0xe4/0xe4 [nf_tables]
[  324.127073]  nft_chain_filter_init+0x1e/0xe8a [nf_tables]
[  324.127073]  nf_tables_module_init+0x37/0x92 [nf_tables]
[ ... ]

Fixes: 8dd33cc93ec9 ("netfilter: nf_nat: generalize IPv4 masquerading support 
for nf_tables")
Fixes: be6b635cd674 ("netfilter: nf_nat: generalize IPv6 masquerading support 
for nf_tables")
Signed-off-by: Taehee Yoo 
---

v2:
 - Add second patch
 - return success when notifier is already registered. (Florian Westphal)
v1: Initial patch

 net/ipv4/netfilter/nf_nat_masquerade_ipv4.c | 23 ++---
 net/ipv6/netfilter/nf_nat_masquerade_ipv6.c | 23 ++---
 2 files changed, 32 insertions(+), 14 deletions(-)

diff --git a/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c 
b/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c
index c7d7fa4fc369..41327bb99093 100644
--- a/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c
+++ b/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c
@@ -147,15 +147,17 @@ static struct notifier_block masq_inet_notifier = {
.notifier_call  = masq_inet_event,
 };
 
-static atomic_t masquerade_notifier_refcount = ATOMIC_INIT(0);
+static int masq_refcnt;
+static DEFINE_MUTEX(masq_mutex);
 
 int nf_nat_masquerade_ipv4_register_notifier(void)
 {
-   int ret;
+   int ret = 0;
 
+   mutex_lock(_mutex);
/* check if the notifier was already set */
-   if (atomic_inc_return(_notifier_refcount) > 1)
-   return 0;
+   if (++masq_refcnt > 1)
+   goto out_unlock;
 
/* Register for device down reports */
ret 

[PATCH nf v2 0/2] netfilter: fix notifier registration bugs

2018-11-22 Thread Taehee Yoo
This patch series fix notifier registration bugs.

First patch adds error handling code for failure of notifier registration.
notifier registration can be failed. so that error handling code are needed.

Second patch fixes double-register bug in masqerade modules.
In order to protect double-register, masquerade modules manage
reference count. but it's not enough.
So that, this patch uses mutex instead of atomic value.

v2:
 - Add second patch
 - return success when notifier is already registered. (Florian Westphal)
v1: Initial patch

Taehee Yoo (2):
  netfilter: add missing error handling code for register functions
  netfilter: nat: fix double register in masquerade modules

 .../net/netfilter/ipv4/nf_nat_masquerade.h|  2 +-
 .../net/netfilter/ipv6/nf_nat_masquerade.h|  2 +-
 net/ipv4/netfilter/ipt_MASQUERADE.c   |  7 ++-
 net/ipv4/netfilter/nf_nat_masquerade_ipv4.c   | 38 +++---
 net/ipv4/netfilter/nft_masq_ipv4.c|  4 +-
 net/ipv6/netfilter/ip6t_MASQUERADE.c  |  8 ++-
 net/ipv6/netfilter/nf_nat_masquerade_ipv6.c   | 49 ++-
 net/ipv6/netfilter/nft_masq_ipv6.c|  4 +-
 net/netfilter/nft_flow_offload.c  |  5 +-
 9 files changed, 89 insertions(+), 30 deletions(-)

-- 
2.17.1



[PATCH nf v2 1/2] netfilter: add missing error handling code for register functions

2018-11-22 Thread Taehee Yoo
register_{netdevice/inetaddr/inet6addr}_notifier returns value that
could be error value. so that error handling code are needed.

Signed-off-by: Taehee Yoo 
---

v2:
 - Add second patch
 - return success when notifier is already registered. (Florian Westphal)
v1: Initial patch

 .../net/netfilter/ipv4/nf_nat_masquerade.h|  2 +-
 .../net/netfilter/ipv6/nf_nat_masquerade.h|  2 +-
 net/ipv4/netfilter/ipt_MASQUERADE.c   |  7 ++--
 net/ipv4/netfilter/nf_nat_masquerade_ipv4.c   | 21 +---
 net/ipv4/netfilter/nft_masq_ipv4.c|  4 ++-
 net/ipv6/netfilter/ip6t_MASQUERADE.c  |  8 +++--
 net/ipv6/netfilter/nf_nat_masquerade_ipv6.c   | 32 +--
 net/ipv6/netfilter/nft_masq_ipv6.c|  4 ++-
 net/netfilter/nft_flow_offload.c  |  5 ++-
 9 files changed, 63 insertions(+), 22 deletions(-)

diff --git a/include/net/netfilter/ipv4/nf_nat_masquerade.h 
b/include/net/netfilter/ipv4/nf_nat_masquerade.h
index cd24be4c4a99..13d55206bb9f 100644
--- a/include/net/netfilter/ipv4/nf_nat_masquerade.h
+++ b/include/net/netfilter/ipv4/nf_nat_masquerade.h
@@ -9,7 +9,7 @@ nf_nat_masquerade_ipv4(struct sk_buff *skb, unsigned int 
hooknum,
   const struct nf_nat_range2 *range,
   const struct net_device *out);
 
-void nf_nat_masquerade_ipv4_register_notifier(void);
+int nf_nat_masquerade_ipv4_register_notifier(void);
 void nf_nat_masquerade_ipv4_unregister_notifier(void);
 
 #endif /*_NF_NAT_MASQUERADE_IPV4_H_ */
diff --git a/include/net/netfilter/ipv6/nf_nat_masquerade.h 
b/include/net/netfilter/ipv6/nf_nat_masquerade.h
index 0c3b5ebf0bb8..2917bf95c437 100644
--- a/include/net/netfilter/ipv6/nf_nat_masquerade.h
+++ b/include/net/netfilter/ipv6/nf_nat_masquerade.h
@@ -5,7 +5,7 @@
 unsigned int
 nf_nat_masquerade_ipv6(struct sk_buff *skb, const struct nf_nat_range2 *range,
   const struct net_device *out);
-void nf_nat_masquerade_ipv6_register_notifier(void);
+int nf_nat_masquerade_ipv6_register_notifier(void);
 void nf_nat_masquerade_ipv6_unregister_notifier(void);
 
 #endif /* _NF_NAT_MASQUERADE_IPV6_H_ */
diff --git a/net/ipv4/netfilter/ipt_MASQUERADE.c 
b/net/ipv4/netfilter/ipt_MASQUERADE.c
index ce1512b02cb2..fd3f9e8a74da 100644
--- a/net/ipv4/netfilter/ipt_MASQUERADE.c
+++ b/net/ipv4/netfilter/ipt_MASQUERADE.c
@@ -81,9 +81,12 @@ static int __init masquerade_tg_init(void)
int ret;
 
ret = xt_register_target(_tg_reg);
+   if (ret)
+   return ret;
 
-   if (ret == 0)
-   nf_nat_masquerade_ipv4_register_notifier();
+   ret = nf_nat_masquerade_ipv4_register_notifier();
+   if (ret)
+   xt_unregister_target(_tg_reg);
 
return ret;
 }
diff --git a/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c 
b/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c
index a9d5e013e555..c7d7fa4fc369 100644
--- a/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c
+++ b/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c
@@ -149,16 +149,29 @@ static struct notifier_block masq_inet_notifier = {
 
 static atomic_t masquerade_notifier_refcount = ATOMIC_INIT(0);
 
-void nf_nat_masquerade_ipv4_register_notifier(void)
+int nf_nat_masquerade_ipv4_register_notifier(void)
 {
+   int ret;
+
/* check if the notifier was already set */
if (atomic_inc_return(_notifier_refcount) > 1)
-   return;
+   return 0;
 
/* Register for device down reports */
-   register_netdevice_notifier(_dev_notifier);
+   ret = register_netdevice_notifier(_dev_notifier);
+   if (ret)
+   goto err_dec;
/* Register IP address change reports */
-   register_inetaddr_notifier(_inet_notifier);
+   ret = register_inetaddr_notifier(_inet_notifier);
+   if (ret)
+   goto err_unregister;
+
+   return ret;
+err_unregister:
+   unregister_netdevice_notifier(_dev_notifier);
+err_dec:
+   atomic_dec(_notifier_refcount);
+   return ret;
 }
 EXPORT_SYMBOL_GPL(nf_nat_masquerade_ipv4_register_notifier);
 
diff --git a/net/ipv4/netfilter/nft_masq_ipv4.c 
b/net/ipv4/netfilter/nft_masq_ipv4.c
index f1193e1e928a..6847de1d1db8 100644
--- a/net/ipv4/netfilter/nft_masq_ipv4.c
+++ b/net/ipv4/netfilter/nft_masq_ipv4.c
@@ -69,7 +69,9 @@ static int __init nft_masq_ipv4_module_init(void)
if (ret < 0)
return ret;
 
-   nf_nat_masquerade_ipv4_register_notifier();
+   ret = nf_nat_masquerade_ipv4_register_notifier();
+   if (ret)
+   nft_unregister_expr(_masq_ipv4_type);
 
return ret;
 }
diff --git a/net/ipv6/netfilter/ip6t_MASQUERADE.c 
b/net/ipv6/netfilter/ip6t_MASQUERADE.c
index 491f808e356a..29c7f1915a96 100644
--- a/net/ipv6/netfilter/ip6t_MASQUERADE.c
+++ b/net/ipv6/netfilter/ip6t_MASQUERADE.c
@@ -58,8 +58,12 @@ static int __init masquerade_tg6_init(void)
int err;
 
err = xt_register_target(_tg6_reg);
-   if (err == 0)
-