Re: net: suspicious RCU usage in nf_hook

2017-01-27 Thread Eric Dumazet
On Fri, 2017-01-27 at 17:00 -0800, Cong Wang wrote:
> On Fri, Jan 27, 2017 at 3:35 PM, Eric Dumazet  wrote:
> > Oh well, I forgot to submit the official patch I think, Jan 9th.
> >
> > https://groups.google.com/forum/#!topic/syzkaller/BhyN5OFd7sQ
> >
> 
> Hmm, but why only fragments need skb_orphan()? It seems like
> any kfree_skb() inside a nf hook needs to have a preceding
> skb_orphan().


> 
> Also, I am not convinced it is similar to commit 8282f27449bf15548
> which is on RX path.

Well, we clearly see IPv6 reassembly being part of the equation in both
cases.

I was replying to first part of the splat [1], which was already
diagnosed and had a non official patch.

use after free is also a bug, regardless of jump label being used or
not.

I still do not really understand this nf_hook issue, I thought we were
disabling BH in netfilter.

So the in_interrupt() check in net_disable_timestamp() should trigger,
this was the intent of netstamp_needed_deferred existence.

Not sure if we can test for rcu_read_lock() as well.

[1]
 sk_destruct+0x47/0x80 net/core/sock.c:1460
 __sk_free+0x57/0x230 net/core/sock.c:1468
 sock_wfree+0xae/0x120 net/core/sock.c:1645
 skb_release_head_state+0xfc/0x200 net/core/skbuff.c:655
 skb_release_all+0x15/0x60 net/core/skbuff.c:668
 __kfree_skb+0x15/0x20 net/core/skbuff.c:684
 kfree_skb+0x16e/0x4c0 net/core/skbuff.c:705
 inet_frag_destroy+0x121/0x290 net/ipv4/inet_fragment.c:304
 inet_frag_put include/net/inet_frag.h:133 [inline]
 nf_ct_frag6_gather+0x1106/0x3840
net/ipv6/netfilter/nf_conntrack_reasm.c:617
 ipv6_defrag+0x1be/0x2b0 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:68
 nf_hook_entry_hookfn include/linux/netfilter.h:102 [inline]
 nf_hook_slow+0xc3/0x290 net/netfilter/core.c:310
 nf_hook include/linux/netfilter.h:212 [inline]
 __ip6_local_out+0x489/0x840 net/ipv6/output_core.c:160
 ip6_local_out+0x2d/0x170 net/ipv6/output_core.c:170





--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: net: suspicious RCU usage in nf_hook

2017-01-27 Thread Cong Wang
On Fri, Jan 27, 2017 at 3:35 PM, Eric Dumazet  wrote:
> Oh well, I forgot to submit the official patch I think, Jan 9th.
>
> https://groups.google.com/forum/#!topic/syzkaller/BhyN5OFd7sQ
>

Hmm, but why only fragments need skb_orphan()? It seems like
any kfree_skb() inside a nf hook needs to have a preceding
skb_orphan().

Also, I am not convinced it is similar to commit 8282f27449bf15548
which is on RX path.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: net: suspicious RCU usage in nf_hook

2017-01-27 Thread Cong Wang
On Fri, Jan 27, 2017 at 1:15 PM, Dmitry Vyukov  wrote:
> stack backtrace:
> CPU: 2 PID: 23111 Comm: syz-executor14 Not tainted 4.10.0-rc5+ #192
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:15 [inline]
>  dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
>  lockdep_rcu_suspicious+0x139/0x180 kernel/locking/lockdep.c:4452
>  rcu_preempt_sleep_check include/linux/rcupdate.h:560 [inline]
>  ___might_sleep+0x560/0x650 kernel/sched/core.c:7748
>  __might_sleep+0x95/0x1a0 kernel/sched/core.c:7739
>  mutex_lock_nested+0x24f/0x1730 kernel/locking/mutex.c:752
>  atomic_dec_and_mutex_lock+0x119/0x160 kernel/locking/mutex.c:1060
>  __static_key_slow_dec+0x7a/0x1e0 kernel/jump_label.c:149
>  static_key_slow_dec+0x51/0x90 kernel/jump_label.c:174
>  net_disable_timestamp+0x3b/0x50 net/core/dev.c:1728
>  sock_disable_timestamp+0x98/0xc0 net/core/sock.c:403
>  __sk_destruct+0x27d/0x6b0 net/core/sock.c:1441
>  sk_destruct+0x47/0x80 net/core/sock.c:1460

jump label uses a mutex and we call jump label API in softIRQ context...
Maybe we have to move it to a work struct as what we did for netlink.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


net: suspicious RCU usage in nf_hook

2017-01-27 Thread Dmitry Vyukov
Hello,

I've got the following report while running syzkaller fuzzer on
fd694aaa46c7ed811b72eb47d5eb11ce7ab3f7f1:

[ INFO: suspicious RCU usage. ]
4.10.0-rc5+ #192 Not tainted
---
./include/linux/rcupdate.h:561 Illegal context switch in RCU read-side
critical section!

other info that might help us debug this:

rcu_scheduler_active = 2, debug_locks = 0
2 locks held by syz-executor14/23111:
 #0:  (sk_lock-AF_INET6){+.+.+.}, at: [] lock_sock
include/net/sock.h:1454 [inline]
 #0:  (sk_lock-AF_INET6){+.+.+.}, at: []
rawv6_sendmsg+0x1e65/0x3ec0 net/ipv6/raw.c:919
 #1:  (rcu_read_lock){..}, at: [] nf_hook
include/linux/netfilter.h:201 [inline]
 #1:  (rcu_read_lock){..}, at: []
__ip6_local_out+0x258/0x840 net/ipv6/output_core.c:160

stack backtrace:
CPU: 2 PID: 23111 Comm: syz-executor14 Not tainted 4.10.0-rc5+ #192
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:15 [inline]
 dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
 lockdep_rcu_suspicious+0x139/0x180 kernel/locking/lockdep.c:4452
 rcu_preempt_sleep_check include/linux/rcupdate.h:560 [inline]
 ___might_sleep+0x560/0x650 kernel/sched/core.c:7748
 __might_sleep+0x95/0x1a0 kernel/sched/core.c:7739
 mutex_lock_nested+0x24f/0x1730 kernel/locking/mutex.c:752
 atomic_dec_and_mutex_lock+0x119/0x160 kernel/locking/mutex.c:1060
 __static_key_slow_dec+0x7a/0x1e0 kernel/jump_label.c:149
 static_key_slow_dec+0x51/0x90 kernel/jump_label.c:174
 net_disable_timestamp+0x3b/0x50 net/core/dev.c:1728
 sock_disable_timestamp+0x98/0xc0 net/core/sock.c:403
 __sk_destruct+0x27d/0x6b0 net/core/sock.c:1441
 sk_destruct+0x47/0x80 net/core/sock.c:1460
 __sk_free+0x57/0x230 net/core/sock.c:1468
 sock_wfree+0xae/0x120 net/core/sock.c:1645
 skb_release_head_state+0xfc/0x200 net/core/skbuff.c:655
 skb_release_all+0x15/0x60 net/core/skbuff.c:668
 __kfree_skb+0x15/0x20 net/core/skbuff.c:684
 kfree_skb+0x16e/0x4c0 net/core/skbuff.c:705
 inet_frag_destroy+0x121/0x290 net/ipv4/inet_fragment.c:304
 inet_frag_put include/net/inet_frag.h:133 [inline]
 nf_ct_frag6_gather+0x1106/0x3840 net/ipv6/netfilter/nf_conntrack_reasm.c:617
 ipv6_defrag+0x1be/0x2b0 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:68
 nf_hook_entry_hookfn include/linux/netfilter.h:102 [inline]
 nf_hook_slow+0xc3/0x290 net/netfilter/core.c:310
 nf_hook include/linux/netfilter.h:212 [inline]
 __ip6_local_out+0x489/0x840 net/ipv6/output_core.c:160
 ip6_local_out+0x2d/0x170 net/ipv6/output_core.c:170
 ip6_send_skb+0xa1/0x340 net/ipv6/ip6_output.c:1722
 ip6_push_pending_frames+0xb3/0xe0 net/ipv6/ip6_output.c:1742
 rawv6_push_pending_frames net/ipv6/raw.c:613 [inline]
 rawv6_sendmsg+0x2d1a/0x3ec0 net/ipv6/raw.c:927
 inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744
 sock_sendmsg_nosec net/socket.c:635 [inline]
 sock_sendmsg+0xca/0x110 net/socket.c:645
 sock_write_iter+0x326/0x600 net/socket.c:848
 do_iter_readv_writev+0x2e3/0x5b0 fs/read_write.c:695
 do_readv_writev+0x42c/0x9b0 fs/read_write.c:872
 vfs_writev+0x87/0xc0 fs/read_write.c:911
 do_writev+0x110/0x2c0 fs/read_write.c:944
 SYSC_writev fs/read_write.c:1017 [inline]
 SyS_writev+0x27/0x30 fs/read_write.c:1014
 entry_SYSCALL_64_fastpath+0x1f/0xc2
RIP: 0033:0x445559
RSP: 002b:7f6f46fceb58 EFLAGS: 0292 ORIG_RAX: 0014
RAX: ffda RBX: 0005 RCX: 00445559
RDX: 0001 RSI: 20f1eff0 RDI: 0005
RBP: 006e19c0 R08:  R09: 
R10:  R11: 0292 R12: 0070
R13: 20f59000 R14: 0015 R15: 00020400
BUG: sleeping function called from invalid context at kernel/locking/mutex.c:752
in_atomic(): 1, irqs_disabled(): 0, pid: 23111, name: syz-executor14
INFO: lockdep is turned off.
CPU: 2 PID: 23111 Comm: syz-executor14 Not tainted 4.10.0-rc5+ #192
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:15 [inline]
 dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
 ___might_sleep+0x47e/0x650 kernel/sched/core.c:7780
 __might_sleep+0x95/0x1a0 kernel/sched/core.c:7739
 mutex_lock_nested+0x24f/0x1730 kernel/locking/mutex.c:752
 atomic_dec_and_mutex_lock+0x119/0x160 kernel/locking/mutex.c:1060
 __static_key_slow_dec+0x7a/0x1e0 kernel/jump_label.c:149
 static_key_slow_dec+0x51/0x90 kernel/jump_label.c:174
 net_disable_timestamp+0x3b/0x50 net/core/dev.c:1728
 sock_disable_timestamp+0x98/0xc0 net/core/sock.c:403
 __sk_destruct+0x27d/0x6b0 net/core/sock.c:1441
 sk_destruct+0x47/0x80 net/core/sock.c:1460
 __sk_free+0x57/0x230 net/core/sock.c:1468
 sock_wfree+0xae/0x120 net/core/sock.c:1645
 skb_release_head_state+0xfc/0x200 net/core/skbuff.c:655
 skb_release_all+0x15/0x60 net/core/skbuff.c:668
 __kfree_skb+0x15/0x20 net/core/skbuff.c:684
 kfree_skb+0x16e/0x4c0 net/core/skbuff.c:705
 inet_frag_destroy+0x121/0x290 net/ipv4/inet_fragment.c:304
 inet_frag_put include/net/inet_frag.h:133 

[ANNOUNCE] iptables 1.6.1 release

2017-01-27 Thread Pablo Neira Ayuso
Hi!

The Netfilter project proudly presents:

iptables 1.6.1

iptables is the userspace command line program used to configure the
Linux 2.4.x and later packet filtering ruleset. It is targeted towards
system administrators.

This update contains accumulated bugfixes, several new extensions and
lots of translations via iptables-translate to ease migration to
nftables.

See ChangeLog that comes attached to this email for more details.

You can download it from:

http://www.netfilter.org/projects/iptables/downloads.html
ftp://ftp.netfilter.org/pub/iptables/

Have fun!
Ana Rey (1):
  extensions: libxt_udp: add translation to nft

Arpan Kapoor (1):
  libxtables: Replace gethostbyname() with getaddrinfo()

Arturo Borrero (3):
  extensions/libxt_rpfilter.man: fix typo, specifiy vs specify
  iptables/xtables-arp.c: fix typo, wierd vs weird
  extensions/libxt_tcp: fix nftables translate flags value, 'none' vs '0x0'

Arturo Borrero Gonzalez (1):
  extensions: update Arturo Borrero email address

Brian Haley (1):
  iptables-restore: add missing arguments to usage message

Florian Westphal (5):
  iptables.8: mention iptables-save in -L documentation
  iptables.8: nat table has four builtin chains
  extensions: NETMAP: add ' to:' prefix when printing NETMAP target
  extensions: NETMAP: fix iptables-save output
  connlabel: clarify default config path

George Burgess IV (1):
  libxt_multiport: remove an unused variable

Giuseppe Longo (1):
  configure: make libmnl and libnftnl hard requirements

Guruswamy Basavaiah (4):
  iptables: extensions: iptables-translate prints extra "nft" after 
printing any error
  iptables-translate: translate iptables --flush
  iptables-translate: Printing the table name before chain name.
  iptables-translate: Don't print "nft" in iptables-restore-translate 
command

Gustavo Zacarias (1):
  iptables: add xtables-config-parser.h to BUILT_SOURCES

Janani Ravichandran (1):
  extensions: libip6t_rt.c: Add translation to nft

Jordan Yelloz (1):
  extensions: added AR substitution

Keno Fischer (1):
  build: Fix two compile errors during out-of-tree build

Laura Garcia Liebana (12):
  extensions: libip6t_icmp6: Add translation to nft
  extensions: libipt_LOG: Avoid to print the default log level in the 
translation
  extensions: libipt_icmp: Add translation to nft
  extensions: libipt_REJECT: Avoid to print the default reject with value 
in the translation
  extensions: libip6t_REJECT: Avoid to print the default reject with value 
in the translation
  extensions: libxt_ipcomp: Add translation to nft
  extensions: libip6t_hbh: Add translation to nft
  extensions: libxt_multiport: Add translation to nft
  extensions: libxt_dscp: Add translation to nft
  extensions: libip6t_frag: Add translation to nft
  extensions: libxt_cgroup: Add translation to nft
  extensions: libxt_conntrack: Add translation to nft

Liping Zhang (27):
  extensions: libxt_limit: fix a wrong translation to nft rule
  extensions: libxt_mark: fix a wrong translation to nft when mask is 
specified
  extensions: libxt_TRACE: Add translation to nft
  extensions: libipt_realm: fix order of mask and id when do nft translation
  extensions: libxt_connlabel: fix crash when connlabel.conf is empty
  extensions: libxt_connlabel: Add translation to nft
  extensions: libxt_NFLOG: display nflog-size even if it is zero
  extensions: libxt_NFLOG: translate to nft log snaplen if nflog-size is 
specified
  extensions: libxt_NFLOG: add unit test to cover nflog-size with zero
  extensions: libxt_connlabel: add unit test
  iptables-translate: add in/out ifname wildcard match translation to nft
  extensions: libxt_CLASSIFY: Add translation to nft
  extensions: libipt_DNAT/SNAT: fix "OOM" when do translation to nft
  extensions: libip[6]t_SNAT/DNAT: use the new nft syntax when do xlate
  extensions: libip[6]t_REDIRECT: use new nft syntax when do xlate
  extensions: libip6t_SNAT/DNAT: add square bracket in xlat output when 
port is specified
  extensions: libipt_realm: add a missing space in translation
  extensions: libxt_iprange: rename "ip saddr" to "ip6 saddr" in 
ip6tables-xlate
  extensions: libxt_iprange: handle the invert flag properly in translation
  extensions: libxt_devgroup: handle the invert flag properly in translation
  extensions: libxt_ipcomp: add range support in translation
  extensions: libxt_quota: add translation to nft
  extensions: libxt_DSCP: add translation to nft
  extensions: libxt_statistic: add translation to nft
  extensions: LOG: add log flags translation to nft
  extensions: libxt_connbytes: Add translation to nft
  extensions: libxt_rpfilter: add translation to nft

Loganaden Velvindron (1):
  libxt_TCPOPTSTRIP: Fix musl compatibility

Pablo M. Bermudo Garay (11):

[PATCH 1/2 conntrack-tools] conntrackd: cthelper: Free pktb after use

2017-01-27 Thread Kevin Cernekee
According to valgrind, this currently leaks ~512B to 2kB for each
packet sent to the userspace helper.

Signed-off-by: Kevin Cernekee 
---
 src/cthelper.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/cthelper.c b/src/cthelper.c
index 54eb830..f01c509 100644
--- a/src/cthelper.c
+++ b/src/cthelper.c
@@ -325,6 +325,7 @@ static int nfq_queue_cb(const struct nlmsghdr *nlh, void 
*data)
if (pkt_verdict_issue(helper, myct, queue_num, id, verdict, pktb) < 0)
goto err4;
 
+   pktb_free(pktb);
nfct_destroy(ct);
if (myct->exp != NULL)
nfexp_destroy(myct->exp);
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2 conntrack-tools] conntrackd: config: Free strdup()ed tokens

2017-01-27 Thread Kevin Cernekee
This frees T_IP, T_PATH_VAL, and T_STRING tokens.  They were being flagged
by valgrind as memory leaks.

Lightly tested using doc/helper/conntrackd.conf and doc/stats/conntrackd.conf.

Signed-off-by: Kevin Cernekee 
---
 src/read_config_yy.y | 29 +
 1 file changed, 29 insertions(+)

diff --git a/src/read_config_yy.y b/src/read_config_yy.y
index 97f905d..f10e0db 100644
--- a/src/read_config_yy.y
+++ b/src/read_config_yy.y
@@ -116,6 +116,7 @@ logfile_bool : T_LOG T_OFF
 logfile_path : T_LOG T_PATH_VAL
 {
strncpy(conf.logfile, $2, FILENAME_MAXLEN);
+   free($2);
 };
 
 syslog_bool : T_SYSLOG T_ON
@@ -158,11 +159,13 @@ syslog_facility : T_SYSLOG T_STRING
conf.syslog_facility != conf.stats.syslog_facility)
dlog(LOG_WARNING, "conflicting Syslog facility "
 "values, defaulting to General");
+   free($2);
 };
 
 lock : T_LOCK T_PATH_VAL
 {
strncpy(conf.lockfile, $2, FILENAME_MAXLEN);
+   free($2);
 };
 
 refreshtime : T_REFRESH T_NUMBER
@@ -257,6 +260,7 @@ multicast_option : T_IPV4_ADDR T_IP
}
 
conf.channel[conf.channel_num].u.mcast.ipproto = AF_INET;
+   free($2);
 };
 
 multicast_option : T_IPV6_ADDR T_IP
@@ -296,6 +300,7 @@ multicast_option : T_IPV6_ADDR T_IP
conf.channel[conf.channel_num].u.mcast.ifa.interface_index6 = 
idx;
conf.channel[conf.channel_num].u.mcast.ipproto = AF_INET6;
}
+   free($2);
 };
 
 multicast_option : T_IPV4_IFACE T_IP
@@ -315,11 +320,13 @@ multicast_option : T_IPV4_IFACE T_IP
}
 
conf.channel[conf.channel_num].u.mcast.ipproto = AF_INET;
+   free($2);
 };
 
 multicast_option : T_IPV6_IFACE T_IP
 {
dlog(LOG_WARNING, "`IPv6_interface' not required, ignoring");
+   free($2);
 }
 
 multicast_option : T_IFACE T_STRING
@@ -340,6 +347,7 @@ multicast_option : T_IFACE T_STRING
conf.channel[conf.channel_num].u.mcast.ifa.interface_index6 = 
idx;
conf.channel[conf.channel_num].u.mcast.ipproto = AF_INET6;
}
+   free($2);
 };
 
 multicast_option : T_GROUP T_NUMBER
@@ -414,6 +422,7 @@ udp_option : T_IPV4_ADDR T_IP
break;
}
conf.channel[conf.channel_num].u.udp.ipproto = AF_INET;
+   free($2);
 };
 
 udp_option : T_IPV6_ADDR T_IP
@@ -431,6 +440,7 @@ udp_option : T_IPV6_ADDR T_IP
break;
 #endif
conf.channel[conf.channel_num].u.udp.ipproto = AF_INET6;
+   free($2);
 };
 
 udp_option : T_IPV4_DEST_ADDR T_IP
@@ -459,6 +469,7 @@ udp_option : T_IPV6_DEST_ADDR T_IP
break;
 #endif
conf.channel[conf.channel_num].u.udp.ipproto = AF_INET6;
+   free($2);
 };
 
 udp_option : T_IFACE T_STRING
@@ -474,6 +485,7 @@ udp_option : T_IFACE T_STRING
break;
}
conf.channel[conf.channel_num].u.udp.server.ipv6.scope_id = idx;
+   free($2);
 };
 
 udp_option : T_PORT T_NUMBER
@@ -552,6 +564,7 @@ tcp_option : T_IPV4_ADDR T_IP
break;
}
conf.channel[conf.channel_num].u.tcp.ipproto = AF_INET;
+   free($2);
 };
 
 tcp_option : T_IPV6_ADDR T_IP
@@ -569,6 +582,7 @@ tcp_option : T_IPV6_ADDR T_IP
break;
 #endif
conf.channel[conf.channel_num].u.tcp.ipproto = AF_INET6;
+   free($2);
 };
 
 tcp_option : T_IPV4_DEST_ADDR T_IP
@@ -580,6 +594,7 @@ tcp_option : T_IPV4_DEST_ADDR T_IP
break;
}
conf.channel[conf.channel_num].u.tcp.ipproto = AF_INET;
+   free($2);
 };
 
 tcp_option : T_IPV6_DEST_ADDR T_IP
@@ -597,6 +612,7 @@ tcp_option : T_IPV6_DEST_ADDR T_IP
break;
 #endif
conf.channel[conf.channel_num].u.tcp.ipproto = AF_INET6;
+   free($2);
 };
 
 tcp_option : T_IFACE T_STRING
@@ -612,6 +628,7 @@ tcp_option : T_IFACE T_STRING
break;
}
conf.channel[conf.channel_num].u.tcp.server.ipv6.scope_id = idx;
+   free($2);
 };
 
 tcp_option : T_PORT T_NUMBER
@@ -669,6 +686,7 @@ unix_options:
 unix_option : T_PATH T_PATH_VAL
 {
strcpy(conf.local.path, $2);
+   free($2);
 };
 
 unix_option : T_BACKLOG T_NUMBER
@@ -757,6 +775,7 @@ expect_list:
 expect_item: T_STRING
 {
exp_filter_add(STATE(exp_filter), $1);
+   free($1);
 }
 
 sync_mode_alarm: T_SYNC_MODE T_ALARM '{' sync_mode_alarm_list '}'
@@ -993,6 +1012,7 @@ scheduler_line : T_TYPE T_STRING
dlog(LOG_ERR, "unknown scheduler `%s'", $2);
exit(EXIT_FAILURE);
}
+   free($2);
 };
 
 scheduler_line : T_PRIO T_NUMBER
@@ -1079,6 +1099,7 @@ filter_protocol_item : T_STRING
nfct_filter_add_attr_u32(STATE(filter),
 NFCT_FILTER_L4PROTO,
 pent->p_proto);
+   free($1);
 };
 
 filter_protocol_item : T_TCP
@@ -1209,6 +1230,7 @@ filter_address_item : T_IPV4_ADDR T_IP
 
nfct_filter_add_attr(STATE(filter), NFCT_FILTER_SRC_IPV4, _ipv4);

Re: [PATCH v2 net] net: free ip_vs_dest structs when refcnt=0

2017-01-27 Thread Simon Horman
On Fri, Jan 27, 2017 at 01:21:11PM +0100, Pablo Neira Ayuso wrote:
> On Fri, Jan 27, 2017 at 09:07:38AM +0100, Simon Horman wrote:
> > On Thu, Jan 26, 2017 at 10:49:10PM +0200, Julian Anastasov wrote:
> > > 
> > >   Hello,
> > > 
> > > On Mon, 23 Jan 2017, David Windsor wrote:
> > > 
> > > > Currently, the ip_vs_dest cache frees ip_vs_dest objects when their
> > > > reference count becomes < 0.  Aside from not being semantically sound,
> > > > this is problematic for the new type refcount_t, which will be 
> > > > introduced
> > > > shortly in a separate patch. refcount_t is the new kernel type for
> > > > holding reference counts, and provides overflow protection and a
> > > > constrained interface relative to atomic_t (the type currently being
> > > > used for kernel reference counts).
> > > > 
> > > > Per Julian Anastasov: "The problem is that dest_trash currently holds
> > > > deleted dests (unlinked from RCU lists) with refcnt=0."  Changing
> > > > dest_trash to hold dest with refcnt=1 will allow us to free ip_vs_dest
> > > > structs when their refcnt=0, in ip_vs_dest_put_and_free().
> > > > 
> > > > Signed-off-by: David Windsor 
> > > 
> > >   Thanks! I tested the first version and this one
> > > just adds the needed changes in comments, so
> > > 
> > > Signed-off-by: Julian Anastasov 
> > > 
> > >   Simon and Pablo, this is more appropriate for
> > > ipvs-next/nf-next. Please apply!
> > 
> > Pablo, would you mind taking this one directly into nf-next?
> > 
> > Signed-off-by: Simon Horman 
> 
> Sure, no problem. I'll take it. Thanks!

Thanks!
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 09/14] netfilter: conntrack: refine gc worker heuristics, redux

2017-01-27 Thread Nicolas Dichtel
Le 26/01/2017 à 17:38, Pablo Neira Ayuso a écrit :
> From: Florian Westphal 
> 
> This further refines the changes made to conntrack gc_worker in
> commit e0df8cae6c16 ("netfilter: conntrack: refine gc worker heuristics").
> 
> The main idea of that change was to reduce the scan interval when evictions
> take place.
> 
> However, on the reporters' setup, there are 1-2 million conntrack entries
> in total and roughly 8k new (and closing) connections per second.
> 
> In this case we'll always evict at least one entry per gc cycle and scan
> interval is always at 1 jiffy because of this test:
> 
>  } else if (expired_count) {
>  gc_work->next_gc_run /= 2U;
>  next_run = msecs_to_jiffies(1);
> 
> being true almost all the time.
> 
> Given we scan ~10k entries per run its clearly wrong to reduce interval
> based on nonzero eviction count, it will only waste cpu cycles since a vast
> majorities of conntracks are not timed out.
> 
> Thus only look at the ratio (scanned entries vs. evicted entries) to make
> a decision on whether to reduce or not.
> 
> Because evictor is supposed to only kick in when system turns idle after
> a busy period, pick a high ratio -- this makes it 50%.  We thus keep
> the idea of increasing scan rate when its likely that table contains many
> expired entries.
> 
> In order to not let timed-out entries hang around for too long
> (important when using event logging, in which case we want to timely
> destroy events), we now scan the full table within at most
> GC_MAX_SCAN_JIFFIES (16 seconds) even in worst-case scenario where all
> timed-out entries sit in same slot.
> 
> I tested this with a vm under synflood (with
> sysctl net.netfilter.nf_conntrack_tcp_timeout_syn_recv=3).
> 
> While flood is ongoing, interval now stays at its max rate
> (GC_MAX_SCAN_JIFFIES / GC_MAX_BUCKETS_DIV -> 125ms).
> 
> With feedback from Nicolas Dichtel.
> 
> Reported-by: Denys Fedoryshchenko 
> Cc: Nicolas Dichtel 
> Fixes: b87a2f9199ea82eaadc ("netfilter: conntrack: add gc worker to remove 
> timed-out entries")
> Signed-off-by: Florian Westphal 
> Tested-by: Nicolas Dichtel 
> Acked-by: Nicolas Dichtel 
> Tested-by: Denys Fedoryshchenko 
> Signed-off-by: Pablo Neira Ayuso 
Pablo, is it possible to queue this patch (and the previous: 08/14) for the 4.9
stable?


Thank you,
Nicolas
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC PATCH] audit: normalize NETFILTER_PKT

2017-01-27 Thread Richard Guy Briggs
Eliminate flipping in and out of message fields.

https://github.com/linux-audit/audit-kernel/issues/11

Signed-off-by: Richard Guy Briggs 
---
 net/netfilter/xt_AUDIT.c |   92 +-
 1 files changed, 66 insertions(+), 26 deletions(-)

diff --git a/net/netfilter/xt_AUDIT.c b/net/netfilter/xt_AUDIT.c
index 4973cbd..8089ec2 100644
--- a/net/netfilter/xt_AUDIT.c
+++ b/net/netfilter/xt_AUDIT.c
@@ -31,24 +31,41 @@ MODULE_ALIAS("ip6t_AUDIT");
 MODULE_ALIAS("ebt_AUDIT");
 MODULE_ALIAS("arpt_AUDIT");
 
+struct nfpkt_par {
+   int ipv;
+   int iptrunc;
+   const void *saddr;
+   const void *daddr;
+   u16 ipid;
+   u8 proto;
+   u8 frag;
+   int ptrunc;
+   u16 sport;
+   u16 dport;
+   u8 icmpt;
+   u8 icmpc;
+};
+
 static void audit_proto(struct audit_buffer *ab, struct sk_buff *skb,
-   unsigned int proto, unsigned int offset)
+   unsigned int proto, unsigned int offset, struct 
nfpkt_par *apar)
 {
switch (proto) {
case IPPROTO_TCP:
case IPPROTO_UDP:
-   case IPPROTO_UDPLITE: {
+   case IPPROTO_UDPLITE:
+   case IPPROTO_DCCP:
+   case IPPROTO_SCTP: {
const __be16 *pptr;
__be16 _ports[2];
 
pptr = skb_header_pointer(skb, offset, sizeof(_ports), _ports);
if (pptr == NULL) {
-   audit_log_format(ab, " truncated=1");
+   apar->ptrunc = 1;
return;
}
+   apar->sport = ntohs(pptr[0]);
+   apar->dport = ntohs(pptr[1]);
 
-   audit_log_format(ab, " sport=%hu dport=%hu",
-ntohs(pptr[0]), ntohs(pptr[1]));
}
break;
 
@@ -59,41 +76,43 @@ static void audit_proto(struct audit_buffer *ab, struct 
sk_buff *skb,
 
iptr = skb_header_pointer(skb, offset, sizeof(_ih), &_ih);
if (iptr == NULL) {
-   audit_log_format(ab, " truncated=1");
+   apar->ptrunc = 1;
return;
}
-
-   audit_log_format(ab, " icmptype=%hhu icmpcode=%hhu",
-iptr[0], iptr[1]);
+   apar->icmpt = iptr[0];
+   apar->icmpc = iptr[1];
 
}
break;
}
 }
 
-static void audit_ip4(struct audit_buffer *ab, struct sk_buff *skb)
+static void audit_ip4(struct audit_buffer *ab, struct sk_buff *skb, struct 
nfpkt_par *apar)
 {
struct iphdr _iph;
const struct iphdr *ih;
 
+   apar->ipv = 4;
ih = skb_header_pointer(skb, 0, sizeof(_iph), &_iph);
if (!ih) {
-   audit_log_format(ab, " truncated=1");
+   apar->iptrunc = 1;
return;
}
 
-   audit_log_format(ab, " saddr=%pI4 daddr=%pI4 ipid=%hu proto=%hhu",
-   >saddr, >daddr, ntohs(ih->id), ih->protocol);
+   apar->saddr = >saddr;
+   apar->daddr = >daddr;
+   apar->ipid = ntohs(ih->id);
+   apar->proto = ih->protocol;
 
if (ntohs(ih->frag_off) & IP_OFFSET) {
-   audit_log_format(ab, " frag=1");
+   apar->frag = 1;
return;
}
 
-   audit_proto(ab, skb, ih->protocol, ih->ihl * 4);
+   audit_proto(ab, skb, ih->protocol, ih->ihl * 4, apar);
 }
 
-static void audit_ip6(struct audit_buffer *ab, struct sk_buff *skb)
+static void audit_ip6(struct audit_buffer *ab, struct sk_buff *skb, struct 
nfpkt_par *apar)
 {
struct ipv6hdr _ip6h;
const struct ipv6hdr *ih;
@@ -101,9 +120,10 @@ static void audit_ip6(struct audit_buffer *ab, struct 
sk_buff *skb)
__be16 frag_off;
int offset;
 
+   apar->ipv = 6;
ih = skb_header_pointer(skb, skb_network_offset(skb), sizeof(_ip6h), 
&_ip6h);
if (!ih) {
-   audit_log_format(ab, " truncated=1");
+   apar->iptrunc = 1;
return;
}
 
@@ -111,11 +131,12 @@ static void audit_ip6(struct audit_buffer *ab, struct 
sk_buff *skb)
offset = ipv6_skip_exthdr(skb, skb_network_offset(skb) + sizeof(_ip6h),
  , _off);
 
-   audit_log_format(ab, " saddr=%pI6c daddr=%pI6c proto=%hhu",
->saddr, >daddr, nexthdr);
+   apar->saddr = >saddr;
+   apar->daddr = >daddr;
+   apar->proto = nexthdr;
 
if (offset)
-   audit_proto(ab, skb, nexthdr, offset);
+   audit_proto(ab, skb, nexthdr, offset, apar);
 }
 
 static unsigned int
@@ -123,6 +144,9 @@ audit_tg(struct sk_buff *skb, const struct xt_action_param 
*par)
 {
const struct xt_audit_info *info = par->targinfo;
struct audit_buffer *ab;
+   struct nfpkt_par apar = {
+   -1, -1, NULL, NULL, -1, -1, -1, -1, -1, -1, -1, -1
+   };
 

Re: [PATCH nft v3 1/6] src: Allow reset single stateful object

2017-01-27 Thread Pablo Neira Ayuso
On Thu, Jan 26, 2017 at 03:09:44PM -0200, Elise Lennion wrote:
> Currently the stateful objects can only be reseted in groups. With this
> patch reseting a single object is allowed:
> 
> $ nft reset counter filter https-traffic
> table ip filter {
>   counter https-traffic {
>   packets 8774 bytes 542668
>   }
> }
> 
> $ nft list counter filter https-traffic
> table ip filter {
>   counter https-traffic {
>   packets 0 bytes 0
>   }
> }

Series from 1/6 to 6/6 applied, thanks Elise.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 net] net: free ip_vs_dest structs when refcnt=0

2017-01-27 Thread Simon Horman
On Thu, Jan 26, 2017 at 10:49:10PM +0200, Julian Anastasov wrote:
> 
>   Hello,
> 
> On Mon, 23 Jan 2017, David Windsor wrote:
> 
> > Currently, the ip_vs_dest cache frees ip_vs_dest objects when their
> > reference count becomes < 0.  Aside from not being semantically sound,
> > this is problematic for the new type refcount_t, which will be introduced
> > shortly in a separate patch. refcount_t is the new kernel type for
> > holding reference counts, and provides overflow protection and a
> > constrained interface relative to atomic_t (the type currently being
> > used for kernel reference counts).
> > 
> > Per Julian Anastasov: "The problem is that dest_trash currently holds
> > deleted dests (unlinked from RCU lists) with refcnt=0."  Changing
> > dest_trash to hold dest with refcnt=1 will allow us to free ip_vs_dest
> > structs when their refcnt=0, in ip_vs_dest_put_and_free().
> > 
> > Signed-off-by: David Windsor 
> 
>   Thanks! I tested the first version and this one
> just adds the needed changes in comments, so
> 
> Signed-off-by: Julian Anastasov 
> 
>   Simon and Pablo, this is more appropriate for
> ipvs-next/nf-next. Please apply!

Pablo, would you mind taking this one directly into nf-next?

Signed-off-by: Simon Horman 

> 
> > ---
> >  include/net/ip_vs.h| 2 +-
> >  net/netfilter/ipvs/ip_vs_ctl.c | 8 +++-
> >  2 files changed, 4 insertions(+), 6 deletions(-)
> > 
> > diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
> > index cd6018a..a3e78ad 100644
> > --- a/include/net/ip_vs.h
> > +++ b/include/net/ip_vs.h
> > @@ -1421,7 +1421,7 @@ static inline void ip_vs_dest_put(struct ip_vs_dest 
> > *dest)
> >  
> >  static inline void ip_vs_dest_put_and_free(struct ip_vs_dest *dest)
> >  {
> > -   if (atomic_dec_return(>refcnt) < 0)
> > +   if (atomic_dec_and_test(>refcnt))
> > kfree(dest);
> >  }
> >  
> > diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
> > index 55e0169..5fc4836 100644
> > --- a/net/netfilter/ipvs/ip_vs_ctl.c
> > +++ b/net/netfilter/ipvs/ip_vs_ctl.c
> > @@ -711,7 +711,6 @@ ip_vs_trash_get_dest(struct ip_vs_service *svc, int 
> > dest_af,
> >   dest->vport == svc->port))) {
> > /* HIT */
> > list_del(>t_list);
> > -   ip_vs_dest_hold(dest);
> > goto out;
> > }
> > }
> > @@ -741,7 +740,7 @@ static void ip_vs_dest_free(struct ip_vs_dest *dest)
> >   *  When the ip_vs_control_clearup is activated by ipvs module exit,
> >   *  the service tables must have been flushed and all the connections
> >   *  are expired, and the refcnt of each destination in the trash must
> > - *  be 0, so we simply release them here.
> > + *  be 1, so we simply release them here.
> >   */
> >  static void ip_vs_trash_cleanup(struct netns_ipvs *ipvs)
> >  {
> > @@ -1080,11 +1079,10 @@ static void __ip_vs_del_dest(struct netns_ipvs 
> > *ipvs, struct ip_vs_dest *dest,
> > if (list_empty(>dest_trash) && !cleanup)
> > mod_timer(>dest_trash_timer,
> >   jiffies + (IP_VS_DEST_TRASH_PERIOD >> 1));
> > -   /* dest lives in trash without reference */
> > +   /* dest lives in trash with reference */
> > list_add(>t_list, >dest_trash);
> > dest->idle_start = 0;
> > spin_unlock_bh(>dest_trash_lock);
> > -   ip_vs_dest_put(dest);
> >  }
> >  
> >  
> > @@ -1160,7 +1158,7 @@ static void ip_vs_dest_trash_expire(unsigned long 
> > data)
> >  
> > spin_lock(>dest_trash_lock);
> > list_for_each_entry_safe(dest, next, >dest_trash, t_list) {
> > -   if (atomic_read(>refcnt) > 0)
> > +   if (atomic_read(>refcnt) > 1)
> > continue;
> > if (dest->idle_start) {
> > if (time_before(now, dest->idle_start +
> > -- 
> > 2.7.4
> 
> Regards
> 
> --
> Julian Anastasov 
> 
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html