WARN_ON() in __nf_hook_entries_try_shrink()

2017-09-07 Thread Linus Torvalds
On shutdown I get this (edited down a bit to be more legible):

  Stopping firewalld - dynamic firewall daemon...
  NETFILTER_CFG table=nat family=2 entries=55
  NETFILTER_CFG table=mangle family=2 entries=40
  NETFILTER_CFG table=raw family=2 entries=28
  NETFILTER_CFG table=security family=2 entries=13
  NETFILTER_CFG table=filter family=2 entries=93
  [ cut here ]
  WARNING: CPU: 1 PID: 21512 at net/netfilter/core.c:218
__nf_hook_entries_try_shrink+0x106/0x130
  CPU: 1 PID: 21512 Comm: iptables-restor Not tainted
4.13.0-08555-gc0da4fa0d1a5 #7
  Hardware name: Dell Inc. XPS 13 9350/09JHRY, BIOS 1.4.17 05/10/2017
  RIP: 0010:__nf_hook_entries_try_shrink+0x106/0x130
  Call Trace:
 nf_unregister_net_hooks+0x117/0x240
 ipv4_hooks_unregister+0x60/0x70 [nf_conntrack_ipv4]
 nf_ct_netns_put+0x48/0x80 [nf_conntrack]
 conntrack_mt_destroy+0x15/0x20 [xt_conntrack]
 cleanup_match+0x43/0x70
 cleanup_entry+0x42/0xc0
 __do_replace+0x17a/0x1f0
 do_ipt_set_ctl+0x146/0x1b0
 nf_setsockopt+0x46/0x80
 ip_setsockopt+0x82/0xb0
 raw_setsockopt+0x34/0x40
 sock_common_setsockopt+0x14/0x20
 SyS_setsockopt+0x80/0xe0
 entry_SYSCALL_64_fastpath+0x13/0x94
[ .. warning repeats a few times .. ]
  ---[ end trace 56a6f5b20d97161d ]---
  NETFILTER_CFG table=broute family=7 entries=0
  NETFILTER_CFG table=nat family=7 entries=0
  NETFILTER_CFG table=filter family=7 entries=0
  NETFILTER_CFG table=mangle family=2 entries=6

and some searching notes that the kernel test robot already reported
this a few days ago but nobody reacted.

The kernel test robot seems to blame commit d3ad2c17b404 ("netfilter:
core: batch nf_unregister_net_hooks synchronize_net calls").

Hmm?

   Linus
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 08/26] locking: Remove spin_unlock_wait() generic definitions

2017-07-03 Thread Linus Torvalds
On Mon, Jul 3, 2017 at 9:18 AM, Paul E. McKenney
 wrote:
>
> Agreed, and my next step is to look at spin_lock() followed by
> spin_is_locked(), not necessarily the same lock.

Hmm. Most (all?) "spin_is_locked()" really should be about the same
thread that took the lock (ie it's about asserts and lock debugging).

The optimistic ABBA avoidance pattern for spinlocks *should* be

spin_lock(inner)
...
if (!try_lock(outer)) {
   spin_unlock(inner);
   .. do them in the right order ..

so I don't think spin_is_locked() should have any memory barriers.

In fact, the core function for spin_is_locked() is arguably
arch_spin_value_unlocked() which doesn't even do the access itself.

   Linus
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 25/26] tile: Remove spin_unlock_wait() arch-specific definitions

2017-06-29 Thread Linus Torvalds
On Thu, Jun 29, 2017 at 5:01 PM, Paul E. McKenney
 wrote:
> There is no agreed-upon definition of spin_unlock_wait()'s semantics,
> and it appears that all callers could do just as well with a lock/unlock
> pair.  This commit therefore removes the underlying arch-specific
> arch_spin_unlock_wait().

Please don't make this one commit fopr every architecture.

Once something gets removed, it gets removed. There's no point in
"remove it from architecture X". If there are no more users, we're
done with it, and making it be 25 patches with the same commit message
instead of just one doesn't help anybody.

   Linus
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] netfilter: nf_ct_helper: warn when not applying default helper assignment

2017-01-25 Thread Linus Torvalds
On Tue, Jan 24, 2017 at 2:17 AM, Jiri Kosina  wrote:
> +   if (!helper) {
> +   if (unlikely(!net->ct.sysctl_auto_assign_helper &&
> +   !net->ct.auto_assign_helper_warned &&
> +   
> __nf_ct_helper_find(>tuplehash[IP_CT_DIR_REPLY].tuple))) {
> +   pr_info("nf_conntrack: default automatic helper 
> assignment "
> +   "has been turned off for security reasons "
> +   "and CT-based firewall rule not found. Use 
> the "
> +   "iptables CT target to attach helpers 
> instead.\n");
> +   net->ct.auto_assign_helper_warned = true;
> +   } else {
> +   helper = 
> __nf_ct_helper_find(>tuplehash[IP_CT_DIR_REPLY].tuple);
> +   if (unlikely(!net->ct.auto_assign_helper_warned && 
> helper &&
> +   !net->ct.auto_assign_helper_warned)) {
> pr_info("nf_conntrack: automatic helper "
> "assignment is deprecated and it will "
> "be removed soon. Use the iptables CT target "
> "to attach helpers instead.\n");
> net->ct.auto_assign_helper_warned = true;
> +   }
> }
> }

I don't disagree that this kind of warning might be useful, but that
code makes my eyes bleed, and is really really hard to follow.

Please make it a helper function. And don't have crazy conditionals
with else statements and other crazy conditionals. With random
likely/unlikely things that are not necessariyl even true.

For example, you can rewrite the logic something like

static struct nf_conntrack_helper *find_auto_helper(struct nf_conn *ct)
{
return __nf_ct_helper_find(>tuplehash[IP_CT_DIR_REPLY].tuple;
}

static struct nf_conntrack_helper *ct_lookup_helper(struct nf_conn
*ct, struct net *net)
{
struct nf_conntrack_helper *ret;

if (!net->ct.sysctl_auto_assign_helper) {
if (net->ct.auto_assign_helper_warned)
return NULL;
if (!find_auto_helper(ct))
return NULL;

.. warn about helper existing but not used ..

net->ct.auto_assign_helper_warned = 1;
return NULL;
}

ret = find_auto_helper(ct);
if (!ret || net->ct.auto_assign_helper_warned)
return ret;

... warn about helper existing but automatic helpers deprecated..

net->ct.auto_assign_helper_warned = 1;
return ret;
}

and now each particular case is a lot easier to follow. Then you just have

if (!helper) {
helper = ct_lookup_helper(ct, net);
if (!helper) {
if (help)
RCU_INIT_POINTER(help->helper, NULL);
return 0;
}
 }

in __nf_ct_try_assign_helper()

All of the above is entirely untested and just written in my email
client. It may be garbage. It's not meant to be used, it's meant to
just illustrate avoiding complex nested conditionals. It's a few more
lines, but now each part has simple logic and is much more
understandable.

   Linus
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/2] restore original default of nf_conntrack_helper sysctl

2017-01-23 Thread Linus Torvalds
On Mon, Jan 23, 2017 at 4:06 PM, Jiri Kosina  wrote:
>
> Considering this being really close to the "userspace breakage"
> borderline, I'm CCing Linus as well.

For all I know, there may be some security reason why we really don't
want the automatic helpers, even if they can be convenient.

Also, you can just enable them with a kernel command line or a sysctl,
so it's not like you can't get the old behavior back.

Do networking people have any comments? Was there a reason to actually
switch the default? Because the commit messages aren't all that
helpful.

   Linus
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: slab corruption with current -git

2016-10-13 Thread Linus Torvalds
On Wed, Oct 12, 2016 at 11:27 PM, Markus Trippelsdorf
 wrote:
>
> Yeah.
>
> 105 entry->orig_ops = reg;
> 106 entry->ops  = *reg;
> 107 entry->next = NULL;

So ipt_register_table() does:

ret = nf_register_net_hooks(net, ops, hweight32(table->valid_hooks));

and then nf_register_net_hooks() just does

for (i = 0; i < n; i++) {
err = nf_register_net_hook(net, [i]);

so if the *reg is uninitialized, it means that it's the 'ops[]' array
that isn't actually really valid in "valid_hooks". Odd. They should
all be initialized by xt_hook_ops_alloc(), no?

That said, xt_hook_ops_alloc() itself is odd. Lookie here, this is the
loop that initializes things:

for (i = 0, hooknum = 0; i < num_hooks && hook_mask != 0;
 hook_mask >>= 1, ++hooknum) {

and it makes no sense to me how that tests *both* "i < num_hools" and
"hook_mask != 0".

Why? Because

num_hooks = hweight32(hook_mask);

so it's entirely redundant. num_hooks is already how many bits are on
in hook_mask, so that test is just duplicating the same thing twice
("have we done less than that number of bits" and "do we have any bits
less").

I don't know. There's something odd going on. Regardless, thsi is a
different problem from the nf_register_net_hook() list handling, so
I'll leave it to the networking people. David?

   Linus
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: slab corruption with current -git (was Re: [git pull] vfs pile 1 (splice))

2016-10-10 Thread Linus Torvalds
On Mon, Oct 10, 2016 at 10:39 PM, Linus Torvalds
<torva...@linux-foundation.org> wrote:
>
> I guess I will have to double-check that the slub corruption is gone
> still with that fixed.

So I'm not getting any warnings now from SLUB debugging. So the
original bug seems to not have re-surfaced, and the registration bug
is gone, so now the unregistration doesn't warn about anything either.

But I only rebooted three times.

   Linus
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: slab corruption with current -git (was Re: [git pull] vfs pile 1 (splice))

2016-10-10 Thread Linus Torvalds
On Sun, Oct 9, 2016 at 8:41 PM, Linus Torvalds
<torva...@linux-foundation.org> wrote:
> This COMPLETELY UNTESTED patch tries to fix the nf_hook_entry code to do this.
>
> I repeat: it's ENTIRELY UNTESTED.

Gaah.

That patch was subtle garbage.

The "add to list" thing did this:

rcu_assign_pointer(entry->next, p);
rcu_assign_pointer(*pp, p);

which is not so subtly broken - that second assignment just assigns
"p" to "*pp", but that was what *pp already contained. Too much
cut-and-paste.

That also explains why I then get the NOT FOUND case, because the add
never actually worked.

It *should* be

rcu_assign_pointer(entry->next, p);
rcu_assign_pointer(*pp, entry);

and then the warnings about "not found" are gone.

Duh.

I guess I will have to double-check that the slub corruption is gone
still with that fixed.

Anyway, new version of the patch (just that one line changed) attached.

Linus
 net/netfilter/core.c | 108 ---
 1 file changed, 33 insertions(+), 75 deletions(-)

diff --git a/net/netfilter/core.c b/net/netfilter/core.c
index c9d90eb64046..fcb5d1df11e9 100644
--- a/net/netfilter/core.c
+++ b/net/netfilter/core.c
@@ -65,49 +65,24 @@ static DEFINE_MUTEX(nf_hook_mutex);
 #define nf_entry_dereference(e) \
rcu_dereference_protected(e, lockdep_is_held(_hook_mutex))
 
-static struct nf_hook_entry *nf_hook_entry_head(struct net *net,
-   const struct nf_hook_ops *reg)
+static struct nf_hook_entry __rcu **nf_hook_entry_head(struct net *net, const 
struct nf_hook_ops *reg)
 {
-   struct nf_hook_entry *hook_head = NULL;
-
if (reg->pf != NFPROTO_NETDEV)
-   hook_head = nf_entry_dereference(net->nf.hooks[reg->pf]
-[reg->hooknum]);
-   else if (reg->hooknum == NF_NETDEV_INGRESS) {
+   return net->nf.hooks[reg->pf]+reg->hooknum;
+
 #ifdef CONFIG_NETFILTER_INGRESS
+   if (reg->hooknum == NF_NETDEV_INGRESS) {
if (reg->dev && dev_net(reg->dev) == net)
-   hook_head =
-   nf_entry_dereference(
-   reg->dev->nf_hooks_ingress);
-#endif
+   return >dev->nf_hooks_ingress;
}
-   return hook_head;
-}
-
-/* must hold nf_hook_mutex */
-static void nf_set_hooks_head(struct net *net, const struct nf_hook_ops *reg,
- struct nf_hook_entry *entry)
-{
-   switch (reg->pf) {
-   case NFPROTO_NETDEV:
-#ifdef CONFIG_NETFILTER_INGRESS
-   /* We already checked in nf_register_net_hook() that this is
-* used from ingress.
-*/
-   rcu_assign_pointer(reg->dev->nf_hooks_ingress, entry);
 #endif
-   break;
-   default:
-   rcu_assign_pointer(net->nf.hooks[reg->pf][reg->hooknum],
-  entry);
-   break;
-   }
+   return NULL;
 }
 
 int nf_register_net_hook(struct net *net, const struct nf_hook_ops *reg)
 {
-   struct nf_hook_entry *hooks_entry;
-   struct nf_hook_entry *entry;
+   struct nf_hook_entry __rcu **pp;
+   struct nf_hook_entry *entry, *p;
 
if (reg->pf == NFPROTO_NETDEV) {
 #ifndef CONFIG_NETFILTER_INGRESS
@@ -119,6 +94,10 @@ int nf_register_net_hook(struct net *net, const struct 
nf_hook_ops *reg)
return -EINVAL;
}
 
+   pp = nf_hook_entry_head(net, reg);
+   if (!pp)
+   return -EINVAL;
+
entry = kmalloc(sizeof(*entry), GFP_KERNEL);
if (!entry)
return -ENOMEM;
@@ -128,26 +107,15 @@ int nf_register_net_hook(struct net *net, const struct 
nf_hook_ops *reg)
entry->next = NULL;
 
mutex_lock(_hook_mutex);
-   hooks_entry = nf_hook_entry_head(net, reg);
-
-   if (hooks_entry && hooks_entry->orig_ops->priority > reg->priority) {
-   /* This is the case where we need to insert at the head */
-   entry->next = hooks_entry;
-   hooks_entry = NULL;
-   }
-
-   while (hooks_entry &&
-   reg->priority >= hooks_entry->orig_ops->priority &&
-   nf_entry_dereference(hooks_entry->next)) {
-   hooks_entry = nf_entry_dereference(hooks_entry->next);
-   }
 
-   if (hooks_entry) {
-   entry->next = nf_entry_dereference(hooks_entry->next);
-   rcu_assign_pointer(hooks_entry->next, entry);
-   } else {
-   nf_set_hooks_head(net, reg, entry);
+   /* Find the spot in the list */
+   while ((p = nf_entry_dereference(*pp)) != NULL) {
+

Re: slab corruption with current -git

2016-10-10 Thread Linus Torvalds
On Mon, Oct 10, 2016 at 5:30 PM, David Miller  wrote:
>
> Linus can you add some extra info to that:

Sure. I made it a WARN_ON_ONCE(), but then always just printed the
pf/hooknum. It's all over the map:

 reg->pf=2 and reg->hooknum=4
 reg->pf=2 and reg->hooknum=2
 reg->pf=2 and reg->hooknum=3
 reg->pf=10 and reg->hooknum=4
 reg->pf=10 and reg->hooknum=2
 reg->pf=10 and reg->hooknum=3
 reg->pf=7 and reg->hooknum=1
 reg->pf=7 and reg->hooknum=2
 reg->pf=7 and reg->hooknum=3
 reg->pf=2 and reg->hooknum=0
 reg->pf=2 and reg->hooknum=3
 reg->pf=2 and reg->hooknum=0
 reg->pf=2 and reg->hooknum=3
 reg->pf=2 and reg->hooknum=4
 reg->pf=2 and reg->hooknum=4
 reg->pf=2 and reg->hooknum=1
 reg->pf=2 and reg->hooknum=1
 reg->pf=10 and reg->hooknum=0
 reg->pf=10 and reg->hooknum=3
 reg->pf=10 and reg->hooknum=0
 reg->pf=10 and reg->hooknum=3
 reg->pf=10 and reg->hooknum=4
 reg->pf=10 and reg->hooknum=4
 reg->pf=10 and reg->hooknum=1
 reg->pf=10 and reg->hooknum=1
 reg->pf=7 and reg->hooknum=3
 reg->pf=7 and reg->hooknum=4
 reg->pf=7 and reg->hooknum=0
 reg->pf=2 and reg->hooknum=4
 reg->pf=2 and reg->hooknum=2
 reg->pf=2 and reg->hooknum=3
 reg->pf=10 and reg->hooknum=4
 reg->pf=10 and reg->hooknum=2
 reg->pf=10 and reg->hooknum=3
 reg->pf=7 and reg->hooknum=1
 reg->pf=7 and reg->hooknum=2
 reg->pf=7 and reg->hooknum=3
 reg->pf=2 and reg->hooknum=0
 reg->pf=2 and reg->hooknum=3
 reg->pf=2 and reg->hooknum=0
 reg->pf=2 and reg->hooknum=3
 reg->pf=2 and reg->hooknum=4
 reg->pf=2 and reg->hooknum=4
 reg->pf=2 and reg->hooknum=1
 reg->pf=2 and reg->hooknum=1
 reg->pf=10 and reg->hooknum=0
 reg->pf=10 and reg->hooknum=3
 reg->pf=10 and reg->hooknum=0
 reg->pf=10 and reg->hooknum=3
 reg->pf=10 and reg->hooknum=4
 reg->pf=10 and reg->hooknum=4
 reg->pf=10 and reg->hooknum=1
 reg->pf=10 and reg->hooknum=1
 reg->pf=7 and reg->hooknum=3
 reg->pf=7 and reg->hooknum=4
 reg->pf=7 and reg->hooknum=0

and putting that through "sort -n" and "uniq -c", I get:

  4  reg->pf=10 and reg->hooknum=0
  4  reg->pf=10 and reg->hooknum=1
  2  reg->pf=10 and reg->hooknum=2
  6  reg->pf=10 and reg->hooknum=3
  6  reg->pf=10 and reg->hooknum=4
  4  reg->pf=2 and reg->hooknum=0
  4  reg->pf=2 and reg->hooknum=1
  2  reg->pf=2 and reg->hooknum=2
  6  reg->pf=2 and reg->hooknum=3
  6  reg->pf=2 and reg->hooknum=4
  2  reg->pf=7 and reg->hooknum=0
  2  reg->pf=7 and reg->hooknum=1
  2  reg->pf=7 and reg->hooknum=2
  4  reg->pf=7 and reg->hooknum=3

which doesn't look much better. But clearly there's a lot of those
"try to unregister stuff that you can't even find".

Maybe it tells you something.

   Linus
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: slab corruption with current -git (was Re: [git pull] vfs pile 1 (splice))

2016-10-10 Thread Linus Torvalds
On Mon, Oct 10, 2016 at 9:28 AM, Linus Torvalds
<torva...@linux-foundation.org> wrote:
>
> So as I already answered to Dave, I'm not actually sure that this was
> the buggy code, or that my patch would make any difference at all.

My patch does seem to fix things, and in fact the warning about "hook
not found" now triggers.

So I think the bug really was that the singly-linked list handling
code did not correctly handle the case of not finding the entry, and
then freed (incorrectly) the last one that wasn't actually unlinked.

In fact, I get quite a few warnings (56 total) about 30 seconds after
logging in:

[   54.213170] WARNING: CPU: 1 PID: 111 at net/netfilter/core.c:151
nf_unregister_net_hook+0x8e/0x170
... repeat 54 times ...
[   54.445520] WARNING: CPU: 7 PID: 111 at net/netfilter/core.c:151
nf_unregister_net_hook+0x8e/0x170

and looking in the journal, the first one is (again) immediately
preceded by that systemd-hostnamed service stopping:

  Oct 10 11:45:47 i7 audit[1546]: USER_LOGIN
  ...
  Oct 10 11:46:11 i7 audit[1]: SERVICE_STOP pid=1 uid=0
auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0
msg='unit=fprintd comm="systemd" exe="/usr/lib/systemd/systemd"
hostname=? addr=? terminal=? res=success'
  Oct 10 11:46:13 i7 pulseaudio[1697]: [pulseaudio] bluez5-util.c:
GetManagedObjects() failed: org.freedesktop.DBus.Error.NoReply: Did
not receive a reply. Possible causes include: the remote application
did not send a reply, the message bus security policy blocked the
reply, the reply timeout expir
  Oct 10 11:46:13 i7 dbus-daemon[1003]: [system] Failed to activate
service 'org.bluez': timed out
  Oct 10 11:46:20 i7 audit[1]: SERVICE_STOP pid=1 uid=0
auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0
msg='unit=systemd-hostnamed comm="systemd"
exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=?
res=success'
  Oct 10 11:46:20 i7 kernel: [ cut here ]
  Oct 10 11:46:20 i7 kernel: WARNING: CPU: 1 PID: 111 at
net/netfilter/core.c:151 nf_unregister_net_hook+0x8e/0x170

so I do think it's something to do with some network startup service
thing (perhaps dhcp, perhaps chrome, who knows) as I do my initial
login.

David - I think that also explains what was wrong with the old code.
In the old code, this loop:

while (hooks_entry && nf_entry_dereference(hooks_entry->next)) {

would exit with "hooks_entry" pointing to the last list entry (because
->next was NULL). Nothing was ever unlinked in the loop itself,
because it never actually found a matching entry, but then after the
loop it would free that last entry because it *thought* that was the
match.

My list rewrite fixes that.

Anyway, I'm assuming it will come to me from the networking tree after
more testing by the maintainers. You can add my

  Signed-off-by: Linus Torvalds <torva...@linux-foundation.org>

to the patch, though.

David, if you want me to just commit that thing directly, I can
obviously do so, but I do think somebody should look at

 (a) that I actually got the priority list ordering right on the insertion side
 (b) what it is that makes it try to unregister that hook that isn't
on the list in the first place

but on the whole I consider this issue explained and solved. I'll
continue to run with my patch on my machine (just not committed).

  Linus
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: slab corruption with current -git (was Re: [git pull] vfs pile 1 (splice))

2016-10-10 Thread Linus Torvalds
On Mon, Oct 10, 2016 at 6:49 AM, Aaron Conole  wrote:
>
> Okay, I'm looking it over.  Sorry for the mess.

So as I already answered to Dave, I'm not actually sure that this was
the buggy code, or that my patch would make any difference at all.

I never got a good reproducer for the bug: I spent much of the weekend
rebooting, because it seems to happen only just after a reboot, as I
log in and start my usual thing.

I initially blamed some off filesystem or block layer issue ("Oh, it
only happens with a cold cache"), partly because the initial
non-poisoned slub oopses happened in filesystem code.

But I now think it's netfilter, and I *think* that what triggers it is
something like the bluetooth subsystem giving up or something. What I
do when I log into a new session tends to be to go to the kernel
subdirectory in one or two terminals, and fire up chrome to read
email. And the problem either happened within half a minute of me
doing that, or it never happens at all.

Which is why I ended up rebooting a *lot*. Just running the kernel
never triggered it.

(It took me some time to figure that out, which is basically why I did
almost no pull requests the whole weekend)

The journal entries for that invalid kernel access is somewhat suggestive:

  Oct 09 13:24:03 i7 dbus-daemon[1030]: [system] Failed to activate
service 'org.bluez': timed out

  Oct 09 13:24:09 i7 audit[1]: SERVICE_STOP pid=1 uid=0
auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0
msg='unit=systemd-hostnamed comm="systemd"
exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=?
res=success'

  Oct 09 13:24:09 i7 kernel: general protection fault:  [#1] SMP

so it happened just as *some* network setup thing was finishing off (I
don't think it was systemd-hostnamed itself that necessarily matters,
but clearly something was finishing up as the netfilter problem
occurred.

> I'll review it, and test it.  Can you tell me what steps you took to
> reproduce the oops?

See above: I can't actually really "reproduce" it. It's probably
highly timing-dependent, and it is not unlikely that it's also very
much about specific setup. I'm running plain Fedora 24, I boot up, log
in, start two or three terminals, fire up chrome, and ...

So far I've seen the problem maybe 5-6 times, but a couple of those
were just silent hangs (I may have rebooted too quickly for things to
hit the disk, or the oops may just have killed the machine too hard).
Two I got the oops inside slub code, and I only have one successful
slub poisoning oops from netfilter.

(Part of the reason I only have one is that once I got that, I stopped
rebooting, and instead started looking at the netfilter code and
started to do some merge window pulls again because I felt that this
is *probably* the core reason, and I cant' afford to not do pulls
during the merge window for _too_ long).

  Linus
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: slab corruption with current -git

2016-10-10 Thread Linus Torvalds
On Mon, Oct 10, 2016 at 1:24 AM, David Miller  wrote:
>
> So I've been reviewing this patch and it looks fine, but I also want
> to figure out what is actually causing the OOPS and I can't spot it
> yet.

Yeah, I'm not actually sure the old linked list implementation is
buggy - it might just be ugly. I tried to follow the old code, and I
couldn't.

So the patch I sent out was a combination of "that's not how you
should do singly linked lists" and "those special cases make me
worry". In particular, the old code really ended up doing odd things
in the "can't find entry" case, because it would exit the loop with a
non-NULL 'entry' just because the next entry was NULL..

> One possible way to see that oops is to free the head entry of the
> chain without unlinking it.  The next unregister will dereference a
> POISON pointer.
>
> Actually...
>
> The POISON value comes not from a hook entry, but from the array of
> pointers in the per-netns datastructure.
>
> This means that the netns is possibly getting freed up before we
> unregister the netfilter hooks.

That is certainly one possible explanation for it, yes. However, I
didn't think that part had changed, had it?

The other thing I find a bit odd in that new single-linked list code is this:

 - nf_hook_slow():
...
RCU_INIT_POINTER(state->hook_entries, entry);

which makes me worried..  It copies the head entry of the list, and
maybe it will then (later) end up being used stale. I don't know. But
it looks a bit iffy.

  Linus
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: slab corruption with current -git (was Re: [git pull] vfs pile 1 (splice))

2016-10-09 Thread Linus Torvalds
On Sun, Oct 9, 2016 at 7:49 PM, Linus Torvalds
<torva...@linux-foundation.org> wrote:
>
> There is one *correct* way to remove an entry from a singly linked
> list, and it looks like this:
>
> struct entry **pp, *p;
>
> pp = 
> while ((p = *pp) != NULL) {
> if (right_entry(p)) {
> *pp = p->next;
> break;
> }
> pp = >next;
> }
>
> and that's it. Nothing else.

This COMPLETELY UNTESTED patch tries to fix the nf_hook_entry code to do this.

I repeat: it's ENTIRELY UNTESTED. I just converted the insertion and
deletion to the proper pattern, but I could easily have gotten the
insertion priority test the wrong way around entirely, for example. Or
it could simply have some other completely broken bug in it. It
compiles for me, but that's all I actually checked.

Note that the "correct way" of doing list operations also almost
inevitably is the shortest way by far, since it gets rid of all the
special cases. So the patch looks nice. It gets rid of the magic
"nf_set_hooks_head()" thing too, because once you do list following
right, the head is no different from any other pointer in the list.

So the patch stats look good:

 net/netfilter/core.c | 108 ---
 1 file changed, 33 insertions(+), 75 deletions(-)

but again, it's entirely *entirely* untested. Please consider this
just a "this is generally how list insert/delete operations should be
done, avoiding special cases for the first entry".

ALSO NOTE! The code assumes that the "nf_hook_mutex" locking only
protects the actual *lists*, and that the address to the list can be
looked up without holding the lock. That's generally how things are
done, and it simplifies error handling (because you can do the "there
is no such list at all" test before you do anything else. But again, I
don't actually know the code, and if there is something that actually
expands the number of lists etc that depends on that mutex, then the
list head lookup may need to be inside the lock too.

   Linus
 net/netfilter/core.c | 108 ---
 1 file changed, 33 insertions(+), 75 deletions(-)

diff --git a/net/netfilter/core.c b/net/netfilter/core.c
index c9d90eb64046..814258641fcc 100644
--- a/net/netfilter/core.c
+++ b/net/netfilter/core.c
@@ -65,49 +65,24 @@ static DEFINE_MUTEX(nf_hook_mutex);
 #define nf_entry_dereference(e) \
rcu_dereference_protected(e, lockdep_is_held(_hook_mutex))
 
-static struct nf_hook_entry *nf_hook_entry_head(struct net *net,
-   const struct nf_hook_ops *reg)
+static struct nf_hook_entry __rcu **nf_hook_entry_head(struct net *net, const 
struct nf_hook_ops *reg)
 {
-   struct nf_hook_entry *hook_head = NULL;
-
if (reg->pf != NFPROTO_NETDEV)
-   hook_head = nf_entry_dereference(net->nf.hooks[reg->pf]
-[reg->hooknum]);
-   else if (reg->hooknum == NF_NETDEV_INGRESS) {
+   return net->nf.hooks[reg->pf]+reg->hooknum;
+
 #ifdef CONFIG_NETFILTER_INGRESS
+   if (reg->hooknum == NF_NETDEV_INGRESS) {
if (reg->dev && dev_net(reg->dev) == net)
-   hook_head =
-   nf_entry_dereference(
-   reg->dev->nf_hooks_ingress);
-#endif
+   return >dev->nf_hooks_ingress;
}
-   return hook_head;
-}
-
-/* must hold nf_hook_mutex */
-static void nf_set_hooks_head(struct net *net, const struct nf_hook_ops *reg,
- struct nf_hook_entry *entry)
-{
-   switch (reg->pf) {
-   case NFPROTO_NETDEV:
-#ifdef CONFIG_NETFILTER_INGRESS
-   /* We already checked in nf_register_net_hook() that this is
-* used from ingress.
-*/
-   rcu_assign_pointer(reg->dev->nf_hooks_ingress, entry);
 #endif
-   break;
-   default:
-   rcu_assign_pointer(net->nf.hooks[reg->pf][reg->hooknum],
-  entry);
-   break;
-   }
+   return NULL;
 }
 
 int nf_register_net_hook(struct net *net, const struct nf_hook_ops *reg)
 {
-   struct nf_hook_entry *hooks_entry;
-   struct nf_hook_entry *entry;
+   struct nf_hook_entry __rcu **pp;
+   struct nf_hook_entry *entry, *p;
 
if (reg->pf == NFPROTO_NETDEV) {
 #ifndef CONFIG_NETFILTER_INGRESS
@@ -119,6 +94,10 @@ int nf_register_net_hook(struct net *net, const struct 
nf_hook_ops *reg)
return -EINVAL;
}
 
+   pp = nf_hook_entry_head(net, reg);
+   if (!pp)
+   return -EINVAL;
+
entry = kmalloc(sizeof(*entry), GFP_KER

Re: slab corruption with current -git (was Re: [git pull] vfs pile 1 (splice))

2016-10-09 Thread Linus Torvalds
On Sun, Oct 9, 2016 at 6:35 PM, Aaron Conole  wrote:
>
> I was just about to build and test something similar:

So I haven't actually tested that one, but looking at the code, it
really looks very bogus. In fact, that code just looks like crap. It
does *not* do a proper "remove singly linked list entry". It's exactly
the kind of code that I rail against, and that people should never
write.

Any code that can't even traverse a linked list is not worth looking at.

There is one *correct* way to remove an entry from a singly linked
list, and it looks like this:

struct entry **pp, *p;

pp = 
while ((p = *pp) != NULL) {
if (right_entry(p)) {
*pp = p->next;
break;
}
pp = >next;
}

and that's it. Nothing else. The above code exits the loop with "p"
containing the entry that was removed, or NULL if nothing was. It
can't get any simpler than that, but more importantly, anything more
complicated than that is WRONG.

Seriously, nothing else is acceptable. In particular, any linked list
traversal that makes a special case of the first entry or the last
entry should not be allowed to exist. Note how there is not a single
special case in the above correct code. It JustWorks(tm).

That nf_unregister_net_hook() code has all the signs of exactly that
kind of broken list-handling code: special-casing the head of the
loop, and having the loop condition test both current and that odd
"next to next" pointer etc. It's all very very wrong.

So I really see two options:

 - do that singly-linked list traversal right (and I'm serious:
nothing but the code above can ever be right)

 - don't make up your own list handling code at all, and use the
standard linux list code.

So either e3b37f11e6e4 needs to be reverted, or it needs to be taught
to use real list handling.  If the code doesn't want to use the
regular list.h (either the doubly linked one, or the hlist one), it
needs to at least learn to do list removal right.

   Linus
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


slab corruption with current -git (was Re: [git pull] vfs pile 1 (splice))

2016-10-09 Thread Linus Torvalds
On Sun, Oct 9, 2016 at 12:11 PM, Linus Torvalds
<torva...@linux-foundation.org> wrote:
>
> Anyway, I don't think I can bisect it, but I'll try to narrow it down
> a *bit* at least.
>
> Not doing any more pulls on this unstable base, I've been puttering
> around in trying to clean up some stupid printk logging issues
> instead.

So I finally got a oops with slub debugging enabled. It doesn't really
narrow things down, though, it kind of extends on the possible
suspects. Now adding David Miller and Pablo, because it looks like it
may be netfilter that does something bad and corrupts memory.

Of course, maybe this is another symptom, and not the root cause for
my troubles, but it does look like it might be getting closer to the
cause... In particular, now it very much looks like a use-after-free
in the netfilter code, which *could* explain my original symptom with
later allocation users oopsing randomly.

Without further ado, here's the new oops:

   general protection fault:  [#1] SMP
   CPU: 7 PID: 169 Comm: kworker/u16:7 Not tainted 4.8.0-11288-gb66484cd7470 #1
   Hardware name: System manufacturer System Product Name/Z170-K, BIOS
1803 05/06/2016
   Workqueue: netns cleanup_net
   task: 91935e001fc0 task.stack: b4e2c213c000
   RIP: nf_unregister_net_hook+0x5f/0x190
   RSP: :b4e2c213fd40  EFLAGS: 00010202
   RAX: 6b6b6b6b6b6b6b6b RBX: 91933c4ab968 RCX: 0002
   RDX: 0002 RSI: c0642280 RDI: 91cf9820
   RBP: b4e2c213fd58 R08: 91933c4a86c8 R09: 0025
   R10: 00cc R11: 91935dd22000 R12: c0642280
   R13: 91934cc0ea80 R14: 91cf97e0 R15: 
   FS:  () GS:919376dc() knlGS:
   CS:  0010 DS:  ES:  CR0: 80050033
   CR2: 03e7c000 CR3: 0003fdb62000 CR4: 003406e0
   DR0:  DR1:  DR2: 
   DR3:  DR6: fffe0ff0 DR7: 0400
   Call Trace:
 netfilter_net_exit+0x2f/0x60
 ops_exit_list.isra.4+0x38/0x60
 cleanup_net+0x1ba/0x2a0
 process_one_work+0x1f1/0x480
 worker_thread+0x48/0x4d0
 ? process_one_work+0x480/0x480
 ? process_one_work+0x480/0x480
 kthread+0xd9/0xf0
 ? kthread_park+0x60/0x60
 ret_from_fork+0x22/0x30
   Code: 0f b6 ca 48 8d 84 c8 00 01 00 00 49 8b 5c c5 00 48 85 db 0f
84 cb 00 00 00 4c 3b 63 40 48 8b 03 0f 84 e9 00 00 00 48 85 c0 74 26
<4c> 3b 60 40 75 08 e9 ef 00 00 00 48 89 d8 48 8b 18 48 85 db 0f
   RIP  [] nf_unregister_net_hook+0x5f/0x190

and note the value in %rax: 6b is POISON_FREE, so it very much looks
like it's a pointer loaded from a free'd allocation.

The code disassembles to

   0: 0f b6 ca movzbl %dl,%ecx
   3: 48 8d 84 c8 00 01 00 lea0x100(%rax,%rcx,8),%rax
   a: 00
   b: 49 8b 5c c5 00   mov0x0(%r13,%rax,8),%rbx
  10: 48 85 db test   %rbx,%rbx
  13: 0f 84 cb 00 00 00 je 0xe4
  19: 4c 3b 63 40   cmp0x40(%rbx),%r12
  1d: 48 8b 03 mov(%rbx),%rax
  20: 0f 84 e9 00 00 00 je 0x10f
  26: 48 85 c0 test   %rax,%rax
  29: 74 26 je 0x51
  2b:* 4c 3b 60 40   cmp0x40(%rax),%r12 <-- trapping instruction
  2f: 75 08 jne0x39
  31: e9 ef 00 00 00   jmpq   0x125
  36: 48 89 d8 mov%rbx,%rax
  39: 48 8b 18 mov(%rax),%rbx
  3c: 48 85 db test   %rbx,%rbx

and that oopsing instruction seems to be the compare of
"hooks_entry->orig_ops" from hooks_entry in this expression:

if (hooks_entry && hooks_entry->orig_ops == reg) {

so hooks_entry() is bogus. It was gotten from

hooks_entry = nf_hook_entry_head(net, reg);

but that's as far as I dug. And yes, I do have
CONFIG_NETFILTER_INGRESS=y in case that matters.

And all this code has changed pretty radically in commit e3b37f11e6e4
("netfilter: replace list_head with single linked list"), and there
was clearly already something wrong with that code, with commit
5119e4381a90 ("netfilter: Fix potential null pointer dereference")
adding the test against NULL. But I suspect that only hid the "oops,
it's actually not NULL, it loaded some uninitialized value" problem.

Over to the networking guys.. Ideas?

   Linus
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [GIT] Networking

2016-10-05 Thread Linus Torvalds
On Wed, Oct 5, 2016 at 5:52 PM, Stephen Rothwell  wrote:
>
> Except that commit effectively moved that function from
> net/netfilter/nf_tables_netdev.c to
> include/net/netfilter/nf_tables_ipv4.h while commit c73c24849011
> ("netfilter: nf_tables_netdev: remove redundant ip_hdr assignment")
> removed the assignment in the original file (and has been in your tree
> since v4.8-rc7) and that is where I originally actually got a conflict.

Oh, interesting. Why didn't I get the conflict there then?

I'm guessing (but too lazy to actually look up the history), that
David ended up doing that merge and that ends up being why I never saw
a conflict.

   Linus
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [GIT] Networking

2016-10-05 Thread Linus Torvalds
On Wed, Oct 5, 2016 at 3:29 PM, Stephen Rothwell  wrote:
>
> I have been carrying the following merge fix patch (for the merge of
> the net-next tree with Linus' tree) for a while now which seems to have
> got missed:

Ugh. It doesn't seem to be a merge error, because that double iph
assignment came from the original patch that introduced this function:
commit ddc8b6027ad0 ("netfilter: introduce nft_set_pktinfo_{ipv4,
ipv6}_validate()").

So I wouldn't call it a merge error - it just looks like a bug in the
network layer. So I'm not going to apply your patch even though it
looks plausible to me, simply because it's outside my area of
expertise.

David? Pablo?

   Linus
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][PATCH 2/3] locking: Annotate spin_unlock_wait() users

2016-05-24 Thread Linus Torvalds
On Tue, May 24, 2016 at 7:27 AM, Peter Zijlstra  wrote:
> spin_unlock_wait() has an unintuitive 'feature' in that it doesn't
> fully serialize against the spin_unlock() we've waited on.

NAK.

We don't start adding more of this "after_ctrl_dep" crap.

It's completely impossible to understand, and even people who have
been locking experts have gotten it wrong.

So it is *completely* unacceptable to have it in drivers.

This needs to be either hidden inside the basic spinlock functions,
_or_ it needs to be a clear and unambiguous interface. Anything that
starts talking about control dependencies is not it.

Note that this really is about naming and use, not about
implementation. So something like "spin_sync_after_unlock_wait()" is
acceptable, even if the actual _implementation_ were to be exactly the
same as the "after_ctrl_dep()" crap.

The difference is that one talks about incomprehensible implementation
details that nobody outside of the person who *implemented* the
spinlock code is supposed to understand (and seriously, I have my
doubts even the spinlock implementer understands it, judging by the
last time this happened), and the other is a much simpler semantic
guarantee.

So don't talk about "acquire". And most certainly don't talk about
"control dependencies". Not if we end up having things like *drivers*
using this like in this example libata.

Linus
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html