Re: slab corruption with current -git

2016-10-09 Thread David Miller
From: Linus Torvalds 
Date: Sun, 9 Oct 2016 20:41:17 -0700

> Note that the "correct way" of doing list operations also almost
> inevitably is the shortest way by far, since it gets rid of all the
> special cases. So the patch looks nice. It gets rid of the magic
> "nf_set_hooks_head()" thing too, because once you do list following
> right, the head is no different from any other pointer in the list.

Perhaps we should have some "slist" primitives added to
include/linux/list.h but since the comparison differs for each user I
guess it's hard to abstract in a way that's generic and inlines
properly.

I'll start taking a look at your patch and this stuff as well, thanks
Linus.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: slab corruption with current -git (was Re: [git pull] vfs pile 1 (splice))

2016-10-09 Thread Linus Torvalds
On Sun, Oct 9, 2016 at 7:49 PM, Linus Torvalds
 wrote:
>
> There is one *correct* way to remove an entry from a singly linked
> list, and it looks like this:
>
> struct entry **pp, *p;
>
> pp = &head;
> while ((p = *pp) != NULL) {
> if (right_entry(p)) {
> *pp = p->next;
> break;
> }
> pp = &p->next;
> }
>
> and that's it. Nothing else.

This COMPLETELY UNTESTED patch tries to fix the nf_hook_entry code to do this.

I repeat: it's ENTIRELY UNTESTED. I just converted the insertion and
deletion to the proper pattern, but I could easily have gotten the
insertion priority test the wrong way around entirely, for example. Or
it could simply have some other completely broken bug in it. It
compiles for me, but that's all I actually checked.

Note that the "correct way" of doing list operations also almost
inevitably is the shortest way by far, since it gets rid of all the
special cases. So the patch looks nice. It gets rid of the magic
"nf_set_hooks_head()" thing too, because once you do list following
right, the head is no different from any other pointer in the list.

So the patch stats look good:

 net/netfilter/core.c | 108 ---
 1 file changed, 33 insertions(+), 75 deletions(-)

but again, it's entirely *entirely* untested. Please consider this
just a "this is generally how list insert/delete operations should be
done, avoiding special cases for the first entry".

ALSO NOTE! The code assumes that the "nf_hook_mutex" locking only
protects the actual *lists*, and that the address to the list can be
looked up without holding the lock. That's generally how things are
done, and it simplifies error handling (because you can do the "there
is no such list at all" test before you do anything else. But again, I
don't actually know the code, and if there is something that actually
expands the number of lists etc that depends on that mutex, then the
list head lookup may need to be inside the lock too.

   Linus
 net/netfilter/core.c | 108 ---
 1 file changed, 33 insertions(+), 75 deletions(-)

diff --git a/net/netfilter/core.c b/net/netfilter/core.c
index c9d90eb64046..814258641fcc 100644
--- a/net/netfilter/core.c
+++ b/net/netfilter/core.c
@@ -65,49 +65,24 @@ static DEFINE_MUTEX(nf_hook_mutex);
 #define nf_entry_dereference(e) \
rcu_dereference_protected(e, lockdep_is_held(&nf_hook_mutex))
 
-static struct nf_hook_entry *nf_hook_entry_head(struct net *net,
-   const struct nf_hook_ops *reg)
+static struct nf_hook_entry __rcu **nf_hook_entry_head(struct net *net, const 
struct nf_hook_ops *reg)
 {
-   struct nf_hook_entry *hook_head = NULL;
-
if (reg->pf != NFPROTO_NETDEV)
-   hook_head = nf_entry_dereference(net->nf.hooks[reg->pf]
-[reg->hooknum]);
-   else if (reg->hooknum == NF_NETDEV_INGRESS) {
+   return net->nf.hooks[reg->pf]+reg->hooknum;
+
 #ifdef CONFIG_NETFILTER_INGRESS
+   if (reg->hooknum == NF_NETDEV_INGRESS) {
if (reg->dev && dev_net(reg->dev) == net)
-   hook_head =
-   nf_entry_dereference(
-   reg->dev->nf_hooks_ingress);
-#endif
+   return ®->dev->nf_hooks_ingress;
}
-   return hook_head;
-}
-
-/* must hold nf_hook_mutex */
-static void nf_set_hooks_head(struct net *net, const struct nf_hook_ops *reg,
- struct nf_hook_entry *entry)
-{
-   switch (reg->pf) {
-   case NFPROTO_NETDEV:
-#ifdef CONFIG_NETFILTER_INGRESS
-   /* We already checked in nf_register_net_hook() that this is
-* used from ingress.
-*/
-   rcu_assign_pointer(reg->dev->nf_hooks_ingress, entry);
 #endif
-   break;
-   default:
-   rcu_assign_pointer(net->nf.hooks[reg->pf][reg->hooknum],
-  entry);
-   break;
-   }
+   return NULL;
 }
 
 int nf_register_net_hook(struct net *net, const struct nf_hook_ops *reg)
 {
-   struct nf_hook_entry *hooks_entry;
-   struct nf_hook_entry *entry;
+   struct nf_hook_entry __rcu **pp;
+   struct nf_hook_entry *entry, *p;
 
if (reg->pf == NFPROTO_NETDEV) {
 #ifndef CONFIG_NETFILTER_INGRESS
@@ -119,6 +94,10 @@ int nf_register_net_hook(struct net *net, const struct 
nf_hook_ops *reg)
return -EINVAL;
}
 
+   pp = nf_hook_entry_head(net, reg);
+   if (!pp)
+   return -EINVAL;
+
entry = kmalloc(sizeof(*entry), GFP_KERNEL);
if (!entry)
return -ENOMEM;
@@ -128,26 +107,15 @@ int nf_register_net_hook(struct net *net, const struct 
nf_hook_ops *reg)
entry->next = NULL;
 
mutex_lock(&nf

Re: slab corruption with current -git (was Re: [git pull] vfs pile 1 (splice))

2016-10-09 Thread Linus Torvalds
On Sun, Oct 9, 2016 at 6:35 PM, Aaron Conole  wrote:
>
> I was just about to build and test something similar:

So I haven't actually tested that one, but looking at the code, it
really looks very bogus. In fact, that code just looks like crap. It
does *not* do a proper "remove singly linked list entry". It's exactly
the kind of code that I rail against, and that people should never
write.

Any code that can't even traverse a linked list is not worth looking at.

There is one *correct* way to remove an entry from a singly linked
list, and it looks like this:

struct entry **pp, *p;

pp = &head;
while ((p = *pp) != NULL) {
if (right_entry(p)) {
*pp = p->next;
break;
}
pp = &p->next;
}

and that's it. Nothing else. The above code exits the loop with "p"
containing the entry that was removed, or NULL if nothing was. It
can't get any simpler than that, but more importantly, anything more
complicated than that is WRONG.

Seriously, nothing else is acceptable. In particular, any linked list
traversal that makes a special case of the first entry or the last
entry should not be allowed to exist. Note how there is not a single
special case in the above correct code. It JustWorks(tm).

That nf_unregister_net_hook() code has all the signs of exactly that
kind of broken list-handling code: special-casing the head of the
loop, and having the loop condition test both current and that odd
"next to next" pointer etc. It's all very very wrong.

So I really see two options:

 - do that singly-linked list traversal right (and I'm serious:
nothing but the code above can ever be right)

 - don't make up your own list handling code at all, and use the
standard linux list code.

So either e3b37f11e6e4 needs to be reverted, or it needs to be taught
to use real list handling.  If the code doesn't want to use the
regular list.h (either the doubly linked one, or the hlist one), it
needs to at least learn to do list removal right.

   Linus
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: slab corruption with current -git (was Re: [git pull] vfs pile 1 (splice))

2016-10-09 Thread Aaron Conole
Florian Westphal  writes:

> Linus Torvalds  wrote:
>> On Sun, Oct 9, 2016 at 12:11 PM, Linus Torvalds
>>  wrote:
>> >
>> > Anyway, I don't think I can bisect it, but I'll try to narrow it down
>> > a *bit* at least.
>> >
>> > Not doing any more pulls on this unstable base, I've been puttering
>> > around in trying to clean up some stupid printk logging issues
>> > instead.
>> 
>> So I finally got a oops with slub debugging enabled. It doesn't really
>> narrow things down, though, it kind of extends on the possible
>> suspects. Now adding David Miller and Pablo, because it looks like it
>> may be netfilter that does something bad and corrupts memory.
>
> Quite possible, the netns interactions are not nice :-/
>
>> Without further ado, here's the new oops:
>> 
>>general protection fault:  [#1] SMP
>>CPU: 7 PID: 169 Comm: kworker/u16:7 Not tainted
>> 4.8.0-11288-gb66484cd7470 #1
>>Hardware name: System manufacturer System Product Name/Z170-K, BIOS
> ..
>>Call Trace:
>>  netfilter_net_exit+0x2f/0x60
>>  ops_exit_list.isra.4+0x38/0x60
>>  cleanup_net+0x1ba/0x2a0
>>  process_one_work+0x1f1/0x480
>>  worker_thread+0x48/0x4d0
>>  ? process_one_work+0x480/0x480
>
> ..
>
>> like it's a pointer loaded from a free'd allocation.
>> 
>> The code disassembles to
>> 
>>0: 0f b6 ca movzbl %dl,%ecx
>>3: 48 8d 84 c8 00 01 00 lea0x100(%rax,%rcx,8),%rax
>>a: 00
>>b: 49 8b 5c c5 00   mov0x0(%r13,%rax,8),%rbx
>>   10: 48 85 db test   %rbx,%rbx
>>   13: 0f 84 cb 00 00 00 je 0xe4
>>   19: 4c 3b 63 40   cmp0x40(%rbx),%r12
>>   1d: 48 8b 03 mov(%rbx),%rax
>>   20: 0f 84 e9 00 00 00 je 0x10f
>>   26: 48 85 c0 test   %rax,%rax
>>   29: 74 26 je 0x51
>>   2b:* 4c 3b 60 40   cmp0x40(%rax),%r12 <-- trapping instruction
>>   2f: 75 08 jne0x39
>>   31: e9 ef 00 00 00   jmpq   0x125
>>   36: 48 89 d8 mov%rbx,%rax
>>   39: 48 8b 18 mov(%rax),%rbx
>>   3c: 48 85 db test   %rbx,%rbx
>> 
>> and that oopsing instruction seems to be the compare of
>> "hooks_entry->orig_ops" from hooks_entry in this expression:
>> 
>> if (hooks_entry && hooks_entry->orig_ops == reg) {
>> 
>> so hooks_entry() is bogus. It was gotten from
>> 
>> hooks_entry = nf_hook_entry_head(net, reg);
>> 
>> but that's as far as I dug. And yes, I do have
>> CONFIG_NETFILTER_INGRESS=y in case that matters.
>> 
>> And all this code has changed pretty radically in commit e3b37f11e6e4
>> ("netfilter: replace list_head with single linked list"), and there
>> was clearly already something wrong with that code, with commit
>> 5119e4381a90 ("netfilter: Fix potential null pointer dereference")
>> adding the test against NULL. But I suspect that only hid the "oops,
>> it's actually not NULL, it loaded some uninitialized value" problem.
>> 
>> Over to the networking guys.. Ideas?
>
> Sorry, not off the top of my head.
> Pablo is currently travelling back home from netdev 1.2 in Tokyo,
> I can help starting Wednesday when I am back.
>
> One shot in the dark (not even compile tested; wonder if we can end up
> zapping bogus hook ...)
>

I was just about to build and test something similar:

diff --git a/net/netfilter/core.c b/net/netfilter/core.c
index c9d90eb..e84103f 100644
--- a/net/netfilter/core.c
+++ b/net/netfilter/core.c
@@ -189,7 +189,7 @@ void nf_unregister_net_hook(struct net *net, const struct 
nf_hook_ops *reg)
 
 unlock:
mutex_unlock(&nf_hook_mutex);
-   if (!hooks_entry) {
+   if (!hooks_entry || hooks_entry->orig_ops != reg) {
WARN(1, "nf_unregister_net_hook: hook not found!\n");
return;
}
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: slab corruption with current -git (was Re: [git pull] vfs pile 1 (splice))

2016-10-09 Thread Florian Westphal
Linus Torvalds  wrote:
> On Sun, Oct 9, 2016 at 12:11 PM, Linus Torvalds
>  wrote:
> >
> > Anyway, I don't think I can bisect it, but I'll try to narrow it down
> > a *bit* at least.
> >
> > Not doing any more pulls on this unstable base, I've been puttering
> > around in trying to clean up some stupid printk logging issues
> > instead.
> 
> So I finally got a oops with slub debugging enabled. It doesn't really
> narrow things down, though, it kind of extends on the possible
> suspects. Now adding David Miller and Pablo, because it looks like it
> may be netfilter that does something bad and corrupts memory.

Quite possible, the netns interactions are not nice :-/

> Without further ado, here's the new oops:
> 
>general protection fault:  [#1] SMP
>CPU: 7 PID: 169 Comm: kworker/u16:7 Not tainted 4.8.0-11288-gb66484cd7470 
> #1
>Hardware name: System manufacturer System Product Name/Z170-K, BIOS
..
>Call Trace:
>  netfilter_net_exit+0x2f/0x60
>  ops_exit_list.isra.4+0x38/0x60
>  cleanup_net+0x1ba/0x2a0
>  process_one_work+0x1f1/0x480
>  worker_thread+0x48/0x4d0
>  ? process_one_work+0x480/0x480

..

> like it's a pointer loaded from a free'd allocation.
> 
> The code disassembles to
> 
>0: 0f b6 ca movzbl %dl,%ecx
>3: 48 8d 84 c8 00 01 00 lea0x100(%rax,%rcx,8),%rax
>a: 00
>b: 49 8b 5c c5 00   mov0x0(%r13,%rax,8),%rbx
>   10: 48 85 db test   %rbx,%rbx
>   13: 0f 84 cb 00 00 00 je 0xe4
>   19: 4c 3b 63 40   cmp0x40(%rbx),%r12
>   1d: 48 8b 03 mov(%rbx),%rax
>   20: 0f 84 e9 00 00 00 je 0x10f
>   26: 48 85 c0 test   %rax,%rax
>   29: 74 26 je 0x51
>   2b:* 4c 3b 60 40   cmp0x40(%rax),%r12 <-- trapping instruction
>   2f: 75 08 jne0x39
>   31: e9 ef 00 00 00   jmpq   0x125
>   36: 48 89 d8 mov%rbx,%rax
>   39: 48 8b 18 mov(%rax),%rbx
>   3c: 48 85 db test   %rbx,%rbx
> 
> and that oopsing instruction seems to be the compare of
> "hooks_entry->orig_ops" from hooks_entry in this expression:
> 
> if (hooks_entry && hooks_entry->orig_ops == reg) {
> 
> so hooks_entry() is bogus. It was gotten from
> 
> hooks_entry = nf_hook_entry_head(net, reg);
> 
> but that's as far as I dug. And yes, I do have
> CONFIG_NETFILTER_INGRESS=y in case that matters.
> 
> And all this code has changed pretty radically in commit e3b37f11e6e4
> ("netfilter: replace list_head with single linked list"), and there
> was clearly already something wrong with that code, with commit
> 5119e4381a90 ("netfilter: Fix potential null pointer dereference")
> adding the test against NULL. But I suspect that only hid the "oops,
> it's actually not NULL, it loaded some uninitialized value" problem.
> 
> Over to the networking guys.. Ideas?

Sorry, not off the top of my head.
Pablo is currently travelling back home from netdev 1.2 in Tokyo,
I can help starting Wednesday when I am back.

One shot in the dark (not even compile tested; wonder if we can end up
zapping bogus hook ...)

diff --git a/net/netfilter/core.c b/net/netfilter/core.c
index c9d90eb..fd6a2ce 100644
--- a/net/netfilter/core.c
+++ b/net/netfilter/core.c
@@ -189,6 +189,9 @@ void nf_unregister_net_hook(struct net *net, const struct 
nf_hook_ops *reg)
 
 unlock:
mutex_unlock(&nf_hook_mutex);
+
+   WARN_ON(hooks_entry && hooks_entry->orig_ops != reg);
+
if (!hooks_entry) {
WARN(1, "nf_unregister_net_hook: hook not found!\n");
return;

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


slab corruption with current -git (was Re: [git pull] vfs pile 1 (splice))

2016-10-09 Thread Linus Torvalds
On Sun, Oct 9, 2016 at 12:11 PM, Linus Torvalds
 wrote:
>
> Anyway, I don't think I can bisect it, but I'll try to narrow it down
> a *bit* at least.
>
> Not doing any more pulls on this unstable base, I've been puttering
> around in trying to clean up some stupid printk logging issues
> instead.

So I finally got a oops with slub debugging enabled. It doesn't really
narrow things down, though, it kind of extends on the possible
suspects. Now adding David Miller and Pablo, because it looks like it
may be netfilter that does something bad and corrupts memory.

Of course, maybe this is another symptom, and not the root cause for
my troubles, but it does look like it might be getting closer to the
cause... In particular, now it very much looks like a use-after-free
in the netfilter code, which *could* explain my original symptom with
later allocation users oopsing randomly.

Without further ado, here's the new oops:

   general protection fault:  [#1] SMP
   CPU: 7 PID: 169 Comm: kworker/u16:7 Not tainted 4.8.0-11288-gb66484cd7470 #1
   Hardware name: System manufacturer System Product Name/Z170-K, BIOS
1803 05/06/2016
   Workqueue: netns cleanup_net
   task: 91935e001fc0 task.stack: b4e2c213c000
   RIP: nf_unregister_net_hook+0x5f/0x190
   RSP: :b4e2c213fd40  EFLAGS: 00010202
   RAX: 6b6b6b6b6b6b6b6b RBX: 91933c4ab968 RCX: 0002
   RDX: 0002 RSI: c0642280 RDI: 91cf9820
   RBP: b4e2c213fd58 R08: 91933c4a86c8 R09: 0025
   R10: 00cc R11: 91935dd22000 R12: c0642280
   R13: 91934cc0ea80 R14: 91cf97e0 R15: 
   FS:  () GS:919376dc() knlGS:
   CS:  0010 DS:  ES:  CR0: 80050033
   CR2: 03e7c000 CR3: 0003fdb62000 CR4: 003406e0
   DR0:  DR1:  DR2: 
   DR3:  DR6: fffe0ff0 DR7: 0400
   Call Trace:
 netfilter_net_exit+0x2f/0x60
 ops_exit_list.isra.4+0x38/0x60
 cleanup_net+0x1ba/0x2a0
 process_one_work+0x1f1/0x480
 worker_thread+0x48/0x4d0
 ? process_one_work+0x480/0x480
 ? process_one_work+0x480/0x480
 kthread+0xd9/0xf0
 ? kthread_park+0x60/0x60
 ret_from_fork+0x22/0x30
   Code: 0f b6 ca 48 8d 84 c8 00 01 00 00 49 8b 5c c5 00 48 85 db 0f
84 cb 00 00 00 4c 3b 63 40 48 8b 03 0f 84 e9 00 00 00 48 85 c0 74 26
<4c> 3b 60 40 75 08 e9 ef 00 00 00 48 89 d8 48 8b 18 48 85 db 0f
   RIP  [] nf_unregister_net_hook+0x5f/0x190

and note the value in %rax: 6b is POISON_FREE, so it very much looks
like it's a pointer loaded from a free'd allocation.

The code disassembles to

   0: 0f b6 ca movzbl %dl,%ecx
   3: 48 8d 84 c8 00 01 00 lea0x100(%rax,%rcx,8),%rax
   a: 00
   b: 49 8b 5c c5 00   mov0x0(%r13,%rax,8),%rbx
  10: 48 85 db test   %rbx,%rbx
  13: 0f 84 cb 00 00 00 je 0xe4
  19: 4c 3b 63 40   cmp0x40(%rbx),%r12
  1d: 48 8b 03 mov(%rbx),%rax
  20: 0f 84 e9 00 00 00 je 0x10f
  26: 48 85 c0 test   %rax,%rax
  29: 74 26 je 0x51
  2b:* 4c 3b 60 40   cmp0x40(%rax),%r12 <-- trapping instruction
  2f: 75 08 jne0x39
  31: e9 ef 00 00 00   jmpq   0x125
  36: 48 89 d8 mov%rbx,%rax
  39: 48 8b 18 mov(%rax),%rbx
  3c: 48 85 db test   %rbx,%rbx

and that oopsing instruction seems to be the compare of
"hooks_entry->orig_ops" from hooks_entry in this expression:

if (hooks_entry && hooks_entry->orig_ops == reg) {

so hooks_entry() is bogus. It was gotten from

hooks_entry = nf_hook_entry_head(net, reg);

but that's as far as I dug. And yes, I do have
CONFIG_NETFILTER_INGRESS=y in case that matters.

And all this code has changed pretty radically in commit e3b37f11e6e4
("netfilter: replace list_head with single linked list"), and there
was clearly already something wrong with that code, with commit
5119e4381a90 ("netfilter: Fix potential null pointer dereference")
adding the test against NULL. But I suspect that only hid the "oops,
it's actually not NULL, it loaded some uninitialized value" problem.

Over to the networking guys.. Ideas?

   Linus
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH nf] netfilter: nft_dynset: fix element timeout for HZ != 1000

2016-10-09 Thread Anders K. Pedersen | Cohaesio
From: Anders K. Pedersen 

With HZ=100 element timeout in dynamic sets (i.e. flow tables) is 10 times
higher than configured.

Add proper conversion to/from jiffies, when interacting with userspace.

I tested this on Linux 4.8.1, and it applies cleanly to current nf and
nf-next trees.

Fixes: 22fe54d5fefc ("netfilter: nf_tables: add support for dynamic set 
updates")
Signed-off-by: Anders K. Pedersen 
---

--- a/net/netfilter/nft_dynset.c2016-10-03 01:24:33.0 +0200
+++ b/net/netfilter/nft_dynset.c2016-10-09 14:39:48.519488167 +0200
@@ -143,7 +143,8 @@ static int nft_dynset_init(const struct
if (tb[NFTA_DYNSET_TIMEOUT] != NULL) {
if (!(set->flags & NFT_SET_TIMEOUT))
return -EINVAL;
-   timeout = be64_to_cpu(nla_get_be64(tb[NFTA_DYNSET_TIMEOUT]));
+   timeout = msecs_to_jiffies(be64_to_cpu(nla_get_be64(
+   tb[NFTA_DYNSET_TIMEOUT])));
}
 
priv->sreg_key = nft_parse_register(tb[NFTA_DYNSET_SREG_KEY]);
@@ -230,7 +231,8 @@ static int nft_dynset_dump(struct sk_buf
goto nla_put_failure;
if (nla_put_string(skb, NFTA_DYNSET_SET_NAME, priv->set->name))
goto nla_put_failure;
-   if (nla_put_be64(skb, NFTA_DYNSET_TIMEOUT, cpu_to_be64(priv->timeout),
+   if (nla_put_be64(skb, NFTA_DYNSET_TIMEOUT,
+cpu_to_be64(jiffies_to_msecs(priv->timeout)),
 NFTA_DYNSET_PAD))
goto nla_put_failure;
if (priv->expr && nft_expr_dump(skb, NFTA_DYNSET_EXPR, priv->expr))