Re: mm/sched/net: BUG when running simple code

2014-07-08 Thread Sasha Levin
On 07/08/2014 10:51 AM, Peter Zijlstra wrote:
> On Thu, Jun 12, 2014 at 10:56:16PM -0400, Sasha Levin wrote:
>> Hi all,
>> 
>> Okay, I'm really lost. I got the following when fuzzing, and can't really
>> explain what's going on. It seems that we get an "unable to handle kernel
>> paging request" when running rather simple code, and I can't figure out how
>> it would cause it.
>>
>
> Are you running on AMD hardware? If so, check out this thread:
> 
> http://marc.info/?i=53b02ceb.7010...@web.de
> 

Unfortunately (luckily?) it's all Intel over here.


Thanks,
Sasha
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: mm/sched/net: BUG when running simple code

2014-07-08 Thread Peter Zijlstra
On Thu, Jun 12, 2014 at 10:56:16PM -0400, Sasha Levin wrote:
> Hi all,
> 
> Okay, I'm really lost. I got the following when fuzzing, and can't really
> explain what's going on. It seems that we get an "unable to handle kernel
> paging request" when running rather simple code, and I can't figure out how
> it would cause it.
>

Are you running on AMD hardware? If so, check out this thread:

  http://marc.info/?i=53b02ceb.7010...@web.de


Re: mm/sched/net: BUG when running simple code

2014-06-16 Thread Dan Aloni
On Mon, Jun 16, 2014 at 11:17:55PM -0400, Sasha Levin wrote:
> On 06/13/2014 12:13 AM, Dave Jones wrote:
> > On Fri, Jun 13, 2014 at 12:01:37AM -0400, Sasha Levin wrote:
> > another theory: Trinity can sometimes generate plausible looking module
> > addresses and pass those in structs etc.
> > 
> > I wonder if there's somewhere in that path that isn't checking that the
> > address in the optval it got is actually a userspace address before it
> > tries to write to it.
> 
> It happened again, and this time I've left the kernel addresses in, and it's
> quite interesting:
> 
> [   88.837926] Call Trace:
> [   88.837926]  [<ffffffff9ff6a792>] __sock_create+0x292/0x3c0
> [   88.837926]  [<ffffffff9ff6a610>] ? __sock_create+0x110/0x3c0
> [   88.837926]  [<ffffffff9ff6a920>] sock_create+0x30/0x40
> [   88.837926]  [<ffffffff9ff6ad4c>] SyS_socket+0x2c/0x70
> [   88.837926]  [<ffffffffa0561c30>] ? tracesys+0x7e/0xe6
> [   88.837926]  [<ffffffffa0561c93>] tracesys+0xe1/0xe6
> 
> tracesys() seems to live inside a module space here?

I think it's more likely kASLR. The Documentation/x86/x86_64/mm.txt doc needs
updating.

-- 
Dan Aloni


Re: mm/sched/net: BUG when running simple code

2014-06-16 Thread Sasha Levin
On 06/13/2014 12:13 AM, Dave Jones wrote:
> On Fri, Jun 13, 2014 at 12:01:37AM -0400, Sasha Levin wrote:
>  > On 06/12/2014 11:27 PM, Dan Aloni wrote:
>  > > On Thu, Jun 12, 2014 at 10:56:16PM -0400, Sasha Levin wrote:
>  > >> > Hi all,
>  > >> > 
>  > >> > Okay, I'm really lost. I got the following when fuzzing, and can't really explain what's
>  > >> > going on. It seems that we get an "unable to handle kernel paging request" when running
>  > >> > rather simple code, and I can't figure out how it would cause it.
>  > > [..]
>  > >> > Which agrees with the trace I got:
>  > >> > 
>  > >> > [  516.309720] BUG: unable to handle kernel paging request at ffffffffa0f12560
>  > >> > [  516.309720] IP: netlink_getsockopt (net/netlink/af_netlink.c:2271)
>  > > [..]
>  > >> > [  516.309720] RIP netlink_getsockopt (net/netlink/af_netlink.c:2271)
>  > >> > [  516.309720]  RSP <ffff8803fc85fed8>
>  > >> > [  516.309720] CR2: ffffffffa0f12560
>  > >> > 
>  > >> > The only theory I had so far is that netlink is a module, and has gone away while the code
>  > >> > was executing, but netlink isn't a module on my kernel.
>  > > The RIP - 0xffffffffa0f12560 is in the range (from Documentation/x86/x86_64/mm.txt):
>  > > 
>  > > ffffffffa0000000 - ffffffffff5fffff (=1525 MB) module mapping space
>  > > 
>  > > So seems it was in a module.
>  > 
>  > Yup, that's why that theory came up, but when I checked my config:
>  > ... 
>  > that theory went away. (also confirmed by not finding a netlink module.)
>  > 
>  > What about the kernel .text overflowing into the modules space? The loader
>  > checks for that, but can something like that happen after everything is
>  > up and running? I'll look into that tomorrow.
> 
> another theory: Trinity can sometimes generate plausible looking module
> addresses and pass those in structs etc.
> 
> I wonder if there's somewhere in that path that isn't checking that the
> address in the optval it got is actually a userspace address before it tries
> to write to it.

It happened again, and this time I've left the kernel addresses in, and it's
quite interesting:

[   88.837926] Call Trace:
[   88.837926]  [<ffffffff9ff6a792>] __sock_create+0x292/0x3c0
[   88.837926]  [<ffffffff9ff6a610>] ? __sock_create+0x110/0x3c0
[   88.837926]  [<ffffffff9ff6a920>] sock_create+0x30/0x40
[   88.837926]  [<ffffffff9ff6ad4c>] SyS_socket+0x2c/0x70
[   88.837926]  [<ffffffffa0561c30>] ? tracesys+0x7e/0xe6
[   88.837926]  [<ffffffffa0561c93>] tracesys+0xe1/0xe6

tracesys() seems to live inside a module space here?


Thanks,
Sasha


Re: mm/sched/net: BUG when running simple code

2014-06-13 Thread Sasha Levin
On 06/13/2014 12:13 AM, Dave Jones wrote:
> On Fri, Jun 13, 2014 at 12:01:37AM -0400, Sasha Levin wrote:
>  > On 06/12/2014 11:27 PM, Dan Aloni wrote:
>  > > On Thu, Jun 12, 2014 at 10:56:16PM -0400, Sasha Levin wrote:
>  > >> > Hi all,
>  > >> > 
>  > >> > Okay, I'm really lost. I got the following when fuzzing, and can't really explain what's
>  > >> > going on. It seems that we get an "unable to handle kernel paging request" when running
>  > >> > rather simple code, and I can't figure out how it would cause it.
>  > > [..]
>  > >> > Which agrees with the trace I got:
>  > >> > 
>  > >> > [  516.309720] BUG: unable to handle kernel paging request at ffffffffa0f12560
>  > >> > [  516.309720] IP: netlink_getsockopt (net/netlink/af_netlink.c:2271)
>  > > [..]
>  > >> > [  516.309720] RIP netlink_getsockopt (net/netlink/af_netlink.c:2271)
>  > >> > [  516.309720]  RSP <ffff8803fc85fed8>
>  > >> > [  516.309720] CR2: ffffffffa0f12560
>  > >> > 
>  > >> > The only theory I had so far is that netlink is a module, and has gone away while the code
>  > >> > was executing, but netlink isn't a module on my kernel.
>  > > The RIP - 0xffffffffa0f12560 is in the range (from Documentation/x86/x86_64/mm.txt):
>  > > 
>  > > ffffffffa0000000 - ffffffffff5fffff (=1525 MB) module mapping space
>  > > 
>  > > So seems it was in a module.
>  > 
>  > Yup, that's why that theory came up, but when I checked my config:
>  > ... 
>  > that theory went away. (also confirmed by not finding a netlink module.)
>  > 
>  > What about the kernel .text overflowing into the modules space? The loader
>  > checks for that, but can something like that happen after everything is
>  > up and running? I'll look into that tomorrow.
> 
> another theory: Trinity can sometimes generate plausible looking module
> addresses and pass those in structs etc.
> 
> I wonder if there's somewhere in that path that isn't checking that the
> address in the optval it got is actually a userspace address before it tries
> to write to it.

Thing is, the access happened way before touching optval. The only thing that
happened before that is reading optlen from userspace, and that was done with
get_user(), which should mean that it was safe.

According to that trace, we died when *executing* a piece of code, not when
accessing some other memory. None of the instructions around the one we failed
on touch memory at all, for that matter.


Thanks,
Sasha


Re: mm/sched/net: BUG when running simple code

2014-06-12 Thread Dan Aloni
On Fri, Jun 13, 2014 at 07:55:55AM +0300, Dan Aloni wrote:
> And also, the Oops code of 0003 (PF_WRITE and PF_USER) might hint at
> what Dave wrote.

Scratch what I wrote about that, it's PF_PROT | PF_WRITE.
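
For anyone following along, here is a small sketch of how an error code of 0003 decodes. The bit names follow the x86 page-fault error code flags (PF_PROT, PF_WRITE, PF_USER, PF_RSVD, PF_INSTR); the helper name `decode_error_code` is made up for illustration and is not a kernel function.

```python
# x86 page-fault error code bits, named as in the kernel's fault handling code.
PF_PROT = 1 << 0   # 0: page not present, 1: protection violation
PF_WRITE = 1 << 1  # 0: read access, 1: write access
PF_USER = 1 << 2   # 0: kernel-mode access, 1: user-mode access
PF_RSVD = 1 << 3   # reserved bit was set in a page-table entry
PF_INSTR = 1 << 4  # fault was caused by an instruction fetch

FLAGS = [
    ("PF_PROT", PF_PROT),
    ("PF_WRITE", PF_WRITE),
    ("PF_USER", PF_USER),
    ("PF_RSVD", PF_RSVD),
    ("PF_INSTR", PF_INSTR),
]


def decode_error_code(code):
    """Return the names of the flags set in a page-fault error code."""
    return [name for name, bit in FLAGS if code & bit]


if __name__ == "__main__":
    # Bits 0 and 1 are set in 0x0003; bit 2 (PF_USER) is not.
    print(" | ".join(decode_error_code(0x0003)))
```

PF_WRITE and PF_USER together would have been 0006, so 0003 points at a kernel-mode write that hit a protection violation.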

-- 
Dan Aloni


Re: mm/sched/net: BUG when running simple code

2014-06-12 Thread Dan Aloni
On Fri, Jun 13, 2014 at 07:55:55AM +0300, Dan Aloni wrote:
> > that theory went away. (also confirmed by not finding a netlink module.)
> > 
> > What about the kernel .text overflowing into the modules space? The loader
> > checks for that, but can something like that happen after everything is
> > up and running? I'll look into that tomorrow.
> 
> The kernel .text needs to be more than 512MB for the overlap to happen.
> 
> ffffffff80000000 - ffffffffa0000000 (=512 MB)  kernel text mapping, from phys 0
> 
> Also, it is bizarre that symbol resolution resolved ffffffffa0f12560 to
> a symbol that is in module space, where af_netlink.o surely is not, because of
> "obj-y := af_netlink.o" in the Makefile.
> 
> What does your /proc/kallsyms show when sorted with regards to the symbols
> in question?
> 
> Also curious are the addresses you have on the stack:
> 
> > [  516.309720] Stack:
> > [  516.309720]  ffff8803fc85ff18 ffff8803fc85ff18 ffff8803fc85fef8 8900200549908020
> > [  516.309720]  ffff8803fc85ff18 ffffffff9ff66470 ffff8803fc85ff18 0000000000000037
> > [  516.309720]  ffff8803fc85ff78 ffffffff9ff69d26 0000000000000037 0000000000000004
>[..]

Oh, I just remembered the new kASLR feature that got enabled recently. It
explains the addresses, but there was supposed to be a line for it in the
Oops, so I'm puzzled.
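
To illustrate why kASLR would explain the addresses, here is a rough sketch of the arithmetic. Both the link-time address and the offset below are made-up values chosen for illustration (they are not taken from this system), and the helper names are hypothetical. The only thing borrowed from the thread is the runtime address of tracesys, derived from `tracesys+0xe1` in the call trace and assuming the usual ffffffff prefix.

```python
MASK_64 = 0xffffffffffffffff

# Start of the module mapping space according to mm.txt.
MODULES_VADDR = 0xffffffffa0000000

# Hypothetical values for illustration only.
LINK_TIME_TRACESYS = 0xffffffff81561bb2  # made-up System.map address
KASLR_OFFSET = 0x1f000000                # made-up randomization offset


def runtime_address(link_addr, offset):
    """Address a symbol ends up at once the kernel is shifted by offset."""
    return (link_addr + offset) & MASK_64


def link_time_address(runtime_addr, offset):
    """Undo the shift to compare against System.map or vmlinux."""
    return (runtime_addr - offset) & MASK_64


if __name__ == "__main__":
    addr = runtime_address(LINK_TIME_TRACESYS, KASLR_OFFSET)
    print(hex(addr))
    # A built-in symbol can land at or above the documented start of the
    # module space once the whole kernel image has been shifted upwards.
    print(addr >= MODULES_VADDR)
```

With an offset of that size, ordinary built-in kernel text ends up past the boundary that mm.txt documents for modules, which would match what the traces show.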

-- 
Dan Aloni


Re: mm/sched/net: BUG when running simple code

2014-06-12 Thread Dan Aloni
On Fri, Jun 13, 2014 at 12:01:37AM -0400, Sasha Levin wrote:
> On 06/12/2014 11:27 PM, Dan Aloni wrote:
> > On Thu, Jun 12, 2014 at 10:56:16PM -0400, Sasha Levin wrote:
> >> > Hi all,
> >> > 
> >> > Okay, I'm really lost. I got the following when fuzzing, and can't really explain what's
> >> > going on. It seems that we get an "unable to handle kernel paging request" when running
> >> > rather simple code, and I can't figure out how it would cause it.
> > [..]
> >> > Which agrees with the trace I got:
> >> > 
> >> > [  516.309720] BUG: unable to handle kernel paging request at ffffffffa0f12560
> >> > [  516.309720] IP: netlink_getsockopt (net/netlink/af_netlink.c:2271)
> > [..]
> >> > [  516.309720] RIP netlink_getsockopt (net/netlink/af_netlink.c:2271)
> >> > [  516.309720]  RSP <ffff8803fc85fed8>
> >> > [  516.309720] CR2: ffffffffa0f12560
> >> > 
> >> > The only theory I had so far is that netlink is a module, and has gone away while the code
> >> > was executing, but netlink isn't a module on my kernel.
> > The RIP - 0xffffffffa0f12560 is in the range (from Documentation/x86/x86_64/mm.txt):
> > 
> > ffffffffa0000000 - ffffffffff5fffff (=1525 MB) module mapping space
> > 
> > So seems it was in a module.
> 
> Yup, that's why that theory came up, but when I checked my config:
> 
> $ cat .config | grep NETLINK
> CONFIG_COMPAT_NETLINK_MESSAGES=y
> CONFIG_NETFILTER_NETLINK=y
> CONFIG_NETFILTER_NETLINK_ACCT=y
> CONFIG_NETFILTER_NETLINK_QUEUE=y
> CONFIG_NETFILTER_NETLINK_LOG=y
> CONFIG_NF_CT_NETLINK=y
> CONFIG_NF_CT_NETLINK_TIMEOUT=y
> CONFIG_NF_CT_NETLINK_HELPER=y
> CONFIG_NETFILTER_NETLINK_QUEUE_CT=y
> CONFIG_NETLINK_MMAP=y
> CONFIG_NETLINK_DIAG=y
> CONFIG_SCSI_NETLINK=y
> CONFIG_QUOTA_NETLINK_INTERFACE=y
> 
> that theory went away. (also confirmed by not finding a netlink module.)
> 
> What about the kernel .text overflowing into the modules space? The loader
> checks for that, but can something like that happen after everything is
> up and running? I'll look into that tomorrow.

The kernel .text needs to be more than 512MB for the overlap to happen.

ffffffff80000000 - ffffffffa0000000 (=512 MB)  kernel text mapping, from phys 0

Also, it is bizarre that symbol resolution resolved ffffffffa0f12560 to
a symbol that is in module space, where af_netlink.o surely is not, because of
"obj-y := af_netlink.o" in the Makefile.

What does your /proc/kallsyms show when sorted with regards to the symbols
in question?
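
A minimal sketch of that kind of lookup, assuming the usual "address type name" line format of /proc/kallsyms. The sample lines are not real kallsyms output from this machine: the base addresses are back-computed from the symbol+offset pairs in the call trace posted later in the thread (assuming the usual ffffffff prefix), and the function names `parse_kallsyms` and `resolve` are made up for illustration.

```python
import bisect


def parse_kallsyms(text):
    """Parse kallsyms-style lines into (address, name) pairs sorted by address."""
    symbols = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        # fields[1] is the symbol type; module symbols carry a 4th "[mod]" field.
        symbols.append((int(fields[0], 16), fields[2]))
    symbols.sort()
    return symbols


def resolve(symbols, addr):
    """Return (name, offset) for the nearest symbol at or below addr, else None."""
    addrs = [a for a, _ in symbols]
    i = bisect.bisect_right(addrs, addr) - 1
    if i < 0:
        return None
    base, name = symbols[i]
    return name, addr - base


# Illustrative sample only, derived from the trace rather than read from a system.
SAMPLE = """\
ffffffffa0561bb2 T tracesys
ffffffff9ff6a500 T __sock_create
ffffffff9ff6a8f0 T sock_create
"""

if __name__ == "__main__":
    syms = parse_kallsyms(SAMPLE)
    name, off = resolve(syms, 0xffffffffa0561c93)
    print(f"{name}+{off:#x}")
```

Note that /proc/kallsyms shows zeroed addresses to unprivileged readers when kptr_restrict is in effect, so it needs to be read as root for this to be useful.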

Also curious are the addresses you have on the stack:

> [  516.309720] Stack:
> [  516.309720]  ffff8803fc85ff18 ffff8803fc85ff18 ffff8803fc85fef8 8900200549908020
> [  516.309720]  ffff8803fc85ff18 ffffffff9ff66470 ffff8803fc85ff18 0000000000000037
> [  516.309720]  ffff8803fc85ff78 ffffffff9ff69d26 0000000000000037 0000000000000004

0xffffffff9ff69d26 sits just below the beginning of the module mapping
space, at the very end of the kernel text mapping. Unless there are
some tricks on those mappings, they should be unused, or perhaps
CONFIG_DEBUG_PAGEALLOC is at play here?

And also, the Oops code of 0003 (PF_WRITE and PF_USER) might hint at
what Dave wrote.

-- 
Dan Aloni


Re: mm/sched/net: BUG when running simple code

2014-06-12 Thread Dave Jones
On Fri, Jun 13, 2014 at 12:01:37AM -0400, Sasha Levin wrote:
 > On 06/12/2014 11:27 PM, Dan Aloni wrote:
 > > On Thu, Jun 12, 2014 at 10:56:16PM -0400, Sasha Levin wrote:
 > >> > Hi all,
 > >> > 
 > >> > Okay, I'm really lost. I got the following when fuzzing, and can't really explain what's
 > >> > going on. It seems that we get an "unable to handle kernel paging request" when running
 > >> > rather simple code, and I can't figure out how it would cause it.
 > > [..]
 > >> > Which agrees with the trace I got:
 > >> > 
 > >> > [  516.309720] BUG: unable to handle kernel paging request at ffffffffa0f12560
 > >> > [  516.309720] IP: netlink_getsockopt (net/netlink/af_netlink.c:2271)
 > > [..]
 > >> > [  516.309720] RIP netlink_getsockopt (net/netlink/af_netlink.c:2271)
 > >> > [  516.309720]  RSP <ffff8803fc85fed8>
 > >> > [  516.309720] CR2: ffffffffa0f12560
 > >> > 
 > >> > The only theory I had so far is that netlink is a module, and has gone away while the code
 > >> > was executing, but netlink isn't a module on my kernel.
 > > The RIP - 0xffffffffa0f12560 is in the range (from Documentation/x86/x86_64/mm.txt):
 > > 
 > > ffffffffa0000000 - ffffffffff5fffff (=1525 MB) module mapping space
 > > 
 > > So seems it was in a module.
 > 
 > Yup, that's why that theory came up, but when I checked my config:
 > ... 
 > that theory went away. (also confirmed by not finding a netlink module.)
 > 
 > What about the kernel .text overflowing into the modules space? The loader
 > checks for that, but can something like that happen after everything is
 > up and running? I'll look into that tomorrow.

another theory: Trinity can sometimes generate plausible looking module
addresses and pass those in structs etc.

I wonder if there's somewhere in that path that isn't checking that the address
in the optval it got is actually a userspace address before it tries to write 
to it.
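
As a rough model of the kind of check being asked about: on x86_64 a user pointer is only acceptable if the whole buffer sits below the top of user space. The sketch below is a simplified stand-in for an access_ok()-style range test, not the kernel's implementation; it assumes 4-level paging (47-bit user addresses), and the function name and the example user address are hypothetical.

```python
PAGE_SIZE = 4096

# Top of user space on x86_64 with 4-level paging (assumption stated above).
TASK_SIZE_MAX = (1 << 47) - PAGE_SIZE


def looks_like_user_range(addr, size):
    """Simplified range check: the buffer [addr, addr + size) must end at or
    below TASK_SIZE_MAX. Real kernel code also handles overflow and the
    current address limit."""
    if addr < 0 or size < 0:
        return False
    return addr + size <= TASK_SIZE_MAX


if __name__ == "__main__":
    # A made-up, plausible user stack address passes the check.
    print(looks_like_user_range(0x7ffd12345000, 4))
    # A module-space-looking address, as a fuzzer might generate, must not.
    print(looks_like_user_range(0xffffffffa0f12560, 4))
```

If some path wrote through optval without a check along these lines, a fuzzer-supplied kernel address would be written to directly.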

Dave



Re: mm/sched/net: BUG when running simple code

2014-06-12 Thread Sasha Levin
On 06/12/2014 11:27 PM, Dan Aloni wrote:
> On Thu, Jun 12, 2014 at 10:56:16PM -0400, Sasha Levin wrote:
>> > Hi all,
>> > 
>> > Okay, I'm really lost. I got the following when fuzzing, and can't really explain what's
>> > going on. It seems that we get an "unable to handle kernel paging request" when running
>> > rather simple code, and I can't figure out how it would cause it.
> [..]
>> > Which agrees with the trace I got:
>> > 
>> > [  516.309720] BUG: unable to handle kernel paging request at ffffffffa0f12560
>> > [  516.309720] IP: netlink_getsockopt (net/netlink/af_netlink.c:2271)
> [..]
>> > [  516.309720] RIP netlink_getsockopt (net/netlink/af_netlink.c:2271)
>> > [  516.309720]  RSP <ffff8803fc85fed8>
>> > [  516.309720] CR2: ffffffffa0f12560
>> > 
>> > The only theory I had so far is that netlink is a module, and has gone away while the code
>> > was executing, but netlink isn't a module on my kernel.
> The RIP - 0xffffffffa0f12560 is in the range (from Documentation/x86/x86_64/mm.txt):
> 
> ffffffffa0000000 - ffffffffff5fffff (=1525 MB) module mapping space
> 
> So seems it was in a module.

Yup, that's why that theory came up, but when I checked my config:

$ cat .config | grep NETLINK
CONFIG_COMPAT_NETLINK_MESSAGES=y
CONFIG_NETFILTER_NETLINK=y
CONFIG_NETFILTER_NETLINK_ACCT=y
CONFIG_NETFILTER_NETLINK_QUEUE=y
CONFIG_NETFILTER_NETLINK_LOG=y
CONFIG_NF_CT_NETLINK=y
CONFIG_NF_CT_NETLINK_TIMEOUT=y
CONFIG_NF_CT_NETLINK_HELPER=y
CONFIG_NETFILTER_NETLINK_QUEUE_CT=y
CONFIG_NETLINK_MMAP=y
CONFIG_NETLINK_DIAG=y
CONFIG_SCSI_NETLINK=y
CONFIG_QUOTA_NETLINK_INTERFACE=y

that theory went away. (also confirmed by not finding a netlink module.)

What about the kernel .text overflowing into the modules space? The loader
checks for that, but can something like that happen after everything is
up and running? I'll look into that tomorrow.


Thanks,
Sasha


Re: mm/sched/net: BUG when running simple code

2014-06-12 Thread Dan Aloni
On Thu, Jun 12, 2014 at 10:56:16PM -0400, Sasha Levin wrote:
> Hi all,
> 
> Okay, I'm really lost. I got the following when fuzzing, and can't really
> explain what's going on. It seems that we get an "unable to handle kernel
> paging request" when running rather simple code, and I can't figure out how
> it would cause it.
[..]
> Which agrees with the trace I got:
> 
> [  516.309720] BUG: unable to handle kernel paging request at ffffffffa0f12560
> [  516.309720] IP: netlink_getsockopt (net/netlink/af_netlink.c:2271)
[..]
> [  516.309720] RIP netlink_getsockopt (net/netlink/af_netlink.c:2271)
> [  516.309720]  RSP <ffff8803fc85fed8>
> [  516.309720] CR2: ffffffffa0f12560
> 
> The only theory I had so far is that netlink is a module, and has gone away
> while the code was executing, but netlink isn't a module on my kernel.

The RIP - 0xffffffffa0f12560 is in the range (from
Documentation/x86/x86_64/mm.txt):

ffffffffa0000000 - ffffffffff5fffff (=1525 MB) module mapping space

So seems it was in a module.
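
The range check can be sketched as below. It assumes the addresses in the report are full 64-bit kernel addresses with the usual ffffffff prefix, and it uses the two ranges that Documentation/x86/x86_64/mm.txt listed at the time; the function name `classify` is made up for illustration.

```python
# Ranges as documented in Documentation/x86/x86_64/mm.txt at the time.
KERNEL_TEXT_START = 0xffffffff80000000
KERNEL_TEXT_END = 0xffffffffa0000000    # exclusive: modules start here
MODULES_START = 0xffffffffa0000000
MODULES_END = 0xffffffffff5fffff        # inclusive, as written in mm.txt


def classify(addr):
    """Name the documented region that a kernel virtual address falls into."""
    if KERNEL_TEXT_START <= addr < KERNEL_TEXT_END:
        return "kernel text mapping"
    if MODULES_START <= addr <= MODULES_END:
        return "module mapping space"
    return "other"


if __name__ == "__main__":
    # Faulting address from the report (assumed ffffffff prefix).
    print(classify(0xffffffffa0f12560))
    # Size of the kernel text mapping in MB.
    print((KERNEL_TEXT_END - KERNEL_TEXT_START) >> 20)
```

This only says which documented region an address lands in. It cannot tell whether the code living there really belongs to a module.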

-- 
Dan Aloni
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: mm/sched/net: BUG when running simple code

2014-06-12 Thread Dan Aloni
On Thu, Jun 12, 2014 at 10:56:16PM -0400, Sasha Levin wrote:
 Hi all,
 
 Okay, I'm really lost. I got the following when fuzzing, and can't really 
 explain what's
 going on. It seems that we get a unable to handle kernel paging request 
 when running
 rather simple code, and I can't figure out how it would cause it.
[..]
 Which agrees with the trace I got:
 
 [  516.309720] BUG: unable to handle kernel paging request at a0f12560
 [  516.309720] IP: netlink_getsockopt (net/netlink/af_netlink.c:2271)
[..]
 [  516.309720] RIP netlink_getsockopt (net/netlink/af_netlink.c:2271)
 [  516.309720]  RSP 8803fc85fed8
 [  516.309720] CR2: a0f12560
 
 They only theory I had so far is that netlink is a module, and has gone away 
 while the code
 was executing, but netlink isn't a module on my kernel.

The RIP - 0xa0f12560 is in the range (from 
Documentation/x86/x86_64/mm.txt):

a000 - ff5f (=1525 MB) module mapping space

So seems it was in a module.

-- 
Dan Aloni
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: mm/sched/net: BUG when running simple code

2014-06-12 Thread Sasha Levin
On 06/12/2014 11:27 PM, Dan Aloni wrote:
 On Thu, Jun 12, 2014 at 10:56:16PM -0400, Sasha Levin wrote:
  Hi all,
  
  Okay, I'm really lost. I got the following when fuzzing, and can't really 
  explain what's
  going on. It seems that we get a unable to handle kernel paging request 
  when running
  rather simple code, and I can't figure out how it would cause it.
 [..]
  Which agrees with the trace I got:
  
  [  516.309720] BUG: unable to handle kernel paging request at 
  a0f12560
  [  516.309720] IP: netlink_getsockopt (net/netlink/af_netlink.c:2271)
 [..]
  [  516.309720] RIP netlink_getsockopt (net/netlink/af_netlink.c:2271)
  [  516.309720]  RSP 8803fc85fed8
  [  516.309720] CR2: a0f12560
  
  They only theory I had so far is that netlink is a module, and has gone 
  away while the code
  was executing, but netlink isn't a module on my kernel.
 The RIP - 0xa0f12560 is in the range (from 
 Documentation/x86/x86_64/mm.txt):
 
 a000 - ff5f (=1525 MB) module mapping space
 
 So seems it was in a module.

Yup, that's why that theory came up, but when I checked my config:

$ cat .config | grep NETLINK
CONFIG_COMPAT_NETLINK_MESSAGES=y
CONFIG_NETFILTER_NETLINK=y
CONFIG_NETFILTER_NETLINK_ACCT=y
CONFIG_NETFILTER_NETLINK_QUEUE=y
CONFIG_NETFILTER_NETLINK_LOG=y
CONFIG_NF_CT_NETLINK=y
CONFIG_NF_CT_NETLINK_TIMEOUT=y
CONFIG_NF_CT_NETLINK_HELPER=y
CONFIG_NETFILTER_NETLINK_QUEUE_CT=y
CONFIG_NETLINK_MMAP=y
CONFIG_NETLINK_DIAG=y
CONFIG_SCSI_NETLINK=y
CONFIG_QUOTA_NETLINK_INTERFACE=y

that theory went away. (also confirmed by not finding a netlink module.)

What about the kernel .text overflowing into the modules space? The loader
checks for that, but can something like that happen after everything is
up and running? I'll look into that tomorrow.


Thanks,
Sasha
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: mm/sched/net: BUG when running simple code

2014-06-12 Thread Dave Jones
On Fri, Jun 13, 2014 at 12:01:37AM -0400, Sasha Levin wrote:
> On 06/12/2014 11:27 PM, Dan Aloni wrote:
>> On Thu, Jun 12, 2014 at 10:56:16PM -0400, Sasha Levin wrote:
>>> Hi all,
>>>
>>> Okay, I'm really lost. I got the following when fuzzing, and can't really
>>> explain what's going on. It seems that we get a "unable to handle kernel
>>> paging request" when running rather simple code, and I can't figure out
>>> how it would cause it.
>> [..]
>>> Which agrees with the trace I got:
>>>
>>> [  516.309720] BUG: unable to handle kernel paging request at ffffffffa0f12560
>>> [  516.309720] IP: netlink_getsockopt (net/netlink/af_netlink.c:2271)
>> [..]
>>> [  516.309720] RIP netlink_getsockopt (net/netlink/af_netlink.c:2271)
>>> [  516.309720]  RSP ffff8803fc85fed8
>>> [  516.309720] CR2: ffffffffa0f12560
>>>
>>> The only theory I had so far is that netlink is a module, and has gone
>>> away while the code was executing, but netlink isn't a module on my kernel.
>> The RIP - 0xffffffffa0f12560 is in the range (from
>> Documentation/x86/x86_64/mm.txt):
>>
>> ffffffffa0000000 - ffffffffff5fffff (=1525 MB) module mapping space
>>
>> So seems it was in a module.
>
> Yup, that's why that theory came up, but when I checked my config:
> ...
> that theory went away. (also confirmed by not finding a netlink module.)
>
> What about the kernel .text overflowing into the modules space? The loader
> checks for that, but can something like that happen after everything is
> up and running? I'll look into that tomorrow.

Another theory: Trinity can sometimes generate plausible-looking module
addresses and pass those in structs etc.

I wonder if there's somewhere in that path that isn't checking that the address
in the optval it got is actually a userspace address before it tries to write 
to it.
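
To illustrate the kind of check meant here: on x86_64 a pointer handed in
from userspace is only acceptable if the whole range lies below the
user/kernel split. A rough Python sketch of that range test, assuming 4-level
paging with 4 KB pages (so the limit is 2^47 minus one page). The helper name
is made up, and this is not the kernel's actual access_ok() code:

```python
PAGE_SIZE = 4096
# Assumed x86_64 user address limit with 4-level paging: (1 << 47) - PAGE_SIZE.
TASK_SIZE_MAX = (1 << 47) - PAGE_SIZE


def looks_like_user_range(addr, size):
    """True if [addr, addr + size) lies entirely below the assumed user limit."""
    if addr < 0 or size < 0:
        return False
    return addr + size <= TASK_SIZE_MAX


# A module-space style address, like one a fuzzer might fabricate, is rejected.
print(looks_like_user_range(0xffffffffa0f12560, 4))   # False
# An ordinary userspace pointer is accepted.
print(looks_like_user_range(0x00007f0000001000, 4))   # True
```

A path that writes through optval without a test of this kind would happily
write to a kernel address supplied by the caller.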

Dave



Re: mm/sched/net: BUG when running simple code

2014-06-12 Thread Dan Aloni
On Fri, Jun 13, 2014 at 12:01:37AM -0400, Sasha Levin wrote:
> On 06/12/2014 11:27 PM, Dan Aloni wrote:
>> On Thu, Jun 12, 2014 at 10:56:16PM -0400, Sasha Levin wrote:
>>> Hi all,
>>>
>>> Okay, I'm really lost. I got the following when fuzzing, and can't really
>>> explain what's going on. It seems that we get a "unable to handle kernel
>>> paging request" when running rather simple code, and I can't figure out
>>> how it would cause it.
>> [..]
>>> Which agrees with the trace I got:
>>>
>>> [  516.309720] BUG: unable to handle kernel paging request at ffffffffa0f12560
>>> [  516.309720] IP: netlink_getsockopt (net/netlink/af_netlink.c:2271)
>> [..]
>>> [  516.309720] RIP netlink_getsockopt (net/netlink/af_netlink.c:2271)
>>> [  516.309720]  RSP ffff8803fc85fed8
>>> [  516.309720] CR2: ffffffffa0f12560
>>>
>>> The only theory I had so far is that netlink is a module, and has gone
>>> away while the code was executing, but netlink isn't a module on my kernel.
>> The RIP - 0xffffffffa0f12560 is in the range (from
>> Documentation/x86/x86_64/mm.txt):
>>
>> ffffffffa0000000 - ffffffffff5fffff (=1525 MB) module mapping space
>>
>> So seems it was in a module.
>
> Yup, that's why that theory came up, but when I checked my config:
>
> $ cat .config | grep NETLINK
> CONFIG_COMPAT_NETLINK_MESSAGES=y
> CONFIG_NETFILTER_NETLINK=y
> CONFIG_NETFILTER_NETLINK_ACCT=y
> CONFIG_NETFILTER_NETLINK_QUEUE=y
> CONFIG_NETFILTER_NETLINK_LOG=y
> CONFIG_NF_CT_NETLINK=y
> CONFIG_NF_CT_NETLINK_TIMEOUT=y
> CONFIG_NF_CT_NETLINK_HELPER=y
> CONFIG_NETFILTER_NETLINK_QUEUE_CT=y
> CONFIG_NETLINK_MMAP=y
> CONFIG_NETLINK_DIAG=y
> CONFIG_SCSI_NETLINK=y
> CONFIG_QUOTA_NETLINK_INTERFACE=y
>
> that theory went away. (also confirmed by not finding a netlink module.)
>
> What about the kernel .text overflowing into the modules space? The loader
> checks for that, but can something like that happen after everything is
> up and running? I'll look into that tomorrow.

The kernel .text needs to be more than 512MB for the overlap to happen. 

ffffffff80000000 - ffffffffa0000000 (=512 MB)  kernel text mapping, from phys 0
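
As a quick sanity check of the layout, here is a small Python sketch that
classifies an address against the two ranges from
Documentation/x86/x86_64/mm.txt. It assumes the addresses from the Oops are
full 64-bit values with the upper bits set; the function and constant names
are made up for illustration:

```python
# Ranges as listed in Documentation/x86/x86_64/mm.txt.
KERNEL_TEXT = (0xffffffff80000000, 0xffffffffa0000000)   # 512 MB, end exclusive
MODULE_SPACE = (0xffffffffa0000000, 0xffffffffff5fffff)  # 1525 MB, end inclusive


def classify(addr):
    """Name the region of the layout above that contains addr."""
    if KERNEL_TEXT[0] <= addr < KERNEL_TEXT[1]:
        return "kernel text mapping"
    if MODULE_SPACE[0] <= addr <= MODULE_SPACE[1]:
        return "module mapping space"
    return "other"


print(classify(0xffffffffa0f12560))  # module mapping space
print(classify(0xffffffff9ff69d26))  # kernel text mapping
# Size of the text mapping in MB: .text would have to outgrow this to overlap.
print((KERNEL_TEXT[1] - KERNEL_TEXT[0]) >> 20)  # 512
```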

Also, it is bizarre that symbol resolution resolved 0xffffffffa0f12560 to a
symbol at all: that address is in module space, where af_netlink.o surely is
not, because of obj-y := af_netlink.o in the Makefile.

What does your /proc/kallsyms show, when sorted by address, around the symbols
in question?
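
One way to do that lookup, as a Python sketch: parse kallsyms-style lines
("address type name [module]"), sort them by address, and pick the last
symbol that starts at or below the address of interest. The sample data and
all symbol names below are made up, not taken from the reporter's machine:

```python
import bisect


def parse_kallsyms(text):
    """Parse 'address type name [module]' lines into a list sorted by address."""
    symbols = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        # Module symbols carry a trailing "[module_name]" field.
        module = fields[3].strip("[]") if len(fields) > 3 else None
        symbols.append((int(fields[0], 16), fields[2], module))
    symbols.sort(key=lambda s: s[0])
    return symbols


def nearest_symbol(symbols, addr):
    """Return the symbol with the highest start address <= addr, or None."""
    starts = [s[0] for s in symbols]
    i = bisect.bisect_right(starts, addr) - 1
    return symbols[i] if i >= 0 else None


# Made-up sample data with hypothetical symbol and module names.
sample = """\
ffffffff9ff00000 T hypothetical_core_func
ffffffffa0f12000 t hypothetical_mod_func\t[hypothetical_mod]
ffffffffa0f13000 t hypothetical_mod_other\t[hypothetical_mod]
"""
syms = parse_kallsyms(sample)
print(nearest_symbol(syms, 0xffffffffa0f12560))
```

On a live system the input would be the contents of /proc/kallsyms, read
with enough privilege that the addresses are not hidden.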

Also curious are the addresses you have on the stack:

> [  516.309720] Stack:
> [  516.309720]  ffff8803fc85ff18 ffff8803fc85ff18 ffff8803fc85fef8 8900200549908020
> [  516.309720]  ffff8803fc85ff18 ffffffff9ff66470 ffff8803fc85ff18 0000000000000037
> [  516.309720]  ffff8803fc85ff78 ffffffff9ff69d26 0000000000000037 0000000000000004

0xffffffff9ff69d26 lies just below the start of the module mapping space, at
the very end of the kernel text mapping. Unless there are some tricks with
those mappings, that area should be unused. Or perhaps
CONFIG_DEBUG_PAGEALLOC is at play here?

And also, the Oops code of 0003 (PF_WRITE and PF_USER) might hint at
what Dave wrote.

-- 
Dan Aloni


Re: mm/sched/net: BUG when running simple code

2014-06-12 Thread Dan Aloni
On Fri, Jun 13, 2014 at 07:55:55AM +0300, Dan Aloni wrote:
>> that theory went away. (also confirmed by not finding a netlink module.)
>>
>> What about the kernel .text overflowing into the modules space? The loader
>> checks for that, but can something like that happen after everything is
>> up and running? I'll look into that tomorrow.
>
> The kernel .text needs to be more than 512MB for the overlap to happen.
>
> ffffffff80000000 - ffffffffa0000000 (=512 MB)  kernel text mapping, from phys 0
>
> Also, it is bizarre that symbol resolution resolved ffffffffa0f12560 to
> a symbol that is in module space where af_netlink.o is surely not because of
> obj-y := af_netlink.o in the Makefile.
>
> What does your /proc/kallsyms show when sorted with regards to the symbols
> in question?
>
> Also curious are the addresses you have on the stack:
>
>> [  516.309720] Stack:
>> [  516.309720]  ffff8803fc85ff18 ffff8803fc85ff18 ffff8803fc85fef8 8900200549908020
>> [  516.309720]  ffff8803fc85ff18 ffffffff9ff66470 ffff8803fc85ff18 0000000000000037
>> [  516.309720]  ffff8803fc85ff78 ffffffff9ff69d26 0000000000000037 0000000000000004
[..]

Oh, I just remembered the new kASLR feature that got enabled recently. It
explains the addresses, but there was supposed to be a line for it in the
Oops, so I'm puzzled.

-- 
Dan Aloni


Re: mm/sched/net: BUG when running simple code

2014-06-12 Thread Dan Aloni
On Fri, Jun 13, 2014 at 07:55:55AM +0300, Dan Aloni wrote:
> And also, the Oops code of 0003 (PF_WRITE and PF_USER) might hint at
> what Dave wrote.

Scratch what I wrote about that, it's PF_PROT | PF_WRITE.
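
For reference, the decoding can be checked mechanically. A small Python
sketch of the x86 page fault error code bits; the bit names follow
arch/x86/mm/fault.c, while the table and function names are just for
illustration:

```python
# x86 page fault error code bits (names as in arch/x86/mm/fault.c).
PF_BITS = [
    (1 << 0, "PF_PROT"),   # 0: page not present, 1: protection violation
    (1 << 1, "PF_WRITE"),  # 0: read access, 1: write access
    (1 << 2, "PF_USER"),   # 0: kernel mode, 1: user mode
    (1 << 3, "PF_RSVD"),   # reserved bit set in a page table entry
    (1 << 4, "PF_INSTR"),  # fault was an instruction fetch
]


def decode_pf_error(code):
    """Return the names of the flags set in an x86 page fault error code."""
    return [name for bit, name in PF_BITS if code & bit]


print(" | ".join(decode_pf_error(0x0003)))  # PF_PROT | PF_WRITE
```

So 0003 is a write that hit a present but write-protected page, and since
PF_USER is clear the access came from kernel mode.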

-- 
Dan Aloni