Re: [LKP] 4ce5f9c9e7 [ 1.323881] WARNING: CPU: 0 PID: 1 at mm/slab_common.c:1031 kmalloc_slab

2018-10-10 Thread Eric W. Biederman
ebied...@xmission.com (Eric W. Biederman) writes:

> Sean Christopherson  writes:
>
>> On Wed, Oct 10, 2018 at 05:06:52PM -0500, Eric W. Biederman wrote:
>>> ebied...@xmission.com (Eric W. Biederman) writes:
>>> 
>>> > So I am flummoxed.  I am reading through the code and I don't see
>>> > anything that could trigger this, and when I ran the supplied reproducer
>>> > it did not reproduce for me.
>>> 
>>> Even more so.  With my tool chain the line that reports the failing
>>> address is impossible.
>>> 
>>> [   73.034423] RIP: 0010:copy_siginfo_from_user+0x4d/0xd0
>>> 
>>> With the supplied configuration my tool chain only has 0x30 bytes for
>>> all of copy_siginfo_from_user.  So I can't even begin to guess where
>>> in that function things are failing.
>>> 
>>> Any additional information that you can provide would be a real help
>>> in tracking down this strange failure.
>>
>> I don't have the exact toolchain, but I was able to get somewhat close
>> and may have found a smoking gun.  0x4d in my build is in the general
>> vicinity of "sig_sicodes[sig].limit" in known_siginfo_layout().  This
>> lines up with the register state from the log, e.g. RDI=0500104d8,
>> which is the mask generated by sig_specific_sicodes.  From what I can
>> tell, @sig is never bounds checked.  If the compiler generated an AND
>> instruction to compare against sig_specific_sicodes then that could
>> resolve true with any arbitrary value that happened to collide with
>> sig_specific_sicodes and result in an out-of-bounds access to
>> @sig_sicodes.  siginfo_layout() for example explicitly checks @sig
>> before indexing @sig_sicodes, e.g. "sig < ARRAY_SIZE(sig_sicodes)".
>>
>> Maybe this?
>
> But sig is bounds checked.  Even better sig is checked to see if it
> is one of the values in the array.
>
> From include/linux/signal.h
>
> #define SIG_SPECIFIC_SICODES_MASK (\
>   rt_sigmask(SIGILL)    |  rt_sigmask(SIGFPE)    | \
>   rt_sigmask(SIGSEGV)   |  rt_sigmask(SIGBUS)    | \
>   rt_sigmask(SIGTRAP)   |  rt_sigmask(SIGCHLD)   | \
>   rt_sigmask(SIGPOLL)   |  rt_sigmask(SIGSYS)    | \
>   SIGEMT_MASK)
>
> #define siginmask(sig, mask) \
>   ((sig) < SIGRTMIN && (rt_sigmask(sig) & (mask)))
>
> #define sig_specific_sicodes(sig)   siginmask(sig, SIG_SPECIFIC_SICODES_MASK)
>
>
>
> Hmm.  I wonder if something is passing in a negative signal number.
> There is not a bounds check for that.  A sufficiently large signal
> number might be the problem here.  Yes.  I can get an oops with
> a sufficiently large negative signal number.
>
> The code will later call valid_signal in check_permissions and
> that will cause the system call to fail, so the issue is just that
> the signal number is not being validated early enough.
>
> On the output path (copy_siginfo_to_user and copy_siginfo_to_user32) the
> signal number should be validated before it ever reaches userspace
> which is why I expect trinity never triggered anything.
>
> There is copy_siginfo_from_user32 and that does call siginfo_layout with
> a possibly negative signal number.  Which has the same potential issues.
>
> So I am going to go with the fix below.  That fixes things in my testing
> and by being unsigned should keep negative numbers from being a
> problem.

Sean, thank you very much for putting me on the right path to track this
failing test down.

Eric


Re: [LKP] 4ce5f9c9e7 [ 1.323881] WARNING: CPU: 0 PID: 1 at mm/slab_common.c:1031 kmalloc_slab

2018-10-10 Thread Eric W. Biederman
Sean Christopherson  writes:

> On Wed, Oct 10, 2018 at 05:06:52PM -0500, Eric W. Biederman wrote:
>> ebied...@xmission.com (Eric W. Biederman) writes:
>> 
>> > So I am flummoxed.  I am reading through the code and I don't see
>> > anything that could trigger this, and when I ran the supplied reproducer
>> > it did not reproduce for me.
>> 
>> Even more so.  With my tool chain the line that reports the failing
>> address is impossible.
>> 
>> [   73.034423] RIP: 0010:copy_siginfo_from_user+0x4d/0xd0
>> 
>> With the supplied configuration my tool chain only has 0x30 bytes for
>> all of copy_siginfo_from_user.  So I can't even begin to guess where
>> in that function things are failing.
>> 
>> Any additional information that you can provide would be a real help
>> in tracking down this strange failure.
>
> I don't have the exact toolchain, but I was able to get somewhat close
> and may have found a smoking gun.  0x4d in my build is in the general
> vicinity of "sig_sicodes[sig].limit" in known_siginfo_layout().  This
> lines up with the register state from the log, e.g. RDI=0500104d8,
> which is the mask generated by sig_specific_sicodes.  From what I can
> tell, @sig is never bounds checked.  If the compiler generated an AND
> instruction to compare against sig_specific_sicodes then that could
> resolve true with any arbitrary value that happened to collide with
> sig_specific_sicodes and result in an out-of-bounds access to
> @sig_sicodes.  siginfo_layout() for example explicitly checks @sig
> before indexing @sig_sicodes, e.g. "sig < ARRAY_SIZE(sig_sicodes)".
>
> Maybe this?

But sig is bounds checked.  Even better sig is checked to see if it
is one of the values in the array.

From include/linux/signal.h

#define SIG_SPECIFIC_SICODES_MASK (\
	rt_sigmask(SIGILL)    |  rt_sigmask(SIGFPE)    | \
	rt_sigmask(SIGSEGV)   |  rt_sigmask(SIGBUS)    | \
	rt_sigmask(SIGTRAP)   |  rt_sigmask(SIGCHLD)   | \
	rt_sigmask(SIGPOLL)   |  rt_sigmask(SIGSYS)    | \
	SIGEMT_MASK)

#define siginmask(sig, mask) \
	((sig) < SIGRTMIN && (rt_sigmask(sig) & (mask)))

#define sig_specific_sicodes(sig)   siginmask(sig, SIG_SPECIFIC_SICODES_MASK)



Hmm.  I wonder if something is passing in a negative signal number.
There is not a bounds check for that.  A sufficiently large signal
number might be the problem here.  Yes.  I can get an oops with
a sufficiently large negative signal number.
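
To make the negative-number case concrete, here is a minimal userspace
sketch of the check.  It is a model, not the kernel code: the m_* names
are made up, the signal numbers are the x86 values written out by hand,
and the shift count is reduced modulo 64 explicitly (which is what the
x86 shift instruction does) so that the model stays free of undefined
behaviour.

```c
#include <assert.h>
#include <stdbool.h>

/* x86 Linux signal numbers, hard-coded so the sketch does not depend on libc. */
enum {
	M_SIGILL = 4, M_SIGTRAP = 5, M_SIGBUS = 7, M_SIGFPE = 8,
	M_SIGSEGV = 11, M_SIGCHLD = 17, M_SIGPOLL = 29, M_SIGSYS = 31,
	M_SIGRTMIN = 32,
};

/* Model of rt_sigmask(): bit (sig - 1), with the shift count taken modulo 64. */
static unsigned long long m_sigmask(unsigned int sig)
{
	return 1ULL << ((sig - 1u) & 63u);
}

/* Model of SIG_SPECIFIC_SICODES_MASK; SIGEMT does not exist on x86. */
static unsigned long long m_specific_mask(void)
{
	return m_sigmask(M_SIGILL)  | m_sigmask(M_SIGFPE)  |
	       m_sigmask(M_SIGSEGV) | m_sigmask(M_SIGBUS)  |
	       m_sigmask(M_SIGTRAP) | m_sigmask(M_SIGCHLD) |
	       m_sigmask(M_SIGPOLL) | m_sigmask(M_SIGSYS);
}

/* The check with a signed signal number: only the upper bound is tested. */
static bool m_specific_signed(int sig)
{
	return sig < M_SIGRTMIN &&
	       (m_sigmask((unsigned int)sig) & m_specific_mask()) != 0;
}

/* The same check with an unsigned signal number. */
static bool m_specific_unsigned(unsigned int sig)
{
	return sig < M_SIGRTMIN &&
	       (m_sigmask(sig) & m_specific_mask()) != 0;
}

static void m_demo(void)
{
	/* With x86 numbers the mask works out to 0x500104d8. */
	assert(m_specific_mask() == 0x500104d8ULL);
	assert(m_specific_signed(M_SIGSEGV));
	/* -60 is below SIGRTMIN and its wrapped bit lands on SIGILL. */
	assert(m_specific_signed(-60));
	/* As an unsigned value it is far above SIGRTMIN and is rejected. */
	assert(!m_specific_unsigned((unsigned int)-60));
}
```

Calling m_demo() from a main() runs the checks.  In this model a value
such as -60 passes the signed test, and using it as an array index would
then read far outside the table.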

The code will later call valid_signal in check_permissions and
that will cause the system call to fail, so the issue is just that
the signal number is not being validated early enough.

On the output path (copy_siginfo_to_user and copy_siginfo_to_user32) the
signal number should be validated before it ever reaches userspace
which is why I expect trinity never triggered anything.

There is copy_siginfo_from_user32 and that does call siginfo_layout with
a possibly negative signal number.  Which has the same potential issues.

So I am going to go with the fix below.  That fixes things in my testing
and by being unsigned should keep negative numbers from being a
problem.

diff --git a/kernel/signal.c b/kernel/signal.c
index 2bffc5a50183..4fd431ce4f91 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2860,7 +2860,7 @@ static const struct {
 	[SIGSYS]  = { NSIGSYS,  SIL_SYS },
 };
 
-static bool known_siginfo_layout(int sig, int si_code)
+static bool known_siginfo_layout(unsigned sig, int si_code)
 {
 	if (si_code == SI_KERNEL)
 		return true;
@@ -2879,7 +2879,7 @@ static bool known_siginfo_layout(int sig, int si_code)
 	return false;
 }
 
-enum siginfo_layout siginfo_layout(int sig, int si_code)
+enum siginfo_layout siginfo_layout(unsigned sig, int si_code)
 {
 	enum siginfo_layout layout = SIL_KILL;
 	if ((si_code > SI_USER) && (si_code < SI_KERNEL)) {


Re: [LKP] 4ce5f9c9e7 [ 1.323881] WARNING: CPU: 0 PID: 1 at mm/slab_common.c:1031 kmalloc_slab

2018-10-10 Thread Sean Christopherson
On Wed, Oct 10, 2018 at 04:41:48PM -0700, Sean Christopherson wrote:
> On Wed, Oct 10, 2018 at 05:06:52PM -0500, Eric W. Biederman wrote:
> > ebied...@xmission.com (Eric W. Biederman) writes:
> > 
> > > So I am flummoxed.  I am reading through the code and I don't see
> > > anything that could trigger this, and when I ran the supplied reproducer
> > > it did not reproduce for me.
> > 
> > Even more so.  With my tool chain the line that reports the failing
> > address is impossible.
> > 
> > [   73.034423] RIP: 0010:copy_siginfo_from_user+0x4d/0xd0
> > 
> > With the supplied configuration my tool chain only has 0x30 bytes for
> > all of copy_siginfo_from_user.  So I can't even begin to guess where
> > in that function things are failing.
> > 
> > Any additional information that you can provide would be a real help
> > in tracking down this strange failure.
> 
> I don't have the exact toolchain, but I was able to get somewhat close
> and may have found a smoking gun.  0x4d in my build is in the general
> vicinity of "sig_sicodes[sig].limit" in known_siginfo_layout().  This
> lines up with the register state from the log, e.g. RDI=0500104d8,
> which is the mask generated by sig_specific_sicodes.  From what I can
> tell, @sig is never bounds checked.  If the compiler generated an AND
> instruction to compare against sig_specific_sicodes then that could
> resolve true with any arbitrary value that happened to collide with
> sig_specific_sicodes and result in an out-of-bounds access to
> @sig_sicodes.  siginfo_layout() for example explicitly checks @sig
> before indexing @sig_sicodes, e.g. "sig < ARRAY_SIZE(sig_sicodes)".

Hmm, siginmask explicitly checks sig < SIGRTMIN, which might squash
my theory.

> 
> Maybe this?
> 
> ---
>  kernel/signal.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/signal.c b/kernel/signal.c
> index 1c2dd117fee0..6ee7491de906 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -2865,7 +2865,8 @@ static bool known_siginfo_layout(int sig, int si_code)
>  	if (si_code == SI_KERNEL)
>  		return true;
>  	else if ((si_code > SI_USER)) {
> -		if (sig_specific_sicodes(sig)) {
> +		if (sig < ARRAY_SIZE(sig_sicodes) &&
> +		    sig_specific_sicodes(sig)) {
>  			if (si_code <= sig_sicodes[sig].limit)
>  				return true;
>  		}


Re: [LKP] 4ce5f9c9e7 [ 1.323881] WARNING: CPU: 0 PID: 1 at mm/slab_common.c:1031 kmalloc_slab

2018-10-10 Thread Sean Christopherson
On Wed, Oct 10, 2018 at 05:06:52PM -0500, Eric W. Biederman wrote:
> ebied...@xmission.com (Eric W. Biederman) writes:
> 
> > So I am flummoxed.  I am reading through the code and I don't see
> > anything that could trigger this, and when I ran the supplied reproducer
> > it did not reproduce for me.
> 
> Even more so.  With my tool chain the line that reports the failing
> address is impossible.
> 
> [   73.034423] RIP: 0010:copy_siginfo_from_user+0x4d/0xd0
> 
> With the supplied configuration my tool chain only has 0x30 bytes for
> all of copy_siginfo_from_user.  So I can't even begin to guess where
> in that function things are failing.
> 
> Any additional information that you can provide would be a real help
> in tracking down this strange failure.

I don't have the exact toolchain, but I was able to get somewhat close
and may have found a smoking gun.  0x4d in my build is in the general
vicinity of "sig_sicodes[sig].limit" in known_siginfo_layout().  This
lines up with the register state from the log, e.g. RDI=0500104d8,
which is the mask generated by sig_specific_sicodes.  From what I can
tell, @sig is never bounds checked.  If the compiler generated an AND
instruction to compare against sig_specific_sicodes then that could
resolve true with any arbitrary value that happened to collide with
sig_specific_sicodes and result in an out-of-bounds access to
@sig_sicodes.  siginfo_layout() for example explicitly checks @sig
before indexing @sig_sicodes, e.g. "sig < ARRAY_SIZE(sig_sicodes)".

Maybe this?

---
 kernel/signal.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 1c2dd117fee0..6ee7491de906 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2865,7 +2865,8 @@ static bool known_siginfo_layout(int sig, int si_code)
 	if (si_code == SI_KERNEL)
 		return true;
 	else if ((si_code > SI_USER)) {
-		if (sig_specific_sicodes(sig)) {
+		if (sig < ARRAY_SIZE(sig_sicodes) &&
+		    sig_specific_sicodes(sig)) {
 			if (si_code <= sig_sicodes[sig].limit)
 				return true;
 		}
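
A side note on the check proposed above, as a sketch under stated
assumptions: if ARRAY_SIZE() yields a size_t, as the usual sizeof-based
definition does, then comparing an int @sig against it converts @sig to
an unsigned type first, so a negative value fails the test too.  The
userspace model below illustrates that; MODEL_ARRAY_SIZE, model_limits
and the other model_* names are made-up stand-ins, not kernel
identifiers, and the table contents are arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Stand-in for sig_sicodes[]; the size and contents are made up. */
static const int model_limits[32] = { [4] = 8, [11] = 7 };

/* Bounds check in the style of "sig < ARRAY_SIZE(sig_sicodes)". */
static bool model_index_ok(int sig)
{
	/* sig is converted to size_t for the comparison, so -60 becomes huge. */
	return sig < MODEL_ARRAY_SIZE(model_limits);
}

/* Look up a limit only after the bounds check; -1 means "rejected". */
static int model_limit(int sig)
{
	return model_index_ok(sig) ? model_limits[sig] : -1;
}

static void model_index_demo(void)
{
	assert(model_index_ok(11));
	assert(!model_index_ok(32));   /* one past the end */
	assert(!model_index_ok(-60));  /* negative, rejected after conversion */
	assert(model_limit(11) == 7);
	assert(model_limit(-60) == -1);
}
```

Calling model_index_demo() from a main() runs the checks.  Some compilers
warn about the mixed signed/unsigned comparison; it is left implicit here
because the implicit conversion is the point of the sketch.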


Re: [LKP] 4ce5f9c9e7 [ 1.323881] WARNING: CPU: 0 PID: 1 at mm/slab_common.c:1031 kmalloc_slab

2018-10-10 Thread Eric W. Biederman
ebied...@xmission.com (Eric W. Biederman) writes:

> So I am flummoxed.  I am reading through the code and I don't see
> anything that could trigger this, and when I ran the supplied reproducer
> it did not reproduce for me.

Even more so.  With my tool chain the line that reports the failing
address is impossible.

[   73.034423] RIP: 0010:copy_siginfo_from_user+0x4d/0xd0

With the supplied configuration my tool chain only has 0x30 bytes for
all of copy_siginfo_from_user.  So I can't even begin to guess where
in that function things are failing.
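
For readers unfamiliar with the notation: the RIP line has the form
symbol+offset/length, so 0x4d is the offset into the function and 0xd0
is the function's length in the reporter's build.  A minimal sketch of
that sanity check follows; parse_rip() and offset_in_function() are
hypothetical helpers written for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: split "symbol+0xOFF/0xLEN" into its three parts. */
static bool parse_rip(const char *text, char symbol[64],
		      unsigned long *offset, unsigned long *length)
{
	return sscanf(text, "%63[^+]+%lx/%lx", symbol, offset, length) == 3;
}

/* An offset can only lie inside a function that is longer than the offset. */
static bool offset_in_function(unsigned long offset, unsigned long length)
{
	return offset < length;
}

static void rip_demo(void)
{
	char symbol[64];
	unsigned long offset, length;

	assert(parse_rip("copy_siginfo_from_user+0x4d/0xd0",
			 symbol, &offset, &length));
	assert(offset == 0x4d && length == 0xd0);
	/* Consistent with the reporter's build (0xd0 bytes). */
	assert(offset_in_function(offset, length));
	/* Not possible in a build where the function is only 0x30 bytes. */
	assert(!offset_in_function(offset, 0x30));
}
```

Calling rip_demo() from a main() runs the checks.  The mismatch only says
the two builds differ; it does not say where in the source the fault is.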

Any additional information that you can provide would be a real help
in tracking down this strange failure.

Thank you,
Eric Biederman



Re: [LKP] 4ce5f9c9e7 [ 1.323881] WARNING: CPU: 0 PID: 1 at mm/slab_common.c:1031 kmalloc_slab

2018-10-10 Thread Eric W. Biederman


So I am flummoxed.  I am reading through the code and I don't see
anything that could trigger this, and when I ran the supplied reproducer
it did not reproduce for me.

Plus there is the noise from the kmalloc_slab test that is goofing up
the subject line.

Is there any chance I can get a disassembly of the
copy_siginfo_from_user or post_copy_siginfo_from_user from your build?
I don't have the same tool chain.

Right now I strongly suspect that there is a memory stomp somewhere,
and that the earlier tests just happen to cause something in the
pinpointed commit to misbehave.

Either that or it is simply that I don't have the latest and greatest
smep/smap hardware and there is an off by one I am not seeing.

I don't doubt that this test is finding something; I just haven't
figured out how to see what it is finding, and when I exercise the same
code path with my own tests everything appears to work.

Eric

kernel test robot  writes:

> Greetings,
>
> 0day kernel testing robot got the below dmesg and the first bad commit is
>
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
>
> commit 4ce5f9c9e7546915c559ffae594e6d73f918db00
> Author: Eric W. Biederman 
> AuthorDate: Tue Sep 25 12:59:31 2018 +0200
> Commit: Eric W. Biederman 
> CommitDate: Wed Oct 3 16:50:39 2018 +0200
>
> signal: Use a smaller struct siginfo in the kernel
> 
> We reserve 128 bytes for struct siginfo but only use about 48 bytes on
> 64bit and 32 bytes on 32bit.  Someday we might use more but it is unlikely
> to be anytime soon.
> 
> Userspace seems content with just enough bytes of siginfo to implement
> sigqueue.  Or in the case of checkpoint/restart reinjecting signals
> the kernel has sent.
> 
> Reducing the stack footprint and the work to copy siginfo around from
> 2 cachelines to 1 cachelines seems worth doing even if I don't have
> benchmarks to show a performance difference.
> 
> Suggested-by: Linus Torvalds 
> Signed-off-by: "Eric W. Biederman" 
>
> ae7795bc61  signal: Distinguish between kernel_siginfo and siginfo
> 4ce5f9c9e7  signal: Use a smaller struct siginfo in the kernel
> 570b7bdeaf  Add linux-next specific files for 20181009
> +-------------------------------------------+------------+------------+---------------+
> |                                           | ae7795bc61 | 4ce5f9c9e7 | next-20181009 |
> +-------------------------------------------+------------+------------+---------------+
> | boot_successes                            | 0          | 0          | 28            |
> | boot_failures                             | 1144       | 280        | 8             |
> | WARNING:at_mm/slab_common.c:#kmalloc_slab | 1144       | 280        |               |
> | RIP:kmalloc_slab                          | 1144       | 280        |               |
> | Mem-Info                                  | 1144       | 280        | 8             |
> | BUG:unable_to_handle_kernel               | 0          | 5          | 7             |
> | Oops:#[##]                                | 0          | 7          | 8             |
> | RIP:copy_siginfo_from_user                | 0          | 7          |               |
> | Kernel_panic-not_syncing:Fatal_exception  | 0          | 7          | 8             |
> | RIP:post_copy_siginfo_from_user           | 0          | 0          | 8             |
> +-------------------------------------------+------------+------------+---------------+
>
> [1.320405] test_overflow: ok: (s8)(0 << 7) == 0
> [1.321071] test_overflow: ok: (s16)(0 << 15) == 0
> [1.321756] test_overflow: ok: (int)(0 << 31) == 0
> [1.322442] test_overflow: ok: (s32)(0 << 31) == 0
> [1.323121] test_overflow: ok: (s64)(0 << 63) == 0
> [1.323881] WARNING: CPU: 0 PID: 1 at mm/slab_common.c:1031 kmalloc_slab+0x17/0x70
> [1.324113] CPU: 0 PID: 1 Comm: swapper/0 Tainted: GT 4.19.0-rc1-00077-g4ce5f9c #1
> [1.324113] RIP: 0010:kmalloc_slab+0x17/0x70
> [1.324113] Code: 00 00 00 83 3d 11 78 14 03 02 55 48 89 e5 5d 0f 97 c0 c3 55 48 81 ff 00 00 40 00 48 89 e5 76 0e 31 c0 81 e6 00 02 00 00 75 4b <0f> 0b eb 47 48 81 ff c0 00 00 00 77 19 48 85 ff b8 10 00 00 00 74
> [1.324113] RSP: :88000fc7fd50 EFLAGS: 00010246
> [1.324113] RAX:  RBX: 006000c0 RCX: 88001fb68d47
> [1.324113] RDX: 0001 RSI:  RDI: 
> [1.324113] RBP: 88000fc7fd50 R08: b128ac78 R09: 0001
> [1.324113] R10: 0001 R11:  R12: 88001d814800
> [1.324113] R13:  R14: 836e16f4 R15: 0001
> [1.324113] FS:  () GS:88001f00() knlGS:
> [1.324113] CS:  0010 DS:  ES:  CR0: 80050033
> [1.324113] CR2:  CR3: 03012001
