Re: [LKP] 4ce5f9c9e7 [ 1.323881] WARNING: CPU: 0 PID: 1 at mm/slab_common.c:1031 kmalloc_slab
ebied...@xmission.com (Eric W. Biederman) writes:

> Sean Christopherson writes:
>
>> On Wed, Oct 10, 2018 at 05:06:52PM -0500, Eric W. Biederman wrote:
>>> ebied...@xmission.com (Eric W. Biederman) writes:
>>>
>>> > So I am flummoxed. I am reading through the code and I don't see
>>> > anything that could trigger this, and when I ran the supplied reproducer
>>> > it did not reproduce for me.
>>>
>>> Even more so. With my tool chain the line that reports the failing
>>> address is impossible.
>>>
>>> [   73.034423] RIP: 0010:copy_siginfo_from_user+0x4d/0xd0
>>>
>>> With the supplied configuration my tool chain only has 0x30 bytes for
>>> all of copy_siginfo_from_user. So I can't even begin to guess where
>>> in that function things are failing.
>>>
>>> Any additional information that you can provide would be a real help
>>> in tracking down this strange failure.
>>
>> I don't have the exact toolchain, but I was able to get somewhat close
>> and may have found a smoking gun. 0x4d in my build is in the general
>> vicinity of "sig_sicodes[sig].limit" in known_siginfo_layout(). This
>> lines up with the register state from the log, e.g. RDI=0500104d8,
>> which is the mask generated by sig_specific_sicodes. From what I can
>> tell, @sig is never bounds checked. If the compiler generated an AND
>> instruction to compare against sig_specific_sicodes then that could
>> resolve true with any arbitrary value that happened to collide with
>> sig_specific_sicodes and result in an out-of-bounds access to
>> @sig_sicodes. siginfo_layout() for example explicitly checks @sig
>> before indexing @sig_sicodes, e.g. "sig < ARRAY_SIZE(sig_sicodes)".
>>
>> Maybe this?
>
> But sig is bounds checked. Even better, sig is checked to see if it
> is one of the values in the array.
>
> From include/linux/signal.h
>
> #define SIG_SPECIFIC_SICODES_MASK (\
> 	rt_sigmask(SIGILL)  | rt_sigmask(SIGFPE)  | \
> 	rt_sigmask(SIGSEGV) | rt_sigmask(SIGBUS)  | \
> 	rt_sigmask(SIGTRAP) | rt_sigmask(SIGCHLD) | \
> 	rt_sigmask(SIGPOLL) | rt_sigmask(SIGSYS)  | \
> 	SIGEMT_MASK)
>
> #define siginmask(sig, mask) \
> 	((sig) < SIGRTMIN && (rt_sigmask(sig) & (mask)))
>
> #define sig_specific_sicodes(sig) siginmask(sig, SIG_SPECIFIC_SICODES_MASK)
>
> Hmm. I wonder if something is passing in a negative signal number.
> There is not a bounds check for that. A sufficiently large signal
> number might be the problem here. Yes. I can get an oops with
> a sufficiently large negative signal number.
>
> The code will later call valid_signal in check_permissions and
> that will cause the system call to fail, so the issue is just that
> the signal number is not being validated early enough.
>
> On the output path (copy_siginfo_to_user and copy_siginfo_to_user32) the
> signal number should be validated before it ever reaches userspace,
> which is why I expect trinity never triggered anything.
>
> There is copy_siginfo_from_user32 and that does call siginfo_layout with
> a possibly negative signal number, which has the same potential issues.
>
> So I am going to go with the fix below. That fixes things in my testing
> and by being unsigned should keep negative numbers from being a
> problem.

Sean, thank you very much for putting me on the right path to track this
failing test down.

Eric
Re: [LKP] 4ce5f9c9e7 [ 1.323881] WARNING: CPU: 0 PID: 1 at mm/slab_common.c:1031 kmalloc_slab
Sean Christopherson writes:

> On Wed, Oct 10, 2018 at 05:06:52PM -0500, Eric W. Biederman wrote:
>> ebied...@xmission.com (Eric W. Biederman) writes:
>>
>> > So I am flummoxed. I am reading through the code and I don't see
>> > anything that could trigger this, and when I ran the supplied reproducer
>> > it did not reproduce for me.
>>
>> Even more so. With my tool chain the line that reports the failing
>> address is impossible.
>>
>> [   73.034423] RIP: 0010:copy_siginfo_from_user+0x4d/0xd0
>>
>> With the supplied configuration my tool chain only has 0x30 bytes for
>> all of copy_siginfo_from_user. So I can't even begin to guess where
>> in that function things are failing.
>>
>> Any additional information that you can provide would be a real help
>> in tracking down this strange failure.
>
> I don't have the exact toolchain, but I was able to get somewhat close
> and may have found a smoking gun. 0x4d in my build is in the general
> vicinity of "sig_sicodes[sig].limit" in known_siginfo_layout(). This
> lines up with the register state from the log, e.g. RDI=0500104d8,
> which is the mask generated by sig_specific_sicodes. From what I can
> tell, @sig is never bounds checked. If the compiler generated an AND
> instruction to compare against sig_specific_sicodes then that could
> resolve true with any arbitrary value that happened to collide with
> sig_specific_sicodes and result in an out-of-bounds access to
> @sig_sicodes. siginfo_layout() for example explicitly checks @sig
> before indexing @sig_sicodes, e.g. "sig < ARRAY_SIZE(sig_sicodes)".
>
> Maybe this?

But sig is bounds checked. Even better, sig is checked to see if it
is one of the values in the array.

From include/linux/signal.h

#define SIG_SPECIFIC_SICODES_MASK (\
	rt_sigmask(SIGILL)  | rt_sigmask(SIGFPE)  | \
	rt_sigmask(SIGSEGV) | rt_sigmask(SIGBUS)  | \
	rt_sigmask(SIGTRAP) | rt_sigmask(SIGCHLD) | \
	rt_sigmask(SIGPOLL) | rt_sigmask(SIGSYS)  | \
	SIGEMT_MASK)

#define siginmask(sig, mask) \
	((sig) < SIGRTMIN && (rt_sigmask(sig) & (mask)))

#define sig_specific_sicodes(sig) siginmask(sig, SIG_SPECIFIC_SICODES_MASK)

Hmm. I wonder if something is passing in a negative signal number.
There is not a bounds check for that. A sufficiently large signal
number might be the problem here. Yes. I can get an oops with
a sufficiently large negative signal number.

The code will later call valid_signal in check_permissions and
that will cause the system call to fail, so the issue is just that
the signal number is not being validated early enough.

On the output path (copy_siginfo_to_user and copy_siginfo_to_user32) the
signal number should be validated before it ever reaches userspace,
which is why I expect trinity never triggered anything.

There is copy_siginfo_from_user32 and that does call siginfo_layout with
a possibly negative signal number, which has the same potential issues.

So I am going to go with the fix below. That fixes things in my testing
and by being unsigned should keep negative numbers from being a
problem.
diff --git a/kernel/signal.c b/kernel/signal.c
index 2bffc5a50183..4fd431ce4f91 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2860,7 +2860,7 @@ static const struct {
 	[SIGSYS]  = { NSIGSYS,  SIL_SYS },
 };
 
-static bool known_siginfo_layout(int sig, int si_code)
+static bool known_siginfo_layout(unsigned sig, int si_code)
 {
 	if (si_code == SI_KERNEL)
 		return true;
@@ -2879,7 +2879,7 @@ static bool known_siginfo_layout(int sig, int si_code)
 	return false;
 }
 
-enum siginfo_layout siginfo_layout(int sig, int si_code)
+enum siginfo_layout siginfo_layout(unsigned sig, int si_code)
 {
 	enum siginfo_layout layout = SIL_KILL;
 	if ((si_code > SI_USER) && (si_code < SI_KERNEL)) {
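The effect of switching the parameter from int to unsigned can be seen in a small user-space sketch. This is not kernel code: DEMO_SIGRTMIN and the function names are invented for the illustration, and the value 32 is an assumption standing in for the kernel's SIGRTMIN.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed stand-in for the kernel's SIGRTMIN; hypothetical name. */
#define DEMO_SIGRTMIN 32

/* Signed parameter: every negative value satisfies sig < DEMO_SIGRTMIN. */
bool below_rtmin_signed(int sig)
{
	return sig < DEMO_SIGRTMIN;
}

/*
 * Unsigned parameter: a negative argument is converted to a very large
 * unsigned value before the comparison, so the same check rejects it.
 */
bool below_rtmin_unsigned(unsigned sig)
{
	return sig < DEMO_SIGRTMIN;
}

/* Small self-check showing the difference for a negative input. */
void check_signedness_demo(void)
{
	assert(below_rtmin_signed(-1000));		/* slips through */
	assert(!below_rtmin_unsigned((unsigned)-1000));	/* rejected */
	assert(below_rtmin_unsigned(11));		/* valid signal number still passes */
}
```

With the signed version a negative signal number gets past the range test and can then be used as an array index; with the unsigned version the range test alone is enough to stop it.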
Re: [LKP] 4ce5f9c9e7 [ 1.323881] WARNING: CPU: 0 PID: 1 at mm/slab_common.c:1031 kmalloc_slab
On Wed, Oct 10, 2018 at 04:41:48PM -0700, Sean Christopherson wrote:
> On Wed, Oct 10, 2018 at 05:06:52PM -0500, Eric W. Biederman wrote:
> > ebied...@xmission.com (Eric W. Biederman) writes:
> >
> > > So I am flummoxed. I am reading through the code and I don't see
> > > anything that could trigger this, and when I ran the supplied reproducer
> > > it did not reproduce for me.
> >
> > Even more so. With my tool chain the line that reports the failing
> > address is impossible.
> >
> > [   73.034423] RIP: 0010:copy_siginfo_from_user+0x4d/0xd0
> >
> > With the supplied configuration my tool chain only has 0x30 bytes for
> > all of copy_siginfo_from_user. So I can't even begin to guess where
> > in that function things are failing.
> >
> > Any additional information that you can provide would be a real help
> > in tracking down this strange failure.
>
> I don't have the exact toolchain, but I was able to get somewhat close
> and may have found a smoking gun. 0x4d in my build is in the general
> vicinity of "sig_sicodes[sig].limit" in known_siginfo_layout(). This
> lines up with the register state from the log, e.g. RDI=0500104d8,
> which is the mask generated by sig_specific_sicodes. From what I can
> tell, @sig is never bounds checked. If the compiler generated an AND
> instruction to compare against sig_specific_sicodes then that could
> resolve true with any arbitrary value that happened to collide with
> sig_specific_sicodes and result in an out-of-bounds access to
> @sig_sicodes. siginfo_layout() for example explicitly checks @sig
> before indexing @sig_sicodes, e.g. "sig < ARRAY_SIZE(sig_sicodes)".

Hmm, siginmask explicitly checks sig < SIGRTMIN, which might squash my
theory.

> Maybe this?
>
> ---
>  kernel/signal.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/signal.c b/kernel/signal.c
> index 1c2dd117fee0..6ee7491de906 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -2865,7 +2865,8 @@ static bool known_siginfo_layout(int sig, int si_code)
>  	if (si_code == SI_KERNEL)
>  		return true;
>  	else if ((si_code > SI_USER)) {
> -		if (sig_specific_sicodes(sig)) {
> +		if (sig < ARRAY_SIZE(sig_sicodes) &&
> +		    sig_specific_sicodes(sig)) {
>  			if (si_code <= sig_sicodes[sig].limit)
>  				return true;
>  		}
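As a side note on the quoted check, here is a hedged user-space sketch of how an index test against an array size behaves for a signed index. DEMO_ARRAY_SIZE, demo_limits and index_in_bounds are hypothetical names, and the table size of 32 is arbitrary. The explicit cast spells out the conversion that C applies when an int is compared with a size_t value on platforms where size_t is at least as wide as int.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space equivalent of an array-size macro. */
#define DEMO_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Arbitrary 32-entry table standing in for a per-signal lookup table. */
static const int demo_limits[32];

/*
 * The index is compared as size_t, so a negative value turns into a
 * very large one and fails the test instead of indexing before the array.
 */
bool index_in_bounds(int sig)
{
	return (size_t)sig < DEMO_ARRAY_SIZE(demo_limits);
}

/* Small self-check covering a valid, a negative and a too-large index. */
void check_index_demo(void)
{
	assert(index_in_bounds(5));
	assert(!index_in_bounds(-1));
	assert(!index_in_bounds(32));
}
```

Under those assumptions a single unsigned comparison covers both the negative and the too-large case.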
Re: [LKP] 4ce5f9c9e7 [ 1.323881] WARNING: CPU: 0 PID: 1 at mm/slab_common.c:1031 kmalloc_slab
On Wed, Oct 10, 2018 at 05:06:52PM -0500, Eric W. Biederman wrote:
> ebied...@xmission.com (Eric W. Biederman) writes:
>
> > So I am flummoxed. I am reading through the code and I don't see
> > anything that could trigger this, and when I ran the supplied reproducer
> > it did not reproduce for me.
>
> Even more so. With my tool chain the line that reports the failing
> address is impossible.
>
> [   73.034423] RIP: 0010:copy_siginfo_from_user+0x4d/0xd0
>
> With the supplied configuration my tool chain only has 0x30 bytes for
> all of copy_siginfo_from_user. So I can't even begin to guess where
> in that function things are failing.
>
> Any additional information that you can provide would be a real help
> in tracking down this strange failure.

I don't have the exact toolchain, but I was able to get somewhat close
and may have found a smoking gun. 0x4d in my build is in the general
vicinity of "sig_sicodes[sig].limit" in known_siginfo_layout(). This
lines up with the register state from the log, e.g. RDI=0500104d8,
which is the mask generated by sig_specific_sicodes. From what I can
tell, @sig is never bounds checked. If the compiler generated an AND
instruction to compare against sig_specific_sicodes then that could
resolve true with any arbitrary value that happened to collide with
sig_specific_sicodes and result in an out-of-bounds access to
@sig_sicodes. siginfo_layout() for example explicitly checks @sig
before indexing @sig_sicodes, e.g. "sig < ARRAY_SIZE(sig_sicodes)".

Maybe this?
---
 kernel/signal.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 1c2dd117fee0..6ee7491de906 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2865,7 +2865,8 @@ static bool known_siginfo_layout(int sig, int si_code)
 	if (si_code == SI_KERNEL)
 		return true;
 	else if ((si_code > SI_USER)) {
-		if (sig_specific_sicodes(sig)) {
+		if (sig < ARRAY_SIZE(sig_sicodes) &&
+		    sig_specific_sicodes(sig)) {
 			if (si_code <= sig_sicodes[sig].limit)
 				return true;
 		}
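The RDI value mentioned above can be cross-checked with a little arithmetic. The sketch below rebuilds the mask in user space under stated assumptions: the signal numbers are the x86 Linux values, hard-coded here; demo_sigmask() assumes that rt_sigmask(sig) sets bit sig-1; and SIGEMT_MASK is taken to be 0 because x86 has no SIGEMT. All D_* and demo_* names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* x86 Linux signal numbers, hard-coded as an assumption of this sketch. */
enum {
	D_SIGILL = 4, D_SIGTRAP = 5, D_SIGBUS = 7, D_SIGFPE = 8,
	D_SIGSEGV = 11, D_SIGCHLD = 17, D_SIGPOLL = 29, D_SIGSYS = 31,
};

/* Assumed shape of rt_sigmask(): signal n maps to bit n-1. */
static uint64_t demo_sigmask(int sig)
{
	return UINT64_C(1) << (sig - 1);
}

/* OR together the same signals that SIG_SPECIFIC_SICODES_MASK lists. */
uint64_t demo_specific_sicodes_mask(void)
{
	return demo_sigmask(D_SIGILL)  | demo_sigmask(D_SIGFPE)  |
	       demo_sigmask(D_SIGSEGV) | demo_sigmask(D_SIGBUS)  |
	       demo_sigmask(D_SIGTRAP) | demo_sigmask(D_SIGCHLD) |
	       demo_sigmask(D_SIGPOLL) | demo_sigmask(D_SIGSYS);
	/* SIGEMT_MASK contributes nothing under the x86 assumption. */
}

/* The result should match the RDI value quoted from the log. */
void check_mask_demo(void)
{
	assert(demo_specific_sicodes_mask() == UINT64_C(0x500104d8));
}
```

Bits 3, 4, 6, 7 and 10 give 0x4d8, bit 16 adds 0x10000, and bits 28 and 30 add 0x50000000, which sums to 0x500104d8.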
Re: [LKP] 4ce5f9c9e7 [ 1.323881] WARNING: CPU: 0 PID: 1 at mm/slab_common.c:1031 kmalloc_slab
ebied...@xmission.com (Eric W. Biederman) writes:

> So I am flummoxed. I am reading through the code and I don't see
> anything that could trigger this, and when I ran the supplied reproducer
> it did not reproduce for me.

Even more so. With my tool chain the line that reports the failing
address is impossible.

[   73.034423] RIP: 0010:copy_siginfo_from_user+0x4d/0xd0

With the supplied configuration my tool chain only has 0x30 bytes for
all of copy_siginfo_from_user. So I can't even begin to guess where
in that function things are failing.

Any additional information that you can provide would be a real help
in tracking down this strange failure.

Thank you,
Eric Biederman
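The mismatch described above comes from the symbol+offset/size notation in the RIP line: 0x4d is the offset into the function and 0xd0 is the function's size in the reporting build. A rough user-space sketch of that comparison follows; struct rip_loc, parse_rip_loc() and offset_fits() are hypothetical helpers written for this illustration, not part of any kernel tooling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical holder for one "symbol+offset/size" location. */
struct rip_loc {
	char symbol[64];
	unsigned offset;	/* bytes into the function */
	unsigned size;		/* function size in the reporting build */
};

/* Parse text such as "copy_siginfo_from_user+0x4d/0xd0". */
bool parse_rip_loc(const char *text, struct rip_loc *out)
{
	return sscanf(text, "%63[^+]+%x/%x",
		      out->symbol, &out->offset, &out->size) == 3;
}

/* Does the reported offset fall inside a locally built function? */
bool offset_fits(const struct rip_loc *loc, unsigned local_size)
{
	return loc->offset < local_size;
}

/* Self-check using the values quoted in the message. */
void check_rip_demo(void)
{
	struct rip_loc loc;

	assert(parse_rip_loc("copy_siginfo_from_user+0x4d/0xd0", &loc));
	assert(offset_fits(&loc, loc.size));	/* fits the reporting build */
	assert(!offset_fits(&loc, 0x30));	/* does not fit a 0x30-byte build */
}
```

An offset of 0x4d cannot land inside a function that is only 0x30 bytes long, so the two builds must have laid the code out differently, for example through different inlining.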
Re: [LKP] 4ce5f9c9e7 [ 1.323881] WARNING: CPU: 0 PID: 1 at mm/slab_common.c:1031 kmalloc_slab
So I am flummoxed. I am reading through the code and I don't see
anything that could trigger this, and when I ran the supplied reproducer
it did not reproduce for me. Plus there is the noise from the
kmalloc_slab test that is goofing up the subject line.

Is there any chance I can get a disassembly of the
copy_siginfo_from_user or post_copy_siginfo_from_user from your build?
I don't have the same tool chain.

Right now I am strongly suspecting that there is a memory stomp
somewhere and the earlier tests just happen to hit something that causes
the pinpointed commit to misbehave. Either that or it is simply that I
don't have the latest and greatest smep/smap hardware and there is an
off by one I am not seeing.

I don't doubt that this test is finding something; I just haven't
figured out how to see what it is finding, and when I exercise the same
code path with my own tests everything appears to work.

Eric

kernel test robot writes:

> Greetings,
>
> 0day kernel testing robot got the below dmesg and the first bad commit is
>
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
>
> commit 4ce5f9c9e7546915c559ffae594e6d73f918db00
> Author: Eric W. Biederman
> AuthorDate: Tue Sep 25 12:59:31 2018 +0200
> Commit: Eric W. Biederman
> CommitDate: Wed Oct 3 16:50:39 2018 +0200
>
>     signal: Use a smaller struct siginfo in the kernel
>
>     We reserve 128 bytes for struct siginfo but only use about 48 bytes on
>     64bit and 32 bytes on 32bit. Someday we might use more but it is unlikely
>     to be anytime soon.
>
>     Userspace seems content with just enough bytes of siginfo to implement
>     sigqueue. Or in the case of checkpoint/restart reinjecting signals
>     the kernel has sent.
>
>     Reducing the stack footprint and the work to copy siginfo around from
>     2 cachelines to 1 cachelines seems worth doing even if I don't have
>     benchmarks to show a performance difference.
>
>     Suggested-by: Linus Torvalds
>     Signed-off-by: "Eric W. Biederman"
>
> ae7795bc61 signal: Distinguish between kernel_siginfo and siginfo
> 4ce5f9c9e7 signal: Use a smaller struct siginfo in the kernel
> 570b7bdeaf Add linux-next specific files for 20181009
>
> +-------------------------------------------+------------+------------+---------------+
> |                                           | ae7795bc61 | 4ce5f9c9e7 | next-20181009 |
> +-------------------------------------------+------------+------------+---------------+
> | boot_successes                            | 0          | 0          | 28            |
> | boot_failures                             | 1144       | 280        | 8             |
> | WARNING:at_mm/slab_common.c:#kmalloc_slab | 1144       | 280        |               |
> | RIP:kmalloc_slab                          | 1144       | 280        |               |
> | Mem-Info                                  | 1144       | 280        | 8             |
> | BUG:unable_to_handle_kernel               | 0          | 5          | 7             |
> | Oops:#[##]                                | 0          | 7          | 8             |
> | RIP:copy_siginfo_from_user                | 0          | 7          |               |
> | Kernel_panic-not_syncing:Fatal_exception  | 0          | 7          | 8             |
> | RIP:post_copy_siginfo_from_user           | 0          | 0          | 8             |
> +-------------------------------------------+------------+------------+---------------+
>
> [    1.320405] test_overflow: ok: (s8)(0 << 7) == 0
> [    1.321071] test_overflow: ok: (s16)(0 << 15) == 0
> [    1.321756] test_overflow: ok: (int)(0 << 31) == 0
> [    1.322442] test_overflow: ok: (s32)(0 << 31) == 0
> [    1.323121] test_overflow: ok: (s64)(0 << 63) == 0
> [    1.323881] WARNING: CPU: 0 PID: 1 at mm/slab_common.c:1031 kmalloc_slab+0x17/0x70
> [    1.324113] CPU: 0 PID: 1 Comm: swapper/0 Tainted: GT 4.19.0-rc1-00077-g4ce5f9c #1
> [    1.324113] RIP: 0010:kmalloc_slab+0x17/0x70
> [    1.324113] Code: 00 00 00 83 3d 11 78 14 03 02 55 48 89 e5 5d 0f 97 c0 c3 55 48 81 ff 00 00 40 00 48 89 e5 76 0e 31 c0 81 e6 00 02 00 00 75 4b <0f> 0b eb 47 48 81 ff c0 00 00 00 77 19 48 85 ff b8 10 00 00 00 74
> [    1.324113] RSP: :88000fc7fd50 EFLAGS: 00010246
> [    1.324113] RAX: RBX: 006000c0 RCX: 88001fb68d47
> [    1.324113] RDX: 0001 RSI: RDI:
> [    1.324113] RBP: 88000fc7fd50 R08: b128ac78 R09: 0001
> [    1.324113] R10: 0001 R11: R12: 88001d814800
> [    1.324113] R13: R14: 836e16f4 R15: 0001
> [    1.324113] FS: () GS:88001f00() knlGS:
> [    1.324113] CS: 0010 DS: ES: CR0: 80050033
> [    1.324113] CR2: CR3: 03012001