subject:"\[BUG \-tip\] kmemleak and stacktrace cause page faul"

Re: [BUG -tip] kmemleak and stacktrace cause page faul

2019-10-23 Thread Cyrill Gorcunov

On Wed, Oct 23, 2019 at 03:47:57PM +0200, Thomas Gleixner wrote:
> 
> So the fix is trivial.

Works like a charm.

Tested-by: Cyrill Gorcunov

Re: [BUG -tip] kmemleak and stacktrace cause page faul

2019-10-23 Thread Cyrill Gorcunov

On Wed, Oct 23, 2019 at 03:47:57PM +0200, Thomas Gleixner wrote:
> 
> And you are right this happens because cea_exception_stacks is not yet
> initialized which makes begin = 0 and therefore point into nirvana.
> 
> So the fix is trivial.

Great! Thanks a lot for sych detailed analysis! I'll test the patch.

Re: [BUG -tip] kmemleak and stacktrace cause page faul

2019-10-23 Thread Thomas Gleixner

On Wed, 23 Oct 2019, Thomas Gleixner wrote:
> On Tue, 22 Oct 2019, Cyrill Gorcunov wrote:
> Ergo ep must be a valid pointer pointing to the statically allocated and
> statically initialized estack_pages array.
> 
> /* Guard page? */
> if (!ep->size)
> 
> How on earth can dereferencing ep crash the machine?
> 
> return false;
> 
> That does not make any sense.
> 
> Surely, we should not even try to decode exception stack when
> cea_exception_stacks is not yet initialized, but that does not explain
> anything what you are observing.

So looking at your actual crash:

[0.027246] BUG: unable to handle page fault for address: 1ff0

So this derefences the stack pointer address.

[0.082275] stk 0x1010 k 1 begin 0x0 end 0xd000 estack_pages 
0x82014880 ep
0x82014888

ep is pointing correctly to estack_pages[1] which is bogus because 0x1010
is not a valid stack value, but dereferencing ep does not make it crash.

The crash farther down:

end = begin + (unsigned long)ep->size;

==> end = 0x2000

regs = (struct pt_regs *)end - 1;

==> regs = 0x2000 - sizeof(struct pt_regs *) = 0x1ff0

info->type  = ep->type;
info->begin = (unsigned long *)begin;
info->end   = (unsigned long *)end;

>   info->next_sp   = (unsigned long *)regs->sp;

This is the crashing instruction trying to access 0x1ff0

And you are right this happens because cea_exception_stacks is not yet
initialized which makes begin = 0 and therefore point into nirvana.

So the fix is trivial.

Thanks,

tglx

8<
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -94,6 +94,13 @@ static bool in_exception_stack(unsigned
BUILD_BUG_ON(N_EXCEPTION_STACKS != 6);
 
begin = (unsigned long)__this_cpu_read(cea_exception_stacks);
+   /*
+* Handle the case where stack trace is collected _before_
+* cea_exception_stacks had been initialized.
+*/
+   if (!begin)
+   return false;
+
end = begin + sizeof(struct cea_exception_stacks);
/* Bail if @stack is outside the exception stack area. */
if (stk < begin || stk >= end)

Re: [BUG -tip] kmemleak and stacktrace cause page faul

2019-10-23 Thread Cyrill Gorcunov

On Wed, Oct 23, 2019 at 03:38:40PM +0200, Thomas Gleixner wrote:
> > > 
> > > [0,2,4,5,6,8,10,12] are guard pages so 0 is not that crappy at all
> > 
> > Wait, Thomas, I might be wrong, but per-cpu is initialized to the pointer,
> > the memory for this estack_pages has not yet been allocated, no?
> 
> static const
> struct estack_pages estack_pages[CEA_ESTACK_PAGES] cacheline_aligned = {
> EPAGERANGE(DF),
>   EPAGERANGE(NMI),
>   EPAGERANGE(DB1),
>   EPAGERANGE(DB),
> EPAGERANGE(MCE),
> };
> 
> It's statically allocated. So it's available from the very beginning.

Indeed, thanks! I happened to overlooked this moment.
...
> And as I explained to you properly decoded the values _ARE_ correct and
> make sense.

Yes, just posted the diff itself to be sure. Thanks a huge for
explanation, Thomas!

Re: [BUG -tip] kmemleak and stacktrace cause page faul

2019-10-23 Thread Thomas Gleixner

On Wed, 23 Oct 2019, Cyrill Gorcunov wrote:
> On Wed, Oct 23, 2019 at 03:21:05PM +0200, Thomas Gleixner wrote:
> > Errm. estack_pages is statically initialized and it's an array of:.
> > 
> > struct estack_pages {
> > u32 offs;
> > u16 size;
> > u16 type;
> > };
> > 
> > [0,2,4,5,6,8,10,12] are guard pages so 0 is not that crappy at all
> 
> Wait, Thomas, I might be wrong, but per-cpu is initialized to the pointer,
> the memory for this estack_pages has not yet been allocated, no?

static const
struct estack_pages estack_pages[CEA_ESTACK_PAGES] cacheline_aligned = {
EPAGERANGE(DF),
EPAGERANGE(NMI),
EPAGERANGE(DB1),
EPAGERANGE(DB),
EPAGERANGE(MCE),
};

It's statically allocated. So it's available from the very beginning.

> The diff I made to fetch the values are
> 
> diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
> index 753b8cfe8b8a..bf0d755b6079 100644
> --- a/arch/x86/kernel/dumpstack_64.c
> +++ b/arch/x86/kernel/dumpstack_64.c
> @@ -101,8 +101,18 @@ static bool in_exception_stack(unsigned long *stack, 
> struct stack_info *info)
>  
>   /* Calc page offset from start of exception stacks */
>   k = (stk - begin) >> PAGE_SHIFT;
> +
>   /* Lookup the page descriptor */
>   ep = _pages[k];
> +
> + printk("stk 0x%lx k %u begin 0x%lx end 0x%lx estack_pages 0x%lx ep 
> 0x%lx\n",
> +stk, k, begin, end, (long)(void *)_pages[0], (long)(void 
> *)ep);
> +
> + for (k = 0; k < CEA_ESTACK_PAGES; k++) {
> + long v = *(long *)(void *)_pages[k];
> + printk("estack_pages[%d] = 0x%lx\n", k, v);

And as I explained to you properly decoded the values _ARE_ correct and
make sense.

> + }
> +
>   /* Guard page? */
>   if (!ep->size)
>   return false;
> 
> 
> > 
> > e.g. 0x51000 1000
> > 
> >  bit  0-31: 1000Offset 0x1000: 1 Page
> >  bit 32-47: 1000Size 0x1000:   1 Page
> >  bit 48-63: 5   Type 5: STACK_TYPE_EXCEPTION + ESTACK_DF
> > 
> > So, no. This is NOT the problem.
> 
> I drop the left of your reply. True, I agreed with anything you said.
>
> You know I didn't manage to dive more into this problem yesterday
> but if time permits I'll continue today. It is easily triggering
> under kvm (the kernel I'm building is almost without modules so
> I simply upload bzImage into the guest). FWIW, the config I'm
> using is https://gist.github.com/cyrillos/7cd5d2510a99af8ea872f07ac6f9095b

That's helpful because I enabled kmemleak and the kernel comes up just fine.

Thanks,

tglx

Re: [BUG -tip] kmemleak and stacktrace cause page faul

2019-10-23 Thread Thomas Gleixner

On Tue, 22 Oct 2019, Cyrill Gorcunov wrote:
> On Tue, Oct 22, 2019 at 05:23:25PM +0300, Cyrill Gorcunov wrote:
> > 
> > I presume the kmemleak tries to save stack trace too early when 
> > estack_pages are not
> > yet filled.
> 
> Indeed, at this stage of boot the percpu_setup_exception_stacks has not been 
> called
> yet and estack_pages full of crap
> 
> [0.157502] stk 0x1008 k 1 begin 0x0 end 0xd000 estack_pages 
> 0x82014880 ep 0x82014888
> [0.159395] estack_pages[0] = 0x0
> [0.160046] estack_pages[1] = 0x510001000
> [0.160881] estack_pages[2] = 0x0
> [0.161530] estack_pages[3] = 0x610003000
> [0.162343] estack_pages[4] = 0x0
> [0.162962] estack_pages[5] = 0x0
> [0.163523] estack_pages[6] = 0x0
> [0.164065] estack_pages[7] = 0x810007000
> [0.164978] estack_pages[8] = 0x0
> [0.165624] estack_pages[9] = 0x910009000
> [0.166448] estack_pages[10] = 0x0
> [0.167064] estack_pages[11] = 0xa1000b000
> [0.168055] estack_pages[12] = 0x0

Errm. estack_pages is statically initialized and it's an array of:.

struct estack_pages {
u32 offs;
u16 size;
u16 type;
};

[0,2,4,5,6,8,10,12] are guard pages so 0 is not that crappy at all

The rest looks completely valid if you actually decode it proper.

e.g. 0x51000 1000

 bit  0-31: 1000Offset 0x1000: 1 Page
 bit 32-47: 1000Size 0x1000:   1 Page
 bit 48-63: 5   Type 5: STACK_TYPE_EXCEPTION + ESTACK_DF

So, no. This is NOT the problem.

But yes, you are right that percpu_setup_exception_stacks() has not yet
been called simply because the percpu entry area has not been mapped yet.

So lets look at the full context:

begin = (unsigned long)__this_cpu_read(cea_exception_stacks);

When percpu_setup_exception_stacks() has not been called yet, then begin
should be 0.

end = begin + sizeof(struct cea_exception_stacks);

end should be 0 + sizeof(struct cea_exception_stacks);

/* Bail if @stack is outside the exception stack area. */
if (stk < begin || stk >= end)
return false;

So 'begin <= stk < end' must be true to get to the below:

/* Calc page offset from start of exception stacks */
k = (stk - begin) >> PAGE_SHIFT;

which gives a valid 'k' no matter what 'begin' is. And obviously 'k' cannot
be outside of the array size of estack_pages.

/* Lookup the page descriptor */
ep = _pages[k];

Ergo ep must be a valid pointer pointing to the statically allocated and
statically initialized estack_pages array.

/* Guard page? */
if (!ep->size)

How on earth can dereferencing ep crash the machine?

return false;

That does not make any sense.

Surely, we should not even try to decode exception stack when
cea_exception_stacks is not yet initialized, but that does not explain
anything what you are observing.

Thanks,

tglx

Re: [BUG -tip] kmemleak and stacktrace cause page faul

2019-10-23 Thread Cyrill Gorcunov

On Wed, Oct 23, 2019 at 03:21:05PM +0200, Thomas Gleixner wrote:
> On Tue, 22 Oct 2019, Cyrill Gorcunov wrote:
> > On Tue, Oct 22, 2019 at 05:23:25PM +0300, Cyrill Gorcunov wrote:
> > > 
> > > I presume the kmemleak tries to save stack trace too early when 
> > > estack_pages are not
> > > yet filled.
> > 
> > Indeed, at this stage of boot the percpu_setup_exception_stacks has not 
> > been called
> > yet and estack_pages full of crap
> > 
> > [0.157502] stk 0x1008 k 1 begin 0x0 end 0xd000 estack_pages 
> > 0x82014880 ep 0x82014888
> > [0.159395] estack_pages[0] = 0x0
> > [0.160046] estack_pages[1] = 0x510001000
> > [0.160881] estack_pages[2] = 0x0
> > [0.161530] estack_pages[3] = 0x610003000
> > [0.162343] estack_pages[4] = 0x0
> > [0.162962] estack_pages[5] = 0x0
> > [0.163523] estack_pages[6] = 0x0
> > [0.164065] estack_pages[7] = 0x810007000
> > [0.164978] estack_pages[8] = 0x0
> > [0.165624] estack_pages[9] = 0x910009000
> > [0.166448] estack_pages[10] = 0x0
> > [0.167064] estack_pages[11] = 0xa1000b000
> > [0.168055] estack_pages[12] = 0x0
> 
> Errm. estack_pages is statically initialized and it's an array of:.
> 
> struct estack_pages {
> u32 offs;
> u16 size;
> u16 type;
> };
> 
> [0,2,4,5,6,8,10,12] are guard pages so 0 is not that crappy at all

Wait, Thomas, I might be wrong, but per-cpu is initialized to the pointer,
the memory for this estack_pages has not yet been allocated, no?

> The rest looks completely valid if you actually decode it proper.

The diff I made to fetch the values are

diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index 753b8cfe8b8a..bf0d755b6079 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -101,8 +101,18 @@ static bool in_exception_stack(unsigned long *stack, 
struct stack_info *info)
 
/* Calc page offset from start of exception stacks */
k = (stk - begin) >> PAGE_SHIFT;
+
/* Lookup the page descriptor */
ep = _pages[k];
+
+   printk("stk 0x%lx k %u begin 0x%lx end 0x%lx estack_pages 0x%lx ep 
0x%lx\n",
+  stk, k, begin, end, (long)(void *)_pages[0], (long)(void 
*)ep);
+
+   for (k = 0; k < CEA_ESTACK_PAGES; k++) {
+   long v = *(long *)(void *)_pages[k];
+   printk("estack_pages[%d] = 0x%lx\n", k, v);
+   }
+
/* Guard page? */
if (!ep->size)
return false;


> 
> e.g. 0x51000 1000
> 
>  bit  0-31: 1000  Offset 0x1000: 1 Page
>  bit 32-47: 1000  Size 0x1000:   1 Page
>  bit 48-63: 5 Type 5: STACK_TYPE_EXCEPTION + ESTACK_DF
> 
> So, no. This is NOT the problem.

I drop the left of your reply. True, I agreed with anything you said.
You know I didn't manage to dive more into this problem yesterday
but if time permits I'll continue today. It is easily triggering
under kvm (the kernel I'm building is almost without modules so
I simply upload bzImage into the guest). FWIW, the config I'm
using is https://gist.github.com/cyrillos/7cd5d2510a99af8ea872f07ac6f9095b

Re: [BUG -tip] kmemleak and stacktrace cause page faul

2019-10-22 Thread Cyrill Gorcunov

On Tue, Oct 22, 2019 at 05:23:25PM +0300, Cyrill Gorcunov wrote:
> 
> I presume the kmemleak tries to save stack trace too early when estack_pages 
> are not
> yet filled.

Indeed, at this stage of boot the percpu_setup_exception_stacks has not been 
called
yet and estack_pages full of crap

[0.157502] stk 0x1008 k 1 begin 0x0 end 0xd000 estack_pages 
0x82014880 ep 0x82014888
[0.159395] estack_pages[0] = 0x0
[0.160046] estack_pages[1] = 0x510001000
[0.160881] estack_pages[2] = 0x0
[0.161530] estack_pages[3] = 0x610003000
[0.162343] estack_pages[4] = 0x0
[0.162962] estack_pages[5] = 0x0
[0.163523] estack_pages[6] = 0x0
[0.164065] estack_pages[7] = 0x810007000
[0.164978] estack_pages[8] = 0x0
[0.165624] estack_pages[9] = 0x910009000
[0.166448] estack_pages[10] = 0x0
[0.167064] estack_pages[11] = 0xa1000b000
[0.168055] estack_pages[12] = 0x0
[0.168891] BUG: unable to handle page fault for address: 1ff0

Re: [BUG -tip] kmemleak and stacktrace cause page faul

2019-10-22 Thread Cyrill Gorcunov

On Sat, Oct 19, 2019 at 02:44:21PM +0300, Cyrill Gorcunov wrote:
...
> 
> I nailed it down to the following kmemleak code
> 
> create_object
>   ...
>   object->trace_len = __save_stack_trace(object->trace);
> 
> if I drop this line out it boots fine. Just wanted to share the observation,
> probably it is known issue already.
> 
> Sidenote: The last -tip kernel which I've been working with is dated Sep 18
> so the changes which cause the problem should be introduced last month.

I've just tried to boot with fresh -tip

commit c2e50c28eeb90c0f3309d43c2ce0bd292a6751b3 (HEAD -> master, origin/master, 
origin/HEAD)
Merge: aec1ea9d4f48 27a0a90d6301
Author: Ingo Molnar 
Date:   Tue Oct 22 01:16:59 2019 +0200

Merge branch 'perf/core'

Conflicts:
tools/perf/check-headers.sh

but got the same issue. So I tried to go deeper, and here is a result: we're 
failing
in arch/x86/kernel/dumpstack_64.c:in_exception_stack routine, precisely at line

ep = _pages[k];
/* Guard page? */
if (!ep->size)
return false;

so I added a logline here

[0.082275] stk 0x1010 k 1 begin 0x0 end 0xd000 estack_pages 
0x82014880 ep 0x82014888
[0.084951] BUG: unable to handle page fault for address: 1ff0
[0.086724] #PF: supervisor read access in kernel mode
[0.088506] #PF: error_code(0x) - not-present page
[0.090265] PGD 0 P4D 0 
[0.090846] Oops:  [#2] PREEMPT SMP PTI
[0.091734] CPU: 0 PID: 0 Comm: swapper Not tainted 
5.4.0-rc4-00258-gc2e50c28eeb9-dirty #114
[0.093514] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
[0.096993] RIP: 0010:get_stack_info+0xdc/0x173
[0.097994] Code: 84 48 01 82 66 85 c0 74 27 42 8b 14 f5 80 48 01 82 49 01 
d5 42 0f b7 14 f5 86 48 01 82 4c 01 e8 4c 89 6b 08 89 13 48 89 43 10 <48> 8b 40 
f0 eb 2b 65 48 8b 05 16 f4 f9 7e 48 8d 90 00 c0 ff ff 48

I presume the kmemleak tries to save stack trace too early when estack_pages 
are not
yet filled.

[BUG -tip] kmemleak and stacktrace cause page faul

2019-10-19 Thread Cyrill Gorcunov

Hi! I'm not sure if I've CC'ed proper persons, so please sorry if I did.
Anyway, today's -tip (07b4dbf1d830) refused to boot

[0.024793] No NUMA configuration found
[0.025406] Faking a node at [mem 0x-0x7ffdefff]
[0.026462] NODE_DATA(0) allocated [mem 0x7ffdb000-0x7ffdefff]
[0.027246] BUG: unable to handle page fault for address: 1ff0
[0.028160] #PF: supervisor read access in kernel mode
[0.028992] #PF: error_code(0x) - not-present page
[0.029820] PGD 0 P4D 0 
[0.030226] Oops:  [#1] PREEMPT SMP PTI
[0.031069] CPU: 0 PID: 0 Comm: swapper Not tainted 
5.4.0-rc3-00258-g07b4dbf1d830 #93
[0.032317] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
[0.034163] RIP: 0010:get_stack_info+0xb3/0x148
[0.034903] Code: 04 d5 84 48 01 82 66 85 c0 74 25 8b 0c d5 80 48 01 82 0f 
b7 14 d5 86 48 01 82 48 01 f1 89 13 48 01 c8 48 89 4b 08 48 89 43 10 <48> 8b 40 
f0 eb 2b 65 48 8b 05 1f f4 f9 7e 48 8d 90 00 c0 ff ff 48
[0.037579] RSP: :82603be0 EFLAGS: 00010006

I nailed it down to the following kmemleak code

create_object
  ...
  object->trace_len = __save_stack_trace(object->trace);

if I drop this line out it boots fine. Just wanted to share the observation,
probably it is known issue already.

Sidenote: The last -tip kernel which I've been working with is dated Sep 18
so the changes which cause the problem should be introduced last month.

Re: [BUG -tip] kmemleak and stacktrace cause page faul

Re: [BUG -tip] kmemleak and stacktrace cause page faul

Re: [BUG -tip] kmemleak and stacktrace cause page faul

Re: [BUG -tip] kmemleak and stacktrace cause page faul

Re: [BUG -tip] kmemleak and stacktrace cause page faul

Re: [BUG -tip] kmemleak and stacktrace cause page faul

Re: [BUG -tip] kmemleak and stacktrace cause page faul

Re: [BUG -tip] kmemleak and stacktrace cause page faul

Re: [BUG -tip] kmemleak and stacktrace cause page faul

[BUG -tip] kmemleak and stacktrace cause page faul

10 matches

Site Navigation

Mail list logo

Footer information