Re: Different SIGSEGV codes (x86 and ppc64le)
On Tue, Jan 19, 2016 at 9:34 PM, Michael Ellerman wrote:
>
> The kernel describes those error codes as:
>
> #define SEGV_MAPERR (__SI_FAULT|1) /* address not mapped to object */
> #define SEGV_ACCERR (__SI_FAULT|2) /* invalid permissions for mapped
> object */
>
> Which one is correct in this case isn't entirely clear. There is a stack
> mapping, but you're not allowed to use it because of the stack ulimit, so
> arguably ACCERR is more accurate.
>
> However that's only true because of the stack guard page, which is supposed to
> be somewhat invisible to userspace. If I disable the stack guard page logic,
> userspace sees SEGV_MAPERR, so it seems that historically that's what is
> expected.

I think MAPERR is likely the right thing for a guard page access.

That said, I'd also warn people against caring too much about the details of si_code. We've not traditionally been very good at filling it in. So any program that uses it for any actual semantic behavior is likely just broken. Print it in debuggers by all means, but relying on it in any other way is just crazy. It's just not historically reliable enough.

So I wouldn't worry about it excessively.

Linus
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
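In the spirit of "print it in debuggers by all means", here is a minimal sketch of a crash reporter that prints si_code but never branches on it. It assumes Linux with glibc; the names segv_code_name(), segv_report() and install_segv_reporter() are hypothetical, and the 64 KiB alternate stack is an arbitrary size. The alternate stack matters for this thread: after a stack overflow there is no stack left for an ordinary handler to run on.

```c
#define _XOPEN_SOURCE 700 /* for sigaltstack(), SA_ONSTACK and the SEGV_* codes */
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Map a SIGSEGV si_code to a printable name. Diagnostic use only. */
static const char *segv_code_name(int code)
{
    switch (code) {
    case SEGV_MAPERR: return "SEGV_MAPERR";
    case SEGV_ACCERR: return "SEGV_ACCERR";
    default:          return "unknown";
    }
}

/* Print the fault details, then die with the default SIGSEGV action.
 * si_code is only reported; nothing here decides behavior based on it. */
static void segv_report(int sig, siginfo_t *si, void *ctx)
{
    static const char prefix[] = "SIGSEGV, si_code=";
    const char *name = segv_code_name(si->si_code);

    (void)ctx;
    /* write() is async-signal-safe; printf() is not. */
    write(STDERR_FILENO, prefix, sizeof prefix - 1);
    write(STDERR_FILENO, name, strlen(name));
    write(STDERR_FILENO, "\n", 1);

    signal(sig, SIG_DFL); /* restore the default action ...            */
    raise(sig);           /* ... so the process still dies with SIGSEGV */
}

/* Install segv_report() on an alternate stack so it can still run after a
 * stack overflow. Returns 0 on success, -1 on failure. */
static int install_segv_reporter(void)
{
    stack_t ss;
    struct sigaction sa;

    ss.ss_size = 64 * 1024; /* hypothetical size, well above MINSIGSTKSZ */
    ss.ss_sp = malloc(ss.ss_size);
    ss.ss_flags = 0;
    if (ss.ss_sp == NULL)
        return -1;
    if (sigaltstack(&ss, NULL) != 0) {
        free(ss.ss_sp);
        return -1;
    }

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = segv_report;
    sa.sa_flags = SA_SIGINFO | SA_ONSTACK;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}
```

Calling install_segv_reporter() early in main() is enough; the printed si_code is there for a human reading the log, and the exit status is the same as without the handler.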
Re: Different SIGSEGV codes (x86 and ppc64le)
Breno,

Just to complement what you said, the si->si_code value you get on Power with the code you provided (SEGV_ACCERR, 2) comes from .../arch/powerpc/mm/fault.c, line 375:

https://goo.gl/6K40Bv

Hence, changing that line to 'code = SEGV_MAPERR;' makes your code die with SIGSEGV code 1, not 2:

:~/stack$ uname -a
Linux gromero12 4.4.0 #3 SMP Tue Jan 19 15:46:14 EST 2016 ppc64le ppc64le ppc64le GNU/Linux
:~/stack$ ./overflow
Got SIGSEGV(1) at address: 0x3fffcecefff0
Segmentation fault

Regards,
--
Gustavo Romero
Different SIGSEGV codes (x86 and ppc64le)
During some debugging, we found that during a stack overflow, the SIGSEGV code returned is different on Power and Intel.

We were able to narrow down the test case to the following simple code:

https://github.com/leitao/stack/blob/master/overflow.c

On Power, the SIGSEGV si->si_code is 2 (SEGV_ACCERR), meaning "invalid permissions for mapped object". On the other hand, the same test on x86 returns si->si_code = 1 (SEGV_MAPERR), meaning "address not mapped to object". Any idea why there is such a difference?

Example:

Power
-----
$ gcc overflow.c
$ ./a.out
Got SIGSEGV(2) at address: 0x3fffdd90ffe0

x86
---
$ gcc overflow.c
$ ./a.out
Got SIGSEGV(1) at address: 0x7fff9f089fe8

Thank you!
Breno
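For readers who cannot fetch overflow.c, here is a sketch of a similar experiment. It is not Breno's program, only an assumption about what such a test does: a child process installs an SA_SIGINFO handler on an alternate stack, recurses until the stack ulimit is hit, and reports si_code back to the parent through a pipe. The 1 MiB stack cap, the 1 KiB frame buffer and all function names are hypothetical choices.

```c
#define _XOPEN_SOURCE 700 /* for sigaltstack(), SA_ONSTACK and the SEGV_* codes */
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int report_fd = -1; /* write end of the pipe, used by the child's handler */

/* Runs on the alternate stack: send si_code to the parent, then exit. */
static void on_segv(int sig, siginfo_t *si, void *ctx)
{
    int code = si->si_code;
    (void)sig;
    (void)ctx;
    write(report_fd, &code, sizeof code);
    _exit(0);
}

/* Unbounded, non-tail recursion; the volatile buffer stops the compiler
 * from optimizing the frames away. */
static int recurse(int depth)
{
    volatile char buf[1024];
    buf[0] = (char)depth;
    return recurse(depth + 1) + buf[0];
}

/* Fork a child that overflows its stack and return the si_code its
 * SIGSEGV handler observed, or -1 on error. */
static int probe_overflow_si_code(void)
{
    int fds[2];
    int code = -1;
    pid_t pid;

    if (pipe(fds) != 0)
        return -1;
    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        struct rlimit rl;
        stack_t ss;
        struct sigaction sa;

        close(fds[0]);
        report_fd = fds[1];
        /* Cap the stack ulimit at 1 MiB so the overflow happens quickly. */
        if (getrlimit(RLIMIT_STACK, &rl) == 0 &&
            (rl.rlim_max == RLIM_INFINITY || rl.rlim_max >= 1024 * 1024)) {
            rl.rlim_cur = 1024 * 1024;
            (void)setrlimit(RLIMIT_STACK, &rl);
        }
        ss.ss_size = 64 * 1024;
        ss.ss_sp = malloc(ss.ss_size);
        ss.ss_flags = 0;
        if (ss.ss_sp == NULL || sigaltstack(&ss, NULL) != 0)
            _exit(1);
        memset(&sa, 0, sizeof sa);
        sa.sa_sigaction = on_segv;
        sa.sa_flags = SA_SIGINFO | SA_ONSTACK;
        sigemptyset(&sa.sa_mask);
        if (sigaction(SIGSEGV, &sa, NULL) != 0)
            _exit(1);
        recurse(0);
        _exit(1); /* not reached */
    }

    close(fds[1]);
    if (read(fds[0], &code, sizeof code) != (ssize_t)sizeof code)
        code = -1;
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return code;
}
```

Based on the reports in this thread, one would expect 2 from the 4.4 ppc64le kernel and 1 on x86; that expectation comes from the outputs above, not from running this sketch on those machines.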
Re: Different SIGSEGV codes (x86 and ppc64le)
On Tue, 2016-01-19 at 18:49 -0200, Breno Leitao wrote:
> During some debugging, we found that during a stack overflow, the SIGSEGV code
> returned is different on Power and Intel.
>
> We were able to narrow down the test case to the following simple code:
>
> https://github.com/leitao/stack/blob/master/overflow.c

[So the first thing I did was disable your signal handler, because that just complicates things.]

> On Power, the SIGSEGV si->si_code is 2 (SEGV_ACCERR), meaning "invalid
> permissions for mapped object". On the other hand, the same test on x86
> returns si->si_code = 1 (SEGV_MAPERR), meaning "address not mapped to object".
> Any idea why there is such a difference?

This seems to be a result of the stack guard page.

Whenever the lowest page of the stack vma is faulted in, the kernel grows the vma down one page. That means in do_page_fault() we don't ever see a bad area (i.e. no vma found) for the stack.

Instead we find a vma and call handle_mm_fault(), which then tries to expand the stack down in check_stack_guard_page().

Then in expand_downwards() we call acct_stack_growth(), which checks the stack ulimit, and that is what fails.

That means the failure comes from handle_mm_fault(), and by that point in the logic we have already set code to SEGV_ACCERR. So even though we goto bad_area, code is SEGV_ACCERR and that's what you see.

x86, on the other hand, handles the error path differently: it passes the error down to mm_fault_error(), which calls bad_area_nosemaphore(), which always specifies SEGV_MAPERR for VM_FAULT_SIGSEGV.

The kernel describes those error codes as:

#define SEGV_MAPERR (__SI_FAULT|1) /* address not mapped to object */
#define SEGV_ACCERR (__SI_FAULT|2) /* invalid permissions for mapped object */

Which one is correct in this case isn't entirely clear. There is a stack mapping, but you're not allowed to use it because of the stack ulimit, so arguably ACCERR is more accurate.
However that's only true because of the stack guard page, which is supposed to be somewhat invisible to userspace. If I disable the stack guard page logic, userspace sees SEGV_MAPERR, so it seems that historically that's what is expected.

So we should probably fix this on powerpc.

It also makes me think the logic we have in do_page_fault() to directly expand the stack (around line 375) is now dead code.

cheers
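The difference Michael describes comes down to where `code` is chosen on each architecture. The toy model below is not kernel code: the struct, the function names and the three boolean inputs are hypothetical simplifications that only capture the order of decisions described above (powerpc sets SEGV_ACCERR as soon as a vma is found; x86 routes a VM_FAULT_SIGSEGV from handle_mm_fault() through mm_fault_error() to SEGV_MAPERR).

```c
#include <stdbool.h>

/* Values match SEGV_MAPERR and SEGV_ACCERR from <signal.h>. */
enum { MODEL_SEGV_MAPERR = 1, MODEL_SEGV_ACCERR = 2 };

/* Hypothetical summary of a user-mode fault. */
struct model_fault {
    bool vma_found; /* a vma was found for the faulting address          */
    bool access_ok; /* the vma's permissions allow this access            */
    bool handle_ok; /* handle_mm_fault() succeeded; false models
                       VM_FAULT_SIGSEGV, e.g. the stack ulimit was hit   */
};

/* powerpc as described in the thread: returns the si_code of the
 * resulting SIGSEGV, or 0 if no signal is sent. */
static int model_powerpc_code(struct model_fault f)
{
    int code = MODEL_SEGV_MAPERR;
    if (!f.vma_found)
        return code;          /* bad_area with MAPERR                       */
    code = MODEL_SEGV_ACCERR; /* set once a vma is found ...                */
    if (!f.access_ok)
        return code;
    if (!f.handle_ok)
        return code;          /* ... and still ACCERR when handle_mm_fault()
                                 fails, because of goto bad_area            */
    return 0;
}

/* x86 as described in the thread. */
static int model_x86_code(struct model_fault f)
{
    if (!f.vma_found)
        return MODEL_SEGV_MAPERR;
    if (!f.access_ok)
        return MODEL_SEGV_ACCERR;
    if (!f.handle_ok)
        return MODEL_SEGV_MAPERR; /* mm_fault_error() -> bad_area_nosemaphore() */
    return 0;
}
```

The stack-overflow case from this thread corresponds to { vma_found = true, access_ok = true, handle_ok = false }: the model yields 2 for powerpc and 1 for x86, matching the outputs Breno reported.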