Re: Generic page fault (Was: libsigsegv ....)
On Sat, 2015-02-28 at 16:41 -0800, Linus Torvalds wrote: > On Sat, Feb 28, 2015 at 3:02 PM, Benjamin Herrenschmidt > wrote: > > > > Anyway, here's the current patch: > > Ok, I think I like this approach better. > > Your FAULT_FLAG_EXEC handling is wrong, though. It shouldn't check > VM_WRITE, it should check VM_EXEC. A bit too much copy-paste ;) Ah yes indeed :) I would have caught that pretty quickly once I tried powerpc. > Btw, it's quite possible that we could just do all the PF_PROT > handling at the x86 level, before even calling the generic fault > handler. It's not like we even need the vma or the mm semaphore: if > it's a non-write protection fault, we always SIGSEGV. So why even > bother getting the locks and looking up the page tables etc? For non-write possibly, though that means we always end up going through handle_mm_fault for a non-present PTE before we decide that it wasn't actually accessible. Not a huge deal I suppose unless something (ab)uses mprotect + emulation for things like spying on MMIO accesses (been there) or god knows what other userspace based paging or GC might be out there. It's asking for trouble but the check in access_error() doesn't hurt much either. On the other hand, we are all about simplifying code here, aren't we ? :-) It's not like the early bail-out on x86 will buy us much and it's going to be more arch specific code since they'll all have a different variant of PF_PROT. One thing that worries me a bit, however, with the PF_PROT handling in the generic case is the case of archs who support present-but-not-accessible PTEs in "HW" (for us that basically means no _PAGE_USER), as these might use such fault for NUMA balancing, or software maintained "young" bit, and in those case, bailing out on PF_PROT is wrong. I'll have to figure out if that ever happens when I dig through more archs. > Now, that PF_PROT handling isn't exactly performance-critical, but it > might help to remove the odd x86 special case from the generic code. I don't think it's that odd on x86. If you look at what other archs do, there's a bit of everything out there. Just powerpc may or may not support exec permission for example depending on the CPU generation and when it doesn't it's basically similar to x86. I yet have to fully understand what ARM does (looks simple, just need to make sure it's what I think it is and check various ARM generations), sparc has its oddities, etc... None of that is terribly urgent and I'm pretty busy so give me a bit more time now that we have some kind of direction, to go through a few more archs and see where it takes us. I'll start gathering cross compilers and qemu debian images in the next few days. Cheers, Ben. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Generic page fault (Was: libsigsegv ....)
On Sat, Feb 28, 2015 at 3:02 PM, Benjamin Herrenschmidt wrote: > > Anyway, here's the current patch: Ok, I think I like this approach better. Your FAULT_FLAG_EXEC handling is wrong, though. It shouldn't check VM_WRITE, it should check VM_EXEC. A bit too much copy-paste ;) Btw, it's quite possible that we could just do all the PF_PROT handling at the x86 level, before even calling the generic fault handler. It's not like we even need the vma or the mm semaphore: if it's a non-write protection fault, we always SIGSEGV. So why even bother getting the locks and looking up the page tables etc? Now, that PF_PROT handling isn't exactly performance-critical, but it might help to remove the odd x86 special case from the generic code. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Generic page fault (Was: libsigsegv ....)
On Sun, 2015-03-01 at 09:16 +1100, Benjamin Herrenschmidt wrote: > So for error handling, I'm trying to simply return the VM_FAULT_* flags > from generic_page_fault see where that takes us. That's a way to avoid > passing an arch specific struct around. It also allows my hack to > account major faults with the hypervisor to be done outside the generic > code completely (no hook). > > .../... Here's what it looks like for x86 only and without completely sorting out the fatal signal business. However I might still have to do the arch pointer you mentioned for sparc and possibly other archs, but so far it looks better already. Note that if I add that arch pointer, I might stop messing around or even returning "fault" and instead just return a simple enum minor,major,error and let inline arch hooks populate the arch pointer with the error details in whatever fashion the arch prefers. However I suspect they'll all end up with sig and si_code in there... Anyway, here's the current patch: arch/x86/include/asm/fault.h | 21 arch/x86/mm/fault.c | 233 --- include/linux/fault.h| 24 + include/linux/mm.h | 5 +- mm/Makefile | 2 +- mm/fault.c | 196 6 files changed, 266 insertions(+), 215 deletions(-) create mode 100644 arch/x86/include/asm/fault.h create mode 100644 include/linux/fault.h create mode 100644 mm/fault.c diff --git a/arch/x86/include/asm/fault.h b/arch/x86/include/asm/fault.h new file mode 100644 index 000..04263ec --- /dev/null +++ b/arch/x86/include/asm/fault.h @@ -0,0 +1,21 @@ +#ifndef _ASM_X86_FAULT_H +#define _ASM_X86_FAULT_H + +#include +#include + +/* Check if the stack is allowed to grow during a user page fault */ +static inline bool stack_can_grow(struct pt_regs *regs, unsigned long flags, + unsigned long address, + struct vm_area_struct *vma) +{ + /* +* Accessing the stack below %sp is always a bug. +* The large cushion allows instructions like enter +* and pusha to work. ("enter $65535, $31" pushes +* 32 pointers and then decrements %sp by 65535.) +*/ + return address + 65536 + 32 * sizeof(unsigned long) >= regs->sp; +} + +#endif /* _ASM_X86_FAULT_H */ diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index ede025f..19a8a91 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -13,6 +13,7 @@ #include /* hstate_index_to_shift*/ #include /* prefetchw*/ #include /* exception_enter(), ... */ +#include #include /* dotraplinkage, ... */ #include/* pgd_*(), ... */ @@ -748,8 +749,7 @@ show_signal_msg(struct pt_regs *regs, unsigned long error_code, printk(KERN_CONT "\n"); } -static void -__bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code, +static void __bad_area(struct pt_regs *regs, unsigned long error_code, unsigned long address, int si_code) { struct task_struct *tsk = current; @@ -804,39 +804,10 @@ __bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code, no_context(regs, error_code, address, SIGSEGV, si_code); } -static noinline void -bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code, -unsigned long address) -{ - __bad_area_nosemaphore(regs, error_code, address, SEGV_MAPERR); -} - -static void -__bad_area(struct pt_regs *regs, unsigned long error_code, - unsigned long address, int si_code) -{ - struct mm_struct *mm = current->mm; - - /* -* Something tried to access memory that isn't in our memory map.. -* Fix it, but check if it's kernel or user first.. -*/ - up_read(>mmap_sem); - - __bad_area_nosemaphore(regs, error_code, address, si_code); -} - -static noinline void -bad_area(struct pt_regs *regs, unsigned long error_code, unsigned long address) -{ - __bad_area(regs, error_code, address, SEGV_MAPERR); -} - -static noinline void -bad_area_access_error(struct pt_regs *regs, unsigned long error_code, - unsigned long address) +static inline void bad_area(struct pt_regs *regs, unsigned long error_code, + unsigned long address, int si_code) { - __bad_area(regs, error_code, address, SEGV_ACCERR); + __bad_area(regs, error_code, address, si_code); } static void @@ -871,40 +842,6 @@ do_sigbus(struct pt_regs *regs, unsigned long error_code, unsigned long address, force_sig_info_fault(SIGBUS, code, address, tsk, fault); } -static noinline void -mm_fault_error(struct pt_regs *regs, unsigned long error_code, - unsigned long address, unsigned int fault) -{ - if (fatal_signal_pending(current) &&
Re: Generic page fault (Was: libsigsegv ....)
On Sun, 2015-03-01 at 09:16 +1100, Benjamin Herrenschmidt wrote: > So for error handling, I'm trying to simply return the VM_FAULT_* flags > from generic_page_fault see where that takes us. That's a way to avoid > passing an arch specific struct around. It also allows my hack to > account major faults with the hypervisor to be done outside the generic > code completely (no hook). I suspect sparc64 will defeat the "not passing an arch struct around" though... oh well. Ben. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Generic page fault (Was: libsigsegv ....)
On Sat, 2015-02-28 at 13:49 -0800, Linus Torvalds wrote: .../... > - we handle write faults separately (see the first part of access_error() > > - so now we know it was a read or an instruction fetch > > - if PF_PROT is set, that means that the present bit was set in the > page tables, so it must have been an exec access to a NX page > > - otherwise, we just say "PROTNONE means no access, otherwise > populate the page tables" > > .. and if it turns out that it was a PF_INSTR to a NX page, we'll end > up taking the page fault *again* after it's been populated, and now > since the page table was populated, the access_error() will catch it > with the PF_PROT case. > > Or something like that. I might have screwed up some detail, but it > should all work. I see, it should work yes, I'll still add that FAULT_FLAG_EXEC for those who can tell reliably but it shouldn't hurt for x86 to not set it. Cheers, Ben. > Linus > -- > To unsubscribe from this list: send the line "unsubscribe linux-arch" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Generic page fault (Was: libsigsegv ....)
So for error handling, I'm trying to simply return the VM_FAULT_* flags from generic_page_fault see where that takes us. That's a way to avoid passing an arch specific struct around. It also allows my hack to account major faults with the hypervisor to be done outside the generic code completely (no hook). We will process generically some of the flags first such as the repeat logic or major/minor accounting of course. For that to work, I'm adding a VM_FAULT_ACCESS (that gets OR'ed with VM_FAULT_SIGSEGV) to differentiate SEGV_MAPERR and SEGV_ACCERR. So far so good. However, I noticed a small discrepancy on x86 in the handling of fatal signals: I see two path that can be hit on a fatal signal. The "obvious" one is the one in access_error() which calls no_context() with a 0 signal argument, the other path is in the retry handling, which will in this case call no_context() with SIGBUS/BUS_ADRERR. Now, the only place in there that seems to care about the signal that gets passed in is the sig_on_uaccess_error case. On one case (0 sig), that test will be skipped, on the other case (SIGBUS), that test will be done and might result in a sigbus being generated, which might override the original deadly signal (or am I missing something ?) Now I don't completely understand how the x86 vsyscall stuff works so I don't know precisely in what circumstances that test matters, I'll need you help there. Cheers, Ben. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Generic page fault (Was: libsigsegv ....)
On Sat, Feb 28, 2015 at 1:14 PM, Benjamin Herrenschmidt wrote: > > BTW. I fail to see how x86 checks PF_INSTR vs. VM_NOEXEC ... or it doesn't ? It doesn't. x86 traditionally doesn't have an execute bit, so traditionally "read == exec". So PF_INSTR really wasn't historically very useful, in that it would only show if the *first* access to a page was an instruction fetch - if you did a regular read to brign the page in, then subsequent instruction fetches would just work. Then NX came along, and what happens now is - we handle write faults separately (see the first part of access_error() - so now we know it was a read or an instruction fetch - if PF_PROT is set, that means that the present bit was set in the page tables, so it must have been an exec access to a NX page - otherwise, we just say "PROTNONE means no access, otherwise populate the page tables" .. and if it turns out that it was a PF_INSTR to a NX page, we'll end up taking the page fault *again* after it's been populated, and now since the page table was populated, the access_error() will catch it with the PF_PROT case. Or something like that. I might have screwed up some detail, but it should all work. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Generic page fault (Was: libsigsegv ....)
On Sat, 2015-02-28 at 11:56 -0800, Linus Torvalds wrote: > On Fri, Feb 27, 2015 at 11:12 PM, Benjamin Herrenschmidt > wrote: > > > > Let me know what you think of the approach. > > Hmm. I'm not happy with just how many of those arch wrapper/helper > functions there are, and some of it looks a bit unportable. > > For example, the code "knows" that "err_code" and "address" are the > only two architecture-specific pieces of information (in addition to > "struct pt_regs", of course. > > And the fault_is_user/write() stuff seems unnecessary - we have the > generic FAULT_FLAG_USER/WRITE flags for that, but instead of passing > in the generic version to the generic function, we pass in the > arch-specific ones. I was trying to keep that fault mask local but it might indeed be a mistake. The reasoning is that if you look at more archs (the scary one is sparc), there is potentially more arch "gunk" that needs to be hooked in after find_vma()... And a lot of that need arch specific flags coming in. For example on ppc I check if the instruction is allowed to grow the stack before we take the mm sem and pass that information for use by the stack growing check. Here I could just I suppose add some arch specific fault flags but it might get messy... > The same goes for "access_error()". Right now it's an arch-specific > helper function, but it actually does some generic tests (like > checking the vma protections), and I think it could/should be entirely > generic if we just added the necessary FAULT_FLAG_READ/EXEC/NOTPRESENT > information to the "flags" register. Just looking at the ppc version, > my gut feel is that "access_error()" is just horribly error-prone > as-is even from an architecture standpoint. I was weary of going down the path of having access_error() be an arch helper because it gets even wilder with other archs and the crazy platform flags we deal with on ppc. Here. I was thinking of looking at factoring that in a second step. You are right that most of it seems to related to execute permission however (at least sparc, mips, powerpc and arm). The way to go would be to define a FAULT_FLAG_EXEC which the arch only sets if it supports enforcing exec at fault time. BTW. I fail to see how x86 checks PF_INSTR vs. VM_NOEXEC ... or it doesn't ? > Yes, the "read/exec/notpresent" bits are a bit weird, but old x86 > isn't the only architecture that doesn't necessarily know the > difference between read and exec. > > So I'd like a bit more encapsulation. The generic code should never > really need to look at the arch-specific details, although it is true > that then the error handling cases will likely need it (in order to > put it on the arch-specific stack info or similar). Right, we might still have to pass error_code around, I'll have a look. > So my *gut* feel is that the generic code should be passed > > - address and the generic flags (ie FAULT_FLAG_USER/WRITE filled in > by the caller) > >These are the only things the generic code should need to use itself > > - I guess we do need to pass in "struct pt_regs", because things like > generic tracing actually want it. And error handling but it could be in the latter... > - an opaque "arch specific fault information structure pointer". Not > used for anything but passing back to the error functions (ie very > much *not* for helper functions for the normal case, like the current > "access_error()" - if it's actually needed for those kinds of things, > then I'm wrong about the model) Well, it might be needed for stack growth check. It's also somewhat means we have yet *another* fault information structure which is a bit sad but I don't see a good way to fold them into one. For errors I was thinking instead of returning error info from the generic code and leaving the generation of oops/signal to the caller, which means less arch gunk to carry around. >This would (for x86) contain "error_code", but I have the feeling > that other architectures might need/want more than one word of > information. > > But I don't know. Maybe I'm wrong. I don't hate the patch as-is, I > just have this feeling that it coudl be more "generic", and less > "random small arch helpers". I don't like it that much either, it's a first draft to see what the waters are like. I'll play around more and see what It comes to. Cheers, Ben. > > Linus > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majord...@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: mailto:"d...@kvack.org;> em...@kvack.org -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Generic page fault (Was: libsigsegv ....)
On Sat, Feb 28, 2015 at 11:56 AM, Linus Torvalds wrote: > > But I don't know. Maybe I'm wrong. I don't hate the patch as-is, I > just have this feeling that it coudl be more "generic", and less > "random small arch helpers". Oh, and I definitely agree with you on the "single handle_bad_fault()" thing rather than the current handle_bad_area/handle_kernel_fault/do_sigbus. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Generic page fault (Was: libsigsegv ....)
On Fri, Feb 27, 2015 at 11:12 PM, Benjamin Herrenschmidt wrote: > > Let me know what you think of the approach. Hmm. I'm not happy with just how many of those arch wrapper/helper functions there are, and some of it looks a bit unportable. For example, the code "knows" that "err_code" and "address" are the only two architecture-specific pieces of information (in addition to "struct pt_regs", of course. And the fault_is_user/write() stuff seems unnecessary - we have the generic FAULT_FLAG_USER/WRITE flags for that, but instead of passing in the generic version to the generic function, we pass in the arch-specific ones. The same goes for "access_error()". Right now it's an arch-specific helper function, but it actually does some generic tests (like checking the vma protections), and I think it could/should be entirely generic if we just added the necessary FAULT_FLAG_READ/EXEC/NOTPRESENT information to the "flags" register. Just looking at the ppc version, my gut feel is that "access_error()" is just horribly error-prone as-is even from an architecture standpoint. Yes, the "read/exec/notpresent" bits are a bit weird, but old x86 isn't the only architecture that doesn't necessarily know the difference between read and exec. So I'd like a bit more encapsulation. The generic code should never really need to look at the arch-specific details, although it is true that then the error handling cases will likely need it (in order to put it on the arch-specific stack info or similar). So my *gut* feel is that the generic code should be passed - address and the generic flags (ie FAULT_FLAG_USER/WRITE filled in by the caller) These are the only things the generic code should need to use itself - I guess we do need to pass in "struct pt_regs", because things like generic tracing actually want it. - an opaque "arch specific fault information structure pointer". Not used for anything but passing back to the error functions (ie very much *not* for helper functions for the normal case, like the current "access_error()" - if it's actually needed for those kinds of things, then I'm wrong about the model) This would (for x86) contain "error_code", but I have the feeling that other architectures might need/want more than one word of information. But I don't know. Maybe I'm wrong. I don't hate the patch as-is, I just have this feeling that it coudl be more "generic", and less "random small arch helpers". Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Generic page fault (Was: libsigsegv ....)
On Sat, 2015-02-28 at 18:14 +1100, Benjamin Herrenschmidt wrote: > On Sat, 2015-02-28 at 18:12 +1100, Benjamin Herrenschmidt wrote: > > > > Let me know what you think of the approach. It's boot tested on x86_64 > > in qemu and > > ... and powerpc64 LE on qemu as well :-) One thing I'd like to do is fold handle_kernel_fault() into handle_bad_area() (and in fact fold do_sigbus as well). Basically have a single handle_bad_fault() that takes sig and si_code as arguments which we call from the generic code for all faults. It will test for kernel vs. user mode and do the right thing (and we could handle the sigbus special case by simply comparing sig to sigbus). The one reason I haven't done it so far is that x86 handle_bad_area() has the is_f00f_bug() call which isn't do for other cases of no_context() and I'm not sure generalizing it is safe for all cases (or maybe I can call it only when sig is SIGSEGV ?) ... I don't actually understand what it does :) Cheers, Ben. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Generic page fault (Was: libsigsegv ....)
On Sat, 2015-02-28 at 18:14 +1100, Benjamin Herrenschmidt wrote: On Sat, 2015-02-28 at 18:12 +1100, Benjamin Herrenschmidt wrote: Let me know what you think of the approach. It's boot tested on x86_64 in qemu and ... and powerpc64 LE on qemu as well :-) One thing I'd like to do is fold handle_kernel_fault() into handle_bad_area() (and in fact fold do_sigbus as well). Basically have a single handle_bad_fault() that takes sig and si_code as arguments which we call from the generic code for all faults. It will test for kernel vs. user mode and do the right thing (and we could handle the sigbus special case by simply comparing sig to sigbus). The one reason I haven't done it so far is that x86 handle_bad_area() has the is_f00f_bug() call which isn't do for other cases of no_context() and I'm not sure generalizing it is safe for all cases (or maybe I can call it only when sig is SIGSEGV ?) ... I don't actually understand what it does :) Cheers, Ben. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Generic page fault (Was: libsigsegv ....)
On Sun, 2015-03-01 at 09:16 +1100, Benjamin Herrenschmidt wrote: So for error handling, I'm trying to simply return the VM_FAULT_* flags from generic_page_fault see where that takes us. That's a way to avoid passing an arch specific struct around. It also allows my hack to account major faults with the hypervisor to be done outside the generic code completely (no hook). .../... Here's what it looks like for x86 only and without completely sorting out the fatal signal business. However I might still have to do the arch pointer you mentioned for sparc and possibly other archs, but so far it looks better already. Note that if I add that arch pointer, I might stop messing around or even returning fault and instead just return a simple enum minor,major,error and let inline arch hooks populate the arch pointer with the error details in whatever fashion the arch prefers. However I suspect they'll all end up with sig and si_code in there... Anyway, here's the current patch: arch/x86/include/asm/fault.h | 21 arch/x86/mm/fault.c | 233 --- include/linux/fault.h| 24 + include/linux/mm.h | 5 +- mm/Makefile | 2 +- mm/fault.c | 196 6 files changed, 266 insertions(+), 215 deletions(-) create mode 100644 arch/x86/include/asm/fault.h create mode 100644 include/linux/fault.h create mode 100644 mm/fault.c diff --git a/arch/x86/include/asm/fault.h b/arch/x86/include/asm/fault.h new file mode 100644 index 000..04263ec --- /dev/null +++ b/arch/x86/include/asm/fault.h @@ -0,0 +1,21 @@ +#ifndef _ASM_X86_FAULT_H +#define _ASM_X86_FAULT_H + +#include linux/types.h +#include asm/ptrace.h + +/* Check if the stack is allowed to grow during a user page fault */ +static inline bool stack_can_grow(struct pt_regs *regs, unsigned long flags, + unsigned long address, + struct vm_area_struct *vma) +{ + /* +* Accessing the stack below %sp is always a bug. +* The large cushion allows instructions like enter +* and pusha to work. (enter $65535, $31 pushes +* 32 pointers and then decrements %sp by 65535.) +*/ + return address + 65536 + 32 * sizeof(unsigned long) = regs-sp; +} + +#endif /* _ASM_X86_FAULT_H */ diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index ede025f..19a8a91 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -13,6 +13,7 @@ #include linux/hugetlb.h /* hstate_index_to_shift*/ #include linux/prefetch.h/* prefetchw*/ #include linux/context_tracking.h/* exception_enter(), ... */ +#include linux/fault.h #include asm/traps.h /* dotraplinkage, ... */ #include asm/pgalloc.h /* pgd_*(), ... */ @@ -748,8 +749,7 @@ show_signal_msg(struct pt_regs *regs, unsigned long error_code, printk(KERN_CONT \n); } -static void -__bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code, +static void __bad_area(struct pt_regs *regs, unsigned long error_code, unsigned long address, int si_code) { struct task_struct *tsk = current; @@ -804,39 +804,10 @@ __bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code, no_context(regs, error_code, address, SIGSEGV, si_code); } -static noinline void -bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code, -unsigned long address) -{ - __bad_area_nosemaphore(regs, error_code, address, SEGV_MAPERR); -} - -static void -__bad_area(struct pt_regs *regs, unsigned long error_code, - unsigned long address, int si_code) -{ - struct mm_struct *mm = current-mm; - - /* -* Something tried to access memory that isn't in our memory map.. -* Fix it, but check if it's kernel or user first.. -*/ - up_read(mm-mmap_sem); - - __bad_area_nosemaphore(regs, error_code, address, si_code); -} - -static noinline void -bad_area(struct pt_regs *regs, unsigned long error_code, unsigned long address) -{ - __bad_area(regs, error_code, address, SEGV_MAPERR); -} - -static noinline void -bad_area_access_error(struct pt_regs *regs, unsigned long error_code, - unsigned long address) +static inline void bad_area(struct pt_regs *regs, unsigned long error_code, + unsigned long address, int si_code) { - __bad_area(regs, error_code, address, SEGV_ACCERR); + __bad_area(regs, error_code, address, si_code); } static void @@ -871,40 +842,6 @@ do_sigbus(struct pt_regs *regs, unsigned long error_code, unsigned long address, force_sig_info_fault(SIGBUS, code, address, tsk, fault); } -static noinline void -mm_fault_error(struct pt_regs *regs, unsigned long error_code, -
Re: Generic page fault (Was: libsigsegv ....)
On Sat, Feb 28, 2015 at 3:02 PM, Benjamin Herrenschmidt b...@kernel.crashing.org wrote: Anyway, here's the current patch: Ok, I think I like this approach better. Your FAULT_FLAG_EXEC handling is wrong, though. It shouldn't check VM_WRITE, it should check VM_EXEC. A bit too much copy-paste ;) Btw, it's quite possible that we could just do all the PF_PROT handling at the x86 level, before even calling the generic fault handler. It's not like we even need the vma or the mm semaphore: if it's a non-write protection fault, we always SIGSEGV. So why even bother getting the locks and looking up the page tables etc? Now, that PF_PROT handling isn't exactly performance-critical, but it might help to remove the odd x86 special case from the generic code. Linus -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Generic page fault (Was: libsigsegv ....)
On Sat, 2015-02-28 at 11:56 -0800, Linus Torvalds wrote: On Fri, Feb 27, 2015 at 11:12 PM, Benjamin Herrenschmidt b...@kernel.crashing.org wrote: Let me know what you think of the approach. Hmm. I'm not happy with just how many of those arch wrapper/helper functions there are, and some of it looks a bit unportable. For example, the code knows that err_code and address are the only two architecture-specific pieces of information (in addition to struct pt_regs, of course. And the fault_is_user/write() stuff seems unnecessary - we have the generic FAULT_FLAG_USER/WRITE flags for that, but instead of passing in the generic version to the generic function, we pass in the arch-specific ones. I was trying to keep that fault mask local but it might indeed be a mistake. The reasoning is that if you look at more archs (the scary one is sparc), there is potentially more arch gunk that needs to be hooked in after find_vma()... And a lot of that need arch specific flags coming in. For example on ppc I check if the instruction is allowed to grow the stack before we take the mm sem and pass that information for use by the stack growing check. Here I could just I suppose add some arch specific fault flags but it might get messy... The same goes for access_error(). Right now it's an arch-specific helper function, but it actually does some generic tests (like checking the vma protections), and I think it could/should be entirely generic if we just added the necessary FAULT_FLAG_READ/EXEC/NOTPRESENT information to the flags register. Just looking at the ppc version, my gut feel is that access_error() is just horribly error-prone as-is even from an architecture standpoint. I was weary of going down the path of having access_error() be an arch helper because it gets even wilder with other archs and the crazy platform flags we deal with on ppc. Here. I was thinking of looking at factoring that in a second step. You are right that most of it seems to related to execute permission however (at least sparc, mips, powerpc and arm). The way to go would be to define a FAULT_FLAG_EXEC which the arch only sets if it supports enforcing exec at fault time. BTW. I fail to see how x86 checks PF_INSTR vs. VM_NOEXEC ... or it doesn't ? Yes, the read/exec/notpresent bits are a bit weird, but old x86 isn't the only architecture that doesn't necessarily know the difference between read and exec. So I'd like a bit more encapsulation. The generic code should never really need to look at the arch-specific details, although it is true that then the error handling cases will likely need it (in order to put it on the arch-specific stack info or similar). Right, we might still have to pass error_code around, I'll have a look. So my *gut* feel is that the generic code should be passed - address and the generic flags (ie FAULT_FLAG_USER/WRITE filled in by the caller) These are the only things the generic code should need to use itself - I guess we do need to pass in struct pt_regs, because things like generic tracing actually want it. And error handling but it could be in the latter... - an opaque arch specific fault information structure pointer. Not used for anything but passing back to the error functions (ie very much *not* for helper functions for the normal case, like the current access_error() - if it's actually needed for those kinds of things, then I'm wrong about the model) Well, it might be needed for stack growth check. It's also somewhat means we have yet *another* fault information structure which is a bit sad but I don't see a good way to fold them into one. For errors I was thinking instead of returning error info from the generic code and leaving the generation of oops/signal to the caller, which means less arch gunk to carry around. This would (for x86) contain error_code, but I have the feeling that other architectures might need/want more than one word of information. But I don't know. Maybe I'm wrong. I don't hate the patch as-is, I just have this feeling that it coudl be more generic, and less random small arch helpers. I don't like it that much either, it's a first draft to see what the waters are like. I'll play around more and see what It comes to. Cheers, Ben. Linus -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majord...@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: a href=mailto:d...@kvack.org; em...@kvack.org /a -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Generic page fault (Was: libsigsegv ....)
On Sun, 2015-03-01 at 09:16 +1100, Benjamin Herrenschmidt wrote: So for error handling, I'm trying to simply return the VM_FAULT_* flags from generic_page_fault see where that takes us. That's a way to avoid passing an arch specific struct around. It also allows my hack to account major faults with the hypervisor to be done outside the generic code completely (no hook). I suspect sparc64 will defeat the not passing an arch struct around though... oh well. Ben. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Generic page fault (Was: libsigsegv ....)
On Sat, 2015-02-28 at 13:49 -0800, Linus Torvalds wrote: .../... - we handle write faults separately (see the first part of access_error() - so now we know it was a read or an instruction fetch - if PF_PROT is set, that means that the present bit was set in the page tables, so it must have been an exec access to a NX page - otherwise, we just say PROTNONE means no access, otherwise populate the page tables .. and if it turns out that it was a PF_INSTR to a NX page, we'll end up taking the page fault *again* after it's been populated, and now since the page table was populated, the access_error() will catch it with the PF_PROT case. Or something like that. I might have screwed up some detail, but it should all work. I see, it should work yes, I'll still add that FAULT_FLAG_EXEC for those who can tell reliably but it shouldn't hurt for x86 to not set it. Cheers, Ben. Linus -- To unsubscribe from this list: send the line unsubscribe linux-arch in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Generic page fault (Was: libsigsegv ....)
On Sat, Feb 28, 2015 at 1:14 PM, Benjamin Herrenschmidt b...@kernel.crashing.org wrote: BTW. I fail to see how x86 checks PF_INSTR vs. VM_NOEXEC ... or it doesn't ? It doesn't. x86 traditionally doesn't have an execute bit, so traditionally read == exec. So PF_INSTR really wasn't historically very useful, in that it would only show if the *first* access to a page was an instruction fetch - if you did a regular read to brign the page in, then subsequent instruction fetches would just work. Then NX came along, and what happens now is - we handle write faults separately (see the first part of access_error() - so now we know it was a read or an instruction fetch - if PF_PROT is set, that means that the present bit was set in the page tables, so it must have been an exec access to a NX page - otherwise, we just say PROTNONE means no access, otherwise populate the page tables .. and if it turns out that it was a PF_INSTR to a NX page, we'll end up taking the page fault *again* after it's been populated, and now since the page table was populated, the access_error() will catch it with the PF_PROT case. Or something like that. I might have screwed up some detail, but it should all work. Linus -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Generic page fault (Was: libsigsegv ....)
So for error handling, I'm trying to simply return the VM_FAULT_* flags from generic_page_fault see where that takes us. That's a way to avoid passing an arch specific struct around. It also allows my hack to account major faults with the hypervisor to be done outside the generic code completely (no hook). We will process generically some of the flags first such as the repeat logic or major/minor accounting of course. For that to work, I'm adding a VM_FAULT_ACCESS (that gets OR'ed with VM_FAULT_SIGSEGV) to differentiate SEGV_MAPERR and SEGV_ACCERR. So far so good. However, I noticed a small discrepancy on x86 in the handling of fatal signals: I see two path that can be hit on a fatal signal. The obvious one is the one in access_error() which calls no_context() with a 0 signal argument, the other path is in the retry handling, which will in this case call no_context() with SIGBUS/BUS_ADRERR. Now, the only place in there that seems to care about the signal that gets passed in is the sig_on_uaccess_error case. On one case (0 sig), that test will be skipped, on the other case (SIGBUS), that test will be done and might result in a sigbus being generated, which might override the original deadly signal (or am I missing something ?) Now I don't completely understand how the x86 vsyscall stuff works so I don't know precisely in what circumstances that test matters, I'll need you help there. Cheers, Ben. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Generic page fault (Was: libsigsegv ....)
On Sat, Feb 28, 2015 at 11:56 AM, Linus Torvalds torva...@linux-foundation.org wrote: But I don't know. Maybe I'm wrong. I don't hate the patch as-is, I just have this feeling that it coudl be more generic, and less random small arch helpers. Oh, and I definitely agree with you on the single handle_bad_fault() thing rather than the current handle_bad_area/handle_kernel_fault/do_sigbus. Linus -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Generic page fault (Was: libsigsegv ....)
On Fri, Feb 27, 2015 at 11:12 PM, Benjamin Herrenschmidt b...@kernel.crashing.org wrote: Let me know what you think of the approach. Hmm. I'm not happy with just how many of those arch wrapper/helper functions there are, and some of it looks a bit unportable. For example, the code knows that err_code and address are the only two architecture-specific pieces of information (in addition to struct pt_regs, of course. And the fault_is_user/write() stuff seems unnecessary - we have the generic FAULT_FLAG_USER/WRITE flags for that, but instead of passing in the generic version to the generic function, we pass in the arch-specific ones. The same goes for access_error(). Right now it's an arch-specific helper function, but it actually does some generic tests (like checking the vma protections), and I think it could/should be entirely generic if we just added the necessary FAULT_FLAG_READ/EXEC/NOTPRESENT information to the flags register. Just looking at the ppc version, my gut feel is that access_error() is just horribly error-prone as-is even from an architecture standpoint. Yes, the read/exec/notpresent bits are a bit weird, but old x86 isn't the only architecture that doesn't necessarily know the difference between read and exec. So I'd like a bit more encapsulation. The generic code should never really need to look at the arch-specific details, although it is true that then the error handling cases will likely need it (in order to put it on the arch-specific stack info or similar). So my *gut* feel is that the generic code should be passed - address and the generic flags (ie FAULT_FLAG_USER/WRITE filled in by the caller) These are the only things the generic code should need to use itself - I guess we do need to pass in struct pt_regs, because things like generic tracing actually want it. - an opaque arch specific fault information structure pointer. Not used for anything but passing back to the error functions (ie very much *not* for helper functions for the normal case, like the current access_error() - if it's actually needed for those kinds of things, then I'm wrong about the model) This would (for x86) contain error_code, but I have the feeling that other architectures might need/want more than one word of information. But I don't know. Maybe I'm wrong. I don't hate the patch as-is, I just have this feeling that it coudl be more generic, and less random small arch helpers. Linus -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Generic page fault (Was: libsigsegv ....)
On Sat, 2015-02-28 at 16:41 -0800, Linus Torvalds wrote: On Sat, Feb 28, 2015 at 3:02 PM, Benjamin Herrenschmidt b...@kernel.crashing.org wrote: Anyway, here's the current patch: Ok, I think I like this approach better. Your FAULT_FLAG_EXEC handling is wrong, though. It shouldn't check VM_WRITE, it should check VM_EXEC. A bit too much copy-paste ;) Ah yes indeed :) I would have caught that pretty quickly once I tried powerpc. Btw, it's quite possible that we could just do all the PF_PROT handling at the x86 level, before even calling the generic fault handler. It's not like we even need the vma or the mm semaphore: if it's a non-write protection fault, we always SIGSEGV. So why even bother getting the locks and looking up the page tables etc? For non-write possibly, though that means we always end up going through handle_mm_fault for a non-present PTE before we decide that it wasn't actually accessible. Not a huge deal I suppose unless something (ab)uses mprotect + emulation for things like spying on MMIO accesses (been there) or god knows what other userspace based paging or GC might be out there. It's asking for trouble but the check in access_error() doesn't hurt much either. On the other hand, we are all about simplifying code here, aren't we ? :-) It's not like the early bail-out on x86 will buy us much and it's going to be more arch specific code since they'll all have a different variant of PF_PROT. One thing that worries me a bit, however, with the PF_PROT handling in the generic case is the case of archs who support present-but-not-accessible PTEs in HW (for us that basically means no _PAGE_USER), as these might use such fault for NUMA balancing, or software maintained young bit, and in those case, bailing out on PF_PROT is wrong. I'll have to figure out if that ever happens when I dig through more archs. Now, that PF_PROT handling isn't exactly performance-critical, but it might help to remove the odd x86 special case from the generic code. I don't think it's that odd on x86. If you look at what other archs do, there's a bit of everything out there. Just powerpc may or may not support exec permission for example depending on the CPU generation and when it doesn't it's basically similar to x86. I yet have to fully understand what ARM does (looks simple, just need to make sure it's what I think it is and check various ARM generations), sparc has its oddities, etc... None of that is terribly urgent and I'm pretty busy so give me a bit more time now that we have some kind of direction, to go through a few more archs and see where it takes us. I'll start gathering cross compilers and qemu debian images in the next few days. Cheers, Ben. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Generic page fault (Was: libsigsegv ....)
On Sat, 2015-02-28 at 18:12 +1100, Benjamin Herrenschmidt wrote: > > Let me know what you think of the approach. It's boot tested on x86_64 > in qemu and ... and powerpc64 LE on qemu as well :-) Cheers, Ben. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Generic page fault (Was: libsigsegv ....)
On Sun, 2015-02-01 at 17:09 -0800, Linus Torvalds wrote: > > Of course, what I *really* want would be to make a new > "generic_mm_fault()" helper that would do all the normal stuff: > > - find_vma() > - check permissions and ranges > - call 'handle_mm_fault()' > - do the proper error, retry and minor/major fault handling > > and then most architectures could just call that. So I spent a bit of time today while the kids were playing quietly (it does happen !), and came up with this (very) draft pair of patches for x86 and powerpc. It's by no mean a finished product as you can see, but it shows how "messy" things get. Basically a bit more messy than I originally thought but it's not *too* bad either. Let me know what you think of the approach. It's boot tested on x86_64 in qemu and . Next I think I'll tackle ARM, test a bit more, clean a few things up and submit, but by all means, please provide feedback on the approach before that :) I'm attaching the patches for now (there are two and I don't feel like posting two emails on that subject. The final stuff will be broken down in smaller bits). Cheers, Ben. From 1e3060ecdb479a3dfd587a5870e0351e0b1b5ddc Mon Sep 17 00:00:00 2001 From: Benjamin Herrenschmidt Date: Sat, 28 Feb 2015 17:38:17 +1100 Subject: [PATCH 1/2] Move bulk of x86 __do_page_fault() to a generic_page_fault() (Add add various hooks that other archs will need etc...) Signed-off-by: Benjamin Herrenschmidt --- arch/x86/include/asm/fault.h | 99 + arch/x86/mm/fault.c | 253 +++ include/linux/fault.h| 11 ++ mm/Makefile | 2 +- mm/fault.c | 171 + 5 files changed, 296 insertions(+), 240 deletions(-) create mode 100644 arch/x86/include/asm/fault.h create mode 100644 include/linux/fault.h create mode 100644 mm/fault.c diff --git a/arch/x86/include/asm/fault.h b/arch/x86/include/asm/fault.h new file mode 100644 index 000..7c1712e1 --- /dev/null +++ b/arch/x86/include/asm/fault.h @@ -0,0 +1,99 @@ +#ifndef _ASM_X86_FAULT_H +#define _ASM_X86_FAULT_H + +#include +#include + +/* + * Page fault error code bits: + * + * bit 0 == 0: no page found 1: protection fault + * bit 1 == 0: read access 1: write access + * bit 2 == 0: kernel-mode access 1: user-mode access + * bit 3 ==1: use of reserved bit detected + * bit 4 ==1: fault was an instruction fetch + */ +enum x86_pf_error_code { + + PF_PROT = 1 << 0, + PF_WRITE = 1 << 1, + PF_USER = 1 << 2, + PF_RSVD = 1 << 3, + PF_INSTR = 1 << 4, +}; + +static inline bool fault_is_user(struct pt_regs *regs, unsigned long err_code) +{ + return err_code & PF_USER; +} + +static inline bool fault_is_write(struct pt_regs *regs, unsigned long err_code) +{ + return err_code & PF_WRITE; +} + +/* Return type for do_page_fault */ +typedef void gpf_ret_t; + +#define FAULT_NO_ERR + +/* Check if the stack is allowed to grow during a user page fault */ +static inline bool stack_can_grow(struct pt_regs *regs, unsigned long err_code, + unsigned long address, + struct vm_area_struct *vma) +{ + /* + * Accessing the stack below %sp is always a bug. + * The large cushion allows instructions like enter + * and pusha to work. ("enter $65535, $31" pushes + * 32 pointers and then decrements %sp by 65535.) + */ + return address + 65536 + 32 * sizeof(unsigned long) >= regs->sp; +} + +/* Access validity check */ +static inline bool access_error(struct pt_regs *regs, unsigned long err_code, +struct vm_area_struct *vma) +{ + if (err_code & PF_WRITE) { + /* write, present and write, not present: */ + if (unlikely(!(vma->vm_flags & VM_WRITE))) + return true; + return false; + } + + /* read, present: */ + if (unlikely(err_code & PF_PROT)) + return true; + + /* read, not present: */ + if (unlikely(!(vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE + return true; + + return false; +} + +/* Error handlers */ + +gpf_ret_t handle_bad_area(struct pt_regs *regs, unsigned long error_code, + unsigned long address, int si_code); + + +void no_context(struct pt_regs *regs, unsigned long error_code, + unsigned long address, int signal, int si_code); + +static inline gpf_ret_t handle_kernel_fault(struct pt_regs *regs, + unsigned long error_code, + unsigned long address, int sig, + int si_code) +{ + no_context(regs, error_code, address, sig, si_code); +} + +gpf_ret_t do_sigbus(struct pt_regs *regs, unsigned long error_code, + unsigned long address, unsigned int fault); + +static inline void arch_account_major_fault(void) { } + + +#endif /* _ASM_X86_FAULT_H */ diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index ede025f..b7ca60a 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -13,6 +13,7 @@ #include /* hstate_index_to_shift */ #include /* prefetchw */ #include /* exception_enter(), ... */ +#include #include /* dotraplinkage,
Generic page fault (Was: libsigsegv ....)
On Sun, 2015-02-01 at 17:09 -0800, Linus Torvalds wrote: Of course, what I *really* want would be to make a new generic_mm_fault() helper that would do all the normal stuff: - find_vma() - check permissions and ranges - call 'handle_mm_fault()' - do the proper error, retry and minor/major fault handling and then most architectures could just call that. So I spent a bit of time today while the kids were playing quietly (it does happen !), and came up with this (very) draft pair of patches for x86 and powerpc. It's by no mean a finished product as you can see, but it shows how messy things get. Basically a bit more messy than I originally thought but it's not *too* bad either. Let me know what you think of the approach. It's boot tested on x86_64 in qemu and . Next I think I'll tackle ARM, test a bit more, clean a few things up and submit, but by all means, please provide feedback on the approach before that :) I'm attaching the patches for now (there are two and I don't feel like posting two emails on that subject. The final stuff will be broken down in smaller bits). Cheers, Ben. From 1e3060ecdb479a3dfd587a5870e0351e0b1b5ddc Mon Sep 17 00:00:00 2001 From: Benjamin Herrenschmidt b...@kernel.crashing.org Date: Sat, 28 Feb 2015 17:38:17 +1100 Subject: [PATCH 1/2] Move bulk of x86 __do_page_fault() to a generic_page_fault() (Add add various hooks that other archs will need etc...) Signed-off-by: Benjamin Herrenschmidt b...@kernel.crashing.org --- arch/x86/include/asm/fault.h | 99 + arch/x86/mm/fault.c | 253 +++ include/linux/fault.h| 11 ++ mm/Makefile | 2 +- mm/fault.c | 171 + 5 files changed, 296 insertions(+), 240 deletions(-) create mode 100644 arch/x86/include/asm/fault.h create mode 100644 include/linux/fault.h create mode 100644 mm/fault.c diff --git a/arch/x86/include/asm/fault.h b/arch/x86/include/asm/fault.h new file mode 100644 index 000..7c1712e1 --- /dev/null +++ b/arch/x86/include/asm/fault.h @@ -0,0 +1,99 @@ +#ifndef _ASM_X86_FAULT_H +#define _ASM_X86_FAULT_H + +#include linux/types.h +#include asm/ptrace.h + +/* + * Page fault error code bits: + * + * bit 0 == 0: no page found 1: protection fault + * bit 1 == 0: read access 1: write access + * bit 2 == 0: kernel-mode access 1: user-mode access + * bit 3 ==1: use of reserved bit detected + * bit 4 ==1: fault was an instruction fetch + */ +enum x86_pf_error_code { + + PF_PROT = 1 0, + PF_WRITE = 1 1, + PF_USER = 1 2, + PF_RSVD = 1 3, + PF_INSTR = 1 4, +}; + +static inline bool fault_is_user(struct pt_regs *regs, unsigned long err_code) +{ + return err_code PF_USER; +} + +static inline bool fault_is_write(struct pt_regs *regs, unsigned long err_code) +{ + return err_code PF_WRITE; +} + +/* Return type for do_page_fault */ +typedef void gpf_ret_t; + +#define FAULT_NO_ERR + +/* Check if the stack is allowed to grow during a user page fault */ +static inline bool stack_can_grow(struct pt_regs *regs, unsigned long err_code, + unsigned long address, + struct vm_area_struct *vma) +{ + /* + * Accessing the stack below %sp is always a bug. + * The large cushion allows instructions like enter + * and pusha to work. (enter $65535, $31 pushes + * 32 pointers and then decrements %sp by 65535.) + */ + return address + 65536 + 32 * sizeof(unsigned long) = regs-sp; +} + +/* Access validity check */ +static inline bool access_error(struct pt_regs *regs, unsigned long err_code, +struct vm_area_struct *vma) +{ + if (err_code PF_WRITE) { + /* write, present and write, not present: */ + if (unlikely(!(vma-vm_flags VM_WRITE))) + return true; + return false; + } + + /* read, present: */ + if (unlikely(err_code PF_PROT)) + return true; + + /* read, not present: */ + if (unlikely(!(vma-vm_flags (VM_READ | VM_EXEC | VM_WRITE + return true; + + return false; +} + +/* Error handlers */ + +gpf_ret_t handle_bad_area(struct pt_regs *regs, unsigned long error_code, + unsigned long address, int si_code); + + +void no_context(struct pt_regs *regs, unsigned long error_code, + unsigned long address, int signal, int si_code); + +static inline gpf_ret_t handle_kernel_fault(struct pt_regs *regs, + unsigned long error_code, + unsigned long address, int sig, + int si_code) +{ + no_context(regs, error_code, address, sig, si_code); +} + +gpf_ret_t do_sigbus(struct pt_regs *regs, unsigned long error_code, + unsigned long address, unsigned int fault); + +static inline void arch_account_major_fault(void) { } + + +#endif /* _ASM_X86_FAULT_H */ diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index ede025f..b7ca60a 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -13,6 +13,7 @@ #include linux/hugetlb.h /* hstate_index_to_shift */ #include linux/prefetch.h /* prefetchw */ #include
Re: Generic page fault (Was: libsigsegv ....)
On Sat, 2015-02-28 at 18:12 +1100, Benjamin Herrenschmidt wrote: Let me know what you think of the approach. It's boot tested on x86_64 in qemu and ... and powerpc64 LE on qemu as well :-) Cheers, Ben. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/