Re: [PATCH] Revert "x86: optimize page faults like all other achitectures and kill notifier cruft"
> I have been reading about kprobes and one thing particularly bothers me > in the case of mmio-trace. The probe will actually service the page > fault, therefore it should be able force do_page_fault() to return at > the probe point. I could not figure out a way to do that. > > Is it possible to do reliably with kprobes or markers? Probes are generally not designed to change the flow of the underlying code. While it's in theory possible it will be always unreliable. For checks that result in logic changes you'll always need some kind of explicit hook. -Andi -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Revert "x86: optimize page faults like all other achitectures and kill notifier cruft"
On Wed, 9 Jan 2008 11:41:49 +0100 Ingo Molnar <[EMAIL PROTECTED]> wrote: > i agree. There a few practical complication on x86: the > do_page_fault() function is currently excluded from kprobe probing, > for recursion reasons. handle_mm_fault() can be probed OTOH - but > that does not catch vmalloc()-ed faults. The middle of > do_page_fault() [line 348] should work better [the point after > notify_page_fault()] - but it's usually more fragile to insert probes > to such middle-of-the-function places. I have been reading about kprobes and one thing particularly bothers me in the case of mmio-trace. The probe will actually service the page fault, therefore it should be able force do_page_fault() to return at the probe point. I could not figure out a way to do that. Is it possible to do reliably with kprobes or markers? Thanks for the replies. -- Pekka Paalanen http://www.iki.fi/pq/ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Revert "x86: optimize page faults like all other achitectures and kill notifier cruft"
> Probing vmalloc faults is _really_ tricky : it also implies that the > handler (let's call it probe) connected to the probe point (marker or > kprobe) should _never_ cause a vmalloc page fault, That is why vmalloc_sync_all() was invented. It might make sense to just call that on kprobe registration. But I agree the other problems makes it a bad idea. I think the better solution is to keep the notifier, but make it cheaper (e.g. by using constant patching ...) -Andi -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Revert "x86: optimize page faults like all other achitectures and kill notifier cruft"
* Ingo Molnar ([EMAIL PROTECTED]) wrote: > > (kprobes folks Cc:-ed) > > * David Miller <[EMAIL PROTECTED]> wrote: > > > From: Christoph Hellwig <[EMAIL PROTECTED]> > > Date: Wed, 9 Jan 2008 08:19:45 +0100 > > > > > On Wed, Jan 09, 2008 at 03:55:20AM +, Dave Airlie wrote: > > > > now because Linus said send him a patch to revert regressions rather > > > > than > > > > just complain, > > > > > > this is not a regression by any definition. You were abusing > > > exported symbols for out of tree junk, so you'll lose. > > > > And furthermore, they don't even need it, use a kprobe. > > i agree. There a few practical complication on x86: the do_page_fault() > function is currently excluded from kprobe probing, for recursion > reasons. handle_mm_fault() can be probed OTOH - but that does not catch > vmalloc()-ed faults. The middle of do_page_fault() [line 348] should > work better [the point after notify_page_fault()] - but it's usually > more fragile to insert probes to such middle-of-the-function places. > > So probing pagefaults is not as easy as it should/could be. We should > put a practical NOP marker to around line 348, to make it easier (and > faster) for systemtap to probe there. > > (__kprobes is a highly confusing newspeak name btw - it should be > __noprobe instead.) > > Ingo Probing vmalloc faults is _really_ tricky : it also implies that the handler (let's call it probe) connected to the probe point (marker or kprobe) should _never_ cause a vmalloc page fault, it should therefore never touch vmalloc'd memory, which is a very restrictive constraint, especially for tracing which may need large buffers (the only sane alternative is to allocate the buffers statically before the kernel boots). As for the location of the probe point, we have to determine how we want to handle a OOPSing probe. If we put the probe point too soon in the do_page_fault function, we will end up doing recursive page_fault rather than a OOPS, which may make things harder to debug. In the LTTng instrumentation, I volountarily excluded the bad_area and bad_area_nosemaphore paths from the page fault instrumentation for this exact reason. Currently, I have markers around the handle_mm_fault call : trace_mark(kernel_arch_trap_entry, "trap_id %d ip #p%ld", 14, instruction_pointer(regs)); fault = handle_mm_fault(mm, vma, address, write); trace_mark(kernel_arch_trap_exit, MARK_NOARGS); I also instrument handle_mm_fault, but I leave these markers in do_page_fault to get the architecture specific trap id (the "trap_entry" and "trap_exit" events) and the instruction pointer causing the fault. My handle_mm_fault instrumentation : (note that handle_mm_fault is also called by get_user_pages, not only do_page_fault) int handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long address, int write_access) { int res; pgd_t *pgd; pud_t *pud; pmd_t *pmd; pte_t *pte; trace_mark(mm_handle_fault_entry, "address %lu ip #p%ld write_access %d", address, KSTK_EIP(current), write_access); __set_current_state(TASK_RUNNING); count_vm_event(PGFAULT); if (unlikely(is_vm_hugetlb_page(vma))) { res = hugetlb_fault(mm, vma, address, write_access); goto end; } pgd = pgd_offset(mm, address); pud = pud_alloc(mm, pgd, address); if (!pud) { res = VM_FAULT_OOM; goto end; } pmd = pmd_alloc(mm, pud, address); if (!pmd) { res = VM_FAULT_OOM; goto end; } pte = pte_alloc_map(mm, pmd, address); if (!pte) { res = VM_FAULT_OOM; goto end; } res = handle_pte_fault(mm, vma, address, pte, pmd, write_access); end: trace_mark(mm_handle_fault_exit, MARK_NOARGS); return res; } Mathieu -- Mathieu Desnoyers Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Revert "x86: optimize page faults like all other achitectures and kill notifier cruft"
* Andi Kleen <[EMAIL PROTECTED]> wrote: > On Wed, Jan 09, 2008 at 07:18:46AM -0800, Arjan van de Ven wrote: > > On Wed, 9 Jan 2008 03:17:37 + (GMT) > > Dave Airlie <[EMAIL PROTECTED]> wrote: > > > > > So all distros with 2.6.24 kernels are useless to mmiotrace I don't > > > see why leaving things as is until a suitable replacement mechanism > > > can be used.. > > > > you work for a distro.. surely you can convince your own distro to > > carry this patch for one release? Maybe if it's this important.. so > > can the others... > > But if it's good for a major distribution why is it not good for > mainline? it's a bit too late to get the out-of-tree module into v2.6.24, and the revert makes little sense without the extra out-of-tree module. We at a minimum need a clear explanation of why this functionality cannot be provided via kprobes/systemtap. > We had this discussion at last kernel summit and the answer was clear. it's a clear answer 1 week after the merge window opens, but not 1 week before the next stable kernel is released. Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Revert "x86: optimize page faults like all other achitectures and kill notifier cruft"
On Wed, Jan 09, 2008 at 07:18:46AM -0800, Arjan van de Ven wrote: > On Wed, 9 Jan 2008 03:17:37 + (GMT) > Dave Airlie <[EMAIL PROTECTED]> wrote: > > > So all distros with 2.6.24 kernels are useless to mmiotrace I don't > > see why leaving things as is until a suitable replacement mechanism > > can be used.. > > you work for a distro.. surely you can convince your own distro to carry this > patch for one release? > Maybe if it's this important.. so can the others... But if it's good for a major distribution why is it not good for mainline? We had this discussion at last kernel summit and the answer was clear. -Andi -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Revert "x86: optimize page faults like all other achitectures and kill notifier cruft"
On Wed, 9 Jan 2008 03:17:37 + (GMT) Dave Airlie <[EMAIL PROTECTED]> wrote: > So all distros with 2.6.24 kernels are useless to mmiotrace I don't > see why leaving things as is until a suitable replacement mechanism > can be used.. you work for a distro.. surely you can convince your own distro to carry this patch for one release? Maybe if it's this important.. so can the others... -- If you want to reach me at my work email, use [EMAIL PROTECTED] For development, discussion and tips for power savings, visit http://www.lesswatts.org -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Revert "x86: optimize page faults like all other achitectures and kill notifier cruft"
> You can set a kprobe on the x86 fault handler to do things like > mmiotrace. That would mean that if the kprobe faults it goes into an endless loop. Most of do_page_fault() is not really safe for kprobes. -Andi -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Revert "x86: optimize page faults like all other achitectures and kill notifier cruft"
From: Jiri Kosina <[EMAIL PROTECTED]> Date: Wed, 9 Jan 2008 14:19:58 +0100 (CET) > On Tue, 8 Jan 2008, David Miller wrote: > > > You can set a kprobe on the x86 fault handler to do things like > > mmiotrace. > > Currently, on x86, you can not, because: > > fastcall void __kprobes do_page_fault( ... ); Read Ingo's reply. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Revert "x86: optimize page faults like all other achitectures and kill notifier cruft"
On Tue, 8 Jan 2008, David Miller wrote: > You can set a kprobe on the x86 fault handler to do things like > mmiotrace. Currently, on x86, you can not, because: fastcall void __kprobes do_page_fault( ... ); -- Jiri Kosina -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Revert "x86: optimize page faults like all other achitectures and kill notifier cruft"
(kprobes folks Cc:-ed) * David Miller <[EMAIL PROTECTED]> wrote: > From: Christoph Hellwig <[EMAIL PROTECTED]> > Date: Wed, 9 Jan 2008 08:19:45 +0100 > > > On Wed, Jan 09, 2008 at 03:55:20AM +, Dave Airlie wrote: > > > now because Linus said send him a patch to revert regressions rather than > > > just complain, > > > > this is not a regression by any definition. You were abusing > > exported symbols for out of tree junk, so you'll lose. > > And furthermore, they don't even need it, use a kprobe. i agree. There a few practical complication on x86: the do_page_fault() function is currently excluded from kprobe probing, for recursion reasons. handle_mm_fault() can be probed OTOH - but that does not catch vmalloc()-ed faults. The middle of do_page_fault() [line 348] should work better [the point after notify_page_fault()] - but it's usually more fragile to insert probes to such middle-of-the-function places. So probing pagefaults is not as easy as it should/could be. We should put a practical NOP marker to around line 348, to make it easier (and faster) for systemtap to probe there. (__kprobes is a highly confusing newspeak name btw - it should be __noprobe instead.) Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Revert "x86: optimize page faults like all other achitectures and kill notifier cruft"
>That change has been in the mainline tree for nearly three months. All >these affected parties have left it until the eve of 2.6.24 to actually >tell us about it. This is causing me sympathy problems :( Not true - I complained about this on Dec 3rd (attached), with the result of not getting a response from anyone but Andi (agreeing to restore these notifiers). Jan --- Begin Message --- Ever since I started to try to get at least some fundamental infrastructure pieces merged for using NLKD on Linux I was told that direct calls out of exception handlers for the sake of an individual (and perhaps even small) sub-system is undesirable. Making the exception notifiers report the right (correct) information (and, specific to x86, ensuring they get called in the right place) was one of the fundamental things, and just now I see that this is being reverted for (in my eyes) no good reason: Instead of adding direct calls to x86, all the other architectures should have followed the notifier model in order for the infrastructure to be usable by external components, especially if these aren't allowed into the kernel. Am I to conclude that replacing direct calls elsewhere in the tree (in order to e.g. avoid all kinds of small sub-components leaving their footprint in core files like kernel/fork.c) is no longer a desirable goal, thereby making it almost impossible to ever host a kernel debugger *without* having to patch core files. Thanks, Jan --- End Message ---
Re: [PATCH] Revert x86: optimize page faults like all other achitectures and kill notifier cruft
That change has been in the mainline tree for nearly three months. All these affected parties have left it until the eve of 2.6.24 to actually tell us about it. This is causing me sympathy problems :( Not true - I complained about this on Dec 3rd (attached), with the result of not getting a response from anyone but Andi (agreeing to restore these notifiers). Jan ---BeginMessage--- Ever since I started to try to get at least some fundamental infrastructure pieces merged for using NLKD on Linux I was told that direct calls out of exception handlers for the sake of an individual (and perhaps even small) sub-system is undesirable. Making the exception notifiers report the right (correct) information (and, specific to x86, ensuring they get called in the right place) was one of the fundamental things, and just now I see that this is being reverted for (in my eyes) no good reason: Instead of adding direct calls to x86, all the other architectures should have followed the notifier model in order for the infrastructure to be usable by external components, especially if these aren't allowed into the kernel. Am I to conclude that replacing direct calls elsewhere in the tree (in order to e.g. avoid all kinds of small sub-components leaving their footprint in core files like kernel/fork.c) is no longer a desirable goal, thereby making it almost impossible to ever host a kernel debugger *without* having to patch core files. Thanks, Jan ---End Message---
Re: [PATCH] Revert x86: optimize page faults like all other achitectures and kill notifier cruft
(kprobes folks Cc:-ed) * David Miller [EMAIL PROTECTED] wrote: From: Christoph Hellwig [EMAIL PROTECTED] Date: Wed, 9 Jan 2008 08:19:45 +0100 On Wed, Jan 09, 2008 at 03:55:20AM +, Dave Airlie wrote: now because Linus said send him a patch to revert regressions rather than just complain, this is not a regression by any definition. You were abusing exported symbols for out of tree junk, so you'll lose. And furthermore, they don't even need it, use a kprobe. i agree. There a few practical complication on x86: the do_page_fault() function is currently excluded from kprobe probing, for recursion reasons. handle_mm_fault() can be probed OTOH - but that does not catch vmalloc()-ed faults. The middle of do_page_fault() [line 348] should work better [the point after notify_page_fault()] - but it's usually more fragile to insert probes to such middle-of-the-function places. So probing pagefaults is not as easy as it should/could be. We should put a practical NOP marker to around line 348, to make it easier (and faster) for systemtap to probe there. (__kprobes is a highly confusing newspeak name btw - it should be __noprobe instead.) Ingo -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Revert x86: optimize page faults like all other achitectures and kill notifier cruft
On Tue, 8 Jan 2008, David Miller wrote: You can set a kprobe on the x86 fault handler to do things like mmiotrace. Currently, on x86, you can not, because: fastcall void __kprobes do_page_fault( ... ); -- Jiri Kosina -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Revert x86: optimize page faults like all other achitectures and kill notifier cruft
From: Jiri Kosina [EMAIL PROTECTED] Date: Wed, 9 Jan 2008 14:19:58 +0100 (CET) On Tue, 8 Jan 2008, David Miller wrote: You can set a kprobe on the x86 fault handler to do things like mmiotrace. Currently, on x86, you can not, because: fastcall void __kprobes do_page_fault( ... ); Read Ingo's reply. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Revert x86: optimize page faults like all other achitectures and kill notifier cruft
You can set a kprobe on the x86 fault handler to do things like mmiotrace. That would mean that if the kprobe faults it goes into an endless loop. Most of do_page_fault() is not really safe for kprobes. -Andi -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Revert x86: optimize page faults like all other achitectures and kill notifier cruft
On Wed, Jan 09, 2008 at 07:18:46AM -0800, Arjan van de Ven wrote: On Wed, 9 Jan 2008 03:17:37 + (GMT) Dave Airlie [EMAIL PROTECTED] wrote: So all distros with 2.6.24 kernels are useless to mmiotrace I don't see why leaving things as is until a suitable replacement mechanism can be used.. you work for a distro.. surely you can convince your own distro to carry this patch for one release? Maybe if it's this important.. so can the others... But if it's good for a major distribution why is it not good for mainline? We had this discussion at last kernel summit and the answer was clear. -Andi -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Revert x86: optimize page faults like all other achitectures and kill notifier cruft
* Andi Kleen [EMAIL PROTECTED] wrote: On Wed, Jan 09, 2008 at 07:18:46AM -0800, Arjan van de Ven wrote: On Wed, 9 Jan 2008 03:17:37 + (GMT) Dave Airlie [EMAIL PROTECTED] wrote: So all distros with 2.6.24 kernels are useless to mmiotrace I don't see why leaving things as is until a suitable replacement mechanism can be used.. you work for a distro.. surely you can convince your own distro to carry this patch for one release? Maybe if it's this important.. so can the others... But if it's good for a major distribution why is it not good for mainline? it's a bit too late to get the out-of-tree module into v2.6.24, and the revert makes little sense without the extra out-of-tree module. We at a minimum need a clear explanation of why this functionality cannot be provided via kprobes/systemtap. We had this discussion at last kernel summit and the answer was clear. it's a clear answer 1 week after the merge window opens, but not 1 week before the next stable kernel is released. Ingo -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Revert x86: optimize page faults like all other achitectures and kill notifier cruft
On Wed, 9 Jan 2008 03:17:37 + (GMT) Dave Airlie [EMAIL PROTECTED] wrote: So all distros with 2.6.24 kernels are useless to mmiotrace I don't see why leaving things as is until a suitable replacement mechanism can be used.. you work for a distro.. surely you can convince your own distro to carry this patch for one release? Maybe if it's this important.. so can the others... -- If you want to reach me at my work email, use [EMAIL PROTECTED] For development, discussion and tips for power savings, visit http://www.lesswatts.org -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Revert x86: optimize page faults like all other achitectures and kill notifier cruft
* Ingo Molnar ([EMAIL PROTECTED]) wrote: (kprobes folks Cc:-ed) * David Miller [EMAIL PROTECTED] wrote: From: Christoph Hellwig [EMAIL PROTECTED] Date: Wed, 9 Jan 2008 08:19:45 +0100 On Wed, Jan 09, 2008 at 03:55:20AM +, Dave Airlie wrote: now because Linus said send him a patch to revert regressions rather than just complain, this is not a regression by any definition. You were abusing exported symbols for out of tree junk, so you'll lose. And furthermore, they don't even need it, use a kprobe. i agree. There a few practical complication on x86: the do_page_fault() function is currently excluded from kprobe probing, for recursion reasons. handle_mm_fault() can be probed OTOH - but that does not catch vmalloc()-ed faults. The middle of do_page_fault() [line 348] should work better [the point after notify_page_fault()] - but it's usually more fragile to insert probes to such middle-of-the-function places. So probing pagefaults is not as easy as it should/could be. We should put a practical NOP marker to around line 348, to make it easier (and faster) for systemtap to probe there. (__kprobes is a highly confusing newspeak name btw - it should be __noprobe instead.) Ingo Probing vmalloc faults is _really_ tricky : it also implies that the handler (let's call it probe) connected to the probe point (marker or kprobe) should _never_ cause a vmalloc page fault, it should therefore never touch vmalloc'd memory, which is a very restrictive constraint, especially for tracing which may need large buffers (the only sane alternative is to allocate the buffers statically before the kernel boots). As for the location of the probe point, we have to determine how we want to handle a OOPSing probe. If we put the probe point too soon in the do_page_fault function, we will end up doing recursive page_fault rather than a OOPS, which may make things harder to debug. In the LTTng instrumentation, I volountarily excluded the bad_area and bad_area_nosemaphore paths from the page fault instrumentation for this exact reason. Currently, I have markers around the handle_mm_fault call : trace_mark(kernel_arch_trap_entry, trap_id %d ip #p%ld, 14, instruction_pointer(regs)); fault = handle_mm_fault(mm, vma, address, write); trace_mark(kernel_arch_trap_exit, MARK_NOARGS); I also instrument handle_mm_fault, but I leave these markers in do_page_fault to get the architecture specific trap id (the trap_entry and trap_exit events) and the instruction pointer causing the fault. My handle_mm_fault instrumentation : (note that handle_mm_fault is also called by get_user_pages, not only do_page_fault) int handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long address, int write_access) { int res; pgd_t *pgd; pud_t *pud; pmd_t *pmd; pte_t *pte; trace_mark(mm_handle_fault_entry, address %lu ip #p%ld write_access %d, address, KSTK_EIP(current), write_access); __set_current_state(TASK_RUNNING); count_vm_event(PGFAULT); if (unlikely(is_vm_hugetlb_page(vma))) { res = hugetlb_fault(mm, vma, address, write_access); goto end; } pgd = pgd_offset(mm, address); pud = pud_alloc(mm, pgd, address); if (!pud) { res = VM_FAULT_OOM; goto end; } pmd = pmd_alloc(mm, pud, address); if (!pmd) { res = VM_FAULT_OOM; goto end; } pte = pte_alloc_map(mm, pmd, address); if (!pte) { res = VM_FAULT_OOM; goto end; } res = handle_pte_fault(mm, vma, address, pte, pmd, write_access); end: trace_mark(mm_handle_fault_exit, MARK_NOARGS); return res; } Mathieu -- Mathieu Desnoyers Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Revert x86: optimize page faults like all other achitectures and kill notifier cruft
On Wed, 9 Jan 2008 11:41:49 +0100 Ingo Molnar [EMAIL PROTECTED] wrote: i agree. There a few practical complication on x86: the do_page_fault() function is currently excluded from kprobe probing, for recursion reasons. handle_mm_fault() can be probed OTOH - but that does not catch vmalloc()-ed faults. The middle of do_page_fault() [line 348] should work better [the point after notify_page_fault()] - but it's usually more fragile to insert probes to such middle-of-the-function places. I have been reading about kprobes and one thing particularly bothers me in the case of mmio-trace. The probe will actually service the page fault, therefore it should be able force do_page_fault() to return at the probe point. I could not figure out a way to do that. Is it possible to do reliably with kprobes or markers? Thanks for the replies. -- Pekka Paalanen http://www.iki.fi/pq/ -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Revert x86: optimize page faults like all other achitectures and kill notifier cruft
I have been reading about kprobes and one thing particularly bothers me in the case of mmio-trace. The probe will actually service the page fault, therefore it should be able force do_page_fault() to return at the probe point. I could not figure out a way to do that. Is it possible to do reliably with kprobes or markers? Probes are generally not designed to change the flow of the underlying code. While it's in theory possible it will be always unreliable. For checks that result in logic changes you'll always need some kind of explicit hook. -Andi -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Revert "x86: optimize page faults like all other achitectures and kill notifier cruft"
From: Christoph Hellwig <[EMAIL PROTECTED]> Date: Wed, 9 Jan 2008 08:19:45 +0100 > On Wed, Jan 09, 2008 at 03:55:20AM +, Dave Airlie wrote: > > now because Linus said send him a patch to revert regressions rather than > > just complain, > > this is not a regression by any definition. You were abusing exported > symbols for out of tree junk, so you'll lose. And furthermore, they don't even need it, use a kprobe. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Revert "x86: optimize page faults like all other achitectures and kill notifier cruft"
From: Christoph Hellwig <[EMAIL PROTECTED]> Date: Wed, 9 Jan 2008 08:17:27 +0100 > NACK. If you want to do it you'll need a much better reason and an > in-tree user. And if you want to redo it it should be available for > all platforms with a consistant API. I majorly NACK this as well, we don't want to bring this thing back especially for specialized debugging hacks. You can set a kprobe on the x86 fault handler to do things like mmiotrace. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Revert "x86: optimize page faults like all other achitectures and kill notifier cruft"
On Wed, Jan 09, 2008 at 03:55:20AM +, Dave Airlie wrote: > now because Linus said send him a patch to revert regressions rather than > just complain, this is not a regression by any definition. You were abusing exported symbols for out of tree junk, so you'll lose. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Revert "x86: optimize page faults like all other achitectures and kill notifier cruft"
On Wed, Jan 09, 2008 at 02:34:46AM +, Dave Airlie wrote: > > [This an initial RFC but I'd like to have this patch in before 2.6.24 goes > final as it really breaks this useful feature] > > mmiotrace the MMIO access tracer used to reverse engineer binary blobs > used this notifier interface and is planned on being pushed upstream. > > Having users able to just use the tracer module without having to rebuild > their kernel to add in a page fault handler hack means we get a lot > greater coverage for reverse engineering efforts. > > Signed-off-by: David Airlie <[EMAIL PROTECTED]> > > This reverts commit 74a0b5762713a26496db72eac34fbbed46f20fce. > Conflicts: NACK. If you want to do it you'll need a much better reason and an in-tree user. And if you want to redo it it should be available for all platforms with a consistant API. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Revert "x86: optimize page faults like all other achitectures and kill notifier cruft"
> An alternative might be to come up with something decent and target 2.6.24.x If you want zero cache line cost the only way is to handle that using Mathieu's inline patch infrastructure. Having a generic notifier type based on that would be probably a good idea. -Andi -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Revert "x86: optimize page faults like all other achitectures and kill notifier cruft"
> > An alternative might be to come up with something decent and target 2.6.24.x I don't see mmiotrace getting merged into a stable kernel... how do however see it getting cleaned up for 2.6.25 now that people know how fragile the kernel hooks for it are.. > We put the crappy code back in for 2.6.24 then take it out immediately > after 2.6.24 and put something else in to support mmiotrace and perhaps the > other new mystery features to which you refer below. hm. (I think the other mystery feature is actually a Novell kernel debugger but I'm not sure, madwifi use it for similiar reasons to mmiotrace I think..) > > > all that crap > > > } > > > > > > > > > But that's all speculation. Has anyone actually measured the pagefault > > > latency impact of this change? Message-Id: <[EMAIL PROTECTED]> Subject: [patch 20/38] Minor fault path optimization. Date: Fri, 27 Apr 2007 16:05:23 +0200 was a patch to do exactly that.. hch decided the feature wasn't useful and posted a patch to remove it.. > > That change has been in the mainline tree for nearly three months. All > these affected parties have left it until the eve of 2.6.24 to actually > tell us about it. This is causing me sympathy problems :( > Jan first complained on the 4th Decemeber last year, I'm just posting this now because Linus said send him a patch to revert regressions rather than just complain, I've prepared the patch to put back the old behaviour from 2.6.23. This was only brought to my notice this morning but I'm not going to let that stop me from trying to find a correct fix rather than just ripping the feature out.. I think we could apply the page fault cleanup patch I mentioned earlier on top of this patch and get back the 300 cycles and that would make people happy, it makes sense for mmiotrace to use kprobes hooks and not have to do this stuff directly but if that is what is wanted the mmiotrace guys can do it directly in the future. Dave. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Revert "x86: optimize page faults like all other achitectures and kill notifier cruft"
On Wed, 9 Jan 2008 03:17:37 + (GMT) Dave Airlie <[EMAIL PROTECTED]> wrote: > > > On Wed, 9 Jan 2008 02:34:46 + (GMT) Dave Airlie <[EMAIL PROTECTED]> > > wrote: > > > > > > > > [This an initial RFC but I'd like to have this patch in before 2.6.24 > > > goes > > > final as it really breaks this useful feature] > > > > > > mmiotrace the MMIO access tracer used to reverse engineer binary blobs > > > used this notifier interface and is planned on being pushed upstream. > > > > > > Having users able to just use the tracer module without having to rebuild > > > their kernel to add in a page fault handler hack means we get a lot > > > greater coverage for reverse engineering efforts. > > > > Sorry, but that's a really really small benefit. This very small number of > > fairly (or very) technical users will be able to work out a way of getting > > this to work in 2.6.24. And in 2.6.25 with a merged mmiotrace we can do > > something different. > > mmiotrace isn't targetted at fairly or technical users, its whole > usefulness is that you don't need a kernel re-build, the distro kernels > all contain enough support for us to just get a user to grab mmiotrace, > run make and get a trace so in my eyes this a major feature regression > to have to go back to custom kernel builds... An alternative might be to come up with something decent and target 2.6.24.x > > It's a modest convenience to a very small number of people. And the cost? > > Multiple functions calls and multiple cachelines hit for every pagefault > > on, what? Tens of millions of machines? > > Which has been happening for how many months? perhaps if we merge > mmiotrace in 2.6.25 we can clean up this function, otherwise I just count > it as a feature regression... We put the crappy code back in for 2.6.24 then take it out immediately after 2.6.24 and put something else in to support mmiotrace and perhaps the other new mystery features to which you refer below. hm. > > pagefault it populates a struct on the stack, passes that around for a > > while, does a bit of RCU stuff only to find that there was nothing to do. > > Surely we should at least be doing something along the lines of > > > > if (unlikely(notify_page_fault_chain.notifier_call != NULL)) { > > all that crap > > } > > > > > > But that's all speculation. Has anyone actually measured the pagefault > > latency impact of this change? ^^ this. > > > +/* > > > + * These are only here because kprobes.c wants them to implement a > > > + * blatant layering violation. Will hopefully go away soon once all > > > + * architectures are updated. > > > + */ > > > +static inline int register_page_fault_notifier(struct notifier_block *nb) > > > +{ > > > + return 0; > > > +} > > > +static inline int unregister_page_fault_notifier(struct notifier_block > > > *nb) > > > +{ > > > + return 0; > > > +} > > > + > > > > And this doesn't look very good either. For how long did this fixme remain > > unfixed? > > > > > > So I'd suggest that we leave things as they are for 2.6.24 - mmiotrace > > people will work something out, I'm sure. For 2.6.25 if we merge mmiotrace > > we can look at doing something which is vaguely efficient and tasteful. > > > > I just reverted Christophs patch I didn't try and work out if the old code > had problems no one has fixed... > > So all distros with 2.6.24 kernels are useless to mmiotrace I don't see > why leaving things as is until a suitable replacement mechanism can be > used.. I've heard others give out about this also madwifi and SuSE kernel > folks... That change has been in the mainline tree for nearly three months. All these affected parties have left it until the eve of 2.6.24 to actually tell us about it. This is causing me sympathy problems :( -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Revert "x86: optimize page faults like all other achitectures and kill notifier cruft"
> On Wed, 9 Jan 2008 02:34:46 + (GMT) Dave Airlie <[EMAIL PROTECTED]> wrote: > > > > > [This an initial RFC but I'd like to have this patch in before 2.6.24 goes > > final as it really breaks this useful feature] > > > > mmiotrace the MMIO access tracer used to reverse engineer binary blobs > > used this notifier interface and is planned on being pushed upstream. > > > > Having users able to just use the tracer module without having to rebuild > > their kernel to add in a page fault handler hack means we get a lot > > greater coverage for reverse engineering efforts. > > Sorry, but that's a really really small benefit. This very small number of > fairly (or very) technical users will be able to work out a way of getting > this to work in 2.6.24. And in 2.6.25 with a merged mmiotrace we can do > something different. mmiotrace isn't targetted at fairly or technical users, its whole usefulness is that you don't need a kernel re-build, the distro kernels all contain enough support for us to just get a user to grab mmiotrace, run make and get a trace so in my eyes this a major feature regression to have to go back to custom kernel builds... > It's a modest convenience to a very small number of people. And the cost? > Multiple functions calls and multiple cachelines hit for every pagefault > on, what? Tens of millions of machines? Which has been happening for how many months? perhaps if we merge mmiotrace in 2.6.25 we can clean up this function, otherwise I just count it as a feature regression... > pagefault it populates a struct on the stack, passes that around for a > while, does a bit of RCU stuff only to find that there was nothing to do. > Surely we should at least be doing something along the lines of > > if (unlikely(notify_page_fault_chain.notifier_call != NULL)) { > all that crap > } > > > But that's all speculation. Has anyone actually measured the pagefault > latency impact of this change? > > > +/* > > + * These are only here because kprobes.c wants them to implement a > > + * blatant layering violation. Will hopefully go away soon once all > > + * architectures are updated. > > + */ > > +static inline int register_page_fault_notifier(struct notifier_block *nb) > > +{ > > + return 0; > > +} > > +static inline int unregister_page_fault_notifier(struct notifier_block *nb) > > +{ > > + return 0; > > +} > > + > > And this doesn't look very good either. For how long did this fixme remain > unfixed? > > > So I'd suggest that we leave things as they are for 2.6.24 - mmiotrace > people will work something out, I'm sure. For 2.6.25 if we merge mmiotrace > we can look at doing something which is vaguely efficient and tasteful. > I just reverted Christophs patch I didn't try and work out if the old code had problems no one has fixed... So all distros with 2.6.24 kernels are useless to mmiotrace I don't see why leaving things as is until a suitable replacement mechanism can be used.. I've heard others give out about this also madwifi and SuSE kernel folks... Dave. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Revert "x86: optimize page faults like all other achitectures and kill notifier cruft"
On Wed, 9 Jan 2008 02:34:46 + (GMT) Dave Airlie <[EMAIL PROTECTED]> wrote: > > [This an initial RFC but I'd like to have this patch in before 2.6.24 goes > final as it really breaks this useful feature] > > mmiotrace the MMIO access tracer used to reverse engineer binary blobs > used this notifier interface and is planned on being pushed upstream. > > Having users able to just use the tracer module without having to rebuild > their kernel to add in a page fault handler hack means we get a lot > greater coverage for reverse engineering efforts. Sorry, but that's a really really small benefit. This very small number of fairly (or very) technical users will be able to work out a way of getting this to work in 2.6.24. And in 2.6.25 with a merged mmiotrace we can do something different. It's a modest convenience to a very small number of people. And the cost? Multiple functions calls and multiple cachelines hit for every pagefault on, what? Tens of millions of machines? Plus the code which is getting restored isn't even very good. For every pagefault it populates a struct on the stack, passes that around for a while, does a bit of RCU stuff only to find that there was nothing to do. Surely we should at least be doing something along the lines of if (unlikely(notify_page_fault_chain.notifier_call != NULL)) { all that crap } But that's all speculation. Has anyone actually measured the pagefault latency impact of this change? > +/* > + * These are only here because kprobes.c wants them to implement a > + * blatant layering violation. Will hopefully go away soon once all > + * architectures are updated. > + */ > +static inline int register_page_fault_notifier(struct notifier_block *nb) > +{ > + return 0; > +} > +static inline int unregister_page_fault_notifier(struct notifier_block *nb) > +{ > + return 0; > +} > + And this doesn't look very good either. For how long did this fixme remain unfixed? So I'd suggest that we leave things as they are for 2.6.24 - mmiotrace people will work something out, I'm sure. For 2.6.25 if we merge mmiotrace we can look at doing something which is vaguely efficient and tasteful. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Revert "x86: optimize page faults like all other achitectures and kill notifier cruft"
On Wed, Jan 09, 2008 at 02:34:46AM +, Dave Airlie wrote: > > [This an initial RFC but I'd like to have this patch in before 2.6.24 goes > final as it really breaks this useful feature] > > mmiotrace the MMIO access tracer used to reverse engineer binary blobs > used this notifier interface and is planned on being pushed upstream. > > Having users able to just use the tracer module without having to rebuild > their kernel to add in a page fault handler hack means we get a lot > greater coverage for reverse engineering efforts. > > Signed-off-by: David Airlie <[EMAIL PROTECTED]> Acked-by: Andi Kleen <[EMAIL PROTECTED]> I never liked the original patch. -Andi -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] Revert "x86: optimize page faults like all other achitectures and kill notifier cruft"
[This an initial RFC but I'd like to have this patch in before 2.6.24 goes final as it really breaks this useful feature] mmiotrace the MMIO access tracer used to reverse engineer binary blobs used this notifier interface and is planned on being pushed upstream. Having users able to just use the tracer module without having to rebuild their kernel to add in a page fault handler hack means we get a lot greater coverage for reverse engineering efforts. Signed-off-by: David Airlie <[EMAIL PROTECTED]> This reverts commit 74a0b5762713a26496db72eac34fbbed46f20fce. Conflicts: include/asm-avr32/kprobes.h include/asm-ia64/kprobes.h include/asm-s390/kprobes.h include/asm-x86/kdebug_32.h include/asm-x86/kdebug_64.h include/asm-x86/kprobes_64.h --- arch/x86/kernel/kprobes_32.c |3 +- arch/x86/kernel/kprobes_64.c |1 + arch/x86/mm/fault_32.c| 43 ++- arch/x86/mm/fault_64.c| 44 +++- include/asm-avr32/kdebug.h| 16 ++ include/asm-avr32/kprobes.h |1 + include/asm-ia64/kdebug.h | 15 ++ include/asm-ia64/kprobes.h|1 + include/asm-powerpc/kdebug.h | 19 + include/asm-powerpc/kprobes.h |1 + include/asm-s390/kdebug.h | 15 ++ include/asm-s390/kprobes.h|1 + include/asm-sh/kdebug.h |2 + include/asm-sparc64/kdebug.h | 18 include/asm-sparc64/kprobes.h |1 + include/asm-x86/kdebug.h |3 ++ include/asm-x86/kprobes_32.h |2 +- include/asm-x86/kprobes_64.h |1 + kernel/kprobes.c | 39 +-- 19 files changed, 183 insertions(+), 43 deletions(-) diff --git a/arch/x86/kernel/kprobes_32.c b/arch/x86/kernel/kprobes_32.c index 3a020f7..1ba8fee 100644 --- a/arch/x86/kernel/kprobes_32.c +++ b/arch/x86/kernel/kprobes_32.c @@ -586,7 +586,7 @@ out: return 1; } -int __kprobes kprobe_fault_handler(struct pt_regs *regs, int trapnr) +static int __kprobes kprobe_fault_handler(struct pt_regs *regs, int trapnr) { struct kprobe *cur = kprobe_running(); struct kprobe_ctlblk *kcb = get_kprobe_ctlblk(); @@ -668,6 +668,7 @@ int __kprobes kprobe_exceptions_notify(struct notifier_block *self, ret = NOTIFY_STOP; break; case DIE_GPF: + case DIE_PAGE_FAULT: /* kprobe_running() needs smp_processor_id() */ preempt_disable(); if (kprobe_running() && diff --git a/arch/x86/kernel/kprobes_64.c b/arch/x86/kernel/kprobes_64.c index 5df19a9..279cea7 100644 --- a/arch/x86/kernel/kprobes_64.c +++ b/arch/x86/kernel/kprobes_64.c @@ -654,6 +654,7 @@ int __kprobes kprobe_exceptions_notify(struct notifier_block *self, ret = NOTIFY_STOP; break; case DIE_GPF: + case DIE_PAGE_FAULT: /* kprobe_running() needs smp_processor_id() */ preempt_disable(); if (kprobe_running() && diff --git a/arch/x86/mm/fault_32.c b/arch/x86/mm/fault_32.c index a2273d4..f03cc93 100644 --- a/arch/x86/mm/fault_32.c +++ b/arch/x86/mm/fault_32.c @@ -25,7 +25,6 @@ #include #include #include -#include #include #include @@ -33,27 +32,33 @@ extern void die(const char *,struct pt_regs *,long); -#ifdef CONFIG_KPROBES -static inline int notify_page_fault(struct pt_regs *regs) +static ATOMIC_NOTIFIER_HEAD(notify_page_fault_chain); + +int register_page_fault_notifier(struct notifier_block *nb) { - int ret = 0; - - /* kprobe_running() needs smp_processor_id() */ - if (!user_mode_vm(regs)) { - preempt_disable(); - if (kprobe_running() && kprobe_fault_handler(regs, 14)) - ret = 1; - preempt_enable(); - } + vmalloc_sync_all(); + return atomic_notifier_chain_register(_page_fault_chain, nb); +} +EXPORT_SYMBOL_GPL(register_page_fault_notifier); - return ret; +int unregister_page_fault_notifier(struct notifier_block *nb) +{ + return atomic_notifier_chain_unregister(_page_fault_chain, nb); } -#else -static inline int notify_page_fault(struct pt_regs *regs) +EXPORT_SYMBOL_GPL(unregister_page_fault_notifier); + +static inline int notify_page_fault(struct pt_regs *regs, long err) { - return 0; + struct die_args args = { + .regs = regs, + .str = "page fault", + .err = err, + .trapnr = 14, + .signr = SIGSEGV + }; + return atomic_notifier_call_chain(_page_fault_chain, + DIE_PAGE_FAULT, ); } -#endif /* * Return EIP plus the CS segment base. The segment limit is also @@ -331,7 +336,7 @@ fastcall void __kprobes do_page_fault(struct pt_regs *regs, if (unlikely(address
[PATCH] Revert x86: optimize page faults like all other achitectures and kill notifier cruft
[This an initial RFC but I'd like to have this patch in before 2.6.24 goes final as it really breaks this useful feature] mmiotrace the MMIO access tracer used to reverse engineer binary blobs used this notifier interface and is planned on being pushed upstream. Having users able to just use the tracer module without having to rebuild their kernel to add in a page fault handler hack means we get a lot greater coverage for reverse engineering efforts. Signed-off-by: David Airlie [EMAIL PROTECTED] This reverts commit 74a0b5762713a26496db72eac34fbbed46f20fce. Conflicts: include/asm-avr32/kprobes.h include/asm-ia64/kprobes.h include/asm-s390/kprobes.h include/asm-x86/kdebug_32.h include/asm-x86/kdebug_64.h include/asm-x86/kprobes_64.h --- arch/x86/kernel/kprobes_32.c |3 +- arch/x86/kernel/kprobes_64.c |1 + arch/x86/mm/fault_32.c| 43 ++- arch/x86/mm/fault_64.c| 44 +++- include/asm-avr32/kdebug.h| 16 ++ include/asm-avr32/kprobes.h |1 + include/asm-ia64/kdebug.h | 15 ++ include/asm-ia64/kprobes.h|1 + include/asm-powerpc/kdebug.h | 19 + include/asm-powerpc/kprobes.h |1 + include/asm-s390/kdebug.h | 15 ++ include/asm-s390/kprobes.h|1 + include/asm-sh/kdebug.h |2 + include/asm-sparc64/kdebug.h | 18 include/asm-sparc64/kprobes.h |1 + include/asm-x86/kdebug.h |3 ++ include/asm-x86/kprobes_32.h |2 +- include/asm-x86/kprobes_64.h |1 + kernel/kprobes.c | 39 +-- 19 files changed, 183 insertions(+), 43 deletions(-) diff --git a/arch/x86/kernel/kprobes_32.c b/arch/x86/kernel/kprobes_32.c index 3a020f7..1ba8fee 100644 --- a/arch/x86/kernel/kprobes_32.c +++ b/arch/x86/kernel/kprobes_32.c @@ -586,7 +586,7 @@ out: return 1; } -int __kprobes kprobe_fault_handler(struct pt_regs *regs, int trapnr) +static int __kprobes kprobe_fault_handler(struct pt_regs *regs, int trapnr) { struct kprobe *cur = kprobe_running(); struct kprobe_ctlblk *kcb = get_kprobe_ctlblk(); @@ -668,6 +668,7 @@ int __kprobes kprobe_exceptions_notify(struct notifier_block *self, ret = NOTIFY_STOP; break; case DIE_GPF: + case DIE_PAGE_FAULT: /* kprobe_running() needs smp_processor_id() */ preempt_disable(); if (kprobe_running() diff --git a/arch/x86/kernel/kprobes_64.c b/arch/x86/kernel/kprobes_64.c index 5df19a9..279cea7 100644 --- a/arch/x86/kernel/kprobes_64.c +++ b/arch/x86/kernel/kprobes_64.c @@ -654,6 +654,7 @@ int __kprobes kprobe_exceptions_notify(struct notifier_block *self, ret = NOTIFY_STOP; break; case DIE_GPF: + case DIE_PAGE_FAULT: /* kprobe_running() needs smp_processor_id() */ preempt_disable(); if (kprobe_running() diff --git a/arch/x86/mm/fault_32.c b/arch/x86/mm/fault_32.c index a2273d4..f03cc93 100644 --- a/arch/x86/mm/fault_32.c +++ b/arch/x86/mm/fault_32.c @@ -25,7 +25,6 @@ #include linux/kprobes.h #include linux/uaccess.h #include linux/kdebug.h -#include linux/kprobes.h #include asm/system.h #include asm/desc.h @@ -33,27 +32,33 @@ extern void die(const char *,struct pt_regs *,long); -#ifdef CONFIG_KPROBES -static inline int notify_page_fault(struct pt_regs *regs) +static ATOMIC_NOTIFIER_HEAD(notify_page_fault_chain); + +int register_page_fault_notifier(struct notifier_block *nb) { - int ret = 0; - - /* kprobe_running() needs smp_processor_id() */ - if (!user_mode_vm(regs)) { - preempt_disable(); - if (kprobe_running() kprobe_fault_handler(regs, 14)) - ret = 1; - preempt_enable(); - } + vmalloc_sync_all(); + return atomic_notifier_chain_register(notify_page_fault_chain, nb); +} +EXPORT_SYMBOL_GPL(register_page_fault_notifier); - return ret; +int unregister_page_fault_notifier(struct notifier_block *nb) +{ + return atomic_notifier_chain_unregister(notify_page_fault_chain, nb); } -#else -static inline int notify_page_fault(struct pt_regs *regs) +EXPORT_SYMBOL_GPL(unregister_page_fault_notifier); + +static inline int notify_page_fault(struct pt_regs *regs, long err) { - return 0; + struct die_args args = { + .regs = regs, + .str = page fault, + .err = err, + .trapnr = 14, + .signr = SIGSEGV + }; + return atomic_notifier_call_chain(notify_page_fault_chain, + DIE_PAGE_FAULT, args); } -#endif /* * Return EIP plus the CS segment base. The segment limit is also @@ -331,7 +336,7
Re: [PATCH] Revert x86: optimize page faults like all other achitectures and kill notifier cruft
On Wed, Jan 09, 2008 at 02:34:46AM +, Dave Airlie wrote: [This an initial RFC but I'd like to have this patch in before 2.6.24 goes final as it really breaks this useful feature] mmiotrace the MMIO access tracer used to reverse engineer binary blobs used this notifier interface and is planned on being pushed upstream. Having users able to just use the tracer module without having to rebuild their kernel to add in a page fault handler hack means we get a lot greater coverage for reverse engineering efforts. Signed-off-by: David Airlie [EMAIL PROTECTED] Acked-by: Andi Kleen [EMAIL PROTECTED] I never liked the original patch. -Andi -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Revert x86: optimize page faults like all other achitectures and kill notifier cruft
On Wed, 9 Jan 2008 02:34:46 + (GMT) Dave Airlie [EMAIL PROTECTED] wrote: [This an initial RFC but I'd like to have this patch in before 2.6.24 goes final as it really breaks this useful feature] mmiotrace the MMIO access tracer used to reverse engineer binary blobs used this notifier interface and is planned on being pushed upstream. Having users able to just use the tracer module without having to rebuild their kernel to add in a page fault handler hack means we get a lot greater coverage for reverse engineering efforts. Sorry, but that's a really really small benefit. This very small number of fairly (or very) technical users will be able to work out a way of getting this to work in 2.6.24. And in 2.6.25 with a merged mmiotrace we can do something different. It's a modest convenience to a very small number of people. And the cost? Multiple functions calls and multiple cachelines hit for every pagefault on, what? Tens of millions of machines? Plus the code which is getting restored isn't even very good. For every pagefault it populates a struct on the stack, passes that around for a while, does a bit of RCU stuff only to find that there was nothing to do. Surely we should at least be doing something along the lines of if (unlikely(notify_page_fault_chain.notifier_call != NULL)) { all that crap } But that's all speculation. Has anyone actually measured the pagefault latency impact of this change? +/* + * These are only here because kprobes.c wants them to implement a + * blatant layering violation. Will hopefully go away soon once all + * architectures are updated. + */ +static inline int register_page_fault_notifier(struct notifier_block *nb) +{ + return 0; +} +static inline int unregister_page_fault_notifier(struct notifier_block *nb) +{ + return 0; +} + And this doesn't look very good either. For how long did this fixme remain unfixed? So I'd suggest that we leave things as they are for 2.6.24 - mmiotrace people will work something out, I'm sure. For 2.6.25 if we merge mmiotrace we can look at doing something which is vaguely efficient and tasteful. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Revert x86: optimize page faults like all other achitectures and kill notifier cruft
On Wed, 9 Jan 2008 03:17:37 + (GMT) Dave Airlie [EMAIL PROTECTED] wrote: On Wed, 9 Jan 2008 02:34:46 + (GMT) Dave Airlie [EMAIL PROTECTED] wrote: [This an initial RFC but I'd like to have this patch in before 2.6.24 goes final as it really breaks this useful feature] mmiotrace the MMIO access tracer used to reverse engineer binary blobs used this notifier interface and is planned on being pushed upstream. Having users able to just use the tracer module without having to rebuild their kernel to add in a page fault handler hack means we get a lot greater coverage for reverse engineering efforts. Sorry, but that's a really really small benefit. This very small number of fairly (or very) technical users will be able to work out a way of getting this to work in 2.6.24. And in 2.6.25 with a merged mmiotrace we can do something different. mmiotrace isn't targetted at fairly or technical users, its whole usefulness is that you don't need a kernel re-build, the distro kernels all contain enough support for us to just get a user to grab mmiotrace, run make and get a trace so in my eyes this a major feature regression to have to go back to custom kernel builds... An alternative might be to come up with something decent and target 2.6.24.x It's a modest convenience to a very small number of people. And the cost? Multiple functions calls and multiple cachelines hit for every pagefault on, what? Tens of millions of machines? Which has been happening for how many months? perhaps if we merge mmiotrace in 2.6.25 we can clean up this function, otherwise I just count it as a feature regression... We put the crappy code back in for 2.6.24 then take it out immediately after 2.6.24 and put something else in to support mmiotrace and perhaps the other new mystery features to which you refer below. hm. pagefault it populates a struct on the stack, passes that around for a while, does a bit of RCU stuff only to find that there was nothing to do. Surely we should at least be doing something along the lines of if (unlikely(notify_page_fault_chain.notifier_call != NULL)) { all that crap } But that's all speculation. Has anyone actually measured the pagefault latency impact of this change? ^^ this. +/* + * These are only here because kprobes.c wants them to implement a + * blatant layering violation. Will hopefully go away soon once all + * architectures are updated. + */ +static inline int register_page_fault_notifier(struct notifier_block *nb) +{ + return 0; +} +static inline int unregister_page_fault_notifier(struct notifier_block *nb) +{ + return 0; +} + And this doesn't look very good either. For how long did this fixme remain unfixed? So I'd suggest that we leave things as they are for 2.6.24 - mmiotrace people will work something out, I'm sure. For 2.6.25 if we merge mmiotrace we can look at doing something which is vaguely efficient and tasteful. I just reverted Christophs patch I didn't try and work out if the old code had problems no one has fixed... So all distros with 2.6.24 kernels are useless to mmiotrace I don't see why leaving things as is until a suitable replacement mechanism can be used.. I've heard others give out about this also madwifi and SuSE kernel folks... That change has been in the mainline tree for nearly three months. All these affected parties have left it until the eve of 2.6.24 to actually tell us about it. This is causing me sympathy problems :( -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Revert x86: optimize page faults like all other achitectures and kill notifier cruft
An alternative might be to come up with something decent and target 2.6.24.x I don't see mmiotrace getting merged into a stable kernel... how do however see it getting cleaned up for 2.6.25 now that people know how fragile the kernel hooks for it are.. We put the crappy code back in for 2.6.24 then take it out immediately after 2.6.24 and put something else in to support mmiotrace and perhaps the other new mystery features to which you refer below. hm. (I think the other mystery feature is actually a Novell kernel debugger but I'm not sure, madwifi use it for similiar reasons to mmiotrace I think..) all that crap } But that's all speculation. Has anyone actually measured the pagefault latency impact of this change? Message-Id: [EMAIL PROTECTED] Subject: [patch 20/38] Minor fault path optimization. Date: Fri, 27 Apr 2007 16:05:23 +0200 was a patch to do exactly that.. hch decided the feature wasn't useful and posted a patch to remove it.. That change has been in the mainline tree for nearly three months. All these affected parties have left it until the eve of 2.6.24 to actually tell us about it. This is causing me sympathy problems :( Jan first complained on the 4th Decemeber last year, I'm just posting this now because Linus said send him a patch to revert regressions rather than just complain, I've prepared the patch to put back the old behaviour from 2.6.23. This was only brought to my notice this morning but I'm not going to let that stop me from trying to find a correct fix rather than just ripping the feature out.. I think we could apply the page fault cleanup patch I mentioned earlier on top of this patch and get back the 300 cycles and that would make people happy, it makes sense for mmiotrace to use kprobes hooks and not have to do this stuff directly but if that is what is wanted the mmiotrace guys can do it directly in the future. Dave. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Revert x86: optimize page faults like all other achitectures and kill notifier cruft
An alternative might be to come up with something decent and target 2.6.24.x If you want zero cache line cost the only way is to handle that using Mathieu's inline patch infrastructure. Having a generic notifier type based on that would be probably a good idea. -Andi -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Revert x86: optimize page faults like all other achitectures and kill notifier cruft
On Wed, Jan 09, 2008 at 02:34:46AM +, Dave Airlie wrote: [This an initial RFC but I'd like to have this patch in before 2.6.24 goes final as it really breaks this useful feature] mmiotrace the MMIO access tracer used to reverse engineer binary blobs used this notifier interface and is planned on being pushed upstream. Having users able to just use the tracer module without having to rebuild their kernel to add in a page fault handler hack means we get a lot greater coverage for reverse engineering efforts. Signed-off-by: David Airlie [EMAIL PROTECTED] This reverts commit 74a0b5762713a26496db72eac34fbbed46f20fce. Conflicts: NACK. If you want to do it you'll need a much better reason and an in-tree user. And if you want to redo it it should be available for all platforms with a consistant API. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Revert x86: optimize page faults like all other achitectures and kill notifier cruft
From: Christoph Hellwig [EMAIL PROTECTED] Date: Wed, 9 Jan 2008 08:19:45 +0100 On Wed, Jan 09, 2008 at 03:55:20AM +, Dave Airlie wrote: now because Linus said send him a patch to revert regressions rather than just complain, this is not a regression by any definition. You were abusing exported symbols for out of tree junk, so you'll lose. And furthermore, they don't even need it, use a kprobe. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Revert x86: optimize page faults like all other achitectures and kill notifier cruft
From: Christoph Hellwig [EMAIL PROTECTED] Date: Wed, 9 Jan 2008 08:17:27 +0100 NACK. If you want to do it you'll need a much better reason and an in-tree user. And if you want to redo it it should be available for all platforms with a consistant API. I majorly NACK this as well, we don't want to bring this thing back especially for specialized debugging hacks. You can set a kprobe on the x86 fault handler to do things like mmiotrace. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Revert x86: optimize page faults like all other achitectures and kill notifier cruft
On Wed, Jan 09, 2008 at 03:55:20AM +, Dave Airlie wrote: now because Linus said send him a patch to revert regressions rather than just complain, this is not a regression by any definition. You were abusing exported symbols for out of tree junk, so you'll lose. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/