[tip:sched/urgent] sched/fair: Fix infinite loop in update_blocked_averages() by reverting a9e7f6544b9c
Commit-ID:  c40f7d74c741a907cfaeb73a7697081881c497d0
Gitweb:     https://git.kernel.org/tip/c40f7d74c741a907cfaeb73a7697081881c497d0
Author:     Linus Torvalds
AuthorDate: Thu, 27 Dec 2018 13:46:17 -0800
Committer:  Ingo Molnar
CommitDate: Sun, 30 Dec 2018 13:54:31 +0100

sched/fair: Fix infinite loop in update_blocked_averages() by reverting a9e7f6544b9c

Zhipeng Xie, Xie XiuQi and Sargun Dhillon reported lockups in the
scheduler under high loads, starting at around the v4.18 time frame,
and Zhipeng Xie tracked it down to bugs in the rq->leaf_cfs_rq_list
manipulation.

Do a (manual) revert of:

  a9e7f6544b9c ("sched/fair: Fix O(nr_cgroups) in load balance path")

It turns out that the list_del_leaf_cfs_rq() introduced by this commit
has a surprising property that was not considered in followup commits
such as:

  9c2791f936ef ("sched/fair: Fix hierarchical order in rq->leaf_cfs_rq_list")

As Vincent Guittot explains:

 "I think that there is a bigger problem with commit a9e7f6544b9c and
  cfs_rq throttling:

  Let's take the example of the following topology TG2 --> TG1 --> root:

   1) The 1st time a task is enqueued, we will add TG2 cfs_rq then TG1
      cfs_rq to leaf_cfs_rq_list and we are sure to do the whole branch
      in one path because it has never been used and can't be throttled
      so tmp_alone_branch will point to leaf_cfs_rq_list at the end.

   2) Then TG1 is throttled

   3) and we add TG3 as a new child of TG1.

   4) The 1st enqueue of a task on TG3 will add TG3 cfs_rq just before
      TG1 cfs_rq and tmp_alone_branch will stay on rq->leaf_cfs_rq_list.

  With commit a9e7f6544b9c, we can del a cfs_rq from
  rq->leaf_cfs_rq_list. So if the load of TG1 cfs_rq becomes NULL
  before step 2) above, TG1 cfs_rq is removed from the list. Then at
  step 4), TG3 cfs_rq is added at the beginning of rq->leaf_cfs_rq_list
  but tmp_alone_branch still points to TG3 cfs_rq because its throttled
  parent can't be enqueued when the lock is released.
  tmp_alone_branch doesn't point to rq->leaf_cfs_rq_list whereas it
  should.

  So if TG3 cfs_rq is removed or destroyed before tmp_alone_branch
  points to another TG cfs_rq, the next TG cfs_rq that will be added
  will be linked outside rq->leaf_cfs_rq_list - which is bad.

  In addition, we can break the ordering of the cfs_rq in
  rq->leaf_cfs_rq_list but this ordering is used to update and
  propagate the update from leaf down to root."

Instead of trying to work through all these cases and trying to
reproduce the very high loads that produced the lockup to begin with,
simplify the code temporarily by reverting a9e7f6544b9c - which change
was clearly not thought through completely.

This (hopefully) gives us a kernel that doesn't lock up so people can
continue to enjoy their holidays without worrying about regressions. ;-)

[ mingo: Wrote changelog, fixed weird spelling in code comment while at it. ]

Analyzed-by: Xie XiuQi
Analyzed-by: Vincent Guittot
Reported-by: Zhipeng Xie
Reported-by: Sargun Dhillon
Reported-by: Xie XiuQi
Tested-by: Zhipeng Xie
Tested-by: Sargun Dhillon
Signed-off-by: Linus Torvalds
Acked-by: Vincent Guittot
Cc: # v4.13+
Cc: Bin Li
Cc: Mike Galbraith
Cc: Peter Zijlstra
Cc: Tejun Heo
Cc: Thomas Gleixner
Fixes: a9e7f6544b9c ("sched/fair: Fix O(nr_cgroups) in load balance path")
Link: http://lkml.kernel.org/r/1545879866-27809-1-git-send-email-xiexi...@huawei.com
Signed-off-by: Ingo Molnar
---
 kernel/sched/fair.c | 43 +--
 1 file changed, 9 insertions(+), 34 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d1907506318a..6483834f1278 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -352,10 +352,9 @@ static inline void list_del_leaf_cfs_rq(struct cfs_rq *cfs_rq)
 	}
 }
 
-/* Iterate thr' all leaf cfs_rq's on a runqueue */
-#define for_each_leaf_cfs_rq_safe(rq, cfs_rq, pos)			\
-	list_for_each_entry_safe(cfs_rq, pos, &rq->leaf_cfs_rq_list,	\
-				 leaf_cfs_rq_list)
+/* Iterate through all leaf cfs_rq's on a runqueue: */
+#define for_each_leaf_cfs_rq(rq, cfs_rq)	\
+	list_for_each_entry_rcu(cfs_rq, &rq->leaf_cfs_rq_list, leaf_cfs_rq_list)
 
 /* Do the two (enqueued) entities belong to the same group ? */
 static inline struct cfs_rq *
@@ -447,8 +446,8 @@ static inline void list_del_leaf_cfs_rq(struct cfs_rq *cfs_rq)
 {
 }
 
-#define for_each_leaf_cfs_rq_safe(rq, cfs_rq, pos)	\
-	for (cfs_rq = &rq->cfs, pos = NULL; cfs_rq; cfs_rq = pos)
+#define for_each_leaf_cfs_rq(rq, cfs_rq)	\
+	for (cfs_rq = &rq->cfs; cfs_rq; cfs_rq = NULL)
 
 static inline struct sched_entity *parent_entity(struct sched_entity *se)
 {
@@ -7647,27 +7646,10 @@ static inline bool others_have_blocked(struct rq *rq)
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
 
-static inline bool cfs_rq_is_decayed(struct cfs_rq *cfs_rq)
-{
-	if (cfs_rq->load.weight)
-		return false;
-
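The failure mode Vincent describes boils down to a cursor into a linked list that is not fixed up when a node is deleted behind its back. The toy below is a hedged userspace model of just that pattern - the names mirror the kernel's (tmp_alone_branch, list_del), but none of this is kernel code, and the real rq->leaf_cfs_rq_list bookkeeping is considerably more involved:

	#include <stdio.h>

	struct node { struct node *prev, *next; int id; };

	static struct node list_head = { &list_head, &list_head, 0 };
	/* cursor remembered across operations, like rq->tmp_alone_branch */
	static struct node *tmp_alone_branch = &list_head;

	static void list_add_before(struct node *n, struct node *pos)
	{
		n->prev = pos->prev;
		n->next = pos;
		pos->prev->next = n;
		pos->prev = n;
	}

	static void list_del(struct node *n)
	{
		n->prev->next = n->next;
		n->next->prev = n->prev;
		/* bug class: nothing repairs tmp_alone_branch if it points at n */
	}

	int main(void)
	{
		struct node tg1 = { .id = 1 }, tg3 = { .id = 3 };

		list_add_before(&tg1, &list_head);  /* TG1 linked (step 1)            */
		list_del(&tg1);                     /* TG1 decays and is deleted      */
		list_add_before(&tg3, &list_head);  /* TG3 enqueued (step 4)          */
		tmp_alone_branch = &tg3;            /* parent throttled: cursor parks */
		list_del(&tg3);                     /* TG3 destroyed...               */

		/* ...and the cursor now points at an unlinked node: */
		printf("cursor id=%d, still linked=%d\n", tmp_alone_branch->id,
		       tmp_alone_branch->next->prev == tmp_alone_branch);
		return 0;
	}

With the revert, entries are never deleted mid-walk in the first place, so the cursor cannot be left dangling this way.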
[tip:x86/asm] x86/syscalls: Don't pointlessly reload the system call number
Commit-ID:  dfe64506c01e57159a4c550fe537c13a317ff01b
Gitweb:     https://git.kernel.org/tip/dfe64506c01e57159a4c550fe537c13a317ff01b
Author:     Linus Torvalds
AuthorDate: Thu, 5 Apr 2018 11:53:00 +0200
Committer:  Ingo Molnar
CommitDate: Thu, 5 Apr 2018 16:59:24 +0200

x86/syscalls: Don't pointlessly reload the system call number

We have it in a register in the low-level asm, just pass it in as an
argument rather than have do_syscall_64() load it back in from the
ptregs pointer.

Signed-off-by: Linus Torvalds
Signed-off-by: Dominik Brodowski
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Denys Vlasenko
Cc: H. Peter Anvin
Cc: Josh Poimboeuf
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/20180405095307.3730-2-li...@dominikbrodowski.net
Signed-off-by: Ingo Molnar
---
 arch/x86/entry/common.c   | 12 ++--
 arch/x86/entry/entry_64.S |  3 ++-
 2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index 74f6eee15179..a8b066dbbf48 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -266,14 +266,13 @@ __visible inline void syscall_return_slowpath(struct pt_regs *regs)
 }
 
 #ifdef CONFIG_X86_64
-__visible void do_syscall_64(struct pt_regs *regs)
+__visible void do_syscall_64(unsigned long nr, struct pt_regs *regs)
 {
-	struct thread_info *ti = current_thread_info();
-	unsigned long nr = regs->orig_ax;
+	struct thread_info *ti;
 
 	enter_from_user_mode();
 	local_irq_enable();
-
+	ti = current_thread_info();
 	if (READ_ONCE(ti->flags) & _TIF_WORK_SYSCALL_ENTRY)
 		nr = syscall_trace_enter(regs);
 
@@ -282,8 +281,9 @@ __visible void do_syscall_64(struct pt_regs *regs)
 	 * table.  The only functional difference is the x32 bit in
 	 * regs->orig_ax, which changes the behavior of some syscalls.
 	 */
-	if (likely((nr & __SYSCALL_MASK) < NR_syscalls)) {
-		nr = array_index_nospec(nr & __SYSCALL_MASK, NR_syscalls);
+	nr &= __SYSCALL_MASK;
+	if (likely(nr < NR_syscalls)) {
+		nr = array_index_nospec(nr, NR_syscalls);
 		regs->ax = sys_call_table[nr](
 			regs->di, regs->si, regs->dx,
 			regs->r10, regs->r8, regs->r9);
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 936e19642eab..6cfe38665f3c 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -233,7 +233,8 @@ GLOBAL(entry_SYSCALL_64_after_hwframe)
 	TRACE_IRQS_OFF
 
 	/* IRQs are off. */
-	movq	%rsp, %rdi
+	movq	%rax, %rdi
+	movq	%rsp, %rsi
 	call	do_syscall_64		/* returns with IRQs disabled */
 
 	TRACE_IRQS_IRETQ		/* we're about to change IF */
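As background for the array_index_nospec() use in the hunk above: it clamps a just-bounds-checked index so that speculative execution cannot use an out-of-range value. The userspace sketch below is modeled loosely on the kernel's generic C fallback (the real x86 version is cmp/sbb asm); it is an illustration, not the kernel implementation:

	#include <stdio.h>
	#include <limits.h>

	#define BITS_PER_LONG (sizeof(long) * CHAR_BIT)

	/* All-ones mask when index < size, all-zeroes otherwise - no branch. */
	static unsigned long index_mask_nospec(unsigned long index, unsigned long size)
	{
		return ~(long)(index | (size - 1 - index)) >> (BITS_PER_LONG - 1);
	}

	int main(void)
	{
		unsigned long nr = 5, NR = 16;

		nr &= index_mask_nospec(nr, NR);	/* in range: unchanged       */
		printf("%lu\n", nr);			/* -> 5                      */

		nr = 42;
		nr &= index_mask_nospec(nr, NR);	/* out of range: forced to 0 */
		printf("%lu\n", nr);			/* -> 0                      */
		return 0;
	}

The point is that the index fed to sys_call_table[nr] is architecturally and speculatively in bounds, even if a branch predictor mispredicts the likely() check.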
[tip:x86/urgent] x86-32: Fix kexec with stack canary (CONFIG_CC_STACKPROTECTOR)
Commit-ID:  ac461122c88a10b7d775de2f56467f097c9e627a
Gitweb:     https://git.kernel.org/tip/ac461122c88a10b7d775de2f56467f097c9e627a
Author:     Linus Torvalds
AuthorDate: Wed, 27 Dec 2017 11:48:50 -0800
Committer:  Thomas Gleixner
CommitDate: Wed, 27 Dec 2017 20:59:41 +0100

x86-32: Fix kexec with stack canary (CONFIG_CC_STACKPROTECTOR)

Commit e802a51ede91 ("x86/idt: Consolidate IDT invalidation") cleaned up
and unified the IDT invalidation that existed in a couple of places. It
changed no actual real code.

Despite not changing any actual real code, it _did_ change code
generation: by implementing the common idt_invalidate() function in
arch/x86/kernel/idt.c, it made the use of the function in
arch/x86/kernel/machine_kexec_32.c be a real function call rather than
an (accidental) inlining of the function.

That, in turn, exposed two issues:

 - in load_segments(), we had incorrectly reset all the segment
   registers, which then made the stack canary load (which gcc does
   using an offset from %gs) cause a trap. Instead of %gs pointing to
   the stack canary, it will be the normal zero-based kernel segment,
   and the stack canary load will take a page fault at address 0x14.

 - to make this even harder to debug, we had invalidated the GDT just
   before calling idt_invalidate(), which meant that the fault happened
   with an invalid GDT, which in turn causes a triple fault and
   immediate reboot.

Fix this by

 (a) not reloading the special segments in load_segments(). We
     currently don't do any percpu accesses (which would require %fs on
     x86-32) in this area, but there's no reason to think that we might
     not want to do them, and like %gs, it's pointless to break it.

 (b) doing idt_invalidate() before invalidating the GDT, to keep things
     at least _slightly_ more debuggable for a bit longer. Without an
     IDT, traps will not work. Without a GDT, traps also will not work,
     but neither will any segment loads etc. So in a very real sense,
     the GDT is even more core than the IDT.

Fixes: e802a51ede91 ("x86/idt: Consolidate IDT invalidation")
Reported-and-tested-by: Alexandru Chirvasitu
Signed-off-by: Linus Torvalds
Signed-off-by: Thomas Gleixner
Cc: Denys Vlasenko
Cc: Peter Zijlstra
Cc: Brian Gerst
Cc: Steven Rostedt
Cc: Borislav Petkov
Cc: Andy Lutomirski
Cc: Josh Poimboeuf
Cc: sta...@vger.kernel.org
Link: https://lkml.kernel.org/r/alpine.lfd.2.21.1712271143180.8...@i7.lan
---
 arch/x86/kernel/machine_kexec_32.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/x86/kernel/machine_kexec_32.c b/arch/x86/kernel/machine_kexec_32.c
index 00bc751..edfede7 100644
--- a/arch/x86/kernel/machine_kexec_32.c
+++ b/arch/x86/kernel/machine_kexec_32.c
@@ -48,8 +48,6 @@ static void load_segments(void)
 		"\tmovl $"STR(__KERNEL_DS)",%%eax\n"
 		"\tmovl %%eax,%%ds\n"
 		"\tmovl %%eax,%%es\n"
-		"\tmovl %%eax,%%fs\n"
-		"\tmovl %%eax,%%gs\n"
 		"\tmovl %%eax,%%ss\n"
 		: : : "eax", "memory");
 #undef STR
@@ -232,8 +230,8 @@ void machine_kexec(struct kimage *image)
 	 * The gdt & idt are now invalid.
 	 * If you want to load them you must set up your own idt & gdt.
 	 */
-	set_gdt(phys_to_virt(0), 0);
 	idt_invalidate(phys_to_virt(0));
+	set_gdt(phys_to_virt(0), 0);
 
 	/* now call it */
 	image->start = relocate_kernel_ptr((unsigned long)image->head,
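To make the %gs failure concrete: with -fstack-protector on 32-bit, gcc addresses the canary at a fixed offset off the %gs segment base. A hedged sketch of the code gcc typically emits (simplified; exact offsets and registers vary by compiler and configuration):

	/*
	 * Typical i386 stack-protector code generation (illustrative):
	 *
	 *   prologue:
	 *       mov  %gs:0x14, %eax      ; load canary via the %gs segment
	 *       mov  %eax, -0x8(%ebp)    ; stash it in the stack frame
	 *   epilogue:
	 *       mov  -0x8(%ebp), %eax
	 *       xor  %gs:0x14, %eax      ; recheck before returning
	 *       jne  __stack_chk_fail
	 *
	 * Reload %gs with the flat __KERNEL_DS segment and "%gs:0x14"
	 * becomes a plain dereference of linear address 0x14 -- the page
	 * fault at 0x14 described in the changelog.
	 */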
[tip:sched/urgent] sched/core: Remove pointless printout in sched_show_task()
Commit-ID:  8243d5597793b5e85143c9a935e1b971c59740a9
Gitweb:     http://git.kernel.org/tip/8243d5597793b5e85143c9a935e1b971c59740a9
Author:     Linus Torvalds
AuthorDate: Tue, 1 Nov 2016 17:47:18 -0600
Committer:  Ingo Molnar
CommitDate: Thu, 3 Nov 2016 07:31:34 +0100

sched/core: Remove pointless printout in sched_show_task()

In sched_show_task() we print out a useless hex number, not even a
symbol, and there's a big question mark whether this even makes sense
anyway - I suspect we should just remove it all.

Signed-off-by: Linus Torvalds
Acked-by: Andy Lutomirski
Cc: Peter Zijlstra
Cc: Tetsuo Handa
Cc: Thomas Gleixner
Cc: b...@alien8.de
Cc: brge...@gmail.com
Cc: j...@thejh.net
Cc: keesc...@chromium.org
Cc: linux-...@vger.kernel.org
Cc: tycho.ander...@canonical.com
Link: http://lkml.kernel.org/r/ca+55afzphurpfzavu4z6moy7zmimcwpuudyu8bj9z0j+s8x...@mail.gmail.com
Signed-off-by: Ingo Molnar
---
 kernel/sched/core.c | 9 -
 1 file changed, 9 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 9abf66b..154fd68 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5198,17 +5198,8 @@ void sched_show_task(struct task_struct *p)
 		state = __ffs(state) + 1;
 	printk(KERN_INFO "%-15.15s %c", p->comm,
 	       state < sizeof(stat_nam) - 1 ? stat_nam[state] : '?');
-#if BITS_PER_LONG == 32
-	if (state == TASK_RUNNING)
-		printk(KERN_CONT " running ");
-	else
-		printk(KERN_CONT " %08lx ", thread_saved_pc(p));
-#else
 	if (state == TASK_RUNNING)
 		printk(KERN_CONT " running task");
-	else
-		printk(KERN_CONT " %016lx ", thread_saved_pc(p));
-#endif
 #ifdef CONFIG_DEBUG_STACK_USAGE
 	free = stack_not_used(p);
 #endif
[tip:x86/asm] um/Stop conflating task_struct::stack with thread_info
Commit-ID:  d896fa20a70c9e596438728561e058a74ed3196b
Gitweb:     http://git.kernel.org/tip/d896fa20a70c9e596438728561e058a74ed3196b
Author:     Linus Torvalds
AuthorDate: Tue, 13 Sep 2016 14:29:23 -0700
Committer:  Ingo Molnar
CommitDate: Thu, 15 Sep 2016 08:25:12 +0200

um/Stop conflating task_struct::stack with thread_info

thread_info may move in the future, so use the accessors.

[ Andy Lutomirski wrote this changelog message and changed
  "task_thread_info(child)->cpu" to "task_cpu(child)". ]

Signed-off-by: Linus Torvalds
Signed-off-by: Andy Lutomirski
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Denys Vlasenko
Cc: H. Peter Anvin
Cc: Jann Horn
Cc: Josh Poimboeuf
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/3439705d9838940cc82733a7335fa8c654c37db8.1473801993.git.l...@kernel.org
Signed-off-by: Ingo Molnar
---
 arch/x86/um/ptrace_32.c | 8
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/um/ptrace_32.c b/arch/x86/um/ptrace_32.c
index a7ef7b1..5766ead 100644
--- a/arch/x86/um/ptrace_32.c
+++ b/arch/x86/um/ptrace_32.c
@@ -194,7 +194,7 @@ int peek_user(struct task_struct *child, long addr, long data)
 
 static int get_fpregs(struct user_i387_struct __user *buf, struct task_struct *child)
 {
-	int err, n, cpu = ((struct thread_info *) child->stack)->cpu;
+	int err, n, cpu = task_cpu(child);
 	struct user_i387_struct fpregs;
 
 	err = save_i387_registers(userspace_pid[cpu],
				  (unsigned long *) &fpregs);
@@ -211,7 +211,7 @@ static int get_fpregs(struct user_i387_struct __user *buf, struct task_struct *c
 
 static int set_fpregs(struct user_i387_struct __user *buf, struct task_struct *child)
 {
-	int n, cpu = ((struct thread_info *) child->stack)->cpu;
+	int n, cpu = task_cpu(child);
 	struct user_i387_struct fpregs;
 
 	n = copy_from_user(&fpregs, buf, sizeof(fpregs));
@@ -224,7 +224,7 @@ static int set_fpregs(struct user_i387_struct __user *buf, struct task_struct *c
 
 static int get_fpxregs(struct user_fxsr_struct __user *buf, struct task_struct *child)
 {
-	int err, n, cpu = ((struct thread_info *) child->stack)->cpu;
+	int err, n, cpu = task_cpu(child);
 	struct user_fxsr_struct fpregs;
 
 	err = save_fpx_registers(userspace_pid[cpu], (unsigned long *) &fpregs);
@@ -240,7 +240,7 @@ static int get_fpxregs(struct user_fxsr_struct __user *buf, struct task_struct *
 
 static int set_fpxregs(struct user_fxsr_struct __user *buf, struct task_struct *child)
 {
-	int n, cpu = ((struct thread_info *) child->stack)->cpu;
+	int n, cpu = task_cpu(child);
 	struct user_fxsr_struct fpregs;
 
 	n = copy_from_user(&fpregs, buf, sizeof(fpregs));
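The pattern being adopted here is worth spelling out: hide the struct layout behind one accessor so call sites survive the layout change. A minimal sketch with a hypothetical task_struct layout (the real kernel definitions differ):

	struct thread_info {
		int cpu;
		/* ... */
	};

	struct task_struct {
		struct thread_info ti;	/* hypothetical: thread_info moved in here */
		void *stack;
	};

	/* Fragile: hard-codes "thread_info lives at the base of ->stack". */
	#define old_task_cpu(t)	(((struct thread_info *)(t)->stack)->cpu)

	/* Robust: only this accessor knows where thread_info really is. */
	static inline struct thread_info *task_thread_info(struct task_struct *t)
	{
		return &t->ti;
	}

	static inline int task_cpu(struct task_struct *t)
	{
		return task_thread_info(t)->cpu;
	}

When thread_info later moves, only task_thread_info() changes; every task_cpu(child) call site keeps compiling and working unchanged.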
[tip:x86/asm] x86/entry: Get rid of pt_regs_to_thread_info()
Commit-ID:  97245d00585d82540f4538cf72d92a1e853c7b0e
Gitweb:     http://git.kernel.org/tip/97245d00585d82540f4538cf72d92a1e853c7b0e
Author:     Linus Torvalds
AuthorDate: Tue, 13 Sep 2016 14:29:22 -0700
Committer:  Ingo Molnar
CommitDate: Thu, 15 Sep 2016 08:25:12 +0200

x86/entry: Get rid of pt_regs_to_thread_info()

It was a nice optimization while it lasted, but thread_info is moving
and this optimization will no longer work.

Quoting Linus:

    Oh Gods, Andy. That pt_regs_to_thread_info() thing made me want
    to do unspeakable acts on a poor innocent wax figure that looked
    _exactly_ like you.

[ Changelog written by Andy. ]

Signed-off-by: Linus Torvalds
Signed-off-by: Andy Lutomirski
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Denys Vlasenko
Cc: H. Peter Anvin
Cc: Jann Horn
Cc: Josh Poimboeuf
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/6376aa81c68798cc81631673f52bd91a3e078944.1473801993.git.l...@kernel.org
Signed-off-by: Ingo Molnar
---
 arch/x86/entry/common.c | 20 ++--
 1 file changed, 6 insertions(+), 14 deletions(-)

diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index 871bbf9..bdd9cc5 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -31,13 +31,6 @@
 #define CREATE_TRACE_POINTS
 #include
 
-static struct thread_info *pt_regs_to_thread_info(struct pt_regs *regs)
-{
-	unsigned long top_of_stack =
-		(unsigned long)(regs + 1) + TOP_OF_KERNEL_STACK_PADDING;
-	return (struct thread_info *)(top_of_stack - THREAD_SIZE);
-}
-
 #ifdef CONFIG_CONTEXT_TRACKING
 /* Called on entry from user mode with IRQs off. */
 __visible inline void enter_from_user_mode(void)
@@ -71,7 +64,7 @@ static long syscall_trace_enter(struct pt_regs *regs)
 {
 	u32 arch = in_ia32_syscall() ? AUDIT_ARCH_I386 : AUDIT_ARCH_X86_64;
 
-	struct thread_info *ti = pt_regs_to_thread_info(regs);
+	struct thread_info *ti = current_thread_info();
 	unsigned long ret = 0;
 	bool emulated = false;
 	u32 work;
@@ -173,18 +166,17 @@ static void exit_to_usermode_loop(struct pt_regs *regs, u32 cached_flags)
 		/* Disable IRQs and retry */
 		local_irq_disable();
 
-		cached_flags = READ_ONCE(pt_regs_to_thread_info(regs)->flags);
+		cached_flags = READ_ONCE(current_thread_info()->flags);
 
 		if (!(cached_flags & EXIT_TO_USERMODE_LOOP_FLAGS))
 			break;
-
 	}
 }
 
 /* Called with IRQs disabled. */
 __visible inline void prepare_exit_to_usermode(struct pt_regs *regs)
 {
-	struct thread_info *ti = pt_regs_to_thread_info(regs);
+	struct thread_info *ti = current_thread_info();
 	u32 cached_flags;
 
 	if (IS_ENABLED(CONFIG_PROVE_LOCKING) && WARN_ON(!irqs_disabled()))
@@ -247,7 +239,7 @@ static void syscall_slow_exit_work(struct pt_regs *regs, u32 cached_flags)
  */
 __visible inline void syscall_return_slowpath(struct pt_regs *regs)
 {
-	struct thread_info *ti = pt_regs_to_thread_info(regs);
+	struct thread_info *ti = current_thread_info();
 	u32 cached_flags = READ_ONCE(ti->flags);
 
 	CT_WARN_ON(ct_state() != CONTEXT_KERNEL);
@@ -270,7 +262,7 @@ __visible inline void syscall_return_slowpath(struct pt_regs *regs)
 #ifdef CONFIG_X86_64
 __visible void do_syscall_64(struct pt_regs *regs)
 {
-	struct thread_info *ti = pt_regs_to_thread_info(regs);
+	struct thread_info *ti = current_thread_info();
 	unsigned long nr = regs->orig_ax;
 
 	enter_from_user_mode();
@@ -303,7 +295,7 @@ __visible void do_syscall_64(struct pt_regs *regs)
  */
 static __always_inline void do_syscall_32_irqs_on(struct pt_regs *regs)
 {
-	struct thread_info *ti = pt_regs_to_thread_info(regs);
+	struct thread_info *ti = current_thread_info();
 	unsigned int nr = (unsigned int)regs->orig_ax;
 
 #ifdef CONFIG_IA32_EMULATION
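Both the removed helper and the classic current_thread_info() rely on the same underlying trick: thread_info sits at the base of a THREAD_SIZE-aligned kernel stack, so any address into that stack can be rounded down to find it. A hedged sketch of the arithmetic (constants illustrative, not the kernel's):

	#define THREAD_SIZE	(4 * 4096UL)	/* illustrative value */

	struct thread_info { unsigned long flags; };

	/* Round any in-stack address down to its THREAD_SIZE boundary. */
	static inline struct thread_info *ti_from_stack_addr(unsigned long addr)
	{
		return (struct thread_info *)(addr & ~(THREAD_SIZE - 1));
	}

Once thread_info moves off the stack, every variant of this arithmetic breaks at once, which is why the patch funnels all callers through current_thread_info() instead of keeping a second, regs-based copy of the trick.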
[tip:x86/urgent] x86/efi: Fix 7-parameter efi_call()s
Commit-ID:  683ad8092cd262a02d01377dd17a29d492438b90
Gitweb:     http://git.kernel.org/tip/683ad8092cd262a02d01377dd17a29d492438b90
Author:     Linus Torvalds
AuthorDate: Mon, 16 May 2016 13:05:45 -0700
Committer:  Ingo Molnar
CommitDate: Tue, 17 May 2016 08:25:06 +0200

x86/efi: Fix 7-parameter efi_call()s

Alex Thorlton reported that the SGI/UV code crashes in the efi_call()
code when invoked with 7 parameters, due to:

        mov (%rsp), %rax
        mov 8(%rax), %rax
        ...
        mov %rax, 40(%rsp)

Offset 8 is only true if CONFIG_FRAME_POINTERS is disabled; with frame
pointers enabled it should be 16.

Furthermore, the SAVE_XMM code saves the old stack pointer, but that's
just crazy. It saves the stack pointer *AFTER* we've done the:

        FRAME_BEGIN

... which will have *changed* the stack pointer, depending on whether
stack frames are enabled or not.

So when the code then does:

        mov (%rsp), %rax

... we now move that old stack pointer into %rax, but the offset off
that stack pointer will depend on whether that FRAME_BEGIN saved off
%rbp or not.

So that whole 8-vs-16 offset confusion depends on the frame pointer!
If frame pointers were enabled, it will be 16. If they weren't, it
will be 8.

The right fix is to just get rid of that silly conditional frame
pointer thing, and always use frame pointers in this stub function.
And then we don't need that (odd) load to get the old stack pointer
into %rax - we can just use the frame pointer.

Reported-by: Alex Thorlton
Tested-by: Alex Thorlton
Signed-off-by: Linus Torvalds
Cc: Alexander Shishkin
Cc: Andrew Morton
Cc: Andy Lutomirski
Cc: Arnaldo Carvalho de Melo
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Denys Vlasenko
Cc: H. Peter Anvin
Cc: Jiri Olsa
Cc: Matt Fleming
Cc: Peter Zijlstra
Cc: Peter Zijlstra
Cc: Stephane Eranian
Cc: Thomas Gleixner
Cc: Vince Weaver
Link: http://lkml.kernel.org/r/ca%2b55afzbs2v%3dwneh83cudg7xkoremfqj30bjwf40dcyjreb...@mail.gmail.com
Signed-off-by: Ingo Molnar
---
 arch/x86/platform/efi/efi_stub_64.S | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/x86/platform/efi/efi_stub_64.S b/arch/x86/platform/efi/efi_stub_64.S
index 92723ae..cd95075 100644
--- a/arch/x86/platform/efi/efi_stub_64.S
+++ b/arch/x86/platform/efi/efi_stub_64.S
@@ -11,7 +11,6 @@
 #include
 #include
 #include
-#include
 
 #define SAVE_XMM			\
 	mov %rsp, %rax;			\
@@ -40,10 +39,10 @@
 	mov (%rsp), %rsp
 
 ENTRY(efi_call)
-	FRAME_BEGIN
+	pushq %rbp
+	movq %rsp, %rbp
 	SAVE_XMM
-	mov (%rsp), %rax
-	mov 8(%rax), %rax
+	mov 16(%rbp), %rax
 	subq $48, %rsp
 	mov %r9, 32(%rsp)
 	mov %rax, 40(%rsp)
@@ -53,6 +52,6 @@ ENTRY(efi_call)
 	call *%rdi
 	addq $48, %rsp
 	RESTORE_XMM
-	FRAME_END
+	popq %rbp
 	ret
ENDPROC(efi_call)
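The 16(%rbp) magic number follows directly from the SysV x86-64 ABI: the first six integer arguments travel in registers, and the seventh is pushed on the stack by the caller, so after a conventional frame setup it sits at a fixed offset from %rbp. A sketch of the resulting stack layout (illustrative):

	/*
	 * caller:
	 *     push  arg7           ; 7th argument goes on the stack
	 *     call  efi_call       ; pushes the return address
	 *
	 * efi_call:
	 *     pushq %rbp           ; stack: [saved rbp][retaddr][arg7]
	 *     movq  %rsp, %rbp
	 *     ...
	 *     mov   16(%rbp), %rax ; 0(%rbp)  = saved %rbp
	 *                          ; 8(%rbp)  = return address
	 *                          ; 16(%rbp) = arg7, unconditionally --
	 *                          ; no CONFIG_FRAME_POINTERS dependence
	 */

By always building the frame, the offset is the same regardless of the kernel configuration, which is exactly what the conditional FRAME_BEGIN could not guarantee.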
[tip:x86/apic] x86/apic: Add a single-target IPI function to the apic
Commit-ID:  539da7877275edb21a76aa02fb2c147eff02c559
Gitweb:     http://git.kernel.org/tip/539da7877275edb21a76aa02fb2c147eff02c559
Author:     Linus Torvalds
AuthorDate: Wed, 4 Nov 2015 22:57:00 +0000
Committer:  Thomas Gleixner
CommitDate: Thu, 5 Nov 2015 13:07:51 +0100

x86/apic: Add a single-target IPI function to the apic

We still fall back on the "send mask" versions if an apic definition
doesn't have the single-target version, but at least this enables the
(trivial) single-target case for the common clustered x2apic case.

Signed-off-by: Linus Torvalds
Reviewed-by: Ingo Molnar
Cc: Borislav Petkov
Cc: Peter Zijlstra
Cc: Mike Travis
Cc: Daniel J Blueman
Link: http://lkml.kernel.org/r/20151104220848.737120...@linutronix.de
Signed-off-by: Thomas Gleixner
---
 arch/x86/include/asm/apic.h |  1 +
 arch/x86/kernel/smp.c       | 16 ++--
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index a30316b..7f62ad4 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -303,6 +303,7 @@ struct apic {
 					      unsigned int *apicid);
 
 	/* ipi */
+	void (*send_IPI)(int cpu, int vector);
 	void (*send_IPI_mask)(const struct cpumask *mask, int vector);
 	void (*send_IPI_mask_allbutself)(const struct cpumask *mask,
 					 int vector);
diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
index 12c8286..1dbf590 100644
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -115,6 +115,18 @@ static atomic_t stopping_cpu = ATOMIC_INIT(-1);
 static bool smp_no_nmi_ipi = false;
 
 /*
+ * Helper wrapper: not all apic definitions support sending to
+ * a single CPU, so we fall back to sending to a mask.
+ */
+static void send_IPI_cpu(int cpu, int vector)
+{
+	if (apic->send_IPI)
+		apic->send_IPI(cpu, vector);
+	else
+		apic->send_IPI_mask(cpumask_of(cpu), vector);
+}
+
+/*
  * this function sends a 'reschedule' IPI to another CPU.
  * it goes straight through and wastes no time serializing
  * anything. Worst case is that we lose a reschedule ...
@@ -125,12 +137,12 @@ static void native_smp_send_reschedule(int cpu)
 		WARN_ON(1);
 		return;
 	}
-	apic->send_IPI_mask(cpumask_of(cpu), RESCHEDULE_VECTOR);
+	send_IPI_cpu(cpu, RESCHEDULE_VECTOR);
 }
 
 void native_send_call_func_single_ipi(int cpu)
 {
-	apic->send_IPI_mask(cpumask_of(cpu), CALL_FUNCTION_SINGLE_VECTOR);
+	send_IPI_cpu(cpu, CALL_FUNCTION_SINGLE_VECTOR);
 }
 
 void native_send_call_func_ipi(const struct cpumask *mask)
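send_IPI_cpu() is a textbook "optional method with mandatory fallback". A hedged standalone rendition of the same shape, with toy types and hypothetical names (not the kernel's definitions):

	struct ipi_ops {
		/* optional fast path; may be NULL */
		void (*send_one)(int cpu, int vector);
		/* mandatory slow path */
		void (*send_mask)(unsigned long mask, int vector);
	};

	static void send_ipi_cpu(const struct ipi_ops *ops, int cpu, int vector)
	{
		if (ops->send_one)
			ops->send_one(cpu, vector);          /* single-target path */
		else
			ops->send_mask(1UL << cpu, vector);  /* degrade to a mask  */
	}

The callers never need to know whether the backend implements the fast path; backends can adopt it one at a time, as the next patch does for x2apic_cluster.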
[tip:x86/apic] x86/apic: Implement single target IPI function for x2apic_cluster
Commit-ID:  7b6ce46cb3d096831dea3accacee4717c66abac8
Gitweb:     http://git.kernel.org/tip/7b6ce46cb3d096831dea3accacee4717c66abac8
Author:     Linus Torvalds
AuthorDate: Wed, 4 Nov 2015 22:57:00 +0000
Committer:  Thomas Gleixner
CommitDate: Thu, 5 Nov 2015 13:07:52 +0100

x86/apic: Implement single target IPI function for x2apic_cluster

[ tglx: Split it out from the patch which provides the new callback ]

Signed-off-by: Linus Torvalds
Reviewed-by: Ingo Molnar
Cc: Borislav Petkov
Cc: Peter Zijlstra
Cc: Mike Travis
Cc: Daniel J Blueman
Link: http://lkml.kernel.org/r/20151104220848.817975...@linutronix.de
Signed-off-by: Thomas Gleixner
---
 arch/x86/kernel/apic/x2apic_cluster.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/arch/x86/kernel/apic/x2apic_cluster.c b/arch/x86/kernel/apic/x2apic_cluster.c
index cc8311c..aca8b75 100644
--- a/arch/x86/kernel/apic/x2apic_cluster.c
+++ b/arch/x86/kernel/apic/x2apic_cluster.c
@@ -23,6 +23,14 @@ static inline u32 x2apic_cluster(int cpu)
 	return per_cpu(x86_cpu_to_logical_apicid, cpu) >> 16;
 }
 
+static void x2apic_send_IPI(int cpu, int vector)
+{
+	u32 dest = per_cpu(x86_cpu_to_logical_apicid, cpu);
+
+	x2apic_wrmsr_fence();
+	__x2apic_send_IPI_dest(dest, vector, APIC_DEST_LOGICAL);
+}
+
 static void
 __x2apic_send_IPI_mask(const struct cpumask *mask, int vector, int apic_dest)
 {
@@ -266,6 +274,7 @@ static struct apic apic_x2apic_cluster = {
 	.cpu_mask_to_apicid_and		= x2apic_cpu_mask_to_apicid_and,
 
+	.send_IPI			= x2apic_send_IPI,
 	.send_IPI_mask			= x2apic_send_IPI_mask,
 	.send_IPI_mask_allbutself	= x2apic_send_IPI_mask_allbutself,
 	.send_IPI_allbutself		= x2apic_send_IPI_allbutself,
[tip:locking/urgent] smp: Fix smp_call_function_single_async() locking
Commit-ID:  8053871d0f7f67c7efb7f226ef031f78877d6625
Gitweb:     http://git.kernel.org/tip/8053871d0f7f67c7efb7f226ef031f78877d6625
Author:     Linus Torvalds
AuthorDate: Wed, 11 Feb 2015 12:42:10 -0800
Committer:  Ingo Molnar
CommitDate: Fri, 17 Apr 2015 09:57:52 +0200

smp: Fix smp_call_function_single_async() locking

The current smp_call_function code suffers a number of problems, most
notably smp_call_function_single_async() is broken.

The problem is that flush_smp_call_function_queue() does csd_unlock()
_after_ calling csd->func(). This means that a caller cannot properly
synchronize the csd usage as it has to.

Change the code to release the csd before calling ->func() for the
async case, and put a WARN_ON_ONCE(csd->flags & CSD_FLAG_LOCK) in
smp_call_function_single_async() to warn us of improper serialization,
because any waiting there can result in deadlocks when called with
IRQs disabled.

Rename the (currently) unused WAIT flag to SYNCHRONOUS and (re)use it
such that we know what to do in flush_smp_call_function_queue().

Rework csd_{,un}lock() to use smp_load_acquire() / smp_store_release()
to avoid some full barriers while more clearly providing lock
semantics.

Finally move the csd maintenance out of generic_exec_single() into its
callers for clearer code.

Signed-off-by: Linus Torvalds
[ Added changelog. ]
Signed-off-by: Peter Zijlstra (Intel)
Cc: Frederic Weisbecker
Cc: Jens Axboe
Cc: Rafael David Tinoco
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/ca+55afz492bzlfhdbkn-hygjcreup7cjmeyk3ntsfrwjppz...@mail.gmail.com
Signed-off-by: Ingo Molnar
---
 kernel/smp.c | 78
 1 file changed, 47 insertions(+), 31 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index f38a1e6..2aaac2c 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -19,7 +19,7 @@
 
 enum {
 	CSD_FLAG_LOCK		= 0x01,
-	CSD_FLAG_WAIT		= 0x02,
+	CSD_FLAG_SYNCHRONOUS	= 0x02,
 };
 
 struct call_function_data {
@@ -107,7 +107,7 @@ void __init call_function_init(void)
  */
 static void csd_lock_wait(struct call_single_data *csd)
 {
-	while (csd->flags & CSD_FLAG_LOCK)
+	while (smp_load_acquire(&csd->flags) & CSD_FLAG_LOCK)
 		cpu_relax();
 }
 
@@ -121,19 +121,17 @@ static void csd_lock(struct call_single_data *csd)
 	 * to ->flags with any subsequent assignments to other
 	 * fields of the specified call_single_data structure:
 	 */
-	smp_mb();
+	smp_wmb();
 }
 
 static void csd_unlock(struct call_single_data *csd)
 {
-	WARN_ON((csd->flags & CSD_FLAG_WAIT) && !(csd->flags & CSD_FLAG_LOCK));
+	WARN_ON(!(csd->flags & CSD_FLAG_LOCK));
 
 	/*
 	 * ensure we're all done before releasing data:
 	 */
-	smp_mb();
-
-	csd->flags &= ~CSD_FLAG_LOCK;
+	smp_store_release(&csd->flags, 0);
 }
 
 static DEFINE_PER_CPU_SHARED_ALIGNED(struct call_single_data, csd_data);
@@ -144,13 +142,16 @@ static DEFINE_PER_CPU_SHARED_ALIGNED(struct call_single_data, csd_data);
  * ->func, ->info, and ->flags set.
  */
 static int generic_exec_single(int cpu, struct call_single_data *csd,
-			       smp_call_func_t func, void *info, int wait)
+			       smp_call_func_t func, void *info)
 {
-	struct call_single_data csd_stack = { .flags = 0 };
-	unsigned long flags;
-
-	if (cpu == smp_processor_id()) {
+	unsigned long flags;
+
+	/*
+	 * We can unlock early even for the synchronous on-stack case,
+	 * since we're doing this from the same CPU..
+	 */
+	csd_unlock(csd);
 	local_irq_save(flags);
 	func(info);
 	local_irq_restore(flags);
@@ -161,21 +162,9 @@ static int generic_exec_single(int cpu, struct call_single_data *csd,
 
 	if ((unsigned)cpu >= nr_cpu_ids || !cpu_online(cpu))
 		return -ENXIO;
-
-	if (!csd) {
-		csd = &csd_stack;
-		if (!wait)
-			csd = this_cpu_ptr(&csd_data);
-	}
-
-	csd_lock(csd);
-
 	csd->func = func;
 	csd->info = info;
 
-	if (wait)
-		csd->flags |= CSD_FLAG_WAIT;
-
 	/*
	 * The list addition should be visible before sending the IPI
	 * handler locks the list to pull the entry off it because of
@@ -190,9 +179,6 @@ static int generic_exec_single(int cpu, struct call_single_data *csd,
 	if (llist_add(&csd->llist, &per_cpu(call_single_queue, cpu)))
 		arch_send_call_function_single_ipi(cpu);
 
-	if (wait)
-		csd_lock_wait(csd);
-
 	return 0;
 }
 
@@ -250,8 +236,17 @@ static void flush_smp_call_function_queue(bool warn_cpu_offline)
 	}
 
 	llist_for_each_entry_safe(csd, csd_next, entry, llist) {
-		csd->func(csd->info);
-		csd_unlock(csd);
+
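The acquire/release pairing above is the heart of the fix. A hedged userspace analog using C11 atomics in place of the kernel's smp_load_acquire()/smp_store_release() (a sketch of the protocol, not kernel code):

	#include <stdatomic.h>

	#define CSD_FLAG_LOCK 0x01

	struct csd {
		_Atomic unsigned int flags;
		void (*func)(void *);
		void *info;
	};

	static void csd_lock_wait(struct csd *csd)
	{
		/* acquire: once LOCK is seen clear, all prior csd writes are visible */
		while (atomic_load_explicit(&csd->flags, memory_order_acquire)
		       & CSD_FLAG_LOCK)
			;	/* spin; the kernel uses cpu_relax() here */
	}

	static void csd_unlock(struct csd *csd)
	{
		/* release: publishes every write to *csd made before this store */
		atomic_store_explicit(&csd->flags, 0, memory_order_release);
	}

The release store guarantees that ->func and ->info are globally visible before the flag clears, and the acquire load in the wait loop guarantees the waiter sees them; that is what lets the async path hand the csd back to its owner before ->func() runs.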
[tip:x86/urgent] x86-64, modify_ldt: Make support for 16-bit segments a runtime option
Commit-ID:  fa81511bb0bbb2b1aace3695ce869da9762624ff
Gitweb:     http://git.kernel.org/tip/fa81511bb0bbb2b1aace3695ce869da9762624ff
Author:     Linus Torvalds
AuthorDate: Wed, 14 May 2014 16:33:54 -0700
Committer:  H. Peter Anvin
CommitDate: Wed, 14 May 2014 16:33:54 -0700

x86-64, modify_ldt: Make support for 16-bit segments a runtime option

Checkin:

  b3b42ac2cbae x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels

disabled 16-bit segments on 64-bit kernels due to an information leak.
However, it does seem that people are genuinely using Wine to run old
16-bit Windows programs on Linux.

A proper fix for this ("espfix64") is coming in the upcoming merge
window, but as a temporary fix, create a sysctl to allow the
administrator to re-enable support for 16-bit segments.

It adds a "/proc/sys/abi/ldt16" sysctl that defaults to zero (off). If
you hit this issue and care about your old Windows program more than
you care about a kernel stack address information leak, you can do

   echo 1 > /proc/sys/abi/ldt16

as root (add it to your startup scripts), and you should be ok.

The sysctl table is only added if you have COMPAT support enabled on
x86-64, but I assume anybody who runs old windows binaries very much
does that ;)

Signed-off-by: H. Peter Anvin
Link: http://lkml.kernel.org/r/ca%2b55afw9bpod10u1lfhbomphwzkvjtkmcfcs9s3urpr1yyw...@mail.gmail.com
Cc:
---
 arch/x86/kernel/ldt.c        | 4 +++-
 arch/x86/vdso/vdso32-setup.c | 8
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/ldt.c b/arch/x86/kernel/ldt.c
index af1d14a..dcbbaa1 100644
--- a/arch/x86/kernel/ldt.c
+++ b/arch/x86/kernel/ldt.c
@@ -20,6 +20,8 @@
 #include <asm/mmu_context.h>
 #include <asm/syscalls.h>
 
+int sysctl_ldt16 = 0;
+
 #ifdef CONFIG_SMP
 static void flush_ldt(void *current_mm)
 {
@@ -234,7 +236,7 @@ static int write_ldt(void __user *ptr, unsigned long bytecount, int oldmode)
 	 * IRET leaking the high bits of the kernel stack address.
 	 */
 #ifdef CONFIG_X86_64
-	if (!ldt_info.seg_32bit) {
+	if (!ldt_info.seg_32bit && !sysctl_ldt16) {
 		error = -EINVAL;
 		goto out_unlock;
 	}
diff --git a/arch/x86/vdso/vdso32-setup.c b/arch/x86/vdso/vdso32-setup.c
index 0034898..e1f220e 100644
--- a/arch/x86/vdso/vdso32-setup.c
+++ b/arch/x86/vdso/vdso32-setup.c
@@ -39,6 +39,7 @@
 #ifdef CONFIG_X86_64
 #define vdso_enabled			sysctl_vsyscall32
 #define arch_setup_additional_pages	syscall32_setup_pages
+extern int sysctl_ldt16;
 #endif
 
 /*
@@ -249,6 +250,13 @@ static struct ctl_table abi_table2[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec
 	},
+	{
+		.procname	= "ldt16",
+		.data		= &sysctl_ldt16,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec
+	},
 	{}
 };
[tip:x86/urgent] x86, fpu, amd: Clear exceptions in AMD FXSAVE workaround
Commit-ID: 26bef1318adc1b3a530ecc807ef99346db2aa8b0 Gitweb: http://git.kernel.org/tip/26bef1318adc1b3a530ecc807ef99346db2aa8b0 Author: Linus Torvalds AuthorDate: Sat, 11 Jan 2014 19:15:52 -0800 Committer: H. Peter Anvin CommitDate: Sat, 11 Jan 2014 19:15:52 -0800 x86, fpu, amd: Clear exceptions in AMD FXSAVE workaround Before we do an EMMS in the AMD FXSAVE information leak workaround we need to clear any pending exceptions, otherwise we trap with a floating-point exception inside this code. Reported-by: halfdog Tested-by: Borislav Petkov Link: http://lkml.kernel.org/r/CA%2B55aFxQnY_PCG_n4=0w-VG=ylxl-yr7omxyy0wu2gcbaf3...@mail.gmail.com Signed-off-by: H. Peter Anvin --- arch/x86/include/asm/fpu-internal.h | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/arch/x86/include/asm/fpu-internal.h b/arch/x86/include/asm/fpu-internal.h index c49a613..cea1c76 100644 --- a/arch/x86/include/asm/fpu-internal.h +++ b/arch/x86/include/asm/fpu-internal.h @@ -293,12 +293,13 @@ static inline int restore_fpu_checking(struct task_struct *tsk) /* AMD K7/K8 CPUs don't save/restore FDP/FIP/FOP unless an exception is pending. Clear the x87 state here by setting it to fixed values. "m" is a random variable that should be in L1 */ - alternative_input( - ASM_NOP8 ASM_NOP2, - "emms\n\t" /* clear stack tags */ - "fildl %P[addr]", /* set F?P to defined value */ - X86_FEATURE_FXSAVE_LEAK, - [addr] "m" (tsk->thread.fpu.has_fpu)); + if (unlikely(static_cpu_has(X86_FEATURE_FXSAVE_LEAK))) { + asm volatile( + "fnclex\n\t" + "emms\n\t" + "fildl %P[addr]" /* set F?P to defined value */ + : : [addr] "m" (tsk->thread.fpu.has_fpu)); + } return fpu_restore_checking(&tsk->thread.fpu); }
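The instruction order is the point of the fix: FNCLEX is the no-wait form of FCLEX, so it clears the pending x87 exception flags without first checking them, whereas EMMS and FILD would both fault on a pending unmasked exception. A hedged x86-only sketch of the same three-instruction sequence as a standalone helper (the name fxsave_leak_scrub is invented; this is an illustration, not the kernel code path):

    /* Scrub x87 state the way the workaround above does: no-wait clear
     * of pending exceptions, then empty the x87/MMX tag word, then load
     * a known value so FDP/FIP/FOP record defined addresses. Note this
     * leaves one value pushed on the x87 stack, which is harmless in
     * the kernel context above because the FPU state is restored right
     * afterwards. */
    static inline void fxsave_leak_scrub(const int *anchor)
    {
            asm volatile("fnclex\n\t"
                         "emms\n\t"
                         "fildl %0"
                         : : "m" (*anchor));
    }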
[tip:x86/asm] x86: Replace assembly access_ok() with a C variant
Commit-ID: c5fe5d80680e2949ffe102180f5fc6cefc0d145f Gitweb: http://git.kernel.org/tip/c5fe5d80680e2949ffe102180f5fc6cefc0d145f Author: Linus Torvalds AuthorDate: Fri, 27 Dec 2013 15:30:58 -0800 Committer: H. Peter Anvin CommitDate: Fri, 27 Dec 2013 16:58:17 -0800 x86: Replace assembly access_ok() with a C variant It turns out that the assembly variant doesn't actually produce that good code, presumably partly because it creates a long dependency chain with no scheduling, and partly because we cannot get a flags result out of gcc (which could be fixed with asm goto, but it turns out not to be worth it.) The C code allows gcc to schedule and generate multiple (easily predictable) branches, and as a side benefit we can really optimize the case where the size is constant. Link: http://lkml.kernel.org/r/CA%2B55aFzPBdbfKovMT8Edr4SmE2_=%2bokjfac9xw2awegogtk...@mail.gmail.com Signed-off-by: H. Peter Anvin --- arch/x86/include/asm/uaccess.h | 28 +++++++++++++++++----------- 1 file changed, 17 insertions(+), 11 deletions(-) diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h index 8ec57c0..84ecf1d 100644 --- a/arch/x86/include/asm/uaccess.h +++ b/arch/x86/include/asm/uaccess.h @@ -40,22 +40,28 @@ /* * Test whether a block of memory is a valid user space address. * Returns 0 if the range is valid, nonzero otherwise. - * - * This is equivalent to the following test: - * (u33)addr + (u33)size > (u33)current->addr_limit.seg (u65 for x86_64) - * - * This needs 33-bit (65-bit for x86_64) arithmetic. We have a carry... */ +static inline int __chk_range_not_ok(unsigned long addr, unsigned long size, unsigned long limit) +{ + /* +* If we have used "sizeof()" for the size, +* we know it won't overflow the limit (but +* it might overflow the 'addr', so it's +* important to subtract the size from the +* limit, not add it to the address). +*/ + if (__builtin_constant_p(size)) + return addr > limit - size; + + /* Arbitrary sizes? Be careful about overflow */ + addr += size; + return (addr < size) || (addr > limit); +} #define __range_not_ok(addr, size, limit) \ ({ \ - unsigned long flag, roksum; \ __chk_user_ptr(addr); \ - asm("add %3,%1 ; sbb %0,%0 ; cmp %1,%4 ; sbb $0,%0" \ - : "=&r" (flag), "=r" (roksum) \ - : "1" (addr), "g" ((long)(size)), \ - "rm" (limit));\ - flag; \ + __chk_range_not_ok((unsigned long __force)(addr), size, limit); \ }) /**
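The two branches of the new C check guard against different overflows, and the variable-size path is easy to probe in isolation. A stand-alone sketch of that path with a few test values (user-space harness with a hypothetical limit, not the kernel macro itself):

    #include <stdio.h>

    /* Mirror of __chk_range_not_ok()'s variable-size path: adding size
     * may wrap addr around zero; (addr < size) catches exactly that
     * wrap, and (addr > limit) catches ordinary overruns. */
    static int range_not_ok(unsigned long addr, unsigned long size,
                            unsigned long limit)
    {
            addr += size;
            return (addr < size) || (addr > limit);
    }

    int main(void)
    {
            unsigned long limit = 0x7ffffffff000UL; /* example x86-64 user limit */

            printf("%d\n", range_not_ok(0x1000, 0x100, limit)); /* 0: in range */
            printf("%d\n", range_not_ok(limit, 0x10, limit));   /* 1: past limit */
            printf("%d\n", range_not_ok(~0UL, 2, limit));       /* 1: wraps to 1 */
            return 0;
    }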