[tip:sched/urgent] sched/fair: Fix infinite loop in update_blocked_averages() by reverting a9e7f6544b9c

2018-12-30 Thread tip-bot for Linus Torvalds
Commit-ID:  c40f7d74c741a907cfaeb73a7697081881c497d0
Gitweb: https://git.kernel.org/tip/c40f7d74c741a907cfaeb73a7697081881c497d0
Author: Linus Torvalds 
AuthorDate: Thu, 27 Dec 2018 13:46:17 -0800
Committer:  Ingo Molnar 
CommitDate: Sun, 30 Dec 2018 13:54:31 +0100

sched/fair: Fix infinite loop in update_blocked_averages() by reverting a9e7f6544b9c

Zhipeng Xie, Xie XiuQi and Sargun Dhillon reported lockups in the
scheduler under high loads, starting at around the v4.18 time frame,
and Zhipeng Xie tracked it down to bugs in the rq->leaf_cfs_rq_list
manipulation.

Do a (manual) revert of:

  a9e7f6544b9c ("sched/fair: Fix O(nr_cgroups) in load balance path")

It turns out that the list_del_leaf_cfs_rq() introduced by this commit
has a surprising property that was not considered in followup commits
such as:

  9c2791f936ef ("sched/fair: Fix hierarchical order in rq->leaf_cfs_rq_list")

As Vincent Guittot explains:

 "I think that there is a bigger problem with commit a9e7f6544b9c and
  cfs_rq throttling:

  Let's take the example of the following topology TG2 --> TG1 --> root:

   1) The 1st time a task is enqueued, we will add TG2 cfs_rq then TG1
  cfs_rq to leaf_cfs_rq_list, and we are sure to do the whole branch in
  one path because it has never been used and can't be throttled, so
  tmp_alone_branch will point to leaf_cfs_rq_list at the end.

   2) Then TG1 is throttled

   3) and we add TG3 as a new child of TG1.

   4) The 1st enqueue of a task on TG3 will add TG3 cfs_rq just before TG1
  cfs_rq and tmp_alone_branch will stay on rq->leaf_cfs_rq_list.

  With commit a9e7f6544b9c, we can del a cfs_rq from rq->leaf_cfs_rq_list.
  So if the load of TG1 cfs_rq becomes NULL before step 2) above, TG1
  cfs_rq is removed from the list.
  Then at step 4), TG3 cfs_rq is added at the beginning of rq->leaf_cfs_rq_list
  but tmp_alone_branch still points to TG3 cfs_rq because its throttled
  parent can't be enqueued when the lock is released.
  tmp_alone_branch doesn't point to rq->leaf_cfs_rq_list whereas it should.

  So if TG3 cfs_rq is removed or destroyed before tmp_alone_branch
  points to another TG cfs_rq, the next TG cfs_rq that gets added
  will be linked outside rq->leaf_cfs_rq_list - which is bad.

  In addition, we can break the ordering of the cfs_rq in
  rq->leaf_cfs_rq_list but this ordering is used to update and
  propagate the update from leaf down to root."
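
To make the failure mode concrete, here is a small user-space C model of
the broken invariant (a sketch only - the names and list structure are
simplified stand-ins, not the kernel code): an insertion cursor like
tmp_alone_branch is only valid while every entry of the half-built branch
stays on the list, and a deletion path that never repairs the cursor
leaves it dangling:

	/* sketch: a cursor into a linked list, invalidated by deletion */
	#include <stdio.h>

	struct node { struct node *prev, *next; const char *name; };

	static struct node head = { &head, &head, "list head" };
	static struct node *cursor = &head;	/* ~ tmp_alone_branch */

	static void insert_before(struct node *pos, struct node *n)
	{
		n->prev = pos->prev; n->next = pos;
		pos->prev->next = n; pos->prev = n;
	}

	static void del(struct node *n)
	{
		n->prev->next = n->next; n->next->prev = n->prev;
		/* the modelled bug: the cursor is never repaired here */
	}

	int main(void)
	{
		struct node tg3 = { .name = "TG3 cfs_rq" };

		insert_before(&head, &tg3);	/* partial branch added     */
		cursor = &tg3;			/* throttled parent: cursor */
						/* parks on TG3             */
		del(&tg3);			/* TG3 destroyed            */
		/* any insert_before(cursor, ...) now links nodes into a
		 * chain that is unreachable from the list head */
		printf("cursor: %s (already unlinked)\n", cursor->name);
		return 0;
	}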

Instead of trying to work through all these cases and trying to reproduce
the very high loads that produced the lockup to begin with, simplify
the code temporarily by reverting a9e7f6544b9c - a change that was clearly
not thought through completely.

This (hopefully) gives us a kernel that doesn't lock up so people
can continue to enjoy their holidays without worrying about regressions. ;-)

[ mingo: Wrote changelog, fixed weird spelling in code comment while at it. ]

Analyzed-by: Xie XiuQi 
Analyzed-by: Vincent Guittot 
Reported-by: Zhipeng Xie 
Reported-by: Sargun Dhillon 
Reported-by: Xie XiuQi 
Tested-by: Zhipeng Xie 
Tested-by: Sargun Dhillon 
Signed-off-by: Linus Torvalds 
Acked-by: Vincent Guittot 
Cc:  # v4.13+
Cc: Bin Li 
Cc: Mike Galbraith 
Cc: Peter Zijlstra 
Cc: Tejun Heo 
Cc: Thomas Gleixner 
Fixes: a9e7f6544b9c ("sched/fair: Fix O(nr_cgroups) in load balance path")
Link: http://lkml.kernel.org/r/1545879866-27809-1-git-send-email-xiexi...@huawei.com
Signed-off-by: Ingo Molnar 
---
 kernel/sched/fair.c | 43 +--
 1 file changed, 9 insertions(+), 34 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d1907506318a..6483834f1278 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -352,10 +352,9 @@ static inline void list_del_leaf_cfs_rq(struct cfs_rq *cfs_rq)
}
 }
 
-/* Iterate thr' all leaf cfs_rq's on a runqueue */
-#define for_each_leaf_cfs_rq_safe(rq, cfs_rq, pos) \
-   list_for_each_entry_safe(cfs_rq, pos, &rq->leaf_cfs_rq_list,\
-leaf_cfs_rq_list)
+/* Iterate through all leaf cfs_rq's on a runqueue: */
+#define for_each_leaf_cfs_rq(rq, cfs_rq) \
+   list_for_each_entry_rcu(cfs_rq, &rq->leaf_cfs_rq_list, leaf_cfs_rq_list)
 
 /* Do the two (enqueued) entities belong to the same group ? */
 static inline struct cfs_rq *
@@ -447,8 +446,8 @@ static inline void list_del_leaf_cfs_rq(struct cfs_rq *cfs_rq)
 {
 }
 
-#define for_each_leaf_cfs_rq_safe(rq, cfs_rq, pos) \
-   for (cfs_rq = &rq->cfs, pos = NULL; cfs_rq; cfs_rq = pos)
+#define for_each_leaf_cfs_rq(rq, cfs_rq)   \
+   for (cfs_rq = &rq->cfs; cfs_rq; cfs_rq = NULL)
 
 static inline struct sched_entity *parent_entity(struct sched_entity *se)
 {
@@ -7647,27 +7646,10 @@ static inline bool others_have_blocked(struct rq *rq)
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
 
-static inline bool cfs_rq_is_decayed(struct cfs_rq *cfs_rq)
-{
-   if (cfs_rq->load.weight)
-   return false;
-
-   

[tip:x86/asm] x86/syscalls: Don't pointlessly reload the system call number

2018-04-06 Thread tip-bot for Linus Torvalds
Commit-ID:  dfe64506c01e57159a4c550fe537c13a317ff01b
Gitweb: https://git.kernel.org/tip/dfe64506c01e57159a4c550fe537c13a317ff01b
Author: Linus Torvalds 
AuthorDate: Thu, 5 Apr 2018 11:53:00 +0200
Committer:  Ingo Molnar 
CommitDate: Thu, 5 Apr 2018 16:59:24 +0200

x86/syscalls: Don't pointlessly reload the system call number

We have it in a register in the low-level asm, so just pass it in as an
argument rather than having do_syscall_64() load it back in from the
ptregs pointer.
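
A minimal user-space sketch of the shape of this change (stand-in types
and hypothetical function names; only the register-argument point is the
real one): under the SysV AMD64 ABI the first two integer arguments
arrive in %rdi and %rsi, so the asm stub can hand over the live %rax
value directly and the C side never reloads it from memory:

	/* sketch only: not the kernel's real definitions */
	struct pt_regs { unsigned long orig_ax; /* ... */ };

	/* before: the number was re-read from the saved register frame */
	unsigned long handler_old(struct pt_regs *regs)
	{
		unsigned long nr = regs->orig_ax;	/* memory load */
		return nr;
	}

	/* after: the stub passes the live %rax value, so the C code
	 * receives it in %rdi and no reload happens */
	unsigned long handler_new(unsigned long nr, struct pt_regs *regs)
	{
		(void)regs;
		return nr;
	}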

Signed-off-by: Linus Torvalds 
Signed-off-by: Dominik Brodowski 
Cc: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Denys Vlasenko 
Cc: H. Peter Anvin 
Cc: Josh Poimboeuf 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Link: http://lkml.kernel.org/r/20180405095307.3730-2-li...@dominikbrodowski.net
Signed-off-by: Ingo Molnar 
---
 arch/x86/entry/common.c   | 12 ++--
 arch/x86/entry/entry_64.S |  3 ++-
 2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index 74f6eee15179..a8b066dbbf48 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -266,14 +266,13 @@ __visible inline void syscall_return_slowpath(struct pt_regs *regs)
 }
 
 #ifdef CONFIG_X86_64
-__visible void do_syscall_64(struct pt_regs *regs)
+__visible void do_syscall_64(unsigned long nr, struct pt_regs *regs)
 {
-   struct thread_info *ti = current_thread_info();
-   unsigned long nr = regs->orig_ax;
+   struct thread_info *ti;
 
enter_from_user_mode();
local_irq_enable();
-
+   ti = current_thread_info();
if (READ_ONCE(ti->flags) & _TIF_WORK_SYSCALL_ENTRY)
nr = syscall_trace_enter(regs);
 
@@ -282,8 +281,9 @@ __visible void do_syscall_64(struct pt_regs *regs)
 * table.  The only functional difference is the x32 bit in
 * regs->orig_ax, which changes the behavior of some syscalls.
 */
-   if (likely((nr & __SYSCALL_MASK) < NR_syscalls)) {
-   nr = array_index_nospec(nr & __SYSCALL_MASK, NR_syscalls);
+   nr &= __SYSCALL_MASK;
+   if (likely(nr < NR_syscalls)) {
+   nr = array_index_nospec(nr, NR_syscalls);
regs->ax = sys_call_table[nr](
regs->di, regs->si, regs->dx,
regs->r10, regs->r8, regs->r9);
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 936e19642eab..6cfe38665f3c 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -233,7 +233,8 @@ GLOBAL(entry_SYSCALL_64_after_hwframe)
TRACE_IRQS_OFF
 
/* IRQs are off. */
-   movq%rsp, %rdi
+   movq%rax, %rdi
+   movq%rsp, %rsi
calldo_syscall_64   /* returns with IRQs disabled */
 
TRACE_IRQS_IRETQ/* we're about to change IF */


[tip:x86/urgent] x86-32: Fix kexec with stack canary (CONFIG_CC_STACKPROTECTOR)

2017-12-27 Thread tip-bot for Linus Torvalds
Commit-ID:  ac461122c88a10b7d775de2f56467f097c9e627a
Gitweb: https://git.kernel.org/tip/ac461122c88a10b7d775de2f56467f097c9e627a
Author: Linus Torvalds 
AuthorDate: Wed, 27 Dec 2017 11:48:50 -0800
Committer:  Thomas Gleixner 
CommitDate: Wed, 27 Dec 2017 20:59:41 +0100

x86-32: Fix kexec with stack canary (CONFIG_CC_STACKPROTECTOR)

Commit e802a51ede91 ("x86/idt: Consolidate IDT invalidation") cleaned up
and unified the IDT invalidation that existed in a couple of places.  It
changed no actual real code.

Despite not changing any actual real code, it _did_ change code generation:
by implementing the common idt_invalidate() function in
arch/x86/kernel/idt.c, it made the use of the function in
arch/x86/kernel/machine_kexec_32.c be a real function call rather than an
(accidental) inlining of the function.

That, in turn, exposed two issues:

 - in load_segments(), we had incorrectly reset all the segment
   registers, which then made the stack canary load (which gcc does
   using an offset from %gs) cause a trap.  Instead of %gs pointing to the
   stack canary, it will be the normal zero-based kernel segment, and
   the stack canary load will take a page fault at address 0x14.

 - to make this even harder to debug, we had invalidated the GDT just
   before calling idt_invalidate(), which meant that the fault happened
   with an invalid GDT, which in turn causes a triple fault and
   immediate reboot.

Fix this by

 (a) not reloading the special segments in load_segments(). We currently
 don't do any percpu accesses (which would require %fs on x86-32) in
 this area, but there's no reason to think that we might not want to
 do them, and like %gs, it's pointless to break it.

 (b) doing idt_invalidate() before invalidating the GDT, to keep things
 at least _slightly_ more debuggable for a bit longer. Without an
 IDT, traps will not work. Without a GDT, traps also will not work,
 but neither will any segment loads etc. So in a very real sense,
 the GDT is even more core than the IDT.
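
For readers who have not seen the stack-protector codegen: on x86-32,
GCC addresses the canary through %gs, so a protected function looks
roughly like the sketch below (illustrative compiler output; the 0x14
offset is the one the changelog above refers to). Once %gs has been
reloaded with the flat __KERNEL_DS segment, the same instruction
dereferences linear address 0x14 and faults:

	movl	%gs:0x14, %eax		# prologue: load the canary
	movl	%eax, 28(%esp)		# stash a copy in this frame
	...
	movl	28(%esp), %eax		# epilogue: reload the copy
	xorl	%gs:0x14, %eax		# compare against the live canary
	je	1f
	call	__stack_chk_fail	# mismatch: stack smashed
1: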

Fixes: e802a51ede91 ("x86/idt: Consolidate IDT invalidation")
Reported-and-tested-by: Alexandru Chirvasitu 
Signed-off-by: Linus Torvalds 
Signed-off-by: Thomas Gleixner 
Cc: Denys Vlasenko 
Cc: Peter Zijlstra 
Cc: Brian Gerst 
Cc: Steven Rostedt 
Cc: Borislav Petkov 
Cc: Andy Lutomirski 
Cc: Josh Poimboeuf 
Cc: sta...@vger.kernel.org
Link: https://lkml.kernel.org/r/alpine.lfd.2.21.1712271143180.8...@i7.lan

---
 arch/x86/kernel/machine_kexec_32.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/x86/kernel/machine_kexec_32.c b/arch/x86/kernel/machine_kexec_32.c
index 00bc751..edfede7 100644
--- a/arch/x86/kernel/machine_kexec_32.c
+++ b/arch/x86/kernel/machine_kexec_32.c
@@ -48,8 +48,6 @@ static void load_segments(void)
"\tmovl $"STR(__KERNEL_DS)",%%eax\n"
"\tmovl %%eax,%%ds\n"
"\tmovl %%eax,%%es\n"
-   "\tmovl %%eax,%%fs\n"
-   "\tmovl %%eax,%%gs\n"
"\tmovl %%eax,%%ss\n"
: : : "eax", "memory");
 #undef STR
@@ -232,8 +230,8 @@ void machine_kexec(struct kimage *image)
 * The gdt & idt are now invalid.
 * If you want to load them you must set up your own idt & gdt.
 */
-   set_gdt(phys_to_virt(0), 0);
idt_invalidate(phys_to_virt(0));
+   set_gdt(phys_to_virt(0), 0);
 
/* now call it */
image->start = relocate_kernel_ptr((unsigned long)image->head,


[tip:sched/urgent] sched/core: Remove pointless printout in sched_show_task()

2016-11-03 Thread tip-bot for Linus Torvalds
Commit-ID:  8243d5597793b5e85143c9a935e1b971c59740a9
Gitweb: http://git.kernel.org/tip/8243d5597793b5e85143c9a935e1b971c59740a9
Author: Linus Torvalds 
AuthorDate: Tue, 1 Nov 2016 17:47:18 -0600
Committer:  Ingo Molnar 
CommitDate: Thu, 3 Nov 2016 07:31:34 +0100

sched/core: Remove pointless printout in sched_show_task()

In sched_show_task() we print out a useless hex number, not even a
symbol, and there's a big question mark whether this even makes sense
anyway; I suspect we should just remove it all.

Signed-off-by: Linus Torvalds 
Acked-by: Andy Lutomirski 
Cc: Peter Zijlstra 
Cc: Tetsuo Handa 
Cc: Thomas Gleixner 
Cc: b...@alien8.de
Cc: brge...@gmail.com
Cc: j...@thejh.net
Cc: keesc...@chromium.org
Cc: linux-...@vger.kernel.org
Cc: tycho.ander...@canonical.com
Link: http://lkml.kernel.org/r/ca+55afzphurpfzavu4z6moy7zmimcwpuudyu8bj9z0j+s8x...@mail.gmail.com
Signed-off-by: Ingo Molnar 
---
 kernel/sched/core.c | 9 -
 1 file changed, 9 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 9abf66b..154fd68 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5198,17 +5198,8 @@ void sched_show_task(struct task_struct *p)
state = __ffs(state) + 1;
printk(KERN_INFO "%-15.15s %c", p->comm,
state < sizeof(stat_nam) - 1 ? stat_nam[state] : '?');
-#if BITS_PER_LONG == 32
-   if (state == TASK_RUNNING)
-   printk(KERN_CONT " running  ");
-   else
-   printk(KERN_CONT " %08lx ", thread_saved_pc(p));
-#else
if (state == TASK_RUNNING)
printk(KERN_CONT "  running task");
-   else
-   printk(KERN_CONT " %016lx ", thread_saved_pc(p));
-#endif
 #ifdef CONFIG_DEBUG_STACK_USAGE
free = stack_not_used(p);
 #endif


[tip:x86/asm] um/Stop conflating task_struct::stack with thread_info

2016-09-15 Thread tip-bot for Linus Torvalds
Commit-ID:  d896fa20a70c9e596438728561e058a74ed3196b
Gitweb: http://git.kernel.org/tip/d896fa20a70c9e596438728561e058a74ed3196b
Author: Linus Torvalds 
AuthorDate: Tue, 13 Sep 2016 14:29:23 -0700
Committer:  Ingo Molnar 
CommitDate: Thu, 15 Sep 2016 08:25:12 +0200

um/Stop conflating task_struct::stack with thread_info

thread_info may move in the future, so use the accessors.
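
A sketch of the point (simplified stand-ins, not the kernel's real
definitions): the accessor compiles to the same thing today, but keeps
working when thread_info migrates from the bottom of the stack into
task_struct itself:

	/* sketch only */
	struct thread_info { int cpu; };

	struct task_struct {
		void *stack;	/* old world: thread_info lives at the
				 * bottom of the stack allocation */
		struct thread_info thread_info;	/* new world: in-task */
	};

	/* open-coded (what the patch removes): hardwires the old layout */
	#define cpu_of_old(p)	(((struct thread_info *)(p)->stack)->cpu)

	/* accessor (what the patch uses): the layout stays private */
	static inline int task_cpu_sketch(const struct task_struct *p)
	{
		return p->thread_info.cpu;
	}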

[ Andy Lutomirski wrote this changelog message and changed
  "task_thread_info(child)->cpu" to "task_cpu(child)". ]

Signed-off-by: Linus Torvalds 
Signed-off-by: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Denys Vlasenko 
Cc: H. Peter Anvin 
Cc: Jann Horn 
Cc: Josh Poimboeuf 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Link: http://lkml.kernel.org/r/3439705d9838940cc82733a7335fa8c654c37db8.1473801993.git.l...@kernel.org
Signed-off-by: Ingo Molnar 
---
 arch/x86/um/ptrace_32.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/um/ptrace_32.c b/arch/x86/um/ptrace_32.c
index a7ef7b1..5766ead 100644
--- a/arch/x86/um/ptrace_32.c
+++ b/arch/x86/um/ptrace_32.c
@@ -194,7 +194,7 @@ int peek_user(struct task_struct *child, long addr, long data)
 
 static int get_fpregs(struct user_i387_struct __user *buf, struct task_struct *child)
 {
-   int err, n, cpu = ((struct thread_info *) child->stack)->cpu;
+   int err, n, cpu = task_cpu(child);
	struct user_i387_struct fpregs;
 
	err = save_i387_registers(userspace_pid[cpu],
@@ -211,7 +211,7 @@ static int get_fpregs(struct user_i387_struct __user *buf, struct task_struct *c
 
 static int set_fpregs(struct user_i387_struct __user *buf, struct task_struct *child)
 {
-   int n, cpu = ((struct thread_info *) child->stack)->cpu;
+   int n, cpu = task_cpu(child);
	struct user_i387_struct fpregs;
 
	n = copy_from_user(&fpregs, buf, sizeof(fpregs));
@@ -224,7 +224,7 @@ static int set_fpregs(struct user_i387_struct __user *buf, struct task_struct *c
 
 static int get_fpxregs(struct user_fxsr_struct __user *buf, struct task_struct *child)
 {
-   int err, n, cpu = ((struct thread_info *) child->stack)->cpu;
+   int err, n, cpu = task_cpu(child);
	struct user_fxsr_struct fpregs;
 
	err = save_fpx_registers(userspace_pid[cpu], (unsigned long *) &fpregs);
@@ -240,7 +240,7 @@ static int get_fpxregs(struct user_fxsr_struct __user *buf, struct task_struct *
 
 static int set_fpxregs(struct user_fxsr_struct __user *buf, struct task_struct *child)
 {
-   int n, cpu = ((struct thread_info *) child->stack)->cpu;
+   int n, cpu = task_cpu(child);
	struct user_fxsr_struct fpregs;
 
	n = copy_from_user(&fpregs, buf, sizeof(fpregs));


[tip:x86/asm] x86/entry: Get rid of pt_regs_to_thread_info()

2016-09-15 Thread tip-bot for Linus Torvalds
Commit-ID:  97245d00585d82540f4538cf72d92a1e853c7b0e
Gitweb: http://git.kernel.org/tip/97245d00585d82540f4538cf72d92a1e853c7b0e
Author: Linus Torvalds 
AuthorDate: Tue, 13 Sep 2016 14:29:22 -0700
Committer:  Ingo Molnar 
CommitDate: Thu, 15 Sep 2016 08:25:12 +0200

x86/entry: Get rid of pt_regs_to_thread_info()

It was a nice optimization while it lasted, but thread_info is moving
and this optimization will no longer work.
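
The helper being removed (shown in the diff below) hardwired the
assumption that thread_info sits at the very bottom of the kernel stack,
THREAD_SIZE bytes below the top where pt_regs lives. An annotated copy,
with the kernel constants stubbed so the sketch stands alone:

	/* stub values for the sketch only */
	#define TOP_OF_KERNEL_STACK_PADDING 0UL	/* 0 on plain x86-64 */
	#define THREAD_SIZE (16UL * 1024)	/* 16 KiB kernel stacks */
	struct pt_regs { long regs[21]; };	/* stand-in */
	struct thread_info;

	static struct thread_info *pt_regs_to_thread_info(struct pt_regs *regs)
	{
		/* pt_regs sits just below the top of the kernel stack... */
		unsigned long top_of_stack =
			(unsigned long)(regs + 1) + TOP_OF_KERNEL_STACK_PADDING;
		/* ...and thread_info was assumed to sit THREAD_SIZE below,
		 * at the bottom of the stack allocation */
		return (struct thread_info *)(top_of_stack - THREAD_SIZE);
	}

Once thread_info moves into task_struct, no arithmetic on the stack
pointer can reach it, hence the switch to current_thread_info().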

Quoting Linus:

Oh Gods, Andy. That pt_regs_to_thread_info() thing made me want
to do unspeakable acts on a poor innocent wax figure that looked
_exactly_ like you.

[ Changelog written by Andy. ]
Signed-off-by: Linus Torvalds 
Signed-off-by: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Denys Vlasenko 
Cc: H. Peter Anvin 
Cc: Jann Horn 
Cc: Josh Poimboeuf 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Link: http://lkml.kernel.org/r/6376aa81c68798cc81631673f52bd91a3e078944.1473801993.git.l...@kernel.org
Signed-off-by: Ingo Molnar 
---
 arch/x86/entry/common.c | 20 ++--
 1 file changed, 6 insertions(+), 14 deletions(-)

diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index 871bbf9..bdd9cc5 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -31,13 +31,6 @@
 #define CREATE_TRACE_POINTS
 #include 
 
-static struct thread_info *pt_regs_to_thread_info(struct pt_regs *regs)
-{
-   unsigned long top_of_stack =
-   (unsigned long)(regs + 1) + TOP_OF_KERNEL_STACK_PADDING;
-   return (struct thread_info *)(top_of_stack - THREAD_SIZE);
-}
-
 #ifdef CONFIG_CONTEXT_TRACKING
 /* Called on entry from user mode with IRQs off. */
 __visible inline void enter_from_user_mode(void)
@@ -71,7 +64,7 @@ static long syscall_trace_enter(struct pt_regs *regs)
 {
u32 arch = in_ia32_syscall() ? AUDIT_ARCH_I386 : AUDIT_ARCH_X86_64;
 
-   struct thread_info *ti = pt_regs_to_thread_info(regs);
+   struct thread_info *ti = current_thread_info();
unsigned long ret = 0;
bool emulated = false;
u32 work;
@@ -173,18 +166,17 @@ static void exit_to_usermode_loop(struct pt_regs *regs, u32 cached_flags)
/* Disable IRQs and retry */
local_irq_disable();
 
-   cached_flags = READ_ONCE(pt_regs_to_thread_info(regs)->flags);
+   cached_flags = READ_ONCE(current_thread_info()->flags);
 
if (!(cached_flags & EXIT_TO_USERMODE_LOOP_FLAGS))
break;
-
}
 }
 
 /* Called with IRQs disabled. */
 __visible inline void prepare_exit_to_usermode(struct pt_regs *regs)
 {
-   struct thread_info *ti = pt_regs_to_thread_info(regs);
+   struct thread_info *ti = current_thread_info();
u32 cached_flags;
 
if (IS_ENABLED(CONFIG_PROVE_LOCKING) && WARN_ON(!irqs_disabled()))
@@ -247,7 +239,7 @@ static void syscall_slow_exit_work(struct pt_regs *regs, u32 cached_flags)
  */
 __visible inline void syscall_return_slowpath(struct pt_regs *regs)
 {
-   struct thread_info *ti = pt_regs_to_thread_info(regs);
+   struct thread_info *ti = current_thread_info();
u32 cached_flags = READ_ONCE(ti->flags);
 
CT_WARN_ON(ct_state() != CONTEXT_KERNEL);
@@ -270,7 +262,7 @@ __visible inline void syscall_return_slowpath(struct pt_regs *regs)
 #ifdef CONFIG_X86_64
 __visible void do_syscall_64(struct pt_regs *regs)
 {
-   struct thread_info *ti = pt_regs_to_thread_info(regs);
+   struct thread_info *ti = current_thread_info();
unsigned long nr = regs->orig_ax;
 
enter_from_user_mode();
@@ -303,7 +295,7 @@ __visible void do_syscall_64(struct pt_regs *regs)
  */
 static __always_inline void do_syscall_32_irqs_on(struct pt_regs *regs)
 {
-   struct thread_info *ti = pt_regs_to_thread_info(regs);
+   struct thread_info *ti = current_thread_info();
unsigned int nr = (unsigned int)regs->orig_ax;
 
 #ifdef CONFIG_IA32_EMULATION


[tip:x86/urgent] x86/efi: Fix 7-parameter efi_call()s

2016-05-17 Thread tip-bot for Linus Torvalds
Commit-ID:  683ad8092cd262a02d01377dd17a29d492438b90
Gitweb: http://git.kernel.org/tip/683ad8092cd262a02d01377dd17a29d492438b90
Author: Linus Torvalds 
AuthorDate: Mon, 16 May 2016 13:05:45 -0700
Committer:  Ingo Molnar 
CommitDate: Tue, 17 May 2016 08:25:06 +0200

x86/efi: Fix 7-parameter efi_call()s

Alex Thorlton reported that the SGI/UV code crashes in the efi_call()
code when invoked with 7 parameters, due to:

mov (%rsp), %rax
mov 8(%rax), %rax
...
mov %rax, 40(%rsp)

Offset 8 is only true if CONFIG_FRAME_POINTERS is disabled,
with frame pointers enabled it should be 16.

Furthermore, the SAVE_XMM code saves the old stack pointer, but
that's just crazy. It saves the stack pointer *AFTER* we've done
the:

FRAME_BEGIN

... which will have *changed* the stack pointer, depending on whether
stack frames are enabled or not.

So when the code then does:

mov (%rsp), %rax

... we now move that old stack pointer into %rax, but the offset off that
stack pointer will depend on whether that FRAME_BEGIN saved off %rbp
or not.

So that whole 8-vs-16 offset confusion depends on the frame pointer!
If frame pointers were enabled, it will be 16. If they weren't, it
will be 8.

The right fix is to just get rid of that silly conditional frame
pointer thing, and always use frame pointers in this stub function.
And then we don't need that (odd) load to get the old stack
pointer into %rax - we can just use the frame pointer.
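
With the unconditional prologue the layout at the offending load is
fixed. A sketch of the resulting x86-64 frame (arg7 being the seventh
argument, which the caller pushed on the stack before the call):

	pushq	%rbp		# after these two instructions:
	movq	%rsp, %rbp	#   16(%rbp) = arg7
				#    8(%rbp) = return address
				#    0(%rbp) = saved %rbp
	...
	mov	16(%rbp), %rax	# fetch arg7 - same offset no matter
				# what CONFIG_FRAME_POINTERS says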

Reported-by: Alex Thorlton 
Tested-by: Alex Thorlton 
Signed-off-by: Linus Torvalds 
Cc: Alexander Shishkin 
Cc: Andrew Morton 
Cc: Andy Lutomirski 
Cc: Arnaldo Carvalho de Melo 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Denys Vlasenko 
Cc: H. Peter Anvin 
Cc: Jiri Olsa 
Cc: Matt Fleming 
Cc: Peter Zijlstra 
Cc: Peter Zijlstra 
Cc: Stephane Eranian 
Cc: Thomas Gleixner 
Cc: Vince Weaver 
Link: http://lkml.kernel.org/r/ca%2b55afzbs2v%3dwneh83cudg7xkoremfqj30bjwf40dcyjreb...@mail.gmail.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/platform/efi/efi_stub_64.S | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/x86/platform/efi/efi_stub_64.S b/arch/x86/platform/efi/efi_stub_64.S
index 92723ae..cd95075 100644
--- a/arch/x86/platform/efi/efi_stub_64.S
+++ b/arch/x86/platform/efi/efi_stub_64.S
@@ -11,7 +11,6 @@
 #include 
 #include 
 #include 
-#include 
 
 #define SAVE_XMM   \
mov %rsp, %rax; \
@@ -40,10 +39,10 @@
mov (%rsp), %rsp
 
 ENTRY(efi_call)
-   FRAME_BEGIN
+   pushq %rbp
+   movq %rsp, %rbp
SAVE_XMM
-   mov (%rsp), %rax
-   mov 8(%rax), %rax
+   mov 16(%rbp), %rax
subq $48, %rsp
mov %r9, 32(%rsp)
mov %rax, 40(%rsp)
@@ -53,6 +52,6 @@ ENTRY(efi_call)
call *%rdi
addq $48, %rsp
RESTORE_XMM
-   FRAME_END
+   popq %rbp
ret
 ENDPROC(efi_call)


[tip:x86/apic] x86/apic: Add a single-target IPI function to the apic

2015-11-05 Thread tip-bot for Linus Torvalds
Commit-ID:  539da7877275edb21a76aa02fb2c147eff02c559
Gitweb: http://git.kernel.org/tip/539da7877275edb21a76aa02fb2c147eff02c559
Author: Linus Torvalds 
AuthorDate: Wed, 4 Nov 2015 22:57:00 +
Committer:  Thomas Gleixner 
CommitDate: Thu, 5 Nov 2015 13:07:51 +0100

x86/apic: Add a single-target IPI function to the apic

We still fall back on the "send mask" versions if an apic definition
doesn't have the single-target version, but at least this enables the
(trivial) single-target path for the common clustered x2apic case.

Signed-off-by: Linus Torvalds 
Reviewed-by: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Peter Zijlstra 
Cc: Mike Travis 
Cc: Daniel J Blueman 
Link: http://lkml.kernel.org/r/20151104220848.737120...@linutronix.de
Signed-off-by: Thomas Gleixner 
---
 arch/x86/include/asm/apic.h |  1 +
 arch/x86/kernel/smp.c   | 16 ++--
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index a30316b..7f62ad4 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -303,6 +303,7 @@ struct apic {
  unsigned int *apicid);
 
/* ipi */
+   void (*send_IPI)(int cpu, int vector);
void (*send_IPI_mask)(const struct cpumask *mask, int vector);
void (*send_IPI_mask_allbutself)(const struct cpumask *mask,
 int vector);
diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
index 12c8286..1dbf590 100644
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -115,6 +115,18 @@ static atomic_t stopping_cpu = ATOMIC_INIT(-1);
 static bool smp_no_nmi_ipi = false;
 
 /*
+ * Helper wrapper: not all apic definitions support sending to
+ * a single CPU, so we fall back to sending to a mask.
+ */
+static void send_IPI_cpu(int cpu, int vector)
+{
+   if (apic->send_IPI)
+   apic->send_IPI(cpu, vector);
+   else
+   apic->send_IPI_mask(cpumask_of(cpu), vector);
+}
+
+/*
  * this function sends a 'reschedule' IPI to another CPU.
  * it goes straight through and wastes no time serializing
  * anything. Worst case is that we lose a reschedule ...
@@ -125,12 +137,12 @@ static void native_smp_send_reschedule(int cpu)
WARN_ON(1);
return;
}
-   apic->send_IPI_mask(cpumask_of(cpu), RESCHEDULE_VECTOR);
+   send_IPI_cpu(cpu, RESCHEDULE_VECTOR);
 }
 
 void native_send_call_func_single_ipi(int cpu)
 {
-   apic->send_IPI_mask(cpumask_of(cpu), CALL_FUNCTION_SINGLE_VECTOR);
+   send_IPI_cpu(cpu, CALL_FUNCTION_SINGLE_VECTOR);
 }
 
 void native_send_call_func_ipi(const struct cpumask *mask)


[tip:x86/apic] x86/apic: Implement single target IPI function for x2apic_cluster

2015-11-05 Thread tip-bot for Linus Torvalds
Commit-ID:  7b6ce46cb3d096831dea3accacee4717c66abac8
Gitweb: http://git.kernel.org/tip/7b6ce46cb3d096831dea3accacee4717c66abac8
Author: Linus Torvalds 
AuthorDate: Wed, 4 Nov 2015 22:57:00 +
Committer:  Thomas Gleixner 
CommitDate: Thu, 5 Nov 2015 13:07:52 +0100

x86/apic: Implement single target IPI function for x2apic_cluster

[ tglx: Split it out from the patch which provides the new callback ]

Signed-off-by: Linus Torvalds 
Reviewed-by: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Peter Zijlstra 
Cc: Mike Travis 
Cc: Daniel J Blueman 
Link: http://lkml.kernel.org/r/20151104220848.817975...@linutronix.de
Signed-off-by: Thomas Gleixner 
---
 arch/x86/kernel/apic/x2apic_cluster.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/arch/x86/kernel/apic/x2apic_cluster.c b/arch/x86/kernel/apic/x2apic_cluster.c
index cc8311c..aca8b75 100644
--- a/arch/x86/kernel/apic/x2apic_cluster.c
+++ b/arch/x86/kernel/apic/x2apic_cluster.c
@@ -23,6 +23,14 @@ static inline u32 x2apic_cluster(int cpu)
return per_cpu(x86_cpu_to_logical_apicid, cpu) >> 16;
 }
 
+static void x2apic_send_IPI(int cpu, int vector)
+{
+   u32 dest = per_cpu(x86_cpu_to_logical_apicid, cpu);
+
+   x2apic_wrmsr_fence();
+   __x2apic_send_IPI_dest(dest, vector, APIC_DEST_LOGICAL);
+}
+
 static void
 __x2apic_send_IPI_mask(const struct cpumask *mask, int vector, int apic_dest)
 {
@@ -266,6 +274,7 @@ static struct apic apic_x2apic_cluster = {
 
.cpu_mask_to_apicid_and = x2apic_cpu_mask_to_apicid_and,
 
+   .send_IPI   = x2apic_send_IPI,
.send_IPI_mask  = x2apic_send_IPI_mask,
.send_IPI_mask_allbutself   = x2apic_send_IPI_mask_allbutself,
.send_IPI_allbutself= x2apic_send_IPI_allbutself,


[tip:locking/urgent] smp: Fix smp_call_function_single_async() locking

2015-04-18 Thread tip-bot for Linus Torvalds
Commit-ID:  8053871d0f7f67c7efb7f226ef031f78877d6625
Gitweb: http://git.kernel.org/tip/8053871d0f7f67c7efb7f226ef031f78877d6625
Author: Linus Torvalds 
AuthorDate: Wed, 11 Feb 2015 12:42:10 -0800
Committer:  Ingo Molnar 
CommitDate: Fri, 17 Apr 2015 09:57:52 +0200

smp: Fix smp_call_function_single_async() locking

The current smp_call_function code suffers from a number of problems,
most notably that smp_call_function_single_async() is broken.

The problem is that flush_smp_call_function_queue() does csd_unlock()
_after_ calling csd->func(). This means that a caller cannot properly
synchronize the csd usage as it has to.

Change the code to release the csd before calling ->func() for the
async case, and put a WARN_ON_ONCE(csd->flags & CSD_FLAG_LOCK) in
smp_call_function_single_async() to warn us of improper serialization,
because any waiting there can result in deadlocks when called with
IRQs disabled.

Rename the (currently) unused WAIT flag to SYNCHRONOUS and (re)use it
such that we know what to do in flush_smp_call_function_queue().

Rework csd_{,un}lock() to use smp_load_acquire() / smp_store_release()
to avoid some full barriers while more clearly providing lock
semantics.

Finally move the csd maintenance out of generic_exec_single() into its
callers for clearer code.
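
A user-space C11 model of the acquire/release pairing the patch switches
to (a sketch only: the kernel uses smp_load_acquire()/smp_store_release(),
modelled here with <stdatomic.h>):

	#include <stdatomic.h>

	#define CSD_FLAG_LOCK		0x01
	#define CSD_FLAG_SYNCHRONOUS	0x02

	struct csd { _Atomic unsigned int flags; };

	static void csd_lock_wait(struct csd *csd)
	{
		/* acquire: once LOCK is observed clear, everything the
		 * unlocking CPU wrote beforehand is visible to us */
		while (atomic_load_explicit(&csd->flags,
					    memory_order_acquire) & CSD_FLAG_LOCK)
			;	/* cpu_relax() in the kernel */
	}

	static void csd_unlock(struct csd *csd)
	{
		/* release: publishes all prior writes (func, info)
		 * before the flag clears - no full barrier needed */
		atomic_store_explicit(&csd->flags, 0, memory_order_release);
	}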

Signed-off-by: Linus Torvalds 
[ Added changelog. ]
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Frederic Weisbecker 
Cc: Jens Axboe 
Cc: Rafael David Tinoco 
Cc: Thomas Gleixner 
Link: http://lkml.kernel.org/r/ca+55afz492bzlfhdbkn-hygjcreup7cjmeyk3ntsfrwjppz...@mail.gmail.com
Signed-off-by: Ingo Molnar 
---
 kernel/smp.c | 78 
 1 file changed, 47 insertions(+), 31 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index f38a1e6..2aaac2c 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -19,7 +19,7 @@
 
 enum {
CSD_FLAG_LOCK   = 0x01,
-   CSD_FLAG_WAIT   = 0x02,
+   CSD_FLAG_SYNCHRONOUS= 0x02,
 };
 
 struct call_function_data {
@@ -107,7 +107,7 @@ void __init call_function_init(void)
  */
 static void csd_lock_wait(struct call_single_data *csd)
 {
-   while (csd->flags & CSD_FLAG_LOCK)
+   while (smp_load_acquire(&csd->flags) & CSD_FLAG_LOCK)
cpu_relax();
 }
 
@@ -121,19 +121,17 @@ static void csd_lock(struct call_single_data *csd)
 * to ->flags with any subsequent assignments to other
 * fields of the specified call_single_data structure:
 */
-   smp_mb();
+   smp_wmb();
 }
 
 static void csd_unlock(struct call_single_data *csd)
 {
-   WARN_ON((csd->flags & CSD_FLAG_WAIT) && !(csd->flags & CSD_FLAG_LOCK));
+   WARN_ON(!(csd->flags & CSD_FLAG_LOCK));
 
/*
 * ensure we're all done before releasing data:
 */
-   smp_mb();
-
-   csd->flags &= ~CSD_FLAG_LOCK;
+   smp_store_release(&csd->flags, 0);
 }
 
 static DEFINE_PER_CPU_SHARED_ALIGNED(struct call_single_data, csd_data);
@@ -144,13 +142,16 @@ static DEFINE_PER_CPU_SHARED_ALIGNED(struct call_single_data, csd_data);
  * ->func, ->info, and ->flags set.
  */
 static int generic_exec_single(int cpu, struct call_single_data *csd,
-  smp_call_func_t func, void *info, int wait)
+  smp_call_func_t func, void *info)
 {
-   struct call_single_data csd_stack = { .flags = 0 };
-   unsigned long flags;
-
-
if (cpu == smp_processor_id()) {
+   unsigned long flags;
+
+   /*
+* We can unlock early even for the synchronous on-stack case,
+* since we're doing this from the same CPU..
+*/
+   csd_unlock(csd);
local_irq_save(flags);
func(info);
local_irq_restore(flags);
@@ -161,21 +162,9 @@ static int generic_exec_single(int cpu, struct call_single_data *csd,
if ((unsigned)cpu >= nr_cpu_ids || !cpu_online(cpu))
return -ENXIO;
 
-
-   if (!csd) {
-   csd = &csd_stack;
-   if (!wait)
-   csd = this_cpu_ptr(&csd_data);
-   }
-
-   csd_lock(csd);
-
csd->func = func;
csd->info = info;
 
-   if (wait)
-   csd->flags |= CSD_FLAG_WAIT;
-
/*
 * The list addition should be visible before sending the IPI
 * handler locks the list to pull the entry off it because of
@@ -190,9 +179,6 @@ static int generic_exec_single(int cpu, struct call_single_data *csd,
if (llist_add(&csd->llist, &per_cpu(call_single_queue, cpu)))
arch_send_call_function_single_ipi(cpu);
 
-   if (wait)
-   csd_lock_wait(csd);
-
return 0;
 }
 
@@ -250,8 +236,17 @@ static void flush_smp_call_function_queue(bool warn_cpu_offline)
}
 
llist_for_each_entry_safe(csd, csd_next, entry, llist) {
-   csd->func(csd->info);
-   csd_unlock(csd);
+   

[tip:x86/urgent] x86-64, modify_ldt: Make support for 16-bit segments a runtime option

2014-05-14 Thread tip-bot for Linus Torvalds
Commit-ID:  fa81511bb0bbb2b1aace3695ce869da9762624ff
Gitweb: http://git.kernel.org/tip/fa81511bb0bbb2b1aace3695ce869da9762624ff
Author: Linus Torvalds 
AuthorDate: Wed, 14 May 2014 16:33:54 -0700
Committer:  H. Peter Anvin 
CommitDate: Wed, 14 May 2014 16:33:54 -0700

x86-64, modify_ldt: Make support for 16-bit segments a runtime option

Checkin:

b3b42ac2cbae x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels

disabled 16-bit segments on 64-bit kernels due to an information
leak.  However, it does seem that people are genuinely using Wine to
run old 16-bit Windows programs on Linux.

A proper fix for this ("espfix64") is coming in the upcoming merge
window, but as a temporary fix, create a sysctl to allow the
administrator to re-enable support for 16-bit segments.

It adds a "/proc/sys/abi/ldt16" sysctl that defaults to zero (off). If
you hit this issue and care about your old Windows program more than
you care about a kernel stack address information leak, you can do

   echo 1 > /proc/sys/abi/ldt16

as root (add it to your startup scripts), and you should be ok.
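
(A hedged aside, not from the patch: with standard sysctl tooling the
setting can also be made persistent, assuming your distro loads
/etc/sysctl.d at boot; /proc/sys path components map to dots in the key
name.)

   # /etc/sysctl.d/99-ldt16.conf (illustrative file name)
   abi.ldt16 = 1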

The sysctl table is only added if you have COMPAT support enabled on
x86-64, but I assume anybody who runs old Windows binaries very much
does that ;)

Signed-off-by: H. Peter Anvin 
Link: 
http://lkml.kernel.org/r/ca%2b55afw9bpod10u1lfhbomphwzkvjtkmcfcs9s3urpr1yyw...@mail.gmail.com
Cc: 
---
 arch/x86/kernel/ldt.c| 4 +++-
 arch/x86/vdso/vdso32-setup.c | 8 
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/ldt.c b/arch/x86/kernel/ldt.c
index af1d14a..dcbbaa1 100644
--- a/arch/x86/kernel/ldt.c
+++ b/arch/x86/kernel/ldt.c
@@ -20,6 +20,8 @@
 #include <asm/mmu_context.h>
 #include <asm/syscalls.h>
 
+int sysctl_ldt16 = 0;
+
 #ifdef CONFIG_SMP
 static void flush_ldt(void *current_mm)
 {
@@ -234,7 +236,7 @@ static int write_ldt(void __user *ptr, unsigned long bytecount, int oldmode)
 * IRET leaking the high bits of the kernel stack address.
 */
 #ifdef CONFIG_X86_64
-   if (!ldt_info.seg_32bit) {
+   if (!ldt_info.seg_32bit && !sysctl_ldt16) {
error = -EINVAL;
goto out_unlock;
}
diff --git a/arch/x86/vdso/vdso32-setup.c b/arch/x86/vdso/vdso32-setup.c
index 0034898..e1f220e 100644
--- a/arch/x86/vdso/vdso32-setup.c
+++ b/arch/x86/vdso/vdso32-setup.c
@@ -39,6 +39,7 @@
 #ifdef CONFIG_X86_64
 #define vdso_enabled   sysctl_vsyscall32
 #define arch_setup_additional_pagessyscall32_setup_pages
+extern int sysctl_ldt16;
 #endif
 
 /*
@@ -249,6 +250,13 @@ static struct ctl_table abi_table2[] = {
.mode   = 0644,
.proc_handler   = proc_dointvec
},
+   {
+   .procname   = "ldt16",
+   .data   = &sysctl_ldt16,
+   .maxlen = sizeof(int),
+   .mode   = 0644,
+   .proc_handler   = proc_dointvec
+   },
{}
 };
 
--


[tip:x86/urgent] x86, fpu, amd: Clear exceptions in AMD FXSAVE workaround

2014-01-11 Thread tip-bot for Linus Torvalds
Commit-ID:  26bef1318adc1b3a530ecc807ef99346db2aa8b0
Gitweb: http://git.kernel.org/tip/26bef1318adc1b3a530ecc807ef99346db2aa8b0
Author: Linus Torvalds 
AuthorDate: Sat, 11 Jan 2014 19:15:52 -0800
Committer:  H. Peter Anvin 
CommitDate: Sat, 11 Jan 2014 19:15:52 -0800

x86, fpu, amd: Clear exceptions in AMD FXSAVE workaround

Before we do an EMMS in the AMD FXSAVE information leak workaround we
need to clear any pending exceptions, otherwise we trap with a
floating-point exception inside this code.
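
As a hedged aside (a condensed restatement, with a made-up wrapper name):
FNCLEX is a non-waiting instruction that simply discards pending x87
exceptions, while EMMS and FILD fault with #MF if one is still pending,
which is exactly the trap this fixes. The ordering therefore has to be:

	/* Illustrative only; mirrors the new code path in the hunk below. */
	static inline void fxsave_leak_fixup(const int *addr)
	{
		asm volatile(
			"fnclex\n\t"	/* discard pending exceptions (non-waiting) */
			"emms\n\t"	/* clear stack tags; would raise #MF if an
					 * exception were still pending */
			"fildl %P[addr]"/* give FDP/FIP/FOP defined values */
			: : [addr] "m" (*addr));
	}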

Reported-by: halfdog 
Tested-by: Borislav Petkov 
Link: 
http://lkml.kernel.org/r/CA%2B55aFxQnY_PCG_n4=0w-VG=ylxl-yr7omxyy0wu2gcbaf3...@mail.gmail.com
Signed-off-by: H. Peter Anvin 
---
 arch/x86/include/asm/fpu-internal.h | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/fpu-internal.h 
b/arch/x86/include/asm/fpu-internal.h
index c49a613..cea1c76 100644
--- a/arch/x86/include/asm/fpu-internal.h
+++ b/arch/x86/include/asm/fpu-internal.h
@@ -293,12 +293,13 @@ static inline int restore_fpu_checking(struct task_struct *tsk)
/* AMD K7/K8 CPUs don't save/restore FDP/FIP/FOP unless an exception
   is pending.  Clear the x87 state here by setting it to fixed
   values. "m" is a random variable that should be in L1 */
-   alternative_input(
-   ASM_NOP8 ASM_NOP2,
-   "emms\n\t"  /* clear stack tags */
-   "fildl %P[addr]",   /* set F?P to defined value */
-   X86_FEATURE_FXSAVE_LEAK,
-   [addr] "m" (tsk->thread.fpu.has_fpu));
+   if (unlikely(static_cpu_has(X86_FEATURE_FXSAVE_LEAK))) {
+   asm volatile(
+   "fnclex\n\t"
+   "emms\n\t"
+   "fildl %P[addr]"/* set F?P to defined value */
+   : : [addr] "m" (tsk->thread.fpu.has_fpu));
+   }
 
return fpu_restore_checking(&tsk->thread.fpu);
 }
--


[tip:x86/asm] x86: Replace assembly access_ok() with a C variant

2013-12-27 Thread tip-bot for Linus Torvalds
Commit-ID:  c5fe5d80680e2949ffe102180f5fc6cefc0d145f
Gitweb: http://git.kernel.org/tip/c5fe5d80680e2949ffe102180f5fc6cefc0d145f
Author: Linus Torvalds 
AuthorDate: Fri, 27 Dec 2013 15:30:58 -0800
Committer:  H. Peter Anvin 
CommitDate: Fri, 27 Dec 2013 16:58:17 -0800

x86: Replace assembly access_ok() with a C variant

It turns out that the assembly variant doesn't actually produce that
good code, presumably partly because it creates a long dependency
chain with no scheduling, and partly because we cannot get a flags
result out of gcc (which could be fixed with asm goto, but it turns
out not to be worth it.)

The C code allows gcc to schedule and generate multiple (easily
predictable) branches, and as a side benefit we can really optimize
the case where the size is constant.
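
To make the wraparound argument concrete, a hedged userspace sketch of
the variable-size path (function name and sample values are illustrative;
"limit" stands in for current->addr_limit.seg, and unsigned long is
assumed to be 64 bits wide, as on x86-64):

	#include <stdio.h>

	/* Same logic as the new __chk_range_not_ok() variable-size path. */
	static int range_not_ok(unsigned long addr, unsigned long size,
				unsigned long limit)
	{
		addr += size;
		/* (addr < size) is true iff addr + size wrapped past 2^64 */
		return (addr < size) || (addr > limit);
	}

	int main(void)
	{
		unsigned long limit = 0x00007ffffffff000UL;	/* TASK_SIZE-ish */

		printf("%d\n", range_not_ok(0x1000UL, 0x100UL, limit)); /* 0: ok */
		printf("%d\n", range_not_ok(~0UL - 4, 64UL, limit));    /* 1: wrapped */
		return 0;
	}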

Link: 
http://lkml.kernel.org/r/CA%2B55aFzPBdbfKovMT8Edr4SmE2_=%2bokjfac9xw2awegogtk...@mail.gmail.com
Signed-off-by: H. Peter Anvin 
---
 arch/x86/include/asm/uaccess.h | 28 +---
 1 file changed, 17 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 8ec57c0..84ecf1d 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -40,22 +40,28 @@
 /*
  * Test whether a block of memory is a valid user space address.
  * Returns 0 if the range is valid, nonzero otherwise.
- *
- * This is equivalent to the following test:
- * (u33)addr + (u33)size > (u33)current->addr_limit.seg (u65 for x86_64)
- *
- * This needs 33-bit (65-bit for x86_64) arithmetic. We have a carry...
  */
+static inline int __chk_range_not_ok(unsigned long addr, unsigned long size, unsigned long limit)
+{
+   /*
+* If we have used "sizeof()" for the size,
+* we know it won't overflow the limit (but
+* it might overflow the 'addr', so it's
+* important to subtract the size from the
+* limit, not add it to the address).
+*/
+   if (__builtin_constant_p(size))
+   return addr > limit - size;
+
+   /* Arbitrary sizes? Be careful about overflow */
+   addr += size;
+   return (addr < size) || (addr > limit);
+}
 
 #define __range_not_ok(addr, size, limit)  \
 ({ \
-   unsigned long flag, roksum; \
__chk_user_ptr(addr);   \
-   asm("add %3,%1 ; sbb %0,%0 ; cmp %1,%4 ; sbb $0,%0" \
-   : "=" (flag), "=r" (roksum)   \
-   : "1" (addr), "g" ((long)(size)),   \
- "rm" (limit));\
-   flag;   \
+   __chk_range_not_ok((unsigned long __force)(addr), size, limit); \
 })
 
 /**
--