[tip:x86/pti] sched/smt: Make sched_smt_present track topology

2018-11-28 Thread tip-bot for Peter Zijlstra (Intel)
Commit-ID:  c5511d03ec090980732e929c318a7a6374b5550e
Gitweb: https://git.kernel.org/tip/c5511d03ec090980732e929c318a7a6374b5550e
Author: Peter Zijlstra (Intel) 
AuthorDate: Sun, 25 Nov 2018 19:33:36 +0100
Committer:  Thomas Gleixner 
CommitDate: Wed, 28 Nov 2018 11:57:06 +0100

sched/smt: Make sched_smt_present track topology

Currently the 'sched_smt_present' static key is enabled when SMT topology is
observed at CPU bringup, but it is never disabled. However, there is demand
to also disable the key when the topology changes such that there is no SMT
present anymore.

Implement this by making the key count the number of cores that have SMT
enabled.

In particular, the SMT topology bits are set before interrupts are enabled
and, similarly, are cleared after interrupts are disabled for the last time
and the CPU dies.

Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Thomas Gleixner 
Reviewed-by: Ingo Molnar 
Cc: Andy Lutomirski 
Cc: Linus Torvalds 
Cc: Jiri Kosina 
Cc: Tom Lendacky 
Cc: Josh Poimboeuf 
Cc: Andrea Arcangeli 
Cc: David Woodhouse 
Cc: Tim Chen 
Cc: Andi Kleen 
Cc: Dave Hansen 
Cc: Casey Schaufler 
Cc: Asit Mallick 
Cc: Arjan van de Ven 
Cc: Jon Masters 
Cc: Waiman Long 
Cc: Greg KH 
Cc: Dave Stewart 
Cc: Kees Cook 
Cc: sta...@vger.kernel.org
Link: https://lkml.kernel.org/r/20181125185004.246110...@linutronix.de


---
 kernel/sched/core.c | 19 +++
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 091e089063be..6fedf3a98581 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5738,15 +5738,10 @@ int sched_cpu_activate(unsigned int cpu)
 
 #ifdef CONFIG_SCHED_SMT
/*
-* The sched_smt_present static key needs to be evaluated on every
-* hotplug event because at boot time SMT might be disabled when
-* the number of booted CPUs is limited.
-*
-* If then later a sibling gets hotplugged, then the key would stay
-* off and SMT scheduling would never be functional.
+* When going up, increment the number of cores with SMT present.
 */
-   if (cpumask_weight(cpu_smt_mask(cpu)) > 1)
-   static_branch_enable_cpuslocked(&sched_smt_present);
+   if (cpumask_weight(cpu_smt_mask(cpu)) == 2)
+   static_branch_inc_cpuslocked(&sched_smt_present);
 #endif
set_cpu_active(cpu, true);
 
@@ -5790,6 +5785,14 @@ int sched_cpu_deactivate(unsigned int cpu)
 */
synchronize_rcu_mult(call_rcu, call_rcu_sched);
 
+#ifdef CONFIG_SCHED_SMT
+   /*
+* When going down, decrement the number of cores with SMT present.
+*/
+   if (cpumask_weight(cpu_smt_mask(cpu)) == 2)
+   static_branch_dec_cpuslocked(&sched_smt_present);
+#endif
+
if (!sched_smp_initialized)
return 0;
 

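A minimal userspace sketch of the counting semantics introduced above: the
static key is modeled as a plain reference count, and bring_up_sibling() /
take_down_sibling() are made-up stand-ins for the hotplug callbacks, so this
is illustrative only, not kernel code.

#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

static int smt_core_count;                /* stand-in for the static key */

static bool sched_smt_active(void) { return smt_core_count > 0; }

/* A CPU comes up and its core now has a second sibling. */
static void bring_up_sibling(void)  { smt_core_count++; }
/* A CPU goes down and its core drops back to one sibling. */
static void take_down_sibling(void) { smt_core_count--; }

int main(void)
{
    bring_up_sibling();              /* core 0 gains a sibling */
    bring_up_sibling();              /* core 1 gains a sibling */
    assert(sched_smt_active());

    take_down_sibling();             /* core 1 loses its sibling */
    assert(sched_smt_active());      /* core 0 still has SMT */

    take_down_sibling();             /* the last SMT core goes away */
    assert(!sched_smt_active());     /* a one-shot boolean key could not report this */

    printf("counting model OK\n");
    return 0;
}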

[tip:x86/boot] x86/kaslr, ACPI/NUMA: Fix KASLR build error

2018-10-09 Thread tip-bot for Peter Zijlstra (Intel)
Commit-ID:  9d94e8b1d4f94a3c4cee5ad11a1be460cd070839
Gitweb: https://git.kernel.org/tip/9d94e8b1d4f94a3c4cee5ad11a1be460cd070839
Author: Peter Zijlstra (Intel) 
AuthorDate: Wed, 3 Oct 2018 14:41:27 +0200
Committer:  Borislav Petkov 
CommitDate: Tue, 9 Oct 2018 12:30:25 +0200

x86/kaslr, ACPI/NUMA: Fix KASLR build error

There is no point in trying to compile KASLR-specific code when there is
no KASLR.

 [ bp: Move the whole crap into kaslr.c and make
   rand_mem_physical_padding static. Make kaslr_check_padding()
   weak to avoid build breakage on other architectures. ]

Reported-by: Naresh Kamboju 
Reported-by: Mark Brown 
Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Borislav Petkov 
Cc: 
Cc: 
Cc: 
Cc: 
Cc: 
Cc: 
Link: 
http://lkml.kernel.org/r/20181003123402.ga15...@hirez.programming.kicks-ass.net
---
 arch/x86/include/asm/setup.h |  2 --
 arch/x86/mm/kaslr.c  | 19 ++-
 drivers/acpi/numa.c  | 17 +
 3 files changed, 23 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/setup.h b/arch/x86/include/asm/setup.h
index 65a5bf8f6aba..ae13bc974416 100644
--- a/arch/x86/include/asm/setup.h
+++ b/arch/x86/include/asm/setup.h
@@ -80,8 +80,6 @@ static inline unsigned long kaslr_offset(void)
return (unsigned long)&_text - __START_KERNEL;
 }
 
-extern int rand_mem_physical_padding;
-
 /*
  * Do NOT EVER look at the BIOS memory size location.
  * It does not work on many machines.
diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
index 00cf4cae38f5..b3471388288d 100644
--- a/arch/x86/mm/kaslr.c
+++ b/arch/x86/mm/kaslr.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -40,7 +41,7 @@
  */
 static const unsigned long vaddr_end = CPU_ENTRY_AREA_BASE;
 
-int __initdata rand_mem_physical_padding = CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING;
+static int __initdata rand_mem_physical_padding = CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING;
 /*
  * Memory regions randomized by KASLR (except modules that use a separate logic
  * earlier during boot). The list is ordered based on virtual addresses. This
@@ -70,6 +71,22 @@ static inline bool kaslr_memory_enabled(void)
return kaslr_enabled() && !IS_ENABLED(CONFIG_KASAN);
 }
 
+/*
+ * Check the padding size for KASLR is enough.
+ */
+void __init kaslr_check_padding(void)
+{
+   u64 max_possible_phys, max_actual_phys, threshold;
+
+   max_actual_phys = roundup(PFN_PHYS(max_pfn), 1ULL << 40);
+   max_possible_phys = roundup(PFN_PHYS(max_possible_pfn), 1ULL << 40);
+   threshold = max_actual_phys + ((u64)rand_mem_physical_padding << 40);
+
+   if (max_possible_phys > threshold)
+   pr_warn("Set 'rand_mem_physical_padding=%llu' to avoid memory hotadd failure.\n",
+   (max_possible_phys - max_actual_phys) >> 40);
+}
+
 static int __init rand_mem_physical_padding_setup(char *str)
 {
int max_padding = (1 << (MAX_PHYSMEM_BITS - TB_SHIFT)) - 1;
diff --git a/drivers/acpi/numa.c b/drivers/acpi/numa.c
index 3d69834c692f..ba62004f4d86 100644
--- a/drivers/acpi/numa.c
+++ b/drivers/acpi/numa.c
@@ -32,7 +32,6 @@
 #include 
 #include 
 #include 
-#include 
 
 static nodemask_t nodes_found_map = NODE_MASK_NONE;
 
@@ -433,10 +432,12 @@ acpi_table_parse_srat(enum acpi_srat_type id,
handler, max_entries);
 }
 
+/* To be overridden by architectures */
+void __init __weak kaslr_check_padding(void) { }
+
 int __init acpi_numa_init(void)
 {
int cnt = 0;
-   u64 max_possible_phys, max_actual_phys, threshold;
 
if (acpi_disabled)
return -EINVAL;
@@ -466,17 +467,9 @@ int __init acpi_numa_init(void)
cnt = acpi_table_parse_srat(ACPI_SRAT_TYPE_MEMORY_AFFINITY,
acpi_parse_memory_affinity, 0);
 
-   /* check the padding size for KASLR is enough. */
-   if (parsed_numa_memblks && kaslr_enabled()) {
-   max_actual_phys = roundup(PFN_PHYS(max_pfn), 1ULL << 40);
-   max_possible_phys = roundup(PFN_PHYS(max_possible_pfn), 1ULL << 40);
-   threshold = max_actual_phys + ((u64)rand_mem_physical_padding << 40);
+   if (parsed_numa_memblks)
+   kaslr_check_padding();
 
-   if (max_possible_phys > threshold) {
-   pr_warn("Set 'rand_mem_physical_padding=%llu' to avoid memory hotadd failure.\n",
- (max_possible_phys - max_actual_phys) >> 40);
-   }
-   }
}
 
/* SLIT: System Locality Information Table */

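The __weak stub added to drivers/acpi/numa.c relies on the standard
weak-symbol override pattern: generic code provides a weak no-op default and
an architecture links in a strong definition of the same name. A small
standalone sketch of that pattern (GCC/Clang on ELF targets assumed; the
printf body is made up for illustration):

#include <stdio.h>

/* Generic side: weak no-op default, analogous to the stub in drivers/acpi/numa.c. */
__attribute__((weak)) void kaslr_check_padding(void)
{
    printf("default (no-op) kaslr_check_padding\n");
}

int main(void)
{
    /*
     * With only this file linked, the weak default runs.  Linking in a
     * second object that defines a non-weak kaslr_check_padding() makes
     * the linker pick that one instead, which is how the x86 version
     * overrides the ACPI stub.
     */
    kaslr_check_padding();
    return 0;
}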

[tip:x86/boot] x86/kaslr, ACPI/NUMA: Fix KASLR build error

2018-10-03 Thread tip-bot for Peter Zijlstra (Intel)
Commit-ID:  3a387c6d96e69f1710a3804eb68e1253263298f2
Gitweb: https://git.kernel.org/tip/3a387c6d96e69f1710a3804eb68e1253263298f2
Author: Peter Zijlstra (Intel) 
AuthorDate: Wed, 3 Oct 2018 14:41:27 +0200
Committer:  Borislav Petkov 
CommitDate: Wed, 3 Oct 2018 16:15:49 +0200

x86/kaslr, ACPI/NUMA: Fix KASLR build error

There is no point in trying to compile KASLR-specific code when there is
no KASLR.

 [ bp: Move the whole crap into kaslr.c and make
   rand_mem_physical_padding static. ]

Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Borislav Petkov 
Cc: 
Cc: 
Cc: 
Cc: 
Cc: 
Cc: 
Link: 
http://lkml.kernel.org/r/20181003123402.ga15...@hirez.programming.kicks-ass.net
---
 arch/x86/include/asm/kaslr.h |  2 ++
 arch/x86/include/asm/setup.h |  2 --
 arch/x86/mm/kaslr.c  | 19 ++-
 drivers/acpi/numa.c  | 15 +++
 4 files changed, 23 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/kaslr.h b/arch/x86/include/asm/kaslr.h
index db7ba2feb947..95ef3fc01d12 100644
--- a/arch/x86/include/asm/kaslr.h
+++ b/arch/x86/include/asm/kaslr.h
@@ -6,8 +6,10 @@ unsigned long kaslr_get_random_long(const char *purpose);
 
 #ifdef CONFIG_RANDOMIZE_MEMORY
 void kernel_randomize_memory(void);
+void kaslr_check_padding(void);
 #else
 static inline void kernel_randomize_memory(void) { }
+static inline void kaslr_check_padding(void) { }
 #endif /* CONFIG_RANDOMIZE_MEMORY */
 
 #endif
diff --git a/arch/x86/include/asm/setup.h b/arch/x86/include/asm/setup.h
index 65a5bf8f6aba..ae13bc974416 100644
--- a/arch/x86/include/asm/setup.h
+++ b/arch/x86/include/asm/setup.h
@@ -80,8 +80,6 @@ static inline unsigned long kaslr_offset(void)
return (unsigned long)&_text - __START_KERNEL;
 }
 
-extern int rand_mem_physical_padding;
-
 /*
  * Do NOT EVER look at the BIOS memory size location.
  * It does not work on many machines.
diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
index 00cf4cae38f5..b3471388288d 100644
--- a/arch/x86/mm/kaslr.c
+++ b/arch/x86/mm/kaslr.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -40,7 +41,7 @@
  */
 static const unsigned long vaddr_end = CPU_ENTRY_AREA_BASE;
 
-int __initdata rand_mem_physical_padding = CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING;
+static int __initdata rand_mem_physical_padding = CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING;
 /*
  * Memory regions randomized by KASLR (except modules that use a separate logic
  * earlier during boot). The list is ordered based on virtual addresses. This
@@ -70,6 +71,22 @@ static inline bool kaslr_memory_enabled(void)
return kaslr_enabled() && !IS_ENABLED(CONFIG_KASAN);
 }
 
+/*
+ * Check the padding size for KASLR is enough.
+ */
+void __init kaslr_check_padding(void)
+{
+   u64 max_possible_phys, max_actual_phys, threshold;
+
+   max_actual_phys = roundup(PFN_PHYS(max_pfn), 1ULL << 40);
+   max_possible_phys = roundup(PFN_PHYS(max_possible_pfn), 1ULL << 40);
+   threshold = max_actual_phys + ((u64)rand_mem_physical_padding << 40);
+
+   if (max_possible_phys > threshold)
+   pr_warn("Set 'rand_mem_physical_padding=%llu' to avoid memory hotadd failure.\n",
+   (max_possible_phys - max_actual_phys) >> 40);
+}
+
 static int __init rand_mem_physical_padding_setup(char *str)
 {
int max_padding = (1 << (MAX_PHYSMEM_BITS - TB_SHIFT)) - 1;
diff --git a/drivers/acpi/numa.c b/drivers/acpi/numa.c
index 3d69834c692f..4408e37600ef 100644
--- a/drivers/acpi/numa.c
+++ b/drivers/acpi/numa.c
@@ -32,7 +32,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 
 static nodemask_t nodes_found_map = NODE_MASK_NONE;
 
@@ -436,7 +436,6 @@ acpi_table_parse_srat(enum acpi_srat_type id,
 int __init acpi_numa_init(void)
 {
int cnt = 0;
-   u64 max_possible_phys, max_actual_phys, threshold;
 
if (acpi_disabled)
return -EINVAL;
@@ -466,17 +465,9 @@ int __init acpi_numa_init(void)
cnt = acpi_table_parse_srat(ACPI_SRAT_TYPE_MEMORY_AFFINITY,
acpi_parse_memory_affinity, 0);
 
-   /* check the padding size for KASLR is enough. */
-   if (parsed_numa_memblks && kaslr_enabled()) {
-   max_actual_phys = roundup(PFN_PHYS(max_pfn), 1ULL << 40);
-   max_possible_phys = roundup(PFN_PHYS(max_possible_pfn), 1ULL << 40);
-   threshold = max_actual_phys + ((u64)rand_mem_physical_padding << 40);
+   if (parsed_numa_memblks)
+   kaslr_check_padding();
 
-   if (max_possible_phys > threshold) {
-   pr_warn("Set 'rand_mem_physical_padding=%llu' to avoid memory hotadd failure.\n",
- (max_possible_phys - max_actual_phys) >> 40);
-   }
-   }
}
 
/* SLIT: System Locality 

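The padding check moved into kaslr_check_padding() boils down to TiB-granular
arithmetic: round both the currently present and the maximally hot-addable
physical memory up to 1 TiB, and warn if the possible maximum exceeds the
present maximum plus the configured padding. A standalone model of that
arithmetic, with made-up sample values:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define TB_SHIFT 40

static uint64_t roundup_tb(uint64_t x)
{
    return (x + (1ULL << TB_SHIFT) - 1) & ~((1ULL << TB_SHIFT) - 1);
}

int main(void)
{
    uint64_t present_phys  = 1536ULL << 30;  /* 1.5 TiB populated now (sample value) */
    uint64_t possible_phys = 4096ULL << 30;  /* 4 TiB reachable via hot-add (sample) */
    uint64_t padding_tb    = 1;              /* rand_mem_physical_padding, in TiB */

    uint64_t max_actual   = roundup_tb(present_phys);
    uint64_t max_possible = roundup_tb(possible_phys);
    uint64_t threshold    = max_actual + (padding_tb << TB_SHIFT);

    if (max_possible > threshold)
        printf("Set 'rand_mem_physical_padding=%" PRIu64 "' to avoid memory hotadd failure.\n",
               (max_possible - max_actual) >> TB_SHIFT);
    else
        printf("padding of %" PRIu64 " TiB is sufficient\n", padding_tb);
    return 0;
}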
[tip:smp/hotplug] perf: Avoid cpu_hotplug_lock r-r recursion

2017-04-20 Thread tip-bot for Peter Zijlstra (Intel)
Commit-ID:  641693094ee1568502280f95900f374b2226b51d
Gitweb: http://git.kernel.org/tip/641693094ee1568502280f95900f374b2226b51d
Author: Peter Zijlstra (Intel) 
AuthorDate: Tue, 18 Apr 2017 19:05:05 +0200
Committer:  Thomas Gleixner 
CommitDate: Thu, 20 Apr 2017 13:08:57 +0200

perf: Avoid cpu_hotplug_lock r-r recursion

There are two call-sites where using static_key results in recursing on the
cpu_hotplug_lock.

Use the hotplug locked version of static_key_slow_inc().

Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Thomas Gleixner 
Cc: Sebastian Siewior 
Cc: Steven Rostedt 
Cc: jba...@akamai.com
Link: http://lkml.kernel.org/r/20170418103422.687248...@infradead.org

---
 kernel/events/core.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 634dd95..8aa3063 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7653,7 +7653,7 @@ static int perf_swevent_init(struct perf_event *event)
if (err)
return err;
 
-   static_key_slow_inc(&perf_swevent_enabled[event_id]);
+   static_key_slow_inc_cpuslocked(&perf_swevent_enabled[event_id]);
event->destroy = sw_perf_event_destroy;
}
 
@@ -9160,7 +9160,7 @@ static void account_event(struct perf_event *event)
 
mutex_lock(&perf_sched_mutex);
if (!atomic_read(&perf_sched_count)) {
-   static_branch_enable(&perf_sched_events);
+   static_key_slow_inc_cpuslocked(&perf_sched_events.key);
/*
 * Guarantee that all CPUs observe they key change and
 * call the perf scheduling hooks before proceeding to

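The convention the patch switches to can be modeled in userspace: a
non-recursive lock stands in for cpu_hotplug_lock, and callers that already
hold it must use a *_locked/_cpuslocked variant instead of re-taking the
lock. A hedged sketch with illustrative names (key_inc(), key_inc_locked()):

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t hotplug_lock = PTHREAD_MUTEX_INITIALIZER;
static int key_count;

/* Does the real work; the caller must hold hotplug_lock. */
static void key_inc_locked(void)
{
    key_count++;
}

/* Convenience wrapper for callers that do not hold the lock. */
static void key_inc(void)
{
    pthread_mutex_lock(&hotplug_lock);
    key_inc_locked();
    pthread_mutex_unlock(&hotplug_lock);
}

int main(void)
{
    key_inc();                        /* lock not held: use the locking variant */

    pthread_mutex_lock(&hotplug_lock);
    key_inc_locked();                 /* lock already held: use the locked variant */
    /* Calling key_inc() here would self-deadlock on the non-recursive mutex,
     * which is the recursion the patch avoids. */
    pthread_mutex_unlock(&hotplug_lock);

    printf("key_count = %d\n", key_count);
    return 0;
}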

[tip:smp/hotplug] jump_label: Provide static_key_slow_inc_cpuslocked()

2017-04-20 Thread tip-bot for Peter Zijlstra (Intel)
Commit-ID:  f5efc6fad63f5533a6083e95286920d5753e52bf
Gitweb: http://git.kernel.org/tip/f5efc6fad63f5533a6083e95286920d5753e52bf
Author: Peter Zijlstra (Intel) 
AuthorDate: Tue, 18 Apr 2017 19:05:04 +0200
Committer:  Thomas Gleixner 
CommitDate: Thu, 20 Apr 2017 13:08:57 +0200

jump_label: Provide static_key_slow_inc_cpuslocked()

Provide static_key_slow_inc_cpuslocked(), a variant that doesn't take
cpu_hotplug_lock().

Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Thomas Gleixner 
Cc: Sebastian Siewior 
Cc: Steven Rostedt 
Cc: jba...@akamai.com
Link: http://lkml.kernel.org/r/20170418103422.636958...@infradead.org

---
 include/linux/jump_label.h |  3 +++
 kernel/jump_label.c| 21 +
 2 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h
index 2afd74b..7d07f0b 100644
--- a/include/linux/jump_label.h
+++ b/include/linux/jump_label.h
extern void arch_jump_label_transform_static(struct jump_entry *entry,
 enum jump_label_type type);
 extern int jump_label_text_reserved(void *start, void *end);
 extern void static_key_slow_inc(struct static_key *key);
+extern void static_key_slow_inc_cpuslocked(struct static_key *key);
 extern void static_key_slow_dec(struct static_key *key);
 extern void jump_label_apply_nops(struct module *mod);
 extern int static_key_count(struct static_key *key);
@@ -213,6 +214,8 @@ static inline void static_key_slow_inc(struct static_key *key)
atomic_inc(&key->enabled);
 }
 
+#define static_key_slow_inc_cpuslocked static_key_slow_inc
+
 static inline void static_key_slow_dec(struct static_key *key)
 {
STATIC_KEY_CHECK_USE();
diff --git a/kernel/jump_label.c b/kernel/jump_label.c
index f3afe07..308b12e 100644
--- a/kernel/jump_label.c
+++ b/kernel/jump_label.c
@@ -101,7 +101,7 @@ void static_key_disable(struct static_key *key)
 }
 EXPORT_SYMBOL_GPL(static_key_disable);
 
-void static_key_slow_inc(struct static_key *key)
+void __static_key_slow_inc(struct static_key *key)
 {
int v, v1;
 
@@ -130,7 +130,6 @@ void static_key_slow_inc(struct static_key *key)
 * the all CPUs, for that to be serialized against CPU hot-plug
 * we need to avoid CPUs coming online.
 */
-   get_online_cpus();
jump_label_lock();
if (atomic_read(&key->enabled) == 0) {
atomic_set(&key->enabled, -1);
@@ -140,10 +139,22 @@ void static_key_slow_inc(struct static_key *key)
atomic_inc(&key->enabled);
}
jump_label_unlock();
+}
+
+void static_key_slow_inc(struct static_key *key)
+{
+   get_online_cpus();
+   __static_key_slow_inc(key);
put_online_cpus();
 }
 EXPORT_SYMBOL_GPL(static_key_slow_inc);
 
+void static_key_slow_inc_cpuslocked(struct static_key *key)
+{
+   __static_key_slow_inc(key);
+}
+EXPORT_SYMBOL_GPL(static_key_slow_inc_cpuslocked);
+
 static void __static_key_slow_dec(struct static_key *key,
unsigned long rate_limit, struct delayed_work *work)
 {
@@ -154,7 +165,6 @@ static void __static_key_slow_dec(struct static_key *key,
 * returns is unbalanced, because all other static_key_slow_inc()
 * instances block while the update is in progress.
 */
-   get_online_cpus();
if (!atomic_dec_and_mutex_lock(&key->enabled, &jump_label_mutex)) {
WARN(atomic_read(&key->enabled) < 0,
 "jump label: negative count!\n");
@@ -168,20 +178,23 @@ static void __static_key_slow_dec(struct static_key *key,
jump_label_update(key);
}
jump_label_unlock();
-   put_online_cpus();
 }
 
 static void jump_label_update_timeout(struct work_struct *work)
 {
struct static_key_deferred *key =
container_of(work, struct static_key_deferred, work.work);
+   get_online_cpus();
__static_key_slow_dec(&key->key, 0, NULL);
+   put_online_cpus();
 }
 
 void static_key_slow_dec(struct static_key *key)
 {
STATIC_KEY_CHECK_USE();
+   get_online_cpus();
__static_key_slow_dec(key, 0, NULL);
+   put_online_cpus();
 }
 EXPORT_SYMBOL_GPL(static_key_slow_dec);
 

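The enable path that static_key_slow_inc_cpuslocked() now shares via
__static_key_slow_inc() follows a first-enabler-does-the-work pattern. A
simplified single-file model of that shape (races on the 0 -> -1 transition
are ignored here; the real code resolves them with cmpxchg under
jump_label_lock()):

#include <stdatomic.h>
#include <stdio.h>

static atomic_int enabled;   /* 0: off, -1: enabling in progress, >0: reference count */

static void jump_label_update_model(void)
{
    printf("patching code sites (expensive, done once)\n");
}

static void static_key_model_inc(void)
{
    if (atomic_load(&enabled) == 0) {
        atomic_store(&enabled, -1);       /* claim the one-time enable work */
        jump_label_update_model();
        atomic_store(&enabled, 1);
    } else {
        atomic_fetch_add(&enabled, 1);    /* already on: just bump the count */
    }
}

int main(void)
{
    static_key_model_inc();   /* patches */
    static_key_model_inc();   /* only increments */
    printf("enabled = %d\n", atomic_load(&enabled));
    return 0;
}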

[tip:smp/hotplug] jump_label: Pull get_online_cpus() into generic code

2017-04-20 Thread tip-bot for Peter Zijlstra (Intel)
Commit-ID:  82947f31231157d8ab70fa8961f23fd3887a3327
Gitweb: http://git.kernel.org/tip/82947f31231157d8ab70fa8961f23fd3887a3327
Author: Peter Zijlstra (Intel) 
AuthorDate: Tue, 18 Apr 2017 19:05:03 +0200
Committer:  Thomas Gleixner 
CommitDate: Thu, 20 Apr 2017 13:08:57 +0200

jump_label: Pull get_online_cpus() into generic code

This change does two things:

- it moves the get_online_cpus() call into generic code, with the aim of
  later providing some static_key ops that avoid it.

- as a side effect it inverts the lock order between cpu_hotplug_lock and
  jump_label_mutex.

Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Thomas Gleixner 
Cc: Sebastian Siewior 
Cc: Steven Rostedt 
Cc: jba...@akamai.com
Link: http://lkml.kernel.org/r/20170418103422.590118...@infradead.org

---
 arch/mips/kernel/jump_label.c  |  2 --
 arch/sparc/kernel/jump_label.c |  2 --
 arch/tile/kernel/jump_label.c  |  2 --
 arch/x86/kernel/jump_label.c   |  2 --
 kernel/jump_label.c| 14 ++
 5 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/arch/mips/kernel/jump_label.c b/arch/mips/kernel/jump_label.c
index 3e586da..32e3168 100644
--- a/arch/mips/kernel/jump_label.c
+++ b/arch/mips/kernel/jump_label.c
@@ -58,7 +58,6 @@ void arch_jump_label_transform(struct jump_entry *e,
insn.word = 0; /* nop */
}
 
-   get_online_cpus();
mutex_lock(&text_mutex);
if (IS_ENABLED(CONFIG_CPU_MICROMIPS)) {
insn_p->halfword[0] = insn.word >> 16;
@@ -70,7 +69,6 @@ void arch_jump_label_transform(struct jump_entry *e,
   (unsigned long)insn_p + sizeof(*insn_p));
 
mutex_unlock(&text_mutex);
-   put_online_cpus();
 }
 
 #endif /* HAVE_JUMP_LABEL */
diff --git a/arch/sparc/kernel/jump_label.c b/arch/sparc/kernel/jump_label.c
index 07933b9..93adde1 100644
--- a/arch/sparc/kernel/jump_label.c
+++ b/arch/sparc/kernel/jump_label.c
@@ -41,12 +41,10 @@ void arch_jump_label_transform(struct jump_entry *entry,
val = 0x0100;
}
 
-   get_online_cpus();
mutex_lock(&text_mutex);
*insn = val;
flushi(insn);
mutex_unlock(&text_mutex);
-   put_online_cpus();
 }
 
 #endif
diff --git a/arch/tile/kernel/jump_label.c b/arch/tile/kernel/jump_label.c
index 07802d5..93931a4 100644
--- a/arch/tile/kernel/jump_label.c
+++ b/arch/tile/kernel/jump_label.c
@@ -45,14 +45,12 @@ static void __jump_label_transform(struct jump_entry *e,
 void arch_jump_label_transform(struct jump_entry *e,
enum jump_label_type type)
 {
-   get_online_cpus();
mutex_lock(&text_mutex);
 
__jump_label_transform(e, type);
flush_icache_range(e->code, e->code + sizeof(tilegx_bundle_bits));
 
mutex_unlock(&text_mutex);
-   put_online_cpus();
 }
 
 __init_or_module void arch_jump_label_transform_static(struct jump_entry *e,
diff --git a/arch/x86/kernel/jump_label.c b/arch/x86/kernel/jump_label.c
index c37bd0f..ab4f491 100644
--- a/arch/x86/kernel/jump_label.c
+++ b/arch/x86/kernel/jump_label.c
@@ -105,11 +105,9 @@ static void __jump_label_transform(struct jump_entry *entry,
 void arch_jump_label_transform(struct jump_entry *entry,
   enum jump_label_type type)
 {
-   get_online_cpus();
mutex_lock(&text_mutex);
__jump_label_transform(entry, type, NULL, 0);
mutex_unlock(&text_mutex);
-   put_online_cpus();
 }
 
 static enum {
diff --git a/kernel/jump_label.c b/kernel/jump_label.c
index 6c9cb20..f3afe07 100644
--- a/kernel/jump_label.c
+++ b/kernel/jump_label.c
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifdef HAVE_JUMP_LABEL
 
@@ -124,6 +125,12 @@ void static_key_slow_inc(struct static_key *key)
return;
}
 
+   /*
+* A number of architectures need to synchronize I$ across
+* the all CPUs, for that to be serialized against CPU hot-plug
+* we need to avoid CPUs coming online.
+*/
+   get_online_cpus();
jump_label_lock();
if (atomic_read(&key->enabled) == 0) {
atomic_set(&key->enabled, -1);
@@ -133,6 +140,7 @@ void static_key_slow_inc(struct static_key *key)
atomic_inc(&key->enabled);
}
jump_label_unlock();
+   put_online_cpus();
 }
 EXPORT_SYMBOL_GPL(static_key_slow_inc);
 
@@ -146,6 +154,7 @@ static void __static_key_slow_dec(struct static_key *key,
 * returns is unbalanced, because all other static_key_slow_inc()
 * instances block while the update is in progress.
 */
+   get_online_cpus();
if (!atomic_dec_and_mutex_lock(&key->enabled, &jump_label_mutex)) {
WARN(atomic_read(&key->enabled) < 0,
 "jump label: negative count!\n");
@@ -159,6 +168,7 @@ static void __static_key_slow_dec(struct static_key *key,
jump_label_update(key);
}
jump_label_unlock();
+   put_online_cpus();
 }
 
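The lock-order inversion mentioned above means that after this change every
path takes cpu_hotplug_lock before jump_label_mutex. A small sketch of that
invariant with two pthread mutexes standing in for the real locks (names and
body are illustrative):

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t cpu_hotplug_lock = PTHREAD_MUTEX_INITIALIZER;   /* A */
static pthread_mutex_t jump_label_mutex = PTHREAD_MUTEX_INITIALIZER;   /* B */

static void static_key_slow_inc_model(void)
{
    pthread_mutex_lock(&cpu_hotplug_lock);   /* A first... */
    pthread_mutex_lock(&jump_label_mutex);   /* ...then B */
    printf("update key under both locks\n");
    pthread_mutex_unlock(&jump_label_mutex);
    pthread_mutex_unlock(&cpu_hotplug_lock);
}

int main(void)
{
    /* Every caller follows the same A -> B order, which rules out an
     * ABBA deadlock between the two locks. */
    static_key_slow_inc_model();
    return 0;
}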
 

[tip:perf/core] perf annotate: Add number of samples to the header

2016-07-01 Thread tip-bot for Peter Zijlstra (Intel)
Commit-ID:  135cce1bf12bd30d7d66360022f9dac6ea3a07cd
Gitweb: http://git.kernel.org/tip/135cce1bf12bd30d7d66360022f9dac6ea3a07cd
Author: Peter Zijlstra (Intel) 
AuthorDate: Thu, 30 Jun 2016 10:29:55 +0200
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Thu, 30 Jun 2016 18:27:42 -0300

perf annotate: Add number of samples to the header

Staring at annotations of large functions is useless if there's only a
few samples in them. Report the number of samples in the header to make
this easier to determine.

Committer note:

The change amounts to:

  - Percent | Source code & Disassembly of perf-vdso.so for cycles:u
  ------------------------------------------------------------------
  + Percent | Source code & Disassembly of perf-vdso.so for cycles:u (3278 samples)
  --------------------------------------------------------------------------------

Signed-off-by: Peter Zijlstra (Intel) 
Cc: Jiri Olsa 
Link: 
http://lkml.kernel.org/r/20160630082955.ga30...@twins.programming.kicks-ass.net
[ split from a larger patch ]
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/util/annotate.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 78e5d6f..e9825fe 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -1522,6 +1522,7 @@ int symbol__annotate_printf(struct symbol *sym, struct map *map,
const char *d_filename;
const char *evsel_name = perf_evsel__name(evsel);
struct annotation *notes = symbol__annotation(sym);
+   struct sym_hist *h = annotation__histogram(notes, evsel->idx);
struct disasm_line *pos, *queue = NULL;
u64 start = map__rip_2objdump(map, sym->start);
int printed = 2, queue_len = 0;
@@ -1544,8 +1545,8 @@ int symbol__annotate_printf(struct symbol *sym, struct map *map,
if (perf_evsel__is_group_event(evsel))
width *= evsel->nr_members;
 
-   graph_dotted_len = printf(" %-*.*s| Source code & Disassembly of %s for %s\n",
-  width, width, "Percent", d_filename, evsel_name);
+   graph_dotted_len = printf(" %-*.*s| Source code & Disassembly of %s for %s (%" PRIu64 " samples)\n",
+  width, width, "Percent", d_filename, evsel_name, h->sum);
 
printf("%-*.*s\n",
   graph_dotted_len, graph_dotted_len, graph_dotted_line);

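The new header prints the u64 sample count with the PRIu64 format macro; a
minimal standalone illustration of that idiom, with a made-up sample count
playing the role of h->sum:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint64_t nr_samples = 3278;   /* stands in for h->sum in the patch */

    printf(" Percent | Source code & Disassembly of perf-vdso.so for cycles:u"
           " (%" PRIu64 " samples)\n", nr_samples);
    return 0;
}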

[tip:perf/core] perf annotate: Simplify header dotted line sizing

2016-07-01 Thread tip-bot for Peter Zijlstra (Intel)
Commit-ID:  53dd9b5f95dda95bcadda1b4680be42dfe1f9e5e
Gitweb: http://git.kernel.org/tip/53dd9b5f95dda95bcadda1b4680be42dfe1f9e5e
Author: Peter Zijlstra (Intel) 
AuthorDate: Thu, 30 Jun 2016 09:17:26 -0300
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Thu, 30 Jun 2016 09:21:03 -0300

perf annotate: Simplify header dotted line sizing

No need to use strlen() etc. to figure that out; just use the return value
from printf(), which tells how wide the following line needs to be.

Signed-off-by: Peter Zijlstra (Intel) 
Cc: Jiri Olsa 
Link: 
http://lkml.kernel.org/r/20160630082955.ga30...@twins.programming.kicks-ass.net
[ split from a larger patch ]
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/util/annotate.c | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index c385fec..78e5d6f 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -1528,7 +1528,7 @@ int symbol__annotate_printf(struct symbol *sym, struct map *map,
int more = 0;
u64 len;
int width = 8;
-   int namelen, evsel_name_len, graph_dotted_len;
+   int graph_dotted_len;
 
filename = strdup(dso->long_name);
if (!filename)
@@ -1540,17 +1540,14 @@ int symbol__annotate_printf(struct symbol *sym, struct map *map,
d_filename = basename(filename);
 
len = symbol__size(sym);
-   namelen = strlen(d_filename);
-   evsel_name_len = strlen(evsel_name);
 
if (perf_evsel__is_group_event(evsel))
width *= evsel->nr_members;
 
-   printf(" %-*.*s|Source code & Disassembly of %s for %s\n",
+   graph_dotted_len = printf(" %-*.*s| Source code & Disassembly of %s for %s\n",
   width, width, "Percent", d_filename, evsel_name);
 
-   graph_dotted_len = width + namelen + evsel_name_len;
-   printf("-%-*.*s-\n",
+   printf("%-*.*s\n",
   graph_dotted_len, graph_dotted_len, graph_dotted_line);
 
if (verbose)

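The trick is simply that printf() returns the number of characters it wrote,
which is exactly how wide the dotted underline has to be. A standalone
illustration (the header strings are sample values, and the count includes
the trailing newline, as in the patch):

#include <stdio.h>
#include <string.h>

int main(void)
{
    char dots[256];
    int len;

    memset(dots, '-', sizeof(dots));

    /* printf() returns the number of characters written... */
    len = printf(" %-8.8s| Source code & Disassembly of %s for %s\n",
                 "Percent", "perf-vdso.so", "cycles:u");

    /* ...which is exactly the width the dotted underline needs. */
    printf("%-*.*s\n", len, len, dots);
    return 0;
}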

[tip:smp/hotplug] sched: Allow per-cpu kernel threads to run on online && !active

2016-05-06 Thread tip-bot for Peter Zijlstra (Intel)
Commit-ID:  e9d867a67fd03ccc07248ca4e9c2f74fed494d5b
Gitweb: http://git.kernel.org/tip/e9d867a67fd03ccc07248ca4e9c2f74fed494d5b
Author: Peter Zijlstra (Intel) 
AuthorDate: Thu, 10 Mar 2016 12:54:08 +0100
Committer:  Thomas Gleixner 
CommitDate: Fri, 6 May 2016 14:58:22 +0200

sched: Allow per-cpu kernel threads to run on online && !active

In order to enable symmetric hotplug, we must mirror the online &&
!active state of cpu-down on the cpu-up side.

However, to retain sanity, limit this state to per-cpu kthreads.

Aside from the change to set_cpus_allowed_ptr(), which allows moving
the per-cpu kthreads onto these CPUs, the other critical piece is the cpu
selection for pinned tasks in select_task_rq(). This avoids dropping into
select_fallback_rq().

select_fallback_rq() cannot be allowed to select !active cpus because
it is used to migrate user tasks away. And we do not want to move user
tasks onto cpus that are in transition.

Requested-by: Thomas Gleixner 
Signed-off-by: Peter Zijlstra (Intel) 
Tested-by: Thomas Gleixner 
Cc: Lai Jiangshan 
Cc: Jan H. Schönherr 
Cc: Oleg Nesterov 
Cc: r...@linutronix.de
Link: 
http://lkml.kernel.org/r/20160301152303.gv6...@twins.programming.kicks-ass.net
Signed-off-by: Thomas Gleixner 
---
 arch/powerpc/kernel/smp.c |  2 +-
 arch/s390/kernel/smp.c|  2 +-
 include/linux/cpumask.h   |  6 ++
 kernel/sched/core.c   | 49 ---
 4 files changed, 46 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 8cac1eb..55c924b 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -565,7 +565,7 @@ int __cpu_up(unsigned int cpu, struct task_struct *tidle)
smp_ops->give_timebase();
 
/* Wait until cpu puts itself in the online & active maps */
-   while (!cpu_online(cpu) || !cpu_active(cpu))
+   while (!cpu_online(cpu))
cpu_relax();
 
return 0;
diff --git a/arch/s390/kernel/smp.c b/arch/s390/kernel/smp.c
index 40a6b4f..7b89a75 100644
--- a/arch/s390/kernel/smp.c
+++ b/arch/s390/kernel/smp.c
@@ -832,7 +832,7 @@ int __cpu_up(unsigned int cpu, struct task_struct *tidle)
pcpu_attach_task(pcpu, tidle);
pcpu_start_fn(pcpu, smp_start_secondary, NULL);
/* Wait until cpu puts itself in the online & active maps */
-   while (!cpu_online(cpu) || !cpu_active(cpu))
+   while (!cpu_online(cpu))
cpu_relax();
return 0;
 }
diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index 40cee6b..e828cf6 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -743,12 +743,10 @@ set_cpu_present(unsigned int cpu, bool present)
 static inline void
 set_cpu_online(unsigned int cpu, bool online)
 {
-   if (online) {
+   if (online)
cpumask_set_cpu(cpu, &__cpu_online_mask);
-   cpumask_set_cpu(cpu, &__cpu_active_mask);
-   } else {
+   else
cpumask_clear_cpu(cpu, &__cpu_online_mask);
-   }
 }
 
 static inline void
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8b489fc..8bfd7d4 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1082,13 +1082,21 @@ void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask)
 static int __set_cpus_allowed_ptr(struct task_struct *p,
  const struct cpumask *new_mask, bool check)
 {
+   const struct cpumask *cpu_valid_mask = cpu_active_mask;
+   unsigned int dest_cpu;
unsigned long flags;
struct rq *rq;
-   unsigned int dest_cpu;
int ret = 0;
 
rq = task_rq_lock(p, &flags);
 
+   if (p->flags & PF_KTHREAD) {
+   /*
+* Kernel threads are allowed on online && !active CPUs
+*/
+   cpu_valid_mask = cpu_online_mask;
+   }
+
/*
 * Must re-check here, to close a race against __kthread_bind(),
 * sched_setaffinity() is not guaranteed to observe the flag.
@@ -1101,18 +1109,28 @@ static int __set_cpus_allowed_ptr(struct task_struct *p,
if (cpumask_equal(&p->cpus_allowed, new_mask))
goto out;
 
-   if (!cpumask_intersects(new_mask, cpu_active_mask)) {
+   if (!cpumask_intersects(new_mask, cpu_valid_mask)) {
ret = -EINVAL;
goto out;
}
 
do_set_cpus_allowed(p, new_mask);
 
+   if (p->flags & PF_KTHREAD) {
+   /*
+* For kernel threads that do indeed end up on online &&
+* !active we want to ensure they are strict per-cpu threads.
+*/
+   WARN_ON(cpumask_intersects(new_mask, cpu_online_mask) &&
+   !cpumask_intersects(new_mask, 

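The mask-selection rule added to __set_cpus_allowed_ptr() can be sketched in
userspace: per-cpu kernel threads may land on online-but-not-yet-active CPUs,
everything else is limited to active CPUs. CPU masks are modeled as plain
bitmasks and all names and values below are illustrative:

#include <stdbool.h>
#include <stdio.h>

#define PF_KTHREAD 0x1

static unsigned long cpu_online_mask = 0x0f;   /* CPUs 0-3 online */
static unsigned long cpu_active_mask = 0x07;   /* CPU 3 online but not yet active */

static bool may_run_on(unsigned int flags, unsigned long new_mask)
{
    unsigned long valid = (flags & PF_KTHREAD) ? cpu_online_mask
                                               : cpu_active_mask;

    /* Mirrors the cpumask_intersects(new_mask, cpu_valid_mask) check. */
    return (new_mask & valid) != 0;
}

int main(void)
{
    unsigned long cpu3_only = 1UL << 3;

    printf("kthread  on CPU3: %s\n",
           may_run_on(PF_KTHREAD, cpu3_only) ? "allowed" : "rejected");
    printf("usertask on CPU3: %s\n",
           may_run_on(0, cpu3_only) ? "allowed" : "rejected");
    return 0;
}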
[tip:smp/hotplug] sched: Allow per-cpu kernel threads to run on online && !active

2016-05-06 Thread tip-bot for Peter Zijlstra (Intel)
Commit-ID:  e9d867a67fd03ccc07248ca4e9c2f74fed494d5b
Gitweb: http://git.kernel.org/tip/e9d867a67fd03ccc07248ca4e9c2f74fed494d5b
Author: Peter Zijlstra (Intel) 
AuthorDate: Thu, 10 Mar 2016 12:54:08 +0100
Committer:  Thomas Gleixner 
CommitDate: Fri, 6 May 2016 14:58:22 +0200

sched: Allow per-cpu kernel threads to run on online && !active

In order to enable symmetric hotplug, we must mirror the online &&
!active state of cpu-down on the cpu-up side.

However, to retain sanity, limit this state to per-cpu kthreads.

Aside from the change to set_cpus_allowed_ptr(), which allows moving
the per-cpu kthreads onto such CPUs, the other critical piece is the
cpu selection for pinned tasks in select_task_rq(). This avoids
dropping into select_fallback_rq().

select_fallback_rq() cannot be allowed to select !active cpus, because
it is used to migrate user tasks away and we do not want to move user
tasks onto cpus that are in transition.
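
A minimal stand-alone sketch of the mask-selection rule described
above, written as plain C for illustration only; the *_demo names and
the simple bitmask type are invented here, not the kernel's:

#include <stdbool.h>
#include <stdint.h>

typedef uint64_t cpumask_demo_t;        /* one bit per CPU, illustration only */

/* Kernel threads may be placed on online && !active CPUs; user tasks may not. */
static cpumask_demo_t pick_valid_mask_demo(bool is_kthread,
                                           cpumask_demo_t online_mask,
                                           cpumask_demo_t active_mask)
{
        return is_kthread ? online_mask : active_mask;
}

/* Mirrors the -EINVAL check below: the new mask must intersect the valid mask. */
static bool affinity_change_allowed_demo(bool is_kthread, cpumask_demo_t new_mask,
                                         cpumask_demo_t online_mask,
                                         cpumask_demo_t active_mask)
{
        return (new_mask & pick_valid_mask_demo(is_kthread, online_mask,
                                                active_mask)) != 0;
}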

Requested-by: Thomas Gleixner 
Signed-off-by: Peter Zijlstra (Intel) 
Tested-by: Thomas Gleixner 
Cc: Lai Jiangshan 
Cc: Jan H. Schönherr 
Cc: Oleg Nesterov 
Cc: r...@linutronix.de
Link: http://lkml.kernel.org/r/20160301152303.gv6...@twins.programming.kicks-ass.net
Signed-off-by: Thomas Gleixner 
---
 arch/powerpc/kernel/smp.c |  2 +-
 arch/s390/kernel/smp.c|  2 +-
 include/linux/cpumask.h   |  6 ++
 kernel/sched/core.c   | 49 ---
 4 files changed, 46 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 8cac1eb..55c924b 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -565,7 +565,7 @@ int __cpu_up(unsigned int cpu, struct task_struct *tidle)
smp_ops->give_timebase();
 
/* Wait until cpu puts itself in the online & active maps */
-   while (!cpu_online(cpu) || !cpu_active(cpu))
+   while (!cpu_online(cpu))
cpu_relax();
 
return 0;
diff --git a/arch/s390/kernel/smp.c b/arch/s390/kernel/smp.c
index 40a6b4f..7b89a75 100644
--- a/arch/s390/kernel/smp.c
+++ b/arch/s390/kernel/smp.c
@@ -832,7 +832,7 @@ int __cpu_up(unsigned int cpu, struct task_struct *tidle)
pcpu_attach_task(pcpu, tidle);
pcpu_start_fn(pcpu, smp_start_secondary, NULL);
/* Wait until cpu puts itself in the online & active maps */
-   while (!cpu_online(cpu) || !cpu_active(cpu))
+   while (!cpu_online(cpu))
cpu_relax();
return 0;
 }
diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index 40cee6b..e828cf6 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -743,12 +743,10 @@ set_cpu_present(unsigned int cpu, bool present)
 static inline void
 set_cpu_online(unsigned int cpu, bool online)
 {
-   if (online) {
+   if (online)
cpumask_set_cpu(cpu, &__cpu_online_mask);
-   cpumask_set_cpu(cpu, &__cpu_active_mask);
-   } else {
+   else
cpumask_clear_cpu(cpu, &__cpu_online_mask);
-   }
 }
 
 static inline void
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8b489fc..8bfd7d4 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1082,13 +1082,21 @@ void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask)
 static int __set_cpus_allowed_ptr(struct task_struct *p,
  const struct cpumask *new_mask, bool check)
 {
+   const struct cpumask *cpu_valid_mask = cpu_active_mask;
+   unsigned int dest_cpu;
unsigned long flags;
struct rq *rq;
-   unsigned int dest_cpu;
int ret = 0;
 
rq = task_rq_lock(p, &flags);
 
+   if (p->flags & PF_KTHREAD) {
+   /*
+* Kernel threads are allowed on online && !active CPUs
+*/
+   cpu_valid_mask = cpu_online_mask;
+   }
+
/*
 * Must re-check here, to close a race against __kthread_bind(),
 * sched_setaffinity() is not guaranteed to observe the flag.
@@ -1101,18 +1109,28 @@ static int __set_cpus_allowed_ptr(struct task_struct *p,
if (cpumask_equal(&p->cpus_allowed, new_mask))
goto out;
 
-   if (!cpumask_intersects(new_mask, cpu_active_mask)) {
+   if (!cpumask_intersects(new_mask, cpu_valid_mask)) {
ret = -EINVAL;
goto out;
}
 
do_set_cpus_allowed(p, new_mask);
 
+   if (p->flags & PF_KTHREAD) {
+   /*
+* For kernel threads that do indeed end up on online &&
+* !active we want to ensure they are strict per-cpu threads.
+*/
+   WARN_ON(cpumask_intersects(new_mask, cpu_online_mask) &&
+   !cpumask_intersects(new_mask, cpu_active_mask) &&
+   p->nr_cpus_allowed != 1);
+   }
+
/* Can the task run on the task's current CPU? If so, we're done */
if 

[tip:smp/hotplug] sched: Allow per-cpu kernel threads to run on online && !active

2016-05-05 Thread tip-bot for Peter Zijlstra (Intel)
Commit-ID:  618d6e31623149c6203b46850e2e76ee0f29e577
Gitweb: http://git.kernel.org/tip/618d6e31623149c6203b46850e2e76ee0f29e577
Author: Peter Zijlstra (Intel) 
AuthorDate: Thu, 10 Mar 2016 12:54:08 +0100
Committer:  Thomas Gleixner 
CommitDate: Thu, 5 May 2016 13:17:52 +0200

sched: Allow per-cpu kernel threads to run on online && !active

In order to enable symmetric hotplug, we must mirror the online &&
!active state of cpu-down on the cpu-up side.

However, to retain sanity, limit this state to per-cpu kthreads.

Aside from the change to set_cpus_allowed_ptr(), which allows moving
the per-cpu kthreads onto such CPUs, the other critical piece is the
cpu selection for pinned tasks in select_task_rq(). This avoids
dropping into select_fallback_rq().

select_fallback_rq() cannot be allowed to select !active cpus, because
it is used to migrate user tasks away and we do not want to move user
tasks onto cpus that are in transition.

Requested-by: Thomas Gleixner 
Signed-off-by: Peter Zijlstra (Intel) 
Tested-by: Thomas Gleixner 
Cc: Lai Jiangshan 
Cc: Jan H. Schönherr 
Cc: Oleg Nesterov 
Cc: r...@linutronix.de
Link: http://lkml.kernel.org/r/20160301152303.gv6...@twins.programming.kicks-ass.net
Signed-off-by: Thomas Gleixner 

---
 arch/powerpc/kernel/smp.c |  2 +-
 arch/s390/kernel/smp.c|  2 +-
 include/linux/cpumask.h   |  6 ++
 kernel/sched/core.c   | 49 ---
 4 files changed, 46 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 8cac1eb..55c924b 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -565,7 +565,7 @@ int __cpu_up(unsigned int cpu, struct task_struct *tidle)
smp_ops->give_timebase();
 
/* Wait until cpu puts itself in the online & active maps */
-   while (!cpu_online(cpu) || !cpu_active(cpu))
+   while (!cpu_online(cpu))
cpu_relax();
 
return 0;
diff --git a/arch/s390/kernel/smp.c b/arch/s390/kernel/smp.c
index 40a6b4f..7b89a75 100644
--- a/arch/s390/kernel/smp.c
+++ b/arch/s390/kernel/smp.c
@@ -832,7 +832,7 @@ int __cpu_up(unsigned int cpu, struct task_struct *tidle)
pcpu_attach_task(pcpu, tidle);
pcpu_start_fn(pcpu, smp_start_secondary, NULL);
/* Wait until cpu puts itself in the online & active maps */
-   while (!cpu_online(cpu) || !cpu_active(cpu))
+   while (!cpu_online(cpu))
cpu_relax();
return 0;
 }
diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index 40cee6b..e828cf6 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -743,12 +743,10 @@ set_cpu_present(unsigned int cpu, bool present)
 static inline void
 set_cpu_online(unsigned int cpu, bool online)
 {
-   if (online) {
+   if (online)
cpumask_set_cpu(cpu, &__cpu_online_mask);
-   cpumask_set_cpu(cpu, &__cpu_active_mask);
-   } else {
+   else
cpumask_clear_cpu(cpu, &__cpu_online_mask);
-   }
 }
 
 static inline void
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8b489fc..8bfd7d4 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1082,13 +1082,21 @@ void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask)
 static int __set_cpus_allowed_ptr(struct task_struct *p,
  const struct cpumask *new_mask, bool check)
 {
+   const struct cpumask *cpu_valid_mask = cpu_active_mask;
+   unsigned int dest_cpu;
unsigned long flags;
struct rq *rq;
-   unsigned int dest_cpu;
int ret = 0;
 
rq = task_rq_lock(p, &flags);
 
+   if (p->flags & PF_KTHREAD) {
+   /*
+* Kernel threads are allowed on online && !active CPUs
+*/
+   cpu_valid_mask = cpu_online_mask;
+   }
+
/*
 * Must re-check here, to close a race against __kthread_bind(),
 * sched_setaffinity() is not guaranteed to observe the flag.
@@ -1101,18 +1109,28 @@ static int __set_cpus_allowed_ptr(struct task_struct *p,
if (cpumask_equal(&p->cpus_allowed, new_mask))
goto out;
 
-   if (!cpumask_intersects(new_mask, cpu_active_mask)) {
+   if (!cpumask_intersects(new_mask, cpu_valid_mask)) {
ret = -EINVAL;
goto out;
}
 
do_set_cpus_allowed(p, new_mask);
 
+   if (p->flags & PF_KTHREAD) {
+   /*
+* For kernel threads that do indeed end up on online &&
+* !active we want to ensure they are strict per-cpu threads.
+*/
+   WARN_ON(cpumask_intersects(new_mask, cpu_online_mask) &&
+   !cpumask_intersects(new_mask, 

[tip:sched/core] wait.[ch]: Introduce the simple waitqueue (swait) implementation

2016-02-25 Thread tip-bot for Peter Zijlstra (Intel)
Commit-ID:  13b35686e8b934ff78f59cef0c65fa3a43f8eeaf
Gitweb: http://git.kernel.org/tip/13b35686e8b934ff78f59cef0c65fa3a43f8eeaf
Author: Peter Zijlstra (Intel) 
AuthorDate: Fri, 19 Feb 2016 09:46:37 +0100
Committer:  Thomas Gleixner 
CommitDate: Thu, 25 Feb 2016 11:27:16 +0100

wait.[ch]: Introduce the simple waitqueue (swait) implementation

The existing wait queue code supports custom wake-up callbacks, wake
flags, a wake key (passed to the callback) and exclusive flags that
allow wakers to be tagged as exclusive, in order to limit the number
of wakers.

In a lot of cases, none of these features are used, and hence we
can benefit from a slimmed down version that lowers memory overhead
and reduces runtime overhead.

The concept originated from -rt, where waitqueues are a constant
source of trouble, as we can't convert the head lock to a raw
spinlock due to fancy and long lasting callbacks.

With the removal of custom callbacks, we can use a raw lock for
queue list manipulations, hence allowing the simple wait support
to be used in -rt.

[Patch is from PeterZ and is based on Thomas' version. Commit message is
 written by Paul G.
 Daniel:  - Fixed some compile issues
  - Added non-lazy implementation of swake_up_locked as suggested
 by Boqun Feng.]
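
A usage sketch, not part of the patch, of the API shape this header
introduces, assuming the swait_event()/swake_up() helpers it adds
further down; my_wq, my_done and the two functions are illustrative
names, and real users would publish the condition under a lock or
with suitable barriers:

#include <linux/swait.h>
#include <linux/types.h>

static DECLARE_SWAIT_QUEUE_HEAD(my_wq);
static bool my_done;

static void my_consumer(void)
{
        /* Sleeps (TASK_UNINTERRUPTIBLE) until the condition becomes true. */
        swait_event(my_wq, my_done);
}

static void my_producer(void)
{
        my_done = true;
        /* Wakes a single waiter; all swait wakeups are TASK_NORMAL. */
        swake_up(&my_wq);
}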

Originally-by: Thomas Gleixner 
Signed-off-by: Daniel Wagner 
Acked-by: Peter Zijlstra (Intel) 
Cc: linux-rt-us...@vger.kernel.org
Cc: Boqun Feng 
Cc: Marcelo Tosatti 
Cc: Steven Rostedt 
Cc: Paul Gortmaker 
Cc: Paolo Bonzini 
Cc: "Paul E. McKenney" 
Link: http://lkml.kernel.org/r/1455871601-27484-2-git-send-email-w...@monom.org
Signed-off-by: Thomas Gleixner 
---
 include/linux/swait.h | 172 ++
 kernel/sched/Makefile |   2 +-
 kernel/sched/swait.c  | 123 
 3 files changed, 296 insertions(+), 1 deletion(-)

diff --git a/include/linux/swait.h b/include/linux/swait.h
new file mode 100644
index 000..c1f9c62
--- /dev/null
+++ b/include/linux/swait.h
@@ -0,0 +1,172 @@
+#ifndef _LINUX_SWAIT_H
+#define _LINUX_SWAIT_H
+
+#include 
+#include 
+#include 
+#include 
+
+/*
+ * Simple wait queues
+ *
+ * While these are very similar to the other/complex wait queues (wait.h) the
+ * most important difference is that the simple waitqueue allows for
+ * deterministic behaviour -- IOW it has strictly bounded IRQ and lock hold
+ * times.
+ *
+ * In order to make this so, we had to drop a fair number of features of the
+ * other waitqueue code; notably:
+ *
+ *  - mixing INTERRUPTIBLE and UNINTERRUPTIBLE sleeps on the same waitqueue;
+ *all wakeups are TASK_NORMAL in order to avoid O(n) lookups for the right
+ *sleeper state.
+ *
+ *  - the exclusive mode; because this requires preserving the list order
+ *and this is hard.
+ *
+ *  - custom wake functions; because you cannot give any guarantees about
+ *random code.
+ *
+ * As a side effect of this; the data structures are slimmer.
+ *
+ * One would recommend using this wait queue where possible.
+ */
+
+struct task_struct;
+
+struct swait_queue_head {
+   raw_spinlock_t  lock;
+   struct list_headtask_list;
+};
+
+struct swait_queue {
+   struct task_struct  *task;
+   struct list_headtask_list;
+};
+
+#define __SWAITQUEUE_INITIALIZER(name) {   \
+   .task   = current,  \
+   .task_list  = LIST_HEAD_INIT((name).task_list), \
+}
+
+#define DECLARE_SWAITQUEUE(name)   \
+   struct swait_queue name = __SWAITQUEUE_INITIALIZER(name)
+
+#define __SWAIT_QUEUE_HEAD_INITIALIZER(name) { \
+   .lock   = __RAW_SPIN_LOCK_UNLOCKED(name.lock),  \
+   .task_list  = LIST_HEAD_INIT((name).task_list), \
+}
+
+#define DECLARE_SWAIT_QUEUE_HEAD(name) \
+   struct swait_queue_head name = __SWAIT_QUEUE_HEAD_INITIALIZER(name)
+
+extern void __init_swait_queue_head(struct swait_queue_head *q, const char *name,
+   struct lock_class_key *key);
+
+#define init_swait_queue_head(q)   \
+   do {\
+   static struct lock_class_key __key; \
+   __init_swait_queue_head((q), #q, &__key);   \
+   } while (0)
+
+#ifdef CONFIG_LOCKDEP
+# define __SWAIT_QUEUE_HEAD_INIT_ONSTACK(name) \
+   ({ init_swait_queue_head(&name); name; })
+# define DECLARE_SWAIT_QUEUE_HEAD_ONSTACK(name)\
+   struct swait_queue_head name = __SWAIT_QUEUE_HEAD_INIT_ONSTACK(name)
+#else
+# define DECLARE_SWAIT_QUEUE_HEAD_ONSTACK(name)\
+   DECLARE_SWAIT_QUEUE_HEAD(name)
+#endif
+
+static inline int swait_active(struct swait_queue_head *q)
+{
+   return 

[tip:perf/core] perf/core: Rename perf_event_read_{one,group}, perf_read_hw

2015-09-13 Thread tip-bot for Peter Zijlstra (Intel)
Commit-ID:  b15f495b4e9295cf21065d8569835a2f18cfe41b
Gitweb: http://git.kernel.org/tip/b15f495b4e9295cf21065d8569835a2f18cfe41b
Author: Peter Zijlstra (Intel) 
AuthorDate: Thu, 3 Sep 2015 20:07:47 -0700
Committer:  Ingo Molnar 
CommitDate: Sun, 13 Sep 2015 11:27:26 +0200

perf/core: Rename perf_event_read_{one,group}, perf_read_hw

In order to free up the perf_event_read_group() name:

 s/perf_event_read_\(one\|group\)/perf_read_\1/g
 s/perf_read_hw/__perf_read/g

Signed-off-by: Peter Zijlstra (Intel) 
Cc: Arnaldo Carvalho de Melo 
Cc: Arnaldo Carvalho de Melo 
Cc: Jiri Olsa 
Cc: Linus Torvalds 
Cc: Michael Ellerman 
Cc: Peter Zijlstra 
Cc: Stephane Eranian 
Cc: Thomas Gleixner 
Cc: Vince Weaver 
Link: http://lkml.kernel.org/r/1441336073-22750-5-git-send-email-suka...@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar 
---
 kernel/events/core.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 260bf8c..67b7dba 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3742,7 +3742,7 @@ static void put_event(struct perf_event *event)
 * see the comment there.
 *
 *  2) there is a lock-inversion with mmap_sem through
-* perf_event_read_group(), which takes faults while
+* perf_read_group(), which takes faults while
 * holding ctx->mutex, however this is called after
 * the last filedesc died, so there is no possibility
 * to trigger the AB-BA case.
@@ -3837,7 +3837,7 @@ u64 perf_event_read_value(struct perf_event *event, u64 *enabled, u64 *running)
 }
 EXPORT_SYMBOL_GPL(perf_event_read_value);
 
-static int perf_event_read_group(struct perf_event *event,
+static int perf_read_group(struct perf_event *event,
   u64 read_format, char __user *buf)
 {
struct perf_event *leader = event->group_leader, *sub;
@@ -3885,7 +3885,7 @@ static int perf_event_read_group(struct perf_event *event,
return ret;
 }
 
-static int perf_event_read_one(struct perf_event *event,
+static int perf_read_one(struct perf_event *event,
 u64 read_format, char __user *buf)
 {
u64 enabled, running;
@@ -3923,7 +3923,7 @@ static bool is_event_hup(struct perf_event *event)
  * Read the performance event - simple non blocking version for now
  */
 static ssize_t
-perf_read_hw(struct perf_event *event, char __user *buf, size_t count)
+__perf_read(struct perf_event *event, char __user *buf, size_t count)
 {
u64 read_format = event->attr.read_format;
int ret;
@@ -3941,9 +3941,9 @@ perf_read_hw(struct perf_event *event, char __user *buf, size_t count)
 
WARN_ON_ONCE(event->ctx->parent_ctx);
if (read_format & PERF_FORMAT_GROUP)
-   ret = perf_event_read_group(event, read_format, buf);
+   ret = perf_read_group(event, read_format, buf);
else
-   ret = perf_event_read_one(event, read_format, buf);
+   ret = perf_read_one(event, read_format, buf);
 
return ret;
 }
@@ -3956,7 +3956,7 @@ perf_read(struct file *file, char __user *buf, size_t count, loff_t *ppos)
int ret;
 
ctx = perf_event_ctx_lock(event);
-   ret = perf_read_hw(event, buf, count);
+   ret = __perf_read(event, buf, count);
perf_event_ctx_unlock(event, ctx);
 
return ret;


[tip:locking/core] locking/pvqspinlock, x86: Implement the paravirt qspinlock call patching

2015-05-08 Thread tip-bot for Peter Zijlstra (Intel)
Commit-ID:  f233f7f1581e78fd9b4023f2e7d8c1ed89020cc9
Gitweb: http://git.kernel.org/tip/f233f7f1581e78fd9b4023f2e7d8c1ed89020cc9
Author: Peter Zijlstra (Intel) 
AuthorDate: Fri, 24 Apr 2015 14:56:38 -0400
Committer:  Ingo Molnar 
CommitDate: Fri, 8 May 2015 12:37:09 +0200

locking/pvqspinlock, x86: Implement the paravirt qspinlock call patching

We use the regular paravirt call patching to switch between:

  native_queued_spin_lock_slowpath()__pv_queued_spin_lock_slowpath()
  native_queued_spin_unlock()   __pv_queued_spin_unlock()

We use a callee-saved call for the unlock function, which reduces the
i-cache footprint and allows 'inlining' of SPIN_UNLOCK functions
again.

We further optimize the unlock path by patching the direct call with a
"movb $0,%arg1" if we are indeed using the native unlock code. This
makes the unlock code almost as fast as the !PARAVIRT case.

This significantly lowers the overhead of having
CONFIG_PARAVIRT_SPINLOCKS enabled, even for native code.
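
A simplified, stand-alone C11 sketch of the dispatch being patched;
the *_demo names are invented, the layout assumes a little-endian
(x86-like) machine, and the real kernel rewrites the call site itself
rather than going through a function pointer:

#include <stdatomic.h>
#include <stdint.h>

struct qspinlock_demo {
        union {
                _Atomic uint32_t val;
                struct {
                        _Atomic uint8_t locked;  /* low byte of val (little endian) */
                        uint8_t         pad[3];
                };
        };
};

/* Native unlock: conceptually the single byte store "movb $0,%arg1". */
static void native_unlock_demo(struct qspinlock_demo *lock)
{
        atomic_store_explicit(&lock->locked, 0, memory_order_release);
}

/* Paravirt slow path: would additionally kick a halted waiter vCPU. */
static void pv_unlock_demo(struct qspinlock_demo *lock)
{
        native_unlock_demo(lock);
        /* ... kick the next waiter's vCPU here ... */
}

struct pv_lock_ops_demo {
        void (*queued_spin_unlock)(struct qspinlock_demo *lock);
};

/* Chosen once at boot: native on bare metal, the pv slow path otherwise. */
static struct pv_lock_ops_demo pv_lock_ops_demo = {
        .queued_spin_unlock = native_unlock_demo,
};

static void queued_spin_unlock_demo(struct qspinlock_demo *lock)
{
        pv_lock_ops_demo.queued_spin_unlock(lock);
}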

Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Waiman Long 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Andrew Morton 
Cc: Boris Ostrovsky 
Cc: Borislav Petkov 
Cc: Daniel J Blueman 
Cc: David Vrabel 
Cc: Douglas Hatch 
Cc: H. Peter Anvin 
Cc: Konrad Rzeszutek Wilk 
Cc: Linus Torvalds 
Cc: Oleg Nesterov 
Cc: Paolo Bonzini 
Cc: Paul E. McKenney 
Cc: Peter Zijlstra 
Cc: Raghavendra K T 
Cc: Rik van Riel 
Cc: Scott J Norton 
Cc: Thomas Gleixner 
Cc: virtualizat...@lists.linux-foundation.org
Cc: xen-de...@lists.xenproject.org
Link: http://lkml.kernel.org/r/1429901803-29771-10-git-send-email-waiman.l...@hp.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/Kconfig  |  2 +-
 arch/x86/include/asm/paravirt.h   | 29 -
 arch/x86/include/asm/paravirt_types.h | 10 ++
 arch/x86/include/asm/qspinlock.h  | 25 -
 arch/x86/include/asm/qspinlock_paravirt.h |  6 ++
 arch/x86/kernel/paravirt-spinlocks.c  | 24 +++-
 arch/x86/kernel/paravirt_patch_32.c   | 22 ++
 arch/x86/kernel/paravirt_patch_64.c   | 22 ++
 8 files changed, 128 insertions(+), 12 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 90b1b54..50ec043 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -667,7 +667,7 @@ config PARAVIRT_DEBUG
 config PARAVIRT_SPINLOCKS
bool "Paravirtualization layer for spinlocks"
depends on PARAVIRT && SMP
-   select UNINLINE_SPIN_UNLOCK
+   select UNINLINE_SPIN_UNLOCK if !QUEUED_SPINLOCK
---help---
  Paravirtualized spinlocks allow a pvops backend to replace the
  spinlock implementation with something virtualization-friendly
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 8957810..266c353 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -712,6 +712,31 @@ static inline void __set_fixmap(unsigned /* enum fixed_addresses */ idx,
 
 #if defined(CONFIG_SMP) && defined(CONFIG_PARAVIRT_SPINLOCKS)
 
+#ifdef CONFIG_QUEUED_SPINLOCK
+
+static __always_inline void pv_queued_spin_lock_slowpath(struct qspinlock *lock,
+   u32 val)
+{
+   PVOP_VCALL2(pv_lock_ops.queued_spin_lock_slowpath, lock, val);
+}
+
+static __always_inline void pv_queued_spin_unlock(struct qspinlock *lock)
+{
+   PVOP_VCALLEE1(pv_lock_ops.queued_spin_unlock, lock);
+}
+
+static __always_inline void pv_wait(u8 *ptr, u8 val)
+{
+   PVOP_VCALL2(pv_lock_ops.wait, ptr, val);
+}
+
+static __always_inline void pv_kick(int cpu)
+{
+   PVOP_VCALL1(pv_lock_ops.kick, cpu);
+}
+
+#else /* !CONFIG_QUEUED_SPINLOCK */
+
 static __always_inline void __ticket_lock_spinning(struct arch_spinlock *lock,
__ticket_t ticket)
 {
@@ -724,7 +749,9 @@ static __always_inline void __ticket_unlock_kick(struct arch_spinlock *lock,
PVOP_VCALL2(pv_lock_ops.unlock_kick, lock, ticket);
 }
 
-#endif
+#endif /* CONFIG_QUEUED_SPINLOCK */
+
+#endif /* SMP && PARAVIRT_SPINLOCKS */
 
 #ifdef CONFIG_X86_32
 #define PV_SAVE_REGS "pushl %ecx; pushl %edx;"
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index f7b0b5c..76cd684 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -333,9 +333,19 @@ struct arch_spinlock;
 typedef u16 __ticket_t;
 #endif
 
+struct qspinlock;
+
 struct pv_lock_ops {
+#ifdef CONFIG_QUEUED_SPINLOCK
+   void (*queued_spin_lock_slowpath)(struct qspinlock *lock, u32 val);
+   struct paravirt_callee_save queued_spin_unlock;
+
+   void (*wait)(u8 *ptr, u8 val);
+   void (*kick)(int cpu);
+#else /* !CONFIG_QUEUED_SPINLOCK */
struct paravirt_callee_save lock_spinning;
void (*unlock_kick)(struct arch_spinlock *lock, __ticket_t ticket);

[tip:locking/core] locking/qspinlock: Optimize for smaller NR_CPUS

2015-05-08 Thread tip-bot for Peter Zijlstra (Intel)
Commit-ID:  69f9cae90907e09af95fb991ed384670cef8dd32
Gitweb: http://git.kernel.org/tip/69f9cae90907e09af95fb991ed384670cef8dd32
Author: Peter Zijlstra (Intel) 
AuthorDate: Fri, 24 Apr 2015 14:56:34 -0400
Committer:  Ingo Molnar 
CommitDate: Fri, 8 May 2015 12:36:48 +0200

locking/qspinlock: Optimize for smaller NR_CPUS

When we allow for a max NR_CPUS < 2^14 we can optimize the pending
wait-acquire and the xchg_tail() operations.

By growing the pending bit to a byte, we reduce the tail to 16 bits.
This means we can use xchg16 for the tail part and do away with all
the repeated cmpxchg() operations.

This in turn allows us to unconditionally acquire; the locked state
as observed by the wait loops cannot change. And because both locked
and pending are now a full byte we can use simple stores for the
state transition, obviating one atomic operation entirely.

This optimization is needed to make the qspinlock achieve performance
parity with ticket spinlock at light load.

All this is horribly broken on Alpha pre EV56 (and any other arch that
cannot do single-copy atomic byte stores).
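
A small stand-alone illustration, not kernel code, of the 16-bit tail
encoding this layout enables (a 2-bit per-CPU node index plus a
14-bit cpu+1 value), which is what makes a single xchg16 of the tail
possible; the demo_* names are invented:

#include <stdint.h>

#define DEMO_TAIL_IDX_BITS      2
#define DEMO_TAIL_IDX_MASK      ((1u << DEMO_TAIL_IDX_BITS) - 1)
#define DEMO_TAIL_CPU_OFFSET    DEMO_TAIL_IDX_BITS

/* Pack (cpu, idx) into the upper 16-bit half of the lock word. */
static uint16_t demo_encode_tail(unsigned int cpu, unsigned int idx)
{
        return (uint16_t)(((cpu + 1) << DEMO_TAIL_CPU_OFFSET) |
                          (idx & DEMO_TAIL_IDX_MASK));
}

/* Unpack a non-zero tail back into (cpu, idx). */
static void demo_decode_tail(uint16_t tail, unsigned int *cpu, unsigned int *idx)
{
        *idx = tail & DEMO_TAIL_IDX_MASK;
        *cpu = (tail >> DEMO_TAIL_CPU_OFFSET) - 1;
}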

Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Waiman Long 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Andrew Morton 
Cc: Boris Ostrovsky 
Cc: Borislav Petkov 
Cc: Daniel J Blueman 
Cc: David Vrabel 
Cc: Douglas Hatch 
Cc: H. Peter Anvin 
Cc: Konrad Rzeszutek Wilk 
Cc: Linus Torvalds 
Cc: Oleg Nesterov 
Cc: Paolo Bonzini 
Cc: Paul E. McKenney 
Cc: Peter Zijlstra 
Cc: Raghavendra K T 
Cc: Rik van Riel 
Cc: Scott J Norton 
Cc: Thomas Gleixner 
Cc: virtualizat...@lists.linux-foundation.org
Cc: xen-de...@lists.xenproject.org
Link: http://lkml.kernel.org/r/1429901803-29771-6-git-send-email-waiman.l...@hp.com
Signed-off-by: Ingo Molnar 
---
 include/asm-generic/qspinlock_types.h | 13 +++
 kernel/locking/qspinlock.c| 69 ++-
 2 files changed, 81 insertions(+), 1 deletion(-)

diff --git a/include/asm-generic/qspinlock_types.h b/include/asm-generic/qspinlock_types.h
index 3a7f671..85f888e 100644
--- a/include/asm-generic/qspinlock_types.h
+++ b/include/asm-generic/qspinlock_types.h
@@ -35,6 +35,14 @@ typedef struct qspinlock {
 /*
  * Bitfields in the atomic value:
  *
+ * When NR_CPUS < 16K
+ *  0- 7: locked byte
+ * 8: pending
+ *  9-15: not used
+ * 16-17: tail index
+ * 18-31: tail cpu (+1)
+ *
+ * When NR_CPUS >= 16K
  *  0- 7: locked byte
  * 8: pending
  *  9-10: tail index
@@ -47,7 +55,11 @@ typedef struct qspinlock {
 #define _Q_LOCKED_MASK _Q_SET_MASK(LOCKED)
 
 #define _Q_PENDING_OFFSET  (_Q_LOCKED_OFFSET + _Q_LOCKED_BITS)
+#if CONFIG_NR_CPUS < (1U << 14)
+#define _Q_PENDING_BITS    8
+#else
 #define _Q_PENDING_BITS    1
+#endif
 #define _Q_PENDING_MASK    _Q_SET_MASK(PENDING)
 
 #define _Q_TAIL_IDX_OFFSET (_Q_PENDING_OFFSET + _Q_PENDING_BITS)
@@ -58,6 +70,7 @@ typedef struct qspinlock {
 #define _Q_TAIL_CPU_BITS   (32 - _Q_TAIL_CPU_OFFSET)
 #define _Q_TAIL_CPU_MASK   _Q_SET_MASK(TAIL_CPU)
 
+#define _Q_TAIL_OFFSET _Q_TAIL_IDX_OFFSET
 #define _Q_TAIL_MASK   (_Q_TAIL_IDX_MASK | _Q_TAIL_CPU_MASK)
 
 #define _Q_LOCKED_VAL  (1U << _Q_LOCKED_OFFSET)
diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
index 82bb4a9..e17efe7 100644
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 /*
@@ -56,6 +57,10 @@
  * node; whereby avoiding the need to carry a node from lock to unlock, and
  * preserving existing lock API. This also makes the unlock code simpler and
  * faster.
+ *
+ * N.B. The current implementation only supports architectures that allow
+ *  atomic operations on smaller 8-bit and 16-bit data types.
+ *
  */
 
 #include "mcs_spinlock.h"
@@ -96,6 +101,62 @@ static inline struct mcs_spinlock *decode_tail(u32 tail)
 
 #define _Q_LOCKED_PENDING_MASK (_Q_LOCKED_MASK | _Q_PENDING_MASK)
 
+/*
+ * By using the whole 2nd least significant byte for the pending bit, we
+ * can allow better optimization of the lock acquisition for the pending
+ * bit holder.
+ */
+#if _Q_PENDING_BITS == 8
+
+struct __qspinlock {
+   union {
+   atomic_t val;
+   struct {
+#ifdef __LITTLE_ENDIAN
+   u16 locked_pending;
+   u16 tail;
+#else
+   u16 tail;
+   u16 locked_pending;
+#endif
+   };
+   };
+};
+
+/**
+ * clear_pending_set_locked - take ownership and clear the pending bit.
+ * @lock: Pointer to queued spinlock structure
+ *
+ * *,1,0 -> *,0,1
+ *
+ * Lock stealing is not allowed if this function is used.
+ */
+static __always_inline void clear_pending_set_locked(struct qspinlock *lock)
+{
+   struct __qspinlock *l = (void *)lock;
+
+   WRITE_ONCE(l->locked_pending, _Q_LOCKED_VAL);
+}
+
+/*
+ * xchg_tail - Put in the new queue tail code word 

[tip:locking/core] locking/qspinlock: Revert to test-and-set on hypervisors

2015-05-08 Thread tip-bot for Peter Zijlstra (Intel)
Commit-ID:  2aa79af64263190eec610422b07f60e99a7d230a
Gitweb: http://git.kernel.org/tip/2aa79af64263190eec610422b07f60e99a7d230a
Author: Peter Zijlstra (Intel) 
AuthorDate: Fri, 24 Apr 2015 14:56:36 -0400
Committer:  Ingo Molnar 
CommitDate: Fri, 8 May 2015 12:36:58 +0200

locking/qspinlock: Revert to test-and-set on hypervisors

When we detect a hypervisor (!paravirt, see qspinlock paravirt support
patches), revert to a simple test-and-set lock to avoid the horrors
of queue preemption.
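
For reference, a minimal test-and-set spinlock in portable C11
atomics; it is shown only to make concrete what the patch falls back
to under a hypervisor and is not the kernel's implementation:

#include <stdatomic.h>

typedef struct {
        atomic_flag locked;
} tas_lock_demo_t;

#define TAS_LOCK_DEMO_INIT { ATOMIC_FLAG_INIT }

static void tas_lock_demo(tas_lock_demo_t *l)
{
        /* Spin until we are the one to flip the flag from clear to set. */
        while (atomic_flag_test_and_set_explicit(&l->locked,
                                                 memory_order_acquire))
                ;       /* a real lock would also relax the CPU here */
}

static void tas_unlock_demo(tas_lock_demo_t *l)
{
        atomic_flag_clear_explicit(&l->locked, memory_order_release);
}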

Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Waiman Long 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Andrew Morton 
Cc: Boris Ostrovsky 
Cc: Borislav Petkov 
Cc: Daniel J Blueman 
Cc: David Vrabel 
Cc: Douglas Hatch 
Cc: H. Peter Anvin 
Cc: Konrad Rzeszutek Wilk 
Cc: Linus Torvalds 
Cc: Oleg Nesterov 
Cc: Paolo Bonzini 
Cc: Paul E. McKenney 
Cc: Peter Zijlstra 
Cc: Raghavendra K T 
Cc: Rik van Riel 
Cc: Scott J Norton 
Cc: Thomas Gleixner 
Cc: virtualizat...@lists.linux-foundation.org
Cc: xen-de...@lists.xenproject.org
Link: http://lkml.kernel.org/r/1429901803-29771-8-git-send-email-waiman.l...@hp.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/include/asm/qspinlock.h | 14 ++
 include/asm-generic/qspinlock.h  |  7 +++
 kernel/locking/qspinlock.c   |  3 +++
 3 files changed, 24 insertions(+)

diff --git a/arch/x86/include/asm/qspinlock.h b/arch/x86/include/asm/qspinlock.h
index e2aee82..f079b70 100644
--- a/arch/x86/include/asm/qspinlock.h
+++ b/arch/x86/include/asm/qspinlock.h
@@ -1,6 +1,7 @@
 #ifndef _ASM_X86_QSPINLOCK_H
 #define _ASM_X86_QSPINLOCK_H
 
+#include 
 #include 
 
#define queued_spin_unlock queued_spin_unlock
@@ -15,6 +16,19 @@ static inline void queued_spin_unlock(struct qspinlock *lock)
smp_store_release((u8 *)lock, 0);
 }
 
+#define virt_queued_spin_lock virt_queued_spin_lock
+
+static inline bool virt_queued_spin_lock(struct qspinlock *lock)
+{
+   if (!static_cpu_has(X86_FEATURE_HYPERVISOR))
+   return false;
+
while (atomic_cmpxchg(&lock->val, 0, _Q_LOCKED_VAL) != 0)
+   cpu_relax();
+
+   return true;
+}
+
 #include 
 
 #endif /* _ASM_X86_QSPINLOCK_H */
diff --git a/include/asm-generic/qspinlock.h b/include/asm-generic/qspinlock.h
index 569abcd..83bfb87 100644
--- a/include/asm-generic/qspinlock.h
+++ b/include/asm-generic/qspinlock.h
@@ -111,6 +111,13 @@ static inline void queued_spin_unlock_wait(struct qspinlock *lock)
cpu_relax();
 }
 
+#ifndef virt_queued_spin_lock
+static __always_inline bool virt_queued_spin_lock(struct qspinlock *lock)
+{
+   return false;
+}
+#endif
+
 /*
  * Initializier
  */
diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
index 0338721..fd31a47 100644
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -249,6 +249,9 @@ void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
 
BUILD_BUG_ON(CONFIG_NR_CPUS >= (1U << _Q_TAIL_CPU_BITS));
 
+   if (virt_queued_spin_lock(lock))
+   return;
+
/*
 * wait for in-progress pending->locked hand-overs
 *


[tip:locking/core] locking/qspinlock: Add pending bit

2015-05-08 Thread tip-bot for Peter Zijlstra (Intel)
Commit-ID:  c1fb159db9f2e50e0f4025bed92a67a6a7bfa7b7
Gitweb: http://git.kernel.org/tip/c1fb159db9f2e50e0f4025bed92a67a6a7bfa7b7
Author: Peter Zijlstra (Intel) 
AuthorDate: Fri, 24 Apr 2015 14:56:32 -0400
Committer:  Ingo Molnar 
CommitDate: Fri, 8 May 2015 12:36:32 +0200

locking/qspinlock: Add pending bit

Because the qspinlock needs to touch a second cacheline (the per-cpu
mcs_nodes[]), add a pending bit and allow a single in-word spinner
before we punt to the second cacheline.

It is possible to observe the pending bit without the locked bit when
the last owner has just released but the pending owner has not yet
taken ownership.

In this case we would normally queue -- because the pending bit is
already taken. However, in this case the pending bit is guaranteed
to be released 'soon', therefore wait for it and avoid queueing.
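
A condensed sketch, illustrative only, of the transient described
above: when the word shows pending set but locked clear, the pending
owner is about to take the lock, so a new contender spins briefly
instead of queueing; the DEMO_* constants mirror the bit layout in
the diff below:

#include <stdatomic.h>
#include <stdint.h>

#define DEMO_LOCKED_MASK        0x000000ffu     /* bits 0-7: locked byte */
#define DEMO_PENDING_MASK       0x00000100u     /* bit 8: pending        */

static uint32_t demo_wait_for_handover(_Atomic uint32_t *val)
{
        uint32_t v = atomic_load_explicit(val, memory_order_relaxed);

        /* pending set, locked clear: an in-word hand-over is in progress. */
        while ((v & (DEMO_PENDING_MASK | DEMO_LOCKED_MASK)) == DEMO_PENDING_MASK)
                v = atomic_load_explicit(val, memory_order_relaxed);

        return v;
}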

Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Waiman Long 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Andrew Morton 
Cc: Boris Ostrovsky 
Cc: Borislav Petkov 
Cc: Daniel J Blueman 
Cc: David Vrabel 
Cc: Douglas Hatch 
Cc: H. Peter Anvin 
Cc: Konrad Rzeszutek Wilk 
Cc: Linus Torvalds 
Cc: Oleg Nesterov 
Cc: Paolo Bonzini 
Cc: Paul E. McKenney 
Cc: Peter Zijlstra 
Cc: Raghavendra K T 
Cc: Rik van Riel 
Cc: Scott J Norton 
Cc: Thomas Gleixner 
Cc: virtualizat...@lists.linux-foundation.org
Cc: xen-de...@lists.xenproject.org
Link: http://lkml.kernel.org/r/1429901803-29771-4-git-send-email-waiman.l...@hp.com
Signed-off-by: Ingo Molnar 
---
 include/asm-generic/qspinlock_types.h |  12 +++-
 kernel/locking/qspinlock.c| 119 --
 2 files changed, 107 insertions(+), 24 deletions(-)

diff --git a/include/asm-generic/qspinlock_types.h b/include/asm-generic/qspinlock_types.h
index aec05c7..7ee6632 100644
--- a/include/asm-generic/qspinlock_types.h
+++ b/include/asm-generic/qspinlock_types.h
@@ -36,8 +36,9 @@ typedef struct qspinlock {
  * Bitfields in the atomic value:
  *
  *  0- 7: locked byte
- *  8- 9: tail index
- * 10-31: tail cpu (+1)
+ * 8: pending
+ *  9-10: tail index
+ * 11-31: tail cpu (+1)
  */
#define _Q_SET_MASK(type)   (((1U << _Q_ ## type ## _BITS) - 1)\
  << _Q_ ## type ## _OFFSET)
@@ -45,7 +46,11 @@ typedef struct qspinlock {
 #define _Q_LOCKED_BITS 8
 #define _Q_LOCKED_MASK _Q_SET_MASK(LOCKED)
 
-#define _Q_TAIL_IDX_OFFSET (_Q_LOCKED_OFFSET + _Q_LOCKED_BITS)
+#define _Q_PENDING_OFFSET  (_Q_LOCKED_OFFSET + _Q_LOCKED_BITS)
+#define _Q_PENDING_BITS    1
+#define _Q_PENDING_MASK    _Q_SET_MASK(PENDING)
+
+#define _Q_TAIL_IDX_OFFSET (_Q_PENDING_OFFSET + _Q_PENDING_BITS)
 #define _Q_TAIL_IDX_BITS   2
 #define _Q_TAIL_IDX_MASK   _Q_SET_MASK(TAIL_IDX)
 
@@ -54,5 +59,6 @@ typedef struct qspinlock {
 #define _Q_TAIL_CPU_MASK   _Q_SET_MASK(TAIL_CPU)
 
 #define _Q_LOCKED_VAL  (1U << _Q_LOCKED_OFFSET)
+#define _Q_PENDING_VAL (1U << _Q_PENDING_OFFSET)
 
 #endif /* __ASM_GENERIC_QSPINLOCK_TYPES_H */
diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
index 029b51c..af9c2ef 100644
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -94,24 +94,28 @@ static inline struct mcs_spinlock *decode_tail(u32 tail)
return per_cpu_ptr(&mcs_nodes[idx], cpu);
 }
 
+#define _Q_LOCKED_PENDING_MASK (_Q_LOCKED_MASK | _Q_PENDING_MASK)
+
 /**
  * queued_spin_lock_slowpath - acquire the queued spinlock
  * @lock: Pointer to queued spinlock structure
  * @val: Current value of the queued spinlock 32-bit word
  *
- * (queue tail, lock value)
- *
- *  fast  :slow  : unlock
- *:  :
- * uncontended  (0,0)   --:--> (0,1) :--> (*,0)
- *:   | ^./  :
- *:   v   \   |  :
- * uncontended:(n,x) --+--> (n,0) |  :
- *   queue:   | ^--'  |  :
- *:   v   |  :
- * contended  :(*,x) --+--> (*,0) -> (*,1) ---'  :
- *   queue: ^--' :
+ * (queue tail, pending bit, lock value)
  *
+ *  fast :slow  :unlock
+ *   :  :
+ * uncontended  (0,0,0) -:--> (0,0,1) --:--> (*,*,0)
+ *   :   | ^.--. /  :
+ *   :   v   \  \|  :
+ * pending   :(0,1,1) +--> (0,1,0)   \   |  :
+ *   :   | ^--'  |   |  :
+ *   :   v   |

[tip:locking/core] locking/pvqspinlock, x86: Implement the paravirt qspinlock call patching

2015-05-08 Thread tip-bot for Peter Zijlstra (Intel)
Commit-ID:  f233f7f1581e78fd9b4023f2e7d8c1ed89020cc9
Gitweb: http://git.kernel.org/tip/f233f7f1581e78fd9b4023f2e7d8c1ed89020cc9
Author: Peter Zijlstra (Intel) pet...@infradead.org
AuthorDate: Fri, 24 Apr 2015 14:56:38 -0400
Committer:  Ingo Molnar mi...@kernel.org
CommitDate: Fri, 8 May 2015 12:37:09 +0200

locking/pvqspinlock, x86: Implement the paravirt qspinlock call patching

We use the regular paravirt call patching to switch between:

  native_queued_spin_lock_slowpath()__pv_queued_spin_lock_slowpath()
  native_queued_spin_unlock()   __pv_queued_spin_unlock()

We use a callee saved call for the unlock function which reduces the
i-cache footprint and allows 'inlining' of SPIN_UNLOCK functions
again.

We further optimize the unlock path by patching the direct call with a
movb $0,%arg1 if we are indeed using the native unlock code. This
makes the unlock code almost as fast as the !PARAVIRT case.

This significantly lowers the overhead of having
CONFIG_PARAVIRT_SPINLOCKS enabled, even for native code.

Signed-off-by: Peter Zijlstra (Intel) pet...@infradead.org
Signed-off-by: Waiman Long waiman.l...@hp.com
Signed-off-by: Peter Zijlstra (Intel) pet...@infradead.org
Cc: Andrew Morton a...@linux-foundation.org
Cc: Boris Ostrovsky boris.ostrov...@oracle.com
Cc: Borislav Petkov b...@alien8.de
Cc: Daniel J Blueman dan...@numascale.com
Cc: David Vrabel david.vra...@citrix.com
Cc: Douglas Hatch doug.ha...@hp.com
Cc: H. Peter Anvin h...@zytor.com
Cc: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Cc: Linus Torvalds torva...@linux-foundation.org
Cc: Oleg Nesterov o...@redhat.com
Cc: Paolo Bonzini paolo.bonz...@gmail.com
Cc: Paul E. McKenney paul...@linux.vnet.ibm.com
Cc: Peter Zijlstra pet...@infradead.org
Cc: Raghavendra K T raghavendra...@linux.vnet.ibm.com
Cc: Rik van Riel r...@redhat.com
Cc: Scott J Norton scott.nor...@hp.com
Cc: Thomas Gleixner t...@linutronix.de
Cc: virtualizat...@lists.linux-foundation.org
Cc: xen-de...@lists.xenproject.org
Link: 
http://lkml.kernel.org/r/1429901803-29771-10-git-send-email-waiman.l...@hp.com
Signed-off-by: Ingo Molnar mi...@kernel.org
---
 arch/x86/Kconfig  |  2 +-
 arch/x86/include/asm/paravirt.h   | 29 -
 arch/x86/include/asm/paravirt_types.h | 10 ++
 arch/x86/include/asm/qspinlock.h  | 25 -
 arch/x86/include/asm/qspinlock_paravirt.h |  6 ++
 arch/x86/kernel/paravirt-spinlocks.c  | 24 +++-
 arch/x86/kernel/paravirt_patch_32.c   | 22 ++
 arch/x86/kernel/paravirt_patch_64.c   | 22 ++
 8 files changed, 128 insertions(+), 12 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 90b1b54..50ec043 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -667,7 +667,7 @@ config PARAVIRT_DEBUG
 config PARAVIRT_SPINLOCKS
bool Paravirtualization layer for spinlocks
depends on PARAVIRT  SMP
-   select UNINLINE_SPIN_UNLOCK
+   select UNINLINE_SPIN_UNLOCK if !QUEUED_SPINLOCK
---help---
  Paravirtualized spinlocks allow a pvops backend to replace the
  spinlock implementation with something virtualization-friendly
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 8957810..266c353 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -712,6 +712,31 @@ static inline void __set_fixmap(unsigned /* enum 
fixed_addresses */ idx,
 
 #if defined(CONFIG_SMP)  defined(CONFIG_PARAVIRT_SPINLOCKS)
 
+#ifdef CONFIG_QUEUED_SPINLOCK
+
+static __always_inline void pv_queued_spin_lock_slowpath(struct qspinlock 
*lock,
+   u32 val)
+{
+   PVOP_VCALL2(pv_lock_ops.queued_spin_lock_slowpath, lock, val);
+}
+
+static __always_inline void pv_queued_spin_unlock(struct qspinlock *lock)
+{
+   PVOP_VCALLEE1(pv_lock_ops.queued_spin_unlock, lock);
+}
+
+static __always_inline void pv_wait(u8 *ptr, u8 val)
+{
+   PVOP_VCALL2(pv_lock_ops.wait, ptr, val);
+}
+
+static __always_inline void pv_kick(int cpu)
+{
+   PVOP_VCALL1(pv_lock_ops.kick, cpu);
+}
+
+#else /* !CONFIG_QUEUED_SPINLOCK */
+
 static __always_inline void __ticket_lock_spinning(struct arch_spinlock *lock,
__ticket_t ticket)
 {
@@ -724,7 +749,9 @@ static __always_inline void __ticket_unlock_kick(struct 
arch_spinlock *lock,
PVOP_VCALL2(pv_lock_ops.unlock_kick, lock, ticket);
 }
 
-#endif
+#endif /* CONFIG_QUEUED_SPINLOCK */
+
+#endif /* SMP  PARAVIRT_SPINLOCKS */
 
 #ifdef CONFIG_X86_32
 #define PV_SAVE_REGS pushl %ecx; pushl %edx;
diff --git a/arch/x86/include/asm/paravirt_types.h 
b/arch/x86/include/asm/paravirt_types.h
index f7b0b5c..76cd684 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -333,9 +333,19 @@ struct arch_spinlock;
 typedef u16 __ticket_t;
 

[tip:locking/core] locking/qspinlock: Optimize for smaller NR_CPUS

2015-05-08 Thread tip-bot for Peter Zijlstra (Intel)
Commit-ID:  69f9cae90907e09af95fb991ed384670cef8dd32
Gitweb: http://git.kernel.org/tip/69f9cae90907e09af95fb991ed384670cef8dd32
Author: Peter Zijlstra (Intel) pet...@infradead.org
AuthorDate: Fri, 24 Apr 2015 14:56:34 -0400
Committer:  Ingo Molnar mi...@kernel.org
CommitDate: Fri, 8 May 2015 12:36:48 +0200

locking/qspinlock: Optimize for smaller NR_CPUS

When we allow for a max NR_CPUS  2^14 we can optimize the pending
wait-acquire and the xchg_tail() operations.

By growing the pending bit to a byte, we reduce the tail to 16bit.
This means we can use xchg16 for the tail part and do away with all
the repeated compxchg() operations.

This in turn allows us to unconditionally acquire; the locked state
as observed by the wait loops cannot change. And because both locked
and pending are now a full byte we can use simple stores for the
state transition, obviating one atomic operation entirely.

This optimization is needed to make the qspinlock achieve performance
parity with ticket spinlock at light load.

All this is horribly broken on Alpha pre EV56 (and any other arch that
cannot do single-copy atomic byte stores).

Signed-off-by: Peter Zijlstra (Intel) pet...@infradead.org
Signed-off-by: Waiman Long waiman.l...@hp.com
Signed-off-by: Peter Zijlstra (Intel) pet...@infradead.org
Cc: Andrew Morton a...@linux-foundation.org
Cc: Boris Ostrovsky boris.ostrov...@oracle.com
Cc: Borislav Petkov b...@alien8.de
Cc: Daniel J Blueman dan...@numascale.com
Cc: David Vrabel david.vra...@citrix.com
Cc: Douglas Hatch doug.ha...@hp.com
Cc: H. Peter Anvin h...@zytor.com
Cc: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Cc: Linus Torvalds torva...@linux-foundation.org
Cc: Oleg Nesterov o...@redhat.com
Cc: Paolo Bonzini paolo.bonz...@gmail.com
Cc: Paul E. McKenney paul...@linux.vnet.ibm.com
Cc: Peter Zijlstra pet...@infradead.org
Cc: Raghavendra K T raghavendra...@linux.vnet.ibm.com
Cc: Rik van Riel r...@redhat.com
Cc: Scott J Norton scott.nor...@hp.com
Cc: Thomas Gleixner t...@linutronix.de
Cc: virtualizat...@lists.linux-foundation.org
Cc: xen-de...@lists.xenproject.org
Link: 
http://lkml.kernel.org/r/1429901803-29771-6-git-send-email-waiman.l...@hp.com
Signed-off-by: Ingo Molnar mi...@kernel.org
---
 include/asm-generic/qspinlock_types.h | 13 +++
 kernel/locking/qspinlock.c| 69 ++-
 2 files changed, 81 insertions(+), 1 deletion(-)

diff --git a/include/asm-generic/qspinlock_types.h 
b/include/asm-generic/qspinlock_types.h
index 3a7f671..85f888e 100644
--- a/include/asm-generic/qspinlock_types.h
+++ b/include/asm-generic/qspinlock_types.h
@@ -35,6 +35,14 @@ typedef struct qspinlock {
 /*
  * Bitfields in the atomic value:
  *
+ * When NR_CPUS  16K
+ *  0- 7: locked byte
+ * 8: pending
+ *  9-15: not used
+ * 16-17: tail index
+ * 18-31: tail cpu (+1)
+ *
+ * When NR_CPUS = 16K
  *  0- 7: locked byte
  * 8: pending
  *  9-10: tail index
@@ -47,7 +55,11 @@ typedef struct qspinlock {
 #define _Q_LOCKED_MASK _Q_SET_MASK(LOCKED)
 
 #define _Q_PENDING_OFFSET  (_Q_LOCKED_OFFSET + _Q_LOCKED_BITS)
+#if CONFIG_NR_CPUS  (1U  14)
+#define _Q_PENDING_BITS8
+#else
 #define _Q_PENDING_BITS1
+#endif
 #define _Q_PENDING_MASK_Q_SET_MASK(PENDING)
 
 #define _Q_TAIL_IDX_OFFSET (_Q_PENDING_OFFSET + _Q_PENDING_BITS)
@@ -58,6 +70,7 @@ typedef struct qspinlock {
 #define _Q_TAIL_CPU_BITS   (32 - _Q_TAIL_CPU_OFFSET)
 #define _Q_TAIL_CPU_MASK   _Q_SET_MASK(TAIL_CPU)
 
+#define _Q_TAIL_OFFSET _Q_TAIL_IDX_OFFSET
 #define _Q_TAIL_MASK   (_Q_TAIL_IDX_MASK | _Q_TAIL_CPU_MASK)
 
 #define _Q_LOCKED_VAL  (1U  _Q_LOCKED_OFFSET)
diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
index 82bb4a9..e17efe7 100644
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -24,6 +24,7 @@
 #include linux/percpu.h
 #include linux/hardirq.h
 #include linux/mutex.h
+#include asm/byteorder.h
 #include asm/qspinlock.h
 
 /*
@@ -56,6 +57,10 @@
  * node; whereby avoiding the need to carry a node from lock to unlock, and
  * preserving existing lock API. This also makes the unlock code simpler and
  * faster.
+ *
+ * N.B. The current implementation only supports architectures that allow
+ *  atomic operations on smaller 8-bit and 16-bit data types.
+ *
  */
 
 #include mcs_spinlock.h
@@ -96,6 +101,62 @@ static inline struct mcs_spinlock *decode_tail(u32 tail)
 
 #define _Q_LOCKED_PENDING_MASK (_Q_LOCKED_MASK | _Q_PENDING_MASK)
 
+/*
+ * By using the whole 2nd least significant byte for the pending bit, we
+ * can allow better optimization of the lock acquisition for the pending
+ * bit holder.
+ */
+#if _Q_PENDING_BITS == 8
+
+struct __qspinlock {
+   union {
+   atomic_t val;
+   struct {
+#ifdef __LITTLE_ENDIAN
+   u16 locked_pending;
+   u16 tail;
+#else
+   u16 tail;
+ 

[tip:locking/core] locking/qspinlock: Revert to test-and-set on hypervisors

2015-05-08 Thread tip-bot for Peter Zijlstra (Intel)
Commit-ID:  2aa79af64263190eec610422b07f60e99a7d230a
Gitweb: http://git.kernel.org/tip/2aa79af64263190eec610422b07f60e99a7d230a
Author: Peter Zijlstra (Intel) pet...@infradead.org
AuthorDate: Fri, 24 Apr 2015 14:56:36 -0400
Committer:  Ingo Molnar mi...@kernel.org
CommitDate: Fri, 8 May 2015 12:36:58 +0200

locking/qspinlock: Revert to test-and-set on hypervisors

When we detect a hypervisor (!paravirt, see qspinlock paravirt support
patches), revert to a simple test-and-set lock to avoid the horrors
of queue preemption.

Signed-off-by: Peter Zijlstra (Intel) pet...@infradead.org
Signed-off-by: Waiman Long waiman.l...@hp.com
Signed-off-by: Peter Zijlstra (Intel) pet...@infradead.org
Cc: Andrew Morton a...@linux-foundation.org
Cc: Boris Ostrovsky boris.ostrov...@oracle.com
Cc: Borislav Petkov b...@alien8.de
Cc: Daniel J Blueman dan...@numascale.com
Cc: David Vrabel david.vra...@citrix.com
Cc: Douglas Hatch doug.ha...@hp.com
Cc: H. Peter Anvin h...@zytor.com
Cc: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Cc: Linus Torvalds torva...@linux-foundation.org
Cc: Oleg Nesterov o...@redhat.com
Cc: Paolo Bonzini paolo.bonz...@gmail.com
Cc: Paul E. McKenney paul...@linux.vnet.ibm.com
Cc: Peter Zijlstra pet...@infradead.org
Cc: Raghavendra K T raghavendra...@linux.vnet.ibm.com
Cc: Rik van Riel r...@redhat.com
Cc: Scott J Norton scott.nor...@hp.com
Cc: Thomas Gleixner t...@linutronix.de
Cc: virtualizat...@lists.linux-foundation.org
Cc: xen-de...@lists.xenproject.org
Link: 
http://lkml.kernel.org/r/1429901803-29771-8-git-send-email-waiman.l...@hp.com
Signed-off-by: Ingo Molnar mi...@kernel.org
---
 arch/x86/include/asm/qspinlock.h | 14 ++
 include/asm-generic/qspinlock.h  |  7 +++
 kernel/locking/qspinlock.c   |  3 +++
 3 files changed, 24 insertions(+)

diff --git a/arch/x86/include/asm/qspinlock.h b/arch/x86/include/asm/qspinlock.h
index e2aee82..f079b70 100644
--- a/arch/x86/include/asm/qspinlock.h
+++ b/arch/x86/include/asm/qspinlock.h
@@ -1,6 +1,7 @@
 #ifndef _ASM_X86_QSPINLOCK_H
 #define _ASM_X86_QSPINLOCK_H
 
+#include <asm/cpufeature.h>
 #include <asm-generic/qspinlock_types.h>
 
 #define queued_spin_unlock queued_spin_unlock
@@ -15,6 +16,19 @@ static inline void queued_spin_unlock(struct qspinlock *lock)
smp_store_release((u8 *)lock, 0);
 }
 
+#define virt_queued_spin_lock virt_queued_spin_lock
+
+static inline bool virt_queued_spin_lock(struct qspinlock *lock)
+{
+   if (!static_cpu_has(X86_FEATURE_HYPERVISOR))
+   return false;
+
+   while (atomic_cmpxchg(&lock->val, 0, _Q_LOCKED_VAL) != 0)
+   cpu_relax();
+
+   return true;
+}
+
 #include <asm-generic/qspinlock.h>
 
 #endif /* _ASM_X86_QSPINLOCK_H */
diff --git a/include/asm-generic/qspinlock.h b/include/asm-generic/qspinlock.h
index 569abcd..83bfb87 100644
--- a/include/asm-generic/qspinlock.h
+++ b/include/asm-generic/qspinlock.h
@@ -111,6 +111,13 @@ static inline void queued_spin_unlock_wait(struct qspinlock *lock)
cpu_relax();
 }
 
+#ifndef virt_queued_spin_lock
+static __always_inline bool virt_queued_spin_lock(struct qspinlock *lock)
+{
+   return false;
+}
+#endif
+
 /*
  * Initializier
  */
diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
index 0338721..fd31a47 100644
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -249,6 +249,9 @@ void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
 
 	BUILD_BUG_ON(CONFIG_NR_CPUS >= (1U << _Q_TAIL_CPU_BITS));
 
+   if (virt_queued_spin_lock(lock))
+   return;
+
/*
 * wait for in-progress pending-locked hand-overs
 *
--


[tip:locking/core] locking/qspinlock: Add pending bit

2015-05-08 Thread tip-bot for Peter Zijlstra (Intel)
Commit-ID:  c1fb159db9f2e50e0f4025bed92a67a6a7bfa7b7
Gitweb: http://git.kernel.org/tip/c1fb159db9f2e50e0f4025bed92a67a6a7bfa7b7
Author: Peter Zijlstra (Intel) pet...@infradead.org
AuthorDate: Fri, 24 Apr 2015 14:56:32 -0400
Committer:  Ingo Molnar mi...@kernel.org
CommitDate: Fri, 8 May 2015 12:36:32 +0200

locking/qspinlock: Add pending bit

Because the qspinlock needs to touch a second cacheline (the per-cpu
mcs_nodes[]); add a pending bit and allow a single in-word spinner
before we punt to the second cacheline.

It is possible to observe the pending bit without the locked bit when
the last owner has just released but the pending owner has not yet
taken ownership.

In this case we would normally queue -- because the pending bit is
already taken. However, in this case the pending bit is guaranteed
to be released 'soon', therefore wait for it and avoid queueing.
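
As a rough illustration of the pending-bit idea (not the kernel implementation), a userspace
sketch with C11 stdatomic; the bit positions mirror _Q_LOCKED_OFFSET/_Q_PENDING_OFFSET, the
queue fallback is only hinted at, and all names here are made up for the example:

#include <stdatomic.h>
#include <stdbool.h>

#define LOCKED_VAL	(1U << 0)	/* like _Q_LOCKED_VAL */
#define PENDING_VAL	(1U << 8)	/* like _Q_PENDING_VAL */

/* Try the in-word pending path; returns false when the caller must queue. */
static bool pending_trylock(atomic_uint *lock)
{
	unsigned int val = atomic_fetch_or(lock, PENDING_VAL);

	if (val & PENDING_VAL)
		return false;		/* someone else is already the pending spinner */

	/* Wait for the current owner to drop the locked byte ... */
	while (atomic_load(lock) & LOCKED_VAL)
		;			/* cpu_relax() in the kernel */

	/* ... then take ownership: clear PENDING and set LOCKED in one atomic op. */
	atomic_fetch_sub(lock, PENDING_VAL - LOCKED_VAL);
	return true;
}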

Signed-off-by: Peter Zijlstra (Intel) pet...@infradead.org
Signed-off-by: Waiman Long waiman.l...@hp.com
Signed-off-by: Peter Zijlstra (Intel) pet...@infradead.org
Cc: Andrew Morton a...@linux-foundation.org
Cc: Boris Ostrovsky boris.ostrov...@oracle.com
Cc: Borislav Petkov b...@alien8.de
Cc: Daniel J Blueman dan...@numascale.com
Cc: David Vrabel david.vra...@citrix.com
Cc: Douglas Hatch doug.ha...@hp.com
Cc: H. Peter Anvin h...@zytor.com
Cc: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Cc: Linus Torvalds torva...@linux-foundation.org
Cc: Oleg Nesterov o...@redhat.com
Cc: Paolo Bonzini paolo.bonz...@gmail.com
Cc: Paul E. McKenney paul...@linux.vnet.ibm.com
Cc: Peter Zijlstra pet...@infradead.org
Cc: Raghavendra K T raghavendra...@linux.vnet.ibm.com
Cc: Rik van Riel r...@redhat.com
Cc: Scott J Norton scott.nor...@hp.com
Cc: Thomas Gleixner t...@linutronix.de
Cc: virtualizat...@lists.linux-foundation.org
Cc: xen-de...@lists.xenproject.org
Link: http://lkml.kernel.org/r/1429901803-29771-4-git-send-email-waiman.l...@hp.com
Signed-off-by: Ingo Molnar mi...@kernel.org
---
 include/asm-generic/qspinlock_types.h |  12 +++-
 kernel/locking/qspinlock.c| 119 --
 2 files changed, 107 insertions(+), 24 deletions(-)

diff --git a/include/asm-generic/qspinlock_types.h b/include/asm-generic/qspinlock_types.h
index aec05c7..7ee6632 100644
--- a/include/asm-generic/qspinlock_types.h
+++ b/include/asm-generic/qspinlock_types.h
@@ -36,8 +36,9 @@ typedef struct qspinlock {
  * Bitfields in the atomic value:
  *
  *  0- 7: locked byte
- *  8- 9: tail index
- * 10-31: tail cpu (+1)
+ * 8: pending
+ *  9-10: tail index
+ * 11-31: tail cpu (+1)
  */
 #define _Q_SET_MASK(type)  (((1U << _Q_ ## type ## _BITS) - 1)\
                                      << _Q_ ## type ## _OFFSET)
@@ -45,7 +46,11 @@ typedef struct qspinlock {
 #define _Q_LOCKED_BITS 8
 #define _Q_LOCKED_MASK _Q_SET_MASK(LOCKED)
 
-#define _Q_TAIL_IDX_OFFSET (_Q_LOCKED_OFFSET + _Q_LOCKED_BITS)
+#define _Q_PENDING_OFFSET  (_Q_LOCKED_OFFSET + _Q_LOCKED_BITS)
+#define _Q_PENDING_BITS1
+#define _Q_PENDING_MASK_Q_SET_MASK(PENDING)
+
+#define _Q_TAIL_IDX_OFFSET (_Q_PENDING_OFFSET + _Q_PENDING_BITS)
 #define _Q_TAIL_IDX_BITS   2
 #define _Q_TAIL_IDX_MASK   _Q_SET_MASK(TAIL_IDX)
 
@@ -54,5 +59,6 @@ typedef struct qspinlock {
 #define _Q_TAIL_CPU_MASK   _Q_SET_MASK(TAIL_CPU)
 
 #define _Q_LOCKED_VAL  (1U << _Q_LOCKED_OFFSET)
+#define _Q_PENDING_VAL (1U << _Q_PENDING_OFFSET)
 
 #endif /* __ASM_GENERIC_QSPINLOCK_TYPES_H */
diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
index 029b51c..af9c2ef 100644
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -94,24 +94,28 @@ static inline struct mcs_spinlock *decode_tail(u32 tail)
return per_cpu_ptr(mcs_nodes[idx], cpu);
 }
 
+#define _Q_LOCKED_PENDING_MASK (_Q_LOCKED_MASK | _Q_PENDING_MASK)
+
 /**
  * queued_spin_lock_slowpath - acquire the queued spinlock
  * @lock: Pointer to queued spinlock structure
  * @val: Current value of the queued spinlock 32-bit word
  *
- * (queue tail, lock value)
- *
- *              fast      :    slow                                  :    unlock
- *                        :                                          :
- * uncontended  (0,0)   --:--> (0,1) --------------------------------:--> (*,0)
- *                        :       | ^--------.                    /  :
- *                        :       v           \                   |  :
- * uncontended            :    (n,x) --+--> (n,0)                 |  :
- *   queue                :       | ^--'                          |  :
- *                        :       v                               |  :
- * contended              :    (*,x) --+--> (*,0) -----> (*,1) ---'  :
- *   queue                :         ^--'                             :
+ * (queue tail, pending bit, lock value)
  *
+ *              fast     :    slow                                  :    unlock
+ *                       :                                          :

[tip:perf/core] perf: Fix move_group() order

2015-02-04 Thread tip-bot for Peter Zijlstra (Intel)
Commit-ID:  8f95b435b62522aed3381aaea920de8d09ccabf3
Gitweb: http://git.kernel.org/tip/8f95b435b62522aed3381aaea920de8d09ccabf3
Author: Peter Zijlstra (Intel) 
AuthorDate: Tue, 27 Jan 2015 11:53:12 +0100
Committer:  Ingo Molnar 
CommitDate: Wed, 4 Feb 2015 08:07:11 +0100

perf: Fix move_group() order

Jiri reported triggering the new WARN_ON_ONCE in event_sched_out over
the weekend:

  event_sched_out.isra.79+0x2b9/0x2d0
  group_sched_out+0x69/0xc0
  ctx_sched_out+0x106/0x130
  task_ctx_sched_out+0x37/0x70
  __perf_install_in_context+0x70/0x1a0
  remote_function+0x48/0x60
  generic_exec_single+0x15b/0x1d0
  smp_call_function_single+0x67/0xa0
  task_function_call+0x53/0x80
  perf_install_in_context+0x8b/0x110

I think the below should cure this; if we install a group leader it
will iterate the (still intact) group list and find its siblings and
try and install those too -- even though those still have the old
event->ctx -- in the new ctx.

Upon installing the first group sibling we'd try and schedule out the
group and trigger the above warn.

Fix this by installing the group leader last, installing siblings
would have no effect, they're not reachable through the group lists
and therefore we don't schedule them.

Also delay resetting the state until we're absolutely sure the events
are quiescent.
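
The resulting ordering can be pictured as a two-pass install, siblings first and the leader
only afterwards; the sketch below is a hypothetical, stripped-down rendering of that pattern
(the 'entry' type and install() helper are invented for the example, not perf API):

struct entry {
	struct entry *leader;	/* an entry is a leader when leader == itself */
};

static void install(struct entry *e)
{
	(void)e;		/* placeholder for perf_install_in_context()-like work */
}

static void install_all(struct entry **list, int n)
{
	int i;

	/* pass 1: siblings only -- installing them cannot drag anything else along */
	for (i = 0; i < n; i++)
		if (list[i]->leader != list[i])
			install(list[i]);

	/* pass 2: the leaders, now that their siblings are already in place */
	for (i = 0; i < n; i++)
		if (list[i]->leader == list[i])
			install(list[i]);
}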

Reported-by: Jiri Olsa 
Reported-by: vincent.wea...@maine.edu
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Arnaldo Carvalho de Melo 
Cc: Linus Torvalds 
Link: http://lkml.kernel.org/r/20150126162639.ga21...@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar 
---
 kernel/events/core.c | 56 +++-
 1 file changed, 47 insertions(+), 9 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 417a96b..142dbabc 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7645,16 +7645,9 @@ SYSCALL_DEFINE5(perf_event_open,
 
perf_remove_from_context(group_leader, false);
 
-   /*
-* Removing from the context ends up with disabled
-* event. What we want here is event in the initial
-* startup state, ready to be add into new context.
-*/
-   perf_event__state_init(group_leader);
		list_for_each_entry(sibling, &group_leader->sibling_list,
group_entry) {
perf_remove_from_context(sibling, false);
-   perf_event__state_init(sibling);
put_ctx(gctx);
}
} else {
@@ -7670,13 +7663,31 @@ SYSCALL_DEFINE5(perf_event_open,
 */
synchronize_rcu();
 
-   perf_install_in_context(ctx, group_leader, group_leader->cpu);
-   get_ctx(ctx);
+   /*
+* Install the group siblings before the group leader.
+*
+* Because a group leader will try and install the entire group
+* (through the sibling list, which is still in-tact), we can
+* end up with siblings installed in the wrong context.
+*
+* By installing siblings first we NO-OP because they're not
+* reachable through the group lists.
+*/
		list_for_each_entry(sibling, &group_leader->sibling_list,
group_entry) {
+   perf_event__state_init(sibling);
perf_install_in_context(ctx, sibling, sibling->cpu);
get_ctx(ctx);
}
+
+   /*
+* Removing from the context ends up with disabled
+* event. What we want here is event in the initial
+* startup state, ready to be add into new context.
+*/
+   perf_event__state_init(group_leader);
+   perf_install_in_context(ctx, group_leader, group_leader->cpu);
+   get_ctx(ctx);
}
 
perf_install_in_context(ctx, event, event->cpu);
@@ -7806,8 +7817,35 @@ void perf_pmu_migrate_context(struct pmu *pmu, int src_cpu, int dst_cpu)
		list_add(&event->migrate_entry, &events);
}
 
+   /*
+* Wait for the events to quiesce before re-instating them.
+*/
synchronize_rcu();
 
+   /*
+* Re-instate events in 2 passes.
+*
+* Skip over group leaders and only install siblings on this first
+* pass, siblings will not get enabled without a leader, however a
+* leader will enable its siblings, even if those are still on the old
+* context.
+*/
+	list_for_each_entry_safe(event, tmp, &events, migrate_entry) {
+   if (event->group_leader == event)
+   continue;
+
+   list_del(>migrate_entry);
+   if (event->state >= PERF_EVENT_STATE_OFF)
+   

[tip:perf/core] perf: Avoid horrible stack usage

2015-01-14 Thread tip-bot for Peter Zijlstra (Intel)
Commit-ID:  86038c5ea81b519a8a1fcfcd5e4599aab0cdd119
Gitweb: http://git.kernel.org/tip/86038c5ea81b519a8a1fcfcd5e4599aab0cdd119
Author: Peter Zijlstra (Intel) 
AuthorDate: Tue, 16 Dec 2014 12:47:34 +0100
Committer:  Ingo Molnar 
CommitDate: Wed, 14 Jan 2015 15:11:45 +0100

perf: Avoid horrible stack usage

Both Linus (most recent) and Steve (a while ago) reported that perf
related callbacks have massive stack bloat.

The problem is that software events need a pt_regs in order to
properly report the event location and unwind stack. And because we
could not assume one was present we allocated one on stack and filled
it with minimal bits required for operation.

Now, pt_regs is quite large, so this is undesirable. Furthermore it
turns out that most sites actually have a pt_regs pointer available,
making this even more onerous, as the stack space is pointless waste.

This patch addresses the problem by observing that software events
have well defined nesting semantics, therefore we can use static
per-cpu storage instead of on-stack.

Linus made the further observation that all but the scheduler callers
of perf_sw_event() have a pt_regs available, so we change the regular
perf_sw_event() to require a valid pt_regs (where it used to be
optional) and add perf_sw_event_sched() for the scheduler.

We have a scheduler specific call instead of a more generic _noregs()
like construct because we can assume non-recursion from the scheduler
and thereby simplify the code further (_noregs would have to put the
recursion context call inline in order to ascertain which __perf_regs
element to use).

One last note on the implementation of perf_trace_buf_prepare(); we
allow .regs = NULL for those cases where we already have a pt_regs
pointer available and do not need another.
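
A loose userspace analogue of the storage scheme (thread-local storage standing in for per-CPU
data, a made-up fake_regs type standing in for pt_regs, nesting tracked by a simple counter)
looks roughly like this:

#include <stddef.h>

struct fake_regs { unsigned long ip, sp, flags; };	/* stand-in for pt_regs */

#define MAX_NESTING 4					/* mirrors __perf_regs[4] */

static _Thread_local struct fake_regs regs_storage[MAX_NESTING];
static _Thread_local int nesting;

/* one preallocated slot per nesting level -- no large on-stack object needed */
static struct fake_regs *get_event_regs(void)
{
	return &regs_storage[nesting];
}

static void event_enter(void) { nesting++; }
static void event_exit(void)  { nesting--; }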

Reported-by: Linus Torvalds 
Reported-by: Steven Rostedt 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Arnaldo Carvalho de Melo 
Cc: Javi Merino 
Cc: Linus Torvalds 
Cc: Mathieu Desnoyers 
Cc: Oleg Nesterov 
Cc: Paul Mackerras 
Cc: Petr Mladek 
Cc: Steven Rostedt 
Cc: Tom Zanussi 
Cc: Vaibhav Nagarnaik 
Link: http://lkml.kernel.org/r/20141216115041.gw3...@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar 
---
 include/linux/ftrace_event.h|  2 +-
 include/linux/perf_event.h  | 28 +---
 include/trace/ftrace.h  |  7 ---
 kernel/events/core.c| 23 +--
 kernel/sched/core.c |  2 +-
 kernel/trace/trace_event_perf.c |  4 +++-
 kernel/trace/trace_kprobe.c |  4 ++--
 kernel/trace/trace_syscalls.c   |  4 ++--
 kernel/trace/trace_uprobe.c |  2 +-
 9 files changed, 52 insertions(+), 24 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 0bebb5c..d36f68b 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -595,7 +595,7 @@ extern int  ftrace_profile_set_filter(struct perf_event *event, int event_id,
 char *filter_str);
 extern void ftrace_profile_free_filter(struct perf_event *event);
 extern void *perf_trace_buf_prepare(int size, unsigned short type,
-   struct pt_regs *regs, int *rctxp);
+   struct pt_regs **regs, int *rctxp);
 
 static inline void
 perf_trace_buf_submit(void *raw_data, int size, int rctx, u64 addr,
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 4f7a61c..3a7bd80 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -665,6 +665,7 @@ static inline int is_software_event(struct perf_event *event)
 
 extern struct static_key perf_swevent_enabled[PERF_COUNT_SW_MAX];
 
+extern void ___perf_sw_event(u32, u64, struct pt_regs *, u64);
 extern void __perf_sw_event(u32, u64, struct pt_regs *, u64);
 
 #ifndef perf_arch_fetch_caller_regs
@@ -689,14 +690,25 @@ static inline void perf_fetch_caller_regs(struct pt_regs *regs)
 static __always_inline void
 perf_sw_event(u32 event_id, u64 nr, struct pt_regs *regs, u64 addr)
 {
-   struct pt_regs hot_regs;
+	if (static_key_false(&perf_swevent_enabled[event_id]))
+   __perf_sw_event(event_id, nr, regs, addr);
+}
+
+DECLARE_PER_CPU(struct pt_regs, __perf_regs[4]);
 
+/*
+ * 'Special' version for the scheduler, it hard assumes no recursion,
+ * which is guaranteed by us not actually scheduling inside other swevents
+ * because those disable preemption.
+ */
+static __always_inline void
+perf_sw_event_sched(u32 event_id, u64 nr, u64 addr)
+{
 	if (static_key_false(&perf_swevent_enabled[event_id])) {
-   if (!regs) {
-		perf_fetch_caller_regs(&hot_regs);
-		regs = &hot_regs;
-   }
-   __perf_sw_event(event_id, nr, regs, addr);
+   struct pt_regs *regs = this_cpu_ptr(&__perf_regs[0]);
+
+   perf_fetch_caller_regs(regs);
+   ___perf_sw_event(event_id, nr, regs, addr);
}
 

[tip:perf/urgent] perf/x86: Fix embarrasing typo

2014-11-04 Thread tip-bot for Peter Zijlstra (Intel)
Commit-ID:  ce5686d4ed12158599d2042a6c8659254ed263ce
Gitweb: http://git.kernel.org/tip/ce5686d4ed12158599d2042a6c8659254ed263ce
Author: Peter Zijlstra (Intel) 
AuthorDate: Wed, 29 Oct 2014 11:17:04 +0100
Committer:  Ingo Molnar 
CommitDate: Tue, 4 Nov 2014 07:06:58 +0100

perf/x86: Fix embarrasing typo

Because we're all human and typing sucks..

Fixes: 7fb0f1de49fc ("perf/x86: Fix compile warnings for intel_uncore")
Reported-by: Andi Kleen 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Linus Torvalds 
Cc: x...@kernel.org
Link: http://lkml.kernel.org/n/tip-be0bftjh8yfm4uvmvtf3y...@git.kernel.org
Signed-off-by: Ingo Molnar 
---
 arch/x86/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index ded8a67..41a503c 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -144,7 +144,7 @@ config INSTRUCTION_DECODER
 
 config PERF_EVENTS_INTEL_UNCORE
def_bool y
-   depends on PERF_EVENTS && SUP_SUP_INTEL && PCI
+   depends on PERF_EVENTS && CPU_SUP_INTEL && PCI
 
 config OUTPUT_FORMAT
string
--


[tip:perf/urgent] perf: Fix bogus kernel printk

2014-10-28 Thread tip-bot for Peter Zijlstra (Intel)
Commit-ID:  65d71fe1375b973083733294795bf2b09d45b3c2
Gitweb: http://git.kernel.org/tip/65d71fe1375b973083733294795bf2b09d45b3c2
Author: Peter Zijlstra (Intel) 
AuthorDate: Tue, 7 Oct 2014 19:07:33 +0200
Committer:  Ingo Molnar 
CommitDate: Tue, 28 Oct 2014 10:51:01 +0100

perf: Fix bogus kernel printk

Andy spotted the fail in what was intended as a conditional printk level.
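
The underlying C gotcha: adjacent string literals concatenate before the ?: operator is
evaluated, so the chosen level never prefixed the message as intended -- when the first branch
was taken, the format string was only the bare level marker. A small standalone illustration
(plain printf, not the kernel printk):

#include <stdio.h>

#define LVL_INFO "<6>"
#define LVL_ERR  "<3>"

int main(void)
{
	int quiet = 1;

	/* BUG: parses as  quiet ? "<6>" : ("<3>" "message %d\n")  */
	printf(quiet ? LVL_INFO : LVL_ERR "message %d\n", 42);

	/* FIX: pick the level at runtime via %s and keep a single format string */
	printf("%smessage %d\n", quiet ? LVL_INFO : LVL_ERR, 42);
	return 0;
}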

Reported-by: Andy Lutomirski 
Fixes: cc6cd47e7395 ("perf/x86: Tone down kernel messages when the PMU check fails in a virtual environment")
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Arnaldo Carvalho de Melo 
Cc: Linus Torvalds 
Link: http://lkml.kernel.org/r/20141007124757.gh19...@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar 
---
 arch/x86/kernel/cpu/perf_event.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 1b8299d..66451a6 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -243,8 +243,9 @@ static bool check_hw_exists(void)
 
 msr_fail:
printk(KERN_CONT "Broken PMU hardware detected, using software events 
only.\n");
-   printk(boot_cpu_has(X86_FEATURE_HYPERVISOR) ? KERN_INFO : KERN_ERR
-  "Failed to access perfctr msr (MSR %x is %Lx)\n", reg, val_new);
+   printk("%sFailed to access perfctr msr (MSR %x is %Lx)\n",
+   boot_cpu_has(X86_FEATURE_HYPERVISOR) ? KERN_INFO : KERN_ERR,
+   reg, val_new);
 
return false;
 }
--

