[PATCH] powerpc/e200: Skip tlb1 entries used for kernel mapping

2018-07-24 Thread Bharat Bhushan
E200 has only TLB1; it does not have TLB0.
TLB1 is therefore used for both kernel and user-space
mappings. The TLB miss handler for E200 does not skip
the TLB entries used for the kernel mapping. This patch
ensures that we skip tlb1 entries used for the kernel
mapping (tlbcam_index).
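
The selection logic added below can be summarized in C. A minimal
sketch, assuming nv is MAS0(NV), nentry is TLB1CFG(NENTRY) and
tlbcam_index is the first entry not used by the kernel mapping
(illustrative only, not kernel code):

	/*
	 * Entries [0, tlbcam_index) hold the kernel mapping and must
	 * never be chosen as the round-robin victim.
	 */
	static unsigned int pick_tlb1_victim(unsigned int *nv,
					     unsigned int tlbcam_index,
					     unsigned int nentry)
	{
		unsigned int victim;

		if (*nv < tlbcam_index)		/* would evict a kernel entry */
			*nv = tlbcam_index;	/* skip to first free entry */

		victim = *nv;
		*nv = victim + 1;		/* NV for the next TLB miss */
		if (*nv >= nentry)		/* wrap; skipped again on next miss */
			*nv = 0;
		return victim;
	}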

Signed-off-by: Bharat Bhushan 
---
 arch/powerpc/kernel/head_fsl_booke.S | 20 ++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/head_fsl_booke.S b/arch/powerpc/kernel/head_fsl_booke.S
index bf4c602..951fb96 100644
--- a/arch/powerpc/kernel/head_fsl_booke.S
+++ b/arch/powerpc/kernel/head_fsl_booke.S
@@ -801,12 +801,28 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_BIG_PHYS)
/* Round robin TLB1 entries assignment */
mfspr   r12, SPRN_MAS0
 
+   /* Get first free tlbcam entry */
+   lis r11, tlbcam_index@ha
+   lwz r11, tlbcam_index@l(r11)
+
+   /* Extract MAS0(NV) */
+   andi.   r13, r12, 0xfff
+   cmpw    0, r13, r11
+   blt 0, 5f
+   b   6f
+5:
+   /* When NV is less than first free tlbcam entry, use first free
+* tlbcam entry for ESEL and set NV */
+   rlwimi  r12, r11, 16, 4, 15
+   addi    r11, r11, 1
+   rlwimi  r12, r11, 0, 20, 31
+   b   7f
+6:
/* Extract TLB1CFG(NENTRY) */
mfspr   r11, SPRN_TLB1CFG
andi.   r11, r11, 0xfff
 
-   /* Extract MAS0(NV) */
-   andi.   r13, r12, 0xfff
+   /* Set MAS0(NV) for next TLB miss exception */
addi    r13, r13, 1
cmpw    0, r13, r11
addi    r12, r12, 1
-- 
1.9.3



[PATCH v4 0/2] powerpc: Detection and scheduler optimization for POWER9 bigcore

2018-07-24 Thread Gautham R. Shenoy
From: "Gautham R. Shenoy" 

Hi,

This is the fourth iteration of the patchset to add support for
big-core on POWER9.

The previous versions can be found here:

v3: https://lkml.org/lkml/2018/7/6/255
v2: https://lkml.org/lkml/2018/7/3/401
v1: https://lkml.org/lkml/2018/5/11/245

Changes :
v3 --> v4:
   - Build fix for powerpc-g5 : Enable CPU_FTR_ASYM_SMT only on
 CONFIG_PPC_POWERNV and CONFIG_PPC_PSERIES.
   - Fixed a minor error in the ABI description.

v2 --> v3
- Set sane values in the tg->property, tg->nr_groups inside
parse_thread_groups before returning due to an error.
- Define a helper function to determine whether a CPU device node
  is a big-core or not.
- Updated the comments around the functions to describe the
  arguments passed to them.

v1 --> v2
- Added comments explaining the "ibm,thread-groups" device tree property.
- Uses cleaner device-tree parsing functions to parse the u32 arrays.
- Adds a sysfs file listing the small-core siblings for every CPU.
- Enables the scheduler optimization by setting the CPU_FTR_ASYM_SMT bit
  in the cur_cpu_spec->cpu_features on detecting the presence
  of interleaved big-core.
- Handles the corner case where there is only a single thread-group
  or when there is a single thread in a thread-group.

Description:

A pair of IBM POWER9 SMT4 cores can be fused together to form a
big-core with 8 SMT threads. This can be discovered via the
"ibm,thread-groups" CPU property in the device tree, which
indicates which groups of threads share the L1 cache, translation
cache and instruction data flow.  If there are multiple such groups
of threads, then the core is a big-core. Furthermore, the thread-ids
of such a big-core are obtained by interleaving the thread-ids of the
component SMT4 cores.

Eg: Threads in the pair of component SMT4 cores of an interleaved
big-core are numbered {0,2,4,6} and {1,3,5,7} respectively.

When multiple tasks are scheduled to run on such a big-core, we get
the best performance when the tasks are spread across the pair of
SMT4 cores.

The Linux scheduler supports a flag called "SD_ASYM_PACKING" which,
when set in the SMT sched-domain, biases the load-balancing of
tasks towards the smaller-numbered threads in the core. On a big-core
whose threads are interleavings of the threads of the small cores,
enabling SD_ASYM_PACKING in the SMT sched-domain automatically results
in spreading the tasks uniformly across the associated pair of SMT4
cores, thereby yielding better performance.
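
For illustration, with interleaved numbering the small core that a
thread belongs to follows directly from its thread-id; a sketch, not
part of the patchset:

	/*
	 * On an interleaved big-core with nr_groups component small cores,
	 * thread t belongs to small core (t % nr_groups). For the SMT8
	 * example above (nr_groups = 2), threads {0,2,4,6} map to small
	 * core 0 and threads {1,3,5,7} map to small core 1.
	 */
	static inline int small_core_index(int thread_id, int nr_groups)
	{
		return thread_id % nr_groups;
	}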

This patchset contains two patches which, on detecting the presence of
interleaved big-cores, will enable the CPU_FTR_ASYM_SMT bit in
cur_cpu_spec->cpu_features.

Patch 1: adds support to detect the presence of
big-cores and reports the small-core siblings of each CPU X
via the sysfs file "/sys/devices/system/cpu/cpuX/small_core_siblings".

Patch 2: checks if the thread-ids of the component small-cores are
interleaved, in which case we enable the CPU_FTR_ASYM_SMT bit in
the cur_cpu_spec->cpu_features which results in the SD_ASYM_PACKING
flag being set at the SMT level sched-domain.

Results:
~~~~~~~~
Experimental results for ebizzy with 2 threads, bound to a single big-core
show a marked improvement with this patchset over the 4.18-rc5 vanilla
kernel.

The results of 100 such runs for the 4.18-rc5 kernel and the 4.18-rc5 +
big-core-patches are as follows:

4.18-rc5 vanilla:

records/s    :  # samples : Histogram

[0 - 100]    :  0  : #
[100 - 200]  :  7  : ##
[200 - 300]  :  17 : ####
[300 - 400]  :  18 : ####
[400 - 500]  :  3  : #
[500 - 600]  :  55 : ############

4.18-rc5 + big-core-patches

records/s    :  # samples : Histogram

[0 - 100]    :  0  : #
[100 - 200]  :  0  : #
[200 - 300]  :  8  : ##
[300 - 400]  :  0  : #
[400 - 500]  :  0  : #
[500 - 600]  :  92 : ###################


Gautham R. Shenoy (2):
  powerpc: Detect the presence of big-cores via "ibm,thread-groups"
  powerpc: Enable CPU_FTR_ASYM_SMT for interleaved big-cores

 Documentation/ABI/testing/sysfs-devices-system-cpu |   8 +
 arch/powerpc/include/asm/cputhreads.h  |  22 ++
 arch/powerpc/kernel/setup-common.c | 229 -
 arch/powerpc/kernel/sysfs.c|  35 
 4 files changed, 293 insertions(+), 1 deletion(-)

-- 
1.9.4



[PATCH v4 2/2] powerpc: Enable CPU_FTR_ASYM_SMT for interleaved big-cores

2018-07-24 Thread Gautham R. Shenoy
From: "Gautham R. Shenoy" 

A pair of IBM POWER9 SMT4 cores can be fused together to form a big-core
with 8 SMT threads. This can be discovered via the "ibm,thread-groups"
CPU property in the device tree, which indicates which groups of
threads share the L1 cache, translation cache and instruction data
flow. If there are multiple such groups of threads, then the core is a
big-core.

Furthermore, if the thread-ids of the threads of the big-core can be
obtained by interleaving the thread-ids of the thread-groups
(component small core), then such a big-core is called an interleaved
big-core.

Eg: Threads in the pair of component SMT4 cores of an interleaved
big-core are numbered {0,2,4,6} and {1,3,5,7} respectively.

The SMT4 cores forming a big-core are more or less independent
units. Thus when multiple tasks are scheduled to run on the fused
core, we get the best performance when the tasks are spread across the
pair of SMT4 cores.

This patch enables the CPU_FTR_ASYM_SMT bit in the cpu-features on
detecting the presence of interleaved big-cores at boot up. This
biases the load-balancing of tasks towards smaller-numbered threads,
which automatically results in spreading the tasks uniformly
across the associated pair of SMT4 cores.

Signed-off-by: Gautham R. Shenoy 
---
 arch/powerpc/kernel/setup-common.c | 75 +-
 1 file changed, 74 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
index 989edc1..22bc486 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -581,6 +581,69 @@ int get_cpu_thread_group_start(int cpu, struct thread_groups *tg)
return -1;
 }
 
+/*
+ * check_interleaved_big_core - Checks if the thread group tg
+ * corresponds to a big-core whose threads are interleavings of the
+ * threads of the component small cores.
+ *
+ * @tg: A thread-group struct for the core.
+ *
+ * Returns true if the core is an interleaved big-core.
+ * Returns false otherwise.
+ */
+static inline bool check_interleaved_big_core(struct thread_groups *tg)
+{
+   int nr_groups;
+   int threads_per_group;
+   int cur_cpu, next_cpu, i, j;
+
+   nr_groups = tg->nr_groups;
+   threads_per_group = tg->threads_per_group;
+
+   if (tg->property != 1)
+   return false;
+
+   if (nr_groups < 2 || threads_per_group < 2)
+   return false;
+
+   /*
+* In case of an interleaved big-core, the thread-ids of the
+* big-core can be obtained by interleaving the thread-ids
+* of the component small cores.
+*
+* Eg: On a 8-thread big-core with two SMT4 small cores, the
+* threads of the two component small cores will be
+* {0, 2, 4, 6} and {1, 3, 5, 7}.
+*/
+   for (i = 0; i < nr_groups; i++) {
+   int group_start = i * threads_per_group;
+
+   for (j = 0; j < threads_per_group - 1; j++) {
+   int cur_idx = group_start + j;
+
+   cur_cpu = tg->thread_list[cur_idx];
+   next_cpu = tg->thread_list[cur_idx + 1];
+   if (next_cpu != cur_cpu + nr_groups)
+   return false;
+   }
+   }
+
+   return true;
+}
+
+#if defined(CONFIG_PPC_PSERIES) || defined(CONFIG_PPC_POWERNV)
+static inline void enable_asym_smt_feature(void)
+{
+   int key = __builtin_ctzl(CPU_FTR_ASYM_SMT);
+
+   cur_cpu_spec->cpu_features |= CPU_FTR_ASYM_SMT;
+   static_branch_enable(&cpu_feature_keys[key]);
+   pr_info("Enabling ASYM_SMT on interleaved big-cores\n");
+}
+#else
+#define enable_asym_smt_feature()
+#endif
+
 /**
  * setup_cpu_maps - initialize the following cpu maps:
  *  cpu_possible_mask
@@ -604,6 +667,7 @@ void __init smp_setup_cpu_maps(void)
struct device_node *dn;
int cpu = 0;
int nthreads = 1;
+   bool has_interleaved_big_cores = true;
 
has_big_cores = true;
DBG("smp_setup_cpu_maps()\n");
@@ -657,6 +721,12 @@ void __init smp_setup_cpu_maps(void)
 
if (has_big_cores && !dt_has_big_core(dn, &tg)) {
has_big_cores = false;
+   has_interleaved_big_cores = false;
+   }
+
+   if (has_interleaved_big_cores) {
+   has_interleaved_big_cores =
+   check_interleaved_big_core(&tg);
}
 
if (cpu >= nr_cpu_ids) {
@@ -713,7 +783,10 @@ void __init smp_setup_cpu_maps(void)
vdso_data->processorCount = num_present_cpus();
 #endif /* CONFIG_PPC64 */
 
-/* Initialize CPU <=> thread mapping/
+   if (has_interleaved_big_cores)
+   enable_asym_smt_feature();
+
+   /* Initialize CPU <=> thread mapping/
 *
 * WARNING: We assume that the number of threads is the same for
 * every CPU 

[PATCH v4 1/2] powerpc: Detect the presence of big-cores via "ibm, thread-groups"

2018-07-24 Thread Gautham R. Shenoy
From: "Gautham R. Shenoy" 

On IBM POWER9, the device tree exposes a property array identified by
"ibm,thread-groups" which indicates which groups of threads share a
particular set of resources.

As of today we have only one form of grouping, identifying the group
of threads in the core that share the L1 cache, translation cache and
instruction data flow.

This patch defines the helper function to parse the contents of
"ibm,thread-groups" and a new structure to contain the parsed output.

The patch also creates the sysfs file named "small_core_siblings" that
returns the physical ids of the threads in the core that share the L1
cache, translation cache and instruction data flow.
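
For reference, "ibm,thread-groups" is a flat u32 array. A simplified
sketch of reading it (the parse_thread_groups() added by this patch
performs the full validation; read_thread_groups() here is illustrative):

	static int read_thread_groups(struct device_node *dn,
				      struct thread_groups *tg)
	{
		u32 prop[3 + MAX_THREAD_LIST_SIZE];
		int n, err;

		n = of_property_count_u32_elems(dn, "ibm,thread-groups");
		if (n < 3 || n > ARRAY_SIZE(prop))
			return -EINVAL;

		err = of_property_read_u32_array(dn, "ibm,thread-groups",
						 prop, n);
		if (err)
			return err;

		tg->property = prop[0];		/* which resource is shared */
		tg->nr_groups = prop[1];
		tg->threads_per_group = prop[2];
		if (tg->nr_groups * tg->threads_per_group != n - 3)
			return -EINVAL;
		memcpy(tg->thread_list, &prop[3], (n - 3) * sizeof(u32));
		return 0;
	}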

Signed-off-by: Gautham R. Shenoy 
---
 Documentation/ABI/testing/sysfs-devices-system-cpu |   8 ++
 arch/powerpc/include/asm/cputhreads.h  |  22 +++
 arch/powerpc/kernel/setup-common.c | 154 +
 arch/powerpc/kernel/sysfs.c|  35 +
 4 files changed, 219 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
index 9c5e7732..41adf1d 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -487,3 +487,11 @@ Description:   Information about CPU vulnerabilities
"Not affected"CPU is not affected by the vulnerability
"Vulnerable"  CPU is affected and no mitigation in effect
"Mitigation: $M"  CPU is affected and mitigation $M is in effect
+
+What:  /sys/devices/system/cpu/cpu[0-9]+/small_core_siblings
+Date:  24-Jul-2018
+KernelVersion: v4.18.0
+Contact:   Gautham R. Shenoy 
+Description:   List of Physical ids of CPUs which share the L1 cache,
+   translation cache and instruction data-flow with this CPU.
+Values:Comma separated list of decimal integers.
diff --git a/arch/powerpc/include/asm/cputhreads.h b/arch/powerpc/include/asm/cputhreads.h
index d71a909..33226d7 100644
--- a/arch/powerpc/include/asm/cputhreads.h
+++ b/arch/powerpc/include/asm/cputhreads.h
@@ -23,11 +23,13 @@
 extern int threads_per_core;
 extern int threads_per_subcore;
 extern int threads_shift;
+extern bool has_big_cores;
 extern cpumask_t threads_core_mask;
 #else
 #define threads_per_core   1
 #define threads_per_subcore1
 #define threads_shift  0
+#define has_big_cores  0
 #define threads_core_mask  (*get_cpu_mask(0))
 #endif
 
@@ -69,12 +71,32 @@ static inline cpumask_t cpu_online_cores_map(void)
return cpu_thread_mask_to_cores(cpu_online_mask);
 }
 
+#define MAX_THREAD_LIST_SIZE   8
+struct thread_groups {
+   unsigned int property;
+   unsigned int nr_groups;
+   unsigned int threads_per_group;
+   unsigned int thread_list[MAX_THREAD_LIST_SIZE];
+};
+
 #ifdef CONFIG_SMP
 int cpu_core_index_of_thread(int cpu);
 int cpu_first_thread_of_core(int core);
+int parse_thread_groups(struct device_node *dn, struct thread_groups *tg);
+int get_cpu_thread_group_start(int cpu, struct thread_groups *tg);
 #else
 static inline int cpu_core_index_of_thread(int cpu) { return cpu; }
 static inline int cpu_first_thread_of_core(int core) { return core; }
+static inline int parse_thread_groups(struct device_node *dn,
+ struct thread_groups *tg)
+{
+   return -ENODATA;
+}
+
+static inline int get_cpu_thread_group_start(int cpu, struct thread_groups *tg)
+{
+   return -1;
+}
 #endif
 
 static inline int cpu_thread_in_core(int cpu)
diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
index 40b44bb..989edc1 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -402,10 +402,12 @@ void __init check_for_initrd(void)
 #ifdef CONFIG_SMP
 
 int threads_per_core, threads_per_subcore, threads_shift;
+bool has_big_cores;
 cpumask_t threads_core_mask;
 EXPORT_SYMBOL_GPL(threads_per_core);
 EXPORT_SYMBOL_GPL(threads_per_subcore);
 EXPORT_SYMBOL_GPL(threads_shift);
+EXPORT_SYMBOL_GPL(has_big_cores);
 EXPORT_SYMBOL_GPL(threads_core_mask);
 
 static void __init cpu_init_thread_core_maps(int tpc)
@@ -433,6 +435,152 @@ static void __init cpu_init_thread_core_maps(int tpc)
 
 u32 *cpu_to_phys_id = NULL;
 
+/*
+ * parse_thread_groups: Parses the "ibm,thread-groups" device tree
+ *  property for the CPU device node @dn and stores
+ *  the parsed output in the thread_groups
+ *  structure @tg.
+ *
+ * @dn: The device node of the CPU device.
+ * @tg: Pointer to a thread group structure into which the parsed
+ * output of "ibm,thread-groups" is stored.
+ *
+ * ibm,thread-groups[0..N-1] array defines which group of threads in
+ * the CPU-device node can be grouped together based on the property.
+ *
+ * ibm,thread-groups[0] tells us the property based on 

[PATCH 2/7] powerpc/traps: Return early in show_signal_msg()

2018-07-24 Thread Murilo Opsfelder Araujo
Modify the logic of show_signal_msg() to return early where possible.
Replace printk_ratelimited() with printk() and a default rate-limit
state to limit the display of unhandled-signal messages.

Signed-off-by: Murilo Opsfelder Araujo 
---
 arch/powerpc/kernel/traps.c | 21 +++--
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index cbd3dc365193..4faab4705774 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -301,6 +301,13 @@ void user_single_step_siginfo(struct task_struct *tsk,
info->si_addr = (void __user *)regs->nip;
 }
 
+static bool show_unhandled_signals_ratelimited(void)
+{
+   static DEFINE_RATELIMIT_STATE(rs, DEFAULT_RATELIMIT_INTERVAL,
+ DEFAULT_RATELIMIT_BURST);
+   return show_unhandled_signals && __ratelimit(&rs);
+}
+
 static void show_signal_msg(int signr, struct pt_regs *regs, int code,
unsigned long addr)
 {
@@ -309,11 +316,12 @@ static void show_signal_msg(int signr, struct pt_regs *regs, int code,
const char fmt64[] = KERN_INFO "%s[%d]: unhandled signal %d " \
"at %016lx nip %016lx lr %016lx code %x\n";
 
-   if (show_unhandled_signals && unhandled_signal(current, signr)) {
-   printk_ratelimited(regs->msr & MSR_64BIT ? fmt64 : fmt32,
-  current->comm, current->pid, signr,
-  addr, regs->nip, regs->link, code);
-   }
+   if (!unhandled_signal(current, signr))
+   return;
+
+   printk(regs->msr & MSR_64BIT ? fmt64 : fmt32,
+  current->comm, current->pid, signr,
+  addr, regs->nip, regs->link, code);
 }
 
 void _exception_pkey(int signr, struct pt_regs *regs, int code,
@@ -326,7 +334,8 @@ void _exception_pkey(int signr, struct pt_regs *regs, int code,
return;
}
 
-   show_signal_msg(signr, regs, code, addr);
+   if (show_unhandled_signals_ratelimited())
+   show_signal_msg(signr, regs, code, addr);
 
if (arch_irqs_disabled() && !arch_irq_disabled_regs(regs))
local_irq_enable();
-- 
2.17.1



[PATCH 6/7] powerpc/traps: Print signal name for unhandled signals

2018-07-24 Thread Murilo Opsfelder Araujo
This adds a human-readable signal name to the unhandled-signal message.

Before this patch, a page fault looked like:

Jul 11 16:04:11 localhost kernel: pandafault[6303]: unhandled signal 11 at 
17d0 nip 161c lr 7fff93c55100 code 2 in 
pandafault[1000+1]

After this patch, a page fault looks like:

Jul 11 18:14:48 localhost kernel: pandafault[6352]: segfault (11) at 
00013a2a09f8 nip 00013a2a086c lr 7fffb63e5100 code 2 in 
pandafault[13a2a+1]

Signed-off-by: Murilo Opsfelder Araujo 
---
 arch/powerpc/kernel/traps.c | 43 +
 1 file changed, 39 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index e6c43ef9fb50..e55ee639d010 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -96,6 +96,41 @@ EXPORT_SYMBOL(__debugger_fault_handler);
 #define TM_DEBUG(x...) do { } while(0)
 #endif
 
+static const char *signames[SIGRTMIN + 1] = {
+   "UNKNOWN",
+   "SIGHUP",   // 1
+   "SIGINT",   // 2
+   "SIGQUIT",  // 3
+   "SIGILL",   // 4
+   "unhandled trap",   // 5 = SIGTRAP
+   "SIGABRT",  // 6 = SIGIOT
+   "bus error",// 7 = SIGBUS
+   "floating point exception", // 8 = SIGFPE
+   "illegal instruction",  // 9 = SIGILL
+   "SIGUSR1",  // 10
+   "segfault", // 11 = SIGSEGV
+   "SIGUSR2",  // 12
+   "SIGPIPE",  // 13
+   "SIGALRM",  // 14
+   "SIGTERM",  // 15
+   "SIGSTKFLT",// 16
+   "SIGCHLD",  // 17
+   "SIGCONT",  // 18
+   "SIGSTOP",  // 19
+   "SIGTSTP",  // 20
+   "SIGTTIN",  // 21
+   "SIGTTOU",  // 22
+   "SIGURG",   // 23
+   "SIGXCPU",  // 24
+   "SIGXFSZ",  // 25
+   "SIGVTALRM",// 26
+   "SIGPROF",  // 27
+   "SIGWINCH", // 28
+   "SIGIO",// 29 = SIGPOLL = SIGLOST
+   "SIGPWR",   // 30
+   "SIGSYS",   // 31 = SIGUNUSED
+};
+
 /*
  * Trap & Exception support
  */
@@ -314,10 +349,10 @@ static void show_signal_msg(int signr, struct pt_regs *regs, int code,
if (!unhandled_signal(current, signr))
return;
 
-   pr_info("%s[%d]: unhandled signal %d at "REG_FMT \
-   " nip "REG_FMT" lr "REG_FMT" code %x",
-   current->comm, current->pid, signr, addr,
-   regs->nip, regs->link, code);
+   pr_info("%s[%d]: %s (%d) at "REG_FMT" nip "REG_FMT \
+   " lr "REG_FMT" code %x",
+   current->comm, current->pid, signames[signr],
+   signr, addr, regs->nip, regs->link, code);
 
print_vma_addr(KERN_CONT " in ", regs->nip);
 
-- 
2.17.1



[PATCH 5/7] powerpc/traps: Print VMA for unhandled signals

2018-07-24 Thread Murilo Opsfelder Araujo
This adds the VMA address to the message printed for unhandled signals,
similar to what other architectures, like x86, print.

Before this patch, a page fault looked like:

Jul 11 15:56:25 localhost kernel: pandafault[61470]: unhandled signal 11 at 
17d0 nip 161c lr 7fff8d185100 code 2

After this patch, a page fault looks like:

Jul 11 16:04:11 localhost kernel: pandafault[6303]: unhandled signal 11 at 
17d0 nip 161c lr 7fff93c55100 code 2 in 
pandafault[1000+1]

Signed-off-by: Murilo Opsfelder Araujo 
---
 arch/powerpc/kernel/traps.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 047d980ac776..e6c43ef9fb50 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -315,9 +315,13 @@ static void show_signal_msg(int signr, struct pt_regs *regs, int code,
return;
 
pr_info("%s[%d]: unhandled signal %d at "REG_FMT \
-   " nip "REG_FMT" lr "REG_FMT" code %x\n",
+   " nip "REG_FMT" lr "REG_FMT" code %x",
current->comm, current->pid, signr, addr,
regs->nip, regs->link, code);
+
+   print_vma_addr(KERN_CONT " in ", regs->nip);
+
+   pr_cont("\n");
 }
 
 void _exception_pkey(int signr, struct pt_regs *regs, int code,
-- 
2.17.1



[PATCH 3/7] powerpc/reg: Add REG_FMT definition

2018-07-24 Thread Murilo Opsfelder Araujo
Make the REG definition in arch/powerpc/kernel/process.c generic by
renaming it to REG_FMT and moving it to arch/powerpc/include/asm/reg.h
so it can be used elsewhere.

Replace occurrences of REG with REG_FMT in arch/powerpc/kernel/process.c.
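
With the definition in reg.h, callers no longer hard-code a width; a
small illustrative example (print_fault_addr() is not part of the patch):

	#include <asm/reg.h>

	/* REG_FMT expands to "%016lx" on ppc64 and "%08lx" on ppc32. */
	static void print_fault_addr(struct pt_regs *regs)
	{
		pr_err("faulting address: "REG_FMT"\n", regs->dar);
	}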

Signed-off-by: Murilo Opsfelder Araujo 
---
 arch/powerpc/include/asm/reg.h |  6 ++
 arch/powerpc/kernel/process.c  | 22 ++
 2 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index 858aa7984ab0..d6c5c77383de 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -1319,6 +1319,12 @@
 #define PVR_ARCH_207   0x0f04
 #define PVR_ARCH_300   0x0f05
 
+#ifdef CONFIG_PPC64
+#define REG_FMT"%016lx"
+#else
+#define REG_FMT"%08lx"
+#endif /* CONFIG_PPC64 */
+
 /* Macros for setting and retrieving special purpose registers */
 #ifndef __ASSEMBLY__
 #define mfmsr()({unsigned long rval; \
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 27f0caee55ea..b1af3390249c 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1381,11 +1381,9 @@ static void print_msr_bits(unsigned long val)
 }
 
 #ifdef CONFIG_PPC64
-#define REG"%016lx"
 #define REGS_PER_LINE  4
 #define LAST_VOLATILE  13
 #else
-#define REG"%08lx"
 #define REGS_PER_LINE  8
 #define LAST_VOLATILE  12
 #endif
@@ -1396,21 +1394,21 @@ void show_regs(struct pt_regs * regs)
 
show_regs_print_info(KERN_DEFAULT);
 
-   printk("NIP:  "REG" LR: "REG" CTR: "REG"\n",
+   printk("NIP:  "REG_FMT" LR: "REG_FMT" CTR: "REG_FMT"\n",
   regs->nip, regs->link, regs->ctr);
printk("REGS: %px TRAP: %04lx   %s  (%s)\n",
   regs, regs->trap, print_tainted(), init_utsname()->release);
-   printk("MSR:  "REG" ", regs->msr);
+   printk("MSR:  "REG_FMT" ", regs->msr);
print_msr_bits(regs->msr);
-   pr_cont("  CR: %08lx  XER: %08lx\n", regs->ccr, regs->xer);
+   pr_cont("  CR: "REG_FMT"  XER: "REG_FMT"\n", regs->ccr, regs->xer);
trap = TRAP(regs);
if ((TRAP(regs) != 0xc00) && cpu_has_feature(CPU_FTR_CFAR))
-   pr_cont("CFAR: "REG" ", regs->orig_gpr3);
+   pr_cont("CFAR: "REG_FMT" ", regs->orig_gpr3);
if (trap == 0x200 || trap == 0x300 || trap == 0x600)
 #if defined(CONFIG_4xx) || defined(CONFIG_BOOKE)
-   pr_cont("DEAR: "REG" ESR: "REG" ", regs->dar, regs->dsisr);
+   pr_cont("DEAR: "REG_FMT" ESR: "REG_FMT" ", regs->dar, 
regs->dsisr);
 #else
-   pr_cont("DAR: "REG" DSISR: %08lx ", regs->dar, regs->dsisr);
+   pr_cont("DAR: "REG_FMT" DSISR: "REG_FMT" ", regs->dar, 
regs->dsisr);
 #endif
 #ifdef CONFIG_PPC64
pr_cont("IRQMASK: %lx ", regs->softe);
@@ -1423,7 +1421,7 @@ void show_regs(struct pt_regs * regs)
for (i = 0;  i < 32;  i++) {
if ((i % REGS_PER_LINE) == 0)
pr_cont("\nGPR%02d: ", i);
-   pr_cont(REG " ", regs->gpr[i]);
+   pr_cont(REG_FMT " ", regs->gpr[i]);
if (i == LAST_VOLATILE && !FULL_REGS(regs))
break;
}
@@ -1433,8 +1431,8 @@ void show_regs(struct pt_regs * regs)
 * Lookup NIP late so we have the best change of getting the
 * above info out without failing
 */
-   printk("NIP ["REG"] %pS\n", regs->nip, (void *)regs->nip);
-   printk("LR ["REG"] %pS\n", regs->link, (void *)regs->link);
+   printk("NIP ["REG_FMT"] %pS\n", regs->nip, (void *)regs->nip);
+   printk("LR ["REG_FMT"] %pS\n", regs->link, (void *)regs->link);
 #endif
show_stack(current, (unsigned long *) regs->gpr[1]);
if (!user_mode(regs))
@@ -2038,7 +2036,7 @@ void show_stack(struct task_struct *tsk, unsigned long 
*stack)
newsp = stack[0];
ip = stack[STACK_FRAME_LR_SAVE];
if (!firstframe || ip != lr) {
-   printk("["REG"] ["REG"] %pS", sp, ip, (void *)ip);
+   printk("["REG_FMT"] ["REG_FMT"] %pS", sp, ip, (void 
*)ip);
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
if ((ip == rth) && curr_frame >= 0) {
pr_cont(" (%pS)",
-- 
2.17.1



[PATCH 7/7] powerpc/traps: Show instructions on exceptions

2018-07-24 Thread Murilo Opsfelder Araujo
Move the show_instructions() declaration to
arch/powerpc/include/asm/stacktrace.h and include asm/stacktrace.h in
arch/powerpc/kernel/process.c, which contains the implementation.

Modify show_instructions() not to call __kernel_text_address(), allowing
userspace instruction dumps.  probe_kernel_address(), which returns
-EFAULT if something goes wrong, is still being called.

Call show_instructions() in arch/powerpc/kernel/traps.c to dump
instructions at the faulting location, which is useful for debugging.
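
The resulting dump loop looks roughly like this (a sketch; the real
show_instructions() also handles the MMU-off case):

	static void dump_insns(unsigned long nip, int count)
	{
		/* Start the window a few instructions before the NIP. */
		unsigned long pc = nip - (count * 3 / 4) * sizeof(int);
		int i;

		for (i = 0; i < count; i++, pc += sizeof(int)) {
			unsigned int instr;

			/* Fails gracefully on unmapped (incl. user) text. */
			if (probe_kernel_address((unsigned int __user *)pc,
						 instr))
				pr_cont("XXXXXXXX ");
			else if (pc == nip)
				pr_cont("<%08x> ", instr);
			else
				pr_cont("%08x ", instr);
		}
		pr_cont("\n");
	}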

Before this patch, an unhandled signal message looked like:

Jul 24 09:57:00 localhost kernel: pandafault[10524]: segfault (11) at 
17d0 nip 161c lr 7fffbd295100 code 2 in 
pandafault[1000+1]

After this patch, it looks like:

Jul 24 09:57:00 localhost kernel: pandafault[10524]: segfault (11) at 
17d0 nip 161c lr 7fffbd295100 code 2 in 
pandafault[1000+1]
Jul 24 09:57:00 localhost kernel: Instruction dump:
Jul 24 09:57:00 localhost kernel: 4bfffeec 4bfffee8 3c401002 38427f00 
fbe1fff8 f821ffc1 7c3f0b78 3d22fffe
Jul 24 09:57:00 localhost kernel: 392988d0 f93f0020 e93f0020 39400048 
<9949> 3920 7d234b78 383f0040

Signed-off-by: Murilo Opsfelder Araujo 
---
 arch/powerpc/include/asm/stacktrace.h | 7 +++
 arch/powerpc/kernel/process.c | 6 +++---
 arch/powerpc/kernel/traps.c   | 3 +++
 3 files changed, 13 insertions(+), 3 deletions(-)
 create mode 100644 arch/powerpc/include/asm/stacktrace.h

diff --git a/arch/powerpc/include/asm/stacktrace.h b/arch/powerpc/include/asm/stacktrace.h
new file mode 100644
index ..46e5ef451578
--- /dev/null
+++ b/arch/powerpc/include/asm/stacktrace.h
@@ -0,0 +1,7 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_POWERPC_STACKTRACE_H
+#define _ASM_POWERPC_STACKTRACE_H
+
+void show_instructions(struct pt_regs *regs);
+
+#endif /* _ASM_POWERPC_STACKTRACE_H */
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index b1af3390249c..ee1d63e03c52 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -52,6 +52,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1261,7 +1262,7 @@ struct task_struct *__switch_to(struct task_struct *prev,
 
 static int instructions_to_print = 16;
 
-static void show_instructions(struct pt_regs *regs)
+void show_instructions(struct pt_regs *regs)
 {
int i;
unsigned long pc = regs->nip - (instructions_to_print * 3 / 4 * sizeof(int));
@@ -1283,8 +1284,7 @@ static void show_instructions(struct pt_regs *regs)
pc = (unsigned long)phys_to_virt(pc);
 #endif
 
-   if (!__kernel_text_address(pc) ||
-probe_kernel_address((unsigned int __user *)pc, instr)) {
+   if (probe_kernel_address((unsigned int __user *)pc, instr)) {
pr_cont(" ");
} else {
if (regs->nip == pc)
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index e55ee639d010..3beca17ac1b1 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -70,6 +70,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #if defined(CONFIG_DEBUGGER) || defined(CONFIG_KEXEC_CORE)
 int (*__debugger)(struct pt_regs *regs) __read_mostly;
@@ -357,6 +358,8 @@ static void show_signal_msg(int signr, struct pt_regs *regs, int code,
print_vma_addr(KERN_CONT " in ", regs->nip);
 
pr_cont("\n");
+
+   show_instructions(regs);
 }
 
 void _exception_pkey(int signr, struct pt_regs *regs, int code,
-- 
2.17.1



[PATCH 0/7] powerpc: Modernize unhandled signals message

2018-07-24 Thread Murilo Opsfelder Araujo
Hi, everyone.

This series was inspired by the need to modernize and display more
informative messages about unhandled signals.

The "unhandled signal NN" is not very informative.  We thought it would
be helpful adding a human-readable message describing what the signal
number means, printing the VMA address, and dumping the instructions.

We can add more informative messages, like informing what each code of a
SIGSEGV signal means.  We are open to suggestions.

I have collected some early feedback from Michael Ellerman about this
series and would love to hear more feedback from you all.

Before this series:

Jul 24 13:01:07 localhost kernel: pandafault[5989]: unhandled signal 11 at 
17d0 nip 161c lr 3fff85a75100 code 2

After this series:

Jul 24 13:08:01 localhost kernel: pandafault[10758]: segfault (11) at 
17d0 nip 161c lr 7fffabc85100 code 2 in 
pandafault[1000+1]
Jul 24 13:08:01 localhost kernel: Instruction dump:
Jul 24 13:08:01 localhost kernel: 4bfffeec 4bfffee8 3c401002 38427f00 
fbe1fff8 f821ffc1 7c3f0b78 3d22fffe
Jul 24 13:08:01 localhost kernel: 392988d0 f93f0020 e93f0020 39400048 
<9949> 3920 7d234b78 383f0040

Cheers
Murilo

Murilo Opsfelder Araujo (7):
  powerpc/traps: Print unhandled signals in a separate function
  powerpc/traps: Return early in show_signal_msg()
  powerpc/reg: Add REG_FMT definition
  powerpc/traps: Use REG_FMT in show_signal_msg()
  powerpc/traps: Print VMA for unhandled signals
  powerpc/traps: Print signal name for unhandled signals
  powerpc/traps: Show instructions on exceptions

 arch/powerpc/include/asm/reg.h|  6 +++
 arch/powerpc/include/asm/stacktrace.h |  7 +++
 arch/powerpc/kernel/process.c | 28 +-
 arch/powerpc/kernel/traps.c   | 73 +++
 4 files changed, 89 insertions(+), 25 deletions(-)
 create mode 100644 arch/powerpc/include/asm/stacktrace.h

-- 
2.17.1



[PATCH 1/7] powerpc/traps: Print unhandled signals in a separate function

2018-07-24 Thread Murilo Opsfelder Araujo
Isolate the logic of printing unhandled signals out of _exception_pkey().  No
functional change, only code rearrangement.

Signed-off-by: Murilo Opsfelder Araujo 
---
 arch/powerpc/kernel/traps.c | 26 --
 1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 0e17dcb48720..cbd3dc365193 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -301,26 +301,32 @@ void user_single_step_siginfo(struct task_struct *tsk,
info->si_addr = (void __user *)regs->nip;
 }
 
+static void show_signal_msg(int signr, struct pt_regs *regs, int code,
+   unsigned long addr)
+{
+   const char fmt32[] = KERN_INFO "%s[%d]: unhandled signal %d " \
+   "at %08lx nip %08lx lr %08lx code %x\n";
+   const char fmt64[] = KERN_INFO "%s[%d]: unhandled signal %d " \
+   "at %016lx nip %016lx lr %016lx code %x\n";
+
+   if (show_unhandled_signals && unhandled_signal(current, signr)) {
+   printk_ratelimited(regs->msr & MSR_64BIT ? fmt64 : fmt32,
+  current->comm, current->pid, signr,
+  addr, regs->nip, regs->link, code);
+   }
+}
 
 void _exception_pkey(int signr, struct pt_regs *regs, int code,
-   unsigned long addr, int key)
+unsigned long addr, int key)
 {
siginfo_t info;
-   const char fmt32[] = KERN_INFO "%s[%d]: unhandled signal %d " \
-   "at %08lx nip %08lx lr %08lx code %x\n";
-   const char fmt64[] = KERN_INFO "%s[%d]: unhandled signal %d " \
-   "at %016lx nip %016lx lr %016lx code %x\n";
 
if (!user_mode(regs)) {
die("Exception in kernel mode", regs, signr);
return;
}
 
-   if (show_unhandled_signals && unhandled_signal(current, signr)) {
-   printk_ratelimited(regs->msr & MSR_64BIT ? fmt64 : fmt32,
-  current->comm, current->pid, signr,
-  addr, regs->nip, regs->link, code);
-   }
+   show_signal_msg(signr, regs, code, addr);
 
if (arch_irqs_disabled() && !arch_irq_disabled_regs(regs))
local_irq_enable();
-- 
2.17.1



[PATCH 4/7] powerpc/traps: Use REG_FMT in show_signal_msg()

2018-07-24 Thread Murilo Opsfelder Araujo
Simplify the message format by using REG_FMT as the register format.  This
avoids having two different formats and avoids checking for MSR_64BIT.

Signed-off-by: Murilo Opsfelder Araujo 
---
 arch/powerpc/kernel/traps.c | 12 
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 4faab4705774..047d980ac776 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -311,17 +311,13 @@ static bool show_unhandled_signals_ratelimited(void)
 static void show_signal_msg(int signr, struct pt_regs *regs, int code,
unsigned long addr)
 {
-   const char fmt32[] = KERN_INFO "%s[%d]: unhandled signal %d " \
-   "at %08lx nip %08lx lr %08lx code %x\n";
-   const char fmt64[] = KERN_INFO "%s[%d]: unhandled signal %d " \
-   "at %016lx nip %016lx lr %016lx code %x\n";
-
if (!unhandled_signal(current, signr))
return;
 
-   printk(regs->msr & MSR_64BIT ? fmt64 : fmt32,
-  current->comm, current->pid, signr,
-  addr, regs->nip, regs->link, code);
+   pr_info("%s[%d]: unhandled signal %d at "REG_FMT \
+   " nip "REG_FMT" lr "REG_FMT" code %x\n",
+   current->comm, current->pid, signr, addr,
+   regs->nip, regs->link, code);
 }
 
 void _exception_pkey(int signr, struct pt_regs *regs, int code,
-- 
2.17.1



Re: [PATCH 1/4] treewide: convert ISO_8859-1 text comments to utf-8

2018-07-24 Thread Simon Horman
On Tue, Jul 24, 2018 at 01:13:25PM +0200, Arnd Bergmann wrote:
> Almost all files in the kernel are either plain text or UTF-8
> encoded. A couple, however, are ISO_8859-1, usually just a few
> characters in C comments, for historic reasons.
> 
> This converts them all to UTF-8 for consistency.
> 
> Signed-off-by: Arnd Bergmann 
> ---
>  .../devicetree/bindings/net/nfc/pn544.txt |   2 +-
>  arch/arm/boot/dts/sun4i-a10-inet97fv2.dts |   2 +-
>  arch/arm/crypto/sha256_glue.c |   2 +-
>  arch/arm/crypto/sha256_neon_glue.c|   4 +-
>  drivers/crypto/vmx/ghashp8-ppc.pl |  12 +-
>  drivers/iio/dac/ltc2632.c |   2 +-
>  drivers/power/reset/ltc2952-poweroff.c|   4 +-
>  kernel/events/callchain.c |   2 +-
>  net/netfilter/ipvs/Kconfig|   8 +-
>  net/netfilter/ipvs/ip_vs_mh.c |   4 +-

IPVS portion:

Acked-by: Simon Horman 


>  tools/power/cpupower/po/de.po |  44 +++
>  tools/power/cpupower/po/fr.po | 120 +-
>  12 files changed, 103 insertions(+), 103 deletions(-)
> 
> diff --git a/Documentation/devicetree/bindings/net/nfc/pn544.txt 
> b/Documentation/devicetree/bindings/net/nfc/pn544.txt
> index 538a86f7b2b0..72593f056b75 100644
> --- a/Documentation/devicetree/bindings/net/nfc/pn544.txt
> +++ b/Documentation/devicetree/bindings/net/nfc/pn544.txt
> @@ -2,7 +2,7 @@
>  
>  Required properties:
>  - compatible: Should be "nxp,pn544-i2c".
> -- clock-frequency: I�C work frequency.
> +- clock-frequency: I²C work frequency.
>  - reg: address on the bus
>  - interrupt-parent: phandle for the interrupt gpio controller
>  - interrupts: GPIO interrupt to which the chip is connected
> diff --git a/arch/arm/boot/dts/sun4i-a10-inet97fv2.dts 
> b/arch/arm/boot/dts/sun4i-a10-inet97fv2.dts
> index 5d096528e75a..71c27ea0b53e 100644
> --- a/arch/arm/boot/dts/sun4i-a10-inet97fv2.dts
> +++ b/arch/arm/boot/dts/sun4i-a10-inet97fv2.dts
> @@ -1,7 +1,7 @@
>  /*
>   * Copyright 2014 Open Source Support GmbH
>   *
> - * David Lanzend�rfer 
> + * David Lanzendörfer 
>   *
>   * This file is dual-licensed: you can use it either under the terms
>   * of the GPL or the X11 license, at your option. Note that this dual
> diff --git a/arch/arm/crypto/sha256_glue.c b/arch/arm/crypto/sha256_glue.c
> index bf8ccff2c9d0..0ae900e778f3 100644
> --- a/arch/arm/crypto/sha256_glue.c
> +++ b/arch/arm/crypto/sha256_glue.c
> @@ -2,7 +2,7 @@
>   * Glue code for the SHA256 Secure Hash Algorithm assembly implementation
>   * using optimized ARM assembler and NEON instructions.
>   *
> - * Copyright � 2015 Google Inc.
> + * Copyright © 2015 Google Inc.
>   *
>   * This file is based on sha256_ssse3_glue.c:
>   *   Copyright (C) 2013 Intel Corporation
> diff --git a/arch/arm/crypto/sha256_neon_glue.c 
> b/arch/arm/crypto/sha256_neon_glue.c
> index 9bbee56fbdc8..1d82c6cd31a4 100644
> --- a/arch/arm/crypto/sha256_neon_glue.c
> +++ b/arch/arm/crypto/sha256_neon_glue.c
> @@ -2,10 +2,10 @@
>   * Glue code for the SHA256 Secure Hash Algorithm assembly implementation
>   * using NEON instructions.
>   *
> - * Copyright � 2015 Google Inc.
> + * Copyright © 2015 Google Inc.
>   *
>   * This file is based on sha512_neon_glue.c:
> - *   Copyright � 2014 Jussi Kivilinna 
> + *   Copyright © 2014 Jussi Kivilinna 
>   *
>   * This program is free software; you can redistribute it and/or modify it
>   * under the terms of the GNU General Public License as published by the Free
> diff --git a/drivers/crypto/vmx/ghashp8-ppc.pl 
> b/drivers/crypto/vmx/ghashp8-ppc.pl
> index f746af271460..38b06503ede0 100644
> --- a/drivers/crypto/vmx/ghashp8-ppc.pl
> +++ b/drivers/crypto/vmx/ghashp8-ppc.pl
> @@ -129,9 +129,9 @@ $code=<<___;
>le?vperm   $IN,$IN,$IN,$lemask
>   vxor$zero,$zero,$zero
>  
> - vpmsumd $Xl,$IN,$Hl # H.lo�Xi.lo
> - vpmsumd $Xm,$IN,$H  # H.hi�Xi.lo+H.lo�Xi.hi
> - vpmsumd $Xh,$IN,$Hh # H.hi�Xi.hi
> + vpmsumd $Xl,$IN,$Hl # H.lo·Xi.lo
> + vpmsumd $Xm,$IN,$H  # H.hi·Xi.lo+H.lo·Xi.hi
> + vpmsumd $Xh,$IN,$Hh # H.hi·Xi.hi
>  
>   vpmsumd $t2,$Xl,$xC2# 1st phase
>  
> @@ -187,11 +187,11 @@ $code=<<___;
>  .align   5
>  Loop:
>subic  $len,$len,16
> - vpmsumd $Xl,$IN,$Hl # H.lo�Xi.lo
> + vpmsumd $Xl,$IN,$Hl # H.lo·Xi.lo
>subfe. r0,r0,r0# borrow?-1:0
> - vpmsumd $Xm,$IN,$H  # H.hi�Xi.lo+H.lo�Xi.hi
> + vpmsumd $Xm,$IN,$H  # H.hi·Xi.lo+H.lo·Xi.hi
>andr0,r0,$len
> - vpmsumd $Xh,$IN,$Hh # H.hi�Xi.hi
> + vpmsumd $Xh,$IN,$Hh # H.hi·Xi.hi
>add$inp,$inp,r0
>  
>   vpmsumd 

[PATCH v8 0/2] hwmon/powernv: Add attributes to enable/disable sensors

2018-07-24 Thread Shilpasri G Bhat
This patch series adds a new attribute to enable or disable a sensor
group at runtime.

Changes from v7:
- Use of_for_each_phandle() and of_count_phandle_with_args() to parse
  through the phandle array

v7 : https://lkml.org/lkml/2018/7/20/72
v6 : https://lkml.org/lkml/2018/7/18/806
v5 : https://lkml.org/lkml/2018/7/15/15
v4 : https://lkml.org/lkml/2018/7/6/379
v3 : https://lkml.org/lkml/2018/7/5/476
v2 : https://lkml.org/lkml/2018/7/4/263
v1 : https://lkml.org/lkml/2018/3/22/214

Shilpasri G Bhat (2):
  powernv:opal-sensor-groups: Add support to enable sensor groups
  hwmon: ibmpowernv: Add attributes to enable/disable sensor groups

 Documentation/hwmon/ibmpowernv |  43 +++-
 arch/powerpc/include/asm/opal-api.h|   1 +
 arch/powerpc/include/asm/opal.h|   2 +
 .../powerpc/platforms/powernv/opal-sensor-groups.c |  28 +++
 arch/powerpc/platforms/powernv/opal-wrappers.S |   1 +
 drivers/hwmon/ibmpowernv.c | 238 ++---
 6 files changed, 279 insertions(+), 34 deletions(-)

-- 
1.8.3.1



[PATCH v8 2/2] hwmon: ibmpowernv: Add attributes to enable/disable sensor groups

2018-07-24 Thread Shilpasri G Bhat
OPAL firmware provides the facility for some groups of sensors to be
enabled/disabled at runtime, giving the user the option of whether to
spend system resources on collecting these sensors.

For example, on POWER9 systems, the On Chip Controller (OCC) gathers
various system and chip level sensors and maintains their values in
main memory.

This patch provides support for enabling/disabling the sensor groups
like power, temperature, current and voltage.
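
The enable/disable path added below boils down to a sysfs store handler
guarded by the per-group mutex. A sketch (error handling trimmed;
sensor_group_enable() stands in for the OPAL wrapper added in patch 1/2):

	static ssize_t store_enable(struct device *dev,
				    struct device_attribute *devattr,
				    const char *buf, size_t count)
	{
		struct sensor_data *sdata = container_of(devattr,
					struct sensor_data, dev_attr);
		struct sensor_group_data *sgrp = sdata->sgrp_data;
		bool data;
		int ret;

		ret = kstrtobool(buf, &data);
		if (ret)
			return ret;

		ret = mutex_lock_interruptible(&sgrp->mutex);
		if (ret)
			return ret;

		if (data != sgrp->enable) {
			ret = sensor_group_enable(sgrp->gid, data);
			if (!ret)
				sgrp->enable = data;
		}

		mutex_unlock(&sgrp->mutex);
		return ret ? ret : count;
	}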

Signed-off-by: Shilpasri G Bhat 
[stew...@linux.vnet.ibm.com: Commit message]
---
Changes from v7:
- Use of_for_each_phandle() and of_count_phandle_with_args() to parse
  through the phandle array
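
A sketch of how these helpers walk a plain phandle array (the property
name is illustrative; with a NULL cells name,
of_count_phandle_with_args() simply counts the words):

	static int walk_groups(struct device_node *np)
	{
		struct of_phandle_iterator it;
		int count, err;

		count = of_count_phandle_with_args(np, "sensor-groups", NULL);
		if (count < 0)
			return count;

		of_for_each_phandle(&it, err, np, "sensor-groups", NULL, 0)
			pr_debug("group node: %pOF\n", it.node);

		return err == -ENOENT ? 0 : err;	/* -ENOENT = end of list */
	}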

 Documentation/hwmon/ibmpowernv |  43 +++-
 drivers/hwmon/ibmpowernv.c | 238 +++--
 2 files changed, 247 insertions(+), 34 deletions(-)

diff --git a/Documentation/hwmon/ibmpowernv b/Documentation/hwmon/ibmpowernv
index 8826ba2..5646825 100644
--- a/Documentation/hwmon/ibmpowernv
+++ b/Documentation/hwmon/ibmpowernv
@@ -33,9 +33,48 @@ fanX_input   Measured RPM value.
 fanX_min   Threshold RPM for alert generation.
 fanX_fault 0: No fail condition
1: Failing fan
+
 tempX_inputMeasured ambient temperature.
 tempX_max  Threshold ambient temperature for alert generation.
-inX_input  Measured power supply voltage
+tempX_highest  Historical maximum temperature
+tempX_lowest   Historical minimum temperature
+tempX_enable   Enable/disable all temperature sensors belonging to the
+   sub-group. In POWER9, this attribute corresponds to
+   each OCC. Using this attribute each OCC can be asked to
+   disable/enable all of its temperature sensors.
+   1: Enable
+   0: Disable
+
+inX_input  Measured power supply voltage (millivolt)
 inX_fault  0: No fail condition.
1: Failing power supply.
-power1_input   System power consumption (microWatt)
+inX_highestHistorical maximum voltage
+inX_lowest Historical minimum voltage
+inX_enable Enable/disable all voltage sensors belonging to the
+   sub-group. In POWER9, this attribute corresponds to
+   each OCC. Using this attribute each OCC can be asked to
+   disable/enable all of its voltage sensors.
+   1: Enable
+   0: Disable
+
+powerX_input   Power consumption (microWatt)
+powerX_input_highest   Historical maximum power
+powerX_input_lowestHistorical minimum power
+powerX_enable  Enable/disable all power sensors belonging to the
+   sub-group. In POWER9, this attribute corresponds to
+   each OCC. Using this attribute each OCC can be asked to
+   disable/enable all of its power sensors.
+   1: Enable
+   0: Disable
+
+currX_inputMeasured current (milliampere)
+currX_highest  Historical maximum current
+currX_lowest   Historical minimum current
+currX_enable   Enable/disable all current sensors belonging to the
+   sub-group. In POWER9, this attribute corresponds to
+   each OCC. Using this attribute each OCC can be asked to
+   disable/enable all of its current sensors.
+   1: Enable
+   0: Disable
+
+energyX_input  Cumulative energy (microJoule)
diff --git a/drivers/hwmon/ibmpowernv.c b/drivers/hwmon/ibmpowernv.c
index f829dad..8347280 100644
--- a/drivers/hwmon/ibmpowernv.c
+++ b/drivers/hwmon/ibmpowernv.c
@@ -90,11 +90,20 @@ struct sensor_data {
char label[MAX_LABEL_LEN];
char name[MAX_ATTR_LEN];
struct device_attribute dev_attr;
+   struct sensor_group_data *sgrp_data;
+};
+
+struct sensor_group_data {
+   struct mutex mutex;
+   u32 gid;
+   bool enable;
 };
 
 struct platform_data {
const struct attribute_group *attr_groups[MAX_SENSOR_TYPE + 1];
+   struct sensor_group_data *sgrp_data;
u32 sensors_count; /* Total count of sensors from each group */
+   u32 nr_sensor_groups; /* Total number of sensor groups */
 };
 
static ssize_t show_sensor(struct device *dev, struct device_attribute *devattr,
@@ -105,6 +114,9 @@ static ssize_t show_sensor(struct device *dev, struct device_attribute *devattr,
ssize_t ret;
u64 x;
 
+   if (sdata->sgrp_data && !sdata->sgrp_data->enable)
+   return -ENODATA;
+
ret = opal_get_sensor_data_u64(sdata->id, &x);
 
if (ret)
@@ -120,6 +132,46 @@ static ssize_t show_sensor(struct device *dev, struct device_attribute *devattr,
return sprintf(buf, "%llu\n", x);
 }
 
+static 

Re: [PATCH v3 1/1] KVM: PPC: Book3S HV: pack VCORE IDs to access full VCPU ID space

2018-07-24 Thread Cédric Le Goater
On 07/19/2018 04:25 AM, Sam Bobroff wrote:
> From: Sam Bobroff 
> 
> It is not currently possible to create the full number of possible
> VCPUs (KVM_MAX_VCPUS) on Power9 with KVM-HV when the guest uses fewer
> threads per core than its core stride (or "VSMT mode"). This is
> because the VCORE ID and XIVE offsets grow beyond KVM_MAX_VCPUS
> even though the VCPU ID is less than KVM_MAX_VCPU_ID.
> 
> To address this, "pack" the VCORE ID and XIVE offsets by using
> knowledge of the way the VCPU IDs will be used when there are fewer
> guest threads per core than the core stride. The primary thread of
> each core will always be used first. Then, if the guest uses more than
> one thread per core, these secondary threads will sequentially follow
> the primary in each core.
> 
> So, the only way an ID above KVM_MAX_VCPUS can be seen is if the
> VCPUs are being spaced apart, so at least half of each core is empty
> and IDs between KVM_MAX_VCPUS and (KVM_MAX_VCPUS * 2) can be mapped
> into the second half of each core (4..7, in an 8-thread core).
> 
> Similarly, if IDs above KVM_MAX_VCPUS * 2 are seen, at least 3/4 of
> each core is being left empty, and we can map down into the second and
> third quarters of each core (2, 3 and 5, 6 in an 8-thread core).
> 
> Lastly, if IDs above KVM_MAX_VCPUS * 4 are seen, only the primary
> threads are being used and 7/8 of the core is empty, allowing use of
> the 1, 3, 5 and 7 thread slots.
> 
> (Strides less than 8 are handled similarly.)
> 
> This allows the VCORE ID or offset to be calculated quickly from the
> VCPU ID or XIVE server numbers, without access to the VCPU structure.
> 
> Signed-off-by: Sam Bobroff 

On the XIVE part, 

Reviewed-by: Cédric Le Goater 

Thanks,

C.

> ---
> Hello everyone,
> 
> I've completed a trial merge with the guest native-XIVE code and found no
> problems; it's no more difficult than the host side and only requires a few
> calls to xive_vp().
> 
> On that basis, here is v3 (unchanged from v2) as non-RFC and it seems to be
> ready to go.
> 
> Patch set v3:
> Patch 1/1: KVM: PPC: Book3S HV: pack VCORE IDs to access full VCPU ID space
> 
> Patch set v2:
> Patch 1/1: KVM: PPC: Book3S HV: pack VCORE IDs to access full VCPU ID space
> * Corrected places in kvm/book3s_xive.c where IDs weren't packed.
> * Because kvmppc_pack_vcpu_id() is only called on P9, there is no need to 
> test "emul_smt_mode > 1", so remove it.
> * Re-ordered block_offsets[] to be more ascending.
> * Added more detailed description of the packing algorithm.
> 
> Patch set v1:
> Patch 1/1: KVM: PPC: Book3S HV: pack VCORE IDs to access full VCPU ID space
> 
>  arch/powerpc/include/asm/kvm_book3s.h | 44 
> +++
>  arch/powerpc/kvm/book3s_hv.c  | 14 +++
>  arch/powerpc/kvm/book3s_xive.c| 19 +--
>  3 files changed, 66 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
> b/arch/powerpc/include/asm/kvm_book3s.h
> index 1f345a0b6ba2..ba4b6e00fca7 100644
> --- a/arch/powerpc/include/asm/kvm_book3s.h
> +++ b/arch/powerpc/include/asm/kvm_book3s.h
> @@ -390,4 +390,48 @@ extern int kvmppc_h_logical_ci_store(struct kvm_vcpu 
> *vcpu);
>  #define SPLIT_HACK_MASK  0xff00
>  #define SPLIT_HACK_OFFS  0xfb00
>  
> +/* Pack a VCPU ID from the [0..KVM_MAX_VCPU_ID) space down to the
> + * [0..KVM_MAX_VCPUS) space, while using knowledge of the guest's core stride
> + * (but not its actual threading mode, which is not available) to avoid
> + * collisions.
> + *
> + * The implementation leaves VCPU IDs from the range [0..KVM_MAX_VCPUS) 
> (block
> + * 0) unchanged: if the guest is filling each VCORE completely then it will 
> be
> + * using consecutive IDs and it will fill the space without any packing.
> + *
> + * For higher VCPU IDs, the packed ID is based on the VCPU ID modulo
> + * KVM_MAX_VCPUS (effectively masking off the top bits) and then an offset is
> + * added to avoid collisions.
> + *
> + * VCPU IDs in the range [KVM_MAX_VCPUS..(KVM_MAX_VCPUS*2)) (block 1) are 
> only
> + * possible if the guest is leaving at least 1/2 of each VCORE empty, so IDs
> + * can be safely packed into the second half of each VCORE by adding an 
> offset
> + * of (stride / 2).
> + *
> + * Similarly, if VCPU IDs in the range [(KVM_MAX_VCPUS*2)..(KVM_MAX_VCPUS*4))
> + * (blocks 2 and 3) are seen, the guest must be leaving at least 3/4 of each
> + * VCORE empty so packed IDs can be offset by (stride / 4) and (stride * 3 / 
> 4).
> + *
> + * Finally, VCPU IDs from blocks 5..7 will only be seen if the guest is 
> using a
> + * stride of 8 and 1 thread per core so the remaining offsets of 1, 3, 5 and 
> 7
> + * must be free to use.
> + *
> + * (The offsets for each block are stored in block_offsets[], indexed by the
> + * block number if the stride is 8. For cases where the guest's stride is 
> less
> + * than 8, we can re-use the block_offsets array by multiplying 

[PATCH 1/4] treewide: convert ISO_8859-1 text comments to utf-8

2018-07-24 Thread Arnd Bergmann
Almost all files in the kernel are either plain text or UTF-8
encoded. A couple, however, are ISO_8859-1, usually just a few
characters in C comments, for historic reasons.

This converts them all to UTF-8 for consistency.

Signed-off-by: Arnd Bergmann 
---
 .../devicetree/bindings/net/nfc/pn544.txt |   2 +-
 arch/arm/boot/dts/sun4i-a10-inet97fv2.dts |   2 +-
 arch/arm/crypto/sha256_glue.c |   2 +-
 arch/arm/crypto/sha256_neon_glue.c|   4 +-
 drivers/crypto/vmx/ghashp8-ppc.pl |  12 +-
 drivers/iio/dac/ltc2632.c |   2 +-
 drivers/power/reset/ltc2952-poweroff.c|   4 +-
 kernel/events/callchain.c |   2 +-
 net/netfilter/ipvs/Kconfig|   8 +-
 net/netfilter/ipvs/ip_vs_mh.c |   4 +-
 tools/power/cpupower/po/de.po |  44 +++
 tools/power/cpupower/po/fr.po | 120 +-
 12 files changed, 103 insertions(+), 103 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/nfc/pn544.txt b/Documentation/devicetree/bindings/net/nfc/pn544.txt
index 538a86f7b2b0..72593f056b75 100644
--- a/Documentation/devicetree/bindings/net/nfc/pn544.txt
+++ b/Documentation/devicetree/bindings/net/nfc/pn544.txt
@@ -2,7 +2,7 @@
 
 Required properties:
 - compatible: Should be "nxp,pn544-i2c".
-- clock-frequency: I�C work frequency.
+- clock-frequency: I²C work frequency.
 - reg: address on the bus
 - interrupt-parent: phandle for the interrupt gpio controller
 - interrupts: GPIO interrupt to which the chip is connected
diff --git a/arch/arm/boot/dts/sun4i-a10-inet97fv2.dts b/arch/arm/boot/dts/sun4i-a10-inet97fv2.dts
index 5d096528e75a..71c27ea0b53e 100644
--- a/arch/arm/boot/dts/sun4i-a10-inet97fv2.dts
+++ b/arch/arm/boot/dts/sun4i-a10-inet97fv2.dts
@@ -1,7 +1,7 @@
 /*
  * Copyright 2014 Open Source Support GmbH
  *
- * David Lanzend�rfer 
+ * David Lanzendörfer 
  *
  * This file is dual-licensed: you can use it either under the terms
  * of the GPL or the X11 license, at your option. Note that this dual
diff --git a/arch/arm/crypto/sha256_glue.c b/arch/arm/crypto/sha256_glue.c
index bf8ccff2c9d0..0ae900e778f3 100644
--- a/arch/arm/crypto/sha256_glue.c
+++ b/arch/arm/crypto/sha256_glue.c
@@ -2,7 +2,7 @@
  * Glue code for the SHA256 Secure Hash Algorithm assembly implementation
  * using optimized ARM assembler and NEON instructions.
  *
- * Copyright � 2015 Google Inc.
+ * Copyright © 2015 Google Inc.
  *
  * This file is based on sha256_ssse3_glue.c:
  *   Copyright (C) 2013 Intel Corporation
diff --git a/arch/arm/crypto/sha256_neon_glue.c b/arch/arm/crypto/sha256_neon_glue.c
index 9bbee56fbdc8..1d82c6cd31a4 100644
--- a/arch/arm/crypto/sha256_neon_glue.c
+++ b/arch/arm/crypto/sha256_neon_glue.c
@@ -2,10 +2,10 @@
  * Glue code for the SHA256 Secure Hash Algorithm assembly implementation
  * using NEON instructions.
  *
- * Copyright � 2015 Google Inc.
+ * Copyright © 2015 Google Inc.
  *
  * This file is based on sha512_neon_glue.c:
- *   Copyright � 2014 Jussi Kivilinna 
+ *   Copyright © 2014 Jussi Kivilinna 
  *
  * This program is free software; you can redistribute it and/or modify it
  * under the terms of the GNU General Public License as published by the Free
diff --git a/drivers/crypto/vmx/ghashp8-ppc.pl b/drivers/crypto/vmx/ghashp8-ppc.pl
index f746af271460..38b06503ede0 100644
--- a/drivers/crypto/vmx/ghashp8-ppc.pl
+++ b/drivers/crypto/vmx/ghashp8-ppc.pl
@@ -129,9 +129,9 @@ $code=<<___;
 le?vperm   $IN,$IN,$IN,$lemask
vxor$zero,$zero,$zero
 
-   vpmsumd $Xl,$IN,$Hl # H.lo�Xi.lo
-   vpmsumd $Xm,$IN,$H  # H.hi�Xi.lo+H.lo�Xi.hi
-   vpmsumd $Xh,$IN,$Hh # H.hi�Xi.hi
+   vpmsumd $Xl,$IN,$Hl # H.lo·Xi.lo
+   vpmsumd $Xm,$IN,$H  # H.hi·Xi.lo+H.lo·Xi.hi
+   vpmsumd $Xh,$IN,$Hh # H.hi·Xi.hi
 
vpmsumd $t2,$Xl,$xC2# 1st phase
 
@@ -187,11 +187,11 @@ $code=<<___;
 .align 5
 Loop:
 subic  $len,$len,16
-   vpmsumd $Xl,$IN,$Hl # H.lo�Xi.lo
+   vpmsumd $Xl,$IN,$Hl # H.lo·Xi.lo
 subfe. r0,r0,r0    # borrow?-1:0
-   vpmsumd $Xm,$IN,$H  # H.hi�Xi.lo+H.lo�Xi.hi
+   vpmsumd $Xm,$IN,$H  # H.hi·Xi.lo+H.lo·Xi.hi
 and    r0,r0,$len
-   vpmsumd $Xh,$IN,$Hh # H.hi�Xi.hi
+   vpmsumd $Xh,$IN,$Hh # H.hi·Xi.hi
 add$inp,$inp,r0
 
vpmsumd $t2,$Xl,$xC2# 1st phase
diff --git a/drivers/iio/dac/ltc2632.c b/drivers/iio/dac/ltc2632.c
index cca278eaa138..885105135580 100644
--- a/drivers/iio/dac/ltc2632.c
+++ b/drivers/iio/dac/ltc2632.c
@@ -1,7 +1,7 @@
 /*
  * LTC2632 Digital to analog convertors spi driver
  *
- * 

Re: [PATCH 1/4] treewide: convert ISO_8859-1 text comments to utf-8

2018-07-24 Thread Jonathan Cameron
On Tue, 24 Jul 2018 13:13:25 +0200
Arnd Bergmann  wrote:

> Almost all files in the kernel are either plain text or UTF-8
> encoded. A couple, however, are ISO_8859-1, usually just a few
> characters in C comments, for historic reasons.
> 
> This converts them all to UTF-8 for consistency.
> 
> Signed-off-by: Arnd Bergmann 
For IIO, Acked-by: Jonathan Cameron 

Thanks for tidying this up.

Jonathan

> ---
>  .../devicetree/bindings/net/nfc/pn544.txt |   2 +-
>  arch/arm/boot/dts/sun4i-a10-inet97fv2.dts |   2 +-
>  arch/arm/crypto/sha256_glue.c |   2 +-
>  arch/arm/crypto/sha256_neon_glue.c|   4 +-
>  drivers/crypto/vmx/ghashp8-ppc.pl |  12 +-
>  drivers/iio/dac/ltc2632.c |   2 +-
>  drivers/power/reset/ltc2952-poweroff.c|   4 +-
>  kernel/events/callchain.c |   2 +-
>  net/netfilter/ipvs/Kconfig|   8 +-
>  net/netfilter/ipvs/ip_vs_mh.c |   4 +-
>  tools/power/cpupower/po/de.po |  44 +++
>  tools/power/cpupower/po/fr.po | 120 +-
>  12 files changed, 103 insertions(+), 103 deletions(-)
> 
> diff --git a/Documentation/devicetree/bindings/net/nfc/pn544.txt 
> b/Documentation/devicetree/bindings/net/nfc/pn544.txt
> index 538a86f7b2b0..72593f056b75 100644
> --- a/Documentation/devicetree/bindings/net/nfc/pn544.txt
> +++ b/Documentation/devicetree/bindings/net/nfc/pn544.txt
> @@ -2,7 +2,7 @@
>  
>  Required properties:
>  - compatible: Should be "nxp,pn544-i2c".
> -- clock-frequency: I_C work frequency.
> +- clock-frequency: I²C work frequency.
>  - reg: address on the bus
>  - interrupt-parent: phandle for the interrupt gpio controller
>  - interrupts: GPIO interrupt to which the chip is connected
> diff --git a/arch/arm/boot/dts/sun4i-a10-inet97fv2.dts 
> b/arch/arm/boot/dts/sun4i-a10-inet97fv2.dts
> index 5d096528e75a..71c27ea0b53e 100644
> --- a/arch/arm/boot/dts/sun4i-a10-inet97fv2.dts
> +++ b/arch/arm/boot/dts/sun4i-a10-inet97fv2.dts
> @@ -1,7 +1,7 @@
>  /*
>   * Copyright 2014 Open Source Support GmbH
>   *
> - * David Lanzend_rfer 
> + * David Lanzendörfer 
>   *
>   * This file is dual-licensed: you can use it either under the terms
>   * of the GPL or the X11 license, at your option. Note that this dual
> diff --git a/arch/arm/crypto/sha256_glue.c b/arch/arm/crypto/sha256_glue.c
> index bf8ccff2c9d0..0ae900e778f3 100644
> --- a/arch/arm/crypto/sha256_glue.c
> +++ b/arch/arm/crypto/sha256_glue.c
> @@ -2,7 +2,7 @@
>   * Glue code for the SHA256 Secure Hash Algorithm assembly implementation
>   * using optimized ARM assembler and NEON instructions.
>   *
> - * Copyright _ 2015 Google Inc.
> + * Copyright © 2015 Google Inc.
>   *
>   * This file is based on sha256_ssse3_glue.c:
>   *   Copyright (C) 2013 Intel Corporation
> diff --git a/arch/arm/crypto/sha256_neon_glue.c 
> b/arch/arm/crypto/sha256_neon_glue.c
> index 9bbee56fbdc8..1d82c6cd31a4 100644
> --- a/arch/arm/crypto/sha256_neon_glue.c
> +++ b/arch/arm/crypto/sha256_neon_glue.c
> @@ -2,10 +2,10 @@
>   * Glue code for the SHA256 Secure Hash Algorithm assembly implementation
>   * using NEON instructions.
>   *
> - * Copyright _ 2015 Google Inc.
> + * Copyright © 2015 Google Inc.
>   *
>   * This file is based on sha512_neon_glue.c:
> - *   Copyright _ 2014 Jussi Kivilinna 
> + *   Copyright © 2014 Jussi Kivilinna 
>   *
>   * This program is free software; you can redistribute it and/or modify it
>   * under the terms of the GNU General Public License as published by the Free
> diff --git a/drivers/crypto/vmx/ghashp8-ppc.pl 
> b/drivers/crypto/vmx/ghashp8-ppc.pl
> index f746af271460..38b06503ede0 100644
> --- a/drivers/crypto/vmx/ghashp8-ppc.pl
> +++ b/drivers/crypto/vmx/ghashp8-ppc.pl
> @@ -129,9 +129,9 @@ $code=<<___;
>le?vperm   $IN,$IN,$IN,$lemask
>   vxor$zero,$zero,$zero
>  
> - vpmsumd $Xl,$IN,$Hl # H.lo_Xi.lo
> - vpmsumd $Xm,$IN,$H  # H.hi_Xi.lo+H.lo_Xi.hi
> - vpmsumd $Xh,$IN,$Hh # H.hi_Xi.hi
> + vpmsumd $Xl,$IN,$Hl # H.lo·Xi.lo
> + vpmsumd $Xm,$IN,$H  # H.hi·Xi.lo+H.lo·Xi.hi
> + vpmsumd $Xh,$IN,$Hh # H.hi·Xi.hi
>  
>   vpmsumd $t2,$Xl,$xC2# 1st phase
>  
> @@ -187,11 +187,11 @@ $code=<<___;
>  .align   5
>  Loop:
>subic  $len,$len,16
> - vpmsumd $Xl,$IN,$Hl # H.lo_Xi.lo
> + vpmsumd $Xl,$IN,$Hl # H.lo·Xi.lo
>subfe. r0,r0,r0# borrow?-1:0
> - vpmsumd $Xm,$IN,$H  # H.hi_Xi.lo+H.lo_Xi.hi
> + vpmsumd $Xm,$IN,$H  # H.hi·Xi.lo+H.lo·Xi.hi
>andr0,r0,$len
> - vpmsumd $Xh,$IN,$Hh # H.hi_Xi.hi
> + vpmsumd $Xh,$IN,$Hh # H.hi·Xi.hi
>add$inp,$inp,r0
>  

Re: [PATCH v11 19/26] mm: provide speculative fault infrastructure

2018-07-24 Thread zhong jiang
On 2018/5/17 19:06, Laurent Dufour wrote:
> From: Peter Zijlstra 
>
> Provide infrastructure to do a speculative fault (not holding
> mmap_sem).
>
> The not holding of mmap_sem means we can race against VMA
> change/removal and page-table destruction. We use the SRCU VMA freeing
> to keep the VMA around. We use the VMA seqcount to detect change
> (including unmapping / page-table deletion) and we use gup_fast() style
> page-table walking to deal with page-table races.
>
> Once we've obtained the page and are ready to update the PTE, we
> validate if the state we started the fault with is still valid, if
> not, we'll fail the fault with VM_FAULT_RETRY, otherwise we update the
> PTE and we're done.
>
> Signed-off-by: Peter Zijlstra (Intel) 
>
> [Manage the newly introduced pte_spinlock() for speculative page
>  fault to fail if the VMA is touched in our back]
> [Rename vma_is_dead() to vma_has_changed() and declare it here]
> [Fetch p4d and pud]
> [Set vmd.sequence in __handle_mm_fault()]
> [Abort speculative path when handle_userfault() has to be called]
> [Add additional VMA's flags checks in handle_speculative_fault()]
> [Clear FAULT_FLAG_ALLOW_RETRY in handle_speculative_fault()]
> [Don't set vmf->pte and vmf->ptl if pte_map_lock() failed]
> [Remove warning comment about waiting for !seq&1 since we don't want
>  to wait]
> [Remove warning about no huge page support, mention it explicitly]
> [Don't call do_fault() in the speculative path as __do_fault() calls
>  vma->vm_ops->fault() which may want to release mmap_sem]
> [Only vm_fault pointer argument for vma_has_changed()]
> [Fix check against huge page, calling pmd_trans_huge()]
> [Use READ_ONCE() when reading VMA's fields in the speculative path]
> [Explicitly check for __HAVE_ARCH_PTE_SPECIAL as we can't support for
>  processing done in vm_normal_page()]
> [Check that vma->anon_vma is already set when starting the speculative
>  path]
> [Check for memory policy as we can't support MPOL_INTERLEAVE case due to
>  the processing done in mpol_misplaced()]
> [Don't support VMA growing up or down]
> [Move check on vm_sequence just before calling handle_pte_fault()]
> [Don't build SPF services if !CONFIG_SPECULATIVE_PAGE_FAULT]
> [Add mem cgroup oom check]
> [Use READ_ONCE to access p*d entries]
> [Replace deprecated ACCESS_ONCE() by READ_ONCE() in vma_has_changed()]
> [Don't fetch pte again in handle_pte_fault() when running the speculative
>  path]
> [Check PMD against concurrent collapsing operation]
> [Try spin lock the pte during the speculative path to avoid deadlock with
>  other CPU's invalidating the TLB and requiring this CPU to catch the
>  inter processor's interrupt]
> [Move define of FAULT_FLAG_SPECULATIVE here]
> [Introduce __handle_speculative_fault() and add a check against
>  mm->mm_users in handle_speculative_fault() defined in mm.h]
> Signed-off-by: Laurent Dufour 
> ---
>  include/linux/hugetlb_inline.h |   2 +-
>  include/linux/mm.h |  30 
>  include/linux/pagemap.h|   4 +-
>  mm/internal.h  |  16 +-
>  mm/memory.c| 340 
> -
>  5 files changed, 385 insertions(+), 7 deletions(-)
>
> diff --git a/include/linux/hugetlb_inline.h b/include/linux/hugetlb_inline.h
> index 0660a03d37d9..9e25283d6fc9 100644
> --- a/include/linux/hugetlb_inline.h
> +++ b/include/linux/hugetlb_inline.h
> @@ -8,7 +8,7 @@
>  
>  static inline bool is_vm_hugetlb_page(struct vm_area_struct *vma)
>  {
> - return !!(vma->vm_flags & VM_HUGETLB);
> + return !!(READ_ONCE(vma->vm_flags) & VM_HUGETLB);
>  }
>  
>  #else
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 05cbba70104b..31acf98a7d92 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -315,6 +315,7 @@ extern pgprot_t protection_map[16];
>  #define FAULT_FLAG_USER  0x40/* The fault originated in 
> userspace */
>  #define FAULT_FLAG_REMOTE0x80/* faulting for non current tsk/mm */
>  #define FAULT_FLAG_INSTRUCTION  0x100/* The fault was during an 
> instruction fetch */
> +#define FAULT_FLAG_SPECULATIVE   0x200   /* Speculative fault, not 
> holding mmap_sem */
>  
>  #define FAULT_FLAG_TRACE \
>   { FAULT_FLAG_WRITE, "WRITE" }, \
> @@ -343,6 +344,10 @@ struct vm_fault {
>   gfp_t gfp_mask; /* gfp mask to be used for allocations 
> */
>   pgoff_t pgoff;  /* Logical page offset based on vma */
>   unsigned long address;  /* Faulting virtual address */
> +#ifdef CONFIG_SPECULATIVE_PAGE_FAULT
> + unsigned int sequence;
> + pmd_t orig_pmd; /* value of PMD at the time of fault */
> +#endif
>   pmd_t *pmd; /* Pointer to pmd entry matching
>* the 'address' */
>   pud_t *pud; /* Pointer to pud entry matching
> @@ -1415,6 +1420,31 @@ int invalidate_inode_page(struct page *page);
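
(A condensed sketch of the validation pattern described above, reusing
the vm_sequence seqcount and the VM_FAULT_RETRY fallback named in the
notes; everything else is elided, so treat this as illustrative only:)

	/* Snapshot the VMA's sequence count before walking without mmap_sem.
	 * raw_read_seqcount() is used because we don't want to wait on a
	 * writer (see the note about !seq&1 above). */
	vmf.sequence = raw_read_seqcount(&vma->vm_sequence);

	/* ... gup_fast()-style page-table walk, page allocation ... */

	/* Before committing the new PTE, check nothing changed under us. */
	if (read_seqcount_retry(&vma->vm_sequence, vmf.sequence))
		return VM_FAULT_RETRY;	/* fall back to the classic path */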

Re: [PATCH 1/4] treewide: convert ISO_8859-1 text comments to utf-8

2018-07-24 Thread Andrew Morton
On Tue, 24 Jul 2018 13:13:25 +0200 Arnd Bergmann  wrote:

> Almost all files in the kernel are either plain text or UTF-8
> encoded. A couple however are ISO_8859-1, usually just a few
> characters in a C comments, for historic reasons.
> 
> This converts them all to UTF-8 for consistency.

Was "consistency" the only rationale?  The discussion is now outside my
memory horizon but I thought there were other reasons.

Will we be getting a checkpatch rule to keep things this way?


Re: [PATCH v07 6/9] pmt/numa: Disable arch_update_cpu_topology during CPU readd

2018-07-24 Thread Nathan Fontenot

On 07/13/2018 03:18 PM, Michael Bringmann wrote:

pmt/numa: Disable arch_update_cpu_topology during post-migration
CPU readd updates when evaluating device-tree changes after LPM,
to avoid thread deadlocks while updating node assignments.
System timing between all of the threads and timers restarted in
a migrated system frequently overlapped, allowing tasks to start
acquiring resources (get_online_cpus) needed by rebuild_sched_domains.
Defer that function until after the CPU readd has completed.

Signed-off-by: Michael Bringmann 
---
  arch/powerpc/platforms/pseries/hotplug-cpu.c |9 -
  1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c 
b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index 1906ee57..df1791b 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -26,6 +26,7 @@
  #include /* for idle_task_exit */
  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
@@ -684,9 +685,15 @@ static int dlpar_cpu_readd_by_index(u32 drc_index)

pr_info("Attempting to re-add CPU, drc index %x\n", drc_index);

+   arch_update_cpu_topology_suspend();
rc = dlpar_cpu_remove_by_index(drc_index, false);
-   if (!rc)
+   arch_update_cpu_topology_resume();
+
+   if (!rc) {
+   arch_update_cpu_topology_suspend();
rc = dlpar_cpu_add(drc_index, false);
+   arch_update_cpu_topology_resume();
+   }



A couple of questions... Why not disable across the entire remove and
add sequence instead of disabling around each operation separately?
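
For concreteness, that alternative would look roughly like the sketch
below, simply reusing the helpers from the patch (untested):

	arch_update_cpu_topology_suspend();
	rc = dlpar_cpu_remove_by_index(drc_index, false);
	if (!rc)
		rc = dlpar_cpu_add(drc_index, false);
	arch_update_cpu_topology_resume();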

Also, what about other CPU add/remove routines, do they need to do
similar disabling?

-Nathan


if (rc)
pr_info("Failed to update cpu at drc_index %lx\n",





Re: [PATCH 1/4] treewide: convert ISO_8859-1 text comments to utf-8

2018-07-24 Thread Randy Dunlap
On 07/24/2018 02:00 PM, Andrew Morton wrote:
> On Tue, 24 Jul 2018 13:13:25 +0200 Arnd Bergmann  wrote:
> 
>> Almost all files in the kernel are either plain text or UTF-8
>> encoded. A couple however are ISO_8859-1, usually just a few
>> characters in a C comments, for historic reasons.
>>
>> This converts them all to UTF-8 for consistency.
> 
> Was "consistency" the only rationale?  The discussion is now outside my
> memory horizon but I thought there were other reasons.

kconfig tools prefer ASCII or utf-8.

email tools probably likewise.

user sanity?

> Will we be getting a checkpatch rule to keep things this way?



-- 
~Randy


Re: [PATCH v4 00/11] hugetlb: Factorize hugetlb architecture primitives

2018-07-24 Thread Paul Burton
Hi Alexandre,

On Thu, Jul 05, 2018 at 11:07:05AM +, Alexandre Ghiti wrote:
> In order to reduce copy/paste of functions across architectures and then
> make riscv hugetlb port (and future ports) simpler and smaller, this
> patchset intends to factorize the numerous hugetlb primitives that are
> defined across all the architectures.
> 
> Except for prepare_hugepage_range, this patchset moves the versions that
> are just pass-through to standard pte primitives into
> asm-generic/hugetlb.h by using the same #ifdef semantic that can be
> found in asm-generic/pgtable.h, i.e. __HAVE_ARCH_***.
> 
> s390 architecture has not been tackled in this serie since it does not
> use asm-generic/hugetlb.h at all.
> powerpc could be factorized a bit more (cf huge_ptep_set_wrprotect).
> 
> This patchset has been compiled on x86 only. 

For MIPS these look good - I don't see any issues & they pass a build
test (using cavium_octeon_defconfig which enables huge pages), so:

Acked-by: Paul Burton  # MIPS parts

Thanks,
Paul


Re: [PATCH 1/4] treewide: convert ISO_8859-1 text comments to utf-8

2018-07-24 Thread Andrew Morton
On Tue, 24 Jul 2018 17:13:20 -0700 Joe Perches  wrote:

> On Tue, 2018-07-24 at 14:00 -0700, Andrew Morton wrote:
> > On Tue, 24 Jul 2018 13:13:25 +0200 Arnd Bergmann  wrote:
> > > Almost all files in the kernel are either plain text or UTF-8
> > > encoded. A couple however are ISO_8859-1, usually just a few
> > > characters in a C comments, for historic reasons.
> > > This converts them all to UTF-8 for consistency.
> []
> > Will we be getting a checkpatch rule to keep things this way?
> 
> How would that be done?

I'm using this, seems to work.

if ! file $p | grep -q -P ", ASCII text|, UTF-8 Unicode text"
then
echo $p: weird charset
fi
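
(The byte-level check behind such a rule is small. A standalone C
sketch of a simplified UTF-8 validator follows; it treats ASCII as a
subset and, as a simplification, does not reject overlong encodings or
surrogates:)

	#include <stdbool.h>
	#include <stddef.h>

	static bool is_utf8(const unsigned char *s, size_t len)
	{
		size_t i = 0, j, extra;

		while (i < len) {
			unsigned char c = s[i];

			if (c < 0x80)
				extra = 0;		/* plain ASCII */
			else if ((c & 0xE0) == 0xC0)
				extra = 1;		/* 2-byte sequence */
			else if ((c & 0xF0) == 0xE0)
				extra = 2;		/* 3-byte sequence */
			else if ((c & 0xF8) == 0xF0)
				extra = 3;		/* 4-byte sequence */
			else
				return false;		/* stray byte */
			if (extra && i + extra >= len)
				return false;		/* truncated tail */
			for (j = 1; j <= extra; j++)
				if ((s[i + j] & 0xC0) != 0x80)
					return false;	/* bad continuation */
			i += extra + 1;
		}
		return true;
	}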



Re: [PATCH 1/4] treewide: convert ISO_8859-1 text comments to utf-8

2018-07-24 Thread Joe Perches
On Tue, 2018-07-24 at 14:00 -0700, Andrew Morton wrote:
> On Tue, 24 Jul 2018 13:13:25 +0200 Arnd Bergmann  wrote:
> > Almost all files in the kernel are either plain text or UTF-8
> > encoded. A couple however are ISO_8859-1, usually just a few
> > characters in a C comments, for historic reasons.
> > This converts them all to UTF-8 for consistency.
[]
> Will we be getting a checkpatch rule to keep things this way?

How would that be done?


Re: [PATCH v7 4/4] kexec_file: Load kernel at top of system RAM if required

2018-07-24 Thread Baoquan He
Hi Andrew,

On 07/19/18 at 12:44pm, Andrew Morton wrote:
> On Thu, 19 Jul 2018 23:17:53 +0800 Baoquan He  wrote:
> > > As far as I can tell, the above is the whole reason for the patchset,
> > > yes?  To avoid confusing users.
> > 
> > 
> > In fact, it's not just about avoiding user confusion. Kexec loading
> > and kexec_file loading do the same thing in essence; we just need to
> > do kernel image verification on UEFI systems, so the kexec loading
> > code had to be ported into the kernel.
> > 
> > Kexec has been a formal feature in our distro, and customers owning
> > those kinds of very large machines use it to speed up the reboot
> > process. On UEFI machines, kexec_file loading searches for a place to
> > put the kernel under 4G, from the top down. As we know, the first 4G
> > is the DMA32 zone; DMA, PCI MMCFG, BIOS, etc. all try to consume it,
> > so we may fail to find a usable space there for the kernel/initrd.
> > Searching the whole memory space from the top down avoids this worry.
> > 
> > At the first post, I posted the version below using AKASHI's
> > walk_system_ram_res_rev(). Later you suggested using list_head to
> > link the children and siblings of resources, to see what the code
> > change looks like.
> > http://lkml.kernel.org/r/20180322033722.9279-1-...@redhat.com
> > 
> > Then I posted v2
> > http://lkml.kernel.org/r/20180408024724.16812-1-...@redhat.com
> > Rob Herring mentioned that other components which have this tree
> > structure have planned to do the same thing, replacing the singly
> > linked list with list_head to link resource children and siblings.
> > Quoting Rob's words below; I think this could be another reason.
> > 
> > ~ From Rob
> > The DT struct device_node also has the same tree structure with
> > parent, child, sibling pointers and converting to list_head had been
> > on the todo list for a while. ACPI also has some tree walking
> > functions (drivers/acpi/acpica/pstree.c). Perhaps there should be a
> > common tree struct and helpers defined either on top of list_head or a
> > new struct if that saves some size.
> > ~
> 
> Please let's get all this into the changelogs?

Sorry for the late reply; I was tied up with some urgent customer hotplug issues.

I am rewriting all the change logs and the cover letter, and in doing
so found I was wrong about the 2nd reason. The current kexec_file_load
calls kexec_locate_mem_hole() to go through every system RAM region;
if a region is larger than the size of the kernel or initrd, it
searches for a position in that region from the top down. Since kexec
jumps to the 2nd kernel and doesn't need to preserve the 1st kernel's
data, we can always find a usable space to load the kexec
kernel/initrd under 4G.

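(To make the search just described concrete: per region, top-down
placement boils down to something like the sketch below. The helper
name is made up for illustration and align is assumed to be a power of
two; this is not the kernel's actual interface.)

	/* Highest align-aligned address in [start, end) that fits size,
	 * or 0 if the region is too small. */
	static unsigned long top_down_hole(unsigned long start, unsigned long end,
					   unsigned long size, unsigned long align)
	{
		unsigned long base;

		if (size > end || end - size < start)
			return 0;
		base = (end - size) & ~(align - 1);
		return base >= start ? base : 0;
	}
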
So the only reason for this patch is to stay consistent with kexec_load
and avoid confusion.

And since x86 5-level paging mode has been added, we have another issue
with top-down searching across the whole of system RAM: we support
dynamic switching between 4-level and 5-level paging. Namely, with a
kernel compiled with 5-level support, 'no5lvl' can be added to force
4-level. When jumping from a 5-level kernel to a 4-level kernel, we
might load the kexec kernel at the top of system RAM in 5-level paging
mode, above 64TB, and then try to jump to a 4-level kernel whose upper
limit is 64TB. For this case, we need to add a loading limit for the
kexec kernel when running a 5-level kernel.

All this mess makes me hesitate to settle on an approach. Maybe I
should drop this patchset.

> 
> > > 
> > > Is that sufficient?  Can we instead simplify their lives by providing
> > > better documentation or informative printks or better Kconfig text,
> > > etc?
> > > 
> > > And who *are* the people who are performing this configuration?  Random
> > > system administrators?  Linux distro engineers?  If the latter then
> > > they presumably aren't easily confused!
> > 
> > Kexec was invented for kernel developers to speed up their kernel
> > rebooting. Now high-end server admins, kernel developers and QE are
> > also keen to use it to reboot large boxes for faster feature testing
> > and bug debugging. Kernel developers know about the kernel loading
> > position well; admins or QE might not be aware of it.
> > 
> > > 
> > > In other words, I'm trying to understand how much benefit this patchset
> > > will provide to our users as a whole.
> > 
> > Understood. The list_head replacement patch truly involves too many
> > code changes; it's risky. I am willing to try any idea from
> > reviewers, and won't insist that they be accepted in the end. If we
> > don't try, we don't know what it looks like or what impact it may
> > have. I am fine with taking AKASHI's simple version of
> > walk_system_ram_res_rev() to lower the risk, even though it could be
> > a little less efficient.
> 
> The larger patch produces a better result.  We can handle it ;)

For this issue, if we stop changing the kexec top-down searching code,
I am not sure whether we should post the list_head replacement patches
separately.

Thanks
Baoquan


Re: [RFC 0/4] Virtio uses DMA API for all devices

2018-07-24 Thread Anshuman Khandual
On 07/23/2018 02:38 PM, Michael S. Tsirkin wrote:
> On Mon, Jul 23, 2018 at 11:58:23AM +0530, Anshuman Khandual wrote:
>> On 07/20/2018 06:46 PM, Michael S. Tsirkin wrote:
>>> On Fri, Jul 20, 2018 at 09:29:37AM +0530, Anshuman Khandual wrote:
 This patch series is the follow up on the discussions we had before about
 the RFC titled [RFC,V2] virtio: Add platform specific DMA API translation
 for virito devices (https://patchwork.kernel.org/patch/10417371/). There
 were suggestions about doing away with two different paths of transactions
 with the host/QEMU, first being the direct GPA and the other being the DMA
 API based translations.

 First patch attempts to create a direct GPA mapping based DMA operations
 structure called 'virtio_direct_dma_ops' with exact same implementation
 of the direct GPA path which virtio core currently has but just wrapped in
 a DMA API format. Virtio core must use 'virtio_direct_dma_ops' instead of
 the arch default in absence of VIRTIO_F_IOMMU_PLATFORM flag to preserve the
 existing semantics. The second patch does exactly that inside the function
 virtio_finalize_features(). The third patch removes the default direct GPA
 path from virtio core forcing it to use DMA API callbacks for all devices.
 Now with that change, every device must have a DMA operations structure
 associated with it. The fourth patch adds an additional hook which gives
 the platform an opportunity to do yet another override if required. This
 platform hook can be used on POWER Ultravisor based protected guests to
 load up SWIOTLB DMA callbacks to do the required (as discussed previously
 in the above mentioned thread how host is allowed to access only parts of
 the guest GPA range) bounce buffering into the shared memory for all I/O
 scatter gather buffers to be consumed on the host side.

 Please go through these patches and review whether this approach broadly
 makes sense. I will appreciate suggestions, inputs, comments regarding
 the patches or the approach in general. Thank you.
>>> I like how patches 1-3 look. Could you test performance
>>> with/without to see whether the extra indirection through
>>> use of DMA ops causes a measurable slow-down?
>>
>> I ran this simple DD command 10 times where /dev/vda is a virtio block
>> device of 10GB size.
>>
>> dd if=/dev/zero of=/dev/vda bs=8M count=1024 oflag=direct
>>
>> With and without patches bandwidth which has a bit wide range does not
>> look that different from each other.
>>
>> Without patches
>> ===
>>
>> -- 1 -
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.95557 s, 4.4 GB/s
>> -- 2 -
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 2.05176 s, 4.2 GB/s
>> -- 3 -
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.88314 s, 4.6 GB/s
>> -- 4 -
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.84899 s, 4.6 GB/s
>> -- 5 -
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 5.37184 s, 1.6 GB/s
>> -- 6 -
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.9205 s, 4.5 GB/s
>> -- 7 -
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 6.85166 s, 1.3 GB/s
>> -- 8 -
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.74049 s, 4.9 GB/s
>> -- 9 -
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 6.31699 s, 1.4 GB/s
>> -- 10 -
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 2.47057 s, 3.5 GB/s
>>
>>
>> With patches
>> 
>>
>> -- 1 -
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 2.25993 s, 3.8 GB/s
>> -- 2 -
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.82438 s, 4.7 GB/s
>> -- 3 -
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.93856 s, 4.4 GB/s
>> -- 4 -
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.83405 s, 4.7 GB/s
>> -- 5 -
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 7.50199 s, 1.1 GB/s
>> -- 6 -
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 2.28742 s, 3.8 GB/s
>> -- 7 -
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 5.74958 s, 1.5 GB/s
>> -- 8 -
>> 

Re: [PATCH 1/4] treewide: convert ISO_8859-1 text comments to utf-8

2018-07-24 Thread Michael Ellerman
Arnd Bergmann  writes:

> Almost all files in the kernel are either plain text or UTF-8
> encoded. A couple however are ISO_8859-1, usually just a few
> characters in a C comments, for historic reasons.
>
> This converts them all to UTF-8 for consistency.
>
> Signed-off-by: Arnd Bergmann 
> ---
...
>  drivers/crypto/vmx/ghashp8-ppc.pl |  12 +-
...
> diff --git a/drivers/crypto/vmx/ghashp8-ppc.pl 
> b/drivers/crypto/vmx/ghashp8-ppc.pl
> index f746af271460..38b06503ede0 100644
> --- a/drivers/crypto/vmx/ghashp8-ppc.pl
> +++ b/drivers/crypto/vmx/ghashp8-ppc.pl
> @@ -129,9 +129,9 @@ $code=<<___;
>le?vperm   $IN,$IN,$IN,$lemask
>   vxor$zero,$zero,$zero
>  
> - vpmsumd $Xl,$IN,$Hl # H.loXi.lo
> - vpmsumd $Xm,$IN,$H  # H.hiXi.lo+H.loXi.hi
> - vpmsumd $Xh,$IN,$Hh # H.hiXi.hi
> + vpmsumd $Xl,$IN,$Hl # H.lo·Xi.lo
> + vpmsumd $Xm,$IN,$H  # H.hi·Xi.lo+H.lo·Xi.hi
> + vpmsumd $Xh,$IN,$Hh # H.hi·Xi.hi
>  
>   vpmsumd $t2,$Xl,$xC2# 1st phase
>  
> @@ -187,11 +187,11 @@ $code=<<___;
>  .align   5
>  Loop:
>subic  $len,$len,16
> - vpmsumd $Xl,$IN,$Hl # H.loXi.lo
> + vpmsumd $Xl,$IN,$Hl # H.lo·Xi.lo
>subfe. r0,r0,r0# borrow?-1:0
> - vpmsumd $Xm,$IN,$H  # H.hiXi.lo+H.loXi.hi
> + vpmsumd $Xm,$IN,$H  # H.hi·Xi.lo+H.lo·Xi.hi
>andr0,r0,$len
> - vpmsumd $Xh,$IN,$Hh # H.hiXi.hi
> + vpmsumd $Xh,$IN,$Hh # H.hi·Xi.hi
>add$inp,$inp,r0
>  
>   vpmsumd $t2,$Xl,$xC2# 1st phase

Acked-by: Michael Ellerman  (powerpc)

cheers


Re: [RFC 4/4] virtio: Add platform specific DMA API translation for virito devices

2018-07-24 Thread Anshuman Khandual
On 07/23/2018 07:46 AM, Anshuman Khandual wrote:
> On 07/20/2018 06:45 PM, Michael S. Tsirkin wrote:
>> On Fri, Jul 20, 2018 at 09:29:41AM +0530, Anshuman Khandual wrote:
>>> Subject: Re: [RFC 4/4] virtio: Add platform specific DMA API translation for
>>> virito devices
>>
>> s/virito/virtio/
> 
> Oops, will fix it. Thanks for pointing out.
> 
>>
>>> This adds a hook which a platform can define in order to allow it to
>>> override virtio device's DMA OPS irrespective of whether it has the
>>> flag VIRTIO_F_IOMMU_PLATFORM set or not. We want to use this to do
>>> bounce-buffering of data on the new secure pSeries platform, currently
>>> under development, where a KVM host cannot access all of the memory
>>> space of a secure KVM guest.  The host can only access the pages which
>>> the guest has explicitly requested to be shared with the host, thus
>>> the virtio implementation in the guest has to copy data to and from
>>> shared pages.
>>>
>>> With this hook, the platform code in the secure guest can force the
>>> use of swiotlb for virtio buffers, with a back-end for swiotlb which
>>> will use a pool of pre-allocated shared pages.  Thus all data being
>>> sent or received by virtio devices will be copied through pages which
>>> the host has access to.
>>>
>>> Signed-off-by: Anshuman Khandual 
>>> ---
>>>  arch/powerpc/include/asm/dma-mapping.h | 6 ++
>>>  arch/powerpc/platforms/pseries/iommu.c | 6 ++
>>>  drivers/virtio/virtio.c| 7 +++
>>>  3 files changed, 19 insertions(+)
>>>
>>> diff --git a/arch/powerpc/include/asm/dma-mapping.h 
>>> b/arch/powerpc/include/asm/dma-mapping.h
>>> index 8fa3945..bc5a9d3 100644
>>> --- a/arch/powerpc/include/asm/dma-mapping.h
>>> +++ b/arch/powerpc/include/asm/dma-mapping.h
>>> @@ -116,3 +116,9 @@ extern u64 __dma_get_required_mask(struct device *dev);
>>>  
>>>  #endif /* __KERNEL__ */
>>>  #endif /* _ASM_DMA_MAPPING_H */
>>> +
>>> +#define platform_override_dma_ops platform_override_dma_ops
>>> +
>>> +struct virtio_device;
>>> +
>>> +extern void platform_override_dma_ops(struct virtio_device *vdev);
>>> diff --git a/arch/powerpc/platforms/pseries/iommu.c 
>>> b/arch/powerpc/platforms/pseries/iommu.c
>>> index 06f0296..5773bc7 100644
>>> --- a/arch/powerpc/platforms/pseries/iommu.c
>>> +++ b/arch/powerpc/platforms/pseries/iommu.c
>>> @@ -38,6 +38,7 @@
>>>  #include 
>>>  #include 
>>>  #include 
>>> +#include 
>>>  #include 
>>>  #include 
>>>  #include 
>>> @@ -1396,3 +1397,8 @@ static int __init disable_multitce(char *str)
>>>  __setup("multitce=", disable_multitce);
>>>  
>>>  machine_subsys_initcall_sync(pseries, tce_iommu_bus_notifier_init);
>>> +
>>> +void platform_override_dma_ops(struct virtio_device *vdev)
>>> +{
>>> +   /* Override vdev->parent.dma_ops if required */
>>> +}
>>> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
>>> index 6b13987..432c332 100644
>>> --- a/drivers/virtio/virtio.c
>>> +++ b/drivers/virtio/virtio.c
>>> @@ -168,6 +168,12 @@ EXPORT_SYMBOL_GPL(virtio_add_status);
>>>  
>>>  const struct dma_map_ops virtio_direct_dma_ops;
>>>  
>>> +#ifndef platform_override_dma_ops
>>> +static inline void platform_override_dma_ops(struct virtio_device *vdev)
>>> +{
>>> +}
>>> +#endif
>>> +
>>>  int virtio_finalize_features(struct virtio_device *dev)
>>>  {
>>> int ret = dev->config->finalize_features(dev);
>>> @@ -179,6 +185,7 @@ int virtio_finalize_features(struct virtio_device *dev)
>>> if (virtio_has_iommu_quirk(dev))
>>> set_dma_ops(dev->dev.parent, &virtio_direct_dma_ops);
>>>  
>>> +   platform_override_dma_ops(dev);
>>
>> Is there a single place where virtio_has_iommu_quirk is called now?
> 
> Not other than this one. But in the proposed implementation of
> platform_override_dma_ops on powerpc, we will again check on
> virtio_has_iommu_quirk before overriding it with SWIOTLB.
> 
> void platform_override_dma_ops(struct virtio_device *vdev)
> {
> if (is_ultravisor_platform() && virtio_has_iommu_quirk(vdev))
> set_dma_ops(vdev->dev.parent, &swiotlb_dma_ops);
> }
> 
>> If so, we could put this into virtio_has_iommu_quirk then.
> 
> Did you mean platform_override_dma_ops instead? If so, yes, that is
> possible. The default implementation of platform_override_dma_ops
> should just check the VIRTIO_F_IOMMU_PLATFORM feature and override
> with virtio_direct_dma_ops, but an arch implementation can check
> whatever else it would like and override appropriately.
> 
> Default platform_override_dma_ops will be like this
> 
> #ifndef platform_override_dma_ops
> static inline void platform_override_dma_ops(struct virtio_device *vdev)
> {
>   if(!virtio_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM))
> 	set_dma_ops(vdev->dev.parent, &virtio_direct_dma_ops);
> }
> #endif
> 
> Proposed powerpc implementation will be like this instead
> 
> void platform_override_dma_ops(struct virtio_device *vdev)
> {
>   if (virtio_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM))
>   return;
> 

Re: [PATCH v3 1/1] KVM: PPC: Book3S HV: pack VCORE IDs to access full VCPU ID space

2018-07-24 Thread Sam Bobroff
On Mon, Jul 23, 2018 at 03:43:37PM +1000, Paul Mackerras wrote:
> On Thu, Jul 19, 2018 at 12:25:10PM +1000, Sam Bobroff wrote:
> > From: Sam Bobroff 
> > 
> > It is not currently possible to create the full number of possible
> > VCPUs (KVM_MAX_VCPUS) on Power9 with KVM-HV when the guest uses less
> > threads per core than it's core stride (or "VSMT mode"). This is
> > because the VCORE ID and XIVE offsets to grow beyond KVM_MAX_VCPUS
> > even though the VCPU ID is less than KVM_MAX_VCPU_ID.
> > 
> > To address this, "pack" the VCORE ID and XIVE offsets by using
> > knowledge of the way the VCPU IDs will be used when there are less
> > guest threads per core than the core stride. The primary thread of
> > each core will always be used first. Then, if the guest uses more than
> > one thread per core, these secondary threads will sequentially follow
> > the primary in each core.
> > 
> > So, the only way an ID above KVM_MAX_VCPUS can be seen, is if the
> > VCPUs are being spaced apart, so at least half of each core is empty
> > and IDs between KVM_MAX_VCPUS and (KVM_MAX_VCPUS * 2) can be mapped
> > into the second half of each core (4..7, in an 8-thread core).
> > 
> > Similarly, if IDs above KVM_MAX_VCPUS * 2 are seen, at least 3/4 of
> > each core is being left empty, and we can map down into the second and
> > third quarters of each core (2, 3 and 5, 6 in an 8-thread core).
> > 
> > Lastly, if IDs above KVM_MAX_VCPUS * 4 are seen, only the primary
> > threads are being used and 7/8 of the core is empty, allowing use of
> > the 1, 3, 5 and 7 thread slots.
> > 
> > (Strides less than 8 are handled similarly.)
> > 
> > This allows the VCORE ID or offset to be calculated quickly from the
> > VCPU ID or XIVE server numbers, without access to the VCPU structure.
> > 
> > Signed-off-by: Sam Bobroff 
> 
> I have some comments relating to the situation where the stride
> (i.e. kvm->arch.emul_smt_mode) is less than 8; see below.
> 
> [snip]
> > +static inline u32 kvmppc_pack_vcpu_id(struct kvm *kvm, u32 id)
> > +{
> > +   const int block_offsets[MAX_SMT_THREADS] = {0, 4, 2, 6, 1, 3, 5, 7};
> 
> This needs to be {0, 4, 2, 6, 1, 5, 3, 7} (with the 3 and 5 swapped
> from what you have) for the case when stride == 4 and block == 3.  In
> that case we need block_offsets[block] to be 3; if it is 5, then we
> will collide with the case where block == 2 for the next virtual core.

Agh! Yes it does.

> > +   int stride = kvm->arch.emul_smt_mode;
> > +   int block = (id / KVM_MAX_VCPUS) * (MAX_SMT_THREADS / stride);
> > +   u32 packed_id;
> > +
> > +   BUG_ON(block >= MAX_SMT_THREADS);
> > +   packed_id = (id % KVM_MAX_VCPUS) + block_offsets[block];
> > +   BUG_ON(packed_id >= KVM_MAX_VCPUS);
> > +   return packed_id;
> > +}
> > +
> >  #endif /* __ASM_KVM_BOOK3S_H__ */
> > diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> > index de686b340f4a..363c2fb0d89e 100644
> > --- a/arch/powerpc/kvm/book3s_hv.c
> > +++ b/arch/powerpc/kvm/book3s_hv.c
> > @@ -1816,7 +1816,7 @@ static int threads_per_vcore(struct kvm *kvm)
> > return threads_per_subcore;
> >  }
> >  
> > -static struct kvmppc_vcore *kvmppc_vcore_create(struct kvm *kvm, int core)
> > +static struct kvmppc_vcore *kvmppc_vcore_create(struct kvm *kvm, int id)
> >  {
> > struct kvmppc_vcore *vcore;
> >  
> > @@ -1830,7 +1830,7 @@ static struct kvmppc_vcore 
> > *kvmppc_vcore_create(struct kvm *kvm, int core)
> > init_swait_queue_head(>wq);
> > vcore->preempt_tb = TB_NIL;
> > vcore->lpcr = kvm->arch.lpcr;
> > -   vcore->first_vcpuid = core * kvm->arch.smt_mode;
> > +   vcore->first_vcpuid = id;
> > vcore->kvm = kvm;
> > INIT_LIST_HEAD(>preempt_list);
> >  
> > @@ -2048,12 +2048,18 @@ static struct kvm_vcpu 
> > *kvmppc_core_vcpu_create_hv(struct kvm *kvm,
> > mutex_lock(>lock);
> > vcore = NULL;
> > err = -EINVAL;
> > -   core = id / kvm->arch.smt_mode;
> > +   if (cpu_has_feature(CPU_FTR_ARCH_300)) {
> > +   BUG_ON(kvm->arch.smt_mode != 1);
> > +   core = kvmppc_pack_vcpu_id(kvm, id);
> 
> We now have a way for userspace to trigger a BUG_ON, as far as I can
> see.  The only check on id up to this point is that it is less than
> KVM_MAX_VCPU_ID, which means that the BUG_ON(block >= MAX_SMT_THREADS)
> can be triggered, if kvm->arch.emul_smt_mode < MAX_SMT_THREADS, by
> giving an id that is greater than or equal to KVM_MAX_VCPUS *
> kvm->arch.emul_smt_mode.
> 
> > +   } else {
> > +   core = id / kvm->arch.smt_mode;
> > +   }
> > if (core < KVM_MAX_VCORES) {
> > vcore = kvm->arch.vcores[core];
> > +   BUG_ON(cpu_has_feature(CPU_FTR_ARCH_300) && vcore);
> 
> Doesn't this just mean that userspace has chosen an id big enough to
> cause a collision in the output space of kvmppc_pack_vcpu_id()?  How
> is this not user-triggerable?
> 
> Paul.

Yep, good point. Particularly when dealing with a malicious userspace
that won't follow QEMU's allocation pattern.

I'll re-work it 

Re: [PATCH v11 19/26] mm: provide speculative fault infrastructure

2018-07-24 Thread Laurent Dufour



On 24/07/2018 16:26, zhong jiang wrote:
> On 2018/5/17 19:06, Laurent Dufour wrote:
>> From: Peter Zijlstra 
>>
>> Provide infrastructure to do a speculative fault (not holding
>> mmap_sem).
>>
>> The not holding of mmap_sem means we can race against VMA
>> change/removal and page-table destruction. We use the SRCU VMA freeing
>> to keep the VMA around. We use the VMA seqcount to detect change
>> (including unmapping / page-table deletion) and we use gup_fast() style
>> page-table walking to deal with page-table races.
>>
>> Once we've obtained the page and are ready to update the PTE, we
>> validate if the state we started the fault with is still valid, if
>> not, we'll fail the fault with VM_FAULT_RETRY, otherwise we update the
>> PTE and we're done.
>>
>> Signed-off-by: Peter Zijlstra (Intel) 
>>
>> [Manage the newly introduced pte_spinlock() for speculative page
>>  fault to fail if the VMA is touched in our back]
>> [Rename vma_is_dead() to vma_has_changed() and declare it here]
>> [Fetch p4d and pud]
>> [Set vmd.sequence in __handle_mm_fault()]
>> [Abort speculative path when handle_userfault() has to be called]
>> [Add additional VMA's flags checks in handle_speculative_fault()]
>> [Clear FAULT_FLAG_ALLOW_RETRY in handle_speculative_fault()]
>> [Don't set vmf->pte and vmf->ptl if pte_map_lock() failed]
>> [Remove warning comment about waiting for !seq&1 since we don't want
>>  to wait]
>> [Remove warning about no huge page support, mention it explicitly]
>> [Don't call do_fault() in the speculative path as __do_fault() calls
>>  vma->vm_ops->fault() which may want to release mmap_sem]
>> [Only vm_fault pointer argument for vma_has_changed()]
>> [Fix check against huge page, calling pmd_trans_huge()]
>> [Use READ_ONCE() when reading VMA's fields in the speculative path]
>> [Explicitly check for __HAVE_ARCH_PTE_SPECIAL as we can't support for
>>  processing done in vm_normal_page()]
>> [Check that vma->anon_vma is already set when starting the speculative
>>  path]
>> [Check for memory policy as we can't support MPOL_INTERLEAVE case due to
>>  the processing done in mpol_misplaced()]
>> [Don't support VMA growing up or down]
>> [Move check on vm_sequence just before calling handle_pte_fault()]
>> [Don't build SPF services if !CONFIG_SPECULATIVE_PAGE_FAULT]
>> [Add mem cgroup oom check]
>> [Use READ_ONCE to access p*d entries]
>> [Replace deprecated ACCESS_ONCE() by READ_ONCE() in vma_has_changed()]
>> [Don't fetch pte again in handle_pte_fault() when running the speculative
>>  path]
>> [Check PMD against concurrent collapsing operation]
>> [Try spin lock the pte during the speculative path to avoid deadlock with
>>  other CPU's invalidating the TLB and requiring this CPU to catch the
>>  inter processor's interrupt]
>> [Move define of FAULT_FLAG_SPECULATIVE here]
>> [Introduce __handle_speculative_fault() and add a check against
>>  mm->mm_users in handle_speculative_fault() defined in mm.h]
>> Signed-off-by: Laurent Dufour 
>> ---
>>  include/linux/hugetlb_inline.h |   2 +-
>>  include/linux/mm.h |  30 
>>  include/linux/pagemap.h|   4 +-
>>  mm/internal.h  |  16 +-
>>  mm/memory.c| 340 
>> -
>>  5 files changed, 385 insertions(+), 7 deletions(-)
>>
>> diff --git a/include/linux/hugetlb_inline.h b/include/linux/hugetlb_inline.h
>> index 0660a03d37d9..9e25283d6fc9 100644
>> --- a/include/linux/hugetlb_inline.h
>> +++ b/include/linux/hugetlb_inline.h
>> @@ -8,7 +8,7 @@
>>  
>>  static inline bool is_vm_hugetlb_page(struct vm_area_struct *vma)
>>  {
>> -return !!(vma->vm_flags & VM_HUGETLB);
>> +return !!(READ_ONCE(vma->vm_flags) & VM_HUGETLB);
>>  }
>>  
>>  #else
>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>> index 05cbba70104b..31acf98a7d92 100644
>> --- a/include/linux/mm.h
>> +++ b/include/linux/mm.h
>> @@ -315,6 +315,7 @@ extern pgprot_t protection_map[16];
>>  #define FAULT_FLAG_USER 0x40/* The fault originated in 
>> userspace */
>>  #define FAULT_FLAG_REMOTE   0x80/* faulting for non current tsk/mm */
>>  #define FAULT_FLAG_INSTRUCTION  0x100   /* The fault was during an 
>> instruction fetch */
>> +#define FAULT_FLAG_SPECULATIVE  0x200   /* Speculative fault, not 
>> holding mmap_sem */
>>  
>>  #define FAULT_FLAG_TRACE \
>>  { FAULT_FLAG_WRITE, "WRITE" }, \
>> @@ -343,6 +344,10 @@ struct vm_fault {
>>  gfp_t gfp_mask; /* gfp mask to be used for allocations 
>> */
>>  pgoff_t pgoff;  /* Logical page offset based on vma */
>>  unsigned long address;  /* Faulting virtual address */
>> +#ifdef CONFIG_SPECULATIVE_PAGE_FAULT
>> +unsigned int sequence;
>> +pmd_t orig_pmd; /* value of PMD at the time of fault */
>> +#endif
>>  pmd_t *pmd; /* Pointer to pmd entry matching
>>   * the 'address' */
>>  

Re: [PATCH 4/7] x86,tlb: make lazy TLB mode lazier

2018-07-24 Thread Will Deacon
Hi Andy,

Sorry, I missed the arm64 question at the end of this...

On Thu, Jul 19, 2018 at 10:04:09AM -0700, Andy Lutomirski wrote:
> On Thu, Jul 19, 2018 at 9:45 AM, Andy Lutomirski  wrote:
> > [I added PeterZ and Vitaly -- can you see any way in which this would
> > break something obscure?  I don't.]
> >
> > On Thu, Jul 19, 2018 at 7:14 AM, Rik van Riel  wrote:
> >> I guess we can skip both switch_ldt and load_mm_cr4 if real_prev equals
> >> next?
> >
> > Yes, AFAICS.
> >
> >>
> >> On to the lazy TLB mm_struct refcounting stuff :)
> >>
> >>>
> >>> Which refcount?  mm_users shouldn’t be hot, so I assume you’re talking 
> >>> about
> >>> mm_count. My suggestion is to get rid of mm_count instead of trying to
> >>> optimize it.
> >>
> >>
> >> Do you have any suggestions on how? :)
> >>
> >> The TLB shootdown sent at __exit_mm time does not get rid of the
> >> kernel thread's ->active_mm
> >> pointer pointing at the mm that is exiting.
> >>
> >
> > Ah, but that's conceptually very easy to fix.  Add a #define like
> > ARCH_NO_TASK_ACTIVE_MM.  Then just get rid of active_mm if that
> > #define is set.  After some grepping, there are very few users.  The
> > only nontrivial ones are the ones in kernel/ and mm/mmu_context.c that
> > are involved in the rather complicated dance of refcounting active_mm.
> > If that field goes away, it doesn't need to be refcounted.  Instead, I
> > think the refcounting can get replaced with something like:
> >
> > /*
> >  * Release any arch-internal references to mm.  Only called when
> > mm_users is zero
> >  * and all tasks using mm have either been switch_mm()'d away or have had
> >  * enter_lazy_tlb() called.
> >  */
> > extern void arch_shoot_down_dead_mm(struct mm_struct *mm);
> >
> > which the kernel calls in __mmput() after tearing down all the page
> > tables.  The body can be something like:
> >
> > if (WARN_ON(cpumask_any_but(mm_cpumask(...), ...)) {
> >   /* send an IPI.  Maybe just call tlb_flush_remove_tables() */
> > }
> >
> > (You'll also have to fix up the highly questionable users in
> > arch/x86/platform/efi/efi_64.c, but that's easy.)
> >
> > Does all that make sense?  Basically, as I understand it, the
> > expensive atomic ops you're seeing are all pointless because they're
> > enabling an optimization that hasn't actually worked for a long time,
> > if ever.
> 
> Hmm.  Xen PV has a big hack in xen_exit_mmap(), which is called from
> arch_exit_mmap(), I think.  It's a heavier weight version of more or
> less the same thing that arch_shoot_down_dead_mm() would be, except
> that it happens before exit_mmap().  But maybe Xen actually has the
> right idea.  In other words, rather doing the big pagetable free in
> exit_mmap() while there may still be other CPUs pointing at the page
> tables, the other order might make more sense.  So maybe, if
> ARCH_NO_TASK_ACTIVE_MM is set, arch_exit_mmap() should be responsible
> for getting rid of all secret arch references to the mm.
> 
> Hmm.  ARCH_FREE_UNUSED_MM_IMMEDIATELY might be a better name.
> 
> I added some more arch maintainers.  The idea here is that, on x86 at
> least, task->active_mm and all its refcounting is pure overhead.  When
> a process exits, __mmput() gets called, but the core kernel has a
> longstanding "optimization" in which other tasks (kernel threads and
> idle tasks) may have ->active_mm pointing at this mm.  This is nasty,
> complicated, and hurts performance on large systems, since it requires
> extra atomic operations whenever a CPU switches between real users
> threads and idle/kernel threads.
> 
> It's also almost completely worthless on x86 at least, since __mmput()
> frees pagetables, and that operation *already* forces a remote TLB
> flush, so we might as well zap all the active_mm references at the
> same time.
> 
> But arm64 has real HW remote flushes.  Does arm64 actually benefit
> from the active_mm optimization?  What happens on arm64 when a process
> exits?  How about s390?  I suspect that s390 has rather larger systems
> than arm64, where the cost of the reference counting can be much
> higher.

IIRC, the TLB invalidation on task exit has the fullmm field set in the
mmu_gather structure, so we don't actually do any TLB invalidation at all.
Instead, we just don't re-allocate the ASID and invalidate the whole TLB
when we run out of ASIDs (they're 16-bit on most Armv8 CPUs).

Does that answer your question?

Will


Re: [V2, 1/2] powerpc/powernv/opal-dump : Handles opal_dump_info properly

2018-07-24 Thread Michael Ellerman
On Mon, 2017-02-20 at 13:22:10 UTC, Mukesh Ojha wrote:
> Move the return value check of 'opal_dump_info' to the proper place;
> previously all the dump info was unnecessarily filled in even on failure.
> 
> Signed-off-by: Mukesh Ojha 
> Acked-by: Stewart Smith 
> Acked-by: Jeremy Kerr 

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/a5bbe8fd29f7e42fe5d26371adbad9

cheers


Re: powerpc/tm: Remove struct thread_info param from tm_reclaim_thread()

2018-07-24 Thread Michael Ellerman
On Thu, 2018-02-01 at 01:07:46 UTC, Cyril Bur wrote:
> tm_reclaim_thread() doesn't use the parameter anymore, yet both callers
> have to bother getting it even though they have no need for a struct
> thread_info either.
> 
> Just remove it and adjust the callers.
> 
> Signed-off-by: Cyril Bur 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/edd00b830731be468fd3caf7f9154d

cheers


Re: powerpc/tm: Update function prototype comment

2018-07-24 Thread Michael Ellerman
On Mon, 2018-02-05 at 05:17:16 UTC, Cyril Bur wrote:
> In commit eb5c3f1c8647 ("powerpc: Always save/restore checkpointed regs
> during treclaim/trecheckpoint") __tm_recheckpoint was modified to no
> longer take the second parameter 'unsigned long orig_msr' as part of a
> TM rewrite to simplify the reclaiming/recheckpointing process.
> 
> There is a comment in the asm file where the function is declared which
> has an incorrect prototype with the 'orig_msr' parameter.
> 
> This patch corrects the comment.
> 
> Signed-off-by: Cyril Bur 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/a596a7e91710d26fd862e3b7031c40

cheers


Re: [01/15] powerpc/powernv: opal_put_chars partial write fix

2018-07-24 Thread Michael Ellerman
On Mon, 2018-04-30 at 14:55:44 UTC, Nicholas Piggin wrote:
> The intention here is to consume and discard the remaining buffer
> upon error. This works if there has not been a previous partial write.
> If there has been, then total_len is no longer total number of bytes
> to copy. total_len is always "bytes left to copy", so it should be
> added to written bytes.
> 
> This code may not be exercised any more if partial writes will not be
> hit, but this is a small bugfix before a larger change.
> 
> Reviewed-by: Benjamin Herrenschmidt 
> Signed-off-by: Nicholas Piggin 

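(The accounting rule described above is easiest to see in a standalone
C sketch; write_some() is hypothetical, standing in for a primitive
that writes up to len bytes and returns the count written or a
negative error:)

	static int put_chars_sketch(const char *buf, int total_len)
	{
		int written = 0;

		while (total_len > 0) {
			int rc = write_some(buf + written, total_len);

			if (rc < 0) {
				/* Consume and discard the rest: total_len is
				 * "bytes left to copy", so it must be added
				 * to written, not assigned. */
				written += total_len;
				break;
			}
			written += rc;
			total_len -= rc;
		}
		return written;
	}
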
Patches 1-9 applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/bd90284cc6c1c9e8e48c8eadd0c795

cheers


Re: [v2] powerpc/64s: make PACA_IRQ_HARD_DIS track MSR[EE] closely

2018-07-24 Thread Michael Ellerman
On Sun, 2018-06-03 at 12:24:32 UTC, Nicholas Piggin wrote:
> When the masked interrupt handler clears MSR[EE] for an interrupt in
> the PACA_IRQ_MUST_HARD_MASK set, it does not set PACA_IRQ_HARD_DIS.
> This makes them get out of synch.
> 
> With that taken into account, it's only low level irq manipulation
> (and interrupt entry before reconcile) where they can be out of synch.
> This makes the code less surprising.
> 
> It also allows the IRQ replay code to rely on the IRQ_HARD_DIS value
> and not have to mtmsrd again in this case (e.g., for an external
> interrupt that has been masked). The bigger benefit might just be
> that there is not such an element of surprise in these two bits of
> state.
> 
> Signed-off-by: Nicholas Piggin 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/9b81c0211c249c1bc8caec2ddbc86e

cheers


Re: [v8, 1/5] powerpc/64: Align bytes before fall back to .Lshort in powerpc64 memcmp()

2018-07-24 Thread Michael Ellerman
On Thu, 2018-06-07 at 01:57:51 UTC, wei.guo.si...@gmail.com wrote:
> From: Simon Guo 
> 
> Currently memcmp() 64bytes version in powerpc will fall back to .Lshort
> (compare per byte mode) if either src or dst address is not 8 bytes aligned.
> It can be optimized in 2 situations:
> 
> 1) If both addresses share the same offset within an 8-byte boundary:
> memcmp() can first compare the unaligned bytes up to the 8-byte
> boundary and then compare the remaining 8-byte-aligned content in
> .Llong mode.
> 
> 2) If the src/dst addresses do not share the same offset within an
> 8-byte boundary: memcmp() can align the src address to 8 bytes and
> increment the dst address accordingly, then use aligned loads for src
> and unaligned loads for dst.
> 
> This patch optimizes memcmp() behavior in the above 2 situations.
> 
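> (A rough plain-C illustration of situation (1), not the actual ppc64
> assembler; since both pointers share the same offset, aligning one
> aligns the other, and word-sized mismatches are re-ordered bytewise by
> deferring to memcmp():)
> 
> 	#include <stddef.h>
> 	#include <stdint.h>
> 	#include <string.h>
> 
> 	static int memcmp_sketch(const void *s1, const void *s2, size_t n)
> 	{
> 		const unsigned char *p1 = s1, *p2 = s2;
> 		size_t head = (8 - ((uintptr_t)p1 & 7)) & 7, i;
> 
> 		if (head > n)
> 			head = n;
> 		for (i = 0; i < head; i++)	/* unaligned prefix, bytewise */
> 			if (p1[i] != p2[i])
> 				return p1[i] < p2[i] ? -1 : 1;
> 		p1 += head; p2 += head; n -= head;
> 
> 		while (n >= 8) {		/* aligned 8-byte chunks */
> 			if (*(const uint64_t *)p1 != *(const uint64_t *)p2)
> 				return memcmp(p1, p2, 8);
> 			p1 += 8; p2 += 8; n -= 8;
> 		}
> 		return n ? memcmp(p1, p2, n) : 0;
> 	}
> 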
> Tested with both little/big endian. Performance result below is based on
> little endian.
> 
> Following is the test result with src/dst having the same offset case:
> (a similar result was observed when src/dst having different offset):
> (1) 256 bytes
> Test with the existing tools/testing/selftests/powerpc/stringloops/memcmp:
> - without patch
>   29.773018302 seconds time elapsed  ( +- 0.09% )
> - with patch
>   16.485568173 seconds time elapsed  ( +- 0.02% )
>   -> There is ~80% improvement
> 
> (2) 32 bytes
> To observe performance impact on < 32 bytes, modify
> tools/testing/selftests/powerpc/stringloops/memcmp.c with following:
> ---
>  #include 
>  #include "utils.h"
> 
> -#define SIZE 256
> +#define SIZE 32
>  #define ITERATIONS 1
> 
>  int test_memcmp(const void *s1, const void *s2, size_t n);
> 
> 
> - Without patch
>   0.244746482 seconds time elapsed  ( +- 0.36% )
> - with patch
>   0.215069477 seconds time elapsed  ( +- 0.51% )
>   -> There is ~13% improvement
> 
> (3) 0~8 bytes
> To observe <8 bytes performance impact, modify
> tools/testing/selftests/powerpc/stringloops/memcmp.c with following:
> ---
>  #include 
>  #include "utils.h"
> 
> -#define SIZE 256
> -#define ITERATIONS 1
> +#define SIZE 8
> +#define ITERATIONS 100
> 
>  int test_memcmp(const void *s1, const void *s2, size_t n);
> ---
> - Without patch
>   1.845642503 seconds time elapsed  ( +- 0.12% )
> - With patch
>   1.849767135 seconds time elapsed  ( +- 0.26% )
>   -> They are nearly the same. (-0.2%)
> 
> Signed-off-by: Simon Guo 

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/2d9ee327adce5f6becea2dd51d282a

cheers


Re: [1/2] powerpc/mm: Check memblock_add against MAX_PHYSMEM_BITS range

2018-07-24 Thread Michael Ellerman
On Thu, 2018-06-21 at 08:31:57 UTC, "Aneesh Kumar K.V" wrote:
> With SPARSEMEM config enabled, we make sure that we don't add sections beyond
> MAX_PHYSMEM_BITS range. This results in not building vmemmap mapping for
> range beyond the max range. But our memblock layer looks at the device
> tree and creates mappings for the full memory range. Prevent this by
> checking against MAX_PHYSMEM_BITS when doing memblock_add.
> 
> We don't do a similar check for memblock_reserve_range. If a reserved
> range is beyond MAX_PHYSMEM_BITS we expect it to be configured with
> 'nomap'. Any other
> reserved range should come from existing memblock ranges which we already
> filtered while adding.
> 
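> (The guard described above amounts to clamping each device-tree range
> before handing it to memblock. A sketch, with the helper name made up
> for illustration and assuming a 64-bit build:)
> 
> 	static void add_mem_range_sketch(u64 base, u64 size)
> 	{
> 		u64 limit = 1UL << MAX_PHYSMEM_BITS;
> 
> 		if (base >= limit)
> 			return;			/* wholly above the window */
> 		if (base + size > limit)
> 			size = limit - base;	/* trim the tail */
> 		memblock_add(base, size);
> 	}
> 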
> This avoids a crash like the one below when running on a system with
> system RAM configured above MAX_PHYSMEM_BITS:
> 
>  Unable to handle kernel paging request for data at address 0xc00a00100440
>  Faulting instruction address: 0xc1034118
>  cpu 0x0: Vector: 300 (Data Access) at [c124fb30]
>  pc: c1034118: __free_pages_bootmem+0xc0/0x1c0
>  lr: c103b258: free_all_bootmem+0x19c/0x22c
>  sp: c124fdb0
> msr: 92001033
> dar: c00a00100440
>   dsisr: 4000
>current = 0xc120dd00
paca= 0xc1f6  irqmask: 0x03  irq_happened: 0x01
>  pid   = 0, comm = swapper
>  [c124fe20] c103b258 free_all_bootmem+0x19c/0x22c
>  [c124fee0] c1010a68 mem_init+0x3c/0x5c
>  [c124ff00] c100401c start_kernel+0x298/0x5e4
>  [c124ff90] c000b57c start_here_common+0x1c/0x520
> 
> Signed-off-by: Aneesh Kumar K.V 

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/6aba0c84ec474534bbae3675e95464

cheers


Re: [1/3] powerpc/mm/hash: Remove the superfluous bitwise operation when find hpte group

2018-07-24 Thread Michael Ellerman
On Fri, 2018-06-29 at 08:36:29 UTC, "Aneesh Kumar K.V" wrote:
> From: "Aneesh Kumar K.V" 
> 
> When computing the starting slot number for a hash page table group we used
> to do this
> hpte_group = ((hash & htab_hash_mask) * HPTES_PER_GROUP) & ~0x7UL;
> 
> Multiplying by 8 (HPTES_PER_GROUP) is a left shift by 3, so the last
> three bits are already 0; we don't need to clear them separately.
> 
> Signed-off-by: Aneesh Kumar K.V 

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/1531cff44b5bb30c899404c044805e

cheers


[PATCH v8 1/2] powernv:opal-sensor-groups: Add support to enable sensor groups

2018-07-24 Thread Shilpasri G Bhat
Adds support to enable/disable a sensor group at runtime. This
can be used to select the sensor groups that need to be copied to
main memory by OCC. Sensor groups like power, temperature, current,
voltage, frequency, utilization can be enabled/disabled at runtime.

Signed-off-by: Shilpasri G Bhat 
---
 arch/powerpc/include/asm/opal-api.h|  1 +
 arch/powerpc/include/asm/opal.h|  2 ++
 .../powerpc/platforms/powernv/opal-sensor-groups.c | 28 ++
 arch/powerpc/platforms/powernv/opal-wrappers.S |  1 +
 4 files changed, 32 insertions(+)

diff --git a/arch/powerpc/include/asm/opal-api.h 
b/arch/powerpc/include/asm/opal-api.h
index 3bab299..56a94a1 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -206,6 +206,7 @@
 #define OPAL_NPU_SPA_CLEAR_CACHE   160
 #define OPAL_NPU_TL_SET161
 #define OPAL_SENSOR_READ_U64   162
+#define OPAL_SENSOR_GROUP_ENABLE   163
 #define OPAL_PCI_GET_PBCQ_TUNNEL_BAR   164
 #define OPAL_PCI_SET_PBCQ_TUNNEL_BAR   165
 #define OPAL_LAST  165
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index e1b2910..fc0550e 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -292,6 +292,7 @@ int64_t opal_imc_counters_init(uint32_t type, uint64_t 
address,
 int opal_get_power_shift_ratio(u32 handle, int token, u32 *psr);
 int opal_set_power_shift_ratio(u32 handle, int token, u32 psr);
 int opal_sensor_group_clear(u32 group_hndl, int token);
+int opal_sensor_group_enable(u32 group_hndl, int token, bool enable);
 
 s64 opal_signal_system_reset(s32 cpu);
 s64 opal_quiesce(u64 shutdown_type, s32 cpu);
@@ -326,6 +327,7 @@ extern int opal_async_wait_response_interruptible(uint64_t 
token,
struct opal_msg *msg);
 extern int opal_get_sensor_data(u32 sensor_hndl, u32 *sensor_data);
 extern int opal_get_sensor_data_u64(u32 sensor_hndl, u64 *sensor_data);
+extern int sensor_group_enable(u32 grp_hndl, bool enable);
 
 struct rtc_time;
 extern time64_t opal_get_boot_time(void);
diff --git a/arch/powerpc/platforms/powernv/opal-sensor-groups.c 
b/arch/powerpc/platforms/powernv/opal-sensor-groups.c
index 541c9ea..f7d04b6 100644
--- a/arch/powerpc/platforms/powernv/opal-sensor-groups.c
+++ b/arch/powerpc/platforms/powernv/opal-sensor-groups.c
@@ -32,6 +32,34 @@ struct sg_attr {
struct sg_attr *sgattrs;
 } *sgs;
 
+int sensor_group_enable(u32 handle, bool enable)
+{
+   struct opal_msg msg;
+   int token, ret;
+
+   token = opal_async_get_token_interruptible();
+   if (token < 0)
+   return token;
+
+   ret = opal_sensor_group_enable(handle, token, enable);
+   if (ret == OPAL_ASYNC_COMPLETION) {
+   ret = opal_async_wait_response(token, &msg);
+   if (ret) {
+   pr_devel("Failed to wait for the async response\n");
+   ret = -EIO;
+   goto out;
+   }
+   ret = opal_error_code(opal_get_async_rc(msg));
+   } else {
+   ret = opal_error_code(ret);
+   }
+
+out:
+   opal_async_release_token(token);
+   return ret;
+}
+EXPORT_SYMBOL_GPL(sensor_group_enable);
+
 static ssize_t sg_store(struct kobject *kobj, struct kobj_attribute *attr,
const char *buf, size_t count)
 {
diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S 
b/arch/powerpc/platforms/powernv/opal-wrappers.S
index a8d9b40..8268a1e 100644
--- a/arch/powerpc/platforms/powernv/opal-wrappers.S
+++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
@@ -327,3 +327,4 @@ OPAL_CALL(opal_npu_tl_set,  
OPAL_NPU_TL_SET);
 OPAL_CALL(opal_pci_get_pbcq_tunnel_bar,
OPAL_PCI_GET_PBCQ_TUNNEL_BAR);
 OPAL_CALL(opal_pci_set_pbcq_tunnel_bar,
OPAL_PCI_SET_PBCQ_TUNNEL_BAR);
 OPAL_CALL(opal_sensor_read_u64,OPAL_SENSOR_READ_U64);
+OPAL_CALL(opal_sensor_group_enable,OPAL_SENSOR_GROUP_ENABLE);
-- 
1.8.3.1
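
(For context, a hypothetical in-kernel caller of the newly exported
helper might look like the sketch below; the function name and the
kstrtobool()-based parsing are illustrative, not part of this patch:)

	static ssize_t sg_enable_store_sketch(u32 handle, const char *buf,
					      size_t count)
	{
		bool enable;
		int ret;

		ret = kstrtobool(buf, &enable);	/* accepts "0"/"1"/"y"/"n" */
		if (ret)
			return ret;

		ret = sensor_group_enable(handle, enable);
		return ret ? ret : count;
	}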



Re: [PATCH] net: ethernet: fs-enet: Use generic CRC32 implementation

2018-07-24 Thread Krzysztof Kozlowski
On 24 July 2018 at 13:05, David Laight  wrote:
> From: Krzysztof Kozlowski
>> Sent: 23 July 2018 17:20
>> Use generic kernel CRC32 implementation because it:
>> 1. Should be faster (uses lookup tables),
>
> Are you sure?
> The lookup tables are unlikely to be in the data cache and
> the 6 cache misses kill performance.
> (Not that it particularly matters when setting up multicast hash tables).

Good point, so this statement should be rather "Could be faster"... I
did not run any performance tests so this is not backed up by any
data.

I think the main benefit is rather easier code maintenance by removing
duplicated, custom code.

>> 2. Removes duplicated CRC generation code,
>> 3. Uses well-proven algorithm instead of coding it one more time.
> ...
>>
>> Not tested on hardware.
>
> Have you verified that the old and new functions give the
> same result for a few mac addresses?
> It is very easy to use the wrong bits in crc calculations
> or generate the output in the wrong bit order.

I copied the original code and the new one into a different driver and
ran them in a loop over thousands of data inputs (although not all possible
MAC combinations). The output was the same. I agree however that real
testing would be important.
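
(The comparison loop was along the lines of the sketch below; old_crc()
is hypothetical, standing in for the removed open-coded implementation,
and crc32_le() is shown purely as an example of a generic entry point:)

	u8 mac[ETH_ALEN];
	int i;

	for (i = 0; i < 100000; i++) {
		get_random_bytes(mac, ETH_ALEN);
		WARN_ON(old_crc(mac) != crc32_le(~0, mac, ETH_ALEN));
	}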

Best regards,
Krzysztof


Re: [v2] powerpc: NMI IPI make NMI IPIs fully sychronous

2018-07-24 Thread Michael Ellerman
On Wed, 2018-04-25 at 05:17:59 UTC, Nicholas Piggin wrote:
> There is an asynchronous aspect to smp_send_nmi_ipi. The caller waits
> for all CPUs to call in to the handler, but it does not wait for
> completion of the handler. This is a needless complication, so remove
> it and always wait synchronously.
> 
> The synchronous wait allows the caller to easily time out and clear
> the wait for completion (zero nmi_ipi_busy_count) in the case of badly
> behaved handlers. This would have prevented the recent smp_send_stop
> NMI IPI bug from causing the system to hang.
> 
> Signed-off-by: Nicholas Piggin 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/5b73151fff63fb019db8171cb81c6c

cheers


Re: powerpc64s: Show ori31 availability in spectre_v1 sysfs file not v2

2018-07-24 Thread Michael Ellerman
On Mon, 2018-07-09 at 06:25:21 UTC, Michael Ellerman wrote:
> When I added the spectre_v2 information in sysfs, I included the
> availability of the ori31 speculation barrier.
> 
> Although the ori31 barrier can be used to mitigate v2, it's primarily
> intended as a spectre v1 mitigation. Spectre v2 is mitigated by
> hardware changes.
> 
> So rework the sysfs files to show the ori31 information in the
> spectre_v1 file, rather than v2.
> 
> Currently we display eg:
> 
>   $ grep . spectre_v*
>   spectre_v1:Mitigation: __user pointer sanitization
>   spectre_v2:Mitigation: Indirect branch cache disabled, ori31 speculation 
> barrier enabled
> 
> After:
> 
>   $ grep . spectre_v*
>   spectre_v1:Mitigation: __user pointer sanitization, ori31 speculation 
> barrier enabled
>   spectre_v2:Mitigation: Indirect branch cache disabled
> 
> Fixes: d6fbe1c55c55 ("powerpc/64s: Wire up cpu_show_spectre_v2()")
> Cc: sta...@vger.kernel.org # v4.17+
> Signed-off-by: Michael Ellerman 

Applied to powerpc next.

https://git.kernel.org/powerpc/c/6d44acae1937b81cf8115ada8958e0

cheers


Re: [1/2] powerpc: Add ppc32_allmodconfig defconfig target

2018-07-24 Thread Michael Ellerman
On Mon, 2018-07-09 at 14:24:25 UTC, Michael Ellerman wrote:
> Because the allmodconfig logic just sets every symbol to M or Y, it
> always generates a 64-bit config: CONFIG_PPC64 becomes Y.
> 
> So to make it easier for folks to test 32-bit code, provide a phony
> defconfig target that generates a 32-bit allmodconfig.
> 
> The 32-bit port has several mutually exclusive CPU types, we choose
> the Book3S variants as that's what the help text in Kconfig says is
> most common.
> 
> Signed-off-by: Michael Ellerman 

Series applied to powerpc next.

https://git.kernel.org/powerpc/c/8db0c9d416f26018cb7cabfb0b144f

cheers


Re: powerpc/mm/hash: Improve error reporting on HCALL failures

2018-07-24 Thread Michael Ellerman
On Fri, 2018-06-29 at 08:39:04 UTC, "Aneesh Kumar K.V" wrote:
> This patch adds error reporting to the H_ENTER and H_READ hcalls. A
> failure of either of these hcalls is mostly fatal, and it would be good
> to log the failure reason.
> 
> We also switch printk to pr_*
> 
> Signed-off-by: Aneesh Kumar K.V 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/ca42d8d2d6c55822fa8f1d230ffa3b

cheers


Re: [v3, 1/9] powerpc/pkeys: Give all threads control of their key permissions

2018-07-24 Thread Michael Ellerman
On Tue, 2018-07-17 at 13:51:02 UTC, Ram Pai wrote:
> Currently in a multithreaded application, a key allocated by one
> thread is not usable by other threads. By "not usable" we mean that
> other threads are unable to change the access permissions for that
> key for themselves.
> 
> When a new key is allocated in one thread, the corresponding UAMOR
> bits for that thread get enabled, however the UAMOR bits for that key
> for all other threads remain disabled.
> 
> Other threads have no way to set permissions on the key, and the
> current default permissions are that read/write is enabled for all
> keys, which means the key has no effect for other threads. Although
> that may be the desired behaviour in some circumstances, having all
> threads able to control their permissions for the key is more
> flexible.
> 
> The current behaviour also differs from the x86 behaviour, which is
> problematic for users.
> 
> To fix this, enable the UAMOR bits for all keys, at process
> creation (in start_thread(), ie exec time). Since the contents of
> UAMOR are inherited at fork, all threads are capable of modifying the
> permissions on any key.
> 
> This is technically an ABI break on powerpc, but pkey support is fairly
> new on powerpc and not widely used, and this brings us into
> line with x86.
> 
> Fixes: cf43d3b26452 ("powerpc: Enable pkey subsystem")
> Cc: sta...@vger.kernel.org # v4.16+
> Tested-by: Florian Weimer 
> Signed-off-by: Ram Pai 
> [mpe: Reword some of the changelog]
> Signed-off-by: Michael Ellerman 

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/a57a04c76e06822e4377831611364c

cheers
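
The user-visible effect of the series can be sketched as follows (a
hedged illustration, assuming the glibc >= 2.27 pkey wrappers and
pkey-capable hardware; per the changelog, the pkey_set() in the second
thread only takes real effect on powerpc once all UAMOR bits are
enabled at exec):

/* Hedged sketch: a key allocated in one thread has its permissions
 * changed from another thread. Assumes glibc >= 2.27 pkey wrappers. */
#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>
#include <sys/mman.h>

static int pkey;

static void *other_thread(void *arg)
{
        (void)arg;
        /* Per the changelog, before this series the UAMOR bits for a
         * key allocated elsewhere were clear in this thread on powerpc,
         * so this call could not actually change the key's rights. */
        if (pkey_set(pkey, PKEY_DISABLE_WRITE))
                perror("pkey_set");
        else
                printf("other thread set rights: %d\n", pkey_get(pkey));
        return NULL;
}

int main(void)
{
        pthread_t t;

        pkey = pkey_alloc(0, 0);
        if (pkey < 0) {
                perror("pkey_alloc"); /* no pkey support on this system */
                return 1;
        }
        pthread_create(&t, NULL, other_thread, NULL);
        pthread_join(t, NULL);
        pkey_free(pkey);
        return 0;
}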


[PATCH] powerpc: Add a checkpatch wrapper with our preferred settings

2018-07-24 Thread Michael Ellerman
This makes it easy to run checkpatch with settings that we have agreed
on (bwhahahahah).

Usage is eg:

  $ ./arch/powerpc/tools/checkpatch.sh -g origin/master..

To check all commits since origin/master.

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/tools/checkpatch.sh | 21 +
 1 file changed, 21 insertions(+)
 create mode 100755 arch/powerpc/tools/checkpatch.sh

diff --git a/arch/powerpc/tools/checkpatch.sh b/arch/powerpc/tools/checkpatch.sh
new file mode 100755
index 000000000000..4c2ac4655e26
--- /dev/null
+++ b/arch/powerpc/tools/checkpatch.sh
@@ -0,0 +1,21 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0+
+# Copyright 2018, Michael Ellerman, IBM Corporation.
+#
+# Wrapper around checkpatch that uses our preferred settings
+
+script_base=$(realpath "$(dirname "$0")")
+
+exec $script_base/../../../scripts/checkpatch.pl \
+   --subjective \
+   --max-line-length=90 \
+   --show-types \
+   --ignore ARCH_INCLUDE_LINUX \
+   --ignore BIT_MACRO \
+   --ignore COMPARISON_TO_NULL \
+   --ignore EMAIL_SUBJECT \
+   --ignore FILE_PATH_CHANGES \
+   --ignore GLOBAL_INITIALISERS \
+   --ignore LINE_SPACING \
+   --ignore MULTIPLE_ASSIGNMENTS \
+   "$@"
-- 
2.14.1



RE: [PATCH] net: ethernet: fs-enet: Use generic CRC32 implementation

2018-07-24 Thread David Laight
From: Krzysztof Kozlowski
> Sent: 24 July 2018 12:12
...
> >> Not tested on hardware.
> >
> > Have you verified that the old and new functions give the
> > same result for a few mac addresses?
> > It is very easy to use the wrong bits in crc calculations
> > or generate the output in the wrong bit order.
> 
> I copied the original code and the new one into a different driver and
> ran them in a loop over thousands of inputs (although not all possible
> MAC combinations). The output was the same. I agree, however, that
> testing on real hardware would be important.

Since CRCs are linear, you only need to check that each input
bit generates the correct output.
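
For example: with a fixed nonzero init value, CRC32 is affine, so for
equal-length inputs crc(a ^ b) == crc(a) ^ crc(b) ^ crc(0). Checking
the all-zero input plus the 48 single-bit MAC inputs therefore pins
down the whole function. An illustrative userspace check of the
property (not driver code, assuming the reflected Ethernet polynomial):

/* Illustrative check of CRC32 linearity, not driver code: with init
 * 0xFFFFFFFF the CRC is affine, so for equal-length inputs
 * crc(a ^ b) == crc(a) ^ crc(b) ^ crc(0). */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

static uint32_t crc32(const uint8_t *p, size_t len)
{
        uint32_t crc = 0xFFFFFFFFu;
        while (len--) {
                crc ^= *p++;
                for (int k = 0; k < 8; k++)
                        crc = (crc >> 1) ^ (0xEDB88320u & -(crc & 1u));
        }
        return crc;
}

int main(void)
{
        const uint8_t a[6] = { 0xde, 0xad, 0xbe, 0xef, 0x00, 0x42 };
        const uint8_t b[6] = { 0x01, 0x23, 0x45, 0x67, 0x89, 0xab };
        uint8_t ab[6], zero[6] = { 0 };

        for (int i = 0; i < 6; i++)
                ab[i] = a[i] ^ b[i];

        /* linearity: crc(a ^ b) == crc(a) ^ crc(b) ^ crc(0) */
        assert(crc32(ab, 6) == (crc32(a, 6) ^ crc32(b, 6) ^ crc32(zero, 6)));
        printf("linearity holds\n");
        return 0;
}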

David



RE: [PATCH] net: ethernet: fs-enet: Use generic CRC32 implementation

2018-07-24 Thread David Laight
From: Krzysztof Kozlowski
> Sent: 23 July 2018 17:20
> Use generic kernel CRC32 implementation because it:
> 1. Should be faster (uses lookup tables),

Are you sure?
The lookup tables are unlikely to be in the data cache and
the six cache misses (one per byte of the MAC address) kill performance.
(Not that it particularly matters when setting up multicast hash tables).

> 2. Removes duplicated CRC generation code,
> 3. Uses well-proven algorithm instead of coding it one more time.
...
> 
> Not tested on hardware.

Have you verified that the old and new functions give the
same result for a few mac addresses?
It is very easy to use the wrong bits in crc calculations
or generate the output in the wrong bit order.

David
