Re: [PATCH V3 0/3] Add new PowerPC specific ELF core notes

2014-07-24 Thread Sam Bobroff
On 24/05/14 01:15, Anshuman Khandual wrote:
   This patch series adds five new ELF core note sections which can be
 used with existing ptrace request PTRACE_GETREGSET/SETREGSET for accessing
 various transactional memory and miscellaneous register sets on PowerPC
 platform. Please find a test program exploiting these new ELF core note
 types on a POWER8 system.
 
 RFC: https://lkml.org/lkml/2014/4/1/292
 V1:  https://lkml.org/lkml/2014/4/2/43
 V2:  https://lkml.org/lkml/2014/5/5/88
 
 Changes in V3
 =
 (1) Added two new error paths in every TM related get/set functions when 
 regset
 support is not present on the system (ENODEV) or when the process does not
 have any transaction active (ENODATA) in the context
 
 (2) Installed the active hooks for all the newly added regset core note types
 
 Changes in V2
 =
 (1) Removed all the power specific ptrace requests corresponding to new 
 NT_PPC_*
 elf core note types. Now all the register sets can be accessed from ptrace
 through PTRACE_GETREGSET/PTRACE_SETREGSET using the individual NT_PPC* 
 core
 note type instead
 (2) Fixed couple of attribute values for REGSET_TM_CGPR register set
 (3) Renamed flush_tmreg_to_thread as flush_tmregs_to_thread
 (4) Fixed 32 bit checkpointed GPR support
 (5) Changed commit messages accordingly
 
 Outstanding Issues
 ==
 (1) Running DSCR register value inside a transaction does not seem to be saved
 at thread.dscr when the process stops for ptrace examination.

Since this is fixed by 96d016108640bc2b7fb0ee800737f80923847294, which
is already upstream, you might want to rebase and re-test. It should
pass and then you can remove the outstanding issues :-)

 
 Test programs

program

When I posted the patch I mentioned above, I was asked to move the test
code into the powerpc kernel selftests so you may want to do this too.

Also, your test program covers everything mine did and more so you might
want to remove mine if you do add this to the selftests.

 =
 #include unistd.h
 #include stdlib.h
 #include string.h
 #include malloc.h
 #include errno.h
 #include sys/ptrace.h
 #include sys/uio.h
 #include sys/types.h
 #include sys/signal.h
 #include sys/user.h

You should include sys/wait.h for waitpid().

 #include linux/elf.h
 #include linux/types.h
 #include linux/ptrace.h
 
 typedef long long u64;
 typedef unsigned int u32;
 typedef __vector128 vector128;
 
 /* TM CFPR */
 struct tm_cfpr {
   u64 fpr[32];
   u64 fpscr;
 };
 
 /* TM CVMX */
 struct tm_cvmx {
 vector128 vr[32] __attribute__((aligned(16)));
 vector128 vscr __attribute__((aligned(16)));
   u32 vrsave; 
 };
 
 /* TM SPR */
 struct tm_spr_regs {
   u64 tm_tfhar;
   u64 tm_texasr;
   u64 tm_tfiar;
   u64 tm_orig_msr;
   u64 tm_tar;
   u64 tm_ppr;
   u64 tm_dscr;
 };
 
 /* Miscellaneous registers */
 struct misc_regs {
   u64 dscr;
   u64 ppr;
   u64 tar;
 };
 
 /* TM instructions */
 #define TBEGIN  .long 0x7C00051D ;
 #define TEND.long 0x7C00055D ;
 
 /* SPR number */
 #define SPRN_DSCR 0x3
 #define SPRN_TAR  815
 
 /* ELF core notes */
 #define NT_PPC_TM_SPR  0x103   /* PowerPC transactional memory 
 special registers */
 #define NT_PPC_TM_CGPR 0x104   /* PowerpC transactional memory 
 checkpointed GPR */
 #define NT_PPC_TM_CFPR 0x105   /* PowerPC transactional memory 
 checkpointed FPR */
 #define NT_PPC_TM_CVMX 0x106   /* PowerPC transactional memory 
 checkpointed VMX */
 #define NT_PPC_MISC0x107   /* PowerPC miscellaneous registers */
 
 #define VAL1 1
 #define VAL2 2
 #define VAL3 3
 #define VAL4 4
 
 int main(int argc, char *argv[])
 {
   struct tm_spr_regs *tmr1;
   struct pt_regs *pregs1, *pregs2;
   struct tm_cfpr *fpr, *fpr1;
   struct misc_regs *dbr1;
   struct iovec iov;
 
   pid_t child;
   int ret = 0, status = 0, i = 0, flag = 1;

status, i and flags are all unused.

 
   pregs2 = (struct pt_regs *) malloc(sizeof(struct pt_regs));
   fpr = (struct tm_cfpr *) malloc(sizeof(struct tm_cfpr));
 
   child = fork();
   if (child  0) {
   printf(fork() failed \n);
   exit(-1);
   }
 
   /* Child code */
   if (child == 0) {
   asm __volatile__(
   6: ;  /* TM checkpointed values */
   li 1, %[val1];/* GPR[1] */
   .long 0x7C210166; /* FPR[1] */
   li 2, %[val2];/* GPR[2] */
   .long 0x7C420166; /* FPR[2] */
   mtspr %[tar], 1;  /* TAR */
   mtspr %[dscr], 2; /* DSCR */
   1: ;
   TBEGIN  /* TM running values */
   beq 2f ;
   

Re: [PATCH V3 2/3] powerpc, ptrace: Enable support for transactional memory register sets

2014-07-24 Thread Sam Bobroff
On 24/05/14 01:15, Anshuman Khandual wrote:
 This patch enables get and set of transactional memory related register
 sets through PTRACE_GETREGSET/PTRACE_SETREGSET interface by implementing
 four new powerpc specific register sets i.e REGSET_TM_SPR, REGSET_TM_CGPR,
 REGSET_TM_CFPR, REGSET_CVMX support corresponding to these following new
 ELF core note types added previously in this regard.
 
   (1) NT_PPC_TM_SPR
   (2) NT_PPC_TM_CGPR
   (3) NT_PPC_TM_CFPR
   (4) NT_PPC_TM_CVMX
 
 Signed-off-by: Anshuman Khandual khand...@linux.vnet.ibm.com

Hi Anshuman,

I'm not Ben but I've reviewed your patch as well as I can and I have
some comments that might be useful to you.

First of all, I couldn't get this to compile without CONFIG_VSX and
CONFIG_PPC_TRANSACTIONAL_MEM defined: there are obvious typos (esle
instead of else) and references to fields that aren't defined for
those cases. I haven't mentioned any of those issues below as the
compiler will do that but you should definitely test those configurations.

Also some of the code seems to assume that if CONFIG_VSX is defined then
CONFIG_PPC_TRANSACTIONAL_MEM must also be defined, but that isn't the
case (it's the other way round: CONFIG_PPC_TRANSACTIONAL_MEM implies
CONFIG_VSX).

 ---
  arch/powerpc/include/asm/switch_to.h |   8 +
  arch/powerpc/kernel/process.c|  24 ++
  arch/powerpc/kernel/ptrace.c | 792 
 +--
  3 files changed, 795 insertions(+), 29 deletions(-)
 
 diff --git a/arch/powerpc/include/asm/switch_to.h 
 b/arch/powerpc/include/asm/switch_to.h
 index 0e83e7d..2737f46 100644
 --- a/arch/powerpc/include/asm/switch_to.h
 +++ b/arch/powerpc/include/asm/switch_to.h
 @@ -80,6 +80,14 @@ static inline void flush_spe_to_thread(struct task_struct 
 *t)
  }
  #endif
  
 +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
 +extern void flush_tmregs_to_thread(struct task_struct *);
 +#else
 +static inline void flush_tmregs_to_thread(struct task_struct *t)
 +{
 +}
 +#endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
 +
  static inline void clear_task_ebb(struct task_struct *t)
  {
  #ifdef CONFIG_PPC_BOOK3S_64
 diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
 index 31d0215..e247898 100644
 --- a/arch/powerpc/kernel/process.c
 +++ b/arch/powerpc/kernel/process.c
 @@ -695,6 +695,30 @@ static inline void __switch_to_tm(struct task_struct 
 *prev)
   }
  }
  
 +void flush_tmregs_to_thread(struct task_struct *tsk)
 +{
 + /*
 +  * If task is not current, it should have been flushed
 +  * already to it's thread_struct during __switch_to().
 +  */
 + if (tsk != current)
 + return;
 +
 + preempt_disable();
 + if (tsk-thread.regs) {
 + /*
 +  * If we are still current, the TM state need to
 +  * be flushed to thread_struct as it will be still
 +  * present in the current cpu.
 +  */
 + if (MSR_TM_ACTIVE(tsk-thread.regs-msr)) {
 + __switch_to_tm(tsk);
 + tm_recheckpoint_new_task(tsk);

There is at least one other usage of this pair of calls in order to
flush the TM state (in arch_dup_task_struct()), so rather than copying
it you might want to create a new function and call it from both places.
(And include the nice comment from arch_dup_task_struct() that explains
how it works and why.)

 + }
 + }
 + preempt_enable();
 +}
 +
  /*
   * This is called if we are on the way out to userspace and the
   * TIF_RESTORE_TM flag is set.  It checks if we need to reload
 diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
 index 2e3d2bf..17642ef 100644
 --- a/arch/powerpc/kernel/ptrace.c
 +++ b/arch/powerpc/kernel/ptrace.c
 @@ -357,6 +357,17 @@ static int gpr_set(struct task_struct *target, const 
 struct user_regset *regset,
   return ret;
  }
  
 +/*
 + * When any transaction is active, thread_struct-transact_fp holds
 + * the current running value of all FPR registers and thread_struct-
 + * fp_state holds the last checkpointed FPR registers state for the
 + * current transaction.
 + *
 + * struct data {
 + *   u64 fpr[32];
 + *   u64 fpscr;
 + * };

It would be nice to say why you've included struct data in the comment.

 + */
  static int fpr_get(struct task_struct *target, const struct user_regset 
 *regset,
  unsigned int pos, unsigned int count,
  void *kbuf, void __user *ubuf)
 @@ -365,21 +376,41 @@ static int fpr_get(struct task_struct *target, const 
 struct user_regset *regset,
   u64 buf[33];
   int i;
  #endif
 - flush_fp_to_thread(target);
 + if (MSR_TM_ACTIVE(target-thread.regs-msr)) {
 + flush_fp_to_thread(target);
 + flush_altivec_to_thread(target);
 + flush_tmregs_to_thread(target);
 + } else {
 + flush_fp_to_thread(target);
 + }

I don't see why you need the else. Could this be:


[PATCH 3/3] powerpc: Document sysfs DSCR interface

2014-05-21 Thread Sam Bobroff
Add some documentation about ...

/sys/devices/system/cpu/dscr_default
/sys/devices/system/cpu/cpuN/dscr

... to Documentation/ABI/stable.

Signed-off-by: Sam Bobroff sam.bobr...@au1.ibm.com
---
 Documentation/ABI/stable/sysfs-devices-system-cpu |   25 +
 1 file changed, 25 insertions(+)
 create mode 100644 Documentation/ABI/stable/sysfs-devices-system-cpu

diff --git a/Documentation/ABI/stable/sysfs-devices-system-cpu 
b/Documentation/ABI/stable/sysfs-devices-system-cpu
new file mode 100644
index 000..33c133e
--- /dev/null
+++ b/Documentation/ABI/stable/sysfs-devices-system-cpu
@@ -0,0 +1,25 @@
+What:  /sys/devices/system/cpu/dscr_default
+Date:  13-May-2014
+KernelVersion: v3.15.0
+Contact:
+Description:   Writes are equivalent to writing to
+   /sys/devices/system/cpu/cpuN/dscr on all CPUs.
+   Reads return the last written value or 0.
+   This value is not a global default: it is a way to set
+   all per-CPU defaults at the same time.
+Values:64 bit unsigned integer (bit field)
+
+What:  /sys/devices/system/cpu/cpu[0-9]+/dscr
+Date:  13-May-2014
+KernelVersion: v3.15.0
+Contact:
+Description:   Default value for the Data Stream Control Register (DSCR) on
+   a CPU.
+   This default value is used when the kernel is executing and
+   for any process that has not set the DSCR itself.
+   If a process ever sets the DSCR (via direct access to the
+   SPR) that value will be persisted for that process and used
+   on any CPU where it executes (overriding the value described
+   here).
+   If set by a process it will be inherited by child processes.
+Values:64 bit unsigned integer (bit field)
-- 
1.7.10.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 2/3] powerpc: fix regression of per-CPU DSCR setting

2014-05-21 Thread Sam Bobroff
Since commit efcac65 powerpc: Per process DSCR + some fixes (try#4)
it is no longer possible to set the DSCR on a per-CPU basis.

The old behaviour was to minipulate the DSCR SPR directly but this is no
longer sufficient: the value is quickly overwritten by context switching.

This patch stores the per-CPU DSCR value in a kernel variable rather than
directly in the SPR and it is used whenever a process has not set the DSCR
itself. The sysfs interface (/sys/devices/system/cpu/cpuN/dscr) is unchanged.

Writes to the old global default (/sys/devices/system/cpu/dscr_default)
now set all of the per-CPU values and reads return the last written value.

The new per-CPU default is added to the paca_struct and is used everywhere
outside of sysfs.c instead of the old global default.

Signed-off-by: Sam Bobroff sam.bobr...@au1.ibm.com
---
 arch/powerpc/include/asm/paca.h |3 +++
 arch/powerpc/kernel/asm-offsets.c   |1 +
 arch/powerpc/kernel/entry_64.S  |9 +
 arch/powerpc/kernel/sysfs.c |   32 ++-
 arch/powerpc/kernel/tm.S|   16 
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |3 +--
 6 files changed, 29 insertions(+), 35 deletions(-)

diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index 8e956a0..bb0bd25 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -92,7 +92,10 @@ struct paca_struct {
struct slb_shadow *slb_shadow_ptr;
struct dtl_entry *dispatch_log;
struct dtl_entry *dispatch_log_end;
+#endif /* CONFIG_PPC_STD_MMU_64 */
+   u64 dscr_default;   /* per-CPU default DSCR */
 
+#ifdef CONFIG_PPC_STD_MMU_64
/*
 * Now, starting in cacheline 2, the exception save areas
 */
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index dba8140..cba2697 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -247,6 +247,7 @@ int main(void)
 #endif
DEFINE(PACAHWCPUID, offsetof(struct paca_struct, hw_cpu_id));
DEFINE(PACAKEXECSTATE, offsetof(struct paca_struct, kexec_state));
+   DEFINE(PACA_DSCR, offsetof(struct paca_struct, dscr_default));
DEFINE(PACA_STARTTIME, offsetof(struct paca_struct, starttime));
DEFINE(PACA_STARTTIME_USER, offsetof(struct paca_struct, 
starttime_user));
DEFINE(PACA_USER_TIME, offsetof(struct paca_struct, user_time));
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 9fde8a1..911d453 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -387,12 +387,6 @@ _GLOBAL(ret_from_kernel_thread)
li  r3,0
b   syscall_exit
 
-   .section.toc,aw
-DSCR_DEFAULT:
-   .tc dscr_default[TC],dscr_default
-
-   .section.text
-
 /*
  * This routine switches between two different tasks.  The process
  * state of one is saved on its kernel stack.  Then the state
@@ -577,11 +571,10 @@ END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC)
 #ifdef CONFIG_PPC64
 BEGIN_FTR_SECTION
lwz r6,THREAD_DSCR_INHERIT(r4)
-   ld  r7,DSCR_DEFAULT@toc(2)
ld  r0,THREAD_DSCR(r4)
cmpwi   r6,0
bne 1f
-   ld  r0,0(r7)
+   ld  r0,PACA_DSCR(r13)
 1:
 BEGIN_FTR_SECTION_NESTED(70)
mfspr   r8, SPRN_FSCR
diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
index e2a1d6f..67fd2fd 100644
--- a/arch/powerpc/kernel/sysfs.c
+++ b/arch/powerpc/kernel/sysfs.c
@@ -484,7 +484,6 @@ SYSFS_PMCSETUP(pmc8, SPRN_PMC8);
 SYSFS_PMCSETUP(mmcra, SPRN_MMCRA);
 SYSFS_SPRSETUP(purr, SPRN_PURR);
 SYSFS_SPRSETUP(spurr, SPRN_SPURR);
-SYSFS_SPRSETUP(dscr, SPRN_DSCR);
 SYSFS_SPRSETUP(pir, SPRN_PIR);
 
 /*
@@ -494,12 +493,27 @@ SYSFS_SPRSETUP(pir, SPRN_PIR);
 */
 static DEVICE_ATTR(mmcra, 0600, show_mmcra, store_mmcra);
 static DEVICE_ATTR(spurr, 0400, show_spurr, NULL);
-static DEVICE_ATTR(dscr, 0600, show_dscr, store_dscr);
 static DEVICE_ATTR(purr, 0400, show_purr, store_purr);
 static DEVICE_ATTR(pir, 0400, show_pir, NULL);
 
-unsigned long dscr_default = 0;
-EXPORT_SYMBOL(dscr_default);
+static unsigned long dscr_default;
+
+static void read_dscr(void *val)
+{
+   *(unsigned long *)val = get_paca()-dscr_default;
+}
+
+static void write_dscr(void *val)
+{
+   get_paca()-dscr_default = *(unsigned long *)val;
+   if (!current-thread.dscr_inherit) {
+   current-thread.dscr = *(unsigned long *)val;
+   mtspr(SPRN_DSCR, *(unsigned long *)val);
+   }
+}
+
+SYSFS_SPRSETUP_SHOW_STORE(dscr);
+static DEVICE_ATTR(dscr, 0600, show_dscr, store_dscr);
 
 static void add_write_permission_dev_attr(struct device_attribute *attr)
 {
@@ -512,14 +526,6 @@ static ssize_t show_dscr_default(struct device *dev,
return sprintf(buf, %lx\n, dscr_default);
 }
 
-static void update_dscr(void *dummy)
-{
-   if (!current

[PATCH 0/3] powerpc: fix regression of per-CPU DSCR setting

2014-05-21 Thread Sam Bobroff
Hello,

This patch corrects a regression on PowerPC CPUs that causes their
per-CPU DSCR SPR value (exposed via /sys/devices/system/cpuN/dscr) to
be quickly lost during context switching, effectively meaning that the
DSCR can no longer be set on a per-CPU basis.

My intent is to restore the functionality of the per-CPU value in a
way that is compatible with the newer global default and task-specific
DSCR setting system.  Users of either the old or new systems should
now get pretty much what they expect.

A couple of notes:

I've split an existing ifdef CONFIG_PPC_STD_MMU_64 block in
paca_struct into two parts because it allows dscr_default to be placed
into a cache line hole. (This seems be the case even without
CONFIG_PPC_STD_MMU_64 being defined.) Comments or ideas on alternative
placements are welcome.

PowerPC context switching is touched but there should not be any
performance cost; if anything it should get slightly faster due to the
per-CPU value being easier to access than the old global default.

Sam Bobroff (3):
  powerpc: Split __SYSFS_SPRSETUP macro
  powerpc: fix regression of per-CPU DSCR setting
  powerpc: Document sysfs DSCR interface

 Documentation/ABI/stable/sysfs-devices-system-cpu |   25 ++
 arch/powerpc/include/asm/paca.h   |3 ++
 arch/powerpc/kernel/asm-offsets.c |1 +
 arch/powerpc/kernel/entry_64.S|9 +---
 arch/powerpc/kernel/sysfs.c   |   51 +
 arch/powerpc/kernel/tm.S  |   16 ++-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   |3 +-
 7 files changed, 67 insertions(+), 41 deletions(-)
 create mode 100644 Documentation/ABI/stable/sysfs-devices-system-cpu

-- 
1.7.10.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 1/3] powerpc: Split __SYSFS_SPRSETUP macro

2014-05-21 Thread Sam Bobroff
Split the __SYSFS_SPRSETUP macro into two parts so that registers requiring
custom read and write functions can use common code for their show and store
functions.

Signed-off-by: Sam Bobroff sam.bobr...@au1.ibm.com
---
 arch/powerpc/kernel/sysfs.c |   19 +--
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
index d90d4b7..e2a1d6f 100644
--- a/arch/powerpc/kernel/sysfs.c
+++ b/arch/powerpc/kernel/sysfs.c
@@ -404,7 +404,7 @@ void ppc_enable_pmcs(void)
 }
 EXPORT_SYMBOL(ppc_enable_pmcs);
 
-#define __SYSFS_SPRSETUP(NAME, ADDRESS, EXTRA) \
+#define __SYSFS_SPRSETUP_READ_WRITE(NAME, ADDRESS, EXTRA) \
 static void read_##NAME(void *val) \
 { \
*(unsigned long *)val = mfspr(ADDRESS); \
@@ -413,7 +413,9 @@ static void write_##NAME(void *val) \
 { \
EXTRA; \
mtspr(ADDRESS, *(unsigned long *)val);  \
-} \
+}
+
+#define __SYSFS_SPRSETUP_SHOW_STORE(NAME) \
 static ssize_t show_##NAME(struct device *dev, \
struct device_attribute *attr, \
char *buf) \
@@ -436,10 +438,15 @@ static ssize_t __used \
return count; \
 }
 
-#define SYSFS_PMCSETUP(NAME, ADDRESS)  \
-   __SYSFS_SPRSETUP(NAME, ADDRESS, ppc_enable_pmcs())
-#define SYSFS_SPRSETUP(NAME, ADDRESS)  \
-   __SYSFS_SPRSETUP(NAME, ADDRESS, )
+#define SYSFS_PMCSETUP(NAME, ADDRESS) \
+   __SYSFS_SPRSETUP_READ_WRITE(NAME, ADDRESS, ppc_enable_pmcs()) \
+   __SYSFS_SPRSETUP_SHOW_STORE(NAME)
+#define SYSFS_SPRSETUP(NAME, ADDRESS) \
+   __SYSFS_SPRSETUP_READ_WRITE(NAME, ADDRESS, ) \
+   __SYSFS_SPRSETUP_SHOW_STORE(NAME)
+
+#define SYSFS_SPRSETUP_SHOW_STORE(NAME) \
+   __SYSFS_SPRSETUP_SHOW_STORE(NAME)
 
 /* Let's define all possible registers, we'll only hook up the ones
  * that are implemented on the current processor
-- 
1.7.10.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 1/1] powerpc: correct DSCR during TM context switch

2014-06-03 Thread Sam Bobroff
Correct the DSCR SPR becoming temporarily corrupted when a task is
context switched when within a transaction. It is corrected when
the transaction is aborted (which will happen after a context switch)
but if the task has suspended (TSUSPEND) the transaction the incorrect
value can be seen.

The problem is caused by saving a thread's DSCR after it has already
been reverted to the CPU's default value:

__switch_to() calls __switch_to_tm()
which calls tm_reclaim_task()
which calls tm_reclaim_thread()
which calls tm_reclaim() where the DSCR is reset
__switch_to() calls _switch
_switch() saves the DSCR to thread.dscr

The fix is to treat the DSCR similarly to the TAR and save it early
in __switch_to().

The program below will expose the problem:

  #include inttypes.h
  #include stdio.h
  #include stdlib.h
  #include assert.h
  #include asm/tm.h

  #define TBEGIN  .long 0x7C00051D ;
  #define TEND.long 0x7C00055D ;
  #define TCHECK  .long 0x7C00059C ;
  #define TSUSPEND.long 0x7C0005DD ;
  #define TRESUME .long 0x7C2005DD ;
  #define SPRN_TEXASR 0x82
  #define SPRN_DSCR   0x03

  int main(void) {
uint64_t i = 0, rv, dscr1 = 1, dscr2, texasr;

for (;;) {
  rv = 1;
  asm __volatile__ (
  ld  3, %[dscr1];
  mtspr   %[sprn_dscr], 3;
  TBEGIN
  beq 1f;
  TSUSPEND
  2: ;
  TCHECK
  bc  4, 0, 2b;
  mfspr   3, %[sprn_dscr];
  std 3, %[dscr2];
  mfspr   3, %[sprn_texasr];
  std 3, %[texasr];
  TRESUME
  TEND
  li  %[rv], 0;
  1: ;
  : [rv]=r(rv), [dscr2]=m(dscr2), [texasr]=m(texasr)
  : [dscr1]m(dscr1)
  , [sprn_dscr]i(SPRN_DSCR), [sprn_texasr]i(SPRN_TEXASR)
  : memory, r3
  );
  assert(rv);
  if ((texasr  56) == TM_CAUSE_RESCHED) {
putchar('!');
fflush(stdout);
i++;
  }
  else {
putchar('.');
fflush(stdout);
  }
  if (dscr2 != dscr1) {
printf(\n DSCR incorrect: 0x%lx (expecting 0x%lx)\n, dscr2, 
dscr1);
exit(EXIT_FAILURE);
  }
  if (i  10) {
printf(\n DSCR TM context switching seems OK.\n);
exit(EXIT_SUCCESS);
  }
}
  }

Signed-off-by: Sam Bobroff sam.bobr...@au1.ibm.com
---
 arch/powerpc/include/asm/switch_to.h |6 --
 arch/powerpc/kernel/entry_64.S   |6 --
 arch/powerpc/kernel/process.c|8 
 3 files changed, 8 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/include/asm/switch_to.h 
b/arch/powerpc/include/asm/switch_to.h
index 2737f46..3efd0e5 100644
--- a/arch/powerpc/include/asm/switch_to.h
+++ b/arch/powerpc/include/asm/switch_to.h
@@ -16,13 +16,15 @@ struct thread_struct;
 extern struct task_struct *_switch(struct thread_struct *prev,
   struct thread_struct *next);
 #ifdef CONFIG_PPC_BOOK3S_64
-static inline void save_tar(struct thread_struct *prev)
+static inline void save_early_sprs(struct thread_struct *prev)
 {
if (cpu_has_feature(CPU_FTR_ARCH_207S))
prev-tar = mfspr(SPRN_TAR);
+   if (cpu_has_feature(CPU_FTR_DSCR))
+   prev-dscr = mfspr(SPRN_DSCR);
 }
 #else
-static inline void save_tar(struct thread_struct *prev) {}
+static inline void save_early_sprs(struct thread_struct *prev) {}
 #endif
 
 extern void enable_kernel_fp(void);
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 662c6dd..a107f4a 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -432,12 +432,6 @@ BEGIN_FTR_SECTION
std r24,THREAD_VRSAVE(r3)
 END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC)
 #endif /* CONFIG_ALTIVEC */
-#ifdef CONFIG_PPC64
-BEGIN_FTR_SECTION
-   mfspr   r25,SPRN_DSCR
-   std r25,THREAD_DSCR(r3)
-END_FTR_SECTION_IFSET(CPU_FTR_DSCR)
-#endif
and.r0,r0,r22
beq+1f
andcr22,r22,r0
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index e247898..8d2065e 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -771,15 +771,15 @@ struct task_struct *__switch_to(struct task_struct *prev,
 
WARN_ON(!irqs_disabled());
 
-   /* Back up the TAR across context switches.
+   /* Back up the TAR and DSCR across context switches.
 * Note that the TAR is not available for use in the kernel.  (To
 * provide this, the TAR should be backed up/restored on exception
 * entry/exit instead, and be in pt_regs.  FIXME, this should be in
 * pt_regs anyway (for debug).)
-* Save the TAR here before we do treclaim/trecheckpoint as these
-* will change the TAR.
+* Save the TAR and DSCR here before we do treclaim/trecheckpoint as
+* these will change them.
 */
-   save_tar(prev-thread);
+   save_early_sprs(prev-thread);
 
__switch_to_tm(prev

Re: [PATCH 1/1] powerpc: correct DSCR during TM context switch

2014-06-04 Thread Sam Bobroff
On 04/06/14 20:03, Michael Neuling wrote:
 On Wed, 2014-06-04 at 17:31 +1000, Michael Ellerman wrote:
 Hi Sam,

 Comments inline ..
 
 Ditto

Responses inline...

 On Wed, 2014-06-04 at 13:33 +1000, Sam Bobroff wrote:
 Correct the DSCR SPR becoming temporarily corrupted when a task is
 context switched when within a transaction. It is corrected when
 the transaction is aborted (which will happen after a context switch)
 but if the task has suspended (TSUSPEND) the transaction the incorrect
 value can be seen.

 I don't quite follow this description. How is it corrected when the 
 transaction
 is aborted, and when does that usually happen? If that happens the task can't
 ever see the corrupted value?

 To hit the suspended case, the task starts a transaction, suspends it, is 
 then
 context switched out and back in, and at that point it can see the wrong 
 value?
 
 Yep, that's it and it's corrupted until the transaction is rolled back
 (normally at the tresume).  At the tresume it gets rolled back to the
 checkpointed value at tbegin and is no longer corrupt.


I'll re-work the explanation to be clearer about how it becomes corrupt and how 
it is corrected.

 The problem is caused by saving a thread's DSCR afterNo it's lost at that 
 point as we've not saved it and it was overwritten when we did the 
 treclaim.   it has already
 been reverted to the CPU's default value:

 __switch_to() calls __switch_to_tm()
 which calls tm_reclaim_task()
 which calls tm_reclaim_thread()
 which calls tm_reclaim() where the DSCR is reset

 Where the DSCR is set to DSCR_DEFAULT ? Or now PACA_DSCR since your previous
 patches?

 Could we instead fix the bug there by reverting to the thread's DSCR value?
 
 We really need to save it earlier, before the treclaim which will
 override it.

I'll try to improve this explanation as well.
 
 __switch_to() calls _switch
 _switch() saves the DSCR to thread.dscrTBEGIN

 The fix is to treat the DSCR similarly to the TAR and save it early
 in __switch_to().

 The program below will expose the problem:


 Can you drop this in tools/testing/selftests/powerpc/tm ?

 You'll need to create that directory, you can ape the Makefile from the pmu
 directory, it should be fairly obvious. See the pmu tests for how to 
 integrate
 with the test harness etc., or bug me if it's not straight forward.

Will do :-)

 diff --git a/arch/powerpc/include/asm/switch_to.h 
 b/arch/powerpc/include/asm/switch_to.h
 index 2737f46..3efd0e5 100644
 --- a/arch/powerpc/include/asm/switch_to.h
 +++ b/arch/powerpc/include/asm/switch_to.h
 @@ -16,13 +16,15 @@ struct thread_struct;
  extern struct task_struct *_switch(struct thread_struct *prev,
struct thread_struct *next);
  #ifdef CONFIG_PPC_BOOK3S_64
 -static inline void save_tar(struct thread_struct *prev)
 +static inline void save_early_sprs(struct thread_struct *prev)
  {
 if (cpu_has_feature(CPU_FTR_ARCH_207S))
 prev-tar = mfspr(SPRN_TAR);
 +   if (cpu_has_feature(CPU_FTR_DSCR))
 +   prev-dscr = mfspr(SPRN_DSCR);
  }

 Are we going to end up saving more SPRs in this code? What makes the TAR  
 DSCR
 special vs everything else?
 
 There are only a limited set of SPRs that TM checkpoints.  The full list
 is CR, LR, CTR, FPSCR, AMR, PPR, VRSAVE, VSCR, DSCR, and TAR.  
 
 http://www.scribd.com/doc/142877680/PowerISA-v2-07#outer_page_826
 
 CR, LR, CTR, PPR are handled really early in the exception handler
 
 FPSCR, VSCR are done in the FP/VMX/VSX code.
 
 AMR we don't care about.
 
 That just leaves the DSCR and the TAR for here
 
 ... and the VRSAVE.  Sam: did you have a patch to save that one early
 too?  I think we talked about it but forgot, or did we decide that it's
 always broken anyway so we don't care? :-D

I thought we'd decided that VRSAVE was already probably broken ;-)

I haven't tested VRSAVE yet so we don't know if it's actually getting corrupted 
in this situation (although it seems likely), and from a quick look at the code 
it's not being treated like DSCR or TAR at the moment so I would need to 
investigate it separately.

 Mikey
 
 The nice thing about doing this in asm is it's nop'ed out for cpus that don't
 have the DSCR. What does the generated code for this look like?

 diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
 index 662c6dd..a107f4a 100644
 --- a/arch/powerpc/kernel/entry_64.S
 +++ b/arch/powerpc/kernel/entry_64.S
 @@ -432,12 +432,6 @@ BEGIN_FTR_SECTION
 std r24,THREAD_VRSAVE(r3)
  END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC)
  #endif /* CONFIG_ALTIVEC */
 -#ifdef CONFIG_PPC64
 -BEGIN_FTR_SECTION
 -   mfspr   r25,SPRN_DSCR
 -   std r25,THREAD_DSCR(r3)
 -END_FTR_SECTION_IFSET(CPU_FTR_DSCR)
 -#endif
 and.r0,r0,r22
 beq+1f
 andcr22,r22,r0


I wondered a little about this as well. The C code calls this function:

static inline int cpu_has_feature(unsigned long feature)
{
return (CPU_FTRS_ALWAYS

[PATCH 1/1 v2] powerpc: Correct DSCR during TM context switch

2014-06-05 Thread Sam Bobroff
Correct the DSCR SPR becoming temporarily corrupted if a task is
context switched during a transaction.

The problem occurs while suspending the task and is caused by saving
the DSCR to thread.dscr after it has already been set to the CPU's
default value:

__switch_to() calls __switch_to_tm()
which calls tm_reclaim_task()
which calls tm_reclaim_thread()
which calls tm_reclaim()
where the DSCR is set to the CPU's default
__switch_to() calls _switch()
where thread.dscr is set to the DSCR

When the task is resumed, it's transaction will be doomed (as usual)
and the DSCR SPR will be corrupted, although the checkpointed value
will be correct. Therefore the DSCR will be immediately corrected by
the transaction aborting, unless it has been suspended. In that case
the incorrect value can be seen by the task until it resumes the
transaction.

The fix is to treat the DSCR similarly to the TAR and save it early
in __switch_to().

A program exposing the problem is added to the kernel self tests as:
tools/testing/selftests/powerpc/tm/tm-resched-dscr.

Signed-off-by: Sam Bobroff sam.bobr...@au1.ibm.com
---
Changes:
v2:
* Reworked commit message.
* Adjusted test code and added it to kernel self tests.
---
 arch/powerpc/include/asm/switch_to.h   |6 +-
 arch/powerpc/kernel/entry_64.S |6 --
 arch/powerpc/kernel/process.c  |8 +-
 tools/testing/selftests/powerpc/Makefile   |2 +-
 tools/testing/selftests/powerpc/tm/Makefile|   15 
 .../testing/selftests/powerpc/tm/tm-resched-dscr.c |   90 
 6 files changed, 114 insertions(+), 13 deletions(-)
 create mode 100644 tools/testing/selftests/powerpc/tm/Makefile
 create mode 100644 tools/testing/selftests/powerpc/tm/tm-resched-dscr.c

diff --git a/arch/powerpc/include/asm/switch_to.h 
b/arch/powerpc/include/asm/switch_to.h
index 2737f46..3efd0e5 100644
--- a/arch/powerpc/include/asm/switch_to.h
+++ b/arch/powerpc/include/asm/switch_to.h
@@ -16,13 +16,15 @@ struct thread_struct;
 extern struct task_struct *_switch(struct thread_struct *prev,
   struct thread_struct *next);
 #ifdef CONFIG_PPC_BOOK3S_64
-static inline void save_tar(struct thread_struct *prev)
+static inline void save_early_sprs(struct thread_struct *prev)
 {
if (cpu_has_feature(CPU_FTR_ARCH_207S))
prev-tar = mfspr(SPRN_TAR);
+   if (cpu_has_feature(CPU_FTR_DSCR))
+   prev-dscr = mfspr(SPRN_DSCR);
 }
 #else
-static inline void save_tar(struct thread_struct *prev) {}
+static inline void save_early_sprs(struct thread_struct *prev) {}
 #endif
 
 extern void enable_kernel_fp(void);
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 662c6dd..a107f4a 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -432,12 +432,6 @@ BEGIN_FTR_SECTION
std r24,THREAD_VRSAVE(r3)
 END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC)
 #endif /* CONFIG_ALTIVEC */
-#ifdef CONFIG_PPC64
-BEGIN_FTR_SECTION
-   mfspr   r25,SPRN_DSCR
-   std r25,THREAD_DSCR(r3)
-END_FTR_SECTION_IFSET(CPU_FTR_DSCR)
-#endif
and.r0,r0,r22
beq+1f
andcr22,r22,r0
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index e247898..8d2065e 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -771,15 +771,15 @@ struct task_struct *__switch_to(struct task_struct *prev,
 
WARN_ON(!irqs_disabled());
 
-   /* Back up the TAR across context switches.
+   /* Back up the TAR and DSCR across context switches.
 * Note that the TAR is not available for use in the kernel.  (To
 * provide this, the TAR should be backed up/restored on exception
 * entry/exit instead, and be in pt_regs.  FIXME, this should be in
 * pt_regs anyway (for debug).)
-* Save the TAR here before we do treclaim/trecheckpoint as these
-* will change the TAR.
+* Save the TAR and DSCR here before we do treclaim/trecheckpoint as
+* these will change them.
 */
-   save_tar(prev-thread);
+   save_early_sprs(prev-thread);
 
__switch_to_tm(prev);
 
diff --git a/tools/testing/selftests/powerpc/Makefile 
b/tools/testing/selftests/powerpc/Makefile
index 316194f..e1544e8 100644
--- a/tools/testing/selftests/powerpc/Makefile
+++ b/tools/testing/selftests/powerpc/Makefile
@@ -13,7 +13,7 @@ CFLAGS := -Wall -O2 -flto -Wall -Werror 
-DGIT_VERSION='$(GIT_VERSION)' -I$(CUR
 
 export CC CFLAGS
 
-TARGETS = pmu copyloops
+TARGETS = pmu copyloops tm
 
 endif
 
diff --git a/tools/testing/selftests/powerpc/tm/Makefile 
b/tools/testing/selftests/powerpc/tm/Makefile
new file mode 100644
index 000..51267f4
--- /dev/null
+++ b/tools/testing/selftests/powerpc/tm/Makefile
@@ -0,0 +1,15 @@
+PROGS := tm-resched-dscr
+
+all: $(PROGS)
+
+$(PROGS

[PATCH 1/1] selftests/powerpc: fix TARGETS in powerpc selftests makefile

2014-06-24 Thread Sam Bobroff
This patch changes the name of a make variable (TARGETS) to prevent it
from colliding with a value set by the user on the command line (as
they are recommended to do by tools/testing/selftests/README.txt).

Before this patch, make -C tools/testing/selftests TARGETS=powerpc
would fail.

Signed-off-by: Sam Bobroff sam.bobr...@au1.ibm.com
---
 tools/testing/selftests/powerpc/Makefile | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/powerpc/Makefile 
b/tools/testing/selftests/powerpc/Makefile
index 54833a7..84795c0 100644
--- a/tools/testing/selftests/powerpc/Makefile
+++ b/tools/testing/selftests/powerpc/Makefile
@@ -13,22 +13,22 @@ CFLAGS := -Wall -O2 -flto -Wall -Werror 
-DGIT_VERSION='$(GIT_VERSION)' -I$(CUR
 
 export CC CFLAGS
 
-TARGETS = pmu copyloops mm tm
+SUB_TARGETS = pmu copyloops mm tm
 
 endif
 
 all:
-   @for TARGET in $(TARGETS); do \
+   @for TARGET in $(SUB_TARGETS); do \
$(MAKE) -C $$TARGET all; \
done;
 
 run_tests: all
-   @for TARGET in $(TARGETS); do \
+   @for TARGET in $(SUB_TARGETS); do \
$(MAKE) -C $$TARGET run_tests; \
done;
 
 clean:
-   @for TARGET in $(TARGETS); do \
+   @for TARGET in $(SUB_TARGETS); do \
$(MAKE) -C $$TARGET clean; \
done;
rm -f tags
-- 
1.9.0

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH V3 0/3] Add new PowerPC specific ELF core notes

2014-07-17 Thread Sam Bobroff
On 17/07/14 21:14, Michael Neuling wrote:
 
 On Jul 17, 2014 9:11 PM, Benjamin Herrenschmidt
 b...@kernel.crashing.org mailto:b...@kernel.crashing.org wrote:

  
   Outstanding Issues
   ==
   (1) Running DSCR register value inside a transaction does not
 seem to be saved
   at thread.dscr when the process stops for ptrace examination.
  
   Hey Ben,
  
   Any updates on this patch series ?
 
  Ben,
 
  Any updates on this patch series ?

 I haven't had a chance to review yet, I was hoping somebody else would..

 Have you made any progress vs. the DSCR outstanding issue mentioned
 above ?
 
 The DSCR issue should be resolved with Sam Bobroff's recent  DSCR
 fixes.  I've not tested them though.
 
 Actually... Sam did you review this series?
 
 Mikey
 

I did, and applying powerpc: Correct DSCR during TM context switch
corrected the DSCR value in the test program (the one in the patch notes
for this series).

(In fact, IIRC, the reason for my patch set was the bug exposed by this
one ;-)

Cheers,
Sam.


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

suspicious RCU usage clockevents_lock, tick_broadcast_lock, hrtimer_bases.lock

2015-02-12 Thread Sam Bobroff
Hello,

I'm receiving this while booting a vanilla 3.19 kernel on a Power 8 machine:

[2.522179] device-mapper: uevent: version 1.0.3
[2.522741] device-mapper: ioctl: 4.29.0-ioctl (2014-10-28)
initialised: dm-de...@redhat.com
[2.543590]
[2.543630] ===
[2.543709] [ INFO: suspicious RCU usage. ]
[2.543758] 3.19.0samb #2 Not tainted
[2.543802] ---
[2.543847] include/trace/events/timer.h:186 suspicious
rcu_dereference_check() usage!
[2.543940]
[2.543940] other info that might help us debug this:
[2.543940]
[2.544035]
[2.544035] RCU used illegally from idle CPU!
[2.544035] rcu_scheduler_active = 1, debug_locks = 0
[2.544154] RCU used illegally from extended quiescent state!
[2.544234] 3 locks held by swapper/1/0:
[2.544284]  #0:  (clockevents_lock){-.}, at:
[c01a45cc] .clockevents_notify+0x5c/0x320
[2.544464]  #1:  (tick_broadcast_lock){-.-...}, at:
[c01a7ac4] .tick_broadcast_oneshot_control+0xe4/0x530
[2.544654]  #2:  (hrtimer_bases.lock#2){-.-...}, at:
[c0192694] .__hrtimer_start_range_ns+0x124/0x6e0
[2.544843]
[2.544843] stack backtrace:
[2.544904] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.19.0samb #2
[2.544986] Call Trace:
[2.545023] [c00fdf6f7770] [c0faf118]
.dump_stack+0x98/0xd4 (unreliable)
[2.545124] [c00fdf6f77f0] [c0152b78]
.lockdep_rcu_suspicious+0x138/0x180
[2.545225] [c00fdf6f7880] [c0191374]
.enqueue_hrtimer+0x1c4/0x300
[2.545325] [c00fdf6f7910] [c019276c]
.__hrtimer_start_range_ns+0x1fc/0x6e0
[2.545425] [c00fdf6f7a10] [c01a8e90] .bc_set_next+0xc0/0xf0
[2.545510] [c00fdf6f7aa0] [c01a51f0]
.clockevents_program_event+0x100/0x1f0
[2.545607] [c00fdf6f7b40] [c01a6bac]
.tick_broadcast_set_event+0x6c/0x120
[2.545705] [c00fdf6f7bd0] [c01a7c94]
.tick_broadcast_oneshot_control+0x2b4/0x530
[2.545802] [c00fdf6f7ca0] [c01a4818]
.clockevents_notify+0x2a8/0x320
[2.545898] [c00fdf6f7d70] [c01484f4]
.cpu_startup_entry+0x404/0x730
[2.545995] [c00fdf6f7ec0] [c0044314]
.start_secondary+0x3a4/0x460
[2.546092] [c00fdf6f7f90] [c0008bfc]
start_secondary_prolog+0x10/0x14
[2.546555] Registering IBM Power 842 compression driver

Cheers,
Sam.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] powerpc/powernv: Check image loaded or not before calling flash

2015-02-12 Thread Sam Bobroff
On 13/02/15 08:27, Benjamin Herrenschmidt wrote:
 On Thu, 2015-02-12 at 15:23 +0530, Vasant Hegde wrote:
 Present code checks for update_flash_data in opal_flash_term_callback().
 update_flash_data has been statically initialized to zero, and that
 is the value of FLASH_IMG_READY. Also code update initialization happens
 during subsys init.
 
 Please statically initialize it to a sane value instead.

I've tested this patch and it works for me (the message is suppressed)
but I agree with Ben that it seems cleaner to use a static initializer.

 So if reboot is issued before the subsys init stage then we endup displaying
 Flashing new firmware message.. which may confuse end user.

 This patch adds additional validation to make sure image is actually loaded
 or not.

 Reported-by: Sam Bobroff sam.bobr...@au1.ibm.com
 Signed-off-by: Vasant Hegde hegdevas...@linux.vnet.ibm.com
 ---
  arch/powerpc/platforms/powernv/opal-flash.c |3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

 diff --git a/arch/powerpc/platforms/powernv/opal-flash.c 
 b/arch/powerpc/platforms/powernv/opal-flash.c
 index 5c21d9c..5455cd4 100644
 --- a/arch/powerpc/platforms/powernv/opal-flash.c
 +++ b/arch/powerpc/platforms/powernv/opal-flash.c
 @@ -319,7 +319,8 @@ void opal_flash_term_callback(void)
  {
  struct cpumask mask;
  
 -if (update_flash_data.status != FLASH_IMG_READY)
 +if (update_flash_data.status != FLASH_IMG_READY ||
 +image_data.status != IMAGE_READY)
  return;
  
  pr_alert(FLASH: Flashing new firmware\n);
 
 
 ___
 Linuxppc-dev mailing list
 Linuxppc-dev@lists.ozlabs.org
 https://lists.ozlabs.org/listinfo/linuxppc-dev
 

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 1/3] powerpc/tm: Abort syscalls in active transactions

2015-03-18 Thread Sam Bobroff
This patch changes the syscall handler to doom (tabort) active
transactions when a syscall is made and return immediately without
performing the syscall.

Currently, the system call instruction automatically suspends an
active transaction which causes side effects to persist when an active
transaction fails.

This does change the kernel's behaviour, but in a way that was
documented as unsupported. It doesn't reduce functionality because
syscalls will still be performed after tsuspend. It also provides a
consistent interface and makes the behaviour of user code
substantially the same across powerpc and platforms that do not
support suspended transactions (e.g. x86 and s390).

Performance measurements using
http://ozlabs.org/~anton/junkcode/null_syscall.c
indicate the cost of a system call increases by about 0.5%.

Signed-off-by: Sam Bobroff sam.bobr...@au1.ibm.com
---
 Documentation/powerpc/transactional_memory.txt |   33 
 arch/powerpc/include/uapi/asm/tm.h |2 +-
 arch/powerpc/kernel/entry_64.S |   19 ++
 3 files changed, 37 insertions(+), 17 deletions(-)

diff --git a/Documentation/powerpc/transactional_memory.txt 
b/Documentation/powerpc/transactional_memory.txt
index 9791e98..4167bc2 100644
--- a/Documentation/powerpc/transactional_memory.txt
+++ b/Documentation/powerpc/transactional_memory.txt
@@ -74,22 +74,23 @@ Causes of transaction aborts
 Syscalls
 
 
-Performing syscalls from within transaction is not recommended, and can lead
-to unpredictable results.
-
-Syscalls do not by design abort transactions, but beware: The kernel code will
-not be running in transactional state.  The effect of syscalls will always
-remain visible, but depending on the call they may abort your transaction as a
-side-effect, read soon-to-be-aborted transactional data that should not remain
-invisible, etc.  If you constantly retry a transaction that constantly aborts
-itself by calling a syscall, you'll have a livelock  make no progress.
-
-Simple syscalls (e.g. sigprocmask()) could be OK.  Even things like write()
-from, say, printf() should be OK as long as the kernel does not access any
-memory that was accessed transactionally.
-
-Consider any syscalls that happen to work as debug-only -- not recommended for
-production use.  Best to queue them up till after the transaction is over.
+Syscalls made from within an active transaction will not be performed and the
+transaction will be doomed by the kernel with the failure code TM_CAUSE_SYSCALL
+| TM_CAUSE_PERSISTENT.
+
+Syscalls made from within a suspended transaction are performed as normal and
+the transaction is not explicitly doomed by the kernel.  However, what the
+kernel does to perform the syscall may result in the transaction being doomed
+by the hardware.  The syscall is performed in suspended mode so any side
+effects will be persistent, independent of transaction success or failure.  No
+guarantees are provided by the kernel about which syscalls will affect
+transaction success.
+
+Care must be taken when relying on syscalls to abort during active transactions
+if the calls are made via a library.  Libraries may cache values (which may
+give the appearence of success) or perform operations that cause transaction
+failure before entering the kernel (which may produce different failure codes).
+Examples are glibc's getpid() and lazy symbol resolution.
 
 
 Signals
diff --git a/arch/powerpc/include/uapi/asm/tm.h 
b/arch/powerpc/include/uapi/asm/tm.h
index 5d836b7..5047659 100644
--- a/arch/powerpc/include/uapi/asm/tm.h
+++ b/arch/powerpc/include/uapi/asm/tm.h
@@ -11,7 +11,7 @@
 #define TM_CAUSE_RESCHED   0xde
 #define TM_CAUSE_TLBI  0xdc
 #define TM_CAUSE_FAC_UNAV  0xda
-#define TM_CAUSE_SYSCALL   0xd8  /* future use */
+#define TM_CAUSE_SYSCALL   0xd8
 #define TM_CAUSE_MISC  0xd6  /* future use */
 #define TM_CAUSE_SIGNAL0xd4
 #define TM_CAUSE_ALIGNMENT 0xd2
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index d180caf2..85bf81d 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -34,6 +34,7 @@
 #include asm/ftrace.h
 #include asm/hw_irq.h
 #include asm/context_tracking.h
+#include asm/tm.h
 
 /*
  * System calls.
@@ -145,6 +146,24 @@ END_FW_FTR_SECTION_IFSET(FW_FEATURE_SPLPAR)
andi.   r11,r10,_TIF_SYSCALL_DOTRACE
bne syscall_dotrace
 .Lsyscall_dotrace_cont:
+#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
+BEGIN_FTR_SECTION
+   b   1f
+END_FTR_SECTION_IFCLR(CPU_FTR_TM)
+   extrdi. r11, r12, 1, (63-MSR_TS_T_LG) /* transaction active? */
+   beq+1f
+
+   /* Doom the transaction and don't perform the syscall: */
+   mfmsr   r11
+   li  r12, 1
+   rldimi  r11, r12, MSR_TM_LG, 63-MSR_TM_LG
+   mtmsrd  r11, 0
+   li  r11, (TM_CAUSE_SYSCALL|TM_CAUSE_PERSISTENT)
+   tabort. r11
+
+   b   .Lsyscall_exit
+1

[PATCH 2/3] selftests/powerpc: Move get_auxv_entry() to harness.c

2015-03-18 Thread Sam Bobroff
Move get_auxv_entry() from pmu/lib.c up to harness.c in order to make
it available to other tests.

Signed-off-by: Sam Bobroff sam.bobr...@au1.ibm.com
---
 tools/testing/selftests/powerpc/harness.c |   47 +
 tools/testing/selftests/powerpc/pmu/lib.c |   47 -
 tools/testing/selftests/powerpc/pmu/lib.h |1 -
 tools/testing/selftests/powerpc/utils.h   |2 +-
 4 files changed, 48 insertions(+), 49 deletions(-)

diff --git a/tools/testing/selftests/powerpc/harness.c 
b/tools/testing/selftests/powerpc/harness.c
index 8ebc58a..f7997af 100644
--- a/tools/testing/selftests/powerpc/harness.c
+++ b/tools/testing/selftests/powerpc/harness.c
@@ -11,6 +11,10 @@
 #include sys/types.h
 #include sys/wait.h
 #include unistd.h
+#include elf.h
+#include fcntl.h
+#include link.h
+#include sys/stat.h
 
 #include subunit.h
 #include utils.h
@@ -112,3 +116,46 @@ int test_harness(int (test_function)(void), char *name)
 
return rc;
 }
+
+static char auxv[4096];
+
+void *get_auxv_entry(int type)
+{
+   ElfW(auxv_t) *p;
+   void *result;
+   ssize_t num;
+   int fd;
+
+   fd = open(/proc/self/auxv, O_RDONLY);
+   if (fd == -1) {
+   perror(open);
+   return NULL;
+   }
+
+   result = NULL;
+
+   num = read(fd, auxv, sizeof(auxv));
+   if (num  0) {
+   perror(read);
+   goto out;
+   }
+
+   if (num  sizeof(auxv)) {
+   printf(Overflowed auxv buffer\n);
+   goto out;
+   }
+
+   p = (ElfW(auxv_t) *)auxv;
+
+   while (p-a_type != AT_NULL) {
+   if (p-a_type == type) {
+   result = (void *)p-a_un.a_val;
+   break;
+   }
+
+   p++;
+   }
+out:
+   close(fd);
+   return result;
+}
diff --git a/tools/testing/selftests/powerpc/pmu/lib.c 
b/tools/testing/selftests/powerpc/pmu/lib.c
index 9768dea..a07104c 100644
--- a/tools/testing/selftests/powerpc/pmu/lib.c
+++ b/tools/testing/selftests/powerpc/pmu/lib.c
@@ -5,15 +5,10 @@
 
 #define _GNU_SOURCE/* For CPU_ZERO etc. */
 
-#include elf.h
 #include errno.h
-#include fcntl.h
-#include link.h
 #include sched.h
 #include setjmp.h
 #include stdlib.h
-#include sys/stat.h
-#include sys/types.h
 #include sys/wait.h
 
 #include utils.h
@@ -256,45 +251,3 @@ out:
return rc;
 }
 
-static char auxv[4096];
-
-void *get_auxv_entry(int type)
-{
-   ElfW(auxv_t) *p;
-   void *result;
-   ssize_t num;
-   int fd;
-
-   fd = open(/proc/self/auxv, O_RDONLY);
-   if (fd == -1) {
-   perror(open);
-   return NULL;
-   }
-
-   result = NULL;
-
-   num = read(fd, auxv, sizeof(auxv));
-   if (num  0) {
-   perror(read);
-   goto out;
-   }
-
-   if (num  sizeof(auxv)) {
-   printf(Overflowed auxv buffer\n);
-   goto out;
-   }
-
-   p = (ElfW(auxv_t) *)auxv;
-
-   while (p-a_type != AT_NULL) {
-   if (p-a_type == type) {
-   result = (void *)p-a_un.a_val;
-   break;
-   }
-
-   p++;
-   }
-out:
-   close(fd);
-   return result;
-}
diff --git a/tools/testing/selftests/powerpc/pmu/lib.h 
b/tools/testing/selftests/powerpc/pmu/lib.h
index 0f0339c..ca5d72a 100644
--- a/tools/testing/selftests/powerpc/pmu/lib.h
+++ b/tools/testing/selftests/powerpc/pmu/lib.h
@@ -29,7 +29,6 @@ extern int notify_parent(union pipe write_pipe);
 extern int notify_parent_of_error(union pipe write_pipe);
 extern pid_t eat_cpu(int (test_function)(void));
 extern bool require_paranoia_below(int level);
-extern void *get_auxv_entry(int type);
 
 struct addr_range {
uint64_t first, last;
diff --git a/tools/testing/selftests/powerpc/utils.h 
b/tools/testing/selftests/powerpc/utils.h
index a93777a..64f53cd 100644
--- a/tools/testing/selftests/powerpc/utils.h
+++ b/tools/testing/selftests/powerpc/utils.h
@@ -19,7 +19,7 @@ typedef uint8_t u8;
 
 
 int test_harness(int (test_function)(void), char *name);
-
+extern void *get_auxv_entry(int type);
 
 /* Yes, this is evil */
 #define FAIL_IF(x) \
-- 
1.7.10.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 3/3] selftests/powerpc: Add transactional syscall test

2015-03-18 Thread Sam Bobroff
Check that a syscall made during an active transaction will fail with
the correct failure code and that one made during a suspended
transaction will succeed.

Signed-off-by: Sam Bobroff sam.bobr...@au1.ibm.com
---
 tools/testing/selftests/powerpc/tm/Makefile |3 +-
 tools/testing/selftests/powerpc/tm/tm-syscall.c |  113 +++
 2 files changed, 115 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/powerpc/tm/tm-syscall.c

diff --git a/tools/testing/selftests/powerpc/tm/Makefile 
b/tools/testing/selftests/powerpc/tm/Makefile
index 2cede23..d8dab0d 100644
--- a/tools/testing/selftests/powerpc/tm/Makefile
+++ b/tools/testing/selftests/powerpc/tm/Makefile
@@ -1,4 +1,5 @@
-PROGS := tm-resched-dscr
+PROGS := tm-resched-dscr tm-syscall
+CFLAGS:=$(CFLAGS) -mhtm -Wl,-z,now
 
 all: $(PROGS)
 
diff --git a/tools/testing/selftests/powerpc/tm/tm-syscall.c 
b/tools/testing/selftests/powerpc/tm/tm-syscall.c
new file mode 100644
index 000..7c60e53
--- /dev/null
+++ b/tools/testing/selftests/powerpc/tm/tm-syscall.c
@@ -0,0 +1,113 @@
+/* Test the kernel's system call code to ensure that a system call
+ * made from within an active HTM transaction is aborted with the
+ * correct failure code.
+ * Conversely, ensure that a system call made from within a
+ * suspended transaction can succeed.
+ *
+ * It is important to compile with -Wl,-z,now to prevent
+ * lazy symbol resolution from affecting the results.
+ */
+
+#include stdio.h
+#include unistd.h
+#include asm/tm.h
+#include asm/cputable.h
+#include linux/auxvec.h
+
+#include utils.h
+
+#define TM_RETRIES 10
+#define TM_TEST_RUNS 1000
+
+int t_failure_persistent(void)
+{
+   long texasr = __builtin_get_texasr();
+   long failure_code = (texasr  56)  0xff;
+
+   return failure_code  TM_CAUSE_PERSISTENT;
+}
+
+int t_failure_code_syscall(void)
+{
+   long texasr = __builtin_get_texasr();
+   long failure_code = (texasr  56)  0xff;
+
+   return (failure_code  TM_CAUSE_SYSCALL) == TM_CAUSE_SYSCALL;
+}
+
+int t_active_getppid(void)
+{
+   int i;
+
+   for (i = 0; i  TM_RETRIES; i++) {
+   if (__builtin_tbegin(0)) {
+   getppid();
+   __builtin_tend(0);
+   return 1;
+   }
+   if (t_failure_persistent())
+   return 0;
+   }
+   return 0;
+}
+
+int t_active_getppid_test(void)
+{
+   int i;
+
+   for (i = 0; i  TM_TEST_RUNS; i++) {
+   if (t_active_getppid())
+   return 0;
+   if (!t_failure_persistent())
+   return 0;
+   if (!t_failure_code_syscall())
+   return 0;
+   }
+   return 1;
+}
+
+int t_suspended_getppid(void)
+{
+   int i;
+
+   for (i = 0; i  TM_RETRIES; i++) {
+   if (__builtin_tbegin(0)) {
+   __builtin_tsuspend();
+   getppid();
+   __builtin_tresume();
+   __builtin_tend(0);
+   return 1;
+   }
+   if (t_failure_persistent())
+   return 0;
+   }
+   return 0;
+}
+
+int t_suspended_getppid_test(void)
+{
+   int i;
+
+   for (i = 0; i  TM_TEST_RUNS; i++) {
+   if (!t_suspended_getppid())
+   return 0;
+   }
+   return 1;
+}
+
+int tm_syscall(void)
+{
+   SKIP_IF(!((long)get_auxv_entry(AT_HWCAP2)  PPC_FEATURE2_HTM));
+   setbuf(stdout, 0);
+   FAIL_IF(!t_active_getppid_test());
+   printf(%d active transactions correctly aborted.\n, TM_TEST_RUNS);
+   FAIL_IF(!t_suspended_getppid_test());
+   printf(%d suspended transactions succeeded.\n, TM_TEST_RUNS);
+   return 0;
+}
+
+int main(void)
+{
+   return test_harness(tm_syscall, tm_syscall);
+}
+
-- 
1.7.10.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 0/3] powerpc/tm: Abort syscalls in active transactions

2015-03-18 Thread Sam Bobroff

See the first patch for a description of the reasoning behind this
change.

This set includes the change, a kernel selftest for it and
some slight refactoring of the selftest code.


Sam Bobroff (3):
  powerpc/tm: Abort syscalls in active transactions
  selftests/powerpc: Move get_auxv_entry() to harness.c
  selftests/powerpc: Add transactional syscall test

 Documentation/powerpc/transactional_memory.txt  |   33 +++
 arch/powerpc/include/uapi/asm/tm.h  |2 +-
 arch/powerpc/kernel/entry_64.S  |   19 
 tools/testing/selftests/powerpc/harness.c   |   47 ++
 tools/testing/selftests/powerpc/pmu/lib.c   |   47 --
 tools/testing/selftests/powerpc/pmu/lib.h   |1 -
 tools/testing/selftests/powerpc/tm/Makefile |3 +-
 tools/testing/selftests/powerpc/tm/tm-syscall.c |  113 +++
 tools/testing/selftests/powerpc/utils.h |2 +-
 9 files changed, 200 insertions(+), 67 deletions(-)
 create mode 100644 tools/testing/selftests/powerpc/tm/tm-syscall.c

-- 
1.7.10.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v2 4/4] powerpc/tm: Correct minor documentation typos

2015-03-29 Thread Sam Bobroff
Signed-off-by: Sam Bobroff sam.bobr...@au1.ibm.com
---
v2:

Discovered some typos while updating the documentation.

 Documentation/powerpc/transactional_memory.txt |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/powerpc/transactional_memory.txt 
b/Documentation/powerpc/transactional_memory.txt
index 98b39af..ba0a2a4 100644
--- a/Documentation/powerpc/transactional_memory.txt
+++ b/Documentation/powerpc/transactional_memory.txt
@@ -175,7 +175,7 @@ These are defined in asm/reg.h, and distinguish different 
reasons why the
 kernel aborted a transaction:
 
  TM_CAUSE_RESCHED   Thread was rescheduled.
- TM_CAUSE_TLBI  Software TLB invalide.
+ TM_CAUSE_TLBI  Software TLB invalid.
  TM_CAUSE_FAC_UNAV  FP/VEC/VSX unavailable trap.
  TM_CAUSE_SYSCALL   Syscall from active transaction.
  TM_CAUSE_SIGNALSignal delivered.
@@ -185,7 +185,7 @@ kernel aborted a transaction:
 
 These can be checked by the user program's abort handler as TEXASR[0:7].  If
 bit 7 is set, it indicates that the error is consider persistent.  For example
-a TM_CAUSE_ALIGNMENT will be persistent while a TM_CAUSE_RESCHED will not.q
+a TM_CAUSE_ALIGNMENT will be persistent while a TM_CAUSE_RESCHED will not.
 
 GDB
 ===
-- 
1.7.10.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v2 0/4] powerpc/tm: Abort syscalls in active transactions

2015-03-29 Thread Sam Bobroff

See the first patch for a description of the reasoning behind this
change.

This set includes the change, a kernel selftest for it and
some slight refactoring of the selftest code.


v2:
Patch 1/4: powerpc/tm: Abort syscalls in active transactions

Also update the failure code table.

Patch 3/4: selftests/powerpc: Add transactional syscall test

Further testing has shown that the success or failure of the transactions was
affected by minor changes to the code, compiler optimisation and linker
settings.

To address this, I've moved the transactional part of the test to a separate
function, written in assembly. I've also extended the test to as many
transactions as it can fit into ten seconds, to better test for failures that
occur more rarely. This has stabilised the results, and it's no longer
necessary to use special compiler or linker flags.

Patch 4/4: powerpc/tm: Correct minor documentation typos

Discovered some typos while updating the documentation.


Sam Bobroff (4):
  powerpc/tm: Abort syscalls in active transactions
  selftests/powerpc: Move get_auxv_entry() to harness.c
  selftests/powerpc: Add transactional syscall test
  powerpc/tm: Correct minor documentation typos

 Documentation/powerpc/transactional_memory.txt |   36 +++
 arch/powerpc/include/uapi/asm/tm.h |2 +-
 arch/powerpc/kernel/entry_64.S |   19 
 tools/testing/selftests/powerpc/harness.c  |   47 +
 tools/testing/selftests/powerpc/pmu/lib.c  |   47 -
 tools/testing/selftests/powerpc/pmu/lib.h  |1 -
 tools/testing/selftests/powerpc/tm/.gitignore  |1 +
 tools/testing/selftests/powerpc/tm/Makefile|4 +-
 .../testing/selftests/powerpc/tm/tm-syscall-asm.S  |   27 +
 .../testing/selftests/powerpc/tm/tm-syscall-asm.h  |2 +
 tools/testing/selftests/powerpc/tm/tm-syscall.c|  109 
 tools/testing/selftests/powerpc/utils.h|2 +-
 12 files changed, 228 insertions(+), 69 deletions(-)
 create mode 100644 tools/testing/selftests/powerpc/tm/tm-syscall-asm.S
 create mode 100644 tools/testing/selftests/powerpc/tm/tm-syscall-asm.h
 create mode 100644 tools/testing/selftests/powerpc/tm/tm-syscall.c

-- 
1.7.10.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 3/3] selftests/powerpc: Add transactional syscall test

2015-03-29 Thread Sam Bobroff
On 24/03/15 13:02, Michael Ellerman wrote:
 On Tue, 2015-03-24 at 12:52 +1100, Sam Bobroff wrote:
 On 20/03/15 20:25, Anshuman Khandual wrote:
 On 03/19/2015 10:13 AM, Sam Bobroff wrote:
 Check that a syscall made during an active transaction will fail with
 the correct failure code and that one made during a suspended
 transaction will succeed.

 Signed-off-by: Sam Bobroff sam.bobr...@au1.ibm.com

 The test works.

 Great :-)

 +
 +int tm_syscall(void)
 +{
 +  SKIP_IF(!((long)get_auxv_entry(AT_HWCAP2)  PPC_FEATURE2_HTM));
 +  setbuf(stdout, 0);
 +  FAIL_IF(!t_active_getppid_test());
 +  printf(%d active transactions correctly aborted.\n, TM_TEST_RUNS);
 +  FAIL_IF(!t_suspended_getppid_test());
 +  printf(%d suspended transactions succeeded.\n, TM_TEST_RUNS);
 +  return 0;
 +}
 +
 +int main(void)
 +{
 +  return test_harness(tm_syscall, tm_syscall);
 +}
 +

 There is an extra blank line at the end of this file. Interchanging return
 codes of 0 and 1 for various functions make it very confusing along with
 negative FAIL_IF checks in the primary test function. Control flow 
 structures
 like these can use some in-code documentation for readability.

 +   for (i = 0; i  TM_RETRIES; i++) {
 +   if (__builtin_tbegin(0)) {
 +   getppid();
 +   __builtin_tend(0);
 +   return 1;
 +   }
 +   if (t_failure_persistent())
 +   return 0;

 or

 +   if (__builtin_tbegin(0)) {
 +   __builtin_tsuspend();
 +   getppid();
 +   __builtin_tresume();
 +   __builtin_tend(0);
 +   return 1;
 +   }
 +   if (t_failure_persistent())
 +   return 0;


 Good points. I'll remove the blank line and comment the code.

 I'm not sure I can do any better with the FAIL_IF() macro: I wanted it
 to read fail if the test failed, but I can see what you mean about a
 double negative. Maybe it would be better to introduce a different
 macro, more like a standard assert: TEST(XXX) which fails if XXX is
 false. However, I think TEST would be too generic a name and I'm not
 should what would be better. Any comments/suggestions?
 
 FAIL_IF() is designed for things that return 0 for OK and !0 for failure. Like
 most things in C.
 
 So I think it would be improved if you inverted your return codes in your test
 routines.
 
 Even better to return ESOMETHING in the error cases, and zero otherwise.
 
 cheers

Fair enough. I think the *_test() functions I added for clarity were
just making it more confusing, so I've dropped them.

Moving the code around, even a little, has also exposed the fact that
transactions are very sensitive to how the code is compiled so I'm going
to move the transaction blocks out into a separate assembly file where I
can control exactly what instructions get used. This will also mean that
it's no longer dependent on using linker magic (or some other trick) to
avoid lazy symbol loading.

I'll repost the series.

Thanks for the review!

Cheers,
Sam.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v2 2/4] selftests/powerpc: Move get_auxv_entry() to harness.c

2015-03-29 Thread Sam Bobroff
Move get_auxv_entry() from pmu/lib.c up to harness.c in order to make
it available to other tests.

Signed-off-by: Sam Bobroff sam.bobr...@au1.ibm.com
---
 tools/testing/selftests/powerpc/harness.c |   47 +
 tools/testing/selftests/powerpc/pmu/lib.c |   47 -
 tools/testing/selftests/powerpc/pmu/lib.h |1 -
 tools/testing/selftests/powerpc/utils.h   |2 +-
 4 files changed, 48 insertions(+), 49 deletions(-)

diff --git a/tools/testing/selftests/powerpc/harness.c 
b/tools/testing/selftests/powerpc/harness.c
index 8ebc58a..f7997af 100644
--- a/tools/testing/selftests/powerpc/harness.c
+++ b/tools/testing/selftests/powerpc/harness.c
@@ -11,6 +11,10 @@
 #include sys/types.h
 #include sys/wait.h
 #include unistd.h
+#include elf.h
+#include fcntl.h
+#include link.h
+#include sys/stat.h
 
 #include subunit.h
 #include utils.h
@@ -112,3 +116,46 @@ int test_harness(int (test_function)(void), char *name)
 
return rc;
 }
+
+static char auxv[4096];
+
+void *get_auxv_entry(int type)
+{
+   ElfW(auxv_t) *p;
+   void *result;
+   ssize_t num;
+   int fd;
+
+   fd = open(/proc/self/auxv, O_RDONLY);
+   if (fd == -1) {
+   perror(open);
+   return NULL;
+   }
+
+   result = NULL;
+
+   num = read(fd, auxv, sizeof(auxv));
+   if (num  0) {
+   perror(read);
+   goto out;
+   }
+
+   if (num  sizeof(auxv)) {
+   printf(Overflowed auxv buffer\n);
+   goto out;
+   }
+
+   p = (ElfW(auxv_t) *)auxv;
+
+   while (p-a_type != AT_NULL) {
+   if (p-a_type == type) {
+   result = (void *)p-a_un.a_val;
+   break;
+   }
+
+   p++;
+   }
+out:
+   close(fd);
+   return result;
+}
diff --git a/tools/testing/selftests/powerpc/pmu/lib.c 
b/tools/testing/selftests/powerpc/pmu/lib.c
index 9768dea..a07104c 100644
--- a/tools/testing/selftests/powerpc/pmu/lib.c
+++ b/tools/testing/selftests/powerpc/pmu/lib.c
@@ -5,15 +5,10 @@
 
 #define _GNU_SOURCE/* For CPU_ZERO etc. */
 
-#include elf.h
 #include errno.h
-#include fcntl.h
-#include link.h
 #include sched.h
 #include setjmp.h
 #include stdlib.h
-#include sys/stat.h
-#include sys/types.h
 #include sys/wait.h
 
 #include utils.h
@@ -256,45 +251,3 @@ out:
return rc;
 }
 
-static char auxv[4096];
-
-void *get_auxv_entry(int type)
-{
-   ElfW(auxv_t) *p;
-   void *result;
-   ssize_t num;
-   int fd;
-
-   fd = open(/proc/self/auxv, O_RDONLY);
-   if (fd == -1) {
-   perror(open);
-   return NULL;
-   }
-
-   result = NULL;
-
-   num = read(fd, auxv, sizeof(auxv));
-   if (num  0) {
-   perror(read);
-   goto out;
-   }
-
-   if (num  sizeof(auxv)) {
-   printf(Overflowed auxv buffer\n);
-   goto out;
-   }
-
-   p = (ElfW(auxv_t) *)auxv;
-
-   while (p-a_type != AT_NULL) {
-   if (p-a_type == type) {
-   result = (void *)p-a_un.a_val;
-   break;
-   }
-
-   p++;
-   }
-out:
-   close(fd);
-   return result;
-}
diff --git a/tools/testing/selftests/powerpc/pmu/lib.h 
b/tools/testing/selftests/powerpc/pmu/lib.h
index 0f0339c..ca5d72a 100644
--- a/tools/testing/selftests/powerpc/pmu/lib.h
+++ b/tools/testing/selftests/powerpc/pmu/lib.h
@@ -29,7 +29,6 @@ extern int notify_parent(union pipe write_pipe);
 extern int notify_parent_of_error(union pipe write_pipe);
 extern pid_t eat_cpu(int (test_function)(void));
 extern bool require_paranoia_below(int level);
-extern void *get_auxv_entry(int type);
 
 struct addr_range {
uint64_t first, last;
diff --git a/tools/testing/selftests/powerpc/utils.h 
b/tools/testing/selftests/powerpc/utils.h
index a93777a..64f53cd 100644
--- a/tools/testing/selftests/powerpc/utils.h
+++ b/tools/testing/selftests/powerpc/utils.h
@@ -19,7 +19,7 @@ typedef uint8_t u8;
 
 
 int test_harness(int (test_function)(void), char *name);
-
+extern void *get_auxv_entry(int type);
 
 /* Yes, this is evil */
 #define FAIL_IF(x) \
-- 
1.7.10.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v2 1/4] powerpc/tm: Abort syscalls in active transactions

2015-03-29 Thread Sam Bobroff
This patch changes the syscall handler to doom (tabort) active
transactions when a syscall is made and return immediately without
performing the syscall.

Currently, the system call instruction automatically suspends an
active transaction which causes side effects to persist when an active
transaction fails.

This does change the kernel's behaviour, but in a way that was
documented as unsupported. It doesn't reduce functionality because
syscalls will still be performed after tsuspend. It also provides a
consistent interface and makes the behaviour of user code
substantially the same across powerpc and platforms that do not
support suspended transactions (e.g. x86 and s390).

Performance measurements using
http://ozlabs.org/~anton/junkcode/null_syscall.c
indicate the cost of a system call increases by about 0.5%.

Signed-off-by: Sam Bobroff sam.bobr...@au1.ibm.com
---
v2:

Also update the failure code table.

 Documentation/powerpc/transactional_memory.txt |   32 
 arch/powerpc/include/uapi/asm/tm.h |2 +-
 arch/powerpc/kernel/entry_64.S |   19 ++
 3 files changed, 36 insertions(+), 17 deletions(-)

diff --git a/Documentation/powerpc/transactional_memory.txt 
b/Documentation/powerpc/transactional_memory.txt
index 9791e98..98b39af 100644
--- a/Documentation/powerpc/transactional_memory.txt
+++ b/Documentation/powerpc/transactional_memory.txt
@@ -74,22 +74,23 @@ Causes of transaction aborts
 Syscalls
 
 
-Performing syscalls from within transaction is not recommended, and can lead
-to unpredictable results.
+Syscalls made from within an active transaction will not be performed and the
+transaction will be doomed by the kernel with the failure code TM_CAUSE_SYSCALL
+| TM_CAUSE_PERSISTENT.
 
-Syscalls do not by design abort transactions, but beware: The kernel code will
-not be running in transactional state.  The effect of syscalls will always
-remain visible, but depending on the call they may abort your transaction as a
-side-effect, read soon-to-be-aborted transactional data that should not remain
-invisible, etc.  If you constantly retry a transaction that constantly aborts
-itself by calling a syscall, you'll have a livelock  make no progress.
+Syscalls made from within a suspended transaction are performed as normal and
+the transaction is not explicitly doomed by the kernel.  However, what the
+kernel does to perform the syscall may result in the transaction being doomed
+by the hardware.  The syscall is performed in suspended mode so any side
+effects will be persistent, independent of transaction success or failure.  No
+guarantees are provided by the kernel about which syscalls will affect
+transaction success.
 
-Simple syscalls (e.g. sigprocmask()) could be OK.  Even things like write()
-from, say, printf() should be OK as long as the kernel does not access any
-memory that was accessed transactionally.
-
-Consider any syscalls that happen to work as debug-only -- not recommended for
-production use.  Best to queue them up till after the transaction is over.
+Care must be taken when relying on syscalls to abort during active transactions
+if the calls are made via a library.  Libraries may cache values (which may
+give the appearance of success) or perform operations that cause transaction
+failure before entering the kernel (which may produce different failure codes).
+Examples are glibc's getpid() and lazy symbol resolution.
 
 
 Signals
@@ -176,8 +177,7 @@ kernel aborted a transaction:
  TM_CAUSE_RESCHED   Thread was rescheduled.
  TM_CAUSE_TLBI  Software TLB invalide.
  TM_CAUSE_FAC_UNAV  FP/VEC/VSX unavailable trap.
- TM_CAUSE_SYSCALL   Currently unused; future syscalls that must abort
-transactions for consistency will use this.
+ TM_CAUSE_SYSCALL   Syscall from active transaction.
  TM_CAUSE_SIGNALSignal delivered.
  TM_CAUSE_MISC  Currently unused.
  TM_CAUSE_ALIGNMENT Alignment fault.
diff --git a/arch/powerpc/include/uapi/asm/tm.h 
b/arch/powerpc/include/uapi/asm/tm.h
index 5d836b7..5047659 100644
--- a/arch/powerpc/include/uapi/asm/tm.h
+++ b/arch/powerpc/include/uapi/asm/tm.h
@@ -11,7 +11,7 @@
 #define TM_CAUSE_RESCHED   0xde
 #define TM_CAUSE_TLBI  0xdc
 #define TM_CAUSE_FAC_UNAV  0xda
-#define TM_CAUSE_SYSCALL   0xd8  /* future use */
+#define TM_CAUSE_SYSCALL   0xd8
 #define TM_CAUSE_MISC  0xd6  /* future use */
 #define TM_CAUSE_SIGNAL0xd4
 #define TM_CAUSE_ALIGNMENT 0xd2
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index d180caf2..85bf81d 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -34,6 +34,7 @@
 #include asm/ftrace.h
 #include asm/hw_irq.h
 #include asm/context_tracking.h
+#include asm/tm.h
 
 /*
  * System calls.
@@ -145,6 +146,24 @@ END_FW_FTR_SECTION_IFSET(FW_FEATURE_SPLPAR)
andi.   r11,r10

[PATCH v2 3/4] selftests/powerpc: Add transactional syscall test

2015-03-29 Thread Sam Bobroff
Check that a syscall made during an active transaction will fail with
the correct failure code and that one made during a suspended
transaction will succeed.

Signed-off-by: Sam Bobroff sam.bobr...@au1.ibm.com
---
v2:

Further testing has shown that the success or failure of the transactions was
affected by minor changes to the code, compiler optimisation and linker
settings.

To address this, I've moved the transactional part of the test to a separate
function, written in assembly. I've also extended the test to as many
transactions as it can fit into ten seconds, to better test for failures that
occur more rarely. This has stabilised the results, and it's no longer
necessary to use special compiler or linker flags.

 tools/testing/selftests/powerpc/tm/.gitignore  |1 +
 tools/testing/selftests/powerpc/tm/Makefile|4 +-
 .../testing/selftests/powerpc/tm/tm-syscall-asm.S  |   27 +
 .../testing/selftests/powerpc/tm/tm-syscall-asm.h  |2 +
 tools/testing/selftests/powerpc/tm/tm-syscall.c|  109 
 5 files changed, 142 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/powerpc/tm/tm-syscall-asm.S
 create mode 100644 tools/testing/selftests/powerpc/tm/tm-syscall-asm.h
 create mode 100644 tools/testing/selftests/powerpc/tm/tm-syscall.c

diff --git a/tools/testing/selftests/powerpc/tm/.gitignore 
b/tools/testing/selftests/powerpc/tm/.gitignore
index 33d02cc..2699635d 100644
--- a/tools/testing/selftests/powerpc/tm/.gitignore
+++ b/tools/testing/selftests/powerpc/tm/.gitignore
@@ -1 +1,2 @@
 tm-resched-dscr
+tm-syscall
diff --git a/tools/testing/selftests/powerpc/tm/Makefile 
b/tools/testing/selftests/powerpc/tm/Makefile
index 2cede23..93bbff3 100644
--- a/tools/testing/selftests/powerpc/tm/Makefile
+++ b/tools/testing/selftests/powerpc/tm/Makefile
@@ -1,8 +1,10 @@
-PROGS := tm-resched-dscr
+PROGS := tm-resched-dscr tm-syscall
+CFLAGS:=$(CFLAGS) -mhtm
 
 all: $(PROGS)
 
 $(PROGS): ../harness.c
+tm-syscall: tm-syscall-asm.S
 
 run_tests: all
@-for PROG in $(PROGS); do \
diff --git a/tools/testing/selftests/powerpc/tm/tm-syscall-asm.S 
b/tools/testing/selftests/powerpc/tm/tm-syscall-asm.S
new file mode 100644
index 000..2b2daa7
--- /dev/null
+++ b/tools/testing/selftests/powerpc/tm/tm-syscall-asm.S
@@ -0,0 +1,27 @@
+#include ppc-asm.h
+#include asm/unistd.h
+
+   .text
+FUNC_START(getppid_tm_active_impl)
+   tbegin.
+   beq 1f
+   li  r0, __NR_getppid
+   sc
+   tend.
+   blr
+1:
+   li  r3, -1
+   blr
+
+FUNC_START(getppid_tm_suspended_impl)
+   tbegin.
+   beq 1f
+   li  r0, __NR_getppid
+   tsuspend.
+   sc
+   tresume.
+   tend.
+   blr
+1:
+   li  r3, -1
+   blr
diff --git a/tools/testing/selftests/powerpc/tm/tm-syscall-asm.h 
b/tools/testing/selftests/powerpc/tm/tm-syscall-asm.h
new file mode 100644
index 000..6136328
--- /dev/null
+++ b/tools/testing/selftests/powerpc/tm/tm-syscall-asm.h
@@ -0,0 +1,2 @@
+extern int getppid_tm_active_impl(void);
+extern int getppid_tm_suspended_impl(void);
diff --git a/tools/testing/selftests/powerpc/tm/tm-syscall.c 
b/tools/testing/selftests/powerpc/tm/tm-syscall.c
new file mode 100644
index 000..ff3b15c
--- /dev/null
+++ b/tools/testing/selftests/powerpc/tm/tm-syscall.c
@@ -0,0 +1,109 @@
+/* Test the kernel's system call code to ensure that a system call
+ * made from within an active HTM transaction is aborted with the
+ * correct failure code.
+ * Conversely, ensure that a system call made from within a
+ * suspended transaction can succeed.
+ */
+
+#include stdio.h
+#include unistd.h
+#include sys/syscall.h
+#include asm/tm.h
+#include asm/cputable.h
+#include linux/auxvec.h
+#include sys/time.h
+#include stdlib.h
+
+#include utils.h
+#include tm-syscall-asm.h
+
+unsigned retries = 0;
+
+#define TEST_DURATION 10 /* seconds */
+#define TM_RETRIES 100
+
+long failure_code(void)
+{
+   return __builtin_get_texasr()  56;
+}
+
+bool failure_is_persistent(void)
+{
+   return (failure_code()  TM_CAUSE_PERSISTENT) == TM_CAUSE_PERSISTENT;
+}
+
+bool failure_is_syscall(void)
+{
+   return (failure_code()  TM_CAUSE_SYSCALL) == TM_CAUSE_SYSCALL;
+}
+
+pid_t getppid_tm(bool suspend)
+{
+   int i;
+   pid_t pid;
+
+   for (i = 0; i  TM_RETRIES; i++) {
+   if (suspend)
+   pid = getppid_tm_suspended_impl();
+   else
+   pid = getppid_tm_active_impl();
+   if (pid = 0)
+   return pid;
+   if (failure_is_persistent()) {
+   if (failure_is_syscall())
+   return -1;
+   printf(Unexpected persistent transaction failure.\n);
+   printf(TEXASR 0x%016lx, TFIAR 0x%016lx.\n,
+  __builtin_get_texasr(), __builtin_get_tfiar());
+   exit(-1

Re: [PATCH 3/3] selftests/powerpc: Add transactional syscall test

2015-03-23 Thread Sam Bobroff
On 20/03/15 20:25, Anshuman Khandual wrote:
 On 03/19/2015 10:13 AM, Sam Bobroff wrote:
 Check that a syscall made during an active transaction will fail with
 the correct failure code and that one made during a suspended
 transaction will succeed.

 Signed-off-by: Sam Bobroff sam.bobr...@au1.ibm.com
 
 The test works.

Great :-)

 +
 +int tm_syscall(void)
 +{
 +SKIP_IF(!((long)get_auxv_entry(AT_HWCAP2)  PPC_FEATURE2_HTM));
 +setbuf(stdout, 0);
 +FAIL_IF(!t_active_getppid_test());
 +printf(%d active transactions correctly aborted.\n, TM_TEST_RUNS);
 +FAIL_IF(!t_suspended_getppid_test());
 +printf(%d suspended transactions succeeded.\n, TM_TEST_RUNS);
 +return 0;
 +}
 +
 +int main(void)
 +{
 +return test_harness(tm_syscall, tm_syscall);
 +}
 +
 
 There is an extra blank line at the end of this file. Interchanging return
 codes of 0 and 1 for various functions make it very confusing along with
 negative FAIL_IF checks in the primary test function. Control flow structures
 like these can use some in-code documentation for readability.
 
 + for (i = 0; i  TM_RETRIES; i++) {
 + if (__builtin_tbegin(0)) {
 + getppid();
 + __builtin_tend(0);
 + return 1;
 + }
 + if (t_failure_persistent())
 + return 0;
 
 or
 
 + if (__builtin_tbegin(0)) {
 + __builtin_tsuspend();
 + getppid();
 + __builtin_tresume();
 + __builtin_tend(0);
 + return 1;
 + }
 + if (t_failure_persistent())
 + return 0;
 

Good points. I'll remove the blank line and comment the code.

I'm not sure I can do any better with the FAIL_IF() macro: I wanted it
to read fail if the test failed, but I can see what you mean about a
double negative. Maybe it would be better to introduce a different
macro, more like a standard assert: TEST(XXX) which fails if XXX is
false. However, I think TEST would be too generic a name and I'm not
should what would be better. Any comments/suggestions?

Thanks for the review!

Cheers,
Sam.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v3 2/4] selftests/powerpc: Move get_auxv_entry() to harness.c

2015-04-06 Thread Sam Bobroff
Move get_auxv_entry() from pmu/lib.c up to harness.c in order to make
it available to other tests.

Signed-off-by: Sam Bobroff sam.bobr...@au1.ibm.com
---
 tools/testing/selftests/powerpc/harness.c |   47 +
 tools/testing/selftests/powerpc/pmu/lib.c |   47 -
 tools/testing/selftests/powerpc/pmu/lib.h |1 -
 tools/testing/selftests/powerpc/utils.h   |2 +-
 4 files changed, 48 insertions(+), 49 deletions(-)

diff --git a/tools/testing/selftests/powerpc/harness.c 
b/tools/testing/selftests/powerpc/harness.c
index 8ebc58a..f7997af 100644
--- a/tools/testing/selftests/powerpc/harness.c
+++ b/tools/testing/selftests/powerpc/harness.c
@@ -11,6 +11,10 @@
 #include sys/types.h
 #include sys/wait.h
 #include unistd.h
+#include elf.h
+#include fcntl.h
+#include link.h
+#include sys/stat.h
 
 #include subunit.h
 #include utils.h
@@ -112,3 +116,46 @@ int test_harness(int (test_function)(void), char *name)
 
return rc;
 }
+
+static char auxv[4096];
+
+void *get_auxv_entry(int type)
+{
+   ElfW(auxv_t) *p;
+   void *result;
+   ssize_t num;
+   int fd;
+
+   fd = open(/proc/self/auxv, O_RDONLY);
+   if (fd == -1) {
+   perror(open);
+   return NULL;
+   }
+
+   result = NULL;
+
+   num = read(fd, auxv, sizeof(auxv));
+   if (num  0) {
+   perror(read);
+   goto out;
+   }
+
+   if (num  sizeof(auxv)) {
+   printf(Overflowed auxv buffer\n);
+   goto out;
+   }
+
+   p = (ElfW(auxv_t) *)auxv;
+
+   while (p-a_type != AT_NULL) {
+   if (p-a_type == type) {
+   result = (void *)p-a_un.a_val;
+   break;
+   }
+
+   p++;
+   }
+out:
+   close(fd);
+   return result;
+}
diff --git a/tools/testing/selftests/powerpc/pmu/lib.c 
b/tools/testing/selftests/powerpc/pmu/lib.c
index 9768dea..a07104c 100644
--- a/tools/testing/selftests/powerpc/pmu/lib.c
+++ b/tools/testing/selftests/powerpc/pmu/lib.c
@@ -5,15 +5,10 @@
 
 #define _GNU_SOURCE/* For CPU_ZERO etc. */
 
-#include elf.h
 #include errno.h
-#include fcntl.h
-#include link.h
 #include sched.h
 #include setjmp.h
 #include stdlib.h
-#include sys/stat.h
-#include sys/types.h
 #include sys/wait.h
 
 #include utils.h
@@ -256,45 +251,3 @@ out:
return rc;
 }
 
-static char auxv[4096];
-
-void *get_auxv_entry(int type)
-{
-   ElfW(auxv_t) *p;
-   void *result;
-   ssize_t num;
-   int fd;
-
-   fd = open(/proc/self/auxv, O_RDONLY);
-   if (fd == -1) {
-   perror(open);
-   return NULL;
-   }
-
-   result = NULL;
-
-   num = read(fd, auxv, sizeof(auxv));
-   if (num  0) {
-   perror(read);
-   goto out;
-   }
-
-   if (num  sizeof(auxv)) {
-   printf(Overflowed auxv buffer\n);
-   goto out;
-   }
-
-   p = (ElfW(auxv_t) *)auxv;
-
-   while (p-a_type != AT_NULL) {
-   if (p-a_type == type) {
-   result = (void *)p-a_un.a_val;
-   break;
-   }
-
-   p++;
-   }
-out:
-   close(fd);
-   return result;
-}
diff --git a/tools/testing/selftests/powerpc/pmu/lib.h 
b/tools/testing/selftests/powerpc/pmu/lib.h
index 0f0339c..ca5d72a 100644
--- a/tools/testing/selftests/powerpc/pmu/lib.h
+++ b/tools/testing/selftests/powerpc/pmu/lib.h
@@ -29,7 +29,6 @@ extern int notify_parent(union pipe write_pipe);
 extern int notify_parent_of_error(union pipe write_pipe);
 extern pid_t eat_cpu(int (test_function)(void));
 extern bool require_paranoia_below(int level);
-extern void *get_auxv_entry(int type);
 
 struct addr_range {
uint64_t first, last;
diff --git a/tools/testing/selftests/powerpc/utils.h 
b/tools/testing/selftests/powerpc/utils.h
index a93777a..64f53cd 100644
--- a/tools/testing/selftests/powerpc/utils.h
+++ b/tools/testing/selftests/powerpc/utils.h
@@ -19,7 +19,7 @@ typedef uint8_t u8;
 
 
 int test_harness(int (test_function)(void), char *name);
-
+extern void *get_auxv_entry(int type);
 
 /* Yes, this is evil */
 #define FAIL_IF(x) \
-- 
1.7.10.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v3 0/4] powerpc/tm: Abort syscalls in active transactions

2015-04-06 Thread Sam Bobroff

See the first patch for a description of the reasoning behind this
change.

This set includes the change, a kernel selftest for it and
some slight refactoring of the selftest code.


v3:
Patch 1/4: powerpc/tm: Abort syscalls in active transactions

Use TABORT() macro to allow building on versions of gcc that don't support
the tabort. instruction.


v2:
Patch 1/4: powerpc/tm: Abort syscalls in active transactions

Also update the failure code table.

Patch 3/4: selftests/powerpc: Add transactional syscall test

Further testing has shown that the success or failure of the transactions was
affected by minor changes to the code, compiler optimisation and linker
settings.

To address this, I've moved the transactional part of the test to a separate
function, written in assembly. I've also extended the test to as many
transactions as it can fit into ten seconds, to better test for failures that
occur more rarely. This has stabilised the results, and it's no longer
necessary to use special compiler or linker flags.

Patch 4/4: powerpc/tm: Correct minor documentation typos

Discovered some typos while updating the documentation.


Sam Bobroff (4):
  powerpc/tm: Abort syscalls in active transactions
  selftests/powerpc: Move get_auxv_entry() to harness.c
  selftests/powerpc: Add transactional syscall test
  powerpc/tm: Correct minor documentation typos

 Documentation/powerpc/transactional_memory.txt |   36 +++
 arch/powerpc/include/uapi/asm/tm.h |2 +-
 arch/powerpc/kernel/entry_64.S |   19 
 tools/testing/selftests/powerpc/harness.c  |   47 +
 tools/testing/selftests/powerpc/pmu/lib.c  |   47 -
 tools/testing/selftests/powerpc/pmu/lib.h  |1 -
 tools/testing/selftests/powerpc/tm/.gitignore  |1 +
 tools/testing/selftests/powerpc/tm/Makefile|4 +-
 .../testing/selftests/powerpc/tm/tm-syscall-asm.S  |   27 +
 .../testing/selftests/powerpc/tm/tm-syscall-asm.h  |2 +
 tools/testing/selftests/powerpc/tm/tm-syscall.c|  109 
 tools/testing/selftests/powerpc/utils.h|2 +-
 12 files changed, 228 insertions(+), 69 deletions(-)
 create mode 100644 tools/testing/selftests/powerpc/tm/tm-syscall-asm.S
 create mode 100644 tools/testing/selftests/powerpc/tm/tm-syscall-asm.h
 create mode 100644 tools/testing/selftests/powerpc/tm/tm-syscall.c

-- 
1.7.10.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v3 1/4] powerpc/tm: Abort syscalls in active transactions

2015-04-06 Thread Sam Bobroff
This patch changes the syscall handler to doom (tabort) active
transactions when a syscall is made and return immediately without
performing the syscall.

Currently, the system call instruction automatically suspends an
active transaction which causes side effects to persist when an active
transaction fails.

This does change the kernel's behaviour, but in a way that was
documented as unsupported. It doesn't reduce functionality because
syscalls will still be performed after tsuspend. It also provides a
consistent interface and makes the behaviour of user code
substantially the same across powerpc and platforms that do not
support suspended transactions (e.g. x86 and s390).

Performance measurements using
http://ozlabs.org/~anton/junkcode/null_syscall.c
indicate the cost of a system call increases by about 0.5%.

Signed-off-by: Sam Bobroff sam.bobr...@au1.ibm.com
Acked-By: Michael Neuling mi...@neuling.org
---
v3:

Use TABORT() macro to allow building on versions of gcc that don't support
the tabort. instruction.

v2:

Also update the failure code table.

 Documentation/powerpc/transactional_memory.txt |   32 
 arch/powerpc/include/uapi/asm/tm.h |2 +-
 arch/powerpc/kernel/entry_64.S |   19 ++
 3 files changed, 36 insertions(+), 17 deletions(-)

diff --git a/Documentation/powerpc/transactional_memory.txt 
b/Documentation/powerpc/transactional_memory.txt
index 9791e98..98b39af 100644
--- a/Documentation/powerpc/transactional_memory.txt
+++ b/Documentation/powerpc/transactional_memory.txt
@@ -74,22 +74,23 @@ Causes of transaction aborts
 Syscalls
 
 
-Performing syscalls from within transaction is not recommended, and can lead
-to unpredictable results.
+Syscalls made from within an active transaction will not be performed and the
+transaction will be doomed by the kernel with the failure code TM_CAUSE_SYSCALL
+| TM_CAUSE_PERSISTENT.
 
-Syscalls do not by design abort transactions, but beware: The kernel code will
-not be running in transactional state.  The effect of syscalls will always
-remain visible, but depending on the call they may abort your transaction as a
-side-effect, read soon-to-be-aborted transactional data that should not remain
-invisible, etc.  If you constantly retry a transaction that constantly aborts
-itself by calling a syscall, you'll have a livelock  make no progress.
+Syscalls made from within a suspended transaction are performed as normal and
+the transaction is not explicitly doomed by the kernel.  However, what the
+kernel does to perform the syscall may result in the transaction being doomed
+by the hardware.  The syscall is performed in suspended mode so any side
+effects will be persistent, independent of transaction success or failure.  No
+guarantees are provided by the kernel about which syscalls will affect
+transaction success.
 
-Simple syscalls (e.g. sigprocmask()) could be OK.  Even things like write()
-from, say, printf() should be OK as long as the kernel does not access any
-memory that was accessed transactionally.
-
-Consider any syscalls that happen to work as debug-only -- not recommended for
-production use.  Best to queue them up till after the transaction is over.
+Care must be taken when relying on syscalls to abort during active transactions
+if the calls are made via a library.  Libraries may cache values (which may
+give the appearance of success) or perform operations that cause transaction
+failure before entering the kernel (which may produce different failure codes).
+Examples are glibc's getpid() and lazy symbol resolution.
 
 
 Signals
@@ -176,8 +177,7 @@ kernel aborted a transaction:
  TM_CAUSE_RESCHED   Thread was rescheduled.
  TM_CAUSE_TLBI  Software TLB invalide.
  TM_CAUSE_FAC_UNAV  FP/VEC/VSX unavailable trap.
- TM_CAUSE_SYSCALL   Currently unused; future syscalls that must abort
-transactions for consistency will use this.
+ TM_CAUSE_SYSCALL   Syscall from active transaction.
  TM_CAUSE_SIGNALSignal delivered.
  TM_CAUSE_MISC  Currently unused.
  TM_CAUSE_ALIGNMENT Alignment fault.
diff --git a/arch/powerpc/include/uapi/asm/tm.h 
b/arch/powerpc/include/uapi/asm/tm.h
index 5d836b7..5047659 100644
--- a/arch/powerpc/include/uapi/asm/tm.h
+++ b/arch/powerpc/include/uapi/asm/tm.h
@@ -11,7 +11,7 @@
 #define TM_CAUSE_RESCHED   0xde
 #define TM_CAUSE_TLBI  0xdc
 #define TM_CAUSE_FAC_UNAV  0xda
-#define TM_CAUSE_SYSCALL   0xd8  /* future use */
+#define TM_CAUSE_SYSCALL   0xd8
 #define TM_CAUSE_MISC  0xd6  /* future use */
 #define TM_CAUSE_SIGNAL0xd4
 #define TM_CAUSE_ALIGNMENT 0xd2
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index d180caf2..6374af8 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -34,6 +34,7 @@
 #include asm/ftrace.h
 #include asm/hw_irq.h
 #include asm

[PATCH v3 4/4] powerpc/tm: Correct minor documentation typos

2015-04-06 Thread Sam Bobroff
Signed-off-by: Sam Bobroff sam.bobr...@au1.ibm.com
---
v2:

Discovered some typos while updating the documentation.

 Documentation/powerpc/transactional_memory.txt |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/powerpc/transactional_memory.txt 
b/Documentation/powerpc/transactional_memory.txt
index 98b39af..ba0a2a4 100644
--- a/Documentation/powerpc/transactional_memory.txt
+++ b/Documentation/powerpc/transactional_memory.txt
@@ -175,7 +175,7 @@ These are defined in asm/reg.h, and distinguish different 
reasons why the
 kernel aborted a transaction:
 
  TM_CAUSE_RESCHED   Thread was rescheduled.
- TM_CAUSE_TLBI  Software TLB invalide.
+ TM_CAUSE_TLBI  Software TLB invalid.
  TM_CAUSE_FAC_UNAV  FP/VEC/VSX unavailable trap.
  TM_CAUSE_SYSCALL   Syscall from active transaction.
  TM_CAUSE_SIGNALSignal delivered.
@@ -185,7 +185,7 @@ kernel aborted a transaction:
 
 These can be checked by the user program's abort handler as TEXASR[0:7].  If
 bit 7 is set, it indicates that the error is consider persistent.  For example
-a TM_CAUSE_ALIGNMENT will be persistent while a TM_CAUSE_RESCHED will not.q
+a TM_CAUSE_ALIGNMENT will be persistent while a TM_CAUSE_RESCHED will not.
 
 GDB
 ===
-- 
1.7.10.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v3 3/4] selftests/powerpc: Add transactional syscall test

2015-04-06 Thread Sam Bobroff
Check that a syscall made during an active transaction will fail with
the correct failure code and that one made during a suspended
transaction will succeed.

Signed-off-by: Sam Bobroff sam.bobr...@au1.ibm.com
---
v2:

Further testing has shown that the success or failure of the transactions was
affected by minor changes to the code, compiler optimisation and linker
settings.

To address this, I've moved the transactional part of the test to a separate
function, written in assembly. I've also extended the test to as many
transactions as it can fit into ten seconds, to better test for failures that
occur more rarely. This has stabilised the results, and it's no longer
necessary to use special compiler or linker flags.

 tools/testing/selftests/powerpc/tm/.gitignore  |1 +
 tools/testing/selftests/powerpc/tm/Makefile|4 +-
 .../testing/selftests/powerpc/tm/tm-syscall-asm.S  |   27 +
 .../testing/selftests/powerpc/tm/tm-syscall-asm.h  |2 +
 tools/testing/selftests/powerpc/tm/tm-syscall.c|  109 
 5 files changed, 142 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/powerpc/tm/tm-syscall-asm.S
 create mode 100644 tools/testing/selftests/powerpc/tm/tm-syscall-asm.h
 create mode 100644 tools/testing/selftests/powerpc/tm/tm-syscall.c

diff --git a/tools/testing/selftests/powerpc/tm/.gitignore 
b/tools/testing/selftests/powerpc/tm/.gitignore
index 33d02cc..2699635d 100644
--- a/tools/testing/selftests/powerpc/tm/.gitignore
+++ b/tools/testing/selftests/powerpc/tm/.gitignore
@@ -1 +1,2 @@
 tm-resched-dscr
+tm-syscall
diff --git a/tools/testing/selftests/powerpc/tm/Makefile 
b/tools/testing/selftests/powerpc/tm/Makefile
index 2cede23..93bbff3 100644
--- a/tools/testing/selftests/powerpc/tm/Makefile
+++ b/tools/testing/selftests/powerpc/tm/Makefile
@@ -1,8 +1,10 @@
-PROGS := tm-resched-dscr
+PROGS := tm-resched-dscr tm-syscall
+CFLAGS:=$(CFLAGS) -mhtm
 
 all: $(PROGS)
 
 $(PROGS): ../harness.c
+tm-syscall: tm-syscall-asm.S
 
 run_tests: all
@-for PROG in $(PROGS); do \
diff --git a/tools/testing/selftests/powerpc/tm/tm-syscall-asm.S 
b/tools/testing/selftests/powerpc/tm/tm-syscall-asm.S
new file mode 100644
index 000..2b2daa7
--- /dev/null
+++ b/tools/testing/selftests/powerpc/tm/tm-syscall-asm.S
@@ -0,0 +1,27 @@
+#include ppc-asm.h
+#include asm/unistd.h
+
+   .text
+FUNC_START(getppid_tm_active_impl)
+   tbegin.
+   beq 1f
+   li  r0, __NR_getppid
+   sc
+   tend.
+   blr
+1:
+   li  r3, -1
+   blr
+
+FUNC_START(getppid_tm_suspended_impl)
+   tbegin.
+   beq 1f
+   li  r0, __NR_getppid
+   tsuspend.
+   sc
+   tresume.
+   tend.
+   blr
+1:
+   li  r3, -1
+   blr
diff --git a/tools/testing/selftests/powerpc/tm/tm-syscall-asm.h 
b/tools/testing/selftests/powerpc/tm/tm-syscall-asm.h
new file mode 100644
index 000..6136328
--- /dev/null
+++ b/tools/testing/selftests/powerpc/tm/tm-syscall-asm.h
@@ -0,0 +1,2 @@
+extern int getppid_tm_active_impl(void);
+extern int getppid_tm_suspended_impl(void);
diff --git a/tools/testing/selftests/powerpc/tm/tm-syscall.c 
b/tools/testing/selftests/powerpc/tm/tm-syscall.c
new file mode 100644
index 000..ff3b15c
--- /dev/null
+++ b/tools/testing/selftests/powerpc/tm/tm-syscall.c
@@ -0,0 +1,109 @@
+/* Test the kernel's system call code to ensure that a system call
+ * made from within an active HTM transaction is aborted with the
+ * correct failure code.
+ * Conversely, ensure that a system call made from within a
+ * suspended transaction can succeed.
+ */
+
+#include stdio.h
+#include unistd.h
+#include sys/syscall.h
+#include asm/tm.h
+#include asm/cputable.h
+#include linux/auxvec.h
+#include sys/time.h
+#include stdlib.h
+
+#include utils.h
+#include tm-syscall-asm.h
+
+unsigned retries = 0;
+
+#define TEST_DURATION 10 /* seconds */
+#define TM_RETRIES 100
+
+long failure_code(void)
+{
+   return __builtin_get_texasr()  56;
+}
+
+bool failure_is_persistent(void)
+{
+   return (failure_code()  TM_CAUSE_PERSISTENT) == TM_CAUSE_PERSISTENT;
+}
+
+bool failure_is_syscall(void)
+{
+   return (failure_code()  TM_CAUSE_SYSCALL) == TM_CAUSE_SYSCALL;
+}
+
+pid_t getppid_tm(bool suspend)
+{
+   int i;
+   pid_t pid;
+
+   for (i = 0; i  TM_RETRIES; i++) {
+   if (suspend)
+   pid = getppid_tm_suspended_impl();
+   else
+   pid = getppid_tm_active_impl();
+   if (pid = 0)
+   return pid;
+   if (failure_is_persistent()) {
+   if (failure_is_syscall())
+   return -1;
+   printf(Unexpected persistent transaction failure.\n);
+   printf(TEXASR 0x%016lx, TFIAR 0x%016lx.\n,
+  __builtin_get_texasr(), __builtin_get_tfiar());
+   exit(-1

[PATCH 1/1] KVM: PPC: Book3S: correct width in XER handling

2015-05-19 Thread Sam Bobroff
In 64 bit kernels, the Fixed Point Exception Register (XER) is a 64
bit field (e.g. in kvm_regs and kvm_vcpu_arch) and in most places it is
accessed as such.

This patch corrects places where it is accessed as a 32 bit field by a
64 bit kernel.  In some cases this is via a 32 bit load or store
instruction which, depending on endianness, will cause either the
lower or upper 32 bits to be missed.  In another case it is cast as a
u32, causing the upper 32 bits to be cleared.

This patch corrects those places by extending the access methods to
64 bits.

Signed-off-by: Sam Bobroff sam.bobr...@au1.ibm.com
---

 arch/powerpc/include/asm/kvm_book3s.h   |4 ++--
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |6 +++---
 arch/powerpc/kvm/book3s_segment.S   |4 ++--
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
b/arch/powerpc/include/asm/kvm_book3s.h
index b91e74a..05a875a 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -225,12 +225,12 @@ static inline u32 kvmppc_get_cr(struct kvm_vcpu *vcpu)
return vcpu-arch.cr;
 }
 
-static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, u32 val)
+static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, ulong val)
 {
vcpu-arch.xer = val;
 }
 
-static inline u32 kvmppc_get_xer(struct kvm_vcpu *vcpu)
+static inline ulong kvmppc_get_xer(struct kvm_vcpu *vcpu)
 {
return vcpu-arch.xer;
 }
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 4d70df2..d75be59 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -870,7 +870,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
blt hdec_soon
 
ld  r6, VCPU_CTR(r4)
-   lwz r7, VCPU_XER(r4)
+   ld  r7, VCPU_XER(r4)
 
mtctr   r6
mtxer   r7
@@ -1103,7 +1103,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
mfctr   r3
mfxer   r4
std r3, VCPU_CTR(r9)
-   stw r4, VCPU_XER(r9)
+   std r4, VCPU_XER(r9)
 
/* If this is a page table miss then see if it's theirs or ours */
cmpwi   r12, BOOK3S_INTERRUPT_H_DATA_STORAGE
@@ -1675,7 +1675,7 @@ kvmppc_hdsi:
bl  kvmppc_msr_interrupt
 fast_interrupt_c_return:
 6: ld  r7, VCPU_CTR(r9)
-   lwz r8, VCPU_XER(r9)
+   ld  r8, VCPU_XER(r9)
mtctr   r7
mtxer   r8
mr  r4, r9
diff --git a/arch/powerpc/kvm/book3s_segment.S 
b/arch/powerpc/kvm/book3s_segment.S
index acee37c..ca8f174 100644
--- a/arch/powerpc/kvm/book3s_segment.S
+++ b/arch/powerpc/kvm/book3s_segment.S
@@ -123,7 +123,7 @@ no_dcbz32_on:
PPC_LL  r8, SVCPU_CTR(r3)
PPC_LL  r9, SVCPU_LR(r3)
lwz r10, SVCPU_CR(r3)
-   lwz r11, SVCPU_XER(r3)
+   PPC_LL  r11, SVCPU_XER(r3)
 
mtctr   r8
mtlrr9
@@ -237,7 +237,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
mfctr   r8
mflrr9
 
-   stw r5, SVCPU_XER(r13)
+   PPC_STL r5, SVCPU_XER(r13)
PPC_STL r6, SVCPU_FAULT_DAR(r13)
stw r7, SVCPU_FAULT_DSISR(r13)
PPC_STL r8, SVCPU_CTR(r13)
-- 
1.7.10.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 1/1] powerpc/tm: Abort syscalls in active transactions (v2)

2015-06-11 Thread Sam Bobroff
This patch changes the syscall handler to doom (tabort) active
transactions when a syscall is made and return very early without
performing the syscall and keeping side effects to a minimum (no CPU
accounting or system call tracing is performed). Also included is a
new HWCAP2 bit, PPC_FEATURE2_HTM_NOSC, to indicate this
behaviour to userspace.

Currently, the system call instruction automatically suspends an
active transaction which causes side effects to persist when an active
transaction fails.

This does change the kernel's behaviour, but in a way that was
documented as unsupported.  It doesn't reduce functionality as
syscalls will still be performed after tsuspend; it just requires that
the transaction be explicitly suspended.  It also provides a
consistent interface and makes the behaviour of user code
substantially the same across powerpc and platforms that do not
support suspended transactions (e.g. x86 and s390).

Performance measurements using
http://ozlabs.org/~anton/junkcode/null_syscall.c indicate the cost of
a normal (non-aborted) system call increases by about 0.25%.

Signed-off-by: Sam Bobroff sam.bobr...@au1.ibm.com
---

 Documentation/powerpc/transactional_memory.txt  | 32 -
 arch/powerpc/include/asm/cputable.h | 10 
 arch/powerpc/include/uapi/asm/cputable.h|  1 +
 arch/powerpc/include/uapi/asm/tm.h  |  2 +-
 arch/powerpc/kernel/cputable.c  |  4 +++-
 arch/powerpc/kernel/entry_64.S  | 28 ++
 tools/testing/selftests/powerpc/tm/tm-syscall.c |  3 ++-
 7 files changed, 57 insertions(+), 23 deletions(-)

diff --git a/Documentation/powerpc/transactional_memory.txt 
b/Documentation/powerpc/transactional_memory.txt
index ded6979..ba0a2a4 100644
--- a/Documentation/powerpc/transactional_memory.txt
+++ b/Documentation/powerpc/transactional_memory.txt
@@ -74,22 +74,23 @@ Causes of transaction aborts
 Syscalls
 
 
-Performing syscalls from within transaction is not recommended, and can lead
-to unpredictable results.
+Syscalls made from within an active transaction will not be performed and the
+transaction will be doomed by the kernel with the failure code TM_CAUSE_SYSCALL
+| TM_CAUSE_PERSISTENT.
 
-Syscalls do not by design abort transactions, but beware: The kernel code will
-not be running in transactional state.  The effect of syscalls will always
-remain visible, but depending on the call they may abort your transaction as a
-side-effect, read soon-to-be-aborted transactional data that should not remain
-invisible, etc.  If you constantly retry a transaction that constantly aborts
-itself by calling a syscall, you'll have a livelock  make no progress.
+Syscalls made from within a suspended transaction are performed as normal and
+the transaction is not explicitly doomed by the kernel.  However, what the
+kernel does to perform the syscall may result in the transaction being doomed
+by the hardware.  The syscall is performed in suspended mode so any side
+effects will be persistent, independent of transaction success or failure.  No
+guarantees are provided by the kernel about which syscalls will affect
+transaction success.
 
-Simple syscalls (e.g. sigprocmask()) could be OK.  Even things like write()
-from, say, printf() should be OK as long as the kernel does not access any
-memory that was accessed transactionally.
-
-Consider any syscalls that happen to work as debug-only -- not recommended for
-production use.  Best to queue them up till after the transaction is over.
+Care must be taken when relying on syscalls to abort during active transactions
+if the calls are made via a library.  Libraries may cache values (which may
+give the appearance of success) or perform operations that cause transaction
+failure before entering the kernel (which may produce different failure codes).
+Examples are glibc's getpid() and lazy symbol resolution.
 
 
 Signals
@@ -176,8 +177,7 @@ kernel aborted a transaction:
  TM_CAUSE_RESCHED   Thread was rescheduled.
  TM_CAUSE_TLBI  Software TLB invalid.
  TM_CAUSE_FAC_UNAV  FP/VEC/VSX unavailable trap.
- TM_CAUSE_SYSCALL   Currently unused; future syscalls that must abort
-transactions for consistency will use this.
+ TM_CAUSE_SYSCALL   Syscall from active transaction.
  TM_CAUSE_SIGNALSignal delivered.
  TM_CAUSE_MISC  Currently unused.
  TM_CAUSE_ALIGNMENT Alignment fault.
diff --git a/arch/powerpc/include/asm/cputable.h 
b/arch/powerpc/include/asm/cputable.h
index 6367b83..4994648 100644
--- a/arch/powerpc/include/asm/cputable.h
+++ b/arch/powerpc/include/asm/cputable.h
@@ -242,11 +242,13 @@ enum {
 
 /* We only set the TM feature if the kernel was compiled with TM supprt */
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
-#define CPU_FTR_TM_COMPCPU_FTR_TM
-#define PPC_FEATURE2_HTM_COMP  PPC_FEATURE2_HTM
+#define CPU_FTR_TM_COMPCPU_FTR_TM

[PATCH v3 1/1] KVM: PPC: Book3S: correct width in XER handling

2015-05-26 Thread Sam Bobroff
In 64 bit kernels, the Fixed Point Exception Register (XER) is a 64
bit field (e.g. in kvm_regs and kvm_vcpu_arch) and in most places it is
accessed as such.

This patch corrects places where it is accessed as a 32 bit field by a
64 bit kernel.  In some cases this is via a 32 bit load or store
instruction which, depending on endianness, will cause either the
lower or upper 32 bits to be missed.  In another case it is cast as a
u32, causing the upper 32 bits to be cleared.

This patch corrects those places by extending the access methods to
64 bits.

Signed-off-by: Sam Bobroff sam.bobr...@au1.ibm.com
---

v3:
Adjust booke set/get xer to match book3s.

v2:

Also extend kvmppc_book3s_shadow_vcpu.xer to 64 bit.

 arch/powerpc/include/asm/kvm_book3s.h |4 ++--
 arch/powerpc/include/asm/kvm_book3s_asm.h |2 +-
 arch/powerpc/include/asm/kvm_booke.h  |4 ++--
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   |6 +++---
 arch/powerpc/kvm/book3s_segment.S |4 ++--
 5 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
b/arch/powerpc/include/asm/kvm_book3s.h
index b91e74a..05a875a 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -225,12 +225,12 @@ static inline u32 kvmppc_get_cr(struct kvm_vcpu *vcpu)
return vcpu-arch.cr;
 }
 
-static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, u32 val)
+static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, ulong val)
 {
vcpu-arch.xer = val;
 }
 
-static inline u32 kvmppc_get_xer(struct kvm_vcpu *vcpu)
+static inline ulong kvmppc_get_xer(struct kvm_vcpu *vcpu)
 {
return vcpu-arch.xer;
 }
diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h 
b/arch/powerpc/include/asm/kvm_book3s_asm.h
index 5bdfb5d..c4ccd2d 100644
--- a/arch/powerpc/include/asm/kvm_book3s_asm.h
+++ b/arch/powerpc/include/asm/kvm_book3s_asm.h
@@ -112,7 +112,7 @@ struct kvmppc_book3s_shadow_vcpu {
bool in_use;
ulong gpr[14];
u32 cr;
-   u32 xer;
+   ulong xer;
ulong ctr;
ulong lr;
ulong pc;
diff --git a/arch/powerpc/include/asm/kvm_booke.h 
b/arch/powerpc/include/asm/kvm_booke.h
index 3286f0d..bc6e29e 100644
--- a/arch/powerpc/include/asm/kvm_booke.h
+++ b/arch/powerpc/include/asm/kvm_booke.h
@@ -54,12 +54,12 @@ static inline u32 kvmppc_get_cr(struct kvm_vcpu *vcpu)
return vcpu-arch.cr;
 }
 
-static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, u32 val)
+static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, ulong val)
 {
vcpu-arch.xer = val;
 }
 
-static inline u32 kvmppc_get_xer(struct kvm_vcpu *vcpu)
+static inline ulong kvmppc_get_xer(struct kvm_vcpu *vcpu)
 {
return vcpu-arch.xer;
 }
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 4d70df2..d75be59 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -870,7 +870,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
blt hdec_soon
 
ld  r6, VCPU_CTR(r4)
-   lwz r7, VCPU_XER(r4)
+   ld  r7, VCPU_XER(r4)
 
mtctr   r6
mtxer   r7
@@ -1103,7 +1103,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
mfctr   r3
mfxer   r4
std r3, VCPU_CTR(r9)
-   stw r4, VCPU_XER(r9)
+   std r4, VCPU_XER(r9)
 
/* If this is a page table miss then see if it's theirs or ours */
cmpwi   r12, BOOK3S_INTERRUPT_H_DATA_STORAGE
@@ -1675,7 +1675,7 @@ kvmppc_hdsi:
bl  kvmppc_msr_interrupt
 fast_interrupt_c_return:
 6: ld  r7, VCPU_CTR(r9)
-   lwz r8, VCPU_XER(r9)
+   ld  r8, VCPU_XER(r9)
mtctr   r7
mtxer   r8
mr  r4, r9
diff --git a/arch/powerpc/kvm/book3s_segment.S 
b/arch/powerpc/kvm/book3s_segment.S
index acee37c..ca8f174 100644
--- a/arch/powerpc/kvm/book3s_segment.S
+++ b/arch/powerpc/kvm/book3s_segment.S
@@ -123,7 +123,7 @@ no_dcbz32_on:
PPC_LL  r8, SVCPU_CTR(r3)
PPC_LL  r9, SVCPU_LR(r3)
lwz r10, SVCPU_CR(r3)
-   lwz r11, SVCPU_XER(r3)
+   PPC_LL  r11, SVCPU_XER(r3)
 
mtctr   r8
mtlrr9
@@ -237,7 +237,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
mfctr   r8
mflrr9
 
-   stw r5, SVCPU_XER(r13)
+   PPC_STL r5, SVCPU_XER(r13)
PPC_STL r6, SVCPU_FAULT_DAR(r13)
stw r7, SVCPU_FAULT_DSISR(r13)
PPC_STL r8, SVCPU_CTR(r13)
-- 
1.7.10.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v2 1/1] KVM: PPC: Book3S: correct width in XER handling

2015-05-26 Thread Sam Bobroff
On Tue, May 26, 2015 at 10:35:08AM +0200, Alexander Graf wrote:
 
 
 On 26.05.15 02:27, Sam Bobroff wrote:
  In 64 bit kernels, the Fixed Point Exception Register (XER) is a 64
  bit field (e.g. in kvm_regs and kvm_vcpu_arch) and in most places it is
  accessed as such.
  
  This patch corrects places where it is accessed as a 32 bit field by a
  64 bit kernel.  In some cases this is via a 32 bit load or store
  instruction which, depending on endianness, will cause either the
  lower or upper 32 bits to be missed.  In another case it is cast as a
  u32, causing the upper 32 bits to be cleared.
  
  This patch corrects those places by extending the access methods to
  64 bits.
  
  Signed-off-by: Sam Bobroff sam.bobr...@au1.ibm.com
  ---
  
  v2:
  
  Also extend kvmppc_book3s_shadow_vcpu.xer to 64 bit.
  
   arch/powerpc/include/asm/kvm_book3s.h |4 ++--
   arch/powerpc/include/asm/kvm_book3s_asm.h |2 +-
   arch/powerpc/kvm/book3s_hv_rmhandlers.S   |6 +++---
   arch/powerpc/kvm/book3s_segment.S |4 ++--
   4 files changed, 8 insertions(+), 8 deletions(-)
  
  diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
  b/arch/powerpc/include/asm/kvm_book3s.h
  index b91e74a..05a875a 100644
  --- a/arch/powerpc/include/asm/kvm_book3s.h
  +++ b/arch/powerpc/include/asm/kvm_book3s.h
  @@ -225,12 +225,12 @@ static inline u32 kvmppc_get_cr(struct kvm_vcpu *vcpu)
  return vcpu-arch.cr;
   }
   
  -static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, u32 val)
  +static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, ulong val)
 
 Now we have book3s and booke files with different prototypes on the same
 inline function names. That's really ugly. Please keep them in sync ;).

OK will do.

 
 Alex
 
   {
  vcpu-arch.xer = val;
   }
   
  -static inline u32 kvmppc_get_xer(struct kvm_vcpu *vcpu)
  +static inline ulong kvmppc_get_xer(struct kvm_vcpu *vcpu)
   {
  return vcpu-arch.xer;
   }
  diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h 
  b/arch/powerpc/include/asm/kvm_book3s_asm.h
  index 5bdfb5d..c4ccd2d 100644
  --- a/arch/powerpc/include/asm/kvm_book3s_asm.h
  +++ b/arch/powerpc/include/asm/kvm_book3s_asm.h
  @@ -112,7 +112,7 @@ struct kvmppc_book3s_shadow_vcpu {
  bool in_use;
  ulong gpr[14];
  u32 cr;
  -   u32 xer;
  +   ulong xer;
  ulong ctr;
  ulong lr;
  ulong pc;
  diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
  b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
  index 4d70df2..d75be59 100644
  --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
  +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
  @@ -870,7 +870,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
  blt hdec_soon
   
  ld  r6, VCPU_CTR(r4)
  -   lwz r7, VCPU_XER(r4)
  +   ld  r7, VCPU_XER(r4)
   
  mtctr   r6
  mtxer   r7
  @@ -1103,7 +1103,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
  mfctr   r3
  mfxer   r4
  std r3, VCPU_CTR(r9)
  -   stw r4, VCPU_XER(r9)
  +   std r4, VCPU_XER(r9)
   
  /* If this is a page table miss then see if it's theirs or ours */
  cmpwi   r12, BOOK3S_INTERRUPT_H_DATA_STORAGE
  @@ -1675,7 +1675,7 @@ kvmppc_hdsi:
  bl  kvmppc_msr_interrupt
   fast_interrupt_c_return:
   6: ld  r7, VCPU_CTR(r9)
  -   lwz r8, VCPU_XER(r9)
  +   ld  r8, VCPU_XER(r9)
  mtctr   r7
  mtxer   r8
  mr  r4, r9
  diff --git a/arch/powerpc/kvm/book3s_segment.S 
  b/arch/powerpc/kvm/book3s_segment.S
  index acee37c..ca8f174 100644
  --- a/arch/powerpc/kvm/book3s_segment.S
  +++ b/arch/powerpc/kvm/book3s_segment.S
  @@ -123,7 +123,7 @@ no_dcbz32_on:
  PPC_LL  r8, SVCPU_CTR(r3)
  PPC_LL  r9, SVCPU_LR(r3)
  lwz r10, SVCPU_CR(r3)
  -   lwz r11, SVCPU_XER(r3)
  +   PPC_LL  r11, SVCPU_XER(r3)
   
  mtctr   r8
  mtlrr9
  @@ -237,7 +237,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
  mfctr   r8
  mflrr9
   
  -   stw r5, SVCPU_XER(r13)
  +   PPC_STL r5, SVCPU_XER(r13)
  PPC_STL r6, SVCPU_FAULT_DAR(r13)
  stw r7, SVCPU_FAULT_DSISR(r13)
  PPC_STL r8, SVCPU_CTR(r13)
  

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 1/1] KVM: PPC: Book3S: correct width in XER handling

2015-05-25 Thread Sam Bobroff
On Mon, May 25, 2015 at 11:08:08PM +0200, Alexander Graf wrote:
 
 
 On 20.05.15 07:26, Sam Bobroff wrote:
  In 64 bit kernels, the Fixed Point Exception Register (XER) is a 64
  bit field (e.g. in kvm_regs and kvm_vcpu_arch) and in most places it is
  accessed as such.
  
  This patch corrects places where it is accessed as a 32 bit field by a
  64 bit kernel.  In some cases this is via a 32 bit load or store
  instruction which, depending on endianness, will cause either the
  lower or upper 32 bits to be missed.  In another case it is cast as a
  u32, causing the upper 32 bits to be cleared.
  
  This patch corrects those places by extending the access methods to
  64 bits.
  
  Signed-off-by: Sam Bobroff sam.bobr...@au1.ibm.com
  ---
  
   arch/powerpc/include/asm/kvm_book3s.h   |4 ++--
   arch/powerpc/kvm/book3s_hv_rmhandlers.S |6 +++---
   arch/powerpc/kvm/book3s_segment.S   |4 ++--
   3 files changed, 7 insertions(+), 7 deletions(-)
  
  diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
  b/arch/powerpc/include/asm/kvm_book3s.h
  index b91e74a..05a875a 100644
  --- a/arch/powerpc/include/asm/kvm_book3s.h
  +++ b/arch/powerpc/include/asm/kvm_book3s.h
  @@ -225,12 +225,12 @@ static inline u32 kvmppc_get_cr(struct kvm_vcpu *vcpu)
  return vcpu-arch.cr;
   }
   
  -static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, u32 val)
  +static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, ulong val)
   {
  vcpu-arch.xer = val;
   }
   
  -static inline u32 kvmppc_get_xer(struct kvm_vcpu *vcpu)
  +static inline ulong kvmppc_get_xer(struct kvm_vcpu *vcpu)
   {
  return vcpu-arch.xer;
   }
  diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
  b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
  index 4d70df2..d75be59 100644
  --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
  +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
  @@ -870,7 +870,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
  blt hdec_soon
   
  ld  r6, VCPU_CTR(r4)
  -   lwz r7, VCPU_XER(r4)
  +   ld  r7, VCPU_XER(r4)
   
  mtctr   r6
  mtxer   r7
  @@ -1103,7 +1103,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
  mfctr   r3
  mfxer   r4
  std r3, VCPU_CTR(r9)
  -   stw r4, VCPU_XER(r9)
  +   std r4, VCPU_XER(r9)
   
  /* If this is a page table miss then see if it's theirs or ours */
  cmpwi   r12, BOOK3S_INTERRUPT_H_DATA_STORAGE
  @@ -1675,7 +1675,7 @@ kvmppc_hdsi:
  bl  kvmppc_msr_interrupt
   fast_interrupt_c_return:
   6: ld  r7, VCPU_CTR(r9)
  -   lwz r8, VCPU_XER(r9)
  +   ld  r8, VCPU_XER(r9)
  mtctr   r7
  mtxer   r8
  mr  r4, r9
  diff --git a/arch/powerpc/kvm/book3s_segment.S 
  b/arch/powerpc/kvm/book3s_segment.S
  index acee37c..ca8f174 100644
  --- a/arch/powerpc/kvm/book3s_segment.S
  +++ b/arch/powerpc/kvm/book3s_segment.S
  @@ -123,7 +123,7 @@ no_dcbz32_on:
  PPC_LL  r8, SVCPU_CTR(r3)
  PPC_LL  r9, SVCPU_LR(r3)
  lwz r10, SVCPU_CR(r3)
  -   lwz r11, SVCPU_XER(r3)
  +   PPC_LL  r11, SVCPU_XER(r3)
 
 struct kvmppc_book3s_shadow_vcpu {
 bool in_use;
 ulong gpr[14];
 u32 cr;
 u32 xer;
 [...]
 
 so at least this change looks wrong. Please double-check all fields in
 your patch again.
 
 
 Alex

Thanks for the review and the catch!

The xer field in kvm_vcpu_arch is already ulong, so it looks like the one in
kvmppc_book3s_shadow_vcpu is the only other case. I'll fix that and repost.

Cheers,
Sam.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v2 1/1] KVM: PPC: Book3S: correct width in XER handling

2015-05-25 Thread Sam Bobroff
In 64 bit kernels, the Fixed Point Exception Register (XER) is a 64
bit field (e.g. in kvm_regs and kvm_vcpu_arch) and in most places it is
accessed as such.

This patch corrects places where it is accessed as a 32 bit field by a
64 bit kernel.  In some cases this is via a 32 bit load or store
instruction which, depending on endianness, will cause either the
lower or upper 32 bits to be missed.  In another case it is cast as a
u32, causing the upper 32 bits to be cleared.

This patch corrects those places by extending the access methods to
64 bits.

Signed-off-by: Sam Bobroff sam.bobr...@au1.ibm.com
---

v2:

Also extend kvmppc_book3s_shadow_vcpu.xer to 64 bit.

 arch/powerpc/include/asm/kvm_book3s.h |4 ++--
 arch/powerpc/include/asm/kvm_book3s_asm.h |2 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   |6 +++---
 arch/powerpc/kvm/book3s_segment.S |4 ++--
 4 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
b/arch/powerpc/include/asm/kvm_book3s.h
index b91e74a..05a875a 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -225,12 +225,12 @@ static inline u32 kvmppc_get_cr(struct kvm_vcpu *vcpu)
return vcpu-arch.cr;
 }
 
-static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, u32 val)
+static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, ulong val)
 {
vcpu-arch.xer = val;
 }
 
-static inline u32 kvmppc_get_xer(struct kvm_vcpu *vcpu)
+static inline ulong kvmppc_get_xer(struct kvm_vcpu *vcpu)
 {
return vcpu-arch.xer;
 }
diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h 
b/arch/powerpc/include/asm/kvm_book3s_asm.h
index 5bdfb5d..c4ccd2d 100644
--- a/arch/powerpc/include/asm/kvm_book3s_asm.h
+++ b/arch/powerpc/include/asm/kvm_book3s_asm.h
@@ -112,7 +112,7 @@ struct kvmppc_book3s_shadow_vcpu {
bool in_use;
ulong gpr[14];
u32 cr;
-   u32 xer;
+   ulong xer;
ulong ctr;
ulong lr;
ulong pc;
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 4d70df2..d75be59 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -870,7 +870,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
blt hdec_soon
 
ld  r6, VCPU_CTR(r4)
-   lwz r7, VCPU_XER(r4)
+   ld  r7, VCPU_XER(r4)
 
mtctr   r6
mtxer   r7
@@ -1103,7 +1103,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
mfctr   r3
mfxer   r4
std r3, VCPU_CTR(r9)
-   stw r4, VCPU_XER(r9)
+   std r4, VCPU_XER(r9)
 
/* If this is a page table miss then see if it's theirs or ours */
cmpwi   r12, BOOK3S_INTERRUPT_H_DATA_STORAGE
@@ -1675,7 +1675,7 @@ kvmppc_hdsi:
bl  kvmppc_msr_interrupt
 fast_interrupt_c_return:
 6: ld  r7, VCPU_CTR(r9)
-   lwz r8, VCPU_XER(r9)
+   ld  r8, VCPU_XER(r9)
mtctr   r7
mtxer   r8
mr  r4, r9
diff --git a/arch/powerpc/kvm/book3s_segment.S 
b/arch/powerpc/kvm/book3s_segment.S
index acee37c..ca8f174 100644
--- a/arch/powerpc/kvm/book3s_segment.S
+++ b/arch/powerpc/kvm/book3s_segment.S
@@ -123,7 +123,7 @@ no_dcbz32_on:
PPC_LL  r8, SVCPU_CTR(r3)
PPC_LL  r9, SVCPU_LR(r3)
lwz r10, SVCPU_CR(r3)
-   lwz r11, SVCPU_XER(r3)
+   PPC_LL  r11, SVCPU_XER(r3)
 
mtctr   r8
mtlrr9
@@ -237,7 +237,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
mfctr   r8
mflrr9
 
-   stw r5, SVCPU_XER(r13)
+   PPC_STL r5, SVCPU_XER(r13)
PPC_STL r6, SVCPU_FAULT_DAR(r13)
stw r7, SVCPU_FAULT_DSISR(r13)
PPC_STL r8, SVCPU_CTR(r13)
-- 
1.7.10.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [1/1] powerpc/xmon: Paged output for paca display

2015-08-19 Thread Sam Bobroff
On Tue, Aug 18, 2015 at 04:26:32PM +1000, Michael Ellerman wrote:
 On Fri, 2015-14-08 at 02:55:14 UTC, Sam bobroff wrote:
  The paca display is already more than 24 lines, which can be problematic
  if you have an old school 80x24 terminal, or more likely you are on a
  virtual terminal which does not scroll for whatever reason.
  
  This adds an optional letter to the dp and dpa xmon commands
  (dpp and dppa), which will enable a per-page display (with 16
  line pages): the first page  will be displayed and if there was data
  that didn't fit, it will display a message indicating that the user can
  use enter to display the next page. The intent is that this feels
  similar to the way the memory display functions work.
  
  This is implemented by running over the entire output both for the
  initial command and for each subsequent page: the visible part is
  clipped out by checking line numbers. Handling the empty command as
  more is done by writing a special command into a static buffer that
  indicates where to move the sliding visibility window. This is similar
  to the approach used for the memory dump commands except that the
  state data is encoded into the last_cmd string, rather than a set of
  static variables. The memory dump commands could probably be rewritten
  to make use of the same buffer and remove their other static
  variables.
  
  Sample output:
  
  0:mon dpp1
  paca for cpu 0x1 @ cfdc0480:
   possible = yes
   present  = yes
   online   = yes
   lock_token   = 0x8000  (0x8)
   paca_index   = 0x1 (0xa)
   kernel_toc   = 0xc0eb2400  (0x10)
   kernelbase   = 0xc000  (0x18)
   kernel_msr   = 0xb0001032  (0x20)
   emergency_sp = 0xc0003ffe8000  (0x28)
   mc_emergency_sp  = 0xc0003ffe4000  (0x2e0)
   in_mce   = 0x0 (0x2e8)
   data_offset  = 0x7f17  (0x30)
   hw_cpu_id= 0x8 (0x38)
   cpu_start= 0x1 (0x3a)
   kexec_state  = 0x0 (0x3b)
  [Enter for next page]
  0:mon
   __current= 0xc0007e696620  (0x290)
   kstack   = 0xc0007e6ebe30  (0x298)
   stab_rr  = 0xb (0x2a0)
   saved_r1 = 0xc0007ef37860  (0x2a8)
   trap_save= 0x0 (0x2b8)
   soft_enabled = 0x0 (0x2ba)
   irq_happened = 0x1 (0x2bb)
   io_sync  = 0x0 (0x2bc)
   irq_work_pending = 0x0 (0x2bd)
   nap_state_lost   = 0x0 (0x2be)
  0:mon
  
  (Based on a similar patch by Michael Ellerman m...@ellerman.id.au
  [v2] powerpc/xmon: Allow limiting the size of the paca display.
  This patch is an alternative and cannot coexist with the original.)
 
 
 So this is nice, but ... the diff is twice the size of my version, plus 128
 bytes of BSS, so I'm not sure the added benefit is sufficient to justify the
 added code complexity.
 
 But you can convince me otherwise if you feel strongly about it.
 
 cheers

I do think the output is a lot better paged like this :-)

The 128 byte buffer is a lot more than it needs for this particular command; it
could quite comfortably be lowered to about 32 (I was leaving space for other
commands to use it but there aren't any so far). I'll do this and repost.

Also, because the last_cmd_buf system is not specific to the paca display, it
could be used by the other paged commands (like the memory dumps). If we did
this we could (probably) remove ndump, nidump and ncsum which are all longs,
although I haven't worked out how much buffer space would be needed in
last_cmd_buf to support these (they have their own paging code but the
positional information could be stored in the string buffer). It's probably not
much work but might be a bit tricky. Do you think it's worth doing?

Since we're looking at memory usage, it looks like tmpstr[128] could be
removed without much work, saving 128 bytes and removing an unnecessary global
variable. If it actually turns out to be easy to do I'll post a separate patch.

Thanks for the revew,
Sam.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v2 2/2] powerpc/xmon: Paginate kernel log buffer display

2015-08-20 Thread Sam Bobroff
The kernel log buffer is often much longer than the size of a terminal
so paginate it's output.

Signed-off-by: Sam Bobroff sam.bobr...@au1.ibm.com
---
 arch/powerpc/xmon/xmon.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index 9ce9e7d..fdd765e 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -248,9 +248,7 @@ Commands:\n\
   u   dump TLB\n
 #endif
   ?   help\n
-#ifdef CONFIG_PPC64
-  .#  limit output to # lines per page (dump paca only)\n
-#endif
+  .#  limit output to # lines per page (for dp#, dpa, dl)\n
   zr  reboot\n\
   zh   halt\n
 ;
@@ -850,6 +848,7 @@ static void paged_set_size(void)
paged_size = 0;
}
 }
+
 static void paged_reset(void)
 {
paged_cur_page = 0;
@@ -2372,10 +2371,12 @@ dump_log_buf(void)
sync();
 
kmsg_dump_rewind_nolock(dumper);
+   paged_start();
while (kmsg_dump_get_line_nolock(dumper, false, buf, sizeof(buf), 
len)) {
buf[len] = '\0';
-   printf(%s, buf);
+   XMON_PRINTF(%s, buf);
}
+   paged_end(dl\n);
 
sync();
/* wait a little while to see if we get a machine check */
-- 
2.1.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v2 0/2] powerpc/xmon: Paged output for paca display

2015-08-20 Thread Sam Bobroff



Changes v1 - v2:

* Removed pagination parameters from commands, replaced with new command to set
  page size. This works better for multiple commands and produces simpler code.
* Switched from encoding the page position in the command buffer to using some
  globals. Saves some memory and is less invasive to the command code.
* Added a patch to paginate the kernel log buffer display.


Sam Bobroff (2):
  powerpc/xmon: Paged output for paca display
  powerpc/xmon: Paginate kernel log buffer display

 arch/powerpc/xmon/xmon.c | 89 +++-
 1 file changed, 73 insertions(+), 16 deletions(-)

-- 
2.1.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v2 1/2] powerpc/xmon: Paged output for paca display

2015-08-20 Thread Sam Bobroff
The paca display is already more than 24 lines, which can be problematic
if you have an old school 80x24 terminal, or more likely you are on a
virtual terminal which does not scroll for whatever reason.

This patch adds a new command ., which takes a single (hex) numeric
argument: lines per page. It will cause the output of dp and dpa
to be broken into pages, if necessary.

This is implemented by running over the entire output both for the
initial command and for each subsequent page: the visible part is
clipped out by checking line numbers. This is a simplistic approach
but minimally invasive; it is intended to be easily reusable for other
commands.

Sample output:

0:mon .10
0:mon dp1
paca for cpu 0x1 @ cfdc0480:
 possible = yes
 present  = yes
 online   = yes
 lock_token   = 0x8000  (0x8)
 paca_index   = 0x1 (0xa)
 kernel_toc   = 0xc0eb2400  (0x10)
 kernelbase   = 0xc000  (0x18)
 kernel_msr   = 0xb0001032  (0x20)
 emergency_sp = 0xc0003ffe8000  (0x28)
 mc_emergency_sp  = 0xc0003ffe4000  (0x2e0)
 in_mce   = 0x0 (0x2e8)
 data_offset  = 0x7f17  (0x30)
 hw_cpu_id= 0x8 (0x38)
 cpu_start= 0x1 (0x3a)
 kexec_state  = 0x0 (0x3b)
[Enter for next page]
0:mon
 __current= 0xc0007e696620  (0x290)
 kstack   = 0xc0007e6ebe30  (0x298)
 stab_rr  = 0xb (0x2a0)
 saved_r1 = 0xc0007ef37860  (0x2a8)
 trap_save= 0x0 (0x2b8)
 soft_enabled = 0x0 (0x2ba)
 irq_happened = 0x1 (0x2bb)
 io_sync  = 0x0 (0x2bc)
 irq_work_pending = 0x0 (0x2bd)
 nap_state_lost   = 0x0 (0x2be)
0:mon

(Based on a similar patch by Michael Ellerman m...@ellerman.id.au
[v2] powerpc/xmon: Allow limiting the size of the paca display.
This patch is an alternative and cannot coexist with the original.)

Signed-off-by: Sam Bobroff sam.bobr...@au1.ibm.com
---
 arch/powerpc/xmon/xmon.c | 86 +++-
 1 file changed, 71 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index e599259..9ce9e7d 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -72,6 +72,12 @@ static int xmon_gate;
 
 static unsigned long in_xmon __read_mostly = 0;
 
+#define XMON_PRINTF(...) do { if (paged_vis()) printf(__VA_ARGS__); } while (0)
+#define MAX_PAGED_SIZE 1024
+static unsigned long paged_size = 0, paged_pos, paged_cur_page;
+#ifdef CONFIG_PPC64
+static unsigned long paca_cpu;
+#endif
 static unsigned long adrs;
 static int size = 1;
 #define MAX_DUMP (128 * 1024)
@@ -242,6 +248,9 @@ Commands:\n\
   u   dump TLB\n
 #endif
   ?   help\n
+#ifdef CONFIG_PPC64
+  .#  limit output to # lines per page (dump paca only)\n
+#endif
   zr  reboot\n\
   zh   halt\n
 ;
@@ -833,6 +842,19 @@ static void remove_cpu_bpts(void)
write_ciabr(0);
 }
 
+static void paged_set_size(void)
+{
+   if (!scanhex(paged_size) || (paged_size  MAX_PAGED_SIZE)) {
+   printf(Invalid number of lines per page (max: %d).\n,
+  MAX_PAGED_SIZE);
+   paged_size = 0;
+   }
+}
+static void paged_reset(void)
+{
+   paged_cur_page = 0;
+}
+
 /* Command interpreting routine */
 static char *last_cmd;
 
@@ -863,7 +885,8 @@ cmds(struct pt_regs *excp)
take_input(last_cmd);
last_cmd = NULL;
cmd = inchar();
-   }
+   } else
+   paged_reset();
switch (cmd) {
case 'm':
cmd = inchar();
@@ -924,6 +947,9 @@ cmds(struct pt_regs *excp)
case '?':
xmon_puts(help_string);
break;
+   case '.':
+   paged_set_size();
+   break;
case 'b':
bpt_cmds();
break;
@@ -2069,6 +2095,31 @@ static void xmon_rawdump (unsigned long adrs, long ndump)
printf(\n);
 }
 
+static void paged_start(void)
+{
+   paged_pos = 0;
+}
+
+static void paged_end(char *next_cmd)
+{
+   unsigned long next_page_start = ++paged_cur_page * paged_size;
+
+   if (paged_size  (paged_pos  next_page_start)) {
+   last_cmd = next_cmd;
+   printf([Enter for next page]\n);
+   }
+}
+
+static bool paged_vis(void)
+{
+   bool rv = (!paged_size
+   || ((paged_pos = (paged_size * paged_cur_page))
+(paged_pos  (paged_size * (paged_cur_page + 1);
+
+   paged_pos++;
+   return rv;
+}
+
 #ifdef CONFIG_PPC64
 static void dump_one_paca(int cpu)
 {
@@ -2084,15 +2135,17 @@ static void dump_one_paca(int cpu

Re: [PATCH v3 1/1] KVM: PPC: Book3S: correct width in XER handling

2015-08-05 Thread Sam Bobroff
Ping?

I think I've addressed all the comments in this version. Is there anything else
I need to look at?

Cheers,
Sam.

On Wed, May 27, 2015 at 09:56:57AM +1000, Sam Bobroff wrote:
 In 64 bit kernels, the Fixed Point Exception Register (XER) is a 64
 bit field (e.g. in kvm_regs and kvm_vcpu_arch) and in most places it is
 accessed as such.
 
 This patch corrects places where it is accessed as a 32 bit field by a
 64 bit kernel.  In some cases this is via a 32 bit load or store
 instruction which, depending on endianness, will cause either the
 lower or upper 32 bits to be missed.  In another case it is cast as a
 u32, causing the upper 32 bits to be cleared.
 
 This patch corrects those places by extending the access methods to
 64 bits.
 
 Signed-off-by: Sam Bobroff sam.bobr...@au1.ibm.com
 ---
 
 v3:
 Adjust booke set/get xer to match book3s.
 
 v2:
 
 Also extend kvmppc_book3s_shadow_vcpu.xer to 64 bit.
 
  arch/powerpc/include/asm/kvm_book3s.h |4 ++--
  arch/powerpc/include/asm/kvm_book3s_asm.h |2 +-
  arch/powerpc/include/asm/kvm_booke.h  |4 ++--
  arch/powerpc/kvm/book3s_hv_rmhandlers.S   |6 +++---
  arch/powerpc/kvm/book3s_segment.S |4 ++--
  5 files changed, 10 insertions(+), 10 deletions(-)
 
 diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
 b/arch/powerpc/include/asm/kvm_book3s.h
 index b91e74a..05a875a 100644
 --- a/arch/powerpc/include/asm/kvm_book3s.h
 +++ b/arch/powerpc/include/asm/kvm_book3s.h
 @@ -225,12 +225,12 @@ static inline u32 kvmppc_get_cr(struct kvm_vcpu *vcpu)
   return vcpu-arch.cr;
  }
  
 -static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, u32 val)
 +static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, ulong val)
  {
   vcpu-arch.xer = val;
  }
  
 -static inline u32 kvmppc_get_xer(struct kvm_vcpu *vcpu)
 +static inline ulong kvmppc_get_xer(struct kvm_vcpu *vcpu)
  {
   return vcpu-arch.xer;
  }
 diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h 
 b/arch/powerpc/include/asm/kvm_book3s_asm.h
 index 5bdfb5d..c4ccd2d 100644
 --- a/arch/powerpc/include/asm/kvm_book3s_asm.h
 +++ b/arch/powerpc/include/asm/kvm_book3s_asm.h
 @@ -112,7 +112,7 @@ struct kvmppc_book3s_shadow_vcpu {
   bool in_use;
   ulong gpr[14];
   u32 cr;
 - u32 xer;
 + ulong xer;
   ulong ctr;
   ulong lr;
   ulong pc;
 diff --git a/arch/powerpc/include/asm/kvm_booke.h 
 b/arch/powerpc/include/asm/kvm_booke.h
 index 3286f0d..bc6e29e 100644
 --- a/arch/powerpc/include/asm/kvm_booke.h
 +++ b/arch/powerpc/include/asm/kvm_booke.h
 @@ -54,12 +54,12 @@ static inline u32 kvmppc_get_cr(struct kvm_vcpu *vcpu)
   return vcpu-arch.cr;
  }
  
 -static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, u32 val)
 +static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, ulong val)
  {
   vcpu-arch.xer = val;
  }
  
 -static inline u32 kvmppc_get_xer(struct kvm_vcpu *vcpu)
 +static inline ulong kvmppc_get_xer(struct kvm_vcpu *vcpu)
  {
   return vcpu-arch.xer;
  }
 diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
 b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
 index 4d70df2..d75be59 100644
 --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
 +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
 @@ -870,7 +870,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
   blt hdec_soon
  
   ld  r6, VCPU_CTR(r4)
 - lwz r7, VCPU_XER(r4)
 + ld  r7, VCPU_XER(r4)
  
   mtctr   r6
   mtxer   r7
 @@ -1103,7 +1103,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
   mfctr   r3
   mfxer   r4
   std r3, VCPU_CTR(r9)
 - stw r4, VCPU_XER(r9)
 + std r4, VCPU_XER(r9)
  
   /* If this is a page table miss then see if it's theirs or ours */
   cmpwi   r12, BOOK3S_INTERRUPT_H_DATA_STORAGE
 @@ -1675,7 +1675,7 @@ kvmppc_hdsi:
   bl  kvmppc_msr_interrupt
  fast_interrupt_c_return:
  6:   ld  r7, VCPU_CTR(r9)
 - lwz r8, VCPU_XER(r9)
 + ld  r8, VCPU_XER(r9)
   mtctr   r7
   mtxer   r8
   mr  r4, r9
 diff --git a/arch/powerpc/kvm/book3s_segment.S 
 b/arch/powerpc/kvm/book3s_segment.S
 index acee37c..ca8f174 100644
 --- a/arch/powerpc/kvm/book3s_segment.S
 +++ b/arch/powerpc/kvm/book3s_segment.S
 @@ -123,7 +123,7 @@ no_dcbz32_on:
   PPC_LL  r8, SVCPU_CTR(r3)
   PPC_LL  r9, SVCPU_LR(r3)
   lwz r10, SVCPU_CR(r3)
 - lwz r11, SVCPU_XER(r3)
 + PPC_LL  r11, SVCPU_XER(r3)
  
   mtctr   r8
   mtlrr9
 @@ -237,7 +237,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
   mfctr   r8
   mflrr9
  
 - stw r5, SVCPU_XER(r13)
 + PPC_STL r5, SVCPU_XER(r13)
   PPC_STL r6, SVCPU_FAULT_DAR(r13)
   stw r7, SVCPU_FAULT_DSISR(r13)
   PPC_STL r8, SVCPU_CTR(r13)
 -- 
 1.7.10.4
 
 --
 To unsubscribe from this list: send the line unsubscribe kvm-ppc in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] book3s_hv_rmhandlers:Pass the correct trap argument to kvmhv_commence_exit

2015-08-13 Thread Sam Bobroff
On Thu, May 21, 2015 at 01:57:04PM +0530, Gautham R. Shenoy wrote:
 In guest_exit_cont we call kvmhv_commence_exit which expects the trap
 number as the argument. However r3 doesn't contain the trap number at
 this point and as a result we would be calling the function with a
 spurious trap number.
 
 Fix this by copying r12 into r3 before calling kvmhv_commence_exit as
 r12 contains the trap number
 
 Signed-off-by: Gautham R. Shenoy e...@linux.vnet.ibm.com

Hi Gautham,

I agree with your logic: r3 is quite clearly corrupted in that path. So:

Reviewed-by: Sam Bobroff sam.bobr...@au1.ibm.com

Just one comment: Do you have a case of this causing some visible problem due
to the corrupted trap number? (I'll test the patch if you do.)

Cheers,
Sam.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 1/1] powerpc/xmon: Paged output for paca display

2015-08-13 Thread Sam Bobroff
The paca display is already more than 24 lines, which can be problematic
if you have an old school 80x24 terminal, or more likely you are on a
virtual terminal which does not scroll for whatever reason.

This adds an optional letter to the dp and dpa xmon commands
(dpp and dppa), which will enable a per-page display (with 16
line pages): the first page  will be displayed and if there was data
that didn't fit, it will display a message indicating that the user can
use enter to display the next page. The intent is that this feels
similar to the way the memory display functions work.

This is implemented by running over the entire output both for the
initial command and for each subsequent page: the visible part is
clipped out by checking line numbers. Handling the empty command as
more is done by writing a special command into a static buffer that
indicates where to move the sliding visibility window. This is similar
to the approach used for the memory dump commands except that the
state data is encoded into the last_cmd string, rather than a set of
static variables. The memory dump commands could probably be rewritten
to make use of the same buffer and remove their other static
variables.

Sample output:

0:mon dpp1
paca for cpu 0x1 @ cfdc0480:
 possible = yes
 present  = yes
 online   = yes
 lock_token   = 0x8000  (0x8)
 paca_index   = 0x1 (0xa)
 kernel_toc   = 0xc0eb2400  (0x10)
 kernelbase   = 0xc000  (0x18)
 kernel_msr   = 0xb0001032  (0x20)
 emergency_sp = 0xc0003ffe8000  (0x28)
 mc_emergency_sp  = 0xc0003ffe4000  (0x2e0)
 in_mce   = 0x0 (0x2e8)
 data_offset  = 0x7f17  (0x30)
 hw_cpu_id= 0x8 (0x38)
 cpu_start= 0x1 (0x3a)
 kexec_state  = 0x0 (0x3b)
[Enter for next page]
0:mon
 __current= 0xc0007e696620  (0x290)
 kstack   = 0xc0007e6ebe30  (0x298)
 stab_rr  = 0xb (0x2a0)
 saved_r1 = 0xc0007ef37860  (0x2a8)
 trap_save= 0x0 (0x2b8)
 soft_enabled = 0x0 (0x2ba)
 irq_happened = 0x1 (0x2bb)
 io_sync  = 0x0 (0x2bc)
 irq_work_pending = 0x0 (0x2bd)
 nap_state_lost   = 0x0 (0x2be)
0:mon

(Based on a similar patch by Michael Ellerman m...@ellerman.id.au
[v2] powerpc/xmon: Allow limiting the size of the paca display.
This patch is an alternative and cannot coexist with the original.)

Signed-off-by: Sam Bobroff sam.bobr...@au1.ibm.com
---

 arch/powerpc/xmon/xmon.c | 82 
 1 file changed, 62 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index e599259..9157286 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -72,6 +72,7 @@ static int xmon_gate;
 
 static unsigned long in_xmon __read_mostly = 0;
 
+static char last_cmd_buf[128];
 static unsigned long adrs;
 static int size = 1;
 #define MAX_DUMP (128 * 1024)
@@ -204,8 +205,8 @@ Commands:\n\
   dldump the kernel log buffer\n
 #ifdef CONFIG_PPC64
   \
-  dp[#]dump paca for current cpu, or cpu #\n\
-  dpa  dump paca for all possible cpus\n
+  dp[p][#] dump paca for current cpu, or cpu # (p = paged)\n\
+  dp[p]a   dump paca for all possible cpus (p = paged)\n
 #endif
   \
   dr   dump stream of raw bytes\n\
@@ -2070,7 +2071,17 @@ static void xmon_rawdump (unsigned long adrs, long ndump)
 }
 
 #ifdef CONFIG_PPC64
-static void dump_one_paca(int cpu)
+static bool line_visible(unsigned long start, unsigned long count,
+unsigned long *line) {
+   bool rv = (!count
+   || ((*line = start)  (*line  (start + count;
+
+   (*line)++;
+   return rv;
+}
+
+static void dump_one_paca(int cpu, unsigned long start,
+ unsigned long count, unsigned long *line)
 {
struct paca_struct *p;
 
@@ -2084,15 +2095,22 @@ static void dump_one_paca(int cpu)
 
p = paca[cpu];
 
-   printf(paca for cpu 0x%x @ %p:\n, cpu, p);
+#define VPRINT(...) do { \
+   if (line_visible(start, count, line)) \
+   printf(__VA_ARGS__); \
+} while (0)
+   VPRINT(paca for cpu 0x%x @ %p:\n, cpu, p);
 
-   printf( %-*s = %s\n, 16, possible, cpu_possible(cpu) ? yes : 
no);
-   printf( %-*s = %s\n, 16, present, cpu_present(cpu) ? yes : no);
-   printf( %-*s = %s\n, 16, online, cpu_online(cpu) ? yes : no);
+   VPRINT( %-*s = %s\n, 16, possible, cpu_possible(cpu) ? yes : 
no);
+   VPRINT( %-*s = %s\n, 16, present, cpu_present(cpu) ? yes : no);
+   VPRINT( %-*s = %s\n, 16, online, cpu_online(cpu) ? yes : no);
+#undef VPRINT
 
-#define DUMP(paca, name, format) \
-   printf( %-*s = %#-*format\t(0x%lx)\n, 16, #name, 18, paca-name, \
-   offsetof

Re: [PATCH v2] powerpc/xmon: Allow limiting the size of the paca display

2015-08-13 Thread Sam Bobroff
On Wed, Aug 12, 2015 at 09:55:25PM +1000, Michael Ellerman wrote:
 The paca display is already more than 24 lines, which can be problematic
 if you have an old school 80x24 terminal, or more likely you are on a
 virtual terminal which does not scroll for whatever reason.
 
 We'd like to expand the paca display even more, so add a way to limit
 the number of lines that are displayed.
 
 This adds a third form of 'dp' which is 'dp # #', where the first number
 is the cpu, and the second is the number of lines to display.
 
 Example output:
 
   5:mon dp 3 6
   paca for cpu 0x3 @ cfe00c00:
possible = yes
present  = yes
online   = yes
lock_token   = 0x8000  (0xa)
paca_index   = 0x3 (0x8)

Michael,

This patch inspired me to do the additional work to make the output paged, more
like the memory dump commands.

I'll post it shortly as powerpc/xmon: Paged output for paca display.

Cheers,
Sam.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v3] powerpc: Add an inline function to update POWER8 HID0

2015-08-13 Thread Sam Bobroff
On Wed, Aug 05, 2015 at 12:38:31PM +0530, Gautham R. Shenoy wrote:
 Section 3.7 of Version 1.2 of the Power8 Processor User's Manual
 prescribes that updates to HID0 be preceded by a SYNC instruction and
 followed by an ISYNC instruction (Page 91).
 
 Create an inline function name update_power8_hid0() which follows this
 recipe and invoke it from the static split core path.
 
 Signed-off-by: Gautham R. Shenoy e...@linux.vnet.ibm.com

Hi Gautham,

I've tested this on a Power 8 machine and verified that it is able to change
split modes and that when doing so the new code is used.

Reviewed-by: Sam Bobroff sam.bobr...@au1.ibm.com
Tested-by: Sam Bobroff sam.bobr...@au1.ibm.com

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [v2,1/1] powerpc: Individual System V IPC system calls

2015-10-14 Thread Sam Bobroff
On Tue, Oct 13, 2015 at 08:38:42PM +1100, Michael Ellerman wrote:
> On Tue, 2015-13-10 at 01:49:28 UTC, Sam bobroff wrote:
> > This patch provides individual system call numbers for the following
> > System V IPC system calls, on PowerPC, so that they do not need to be
> > multiplexed:
> > * semop, semget, semctl, semtimedop
> > * msgsnd, msgrcv, msgget, msgctl
> > * shmat, shmdt, shmget, shmctl
> 
> You tested this right? :)  Tell me about it.

Why yes I did:

I have written a (fairly) trivial test program that calls each function in a
way that doesn't fail (but that doesn't necessarily attempt to exercise the
full functionality of it; my intent was primarily to validate the parameter
passing part as that is where most of the code change is (on the glibc side)).

I patched a local copy of glibc with the new kernel header and various tweaks
to correctly format the parameter lists for the new calls (there is actually
quite a lot of code in glibc around the IPC calls due to various compatibility
issues). I could then build a full tool chain that supported the new calls.

(This was a lot more extensive than the kernel patch but should be fairly close
to what needs to go into glibc.)

I used that tool chain to build a complete host system (using buildroot). Then
I could run the following tests:

* glibc: stock
  Host kernel: stock
  Result: success
  Notes: As expected, base case.

* glibc: stock
  Host kernel: patched
  Result: success
  Notes: As expected, the old ipc() call still exists in the patched host.

* glibc: patched
  Host kernel: stock
  Result: failure
  Notes: As expected, the test was run with a glibc that requires a patched
  kernel on an unpatched one so the syscalls are unknown.

* glibc: patched
  Host kernel: patched
  Result: success
  Notes: As expected. (Also, a bit of debug in glibc shows the new system call
  paths being followed.)

(I also re-ran the tests both for little-endian and big-endian hosts.)

It would obviously be good to have someone else test this, but I can't see a
way to make it easy to do. They would presumably have to go through all of the
above, which seems too much to ask given how trivial the kernel side of the
patch is. Still, it bothers me a bit so if there is any way please let me know.
(I thought about writing some assembly to directly test the syscall numbers but
all it would do is verify that the numbers are valid, which really isn't much
of a test.)

> Also we could make these available to SPU programs, but I don't think there's
> any point, no one's going to do a libc update for that.
> 
> cheers

Cheers,
Sam.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v3 1/2] powerpc/xmon: Paged output for paca display

2015-10-14 Thread Sam Bobroff
On Wed, Oct 14, 2015 at 08:39:09PM +1100, Michael Ellerman wrote:
> On Thu, 2015-10-08 at 11:50 +1100, Sam Bobroff wrote:
> > The paca display is already more than 24 lines, which can be problematic
> > if you have an old school 80x24 terminal, or more likely you are on a
> > virtual terminal which does not scroll for whatever reason.
> > 
> > This patch adds a new command ".", which takes a single (hex) numeric
> > argument: lines per page. It will cause the output of "dp" and "dpa"
> > to be broken into pages, if necessary.
> > 
> > Sample output:
> > 
> > 0:mon> .10
> 
> So what about making it "#" rather than "." ?
> 
> cheers

Sure, although we'll have to do a better job than the other commands in the 
help text ;-)
(They use "#" to indicate a hex number and "##" is just going to be confusing.)

Do you want me to respin? (I'm happy for you to just adjust the patch.)

Cheers,
Sam.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [v2,1/1] powerpc: Individual System V IPC system calls

2015-10-14 Thread Sam Bobroff
On Wed, Oct 14, 2015 at 08:38:15PM +1100, Michael Ellerman wrote:
> On Wed, 2015-10-14 at 18:00 +1100, Sam Bobroff wrote:
> > On Tue, Oct 13, 2015 at 08:38:42PM +1100, Michael Ellerman wrote:
> > > On Tue, 2015-13-10 at 01:49:28 UTC, Sam bobroff wrote:
> > > > This patch provides individual system call numbers for the following
> > > > System V IPC system calls, on PowerPC, so that they do not need to be
> > > > multiplexed:
> > > > * semop, semget, semctl, semtimedop
> > > > * msgsnd, msgrcv, msgget, msgctl
> > > > * shmat, shmdt, shmget, shmctl
> > > 
> > > You tested this right? :)  Tell me about it.
> > 
> > Why yes I did:
> 
> ...
> 
> > (I also re-ran the tests both for little-endian and big-endian hosts.)
> 
> Did you test on 32-bit at all?

I ran the test program, compiled for 32 and 64 bit, on a biarch power7 machine
(using -m32 and -m64 to the compiler) but only to verify that the fully patched
system succeeded. Is that sufficient?

> > It would obviously be good to have someone else test this, but I can't see a
> > way to make it easy to do. They would presumably have to go through all of 
> > the
> > above, which seems too much to ask given how trivial the kernel side of the
> > patch is. Still, it bothers me a bit so if there is any way please let me 
> > know.
> > (I thought about writing some assembly to directly test the syscall numbers 
> > but
> > all it would do is verify that the numbers are valid, which really isn't 
> > much
> > of a test.)
> 
> Actually that is still a useful test, it at least tells you if the kernel
> you're running on implements the syscalls. Obviously if you're on mainline
> that's easy enough to work out from the git history, but if/when these get
> backported to distro kernels, it's often harder to work out what's in the
> source than just testing it directly.

Oh, fair enough then.

> So I wrote a quick dirty test for that, it seems to work for me:

[snip]

Thanks :-)

> Which gives:
> 
> test: ipc_unmuxed
> tags: git_version:v4.3-rc3-44-g10053fa531a8-dirty
> Testing semop returned -1, errno 22
> Testing semgetreturned -1, errno 2
> Testing semctlreturned -1, errno 22
> Testing semtimedopreturned -1, errno 22
> Testing msgsndreturned -1, errno 14
> Testing msgrcvreturned -1, errno 22
> Testing msggetreturned -1, errno 2
> Testing msgctlreturned -1, errno 22
> Testing shmat returned -1, errno 22
> Testing shmdt returned -1, errno 22
> Testing shmgetreturned -1, errno 2
> Testing shmctlreturned -1, errno 22
> success: ipc_unmuxed
> 
> 
> And on an unpatched system:
> 
> test: ipc_unmuxed
> tags: git_version:v4.3-rc3-44-g10053fa531a8-dirty
> Testing semop returned -1, errno 38
> [FAIL] Test FAILED on line 2
> failure: ipc_unmuxed
> 
> 
> Look OK?

Yep! And 38 (ENOSYS) is the code we'd expect in the failure case.

> cheers

Cheers,
Sam.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v3 0/2] powerpc/xmon: Paged output for paca display

2015-10-07 Thread Sam Bobroff



Changes v2 -> v3:

Moved the pagination implementation from xmon.c to nonstdio.c where it's much
easier to do and the code is significantly simplified.

As it's now trivial to do, add the capability to truncate the output or to stop
pagination and dump the rest of the output.

Changed function naming scheme to read more easily (e.g.
xmon_start_pagination()).


Changes v1 -> v2:

* Removed pagination parameters from commands, replaced with new command to set
  page size. This works better for multiple commands and produces simpler code.
* Switched from encoding the page position in the command buffer to using some
  globals. Saves some memory and is less invasive to the command code.
* Added a patch to paginate the kernel log buffer display.


Sam Bobroff (2):
  powerpc/xmon: Paged output for paca display
  powerpc/xmon: Paginate kernel log buffer display

 arch/powerpc/xmon/nonstdio.c | 57 ++--
 arch/powerpc/xmon/nonstdio.h |  3 +++
 arch/powerpc/xmon/xmon.c | 18 ++
 3 files changed, 76 insertions(+), 2 deletions(-)

-- 
2.1.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v3 2/2] powerpc/xmon: Paginate kernel log buffer display

2015-10-07 Thread Sam Bobroff
The kernel log buffer is often much longer than the size of a terminal
so paginate it's output.

Signed-off-by: Sam Bobroff <sam.bobr...@au1.ibm.com>
---
 arch/powerpc/xmon/xmon.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index cc070c3..8ba8ea7 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -242,9 +242,7 @@ Commands:\n\
 "  u   dump TLB\n"
 #endif
 "  ?   help\n"
-#ifdef CONFIG_PPC64
-"  .#  limit output to # lines per page (dump paca only)\n"
-#endif
+"  .#  limit output to # lines per page (for dp#, dpa, dl)\n"
 "  zr  reboot\n\
   zh   halt\n"
 ;
@@ -2333,10 +2331,12 @@ dump_log_buf(void)
sync();
 
kmsg_dump_rewind_nolock();
+   xmon_start_pagination();
while (kmsg_dump_get_line_nolock(, false, buf, sizeof(buf), 
)) {
buf[len] = '\0';
printf("%s", buf);
}
+   xmon_end_pagination();
 
sync();
/* wait a little while to see if we get a machine check */
-- 
2.1.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [v2,1/2] powerpc/xmon: Paged output for paca display

2015-10-07 Thread Sam Bobroff
On Tue, Oct 06, 2015 at 10:05:38PM +1100, Michael Ellerman wrote:
> On Fri, 2015-21-08 at 04:24:27 UTC, Sam bobroff wrote:
> > The paca display is already more than 24 lines, which can be problematic
> > if you have an old school 80x24 terminal, or more likely you are on a
> > virtual terminal which does not scroll for whatever reason.
> > 
> > This patch adds a new command ".", which takes a single (hex) numeric
> > argument: lines per page. It will cause the output of "dp" and "dpa"
> > to be broken into pages, if necessary.
> > 
> > This is implemented by running over the entire output both for the
> > initial command and for each subsequent page: the visible part is
> > clipped out by checking line numbers. This is a simplistic approach
> > but minimally invasive; it is intended to be easily reusable for other
> > commands.
> > 
> > Sample output:
> > 
> > 0:mon> .10
> > 0:mon> dp1
> > paca for cpu 0x1 @ cfdc0480:
> >  possible = yes
> >  present  = yes
> >  online   = yes
> >  lock_token   = 0x8000  (0x8)
> >  paca_index   = 0x1 (0xa)
> >  kernel_toc   = 0xc0eb2400  (0x10)
> >  kernelbase   = 0xc000  (0x18)
> >  kernel_msr   = 0xb0001032  (0x20)
> >  emergency_sp = 0xc0003ffe8000  (0x28)
> >  mc_emergency_sp  = 0xc0003ffe4000  (0x2e0)
> >  in_mce   = 0x0 (0x2e8)
> >  data_offset  = 0x7f17  (0x30)
> >  hw_cpu_id= 0x8 (0x38)
> >  cpu_start= 0x1 (0x3a)
> >  kexec_state  = 0x0 (0x3b)
> > [Enter for next page]
> > 0:mon>
> >  __current= 0xc0007e696620  (0x290)
> >  kstack   = 0xc0007e6ebe30  (0x298)
> >  stab_rr  = 0xb (0x2a0)
> >  saved_r1 = 0xc0007ef37860  (0x2a8)
> >  trap_save= 0x0 (0x2b8)
> >  soft_enabled = 0x0 (0x2ba)
> >  irq_happened = 0x1 (0x2bb)
> >  io_sync  = 0x0 (0x2bc)
> >  irq_work_pending = 0x0 (0x2bd)
> >  nap_state_lost   = 0x0 (0x2be)
> > 0:mon>
> > 
> > (Based on a similar patch by Michael Ellerman <m...@ellerman.id.au>
> > "[v2] powerpc/xmon: Allow limiting the size of the paca display".
> > This patch is an alternative and cannot coexist with the original.)
> > 
> > Signed-off-by: Sam Bobroff <sam.bobr...@au1.ibm.com>
> > ---
> >  arch/powerpc/xmon/xmon.c | 86 
> > +++-
> >  1 file changed, 71 insertions(+), 15 deletions(-)
> > 
> > diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
> > index e599259..9ce9e7d 100644
> > --- a/arch/powerpc/xmon/xmon.c
> > +++ b/arch/powerpc/xmon/xmon.c
> > @@ -72,6 +72,12 @@ static int xmon_gate;
> >  
> >  static unsigned long in_xmon __read_mostly = 0;
> >  
> > +#define XMON_PRINTF(...) do { if (paged_vis()) printf(__VA_ARGS__); } 
> > while (0)
> 
> Can you do this is a proper function. I know it will need to be varargs, but
> that shouldn't be too ugly.

Sure, but I think I'll re-work the core of the logic and place it directly into
nonstdio.c. That will allow me to implement it directly in xmon_write() and
there won't need to be any change here at all. It will also allow blocking
during the output which will remove the whole 'run the command several times
and print a slice of it' system.

I'll post a v3.

> 
> > +#define MAX_PAGED_SIZE 1024
> 
> Why do we need a max at all?

OK, removed.

> > +static unsigned long paged_size = 0, paged_pos, paged_cur_page;
> 
> > +#ifdef CONFIG_PPC64
> > +static unsigned long paca_cpu;
> > +#endif
> 
> That can just be static in dump_pacas() by the looks.

This won't be needed in the new version.

> >  static unsigned long adrs;
> >  static int size = 1;
> >  #define MAX_DUMP (128 * 1024)
> > @@ -242,6 +248,9 @@ Commands:\n\
> >  "  u   dump TLB\n"
> >  #endif
> >  "  ?   help\n"
> > +#ifdef CONFIG_PPC64
> > +"  .#  limit output to # lines per page (dump paca only)\n"
> > +#endif
> 
> Don't make it 64-bit only.

It's only because the paca display itself is 64 bit only.

> >  "  zr  reboot\n\
> > 

[PATCH v3 1/2] powerpc/xmon: Paged output for paca display

2015-10-07 Thread Sam Bobroff
The paca display is already more than 24 lines, which can be problematic
if you have an old school 80x24 terminal, or more likely you are on a
virtual terminal which does not scroll for whatever reason.

This patch adds a new command ".", which takes a single (hex) numeric
argument: lines per page. It will cause the output of "dp" and "dpa"
to be broken into pages, if necessary.

Sample output:

0:mon> .10
0:mon> dp1
paca for cpu 0x1 @ cfdc0480:
 possible = yes
 present  = yes
 online   = yes
 lock_token   = 0x8000  (0x8)
 paca_index   = 0x1 (0xa)
 kernel_toc   = 0xc0eb2400  (0x10)
 kernelbase   = 0xc000  (0x18)
 kernel_msr   = 0xb0001032  (0x20)
 emergency_sp = 0xc0003ffe8000  (0x28)
 mc_emergency_sp  = 0xc0003ffe4000  (0x2e0)
 in_mce   = 0x0 (0x2e8)
 data_offset  = 0x7f17  (0x30)
 hw_cpu_id= 0x8 (0x38)
 cpu_start= 0x1 (0x3a)
 kexec_state  = 0x0 (0x3b)
[Hit a key (a:all, q:truncate, any:next page)]
0:mon>
 __current= 0xc0007e696620  (0x290)
 kstack   = 0xc0007e6ebe30  (0x298)
 stab_rr  = 0xb (0x2a0)
 saved_r1 = 0xc0007ef37860  (0x2a8)
 trap_save= 0x0 (0x2b8)
 soft_enabled = 0x0 (0x2ba)
 irq_happened = 0x1 (0x2bb)
 io_sync  = 0x0 (0x2bc)
 irq_work_pending = 0x0 (0x2bd)
 nap_state_lost   = 0x0 (0x2be)
0:mon>

(Based on a similar patch by Michael Ellerman <m...@ellerman.id.au>
"[v2] powerpc/xmon: Allow limiting the size of the paca display".
This patch is an alternative and cannot coexist with the original.)

Signed-off-by: Sam Bobroff <sam.bobr...@au1.ibm.com>
---
 arch/powerpc/xmon/nonstdio.c | 57 ++--
 arch/powerpc/xmon/nonstdio.h |  3 +++
 arch/powerpc/xmon/xmon.c | 18 ++
 3 files changed, 76 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/xmon/nonstdio.c b/arch/powerpc/xmon/nonstdio.c
index c987486..18819d7 100644
--- a/arch/powerpc/xmon/nonstdio.c
+++ b/arch/powerpc/xmon/nonstdio.c
@@ -11,10 +11,25 @@
 #include 
 #include "nonstdio.h"
 
+int paginating = 0, paginate_skipping = 0;
+unsigned long paginate_lpp = 0 /* Lines Per Page */;
+unsigned long paginate_pos;
 
-static int xmon_write(const void *ptr, int nb)
+void xmon_start_pagination(void)
 {
-   return udbg_write(ptr, nb);
+   paginating = 1;
+   paginate_skipping = 0;
+   paginate_pos = 0;
+}
+
+void xmon_end_pagination(void)
+{
+   paginating = 0;
+}
+
+void xmon_set_pagination_lpp(unsigned long lpp)
+{
+   paginate_lpp = lpp;
 }
 
 static int xmon_readchar(void)
@@ -24,6 +39,44 @@ static int xmon_readchar(void)
return -1;
 }
 
+static int xmon_write(const char *ptr, int nb)
+{
+   int rv = 0;
+   const char *p = ptr, *q;
+   const char msg[] = "[Hit a key (a:all, q:truncate, any:next page)]";
+
+   if (nb <= 0)
+   return rv;
+   if (paginating && paginate_skipping)
+   return nb;
+   if (paginate_lpp) {
+   while (paginating && (q = strchr(p, '\n'))) {
+   rv += udbg_write(p, q - p + 1);
+   p = q + 1;
+   paginate_pos++;
+   if (paginate_pos >= paginate_lpp) {
+   udbg_write(msg, strlen(msg));
+   switch (xmon_readchar()) {
+   case 'a':
+   paginating = 0;
+   break;
+   case 'q':
+   paginate_skipping = 1;
+   break;
+   default:
+   /* nothing */
+   break;
+   }
+   paginate_pos = 0;
+   udbg_write("\r\n", 2);
+   if (paginate_skipping)
+   return nb;
+   }
+   }
+   }
+   return rv + udbg_write(p, nb - (p - ptr));
+}
+
 int xmon_putchar(int c)
 {
char ch = c;
diff --git a/arch/powerpc/xmon/nonstdio.h b/arch/powerpc/xmon/nonstdio.h
index 18a51de..f865336 100644
--- a/arch/powerpc/xmon/nonstdio.h
+++ b/arch/powerpc/xmon/nonstdio.h
@@ -3,6 +3,9 @@
 #define printf xmon_printf
 #define putcharxmon_putchar
 
+extern void xmon_set_pagination_lpp(unsigned long lpp);
+extern void xmon_start_pagination(void);
+extern void xmon_end_pagination(void

[PATCH v2 1/1] powerpc: Individual System V IPC system calls

2015-10-12 Thread Sam Bobroff
This patch provides individual system call numbers for the following
System V IPC system calls, on PowerPC, so that they do not need to be
multiplexed:
* semop, semget, semctl, semtimedop
* msgsnd, msgrcv, msgget, msgctl
* shmat, shmdt, shmget, shmctl

Signed-off-by: Sam Bobroff <sam.bobr...@au1.ibm.com>
---

v2:

Rebased onto today's next-20151012.

 arch/powerpc/include/asm/systbl.h  | 12 
 arch/powerpc/include/asm/unistd.h  |  2 +-
 arch/powerpc/include/uapi/asm/unistd.h | 12 
 3 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/systbl.h 
b/arch/powerpc/include/asm/systbl.h
index 126d0c4..c9e26cb 100644
--- a/arch/powerpc/include/asm/systbl.h
+++ b/arch/powerpc/include/asm/systbl.h
@@ -370,3 +370,15 @@ COMPAT_SYS(execveat)
 PPC64ONLY(switch_endian)
 SYSCALL_SPU(userfaultfd)
 SYSCALL_SPU(membarrier)
+SYSCALL(semop)
+SYSCALL(semget)
+COMPAT_SYS(semctl)
+COMPAT_SYS(semtimedop)
+COMPAT_SYS(msgsnd)
+COMPAT_SYS(msgrcv)
+SYSCALL(msgget)
+COMPAT_SYS(msgctl)
+COMPAT_SYS(shmat)
+SYSCALL(shmdt)
+SYSCALL(shmget)
+COMPAT_SYS(shmctl)
diff --git a/arch/powerpc/include/asm/unistd.h 
b/arch/powerpc/include/asm/unistd.h
index 13411be..6d8f802 100644
--- a/arch/powerpc/include/asm/unistd.h
+++ b/arch/powerpc/include/asm/unistd.h
@@ -12,7 +12,7 @@
 #include 
 
 
-#define __NR_syscalls  366
+#define __NR_syscalls  378
 
 #define __NR__exit __NR_exit
 #define NR_syscalls__NR_syscalls
diff --git a/arch/powerpc/include/uapi/asm/unistd.h 
b/arch/powerpc/include/uapi/asm/unistd.h
index 63377380..81579e9 100644
--- a/arch/powerpc/include/uapi/asm/unistd.h
+++ b/arch/powerpc/include/uapi/asm/unistd.h
@@ -388,5 +388,17 @@
 #define __NR_switch_endian 363
 #define __NR_userfaultfd   364
 #define __NR_membarrier365
+#define __NR_semop 366
+#define __NR_semget367
+#define __NR_semctl368
+#define __NR_semtimedop369
+#define __NR_msgsnd370
+#define __NR_msgrcv371
+#define __NR_msgget372
+#define __NR_msgctl373
+#define __NR_shmat 374
+#define __NR_shmdt 375
+#define __NR_shmget376
+#define __NR_shmctl377
 
 #endif /* _UAPI_ASM_POWERPC_UNISTD_H_ */
-- 
2.1.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 1/1] powerpc: Individual System V IPC system calls

2015-09-24 Thread Sam Bobroff
This patch provides individual system call numbers for the following
System V IPC system calls, on PowerPC, so that they do not need to be
multiplexed:
* semop, semget, semctl, semtimedop
* msgsnd, msgrcv, msgget, msgctl
* shmat, shmdt, shmget, shmctl

Signed-off-by: Sam Bobroff <sam.bobr...@au1.ibm.com>
---

 arch/powerpc/include/asm/systbl.h  | 12 
 arch/powerpc/include/asm/unistd.h  |  2 +-
 arch/powerpc/include/uapi/asm/unistd.h | 12 
 3 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/systbl.h 
b/arch/powerpc/include/asm/systbl.h
index 71f2b3f..546b9ec 100644
--- a/arch/powerpc/include/asm/systbl.h
+++ b/arch/powerpc/include/asm/systbl.h
@@ -368,3 +368,15 @@ SYSCALL_SPU(memfd_create)
 SYSCALL_SPU(bpf)
 COMPAT_SYS(execveat)
 PPC64ONLY(switch_endian)
+SYSCALL(semop)
+SYSCALL(semget)
+COMPAT_SYS(semctl)
+COMPAT_SYS(semtimedop)
+COMPAT_SYS(msgsnd)
+COMPAT_SYS(msgrcv)
+SYSCALL(msgget)
+COMPAT_SYS(msgctl)
+COMPAT_SYS(shmat)
+SYSCALL(shmdt)
+SYSCALL(shmget)
+COMPAT_SYS(shmctl)
diff --git a/arch/powerpc/include/asm/unistd.h 
b/arch/powerpc/include/asm/unistd.h
index f4f8b66..e51c51b 100644
--- a/arch/powerpc/include/asm/unistd.h
+++ b/arch/powerpc/include/asm/unistd.h
@@ -12,7 +12,7 @@
 #include 
 
 
-#define __NR_syscalls  364
+#define __NR_syscalls  376
 
 #define __NR__exit __NR_exit
 #define NR_syscalls__NR_syscalls
diff --git a/arch/powerpc/include/uapi/asm/unistd.h 
b/arch/powerpc/include/uapi/asm/unistd.h
index e4aa173..a8390ee 100644
--- a/arch/powerpc/include/uapi/asm/unistd.h
+++ b/arch/powerpc/include/uapi/asm/unistd.h
@@ -386,5 +386,17 @@
 #define __NR_bpf   361
 #define __NR_execveat  362
 #define __NR_switch_endian 363
+#define __NR_semop 364
+#define __NR_semget365
+#define __NR_semctl366
+#define __NR_semtimedop367
+#define __NR_msgsnd368
+#define __NR_msgrcv369
+#define __NR_msgget370
+#define __NR_msgctl371
+#define __NR_shmat 372
+#define __NR_shmdt 373
+#define __NR_shmget374
+#define __NR_shmctl375
 
 #endif /* _UAPI_ASM_POWERPC_UNISTD_H_ */
-- 
2.1.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 1/1] powerpc: Individual System V IPC system calls

2015-09-24 Thread Sam Bobroff
On Fri, Sep 25, 2015 at 10:33:37AM +1000, Michael Ellerman wrote:
> On Thu, 2015-09-24 at 15:39 +1000, Sam Bobroff wrote:
> > This patch provides individual system call numbers for the following
> > System V IPC system calls, on PowerPC, so that they do not need to be
> > multiplexed:
> > * semop, semget, semctl, semtimedop
> > * msgsnd, msgrcv, msgget, msgctl
> > * shmat, shmdt, shmget, shmctl
> 
> Thanks.
> 
> Can you please rebase this on top of linux-next, where we have two new
> syscalls wired up. It should be trivial, just the numbering changes, but I
> think you have a modified libc to actually test the result.

No problem, and yes I can test the glibc changes :-)

Cheers,
Sam.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v5 08/11] powerpc/powernv: Add platform support for stop instruction

2016-06-08 Thread Sam Bobroff
On Thu, Jun 02, 2016 at 07:38:58AM -0500, Shreyas B. Prabhu wrote:

...

> +/* Power Management - PSSCR Fields */

It might be nice to give the full name of the register, as below with the FPSCR.

> +#define PSSCR_RL_MASK0x000F
> +#define PSSCR_MTL_MASK   0x00F0
> +#define PSSCR_TR_MASK0x0300
> +#define PSSCR_PSLL_MASK  0x000F
> +#define PSSCR_EC 0x0010
> +#define PSSCR_ESL0x0020
> +#define PSSCR_SD 0x0040
> +
> +
>  /* Floating Point Status and Control Register (FPSCR) Fields */

Cheers,
Sam.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 1/1] powerpc: Detect broken or mismatched toolchains

2016-02-22 Thread Sam Bobroff
On Mon, Feb 22, 2016 at 08:05:01PM -0600, Scott Wood wrote:
> On Mon, 2016-02-22 at 16:13 +1100, Sam Bobroff wrote:
> > It can currently be difficult to diagnose a build that fails due to
> > the compiler, linker or other parts of the toolchain being unable to
> > build binaries of the type required by the kernel config. For example
> > using a little endian toolchain to build a big endian kernel may
> > produce:
> > 
> > as: unrecognized option '-maltivec'
> > 
> > This patch adds a basic compile test and error message to
> > arch/powerpc/Makefile so that the above error becomes:
> > 
> > *** Sorry, your toolchain seems to be broken or incorrect. ***
> > Make sure it supports your kernel configuration (ppc64).
> > 
> > Signed-off-by: Sam Bobroff <sam.bobr...@au1.ibm.com>
> > ---
> 
> How is this more useful than getting to actually see the way in which the
> toolchain (or the CFLAGS) is broken?

My reasoning was that it would be better because it happens at the start of the
build, rather than (possibly) a long way into it, and it indicates that the
problem is the toolchain setup (or config) itself rather than the file it's
trying to compile or link.

But I agree completely with what you're saying. I'll try re-working it in a way
that shows the command that fails and it's output.

Cheers,
Sam.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 1/1] powerpc: Detect broken or mismatched toolchains

2016-02-21 Thread Sam Bobroff
It can currently be difficult to diagnose a build that fails due to
the compiler, linker or other parts of the toolchain being unable to
build binaries of the type required by the kernel config. For example
using a little endian toolchain to build a big endian kernel may
produce:

as: unrecognized option '-maltivec'

This patch adds a basic compile test and error message to
arch/powerpc/Makefile so that the above error becomes:

*** Sorry, your toolchain seems to be broken or incorrect. ***
Make sure it supports your kernel configuration (ppc64).

Signed-off-by: Sam Bobroff <sam.bobr...@au1.ibm.com>
---

 arch/powerpc/Makefile | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 96efd82..0041cd2 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -355,6 +355,13 @@ TOUT   := .tmp_gas_check
 # - Require gcc 4.0 or above on 64-bit
 # - gcc-4.2.0 has issues compiling modules on 64-bit
 checkbin:
+   @if test "$(call try-run,echo 'int _start(void) { return 0; }' > 
\"$$TMP\"; \
+   $(CC) $(KBUILD_CPPFLAGS) $(KBUILD_CFLAGS) -x c -nostdlib \"$$TMP\" \
+   -o /dev/null,ok,broken)" = "broken" ; then \
+   echo "*** Sorry, your toolchain seems to be broken or 
incorrect. ***" ; \
+   echo "Make sure it supports your kernel configuration 
($(UTS_MACHINE))." ; \
+   false; \
+   fi
@if test "$(cc-name)" != "clang" \
&& test "$(cc-version)" = "0304" ; then \
if ! /bin/echo mftb 5 | $(AS) -v -mppc -many -o $(TOUT) 
>/dev/null 2>&1 ; then \
-- 
2.1.0

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [1/1] KVM: PPC: Introduce KVM_CAP_PPC_HTM

2016-07-18 Thread Sam Bobroff
On Fri, Jul 08, 2016 at 08:49:49PM +1000, Michael Ellerman wrote:
> On Wed, 2016-06-07 at 06:05:54 UTC, Sam bobroff wrote:
> > diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
> > index 02416fe..06d79bc 100644
> > --- a/arch/powerpc/kvm/powerpc.c
> > +++ b/arch/powerpc/kvm/powerpc.c
> > @@ -588,6 +588,10 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long 
> > ext)
> > r = 1;
> > break;
> >  #endif
> > +   case KVM_CAP_PPC_HTM:
> > +   r = cpu_has_feature(CPU_FTR_TM)
> > +   && is_kvmppc_hv_enabled(kvm);
> 
> I think it should be using CPU_FTR_TM_COMP.

Oh, why is that? I'm happy to respin the patch I'm just curious.

(I did it that way becuase that seems to be the way the other flags are used,
e.g. CPU_FTR_ALTIVEC).

If I read the code correctly, using CPU_FTR_TM_COMP will work fine: it should
cause the cpu_has_feature() test to always return false if CPU_FTR_TM_COMP is
0.

> And AFAICS you don't need to break that line.

Sure, I'll un-split it when I respin.

> cheers

Cheers,
Sam.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v2 1/1] KVM: PPC: Introduce KVM_CAP_PPC_HTM

2016-07-19 Thread Sam Bobroff
Introduce a new KVM capability, KVM_CAP_PPC_HTM, that can be queried to
determine if a PowerPC KVM guest should use HTM (Hardware Transactional
Memory).

This will be used by QEMU to populate the pa-features bits in the
guest's device tree.

Signed-off-by: Sam Bobroff <sam.bobr...@au1.ibm.com>
---

v2:

* Use CPU_FTR_TM_COMP instead of CPU_FTR_TM.
* I didn't unbreak the line, as with the extra characters checkpatch will
  complain if I do. I did move the break to a more usual place.

 arch/powerpc/kvm/powerpc.c | 4 
 include/uapi/linux/kvm.h   | 1 +
 2 files changed, 5 insertions(+)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 02416fe..5ebc8ff 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -588,6 +588,10 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
r = 1;
break;
 #endif
+   case KVM_CAP_PPC_HTM:
+   r = cpu_has_feature(CPU_FTR_TM_COMP) &&
+   is_kvmppc_hv_enabled(kvm);
+   break;
default:
r = 0;
break;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 05ebf47..f421d0e 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -866,6 +866,7 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_ARM_PMU_V3 126
 #define KVM_CAP_VCPU_ATTRIBUTES 127
 #define KVM_CAP_MAX_VCPU_ID 128
+#define KVM_CAP_PPC_HTM 129
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
2.1.0

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 1/3] spapr: Disable ibm, pa-features HTM bit

2016-07-04 Thread Sam Bobroff
There are a few issues with our handling of the ibm,pa-features
HTM bit:

- We don't support transactional memory in PR KVM, so don't tell
  the OS that we do.

- In full emulation we have a minimal implementation of HTM that always
  fails, so for performance reasons lets not tell the OS that we
  support it either.

- In HV KVM mode, we should mirror the host HTM enabled state by
  checking a KVM capability or looking at the AT_HWCAP2 bit.

For now unconditionally disable it by removing HTM from the
pa-features bits.  It will be re-enabled in a subsequent patch
specifically for HV KVM.

Signed-off-by: Sam Bobroff <sam.bobr...@au1.ibm.com>
---
 hw/ppc/spapr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 78ebd9e..704aae7 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -635,7 +635,7 @@ static void spapr_populate_cpu_dt(CPUState *cs, void *fdt, 
int offset,
 0xf6, 0x1f, 0xc7, 0xc0, 0x80, 0xf0,
 0x80, 0x00, 0x00, 0x00, 0x00, 0x00,
 0x00, 0x00, 0x00, 0x00, 0x80, 0x00,
-0x80, 0x00, 0x80, 0x00, 0x80, 0x00 };
+0x80, 0x00, 0x80, 0x00, 0x00, 0x00 };
 uint8_t *pa_features;
 size_t pa_size;
 
-- 
2.1.0

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 2/3] Add KVM_CAP_PPC_HTM to linux/kvm.h

2016-07-04 Thread Sam Bobroff
Signed-off-by: Sam Bobroff <sam.bobr...@au1.ibm.com>
---
 linux-headers/linux/kvm.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index e60e21b..37cb3e8 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -866,6 +866,7 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_ARM_PMU_V3 126
 #define KVM_CAP_VCPU_ATTRIBUTES 127
 #define KVM_CAP_MAX_VCPU_ID 128
+#define KVM_CAP_PPC_HTM 129
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
2.1.0

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 3/3] spapr: Set ibm, pa-features HTM from KVM_CAP_PPC_HTM

2016-07-04 Thread Sam Bobroff
Advertise HTM support in ibm, pa-features if KVM indicates support when
queried via a new capability (KVM_CAP_PPC_HTM).

If KVM returns false for the capability (which may indicate that the
host kernel doesn't support the capability itself) attempt to
determine availability using a fallback method based on KVM being
KVM-HV and HTM being available to the QEMU process.

Signed-off-by: Sam Bobroff <sam.bobr...@au1.ibm.com>
---
 hw/ppc/spapr.c   |  3 ++-
 target-ppc/kvm.c | 27 +++
 target-ppc/kvm_ppc.h |  6 ++
 3 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 704aae7..e229532 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -606,6 +606,7 @@ static void spapr_populate_cpu_dt(CPUState *cs, void *fdt, 
int offset,
 uint32_t tbfreq = kvm_enabled() ? kvmppc_get_tbfreq()
 : SPAPR_TIMEBASE_FREQ;
 uint32_t cpufreq = kvm_enabled() ? kvmppc_get_clockfreq() : 10;
+uint8_t htm = (kvm_enabled() && kvmppc_get_htm_support(env)) ? 0x80 : 0x00;
 uint32_t page_sizes_prop[64];
 size_t page_sizes_prop_size;
 uint32_t vcpus_per_socket = smp_threads * smp_cores;
@@ -635,7 +636,7 @@ static void spapr_populate_cpu_dt(CPUState *cs, void *fdt, 
int offset,
 0xf6, 0x1f, 0xc7, 0xc0, 0x80, 0xf0,
 0x80, 0x00, 0x00, 0x00, 0x00, 0x00,
 0x00, 0x00, 0x00, 0x00, 0x80, 0x00,
-0x80, 0x00, 0x80, 0x00, 0x00, 0x00 };
+0x80, 0x00, 0x80, 0x00, 0x00 | htm, 0x00 };
 uint8_t *pa_features;
 size_t pa_size;
 
diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index 884d564..f94ce3b 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -20,6 +20,7 @@
 #include 
 
 #include 
+#include "elf.h"
 
 #include "qemu-common.h"
 #include "qemu/error-report.h"
@@ -1976,6 +1977,32 @@ uint32_t kvmppc_get_dfp(void)
 return kvmppc_read_int_cpu_dt("ibm,dfp");
 }
 
+bool kvmppc_get_htm_support(CPUPPCState *env)
+{
+PowerPCCPU *cpu = ppc_env_get_cpu(env);
+CPUState *cs = CPU(cpu);
+
+
+if (kvm_vm_check_extension(cs->kvm_state, KVM_CAP_PPC_HTM)) {
+return true;
+}
+/*
+ * Fallback test for host kernels that don't yet support KVM_CAP_PPC_HTM.
+ * This will be unnecessary when KVM_API_VERSION is incremented (to 13 or
+ * above) because that will remove the ambiguity between the host kernel
+ * lacking support for KVM_CAP_PPC_HTM and it having support but reporting
+ * HTM as unavailable (both of which return 0, above).
+ */
+if (kvm_vm_check_extension(cs->kvm_state, KVM_CAP_PPC_GET_PVINFO)) {
+/* Assume this means PR KVM, so no TM. */
+return false;
+} else {
+/* Assume this means HV KVM, propagate whatever host userspace sees. */
+unsigned long hwcap2 = qemu_getauxval(AT_HWCAP2);
+return !!(hwcap2 & PPC_FEATURE2_HAS_HTM);
+}
+}
+
 static int kvmppc_get_pvinfo(CPUPPCState *env, struct kvm_ppc_pvinfo *pvinfo)
  {
  PowerPCCPU *cpu = ppc_env_get_cpu(env);
diff --git a/target-ppc/kvm_ppc.h b/target-ppc/kvm_ppc.h
index 20bfb59..b01c717 100644
--- a/target-ppc/kvm_ppc.h
+++ b/target-ppc/kvm_ppc.h
@@ -17,6 +17,7 @@ uint32_t kvmppc_get_tbfreq(void);
 uint64_t kvmppc_get_clockfreq(void);
 uint32_t kvmppc_get_vmx(void);
 uint32_t kvmppc_get_dfp(void);
+bool kvmppc_get_htm_support(CPUPPCState *env);
 bool kvmppc_get_host_model(char **buf);
 bool kvmppc_get_host_serial(char **buf);
 int kvmppc_get_hasidle(CPUPPCState *env);
@@ -90,6 +91,11 @@ static inline uint32_t kvmppc_get_dfp(void)
 return 0;
 }
 
+static inline bool kvmppc_get_htm_support(KVMState *kvm_state)
+{
+return false;
+}
+
 static inline int kvmppc_get_hasidle(CPUPPCState *env)
 {
 return 0;
-- 
2.1.0

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 0/3] Rework spapr: Better handling of ibm, pa-features TM bit

2016-07-04 Thread Sam Bobroff

Hi David,

Anton asked me to have a look at this, so here is an attempt at a
re-implementation of his: "spapr: Better handling of ibm, pa-features TM bit"
addressing your comments and those from Paul Mackerras.  I've broken the
patch into one to unconditionally disable the HTM bit in pa-features and a
second one to set it conditionally based on a (new) KVM capability. It requires
a small patch to KVM to support the capability, presumably something like this:

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 02416fe..4a8ddab 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -588,6 +588,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
r = 1;
break;
 #endif
+   case KVM_CAP_PPC_HTM:
+   r = cpu_has_feature(CPU_FTR_TM);
+   break;
default:
r = 0;
break;

Adding a new capability requires changing linux/kvm.h but I believe we can
avoid incrementing KVM_API_VERSION at this stage since kernels that don't yet
support it will simply return false. However, due to the ambiguity of that
result I've included Anton's initial fallback approach to be used in that case.
Once the API version is incremented (and support for the capability is
guaranteed) the ambiguity would be gone and we should be able to remove the
fallback. As long as that happens before KVM PR supports HTM, this should
address Paul's concern about using the PVINFO capability to discover KVM-HV.

Note: I'm not sure how to handle the change to linux/kvm.h, I've included the
patch here because it's needed to compile, but I suspect it needs to go via the
kernel. Let's see if this part looks good first.

Note also: I've changed TM to HTM where possible in an attempt to be consistent.


Sam Bobroff (3):
  spapr: Disable ibm, pa-features HTM bit
  Add KVM_CAP_PPC_HTM to linux/kvm.h
  spapr: Set ibm, pa-features HTM from KVM_CAP_PPC_HTM

 hw/ppc/spapr.c|  3 ++-
 linux-headers/linux/kvm.h |  1 +
 target-ppc/kvm.c  | 27 +++
 target-ppc/kvm_ppc.h  |  6 ++
 4 files changed, 36 insertions(+), 1 deletion(-)

-- 
2.1.0

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v2 3/3] spapr: Set ibm, pa-features HTM from KVM_CAP_PPC_HTM

2016-07-05 Thread Sam Bobroff
Advertise HTM support in ibm, pa-features if KVM indicates support when
queried via a new capability (KVM_CAP_PPC_HTM).

If KVM returns false for the capability (which may indicate that the
host kernel doesn't support the capability itself) attempt to
determine availability using a fallback method based on KVM being
KVM-HV and HTM being available to the QEMU process.

Signed-off-by: Sam Bobroff <sam.bobr...@au1.ibm.com>
---
v2:

* Improve readability of HTM bit set code.
* Move the test for KVM into kvmppc_get_htm_support().

 hw/ppc/spapr.c   |  3 +++
 target-ppc/kvm.c | 29 +
 target-ppc/kvm_ppc.h |  6 ++
 3 files changed, 38 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 704aae7..843fbe7 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -712,6 +712,9 @@ static void spapr_populate_cpu_dt(CPUState *cs, void *fdt, 
int offset,
 } else /* env->mmu_model == POWERPC_MMU_2_07 */ {
 pa_features = pa_features_207;
 pa_size = sizeof(pa_features_207);
+if (kvmppc_get_htm_support(env)) {
+pa_features[24] |= 0x80;
+}
 }
 if (env->ci_large_pages) {
 pa_features[3] |= 0x20;
diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index 884d564..9d13446 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -20,6 +20,7 @@
 #include 
 
 #include 
+#include "elf.h"
 
 #include "qemu-common.h"
 #include "qemu/error-report.h"
@@ -1976,6 +1977,34 @@ uint32_t kvmppc_get_dfp(void)
 return kvmppc_read_int_cpu_dt("ibm,dfp");
 }
 
+bool kvmppc_get_htm_support(CPUPPCState *env)
+{
+PowerPCCPU *cpu = ppc_env_get_cpu(env);
+CPUState *cs = CPU(cpu);
+
+if (!kvm_enabled()) {
+return false;
+}
+if (kvm_vm_check_extension(cs->kvm_state, KVM_CAP_PPC_HTM)) {
+return true;
+}
+/*
+ * Fallback test for host kernels that don't yet support KVM_CAP_PPC_HTM.
+ * This will be unnecessary when KVM_API_VERSION is incremented (to 13 or
+ * above) because that will remove the ambiguity between the host kernel
+ * lacking support for KVM_CAP_PPC_HTM and it having support but reporting
+ * HTM as unavailable (both of which return 0, above).
+ */
+if (kvm_vm_check_extension(cs->kvm_state, KVM_CAP_PPC_GET_PVINFO)) {
+/* Assume this means PR KVM, so no TM. */
+return false;
+} else {
+/* Assume this means HV KVM, propagate whatever host userspace sees. */
+unsigned long hwcap2 = qemu_getauxval(AT_HWCAP2);
+return !!(hwcap2 & PPC_FEATURE2_HAS_HTM);
+}
+}
+
 static int kvmppc_get_pvinfo(CPUPPCState *env, struct kvm_ppc_pvinfo *pvinfo)
  {
  PowerPCCPU *cpu = ppc_env_get_cpu(env);
diff --git a/target-ppc/kvm_ppc.h b/target-ppc/kvm_ppc.h
index 20bfb59..b01c717 100644
--- a/target-ppc/kvm_ppc.h
+++ b/target-ppc/kvm_ppc.h
@@ -17,6 +17,7 @@ uint32_t kvmppc_get_tbfreq(void);
 uint64_t kvmppc_get_clockfreq(void);
 uint32_t kvmppc_get_vmx(void);
 uint32_t kvmppc_get_dfp(void);
+bool kvmppc_get_htm_support(CPUPPCState *env);
 bool kvmppc_get_host_model(char **buf);
 bool kvmppc_get_host_serial(char **buf);
 int kvmppc_get_hasidle(CPUPPCState *env);
@@ -90,6 +91,11 @@ static inline uint32_t kvmppc_get_dfp(void)
 return 0;
 }
 
+static inline bool kvmppc_get_htm_support(KVMState *kvm_state)
+{
+return false;
+}
+
 static inline int kvmppc_get_hasidle(CPUPPCState *env)
 {
 return 0;
-- 
2.1.0

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 2/3] Add KVM_CAP_PPC_HTM to linux/kvm.h

2016-07-05 Thread Sam Bobroff
On Tue, Jul 05, 2016 at 04:05:58PM +1000, David Gibson wrote:
> On Tue, Jul 05, 2016 at 03:19:23PM +1000, Sam Bobroff wrote:
> > Signed-off-by: Sam Bobroff <sam.bobr...@au1.ibm.com>
> 
> Ok, so the usual procedure for updates to linux-headers is this:
>1. Get the change merged on the kernel side
> 
>2. Use scripts/update-linux-headers.sh to update the whole
>   linux-headers subtree to the new kernel version
> 
>3. Submit the patch - make sure to include the commit id of the
>   kernel you updated to in the qemu commit message

Thanks,

Do you think we're at the point with this patch set that I should start
progressing the kernel side change now?

Cheers,
Sam.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v2 2/3] Add KVM_CAP_PPC_HTM to linux/kvm.h

2016-07-05 Thread Sam Bobroff
Signed-off-by: Sam Bobroff <sam.bobr...@au1.ibm.com>
---
 linux-headers/linux/kvm.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index e60e21b..37cb3e8 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -866,6 +866,7 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_ARM_PMU_V3 126
 #define KVM_CAP_VCPU_ATTRIBUTES 127
 #define KVM_CAP_MAX_VCPU_ID 128
+#define KVM_CAP_PPC_HTM 129
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
2.1.0

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v2 0/3] Rework spapr: Better handling of ibm, pa-features TM bit

2016-07-05 Thread Sam Bobroff

Here's version 2.


Changes v1 -> v2:
Patch 2/3: spapr: Set ibm, pa-features HTM from KVM_CAP_PPC_HTM

* Improve readability of HTM bit set code.
* Move the test for KVM into kvmppc_get_htm_support().


Sam Bobroff (3):
  spapr: Disable ibm, pa-features HTM bit
  Add KVM_CAP_PPC_HTM to linux/kvm.h
  spapr: Set ibm, pa-features HTM from KVM_CAP_PPC_HTM

 hw/ppc/spapr.c|  5 -
 linux-headers/linux/kvm.h |  1 +
 target-ppc/kvm.c  | 29 +
 target-ppc/kvm_ppc.h  |  6 ++
 4 files changed, 40 insertions(+), 1 deletion(-)

-- 
2.1.0

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v2 1/3] spapr: Disable ibm, pa-features HTM bit

2016-07-05 Thread Sam Bobroff
There are a few issues with our handling of the ibm,pa-features
HTM bit:

- We don't support transactional memory in PR KVM, so don't tell
  the OS that we do.

- In full emulation we have a minimal implementation of HTM that always
  fails, so for performance reasons lets not tell the OS that we
  support it either.

- In HV KVM mode, we should mirror the host HTM enabled state by
  checking a KVM capability or looking at the AT_HWCAP2 bit.

For now unconditionally disable it by removing HTM from the
pa-features bits.  It will be re-enabled in a subsequent patch
specifically for HV KVM.

Signed-off-by: Sam Bobroff <sam.bobr...@au1.ibm.com>
---
 hw/ppc/spapr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 78ebd9e..704aae7 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -635,7 +635,7 @@ static void spapr_populate_cpu_dt(CPUState *cs, void *fdt, 
int offset,
 0xf6, 0x1f, 0xc7, 0xc0, 0x80, 0xf0,
 0x80, 0x00, 0x00, 0x00, 0x00, 0x00,
 0x00, 0x00, 0x00, 0x00, 0x80, 0x00,
-0x80, 0x00, 0x80, 0x00, 0x80, 0x00 };
+0x80, 0x00, 0x80, 0x00, 0x00, 0x00 };
 uint8_t *pa_features;
 size_t pa_size;
 
-- 
2.1.0

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 1/1] KVM: PPC: Introduce KVM_CAP_PPC_HTM

2016-07-06 Thread Sam Bobroff
Introduce a new KVM capability, KVM_CAP_PPC_HTM, that can be queried to
determine if a PowerPC KVM guest should use HTM (Hardware Transactional
Memory).

This will be used by QEMU to populate the pa-features bits in the
guest's device tree.

Signed-off-by: Sam Bobroff <sam.bobr...@au1.ibm.com>
---

 arch/powerpc/kvm/powerpc.c | 4 
 include/uapi/linux/kvm.h   | 1 +
 2 files changed, 5 insertions(+)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 02416fe..06d79bc 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -588,6 +588,10 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
r = 1;
break;
 #endif
+   case KVM_CAP_PPC_HTM:
+   r = cpu_has_feature(CPU_FTR_TM)
+   && is_kvmppc_hv_enabled(kvm);
+   break;
default:
r = 0;
break;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 05ebf47..f421d0e 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -866,6 +866,7 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_ARM_PMU_V3 126
 #define KVM_CAP_VCPU_ATTRIBUTES 127
 #define KVM_CAP_MAX_VCPU_ID 128
+#define KVM_CAP_PPC_HTM 129
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
2.1.0

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 0/6] kvm: powerpc halt polling updates

2016-10-13 Thread Sam Bobroff
On Fri, Oct 14, 2016 at 11:53:18AM +1100, Suraj Jitindar Singh wrote:
> This patch series makes some updates and bug fixes to the powerpc kvm-hv
> halt polling code.
> 
> The first two patches are concerned with exporting the generic kvm module
> parameter variables and accessing these from the powerpc specific code.
> 
> The third patch fixes a bug where changing the global max halt polling
> interval module parameter can sometimes have no effect.
> 
> The fourth patch decreases the default global max halt polling interval
> to something more sensible.
> 
> The fifth patch contains generic fixups with no functional effect.
> 
> The last patch adds halt polling documentation.
> 
> Suraj Jitindar Singh (6):
>   kvm: export kvm module parameter variables
>   powerpc/kvm: Use generic kvm module parameters in kvm-hv
>   powerpc/kvm: Add check for module parameter halt_poll_ns
>   powerpc/kvm: Decrease the powerpc default halt poll max value
>   powerpc/kvm: Comment style and print format fixups
>   doc/kvm: Add halt polling documentation
> 
>  Documentation/virtual/kvm/00-INDEX |   2 +
>  Documentation/virtual/kvm/halt-polling.txt | 127 
> +
>  arch/powerpc/include/asm/kvm_host.h|   2 +-
>  arch/powerpc/kvm/book3s_hv.c   |  33 ++--
>  arch/powerpc/kvm/trace_hv.h|   2 +-
>  include/linux/kvm_host.h   |   4 +
>  virt/kvm/kvm_main.c|   9 +-
>  7 files changed, 149 insertions(+), 30 deletions(-)
>  create mode 100644 Documentation/virtual/kvm/halt-polling.txt
> 
> -- 
> 2.5.5

Hi Suraj,

I've given this set a quick test and it seems to work fine. I used a repetitive
wakeup, using a nanosleep loop in guest userspace (with real time prio), and I
was able to cause halt polling to switch on and off as I adjusted halt_poll_ns.
I think the new default value is much better: halt polling started (e.g. CPU
utilization rose to 100%) once CPU utilization had already risen to about 75%.

Cheers,
Sam.

Tested-by: Sam Bobroff <sam.bobr...@au1.ibm.com>



Re: [PATCH 01/12] powerpc: Disable HFSCR:TM if TM not supported

2017-03-27 Thread Sam Bobroff
On Mon, Mar 20, 2017 at 05:49:03PM +1100, Benjamin Herrenschmidt wrote:
> Otherwise KVM guests might mess with it even when told not
> to causing bad thing interrupts in the host
> 
> Signed-off-by: Benjamin Herrenschmidt 

I've tested this on a P8, with a kernel and QEMU close to their
respective current master branches, and if:
* the host is configured without CONFIG_PPC_TRANSACTIONAL_MEM,
* and the guest is configured with CONFIG_PPC_TRANSACTIONAL_MEM,
* and the guest runs a program that uses HTM (in my tests, just a loop
  doing some floating point multiplies in a transaction)...

Without the patch the host will OOPS, usually in __kvmppc_vcore_entry,
and kill QEMU. On a busy host this is sometimes followed by "Oops: Bad
kernel stack pointer, sig: 6" and the host dies.

With the patch the userspace test program is killed with a SIGILL. The
guest and host are unaffected.

Cheers,
Sam.
> ---
>  arch/powerpc/kernel/setup_64.c | 10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
> index 9cfaa8b..b372b23 100644
> --- a/arch/powerpc/kernel/setup_64.c
> +++ b/arch/powerpc/kernel/setup_64.c
> @@ -236,6 +236,16 @@ static void cpu_ready_for_interrupts(void)
>   mtspr(SPRN_LPCR, lpcr | LPCR_AIL_3);
>   }
>  
> + /*
> +  * Fixup HFSCR:TM based on CPU features. The bit is set by our
> +  * early asm init because at that point we haven't updated our
> +  * CPU features from firmware and device-tree. Here we have,
> +  * so let's do it
> +  */
> + if (early_cpu_has_feature(CPU_FTR_HVMODE) &&
> + !early_cpu_has_feature(CPU_FTR_TM_COMP))
> + mtspr(SPRN_HFSCR, mfspr(SPRN_HFSCR) & ~HFSCR_TM);
> +
>   /* Set IR and DR in PACA MSR */
>   get_paca()->kernel_msr = MSR_KERNEL;
>  }
> -- 
> 2.9.3



[PATCH 1/1] selftests/powerpc: Improve tm-resched-dscr

2017-08-16 Thread Sam Bobroff
The tm-resched-dscr self test can, in some situations, run for
several minutes before being successfully interrupted by the context
switch it needs in order to perform the test. This often seems to
occur when the test is being run in a virtual machine.

Improve the test by running it under eat_cpu() to guarantee
contention for the CPU and increase the chance of a context switch.

In practice this seems to reduce the test time, in some cases, from
more than two minutes to under a second.

Also remove the "progress dots" so that if the test does run for a
long time, it doesn't produce large amounts of unnecessary output.

Signed-off-by: Sam Bobroff <sam.bobr...@au1.ibm.com>
---
 tools/testing/selftests/powerpc/tm/Makefile  |  1 +
 tools/testing/selftests/powerpc/tm/tm-resched-dscr.c | 12 
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/powerpc/tm/Makefile 
b/tools/testing/selftests/powerpc/tm/Makefile
index 958c11c14acd..7bfcd454fb2a 100644
--- a/tools/testing/selftests/powerpc/tm/Makefile
+++ b/tools/testing/selftests/powerpc/tm/Makefile
@@ -15,6 +15,7 @@ $(OUTPUT)/tm-syscall: tm-syscall-asm.S
 $(OUTPUT)/tm-syscall: CFLAGS += -I../../../../../usr/include
 $(OUTPUT)/tm-tmspr: CFLAGS += -pthread
 $(OUTPUT)/tm-vmx-unavail: CFLAGS += -pthread -m64
+$(OUTPUT)/tm-resched-dscr: ../pmu/lib.o
 
 SIGNAL_CONTEXT_CHK_TESTS := $(patsubst 
%,$(OUTPUT)/%,$(SIGNAL_CONTEXT_CHK_TESTS))
 $(SIGNAL_CONTEXT_CHK_TESTS): tm-signal.S
diff --git a/tools/testing/selftests/powerpc/tm/tm-resched-dscr.c 
b/tools/testing/selftests/powerpc/tm/tm-resched-dscr.c
index e79ccd6aada1..a7ac2e4c60d9 100644
--- a/tools/testing/selftests/powerpc/tm/tm-resched-dscr.c
+++ b/tools/testing/selftests/powerpc/tm/tm-resched-dscr.c
@@ -30,6 +30,7 @@
 
 #include "utils.h"
 #include "tm.h"
+#include "../pmu/lib.h"
 
 #define SPRN_DSCR   0x03
 
@@ -75,8 +76,6 @@ int test_body(void)
);
assert(rv); /* make sure the transaction aborted */
if ((texasr >> 56) != TM_CAUSE_RESCHED) {
-   putchar('.');
-   fflush(stdout);
continue;
}
if (dscr2 != dscr1) {
@@ -89,7 +88,12 @@ int test_body(void)
}
 }
 
-int main(void)
+static int tm_resched_dscr(void)
 {
-   return test_harness(test_body, "tm_resched_dscr");
+   return eat_cpu(test_body);
+}
+
+int main(int argc, const char *argv[])
+{
+   return test_harness(tm_resched_dscr, "tm_resched_dscr");
 }
-- 
2.14.1.2.g4274c698f



[PATCH 1/1] KVM: PPC: Book3S: Fix server always zero from kvmppc_xive_get_xive()

2017-09-26 Thread Sam Bobroff
In KVM's XICS-on-XIVE emulation, kvmppc_xive_get_xive() returns the
value of state->guest_server as "server". However, this value is not
set by it's counterpart kvmppc_xive_set_xive(). When the guest uses
this interface to migrate interrupts away from a CPU that is going
offline, it sees all interrupts as belonging to CPU 0, so they are
left assigned to (now) offline CPUs.

This patch removes the guest_server field from the state, and returns
act_server in it's place (that is, the CPU actually handling the
interrupt, which may differ from the one requested).

Fixes: 5af50993850a ("KVM: PPC: Book3S HV: Native usage of the XIVE
interrupt controller")
Cc: sta...@vger.kernel.org
Signed-off-by: Sam Bobroff <sam.bobr...@au1.ibm.com>
---
The other obvious way to patch this would be to set state->guest_server in
kvmppc_xive_set_xive() and that does also work because act_server is usually
equal to guest_server.

However, in the cases where guest_server differed from act_server, the guest
would only move IRQs correctly if it got act_server (the CPU actually handling
the interrupt) here. So, that approach seemed better.

 arch/powerpc/kvm/book3s_xive.c | 5 ++---
 arch/powerpc/kvm/book3s_xive.h | 1 -
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
index 13304622ab1c..bf457843e032 100644
--- a/arch/powerpc/kvm/book3s_xive.c
+++ b/arch/powerpc/kvm/book3s_xive.c
@@ -622,7 +622,7 @@ int kvmppc_xive_get_xive(struct kvm *kvm, u32 irq, u32 
*server,
return -EINVAL;
state = >irq_state[idx];
arch_spin_lock(>lock);
-   *server = state->guest_server;
+   *server = state->act_server;
*priority = state->guest_priority;
arch_spin_unlock(>lock);
 
@@ -1331,7 +1331,7 @@ static int xive_get_source(struct kvmppc_xive *xive, long 
irq, u64 addr)
xive->saved_src_count++;
 
/* Convert saved state into something compatible with xics */
-   val = state->guest_server;
+   val = state->act_server;
prio = state->saved_scan_prio;
 
if (prio == MASKED) {
@@ -1507,7 +1507,6 @@ static int xive_set_source(struct kvmppc_xive *xive, long 
irq, u64 addr)
/* First convert prio and mark interrupt as untargetted */
act_prio = xive_prio_from_guest(guest_prio);
state->act_priority = MASKED;
-   state->guest_server = server;
 
/*
 * We need to drop the lock due to the mutex below. Hopefully
diff --git a/arch/powerpc/kvm/book3s_xive.h b/arch/powerpc/kvm/book3s_xive.h
index 5938f7644dc1..6ba63f8e8a61 100644
--- a/arch/powerpc/kvm/book3s_xive.h
+++ b/arch/powerpc/kvm/book3s_xive.h
@@ -35,7 +35,6 @@ struct kvmppc_xive_irq_state {
struct xive_irq_data *pt_data;  /* XIVE Pass-through associated data */
 
/* Targetting as set by guest */
-   u32 guest_server;   /* Current guest selected target */
u8 guest_priority;  /* Guest set priority */
u8 saved_priority;  /* Saved priority when masking */
 
-- 
2.14.1.4.g334a7be4f



Re: [PATCH] powerpc/kernel: Change retrieval of pci_dn

2017-08-29 Thread Sam Bobroff
On Mon, Aug 28, 2017 at 11:05:03AM -0500, Bryant G. Ly wrote:
> For a PCI device it's pci_dn can be retrieved from
> pdev->dev.archdata.firmware_data, PCI_DN(devnode), or parent's list.
> Thus, we should just use the generic function pci_get_pdn_by_devfn
> to get the pci_dn.
> 
> Signed-off-by: Bryant G. Ly <b...@us.ibm.com>
Reviewed-by: Sam Bobroff <sam.bobr...@au1.ibm.com>

I don't know this area but I tested it using a patched kernel with the
old and new code together. My test kernel booted fine (in QEMU+KVM) and
I saw 26 reads and 4 writes, all of which got the same value with either
code block.

I also checked that the error result in the "not found" case is the same
as well, which it is, because rtas_{read,write}_config() will return
PCIBIOS_DEVICE_NOT_FOUND if given a NULL pdn.

So, looks good to me.

Cheers,
Sam.

> ---
>  arch/powerpc/kernel/rtas_pci.c | 30 ++
>  1 file changed, 2 insertions(+), 28 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/rtas_pci.c b/arch/powerpc/kernel/rtas_pci.c
> index 73f1934..c21e6c5 100644
> --- a/arch/powerpc/kernel/rtas_pci.c
> +++ b/arch/powerpc/kernel/rtas_pci.c
> @@ -91,25 +91,13 @@ static int rtas_pci_read_config(struct pci_bus *bus,
>   unsigned int devfn,
>   int where, int size, u32 *val)
>  {
> - struct device_node *busdn, *dn;
>   struct pci_dn *pdn;
> - bool found = false;
>   int ret;
>  
>   /* Search only direct children of the bus */
>   *val = 0x;
> - busdn = pci_bus_to_OF_node(bus);
> - for (dn = busdn->child; dn; dn = dn->sibling) {
> - pdn = PCI_DN(dn);
> - if (pdn && pdn->devfn == devfn
> - && of_device_is_available(dn)) {
> - found = true;
> - break;
> - }
> - }
>  
> - if (!found)
> - return PCIBIOS_DEVICE_NOT_FOUND;
> + pdn = pci_get_pdn_by_devfn(bus, devfn);
>  
>   ret = rtas_read_config(pdn, where, size, val);
>   if (*val == EEH_IO_ERROR_VALUE(size) &&
> @@ -153,23 +141,9 @@ static int rtas_pci_write_config(struct pci_bus *bus,
>unsigned int devfn,
>int where, int size, u32 val)
>  {
> - struct device_node *busdn, *dn;
>   struct pci_dn *pdn;
> - bool found = false;
> -
> - /* Search only direct children of the bus */
> - busdn = pci_bus_to_OF_node(bus);
> - for (dn = busdn->child; dn; dn = dn->sibling) {
> - pdn = PCI_DN(dn);
> - if (pdn && pdn->devfn == devfn
> - && of_device_is_available(dn)) {
> - found = true;
> - break;
> - }
> - }
>  
> - if (!found)
> - return PCIBIOS_DEVICE_NOT_FOUND;
> + pdn = pci_get_pdn_by_devfn(bus, devfn);
>  
>   return rtas_write_config(pdn, where, size, val);
>  }
> -- 
> 2.5.4 (Apple Git-61)



Re: [PATCH 13/13] powerpc/eeh: Refactor report functions

2018-05-06 Thread Sam Bobroff
On Thu, May 03, 2018 at 11:27:12PM +1000, Michael Ellerman wrote:
> Sam Bobroff <sbobr...@linux.ibm.com> writes:
> > diff --git a/arch/powerpc/kernel/eeh_driver.c 
> > b/arch/powerpc/kernel/eeh_driver.c
> > index eb4feee81ff4..1c4336dcf9f5 100644
> > --- a/arch/powerpc/kernel/eeh_driver.c
> > +++ b/arch/powerpc/kernel/eeh_driver.c
> > @@ -54,6 +54,25 @@ static int eeh_result_priority(enum pci_ers_result 
> > result)
> > }
> >  };
> >  
> > +const char *pci_ers_result_name(enum pci_ers_result r)
> > +{
> > +   switch (r) {
> > +   case PCI_ERS_RESULT_NONE: return "none";
> > +   case PCI_ERS_RESULT_CAN_RECOVER: return "can recover";
> > +   case PCI_ERS_RESULT_NEED_RESET: return "need reset";
> > +   case PCI_ERS_RESULT_DISCONNECT: return "disconnect";
> > +   case PCI_ERS_RESULT_RECOVERED: return "recovered";
> > +   case PCI_ERS_RESULT_NO_AER_DRIVER: return "no AER driver";
> > +   default:
> > +   WARN_ONCE(1, "Unknown result type");
> > +   return "unknown";
> > +   }
> > +};
> > +
> > +#define eeh_infoline(EDEV, FMT, ...) \
> > +pr_info("EEH: PE#%x (PCI %s): " pr_fmt(FMT) "\n", EDEV->pe_config_addr, \
> > +((EDEV->pdev) ? dev_name(>pdev->dev) : "NONE"), ##__VA_ARGS__)
> 
> Ooof, I guess.
> 
> It would be nicer as a function.

OK (I'd prefer to avoid macros too), but I'm not sure what kind of
function you mean. I initially tried to use a function, but the existing
pr_*() type macros seemed to be macros all the way down to printk, so there
didn't seem to be anywhere to hook into, and I ended up having to
open-code the varargs and allocate a buffer to print into.  Then it can
overflow etc. and it ended up seeming worse. Is there a better way I'm
missing here?

> "infoline" annoys me for some reason, "eeh_info" ?

OK. How about eeh_edev_info(), to indicate that it acts on an 'edev'?

> > @@ -223,123 +242,118 @@ static void eeh_set_irq_state(struct eeh_pe *root, 
> > bool enable)
> > }
> >  }
> >  
> > +static void eeh_pe_report(const char *name, struct eeh_pe *root,
> > + enum pci_ers_result (*fn)(struct eeh_dev *,
> > +   struct pci_driver *),
> > + enum pci_ers_result *result)
> > +{
> > +   struct eeh_pe *pe;
> > +   struct eeh_dev *edev, *tmp;
> > +   enum pci_ers_result new_result;
> > +
> > +   pr_info("EEH: Beginning: '%s'\n", name);
> > +   eeh_for_each_pe(root, pe) {
> > +   eeh_pe_for_each_dev(pe, edev, tmp) {
> 
> This should be in a separate function.

Oh, that's better. Will do!

> > +   device_lock(>pdev->dev);
> > +   if (eeh_edev_actionable(edev)) {
> > +   struct pci_driver *driver;
> > +
> > +   driver = eeh_pcid_get(edev->pdev);
> > +
> > +   if (!driver)
> > +   eeh_infoline(edev, "no driver");
> > +   else if (!driver->err_handler)
> > +   eeh_infoline(edev,
> > +"driver not EEH aware");
> > +   else if (edev->mode & EEH_DEV_NO_HANDLER)
> > +   eeh_infoline(edev,
> > +"driver bound too late");
> > +   else {
> > +   new_result = fn(edev, driver);
> > +   eeh_infoline(edev,
> > +   "%s driver reports: '%s'",
> > +   driver->name,
> > +   
> > pci_ers_result_name(new_result));
> > +   if (result)
> > +   *result = merge_result(*result,
> > +  
> > new_result);
> > +   }
> > +   if (driver)
> > +   eeh_pcid_put(edev->pdev);
> > +   } else {
> > +   eeh_infoline(edev, "not actionable (%d,%d,%d)",
> > +!!edev->pdev,
> > +!eeh_dev_removed(edev),
> > +!eeh_pe_passed(edev->pe));
> > +   }
> > +   device_unlock(>pdev->dev);
> 
> ^^^
> 
> 
> cheers

Thanks!


signature.asc
Description: PGP signature


Re: [PATCH 03/13] powerpc/eeh: Fix use-after-release of EEH driver

2018-05-06 Thread Sam Bobroff
On Fri, May 04, 2018 at 12:56:55PM +1000, Michael Ellerman wrote:
> Sam Bobroff <sbobr...@linux.ibm.com> writes:
> 
> > Correct two cases where eeh_pcid_get() is used to reference the driver's
> > module but the reference is dropped before the driver pointer is used.
> >
> > In eeh_rmv_device() also refactor a little so that only two calls to
> > eeh_pcid_put() are needed, rather than three and the reference isn't
> > taken at all if it wasn't needed.
> 
> This sounds like a crash or memory corruption bug?
> 
> But it doesn't have Fixes or Cc: stable. Is it not a major problem for
> some reason?

Only that I've exercised that code path a fair bit during testing and
never managed to cause a problem with it. I found it by inspection.

Do you think I should mark it fixes or stable in the next version?

> cheers
> 


signature.asc
Description: PGP signature


Re: [PATCH 02/13] powerpc/eeh: Add final message for successful recovery

2018-05-06 Thread Sam Bobroff
On Fri, May 04, 2018 at 12:55:42PM +1000, Michael Ellerman wrote:
> Sam Bobroff <sbobr...@linux.ibm.com> writes:
> 
> > Add a single log line at the end of successful EEH recovery, so that
> > it's clear that event processing has finished.
> >
> > Signed-off-by: Sam Bobroff <sbobr...@linux.ibm.com>
> > ---
> >  arch/powerpc/kernel/eeh_driver.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/arch/powerpc/kernel/eeh_driver.c 
> > b/arch/powerpc/kernel/eeh_driver.c
> > index 56a60b9eb397..07e0a42035ce 100644
> > --- a/arch/powerpc/kernel/eeh_driver.c
> > +++ b/arch/powerpc/kernel/eeh_driver.c
> > @@ -910,6 +910,7 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
> > pr_info("EEH: Notify device driver to resume\n");
> > eeh_pe_dev_traverse(pe, eeh_report_resume, NULL);
> >  
> > +   pr_info("EEH: Recovery successful.\n");
> 
> Is it possible for recovery for multiple devices to be interleaved?
> 
> Should that message include the device?

No, all the calls are serialized. They're only called from
eeh_event_handler() which is in the main loop of a kernel thread (see
eeh_event_init()).

> cheers
> 


signature.asc
Description: PGP signature


Re: [PATCH 02/13] powerpc/eeh: Add final message for successful recovery

2018-05-06 Thread Sam Bobroff
On Fri, May 04, 2018 at 04:08:28PM +1000, Russell Currey wrote:
> On Fri, 2018-05-04 at 12:55 +1000, Michael Ellerman wrote:
> > Sam Bobroff <sbobr...@linux.ibm.com> writes:
> > 
> > > Add a single log line at the end of successful EEH recovery, so
> > > that
> > > it's clear that event processing has finished.
> > > 
> > > Signed-off-by: Sam Bobroff <sbobr...@linux.ibm.com>
> > > ---
> > >  arch/powerpc/kernel/eeh_driver.c | 1 +
> > >  1 file changed, 1 insertion(+)
> > > 
> > > diff --git a/arch/powerpc/kernel/eeh_driver.c
> > > b/arch/powerpc/kernel/eeh_driver.c
> > > index 56a60b9eb397..07e0a42035ce 100644
> > > --- a/arch/powerpc/kernel/eeh_driver.c
> > > +++ b/arch/powerpc/kernel/eeh_driver.c
> > > @@ -910,6 +910,7 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
> > >   pr_info("EEH: Notify device driver to resume\n");
> > >   eeh_pe_dev_traverse(pe, eeh_report_resume, NULL);
> > >  
> > > + pr_info("EEH: Recovery successful.\n");
> > Is it possible for recovery for multiple devices to be interleaved?
> > 
> > Should that message include the device?
> 
> Pretty sure EEH will only process a single error at a time so this
> *should* always let you infer from context, but PHB and PE should
> probably be included anyway.  It'd be cool to move pe_{err/warn/info}()
> out of powernv for messages like this.

While we only process a single "event" at a time, that event covers the
initial PE, or possibly it's parent or PHB, and any dependent PEs so I
thought it could be misleading to show only a single one for that
message -- I wanted something to indicate that the whole "event" had
finished (since that was otherwise not really clear to me). Does that
seem reasonable?

I'm sorry but I don't understand what you mean by moving some functions
out of powernv, could you explain more?

> - Russell
> 
> > 
> > cheers
> 


signature.asc
Description: PGP signature


Re: [PATCH 11/13] powerpc/eeh: Introduce eeh_set_irq_state()

2018-05-07 Thread Sam Bobroff
On Fri, May 04, 2018 at 01:02:32PM +1000, Michael Ellerman wrote:
> Sam Bobroff <sbobr...@linux.ibm.com> writes:
> 
> > diff --git a/arch/powerpc/kernel/eeh_driver.c 
> > b/arch/powerpc/kernel/eeh_driver.c
> > index f63a01d336ee..b3edd0df04b8 100644
> > --- a/arch/powerpc/kernel/eeh_driver.c
> > +++ b/arch/powerpc/kernel/eeh_driver.c
> > @@ -210,6 +206,23 @@ static void eeh_set_channel_state(struct eeh_pe *root, 
> > enum pci_channel_state s)
> > edev->pdev->error_state = s;
> >  }
> >  
> > +static void eeh_set_irq_state(struct eeh_pe *root, bool enable)
> > +{
> > +   struct eeh_pe *pe;
> > +   struct eeh_dev *edev, *tmp;
> > +
> > +   eeh_for_each_pe(root, pe)
> > +   eeh_pe_for_each_dev(pe, edev, tmp)
> > +   if (eeh_edev_actionable(edev))
> > +   if (eeh_pcid_get(edev->pdev)) {
> > +   if (enable)
> > +   eeh_enable_irq(edev);
> > +   else
> > +   eeh_disable_irq(edev);
> > +   eeh_pcid_put(edev->pdev);
> > +   }
> 
> Yikes.
> 
> What about?
> 
>   eeh_for_each_pe(root, pe) {
>   eeh_pe_for_each_dev(pe, edev, tmp) {
>   if (!eeh_edev_actionable(edev))
>   continue;
> 
>   if (!eeh_pcid_get(edev->pdev))
>   continue;
> 
>   if (enable)
>   eeh_enable_irq(edev);
>   else
>   eeh_disable_irq(edev);
> 
>   eeh_pcid_put(edev->pdev);
>   }
>   }
> 
> cheers

Sure, will do.

Cheers,


signature.asc
Description: PGP signature


Re: [PATCH 07/13] powerpc/eeh: Clean up pci_ers_result handling

2018-05-07 Thread Sam Bobroff
On Fri, May 04, 2018 at 04:58:01PM +1000, Russell Currey wrote:
> On Wed, 2018-05-02 at 16:36 +1000, Sam Bobroff wrote:
> > As EEH event handling progresses, a cumulative result of type
> > pci_ers_result is built up by (some of) the eeh_report_*() functions
> > using either:
> > if (rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
> > if (*res == PCI_ERS_RESULT_NONE) *res = rc;
> > or:
> > if ((*res == PCI_ERS_RESULT_NONE) ||
> > (*res == PCI_ERS_RESULT_RECOVERED)) *res = rc;
> > if (*res == PCI_ERS_RESULT_DISCONNECT &&
> > rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
> > (Where *res is the accumulator.)
> > 
> > However, the intent is not immediately clear and the result in some
> > situations is order dependent.
> > 
> > Address this by assigning a priority to each result value, and always
> > merging to the highest priority. This renders the intent clear, and
> > provides a stable value for all orderings.
> > 
> > Signed-off-by: Sam Bobroff <sbobr...@linux.ibm.com>
> > ---
> >  arch/powerpc/kernel/eeh_driver.c | 36 ++--
> > 
> >  1 file changed, 26 insertions(+), 10 deletions(-)
> > 
> > diff --git a/arch/powerpc/kernel/eeh_driver.c
> > b/arch/powerpc/kernel/eeh_driver.c
> > index 188d15c4fe3a..f33dd68a9ca2 100644
> > --- a/arch/powerpc/kernel/eeh_driver.c
> > +++ b/arch/powerpc/kernel/eeh_driver.c
> > @@ -39,6 +39,29 @@ struct eeh_rmv_data {
> > int removed;
> >  };
> >  
> > +static int eeh_result_priority(enum pci_ers_result result)
> > +{
> > +   switch (result) {
> > +   case PCI_ERS_RESULT_NONE: return 0;
> > +   case PCI_ERS_RESULT_NO_AER_DRIVER: return 1;
> > +   case PCI_ERS_RESULT_RECOVERED: return 2;
> > +   case PCI_ERS_RESULT_CAN_RECOVER: return 3;
> > +   case PCI_ERS_RESULT_DISCONNECT: return 4;
> > +   case PCI_ERS_RESULT_NEED_RESET: return 5;
> > +   default:
> > +   WARN_ONCE(1, "Unknown pci_ers_result value");
> 
> Missing newline and also would be good to print the value was

Sounds good, will do! I'm not sure if it's mentioned elsewhere but I'll
fix the same issues in pci_ers_result_name() as well.

> > +   return 0;
> > +   }
> > +};
> > +
> > +static enum pci_ers_result merge_result(enum pci_ers_result old,
> > +   enum pci_ers_result new)
> 
> merge_result() sounds like something really generic, maybe call it
> pci_ers_merge_result() or something?

Sounds good, will do.

> Note: just learned that it stands for Error Recovery System and that's
> a thing (?)
> 
> > +{
> > +   if (eeh_result_priority(new) > eeh_result_priority(old))
> > +   return new;
> > +   return old;
> 
> MAX would be nicer as per mpe's suggestion

Sorry, I don't understand. The return value isn't MAX(new, old) so
how can MAX() do this?

> > +}
> > +
> >  /**
> >   * eeh_pcid_get - Get the PCI device driver
> >   * @pdev: PCI device
> > @@ -206,9 +229,7 @@ static void *eeh_report_error(struct eeh_dev
> > *edev, void *userdata)
> >  
> > rc = driver->err_handler->error_detected(dev,
> > pci_channel_io_frozen);
> >  
> > -   /* A driver that needs a reset trumps all others */
> > -   if (rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
> > -   if (*res == PCI_ERS_RESULT_NONE) *res = rc;
> > +   *res = merge_result(*res, rc);
> >  
> > edev->in_error = true;
> > pci_uevent_ers(dev, PCI_ERS_RESULT_NONE);
> > @@ -249,9 +270,7 @@ static void *eeh_report_mmio_enabled(struct
> > eeh_dev *edev, void *userdata)
> >  
> > rc = driver->err_handler->mmio_enabled(dev);
> >  
> > -   /* A driver that needs a reset trumps all others */
> > -   if (rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
> > -   if (*res == PCI_ERS_RESULT_NONE) *res = rc;
> > +   *res = merge_result(*res, rc);
> >  
> >  out:
> > eeh_pcid_put(dev);
> > @@ -294,10 +313,7 @@ static void *eeh_report_reset(struct eeh_dev
> > *edev, void *userdata)
> > goto out;
> >  
> > rc = driver->err_handler->slot_reset(dev);
> > -   if ((*res == PCI_ERS_RESULT_NONE) ||
> > -   (*res == PCI_ERS_RESULT_RECOVERED)) *res = rc;
> > -   if (*res == PCI_ERS_RESULT_DISCONNECT &&
> > -rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
> > +   *res = merge_result(*res, rc);
> >  
> >  out:
> > eeh_pcid_put(dev);
> 


signature.asc
Description: PGP signature


[PATCH v2 1/1] powerpc/pseries: fix EEH recovery of some IOV devices

2018-05-24 Thread Sam Bobroff
EEH recovery currently fails on pSeries for some IOV capable PCI
devices, if CONFIG_PCI_IOV is on and the hypervisor doesn't provide
certain device tree properties for the device. (Found on an IOV
capable device using the ipr driver.)

Recovery fails in pci_enable_resources() at the check on r->parent,
because r->flags is set and r->parent is not.  This state is due to
sriov_init() setting the start, end and flags members of the IOV BARs
but the parent not being set later in
pseries_pci_fixup_iov_resources(), because the
"ibm,open-sriov-vf-bar-info" property is missing.

Correct this by zeroing the resource flags for IOV BARs when they
can't be configured.

Signed-off-by: Sam Bobroff <sbobr...@linux.ibm.com>
---
Hi,

This is a fix to allow EEH recovery to succeed in a specific situation,
which I've tried to explain in the commit message.

As with the RFC version, the IOV BARs are disabled by setting the resource
flags to 0 but the other fields are now left as-is because that is what is done
elsewhere (see sriov_init() and __pci_read_base()).

I've also examined the concern raised by Bjorn Helgaas, that VFs could be
enabled later after the BARs are disabled, and it already seems safe: enabling
VFs (on pseries) depends on another device tree property,
"ibm,number-of-configurable-vfs" as well as support for the RTAS function
"ibm_map_pes". Since these are all part of the hypervisor's support for IOV it
seems unlikely that we would ever see some of them but not all. (None are
currently provided by QEMU/KVM.) (Additionally, the ipr driver on which the EEH
recovery failure was discovered doesn't even seem to have SR-IOV support so it
certainly can't enable VFs.)

Cheers,
Sam.
== v1 -> v2: ==

Patch 1/1: powerpc/pseries: fix EEH recovery of some IOV devices
* Moved the BAR disabling code to a function.
* Also check in pseries_pci_fixup_resources().

== v1: ==

Patch 1/1: powerpc/pseries: fix EEH recovery of IOV devices

 arch/powerpc/platforms/pseries/setup.c | 25 +
 1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/setup.c 
b/arch/powerpc/platforms/pseries/setup.c
index b55ad4286dc7..0a9e4243ae1d 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -645,6 +645,15 @@ void of_pci_parse_iov_addrs(struct pci_dev *dev, const int 
*indexes)
}
 }
 
+static void pseries_disable_sriov_resources(struct pci_dev *pdev)
+{
+   int i;
+
+   pci_warn(pdev, "No hypervisor support for SR-IOV on this device, IOV 
BARs disabled.\n");
+   for (i = 0; i < PCI_SRIOV_NUM_BARS; i++)
+   pdev->resource[i + PCI_IOV_RESOURCES].flags = 0;
+}
+
 static void pseries_pci_fixup_resources(struct pci_dev *pdev)
 {
const int *indexes;
@@ -652,10 +661,10 @@ static void pseries_pci_fixup_resources(struct pci_dev 
*pdev)
 
/*Firmware must support open sriov otherwise dont configure*/
indexes = of_get_property(dn, "ibm,open-sriov-vf-bar-info", NULL);
-   if (!indexes)
-   return;
-   /* Assign the addresses from device tree*/
-   of_pci_set_vf_bar_size(pdev, indexes);
+   if (indexes)
+   of_pci_set_vf_bar_size(pdev, indexes);
+   else
+   pseries_disable_sriov_resources(pdev);
 }
 
 static void pseries_pci_fixup_iov_resources(struct pci_dev *pdev)
@@ -667,10 +676,10 @@ static void pseries_pci_fixup_iov_resources(struct 
pci_dev *pdev)
return;
/*Firmware must support open sriov otherwise dont configure*/
indexes = of_get_property(dn, "ibm,open-sriov-vf-bar-info", NULL);
-   if (!indexes)
-   return;
-   /* Assign the addresses from device tree*/
-   of_pci_parse_iov_addrs(pdev, indexes);
+   if (indexes)
+   of_pci_parse_iov_addrs(pdev, indexes);
+   else
+   pseries_disable_sriov_resources(pdev);
 }
 
 static resource_size_t pseries_pci_iov_resource_alignment(struct pci_dev *pdev,
-- 
2.16.1.74.g9b0b1f47b



[PATCH v2 03/13] powerpc/eeh: Fix use-after-release of EEH driver

2018-05-24 Thread Sam Bobroff
Correct two cases where eeh_pcid_get() is used to reference the driver's
module but the reference is dropped before the driver pointer is used.

In eeh_rmv_device() also refactor a little so that only two calls to
eeh_pcid_put() are needed, rather than three and the reference isn't
taken at all if it wasn't needed.

Signed-off-by: Sam Bobroff <sbobr...@linux.ibm.com>
---
 arch/powerpc/kernel/eeh_driver.c | 28 
 1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 07e0a42035ce..54333f6c9d67 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -458,9 +458,11 @@ static void *eeh_add_virt_device(void *data, void 
*userdata)
 
driver = eeh_pcid_get(dev);
if (driver) {
-   eeh_pcid_put(dev);
-   if (driver->err_handler)
+   if (driver->err_handler) {
+   eeh_pcid_put(dev);
return NULL;
+   }
+   eeh_pcid_put(dev);
}
 
 #ifdef CONFIG_PCI_IOV
@@ -497,17 +499,19 @@ static void *eeh_rmv_device(void *data, void *userdata)
if (eeh_dev_removed(edev))
return NULL;
 
-   driver = eeh_pcid_get(dev);
-   if (driver) {
-   eeh_pcid_put(dev);
-   if (removed &&
-   eeh_pe_passed(edev->pe))
-   return NULL;
-   if (removed &&
-   driver->err_handler &&
-   driver->err_handler->error_detected &&
-   driver->err_handler->slot_reset)
+   if (removed) {
+   if (eeh_pe_passed(edev->pe))
return NULL;
+   driver = eeh_pcid_get(dev);
+   if (driver) {
+   if (driver->err_handler &&
+   driver->err_handler->error_detected &&
+   driver->err_handler->slot_reset) {
+   eeh_pcid_put(dev);
+   return NULL;
+   }
+   eeh_pcid_put(dev);
+   }
}
 
/* Remove it from PCI subsystem */
-- 
2.16.1.74.g9b0b1f47b



[PATCH v2 01/13] powerpc/eeh: Add eeh_max_freezes to initial EEH log line

2018-05-24 Thread Sam Bobroff
The current failure message includes the number of failures that have
occurred in the last hour (for a device) but it does not indicate
how many failures will be tolerated before the device is permanently
disabled.

Include the limit (eeh_max_freezes) to make this less surprising when
it happens.

Also remove the embedded newline from the existing message to make it
easier to grep for.

Signed-off-by: Sam Bobroff <sbobr...@linux.ibm.com>
---
 arch/powerpc/kernel/eeh_driver.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index b8a329f04814..56a60b9eb397 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -778,14 +778,13 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
eeh_pe_update_time_stamp(pe);
pe->freeze_count++;
if (pe->freeze_count > eeh_max_freezes) {
-   pr_err("EEH: PHB#%x-PE#%x has failed %d times in the\n"
-  "last hour and has been permanently disabled.\n",
+   pr_err("EEH: PHB#%x-PE#%x has failed %d times in the last hour 
and has been permanently disabled.\n",
   pe->phb->global_number, pe->addr,
   pe->freeze_count);
goto hard_fail;
}
-   pr_warn("EEH: This PCI device has failed %d times in the last hour\n",
-   pe->freeze_count);
+   pr_warn("EEH: This PCI device has failed %d times in the last hour and 
will be permanently disabled after %d failures.\n",
+   pe->freeze_count, eeh_max_freezes);
 
/* Walk the various device drivers attached to this slot through
 * a reset sequence, giving each an opportunity to do what it needs
-- 
2.16.1.74.g9b0b1f47b



[PATCH v2 00/13] EEH refactoring 2

2018-05-24 Thread Sam Bobroff
Hello everyone,

Here is a second, somewhat deeper, set of cleanups for the EEH code
(mostly eeh_drver.c).

These changes are not intended to significantly alter the actual processing,
but rather to improve the readability and maintainability of the code. They are
subjective by nature so I would appreciate comments and suggestions.

The earlier changes are mostly to support the last patch, where we're finally
able to use a common infrastructure for the reporting functions (basically
wrappers around the driver's handlers). This allows removal of a fair bit of
code, and the easy addition of some useful messaging which should make future
maintenance easier (as an example, a recent fix in this area "powerpc/eeh: Fix
race with driver un/bind" would have required adding two lines rather than
42+/26-).

Cheers,
Sam.

Patch set changelog follows:

== v1 -> v2: ==

Patch 1/13: powerpc/eeh: Add eeh_max_freezes to initial EEH log line

Patch 2/13: powerpc/eeh: Add final message for successful recovery

Patch 3/13: powerpc/eeh: Fix use-after-release of EEH driver

Patch 4/13: powerpc/eeh: Remove unused eeh_pcid_name()

Patch 5/13: powerpc/eeh: Strengthen types of eeh traversal functions

Patch 6/13: powerpc/eeh: Add message when PE processing at parent

Patch 7/13: powerpc/eeh: Clean up pci_ers_result handling
* Added the value, and missing newline, to some WARN()s.
* Improved name of merge_result() to pci_ers_merge_result().
* Adjusted the result priorities so that unknown doesn't overlap with _NONE.

Patch 8/13: powerpc/eeh: Introduce eeh_for_each_pe()

Patch 9/13: powerpc/eeh: Introduce eeh_edev_actionable()

Patch 10/13: powerpc/eeh: Introduce eeh_set_channel_state()

Patch 11/13: powerpc/eeh: Introduce eeh_set_irq_state()
* Reorganised eeh_set_irq_state() to reduce nesting depth.

Patch 12/13: powerpc/eeh: Cleaner handling of EEH_DEV_NO_HANDLER

Patch 13/13: powerpc/eeh: Refactor report functions
* Better name for eeh_infoline() and implement using a function.
* Move the core of eeh_pe_report() into a new function to improve readability.
* pci_ers_result_name(): match parameter name to eeh_result_priority(), correct 
and improve warning message.

== v1: ==

Patch 1/13: powerpc/eeh: Add eeh_max_freezes to initial EEH log line
Patch 2/13: powerpc/eeh: Add final message for successful recovery
Patch 3/13: powerpc/eeh: Fix use-after-release of EEH driver
Patch 4/13: powerpc/eeh: Remove unused eeh_pcid_name()
Patch 5/13: powerpc/eeh: Strengthen types of eeh traversal functions
Patch 6/13: powerpc/eeh: Add message when PE processing at parent
Patch 7/13: powerpc/eeh: Clean up pci_ers_result handling
Patch 8/13: powerpc/eeh: Introduce eeh_for_each_pe()
Patch 9/13: powerpc/eeh: Introduce eeh_edev_actionable()
Patch 10/13: powerpc/eeh: Introduce eeh_set_channel_state()
Patch 11/13: powerpc/eeh: Introduce eeh_set_irq_state()
Patch 12/13: powerpc/eeh: Cleaner handling of EEH_DEV_NO_HANDLER
Patch 13/13: powerpc/eeh: Refactor report functions

Sam Bobroff (13):
  powerpc/eeh: Add eeh_max_freezes to initial EEH log line
  powerpc/eeh: Add final message for successful recovery
  powerpc/eeh: Fix use-after-release of EEH driver
  powerpc/eeh: Remove unused eeh_pcid_name()
  powerpc/eeh: Strengthen types of eeh traversal functions
  powerpc/eeh: Add message when PE processing at parent
  powerpc/eeh: Clean up pci_ers_result handling
  powerpc/eeh: Introduce eeh_for_each_pe()
  powerpc/eeh: Introduce eeh_edev_actionable()
  powerpc/eeh: Introduce eeh_set_channel_state()
  powerpc/eeh: Introduce eeh_set_irq_state()
  powerpc/eeh: Cleaner handling of EEH_DEV_NO_HANDLER
  powerpc/eeh: Refactor report functions

 arch/powerpc/include/asm/eeh.h   |  11 +-
 arch/powerpc/kernel/eeh.c|  19 +-
 arch/powerpc/kernel/eeh_driver.c | 493 +--
 arch/powerpc/kernel/eeh_pe.c |  26 +--
 4 files changed, 294 insertions(+), 255 deletions(-)

-- 
2.16.1.74.g9b0b1f47b



[PATCH v2 09/13] powerpc/eeh: Introduce eeh_edev_actionable()

2018-05-24 Thread Sam Bobroff
The same test is done in every EEH report function, so factor it out.

Since eeh_dev_removed() needs to be moved higher up in the file,
simplify it a little while we're at it.

Signed-off-by: Sam Bobroff <sbobr...@linux.ibm.com>
---
 arch/powerpc/kernel/eeh_driver.c | 30 --
 1 file changed, 16 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 2d3cac584899..30898866418b 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -62,6 +62,17 @@ static enum pci_ers_result pci_ers_merge_result(enum 
pci_ers_result old,
return old;
 }
 
+static bool eeh_dev_removed(struct eeh_dev *edev)
+{
+   return !edev || (edev->mode & EEH_DEV_REMOVED);
+}
+
+static bool eeh_edev_actionable(struct eeh_dev *edev)
+{
+   return (edev->pdev && !eeh_dev_removed(edev) &&
+   !eeh_pe_passed(edev->pe));
+}
+
 /**
  * eeh_pcid_get - Get the PCI device driver
  * @pdev: PCI device
@@ -163,15 +174,6 @@ static void eeh_enable_irq(struct pci_dev *dev)
}
 }
 
-static bool eeh_dev_removed(struct eeh_dev *edev)
-{
-   /* EEH device removed ? */
-   if (!edev || (edev->mode & EEH_DEV_REMOVED))
-   return true;
-
-   return false;
-}
-
 static void *eeh_dev_save_state(struct eeh_dev *edev, void *userdata)
 {
struct pci_dev *pdev;
@@ -212,7 +214,7 @@ static void *eeh_report_error(struct eeh_dev *edev, void 
*userdata)
enum pci_ers_result rc, *res = userdata;
struct pci_driver *driver;
 
-   if (!dev || eeh_dev_removed(edev) || eeh_pe_passed(edev->pe))
+   if (!eeh_edev_actionable(edev))
return NULL;
 
device_lock(>dev);
@@ -256,7 +258,7 @@ static void *eeh_report_mmio_enabled(struct eeh_dev *edev, 
void *userdata)
enum pci_ers_result rc, *res = userdata;
struct pci_driver *driver;
 
-   if (!dev || eeh_dev_removed(edev) || eeh_pe_passed(edev->pe))
+   if (!eeh_edev_actionable(edev))
return NULL;
 
device_lock(>dev);
@@ -295,7 +297,7 @@ static void *eeh_report_reset(struct eeh_dev *edev, void 
*userdata)
enum pci_ers_result rc, *res = userdata;
struct pci_driver *driver;
 
-   if (!dev || eeh_dev_removed(edev) || eeh_pe_passed(edev->pe))
+   if (!eeh_edev_actionable(edev))
return NULL;
 
device_lock(>dev);
@@ -365,7 +367,7 @@ static void *eeh_report_resume(struct eeh_dev *edev, void 
*userdata)
bool was_in_error;
struct pci_driver *driver;
 
-   if (!dev || eeh_dev_removed(edev) || eeh_pe_passed(edev->pe))
+   if (!eeh_edev_actionable(edev))
return NULL;
 
device_lock(>dev);
@@ -412,7 +414,7 @@ static void *eeh_report_failure(struct eeh_dev *edev, void 
*userdata)
struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
struct pci_driver *driver;
 
-   if (!dev || eeh_dev_removed(edev) || eeh_pe_passed(edev->pe))
+   if (!eeh_edev_actionable(edev))
return NULL;
 
device_lock(>dev);
-- 
2.16.1.74.g9b0b1f47b



[PATCH v2 13/13] powerpc/eeh: Refactor report functions

2018-05-24 Thread Sam Bobroff
The EEH report functions now share a fair bit of code around the start
and end of each function.

So factor out as much as possible, and move the traversal into a
custom function. This also allows accurate debug to be generated more
easily.

Signed-off-by: Sam Bobroff <sbobr...@linux.ibm.com>
---
== v1 -> v2: ==
* Better name for eeh_infoline() and implement using a function.
* Move the core of eeh_pe_report() into a new function to improve readability.
* pci_ers_result_name(): match parameter name to eeh_result_priority(), correct 
and improve warning message.

 arch/powerpc/kernel/eeh_driver.c | 310 ---
 1 file changed, 159 insertions(+), 151 deletions(-)

diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 42fe80ed6f59..7cab58abbec9 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -54,6 +54,40 @@ static int eeh_result_priority(enum pci_ers_result result)
}
 };
 
+const char *pci_ers_result_name(enum pci_ers_result result)
+{
+   switch (result) {
+   case PCI_ERS_RESULT_NONE: return "none";
+   case PCI_ERS_RESULT_CAN_RECOVER: return "can recover";
+   case PCI_ERS_RESULT_NEED_RESET: return "need reset";
+   case PCI_ERS_RESULT_DISCONNECT: return "disconnect";
+   case PCI_ERS_RESULT_RECOVERED: return "recovered";
+   case PCI_ERS_RESULT_NO_AER_DRIVER: return "no AER driver";
+   default:
+   WARN_ONCE(1, "Unknown result type: %d\n", (int)result);
+   return "unknown";
+   }
+};
+
+static __printf(2, 3)
+void eeh_edev_info(const struct eeh_dev *edev, const char *fmt, ...)
+{
+   struct va_format vaf;
+   va_list args;
+
+   va_start(args, fmt);
+
+   vaf.fmt = fmt;
+   vaf.va = 
+
+   printk(KERN_INFO "EEH: PE#%x (PCI %s): %pV\n",
+   edev->pe_config_addr,
+   edev->pdev ? dev_name(>pdev->dev) : "none",
+   );
+
+   va_end(args);
+}
+
 static enum pci_ers_result pci_ers_merge_result(enum pci_ers_result old,
enum pci_ers_result new)
 {
@@ -229,123 +263,124 @@ static void eeh_set_irq_state(struct eeh_pe *root, bool 
enable)
}
 }
 
-/**
- * eeh_report_error - Report pci error to each device driver
- * @data: eeh device
- * @userdata: return value
- *
- * Report an EEH error to each device driver, collect up and
- * merge the device driver responses. Cumulative response
- * passed back in "userdata".
- */
-static void *eeh_report_error(struct eeh_dev *edev, void *userdata)
+typedef enum pci_ers_result (*eeh_report_fn)(struct eeh_dev *,
+struct pci_driver *);
+static void eeh_pe_report_edev(struct eeh_dev *edev,
+ eeh_report_fn fn,
+ enum pci_ers_result *result)
 {
-   struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
-   enum pci_ers_result rc, *res = userdata;
struct pci_driver *driver;
+   enum pci_ers_result new_result;
+
+   device_lock(>pdev->dev);
+   if (eeh_edev_actionable(edev)) {
+   driver = eeh_pcid_get(edev->pdev);
+
+   if (!driver)
+   eeh_edev_info(edev, "no driver");
+   else if (!driver->err_handler)
+   eeh_edev_info(edev,
+"driver not EEH aware");
+   else if (edev->mode & EEH_DEV_NO_HANDLER)
+   eeh_edev_info(edev,
+"driver bound too late");
+   else {
+   new_result = fn(edev, driver);
+   eeh_edev_info(edev,
+   "%s driver reports: '%s'",
+   driver->name,
+   pci_ers_result_name(new_result));
+   if (result)
+   *result = pci_ers_merge_result(*result,
+  new_result);
+   }
+   if (driver)
+   eeh_pcid_put(edev->pdev);
+   } else {
+   eeh_edev_info(edev, "not actionable (%d,%d,%d)",
+!!edev->pdev,
+!eeh_dev_removed(edev),
+!eeh_pe_passed(edev->pe));
+   }
+   device_unlock(>pdev->dev);
+}
 
-   if (!eeh_edev_actionable(edev))
-   return NULL;
+static void eeh_pe_report(const char *name, struct eeh_pe *root,
+ eeh_report_fn fn,
+ enum pci_ers_result *result)
+{
+

[PATCH v2 02/13] powerpc/eeh: Add final message for successful recovery

2018-05-24 Thread Sam Bobroff
Add a single log line at the end of successful EEH recovery, so that
it's clear that event processing has finished.

Signed-off-by: Sam Bobroff <sbobr...@linux.ibm.com>
---
 arch/powerpc/kernel/eeh_driver.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 56a60b9eb397..07e0a42035ce 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -910,6 +910,7 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
pr_info("EEH: Notify device driver to resume\n");
eeh_pe_dev_traverse(pe, eeh_report_resume, NULL);
 
+   pr_info("EEH: Recovery successful.\n");
goto final;
 
 hard_fail:
-- 
2.16.1.74.g9b0b1f47b



[PATCH v2 04/13] powerpc/eeh: Remove unused eeh_pcid_name()

2018-05-24 Thread Sam Bobroff
Signed-off-by: Sam Bobroff <sbobr...@linux.ibm.com>
---
 arch/powerpc/kernel/eeh_driver.c | 14 --
 1 file changed, 14 deletions(-)

diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 54333f6c9d67..ca9a73fe9cc5 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -39,20 +39,6 @@ struct eeh_rmv_data {
int removed;
 };
 
-/**
- * eeh_pcid_name - Retrieve name of PCI device driver
- * @pdev: PCI device
- *
- * This routine is used to retrieve the name of PCI device driver
- * if that's valid.
- */
-static inline const char *eeh_pcid_name(struct pci_dev *pdev)
-{
-   if (pdev && pdev->dev.driver)
-   return pdev->dev.driver->name;
-   return "";
-}
-
 /**
  * eeh_pcid_get - Get the PCI device driver
  * @pdev: PCI device
-- 
2.16.1.74.g9b0b1f47b



[PATCH v2 06/13] powerpc/eeh: Add message when PE processing at parent

2018-05-24 Thread Sam Bobroff
To aid debugging, add a message to show when EEH processing for a PE
will be done at the device's parent, rather than directly at the
device.

Signed-off-by: Sam Bobroff <sbobr...@linux.ibm.com>
---
 arch/powerpc/kernel/eeh.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index f82dade4fb9a..1139821a9aec 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -541,8 +541,12 @@ int eeh_dev_check_failure(struct eeh_dev *edev)
 
/* Frozen parent PE ? */
ret = eeh_ops->get_state(parent_pe, NULL);
-   if (ret > 0 && !eeh_state_active(ret))
+   if (ret > 0 && !eeh_state_active(ret)) {
pe = parent_pe;
+   pr_err("EEH: Failure of PHB#%x-PE#%x will be handled at 
parent PHB#%x-PE#%x.\n",
+  pe->phb->global_number, pe->addr,
+  pe->phb->global_number, parent_pe->addr);
+   }
 
/* Next parent level */
parent_pe = parent_pe->parent;
-- 
2.16.1.74.g9b0b1f47b



[PATCH v2 12/13] powerpc/eeh: Cleaner handling of EEH_DEV_NO_HANDLER

2018-05-24 Thread Sam Bobroff
If a device without a driver is recovered via EEH, the flag
EEH_DEV_NO_HANDLER is incorrectly left set on the device after
recovery, because the test in eeh_report_resume() for the existence of
a bound driver is done before the flag is cleared. If a driver is
later bound, and EEH experienced again, some of the drivers EEH
handers are not called.

To correct this, clear the flag unconditionally after EEH processing
is complete.

Signed-off-by: Sam Bobroff <sbobr...@linux.ibm.com>
---
 arch/powerpc/kernel/eeh_driver.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 8dd2a33424a1..42fe80ed6f59 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -405,7 +405,6 @@ static void *eeh_report_resume(struct eeh_dev *edev, void 
*userdata)
if (!driver->err_handler ||
!driver->err_handler->resume ||
(edev->mode & EEH_DEV_NO_HANDLER) || !was_in_error) {
-   edev->mode &= ~EEH_DEV_NO_HANDLER;
goto out;
}
 
@@ -780,6 +779,7 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 {
struct pci_bus *bus;
struct eeh_dev *edev, *tmp;
+   struct eeh_pe *tmp_pe;
int rc = 0;
enum pci_ers_result result = PCI_ERS_RESULT_NONE;
struct eeh_rmv_data rmv_data = {LIST_HEAD_INIT(rmv_data.edev_list), 0};
@@ -934,6 +934,10 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
eeh_set_irq_state(pe, true);
eeh_pe_dev_traverse(pe, eeh_report_resume, NULL);
 
+   eeh_for_each_pe(pe, tmp_pe)
+   eeh_pe_for_each_dev(tmp_pe, edev, tmp)
+   edev->mode &= ~EEH_DEV_NO_HANDLER;
+
pr_info("EEH: Recovery successful.\n");
goto final;
 
-- 
2.16.1.74.g9b0b1f47b



[PATCH v2 10/13] powerpc/eeh: Introduce eeh_set_channel_state()

2018-05-24 Thread Sam Bobroff
To ease future refactoring, extract setting of the channel state
from the report functions out into their own functions. This increases
the amount of code that is identical across all of the report
functions.

Signed-off-by: Sam Bobroff <sbobr...@linux.ibm.com>
---
 arch/powerpc/kernel/eeh_driver.c | 19 ---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 30898866418b..76951ca701b2 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -199,6 +199,17 @@ static void *eeh_dev_save_state(struct eeh_dev *edev, void 
*userdata)
return NULL;
 }
 
+static void eeh_set_channel_state(struct eeh_pe *root, enum pci_channel_state 
s)
+{
+   struct eeh_pe *pe;
+   struct eeh_dev *edev, *tmp;
+
+   eeh_for_each_pe(root, pe)
+   eeh_pe_for_each_dev(pe, edev, tmp)
+   if (eeh_edev_actionable(edev))
+   edev->pdev->error_state = s;
+}
+
 /**
  * eeh_report_error - Report pci error to each device driver
  * @data: eeh device
@@ -218,7 +229,6 @@ static void *eeh_report_error(struct eeh_dev *edev, void 
*userdata)
return NULL;
 
device_lock(>dev);
-   dev->error_state = pci_channel_io_frozen;
 
driver = eeh_pcid_get(dev);
if (!driver) goto out_no_dev;
@@ -301,7 +311,6 @@ static void *eeh_report_reset(struct eeh_dev *edev, void 
*userdata)
return NULL;
 
device_lock(>dev);
-   dev->error_state = pci_channel_io_normal;
 
driver = eeh_pcid_get(dev);
if (!driver) goto out_no_dev;
@@ -371,7 +380,6 @@ static void *eeh_report_resume(struct eeh_dev *edev, void 
*userdata)
return NULL;
 
device_lock(>dev);
-   dev->error_state = pci_channel_io_normal;
 
driver = eeh_pcid_get(dev);
if (!driver) goto out_no_dev;
@@ -795,6 +803,7 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 * hotplug for this case.
 */
pr_info("EEH: Notify device drivers to shutdown\n");
+   eeh_set_channel_state(pe, pci_channel_io_frozen);
eeh_pe_dev_traverse(pe, eeh_report_error, );
if ((pe->type & EEH_PE_PHB) &&
result != PCI_ERS_RESULT_NONE &&
@@ -885,6 +894,7 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
pr_info("EEH: Notify device drivers "
"the completion of reset\n");
result = PCI_ERS_RESULT_NONE;
+   eeh_set_channel_state(pe, pci_channel_io_normal);
eeh_pe_dev_traverse(pe, eeh_report_reset, );
}
 
@@ -906,6 +916,7 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 
/* Tell all device drivers that they can resume operations */
pr_info("EEH: Notify device driver to resume\n");
+   eeh_set_channel_state(pe, pci_channel_io_normal);
eeh_pe_dev_traverse(pe, eeh_report_resume, NULL);
 
pr_info("EEH: Recovery successful.\n");
@@ -924,6 +935,7 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
eeh_slot_error_detail(pe, EEH_LOG_PERM);
 
/* Notify all devices that they're about to go down. */
+   eeh_set_channel_state(pe, pci_channel_io_perm_failure);
eeh_pe_dev_traverse(pe, eeh_report_failure, NULL);
 
/* Mark the PE to be removed permanently */
@@ -1033,6 +1045,7 @@ void eeh_handle_special_event(void)
 
/* Notify all devices to be down */
eeh_pe_state_clear(pe, EEH_PE_PRI_BUS);
+   eeh_set_channel_state(pe, 
pci_channel_io_perm_failure);
eeh_pe_dev_traverse(pe,
eeh_report_failure, NULL);
bus = eeh_pe_bus_get(phb_pe);
-- 
2.16.1.74.g9b0b1f47b



[PATCH v2 05/13] powerpc/eeh: Strengthen types of eeh traversal functions

2018-05-24 Thread Sam Bobroff
The traversal functions eeh_pe_traverse() and eeh_pe_dev_traverse()
both provide their first argument as void * but every single user casts
it to the expected type.

Change the type of the first parameter from void * to the appropriate
type, and clean up all uses.

Signed-off-by: Sam Bobroff <sbobr...@linux.ibm.com>
---
 arch/powerpc/include/asm/eeh.h   |  7 ---
 arch/powerpc/kernel/eeh.c| 13 +
 arch/powerpc/kernel/eeh_driver.c | 30 ++
 arch/powerpc/kernel/eeh_pe.c | 19 +++
 4 files changed, 26 insertions(+), 43 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index c2266ca61853..f02e0400e6f2 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -262,7 +262,8 @@ static inline bool eeh_state_active(int state)
== (EEH_STATE_MMIO_ACTIVE | EEH_STATE_DMA_ACTIVE);
 }
 
-typedef void *(*eeh_traverse_func)(void *data, void *flag);
+typedef void *(*eeh_edev_traverse_func)(struct eeh_dev *edev, void *flag);
+typedef void *(*eeh_pe_traverse_func)(struct eeh_pe *pe, void *flag);
 void eeh_set_pe_aux_size(int size);
 int eeh_phb_pe_create(struct pci_controller *phb);
 struct eeh_pe *eeh_phb_pe_get(struct pci_controller *phb);
@@ -272,9 +273,9 @@ int eeh_add_to_parent_pe(struct eeh_dev *edev);
 int eeh_rmv_from_parent_pe(struct eeh_dev *edev);
 void eeh_pe_update_time_stamp(struct eeh_pe *pe);
 void *eeh_pe_traverse(struct eeh_pe *root,
-   eeh_traverse_func fn, void *flag);
+ eeh_pe_traverse_func fn, void *flag);
 void *eeh_pe_dev_traverse(struct eeh_pe *root,
-   eeh_traverse_func fn, void *flag);
+ eeh_edev_traverse_func fn, void *flag);
 void eeh_pe_restore_bars(struct eeh_pe *pe);
 const char *eeh_pe_loc_get(struct eeh_pe *pe);
 struct pci_bus *eeh_pe_bus_get(struct eeh_pe *pe);
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index bc640e4c5ca5..f82dade4fb9a 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -263,9 +263,8 @@ static size_t eeh_dump_dev_log(struct eeh_dev *edev, char 
*buf, size_t len)
return n;
 }
 
-static void *eeh_dump_pe_log(void *data, void *flag)
+static void *eeh_dump_pe_log(struct eeh_pe *pe, void *flag)
 {
-   struct eeh_pe *pe = data;
struct eeh_dev *edev, *tmp;
size_t *plen = flag;
 
@@ -686,9 +685,9 @@ int eeh_pci_enable(struct eeh_pe *pe, int function)
return rc;
 }
 
-static void *eeh_disable_and_save_dev_state(void *data, void *userdata)
+static void *eeh_disable_and_save_dev_state(struct eeh_dev *edev,
+   void *userdata)
 {
-   struct eeh_dev *edev = data;
struct pci_dev *pdev = eeh_dev_to_pci_dev(edev);
struct pci_dev *dev = userdata;
 
@@ -714,9 +713,8 @@ static void *eeh_disable_and_save_dev_state(void *data, 
void *userdata)
return NULL;
 }
 
-static void *eeh_restore_dev_state(void *data, void *userdata)
+static void *eeh_restore_dev_state(struct eeh_dev *edev, void *userdata)
 {
-   struct eeh_dev *edev = data;
struct pci_dn *pdn = eeh_dev_to_pdn(edev);
struct pci_dev *pdev = eeh_dev_to_pci_dev(edev);
struct pci_dev *dev = userdata;
@@ -856,11 +854,10 @@ int pcibios_set_pcie_reset_state(struct pci_dev *dev, 
enum pcie_reset_state stat
  * the indicated device and its children so that the bunch of the
  * devices could be reset properly.
  */
-static void *eeh_set_dev_freset(void *data, void *flag)
+static void *eeh_set_dev_freset(struct eeh_dev *edev, void *flag)
 {
struct pci_dev *dev;
unsigned int *freset = (unsigned int *)flag;
-   struct eeh_dev *edev = (struct eeh_dev *)data;
 
dev = eeh_dev_to_pci_dev(edev);
if (dev)
diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index ca9a73fe9cc5..188d15c4fe3a 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -149,9 +149,8 @@ static bool eeh_dev_removed(struct eeh_dev *edev)
return false;
 }
 
-static void *eeh_dev_save_state(void *data, void *userdata)
+static void *eeh_dev_save_state(struct eeh_dev *edev, void *userdata)
 {
-   struct eeh_dev *edev = data;
struct pci_dev *pdev;
 
if (!edev)
@@ -184,9 +183,8 @@ static void *eeh_dev_save_state(void *data, void *userdata)
  * merge the device driver responses. Cumulative response
  * passed back in "userdata".
  */
-static void *eeh_report_error(void *data, void *userdata)
+static void *eeh_report_error(struct eeh_dev *edev, void *userdata)
 {
-   struct eeh_dev *edev = (struct eeh_dev *)data;
struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
enum pci_ers_result rc, *res = userdata;
struct pci_driver *driver;
@@ -231,9 +229,8 @@ static void *eeh_report_error(void *data, void *userdata)
  * are now enabled. Collects up and m

[PATCH v2 08/13] powerpc/eeh: Introduce eeh_for_each_pe()

2018-05-24 Thread Sam Bobroff
Add a for_each-style macro for iterating through PEs without the
boilerplate required by a traversal function. eeh_pe_next() is now
exported, as it is now used directly in place.

Signed-off-by: Sam Bobroff <sbobr...@linux.ibm.com>
---
 arch/powerpc/include/asm/eeh.h | 4 
 arch/powerpc/kernel/eeh_pe.c   | 7 +++
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index f02e0400e6f2..677102baf3cd 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -106,6 +106,9 @@ struct eeh_pe {
 #define eeh_pe_for_each_dev(pe, edev, tmp) \
list_for_each_entry_safe(edev, tmp, >edevs, list)
 
+#define eeh_for_each_pe(root, pe) \
+   for (pe = root; pe; pe = eeh_pe_next(pe, root))
+
 static inline bool eeh_pe_passed(struct eeh_pe *pe)
 {
return pe ? !!atomic_read(>pass_dev_cnt) : false;
@@ -267,6 +270,7 @@ typedef void *(*eeh_pe_traverse_func)(struct eeh_pe *pe, 
void *flag);
 void eeh_set_pe_aux_size(int size);
 int eeh_phb_pe_create(struct pci_controller *phb);
 struct eeh_pe *eeh_phb_pe_get(struct pci_controller *phb);
+struct eeh_pe *eeh_pe_next(struct eeh_pe *pe, struct eeh_pe *root);
 struct eeh_pe *eeh_pe_get(struct pci_controller *phb,
  int pe_no, int config_addr);
 int eeh_add_to_parent_pe(struct eeh_dev *edev);
diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
index 38a4bcd8ed13..1b238ecc553e 100644
--- a/arch/powerpc/kernel/eeh_pe.c
+++ b/arch/powerpc/kernel/eeh_pe.c
@@ -142,8 +142,7 @@ struct eeh_pe *eeh_phb_pe_get(struct pci_controller *phb)
  * The function is used to retrieve the next PE in the
  * hierarchy PE tree.
  */
-static struct eeh_pe *eeh_pe_next(struct eeh_pe *pe,
- struct eeh_pe *root)
+struct eeh_pe *eeh_pe_next(struct eeh_pe *pe, struct eeh_pe *root)
 {
struct list_head *next = pe->child_list.next;
 
@@ -178,7 +177,7 @@ void *eeh_pe_traverse(struct eeh_pe *root,
struct eeh_pe *pe;
void *ret;
 
-   for (pe = root; pe; pe = eeh_pe_next(pe, root)) {
+   eeh_for_each_pe(root, pe) {
ret = fn(pe, flag);
if (ret) return ret;
}
@@ -209,7 +208,7 @@ void *eeh_pe_dev_traverse(struct eeh_pe *root,
}
 
/* Traverse root PE */
-   for (pe = root; pe; pe = eeh_pe_next(pe, root)) {
+   eeh_for_each_pe(root, pe) {
eeh_pe_for_each_dev(pe, edev, tmp) {
ret = fn(edev, flag);
if (ret)
-- 
2.16.1.74.g9b0b1f47b



[PATCH v2 07/13] powerpc/eeh: Clean up pci_ers_result handling

2018-05-24 Thread Sam Bobroff
As EEH event handling progresses, a cumulative result of type
pci_ers_result is built up by (some of) the eeh_report_*() functions
using either:
if (rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
if (*res == PCI_ERS_RESULT_NONE) *res = rc;
or:
if ((*res == PCI_ERS_RESULT_NONE) ||
(*res == PCI_ERS_RESULT_RECOVERED)) *res = rc;
if (*res == PCI_ERS_RESULT_DISCONNECT &&
rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
(Where *res is the accumulator.)

However, the intent is not immediately clear and the result in some
situations is order dependent.

Address this by assigning a priority to each result value, and always
merging to the highest priority. This renders the intent clear, and
provides a stable value for all orderings.

Signed-off-by: Sam Bobroff <sbobr...@linux.ibm.com>
---
== v1 -> v2: ==
* Added the value, and missing newline, to some WARN()s.
* Improved name of merge_result() to pci_ers_merge_result().
* Adjusted the result priorities so that unknown doesn't overlap with _NONE.

 arch/powerpc/kernel/eeh_driver.c | 36 ++--
 1 file changed, 26 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 188d15c4fe3a..2d3cac584899 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -39,6 +39,29 @@ struct eeh_rmv_data {
int removed;
 };
 
+static int eeh_result_priority(enum pci_ers_result result)
+{
+   switch (result) {
+   case PCI_ERS_RESULT_NONE: return 1;
+   case PCI_ERS_RESULT_NO_AER_DRIVER: return 2;
+   case PCI_ERS_RESULT_RECOVERED: return 3;
+   case PCI_ERS_RESULT_CAN_RECOVER: return 4;
+   case PCI_ERS_RESULT_DISCONNECT: return 5;
+   case PCI_ERS_RESULT_NEED_RESET: return 6;
+   default:
+   WARN_ONCE(1, "Unknown pci_ers_result value: %d\n", (int)result);
+   return 0;
+   }
+};
+
+static enum pci_ers_result pci_ers_merge_result(enum pci_ers_result old,
+   enum pci_ers_result new)
+{
+   if (eeh_result_priority(new) > eeh_result_priority(old))
+   return new;
+   return old;
+}
+
 /**
  * eeh_pcid_get - Get the PCI device driver
  * @pdev: PCI device
@@ -206,9 +229,7 @@ static void *eeh_report_error(struct eeh_dev *edev, void 
*userdata)
 
rc = driver->err_handler->error_detected(dev, pci_channel_io_frozen);
 
-   /* A driver that needs a reset trumps all others */
-   if (rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
-   if (*res == PCI_ERS_RESULT_NONE) *res = rc;
+   *res = pci_ers_merge_result(*res, rc);
 
edev->in_error = true;
pci_uevent_ers(dev, PCI_ERS_RESULT_NONE);
@@ -249,9 +270,7 @@ static void *eeh_report_mmio_enabled(struct eeh_dev *edev, 
void *userdata)
 
rc = driver->err_handler->mmio_enabled(dev);
 
-   /* A driver that needs a reset trumps all others */
-   if (rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
-   if (*res == PCI_ERS_RESULT_NONE) *res = rc;
+   *res = pci_ers_merge_result(*res, rc);
 
 out:
eeh_pcid_put(dev);
@@ -294,10 +313,7 @@ static void *eeh_report_reset(struct eeh_dev *edev, void 
*userdata)
goto out;
 
rc = driver->err_handler->slot_reset(dev);
-   if ((*res == PCI_ERS_RESULT_NONE) ||
-   (*res == PCI_ERS_RESULT_RECOVERED)) *res = rc;
-   if (*res == PCI_ERS_RESULT_DISCONNECT &&
-rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
+   *res = pci_ers_merge_result(*res, rc);
 
 out:
eeh_pcid_put(dev);
-- 
2.16.1.74.g9b0b1f47b



[PATCH v2 11/13] powerpc/eeh: Introduce eeh_set_irq_state()

2018-05-24 Thread Sam Bobroff
To ease future refactoring, extract calls to eeh_enable_irq() and
eeh_disable_irq() from the various report functions. This makes
the report functions initial sequences more similar, as well as making
the IRQ changes visible when reading eeh_handle_normal_event().

Signed-off-by: Sam Bobroff <sbobr...@linux.ibm.com>
---
== v1 -> v2: ==
* Reorganised eeh_set_irq_state() to reduce nesting depth.

 arch/powerpc/kernel/eeh_driver.c | 52 ++--
 1 file changed, 34 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 76951ca701b2..8dd2a33424a1 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -118,22 +118,20 @@ static inline void eeh_pcid_put(struct pci_dev *pdev)
  * do real work because EEH should freeze DMA transfers for those PCI
  * devices encountering EEH errors, which includes MSI or MSI-X.
  */
-static void eeh_disable_irq(struct pci_dev *dev)
+static void eeh_disable_irq(struct eeh_dev *edev)
 {
-   struct eeh_dev *edev = pci_dev_to_eeh_dev(dev);
-
/* Don't disable MSI and MSI-X interrupts. They are
 * effectively disabled by the DMA Stopped state
 * when an EEH error occurs.
 */
-   if (dev->msi_enabled || dev->msix_enabled)
+   if (edev->pdev->msi_enabled || edev->pdev->msix_enabled)
return;
 
-   if (!irq_has_action(dev->irq))
+   if (!irq_has_action(edev->pdev->irq))
return;
 
edev->mode |= EEH_DEV_IRQ_DISABLED;
-   disable_irq_nosync(dev->irq);
+   disable_irq_nosync(edev->pdev->irq);
 }
 
 /**
@@ -143,10 +141,8 @@ static void eeh_disable_irq(struct pci_dev *dev)
  * This routine must be called to enable interrupt while failed
  * device could be resumed.
  */
-static void eeh_enable_irq(struct pci_dev *dev)
+static void eeh_enable_irq(struct eeh_dev *edev)
 {
-   struct eeh_dev *edev = pci_dev_to_eeh_dev(dev);
-
if ((edev->mode) & EEH_DEV_IRQ_DISABLED) {
edev->mode &= ~EEH_DEV_IRQ_DISABLED;
/*
@@ -169,8 +165,8 @@ static void eeh_enable_irq(struct pci_dev *dev)
 *
 *  tglx
 */
-   if (irqd_irq_disabled(irq_get_irq_data(dev->irq)))
-   enable_irq(dev->irq);
+   if (irqd_irq_disabled(irq_get_irq_data(edev->pdev->irq)))
+   enable_irq(edev->pdev->irq);
}
 }
 
@@ -210,6 +206,29 @@ static void eeh_set_channel_state(struct eeh_pe *root, 
enum pci_channel_state s)
edev->pdev->error_state = s;
 }
 
+static void eeh_set_irq_state(struct eeh_pe *root, bool enable)
+{
+   struct eeh_pe *pe;
+   struct eeh_dev *edev, *tmp;
+
+   eeh_for_each_pe(root, pe) {
+   eeh_pe_for_each_dev(pe, edev, tmp) {
+   if (!eeh_edev_actionable(edev))
+   continue;
+
+   if (!eeh_pcid_get(edev->pdev))
+   continue;
+
+   if (enable)
+   eeh_enable_irq(edev);
+   else
+   eeh_disable_irq(edev);
+
+   eeh_pcid_put(edev->pdev);
+   }
+   }
+}
+
 /**
  * eeh_report_error - Report pci error to each device driver
  * @data: eeh device
@@ -233,8 +252,6 @@ static void *eeh_report_error(struct eeh_dev *edev, void 
*userdata)
driver = eeh_pcid_get(dev);
if (!driver) goto out_no_dev;
 
-   eeh_disable_irq(dev);
-
if (!driver->err_handler ||
!driver->err_handler->error_detected)
goto out;
@@ -315,8 +332,6 @@ static void *eeh_report_reset(struct eeh_dev *edev, void 
*userdata)
driver = eeh_pcid_get(dev);
if (!driver) goto out_no_dev;
 
-   eeh_enable_irq(dev);
-
if (!driver->err_handler ||
!driver->err_handler->slot_reset ||
(edev->mode & EEH_DEV_NO_HANDLER) ||
@@ -386,7 +401,6 @@ static void *eeh_report_resume(struct eeh_dev *edev, void 
*userdata)
 
was_in_error = edev->in_error;
edev->in_error = false;
-   eeh_enable_irq(dev);
 
if (!driver->err_handler ||
!driver->err_handler->resume ||
@@ -431,8 +445,6 @@ static void *eeh_report_failure(struct eeh_dev *edev, void 
*userdata)
driver = eeh_pcid_get(dev);
if (!driver) goto out_no_dev;
 
-   eeh_disable_irq(dev);
-
if (!driver->err_handler ||
!driver->err_handler->error_detected)
goto out;
@@ -804,6 +816,7 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 */
pr_info("EEH: Notify device drivers to shutdown\n");
eeh_set_cha

Re: [PATCH v2 07/13] powerpc/eeh: Clean up pci_ers_result handling

2018-06-03 Thread Sam Bobroff
On Sat, Jun 02, 2018 at 01:40:46AM +1000, Michael Ellerman wrote:
> Sam Bobroff  writes:
> 
> > As EEH event handling progresses, a cumulative result of type
> > pci_ers_result is built up by (some of) the eeh_report_*() functions
> > using either:
> > if (rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
> > if (*res == PCI_ERS_RESULT_NONE) *res = rc;
> > or:
> > if ((*res == PCI_ERS_RESULT_NONE) ||
> > (*res == PCI_ERS_RESULT_RECOVERED)) *res = rc;
> > if (*res == PCI_ERS_RESULT_DISCONNECT &&
> > rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
> > (Where *res is the accumulator.)
> >
> > However, the intent is not immediately clear and the result in some
> > situations is order dependent.
> >
> > Address this by assigning a priority to each result value, and always
> > merging to the highest priority. This renders the intent clear, and
> > provides a stable value for all orderings.
> >
> > Signed-off-by: Sam Bobroff 
> > ---
> > == v1 -> v2: ==
> >
> > * Added the value, and missing newline, to some WARN()s.
> > * Improved name of merge_result() to pci_ers_merge_result().
> > * Adjusted the result priorities so that unknown doesn't overlap with _NONE.
> 
> These === markers seem to have confused patchwork, they ended up in the
> patch, and then git put them in the changelog.
> 
> http://patchwork.ozlabs.org/patch/920194/
> 
> The usual format is just something like:
> 
> v2 - Added the value, and missing newline, to some WARN()s.
>- Improved name of merge_result() to pci_ers_merge_result().
>- Adjusted the result priorities so that unknown doesn't overlap with 
> _NONE.
> 
> cheers

Oh! I'll change it!

Sam.

> 
> >  arch/powerpc/kernel/eeh_driver.c | 36 ++--
> >  1 file changed, 26 insertions(+), 10 deletions(-)
> >
> > diff --git a/arch/powerpc/kernel/eeh_driver.c 
> > b/arch/powerpc/kernel/eeh_driver.c
> > index 188d15c4fe3a..2d3cac584899 100644
> > --- a/arch/powerpc/kernel/eeh_driver.c
> > +++ b/arch/powerpc/kernel/eeh_driver.c
> > @@ -39,6 +39,29 @@ struct eeh_rmv_data {
> > int removed;
> >  };
> >  
> > +static int eeh_result_priority(enum pci_ers_result result)
> > +{
> > +   switch (result) {
> > +   case PCI_ERS_RESULT_NONE: return 1;
> > +   case PCI_ERS_RESULT_NO_AER_DRIVER: return 2;
> > +   case PCI_ERS_RESULT_RECOVERED: return 3;
> > +   case PCI_ERS_RESULT_CAN_RECOVER: return 4;
> > +   case PCI_ERS_RESULT_DISCONNECT: return 5;
> > +   case PCI_ERS_RESULT_NEED_RESET: return 6;
> > +   default:
> > +   WARN_ONCE(1, "Unknown pci_ers_result value: %d\n", (int)result);
> > +   return 0;
> > +   }
> > +};
> > +
> > +static enum pci_ers_result pci_ers_merge_result(enum pci_ers_result old,
> > +   enum pci_ers_result new)
> > +{
> > +   if (eeh_result_priority(new) > eeh_result_priority(old))
> > +   return new;
> > +   return old;
> > +}
> > +
> >  /**
> >   * eeh_pcid_get - Get the PCI device driver
> >   * @pdev: PCI device
> > @@ -206,9 +229,7 @@ static void *eeh_report_error(struct eeh_dev *edev, 
> > void *userdata)
> >  
> > rc = driver->err_handler->error_detected(dev, pci_channel_io_frozen);
> >  
> > -   /* A driver that needs a reset trumps all others */
> > -   if (rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
> > -   if (*res == PCI_ERS_RESULT_NONE) *res = rc;
> > +   *res = pci_ers_merge_result(*res, rc);
> >  
> > edev->in_error = true;
> > pci_uevent_ers(dev, PCI_ERS_RESULT_NONE);
> > @@ -249,9 +270,7 @@ static void *eeh_report_mmio_enabled(struct eeh_dev 
> > *edev, void *userdata)
> >  
> > rc = driver->err_handler->mmio_enabled(dev);
> >  
> > -   /* A driver that needs a reset trumps all others */
> > -   if (rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
> > -   if (*res == PCI_ERS_RESULT_NONE) *res = rc;
> > +   *res = pci_ers_merge_result(*res, rc);
> >  
> >  out:
> > eeh_pcid_put(dev);
> > @@ -294,10 +313,7 @@ static void *eeh_report_reset(struct eeh_dev *edev, 
> > void *userdata)
> > goto out;
> >  
> > rc = driver->err_handler->slot_reset(dev);
> > -   if ((*res == PCI_ERS_RESULT_NONE) ||
> > -   (*res == PCI_ERS_RESULT_RECOVERED)) *res = rc;
> > -   if (*res == PCI_ERS_RESULT_DISCONNECT &&
> > -rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
> > +   *res = pci_ers_merge_result(*res, rc);
> >  
> >  out:
> > eeh_pcid_put(dev);
> > -- 
> > 2.16.1.74.g9b0b1f47b
> 


signature.asc
Description: PGP signature


[PATCH 06/13] powerpc/eeh: Add message when PE processing at parent

2018-05-02 Thread Sam Bobroff
To aid debugging, add a message to show when EEH processing for a PE
will be done at the device's parent, rather than directly at the
device.

Signed-off-by: Sam Bobroff <sbobr...@linux.ibm.com>
---
 arch/powerpc/kernel/eeh.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index f82dade4fb9a..1139821a9aec 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -541,8 +541,12 @@ int eeh_dev_check_failure(struct eeh_dev *edev)
 
/* Frozen parent PE ? */
ret = eeh_ops->get_state(parent_pe, NULL);
-   if (ret > 0 && !eeh_state_active(ret))
+   if (ret > 0 && !eeh_state_active(ret)) {
pe = parent_pe;
+   pr_err("EEH: Failure of PHB#%x-PE#%x will be handled at 
parent PHB#%x-PE#%x.\n",
+  pe->phb->global_number, pe->addr,
+  pe->phb->global_number, parent_pe->addr);
+   }
 
/* Next parent level */
parent_pe = parent_pe->parent;
-- 
2.16.1.74.g9b0b1f47b



[PATCH 07/13] powerpc/eeh: Clean up pci_ers_result handling

2018-05-02 Thread Sam Bobroff
As EEH event handling progresses, a cumulative result of type
pci_ers_result is built up by (some of) the eeh_report_*() functions
using either:
if (rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
if (*res == PCI_ERS_RESULT_NONE) *res = rc;
or:
if ((*res == PCI_ERS_RESULT_NONE) ||
(*res == PCI_ERS_RESULT_RECOVERED)) *res = rc;
if (*res == PCI_ERS_RESULT_DISCONNECT &&
rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
(Where *res is the accumulator.)

However, the intent is not immediately clear and the result in some
situations is order dependent.

Address this by assigning a priority to each result value, and always
merging to the highest priority. This renders the intent clear, and
provides a stable value for all orderings.

Signed-off-by: Sam Bobroff <sbobr...@linux.ibm.com>
---
 arch/powerpc/kernel/eeh_driver.c | 36 ++--
 1 file changed, 26 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 188d15c4fe3a..f33dd68a9ca2 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -39,6 +39,29 @@ struct eeh_rmv_data {
int removed;
 };
 
+static int eeh_result_priority(enum pci_ers_result result)
+{
+   switch (result) {
+   case PCI_ERS_RESULT_NONE: return 0;
+   case PCI_ERS_RESULT_NO_AER_DRIVER: return 1;
+   case PCI_ERS_RESULT_RECOVERED: return 2;
+   case PCI_ERS_RESULT_CAN_RECOVER: return 3;
+   case PCI_ERS_RESULT_DISCONNECT: return 4;
+   case PCI_ERS_RESULT_NEED_RESET: return 5;
+   default:
+   WARN_ONCE(1, "Unknown pci_ers_result value");
+   return 0;
+   }
+};
+
+static enum pci_ers_result merge_result(enum pci_ers_result old,
+   enum pci_ers_result new)
+{
+   if (eeh_result_priority(new) > eeh_result_priority(old))
+   return new;
+   return old;
+}
+
 /**
  * eeh_pcid_get - Get the PCI device driver
  * @pdev: PCI device
@@ -206,9 +229,7 @@ static void *eeh_report_error(struct eeh_dev *edev, void 
*userdata)
 
rc = driver->err_handler->error_detected(dev, pci_channel_io_frozen);
 
-   /* A driver that needs a reset trumps all others */
-   if (rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
-   if (*res == PCI_ERS_RESULT_NONE) *res = rc;
+   *res = merge_result(*res, rc);
 
edev->in_error = true;
pci_uevent_ers(dev, PCI_ERS_RESULT_NONE);
@@ -249,9 +270,7 @@ static void *eeh_report_mmio_enabled(struct eeh_dev *edev, 
void *userdata)
 
rc = driver->err_handler->mmio_enabled(dev);
 
-   /* A driver that needs a reset trumps all others */
-   if (rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
-   if (*res == PCI_ERS_RESULT_NONE) *res = rc;
+   *res = merge_result(*res, rc);
 
 out:
eeh_pcid_put(dev);
@@ -294,10 +313,7 @@ static void *eeh_report_reset(struct eeh_dev *edev, void 
*userdata)
goto out;
 
rc = driver->err_handler->slot_reset(dev);
-   if ((*res == PCI_ERS_RESULT_NONE) ||
-   (*res == PCI_ERS_RESULT_RECOVERED)) *res = rc;
-   if (*res == PCI_ERS_RESULT_DISCONNECT &&
-rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
+   *res = merge_result(*res, rc);
 
 out:
eeh_pcid_put(dev);
-- 
2.16.1.74.g9b0b1f47b



[PATCH 13/13] powerpc/eeh: Refactor report functions

2018-05-02 Thread Sam Bobroff
The EEH report functions now share a fair bit of code around the start
and end of each function.

So factor out as much as possible, and move the traversal into a
custom function. This also allows accurate debug to be generated more
easily.

Signed-off-by: Sam Bobroff <sbobr...@linux.ibm.com>
---
 arch/powerpc/kernel/eeh_driver.c | 289 +++
 1 file changed, 138 insertions(+), 151 deletions(-)

diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index eb4feee81ff4..1c4336dcf9f5 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -54,6 +54,25 @@ static int eeh_result_priority(enum pci_ers_result result)
}
 };
 
+const char *pci_ers_result_name(enum pci_ers_result r)
+{
+   switch (r) {
+   case PCI_ERS_RESULT_NONE: return "none";
+   case PCI_ERS_RESULT_CAN_RECOVER: return "can recover";
+   case PCI_ERS_RESULT_NEED_RESET: return "need reset";
+   case PCI_ERS_RESULT_DISCONNECT: return "disconnect";
+   case PCI_ERS_RESULT_RECOVERED: return "recovered";
+   case PCI_ERS_RESULT_NO_AER_DRIVER: return "no AER driver";
+   default:
+   WARN_ONCE(1, "Unknown result type");
+   return "unknown";
+   }
+};
+
+#define eeh_infoline(EDEV, FMT, ...) \
+pr_info("EEH: PE#%x (PCI %s): " pr_fmt(FMT) "\n", EDEV->pe_config_addr, \
+((EDEV->pdev) ? dev_name(>pdev->dev) : "NONE"), ##__VA_ARGS__)
+
 static enum pci_ers_result merge_result(enum pci_ers_result old,
enum pci_ers_result new)
 {
@@ -223,123 +242,118 @@ static void eeh_set_irq_state(struct eeh_pe *root, bool 
enable)
}
 }
 
+static void eeh_pe_report(const char *name, struct eeh_pe *root,
+ enum pci_ers_result (*fn)(struct eeh_dev *,
+   struct pci_driver *),
+ enum pci_ers_result *result)
+{
+   struct eeh_pe *pe;
+   struct eeh_dev *edev, *tmp;
+   enum pci_ers_result new_result;
+
+   pr_info("EEH: Beginning: '%s'\n", name);
+   eeh_for_each_pe(root, pe) {
+   eeh_pe_for_each_dev(pe, edev, tmp) {
+   device_lock(>pdev->dev);
+   if (eeh_edev_actionable(edev)) {
+   struct pci_driver *driver;
+
+   driver = eeh_pcid_get(edev->pdev);
+
+   if (!driver)
+   eeh_infoline(edev, "no driver");
+   else if (!driver->err_handler)
+   eeh_infoline(edev,
+"driver not EEH aware");
+   else if (edev->mode & EEH_DEV_NO_HANDLER)
+   eeh_infoline(edev,
+"driver bound too late");
+   else {
+   new_result = fn(edev, driver);
+   eeh_infoline(edev,
+   "%s driver reports: '%s'",
+   driver->name,
+   
pci_ers_result_name(new_result));
+   if (result)
+   *result = merge_result(*result,
+  
new_result);
+   }
+   if (driver)
+   eeh_pcid_put(edev->pdev);
+   } else {
+   eeh_infoline(edev, "not actionable (%d,%d,%d)",
+!!edev->pdev,
+!eeh_dev_removed(edev),
+!eeh_pe_passed(edev->pe));
+   }
+   device_unlock(>pdev->dev);
+   }
+   }
+   if (result)
+   pr_info("EEH: Finished:'%s' with aggregate recovery 
state:'%s'\n",
+   name, pci_ers_result_name(*result));
+   else
+   pr_info("EEH: Finished:'%s'", name);
+}
+
 /**
  * eeh_report_error - Report pci error to each device driver
- * @data: eeh device
- * @userdata: return value
+ * @edev: eeh device
+ * @driver: device's PCI driver
  *
- * Report an EEH error to each device driver, collect up and
- * merge the device driver responses. Cumulative response
- * passed back in "user

[PATCH 01/13] powerpc/eeh: Add eeh_max_freezes to initial EEH log line

2018-05-02 Thread Sam Bobroff
The current failure message includes the number of failures that have
occurred in the last hour (for a device) but it does not indicate
how many failures will be tolerated before the device is permanently
disabled.

Include the limit (eeh_max_freezes) to make this less surprising when
it happens.

Also remove the embedded newline from the existing message to make it
easier to grep for.

Signed-off-by: Sam Bobroff <sbobr...@linux.ibm.com>
---
 arch/powerpc/kernel/eeh_driver.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index b8a329f04814..56a60b9eb397 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -778,14 +778,13 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
eeh_pe_update_time_stamp(pe);
pe->freeze_count++;
if (pe->freeze_count > eeh_max_freezes) {
-   pr_err("EEH: PHB#%x-PE#%x has failed %d times in the\n"
-  "last hour and has been permanently disabled.\n",
+   pr_err("EEH: PHB#%x-PE#%x has failed %d times in the last hour 
and has been permanently disabled.\n",
   pe->phb->global_number, pe->addr,
   pe->freeze_count);
goto hard_fail;
}
-   pr_warn("EEH: This PCI device has failed %d times in the last hour\n",
-   pe->freeze_count);
+   pr_warn("EEH: This PCI device has failed %d times in the last hour and 
will be permanently disabled after %d failures.\n",
+   pe->freeze_count, eeh_max_freezes);
 
/* Walk the various device drivers attached to this slot through
 * a reset sequence, giving each an opportunity to do what it needs
-- 
2.16.1.74.g9b0b1f47b



  1   2   3   4   >