Re: [PATCH V3 0/3] Add new PowerPC specific ELF core notes
On 24/05/14 01:15, Anshuman Khandual wrote: This patch series adds five new ELF core note sections which can be used with existing ptrace request PTRACE_GETREGSET/SETREGSET for accessing various transactional memory and miscellaneous register sets on PowerPC platform. Please find a test program exploiting these new ELF core note types on a POWER8 system. RFC: https://lkml.org/lkml/2014/4/1/292 V1: https://lkml.org/lkml/2014/4/2/43 V2: https://lkml.org/lkml/2014/5/5/88 Changes in V3 = (1) Added two new error paths in every TM related get/set functions when regset support is not present on the system (ENODEV) or when the process does not have any transaction active (ENODATA) in the context (2) Installed the active hooks for all the newly added regset core note types Changes in V2 = (1) Removed all the power specific ptrace requests corresponding to new NT_PPC_* elf core note types. Now all the register sets can be accessed from ptrace through PTRACE_GETREGSET/PTRACE_SETREGSET using the individual NT_PPC* core note type instead (2) Fixed couple of attribute values for REGSET_TM_CGPR register set (3) Renamed flush_tmreg_to_thread as flush_tmregs_to_thread (4) Fixed 32 bit checkpointed GPR support (5) Changed commit messages accordingly Outstanding Issues == (1) Running DSCR register value inside a transaction does not seem to be saved at thread.dscr when the process stops for ptrace examination. Since this is fixed by 96d016108640bc2b7fb0ee800737f80923847294, which is already upstream, you might want to rebase and re-test. It should pass and then you can remove the outstanding issues :-) Test programs program When I posted the patch I mentioned above, I was asked to move the test code into the powerpc kernel selftests so you may want to do this too. Also, your test program covers everything mine did and more so you might want to remove mine if you do add this to the selftests. = #include unistd.h #include stdlib.h #include string.h #include malloc.h #include errno.h #include sys/ptrace.h #include sys/uio.h #include sys/types.h #include sys/signal.h #include sys/user.h You should include sys/wait.h for waitpid(). #include linux/elf.h #include linux/types.h #include linux/ptrace.h typedef long long u64; typedef unsigned int u32; typedef __vector128 vector128; /* TM CFPR */ struct tm_cfpr { u64 fpr[32]; u64 fpscr; }; /* TM CVMX */ struct tm_cvmx { vector128 vr[32] __attribute__((aligned(16))); vector128 vscr __attribute__((aligned(16))); u32 vrsave; }; /* TM SPR */ struct tm_spr_regs { u64 tm_tfhar; u64 tm_texasr; u64 tm_tfiar; u64 tm_orig_msr; u64 tm_tar; u64 tm_ppr; u64 tm_dscr; }; /* Miscellaneous registers */ struct misc_regs { u64 dscr; u64 ppr; u64 tar; }; /* TM instructions */ #define TBEGIN .long 0x7C00051D ; #define TEND.long 0x7C00055D ; /* SPR number */ #define SPRN_DSCR 0x3 #define SPRN_TAR 815 /* ELF core notes */ #define NT_PPC_TM_SPR 0x103 /* PowerPC transactional memory special registers */ #define NT_PPC_TM_CGPR 0x104 /* PowerpC transactional memory checkpointed GPR */ #define NT_PPC_TM_CFPR 0x105 /* PowerPC transactional memory checkpointed FPR */ #define NT_PPC_TM_CVMX 0x106 /* PowerPC transactional memory checkpointed VMX */ #define NT_PPC_MISC0x107 /* PowerPC miscellaneous registers */ #define VAL1 1 #define VAL2 2 #define VAL3 3 #define VAL4 4 int main(int argc, char *argv[]) { struct tm_spr_regs *tmr1; struct pt_regs *pregs1, *pregs2; struct tm_cfpr *fpr, *fpr1; struct misc_regs *dbr1; struct iovec iov; pid_t child; int ret = 0, status = 0, i = 0, flag = 1; status, i and flags are all unused. pregs2 = (struct pt_regs *) malloc(sizeof(struct pt_regs)); fpr = (struct tm_cfpr *) malloc(sizeof(struct tm_cfpr)); child = fork(); if (child 0) { printf(fork() failed \n); exit(-1); } /* Child code */ if (child == 0) { asm __volatile__( 6: ; /* TM checkpointed values */ li 1, %[val1];/* GPR[1] */ .long 0x7C210166; /* FPR[1] */ li 2, %[val2];/* GPR[2] */ .long 0x7C420166; /* FPR[2] */ mtspr %[tar], 1; /* TAR */ mtspr %[dscr], 2; /* DSCR */ 1: ; TBEGIN /* TM running values */ beq 2f ;
Re: [PATCH V3 2/3] powerpc, ptrace: Enable support for transactional memory register sets
On 24/05/14 01:15, Anshuman Khandual wrote: This patch enables get and set of transactional memory related register sets through PTRACE_GETREGSET/PTRACE_SETREGSET interface by implementing four new powerpc specific register sets i.e REGSET_TM_SPR, REGSET_TM_CGPR, REGSET_TM_CFPR, REGSET_CVMX support corresponding to these following new ELF core note types added previously in this regard. (1) NT_PPC_TM_SPR (2) NT_PPC_TM_CGPR (3) NT_PPC_TM_CFPR (4) NT_PPC_TM_CVMX Signed-off-by: Anshuman Khandual khand...@linux.vnet.ibm.com Hi Anshuman, I'm not Ben but I've reviewed your patch as well as I can and I have some comments that might be useful to you. First of all, I couldn't get this to compile without CONFIG_VSX and CONFIG_PPC_TRANSACTIONAL_MEM defined: there are obvious typos (esle instead of else) and references to fields that aren't defined for those cases. I haven't mentioned any of those issues below as the compiler will do that but you should definitely test those configurations. Also some of the code seems to assume that if CONFIG_VSX is defined then CONFIG_PPC_TRANSACTIONAL_MEM must also be defined, but that isn't the case (it's the other way round: CONFIG_PPC_TRANSACTIONAL_MEM implies CONFIG_VSX). --- arch/powerpc/include/asm/switch_to.h | 8 + arch/powerpc/kernel/process.c| 24 ++ arch/powerpc/kernel/ptrace.c | 792 +-- 3 files changed, 795 insertions(+), 29 deletions(-) diff --git a/arch/powerpc/include/asm/switch_to.h b/arch/powerpc/include/asm/switch_to.h index 0e83e7d..2737f46 100644 --- a/arch/powerpc/include/asm/switch_to.h +++ b/arch/powerpc/include/asm/switch_to.h @@ -80,6 +80,14 @@ static inline void flush_spe_to_thread(struct task_struct *t) } #endif +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM +extern void flush_tmregs_to_thread(struct task_struct *); +#else +static inline void flush_tmregs_to_thread(struct task_struct *t) +{ +} +#endif /* CONFIG_PPC_TRANSACTIONAL_MEM */ + static inline void clear_task_ebb(struct task_struct *t) { #ifdef CONFIG_PPC_BOOK3S_64 diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index 31d0215..e247898 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -695,6 +695,30 @@ static inline void __switch_to_tm(struct task_struct *prev) } } +void flush_tmregs_to_thread(struct task_struct *tsk) +{ + /* + * If task is not current, it should have been flushed + * already to it's thread_struct during __switch_to(). + */ + if (tsk != current) + return; + + preempt_disable(); + if (tsk-thread.regs) { + /* + * If we are still current, the TM state need to + * be flushed to thread_struct as it will be still + * present in the current cpu. + */ + if (MSR_TM_ACTIVE(tsk-thread.regs-msr)) { + __switch_to_tm(tsk); + tm_recheckpoint_new_task(tsk); There is at least one other usage of this pair of calls in order to flush the TM state (in arch_dup_task_struct()), so rather than copying it you might want to create a new function and call it from both places. (And include the nice comment from arch_dup_task_struct() that explains how it works and why.) + } + } + preempt_enable(); +} + /* * This is called if we are on the way out to userspace and the * TIF_RESTORE_TM flag is set. It checks if we need to reload diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c index 2e3d2bf..17642ef 100644 --- a/arch/powerpc/kernel/ptrace.c +++ b/arch/powerpc/kernel/ptrace.c @@ -357,6 +357,17 @@ static int gpr_set(struct task_struct *target, const struct user_regset *regset, return ret; } +/* + * When any transaction is active, thread_struct-transact_fp holds + * the current running value of all FPR registers and thread_struct- + * fp_state holds the last checkpointed FPR registers state for the + * current transaction. + * + * struct data { + * u64 fpr[32]; + * u64 fpscr; + * }; It would be nice to say why you've included struct data in the comment. + */ static int fpr_get(struct task_struct *target, const struct user_regset *regset, unsigned int pos, unsigned int count, void *kbuf, void __user *ubuf) @@ -365,21 +376,41 @@ static int fpr_get(struct task_struct *target, const struct user_regset *regset, u64 buf[33]; int i; #endif - flush_fp_to_thread(target); + if (MSR_TM_ACTIVE(target-thread.regs-msr)) { + flush_fp_to_thread(target); + flush_altivec_to_thread(target); + flush_tmregs_to_thread(target); + } else { + flush_fp_to_thread(target); + } I don't see why you need the else. Could this be:
[PATCH 3/3] powerpc: Document sysfs DSCR interface
Add some documentation about ... /sys/devices/system/cpu/dscr_default /sys/devices/system/cpu/cpuN/dscr ... to Documentation/ABI/stable. Signed-off-by: Sam Bobroff sam.bobr...@au1.ibm.com --- Documentation/ABI/stable/sysfs-devices-system-cpu | 25 + 1 file changed, 25 insertions(+) create mode 100644 Documentation/ABI/stable/sysfs-devices-system-cpu diff --git a/Documentation/ABI/stable/sysfs-devices-system-cpu b/Documentation/ABI/stable/sysfs-devices-system-cpu new file mode 100644 index 000..33c133e --- /dev/null +++ b/Documentation/ABI/stable/sysfs-devices-system-cpu @@ -0,0 +1,25 @@ +What: /sys/devices/system/cpu/dscr_default +Date: 13-May-2014 +KernelVersion: v3.15.0 +Contact: +Description: Writes are equivalent to writing to + /sys/devices/system/cpu/cpuN/dscr on all CPUs. + Reads return the last written value or 0. + This value is not a global default: it is a way to set + all per-CPU defaults at the same time. +Values:64 bit unsigned integer (bit field) + +What: /sys/devices/system/cpu/cpu[0-9]+/dscr +Date: 13-May-2014 +KernelVersion: v3.15.0 +Contact: +Description: Default value for the Data Stream Control Register (DSCR) on + a CPU. + This default value is used when the kernel is executing and + for any process that has not set the DSCR itself. + If a process ever sets the DSCR (via direct access to the + SPR) that value will be persisted for that process and used + on any CPU where it executes (overriding the value described + here). + If set by a process it will be inherited by child processes. +Values:64 bit unsigned integer (bit field) -- 1.7.10.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 2/3] powerpc: fix regression of per-CPU DSCR setting
Since commit efcac65 powerpc: Per process DSCR + some fixes (try#4) it is no longer possible to set the DSCR on a per-CPU basis. The old behaviour was to minipulate the DSCR SPR directly but this is no longer sufficient: the value is quickly overwritten by context switching. This patch stores the per-CPU DSCR value in a kernel variable rather than directly in the SPR and it is used whenever a process has not set the DSCR itself. The sysfs interface (/sys/devices/system/cpu/cpuN/dscr) is unchanged. Writes to the old global default (/sys/devices/system/cpu/dscr_default) now set all of the per-CPU values and reads return the last written value. The new per-CPU default is added to the paca_struct and is used everywhere outside of sysfs.c instead of the old global default. Signed-off-by: Sam Bobroff sam.bobr...@au1.ibm.com --- arch/powerpc/include/asm/paca.h |3 +++ arch/powerpc/kernel/asm-offsets.c |1 + arch/powerpc/kernel/entry_64.S |9 + arch/powerpc/kernel/sysfs.c | 32 ++- arch/powerpc/kernel/tm.S| 16 arch/powerpc/kvm/book3s_hv_rmhandlers.S |3 +-- 6 files changed, 29 insertions(+), 35 deletions(-) diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h index 8e956a0..bb0bd25 100644 --- a/arch/powerpc/include/asm/paca.h +++ b/arch/powerpc/include/asm/paca.h @@ -92,7 +92,10 @@ struct paca_struct { struct slb_shadow *slb_shadow_ptr; struct dtl_entry *dispatch_log; struct dtl_entry *dispatch_log_end; +#endif /* CONFIG_PPC_STD_MMU_64 */ + u64 dscr_default; /* per-CPU default DSCR */ +#ifdef CONFIG_PPC_STD_MMU_64 /* * Now, starting in cacheline 2, the exception save areas */ diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index dba8140..cba2697 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -247,6 +247,7 @@ int main(void) #endif DEFINE(PACAHWCPUID, offsetof(struct paca_struct, hw_cpu_id)); DEFINE(PACAKEXECSTATE, offsetof(struct paca_struct, kexec_state)); + DEFINE(PACA_DSCR, offsetof(struct paca_struct, dscr_default)); DEFINE(PACA_STARTTIME, offsetof(struct paca_struct, starttime)); DEFINE(PACA_STARTTIME_USER, offsetof(struct paca_struct, starttime_user)); DEFINE(PACA_USER_TIME, offsetof(struct paca_struct, user_time)); diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S index 9fde8a1..911d453 100644 --- a/arch/powerpc/kernel/entry_64.S +++ b/arch/powerpc/kernel/entry_64.S @@ -387,12 +387,6 @@ _GLOBAL(ret_from_kernel_thread) li r3,0 b syscall_exit - .section.toc,aw -DSCR_DEFAULT: - .tc dscr_default[TC],dscr_default - - .section.text - /* * This routine switches between two different tasks. The process * state of one is saved on its kernel stack. Then the state @@ -577,11 +571,10 @@ END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC) #ifdef CONFIG_PPC64 BEGIN_FTR_SECTION lwz r6,THREAD_DSCR_INHERIT(r4) - ld r7,DSCR_DEFAULT@toc(2) ld r0,THREAD_DSCR(r4) cmpwi r6,0 bne 1f - ld r0,0(r7) + ld r0,PACA_DSCR(r13) 1: BEGIN_FTR_SECTION_NESTED(70) mfspr r8, SPRN_FSCR diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c index e2a1d6f..67fd2fd 100644 --- a/arch/powerpc/kernel/sysfs.c +++ b/arch/powerpc/kernel/sysfs.c @@ -484,7 +484,6 @@ SYSFS_PMCSETUP(pmc8, SPRN_PMC8); SYSFS_PMCSETUP(mmcra, SPRN_MMCRA); SYSFS_SPRSETUP(purr, SPRN_PURR); SYSFS_SPRSETUP(spurr, SPRN_SPURR); -SYSFS_SPRSETUP(dscr, SPRN_DSCR); SYSFS_SPRSETUP(pir, SPRN_PIR); /* @@ -494,12 +493,27 @@ SYSFS_SPRSETUP(pir, SPRN_PIR); */ static DEVICE_ATTR(mmcra, 0600, show_mmcra, store_mmcra); static DEVICE_ATTR(spurr, 0400, show_spurr, NULL); -static DEVICE_ATTR(dscr, 0600, show_dscr, store_dscr); static DEVICE_ATTR(purr, 0400, show_purr, store_purr); static DEVICE_ATTR(pir, 0400, show_pir, NULL); -unsigned long dscr_default = 0; -EXPORT_SYMBOL(dscr_default); +static unsigned long dscr_default; + +static void read_dscr(void *val) +{ + *(unsigned long *)val = get_paca()-dscr_default; +} + +static void write_dscr(void *val) +{ + get_paca()-dscr_default = *(unsigned long *)val; + if (!current-thread.dscr_inherit) { + current-thread.dscr = *(unsigned long *)val; + mtspr(SPRN_DSCR, *(unsigned long *)val); + } +} + +SYSFS_SPRSETUP_SHOW_STORE(dscr); +static DEVICE_ATTR(dscr, 0600, show_dscr, store_dscr); static void add_write_permission_dev_attr(struct device_attribute *attr) { @@ -512,14 +526,6 @@ static ssize_t show_dscr_default(struct device *dev, return sprintf(buf, %lx\n, dscr_default); } -static void update_dscr(void *dummy) -{ - if (!current
[PATCH 0/3] powerpc: fix regression of per-CPU DSCR setting
Hello, This patch corrects a regression on PowerPC CPUs that causes their per-CPU DSCR SPR value (exposed via /sys/devices/system/cpuN/dscr) to be quickly lost during context switching, effectively meaning that the DSCR can no longer be set on a per-CPU basis. My intent is to restore the functionality of the per-CPU value in a way that is compatible with the newer global default and task-specific DSCR setting system. Users of either the old or new systems should now get pretty much what they expect. A couple of notes: I've split an existing ifdef CONFIG_PPC_STD_MMU_64 block in paca_struct into two parts because it allows dscr_default to be placed into a cache line hole. (This seems be the case even without CONFIG_PPC_STD_MMU_64 being defined.) Comments or ideas on alternative placements are welcome. PowerPC context switching is touched but there should not be any performance cost; if anything it should get slightly faster due to the per-CPU value being easier to access than the old global default. Sam Bobroff (3): powerpc: Split __SYSFS_SPRSETUP macro powerpc: fix regression of per-CPU DSCR setting powerpc: Document sysfs DSCR interface Documentation/ABI/stable/sysfs-devices-system-cpu | 25 ++ arch/powerpc/include/asm/paca.h |3 ++ arch/powerpc/kernel/asm-offsets.c |1 + arch/powerpc/kernel/entry_64.S|9 +--- arch/powerpc/kernel/sysfs.c | 51 + arch/powerpc/kernel/tm.S | 16 ++- arch/powerpc/kvm/book3s_hv_rmhandlers.S |3 +- 7 files changed, 67 insertions(+), 41 deletions(-) create mode 100644 Documentation/ABI/stable/sysfs-devices-system-cpu -- 1.7.10.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 1/3] powerpc: Split __SYSFS_SPRSETUP macro
Split the __SYSFS_SPRSETUP macro into two parts so that registers requiring custom read and write functions can use common code for their show and store functions. Signed-off-by: Sam Bobroff sam.bobr...@au1.ibm.com --- arch/powerpc/kernel/sysfs.c | 19 +-- 1 file changed, 13 insertions(+), 6 deletions(-) diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c index d90d4b7..e2a1d6f 100644 --- a/arch/powerpc/kernel/sysfs.c +++ b/arch/powerpc/kernel/sysfs.c @@ -404,7 +404,7 @@ void ppc_enable_pmcs(void) } EXPORT_SYMBOL(ppc_enable_pmcs); -#define __SYSFS_SPRSETUP(NAME, ADDRESS, EXTRA) \ +#define __SYSFS_SPRSETUP_READ_WRITE(NAME, ADDRESS, EXTRA) \ static void read_##NAME(void *val) \ { \ *(unsigned long *)val = mfspr(ADDRESS); \ @@ -413,7 +413,9 @@ static void write_##NAME(void *val) \ { \ EXTRA; \ mtspr(ADDRESS, *(unsigned long *)val); \ -} \ +} + +#define __SYSFS_SPRSETUP_SHOW_STORE(NAME) \ static ssize_t show_##NAME(struct device *dev, \ struct device_attribute *attr, \ char *buf) \ @@ -436,10 +438,15 @@ static ssize_t __used \ return count; \ } -#define SYSFS_PMCSETUP(NAME, ADDRESS) \ - __SYSFS_SPRSETUP(NAME, ADDRESS, ppc_enable_pmcs()) -#define SYSFS_SPRSETUP(NAME, ADDRESS) \ - __SYSFS_SPRSETUP(NAME, ADDRESS, ) +#define SYSFS_PMCSETUP(NAME, ADDRESS) \ + __SYSFS_SPRSETUP_READ_WRITE(NAME, ADDRESS, ppc_enable_pmcs()) \ + __SYSFS_SPRSETUP_SHOW_STORE(NAME) +#define SYSFS_SPRSETUP(NAME, ADDRESS) \ + __SYSFS_SPRSETUP_READ_WRITE(NAME, ADDRESS, ) \ + __SYSFS_SPRSETUP_SHOW_STORE(NAME) + +#define SYSFS_SPRSETUP_SHOW_STORE(NAME) \ + __SYSFS_SPRSETUP_SHOW_STORE(NAME) /* Let's define all possible registers, we'll only hook up the ones * that are implemented on the current processor -- 1.7.10.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 1/1] powerpc: correct DSCR during TM context switch
Correct the DSCR SPR becoming temporarily corrupted when a task is context switched when within a transaction. It is corrected when the transaction is aborted (which will happen after a context switch) but if the task has suspended (TSUSPEND) the transaction the incorrect value can be seen. The problem is caused by saving a thread's DSCR after it has already been reverted to the CPU's default value: __switch_to() calls __switch_to_tm() which calls tm_reclaim_task() which calls tm_reclaim_thread() which calls tm_reclaim() where the DSCR is reset __switch_to() calls _switch _switch() saves the DSCR to thread.dscr The fix is to treat the DSCR similarly to the TAR and save it early in __switch_to(). The program below will expose the problem: #include inttypes.h #include stdio.h #include stdlib.h #include assert.h #include asm/tm.h #define TBEGIN .long 0x7C00051D ; #define TEND.long 0x7C00055D ; #define TCHECK .long 0x7C00059C ; #define TSUSPEND.long 0x7C0005DD ; #define TRESUME .long 0x7C2005DD ; #define SPRN_TEXASR 0x82 #define SPRN_DSCR 0x03 int main(void) { uint64_t i = 0, rv, dscr1 = 1, dscr2, texasr; for (;;) { rv = 1; asm __volatile__ ( ld 3, %[dscr1]; mtspr %[sprn_dscr], 3; TBEGIN beq 1f; TSUSPEND 2: ; TCHECK bc 4, 0, 2b; mfspr 3, %[sprn_dscr]; std 3, %[dscr2]; mfspr 3, %[sprn_texasr]; std 3, %[texasr]; TRESUME TEND li %[rv], 0; 1: ; : [rv]=r(rv), [dscr2]=m(dscr2), [texasr]=m(texasr) : [dscr1]m(dscr1) , [sprn_dscr]i(SPRN_DSCR), [sprn_texasr]i(SPRN_TEXASR) : memory, r3 ); assert(rv); if ((texasr 56) == TM_CAUSE_RESCHED) { putchar('!'); fflush(stdout); i++; } else { putchar('.'); fflush(stdout); } if (dscr2 != dscr1) { printf(\n DSCR incorrect: 0x%lx (expecting 0x%lx)\n, dscr2, dscr1); exit(EXIT_FAILURE); } if (i 10) { printf(\n DSCR TM context switching seems OK.\n); exit(EXIT_SUCCESS); } } } Signed-off-by: Sam Bobroff sam.bobr...@au1.ibm.com --- arch/powerpc/include/asm/switch_to.h |6 -- arch/powerpc/kernel/entry_64.S |6 -- arch/powerpc/kernel/process.c|8 3 files changed, 8 insertions(+), 12 deletions(-) diff --git a/arch/powerpc/include/asm/switch_to.h b/arch/powerpc/include/asm/switch_to.h index 2737f46..3efd0e5 100644 --- a/arch/powerpc/include/asm/switch_to.h +++ b/arch/powerpc/include/asm/switch_to.h @@ -16,13 +16,15 @@ struct thread_struct; extern struct task_struct *_switch(struct thread_struct *prev, struct thread_struct *next); #ifdef CONFIG_PPC_BOOK3S_64 -static inline void save_tar(struct thread_struct *prev) +static inline void save_early_sprs(struct thread_struct *prev) { if (cpu_has_feature(CPU_FTR_ARCH_207S)) prev-tar = mfspr(SPRN_TAR); + if (cpu_has_feature(CPU_FTR_DSCR)) + prev-dscr = mfspr(SPRN_DSCR); } #else -static inline void save_tar(struct thread_struct *prev) {} +static inline void save_early_sprs(struct thread_struct *prev) {} #endif extern void enable_kernel_fp(void); diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S index 662c6dd..a107f4a 100644 --- a/arch/powerpc/kernel/entry_64.S +++ b/arch/powerpc/kernel/entry_64.S @@ -432,12 +432,6 @@ BEGIN_FTR_SECTION std r24,THREAD_VRSAVE(r3) END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC) #endif /* CONFIG_ALTIVEC */ -#ifdef CONFIG_PPC64 -BEGIN_FTR_SECTION - mfspr r25,SPRN_DSCR - std r25,THREAD_DSCR(r3) -END_FTR_SECTION_IFSET(CPU_FTR_DSCR) -#endif and.r0,r0,r22 beq+1f andcr22,r22,r0 diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index e247898..8d2065e 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -771,15 +771,15 @@ struct task_struct *__switch_to(struct task_struct *prev, WARN_ON(!irqs_disabled()); - /* Back up the TAR across context switches. + /* Back up the TAR and DSCR across context switches. * Note that the TAR is not available for use in the kernel. (To * provide this, the TAR should be backed up/restored on exception * entry/exit instead, and be in pt_regs. FIXME, this should be in * pt_regs anyway (for debug).) -* Save the TAR here before we do treclaim/trecheckpoint as these -* will change the TAR. +* Save the TAR and DSCR here before we do treclaim/trecheckpoint as +* these will change them. */ - save_tar(prev-thread); + save_early_sprs(prev-thread); __switch_to_tm(prev
Re: [PATCH 1/1] powerpc: correct DSCR during TM context switch
On 04/06/14 20:03, Michael Neuling wrote: On Wed, 2014-06-04 at 17:31 +1000, Michael Ellerman wrote: Hi Sam, Comments inline .. Ditto Responses inline... On Wed, 2014-06-04 at 13:33 +1000, Sam Bobroff wrote: Correct the DSCR SPR becoming temporarily corrupted when a task is context switched when within a transaction. It is corrected when the transaction is aborted (which will happen after a context switch) but if the task has suspended (TSUSPEND) the transaction the incorrect value can be seen. I don't quite follow this description. How is it corrected when the transaction is aborted, and when does that usually happen? If that happens the task can't ever see the corrupted value? To hit the suspended case, the task starts a transaction, suspends it, is then context switched out and back in, and at that point it can see the wrong value? Yep, that's it and it's corrupted until the transaction is rolled back (normally at the tresume). At the tresume it gets rolled back to the checkpointed value at tbegin and is no longer corrupt. I'll re-work the explanation to be clearer about how it becomes corrupt and how it is corrected. The problem is caused by saving a thread's DSCR afterNo it's lost at that point as we've not saved it and it was overwritten when we did the treclaim. it has already been reverted to the CPU's default value: __switch_to() calls __switch_to_tm() which calls tm_reclaim_task() which calls tm_reclaim_thread() which calls tm_reclaim() where the DSCR is reset Where the DSCR is set to DSCR_DEFAULT ? Or now PACA_DSCR since your previous patches? Could we instead fix the bug there by reverting to the thread's DSCR value? We really need to save it earlier, before the treclaim which will override it. I'll try to improve this explanation as well. __switch_to() calls _switch _switch() saves the DSCR to thread.dscrTBEGIN The fix is to treat the DSCR similarly to the TAR and save it early in __switch_to(). The program below will expose the problem: Can you drop this in tools/testing/selftests/powerpc/tm ? You'll need to create that directory, you can ape the Makefile from the pmu directory, it should be fairly obvious. See the pmu tests for how to integrate with the test harness etc., or bug me if it's not straight forward. Will do :-) diff --git a/arch/powerpc/include/asm/switch_to.h b/arch/powerpc/include/asm/switch_to.h index 2737f46..3efd0e5 100644 --- a/arch/powerpc/include/asm/switch_to.h +++ b/arch/powerpc/include/asm/switch_to.h @@ -16,13 +16,15 @@ struct thread_struct; extern struct task_struct *_switch(struct thread_struct *prev, struct thread_struct *next); #ifdef CONFIG_PPC_BOOK3S_64 -static inline void save_tar(struct thread_struct *prev) +static inline void save_early_sprs(struct thread_struct *prev) { if (cpu_has_feature(CPU_FTR_ARCH_207S)) prev-tar = mfspr(SPRN_TAR); + if (cpu_has_feature(CPU_FTR_DSCR)) + prev-dscr = mfspr(SPRN_DSCR); } Are we going to end up saving more SPRs in this code? What makes the TAR DSCR special vs everything else? There are only a limited set of SPRs that TM checkpoints. The full list is CR, LR, CTR, FPSCR, AMR, PPR, VRSAVE, VSCR, DSCR, and TAR. http://www.scribd.com/doc/142877680/PowerISA-v2-07#outer_page_826 CR, LR, CTR, PPR are handled really early in the exception handler FPSCR, VSCR are done in the FP/VMX/VSX code. AMR we don't care about. That just leaves the DSCR and the TAR for here ... and the VRSAVE. Sam: did you have a patch to save that one early too? I think we talked about it but forgot, or did we decide that it's always broken anyway so we don't care? :-D I thought we'd decided that VRSAVE was already probably broken ;-) I haven't tested VRSAVE yet so we don't know if it's actually getting corrupted in this situation (although it seems likely), and from a quick look at the code it's not being treated like DSCR or TAR at the moment so I would need to investigate it separately. Mikey The nice thing about doing this in asm is it's nop'ed out for cpus that don't have the DSCR. What does the generated code for this look like? diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S index 662c6dd..a107f4a 100644 --- a/arch/powerpc/kernel/entry_64.S +++ b/arch/powerpc/kernel/entry_64.S @@ -432,12 +432,6 @@ BEGIN_FTR_SECTION std r24,THREAD_VRSAVE(r3) END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC) #endif /* CONFIG_ALTIVEC */ -#ifdef CONFIG_PPC64 -BEGIN_FTR_SECTION - mfspr r25,SPRN_DSCR - std r25,THREAD_DSCR(r3) -END_FTR_SECTION_IFSET(CPU_FTR_DSCR) -#endif and.r0,r0,r22 beq+1f andcr22,r22,r0 I wondered a little about this as well. The C code calls this function: static inline int cpu_has_feature(unsigned long feature) { return (CPU_FTRS_ALWAYS
[PATCH 1/1 v2] powerpc: Correct DSCR during TM context switch
Correct the DSCR SPR becoming temporarily corrupted if a task is context switched during a transaction. The problem occurs while suspending the task and is caused by saving the DSCR to thread.dscr after it has already been set to the CPU's default value: __switch_to() calls __switch_to_tm() which calls tm_reclaim_task() which calls tm_reclaim_thread() which calls tm_reclaim() where the DSCR is set to the CPU's default __switch_to() calls _switch() where thread.dscr is set to the DSCR When the task is resumed, it's transaction will be doomed (as usual) and the DSCR SPR will be corrupted, although the checkpointed value will be correct. Therefore the DSCR will be immediately corrected by the transaction aborting, unless it has been suspended. In that case the incorrect value can be seen by the task until it resumes the transaction. The fix is to treat the DSCR similarly to the TAR and save it early in __switch_to(). A program exposing the problem is added to the kernel self tests as: tools/testing/selftests/powerpc/tm/tm-resched-dscr. Signed-off-by: Sam Bobroff sam.bobr...@au1.ibm.com --- Changes: v2: * Reworked commit message. * Adjusted test code and added it to kernel self tests. --- arch/powerpc/include/asm/switch_to.h |6 +- arch/powerpc/kernel/entry_64.S |6 -- arch/powerpc/kernel/process.c |8 +- tools/testing/selftests/powerpc/Makefile |2 +- tools/testing/selftests/powerpc/tm/Makefile| 15 .../testing/selftests/powerpc/tm/tm-resched-dscr.c | 90 6 files changed, 114 insertions(+), 13 deletions(-) create mode 100644 tools/testing/selftests/powerpc/tm/Makefile create mode 100644 tools/testing/selftests/powerpc/tm/tm-resched-dscr.c diff --git a/arch/powerpc/include/asm/switch_to.h b/arch/powerpc/include/asm/switch_to.h index 2737f46..3efd0e5 100644 --- a/arch/powerpc/include/asm/switch_to.h +++ b/arch/powerpc/include/asm/switch_to.h @@ -16,13 +16,15 @@ struct thread_struct; extern struct task_struct *_switch(struct thread_struct *prev, struct thread_struct *next); #ifdef CONFIG_PPC_BOOK3S_64 -static inline void save_tar(struct thread_struct *prev) +static inline void save_early_sprs(struct thread_struct *prev) { if (cpu_has_feature(CPU_FTR_ARCH_207S)) prev-tar = mfspr(SPRN_TAR); + if (cpu_has_feature(CPU_FTR_DSCR)) + prev-dscr = mfspr(SPRN_DSCR); } #else -static inline void save_tar(struct thread_struct *prev) {} +static inline void save_early_sprs(struct thread_struct *prev) {} #endif extern void enable_kernel_fp(void); diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S index 662c6dd..a107f4a 100644 --- a/arch/powerpc/kernel/entry_64.S +++ b/arch/powerpc/kernel/entry_64.S @@ -432,12 +432,6 @@ BEGIN_FTR_SECTION std r24,THREAD_VRSAVE(r3) END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC) #endif /* CONFIG_ALTIVEC */ -#ifdef CONFIG_PPC64 -BEGIN_FTR_SECTION - mfspr r25,SPRN_DSCR - std r25,THREAD_DSCR(r3) -END_FTR_SECTION_IFSET(CPU_FTR_DSCR) -#endif and.r0,r0,r22 beq+1f andcr22,r22,r0 diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index e247898..8d2065e 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -771,15 +771,15 @@ struct task_struct *__switch_to(struct task_struct *prev, WARN_ON(!irqs_disabled()); - /* Back up the TAR across context switches. + /* Back up the TAR and DSCR across context switches. * Note that the TAR is not available for use in the kernel. (To * provide this, the TAR should be backed up/restored on exception * entry/exit instead, and be in pt_regs. FIXME, this should be in * pt_regs anyway (for debug).) -* Save the TAR here before we do treclaim/trecheckpoint as these -* will change the TAR. +* Save the TAR and DSCR here before we do treclaim/trecheckpoint as +* these will change them. */ - save_tar(prev-thread); + save_early_sprs(prev-thread); __switch_to_tm(prev); diff --git a/tools/testing/selftests/powerpc/Makefile b/tools/testing/selftests/powerpc/Makefile index 316194f..e1544e8 100644 --- a/tools/testing/selftests/powerpc/Makefile +++ b/tools/testing/selftests/powerpc/Makefile @@ -13,7 +13,7 @@ CFLAGS := -Wall -O2 -flto -Wall -Werror -DGIT_VERSION='$(GIT_VERSION)' -I$(CUR export CC CFLAGS -TARGETS = pmu copyloops +TARGETS = pmu copyloops tm endif diff --git a/tools/testing/selftests/powerpc/tm/Makefile b/tools/testing/selftests/powerpc/tm/Makefile new file mode 100644 index 000..51267f4 --- /dev/null +++ b/tools/testing/selftests/powerpc/tm/Makefile @@ -0,0 +1,15 @@ +PROGS := tm-resched-dscr + +all: $(PROGS) + +$(PROGS
[PATCH 1/1] selftests/powerpc: fix TARGETS in powerpc selftests makefile
This patch changes the name of a make variable (TARGETS) to prevent it from colliding with a value set by the user on the command line (as they are recommended to do by tools/testing/selftests/README.txt). Before this patch, make -C tools/testing/selftests TARGETS=powerpc would fail. Signed-off-by: Sam Bobroff sam.bobr...@au1.ibm.com --- tools/testing/selftests/powerpc/Makefile | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/tools/testing/selftests/powerpc/Makefile b/tools/testing/selftests/powerpc/Makefile index 54833a7..84795c0 100644 --- a/tools/testing/selftests/powerpc/Makefile +++ b/tools/testing/selftests/powerpc/Makefile @@ -13,22 +13,22 @@ CFLAGS := -Wall -O2 -flto -Wall -Werror -DGIT_VERSION='$(GIT_VERSION)' -I$(CUR export CC CFLAGS -TARGETS = pmu copyloops mm tm +SUB_TARGETS = pmu copyloops mm tm endif all: - @for TARGET in $(TARGETS); do \ + @for TARGET in $(SUB_TARGETS); do \ $(MAKE) -C $$TARGET all; \ done; run_tests: all - @for TARGET in $(TARGETS); do \ + @for TARGET in $(SUB_TARGETS); do \ $(MAKE) -C $$TARGET run_tests; \ done; clean: - @for TARGET in $(TARGETS); do \ + @for TARGET in $(SUB_TARGETS); do \ $(MAKE) -C $$TARGET clean; \ done; rm -f tags -- 1.9.0 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH V3 0/3] Add new PowerPC specific ELF core notes
On 17/07/14 21:14, Michael Neuling wrote: On Jul 17, 2014 9:11 PM, Benjamin Herrenschmidt b...@kernel.crashing.org mailto:b...@kernel.crashing.org wrote: Outstanding Issues == (1) Running DSCR register value inside a transaction does not seem to be saved at thread.dscr when the process stops for ptrace examination. Hey Ben, Any updates on this patch series ? Ben, Any updates on this patch series ? I haven't had a chance to review yet, I was hoping somebody else would.. Have you made any progress vs. the DSCR outstanding issue mentioned above ? The DSCR issue should be resolved with Sam Bobroff's recent DSCR fixes. I've not tested them though. Actually... Sam did you review this series? Mikey I did, and applying powerpc: Correct DSCR during TM context switch corrected the DSCR value in the test program (the one in the patch notes for this series). (In fact, IIRC, the reason for my patch set was the bug exposed by this one ;-) Cheers, Sam. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
suspicious RCU usage clockevents_lock, tick_broadcast_lock, hrtimer_bases.lock
Hello, I'm receiving this while booting a vanilla 3.19 kernel on a Power 8 machine: [2.522179] device-mapper: uevent: version 1.0.3 [2.522741] device-mapper: ioctl: 4.29.0-ioctl (2014-10-28) initialised: dm-de...@redhat.com [2.543590] [2.543630] === [2.543709] [ INFO: suspicious RCU usage. ] [2.543758] 3.19.0samb #2 Not tainted [2.543802] --- [2.543847] include/trace/events/timer.h:186 suspicious rcu_dereference_check() usage! [2.543940] [2.543940] other info that might help us debug this: [2.543940] [2.544035] [2.544035] RCU used illegally from idle CPU! [2.544035] rcu_scheduler_active = 1, debug_locks = 0 [2.544154] RCU used illegally from extended quiescent state! [2.544234] 3 locks held by swapper/1/0: [2.544284] #0: (clockevents_lock){-.}, at: [c01a45cc] .clockevents_notify+0x5c/0x320 [2.544464] #1: (tick_broadcast_lock){-.-...}, at: [c01a7ac4] .tick_broadcast_oneshot_control+0xe4/0x530 [2.544654] #2: (hrtimer_bases.lock#2){-.-...}, at: [c0192694] .__hrtimer_start_range_ns+0x124/0x6e0 [2.544843] [2.544843] stack backtrace: [2.544904] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.19.0samb #2 [2.544986] Call Trace: [2.545023] [c00fdf6f7770] [c0faf118] .dump_stack+0x98/0xd4 (unreliable) [2.545124] [c00fdf6f77f0] [c0152b78] .lockdep_rcu_suspicious+0x138/0x180 [2.545225] [c00fdf6f7880] [c0191374] .enqueue_hrtimer+0x1c4/0x300 [2.545325] [c00fdf6f7910] [c019276c] .__hrtimer_start_range_ns+0x1fc/0x6e0 [2.545425] [c00fdf6f7a10] [c01a8e90] .bc_set_next+0xc0/0xf0 [2.545510] [c00fdf6f7aa0] [c01a51f0] .clockevents_program_event+0x100/0x1f0 [2.545607] [c00fdf6f7b40] [c01a6bac] .tick_broadcast_set_event+0x6c/0x120 [2.545705] [c00fdf6f7bd0] [c01a7c94] .tick_broadcast_oneshot_control+0x2b4/0x530 [2.545802] [c00fdf6f7ca0] [c01a4818] .clockevents_notify+0x2a8/0x320 [2.545898] [c00fdf6f7d70] [c01484f4] .cpu_startup_entry+0x404/0x730 [2.545995] [c00fdf6f7ec0] [c0044314] .start_secondary+0x3a4/0x460 [2.546092] [c00fdf6f7f90] [c0008bfc] start_secondary_prolog+0x10/0x14 [2.546555] Registering IBM Power 842 compression driver Cheers, Sam. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH] powerpc/powernv: Check image loaded or not before calling flash
On 13/02/15 08:27, Benjamin Herrenschmidt wrote: On Thu, 2015-02-12 at 15:23 +0530, Vasant Hegde wrote: Present code checks for update_flash_data in opal_flash_term_callback(). update_flash_data has been statically initialized to zero, and that is the value of FLASH_IMG_READY. Also code update initialization happens during subsys init. Please statically initialize it to a sane value instead. I've tested this patch and it works for me (the message is suppressed) but I agree with Ben that it seems cleaner to use a static initializer. So if reboot is issued before the subsys init stage then we endup displaying Flashing new firmware message.. which may confuse end user. This patch adds additional validation to make sure image is actually loaded or not. Reported-by: Sam Bobroff sam.bobr...@au1.ibm.com Signed-off-by: Vasant Hegde hegdevas...@linux.vnet.ibm.com --- arch/powerpc/platforms/powernv/opal-flash.c |3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/platforms/powernv/opal-flash.c b/arch/powerpc/platforms/powernv/opal-flash.c index 5c21d9c..5455cd4 100644 --- a/arch/powerpc/platforms/powernv/opal-flash.c +++ b/arch/powerpc/platforms/powernv/opal-flash.c @@ -319,7 +319,8 @@ void opal_flash_term_callback(void) { struct cpumask mask; -if (update_flash_data.status != FLASH_IMG_READY) +if (update_flash_data.status != FLASH_IMG_READY || +image_data.status != IMAGE_READY) return; pr_alert(FLASH: Flashing new firmware\n); ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 1/3] powerpc/tm: Abort syscalls in active transactions
This patch changes the syscall handler to doom (tabort) active transactions when a syscall is made and return immediately without performing the syscall. Currently, the system call instruction automatically suspends an active transaction which causes side effects to persist when an active transaction fails. This does change the kernel's behaviour, but in a way that was documented as unsupported. It doesn't reduce functionality because syscalls will still be performed after tsuspend. It also provides a consistent interface and makes the behaviour of user code substantially the same across powerpc and platforms that do not support suspended transactions (e.g. x86 and s390). Performance measurements using http://ozlabs.org/~anton/junkcode/null_syscall.c indicate the cost of a system call increases by about 0.5%. Signed-off-by: Sam Bobroff sam.bobr...@au1.ibm.com --- Documentation/powerpc/transactional_memory.txt | 33 arch/powerpc/include/uapi/asm/tm.h |2 +- arch/powerpc/kernel/entry_64.S | 19 ++ 3 files changed, 37 insertions(+), 17 deletions(-) diff --git a/Documentation/powerpc/transactional_memory.txt b/Documentation/powerpc/transactional_memory.txt index 9791e98..4167bc2 100644 --- a/Documentation/powerpc/transactional_memory.txt +++ b/Documentation/powerpc/transactional_memory.txt @@ -74,22 +74,23 @@ Causes of transaction aborts Syscalls -Performing syscalls from within transaction is not recommended, and can lead -to unpredictable results. - -Syscalls do not by design abort transactions, but beware: The kernel code will -not be running in transactional state. The effect of syscalls will always -remain visible, but depending on the call they may abort your transaction as a -side-effect, read soon-to-be-aborted transactional data that should not remain -invisible, etc. If you constantly retry a transaction that constantly aborts -itself by calling a syscall, you'll have a livelock make no progress. - -Simple syscalls (e.g. sigprocmask()) could be OK. Even things like write() -from, say, printf() should be OK as long as the kernel does not access any -memory that was accessed transactionally. - -Consider any syscalls that happen to work as debug-only -- not recommended for -production use. Best to queue them up till after the transaction is over. +Syscalls made from within an active transaction will not be performed and the +transaction will be doomed by the kernel with the failure code TM_CAUSE_SYSCALL +| TM_CAUSE_PERSISTENT. + +Syscalls made from within a suspended transaction are performed as normal and +the transaction is not explicitly doomed by the kernel. However, what the +kernel does to perform the syscall may result in the transaction being doomed +by the hardware. The syscall is performed in suspended mode so any side +effects will be persistent, independent of transaction success or failure. No +guarantees are provided by the kernel about which syscalls will affect +transaction success. + +Care must be taken when relying on syscalls to abort during active transactions +if the calls are made via a library. Libraries may cache values (which may +give the appearence of success) or perform operations that cause transaction +failure before entering the kernel (which may produce different failure codes). +Examples are glibc's getpid() and lazy symbol resolution. Signals diff --git a/arch/powerpc/include/uapi/asm/tm.h b/arch/powerpc/include/uapi/asm/tm.h index 5d836b7..5047659 100644 --- a/arch/powerpc/include/uapi/asm/tm.h +++ b/arch/powerpc/include/uapi/asm/tm.h @@ -11,7 +11,7 @@ #define TM_CAUSE_RESCHED 0xde #define TM_CAUSE_TLBI 0xdc #define TM_CAUSE_FAC_UNAV 0xda -#define TM_CAUSE_SYSCALL 0xd8 /* future use */ +#define TM_CAUSE_SYSCALL 0xd8 #define TM_CAUSE_MISC 0xd6 /* future use */ #define TM_CAUSE_SIGNAL0xd4 #define TM_CAUSE_ALIGNMENT 0xd2 diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S index d180caf2..85bf81d 100644 --- a/arch/powerpc/kernel/entry_64.S +++ b/arch/powerpc/kernel/entry_64.S @@ -34,6 +34,7 @@ #include asm/ftrace.h #include asm/hw_irq.h #include asm/context_tracking.h +#include asm/tm.h /* * System calls. @@ -145,6 +146,24 @@ END_FW_FTR_SECTION_IFSET(FW_FEATURE_SPLPAR) andi. r11,r10,_TIF_SYSCALL_DOTRACE bne syscall_dotrace .Lsyscall_dotrace_cont: +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM +BEGIN_FTR_SECTION + b 1f +END_FTR_SECTION_IFCLR(CPU_FTR_TM) + extrdi. r11, r12, 1, (63-MSR_TS_T_LG) /* transaction active? */ + beq+1f + + /* Doom the transaction and don't perform the syscall: */ + mfmsr r11 + li r12, 1 + rldimi r11, r12, MSR_TM_LG, 63-MSR_TM_LG + mtmsrd r11, 0 + li r11, (TM_CAUSE_SYSCALL|TM_CAUSE_PERSISTENT) + tabort. r11 + + b .Lsyscall_exit +1
[PATCH 2/3] selftests/powerpc: Move get_auxv_entry() to harness.c
Move get_auxv_entry() from pmu/lib.c up to harness.c in order to make it available to other tests. Signed-off-by: Sam Bobroff sam.bobr...@au1.ibm.com --- tools/testing/selftests/powerpc/harness.c | 47 + tools/testing/selftests/powerpc/pmu/lib.c | 47 - tools/testing/selftests/powerpc/pmu/lib.h |1 - tools/testing/selftests/powerpc/utils.h |2 +- 4 files changed, 48 insertions(+), 49 deletions(-) diff --git a/tools/testing/selftests/powerpc/harness.c b/tools/testing/selftests/powerpc/harness.c index 8ebc58a..f7997af 100644 --- a/tools/testing/selftests/powerpc/harness.c +++ b/tools/testing/selftests/powerpc/harness.c @@ -11,6 +11,10 @@ #include sys/types.h #include sys/wait.h #include unistd.h +#include elf.h +#include fcntl.h +#include link.h +#include sys/stat.h #include subunit.h #include utils.h @@ -112,3 +116,46 @@ int test_harness(int (test_function)(void), char *name) return rc; } + +static char auxv[4096]; + +void *get_auxv_entry(int type) +{ + ElfW(auxv_t) *p; + void *result; + ssize_t num; + int fd; + + fd = open(/proc/self/auxv, O_RDONLY); + if (fd == -1) { + perror(open); + return NULL; + } + + result = NULL; + + num = read(fd, auxv, sizeof(auxv)); + if (num 0) { + perror(read); + goto out; + } + + if (num sizeof(auxv)) { + printf(Overflowed auxv buffer\n); + goto out; + } + + p = (ElfW(auxv_t) *)auxv; + + while (p-a_type != AT_NULL) { + if (p-a_type == type) { + result = (void *)p-a_un.a_val; + break; + } + + p++; + } +out: + close(fd); + return result; +} diff --git a/tools/testing/selftests/powerpc/pmu/lib.c b/tools/testing/selftests/powerpc/pmu/lib.c index 9768dea..a07104c 100644 --- a/tools/testing/selftests/powerpc/pmu/lib.c +++ b/tools/testing/selftests/powerpc/pmu/lib.c @@ -5,15 +5,10 @@ #define _GNU_SOURCE/* For CPU_ZERO etc. */ -#include elf.h #include errno.h -#include fcntl.h -#include link.h #include sched.h #include setjmp.h #include stdlib.h -#include sys/stat.h -#include sys/types.h #include sys/wait.h #include utils.h @@ -256,45 +251,3 @@ out: return rc; } -static char auxv[4096]; - -void *get_auxv_entry(int type) -{ - ElfW(auxv_t) *p; - void *result; - ssize_t num; - int fd; - - fd = open(/proc/self/auxv, O_RDONLY); - if (fd == -1) { - perror(open); - return NULL; - } - - result = NULL; - - num = read(fd, auxv, sizeof(auxv)); - if (num 0) { - perror(read); - goto out; - } - - if (num sizeof(auxv)) { - printf(Overflowed auxv buffer\n); - goto out; - } - - p = (ElfW(auxv_t) *)auxv; - - while (p-a_type != AT_NULL) { - if (p-a_type == type) { - result = (void *)p-a_un.a_val; - break; - } - - p++; - } -out: - close(fd); - return result; -} diff --git a/tools/testing/selftests/powerpc/pmu/lib.h b/tools/testing/selftests/powerpc/pmu/lib.h index 0f0339c..ca5d72a 100644 --- a/tools/testing/selftests/powerpc/pmu/lib.h +++ b/tools/testing/selftests/powerpc/pmu/lib.h @@ -29,7 +29,6 @@ extern int notify_parent(union pipe write_pipe); extern int notify_parent_of_error(union pipe write_pipe); extern pid_t eat_cpu(int (test_function)(void)); extern bool require_paranoia_below(int level); -extern void *get_auxv_entry(int type); struct addr_range { uint64_t first, last; diff --git a/tools/testing/selftests/powerpc/utils.h b/tools/testing/selftests/powerpc/utils.h index a93777a..64f53cd 100644 --- a/tools/testing/selftests/powerpc/utils.h +++ b/tools/testing/selftests/powerpc/utils.h @@ -19,7 +19,7 @@ typedef uint8_t u8; int test_harness(int (test_function)(void), char *name); - +extern void *get_auxv_entry(int type); /* Yes, this is evil */ #define FAIL_IF(x) \ -- 1.7.10.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 3/3] selftests/powerpc: Add transactional syscall test
Check that a syscall made during an active transaction will fail with the correct failure code and that one made during a suspended transaction will succeed. Signed-off-by: Sam Bobroff sam.bobr...@au1.ibm.com --- tools/testing/selftests/powerpc/tm/Makefile |3 +- tools/testing/selftests/powerpc/tm/tm-syscall.c | 113 +++ 2 files changed, 115 insertions(+), 1 deletion(-) create mode 100644 tools/testing/selftests/powerpc/tm/tm-syscall.c diff --git a/tools/testing/selftests/powerpc/tm/Makefile b/tools/testing/selftests/powerpc/tm/Makefile index 2cede23..d8dab0d 100644 --- a/tools/testing/selftests/powerpc/tm/Makefile +++ b/tools/testing/selftests/powerpc/tm/Makefile @@ -1,4 +1,5 @@ -PROGS := tm-resched-dscr +PROGS := tm-resched-dscr tm-syscall +CFLAGS:=$(CFLAGS) -mhtm -Wl,-z,now all: $(PROGS) diff --git a/tools/testing/selftests/powerpc/tm/tm-syscall.c b/tools/testing/selftests/powerpc/tm/tm-syscall.c new file mode 100644 index 000..7c60e53 --- /dev/null +++ b/tools/testing/selftests/powerpc/tm/tm-syscall.c @@ -0,0 +1,113 @@ +/* Test the kernel's system call code to ensure that a system call + * made from within an active HTM transaction is aborted with the + * correct failure code. + * Conversely, ensure that a system call made from within a + * suspended transaction can succeed. + * + * It is important to compile with -Wl,-z,now to prevent + * lazy symbol resolution from affecting the results. + */ + +#include stdio.h +#include unistd.h +#include asm/tm.h +#include asm/cputable.h +#include linux/auxvec.h + +#include utils.h + +#define TM_RETRIES 10 +#define TM_TEST_RUNS 1000 + +int t_failure_persistent(void) +{ + long texasr = __builtin_get_texasr(); + long failure_code = (texasr 56) 0xff; + + return failure_code TM_CAUSE_PERSISTENT; +} + +int t_failure_code_syscall(void) +{ + long texasr = __builtin_get_texasr(); + long failure_code = (texasr 56) 0xff; + + return (failure_code TM_CAUSE_SYSCALL) == TM_CAUSE_SYSCALL; +} + +int t_active_getppid(void) +{ + int i; + + for (i = 0; i TM_RETRIES; i++) { + if (__builtin_tbegin(0)) { + getppid(); + __builtin_tend(0); + return 1; + } + if (t_failure_persistent()) + return 0; + } + return 0; +} + +int t_active_getppid_test(void) +{ + int i; + + for (i = 0; i TM_TEST_RUNS; i++) { + if (t_active_getppid()) + return 0; + if (!t_failure_persistent()) + return 0; + if (!t_failure_code_syscall()) + return 0; + } + return 1; +} + +int t_suspended_getppid(void) +{ + int i; + + for (i = 0; i TM_RETRIES; i++) { + if (__builtin_tbegin(0)) { + __builtin_tsuspend(); + getppid(); + __builtin_tresume(); + __builtin_tend(0); + return 1; + } + if (t_failure_persistent()) + return 0; + } + return 0; +} + +int t_suspended_getppid_test(void) +{ + int i; + + for (i = 0; i TM_TEST_RUNS; i++) { + if (!t_suspended_getppid()) + return 0; + } + return 1; +} + +int tm_syscall(void) +{ + SKIP_IF(!((long)get_auxv_entry(AT_HWCAP2) PPC_FEATURE2_HTM)); + setbuf(stdout, 0); + FAIL_IF(!t_active_getppid_test()); + printf(%d active transactions correctly aborted.\n, TM_TEST_RUNS); + FAIL_IF(!t_suspended_getppid_test()); + printf(%d suspended transactions succeeded.\n, TM_TEST_RUNS); + return 0; +} + +int main(void) +{ + return test_harness(tm_syscall, tm_syscall); +} + -- 1.7.10.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 0/3] powerpc/tm: Abort syscalls in active transactions
See the first patch for a description of the reasoning behind this change. This set includes the change, a kernel selftest for it and some slight refactoring of the selftest code. Sam Bobroff (3): powerpc/tm: Abort syscalls in active transactions selftests/powerpc: Move get_auxv_entry() to harness.c selftests/powerpc: Add transactional syscall test Documentation/powerpc/transactional_memory.txt | 33 +++ arch/powerpc/include/uapi/asm/tm.h |2 +- arch/powerpc/kernel/entry_64.S | 19 tools/testing/selftests/powerpc/harness.c | 47 ++ tools/testing/selftests/powerpc/pmu/lib.c | 47 -- tools/testing/selftests/powerpc/pmu/lib.h |1 - tools/testing/selftests/powerpc/tm/Makefile |3 +- tools/testing/selftests/powerpc/tm/tm-syscall.c | 113 +++ tools/testing/selftests/powerpc/utils.h |2 +- 9 files changed, 200 insertions(+), 67 deletions(-) create mode 100644 tools/testing/selftests/powerpc/tm/tm-syscall.c -- 1.7.10.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v2 4/4] powerpc/tm: Correct minor documentation typos
Signed-off-by: Sam Bobroff sam.bobr...@au1.ibm.com --- v2: Discovered some typos while updating the documentation. Documentation/powerpc/transactional_memory.txt |4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/Documentation/powerpc/transactional_memory.txt b/Documentation/powerpc/transactional_memory.txt index 98b39af..ba0a2a4 100644 --- a/Documentation/powerpc/transactional_memory.txt +++ b/Documentation/powerpc/transactional_memory.txt @@ -175,7 +175,7 @@ These are defined in asm/reg.h, and distinguish different reasons why the kernel aborted a transaction: TM_CAUSE_RESCHED Thread was rescheduled. - TM_CAUSE_TLBI Software TLB invalide. + TM_CAUSE_TLBI Software TLB invalid. TM_CAUSE_FAC_UNAV FP/VEC/VSX unavailable trap. TM_CAUSE_SYSCALL Syscall from active transaction. TM_CAUSE_SIGNALSignal delivered. @@ -185,7 +185,7 @@ kernel aborted a transaction: These can be checked by the user program's abort handler as TEXASR[0:7]. If bit 7 is set, it indicates that the error is consider persistent. For example -a TM_CAUSE_ALIGNMENT will be persistent while a TM_CAUSE_RESCHED will not.q +a TM_CAUSE_ALIGNMENT will be persistent while a TM_CAUSE_RESCHED will not. GDB === -- 1.7.10.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v2 0/4] powerpc/tm: Abort syscalls in active transactions
See the first patch for a description of the reasoning behind this change. This set includes the change, a kernel selftest for it and some slight refactoring of the selftest code. v2: Patch 1/4: powerpc/tm: Abort syscalls in active transactions Also update the failure code table. Patch 3/4: selftests/powerpc: Add transactional syscall test Further testing has shown that the success or failure of the transactions was affected by minor changes to the code, compiler optimisation and linker settings. To address this, I've moved the transactional part of the test to a separate function, written in assembly. I've also extended the test to as many transactions as it can fit into ten seconds, to better test for failures that occur more rarely. This has stabilised the results, and it's no longer necessary to use special compiler or linker flags. Patch 4/4: powerpc/tm: Correct minor documentation typos Discovered some typos while updating the documentation. Sam Bobroff (4): powerpc/tm: Abort syscalls in active transactions selftests/powerpc: Move get_auxv_entry() to harness.c selftests/powerpc: Add transactional syscall test powerpc/tm: Correct minor documentation typos Documentation/powerpc/transactional_memory.txt | 36 +++ arch/powerpc/include/uapi/asm/tm.h |2 +- arch/powerpc/kernel/entry_64.S | 19 tools/testing/selftests/powerpc/harness.c | 47 + tools/testing/selftests/powerpc/pmu/lib.c | 47 - tools/testing/selftests/powerpc/pmu/lib.h |1 - tools/testing/selftests/powerpc/tm/.gitignore |1 + tools/testing/selftests/powerpc/tm/Makefile|4 +- .../testing/selftests/powerpc/tm/tm-syscall-asm.S | 27 + .../testing/selftests/powerpc/tm/tm-syscall-asm.h |2 + tools/testing/selftests/powerpc/tm/tm-syscall.c| 109 tools/testing/selftests/powerpc/utils.h|2 +- 12 files changed, 228 insertions(+), 69 deletions(-) create mode 100644 tools/testing/selftests/powerpc/tm/tm-syscall-asm.S create mode 100644 tools/testing/selftests/powerpc/tm/tm-syscall-asm.h create mode 100644 tools/testing/selftests/powerpc/tm/tm-syscall.c -- 1.7.10.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 3/3] selftests/powerpc: Add transactional syscall test
On 24/03/15 13:02, Michael Ellerman wrote: On Tue, 2015-03-24 at 12:52 +1100, Sam Bobroff wrote: On 20/03/15 20:25, Anshuman Khandual wrote: On 03/19/2015 10:13 AM, Sam Bobroff wrote: Check that a syscall made during an active transaction will fail with the correct failure code and that one made during a suspended transaction will succeed. Signed-off-by: Sam Bobroff sam.bobr...@au1.ibm.com The test works. Great :-) + +int tm_syscall(void) +{ + SKIP_IF(!((long)get_auxv_entry(AT_HWCAP2) PPC_FEATURE2_HTM)); + setbuf(stdout, 0); + FAIL_IF(!t_active_getppid_test()); + printf(%d active transactions correctly aborted.\n, TM_TEST_RUNS); + FAIL_IF(!t_suspended_getppid_test()); + printf(%d suspended transactions succeeded.\n, TM_TEST_RUNS); + return 0; +} + +int main(void) +{ + return test_harness(tm_syscall, tm_syscall); +} + There is an extra blank line at the end of this file. Interchanging return codes of 0 and 1 for various functions make it very confusing along with negative FAIL_IF checks in the primary test function. Control flow structures like these can use some in-code documentation for readability. + for (i = 0; i TM_RETRIES; i++) { + if (__builtin_tbegin(0)) { + getppid(); + __builtin_tend(0); + return 1; + } + if (t_failure_persistent()) + return 0; or + if (__builtin_tbegin(0)) { + __builtin_tsuspend(); + getppid(); + __builtin_tresume(); + __builtin_tend(0); + return 1; + } + if (t_failure_persistent()) + return 0; Good points. I'll remove the blank line and comment the code. I'm not sure I can do any better with the FAIL_IF() macro: I wanted it to read fail if the test failed, but I can see what you mean about a double negative. Maybe it would be better to introduce a different macro, more like a standard assert: TEST(XXX) which fails if XXX is false. However, I think TEST would be too generic a name and I'm not should what would be better. Any comments/suggestions? FAIL_IF() is designed for things that return 0 for OK and !0 for failure. Like most things in C. So I think it would be improved if you inverted your return codes in your test routines. Even better to return ESOMETHING in the error cases, and zero otherwise. cheers Fair enough. I think the *_test() functions I added for clarity were just making it more confusing, so I've dropped them. Moving the code around, even a little, has also exposed the fact that transactions are very sensitive to how the code is compiled so I'm going to move the transaction blocks out into a separate assembly file where I can control exactly what instructions get used. This will also mean that it's no longer dependent on using linker magic (or some other trick) to avoid lazy symbol loading. I'll repost the series. Thanks for the review! Cheers, Sam. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v2 2/4] selftests/powerpc: Move get_auxv_entry() to harness.c
Move get_auxv_entry() from pmu/lib.c up to harness.c in order to make it available to other tests. Signed-off-by: Sam Bobroff sam.bobr...@au1.ibm.com --- tools/testing/selftests/powerpc/harness.c | 47 + tools/testing/selftests/powerpc/pmu/lib.c | 47 - tools/testing/selftests/powerpc/pmu/lib.h |1 - tools/testing/selftests/powerpc/utils.h |2 +- 4 files changed, 48 insertions(+), 49 deletions(-) diff --git a/tools/testing/selftests/powerpc/harness.c b/tools/testing/selftests/powerpc/harness.c index 8ebc58a..f7997af 100644 --- a/tools/testing/selftests/powerpc/harness.c +++ b/tools/testing/selftests/powerpc/harness.c @@ -11,6 +11,10 @@ #include sys/types.h #include sys/wait.h #include unistd.h +#include elf.h +#include fcntl.h +#include link.h +#include sys/stat.h #include subunit.h #include utils.h @@ -112,3 +116,46 @@ int test_harness(int (test_function)(void), char *name) return rc; } + +static char auxv[4096]; + +void *get_auxv_entry(int type) +{ + ElfW(auxv_t) *p; + void *result; + ssize_t num; + int fd; + + fd = open(/proc/self/auxv, O_RDONLY); + if (fd == -1) { + perror(open); + return NULL; + } + + result = NULL; + + num = read(fd, auxv, sizeof(auxv)); + if (num 0) { + perror(read); + goto out; + } + + if (num sizeof(auxv)) { + printf(Overflowed auxv buffer\n); + goto out; + } + + p = (ElfW(auxv_t) *)auxv; + + while (p-a_type != AT_NULL) { + if (p-a_type == type) { + result = (void *)p-a_un.a_val; + break; + } + + p++; + } +out: + close(fd); + return result; +} diff --git a/tools/testing/selftests/powerpc/pmu/lib.c b/tools/testing/selftests/powerpc/pmu/lib.c index 9768dea..a07104c 100644 --- a/tools/testing/selftests/powerpc/pmu/lib.c +++ b/tools/testing/selftests/powerpc/pmu/lib.c @@ -5,15 +5,10 @@ #define _GNU_SOURCE/* For CPU_ZERO etc. */ -#include elf.h #include errno.h -#include fcntl.h -#include link.h #include sched.h #include setjmp.h #include stdlib.h -#include sys/stat.h -#include sys/types.h #include sys/wait.h #include utils.h @@ -256,45 +251,3 @@ out: return rc; } -static char auxv[4096]; - -void *get_auxv_entry(int type) -{ - ElfW(auxv_t) *p; - void *result; - ssize_t num; - int fd; - - fd = open(/proc/self/auxv, O_RDONLY); - if (fd == -1) { - perror(open); - return NULL; - } - - result = NULL; - - num = read(fd, auxv, sizeof(auxv)); - if (num 0) { - perror(read); - goto out; - } - - if (num sizeof(auxv)) { - printf(Overflowed auxv buffer\n); - goto out; - } - - p = (ElfW(auxv_t) *)auxv; - - while (p-a_type != AT_NULL) { - if (p-a_type == type) { - result = (void *)p-a_un.a_val; - break; - } - - p++; - } -out: - close(fd); - return result; -} diff --git a/tools/testing/selftests/powerpc/pmu/lib.h b/tools/testing/selftests/powerpc/pmu/lib.h index 0f0339c..ca5d72a 100644 --- a/tools/testing/selftests/powerpc/pmu/lib.h +++ b/tools/testing/selftests/powerpc/pmu/lib.h @@ -29,7 +29,6 @@ extern int notify_parent(union pipe write_pipe); extern int notify_parent_of_error(union pipe write_pipe); extern pid_t eat_cpu(int (test_function)(void)); extern bool require_paranoia_below(int level); -extern void *get_auxv_entry(int type); struct addr_range { uint64_t first, last; diff --git a/tools/testing/selftests/powerpc/utils.h b/tools/testing/selftests/powerpc/utils.h index a93777a..64f53cd 100644 --- a/tools/testing/selftests/powerpc/utils.h +++ b/tools/testing/selftests/powerpc/utils.h @@ -19,7 +19,7 @@ typedef uint8_t u8; int test_harness(int (test_function)(void), char *name); - +extern void *get_auxv_entry(int type); /* Yes, this is evil */ #define FAIL_IF(x) \ -- 1.7.10.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v2 1/4] powerpc/tm: Abort syscalls in active transactions
This patch changes the syscall handler to doom (tabort) active transactions when a syscall is made and return immediately without performing the syscall. Currently, the system call instruction automatically suspends an active transaction which causes side effects to persist when an active transaction fails. This does change the kernel's behaviour, but in a way that was documented as unsupported. It doesn't reduce functionality because syscalls will still be performed after tsuspend. It also provides a consistent interface and makes the behaviour of user code substantially the same across powerpc and platforms that do not support suspended transactions (e.g. x86 and s390). Performance measurements using http://ozlabs.org/~anton/junkcode/null_syscall.c indicate the cost of a system call increases by about 0.5%. Signed-off-by: Sam Bobroff sam.bobr...@au1.ibm.com --- v2: Also update the failure code table. Documentation/powerpc/transactional_memory.txt | 32 arch/powerpc/include/uapi/asm/tm.h |2 +- arch/powerpc/kernel/entry_64.S | 19 ++ 3 files changed, 36 insertions(+), 17 deletions(-) diff --git a/Documentation/powerpc/transactional_memory.txt b/Documentation/powerpc/transactional_memory.txt index 9791e98..98b39af 100644 --- a/Documentation/powerpc/transactional_memory.txt +++ b/Documentation/powerpc/transactional_memory.txt @@ -74,22 +74,23 @@ Causes of transaction aborts Syscalls -Performing syscalls from within transaction is not recommended, and can lead -to unpredictable results. +Syscalls made from within an active transaction will not be performed and the +transaction will be doomed by the kernel with the failure code TM_CAUSE_SYSCALL +| TM_CAUSE_PERSISTENT. -Syscalls do not by design abort transactions, but beware: The kernel code will -not be running in transactional state. The effect of syscalls will always -remain visible, but depending on the call they may abort your transaction as a -side-effect, read soon-to-be-aborted transactional data that should not remain -invisible, etc. If you constantly retry a transaction that constantly aborts -itself by calling a syscall, you'll have a livelock make no progress. +Syscalls made from within a suspended transaction are performed as normal and +the transaction is not explicitly doomed by the kernel. However, what the +kernel does to perform the syscall may result in the transaction being doomed +by the hardware. The syscall is performed in suspended mode so any side +effects will be persistent, independent of transaction success or failure. No +guarantees are provided by the kernel about which syscalls will affect +transaction success. -Simple syscalls (e.g. sigprocmask()) could be OK. Even things like write() -from, say, printf() should be OK as long as the kernel does not access any -memory that was accessed transactionally. - -Consider any syscalls that happen to work as debug-only -- not recommended for -production use. Best to queue them up till after the transaction is over. +Care must be taken when relying on syscalls to abort during active transactions +if the calls are made via a library. Libraries may cache values (which may +give the appearance of success) or perform operations that cause transaction +failure before entering the kernel (which may produce different failure codes). +Examples are glibc's getpid() and lazy symbol resolution. Signals @@ -176,8 +177,7 @@ kernel aborted a transaction: TM_CAUSE_RESCHED Thread was rescheduled. TM_CAUSE_TLBI Software TLB invalide. TM_CAUSE_FAC_UNAV FP/VEC/VSX unavailable trap. - TM_CAUSE_SYSCALL Currently unused; future syscalls that must abort -transactions for consistency will use this. + TM_CAUSE_SYSCALL Syscall from active transaction. TM_CAUSE_SIGNALSignal delivered. TM_CAUSE_MISC Currently unused. TM_CAUSE_ALIGNMENT Alignment fault. diff --git a/arch/powerpc/include/uapi/asm/tm.h b/arch/powerpc/include/uapi/asm/tm.h index 5d836b7..5047659 100644 --- a/arch/powerpc/include/uapi/asm/tm.h +++ b/arch/powerpc/include/uapi/asm/tm.h @@ -11,7 +11,7 @@ #define TM_CAUSE_RESCHED 0xde #define TM_CAUSE_TLBI 0xdc #define TM_CAUSE_FAC_UNAV 0xda -#define TM_CAUSE_SYSCALL 0xd8 /* future use */ +#define TM_CAUSE_SYSCALL 0xd8 #define TM_CAUSE_MISC 0xd6 /* future use */ #define TM_CAUSE_SIGNAL0xd4 #define TM_CAUSE_ALIGNMENT 0xd2 diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S index d180caf2..85bf81d 100644 --- a/arch/powerpc/kernel/entry_64.S +++ b/arch/powerpc/kernel/entry_64.S @@ -34,6 +34,7 @@ #include asm/ftrace.h #include asm/hw_irq.h #include asm/context_tracking.h +#include asm/tm.h /* * System calls. @@ -145,6 +146,24 @@ END_FW_FTR_SECTION_IFSET(FW_FEATURE_SPLPAR) andi. r11,r10
[PATCH v2 3/4] selftests/powerpc: Add transactional syscall test
Check that a syscall made during an active transaction will fail with the correct failure code and that one made during a suspended transaction will succeed. Signed-off-by: Sam Bobroff sam.bobr...@au1.ibm.com --- v2: Further testing has shown that the success or failure of the transactions was affected by minor changes to the code, compiler optimisation and linker settings. To address this, I've moved the transactional part of the test to a separate function, written in assembly. I've also extended the test to as many transactions as it can fit into ten seconds, to better test for failures that occur more rarely. This has stabilised the results, and it's no longer necessary to use special compiler or linker flags. tools/testing/selftests/powerpc/tm/.gitignore |1 + tools/testing/selftests/powerpc/tm/Makefile|4 +- .../testing/selftests/powerpc/tm/tm-syscall-asm.S | 27 + .../testing/selftests/powerpc/tm/tm-syscall-asm.h |2 + tools/testing/selftests/powerpc/tm/tm-syscall.c| 109 5 files changed, 142 insertions(+), 1 deletion(-) create mode 100644 tools/testing/selftests/powerpc/tm/tm-syscall-asm.S create mode 100644 tools/testing/selftests/powerpc/tm/tm-syscall-asm.h create mode 100644 tools/testing/selftests/powerpc/tm/tm-syscall.c diff --git a/tools/testing/selftests/powerpc/tm/.gitignore b/tools/testing/selftests/powerpc/tm/.gitignore index 33d02cc..2699635d 100644 --- a/tools/testing/selftests/powerpc/tm/.gitignore +++ b/tools/testing/selftests/powerpc/tm/.gitignore @@ -1 +1,2 @@ tm-resched-dscr +tm-syscall diff --git a/tools/testing/selftests/powerpc/tm/Makefile b/tools/testing/selftests/powerpc/tm/Makefile index 2cede23..93bbff3 100644 --- a/tools/testing/selftests/powerpc/tm/Makefile +++ b/tools/testing/selftests/powerpc/tm/Makefile @@ -1,8 +1,10 @@ -PROGS := tm-resched-dscr +PROGS := tm-resched-dscr tm-syscall +CFLAGS:=$(CFLAGS) -mhtm all: $(PROGS) $(PROGS): ../harness.c +tm-syscall: tm-syscall-asm.S run_tests: all @-for PROG in $(PROGS); do \ diff --git a/tools/testing/selftests/powerpc/tm/tm-syscall-asm.S b/tools/testing/selftests/powerpc/tm/tm-syscall-asm.S new file mode 100644 index 000..2b2daa7 --- /dev/null +++ b/tools/testing/selftests/powerpc/tm/tm-syscall-asm.S @@ -0,0 +1,27 @@ +#include ppc-asm.h +#include asm/unistd.h + + .text +FUNC_START(getppid_tm_active_impl) + tbegin. + beq 1f + li r0, __NR_getppid + sc + tend. + blr +1: + li r3, -1 + blr + +FUNC_START(getppid_tm_suspended_impl) + tbegin. + beq 1f + li r0, __NR_getppid + tsuspend. + sc + tresume. + tend. + blr +1: + li r3, -1 + blr diff --git a/tools/testing/selftests/powerpc/tm/tm-syscall-asm.h b/tools/testing/selftests/powerpc/tm/tm-syscall-asm.h new file mode 100644 index 000..6136328 --- /dev/null +++ b/tools/testing/selftests/powerpc/tm/tm-syscall-asm.h @@ -0,0 +1,2 @@ +extern int getppid_tm_active_impl(void); +extern int getppid_tm_suspended_impl(void); diff --git a/tools/testing/selftests/powerpc/tm/tm-syscall.c b/tools/testing/selftests/powerpc/tm/tm-syscall.c new file mode 100644 index 000..ff3b15c --- /dev/null +++ b/tools/testing/selftests/powerpc/tm/tm-syscall.c @@ -0,0 +1,109 @@ +/* Test the kernel's system call code to ensure that a system call + * made from within an active HTM transaction is aborted with the + * correct failure code. + * Conversely, ensure that a system call made from within a + * suspended transaction can succeed. + */ + +#include stdio.h +#include unistd.h +#include sys/syscall.h +#include asm/tm.h +#include asm/cputable.h +#include linux/auxvec.h +#include sys/time.h +#include stdlib.h + +#include utils.h +#include tm-syscall-asm.h + +unsigned retries = 0; + +#define TEST_DURATION 10 /* seconds */ +#define TM_RETRIES 100 + +long failure_code(void) +{ + return __builtin_get_texasr() 56; +} + +bool failure_is_persistent(void) +{ + return (failure_code() TM_CAUSE_PERSISTENT) == TM_CAUSE_PERSISTENT; +} + +bool failure_is_syscall(void) +{ + return (failure_code() TM_CAUSE_SYSCALL) == TM_CAUSE_SYSCALL; +} + +pid_t getppid_tm(bool suspend) +{ + int i; + pid_t pid; + + for (i = 0; i TM_RETRIES; i++) { + if (suspend) + pid = getppid_tm_suspended_impl(); + else + pid = getppid_tm_active_impl(); + if (pid = 0) + return pid; + if (failure_is_persistent()) { + if (failure_is_syscall()) + return -1; + printf(Unexpected persistent transaction failure.\n); + printf(TEXASR 0x%016lx, TFIAR 0x%016lx.\n, + __builtin_get_texasr(), __builtin_get_tfiar()); + exit(-1
Re: [PATCH 3/3] selftests/powerpc: Add transactional syscall test
On 20/03/15 20:25, Anshuman Khandual wrote: On 03/19/2015 10:13 AM, Sam Bobroff wrote: Check that a syscall made during an active transaction will fail with the correct failure code and that one made during a suspended transaction will succeed. Signed-off-by: Sam Bobroff sam.bobr...@au1.ibm.com The test works. Great :-) + +int tm_syscall(void) +{ +SKIP_IF(!((long)get_auxv_entry(AT_HWCAP2) PPC_FEATURE2_HTM)); +setbuf(stdout, 0); +FAIL_IF(!t_active_getppid_test()); +printf(%d active transactions correctly aborted.\n, TM_TEST_RUNS); +FAIL_IF(!t_suspended_getppid_test()); +printf(%d suspended transactions succeeded.\n, TM_TEST_RUNS); +return 0; +} + +int main(void) +{ +return test_harness(tm_syscall, tm_syscall); +} + There is an extra blank line at the end of this file. Interchanging return codes of 0 and 1 for various functions make it very confusing along with negative FAIL_IF checks in the primary test function. Control flow structures like these can use some in-code documentation for readability. + for (i = 0; i TM_RETRIES; i++) { + if (__builtin_tbegin(0)) { + getppid(); + __builtin_tend(0); + return 1; + } + if (t_failure_persistent()) + return 0; or + if (__builtin_tbegin(0)) { + __builtin_tsuspend(); + getppid(); + __builtin_tresume(); + __builtin_tend(0); + return 1; + } + if (t_failure_persistent()) + return 0; Good points. I'll remove the blank line and comment the code. I'm not sure I can do any better with the FAIL_IF() macro: I wanted it to read fail if the test failed, but I can see what you mean about a double negative. Maybe it would be better to introduce a different macro, more like a standard assert: TEST(XXX) which fails if XXX is false. However, I think TEST would be too generic a name and I'm not should what would be better. Any comments/suggestions? Thanks for the review! Cheers, Sam. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v3 2/4] selftests/powerpc: Move get_auxv_entry() to harness.c
Move get_auxv_entry() from pmu/lib.c up to harness.c in order to make it available to other tests. Signed-off-by: Sam Bobroff sam.bobr...@au1.ibm.com --- tools/testing/selftests/powerpc/harness.c | 47 + tools/testing/selftests/powerpc/pmu/lib.c | 47 - tools/testing/selftests/powerpc/pmu/lib.h |1 - tools/testing/selftests/powerpc/utils.h |2 +- 4 files changed, 48 insertions(+), 49 deletions(-) diff --git a/tools/testing/selftests/powerpc/harness.c b/tools/testing/selftests/powerpc/harness.c index 8ebc58a..f7997af 100644 --- a/tools/testing/selftests/powerpc/harness.c +++ b/tools/testing/selftests/powerpc/harness.c @@ -11,6 +11,10 @@ #include sys/types.h #include sys/wait.h #include unistd.h +#include elf.h +#include fcntl.h +#include link.h +#include sys/stat.h #include subunit.h #include utils.h @@ -112,3 +116,46 @@ int test_harness(int (test_function)(void), char *name) return rc; } + +static char auxv[4096]; + +void *get_auxv_entry(int type) +{ + ElfW(auxv_t) *p; + void *result; + ssize_t num; + int fd; + + fd = open(/proc/self/auxv, O_RDONLY); + if (fd == -1) { + perror(open); + return NULL; + } + + result = NULL; + + num = read(fd, auxv, sizeof(auxv)); + if (num 0) { + perror(read); + goto out; + } + + if (num sizeof(auxv)) { + printf(Overflowed auxv buffer\n); + goto out; + } + + p = (ElfW(auxv_t) *)auxv; + + while (p-a_type != AT_NULL) { + if (p-a_type == type) { + result = (void *)p-a_un.a_val; + break; + } + + p++; + } +out: + close(fd); + return result; +} diff --git a/tools/testing/selftests/powerpc/pmu/lib.c b/tools/testing/selftests/powerpc/pmu/lib.c index 9768dea..a07104c 100644 --- a/tools/testing/selftests/powerpc/pmu/lib.c +++ b/tools/testing/selftests/powerpc/pmu/lib.c @@ -5,15 +5,10 @@ #define _GNU_SOURCE/* For CPU_ZERO etc. */ -#include elf.h #include errno.h -#include fcntl.h -#include link.h #include sched.h #include setjmp.h #include stdlib.h -#include sys/stat.h -#include sys/types.h #include sys/wait.h #include utils.h @@ -256,45 +251,3 @@ out: return rc; } -static char auxv[4096]; - -void *get_auxv_entry(int type) -{ - ElfW(auxv_t) *p; - void *result; - ssize_t num; - int fd; - - fd = open(/proc/self/auxv, O_RDONLY); - if (fd == -1) { - perror(open); - return NULL; - } - - result = NULL; - - num = read(fd, auxv, sizeof(auxv)); - if (num 0) { - perror(read); - goto out; - } - - if (num sizeof(auxv)) { - printf(Overflowed auxv buffer\n); - goto out; - } - - p = (ElfW(auxv_t) *)auxv; - - while (p-a_type != AT_NULL) { - if (p-a_type == type) { - result = (void *)p-a_un.a_val; - break; - } - - p++; - } -out: - close(fd); - return result; -} diff --git a/tools/testing/selftests/powerpc/pmu/lib.h b/tools/testing/selftests/powerpc/pmu/lib.h index 0f0339c..ca5d72a 100644 --- a/tools/testing/selftests/powerpc/pmu/lib.h +++ b/tools/testing/selftests/powerpc/pmu/lib.h @@ -29,7 +29,6 @@ extern int notify_parent(union pipe write_pipe); extern int notify_parent_of_error(union pipe write_pipe); extern pid_t eat_cpu(int (test_function)(void)); extern bool require_paranoia_below(int level); -extern void *get_auxv_entry(int type); struct addr_range { uint64_t first, last; diff --git a/tools/testing/selftests/powerpc/utils.h b/tools/testing/selftests/powerpc/utils.h index a93777a..64f53cd 100644 --- a/tools/testing/selftests/powerpc/utils.h +++ b/tools/testing/selftests/powerpc/utils.h @@ -19,7 +19,7 @@ typedef uint8_t u8; int test_harness(int (test_function)(void), char *name); - +extern void *get_auxv_entry(int type); /* Yes, this is evil */ #define FAIL_IF(x) \ -- 1.7.10.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v3 0/4] powerpc/tm: Abort syscalls in active transactions
See the first patch for a description of the reasoning behind this change. This set includes the change, a kernel selftest for it and some slight refactoring of the selftest code. v3: Patch 1/4: powerpc/tm: Abort syscalls in active transactions Use TABORT() macro to allow building on versions of gcc that don't support the tabort. instruction. v2: Patch 1/4: powerpc/tm: Abort syscalls in active transactions Also update the failure code table. Patch 3/4: selftests/powerpc: Add transactional syscall test Further testing has shown that the success or failure of the transactions was affected by minor changes to the code, compiler optimisation and linker settings. To address this, I've moved the transactional part of the test to a separate function, written in assembly. I've also extended the test to as many transactions as it can fit into ten seconds, to better test for failures that occur more rarely. This has stabilised the results, and it's no longer necessary to use special compiler or linker flags. Patch 4/4: powerpc/tm: Correct minor documentation typos Discovered some typos while updating the documentation. Sam Bobroff (4): powerpc/tm: Abort syscalls in active transactions selftests/powerpc: Move get_auxv_entry() to harness.c selftests/powerpc: Add transactional syscall test powerpc/tm: Correct minor documentation typos Documentation/powerpc/transactional_memory.txt | 36 +++ arch/powerpc/include/uapi/asm/tm.h |2 +- arch/powerpc/kernel/entry_64.S | 19 tools/testing/selftests/powerpc/harness.c | 47 + tools/testing/selftests/powerpc/pmu/lib.c | 47 - tools/testing/selftests/powerpc/pmu/lib.h |1 - tools/testing/selftests/powerpc/tm/.gitignore |1 + tools/testing/selftests/powerpc/tm/Makefile|4 +- .../testing/selftests/powerpc/tm/tm-syscall-asm.S | 27 + .../testing/selftests/powerpc/tm/tm-syscall-asm.h |2 + tools/testing/selftests/powerpc/tm/tm-syscall.c| 109 tools/testing/selftests/powerpc/utils.h|2 +- 12 files changed, 228 insertions(+), 69 deletions(-) create mode 100644 tools/testing/selftests/powerpc/tm/tm-syscall-asm.S create mode 100644 tools/testing/selftests/powerpc/tm/tm-syscall-asm.h create mode 100644 tools/testing/selftests/powerpc/tm/tm-syscall.c -- 1.7.10.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v3 1/4] powerpc/tm: Abort syscalls in active transactions
This patch changes the syscall handler to doom (tabort) active transactions when a syscall is made and return immediately without performing the syscall. Currently, the system call instruction automatically suspends an active transaction which causes side effects to persist when an active transaction fails. This does change the kernel's behaviour, but in a way that was documented as unsupported. It doesn't reduce functionality because syscalls will still be performed after tsuspend. It also provides a consistent interface and makes the behaviour of user code substantially the same across powerpc and platforms that do not support suspended transactions (e.g. x86 and s390). Performance measurements using http://ozlabs.org/~anton/junkcode/null_syscall.c indicate the cost of a system call increases by about 0.5%. Signed-off-by: Sam Bobroff sam.bobr...@au1.ibm.com Acked-By: Michael Neuling mi...@neuling.org --- v3: Use TABORT() macro to allow building on versions of gcc that don't support the tabort. instruction. v2: Also update the failure code table. Documentation/powerpc/transactional_memory.txt | 32 arch/powerpc/include/uapi/asm/tm.h |2 +- arch/powerpc/kernel/entry_64.S | 19 ++ 3 files changed, 36 insertions(+), 17 deletions(-) diff --git a/Documentation/powerpc/transactional_memory.txt b/Documentation/powerpc/transactional_memory.txt index 9791e98..98b39af 100644 --- a/Documentation/powerpc/transactional_memory.txt +++ b/Documentation/powerpc/transactional_memory.txt @@ -74,22 +74,23 @@ Causes of transaction aborts Syscalls -Performing syscalls from within transaction is not recommended, and can lead -to unpredictable results. +Syscalls made from within an active transaction will not be performed and the +transaction will be doomed by the kernel with the failure code TM_CAUSE_SYSCALL +| TM_CAUSE_PERSISTENT. -Syscalls do not by design abort transactions, but beware: The kernel code will -not be running in transactional state. The effect of syscalls will always -remain visible, but depending on the call they may abort your transaction as a -side-effect, read soon-to-be-aborted transactional data that should not remain -invisible, etc. If you constantly retry a transaction that constantly aborts -itself by calling a syscall, you'll have a livelock make no progress. +Syscalls made from within a suspended transaction are performed as normal and +the transaction is not explicitly doomed by the kernel. However, what the +kernel does to perform the syscall may result in the transaction being doomed +by the hardware. The syscall is performed in suspended mode so any side +effects will be persistent, independent of transaction success or failure. No +guarantees are provided by the kernel about which syscalls will affect +transaction success. -Simple syscalls (e.g. sigprocmask()) could be OK. Even things like write() -from, say, printf() should be OK as long as the kernel does not access any -memory that was accessed transactionally. - -Consider any syscalls that happen to work as debug-only -- not recommended for -production use. Best to queue them up till after the transaction is over. +Care must be taken when relying on syscalls to abort during active transactions +if the calls are made via a library. Libraries may cache values (which may +give the appearance of success) or perform operations that cause transaction +failure before entering the kernel (which may produce different failure codes). +Examples are glibc's getpid() and lazy symbol resolution. Signals @@ -176,8 +177,7 @@ kernel aborted a transaction: TM_CAUSE_RESCHED Thread was rescheduled. TM_CAUSE_TLBI Software TLB invalide. TM_CAUSE_FAC_UNAV FP/VEC/VSX unavailable trap. - TM_CAUSE_SYSCALL Currently unused; future syscalls that must abort -transactions for consistency will use this. + TM_CAUSE_SYSCALL Syscall from active transaction. TM_CAUSE_SIGNALSignal delivered. TM_CAUSE_MISC Currently unused. TM_CAUSE_ALIGNMENT Alignment fault. diff --git a/arch/powerpc/include/uapi/asm/tm.h b/arch/powerpc/include/uapi/asm/tm.h index 5d836b7..5047659 100644 --- a/arch/powerpc/include/uapi/asm/tm.h +++ b/arch/powerpc/include/uapi/asm/tm.h @@ -11,7 +11,7 @@ #define TM_CAUSE_RESCHED 0xde #define TM_CAUSE_TLBI 0xdc #define TM_CAUSE_FAC_UNAV 0xda -#define TM_CAUSE_SYSCALL 0xd8 /* future use */ +#define TM_CAUSE_SYSCALL 0xd8 #define TM_CAUSE_MISC 0xd6 /* future use */ #define TM_CAUSE_SIGNAL0xd4 #define TM_CAUSE_ALIGNMENT 0xd2 diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S index d180caf2..6374af8 100644 --- a/arch/powerpc/kernel/entry_64.S +++ b/arch/powerpc/kernel/entry_64.S @@ -34,6 +34,7 @@ #include asm/ftrace.h #include asm/hw_irq.h #include asm
[PATCH v3 4/4] powerpc/tm: Correct minor documentation typos
Signed-off-by: Sam Bobroff sam.bobr...@au1.ibm.com --- v2: Discovered some typos while updating the documentation. Documentation/powerpc/transactional_memory.txt |4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/Documentation/powerpc/transactional_memory.txt b/Documentation/powerpc/transactional_memory.txt index 98b39af..ba0a2a4 100644 --- a/Documentation/powerpc/transactional_memory.txt +++ b/Documentation/powerpc/transactional_memory.txt @@ -175,7 +175,7 @@ These are defined in asm/reg.h, and distinguish different reasons why the kernel aborted a transaction: TM_CAUSE_RESCHED Thread was rescheduled. - TM_CAUSE_TLBI Software TLB invalide. + TM_CAUSE_TLBI Software TLB invalid. TM_CAUSE_FAC_UNAV FP/VEC/VSX unavailable trap. TM_CAUSE_SYSCALL Syscall from active transaction. TM_CAUSE_SIGNALSignal delivered. @@ -185,7 +185,7 @@ kernel aborted a transaction: These can be checked by the user program's abort handler as TEXASR[0:7]. If bit 7 is set, it indicates that the error is consider persistent. For example -a TM_CAUSE_ALIGNMENT will be persistent while a TM_CAUSE_RESCHED will not.q +a TM_CAUSE_ALIGNMENT will be persistent while a TM_CAUSE_RESCHED will not. GDB === -- 1.7.10.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v3 3/4] selftests/powerpc: Add transactional syscall test
Check that a syscall made during an active transaction will fail with the correct failure code and that one made during a suspended transaction will succeed. Signed-off-by: Sam Bobroff sam.bobr...@au1.ibm.com --- v2: Further testing has shown that the success or failure of the transactions was affected by minor changes to the code, compiler optimisation and linker settings. To address this, I've moved the transactional part of the test to a separate function, written in assembly. I've also extended the test to as many transactions as it can fit into ten seconds, to better test for failures that occur more rarely. This has stabilised the results, and it's no longer necessary to use special compiler or linker flags. tools/testing/selftests/powerpc/tm/.gitignore |1 + tools/testing/selftests/powerpc/tm/Makefile|4 +- .../testing/selftests/powerpc/tm/tm-syscall-asm.S | 27 + .../testing/selftests/powerpc/tm/tm-syscall-asm.h |2 + tools/testing/selftests/powerpc/tm/tm-syscall.c| 109 5 files changed, 142 insertions(+), 1 deletion(-) create mode 100644 tools/testing/selftests/powerpc/tm/tm-syscall-asm.S create mode 100644 tools/testing/selftests/powerpc/tm/tm-syscall-asm.h create mode 100644 tools/testing/selftests/powerpc/tm/tm-syscall.c diff --git a/tools/testing/selftests/powerpc/tm/.gitignore b/tools/testing/selftests/powerpc/tm/.gitignore index 33d02cc..2699635d 100644 --- a/tools/testing/selftests/powerpc/tm/.gitignore +++ b/tools/testing/selftests/powerpc/tm/.gitignore @@ -1 +1,2 @@ tm-resched-dscr +tm-syscall diff --git a/tools/testing/selftests/powerpc/tm/Makefile b/tools/testing/selftests/powerpc/tm/Makefile index 2cede23..93bbff3 100644 --- a/tools/testing/selftests/powerpc/tm/Makefile +++ b/tools/testing/selftests/powerpc/tm/Makefile @@ -1,8 +1,10 @@ -PROGS := tm-resched-dscr +PROGS := tm-resched-dscr tm-syscall +CFLAGS:=$(CFLAGS) -mhtm all: $(PROGS) $(PROGS): ../harness.c +tm-syscall: tm-syscall-asm.S run_tests: all @-for PROG in $(PROGS); do \ diff --git a/tools/testing/selftests/powerpc/tm/tm-syscall-asm.S b/tools/testing/selftests/powerpc/tm/tm-syscall-asm.S new file mode 100644 index 000..2b2daa7 --- /dev/null +++ b/tools/testing/selftests/powerpc/tm/tm-syscall-asm.S @@ -0,0 +1,27 @@ +#include ppc-asm.h +#include asm/unistd.h + + .text +FUNC_START(getppid_tm_active_impl) + tbegin. + beq 1f + li r0, __NR_getppid + sc + tend. + blr +1: + li r3, -1 + blr + +FUNC_START(getppid_tm_suspended_impl) + tbegin. + beq 1f + li r0, __NR_getppid + tsuspend. + sc + tresume. + tend. + blr +1: + li r3, -1 + blr diff --git a/tools/testing/selftests/powerpc/tm/tm-syscall-asm.h b/tools/testing/selftests/powerpc/tm/tm-syscall-asm.h new file mode 100644 index 000..6136328 --- /dev/null +++ b/tools/testing/selftests/powerpc/tm/tm-syscall-asm.h @@ -0,0 +1,2 @@ +extern int getppid_tm_active_impl(void); +extern int getppid_tm_suspended_impl(void); diff --git a/tools/testing/selftests/powerpc/tm/tm-syscall.c b/tools/testing/selftests/powerpc/tm/tm-syscall.c new file mode 100644 index 000..ff3b15c --- /dev/null +++ b/tools/testing/selftests/powerpc/tm/tm-syscall.c @@ -0,0 +1,109 @@ +/* Test the kernel's system call code to ensure that a system call + * made from within an active HTM transaction is aborted with the + * correct failure code. + * Conversely, ensure that a system call made from within a + * suspended transaction can succeed. + */ + +#include stdio.h +#include unistd.h +#include sys/syscall.h +#include asm/tm.h +#include asm/cputable.h +#include linux/auxvec.h +#include sys/time.h +#include stdlib.h + +#include utils.h +#include tm-syscall-asm.h + +unsigned retries = 0; + +#define TEST_DURATION 10 /* seconds */ +#define TM_RETRIES 100 + +long failure_code(void) +{ + return __builtin_get_texasr() 56; +} + +bool failure_is_persistent(void) +{ + return (failure_code() TM_CAUSE_PERSISTENT) == TM_CAUSE_PERSISTENT; +} + +bool failure_is_syscall(void) +{ + return (failure_code() TM_CAUSE_SYSCALL) == TM_CAUSE_SYSCALL; +} + +pid_t getppid_tm(bool suspend) +{ + int i; + pid_t pid; + + for (i = 0; i TM_RETRIES; i++) { + if (suspend) + pid = getppid_tm_suspended_impl(); + else + pid = getppid_tm_active_impl(); + if (pid = 0) + return pid; + if (failure_is_persistent()) { + if (failure_is_syscall()) + return -1; + printf(Unexpected persistent transaction failure.\n); + printf(TEXASR 0x%016lx, TFIAR 0x%016lx.\n, + __builtin_get_texasr(), __builtin_get_tfiar()); + exit(-1
[PATCH 1/1] KVM: PPC: Book3S: correct width in XER handling
In 64 bit kernels, the Fixed Point Exception Register (XER) is a 64 bit field (e.g. in kvm_regs and kvm_vcpu_arch) and in most places it is accessed as such. This patch corrects places where it is accessed as a 32 bit field by a 64 bit kernel. In some cases this is via a 32 bit load or store instruction which, depending on endianness, will cause either the lower or upper 32 bits to be missed. In another case it is cast as a u32, causing the upper 32 bits to be cleared. This patch corrects those places by extending the access methods to 64 bits. Signed-off-by: Sam Bobroff sam.bobr...@au1.ibm.com --- arch/powerpc/include/asm/kvm_book3s.h |4 ++-- arch/powerpc/kvm/book3s_hv_rmhandlers.S |6 +++--- arch/powerpc/kvm/book3s_segment.S |4 ++-- 3 files changed, 7 insertions(+), 7 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h index b91e74a..05a875a 100644 --- a/arch/powerpc/include/asm/kvm_book3s.h +++ b/arch/powerpc/include/asm/kvm_book3s.h @@ -225,12 +225,12 @@ static inline u32 kvmppc_get_cr(struct kvm_vcpu *vcpu) return vcpu-arch.cr; } -static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, u32 val) +static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, ulong val) { vcpu-arch.xer = val; } -static inline u32 kvmppc_get_xer(struct kvm_vcpu *vcpu) +static inline ulong kvmppc_get_xer(struct kvm_vcpu *vcpu) { return vcpu-arch.xer; } diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S index 4d70df2..d75be59 100644 --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S @@ -870,7 +870,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S) blt hdec_soon ld r6, VCPU_CTR(r4) - lwz r7, VCPU_XER(r4) + ld r7, VCPU_XER(r4) mtctr r6 mtxer r7 @@ -1103,7 +1103,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) mfctr r3 mfxer r4 std r3, VCPU_CTR(r9) - stw r4, VCPU_XER(r9) + std r4, VCPU_XER(r9) /* If this is a page table miss then see if it's theirs or ours */ cmpwi r12, BOOK3S_INTERRUPT_H_DATA_STORAGE @@ -1675,7 +1675,7 @@ kvmppc_hdsi: bl kvmppc_msr_interrupt fast_interrupt_c_return: 6: ld r7, VCPU_CTR(r9) - lwz r8, VCPU_XER(r9) + ld r8, VCPU_XER(r9) mtctr r7 mtxer r8 mr r4, r9 diff --git a/arch/powerpc/kvm/book3s_segment.S b/arch/powerpc/kvm/book3s_segment.S index acee37c..ca8f174 100644 --- a/arch/powerpc/kvm/book3s_segment.S +++ b/arch/powerpc/kvm/book3s_segment.S @@ -123,7 +123,7 @@ no_dcbz32_on: PPC_LL r8, SVCPU_CTR(r3) PPC_LL r9, SVCPU_LR(r3) lwz r10, SVCPU_CR(r3) - lwz r11, SVCPU_XER(r3) + PPC_LL r11, SVCPU_XER(r3) mtctr r8 mtlrr9 @@ -237,7 +237,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE) mfctr r8 mflrr9 - stw r5, SVCPU_XER(r13) + PPC_STL r5, SVCPU_XER(r13) PPC_STL r6, SVCPU_FAULT_DAR(r13) stw r7, SVCPU_FAULT_DSISR(r13) PPC_STL r8, SVCPU_CTR(r13) -- 1.7.10.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 1/1] powerpc/tm: Abort syscalls in active transactions (v2)
This patch changes the syscall handler to doom (tabort) active transactions when a syscall is made and return very early without performing the syscall and keeping side effects to a minimum (no CPU accounting or system call tracing is performed). Also included is a new HWCAP2 bit, PPC_FEATURE2_HTM_NOSC, to indicate this behaviour to userspace. Currently, the system call instruction automatically suspends an active transaction which causes side effects to persist when an active transaction fails. This does change the kernel's behaviour, but in a way that was documented as unsupported. It doesn't reduce functionality as syscalls will still be performed after tsuspend; it just requires that the transaction be explicitly suspended. It also provides a consistent interface and makes the behaviour of user code substantially the same across powerpc and platforms that do not support suspended transactions (e.g. x86 and s390). Performance measurements using http://ozlabs.org/~anton/junkcode/null_syscall.c indicate the cost of a normal (non-aborted) system call increases by about 0.25%. Signed-off-by: Sam Bobroff sam.bobr...@au1.ibm.com --- Documentation/powerpc/transactional_memory.txt | 32 - arch/powerpc/include/asm/cputable.h | 10 arch/powerpc/include/uapi/asm/cputable.h| 1 + arch/powerpc/include/uapi/asm/tm.h | 2 +- arch/powerpc/kernel/cputable.c | 4 +++- arch/powerpc/kernel/entry_64.S | 28 ++ tools/testing/selftests/powerpc/tm/tm-syscall.c | 3 ++- 7 files changed, 57 insertions(+), 23 deletions(-) diff --git a/Documentation/powerpc/transactional_memory.txt b/Documentation/powerpc/transactional_memory.txt index ded6979..ba0a2a4 100644 --- a/Documentation/powerpc/transactional_memory.txt +++ b/Documentation/powerpc/transactional_memory.txt @@ -74,22 +74,23 @@ Causes of transaction aborts Syscalls -Performing syscalls from within transaction is not recommended, and can lead -to unpredictable results. +Syscalls made from within an active transaction will not be performed and the +transaction will be doomed by the kernel with the failure code TM_CAUSE_SYSCALL +| TM_CAUSE_PERSISTENT. -Syscalls do not by design abort transactions, but beware: The kernel code will -not be running in transactional state. The effect of syscalls will always -remain visible, but depending on the call they may abort your transaction as a -side-effect, read soon-to-be-aborted transactional data that should not remain -invisible, etc. If you constantly retry a transaction that constantly aborts -itself by calling a syscall, you'll have a livelock make no progress. +Syscalls made from within a suspended transaction are performed as normal and +the transaction is not explicitly doomed by the kernel. However, what the +kernel does to perform the syscall may result in the transaction being doomed +by the hardware. The syscall is performed in suspended mode so any side +effects will be persistent, independent of transaction success or failure. No +guarantees are provided by the kernel about which syscalls will affect +transaction success. -Simple syscalls (e.g. sigprocmask()) could be OK. Even things like write() -from, say, printf() should be OK as long as the kernel does not access any -memory that was accessed transactionally. - -Consider any syscalls that happen to work as debug-only -- not recommended for -production use. Best to queue them up till after the transaction is over. +Care must be taken when relying on syscalls to abort during active transactions +if the calls are made via a library. Libraries may cache values (which may +give the appearance of success) or perform operations that cause transaction +failure before entering the kernel (which may produce different failure codes). +Examples are glibc's getpid() and lazy symbol resolution. Signals @@ -176,8 +177,7 @@ kernel aborted a transaction: TM_CAUSE_RESCHED Thread was rescheduled. TM_CAUSE_TLBI Software TLB invalid. TM_CAUSE_FAC_UNAV FP/VEC/VSX unavailable trap. - TM_CAUSE_SYSCALL Currently unused; future syscalls that must abort -transactions for consistency will use this. + TM_CAUSE_SYSCALL Syscall from active transaction. TM_CAUSE_SIGNALSignal delivered. TM_CAUSE_MISC Currently unused. TM_CAUSE_ALIGNMENT Alignment fault. diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h index 6367b83..4994648 100644 --- a/arch/powerpc/include/asm/cputable.h +++ b/arch/powerpc/include/asm/cputable.h @@ -242,11 +242,13 @@ enum { /* We only set the TM feature if the kernel was compiled with TM supprt */ #ifdef CONFIG_PPC_TRANSACTIONAL_MEM -#define CPU_FTR_TM_COMPCPU_FTR_TM -#define PPC_FEATURE2_HTM_COMP PPC_FEATURE2_HTM +#define CPU_FTR_TM_COMPCPU_FTR_TM
[PATCH v3 1/1] KVM: PPC: Book3S: correct width in XER handling
In 64 bit kernels, the Fixed Point Exception Register (XER) is a 64 bit field (e.g. in kvm_regs and kvm_vcpu_arch) and in most places it is accessed as such. This patch corrects places where it is accessed as a 32 bit field by a 64 bit kernel. In some cases this is via a 32 bit load or store instruction which, depending on endianness, will cause either the lower or upper 32 bits to be missed. In another case it is cast as a u32, causing the upper 32 bits to be cleared. This patch corrects those places by extending the access methods to 64 bits. Signed-off-by: Sam Bobroff sam.bobr...@au1.ibm.com --- v3: Adjust booke set/get xer to match book3s. v2: Also extend kvmppc_book3s_shadow_vcpu.xer to 64 bit. arch/powerpc/include/asm/kvm_book3s.h |4 ++-- arch/powerpc/include/asm/kvm_book3s_asm.h |2 +- arch/powerpc/include/asm/kvm_booke.h |4 ++-- arch/powerpc/kvm/book3s_hv_rmhandlers.S |6 +++--- arch/powerpc/kvm/book3s_segment.S |4 ++-- 5 files changed, 10 insertions(+), 10 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h index b91e74a..05a875a 100644 --- a/arch/powerpc/include/asm/kvm_book3s.h +++ b/arch/powerpc/include/asm/kvm_book3s.h @@ -225,12 +225,12 @@ static inline u32 kvmppc_get_cr(struct kvm_vcpu *vcpu) return vcpu-arch.cr; } -static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, u32 val) +static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, ulong val) { vcpu-arch.xer = val; } -static inline u32 kvmppc_get_xer(struct kvm_vcpu *vcpu) +static inline ulong kvmppc_get_xer(struct kvm_vcpu *vcpu) { return vcpu-arch.xer; } diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h b/arch/powerpc/include/asm/kvm_book3s_asm.h index 5bdfb5d..c4ccd2d 100644 --- a/arch/powerpc/include/asm/kvm_book3s_asm.h +++ b/arch/powerpc/include/asm/kvm_book3s_asm.h @@ -112,7 +112,7 @@ struct kvmppc_book3s_shadow_vcpu { bool in_use; ulong gpr[14]; u32 cr; - u32 xer; + ulong xer; ulong ctr; ulong lr; ulong pc; diff --git a/arch/powerpc/include/asm/kvm_booke.h b/arch/powerpc/include/asm/kvm_booke.h index 3286f0d..bc6e29e 100644 --- a/arch/powerpc/include/asm/kvm_booke.h +++ b/arch/powerpc/include/asm/kvm_booke.h @@ -54,12 +54,12 @@ static inline u32 kvmppc_get_cr(struct kvm_vcpu *vcpu) return vcpu-arch.cr; } -static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, u32 val) +static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, ulong val) { vcpu-arch.xer = val; } -static inline u32 kvmppc_get_xer(struct kvm_vcpu *vcpu) +static inline ulong kvmppc_get_xer(struct kvm_vcpu *vcpu) { return vcpu-arch.xer; } diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S index 4d70df2..d75be59 100644 --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S @@ -870,7 +870,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S) blt hdec_soon ld r6, VCPU_CTR(r4) - lwz r7, VCPU_XER(r4) + ld r7, VCPU_XER(r4) mtctr r6 mtxer r7 @@ -1103,7 +1103,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) mfctr r3 mfxer r4 std r3, VCPU_CTR(r9) - stw r4, VCPU_XER(r9) + std r4, VCPU_XER(r9) /* If this is a page table miss then see if it's theirs or ours */ cmpwi r12, BOOK3S_INTERRUPT_H_DATA_STORAGE @@ -1675,7 +1675,7 @@ kvmppc_hdsi: bl kvmppc_msr_interrupt fast_interrupt_c_return: 6: ld r7, VCPU_CTR(r9) - lwz r8, VCPU_XER(r9) + ld r8, VCPU_XER(r9) mtctr r7 mtxer r8 mr r4, r9 diff --git a/arch/powerpc/kvm/book3s_segment.S b/arch/powerpc/kvm/book3s_segment.S index acee37c..ca8f174 100644 --- a/arch/powerpc/kvm/book3s_segment.S +++ b/arch/powerpc/kvm/book3s_segment.S @@ -123,7 +123,7 @@ no_dcbz32_on: PPC_LL r8, SVCPU_CTR(r3) PPC_LL r9, SVCPU_LR(r3) lwz r10, SVCPU_CR(r3) - lwz r11, SVCPU_XER(r3) + PPC_LL r11, SVCPU_XER(r3) mtctr r8 mtlrr9 @@ -237,7 +237,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE) mfctr r8 mflrr9 - stw r5, SVCPU_XER(r13) + PPC_STL r5, SVCPU_XER(r13) PPC_STL r6, SVCPU_FAULT_DAR(r13) stw r7, SVCPU_FAULT_DSISR(r13) PPC_STL r8, SVCPU_CTR(r13) -- 1.7.10.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v2 1/1] KVM: PPC: Book3S: correct width in XER handling
On Tue, May 26, 2015 at 10:35:08AM +0200, Alexander Graf wrote: On 26.05.15 02:27, Sam Bobroff wrote: In 64 bit kernels, the Fixed Point Exception Register (XER) is a 64 bit field (e.g. in kvm_regs and kvm_vcpu_arch) and in most places it is accessed as such. This patch corrects places where it is accessed as a 32 bit field by a 64 bit kernel. In some cases this is via a 32 bit load or store instruction which, depending on endianness, will cause either the lower or upper 32 bits to be missed. In another case it is cast as a u32, causing the upper 32 bits to be cleared. This patch corrects those places by extending the access methods to 64 bits. Signed-off-by: Sam Bobroff sam.bobr...@au1.ibm.com --- v2: Also extend kvmppc_book3s_shadow_vcpu.xer to 64 bit. arch/powerpc/include/asm/kvm_book3s.h |4 ++-- arch/powerpc/include/asm/kvm_book3s_asm.h |2 +- arch/powerpc/kvm/book3s_hv_rmhandlers.S |6 +++--- arch/powerpc/kvm/book3s_segment.S |4 ++-- 4 files changed, 8 insertions(+), 8 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h index b91e74a..05a875a 100644 --- a/arch/powerpc/include/asm/kvm_book3s.h +++ b/arch/powerpc/include/asm/kvm_book3s.h @@ -225,12 +225,12 @@ static inline u32 kvmppc_get_cr(struct kvm_vcpu *vcpu) return vcpu-arch.cr; } -static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, u32 val) +static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, ulong val) Now we have book3s and booke files with different prototypes on the same inline function names. That's really ugly. Please keep them in sync ;). OK will do. Alex { vcpu-arch.xer = val; } -static inline u32 kvmppc_get_xer(struct kvm_vcpu *vcpu) +static inline ulong kvmppc_get_xer(struct kvm_vcpu *vcpu) { return vcpu-arch.xer; } diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h b/arch/powerpc/include/asm/kvm_book3s_asm.h index 5bdfb5d..c4ccd2d 100644 --- a/arch/powerpc/include/asm/kvm_book3s_asm.h +++ b/arch/powerpc/include/asm/kvm_book3s_asm.h @@ -112,7 +112,7 @@ struct kvmppc_book3s_shadow_vcpu { bool in_use; ulong gpr[14]; u32 cr; - u32 xer; + ulong xer; ulong ctr; ulong lr; ulong pc; diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S index 4d70df2..d75be59 100644 --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S @@ -870,7 +870,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S) blt hdec_soon ld r6, VCPU_CTR(r4) - lwz r7, VCPU_XER(r4) + ld r7, VCPU_XER(r4) mtctr r6 mtxer r7 @@ -1103,7 +1103,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) mfctr r3 mfxer r4 std r3, VCPU_CTR(r9) - stw r4, VCPU_XER(r9) + std r4, VCPU_XER(r9) /* If this is a page table miss then see if it's theirs or ours */ cmpwi r12, BOOK3S_INTERRUPT_H_DATA_STORAGE @@ -1675,7 +1675,7 @@ kvmppc_hdsi: bl kvmppc_msr_interrupt fast_interrupt_c_return: 6: ld r7, VCPU_CTR(r9) - lwz r8, VCPU_XER(r9) + ld r8, VCPU_XER(r9) mtctr r7 mtxer r8 mr r4, r9 diff --git a/arch/powerpc/kvm/book3s_segment.S b/arch/powerpc/kvm/book3s_segment.S index acee37c..ca8f174 100644 --- a/arch/powerpc/kvm/book3s_segment.S +++ b/arch/powerpc/kvm/book3s_segment.S @@ -123,7 +123,7 @@ no_dcbz32_on: PPC_LL r8, SVCPU_CTR(r3) PPC_LL r9, SVCPU_LR(r3) lwz r10, SVCPU_CR(r3) - lwz r11, SVCPU_XER(r3) + PPC_LL r11, SVCPU_XER(r3) mtctr r8 mtlrr9 @@ -237,7 +237,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE) mfctr r8 mflrr9 - stw r5, SVCPU_XER(r13) + PPC_STL r5, SVCPU_XER(r13) PPC_STL r6, SVCPU_FAULT_DAR(r13) stw r7, SVCPU_FAULT_DSISR(r13) PPC_STL r8, SVCPU_CTR(r13) ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 1/1] KVM: PPC: Book3S: correct width in XER handling
On Mon, May 25, 2015 at 11:08:08PM +0200, Alexander Graf wrote: On 20.05.15 07:26, Sam Bobroff wrote: In 64 bit kernels, the Fixed Point Exception Register (XER) is a 64 bit field (e.g. in kvm_regs and kvm_vcpu_arch) and in most places it is accessed as such. This patch corrects places where it is accessed as a 32 bit field by a 64 bit kernel. In some cases this is via a 32 bit load or store instruction which, depending on endianness, will cause either the lower or upper 32 bits to be missed. In another case it is cast as a u32, causing the upper 32 bits to be cleared. This patch corrects those places by extending the access methods to 64 bits. Signed-off-by: Sam Bobroff sam.bobr...@au1.ibm.com --- arch/powerpc/include/asm/kvm_book3s.h |4 ++-- arch/powerpc/kvm/book3s_hv_rmhandlers.S |6 +++--- arch/powerpc/kvm/book3s_segment.S |4 ++-- 3 files changed, 7 insertions(+), 7 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h index b91e74a..05a875a 100644 --- a/arch/powerpc/include/asm/kvm_book3s.h +++ b/arch/powerpc/include/asm/kvm_book3s.h @@ -225,12 +225,12 @@ static inline u32 kvmppc_get_cr(struct kvm_vcpu *vcpu) return vcpu-arch.cr; } -static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, u32 val) +static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, ulong val) { vcpu-arch.xer = val; } -static inline u32 kvmppc_get_xer(struct kvm_vcpu *vcpu) +static inline ulong kvmppc_get_xer(struct kvm_vcpu *vcpu) { return vcpu-arch.xer; } diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S index 4d70df2..d75be59 100644 --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S @@ -870,7 +870,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S) blt hdec_soon ld r6, VCPU_CTR(r4) - lwz r7, VCPU_XER(r4) + ld r7, VCPU_XER(r4) mtctr r6 mtxer r7 @@ -1103,7 +1103,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) mfctr r3 mfxer r4 std r3, VCPU_CTR(r9) - stw r4, VCPU_XER(r9) + std r4, VCPU_XER(r9) /* If this is a page table miss then see if it's theirs or ours */ cmpwi r12, BOOK3S_INTERRUPT_H_DATA_STORAGE @@ -1675,7 +1675,7 @@ kvmppc_hdsi: bl kvmppc_msr_interrupt fast_interrupt_c_return: 6: ld r7, VCPU_CTR(r9) - lwz r8, VCPU_XER(r9) + ld r8, VCPU_XER(r9) mtctr r7 mtxer r8 mr r4, r9 diff --git a/arch/powerpc/kvm/book3s_segment.S b/arch/powerpc/kvm/book3s_segment.S index acee37c..ca8f174 100644 --- a/arch/powerpc/kvm/book3s_segment.S +++ b/arch/powerpc/kvm/book3s_segment.S @@ -123,7 +123,7 @@ no_dcbz32_on: PPC_LL r8, SVCPU_CTR(r3) PPC_LL r9, SVCPU_LR(r3) lwz r10, SVCPU_CR(r3) - lwz r11, SVCPU_XER(r3) + PPC_LL r11, SVCPU_XER(r3) struct kvmppc_book3s_shadow_vcpu { bool in_use; ulong gpr[14]; u32 cr; u32 xer; [...] so at least this change looks wrong. Please double-check all fields in your patch again. Alex Thanks for the review and the catch! The xer field in kvm_vcpu_arch is already ulong, so it looks like the one in kvmppc_book3s_shadow_vcpu is the only other case. I'll fix that and repost. Cheers, Sam. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v2 1/1] KVM: PPC: Book3S: correct width in XER handling
In 64 bit kernels, the Fixed Point Exception Register (XER) is a 64 bit field (e.g. in kvm_regs and kvm_vcpu_arch) and in most places it is accessed as such. This patch corrects places where it is accessed as a 32 bit field by a 64 bit kernel. In some cases this is via a 32 bit load or store instruction which, depending on endianness, will cause either the lower or upper 32 bits to be missed. In another case it is cast as a u32, causing the upper 32 bits to be cleared. This patch corrects those places by extending the access methods to 64 bits. Signed-off-by: Sam Bobroff sam.bobr...@au1.ibm.com --- v2: Also extend kvmppc_book3s_shadow_vcpu.xer to 64 bit. arch/powerpc/include/asm/kvm_book3s.h |4 ++-- arch/powerpc/include/asm/kvm_book3s_asm.h |2 +- arch/powerpc/kvm/book3s_hv_rmhandlers.S |6 +++--- arch/powerpc/kvm/book3s_segment.S |4 ++-- 4 files changed, 8 insertions(+), 8 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h index b91e74a..05a875a 100644 --- a/arch/powerpc/include/asm/kvm_book3s.h +++ b/arch/powerpc/include/asm/kvm_book3s.h @@ -225,12 +225,12 @@ static inline u32 kvmppc_get_cr(struct kvm_vcpu *vcpu) return vcpu-arch.cr; } -static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, u32 val) +static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, ulong val) { vcpu-arch.xer = val; } -static inline u32 kvmppc_get_xer(struct kvm_vcpu *vcpu) +static inline ulong kvmppc_get_xer(struct kvm_vcpu *vcpu) { return vcpu-arch.xer; } diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h b/arch/powerpc/include/asm/kvm_book3s_asm.h index 5bdfb5d..c4ccd2d 100644 --- a/arch/powerpc/include/asm/kvm_book3s_asm.h +++ b/arch/powerpc/include/asm/kvm_book3s_asm.h @@ -112,7 +112,7 @@ struct kvmppc_book3s_shadow_vcpu { bool in_use; ulong gpr[14]; u32 cr; - u32 xer; + ulong xer; ulong ctr; ulong lr; ulong pc; diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S index 4d70df2..d75be59 100644 --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S @@ -870,7 +870,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S) blt hdec_soon ld r6, VCPU_CTR(r4) - lwz r7, VCPU_XER(r4) + ld r7, VCPU_XER(r4) mtctr r6 mtxer r7 @@ -1103,7 +1103,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) mfctr r3 mfxer r4 std r3, VCPU_CTR(r9) - stw r4, VCPU_XER(r9) + std r4, VCPU_XER(r9) /* If this is a page table miss then see if it's theirs or ours */ cmpwi r12, BOOK3S_INTERRUPT_H_DATA_STORAGE @@ -1675,7 +1675,7 @@ kvmppc_hdsi: bl kvmppc_msr_interrupt fast_interrupt_c_return: 6: ld r7, VCPU_CTR(r9) - lwz r8, VCPU_XER(r9) + ld r8, VCPU_XER(r9) mtctr r7 mtxer r8 mr r4, r9 diff --git a/arch/powerpc/kvm/book3s_segment.S b/arch/powerpc/kvm/book3s_segment.S index acee37c..ca8f174 100644 --- a/arch/powerpc/kvm/book3s_segment.S +++ b/arch/powerpc/kvm/book3s_segment.S @@ -123,7 +123,7 @@ no_dcbz32_on: PPC_LL r8, SVCPU_CTR(r3) PPC_LL r9, SVCPU_LR(r3) lwz r10, SVCPU_CR(r3) - lwz r11, SVCPU_XER(r3) + PPC_LL r11, SVCPU_XER(r3) mtctr r8 mtlrr9 @@ -237,7 +237,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE) mfctr r8 mflrr9 - stw r5, SVCPU_XER(r13) + PPC_STL r5, SVCPU_XER(r13) PPC_STL r6, SVCPU_FAULT_DAR(r13) stw r7, SVCPU_FAULT_DSISR(r13) PPC_STL r8, SVCPU_CTR(r13) -- 1.7.10.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [1/1] powerpc/xmon: Paged output for paca display
On Tue, Aug 18, 2015 at 04:26:32PM +1000, Michael Ellerman wrote: On Fri, 2015-14-08 at 02:55:14 UTC, Sam bobroff wrote: The paca display is already more than 24 lines, which can be problematic if you have an old school 80x24 terminal, or more likely you are on a virtual terminal which does not scroll for whatever reason. This adds an optional letter to the dp and dpa xmon commands (dpp and dppa), which will enable a per-page display (with 16 line pages): the first page will be displayed and if there was data that didn't fit, it will display a message indicating that the user can use enter to display the next page. The intent is that this feels similar to the way the memory display functions work. This is implemented by running over the entire output both for the initial command and for each subsequent page: the visible part is clipped out by checking line numbers. Handling the empty command as more is done by writing a special command into a static buffer that indicates where to move the sliding visibility window. This is similar to the approach used for the memory dump commands except that the state data is encoded into the last_cmd string, rather than a set of static variables. The memory dump commands could probably be rewritten to make use of the same buffer and remove their other static variables. Sample output: 0:mon dpp1 paca for cpu 0x1 @ cfdc0480: possible = yes present = yes online = yes lock_token = 0x8000 (0x8) paca_index = 0x1 (0xa) kernel_toc = 0xc0eb2400 (0x10) kernelbase = 0xc000 (0x18) kernel_msr = 0xb0001032 (0x20) emergency_sp = 0xc0003ffe8000 (0x28) mc_emergency_sp = 0xc0003ffe4000 (0x2e0) in_mce = 0x0 (0x2e8) data_offset = 0x7f17 (0x30) hw_cpu_id= 0x8 (0x38) cpu_start= 0x1 (0x3a) kexec_state = 0x0 (0x3b) [Enter for next page] 0:mon __current= 0xc0007e696620 (0x290) kstack = 0xc0007e6ebe30 (0x298) stab_rr = 0xb (0x2a0) saved_r1 = 0xc0007ef37860 (0x2a8) trap_save= 0x0 (0x2b8) soft_enabled = 0x0 (0x2ba) irq_happened = 0x1 (0x2bb) io_sync = 0x0 (0x2bc) irq_work_pending = 0x0 (0x2bd) nap_state_lost = 0x0 (0x2be) 0:mon (Based on a similar patch by Michael Ellerman m...@ellerman.id.au [v2] powerpc/xmon: Allow limiting the size of the paca display. This patch is an alternative and cannot coexist with the original.) So this is nice, but ... the diff is twice the size of my version, plus 128 bytes of BSS, so I'm not sure the added benefit is sufficient to justify the added code complexity. But you can convince me otherwise if you feel strongly about it. cheers I do think the output is a lot better paged like this :-) The 128 byte buffer is a lot more than it needs for this particular command; it could quite comfortably be lowered to about 32 (I was leaving space for other commands to use it but there aren't any so far). I'll do this and repost. Also, because the last_cmd_buf system is not specific to the paca display, it could be used by the other paged commands (like the memory dumps). If we did this we could (probably) remove ndump, nidump and ncsum which are all longs, although I haven't worked out how much buffer space would be needed in last_cmd_buf to support these (they have their own paging code but the positional information could be stored in the string buffer). It's probably not much work but might be a bit tricky. Do you think it's worth doing? Since we're looking at memory usage, it looks like tmpstr[128] could be removed without much work, saving 128 bytes and removing an unnecessary global variable. If it actually turns out to be easy to do I'll post a separate patch. Thanks for the revew, Sam. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v2 2/2] powerpc/xmon: Paginate kernel log buffer display
The kernel log buffer is often much longer than the size of a terminal so paginate it's output. Signed-off-by: Sam Bobroff sam.bobr...@au1.ibm.com --- arch/powerpc/xmon/xmon.c | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c index 9ce9e7d..fdd765e 100644 --- a/arch/powerpc/xmon/xmon.c +++ b/arch/powerpc/xmon/xmon.c @@ -248,9 +248,7 @@ Commands:\n\ u dump TLB\n #endif ? help\n -#ifdef CONFIG_PPC64 - .# limit output to # lines per page (dump paca only)\n -#endif + .# limit output to # lines per page (for dp#, dpa, dl)\n zr reboot\n\ zh halt\n ; @@ -850,6 +848,7 @@ static void paged_set_size(void) paged_size = 0; } } + static void paged_reset(void) { paged_cur_page = 0; @@ -2372,10 +2371,12 @@ dump_log_buf(void) sync(); kmsg_dump_rewind_nolock(dumper); + paged_start(); while (kmsg_dump_get_line_nolock(dumper, false, buf, sizeof(buf), len)) { buf[len] = '\0'; - printf(%s, buf); + XMON_PRINTF(%s, buf); } + paged_end(dl\n); sync(); /* wait a little while to see if we get a machine check */ -- 2.1.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v2 0/2] powerpc/xmon: Paged output for paca display
Changes v1 - v2: * Removed pagination parameters from commands, replaced with new command to set page size. This works better for multiple commands and produces simpler code. * Switched from encoding the page position in the command buffer to using some globals. Saves some memory and is less invasive to the command code. * Added a patch to paginate the kernel log buffer display. Sam Bobroff (2): powerpc/xmon: Paged output for paca display powerpc/xmon: Paginate kernel log buffer display arch/powerpc/xmon/xmon.c | 89 +++- 1 file changed, 73 insertions(+), 16 deletions(-) -- 2.1.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v2 1/2] powerpc/xmon: Paged output for paca display
The paca display is already more than 24 lines, which can be problematic if you have an old school 80x24 terminal, or more likely you are on a virtual terminal which does not scroll for whatever reason. This patch adds a new command ., which takes a single (hex) numeric argument: lines per page. It will cause the output of dp and dpa to be broken into pages, if necessary. This is implemented by running over the entire output both for the initial command and for each subsequent page: the visible part is clipped out by checking line numbers. This is a simplistic approach but minimally invasive; it is intended to be easily reusable for other commands. Sample output: 0:mon .10 0:mon dp1 paca for cpu 0x1 @ cfdc0480: possible = yes present = yes online = yes lock_token = 0x8000 (0x8) paca_index = 0x1 (0xa) kernel_toc = 0xc0eb2400 (0x10) kernelbase = 0xc000 (0x18) kernel_msr = 0xb0001032 (0x20) emergency_sp = 0xc0003ffe8000 (0x28) mc_emergency_sp = 0xc0003ffe4000 (0x2e0) in_mce = 0x0 (0x2e8) data_offset = 0x7f17 (0x30) hw_cpu_id= 0x8 (0x38) cpu_start= 0x1 (0x3a) kexec_state = 0x0 (0x3b) [Enter for next page] 0:mon __current= 0xc0007e696620 (0x290) kstack = 0xc0007e6ebe30 (0x298) stab_rr = 0xb (0x2a0) saved_r1 = 0xc0007ef37860 (0x2a8) trap_save= 0x0 (0x2b8) soft_enabled = 0x0 (0x2ba) irq_happened = 0x1 (0x2bb) io_sync = 0x0 (0x2bc) irq_work_pending = 0x0 (0x2bd) nap_state_lost = 0x0 (0x2be) 0:mon (Based on a similar patch by Michael Ellerman m...@ellerman.id.au [v2] powerpc/xmon: Allow limiting the size of the paca display. This patch is an alternative and cannot coexist with the original.) Signed-off-by: Sam Bobroff sam.bobr...@au1.ibm.com --- arch/powerpc/xmon/xmon.c | 86 +++- 1 file changed, 71 insertions(+), 15 deletions(-) diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c index e599259..9ce9e7d 100644 --- a/arch/powerpc/xmon/xmon.c +++ b/arch/powerpc/xmon/xmon.c @@ -72,6 +72,12 @@ static int xmon_gate; static unsigned long in_xmon __read_mostly = 0; +#define XMON_PRINTF(...) do { if (paged_vis()) printf(__VA_ARGS__); } while (0) +#define MAX_PAGED_SIZE 1024 +static unsigned long paged_size = 0, paged_pos, paged_cur_page; +#ifdef CONFIG_PPC64 +static unsigned long paca_cpu; +#endif static unsigned long adrs; static int size = 1; #define MAX_DUMP (128 * 1024) @@ -242,6 +248,9 @@ Commands:\n\ u dump TLB\n #endif ? help\n +#ifdef CONFIG_PPC64 + .# limit output to # lines per page (dump paca only)\n +#endif zr reboot\n\ zh halt\n ; @@ -833,6 +842,19 @@ static void remove_cpu_bpts(void) write_ciabr(0); } +static void paged_set_size(void) +{ + if (!scanhex(paged_size) || (paged_size MAX_PAGED_SIZE)) { + printf(Invalid number of lines per page (max: %d).\n, + MAX_PAGED_SIZE); + paged_size = 0; + } +} +static void paged_reset(void) +{ + paged_cur_page = 0; +} + /* Command interpreting routine */ static char *last_cmd; @@ -863,7 +885,8 @@ cmds(struct pt_regs *excp) take_input(last_cmd); last_cmd = NULL; cmd = inchar(); - } + } else + paged_reset(); switch (cmd) { case 'm': cmd = inchar(); @@ -924,6 +947,9 @@ cmds(struct pt_regs *excp) case '?': xmon_puts(help_string); break; + case '.': + paged_set_size(); + break; case 'b': bpt_cmds(); break; @@ -2069,6 +2095,31 @@ static void xmon_rawdump (unsigned long adrs, long ndump) printf(\n); } +static void paged_start(void) +{ + paged_pos = 0; +} + +static void paged_end(char *next_cmd) +{ + unsigned long next_page_start = ++paged_cur_page * paged_size; + + if (paged_size (paged_pos next_page_start)) { + last_cmd = next_cmd; + printf([Enter for next page]\n); + } +} + +static bool paged_vis(void) +{ + bool rv = (!paged_size + || ((paged_pos = (paged_size * paged_cur_page)) +(paged_pos (paged_size * (paged_cur_page + 1); + + paged_pos++; + return rv; +} + #ifdef CONFIG_PPC64 static void dump_one_paca(int cpu) { @@ -2084,15 +2135,17 @@ static void dump_one_paca(int cpu
Re: [PATCH v3 1/1] KVM: PPC: Book3S: correct width in XER handling
Ping? I think I've addressed all the comments in this version. Is there anything else I need to look at? Cheers, Sam. On Wed, May 27, 2015 at 09:56:57AM +1000, Sam Bobroff wrote: In 64 bit kernels, the Fixed Point Exception Register (XER) is a 64 bit field (e.g. in kvm_regs and kvm_vcpu_arch) and in most places it is accessed as such. This patch corrects places where it is accessed as a 32 bit field by a 64 bit kernel. In some cases this is via a 32 bit load or store instruction which, depending on endianness, will cause either the lower or upper 32 bits to be missed. In another case it is cast as a u32, causing the upper 32 bits to be cleared. This patch corrects those places by extending the access methods to 64 bits. Signed-off-by: Sam Bobroff sam.bobr...@au1.ibm.com --- v3: Adjust booke set/get xer to match book3s. v2: Also extend kvmppc_book3s_shadow_vcpu.xer to 64 bit. arch/powerpc/include/asm/kvm_book3s.h |4 ++-- arch/powerpc/include/asm/kvm_book3s_asm.h |2 +- arch/powerpc/include/asm/kvm_booke.h |4 ++-- arch/powerpc/kvm/book3s_hv_rmhandlers.S |6 +++--- arch/powerpc/kvm/book3s_segment.S |4 ++-- 5 files changed, 10 insertions(+), 10 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h index b91e74a..05a875a 100644 --- a/arch/powerpc/include/asm/kvm_book3s.h +++ b/arch/powerpc/include/asm/kvm_book3s.h @@ -225,12 +225,12 @@ static inline u32 kvmppc_get_cr(struct kvm_vcpu *vcpu) return vcpu-arch.cr; } -static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, u32 val) +static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, ulong val) { vcpu-arch.xer = val; } -static inline u32 kvmppc_get_xer(struct kvm_vcpu *vcpu) +static inline ulong kvmppc_get_xer(struct kvm_vcpu *vcpu) { return vcpu-arch.xer; } diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h b/arch/powerpc/include/asm/kvm_book3s_asm.h index 5bdfb5d..c4ccd2d 100644 --- a/arch/powerpc/include/asm/kvm_book3s_asm.h +++ b/arch/powerpc/include/asm/kvm_book3s_asm.h @@ -112,7 +112,7 @@ struct kvmppc_book3s_shadow_vcpu { bool in_use; ulong gpr[14]; u32 cr; - u32 xer; + ulong xer; ulong ctr; ulong lr; ulong pc; diff --git a/arch/powerpc/include/asm/kvm_booke.h b/arch/powerpc/include/asm/kvm_booke.h index 3286f0d..bc6e29e 100644 --- a/arch/powerpc/include/asm/kvm_booke.h +++ b/arch/powerpc/include/asm/kvm_booke.h @@ -54,12 +54,12 @@ static inline u32 kvmppc_get_cr(struct kvm_vcpu *vcpu) return vcpu-arch.cr; } -static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, u32 val) +static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, ulong val) { vcpu-arch.xer = val; } -static inline u32 kvmppc_get_xer(struct kvm_vcpu *vcpu) +static inline ulong kvmppc_get_xer(struct kvm_vcpu *vcpu) { return vcpu-arch.xer; } diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S index 4d70df2..d75be59 100644 --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S @@ -870,7 +870,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S) blt hdec_soon ld r6, VCPU_CTR(r4) - lwz r7, VCPU_XER(r4) + ld r7, VCPU_XER(r4) mtctr r6 mtxer r7 @@ -1103,7 +1103,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) mfctr r3 mfxer r4 std r3, VCPU_CTR(r9) - stw r4, VCPU_XER(r9) + std r4, VCPU_XER(r9) /* If this is a page table miss then see if it's theirs or ours */ cmpwi r12, BOOK3S_INTERRUPT_H_DATA_STORAGE @@ -1675,7 +1675,7 @@ kvmppc_hdsi: bl kvmppc_msr_interrupt fast_interrupt_c_return: 6: ld r7, VCPU_CTR(r9) - lwz r8, VCPU_XER(r9) + ld r8, VCPU_XER(r9) mtctr r7 mtxer r8 mr r4, r9 diff --git a/arch/powerpc/kvm/book3s_segment.S b/arch/powerpc/kvm/book3s_segment.S index acee37c..ca8f174 100644 --- a/arch/powerpc/kvm/book3s_segment.S +++ b/arch/powerpc/kvm/book3s_segment.S @@ -123,7 +123,7 @@ no_dcbz32_on: PPC_LL r8, SVCPU_CTR(r3) PPC_LL r9, SVCPU_LR(r3) lwz r10, SVCPU_CR(r3) - lwz r11, SVCPU_XER(r3) + PPC_LL r11, SVCPU_XER(r3) mtctr r8 mtlrr9 @@ -237,7 +237,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE) mfctr r8 mflrr9 - stw r5, SVCPU_XER(r13) + PPC_STL r5, SVCPU_XER(r13) PPC_STL r6, SVCPU_FAULT_DAR(r13) stw r7, SVCPU_FAULT_DSISR(r13) PPC_STL r8, SVCPU_CTR(r13) -- 1.7.10.4 -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] book3s_hv_rmhandlers:Pass the correct trap argument to kvmhv_commence_exit
On Thu, May 21, 2015 at 01:57:04PM +0530, Gautham R. Shenoy wrote: In guest_exit_cont we call kvmhv_commence_exit which expects the trap number as the argument. However r3 doesn't contain the trap number at this point and as a result we would be calling the function with a spurious trap number. Fix this by copying r12 into r3 before calling kvmhv_commence_exit as r12 contains the trap number Signed-off-by: Gautham R. Shenoy e...@linux.vnet.ibm.com Hi Gautham, I agree with your logic: r3 is quite clearly corrupted in that path. So: Reviewed-by: Sam Bobroff sam.bobr...@au1.ibm.com Just one comment: Do you have a case of this causing some visible problem due to the corrupted trap number? (I'll test the patch if you do.) Cheers, Sam. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 1/1] powerpc/xmon: Paged output for paca display
The paca display is already more than 24 lines, which can be problematic if you have an old school 80x24 terminal, or more likely you are on a virtual terminal which does not scroll for whatever reason. This adds an optional letter to the dp and dpa xmon commands (dpp and dppa), which will enable a per-page display (with 16 line pages): the first page will be displayed and if there was data that didn't fit, it will display a message indicating that the user can use enter to display the next page. The intent is that this feels similar to the way the memory display functions work. This is implemented by running over the entire output both for the initial command and for each subsequent page: the visible part is clipped out by checking line numbers. Handling the empty command as more is done by writing a special command into a static buffer that indicates where to move the sliding visibility window. This is similar to the approach used for the memory dump commands except that the state data is encoded into the last_cmd string, rather than a set of static variables. The memory dump commands could probably be rewritten to make use of the same buffer and remove their other static variables. Sample output: 0:mon dpp1 paca for cpu 0x1 @ cfdc0480: possible = yes present = yes online = yes lock_token = 0x8000 (0x8) paca_index = 0x1 (0xa) kernel_toc = 0xc0eb2400 (0x10) kernelbase = 0xc000 (0x18) kernel_msr = 0xb0001032 (0x20) emergency_sp = 0xc0003ffe8000 (0x28) mc_emergency_sp = 0xc0003ffe4000 (0x2e0) in_mce = 0x0 (0x2e8) data_offset = 0x7f17 (0x30) hw_cpu_id= 0x8 (0x38) cpu_start= 0x1 (0x3a) kexec_state = 0x0 (0x3b) [Enter for next page] 0:mon __current= 0xc0007e696620 (0x290) kstack = 0xc0007e6ebe30 (0x298) stab_rr = 0xb (0x2a0) saved_r1 = 0xc0007ef37860 (0x2a8) trap_save= 0x0 (0x2b8) soft_enabled = 0x0 (0x2ba) irq_happened = 0x1 (0x2bb) io_sync = 0x0 (0x2bc) irq_work_pending = 0x0 (0x2bd) nap_state_lost = 0x0 (0x2be) 0:mon (Based on a similar patch by Michael Ellerman m...@ellerman.id.au [v2] powerpc/xmon: Allow limiting the size of the paca display. This patch is an alternative and cannot coexist with the original.) Signed-off-by: Sam Bobroff sam.bobr...@au1.ibm.com --- arch/powerpc/xmon/xmon.c | 82 1 file changed, 62 insertions(+), 20 deletions(-) diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c index e599259..9157286 100644 --- a/arch/powerpc/xmon/xmon.c +++ b/arch/powerpc/xmon/xmon.c @@ -72,6 +72,7 @@ static int xmon_gate; static unsigned long in_xmon __read_mostly = 0; +static char last_cmd_buf[128]; static unsigned long adrs; static int size = 1; #define MAX_DUMP (128 * 1024) @@ -204,8 +205,8 @@ Commands:\n\ dldump the kernel log buffer\n #ifdef CONFIG_PPC64 \ - dp[#]dump paca for current cpu, or cpu #\n\ - dpa dump paca for all possible cpus\n + dp[p][#] dump paca for current cpu, or cpu # (p = paged)\n\ + dp[p]a dump paca for all possible cpus (p = paged)\n #endif \ dr dump stream of raw bytes\n\ @@ -2070,7 +2071,17 @@ static void xmon_rawdump (unsigned long adrs, long ndump) } #ifdef CONFIG_PPC64 -static void dump_one_paca(int cpu) +static bool line_visible(unsigned long start, unsigned long count, +unsigned long *line) { + bool rv = (!count + || ((*line = start) (*line (start + count; + + (*line)++; + return rv; +} + +static void dump_one_paca(int cpu, unsigned long start, + unsigned long count, unsigned long *line) { struct paca_struct *p; @@ -2084,15 +2095,22 @@ static void dump_one_paca(int cpu) p = paca[cpu]; - printf(paca for cpu 0x%x @ %p:\n, cpu, p); +#define VPRINT(...) do { \ + if (line_visible(start, count, line)) \ + printf(__VA_ARGS__); \ +} while (0) + VPRINT(paca for cpu 0x%x @ %p:\n, cpu, p); - printf( %-*s = %s\n, 16, possible, cpu_possible(cpu) ? yes : no); - printf( %-*s = %s\n, 16, present, cpu_present(cpu) ? yes : no); - printf( %-*s = %s\n, 16, online, cpu_online(cpu) ? yes : no); + VPRINT( %-*s = %s\n, 16, possible, cpu_possible(cpu) ? yes : no); + VPRINT( %-*s = %s\n, 16, present, cpu_present(cpu) ? yes : no); + VPRINT( %-*s = %s\n, 16, online, cpu_online(cpu) ? yes : no); +#undef VPRINT -#define DUMP(paca, name, format) \ - printf( %-*s = %#-*format\t(0x%lx)\n, 16, #name, 18, paca-name, \ - offsetof
Re: [PATCH v2] powerpc/xmon: Allow limiting the size of the paca display
On Wed, Aug 12, 2015 at 09:55:25PM +1000, Michael Ellerman wrote: The paca display is already more than 24 lines, which can be problematic if you have an old school 80x24 terminal, or more likely you are on a virtual terminal which does not scroll for whatever reason. We'd like to expand the paca display even more, so add a way to limit the number of lines that are displayed. This adds a third form of 'dp' which is 'dp # #', where the first number is the cpu, and the second is the number of lines to display. Example output: 5:mon dp 3 6 paca for cpu 0x3 @ cfe00c00: possible = yes present = yes online = yes lock_token = 0x8000 (0xa) paca_index = 0x3 (0x8) Michael, This patch inspired me to do the additional work to make the output paged, more like the memory dump commands. I'll post it shortly as powerpc/xmon: Paged output for paca display. Cheers, Sam. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v3] powerpc: Add an inline function to update POWER8 HID0
On Wed, Aug 05, 2015 at 12:38:31PM +0530, Gautham R. Shenoy wrote: Section 3.7 of Version 1.2 of the Power8 Processor User's Manual prescribes that updates to HID0 be preceded by a SYNC instruction and followed by an ISYNC instruction (Page 91). Create an inline function name update_power8_hid0() which follows this recipe and invoke it from the static split core path. Signed-off-by: Gautham R. Shenoy e...@linux.vnet.ibm.com Hi Gautham, I've tested this on a Power 8 machine and verified that it is able to change split modes and that when doing so the new code is used. Reviewed-by: Sam Bobroff sam.bobr...@au1.ibm.com Tested-by: Sam Bobroff sam.bobr...@au1.ibm.com ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [v2,1/1] powerpc: Individual System V IPC system calls
On Tue, Oct 13, 2015 at 08:38:42PM +1100, Michael Ellerman wrote: > On Tue, 2015-13-10 at 01:49:28 UTC, Sam bobroff wrote: > > This patch provides individual system call numbers for the following > > System V IPC system calls, on PowerPC, so that they do not need to be > > multiplexed: > > * semop, semget, semctl, semtimedop > > * msgsnd, msgrcv, msgget, msgctl > > * shmat, shmdt, shmget, shmctl > > You tested this right? :) Tell me about it. Why yes I did: I have written a (fairly) trivial test program that calls each function in a way that doesn't fail (but that doesn't necessarily attempt to exercise the full functionality of it; my intent was primarily to validate the parameter passing part as that is where most of the code change is (on the glibc side)). I patched a local copy of glibc with the new kernel header and various tweaks to correctly format the parameter lists for the new calls (there is actually quite a lot of code in glibc around the IPC calls due to various compatibility issues). I could then build a full tool chain that supported the new calls. (This was a lot more extensive than the kernel patch but should be fairly close to what needs to go into glibc.) I used that tool chain to build a complete host system (using buildroot). Then I could run the following tests: * glibc: stock Host kernel: stock Result: success Notes: As expected, base case. * glibc: stock Host kernel: patched Result: success Notes: As expected, the old ipc() call still exists in the patched host. * glibc: patched Host kernel: stock Result: failure Notes: As expected, the test was run with a glibc that requires a patched kernel on an unpatched one so the syscalls are unknown. * glibc: patched Host kernel: patched Result: success Notes: As expected. (Also, a bit of debug in glibc shows the new system call paths being followed.) (I also re-ran the tests both for little-endian and big-endian hosts.) It would obviously be good to have someone else test this, but I can't see a way to make it easy to do. They would presumably have to go through all of the above, which seems too much to ask given how trivial the kernel side of the patch is. Still, it bothers me a bit so if there is any way please let me know. (I thought about writing some assembly to directly test the syscall numbers but all it would do is verify that the numbers are valid, which really isn't much of a test.) > Also we could make these available to SPU programs, but I don't think there's > any point, no one's going to do a libc update for that. > > cheers Cheers, Sam. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v3 1/2] powerpc/xmon: Paged output for paca display
On Wed, Oct 14, 2015 at 08:39:09PM +1100, Michael Ellerman wrote: > On Thu, 2015-10-08 at 11:50 +1100, Sam Bobroff wrote: > > The paca display is already more than 24 lines, which can be problematic > > if you have an old school 80x24 terminal, or more likely you are on a > > virtual terminal which does not scroll for whatever reason. > > > > This patch adds a new command ".", which takes a single (hex) numeric > > argument: lines per page. It will cause the output of "dp" and "dpa" > > to be broken into pages, if necessary. > > > > Sample output: > > > > 0:mon> .10 > > So what about making it "#" rather than "." ? > > cheers Sure, although we'll have to do a better job than the other commands in the help text ;-) (They use "#" to indicate a hex number and "##" is just going to be confusing.) Do you want me to respin? (I'm happy for you to just adjust the patch.) Cheers, Sam. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [v2,1/1] powerpc: Individual System V IPC system calls
On Wed, Oct 14, 2015 at 08:38:15PM +1100, Michael Ellerman wrote: > On Wed, 2015-10-14 at 18:00 +1100, Sam Bobroff wrote: > > On Tue, Oct 13, 2015 at 08:38:42PM +1100, Michael Ellerman wrote: > > > On Tue, 2015-13-10 at 01:49:28 UTC, Sam bobroff wrote: > > > > This patch provides individual system call numbers for the following > > > > System V IPC system calls, on PowerPC, so that they do not need to be > > > > multiplexed: > > > > * semop, semget, semctl, semtimedop > > > > * msgsnd, msgrcv, msgget, msgctl > > > > * shmat, shmdt, shmget, shmctl > > > > > > You tested this right? :) Tell me about it. > > > > Why yes I did: > > ... > > > (I also re-ran the tests both for little-endian and big-endian hosts.) > > Did you test on 32-bit at all? I ran the test program, compiled for 32 and 64 bit, on a biarch power7 machine (using -m32 and -m64 to the compiler) but only to verify that the fully patched system succeeded. Is that sufficient? > > It would obviously be good to have someone else test this, but I can't see a > > way to make it easy to do. They would presumably have to go through all of > > the > > above, which seems too much to ask given how trivial the kernel side of the > > patch is. Still, it bothers me a bit so if there is any way please let me > > know. > > (I thought about writing some assembly to directly test the syscall numbers > > but > > all it would do is verify that the numbers are valid, which really isn't > > much > > of a test.) > > Actually that is still a useful test, it at least tells you if the kernel > you're running on implements the syscalls. Obviously if you're on mainline > that's easy enough to work out from the git history, but if/when these get > backported to distro kernels, it's often harder to work out what's in the > source than just testing it directly. Oh, fair enough then. > So I wrote a quick dirty test for that, it seems to work for me: [snip] Thanks :-) > Which gives: > > test: ipc_unmuxed > tags: git_version:v4.3-rc3-44-g10053fa531a8-dirty > Testing semop returned -1, errno 22 > Testing semgetreturned -1, errno 2 > Testing semctlreturned -1, errno 22 > Testing semtimedopreturned -1, errno 22 > Testing msgsndreturned -1, errno 14 > Testing msgrcvreturned -1, errno 22 > Testing msggetreturned -1, errno 2 > Testing msgctlreturned -1, errno 22 > Testing shmat returned -1, errno 22 > Testing shmdt returned -1, errno 22 > Testing shmgetreturned -1, errno 2 > Testing shmctlreturned -1, errno 22 > success: ipc_unmuxed > > > And on an unpatched system: > > test: ipc_unmuxed > tags: git_version:v4.3-rc3-44-g10053fa531a8-dirty > Testing semop returned -1, errno 38 > [FAIL] Test FAILED on line 2 > failure: ipc_unmuxed > > > Look OK? Yep! And 38 (ENOSYS) is the code we'd expect in the failure case. > cheers Cheers, Sam. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v3 0/2] powerpc/xmon: Paged output for paca display
Changes v2 -> v3: Moved the pagination implementation from xmon.c to nonstdio.c where it's much easier to do and the code is significantly simplified. As it's now trivial to do, add the capability to truncate the output or to stop pagination and dump the rest of the output. Changed function naming scheme to read more easily (e.g. xmon_start_pagination()). Changes v1 -> v2: * Removed pagination parameters from commands, replaced with new command to set page size. This works better for multiple commands and produces simpler code. * Switched from encoding the page position in the command buffer to using some globals. Saves some memory and is less invasive to the command code. * Added a patch to paginate the kernel log buffer display. Sam Bobroff (2): powerpc/xmon: Paged output for paca display powerpc/xmon: Paginate kernel log buffer display arch/powerpc/xmon/nonstdio.c | 57 ++-- arch/powerpc/xmon/nonstdio.h | 3 +++ arch/powerpc/xmon/xmon.c | 18 ++ 3 files changed, 76 insertions(+), 2 deletions(-) -- 2.1.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v3 2/2] powerpc/xmon: Paginate kernel log buffer display
The kernel log buffer is often much longer than the size of a terminal so paginate it's output. Signed-off-by: Sam Bobroff <sam.bobr...@au1.ibm.com> --- arch/powerpc/xmon/xmon.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c index cc070c3..8ba8ea7 100644 --- a/arch/powerpc/xmon/xmon.c +++ b/arch/powerpc/xmon/xmon.c @@ -242,9 +242,7 @@ Commands:\n\ " u dump TLB\n" #endif " ? help\n" -#ifdef CONFIG_PPC64 -" .# limit output to # lines per page (dump paca only)\n" -#endif +" .# limit output to # lines per page (for dp#, dpa, dl)\n" " zr reboot\n\ zh halt\n" ; @@ -2333,10 +2331,12 @@ dump_log_buf(void) sync(); kmsg_dump_rewind_nolock(); + xmon_start_pagination(); while (kmsg_dump_get_line_nolock(, false, buf, sizeof(buf), )) { buf[len] = '\0'; printf("%s", buf); } + xmon_end_pagination(); sync(); /* wait a little while to see if we get a machine check */ -- 2.1.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [v2,1/2] powerpc/xmon: Paged output for paca display
On Tue, Oct 06, 2015 at 10:05:38PM +1100, Michael Ellerman wrote: > On Fri, 2015-21-08 at 04:24:27 UTC, Sam bobroff wrote: > > The paca display is already more than 24 lines, which can be problematic > > if you have an old school 80x24 terminal, or more likely you are on a > > virtual terminal which does not scroll for whatever reason. > > > > This patch adds a new command ".", which takes a single (hex) numeric > > argument: lines per page. It will cause the output of "dp" and "dpa" > > to be broken into pages, if necessary. > > > > This is implemented by running over the entire output both for the > > initial command and for each subsequent page: the visible part is > > clipped out by checking line numbers. This is a simplistic approach > > but minimally invasive; it is intended to be easily reusable for other > > commands. > > > > Sample output: > > > > 0:mon> .10 > > 0:mon> dp1 > > paca for cpu 0x1 @ cfdc0480: > > possible = yes > > present = yes > > online = yes > > lock_token = 0x8000 (0x8) > > paca_index = 0x1 (0xa) > > kernel_toc = 0xc0eb2400 (0x10) > > kernelbase = 0xc000 (0x18) > > kernel_msr = 0xb0001032 (0x20) > > emergency_sp = 0xc0003ffe8000 (0x28) > > mc_emergency_sp = 0xc0003ffe4000 (0x2e0) > > in_mce = 0x0 (0x2e8) > > data_offset = 0x7f17 (0x30) > > hw_cpu_id= 0x8 (0x38) > > cpu_start= 0x1 (0x3a) > > kexec_state = 0x0 (0x3b) > > [Enter for next page] > > 0:mon> > > __current= 0xc0007e696620 (0x290) > > kstack = 0xc0007e6ebe30 (0x298) > > stab_rr = 0xb (0x2a0) > > saved_r1 = 0xc0007ef37860 (0x2a8) > > trap_save= 0x0 (0x2b8) > > soft_enabled = 0x0 (0x2ba) > > irq_happened = 0x1 (0x2bb) > > io_sync = 0x0 (0x2bc) > > irq_work_pending = 0x0 (0x2bd) > > nap_state_lost = 0x0 (0x2be) > > 0:mon> > > > > (Based on a similar patch by Michael Ellerman <m...@ellerman.id.au> > > "[v2] powerpc/xmon: Allow limiting the size of the paca display". > > This patch is an alternative and cannot coexist with the original.) > > > > Signed-off-by: Sam Bobroff <sam.bobr...@au1.ibm.com> > > --- > > arch/powerpc/xmon/xmon.c | 86 > > +++- > > 1 file changed, 71 insertions(+), 15 deletions(-) > > > > diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c > > index e599259..9ce9e7d 100644 > > --- a/arch/powerpc/xmon/xmon.c > > +++ b/arch/powerpc/xmon/xmon.c > > @@ -72,6 +72,12 @@ static int xmon_gate; > > > > static unsigned long in_xmon __read_mostly = 0; > > > > +#define XMON_PRINTF(...) do { if (paged_vis()) printf(__VA_ARGS__); } > > while (0) > > Can you do this is a proper function. I know it will need to be varargs, but > that shouldn't be too ugly. Sure, but I think I'll re-work the core of the logic and place it directly into nonstdio.c. That will allow me to implement it directly in xmon_write() and there won't need to be any change here at all. It will also allow blocking during the output which will remove the whole 'run the command several times and print a slice of it' system. I'll post a v3. > > > +#define MAX_PAGED_SIZE 1024 > > Why do we need a max at all? OK, removed. > > +static unsigned long paged_size = 0, paged_pos, paged_cur_page; > > > +#ifdef CONFIG_PPC64 > > +static unsigned long paca_cpu; > > +#endif > > That can just be static in dump_pacas() by the looks. This won't be needed in the new version. > > static unsigned long adrs; > > static int size = 1; > > #define MAX_DUMP (128 * 1024) > > @@ -242,6 +248,9 @@ Commands:\n\ > > " u dump TLB\n" > > #endif > > " ? help\n" > > +#ifdef CONFIG_PPC64 > > +" .# limit output to # lines per page (dump paca only)\n" > > +#endif > > Don't make it 64-bit only. It's only because the paca display itself is 64 bit only. > > " zr reboot\n\ > >
[PATCH v3 1/2] powerpc/xmon: Paged output for paca display
The paca display is already more than 24 lines, which can be problematic if you have an old school 80x24 terminal, or more likely you are on a virtual terminal which does not scroll for whatever reason. This patch adds a new command ".", which takes a single (hex) numeric argument: lines per page. It will cause the output of "dp" and "dpa" to be broken into pages, if necessary. Sample output: 0:mon> .10 0:mon> dp1 paca for cpu 0x1 @ cfdc0480: possible = yes present = yes online = yes lock_token = 0x8000 (0x8) paca_index = 0x1 (0xa) kernel_toc = 0xc0eb2400 (0x10) kernelbase = 0xc000 (0x18) kernel_msr = 0xb0001032 (0x20) emergency_sp = 0xc0003ffe8000 (0x28) mc_emergency_sp = 0xc0003ffe4000 (0x2e0) in_mce = 0x0 (0x2e8) data_offset = 0x7f17 (0x30) hw_cpu_id= 0x8 (0x38) cpu_start= 0x1 (0x3a) kexec_state = 0x0 (0x3b) [Hit a key (a:all, q:truncate, any:next page)] 0:mon> __current= 0xc0007e696620 (0x290) kstack = 0xc0007e6ebe30 (0x298) stab_rr = 0xb (0x2a0) saved_r1 = 0xc0007ef37860 (0x2a8) trap_save= 0x0 (0x2b8) soft_enabled = 0x0 (0x2ba) irq_happened = 0x1 (0x2bb) io_sync = 0x0 (0x2bc) irq_work_pending = 0x0 (0x2bd) nap_state_lost = 0x0 (0x2be) 0:mon> (Based on a similar patch by Michael Ellerman <m...@ellerman.id.au> "[v2] powerpc/xmon: Allow limiting the size of the paca display". This patch is an alternative and cannot coexist with the original.) Signed-off-by: Sam Bobroff <sam.bobr...@au1.ibm.com> --- arch/powerpc/xmon/nonstdio.c | 57 ++-- arch/powerpc/xmon/nonstdio.h | 3 +++ arch/powerpc/xmon/xmon.c | 18 ++ 3 files changed, 76 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/xmon/nonstdio.c b/arch/powerpc/xmon/nonstdio.c index c987486..18819d7 100644 --- a/arch/powerpc/xmon/nonstdio.c +++ b/arch/powerpc/xmon/nonstdio.c @@ -11,10 +11,25 @@ #include #include "nonstdio.h" +int paginating = 0, paginate_skipping = 0; +unsigned long paginate_lpp = 0 /* Lines Per Page */; +unsigned long paginate_pos; -static int xmon_write(const void *ptr, int nb) +void xmon_start_pagination(void) { - return udbg_write(ptr, nb); + paginating = 1; + paginate_skipping = 0; + paginate_pos = 0; +} + +void xmon_end_pagination(void) +{ + paginating = 0; +} + +void xmon_set_pagination_lpp(unsigned long lpp) +{ + paginate_lpp = lpp; } static int xmon_readchar(void) @@ -24,6 +39,44 @@ static int xmon_readchar(void) return -1; } +static int xmon_write(const char *ptr, int nb) +{ + int rv = 0; + const char *p = ptr, *q; + const char msg[] = "[Hit a key (a:all, q:truncate, any:next page)]"; + + if (nb <= 0) + return rv; + if (paginating && paginate_skipping) + return nb; + if (paginate_lpp) { + while (paginating && (q = strchr(p, '\n'))) { + rv += udbg_write(p, q - p + 1); + p = q + 1; + paginate_pos++; + if (paginate_pos >= paginate_lpp) { + udbg_write(msg, strlen(msg)); + switch (xmon_readchar()) { + case 'a': + paginating = 0; + break; + case 'q': + paginate_skipping = 1; + break; + default: + /* nothing */ + break; + } + paginate_pos = 0; + udbg_write("\r\n", 2); + if (paginate_skipping) + return nb; + } + } + } + return rv + udbg_write(p, nb - (p - ptr)); +} + int xmon_putchar(int c) { char ch = c; diff --git a/arch/powerpc/xmon/nonstdio.h b/arch/powerpc/xmon/nonstdio.h index 18a51de..f865336 100644 --- a/arch/powerpc/xmon/nonstdio.h +++ b/arch/powerpc/xmon/nonstdio.h @@ -3,6 +3,9 @@ #define printf xmon_printf #define putcharxmon_putchar +extern void xmon_set_pagination_lpp(unsigned long lpp); +extern void xmon_start_pagination(void); +extern void xmon_end_pagination(void
[PATCH v2 1/1] powerpc: Individual System V IPC system calls
This patch provides individual system call numbers for the following System V IPC system calls, on PowerPC, so that they do not need to be multiplexed: * semop, semget, semctl, semtimedop * msgsnd, msgrcv, msgget, msgctl * shmat, shmdt, shmget, shmctl Signed-off-by: Sam Bobroff <sam.bobr...@au1.ibm.com> --- v2: Rebased onto today's next-20151012. arch/powerpc/include/asm/systbl.h | 12 arch/powerpc/include/asm/unistd.h | 2 +- arch/powerpc/include/uapi/asm/unistd.h | 12 3 files changed, 25 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/systbl.h b/arch/powerpc/include/asm/systbl.h index 126d0c4..c9e26cb 100644 --- a/arch/powerpc/include/asm/systbl.h +++ b/arch/powerpc/include/asm/systbl.h @@ -370,3 +370,15 @@ COMPAT_SYS(execveat) PPC64ONLY(switch_endian) SYSCALL_SPU(userfaultfd) SYSCALL_SPU(membarrier) +SYSCALL(semop) +SYSCALL(semget) +COMPAT_SYS(semctl) +COMPAT_SYS(semtimedop) +COMPAT_SYS(msgsnd) +COMPAT_SYS(msgrcv) +SYSCALL(msgget) +COMPAT_SYS(msgctl) +COMPAT_SYS(shmat) +SYSCALL(shmdt) +SYSCALL(shmget) +COMPAT_SYS(shmctl) diff --git a/arch/powerpc/include/asm/unistd.h b/arch/powerpc/include/asm/unistd.h index 13411be..6d8f802 100644 --- a/arch/powerpc/include/asm/unistd.h +++ b/arch/powerpc/include/asm/unistd.h @@ -12,7 +12,7 @@ #include -#define __NR_syscalls 366 +#define __NR_syscalls 378 #define __NR__exit __NR_exit #define NR_syscalls__NR_syscalls diff --git a/arch/powerpc/include/uapi/asm/unistd.h b/arch/powerpc/include/uapi/asm/unistd.h index 63377380..81579e9 100644 --- a/arch/powerpc/include/uapi/asm/unistd.h +++ b/arch/powerpc/include/uapi/asm/unistd.h @@ -388,5 +388,17 @@ #define __NR_switch_endian 363 #define __NR_userfaultfd 364 #define __NR_membarrier365 +#define __NR_semop 366 +#define __NR_semget367 +#define __NR_semctl368 +#define __NR_semtimedop369 +#define __NR_msgsnd370 +#define __NR_msgrcv371 +#define __NR_msgget372 +#define __NR_msgctl373 +#define __NR_shmat 374 +#define __NR_shmdt 375 +#define __NR_shmget376 +#define __NR_shmctl377 #endif /* _UAPI_ASM_POWERPC_UNISTD_H_ */ -- 2.1.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 1/1] powerpc: Individual System V IPC system calls
This patch provides individual system call numbers for the following System V IPC system calls, on PowerPC, so that they do not need to be multiplexed: * semop, semget, semctl, semtimedop * msgsnd, msgrcv, msgget, msgctl * shmat, shmdt, shmget, shmctl Signed-off-by: Sam Bobroff <sam.bobr...@au1.ibm.com> --- arch/powerpc/include/asm/systbl.h | 12 arch/powerpc/include/asm/unistd.h | 2 +- arch/powerpc/include/uapi/asm/unistd.h | 12 3 files changed, 25 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/systbl.h b/arch/powerpc/include/asm/systbl.h index 71f2b3f..546b9ec 100644 --- a/arch/powerpc/include/asm/systbl.h +++ b/arch/powerpc/include/asm/systbl.h @@ -368,3 +368,15 @@ SYSCALL_SPU(memfd_create) SYSCALL_SPU(bpf) COMPAT_SYS(execveat) PPC64ONLY(switch_endian) +SYSCALL(semop) +SYSCALL(semget) +COMPAT_SYS(semctl) +COMPAT_SYS(semtimedop) +COMPAT_SYS(msgsnd) +COMPAT_SYS(msgrcv) +SYSCALL(msgget) +COMPAT_SYS(msgctl) +COMPAT_SYS(shmat) +SYSCALL(shmdt) +SYSCALL(shmget) +COMPAT_SYS(shmctl) diff --git a/arch/powerpc/include/asm/unistd.h b/arch/powerpc/include/asm/unistd.h index f4f8b66..e51c51b 100644 --- a/arch/powerpc/include/asm/unistd.h +++ b/arch/powerpc/include/asm/unistd.h @@ -12,7 +12,7 @@ #include -#define __NR_syscalls 364 +#define __NR_syscalls 376 #define __NR__exit __NR_exit #define NR_syscalls__NR_syscalls diff --git a/arch/powerpc/include/uapi/asm/unistd.h b/arch/powerpc/include/uapi/asm/unistd.h index e4aa173..a8390ee 100644 --- a/arch/powerpc/include/uapi/asm/unistd.h +++ b/arch/powerpc/include/uapi/asm/unistd.h @@ -386,5 +386,17 @@ #define __NR_bpf 361 #define __NR_execveat 362 #define __NR_switch_endian 363 +#define __NR_semop 364 +#define __NR_semget365 +#define __NR_semctl366 +#define __NR_semtimedop367 +#define __NR_msgsnd368 +#define __NR_msgrcv369 +#define __NR_msgget370 +#define __NR_msgctl371 +#define __NR_shmat 372 +#define __NR_shmdt 373 +#define __NR_shmget374 +#define __NR_shmctl375 #endif /* _UAPI_ASM_POWERPC_UNISTD_H_ */ -- 2.1.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 1/1] powerpc: Individual System V IPC system calls
On Fri, Sep 25, 2015 at 10:33:37AM +1000, Michael Ellerman wrote: > On Thu, 2015-09-24 at 15:39 +1000, Sam Bobroff wrote: > > This patch provides individual system call numbers for the following > > System V IPC system calls, on PowerPC, so that they do not need to be > > multiplexed: > > * semop, semget, semctl, semtimedop > > * msgsnd, msgrcv, msgget, msgctl > > * shmat, shmdt, shmget, shmctl > > Thanks. > > Can you please rebase this on top of linux-next, where we have two new > syscalls wired up. It should be trivial, just the numbering changes, but I > think you have a modified libc to actually test the result. No problem, and yes I can test the glibc changes :-) Cheers, Sam. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v5 08/11] powerpc/powernv: Add platform support for stop instruction
On Thu, Jun 02, 2016 at 07:38:58AM -0500, Shreyas B. Prabhu wrote: ... > +/* Power Management - PSSCR Fields */ It might be nice to give the full name of the register, as below with the FPSCR. > +#define PSSCR_RL_MASK0x000F > +#define PSSCR_MTL_MASK 0x00F0 > +#define PSSCR_TR_MASK0x0300 > +#define PSSCR_PSLL_MASK 0x000F > +#define PSSCR_EC 0x0010 > +#define PSSCR_ESL0x0020 > +#define PSSCR_SD 0x0040 > + > + > /* Floating Point Status and Control Register (FPSCR) Fields */ Cheers, Sam. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 1/1] powerpc: Detect broken or mismatched toolchains
On Mon, Feb 22, 2016 at 08:05:01PM -0600, Scott Wood wrote: > On Mon, 2016-02-22 at 16:13 +1100, Sam Bobroff wrote: > > It can currently be difficult to diagnose a build that fails due to > > the compiler, linker or other parts of the toolchain being unable to > > build binaries of the type required by the kernel config. For example > > using a little endian toolchain to build a big endian kernel may > > produce: > > > > as: unrecognized option '-maltivec' > > > > This patch adds a basic compile test and error message to > > arch/powerpc/Makefile so that the above error becomes: > > > > *** Sorry, your toolchain seems to be broken or incorrect. *** > > Make sure it supports your kernel configuration (ppc64). > > > > Signed-off-by: Sam Bobroff <sam.bobr...@au1.ibm.com> > > --- > > How is this more useful than getting to actually see the way in which the > toolchain (or the CFLAGS) is broken? My reasoning was that it would be better because it happens at the start of the build, rather than (possibly) a long way into it, and it indicates that the problem is the toolchain setup (or config) itself rather than the file it's trying to compile or link. But I agree completely with what you're saying. I'll try re-working it in a way that shows the command that fails and it's output. Cheers, Sam. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 1/1] powerpc: Detect broken or mismatched toolchains
It can currently be difficult to diagnose a build that fails due to the compiler, linker or other parts of the toolchain being unable to build binaries of the type required by the kernel config. For example using a little endian toolchain to build a big endian kernel may produce: as: unrecognized option '-maltivec' This patch adds a basic compile test and error message to arch/powerpc/Makefile so that the above error becomes: *** Sorry, your toolchain seems to be broken or incorrect. *** Make sure it supports your kernel configuration (ppc64). Signed-off-by: Sam Bobroff <sam.bobr...@au1.ibm.com> --- arch/powerpc/Makefile | 7 +++ 1 file changed, 7 insertions(+) diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile index 96efd82..0041cd2 100644 --- a/arch/powerpc/Makefile +++ b/arch/powerpc/Makefile @@ -355,6 +355,13 @@ TOUT := .tmp_gas_check # - Require gcc 4.0 or above on 64-bit # - gcc-4.2.0 has issues compiling modules on 64-bit checkbin: + @if test "$(call try-run,echo 'int _start(void) { return 0; }' > \"$$TMP\"; \ + $(CC) $(KBUILD_CPPFLAGS) $(KBUILD_CFLAGS) -x c -nostdlib \"$$TMP\" \ + -o /dev/null,ok,broken)" = "broken" ; then \ + echo "*** Sorry, your toolchain seems to be broken or incorrect. ***" ; \ + echo "Make sure it supports your kernel configuration ($(UTS_MACHINE))." ; \ + false; \ + fi @if test "$(cc-name)" != "clang" \ && test "$(cc-version)" = "0304" ; then \ if ! /bin/echo mftb 5 | $(AS) -v -mppc -many -o $(TOUT) >/dev/null 2>&1 ; then \ -- 2.1.0 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [1/1] KVM: PPC: Introduce KVM_CAP_PPC_HTM
On Fri, Jul 08, 2016 at 08:49:49PM +1000, Michael Ellerman wrote: > On Wed, 2016-06-07 at 06:05:54 UTC, Sam bobroff wrote: > > diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c > > index 02416fe..06d79bc 100644 > > --- a/arch/powerpc/kvm/powerpc.c > > +++ b/arch/powerpc/kvm/powerpc.c > > @@ -588,6 +588,10 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long > > ext) > > r = 1; > > break; > > #endif > > + case KVM_CAP_PPC_HTM: > > + r = cpu_has_feature(CPU_FTR_TM) > > + && is_kvmppc_hv_enabled(kvm); > > I think it should be using CPU_FTR_TM_COMP. Oh, why is that? I'm happy to respin the patch I'm just curious. (I did it that way becuase that seems to be the way the other flags are used, e.g. CPU_FTR_ALTIVEC). If I read the code correctly, using CPU_FTR_TM_COMP will work fine: it should cause the cpu_has_feature() test to always return false if CPU_FTR_TM_COMP is 0. > And AFAICS you don't need to break that line. Sure, I'll un-split it when I respin. > cheers Cheers, Sam. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v2 1/1] KVM: PPC: Introduce KVM_CAP_PPC_HTM
Introduce a new KVM capability, KVM_CAP_PPC_HTM, that can be queried to determine if a PowerPC KVM guest should use HTM (Hardware Transactional Memory). This will be used by QEMU to populate the pa-features bits in the guest's device tree. Signed-off-by: Sam Bobroff <sam.bobr...@au1.ibm.com> --- v2: * Use CPU_FTR_TM_COMP instead of CPU_FTR_TM. * I didn't unbreak the line, as with the extra characters checkpatch will complain if I do. I did move the break to a more usual place. arch/powerpc/kvm/powerpc.c | 4 include/uapi/linux/kvm.h | 1 + 2 files changed, 5 insertions(+) diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index 02416fe..5ebc8ff 100644 --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -588,6 +588,10 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) r = 1; break; #endif + case KVM_CAP_PPC_HTM: + r = cpu_has_feature(CPU_FTR_TM_COMP) && + is_kvmppc_hv_enabled(kvm); + break; default: r = 0; break; diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 05ebf47..f421d0e 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -866,6 +866,7 @@ struct kvm_ppc_smmu_info { #define KVM_CAP_ARM_PMU_V3 126 #define KVM_CAP_VCPU_ATTRIBUTES 127 #define KVM_CAP_MAX_VCPU_ID 128 +#define KVM_CAP_PPC_HTM 129 #ifdef KVM_CAP_IRQ_ROUTING -- 2.1.0 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 1/3] spapr: Disable ibm, pa-features HTM bit
There are a few issues with our handling of the ibm,pa-features HTM bit: - We don't support transactional memory in PR KVM, so don't tell the OS that we do. - In full emulation we have a minimal implementation of HTM that always fails, so for performance reasons lets not tell the OS that we support it either. - In HV KVM mode, we should mirror the host HTM enabled state by checking a KVM capability or looking at the AT_HWCAP2 bit. For now unconditionally disable it by removing HTM from the pa-features bits. It will be re-enabled in a subsequent patch specifically for HV KVM. Signed-off-by: Sam Bobroff <sam.bobr...@au1.ibm.com> --- hw/ppc/spapr.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index 78ebd9e..704aae7 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -635,7 +635,7 @@ static void spapr_populate_cpu_dt(CPUState *cs, void *fdt, int offset, 0xf6, 0x1f, 0xc7, 0xc0, 0x80, 0xf0, 0x80, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x80, 0x00, -0x80, 0x00, 0x80, 0x00, 0x80, 0x00 }; +0x80, 0x00, 0x80, 0x00, 0x00, 0x00 }; uint8_t *pa_features; size_t pa_size; -- 2.1.0 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 2/3] Add KVM_CAP_PPC_HTM to linux/kvm.h
Signed-off-by: Sam Bobroff <sam.bobr...@au1.ibm.com> --- linux-headers/linux/kvm.h | 1 + 1 file changed, 1 insertion(+) diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h index e60e21b..37cb3e8 100644 --- a/linux-headers/linux/kvm.h +++ b/linux-headers/linux/kvm.h @@ -866,6 +866,7 @@ struct kvm_ppc_smmu_info { #define KVM_CAP_ARM_PMU_V3 126 #define KVM_CAP_VCPU_ATTRIBUTES 127 #define KVM_CAP_MAX_VCPU_ID 128 +#define KVM_CAP_PPC_HTM 129 #ifdef KVM_CAP_IRQ_ROUTING -- 2.1.0 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 3/3] spapr: Set ibm, pa-features HTM from KVM_CAP_PPC_HTM
Advertise HTM support in ibm, pa-features if KVM indicates support when queried via a new capability (KVM_CAP_PPC_HTM). If KVM returns false for the capability (which may indicate that the host kernel doesn't support the capability itself) attempt to determine availability using a fallback method based on KVM being KVM-HV and HTM being available to the QEMU process. Signed-off-by: Sam Bobroff <sam.bobr...@au1.ibm.com> --- hw/ppc/spapr.c | 3 ++- target-ppc/kvm.c | 27 +++ target-ppc/kvm_ppc.h | 6 ++ 3 files changed, 35 insertions(+), 1 deletion(-) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index 704aae7..e229532 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -606,6 +606,7 @@ static void spapr_populate_cpu_dt(CPUState *cs, void *fdt, int offset, uint32_t tbfreq = kvm_enabled() ? kvmppc_get_tbfreq() : SPAPR_TIMEBASE_FREQ; uint32_t cpufreq = kvm_enabled() ? kvmppc_get_clockfreq() : 10; +uint8_t htm = (kvm_enabled() && kvmppc_get_htm_support(env)) ? 0x80 : 0x00; uint32_t page_sizes_prop[64]; size_t page_sizes_prop_size; uint32_t vcpus_per_socket = smp_threads * smp_cores; @@ -635,7 +636,7 @@ static void spapr_populate_cpu_dt(CPUState *cs, void *fdt, int offset, 0xf6, 0x1f, 0xc7, 0xc0, 0x80, 0xf0, 0x80, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x80, 0x00, -0x80, 0x00, 0x80, 0x00, 0x00, 0x00 }; +0x80, 0x00, 0x80, 0x00, 0x00 | htm, 0x00 }; uint8_t *pa_features; size_t pa_size; diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c index 884d564..f94ce3b 100644 --- a/target-ppc/kvm.c +++ b/target-ppc/kvm.c @@ -20,6 +20,7 @@ #include #include +#include "elf.h" #include "qemu-common.h" #include "qemu/error-report.h" @@ -1976,6 +1977,32 @@ uint32_t kvmppc_get_dfp(void) return kvmppc_read_int_cpu_dt("ibm,dfp"); } +bool kvmppc_get_htm_support(CPUPPCState *env) +{ +PowerPCCPU *cpu = ppc_env_get_cpu(env); +CPUState *cs = CPU(cpu); + + +if (kvm_vm_check_extension(cs->kvm_state, KVM_CAP_PPC_HTM)) { +return true; +} +/* + * Fallback test for host kernels that don't yet support KVM_CAP_PPC_HTM. + * This will be unnecessary when KVM_API_VERSION is incremented (to 13 or + * above) because that will remove the ambiguity between the host kernel + * lacking support for KVM_CAP_PPC_HTM and it having support but reporting + * HTM as unavailable (both of which return 0, above). + */ +if (kvm_vm_check_extension(cs->kvm_state, KVM_CAP_PPC_GET_PVINFO)) { +/* Assume this means PR KVM, so no TM. */ +return false; +} else { +/* Assume this means HV KVM, propagate whatever host userspace sees. */ +unsigned long hwcap2 = qemu_getauxval(AT_HWCAP2); +return !!(hwcap2 & PPC_FEATURE2_HAS_HTM); +} +} + static int kvmppc_get_pvinfo(CPUPPCState *env, struct kvm_ppc_pvinfo *pvinfo) { PowerPCCPU *cpu = ppc_env_get_cpu(env); diff --git a/target-ppc/kvm_ppc.h b/target-ppc/kvm_ppc.h index 20bfb59..b01c717 100644 --- a/target-ppc/kvm_ppc.h +++ b/target-ppc/kvm_ppc.h @@ -17,6 +17,7 @@ uint32_t kvmppc_get_tbfreq(void); uint64_t kvmppc_get_clockfreq(void); uint32_t kvmppc_get_vmx(void); uint32_t kvmppc_get_dfp(void); +bool kvmppc_get_htm_support(CPUPPCState *env); bool kvmppc_get_host_model(char **buf); bool kvmppc_get_host_serial(char **buf); int kvmppc_get_hasidle(CPUPPCState *env); @@ -90,6 +91,11 @@ static inline uint32_t kvmppc_get_dfp(void) return 0; } +static inline bool kvmppc_get_htm_support(KVMState *kvm_state) +{ +return false; +} + static inline int kvmppc_get_hasidle(CPUPPCState *env) { return 0; -- 2.1.0 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 0/3] Rework spapr: Better handling of ibm, pa-features TM bit
Hi David, Anton asked me to have a look at this, so here is an attempt at a re-implementation of his: "spapr: Better handling of ibm, pa-features TM bit" addressing your comments and those from Paul Mackerras. I've broken the patch into one to unconditionally disable the HTM bit in pa-features and a second one to set it conditionally based on a (new) KVM capability. It requires a small patch to KVM to support the capability, presumably something like this: diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index 02416fe..4a8ddab 100644 --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -588,6 +588,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) r = 1; break; #endif + case KVM_CAP_PPC_HTM: + r = cpu_has_feature(CPU_FTR_TM); + break; default: r = 0; break; Adding a new capability requires changing linux/kvm.h but I believe we can avoid incrementing KVM_API_VERSION at this stage since kernels that don't yet support it will simply return false. However, due to the ambiguity of that result I've included Anton's initial fallback approach to be used in that case. Once the API version is incremented (and support for the capability is guaranteed) the ambiguity would be gone and we should be able to remove the fallback. As long as that happens before KVM PR supports HTM, this should address Paul's concern about using the PVINFO capability to discover KVM-HV. Note: I'm not sure how to handle the change to linux/kvm.h, I've included the patch here because it's needed to compile, but I suspect it needs to go via the kernel. Let's see if this part looks good first. Note also: I've changed TM to HTM where possible in an attempt to be consistent. Sam Bobroff (3): spapr: Disable ibm, pa-features HTM bit Add KVM_CAP_PPC_HTM to linux/kvm.h spapr: Set ibm, pa-features HTM from KVM_CAP_PPC_HTM hw/ppc/spapr.c| 3 ++- linux-headers/linux/kvm.h | 1 + target-ppc/kvm.c | 27 +++ target-ppc/kvm_ppc.h | 6 ++ 4 files changed, 36 insertions(+), 1 deletion(-) -- 2.1.0 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v2 3/3] spapr: Set ibm, pa-features HTM from KVM_CAP_PPC_HTM
Advertise HTM support in ibm, pa-features if KVM indicates support when queried via a new capability (KVM_CAP_PPC_HTM). If KVM returns false for the capability (which may indicate that the host kernel doesn't support the capability itself) attempt to determine availability using a fallback method based on KVM being KVM-HV and HTM being available to the QEMU process. Signed-off-by: Sam Bobroff <sam.bobr...@au1.ibm.com> --- v2: * Improve readability of HTM bit set code. * Move the test for KVM into kvmppc_get_htm_support(). hw/ppc/spapr.c | 3 +++ target-ppc/kvm.c | 29 + target-ppc/kvm_ppc.h | 6 ++ 3 files changed, 38 insertions(+) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index 704aae7..843fbe7 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -712,6 +712,9 @@ static void spapr_populate_cpu_dt(CPUState *cs, void *fdt, int offset, } else /* env->mmu_model == POWERPC_MMU_2_07 */ { pa_features = pa_features_207; pa_size = sizeof(pa_features_207); +if (kvmppc_get_htm_support(env)) { +pa_features[24] |= 0x80; +} } if (env->ci_large_pages) { pa_features[3] |= 0x20; diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c index 884d564..9d13446 100644 --- a/target-ppc/kvm.c +++ b/target-ppc/kvm.c @@ -20,6 +20,7 @@ #include #include +#include "elf.h" #include "qemu-common.h" #include "qemu/error-report.h" @@ -1976,6 +1977,34 @@ uint32_t kvmppc_get_dfp(void) return kvmppc_read_int_cpu_dt("ibm,dfp"); } +bool kvmppc_get_htm_support(CPUPPCState *env) +{ +PowerPCCPU *cpu = ppc_env_get_cpu(env); +CPUState *cs = CPU(cpu); + +if (!kvm_enabled()) { +return false; +} +if (kvm_vm_check_extension(cs->kvm_state, KVM_CAP_PPC_HTM)) { +return true; +} +/* + * Fallback test for host kernels that don't yet support KVM_CAP_PPC_HTM. + * This will be unnecessary when KVM_API_VERSION is incremented (to 13 or + * above) because that will remove the ambiguity between the host kernel + * lacking support for KVM_CAP_PPC_HTM and it having support but reporting + * HTM as unavailable (both of which return 0, above). + */ +if (kvm_vm_check_extension(cs->kvm_state, KVM_CAP_PPC_GET_PVINFO)) { +/* Assume this means PR KVM, so no TM. */ +return false; +} else { +/* Assume this means HV KVM, propagate whatever host userspace sees. */ +unsigned long hwcap2 = qemu_getauxval(AT_HWCAP2); +return !!(hwcap2 & PPC_FEATURE2_HAS_HTM); +} +} + static int kvmppc_get_pvinfo(CPUPPCState *env, struct kvm_ppc_pvinfo *pvinfo) { PowerPCCPU *cpu = ppc_env_get_cpu(env); diff --git a/target-ppc/kvm_ppc.h b/target-ppc/kvm_ppc.h index 20bfb59..b01c717 100644 --- a/target-ppc/kvm_ppc.h +++ b/target-ppc/kvm_ppc.h @@ -17,6 +17,7 @@ uint32_t kvmppc_get_tbfreq(void); uint64_t kvmppc_get_clockfreq(void); uint32_t kvmppc_get_vmx(void); uint32_t kvmppc_get_dfp(void); +bool kvmppc_get_htm_support(CPUPPCState *env); bool kvmppc_get_host_model(char **buf); bool kvmppc_get_host_serial(char **buf); int kvmppc_get_hasidle(CPUPPCState *env); @@ -90,6 +91,11 @@ static inline uint32_t kvmppc_get_dfp(void) return 0; } +static inline bool kvmppc_get_htm_support(KVMState *kvm_state) +{ +return false; +} + static inline int kvmppc_get_hasidle(CPUPPCState *env) { return 0; -- 2.1.0 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 2/3] Add KVM_CAP_PPC_HTM to linux/kvm.h
On Tue, Jul 05, 2016 at 04:05:58PM +1000, David Gibson wrote: > On Tue, Jul 05, 2016 at 03:19:23PM +1000, Sam Bobroff wrote: > > Signed-off-by: Sam Bobroff <sam.bobr...@au1.ibm.com> > > Ok, so the usual procedure for updates to linux-headers is this: >1. Get the change merged on the kernel side > >2. Use scripts/update-linux-headers.sh to update the whole > linux-headers subtree to the new kernel version > >3. Submit the patch - make sure to include the commit id of the > kernel you updated to in the qemu commit message Thanks, Do you think we're at the point with this patch set that I should start progressing the kernel side change now? Cheers, Sam. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v2 2/3] Add KVM_CAP_PPC_HTM to linux/kvm.h
Signed-off-by: Sam Bobroff <sam.bobr...@au1.ibm.com> --- linux-headers/linux/kvm.h | 1 + 1 file changed, 1 insertion(+) diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h index e60e21b..37cb3e8 100644 --- a/linux-headers/linux/kvm.h +++ b/linux-headers/linux/kvm.h @@ -866,6 +866,7 @@ struct kvm_ppc_smmu_info { #define KVM_CAP_ARM_PMU_V3 126 #define KVM_CAP_VCPU_ATTRIBUTES 127 #define KVM_CAP_MAX_VCPU_ID 128 +#define KVM_CAP_PPC_HTM 129 #ifdef KVM_CAP_IRQ_ROUTING -- 2.1.0 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v2 0/3] Rework spapr: Better handling of ibm, pa-features TM bit
Here's version 2. Changes v1 -> v2: Patch 2/3: spapr: Set ibm, pa-features HTM from KVM_CAP_PPC_HTM * Improve readability of HTM bit set code. * Move the test for KVM into kvmppc_get_htm_support(). Sam Bobroff (3): spapr: Disable ibm, pa-features HTM bit Add KVM_CAP_PPC_HTM to linux/kvm.h spapr: Set ibm, pa-features HTM from KVM_CAP_PPC_HTM hw/ppc/spapr.c| 5 - linux-headers/linux/kvm.h | 1 + target-ppc/kvm.c | 29 + target-ppc/kvm_ppc.h | 6 ++ 4 files changed, 40 insertions(+), 1 deletion(-) -- 2.1.0 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v2 1/3] spapr: Disable ibm, pa-features HTM bit
There are a few issues with our handling of the ibm,pa-features HTM bit: - We don't support transactional memory in PR KVM, so don't tell the OS that we do. - In full emulation we have a minimal implementation of HTM that always fails, so for performance reasons lets not tell the OS that we support it either. - In HV KVM mode, we should mirror the host HTM enabled state by checking a KVM capability or looking at the AT_HWCAP2 bit. For now unconditionally disable it by removing HTM from the pa-features bits. It will be re-enabled in a subsequent patch specifically for HV KVM. Signed-off-by: Sam Bobroff <sam.bobr...@au1.ibm.com> --- hw/ppc/spapr.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index 78ebd9e..704aae7 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -635,7 +635,7 @@ static void spapr_populate_cpu_dt(CPUState *cs, void *fdt, int offset, 0xf6, 0x1f, 0xc7, 0xc0, 0x80, 0xf0, 0x80, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x80, 0x00, -0x80, 0x00, 0x80, 0x00, 0x80, 0x00 }; +0x80, 0x00, 0x80, 0x00, 0x00, 0x00 }; uint8_t *pa_features; size_t pa_size; -- 2.1.0 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 1/1] KVM: PPC: Introduce KVM_CAP_PPC_HTM
Introduce a new KVM capability, KVM_CAP_PPC_HTM, that can be queried to determine if a PowerPC KVM guest should use HTM (Hardware Transactional Memory). This will be used by QEMU to populate the pa-features bits in the guest's device tree. Signed-off-by: Sam Bobroff <sam.bobr...@au1.ibm.com> --- arch/powerpc/kvm/powerpc.c | 4 include/uapi/linux/kvm.h | 1 + 2 files changed, 5 insertions(+) diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index 02416fe..06d79bc 100644 --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -588,6 +588,10 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) r = 1; break; #endif + case KVM_CAP_PPC_HTM: + r = cpu_has_feature(CPU_FTR_TM) + && is_kvmppc_hv_enabled(kvm); + break; default: r = 0; break; diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 05ebf47..f421d0e 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -866,6 +866,7 @@ struct kvm_ppc_smmu_info { #define KVM_CAP_ARM_PMU_V3 126 #define KVM_CAP_VCPU_ATTRIBUTES 127 #define KVM_CAP_MAX_VCPU_ID 128 +#define KVM_CAP_PPC_HTM 129 #ifdef KVM_CAP_IRQ_ROUTING -- 2.1.0 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 0/6] kvm: powerpc halt polling updates
On Fri, Oct 14, 2016 at 11:53:18AM +1100, Suraj Jitindar Singh wrote: > This patch series makes some updates and bug fixes to the powerpc kvm-hv > halt polling code. > > The first two patches are concerned with exporting the generic kvm module > parameter variables and accessing these from the powerpc specific code. > > The third patch fixes a bug where changing the global max halt polling > interval module parameter can sometimes have no effect. > > The fourth patch decreases the default global max halt polling interval > to something more sensible. > > The fifth patch contains generic fixups with no functional effect. > > The last patch adds halt polling documentation. > > Suraj Jitindar Singh (6): > kvm: export kvm module parameter variables > powerpc/kvm: Use generic kvm module parameters in kvm-hv > powerpc/kvm: Add check for module parameter halt_poll_ns > powerpc/kvm: Decrease the powerpc default halt poll max value > powerpc/kvm: Comment style and print format fixups > doc/kvm: Add halt polling documentation > > Documentation/virtual/kvm/00-INDEX | 2 + > Documentation/virtual/kvm/halt-polling.txt | 127 > + > arch/powerpc/include/asm/kvm_host.h| 2 +- > arch/powerpc/kvm/book3s_hv.c | 33 ++-- > arch/powerpc/kvm/trace_hv.h| 2 +- > include/linux/kvm_host.h | 4 + > virt/kvm/kvm_main.c| 9 +- > 7 files changed, 149 insertions(+), 30 deletions(-) > create mode 100644 Documentation/virtual/kvm/halt-polling.txt > > -- > 2.5.5 Hi Suraj, I've given this set a quick test and it seems to work fine. I used a repetitive wakeup, using a nanosleep loop in guest userspace (with real time prio), and I was able to cause halt polling to switch on and off as I adjusted halt_poll_ns. I think the new default value is much better: halt polling started (e.g. CPU utilization rose to 100%) once CPU utilization had already risen to about 75%. Cheers, Sam. Tested-by: Sam Bobroff <sam.bobr...@au1.ibm.com>
Re: [PATCH 01/12] powerpc: Disable HFSCR:TM if TM not supported
On Mon, Mar 20, 2017 at 05:49:03PM +1100, Benjamin Herrenschmidt wrote: > Otherwise KVM guests might mess with it even when told not > to causing bad thing interrupts in the host > > Signed-off-by: Benjamin HerrenschmidtI've tested this on a P8, with a kernel and QEMU close to their respective current master branches, and if: * the host is configured without CONFIG_PPC_TRANSACTIONAL_MEM, * and the guest is configured with CONFIG_PPC_TRANSACTIONAL_MEM, * and the guest runs a program that uses HTM (in my tests, just a loop doing some floating point multiplies in a transaction)... Without the patch the host will OOPS, usually in __kvmppc_vcore_entry, and kill QEMU. On a busy host this is sometimes followed by "Oops: Bad kernel stack pointer, sig: 6" and the host dies. With the patch the userspace test program is killed with a SIGILL. The guest and host are unaffected. Cheers, Sam. > --- > arch/powerpc/kernel/setup_64.c | 10 ++ > 1 file changed, 10 insertions(+) > > diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c > index 9cfaa8b..b372b23 100644 > --- a/arch/powerpc/kernel/setup_64.c > +++ b/arch/powerpc/kernel/setup_64.c > @@ -236,6 +236,16 @@ static void cpu_ready_for_interrupts(void) > mtspr(SPRN_LPCR, lpcr | LPCR_AIL_3); > } > > + /* > + * Fixup HFSCR:TM based on CPU features. The bit is set by our > + * early asm init because at that point we haven't updated our > + * CPU features from firmware and device-tree. Here we have, > + * so let's do it > + */ > + if (early_cpu_has_feature(CPU_FTR_HVMODE) && > + !early_cpu_has_feature(CPU_FTR_TM_COMP)) > + mtspr(SPRN_HFSCR, mfspr(SPRN_HFSCR) & ~HFSCR_TM); > + > /* Set IR and DR in PACA MSR */ > get_paca()->kernel_msr = MSR_KERNEL; > } > -- > 2.9.3
[PATCH 1/1] selftests/powerpc: Improve tm-resched-dscr
The tm-resched-dscr self test can, in some situations, run for several minutes before being successfully interrupted by the context switch it needs in order to perform the test. This often seems to occur when the test is being run in a virtual machine. Improve the test by running it under eat_cpu() to guarantee contention for the CPU and increase the chance of a context switch. In practice this seems to reduce the test time, in some cases, from more than two minutes to under a second. Also remove the "progress dots" so that if the test does run for a long time, it doesn't produce large amounts of unnecessary output. Signed-off-by: Sam Bobroff <sam.bobr...@au1.ibm.com> --- tools/testing/selftests/powerpc/tm/Makefile | 1 + tools/testing/selftests/powerpc/tm/tm-resched-dscr.c | 12 2 files changed, 9 insertions(+), 4 deletions(-) diff --git a/tools/testing/selftests/powerpc/tm/Makefile b/tools/testing/selftests/powerpc/tm/Makefile index 958c11c14acd..7bfcd454fb2a 100644 --- a/tools/testing/selftests/powerpc/tm/Makefile +++ b/tools/testing/selftests/powerpc/tm/Makefile @@ -15,6 +15,7 @@ $(OUTPUT)/tm-syscall: tm-syscall-asm.S $(OUTPUT)/tm-syscall: CFLAGS += -I../../../../../usr/include $(OUTPUT)/tm-tmspr: CFLAGS += -pthread $(OUTPUT)/tm-vmx-unavail: CFLAGS += -pthread -m64 +$(OUTPUT)/tm-resched-dscr: ../pmu/lib.o SIGNAL_CONTEXT_CHK_TESTS := $(patsubst %,$(OUTPUT)/%,$(SIGNAL_CONTEXT_CHK_TESTS)) $(SIGNAL_CONTEXT_CHK_TESTS): tm-signal.S diff --git a/tools/testing/selftests/powerpc/tm/tm-resched-dscr.c b/tools/testing/selftests/powerpc/tm/tm-resched-dscr.c index e79ccd6aada1..a7ac2e4c60d9 100644 --- a/tools/testing/selftests/powerpc/tm/tm-resched-dscr.c +++ b/tools/testing/selftests/powerpc/tm/tm-resched-dscr.c @@ -30,6 +30,7 @@ #include "utils.h" #include "tm.h" +#include "../pmu/lib.h" #define SPRN_DSCR 0x03 @@ -75,8 +76,6 @@ int test_body(void) ); assert(rv); /* make sure the transaction aborted */ if ((texasr >> 56) != TM_CAUSE_RESCHED) { - putchar('.'); - fflush(stdout); continue; } if (dscr2 != dscr1) { @@ -89,7 +88,12 @@ int test_body(void) } } -int main(void) +static int tm_resched_dscr(void) { - return test_harness(test_body, "tm_resched_dscr"); + return eat_cpu(test_body); +} + +int main(int argc, const char *argv[]) +{ + return test_harness(tm_resched_dscr, "tm_resched_dscr"); } -- 2.14.1.2.g4274c698f
[PATCH 1/1] KVM: PPC: Book3S: Fix server always zero from kvmppc_xive_get_xive()
In KVM's XICS-on-XIVE emulation, kvmppc_xive_get_xive() returns the value of state->guest_server as "server". However, this value is not set by it's counterpart kvmppc_xive_set_xive(). When the guest uses this interface to migrate interrupts away from a CPU that is going offline, it sees all interrupts as belonging to CPU 0, so they are left assigned to (now) offline CPUs. This patch removes the guest_server field from the state, and returns act_server in it's place (that is, the CPU actually handling the interrupt, which may differ from the one requested). Fixes: 5af50993850a ("KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller") Cc: sta...@vger.kernel.org Signed-off-by: Sam Bobroff <sam.bobr...@au1.ibm.com> --- The other obvious way to patch this would be to set state->guest_server in kvmppc_xive_set_xive() and that does also work because act_server is usually equal to guest_server. However, in the cases where guest_server differed from act_server, the guest would only move IRQs correctly if it got act_server (the CPU actually handling the interrupt) here. So, that approach seemed better. arch/powerpc/kvm/book3s_xive.c | 5 ++--- arch/powerpc/kvm/book3s_xive.h | 1 - 2 files changed, 2 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c index 13304622ab1c..bf457843e032 100644 --- a/arch/powerpc/kvm/book3s_xive.c +++ b/arch/powerpc/kvm/book3s_xive.c @@ -622,7 +622,7 @@ int kvmppc_xive_get_xive(struct kvm *kvm, u32 irq, u32 *server, return -EINVAL; state = >irq_state[idx]; arch_spin_lock(>lock); - *server = state->guest_server; + *server = state->act_server; *priority = state->guest_priority; arch_spin_unlock(>lock); @@ -1331,7 +1331,7 @@ static int xive_get_source(struct kvmppc_xive *xive, long irq, u64 addr) xive->saved_src_count++; /* Convert saved state into something compatible with xics */ - val = state->guest_server; + val = state->act_server; prio = state->saved_scan_prio; if (prio == MASKED) { @@ -1507,7 +1507,6 @@ static int xive_set_source(struct kvmppc_xive *xive, long irq, u64 addr) /* First convert prio and mark interrupt as untargetted */ act_prio = xive_prio_from_guest(guest_prio); state->act_priority = MASKED; - state->guest_server = server; /* * We need to drop the lock due to the mutex below. Hopefully diff --git a/arch/powerpc/kvm/book3s_xive.h b/arch/powerpc/kvm/book3s_xive.h index 5938f7644dc1..6ba63f8e8a61 100644 --- a/arch/powerpc/kvm/book3s_xive.h +++ b/arch/powerpc/kvm/book3s_xive.h @@ -35,7 +35,6 @@ struct kvmppc_xive_irq_state { struct xive_irq_data *pt_data; /* XIVE Pass-through associated data */ /* Targetting as set by guest */ - u32 guest_server; /* Current guest selected target */ u8 guest_priority; /* Guest set priority */ u8 saved_priority; /* Saved priority when masking */ -- 2.14.1.4.g334a7be4f
Re: [PATCH] powerpc/kernel: Change retrieval of pci_dn
On Mon, Aug 28, 2017 at 11:05:03AM -0500, Bryant G. Ly wrote: > For a PCI device it's pci_dn can be retrieved from > pdev->dev.archdata.firmware_data, PCI_DN(devnode), or parent's list. > Thus, we should just use the generic function pci_get_pdn_by_devfn > to get the pci_dn. > > Signed-off-by: Bryant G. Ly <b...@us.ibm.com> Reviewed-by: Sam Bobroff <sam.bobr...@au1.ibm.com> I don't know this area but I tested it using a patched kernel with the old and new code together. My test kernel booted fine (in QEMU+KVM) and I saw 26 reads and 4 writes, all of which got the same value with either code block. I also checked that the error result in the "not found" case is the same as well, which it is, because rtas_{read,write}_config() will return PCIBIOS_DEVICE_NOT_FOUND if given a NULL pdn. So, looks good to me. Cheers, Sam. > --- > arch/powerpc/kernel/rtas_pci.c | 30 ++ > 1 file changed, 2 insertions(+), 28 deletions(-) > > diff --git a/arch/powerpc/kernel/rtas_pci.c b/arch/powerpc/kernel/rtas_pci.c > index 73f1934..c21e6c5 100644 > --- a/arch/powerpc/kernel/rtas_pci.c > +++ b/arch/powerpc/kernel/rtas_pci.c > @@ -91,25 +91,13 @@ static int rtas_pci_read_config(struct pci_bus *bus, > unsigned int devfn, > int where, int size, u32 *val) > { > - struct device_node *busdn, *dn; > struct pci_dn *pdn; > - bool found = false; > int ret; > > /* Search only direct children of the bus */ > *val = 0x; > - busdn = pci_bus_to_OF_node(bus); > - for (dn = busdn->child; dn; dn = dn->sibling) { > - pdn = PCI_DN(dn); > - if (pdn && pdn->devfn == devfn > - && of_device_is_available(dn)) { > - found = true; > - break; > - } > - } > > - if (!found) > - return PCIBIOS_DEVICE_NOT_FOUND; > + pdn = pci_get_pdn_by_devfn(bus, devfn); > > ret = rtas_read_config(pdn, where, size, val); > if (*val == EEH_IO_ERROR_VALUE(size) && > @@ -153,23 +141,9 @@ static int rtas_pci_write_config(struct pci_bus *bus, >unsigned int devfn, >int where, int size, u32 val) > { > - struct device_node *busdn, *dn; > struct pci_dn *pdn; > - bool found = false; > - > - /* Search only direct children of the bus */ > - busdn = pci_bus_to_OF_node(bus); > - for (dn = busdn->child; dn; dn = dn->sibling) { > - pdn = PCI_DN(dn); > - if (pdn && pdn->devfn == devfn > - && of_device_is_available(dn)) { > - found = true; > - break; > - } > - } > > - if (!found) > - return PCIBIOS_DEVICE_NOT_FOUND; > + pdn = pci_get_pdn_by_devfn(bus, devfn); > > return rtas_write_config(pdn, where, size, val); > } > -- > 2.5.4 (Apple Git-61)
Re: [PATCH 13/13] powerpc/eeh: Refactor report functions
On Thu, May 03, 2018 at 11:27:12PM +1000, Michael Ellerman wrote: > Sam Bobroff <sbobr...@linux.ibm.com> writes: > > diff --git a/arch/powerpc/kernel/eeh_driver.c > > b/arch/powerpc/kernel/eeh_driver.c > > index eb4feee81ff4..1c4336dcf9f5 100644 > > --- a/arch/powerpc/kernel/eeh_driver.c > > +++ b/arch/powerpc/kernel/eeh_driver.c > > @@ -54,6 +54,25 @@ static int eeh_result_priority(enum pci_ers_result > > result) > > } > > }; > > > > +const char *pci_ers_result_name(enum pci_ers_result r) > > +{ > > + switch (r) { > > + case PCI_ERS_RESULT_NONE: return "none"; > > + case PCI_ERS_RESULT_CAN_RECOVER: return "can recover"; > > + case PCI_ERS_RESULT_NEED_RESET: return "need reset"; > > + case PCI_ERS_RESULT_DISCONNECT: return "disconnect"; > > + case PCI_ERS_RESULT_RECOVERED: return "recovered"; > > + case PCI_ERS_RESULT_NO_AER_DRIVER: return "no AER driver"; > > + default: > > + WARN_ONCE(1, "Unknown result type"); > > + return "unknown"; > > + } > > +}; > > + > > +#define eeh_infoline(EDEV, FMT, ...) \ > > +pr_info("EEH: PE#%x (PCI %s): " pr_fmt(FMT) "\n", EDEV->pe_config_addr, \ > > +((EDEV->pdev) ? dev_name(>pdev->dev) : "NONE"), ##__VA_ARGS__) > > Ooof, I guess. > > It would be nicer as a function. OK (I'd prefer to avoid macros too), but I'm not sure what kind of function you mean. I initially tried to use a function, but the existing pr_*() type macros seemed to be macros all the way down to printk, so there didn't seem to be anywhere to hook into, and I ended up having to open-code the varargs and allocate a buffer to print into. Then it can overflow etc. and it ended up seeming worse. Is there a better way I'm missing here? > "infoline" annoys me for some reason, "eeh_info" ? OK. How about eeh_edev_info(), to indicate that it acts on an 'edev'? > > @@ -223,123 +242,118 @@ static void eeh_set_irq_state(struct eeh_pe *root, > > bool enable) > > } > > } > > > > +static void eeh_pe_report(const char *name, struct eeh_pe *root, > > + enum pci_ers_result (*fn)(struct eeh_dev *, > > + struct pci_driver *), > > + enum pci_ers_result *result) > > +{ > > + struct eeh_pe *pe; > > + struct eeh_dev *edev, *tmp; > > + enum pci_ers_result new_result; > > + > > + pr_info("EEH: Beginning: '%s'\n", name); > > + eeh_for_each_pe(root, pe) { > > + eeh_pe_for_each_dev(pe, edev, tmp) { > > This should be in a separate function. Oh, that's better. Will do! > > + device_lock(>pdev->dev); > > + if (eeh_edev_actionable(edev)) { > > + struct pci_driver *driver; > > + > > + driver = eeh_pcid_get(edev->pdev); > > + > > + if (!driver) > > + eeh_infoline(edev, "no driver"); > > + else if (!driver->err_handler) > > + eeh_infoline(edev, > > +"driver not EEH aware"); > > + else if (edev->mode & EEH_DEV_NO_HANDLER) > > + eeh_infoline(edev, > > +"driver bound too late"); > > + else { > > + new_result = fn(edev, driver); > > + eeh_infoline(edev, > > + "%s driver reports: '%s'", > > + driver->name, > > + > > pci_ers_result_name(new_result)); > > + if (result) > > + *result = merge_result(*result, > > + > > new_result); > > + } > > + if (driver) > > + eeh_pcid_put(edev->pdev); > > + } else { > > + eeh_infoline(edev, "not actionable (%d,%d,%d)", > > +!!edev->pdev, > > +!eeh_dev_removed(edev), > > +!eeh_pe_passed(edev->pe)); > > + } > > + device_unlock(>pdev->dev); > > ^^^ > > > cheers Thanks! signature.asc Description: PGP signature
Re: [PATCH 03/13] powerpc/eeh: Fix use-after-release of EEH driver
On Fri, May 04, 2018 at 12:56:55PM +1000, Michael Ellerman wrote: > Sam Bobroff <sbobr...@linux.ibm.com> writes: > > > Correct two cases where eeh_pcid_get() is used to reference the driver's > > module but the reference is dropped before the driver pointer is used. > > > > In eeh_rmv_device() also refactor a little so that only two calls to > > eeh_pcid_put() are needed, rather than three and the reference isn't > > taken at all if it wasn't needed. > > This sounds like a crash or memory corruption bug? > > But it doesn't have Fixes or Cc: stable. Is it not a major problem for > some reason? Only that I've exercised that code path a fair bit during testing and never managed to cause a problem with it. I found it by inspection. Do you think I should mark it fixes or stable in the next version? > cheers > signature.asc Description: PGP signature
Re: [PATCH 02/13] powerpc/eeh: Add final message for successful recovery
On Fri, May 04, 2018 at 12:55:42PM +1000, Michael Ellerman wrote: > Sam Bobroff <sbobr...@linux.ibm.com> writes: > > > Add a single log line at the end of successful EEH recovery, so that > > it's clear that event processing has finished. > > > > Signed-off-by: Sam Bobroff <sbobr...@linux.ibm.com> > > --- > > arch/powerpc/kernel/eeh_driver.c | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/arch/powerpc/kernel/eeh_driver.c > > b/arch/powerpc/kernel/eeh_driver.c > > index 56a60b9eb397..07e0a42035ce 100644 > > --- a/arch/powerpc/kernel/eeh_driver.c > > +++ b/arch/powerpc/kernel/eeh_driver.c > > @@ -910,6 +910,7 @@ void eeh_handle_normal_event(struct eeh_pe *pe) > > pr_info("EEH: Notify device driver to resume\n"); > > eeh_pe_dev_traverse(pe, eeh_report_resume, NULL); > > > > + pr_info("EEH: Recovery successful.\n"); > > Is it possible for recovery for multiple devices to be interleaved? > > Should that message include the device? No, all the calls are serialized. They're only called from eeh_event_handler() which is in the main loop of a kernel thread (see eeh_event_init()). > cheers > signature.asc Description: PGP signature
Re: [PATCH 02/13] powerpc/eeh: Add final message for successful recovery
On Fri, May 04, 2018 at 04:08:28PM +1000, Russell Currey wrote: > On Fri, 2018-05-04 at 12:55 +1000, Michael Ellerman wrote: > > Sam Bobroff <sbobr...@linux.ibm.com> writes: > > > > > Add a single log line at the end of successful EEH recovery, so > > > that > > > it's clear that event processing has finished. > > > > > > Signed-off-by: Sam Bobroff <sbobr...@linux.ibm.com> > > > --- > > > arch/powerpc/kernel/eeh_driver.c | 1 + > > > 1 file changed, 1 insertion(+) > > > > > > diff --git a/arch/powerpc/kernel/eeh_driver.c > > > b/arch/powerpc/kernel/eeh_driver.c > > > index 56a60b9eb397..07e0a42035ce 100644 > > > --- a/arch/powerpc/kernel/eeh_driver.c > > > +++ b/arch/powerpc/kernel/eeh_driver.c > > > @@ -910,6 +910,7 @@ void eeh_handle_normal_event(struct eeh_pe *pe) > > > pr_info("EEH: Notify device driver to resume\n"); > > > eeh_pe_dev_traverse(pe, eeh_report_resume, NULL); > > > > > > + pr_info("EEH: Recovery successful.\n"); > > Is it possible for recovery for multiple devices to be interleaved? > > > > Should that message include the device? > > Pretty sure EEH will only process a single error at a time so this > *should* always let you infer from context, but PHB and PE should > probably be included anyway. It'd be cool to move pe_{err/warn/info}() > out of powernv for messages like this. While we only process a single "event" at a time, that event covers the initial PE, or possibly it's parent or PHB, and any dependent PEs so I thought it could be misleading to show only a single one for that message -- I wanted something to indicate that the whole "event" had finished (since that was otherwise not really clear to me). Does that seem reasonable? I'm sorry but I don't understand what you mean by moving some functions out of powernv, could you explain more? > - Russell > > > > > cheers > signature.asc Description: PGP signature
Re: [PATCH 11/13] powerpc/eeh: Introduce eeh_set_irq_state()
On Fri, May 04, 2018 at 01:02:32PM +1000, Michael Ellerman wrote: > Sam Bobroff <sbobr...@linux.ibm.com> writes: > > > diff --git a/arch/powerpc/kernel/eeh_driver.c > > b/arch/powerpc/kernel/eeh_driver.c > > index f63a01d336ee..b3edd0df04b8 100644 > > --- a/arch/powerpc/kernel/eeh_driver.c > > +++ b/arch/powerpc/kernel/eeh_driver.c > > @@ -210,6 +206,23 @@ static void eeh_set_channel_state(struct eeh_pe *root, > > enum pci_channel_state s) > > edev->pdev->error_state = s; > > } > > > > +static void eeh_set_irq_state(struct eeh_pe *root, bool enable) > > +{ > > + struct eeh_pe *pe; > > + struct eeh_dev *edev, *tmp; > > + > > + eeh_for_each_pe(root, pe) > > + eeh_pe_for_each_dev(pe, edev, tmp) > > + if (eeh_edev_actionable(edev)) > > + if (eeh_pcid_get(edev->pdev)) { > > + if (enable) > > + eeh_enable_irq(edev); > > + else > > + eeh_disable_irq(edev); > > + eeh_pcid_put(edev->pdev); > > + } > > Yikes. > > What about? > > eeh_for_each_pe(root, pe) { > eeh_pe_for_each_dev(pe, edev, tmp) { > if (!eeh_edev_actionable(edev)) > continue; > > if (!eeh_pcid_get(edev->pdev)) > continue; > > if (enable) > eeh_enable_irq(edev); > else > eeh_disable_irq(edev); > > eeh_pcid_put(edev->pdev); > } > } > > cheers Sure, will do. Cheers, signature.asc Description: PGP signature
Re: [PATCH 07/13] powerpc/eeh: Clean up pci_ers_result handling
On Fri, May 04, 2018 at 04:58:01PM +1000, Russell Currey wrote: > On Wed, 2018-05-02 at 16:36 +1000, Sam Bobroff wrote: > > As EEH event handling progresses, a cumulative result of type > > pci_ers_result is built up by (some of) the eeh_report_*() functions > > using either: > > if (rc == PCI_ERS_RESULT_NEED_RESET) *res = rc; > > if (*res == PCI_ERS_RESULT_NONE) *res = rc; > > or: > > if ((*res == PCI_ERS_RESULT_NONE) || > > (*res == PCI_ERS_RESULT_RECOVERED)) *res = rc; > > if (*res == PCI_ERS_RESULT_DISCONNECT && > > rc == PCI_ERS_RESULT_NEED_RESET) *res = rc; > > (Where *res is the accumulator.) > > > > However, the intent is not immediately clear and the result in some > > situations is order dependent. > > > > Address this by assigning a priority to each result value, and always > > merging to the highest priority. This renders the intent clear, and > > provides a stable value for all orderings. > > > > Signed-off-by: Sam Bobroff <sbobr...@linux.ibm.com> > > --- > > arch/powerpc/kernel/eeh_driver.c | 36 ++-- > > > > 1 file changed, 26 insertions(+), 10 deletions(-) > > > > diff --git a/arch/powerpc/kernel/eeh_driver.c > > b/arch/powerpc/kernel/eeh_driver.c > > index 188d15c4fe3a..f33dd68a9ca2 100644 > > --- a/arch/powerpc/kernel/eeh_driver.c > > +++ b/arch/powerpc/kernel/eeh_driver.c > > @@ -39,6 +39,29 @@ struct eeh_rmv_data { > > int removed; > > }; > > > > +static int eeh_result_priority(enum pci_ers_result result) > > +{ > > + switch (result) { > > + case PCI_ERS_RESULT_NONE: return 0; > > + case PCI_ERS_RESULT_NO_AER_DRIVER: return 1; > > + case PCI_ERS_RESULT_RECOVERED: return 2; > > + case PCI_ERS_RESULT_CAN_RECOVER: return 3; > > + case PCI_ERS_RESULT_DISCONNECT: return 4; > > + case PCI_ERS_RESULT_NEED_RESET: return 5; > > + default: > > + WARN_ONCE(1, "Unknown pci_ers_result value"); > > Missing newline and also would be good to print the value was Sounds good, will do! I'm not sure if it's mentioned elsewhere but I'll fix the same issues in pci_ers_result_name() as well. > > + return 0; > > + } > > +}; > > + > > +static enum pci_ers_result merge_result(enum pci_ers_result old, > > + enum pci_ers_result new) > > merge_result() sounds like something really generic, maybe call it > pci_ers_merge_result() or something? Sounds good, will do. > Note: just learned that it stands for Error Recovery System and that's > a thing (?) > > > +{ > > + if (eeh_result_priority(new) > eeh_result_priority(old)) > > + return new; > > + return old; > > MAX would be nicer as per mpe's suggestion Sorry, I don't understand. The return value isn't MAX(new, old) so how can MAX() do this? > > +} > > + > > /** > > * eeh_pcid_get - Get the PCI device driver > > * @pdev: PCI device > > @@ -206,9 +229,7 @@ static void *eeh_report_error(struct eeh_dev > > *edev, void *userdata) > > > > rc = driver->err_handler->error_detected(dev, > > pci_channel_io_frozen); > > > > - /* A driver that needs a reset trumps all others */ > > - if (rc == PCI_ERS_RESULT_NEED_RESET) *res = rc; > > - if (*res == PCI_ERS_RESULT_NONE) *res = rc; > > + *res = merge_result(*res, rc); > > > > edev->in_error = true; > > pci_uevent_ers(dev, PCI_ERS_RESULT_NONE); > > @@ -249,9 +270,7 @@ static void *eeh_report_mmio_enabled(struct > > eeh_dev *edev, void *userdata) > > > > rc = driver->err_handler->mmio_enabled(dev); > > > > - /* A driver that needs a reset trumps all others */ > > - if (rc == PCI_ERS_RESULT_NEED_RESET) *res = rc; > > - if (*res == PCI_ERS_RESULT_NONE) *res = rc; > > + *res = merge_result(*res, rc); > > > > out: > > eeh_pcid_put(dev); > > @@ -294,10 +313,7 @@ static void *eeh_report_reset(struct eeh_dev > > *edev, void *userdata) > > goto out; > > > > rc = driver->err_handler->slot_reset(dev); > > - if ((*res == PCI_ERS_RESULT_NONE) || > > - (*res == PCI_ERS_RESULT_RECOVERED)) *res = rc; > > - if (*res == PCI_ERS_RESULT_DISCONNECT && > > -rc == PCI_ERS_RESULT_NEED_RESET) *res = rc; > > + *res = merge_result(*res, rc); > > > > out: > > eeh_pcid_put(dev); > signature.asc Description: PGP signature
[PATCH v2 1/1] powerpc/pseries: fix EEH recovery of some IOV devices
EEH recovery currently fails on pSeries for some IOV capable PCI devices, if CONFIG_PCI_IOV is on and the hypervisor doesn't provide certain device tree properties for the device. (Found on an IOV capable device using the ipr driver.) Recovery fails in pci_enable_resources() at the check on r->parent, because r->flags is set and r->parent is not. This state is due to sriov_init() setting the start, end and flags members of the IOV BARs but the parent not being set later in pseries_pci_fixup_iov_resources(), because the "ibm,open-sriov-vf-bar-info" property is missing. Correct this by zeroing the resource flags for IOV BARs when they can't be configured. Signed-off-by: Sam Bobroff <sbobr...@linux.ibm.com> --- Hi, This is a fix to allow EEH recovery to succeed in a specific situation, which I've tried to explain in the commit message. As with the RFC version, the IOV BARs are disabled by setting the resource flags to 0 but the other fields are now left as-is because that is what is done elsewhere (see sriov_init() and __pci_read_base()). I've also examined the concern raised by Bjorn Helgaas, that VFs could be enabled later after the BARs are disabled, and it already seems safe: enabling VFs (on pseries) depends on another device tree property, "ibm,number-of-configurable-vfs" as well as support for the RTAS function "ibm_map_pes". Since these are all part of the hypervisor's support for IOV it seems unlikely that we would ever see some of them but not all. (None are currently provided by QEMU/KVM.) (Additionally, the ipr driver on which the EEH recovery failure was discovered doesn't even seem to have SR-IOV support so it certainly can't enable VFs.) Cheers, Sam. == v1 -> v2: == Patch 1/1: powerpc/pseries: fix EEH recovery of some IOV devices * Moved the BAR disabling code to a function. * Also check in pseries_pci_fixup_resources(). == v1: == Patch 1/1: powerpc/pseries: fix EEH recovery of IOV devices arch/powerpc/platforms/pseries/setup.c | 25 + 1 file changed, 17 insertions(+), 8 deletions(-) diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c index b55ad4286dc7..0a9e4243ae1d 100644 --- a/arch/powerpc/platforms/pseries/setup.c +++ b/arch/powerpc/platforms/pseries/setup.c @@ -645,6 +645,15 @@ void of_pci_parse_iov_addrs(struct pci_dev *dev, const int *indexes) } } +static void pseries_disable_sriov_resources(struct pci_dev *pdev) +{ + int i; + + pci_warn(pdev, "No hypervisor support for SR-IOV on this device, IOV BARs disabled.\n"); + for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) + pdev->resource[i + PCI_IOV_RESOURCES].flags = 0; +} + static void pseries_pci_fixup_resources(struct pci_dev *pdev) { const int *indexes; @@ -652,10 +661,10 @@ static void pseries_pci_fixup_resources(struct pci_dev *pdev) /*Firmware must support open sriov otherwise dont configure*/ indexes = of_get_property(dn, "ibm,open-sriov-vf-bar-info", NULL); - if (!indexes) - return; - /* Assign the addresses from device tree*/ - of_pci_set_vf_bar_size(pdev, indexes); + if (indexes) + of_pci_set_vf_bar_size(pdev, indexes); + else + pseries_disable_sriov_resources(pdev); } static void pseries_pci_fixup_iov_resources(struct pci_dev *pdev) @@ -667,10 +676,10 @@ static void pseries_pci_fixup_iov_resources(struct pci_dev *pdev) return; /*Firmware must support open sriov otherwise dont configure*/ indexes = of_get_property(dn, "ibm,open-sriov-vf-bar-info", NULL); - if (!indexes) - return; - /* Assign the addresses from device tree*/ - of_pci_parse_iov_addrs(pdev, indexes); + if (indexes) + of_pci_parse_iov_addrs(pdev, indexes); + else + pseries_disable_sriov_resources(pdev); } static resource_size_t pseries_pci_iov_resource_alignment(struct pci_dev *pdev, -- 2.16.1.74.g9b0b1f47b
[PATCH v2 03/13] powerpc/eeh: Fix use-after-release of EEH driver
Correct two cases where eeh_pcid_get() is used to reference the driver's module but the reference is dropped before the driver pointer is used. In eeh_rmv_device() also refactor a little so that only two calls to eeh_pcid_put() are needed, rather than three and the reference isn't taken at all if it wasn't needed. Signed-off-by: Sam Bobroff <sbobr...@linux.ibm.com> --- arch/powerpc/kernel/eeh_driver.c | 28 1 file changed, 16 insertions(+), 12 deletions(-) diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c index 07e0a42035ce..54333f6c9d67 100644 --- a/arch/powerpc/kernel/eeh_driver.c +++ b/arch/powerpc/kernel/eeh_driver.c @@ -458,9 +458,11 @@ static void *eeh_add_virt_device(void *data, void *userdata) driver = eeh_pcid_get(dev); if (driver) { - eeh_pcid_put(dev); - if (driver->err_handler) + if (driver->err_handler) { + eeh_pcid_put(dev); return NULL; + } + eeh_pcid_put(dev); } #ifdef CONFIG_PCI_IOV @@ -497,17 +499,19 @@ static void *eeh_rmv_device(void *data, void *userdata) if (eeh_dev_removed(edev)) return NULL; - driver = eeh_pcid_get(dev); - if (driver) { - eeh_pcid_put(dev); - if (removed && - eeh_pe_passed(edev->pe)) - return NULL; - if (removed && - driver->err_handler && - driver->err_handler->error_detected && - driver->err_handler->slot_reset) + if (removed) { + if (eeh_pe_passed(edev->pe)) return NULL; + driver = eeh_pcid_get(dev); + if (driver) { + if (driver->err_handler && + driver->err_handler->error_detected && + driver->err_handler->slot_reset) { + eeh_pcid_put(dev); + return NULL; + } + eeh_pcid_put(dev); + } } /* Remove it from PCI subsystem */ -- 2.16.1.74.g9b0b1f47b
[PATCH v2 01/13] powerpc/eeh: Add eeh_max_freezes to initial EEH log line
The current failure message includes the number of failures that have occurred in the last hour (for a device) but it does not indicate how many failures will be tolerated before the device is permanently disabled. Include the limit (eeh_max_freezes) to make this less surprising when it happens. Also remove the embedded newline from the existing message to make it easier to grep for. Signed-off-by: Sam Bobroff <sbobr...@linux.ibm.com> --- arch/powerpc/kernel/eeh_driver.c | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c index b8a329f04814..56a60b9eb397 100644 --- a/arch/powerpc/kernel/eeh_driver.c +++ b/arch/powerpc/kernel/eeh_driver.c @@ -778,14 +778,13 @@ void eeh_handle_normal_event(struct eeh_pe *pe) eeh_pe_update_time_stamp(pe); pe->freeze_count++; if (pe->freeze_count > eeh_max_freezes) { - pr_err("EEH: PHB#%x-PE#%x has failed %d times in the\n" - "last hour and has been permanently disabled.\n", + pr_err("EEH: PHB#%x-PE#%x has failed %d times in the last hour and has been permanently disabled.\n", pe->phb->global_number, pe->addr, pe->freeze_count); goto hard_fail; } - pr_warn("EEH: This PCI device has failed %d times in the last hour\n", - pe->freeze_count); + pr_warn("EEH: This PCI device has failed %d times in the last hour and will be permanently disabled after %d failures.\n", + pe->freeze_count, eeh_max_freezes); /* Walk the various device drivers attached to this slot through * a reset sequence, giving each an opportunity to do what it needs -- 2.16.1.74.g9b0b1f47b
[PATCH v2 00/13] EEH refactoring 2
Hello everyone, Here is a second, somewhat deeper, set of cleanups for the EEH code (mostly eeh_drver.c). These changes are not intended to significantly alter the actual processing, but rather to improve the readability and maintainability of the code. They are subjective by nature so I would appreciate comments and suggestions. The earlier changes are mostly to support the last patch, where we're finally able to use a common infrastructure for the reporting functions (basically wrappers around the driver's handlers). This allows removal of a fair bit of code, and the easy addition of some useful messaging which should make future maintenance easier (as an example, a recent fix in this area "powerpc/eeh: Fix race with driver un/bind" would have required adding two lines rather than 42+/26-). Cheers, Sam. Patch set changelog follows: == v1 -> v2: == Patch 1/13: powerpc/eeh: Add eeh_max_freezes to initial EEH log line Patch 2/13: powerpc/eeh: Add final message for successful recovery Patch 3/13: powerpc/eeh: Fix use-after-release of EEH driver Patch 4/13: powerpc/eeh: Remove unused eeh_pcid_name() Patch 5/13: powerpc/eeh: Strengthen types of eeh traversal functions Patch 6/13: powerpc/eeh: Add message when PE processing at parent Patch 7/13: powerpc/eeh: Clean up pci_ers_result handling * Added the value, and missing newline, to some WARN()s. * Improved name of merge_result() to pci_ers_merge_result(). * Adjusted the result priorities so that unknown doesn't overlap with _NONE. Patch 8/13: powerpc/eeh: Introduce eeh_for_each_pe() Patch 9/13: powerpc/eeh: Introduce eeh_edev_actionable() Patch 10/13: powerpc/eeh: Introduce eeh_set_channel_state() Patch 11/13: powerpc/eeh: Introduce eeh_set_irq_state() * Reorganised eeh_set_irq_state() to reduce nesting depth. Patch 12/13: powerpc/eeh: Cleaner handling of EEH_DEV_NO_HANDLER Patch 13/13: powerpc/eeh: Refactor report functions * Better name for eeh_infoline() and implement using a function. * Move the core of eeh_pe_report() into a new function to improve readability. * pci_ers_result_name(): match parameter name to eeh_result_priority(), correct and improve warning message. == v1: == Patch 1/13: powerpc/eeh: Add eeh_max_freezes to initial EEH log line Patch 2/13: powerpc/eeh: Add final message for successful recovery Patch 3/13: powerpc/eeh: Fix use-after-release of EEH driver Patch 4/13: powerpc/eeh: Remove unused eeh_pcid_name() Patch 5/13: powerpc/eeh: Strengthen types of eeh traversal functions Patch 6/13: powerpc/eeh: Add message when PE processing at parent Patch 7/13: powerpc/eeh: Clean up pci_ers_result handling Patch 8/13: powerpc/eeh: Introduce eeh_for_each_pe() Patch 9/13: powerpc/eeh: Introduce eeh_edev_actionable() Patch 10/13: powerpc/eeh: Introduce eeh_set_channel_state() Patch 11/13: powerpc/eeh: Introduce eeh_set_irq_state() Patch 12/13: powerpc/eeh: Cleaner handling of EEH_DEV_NO_HANDLER Patch 13/13: powerpc/eeh: Refactor report functions Sam Bobroff (13): powerpc/eeh: Add eeh_max_freezes to initial EEH log line powerpc/eeh: Add final message for successful recovery powerpc/eeh: Fix use-after-release of EEH driver powerpc/eeh: Remove unused eeh_pcid_name() powerpc/eeh: Strengthen types of eeh traversal functions powerpc/eeh: Add message when PE processing at parent powerpc/eeh: Clean up pci_ers_result handling powerpc/eeh: Introduce eeh_for_each_pe() powerpc/eeh: Introduce eeh_edev_actionable() powerpc/eeh: Introduce eeh_set_channel_state() powerpc/eeh: Introduce eeh_set_irq_state() powerpc/eeh: Cleaner handling of EEH_DEV_NO_HANDLER powerpc/eeh: Refactor report functions arch/powerpc/include/asm/eeh.h | 11 +- arch/powerpc/kernel/eeh.c| 19 +- arch/powerpc/kernel/eeh_driver.c | 493 +-- arch/powerpc/kernel/eeh_pe.c | 26 +-- 4 files changed, 294 insertions(+), 255 deletions(-) -- 2.16.1.74.g9b0b1f47b
[PATCH v2 09/13] powerpc/eeh: Introduce eeh_edev_actionable()
The same test is done in every EEH report function, so factor it out. Since eeh_dev_removed() needs to be moved higher up in the file, simplify it a little while we're at it. Signed-off-by: Sam Bobroff <sbobr...@linux.ibm.com> --- arch/powerpc/kernel/eeh_driver.c | 30 -- 1 file changed, 16 insertions(+), 14 deletions(-) diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c index 2d3cac584899..30898866418b 100644 --- a/arch/powerpc/kernel/eeh_driver.c +++ b/arch/powerpc/kernel/eeh_driver.c @@ -62,6 +62,17 @@ static enum pci_ers_result pci_ers_merge_result(enum pci_ers_result old, return old; } +static bool eeh_dev_removed(struct eeh_dev *edev) +{ + return !edev || (edev->mode & EEH_DEV_REMOVED); +} + +static bool eeh_edev_actionable(struct eeh_dev *edev) +{ + return (edev->pdev && !eeh_dev_removed(edev) && + !eeh_pe_passed(edev->pe)); +} + /** * eeh_pcid_get - Get the PCI device driver * @pdev: PCI device @@ -163,15 +174,6 @@ static void eeh_enable_irq(struct pci_dev *dev) } } -static bool eeh_dev_removed(struct eeh_dev *edev) -{ - /* EEH device removed ? */ - if (!edev || (edev->mode & EEH_DEV_REMOVED)) - return true; - - return false; -} - static void *eeh_dev_save_state(struct eeh_dev *edev, void *userdata) { struct pci_dev *pdev; @@ -212,7 +214,7 @@ static void *eeh_report_error(struct eeh_dev *edev, void *userdata) enum pci_ers_result rc, *res = userdata; struct pci_driver *driver; - if (!dev || eeh_dev_removed(edev) || eeh_pe_passed(edev->pe)) + if (!eeh_edev_actionable(edev)) return NULL; device_lock(>dev); @@ -256,7 +258,7 @@ static void *eeh_report_mmio_enabled(struct eeh_dev *edev, void *userdata) enum pci_ers_result rc, *res = userdata; struct pci_driver *driver; - if (!dev || eeh_dev_removed(edev) || eeh_pe_passed(edev->pe)) + if (!eeh_edev_actionable(edev)) return NULL; device_lock(>dev); @@ -295,7 +297,7 @@ static void *eeh_report_reset(struct eeh_dev *edev, void *userdata) enum pci_ers_result rc, *res = userdata; struct pci_driver *driver; - if (!dev || eeh_dev_removed(edev) || eeh_pe_passed(edev->pe)) + if (!eeh_edev_actionable(edev)) return NULL; device_lock(>dev); @@ -365,7 +367,7 @@ static void *eeh_report_resume(struct eeh_dev *edev, void *userdata) bool was_in_error; struct pci_driver *driver; - if (!dev || eeh_dev_removed(edev) || eeh_pe_passed(edev->pe)) + if (!eeh_edev_actionable(edev)) return NULL; device_lock(>dev); @@ -412,7 +414,7 @@ static void *eeh_report_failure(struct eeh_dev *edev, void *userdata) struct pci_dev *dev = eeh_dev_to_pci_dev(edev); struct pci_driver *driver; - if (!dev || eeh_dev_removed(edev) || eeh_pe_passed(edev->pe)) + if (!eeh_edev_actionable(edev)) return NULL; device_lock(>dev); -- 2.16.1.74.g9b0b1f47b
[PATCH v2 13/13] powerpc/eeh: Refactor report functions
The EEH report functions now share a fair bit of code around the start and end of each function. So factor out as much as possible, and move the traversal into a custom function. This also allows accurate debug to be generated more easily. Signed-off-by: Sam Bobroff <sbobr...@linux.ibm.com> --- == v1 -> v2: == * Better name for eeh_infoline() and implement using a function. * Move the core of eeh_pe_report() into a new function to improve readability. * pci_ers_result_name(): match parameter name to eeh_result_priority(), correct and improve warning message. arch/powerpc/kernel/eeh_driver.c | 310 --- 1 file changed, 159 insertions(+), 151 deletions(-) diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c index 42fe80ed6f59..7cab58abbec9 100644 --- a/arch/powerpc/kernel/eeh_driver.c +++ b/arch/powerpc/kernel/eeh_driver.c @@ -54,6 +54,40 @@ static int eeh_result_priority(enum pci_ers_result result) } }; +const char *pci_ers_result_name(enum pci_ers_result result) +{ + switch (result) { + case PCI_ERS_RESULT_NONE: return "none"; + case PCI_ERS_RESULT_CAN_RECOVER: return "can recover"; + case PCI_ERS_RESULT_NEED_RESET: return "need reset"; + case PCI_ERS_RESULT_DISCONNECT: return "disconnect"; + case PCI_ERS_RESULT_RECOVERED: return "recovered"; + case PCI_ERS_RESULT_NO_AER_DRIVER: return "no AER driver"; + default: + WARN_ONCE(1, "Unknown result type: %d\n", (int)result); + return "unknown"; + } +}; + +static __printf(2, 3) +void eeh_edev_info(const struct eeh_dev *edev, const char *fmt, ...) +{ + struct va_format vaf; + va_list args; + + va_start(args, fmt); + + vaf.fmt = fmt; + vaf.va = + + printk(KERN_INFO "EEH: PE#%x (PCI %s): %pV\n", + edev->pe_config_addr, + edev->pdev ? dev_name(>pdev->dev) : "none", + ); + + va_end(args); +} + static enum pci_ers_result pci_ers_merge_result(enum pci_ers_result old, enum pci_ers_result new) { @@ -229,123 +263,124 @@ static void eeh_set_irq_state(struct eeh_pe *root, bool enable) } } -/** - * eeh_report_error - Report pci error to each device driver - * @data: eeh device - * @userdata: return value - * - * Report an EEH error to each device driver, collect up and - * merge the device driver responses. Cumulative response - * passed back in "userdata". - */ -static void *eeh_report_error(struct eeh_dev *edev, void *userdata) +typedef enum pci_ers_result (*eeh_report_fn)(struct eeh_dev *, +struct pci_driver *); +static void eeh_pe_report_edev(struct eeh_dev *edev, + eeh_report_fn fn, + enum pci_ers_result *result) { - struct pci_dev *dev = eeh_dev_to_pci_dev(edev); - enum pci_ers_result rc, *res = userdata; struct pci_driver *driver; + enum pci_ers_result new_result; + + device_lock(>pdev->dev); + if (eeh_edev_actionable(edev)) { + driver = eeh_pcid_get(edev->pdev); + + if (!driver) + eeh_edev_info(edev, "no driver"); + else if (!driver->err_handler) + eeh_edev_info(edev, +"driver not EEH aware"); + else if (edev->mode & EEH_DEV_NO_HANDLER) + eeh_edev_info(edev, +"driver bound too late"); + else { + new_result = fn(edev, driver); + eeh_edev_info(edev, + "%s driver reports: '%s'", + driver->name, + pci_ers_result_name(new_result)); + if (result) + *result = pci_ers_merge_result(*result, + new_result); + } + if (driver) + eeh_pcid_put(edev->pdev); + } else { + eeh_edev_info(edev, "not actionable (%d,%d,%d)", +!!edev->pdev, +!eeh_dev_removed(edev), +!eeh_pe_passed(edev->pe)); + } + device_unlock(>pdev->dev); +} - if (!eeh_edev_actionable(edev)) - return NULL; +static void eeh_pe_report(const char *name, struct eeh_pe *root, + eeh_report_fn fn, + enum pci_ers_result *result) +{ +
[PATCH v2 02/13] powerpc/eeh: Add final message for successful recovery
Add a single log line at the end of successful EEH recovery, so that it's clear that event processing has finished. Signed-off-by: Sam Bobroff <sbobr...@linux.ibm.com> --- arch/powerpc/kernel/eeh_driver.c | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c index 56a60b9eb397..07e0a42035ce 100644 --- a/arch/powerpc/kernel/eeh_driver.c +++ b/arch/powerpc/kernel/eeh_driver.c @@ -910,6 +910,7 @@ void eeh_handle_normal_event(struct eeh_pe *pe) pr_info("EEH: Notify device driver to resume\n"); eeh_pe_dev_traverse(pe, eeh_report_resume, NULL); + pr_info("EEH: Recovery successful.\n"); goto final; hard_fail: -- 2.16.1.74.g9b0b1f47b
[PATCH v2 04/13] powerpc/eeh: Remove unused eeh_pcid_name()
Signed-off-by: Sam Bobroff <sbobr...@linux.ibm.com> --- arch/powerpc/kernel/eeh_driver.c | 14 -- 1 file changed, 14 deletions(-) diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c index 54333f6c9d67..ca9a73fe9cc5 100644 --- a/arch/powerpc/kernel/eeh_driver.c +++ b/arch/powerpc/kernel/eeh_driver.c @@ -39,20 +39,6 @@ struct eeh_rmv_data { int removed; }; -/** - * eeh_pcid_name - Retrieve name of PCI device driver - * @pdev: PCI device - * - * This routine is used to retrieve the name of PCI device driver - * if that's valid. - */ -static inline const char *eeh_pcid_name(struct pci_dev *pdev) -{ - if (pdev && pdev->dev.driver) - return pdev->dev.driver->name; - return ""; -} - /** * eeh_pcid_get - Get the PCI device driver * @pdev: PCI device -- 2.16.1.74.g9b0b1f47b
[PATCH v2 06/13] powerpc/eeh: Add message when PE processing at parent
To aid debugging, add a message to show when EEH processing for a PE will be done at the device's parent, rather than directly at the device. Signed-off-by: Sam Bobroff <sbobr...@linux.ibm.com> --- arch/powerpc/kernel/eeh.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c index f82dade4fb9a..1139821a9aec 100644 --- a/arch/powerpc/kernel/eeh.c +++ b/arch/powerpc/kernel/eeh.c @@ -541,8 +541,12 @@ int eeh_dev_check_failure(struct eeh_dev *edev) /* Frozen parent PE ? */ ret = eeh_ops->get_state(parent_pe, NULL); - if (ret > 0 && !eeh_state_active(ret)) + if (ret > 0 && !eeh_state_active(ret)) { pe = parent_pe; + pr_err("EEH: Failure of PHB#%x-PE#%x will be handled at parent PHB#%x-PE#%x.\n", + pe->phb->global_number, pe->addr, + pe->phb->global_number, parent_pe->addr); + } /* Next parent level */ parent_pe = parent_pe->parent; -- 2.16.1.74.g9b0b1f47b
[PATCH v2 12/13] powerpc/eeh: Cleaner handling of EEH_DEV_NO_HANDLER
If a device without a driver is recovered via EEH, the flag EEH_DEV_NO_HANDLER is incorrectly left set on the device after recovery, because the test in eeh_report_resume() for the existence of a bound driver is done before the flag is cleared. If a driver is later bound, and EEH experienced again, some of the drivers EEH handers are not called. To correct this, clear the flag unconditionally after EEH processing is complete. Signed-off-by: Sam Bobroff <sbobr...@linux.ibm.com> --- arch/powerpc/kernel/eeh_driver.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c index 8dd2a33424a1..42fe80ed6f59 100644 --- a/arch/powerpc/kernel/eeh_driver.c +++ b/arch/powerpc/kernel/eeh_driver.c @@ -405,7 +405,6 @@ static void *eeh_report_resume(struct eeh_dev *edev, void *userdata) if (!driver->err_handler || !driver->err_handler->resume || (edev->mode & EEH_DEV_NO_HANDLER) || !was_in_error) { - edev->mode &= ~EEH_DEV_NO_HANDLER; goto out; } @@ -780,6 +779,7 @@ void eeh_handle_normal_event(struct eeh_pe *pe) { struct pci_bus *bus; struct eeh_dev *edev, *tmp; + struct eeh_pe *tmp_pe; int rc = 0; enum pci_ers_result result = PCI_ERS_RESULT_NONE; struct eeh_rmv_data rmv_data = {LIST_HEAD_INIT(rmv_data.edev_list), 0}; @@ -934,6 +934,10 @@ void eeh_handle_normal_event(struct eeh_pe *pe) eeh_set_irq_state(pe, true); eeh_pe_dev_traverse(pe, eeh_report_resume, NULL); + eeh_for_each_pe(pe, tmp_pe) + eeh_pe_for_each_dev(tmp_pe, edev, tmp) + edev->mode &= ~EEH_DEV_NO_HANDLER; + pr_info("EEH: Recovery successful.\n"); goto final; -- 2.16.1.74.g9b0b1f47b
[PATCH v2 10/13] powerpc/eeh: Introduce eeh_set_channel_state()
To ease future refactoring, extract setting of the channel state from the report functions out into their own functions. This increases the amount of code that is identical across all of the report functions. Signed-off-by: Sam Bobroff <sbobr...@linux.ibm.com> --- arch/powerpc/kernel/eeh_driver.c | 19 --- 1 file changed, 16 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c index 30898866418b..76951ca701b2 100644 --- a/arch/powerpc/kernel/eeh_driver.c +++ b/arch/powerpc/kernel/eeh_driver.c @@ -199,6 +199,17 @@ static void *eeh_dev_save_state(struct eeh_dev *edev, void *userdata) return NULL; } +static void eeh_set_channel_state(struct eeh_pe *root, enum pci_channel_state s) +{ + struct eeh_pe *pe; + struct eeh_dev *edev, *tmp; + + eeh_for_each_pe(root, pe) + eeh_pe_for_each_dev(pe, edev, tmp) + if (eeh_edev_actionable(edev)) + edev->pdev->error_state = s; +} + /** * eeh_report_error - Report pci error to each device driver * @data: eeh device @@ -218,7 +229,6 @@ static void *eeh_report_error(struct eeh_dev *edev, void *userdata) return NULL; device_lock(>dev); - dev->error_state = pci_channel_io_frozen; driver = eeh_pcid_get(dev); if (!driver) goto out_no_dev; @@ -301,7 +311,6 @@ static void *eeh_report_reset(struct eeh_dev *edev, void *userdata) return NULL; device_lock(>dev); - dev->error_state = pci_channel_io_normal; driver = eeh_pcid_get(dev); if (!driver) goto out_no_dev; @@ -371,7 +380,6 @@ static void *eeh_report_resume(struct eeh_dev *edev, void *userdata) return NULL; device_lock(>dev); - dev->error_state = pci_channel_io_normal; driver = eeh_pcid_get(dev); if (!driver) goto out_no_dev; @@ -795,6 +803,7 @@ void eeh_handle_normal_event(struct eeh_pe *pe) * hotplug for this case. */ pr_info("EEH: Notify device drivers to shutdown\n"); + eeh_set_channel_state(pe, pci_channel_io_frozen); eeh_pe_dev_traverse(pe, eeh_report_error, ); if ((pe->type & EEH_PE_PHB) && result != PCI_ERS_RESULT_NONE && @@ -885,6 +894,7 @@ void eeh_handle_normal_event(struct eeh_pe *pe) pr_info("EEH: Notify device drivers " "the completion of reset\n"); result = PCI_ERS_RESULT_NONE; + eeh_set_channel_state(pe, pci_channel_io_normal); eeh_pe_dev_traverse(pe, eeh_report_reset, ); } @@ -906,6 +916,7 @@ void eeh_handle_normal_event(struct eeh_pe *pe) /* Tell all device drivers that they can resume operations */ pr_info("EEH: Notify device driver to resume\n"); + eeh_set_channel_state(pe, pci_channel_io_normal); eeh_pe_dev_traverse(pe, eeh_report_resume, NULL); pr_info("EEH: Recovery successful.\n"); @@ -924,6 +935,7 @@ void eeh_handle_normal_event(struct eeh_pe *pe) eeh_slot_error_detail(pe, EEH_LOG_PERM); /* Notify all devices that they're about to go down. */ + eeh_set_channel_state(pe, pci_channel_io_perm_failure); eeh_pe_dev_traverse(pe, eeh_report_failure, NULL); /* Mark the PE to be removed permanently */ @@ -1033,6 +1045,7 @@ void eeh_handle_special_event(void) /* Notify all devices to be down */ eeh_pe_state_clear(pe, EEH_PE_PRI_BUS); + eeh_set_channel_state(pe, pci_channel_io_perm_failure); eeh_pe_dev_traverse(pe, eeh_report_failure, NULL); bus = eeh_pe_bus_get(phb_pe); -- 2.16.1.74.g9b0b1f47b
[PATCH v2 05/13] powerpc/eeh: Strengthen types of eeh traversal functions
The traversal functions eeh_pe_traverse() and eeh_pe_dev_traverse() both provide their first argument as void * but every single user casts it to the expected type. Change the type of the first parameter from void * to the appropriate type, and clean up all uses. Signed-off-by: Sam Bobroff <sbobr...@linux.ibm.com> --- arch/powerpc/include/asm/eeh.h | 7 --- arch/powerpc/kernel/eeh.c| 13 + arch/powerpc/kernel/eeh_driver.c | 30 ++ arch/powerpc/kernel/eeh_pe.c | 19 +++ 4 files changed, 26 insertions(+), 43 deletions(-) diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h index c2266ca61853..f02e0400e6f2 100644 --- a/arch/powerpc/include/asm/eeh.h +++ b/arch/powerpc/include/asm/eeh.h @@ -262,7 +262,8 @@ static inline bool eeh_state_active(int state) == (EEH_STATE_MMIO_ACTIVE | EEH_STATE_DMA_ACTIVE); } -typedef void *(*eeh_traverse_func)(void *data, void *flag); +typedef void *(*eeh_edev_traverse_func)(struct eeh_dev *edev, void *flag); +typedef void *(*eeh_pe_traverse_func)(struct eeh_pe *pe, void *flag); void eeh_set_pe_aux_size(int size); int eeh_phb_pe_create(struct pci_controller *phb); struct eeh_pe *eeh_phb_pe_get(struct pci_controller *phb); @@ -272,9 +273,9 @@ int eeh_add_to_parent_pe(struct eeh_dev *edev); int eeh_rmv_from_parent_pe(struct eeh_dev *edev); void eeh_pe_update_time_stamp(struct eeh_pe *pe); void *eeh_pe_traverse(struct eeh_pe *root, - eeh_traverse_func fn, void *flag); + eeh_pe_traverse_func fn, void *flag); void *eeh_pe_dev_traverse(struct eeh_pe *root, - eeh_traverse_func fn, void *flag); + eeh_edev_traverse_func fn, void *flag); void eeh_pe_restore_bars(struct eeh_pe *pe); const char *eeh_pe_loc_get(struct eeh_pe *pe); struct pci_bus *eeh_pe_bus_get(struct eeh_pe *pe); diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c index bc640e4c5ca5..f82dade4fb9a 100644 --- a/arch/powerpc/kernel/eeh.c +++ b/arch/powerpc/kernel/eeh.c @@ -263,9 +263,8 @@ static size_t eeh_dump_dev_log(struct eeh_dev *edev, char *buf, size_t len) return n; } -static void *eeh_dump_pe_log(void *data, void *flag) +static void *eeh_dump_pe_log(struct eeh_pe *pe, void *flag) { - struct eeh_pe *pe = data; struct eeh_dev *edev, *tmp; size_t *plen = flag; @@ -686,9 +685,9 @@ int eeh_pci_enable(struct eeh_pe *pe, int function) return rc; } -static void *eeh_disable_and_save_dev_state(void *data, void *userdata) +static void *eeh_disable_and_save_dev_state(struct eeh_dev *edev, + void *userdata) { - struct eeh_dev *edev = data; struct pci_dev *pdev = eeh_dev_to_pci_dev(edev); struct pci_dev *dev = userdata; @@ -714,9 +713,8 @@ static void *eeh_disable_and_save_dev_state(void *data, void *userdata) return NULL; } -static void *eeh_restore_dev_state(void *data, void *userdata) +static void *eeh_restore_dev_state(struct eeh_dev *edev, void *userdata) { - struct eeh_dev *edev = data; struct pci_dn *pdn = eeh_dev_to_pdn(edev); struct pci_dev *pdev = eeh_dev_to_pci_dev(edev); struct pci_dev *dev = userdata; @@ -856,11 +854,10 @@ int pcibios_set_pcie_reset_state(struct pci_dev *dev, enum pcie_reset_state stat * the indicated device and its children so that the bunch of the * devices could be reset properly. */ -static void *eeh_set_dev_freset(void *data, void *flag) +static void *eeh_set_dev_freset(struct eeh_dev *edev, void *flag) { struct pci_dev *dev; unsigned int *freset = (unsigned int *)flag; - struct eeh_dev *edev = (struct eeh_dev *)data; dev = eeh_dev_to_pci_dev(edev); if (dev) diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c index ca9a73fe9cc5..188d15c4fe3a 100644 --- a/arch/powerpc/kernel/eeh_driver.c +++ b/arch/powerpc/kernel/eeh_driver.c @@ -149,9 +149,8 @@ static bool eeh_dev_removed(struct eeh_dev *edev) return false; } -static void *eeh_dev_save_state(void *data, void *userdata) +static void *eeh_dev_save_state(struct eeh_dev *edev, void *userdata) { - struct eeh_dev *edev = data; struct pci_dev *pdev; if (!edev) @@ -184,9 +183,8 @@ static void *eeh_dev_save_state(void *data, void *userdata) * merge the device driver responses. Cumulative response * passed back in "userdata". */ -static void *eeh_report_error(void *data, void *userdata) +static void *eeh_report_error(struct eeh_dev *edev, void *userdata) { - struct eeh_dev *edev = (struct eeh_dev *)data; struct pci_dev *dev = eeh_dev_to_pci_dev(edev); enum pci_ers_result rc, *res = userdata; struct pci_driver *driver; @@ -231,9 +229,8 @@ static void *eeh_report_error(void *data, void *userdata) * are now enabled. Collects up and m
[PATCH v2 08/13] powerpc/eeh: Introduce eeh_for_each_pe()
Add a for_each-style macro for iterating through PEs without the boilerplate required by a traversal function. eeh_pe_next() is now exported, as it is now used directly in place. Signed-off-by: Sam Bobroff <sbobr...@linux.ibm.com> --- arch/powerpc/include/asm/eeh.h | 4 arch/powerpc/kernel/eeh_pe.c | 7 +++ 2 files changed, 7 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h index f02e0400e6f2..677102baf3cd 100644 --- a/arch/powerpc/include/asm/eeh.h +++ b/arch/powerpc/include/asm/eeh.h @@ -106,6 +106,9 @@ struct eeh_pe { #define eeh_pe_for_each_dev(pe, edev, tmp) \ list_for_each_entry_safe(edev, tmp, >edevs, list) +#define eeh_for_each_pe(root, pe) \ + for (pe = root; pe; pe = eeh_pe_next(pe, root)) + static inline bool eeh_pe_passed(struct eeh_pe *pe) { return pe ? !!atomic_read(>pass_dev_cnt) : false; @@ -267,6 +270,7 @@ typedef void *(*eeh_pe_traverse_func)(struct eeh_pe *pe, void *flag); void eeh_set_pe_aux_size(int size); int eeh_phb_pe_create(struct pci_controller *phb); struct eeh_pe *eeh_phb_pe_get(struct pci_controller *phb); +struct eeh_pe *eeh_pe_next(struct eeh_pe *pe, struct eeh_pe *root); struct eeh_pe *eeh_pe_get(struct pci_controller *phb, int pe_no, int config_addr); int eeh_add_to_parent_pe(struct eeh_dev *edev); diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c index 38a4bcd8ed13..1b238ecc553e 100644 --- a/arch/powerpc/kernel/eeh_pe.c +++ b/arch/powerpc/kernel/eeh_pe.c @@ -142,8 +142,7 @@ struct eeh_pe *eeh_phb_pe_get(struct pci_controller *phb) * The function is used to retrieve the next PE in the * hierarchy PE tree. */ -static struct eeh_pe *eeh_pe_next(struct eeh_pe *pe, - struct eeh_pe *root) +struct eeh_pe *eeh_pe_next(struct eeh_pe *pe, struct eeh_pe *root) { struct list_head *next = pe->child_list.next; @@ -178,7 +177,7 @@ void *eeh_pe_traverse(struct eeh_pe *root, struct eeh_pe *pe; void *ret; - for (pe = root; pe; pe = eeh_pe_next(pe, root)) { + eeh_for_each_pe(root, pe) { ret = fn(pe, flag); if (ret) return ret; } @@ -209,7 +208,7 @@ void *eeh_pe_dev_traverse(struct eeh_pe *root, } /* Traverse root PE */ - for (pe = root; pe; pe = eeh_pe_next(pe, root)) { + eeh_for_each_pe(root, pe) { eeh_pe_for_each_dev(pe, edev, tmp) { ret = fn(edev, flag); if (ret) -- 2.16.1.74.g9b0b1f47b
[PATCH v2 07/13] powerpc/eeh: Clean up pci_ers_result handling
As EEH event handling progresses, a cumulative result of type pci_ers_result is built up by (some of) the eeh_report_*() functions using either: if (rc == PCI_ERS_RESULT_NEED_RESET) *res = rc; if (*res == PCI_ERS_RESULT_NONE) *res = rc; or: if ((*res == PCI_ERS_RESULT_NONE) || (*res == PCI_ERS_RESULT_RECOVERED)) *res = rc; if (*res == PCI_ERS_RESULT_DISCONNECT && rc == PCI_ERS_RESULT_NEED_RESET) *res = rc; (Where *res is the accumulator.) However, the intent is not immediately clear and the result in some situations is order dependent. Address this by assigning a priority to each result value, and always merging to the highest priority. This renders the intent clear, and provides a stable value for all orderings. Signed-off-by: Sam Bobroff <sbobr...@linux.ibm.com> --- == v1 -> v2: == * Added the value, and missing newline, to some WARN()s. * Improved name of merge_result() to pci_ers_merge_result(). * Adjusted the result priorities so that unknown doesn't overlap with _NONE. arch/powerpc/kernel/eeh_driver.c | 36 ++-- 1 file changed, 26 insertions(+), 10 deletions(-) diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c index 188d15c4fe3a..2d3cac584899 100644 --- a/arch/powerpc/kernel/eeh_driver.c +++ b/arch/powerpc/kernel/eeh_driver.c @@ -39,6 +39,29 @@ struct eeh_rmv_data { int removed; }; +static int eeh_result_priority(enum pci_ers_result result) +{ + switch (result) { + case PCI_ERS_RESULT_NONE: return 1; + case PCI_ERS_RESULT_NO_AER_DRIVER: return 2; + case PCI_ERS_RESULT_RECOVERED: return 3; + case PCI_ERS_RESULT_CAN_RECOVER: return 4; + case PCI_ERS_RESULT_DISCONNECT: return 5; + case PCI_ERS_RESULT_NEED_RESET: return 6; + default: + WARN_ONCE(1, "Unknown pci_ers_result value: %d\n", (int)result); + return 0; + } +}; + +static enum pci_ers_result pci_ers_merge_result(enum pci_ers_result old, + enum pci_ers_result new) +{ + if (eeh_result_priority(new) > eeh_result_priority(old)) + return new; + return old; +} + /** * eeh_pcid_get - Get the PCI device driver * @pdev: PCI device @@ -206,9 +229,7 @@ static void *eeh_report_error(struct eeh_dev *edev, void *userdata) rc = driver->err_handler->error_detected(dev, pci_channel_io_frozen); - /* A driver that needs a reset trumps all others */ - if (rc == PCI_ERS_RESULT_NEED_RESET) *res = rc; - if (*res == PCI_ERS_RESULT_NONE) *res = rc; + *res = pci_ers_merge_result(*res, rc); edev->in_error = true; pci_uevent_ers(dev, PCI_ERS_RESULT_NONE); @@ -249,9 +270,7 @@ static void *eeh_report_mmio_enabled(struct eeh_dev *edev, void *userdata) rc = driver->err_handler->mmio_enabled(dev); - /* A driver that needs a reset trumps all others */ - if (rc == PCI_ERS_RESULT_NEED_RESET) *res = rc; - if (*res == PCI_ERS_RESULT_NONE) *res = rc; + *res = pci_ers_merge_result(*res, rc); out: eeh_pcid_put(dev); @@ -294,10 +313,7 @@ static void *eeh_report_reset(struct eeh_dev *edev, void *userdata) goto out; rc = driver->err_handler->slot_reset(dev); - if ((*res == PCI_ERS_RESULT_NONE) || - (*res == PCI_ERS_RESULT_RECOVERED)) *res = rc; - if (*res == PCI_ERS_RESULT_DISCONNECT && -rc == PCI_ERS_RESULT_NEED_RESET) *res = rc; + *res = pci_ers_merge_result(*res, rc); out: eeh_pcid_put(dev); -- 2.16.1.74.g9b0b1f47b
[PATCH v2 11/13] powerpc/eeh: Introduce eeh_set_irq_state()
To ease future refactoring, extract calls to eeh_enable_irq() and eeh_disable_irq() from the various report functions. This makes the report functions initial sequences more similar, as well as making the IRQ changes visible when reading eeh_handle_normal_event(). Signed-off-by: Sam Bobroff <sbobr...@linux.ibm.com> --- == v1 -> v2: == * Reorganised eeh_set_irq_state() to reduce nesting depth. arch/powerpc/kernel/eeh_driver.c | 52 ++-- 1 file changed, 34 insertions(+), 18 deletions(-) diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c index 76951ca701b2..8dd2a33424a1 100644 --- a/arch/powerpc/kernel/eeh_driver.c +++ b/arch/powerpc/kernel/eeh_driver.c @@ -118,22 +118,20 @@ static inline void eeh_pcid_put(struct pci_dev *pdev) * do real work because EEH should freeze DMA transfers for those PCI * devices encountering EEH errors, which includes MSI or MSI-X. */ -static void eeh_disable_irq(struct pci_dev *dev) +static void eeh_disable_irq(struct eeh_dev *edev) { - struct eeh_dev *edev = pci_dev_to_eeh_dev(dev); - /* Don't disable MSI and MSI-X interrupts. They are * effectively disabled by the DMA Stopped state * when an EEH error occurs. */ - if (dev->msi_enabled || dev->msix_enabled) + if (edev->pdev->msi_enabled || edev->pdev->msix_enabled) return; - if (!irq_has_action(dev->irq)) + if (!irq_has_action(edev->pdev->irq)) return; edev->mode |= EEH_DEV_IRQ_DISABLED; - disable_irq_nosync(dev->irq); + disable_irq_nosync(edev->pdev->irq); } /** @@ -143,10 +141,8 @@ static void eeh_disable_irq(struct pci_dev *dev) * This routine must be called to enable interrupt while failed * device could be resumed. */ -static void eeh_enable_irq(struct pci_dev *dev) +static void eeh_enable_irq(struct eeh_dev *edev) { - struct eeh_dev *edev = pci_dev_to_eeh_dev(dev); - if ((edev->mode) & EEH_DEV_IRQ_DISABLED) { edev->mode &= ~EEH_DEV_IRQ_DISABLED; /* @@ -169,8 +165,8 @@ static void eeh_enable_irq(struct pci_dev *dev) * * tglx */ - if (irqd_irq_disabled(irq_get_irq_data(dev->irq))) - enable_irq(dev->irq); + if (irqd_irq_disabled(irq_get_irq_data(edev->pdev->irq))) + enable_irq(edev->pdev->irq); } } @@ -210,6 +206,29 @@ static void eeh_set_channel_state(struct eeh_pe *root, enum pci_channel_state s) edev->pdev->error_state = s; } +static void eeh_set_irq_state(struct eeh_pe *root, bool enable) +{ + struct eeh_pe *pe; + struct eeh_dev *edev, *tmp; + + eeh_for_each_pe(root, pe) { + eeh_pe_for_each_dev(pe, edev, tmp) { + if (!eeh_edev_actionable(edev)) + continue; + + if (!eeh_pcid_get(edev->pdev)) + continue; + + if (enable) + eeh_enable_irq(edev); + else + eeh_disable_irq(edev); + + eeh_pcid_put(edev->pdev); + } + } +} + /** * eeh_report_error - Report pci error to each device driver * @data: eeh device @@ -233,8 +252,6 @@ static void *eeh_report_error(struct eeh_dev *edev, void *userdata) driver = eeh_pcid_get(dev); if (!driver) goto out_no_dev; - eeh_disable_irq(dev); - if (!driver->err_handler || !driver->err_handler->error_detected) goto out; @@ -315,8 +332,6 @@ static void *eeh_report_reset(struct eeh_dev *edev, void *userdata) driver = eeh_pcid_get(dev); if (!driver) goto out_no_dev; - eeh_enable_irq(dev); - if (!driver->err_handler || !driver->err_handler->slot_reset || (edev->mode & EEH_DEV_NO_HANDLER) || @@ -386,7 +401,6 @@ static void *eeh_report_resume(struct eeh_dev *edev, void *userdata) was_in_error = edev->in_error; edev->in_error = false; - eeh_enable_irq(dev); if (!driver->err_handler || !driver->err_handler->resume || @@ -431,8 +445,6 @@ static void *eeh_report_failure(struct eeh_dev *edev, void *userdata) driver = eeh_pcid_get(dev); if (!driver) goto out_no_dev; - eeh_disable_irq(dev); - if (!driver->err_handler || !driver->err_handler->error_detected) goto out; @@ -804,6 +816,7 @@ void eeh_handle_normal_event(struct eeh_pe *pe) */ pr_info("EEH: Notify device drivers to shutdown\n"); eeh_set_cha
Re: [PATCH v2 07/13] powerpc/eeh: Clean up pci_ers_result handling
On Sat, Jun 02, 2018 at 01:40:46AM +1000, Michael Ellerman wrote: > Sam Bobroff writes: > > > As EEH event handling progresses, a cumulative result of type > > pci_ers_result is built up by (some of) the eeh_report_*() functions > > using either: > > if (rc == PCI_ERS_RESULT_NEED_RESET) *res = rc; > > if (*res == PCI_ERS_RESULT_NONE) *res = rc; > > or: > > if ((*res == PCI_ERS_RESULT_NONE) || > > (*res == PCI_ERS_RESULT_RECOVERED)) *res = rc; > > if (*res == PCI_ERS_RESULT_DISCONNECT && > > rc == PCI_ERS_RESULT_NEED_RESET) *res = rc; > > (Where *res is the accumulator.) > > > > However, the intent is not immediately clear and the result in some > > situations is order dependent. > > > > Address this by assigning a priority to each result value, and always > > merging to the highest priority. This renders the intent clear, and > > provides a stable value for all orderings. > > > > Signed-off-by: Sam Bobroff > > --- > > == v1 -> v2: == > > > > * Added the value, and missing newline, to some WARN()s. > > * Improved name of merge_result() to pci_ers_merge_result(). > > * Adjusted the result priorities so that unknown doesn't overlap with _NONE. > > These === markers seem to have confused patchwork, they ended up in the > patch, and then git put them in the changelog. > > http://patchwork.ozlabs.org/patch/920194/ > > The usual format is just something like: > > v2 - Added the value, and missing newline, to some WARN()s. >- Improved name of merge_result() to pci_ers_merge_result(). >- Adjusted the result priorities so that unknown doesn't overlap with > _NONE. > > cheers Oh! I'll change it! Sam. > > > arch/powerpc/kernel/eeh_driver.c | 36 ++-- > > 1 file changed, 26 insertions(+), 10 deletions(-) > > > > diff --git a/arch/powerpc/kernel/eeh_driver.c > > b/arch/powerpc/kernel/eeh_driver.c > > index 188d15c4fe3a..2d3cac584899 100644 > > --- a/arch/powerpc/kernel/eeh_driver.c > > +++ b/arch/powerpc/kernel/eeh_driver.c > > @@ -39,6 +39,29 @@ struct eeh_rmv_data { > > int removed; > > }; > > > > +static int eeh_result_priority(enum pci_ers_result result) > > +{ > > + switch (result) { > > + case PCI_ERS_RESULT_NONE: return 1; > > + case PCI_ERS_RESULT_NO_AER_DRIVER: return 2; > > + case PCI_ERS_RESULT_RECOVERED: return 3; > > + case PCI_ERS_RESULT_CAN_RECOVER: return 4; > > + case PCI_ERS_RESULT_DISCONNECT: return 5; > > + case PCI_ERS_RESULT_NEED_RESET: return 6; > > + default: > > + WARN_ONCE(1, "Unknown pci_ers_result value: %d\n", (int)result); > > + return 0; > > + } > > +}; > > + > > +static enum pci_ers_result pci_ers_merge_result(enum pci_ers_result old, > > + enum pci_ers_result new) > > +{ > > + if (eeh_result_priority(new) > eeh_result_priority(old)) > > + return new; > > + return old; > > +} > > + > > /** > > * eeh_pcid_get - Get the PCI device driver > > * @pdev: PCI device > > @@ -206,9 +229,7 @@ static void *eeh_report_error(struct eeh_dev *edev, > > void *userdata) > > > > rc = driver->err_handler->error_detected(dev, pci_channel_io_frozen); > > > > - /* A driver that needs a reset trumps all others */ > > - if (rc == PCI_ERS_RESULT_NEED_RESET) *res = rc; > > - if (*res == PCI_ERS_RESULT_NONE) *res = rc; > > + *res = pci_ers_merge_result(*res, rc); > > > > edev->in_error = true; > > pci_uevent_ers(dev, PCI_ERS_RESULT_NONE); > > @@ -249,9 +270,7 @@ static void *eeh_report_mmio_enabled(struct eeh_dev > > *edev, void *userdata) > > > > rc = driver->err_handler->mmio_enabled(dev); > > > > - /* A driver that needs a reset trumps all others */ > > - if (rc == PCI_ERS_RESULT_NEED_RESET) *res = rc; > > - if (*res == PCI_ERS_RESULT_NONE) *res = rc; > > + *res = pci_ers_merge_result(*res, rc); > > > > out: > > eeh_pcid_put(dev); > > @@ -294,10 +313,7 @@ static void *eeh_report_reset(struct eeh_dev *edev, > > void *userdata) > > goto out; > > > > rc = driver->err_handler->slot_reset(dev); > > - if ((*res == PCI_ERS_RESULT_NONE) || > > - (*res == PCI_ERS_RESULT_RECOVERED)) *res = rc; > > - if (*res == PCI_ERS_RESULT_DISCONNECT && > > -rc == PCI_ERS_RESULT_NEED_RESET) *res = rc; > > + *res = pci_ers_merge_result(*res, rc); > > > > out: > > eeh_pcid_put(dev); > > -- > > 2.16.1.74.g9b0b1f47b > signature.asc Description: PGP signature
[PATCH 06/13] powerpc/eeh: Add message when PE processing at parent
To aid debugging, add a message to show when EEH processing for a PE will be done at the device's parent, rather than directly at the device. Signed-off-by: Sam Bobroff <sbobr...@linux.ibm.com> --- arch/powerpc/kernel/eeh.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c index f82dade4fb9a..1139821a9aec 100644 --- a/arch/powerpc/kernel/eeh.c +++ b/arch/powerpc/kernel/eeh.c @@ -541,8 +541,12 @@ int eeh_dev_check_failure(struct eeh_dev *edev) /* Frozen parent PE ? */ ret = eeh_ops->get_state(parent_pe, NULL); - if (ret > 0 && !eeh_state_active(ret)) + if (ret > 0 && !eeh_state_active(ret)) { pe = parent_pe; + pr_err("EEH: Failure of PHB#%x-PE#%x will be handled at parent PHB#%x-PE#%x.\n", + pe->phb->global_number, pe->addr, + pe->phb->global_number, parent_pe->addr); + } /* Next parent level */ parent_pe = parent_pe->parent; -- 2.16.1.74.g9b0b1f47b
[PATCH 07/13] powerpc/eeh: Clean up pci_ers_result handling
As EEH event handling progresses, a cumulative result of type pci_ers_result is built up by (some of) the eeh_report_*() functions using either: if (rc == PCI_ERS_RESULT_NEED_RESET) *res = rc; if (*res == PCI_ERS_RESULT_NONE) *res = rc; or: if ((*res == PCI_ERS_RESULT_NONE) || (*res == PCI_ERS_RESULT_RECOVERED)) *res = rc; if (*res == PCI_ERS_RESULT_DISCONNECT && rc == PCI_ERS_RESULT_NEED_RESET) *res = rc; (Where *res is the accumulator.) However, the intent is not immediately clear and the result in some situations is order dependent. Address this by assigning a priority to each result value, and always merging to the highest priority. This renders the intent clear, and provides a stable value for all orderings. Signed-off-by: Sam Bobroff <sbobr...@linux.ibm.com> --- arch/powerpc/kernel/eeh_driver.c | 36 ++-- 1 file changed, 26 insertions(+), 10 deletions(-) diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c index 188d15c4fe3a..f33dd68a9ca2 100644 --- a/arch/powerpc/kernel/eeh_driver.c +++ b/arch/powerpc/kernel/eeh_driver.c @@ -39,6 +39,29 @@ struct eeh_rmv_data { int removed; }; +static int eeh_result_priority(enum pci_ers_result result) +{ + switch (result) { + case PCI_ERS_RESULT_NONE: return 0; + case PCI_ERS_RESULT_NO_AER_DRIVER: return 1; + case PCI_ERS_RESULT_RECOVERED: return 2; + case PCI_ERS_RESULT_CAN_RECOVER: return 3; + case PCI_ERS_RESULT_DISCONNECT: return 4; + case PCI_ERS_RESULT_NEED_RESET: return 5; + default: + WARN_ONCE(1, "Unknown pci_ers_result value"); + return 0; + } +}; + +static enum pci_ers_result merge_result(enum pci_ers_result old, + enum pci_ers_result new) +{ + if (eeh_result_priority(new) > eeh_result_priority(old)) + return new; + return old; +} + /** * eeh_pcid_get - Get the PCI device driver * @pdev: PCI device @@ -206,9 +229,7 @@ static void *eeh_report_error(struct eeh_dev *edev, void *userdata) rc = driver->err_handler->error_detected(dev, pci_channel_io_frozen); - /* A driver that needs a reset trumps all others */ - if (rc == PCI_ERS_RESULT_NEED_RESET) *res = rc; - if (*res == PCI_ERS_RESULT_NONE) *res = rc; + *res = merge_result(*res, rc); edev->in_error = true; pci_uevent_ers(dev, PCI_ERS_RESULT_NONE); @@ -249,9 +270,7 @@ static void *eeh_report_mmio_enabled(struct eeh_dev *edev, void *userdata) rc = driver->err_handler->mmio_enabled(dev); - /* A driver that needs a reset trumps all others */ - if (rc == PCI_ERS_RESULT_NEED_RESET) *res = rc; - if (*res == PCI_ERS_RESULT_NONE) *res = rc; + *res = merge_result(*res, rc); out: eeh_pcid_put(dev); @@ -294,10 +313,7 @@ static void *eeh_report_reset(struct eeh_dev *edev, void *userdata) goto out; rc = driver->err_handler->slot_reset(dev); - if ((*res == PCI_ERS_RESULT_NONE) || - (*res == PCI_ERS_RESULT_RECOVERED)) *res = rc; - if (*res == PCI_ERS_RESULT_DISCONNECT && -rc == PCI_ERS_RESULT_NEED_RESET) *res = rc; + *res = merge_result(*res, rc); out: eeh_pcid_put(dev); -- 2.16.1.74.g9b0b1f47b
[PATCH 13/13] powerpc/eeh: Refactor report functions
The EEH report functions now share a fair bit of code around the start and end of each function. So factor out as much as possible, and move the traversal into a custom function. This also allows accurate debug to be generated more easily. Signed-off-by: Sam Bobroff <sbobr...@linux.ibm.com> --- arch/powerpc/kernel/eeh_driver.c | 289 +++ 1 file changed, 138 insertions(+), 151 deletions(-) diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c index eb4feee81ff4..1c4336dcf9f5 100644 --- a/arch/powerpc/kernel/eeh_driver.c +++ b/arch/powerpc/kernel/eeh_driver.c @@ -54,6 +54,25 @@ static int eeh_result_priority(enum pci_ers_result result) } }; +const char *pci_ers_result_name(enum pci_ers_result r) +{ + switch (r) { + case PCI_ERS_RESULT_NONE: return "none"; + case PCI_ERS_RESULT_CAN_RECOVER: return "can recover"; + case PCI_ERS_RESULT_NEED_RESET: return "need reset"; + case PCI_ERS_RESULT_DISCONNECT: return "disconnect"; + case PCI_ERS_RESULT_RECOVERED: return "recovered"; + case PCI_ERS_RESULT_NO_AER_DRIVER: return "no AER driver"; + default: + WARN_ONCE(1, "Unknown result type"); + return "unknown"; + } +}; + +#define eeh_infoline(EDEV, FMT, ...) \ +pr_info("EEH: PE#%x (PCI %s): " pr_fmt(FMT) "\n", EDEV->pe_config_addr, \ +((EDEV->pdev) ? dev_name(>pdev->dev) : "NONE"), ##__VA_ARGS__) + static enum pci_ers_result merge_result(enum pci_ers_result old, enum pci_ers_result new) { @@ -223,123 +242,118 @@ static void eeh_set_irq_state(struct eeh_pe *root, bool enable) } } +static void eeh_pe_report(const char *name, struct eeh_pe *root, + enum pci_ers_result (*fn)(struct eeh_dev *, + struct pci_driver *), + enum pci_ers_result *result) +{ + struct eeh_pe *pe; + struct eeh_dev *edev, *tmp; + enum pci_ers_result new_result; + + pr_info("EEH: Beginning: '%s'\n", name); + eeh_for_each_pe(root, pe) { + eeh_pe_for_each_dev(pe, edev, tmp) { + device_lock(>pdev->dev); + if (eeh_edev_actionable(edev)) { + struct pci_driver *driver; + + driver = eeh_pcid_get(edev->pdev); + + if (!driver) + eeh_infoline(edev, "no driver"); + else if (!driver->err_handler) + eeh_infoline(edev, +"driver not EEH aware"); + else if (edev->mode & EEH_DEV_NO_HANDLER) + eeh_infoline(edev, +"driver bound too late"); + else { + new_result = fn(edev, driver); + eeh_infoline(edev, + "%s driver reports: '%s'", + driver->name, + pci_ers_result_name(new_result)); + if (result) + *result = merge_result(*result, + new_result); + } + if (driver) + eeh_pcid_put(edev->pdev); + } else { + eeh_infoline(edev, "not actionable (%d,%d,%d)", +!!edev->pdev, +!eeh_dev_removed(edev), +!eeh_pe_passed(edev->pe)); + } + device_unlock(>pdev->dev); + } + } + if (result) + pr_info("EEH: Finished:'%s' with aggregate recovery state:'%s'\n", + name, pci_ers_result_name(*result)); + else + pr_info("EEH: Finished:'%s'", name); +} + /** * eeh_report_error - Report pci error to each device driver - * @data: eeh device - * @userdata: return value + * @edev: eeh device + * @driver: device's PCI driver * - * Report an EEH error to each device driver, collect up and - * merge the device driver responses. Cumulative response - * passed back in "user
[PATCH 01/13] powerpc/eeh: Add eeh_max_freezes to initial EEH log line
The current failure message includes the number of failures that have occurred in the last hour (for a device) but it does not indicate how many failures will be tolerated before the device is permanently disabled. Include the limit (eeh_max_freezes) to make this less surprising when it happens. Also remove the embedded newline from the existing message to make it easier to grep for. Signed-off-by: Sam Bobroff <sbobr...@linux.ibm.com> --- arch/powerpc/kernel/eeh_driver.c | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c index b8a329f04814..56a60b9eb397 100644 --- a/arch/powerpc/kernel/eeh_driver.c +++ b/arch/powerpc/kernel/eeh_driver.c @@ -778,14 +778,13 @@ void eeh_handle_normal_event(struct eeh_pe *pe) eeh_pe_update_time_stamp(pe); pe->freeze_count++; if (pe->freeze_count > eeh_max_freezes) { - pr_err("EEH: PHB#%x-PE#%x has failed %d times in the\n" - "last hour and has been permanently disabled.\n", + pr_err("EEH: PHB#%x-PE#%x has failed %d times in the last hour and has been permanently disabled.\n", pe->phb->global_number, pe->addr, pe->freeze_count); goto hard_fail; } - pr_warn("EEH: This PCI device has failed %d times in the last hour\n", - pe->freeze_count); + pr_warn("EEH: This PCI device has failed %d times in the last hour and will be permanently disabled after %d failures.\n", + pe->freeze_count, eeh_max_freezes); /* Walk the various device drivers attached to this slot through * a reset sequence, giving each an opportunity to do what it needs -- 2.16.1.74.g9b0b1f47b