[PATCH 3/3] powerpc/lib/sstep: Fix fixed-point shift instructions that set CA32
This fixes the emulated behaviour of existing fixed-point shift right algebraic instructions that are supposed to set both the CA and CA32 bits of XER when running on a system that is compliant with POWER ISA v3.0 independent of whether the system is executing in 32-bit mode or 64-bit mode. The following instructions are affected: * Shift Right Algebraic Word Immediate (srawi[.]) * Shift Right Algebraic Word (sraw[.]) * Shift Right Algebraic Doubleword Immediate (sradi[.]) * Shift Right Algebraic Doubleword (srad[.]) Signed-off-by: Sandipan Das--- arch/powerpc/lib/sstep.c | 4 1 file changed, 4 insertions(+) diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c index fe1910733e55..5118110c3983 100644 --- a/arch/powerpc/lib/sstep.c +++ b/arch/powerpc/lib/sstep.c @@ -1804,6 +1804,7 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs, op->xerval |= XER_CA; else op->xerval &= ~XER_CA; + set_ca32(op, op->xerval & XER_CA); goto logical_done; case 824: /* srawi */ @@ -1816,6 +1817,7 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs, op->xerval |= XER_CA; else op->xerval &= ~XER_CA; + set_ca32(op, op->xerval & XER_CA); goto logical_done; #ifdef __powerpc64__ @@ -1845,6 +1847,7 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs, op->xerval |= XER_CA; else op->xerval &= ~XER_CA; + set_ca32(op, op->xerval & XER_CA); goto logical_done; case 826: /* sradi with sh_5 = 0 */ @@ -1858,6 +1861,7 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs, op->xerval |= XER_CA; else op->xerval &= ~XER_CA; + set_ca32(op, op->xerval & XER_CA); goto logical_done; #endif /* __powerpc64__ */ -- 2.13.5
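For reference, the carry semantics the patch mirrors into CA32 can be sketched in user space. This is a model, not the kernel code: for srawi, CA is set when the source operand is negative and any 1-bits are shifted out of the low word; under ISA v3.0 the same value is also written to CA32.

```c
#include <assert.h>
#include <stdint.h>

/* User-space sketch of srawi's carry computation (not the sstep.c code):
 * CA = (source is negative) && (any 1-bits shifted out). */
static int srawi_carry(int32_t rs, unsigned int sh)
{
	if (sh == 0 || sh > 31)
		return 0;
	/* the bits shifted out are exactly the low 'sh' bits of rs */
	return rs < 0 && ((uint32_t)rs & ((1u << sh) - 1)) != 0;
}
```

For example, `srawi_carry(-3, 1)` is 1 (a 1-bit is lost from a negative value) while `srawi_carry(-4, 2)` is 0 (only 0-bits are shifted out, so no rounding correction is needed).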
[PATCH 2/3] powerpc/lib/sstep: Fix fixed-point arithmetic instructions that set CA32
There are existing fixed-point arithmetic instructions that always set the CA bit of XER to reflect the carry out of bit 0 in 64-bit mode and out of bit 32 in 32-bit mode. In ISA v3.0, these instructions also always set the CA32 bit of XER to reflect the carry out of bit 32. This fixes the emulated behaviour of such instructions when running on a system that is compliant with POWER ISA v3.0. The following instructions are affected: * Add Immediate Carrying (addic) * Add Immediate Carrying and Record (addic.) * Subtract From Immediate Carrying (subfic) * Add Carrying (addc[.]) * Subtract From Carrying (subfc[.]) * Add Extended (adde[.]) * Subtract From Extended (subfe[.]) * Add to Minus One Extended (addme[.]) * Subtract From Minus One Extended (subfme[.]) * Add to Zero Extended (addze[.]) * Subtract From Zero Extended (subfze[.]) Signed-off-by: Sandipan Das--- arch/powerpc/lib/sstep.c | 13 + 1 file changed, 13 insertions(+) diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c index 16814bfc01da..fe1910733e55 100644 --- a/arch/powerpc/lib/sstep.c +++ b/arch/powerpc/lib/sstep.c @@ -964,6 +964,16 @@ static nokprobe_inline void set_cr0(const struct pt_regs *regs, op->ccval |= 0x2000; } +static nokprobe_inline void set_ca32(struct instruction_op *op, bool val) +{ + if (cpu_has_feature(CPU_FTR_ARCH_300)) { + if (val) + op->xerval |= XER_CA32; + else + op->xerval &= ~XER_CA32; + } +} + static nokprobe_inline void add_with_carry(const struct pt_regs *regs, struct instruction_op *op, int rd, unsigned long val1, unsigned long val2, @@ -987,6 +997,9 @@ static nokprobe_inline void add_with_carry(const struct pt_regs *regs, op->xerval |= XER_CA; else op->xerval &= ~XER_CA; + + set_ca32(op, (unsigned int)val < (unsigned int)val1 || + (carry_in && (unsigned int)val == (unsigned int)val1)); } static nokprobe_inline void do_cmp_signed(const struct pt_regs *regs, -- 2.13.5
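The unsigned-compare trick the patch applies to the low 32 bits can be checked in isolation. This is a standalone model of the CA32 expression added to add_with_carry(): a carry out of bit 32 occurred iff the 32-bit sum is less than one addend, or equals it when a carry came in.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the patch's CA32 expression: carry out of the low 32-bit add. */
static int carry_out_32(uint32_t a, uint32_t b, int carry_in)
{
	uint32_t sum = a + b + (uint32_t)carry_in;

	/* same comparison the patch performs on the truncated values */
	return sum < a || (carry_in && sum == a);
}
```

The `carry_in && sum == a` clause covers the wrap-around case where b is 0xffffffff and a carry-in makes the sum land exactly back on a.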
[PATCH 1/3] powerpc/lib/sstep: Add XER bits introduced in POWER ISA v3.0
This adds definitions for the OV32 and CA32 bits of XER that were introduced in POWER ISA v3.0. There are some existing instructions that currently set the OV and CA bits based on certain conditions. The emulation behaviour of all these instructions needs to be updated to set these new bits accordingly. Signed-off-by: Sandipan Das --- arch/powerpc/lib/sstep.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c index 5e8418c28bd8..16814bfc01da 100644 --- a/arch/powerpc/lib/sstep.c +++ b/arch/powerpc/lib/sstep.c @@ -31,6 +31,8 @@ extern char system_call_common[]; #define XER_SO 0x80000000U #define XER_OV 0x40000000U #define XER_CA 0x20000000U +#define XER_OV32 0x00080000U +#define XER_CA32 0x00040000U #ifdef CONFIG_PPC_FPU /* -- 2.13.5
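The set/clear pattern these bits will follow can be modeled outside the kernel. The mask below is the upstream value for CA32 (XER bit 45 in the ISA's big-endian numbering, i.e. 1 << 18); the helper is a sketch of the conditional set/clear idiom the later patches use.

```c
#include <assert.h>
#include <stdint.h>

#define XER_CA32 0x00040000u	/* XER bit 45, big-endian numbering */

/* Sketch of the set/clear idiom: set mask if val is true, else clear it. */
static uint32_t xer_set_bit(uint32_t xer, uint32_t mask, int val)
{
	return val ? (xer | mask) : (xer & ~mask);
}
```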
Re: [PATCH] powerpc/powernv: Make opal_event_shutdown() callable from IRQ context
On Fri, 29 Sep 2017 13:58:02 +1000 Michael Ellermanwrote: > In opal_event_shutdown() we free all the IRQs hanging off the > opal_event_irqchip. However it's not safe to do so if we're called > from IRQ context, because free_irq() wants to synchronise versus IRQ > context. This can lead to warnings and a stuck system. > > For example from sysrq-b: > > Trying to free IRQ 17 from IRQ context! > [ cut here ] > WARNING: CPU: 0 PID: 0 at kernel/irq/manage.c:1461 __free_irq+0x398/0x8d0 > ... > NIP __free_irq+0x398/0x8d0 > LR __free_irq+0x394/0x8d0 > Call Trace: > __free_irq+0x394/0x8d0 (unreliable) > free_irq+0xa4/0x140 > opal_event_shutdown+0x128/0x180 > opal_shutdown+0x1c/0xb0 > pnv_shutdown+0x20/0x40 > machine_restart+0x38/0x90 > emergency_restart+0x28/0x40 > sysrq_handle_reboot+0x24/0x40 > __handle_sysrq+0x198/0x590 > hvc_poll+0x48c/0x8c0 > hvc_handle_interrupt+0x1c/0x50 > __handle_irq_event_percpu+0xe8/0x6e0 > handle_irq_event_percpu+0x34/0xe0 > handle_irq_event+0xc4/0x210 > handle_level_irq+0x250/0x770 > generic_handle_irq+0x5c/0xa0 > opal_handle_events+0x11c/0x240 > opal_interrupt+0x38/0x50 > __handle_irq_event_percpu+0xe8/0x6e0 > handle_irq_event_percpu+0x34/0xe0 > handle_irq_event+0xc4/0x210 > handle_fasteoi_irq+0x174/0xa10 > generic_handle_irq+0x5c/0xa0 > __do_irq+0xbc/0x4e0 > call_do_irq+0x14/0x24 > do_IRQ+0x18c/0x540 > hardware_interrupt_common+0x158/0x180 > > We can avoid that by using disable_irq_nosync() rather than > free_irq(). Although it doesn't fully free the IRQ, it should be > sufficient when we're shutting down, particularly in an emergency. > > Add an in_interrupt() check and use free_irq() when we're shutting > down normally. It's probably OK to use disable_irq_nosync() in that > case too, but for now it's safer to leave that behaviour as-is. > > Fixes: 9f0fd0499d30 ("powerpc/powernv: Add a virtual irqchip for opal events") > Signed-off-by: Michael Ellerman > --- Acked-by: Balbir Singh
[PATCH v4 5/5] powerpc/mce: hookup memory_failure for UE errors
If we are in user space and hit a UE error, we now have the basic infrastructure to walk the page tables and find out the effective address that was accessed, since the DAR is not valid. We use a work_queue context to hook up the bad pfn; any other context causes problems, since memory_failure() itself can call into schedule() via the lru_drain_ bits. We could probably poison the struct page to avoid a race between detection and taking corrective action. Signed-off-by: Balbir Singh Reviewed-by: Nicholas Piggin --- arch/powerpc/kernel/mce.c | 70 +-- 1 file changed, 67 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c index ee148f4..299553e 100644 --- a/arch/powerpc/kernel/mce.c +++ b/arch/powerpc/kernel/mce.c @@ -39,11 +39,21 @@ static DEFINE_PER_CPU(struct machine_check_event[MAX_MC_EVT], mce_event); static DEFINE_PER_CPU(int, mce_queue_count); static DEFINE_PER_CPU(struct machine_check_event[MAX_MC_EVT], mce_event_queue); +/* Queue for delayed MCE UE events. */ +static DEFINE_PER_CPU(int, mce_ue_count); +static DEFINE_PER_CPU(struct machine_check_event[MAX_MC_EVT], + mce_ue_event_queue); + static void machine_check_process_queued_event(struct irq_work *work); +void machine_check_ue_event(struct machine_check_event *evt); +static void machine_process_ue_event(struct work_struct *work); + static struct irq_work mce_event_process_work = { .func = machine_check_process_queued_event, }; +DECLARE_WORK(mce_ue_event_work, machine_process_ue_event); + static void mce_set_error_info(struct machine_check_event *mce, struct mce_error_info *mce_err) { @@ -143,6 +153,7 @@ void save_mce_event(struct pt_regs *regs, long handled, if (phys_addr != ULONG_MAX) { mce->u.ue_error.physical_address_provided = true; mce->u.ue_error.physical_address = phys_addr; + machine_check_ue_event(mce); } } return; @@ -197,6 +208,26 @@ void release_mce_event(void) get_mce_event(NULL, true); } + +/* + * Queue up the MCE event which then can be handled later. 
+ */ +void machine_check_ue_event(struct machine_check_event *evt) +{ + int index; + + index = __this_cpu_inc_return(mce_ue_count) - 1; + /* If queue is full, just return for now. */ + if (index >= MAX_MC_EVT) { + __this_cpu_dec(mce_ue_count); + return; + } + memcpy(this_cpu_ptr(&mce_ue_event_queue[index]), evt, sizeof(*evt)); + + /* Queue work to process this event later. */ + schedule_work(&mce_ue_event_work); +} + /* * Queue up the MCE event which then can be handled later. */ @@ -219,7 +250,39 @@ void machine_check_queue_event(void) /* Queue irq work to process this event later. */ irq_work_queue(&mce_event_process_work); } - +/* + * process pending MCE event from the mce event queue. This function will be + * called during syscall exit. + */ +static void machine_process_ue_event(struct work_struct *work) +{ + int index; + struct machine_check_event *evt; + + while (__this_cpu_read(mce_ue_count) > 0) { + index = __this_cpu_read(mce_ue_count) - 1; + evt = this_cpu_ptr(&mce_ue_event_queue[index]); +#ifdef CONFIG_MEMORY_FAILURE + /* +* This should probably be queued elsewhere, but +* oh well +*/ + if (evt->error_type == MCE_ERROR_TYPE_UE) { + if (evt->u.ue_error.physical_address_provided) { + unsigned long pfn; + + pfn = evt->u.ue_error.physical_address >> + PAGE_SHIFT; + memory_failure(pfn, SIGBUS, 0); + } else + pr_warn("Failed to identify bad address from " + "where the uncorrectable error (UE) " + "was generated\n"); + } +#endif + __this_cpu_dec(mce_ue_count); + } +} /* * process pending MCE event from the mce event queue. This function will be + * called during syscall exit. 
@@ -227,6 +290,7 @@ void machine_check_queue_event(void) static void machine_check_process_queued_event(struct irq_work *work) { int index; + struct machine_check_event *evt; add_taint(TAINT_MACHINE_CHECK, LOCKDEP_NOW_UNRELIABLE); @@ -236,8 +300,8 @@ static void machine_check_process_queued_event(struct irq_work *work) */ while (__this_cpu_read(mce_queue_count) > 0) { index = __this_cpu_read(mce_queue_count) - 1; - machine_check_print_event_info( -
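The UE queue added by this patch is a counter-indexed fixed array that drops events on overflow. A single-CPU toy model (struct evt and the function names are invented for this sketch; the kernel uses per-CPU variables and __this_cpu_inc_return()):

```c
#include <assert.h>
#include <string.h>

#define MAX_MC_EVT 4

struct evt { unsigned long pfn; };	/* stand-in for machine_check_event */

static struct evt ue_queue[MAX_MC_EVT];
static int ue_count;

/* returns 0 on success, -1 if the queue is full and the event is dropped */
static int ue_enqueue(const struct evt *e)
{
	int index = ue_count++;		/* __this_cpu_inc_return() - 1 in the kernel */

	if (index >= MAX_MC_EVT) {
		ue_count--;		/* full: undo the increment, drop the event */
		return -1;
	}
	memcpy(&ue_queue[index], e, sizeof(*e));
	return 0;
}
```

The consumer side drains from the top (`ue_count - 1`) downwards, which is why ordering within a burst of events is last-in-first-out.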
[PATCH v4 4/5] powerpc/mce: Hookup ierror (instruction) UE errors
Hookup instruction errors (UE) for memory offlining via memory_failure() in a manner similar to load/store errors (derror). Since we have access to the NIP, the conversion is a one step process in this case. Signed-off-by: Balbir Singh Reviewed-by: Nicholas Piggin --- arch/powerpc/kernel/mce_power.c | 22 +++--- 1 file changed, 19 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/kernel/mce_power.c b/arch/powerpc/kernel/mce_power.c index 0e584d5..2a888f6 100644 --- a/arch/powerpc/kernel/mce_power.c +++ b/arch/powerpc/kernel/mce_power.c @@ -488,7 +488,8 @@ static int mce_find_instr_ea_and_pfn(struct pt_regs *regs, uint64_t *addr, static int mce_handle_ierror(struct pt_regs *regs, const struct mce_ierror_table table[], - struct mce_error_info *mce_err, uint64_t *addr) + struct mce_error_info *mce_err, uint64_t *addr, + uint64_t *phys_addr) { uint64_t srr1 = regs->msr; int handled = 0; @@ -540,8 +541,22 @@ static int mce_handle_ierror(struct pt_regs *regs, } mce_err->severity = table[i].severity; mce_err->initiator = table[i].initiator; - if (table[i].nip_valid) + if (table[i].nip_valid) { *addr = regs->nip; + if (mce_err->severity == MCE_SEV_ERROR_SYNC && + table[i].error_type == MCE_ERROR_TYPE_UE) { + unsigned long pfn; + + if (get_paca()->in_mce < MAX_MCE_DEPTH) { + pfn = addr_to_pfn(regs, regs->nip); + if (pfn != ULONG_MAX) { + *phys_addr = + (pfn << PAGE_SHIFT); + handled = 1; + } + } + } + } return handled; } @@ -676,7 +691,8 @@ static long mce_handle_error(struct pt_regs *regs, handled = mce_handle_derror(regs, dtable, &mce_err, &addr, &phys_addr); else - handled = mce_handle_ierror(regs, itable, &mce_err, &addr); + handled = mce_handle_ierror(regs, itable, &mce_err, &addr, + &phys_addr); if (!handled && mce_err.error_type == MCE_ERROR_TYPE_UE) handled = mce_handle_ue_error(regs); -- 2.9.5
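The "one step" conversion is just the pfn-to-physical shift. A minimal sketch, assuming a 64K page size for illustration (ppc64 also supports 4K configurations):

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 16	/* assumed 64K pages for this sketch */

/* phys_addr = pfn << PAGE_SHIFT, as the patch computes from addr_to_pfn() */
static uint64_t pfn_to_phys(uint64_t pfn)
{
	return pfn << PAGE_SHIFT;
}
```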
[PATCH v4 3/5] powerpc/mce: Hookup derror (load/store) UE errors
Extract physical_address for UE errors by walking the page tables for the mm and address at the NIP, to extract the instruction. Then use the instruction to find the effective address via analyse_instr(). We might have page table walking races, but we expect them to be rare, the physical address extraction is best effort. The idea is to then hook up this infrastructure to memory failure eventually. Signed-off-by: Balbir SinghReviewed-by: Nicholas Piggin --- arch/powerpc/include/asm/exception-64s.h | 5 ++ arch/powerpc/include/asm/mce.h | 2 +- arch/powerpc/kernel/exceptions-64s.S | 2 +- arch/powerpc/kernel/mce.c| 6 ++- arch/powerpc/kernel/mce_power.c | 87 ++-- 5 files changed, 94 insertions(+), 8 deletions(-) diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h index 9a31897..b272052 100644 --- a/arch/powerpc/include/asm/exception-64s.h +++ b/arch/powerpc/include/asm/exception-64s.h @@ -55,6 +55,11 @@ #endif /* + * maximum recursive depth of MCE exceptions + */ +#define MAX_MCE_DEPTH 4 + +/* * EX_LR is only used in EXSLB and where it does not overlap with EX_DAR * EX_CCR similarly with DSISR, but being 4 byte registers there is a hole * in the save area so it's not necessary to overlap them. 
Could be used diff --git a/arch/powerpc/include/asm/mce.h b/arch/powerpc/include/asm/mce.h index 75292c7..3a1226e 100644 --- a/arch/powerpc/include/asm/mce.h +++ b/arch/powerpc/include/asm/mce.h @@ -204,7 +204,7 @@ struct mce_error_info { extern void save_mce_event(struct pt_regs *regs, long handled, struct mce_error_info *mce_err, uint64_t nip, - uint64_t addr); + uint64_t addr, uint64_t phys_addr); extern int get_mce_event(struct machine_check_event *mce, bool release); extern void release_mce_event(void); extern void machine_check_queue_event(void); diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index 48da0f5..3a6c8c8 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -232,7 +232,7 @@ BEGIN_FTR_SECTION addir10,r10,1 /* increment paca->in_mce */ sth r10,PACA_IN_MCE(r13) /* Limit nested MCE to level 4 to avoid stack overflow */ - cmpwi r10,4 + cmpwi r10,MAX_MCE_DEPTH bgt 2f /* Check if we hit limit of 4 */ std r11,GPR1(r1)/* Save r1 on the stack. 
*/ std r11,0(r1) /* make stack chain pointer */ diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c index fef1408..ee148f4 100644 --- a/arch/powerpc/kernel/mce.c +++ b/arch/powerpc/kernel/mce.c @@ -82,7 +82,7 @@ static void mce_set_error_info(struct machine_check_event *mce, */ void save_mce_event(struct pt_regs *regs, long handled, struct mce_error_info *mce_err, - uint64_t nip, uint64_t addr) + uint64_t nip, uint64_t addr, uint64_t phys_addr) { int index = __this_cpu_inc_return(mce_nest_count) - 1; struct machine_check_event *mce = this_cpu_ptr(&mce_event[index]); @@ -140,6 +140,10 @@ void save_mce_event(struct pt_regs *regs, long handled, } else if (mce->error_type == MCE_ERROR_TYPE_UE) { mce->u.ue_error.effective_address_provided = true; mce->u.ue_error.effective_address = addr; + if (phys_addr != ULONG_MAX) { + mce->u.ue_error.physical_address_provided = true; + mce->u.ue_error.physical_address = phys_addr; + } } return; } diff --git a/arch/powerpc/kernel/mce_power.c b/arch/powerpc/kernel/mce_power.c index b76ca19..0e584d5 100644 --- a/arch/powerpc/kernel/mce_power.c +++ b/arch/powerpc/kernel/mce_power.c @@ -27,6 +27,36 @@ #include #include #include +#include +#include +#include +#include + +/* + * Convert an address related to an mm to a PFN. NOTE: we are in real + * mode, we could potentially race with page table updates. + */ +static unsigned long addr_to_pfn(struct pt_regs *regs, unsigned long addr) +{ + pte_t *ptep; + unsigned long flags; + struct mm_struct *mm; + + if (user_mode(regs)) + mm = current->mm; + else + mm = &init_mm; + + local_irq_save(flags); + if (mm == current->mm) + ptep = find_current_mm_pte(mm->pgd, addr, NULL, NULL); + else + ptep = find_init_mm_pte(addr, NULL); + local_irq_restore(flags); + if (!ptep || pte_special(*ptep)) + return ULONG_MAX; + return pte_pfn(*ptep); +} static void flush_tlb_206(unsigned int num_sets, unsigned int action) { @@ -421,6 +451,41 @@ static const struct mce_derror_table
[PATCH v4 2/5] powerpc/mce: align the print of physical address better
Use the same alignment as Effective address and rename physical address to Page Frame Number Signed-off-by: Balbir Singh Reviewed-by: Nicholas Piggin --- arch/powerpc/kernel/mce.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c index e254399..fef1408 100644 --- a/arch/powerpc/kernel/mce.c +++ b/arch/powerpc/kernel/mce.c @@ -340,7 +340,7 @@ void machine_check_print_event_info(struct machine_check_event *evt, printk("%sEffective address: %016llx\n", level, evt->u.ue_error.effective_address); if (evt->u.ue_error.physical_address_provided) - printk("%s Physical address: %016llx\n", + printk("%sPhysical address: %016llx\n", level, evt->u.ue_error.physical_address); break; case MCE_ERROR_TYPE_SLB: -- 2.9.5
[PATCH v4 1/5] powerpc/mce.c: Remove unused function get_mce_fault_addr()
There are no users of get_mce_fault_addr() Fixes: b63a0ff ("powerpc/powernv: Machine check exception handling.") Signed-off-by: Balbir SinghReviewed-by: Nicholas Piggin --- arch/powerpc/include/asm/mce.h | 2 -- arch/powerpc/kernel/mce.c | 39 --- 2 files changed, 41 deletions(-) diff --git a/arch/powerpc/include/asm/mce.h b/arch/powerpc/include/asm/mce.h index 190d69a..75292c7 100644 --- a/arch/powerpc/include/asm/mce.h +++ b/arch/powerpc/include/asm/mce.h @@ -210,6 +210,4 @@ extern void release_mce_event(void); extern void machine_check_queue_event(void); extern void machine_check_print_event_info(struct machine_check_event *evt, bool user_mode); -extern uint64_t get_mce_fault_addr(struct machine_check_event *evt); - #endif /* __ASM_PPC64_MCE_H__ */ diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c index 9b2ea7e..e254399 100644 --- a/arch/powerpc/kernel/mce.c +++ b/arch/powerpc/kernel/mce.c @@ -411,45 +411,6 @@ void machine_check_print_event_info(struct machine_check_event *evt, } EXPORT_SYMBOL_GPL(machine_check_print_event_info); -uint64_t get_mce_fault_addr(struct machine_check_event *evt) -{ - switch (evt->error_type) { - case MCE_ERROR_TYPE_UE: - if (evt->u.ue_error.effective_address_provided) - return evt->u.ue_error.effective_address; - break; - case MCE_ERROR_TYPE_SLB: - if (evt->u.slb_error.effective_address_provided) - return evt->u.slb_error.effective_address; - break; - case MCE_ERROR_TYPE_ERAT: - if (evt->u.erat_error.effective_address_provided) - return evt->u.erat_error.effective_address; - break; - case MCE_ERROR_TYPE_TLB: - if (evt->u.tlb_error.effective_address_provided) - return evt->u.tlb_error.effective_address; - break; - case MCE_ERROR_TYPE_USER: - if (evt->u.user_error.effective_address_provided) - return evt->u.user_error.effective_address; - break; - case MCE_ERROR_TYPE_RA: - if (evt->u.ra_error.effective_address_provided) - return evt->u.ra_error.effective_address; - break; - case MCE_ERROR_TYPE_LINK: - if 
(evt->u.link_error.effective_address_provided) - return evt->u.link_error.effective_address; - break; - default: - case MCE_ERROR_TYPE_UNKNOWN: - break; - } - return 0; -} -EXPORT_SYMBOL(get_mce_fault_addr); - /* * This function is called in real mode. Strictly no printk's please. * -- 2.9.5
[PATCH v4 0/5] Revisit MCE handling for UE Errors
This patch series is designed to hook up memory_failure() on UE errors; this is especially helpful for user-mode UE errors. The first two patches are cleanups that remove dead code: I could not find any users of get_mce_fault_addr(), and the second patch improves the printing of the physical address. The third patch walks kernel/user mode page tables in real mode to extract the effective address of the instruction that caused the UE error and the effective address it was trying to access (for load/store). The fourth patch hooks up the pfn for instruction UE errors (ierror). The fifth patch hooks up memory_failure() to the MCE path. TODO: Log the address in NVRAM, so that we can recover from bad pages at boot and keep the blacklist persistent. Changelog v4: - Add a #define for MAX_MCE_DEPTH instead of hard coding the value - Refactor addr_to_pfn to deduce the right parameters Changelog v3: - Check for recursive MCE invocations (suggested by Nick) Changelog v2: - Remove pr_warn from real mode to a more delayed context, where it's OK to warn. - Add a trivial patch to align prints of physical and effective address Changelog v1: - Address review comments from Nick and Mahesh (initialization of pfn and more comments on failure when addr_to_pfn() or analyse_instr() fail) - Hookup ierrors to the framework as well (comments from Mahesh) Balbir Singh (5): powerpc/mce.c: Remove unused function get_mce_fault_addr() powerpc/mce: align the print of physical address better powerpc/mce: Hookup derror (load/store) UE errors powerpc/mce: Hookup ierror (instruction) UE errors powerpc/mce: hookup memory_failure for UE errors arch/powerpc/include/asm/exception-64s.h | 5 ++ arch/powerpc/include/asm/mce.h | 4 +- arch/powerpc/kernel/exceptions-64s.S | 2 +- arch/powerpc/kernel/mce.c| 117 +++ arch/powerpc/kernel/mce_power.c | 109 +--- 5 files changed, 181 insertions(+), 56 deletions(-) -- 2.9.5
[PATCH] powerpc/powernv: Make opal_event_shutdown() callable from IRQ context
In opal_event_shutdown() we free all the IRQs hanging off the opal_event_irqchip. However it's not safe to do so if we're called from IRQ context, because free_irq() wants to synchronise versus IRQ context. This can lead to warnings and a stuck system. For example from sysrq-b: Trying to free IRQ 17 from IRQ context! [ cut here ] WARNING: CPU: 0 PID: 0 at kernel/irq/manage.c:1461 __free_irq+0x398/0x8d0 ... NIP __free_irq+0x398/0x8d0 LR __free_irq+0x394/0x8d0 Call Trace: __free_irq+0x394/0x8d0 (unreliable) free_irq+0xa4/0x140 opal_event_shutdown+0x128/0x180 opal_shutdown+0x1c/0xb0 pnv_shutdown+0x20/0x40 machine_restart+0x38/0x90 emergency_restart+0x28/0x40 sysrq_handle_reboot+0x24/0x40 __handle_sysrq+0x198/0x590 hvc_poll+0x48c/0x8c0 hvc_handle_interrupt+0x1c/0x50 __handle_irq_event_percpu+0xe8/0x6e0 handle_irq_event_percpu+0x34/0xe0 handle_irq_event+0xc4/0x210 handle_level_irq+0x250/0x770 generic_handle_irq+0x5c/0xa0 opal_handle_events+0x11c/0x240 opal_interrupt+0x38/0x50 __handle_irq_event_percpu+0xe8/0x6e0 handle_irq_event_percpu+0x34/0xe0 handle_irq_event+0xc4/0x210 handle_fasteoi_irq+0x174/0xa10 generic_handle_irq+0x5c/0xa0 __do_irq+0xbc/0x4e0 call_do_irq+0x14/0x24 do_IRQ+0x18c/0x540 hardware_interrupt_common+0x158/0x180 We can avoid that by using disable_irq_nosync() rather than free_irq(). Although it doesn't fully free the IRQ, it should be sufficient when we're shutting down, particularly in an emergency. Add an in_interrupt() check and use free_irq() when we're shutting down normally. It's probably OK to use disable_irq_nosync() in that case too, but for now it's safer to leave that behaviour as-is. 
Fixes: 9f0fd0499d30 ("powerpc/powernv: Add a virtual irqchip for opal events") Signed-off-by: Michael Ellerman--- arch/powerpc/platforms/powernv/opal-irqchip.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/platforms/powernv/opal-irqchip.c b/arch/powerpc/platforms/powernv/opal-irqchip.c index ecdcba9d1220..9d1b8c0aaf93 100644 --- a/arch/powerpc/platforms/powernv/opal-irqchip.c +++ b/arch/powerpc/platforms/powernv/opal-irqchip.c @@ -174,8 +174,14 @@ void opal_event_shutdown(void) /* First free interrupts, which will also mask them */ for (i = 0; i < opal_irq_count; i++) { - if (opal_irqs[i]) + if (!opal_irqs[i]) + continue; + + if (in_interrupt()) + disable_irq_nosync(opal_irqs[i]); + else free_irq(opal_irqs[i], NULL); + opal_irqs[i] = 0; } } -- 2.7.4
[PATCH] powerpc: Fix workaround for spurious MCE on POWER9
In the recent commit: d8bd9f3f09 powerpc: Handle MCE on POWER9 with only DSISR bit 30 set I screwed up the bit. It should be bit 25 (IBM bit 38). Signed-off-by: Michael Neuling--- arch/powerpc/kernel/mce_power.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kernel/mce_power.c b/arch/powerpc/kernel/mce_power.c index f523125b9d..72f153c6f3 100644 --- a/arch/powerpc/kernel/mce_power.c +++ b/arch/powerpc/kernel/mce_power.c @@ -626,7 +626,7 @@ long __machine_check_early_realmode_p9(struct pt_regs *regs) { /* * On POWER9 DD2.1 and below, it's possible to get a machine check -* caused by a paste instruction where only DSISR bit 30 is set. This +* caused by a paste instruction where only DSISR bit 25 is set. This * will result in the MCE handler seeing an unknown event and the kernel * crashing. An MCE that occurs like this is spurious, so we don't need * to do anything in terms of servicing it. If there is something that @@ -634,7 +634,7 @@ long __machine_check_early_realmode_p9(struct pt_regs *regs) * correct DSISR so that it can be serviced properly. So detect this * case and mark it as handled. */ - if (SRR1_MC_LOADSTORE(regs->msr) && regs->dsisr == 0x4000) + if (SRR1_MC_LOADSTORE(regs->msr) && regs->dsisr == 0x0200) return 1; return mce_handle_error(regs, mce_p9_derror_table, mce_p9_ierror_table); -- 2.11.0
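The two bit numbers in the commit message are the same bit in two numbering schemes. A small sketch of the conversion (IBM/big-endian numbering counts bit 0 as the MSB, so for a 64-bit register IBM bit b is `1 << (63 - b)`):

```c
#include <assert.h>
#include <stdint.h>

/* Convert an IBM (big-endian) bit number of a 64-bit register to a mask. */
static uint64_t ibm_bit64(unsigned int b)
{
	return 1ull << (63 - b);
}
```

So IBM bit 38 is `1 << 25`, i.e. "bit 25" in LSB-first terms, matching the commit message.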
[PATCH v2 6/6] powerpc/powernv: implement NMI IPI with OPAL_SIGNAL_SYSTEM_RESET
This allows MSR[EE]=0 lockups to be detected on an OPAL (bare metal) system similarly to the hcall NMI IPI on pseries guests, when the platform/firmware supports it. This is an example of CPU10 spinning with interrupts hard disabled: Watchdog CPU:32 detected Hard LOCKUP other CPUS:10 Watchdog CPU:10 Hard LOCKUP CPU: 10 PID: 4410 Comm: bash Not tainted 4.13.0-rc7-00074-ge89ce1f89f62-dirty #34 task: c003a82b4400 task.stack: c003af55c000 NIP: c00a7b38 LR: c0659044 CTR: c00a7b00 REGS: cfd23d80 TRAP: 0100 Not tainted (4.13.0-rc7-00074-ge89ce1f89f62-dirty) MSR: 900c1033CR: 2842 XER: 2000 CFAR: c00a7b38 SOFTE: 0 GPR00: c0659044 c003af55fbb0 c1072a00 0078 GPR04: c003c81b5c80 c003c81cc7e8 90009033 GPR08: c00a7b00 0001 90001003 GPR12: c00a7b00 cfd83200 10180df8 10189e60 GPR16: 10189ed8 10151270 1018bd88 1018de78 GPR20: 370a0668 0001 101645e0 10163c10 GPR24: 7fffd14d6294 7fffd14d6290 c0fba6f0 0004 GPR28: c0f351d8 0078 c0f4095c NIP [c00a7b38] sysrq_handle_xmon+0x38/0x40 LR [c0659044] __handle_sysrq+0xe4/0x270 Call Trace: [c003af55fbd0] [c0659044] __handle_sysrq+0xe4/0x270 [c003af55fc70] [c0659810] write_sysrq_trigger+0x70/0xa0 [c003af55fca0] [c03da650] proc_reg_write+0xb0/0x110 [c003af55fcf0] [c03423bc] __vfs_write+0x6c/0x1b0 [c003af55fd90] [c0344398] vfs_write+0xd8/0x240 [c003af55fde0] [c034632c] SyS_write+0x6c/0x110 [c003af55fe30] [c000b220] system_call+0x58/0x6c Signed-off-by: Nicholas Piggin --- arch/powerpc/include/asm/opal-api.h| 1 + arch/powerpc/include/asm/opal.h| 2 + arch/powerpc/platforms/powernv/opal-wrappers.S | 1 + arch/powerpc/platforms/powernv/setup.c | 1 + arch/powerpc/platforms/powernv/smp.c | 52 ++ 5 files changed, 57 insertions(+) diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h index 450a60b81d2a..9d191ebea706 100644 --- a/arch/powerpc/include/asm/opal-api.h +++ b/arch/powerpc/include/asm/opal-api.h @@ -188,6 +188,7 @@ #define OPAL_XIVE_DUMP 142 #define OPAL_XIVE_RESERVED3143 #define OPAL_XIVE_RESERVED4144 +#define 
OPAL_SIGNAL_SYSTEM_RESET 145 #define OPAL_NPU_INIT_CONTEXT 146 #define OPAL_NPU_DESTROY_CONTEXT 147 #define OPAL_NPU_MAP_LPAR 148 diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h index 726c23304a57..7d7613c49f2b 100644 --- a/arch/powerpc/include/asm/opal.h +++ b/arch/powerpc/include/asm/opal.h @@ -281,6 +281,8 @@ int opal_get_power_shift_ratio(u32 handle, int token, u32 *psr); int opal_set_power_shift_ratio(u32 handle, int token, u32 psr); int opal_sensor_group_clear(u32 group_hndl, int token); +int64_t opal_signal_system_reset(int32_t cpu); + /* Internal functions */ extern int early_init_dt_scan_opal(unsigned long node, const char *uname, int depth, void *data); diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S b/arch/powerpc/platforms/powernv/opal-wrappers.S index 8c1ede2d3f7e..37cd170201a2 100644 --- a/arch/powerpc/platforms/powernv/opal-wrappers.S +++ b/arch/powerpc/platforms/powernv/opal-wrappers.S @@ -307,6 +307,7 @@ OPAL_CALL(opal_xive_get_vp_info, OPAL_XIVE_GET_VP_INFO); OPAL_CALL(opal_xive_set_vp_info, OPAL_XIVE_SET_VP_INFO); OPAL_CALL(opal_xive_sync, OPAL_XIVE_SYNC); OPAL_CALL(opal_xive_dump, OPAL_XIVE_DUMP); +OPAL_CALL(opal_signal_system_reset,OPAL_SIGNAL_SYSTEM_RESET); OPAL_CALL(opal_npu_init_context, OPAL_NPU_INIT_CONTEXT); OPAL_CALL(opal_npu_destroy_context,OPAL_NPU_DESTROY_CONTEXT); OPAL_CALL(opal_npu_map_lpar, OPAL_NPU_MAP_LPAR); diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c index 897aa1400eb8..cf52d53da460 100644 --- a/arch/powerpc/platforms/powernv/setup.c +++ b/arch/powerpc/platforms/powernv/setup.c @@ -282,6 +282,7 @@ static void __init pnv_setup_machdep_opal(void) ppc_md.restart = pnv_restart; pm_power_off = pnv_power_off; ppc_md.halt = pnv_halt; + /* ppc_md.system_reset_exception gets filled in by pnv_smp_init() */ ppc_md.machine_check_exception = opal_machine_check; ppc_md.mce_check_early_recovery = opal_mce_check_early_recovery; ppc_md.hmi_exception_early 
= opal_hmi_exception_early; diff --git
[PATCH v2 5/6] powerpc/64s: Implement system reset idle wakeup reason
It is possible to wake from idle due to a system reset exception, in which case the CPU takes a system reset interrupt to wake from idle, with system reset as the wakeup reason. The regular (not idle wakeup) system reset interrupt handler must be invoked in this case, otherwise the system reset interrupt is lost. Handle the system reset interrupt immediately after CPU state has been restored. Signed-off-by: Nicholas Piggin --- arch/powerpc/kernel/irq.c | 41 ++--- 1 file changed, 38 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c index 4e65bf82f5e0..4813b83b22aa 100644 --- a/arch/powerpc/kernel/irq.c +++ b/arch/powerpc/kernel/irq.c @@ -394,11 +394,19 @@ bool prep_irq_for_idle_irqsoff(void) /* * Take the SRR1 wakeup reason, index into this table to find the * appropriate irq_happened bit. + * + * System reset exceptions taken in idle state also come through here, + * but they are NMI interrupts so do not need to wait for IRQs to be + * restored, and should be taken as early as practical. These are marked + * with 0xff in the table. The Power ISA specifies 0100b as the system + * reset interrupt reason. */ +#define IRQ_SYSTEM_RESET 0xff + static const u8 srr1_to_lazyirq[0x10] = { 0, 0, 0, PACA_IRQ_DBELL, - 0, + IRQ_SYSTEM_RESET, PACA_IRQ_DBELL, PACA_IRQ_DEC, 0, @@ -407,15 +415,42 @@ static const u8 srr1_to_lazyirq[0x10] = { PACA_IRQ_HMI, 0, 0, 0, 0, 0 }; +static noinline void replay_system_reset(void) +{ + struct pt_regs regs; + + ppc_save_regs(&regs); + regs.trap = 0x100; + get_paca()->in_nmi = 1; + system_reset_exception(&regs); + get_paca()->in_nmi = 0; +} + void irq_set_pending_from_srr1(unsigned long srr1) { unsigned int idx = (srr1 & SRR1_WAKEMASK_P8) >> 18; + u8 reason = srr1_to_lazyirq[idx]; + + /* +* Take the system reset now, which is immediately after registers +* are restored from idle. It's an NMI, so interrupts need not be +* re-enabled before it is taken. 
+*/ + if (unlikely(reason == IRQ_SYSTEM_RESET)) { + replay_system_reset(); + return; + } /* * The 0 index (SRR1[42:45]=b) must always evaluate to 0, -* so this can be called unconditionally with srr1 wake reason. +* so this can be called unconditionally with the SRR1 wake +* reason as returned by the idle code, which uses 0 to mean no +* interrupt. +* +* If a future CPU was to designate this as an interrupt reason, +* then a new index for no interrupt must be assigned. */ - local_paca->irq_happened |= srr1_to_lazyirq[idx]; + local_paca->irq_happened |= reason; } #endif /* CONFIG_PPC_BOOK3S */ -- 2.13.3
[PATCH v2 4/6] powerpc/xmon: avoid tripping SMP hardlockup watchdog
The SMP hardlockup watchdog cross-checks other CPUs for lockups, which causes xmon headaches because xmon assumes that hard-disabling interrupts is enough to keep the watchdog away. Try to improve that by calling touch_nmi_watchdog() in the obvious places where secondaries are spinning. Also annotate these spin loops with spin_begin/end calls.

Signed-off-by: Nicholas Piggin
---
 arch/powerpc/xmon/xmon.c | 17 +
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index 33351c6704b1..d9a12102b111 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -530,14 +530,19 @@ static int xmon_core(struct pt_regs *regs, int fromipi)
 
  waiting:
 	secondary = 1;
+	spin_begin();
 	while (secondary && !xmon_gate) {
 		if (in_xmon == 0) {
-			if (fromipi)
+			if (fromipi) {
+				spin_end();
 				goto leave;
+			}
 			secondary = test_and_set_bit(0, &in_xmon);
 		}
-		barrier();
+		spin_cpu_relax();
+		touch_nmi_watchdog();
 	}
+	spin_end();
 
 	if (!secondary && !xmon_gate) {
 		/* we are the first cpu to come in */
@@ -568,21 +573,25 @@ static int xmon_core(struct pt_regs *regs, int fromipi)
 		mb();
 		xmon_gate = 1;
 		barrier();
+		touch_nmi_watchdog();
 	}
 
 cmdloop:
 	while (in_xmon) {
 		if (secondary) {
+			spin_begin();
 			if (cpu == xmon_owner) {
 				if (!test_and_set_bit(0, &xmon_taken)) {
 					secondary = 0;
+					spin_end();
 					continue;
 				}
 				/* missed it */
 				while (cpu == xmon_owner)
-					barrier();
+					spin_cpu_relax();
 			}
-			barrier();
+			spin_cpu_relax();
+			touch_nmi_watchdog();
 		} else {
 			cmd = cmds(regs);
 			if (cmd != 0) {
-- 
2.13.3
[PATCH v2 3/6] powerpc/watchdog: do not trigger SMP crash from touch_nmi_watchdog
In xmon, touch_nmi_watchdog() is not expected to be checking that other CPUs have not touched the watchdog, so the code will just call touch_nmi_watchdog() once before re-enabling hard interrupts. So just update our own CPU's state, and ignore apparently stuck SMP threads.

Arguably touch_nmi_watchdog() should check for SMP lockups, and callers should be fixed, but that's not trivial for the input code of xmon.
---
 arch/powerpc/kernel/watchdog.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
index 920e61c79f47..1fb9379dc683 100644
--- a/arch/powerpc/kernel/watchdog.c
+++ b/arch/powerpc/kernel/watchdog.c
@@ -277,9 +277,12 @@ void arch_touch_nmi_watchdog(void)
 {
 	unsigned long ticks = tb_ticks_per_usec * wd_timer_period_ms * 1000;
 	int cpu = smp_processor_id();
+	u64 tb = get_tb();
 
-	if (get_tb() - per_cpu(wd_timer_tb, cpu) >= ticks)
-		watchdog_timer_interrupt(cpu);
+	if (tb - per_cpu(wd_timer_tb, cpu) >= ticks) {
+		per_cpu(wd_timer_tb, cpu) = tb;
+		wd_smp_clear_cpu_pending(cpu, tb);
+	}
 }
 EXPORT_SYMBOL(arch_touch_nmi_watchdog);
-- 
2.13.3
[PATCH v2 2/6] powerpc/watchdog: do not backtrace locked CPUs twice if allcpus backtrace is enabled
If sysctl_hardlockup_all_cpu_backtrace is enabled, there is no need to IPI stuck CPUs for backtrace before trigger_allbutself_cpu_backtrace(), which does the same thing again.

Signed-off-by: Nicholas Piggin
---
 arch/powerpc/kernel/watchdog.c | 19 +++
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
index 532a1adbe89b..920e61c79f47 100644
--- a/arch/powerpc/kernel/watchdog.c
+++ b/arch/powerpc/kernel/watchdog.c
@@ -133,15 +133,18 @@ static void watchdog_smp_panic(int cpu, u64 tb)
 	pr_emerg("Watchdog CPU:%d detected Hard LOCKUP other CPUS:%*pbl\n",
 		 cpu, cpumask_pr_args(&wd_smp_cpus_pending));
 
-	/*
-	 * Try to trigger the stuck CPUs.
-	 */
-	for_each_cpu(c, &wd_smp_cpus_pending) {
-		if (c == cpu)
-			continue;
-		smp_send_nmi_ipi(c, wd_lockup_ipi, 1000000);
+	if (!sysctl_hardlockup_all_cpu_backtrace) {
+		/*
+		 * Try to trigger the stuck CPUs, unless we are going to
+		 * get a backtrace on all of them anyway.
+		 */
+		for_each_cpu(c, &wd_smp_cpus_pending) {
+			if (c == cpu)
+				continue;
+			smp_send_nmi_ipi(c, wd_lockup_ipi, 1000000);
+		}
+		smp_flush_nmi_ipi(1000000);
 	}
-	smp_flush_nmi_ipi(1000000);
 
 	/* Take the stuck CPUs out of the watch group */
 	set_cpumask_stuck(&wd_smp_cpus_pending, tb);
-- 
2.13.3
[PATCH v2 1/6] powerpc/watchdog: do not panic from locked CPU's IPI handler
The SMP watchdog will detect locked CPUs and IPI them to print a backtrace and registers. If panic on hard lockup is enabled, do not panic from this handler, because that can cause recursion into the IPI layer during the panic. The caller already panics in this case.

Signed-off-by: Nicholas Piggin
---
 arch/powerpc/kernel/watchdog.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
index 2f6eadd9408d..532a1adbe89b 100644
--- a/arch/powerpc/kernel/watchdog.c
+++ b/arch/powerpc/kernel/watchdog.c
@@ -97,8 +97,7 @@ static void wd_lockup_ipi(struct pt_regs *regs)
 	else
 		dump_stack();
 
-	if (hardlockup_panic)
-		nmi_panic(regs, "Hard LOCKUP");
+	/* Do not panic from here because that can recurse into NMI IPI layer */
 }
 
 static void set_cpumask_stuck(const struct cpumask *cpumask, u64 tb)
-- 
2.13.3
[PATCH v2 0/6] More NMI IPI enablement work
Since last post, the first 4 patches are unchanged. Split the last patch into 2, tidied up a few things, and removed the DD1 workarounds because firmware is not going to support DD1. Then re-tested with upstream firmware which has now merged support for OPAL_SIGNAL_SYSTEM_RESET on POWER9 DD2.

Nicholas Piggin (6):
  powerpc/watchdog: do not panic from locked CPU's IPI handler
  powerpc/watchdog: do not backtrace locked CPUs twice if allcpus
    backtrace is enabled
  powerpc/watchdog: do not trigger SMP crash from touch_nmi_watchdog
  powerpc/xmon: avoid tripping SMP hardlockup watchdog
  powerpc/64s: Implement system reset idle wakeup reason
  powerpc/powernv: implement NMI IPI with OPAL_SIGNAL_SYSTEM_RESET

 arch/powerpc/include/asm/opal-api.h            |  1 +
 arch/powerpc/include/asm/opal.h                |  2 +
 arch/powerpc/kernel/irq.c                      | 41 ++--
 arch/powerpc/kernel/watchdog.c                 | 29 --
 arch/powerpc/platforms/powernv/opal-wrappers.S |  1 +
 arch/powerpc/platforms/powernv/setup.c         |  1 +
 arch/powerpc/platforms/powernv/smp.c           | 52 ++
 arch/powerpc/xmon/xmon.c                       | 17 +++--
 8 files changed, 125 insertions(+), 19 deletions(-)

-- 
2.13.3
Re: [PATCH kernel v3] vfio/spapr: Add cond_resched() for huge updates
On Thu, Sep 28, 2017 at 07:16:12PM +1000, Alexey Kardashevskiy wrote:
> Clearing very big IOMMU tables can trigger soft lockups. This adds
> cond_resched() to allow the scheduler to do context switching when
> it decides to.
> 
> Signed-off-by: Alexey Kardashevskiy

Reviewed-by: David Gibson

> ---
> 
> The testcase is POWER9 box with 264GB guest, 4 VFIO devices from
> independent IOMMU groups, 64K IOMMU pages. This configuration produces
> 4325376 TCE entries, each entry update incurs 4 OPAL calls to update
> an individual PE TCE cache; this produced lockups for more than 20s.
> Reducing table size to 4194304 (i.e. 256GB guest) or removing one
> of 4 VFIO devices makes the problem go away.
> 
> ---
> Changes:
> v3:
> * cond_resched() checks for should_resched() so we just call resched()
> and let the cpu scheduler decide whether to switch or not
> 
> v2:
> * replaced with time based solution
> ---
> drivers/vfio/vfio_iommu_spapr_tce.c | 2 ++
> 1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
> index 63112c36ab2d..759a5bdd40e1 100644
> --- a/drivers/vfio/vfio_iommu_spapr_tce.c
> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
> @@ -507,6 +507,8 @@ static int tce_iommu_clear(struct tce_container *container,
> 	enum dma_data_direction direction;
> 
> 	for ( ; pages; --pages, ++entry) {
> +		cond_resched();
> +
> 		direction = DMA_NONE;
> 		oldhpa = 0;
> 		ret = iommu_tce_xchg(tbl, entry, &oldhpa, &direction);

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
[PATCH] powerpc/vio: dispose of virq mapping on vdevice unregister
When a vdevice is DLPAR removed from the system the vio subsystem doesn't bother unmapping the virq from the irq_domain. As a result we have a virq mapped to a hardware irq that is no longer valid for the irq_domain. A side effect is that we are left with stale /proc/irq/<irq> affinity entries, and attempts to modify the smp_affinity of the irq will fail.

In the following observed example the kernel log is spammed by ics_rtas_set_affinity errors after the removal of a VSCSI adapter. This is a result of irqbalance trying to adjust the affinity every 10 seconds.

rpadlpar_io: slot U8408.E8E.10A7ACV-V5-C25 removed
ics_rtas_set_affinity: ibm,set-xive irq=655385 returns -3
ics_rtas_set_affinity: ibm,set-xive irq=655385 returns -3

This patch fixes the issue by calling irq_dispose_mapping() on the virq of the viodev on unregister.

Cc: sta...@vger.kernel.org
Signed-off-by: Tyrel Datwyler
---
 arch/powerpc/kernel/vio.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/kernel/vio.c b/arch/powerpc/kernel/vio.c
index 5f8dcda..a3228d6 100644
--- a/arch/powerpc/kernel/vio.c
+++ b/arch/powerpc/kernel/vio.c
@@ -1577,6 +1577,8 @@ static struct device_attribute vio_dev_attrs[] = {
 void vio_unregister_device(struct vio_dev *viodev)
 {
 	device_unregister(&viodev->dev);
+	if (viodev->family == VDEVICE)
+		irq_dispose_mapping(viodev->irq);
 }
 EXPORT_SYMBOL(vio_unregister_device);
-- 
1.7.12.4
Re: [PATCH v3 00/20] Speculative page faults
On Thu, 28 Sep 2017 14:29:02 +0200 Laurent Dufour wrote:

> > Laurent's [0/n] provides some nice-looking performance benefits for
> > workloads which are chosen to show performance benefits(!) but, alas,
> > no quantitative testing results for workloads which we may suspect will
> > be harmed by the changes(?). Even things as simple as impact upon
> > single-threaded pagefault-intensive workloads and its effect upon
> > CONFIG_SMP=n .text size?
> 
> I forgot to mention in my previous email the impact on the .text section.
> 
> Here are the metrics I got:
> 
> .text size	UP		SMP		Delta
> 4.13-mmotm	8444201		8964137		6.16%
> '' +spf	8452041		8971929		6.15%
> Delta		0.09%		0.09%
> 
> No major impact as you could see.

8k text increase seems rather a lot actually. That's a lot more userspace cachelines that get evicted during a fault...

Is the feature actually beneficial on uniprocessor?
[RFC PATCH for 4.14] membarrier powerpc: Move hook to switch_mm()
Nick has a valid point that the sched_in() hook is a fast-path compared to switch_mm(). Adding an extra TIF test in a fast-path to save a barrier in a comparatively slow-path is therefore not such a good idea overall. Therefore, move the architecture hook to switch_mm() instead.

[ This patch is aimed at Paul's tree. It applies on top of "membarrier: Provide register expedited private command (v4)" and "membarrier: Document scheduler barrier requirements (v4)". ]

Signed-off-by: Mathieu Desnoyers
CC: Peter Zijlstra
CC: Paul E. McKenney
CC: Nicholas Piggin
CC: Boqun Feng
CC: Andrew Hunter
CC: Maged Michael
CC: gro...@google.com
CC: Avi Kivity
CC: Benjamin Herrenschmidt
CC: Paul Mackerras
CC: Michael Ellerman
CC: Dave Watson
CC: Alan Stern
CC: Will Deacon
CC: Andy Lutomirski
CC: Ingo Molnar
CC: Alexander Viro
CC: linuxppc-dev@lists.ozlabs.org
CC: linux-a...@vger.kernel.org
---
 arch/powerpc/include/asm/membarrier.h |  9 -
 arch/powerpc/mm/mmu_context.c         |  7 +++
 include/linux/sched/mm.h              | 13 -
 kernel/sched/core.c                   |  6 ++
 4 files changed, 17 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/include/asm/membarrier.h b/arch/powerpc/include/asm/membarrier.h
index 588154c1cf57..61152a7a3cf9 100644
--- a/arch/powerpc/include/asm/membarrier.h
+++ b/arch/powerpc/include/asm/membarrier.h
@@ -1,8 +1,8 @@
 #ifndef _ASM_POWERPC_MEMBARRIER_H
 #define _ASM_POWERPC_MEMBARRIER_H
 
-static inline void membarrier_arch_sched_in(struct task_struct *prev,
-		struct task_struct *next)
+static inline void membarrier_arch_switch_mm(struct mm_struct *prev,
+		struct mm_struct *next, struct task_struct *tsk)
 {
 	/*
 	 * Only need the full barrier when switching between processes.
@@ -11,9 +11,8 @@ static inline void membarrier_arch_sched_in(struct task_struct *prev,
 	 * when switching from userspace to kernel is not needed after
 	 * store to rq->curr.
 	 */
-	if (likely(!test_ti_thread_flag(task_thread_info(next),
-			TIF_MEMBARRIER_PRIVATE_EXPEDITED)
-			|| !prev->mm || prev->mm == next->mm))
+	if (likely(!test_ti_thread_flag(task_thread_info(tsk),
+			TIF_MEMBARRIER_PRIVATE_EXPEDITED) || !prev))
 		return;
 
 	/*
diff --git a/arch/powerpc/mm/mmu_context.c b/arch/powerpc/mm/mmu_context.c
index 0f613bc63c50..22f5c91cdc38 100644
--- a/arch/powerpc/mm/mmu_context.c
+++ b/arch/powerpc/mm/mmu_context.c
@@ -12,6 +12,7 @@
 #include
 #include
+#include
 
 #include
@@ -67,6 +68,10 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 	 *
 	 * On the read side the barrier is in pte_xchg(), which orders
 	 * the store to the PTE vs the load of mm_cpumask.
+	 *
+	 * This full barrier is needed by membarrier when switching
+	 * between processes after store to rq->curr, before user-space
+	 * memory accesses.
 	 */
 	smp_mb();
 
@@ -89,6 +94,8 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 
 	if (new_on_cpu)
 		radix_kvm_prefetch_workaround(next);
+	else
+		membarrier_arch_switch_mm(prev, next, tsk);
 
 	/*
 	 * The actual HW switching method differs between the various
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index 1bd10c2c0893..d5a9ab8f3836 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -215,8 +215,8 @@ static inline void memalloc_noreclaim_restore(unsigned int flags)
 #ifdef CONFIG_ARCH_HAS_MEMBARRIER_HOOKS
 #include
 #else
-static inline void membarrier_arch_sched_in(struct task_struct *prev,
-		struct task_struct *next)
+static inline void membarrier_arch_switch_mm(struct mm_struct *prev,
+		struct mm_struct *next, struct task_struct *tsk)
 {
 }
 static inline void membarrier_arch_fork(struct task_struct *t,
@@ -232,11 +232,6 @@ static inline void membarrier_arch_register_private_expedited(
 }
 #endif
 
-static inline void membarrier_sched_in(struct task_struct *prev,
-		struct task_struct *next)
-{
-	membarrier_arch_sched_in(prev, next);
-}
 static inline void membarrier_fork(struct task_struct *t,
 		unsigned long clone_flags)
 {
@@ -252,8 +247,8 @@ static inline void membarrier_execve(struct task_struct *t)
 	membarrier_arch_execve(t);
 }
 #else
-static inline void
[PATCH v2] powerpc: fix compile error on 64K pages on 40x, 44x
The mmu context on the 40x, 44x does not define a pte_frag entry. This causes gcc to abort the compilation due to:

setup-common.c: In function ‘setup_arch’:
setup-common.c:908: error: ‘mm_context_t’ has no ‘pte_frag’

This patch fixes the issue by adding additional guard conditions that limit the initialization to just the platforms that define the pte_frag member in their mmu context.

Signed-off-by: Christian Lamparter
---
 arch/powerpc/kernel/setup-common.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
index 0ac741fae90e..4a22ec6b34de 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -904,7 +904,8 @@ void __init setup_arch(char **cmdline_p)
 #endif
 #endif
 
-#ifdef CONFIG_PPC_64K_PAGES
+#if defined(CONFIG_PPC_64K_PAGES) && \
+    (defined(CONFIG_PPC_BOOK3S_64) || defined(CONFIG_PPC_BOOK3E_MMU))
 	init_mm.context.pte_frag = NULL;
 #endif
 #ifdef CONFIG_SPAPR_TCE_IOMMU
-- 
2.14.2
Re: [PATCH v4 for 4.14 1/3] membarrier: Provide register expedited private command
- On Sep 28, 2017, at 12:16 PM, Nicholas Piggin npig...@gmail.com wrote: > On Thu, 28 Sep 2017 15:29:50 + (UTC) > Mathieu Desnoyerswrote: > >> - On Sep 28, 2017, at 11:01 AM, Nicholas Piggin npig...@gmail.com wrote: >> >> > On Thu, 28 Sep 2017 13:31:36 + (UTC) >> > Mathieu Desnoyers wrote: >> > >> >> - On Sep 27, 2017, at 9:04 AM, Nicholas Piggin npig...@gmail.com >> >> wrote: >> >> > > [snip] > >> >> So I don't see much point in trying to remove that registration step. >> > >> > I don't follow you. You are talking about the concept of registering >> > intention to use a different function? And the registration API is not >> > merged yet? >> >> Yes, I'm talking about requiring processes to invoke membarrier cmd >> MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED before they can successfully >> invoke membarrier cmd MEMBARRIER_CMD_PRIVATE_EXPEDITED. >> >> > Let me say I'm not completely against the idea of a registration API. But >> > don't think registration for this expedited command is necessary. >> >> Given that we have the powerpc lack-of-full-barrier-on-return-to-userspace >> case now, and we foresee x86-sysexit, sparc, and alpha also requiring >> special treatment when we introduce the MEMBARRIER_FLAG_SYNC_CORE behavior >> in the next release, it seems that we'll have a hard time handling >> architecture special cases efficiently if we don't expose the registration >> API right away. > > But SYNC_CORE is a different functionality, right? You can add the > registration API for it when that goes in. Sure, I could. However, I was hoping to re-use the same command, with a "SYNC_CORE" flag, and I would have liked to have consistent behavior for same commands used with different flags. > >> > But (aside) let's say a tif flag turns out to be a good diea for your >> > second case, why not just check the flag in the membarrier sys call and >> > do the registration the first time it uses it? >> >> We also considered that option. 
It's mainly about guaranteeing that >> an expedited membarrier command never blocks. If we introduce this >> "lazy auto-registration" behavior, we end up blocking the process >> at a random point in its execution so we can issue a synchronize_sched(). >> By exposing an explicit registration, we can control where this delay >> occurs, and even allow library constructors to invoke the registration >> while the process is a single threaded, therefore allowing us to completely >> skip synchronize_sched(). > > Okay I guess that could be a good reason. As I said I'm not opposed to > the concept. I suppose you could even have a registration for expedited > private even if it's a no-op on all architectures, just in case some new > ways of implementing it can be done in future. That's an approach I would be OK with too. Mandating explicit registration will give us much more flexibility. > I suppose I'm more objecting to the added complexity for powerpc, and > more code in the fastpath to make the slowpath faster. Just to make sure I understand your concern here. The "fastpath" you refer to is the TIF flag test in membarrier_sched_in() within finish_task_switch(), and the "slowpath" is switch_mm() which lacks the required full barrier now, am I correct ? Would it help if we invoke the membarrier hook from switch_mm() instead ? We'd therefore only add the TIF flag test in switch_mm(), rather than for every context switch. Thanks, Mathieu > > Thanks, > Nick -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com
Re: [EXTERNAL]Re: FSL serial 166550 errata broken?
On Thu, 2017-09-28 at 17:54 +0200, Joakim Tjernlund wrote: > On Wed, 2017-09-27 at 15:32 +, York Sun wrote: > > On 09/27/2017 04:03 AM, Joakim Tjernlund wrote: > > > On Mon, 2017-09-25 at 17:26 +, York Sun wrote: > > > > On 09/25/2017 09:55 AM, Joakim Tjernlund wrote: > > > > > We got some "broken" boards(mpx8321) where UART RX is held low(BREAK) > > > > > There we get a few: > > > > > serial8250: too much work for irq18 > > > > > and the board freezes. > > > > > > > > > > Looking inte to driver/CPU there is an errtum(A-004737) w.r.t BREAK > > > > > handling > > > > > and I can see we are hitting the irq function fsl8250_handle_irq() > > > > > added > > > > > in commit: 9deaa53ac7fa373623123aa4f18828dd62292b1a > > > > >"serial: add irq handler for Freescale 16550 errata." > > > > > For all I can tell this workaround is broken and I cannot figure out > > > > > why. > > > > > Any pointers? > > > > > > > > > > > > > Jocke, > > > > > > > > Did you mean MPC8321? > > > > > > > > I personally don't have experience debugging this erratum. Have you > > > > tried to contact the patch author Paul Gortmaker to see how he managed > > > > to get it work? > > > > > > No, but I just found out it is u-boot related. If I use an very old > > > u-boot it works. > > > Between then and now we did a massive upgrade of u-boot and now if > > > breaks. And no, > > > bisect not possible due to local u-boot mods :) > > > > How old? It is a good sign that an old U-Boot works. Git bisect would be > > a good tool if practical. Sometimes I have to reapply some local changes > > for every step of bisect to make it useful. You mileage may vary though. > > > > > > > > Any idea what could be different? I cannot see and I have tested > > > what I could see/think of but now I am out of ideas. > > > > I don't believe we have much change for MPC8321 for a long time. If any > > change has impact on kernel but not U-Boot itself, it may be other > > errata workarounds. 
> > > > I presume this happens on your own board. If I am in your shoes, I would > > try to reduce the number of local commits and reapply them to various > > U-Boot tags to further narrow down the search scope. In the mean time, > > getting register dump to compare the good and bad may be another way to go. > > God, this was no fun exercise but I think I found the offending commit: > 82dda962f0a6449672df3378bb6b5fe54372a927 > serial: Unconditionally enable CONFIG_SERIAL_MULTI > > Enable CONFIG_SERIAL_MULTI for all builds of U-Boot. That includes > both SPL builds and non-SPL builds, everything. To avoid poluting > this patch with removal of ifdef-endif constructions containing > CONFIG_SERIAL_MULTI, the CONFIG_SERIAL_MULTI is temporarily added > into CPPFLAGS in config.mk . This will be again removed in following > patch. > Ok, unless I init ttyS1 in u-boot too, IRQ storm will ensue if BREAK is present when opening ttyS1: + /* Must init the second RS232 or IRQ storm happens +* when BREAK is constant on while opening ttyS1 */ + NS16550_init((NS16550_t)CONFIG_SYS_NS16550_COM2, +ns16550_calc_divisor((NS16550_t)CONFIG_SYS_NS16550_COM2, + CONFIG_SYS_NS16550_CLK, + CONFIG_BAUDRATE)); I guess this is a kernel bug too, the driver should clear/init needed state before starting the device. Fixing that is not on my menu though :) I also noted that FSL custom irq handler, fsl8250_handle_irq, does not handle dma like the standard one does. I guess it needs some love too. Jocke
Re: [PATCH v4 for 4.14 1/3] membarrier: Provide register expedited private command
On Thu, 28 Sep 2017 17:51:15 +0200 Peter Zijlstra wrote:

> On Fri, Sep 29, 2017 at 01:01:12AM +1000, Nicholas Piggin wrote:
> > That's fine. If a user is not bound to a subset of CPUs, they could
> > also cause disturbances with other syscalls and faults, taking locks,
> > causing tlb flushes and IPIs and things.
> 
> So on the big SGI class machines we've had trouble with
> for_each_cpu() loops before, and IIRC the biggest Power box is not too
> far from that 1-2K CPUs.

This is a loop in process context, interrupts on, can reschedule, no locks held though. The biggest power boxes are more tightly coupled than those big SGI systems, but even so just plodding along taking and releasing locks in turn would be fine on those SGI ones as well really. Not DoS level.

This is not a single mega hot cache line or lock that is bouncing over the entire machine, but one process grabbing a line and lock from each of 1000 CPUs. Slight disturbance sure, but each individual CPU will see it as 1/1000th of a disturbance, and most of the cost will be concentrated in the syscall caller.

> Bouncing that lock across the machine is *painful*, I have vague
> memories of cases where the lock ping-pong was most of the time spent.
> 
> But only Power needs this, all the other architectures are fine with the
> lockless approach for MEMBAR_EXPEDITED_PRIVATE.

Yes, we can add an iterator function that power can override in a few lines. Less arch-specific code than this proposal.

> The ISYNC variant of the same however appears to want TIF flags or
> something to aid a number of archs, the rq->lock will not help there.

The SYNC_CORE variant? Yes, it seems different. I think that's another issue though.

Thanks,
Nick
Re: [PATCH v4 for 4.14 1/3] membarrier: Provide register expedited private command
On Thu, 28 Sep 2017 15:29:50 + (UTC) Mathieu Desnoyerswrote: > - On Sep 28, 2017, at 11:01 AM, Nicholas Piggin npig...@gmail.com wrote: > > > On Thu, 28 Sep 2017 13:31:36 + (UTC) > > Mathieu Desnoyers wrote: > > > >> - On Sep 27, 2017, at 9:04 AM, Nicholas Piggin npig...@gmail.com wrote: > >> [snip] > >> So I don't see much point in trying to remove that registration step. > > > > I don't follow you. You are talking about the concept of registering > > intention to use a different function? And the registration API is not > > merged yet? > > Yes, I'm talking about requiring processes to invoke membarrier cmd > MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED before they can successfully > invoke membarrier cmd MEMBARRIER_CMD_PRIVATE_EXPEDITED. > > > Let me say I'm not completely against the idea of a registration API. But > > don't think registration for this expedited command is necessary. > > Given that we have the powerpc lack-of-full-barrier-on-return-to-userspace > case now, and we foresee x86-sysexit, sparc, and alpha also requiring > special treatment when we introduce the MEMBARRIER_FLAG_SYNC_CORE behavior > in the next release, it seems that we'll have a hard time handling > architecture special cases efficiently if we don't expose the registration > API right away. But SYNC_CORE is a different functionality, right? You can add the registration API for it when that goes in. > > But (aside) let's say a tif flag turns out to be a good diea for your > > second case, why not just check the flag in the membarrier sys call and > > do the registration the first time it uses it? > > We also considered that option. It's mainly about guaranteeing that > an expedited membarrier command never blocks. If we introduce this > "lazy auto-registration" behavior, we end up blocking the process > at a random point in its execution so we can issue a synchronize_sched(). 
> By exposing an explicit registration, we can control where this delay > occurs, and even allow library constructors to invoke the registration > while the process is a single threaded, therefore allowing us to completely > skip synchronize_sched(). Okay I guess that could be a good reason. As I said I'm not opposed to the concept. I suppose you could even have a registration for expedited private even if it's a no-op on all architectures, just in case some new ways of implementing it can be done in future. I suppose I'm more objecting to the added complexity for powerpc, and more code in the fastpath to make the slowpath faster. Thanks, Nick
Re: [EXTERNAL]Re: FSL serial 166550 errata broken?
On Wed, 2017-09-27 at 15:32 +, York Sun wrote: > On 09/27/2017 04:03 AM, Joakim Tjernlund wrote: > > On Mon, 2017-09-25 at 17:26 +, York Sun wrote: > > > On 09/25/2017 09:55 AM, Joakim Tjernlund wrote: > > > > We got some "broken" boards(mpx8321) where UART RX is held low(BREAK) > > > > There we get a few: > > > > serial8250: too much work for irq18 > > > > and the board freezes. > > > > > > > > Looking inte to driver/CPU there is an errtum(A-004737) w.r.t BREAK > > > > handling > > > > and I can see we are hitting the irq function fsl8250_handle_irq() added > > > > in commit: 9deaa53ac7fa373623123aa4f18828dd62292b1a > > > >"serial: add irq handler for Freescale 16550 errata." > > > > For all I can tell this workaround is broken and I cannot figure out > > > > why. > > > > Any pointers? > > > > > > > > > > Jocke, > > > > > > Did you mean MPC8321? > > > > > > I personally don't have experience debugging this erratum. Have you > > > tried to contact the patch author Paul Gortmaker to see how he managed > > > to get it work? > > > > No, but I just found out it is u-boot related. If I use an very old u-boot > > it works. > > Between then and now we did a massive upgrade of u-boot and now if breaks. > > And no, > > bisect not possible due to local u-boot mods :) > > How old? It is a good sign that an old U-Boot works. Git bisect would be > a good tool if practical. Sometimes I have to reapply some local changes > for every step of bisect to make it useful. You mileage may vary though. > > > > > Any idea what could be different? I cannot see and I have tested > > what I could see/think of but now I am out of ideas. > > I don't believe we have much change for MPC8321 for a long time. If any > change has impact on kernel but not U-Boot itself, it may be other > errata workarounds. > > I presume this happens on your own board. 
If I am in your shoes, I would > try to reduce the number of local commits and reapply them to various > U-Boot tags to further narrow down the search scope. In the mean time, > getting register dump to compare the good and bad may be another way to go. God, this was no fun exercise but I think I found the offending commit: 82dda962f0a6449672df3378bb6b5fe54372a927 serial: Unconditionally enable CONFIG_SERIAL_MULTI Enable CONFIG_SERIAL_MULTI for all builds of U-Boot. That includes both SPL builds and non-SPL builds, everything. To avoid poluting this patch with removal of ifdef-endif constructions containing CONFIG_SERIAL_MULTI, the CONFIG_SERIAL_MULTI is temporarily added into CPPFLAGS in config.mk . This will be again removed in following patch.
Re: [PATCH v4 for 4.14 1/3] membarrier: Provide register expedited private command
On Fri, Sep 29, 2017 at 01:01:12AM +1000, Nicholas Piggin wrote:
> That's fine. If a user is not bound to a subset of CPUs, they could
> also cause disturbances with other syscalls and faults, taking locks,
> causing tlb flushes and IPIs and things.

So on the big SGI class machines we've had trouble with for_each_cpu() loops before, and IIRC the biggest Power box is not too far from that 1-2K CPUs.

Bouncing that lock across the machine is *painful*; I have vague memories of cases where the lock ping-pong was most of the time spent.

But only Power needs this, all the other architectures are fine with the lockless approach for MEMBAR_EXPEDITED_PRIVATE.

The ISYNC variant of the same however appears to want TIF flags or something to aid a number of archs; the rq->lock will not help there.
Re: [PATCH v4 for 4.14 1/3] membarrier: Provide register expedited private command
- On Sep 28, 2017, at 11:01 AM, Nicholas Piggin npig...@gmail.com wrote: > On Thu, 28 Sep 2017 13:31:36 + (UTC) > Mathieu Desnoyerswrote: > >> - On Sep 27, 2017, at 9:04 AM, Nicholas Piggin npig...@gmail.com wrote: >> >> > On Tue, 26 Sep 2017 20:43:28 + (UTC) >> > Mathieu Desnoyers wrote: >> > >> >> - On Sep 26, 2017, at 1:51 PM, Mathieu Desnoyers >> >> mathieu.desnoy...@efficios.com wrote: >> >> [...] >> Therefore, >> you end up with the same rq lock disruption as if you would iterate on all >> online CPUs. If userspace does that in a loop, you end up, in PeterZ's words, >> with an Insta-DoS. > > I really don't see how that can be true. spinlock by definition is for > sharing of resources, it's not an insta-DoS just because you take shared > spinlocks! [...] >> >> > >> > For the powerpc approach, yes there is some controversy about using >> > runqueue locks even for cpus that we already can interfere with, but I >> > think we have a lot of options we could look at *after* it ever shows >> > up as a problem. >> >> The DoS argument from Peter seems to be a strong opposition to grabbing >> the rq locks. > > Well if I still can't unconvince you, then we should try testing that > theory. [ I'll let PeterZ pitch in on this part of the discussion ] > >> >> Here is another point in favor of having a register command for the >> private membarrier: This gives us greater flexibility to improve the >> kernel scheduler and return-to-userspace barriers if need be in the >> future. >> >> For instance, I plan to propose a "MEMBARRIER_FLAG_SYNC_CORE" flag >> that will also provide guarantees about context synchronization of >> all cores for memory reclaim performed by JIT for the next merge >> window. 
>> So far, the following architectures seem to have the proper
>> core serializing instructions already in place when returning to
>> user-space: x86 (iret), powerpc (rfi), arm32/64 (return from exception,
>> eret), s390/x (lpswe), ia64 (rfi), parisc (issue at least 7 instructions
>> while singing around a bonfire), and mips SMP (eret).
>>
>> So far, AFAIU, only x86 (eventually going through sysexit), alpha
>> (appears to require an explicit imb), and sparc (explicit flush + 5
>> instructions around a similar bonfire as parisc) appear to require special
>> handling.
>>
>> I therefore plan to use the registration step with a
>> MEMBARRIER_FLAG_SYNC_CORE flag set to set TIF flags and add the
>> required context synchronizing barriers on sched_in() only for
>> processes wishing to use private expedited membarrier.
>>
>> So I don't see much point in trying to remove that registration step.
>
> I don't follow you. You are talking about the concept of registering
> intention to use a different function? And the registration API is not
> merged yet?

Yes, I'm talking about requiring processes to invoke membarrier cmd MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED before they can successfully invoke membarrier cmd MEMBARRIER_CMD_PRIVATE_EXPEDITED.

> Let me say I'm not completely against the idea of a registration API. But I
> don't think registration for this expedited command is necessary.

Given that we have the powerpc lack-of-full-barrier-on-return-to-userspace case now, and we foresee x86-sysexit, sparc, and alpha also requiring special treatment when we introduce the MEMBARRIER_FLAG_SYNC_CORE behavior in the next release, it seems that we'll have a hard time handling architecture special cases efficiently if we don't expose the registration API right away.

> But (aside) let's say a TIF flag turns out to be a good idea for your
> second case, why not just check the flag in the membarrier syscall and
> do the registration the first time it uses it?
We also considered that option. It's mainly about guaranteeing that an expedited membarrier command never blocks. If we introduce this "lazy auto-registration" behavior, we end up blocking the process at a random point in its execution so we can issue a synchronize_sched(). By exposing an explicit registration, we can control where this delay occurs, and even allow library constructors to invoke the registration while the process is still single-threaded, therefore allowing us to completely skip synchronize_sched().

Thanks,
Mathieu

> Thanks,
> Nick

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
Re: [PATCH v4 for 4.14 1/3] membarrier: Provide register expedited private command
On Thu, 28 Sep 2017 13:31:36 + (UTC) Mathieu Desnoyers wrote:

> - On Sep 27, 2017, at 9:04 AM, Nicholas Piggin npig...@gmail.com wrote:
>
> > On Tue, 26 Sep 2017 20:43:28 + (UTC)
> > Mathieu Desnoyers wrote:
> >
> >> - On Sep 26, 2017, at 1:51 PM, Mathieu Desnoyers
> >> mathieu.desnoy...@efficios.com wrote:
> >>
> >> > Provide a new command allowing processes to register their intent to use
> >> > the private expedited command.
> >>
> >> I missed a few maintainers that should have been CC'd. Adding them now.
> >> This patch is aimed to go through Paul E. McKenney's tree.
> >
> > Honestly this is pretty ugly new user API and fairly large amount of
> > complexity just to avoid the powerpc barrier. And you end up with arch
> > specific hooks anyway!
> >
> > So my plan was to add an arch-overridable loop primitive that iterates
> > over all running threads for an mm.
>
> Iterating over all threads for an mm is one possible solution that has
> been listed at the membarrier BOF at LPC. We ended up dismissing this
> solution because it would be inefficient for processes which have
> lots of threads (e.g. Java).

Sorry, I meant hardware threads; I should have said "CPUs".

> > powerpc will use its mm_cpumask for
> > iterating and use runqueue locks to avoid the barrier.
>
> This is another solution which has been rejected during the LPC BOF.
>
> What I gather from past threads is that the mm_cpumask's bits on powerpc
> are pretty much only being set, never much cleared. Therefore, over the
> lifetime of a thread which is not affined to specific processors, chances
> are that this cpumask will end up having all cores on the system.

That's fine. If a user is not bound to a subset of CPUs, they could also cause disturbances with other syscalls and faults, taking locks, causing tlb flushes and IPIs and things.

> Therefore, you end up with the same rq lock disruption as if you would
> iterate on all online CPUs.
> If userspace does that in a loop, you end up, in PeterZ's words,
> with an Insta-DoS.

I really don't see how that can be true. A spinlock by definition is for sharing of resources; it's not an insta-DoS just because you take shared spinlocks! You can contend other common locks or resources too. Linux simply does not have guaranteed strict isolation, particularly when you allow threads to be scheduled together on CPUs, so this can't be used arbitrarily.

If it was taking all locks at once, that's one thing which has always been good policy to avoid. But it's not: any single rq lock will only be taken and released for a very short time, far shorter than a normal context switch. And the entire operation will be moderated by the cost of the syscall and the number of runqueues it has to iterate.

There are better ways to cause a DoS. Start lots of threads and burn cycles, bounce between CPUs with affinity, sleep and wake one another between remote CPUs, etc. Run out of memory, cause hash collisions on various hashes that are easy to control, cause lots of TLB flush IPIs... The runqueue lock is not really special. Might equally complain about a zone page allocator or lru lock, or an inode mutex or common dentry lock.

> The name may sound cool, but I gather that this is not
> a service the kernel willingly wants to provide to userspace.
>
> A cunning process could easily find a way to fill its mm_cpumask and then
> issue membarrier in a loop to bring a large system to its knees.

I still don't see how. Nothing that you couldn't do with other resources or configurations of threads and system calls. Sure, you might cause a disturbance and admin might notice and kill it.

> > x86 will most likely want to use its mm_cpumask to iterate.
>
> Iterating on mm_cpumask rather than online cpus adds complexity wrt memory
> barriers (unless we go for rq locks approach).
We'd need, in addition to > ensure that we have the proper barriers before/after store to rq->curr, > to also ensure that we have the proper barriers between mm_cpumask > updates and user-space loads/stores. Considering that we're not aiming > at taking the rq locks anyway, iteration over all online cpus seems > less complex than iterating on mm_cpumask on the architectures that > keep track of it. Well x86 does not *have* to implement it. I thought it probably would like to, and I didn't think its barriers would have been all that complex when I last looked, but didn't look too closely. > > > > > For the powerpc approach, yes there is some controversy about using > > runqueue locks even for cpus that we already can interfere with, but I > > think we have a lot of options we could look at *after* it ever shows > > up as a problem. > > The DoS argument from Peter seems to be a strong opposition to grabbing > the rq locks. Well if I still can't unconvince you, then we should try testing that theory. > > Here is another point in
[PATCH] selftests/powerpc: Use snprintf to construct DSCR sysfs interface paths
Currently sprintf is used, and while paths should never exceed the size of the buffer, it is theoretically possible since dirent.d_name is 256 bytes. As a result this trips -Wformat-overflow, and since the test is built with -Wall -Werror, this causes the build to fail.

Switch to using snprintf and skip any paths which are too long for the filename buffer.

Signed-off-by: Seth Forshee
---
 tools/testing/selftests/powerpc/dscr/dscr_sysfs_test.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/powerpc/dscr/dscr_sysfs_test.c b/tools/testing/selftests/powerpc/dscr/dscr_sysfs_test.c
index 17fb1b43c320..1899bd85121f 100644
--- a/tools/testing/selftests/powerpc/dscr/dscr_sysfs_test.c
+++ b/tools/testing/selftests/powerpc/dscr/dscr_sysfs_test.c
@@ -53,6 +53,8 @@ static int check_all_cpu_dscr_defaults(unsigned long val)
 	}
 
 	while ((dp = readdir(sysfs))) {
+		int len;
+
 		if (!(dp->d_type & DT_DIR))
 			continue;
 		if (!strcmp(dp->d_name, "cpuidle"))
@@ -60,7 +62,9 @@ static int check_all_cpu_dscr_defaults(unsigned long val)
 		if (!strstr(dp->d_name, "cpu"))
 			continue;
 
-		sprintf(file, "%s%s/dscr", CPU_PATH, dp->d_name);
+		len = snprintf(file, LEN_MAX, "%s%s/dscr", CPU_PATH, dp->d_name);
+		if (len >= LEN_MAX)
+			continue;
 		if (access(file, F_OK))
 			continue;
--
2.14.1
[PATCH] powerpc: Always initialize input array when calling epapr_hypercall()
Several callers of epapr_hypercall() pass an uninitialized stack-allocated array for the input arguments, presumably because they have no input arguments. However this can produce errors like this one:

arch/powerpc/include/asm/epapr_hcalls.h:470:42: error: 'in' may be used uninitialized in this function [-Werror=maybe-uninitialized]
  unsigned long register r3 asm("r3") = in[0];
                                        ~~^~~

Fix callers of this function to always zero-initialize the input arguments array to prevent this.

Signed-off-by: Seth Forshee
---
 arch/powerpc/include/asm/epapr_hcalls.h | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/epapr_hcalls.h b/arch/powerpc/include/asm/epapr_hcalls.h
index 334459ad145b..90863245df53 100644
--- a/arch/powerpc/include/asm/epapr_hcalls.h
+++ b/arch/powerpc/include/asm/epapr_hcalls.h
@@ -508,7 +508,7 @@ static unsigned long epapr_hypercall(unsigned long *in,
 
 static inline long epapr_hypercall0_1(unsigned int nr, unsigned long *r2)
 {
-	unsigned long in[8];
+	unsigned long in[8] = {0};
 	unsigned long out[8];
 	unsigned long r;
 
@@ -520,7 +520,7 @@ static inline long epapr_hypercall0_1(unsigned int nr, unsigned long *r2)
 
 static inline long epapr_hypercall0(unsigned int nr)
 {
-	unsigned long in[8];
+	unsigned long in[8] = {0};
 	unsigned long out[8];
 
 	return epapr_hypercall(in, out, nr);
@@ -528,7 +528,7 @@ static inline long epapr_hypercall0(unsigned int nr)
 
 static inline long epapr_hypercall1(unsigned int nr, unsigned long p1)
 {
-	unsigned long in[8];
+	unsigned long in[8] = {0};
 	unsigned long out[8];
 
 	in[0] = p1;
@@ -538,7 +538,7 @@ static inline long epapr_hypercall1(unsigned int nr, unsigned long p1)
 static inline long epapr_hypercall2(unsigned int nr, unsigned long p1,
 				    unsigned long p2)
 {
-	unsigned long in[8];
+	unsigned long in[8] = {0};
 	unsigned long out[8];
 
 	in[0] = p1;
@@ -549,7 +549,7 @@ static inline long epapr_hypercall2(unsigned int nr, unsigned long p1,
 static inline long epapr_hypercall3(unsigned int nr, unsigned long p1,
 				    unsigned long p2, unsigned long p3)
 {
-	unsigned long in[8];
+	unsigned long in[8] = {0};
 	unsigned long out[8];
 
 	in[0] = p1;
@@ -562,7 +562,7 @@ static inline long epapr_hypercall4(unsigned int nr, unsigned long p1,
 				    unsigned long p2, unsigned long p3,
 				    unsigned long p4)
 {
-	unsigned long in[8];
+	unsigned long in[8] = {0};
 	unsigned long out[8];
 
 	in[0] = p1;
--
2.14.1
Re: [PATCH v4 for 4.14 1/3] membarrier: Provide register expedited private command
- On Sep 27, 2017, at 9:04 AM, Nicholas Piggin npig...@gmail.com wrote:

> On Tue, 26 Sep 2017 20:43:28 + (UTC)
> Mathieu Desnoyers wrote:
>
>> - On Sep 26, 2017, at 1:51 PM, Mathieu Desnoyers
>> mathieu.desnoy...@efficios.com wrote:
>>
>> > Provide a new command allowing processes to register their intent to use
>> > the private expedited command.
>>
>> I missed a few maintainers that should have been CC'd. Adding them now.
>> This patch is aimed to go through Paul E. McKenney's tree.
>
> Honestly this is pretty ugly new user API and fairly large amount of
> complexity just to avoid the powerpc barrier. And you end up with arch
> specific hooks anyway!
>
> So my plan was to add an arch-overridable loop primitive that iterates
> over all running threads for an mm.

Iterating over all threads for an mm is one possible solution that has been listed at the membarrier BOF at LPC. We ended up dismissing this solution because it would be inefficient for processes which have lots of threads (e.g. Java).

> powerpc will use its mm_cpumask for
> iterating and use runqueue locks to avoid the barrier.

This is another solution which has been rejected during the LPC BOF.

What I gather from past threads is that the mm_cpumask's bits on powerpc are pretty much only being set, never much cleared. Therefore, over the lifetime of a thread which is not affined to specific processors, chances are that this cpumask will end up having all cores on the system. Therefore, you end up with the same rq lock disruption as if you would iterate on all online CPUs. If userspace does that in a loop, you end up, in PeterZ's words, with an Insta-DoS. The name may sound cool, but I gather that this is not a service the kernel willingly wants to provide to userspace.

A cunning process could easily find a way to fill its mm_cpumask and then issue membarrier in a loop to bring a large system to its knees.

> x86 will most likely want to use its mm_cpumask to iterate.
Iterating on mm_cpumask rather than online cpus adds complexity wrt memory barriers (unless we go for the rq locks approach). We'd need, in addition to ensuring that we have the proper barriers before/after the store to rq->curr, to also ensure that we have the proper barriers between mm_cpumask updates and user-space loads/stores. Considering that we're not aiming at taking the rq locks anyway, iteration over all online cpus seems less complex than iterating on mm_cpumask on the architectures that keep track of it.

> For the powerpc approach, yes there is some controversy about using
> runqueue locks even for cpus that we already can interfere with, but I
> think we have a lot of options we could look at *after* it ever shows
> up as a problem.

The DoS argument from Peter seems to be a strong opposition to grabbing the rq locks.

Here is another point in favor of having a register command for the private membarrier: This gives us greater flexibility to improve the kernel scheduler and return-to-userspace barriers if need be in the future.

For instance, I plan to propose a "MEMBARRIER_FLAG_SYNC_CORE" flag that will also provide guarantees about context synchronization of all cores for memory reclaim performed by JIT for the next merge window.

So far, the following architectures seem to have the proper core serializing instructions already in place when returning to user-space: x86 (iret), powerpc (rfi), arm32/64 (return from exception, eret), s390/x (lpswe), ia64 (rfi), parisc (issue at least 7 instructions while singing around a bonfire), and mips SMP (eret).

So far, AFAIU, only x86 (eventually going through sysexit), alpha (appears to require an explicit imb), and sparc (explicit flush + 5 instructions around a similar bonfire as parisc) appear to require special handling.
I therefore plan to use the registration step with a MEMBARRIER_FLAG_SYNC_CORE flag set to set TIF flags and add the required context synchronizing barriers on sched_in() only for processes wishing to use private expedited membarrier. So I don't see much point in trying to remove that registration step. Thanks, Mathieu > > Thanks, > Nick -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com
Re: [PATCH] powerpc/powernv: Use early_radix_enabled in POWER9 tlb flush
Nick, I applied your patch to Linux kernel 4.13, rebuilt it, installed it, then ran a test; it still cannot boot the OS. The system hung here, the same as before. Would you please have a look?

The system is going down NOW!
Sent SIGTERM to all processes
Sent SIGKILL to all processes
[ 104.755810] kexec_core: Starting new kernel
[ 126.157029744,5] OPAL: Switch to big-endian OS

Best Regards!
Meng Li
IBM OpenPOWER Application Engineer

----- Original message -----
From: Nicholas Piggin
To: linuxppc-dev@lists.ozlabs.org
Cc: Nicholas Piggin, Jeremy Kerr, Meng YK Li
Subject: [PATCH] powerpc/powernv: Use early_radix_enabled in POWER9 tlb flush
Date: Wed, Sep 27, 2017 1:46 PM

This code is used at boot and machine checks, so it should be using early_radix_enabled() (which is usable any time).

Signed-off-by: Nicholas Piggin
---
 arch/powerpc/kernel/mce_power.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/mce_power.c b/arch/powerpc/kernel/mce_power.c
index b76ca198e09c..d37e612050b5 100644
--- a/arch/powerpc/kernel/mce_power.c
+++ b/arch/powerpc/kernel/mce_power.c
@@ -128,7 +128,7 @@ void __flush_tlb_power9(unsigned int action)
 {
 	unsigned int num_sets;
 
-	if (radix_enabled())
+	if (early_radix_enabled())
 		num_sets = POWER9_TLB_SETS_RADIX;
 	else
 		num_sets = POWER9_TLB_SETS_HASH;
--
2.13.3
Re: [PATCH v3 00/20] Speculative page faults
Hi Andrew,

On 26/09/2017 01:34, Andrew Morton wrote:

On Mon, 25 Sep 2017 09:27:43 -0700 Alexei Starovoitov wrote:

On Mon, Sep 18, 2017 at 12:15 AM, Laurent Dufour wrote:

Despite the unprovable lockdep warning raised by Sergey, I didn't get any feedback on this series. Is there a chance to get it moved upstream ?

what is the status ? We're eagerly looking forward for this set to land, since we have several use cases for tracing that will build on top of this set as discussed at Plumbers.

There has been sadly little review and testing so far :(

I'll be taking a close look at it all over the next couple of weeks.

One terribly important thing (especially for a patchset this large and intrusive) is the rationale for merging it: the justification, usually in the form of end-user benefit. Laurent's [0/n] provides some nice-looking performance benefits for workloads which are chosen to show performance benefits(!) but, alas, no quantitative testing results for workloads which we may suspect will be harmed by the changes(?).

Even things as simple as impact upon single-threaded pagefault-intensive workloads and its effect upon CONFIG_SMP=n .text size?

I forgot to mention in my previous email the impact on the .text section. Here are the metrics I got:

 .text size    UP        SMP       Delta
 4.13-mmotm    8444201   8964137   6.16%
    '' + spf   8452041   8971929   6.15%
 Delta         0.09%     0.09%

No major impact, as you can see.

Thanks,
Laurent

If you have additional usecases then please, spell them out for us in full detail so we can better understand the benefits which this patchset provides.
Re: [PATCH v3 00/20] Speculative page faults
Hi,

On 26/09/2017 01:34, Andrew Morton wrote:

On Mon, 25 Sep 2017 09:27:43 -0700 Alexei Starovoitov wrote:

On Mon, Sep 18, 2017 at 12:15 AM, Laurent Dufour wrote:

Despite the unprovable lockdep warning raised by Sergey, I didn't get any feedback on this series. Is there a chance to get it moved upstream ?

what is the status ? We're eagerly looking forward for this set to land, since we have several use cases for tracing that will build on top of this set as discussed at Plumbers.

There has been sadly little review and testing so far :(

I do agree, and I could just encourage people to do so :/

I'll be taking a close look at it all over the next couple of weeks.

Thanks Andrew for giving it a close look.

One terribly important thing (especially for a patchset this large and intrusive) is the rationale for merging it: the justification, usually in the form of end-user benefit.

The benefit is only for multi-threaded processes. But even on *small* systems with 16 CPUs, there is a real benefit.

Laurent's [0/n] provides some nice-looking performance benefits for workloads which are chosen to show performance benefits(!) but, alas, no quantitative testing results for workloads which we may suspect will be harmed by the changes(?).

I did test with kernbench, involving gcc/ld which are not multi-threaded, AFAIK, and I didn't see any impact. But if you know an additional test I should try, please advise. Regarding ebizzy, it was designed to simulate web server activity, so I guess there will be improvements when running real web servers.

Even things as simple as impact upon single-threaded pagefault-intensive workloads and its effect upon CONFIG_SMP=n .text size?

If you have additional usecases then please, spell them out for us in full detail so we can better understand the benefits which this patchset provides.

The other use-case I'm aware of is on in-memory databases, where the performance improvement is really significant, as I mentioned in the header of my series.
Cheers, Laurent.
RE: [PATCH v2 2/3] powerpc/64: enhance memcmp() with VMX instruction for long bytes comparision
From: Simon Guo > Sent: 27 September 2017 19:34 ... > > On X86 all the AVX registers are caller saved, the system call > > entry could issue the instruction that invalidates them all. > > Kernel code running in the context of a user process could then > > use the registers without saving them. > > It would only need to set a mark to ensure they are invalidated > > again on return to user (might be cheap enough to do anyway). > > Dunno about PPC though. > > I am not aware of any ppc instruction which can set a "mark" or provide > any high granularity flag against single or subgroup of vec regs' validity. > But ppc experts may want to correct me. I was just thinking of a software flag. David
[PATCH kernel v3] vfio/spapr: Add cond_resched() for huge updates
Clearing very big IOMMU tables can trigger soft lockups. This adds cond_resched() to allow the scheduler to do context switching when it decides to.

Signed-off-by: Alexey Kardashevskiy
---
The testcase is a POWER9 box with a 264GB guest, 4 VFIO devices from independent IOMMU groups, and 64K IOMMU pages. This configuration produces 4325376 TCE entries; each entry update incurs 4 OPAL calls to update an individual PE TCE cache. This produced lockups for more than 20s. Reducing the table size to 4194304 (i.e. a 256GB guest) or removing one of the 4 VFIO devices makes the problem go away.
---
Changes:
v3:
* cond_resched() checks for should_resched(), so we just call cond_resched() and let the cpu scheduler decide whether to switch or not

v2:
* replaced with time based solution
---
 drivers/vfio/vfio_iommu_spapr_tce.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
index 63112c36ab2d..759a5bdd40e1 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -507,6 +507,8 @@ static int tce_iommu_clear(struct tce_container *container,
 	enum dma_data_direction direction;
 
 	for ( ; pages; --pages, ++entry) {
+		cond_resched();
+
 		direction = DMA_NONE;
 		oldhpa = 0;
 		ret = iommu_tce_xchg(tbl, entry, &oldhpa, &direction);
--
2.11.0
Re: [PATCH 1/1] KVM: PPC: Book3S: Fix server always zero from kvmppc_xive_get_xive()
On Thu, 2017-09-28 at 11:45 +1000, David Gibson wrote:
> On Tue, Sep 26, 2017 at 04:47:04PM +1000, Sam Bobroff wrote:
> > In KVM's XICS-on-XIVE emulation, kvmppc_xive_get_xive() returns the
> > value of state->guest_server as "server". However, this value is not
> > set by its counterpart kvmppc_xive_set_xive(). When the guest uses
> > this interface to migrate interrupts away from a CPU that is going
> > offline, it sees all interrupts as belonging to CPU 0, so they are
> > left assigned to (now) offline CPUs.
> >
> > This patch removes the guest_server field from the state, and returns
> > act_server in its place (that is, the CPU actually handling the
> > interrupt, which may differ from the one requested).
> >
> > Fixes: 5af50993850a ("KVM: PPC: Book3S HV: Native usage of the XIVE
> > interrupt controller")
> > Cc: sta...@vger.kernel.org
> > Signed-off-by: Sam Bobroff
> > ---
> > The other obvious way to patch this would be to set state->guest_server in
> > kvmppc_xive_set_xive() and that does also work because act_server is usually
> > equal to guest_server.
> >
> > However, in the cases where guest_server differed from act_server, the guest
> > would only move IRQs correctly if it got act_server (the CPU actually
> > handling the interrupt) here. So, that approach seemed better.
>
> Paolo, again this is a pretty urgent fix for KVM on Power and Paulus
> is away. We're hoping BenH will ack shortly (he's the logical
> technical reviewer), after which can you merge this directly into the
> KVM staging tree? (RHBZ 1477391, and we suspect several more are
> related.)

Acked-by: Benjamin Herrenschmidt

As a subsequent cleanup we should probably rename act_server to server.

Note: We know of a remaining theoretical race that isn't fixed yet with CPU unplug. If an interrupt is already in the queue of the CPU calling xics_migrate_irqs_away (guest), then that irq never gets pulled out of that queue, and thus the bug this patch is fixing will re-occur.
The fix isn't trivial; I'm working on it, though I'm tempted to make some assumptions about how Linux does things to keep it (much) simpler. I'll elaborate later (at Kernel Recipes right now).

Cheers,
Ben.

> >  arch/powerpc/kvm/book3s_xive.c | 5 ++---
> >  arch/powerpc/kvm/book3s_xive.h | 1 -
> >  2 files changed, 2 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
> > index 13304622ab1c..bf457843e032 100644
> > --- a/arch/powerpc/kvm/book3s_xive.c
> > +++ b/arch/powerpc/kvm/book3s_xive.c
> > @@ -622,7 +622,7 @@ int kvmppc_xive_get_xive(struct kvm *kvm, u32 irq, u32 *server,
> >  		return -EINVAL;
> >  	state = &sb->irq_state[idx];
> >  	arch_spin_lock(&state->lock);
> > -	*server = state->guest_server;
> > +	*server = state->act_server;
> >  	*priority = state->guest_priority;
> >  	arch_spin_unlock(&state->lock);
> >
> > @@ -1331,7 +1331,7 @@ static int xive_get_source(struct kvmppc_xive *xive, long irq, u64 addr)
> >  		xive->saved_src_count++;
> >
> >  	/* Convert saved state into something compatible with xics */
> > -	val = state->guest_server;
> > +	val = state->act_server;
> >  	prio = state->saved_scan_prio;
> >
> >  	if (prio == MASKED) {
> > @@ -1507,7 +1507,6 @@ static int xive_set_source(struct kvmppc_xive *xive, long irq, u64 addr)
> >  	/* First convert prio and mark interrupt as untargetted */
> >  	act_prio = xive_prio_from_guest(guest_prio);
> >  	state->act_priority = MASKED;
> > -	state->guest_server = server;
> >
> >  	/*
> >  	 * We need to drop the lock due to the mutex below. Hopefully
> >
> > diff --git a/arch/powerpc/kvm/book3s_xive.h b/arch/powerpc/kvm/book3s_xive.h
> > index 5938f7644dc1..6ba63f8e8a61 100644
> > --- a/arch/powerpc/kvm/book3s_xive.h
> > +++ b/arch/powerpc/kvm/book3s_xive.h
> > @@ -35,7 +35,6 @@ struct kvmppc_xive_irq_state {
> >  	struct xive_irq_data *pt_data;	/* XIVE Pass-through associated data */
> >
> >  	/* Targetting as set by guest */
> > -	u32 guest_server;		/* Current guest selected target */
> >  	u8 guest_priority;		/* Guest set priority */
> >  	u8 saved_priority;		/* Saved priority when masking */