Re: [Qemu-devel] [Qemu-ppc] [PATCH 0/3] ppc: complete the new HV mode
On 06/06/2016 06:17 AM, Benjamin Herrenschmidt wrote: > On Sun, 2016-06-05 at 19:41 +0200, Cédric Le Goater wrote: >> >> Here is a fix I think. Could you give it a try ? > > This is somewhat wrong... > >> commit cd0c6f473532 ('ppc: Do some batching of TCG tlb flushes') >> introduced an optimisation to flush TLBs only when a context >> synchronizing event is reached (interrupt, rfi). This was done for >> ppc64 but 32bit was forgotten on the way. > > No it didn't. That commit only delays flushes on ppc64. ppc32 is > unaffected, unless I missed something. IE. It will delay flushes caused > by slb instructions (which don't exist on 32-bit) > and ppc_tlb_invalidate_one() only in the 64-bit cases. > > Also what your patch does in practice is not really change that, though > you seem to try to somewhat extend the batching to 32-bit (but > incompletely), you also introduce something which effectively reverts > part of 9fb044911444fdd09f5f072ad0ca269d7f8b841d (split I/D mode). > > I think that's more what's "fixing" your problem, ie, the flush in > IR/DR changes. However it shouldn't be needed. OK. I thought that was needed because of what the 32b specs say in "Synchronization Requirements for Special Registers and for Lookaside Buffers", a "Context-synchronizing instruction" is required after a mtmsr({I,D}R) and also because changing IR can break implicit branching. But I might just misunderstand all of it as I am discovering. > I suspect all of that is papering over another bug somewhere else which > got exposed by the split I/D mode, since we no longer over-flush on > transitions to/from real-mode. So we must be missing flushes elsewhere, > possibly some G3 specific stuff, or there always was some kind of bug > in the TLB flushing on 32-bit that got somewhat masked by the over- > flushing we used to do. >From what I see, darwin loops on tlbie : 0x000952fc: mtctr r0 0x00095300: tlbie r6 0x00095304: addir6,r6,4096 0x00095308: bdnz+ 0x95300 0x0009530c: mtcrf 128,r11 0x00095310: sync 0x00095314: eieio 0x00095318: bns-0x95328 and this is done on the G4, but not necessarily on the G3, it depends on r11 which contains some bits of SPRG2 : 0x0009531c: tlbsync 0x00095320: sync 0x00095324: isync HID0 is also read and written to but to control cache bits. > I need a repro-case. Booting the darwin CD is enough. Cheers, C. > Cheers, > Ben. > >> Tested on mac99 and g3beige with >> >> qemu-system-ppc -cdrom darwinppc-602.cdr -boot d >> >> Signed-off-by: Cédric Le Goater >> --- >> >> I think the hunk in powerpc_excp() is needed if we don't generate a >> context synchronizing event. what is best to do ? >> >> target-ppc/cpu.h |2 +- >> target-ppc/excp_helper.c | 10 ++ >> target-ppc/helper_regs.h |9 - >> target-ppc/translate.c |2 +- >> 4 files changed, 20 insertions(+), 3 deletions(-) >> >> Index: qemu-dgibson-for-2.7.git/target-ppc/translate.c >> === >> --- qemu-dgibson-for-2.7.git.orig/target-ppc/translate.c >> +++ qemu-dgibson-for-2.7.git/target-ppc/translate.c >> @@ -3290,7 +3290,7 @@ static void gen_eieio(DisasContext *ctx) >> { >> } >> >> -#if !defined(CONFIG_USER_ONLY) && defined(TARGET_PPC64) >> +#if !defined(CONFIG_USER_ONLY) >> static inline void gen_check_tlb_flush(DisasContext *ctx) >> { >> TCGv_i32 t = tcg_temp_new_i32(); >> Index: qemu-dgibson-for-2.7.git/target-ppc/cpu.h >> === >> --- qemu-dgibson-for-2.7.git.orig/target-ppc/cpu.h >> +++ qemu-dgibson-for-2.7.git/target-ppc/cpu.h >> @@ -958,9 +958,9 @@ struct CPUPPCState { >> /* PowerPC 64 SLB area */ >> ppc_slb_t slb[MAX_SLB_ENTRIES]; >> int32_t slb_nr; >> +#endif >> /* tcg TLB needs flush (deferred slb inval instruction >> typically) */ >> uint32_t tlb_need_flush; >> -#endif >> /* segment registers */ >> hwaddr htab_base; >> /* mask used to normalize hash value to PTEG index */ >> Index: qemu-dgibson-for-2.7.git/target-ppc/helper_regs.h >> === >> --- qemu-dgibson-for-2.7.git.orig/target-ppc/helper_regs.h >> +++ qemu-dgibson-for-2.7.git/target-ppc/helper_regs.h >> @@ -121,6 +121,13 @@ static inline int hreg_store_msr(CPUPPCS >> } >> if (((value >> MSR_IR) & 1) != msr_ir || >> ((value >> MSR_DR) & 1) != msr_dr) { >> +/* A change of the instruction relocation bit in the MSR can >> + * cause an implicit branch in the address space. This >> + * requires a tlb flush. >> + */ >> +if (env->mmu_model & POWERPC_MMU_32B) { >> +env->tlb_need_flush = 1; >> +} >> cs->interrupt_request |= CPU_INTERRUPT_EXITTB; >> } >> if ((env->mmu_model & P
Re: [Qemu-devel] [Qemu-ppc] [PATCH 0/3] ppc: complete the new HV mode
On Mon, 2016-06-06 at 17:04 +1000, Benjamin Herrenschmidt wrote: > On Mon, 2016-06-06 at 07:29 +0100, Mark Cave-Ayland wrote: > > > > > > The best reproducer is to run from David's ppc-for-2.7 branch with > > the above patch applied manually and then try booting the following > > ISOs which now panic on boot with the split I/D MMU mode enabled: > So at least HelenOS is fixed by this: Note that Cedric is right, the TLB flush batching isn't done for 32- bit, and we could do it, it would probably improve performances, but it needs to be done properly :-) I'll look into it next. > --- a/target-ppc/mmu_helper.c > +++ b/target-ppc/mmu_helper.c > @@ -1969,6 +1969,11 @@ void ppc_tlb_invalidate_one(CPUPPCState *env, > target_ulong addr) > /* XXX: this case should be optimized, > * giving a mask to tlb_flush_page > */ > +/* This is broken, some CPUs invalidate a whole congruence > + * class on an even smaller subset of bits and some OSes > take > + * advantage of this. Just blow the whole thing away. > + */ > +#if 0 > tlb_flush_page(cs, addr | (0x0 << 28)); > tlb_flush_page(cs, addr | (0x1 << 28)); > tlb_flush_page(cs, addr | (0x2 << 28)); > @@ -1985,6 +1990,9 @@ void ppc_tlb_invalidate_one(CPUPPCState *env, > target_ulong addr) > tlb_flush_page(cs, addr | (0xD << 28)); > tlb_flush_page(cs, addr | (0xE << 28)); > tlb_flush_page(cs, addr | (0xF << 28)); > +#else > +tlb_flush(cs, 1); > +#endif > break; > #if defined(TARGET_PPC64) > case POWERPC_MMU_64B: > > I'll check Darwin... > > Cheers, > Ben.
Re: [Qemu-devel] [Qemu-ppc] [PATCH 0/3] ppc: complete the new HV mode
On Mon, 2016-06-06 at 07:29 +0100, Mark Cave-Ayland wrote: > > The best reproducer is to run from David's ppc-for-2.7 branch with > the above patch applied manually and then try booting the following > ISOs which now panic on boot with the split I/D MMU mode enabled: So at least HelenOS is fixed by this: --- a/target-ppc/mmu_helper.c +++ b/target-ppc/mmu_helper.c @@ -1969,6 +1969,11 @@ void ppc_tlb_invalidate_one(CPUPPCState *env, target_ulong addr) /* XXX: this case should be optimized, * giving a mask to tlb_flush_page */ +/* This is broken, some CPUs invalidate a whole congruence + * class on an even smaller subset of bits and some OSes take + * advantage of this. Just blow the whole thing away. + */ +#if 0 tlb_flush_page(cs, addr | (0x0 << 28)); tlb_flush_page(cs, addr | (0x1 << 28)); tlb_flush_page(cs, addr | (0x2 << 28)); @@ -1985,6 +1990,9 @@ void ppc_tlb_invalidate_one(CPUPPCState *env, target_ulong addr) tlb_flush_page(cs, addr | (0xD << 28)); tlb_flush_page(cs, addr | (0xE << 28)); tlb_flush_page(cs, addr | (0xF << 28)); +#else +tlb_flush(cs, 1); +#endif break; #if defined(TARGET_PPC64) case POWERPC_MMU_64B: I'll check Darwin... Cheers, Ben.
Re: [Qemu-devel] [Qemu-ppc] [PATCH 0/3] ppc: complete the new HV mode
On 06/06/16 05:20, Benjamin Herrenschmidt wrote: > On Mon, 2016-06-06 at 13:55 +1000, Benjamin Herrenschmidt wrote: >> >> I'm not sure that 32-bit patch is correct. We shouldn't have to flush >> on IR/DR transitions at all, that's the whole point of the split I/D >> code. >> >> I think something else is wrong. > > Note: With whatever's in this branch, OpenBIOS won't start at all Yeah, there's a patch currently pending upstream to fix the serial port which hangs OpenBIOS upon startup: https://lists.nongnu.org/archive/html/qemu-devel/2016-06/msg00035.html. The best reproducer is to run from David's ppc-for-2.7 branch with the above patch applied manually and then try booting the following ISOs which now panic on boot with the split I/D MMU mode enabled: Darwin/ppc: https://opensource.apple.com/static/iso/darwinppc-602.cdr.gz HelenOS:http://www.helenos.org/releases/HelenOS-0.6.0-ppc32.iso I've seen some DMA coherence issues related to macio in some of my local tests and so I'm also testing with my beta patch to switch macio over to use the DMA helpers here: https://lists.nongnu.org/archive/html/qemu-devel/2016-05/msg04907.html. ATB, Mark.
Re: [Qemu-devel] [Qemu-ppc] [PATCH 0/3] ppc: complete the new HV mode
On Mon, 2016-06-06 at 13:55 +1000, Benjamin Herrenschmidt wrote: > > I'm not sure that 32-bit patch is correct. We shouldn't have to flush > on IR/DR transitions at all, that's the whole point of the split I/D > code. > > I think something else is wrong. Note: With whatever's in this branch, OpenBIOS won't start at all Cheers, Ben.
Re: [Qemu-devel] [Qemu-ppc] [PATCH 0/3] ppc: complete the new HV mode
On Sun, 2016-06-05 at 19:41 +0200, Cédric Le Goater wrote: > > Here is a fix I think. Could you give it a try ? This is somewhat wrong... > commit cd0c6f473532 ('ppc: Do some batching of TCG tlb flushes') > introduced an optimisation to flush TLBs only when a context > synchronizing event is reached (interrupt, rfi). This was done for > ppc64 but 32bit was forgotten on the way. No it didn't. That commit only delays flushes on ppc64. ppc32 is unaffected, unless I missed something. IE. It will delay flushes caused by slb instructions (which don't exist on 32-bit) and ppc_tlb_invalidate_one() only in the 64-bit cases. Also what your patch does in practice is not really change that, though you seem to try to somewhat extend the batching to 32-bit (but incompletely), you also introduce something which effectively reverts part of 9fb044911444fdd09f5f072ad0ca269d7f8b841d (split I/D mode). I think that's more what's "fixing" your problem, ie, the flush in IR/DR changes. However it shouldn't be needed. I suspect all of that is papering over another bug somewhere else which got exposed by the split I/D mode, since we no longer over-flush on transitions to/from real-mode. So we must be missing flushes elsewhere, possibly some G3 specific stuff, or there always was some kind of bug in the TLB flushing on 32-bit that got somewhat masked by the over- flushing we used to do. I need a repro-case. Cheers, Ben. > Tested on mac99 and g3beige with > > qemu-system-ppc -cdrom darwinppc-602.cdr -boot d > > Signed-off-by: Cédric Le Goater > --- > > I think the hunk in powerpc_excp() is needed if we don't generate a > context synchronizing event. what is best to do ? > > target-ppc/cpu.h |2 +- > target-ppc/excp_helper.c | 10 ++ > target-ppc/helper_regs.h |9 - > target-ppc/translate.c |2 +- > 4 files changed, 20 insertions(+), 3 deletions(-) > > Index: qemu-dgibson-for-2.7.git/target-ppc/translate.c > === > --- qemu-dgibson-for-2.7.git.orig/target-ppc/translate.c > +++ qemu-dgibson-for-2.7.git/target-ppc/translate.c > @@ -3290,7 +3290,7 @@ static void gen_eieio(DisasContext *ctx) > { > } > > -#if !defined(CONFIG_USER_ONLY) && defined(TARGET_PPC64) > +#if !defined(CONFIG_USER_ONLY) > static inline void gen_check_tlb_flush(DisasContext *ctx) > { > TCGv_i32 t = tcg_temp_new_i32(); > Index: qemu-dgibson-for-2.7.git/target-ppc/cpu.h > === > --- qemu-dgibson-for-2.7.git.orig/target-ppc/cpu.h > +++ qemu-dgibson-for-2.7.git/target-ppc/cpu.h > @@ -958,9 +958,9 @@ struct CPUPPCState { > /* PowerPC 64 SLB area */ > ppc_slb_t slb[MAX_SLB_ENTRIES]; > int32_t slb_nr; > +#endif > /* tcg TLB needs flush (deferred slb inval instruction > typically) */ > uint32_t tlb_need_flush; > -#endif > /* segment registers */ > hwaddr htab_base; > /* mask used to normalize hash value to PTEG index */ > Index: qemu-dgibson-for-2.7.git/target-ppc/helper_regs.h > === > --- qemu-dgibson-for-2.7.git.orig/target-ppc/helper_regs.h > +++ qemu-dgibson-for-2.7.git/target-ppc/helper_regs.h > @@ -121,6 +121,13 @@ static inline int hreg_store_msr(CPUPPCS > } > if (((value >> MSR_IR) & 1) != msr_ir || > ((value >> MSR_DR) & 1) != msr_dr) { > +/* A change of the instruction relocation bit in the MSR can > + * cause an implicit branch in the address space. This > + * requires a tlb flush. > + */ > +if (env->mmu_model & POWERPC_MMU_32B) { > +env->tlb_need_flush = 1; > +} > cs->interrupt_request |= CPU_INTERRUPT_EXITTB; > } > if ((env->mmu_model & POWERPC_MMU_BOOKE) && > @@ -151,7 +158,7 @@ static inline int hreg_store_msr(CPUPPCS > return excp; > } > > -#if !defined(CONFIG_USER_ONLY) && defined(TARGET_PPC64) > +#if !defined(CONFIG_USER_ONLY) > static inline void check_tlb_flush(CPUPPCState *env) > { > CPUState *cs = CPU(ppc_env_get_cpu(env)); > Index: qemu-dgibson-for-2.7.git/target-ppc/excp_helper.c > === > --- qemu-dgibson-for-2.7.git.orig/target-ppc/excp_helper.c > +++ qemu-dgibson-for-2.7.git/target-ppc/excp_helper.c > @@ -709,6 +709,16 @@ static inline void powerpc_excp(PowerPCC > } > } > #endif > +if (((new_msr >> MSR_IR) & 1) != msr_ir || > +((new_msr >> MSR_DR) & 1) != msr_dr) { > +/* A change of the instruction relocation bit in the MSR can > + * cause an implicit branch in the address space. This > + * requires a tlb flush. > + */ > +if (env->mmu_model & POWERPC_MMU_32B) { > +env->tlb_need_flush = 1; > +} > +} > /* We don't use hreg_store_msr here as already have treated > * any special ca