On 28 August 2014 18:14, Paolo Bonzini <pbonz...@redhat.com> wrote:
> PowerPC TCG flushes the TLB on every IR/DR change, which basically
> means on every user<->kernel context switch.  Use the 6-element
> TLB array as a cache, where each MMU index is mapped to a different
> state of the IR/DR/PR/HV bits.
>
> This brings the number of TLB flushes down from ~900000 to ~50000
> for starting up the Debian installer, which is in line with x86
> and gives a ~10% performance improvement.
>
> Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
> ---
>  cputlb.c                    | 19 +++++++++++++++++
>  hw/ppc/spapr_hcall.c        |  6 +++++-
>  include/exec/exec-all.h     |  5 +++++
>  target-ppc/cpu.h            |  4 +++-
>  target-ppc/excp_helper.c    |  6 +-----
>  target-ppc/helper_regs.h    | 52 +++++++++++++++++++++++++++++++--------------
>  target-ppc/translate_init.c |  5 +++++
>  7 files changed, 74 insertions(+), 23 deletions(-)
>
> diff --git a/cputlb.c b/cputlb.c
> index afd3705..17e1b03 100644
> --- a/cputlb.c
> +++ b/cputlb.c
> @@ -67,6 +67,25 @@ void tlb_flush(CPUState *cpu, int flush_global)
>      tlb_flush_count++;
>  }
>
> +void tlb_flush_idx(CPUState *cpu, int mmu_idx)
> +{
> +    CPUArchState *env = cpu->env_ptr;
> +
> +#if defined(DEBUG_TLB)
> +    printf("tlb_flush_idx %d:\n", mmu_idx);
> +#endif
> +    /* must reset current TB so that interrupts cannot modify the
> +       links while we are modifying them */
> +    cpu->current_tb = NULL;
> +
> +    memset(env->tlb_table[mmu_idx], -1, sizeof(env->tlb_table[mmu_idx]));
> +    memset(cpu->tb_jmp_cache, 0, sizeof(cpu->tb_jmp_cache));
> +
> +    env->tlb_flush_addr = -1;
> +    env->tlb_flush_mask = 0;

Isn't this going to break huge page support? Consider
the case:
 * set up huge pages in one TLB index (causing tlb_flush_addr
   and tlb_flush_mask to be set to cover that range)
 * switch to a different TLB index
 * tlb_flush_idx() for that index (causing flush_addr/mask
   to be reset)
 * switch back to first TLB index
 * do tlb_flush_page for an address inside the huge-page
   region

I think you need the flush addr/mask to be per-TLB-index
if you want this to work.

Personally I would put the "implement new feature in core
code" in a separate patch from "use new feature in PPC code".

Does PPC hardware do lots of TLB flushes on user-kernel
transitions, or does it have some sort of info in the TLB
entry about whether it should match or not? (I'm wondering
if there's a generalisation possible here that might help
ARM too.)

-- PMM
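
To make the huge-page scenario concrete, here is a toy model of it. This is
only a sketch and not QEMU code: ToyTLB and every toy_* function are
hypothetical names, and the direct-mapped 8-entry table, the 4 KiB small page
and the 2 MiB huge page are assumptions chosen for illustration. The per_index
flag switches between one shared flush addr/mask (as in the quoted hunk) and
one addr/mask per TLB index (the suggestion above).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Toy model only: sizes and names are illustrative, not QEMU's. */
#define PAGE_BITS 12
#define TLB_SIZE  8            /* direct-mapped entries per MMU index */
#define NB_IDX    2
#define NO_ADDR   UINT64_MAX   /* invalid entry / no huge page tracked */

typedef struct {
    uint64_t tlb[NB_IDX][TLB_SIZE];  /* cached page address or NO_ADDR */
    uint64_t flush_addr[NB_IDX];     /* base of the tracked huge page */
    uint64_t flush_mask[NB_IDX];     /* mask of the tracked huge page */
    bool per_index;                  /* false: all indexes share slot 0 */
} ToyTLB;

static int slot(const ToyTLB *t, int idx) { return t->per_index ? idx : 0; }

static void toy_init(ToyTLB *t, bool per_index)
{
    memset(t->tlb, 0xff, sizeof(t->tlb));
    for (int i = 0; i < NB_IDX; i++) {
        t->flush_addr[i] = NO_ADDR;
        t->flush_mask[i] = 0;
    }
    t->per_index = per_index;
}

/* Cache one small-page entry that belongs to a huge page of 'size' bytes
   and remember the huge page's range for later flushes. */
static void toy_fill_huge(ToyTLB *t, int idx, uint64_t addr, uint64_t size)
{
    uint64_t page = addr & ~((1ull << PAGE_BITS) - 1);

    assert(size != 0 && (size & (size - 1)) == 0);
    t->tlb[idx][(page >> PAGE_BITS) % TLB_SIZE] = page;
    t->flush_mask[slot(t, idx)] = ~(size - 1);
    t->flush_addr[slot(t, idx)] = addr & ~(size - 1);
}

/* Flush one MMU index and reset the range tracking it uses. */
static void toy_flush_idx(ToyTLB *t, int idx)
{
    memset(t->tlb[idx], 0xff, sizeof(t->tlb[idx]));
    t->flush_addr[slot(t, idx)] = NO_ADDR;
    t->flush_mask[slot(t, idx)] = 0;
}

static void toy_flush_page(ToyTLB *t, uint64_t addr)
{
    uint64_t page = addr & ~((1ull << PAGE_BITS) - 1);
    bool in_huge[NB_IDX];

    /* Decide first, so that resetting a shared slot cannot change the
       answer halfway through the loop below. */
    for (int idx = 0; idx < NB_IDX; idx++) {
        int s = slot(t, idx);
        in_huge[idx] = (addr & t->flush_mask[s]) == t->flush_addr[s];
    }
    for (int idx = 0; idx < NB_IDX; idx++) {
        if (in_huge[idx]) {
            /* Inside a tracked huge page: drop the whole index. */
            toy_flush_idx(t, idx);
        } else {
            uint64_t *e = &t->tlb[idx][(page >> PAGE_BITS) % TLB_SIZE];
            if (*e == page) {
                *e = NO_ADDR;
            }
        }
    }
}

/* Runs the five steps listed above; returns true if a stale entry for the
   huge page survives the final page flush. */
bool toy_stale_after_scenario(bool per_index)
{
    ToyTLB t;
    const uint64_t huge = 2u << 20;          /* assumed 2 MiB huge page */

    toy_init(&t, per_index);
    toy_fill_huge(&t, 0, 0x200000, huge);    /* huge page in index 0 */
    toy_fill_huge(&t, 0, 0x201000, huge);    /* second entry, same huge page */
    toy_flush_idx(&t, 1);                    /* flush the other index */
    toy_flush_page(&t, 0x200000);            /* flush inside the huge page */
    return t.tlb[0][(0x201000 >> PAGE_BITS) % TLB_SIZE] == 0x201000;
}
```

In this model toy_stale_after_scenario(false) leaves the entry for 0x201000
behind, because flushing index 1 wiped the shared range, while
toy_stale_after_scenario(true) does not, because index 0 keeps its own range.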