Re: [RFC PATCH 3/3] powerpc/64s/radix: optimise TLB flush with precise TLB ranges in mmu_gather

2018-06-13 Thread Nicholas Piggin
On Thu, 14 Jun 2018 15:15:47 +0900 Linus Torvalds wrote:
> On Thu, Jun 14, 2018 at 11:49 AM Nicholas Piggin wrote:
> >
> > +#ifndef pte_free_tlb
> > #define pte_free_tlb(tlb, ptep, address) \
> > do {\
> >

Re: [RFC PATCH 3/3] powerpc/64s/radix: optimise TLB flush with precise TLB ranges in mmu_gather

2018-06-13 Thread Linus Torvalds
On Thu, Jun 14, 2018 at 11:49 AM Nicholas Piggin wrote:
>
> +#ifndef pte_free_tlb
> #define pte_free_tlb(tlb, ptep, address) \
> do {\
> __tlb_adjust_range(tlb, address, PAGE_SIZE);\
>

Re: [RFC PATCH 3/3] powerpc/64s/radix: optimise TLB flush with precise TLB ranges in mmu_gather

2018-06-13 Thread Nicholas Piggin
On Tue, 12 Jun 2018 18:10:26 -0700 Linus Torvalds wrote:
> On Tue, Jun 12, 2018 at 5:12 PM Nicholas Piggin wrote:
> > >
> > > And in _theory_, maybe you could have just used "invlpg" with a
> > > targeted address instead. In fact, I think a single invlpg invalidates
> > > _all_ caches for the a

Re: [RFC PATCH 3/3] powerpc/64s/radix: optimise TLB flush with precise TLB ranges in mmu_gather

2018-06-12 Thread Linus Torvalds
On Tue, Jun 12, 2018 at 5:12 PM Nicholas Piggin wrote:
> >
> > And in _theory_, maybe you could have just used "invlpg" with a
> > targeted address instead. In fact, I think a single invlpg invalidates
> > _all_ caches for the associated MM, but don't quote me on that.

Confirmed. The SDK says

Re: [RFC PATCH 3/3] powerpc/64s/radix: optimise TLB flush with precise TLB ranges in mmu_gather

2018-06-12 Thread Nicholas Piggin
On Tue, 12 Jun 2018 16:39:55 -0700 Linus Torvalds wrote:
> On Tue, Jun 12, 2018 at 4:26 PM Linus Torvalds wrote:
> >
> > Right. Intel depends on the current thing, ie if a page table
> > *itself* is freed, we will need to do a flush, but it's the exact
> > same flush as if there had been

Re: [RFC PATCH 3/3] powerpc/64s/radix: optimise TLB flush with precise TLB ranges in mmu_gather

2018-06-12 Thread Nicholas Piggin
On Tue, 12 Jun 2018 16:26:33 -0700 Linus Torvalds wrote:
> On Tue, Jun 12, 2018 at 4:09 PM Nicholas Piggin wrote:
> >
> > Sorry I mean Intel needs the existing behaviour of range flush expanded
> > to cover page table pages right?
>
> Right. Intel depends on the current thing, ie if a pa

Re: [RFC PATCH 3/3] powerpc/64s/radix: optimise TLB flush with precise TLB ranges in mmu_gather

2018-06-12 Thread Linus Torvalds
On Tue, Jun 12, 2018 at 4:26 PM Linus Torvalds wrote:
>
> Right. Intel depends on the current thing, ie if a page table
> *itself* is freed, we will need to do a flush, but it's the exact
> same flush as if there had been a regular page there.
>
> That's already handled by (for example) pud_

Re: [RFC PATCH 3/3] powerpc/64s/radix: optimise TLB flush with precise TLB ranges in mmu_gather

2018-06-12 Thread Linus Torvalds
On Tue, Jun 12, 2018 at 4:09 PM Nicholas Piggin wrote:
>
> Sorry I mean Intel needs the existing behaviour of range flush expanded
> to cover page table pages right?

Right. Intel depends on the current thing, ie if a page table *itself* is freed, we will need to do a flush, but it's the

Re: [RFC PATCH 3/3] powerpc/64s/radix: optimise TLB flush with precise TLB ranges in mmu_gather

2018-06-12 Thread Nicholas Piggin
On Tue, 12 Jun 2018 15:42:34 -0700 Linus Torvalds wrote:
> On Tue, Jun 12, 2018 at 3:31 PM Nicholas Piggin wrote:
> >
> > Okay sure, and this is the reason for the wide cc list. Intel does
> > need it of course, from 4.10.3.1 of the dev manual:
> >
> > — The processor may create a PML4-cache e

Re: [RFC PATCH 3/3] powerpc/64s/radix: optimise TLB flush with precise TLB ranges in mmu_gather

2018-06-12 Thread Linus Torvalds
On Tue, Jun 12, 2018 at 3:31 PM Nicholas Piggin wrote:
>
> Okay sure, and this is the reason for the wide cc list. Intel does
> need it of course, from 4.10.3.1 of the dev manual:
>
> — The processor may create a PML4-cache entry even if there are no
> translations for any linear address tha

Re: [RFC PATCH 3/3] powerpc/64s/radix: optimise TLB flush with precise TLB ranges in mmu_gather

2018-06-12 Thread Nicholas Piggin
On Tue, 12 Jun 2018 11:18:27 -0700 Linus Torvalds wrote:
> On Tue, Jun 12, 2018 at 12:16 AM Nicholas Piggin wrote:
> >
> > This brings the number of tlbiel instructions required by a kernel
> > compile from 33M to 25M, most avoided from exec->shift_arg_pages.
>
> And this shows that "page_sta

Re: [RFC PATCH 3/3] powerpc/64s/radix: optimise TLB flush with precise TLB ranges in mmu_gather

2018-06-12 Thread Linus Torvalds
On Tue, Jun 12, 2018 at 12:16 AM Nicholas Piggin wrote:
>
> This brings the number of tlbiel instructions required by a kernel
> compile from 33M to 25M, most avoided from exec->shift_arg_pages.

And this shows that "page_start/end" is purely for powerpc and used nowhere else. The previous patch

[RFC PATCH 3/3] powerpc/64s/radix: optimise TLB flush with precise TLB ranges in mmu_gather

2018-06-12 Thread Nicholas Piggin
Use the page_start and page_end fields of mmu_gather to implement more precise TLB flushing. (start, end) covers the entire TLB and page table range that has been invalidated, for architectures that do not have explicit page walk cache management. page_start and page_end are just for ranges that ma
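
As a rough illustration of the two ranges the patch description distinguishes, here is a minimal C sketch. It is not the kernel code: `struct gather_sketch` and every helper name below are hypothetical stand-ins for `mmu_gather` and its helpers. It also assumes that page_start/page_end grow only when a mapped page is removed, which the truncated preview above does not fully spell out.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical, simplified stand-in for the kernel's struct mmu_gather. */
struct gather_sketch {
    unsigned long start, end;           /* all invalidated addresses, page table pages included */
    unsigned long page_start, page_end; /* assumed: only addresses whose pages were unmapped */
};

/* Start with empty ranges: start above end means "nothing to flush". */
static void gather_init(struct gather_sketch *g)
{
    g->start = g->page_start = ULONG_MAX;
    g->end = g->page_end = 0;
}

/* Grow the half-open range [*lo, *hi) so that it covers [addr, addr + size). */
static void widen(unsigned long *lo, unsigned long *hi,
                  unsigned long addr, unsigned long size)
{
    assert(size > 0);
    if (addr < *lo)
        *lo = addr;
    if (addr + size > *hi)
        *hi = addr + size;
}

/* A mapped page was removed: both the wide and the precise range grow. */
static void gather_unmap_page(struct gather_sketch *g,
                              unsigned long addr, unsigned long size)
{
    widen(&g->start, &g->end, addr, size);
    widen(&g->page_start, &g->page_end, addr, size);
}

/* A page table page was freed: only the wide range grows. */
static void gather_free_table(struct gather_sketch *g,
                              unsigned long addr, unsigned long size)
{
    widen(&g->start, &g->end, addr, size);
}
```

Under these assumptions, unmapping one 4K page at 0x2000 and then freeing a page table that spans the first 2M leaves the wide range at [0, 0x200000) while the precise range stays at [0x2000, 0x3000). An architecture with explicit page walk cache management could then limit its TLB entry invalidation to the precise range.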