Re: [PATCH RFC v6] x86,mm,sched: make lazy TLB mode even lazier
On Fri, Sep 9, 2016 at 12:44 AM, Peter Zijlstra wrote:
> On Thu, Sep 08, 2016 at 09:39:45PM -0700, Andy Lutomirski wrote:
>> If they're busy threads, shouldn't the yield return immediately
>> because the threads are still ready to run?  Lazy TLB won't do much
>> unless you get the kernel in some state where it's running in the
>> context of a different kernel thread and hasn't switched to
>> swapper_pg_dir.  IIRC idle works like that, but you'd need to actually
>> sleep to go idle.
>
> Right, a task doing:
>
>   for (;;) sched_yield();
>
> esp. when it's the only runnable thread on the CPU, is a busy thread.  It
> will not enter switch_mm(), which was where the invalidate hook was
> placed IIRC.

Hi all-

I'm guessing that this patch got abandoned, at least temporarily.  I'm
currently polishing up my PCID series, and I think it might be worth
revisiting this on top of my PCID rework.

The relevant major infrastructure change I'm making with my PCID code
is that I'm adding an atomic64_t to each mm_context_t that gets
incremented every time a flush on that mm is requested.  With that
change, we might be able to get away with simply removing a cpu from
mm_cpumask immediately when it enters lazy mode and adding a hook to
the scheduler to revalidate the TLB state when switching mms when we
were previously lazy.  Revalidation would just check that the counter
hasn't changed.

--Andy
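[Editor's note: the flush-generation scheme Andy describes can be sketched as a userspace model. The struct and function names below are invented for illustration; the real series uses kernel types and percpu state, not these.]

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdio.h>

    /* Hypothetical per-mm context: tlb_gen counts requested flushes. */
    struct mm_context {
    	atomic_llong tlb_gen;
    };

    /* Hypothetical per-cpu state: snapshot of tlb_gen taken the last
     * time this CPU knew the mm's translations were current. */
    struct cpu_tlb_state {
    	long long seen_gen;
    };

    /* A flush request only bumps the generation; lazy CPUs get no IPI. */
    static void request_flush(struct mm_context *mm)
    {
    	atomic_fetch_add(&mm->tlb_gen, 1);
    }

    /* On switching back to the mm after being lazy: compare generations.
     * Returns true if a local flush (CR3 reload) is needed. */
    static bool revalidate(struct cpu_tlb_state *cpu, struct mm_context *mm)
    {
    	long long cur = atomic_load(&mm->tlb_gen);
    	if (cur == cpu->seen_gen)
    		return false;	/* nothing happened while we were lazy */
    	cpu->seen_gen = cur;	/* catch up, then flush locally */
    	return true;
    }

    int main(void)
    {
    	struct mm_context mm;
    	struct cpu_tlb_state cpu = { .seen_gen = 0 };
    	atomic_init(&mm.tlb_gen, 0);

    	printf("%d\n", revalidate(&cpu, &mm));	/* no flush pending: 0 */
    	request_flush(&mm);
    	printf("%d\n", revalidate(&cpu, &mm));	/* stale, must flush: 1 */
    	printf("%d\n", revalidate(&cpu, &mm));	/* caught up again: 0 */
    	return 0;
    }

The point of the counter is that the expensive part (the IPI) is replaced by a cheap compare on the next switch_mm.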
Re: [PATCH RFC v6] x86,mm,sched: make lazy TLB mode even lazier
On Thu, Sep 08, 2016 at 09:39:45PM -0700, Andy Lutomirski wrote:
> If they're busy threads, shouldn't the yield return immediately
> because the threads are still ready to run?  Lazy TLB won't do much
> unless you get the kernel in some state where it's running in the
> context of a different kernel thread and hasn't switched to
> swapper_pg_dir.  IIRC idle works like that, but you'd need to actually
> sleep to go idle.

Right, a task doing:

  for (;;) sched_yield();

esp. when it's the only runnable thread on the CPU, is a busy thread.  It
will not enter switch_mm(), which was where the invalidate hook was
placed IIRC.
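[Editor's note: Peter's point is easy to see from userspace. With no other runnable task on the CPU, sched_yield() just reschedules the caller; a sketch (iteration count is arbitrary):]

    #include <sched.h>
    #include <stdio.h>

    /* Yield n times; returns the number of yields that completed.
     * If the caller is the only runnable task on its CPU, every
     * sched_yield() returns immediately -- the scheduler picks the
     * same task again and never enters switch_mm(). */
    static long yield_n(long n)
    {
    	long done = 0;
    	for (long i = 0; i < n; i++)
    		if (sched_yield() == 0)
    			done++;
    	return done;
    }

    int main(void)
    {
    	printf("%ld yields completed\n", yield_n(1000000));
    	return 0;
    }

Since the task never switches away, the CPU never enters lazy TLB mode, which is why a yield loop cannot exercise the invalidate hook in switch_mm().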
Re: [PATCH RFC v6] x86,mm,sched: make lazy TLB mode even lazier
On Thu, Sep 8, 2016 at 5:09 PM, Benjamin Serebrin wrote:
> Sorry for the delay, I was eaten by a grue.
>
> I found that my initial study did not actually measure the number of
> TLB shootdown IPIs sent per TLB shootdown.  I think the intuition was
> correct but I didn't actually observe what I thought I had; my
> original use of probe points was incorrect.  However, after fixing my
> methodology, I'm having trouble proving that the existing Lazy TLB
> mode is working properly.
>
> I've spent some time trying to reproduce this in a microbenchmark.
> One thread does mmap, touch page, munmap, while other threads in the
> same process are configured to either busy-spin or busy-spin and
> yield.  All threads set their own affinity to a unique cpu, and the
> system is otherwise idle.  I look at the per-cpu delta of the TLB and
> CAL lines of /proc/interrupts over the run of the microbenchmark.
>
> Let's say I have 4 spin threads that never yield.  The mmap thread
> does N unmaps.  I observe each spin-thread core receives N (+/- small
> noise) TLB shootdown interrupts, and the total TLB interrupt count is
> 4N (+/- small noise).  This is expected behavior.
>
> Then I add some synchronization: the unmap thread rendezvouses with
> all the spinners, and when they are all ready, the spinners busy-spin
> for D milliseconds and then yield (pthread_yield, sched_yield produce
> identical results, though I'm not confident here that this is the
> right yield).  Meanwhile, the unmap thread busy-spins for D+E
> milliseconds and then does M map/touch/unmaps.  (D, E are single-digit
> milliseconds).  The idea here is that the unmap happens a little while
> after the spinners yielded; the kernel should be in the user process'
> mm but lazy TLB mode should defer TLB flushes.  It seems that lazy
> mode on each CPU should take 1 interrupt and then suppress subsequent
> interrupts.

If they're busy threads, shouldn't the yield return immediately
because the threads are still ready to run?

Lazy TLB won't do much unless you get the kernel in some state where
it's running in the context of a different kernel thread and hasn't
switched to swapper_pg_dir.  IIRC idle works like that, but you'd need
to actually sleep to go idle.

--Andy
Re: [PATCH RFC v6] x86,mm,sched: make lazy TLB mode even lazier
Sorry for the delay, I was eaten by a grue.

I found that my initial study did not actually measure the number of
TLB shootdown IPIs sent per TLB shootdown.  I think the intuition was
correct but I didn't actually observe what I thought I had; my original
use of probe points was incorrect.  However, after fixing my
methodology, I'm having trouble proving that the existing Lazy TLB mode
is working properly.

I've spent some time trying to reproduce this in a microbenchmark.  One
thread does mmap, touch page, munmap, while other threads in the same
process are configured to either busy-spin or busy-spin and yield.  All
threads set their own affinity to a unique cpu, and the system is
otherwise idle.  I look at the per-cpu delta of the TLB and CAL lines
of /proc/interrupts over the run of the microbenchmark.

Let's say I have 4 spin threads that never yield.  The mmap thread does
N unmaps.  I observe each spin-thread core receives N (+/- small noise)
TLB shootdown interrupts, and the total TLB interrupt count is 4N (+/-
small noise).  This is expected behavior.

Then I add some synchronization: the unmap thread rendezvouses with all
the spinners, and when they are all ready, the spinners busy-spin for D
milliseconds and then yield (pthread_yield, sched_yield produce
identical results, though I'm not confident here that this is the right
yield).  Meanwhile, the unmap thread busy-spins for D+E milliseconds
and then does M map/touch/unmaps.  (D, E are single-digit
milliseconds).  The idea here is that the unmap happens a little while
after the spinners yielded; the kernel should be in the user process'
mm but lazy TLB mode should defer TLB flushes.  It seems that lazy mode
on each CPU should take 1 interrupt and then suppress subsequent
interrupts.

I expect lazy TLB invalidation to take 1 interrupt on each spinner CPU,
per rendezvous sequence, and I expect Rik's extra-lazy version to take
0.  I see M in all cases.

This leads me to wonder if I'm failing to trigger lazy TLB
invalidation, or if lazy TLB invalidation is not working as intended.

I get similar results using perf record on probe points: I filter by
CPU number and count the number of IPIs sent per pair of probe points
in the TLB flush routines.  I put probe points on flush_tlb_mm_range
and flush_tlb_mm_range%return.

Counting the number of IPIs sent: in a VM that uses x2APIC physical
mode, probing native_x2apic_icr_write or __x2apic_send_IPI_dest is
usually convenient if it doesn't get inlined away (which sometimes
happens), since that function is called once per CPU target in the
cpu_mask of __x2apic_send_IPI_mask (in x2APIC physical mode).  I filter
perf script to look at the distribution of cpus targeted per TLB
shootdown.

Rik's patch definitely looks correct, but I can't yet cite the gains.

Thanks!
Ben

On Wed, Sep 7, 2016 at 11:56 PM, Ingo Molnar wrote:
>
> * Rik van Riel wrote:
>
>> On Sat, 27 Aug 2016 16:02:25 -0700
>> Linus Torvalds wrote:
>>
>> > Yeah, with those small fixes from Ingo, I definitely don't think this
>> > looks hacky at all. This all seems to be exactly what we should always
>> > have done.
>>
>> OK, so I was too tired yesterday to do kernel hacking, and
>> missed yet another bit (xen_flush_tlb_others). Sigh.
>>
>> Otherwise, the patch is identical.
>>
>> Looking forward to Ben's test results.
>
> Gentle ping to Ben.
>
> I can also apply this without waiting for the test result, the patch
> looks sane enough to me.
>
> Thanks,
>
>         Ingo
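[Editor's note: a rough sketch of the microbenchmark Ben describes. Thread counts, durations, and iteration counts below are illustrative placeholders, not Ben's actual values, and the interrupt counts would still be read separately from /proc/interrupts.]

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <time.h>

    #define NSPIN 2   /* spinner threads (the mail uses 4) */
    #define D_MS  5   /* spinners busy-spin this long, then yield */
    #define E_MS  2   /* unmap thread waits an extra E ms */
    #define M     10  /* map/touch/unmap iterations */

    static pthread_barrier_t rendezvous;

    static double now_ms(void)
    {
    	struct timespec ts;
    	clock_gettime(CLOCK_MONOTONIC, &ts);
    	return ts.tv_sec * 1e3 + ts.tv_nsec / 1e6;
    }

    static void pin_to_cpu(int cpu)
    {
    	cpu_set_t set;
    	CPU_ZERO(&set);
    	CPU_SET(cpu, &set);
    	/* Best effort: ignore failure on machines with few CPUs. */
    	pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    }

    static void *spinner(void *arg)
    {
    	pin_to_cpu((int)(long)arg);
    	pthread_barrier_wait(&rendezvous);
    	double t0 = now_ms();
    	while (now_ms() - t0 < D_MS)
    		;               /* busy-spin */
    	sched_yield();          /* then stop running: CPU may go lazy */
    	return NULL;
    }

    /* Returns the number of map/touch/unmap iterations completed. */
    static int run_benchmark(void)
    {
    	pthread_t tid[NSPIN];
    	int done = 0;

    	pthread_barrier_init(&rendezvous, NULL, NSPIN + 1);
    	for (long i = 0; i < NSPIN; i++)
    		pthread_create(&tid[i], NULL, spinner, (void *)(i + 1));

    	pin_to_cpu(0);
    	pthread_barrier_wait(&rendezvous);
    	double t0 = now_ms();
    	while (now_ms() - t0 < D_MS + E_MS)
    		;               /* give the spinners time to yield */

    	for (int i = 0; i < M; i++) {
    		char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
    			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    		if (p == MAP_FAILED)
    			break;
    		p[0] = 1;        /* touch: populate a TLB entry */
    		munmap(p, 4096); /* each unmap can trigger shootdowns */
    		done++;
    	}
    	for (int i = 0; i < NSPIN; i++)
    		pthread_join(tid[i], NULL);
    	pthread_barrier_destroy(&rendezvous);
    	return done;
    }

    int main(void)
    {
    	printf("%d unmaps done\n", run_benchmark());
    	return 0;
    }

To reproduce Ben's measurement, one would snapshot the TLB and CAL rows of /proc/interrupts before and after the run and compare the per-cpu deltas against M.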
Re: [PATCH RFC v6] x86,mm,sched: make lazy TLB mode even lazier
* Rik van Riel wrote:
> On Sat, 27 Aug 2016 16:02:25 -0700
> Linus Torvalds wrote:
>
> > Yeah, with those small fixes from Ingo, I definitely don't think this
> > looks hacky at all. This all seems to be exactly what we should always
> > have done.
>
> OK, so I was too tired yesterday to do kernel hacking, and
> missed yet another bit (xen_flush_tlb_others). Sigh.
>
> Otherwise, the patch is identical.
>
> Looking forward to Ben's test results.

Gentle ping to Ben.

I can also apply this without waiting for the test result, the patch
looks sane enough to me.

Thanks,

        Ingo
[PATCH RFC v6] x86,mm,sched: make lazy TLB mode even lazier
On Sat, 27 Aug 2016 16:02:25 -0700, Linus Torvalds wrote:

> Yeah, with those small fixes from Ingo, I definitely don't think this
> looks hacky at all. This all seems to be exactly what we should always
> have done.

OK, so I was too tired yesterday to do kernel hacking, and missed yet
another bit (xen_flush_tlb_others).  Sigh.

Otherwise, the patch is identical.

Looking forward to Ben's test results.

---8<---
Subject: x86,mm,sched: make lazy TLB mode even lazier

Lazy TLB mode can result in an idle CPU being woken up for a TLB flush,
when all it really needed to do was flush %CR3 before the next context
switch.  This is mostly fine on bare metal, though sub-optimal from a
power saving point of view, and deeper C-states could make TLB flushes
take a little longer than desired.

On virtual machines, the pain can be much worse, especially if a
currently non-running VCPU is woken up for a TLB invalidation IPI, on a
CPU that is busy running another task.  It could take a while before
that IPI is handled, leading to performance issues.

This patch deals with the issue by introducing a third TLB state,
TLBSTATE_FLUSH, which causes %CR3 to be flushed at the next context
switch.  A CPU that transitions from TLBSTATE_LAZY to TLBSTATE_OK
during the attempted transition to TLBSTATE_FLUSH will get a TLB flush
IPI, just like a CPU that was in TLBSTATE_OK to begin with.  Nothing is
done for a CPU that is already in TLBSTATE_FLUSH mode.
Signed-off-by: Rik van Riel
Reported-by: Benjamin Serebrin
---
 arch/x86/include/asm/paravirt_types.h |  2 +-
 arch/x86/include/asm/tlbflush.h       |  3 +-
 arch/x86/include/asm/uv/uv.h          |  6 ++--
 arch/x86/mm/tlb.c                     | 64 ---
 arch/x86/platform/uv/tlb_uv.c         |  2 +-
 arch/x86/xen/mmu.c                    |  2 +-
 6 files changed, 68 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 7fa9e7740ba3..b7e695c90c43 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -225,7 +225,7 @@ struct pv_mmu_ops {
 	void (*flush_tlb_user)(void);
 	void (*flush_tlb_kernel)(void);
 	void (*flush_tlb_single)(unsigned long addr);
-	void (*flush_tlb_others)(const struct cpumask *cpus,
+	void (*flush_tlb_others)(struct cpumask *cpus,
 				 struct mm_struct *mm,
 				 unsigned long start,
 				 unsigned long end);
diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 4e5be94e079a..c3dbacbc49be 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -304,12 +304,13 @@ extern void flush_tlb_kernel_range(unsigned long start, unsigned long end);
 
 #define flush_tlb()	flush_tlb_current_task()
 
-void native_flush_tlb_others(const struct cpumask *cpumask,
+void native_flush_tlb_others(struct cpumask *cpumask,
 			     struct mm_struct *mm,
 			     unsigned long start, unsigned long end);
 
 #define TLBSTATE_OK	1
 #define TLBSTATE_LAZY	2
+#define TLBSTATE_FLUSH	3
 
 static inline void reset_lazy_tlbstate(void)
 {
diff --git a/arch/x86/include/asm/uv/uv.h b/arch/x86/include/asm/uv/uv.h
index 062921ef34e9..7e83cc633ba1 100644
--- a/arch/x86/include/asm/uv/uv.h
+++ b/arch/x86/include/asm/uv/uv.h
@@ -13,7 +13,7 @@ extern int is_uv_system(void);
 extern void uv_cpu_init(void);
 extern void uv_nmi_init(void);
 extern void uv_system_init(void);
-extern const struct cpumask *uv_flush_tlb_others(const struct cpumask *cpumask,
+extern struct cpumask *uv_flush_tlb_others(struct cpumask *cpumask,
 						 struct mm_struct *mm,
 						 unsigned long start,
 						 unsigned long end,
@@ -25,8 +25,8 @@ static inline enum uv_system_type get_uv_system_type(void) { return UV_NONE; }
 static inline int is_uv_system(void)	{ return 0; }
 static inline void uv_cpu_init(void)	{ }
 static inline void uv_system_init(void)	{ }
-static inline const struct cpumask *
-uv_flush_tlb_others(const struct cpumask *cpumask, struct mm_struct *mm,
+static inline struct cpumask *
+uv_flush_tlb_others(struct cpumask *cpumask, struct mm_struct *mm,
 		    unsigned long start, unsigned long end, unsigned int cpu)
 { return cpumask; }
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 5643fd0b1a7d..634248b38db9 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -140,10 +140,24 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 	}
 #ifdef CONFIG_SMP
 	  else {
+		int *tlbstate = this_cpu_ptr(&cpu_tlbstate.state);
+		int oldstate = *tlbstate;
+
+		if (unlikely(oldstate == TLBSTATE_LAZY)) {
+			/*
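[Editor's note: the archived diff is truncated above. The three-state transition the changelog describes can be modeled in userspace as follows; apart from the TLBSTATE_* constants taken from the patch, the names and the cmpxchg-based structure here are an illustrative sketch, not the kernel code itself.]

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdio.h>

    #define TLBSTATE_OK    1
    #define TLBSTATE_LAZY  2
    #define TLBSTATE_FLUSH 3

    /* Flush-side decision for one remote CPU.  Returns true if the CPU
     * still needs a flush IPI, false if a deferred flush was installed
     * (or was already pending). */
    static bool needs_flush_ipi(atomic_int *tlbstate)
    {
    	int old = atomic_load(tlbstate);
    	for (;;) {
    		switch (old) {
    		case TLBSTATE_OK:
    			return true;   /* actively using the mm: IPI it */
    		case TLBSTATE_FLUSH:
    			return false;  /* flush already pending: nothing to do */
    		case TLBSTATE_LAZY:
    			/* Try LAZY -> FLUSH; if the CPU raced back to
    			 * TLBSTATE_OK underneath us, the compare-exchange
    			 * fails, old is reloaded, and we IPI it after all. */
    			if (atomic_compare_exchange_weak(tlbstate, &old,
    							 TLBSTATE_FLUSH))
    				return false;
    			break;
    		}
    	}
    }

    int main(void)
    {
    	atomic_int st;
    	atomic_init(&st, TLBSTATE_LAZY);

    	printf("%d\n", needs_flush_ipi(&st));  /* lazy: deferred, prints 0 */
    	printf("%d\n", atomic_load(&st) == TLBSTATE_FLUSH);  /* prints 1 */
    	atomic_store(&st, TLBSTATE_OK);
    	printf("%d\n", needs_flush_ipi(&st));  /* active: IPI, prints 1 */
    	return 0;
    }

The key property is the race handling in the LAZY case: a CPU that leaves lazy mode concurrently is never left with a stale TLB, because the failed compare-exchange falls back to sending it the IPI.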