Re: [PATCH] x86/mm/pat: Fix missing preemption disable for __native_flush_tlb()
On 11/10/18 4:31 PM, Dan Williams wrote:
>> If it indeed can run late in boot or after boot, then it sure looks
>> buggy. Either the __flush_tlb_all() should be removed or it should
>> be replaced with flush_tlb_kernel_range(). It’s unclear to me why a
>> flush is needed at all, but if it’s needed, surely all CPUs need
>> flushing.
> Yeah, I don't think __flush_tlb_all() is needed at
> kernel_physical_mapping_init() time, and at
> kernel_physical_mapping_remove() time we do a full flush_tlb_all().

It doesn't look strictly necessary to me.  I _think_ we're only ever
populating previously non-present entries, and those never need TLB
flushes.  I didn't look too deeply, so I'd appreciate anyone else
double-checking me on this.

The __flush_tlb_all() actually appears to predate git and it was
originally entirely intended for early-boot-only.  It probably lasted
this long because it looks really important. :)  It was even next to
where we set MMU features in CR4, which is *really* early in boot:

> +	asm volatile("movq %%cr4,%0" : "=r" (mmu_cr4_features));
> +	__flush_tlb_all();

I also totally agree with Andy that if it were needed on the local CPU,
this code would be buggy because it doesn't initiate any *remote* TLB
flushes.

So, let's remove it, but also add some comments about not being allowed
to *change* page table entries, only populate them.  We could even add
some warnings to keep this enforced.
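To make the "populate only" rule concrete, here is a minimal sketch of
the kind of warning being suggested. It is not part of any posted
patch; populate_only_set_pte() is a hypothetical helper name, and a
real version would need equivalents for each page-table level:

    /*
     * Hypothetical sketch: enforce that kernel_physical_mapping_init()
     * only populates previously non-present entries.  Changing a live
     * entry would require a TLB flush; pure population does not.
     */
    static void populate_only_set_pte(pte_t *ptep, pte_t new_pte)
    {
            /* Warn if we are overwriting an entry that is already live. */
            WARN_ON_ONCE(pte_present(*ptep));
            set_pte(ptep, new_pte);
    }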
Re: [PATCH] x86/mm/pat: Fix missing preemption disable for __native_flush_tlb()
[ added Kirill ]

On Sat, Nov 10, 2018 at 4:19 PM Andy Lutomirski wrote:
> > On Nov 10, 2018, at 3:57 PM, Dan Williams wrote:
> >
> >> On Fri, Nov 9, 2018 at 4:22 PM Andy Lutomirski wrote:
> >>
> >>
> >>
> >>> On Nov 9, 2018, at 4:05 PM, Dan Williams wrote:
> >>>
> >>> Commit f77084d96355 "x86/mm/pat: Disable preemption around
> >>> __flush_tlb_all()" addressed a case where __flush_tlb_all() is called
> >>> without preemption being disabled. It also left a warning to catch other
> >>> cases where preemption is not disabled. That warning triggers for the
> >>> memory hotplug path which is also used for persistent memory enabling:
> >>
> >> I don’t think I agree with the patch. If you call __flush_tlb_all() in a
> >> context where you might be *migrated*, then there’s a bug. We could change
> >> the code to allow this particular use by checking that we haven’t done SMP
> >> init yet, perhaps.
> >
> > Hmm, are you saying the entire kernel_physical_mapping_init() sequence
> > needs to run with preemption disabled?
>
> If it indeed can run late in boot or after boot, then it sure looks buggy.
> Either the __flush_tlb_all() should be removed or it should be replaced with
> flush_tlb_kernel_range(). It’s unclear to me why a flush is needed at all,
> but if it’s needed, surely all CPUs need flushing.

Yeah, I don't think __flush_tlb_all() is needed at
kernel_physical_mapping_init() time, and at
kernel_physical_mapping_remove() time we do a full flush_tlb_all().

Kirill?
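For context on the distinction being drawn here: __flush_tlb_all()
flushes only the local CPU, while flush_tlb_all() reaches every CPU. A
simplified view of the latter (loosely adapted from arch/x86/mm/tlb.c
of that era; event accounting elided):

    /* Simplified: IPI every online CPU so each one flushes its own TLB. */
    void flush_tlb_all(void)
    {
            on_each_cpu(do_flush_tlb_all, NULL, 1);
    }

This is why the teardown path is not suspect: it already flushes all CPUs.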
Re: [PATCH] x86/mm/pat: Fix missing preemption disable for __native_flush_tlb()
> On Nov 10, 2018, at 3:57 PM, Dan Williams wrote:
>
>> On Fri, Nov 9, 2018 at 4:22 PM Andy Lutomirski wrote:
>>
>>
>>
>>> On Nov 9, 2018, at 4:05 PM, Dan Williams wrote:
>>>
>>> Commit f77084d96355 "x86/mm/pat: Disable preemption around
>>> __flush_tlb_all()" addressed a case where __flush_tlb_all() is called
>>> without preemption being disabled. It also left a warning to catch other
>>> cases where preemption is not disabled. That warning triggers for the
>>> memory hotplug path which is also used for persistent memory enabling:
>>
>> I don’t think I agree with the patch. If you call __flush_tlb_all() in a
>> context where you might be *migrated*, then there’s a bug. We could change
>> the code to allow this particular use by checking that we haven’t done SMP
>> init yet, perhaps.
>
> Hmm, are you saying the entire kernel_physical_mapping_init() sequence
> needs to run with preemption disabled?

If it indeed can run late in boot or after boot, then it sure looks buggy.
Either the __flush_tlb_all() should be removed or it should be replaced with
flush_tlb_kernel_range(). It’s unclear to me why a flush is needed at all,
but if it’s needed, surely all CPUs need flushing.
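A sketch of the replacement being described, under the assumption that
start and end name the virtual range just populated (these identifiers
are illustrative, not from a posted patch):

    /*
     * Flush the affected kernel range on *all* CPUs instead of doing a
     * local-only __flush_tlb_all().  flush_tlb_kernel_range() sends IPIs
     * to the other CPUs, so it is also safe in preemptible context.
     */
    flush_tlb_kernel_range(start, end);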
Re: [PATCH] x86/mm/pat: Fix missing preemption disable for __native_flush_tlb()
On Fri, Nov 9, 2018 at 4:22 PM Andy Lutomirski wrote:
>
>
>
> > On Nov 9, 2018, at 4:05 PM, Dan Williams wrote:
> >
> > Commit f77084d96355 "x86/mm/pat: Disable preemption around
> > __flush_tlb_all()" addressed a case where __flush_tlb_all() is called
> > without preemption being disabled. It also left a warning to catch other
> > cases where preemption is not disabled. That warning triggers for the
> > memory hotplug path which is also used for persistent memory enabling:
>
> I don’t think I agree with the patch. If you call __flush_tlb_all() in a
> context where you might be *migrated*, then there’s a bug. We could change
> the code to allow this particular use by checking that we haven’t done SMP
> init yet, perhaps.

Hmm, are you saying the entire kernel_physical_mapping_init() sequence
needs to run with preemption disabled?
Re: [PATCH] x86/mm/pat: Fix missing preemption disable for __native_flush_tlb()
> On Nov 9, 2018, at 4:05 PM, Dan Williams wrote:
>
> Commit f77084d96355 "x86/mm/pat: Disable preemption around
> __flush_tlb_all()" addressed a case where __flush_tlb_all() is called
> without preemption being disabled. It also left a warning to catch other
> cases where preemption is not disabled. That warning triggers for the
> memory hotplug path which is also used for persistent memory enabling:

I don’t think I agree with the patch. If you call __flush_tlb_all() in a
context where you might be *migrated*, then there’s a bug. We could change
the code to allow this particular use by checking that we haven’t done SMP
init yet, perhaps.

>
> WARNING: CPU: 35 PID: 911 at ./arch/x86/include/asm/tlbflush.h:460
> RIP: 0010:__flush_tlb_all+0x1b/0x3a
> [..]
> Call Trace:
>  phys_pud_init+0x29c/0x2bb
>  kernel_physical_mapping_init+0xfc/0x219
>  init_memory_mapping+0x1a5/0x3b0
>  arch_add_memory+0x2c/0x50
>  devm_memremap_pages+0x3aa/0x610
>  pmem_attach_disk+0x585/0x700 [nd_pmem]
>
> Rather than audit all __flush_tlb_all() callers to add preemption, just
> do it internally to __flush_tlb_all().
>
> Fixes: f77084d96355 ("x86/mm/pat: Disable preemption around
> __flush_tlb_all()")
> Cc: Sebastian Andrzej Siewior
> Cc: Thomas Gleixner
> Cc: Andy Lutomirski
> Cc: Dave Hansen
> Cc: Peter Zijlstra
> Cc: Borislav Petkov
> Cc:
> Signed-off-by: Dan Williams
> ---
>  arch/x86/include/asm/tlbflush.h |    8 ++++----
>  arch/x86/mm/pageattr.c          |    6 +-----
>  2 files changed, 5 insertions(+), 9 deletions(-)
>
> diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
> index d760611cfc35..049e0aca0fb5 100644
> --- a/arch/x86/include/asm/tlbflush.h
> +++ b/arch/x86/include/asm/tlbflush.h
> @@ -454,11 +454,10 @@ static inline void __native_flush_tlb_one_user(unsigned long addr)
>  static inline void __flush_tlb_all(void)
>  {
>  	/*
> -	 * This is to catch users with enabled preemption and the PGE feature
> -	 * and don't trigger the warning in __native_flush_tlb().
> +	 * Preemption needs to be disabled around __flush_tlb* calls
> +	 * due to CR3 reload in __native_flush_tlb().
>  	 */
> -	VM_WARN_ON_ONCE(preemptible());
> -
> +	preempt_disable();
>  	if (boot_cpu_has(X86_FEATURE_PGE)) {
>  		__flush_tlb_global();
>  	} else {
> @@ -467,6 +466,7 @@ static inline void __flush_tlb_all(void)
>  		 */
>  		__flush_tlb();
>  	}
> +	preempt_enable();
>  }
>
>  /*
> diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
> index db7a10082238..f799076e3d57 100644
> --- a/arch/x86/mm/pageattr.c
> +++ b/arch/x86/mm/pageattr.c
> @@ -2309,13 +2309,9 @@ void __kernel_map_pages(struct page *page, int numpages, int enable)
>
>  	/*
>  	 * We should perform an IPI and flush all tlbs,
> -	 * but that can deadlock->flush only current cpu.
> -	 * Preemption needs to be disabled around __flush_tlb_all() due to
> -	 * CR3 reload in __native_flush_tlb().
> +	 * but that can deadlock->flush only current cpu:
>  	 */
> -	preempt_disable();
>  	__flush_tlb_all();
> -	preempt_enable();
>
>  	arch_flush_lazy_mmu_mode();
>  }
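For readers following the patch: the "CR3 reload" the comments refer to
is, in essence, a read-modify-write of CR3. A simplified sketch of the
4.19-era __native_flush_tlb() (ASID handling elided) shows why being
migrated in the middle is a bug:

    static inline void __native_flush_tlb(void)
    {
            /*
             * If the task migrates between the read and the write, CR3
             * read on one CPU is written back on another, and the CPU
             * that actually needed the flush never gets one.
             */
            native_write_cr3(__native_read_cr3());
    }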