Re: [PATCH] x86/mm/pat: Fix missing preemption disable for __native_flush_tlb()

2018-11-12 Thread Dave Hansen
On 11/10/18 4:31 PM, Dan Williams wrote:
>> If it indeed can run late in boot or after boot, then it sure looks
>> buggy. Either the __flush_tlb_all() should be removed or it should
>> be replaced with flush_tlb_kernel_range(). It’s unclear to me why a
>> flush is needed at all, but if it’s needed, surely all CPUs need
>> flushing.
> Yeah, I don't think __flush_tlb_all() is needed at 
> kernel_physical_mapping_init() time, and at 
> kernel_physical_mapping_remove() time we do a full flush_tlb_all().

It doesn't look strictly necessary to me.  I _think_ we're only ever
populating previously non-present entries, and those never need TLB
flushes.  I didn't look too deeply, so I'd appreciate anyone else
double-checking me on this.

The __flush_tlb_all() actually appears to predate git, and it was
originally intended for early boot only.  It probably lasted this long
because it looks really important. :)

It was even next to where we set MMU features in CR4, which is *really*
early in boot:

> +   asm volatile("movq %%cr4,%0" : "=r" (mmu_cr4_features));
> +   __flush_tlb_all();

I also totally agree with Andy that if it were needed on the local CPU,
this code would be buggy because it doesn't initiate any *remote* TLB
flushes.

So, let's remove it, but also add some comments about not being allowed
to *change* page table entries, only populate them.  We could even add
some warnings to keep this enforced.
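A minimal userspace sketch of the populate-only rule and the kind of
warning suggested above (the helper name `populate_entry` and the whole
model are hypothetical illustrations, not actual kernel code):

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Toy model of a page-table entry slot.  In the kernel this would be
 * a pud/pmd/pte; here it is just a 64-bit word where 0 means
 * "not present".
 */
static int populate_entry(uint64_t *slot, uint64_t newval)
{
	/*
	 * Enforce the populate-only rule: changing an already-present
	 * entry would require a TLB flush, which this path never does,
	 * so warn and refuse instead of silently rewriting it.
	 */
	if (*slot != 0 && *slot != newval) {
		fprintf(stderr,
			"WARN: changing a present entry needs a TLB flush\n");
		return -1;
	}
	*slot = newval;
	return 0;
}
```

Populating an empty slot (or re-writing the same value) succeeds;
changing a present entry trips the warning.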


Re: [PATCH] x86/mm/pat: Fix missing preemption disable for __native_flush_tlb()

2018-11-10 Thread Dan Williams
[ added Kirill ]

On Sat, Nov 10, 2018 at 4:19 PM Andy Lutomirski  wrote:
> > On Nov 10, 2018, at 3:57 PM, Dan Williams  wrote:
> >
> >> On Fri, Nov 9, 2018 at 4:22 PM Andy Lutomirski  wrote:
> >>
> >>
> >>
> >>> On Nov 9, 2018, at 4:05 PM, Dan Williams  wrote:
> >>>
> >>> Commit f77084d96355 "x86/mm/pat: Disable preemption around
> >>> __flush_tlb_all()" addressed a case where __flush_tlb_all() is called
> >>> without preemption being disabled. It also left a warning to catch other
> >>> cases where preemption is not disabled. That warning triggers for the
> >>> memory hotplug path which is also used for persistent memory enabling:
> >>
> >> I don’t think I agree with the patch. If you call __flush_tlb_all() in a 
> >> context where you might be *migrated*, then there’s a bug. We could change 
> >> the code to allow this particular use by checking that we haven’t done SMP 
> >> init yet, perhaps.
> >
> > Hmm, are you saying the entire kernel_physical_mapping_init() sequence
> > needs to run with preemption disabled?
>
> If it indeed can run late in boot or after boot, then it sure looks buggy. 
> Either the __flush_tlb_all() should be removed or it should be replaced with 
> flush_tlb_kernel_range(). It’s unclear to me why a flush is needed at all, 
> but if it’s needed, surely all CPUs need flushing.

Yeah, I don't think __flush_tlb_all() is needed at
kernel_physical_mapping_init() time, and at
kernel_physical_mapping_remove() time we do a full flush_tlb_all().
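The local-versus-remote distinction driving this thread can be modeled
in a toy sketch (nothing below is the real kernel API; `local_flush`
and `global_flush` only stand in for __flush_tlb_all() and
flush_tlb_all() respectively):

```c
#include <stdbool.h>

#define NR_CPUS 4

/* Toy model: each CPU has a flag marking a stale TLB entry. */
static bool tlb_stale[NR_CPUS];
static int this_cpu;

/* Models __flush_tlb_all(): flushes only the CPU it runs on. */
static void local_flush(void)
{
	tlb_stale[this_cpu] = false;
}

/* Models flush_tlb_all(): IPIs every CPU, so every stale entry goes. */
static void global_flush(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		tlb_stale[cpu] = false;
}
```

This is why a local-only flush in a path that can run after SMP bring-up
is suspect: other CPUs keep their stale entries.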

Kirill?


Re: [PATCH] x86/mm/pat: Fix missing preemption disable for __native_flush_tlb()

2018-11-10 Thread Andy Lutomirski


> On Nov 10, 2018, at 3:57 PM, Dan Williams  wrote:
> 
>> On Fri, Nov 9, 2018 at 4:22 PM Andy Lutomirski  wrote:
>> 
>> 
>> 
>>> On Nov 9, 2018, at 4:05 PM, Dan Williams  wrote:
>>> 
>>> Commit f77084d96355 "x86/mm/pat: Disable preemption around
>>> __flush_tlb_all()" addressed a case where __flush_tlb_all() is called
>>> without preemption being disabled. It also left a warning to catch other
>>> cases where preemption is not disabled. That warning triggers for the
>>> memory hotplug path which is also used for persistent memory enabling:
>> 
>> I don’t think I agree with the patch. If you call __flush_tlb_all() in a 
>> context where you might be *migrated*, then there’s a bug. We could change 
>> the code to allow this particular use by checking that we haven’t done SMP 
>> init yet, perhaps.
> 
> Hmm, are you saying the entire kernel_physical_mapping_init() sequence
> needs to run with preemption disabled?

If it indeed can run late in boot or after boot, then it sure looks buggy. 
Either the __flush_tlb_all() should be removed or it should be replaced with 
flush_tlb_kernel_range(). It’s unclear to me why a flush is needed at all, but 
if it’s needed, surely all CPUs need flushing.

Re: [PATCH] x86/mm/pat: Fix missing preemption disable for __native_flush_tlb()

2018-11-10 Thread Dan Williams
On Fri, Nov 9, 2018 at 4:22 PM Andy Lutomirski  wrote:
>
>
>
> > On Nov 9, 2018, at 4:05 PM, Dan Williams  wrote:
> >
> > Commit f77084d96355 "x86/mm/pat: Disable preemption around
> > __flush_tlb_all()" addressed a case where __flush_tlb_all() is called
> > without preemption being disabled. It also left a warning to catch other
> > cases where preemption is not disabled. That warning triggers for the
> > memory hotplug path which is also used for persistent memory enabling:
>
> I don’t think I agree with the patch. If you call __flush_tlb_all() in a 
> context where you might be *migrated*, then there’s a bug. We could change 
> the code to allow this particular use by checking that we haven’t done SMP 
> init yet, perhaps.

Hmm, are you saying the entire kernel_physical_mapping_init() sequence
needs to run with preemption disabled?


Re: [PATCH] x86/mm/pat: Fix missing preemption disable for __native_flush_tlb()

2018-11-09 Thread Andy Lutomirski



> On Nov 9, 2018, at 4:05 PM, Dan Williams  wrote:
> 
> Commit f77084d96355 "x86/mm/pat: Disable preemption around
> __flush_tlb_all()" addressed a case where __flush_tlb_all() is called
> without preemption being disabled. It also left a warning to catch other
> cases where preemption is not disabled. That warning triggers for the
> memory hotplug path which is also used for persistent memory enabling:

I don’t think I agree with the patch. If you call __flush_tlb_all() in a 
context where you might be *migrated*, then there’s a bug. We could change the 
code to allow this particular use by checking that we haven’t done SMP init 
yet, perhaps.
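The "check that we haven't done SMP init yet" alternative could be
modeled roughly like this (purely illustrative; `smp_init_done`,
`preempt_off`, and the predicate are made-up names, not kernel symbols):

```c
#include <stdbool.h>

/* Hypothetical globals modeling kernel state. */
static bool smp_init_done;	/* set once secondary CPUs are up   */
static bool preempt_off;	/* models preempt_disable() nesting */

/*
 * Would-be warning condition inside __flush_tlb_all(): before SMP
 * init only one CPU exists and nothing can migrate us, so a local
 * flush is safe even when preemptible; afterwards the caller must
 * have preemption disabled.
 */
static bool flush_tlb_all_ok(void)
{
	return !smp_init_done || preempt_off;
}
```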

> 
> WARNING: CPU: 35 PID: 911 at ./arch/x86/include/asm/tlbflush.h:460
> RIP: 0010:__flush_tlb_all+0x1b/0x3a
> [..]
> Call Trace:
>  phys_pud_init+0x29c/0x2bb
>  kernel_physical_mapping_init+0xfc/0x219
>  init_memory_mapping+0x1a5/0x3b0
>  arch_add_memory+0x2c/0x50
>  devm_memremap_pages+0x3aa/0x610
>  pmem_attach_disk+0x585/0x700 [nd_pmem]
> 
> Rather than audit all __flush_tlb_all() callers to add preemption, just
> do it internally to __flush_tlb_all().
> 
> Fixes: f77084d96355 ("x86/mm/pat: Disable preemption around 
> __flush_tlb_all()")
> Cc: Sebastian Andrzej Siewior 
> Cc: Thomas Gleixner 
> Cc: Andy Lutomirski 
> Cc: Dave Hansen 
> Cc: Peter Zijlstra 
> Cc: Borislav Petkov 
> Cc: 
> Signed-off-by: Dan Williams 
> ---
> arch/x86/include/asm/tlbflush.h |8 
> arch/x86/mm/pageattr.c  |6 +-
> 2 files changed, 5 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
> index d760611cfc35..049e0aca0fb5 100644
> --- a/arch/x86/include/asm/tlbflush.h
> +++ b/arch/x86/include/asm/tlbflush.h
> @@ -454,11 +454,10 @@ static inline void __native_flush_tlb_one_user(unsigned 
> long addr)
> static inline void __flush_tlb_all(void)
> {
>/*
> - * This is to catch users with enabled preemption and the PGE feature
> - * and don't trigger the warning in __native_flush_tlb().
> + *  Preemption needs to be disabled around __flush_tlb* calls
> + *  due to CR3 reload in __native_flush_tlb().
> */
> -VM_WARN_ON_ONCE(preemptible());
> -
> +preempt_disable();
>if (boot_cpu_has(X86_FEATURE_PGE)) {
>__flush_tlb_global();
>} else {
> @@ -467,6 +466,7 @@ static inline void __flush_tlb_all(void)
> */
>__flush_tlb();
>}
> +preempt_enable();
> }
> 
> /*
> diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
> index db7a10082238..f799076e3d57 100644
> --- a/arch/x86/mm/pageattr.c
> +++ b/arch/x86/mm/pageattr.c
> @@ -2309,13 +2309,9 @@ void __kernel_map_pages(struct page *page, int 
> numpages, int enable)
> 
>/*
> * We should perform an IPI and flush all tlbs,
> - * but that can deadlock->flush only current cpu.
> - * Preemption needs to be disabled around __flush_tlb_all() due to
> - * CR3 reload in __native_flush_tlb().
> + * but that can deadlock->flush only current cpu:
> */
> -preempt_disable();
>__flush_tlb_all();
> -preempt_enable();
> 
>arch_flush_lazy_mmu_mode();
> }
> 

