Re: [GIT PULL] x86/mm changes for v4.21

2019-02-07 Thread Andy Lutomirski



> On Feb 7, 2019, at 2:53 PM, Linus Torvalds wrote:
> 
>> On Thu, Feb 7, 2019 at 1:24 PM Andy Lutomirski  wrote:
>> 
>> How bad would it be to set CR0.CD while fiddling with the page tables rather 
>> than masking the address?
> 
> I would suggest against it. When you disable caching, things don't
> just go "slightly slower". Everything comes to a screeching halt, with
> instruction fetch etc becoming a serious problem.
> 
> So disabling caching for a few instructions in a very simple asm
> snippet might be reasonable (particularly since you could pre-fetch
> the cache and just rely on CR0.CD not fetching *new* lines). But doing
> it in C code that might have things like tracing enabled etc? I'd be
> very very leery.
> 
> 

In principle, it’s just the code that walks the page tables and changes the 
mode.  But we could get a perf NMI and take who knows how long to process it if 
caching is off.

Re: [GIT PULL] x86/mm changes for v4.21

2019-02-07 Thread Linus Torvalds
On Thu, Feb 7, 2019 at 1:24 PM Andy Lutomirski  wrote:
>
> How bad would it be to set CR0.CD while fiddling with the page tables rather 
> than masking the address?

I would suggest against it. When you disable caching, things don't
just go "slightly slower". Everything comes to a screeching halt, with
instruction fetch etc becoming a serious problem.

So disabling caching for a few instructions in a very simple asm
snippet might be reasonable (particularly since you could pre-fetch
the cache and just rely on CR0.CD not fetching *new* lines). But doing
it in C code that might have things like tracing enabled etc? I'd be
very very leery.
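
For concreteness, a minimal sketch of what such a short CR0.CD sequence could
look like. This is only an illustration of the idea, not a tested recipe: the
helper name and the update_pte callback are made up, while native_read_cr0(),
native_write_cr0(), X86_CR0_CD and wbinvd() are existing kernel interfaces.
It also ignores the NMI/perf concern raised elsewhere in the thread.

/* Minimal sketch, assuming interrupts and NMIs are already dealt with. */
static void change_mapping_with_caches_off(void (*update_pte)(void *), void *arg)
{
        unsigned long cr0 = native_read_cr0();

        native_write_cr0(cr0 | X86_CR0_CD); /* stop allocating new cache lines */
        wbinvd();                           /* or pre-fetch what you need and skip this */

        update_pte(arg);                    /* fiddle with the page tables */

        native_write_cr0(cr0);              /* caching back on */
}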

 Linus


Re: [GIT PULL] x86/mm changes for v4.21

2019-02-07 Thread Andy Lutomirski



> On Feb 7, 2019, at 10:46 AM, Luck, Tony  wrote:
> 
>> On Thu, Feb 07, 2019 at 10:07:28AM -0800, Andy Lutomirski wrote:
>> Joining this thread late...
>> 
>> This is all IMO rather crazy.  How about we fiddle with CR0 to turn off
>> the cache, then fiddle with page tables, then turn caching on?  Or, heck,
>> see if there’s some chicken bit we can set to improve the situation
>> while we’re in the MCE handler.
> 
>> Also, since I don’t really want
>> to dig into the code to answer this, how exactly do we do a broadcast TLB
>> flush from MCE context?  We’re super-duper-atomic, and locks might be
>> held on various CPUs.  Shouldn’t we be telling the cpa code to skip
>> the flush and then just have the MCE code do a full flush manually?
>> The MCE code has already taken over all CPUs on non-LMCE systems.
> 
> MCE code doesn't do this while still in MCE context. You helped
> restructure this code so the recovery bits happen after we call
> 
>ist_begin_non_atomic(regs);
> 
> on just the CPU that hit the error (in the broadcast case we've
> let the others out of MCE jail by this point).
> 
> So there is a small window where we know the broken page is still
> mapped WB in the kernel 1:1 map. But we just live dangerously for
> a few more microseconds until we can fix the map.

Hmm.

How bad would it be to set CR0.CD while fiddling with the page tables rather 
than masking the address?

> 
>> Or, better yet, get Intel to fix the hardware. A failed speculative
>> access while already in MCE context should just be ignored.
> 
> Good idea.
> 
> -Tony


Re: [GIT PULL] x86/mm changes for v4.21

2019-02-07 Thread Luck, Tony
On Thu, Feb 07, 2019 at 10:07:28AM -0800, Andy Lutomirski wrote:
> Joining this thread late...
> 
> This is all IMO rather crazy.  How about we fiddle with CR0 to turn off
> the cache, then fiddle with page tables, then turn caching on?  Or, heck,
> see if there’s some chicken bit we can set to improve the situation
> while we’re in the MCE handler.

> Also, since I don’t really want
> to dig into the code to answer this, how exactly do we do a broadcast TLB
> flush from MCE context?  We’re super-duper-atomic, and locks might be
> held on various CPUs.  Shouldn’t we be telling the cpa code to skip
> the flush and then just have the MCE code do a full flush manually?
> The MCE code has already taken over all CPUs on non-LMCE systems.

MCE code doesn't do this while still in MCE context. You helped
restructure this code so the recovery bits happen after we call

ist_begin_non_atomic(regs);

on just the CPU that hit the error (in the broadcast case we've
let the others out of MCE jail by this point).

So there is a small window where we know the broken page is still
mapped WB in the kernel 1:1 map. But we just live dangerously for
a few more microseconds until we can fix the map.
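
To make that ordering concrete, here is a hedged sketch of the recovery path
described above. It is a simplification, not the literal do_machine_check()
code: the function name and the exact flag choice are illustrative, while
ist_begin_non_atomic()/ist_end_non_atomic() and memory_failure() are the real
interfaces involved.

static void mce_recover_on_affected_cpu(struct mce *m, struct pt_regs *regs)
{
        if ((m->cs & 3) == 3) {                 /* the error hit user mode */
                ist_begin_non_atomic(regs);     /* other CPUs already left MCE jail */
                local_irq_enable();

                /* the poisoned page is still mapped WB for a few more microseconds */
                memory_failure(m->addr >> PAGE_SHIFT,
                               MF_ACTION_REQUIRED | MF_MUST_KILL);

                local_irq_disable();
                ist_end_non_atomic();
        }
}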

> Or, better yet, get Intel to fix the hardware. A failed speculative
> access while already in MCE context should just be ignored.

Good idea.

-Tony


Re: [GIT PULL] x86/mm changes for v4.21

2019-02-07 Thread Luck, Tony
On Thu, Feb 07, 2019 at 06:57:20PM +0100, Peter Zijlstra wrote:
> Something like so then? AFAICT CLFLUSH will also #GP if you feed it crap.

Correct. CLFLUSH will also #GP on a non-canonical address.

> - __flush_tlb_one_kernel(__cpa_addr(cpa, i));
> + __flush_tlb_one_kernel(fix_addr(__cpa_addr(cpa, i)));


> - clflush_cache_range_opt((void *)addr, PAGE_SIZE);
> + clflush_cache_range_opt((void *)fix_addr(addr), PAGE_SIZE);

Looks like that's as close as you can get without being crazy. Many/most
of the apparent function calls here are going to be inlined. So
likely safe.

Tried it out and it works!

So:

Tested-by: Tony Luck 

Thanks

-Tony


Re: [GIT PULL] x86/mm changes for v4.21

2019-02-07 Thread Andy Lutomirski



> On Feb 7, 2019, at 9:57 AM, Peter Zijlstra  wrote:
> 
>> On Thu, Feb 07, 2019 at 09:36:00AM -0800, Luck, Tony wrote:
>>> On Thu, Feb 07, 2019 at 03:01:31PM +0100, Peter Zijlstra wrote:
>>>> On Thu, Feb 07, 2019 at 11:50:52AM +, Linus Torvalds wrote:
>>>> If you re-generate the canonical address in __cpa_addr(), now we'll
>>>> actually have the real virtual address around for a lot of code-paths
>>>> (pte lookup etc), which was what people wanted to avoid in the first
>>>> place.
>>> 
>>> Note that it's an 'unsigned long' address, not an actual pointer, and
>>> (afaict) none of the code paths use it as a pointer. This _should_ keep
>>> the CPU from following said pointer and doing a deref on it.
>> 
>> The type doesn't matter. You want to avoid having the
>> true value in the register as long as possible. Ideal
>> spot would be the instruction before the TLB is flushed.
>> 
>> The speculative issue is that any branch you encounter
>> while you have the address in a register may be mispredicted.
>> You might also get a bogus hit in the branch target cache
>> and speculatively jump into the weeds. While there you
>> could find an instruction that loads using that register, and
>> even though it is speculative and the instruction won't
>> retire, a machine check log will be created in a bank (no
>> machine check is signalled).
>> 
>> Once the TLB is updated, you are safe. A speculative
>> access to an uncached address will not load or log anything.
> 
> Something like so then? AFAICT CLFLUSH will also #GP if you feed it crap.
> 
> 
> diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
> index 4f8972311a77..d3ae92ad72a6 100644
> --- a/arch/x86/mm/pageattr.c
> +++ b/arch/x86/mm/pageattr.c
> @@ -230,6 +230,28 @@ static bool __cpa_pfn_in_highmap(unsigned long pfn)
> 
> #endif
> 
> +/*
> + * Machine check recovery code needs to change cache mode of poisoned
> + * pages to UC to avoid speculative access logging another error. But
> + * passing the address of the 1:1 mapping to set_memory_uc() is a fine
> + * way to encourage a speculative access. So we cheat and flip the top
> + * bit of the address. This works fine for the code that updates the
> + * page tables. But at the end of the process we need to flush the cache
> + * and the non-canonical address causes a #GP fault when used by the
> + * CLFLUSH instruction.
> + *
> + * But in the common case we already have a canonical address. This code
> + * will fix the top bit if needed and is a no-op otherwise.

Joining this thread late...

This is all IMO rather crazy.  How about we fiddle with CR0 to turn off the 
cache, then fiddle with page tables, then turn caching on?  Or, heck, see if 
there’s some chicken bit we can set to improve the situation while we’re in the 
MCE handler.

Also, since I don’t really want to dig into the code to answer this, how 
exactly do we do a broadcast TLB flush from MCE context?  We’re 
super-duper-atomic, and locks might be held on various CPUs.  Shouldn’t we be 
telling the cpa code to skip the flush and then just have the MCE code do a 
full flush manually?  The MCE code has already taken over all CPUs on non-LMCE 
systems.

Or, better yet, get Intel to fix the hardware. A failed speculative access 
while already in MCE context should just be ignored.

Re: [GIT PULL] x86/mm changes for v4.21

2019-02-07 Thread Peter Zijlstra
On Thu, Feb 07, 2019 at 09:36:00AM -0800, Luck, Tony wrote:
> On Thu, Feb 07, 2019 at 03:01:31PM +0100, Peter Zijlstra wrote:
> > On Thu, Feb 07, 2019 at 11:50:52AM +, Linus Torvalds wrote:
> > > If you re-generate the canonical address in __cpa_addr(), now we'll
> > > actually have the real virtual address around for a lot of code-paths
> > > (pte lookup etc), which was what people wanted to avoid in the first
> > > place.
> > 
> > Note that it's an 'unsigned long' address, not an actual pointer, and
> > (afaict) none of the code paths use it as a pointer. This _should_ keep
> > the CPU from following said pointer and doing a deref on it.
> 
> The type doesn't matter. You want to avoid having the
> true value in the register as long as possible. Ideal
> spot would be the instruction before the TLB is flushed.
> 
> The speculative issue is that any branch you encounter
> while you have the address in a register may be mispredicted.
> You might also get a bogus hit in the branch target cache
> and speculatively jump into the weeds. While there you
> could find an instruction that loads using that register, and
> even though it is speculative and the instruction won't
> retire, a machine check log will be created in a bank (no
> machine check is signalled).
> 
> Once the TLB is updated, you are safe. A speculative
> access to an uncached address will not load or log anything.

Something like so then? AFAICT CLFLUSH will also #GP if you feed it crap.


diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 4f8972311a77..d3ae92ad72a6 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -230,6 +230,28 @@ static bool __cpa_pfn_in_highmap(unsigned long pfn)
 
 #endif
 
+/*
+ * Machine check recovery code needs to change cache mode of poisoned
+ * pages to UC to avoid speculative access logging another error. But
+ * passing the address of the 1:1 mapping to set_memory_uc() is a fine
+ * way to encourage a speculative access. So we cheat and flip the top
+ * bit of the address. This works fine for the code that updates the
+ * page tables. But at the end of the process we need to flush the cache
+ * and the non-canonical address causes a #GP fault when used by the
+ * CLFLUSH instruction.
+ *
+ * But in the common case we already have a canonical address. This code
+ * will fix the top bit if needed and is a no-op otherwise.
+ */
+static inline unsigned long fix_addr(unsigned long addr)
+{
+#ifdef CONFIG_X86_64
+   return (long)(addr << 1) >> 1;
+#else
+   return addr;
+#endif
+}
+
 static unsigned long __cpa_addr(struct cpa_data *cpa, unsigned long idx)
 {
if (cpa->flags & CPA_PAGES_ARRAY) {
@@ -313,7 +335,7 @@ void __cpa_flush_tlb(void *data)
unsigned int i;
 
for (i = 0; i < cpa->numpages; i++)
-   __flush_tlb_one_kernel(__cpa_addr(cpa, i));
+   __flush_tlb_one_kernel(fix_addr(__cpa_addr(cpa, i)));
 }
 
 static void cpa_flush(struct cpa_data *data, int cache)
@@ -347,7 +369,7 @@ static void cpa_flush(struct cpa_data *data, int cache)
 * Only flush present addresses:
 */
if (pte && (pte_val(*pte) & _PAGE_PRESENT))
-   clflush_cache_range_opt((void *)addr, PAGE_SIZE);
+   clflush_cache_range_opt((void *)fix_addr(addr), PAGE_SIZE);
}
mb();
 }
@@ -1627,29 +1649,6 @@ static int __change_page_attr_set_clr(struct cpa_data *cpa, int checkalias)
return ret;
 }
 
-/*
- * Machine check recovery code needs to change cache mode of poisoned
- * pages to UC to avoid speculative access logging another error. But
- * passing the address of the 1:1 mapping to set_memory_uc() is a fine
- * way to encourage a speculative access. So we cheat and flip the top
- * bit of the address. This works fine for the code that updates the
- * page tables. But at the end of the process we need to flush the cache
- * and the non-canonical address causes a #GP fault when used by the
- * CLFLUSH instruction.
- *
- * But in the common case we already have a canonical address. This code
- * will fix the top bit if needed and is a no-op otherwise.
- */
-static inline unsigned long make_addr_canonical_again(unsigned long addr)
-{
-#ifdef CONFIG_X86_64
-   return (long)(addr << 1) >> 1;
-#else
-   return addr;
-#endif
-}
-
-
 static int change_page_attr_set_clr(unsigned long *addr, int numpages,
pgprot_t mask_set, pgprot_t mask_clr,
int force_split, int in_flag,

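As a side note, a small stand-alone illustration (plain user-space C, nothing
kernel specific) of why the (addr << 1) >> 1 trick in fix_addr() restores the
canonical form: the left shift discards bit 63 and the arithmetic right shift
then sign-extends bit 62 back into it. The sample value is the decoy address
visible in RDI in Tony's oops further down the thread.

#include <stdio.h>
#include <stdint.h>

int main(void)
{
        /* canonical kernel 1:1 address and its bit-63-cleared decoy */
        uint64_t kaddr = 0xffff9981feeab000ULL;
        uint64_t decoy = kaddr & ~(1ULL << 63);  /* 0x7fff9981feeab000 */

        /* the fix_addr() / make_addr_canonical_again() trick */
        uint64_t fixed = (uint64_t)((int64_t)(decoy << 1) >> 1);

        printf("decoy %#018llx\nfixed %#018llx\nmatch %d\n",
               (unsigned long long)decoy, (unsigned long long)fixed,
               fixed == kaddr);
        return 0;
}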

Re: [GIT PULL] x86/mm changes for v4.21

2019-02-07 Thread Luck, Tony
On Thu, Feb 07, 2019 at 03:01:31PM +0100, Peter Zijlstra wrote:
> On Thu, Feb 07, 2019 at 11:50:52AM +, Linus Torvalds wrote:
> > If you re-generate the canonical address in __cpa_addr(), now we'll
> > actually have the real virtual address around for a lot of code-paths
> > (pte lookup etc), which was what people wanted to avoid in the first
> > place.
> 
> Note that it's an 'unsigned long' address, not an actual pointer, and
> (afaict) non of the code paths use it as a pointer. This _should_ avoid
> the CPU from following said pointer and doing a deref on it.

The type doesn't matter. You want to avoid having the
true value in the register as long as possible. Ideal
spot would be the instruction before the TLB is flushed.

The speculative issue is that any branch you encounter
while you have the address in a register may be mispredicted.
You might also get a bogus hit in the branch target cache
and speculatively jump into the weeds. While there you
could find an instruction that loads using that register, and
even though it is speculative and the instruction won't
retire, a machine check log will be created in a bank (no
machine check is signalled).

Once the TLB is updated, you are safe. A speculative
access to an uncached address will not load or log anything.
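
For reference, a hedged paraphrase of how that decoy address gets built in the
first place (the set_mce_nospec() path mentioned elsewhere in the thread). The
helper name here is made up and details may differ from the actual tree, but
the idea is simply to hand the CPA code an address with bit 63 flipped so the
true kernel virtual address never sits in a register longer than necessary.

static int set_poisoned_page_uc(unsigned long pfn)
{
        /* the real 1:1 address would be (pfn << PAGE_SHIFT) + PAGE_OFFSET */
        unsigned long decoy_addr = (pfn << PAGE_SHIFT) + (PAGE_OFFSET ^ BIT(63));

        /*
         * CPA walks the page tables with the decoy; only the final
         * CLFLUSH/TLB invalidation needs the canonical address back.
         */
        return set_memory_uc(decoy_addr, 1);
}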

-Tony


Re: [GIT PULL] x86/mm changes for v4.21

2019-02-07 Thread Peter Zijlstra
On Thu, Feb 07, 2019 at 11:50:52AM +, Linus Torvalds wrote:
> If you re-generate the canonical address in __cpa_addr(), now we'll
> actually have the real virtual address around for a lot of code-paths
> (pte lookup etc), which was what people wanted to avoid in the first
> place.

Note that it's an 'unsigned long' address, not an actual pointer, and
(afaict) none of the code paths use it as a pointer. This _should_ keep
the CPU from following said pointer and doing a deref on it.

But yes, I didn't go all the way down, maybe I missed some.


Re: [GIT PULL] x86/mm changes for v4.21

2019-02-07 Thread Linus Torvalds
On Thu, Feb 7, 2019 at 10:18 AM Peter Zijlstra  wrote:
>
> So now the question is where to put it back in, I'm thinking this might
> want to be in __cpa_addr().

Wasn't one of the goals to *not* keep the real virtual address around
to avoid speculation, since we don't want some random speculation
using that address to then trigger more MCE's?

If you re-generate the canonical address in __cpa_addr(), now we'll
actually have the real virtual address around for a lot of code-paths
(pte lookup etc), which was what people wanted to avoid in the first
place.

NOTE! I may be *entirely* off on this, and just confused about exactly
which speculative accesses people were worried about.

But at the same time, looking at all the other cases, it does look
like we need to do it in __cpa_addr(), because we also have other tests
for an actual valid virtual address, ie all the

vaddr = __cpa_addr(cpa, cpa->curpage);
if (!(within(vaddr, PAGE_OFFSET,
PAGE_OFFSET + (max_pfn_mapped << PAGE_SHIFT)))) {

kind of checks.
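
For reference, the range check those call sites rely on is essentially this
(assumed to match pageattr.c's within() helper):

static inline int within(unsigned long addr, unsigned long start,
                         unsigned long end)
{
        return addr >= start && addr < end;
}

A decoy address with bit 63 cleared is far below PAGE_OFFSET, so it falls out
of this range test and out of whatever 1:1-map handling depends on it.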

What's the exact rule and path from "set_mce_nospec()" (which is what
sets that "decoy" address with the high bit clear) to when we can then
use the address for tlb flushing?

Adding Dan Williams to the participants, because it's really only
set_mce_nospec() that triggers this, and where we should try to be
careful that that path really doesn't use the actual real virtual
address. He touched it last.

But I guess the whole decoy_addr thing goes back to Tony Luck and
commit ce0fa3e56ad2 ("x86/mm, mm/hwpoison: Clear PRESENT bit for
kernel 1:1 mappings of poison pages").

Do we really need it?

   Linus


Re: [GIT PULL] x86/mm changes for v4.21

2019-02-07 Thread Peter Zijlstra
On Wed, Feb 06, 2019 at 04:17:42PM -0800, Luck, Tony wrote:

> fe0937b24ff5 ("x86/mm/cpa: Fold cpa_flush_range() and cpa_flush_array() into 
> a single cpa_flush() function")

Boris pointed me at this gem:

  c7486104a5ce ("x86/mce: Fix set_mce_nospec() to avoid #GP fault")

(can I just revel at the pure awesome grossness of that)

But we then see my above commit having:

@@ -1732,11 +1685,6 @@ static int change_page_attr_set_clr(unsigned long *addr, int numpages,
 */
WARN_ON_ONCE(1);
}
-   /*
-* Save address for cache flush. *addr is modified in the call
-* to __change_page_attr_set_clr() below.
-*/
-   baddr = make_addr_canonical_again(*addr);

Where I clearly got distracted by that excellent comment..


So now the question is where to put it back in, I'm thinking this might
want to be in __cpa_addr().

Something like so?

diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 4f8972311a77..319b767484fb 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -230,6 +230,28 @@ static bool __cpa_pfn_in_highmap(unsigned long pfn)
 
 #endif
 
+/*
+ * Machine check recovery code needs to change cache mode of poisoned
+ * pages to UC to avoid speculative access logging another error. But
+ * passing the address of the 1:1 mapping to set_memory_uc() is a fine
+ * way to encourage a speculative access. So we cheat and flip the top
+ * bit of the address. This works fine for the code that updates the
+ * page tables. But at the end of the process we need to flush the cache
+ * and the non-canonical address causes a #GP fault when used by the
+ * CLFLUSH instruction.
+ *
+ * But in the common case we already have a canonical address. This code
+ * will fix the top bit if needed and is a no-op otherwise.
+ */
+static inline unsigned long make_addr_canonical_again(unsigned long addr)
+{
+#ifdef CONFIG_X86_64
+   return (long)(addr << 1) >> 1;
+#else
+   return addr;
+#endif
+}
+
 static unsigned long __cpa_addr(struct cpa_data *cpa, unsigned long idx)
 {
if (cpa->flags & CPA_PAGES_ARRAY) {
@@ -244,7 +266,7 @@ static unsigned long __cpa_addr(struct cpa_data *cpa, unsigned long idx)
if (cpa->flags & CPA_ARRAY)
return cpa->vaddr[idx];
 
-   return *cpa->vaddr + idx * PAGE_SIZE;
+   return make_addr_canonical_again(*cpa->vaddr + idx * PAGE_SIZE);
 }
 
 /*
@@ -1627,29 +1649,6 @@ static int __change_page_attr_set_clr(struct cpa_data *cpa, int checkalias)
return ret;
 }
 
-/*
- * Machine check recovery code needs to change cache mode of poisoned
- * pages to UC to avoid speculative access logging another error. But
- * passing the address of the 1:1 mapping to set_memory_uc() is a fine
- * way to encourage a speculative access. So we cheat and flip the top
- * bit of the address. This works fine for the code that updates the
- * page tables. But at the end of the process we need to flush the cache
- * and the non-canonical address causes a #GP fault when used by the
- * CLFLUSH instruction.
- *
- * But in the common case we already have a canonical address. This code
- * will fix the top bit if needed and is a no-op otherwise.
- */
-static inline unsigned long make_addr_canonical_again(unsigned long addr)
-{
-#ifdef CONFIG_X86_64
-   return (long)(addr << 1) >> 1;
-#else
-   return addr;
-#endif
-}
-
-
 static int change_page_attr_set_clr(unsigned long *addr, int numpages,
pgprot_t mask_set, pgprot_t mask_clr,
int force_split, int in_flag,


Re: [GIT PULL] x86/mm changes for v4.21

2019-02-07 Thread Peter Zijlstra
On Wed, Feb 06, 2019 at 04:33:54PM -0800, Dave Hansen wrote:

> I wonder if the patches that you bisected to just changed the flushing
> from being CR3-based (and not taking an address) to being INVPCID-based,
> and taking an address that is sensitive to canonicality.

That is indeed one of the things that patch series does; before this it
would always flush world for ARRAY interfaces.

I'll have a look at the code; I seem to remember there being tests for
canonical addresses, maybe they're not in the right place.


Re: [GIT PULL] x86/mm changes for v4.21

2019-02-06 Thread Dave Hansen
On 2/6/19 4:17 PM, Luck, Tony wrote:
> [   93.491692] RAX:  RBX:  RCX: 
> 99623f2c3f70
> [   93.499658] RDX: 2e6b58da0121 RSI:  RDI: 
> 7fff9981feeab000
...
> Potentially the problem might be a non-canonical address passed down
> by the machine check recovery code to switch the page with the error
> to uncacheable. Perhaps the refactored code is now using that in the
> 
>   invpcid (%rcx),%rax
> 
> instruction that gets the #GP fault?

That looks probable.  RDI even has the non-canonical address still in it
(RCX is pointing to the stack, btw).

I think there's an option to dump the trace buffers at panic time.  You
might want to enable the TLB flush tracing:

/sys/kernel/debug/tracing/events/tlb/tlb_flush

and see if we get a suspect flush captured just before the #GP.

I wonder if the patches that you bisected to just changed the flushing
from being CR3-based (and not taking an address) to being INVPCID-based,
and taking an address that is sensitive to canonicality.
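
For anyone following along, a hedged sketch of why canonicality suddenly
matters on that path: an address-specific INVPCID invalidation takes a
16-byte descriptor whose second quadword must be a canonical linear address,
otherwise the CPU raises #GP. This is an illustration, not the kernel's
__invpcid(); the mnemonic assumes a new-enough assembler, whereas the kernel
hard-codes the opcode bytes, which is why the faulting instruction shows up
as "invpcid (%rcx),%rax".

static inline void invpcid_flush_one_addr(unsigned long pcid, unsigned long addr)
{
        struct { unsigned long long pcid, addr; } desc = { pcid, addr };
        unsigned long type = 0;     /* 0 = individual-address invalidation */

        /* #GP right here if 'addr' is the non-canonical decoy address */
        asm volatile("invpcid %0, %1" : : "m" (desc), "r" (type) : "memory");
}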


Re: [GIT PULL] x86/mm changes for v4.21

2019-02-06 Thread Luck, Tony
On Tue, Dec 25, 2018 at 12:11:06AM +0100, Ingo Molnar wrote:
> Peter Zijlstra (9):
>   x86/mm/cpa: Add ARRAY and PAGES_ARRAY selftests
>   x86/mm/cpa: Add __cpa_addr() helper
>   x86/mm/cpa: Make cpa_data::vaddr invariant
>   x86/mm/cpa: Simplify the code after making cpa->vaddr invariant
>   x86/mm/cpa: Optimize cpa_flush_array() TLB invalidation
>   x86/mm/cpa: Make cpa_data::numpages invariant
>   x86/mm/cpa: Fold cpa_flush_range() and cpa_flush_array() into a single 
> cpa_flush() function
>   x86/mm/cpa: Better use CLFLUSHOPT
>   x86/mm/cpa: Rename @addrinarray to @numpages

Something in this series from Peter is causing problems with
machine check recovery.  The kernel dies with a #GP fault

[   93.363295] Disabling lock debugging due to kernel taint
[   93.369700] mce: Uncorrected hardware memory error in user-access at 
3fbeeab400
[   93.369709] mce: [Hardware Error]: Machine check events logged
[   93.384415] mce: [Hardware Error]: Machine check events logged
[   93.390973] EDAC MC2: 1 UE memory read error on 
CPU_SrcID#1_Ha#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0x3fbeeab offset:0x400 
grain:32 -  OVERFLOW recoverable area:DRAM err_code:0001:0090 socket:1 ha:0 
channel_mask:1 rank:0)
[   93.413569] Memory failure: 0x3fbeeab: Killing einj_mem_uc:4810 due to 
hardware memory corruption
[   93.423501] Memory failure: 0x3fbeeab: recovery action for dirty LRU page: 
Recovered
[   93.432508] general protection fault:  [#1] SMP PTI
[   93.438359] CPU: 11 PID: 0 Comm: swapper/11 Tainted: G   M  
4.20.0-rc5+ #13
[   93.447294] Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS 
BRBDXSD1.86B.0338.V01.1603162127 03/16/2016
[   93.458869] RIP: 0010:native_flush_tlb_one_user+0x8c/0xa0
[   93.464899] Code: 02 48 8b 44 24 18 65 48 33 04 25 28 00 00 00 75 20 c9 c3 
83 c0 01 48 89 7c 24 08 48 89 e1 80 cc 08 0f b7 c0 48 89 04 24 31 c0 <66> 0f 38 
82 01 eb d0 e8 78 0e 05 00 0f 1f 84 00 00 00 00 00 0f 1f
[   93.485859] RSP: 0018:99623f2c3f70 EFLAGS: 00010046
[   93.491692] RAX:  RBX:  RCX: 99623f2c3f70
[   93.499658] RDX: 2e6b58da0121 RSI:  RDI: 7fff9981feeab000
[   93.507623] RBP: 99623f2c3f98 R08: 0002 R09: 00021640
[   93.515587] R10: 000ecaed3e716d58 R11:  R12: 84fe9920
[   93.523550] R13:  R14:  R15: 
[   93.531518] FS:  () GS:99623f2c() 
knlGS:
[   93.540551] CS:  0010 DS:  ES:  CR0: 80050033
[   93.546966] CR2: 5566b2cd5470 CR3: 0049bee0a006 CR4: 003606e0
[   93.554927] DR0:  DR1:  DR2: 
[   93.562892] DR3:  DR6: fffe0ff0 DR7: 0400
[   93.570857] Call Trace:
[   93.573593]  <IRQ>
[   93.575846]  ? recalibrate_cpu_khz+0x10/0x10
[   93.580628]  __cpa_flush_tlb+0x2e/0x50
[   93.584830]  flush_smp_call_function_queue+0x35/0xe0
[   93.590390]  smp_call_function_interrupt+0x3a/0xd0
[   93.595740]  call_function_interrupt+0xf/0x20
[   93.600604]  </IRQ>

Build errors during bisection meant I couldn't point to a single commit, but
it did limit it to:

There are only 'skip'ped commits left to test.
The first bad commit could be any of:
83b4e39146aa70913580966e0f2b78b7c3492760
935f5839827ef54b53406e80906f7c355eb73c1b
fe0937b24ff5d7b343b9922201e469f9a6009d9d
We cannot bisect more!

so (more descriptively):
83b4e39146aa ("x86/mm/cpa: Make cpa_data::numpages invariant")
935f5839827e ("x86/mm/cpa: Optimize cpa_flush_array() TLB invalidation")
fe0937b24ff5 ("x86/mm/cpa: Fold cpa_flush_range() and cpa_flush_array() into a 
single cpa_flush() function")


If I revert those three (together with the following three from this
merge, because I didn't want to run into more build problems), then
machine check recovery starts working again.

Potentially the problem might be a non-canonical address passed down
by the machine check recovery code to switch the page with the error
to uncacheable. Perhaps the refactored code is now using that in the

invpcid (%rcx),%rax

instruction that gets the #GP fault?

-Tony


Re: [GIT PULL] x86/mm changes for v4.21

2018-12-26 Thread pr-tracker-bot
The pull request you sent on Tue, 25 Dec 2018 00:11:06 +0100:

> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86-mm-for-linus

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/e57d9f638af9673f38d9f09de66fa0a28303127d

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker