Re: [Xen-devel] [Bug 198497] handle_mm_fault / xen_pmd_val / radix_tree_lookup_slot Null pointer

2018-09-04 Thread Jason Andryuk
On Mon, Apr 23, 2018 at 4:17 AM Juergen Gross  wrote:
> On 20/04/18 17:20, Jason Andryuk wrote:
> > Adding xen-devel and the Linux Xen maintainers.
> >
> > Summary: Some Xen users (and maybe others) are hitting a BUG in
> > __radix_tree_lookup() under do_swap_page() - example backtrace is
> > provided at the end.  Matthew Wilcox provided a band-aid patch that
> > prints errors like the following instead of triggering the bug.
> >
> > Skylake 32bit PAE Dom0:
> > Bad swp_entry: 8000
> > mm/swap_state.c:683: bad pte d3a39f1c(8004)
> >
> > Ivy Bridge 32bit PAE Dom0:
> > Bad swp_entry: 4000
> > mm/swap_state.c:683: bad pte d3a05f1c(8002)
> >
> > Other 32bit DomU:
> > Bad swp_entry: 400
> > mm/swap_state.c:683: bad pte e2187f30(8002)
> >
> > Other 32bit:
> > Bad swp_entry: 200
> > mm/swap_state.c:683: bad pte ef3a3f38(8001)
> >
> > The Linux bugzilla has more info
> > https://bugzilla.kernel.org/show_bug.cgi?id=198497
> >
> > This may not be exclusive to Xen Linux, but most of the reports are on
> > Xen.  Matthew wonders if Xen might be stepping on the upper bits of a
> > pte.
> >

>
> Could it be we just have a race regarding pte_clear()? This will set
> the low part of the pte to zero first and then the high part.
>
> If pte_clear() is used in interrupt mode, Xen especially will be
> rather slow, as it emulates the two writes to the page table, resulting
> in a larger window in which the race might happen.

It looks like Juergen was correct.  With the L1TF vulnerability, the
Xen hypervisor needs to detect vulnerable PTEs.  For 32bit PAE, Xen
would trap on PTEs like 0x8000'0002'0000'0000 - the same format as
seen in this bug.  He wrote two patches for Linux, now upstream, to
write PTEs with 64bit operations or hypercalls and avoid the invalid
PTEs:
f7c90c2aa400 "x86/xen: don't write ptes directly in 32-bit PV guests"
b2d7a075a1cc "x86/pae: use 64 bit atomic xchg function in
native_ptep_get_and_clear"

With those patches, I have not seen a "Bad swp_entry", so this seems
fixed for me on Xen.

There was also a report of a non-Xen kernel being affected.  Is there
an underlying problem that native PAE code updates PTEs in two writes,
but there is no locking to prevent the intermediate PTE from being
used elsewhere in the kernel?
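
As an illustration of the window asked about above, here is a minimal
sketch of the two-store clear, written as a Python simulation rather than
kernel code.  It assumes the order Juergen described (low 32 bits zeroed
first, then the high 32 bits).  The function names and the example PTE
value are invented for illustration.

```python
LOW_MASK = 0x0000_0000_FFFF_FFFF
HIGH_MASK = 0xFFFF_FFFF_0000_0000
PRESENT = 1 << 0  # x86 "present" bit


def clear_in_two_writes(pte):
    """Yield each value an observer could read while a 64-bit PTE is
    cleared with two 32-bit stores, low half first (assumed order)."""
    yield pte
    pte &= HIGH_MASK  # first store: low 32 bits become zero
    yield pte
    pte &= LOW_MASK   # second store: high 32 bits become zero
    yield pte


def classify(pte):
    """Rough classification by the present bit only (illustrative)."""
    if pte == 0:
        return "none"
    if pte & PRESENT:
        return "present"
    return "swap-like"  # non-zero with the present bit clear


# Hypothetical present PTE with bits set in both halves.
for value in clear_in_two_writes(0x8000_0002_1234_5067):
    print(f"{value:#018x} {classify(value)}")
```

The middle value is non-zero with the present bit clear, which is the
shape a fault handler would take for a swap entry.  A single 64-bit store
or exchange has no such intermediate state.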

Regards,
Jason

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [Bug 198497] handle_mm_fault / xen_pmd_val / radix_tree_lookup_slot Null pointer

2018-04-23 Thread Juergen Gross
On 20/04/18 17:20, Jason Andryuk wrote:
> Adding xen-devel and the Linux Xen maintainers.
> 
> Summary: Some Xen users (and maybe others) are hitting a BUG in
> __radix_tree_lookup() under do_swap_page() - example backtrace is
> provided at the end.  Matthew Wilcox provided a band-aid patch that
> prints errors like the following instead of triggering the bug.
> 
> Skylake 32bit PAE Dom0:
> Bad swp_entry: 8000
> mm/swap_state.c:683: bad pte d3a39f1c(8004)
> 
> Ivy Bridge 32bit PAE Dom0:
> Bad swp_entry: 4000
> mm/swap_state.c:683: bad pte d3a05f1c(8002)
> 
> Other 32bit DomU:
> Bad swp_entry: 400
> mm/swap_state.c:683: bad pte e2187f30(8002)
> 
> Other 32bit:
> Bad swp_entry: 200
> mm/swap_state.c:683: bad pte ef3a3f38(8001)
> 
> The Linux bugzilla has more info
> https://bugzilla.kernel.org/show_bug.cgi?id=198497
> 
> This may not be exclusive to Xen Linux, but most of the reports are on
> Xen.  Matthew wonders if Xen might be stepping on the upper bits of a
> pte.
> 
> On Fri, Apr 20, 2018 at 9:39 AM, Matthew Wilcox  wrote:
>> On Fri, Apr 20, 2018 at 09:10:11AM -0400, Jason Andryuk wrote:
>>>> Given that this is happening on Xen, I wonder if Xen is using some of the
>>>> bits in the page table for its own purposes.
>>>
>>> The backtraces include do_swap_page().  While I have a swap partition
>>> configured, I don't think it's being used.  Are we somehow
>>> misidentifying the page as a swap page?  I'm not familiar with the
>>> code, but is there an easy way to query global swap usage?  That way
>>> we can see if the check for a swap page is bogus.
>>>
>>> My system works with the band-aid patch.  When that patch sets page =
>>> NULL, does that mean userspace is just going to get a zero-ed page?
>>> Userspace still works AFAICT, which makes me think it is a
>>> mis-identified page to start with.
>>
>> Here's how this code works.
> 
> Thanks for the description.
> 
>> When we swap out an anonymous page (a page which is not backed by a
>> file; could be from a MAP_PRIVATE mapping, could be brk()), we write it
>> to the swap cache.  In order to be able to find it again, we store a
>> cookie (called a swp_entry_t) in the process' page table (marked with
>> the 'present' bit clear, so the CPU will fault on it).  When we get a
>> fault, we look up the cookie in a radix tree and bring that page back
>> in from swap.
>>
>> If there's no page found in the radix tree, we put a freshly zeroed
>> page into the process's address space.  That's because we won't find
>> a page in the swap cache's radix tree for the first time we fault.
>> It's not an indication of a bug if there's no page to be found.
> 
> Is "no page found" the case for a lazy, un-allocated MAP_ANONYMOUS page?
> 
>> What we're seeing for this bug is page table entries of the format
>> 0x8000'0004'0000'0000.  That would be a zeroed entry, except for the
>> fact that something's stepped on the upper bits.
> 
> Does a totally zero-ed entry correspond to an un-allocated MAP_ANONYMOUS page?
> 
>> What is worrying is that potentially Xen might be stepping on the upper
>> bits of either a present entry (leading to the process loading a page
>> that belongs to someone else) or an entry which has been swapped out,
>> leading to the process getting a zeroed page when it should be getting
>> its page back from swap.
> 
> There was at least one report of non-Xen 32bit being affected.  There
> was no backtrace, so it could be something else.  One report doesn't
> have any swap configured.
> 
>> Defending against this kind of corruption would take adding a parity
>> bit to the page tables.  That's not a project I have time for right now.
> 
> Understood.  Thanks for the response.
> 
> Regards,
> Jason
> 
> 
> [ 2234.939079] BUG: unable to handle kernel NULL pointer dereference at 
> 0008
> [ 2234.942154] IP: __radix_tree_lookup+0xe/0xa0
> [ 2234.945176] *pdpt = 08cd5027 *pde = 
> [ 2234.948382] Oops:  [#1] SMP
> [ 2234.951410] Modules linked in: hp_wmi sparse_keymap rfkill wmi_bmof
> pcspkr i915 wmi hp_accel lis3lv02d input_polldev drm_kms_helper
> syscopyarea sysfillrect sysimgblt fb_sys_fops drm hp_wireless
> i2c_algo_bit hid_multitouch sha256_generic xen_netfront v4v(O) psmouse
> ecb xts hid_generic xhci_pci xhci_hcd ohci_pci ohci_hcd uhci_hcd
> ehci_pci ehci_hcd usbhid hid tpm_tis tpm_tis_core tpm
> [ 2234.960816] CPU: 1 PID: 2338 Comm: xenvm Tainted: G   O4.14.18 
> #1
> [ 2234.963991] Hardware name: Hewlett-Packard HP EliteBook Folio
> 9470m/18DF, BIOS 68IBD Ver. F.40 02/01/2013
> [ 2234.967186] task: d4370980 task.stack: cf8e8000
> [ 2234.970351] EIP: __radix_tree_lookup+0xe/0xa0
> [ 2234.973520] EFLAGS: 00010286 CPU: 1
> [ 2234.976699] EAX: 0004 EBX: b590 ECX:  EDX: 
> [ 2234.979887] ESI:  EDI: 0004 EBP: cf8e9dd0 ESP: cf8e9dc0
> [ 2234.983081]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0069
> [ 

Re: [Xen-devel] [Bug 198497] handle_mm_fault / xen_pmd_val / radix_tree_lookup_slot Null pointer

2018-04-21 Thread Juergen Gross
On 21/04/18 16:35, Matthew Wilcox wrote:
> On Fri, Apr 20, 2018 at 10:02:29AM -0600, Jan Beulich wrote:
>>>>>> Skylake 32bit PAE Dom0:
>>>>>> Bad swp_entry: 8000
>>>>>> mm/swap_state.c:683: bad pte d3a39f1c(8004)
>>>>>>
>>>>>> Ivy Bridge 32bit PAE Dom0:
>>>>>> Bad swp_entry: 4000
>>>>>> mm/swap_state.c:683: bad pte d3a05f1c(8002)
>>>>>>
>>>>>> Other 32bit DomU:
>>>>>> Bad swp_entry: 400
>>>>>> mm/swap_state.c:683: bad pte e2187f30(8002)
>>>>>>
>>>>>> Other 32bit:
>>>>>> Bad swp_entry: 200
>>>>>> mm/swap_state.c:683: bad pte ef3a3f38(8001)
> 
>> As said in my previous reply - both of the bits Andrew has mentioned can
>> only ever be set when the present bit is also set (which doesn't appear to
>> be the case here). The set bits above are actually in the range of bits
>> designated to the address, which Xen wouldn't ever play with.
> 
> Is it relevant that all the crashes we've seen are with PAE in the guest?
> Is it possible that Xen thinks the guest is not using PAE?
> 

All Xen 32-bit PV guests are using PAE. It's part of the PV ABI.


Juergen

Re: [Xen-devel] [Bug 198497] handle_mm_fault / xen_pmd_val / radix_tree_lookup_slot Null pointer

2018-04-21 Thread Matthew Wilcox
On Fri, Apr 20, 2018 at 10:02:29AM -0600, Jan Beulich wrote:
> >>>> Skylake 32bit PAE Dom0:
> >>>> Bad swp_entry: 8000
> >>>> mm/swap_state.c:683: bad pte d3a39f1c(8004)
> >>>>
> >>>> Ivy Bridge 32bit PAE Dom0:
> >>>> Bad swp_entry: 4000
> >>>> mm/swap_state.c:683: bad pte d3a05f1c(8002)
> >>>>
> >>>> Other 32bit DomU:
> >>>> Bad swp_entry: 400
> >>>> mm/swap_state.c:683: bad pte e2187f30(8002)
> >>>>
> >>>> Other 32bit:
> >>>> Bad swp_entry: 200
> >>>> mm/swap_state.c:683: bad pte ef3a3f38(8001)

> As said in my previous reply - both of the bits Andrew has mentioned can
> only ever be set when the present bit is also set (which doesn't appear to
> be the case here). The set bits above are actually in the range of bits
> designated to the address, which Xen wouldn't ever play with.

Is it relevant that all the crashes we've seen are with PAE in the guest?
Is it possible that Xen thinks the guest is not using PAE?

Re: [Xen-devel] [Bug 198497] handle_mm_fault / xen_pmd_val / radix_tree_lookup_slot Null pointer

2018-04-21 Thread Juergen Gross
On 20/04/18 21:20, Boris Ostrovsky wrote:
> On 04/20/2018 12:02 PM, Jan Beulich wrote:
>>>>> On 20.04.18 at 17:52,  wrote:
>>> On Fri, Apr 20, 2018 at 11:42 AM, Jan Beulich  wrote:
>>>>>>> On 20.04.18 at 17:25,  wrote:
>>>>> On 20/04/18 16:20, Jason Andryuk wrote:
>>>>>> Adding xen-devel and the Linux Xen maintainers.
>>>>>>
>>>>>> Summary: Some Xen users (and maybe others) are hitting a BUG in
>>>>>> __radix_tree_lookup() under do_swap_page() - example backtrace is
>>>>>> provided at the end.  Matthew Wilcox provided a band-aid patch that
>>>>>> prints errors like the following instead of triggering the bug.
>>>>>>
>>>>>> Skylake 32bit PAE Dom0:
>>>>>> Bad swp_entry: 8000
>>>>>> mm/swap_state.c:683: bad pte d3a39f1c(8004)
>>>>>>
>>>>>> Ivy Bridge 32bit PAE Dom0:
>>>>>> Bad swp_entry: 4000
>>>>>> mm/swap_state.c:683: bad pte d3a05f1c(8002)
>>>>>>
>>>>>> Other 32bit DomU:
>>>>>> Bad swp_entry: 400
>>>>>> mm/swap_state.c:683: bad pte e2187f30(8002)
>>>>>>
>>>>>> Other 32bit:
>>>>>> Bad swp_entry: 200
>>>>>> mm/swap_state.c:683: bad pte ef3a3f38(8001)
>>>>>>
>>>>>> The Linux bugzilla has more info
>>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=198497
>>>>>>
>>>>>> This may not be exclusive to Xen Linux, but most of the reports are on
>>>>>> Xen.  Matthew wonders if Xen might be stepping on the upper bits of a
>>>>>> pte.
>>>>> Yes - Xen does use the upper bits of a PTE, but only 1 in release
>>>>> builds, and a second in debug builds.  I don't understand where you're
>>>>> getting the 3rd bit in there.
>>>> The former supposedly is _PAGE_GUEST_KERNEL, which we use for 64-bit
>>>> guests only. Above talk is of 32-bit guests only.
>>>>
>>>> In addition both this and _PAGE_GNTTAB are used on present PTEs only,
>>>> while above talk is about swap entries.
>>> This hits a BUG going through do_swap_page, but it seems like users
>>> don't think they are actually using swap at the time.  One reporter
>>> didn't have any swap configured.  Some of this information was further
>>> down in my original message.
>>>
>>> I'm wondering if somehow we have a PTE that should be empty and should
>>> be lazily filled.  For some reason, the entry has some bits set and is
>>> causing the trouble.  Would Xen mess with the PTEs in that case?
>> As said in my previous reply - both of the bits Andrew has mentioned can
>> only ever be set when the present bit is also set (which doesn't appear to
>> be the case here). The set bits above are actually in the range of bits
>> designated to the address, which Xen wouldn't ever play with.
> 
> 
> The bug description starts with: "On a Xen VM running as pvh"
> 
> So is this a PV or a PVH guest?

The stack backtrace suggests PV.


Juergen

Re: [Xen-devel] [Bug 198497] handle_mm_fault / xen_pmd_val / radix_tree_lookup_slot Null pointer

2018-04-20 Thread Boris Ostrovsky
On 04/20/2018 12:02 PM, Jan Beulich wrote:
>>>> On 20.04.18 at 17:52,  wrote:
>> On Fri, Apr 20, 2018 at 11:42 AM, Jan Beulich  wrote:
>>>>>> On 20.04.18 at 17:25,  wrote:
>>>> On 20/04/18 16:20, Jason Andryuk wrote:
>>>>> Adding xen-devel and the Linux Xen maintainers.
>>>>>
>>>>> Summary: Some Xen users (and maybe others) are hitting a BUG in
>>>>> __radix_tree_lookup() under do_swap_page() - example backtrace is
>>>>> provided at the end.  Matthew Wilcox provided a band-aid patch that
>>>>> prints errors like the following instead of triggering the bug.
>>>>>
>>>>> Skylake 32bit PAE Dom0:
>>>>> Bad swp_entry: 8000
>>>>> mm/swap_state.c:683: bad pte d3a39f1c(8004)
>>>>>
>>>>> Ivy Bridge 32bit PAE Dom0:
>>>>> Bad swp_entry: 4000
>>>>> mm/swap_state.c:683: bad pte d3a05f1c(8002)
>>>>>
>>>>> Other 32bit DomU:
>>>>> Bad swp_entry: 400
>>>>> mm/swap_state.c:683: bad pte e2187f30(8002)
>>>>>
>>>>> Other 32bit:
>>>>> Bad swp_entry: 200
>>>>> mm/swap_state.c:683: bad pte ef3a3f38(8001)
>>>>>
>>>>> The Linux bugzilla has more info
>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=198497
>>>>>
>>>>> This may not be exclusive to Xen Linux, but most of the reports are on
>>>>> Xen.  Matthew wonders if Xen might be stepping on the upper bits of a
>>>>> pte.
>>>> Yes - Xen does use the upper bits of a PTE, but only 1 in release
>>>> builds, and a second in debug builds.  I don't understand where you're
>>>> getting the 3rd bit in there.
>>> The former supposedly is _PAGE_GUEST_KERNEL, which we use for 64-bit
>>> guests only. Above talk is of 32-bit guests only.
>>>
>>> In addition both this and _PAGE_GNTTAB are used on present PTEs only,
>>> while above talk is about swap entries.
>> This hits a BUG going through do_swap_page, but it seems like users
>> don't think they are actually using swap at the time.  One reporter
>> didn't have any swap configured.  Some of this information was further
>> down in my original message.
>>
>> I'm wondering if somehow we have a PTE that should be empty and should
>> be lazily filled.  For some reason, the entry has some bits set and is
>> causing the trouble.  Would Xen mess with the PTEs in that case?
> As said in my previous reply - both of the bits Andrew has mentioned can
> only ever be set when the present bit is also set (which doesn't appear to
> be the case here). The set bits above are actually in the range of bits
> designated to the address, which Xen wouldn't ever play with.


The bug description starts with: "On a Xen VM running as pvh"

So is this a PV or a PVH guest?


-boris

Re: [Xen-devel] [Bug 198497] handle_mm_fault / xen_pmd_val / radix_tree_lookup_slot Null pointer

2018-04-20 Thread Jan Beulich
>>> On 20.04.18 at 17:52,  wrote:
> On Fri, Apr 20, 2018 at 11:42 AM, Jan Beulich  wrote:
>>>>> On 20.04.18 at 17:25,  wrote:
>>> On 20/04/18 16:20, Jason Andryuk wrote:
>>>> Adding xen-devel and the Linux Xen maintainers.
>>>>
>>>> Summary: Some Xen users (and maybe others) are hitting a BUG in
>>>> __radix_tree_lookup() under do_swap_page() - example backtrace is
>>>> provided at the end.  Matthew Wilcox provided a band-aid patch that
>>>> prints errors like the following instead of triggering the bug.
>>>>
>>>> Skylake 32bit PAE Dom0:
>>>> Bad swp_entry: 8000
>>>> mm/swap_state.c:683: bad pte d3a39f1c(8004)
>>>>
>>>> Ivy Bridge 32bit PAE Dom0:
>>>> Bad swp_entry: 4000
>>>> mm/swap_state.c:683: bad pte d3a05f1c(8002)
>>>>
>>>> Other 32bit DomU:
>>>> Bad swp_entry: 400
>>>> mm/swap_state.c:683: bad pte e2187f30(8002)
>>>>
>>>> Other 32bit:
>>>> Bad swp_entry: 200
>>>> mm/swap_state.c:683: bad pte ef3a3f38(8001)
>>>>
>>>> The Linux bugzilla has more info
>>>> https://bugzilla.kernel.org/show_bug.cgi?id=198497
>>>>
>>>> This may not be exclusive to Xen Linux, but most of the reports are on
>>>> Xen.  Matthew wonders if Xen might be stepping on the upper bits of a
>>>> pte.
>>>
>>> Yes - Xen does use the upper bits of a PTE, but only 1 in release
>>> builds, and a second in debug builds.  I don't understand where you're
>>> getting the 3rd bit in there.
>>
>> The former supposedly is _PAGE_GUEST_KERNEL, which we use for 64-bit
>> guests only. Above talk is of 32-bit guests only.
>>
>> In addition both this and _PAGE_GNTTAB are used on present PTEs only,
>> while above talk is about swap entries.
> 
> This hits a BUG going through do_swap_page, but it seems like users
> don't think they are actually using swap at the time.  One reporter
> didn't have any swap configured.  Some of this information was further
> down in my original message.
> 
> I'm wondering if somehow we have a PTE that should be empty and should
> be lazily filled.  For some reason, the entry has some bits set and is
> causing the trouble.  Would Xen mess with the PTEs in that case?

As said in my previous reply - both of the bits Andrew has mentioned can
only ever be set when the present bit is also set (which doesn't appear to
be the case here). The set bits above are actually in the range of bits
designated to the address, which Xen wouldn't ever play with.

Jan



Re: [Xen-devel] [Bug 198497] handle_mm_fault / xen_pmd_val / radix_tree_lookup_slot Null pointer

2018-04-20 Thread Andrew Cooper
On 20/04/18 16:52, Jason Andryuk wrote:
> On Fri, Apr 20, 2018 at 11:42 AM, Jan Beulich  wrote:
>>>>> On 20.04.18 at 17:25,  wrote:
>>> On 20/04/18 16:20, Jason Andryuk wrote:
>>>> Adding xen-devel and the Linux Xen maintainers.
>>>>
>>>> Summary: Some Xen users (and maybe others) are hitting a BUG in
>>>> __radix_tree_lookup() under do_swap_page() - example backtrace is
>>>> provided at the end.  Matthew Wilcox provided a band-aid patch that
>>>> prints errors like the following instead of triggering the bug.
>>>>
>>>> Skylake 32bit PAE Dom0:
>>>> Bad swp_entry: 8000
>>>> mm/swap_state.c:683: bad pte d3a39f1c(8004)
>>>>
>>>> Ivy Bridge 32bit PAE Dom0:
>>>> Bad swp_entry: 4000
>>>> mm/swap_state.c:683: bad pte d3a05f1c(8002)
>>>>
>>>> Other 32bit DomU:
>>>> Bad swp_entry: 400
>>>> mm/swap_state.c:683: bad pte e2187f30(8002)
>>>>
>>>> Other 32bit:
>>>> Bad swp_entry: 200
>>>> mm/swap_state.c:683: bad pte ef3a3f38(8001)
>>>>
>>>> The Linux bugzilla has more info
>>>> https://bugzilla.kernel.org/show_bug.cgi?id=198497
>>>>
>>>> This may not be exclusive to Xen Linux, but most of the reports are on
>>>> Xen.  Matthew wonders if Xen might be stepping on the upper bits of a
>>>> pte.
>>> Yes - Xen does use the upper bits of a PTE, but only 1 in release
>>> builds, and a second in debug builds.  I don't understand where you're
>>> getting the 3rd bit in there.
>> The former supposedly is _PAGE_GUEST_KERNEL, which we use for 64-bit
>> guests only. Above talk is of 32-bit guests only.
>>
>> In addition both this and _PAGE_GNTTAB are used on present PTEs only,
>> while above talk is about swap entries.
> This hits a BUG going through do_swap_page, but it seems like users
> don't think they are actually using swap at the time.  One reporter
> didn't have any swap configured.  Some of this information was further
> down in my original message.
>
> I'm wondering if somehow we have a PTE that should be empty and should
> be lazily filled.  For some reason, the entry has some bits set and is
> causing the trouble.  Would Xen mess with the PTEs in that case?

Any PTE with the present bit clear will be accepted and used
unmodified.  That said, I believe there is some batching of updates for
efficiency reasons in the PVops layer of the kernel, which might end up
causing a disconnect between what the swap system thinks, and what the
actual PTEs show when read.

~Andrew

Re: [Xen-devel] [Bug 198497] handle_mm_fault / xen_pmd_val / radix_tree_lookup_slot Null pointer

2018-04-20 Thread Jason Andryuk
On Fri, Apr 20, 2018 at 11:42 AM, Jan Beulich  wrote:
>>>> On 20.04.18 at 17:25,  wrote:
>> On 20/04/18 16:20, Jason Andryuk wrote:
>>> Adding xen-devel and the Linux Xen maintainers.
>>>
>>> Summary: Some Xen users (and maybe others) are hitting a BUG in
>>> __radix_tree_lookup() under do_swap_page() - example backtrace is
>>> provided at the end.  Matthew Wilcox provided a band-aid patch that
>>> prints errors like the following instead of triggering the bug.
>>>
>>> Skylake 32bit PAE Dom0:
>>> Bad swp_entry: 8000
>>> mm/swap_state.c:683: bad pte d3a39f1c(8004)
>>>
>>> Ivy Bridge 32bit PAE Dom0:
>>> Bad swp_entry: 4000
>>> mm/swap_state.c:683: bad pte d3a05f1c(8002)
>>>
>>> Other 32bit DomU:
>>> Bad swp_entry: 400
>>> mm/swap_state.c:683: bad pte e2187f30(8002)
>>>
>>> Other 32bit:
>>> Bad swp_entry: 200
>>> mm/swap_state.c:683: bad pte ef3a3f38(8001)
>>>
>>> The Linux bugzilla has more info
>>> https://bugzilla.kernel.org/show_bug.cgi?id=198497
>>>
>>> This may not be exclusive to Xen Linux, but most of the reports are on
>>> Xen.  Matthew wonders if Xen might be stepping on the upper bits of a
>>> pte.
>>
>> Yes - Xen does use the upper bits of a PTE, but only 1 in release
>> builds, and a second in debug builds.  I don't understand where you're
>> getting the 3rd bit in there.
>
> The former supposedly is _PAGE_GUEST_KERNEL, which we use for 64-bit
> guests only. Above talk is of 32-bit guests only.
>
> In addition both this and _PAGE_GNTTAB are used on present PTEs only,
> while above talk is about swap entries.

This hits a BUG going through do_swap_page, but it seems like users
don't think they are actually using swap at the time.  One reporter
didn't have any swap configured.  Some of this information was further
down in my original message.

I'm wondering if somehow we have a PTE that should be empty and should
be lazily filled.  For some reason, the entry has some bits set and is
causing the trouble.  Would Xen mess with the PTEs in that case?

Thanks,
Jason

Re: [Xen-devel] [Bug 198497] handle_mm_fault / xen_pmd_val / radix_tree_lookup_slot Null pointer

2018-04-20 Thread Jan Beulich
>>> On 20.04.18 at 17:25,  wrote:
> On 20/04/18 16:20, Jason Andryuk wrote:
>> Adding xen-devel and the Linux Xen maintainers.
>>
>> Summary: Some Xen users (and maybe others) are hitting a BUG in
>> __radix_tree_lookup() under do_swap_page() - example backtrace is
>> provided at the end.  Matthew Wilcox provided a band-aid patch that
>> prints errors like the following instead of triggering the bug.
>>
>> Skylake 32bit PAE Dom0:
>> Bad swp_entry: 8000
>> mm/swap_state.c:683: bad pte d3a39f1c(8004)
>>
>> Ivy Bridge 32bit PAE Dom0:
>> Bad swp_entry: 4000
>> mm/swap_state.c:683: bad pte d3a05f1c(8002)
>>
>> Other 32bit DomU:
>> Bad swp_entry: 400
>> mm/swap_state.c:683: bad pte e2187f30(8002)
>>
>> Other 32bit:
>> Bad swp_entry: 200
>> mm/swap_state.c:683: bad pte ef3a3f38(8001)
>>
>> The Linux bugzilla has more info
>> https://bugzilla.kernel.org/show_bug.cgi?id=198497 
>>
>> This may not be exclusive to Xen Linux, but most of the reports are on
>> Xen.  Matthew wonders if Xen might be stepping on the upper bits of a
>> pte.
> 
> Yes - Xen does use the upper bits of a PTE, but only 1 in release
> builds, and a second in debug builds.  I don't understand where you're
> getting the 3rd bit in there.

The former supposedly is _PAGE_GUEST_KERNEL, which we use for 64-bit
guests only. Above talk is of 32-bit guests only.

In addition both this and _PAGE_GNTTAB are used on present PTEs only,
while above talk is about swap entries.

Jan



Re: [Xen-devel] [Bug 198497] handle_mm_fault / xen_pmd_val / radix_tree_lookup_slot Null pointer

2018-04-20 Thread Andrew Cooper
On 20/04/18 16:25, Andrew Cooper wrote:
> On 20/04/18 16:20, Jason Andryuk wrote:
>> Adding xen-devel and the Linux Xen maintainers.
>>
>> Summary: Some Xen users (and maybe others) are hitting a BUG in
>> __radix_tree_lookup() under do_swap_page() - example backtrace is
>> provided at the end.  Matthew Wilcox provided a band-aid patch that
>> prints errors like the following instead of triggering the bug.
>>
>> Skylake 32bit PAE Dom0:
>> Bad swp_entry: 8000
>> mm/swap_state.c:683: bad pte d3a39f1c(8004)
>>
>> Ivy Bridge 32bit PAE Dom0:
>> Bad swp_entry: 4000
>> mm/swap_state.c:683: bad pte d3a05f1c(8002)
>>
>> Other 32bit DomU:
>> Bad swp_entry: 400
>> mm/swap_state.c:683: bad pte e2187f30(8002)
>>
>> Other 32bit:
>> Bad swp_entry: 200
>> mm/swap_state.c:683: bad pte ef3a3f38(8001)
>>
>> The Linux bugzilla has more info
>> https://bugzilla.kernel.org/show_bug.cgi?id=198497
>>
>> This may not be exclusive to Xen Linux, but most of the reports are on
>> Xen.  Matthew wonders if Xen might be stepping on the upper bits of a
>> pte.
> Yes - Xen does use the upper bits of a PTE, but only 1 in release
> builds, and a second in debug builds.  I don't understand where you're
> getting the 3rd bit in there.
>
> The use of these bits is dubious and not adequately described in the
> ABI, and attempts to improve the state of play have come to nothing in
> the past.

Sorry - hit send too early.  To be rather more helpful:

For 64bit guests only, we use one bit to distinguish between guest
kernel and guest user pages.  This is because both guest user and kernel
run in ring3, and have to have _PAGE_USER set on them.  We use bit 52 to
tag guest kernel mappings, which is seeded from the guest kernel's choice
of _PAGE_USER.

In debug builds of the hypervisor only, we use bit 62 to tag grant
mappings.  This is to help spot API errors in the guest, and results in
an instant crash if we spot misuse.
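
As a rough cross-check of the two bit positions named above, the sketch
below lists which bits are set in a 64-bit PTE value and whether any of
them is bit 52 or bit 62.  It is a Python illustration, not hypervisor
code; the constant and function names are invented, and the example value
0x8000000400000000 is an assumption about the full form of the entries
reported in this thread.

```python
XEN_GUEST_KERNEL_BIT = 52  # per the text above: guest kernel tag (64bit guests)
XEN_GNTTAB_BIT = 62        # per the text above: grant mapping tag (debug builds)
PRESENT_BIT = 0            # x86 "present" bit


def set_bits(pte):
    """Return the positions of all set bits in a 64-bit value."""
    return [bit for bit in range(64) if (pte >> bit) & 1]


def describe(pte):
    """Summarise a PTE value: set bits, present flag, Xen software bits."""
    bits = set_bits(pte)
    xen_bits = (XEN_GUEST_KERNEL_BIT, XEN_GNTTAB_BIT)
    return {
        "bits": bits,
        "present": PRESENT_BIT in bits,
        "xen_software_bits": [bit for bit in bits if bit in xen_bits],
    }


# Assumed example value, not taken verbatim from a log.
print(describe(0x8000_0004_0000_0000))
```

For that assumed value the set bits are 34 and 63, the present bit is
clear, and neither bit 52 nor bit 62 is involved.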

~Andrew

Re: [Xen-devel] [Bug 198497] handle_mm_fault / xen_pmd_val / radix_tree_lookup_slot Null pointer

2018-04-20 Thread Jason Andryuk
Adding xen-devel and the Linux Xen maintainers.

Summary: Some Xen users (and maybe others) are hitting a BUG in
__radix_tree_lookup() under do_swap_page() - example backtrace is
provided at the end.  Matthew Wilcox provided a band-aid patch that
prints errors like the following instead of triggering the bug.

Skylake 32bit PAE Dom0:
Bad swp_entry: 8000
mm/swap_state.c:683: bad pte d3a39f1c(8004)

Ivy Bridge 32bit PAE Dom0:
Bad swp_entry: 4000
mm/swap_state.c:683: bad pte d3a05f1c(8002)

Other 32bit DomU:
Bad swp_entry: 400
mm/swap_state.c:683: bad pte e2187f30(8002)

Other 32bit:
Bad swp_entry: 200
mm/swap_state.c:683: bad pte ef3a3f38(8001)

The Linux bugzilla has more info
https://bugzilla.kernel.org/show_bug.cgi?id=198497

This may not be exclusive to Xen Linux, but most of the reports are on
Xen.  Matthew wonders if Xen might be stepping on the upper bits of a
pte.

On Fri, Apr 20, 2018 at 9:39 AM, Matthew Wilcox  wrote:
> On Fri, Apr 20, 2018 at 09:10:11AM -0400, Jason Andryuk wrote:
>> > Given that this is happening on Xen, I wonder if Xen is using some of the
>> > bits in the page table for its own purposes.
>>
>> The backtraces include do_swap_page().  While I have a swap partition
>> configured, I don't think it's being used.  Are we somehow
>> misidentifying the page as a swap page?  I'm not familiar with the
>> code, but is there an easy way to query global swap usage?  That way
>> we can see if the check for a swap page is bogus.
>>
>> My system works with the band-aid patch.  When that patch sets page =
>> NULL, does that mean userspace is just going to get a zero-ed page?
>> Userspace still works AFAICT, which makes me think it is a
>> mis-identified page to start with.
>
> Here's how this code works.

Thanks for the description.

> When we swap out an anonymous page (a page which is not backed by a
> file; could be from a MAP_PRIVATE mapping, could be brk()), we write it
> to the swap cache.  In order to be able to find it again, we store a
> cookie (called a swp_entry_t) in the process' page table (marked with
> the 'present' bit clear, so the CPU will fault on it).  When we get a
> fault, we look up the cookie in a radix tree and bring that page back
> in from swap.
>
> If there's no page found in the radix tree, we put a freshly zeroed
> page into the process's address space.  That's because we won't find
> a page in the swap cache's radix tree for the first time we fault.
> It's not an indication of a bug if there's no page to be found.

Is "no page found" the case for a lazy, un-allocated MAP_ANONYMOUS page?

> What we're seeing for this bug is page table entries of the format
> 0x8000'0004'0000'0000.  That would be a zeroed entry, except for the
> fact that something's stepped on the upper bits.

Does a totally zero-ed entry correspond to an un-allocated MAP_ANONYMOUS page?

> What is worrying is that potentially Xen might be stepping on the upper
> bits of either a present entry (leading to the process loading a page
> that belongs to someone else) or an entry which has been swapped out,
> leading to the process getting a zeroed page when it should be getting
> its page back from swap.

There was at least one report of non-Xen 32bit being affected.  There
was no backtrace, so it could be something else.  One report doesn't
have any swap configured.

> Defending against this kind of corruption would take adding a parity
> bit to the page tables.  That's not a project I have time for right now.

Understood.  Thanks for the response.

Regards,
Jason


[ 2234.939079] BUG: unable to handle kernel NULL pointer dereference at 0008
[ 2234.942154] IP: __radix_tree_lookup+0xe/0xa0
[ 2234.945176] *pdpt = 08cd5027 *pde = 
[ 2234.948382] Oops:  [#1] SMP
[ 2234.951410] Modules linked in: hp_wmi sparse_keymap rfkill wmi_bmof
pcspkr i915 wmi hp_accel lis3lv02d input_polldev drm_kms_helper
syscopyarea sysfillrect sysimgblt fb_sys_fops drm hp_wireless
i2c_algo_bit hid_multitouch sha256_generic xen_netfront v4v(O) psmouse
ecb xts hid_generic xhci_pci xhci_hcd ohci_pci ohci_hcd uhci_hcd
ehci_pci ehci_hcd usbhid hid tpm_tis tpm_tis_core tpm
[ 2234.960816] CPU: 1 PID: 2338 Comm: xenvm Tainted: G   O4.14.18 #1
[ 2234.963991] Hardware name: Hewlett-Packard HP EliteBook Folio
9470m/18DF, BIOS 68IBD Ver. F.40 02/01/2013
[ 2234.967186] task: d4370980 task.stack: cf8e8000
[ 2234.970351] EIP: __radix_tree_lookup+0xe/0xa0
[ 2234.973520] EFLAGS: 00010286 CPU: 1
[ 2234.976699] EAX: 0004 EBX: b590 ECX:  EDX: 
[ 2234.979887] ESI:  EDI: 0004 EBP: cf8e9dd0 ESP: cf8e9dc0
[ 2234.983081]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0069
[ 2234.986233] CR0: 80050033 CR2: 0008 CR3: 08f12000 CR4: 00042660
[ 2234.989340] Call Trace:
[ 2234.992354]  radix_tree_lookup_slot+0x1d/0x50
[ 2234.995341]  ? xen_irq_disable_direct+0xc/0xc
[ 2234.998288]