Re: [Xen-devel] [Bug 198497] handle_mm_fault / xen_pmd_val / radix_tree_lookup_slot Null pointer
On Mon, Apr 23, 2018 at 4:17 AM Juergen Gross wrote:
> On 20/04/18 17:20, Jason Andryuk wrote:
> > Adding xen-devel and the Linux Xen maintainers.
> >
> > Summary: Some Xen users (and maybe others) are hitting a BUG in
> > __radix_tree_lookup() under do_swap_page() - example backtrace is
> > provided at the end. Matthew Wilcox provided a band-aid patch that
> > prints errors like the following instead of triggering the bug.
> >
> > Skylake 32bit PAE Dom0:
> > Bad swp_entry: 8000
> > mm/swap_state.c:683: bad pte d3a39f1c(8004)
> >
> > Ivy Bridge 32bit PAE Dom0:
> > Bad swp_entry: 4000
> > mm/swap_state.c:683: bad pte d3a05f1c(8002)
> >
> > Other 32bit DomU:
> > Bad swp_entry: 400
> > mm/swap_state.c:683: bad pte e2187f30(8002)
> >
> > Other 32bit:
> > Bad swp_entry: 200
> > mm/swap_state.c:683: bad pte ef3a3f38(8001)
> >
> > The Linux bugzilla has more info
> > https://bugzilla.kernel.org/show_bug.cgi?id=198497
> >
> > This may not be exclusive to Xen Linux, but most of the reports are on
> > Xen. Matthew wonders if Xen might be stepping on the upper bits of a
> > pte.
> >
>
> Could it be we just have a race regarding pte_clear()? This will set
> the low part of the pte to zero first and then the high part.
>
> In case pte_clear() is used in interrupt mode especially Xen will be
> rather slow as it emulates the two writes to the page table resulting
> in a larger window where the race might happen.

It looks like Juergen was correct. With the L1TF vulnerability, the Xen
hypervisor needs to detect vulnerable PTEs. For 32bit PAE, Xen would trap
on PTEs like 0x8000'0002'0000'0000 - the same format as seen in this bug.

He wrote two patches for Linux, now upstream, to write PTEs with 64bit
operations or hypercalls and avoid the invalid PTEs:

f7c90c2aa400 "x86/xen: don't write ptes directly in 32-bit PV guests"
b2d7a075a1cc "x86/pae: use 64 bit atomic xchg function in native_ptep_get_and_clear"

With those patches, I have not seen a "Bad swp_entry", so this seems fixed
for me on Xen.

There was also a report of a non-Xen kernel being affected. Is there an
underlying problem that native PAE code updates PTEs in two writes, but
there is no locking to prevent the intermediate PTE from being used
elsewhere in the kernel?

Regards,
Jason

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
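As a rough illustration of the race discussed in the message above, the sketch below models a 64-bit PAE PTE that is cleared with two separate 32-bit stores, low word first. The sample PTE value, the helper names and the "looks like a swap entry" test are assumptions made up for this example and are not kernel code. The only point is that the value visible between the two stores has the present bit clear while the upper bits are still set.

```python
# Toy model of clearing a 64-bit PAE PTE with two 32-bit stores.
# All names and the sample PTE value are hypothetical, for illustration only.

LOW_WORD_MASK = 0xFFFFFFFF
PAGE_PRESENT = 0x1  # bit 0 of an x86 PTE


def two_step_clear(pte):
    """Return the values visible in memory after each 32-bit store."""
    after_low_store = pte & ~LOW_WORD_MASK   # low word is zeroed first
    after_high_store = 0                     # then the high word
    return [after_low_store, after_high_store]


def looks_like_swap_entry(pte):
    """Non-zero with the present bit clear: would be read as a swap entry."""
    return pte != 0 and (pte & PAGE_PRESENT) == 0


# Hypothetical present PTE: NX (bit 63), a frame address bit above
# 4 GiB (bit 33), and some low flag bits including 'present'.
pte = 0x8000000212345067
intermediate, final = two_step_clear(pte)

print(hex(intermediate))                    # 0x8000000200000000
print(looks_like_swap_entry(pte))           # False
print(looks_like_swap_entry(intermediate))  # True
print(looks_like_swap_entry(final))         # False
```

A single 64-bit store or exchange leaves no such intermediate value, which appears to be the idea behind the two patches named above, going by their titles.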
Re: [Xen-devel] [Bug 198497] handle_mm_fault / xen_pmd_val / radix_tree_lookup_slot Null pointer
On 20/04/18 17:20, Jason Andryuk wrote:
> Adding xen-devel and the Linux Xen maintainers.
>
> Summary: Some Xen users (and maybe others) are hitting a BUG in
> __radix_tree_lookup() under do_swap_page() - example backtrace is
> provided at the end. Matthew Wilcox provided a band-aid patch that
> prints errors like the following instead of triggering the bug.
>
> Skylake 32bit PAE Dom0:
> Bad swp_entry: 8000
> mm/swap_state.c:683: bad pte d3a39f1c(8004)
>
> Ivy Bridge 32bit PAE Dom0:
> Bad swp_entry: 4000
> mm/swap_state.c:683: bad pte d3a05f1c(8002)
>
> Other 32bit DomU:
> Bad swp_entry: 400
> mm/swap_state.c:683: bad pte e2187f30(8002)
>
> Other 32bit:
> Bad swp_entry: 200
> mm/swap_state.c:683: bad pte ef3a3f38(8001)
>
> The Linux bugzilla has more info
> https://bugzilla.kernel.org/show_bug.cgi?id=198497
>
> This may not be exclusive to Xen Linux, but most of the reports are on
> Xen. Matthew wonders if Xen might be stepping on the upper bits of a
> pte.
>
> On Fri, Apr 20, 2018 at 9:39 AM, Matthew Wilcox wrote:
>> On Fri, Apr 20, 2018 at 09:10:11AM -0400, Jason Andryuk wrote:
>>>> Given that this is happening on Xen, I wonder if Xen is using some of the
>>>> bits in the page table for its own purposes.
>>>
>>> The backtraces include do_swap_page(). While I have a swap partition
>>> configured, I don't think it's being used. Are we somehow
>>> misidentifying the page as a swap page? I'm not familiar with the
>>> code, but is there an easy way to query global swap usage? That way
>>> we can see if the check for a swap page is bogus.
>>>
>>> My system works with the band-aid patch. When that patch sets page =
>>> NULL, does that mean userspace is just going to get a zero-ed page?
>>> Userspace still works AFAICT, which makes me think it is a
>>> mis-identified page to start with.
>>
>> Here's how this code works.
>
> Thanks for the description.
>
>> When we swap out an anonymous page (a page which is not backed by a
>> file; could be from a MAP_PRIVATE mapping, could be brk()), we write it
>> to the swap cache. In order to be able to find it again, we store a
>> cookie (called a swp_entry_t) in the process' page table (marked with
>> the 'present' bit clear, so the CPU will fault on it). When we get a
>> fault, we look up the cookie in a radix tree and bring that page back
>> in from swap.
>>
>> If there's no page found in the radix tree, we put a freshly zeroed
>> page into the process's address space. That's because we won't find
>> a page in the swap cache's radix tree for the first time we fault.
>> It's not an indication of a bug if there's no page to be found.
>
> Is "no page found" the case for a lazy, un-allocated MAP_ANONYMOUS page?
>
>> What we're seeing for this bug is page table entries of the format
>> 0x8000'0004'0000'0000. That would be a zeroed entry, except for the
>> fact that something's stepped on the upper bits.
>
> Does a totally zero-ed entry correspond to an un-allocated MAP_ANONYMOUS page?
>
>> What is worrying is that potentially Xen might be stepping on the upper
>> bits of either a present entry (leading to the process loading a page
>> that belongs to someone else) or an entry which has been swapped out,
>> leading to the process getting a zeroed page when it should be getting
>> its page back from swap.
>
> There was at least one report of non-Xen 32bit being affected. There
> was no backtrace, so it could be something else. One report doesn't
> have any swap configured.
>
>> Defending against this kind of corruption would take adding a parity
>> bit to the page tables. That's not a project I have time for right now.
>
> Understood. Thanks for the response.
>
> Regards,
> Jason
>
>
> [ 2234.939079] BUG: unable to handle kernel NULL pointer dereference at
> 0008
> [ 2234.942154] IP: __radix_tree_lookup+0xe/0xa0
> [ 2234.945176] *pdpt = 08cd5027 *pde =
> [ 2234.948382] Oops: [#1] SMP
> [ 2234.951410] Modules linked in: hp_wmi sparse_keymap rfkill wmi_bmof
> pcspkr i915 wmi hp_accel lis3lv02d input_polldev drm_kms_helper
> syscopyarea sysfillrect sysimgblt fb_sys_fops drm hp_wireless
> i2c_algo_bit hid_multitouch sha256_generic xen_netfront v4v(O) psmouse
> ecb xts hid_generic xhci_pci xhci_hcd ohci_pci ohci_hcd uhci_hcd
> ehci_pci ehci_hcd usbhid hid tpm_tis tpm_tis_core tpm
> [ 2234.960816] CPU: 1 PID: 2338 Comm: xenvm Tainted: G O4.14.18
> #1
> [ 2234.963991] Hardware name: Hewlett-Packard HP EliteBook Folio
> 9470m/18DF, BIOS 68IBD Ver. F.40 02/01/2013
> [ 2234.967186] task: d4370980 task.stack: cf8e8000
> [ 2234.970351] EIP: __radix_tree_lookup+0xe/0xa0
> [ 2234.973520] EFLAGS: 00010286 CPU: 1
> [ 2234.976699] EAX: 0004 EBX: b590 ECX: EDX:
> [ 2234.979887] ESI: EDI: 0004 EBP: cf8e9dd0 ESP: cf8e9dc0
> [ 2234.983081] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0069
> [
Re: [Xen-devel] [Bug 198497] handle_mm_fault / xen_pmd_val / radix_tree_lookup_slot Null pointer
On 21/04/18 16:35, Matthew Wilcox wrote:
> On Fri, Apr 20, 2018 at 10:02:29AM -0600, Jan Beulich wrote:
>> Skylake 32bit PAE Dom0:
>> Bad swp_entry: 8000
>> mm/swap_state.c:683: bad pte d3a39f1c(8004)
>>
>> Ivy Bridge 32bit PAE Dom0:
>> Bad swp_entry: 4000
>> mm/swap_state.c:683: bad pte d3a05f1c(8002)
>>
>> Other 32bit DomU:
>> Bad swp_entry: 400
>> mm/swap_state.c:683: bad pte e2187f30(8002)
>>
>> Other 32bit:
>> Bad swp_entry: 200
>> mm/swap_state.c:683: bad pte ef3a3f38(8001)
>
>> As said in my previous reply - both of the bits Andrew has mentioned can
>> only ever be set when the present bit is also set (which doesn't appear to
>> be the case here). The set bits above are actually in the range of bits
>> designated to the address, which Xen wouldn't ever play with.
>
> Is it relevant that all the crashes we've seen are with PAE in the guest?
> Is it possible that Xen thinks the guest is not using PAE?
>

All Xen 32-bit PV guests are using PAE. It's part of the PV ABI.

Juergen
Re: [Xen-devel] [Bug 198497] handle_mm_fault / xen_pmd_val / radix_tree_lookup_slot Null pointer
On Fri, Apr 20, 2018 at 10:02:29AM -0600, Jan Beulich wrote:
> Skylake 32bit PAE Dom0:
> Bad swp_entry: 8000
> mm/swap_state.c:683: bad pte d3a39f1c(8004)
>
> Ivy Bridge 32bit PAE Dom0:
> Bad swp_entry: 4000
> mm/swap_state.c:683: bad pte d3a05f1c(8002)
>
> Other 32bit DomU:
> Bad swp_entry: 400
> mm/swap_state.c:683: bad pte e2187f30(8002)
>
> Other 32bit:
> Bad swp_entry: 200
> mm/swap_state.c:683: bad pte ef3a3f38(8001)

> As said in my previous reply - both of the bits Andrew has mentioned can
> only ever be set when the present bit is also set (which doesn't appear to
> be the case here). The set bits above are actually in the range of bits
> designated to the address, which Xen wouldn't ever play with.

Is it relevant that all the crashes we've seen are with PAE in the guest?
Is it possible that Xen thinks the guest is not using PAE?
Re: [Xen-devel] [Bug 198497] handle_mm_fault / xen_pmd_val / radix_tree_lookup_slot Null pointer
On 20/04/18 21:20, Boris Ostrovsky wrote:
> On 04/20/2018 12:02 PM, Jan Beulich wrote:
>>>>> On 20.04.18 at 17:52, wrote:
>>> On Fri, Apr 20, 2018 at 11:42 AM, Jan Beulich wrote:
>>>>>>> On 20.04.18 at 17:25, wrote:
>>>>> On 20/04/18 16:20, Jason Andryuk wrote:
>>>>>> Adding xen-devel and the Linux Xen maintainers.
>>>>>>
>>>>>> Summary: Some Xen users (and maybe others) are hitting a BUG in
>>>>>> __radix_tree_lookup() under do_swap_page() - example backtrace is
>>>>>> provided at the end. Matthew Wilcox provided a band-aid patch that
>>>>>> prints errors like the following instead of triggering the bug.
>>>>>>
>>>>>> Skylake 32bit PAE Dom0:
>>>>>> Bad swp_entry: 8000
>>>>>> mm/swap_state.c:683: bad pte d3a39f1c(8004)
>>>>>>
>>>>>> Ivy Bridge 32bit PAE Dom0:
>>>>>> Bad swp_entry: 4000
>>>>>> mm/swap_state.c:683: bad pte d3a05f1c(8002)
>>>>>>
>>>>>> Other 32bit DomU:
>>>>>> Bad swp_entry: 400
>>>>>> mm/swap_state.c:683: bad pte e2187f30(8002)
>>>>>>
>>>>>> Other 32bit:
>>>>>> Bad swp_entry: 200
>>>>>> mm/swap_state.c:683: bad pte ef3a3f38(8001)
>>>>>>
>>>>>> The Linux bugzilla has more info
>>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=198497
>>>>>>
>>>>>> This may not be exclusive to Xen Linux, but most of the reports are on
>>>>>> Xen. Matthew wonders if Xen might be stepping on the upper bits of a
>>>>>> pte.
>>>>> Yes - Xen does use the upper bits of a PTE, but only 1 in release
>>>>> builds, and a second in debug builds. I don't understand where you're
>>>>> getting the 3rd bit in there.
>>>> The former supposedly is _PAGE_GUEST_KERNEL, which we use for 64-bit
>>>> guests only. Above talk is of 32-bit guests only.
>>>>
>>>> In addition both this and _PAGE_GNTTAB are used on present PTEs only,
>>>> while above talk is about swap entries.
>>> This hits a BUG going through do_swap_page, but it seems like users
>>> don't think they are actually using swap at the time. One reporter
>>> didn't have any swap configured. Some of this information was further
>>> down in my original message.
>>>
>>> I'm wondering if somehow we have a PTE that should be empty and should
>>> be lazily filled. For some reason, the entry has some bits set and is
>>> causing the trouble. Would Xen mess with the PTEs in that case?
>> As said in my previous reply - both of the bits Andrew has mentioned can
>> only ever be set when the present bit is also set (which doesn't appear to
>> be the case here). The set bits above are actually in the range of bits
>> designated to the address, which Xen wouldn't ever play with.
>
>
> The bug description starts with: "On a Xen VM running as pvh"
>
> So is this a PV or a PVH guest?

The stack backtrace suggests PV.

Juergen
Re: [Xen-devel] [Bug 198497] handle_mm_fault / xen_pmd_val / radix_tree_lookup_slot Null pointer
On 04/20/2018 12:02 PM, Jan Beulich wrote:
>>>> On 20.04.18 at 17:52, wrote:
>> On Fri, Apr 20, 2018 at 11:42 AM, Jan Beulich wrote:
>>>>>> On 20.04.18 at 17:25, wrote:
>>>> On 20/04/18 16:20, Jason Andryuk wrote:
>>>>> Adding xen-devel and the Linux Xen maintainers.
>>>>>
>>>>> Summary: Some Xen users (and maybe others) are hitting a BUG in
>>>>> __radix_tree_lookup() under do_swap_page() - example backtrace is
>>>>> provided at the end. Matthew Wilcox provided a band-aid patch that
>>>>> prints errors like the following instead of triggering the bug.
>>>>>
>>>>> Skylake 32bit PAE Dom0:
>>>>> Bad swp_entry: 8000
>>>>> mm/swap_state.c:683: bad pte d3a39f1c(8004)
>>>>>
>>>>> Ivy Bridge 32bit PAE Dom0:
>>>>> Bad swp_entry: 4000
>>>>> mm/swap_state.c:683: bad pte d3a05f1c(8002)
>>>>>
>>>>> Other 32bit DomU:
>>>>> Bad swp_entry: 400
>>>>> mm/swap_state.c:683: bad pte e2187f30(8002)
>>>>>
>>>>> Other 32bit:
>>>>> Bad swp_entry: 200
>>>>> mm/swap_state.c:683: bad pte ef3a3f38(8001)
>>>>>
>>>>> The Linux bugzilla has more info
>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=198497
>>>>>
>>>>> This may not be exclusive to Xen Linux, but most of the reports are on
>>>>> Xen. Matthew wonders if Xen might be stepping on the upper bits of a
>>>>> pte.
>>>> Yes - Xen does use the upper bits of a PTE, but only 1 in release
>>>> builds, and a second in debug builds. I don't understand where you're
>>>> getting the 3rd bit in there.
>>> The former supposedly is _PAGE_GUEST_KERNEL, which we use for 64-bit
>>> guests only. Above talk is of 32-bit guests only.
>>>
>>> In addition both this and _PAGE_GNTTAB are used on present PTEs only,
>>> while above talk is about swap entries.
>> This hits a BUG going through do_swap_page, but it seems like users
>> don't think they are actually using swap at the time. One reporter
>> didn't have any swap configured. Some of this information was further
>> down in my original message.
>>
>> I'm wondering if somehow we have a PTE that should be empty and should
>> be lazily filled. For some reason, the entry has some bits set and is
>> causing the trouble. Would Xen mess with the PTEs in that case?
> As said in my previous reply - both of the bits Andrew has mentioned can
> only ever be set when the present bit is also set (which doesn't appear to
> be the case here). The set bits above are actually in the range of bits
> designated to the address, which Xen wouldn't ever play with.

The bug description starts with:

"On a Xen VM running as pvh"

So is this a PV or a PVH guest?

-boris
Re: [Xen-devel] [Bug 198497] handle_mm_fault / xen_pmd_val / radix_tree_lookup_slot Null pointer
>>> On 20.04.18 at 17:52, wrote:
> On Fri, Apr 20, 2018 at 11:42 AM, Jan Beulich wrote:
>>>>> On 20.04.18 at 17:25, wrote:
>>> On 20/04/18 16:20, Jason Andryuk wrote:
>>>> Adding xen-devel and the Linux Xen maintainers.
>>>>
>>>> Summary: Some Xen users (and maybe others) are hitting a BUG in
>>>> __radix_tree_lookup() under do_swap_page() - example backtrace is
>>>> provided at the end. Matthew Wilcox provided a band-aid patch that
>>>> prints errors like the following instead of triggering the bug.
>>>>
>>>> Skylake 32bit PAE Dom0:
>>>> Bad swp_entry: 8000
>>>> mm/swap_state.c:683: bad pte d3a39f1c(8004)
>>>>
>>>> Ivy Bridge 32bit PAE Dom0:
>>>> Bad swp_entry: 4000
>>>> mm/swap_state.c:683: bad pte d3a05f1c(8002)
>>>>
>>>> Other 32bit DomU:
>>>> Bad swp_entry: 400
>>>> mm/swap_state.c:683: bad pte e2187f30(8002)
>>>>
>>>> Other 32bit:
>>>> Bad swp_entry: 200
>>>> mm/swap_state.c:683: bad pte ef3a3f38(8001)
>>>>
>>>> The Linux bugzilla has more info
>>>> https://bugzilla.kernel.org/show_bug.cgi?id=198497
>>>>
>>>> This may not be exclusive to Xen Linux, but most of the reports are on
>>>> Xen. Matthew wonders if Xen might be stepping on the upper bits of a
>>>> pte.
>>>
>>> Yes - Xen does use the upper bits of a PTE, but only 1 in release
>>> builds, and a second in debug builds. I don't understand where you're
>>> getting the 3rd bit in there.
>>
>> The former supposedly is _PAGE_GUEST_KERNEL, which we use for 64-bit
>> guests only. Above talk is of 32-bit guests only.
>>
>> In addition both this and _PAGE_GNTTAB are used on present PTEs only,
>> while above talk is about swap entries.
>
> This hits a BUG going through do_swap_page, but it seems like users
> don't think they are actually using swap at the time. One reporter
> didn't have any swap configured. Some of this information was further
> down in my original message.
>
> I'm wondering if somehow we have a PTE that should be empty and should
> be lazily filled. For some reason, the entry has some bits set and is
> causing the trouble. Would Xen mess with the PTEs in that case?

As said in my previous reply - both of the bits Andrew has mentioned can
only ever be set when the present bit is also set (which doesn't appear to
be the case here). The set bits above are actually in the range of bits
designated to the address, which Xen wouldn't ever play with.

Jan
Re: [Xen-devel] [Bug 198497] handle_mm_fault / xen_pmd_val / radix_tree_lookup_slot Null pointer
On 20/04/18 16:52, Jason Andryuk wrote:
> On Fri, Apr 20, 2018 at 11:42 AM, Jan Beulich wrote:
>>>>> On 20.04.18 at 17:25, wrote:
>>> On 20/04/18 16:20, Jason Andryuk wrote:
>>>> Adding xen-devel and the Linux Xen maintainers.
>>>>
>>>> Summary: Some Xen users (and maybe others) are hitting a BUG in
>>>> __radix_tree_lookup() under do_swap_page() - example backtrace is
>>>> provided at the end. Matthew Wilcox provided a band-aid patch that
>>>> prints errors like the following instead of triggering the bug.
>>>>
>>>> Skylake 32bit PAE Dom0:
>>>> Bad swp_entry: 8000
>>>> mm/swap_state.c:683: bad pte d3a39f1c(8004)
>>>>
>>>> Ivy Bridge 32bit PAE Dom0:
>>>> Bad swp_entry: 4000
>>>> mm/swap_state.c:683: bad pte d3a05f1c(8002)
>>>>
>>>> Other 32bit DomU:
>>>> Bad swp_entry: 400
>>>> mm/swap_state.c:683: bad pte e2187f30(8002)
>>>>
>>>> Other 32bit:
>>>> Bad swp_entry: 200
>>>> mm/swap_state.c:683: bad pte ef3a3f38(8001)
>>>>
>>>> The Linux bugzilla has more info
>>>> https://bugzilla.kernel.org/show_bug.cgi?id=198497
>>>>
>>>> This may not be exclusive to Xen Linux, but most of the reports are on
>>>> Xen. Matthew wonders if Xen might be stepping on the upper bits of a
>>>> pte.
>>> Yes - Xen does use the upper bits of a PTE, but only 1 in release
>>> builds, and a second in debug builds. I don't understand where you're
>>> getting the 3rd bit in there.
>> The former supposedly is _PAGE_GUEST_KERNEL, which we use for 64-bit
>> guests only. Above talk is of 32-bit guests only.
>>
>> In addition both this and _PAGE_GNTTAB are used on present PTEs only,
>> while above talk is about swap entries.
> This hits a BUG going through do_swap_page, but it seems like users
> don't think they are actually using swap at the time. One reporter
> didn't have any swap configured. Some of this information was further
> down in my original message.
>
> I'm wondering if somehow we have a PTE that should be empty and should
> be lazily filled. For some reason, the entry has some bits set and is
> causing the trouble. Would Xen mess with the PTEs in that case?

Any PTE with the present bit clear will be accepted and used unmodified.

That said, I believe there is some batching of updates for efficiency
reasons in the PVops layer of the kernel, which might end up causing a
disconnect between what the swap system thinks, and what the actual PTEs
show when read.

~Andrew
Re: [Xen-devel] [Bug 198497] handle_mm_fault / xen_pmd_val / radix_tree_lookup_slot Null pointer
On Fri, Apr 20, 2018 at 11:42 AM, Jan Beulich wrote:
>>>> On 20.04.18 at 17:25, wrote:
>> On 20/04/18 16:20, Jason Andryuk wrote:
>>> Adding xen-devel and the Linux Xen maintainers.
>>>
>>> Summary: Some Xen users (and maybe others) are hitting a BUG in
>>> __radix_tree_lookup() under do_swap_page() - example backtrace is
>>> provided at the end. Matthew Wilcox provided a band-aid patch that
>>> prints errors like the following instead of triggering the bug.
>>>
>>> Skylake 32bit PAE Dom0:
>>> Bad swp_entry: 8000
>>> mm/swap_state.c:683: bad pte d3a39f1c(8004)
>>>
>>> Ivy Bridge 32bit PAE Dom0:
>>> Bad swp_entry: 4000
>>> mm/swap_state.c:683: bad pte d3a05f1c(8002)
>>>
>>> Other 32bit DomU:
>>> Bad swp_entry: 400
>>> mm/swap_state.c:683: bad pte e2187f30(8002)
>>>
>>> Other 32bit:
>>> Bad swp_entry: 200
>>> mm/swap_state.c:683: bad pte ef3a3f38(8001)
>>>
>>> The Linux bugzilla has more info
>>> https://bugzilla.kernel.org/show_bug.cgi?id=198497
>>>
>>> This may not be exclusive to Xen Linux, but most of the reports are on
>>> Xen. Matthew wonders if Xen might be stepping on the upper bits of a
>>> pte.
>>
>> Yes - Xen does use the upper bits of a PTE, but only 1 in release
>> builds, and a second in debug builds. I don't understand where you're
>> getting the 3rd bit in there.
>
> The former supposedly is _PAGE_GUEST_KERNEL, which we use for 64-bit
> guests only. Above talk is of 32-bit guests only.
>
> In addition both this and _PAGE_GNTTAB are used on present PTEs only,
> while above talk is about swap entries.

This hits a BUG going through do_swap_page, but it seems like users
don't think they are actually using swap at the time. One reporter
didn't have any swap configured. Some of this information was further
down in my original message.

I'm wondering if somehow we have a PTE that should be empty and should
be lazily filled. For some reason, the entry has some bits set and is
causing the trouble. Would Xen mess with the PTEs in that case?

Thanks,
Jason
Re: [Xen-devel] [Bug 198497] handle_mm_fault / xen_pmd_val / radix_tree_lookup_slot Null pointer
>>> On 20.04.18 at 17:25, wrote:
> On 20/04/18 16:20, Jason Andryuk wrote:
>> Adding xen-devel and the Linux Xen maintainers.
>>
>> Summary: Some Xen users (and maybe others) are hitting a BUG in
>> __radix_tree_lookup() under do_swap_page() - example backtrace is
>> provided at the end. Matthew Wilcox provided a band-aid patch that
>> prints errors like the following instead of triggering the bug.
>>
>> Skylake 32bit PAE Dom0:
>> Bad swp_entry: 8000
>> mm/swap_state.c:683: bad pte d3a39f1c(8004)
>>
>> Ivy Bridge 32bit PAE Dom0:
>> Bad swp_entry: 4000
>> mm/swap_state.c:683: bad pte d3a05f1c(8002)
>>
>> Other 32bit DomU:
>> Bad swp_entry: 400
>> mm/swap_state.c:683: bad pte e2187f30(8002)
>>
>> Other 32bit:
>> Bad swp_entry: 200
>> mm/swap_state.c:683: bad pte ef3a3f38(8001)
>>
>> The Linux bugzilla has more info
>> https://bugzilla.kernel.org/show_bug.cgi?id=198497
>>
>> This may not be exclusive to Xen Linux, but most of the reports are on
>> Xen. Matthew wonders if Xen might be stepping on the upper bits of a
>> pte.
>
> Yes - Xen does use the upper bits of a PTE, but only 1 in release
> builds, and a second in debug builds. I don't understand where you're
> getting the 3rd bit in there.

The former supposedly is _PAGE_GUEST_KERNEL, which we use for 64-bit
guests only. Above talk is of 32-bit guests only.

In addition both this and _PAGE_GNTTAB are used on present PTEs only,
while above talk is about swap entries.

Jan
Re: [Xen-devel] [Bug 198497] handle_mm_fault / xen_pmd_val / radix_tree_lookup_slot Null pointer
On 20/04/18 16:25, Andrew Cooper wrote:
> On 20/04/18 16:20, Jason Andryuk wrote:
>> Adding xen-devel and the Linux Xen maintainers.
>>
>> Summary: Some Xen users (and maybe others) are hitting a BUG in
>> __radix_tree_lookup() under do_swap_page() - example backtrace is
>> provided at the end. Matthew Wilcox provided a band-aid patch that
>> prints errors like the following instead of triggering the bug.
>>
>> Skylake 32bit PAE Dom0:
>> Bad swp_entry: 8000
>> mm/swap_state.c:683: bad pte d3a39f1c(8004)
>>
>> Ivy Bridge 32bit PAE Dom0:
>> Bad swp_entry: 4000
>> mm/swap_state.c:683: bad pte d3a05f1c(8002)
>>
>> Other 32bit DomU:
>> Bad swp_entry: 400
>> mm/swap_state.c:683: bad pte e2187f30(8002)
>>
>> Other 32bit:
>> Bad swp_entry: 200
>> mm/swap_state.c:683: bad pte ef3a3f38(8001)
>>
>> The Linux bugzilla has more info
>> https://bugzilla.kernel.org/show_bug.cgi?id=198497
>>
>> This may not be exclusive to Xen Linux, but most of the reports are on
>> Xen. Matthew wonders if Xen might be stepping on the upper bits of a
>> pte.
> Yes - Xen does use the upper bits of a PTE, but only 1 in release
> builds, and a second in debug builds. I don't understand where you're
> getting the 3rd bit in there.
>
> The use of these bits is dubious, and not adequately described in the
> ABI, and attempts to improve the state of play have come to nothing in
> the past.

Sorry - hit send too early.

To be rather more helpful:

For 64bit guests only, we use one bit to distinguish between guest kernel
and guest user pages. This is because both guest user and kernel run in
ring3, and have to have _PAGE_USER set on them. We use bit 52 to tag
guest kernel mappings, which is seeded from the guest kernel's choice of
_PAGE_USER.

In debug builds of the hypervisor only, we use bit 62 to tag grant
mappings. This is to help spot API errors in the guest, and results in an
instant crash if we spot misuse.

~Andrew
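To make the bit positions concrete, here is a small sketch that lists which bits are set in the pattern reported elsewhere in this thread and compares them with the two bits named above (52 and 62). It assumes the reported value is 0x8000'0004'0000'0000, i.e. that the digit groups missing from the archived text are zeros. The helper name and constant names are made up for this example.

```python
def set_bits(value):
    """Return the positions of all set bits in value, lowest first."""
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


# Bit numbers taken from the message above; the names are hypothetical.
XEN_GUEST_KERNEL_BIT = 52   # 64-bit guests only
XEN_GRANT_DEBUG_BIT = 62    # debug builds of the hypervisor only

# Assumed reconstruction of the reported bad PTE pattern.
reported = 0x8000000400000000
bits = set_bits(reported)

print(bits)                          # [34, 63]
print(XEN_GUEST_KERNEL_BIT in bits)  # False
print(XEN_GRANT_DEBUG_BIT in bits)   # False
```

Under that assumption, neither of the bits Xen uses for its own purposes appears in the reported pattern.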
Re: [Xen-devel] [Bug 198497] handle_mm_fault / xen_pmd_val / radix_tree_lookup_slot Null pointer
Adding xen-devel and the Linux Xen maintainers.

Summary: Some Xen users (and maybe others) are hitting a BUG in
__radix_tree_lookup() under do_swap_page() - example backtrace is
provided at the end. Matthew Wilcox provided a band-aid patch that
prints errors like the following instead of triggering the bug.

Skylake 32bit PAE Dom0:
Bad swp_entry: 8000
mm/swap_state.c:683: bad pte d3a39f1c(8004)

Ivy Bridge 32bit PAE Dom0:
Bad swp_entry: 4000
mm/swap_state.c:683: bad pte d3a05f1c(8002)

Other 32bit DomU:
Bad swp_entry: 400
mm/swap_state.c:683: bad pte e2187f30(8002)

Other 32bit:
Bad swp_entry: 200
mm/swap_state.c:683: bad pte ef3a3f38(8001)

The Linux bugzilla has more info
https://bugzilla.kernel.org/show_bug.cgi?id=198497

This may not be exclusive to Xen Linux, but most of the reports are on
Xen. Matthew wonders if Xen might be stepping on the upper bits of a
pte.

On Fri, Apr 20, 2018 at 9:39 AM, Matthew Wilcox wrote:
> On Fri, Apr 20, 2018 at 09:10:11AM -0400, Jason Andryuk wrote:
>> > Given that this is happening on Xen, I wonder if Xen is using some of the
>> > bits in the page table for its own purposes.
>>
>> The backtraces include do_swap_page(). While I have a swap partition
>> configured, I don't think it's being used. Are we somehow
>> misidentifying the page as a swap page? I'm not familiar with the
>> code, but is there an easy way to query global swap usage? That way
>> we can see if the check for a swap page is bogus.
>>
>> My system works with the band-aid patch. When that patch sets page =
>> NULL, does that mean userspace is just going to get a zero-ed page?
>> Userspace still works AFAICT, which makes me think it is a
>> mis-identified page to start with.
>
> Here's how this code works.

Thanks for the description.

> When we swap out an anonymous page (a page which is not backed by a
> file; could be from a MAP_PRIVATE mapping, could be brk()), we write it
> to the swap cache. In order to be able to find it again, we store a
> cookie (called a swp_entry_t) in the process' page table (marked with
> the 'present' bit clear, so the CPU will fault on it). When we get a
> fault, we look up the cookie in a radix tree and bring that page back
> in from swap.
>
> If there's no page found in the radix tree, we put a freshly zeroed
> page into the process's address space. That's because we won't find
> a page in the swap cache's radix tree for the first time we fault.
> It's not an indication of a bug if there's no page to be found.

Is "no page found" the case for a lazy, un-allocated MAP_ANONYMOUS page?

> What we're seeing for this bug is page table entries of the format
> 0x8000'0004'0000'0000. That would be a zeroed entry, except for the
> fact that something's stepped on the upper bits.

Does a totally zero-ed entry correspond to an un-allocated MAP_ANONYMOUS page?

> What is worrying is that potentially Xen might be stepping on the upper
> bits of either a present entry (leading to the process loading a page
> that belongs to someone else) or an entry which has been swapped out,
> leading to the process getting a zeroed page when it should be getting
> its page back from swap.

There was at least one report of non-Xen 32bit being affected. There
was no backtrace, so it could be something else. One report doesn't
have any swap configured.

> Defending against this kind of corruption would take adding a parity
> bit to the page tables. That's not a project I have time for right now.

Understood. Thanks for the response.

Regards,
Jason

[ 2234.939079] BUG: unable to handle kernel NULL pointer dereference at 0008
[ 2234.942154] IP: __radix_tree_lookup+0xe/0xa0
[ 2234.945176] *pdpt = 08cd5027 *pde =
[ 2234.948382] Oops: [#1] SMP
[ 2234.951410] Modules linked in: hp_wmi sparse_keymap rfkill wmi_bmof pcspkr i915 wmi hp_accel lis3lv02d input_polldev drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm hp_wireless i2c_algo_bit hid_multitouch sha256_generic xen_netfront v4v(O) psmouse ecb xts hid_generic xhci_pci xhci_hcd ohci_pci ohci_hcd uhci_hcd ehci_pci ehci_hcd usbhid hid tpm_tis tpm_tis_core tpm
[ 2234.960816] CPU: 1 PID: 2338 Comm: xenvm Tainted: G O4.14.18 #1
[ 2234.963991] Hardware name: Hewlett-Packard HP EliteBook Folio 9470m/18DF, BIOS 68IBD Ver. F.40 02/01/2013
[ 2234.967186] task: d4370980 task.stack: cf8e8000
[ 2234.970351] EIP: __radix_tree_lookup+0xe/0xa0
[ 2234.973520] EFLAGS: 00010286 CPU: 1
[ 2234.976699] EAX: 0004 EBX: b590 ECX: EDX:
[ 2234.979887] ESI: EDI: 0004 EBP: cf8e9dd0 ESP: cf8e9dc0
[ 2234.983081] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0069
[ 2234.986233] CR0: 80050033 CR2: 0008 CR3: 08f12000 CR4: 00042660
[ 2234.989340] Call Trace:
[ 2234.992354] radix_tree_lookup_slot+0x1d/0x50
[ 2234.995341] ? xen_irq_disable_direct+0xc/0xc
[ 2234.998288]
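The fault path as described in the quoted explanation above can be condensed into a toy model. This is only a sketch of that description, not the kernel's code: a plain dict stands in for the swap cache's radix tree, and the function name, the sample cookie value and the page size are assumptions chosen for illustration.

```python
# Toy model of the swap fault path as described in the message above.
# The dict stands in for the swap cache's radix tree; names are hypothetical.

PAGE_PRESENT = 0x1
ZERO_PAGE = bytes(4096)


def fault_in(pte, swap_cache):
    """Return the page contents the process sees after faulting on pte."""
    if pte & PAGE_PRESENT:
        raise ValueError("present entries do not fault this way")
    cookie = pte                      # plays the role of the swp_entry_t
    page = swap_cache.get(cookie)     # a radix tree lookup in the real kernel
    if page is None:
        return ZERO_PAGE              # no page found: hand out a zeroed page
    return page


# One swapped-out page, filed under a made-up cookie value.
swap_cache = {0x1200: b"saved".ljust(4096, b"\0")}

print(fault_in(0x1200, swap_cache)[:5])      # b'saved'
print(fault_in(0, swap_cache) == ZERO_PAGE)  # True
```

In this model an unknown cookie simply misses in the dict. The backtrace above shows that in the affected kernels the lookup itself hit a NULL pointer dereference in __radix_tree_lookup() instead.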