Bug#503821: Purpose of features/all/xen/workaround-pte-file.patch?
On Wed, 2008-11-05 at 15:20 +0100, Bastian Blank wrote: > On Wed, Nov 05, 2008 at 03:15:38PM +0100, Bastian Blank wrote: > > On Wed, Nov 05, 2008 at 02:50:32PM +0100, Bastian Blank wrote: > > > On Wed, Nov 05, 2008 at 01:10:30PM +, Ian Campbell wrote: > > > > On Wed, 2008-11-05 at 10:05 +, Ian Campbell wrote: > > > > > > What happens if you use "nopat" to disable the usage of PAT? > > > > > CONFIG_X86_PAT is disabled anyway so it makes no difference. > > > > Although it does turn out that PAT is implicated... > > > Ah. > > Shorter patch. Tries to use the compiler instead of the preprocessor. > > Even shorter. I think the minimum is attached, but this wouldn't give a build failure if a usage snuck in. I'm happy with any of the variants. -- Ian Campbell Eat drink and be merry, for tomorrow they may make it illegal. Index: sid-xen/include/asm-x86/mach-xen/asm/pgtable.h === --- sid-xen.orig/include/asm-x86/mach-xen/asm/pgtable.h 2008-11-05 13:07:33.0 + +++ sid-xen/include/asm-x86/mach-xen/asm/pgtable.h 2008-11-05 14:08:15.0 + @@ -74,5 +74,5 @@ * PAT settings are part of the hypervisor interface, which sets the * MSR to 0x050100070406 (i.e. WB, WT, UC-, UC, WC, WP [, UC, UC]). */ -#define _PAGE_CACHE_MASK (_PAGE_PCD | _PAGE_PWT | _PAGE_PAT) +#define _PAGE_CACHE_MASK (_PAGE_PCD | _PAGE_PWT) #define _PAGE_CACHE_WB (0) Index: sid-xen/arch/x86/Kconfig === --- sid-xen.orig/arch/x86/Kconfig 2008-11-05 13:07:33.0 + +++ sid-xen/arch/x86/Kconfig 2008-11-05 14:08:15.0 + @@ -1141,6 +1141,7 @@ bool prompt "x86 PAT support" depends on MTRR + depends on !XEN help Use PAT attributes to setup page level cache control.
Bug#503821: Purpose of features/all/xen/workaround-pte-file.patch?
On Wed, Nov 05, 2008 at 03:15:38PM +0100, Bastian Blank wrote: > On Wed, Nov 05, 2008 at 02:50:32PM +0100, Bastian Blank wrote: > > On Wed, Nov 05, 2008 at 01:10:30PM +, Ian Campbell wrote: > > > On Wed, 2008-11-05 at 10:05 +, Ian Campbell wrote: > > > > > What happens if you use "nopat" to disable the usage of PAT? > > > > CONFIG_X86_PAT is disabled anyway so it makes no difference. > > > Although it does turn out that PAT is implicated... > > Ah. > Shorter patch. Tries to use the compiler instead of the preprocessor. Even shorter. Bastian -- No more blah, blah, blah! -- Kirk, "Miri", stardate 2713.6 diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 170d743..ae1e15b 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -1141,6 +1141,7 @@ config X86_PAT bool prompt "x86 PAT support" depends on MTRR + depends on !XEN help Use PAT attributes to setup page level cache control. diff --git a/arch/x86/mm/ioremap-xen.c b/arch/x86/mm/ioremap-xen.c index 95c214f..52c527f 100644 --- a/arch/x86/mm/ioremap-xen.c +++ b/arch/x86/mm/ioremap-xen.c @@ -255,9 +255,11 @@ int ioremap_change_attr(unsigned long vaddr, unsigned long size, default: err = _set_memory_uc(vaddr, nrpages); break; +#ifdef CONFIG_X86_PAT case _PAGE_CACHE_WC: err = _set_memory_wc(vaddr, nrpages); break; +#endif case _PAGE_CACHE_WB: err = _set_memory_wb(vaddr, nrpages); break; @@ -340,11 +342,16 @@ static void __iomem *__ioremap_caller(resource_size_t phys_addr, * - request is uc-, return cannot be write-combine * - request is write-combine, return cannot be write-back */ +#ifdef CONFIG_X86_PAT if ((prot_val == _PAGE_CACHE_UC_MINUS && (new_prot_val == _PAGE_CACHE_WB || new_prot_val == _PAGE_CACHE_WC)) || (prot_val == _PAGE_CACHE_WC && -new_prot_val == _PAGE_CACHE_WB)) { +new_prot_val == _PAGE_CACHE_WB)) +#else + if (prot_val == _PAGE_CACHE_UC_MINUS && new_prot_val == _PAGE_CACHE_WB) +#endif + { pr_debug( "ioremap error for 0x%llx-0x%llx, requested 0x%lx, got 0x%lx\n", (unsigned long long)phys_addr, @@ -364,9 +371,11 @@ static void __iomem *__ioremap_caller(resource_size_t phys_addr, case _PAGE_CACHE_UC_MINUS: prot = PAGE_KERNEL_UC_MINUS; break; +#ifdef CONFIG_X86_PAT case _PAGE_CACHE_WC: prot = PAGE_KERNEL_WC; break; +#endif case _PAGE_CACHE_WB: prot = PAGE_KERNEL; break; diff --git a/arch/x86/mm/pat-xen.c b/arch/x86/mm/pat-xen.c index 6fc94ce..52f9776 100644 --- a/arch/x86/mm/pat-xen.c +++ b/arch/x86/mm/pat-xen.c @@ -116,9 +116,11 @@ static char *cattr_name(unsigned long flags) case _PAGE_CACHE_UC:return "uncached"; case _PAGE_CACHE_UC_MINUS: return "uncached-minus"; case _PAGE_CACHE_WB:return "write-back"; +#ifdef CONFIG_X86_PAT case _PAGE_CACHE_WC:return "write-combining"; case _PAGE_CACHE_WP:return "write-protected"; case _PAGE_CACHE_WT:return "write-through"; +#endif default:return "broken"; } } diff --git a/arch/x86/pci/i386.c b/arch/x86/pci/i386.c index 10fb308..5908a06 100644 --- a/arch/x86/pci/i386.c +++ b/arch/x86/pci/i386.c @@ -324,10 +324,16 @@ int pci_mmap_page_range(struct pci_dev *dev, struct vm_area_struct *vma, * - request is uncached, return cannot be write-combine * - request is write-combine, return cannot be write-back */ +#ifdef CONFIG_X86_PAT if ((flags == _PAGE_CACHE_UC_MINUS && (new_flags == _PAGE_CACHE_WB)) || (flags == _PAGE_CACHE_WC && -new_flags == _PAGE_CACHE_WB)) { +new_flags == _PAGE_CACHE_WB)) +#else + if (flags == _PAGE_CACHE_UC_MINUS && + new_flags == _PAGE_CACHE_WB) +#endif + { free_memtype(addr, addr+len); return -EINVAL; } diff --git a/include/asm-x86/mach-xen/asm/pgtable.h b/include/asm-x86/mach-xen/asm/pgtable.h index a9ff073..73ec9f3 100644 --- a/include/asm-x86/mach-xen/asm/pgtable.h +++ b/include/asm-x86/mach-xen/asm/pgtable.h @@ -74,11 +74,17 @@ extern unsigned int __kernel_page_user; * PAT settings are part of the hypervisor interface, which sets the * MSR to 0x050100070406 (i.e. WB, WT, UC-, UC, WC, WP [, UC, UC]). */ -#define _PAGE_CACHE_MASK (_PAG
Bug#503821: Purpose of features/all/xen/workaround-pte-file.patch?
On Wed, 2008-11-05 at 14:50 +0100, Bastian Blank wrote: > On Wed, Nov 05, 2008 at 01:10:30PM +, Ian Campbell wrote: > > On Wed, 2008-11-05 at 10:05 +, Ian Campbell wrote: > > > > What happens if you use "nopat" to disable the usage of PAT? > > > CONFIG_X86_PAT is disabled anyway so it makes no difference. > > Although it does turn out that PAT is implicated... > > Ah. > > > /* Set of bits not changed in pte_modify */ > > -#define _PAGE_CHG_MASK (PTE_MASK | _PAGE_CACHE_MASK | _PAGE_IO | > > \ > > +#define _PAGE_CHG_MASK (PTE_MASK | _PAGE_PCD | _PAGE_PWT | _PAGE_IO | > > \ > > _PAGE_ACCESSED | _PAGE_DIRTY) > > This is the direct expansion, why? Just because the native version uses the expanded version and I was playing with it earlier. I decided to keep it since the difference seemed unnecessary. > > +#else > > + /* This is identical to page table setting without PAT */ > > + if (ret_type) { > > + if (req_type == -1) { > > + *ret_type = _PAGE_CACHE_WB; > > + } else { > > + *ret_type = req_type; > > + } > > + } > > + return 0; > > +#endif > > Code duplication. Better force pat_wc_enabled to 0 and let the compiler > do this instead of forcing the knowledge into the preprocessor. pat_wc_enable is already const int = 0, I just did it via the preprocessor to prove that _PAGE_CACHE_WC was unused at build time, about 90% of the patch is only because of that. I could drop everything apart from the _PAGE_CACHE_MASK and the Kconfig change and the fix would still be valid. I just felt that making the code completely unavailable was safer in case something sneaks in since it will be a build failure. In this case I could have simply ifdef'd the main function body, I've made that change. > I would also make several parts dependant on _PAGE_CACHE_WC instead of > X86_PAT. OK. Fresh patch attached. I'll test the proper Debian config rather than my cut down one now. If I remove workaround-pte-file.patch is the correct way to do that to remove the reference from debian/patches/series/7-extra or to add a "-" type reference to 10-extra? Ian. Index: sid-xen/include/asm-x86/mach-xen/asm/pgtable.h === --- sid-xen.orig/include/asm-x86/mach-xen/asm/pgtable.h 2008-11-05 13:07:33.0 + +++ sid-xen/include/asm-x86/mach-xen/asm/pgtable.h 2008-11-05 14:08:15.0 + @@ -67,18 +67,20 @@ _PAGE_DIRTY | __kernel_page_user) /* Set of bits not changed in pte_modify */ -#define _PAGE_CHG_MASK (PTE_MASK | _PAGE_CACHE_MASK | _PAGE_IO | \ +#define _PAGE_CHG_MASK (PTE_MASK | _PAGE_PCD | _PAGE_PWT | _PAGE_IO | \ _PAGE_ACCESSED | _PAGE_DIRTY) /* * PAT settings are part of the hypervisor interface, which sets the * MSR to 0x050100070406 (i.e. WB, WT, UC-, UC, WC, WP [, UC, UC]). */ -#define _PAGE_CACHE_MASK (_PAGE_PCD | _PAGE_PWT | _PAGE_PAT) +#define _PAGE_CACHE_MASK (_PAGE_PCD | _PAGE_PWT) #define _PAGE_CACHE_WB (0) +#ifdef CONFIG_X86_PAT #define _PAGE_CACHE_WT (_PAGE_PWT) #define _PAGE_CACHE_WC (_PAGE_PAT) #define _PAGE_CACHE_WP (_PAGE_PAT | _PAGE_PWT) +#endif #define _PAGE_CACHE_UC_MINUS (_PAGE_PCD) #define _PAGE_CACHE_UC (_PAGE_PCD | _PAGE_PWT) Index: sid-xen/arch/x86/Kconfig === --- sid-xen.orig/arch/x86/Kconfig 2008-11-05 13:07:33.0 + +++ sid-xen/arch/x86/Kconfig2008-11-05 14:08:15.0 + @@ -1141,6 +1141,7 @@ bool prompt "x86 PAT support" depends on MTRR + depends on !XEN help Use PAT attributes to setup page level cache control. Index: sid-xen/arch/x86/mm/ioremap-xen.c === --- sid-xen.orig/arch/x86/mm/ioremap-xen.c 2008-11-05 13:07:33.0 + +++ sid-xen/arch/x86/mm/ioremap-xen.c 2008-11-05 14:08:15.0 + @@ -255,9 +255,11 @@ default: err = _set_memory_uc(vaddr, nrpages); break; +#ifdef _PAGE_CACHE_WC case _PAGE_CACHE_WC: err = _set_memory_wc(vaddr, nrpages); break; +#endif case _PAGE_CACHE_WB: err = _set_memory_wb(vaddr, nrpages); break; @@ -340,11 +342,17 @@ * - request is uc-, return cannot be write-combine * - request is write-combine, return cannot be write-back */ - if ((prot_val == _PAGE_CACHE_UC_MINUS && + if ( +#ifdef _PAGE_CACHE_WC + (prot_val == _PAGE_CACHE_UC_MINUS && (new_prot_val == _PAGE_CACHE_WB || new_prot_val == _PAGE_CACHE_WC)) || (prot_val == _PAGE_CACHE_WC && -
Bug#503821: Purpose of features/all/xen/workaround-pte-file.patch?
On Wed, Nov 05, 2008 at 02:50:32PM +0100, Bastian Blank wrote: > On Wed, Nov 05, 2008 at 01:10:30PM +, Ian Campbell wrote: > > On Wed, 2008-11-05 at 10:05 +, Ian Campbell wrote: > > > > What happens if you use "nopat" to disable the usage of PAT? > > > CONFIG_X86_PAT is disabled anyway so it makes no difference. > > Although it does turn out that PAT is implicated... > Ah. Shorter patch. Tries to use the compiler instead of the preprocessor. Bastian -- Change is the essential process of all existence. -- Spock, "Let That Be Your Last Battlefield", stardate 5730.2 diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 170d743..ae1e15b 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -1141,6 +1141,7 @@ config X86_PAT bool prompt "x86 PAT support" depends on MTRR + depends on !XEN help Use PAT attributes to setup page level cache control. diff --git a/arch/x86/mm/ioremap-xen.c b/arch/x86/mm/ioremap-xen.c index 95c214f..93fc6f7 100644 --- a/arch/x86/mm/ioremap-xen.c +++ b/arch/x86/mm/ioremap-xen.c @@ -255,9 +255,11 @@ int ioremap_change_attr(unsigned long vaddr, unsigned long size, default: err = _set_memory_uc(vaddr, nrpages); break; +#ifdef CONFIG_X86_PAT case _PAGE_CACHE_WC: err = _set_memory_wc(vaddr, nrpages); break; +#endif case _PAGE_CACHE_WB: err = _set_memory_wb(vaddr, nrpages); break; @@ -340,11 +342,17 @@ static void __iomem *__ioremap_caller(resource_size_t phys_addr, * - request is uc-, return cannot be write-combine * - request is write-combine, return cannot be write-back */ - if ((prot_val == _PAGE_CACHE_UC_MINUS && + if ( +#ifdef CONFIG_X86_PAT + (prot_val == _PAGE_CACHE_UC_MINUS && (new_prot_val == _PAGE_CACHE_WB || new_prot_val == _PAGE_CACHE_WC)) || (prot_val == _PAGE_CACHE_WC && -new_prot_val == _PAGE_CACHE_WB)) { +new_prot_val == _PAGE_CACHE_WB) +#else + (prot_val == _PAGE_CACHE_UC_MINUS && new_prot_val == _PAGE_CACHE_WB) +#endif + ) { pr_debug( "ioremap error for 0x%llx-0x%llx, requested 0x%lx, got 0x%lx\n", (unsigned long long)phys_addr, @@ -364,9 +372,11 @@ static void __iomem *__ioremap_caller(resource_size_t phys_addr, case _PAGE_CACHE_UC_MINUS: prot = PAGE_KERNEL_UC_MINUS; break; +#ifdef CONFIG_X86_PAT case _PAGE_CACHE_WC: prot = PAGE_KERNEL_WC; break; +#endif case _PAGE_CACHE_WB: prot = PAGE_KERNEL; break; diff --git a/arch/x86/mm/pat-xen.c b/arch/x86/mm/pat-xen.c index 6fc94ce..52f9776 100644 --- a/arch/x86/mm/pat-xen.c +++ b/arch/x86/mm/pat-xen.c @@ -116,9 +116,11 @@ static char *cattr_name(unsigned long flags) case _PAGE_CACHE_UC:return "uncached"; case _PAGE_CACHE_UC_MINUS: return "uncached-minus"; case _PAGE_CACHE_WB:return "write-back"; +#ifdef CONFIG_X86_PAT case _PAGE_CACHE_WC:return "write-combining"; case _PAGE_CACHE_WP:return "write-protected"; case _PAGE_CACHE_WT:return "write-through"; +#endif default:return "broken"; } } diff --git a/arch/x86/pci/i386.c b/arch/x86/pci/i386.c index 10fb308..5908a06 100644 --- a/arch/x86/pci/i386.c +++ b/arch/x86/pci/i386.c @@ -324,10 +324,16 @@ int pci_mmap_page_range(struct pci_dev *dev, struct vm_area_struct *vma, * - request is uncached, return cannot be write-combine * - request is write-combine, return cannot be write-back */ +#ifdef CONFIG_X86_PAT if ((flags == _PAGE_CACHE_UC_MINUS && (new_flags == _PAGE_CACHE_WB)) || (flags == _PAGE_CACHE_WC && -new_flags == _PAGE_CACHE_WB)) { +new_flags == _PAGE_CACHE_WB)) +#else + if (flags == _PAGE_CACHE_UC_MINUS && + new_flags == _PAGE_CACHE_WB) +#endif + { free_memtype(addr, addr+len); return -EINVAL; } diff --git a/include/asm-x86/mach-xen/asm/pgtable.h b/include/asm-x86/mach-xen/asm/pgtable.h index a9ff073..73ec9f3 100644 --- a/include/asm-x86/mach-xen/asm/pgtable.h +++ b/include/asm-x86/mach-xen/asm/pgtable.h @@ -74,11 +74,17 @@ extern unsigned int __kernel_page_user; * PAT settings are part of the hypervisor interface, which sets the * MSR to 0x050100070406 (i.e. WB, WT, UC-, UC, WC, WP [, UC, UC])
Bug#503821: Purpose of features/all/xen/workaround-pte-file.patch?
On Wed, Nov 05, 2008 at 01:10:30PM +, Ian Campbell wrote: > On Wed, 2008-11-05 at 10:05 +, Ian Campbell wrote: > > > What happens if you use "nopat" to disable the usage of PAT? > > CONFIG_X86_PAT is disabled anyway so it makes no difference. > Although it does turn out that PAT is implicated... Ah. > /* Set of bits not changed in pte_modify */ > -#define _PAGE_CHG_MASK (PTE_MASK | _PAGE_CACHE_MASK | _PAGE_IO | > \ > +#define _PAGE_CHG_MASK (PTE_MASK | _PAGE_PCD | _PAGE_PWT | _PAGE_IO | > \ >_PAGE_ACCESSED | _PAGE_DIRTY) This is the direct expansion, why? > +#else > + /* This is identical to page table setting without PAT */ > + if (ret_type) { > + if (req_type == -1) { > + *ret_type = _PAGE_CACHE_WB; > + } else { > + *ret_type = req_type; > + } > + } > + return 0; > +#endif Code duplication. Better force pat_wc_enabled to 0 and let the compiler do this instead of forcing the knowledge into the preprocessor. I would also make several parts dependant on _PAGE_CACHE_WC instead of X86_PAT. Bastian -- Killing is wrong. -- Losira, "That Which Survives", stardate unknown -- To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]
Bug#503821: Purpose of features/all/xen/workaround-pte-file.patch?
On Wed, 2008-11-05 at 10:05 +, Ian Campbell wrote: > > > What happens if you use "nopat" to disable the usage of PAT? > > CONFIG_X86_PAT is disabled anyway so it makes no difference. Although it does turn out that PAT is implicated... The key turned out to be the difference in _PAGE_CACHE_MASK between the asm and mach-xen/asm versions of pgtable.h. -#define _PAGE_CACHE_MASK (_PAGE_PCD | _PAGE_PWT) [...] +#define _PAGE_CACHE_MASK (_PAGE_PCD | _PAGE_PWT | _PAGE_PAT) Native sets up PAT such that only the PCD and PWD bits are used for the cache modes the kernel is interested in (WC, WC, UC, UC-) and the PAT bit is unused. On Xen PAT is controlled by the hypervisor and has a static setting which uses the PAT bit for WC. PAT PCD PWT NATIVE XEN BIOSINIT 0 0 0 WB WB WB WB 0 0 1 WC WT WT WT 0 1 0 UC- UC- UC- UC- 0 1 1 UC UC - UC 1 0 0 - WC WB WB 1 0 1 - WP WT WT 1 1 0 - - UC- UC- 1 1 1 - - - UC- (INIT is the processors initial state and BIOS is apparently commonly set by the BIOS) The kernel uses WB and UC/UC- and conditionally uses WC IFF PAT is compiled in (CONFIG_X86_PAT) and enabled (pat_wc_enabled). Now it seems that on native the masking and checking etc of PROTNONE pages works fine because _PAGE_CACHE_MASK does not include _PAGE_PAT (which == _PAGE_PSE used by PROTNONE). Xen broke this by adding that bit to the mask which presumably leads to confusion on some paths and then to the observed mprotect crash... Since we disable CONFIG_X86_PAT it is much simpler to revert the change to _PAGE_CACHE_MASK than to figure out where the confusion occurs. The attached patch is bigger than this trivial change since I #undefined _PAGE_CACHE_{WT,WC,WP} to be sure they were unused and had to #ifdef their use. WT and WP are completely unused anyway and WC is always gated on pat_wc_enabled and hence would have been compiled out. With this both the mprotect and the swap death test cases work (or correctly OOM in the swap case). Obviously workaround-pte-file.patch should be dropped too. I never got to the bottom of why the workaround caused the swap issue. I'll hand wave it as "incorrect masking" somewhere ;-) Shall I commit? Ian. Index: sid-xen/include/asm-x86/mach-xen/asm/pgtable.h === --- sid-xen.orig/include/asm-x86/mach-xen/asm/pgtable.h 2008-11-05 09:47:42.0 + +++ sid-xen/include/asm-x86/mach-xen/asm/pgtable.h 2008-11-05 12:32:00.0 + @@ -67,18 +67,20 @@ _PAGE_DIRTY | __kernel_page_user) /* Set of bits not changed in pte_modify */ -#define _PAGE_CHG_MASK (PTE_MASK | _PAGE_CACHE_MASK | _PAGE_IO | \ +#define _PAGE_CHG_MASK (PTE_MASK | _PAGE_PCD | _PAGE_PWT | _PAGE_IO | \ _PAGE_ACCESSED | _PAGE_DIRTY) /* * PAT settings are part of the hypervisor interface, which sets the * MSR to 0x050100070406 (i.e. WB, WT, UC-, UC, WC, WP [, UC, UC]). */ -#define _PAGE_CACHE_MASK (_PAGE_PCD | _PAGE_PWT | _PAGE_PAT) +#define _PAGE_CACHE_MASK (_PAGE_PCD | _PAGE_PWT) #define _PAGE_CACHE_WB (0) +#ifdef CONFIG_X86_PAT #define _PAGE_CACHE_WT (_PAGE_PWT) #define _PAGE_CACHE_WC (_PAGE_PAT) #define _PAGE_CACHE_WP (_PAGE_PAT | _PAGE_PWT) +#endif #define _PAGE_CACHE_UC_MINUS (_PAGE_PCD) #define _PAGE_CACHE_UC (_PAGE_PCD | _PAGE_PWT) Index: sid-xen/arch/x86/Kconfig === --- sid-xen.orig/arch/x86/Kconfig 2008-11-05 12:26:55.0 + +++ sid-xen/arch/x86/Kconfig2008-11-05 12:27:02.0 + @@ -1141,6 +1141,7 @@ bool prompt "x86 PAT support" depends on MTRR + depends on !XEN help Use PAT attributes to setup page level cache control. Index: sid-xen/arch/x86/mm/ioremap-xen.c === --- sid-xen.orig/arch/x86/mm/ioremap-xen.c 2008-11-05 12:37:53.0 + +++ sid-xen/arch/x86/mm/ioremap-xen.c 2008-11-05 12:43:06.0 + @@ -255,9 +255,11 @@ default: err = _set_memory_uc(vaddr, nrpages); break; +#ifdef CONFIG_X86_PAT case _PAGE_CACHE_WC: err = _set_memory_wc(vaddr, nrpages); break; +#endif case _PAGE_CACHE_WB: err = _set_memory_wb(vaddr, nrpages); break; @@ -340,11 +342,17 @@ * - request is uc-, return cannot be write-combine * - request is write-combine, return cannot be write-back */ - if ((prot_val ==
Bug#503821: Purpose of features/all/xen/workaround-pte-file.patch?
On Tue, Nov 04, 2008 at 09:49:19AM +0100, Bastian Blank wrote: > On Tue, Nov 04, 2008 at 08:30:16AM +, Ian Campbell wrote: > > However I assume the workaround is there for a specific purpose, what > > was it? > A crash in mprotect. I took another look into the original crash. | BUG: unable to handle kernel paging request at cefb81c0 | IP: [] handle_mm_fault+0x4d6/0xe78 | *pdpt = a8072027 *pde = b62fd067 *pte = 8000a8f19061 | Oops: 0003 [#1] SMP | Modules linked in: ipv6 ext3 jbd mbcache thermal_sys | | Pid: 1093, comm: test Not tainted (2.6.26-1-xen-686 #1) | EIP: 0061:[] EFLAGS: 00010282 CPU: 0 | EIP is at handle_mm_fault+0x4d6/0xe78 | EAX: aa1210e5 EBX: b7e38000 ECX: 8000 EDX: 8000 | ESI: EDI: cefb81c0 EBP: c13c070c ESP: cfccdea8 | DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0069 | Process test (pid: 1093, ti=cfccc000 task=cfcb8da0 task.ti=cfccc000) [...] | Call Trace: | [] xen_tlb_flush_mask+0x28/0x39 | [] mprotect_fixup+0x6d3/0x735 | [] do_page_fault+0x684/0xbd6 | [] sys_mprotect+0x17a/0x1df | [] sys_mprotect+0x1cc/0x1df | [] do_page_fault+0x0/0xbd6 | [] error_code+0x35/0x3c | === [...] | EIP: [] handle_mm_fault+0x4d6/0xe78 SS:ESP 0069:cfccdea8 | ---[ end trace a379c90432c095a5 ]--- Why would Xen inject a page fault into a tlb flush? The tlb flushs in the paravirt implementation looks completely different. Bastian -- Wait! You have not been prepared! -- Mr. Atoz, "Tomorrow is Yesterday", stardate 3113.2 -- To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]
Bug#503821: Purpose of features/all/xen/workaround-pte-file.patch?
On Wed, 2008-11-05 at 10:59 +0100, Bastian Blank wrote: > On Tue, Nov 04, 2008 at 09:49:19AM +0100, Bastian Blank wrote: > > On Tue, Nov 04, 2008 at 08:30:16AM +, Ian Campbell wrote: > > > However I assume the workaround is there for a specific purpose, what > > > was it? > > A crash in mprotect. > > I took another look into the original crash. > > | BUG: unable to handle kernel paging request at cefb81c0 > | IP: [] handle_mm_fault+0x4d6/0xe78 > | *pdpt = a8072027 *pde = b62fd067 *pte = 8000a8f19061 > | Oops: 0003 [#1] SMP > | Modules linked in: ipv6 ext3 jbd mbcache thermal_sys > | > | Pid: 1093, comm: test Not tainted (2.6.26-1-xen-686 #1) > | EIP: 0061:[] EFLAGS: 00010282 CPU: 0 > | EIP is at handle_mm_fault+0x4d6/0xe78 > | EAX: aa1210e5 EBX: b7e38000 ECX: 8000 EDX: 8000 > | ESI: EDI: cefb81c0 EBP: c13c070c ESP: cfccdea8 > | DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0069 > | Process test (pid: 1093, ti=cfccc000 task=cfcb8da0 task.ti=cfccc000) > [...] > | Call Trace: > | [] xen_tlb_flush_mask+0x28/0x39 > | [] mprotect_fixup+0x6d3/0x735 > | [] do_page_fault+0x684/0xbd6 > | [] sys_mprotect+0x17a/0x1df > | [] sys_mprotect+0x1cc/0x1df > | [] do_page_fault+0x0/0xbd6 > | [] error_code+0x35/0x3c > | === > [...] > | EIP: [] handle_mm_fault+0x4d6/0xe78 SS:ESP 0069:cfccdea8 > | ---[ end trace a379c90432c095a5 ]--- > > Why would Xen inject a page fault into a tlb flush? The tlb flushs in > the paravirt implementation looks completely different. Perhaps some of those entries are just stale stack data. sys_mprotect won't have been called from do_page_fault for example. Ian. > -- Ian Campbell Current Noise: Fudge Tunnel - Sunshine Of Your Love If you go out of your mind, do it quietly, so as not to disturb those around you. -- To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]
Bug#503821: Purpose of features/all/xen/workaround-pte-file.patch?
On Wed, Nov 05, 2008 at 10:13:46AM +, Ian Campbell wrote: > Perhaps some of those entries are just stale stack data. sys_mprotect > won't have been called from do_page_fault for example. Hrm. Yes. mm/fault.c and mm/fault-xen.c have some weird differences. Bastian -- If there are self-made purgatories, then we all have to live in them. -- Spock, "This Side of Paradise", stardate 3417.7 -- To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]
Bug#503821: Purpose of features/all/xen/workaround-pte-file.patch?
On Wed, 2008-11-05 at 10:40 +0100, Bastian Blank wrote: > On Wed, Nov 05, 2008 at 07:06:58AM +, Ian Campbell wrote: > > This patch makes mprotect work by (very skankily) hacking out large page > > support which is unsupported on top of Xen anyway (I think so, currently > > anyway). I think I took out PAT as collaterol damage too. A cleaned up > > version without the pat damage might be an acceptable fix for the > > mprotect issue. > > Xen have to support large pages, they are used over and over. Xen currently only supports 4K pages (for PV guests, HVM obviously supports everything). There were patches to enable larger page types recently on xen-devel but they aren't in any hypervisor we care about for Lenny. > > My suspicion is that one of the -xen.c or mach-xen/asm/ files has gotten > > out of sync with a fix to its native partner since _PAGE_PSE is used for > > PROTNONE on native too so they must get round it somehow. I'll have a > > scrobble through and see if I can see it. > > Would make sense. > > What happens if you use "nopat" to disable the usage of PAT? CONFIG_X86_PAT is disabled anyway so it makes no difference. > The only way to generate huge pages from userspace is to use hugetlbfs, which > is > disabled. Ian. -- Ian Campbell Current Noise: Fudge Tunnel - Hate Song (Version) Money is the root of all wealth. -- To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]
Bug#503821: Purpose of features/all/xen/workaround-pte-file.patch?
On Wed, Nov 05, 2008 at 07:06:58AM +, Ian Campbell wrote: > This patch makes mprotect work by (very skankily) hacking out large page > support which is unsupported on top of Xen anyway (I think so, currently > anyway). I think I took out PAT as collaterol damage too. A cleaned up > version without the pat damage might be an acceptable fix for the > mprotect issue. Xen have to support large pages, they are used over and over. > My suspicion is that one of the -xen.c or mach-xen/asm/ files has gotten > out of sync with a fix to its native partner since _PAGE_PSE is used for > PROTNONE on native too so they must get round it somehow. I'll have a > scrobble through and see if I can see it. Would make sense. What happens if you use "nopat" to disable the usage of PAT? The only way to generate huge pages from userspace is to use hugetlbfs, which is disabled. Bastian -- To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]
Bug#503821: Purpose of features/all/xen/workaround-pte-file.patch?
* Bastian Blank: > Unchecked patch attached. It disallows changes from and to PROT_NONE. Huh? Doesn't this break the user-space ABI? -- To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]
Bug#503821: Purpose of features/all/xen/workaround-pte-file.patch?
On Tue, 2008-11-04 at 19:12 +, Ian Campbell wrote: > On Tue, 2008-11-04 at 17:43 +0100, Bastian Blank wrote: > > On Tue, Nov 04, 2008 at 02:26:33PM +, Ian Campbell wrote: > > > On Tue, 2008-11-04 at 14:02 +0100, Bastian Blank wrote: > > > > Maybe its the best to remove the workaround and instead cripple mprotect > > > > to not allow PROT_NONE for now. And then hope that this can't be > > > > triggered by mmap with PROT_NONE. > > > I was thinking of going down the path of removing the workaround then > > > fixing mprotect, so your suggestion would be a consistant first step I > > > think. This patch makes mprotect work by (very skankily) hacking out large page support which is unsupported on top of Xen anyway (I think so, currently anyway). I think I took out PAT as collaterol damage too. A cleaned up version without the pat damage might be an acceptable fix for the mprotect issue. My suspicion is that one of the -xen.c or mach-xen/asm/ files has gotten out of sync with a fix to its native partner since _PAGE_PSE is used for PROTNONE on native too so they must get round it somehow. I'll have a scrobble through and see if I can see it. Ian. -- Ian Campbell Once I finally figured out all of life's answers, they changed the questions. Index: sid-xen/mm/mprotect.c === --- sid-xen.orig/mm/mprotect.c 2008-11-05 06:41:55.0 + +++ sid-xen/mm/mprotect.c 2008-11-05 06:51:56.0 + @@ -39,6 +39,7 @@ { pte_t *pte, oldpte; spinlock_t *ptl; + int debug = !strcmp(current->comm, "mprot"); pte = pte_offset_map_lock(mm, pmd, addr, &ptl); arch_enter_lazy_mmu_mode(); @@ -60,6 +61,9 @@ if (dirty_accountable && pte_dirty(ptent)) ptent = pte_mkwrite(ptent); set_pte_at(mm, addr, pte, ptent); + if (debug) +printk(KERN_CRIT "change present pte @ %p %#lx -> %#lx\n", + pte, oldpte.pte, ptent.pte); #ifdef CONFIG_MIGRATION } else if (!pte_file(oldpte)) { swp_entry_t entry = pte_to_swp_entry(oldpte); @@ -227,6 +231,7 @@ { unsigned long vm_flags, nstart, end, tmp, reqprot; struct vm_area_struct *vma, *prev; + int debug = !strcmp(current->comm, "mprot"); int error = -EINVAL; const int grows = prot & (PROT_GROWSDOWN|PROT_GROWSUP); prot &= ~(PROT_GROWSDOWN|PROT_GROWSUP); @@ -280,6 +285,8 @@ if (start > vma->vm_start) prev = vma; + + for (nstart = start ; ; ) { unsigned long newflags; @@ -287,6 +294,10 @@ newflags = vm_flags | (vma->vm_flags & ~(VM_READ | VM_WRITE | VM_EXEC)); + if (debug) + printk(KERN_CRIT "mprotect(%s) vma:%p %#lx-%#lx flags:%#lx->%#lx new prot %#lx\n", current->comm, + vma, vma->vm_start, vma->vm_end, vma->vm_flags, newflags, prot); + /* newflags >> 4 shift VM_MAY% in place of VM_% */ if ((newflags & ~(newflags >> 4)) & (VM_READ | VM_WRITE | VM_EXEC)) { error = -EACCES; Index: sid-xen/arch/x86/mm/dump_pagetables.c === --- sid-xen.orig/arch/x86/mm/dump_pagetables.c 2008-11-05 06:41:22.0 + +++ sid-xen/arch/x86/mm/dump_pagetables.c 2008-11-05 06:44:14.0 + @@ -98,15 +98,15 @@ /* Bit 9 has a different meaning on level 3 vs 4 */ if (level <= 3) { - if (pr & _PAGE_PSE) -seq_printf(m, "PSE "); - else -seq_printf(m, ""); +// if (pr & _PAGE_PSE) +//seq_printf(m, "PSE "); +// else +//seq_printf(m, ""); } else { - if (pr & _PAGE_PAT) -seq_printf(m, "pat "); - else -seq_printf(m, ""); +// if (pr & _PAGE_PAT) +//seq_printf(m, "pat "); +// else +//seq_printf(m, ""); } if (pr & _PAGE_GLOBAL) seq_printf(m, "GLB "); Index: sid-xen/arch/x86/mm/ioremap-xen.c === --- sid-xen.orig/arch/x86/mm/ioremap-xen.c 2008-11-05 06:41:22.0 + +++ sid-xen/arch/x86/mm/ioremap-xen.c 2008-11-05 06:44:14.0 + @@ -255,12 +255,12 @@ default: err = _set_memory_uc(vaddr, nrpages); break; - case _PAGE_CACHE_WC: - err = _set_memory_wc(vaddr, nrpages); - break; - case _PAGE_CACHE_WB: - err = _set_memory_wb(vaddr, nrpages); - break; + //case _PAGE_CACHE_WC: + //err = _set_memory_wc(vaddr, nrpages); + //break; + //case _PAGE_CACHE_WB: + //err = _set_memory_wb(vaddr, nrpages); + //break; } return err; @@ -340,7 +340,7 @@ * - request is uc-, return cannot be write-combine * - request is write-combine, return cannot be write-back */ - if ((prot_val == _PAGE_CACHE_UC_MINUS && +/* if ((prot_val == _PAGE_CACHE_UC_MINUS && (new_prot_val == _PAGE_CACHE_WB || new_prot_val == _PAGE_CACHE_WC)) || (prot_val == _PAGE_CACHE_WC && @@ -353,6 +353,7 @@ free_memtype(phys_addr, phys_addr + size); return NULL; } +*/ prot_val = new_prot_val; } @@ -364,12 +365,12 @@ case _PAGE_CACHE_UC_MINUS: prot = PAGE_KERNEL_UC_MINUS; break; - case _PAGE_CACHE_WC: - p
Bug#503821: Purpose of features/all/xen/workaround-pte-file.patch?
On Tue, 2008-11-04 at 17:43 +0100, Bastian Blank wrote: > On Tue, Nov 04, 2008 at 02:26:33PM +, Ian Campbell wrote: > > On Tue, 2008-11-04 at 14:02 +0100, Bastian Blank wrote: > > > Maybe its the best to remove the workaround and instead cripple mprotect > > > to not allow PROT_NONE for now. And then hope that this can't be > > > triggered by mmap with PROT_NONE. > > I was thinking of going down the path of removing the workaround then > > fixing mprotect, so your suggestion would be a consistant first step I > > think. > > Unchecked patch attached. It disallows changes from and to PROT_NONE. It boots but I cannot login, it hangs after MOTD. I added some debug and I see an aweful lot of transitions to prot == 0x0, seemingly for every process... I think this login has hung although I don't get a debug message from it: 43.350538] login S 8037fea0 0 1087 1 [ 43.350538] 880007de7b78 0286 880007de7ab0 [ 43.350538] 880007d1b810 803d4460 880007d1ba48 07802b50 [ 43.350538] 000480fd 80292c42 000480fd [ 43.350538] Call Trace: [ 43.350538] [] __find_get_block_slow+0xec/0xf8 [ 43.350538] [] do_get_write_access+0x39e/0x3e9 [ 43.350538] [] schedule_timeout+0x1e/0xad [ 43.350538] [] __getblk+0x1d/0x230 [ 43.350538] [] prepare_to_wait_exclusive+0x15/0x5e [ 43.350538] [] unix_wait_for_peer+0xb0/0xcc [ 43.350538] [] autoremove_wake_function+0x0/0x2e [ 43.350538] [] memcpy_fromiovec+0x36/0x66 [ 43.350538] [] unix_dgram_sendmsg+0x3b7/0x4e4 [ 43.350538] [] sock_sendmsg+0xcb/0xe3 [ 43.350538] [] autoremove_wake_function+0x0/0x2e [ 43.350538] [] mntput_no_expire+0x20/0x15d [ 43.350538] [] unix_find_other+0x112/0x1e8 [ 43.350538] [] do_path_lookup+0x14c/0x170 [ 43.350538] [] mntput_no_expire+0x20/0x15d [ 43.350538] [] sockfd_lookup_light+0x1a/0x52 [ 43.350538] [] sys_sendto+0xf3/0x127 [ 43.350538] [] unix_dgram_connect+0x14d/0x182 [ 43.350538] [] sys_connect+0x6c/0x9c [ 43.350538] [] system_call+0x68/0x6d [ 43.350538] [] system_call+0x0/0x6d Ian. -- Ian Campbell True to our past we work with an inherited, observed, and accepted vision of personal futility, and of the beauty of the world. -- David Mamet signature.asc Description: This is a digitally signed message part
Bug#503821: Purpose of features/all/xen/workaround-pte-file.patch?
On Tue, Nov 04, 2008 at 02:26:33PM +, Ian Campbell wrote: > On Tue, 2008-11-04 at 14:02 +0100, Bastian Blank wrote: > > Maybe its the best to remove the workaround and instead cripple mprotect > > to not allow PROT_NONE for now. And then hope that this can't be > > triggered by mmap with PROT_NONE. > I was thinking of going down the path of removing the workaround then > fixing mprotect, so your suggestion would be a consistant first step I > think. Unchecked patch attached. It disallows changes from and to PROT_NONE. Bastian -- It is a human characteristic to love little animals, especially if they're attractive in some way. -- McCoy, "The Trouble with Tribbles", stardate 4525.6 diff --git a/mm/mprotect.c b/mm/mprotect.c index e943715..24b4cfd 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -244,6 +244,14 @@ sys_mprotect(unsigned long start, size_t len, unsigned long prot) if (prot & ~(PROT_READ | PROT_WRITE | PROT_EXEC | PROT_SEM)) return -EINVAL; +#ifdef CONFIG_XEN + /* +* XXX: Disallow change to PROT_NONE. +*/ + if (!(prot & (PROT_READ | PROT_WRITE | PROT_EXEC))) + return -EACCES; +#endif + reqprot = prot; /* * Does the application expect PROT_READ to imply PROT_EXEC: @@ -285,6 +293,16 @@ sys_mprotect(unsigned long start, size_t len, unsigned long prot) /* Here we know that vma->vm_start <= nstart < vma->vm_end. */ +#ifdef CONFIG_XEN + /* +* XXX: Disallow change from PROT_NONE. +*/ + if (!(vma->vm_flags & (VM_READ | VM_WRITE | VM_EXEC))) { + error = -EACCES; + goto out; + } +#endif + newflags = vm_flags | (vma->vm_flags & ~(VM_READ | VM_WRITE | VM_EXEC)); /* newflags >> 4 shift VM_MAY% in place of VM_% */ signature.asc Description: Digital signature
Bug#503821: Purpose of features/all/xen/workaround-pte-file.patch?
On Tue, 2008-11-04 at 14:02 +0100, Bastian Blank wrote: > On Tue, Nov 04, 2008 at 09:49:19AM +0100, Bastian Blank wrote: > > On Tue, Nov 04, 2008 at 08:30:16AM +, Ian Campbell wrote: > > > However I assume the workaround is there for a specific purpose, what > > > was it? > > A crash in mprotect. > > Maybe its the best to remove the workaround and instead cripple mprotect > to not allow PROT_NONE for now. And then hope that this can't be > triggered by mmap with PROT_NONE. I was thinking of going down the path of removing the workaround then fixing mprotect, so your suggestion would be a consistant first step I think. Ian. -- Ian Campbell Take what you can use and let the rest go by. -- Ken Kesey signature.asc Description: This is a digitally signed message part
Bug#503821: Purpose of features/all/xen/workaround-pte-file.patch?
On Tue, Nov 04, 2008 at 09:49:19AM +0100, Bastian Blank wrote: > On Tue, Nov 04, 2008 at 08:30:16AM +, Ian Campbell wrote: > > However I assume the workaround is there for a specific purpose, what > > was it? > A crash in mprotect. Maybe its the best to remove the workaround and instead cripple mprotect to not allow PROT_NONE for now. And then hope that this can't be triggered by mmap with PROT_NONE. Bastian -- Those who hate and fight must stop themselves -- otherwise it is not stopped. -- Spock, "Day of the Dove", stardate unknown -- To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]
Bug#503821: Purpose of features/all/xen/workaround-pte-file.patch?
On Tue, 2008-11-04 at 09:49 +0100, Bastian Blank wrote: > On Tue, Nov 04, 2008 at 08:30:16AM +, Ian Campbell wrote: > > However I assume the workaround is there for a specific purpose, what > > was it? > > A crash in mprotect. Thanks. -- Ian Campbell Today is a good day for information-gathering. Read someone else's mail file. -- To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]
Bug#503821: Purpose of features/all/xen/workaround-pte-file.patch?
On Tue, Nov 04, 2008 at 08:30:16AM +, Ian Campbell wrote: > However I assume the workaround is there for a specific purpose, what > was it? A crash in mprotect. | #include | #include | #include | | const int size = 4096; | | int main() | { | char *buf = mmap(0, size * 4, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); | mprotect(buf, size, PROT_NONE); | mprotect(buf, size, PROT_READ); | printf("%d\n", buf[0]); | } Bastian -- Earth -- mother of the most beautiful women in the universe. -- Apollo, "Who Mourns for Adonais?" stardate 3468.1 -- To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]
Bug#503821: Purpose of features/all/xen/workaround-pte-file.patch?
If I revert features/all/xen/workaround-pte-file.patch (i.e. use _PAGE_PSE for _PAGE_PROTNONE) then the crash when running the test program turns into a simple OOM, which I think is acceptable given the nature of the test program. However I assume the workaround is there for a specific purpose, what was it? -- Ian Campbell You may be sure that when a man begins to call himself a "realist," he is preparing to do something he is secretly ashamed of doing. -- Sydney Harris signature.asc Description: This is a digitally signed message part