Re: [Xen-devel] [PATCH] x86/xpti: Hide almost all of .text and all .data/.rodata/.bss mappings
>>> On 14.02.18 at 17:23, wrote:
> The current XPTI implementation isolates the directmap (and therefore a lot
> of guest data), but a large quantity of CPU0's state (including its stack)
> remains visible.
>
> Furthermore, an attacker able to read .text is in a vastly superior position
> to normal when it comes to fingerprinting Xen for known vulnerabilities, or
> scanning for ROP/Spectre gadgets.
>
> Collect together the entrypoints in .text.entry (currently 3x4k frames, but
> can almost certainly be slimmed down), and create a common mapping which is
> inserted into each per-cpu shadow.  The stubs are also inserted into this
> mapping by pointing at the in-use L2.  This allows stubs allocated later (SMP
> boot, or CPU hotplug) to work without further changes to the common
> mappings.
>
> Signed-off-by: Andrew Cooper
> ---
> CC: Jan Beulich
> CC: Wei Liu
> CC: Juergen Gross
>
> v2:
>  * Drop "incomplete" warning from the docs.  This is about as complete as it
>    can reasonably get.
>  * Correct BUILD_BUG_ON() calculation, duplicate in the common_pgt code
>  * scope/const improvements
>  * Use .push/.popsection in preference to .previous
>  * Exclude {compat_}create_bounce_frame from .text.entry
>  * Extend the sanity checking of linear in clone_mapping().  There is a
>    latent bug where a bad linear parameter would cause Xen to fall over a
>    NULL pointer.
> ---
>  docs/misc/xen-command-line.markdown |  3 --
>  xen/arch/x86/smpboot.c              | 56 +
>  xen/arch/x86/x86_64/compat/entry.S  |  5
>  xen/arch/x86/x86_64/entry.S         | 11 ++--
>  xen/arch/x86/xen.lds.S              |  7 +
>  5 files changed, 71 insertions(+), 11 deletions(-)
>
> diff --git a/docs/misc/xen-command-line.markdown
> b/docs/misc/xen-command-line.markdown
> index 79feba6..8317639 100644
> --- a/docs/misc/xen-command-line.markdown
> +++ b/docs/misc/xen-command-line.markdown
> @@ -1935,9 +1935,6 @@ mode.
>  Override default selection of whether to isolate 64-bit PV guest page
>  tables.
>
> -** WARNING: Not yet a complete isolation implementation, but better than
> -nothing. **
> -
>  ### xsave
>  > `= `
>
> diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
> index 2ebef03..10bf2f3 100644
> --- a/xen/arch/x86/smpboot.c
> +++ b/xen/arch/x86/smpboot.c
> @@ -622,6 +622,9 @@ unsigned long alloc_stub_page(unsigned int cpu, unsigned
> long *mfn)
>          unmap_domain_page(memset(__map_domain_page(pg), 0xcc, PAGE_SIZE));
>      }
>
> +    /* Confirm that all stubs fit in a single L2 pagetable. */
> +    BUILD_BUG_ON(NR_CPUS * PAGE_SIZE > (1u << L3_PAGETABLE_SHIFT));

It's a little confusing that the comment talks about L2 yet the check is
on L3.  May I suggest you either use
L2_PAGETABLE_ENTRIES << L2_PAGETABLE_SHIFT or say "are covered by a
single L3 entry" in the comment?

> @@ -646,13 +649,23 @@ static int clone_mapping(const void *ptr,
> root_pgentry_t *rpt)
>  {
>      unsigned long linear = (unsigned long)ptr, pfn;
>      unsigned int flags;
> -    l3_pgentry_t *pl3e = l4e_to_l3e(idle_pg_table[root_table_offset(linear)]) +
> -        l3_table_offset(linear);
> +    l3_pgentry_t *pl3e;
>      l2_pgentry_t *pl2e;
>      l1_pgentry_t *pl1e;
>
> -    if ( linear < DIRECTMAP_VIRT_START )
> -        return 0;
> +    /*
> +     * Sanity check 'linear'.  We only allow cloning from the Xen virtual
> +     * range, and in particular, only from the directmap and .text ranges.
> +     */
> +    if ( root_table_offset(linear) > ROOT_PAGETABLE_LAST_XEN_SLOT ||
> +         root_table_offset(linear) < ROOT_PAGETABLE_FIRST_XEN_SLOT )
> +        return -EINVAL;
> +    if ( !(linear >= DIRECTMAP_VIRT_START ||
> +           (linear >= XEN_VIRT_START && linear < XEN_VIRT_END)) )

Putting it this way is certainly fine, but generally I find using !() on
logical || or && expressions less easy to understand than pushing the
negation through the entire expression.  Once having done so, it might
even be debatable whether

    if ( linear < XEN_VIRT_START ||
         (linear >= XEN_VIRT_END && linear < DIRECTMAP_VIRT_START) )
        return -EINVAL;

wouldn't be better.

With at least the first (comment vs code) aspect taken care of in both
instances:
Reviewed-by: Jan Beulich

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH] x86/xpti: Hide almost all of .text and all .data/.rodata/.bss mappings
The current XPTI implementation isolates the directmap (and therefore a lot of
guest data), but a large quantity of CPU0's state (including its stack)
remains visible.

Furthermore, an attacker able to read .text is in a vastly superior position
to normal when it comes to fingerprinting Xen for known vulnerabilities, or
scanning for ROP/Spectre gadgets.

Collect together the entrypoints in .text.entry (currently 3x4k frames, but
can almost certainly be slimmed down), and create a common mapping which is
inserted into each per-cpu shadow.  The stubs are also inserted into this
mapping by pointing at the in-use L2.  This allows stubs allocated later (SMP
boot, or CPU hotplug) to work without further changes to the common mappings.

Signed-off-by: Andrew Cooper
---
CC: Jan Beulich
CC: Wei Liu
CC: Juergen Gross

v2:
 * Drop "incomplete" warning from the docs.  This is about as complete as it
   can reasonably get.
 * Correct BUILD_BUG_ON() calculation, duplicate in the common_pgt code
 * scope/const improvements
 * Use .push/.popsection in preference to .previous
 * Exclude {compat_}create_bounce_frame from .text.entry
 * Extend the sanity checking of linear in clone_mapping().  There is a
   latent bug where a bad linear parameter would cause Xen to fall over a
   NULL pointer.
---
 docs/misc/xen-command-line.markdown |  3 --
 xen/arch/x86/smpboot.c              | 56 +
 xen/arch/x86/x86_64/compat/entry.S  |  5
 xen/arch/x86/x86_64/entry.S         | 11 ++--
 xen/arch/x86/xen.lds.S              |  7 +
 5 files changed, 71 insertions(+), 11 deletions(-)

diff --git a/docs/misc/xen-command-line.markdown
b/docs/misc/xen-command-line.markdown
index 79feba6..8317639 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -1935,9 +1935,6 @@ mode.
 Override default selection of whether to isolate 64-bit PV guest page
 tables.

-** WARNING: Not yet a complete isolation implementation, but better than
-nothing. **
-
 ### xsave
 > `= `

diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
index 2ebef03..10bf2f3 100644
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -622,6 +622,9 @@ unsigned long alloc_stub_page(unsigned int cpu, unsigned
long *mfn)
         unmap_domain_page(memset(__map_domain_page(pg), 0xcc, PAGE_SIZE));
     }

+    /* Confirm that all stubs fit in a single L2 pagetable. */
+    BUILD_BUG_ON(NR_CPUS * PAGE_SIZE > (1u << L3_PAGETABLE_SHIFT));
+
     stub_va = XEN_VIRT_END - (cpu + 1) * PAGE_SIZE;
     if ( map_pages_to_xen(stub_va, mfn_x(page_to_mfn(pg)), 1,
                           PAGE_HYPERVISOR_RX | MAP_SMALL_PAGES) )
@@ -646,13 +649,23 @@ static int clone_mapping(const void *ptr, root_pgentry_t *rpt)
 {
     unsigned long linear = (unsigned long)ptr, pfn;
     unsigned int flags;
-    l3_pgentry_t *pl3e = l4e_to_l3e(idle_pg_table[root_table_offset(linear)]) +
-        l3_table_offset(linear);
+    l3_pgentry_t *pl3e;
     l2_pgentry_t *pl2e;
     l1_pgentry_t *pl1e;

-    if ( linear < DIRECTMAP_VIRT_START )
-        return 0;
+    /*
+     * Sanity check 'linear'.  We only allow cloning from the Xen virtual
+     * range, and in particular, only from the directmap and .text ranges.
+     */
+    if ( root_table_offset(linear) > ROOT_PAGETABLE_LAST_XEN_SLOT ||
+         root_table_offset(linear) < ROOT_PAGETABLE_FIRST_XEN_SLOT )
+        return -EINVAL;
+    if ( !(linear >= DIRECTMAP_VIRT_START ||
+           (linear >= XEN_VIRT_START && linear < XEN_VIRT_END)) )
+        return -EINVAL;
+
+    pl3e = l4e_to_l3e(idle_pg_table[root_table_offset(linear)]) +
+        l3_table_offset(linear);

     flags = l3e_get_flags(*pl3e);
     ASSERT(flags & _PAGE_PRESENT);
@@ -744,8 +757,12 @@ static __read_mostly int8_t opt_xpti = -1;
 boolean_param("xpti", opt_xpti);

 DEFINE_PER_CPU(root_pgentry_t *, root_pgt);

+extern const char _stextentry[], _etextentry[];
+
 static int setup_cpu_root_pgt(unsigned int cpu)
 {
+    static root_pgentry_t common_pgt;
+
     root_pgentry_t *rpt;
     unsigned int off;
     int rc;
@@ -764,8 +781,35 @@ static int setup_cpu_root_pgt(unsigned int cpu)
         idle_pg_table[root_table_offset(RO_MPT_VIRT_START)];
     /* SH_LINEAR_PT inserted together with guest mappings. */
     /* PERDOMAIN inserted during context switch. */
-    rpt[root_table_offset(XEN_VIRT_START)] =
-        idle_pg_table[root_table_offset(XEN_VIRT_START)];
+
+    /* One-time setup of common_pgt, which maps .text.entry and the stubs. */
+    if ( unlikely(!root_get_intpte(common_pgt)) )
+    {
+        unsigned long stubs_linear = XEN_VIRT_END - 1;
+        l3_pgentry_t *stubs_main, *stubs_shadow;
+        const char *ptr;
+
+        for ( rc = 0, ptr = _stextentry;
+              !rc && ptr < _etextentry; ptr += PAGE_SIZE )
+            rc = clone_mapping(ptr, rpt);
+
+        if ( rc )
Re: [Xen-devel] [PATCH] x86/xpti: Hide almost all of .text and all .data/.rodata/.bss mappings
On 14/02/18 09:14, Jan Beulich wrote:
>>>> On 13.02.18 at 20:45, wrote:
>> RFC, because I don't think the stubs handling is particularly sensible.
>>
>> We allocate 4k of virtual address space per CPU, but squash loads of CPUs
>> together onto a single MFN.  The stubs ought to be isolated as well (as they
>> leak the virtual addresses of each stack), which can be done by allocating an
>> MFN per CPU (and simplifies cpu_smpboot_alloc() somewhat).  At this point, we
>> can't use a common set of mappings, and will have to clone the single stub
>> and .entry.text into each PCPUs copy of the pagetables.
>
> The 4k-per-CPU allocation of VA space is probably not strictly
> necessary - quoting the original commit message:
> "While sharing physical pages among certain CPUs on the same node, for
> now the virtual mappings get established in distinct pages for each
> CPU. This isn't a strict requirement, but simplifies VA space
> management for this initial implementation: Sharing VA space would
> require additional tracking of which areas are currently in use. If
> the VA and/or TLB overhead turned out to be a problem, such extra code
> could easily be added."
>
> Without allocating a page per CPU I think what you do is sensible.
> And I don't see how hiding stubs of other CPUs would help much -
> an attacker can gather stack locations from various CPUs as its
> vCPU is being moved around, and you can't hide the current CPU's
> stub space.

Well - scenarios with pinning or cpupools would have a restricted access
to other cpu stack locations.

If the stubs were split in half, we could at least separate the syscall
stubs from the emulation stubs, and leave the latter unmapped.

>> --- a/xen/arch/x86/smpboot.c
>> +++ b/xen/arch/x86/smpboot.c
>> @@ -622,6 +622,9 @@ unsigned long alloc_stub_page(unsigned int cpu, unsigned
>> long *mfn)
>>          unmap_domain_page(memset(__map_domain_page(pg), 0xcc, PAGE_SIZE));
>>      }
>>
>> +    /* Confirm that all stubs fit in a single L2 pagetable. */
>> +    BUILD_BUG_ON(NR_CPUS * PAGE_SIZE > (1u << L2_PAGETABLE_SHIFT));
>
> Perhaps duplicate this in setup_cpu_root_pgt() (suitably adjusted
> of course, as Jürgen has already pointed out)?
>
>> @@ -651,9 +654,6 @@ static int clone_mapping(const void *ptr, root_pgentry_t
>> *rpt)
>>      l2_pgentry_t *pl2e;
>>      l1_pgentry_t *pl1e;
>>
>> -    if ( linear < DIRECTMAP_VIRT_START )
>> -        return 0;
>
> Isn't outright removing this going a little too far?

Considering it is buggy, no.  Returning success after doing nothing is
very antisocial.  If anything, it should return -EINVAL so the caller
knows that something unexpected happened, but it should also reject
attempts to clone mappings beyond the end of the directmap (as those
will explode when we switch to PV context), and the XEN_VIRT_* range
needs white-listing as ok to clone for this usecase to work.

>> --- a/xen/arch/x86/x86_64/compat/entry.S
>> +++ b/xen/arch/x86/x86_64/compat/entry.S
>> @@ -13,6 +13,8 @@
>>  #include
>>  #include
>>
>> +.section .text.entry, "ax", @progbits
>> +
>>  ENTRY(entry_int82)
>>          ASM_CLAC
>>          pushq $0
>
> This also puts compat_create_bounce_frame into the entry section,
> which surely isn't needed.  Same for the non-compat variant.

I considered that.  Excluding those ranges won't reduce the number of
allocations we make, but it does complicate backports.  Also, I still
fully intend to make the functions disappear into C and move out of
.text.entry that way.

>> @@ -854,7 +856,7 @@ GLOBAL(autogen_entrypoints)
>>          .popsection
>>          .endm
>>
>> -.text
>> +.previous
>>  autogen_stubs: /* Automatically generated stubs. */
>
> Perhaps better to switch the earlier .section to .pushsection, and
> use .popsection here?

Will do.

~Andrew
Re: [Xen-devel] [PATCH] x86/xpti: Hide almost all of .text and all .data/.rodata/.bss mappings
On 14/02/18 12:15, Juergen Gross wrote:
> On 14/02/18 13:03, Juergen Gross wrote:
>> On 14/02/18 12:48, Andrew Cooper wrote:
>>> On 14/02/18 07:54, Juergen Gross wrote:
>>>> On 13/02/18 20:45, Andrew Cooper wrote:
>>>>> The current XPTI implementation isolates the directmap (and therefore a
>>>>> lot of guest data), but a large quantity of CPU0's state (including its
>>>>> stack) remains visible.
>>>>>
>>>>> Furthermore, an attacker able to read .text is in a vastly superior
>>>>> position to normal when it comes to fingerprinting Xen for known
>>>>> vulnerabilities, or scanning for ROP/Spectre gadgets.
>>>>>
>>>>> Collect together the entrypoints in .text.entry (currently 3x4k frames,
>>>>> but can almost certainly be slimmed down), and create a common mapping
>>>>> which is inserted into each per-cpu shadow.  The stubs are also inserted
>>>>> into this mapping by pointing at the in-use L2.  This allows stubs
>>>>> allocated later (SMP boot, or CPU hotplug) to work without further
>>>>> changes to the common mappings.
>>>>>
>>>>> Signed-off-by: Andrew Cooper
>>>>> ---
>>>>> CC: Jan Beulich
>>>>> CC: Wei Liu
>>>>> CC: Juergen Gross
>>>>>
>>>>> RFC, because I don't think the stubs handling is particularly sensible.
>>>>>
>>>>> We allocate 4k of virtual address space per CPU, but squash loads of
>>>>> CPUs together onto a single MFN.  The stubs ought to be isolated as well
>>>>> (as they leak the virtual addresses of each stack), which can be done by
>>>>> allocating an MFN per CPU (and simplifies cpu_smpboot_alloc() somewhat).
>>>>> At this point, we can't use a common set of mappings, and will have to
>>>>> clone the single stub and .entry.text into each PCPUs copy of the
>>>>> pagetables.
>>>>>
>>>>> Also, my plan to cause .text.entry to straddle a 512TB boundary (and
>>>>> therefore avoid any further pagetable allocations) has come a little
>>>>> unstuck because of CONFIG_BIGMEM.  I'm still working out whether there
>>>>> is a sensible way to rearrange the virtual layout for this plan to work.
>>>>> ---
>>>>>  xen/arch/x86/smpboot.c             | 37 -
>>>>>  xen/arch/x86/x86_64/compat/entry.S |  2 ++
>>>>>  xen/arch/x86/x86_64/entry.S        |  4 +++-
>>>>>  xen/arch/x86/xen.lds.S             |  7 +++
>>>>>  4 files changed, 44 insertions(+), 6 deletions(-)
>>>>>
>>>>> diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
>>>>> index 2ebef03..2519141 100644
>>>>> --- a/xen/arch/x86/smpboot.c
>>>>> +++ b/xen/arch/x86/smpboot.c
>>>>> @@ -622,6 +622,9 @@ unsigned long alloc_stub_page(unsigned int cpu,
>>>>> unsigned long *mfn)
>>>>>          unmap_domain_page(memset(__map_domain_page(pg), 0xcc,
>>>>> PAGE_SIZE));
>>>>>      }
>>>>>
>>>>> +    /* Confirm that all stubs fit in a single L2 pagetable. */
>>>>> +    BUILD_BUG_ON(NR_CPUS * PAGE_SIZE > (1u << L2_PAGETABLE_SHIFT));
>>>> So we limit NR_CPUS to be max 512 now?
>>>
>>> Not intentionally.  The PAGE_SIZE should be dropped.  (One full L2
>>> pagetable allows us to map 512*512 pages).
>>
>> L2_PAGETABLE_SHIFT is 21. So I still don't get why dropping PAGE_SIZE
>> will correct it. OTOH I'm quite sure the BUILD_BUG_ON() won't trigger
>> any more with PAGE_SIZE being dropped. :-)
>>
>>>> Maybe you should use STUB_BUF_SIZE instead of PAGE_SIZE?
>>>
>>> No - that would be incorrect because of the physical packing of stubs
>>> which occurs.
>>>
>>>> BTW: Is there any reason we don't use a common stub page mapped to each
>>>> per-cpu stack area? The stack address can easily be obtained via %rip
>>>> relative addressing then (see my patch for the per-vcpu stacks:
>>>> https://lists.xen.org/archives/html/xen-devel/2018-02/msg00773.html )
>>>
>>> I don't understand what you are asking here.  We cannot access the
>>> per-cpu area with plain rip-relative addressing without using gs base
>>> (and we really don't want to go down that route), or without per-cpu
>>> pagetables (which would have to be a compile time choice).
>>
>> The stub-page of a cpu is currently mapped as the 3rd page of the
>> stack area. So the distance to the primary stack would be the same
>> for all cpus (a little bit less than 20kB).
>>
>>> As for why the per-cpu areas aren't mapped, that's because they aren't
>>> needed at the moment.  Any decision to change this needs to weigh the
>>> utility of mapping the areas vs the additional data leakage, which is
>>> substantial.
>>
>> The stack area is mapped. And that's where the stub is living.
>
> Oh, did I mix up something? I followed the comments in current.h. The
> code suggests the syscall trampoline page isn't used at the moment for
> the stubs...

That will be stale from the work Jan did to make the stack fully NX.
The syscall stubs used to be on the stack, but are no longer.

~Andrew
Re: [Xen-devel] [PATCH] x86/xpti: Hide almost all of .text and all .data/.rodata/.bss mappings
On 14/02/18 12:48, Andrew Cooper wrote: > On 14/02/18 07:54, Juergen Gross wrote: >> On 13/02/18 20:45, Andrew Cooper wrote: >>> The current XPTI implementation isolates the directmap (and therefore a lot >>> of >>> guest data), but a large quantity of CPU0's state (including its stack) >>> remains visible. >>> >>> Furthermore, an attacker able to read .text is in a vastly superior position >>> to normal when it comes to fingerprinting Xen for known vulnerabilities, or >>> scanning for ROP/Spectre gadgets. >>> >>> Collect together the entrypoints in .text.entry (currently 3x4k frames, but >>> can almost certainly be slimmed down), and create a common mapping which is >>> inserted into each per-cpu shadow. The stubs are also inserted into this >>> mapping by pointing at the in-use L2. This allows stubs allocated later >>> (SMP >>> boot, or CPU hotplug) to work without further changes to the common >>> mappings. >>> >>> Signed-off-by: Andrew Cooper>>> --- >>> CC: Jan Beulich >>> CC: Wei Liu >>> CC: Juergen Gross >>> >>> RFC, because I don't think the stubs handling is particularly sensible. >>> >>> We allocate 4k of virtual address space per CPU, but squash loads of CPUs >>> together onto a single MFN. The stubs ought to be isolated as well (as they >>> leak the virtual addresses of each stack), which can be done by allocating >>> an >>> MFN per CPU (and simplifies cpu_smpboot_alloc() somewhat). At this point, >>> we >>> can't use a common set of mappings, and will have to clone the single stub >>> and >>> .entry.text into each PCPUs copy of the pagetables. >>> >>> Also, my plan to cause .text.entry to straddle a 512TB boundary (and >>> therefore >>> avoid any further pagetable allocations) has come a little unstuck because >>> of >>> CONFIG_BIGMEM. I'm still working out whether there is a sensible way to >>> rearrange the virtual layout for this plan to work. 
>>> --- >>> xen/arch/x86/smpboot.c | 37 >>> - >>> xen/arch/x86/x86_64/compat/entry.S | 2 ++ >>> xen/arch/x86/x86_64/entry.S| 4 +++- >>> xen/arch/x86/xen.lds.S | 7 +++ >>> 4 files changed, 44 insertions(+), 6 deletions(-) >>> >>> diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c >>> index 2ebef03..2519141 100644 >>> --- a/xen/arch/x86/smpboot.c >>> +++ b/xen/arch/x86/smpboot.c >>> @@ -622,6 +622,9 @@ unsigned long alloc_stub_page(unsigned int cpu, >>> unsigned long *mfn) >>> unmap_domain_page(memset(__map_domain_page(pg), 0xcc, PAGE_SIZE)); >>> } >>> >>> +/* Confirm that all stubs fit in a single L2 pagetable. */ >>> +BUILD_BUG_ON(NR_CPUS * PAGE_SIZE > (1u << L2_PAGETABLE_SHIFT));

>> So we limit NR_CPUS to be max 512 now? > > Not intentionally. The PAGE_SIZE should be dropped. (One full L2 > pagetable allows us to map 512*512 pages).

L2_PAGETABLE_SHIFT is 21. So I still don't get why dropping PAGE_SIZE will correct it. OTOH I'm quite sure the BUILD_BUG_ON() won't trigger any more with PAGE_SIZE being dropped. :-)

>> Maybe you should use STUB_BUF_SIZE instead of PAGE_SIZE? > > No - that would be incorrect because of the physical packing of stubs > which occurs.

>> BTW: Is there any reason we don't use a common stub page mapped to each >> per-cpu stack area? The stack address can easily be obtained via %rip >> relative addressing then (see my patch for the per-vcpu stacks: >> https://lists.xen.org/archives/html/xen-devel/2018-02/msg00773.html ) > > I don't understand what you are asking here. We cannot access the > per-cpu area with plain rip-relative addressing without using gs base > (and we really don't want to go down that route), or without per-cpu > pagetables (which would have to be a compile time choice).

The stub-page of a cpu is currently mapped as the 3rd page of the stack area. So the distance to the primary stack would be the same for all cpus (a little bit less than 20kB).
> As for why the per-cpu areas aren't mapped, that's because they aren't > needed at the moment. Any decision to change this needs to weigh the > utility of mapping the areas vs the additional data leakage, which is > substantial.

The stack area is mapped. And that's where the stub is living.

Juergen
Re: [Xen-devel] [PATCH] x86/xpti: Hide almost all of .text and all .data/.rodata/.bss mappings
On 14/02/18 07:54, Juergen Gross wrote: > On 13/02/18 20:45, Andrew Cooper wrote: >> The current XPTI implementation isolates the directmap (and therefore a lot >> of >> guest data), but a large quantity of CPU0's state (including its stack) >> remains visible. >> >> Furthermore, an attacker able to read .text is in a vastly superior position >> to normal when it comes to fingerprinting Xen for known vulnerabilities, or >> scanning for ROP/Spectre gadgets. >> >> Collect together the entrypoints in .text.entry (currently 3x4k frames, but >> can almost certainly be slimmed down), and create a common mapping which is >> inserted into each per-cpu shadow. The stubs are also inserted into this >> mapping by pointing at the in-use L2. This allows stubs allocated later (SMP >> boot, or CPU hotplug) to work without further changes to the common mappings. >> >> Signed-off-by: Andrew Cooper>> --- >> CC: Jan Beulich >> CC: Wei Liu >> CC: Juergen Gross >> >> RFC, because I don't think the stubs handling is particularly sensible. >> >> We allocate 4k of virtual address space per CPU, but squash loads of CPUs >> together onto a single MFN. The stubs ought to be isolated as well (as they >> leak the virtual addresses of each stack), which can be done by allocating an >> MFN per CPU (and simplifies cpu_smpboot_alloc() somewhat). At this point, we >> can't use a common set of mappings, and will have to clone the single stub >> and >> .entry.text into each PCPUs copy of the pagetables. >> >> Also, my plan to cause .text.entry to straddle a 512TB boundary (and >> therefore >> avoid any further pagetable allocations) has come a little unstuck because of >> CONFIG_BIGMEM. I'm still working out whether there is a sensible way to >> rearrange the virtual layout for this plan to work. 
>> --- >> xen/arch/x86/smpboot.c | 37 >> - >> xen/arch/x86/x86_64/compat/entry.S | 2 ++ >> xen/arch/x86/x86_64/entry.S| 4 +++- >> xen/arch/x86/xen.lds.S | 7 +++ >> 4 files changed, 44 insertions(+), 6 deletions(-) >> >> diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c >> index 2ebef03..2519141 100644 >> --- a/xen/arch/x86/smpboot.c >> +++ b/xen/arch/x86/smpboot.c >> @@ -622,6 +622,9 @@ unsigned long alloc_stub_page(unsigned int cpu, unsigned >> long *mfn) >> unmap_domain_page(memset(__map_domain_page(pg), 0xcc, PAGE_SIZE)); >> } >> >> +/* Confirm that all stubs fit in a single L2 pagetable. */ >> +BUILD_BUG_ON(NR_CPUS * PAGE_SIZE > (1u << L2_PAGETABLE_SHIFT));

> So we limit NR_CPUS to be max 512 now?

Not intentionally. The PAGE_SIZE should be dropped. (One full L2 pagetable allows us to map 512*512 pages).

> Maybe you should use STUB_BUF_SIZE instead of PAGE_SIZE?

No - that would be incorrect because of the physical packing of stubs which occurs.

> BTW: Is there any reason we don't use a common stub page mapped to each > per-cpu stack area? The stack address can easily be obtained via %rip > relative addressing then (see my patch for the per-vcpu stacks: > https://lists.xen.org/archives/html/xen-devel/2018-02/msg00773.html )

I don't understand what you are asking here. We cannot access the per-cpu area with plain rip-relative addressing without using gs base (and we really don't want to go down that route), or without per-cpu pagetables (which would have to be a compile time choice).

As for why the per-cpu areas aren't mapped, that's because they aren't needed at the moment. Any decision to change this needs to weigh the utility of mapping the areas vs the additional data leakage, which is substantial.

~Andrew
Re: [Xen-devel] [PATCH] x86/xpti: Hide almost all of .text and all .data/.rodata/.bss mappings
>>> On 13.02.18 at 20:45,wrote: > RFC, because I don't think the stubs handling is particularly sensible. > > We allocate 4k of virtual address space per CPU, but squash loads of CPUs > together onto a single MFN. The stubs ought to be isolated as well (as they > leak the virtual addresses of each stack), which can be done by allocating an > MFN per CPU (and simplifies cpu_smpboot_alloc() somewhat). At this point, we > can't use a common set of mappings, and will have to clone the single stub and > .entry.text into each PCPUs copy of the pagetables. The 4k-per-CPU allocation of VA space is probably not strictly necessary - quoting the original commit message: "While sharing physical pages among certain CPUs on the same node, for now the virtual mappings get established in distinct pages for each CPU. This isn't a strict requirement, but simplifies VA space management for this initial implementation: Sharing VA space would require additional tracking of which areas are currently in use. If the VA and/or TLB overhead turned out to be a problem, such extra code could easily be added." Without allocating a page per CPU I think what you do is sensible. And I don't see how hiding stubs of other CPUs would help much - an attacker can gather stack locations from various CPUs as its vCPU is being moved around, and you can't hide the current CPU's stub space. > --- a/xen/arch/x86/smpboot.c > +++ b/xen/arch/x86/smpboot.c > @@ -622,6 +622,9 @@ unsigned long alloc_stub_page(unsigned int cpu, unsigned > long *mfn) > unmap_domain_page(memset(__map_domain_page(pg), 0xcc, PAGE_SIZE)); > } > > +/* Confirm that all stubs fit in a single L2 pagetable. */ > +BUILD_BUG_ON(NR_CPUS * PAGE_SIZE > (1u << L2_PAGETABLE_SHIFT)); Perhaps duplicate this in setup_cpu_root_pgt() (suitably adjusted of course, as Jürgen has already pointed out)? 
> @@ -651,9 +654,6 @@ static int clone_mapping(const void *ptr, root_pgentry_t > *rpt) > l2_pgentry_t *pl2e; > l1_pgentry_t *pl1e; > > -if ( linear < DIRECTMAP_VIRT_START ) > -return 0;

Isn't outright removing this going a little too far?

> @@ -744,6 +744,9 @@ static __read_mostly int8_t opt_xpti = -1; > boolean_param("xpti", opt_xpti); > DEFINE_PER_CPU(root_pgentry_t *, root_pgt); > > +static root_pgentry_t common_pgt;

Move into setup_cpu_root_pgt()?

> +extern char _stextentry[], _etextentry[];

const?

> --- a/xen/arch/x86/x86_64/compat/entry.S > +++ b/xen/arch/x86/x86_64/compat/entry.S > @@ -13,6 +13,8 @@ > #include > #include > > +.section .text.entry, "ax", @progbits > + > ENTRY(entry_int82) > ASM_CLAC > pushq $0

This also puts compat_create_bounce_frame into the entry section, which surely isn't needed. Same for the non-compat variant.

> @@ -854,7 +856,7 @@ GLOBAL(autogen_entrypoints) > .popsection > .endm > > -.text > +.previous > autogen_stubs: /* Automatically generated stubs. */

Perhaps better to switch the earlier .section to .pushsection, and use .popsection here?

Jan
Re: [Xen-devel] [PATCH] x86/xpti: Hide almost all of .text and all .data/.rodata/.bss mappings
On 13/02/18 20:45, Andrew Cooper wrote: > The current XPTI implementation isolates the directmap (and therefore a lot of > guest data), but a large quantity of CPU0's state (including its stack) > remains visible. > > Furthermore, an attacker able to read .text is in a vastly superior position > to normal when it comes to fingerprinting Xen for known vulnerabilities, or > scanning for ROP/Spectre gadgets. > > Collect together the entrypoints in .text.entry (currently 3x4k frames, but > can almost certainly be slimmed down), and create a common mapping which is > inserted into each per-cpu shadow. The stubs are also inserted into this > mapping by pointing at the in-use L2. This allows stubs allocated later (SMP > boot, or CPU hotplug) to work without further changes to the common mappings. > > Signed-off-by: Andrew Cooper> --- > CC: Jan Beulich > CC: Wei Liu > CC: Juergen Gross > > RFC, because I don't think the stubs handling is particularly sensible. > > We allocate 4k of virtual address space per CPU, but squash loads of CPUs > together onto a single MFN. The stubs ought to be isolated as well (as they > leak the virtual addresses of each stack), which can be done by allocating an > MFN per CPU (and simplifies cpu_smpboot_alloc() somewhat). At this point, we > can't use a common set of mappings, and will have to clone the single stub and > .entry.text into each PCPUs copy of the pagetables. > > Also, my plan to cause .text.entry to straddle a 512TB boundary (and therefore > avoid any further pagetable allocations) has come a little unstuck because of > CONFIG_BIGMEM. I'm still working out whether there is a sensible way to > rearrange the virtual layout for this plan to work. 
> --- > xen/arch/x86/smpboot.c | 37 - > xen/arch/x86/x86_64/compat/entry.S | 2 ++ > xen/arch/x86/x86_64/entry.S| 4 +++- > xen/arch/x86/xen.lds.S | 7 +++ > 4 files changed, 44 insertions(+), 6 deletions(-) > > diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c > index 2ebef03..2519141 100644 > --- a/xen/arch/x86/smpboot.c > +++ b/xen/arch/x86/smpboot.c > @@ -622,6 +622,9 @@ unsigned long alloc_stub_page(unsigned int cpu, unsigned > long *mfn) > unmap_domain_page(memset(__map_domain_page(pg), 0xcc, PAGE_SIZE)); > } > > +/* Confirm that all stubs fit in a single L2 pagetable. */ > +BUILD_BUG_ON(NR_CPUS * PAGE_SIZE > (1u << L2_PAGETABLE_SHIFT)); So we limit NR_CPUS to be max 512 now? Maybe you should use STUB_BUF_SIZE instead of PAGE_SIZE? BTW: Is there any reason we don't use a common stub page mapped to each per-cpu stack area? The stack address can easily be obtained via %rip relative addressing then (see my patch for the per-vcpu stacks: https://lists.xen.org/archives/html/xen-devel/2018-02/msg00773.html ) Juergen > + > stub_va = XEN_VIRT_END - (cpu + 1) * PAGE_SIZE; > if ( map_pages_to_xen(stub_va, mfn_x(page_to_mfn(pg)), 1, >PAGE_HYPERVISOR_RX | MAP_SMALL_PAGES) ) > @@ -651,9 +654,6 @@ static int clone_mapping(const void *ptr, root_pgentry_t > *rpt) > l2_pgentry_t *pl2e; > l1_pgentry_t *pl1e; > > -if ( linear < DIRECTMAP_VIRT_START ) > -return 0; > - > flags = l3e_get_flags(*pl3e); > ASSERT(flags & _PAGE_PRESENT); > if ( flags & _PAGE_PSE ) > @@ -744,6 +744,9 @@ static __read_mostly int8_t opt_xpti = -1; > boolean_param("xpti", opt_xpti); > DEFINE_PER_CPU(root_pgentry_t *, root_pgt); > > +static root_pgentry_t common_pgt; > +extern char _stextentry[], _etextentry[]; > + > static int setup_cpu_root_pgt(unsigned int cpu) > { > root_pgentry_t *rpt; > @@ -764,8 +767,32 @@ static int setup_cpu_root_pgt(unsigned int cpu) > idle_pg_table[root_table_offset(RO_MPT_VIRT_START)]; > /* SH_LINEAR_PT inserted together with guest mappings. 
*/ > /* PERDOMAIN inserted during context switch. */ > -rpt[root_table_offset(XEN_VIRT_START)] = > -idle_pg_table[root_table_offset(XEN_VIRT_START)]; > + > +/* One-time setup of common_pgt, which maps .text.entry and the stubs. */ > +if ( unlikely(!root_get_intpte(common_pgt)) ) > +{ > +unsigned long stubs_linear = XEN_VIRT_END - 1; > +l3_pgentry_t *stubs_main, *stubs_shadow; > +char *ptr; > + > +for ( rc = 0, ptr = _stextentry; > + !rc && ptr < _etextentry; ptr += PAGE_SIZE ) > +rc = clone_mapping(ptr, rpt); > + > +if ( rc ) > +return rc; > + > +stubs_main = > l4e_to_l3e(idle_pg_table[l4_table_offset(stubs_linear)]); > +stubs_shadow = l4e_to_l3e(rpt[l4_table_offset(stubs_linear)]); > + > +/* Splice into the regular L2 mapping the stubs. */ > +stubs_shadow[l3_table_offset(stubs_linear)] = > +stubs_main[l3_table_offset(stubs_linear)]; > + > +
[Xen-devel] [PATCH] x86/xpti: Hide almost all of .text and all .data/.rodata/.bss mappings
The current XPTI implementation isolates the directmap (and therefore a lot of guest data), but a large quantity of CPU0's state (including its stack) remains visible. Furthermore, an attacker able to read .text is in a vastly superior position to normal when it comes to fingerprinting Xen for known vulnerabilities, or scanning for ROP/Spectre gadgets. Collect together the entrypoints in .text.entry (currently 3x4k frames, but can almost certainly be slimmed down), and create a common mapping which is inserted into each per-cpu shadow. The stubs are also inserted into this mapping by pointing at the in-use L2. This allows stubs allocated later (SMP boot, or CPU hotplug) to work without further changes to the common mappings. Signed-off-by: Andrew Cooper--- CC: Jan Beulich CC: Wei Liu CC: Juergen Gross RFC, because I don't think the stubs handling is particularly sensible. We allocate 4k of virtual address space per CPU, but squash loads of CPUs together onto a single MFN. The stubs ought to be isolated as well (as they leak the virtual addresses of each stack), which can be done by allocating an MFN per CPU (and simplifies cpu_smpboot_alloc() somewhat). At this point, we can't use a common set of mappings, and will have to clone the single stub and .entry.text into each PCPUs copy of the pagetables. Also, my plan to cause .text.entry to straddle a 512TB boundary (and therefore avoid any further pagetable allocations) has come a little unstuck because of CONFIG_BIGMEM. I'm still working out whether there is a sensible way to rearrange the virtual layout for this plan to work. 
--- xen/arch/x86/smpboot.c | 37 - xen/arch/x86/x86_64/compat/entry.S | 2 ++ xen/arch/x86/x86_64/entry.S| 4 +++- xen/arch/x86/xen.lds.S | 7 +++ 4 files changed, 44 insertions(+), 6 deletions(-) diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c index 2ebef03..2519141 100644 --- a/xen/arch/x86/smpboot.c +++ b/xen/arch/x86/smpboot.c @@ -622,6 +622,9 @@ unsigned long alloc_stub_page(unsigned int cpu, unsigned long *mfn) unmap_domain_page(memset(__map_domain_page(pg), 0xcc, PAGE_SIZE)); } +/* Confirm that all stubs fit in a single L2 pagetable. */ +BUILD_BUG_ON(NR_CPUS * PAGE_SIZE > (1u << L2_PAGETABLE_SHIFT)); + stub_va = XEN_VIRT_END - (cpu + 1) * PAGE_SIZE; if ( map_pages_to_xen(stub_va, mfn_x(page_to_mfn(pg)), 1, PAGE_HYPERVISOR_RX | MAP_SMALL_PAGES) ) @@ -651,9 +654,6 @@ static int clone_mapping(const void *ptr, root_pgentry_t *rpt) l2_pgentry_t *pl2e; l1_pgentry_t *pl1e; -if ( linear < DIRECTMAP_VIRT_START ) -return 0; - flags = l3e_get_flags(*pl3e); ASSERT(flags & _PAGE_PRESENT); if ( flags & _PAGE_PSE ) @@ -744,6 +744,9 @@ static __read_mostly int8_t opt_xpti = -1; boolean_param("xpti", opt_xpti); DEFINE_PER_CPU(root_pgentry_t *, root_pgt); +static root_pgentry_t common_pgt; +extern char _stextentry[], _etextentry[]; + static int setup_cpu_root_pgt(unsigned int cpu) { root_pgentry_t *rpt; @@ -764,8 +767,32 @@ static int setup_cpu_root_pgt(unsigned int cpu) idle_pg_table[root_table_offset(RO_MPT_VIRT_START)]; /* SH_LINEAR_PT inserted together with guest mappings. */ /* PERDOMAIN inserted during context switch. */ -rpt[root_table_offset(XEN_VIRT_START)] = -idle_pg_table[root_table_offset(XEN_VIRT_START)]; + +/* One-time setup of common_pgt, which maps .text.entry and the stubs. 
*/ +if ( unlikely(!root_get_intpte(common_pgt)) ) +{ +unsigned long stubs_linear = XEN_VIRT_END - 1; +l3_pgentry_t *stubs_main, *stubs_shadow; +char *ptr; + +for ( rc = 0, ptr = _stextentry; + !rc && ptr < _etextentry; ptr += PAGE_SIZE ) +rc = clone_mapping(ptr, rpt); + +if ( rc ) +return rc; + +stubs_main = l4e_to_l3e(idle_pg_table[l4_table_offset(stubs_linear)]); +stubs_shadow = l4e_to_l3e(rpt[l4_table_offset(stubs_linear)]); + +/* Splice into the regular L2 mapping the stubs. */ +stubs_shadow[l3_table_offset(stubs_linear)] = +stubs_main[l3_table_offset(stubs_linear)]; + +common_pgt = rpt[root_table_offset(XEN_VIRT_START)]; +} + +rpt[root_table_offset(XEN_VIRT_START)] = common_pgt; /* Install direct map page table entries for stack, IDT, and TSS. */ for ( off = rc = 0; !rc && off < STACK_SIZE; off += PAGE_SIZE ) diff --git a/xen/arch/x86/x86_64/compat/entry.S b/xen/arch/x86/x86_64/compat/entry.S index 707c746..b001e79 100644 --- a/xen/arch/x86/x86_64/compat/entry.S +++ b/xen/arch/x86/x86_64/compat/entry.S @@ -13,6 +13,8 @@ #include #include +.section .text.entry, "ax", @progbits + ENTRY(entry_int82) ASM_CLAC pushq $0 diff --git