On 14/02/18 12:15, Juergen Gross wrote:
> On 14/02/18 13:03, Juergen Gross wrote:
>> On 14/02/18 12:48, Andrew Cooper wrote:
>>> On 14/02/18 07:54, Juergen Gross wrote:
>>>> On 13/02/18 20:45, Andrew Cooper wrote:
>>>>> The current XPTI implementation isolates the directmap (and therefore a 
>>>>> lot of
>>>>> guest data), but a large quantity of CPU0's state (including its stack)
>>>>> remains visible.
>>>>> Furthermore, an attacker able to read .text is in a vastly superior 
>>>>> position
>>>>> to normal when it comes to fingerprinting Xen for known vulnerabilities, 
>>>>> or
>>>>> scanning for ROP/Spectre gadgets.
>>>>> Collect together the entrypoints in .text.entry (currently 3x4k frames, 
>>>>> but
>>>>> can almost certainly be slimmed down), and create a common mapping which 
>>>>> is
>>>>> inserted into each per-cpu shadow.  The stubs are also inserted into this
>>>>> mapping by pointing at the in-use L2.  This allows stubs allocated later 
>>>>> (SMP
>>>>> boot, or CPU hotplug) to work without further changes to the common 
>>>>> mappings.
>>>>> Signed-off-by: Andrew Cooper <andrew.coop...@citrix.com>
>>>>> ---
>>>>> CC: Jan Beulich <jbeul...@suse.com>
>>>>> CC: Wei Liu <wei.l...@citrix.com>
>>>>> CC: Juergen Gross <jgr...@suse.com>
>>>>> RFC, because I don't think the stubs handling is particularly sensible.
>>>>> We allocate 4k of virtual address space per CPU, but squash loads of CPUs
>>>>> together onto a single MFN.  The stubs ought to be isolated as well (as 
>>>>> they
>>>>> leak the virtual addresses of each stack), which can be done by 
>>>>> allocating an
>>>>> MFN per CPU (and simplifies cpu_smpboot_alloc() somewhat).  At this 
>>>>> point, we
>>>>> can't use a common set of mappings, and will have to clone the single 
>>>>> stub and
>>>>> .entry.text into each PCPUs copy of the pagetables.
>>>>> Also, my plan to cause .text.entry to straddle a 512TB boundary (and 
>>>>> therefore
>>>>> avoid any further pagetable allocations) has come a little unstuck 
>>>>> because of
>>>>> CONFIG_BIGMEM.  I'm still working out whether there is a sensible way to
>>>>> rearrange the virtual layout for this plan to work.
>>>>> ---
>>>>>  xen/arch/x86/smpboot.c             | 37 
>>>>> ++++++++++++++++++++++++++++++++-----
>>>>>  xen/arch/x86/x86_64/compat/entry.S |  2 ++
>>>>>  xen/arch/x86/x86_64/entry.S        |  4 +++-
>>>>>  xen/arch/x86/xen.lds.S             |  7 +++++++
>>>>>  4 files changed, 44 insertions(+), 6 deletions(-)
>>>>> diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
>>>>> index 2ebef03..2519141 100644
>>>>> --- a/xen/arch/x86/smpboot.c
>>>>> +++ b/xen/arch/x86/smpboot.c
>>>>> @@ -622,6 +622,9 @@ unsigned long alloc_stub_page(unsigned int cpu, 
>>>>> unsigned long *mfn)
>>>>>          unmap_domain_page(memset(__map_domain_page(pg), 0xcc, 
>>>>> PAGE_SIZE));
>>>>>      }
>>>>> +    /* Confirm that all stubs fit in a single L2 pagetable. */
>>>> So we limit NR_CPUS to be max 512 now?
>>> Not intentionally.  The PAGE_SIZE should be dropped.  (One full L2
>>> pagetable allows us to map 512*512 pages).
>> L2_PAGETABLE_SHIFT is 21. So I still don't get why dropping PAGE_SIZE
>> will correct it. OTOH I'm quite sure the BUILD_BUG_ON() won't trigger
>> any more with PAGE_SIZE being dropped. :-)
>>>> Maybe you should use STUB_BUF_SIZE instead of PAGE_SIZE?
>>> No - that would be incorrect because of the physical packing of stubs
>>> which occurs.
>>>> BTW: Is there any reason we don't use a common stub page mapped to each
>>>> per-cpu stack area? The stack address can easily be obtained via %rip
>>>> relative addressing then (see my patch for the per-vcpu stacks:
>>>> https://lists.xen.org/archives/html/xen-devel/2018-02/msg00773.html )
>>> I don't understand what you are asking here.  We cannot access the
>>> per-cpu area with plain rip-retaliative addressing without using gs base
>>> (and we really don't want to go down that route), or without per-cpu
>>> pagetables (which would have to be a compile time choice).
>> The stub-page of a cpu is currently mapped as the 3rd page of the
>> stack area. So the distance to the primary stack would be the same
>> for all cpus (a little bit less than 20kB).
>>> As for why the per-cpu areas aren't mapped, that's because they aren't
>>> needed at the moment.  Any decision to change this needs to weigh the
>>> utility of mapping the areas vs the additional data leakage, which is
>>> substantial.
>> The stack area is mapped. And that's where the stub is living.
> Oh, did I mix up something? I followed the comments in current.h. The
> code suggests the syscall trampoline page isn't used at the moment for
> the stubs...

That will be stale from the work Jan did to make the stack fully NX. 
The syscall stubs used to be on the stack, but are no longer.


Xen-devel mailing list

Reply via email to