On 14/02/18 12:27, Juergen Gross wrote:
> On 14/02/18 13:19, Andrew Cooper wrote:
>> On 14/02/18 12:15, Juergen Gross wrote:
>>> On 14/02/18 13:03, Juergen Gross wrote:
>>>> On 14/02/18 12:48, Andrew Cooper wrote:
>>>>> On 14/02/18 07:54, Juergen Gross wrote:
>>>>>> On 13/02/18 20:45, Andrew Cooper wrote:
>>>>>>> The current XPTI implementation isolates the directmap (and therefore a 
>>>>>>> lot of
>>>>>>> guest data), but a large quantity of CPU0's state (including its stack)
>>>>>>> remains visible.
>>>>>>>
>>>>>>> Furthermore, an attacker able to read .text is in a vastly superior 
>>>>>>> position
>>>>>>> to normal when it comes to fingerprinting Xen for known 
>>>>>>> vulnerabilities, or
>>>>>>> scanning for ROP/Spectre gadgets.
>>>>>>>
>>>>>>> Collect together the entrypoints in .text.entry (currently 3x4k frames, 
>>>>>>> but
>>>>>>> can almost certainly be slimmed down), and create a common mapping 
>>>>>>> which is
>>>>>>> inserted into each per-cpu shadow.  The stubs are also inserted into 
>>>>>>> this
>>>>>>> mapping by pointing at the in-use L2.  This allows stubs allocated 
>>>>>>> later (SMP
>>>>>>> boot, or CPU hotplug) to work without further changes to the common 
>>>>>>> mappings.
>>>>>>>
>>>>>>> Signed-off-by: Andrew Cooper <andrew.coop...@citrix.com>
>>>>>>> ---
>>>>>>> CC: Jan Beulich <jbeul...@suse.com>
>>>>>>> CC: Wei Liu <wei.l...@citrix.com>
>>>>>>> CC: Juergen Gross <jgr...@suse.com>
>>>>>>>
>>>>>>> RFC, because I don't think the stubs handling is particularly sensible.
>>>>>>>
>>>>>>> We allocate 4k of virtual address space per CPU, but squash loads of 
>>>>>>> CPUs
>>>>>>> together onto a single MFN.  The stubs ought to be isolated as well (as 
>>>>>>> they
>>>>>>> leak the virtual addresses of each stack), which can be done by 
>>>>>>> allocating an
>>>>>>> MFN per CPU (and simplifies cpu_smpboot_alloc() somewhat).  At this 
>>>>>>> point, we
>>>>>>> can't use a common set of mappings, and will have to clone the single 
>>>>>>> stub and
>>>>>>> .entry.text into each PCPUs copy of the pagetables.
>>>>>>>
>>>>>>> Also, my plan to cause .text.entry to straddle a 512TB boundary (and 
>>>>>>> therefore
>>>>>>> avoid any further pagetable allocations) has come a little unstuck 
>>>>>>> because of
>>>>>>> CONFIG_BIGMEM.  I'm still working out whether there is a sensible way to
>>>>>>> rearrange the virtual layout for this plan to work.
>>>>>>> ---
>>>>>>>  xen/arch/x86/smpboot.c             | 37 
>>>>>>> ++++++++++++++++++++++++++++++++-----
>>>>>>>  xen/arch/x86/x86_64/compat/entry.S |  2 ++
>>>>>>>  xen/arch/x86/x86_64/entry.S        |  4 +++-
>>>>>>>  xen/arch/x86/xen.lds.S             |  7 +++++++
>>>>>>>  4 files changed, 44 insertions(+), 6 deletions(-)
>>>>>>>
>>>>>>> diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
>>>>>>> index 2ebef03..2519141 100644
>>>>>>> --- a/xen/arch/x86/smpboot.c
>>>>>>> +++ b/xen/arch/x86/smpboot.c
>>>>>>> @@ -622,6 +622,9 @@ unsigned long alloc_stub_page(unsigned int cpu, 
>>>>>>> unsigned long *mfn)
>>>>>>>          unmap_domain_page(memset(__map_domain_page(pg), 0xcc, 
>>>>>>> PAGE_SIZE));
>>>>>>>      }
>>>>>>>  
>>>>>>> +    /* Confirm that all stubs fit in a single L2 pagetable. */
>>>>>>> +    BUILD_BUG_ON(NR_CPUS * PAGE_SIZE > (1u << L2_PAGETABLE_SHIFT));
>>>>>> So we limit NR_CPUS to be max 512 now?
>>>>> Not intentionally.  The PAGE_SIZE should be dropped.  (One full L2
>>>>> pagetable allows us to map 512*512 pages).
>>>> L2_PAGETABLE_SHIFT is 21. So I still don't get why dropping PAGE_SIZE
>>>> will correct it. OTOH I'm quite sure the BUILD_BUG_ON() won't trigger
>>>> any more with PAGE_SIZE being dropped. :-)
>>>>
>>>>>> Maybe you should use STUB_BUF_SIZE instead of PAGE_SIZE?
>>>>> No - that would be incorrect because of the physical packing of stubs
>>>>> which occurs.
>>>>>
>>>>>> BTW: Is there any reason we don't use a common stub page mapped to each
>>>>>> per-cpu stack area? The stack address can easily be obtained via %rip
>>>>>> relative addressing then (see my patch for the per-vcpu stacks:
>>>>>> https://lists.xen.org/archives/html/xen-devel/2018-02/msg00773.html )
>>>>> I don't understand what you are asking here.  We cannot access the
>>>>> per-cpu area with plain rip-retaliative addressing without using gs base
>>>>> (and we really don't want to go down that route), or without per-cpu
>>>>> pagetables (which would have to be a compile time choice).
>>>> The stub-page of a cpu is currently mapped as the 3rd page of the
>>>> stack area. So the distance to the primary stack would be the same
>>>> for all cpus (a little bit less than 20kB).
>>>>
>>>>> As for why the per-cpu areas aren't mapped, that's because they aren't
>>>>> needed at the moment.  Any decision to change this needs to weigh the
>>>>> utility of mapping the areas vs the additional data leakage, which is
>>>>> substantial.
>>>> The stack area is mapped. And that's where the stub is living.
>>> Oh, did I mix up something? I followed the comments in current.h. The
>>> code suggests the syscall trampoline page isn't used at the moment for
>>> the stubs...
>> That will be stale from the work Jan did to make the stack fully NX. 
>> The syscall stubs used to be on the stack, but are no longer.
> Changing this to make the syscall stub RO and executable again isn't
> impossible, I guess. Especially when this will make life easier at
> other places we should at least consider that option.

The stubs are RO/NX in the XEN_VIRT_* range, and it is these mappings
that I clone.

The emulation stubs are written by using map_domain_page() to make a
writeable alias.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Reply via email to