On 14/02/18 09:14, Jan Beulich wrote:
>>>> On 13.02.18 at 20:45, <andrew.coop...@citrix.com> wrote:
>> RFC, because I don't think the stubs handling is particularly sensible.
>> We allocate 4k of virtual address space per CPU, but squash loads of CPUs
>> together onto a single MFN. The stubs ought to be isolated as well (as they
>> leak the virtual addresses of each stack), which can be done by allocating an
>> MFN per CPU (and simplifies cpu_smpboot_alloc() somewhat). At this point, we
>> can't use a common set of mappings, and will have to clone the single stub
>> .entry.text into each PCPUs copy of the pagetables.
> The 4k-per-CPU allocation of VA space is probably not strictly
> necessary - quoting the original commit message:
> "While sharing physical pages among certain CPUs on the same node, for
> now the virtual mappings get established in distinct pages for each
> CPU. This isn't a strict requirement, but simplifies VA space
> management for this initial implementation: Sharing VA space would
> require additional tracking of which areas are currently in use. If
> the VA and/or TLB overhead turned out to be a problem, such extra code
> could easily be added."
> Without allocating a page per CPU I think what you do is sensible.
> And I don't see how hiding stubs of other CPUs would help much -
> an attacker can gather stack locations from various CPUs as its
> vCPU is being moved around, and you can't hide the current CPU's
> stub space.
Well - scenarios with pinning or cpupools would have a restricted access
to other cpu stack locations.
If the stubs were split in half, we could at least separate the syscall
stubs from the emulation stubs, and leave the latter unmapped.
>> --- a/xen/arch/x86/smpboot.c
>> +++ b/xen/arch/x86/smpboot.c
>> @@ -622,6 +622,9 @@ unsigned long alloc_stub_page(unsigned int cpu, unsigned
>> long *mfn)
>> unmap_domain_page(memset(__map_domain_page(pg), 0xcc, PAGE_SIZE));
>> + /* Confirm that all stubs fit in a single L2 pagetable. */
>> + BUILD_BUG_ON(NR_CPUS * PAGE_SIZE > (1u << L2_PAGETABLE_SHIFT));
> Perhaps duplicate this in setup_cpu_root_pgt() (suitably adjusted
> of course, as Jürgen has already pointed out)?
>> @@ -651,9 +654,6 @@ static int clone_mapping(const void *ptr, root_pgentry_t
>> l2_pgentry_t *pl2e;
>> l1_pgentry_t *pl1e;
>> - if ( linear < DIRECTMAP_VIRT_START )
>> - return 0;
> Isn't outright removing this going a little too far?
Considering it is buggy, no. Retuning success after doing nothing is
If anything, it should return -EINVAL so the caller knows that something
expected happened, but it should also reject attempts to clone mappings
beyond the end of the directmap (as those will explode when we switch to
PV context), and the XEN_VIRT_* range needs white-listing as ok to clone
for this usecase to work.
>> --- a/xen/arch/x86/x86_64/compat/entry.S
>> +++ b/xen/arch/x86/x86_64/compat/entry.S
>> @@ -13,6 +13,8 @@
>> #include <public/xen.h>
>> #include <irq_vectors.h>
>> + .section .text.entry, "ax", @progbits
>> pushq $0
> This also puts compat_create_bounce_frame into the entry section,
> which surely isn't needed. Same for the non-compat variant.
I considered that. Excluding those ranges won't reduce the number of
allocations we make, but it does complicate backports.
Also, I still fully intend to make the functions disappear into C and
move out of .text.entry that way.
>> @@ -854,7 +856,7 @@ GLOBAL(autogen_entrypoints)
>> - .text
>> + .previous
>> autogen_stubs: /* Automatically generated stubs. */
> Perhaps better to switch the earlier .section to .pushsection, and
> use .popsection here?
Xen-devel mailing list