Re: [PATCH 2/5] x86/pvh: Make PVH entrypoint PIC for x86-64

2024-04-11 Thread Brian Gerst
On Thu, Apr 11, 2024 at 11:26 AM Jason Andryuk  wrote:
>
> On 2024-04-10 17:00, Brian Gerst wrote:
> > On Wed, Apr 10, 2024 at 3:50 PM Jason Andryuk  wrote:
>
> >>  /* 64-bit entry point. */
> >>  .code64
> >>   1:
> >> +   UNWIND_HINT_END_OF_STACK
> >> +
> >>  /* Set base address in stack canary descriptor. */
> >>  mov $MSR_GS_BASE,%ecx
> >> -   mov $_pa(canary), %eax
> >> +   leal rva(canary)(%ebp), %eax
> >
> > Since this is in 64-bit mode, RIP-relative addressing can be used.
> >
> >>  xor %edx, %edx
> >>  wrmsr
> >>
> >>  call xen_prepare_pvh
> >>
> >>  /* startup_64 expects boot_params in %rsi. */
> >> -   mov $_pa(pvh_bootparams), %rsi
> >> -   mov $_pa(startup_64), %rax
> >> +   lea rva(pvh_bootparams)(%ebp), %rsi
> >> +   lea rva(startup_64)(%ebp), %rax
> >
> > RIP-relative here too.
>
> Yes, thanks for catching that.  With the RIP-relative conversion, there
> is now:
> vmlinux.o: warning: objtool: pvh_start_xen+0x10d: relocation to !ENDBR:
> startup_64+0x0
>
> I guess RIP-relative made it visible.  That can be quieted by adding
> ANNOTATE_NOENDBR to startup_64.

Change it to a direct jump, since branches are always RIP-relative.
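
Something like this (untested sketch; the direct jmp is a rel32 branch, so
it is inherently RIP-relative and also avoids the indirect branch that
objtool complains about):

        /* startup_64 expects boot_params in %rsi. */
        lea pvh_bootparams(%rip), %rsi
        jmp startup_64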

Brian Gerst



Re: [PATCH 2/5] x86/pvh: Make PVH entrypoint PIC for x86-64

2024-04-10 Thread Brian Gerst
On Wed, Apr 10, 2024 at 3:50 PM Jason Andryuk  wrote:
>
> The PVH entrypoint is 32bit non-PIC code running the uncompressed
> vmlinux at its load address CONFIG_PHYSICAL_START - default 0x1000000
> (16MB).  The kernel is loaded at that physical address inside the VM by
> the VMM software (Xen/QEMU).
>
> When running a Xen PVH Dom0, the host reserved addresses are mapped 1-1
> into the PVH container.  There exist system firmwares (Coreboot/EDK2)
> with reserved memory at 16MB.  This creates a conflict where the PVH
> kernel cannot be loaded at that address.
>
> Modify the PVH entrypoint to be position-independent to allow flexibility
> in load address.  Only the 64bit entry path is converted.  A 32bit
> kernel is not PIC, so calling into other parts of the kernel, like
> xen_prepare_pvh() and mk_pgtable_32(), doesn't work properly when
> relocated.
>
> This makes the code PIC, but the page tables need to be updated as well
> to handle running from the kernel high map.
>
> The UNWIND_HINT_END_OF_STACK is to silence:
> vmlinux.o: warning: objtool: pvh_start_xen+0x7f: unreachable instruction
> after the lret into 64bit code.
>
> Signed-off-by: Jason Andryuk 
> ---
> ---
>  arch/x86/platform/pvh/head.S | 44 
>  1 file changed, 34 insertions(+), 10 deletions(-)
>
> diff --git a/arch/x86/platform/pvh/head.S b/arch/x86/platform/pvh/head.S
> index f7235ef87bc3..bb1e582e32b1 100644
> --- a/arch/x86/platform/pvh/head.S
> +++ b/arch/x86/platform/pvh/head.S
> @@ -7,6 +7,7 @@
> .code32
> .text
>  #define _pa(x)  ((x) - __START_KERNEL_map)
> +#define rva(x)  ((x) - pvh_start_xen)
>
>  #include 
>  #include 
> @@ -54,7 +55,25 @@ SYM_CODE_START_LOCAL(pvh_start_xen)
> UNWIND_HINT_END_OF_STACK
> cld
>
> -   lgdt (_pa(gdt))
> +   /*
> +* See the comment for startup_32 for more details.  We need to
> +* execute a call to get the execution address to be position
> +* independent, but we don't have a stack.  Save and restore the
> +* magic field of start_info in ebx, and use that as the stack.
> +*/
> +   mov  (%ebx), %eax
> +   leal 4(%ebx), %esp
> +   ANNOTATE_INTRA_FUNCTION_CALL
> +   call 1f
> +1: popl %ebp
> +   mov  %eax, (%ebx)
> +   subl $rva(1b), %ebp
> +   movl $0, %esp
> +
> +   leal rva(gdt)(%ebp), %eax
> +   leal rva(gdt_start)(%ebp), %ecx
> +   movl %ecx, 2(%eax)
> +   lgdt (%eax)
>
> mov $PVH_DS_SEL,%eax
> mov %eax,%ds
> @@ -62,14 +81,14 @@ SYM_CODE_START_LOCAL(pvh_start_xen)
> mov %eax,%ss
>
> /* Stash hvm_start_info. */
> -   mov $_pa(pvh_start_info), %edi
> +   leal rva(pvh_start_info)(%ebp), %edi
> mov %ebx, %esi
> -   mov _pa(pvh_start_info_sz), %ecx
> +   movl rva(pvh_start_info_sz)(%ebp), %ecx
> shr $2,%ecx
> rep
> movsl
>
> -   mov $_pa(early_stack_end), %esp
> +   leal rva(early_stack_end)(%ebp), %esp
>
> /* Enable PAE mode. */
> mov %cr4, %eax
> @@ -84,28 +103,33 @@ SYM_CODE_START_LOCAL(pvh_start_xen)
> wrmsr
>
> /* Enable pre-constructed page tables. */
> -   mov $_pa(init_top_pgt), %eax
> +   leal rva(init_top_pgt)(%ebp), %eax
> mov %eax, %cr3
> mov $(X86_CR0_PG | X86_CR0_PE), %eax
> mov %eax, %cr0
>
> /* Jump to 64-bit mode. */
> -   ljmp $PVH_CS_SEL, $_pa(1f)
> +   pushl $PVH_CS_SEL
> +   leal  rva(1f)(%ebp), %eax
> +   pushl %eax
> +   lretl
>
> /* 64-bit entry point. */
> .code64
>  1:
> +   UNWIND_HINT_END_OF_STACK
> +
> /* Set base address in stack canary descriptor. */
> mov $MSR_GS_BASE,%ecx
> -   mov $_pa(canary), %eax
> +   leal rva(canary)(%ebp), %eax

Since this is in 64-bit mode, RIP-relative addressing can be used.
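
I.e. something like (untested):

        /* Set base address in stack canary descriptor. */
        mov $MSR_GS_BASE, %ecx
        lea canary(%rip), %eax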

> xor %edx, %edx
> wrmsr
>
> call xen_prepare_pvh
>
> /* startup_64 expects boot_params in %rsi. */
> -   mov $_pa(pvh_bootparams), %rsi
> -   mov $_pa(startup_64), %rax
> +   lea rva(pvh_bootparams)(%ebp), %rsi
> +   lea rva(startup_64)(%ebp), %rax

RIP-relative here too.
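
Likewise, roughly (untested):

        lea pvh_bootparams(%rip), %rsi
        lea startup_64(%rip), %rax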

> ANNOTATE_RETPOLINE_SAFE
> jmp *%rax
>
> @@ -143,7 +167,7 @@ SYM_CODE_END(pvh_start_xen)
> .balign 8
>  SYM_DATA_START_LOCAL(gdt)
> .word gdt_end - gdt_start
> -   .long _pa(gdt_start)
> +   .long _pa(gdt_start) /* x86-64 will overwrite if relocated. */
> .word 0
>  SYM_DATA_END(gdt)
>  SYM_DATA_START_LOCAL(gdt_start)
> --
> 2.44.0
>
>

Brian Gerst



Re: [PATCH v13 26/35] x86/fred: FRED entry/exit and dispatch code

2023-12-06 Thread Brian Gerst
On Wed, Dec 6, 2023 at 2:19 PM Li, Xin3  wrote:
>
> > >>> + case X86_TRAP_OF:
> > >>> + exc_overflow(regs);
> > >>> + return;
> > >>> +
> > >>> + /* INT3 */
> > >>> + case X86_TRAP_BP:
> > >>> + exc_int3(regs);
> > >>> + return;
> > >> ... neither OF nor BP will ever enter fred_intx() because they're
> > >> type SWEXC not SWINT.
> > > Per FRED spec 5.0, section 7.3 Software Interrupts and Related 
> > > Instructions:
> > > INT n (opcode CD followed by an immediate byte): There are 256 such
> > > software interrupt instructions, one for each value n of the immediate
> > > byte (0–255).
> > >
> > > And appendix B Event Stack Levels:
> > > If the event is an execution of INT n (opcode CD n for 8-bit value n),
> > > the event stack level is 0. The event type is 4 (software interrupt)
> > > and the vector is n.
> > >
> > > So int $0x4 and int $0x3 (use asm(".byte 0xCD, 0x03")) get here.
> > >
> > > But into (0xCE) and int3 (0xCC) do use event type SWEXC.
> > >
> > > BTW, into is NOT allowed in 64-bit mode but "int $0x4" is allowed.
> >
> > There is certainly fun to be had with CD 03 and CD 04 byte patterns, but if 
> > you
> > meant to mean those here, then the comments are wrong.
> >
> > Vectors 3 and 4 are installed with DPL3 because that is necessary to make 
> > CC and
> > CE function in userspace.  It also suggests that the SWINT vs SWEXC 
> > distinction
> > was retrofitted to architecture after the 286, because exceptions don't 
> > check DPL
> > and ICEBP delivers #DB from userspace even when Vector 1 has a DPL of 0.
> >
> > While CC is for most cases indistinguishable from CD 03, CE behaves entirely
> > differently to CD 04.  CD 04 doesn't #UD in 64bit mode, and will trigger
> > exc_overflow() irrespective of the state of EFLAGS.OF.
> >
> >
> > The SDM goes out of its way to say not to use the CD 03 byte pattern (and 
> > it
> > does take effort to emit this byte pattern - e.g. GAS will silently 
> > translate "int $3"
> > to "int3"), and there's no plausible way software is using CD 04 in place 
> > of CE.
> >
> > So why do we care about continuing to make mistakes of the IDT era work in a
> > FRED world?
>
> First, I agree with you because it makes things simple and neat.
>
> However, the latest SDM and FRED spec 5.0 both doesn't disallow it, so it
> becomes an OS implementation choice.
>
> >
> > Is there anything (other than perhaps the selftests) which would even 
> > notice?
>
> I'm just conservative :)
>
> If a user app can do it with IDT, we should still allow it when FRED is
> enabled.  But if all key stakeholders don't care whatever gets broken
> due to the change and agree to change it.

One case to consider is Windows software running under Wine.
Anti-tampering code has been known to do some non-standard things,
like using ICEBP or using SYSCALL directly instead of through system
DLLs.  Keeping the status quo should be preferred, especially if
Microsoft does the same.
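
For reference, the byte patterns under discussion, as a hypothetical
userspace test snippet (not from the patch):

        int3                    /* 0xCC - event type SWEXC under FRED */
        .byte 0xcd, 0x03        /* "int $3" two-byte form - SWINT, vector 3 */
        .byte 0xcd, 0x04        /* "int $4" - SWINT, vector 4, delivered
                                   regardless of EFLAGS.OF */
        /* 0xCE ("into") would be #UD in 64-bit mode */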


Brian Gerst



Re: [PATCH 1/4] x86/percpu: Use explicit segment registers in lib/cmpxchg{8,16}b_emu.S

2023-10-12 Thread Brian Gerst
On Thu, Oct 12, 2023 at 12:13 PM Uros Bizjak  wrote:
>
> PER_CPU_VAR macro is intended to be applied to a symbol, it is not
> intended to be used as a selector between %fs and %gs segment
> registers for general operands.
>
> The address to these emulation functions is passed in a register, so
> use explicit segment registers to access percpu variable instead.
>
> Also add a missing function comment to this_cpu_cmpxchg8b_emu.
>
> No functional changes intended.
>
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: Borislav Petkov 
> Cc: Dave Hansen 
> Cc: "H. Peter Anvin" 
> Cc: Peter Zijlstra 
> Signed-off-by: Uros Bizjak 
> ---
>  arch/x86/lib/cmpxchg16b_emu.S | 12 ++--
>  arch/x86/lib/cmpxchg8b_emu.S  | 30 +-
>  2 files changed, 27 insertions(+), 15 deletions(-)
>
> diff --git a/arch/x86/lib/cmpxchg16b_emu.S b/arch/x86/lib/cmpxchg16b_emu.S
> index 6962df315793..2bd8b89bce75 100644
> --- a/arch/x86/lib/cmpxchg16b_emu.S
> +++ b/arch/x86/lib/cmpxchg16b_emu.S
> @@ -23,14 +23,14 @@ SYM_FUNC_START(this_cpu_cmpxchg16b_emu)
> cli
>
> /* if (*ptr == old) */
> -   cmpqPER_CPU_VAR(0(%rsi)), %rax
> +   cmpq%gs:(%rsi), %rax
> jne .Lnot_same
> -   cmpqPER_CPU_VAR(8(%rsi)), %rdx
> +   cmpq%gs:8(%rsi), %rdx
> jne .Lnot_same
>
> /* *ptr = new */
> -   movq%rbx, PER_CPU_VAR(0(%rsi))
> -   movq%rcx, PER_CPU_VAR(8(%rsi))
> +   movq%rbx, %gs:(%rsi)
> +   movq%rcx, %gs:8(%rsi)
>
> /* set ZF in EFLAGS to indicate success */
> orl $X86_EFLAGS_ZF, (%rsp)
> @@ -42,8 +42,8 @@ SYM_FUNC_START(this_cpu_cmpxchg16b_emu)
> /* *ptr != old */
>
> /* old = *ptr */
> -   movqPER_CPU_VAR(0(%rsi)), %rax
> -   movqPER_CPU_VAR(8(%rsi)), %rdx
> +   movq%gs:(%rsi), %rax
> +   movq%gs:8(%rsi), %rdx
>
> /* clear ZF in EFLAGS to indicate failure */
> andl$(~X86_EFLAGS_ZF), (%rsp)
> diff --git a/arch/x86/lib/cmpxchg8b_emu.S b/arch/x86/lib/cmpxchg8b_emu.S
> index 49805257b125..b7d68d5e2d31 100644
> --- a/arch/x86/lib/cmpxchg8b_emu.S
> +++ b/arch/x86/lib/cmpxchg8b_emu.S
> @@ -24,12 +24,12 @@ SYM_FUNC_START(cmpxchg8b_emu)
> pushfl
> cli
>
> -   cmpl0(%esi), %eax
> +   cmpl(%esi), %eax
> jne .Lnot_same
> cmpl4(%esi), %edx
> jne .Lnot_same
>
> -   movl%ebx, 0(%esi)
> +   movl%ebx, (%esi)
> movl%ecx, 4(%esi)
>
> orl $X86_EFLAGS_ZF, (%esp)
> @@ -38,7 +38,7 @@ SYM_FUNC_START(cmpxchg8b_emu)
> RET
>
>  .Lnot_same:
> -   movl0(%esi), %eax
> +   movl(%esi), %eax
> movl4(%esi), %edx
>
> andl$(~X86_EFLAGS_ZF), (%esp)
> @@ -53,18 +53,30 @@ EXPORT_SYMBOL(cmpxchg8b_emu)
>
>  #ifndef CONFIG_UML
>
> +/*
> + * Emulate 'cmpxchg8b %fs:(%esi)'
> + *
> + * Inputs:
> + * %esi : memory location to compare
> + * %eax : low 32 bits of old value
> + * %edx : high 32 bits of old value
> + * %ebx : low 32 bits of new value
> + * %ecx : high 32 bits of new value
> + *
> + * Notably this is not LOCK prefixed and is not safe against NMIs
> + */
>  SYM_FUNC_START(this_cpu_cmpxchg8b_emu)
>
> pushfl
> cli
>
> -   cmplPER_CPU_VAR(0(%esi)), %eax
> +   cmpl%fs:(%esi), %eax
> jne .Lnot_same2
> -   cmplPER_CPU_VAR(4(%esi)), %edx
> +   cmpl%fs:4(%esi), %edx
> jne .Lnot_same2
>
> -   movl%ebx, PER_CPU_VAR(0(%esi))
> -   movl%ecx, PER_CPU_VAR(4(%esi))
> +   movl%ebx, %fs:(%esi)
> +   movl%ecx, %fs:4(%esi)
>
> orl $X86_EFLAGS_ZF, (%esp)
>
> @@ -72,8 +84,8 @@ SYM_FUNC_START(this_cpu_cmpxchg8b_emu)
>     RET
>
>  .Lnot_same2:
> -   movlPER_CPU_VAR(0(%esi)), %eax
> -   movlPER_CPU_VAR(4(%esi)), %edx
> +   movl%fs:(%esi), %eax
> +   movl%fs:4(%esi), %edx
>
> andl$(~X86_EFLAGS_ZF), (%esp)
>
> --
> 2.41.0
>

This will break on !SMP builds, where per-cpu variables are just
regular data and not accessed with a segment prefix.
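
For reference, the assembly side of the macro is roughly this (from
arch/x86/include/asm/percpu.h, quoting from memory, so not exact):

        #ifdef CONFIG_SMP
        #define PER_CPU_VAR(var)        %__percpu_seg:var
        #else
        #define PER_CPU_VAR(var)        var
        #endif

so on !SMP the caller passes a plain address and no segment prefix should
be applied.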

Brian Gerst



Re: [patch 35/37] x86/smpboot: Support parallel startup of secondary CPUs

2023-04-15 Thread Brian Gerst
On Fri, Apr 14, 2023 at 7:45 PM Thomas Gleixner  wrote:
>
> From: David Woodhouse 
>
> Rework the real-mode startup code to allow for APs to be brought up in
> parallel. This is in two parts:
>
>  1. Introduce a bit-spinlock to prevent them from all using the real
> mode stack at the same time.
>
>  2. Avoid needing to use the global smpboot_control variable to pass
> each AP its CPU number.
>
> To achieve the latter, export the cpuid_to_apicid[] array so that each
> AP can find its own CPU number by searching therein based on its APIC ID.
>
> Introduce flags in the top bits of smpboot_control which indicate methods
> by which an AP should find its CPU number. For a serialized bringup, the
> CPU number is explicitly passed in the low bits of smpboot_control as
> before. For parallel mode there are flags directing the AP to find its APIC
> ID in CPUID leaf 0x0b or 0x1f (for X2APIC mode) or CPUID leaf 0x01 where 8
> bits are sufficient, then perform the cpuid_to_apicid[] lookup with that.
>
> Aside from the fact that APs will now look up their CPU number via the
> newly-exported cpuid_to_apicid[] table, there is no behavioural change
> intended, since the parallel bootup has not yet been enabled.
>
> [ tglx: Initial proof of concept patch with bitlock and APIC ID lookup ]
> [ dwmw2: Rework and testing, commit message, CPUID 0x1 and CPU0 support ]
> [ seanc: Fix stray override of initial_gs in common_cpu_up() ]
> [ Oleksandr Natalenko: reported suspend/resume issue fixed in
>   x86_acpi_suspend_lowlevel ]
>
> Co-developed-by: Thomas Gleixner 
> Signed-off-by: Thomas Gleixner 
> Co-developed-by: Brian Gerst 
> Signed-off-by: Brian Gerst 
> Signed-off-by: David Woodhouse 
> Signed-off-by: Usama Arif 
> Signed-off-by: Thomas Gleixner 
> ---
>  arch/x86/include/asm/apic.h  |2
>  arch/x86/include/asm/realmode.h  |3 +
>  arch/x86/include/asm/smp.h   |8 +++
>  arch/x86/kernel/acpi/sleep.c |9 +++
>  arch/x86/kernel/apic/apic.c  |2
>  arch/x86/kernel/head_64.S|   79 
> ++-
>  arch/x86/kernel/smpboot.c|5 --
>  arch/x86/realmode/init.c |3 +
>  arch/x86/realmode/rm/trampoline_64.S |   27 +--
>  9 files changed, 125 insertions(+), 13 deletions(-)
>
> --- a/arch/x86/include/asm/apic.h
> +++ b/arch/x86/include/asm/apic.h
> @@ -55,6 +55,8 @@ extern int local_apic_timer_c2_ok;
>  extern int disable_apic;
>  extern unsigned int lapic_timer_period;
>
> +extern int cpuid_to_apicid[];
> +
>  extern enum apic_intr_mode_id apic_intr_mode;
>  enum apic_intr_mode_id {
> APIC_PIC,
> --- a/arch/x86/include/asm/realmode.h
> +++ b/arch/x86/include/asm/realmode.h
> @@ -52,6 +52,7 @@ struct trampoline_header {
> u64 efer;
> u32 cr4;
> u32 flags;
> +   u32 lock;
>  #endif
>  };
>
> @@ -64,6 +65,8 @@ extern unsigned long initial_stack;
>  extern unsigned long initial_vc_handler;
>  #endif
>
> +extern u32 *trampoline_lock;
> +
>  extern unsigned char real_mode_blob[];
>  extern unsigned char real_mode_relocs[];
>
> --- a/arch/x86/include/asm/smp.h
> +++ b/arch/x86/include/asm/smp.h
> @@ -198,4 +198,12 @@ extern unsigned int smpboot_control;
>
>  #endif /* !__ASSEMBLY__ */
>
> +/* Control bits for startup_64 */
> +/* Control bits for startup_64 */
> +#define STARTUP_APICID_CPUID_1F 0x80000000
> +#define STARTUP_APICID_CPUID_0B 0x40000000
> +#define STARTUP_APICID_CPUID_01 0x20000000
> +
> +/* Top 8 bits are reserved for control */
> +#define STARTUP_PARALLEL_MASK  0xFF000000
> +
>  #endif /* _ASM_X86_SMP_H */
> --- a/arch/x86/kernel/acpi/sleep.c
> +++ b/arch/x86/kernel/acpi/sleep.c
> @@ -16,6 +16,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>
>  #include 
>  #include "../../realmode/rm/wakeup.h"
> @@ -127,7 +128,13 @@ int x86_acpi_suspend_lowlevel(void)
>  * value is in the actual %rsp register.
>  */
> current->thread.sp = (unsigned long)temp_stack + sizeof(temp_stack);
> -   smpboot_control = smp_processor_id();
> +   /*
> +* Ensure the CPU knows which one it is when it comes back, if
> +* it isn't in parallel mode and expected to work that out for
> +* itself.
> +*/
> +   if (!(smpboot_control & STARTUP_PARALLEL_MASK))
> +   smpboot_control = smp_processor_id();
>  #endif
> initial_code = (unsigned long)wakeup_long64;
> saved_magic = 0x123456789abcdef0L;
> --- a/arch/x86/kernel/apic/apic.c
> +++ b/arch/x86/kernel/apic/apic.c
> @@ -2377,7 +2377,7 @@ static int nr_logical_cpuids = 1;
>  /*
>   * Use

Re: [patch 19/37] x86/smpboot: Switch to hotplug core state synchronization

2023-04-15 Thread Brian Gerst
lease it.
> +* Sync point with the hotplug core. Sets the sync state to ALIVE
> +* and waits for the control CPU to release it.
>  */
> -   wait_for_master_cpu(raw_smp_processor_id());
> +   cpuhp_ap_sync_alive();
>
> cpu_init();
> rcu_cpu_starting(raw_smp_processor_id());
> @@ -285,7 +271,6 @@ static void notrace start_secondary(void
> set_cpu_online(smp_processor_id(), true);
> lapic_online();
> unlock_vector_lock();
> -   cpu_set_state_online(smp_processor_id());
> x86_platform.nmi_init();
>
> /* enable local interrupts */
> @@ -736,9 +721,10 @@ static void impress_friends(void)
>  * Allow the user to impress friends.
>  */
> pr_debug("Before bogomips\n");
> -   for_each_possible_cpu(cpu)
> -   if (cpumask_test_cpu(cpu, cpu_callout_mask))
> +   for_each_possible_cpu(cpu) {
> +   if (cpumask_test_cpu(cpu, cpu_online_mask))
> bogosum += cpu_data(cpu).loops_per_jiffy;

This should be the same as for_each_online_cpu().

--
Brian Gerst



Re: [PATCH 4/4] x86: switch to cpu_feature_enabled() for X86_FEATURE_XENPV

2022-11-03 Thread Brian Gerst
On Thu, Nov 3, 2022 at 8:37 AM Juergen Gross  wrote:
>
> Convert the remaining cases of static_cpu_has(X86_FEATURE_XENPV) and
> boot_cpu_has(X86_FEATURE_XENPV) to use cpu_feature_enabled(), allowing
> more efficient code in case the kernel is configured without
> CONFIG_XEN_PV.
>
> Signed-off-by: Juergen Gross 
> ---
>  arch/x86/kernel/cpu/amd.c| 2 +-
>  arch/x86/kernel/cpu/bugs.c   | 2 +-
>  arch/x86/kernel/cpu/hygon.c  | 2 +-
>  arch/x86/kernel/process_64.c | 4 ++--
>  arch/x86/kernel/topology.c   | 2 +-
>  arch/x86/mm/cpu_entry_area.c | 2 +-
>  6 files changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
> index 860b60273df3..697fe881e967 100644
> --- a/arch/x86/kernel/cpu/amd.c
> +++ b/arch/x86/kernel/cpu/amd.c
> @@ -985,7 +985,7 @@ static void init_amd(struct cpuinfo_x86 *c)
> set_cpu_cap(c, X86_FEATURE_3DNOWPREFETCH);
>
> /* AMD CPUs don't reset SS attributes on SYSRET, Xen does. */
> -   if (!cpu_has(c, X86_FEATURE_XENPV))
> +   if (!cpu_feature_enabled(X86_FEATURE_XENPV))
> set_cpu_bug(c, X86_BUG_SYSRET_SS_ATTRS);
>
> /*
> diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
> index da7c361f47e0..7f78e1527c5e 100644
> --- a/arch/x86/kernel/cpu/bugs.c
> +++ b/arch/x86/kernel/cpu/bugs.c
> @@ -1302,7 +1302,7 @@ static enum spectre_v2_mitigation_cmd __init 
> spectre_v2_parse_cmdline(void)
> return SPECTRE_V2_CMD_AUTO;
> }
>
> -   if (cmd == SPECTRE_V2_CMD_IBRS && boot_cpu_has(X86_FEATURE_XENPV)) {
> +   if (cmd == SPECTRE_V2_CMD_IBRS && 
> cpu_feature_enabled(X86_FEATURE_XENPV)) {
> pr_err("%s selected but running as XenPV guest. Switching to 
> AUTO select\n",
>mitigation_options[i].option);
> return SPECTRE_V2_CMD_AUTO;
> diff --git a/arch/x86/kernel/cpu/hygon.c b/arch/x86/kernel/cpu/hygon.c
> index 21fd425088fe..1c27645fd429 100644
> --- a/arch/x86/kernel/cpu/hygon.c
> +++ b/arch/x86/kernel/cpu/hygon.c
> @@ -339,7 +339,7 @@ static void init_hygon(struct cpuinfo_x86 *c)
> set_cpu_cap(c, X86_FEATURE_ARAT);
>
> /* Hygon CPUs don't reset SS attributes on SYSRET, Xen does. */
> -   if (!cpu_has(c, X86_FEATURE_XENPV))
> +   if (!cpu_feature_enabled(X86_FEATURE_XENPV))
> set_cpu_bug(c, X86_BUG_SYSRET_SS_ATTRS);
>
> check_null_seg_clears_base(c);
> diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
> index 6b3418bff326..e2f469175be8 100644
> --- a/arch/x86/kernel/process_64.c
> +++ b/arch/x86/kernel/process_64.c
> @@ -165,7 +165,7 @@ static noinstr unsigned long __rdgsbase_inactive(void)
>
> lockdep_assert_irqs_disabled();
>
> -   if (!static_cpu_has(X86_FEATURE_XENPV)) {
> +   if (!cpu_feature_enabled(X86_FEATURE_XENPV)) {
> native_swapgs();
> gsbase = rdgsbase();
> native_swapgs();
> @@ -190,7 +190,7 @@ static noinstr void __wrgsbase_inactive(unsigned long 
> gsbase)
>  {
> lockdep_assert_irqs_disabled();
>
> -   if (!static_cpu_has(X86_FEATURE_XENPV)) {
> +   if (!cpu_feature_enabled(X86_FEATURE_XENPV)) {
> native_swapgs();
> wrgsbase(gsbase);
> native_swapgs();
> diff --git a/arch/x86/kernel/topology.c b/arch/x86/kernel/topology.c
> index 8617d1ed9d31..1b83377274b8 100644
> --- a/arch/x86/kernel/topology.c
> +++ b/arch/x86/kernel/topology.c
> @@ -106,7 +106,7 @@ int arch_register_cpu(int num)
>  * Xen PV guests don't support CPU0 hotplug at all.
>  */
> if (c->x86_vendor != X86_VENDOR_INTEL ||
> -   boot_cpu_has(X86_FEATURE_XENPV))
> +   cpu_feature_enabled(X86_FEATURE_XENPV))
> cpu0_hotpluggable = 0;
>
> /*
> diff --git a/arch/x86/mm/cpu_entry_area.c b/arch/x86/mm/cpu_entry_area.c
> index 6c2f1b76a0b6..c83799753d44 100644
> --- a/arch/x86/mm/cpu_entry_area.c
> +++ b/arch/x86/mm/cpu_entry_area.c
> @@ -147,7 +147,7 @@ static void __init setup_cpu_entry_area(unsigned int cpu)
>  * On Xen PV, the GDT must be read-only because the hypervisor
>  * requires it.
>  */
> -   pgprot_t gdt_prot = boot_cpu_has(X86_FEATURE_XENPV) ?
> +   pgprot_t gdt_prot = cpu_feature_enabled(X86_FEATURE_XENPV) ?
> PAGE_KERNEL_RO : PAGE_KERNEL;
> pgprot_t tss_prot = PAGE_KERNEL;
>  #endif

This is another case that can be removed because it's for 32-bit.

--
Brian Gerst



Re: [PATCH v2 0/4] Remove 32-bit Xen PV guest support

2020-07-02 Thread Brian Gerst
On Wed, Jul 1, 2020 at 7:07 AM Juergen Gross  wrote:
>
> The long term plan has been to replace Xen PV guests by PVH. The first
> victim of that plan are now 32-bit PV guests, as those are used only
> rather seldom these days. Xen on x86 requires 64-bit support and with
> Grub2 now supporting PVH officially since version 2.04 there is no
> need to keep 32-bit PV guest support alive in the Linux kernel.
> Additionally Meltdown mitigation is not available in the kernel running
> as 32-bit PV guest, so dropping this mode makes sense from security
> point of view, too.

One thing that you missed is removing VDSO_NOTE_NONEGSEG_BIT from
vdso32/note.S.  With that removed there is no difference from the
64-bit version.

Otherwise this series looks good to me.
--
Brian Gerst



Re: [PATCH v2 1/4] x86/xen: remove 32-bit Xen PV guest support

2020-07-01 Thread Brian Gerst
On Wed, Jul 1, 2020 at 7:08 AM Juergen Gross  wrote:
>
> Xen is requiring 64-bit machines today and since Xen 4.14 it can be
> built without 32-bit PV guest support. There is no need to carry the
> burden of 32-bit PV guest support in the kernel any longer, as new
> guests can be either HVM or PVH, or they can use a 64 bit kernel.
>
> Remove the 32-bit Xen PV support from the kernel.

If you send a v3, it would be better to split the move of the 64-bit
code into xen-asm.S into a separate patch.

--
Brian Gerst



Re: [PATCH 3/6] x86/entry/64/compat: Fix Xen PV SYSENTER frame setup

2020-07-01 Thread Brian Gerst
On Fri, Jun 26, 2020 at 1:30 PM Andy Lutomirski  wrote:
>
> The SYSENTER frame setup was nonsense.  It worked by accident
> because the normal code into which the Xen asm jumped
> (entry_SYSENTER_32/compat) threw away SP without touching the stack.
> entry_SYSENTER_compat was recently modified such that it relied on
> having a valid stack pointer, so now the Xen asm needs to invoke it
> with a valid stack.
>
> Fix it up like SYSCALL: use the Xen-provided frame and skip the bare
> metal prologue.
>
> Cc: Boris Ostrovsky 
> Cc: Juergen Gross 
> Cc: Stefano Stabellini 
> Cc: xen-devel@lists.xenproject.org
> Fixes: 1c3e5d3f60e2 ("x86/entry: Make entry_64_compat.S objtool clean")
> Signed-off-by: Andy Lutomirski 
> ---
>  arch/x86/entry/entry_64_compat.S |  1 +
>  arch/x86/xen/xen-asm_64.S| 20 
>  2 files changed, 17 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/entry/entry_64_compat.S 
> b/arch/x86/entry/entry_64_compat.S
> index 7b9d8150f652..381a6de7de9c 100644
> --- a/arch/x86/entry/entry_64_compat.S
> +++ b/arch/x86/entry/entry_64_compat.S
> @@ -79,6 +79,7 @@ SYM_CODE_START(entry_SYSENTER_compat)
> pushfq  /* pt_regs->flags (except IF = 0) */
> pushq   $__USER32_CS/* pt_regs->cs */
> pushq   $0  /* pt_regs->ip = 0 (placeholder) */
> +SYM_INNER_LABEL(entry_SYSENTER_compat_after_hwframe, SYM_L_GLOBAL)

This skips over the section that truncates the syscall number to
32-bits.  The comments present some doubt that it is actually
necessary, but the Xen path shouldn't differ from native.  That code
should be moved after this new label.
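
I.e. something like this (untested; IIRC the truncation is just the
zero-extending move before the frame is completed):

        SYM_INNER_LABEL(entry_SYSENTER_compat_after_hwframe, SYM_L_GLOBAL)
                movl    %eax, %eax      /* zero-extend the 32-bit syscall nr */
                pushq   %rax            /* pt_regs->orig_ax */

so that the Xen path goes through it as well.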

--
Brian Gerst



Re: [Xen-devel] [PATCH v4 1/2] xen/PVH: Set up GS segment for stack canary

2018-05-22 Thread Brian Gerst
On Tue, May 22, 2018 at 9:57 AM, Jan Beulich <jbeul...@suse.com> wrote:
>>>> On 22.05.18 at 15:45, <brge...@gmail.com> wrote:
>> On Mon, May 21, 2018 at 11:54 PM, Boris Ostrovsky 
>> <boris.ostrov...@oracle.com> wrote:
>>> @@ -98,6 +101,12 @@ ENTRY(pvh_start_xen)
>>> /* 64-bit entry point. */
>>> .code64
>>>  1:
>>> +   /* Set base address in stack canary descriptor. */
>>> +   mov $MSR_GS_BASE,%ecx
>>> +   mov $canary, %rax
>>> +   cdq
>>> +   wrmsr
>>
>> CDQ only sign-extends EAX to RAX.  What you really want is to move the
>> high 32-bits to EDX (or zero EDX if we can guarantee it is loaded
>> below 4G).
>
> What you describe is CDQE (AT&T name: CLTQ); CDQ (AT&T: CLTD)
> sign-extends EAX to EDX:EAX.

But that would still be wrong, as it would set EDX to 0xffffffff if
the kernel was loaded between 2G and 4G.  Looking closer at the code,
we just left 32-bit mode, so we must have been loaded below 4G,
therefore EDX must be zero.
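
I.e. something like (sketch, untested; assumes the canary symbol resolves
below 4G at this point):

        mov $MSR_GS_BASE, %ecx
        mov $canary, %eax
        xor %edx, %edx          /* entered from 32-bit code => below 4G */
        wrmsr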

--
Brian Gerst


Re: [Xen-devel] [PATCH v4 1/2] xen/PVH: Set up GS segment for stack canary

2018-05-22 Thread Brian Gerst
On Mon, May 21, 2018 at 11:54 PM, Boris Ostrovsky
<boris.ostrov...@oracle.com> wrote:
> We are making calls to C code (e.g. xen_prepare_pvh()) which may use
> stack canary (stored in GS segment).
>
> Signed-off-by: Boris Ostrovsky <boris.ostrov...@oracle.com>
> ---
>  arch/x86/xen/xen-pvh.S | 26 +-
>  1 file changed, 25 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/xen/xen-pvh.S b/arch/x86/xen/xen-pvh.S
> index e1a5fbe..0169374 100644
> --- a/arch/x86/xen/xen-pvh.S
> +++ b/arch/x86/xen/xen-pvh.S
> @@ -54,6 +54,9 @@
>   * charge of setting up it's own stack, GDT and IDT.
>   */
>
> +#define PVH_GDT_ENTRY_CANARY   4
> +#define PVH_CANARY_SEL (PVH_GDT_ENTRY_CANARY * 8)
> +
>  ENTRY(pvh_start_xen)
> cld
>
> @@ -98,6 +101,12 @@ ENTRY(pvh_start_xen)
> /* 64-bit entry point. */
> .code64
>  1:
> +   /* Set base address in stack canary descriptor. */
> +   mov $MSR_GS_BASE,%ecx
> +   mov $canary, %rax
> +   cdq
> +   wrmsr

CDQ only sign-extends EAX to RAX.  What you really want is to move the
high 32-bits to EDX (or zero EDX if we can guarantee it is loaded
below 4G).

--
Brian Gerst
