from:"Ingo Molnar"

Re: [Xen-devel] [PATCH 2/3] x86: add guest_late_init hook to hypervisor_x86 structure

2017-11-08 Thread Ingo Molnar


* Juergen Gross <jgr...@suse.com> wrote:

> On 08/11/17 10:40, Ingo Molnar wrote:
> > 
> > * Juergen Gross <jgr...@suse.com> wrote:
> > 
> >>> Plus add a default empty function (which hypervisors can override). This 
> >>> avoids 
> >>> all the hidden conditions and wrappery.
> >>
> >> Hmm, x86_hyper is just a pointer being NULL on bare metal. So we would
> >> have to add a pre-filled struct with dummy functions and in case a
> >> hypervisor is detected we'd need to copy all non-NULL pointers of the
> >> hypervisor specific struct to the pre-filled one.
> > 
> > Ok, I missed that detail - but yeah, since this is mostly about boot code,
> > where readability is king, I still think it would be an overall improvement.
> > 
> > This is the pattern we are trying to use with x86_platform ops for example, 
> > and 
> > doing:
> > 
> >   git grep -w x86_platform arch/x86
> > 
> > gives a pretty clear idea about how it's used - while if it was all within 
> > wrappers it would be a lot more difficult to discover all the callsites.
> > 
> > Admittedly it's not done totally consistently:
> > 
> >  arch/x86/kernel/apic/probe_32.c:if (x86_platform.apic_post_init)
> >  arch/x86/kernel/apic/probe_64.c:if (x86_platform.apic_post_init)
> >  arch/x86/kernel/ebda.c: if (!x86_platform.legacy.reserve_bios_regions)
> >  arch/x86/kernel/platform-quirks.c:  if 
> > (x86_platform.set_legacy_features)
> >  arch/x86/platform/intel-mid/device_libs/platform_mrfld_rtc.c:   if 
> > (!x86_platform.legacy.rtc)
> > 
> > ... but the _idea_ behind it is consistent ;-)
> > 
> >> In case there are no objections I can add a patch to modify the current
> >> way x86_hyper is used to the pre-filled variant.
> > 
> > Yeah, sounds good to me!
> 
> Okay. With you mentioning x86_platform: should I make x86_hyper a member
> of x86_platform (e.g. x86_platform.hyper.) ?
> 
> I think this would fit quite nice.

Yeah, seems like a good idea!

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH 2/3] x86: add guest_late_init hook to hypervisor_x86 structure

2017-11-08 Thread Ingo Molnar

* Juergen Gross  wrote:

> > Plus add a default empty function (which hypervisors can override). This 
> > avoids 
> > all the hidden conditions and wrappery.
> 
> Hmm, x86_hyper is just a pointer being NULL on bare metal. So we would
> have to add a pre-filled struct with dummy functions and in case a
> hypervisor is detected we'd need to copy all non-NULL pointers of the
> hypervisor specific struct to the pre-filled one.

Ok, I missed that detail - but yeah, since this is mostly about boot code,
where readability is king, I still think it would be an overall improvement.

This is the pattern we are trying to use with x86_platform ops for example, and 
doing:

  git grep -w x86_platform arch/x86

gives a pretty clear idea about how it's used - while if it was all within 
wrappers it would be a lot more difficult to discover all the callsites.

Admittedly it's not done totally consistently:

 arch/x86/kernel/apic/probe_32.c:if (x86_platform.apic_post_init)
 arch/x86/kernel/apic/probe_64.c:if (x86_platform.apic_post_init)
 arch/x86/kernel/ebda.c: if (!x86_platform.legacy.reserve_bios_regions)
 arch/x86/kernel/platform-quirks.c:  if (x86_platform.set_legacy_features)
 arch/x86/platform/intel-mid/device_libs/platform_mrfld_rtc.c:   if 
(!x86_platform.legacy.rtc)

... but the _idea_ behind it is consistent ;-)

> In case there are no objections I can add a patch to modify the current
> way x86_hyper is used to the pre-filled variant.

Yeah, sounds good to me!

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH 2/3] x86: add guest_late_init hook to hypervisor_x86 structure

2017-11-08 Thread Ingo Molnar


* Juergen Gross  wrote:

> Add a new guest_late_init hook to the hypervisor_x86 structure. It
> will replace the current kvm_guest_init() call which is changed to
> make use of the new hook.
> 
> Signed-off-by: Juergen Gross 
> ---
>  arch/x86/include/asm/hypervisor.h | 11 +++
>  arch/x86/include/asm/kvm_para.h   |  2 --
>  arch/x86/kernel/kvm.c |  3 ++-
>  arch/x86/kernel/setup.c   |  2 +-
>  4 files changed, 14 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/include/asm/hypervisor.h 
> b/arch/x86/include/asm/hypervisor.h
> index 0ead9dbb9130..37320687b8cb 100644
> --- a/arch/x86/include/asm/hypervisor.h
> +++ b/arch/x86/include/asm/hypervisor.h
> @@ -38,6 +38,9 @@ struct hypervisor_x86 {
>   /* Platform setup (run once per boot) */
>   void(*init_platform)(void);
>  
> + /* Guest late init */
> + void(*guest_late_init)(void);
> +
>   /* X2APIC detection (run once per boot) */
>   bool(*x2apic_available)(void);
>  
> @@ -66,9 +69,17 @@ static inline void hypervisor_init_mem_mapping(void)
>   if (x86_hyper && x86_hyper->init_mem_mapping)
>   x86_hyper->init_mem_mapping();
>  }
> +
> +static inline void hypervisor_guest_late_init(void)
> +{
> + if (x86_hyper && x86_hyper->guest_late_init)
> + x86_hyper->guest_late_init();
> +}
> +
>  #else
>  static inline void init_hypervisor_platform(void) { }
>  static inline bool hypervisor_x2apic_available(void) { return false; }
>  static inline void hypervisor_init_mem_mapping(void) { }
> +static inline void hypervisor_guest_late_init(void) { }
>  #endif /* CONFIG_HYPERVISOR_GUEST */
>  #endif /* _ASM_X86_HYPERVISOR_H */
> diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
> index c373e44049b1..7b407dda2bd7 100644
> --- a/arch/x86/include/asm/kvm_para.h
> +++ b/arch/x86/include/asm/kvm_para.h
> @@ -88,7 +88,6 @@ static inline long kvm_hypercall4(unsigned int nr, unsigned 
> long p1,
>  #ifdef CONFIG_KVM_GUEST
>  bool kvm_para_available(void);
>  unsigned int kvm_arch_para_features(void);
> -void __init kvm_guest_init(void);
>  void kvm_async_pf_task_wait(u32 token, int interrupt_kernel);
>  void kvm_async_pf_task_wake(u32 token);
>  u32 kvm_read_and_reset_pf_reason(void);
> @@ -103,7 +102,6 @@ static inline void kvm_spinlock_init(void)
>  #endif /* CONFIG_PARAVIRT_SPINLOCKS */
>  
>  #else /* CONFIG_KVM_GUEST */
> -#define kvm_guest_init() do {} while (0)
>  #define kvm_async_pf_task_wait(T, I) do {} while(0)
>  #define kvm_async_pf_task_wake(T) do {} while(0)
>  
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index 8bb9594d0761..d331b5060aa9 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -465,7 +465,7 @@ static void __init kvm_apf_trap_init(void)
>   update_intr_gate(X86_TRAP_PF, async_page_fault);
>  }
>  
> -void __init kvm_guest_init(void)
> +static void __init kvm_guest_init(void)
>  {
>   int i;
>  
> @@ -548,6 +548,7 @@ const struct hypervisor_x86 x86_hyper_kvm __refconst = {
>   .name   = "KVM",
>   .detect = kvm_detect,
>   .x2apic_available   = kvm_para_available,
> + .guest_late_init= kvm_guest_init,
>  };
>  EXPORT_SYMBOL_GPL(x86_hyper_kvm);
>  
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 0957dd73d127..578569481d87 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -1294,7 +1294,7 @@ void __init setup_arch(char **cmdline_p)
>  
>   io_apic_init_mappings();
>  
> - kvm_guest_init();
> + hypervisor_guest_late_init();

No principal objections, but could we please use a more obvious pattern showing 
that this is a callback, by calling it directly:

x86_hyper->guest_late_init();

Plus add a default empty function (which hypervisors can override). This avoids 
all the hidden conditions and wrappery.

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v1 06/27] x86/entry/64: Adapt assembly for PIE support

2017-10-20 Thread Ingo Molnar


* Thomas Garnier  wrote:

> >>*/
> >> - cmpq$.Lentry_SYSCALL_64_after_fastpath_call, (%rsp)
> >> + leaq.Lentry_SYSCALL_64_after_fastpath_call(%rip), %r11
> >> + cmpq%r11, (%rsp)
> >>   jne 1f

> > This patch seems to add extra overhead to the syscall fast-path even when 
> > PIE is
> > disabled, right?
> 
> It does add extra instructions when one is not possible, I preferred
> that over ifdefing but I can change it.

So my problem is, this pattern repeats in many other places as well, but 
sprinking 
various pieces of assembly code with #ifdefs would be very bad as well.

I have no good idea how to solve this.

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v1 06/27] x86/entry/64: Adapt assembly for PIE support

2017-10-20 Thread Ingo Molnar


* Thomas Garnier  wrote:

> Change the assembly code to use only relative references of symbols for the
> kernel to be PIE compatible.
> 
> Position Independent Executable (PIE) support will allow to extended the
> KASLR randomization range below the -2G memory limit.
> 
> Signed-off-by: Thomas Garnier 
> ---
>  arch/x86/entry/entry_64.S | 22 +++---
>  1 file changed, 15 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index 49167258d587..15bd5530d2ae 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -194,12 +194,15 @@ entry_SYSCALL_64_fastpath:
>   ja  1f  /* return -ENOSYS (already in 
> pt_regs->ax) */
>   movq%r10, %rcx
>  
> + /* Ensures the call is position independent */
> + leaqsys_call_table(%rip), %r11
> +
>   /*
>* This call instruction is handled specially in stub_ptregs_64.
>* It might end up jumping to the slow path.  If it jumps, RAX
>* and all argument registers are clobbered.
>*/
> - call*sys_call_table(, %rax, 8)
> + call*(%r11, %rax, 8)
>  .Lentry_SYSCALL_64_after_fastpath_call:
>  
>   movq%rax, RAX(%rsp)
> @@ -334,7 +337,8 @@ ENTRY(stub_ptregs_64)
>* RAX stores a pointer to the C function implementing the syscall.
>* IRQs are on.
>*/
> - cmpq$.Lentry_SYSCALL_64_after_fastpath_call, (%rsp)
> + leaq.Lentry_SYSCALL_64_after_fastpath_call(%rip), %r11
> + cmpq%r11, (%rsp)
>   jne 1f
>  
>   /*
> @@ -1172,7 +1176,8 @@ ENTRY(error_entry)
>   movl%ecx, %eax  /* zero extend */
>   cmpq%rax, RIP+8(%rsp)
>   je  .Lbstep_iret
> - cmpq$.Lgs_change, RIP+8(%rsp)
> + leaq.Lgs_change(%rip), %rcx
> + cmpq%rcx, RIP+8(%rsp)
>   jne .Lerror_entry_done
>  
>   /*
> @@ -1383,10 +1388,10 @@ ENTRY(nmi)
>* resume the outer NMI.
>*/
>  
> - movq$repeat_nmi, %rdx
> + leaqrepeat_nmi(%rip), %rdx
>   cmpq8(%rsp), %rdx
>   ja  1f
> - movq$end_repeat_nmi, %rdx
> + leaqend_repeat_nmi(%rip), %rdx
>   cmpq8(%rsp), %rdx
>   ja  nested_nmi_out
>  1:
> @@ -1440,7 +1445,8 @@ nested_nmi:
>   pushq   %rdx
>   pushfq
>   pushq   $__KERNEL_CS
> - pushq   $repeat_nmi
> + leaqrepeat_nmi(%rip), %rdx
> + pushq   %rdx
>  
>   /* Put stack back */
>   addq$(6*8), %rsp
> @@ -1479,7 +1485,9 @@ first_nmi:
>   addq$8, (%rsp)  /* Fix up RSP */
>   pushfq  /* RFLAGS */
>   pushq   $__KERNEL_CS/* CS */
> - pushq   $1f /* RIP */
> + pushq   %rax/* Support Position Independent Code */
> + leaq1f(%rip), %rax  /* RIP */
> + xchgq   %rax, (%rsp)/* Restore RAX, put 1f */
>   INTERRUPT_RETURN/* continues at repeat_nmi below */
>   UNWIND_HINT_IRET_REGS

This patch seems to add extra overhead to the syscall fast-path even when PIE 
is 
disabled, right?

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v1 01/27] x86/crypto: Adapt assembly for PIE support

2017-10-20 Thread Ingo Molnar


* Thomas Garnier  wrote:

> Change the assembly code to use only relative references of symbols for the
> kernel to be PIE compatible.
> 
> Position Independent Executable (PIE) support will allow to extended the
> KASLR randomization range below the -2G memory limit.

> diff --git a/arch/x86/crypto/aes-x86_64-asm_64.S 
> b/arch/x86/crypto/aes-x86_64-asm_64.S
> index 8739cf7795de..86fa068e5e81 100644
> --- a/arch/x86/crypto/aes-x86_64-asm_64.S
> +++ b/arch/x86/crypto/aes-x86_64-asm_64.S
> @@ -48,8 +48,12 @@
>  #define R10  %r10
>  #define R11  %r11
>  
> +/* Hold global for PIE suport */
> +#define RBASE%r12
> +
>  #define prologue(FUNC,KEY,B128,B192,r1,r2,r5,r6,r7,r8,r9,r10,r11) \
>   ENTRY(FUNC);\
> + pushq   RBASE;  \
>   movqr1,r2;  \
>   leaqKEY+48(r8),r9;  \
>   movqr10,r11;\
> @@ -74,54 +78,63 @@
>   movlr6 ## E,4(r9);  \
>   movlr7 ## E,8(r9);  \
>   movlr8 ## E,12(r9); \
> + popqRBASE;  \
>   ret;\
>   ENDPROC(FUNC);
>  
> +#define round_mov(tab_off, reg_i, reg_o) \
> + leaqtab_off(%rip), RBASE; \
> + movl(RBASE,reg_i,4), reg_o;
> +
> +#define round_xor(tab_off, reg_i, reg_o) \
> + leaqtab_off(%rip), RBASE; \
> + xorl(RBASE,reg_i,4), reg_o;
> +
>  #define round(TAB,OFFSET,r1,r2,r3,r4,r5,r6,r7,r8,ra,rb,rc,rd) \
>   movzbl  r2 ## H,r5 ## E;\
>   movzbl  r2 ## L,r6 ## E;\
> - movlTAB+1024(,r5,4),r5 ## E;\
> + round_mov(TAB+1024, r5, r5 ## E)\
>   movwr4 ## X,r2 ## X;\
> - movlTAB(,r6,4),r6 ## E; \
> + round_mov(TAB, r6, r6 ## E) \
>   roll$16,r2 ## E;\
>   shrl$16,r4 ## E;\
>   movzbl  r4 ## L,r7 ## E;\
>   movzbl  r4 ## H,r4 ## E;\
>   xorlOFFSET(r8),ra ## E; \
>   xorlOFFSET+4(r8),rb ## E;   \
> - xorlTAB+3072(,r4,4),r5 ## E;\
> - xorlTAB+2048(,r7,4),r6 ## E;\
> + round_xor(TAB+3072, r4, r5 ## E)\
> + round_xor(TAB+2048, r7, r6 ## E)\
>   movzbl  r1 ## L,r7 ## E;\
>   movzbl  r1 ## H,r4 ## E;\
> - movlTAB+1024(,r4,4),r4 ## E;\
> + round_mov(TAB+1024, r4, r4 ## E)\
>   movwr3 ## X,r1 ## X;\
>   roll$16,r1 ## E;\
>   shrl$16,r3 ## E;\
> - xorlTAB(,r7,4),r5 ## E; \
> + round_xor(TAB, r7, r5 ## E) \
>   movzbl  r3 ## L,r7 ## E;\
>   movzbl  r3 ## H,r3 ## E;\
> - xorlTAB+3072(,r3,4),r4 ## E;\
> - xorlTAB+2048(,r7,4),r5 ## E;\
> + round_xor(TAB+3072, r3, r4 ## E)\
> + round_xor(TAB+2048, r7, r5 ## E)\
>   movzbl  r1 ## L,r7 ## E;\
>   movzbl  r1 ## H,r3 ## E;\
>   shrl$16,r1 ## E;\
> - xorlTAB+3072(,r3,4),r6 ## E;\
> - movlTAB+2048(,r7,4),r3 ## E;\
> + round_xor(TAB+3072, r3, r6 ## E)\
> + round_mov(TAB+2048, r7, r3 ## E)\
>   movzbl  r1 ## L,r7 ## E;\
>   movzbl  r1 ## H,r1 ## E;\
> - xorlTAB+1024(,r1,4),r6 ## E;\
> - xorlTAB(,r7,4),r3 ## E; \
> + round_xor(TAB+1024, r1, r6 ## E)\
> + round_xor(TAB, r7, r3 ## E) \
>   movzbl  r2 ## H,r1 ## E;\
>   movzbl  r2 ## L,r7 ## E;\
>   shrl$16,r2 ## E;\
> - xorlTAB+3072(,r1,4),r3 ## E;\
> - xorlTAB+2048(,r7,4),r4 ## E;\
> + round_xor(TAB+3072, r1, r3 ## E)\
> + round_xor(TAB+2048, r7, r4 ## E)\
>   movzbl  r2 ## H,r1 ## E;\
>   movzbl  r2 ## L,r2 ## E;\
>   xorlOFFSET+8(r8),rc ## E;   \
>   xorlOFFSET+12(r8),rd ## E;  \
> - xorlTAB+1024(,r1,4),r3 ## E;\
> - xorlTAB(,r2,4),r4 ## E;
> + round_xor(TAB+1024, r1, r3 ## E)\
> + round_xor(TAB, r2, r4 ## E)

This appears to be adding unconditional overhead to a function that was moved 
to 
assembly to improve its performance.

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] x86: PIE support and option to extend KASLR randomization

2017-10-20 Thread Ingo Molnar


* Pavel Machek <pa...@ucw.cz> wrote:

> On Mon 2017-09-25 09:33:42, Ingo Molnar wrote:
> > 
> > * Pavel Machek <pa...@ucw.cz> wrote:
> > 
> > > > For example, there would be collision with regular user-space mappings, 
> > > > right? 
> > > > Can local unprivileged users use mmap(MAP_FIXED) probing to figure out 
> > > > where 
> > > > the kernel lives?
> > > 
> > > Local unpriviledged users can probably get your secret bits using cache 
> > > probing 
> > > and jump prediction buffers.
> > > 
> > > Yes, you don't want to leak the information using mmap(MAP_FIXED), but 
> > > CPU will 
> > > leak it for you, anyway.
> > 
> > Depends on the CPU I think, and CPU vendors are busy trying to mitigate 
> > this 
> > angle.
> 
> I believe any x86 CPU running Linux will leak it. And with CPU vendors
> putting "artifical inteligence" into branch prediction, no, I don't
> think it is going to get better.
> 
> That does not mean we shoudl not prevent mmap() info leak, but...

That might or might not be so, but there's a world of a difference between
running a relatively long statistical attack figuring out the kernel's
location, versus being able to programmatically probe the kernel's location
by using large MAP_FIXED user-space mmap()s, within a few dozen microseconds
or so and a 100% guaranteed, non-statistical result.

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH 1/1] sched/cputime: do not decrease steal time after live migration on xen

2017-10-10 Thread Ingo Molnar


(Cc:-ed more gents involved in kernel/sched/cputime.c work. Full patch quoted 
below.)

* Dongli Zhang  wrote:

> After guest live migration on xen, steal time in /proc/stat
> (cpustat[CPUTIME_STEAL]) might decrease because steal returned by
> paravirt_steal_clock() might be less than this_rq()->prev_steal_time.
> 
> For instance, steal time of each vcpu is 335 before live migration.
> 
> cpu  198 0 368 200064 1962 0 0 1340 0 0
> cpu0 38 0 81 50063 492 0 0 335 0 0
> cpu1 65 0 97 49763 634 0 0 335 0 0
> cpu2 38 0 81 50098 462 0 0 335 0 0
> cpu3 56 0 107 50138 374 0 0 335 0 0
> 
> After live migration, steal time is reduced to 312.
> 
> cpu  200 0 370 200330 1971 0 0 1248 0 0
> cpu0 38 0 82 50123 500 0 0 312 0 0
> cpu1 65 0 97 49832 634 0 0 312 0 0
> cpu2 39 0 82 50167 462 0 0 312 0 0
> cpu3 56 0 107 50207 374 0 0 312 0 0
> 
> The code in this patch is borrowed from do_stolen_accounting() which has
> already been removed from linux source code since commit ecb23dc6 ("xen:
> add steal_clock support on x86").
> 
> Similar and more severe issue would impact prior linux 4.8-4.10 as
> discussed by Michael Las at
> https://0xstubs.org/debugging-a-flaky-cpu-steal-time-counter-on-a-paravirtualized-xen-guest.
> Unlike the issue discussed by Michael Las which would overflow steal time
> and lead to 100% st usage in top command for linux 4.8-4.10, the issue for
> linux 4.11+ would only decrease but not overflow steal time after live
> migration.
> 
> References: 
> https://0xstubs.org/debugging-a-flaky-cpu-steal-time-counter-on-a-paravirtualized-xen-guest
> Signed-off-by: Dongli Zhang 
> ---
>  kernel/sched/cputime.c | 13 ++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
> index 14d2dbf..57d09cab 100644
> --- a/kernel/sched/cputime.c
> +++ b/kernel/sched/cputime.c
> @@ -238,10 +238,17 @@ static __always_inline u64 
> steal_account_process_time(u64 maxtime)
>  {
>  #ifdef CONFIG_PARAVIRT
>   if (static_key_false(_steal_enabled)) {
> - u64 steal;
> + u64 steal, steal_time;
> + s64 steal_delta;
> +
> + steal_time = paravirt_steal_clock(smp_processor_id());
> + steal = steal_delta = steal_time - this_rq()->prev_steal_time;
> +
> + if (unlikely(steal_delta < 0)) {
> + this_rq()->prev_steal_time = steal_time;
> + return 0;
> + }
>  
> - steal = paravirt_steal_clock(smp_processor_id());
> - steal -= this_rq()->prev_steal_time;
>   steal = min(steal, maxtime);
>   account_steal_time(steal);
>   this_rq()->prev_steal_time += steal;
> -- 
> 2.7.4
> 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] x86: PIE support and option to extend KASLR randomization

2017-09-25 Thread Ingo Molnar


* Pavel Machek  wrote:

> > For example, there would be collision with regular user-space mappings, 
> > right? 
> > Can local unprivileged users use mmap(MAP_FIXED) probing to figure out 
> > where 
> > the kernel lives?
> 
> Local unpriviledged users can probably get your secret bits using cache 
> probing 
> and jump prediction buffers.
> 
> Yes, you don't want to leak the information using mmap(MAP_FIXED), but CPU 
> will 
> leak it for you, anyway.

Depends on the CPU I think, and CPU vendors are busy trying to mitigate this 
angle.

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] x86: PIE support and option to extend KASLR randomization

2017-09-23 Thread Ingo Molnar

* H. Peter Anvin  wrote:

> We do need to consider how we want modules to fit into whatever model we
> choose, though.  They can be adjacent, or we could go with a more
> traditional dynamic link model where the modules can be separate, and
> chained together with the main kernel via the GOT.

So I believe we should start with 'adjacent'. The thing is, having modules 
separately randomized mostly helps if any of the secret locations fails and
we want to prevent hopping from one to the other. But if one the 
kernel-privileged
secret location fails then KASLR has already failed to a significant degree...

So I think the large-PIC model for modules does not buy us any real advantages 
in 
practice, and the disadvantages of large-PIC are real and most Linux users have 
to 
pay that cost unconditionally, as distro kernels have half of their kernel 
functionality living in modules.

But I do see fundamental value in being able to hide the kernel somewhere in a 
~48 
bits address space, especially if we also implement Linus's suggestion to 
utilize 
the lower bits as well. 0..281474976710656 is a nicely large range and will get 
larger with time.

But it should all be done smartly and carefully:

For example, there would be collision with regular user-space mappings, right?
Can local unprivileged users use mmap(MAP_FIXED) probing to figure out where
the kernel lives?

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] x86: PIE support and option to extend KASLR randomization

2017-09-23 Thread Ingo Molnar

* H. Peter Anvin <h...@zytor.com> wrote:

> On 09/22/17 09:32, Ingo Molnar wrote:
> > 
> > BTW., I think things improved with ORC because with ORC we have RBP as an 
> > extra 
> > register and with PIE we lose RBX - so register pressure in code generation 
> > is 
> > lower.
> > 
> 
> We lose EBX on 32 bits, but we don't lose RBX on 64 bits - since x86-64
> has RIP-relative addressing there is no need for a dedicated PIC register.

Indeed, but we'd use a new register _a lot_ for constructs, transforming:

  movr9,QWORD PTR [r11*8-0x7e3da060] (8 bytes)

into:

  learbx,[rip+] (7 bytes)
  movr9,QWORD PTR [rbx+r11*8] (6 bytes)

... which I suppose is quite close to (but not the same as) 'losing' RBX.

Of course the compiler can pick other registers as well, not that it matters 
much 
to register pressure in larger functions in the end. Plus if the compiler has 
to 
pick a callee-saved register there's the additional saving/restoring overhead 
of 
that as well.

Right?

> I'm somewhat confused how we can have as much as almost 1% overhead.  I 
> suspect 
> that we end up making a GOT and maybe even a PLT for no good reason.

So the above transformation alone would explain a good chunk of the overhead I 
think.

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] x86: PIE support and option to extend KASLR randomization

2017-09-23 Thread Ingo Molnar


* Thomas Garnier  wrote:

> >   2) we first implement the additional entropy bits that Linus suggested.
> >
> > does this work for you?
> 
> Sure, I can look at how feasible that is. If it is, can I send
> everything as part of the same patch set? The additional entropy would
> be enabled for all KASLR but PIE will be off-by-default of course.

Sure, can all be part of the same series.

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] x86: PIE support and option to extend KASLR randomization

2017-09-22 Thread Ingo Molnar

* Thomas Garnier <thgar...@google.com> wrote:

> On Thu, Sep 21, 2017 at 8:59 AM, Ingo Molnar <mi...@kernel.org> wrote:
> >
> > ( Sorry about the delay in answering this. I could blame the delay on the 
> > merge
> >   window, but in reality I've been procrastinating this is due to the 
> > permanent,
> >   non-trivial impact PIE has on generated C code. )
> >
> > * Thomas Garnier <thgar...@google.com> wrote:
> >
> >> 1) PIE sometime needs two instructions to represent a single
> >> instruction on mcmodel=kernel.
> >
> > What again is the typical frequency of this occurring in an x86-64 defconfig
> > kernel, with the very latest GCC?
> 
> I am not sure what is the best way to measure that.

If this is the dominant factor then 'sizeof vmlinux' ought to be enough:

> With ORC: PIE .text is 0.814224% than baseline

I.e. the overhead is +0.81% in both size and (roughly) in number of 
instructions 
executed.

BTW., I think things improved with ORC because with ORC we have RBP as an extra 
register and with PIE we lose RBX - so register pressure in code generation is 
lower.

Ok, I suspect we can try it, but my preconditions for merging it would be:

  1) Linus doesn't NAK it (obviously)
  2) we first implement the additional entropy bits that Linus suggested.

does this work for you?

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] x86: PIE support and option to extend KASLR randomization

2017-09-21 Thread Ingo Molnar

( Sorry about the delay in answering this. I could blame the delay on the merge 
  window, but in reality I've been procrastinating this is due to the permanent,
  non-trivial impact PIE has on generated C code. )

* Thomas Garnier  wrote:

> 1) PIE sometime needs two instructions to represent a single
> instruction on mcmodel=kernel.

What again is the typical frequency of this occurring in an x86-64 defconfig 
kernel, with the very latest GCC?

Also, to make sure: which unwinder did you use for your measurements, 
frame-pointers or ORC? Please use ORC only for future numbers, as
frame-pointers is obsolete from a performance measurement POV.

> 2) GCC does not optimize switches in PIE in order to reduce relocations:

Hopefully this can either be fixed in GCC or at least influenced via a compiler 
switch in the future.

> The switches are the biggest increase on small functions but I don't
> think they represent a large portion of the difference (number 1 is).

Ok.

> A side note, while testing gcc 7.2.0 on hackbench I have seen the PIE
> kernel being faster by 1% across multiple runs (comparing 50 runs done
> across 5 reboots twice). I don't think PIE is faster than a
> mcmodel=kernel but recent versions of gcc makes them fairly similar.

So I think we are down to an overhead range where the inherent noise (both 
random 
and systematic one) in 'hackbench' overwhelms the signal we are trying to 
measure.

So I think it's the kernel .text size change that is the best noise-free proxy 
for 
the overhead impact of PIE.

It doesn't hurt to double check actual real performance as well, just don't 
expect 
there to be much of a signal for anything but fully cached microbenchmark 
workloads.

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] linux-next: manual merge of the xen-tip tree with the tip tree

2017-08-31 Thread Ingo Molnar

* Thomas Gleixner  wrote:

> --- a/arch/x86/xen/enlighten_pv.c
> +++ b/arch/x86/xen/enlighten_pv.c
> @@ -586,6 +586,68 @@ static void xen_write_ldt_entry(struct d
>   preempt_enable();
>  }
>  
> +#ifdef CONFIG_X86_64
> +static struct {
> + void (*orig)(void);
> + void (*xen)(void);
> + bool ist_okay;
> +} trap_array[] = {
> + { debug, xen_xendebug, true },
> + { int3, xen_xenint3, true },
> + { double_fault, xen_double_fault, true },
> +#ifdef CONFIG_X86_MCE
> + { machine_check, xen_machine_check, true },
> +#endif
> + { nmi, xen_nmi, true },
> + { overflow, xen_overflow, false },
> +#ifdef CONFIG_IA32_EMULATION
> + { entry_INT80_compat, xen_entry_INT80_compat, false },
> +#endif
> + { page_fault, xen_page_fault, false },
> + { divide_error, xen_divide_error, false },
> + { bounds, xen_bounds, false },
> + { invalid_op, xen_invalid_op, false },
> + { device_not_available, xen_device_not_available, false },
> + { coprocessor_segment_overrun, xen_coprocessor_segment_overrun, false },
> + { invalid_TSS, xen_invalid_TSS, false },
> + { segment_not_present, xen_segment_not_present, false },
> + { stack_segment, xen_stack_segment, false },
> + { general_protection, xen_general_protection, false },
> + { spurious_interrupt_bug, xen_spurious_interrupt_bug, false },
> + { coprocessor_error, xen_coprocessor_error, false },
> + { alignment_check, xen_alignment_check, false },
> + { simd_coprocessor_error, xen_simd_coprocessor_error, false },
> +#ifdef CONFIG_TRACING
> + { trace_page_fault, xen_trace_page_fault, false },
> +#endif
> +};

Low prio nitpicking, could we please write such table based initializers in a 
vertically organized, tabular fashion:

> + { debug,xen_xendebug,   
> true },
> + { int3, xen_xenint3,
> true },
> + { double_fault, xen_double_fault,   
> true },
> +#ifdef CONFIG_X86_MCE
> + { machine_check,xen_machine_check,  
> true },
> +#endif
> + { nmi,  xen_nmi,
> true },
> + { overflow, xen_overflow,   
> false },
> +#ifdef CONFIG_IA32_EMULATION
> + { entry_INT80_compat,   xen_entry_INT80_compat, 
> false },
> +#endif
> + { page_fault,   xen_page_fault, 
> false },
> + { divide_error, xen_divide_error,   
> false },
> + { bounds,   xen_bounds, 
> false },
> + { invalid_op,   xen_invalid_op, 
> false },
> + { device_not_available, xen_device_not_available,   
> false },
> + { coprocessor_segment_overrun,  xen_coprocessor_segment_overrun,
> false },
> + { invalid_TSS,  xen_invalid_TSS,
> false },
> + { segment_not_present,  xen_segment_not_present,
> false },
> + { stack_segment,xen_stack_segment,  
> false },
> + { general_protection,   xen_general_protection, 
> false },
> + { spurious_interrupt_bug,   xen_spurious_interrupt_bug, 
> false },
> + { coprocessor_error,xen_coprocessor_error,  
> false },
> + { alignment_check,  xen_alignment_check,
> false },
> + { simd_coprocessor_error,   xen_simd_coprocessor_error, 
> false },
> +#ifdef CONFIG_TRACING
> + { trace_page_fault, xen_trace_page_fault,   
> false },
> +#endif

... as to me such a table is 100 times more readable - YMMV.

(If checkpatch whinges about col80 then ignore it.)

> +
> +static bool get_trap_addr(unsigned long *addr, unsigned int ist)
> +{
> + unsigned int nr;
> + bool ist_okay = false;
> +
> + /*
> +  * Replace trap handler addresses by Xen specific ones.
> +  * Check for known traps using IST and whitelist them.
> +  * The debugger ones are the only ones we care about.
> +  * Xen will handle faults like double_fault, * so we should never see
> +  * them.  Warn if there's an unexpected IST-using fault handler.
> +  */
> + for (nr = 0; nr < ARRAY_SIZE(trap_array); nr++)
> + if (*addr == (unsigned long)trap_array[nr].orig) {
> + *addr = (unsigned long)trap_array[nr].xen;
> + ist_okay = trap_array[nr].ist_okay;
> + break;
> + }

And here I think we could do a more readable variant:

  static bool get_trap_addr(void **addr, unsigned int ist)
  ...
struct

Re: [Xen-devel] x86: PIE support and option to extend KASLR randomization

2017-08-25 Thread Ingo Molnar


* Thomas Garnier  wrote:

> With the fix for function tracing, the hackbench results have an
> average of +0.8 to +1.4% (from +8% to +10% before). With a default
> configuration, the numbers are closer to 0.8%.
> 
> On the .text size, with gcc 4.9 I see +0.8% on default configuration
> and +1.180% on the ubuntu configuration.

A 1% text size increase is still significant. Could you look at the 
disassembly, 
where does the size increase come from?

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] x86: PIE support and option to extend KASLR randomization

2017-08-17 Thread Ingo Molnar

* Thomas Garnier  wrote:

> > > -model=small/medium assume you are on the low 32-bit. It generates 
> > > instructions where the virtual addresses have the high 32-bit to be zero.
> >
> > How are these assumptions hardcoded by GCC? Most of the instructions should 
> > be 
> > relocatable straight away, as most call/jump/branch instructions are 
> > RIP-relative.
> 
> I think PIE is capable to use relative instructions well. mcmodel=large 
> assumes 
> symbols can be anywhere.

So if the numbers in your changelog and Kconfig text cannot be trusted, there's 
this description of the size impact which I suspect is less susceptible to 
measurement error:

+ The kernel and modules will generate slightly more assembly (1 to 2%
+ increase on the .text sections). The vmlinux binary will be
+ significantly smaller due to less relocations.

... but describing a 1-2% kernel text size increase as "slightly more assembly" 
shows a gratituous disregard to kernel code generation quality! In reality 
that's 
a huge size increase that in most cases will almost directly transfer to a 1-2% 
slowdown for kernel intense workloads.

Where does that size increase come from, if PIE is capable of using relative 
instructins well? Does it come from the loss of a generic register and the 
resulting increase in register pressure, stack spills, etc.?

So I'm still unhappy about this all, and about the attitude surrounding it.

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] x86: PIE support and option to extend KASLR randomization

2017-08-16 Thread Ingo Molnar


* Thomas Garnier <thgar...@google.com> wrote:

> On Tue, Aug 15, 2017 at 12:56 AM, Ingo Molnar <mi...@kernel.org> wrote:
> >
> > * Thomas Garnier <thgar...@google.com> wrote:
> >
> >> > Do these changes get us closer to being able to build the kernel as truly
> >> > position independent, i.e. to place it anywhere in the valid x86-64 
> >> > address
> >> > space? Or any other advantages?
> >>
> >> Yes, PIE allows us to put the kernel anywhere in memory. It will allow us 
> >> to
> >> have a full randomized address space where position and order of sections 
> >> are
> >> completely random. There is still some work to get there but being able to 
> >> build
> >> a PIE kernel is a significant step.
> >
> > So I _really_ dislike the whole PIE approach, because of the huge slowdown:
> >
> > +config RANDOMIZE_BASE_LARGE
> > +   bool "Increase the randomization range of the kernel image"
> > +   depends on X86_64 && RANDOMIZE_BASE
> > +   select X86_PIE
> > +   select X86_MODULE_PLTS if MODULES
> > +   default n
> > +   ---help---
> > + Build the kernel as a Position Independent Executable (PIE) and
> > + increase the available randomization range from 1GB to 3GB.
> > +
> > + This option impacts performance on kernel CPU intensive workloads 
> > up
> > + to 10% due to PIE generated code. Impact on user-mode processes 
> > and
> > + typical usage would be significantly less (0.50% when you build 
> > the
> > + kernel).
> > +
> > + The kernel and modules will generate slightly more assembly (1 to 
> > 2%
> > + increase on the .text sections). The vmlinux binary will be
> > + significantly smaller due to less relocations.
> >
> > To put 10% kernel overhead into perspective: enabling this option wipes out 
> > about
> > 5-10 years worth of painstaking optimizations we've done to keep the kernel 
> > fast
> > ... (!!)
> 
> Note that 10% is the high-bound of a CPU intensive workload.

Note that the 8-10% hackbench or even a 2%-4% range would be 'huge' in terms of 
modern kernel performance. In many cases we are literally applying cycle level 
optimizations that are barely measurable. A 0.1% speedup in linear execution 
speed 
is already a big success.

> I am going to start doing performance testing on -mcmodel=large to see if it 
> is 
> faster than -fPIE.

Unfortunately mcmodel=large looks pretty heavy too AFAICS, at the machine 
instruction level.

Function calls look like this:

 -mcmodel=medium:

   757:   e8 98 ff ff ff  callq  6f4 

 -mcmodel=large

   77b:   48 b8 10 f7 df ff ffmovabs $0xffdff710,%rax
   782:   ff ff ff 
   785:   48 8d 04 03 lea(%rbx,%rax,1),%rax
   789:   ff d0   callq  *%rax

And we'd do this for _EVERY_ function call in the kernel. That kind of crap is 
totally unacceptable.

> > I think the fundamental flaw is the assumption that we need a PIE 
> > executable 
> > to have a freely relocatable kernel on 64-bit CPUs.
> >
> > Have you considered a kernel with -mcmodel=small (or medium) instead of 
> > -fpie 
> > -mcmodel=large? We can pick a random 2GB window in the (non-kernel) 
> > canonical 
> > x86-64 address space to randomize the location of kernel text. The location 
> > of 
> > modules can be further randomized within that 2GB window.
> 
> -model=small/medium assume you are on the low 32-bit. It generates 
> instructions 
> where the virtual addresses have the high 32-bit to be zero.

How are these assumptions hardcoded by GCC? Most of the instructions should be 
relocatable straight away, as most call/jump/branch instructions are 
RIP-relative.

I.e. is there no GCC code generation mode where code can be placed anywhere in 
the 
canonical address space, yet call and jump distance is within 31 bits so that 
the 
generated code is fast?

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] x86: PIE support and option to extend KASLR randomization

2017-08-15 Thread Ingo Molnar


* Thomas Garnier  wrote:

> > Do these changes get us closer to being able to build the kernel as truly 
> > position independent, i.e. to place it anywhere in the valid x86-64 address 
> > space? Or any other advantages?
> 
> Yes, PIE allows us to put the kernel anywhere in memory. It will allow us to 
> have a full randomized address space where position and order of sections are 
> completely random. There is still some work to get there but being able to 
> build 
> a PIE kernel is a significant step.

So I _really_ dislike the whole PIE approach, because of the huge slowdown:

+config RANDOMIZE_BASE_LARGE
+   bool "Increase the randomization range of the kernel image"
+   depends on X86_64 && RANDOMIZE_BASE
+   select X86_PIE
+   select X86_MODULE_PLTS if MODULES
+   default n
+   ---help---
+ Build the kernel as a Position Independent Executable (PIE) and
+ increase the available randomization range from 1GB to 3GB.
+
+ This option impacts performance on kernel CPU intensive workloads up
+ to 10% due to PIE generated code. Impact on user-mode processes and
+ typical usage would be significantly less (0.50% when you build the
+ kernel).
+
+ The kernel and modules will generate slightly more assembly (1 to 2%
+ increase on the .text sections). The vmlinux binary will be
+ significantly smaller due to less relocations.

To put 10% kernel overhead into perspective: enabling this option wipes out 
about 
5-10 years worth of painstaking optimizations we've done to keep the kernel 
fast 
... (!!)

I think the fundamental flaw is the assumption that we need a PIE executable to 
have a freely relocatable kernel on 64-bit CPUs.

Have you considered a kernel with -mcmodel=small (or medium) instead of -fpie 
-mcmodel=large? We can pick a random 2GB window in the (non-kernel) canonical 
x86-64 address space to randomize the location of kernel text. The location of 
modules can be further randomized within that 2GB window.

It should have far less performance impact than the register-losing and 
overhead-inducing -fpie / -mcmodel=large (for modules) execution models.

My quick guess is tha the performance impact might be close to zero in fact.

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v4] xen: get rid of paravirt op adjust_exception_frame

2017-08-12 Thread Ingo Molnar


* Juergen Gross <jgr...@suse.com> wrote:

> When running as Xen pv-guest the exception frame on the stack contains
> %r11 and %rcx additional to the other data pushed by the processor.
> 
> Instead of having a paravirt op being called for each exception type
> prepend the Xen specific code to each exception entry. When running as
> Xen pv-guest just use the exception entry with prepended instructions,
> otherwise use the entry without the Xen specific code.
> 
> Signed-off-by: Juergen Gross <jgr...@suse.com>
> ---
>  arch/x86/entry/entry_64.S | 23 ++---
>  arch/x86/entry/entry_64_compat.S  |  1 -
>  arch/x86/include/asm/paravirt.h   |  5 --
>  arch/x86/include/asm/paravirt_types.h |  4 --
>  arch/x86/include/asm/proto.h  |  3 ++
>  arch/x86/include/asm/traps.h  | 33 ++--
>  arch/x86/kernel/asm-offsets_64.c  |  1 -
>  arch/x86/kernel/paravirt.c|  3 --
>  arch/x86/xen/enlighten_pv.c   | 96 
> +++
>  arch/x86/xen/irq.c|  3 --
>  arch/x86/xen/xen-asm_64.S | 45 ++--
>  arch/x86/xen/xen-ops.h    |  1 -
>  12 files changed, 140 insertions(+), 78 deletions(-)

Acked-by: Ingo Molnar <mi...@kernel.org>

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v2 0/3] fix xen hvm guest with kaslr enabled

2017-08-11 Thread Ingo Molnar


* Juergen Gross <jgr...@suse.com> wrote:

> On 08/08/17 16:00, Boris Ostrovsky wrote:
> > On 08/08/2017 02:46 AM, Juergen Gross wrote:
> >> On 28/07/17 12:23, Juergen Gross wrote:
> >>> This patch series fixes a regression introduced in 4.13-rc1: A Xen
> >>> HVM guest with KASLR enabled wouldn't boot any longer due to the usage
> >>> of __va() before kernel_randomize_memory() was called.
> >>>
> >>> Changes in V2:
> >>> - patch 1: test for x86_hyper being not NULL
> >>>
> >>> Juergen Gross (3):
> >>>   x86: provide an init_mem_mapping hypervisor hook
> >>>   xen: split up xen_hvm_init_shared_info()
> >>>   xen: fix hvm guest with kaslr enabled
> >>>
> >>>  arch/x86/include/asm/hypervisor.h | 10 +++
> >>>  arch/x86/mm/init.c|  3 ++
> >>>  arch/x86/xen/enlighten_hvm.c  | 59 
> >>> ---
> >>>  3 files changed, 50 insertions(+), 22 deletions(-)
> >>>
> >> Could I have some feedback, please?
> >>
> >> I'd like to get this regression fixed in 4.13.
> >>
> >> In case nobody objects this week I'll just add the patches to the Xen
> >> tree for rc5.
> > 
> > 
> > As I said before I think .init_mem_mapping() could live in
> > x86_platform_ops() but this works too, so
> > 
> > Reviewed-by: Boris Ostrovsky <boris.ostrov...@oracle.com>
> > 
> > But this still wants x86 maintainers' ACK.
> 
> x86 maintainers, could you please comment on at least patch 1?

LGTM:

Acked-by: Ingo Molnar <mi...@kernel.org>

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] x86: PIE support and option to extend KASLR randomization

2017-08-11 Thread Ingo Molnar

* Thomas Garnier  wrote:

> Changes:
>  - v2:
>- Add support for global stack cookie while compiler default to fs without
>  mcmodel=kernel
>- Change patch 7 to correctly jump out of the identity mapping on kexec 
> load
>  preserve.
> 
> These patches make the changes necessary to build the kernel as Position
> Independent Executable (PIE) on x86_64. A PIE kernel can be relocated below
> the top 2G of the virtual address space. It allows to optionally extend the
> KASLR randomization range from 1G to 3G.

So this:

 61 files changed, 923 insertions(+), 299 deletions(-)

... is IMHO an _awful_ lot of churn and extra complexity in pretty fragile 
pieces 
of code, to gain what appears to be only ~1.5 more bits of randomization!

Do these changes get us closer to being able to build the kernel as truly 
position 
independent, i.e. to place it anywhere in the valid x86-64 address space? Or 
any 
other advantages?

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH 4.12 26/84] x86/xen/efi: Initialize only the EFI struct members used by Xen

2017-07-20 Thread Ingo Molnar


* Greg Kroah-Hartman <gre...@linuxfoundation.org> wrote:

> On Thu, Jul 20, 2017 at 10:39:10AM +0200, Ingo Molnar wrote:
> > 
> > * Daniel Kiper <daniel.ki...@oracle.com> wrote:
> > 
> > > Hey Greg,
> > > 
> > > On Wed, Jul 19, 2017 at 11:43:32AM +0200, Greg Kroah-Hartman wrote:
> > > > 4.12-stable review patch.  If anyone has any objections, please let me 
> > > > know.
> > > 
> > > Why did you skip this patch for 4.11? IMO it should be applied there too.
> > 
> > The thing is, this patch should probaly not even be in v4.12, as it should 
> > only 
> > make any difference if there's a separate _bug_ in the kernel.
> > 
> > This patch makes things more robust going forward, but I question that it 
> > needs to 
> > be in -stable.
> 
> Yeah, good point, I'm going to go drop it entirely from the 4.12-stable
> tree as it obviously isn't stable material, sorry for not catching that
> before.

I should have caught the tag as well when applying the upstream patch to begin 
with.

Thanks!

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH 4.12 26/84] x86/xen/efi: Initialize only the EFI struct members used by Xen

2017-07-20 Thread Ingo Molnar

* Daniel Kiper  wrote:

> Hey Greg,
> 
> On Wed, Jul 19, 2017 at 11:43:32AM +0200, Greg Kroah-Hartman wrote:
> > 4.12-stable review patch.  If anyone has any objections, please let me know.
> 
> Why did you skip this patch for 4.11? IMO it should be applied there too.

The thing is, this patch should probaly not even be in v4.12, as it should only 
make any difference if there's a separate _bug_ in the kernel.

This patch makes things more robust going forward, but I question that it needs 
to 
be in -stable.

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v9 00/38] x86: Secure Memory Encryption (AMD)

2017-07-08 Thread Ingo Molnar

* Tom Lendacky  wrote:

> This patch series provides support for AMD's new Secure Memory Encryption 
> (SME)
> feature.

I'm wondering, what's the typical performance hit to DRAM access latency when 
SME 
is enabled?

On that same note, if the performance hit is noticeable I'd expect SME to not 
be 
enabled in native kernels typically - but still it looks like a useful hardware 
feature. Since it's controlled at the page table level, have you considered 
allowing SME-activated vmas via mmap(), even on kernels that are otherwise not 
using encrypted DRAM?

One would think that putting encryption keys into such encrypted RAM regions 
would 
generally improve robustness against various physical space attacks that want 
to 
extract keys but don't have full control of the CPU.

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH 2/2] x86/xen/efi: Init only efi struct members used by Xen

2017-06-21 Thread Ingo Molnar


* Daniel Kiper  wrote:

> -static const struct efi efi_xen __initconst = {
> - .systab   = NULL, /* Initialized later. */
> - .runtime_version  = 0,/* Initialized later. */
> - .mps  = EFI_INVALID_TABLE_ADDR,
> - .acpi = EFI_INVALID_TABLE_ADDR,
> - .acpi20   = EFI_INVALID_TABLE_ADDR,
> - .smbios   = EFI_INVALID_TABLE_ADDR,
> - .smbios3  = EFI_INVALID_TABLE_ADDR,
> - .sal_systab   = EFI_INVALID_TABLE_ADDR,
> - .boot_info= EFI_INVALID_TABLE_ADDR,
> - .hcdp = EFI_INVALID_TABLE_ADDR,
> - .uga  = EFI_INVALID_TABLE_ADDR,
> - .uv_systab= EFI_INVALID_TABLE_ADDR,
> - .fw_vendor= EFI_INVALID_TABLE_ADDR,
> - .runtime  = EFI_INVALID_TABLE_ADDR,
> - .config_table = EFI_INVALID_TABLE_ADDR,
> - .get_time = xen_efi_get_time,
> - .set_time = xen_efi_set_time,
> - .get_wakeup_time  = xen_efi_get_wakeup_time,
> - .set_wakeup_time  = xen_efi_set_wakeup_time,
> - .get_variable = xen_efi_get_variable,
> - .get_next_variable= xen_efi_get_next_variable,
> - .set_variable = xen_efi_set_variable,
> - .query_variable_info  = xen_efi_query_variable_info,
> - .update_capsule   = xen_efi_update_capsule,
> - .query_capsule_caps   = xen_efi_query_capsule_caps,
> - .get_next_high_mono_count = xen_efi_get_next_high_mono_count,
> - .reset_system = xen_efi_reset_system,
> - .set_virtual_address_map  = NULL, /* Not used under Xen. */
> - .flags= 0 /* Initialized later. */
> -};
> -
>  static efi_system_table_t __init *xen_efi_probe(void)
>  {
>   struct xen_platform_op op = {
> @@ -102,7 +70,18 @@ static efi_system_table_t __init *xen_efi_probe(void)
>  
>   /* Here we know that Xen runs on EFI platform. */
>  
> - efi = efi_xen;
> + efi.get_time = xen_efi_get_time;
> + efi.set_time = xen_efi_set_time;
> + efi.get_wakeup_time = xen_efi_get_wakeup_time;
> + efi.set_wakeup_time = xen_efi_set_wakeup_time;
> + efi.get_variable = xen_efi_get_variable;
> + efi.get_next_variable = xen_efi_get_next_variable;
> + efi.set_variable = xen_efi_set_variable;
> + efi.query_variable_info = xen_efi_query_variable_info;
> + efi.update_capsule = xen_efi_update_capsule;
> + efi.query_capsule_caps = xen_efi_query_capsule_caps;
> + efi.get_next_high_mono_count = xen_efi_get_next_high_mono_count;
> + efi.reset_system = xen_efi_reset_system;

This is a step back stylistically, as you lost the nice vertical tabulation of 
the 
original initializer ...

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v2] xen/mce: don't issue error message for failed /dev/mcelog registration

2017-06-13 Thread Ingo Molnar


* Juergen Gross  wrote:

> When running under Xen as dom0 /dev/mcelog is being registered by Xen
> instead of the normal mcelog driver. Avoid an error message being
> issued by the mcelog driver in this case. Instead issue an informative
> message that Xen has registered the device.
> 
> Signed-off-by: Juergen Gross 
> ---
>  arch/x86/kernel/cpu/mcheck/dev-mcelog.c | 11 +--
>  drivers/xen/mcelog.c|  2 ++
>  2 files changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/mcheck/dev-mcelog.c 
> b/arch/x86/kernel/cpu/mcheck/dev-mcelog.c
> index 9c632cb88546..4eb5f7d1d593 100644
> --- a/arch/x86/kernel/cpu/mcheck/dev-mcelog.c
> +++ b/arch/x86/kernel/cpu/mcheck/dev-mcelog.c
> @@ -388,9 +388,16 @@ static __init int dev_mcelog_init_device(void)
>   /* register character device /dev/mcelog */
>   err = misc_register(_chrdev_device);
>   if (err) {
> - pr_err("Unable to init device /dev/mcelog (rc: %d)\n", err);
> - return err;
> + if (err == -EBUSY)
> + /* Xen dom0 might have registered the device already. */
> + pr_info("Unable to init device /dev/mcelog, already 
> registered");
> + else {
> + pr_err("Unable to init device /dev/mcelog (rc: %d)\n",
> +err);
> + return err;
> + }

Please only use balanced curly braces in conditional statements.

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH] x86/mm: Split read_cr3() into read_cr3_pa() and __read_cr3()

2017-06-13 Thread Ingo Molnar


* Andy Lutomirski  wrote:

> The kernel has several code paths that read CR3.  Most of them assume that
> CR3 contains the PGD's physical address, whereas some of them awkwardly
> use PHYSICAL_PAGE_MASK to mask off low bits.
> 
> Add explicit mask macros for CR3 and convert all of the CR3 readers.
> This will keep them from breaking when PCID is enabled.
> 
> Cc: Tom Lendacky 
> Cc: Juergen Gross 
> Cc: xen-devel 
> Cc: Boris Ostrovsky 
> Signed-off-by: Andy Lutomirski 
> ---
> 
> Hi Ingo-
> 
> I broke this out because Tom's SME series and my PCID series both need it.
> Please consider applying it to tip:x86/mm.
> 
> I'll send PCID v2 soon.  It'll apply to x86/mm + sched/urgent + this patch.

Ok, thanks - will push it out after a bit of testing.

Note that I also merged sched/urgent into x86/mm, to create a better base tree 
for 
the PCID changes by picking up 252d2a4117bc that PCID depends on.

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] xen_exit_mmap() questions

2017-04-27 Thread Ingo Molnar

* Boris Ostrovsky  wrote:

> > xen_mc_issue() does:
> > 
> > if ((paravirt_get_lazy_mode() & mode) == 0)
> > xen_mc_flush();
> > 
> > I assume the load_cr3() is intended to deal with the case where we're
> > in lazy mode, but we'll still be in lazy mode, right?  Or does it
> > serve some other purpose?
> 
> Of course. I can't read (I ignored the "== 0" part).

Ha, ob'sidenote: the preferred form to write this is:

if (!(paravirt_get_lazy_mode() & mode))
xen_mc_flush();

... exactly due to the readability problem you ran into: a 'pre' negation is 
much 
easier to read, plus '==' tends to trigger 'equal to' attributes in the brain, 
which is the opposite of negation. So it's very easy to mis-read such syntactic 
constructs even if they are technically correct.

I think '== 0' should be forbidden in all cases where the purpose is a logic 
operation and should be used strictly only in cases where we do explicit 
integer 
arithmetics.

(Bools and '== false' are suboptimal for similar reasons.)

... but I digress!

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v2 03/10] x86: assembly, use SYM_FUNC_END for functions

2017-04-12 Thread Ingo Molnar


* Jiri Slaby  wrote:

> On 04/10/2017, 09:35 PM, Josh Poimboeuf wrote:
> > The code should be in a mergeable state after each patch.  If only
> > patches 1-3 were merged, the code would be in an inconsistent state,
> > with some functions having confusing ENTRY/SYM_FUNC_END pairs.  That
> > complicates git history and also makes it harder to review each patch.
> > 
> > It would be cleaner to separate things out.  First, convert ENTRY/END
> > functions to use ENDPROC, which is a minor bug fix.  Then they can be
> > converted to the new SYM_FUNC_START/END macros in a separate patch.
> 
> OTOH I don't think touching and reviewing the same place twice is what
> actually maintainers would want to see. But as I wrote earlier, I can do
> whatever is preferred -- therefore I am asking before I start reworking
> the patches: maintainers, what do you prefer?

I'd lean towards Josh's suggestion of a more granular series. Having to review 
more is sometimes less, if the patches are more focused.

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] linux-next: manual merge of the xen-tip tree with the tip tree

2017-04-03 Thread Ingo Molnar


* Juergen Gross  wrote:

> > So my suggestion would be: could you delay 75cd32d6093e for a week, and 
> > then 
> > merge it on top of a pulled in tip:x86/mm? I'll send that tree to Linus on 
> > the 
> > first day of the merge window so there shouldn't be any ordering problems.
> 
> Okay, that's rather easy to do.
> 
> Boris, I renamed the current Xen for-linus-4.12 branch for easy development 
> of 
> other Xen patches to for-linus-4.12-pre.
> 
> I'll create another branch for-linus-4.12 based on the tip tree next week 
> which 
> will be subject to the pull request for Linus. As soon as for-linus-4.12 is 
> ready the for-linus-4.12-pre branch shouldn't be used any longer.

I've created a tip:x86-mm-for-xen branch with the following head:

  7f75540ff2ca ("Merge tag 'v4.11-rc5' into x86/mm, to refresh the branch")

... which should be a reasonable base that includes a working version of the 
5-level pagetable changes.

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] linux-next: manual merge of the xen-tip tree with the tip tree

2017-03-29 Thread Ingo Molnar


* Juergen Gross <jgr...@suse.com> wrote:

> On 29/03/17 10:59, Ingo Molnar wrote:
> > 
> > * Juergen Gross <jgr...@suse.com> wrote:
> > 
> >> On 29/03/17 05:35, Stephen Rothwell wrote:
> >>> Hi all,
> >>>
> >>> Today's linux-next merge of the xen-tip tree got a conflict in:
> >>>
> >>>   arch/x86/xen/enlighten.c
> >>>
> >>> between commits:
> >>>
> >>>   6415813bae75 ("x86/cpu: Drop wp_works_ok member of struct cpuinfo_x86")
> >>>   69218e47994d ("x86: Remap GDT tables in the fixmap section")
> >>>   b23adb7d3f7d ("x86/xen/gdt: Use X86_FEATURE_XENPV instead of globals 
> >>> for the GDT fixup")
> >>>
> >>> from the tip tree and commits:
> >>>
> >>>   75cd32d6093e ("x86/xen: split off enlighten_pv.c")
> >>>
> >>> from the xen-tip tree.
> >>>
> >>> I dropped the xen-tip tree for today (see other conflict reports),
> >>> please get together and sort these out, thanks.
> 
> Stephen, I have rewound the linux-next branch of xen-tip to its previous
> position. You can re-enable xen-tip.

Thank you!

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] linux-next: manual merge of the xen-tip tree with the tip tree

2017-03-29 Thread Ingo Molnar

* Juergen Gross  wrote:

> On 29/03/17 05:35, Stephen Rothwell wrote:
> > Hi all,
> > 
> > Today's linux-next merge of the xen-tip tree got a conflict in:
> > 
> >   arch/x86/xen/enlighten.c
> > 
> > between commits:
> > 
> >   6415813bae75 ("x86/cpu: Drop wp_works_ok member of struct cpuinfo_x86")
> >   69218e47994d ("x86: Remap GDT tables in the fixmap section")
> >   b23adb7d3f7d ("x86/xen/gdt: Use X86_FEATURE_XENPV instead of globals for 
> > the GDT fixup")
> > 
> > from the tip tree and commits:
> > 
> >   75cd32d6093e ("x86/xen: split off enlighten_pv.c")
> > 
> > from the xen-tip tree.
> > 
> > I dropped the xen-tip tree for today (see other conflict reports),
> > please get together and sort these out, thanks.
> > 
> 
> Hmm, seems to be a rather bad timing for the series of Vitaly.
> 
> What is the best way to resolve those conflicts? A rebase of Vitaly's
> patches seems to be required in any case.
> 
> Should I rebase the Xen tree on current tip? This seems to be rather
> easy, but I think this will work only if I can be sure the current tip
> tree contents will all be merged by Linus before the Xen tree.

That's certainly very likely, -tip trees all go in very early in the merge 
window.

> I could try to cherry pick the patches from tip where Vitaly's patches
> have conflicts with, but I think this could lead to a lot of patches
> to take.

Nor is it desirable as a workflow.

I'd suggest the following: in about a week I can guarantee a working tip:x86/mm 
base with most of the 5-level paging patches applied that you could base Xen 
patches on.

Unfortunately, right now there's at least one regression with those changes 
that 
needs to be properly fixed before it's a suitable base tree. The fix already 
exists, it just needs to be tested and the whole tree needs to cook for a few 
days 
to be dependable for Xen as a base.

> Or we could delay Vitaly's series until tip has been merged, but this
> will either delay some other Xen patches depending on (or conflicting
> with) Vitaly's patches or would make the rebase for Vitaly more
> difficult.

So my suggestion would be: could you delay 75cd32d6093e for a week, and then 
merge 
it on top of a pulled in tip:x86/mm? I'll send that tree to Linus on the first 
day 
of the merge window so there shouldn't be any ordering problems.

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v2 02/10] x86: assembly, FUNC_START for fn, DATA_START for data

2017-03-23 Thread Ingo Molnar


* Josh Poimboeuf <jpoim...@redhat.com> wrote:

> On Wed, Mar 22, 2017 at 08:46:16AM +0100, Ingo Molnar wrote:
> > 
> > * Jiri Slaby <jsl...@suse.cz> wrote:
> > 
> > > On 03/22/2017, 08:25 AM, Ingo Molnar wrote:
> > > > 
> > > > * Pavel Machek <pa...@ucw.cz> wrote:
> > > > 
> > > >> Hi!
> > > >>
> > > >>> -ENTRY(saved_rbp) .quad   0
> > > >>> -ENTRY(saved_rsi) .quad   0
> > > >>> -ENTRY(saved_rdi) .quad   0
> > > >>> -ENTRY(saved_rbx) .quad   0
> > > >>> +SYM_DATA_START(saved_rbp).quad   0
> > > >>> +SYM_DATA_START(saved_rsi).quad   0
> > > >>> +SYM_DATA_START(saved_rdi).quad   0
> > > >>> +SYM_DATA_START(saved_rbx).quad   0
> > > >>
> > > >> Does it make sense to call it SYM_DATA_*START* when there's no
> > > >> corresponding end?
> > > > 
> > > > That looks like a bug - I think we should strive for them to always be 
> > > > in pairs.
> > > > 
> > > > Jiri, Josh, could objtool help here perhaps, to detect 'non-terminated' 
> > > > SYM_*_START() uses? This could be done by emitting debug data into a 
> > > > special 
> > > > section and then analyzing that section for unpaired entries. The 
> > > > section can be 
> > > > discarded in the final link, it won't show up in the kernel image.
> > > 
> > > It should be easier than that. No introduction of other info needed --
> > > every global symbol without a ".type" or ".size" (i.e. SYM_*_END) should
> > > be a bug now.
> > 
> > I'm all for that!
> 
> It would be easy to add this checking to objtool since it already reads
> the symbol table.  The hard part is figuring out the logistics. :-)
> 
> - Should the warnings be on by default?

Yes, if objtool is running. Keep it simple.

> - Part of the "objtool check" command or something else?

Yes - I think it's still within the 'object file check' functionality.

> - Separate config option or just include it with
>   CONFIG_STACK_VALIDATION?

Yeah, but I'd rename CONFIG_STACK_VALIDATION to CONFIG_OBJ_VALIDATION or such. 
As 
I predicted early on, objtool will go beyond stack checking! ;-)

> - Should all asm files be checked, including those currently skipped by
>   objtool with OBJECT_FILES_NON_STANDARD?

The symbol syntax check should definitely be for all files, yes.

Could we perhaps emit 'non-standard stack frames' information into the .o 
itself 
(via a flag or a special section?), so that objtool can decide on its own 
whether 
to complain about any weirdnesses there?

> > Can we detect double ends as well - i.e. do a build check of the full 
> > syntax of 
> > these symbol definition primitives?
> 
> Detecting double ends would be a little trickier.  The second SYM_*_END
> supersedes the first, so that information isn't in the ELF symbol table.

Indeed.

> We could use a special section to annotate all the macro uses and have
> objtool do the checking, similar to what you suggested earlier.

That might be useful for other purposes as well - such as the non-standard 
stack 
frame annotations?

But it's your call really: I'm principally fine with any of the solutions, as 
long 
as the checking is done.

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v2 02/10] x86: assembly, FUNC_START for fn, DATA_START for data

2017-03-22 Thread Ingo Molnar


* Jiri Slaby <jsl...@suse.cz> wrote:

> On 03/22/2017, 08:25 AM, Ingo Molnar wrote:
> > 
> > * Pavel Machek <pa...@ucw.cz> wrote:
> > 
> >> Hi!
> >>
> >>> -ENTRY(saved_rbp) .quad   0
> >>> -ENTRY(saved_rsi) .quad   0
> >>> -ENTRY(saved_rdi) .quad   0
> >>> -ENTRY(saved_rbx) .quad   0
> >>> +SYM_DATA_START(saved_rbp).quad   0
> >>> +SYM_DATA_START(saved_rsi).quad   0
> >>> +SYM_DATA_START(saved_rdi).quad   0
> >>> +SYM_DATA_START(saved_rbx).quad   0
> >>
> >> Does it make sense to call it SYM_DATA_*START* when there's no
> >> corresponding end?
> > 
> > That looks like a bug - I think we should strive for them to always be in 
> > pairs.
> > 
> > Jiri, Josh, could objtool help here perhaps, to detect 'non-terminated' 
> > SYM_*_START() uses? This could be done by emitting debug data into a 
> > special 
> > section and then analyzing that section for unpaired entries. The section 
> > can be 
> > discarded in the final link, it won't show up in the kernel image.
> 
> It should be easier than that. No introduction of other info needed --
> every global symbol without a ".type" or ".size" (i.e. SYM_*_END) should
> be a bug now.

I'm all for that!

Can we detect double ends as well - i.e. do a build check of the full syntax of 
these symbol definition primitives?

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v2 03/10] x86: assembly, use SYM_FUNC_END for functions

2017-03-22 Thread Ingo Molnar

* Josh Poimboeuf  wrote:

> On Mon, Mar 20, 2017 at 01:32:15PM +0100, Jiri Slaby wrote:
> >  ENTRY(ftrace_caller)
> > /* save_mcount_regs fills in first two parameters */
> > @@ -184,11 +184,12 @@ GLOBAL(ftrace_epilogue)
> >  GLOBAL(ftrace_graph_call)
> > jmp ftrace_stub
> >  #endif
> > +SYM_FUNC_END(ftrace_caller)
> >  
> >  /* This is weak to keep gas from relaxing the jumps */
> >  WEAK(ftrace_stub)
> > retq
> > -END(ftrace_caller)
> > +SYM_FUNC_END(ftrace_caller)
> 
> This gives ftrace_caller() two ends.

Such errors too could be detected automatically via objtool or some other 
symbol 
debug mechanism.

The reason it might be a good fit for objtool is to make the checking optional 
(no 
need to burden a regular build with it), plus objtool already looks at the .o 
from 
first principles - a symbol checking sub-functionality could analyze the symbol 
names in the same pass.

If we want to complicate things with CFI then we absolutely should increase the 
quality of our symbol names space.

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v2 02/10] x86: assembly, FUNC_START for fn, DATA_START for data

2017-03-22 Thread Ingo Molnar

* Pavel Machek  wrote:

> Hi!
> 
> > -ENTRY(saved_rbp)   .quad   0
> > -ENTRY(saved_rsi)   .quad   0
> > -ENTRY(saved_rdi)   .quad   0
> > -ENTRY(saved_rbx)   .quad   0
> > +SYM_DATA_START(saved_rbp)  .quad   0
> > +SYM_DATA_START(saved_rsi)  .quad   0
> > +SYM_DATA_START(saved_rdi)  .quad   0
> > +SYM_DATA_START(saved_rbx)  .quad   0
> 
> Does it make sense to call it SYM_DATA_*START* when there's no
> corresponding end?

That looks like a bug - I think we should strive for them to always be in pairs.

Jiri, Josh, could objtool help here perhaps, to detect 'non-terminated' 
SYM_*_START() uses? This could be done by emitting debug data into a special 
section and then analyzing that section for unpaired entries. The section can 
be 
discarded in the final link, it won't show up in the kernel image.

We don't ever nest symbols, right?

> Plus... it looks like saved_rsi (and friends) are only used inside
> wakeup_64.S. Could we just delete the "ENTRY" annotations?

That appears to make sense as well.

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v7 1/3] x86/mm: Adapt MODULES_END based on Fixmap section size

2017-03-17 Thread Ingo Molnar


* Thomas Garnier <thgar...@google.com> wrote:

> On Thu, Mar 16, 2017 at 1:10 AM, Ingo Molnar <mi...@kernel.org> wrote:
> >
> > Note that asm/fixmap.h is an x86-ism that isn't present in many other
> > architectures, so this hunk will break the build.
> >
> > To make progress with these patches I've fixed it up with an ugly #ifdef
> > CONFIG_X86, but it needs a real solution instead before this can be pushed
> > upstream.
> 
> I also saw an error on x86 tip on special configuration. I found this
> new patch below to be a good solution to both.
> 
> Let me know what you think.
> 
> =
> 
> This patch aligns MODULES_END to the beginning of the Fixmap section.
> It optimizes the space available for both sections. The address is
> pre-computed based on the number of pages required by the Fixmap
> section.
> 
> It will allow GDT remapping in the Fixmap section. The current
> MODULES_END static address does not provide enough space for the kernel
> to support a large number of processors.
> 
> Signed-off-by: Thomas Garnier <thgar...@google.com>
> ---
> Based on next-20170308
> ---
>  Documentation/x86/x86_64/mm.txt | 5 -
>  arch/x86/include/asm/pgtable_64.h   | 1 +
>  arch/x86/include/asm/pgtable_64_types.h | 3 ++-
>  3 files changed, 7 insertions(+), 2 deletions(-)

The patch is heavily whitespace and line wrap damaged :-(

Please send a properly titled, changelogged delta patch against tip:master (or 
tip:x86/mm) to remove the CONFIG_X86 hack and fix the build bug.

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v7 1/3] x86/mm: Adapt MODULES_END based on Fixmap section size

2017-03-16 Thread Ingo Molnar


* Thomas Garnier  wrote:

> This patch aligns MODULES_END to the beginning of the Fixmap section.
> It optimizes the space available for both sections. The address is
> pre-computed based on the number of pages required by the Fixmap
> section.
> 
> It will allow GDT remapping in the Fixmap section. The current
> MODULES_END static address does not provide enough space for the kernel
> to support a large number of processors.
> 
> Signed-off-by: Thomas Garnier 
> ---
> Based on next-20170308
> ---
>  Documentation/x86/x86_64/mm.txt | 5 -
>  arch/x86/include/asm/pgtable_64_types.h | 3 ++-
>  arch/x86/kernel/module.c| 1 +
>  arch/x86/mm/dump_pagetables.c   | 1 +
>  arch/x86/mm/kasan_init_64.c | 1 +
>  mm/vmalloc.c| 1 +

> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -35,6 +35,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "internal.h"

Note that asm/fixmap.h is an x86-ism that isn't present in many other 
architectures, so this hunk will break the build.

To make progress with these patches I've fixed it up with an ugly #ifdef 
CONFIG_X86, but it needs a real solution instead before this can be pushed 
upstream.

Thanks,

Ingo

=>
 mm/vmalloc.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index dabea6a29fad..b7d2a23349f4 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -35,7 +35,10 @@
 #include 
 #include 
 #include 
-#include 
+
+#ifdef CONFIG_X86
+# include 
+#endif
 
 #include "internal.h"
 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [RFC] linkage: new macros for functions and data

2017-03-16 Thread Ingo Molnar

* Jiri Slaby  wrote:

> SYM_LOCAL_ALIAS_START -- use where there are two local names for one code
> SYM_ALIAS_START -- use where there are two global names for one code
> SYM_LOCAL_FUNC_START -- use for local functions
> SYM_FUNCTION_START -- use for global functions
> SYM_WEAK_FUNC_START -- use for weak functions
> SYM_ALIAS_END -- the end of LOCALALIASed or ALIASed code
> SYM_FUNCTION_END -- the end of SYM_LOCAL_FUNC_START, SYM_FUNCTION_START, 
> SYM_WEAK_FUNC_START, ...
> SYM_DATA_START -- global data symbol
> SYM_DATA_END -- the end of SYM_DATA_START symbol

This looks mostly good to me, with minor details:

- The mixed 'FUNC' and 'FUNCTION' naming looks a bit confusing - I'd pick one 
and
  use that consistently.

- I'd also make the START/END constructs look more symmetric, i.e. make the 
  attribute a postfix, not a prefix.

- In fact I'd make the 'alias' notion an attribute as well - what matters 
mostly 
  is that it's a function symbol, and that fact is lost from the SYM_ALIAS 
naming.

I.e. something like this:

SYM_FUNCTION_START
SYM_FUNCTION_START_WEAK
SYM_FUNCTION_START_LOCAL
SYM_FUNCTION_START_ALIAS
SYM_FUNCTION_START_LOCAL_ALIAS
...
SYM_FUNCTION_END

SYM_DATA_START
SYM_DATA_END

Note how simple and structured looking the nomenclature is when grouped in such 
a 
way, how easy it is to verify at a glance, and how easy it is to extend with 
new 
postfixes.

( In theory we could also make the attributes a real macro argument in the 
future,
  should the number of attributes increase significantly. )

Does this look good to everyone?

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH 01/10] x86: assembly, ENTRY for fn, GLOBAL for data

2017-03-07 Thread Ingo Molnar

* h...@zytor.com <h...@zytor.com> wrote:

> On March 1, 2017 2:27:54 AM PST, Ingo Molnar <mi...@kernel.org> wrote:

> >> > So how about using macro names that actually show the purpose, instead 
> >> > of 
> >> > importing all the crappy, historic, essentially randomly chosen debug 
> >> > symbol macro names from the binutils and older kernels?
> >> > 
> >> > Something sane, like:
> >> > 
> >> >  SYM__FUNCTION_START
> >> 
> >> Sane would be:
> >> 
> >>SYM_FUNCTION_START
> >> 
> >> The double underscore is just not giving any value.
> >
> > So the double underscore (at least in my view) has two advantages:
> >
> > 1) it helps separate the prefix from the postfix.
> >
> > I.e. it's a 'symbols' namespace, and a 'function start', not the 'start' of 
> > a 
> > 'symbol function'.
> >
> > 2) It also helps easy greppability.
> >
> > Try this in latest -tip:
> >
> >git grep e820__
> >
> > To see all the E820 API calls - with no false positives!
> >
> > 'git grep e820_' on the other hand is a lot less reliable...
> 
> IMO these little "namespace tricks" especially for small common macros like 
> we 
> are taking about here make the code very frustrating to read, and even more 
> to 
> write.  Noone would design a programming language that way, and things like 
> PROC 
> are really just substitutes for proper language features (and could even be 
> as 
> assembly rather than cpp macros.)

This is a totally different thing from language keywords which needs to be 
short 
and concise for obvious reasons.

Keywords of languages get nested and are used all the time, and everyone needs 
to 
know them and they need to stay out of the way. The symbol start/end macros we 
are 
talking about here are _MUCH_ less common, and they are only ever used in a 
single 
nesting level:

SYM__FUNC_START(some_kernel_asm_function)
...
SYM__FUNC_END(some_kernel_asm_function)

Most kernel developers writing new assembly code rarely know these constructs 
by 
heart, they just look them up and carbon copy existing practices. And guess 
what, 
the 'looking them up' gets harder if the macro naming scheme is an idosyncratic 
leftover from long ago.

Kernel developers _reading_ assembly code will know the exact purpose of the 
macros even less, especially if they are named in an ambiguous, illogical 
fashion.

Furthermore, your suggestion of:

> PROC..ENDPROC, LOCALPROC..ENDPROC and DATA..ENDDATA.  Clear, unambiguous and 
> balanced.

Are neither clear, not unambiguous nor balanced! I mean, they are the _exact_ 
opposite:

 - 'PROC' is actually ambiguous in the kernel source code context, as it 
clashes 
   with common abbreviations of 'procfs' and 'process'.

   It's also an unnecessary abbreviation of a word ('procedure') that is not 
   actually used a _single time_ in the C ISO/IEC 9899:TC2 standard - in all 
half 
   thousand+ pages of it. (!) Why the hell does this have to be used in the 
   kernel?

 - It's visually and semantically imbalanced, because some terms have an 'END' 
   prefix, but there's no matching 'START' or 'BEGIN' prefix for their 
   counterparts. This makes it easy to commit various symbol definition 
   termination errors, like:

PROC(some_kernel_asm_function)
...

   Here it's not obvious that it needs an ENDPROC. While if it's written as:

SYM__FUNC_START(some_kernel_asm_function)
...

   ... it's pretty obvious at first sight that an '_END' is missing!

 - What you suggest also has senselessly missing underscores, which makes it 
   _more_ cluttered and less clear. We clearly don't have addtowaitqueue() and
   removefromwaitqueue() function names in the kernel, right? Why should we have
   'ENDPROC' and 'ENDDATA' macro names?

 - Hierarchical naming schemes generally tend to put the more generic category
   name first, not last. So it's:

mutex_init()
mutex_lock()
mutex_unlock()
mutex_trylock()

   It's _NOT_ the other way around:

init_mutex()
lock_mutex()
unlock_mutex()
trylock_mutex()

   The prefix naming scheme is easier to read both visually/typographically 
   (because it aligns vertically in a natural fashion so it's easier to pattern 
   match), and also semantically: because when reading it it's easy to skip the 
   line once your brain reads the generic category of 'mutex'.

   But with 'ENDPROC' my brain both has to needlessly perform the following 
steps:

- disambiguate the 'END' and the 'PROC'

- fill in the missing underscore

- and finally when arriving at the generic term 'PROC', discard it as
  uninteresting

 - S

Re: [Xen-devel] [PATCH 01/10] x86: assembly, ENTRY for fn, GLOBAL for data

2017-03-06 Thread Ingo Molnar


* h...@zytor.com <h...@zytor.com> wrote:

> On March 1, 2017 2:27:54 AM PST, Ingo Molnar <mi...@kernel.org> wrote:
> >
> >* Thomas Gleixner <t...@linutronix.de> wrote:
> >
> >> On Wed, 1 Mar 2017, Ingo Molnar wrote:
> >> > 
> >> > * Jiri Slaby <jsl...@suse.cz> wrote:
> >> > 
> >> > > This is a start of series to unify use of ENTRY, ENDPROC, GLOBAL,
> >END,
> >> > > and other macros across x86. When we have all this sorted out,
> >this will
> >> > > help to inject DWARF unwinding info by objtool later.
> >> > > 
> >> > > So, let us use the macros this way:
> >> > > * ENTRY -- start of a global function
> >> > > * ENDPROC -- end of a local/global function
> >> > > * GLOBAL -- start of a globally visible data symbol
> >> > > * END -- end of local/global data symbol
> >> > 
> >> > So how about using macro names that actually show the purpose,
> >instead of 
> >> > importing all the crappy, historic, essentially randomly chosen
> >debug symbol macro 
> >> > names from the binutils and older kernels?
> >> > 
> >> > Something sane, like:
> >> > 
> >> >  SYM__FUNCTION_START
> >> 
> >> Sane would be:
> >> 
> >>SYM_FUNCTION_START
> >> 
> >> The double underscore is just not giving any value.
> >
> >So the double underscore (at least in my view) has two advantages:
> >
> >1) it helps separate the prefix from the postfix.
> >
> >I.e. it's a 'symbols' namespace, and a 'function start', not the
> >'start' of a 
> >'symbol function'.
> >
> >2) It also helps easy greppability.
> >
> >Try this in latest -tip:
> >
> >  git grep e820__
> >
> >To see all the E820 API calls - with no false positives!
> >
> >'git grep e820_' on the other hand is a lot less reliable...
> >
> >But no strong feelings either way, I just try to sneak in these small
> >namespace 
> >structure tricks when nobody's looking! ;-)
> >
> >Thanks,
> >
> > Ingo
> 
> This seems needlessly verbose to me and clutters the code.
> 
> How about:
> 
> PROC..ENDPROC, LOCALPROC..ENDPROC and DATA..ENDDATA.  Clear, unambiguous and 
> balanced.

I'm sorry, but that naming scheme is disgusing, it reminds me of BASIC 
nomenclature ... did we run out of underscores or what?

Nor would clearly structured names clutter anything, this isn't going to be 
used 
deep inside nested code, it's going to be the single level syntactic term in 
addition to the symbol name itself:

SYM__FUNCTION_START(some_kernel_asm_function)
...
SYM__FUNCTION_END(some_kernel_asm_function)

We could shorten it to 'FUNC' if length is really an issue:

SYM__FUNC_START(some_kernel_asm_function)
...
SYM__FUNC_END(some_kernel_asm_function)

Also, 'PROC', presumably standing for 'procedure', is both ambiguous and a 
misnomer:

 - it's ambiguous with commonly used abbreviations of procfs and process

 - C functions are not actually 'procedures'. Procedures in PASCAL style 
languages
   denote functions that don't return any values. Most of the kernel asm 
functions
   actually do. I realize that even in C we sometimes talk about 'procedures' 
out
   of hysterical inertia, but if you check the C standards, most of them don't
   even use the term 'procedure'.

'function' on the other hand is totally unambiguous.

Plus the lack of START/STOP (or BEGIN/END) makes it easy to commit the mistake 
of 
forgetting to add the end marker, and your naming scheme preserves that!

So if we fix/extend these macros then _PLEASE_ we need to get this stupid, 
illogical, nonsensical, external tooling inherited naming craziness fixed 
first, 
not doubled down on...



Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH 01/10] x86: assembly, ENTRY for fn, GLOBAL for data

2017-03-01 Thread Ingo Molnar

* Thomas Gleixner <t...@linutronix.de> wrote:

> On Wed, 1 Mar 2017, Ingo Molnar wrote:
> > 
> > * Jiri Slaby <jsl...@suse.cz> wrote:
> > 
> > > This is a start of series to unify use of ENTRY, ENDPROC, GLOBAL, END,
> > > and other macros across x86. When we have all this sorted out, this will
> > > help to inject DWARF unwinding info by objtool later.
> > > 
> > > So, let us use the macros this way:
> > > * ENTRY -- start of a global function
> > > * ENDPROC -- end of a local/global function
> > > * GLOBAL -- start of a globally visible data symbol
> > > * END -- end of local/global data symbol
> > 
> > So how about using macro names that actually show the purpose, instead of 
> > importing all the crappy, historic, essentially randomly chosen debug 
> > symbol macro 
> > names from the binutils and older kernels?
> > 
> > Something sane, like:
> > 
> > SYM__FUNCTION_START
> 
> Sane would be:
> 
>   SYM_FUNCTION_START
> 
> The double underscore is just not giving any value.

So the double underscore (at least in my view) has two advantages:

1) it helps separate the prefix from the postfix.

I.e. it's a 'symbols' namespace, and a 'function start', not the 'start' of a 
'symbol function'.

2) It also helps easy greppability.

Try this in latest -tip:

  git grep e820__

To see all the E820 API calls - with no false positives!

'git grep e820_' on the other hand is a lot less reliable...

But no strong feelings either way, I just try to sneak in these small namespace 
structure tricks when nobody's looking! ;-)

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH 01/10] x86: assembly, ENTRY for fn, GLOBAL for data

2017-03-01 Thread Ingo Molnar

* Jiri Slaby  wrote:

> This is a start of series to unify use of ENTRY, ENDPROC, GLOBAL, END,
> and other macros across x86. When we have all this sorted out, this will
> help to inject DWARF unwinding info by objtool later.
> 
> So, let us use the macros this way:
> * ENTRY -- start of a global function
> * ENDPROC -- end of a local/global function
> * GLOBAL -- start of a globally visible data symbol
> * END -- end of local/global data symbol

So how about using macro names that actually show the purpose, instead of 
importing all the crappy, historic, essentially randomly chosen debug symbol 
macro 
names from the binutils and older kernels?

Something sane, like:

SYM__FUNCTION_START
SYM__FUNCTION_END

SYM__DATA_START
SYM__DATA_END

... and extend that macro namespace with any other variants we might need.

We can still keep the old macro names (for a short while) to ease the 
transition, 
but for heaven's sake, if we do "cleanups" before complicating the code let's 
make 
sure the result is actually readable!

Agreed?

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v3 4/4] KVM: VMX: Simplify segment_base

2017-02-21 Thread Ingo Molnar


* Paolo Bonzini  wrote:

> > Paolo, how stable, non-rebasing are the KVM tree commits?
> 
> Whatever ends in linux-next is stable.  I have a separate rebasing branch,
> but it's not part of linux-next by design.

Ok, that's nice!

> > Or should we keep Andy's KVM patches together with the GDT patches? Either 
> > workflow works for me - it's your call as these are predominantly KVM 
> > changes.
> 
> I'll delay my pull request to Linus a couple days so that I can test Andy's 6 
> patches. Then you can just base your branch on Linus's tree.

Fantastic, thank you!

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v3 4/4] KVM: VMX: Simplify segment_base

2017-02-21 Thread Ingo Molnar

* Thomas Garnier  wrote:

> > Okay, I guess I will have to wait for it to be integrated to
> > linux-next then. Or would you rather to it after this patch set is
> > added?
> 
> Read your summary for the patchset of KVM cleanup, I will wait for it to 
> reach 
> linux-next to rebase and send the new iteration.

Note that to be able to apply your readonly-GDT series to the x86 tree I'll 
need a 
stable SHA1 to base it on.

Paolo, how stable, non-rebasing are the KVM tree commits? Once Andy's patches 
hit 
the KVM tree I could pull from you and order it so that I'd send it to Linus 
only 
after you sent your bits upstream. That would save us 3 months of patch 
propagation latency.

Or should we keep Andy's KVM patches together with the GDT patches? Either 
workflow works for me - it's your call as these are predominantly KVM changes.

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v2 3/3] x86: Make the GDT remapping read-only on 64 bit

2017-02-01 Thread Ingo Molnar


* Andy Lutomirski <l...@kernel.org> wrote:

> On Wed, Feb 1, 2017 at 1:15 AM, Ingo Molnar <mi...@kernel.org> wrote:
> >
> > * Thomas Garnier <thgar...@google.com> wrote:
> >
> >> This patch makes the GDT remapped pages read-only to prevent corruption.
> >> This change is done only on 64 bit.
> >
> 
> 
> >>
> >> - table_base = gdt->address;
> >> + table_base = (unsigned long)get_current_direct_gdt();
> >
> > Instead of spreading these type casts far and wide please introduce another
> > accessor the returns 'unsigned long':
> >
> > get_cpu_gdt_rw_vaddr()
> >
> 
> That whole function is an abomination.  How about replacing 'unsigned
> long table_base' with 'struct desc_struct *table'?  If you're feeling
> really adventurous, *delete* that function and replace all of its
> users with something sane.

Yeah, even better!

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v2 3/3] x86: Make the GDT remapping read-only on 64 bit

2017-02-01 Thread Ingo Molnar


* Thomas Garnier  wrote:

> This patch makes the GDT remapped pages read-only to prevent corruption.
> This change is done only on 64 bit.

Please spell '64-bit' consistently through the series. I've seen two variants:

  64 bit
  64bit

> +/*
> + * The LTR instruction marks the TSS GDT entry as busy. In 64bit, the GDT is
> + * a read-only remapping. To prevent a page fault, the GDT is switched to the
> + * original writeable version when needed.

s/In 64bit,
 /On 64-bit kernels,

> + */
> +#ifdef CONFIG_X86_64
> +static inline void native_load_tr_desc(void)
> +{
> + struct desc_ptr gdt;
> + int cpu = raw_smp_processor_id();
> + bool restore = false;
> + struct desc_struct *fixmap_gdt;
> +
> + native_store_gdt();
> + fixmap_gdt = get_cpu_fixmap_gdt(cpu);
> +
> + /*
> +  * If the current GDT is the read-only fixmap, swap to the original
> +  * writeable version. Swap back at the end.
> +  */
> + if (gdt.address == (unsigned long)fixmap_gdt) {
> + load_direct_gdt(cpu);
> + restore = true;
> + }
> + asm volatile("ltr %w0"::"q" (GDT_ENTRY_TSS*8));
> + if (restore)
> + load_fixmap_gdt(cpu);

Please use bool plus 0/1, it's more readable (to me) than the true/false 
notation.

>  extern void switch_to_new_gdt(int);
> +extern void load_direct_gdt(int);
>  extern void load_fixmap_gdt(int);

> +/* Load the original GDT from the per-cpu structure */
> +void load_direct_gdt(int cpu)
> +{
> + struct desc_ptr gdt_descr;
> +
> + gdt_descr.address = (long)get_cpu_direct_gdt(cpu);

Please name the functions in an easier to understand way, such as:

get_cpu_gdt_rw()
get_cpu_gdt_ro()

that the GDT is in the direct mappings is less important than the fact the the 
address is writable ...

> +}
> +EXPORT_SYMBOL(load_direct_gdt);

EXPORT_SYMBOL_GPL(), or no export at all.

> +EXPORT_SYMBOL(load_fixmap_gdt);

ditto.

>* VT restores TR but not its size.  Useless.
>*/
> - struct desc_ptr *gdt = this_cpu_ptr(_gdt);
>   struct desc_struct *descs;
>  
> - descs = (void *)gdt->address;
> + descs = (void *)get_current_direct_gdt();

Couldn't the type cast be dropped?

>  
> - table_base = gdt->address;
> + table_base = (unsigned long)get_current_direct_gdt();

Instead of spreading these type casts far and wide please introduce another 
accessor the returns 'unsigned long':

get_cpu_gdt_rw_vaddr()

or such.

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH] x86/head: Refactor 32-bit pgtable setup

2017-01-05 Thread Ingo Molnar


* Boris Ostrovsky <boris.ostrov...@oracle.com> wrote:

> The new Xen PVH entry point requires page tables to be setup by the
> kernel since it is entered with paging disabled.
> 
> Pull the common code out of head_32.S so that mk_early_pgtbl_32 can be
> invoked from both the new Xen entry point and the existing startup_32
> code.
> 
> Convert resulting common code to C.
> 
> Signed-off-by: Boris Ostrovsky <boris.ostrov...@oracle.com>
> ---
> This is replacement for https://lkml.org/lkml/2016/10/14/434, with
> assembly code re-written in C as requested by Ingo.
> 
> 
>  arch/x86/include/asm/pgtable_32.h |  32 ++
>  arch/x86/kernel/head32.c  |  62 +++
>  arch/x86/kernel/head_32.S | 122 
> +++---
>  3 files changed, 101 insertions(+), 115 deletions(-)

Yeah, so (belatedly) I tried to merge this to latest upstream, via the commit 
below (note the slight edits to the changelog) - but 32-bit defconfig fails to 
build:

  arch/x86/kernel/head_32.S:615: Error: can't resolve `init_thread_union' 
{*UND* section} - `SIZEOF_PTREGS' {*UND* section}

Thanks,

Ingo

===>
From 071878c634fda7d9be3f015f05a89f5468e7e2e4 Mon Sep 17 00:00:00 2001
From: Boris Ostrovsky <boris.ostrov...@oracle.com>
Date: Thu, 8 Dec 2016 11:44:31 -0500
Subject: [PATCH] x86/boot/32: Convert the 32-bit pgtable setup code from 
assembly to C

The new Xen PVH entry point requires page tables to be set up by the
kernel since it is entered with paging disabled.

Pull the common code out of head_32.S so that mk_early_pgtbl_32() can be
invoked from both the new Xen entry point and the existing startup_32()
code.

Convert resulting common code to C.

Signed-off-by: Boris Ostrovsky <boris.ostrov...@oracle.com>
Cc: Andy Lutomirski <l...@kernel.org>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Brian Gerst <brge...@gmail.com>
Cc: Denys Vlasenko <dvlas...@redhat.com>
Cc: H. Peter Anvin <h...@zytor.com>
Cc: Josh Poimboeuf <jpoim...@redhat.com>
Cc: Linus Torvalds <torva...@linux-foundation.org>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: m...@codeblueprint.co.uk
Cc: xen-de...@lists.xenproject.org
Link: 
http://lkml.kernel.org/r/1481215471-9639-1-git-send-email-boris.ostrov...@oracle.com
Signed-off-by: Ingo Molnar <mi...@kernel.org>
---
 arch/x86/include/asm/pgtable_32.h |  32 ++
 arch/x86/kernel/head32.c  |  62 +++
 arch/x86/kernel/head_32.S | 122 +++---
 3 files changed, 101 insertions(+), 115 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_32.h 
b/arch/x86/include/asm/pgtable_32.h
index b6c0b404898a..fbc73360aea0 100644
--- a/arch/x86/include/asm/pgtable_32.h
+++ b/arch/x86/include/asm/pgtable_32.h
@@ -27,6 +27,7 @@ struct vm_area_struct;
 
 extern pgd_t swapper_pg_dir[1024];
 extern pgd_t initial_page_table[1024];
+extern pmd_t initial_pg_pmd[];
 
 static inline void pgtable_cache_init(void) { }
 static inline void check_pgt_cache(void) { }
@@ -75,4 +76,35 @@ do { \
 #define kern_addr_valid(kaddr) (0)
 #endif
 
+/*
+ * This is how much memory in addition to the memory covered up to
+ * and including _end we need mapped initially.
+ * We need:
+ * (KERNEL_IMAGE_SIZE/4096) / 1024 pages (worst case, non PAE)
+ * (KERNEL_IMAGE_SIZE/4096) / 512 + 4 pages (worst case for PAE)
+ *
+ * Modulo rounding, each megabyte assigned here requires a kilobyte of
+ * memory, which is currently unreclaimed.
+ *
+ * This should be a multiple of a page.
+ *
+ * KERNEL_IMAGE_SIZE should be greater than pa(_end)
+ * and small than max_low_pfn, otherwise will waste some page table entries
+ */
+#if PTRS_PER_PMD > 1
+#define PAGE_TABLE_SIZE(pages) (((pages) / PTRS_PER_PMD) + PTRS_PER_PGD)
+#else
+#define PAGE_TABLE_SIZE(pages) ((pages) / PTRS_PER_PGD)
+#endif
+
+/*
+ * Number of possible pages in the lowmem region.
+ *
+ * We shift 2 by 31 instead of 1 by 32 to the left in order to avoid a
+ * gas warning about overflowing shift count when gas has been compiled
+ * with only a host target support using a 32-bit type for internal
+ * representation.
+ */
+#define LOWMEM_PAGES 2<<31) - __PAGE_OFFSET) >> PAGE_SHIFT))
+
 #endif /* _ASM_X86_PGTABLE_32_H */
diff --git a/arch/x86/kernel/head32.c b/arch/x86/kernel/head32.c
index f16c55bfc090..e5fb436a6548 100644
--- a/arch/x86/kernel/head32.c
+++ b/arch/x86/kernel/head32.c
@@ -49,3 +49,65 @@ asmlinkage __visible void __init i386_start_kernel(void)
 
start_kernel();
 }
+
+/*
+ * Initialize page tables.  This creates a PDE and a set of page
+ * tables, which are located immediately beyond __brk_base.  The variable
+ * _brk_end is set up to point to the first "safe" location.
+ * Mappings are created

Re: [Xen-devel] [PATCH] x86/head: Refactor 32-bit pgtable setup

2016-12-18 Thread Ingo Molnar


* Boris Ostrovsky <boris.ostrov...@oracle.com> wrote:

> On 12/08/2016 11:33 PM, Ingo Molnar wrote:
> > * Boris Ostrovsky <boris.ostrov...@oracle.com> wrote:
> >
> >> The new Xen PVH entry point requires page tables to be setup by the
> >> kernel since it is entered with paging disabled.
> >>
> >> Pull the common code out of head_32.S so that mk_early_pgtbl_32 can be
> >> invoked from both the new Xen entry point and the existing startup_32
> >> code.
> >>
> >> Convert resulting common code to C.
> >>
> >> Signed-off-by: Boris Ostrovsky <boris.ostrov...@oracle.com>
> >> ---
> >> This is replacement for https://lkml.org/lkml/2016/10/14/434, with
> >> assembly code re-written in C as requested by Ingo.
> >>
> >>
> >>  arch/x86/include/asm/pgtable_32.h |  32 ++
> >>  arch/x86/kernel/head32.c  |  62 +++
> >>  arch/x86/kernel/head_32.S | 122 
> >> +++---
> >>  3 files changed, 101 insertions(+), 115 deletions(-)
> > Whee, I love it! And the code is so much more readable!
> >
> > Did you have any particular robustness problems (difficult to resolve 
> > crashes) 
> > while developing it, or was it reasonably straightforward to do?
> 
> There was nothing particularly difficult beyond understanding current
> code. That, of course, is not to say that there were no crashes but
> developing this on a guest gives you pretty good insight into why/where
> you crashed.
> 
> This was tested on bare-metal (in case you are wondering), but obviously
> more testing is always good.

Ok, cool!

Would you like to carry this with your other Xen dependencies? If yes:

   Acked-by: Ingo Molnar <mi...@kernel.org>

If not then I can pick it up and get it to Linus in v4.10.

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH] x86/head: Refactor 32-bit pgtable setup

2016-12-08 Thread Ingo Molnar


* Boris Ostrovsky  wrote:

> The new Xen PVH entry point requires page tables to be setup by the
> kernel since it is entered with paging disabled.
> 
> Pull the common code out of head_32.S so that mk_early_pgtbl_32 can be
> invoked from both the new Xen entry point and the existing startup_32
> code.
> 
> Convert resulting common code to C.
> 
> Signed-off-by: Boris Ostrovsky 
> ---
> This is replacement for https://lkml.org/lkml/2016/10/14/434, with
> assembly code re-written in C as requested by Ingo.
> 
> 
>  arch/x86/include/asm/pgtable_32.h |  32 ++
>  arch/x86/kernel/head32.c  |  62 +++
>  arch/x86/kernel/head_32.S | 122 
> +++---
>  3 files changed, 101 insertions(+), 115 deletions(-)

Whee, I love it! And the code is so much more readable!

Did you have any particular robustness problems (difficult to resolve crashes) 
while developing it, or was it reasonably straightforward to do?

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [RFC PATCH v3] xen/x86: Increase xen_e820_map to E820_X_MAX possible entries

2016-12-08 Thread Ingo Molnar


* Juergen Gross <jgr...@suse.com> wrote:

> On 05/12/16 18:49, Alex Thorlton wrote:
> > This is the third pass at my patchset to fix up our problems with
> > XENMEM_machine_memory_map on large systems.  The only changes on this
> > pass were to flesh out the comment above the E820_X_MAX definition, and
> > to add Juergen's Reviewed-by to the second patch.
> > 
> > Let me know if anyone has any questions/comments!
> > 
> > Alex Thorlton (2):
> >   x86: Make E820_X_MAX unconditionally larger than E820MAX
> >   xen/x86: Increase xen_e820_map to E820_X_MAX possible entries
> > 
> >  arch/x86/include/asm/e820.h | 12 
> >  arch/x86/xen/setup.c|  6 +++---
> >  2 files changed, 11 insertions(+), 7 deletions(-)
> > 
> 
> Ingo, do you have any preferences through which tree those patches
> should go? I'd like to have at least patch 2 in 4.10, so I could take
> it through the Xen tree.

Sure, both are fine to me:

  Acked-by: Ingo Molnar <mi...@kernel.org>

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH 2/8] x86/head: Refactor 32-bit pgtable setup

2016-12-02 Thread Ingo Molnar


* Boris Ostrovsky <boris.ostrov...@oracle.com> wrote:

> > It is tricky to do so safely, because at this stage almost nothing of the C 
> > execution environment has been set up.

Yeah - but we do have a fair amount of early C code though.

> I can still give it a try but I'd rather not tie it to this (Xen PVH) patch 
> series. Which would leave me with two options: either keep what this patch 
> does, 
> leaving it as assembly (requires your ack), or have Xen code build the pages 
> on 
> its own.

If you give it a try in a subsequent patch (please Cc: me) then it's OK to me:

  Acked-by: Ingo Molnar <mi...@kernel.org>

Feel free to carry it in the Xen tree.

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH 2/8] x86/head: Refactor 32-bit pgtable setup

2016-12-02 Thread Ingo Molnar


* Boris Ostrovsky <boris.ostrov...@oracle.com> wrote:

> On 12/02/2016 04:45 AM, Ingo Molnar wrote:
> > * Boris Ostrovsky <boris.ostrov...@oracle.com> wrote:
> >
> >> On 10/31/2016 08:33 AM, Boris Ostrovsky wrote:
> >>>
> >>> On 10/14/2016 02:05 PM, Boris Ostrovsky wrote:
> >>>> From: Matt Fleming <m...@codeblueprint.co.uk>
> >>>>
> >>>> The new Xen PVH entry point requires page tables to be setup by the
> >>>> kernel since it is entered with paging disabled.
> >>>>
> >>>> Pull the common code out of head_32.S and into pgtable_32.S so that
> >>>> setup_pgtable_32 can be invoked from both the new Xen entry point and
> >>>> the existing startup_32 code.
> >>>
> >>> Ping to x86 maintainers.
> >> Pinging again.
> >>
> >> I will be re-sending this series at some point (it has been delayed by
> >> some hypervisor changes that will be needed) but I'd like to hear from
> >> x86 maintainers whether this will be acceptable before I post this again.
> > Could this be done in C?
> 
> I suppose it could be, I haven't thought about it.
> 
> The goal here was to simply make existing startup code available to others 
> (Xen 
> guest) without changes. Are you suggesting to build page tables in C for the 
> Xen 
> guest only or to make startup_32 call new C code as well?

My suggestion would be to transform the factored out assembly code to C.

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH 2/8] x86/head: Refactor 32-bit pgtable setup

2016-12-02 Thread Ingo Molnar


* Boris Ostrovsky  wrote:

> On 10/31/2016 08:33 AM, Boris Ostrovsky wrote:
> >
> >
> > On 10/14/2016 02:05 PM, Boris Ostrovsky wrote:
> >> From: Matt Fleming 
> >>
> >> The new Xen PVH entry point requires page tables to be setup by the
> >> kernel since it is entered with paging disabled.
> >>
> >> Pull the common code out of head_32.S and into pgtable_32.S so that
> >> setup_pgtable_32 can be invoked from both the new Xen entry point and
> >> the existing startup_32 code.
> >
> >
> > Ping to x86 maintainers.
> 
> Pinging again.
> 
> I will be re-sending this series at some point (it has been delayed by
> some hypervisor changes that will be needed) but I'd like to hear from
> x86 maintainers whether this will be acceptable before I post this again.

Could this be done in C?

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH 1/2] x86: Make E820_X_MAX unconditionally larger than E820MAX

2016-11-29 Thread Ingo Molnar


* Alex Thorlton <athorl...@sgi.com> wrote:

> It's really not necessary to limit E820_X_MAX to 128 in the non-EFI
> case.  This commit drops E820_X_MAX's dependency on CONFIG_EFI, so that
> E820_X_MAX is always at least slightly larger than E820MAX.
> 
> The real motivation behind this is actually to prevent some issues in
> the Xen kernel, where the XENMEM_machine_memory_map hypercall can
> produce an e820 map larger than 128 entries, even on systems where the
> original e820 table was quite a bit smaller than that, depending on how
> many IOAPICs are installed on the system.
> 
> Signed-off-by: Alex Thorlton <athorl...@sgi.com>
> Suggested-by: Ingo Molnar <mi...@redhat.com>
> Cc: Russ Anderson <r...@sgi.com>
> Cc: David Vrabel <david.vra...@citrix.com>
> Cc: Ingo Molnar <mi...@redhat.com>
> Cc: Juergen Gross <jgr...@suse.com>
> Cc: Thomas Gleixner <t...@linutronix.de>
> Cc: "H. Peter Anvin" <h...@zytor.com>
> Cc: Denys Vlasenko <dvlas...@redhat.com>
> Cc: Boris Ostrovsky <boris.ostrov...@oracle.com>
> Cc: x...@kernel.org
> Cc: xen-de...@lists.xenproject.org
> ---
>  arch/x86/include/asm/e820.h | 10 ++
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/include/asm/e820.h b/arch/x86/include/asm/e820.h
> index 476b574..aa00d33 100644
> --- a/arch/x86/include/asm/e820.h
> +++ b/arch/x86/include/asm/e820.h
> @@ -1,13 +1,15 @@
>  #ifndef _ASM_X86_E820_H
>  #define _ASM_X86_E820_H
>  
> -#ifdef CONFIG_EFI
> +/*
> + * We need to make sure that E820_X_MAX is defined
> + * before we include uapi/asm/e820.h
> + */
>  #include 
>  #define E820_X_MAX (E820MAX + 3 * MAX_NUMNODES)

What we need an explanation for in the comment is what does this stand for 
(what 
does the 'X' mean?), and what is the magic 3*MAX_NUMNODES about?

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH] xen/x86: Increase xen_e820_map to E820_X_MAX possible entries

2016-11-14 Thread Ingo Molnar


* Alex Thorlton  wrote:

> On systems with sufficiently large e820 tables, and several IOAPICs, it
> is possible for the XENMEM_machine_memory_map callback (and its
> counterpart, XENMEM_memory_map) to attempt to return an e820 table with
> more than 128 entries.  This callback adds entries to the BIOS-provided
> e820 table to account for IOAPIC registers, which, on sufficiently large
> systems, can result in an e820 table that is too large to copy back into
> xen_e820_map.
> 
> This change simply increases the size of xen_e820_map to E820_X_MAX to
> ensure that there is enough room to store the entire e820 map returned
> from this callback.

This means:

 #ifdef CONFIG_EFI
 #include 
 #define E820_X_MAX (E820MAX + 3 * MAX_NUMNODES)
 #else   /* ! CONFIG_EFI */
 #define E820_X_MAX E820MAX
 #endif

It's a bit weird to key it off of CONFIG_EFI - why isn't it unconditional?

But I have no objections to the principle if it fixes a crash.

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH 0/8] x86: audit and remove needless module.h includes

2016-07-14 Thread Ingo Molnar

* Paul Gortmaker <paul.gortma...@windriver.com> wrote:

> [Re: [PATCH 0/8] x86: audit and remove needless module.h includes] On 
> 14/07/2016 (Thu 20:39) Ingo Molnar wrote:
> 
> > 
> > * Paul Gortmaker <paul.gortma...@windriver.com> wrote:
> > 
> > > > I'll continue testing with the setup_percpu.c change left out.
> > > 
> > > Let me know if you want a resend or if you want to just add the
> > > asm/desc.h locally or ...
> > 
> > So I tried the asm/desc.h but saw other build failures - for now gave up.
> > Can we do this in a separate patch?
> 
> Yes of course.  In the meantime I'll investigate further and add more
> configs to my testing.  Is there a list of the configs used for sanity
> testing tip somewhere?

No need for you to complicate your testing (unless you have enough hardware 
resources for that) - I can deal with the occasional build failure. Since I 
eventually create and boot randconfigs there's no fixed set of configs. :-/

That I enabled SMP on the "allnoconfig" (and forgot about it!) has to do with 
the 
fact that even my allnoconfigs are typically bootable (on one of my test 
systems) 
so there's a handful of pre-baked configs that always get enabled.

But other than those small perturbations I do similar testing to what you did: 
all{no|def|mod|yes}config on x86-{32|64}, plus randconfigs.

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH 0/8] x86: audit and remove needless module.h includes

2016-07-14 Thread Ingo Molnar


* Paul Gortmaker  wrote:

> > I'll continue testing with the setup_percpu.c change left out.
> 
> Let me know if you want a resend or if you want to just add the
> asm/desc.h locally or ...

So I tried the asm/desc.h but saw other build failures - for now gave up.
Can we do this in a separate patch?

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH 0/8] x86: audit and remove needless module.h includes

2016-07-14 Thread Ingo Molnar


* Paul Gortmaker  wrote:

> To that end, I have done allmodconfig, allyesconfig and allnoconfig
> for both 32 bit and 64 bit x86 with these changes on the linux-next
> from today, which presumably has an up to date copy of tip in it.

It does, still I get this on allnoconfig with your patches applied:

arch/x86/kernel/setup_percpu.c: In function ‘setup_percpu_segment’:
arch/x86/kernel/setup_percpu.c:159:2: error: implicit declaration of function 
‘pack_descriptor’ [-Werror=implicit-function-declaration]
  pack_descriptor(, per_cpu_offset(cpu), 0xF,
  ^
arch/x86/kernel/setup_percpu.c:162:2: error: implicit declaration of function 
‘write_gdt_entry’ [-Werror=implicit-function-declaration]
  write_gdt_entry(get_cpu_gdt_table(cpu),
  ^
arch/x86/kernel/setup_percpu.c:162:18: error: implicit declaration of function 
‘get_cpu_gdt_table’ [-Werror=implicit-function-declaration]
  write_gdt_entry(get_cpu_gdt_table(cpu),

I'll continue testing with the setup_percpu.c change left out.

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH 1/7] x86/xen: Simplify set_aliased_prot

2016-06-11 Thread Ingo Molnar


* Andy Lutomirski  wrote:

> On Wed, May 25, 2016 at 2:50 AM, David Vrabel  wrote:
> > On 24/05/16 23:48, Andy Lutomirski wrote:
> >> In aa1acff356bb ("x86/xen: Probe target addresses in
> >> set_aliased_prot() before the hypercall"), I added an explicit probe
> >> to work around a hypercall issue.  The code can be simplified by
> >> using probe_kernel_read.
> >
> > Acked-by: David Vrabel 
> 
> Ingo, can you apply this one patch directly to x86/asm?  The rest of
> the series is stalled pending my fixing Borislav's review comments
> and, more importantly, fixing the bugs that testing it has shaken
> loose.  This patch is a nice cleanup all by itself, though.

Ok, agreed, done.

Note that I simplified it some more:

-   probe_kernel_read(, (unsigned char *)v, 1);
+   probe_kernel_read(, v, 1);

... because 'v' already has a 'void *' natural type.

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] x86 PAT fixes for stable

2016-05-25 Thread Ingo Molnar


* Kani, Toshimitsu <toshi.k...@hpe.com> wrote:

> Hello,
> 
> The following patches fix two x86 PAT issues reported in 4.4 [1][2]. Can
> you please backport them up to 4.4? 

Do they apply, build and boot cleanly in that order on top of v4.4, v4.5 and 
v4.6? 
If yes then:

  Acked-by: Ingo Molnar <mi...@kernel.org>

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH] x86, locking: Remove ticket (spin)lock implementation

2016-05-19 Thread Ingo Molnar


* Peter Zijlstra  wrote:

> My motivation was that people keep trying to send patches against the ticket 
> lock code... David did just today, and he's not the first.

Yeah, let's just remove dead code ASAP. It's not like it's hard to add (back) 
new 
code, should the need arise.

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [GIT PULL] EFI ARM Xen support

2016-05-18 Thread Ingo Molnar


* Stefano Stabellini  wrote:

> On Tue, 17 May 2016, David Vrabel wrote:
> > On 17/05/16 11:12, Stefano Stabellini wrote:
> > > On Sat, 14 May 2016, Matt Fleming wrote:
> > >> Folks,
> > >>
> > >> Please pull the following branch which contains support for Xen on ARM
> > >> EFI platforms.
> > >>
> > >> This merge includes a merge of Stefano's xen/linux-next branch to pull
> > >> in the prerequisites required for Shannon's commit:
> > >>
> > >>   11ee5491e5ff ("Xen: EFI: Parse DT parameters for Xen specific UEFI")
> > >>
> > >> as it needs both the latest changes in the EFI 'next' branch (now
> > >> tip/efi/core) and xen/linux-next.
> > >>
> > >> I have intentionally not included the individual patches as I would
> > >> normally do in a pull request, so that commit history created by my
> > >> merging of Stefano's branch is preserved, and so that there's no way
> > >> to accidentally apply the patches individually.
> > >>
> > >> The following changes since commit 
> > >> 6c5450ef66816216e574885cf8d3ddb31ef77428:
> > >>
> > >>   efivarfs: Make efivarfs_file_ioctl() static (2016-05-07 07:06:13 +0200)
> > >>
> > >> are available in the git repository at:
> > >>
> > >>   git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi.git 
> > >> efi/arm-xen
> > >>
> > >> for you to fetch changes up to 11ee5491e5ff519e0d3a7018eb21351cb6955a98:
> > >>
> > >>   Xen: EFI: Parse DT parameters for Xen specific UEFI (2016-05-14 
> > >> 19:31:01 +0100)
> > > 
> > > Is this arrangement working for everybody, in particular the tip
> > > maintainers?
> > 
> > Yes.
> 
> I meant the x86 tip maintainers (Thomas, Peter and Ingo).

I have no particular objections, since this seems to be Xen-next merged to the 
EFI 
tree that is already upstream, plus this single commit on top of t:

  11ee5491e5ff Xen: EFI: Parse DT parameters for Xen specific UEFI

Which, if Matt is fine with it, you could send to Linus too as part of this 
cycle's Xen pull request.

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] efi_enabled(EFI_PARAVIRT) use

2016-04-29 Thread Ingo Molnar


* Stefano Stabellini <sstabell...@kernel.org> wrote:

> On Fri, 29 Apr 2016, Ingo Molnar wrote:
> > Also, it would be nice to have all things EFI in a single tree, the 
> > conflicts are 
> > going to be painful! There's very little reason not to carry this kind of 
> > commit:
> > 
> >  arch/arm/xen/enlighten.c   |  6 +
> >  drivers/firmware/efi/arm-runtime.c | 17 +-
> >  drivers/firmware/efi/efi.c | 45 
> > --
> >  3 files changed, 56 insertions(+), 12 deletions(-)
> > 
> > in the EFI tree.
> 
> That's true. I'll drop this commit from xentip and let Matt pick it up
> or request changes as he sees fit.

Thanks!

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] efi_enabled(EFI_PARAVIRT) use

2016-04-29 Thread Ingo Molnar


* Stephen Rothwell <s...@canb.auug.org.au> wrote:

> Hi all,
> 
> Today's linux-next merge of the xen-tip tree got a conflict in:
> 
>   drivers/firmware/efi/arm-runtime.c
> 
> between commit:
> 
>   14c43be60166 ("efi/arm*: Drop writable mapping of the UEFI System table")
> 
> from the tip tree and commit:
> 
>   21c8dfaa2327 ("Xen: EFI: Parse DT parameters for Xen specific UEFI")
> 
> from the xen-tip tree.

(I've attached 21c8dfaa2327 below, for reference.)

Argh:

With considerable pain we just got rid of paravirt_enabled() in the x86 tree, 
and 
Xen is now reintroducing it in the EFI code. Please don't: if you have to do 
capability flags then name the flag accordingly to what it does, don't use some 
generic catch-all naming that will inevitably cause the kind of problems 
paravirt_enabled() caused...

So: NAKed-by: Ingo Molnar <mi...@kernel.org>

Also, it would be nice to have all things EFI in a single tree, the conflicts 
are 
going to be painful! There's very little reason not to carry this kind of 
commit:

 arch/arm/xen/enlighten.c   |  6 +
 drivers/firmware/efi/arm-runtime.c | 17 +-
 drivers/firmware/efi/efi.c | 45 --
 3 files changed, 56 insertions(+), 12 deletions(-)

in the EFI tree.

Thanks,

Ingo

===>
From 21c8dfaa23276be2ae6d580331d8d252cc41e8d9 Mon Sep 17 00:00:00 2001
From: Shannon Zhao <shannon.z...@linaro.org>
Date: Thu, 7 Apr 2016 20:03:34 +0800
Subject: [PATCH] Xen: EFI: Parse DT parameters for Xen specific UEFI

Add a new function to parse DT parameters for Xen specific UEFI just
like the way for normal UEFI. Then it could reuse the existing codes.

If Xen supports EFI, initialize runtime services.

CC: Matt Fleming <m...@codeblueprint.co.uk>
Signed-off-by: Shannon Zhao <shannon.z...@linaro.org>
Reviewed-by: Matt Fleming <m...@codeblueprint.co.uk>
Reviewed-by: Stefano Stabellini <stefano.stabell...@eu.citrix.com>
Tested-by: Julien Grall <julien.gr...@arm.com>
---
 arch/arm/xen/enlighten.c   |  6 +
 drivers/firmware/efi/arm-runtime.c | 17 +-
 drivers/firmware/efi/efi.c | 45 --
 3 files changed, 56 insertions(+), 12 deletions(-)

diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
index 13e3e9f9b094..e130562d3283 100644
--- a/arch/arm/xen/enlighten.c
+++ b/arch/arm/xen/enlighten.c
@@ -261,6 +261,12 @@ static int __init fdt_find_hyper_node(unsigned long node, 
const char *uname,
!strncmp(hyper_node.prefix, s, strlen(hyper_node.prefix)))
hyper_node.version = s + strlen(hyper_node.prefix);
 
+   if (IS_ENABLED(CONFIG_XEN_EFI)) {
+   /* Check if Xen supports EFI */
+   if (of_get_flat_dt_subnode_by_name(node, "uefi") > 0)
+   set_bit(EFI_PARAVIRT, );
+   }
+
return 0;
 }
 
diff --git a/drivers/firmware/efi/arm-runtime.c 
b/drivers/firmware/efi/arm-runtime.c
index 6ae21e41a429..ac609b9f0b99 100644
--- a/drivers/firmware/efi/arm-runtime.c
+++ b/drivers/firmware/efi/arm-runtime.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 
 extern u64 efi_system_table;
 
@@ -107,13 +108,19 @@ static int __init arm_enable_runtime_services(void)
}
set_bit(EFI_SYSTEM_TABLES, );
 
-   if (!efi_virtmap_init()) {
-   pr_err("No UEFI virtual mapping was installed -- runtime 
services will not be available\n");
-   return -ENOMEM;
+   if (IS_ENABLED(CONFIG_XEN_EFI) && efi_enabled(EFI_PARAVIRT)) {
+   /* Set up runtime services function pointers for Xen Dom0 */
+   xen_efi_runtime_setup();
+   } else {
+   if (!efi_virtmap_init()) {
+   pr_err("No UEFI virtual mapping was installed -- 
runtime services will not be available\n");
+   return -ENOMEM;
+   }
+
+   /* Set up runtime services function pointers */
+   efi_native_runtime_setup();
}
 
-   /* Set up runtime services function pointers */
-   efi_native_runtime_setup();
set_bit(EFI_RUNTIME_SERVICES, );
 
efi.runtime_version = efi.systab->hdr.revision;
diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index 3a69ed5ecfcb..519c096a7c33 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -469,12 +469,14 @@ device_initcall(efi_load_efivars);
FIELD_SIZEOF(struct efi_fdt_params, field) \
}
 
-static __initdata struct {
+struct params {
const char name[32];
const char propname[32];
int offset;
int size;
-} dt_params[] = {
+};
+
+static struct params fdt_params[] __initdata = {
UEFI_PARAM("System Table", "li

Re: [Xen-devel] [PATCH v2] x86/mm/pat: Fix BUG_ON in mmap_mem on QEMU/i386

2016-04-13 Thread Ingo Molnar


* Toshi Kani <toshi.k...@hpe.com> wrote:

> The following BUG_ON error was reported on QEMU/i386:
> 
>   kernel BUG at arch/x86/mm/physaddr.c:79!
>   Call Trace:
>   phys_mem_access_prot_allowed
>   mmap_mem
>   ? mmap_region
>   mmap_region
>   do_mmap
>   vm_mmap_pgoff
>   SyS_mmap_pgoff
>   do_int80_syscall_32
>   entry_INT80_32
> 
> after commit edfe63ec97ed ("x86/mtrr: Fix Xorg crashes in
> Qemu sessions").
> 
> PAT is now set to disabled state when MTRRs are disabled.
> Thus, reactivating the __pa(high_memory) check in
> phys_mem_access_prot_allowed().
> 
> When CONFIG_DEBUG_VIRTUAL is set, __pa() calls __phys_addr(),
> which in turn calls slow_virt_to_phys() for 'high_memory'.
> Because 'high_memory' is set to (the max direct mapped virt
> addr + 1), it is not a valid virtual address.  Hence,
> slow_virt_to_phys() returns 0 and hit the BUG_ON.  Using
> __pa_nodebug() instead of __pa() will fix this BUG_ON.
> 
> However, this code block, originally written for Pentiums and
> earlier, is no longer adequate since a 32-bit Xen guest has
> MTRRs disabled and supports ZONE_HIGHMEM.  In this setup,
> this code sets UC attribute for accessing RAM in high memory
> range.
> 
> Delete this code block as it has been unused for a long time.
> 
> Reported-by: kernel test robot <ying.hu...@linux.intel.com>
> Link: https://lkml.org/lkml/2016/4/1/608
> Signed-off-by: Toshi Kani <toshi.k...@hpe.com>
> Cc: Thomas Gleixner <t...@linutronix.de>
> Cc: Ingo Molnar <mi...@kernel.org>
> Cc: H. Peter Anvin <h...@zytor.com>
> Cc: Borislav Petkov <b...@suse.de>
> Cc: David Vrabel <david.vra...@citrix.com>

So you missed the Reviewed-by tag from Boris ...

I've added it, but please try to propagate tags!

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v5 07/14] tools/lguest: force disable tboot and apm

2016-04-13 Thread Ingo Molnar

* Luis R. Rodriguez  wrote:

> The paravirt_enabled() check is going away, the area tossed to
> the kernel on lguest is not zerored out, so ensure lguest force
> disables tboot and apm just in case the kernel file being read might
> have this set for whatever reason.
> 
> Signed-off-by: Luis R. Rodriguez 
> ---
>  tools/lguest/lguest.c | 6 ++
>  1 file changed, 6 insertions(+)

Yeah, so the v4 patch was acked by Rusty:

  Acked-by: Rusty Russell 

... but you didn't add his ack to the v5 patch :-(

Please don't do this! Also please double check all past replies to make sure no 
acks got lost. I noticed this one, but there might be more.

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v5 01/14] x86/boot: enumerate documentation for the x86 hardware_subarch

2016-04-13 Thread Ingo Molnar


* Luis R. Rodriguez  wrote:

> Although hardware_subarch has been in place since the x86 boot
> protocol 2.07 it hasn't been used much. Enumerate current possible
> values to avoid misuses and help with semantics later at boot
> time should this be used further.
> 
> These enums should only ever be used by architecture x86 code,
> and all that code should be well contained and compartamentalized,
> clarify that as well.
> 
> v2: updates documentation further -- be a bit more pedantic about
> annotating care and use of these guys.
> v3: Use s/SOC/SoC and also anntoate that both domU and dom0 are
> both currently supported through the PV boot path.
> 
> Cc: Andy Shevchenko 
> Signed-off-by: Luis R. Rodriguez 
> ---
>  arch/x86/include/uapi/asm/bootparam.h | 37 
> ++-
>  1 file changed, 36 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/uapi/asm/bootparam.h 
> b/arch/x86/include/uapi/asm/bootparam.h
> index 329254373479..bf9fea2f4591 100644
> --- a/arch/x86/include/uapi/asm/bootparam.h
> +++ b/arch/x86/include/uapi/asm/bootparam.h
> @@ -157,7 +157,42 @@ struct boot_params {
>   __u8  _pad9[276];   /* 0xeec */
>  } __attribute__((packed));
>  
> -enum {
> +/**
> + * enum x86_hardware_subarch - x86 hardware subarchitecture

Could you add some prominent warning here, like:

> + * WARNING: the 'x86 subarch flag' is only used for legacy hacks, for 
> platform
> + *  features that are not easily enumerated or discoverable. You 
> should
> + *  not ever use this for new features.

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v3 0/7] Enhance PAT init to fix Xorg crashes

2016-03-29 Thread Ingo Molnar


* Toshi Kani  wrote:

> A Xorg failure on qemu32 was reported as a regression [1] caused by
> 'commit 9cd25aac1f44 ("x86/mm/pat: Emulate PAT when it is disabled")'.
> This patch-set fixes the regression.
> 
> Negative effects of this regression were two failures [2] in Xorg on
> QEMU with QEMU CPU model "qemu32" (-cpu qemu32), which were triggered
> by the fact that its virtual CPU does not support MTRR.
>  #1. copy_process() failed in the check in reserve_pfn_range()
>  #2. error path in copy_process() then hit WARN_ON_ONCE in
>  untrack_pfn().
> 
> These negative effects are caused by two separate bugs, but they can be
> addressed in separate patches.  This patch-set addresses the root cause,
> a long-standing PAT initialization issue.
> 
> Please see the changelog in patch 4/7 for the details of the issue.
> 
> - Patch 1-2 make necessary enhancement to PAT for the fix without
>   breaking Xen.
> - Patch 3 is cleanup.
> - Patch 4 fixes the regression.
> - Patch 5 fixes an MTRR issue related with PAT init.
> - Patch 6 removes PAT init code from Xen.
> - Patch 7 adds PAT init to documentation.
> 
> [1]: https://lkml.org/lkml/2016/3/3/828
> [2]: https://lkml.org/lkml/2016/3/4/775

The changelogs are much improved, I've applied these patches to tip:x86/mm,
thanks Toshi!

> I'd appreciate if someone can test this patch-set on Xen to verify that
> there is no change in "x86/PAT: Configuration [0-7] .." message in dmesg.

So I don't have a Xen setup, but hopefully such testing will happen once these 
changes show up in linux-next, tomorrow or so.

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v4 2/5] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops

2016-03-15 Thread Ingo Molnar


* Ingo Molnar <mi...@kernel.org> wrote:

> * Andy Lutomirski <l...@amacapital.net> wrote:
> 
> > On Mon, Mar 14, 2016 at 11:40 AM, Linus Torvalds
> > <torva...@linux-foundation.org> wrote:
> > > On Mon, Mar 14, 2016 at 11:24 AM, Andy Lutomirski <l...@amacapital.net> 
> > > wrote:
> > >>
> > >> The code in my queue is, literally:
> > >>
> > >> bool ex_handler_rdmsr_unsafe(const struct exception_table_entry *fixup,
> > >>  struct pt_regs *regs, int trapnr)
> > >> {
> > >> WARN_ONCE(1, "unchecked MSR access error: RDMSR from 0x%x",
> > >>   (unsigned int)regs->cx);
> > >>
> > >> /* Pretend that the read succeeded and returned 0. */
> > >> regs->ip = ex_fixup_addr(fixup);
> > >> regs->ax = 0;
> > >> regs->dx = 0;
> > >> return true;
> > >> }
> > >> EXPORT_SYMBOL(ex_handler_rdmsr_unsafe);
> > >
> > > I guess I can live with this, as long as we also extend the
> > > early-fault handling to work with the special exception handlers.
> > 
> > OK, will do.  I need to rewrork the early IDT code a bit so it
> > generates a real pt_regs layout, but that's arguably a cleanup anyway.
> 
> Ok, with that's I'm pretty happy about it as well.

Note, I think it's pretty clear at this point that this is v4.7 material.

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v4 2/5] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops

2016-03-15 Thread Ingo Molnar


* Andy Lutomirski  wrote:

> On Mon, Mar 14, 2016 at 11:40 AM, Linus Torvalds
>  wrote:
> > On Mon, Mar 14, 2016 at 11:24 AM, Andy Lutomirski  
> > wrote:
> >>
> >> The code in my queue is, literally:
> >>
> >> bool ex_handler_rdmsr_unsafe(const struct exception_table_entry *fixup,
> >>  struct pt_regs *regs, int trapnr)
> >> {
> >> WARN_ONCE(1, "unchecked MSR access error: RDMSR from 0x%x",
> >>   (unsigned int)regs->cx);
> >>
> >> /* Pretend that the read succeeded and returned 0. */
> >> regs->ip = ex_fixup_addr(fixup);
> >> regs->ax = 0;
> >> regs->dx = 0;
> >> return true;
> >> }
> >> EXPORT_SYMBOL(ex_handler_rdmsr_unsafe);
> >
> > I guess I can live with this, as long as we also extend the
> > early-fault handling to work with the special exception handlers.
> 
> OK, will do.  I need to rewrork the early IDT code a bit so it
> generates a real pt_regs layout, but that's arguably a cleanup anyway.

Ok, with that's I'm pretty happy about it as well.

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v4 2/5] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops

2016-03-15 Thread Ingo Molnar


* Linus Torvalds  wrote:

> On Mon, Mar 14, 2016 at 10:17 AM, Andy Lutomirski  wrote:
> >
> > So yes, let's please warn.  I'm okay with removing the panic_on_oops
> > thing though.  (But if anyone suggests that we should stop OOPSing on
> > bad kernel page faults, I *will* fight back.)
> 
> Bad kernel page faults are something completely different. They would
> be actual bugs regardless.
> 
> The MSR thing has *often* been just silly "this CPU doesn't do that
> MSR". Generic bootup setup code etc that just didn't know or care
> about the particular badly documented rule for one particular random
> CPU version and stepping.
> 
> In fact, when I say "often", I suspect I should really just say
> "always". I don't think we've ever found a case where oopsing would
> have been the right thing. But it has definitely caused lots of
> problems, especially in the early paths where your code doesn't even
> work right now.
> 
> Now, when it comes to the warning, I guess I could live with it, but I
> think it's stupid to make this a low-level exception handler thing.
> 
> So what I think should be done:
> 
>  - make sure that wr/rdmsr_safe() actually works during very early
> init. At some point it didn't.
> 
>  - get rid of the current wrmsr/rdmsr *entirely*. It's shit.
> 
>  - Add this wrapper:
> 
>   #define wrmsr(msr, low, high) \
> WARN_ON_ONCE(wrmsr_safe(msr, low, high))
> 
> and be done with it. We could even decide to make that WARN_ON_ONCE()
> be something we could configure out, because it's really a debugging
> thing and isn't like it should be all that fatal.
> 
> None of this insane complicated crap that buys us exactly *nothing*,
> and depends on fancy new exception handling support etc etc.
> 
> So what's the downside to just doing this simple thing?

This was my original suggestion as well.

The objection was that sometimes it's performance critical. To which I replied 
that 1) I highly doubt that a single extra branch of the check is measurable 2) 
and even if so, then _those_ performance critical cases should be done in some 
special way (rdmsr_nocheck() or whatever) - at which point the argument was 
that 
there's a lot of such cases.

Which final line of argument I still don't really buy, btw., but it became a me 
against everyone else argument and I cowardly punted.

Now that the 800 lb gorilla is on my side I of course stand my ground, 
principles 
are principles!

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v2 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops

2016-03-12 Thread Ingo Molnar


* Andy Lutomirski <l...@amacapital.net> wrote:

> On Thu, Oct 1, 2015 at 12:15 AM, Ingo Molnar <mi...@kernel.org> wrote:
> >
> > * Andy Lutomirski <l...@amacapital.net> wrote:
> >
> >> > These could still be open coded in an inlined fashion, like the 
> >> > scheduler usage.
> >>
> >> We could have a raw_rdmsr for those.
> >>
> >> OTOH, I'm still not 100% convinced that this warn-but-don't-die behavior is
> >> worth the effort.  This isn't a frequent source of bugs to my knowledge, 
> >> and we
> >> don't try to recover from incorrect cr writes, out-of-bounds MMIO, etc, so 
> >> do we
> >> really gain much by rigging a recovery mechanism for rdmsr and wrmsr 
> >> failures
> >> for code that doesn't use the _safe variants?
> >
> > It's just the general principle really: don't crash the kernel on bootup. 
> > There's
> > few things more user hostile than that.
> >
> > Also, this would maintain the status quo: since we now (accidentally) don't 
> > crash
> > the kernel on distro kernels (but silently and unsafely ignore the faulting
> > instruction), we should not regress that behavior (by adding the chance to 
> > crash
> > again), but improve upon it.
> 
> Just a heads up: the extable improvements in tip:ras/core make it
> straightforward to get the best of all worlds: explicit failure
> handling (written in C!), no fast path overhead whatsoever, and no new
> garbage in the exception handlers.

I _knew_ I should have merged them into tip:x86/mm, not tip:ras/core ;-)

I had a quick look at your new MSR series and I'm very happy with that 
direction!

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v3 2/5] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops

2016-03-12 Thread Ingo Molnar


* Andy Lutomirski  wrote:

> This demotes an OOPS and likely panic due to a failed non-"safe" MSR
> access to a WARN and, for RDMSR, a return value of zero.  If
> panic_on_oops is set, then failed unsafe MSR accesses will still
> oops and panic.
> 
> To be clear, this type of failure should *not* happen.  This patch
> exists to minimize the chance of nasty undebuggable failures due on
> systems that used to work due to a now-fixed CONFIG_PARAVIRT=y bug.
> 
> Signed-off-by: Andy Lutomirski 
> ---
>  arch/x86/include/asm/msr.h | 10 --
>  arch/x86/mm/extable.c  | 33 +
>  2 files changed, 41 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
> index 93fb7c1cffda..1487054a1a70 100644
> --- a/arch/x86/include/asm/msr.h
> +++ b/arch/x86/include/asm/msr.h
> @@ -92,7 +92,10 @@ static inline unsigned long long native_read_msr(unsigned 
> int msr)
>  {
>   DECLARE_ARGS(val, low, high);
>  
> - asm volatile("rdmsr" : EAX_EDX_RET(val, low, high) : "c" (msr));
> + asm volatile("1: rdmsr\n"
> +  "2:\n"
> +  _ASM_EXTABLE_HANDLE(1b, 2b, ex_handler_rdmsr_unsafe)
> +  : EAX_EDX_RET(val, low, high) : "c" (msr));
>   if (msr_tracepoint_active(__tracepoint_read_msr))
>   do_trace_read_msr(msr, EAX_EDX_VAL(val, low, high), 0);
>   return EAX_EDX_VAL(val, low, high);
> @@ -119,7 +122,10 @@ static inline unsigned long long 
> native_read_msr_safe(unsigned int msr,
>  static inline void native_write_msr(unsigned int msr,
>   unsigned low, unsigned high)
>  {
> - asm volatile("wrmsr" : : "c" (msr), "a"(low), "d" (high) : "memory");
> + asm volatile("1: wrmsr\n"
> +  "2:\n"
> +  _ASM_EXTABLE_HANDLE(1b, 2b, ex_handler_wrmsr_unsafe)
> +  : : "c" (msr), "a"(low), "d" (high) : "memory");
>   if (msr_tracepoint_active(__tracepoint_read_msr))
>   do_trace_write_msr(msr, ((u64)high << 32 | low), 0);
>  }
> diff --git a/arch/x86/mm/extable.c b/arch/x86/mm/extable.c
> index 9dd7e4b7fcde..f310714e6e6d 100644
> --- a/arch/x86/mm/extable.c
> +++ b/arch/x86/mm/extable.c
> @@ -49,6 +49,39 @@ bool ex_handler_ext(const struct exception_table_entry 
> *fixup,
>  }
>  EXPORT_SYMBOL(ex_handler_ext);
>  
> +bool ex_handler_rdmsr_unsafe(const struct exception_table_entry *fixup,
> +  struct pt_regs *regs, int trapnr)
> +{
> + WARN(1, "unsafe MSR access error: RDMSR from 0x%x",
> +  (unsigned int)regs->cx);

Btw., instead of the safe/unsafe naming (which has an emotional and security 
secondary attribute), shouldn't we move this over to a _check() (or 
_checking()) 
naming instead that we do in other places in the kernel?

I.e.:

rdmsr(msr, l, h);

and:

if (rdmsr_check(msr, l, h)) {
...
}

and then we could name the helpers as _check() and _nocheck() - which is 
neutral 
naming.

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v3 2/5] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops

2016-03-12 Thread Ingo Molnar


* Andy Lutomirski  wrote:

> +bool ex_handler_rdmsr_unsafe(const struct exception_table_entry *fixup,
> +  struct pt_regs *regs, int trapnr)
> +{
> + WARN(1, "unsafe MSR access error: RDMSR from 0x%x",
> +  (unsigned int)regs->cx);

Please make this WARN_ONCE(). There's no point in locking up the system with 
WARN() spam, should this trigger frequently.

> + WARN(1, "unsafe MSR access error: WRMSR to 0x%x (tried to write 
> 0x%08x%08x)\n",
> +  (unsigned int)regs->cx,
> +  (unsigned int)regs->dx, (unsigned int)regs->ax);

Ditto.

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v3 01/11] x86/boot: enumerate documentation for the x86 hardware_subarch

2016-02-25 Thread Ingo Molnar

* Andy Lutomirski <l...@amacapital.net> wrote:

> On Feb 24, 2016 12:33 AM, "Ingo Molnar" <mi...@kernel.org> wrote:
> >
> > For hard coded platform quirks I'd suggest we add x86_platform.quirks 
> > flags. For
> > example the F00F hack for Xen could be done via:
> >
> > x86_platform.quirks.idt_remap = 0;
> >
> 
> Don't we unconditionally remap the IDT?  I think Kees did it for
> general purpose hardening due to our complete inability to hide the
> IDT address. I.e. I think we can remove the f00f condition entirely.

Yeah, indeed - I only judged by the (limited) patch context and assumed the Xen 
problem was with IDT remapping.

But what the quirk really does is only to avoid printing the f00f workaround - 
i.e. a cosmetic change. I think we should just drop the paravirt_enabled() 
check.

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v3 01/11] x86/boot: enumerate documentation for the x86 hardware_subarch

2016-02-23 Thread Ingo Molnar


* Luis R. Rodriguez  wrote:

> Although hardware_subarch has been in place since the x86 boot
> protocol 2.07 it hasn't been used much. Enumerate current possible
> values to avoid misuses and help with semantics later at boot
> time should this be used further.
> 
> v2: fix typos
> 
> Cc: Andy Shevchenko 
> Signed-off-by: Luis R. Rodriguez 
> ---
>  arch/x86/include/uapi/asm/bootparam.h | 31 ++-
>  1 file changed, 30 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/uapi/asm/bootparam.h 
> b/arch/x86/include/uapi/asm/bootparam.h
> index 329254373479..50d5009cf276 100644
> --- a/arch/x86/include/uapi/asm/bootparam.h
> +++ b/arch/x86/include/uapi/asm/bootparam.h
> @@ -157,7 +157,36 @@ struct boot_params {
>   __u8  _pad9[276];   /* 0xeec */
>  } __attribute__((packed));
>  
> -enum {
> +/**
> + * enum x86_hardware_subarch - x86 hardware subarchitecture
> + *
> + * The x86 hardware_subarch and hardware_subarch_data were added as of the 
> x86
> + * boot protocol 2.07 to help distinguish and supports custom x86 boot
> + * sequences. This enum represents accepted values for the x86
> + * hardware_subarch. Custom x86 boot sequences (not X86_SUBARCH_PC) do not 
> have
> + * or simply do not make use of natural stubs like BIOS or EFI, the
> + * hardware_subarch can be used on the Linux entry path to revector to a
> + * subarchitecture stub when needed. This subarchitecture stub can be used to
> + * set up Linux boot parameters or for special care to account for 
> nonstandard
> + * handling of page tables.
> + *
> + * KVM and Xen HVM do not have a subarch as these are expected to follow
> + * standard x86 boot entries. If there is a genuine need for "hypervisor" 
> type
> + * that should be considered separately in the future.
> + *
> + * @X86_SUBARCH_PC: Should be used if the hardware is enumerable using 
> standard
> + *   PC mechanisms (PCI, ACPI) and doesn't need a special boot flow.
> + * @X86_SUBARCH_LGUEST: Used for x86 hypervisor demo, lguest
> + * @X86_SUBARCH_XEN: Used for Xen guest types which follow the PV boot path,
> + *   which start at asm startup_xen() entry point and later jump to the C
> + *   xen_start_kernel() entry point.
> + * @X86_SUBARCH_INTEL_MID: Used for Intel MID (Mobile Internet Device) 
> platform
> + *   systems which do not have the PCI legacy interfaces.
> + * @X86_SUBARCH_CE4100: Used for Intel CE media processor (CE4100) SOC for
> + *   for settop boxes and media devices, the use of a subarch for CE4100
> + *   is more of a hack...
> + */
> +enum x86_hardware_subarch {
>   X86_SUBARCH_PC = 0,
>   X86_SUBARCH_LGUEST,
>   X86_SUBARCH_XEN,

No, this is really backwards.

While I agree that we want to get rid of paravirt_enabled(), we _dont_ want to 
spread the use of (arguably broken) boot flags like this! This is one of the 
cases 
where the cure is worse than the disease.

The 'modern' way to handle platform quirks is to have hardware drivers with 
auto-detection, and drivers know how to detect whether the hardware is present 
or 
not. For legacy cases where no auto-detection is possible, we have per hardware 
capability flags to turn it off.

The 'hardware subarch' flag of the bootloader can be used to install certain 
quirks, in a single early bootup function - but that should be all: ideally no 
quirks are needed. We don't want to spread 'subarch flags' into various 
unrelated 
code.

Let's go over your series and see whether and how that principle can be applied:

 - patch #1, #2: should be dropped

 - patch #3, #4: EBDA support.

   The EBDA BIOS signature is an ancient data structure, starting off at
   physical memory 0x40E - which is the very first physical memory page of the
   system.

   We should add an x86_ebda_bios flag that is set to 1 by default, but which
   paravirt bootup can set to 0. That would avoid the reservation of the BIOS 
   area and will save a bit of RAM.

 - patch #5, #6, #7: looks good, does not use a subarch flag

 - patch #8: f00f workaround. Subarch flag use is wrong. The complication with 
   this workaround is that it uses MM tricks to install an IDT. Could you check 
   whether Xen truly needs this quirk? If yes then there should be a new flag, 
   something like x86_idt_readonly, which is set to 0 but Xen can set it to 1. 
If 
   that flag is set then the F00F workaround does not have to be installed.

   Or something like that: the point is to use a specific flag.

 - Patch #9, #10: RTC support. The problem with RTC platform driver is that 
it's 
   not possible to detect the RTC reliably - so we sometimes have to quirk it 
off.

   Instead of using bitflags, add something like x86_platform.rtc_available, 
which 
   defaults to 1. Don't add negation to the name and don't use bitflags - use a 
   byte flag.

 - Patch #11: this patch wants to disable the PNP BIOS code.

   The

Re: [Xen-devel] [PATCH v4 10/11] dma: rename dma__writecombine() to dma__wc()

2016-01-19 Thread Ingo Molnar


* Luis R. Rodriguez <mcg...@do-not-panic.com> wrote:

> On Tue, Aug 25, 2015 at 9:21 PM, Ingo Molnar <mi...@kernel.org> wrote:
> >
> > * Andrew Morton <a...@linux-foundation.org> wrote:
> >
> >> > There's a catch-22 issue here either way, for instance this rename patch 
> >> > has
> >> > been being baked for probably 2 releases already but the difficulty has 
> >> > been
> >> > trying to find the appropriate time to merge it without conflict.
> >> >
> >> > If you do it in the beginning of the merge window, you have to ask 
> >> > yourself in
> >> > what tree it will be done. Since subsystems are topic specific that 
> >> > means that
> >> > subsystem will end up having a conflict at the end of the merge window.
> >>
> >> Yes it's a special case.  I think the best way of handling such things is 
> >> to get
> >> them in to Linus either right at the end of the merge window or the day 
> >> after he
> >> releases -rc1.  This is when most people's trees are mostly empty.
> >
> > Yes, that was the plan last time around as well - but the end of the merge 
> > window
> > is when we have the least maintainer bandwidth as well ...
> >
> > Anyway, I applied most of the patches (sans the rename), so the rename patch
> > should be a lot simpler to execute at the right moment this time around.
> 
> Ingo, should we try this again some time? I have some ideas on how to
> make these sorts of changes easier to manage in the future, it
> involves having an automatic git rebase option to use Coccinelle for
> you if a patch is annotated to have been completely done with
> Coccinelle, but future tooling is needed for that [0]. In the meantime
> I (or you) can simply run the script at any point in time to catch all
> the names as-is in the kernel / point in time we decide to merge this
> simple rename.
> 
> [0] http://kernelnewbies.org/KernelProjects/linux-oven

So beyond the rename, can we also keep the old names as compatibility helpers, 
with a #define mapping them to the new names?

If so then please (re-)send the changes.

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] Linux 4.4 MW: Boot under Xen fails with CONFIG_DEBUG_WX enabled: RIP: ptdump_walk_pgd_level_core

2015-11-04 Thread Ingo Molnar


* Stephen Smalley  wrote:

> On 11/04/2015 06:55 AM, Sander Eikelenboom wrote:
> >Hi All,
> >
> >I just tried to boot with the current linus mergewindow tree under Xen.
> >It fails with a kernel panic at boot with the new "CONFIG_DEBUG_WX"
> >option enabled.
> >Disabling it makes the kernel boot fine.
> >
> >The splat:
> >[   18.424241] Freeing unused kernel memory: 1104K (822fc000 -
> >8241)
> >[   18.430314] Write protecting the kernel read-only data: 18432k
> >[   18.441054] Freeing unused kernel memory: 1144K (880001ae2000 -
> >880001c0)
> >[   18.447966] Freeing unused kernel memory: 1560K (88000207a000 -
> >88000220)
> >[   18.453947] BUG: unable to handle kernel paging request at
> >88055c883000
> >[   18.459943] IP: []
> >ptdump_walk_pgd_level_core+0x20e/0x440
> >[   18.465847] PGD 2212067 PUD 0
> >[   18.471564] Oops:  [#1] SMP
> >[   18.477248] Modules linked in:
> >[   18.482918] CPU: 2 PID: 1 Comm: swapper/0 Not tainted
> >4.3.0-mw-20151104-linus-doflr+ #1
> >[   18.488804] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640)  , BIOS
> >V1.8B1 09/13/2010
> >[   18.494778] task: 880059b9 ti: 880059b98000 task.ti:
> >880059b98000
> >[   18.500852] RIP: e030:[]  []
> >ptdump_walk_pgd_level_core+0x20e/0x440

It would be nice to see which line of code this corresponds to. Doing this:

  gdb vmlinux
  list *0x8105af8e

should normally do the trick.

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v2 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops

2015-10-01 Thread Ingo Molnar

* Andy Lutomirski  wrote:

> > These could still be open coded in an inlined fashion, like the scheduler 
> > usage.
> 
> We could have a raw_rdmsr for those.
> 
> OTOH, I'm still not 100% convinced that this warn-but-don't-die behavior is 
> worth the effort.  This isn't a frequent source of bugs to my knowledge, and 
> we 
> don't try to recover from incorrect cr writes, out-of-bounds MMIO, etc, so do 
> we 
> really gain much by rigging a recovery mechanism for rdmsr and wrmsr failures 
> for code that doesn't use the _safe variants?

It's just the general principle really: don't crash the kernel on bootup. 
There's 
few things more user hostile than that.

Also, this would maintain the status quo: since we now (accidentally) don't 
crash 
the kernel on distro kernels (but silently and unsafely ignore the faulting 
instruction), we should not regress that behavior (by adding the chance to 
crash 
again), but improve upon it.

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v2 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops

2015-09-30 Thread Ingo Molnar


* Peter Zijlstra <pet...@infradead.org> wrote:

> On Mon, Sep 21, 2015 at 09:36:15AM -0700, Linus Torvalds wrote:
> > On Mon, Sep 21, 2015 at 1:46 AM, Ingo Molnar <mi...@kernel.org> wrote:
> > >
> > > Linus, what's your preference?
> > 
> > So quite frankly, is there any reason we don't just implement
> > native_read_msr() as just
> > 
> >unsigned long long native_read_msr(unsigned int msr)
> >{
> >   int err;
> >   unsigned long long val;
> > 
> >   val = native_read_msr_safe(msr, );
> >   WARN_ON_ONCE(err);
> >   return val;
> >}
> > 
> > Note: no inline, no nothing. Just put it in arch/x86/lib/msr.c, and be
> > done with it. I don't see the downside.
> > 
> > How many msr reads are so critical that the function call
> > overhead would matter? Get rid of the inline version of the _safe()
> > thing too, and put that thing there too.
> 
> There are a few in the perf code, and esp. on cores without a stack engine 
> the 
> call overhead is noticeable. Also note that the perf MSRs are generally 
> optimized MSRs and less slow (we cannot say fast, they're still MSRs) than 
> regular MSRs.

These could still be open coded in an inlined fashion, like the scheduler usage.

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v2 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops

2015-09-22 Thread Ingo Molnar

* Linus Torvalds <torva...@linux-foundation.org> wrote:

> On Mon, Sep 21, 2015 at 1:46 AM, Ingo Molnar <mi...@kernel.org> wrote:
> >
> > Linus, what's your preference?
> 
> So quite frankly, is there any reason we don't just implement
> native_read_msr() as just
> 
>unsigned long long native_read_msr(unsigned int msr)
>{
>   int err;
>   unsigned long long val;
> 
>   val = native_read_msr_safe(msr, );
>   WARN_ON_ONCE(err);
>   return val;
>}
> 
> Note: no inline, no nothing. Just put it in arch/x86/lib/msr.c, and be
> done with it. I don't see the downside.

Absolutely!

> How many msr reads are so critical that the function call overhead 
> would 
> matter? Get rid of the inline version of the _safe() thing too, and put that 
> thing there too.

Only a very low number of them is performance critical (because even 
hw-accelerated MSR accesses are generally slow so we try to avoid MSR accesses 
in 
fast paths as much as possible, via shadowing, etc.) - and in the few cases 
where 
we have to access an MSR in a fast path we can do those separately.

I'm only worried about the 'default' APIs, i.e. rdmsr() that is used throughout 
arch/x86/ over a hundred times, not about performance critical code paths that 
get 
enough testing and enough attention in general.

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v2 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops

2015-09-21 Thread Ingo Molnar


* Andy Lutomirski  wrote:

> On Sep 20, 2015 5:15 PM, "Linus Torvalds"  
> wrote:
> >
> > On Sun, Sep 20, 2015 at 5:02 PM, Andy Lutomirski  wrote:
> > > This demotes an OOPS and likely panic due to a failed non-"safe" MSR
> > > access to a WARN_ON_ONCE and a return of zero (in the RDMSR case).
> > > We still write a pr_info entry unconditionally for debugging.
> >
> > No, this is wrong.
> >
> > If you really want to do something like this, then just make all MSR reads 
> > safe. So the only difference between "safe" and "unsafe" is that the unsafe 
> > version just doesn't check the return value, and silently just returns zero 
> > for reads (or writes nothing).
> >
> > To quote Obi-Wan: "Use the exception table, Luke".
> >
> > Because decoding instructions is just too ugly. We'll do it for CPU errata 
> > where we might have to do it for user space code too (ie the AMD prefetch 
> > mess), but for code that _we_ control? Hell no.
> >
> > So NAK on this.
> 
> My personal preference is to just not do this at all.  A couple people 
> disagree.  
> If we make the unsafe variants not oops, then I think we want to have the 
> nice 
> loud warning, since these issues are bugs if they happen.
> 
> We could certainly use the exception table for this, but it'll result in 
> bigger 
> core, since each MSR access will need an exception table entry and an 
> associated 
> fixup to call some helper that warns and sets the result to zero.
> 
> I'd be happy to implement that, but only if it'll be applied. Otherwise I'd 
> rather just drop this patch and keep the rest of the series.

Linus, what's your preference?

Due to the bug mentioned earlier in this thread all MSR reads are currently 
'safe' 
on all the major Linux distros (which all have CONFIG_PARAVIRT=y), i.e. by 
'fixing' them we'd reintroduce random crashes into various fragile pieces of 
code...

To add insult to injury, the current 'silently safe by accident' MSR code isn't 
so 
safe: because it leaves the result of the read uninitialized...

To fix this all I'd really like to have:

 - safe MSR reads by default (i.e. never boot crash the kernel on some rare 
   condition - which to most users is either a silent boot hang or an instant 
   restart). Historicaly we had a stream of 'silly boot crashes' due to MSR 
reads 
   that generate a #GPF. They make Linux less usable around the edges, 
especially 
   in the x86 non-server (desktop) space where most hardware vendors are either 
   openly Linux hostile, or, at best, Linux oblivious.

 - proper result-zeroing behavior on exceptions

 - and we should also generate _some_ sort of warning when MSR exceptions happen
   in an 'unintended' fashion.

Maybe the warning could be put under a (default-enabled) config option for the 
size conscious.

Or we could extend exception table entry encoding to include a 'warning bit', 
to 
not bloat the kernel. If the exception handler code encounters such an 
exception 
it would generate a one-time warning for that entry, but otherwise not crash 
the 
kernel and continue execution with an all-zeroes result for the MSR read.

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops

2015-09-18 Thread Ingo Molnar

* Andy Lutomirski  wrote:

> This demotes an OOPS and likely panic due to a failed non-"safe" MSR
> access to a WARN_ON_ONCE and a return of poisoned values (in the
> RDMSR case).  We still write a pr_info entry unconditionally for
> debugging.
> 
> To be clear, this type of failure should *not* happen.  This patch
> exists to minimize the chance of nasty undebuggable failures due on
> systems that used to work due to a now-fixed CONFIG_PARAVIRT=y bug.

> + if (opcode == 0x320f) {
> + /* RDMSR */
> + pr_info("bad kernel RDMSR from non-existent MSR 0x%x",
> + (unsigned int)regs->cx);
> + if (!panic_on_oops) {
> + WARN_ON_ONCE(true);
> +
> + /* Patch it up with deterministic poison. */
> + regs->ax = 0x5aadc0de;
> + regs->dx = 0x8badf00d;
> + regs->ip += 2;
> + return true;

IMHO this should really not poison the result, but use zero as the result.

The poison might randomly indicate 'present' feature in various registers that 
might be accessed in a buggy way. Don't send the code further down into 
la-la-land 
by giving it a 'success'.

And yes, zero can mean success too, but we have to pick a side here ...

The warning will be enough to fix these ups, people (and in particular distro 
testing people) will be watching out for them.

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH 0/3] x86/paravirt: Fix baremetal paravirt MSR ops

2015-09-17 Thread Ingo Molnar

* Andy Lutomirski  wrote:

> Setting CONFIG_PARAVIRT=y has an unintended side effect: it silently
> turns all rdmsr and wrmsr operations into the safe variants without
> any checks that the operations actually succeed.
> 
> This is IMO awful: it papers over bugs.  In particular, KVM gueests
> might be unwittingly depending on this behavior because
> CONFIG_KVM_GUEST currently depends on CONFIG_PARAVIRT.  I'm not
> aware of any such problems, but applying this series would be a good
> way to shake them out.
> 
> Fix it so that the MSR operations work the same on CONFIG_PARAVIRT=n
> and CONFIG_PARAVIRT=y as long as Xen isn't being used.  The Xen
> maintainers are welcome to make a similar change on top of this.
> 
> Since there's plenty of time before the next merge window, I think
> we should apply and fix anything that breaks.

No, I think we should at most generate a warning instead, and not crash the 
kernel 
via rdmsr()!

Most big distro kernels on bare metal have CONFIG_PARAVIRT=y (I checked Ubuntu 
and 
Fedora), so we are potentially exposing a lot of users to problems.

Crashing the bootup on an unknown MSR is bad. Many MSR reads and writes are 
non-critical and returning the 'safe' result is much better than crashing or 
hanging the bootup.

( We should double check that rdmsr()/wrmsr() results are never left 
  uninitialized, but are set to zero or so, for cases where the return code is 
not 
  checked. )

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH 0/3] x86/paravirt: Fix baremetal paravirt MSR ops

2015-09-17 Thread Ingo Molnar


* Andy Lutomirski <l...@amacapital.net> wrote:

> On Thu, Sep 17, 2015 at 12:19 AM, Ingo Molnar <mi...@kernel.org> wrote:
> >
> > * Andy Lutomirski <l...@kernel.org> wrote:
> >
> >> Setting CONFIG_PARAVIRT=y has an unintended side effect: it silently
> >> turns all rdmsr and wrmsr operations into the safe variants without
> >> any checks that the operations actually succeed.
> >>
> >> This is IMO awful: it papers over bugs.  In particular, KVM gueests
> >> might be unwittingly depending on this behavior because
> >> CONFIG_KVM_GUEST currently depends on CONFIG_PARAVIRT.  I'm not
> >> aware of any such problems, but applying this series would be a good
> >> way to shake them out.
> >>
> >> Fix it so that the MSR operations work the same on CONFIG_PARAVIRT=n
> >> and CONFIG_PARAVIRT=y as long as Xen isn't being used.  The Xen
> >> maintainers are welcome to make a similar change on top of this.
> >>
> >> Since there's plenty of time before the next merge window, I think
> >> we should apply and fix anything that breaks.
> >
> > No, I think we should at most generate a warning instead, and not crash the 
> > kernel
> > via rdmsr()!
> >
> > Most big distro kernels on bare metal have CONFIG_PARAVIRT=y (I checked 
> > Ubuntu and
> > Fedora), so we are potentially exposing a lot of users to problems.
> >
> > Crashing the bootup on an unknown MSR is bad. Many MSR reads and writes are
> > non-critical and returning the 'safe' result is much better than crashing or
> > hanging the bootup.
> >
> 
> Should we do that for CONFIG_PARAVIRT=n, too?

Absolutely. PARAVIRT=n should not behave differently from PARAVIRT=y on bare 
metal.

> It would be straightforward to rig this up (temporarily?) on top of these 
> patches.  To keep bloat down, we might want to implement it in 
> do_general_protection rather than sticking it in native_read_msr.

Fair enough.

> wrmsr is a different beast, since we can fail due to writing the wrong value 
> to 
> an otherwise valid MSR.  Given that MSR screwups can very easily be security 
> holes, I'm not sure that warning and blindly continuing on an unchecked 
> failed 
> wrmsr is a good idea.

So the fact is that right now we are silently ignoring failures there, and have 
been doing that for some time. The right first step is to live with that and 
start 
generating low-key, once-per-bootup warnings at most, and see how frequent (and 
how serious) they are.

We could add a (default disabled) CONFIG_PANIC_ON_UNKNOWN_MSR=y option if 
that's 
really a serious concern.

> In any event, I think it's nuts that CONFIG_PARAVIRT changes this
> behavior.  We should pick something sane and stick with it.

Absolutely - and as it happens, the 'does not crash the kernel' PARAVIRT=y 
accidental behavior is actually quite close to what we wanted for a long time, 
so 
let's make it official - and add a warning to make sure we are aware of 
problems. 

But don't turn 'potential problems' into showstopper bugs such as a hard to 
debug 
early boot crash, which to most Linux users means a black screen on bootup!

> > ( We should double check that rdmsr()/wrmsr() results are never left
> >   uninitialized, but are set to zero or so, for cases where the return code 
> > is not
> >   checked. )
> 
> It sure looks like native_read_msr_safe doesn't clear the output if the rdmsr 
> fails.

Yeah, that should be fixed, to make such soft failures deterministic.

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v4 10/11] dma: rename dma__writecombine() to dma__wc()

2015-08-25 Thread Ingo Molnar


* Andrew Morton a...@linux-foundation.org wrote:

  There's a catch-22 issue here either way, for instance this rename patch 
  has 
  been being baked for probably 2 releases already but the difficulty has 
  been 
  trying to find the appropriate time to merge it without conflict.
  
  If you do it in the beginning of the merge window, you have to ask yourself 
  in 
  what tree it will be done. Since subsystems are topic specific that means 
  that 
  subsystem will end up having a conflict at the end of the merge window.
 
 Yes it's a special case.  I think the best way of handling such things is to 
 get 
 them in to Linus either right at the end of the merge window or the day after 
 he 
 releases -rc1.  This is when most people's trees are mostly empty.

Yes, that was the plan last time around as well - but the end of the merge 
window 
is when we have the least maintainer bandwidth as well ...

Anyway, I applied most of the patches (sans the rename), so the rename patch 
should be a lot simpler to execute at the right moment this time around.

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v4 10/11] dma: rename dma__writecombine() to dma__wc()

2015-08-25 Thread Ingo Molnar


* Luis R. Rodriguez mcg...@do-not-panic.com wrote:

 From: Luis R. Rodriguez mcg...@suse.com
 
 Rename dma_*_writecombine() to dma_*_wc(), so that the naming
 is coherent across the various write-combining APIs.
 
 The following Coccinelle SmPL patch was used for this simple
 transformation:
 
 @ rename_dma_alloc_writecombine @
 expression dev, size, dma_addr, gfp;
 @@
 
 -dma_alloc_writecombine(dev, size, dma_addr, gfp)
 +dma_alloc_wc(dev, size, dma_addr, gfp)
 
 @ rename_dma_free_writecombine @
 expression dev, size, cpu_addr, dma_addr;
 @@
 
 -dma_free_writecombine(dev, size, cpu_addr, dma_addr)
 +dma_free_wc(dev, size, cpu_addr, dma_addr)
 
 @ rename_dma_mmap_writecombine @
 expression dev, vma, cpu_addr, dma_addr, size;
 @@
 
 -dma_mmap_writecombine(dev, vma, cpu_addr, dma_addr, size)
 +dma_mmap_wc(dev, vma, cpu_addr, dma_addr, size)
 
 Generated-by: Coccinelle SmPL
 Suggested-by: Ingo Molnar mi...@kernel.org
 Signed-off-by: Luis R. Rodriguez mcg...@suse.com
 ---
  arch/arm/mach-lpc32xx/phy3250.c   | 13 ++---
  arch/arm/mach-netx/fb.c   | 14 ++
  arch/arm/mach-nspire/clcd.c   | 13 ++---
  arch/avr32/include/asm/dma-mapping.h  | 20 ++--
  arch/avr32/mm/dma-coherent.c  | 12 ++--
  arch/metag/include/asm/dma-mapping.h  |  4 ++--
  arch/metag/kernel/dma.c   |  6 +++---
  drivers/dma/iop-adma.c|  8 
  drivers/dma/mv_xor.c  |  4 ++--
  drivers/dma/qcom_bam_dma.c| 14 +++---
  drivers/gpu/drm/drm_gem_cma_helper.c  | 13 ++---
  drivers/gpu/drm/omapdrm/omap_dmm_tiler.c  | 13 ++---
  drivers/gpu/drm/omapdrm/omap_gem.c|  8 
  drivers/gpu/drm/sti/sti_cursor.c  | 13 ++---
  drivers/gpu/drm/sti/sti_gdp.c |  3 +--
  drivers/gpu/drm/sti/sti_hqvdp.c   |  6 +++---
  drivers/gpu/drm/tegra/gem.c   | 11 +--
  drivers/gpu/host1x/cdma.c |  8 
  drivers/gpu/host1x/job.c  | 10 --
  drivers/media/platform/coda/coda-bit.c| 10 +-
  drivers/video/fbdev/acornfb.c |  4 ++--
  drivers/video/fbdev/amba-clcd-versatile.c | 14 ++
  drivers/video/fbdev/amba-clcd.c   |  4 ++--
  drivers/video/fbdev/atmel_lcdfb.c |  9 +
  drivers/video/fbdev/ep93xx-fb.c   |  9 +++--
  drivers/video/fbdev/gbefb.c   |  8 
  drivers/video/fbdev/imxfb.c   | 12 ++--
  drivers/video/fbdev/mx3fb.c   |  9 -
  drivers/video/fbdev/nuc900fb.c|  8 
  drivers/video/fbdev/omap/lcdc.c   | 16 
  drivers/video/fbdev/pxa168fb.c|  8 
  drivers/video/fbdev/pxafb.c   |  4 ++--
  drivers/video/fbdev/s3c-fb.c  |  7 +++
  drivers/video/fbdev/s3c2410fb.c   |  8 
  drivers/video/fbdev/sa1100fb.c|  8 
  include/linux/dma-mapping.h   | 16 
  sound/arm/pxa2xx-pcm-lib.c| 20 
  sound/soc/fsl/imx-pcm-fiq.c   | 10 --
  sound/soc/nuc900/nuc900-pcm.c |  6 ++
  sound/soc/omap/omap-pcm.c | 12 
  40 files changed, 183 insertions(+), 212 deletions(-)

Which kernel is this against? It has conflicts in 3 files with Linus's latest: 
v4.2-rc8.

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] Patch x86/ldt: Make modify_ldt synchronous has been added to the 4.1-stable tree

2015-08-14 Thread Ingo Molnar


* Greg KH gre...@linuxfoundation.org wrote:

 On Thu, Aug 13, 2015 at 06:38:46PM -0700, Andy Lutomirski wrote:
  On Thu, Aug 13, 2015 at 5:44 PM,  gre...@linuxfoundation.org wrote:
  
   This is a note to let you know that I've just added the patch titled
  
   x86/ldt: Make modify_ldt synchronous
  
  This needs:
  
  https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?h=x86/urgentid=4809146b86c3d41ce588fdb767d021e2a80600dd
  
  and its parent.  :(
 
 As that isn't in Linus's tree yet, I'll move this one back and wait a
 release or so before it gets merged.

I just sent it to Linus, so barring a catastrophy it should be upstream soon.

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] linux-next: manual merge of the xen-tip tree with the tip tree

2015-08-13 Thread Ingo Molnar


* Peter Zijlstra pet...@infradead.org wrote:

 On Wed, Aug 12, 2015 at 02:44:52PM -0400, Boris Ostrovsky wrote:
  On 08/12/2015 02:36 PM, Peter Zijlstra wrote:
  On Wed, Aug 12, 2015 at 11:26:41AM -0700, H. Peter Anvin wrote:
  One option might be to do the addition in assembly, i.e.:
  
  i (key), i (index)
  
  ... and put the addition into the assembly source.
  Like so? Seems to build on gcc-4.6.
  
  Yes, this builds on 4.4.4. as well.
 
 Thanks, Ingo / Hpa, can you make the below patch appear in the right tip
 branch?

Applied, it will show up as the following commit:

  d420acd816c0 (jump_label/x86: Work around asm build bug on older/backported 
GCCs)

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v3 1/1] x86: allow to enable/disable modify_ldt at run time

2015-08-05 Thread Ingo Molnar


* Willy Tarreau w...@1wt.eu wrote:

 @@ -276,6 +282,15 @@ asmlinkage int sys_modify_ldt(int func, void __user *ptr,
  {
   int ret = -ENOSYS;
  
 + if (!sysctl_modify_ldt) {
 + printk_ratelimited(KERN_INFO
 + Denied a call to modify_ldt() from %s[%d] (uid: %d).
 +  Adjust sysctl if this was not an exploit attempt.\n,
 + current-comm, task_pid_nr(current),
 + from_kuid_munged(current_user_ns(), current_uid()));

UI nit: so this message should really tell the user _which_ sysctl to 
configure, 
instead of passive-aggressively alluding to the fact that there's a sysctl 
somewhere that might do the trick...

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v3 1/1] x86: allow to enable/disable modify_ldt at run time

2015-08-05 Thread Ingo Molnar


* Willy Tarreau w...@1wt.eu wrote:

 Hi Ingo,
 
 On Wed, Aug 05, 2015 at 10:00:37AM +0200, Ingo Molnar wrote:
  
  * Willy Tarreau w...@1wt.eu wrote:
  
   @@ -276,6 +282,15 @@ asmlinkage int sys_modify_ldt(int func, void __user 
   *ptr,
{
 int ret = -ENOSYS;

   + if (!sysctl_modify_ldt) {
   + printk_ratelimited(KERN_INFO
   + Denied a call to modify_ldt() from %s[%d] (uid: %d).
   +  Adjust sysctl if this was not an exploit attempt.\n,
   + current-comm, task_pid_nr(current),
   + from_kuid_munged(current_user_ns(), current_uid()));
  
  UI nit: so this message should really tell the user _which_ sysctl to 
  configure, 
  instead of passive-aggressively alluding to the fact that there's a sysctl 
  somewhere that might do the trick...
 
 I agree, I did it first and changed my mind due to the repetition of
 the word modify_ldt.
 
 Here's an updated version instead.
 
 Willy
 
 
 From 17b2720cd54df0fde6686c1d85aaed38d679cbe7 Mon Sep 17 00:00:00 2001
 From: Willy Tarreau w...@1wt.eu
 Date: Sat, 25 Jul 2015 12:18:33 +0200
 Subject: [PATCH] x86/ldt: allow to disable modify_ldt at runtime
 
 For distros who prefer not to take the risk of completely disabling the
 modify_ldt syscall using CONFIG_MODIFY_LDT_SYSCALL, this patch adds a
 sysctl to enable or/disable it at runtime, and proposes to disable it
 by default. This can be a safe alternative. A message is logged if an
 attempt was stopped so that it's easy to spot if/when it is needed.
 
 Cc: Andy Lutomirski l...@kernel.org
 Cc: Kees Cook keesc...@chromium.org
 Signed-off-by: Willy Tarreau w...@1wt.eu
 ---
  Documentation/sysctl/kernel.txt | 15 +++
  arch/x86/Kconfig| 17 +
  arch/x86/kernel/ldt.c   | 16 
  kernel/sysctl.c | 14 ++
  4 files changed, 62 insertions(+)
 
 diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
 index 6fccb69..60c7c7a 100644
 --- a/Documentation/sysctl/kernel.txt
 +++ b/Documentation/sysctl/kernel.txt
 @@ -41,6 +41,7 @@ show up in /proc/sys/kernel:
  - kptr_restrict
  - kstack_depth_to_print   [ X86 only ]
  - l2cr[ PPC only ]
 +- modify_ldt  [ X86 only ]
  - modprobe== Documentation/debugging-modules.txt
  - modules_disabled
  - msg_next_id  [ sysv ipc ]
 @@ -391,6 +392,20 @@ This flag controls the L2 cache of G3 processor boards. 
 If
  
  ==
  
 +modify_ldt: (X86 only)
 +
 +Enables (1) or disables (0) the modify_ldt syscall. Modifying the LDT
 +(Local Descriptor Table) may be needed to run a 16-bit or segmented code

s/run a/run

 +such as Dosemu or Wine. This is done via a system call which is not needed

s/Dosemu/DOSEMU

 +to run portable applications, and which can sometimes be abused to exploit
 +some weaknesses of the architecture, opening new vulnerabilities.

So that's pretty vague IMHO, and a bit FUD-ish in character. How about:

  ... , and which system call exposes complex, rarely used legacy hardware 
  features and semantics that had suffered vulnerabilities in the past.

 +
 +This sysctl allows one to increase the system's security by disabling the
 +system call, or to restore compatibility with specific applications when it
 +was already disabled.
 +
 +==
 +
  modules_disabled:
  
  A toggle value indicating if modules are allowed to be loaded
 diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
 index beabf30..88d10a0 100644
 --- a/arch/x86/Kconfig
 +++ b/arch/x86/Kconfig
 @@ -2069,6 +2069,23 @@ config MODIFY_LDT_SYSCALL
 surface.  Disabling it removes the modify_ldt(2) system call.
  
 Saying 'N' here may make sense for embedded or server kernels.
 +   If really unsure, say 'Y', you'll be able to disable it at runtime.
 +
 +config DEFAULT_MODIFY_LDT_SYSCALL
 + bool Allow userspace to modify the LDT by default
 + depends on MODIFY_LDT_SYSCALL
 + default y
 + ---help---
 +   Modifying the LDT (Local Descriptor Table) may be needed to run a
 +   16-bit or segmented code such as Dosemu or Wine. This is done via
 +   a system call which is not needed to run portable applications,
 +   and which can sometimes be abused to exploit some weaknesses of
 +   the architecture, opening new vulnerabilities.
 +
 +   For this reason this option allows one to enable or disable the
 +   feature at runtime. It is recommended to say 'N' here to leave
 +   the system protected, and to enable it at runtime only if needed
 +   by setting the sys.kernel.modify_ldt sysctl.

Here I'd do the same modifications as to the sysctl text above.

 + if (!sysctl_modify_ldt) {
 + printk_ratelimited(KERN_INFO
 + Denied a call to modify_ldt

Re: [Xen-devel] [PATCH v3 1/1] x86: allow to enable/disable modify_ldt at run time

2015-08-05 Thread Ingo Molnar


* Willy Tarreau w...@1wt.eu wrote:

   + if (!sysctl_modify_ldt) {
   + printk_ratelimited(KERN_INFO
   + Denied a call to modify_ldt() from %s[%d] (uid: %d).
   +  Adjust the modify_ldt sysctl if this was not an
  
  Would it really be so difficult to write this as:
  
Set sys.kernel.modify_ldt = 1 in /etc/sysctl.conf if this was not an 
  exploit attempt.
 
 It's just a matter of taste. Normally I consider the kernel distro-agnostic 
 so I 
 don't like to suggest one way to adjust sysctls nor to reference config 
 files. 
 Here we're in a case where only standard distro users may hit the issue, and 
 users of embedded distros will not face this message or will easily translate 
 it 
 into their respective configuration scheme. So OK for this one.

So it's a side issue, but it's not a matter of taste at all: why should we end 
up 
hurting 99% of Linux users (that use regular distros), just to make it slightly 
more 'correct' for the weird 1% 'embedded distro' case that decided to put 
sysctl 
configuration elsewhere?

Users of 'embedded' distros won't normally see kernel messages, and even if 
they 
do, the message is crystal clear even to them...

Such messages should be as helpful to the regular case as possible. The weird 
cases will be OK too: and it does not help to make a message unhelpful for 
_both_ 
cases.

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v6 4/4] x86/ldt: Make modify_ldt optional

2015-07-31 Thread Ingo Molnar


* Andy Lutomirski l...@kernel.org wrote:

 The modify_ldt syscall exposes a large attack surface and is
 unnecessary for modern userspace.  Make it optional.
 
 Reviewed-by: Kees Cook keesc...@chromium.org
 Signed-off-by: Andy Lutomirski l...@kernel.org
 ---
  arch/x86/Kconfig   | 17 +
  arch/x86/include/asm/mmu.h |  2 ++
  arch/x86/include/asm/mmu_context.h | 28 +---
  arch/x86/kernel/Makefile   |  3 ++-
  arch/x86/kernel/cpu/perf_event.c   |  4 
  arch/x86/kernel/process_64.c   |  2 ++
  arch/x86/kernel/step.c |  2 ++
  kernel/sys_ni.c|  1 +
  8 files changed, 51 insertions(+), 8 deletions(-)

btw., I fixed a (rare) build failure on MATH_EMULATION=y  !MODIFY_LDT_SYSCALL:

  arch/x86/math-emu/fpu_system.h:21:71: error: ‘mm_context_t’ has no member 
named ‘ldt’

I took the easy fix: made MATH_EMULATION depend on MODIFY_LDT_SYSCALL for now.

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v9 0/8] pci: add pci_iomap_wc() and pci_ioremap_wc_bar()

2015-07-22 Thread Ingo Molnar


* Bjorn Helgaas bhelg...@google.com wrote:

Let me know if these are OK or if there are any questions.

[0] http://lkml.kernel.org/r/20150625204703.gc4...@pd.tnic
[1] http://lkml.kernel.org/r/20150707095012.gq7...@wotan.suse.de
   
   Ingo,
   
   Just a friendly reminder. Let me know if there are any issues or 
   questions.
  
  It would be nice to get an Acked-by from Bjorn for the PCI API bits.
 
 I think the actual code of pci_ioremap_wc() and pci_ioremap_wc_bar() is fine 
 (although I might have named it pci_ioremap_bar_wc() for consistency).
 
 I declined to merge or ack them myself because they're obvious extensions of 
 pci_ioremap() and pci_ioremap_bar(), and I would prefer that they be exported 
 the same way, i.e., with EXPORT_SYMBOL(), not EXPORT_SYMBOL_GPL().

Huh? AFAICS pci_ioremap_bar() has been a _GPL export for a long time:

1684f5ddd4c0c   (Andrew Morton  2008-12-01 14:30:30 -0800   126)#ifdef 
CONFIG_HAS_IOMEM
1684f5ddd4c0c   (Andrew Morton  2008-12-01 14:30:30 -0800   127)void 
__iomem *pci_ioremap_bar(struct pci_dev *pdev, int bar)
1684f5ddd4c0c   (Andrew Morton  2008-12-01 14:30:30 -0800   128){
1f7bf3bfb5d60   (Bjorn Helgaas  2015-03-12 12:30:11 -0500   129)struct 
resource *res = pdev-resource[bar];
1f7bf3bfb5d60   (Bjorn Helgaas  2015-03-12 12:30:11 -0500   130)
1684f5ddd4c0c   (Andrew Morton  2008-12-01 14:30:30 -0800   131)/*
1684f5ddd4c0c   (Andrew Morton  2008-12-01 14:30:30 -0800   132) * Make 
sure the BAR is actually a memory resource, not an IO resource
1684f5ddd4c0c   (Andrew Morton  2008-12-01 14:30:30 -0800   133) */
646c0282df042   (Bjorn Helgaas  2015-03-12 12:30:15 -0500   134)if 
(res-flags  IORESOURCE_UNSET || !(res-flags  IORESOURCE_MEM)) {
1f7bf3bfb5d60   (Bjorn Helgaas  2015-03-12 12:30:11 -0500   135)
dev_warn(pdev-dev, can't ioremap BAR %d: %pR\n, bar, res);
1684f5ddd4c0c   (Andrew Morton  2008-12-01 14:30:30 -0800   136)
return NULL;
1684f5ddd4c0c   (Andrew Morton  2008-12-01 14:30:30 -0800   137)}
1f7bf3bfb5d60   (Bjorn Helgaas  2015-03-12 12:30:11 -0500   138)return 
ioremap_nocache(res-start, resource_size(res));
1684f5ddd4c0c   (Andrew Morton  2008-12-01 14:30:30 -0800   139)}
1684f5ddd4c0c   (Andrew Morton  2008-12-01 14:30:30 -0800   
140)EXPORT_SYMBOL_GPL(pci_ioremap_bar);
1684f5ddd4c0c   (Andrew Morton  2008-12-01 14:30:30 -0800   141)#endif

commit 1684f5ddd4c0c happened in 2008, well before you became PCI maintainer in 
2012.

and I'd prefer keeping the same EXPORT_SYMBOL_GPL() pattern for the new APIs.

(ioremap_wc() is EXPORT_SYMBOL() mostly by accident, it's the odd one out.)

Also, FWIIW: I personally got essentially zero feedback and help from 
proprietary 
binary kernel module vendors in the past couple of years as x86 maintainer, 
despite a fair chunk of kernel crashes reported on distro kernels occuring in 
them...

Based on that very negative experience, when we introduce something as complex 
and 
as critical as new caching APIs, the last thing I want is to have obscure bugs 
in 
binary modules I cannot fix in any reasonable fashion. So even if the parent 
APIs 
of new APIs weren't already _GPL exports (as in this case), I'd export them as 
_GPL in this case.

 I think using EXPORT_SYMBOL_GPL to express individual political aims rather 
 than 
 as a hint about what might be derived work makes us look like zealots, and 
 that's not my style.

As far as I'm concerned it's a pure technological choice: I don't want to 
export 
certain types of hard to fix and critical functionality to drivers that I 
cannot 
then fix.

But I also applied EXPORT_SYMBOL() patches in the past, so I'm not one-sided 
about 
it.

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v9 0/8] pci: add pci_iomap_wc() and pci_ioremap_wc_bar()

2015-07-21 Thread Ingo Molnar


* Luis R. Rodriguez mcg...@suse.com wrote:

 On Wed, Jul 08, 2015 at 06:54:11PM -0700, Luis R. Rodriguez wrote:
  From: Luis R. Rodriguez mcg...@suse.com
  
  Ingo,
  
  Boris is on vacation, he picked up these patches on his bp#tip-mm tree [0]
  and they have baked there for a while now. That tree receives 0-day
  bot testing, but other than that its not clear what other tests were
  run on these patches. Boris modified the commit logs a bit, and made one
  optimizaiton to bail early on an PCI ioremap call when it should. These
  patches have no modifications from what is on Boris' tree and tip-mm branch.
  
  The 0 day build bot did find issues on Boris' tree but those are related
  to ioremap_uc() (already upstream) and its first use on atyfb (not
  upstream) -- I will be addressing a fix for that ioremap_uc() issue through
  another patch series prior to posting the final set for atyfb which makes
  use of ioremap_uc().
  
  No issues have been found with this series. Benh did note some possible 
  issues
  with expectations with what is done for write-combining for PowerPC [1] but
  the issue is a rather general long standing issue with semantics of ioremap 
  --
  in the case for ioremap_wc() on PowerPC benh notes that writel() will never
  write-combine as it uses too heavy barriers. Benh notes that although
  writel_relaxed() today is identical to writel() this can be changed. There 
  are
  other general semantics issues with ioremap() variant calls -- we seem to 
  have
  all gotten together to discuss all these issues on a thread where Dan 
  Williams
  is proposing to unify ioremap prototypes and macro aliases [1], folks
  intersted on these issues or semantic concerns can drop in and chime there.
  
  Let me know if these are OK or if there are any questions.
  
  [0] http://lkml.kernel.org/r/20150625204703.gc4...@pd.tnic
  [1] http://lkml.kernel.org/r/20150707095012.gq7...@wotan.suse.de
 
 Ingo,
 
 Just a friendly reminder. Let me know if there are any issues or questions.

It would be nice to get an Acked-by from Bjorn for the PCI API bits.

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH 07/12] x86/virt/guest/xen: Remove use of pgd_list from the Xen guest code

2015-06-15 Thread Ingo Molnar


* David Vrabel david.vra...@citrix.com wrote:

 On 15/06/15 10:05, Ian Campbell wrote:
  On Sat, 2015-06-13 at 11:49 +0200, Ingo Molnar wrote:

  xen_mm_pin_all()/unpin_all() are used to implement full guest instance 
  suspend/restore. It's a stop-all method that needs to iterate through all 
  allocated pgds in the system to fix them up for Xen's use.
 
  This code uses pgd_list, probably because it was an easy interface.
 
  But we want to remove the pgd_list, so convert the code over to walk all 
  tasks in the system. This is an equivalent method.
 
 It is not equivalent because pgd_alloc() now populates entries in pgds that 
 are 
 not visible to xen_mm_pin_all() (note how the original code adds the pgd to 
 the 
 pgd_list in pgd_ctor() before calling pgd_prepopulate_pmd()).  These newly 
 allocated page tables won't be correctly converted on suspend/resume and the 
 new 
 process will die after resume.

So how should the Xen logic be fixed for the new scheme? I can't say I can see 
through the paravirt complexity here.

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] RIP MTRR - status update for upcoming v4.2

2015-06-13 Thread Ingo Molnar


* Andy Lutomirski l...@amacapital.net wrote:

 On Jun 12, 2015 12:59 AM, Jan Beulich jbeul...@suse.com wrote:
 
   On 12.06.15 at 01:23, toshi.k...@hp.com wrote:
   There are two usages on MTRRs:
1) MTRR entries set by firmware
2) MTRR entries set by OS drivers
  
   We can obsolete 2), but we have no control over 1).  As UEFI firmwares
   also set this up, this usage will continue to stay.  So, we should not
   get rid of the MTRR code that looks up the MTRR entries, while we have
   no need to modify them.
  
   Such MTRR entries provide safe guard to /dev/mem, which allows privileged 
   user to access a range that may require UC mapping while the /dev/mem 
   driver 
   blindly maps it with WB.  MTRRs converts WB to UC in such a case.
 
  But it wouldn't be impossible to simply read the MTRRs upon boot, store the 
  information, disable MTRRs, and correctly use PAT to achieve the same 
  effect 
  (i.e. the blindly maps part of course would need fixing).
 
 This may crash and burn badly when we call a UEFI function or an SMI happens. 
  I 
 think we should just leave the MTRRs alone.

Not to mention suspend/resume, reboot and other goodies where the firmware 
might 
pop up expecting intact MTRRs.

Btw., doesn't a lack of MTRRs imply UC? So is 'crash and burn' possible in most 
cases? Isn't it just 'executes slower than before'?

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH] x86/cpu: Fix SMAP check in PVOPS environments

2015-06-05 Thread Ingo Molnar


* Rusty Russell ru...@rustcorp.com.au wrote:

 As the comment in arch/x86/include/asm/paravirt_types.h says:
 
* Get/set interrupt state.  save_fl and restore_fl are only
* expected to use X86_EFLAGS_IF; all other bits
* returned from save_fl are undefined, and may be ignored by
* restore_fl.

There should be no 'may' about this: under CONFIG_PARAVIRT_DEBUG=y the reminder 
of 
the flags should be cleared (or all bits should be set: that will instantly 
crash 
if restored verbatim), so that we trip up generic code that relies on non-IF 
restoration?

Thanks,

Ingo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

1 2 >

1 - 100 of 110 matches

Mail list logo