Re: [Xen-devel] [PATCH 2/3] x86: add guest_late_init hook to hypervisor_x86 structure
* Juergen Gross <jgr...@suse.com> wrote: > On 08/11/17 10:40, Ingo Molnar wrote: > > > > * Juergen Gross <jgr...@suse.com> wrote: > > > >>> Plus add a default empty function (which hypervisors can override). This > >>> avoids > >>> all the hidden conditions and wrappery. > >> > >> Hmm, x86_hyper is just a pointer being NULL on bare metal. So we would > >> have to add a pre-filled struct with dummy functions and in case a > >> hypervisor is detected we'd need to copy all non-NULL pointers of the > >> hypervisor specific struct to the pre-filled one. > > > > Ok, I missed that detail - but yeah, since this is mostly about boot code, > > where readability is king, I still think it would be an overall improvement. > > > > This is the pattern we are trying to use with x86_platform ops for example, > > and > > doing: > > > > git grep -w x86_platform arch/x86 > > > > gives a pretty clear idea about how it's used - while if it was all within > > wrappers it would be a lot more difficult to discover all the callsites. > > > > Admittedly it's not done totally consistently: > > > > arch/x86/kernel/apic/probe_32.c:if (x86_platform.apic_post_init) > > arch/x86/kernel/apic/probe_64.c:if (x86_platform.apic_post_init) > > arch/x86/kernel/ebda.c: if (!x86_platform.legacy.reserve_bios_regions) > > arch/x86/kernel/platform-quirks.c: if > > (x86_platform.set_legacy_features) > > arch/x86/platform/intel-mid/device_libs/platform_mrfld_rtc.c: if > > (!x86_platform.legacy.rtc) > > > > ... but the _idea_ behind it is consistent ;-) > > > >> In case there are no objections I can add a patch to modify the current > >> way x86_hyper is used to the pre-filled variant. > > > > Yeah, sounds good to me! > > Okay. With you mentioning x86_platform: should I make x86_hyper a member > of x86_platform (e.g. x86_platform.hyper.) ? > > I think this would fit quite nice. Yeah, seems like a good idea! Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 2/3] x86: add guest_late_init hook to hypervisor_x86 structure
* Juergen Grosswrote: > > Plus add a default empty function (which hypervisors can override). This > > avoids > > all the hidden conditions and wrappery. > > Hmm, x86_hyper is just a pointer being NULL on bare metal. So we would > have to add a pre-filled struct with dummy functions and in case a > hypervisor is detected we'd need to copy all non-NULL pointers of the > hypervisor specific struct to the pre-filled one. Ok, I missed that detail - but yeah, since this is mostly about boot code, where readability is king, I still think it would be an overall improvement. This is the pattern we are trying to use with x86_platform ops for example, and doing: git grep -w x86_platform arch/x86 gives a pretty clear idea about how it's used - while if it was all within wrappers it would be a lot more difficult to discover all the callsites. Admittedly it's not done totally consistently: arch/x86/kernel/apic/probe_32.c:if (x86_platform.apic_post_init) arch/x86/kernel/apic/probe_64.c:if (x86_platform.apic_post_init) arch/x86/kernel/ebda.c: if (!x86_platform.legacy.reserve_bios_regions) arch/x86/kernel/platform-quirks.c: if (x86_platform.set_legacy_features) arch/x86/platform/intel-mid/device_libs/platform_mrfld_rtc.c: if (!x86_platform.legacy.rtc) ... but the _idea_ behind it is consistent ;-) > In case there are no objections I can add a patch to modify the current > way x86_hyper is used to the pre-filled variant. Yeah, sounds good to me! Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 2/3] x86: add guest_late_init hook to hypervisor_x86 structure
* Juergen Grosswrote: > Add a new guest_late_init hook to the hypervisor_x86 structure. It > will replace the current kvm_guest_init() call which is changed to > make use of the new hook. > > Signed-off-by: Juergen Gross > --- > arch/x86/include/asm/hypervisor.h | 11 +++ > arch/x86/include/asm/kvm_para.h | 2 -- > arch/x86/kernel/kvm.c | 3 ++- > arch/x86/kernel/setup.c | 2 +- > 4 files changed, 14 insertions(+), 4 deletions(-) > > diff --git a/arch/x86/include/asm/hypervisor.h > b/arch/x86/include/asm/hypervisor.h > index 0ead9dbb9130..37320687b8cb 100644 > --- a/arch/x86/include/asm/hypervisor.h > +++ b/arch/x86/include/asm/hypervisor.h > @@ -38,6 +38,9 @@ struct hypervisor_x86 { > /* Platform setup (run once per boot) */ > void(*init_platform)(void); > > + /* Guest late init */ > + void(*guest_late_init)(void); > + > /* X2APIC detection (run once per boot) */ > bool(*x2apic_available)(void); > > @@ -66,9 +69,17 @@ static inline void hypervisor_init_mem_mapping(void) > if (x86_hyper && x86_hyper->init_mem_mapping) > x86_hyper->init_mem_mapping(); > } > + > +static inline void hypervisor_guest_late_init(void) > +{ > + if (x86_hyper && x86_hyper->guest_late_init) > + x86_hyper->guest_late_init(); > +} > + > #else > static inline void init_hypervisor_platform(void) { } > static inline bool hypervisor_x2apic_available(void) { return false; } > static inline void hypervisor_init_mem_mapping(void) { } > +static inline void hypervisor_guest_late_init(void) { } > #endif /* CONFIG_HYPERVISOR_GUEST */ > #endif /* _ASM_X86_HYPERVISOR_H */ > diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h > index c373e44049b1..7b407dda2bd7 100644 > --- a/arch/x86/include/asm/kvm_para.h > +++ b/arch/x86/include/asm/kvm_para.h > @@ -88,7 +88,6 @@ static inline long kvm_hypercall4(unsigned int nr, unsigned > long p1, > #ifdef CONFIG_KVM_GUEST > bool kvm_para_available(void); > unsigned int kvm_arch_para_features(void); > -void __init kvm_guest_init(void); > void kvm_async_pf_task_wait(u32 token, int interrupt_kernel); > void kvm_async_pf_task_wake(u32 token); > u32 kvm_read_and_reset_pf_reason(void); > @@ -103,7 +102,6 @@ static inline void kvm_spinlock_init(void) > #endif /* CONFIG_PARAVIRT_SPINLOCKS */ > > #else /* CONFIG_KVM_GUEST */ > -#define kvm_guest_init() do {} while (0) > #define kvm_async_pf_task_wait(T, I) do {} while(0) > #define kvm_async_pf_task_wake(T) do {} while(0) > > diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c > index 8bb9594d0761..d331b5060aa9 100644 > --- a/arch/x86/kernel/kvm.c > +++ b/arch/x86/kernel/kvm.c > @@ -465,7 +465,7 @@ static void __init kvm_apf_trap_init(void) > update_intr_gate(X86_TRAP_PF, async_page_fault); > } > > -void __init kvm_guest_init(void) > +static void __init kvm_guest_init(void) > { > int i; > > @@ -548,6 +548,7 @@ const struct hypervisor_x86 x86_hyper_kvm __refconst = { > .name = "KVM", > .detect = kvm_detect, > .x2apic_available = kvm_para_available, > + .guest_late_init= kvm_guest_init, > }; > EXPORT_SYMBOL_GPL(x86_hyper_kvm); > > diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c > index 0957dd73d127..578569481d87 100644 > --- a/arch/x86/kernel/setup.c > +++ b/arch/x86/kernel/setup.c > @@ -1294,7 +1294,7 @@ void __init setup_arch(char **cmdline_p) > > io_apic_init_mappings(); > > - kvm_guest_init(); > + hypervisor_guest_late_init(); No principal objections, but could we please use a more obvious pattern showing that this is a callback, by calling it directly: x86_hyper->guest_late_init(); Plus add a default empty function (which hypervisors can override). This avoids all the hidden conditions and wrappery. Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v1 06/27] x86/entry/64: Adapt assembly for PIE support
* Thomas Garnierwrote: > >>*/ > >> - cmpq$.Lentry_SYSCALL_64_after_fastpath_call, (%rsp) > >> + leaq.Lentry_SYSCALL_64_after_fastpath_call(%rip), %r11 > >> + cmpq%r11, (%rsp) > >> jne 1f > > This patch seems to add extra overhead to the syscall fast-path even when > > PIE is > > disabled, right? > > It does add extra instructions when one is not possible, I preferred > that over ifdefing but I can change it. So my problem is, this pattern repeats in many other places as well, but sprinking various pieces of assembly code with #ifdefs would be very bad as well. I have no good idea how to solve this. Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v1 06/27] x86/entry/64: Adapt assembly for PIE support
* Thomas Garnierwrote: > Change the assembly code to use only relative references of symbols for the > kernel to be PIE compatible. > > Position Independent Executable (PIE) support will allow to extended the > KASLR randomization range below the -2G memory limit. > > Signed-off-by: Thomas Garnier > --- > arch/x86/entry/entry_64.S | 22 +++--- > 1 file changed, 15 insertions(+), 7 deletions(-) > > diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S > index 49167258d587..15bd5530d2ae 100644 > --- a/arch/x86/entry/entry_64.S > +++ b/arch/x86/entry/entry_64.S > @@ -194,12 +194,15 @@ entry_SYSCALL_64_fastpath: > ja 1f /* return -ENOSYS (already in > pt_regs->ax) */ > movq%r10, %rcx > > + /* Ensures the call is position independent */ > + leaqsys_call_table(%rip), %r11 > + > /* >* This call instruction is handled specially in stub_ptregs_64. >* It might end up jumping to the slow path. If it jumps, RAX >* and all argument registers are clobbered. >*/ > - call*sys_call_table(, %rax, 8) > + call*(%r11, %rax, 8) > .Lentry_SYSCALL_64_after_fastpath_call: > > movq%rax, RAX(%rsp) > @@ -334,7 +337,8 @@ ENTRY(stub_ptregs_64) >* RAX stores a pointer to the C function implementing the syscall. >* IRQs are on. >*/ > - cmpq$.Lentry_SYSCALL_64_after_fastpath_call, (%rsp) > + leaq.Lentry_SYSCALL_64_after_fastpath_call(%rip), %r11 > + cmpq%r11, (%rsp) > jne 1f > > /* > @@ -1172,7 +1176,8 @@ ENTRY(error_entry) > movl%ecx, %eax /* zero extend */ > cmpq%rax, RIP+8(%rsp) > je .Lbstep_iret > - cmpq$.Lgs_change, RIP+8(%rsp) > + leaq.Lgs_change(%rip), %rcx > + cmpq%rcx, RIP+8(%rsp) > jne .Lerror_entry_done > > /* > @@ -1383,10 +1388,10 @@ ENTRY(nmi) >* resume the outer NMI. >*/ > > - movq$repeat_nmi, %rdx > + leaqrepeat_nmi(%rip), %rdx > cmpq8(%rsp), %rdx > ja 1f > - movq$end_repeat_nmi, %rdx > + leaqend_repeat_nmi(%rip), %rdx > cmpq8(%rsp), %rdx > ja nested_nmi_out > 1: > @@ -1440,7 +1445,8 @@ nested_nmi: > pushq %rdx > pushfq > pushq $__KERNEL_CS > - pushq $repeat_nmi > + leaqrepeat_nmi(%rip), %rdx > + pushq %rdx > > /* Put stack back */ > addq$(6*8), %rsp > @@ -1479,7 +1485,9 @@ first_nmi: > addq$8, (%rsp) /* Fix up RSP */ > pushfq /* RFLAGS */ > pushq $__KERNEL_CS/* CS */ > - pushq $1f /* RIP */ > + pushq %rax/* Support Position Independent Code */ > + leaq1f(%rip), %rax /* RIP */ > + xchgq %rax, (%rsp)/* Restore RAX, put 1f */ > INTERRUPT_RETURN/* continues at repeat_nmi below */ > UNWIND_HINT_IRET_REGS This patch seems to add extra overhead to the syscall fast-path even when PIE is disabled, right? Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v1 01/27] x86/crypto: Adapt assembly for PIE support
* Thomas Garnierwrote: > Change the assembly code to use only relative references of symbols for the > kernel to be PIE compatible. > > Position Independent Executable (PIE) support will allow to extended the > KASLR randomization range below the -2G memory limit. > diff --git a/arch/x86/crypto/aes-x86_64-asm_64.S > b/arch/x86/crypto/aes-x86_64-asm_64.S > index 8739cf7795de..86fa068e5e81 100644 > --- a/arch/x86/crypto/aes-x86_64-asm_64.S > +++ b/arch/x86/crypto/aes-x86_64-asm_64.S > @@ -48,8 +48,12 @@ > #define R10 %r10 > #define R11 %r11 > > +/* Hold global for PIE suport */ > +#define RBASE%r12 > + > #define prologue(FUNC,KEY,B128,B192,r1,r2,r5,r6,r7,r8,r9,r10,r11) \ > ENTRY(FUNC);\ > + pushq RBASE; \ > movqr1,r2; \ > leaqKEY+48(r8),r9; \ > movqr10,r11;\ > @@ -74,54 +78,63 @@ > movlr6 ## E,4(r9); \ > movlr7 ## E,8(r9); \ > movlr8 ## E,12(r9); \ > + popqRBASE; \ > ret;\ > ENDPROC(FUNC); > > +#define round_mov(tab_off, reg_i, reg_o) \ > + leaqtab_off(%rip), RBASE; \ > + movl(RBASE,reg_i,4), reg_o; > + > +#define round_xor(tab_off, reg_i, reg_o) \ > + leaqtab_off(%rip), RBASE; \ > + xorl(RBASE,reg_i,4), reg_o; > + > #define round(TAB,OFFSET,r1,r2,r3,r4,r5,r6,r7,r8,ra,rb,rc,rd) \ > movzbl r2 ## H,r5 ## E;\ > movzbl r2 ## L,r6 ## E;\ > - movlTAB+1024(,r5,4),r5 ## E;\ > + round_mov(TAB+1024, r5, r5 ## E)\ > movwr4 ## X,r2 ## X;\ > - movlTAB(,r6,4),r6 ## E; \ > + round_mov(TAB, r6, r6 ## E) \ > roll$16,r2 ## E;\ > shrl$16,r4 ## E;\ > movzbl r4 ## L,r7 ## E;\ > movzbl r4 ## H,r4 ## E;\ > xorlOFFSET(r8),ra ## E; \ > xorlOFFSET+4(r8),rb ## E; \ > - xorlTAB+3072(,r4,4),r5 ## E;\ > - xorlTAB+2048(,r7,4),r6 ## E;\ > + round_xor(TAB+3072, r4, r5 ## E)\ > + round_xor(TAB+2048, r7, r6 ## E)\ > movzbl r1 ## L,r7 ## E;\ > movzbl r1 ## H,r4 ## E;\ > - movlTAB+1024(,r4,4),r4 ## E;\ > + round_mov(TAB+1024, r4, r4 ## E)\ > movwr3 ## X,r1 ## X;\ > roll$16,r1 ## E;\ > shrl$16,r3 ## E;\ > - xorlTAB(,r7,4),r5 ## E; \ > + round_xor(TAB, r7, r5 ## E) \ > movzbl r3 ## L,r7 ## E;\ > movzbl r3 ## H,r3 ## E;\ > - xorlTAB+3072(,r3,4),r4 ## E;\ > - xorlTAB+2048(,r7,4),r5 ## E;\ > + round_xor(TAB+3072, r3, r4 ## E)\ > + round_xor(TAB+2048, r7, r5 ## E)\ > movzbl r1 ## L,r7 ## E;\ > movzbl r1 ## H,r3 ## E;\ > shrl$16,r1 ## E;\ > - xorlTAB+3072(,r3,4),r6 ## E;\ > - movlTAB+2048(,r7,4),r3 ## E;\ > + round_xor(TAB+3072, r3, r6 ## E)\ > + round_mov(TAB+2048, r7, r3 ## E)\ > movzbl r1 ## L,r7 ## E;\ > movzbl r1 ## H,r1 ## E;\ > - xorlTAB+1024(,r1,4),r6 ## E;\ > - xorlTAB(,r7,4),r3 ## E; \ > + round_xor(TAB+1024, r1, r6 ## E)\ > + round_xor(TAB, r7, r3 ## E) \ > movzbl r2 ## H,r1 ## E;\ > movzbl r2 ## L,r7 ## E;\ > shrl$16,r2 ## E;\ > - xorlTAB+3072(,r1,4),r3 ## E;\ > - xorlTAB+2048(,r7,4),r4 ## E;\ > + round_xor(TAB+3072, r1, r3 ## E)\ > + round_xor(TAB+2048, r7, r4 ## E)\ > movzbl r2 ## H,r1 ## E;\ > movzbl r2 ## L,r2 ## E;\ > xorlOFFSET+8(r8),rc ## E; \ > xorlOFFSET+12(r8),rd ## E; \ > - xorlTAB+1024(,r1,4),r3 ## E;\ > - xorlTAB(,r2,4),r4 ## E; > + round_xor(TAB+1024, r1, r3 ## E)\ > + round_xor(TAB, r2, r4 ## E) This appears to be adding unconditional overhead to a function that was moved to assembly to improve its performance. Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] x86: PIE support and option to extend KASLR randomization
* Pavel Machek <pa...@ucw.cz> wrote: > On Mon 2017-09-25 09:33:42, Ingo Molnar wrote: > > > > * Pavel Machek <pa...@ucw.cz> wrote: > > > > > > For example, there would be collision with regular user-space mappings, > > > > right? > > > > Can local unprivileged users use mmap(MAP_FIXED) probing to figure out > > > > where > > > > the kernel lives? > > > > > > Local unpriviledged users can probably get your secret bits using cache > > > probing > > > and jump prediction buffers. > > > > > > Yes, you don't want to leak the information using mmap(MAP_FIXED), but > > > CPU will > > > leak it for you, anyway. > > > > Depends on the CPU I think, and CPU vendors are busy trying to mitigate > > this > > angle. > > I believe any x86 CPU running Linux will leak it. And with CPU vendors > putting "artifical inteligence" into branch prediction, no, I don't > think it is going to get better. > > That does not mean we shoudl not prevent mmap() info leak, but... That might or might not be so, but there's a world of a difference between running a relatively long statistical attack figuring out the kernel's location, versus being able to programmatically probe the kernel's location by using large MAP_FIXED user-space mmap()s, within a few dozen microseconds or so and a 100% guaranteed, non-statistical result. Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 1/1] sched/cputime: do not decrease steal time after live migration on xen
(Cc:-ed more gents involved in kernel/sched/cputime.c work. Full patch quoted below.) * Dongli Zhangwrote: > After guest live migration on xen, steal time in /proc/stat > (cpustat[CPUTIME_STEAL]) might decrease because steal returned by > paravirt_steal_clock() might be less than this_rq()->prev_steal_time. > > For instance, steal time of each vcpu is 335 before live migration. > > cpu 198 0 368 200064 1962 0 0 1340 0 0 > cpu0 38 0 81 50063 492 0 0 335 0 0 > cpu1 65 0 97 49763 634 0 0 335 0 0 > cpu2 38 0 81 50098 462 0 0 335 0 0 > cpu3 56 0 107 50138 374 0 0 335 0 0 > > After live migration, steal time is reduced to 312. > > cpu 200 0 370 200330 1971 0 0 1248 0 0 > cpu0 38 0 82 50123 500 0 0 312 0 0 > cpu1 65 0 97 49832 634 0 0 312 0 0 > cpu2 39 0 82 50167 462 0 0 312 0 0 > cpu3 56 0 107 50207 374 0 0 312 0 0 > > The code in this patch is borrowed from do_stolen_accounting() which has > already been removed from linux source code since commit ecb23dc6 ("xen: > add steal_clock support on x86"). > > Similar and more severe issue would impact prior linux 4.8-4.10 as > discussed by Michael Las at > https://0xstubs.org/debugging-a-flaky-cpu-steal-time-counter-on-a-paravirtualized-xen-guest. > Unlike the issue discussed by Michael Las which would overflow steal time > and lead to 100% st usage in top command for linux 4.8-4.10, the issue for > linux 4.11+ would only decrease but not overflow steal time after live > migration. > > References: > https://0xstubs.org/debugging-a-flaky-cpu-steal-time-counter-on-a-paravirtualized-xen-guest > Signed-off-by: Dongli Zhang > --- > kernel/sched/cputime.c | 13 ++--- > 1 file changed, 10 insertions(+), 3 deletions(-) > > diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c > index 14d2dbf..57d09cab 100644 > --- a/kernel/sched/cputime.c > +++ b/kernel/sched/cputime.c > @@ -238,10 +238,17 @@ static __always_inline u64 > steal_account_process_time(u64 maxtime) > { > #ifdef CONFIG_PARAVIRT > if (static_key_false(_steal_enabled)) { > - u64 steal; > + u64 steal, steal_time; > + s64 steal_delta; > + > + steal_time = paravirt_steal_clock(smp_processor_id()); > + steal = steal_delta = steal_time - this_rq()->prev_steal_time; > + > + if (unlikely(steal_delta < 0)) { > + this_rq()->prev_steal_time = steal_time; > + return 0; > + } > > - steal = paravirt_steal_clock(smp_processor_id()); > - steal -= this_rq()->prev_steal_time; > steal = min(steal, maxtime); > account_steal_time(steal); > this_rq()->prev_steal_time += steal; > -- > 2.7.4 > ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] x86: PIE support and option to extend KASLR randomization
* Pavel Machekwrote: > > For example, there would be collision with regular user-space mappings, > > right? > > Can local unprivileged users use mmap(MAP_FIXED) probing to figure out > > where > > the kernel lives? > > Local unpriviledged users can probably get your secret bits using cache > probing > and jump prediction buffers. > > Yes, you don't want to leak the information using mmap(MAP_FIXED), but CPU > will > leak it for you, anyway. Depends on the CPU I think, and CPU vendors are busy trying to mitigate this angle. Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] x86: PIE support and option to extend KASLR randomization
* H. Peter Anvinwrote: > We do need to consider how we want modules to fit into whatever model we > choose, though. They can be adjacent, or we could go with a more > traditional dynamic link model where the modules can be separate, and > chained together with the main kernel via the GOT. So I believe we should start with 'adjacent'. The thing is, having modules separately randomized mostly helps if any of the secret locations fails and we want to prevent hopping from one to the other. But if one the kernel-privileged secret location fails then KASLR has already failed to a significant degree... So I think the large-PIC model for modules does not buy us any real advantages in practice, and the disadvantages of large-PIC are real and most Linux users have to pay that cost unconditionally, as distro kernels have half of their kernel functionality living in modules. But I do see fundamental value in being able to hide the kernel somewhere in a ~48 bits address space, especially if we also implement Linus's suggestion to utilize the lower bits as well. 0..281474976710656 is a nicely large range and will get larger with time. But it should all be done smartly and carefully: For example, there would be collision with regular user-space mappings, right? Can local unprivileged users use mmap(MAP_FIXED) probing to figure out where the kernel lives? Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] x86: PIE support and option to extend KASLR randomization
* H. Peter Anvin <h...@zytor.com> wrote: > On 09/22/17 09:32, Ingo Molnar wrote: > > > > BTW., I think things improved with ORC because with ORC we have RBP as an > > extra > > register and with PIE we lose RBX - so register pressure in code generation > > is > > lower. > > > > We lose EBX on 32 bits, but we don't lose RBX on 64 bits - since x86-64 > has RIP-relative addressing there is no need for a dedicated PIC register. Indeed, but we'd use a new register _a lot_ for constructs, transforming: movr9,QWORD PTR [r11*8-0x7e3da060] (8 bytes) into: learbx,[rip+] (7 bytes) movr9,QWORD PTR [rbx+r11*8] (6 bytes) ... which I suppose is quite close to (but not the same as) 'losing' RBX. Of course the compiler can pick other registers as well, not that it matters much to register pressure in larger functions in the end. Plus if the compiler has to pick a callee-saved register there's the additional saving/restoring overhead of that as well. Right? > I'm somewhat confused how we can have as much as almost 1% overhead. I > suspect > that we end up making a GOT and maybe even a PLT for no good reason. So the above transformation alone would explain a good chunk of the overhead I think. Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] x86: PIE support and option to extend KASLR randomization
* Thomas Garnierwrote: > > 2) we first implement the additional entropy bits that Linus suggested. > > > > does this work for you? > > Sure, I can look at how feasible that is. If it is, can I send > everything as part of the same patch set? The additional entropy would > be enabled for all KASLR but PIE will be off-by-default of course. Sure, can all be part of the same series. Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] x86: PIE support and option to extend KASLR randomization
* Thomas Garnier <thgar...@google.com> wrote: > On Thu, Sep 21, 2017 at 8:59 AM, Ingo Molnar <mi...@kernel.org> wrote: > > > > ( Sorry about the delay in answering this. I could blame the delay on the > > merge > > window, but in reality I've been procrastinating this is due to the > > permanent, > > non-trivial impact PIE has on generated C code. ) > > > > * Thomas Garnier <thgar...@google.com> wrote: > > > >> 1) PIE sometime needs two instructions to represent a single > >> instruction on mcmodel=kernel. > > > > What again is the typical frequency of this occurring in an x86-64 defconfig > > kernel, with the very latest GCC? > > I am not sure what is the best way to measure that. If this is the dominant factor then 'sizeof vmlinux' ought to be enough: > With ORC: PIE .text is 0.814224% than baseline I.e. the overhead is +0.81% in both size and (roughly) in number of instructions executed. BTW., I think things improved with ORC because with ORC we have RBP as an extra register and with PIE we lose RBX - so register pressure in code generation is lower. Ok, I suspect we can try it, but my preconditions for merging it would be: 1) Linus doesn't NAK it (obviously) 2) we first implement the additional entropy bits that Linus suggested. does this work for you? Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] x86: PIE support and option to extend KASLR randomization
( Sorry about the delay in answering this. I could blame the delay on the merge window, but in reality I've been procrastinating this is due to the permanent, non-trivial impact PIE has on generated C code. ) * Thomas Garnierwrote: > 1) PIE sometime needs two instructions to represent a single > instruction on mcmodel=kernel. What again is the typical frequency of this occurring in an x86-64 defconfig kernel, with the very latest GCC? Also, to make sure: which unwinder did you use for your measurements, frame-pointers or ORC? Please use ORC only for future numbers, as frame-pointers is obsolete from a performance measurement POV. > 2) GCC does not optimize switches in PIE in order to reduce relocations: Hopefully this can either be fixed in GCC or at least influenced via a compiler switch in the future. > The switches are the biggest increase on small functions but I don't > think they represent a large portion of the difference (number 1 is). Ok. > A side note, while testing gcc 7.2.0 on hackbench I have seen the PIE > kernel being faster by 1% across multiple runs (comparing 50 runs done > across 5 reboots twice). I don't think PIE is faster than a > mcmodel=kernel but recent versions of gcc makes them fairly similar. So I think we are down to an overhead range where the inherent noise (both random and systematic one) in 'hackbench' overwhelms the signal we are trying to measure. So I think it's the kernel .text size change that is the best noise-free proxy for the overhead impact of PIE. It doesn't hurt to double check actual real performance as well, just don't expect there to be much of a signal for anything but fully cached microbenchmark workloads. Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] linux-next: manual merge of the xen-tip tree with the tip tree
* Thomas Gleixnerwrote: > --- a/arch/x86/xen/enlighten_pv.c > +++ b/arch/x86/xen/enlighten_pv.c > @@ -586,6 +586,68 @@ static void xen_write_ldt_entry(struct d > preempt_enable(); > } > > +#ifdef CONFIG_X86_64 > +static struct { > + void (*orig)(void); > + void (*xen)(void); > + bool ist_okay; > +} trap_array[] = { > + { debug, xen_xendebug, true }, > + { int3, xen_xenint3, true }, > + { double_fault, xen_double_fault, true }, > +#ifdef CONFIG_X86_MCE > + { machine_check, xen_machine_check, true }, > +#endif > + { nmi, xen_nmi, true }, > + { overflow, xen_overflow, false }, > +#ifdef CONFIG_IA32_EMULATION > + { entry_INT80_compat, xen_entry_INT80_compat, false }, > +#endif > + { page_fault, xen_page_fault, false }, > + { divide_error, xen_divide_error, false }, > + { bounds, xen_bounds, false }, > + { invalid_op, xen_invalid_op, false }, > + { device_not_available, xen_device_not_available, false }, > + { coprocessor_segment_overrun, xen_coprocessor_segment_overrun, false }, > + { invalid_TSS, xen_invalid_TSS, false }, > + { segment_not_present, xen_segment_not_present, false }, > + { stack_segment, xen_stack_segment, false }, > + { general_protection, xen_general_protection, false }, > + { spurious_interrupt_bug, xen_spurious_interrupt_bug, false }, > + { coprocessor_error, xen_coprocessor_error, false }, > + { alignment_check, xen_alignment_check, false }, > + { simd_coprocessor_error, xen_simd_coprocessor_error, false }, > +#ifdef CONFIG_TRACING > + { trace_page_fault, xen_trace_page_fault, false }, > +#endif > +}; Low prio nitpicking, could we please write such table based initializers in a vertically organized, tabular fashion: > + { debug,xen_xendebug, > true }, > + { int3, xen_xenint3, > true }, > + { double_fault, xen_double_fault, > true }, > +#ifdef CONFIG_X86_MCE > + { machine_check,xen_machine_check, > true }, > +#endif > + { nmi, xen_nmi, > true }, > + { overflow, xen_overflow, > false }, > +#ifdef CONFIG_IA32_EMULATION > + { entry_INT80_compat, xen_entry_INT80_compat, > false }, > +#endif > + { page_fault, xen_page_fault, > false }, > + { divide_error, xen_divide_error, > false }, > + { bounds, xen_bounds, > false }, > + { invalid_op, xen_invalid_op, > false }, > + { device_not_available, xen_device_not_available, > false }, > + { coprocessor_segment_overrun, xen_coprocessor_segment_overrun, > false }, > + { invalid_TSS, xen_invalid_TSS, > false }, > + { segment_not_present, xen_segment_not_present, > false }, > + { stack_segment,xen_stack_segment, > false }, > + { general_protection, xen_general_protection, > false }, > + { spurious_interrupt_bug, xen_spurious_interrupt_bug, > false }, > + { coprocessor_error,xen_coprocessor_error, > false }, > + { alignment_check, xen_alignment_check, > false }, > + { simd_coprocessor_error, xen_simd_coprocessor_error, > false }, > +#ifdef CONFIG_TRACING > + { trace_page_fault, xen_trace_page_fault, > false }, > +#endif ... as to me such a table is 100 times more readable - YMMV. (If checkpatch whinges about col80 then ignore it.) > + > +static bool get_trap_addr(unsigned long *addr, unsigned int ist) > +{ > + unsigned int nr; > + bool ist_okay = false; > + > + /* > + * Replace trap handler addresses by Xen specific ones. > + * Check for known traps using IST and whitelist them. > + * The debugger ones are the only ones we care about. > + * Xen will handle faults like double_fault, * so we should never see > + * them. Warn if there's an unexpected IST-using fault handler. > + */ > + for (nr = 0; nr < ARRAY_SIZE(trap_array); nr++) > + if (*addr == (unsigned long)trap_array[nr].orig) { > + *addr = (unsigned long)trap_array[nr].xen; > + ist_okay = trap_array[nr].ist_okay; > + break; > + } And here I think we could do a more readable variant: static bool get_trap_addr(void **addr, unsigned int ist) ... struct
Re: [Xen-devel] x86: PIE support and option to extend KASLR randomization
* Thomas Garnierwrote: > With the fix for function tracing, the hackbench results have an > average of +0.8 to +1.4% (from +8% to +10% before). With a default > configuration, the numbers are closer to 0.8%. > > On the .text size, with gcc 4.9 I see +0.8% on default configuration > and +1.180% on the ubuntu configuration. A 1% text size increase is still significant. Could you look at the disassembly, where does the size increase come from? Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] x86: PIE support and option to extend KASLR randomization
* Thomas Garnierwrote: > > > -model=small/medium assume you are on the low 32-bit. It generates > > > instructions where the virtual addresses have the high 32-bit to be zero. > > > > How are these assumptions hardcoded by GCC? Most of the instructions should > > be > > relocatable straight away, as most call/jump/branch instructions are > > RIP-relative. > > I think PIE is capable to use relative instructions well. mcmodel=large > assumes > symbols can be anywhere. So if the numbers in your changelog and Kconfig text cannot be trusted, there's this description of the size impact which I suspect is less susceptible to measurement error: + The kernel and modules will generate slightly more assembly (1 to 2% + increase on the .text sections). The vmlinux binary will be + significantly smaller due to less relocations. ... but describing a 1-2% kernel text size increase as "slightly more assembly" shows a gratituous disregard to kernel code generation quality! In reality that's a huge size increase that in most cases will almost directly transfer to a 1-2% slowdown for kernel intense workloads. Where does that size increase come from, if PIE is capable of using relative instructins well? Does it come from the loss of a generic register and the resulting increase in register pressure, stack spills, etc.? So I'm still unhappy about this all, and about the attitude surrounding it. Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] x86: PIE support and option to extend KASLR randomization
* Thomas Garnier <thgar...@google.com> wrote: > On Tue, Aug 15, 2017 at 12:56 AM, Ingo Molnar <mi...@kernel.org> wrote: > > > > * Thomas Garnier <thgar...@google.com> wrote: > > > >> > Do these changes get us closer to being able to build the kernel as truly > >> > position independent, i.e. to place it anywhere in the valid x86-64 > >> > address > >> > space? Or any other advantages? > >> > >> Yes, PIE allows us to put the kernel anywhere in memory. It will allow us > >> to > >> have a full randomized address space where position and order of sections > >> are > >> completely random. There is still some work to get there but being able to > >> build > >> a PIE kernel is a significant step. > > > > So I _really_ dislike the whole PIE approach, because of the huge slowdown: > > > > +config RANDOMIZE_BASE_LARGE > > + bool "Increase the randomization range of the kernel image" > > + depends on X86_64 && RANDOMIZE_BASE > > + select X86_PIE > > + select X86_MODULE_PLTS if MODULES > > + default n > > + ---help--- > > + Build the kernel as a Position Independent Executable (PIE) and > > + increase the available randomization range from 1GB to 3GB. > > + > > + This option impacts performance on kernel CPU intensive workloads > > up > > + to 10% due to PIE generated code. Impact on user-mode processes > > and > > + typical usage would be significantly less (0.50% when you build > > the > > + kernel). > > + > > + The kernel and modules will generate slightly more assembly (1 to > > 2% > > + increase on the .text sections). The vmlinux binary will be > > + significantly smaller due to less relocations. > > > > To put 10% kernel overhead into perspective: enabling this option wipes out > > about > > 5-10 years worth of painstaking optimizations we've done to keep the kernel > > fast > > ... (!!) > > Note that 10% is the high-bound of a CPU intensive workload. Note that the 8-10% hackbench or even a 2%-4% range would be 'huge' in terms of modern kernel performance. In many cases we are literally applying cycle level optimizations that are barely measurable. A 0.1% speedup in linear execution speed is already a big success. > I am going to start doing performance testing on -mcmodel=large to see if it > is > faster than -fPIE. Unfortunately mcmodel=large looks pretty heavy too AFAICS, at the machine instruction level. Function calls look like this: -mcmodel=medium: 757: e8 98 ff ff ff callq 6f4 -mcmodel=large 77b: 48 b8 10 f7 df ff ffmovabs $0xffdff710,%rax 782: ff ff ff 785: 48 8d 04 03 lea(%rbx,%rax,1),%rax 789: ff d0 callq *%rax And we'd do this for _EVERY_ function call in the kernel. That kind of crap is totally unacceptable. > > I think the fundamental flaw is the assumption that we need a PIE > > executable > > to have a freely relocatable kernel on 64-bit CPUs. > > > > Have you considered a kernel with -mcmodel=small (or medium) instead of > > -fpie > > -mcmodel=large? We can pick a random 2GB window in the (non-kernel) > > canonical > > x86-64 address space to randomize the location of kernel text. The location > > of > > modules can be further randomized within that 2GB window. > > -model=small/medium assume you are on the low 32-bit. It generates > instructions > where the virtual addresses have the high 32-bit to be zero. How are these assumptions hardcoded by GCC? Most of the instructions should be relocatable straight away, as most call/jump/branch instructions are RIP-relative. I.e. is there no GCC code generation mode where code can be placed anywhere in the canonical address space, yet call and jump distance is within 31 bits so that the generated code is fast? Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] x86: PIE support and option to extend KASLR randomization
* Thomas Garnierwrote: > > Do these changes get us closer to being able to build the kernel as truly > > position independent, i.e. to place it anywhere in the valid x86-64 address > > space? Or any other advantages? > > Yes, PIE allows us to put the kernel anywhere in memory. It will allow us to > have a full randomized address space where position and order of sections are > completely random. There is still some work to get there but being able to > build > a PIE kernel is a significant step. So I _really_ dislike the whole PIE approach, because of the huge slowdown: +config RANDOMIZE_BASE_LARGE + bool "Increase the randomization range of the kernel image" + depends on X86_64 && RANDOMIZE_BASE + select X86_PIE + select X86_MODULE_PLTS if MODULES + default n + ---help--- + Build the kernel as a Position Independent Executable (PIE) and + increase the available randomization range from 1GB to 3GB. + + This option impacts performance on kernel CPU intensive workloads up + to 10% due to PIE generated code. Impact on user-mode processes and + typical usage would be significantly less (0.50% when you build the + kernel). + + The kernel and modules will generate slightly more assembly (1 to 2% + increase on the .text sections). The vmlinux binary will be + significantly smaller due to less relocations. To put 10% kernel overhead into perspective: enabling this option wipes out about 5-10 years worth of painstaking optimizations we've done to keep the kernel fast ... (!!) I think the fundamental flaw is the assumption that we need a PIE executable to have a freely relocatable kernel on 64-bit CPUs. Have you considered a kernel with -mcmodel=small (or medium) instead of -fpie -mcmodel=large? We can pick a random 2GB window in the (non-kernel) canonical x86-64 address space to randomize the location of kernel text. The location of modules can be further randomized within that 2GB window. It should have far less performance impact than the register-losing and overhead-inducing -fpie / -mcmodel=large (for modules) execution models. My quick guess is tha the performance impact might be close to zero in fact. Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v4] xen: get rid of paravirt op adjust_exception_frame
* Juergen Gross <jgr...@suse.com> wrote: > When running as Xen pv-guest the exception frame on the stack contains > %r11 and %rcx additional to the other data pushed by the processor. > > Instead of having a paravirt op being called for each exception type > prepend the Xen specific code to each exception entry. When running as > Xen pv-guest just use the exception entry with prepended instructions, > otherwise use the entry without the Xen specific code. > > Signed-off-by: Juergen Gross <jgr...@suse.com> > --- > arch/x86/entry/entry_64.S | 23 ++--- > arch/x86/entry/entry_64_compat.S | 1 - > arch/x86/include/asm/paravirt.h | 5 -- > arch/x86/include/asm/paravirt_types.h | 4 -- > arch/x86/include/asm/proto.h | 3 ++ > arch/x86/include/asm/traps.h | 33 ++-- > arch/x86/kernel/asm-offsets_64.c | 1 - > arch/x86/kernel/paravirt.c| 3 -- > arch/x86/xen/enlighten_pv.c | 96 > +++ > arch/x86/xen/irq.c| 3 -- > arch/x86/xen/xen-asm_64.S | 45 ++-- > arch/x86/xen/xen-ops.h | 1 - > 12 files changed, 140 insertions(+), 78 deletions(-) Acked-by: Ingo Molnar <mi...@kernel.org> Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 0/3] fix xen hvm guest with kaslr enabled
* Juergen Gross <jgr...@suse.com> wrote: > On 08/08/17 16:00, Boris Ostrovsky wrote: > > On 08/08/2017 02:46 AM, Juergen Gross wrote: > >> On 28/07/17 12:23, Juergen Gross wrote: > >>> This patch series fixes a regression introduced in 4.13-rc1: A Xen > >>> HVM guest with KASLR enabled wouldn't boot any longer due to the usage > >>> of __va() before kernel_randomize_memory() was called. > >>> > >>> Changes in V2: > >>> - patch 1: test for x86_hyper being not NULL > >>> > >>> Juergen Gross (3): > >>> x86: provide an init_mem_mapping hypervisor hook > >>> xen: split up xen_hvm_init_shared_info() > >>> xen: fix hvm guest with kaslr enabled > >>> > >>> arch/x86/include/asm/hypervisor.h | 10 +++ > >>> arch/x86/mm/init.c| 3 ++ > >>> arch/x86/xen/enlighten_hvm.c | 59 > >>> --- > >>> 3 files changed, 50 insertions(+), 22 deletions(-) > >>> > >> Could I have some feedback, please? > >> > >> I'd like to get this regression fixed in 4.13. > >> > >> In case nobody objects this week I'll just add the patches to the Xen > >> tree for rc5. > > > > > > As I said before I think .init_mem_mapping() could live in > > x86_platform_ops() but this works too, so > > > > Reviewed-by: Boris Ostrovsky <boris.ostrov...@oracle.com> > > > > But this still wants x86 maintainers' ACK. > > x86 maintainers, could you please comment on at least patch 1? LGTM: Acked-by: Ingo Molnar <mi...@kernel.org> Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] x86: PIE support and option to extend KASLR randomization
* Thomas Garnierwrote: > Changes: > - v2: >- Add support for global stack cookie while compiler default to fs without > mcmodel=kernel >- Change patch 7 to correctly jump out of the identity mapping on kexec > load > preserve. > > These patches make the changes necessary to build the kernel as Position > Independent Executable (PIE) on x86_64. A PIE kernel can be relocated below > the top 2G of the virtual address space. It allows to optionally extend the > KASLR randomization range from 1G to 3G. So this: 61 files changed, 923 insertions(+), 299 deletions(-) ... is IMHO an _awful_ lot of churn and extra complexity in pretty fragile pieces of code, to gain what appears to be only ~1.5 more bits of randomization! Do these changes get us closer to being able to build the kernel as truly position independent, i.e. to place it anywhere in the valid x86-64 address space? Or any other advantages? Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 4.12 26/84] x86/xen/efi: Initialize only the EFI struct members used by Xen
* Greg Kroah-Hartman <gre...@linuxfoundation.org> wrote: > On Thu, Jul 20, 2017 at 10:39:10AM +0200, Ingo Molnar wrote: > > > > * Daniel Kiper <daniel.ki...@oracle.com> wrote: > > > > > Hey Greg, > > > > > > On Wed, Jul 19, 2017 at 11:43:32AM +0200, Greg Kroah-Hartman wrote: > > > > 4.12-stable review patch. If anyone has any objections, please let me > > > > know. > > > > > > Why did you skip this patch for 4.11? IMO it should be applied there too. > > > > The thing is, this patch should probaly not even be in v4.12, as it should > > only > > make any difference if there's a separate _bug_ in the kernel. > > > > This patch makes things more robust going forward, but I question that it > > needs to > > be in -stable. > > Yeah, good point, I'm going to go drop it entirely from the 4.12-stable > tree as it obviously isn't stable material, sorry for not catching that > before. I should have caught the tag as well when applying the upstream patch to begin with. Thanks! Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 4.12 26/84] x86/xen/efi: Initialize only the EFI struct members used by Xen
* Daniel Kiperwrote: > Hey Greg, > > On Wed, Jul 19, 2017 at 11:43:32AM +0200, Greg Kroah-Hartman wrote: > > 4.12-stable review patch. If anyone has any objections, please let me know. > > Why did you skip this patch for 4.11? IMO it should be applied there too. The thing is, this patch should probaly not even be in v4.12, as it should only make any difference if there's a separate _bug_ in the kernel. This patch makes things more robust going forward, but I question that it needs to be in -stable. Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v9 00/38] x86: Secure Memory Encryption (AMD)
* Tom Lendackywrote: > This patch series provides support for AMD's new Secure Memory Encryption > (SME) > feature. I'm wondering, what's the typical performance hit to DRAM access latency when SME is enabled? On that same note, if the performance hit is noticeable I'd expect SME to not be enabled in native kernels typically - but still it looks like a useful hardware feature. Since it's controlled at the page table level, have you considered allowing SME-activated vmas via mmap(), even on kernels that are otherwise not using encrypted DRAM? One would think that putting encryption keys into such encrypted RAM regions would generally improve robustness against various physical space attacks that want to extract keys but don't have full control of the CPU. Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 2/2] x86/xen/efi: Init only efi struct members used by Xen
* Daniel Kiperwrote: > -static const struct efi efi_xen __initconst = { > - .systab = NULL, /* Initialized later. */ > - .runtime_version = 0,/* Initialized later. */ > - .mps = EFI_INVALID_TABLE_ADDR, > - .acpi = EFI_INVALID_TABLE_ADDR, > - .acpi20 = EFI_INVALID_TABLE_ADDR, > - .smbios = EFI_INVALID_TABLE_ADDR, > - .smbios3 = EFI_INVALID_TABLE_ADDR, > - .sal_systab = EFI_INVALID_TABLE_ADDR, > - .boot_info= EFI_INVALID_TABLE_ADDR, > - .hcdp = EFI_INVALID_TABLE_ADDR, > - .uga = EFI_INVALID_TABLE_ADDR, > - .uv_systab= EFI_INVALID_TABLE_ADDR, > - .fw_vendor= EFI_INVALID_TABLE_ADDR, > - .runtime = EFI_INVALID_TABLE_ADDR, > - .config_table = EFI_INVALID_TABLE_ADDR, > - .get_time = xen_efi_get_time, > - .set_time = xen_efi_set_time, > - .get_wakeup_time = xen_efi_get_wakeup_time, > - .set_wakeup_time = xen_efi_set_wakeup_time, > - .get_variable = xen_efi_get_variable, > - .get_next_variable= xen_efi_get_next_variable, > - .set_variable = xen_efi_set_variable, > - .query_variable_info = xen_efi_query_variable_info, > - .update_capsule = xen_efi_update_capsule, > - .query_capsule_caps = xen_efi_query_capsule_caps, > - .get_next_high_mono_count = xen_efi_get_next_high_mono_count, > - .reset_system = xen_efi_reset_system, > - .set_virtual_address_map = NULL, /* Not used under Xen. */ > - .flags= 0 /* Initialized later. */ > -}; > - > static efi_system_table_t __init *xen_efi_probe(void) > { > struct xen_platform_op op = { > @@ -102,7 +70,18 @@ static efi_system_table_t __init *xen_efi_probe(void) > > /* Here we know that Xen runs on EFI platform. */ > > - efi = efi_xen; > + efi.get_time = xen_efi_get_time; > + efi.set_time = xen_efi_set_time; > + efi.get_wakeup_time = xen_efi_get_wakeup_time; > + efi.set_wakeup_time = xen_efi_set_wakeup_time; > + efi.get_variable = xen_efi_get_variable; > + efi.get_next_variable = xen_efi_get_next_variable; > + efi.set_variable = xen_efi_set_variable; > + efi.query_variable_info = xen_efi_query_variable_info; > + efi.update_capsule = xen_efi_update_capsule; > + efi.query_capsule_caps = xen_efi_query_capsule_caps; > + efi.get_next_high_mono_count = xen_efi_get_next_high_mono_count; > + efi.reset_system = xen_efi_reset_system; This is a step back stylistically, as you lost the nice vertical tabulation of the original initializer ... Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2] xen/mce: don't issue error message for failed /dev/mcelog registration
* Juergen Grosswrote: > When running under Xen as dom0 /dev/mcelog is being registered by Xen > instead of the normal mcelog driver. Avoid an error message being > issued by the mcelog driver in this case. Instead issue an informative > message that Xen has registered the device. > > Signed-off-by: Juergen Gross > --- > arch/x86/kernel/cpu/mcheck/dev-mcelog.c | 11 +-- > drivers/xen/mcelog.c| 2 ++ > 2 files changed, 11 insertions(+), 2 deletions(-) > > diff --git a/arch/x86/kernel/cpu/mcheck/dev-mcelog.c > b/arch/x86/kernel/cpu/mcheck/dev-mcelog.c > index 9c632cb88546..4eb5f7d1d593 100644 > --- a/arch/x86/kernel/cpu/mcheck/dev-mcelog.c > +++ b/arch/x86/kernel/cpu/mcheck/dev-mcelog.c > @@ -388,9 +388,16 @@ static __init int dev_mcelog_init_device(void) > /* register character device /dev/mcelog */ > err = misc_register(_chrdev_device); > if (err) { > - pr_err("Unable to init device /dev/mcelog (rc: %d)\n", err); > - return err; > + if (err == -EBUSY) > + /* Xen dom0 might have registered the device already. */ > + pr_info("Unable to init device /dev/mcelog, already > registered"); > + else { > + pr_err("Unable to init device /dev/mcelog (rc: %d)\n", > +err); > + return err; > + } Please only use balanced curly braces in conditional statements. Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] x86/mm: Split read_cr3() into read_cr3_pa() and __read_cr3()
* Andy Lutomirskiwrote: > The kernel has several code paths that read CR3. Most of them assume that > CR3 contains the PGD's physical address, whereas some of them awkwardly > use PHYSICAL_PAGE_MASK to mask off low bits. > > Add explicit mask macros for CR3 and convert all of the CR3 readers. > This will keep them from breaking when PCID is enabled. > > Cc: Tom Lendacky > Cc: Juergen Gross > Cc: xen-devel > Cc: Boris Ostrovsky > Signed-off-by: Andy Lutomirski > --- > > Hi Ingo- > > I broke this out because Tom's SME series and my PCID series both need it. > Please consider applying it to tip:x86/mm. > > I'll send PCID v2 soon. It'll apply to x86/mm + sched/urgent + this patch. Ok, thanks - will push it out after a bit of testing. Note that I also merged sched/urgent into x86/mm, to create a better base tree for the PCID changes by picking up 252d2a4117bc that PCID depends on. Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] xen_exit_mmap() questions
* Boris Ostrovskywrote: > > xen_mc_issue() does: > > > > if ((paravirt_get_lazy_mode() & mode) == 0) > > xen_mc_flush(); > > > > I assume the load_cr3() is intended to deal with the case where we're > > in lazy mode, but we'll still be in lazy mode, right? Or does it > > serve some other purpose? > > Of course. I can't read (I ignored the "== 0" part). Ha, ob'sidenote: the preferred form to write this is: if (!(paravirt_get_lazy_mode() & mode)) xen_mc_flush(); ... exactly due to the readability problem you ran into: a 'pre' negation is much easier to read, plus '==' tends to trigger 'equal to' attributes in the brain, which is the opposite of negation. So it's very easy to mis-read such syntactic constructs even if they are technically correct. I think '== 0' should be forbidden in all cases where the purpose is a logic operation and should be used strictly only in cases where we do explicit integer arithmetics. (Bools and '== false' are suboptimal for similar reasons.) ... but I digress! Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 03/10] x86: assembly, use SYM_FUNC_END for functions
* Jiri Slabywrote: > On 04/10/2017, 09:35 PM, Josh Poimboeuf wrote: > > The code should be in a mergeable state after each patch. If only > > patches 1-3 were merged, the code would be in an inconsistent state, > > with some functions having confusing ENTRY/SYM_FUNC_END pairs. That > > complicates git history and also makes it harder to review each patch. > > > > It would be cleaner to separate things out. First, convert ENTRY/END > > functions to use ENDPROC, which is a minor bug fix. Then they can be > > converted to the new SYM_FUNC_START/END macros in a separate patch. > > OTOH I don't think touching and reviewing the same place twice is what > actually maintainers would want to see. But as I wrote earlier, I can do > whatever is preferred -- therefore I am asking before I start reworking > the patches: maintainers, what do you prefer? I'd lean towards Josh's suggestion of a more granular series. Having to review more is sometimes less, if the patches are more focused. Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] linux-next: manual merge of the xen-tip tree with the tip tree
* Juergen Grosswrote: > > So my suggestion would be: could you delay 75cd32d6093e for a week, and > > then > > merge it on top of a pulled in tip:x86/mm? I'll send that tree to Linus on > > the > > first day of the merge window so there shouldn't be any ordering problems. > > Okay, that's rather easy to do. > > Boris, I renamed the current Xen for-linus-4.12 branch for easy development > of > other Xen patches to for-linus-4.12-pre. > > I'll create another branch for-linus-4.12 based on the tip tree next week > which > will be subject to the pull request for Linus. As soon as for-linus-4.12 is > ready the for-linus-4.12-pre branch shouldn't be used any longer. I've created a tip:x86-mm-for-xen branch with the following head: 7f75540ff2ca ("Merge tag 'v4.11-rc5' into x86/mm, to refresh the branch") ... which should be a reasonable base that includes a working version of the 5-level pagetable changes. Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] linux-next: manual merge of the xen-tip tree with the tip tree
* Juergen Gross <jgr...@suse.com> wrote: > On 29/03/17 10:59, Ingo Molnar wrote: > > > > * Juergen Gross <jgr...@suse.com> wrote: > > > >> On 29/03/17 05:35, Stephen Rothwell wrote: > >>> Hi all, > >>> > >>> Today's linux-next merge of the xen-tip tree got a conflict in: > >>> > >>> arch/x86/xen/enlighten.c > >>> > >>> between commits: > >>> > >>> 6415813bae75 ("x86/cpu: Drop wp_works_ok member of struct cpuinfo_x86") > >>> 69218e47994d ("x86: Remap GDT tables in the fixmap section") > >>> b23adb7d3f7d ("x86/xen/gdt: Use X86_FEATURE_XENPV instead of globals > >>> for the GDT fixup") > >>> > >>> from the tip tree and commits: > >>> > >>> 75cd32d6093e ("x86/xen: split off enlighten_pv.c") > >>> > >>> from the xen-tip tree. > >>> > >>> I dropped the xen-tip tree for today (see other conflict reports), > >>> please get together and sort these out, thanks. > > Stephen, I have rewound the linux-next branch of xen-tip to its previous > position. You can re-enable xen-tip. Thank you! Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] linux-next: manual merge of the xen-tip tree with the tip tree
* Juergen Grosswrote: > On 29/03/17 05:35, Stephen Rothwell wrote: > > Hi all, > > > > Today's linux-next merge of the xen-tip tree got a conflict in: > > > > arch/x86/xen/enlighten.c > > > > between commits: > > > > 6415813bae75 ("x86/cpu: Drop wp_works_ok member of struct cpuinfo_x86") > > 69218e47994d ("x86: Remap GDT tables in the fixmap section") > > b23adb7d3f7d ("x86/xen/gdt: Use X86_FEATURE_XENPV instead of globals for > > the GDT fixup") > > > > from the tip tree and commits: > > > > 75cd32d6093e ("x86/xen: split off enlighten_pv.c") > > > > from the xen-tip tree. > > > > I dropped the xen-tip tree for today (see other conflict reports), > > please get together and sort these out, thanks. > > > > Hmm, seems to be a rather bad timing for the series of Vitaly. > > What is the best way to resolve those conflicts? A rebase of Vitaly's > patches seems to be required in any case. > > Should I rebase the Xen tree on current tip? This seems to be rather > easy, but I think this will work only if I can be sure the current tip > tree contents will all be merged by Linus before the Xen tree. That's certainly very likely, -tip trees all go in very early in the merge window. > I could try to cherry pick the patches from tip where Vitaly's patches > have conflicts with, but I think this could lead to a lot of patches > to take. Nor is it desirable as a workflow. I'd suggest the following: in about a week I can guarantee a working tip:x86/mm base with most of the 5-level paging patches applied that you could base Xen patches on. Unfortunately, right now there's at least one regression with those changes that needs to be properly fixed before it's a suitable base tree. The fix already exists, it just needs to be tested and the whole tree needs to cook for a few days to be dependable for Xen as a base. > Or we could delay Vitaly's series until tip has been merged, but this > will either delay some other Xen patches depending on (or conflicting > with) Vitaly's patches or would make the rebase for Vitaly more > difficult. So my suggestion would be: could you delay 75cd32d6093e for a week, and then merge it on top of a pulled in tip:x86/mm? I'll send that tree to Linus on the first day of the merge window so there shouldn't be any ordering problems. Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 02/10] x86: assembly, FUNC_START for fn, DATA_START for data
* Josh Poimboeuf <jpoim...@redhat.com> wrote: > On Wed, Mar 22, 2017 at 08:46:16AM +0100, Ingo Molnar wrote: > > > > * Jiri Slaby <jsl...@suse.cz> wrote: > > > > > On 03/22/2017, 08:25 AM, Ingo Molnar wrote: > > > > > > > > * Pavel Machek <pa...@ucw.cz> wrote: > > > > > > > >> Hi! > > > >> > > > >>> -ENTRY(saved_rbp) .quad 0 > > > >>> -ENTRY(saved_rsi) .quad 0 > > > >>> -ENTRY(saved_rdi) .quad 0 > > > >>> -ENTRY(saved_rbx) .quad 0 > > > >>> +SYM_DATA_START(saved_rbp).quad 0 > > > >>> +SYM_DATA_START(saved_rsi).quad 0 > > > >>> +SYM_DATA_START(saved_rdi).quad 0 > > > >>> +SYM_DATA_START(saved_rbx).quad 0 > > > >> > > > >> Does it make sense to call it SYM_DATA_*START* when there's no > > > >> corresponding end? > > > > > > > > That looks like a bug - I think we should strive for them to always be > > > > in pairs. > > > > > > > > Jiri, Josh, could objtool help here perhaps, to detect 'non-terminated' > > > > SYM_*_START() uses? This could be done by emitting debug data into a > > > > special > > > > section and then analyzing that section for unpaired entries. The > > > > section can be > > > > discarded in the final link, it won't show up in the kernel image. > > > > > > It should be easier than that. No introduction of other info needed -- > > > every global symbol without a ".type" or ".size" (i.e. SYM_*_END) should > > > be a bug now. > > > > I'm all for that! > > It would be easy to add this checking to objtool since it already reads > the symbol table. The hard part is figuring out the logistics. :-) > > - Should the warnings be on by default? Yes, if objtool is running. Keep it simple. > - Part of the "objtool check" command or something else? Yes - I think it's still within the 'object file check' functionality. > - Separate config option or just include it with > CONFIG_STACK_VALIDATION? Yeah, but I'd rename CONFIG_STACK_VALIDATION to CONFIG_OBJ_VALIDATION or such. As I predicted early on, objtool will go beyond stack checking! ;-) > - Should all asm files be checked, including those currently skipped by > objtool with OBJECT_FILES_NON_STANDARD? The symbol syntax check should definitely be for all files, yes. Could we perhaps emit 'non-standard stack frames' information into the .o itself (via a flag or a special section?), so that objtool can decide on its own whether to complain about any weirdnesses there? > > Can we detect double ends as well - i.e. do a build check of the full > > syntax of > > these symbol definition primitives? > > Detecting double ends would be a little trickier. The second SYM_*_END > supersedes the first, so that information isn't in the ELF symbol table. Indeed. > We could use a special section to annotate all the macro uses and have > objtool do the checking, similar to what you suggested earlier. That might be useful for other purposes as well - such as the non-standard stack frame annotations? But it's your call really: I'm principally fine with any of the solutions, as long as the checking is done. Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 02/10] x86: assembly, FUNC_START for fn, DATA_START for data
* Jiri Slaby <jsl...@suse.cz> wrote: > On 03/22/2017, 08:25 AM, Ingo Molnar wrote: > > > > * Pavel Machek <pa...@ucw.cz> wrote: > > > >> Hi! > >> > >>> -ENTRY(saved_rbp) .quad 0 > >>> -ENTRY(saved_rsi) .quad 0 > >>> -ENTRY(saved_rdi) .quad 0 > >>> -ENTRY(saved_rbx) .quad 0 > >>> +SYM_DATA_START(saved_rbp).quad 0 > >>> +SYM_DATA_START(saved_rsi).quad 0 > >>> +SYM_DATA_START(saved_rdi).quad 0 > >>> +SYM_DATA_START(saved_rbx).quad 0 > >> > >> Does it make sense to call it SYM_DATA_*START* when there's no > >> corresponding end? > > > > That looks like a bug - I think we should strive for them to always be in > > pairs. > > > > Jiri, Josh, could objtool help here perhaps, to detect 'non-terminated' > > SYM_*_START() uses? This could be done by emitting debug data into a > > special > > section and then analyzing that section for unpaired entries. The section > > can be > > discarded in the final link, it won't show up in the kernel image. > > It should be easier than that. No introduction of other info needed -- > every global symbol without a ".type" or ".size" (i.e. SYM_*_END) should > be a bug now. I'm all for that! Can we detect double ends as well - i.e. do a build check of the full syntax of these symbol definition primitives? Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 03/10] x86: assembly, use SYM_FUNC_END for functions
* Josh Poimboeufwrote: > On Mon, Mar 20, 2017 at 01:32:15PM +0100, Jiri Slaby wrote: > > ENTRY(ftrace_caller) > > /* save_mcount_regs fills in first two parameters */ > > @@ -184,11 +184,12 @@ GLOBAL(ftrace_epilogue) > > GLOBAL(ftrace_graph_call) > > jmp ftrace_stub > > #endif > > +SYM_FUNC_END(ftrace_caller) > > > > /* This is weak to keep gas from relaxing the jumps */ > > WEAK(ftrace_stub) > > retq > > -END(ftrace_caller) > > +SYM_FUNC_END(ftrace_caller) > > This gives ftrace_caller() two ends. Such errors too could be detected automatically via objtool or some other symbol debug mechanism. The reason it might be a good fit for objtool is to make the checking optional (no need to burden a regular build with it), plus objtool already looks at the .o from first principles - a symbol checking sub-functionality could analyze the symbol names in the same pass. If we want to complicate things with CFI then we absolutely should increase the quality of our symbol names space. Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 02/10] x86: assembly, FUNC_START for fn, DATA_START for data
* Pavel Machekwrote: > Hi! > > > -ENTRY(saved_rbp) .quad 0 > > -ENTRY(saved_rsi) .quad 0 > > -ENTRY(saved_rdi) .quad 0 > > -ENTRY(saved_rbx) .quad 0 > > +SYM_DATA_START(saved_rbp) .quad 0 > > +SYM_DATA_START(saved_rsi) .quad 0 > > +SYM_DATA_START(saved_rdi) .quad 0 > > +SYM_DATA_START(saved_rbx) .quad 0 > > Does it make sense to call it SYM_DATA_*START* when there's no > corresponding end? That looks like a bug - I think we should strive for them to always be in pairs. Jiri, Josh, could objtool help here perhaps, to detect 'non-terminated' SYM_*_START() uses? This could be done by emitting debug data into a special section and then analyzing that section for unpaired entries. The section can be discarded in the final link, it won't show up in the kernel image. We don't ever nest symbols, right? > Plus... it looks like saved_rsi (and friends) are only used inside > wakeup_64.S. Could we just delete the "ENTRY" annotations? That appears to make sense as well. Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v7 1/3] x86/mm: Adapt MODULES_END based on Fixmap section size
* Thomas Garnier <thgar...@google.com> wrote: > On Thu, Mar 16, 2017 at 1:10 AM, Ingo Molnar <mi...@kernel.org> wrote: > > > > Note that asm/fixmap.h is an x86-ism that isn't present in many other > > architectures, so this hunk will break the build. > > > > To make progress with these patches I've fixed it up with an ugly #ifdef > > CONFIG_X86, but it needs a real solution instead before this can be pushed > > upstream. > > I also saw an error on x86 tip on special configuration. I found this > new patch below to be a good solution to both. > > Let me know what you think. > > = > > This patch aligns MODULES_END to the beginning of the Fixmap section. > It optimizes the space available for both sections. The address is > pre-computed based on the number of pages required by the Fixmap > section. > > It will allow GDT remapping in the Fixmap section. The current > MODULES_END static address does not provide enough space for the kernel > to support a large number of processors. > > Signed-off-by: Thomas Garnier <thgar...@google.com> > --- > Based on next-20170308 > --- > Documentation/x86/x86_64/mm.txt | 5 - > arch/x86/include/asm/pgtable_64.h | 1 + > arch/x86/include/asm/pgtable_64_types.h | 3 ++- > 3 files changed, 7 insertions(+), 2 deletions(-) The patch is heavily whitespace and line wrap damaged :-( Please send a properly titled, changelogged delta patch against tip:master (or tip:x86/mm) to remove the CONFIG_X86 hack and fix the build bug. Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v7 1/3] x86/mm: Adapt MODULES_END based on Fixmap section size
* Thomas Garnierwrote: > This patch aligns MODULES_END to the beginning of the Fixmap section. > It optimizes the space available for both sections. The address is > pre-computed based on the number of pages required by the Fixmap > section. > > It will allow GDT remapping in the Fixmap section. The current > MODULES_END static address does not provide enough space for the kernel > to support a large number of processors. > > Signed-off-by: Thomas Garnier > --- > Based on next-20170308 > --- > Documentation/x86/x86_64/mm.txt | 5 - > arch/x86/include/asm/pgtable_64_types.h | 3 ++- > arch/x86/kernel/module.c| 1 + > arch/x86/mm/dump_pagetables.c | 1 + > arch/x86/mm/kasan_init_64.c | 1 + > mm/vmalloc.c| 1 + > --- a/mm/vmalloc.c > +++ b/mm/vmalloc.c > @@ -35,6 +35,7 @@ > #include > #include > #include > +#include > > #include "internal.h" Note that asm/fixmap.h is an x86-ism that isn't present in many other architectures, so this hunk will break the build. To make progress with these patches I've fixed it up with an ugly #ifdef CONFIG_X86, but it needs a real solution instead before this can be pushed upstream. Thanks, Ingo => mm/vmalloc.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/mm/vmalloc.c b/mm/vmalloc.c index dabea6a29fad..b7d2a23349f4 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -35,7 +35,10 @@ #include #include #include -#include + +#ifdef CONFIG_X86 +# include +#endif #include "internal.h" ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [RFC] linkage: new macros for functions and data
* Jiri Slabywrote: > SYM_LOCAL_ALIAS_START -- use where there are two local names for one code > SYM_ALIAS_START -- use where there are two global names for one code > SYM_LOCAL_FUNC_START -- use for local functions > SYM_FUNCTION_START -- use for global functions > SYM_WEAK_FUNC_START -- use for weak functions > SYM_ALIAS_END -- the end of LOCALALIASed or ALIASed code > SYM_FUNCTION_END -- the end of SYM_LOCAL_FUNC_START, SYM_FUNCTION_START, > SYM_WEAK_FUNC_START, ... > SYM_DATA_START -- global data symbol > SYM_DATA_END -- the end of SYM_DATA_START symbol This looks mostly good to me, with minor details: - The mixed 'FUNC' and 'FUNCTION' naming looks a bit confusing - I'd pick one and use that consistently. - I'd also make the START/END constructs look more symmetric, i.e. make the attribute a postfix, not a prefix. - In fact I'd make the 'alias' notion an attribute as well - what matters mostly is that it's a function symbol, and that fact is lost from the SYM_ALIAS naming. I.e. something like this: SYM_FUNCTION_START SYM_FUNCTION_START_WEAK SYM_FUNCTION_START_LOCAL SYM_FUNCTION_START_ALIAS SYM_FUNCTION_START_LOCAL_ALIAS ... SYM_FUNCTION_END SYM_DATA_START SYM_DATA_END Note how simple and structured looking the nomenclature is when grouped in such a way, how easy it is to verify at a glance, and how easy it is to extend with new postfixes. ( In theory we could also make the attributes a real macro argument in the future, should the number of attributes increase significantly. ) Does this look good to everyone? Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 01/10] x86: assembly, ENTRY for fn, GLOBAL for data
* h...@zytor.com <h...@zytor.com> wrote: > On March 1, 2017 2:27:54 AM PST, Ingo Molnar <mi...@kernel.org> wrote: > >> > So how about using macro names that actually show the purpose, instead > >> > of > >> > importing all the crappy, historic, essentially randomly chosen debug > >> > symbol macro names from the binutils and older kernels? > >> > > >> > Something sane, like: > >> > > >> > SYM__FUNCTION_START > >> > >> Sane would be: > >> > >>SYM_FUNCTION_START > >> > >> The double underscore is just not giving any value. > > > > So the double underscore (at least in my view) has two advantages: > > > > 1) it helps separate the prefix from the postfix. > > > > I.e. it's a 'symbols' namespace, and a 'function start', not the 'start' of > > a > > 'symbol function'. > > > > 2) It also helps easy greppability. > > > > Try this in latest -tip: > > > >git grep e820__ > > > > To see all the E820 API calls - with no false positives! > > > > 'git grep e820_' on the other hand is a lot less reliable... > > IMO these little "namespace tricks" especially for small common macros like > we > are taking about here make the code very frustrating to read, and even more > to > write. Noone would design a programming language that way, and things like > PROC > are really just substitutes for proper language features (and could even be > as > assembly rather than cpp macros.) This is a totally different thing from language keywords which needs to be short and concise for obvious reasons. Keywords of languages get nested and are used all the time, and everyone needs to know them and they need to stay out of the way. The symbol start/end macros we are talking about here are _MUCH_ less common, and they are only ever used in a single nesting level: SYM__FUNC_START(some_kernel_asm_function) ... SYM__FUNC_END(some_kernel_asm_function) Most kernel developers writing new assembly code rarely know these constructs by heart, they just look them up and carbon copy existing practices. And guess what, the 'looking them up' gets harder if the macro naming scheme is an idosyncratic leftover from long ago. Kernel developers _reading_ assembly code will know the exact purpose of the macros even less, especially if they are named in an ambiguous, illogical fashion. Furthermore, your suggestion of: > PROC..ENDPROC, LOCALPROC..ENDPROC and DATA..ENDDATA. Clear, unambiguous and > balanced. Are neither clear, not unambiguous nor balanced! I mean, they are the _exact_ opposite: - 'PROC' is actually ambiguous in the kernel source code context, as it clashes with common abbreviations of 'procfs' and 'process'. It's also an unnecessary abbreviation of a word ('procedure') that is not actually used a _single time_ in the C ISO/IEC 9899:TC2 standard - in all half thousand+ pages of it. (!) Why the hell does this have to be used in the kernel? - It's visually and semantically imbalanced, because some terms have an 'END' prefix, but there's no matching 'START' or 'BEGIN' prefix for their counterparts. This makes it easy to commit various symbol definition termination errors, like: PROC(some_kernel_asm_function) ... Here it's not obvious that it needs an ENDPROC. While if it's written as: SYM__FUNC_START(some_kernel_asm_function) ... ... it's pretty obvious at first sight that an '_END' is missing! - What you suggest also has senselessly missing underscores, which makes it _more_ cluttered and less clear. We clearly don't have addtowaitqueue() and removefromwaitqueue() function names in the kernel, right? Why should we have 'ENDPROC' and 'ENDDATA' macro names? - Hierarchical naming schemes generally tend to put the more generic category name first, not last. So it's: mutex_init() mutex_lock() mutex_unlock() mutex_trylock() It's _NOT_ the other way around: init_mutex() lock_mutex() unlock_mutex() trylock_mutex() The prefix naming scheme is easier to read both visually/typographically (because it aligns vertically in a natural fashion so it's easier to pattern match), and also semantically: because when reading it it's easy to skip the line once your brain reads the generic category of 'mutex'. But with 'ENDPROC' my brain both has to needlessly perform the following steps: - disambiguate the 'END' and the 'PROC' - fill in the missing underscore - and finally when arriving at the generic term 'PROC', discard it as uninteresting - S
Re: [Xen-devel] [PATCH 01/10] x86: assembly, ENTRY for fn, GLOBAL for data
* h...@zytor.com <h...@zytor.com> wrote: > On March 1, 2017 2:27:54 AM PST, Ingo Molnar <mi...@kernel.org> wrote: > > > >* Thomas Gleixner <t...@linutronix.de> wrote: > > > >> On Wed, 1 Mar 2017, Ingo Molnar wrote: > >> > > >> > * Jiri Slaby <jsl...@suse.cz> wrote: > >> > > >> > > This is a start of series to unify use of ENTRY, ENDPROC, GLOBAL, > >END, > >> > > and other macros across x86. When we have all this sorted out, > >this will > >> > > help to inject DWARF unwinding info by objtool later. > >> > > > >> > > So, let us use the macros this way: > >> > > * ENTRY -- start of a global function > >> > > * ENDPROC -- end of a local/global function > >> > > * GLOBAL -- start of a globally visible data symbol > >> > > * END -- end of local/global data symbol > >> > > >> > So how about using macro names that actually show the purpose, > >instead of > >> > importing all the crappy, historic, essentially randomly chosen > >debug symbol macro > >> > names from the binutils and older kernels? > >> > > >> > Something sane, like: > >> > > >> > SYM__FUNCTION_START > >> > >> Sane would be: > >> > >>SYM_FUNCTION_START > >> > >> The double underscore is just not giving any value. > > > >So the double underscore (at least in my view) has two advantages: > > > >1) it helps separate the prefix from the postfix. > > > >I.e. it's a 'symbols' namespace, and a 'function start', not the > >'start' of a > >'symbol function'. > > > >2) It also helps easy greppability. > > > >Try this in latest -tip: > > > > git grep e820__ > > > >To see all the E820 API calls - with no false positives! > > > >'git grep e820_' on the other hand is a lot less reliable... > > > >But no strong feelings either way, I just try to sneak in these small > >namespace > >structure tricks when nobody's looking! ;-) > > > >Thanks, > > > > Ingo > > This seems needlessly verbose to me and clutters the code. > > How about: > > PROC..ENDPROC, LOCALPROC..ENDPROC and DATA..ENDDATA. Clear, unambiguous and > balanced. I'm sorry, but that naming scheme is disgusing, it reminds me of BASIC nomenclature ... did we run out of underscores or what? Nor would clearly structured names clutter anything, this isn't going to be used deep inside nested code, it's going to be the single level syntactic term in addition to the symbol name itself: SYM__FUNCTION_START(some_kernel_asm_function) ... SYM__FUNCTION_END(some_kernel_asm_function) We could shorten it to 'FUNC' if length is really an issue: SYM__FUNC_START(some_kernel_asm_function) ... SYM__FUNC_END(some_kernel_asm_function) Also, 'PROC', presumably standing for 'procedure', is both ambiguous and a misnomer: - it's ambiguous with commonly used abbreviations of procfs and process - C functions are not actually 'procedures'. Procedures in PASCAL style languages denote functions that don't return any values. Most of the kernel asm functions actually do. I realize that even in C we sometimes talk about 'procedures' out of hysterical inertia, but if you check the C standards, most of them don't even use the term 'procedure'. 'function' on the other hand is totally unambiguous. Plus the lack of START/STOP (or BEGIN/END) makes it easy to commit the mistake of forgetting to add the end marker, and your naming scheme preserves that! So if we fix/extend these macros then _PLEASE_ we need to get this stupid, illogical, nonsensical, external tooling inherited naming craziness fixed first, not doubled down on... Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 01/10] x86: assembly, ENTRY for fn, GLOBAL for data
* Thomas Gleixner <t...@linutronix.de> wrote: > On Wed, 1 Mar 2017, Ingo Molnar wrote: > > > > * Jiri Slaby <jsl...@suse.cz> wrote: > > > > > This is a start of series to unify use of ENTRY, ENDPROC, GLOBAL, END, > > > and other macros across x86. When we have all this sorted out, this will > > > help to inject DWARF unwinding info by objtool later. > > > > > > So, let us use the macros this way: > > > * ENTRY -- start of a global function > > > * ENDPROC -- end of a local/global function > > > * GLOBAL -- start of a globally visible data symbol > > > * END -- end of local/global data symbol > > > > So how about using macro names that actually show the purpose, instead of > > importing all the crappy, historic, essentially randomly chosen debug > > symbol macro > > names from the binutils and older kernels? > > > > Something sane, like: > > > > SYM__FUNCTION_START > > Sane would be: > > SYM_FUNCTION_START > > The double underscore is just not giving any value. So the double underscore (at least in my view) has two advantages: 1) it helps separate the prefix from the postfix. I.e. it's a 'symbols' namespace, and a 'function start', not the 'start' of a 'symbol function'. 2) It also helps easy greppability. Try this in latest -tip: git grep e820__ To see all the E820 API calls - with no false positives! 'git grep e820_' on the other hand is a lot less reliable... But no strong feelings either way, I just try to sneak in these small namespace structure tricks when nobody's looking! ;-) Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 01/10] x86: assembly, ENTRY for fn, GLOBAL for data
* Jiri Slabywrote: > This is a start of series to unify use of ENTRY, ENDPROC, GLOBAL, END, > and other macros across x86. When we have all this sorted out, this will > help to inject DWARF unwinding info by objtool later. > > So, let us use the macros this way: > * ENTRY -- start of a global function > * ENDPROC -- end of a local/global function > * GLOBAL -- start of a globally visible data symbol > * END -- end of local/global data symbol So how about using macro names that actually show the purpose, instead of importing all the crappy, historic, essentially randomly chosen debug symbol macro names from the binutils and older kernels? Something sane, like: SYM__FUNCTION_START SYM__FUNCTION_END SYM__DATA_START SYM__DATA_END ... and extend that macro namespace with any other variants we might need. We can still keep the old macro names (for a short while) to ease the transition, but for heaven's sake, if we do "cleanups" before complicating the code let's make sure the result is actually readable! Agreed? Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v3 4/4] KVM: VMX: Simplify segment_base
* Paolo Bonziniwrote: > > Paolo, how stable, non-rebasing are the KVM tree commits? > > Whatever ends in linux-next is stable. I have a separate rebasing branch, > but it's not part of linux-next by design. Ok, that's nice! > > Or should we keep Andy's KVM patches together with the GDT patches? Either > > workflow works for me - it's your call as these are predominantly KVM > > changes. > > I'll delay my pull request to Linus a couple days so that I can test Andy's 6 > patches. Then you can just base your branch on Linus's tree. Fantastic, thank you! Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v3 4/4] KVM: VMX: Simplify segment_base
* Thomas Garnierwrote: > > Okay, I guess I will have to wait for it to be integrated to > > linux-next then. Or would you rather to it after this patch set is > > added? > > Read your summary for the patchset of KVM cleanup, I will wait for it to > reach > linux-next to rebase and send the new iteration. Note that to be able to apply your readonly-GDT series to the x86 tree I'll need a stable SHA1 to base it on. Paolo, how stable, non-rebasing are the KVM tree commits? Once Andy's patches hit the KVM tree I could pull from you and order it so that I'd send it to Linus only after you sent your bits upstream. That would save us 3 months of patch propagation latency. Or should we keep Andy's KVM patches together with the GDT patches? Either workflow works for me - it's your call as these are predominantly KVM changes. Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 3/3] x86: Make the GDT remapping read-only on 64 bit
* Andy Lutomirski <l...@kernel.org> wrote: > On Wed, Feb 1, 2017 at 1:15 AM, Ingo Molnar <mi...@kernel.org> wrote: > > > > * Thomas Garnier <thgar...@google.com> wrote: > > > >> This patch makes the GDT remapped pages read-only to prevent corruption. > >> This change is done only on 64 bit. > > > > > >> > >> - table_base = gdt->address; > >> + table_base = (unsigned long)get_current_direct_gdt(); > > > > Instead of spreading these type casts far and wide please introduce another > > accessor the returns 'unsigned long': > > > > get_cpu_gdt_rw_vaddr() > > > > That whole function is an abomination. How about replacing 'unsigned > long table_base' with 'struct desc_struct *table'? If you're feeling > really adventurous, *delete* that function and replace all of its > users with something sane. Yeah, even better! Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 3/3] x86: Make the GDT remapping read-only on 64 bit
* Thomas Garnierwrote: > This patch makes the GDT remapped pages read-only to prevent corruption. > This change is done only on 64 bit. Please spell '64-bit' consistently through the series. I've seen two variants: 64 bit 64bit > +/* > + * The LTR instruction marks the TSS GDT entry as busy. In 64bit, the GDT is > + * a read-only remapping. To prevent a page fault, the GDT is switched to the > + * original writeable version when needed. s/In 64bit, /On 64-bit kernels, > + */ > +#ifdef CONFIG_X86_64 > +static inline void native_load_tr_desc(void) > +{ > + struct desc_ptr gdt; > + int cpu = raw_smp_processor_id(); > + bool restore = false; > + struct desc_struct *fixmap_gdt; > + > + native_store_gdt(); > + fixmap_gdt = get_cpu_fixmap_gdt(cpu); > + > + /* > + * If the current GDT is the read-only fixmap, swap to the original > + * writeable version. Swap back at the end. > + */ > + if (gdt.address == (unsigned long)fixmap_gdt) { > + load_direct_gdt(cpu); > + restore = true; > + } > + asm volatile("ltr %w0"::"q" (GDT_ENTRY_TSS*8)); > + if (restore) > + load_fixmap_gdt(cpu); Please use bool plus 0/1, it's more readable (to me) than the true/false notation. > extern void switch_to_new_gdt(int); > +extern void load_direct_gdt(int); > extern void load_fixmap_gdt(int); > +/* Load the original GDT from the per-cpu structure */ > +void load_direct_gdt(int cpu) > +{ > + struct desc_ptr gdt_descr; > + > + gdt_descr.address = (long)get_cpu_direct_gdt(cpu); Please name the functions in an easier to understand way, such as: get_cpu_gdt_rw() get_cpu_gdt_ro() that the GDT is in the direct mappings is less important than the fact the the address is writable ... > +} > +EXPORT_SYMBOL(load_direct_gdt); EXPORT_SYMBOL_GPL(), or no export at all. > +EXPORT_SYMBOL(load_fixmap_gdt); ditto. >* VT restores TR but not its size. Useless. >*/ > - struct desc_ptr *gdt = this_cpu_ptr(_gdt); > struct desc_struct *descs; > > - descs = (void *)gdt->address; > + descs = (void *)get_current_direct_gdt(); Couldn't the type cast be dropped? > > - table_base = gdt->address; > + table_base = (unsigned long)get_current_direct_gdt(); Instead of spreading these type casts far and wide please introduce another accessor the returns 'unsigned long': get_cpu_gdt_rw_vaddr() or such. Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] x86/head: Refactor 32-bit pgtable setup
* Boris Ostrovsky <boris.ostrov...@oracle.com> wrote: > The new Xen PVH entry point requires page tables to be setup by the > kernel since it is entered with paging disabled. > > Pull the common code out of head_32.S so that mk_early_pgtbl_32 can be > invoked from both the new Xen entry point and the existing startup_32 > code. > > Convert resulting common code to C. > > Signed-off-by: Boris Ostrovsky <boris.ostrov...@oracle.com> > --- > This is replacement for https://lkml.org/lkml/2016/10/14/434, with > assembly code re-written in C as requested by Ingo. > > > arch/x86/include/asm/pgtable_32.h | 32 ++ > arch/x86/kernel/head32.c | 62 +++ > arch/x86/kernel/head_32.S | 122 > +++--- > 3 files changed, 101 insertions(+), 115 deletions(-) Yeah, so (belatedly) I tried to merge this to latest upstream, via the commit below (note the slight edits to the changelog) - but 32-bit defconfig fails to build: arch/x86/kernel/head_32.S:615: Error: can't resolve `init_thread_union' {*UND* section} - `SIZEOF_PTREGS' {*UND* section} Thanks, Ingo ===> From 071878c634fda7d9be3f015f05a89f5468e7e2e4 Mon Sep 17 00:00:00 2001 From: Boris Ostrovsky <boris.ostrov...@oracle.com> Date: Thu, 8 Dec 2016 11:44:31 -0500 Subject: [PATCH] x86/boot/32: Convert the 32-bit pgtable setup code from assembly to C The new Xen PVH entry point requires page tables to be set up by the kernel since it is entered with paging disabled. Pull the common code out of head_32.S so that mk_early_pgtbl_32() can be invoked from both the new Xen entry point and the existing startup_32() code. Convert resulting common code to C. Signed-off-by: Boris Ostrovsky <boris.ostrov...@oracle.com> Cc: Andy Lutomirski <l...@kernel.org> Cc: Borislav Petkov <b...@alien8.de> Cc: Brian Gerst <brge...@gmail.com> Cc: Denys Vlasenko <dvlas...@redhat.com> Cc: H. Peter Anvin <h...@zytor.com> Cc: Josh Poimboeuf <jpoim...@redhat.com> Cc: Linus Torvalds <torva...@linux-foundation.org> Cc: Peter Zijlstra <pet...@infradead.org> Cc: Thomas Gleixner <t...@linutronix.de> Cc: m...@codeblueprint.co.uk Cc: xen-de...@lists.xenproject.org Link: http://lkml.kernel.org/r/1481215471-9639-1-git-send-email-boris.ostrov...@oracle.com Signed-off-by: Ingo Molnar <mi...@kernel.org> --- arch/x86/include/asm/pgtable_32.h | 32 ++ arch/x86/kernel/head32.c | 62 +++ arch/x86/kernel/head_32.S | 122 +++--- 3 files changed, 101 insertions(+), 115 deletions(-) diff --git a/arch/x86/include/asm/pgtable_32.h b/arch/x86/include/asm/pgtable_32.h index b6c0b404898a..fbc73360aea0 100644 --- a/arch/x86/include/asm/pgtable_32.h +++ b/arch/x86/include/asm/pgtable_32.h @@ -27,6 +27,7 @@ struct vm_area_struct; extern pgd_t swapper_pg_dir[1024]; extern pgd_t initial_page_table[1024]; +extern pmd_t initial_pg_pmd[]; static inline void pgtable_cache_init(void) { } static inline void check_pgt_cache(void) { } @@ -75,4 +76,35 @@ do { \ #define kern_addr_valid(kaddr) (0) #endif +/* + * This is how much memory in addition to the memory covered up to + * and including _end we need mapped initially. + * We need: + * (KERNEL_IMAGE_SIZE/4096) / 1024 pages (worst case, non PAE) + * (KERNEL_IMAGE_SIZE/4096) / 512 + 4 pages (worst case for PAE) + * + * Modulo rounding, each megabyte assigned here requires a kilobyte of + * memory, which is currently unreclaimed. + * + * This should be a multiple of a page. + * + * KERNEL_IMAGE_SIZE should be greater than pa(_end) + * and small than max_low_pfn, otherwise will waste some page table entries + */ +#if PTRS_PER_PMD > 1 +#define PAGE_TABLE_SIZE(pages) (((pages) / PTRS_PER_PMD) + PTRS_PER_PGD) +#else +#define PAGE_TABLE_SIZE(pages) ((pages) / PTRS_PER_PGD) +#endif + +/* + * Number of possible pages in the lowmem region. + * + * We shift 2 by 31 instead of 1 by 32 to the left in order to avoid a + * gas warning about overflowing shift count when gas has been compiled + * with only a host target support using a 32-bit type for internal + * representation. + */ +#define LOWMEM_PAGES 2<<31) - __PAGE_OFFSET) >> PAGE_SHIFT)) + #endif /* _ASM_X86_PGTABLE_32_H */ diff --git a/arch/x86/kernel/head32.c b/arch/x86/kernel/head32.c index f16c55bfc090..e5fb436a6548 100644 --- a/arch/x86/kernel/head32.c +++ b/arch/x86/kernel/head32.c @@ -49,3 +49,65 @@ asmlinkage __visible void __init i386_start_kernel(void) start_kernel(); } + +/* + * Initialize page tables. This creates a PDE and a set of page + * tables, which are located immediately beyond __brk_base. The variable + * _brk_end is set up to point to the first "safe" location. + * Mappings are created
Re: [Xen-devel] [PATCH] x86/head: Refactor 32-bit pgtable setup
* Boris Ostrovsky <boris.ostrov...@oracle.com> wrote: > On 12/08/2016 11:33 PM, Ingo Molnar wrote: > > * Boris Ostrovsky <boris.ostrov...@oracle.com> wrote: > > > >> The new Xen PVH entry point requires page tables to be setup by the > >> kernel since it is entered with paging disabled. > >> > >> Pull the common code out of head_32.S so that mk_early_pgtbl_32 can be > >> invoked from both the new Xen entry point and the existing startup_32 > >> code. > >> > >> Convert resulting common code to C. > >> > >> Signed-off-by: Boris Ostrovsky <boris.ostrov...@oracle.com> > >> --- > >> This is replacement for https://lkml.org/lkml/2016/10/14/434, with > >> assembly code re-written in C as requested by Ingo. > >> > >> > >> arch/x86/include/asm/pgtable_32.h | 32 ++ > >> arch/x86/kernel/head32.c | 62 +++ > >> arch/x86/kernel/head_32.S | 122 > >> +++--- > >> 3 files changed, 101 insertions(+), 115 deletions(-) > > Whee, I love it! And the code is so much more readable! > > > > Did you have any particular robustness problems (difficult to resolve > > crashes) > > while developing it, or was it reasonably straightforward to do? > > There was nothing particularly difficult beyond understanding current > code. That, of course, is not to say that there were no crashes but > developing this on a guest gives you pretty good insight into why/where > you crashed. > > This was tested on bare-metal (in case you are wondering), but obviously > more testing is always good. Ok, cool! Would you like to carry this with your other Xen dependencies? If yes: Acked-by: Ingo Molnar <mi...@kernel.org> If not then I can pick it up and get it to Linus in v4.10. Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] x86/head: Refactor 32-bit pgtable setup
* Boris Ostrovskywrote: > The new Xen PVH entry point requires page tables to be setup by the > kernel since it is entered with paging disabled. > > Pull the common code out of head_32.S so that mk_early_pgtbl_32 can be > invoked from both the new Xen entry point and the existing startup_32 > code. > > Convert resulting common code to C. > > Signed-off-by: Boris Ostrovsky > --- > This is replacement for https://lkml.org/lkml/2016/10/14/434, with > assembly code re-written in C as requested by Ingo. > > > arch/x86/include/asm/pgtable_32.h | 32 ++ > arch/x86/kernel/head32.c | 62 +++ > arch/x86/kernel/head_32.S | 122 > +++--- > 3 files changed, 101 insertions(+), 115 deletions(-) Whee, I love it! And the code is so much more readable! Did you have any particular robustness problems (difficult to resolve crashes) while developing it, or was it reasonably straightforward to do? Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [RFC PATCH v3] xen/x86: Increase xen_e820_map to E820_X_MAX possible entries
* Juergen Gross <jgr...@suse.com> wrote: > On 05/12/16 18:49, Alex Thorlton wrote: > > This is the third pass at my patchset to fix up our problems with > > XENMEM_machine_memory_map on large systems. The only changes on this > > pass were to flesh out the comment above the E820_X_MAX definition, and > > to add Juergen's Reviewed-by to the second patch. > > > > Let me know if anyone has any questions/comments! > > > > Alex Thorlton (2): > > x86: Make E820_X_MAX unconditionally larger than E820MAX > > xen/x86: Increase xen_e820_map to E820_X_MAX possible entries > > > > arch/x86/include/asm/e820.h | 12 > > arch/x86/xen/setup.c| 6 +++--- > > 2 files changed, 11 insertions(+), 7 deletions(-) > > > > Ingo, do you have any preferences through which tree those patches > should go? I'd like to have at least patch 2 in 4.10, so I could take > it through the Xen tree. Sure, both are fine to me: Acked-by: Ingo Molnar <mi...@kernel.org> Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 2/8] x86/head: Refactor 32-bit pgtable setup
* Boris Ostrovsky <boris.ostrov...@oracle.com> wrote: > > It is tricky to do so safely, because at this stage almost nothing of the C > > execution environment has been set up. Yeah - but we do have a fair amount of early C code though. > I can still give it a try but I'd rather not tie it to this (Xen PVH) patch > series. Which would leave me with two options: either keep what this patch > does, > leaving it as assembly (requires your ack), or have Xen code build the pages > on > its own. If you give it a try in a subsequent patch (please Cc: me) then it's OK to me: Acked-by: Ingo Molnar <mi...@kernel.org> Feel free to carry it in the Xen tree. Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 2/8] x86/head: Refactor 32-bit pgtable setup
* Boris Ostrovsky <boris.ostrov...@oracle.com> wrote: > On 12/02/2016 04:45 AM, Ingo Molnar wrote: > > * Boris Ostrovsky <boris.ostrov...@oracle.com> wrote: > > > >> On 10/31/2016 08:33 AM, Boris Ostrovsky wrote: > >>> > >>> On 10/14/2016 02:05 PM, Boris Ostrovsky wrote: > >>>> From: Matt Fleming <m...@codeblueprint.co.uk> > >>>> > >>>> The new Xen PVH entry point requires page tables to be setup by the > >>>> kernel since it is entered with paging disabled. > >>>> > >>>> Pull the common code out of head_32.S and into pgtable_32.S so that > >>>> setup_pgtable_32 can be invoked from both the new Xen entry point and > >>>> the existing startup_32 code. > >>> > >>> Ping to x86 maintainers. > >> Pinging again. > >> > >> I will be re-sending this series at some point (it has been delayed by > >> some hypervisor changes that will be needed) but I'd like to hear from > >> x86 maintainers whether this will be acceptable before I post this again. > > Could this be done in C? > > I suppose it could be, I haven't thought about it. > > The goal here was to simply make existing startup code available to others > (Xen > guest) without changes. Are you suggesting to build page tables in C for the > Xen > guest only or to make startup_32 call new C code as well? My suggestion would be to transform the factored out assembly code to C. Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 2/8] x86/head: Refactor 32-bit pgtable setup
* Boris Ostrovskywrote: > On 10/31/2016 08:33 AM, Boris Ostrovsky wrote: > > > > > > On 10/14/2016 02:05 PM, Boris Ostrovsky wrote: > >> From: Matt Fleming > >> > >> The new Xen PVH entry point requires page tables to be setup by the > >> kernel since it is entered with paging disabled. > >> > >> Pull the common code out of head_32.S and into pgtable_32.S so that > >> setup_pgtable_32 can be invoked from both the new Xen entry point and > >> the existing startup_32 code. > > > > > > Ping to x86 maintainers. > > Pinging again. > > I will be re-sending this series at some point (it has been delayed by > some hypervisor changes that will be needed) but I'd like to hear from > x86 maintainers whether this will be acceptable before I post this again. Could this be done in C? Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 1/2] x86: Make E820_X_MAX unconditionally larger than E820MAX
* Alex Thorlton <athorl...@sgi.com> wrote: > It's really not necessary to limit E820_X_MAX to 128 in the non-EFI > case. This commit drops E820_X_MAX's dependency on CONFIG_EFI, so that > E820_X_MAX is always at least slightly larger than E820MAX. > > The real motivation behind this is actually to prevent some issues in > the Xen kernel, where the XENMEM_machine_memory_map hypercall can > produce an e820 map larger than 128 entries, even on systems where the > original e820 table was quite a bit smaller than that, depending on how > many IOAPICs are installed on the system. > > Signed-off-by: Alex Thorlton <athorl...@sgi.com> > Suggested-by: Ingo Molnar <mi...@redhat.com> > Cc: Russ Anderson <r...@sgi.com> > Cc: David Vrabel <david.vra...@citrix.com> > Cc: Ingo Molnar <mi...@redhat.com> > Cc: Juergen Gross <jgr...@suse.com> > Cc: Thomas Gleixner <t...@linutronix.de> > Cc: "H. Peter Anvin" <h...@zytor.com> > Cc: Denys Vlasenko <dvlas...@redhat.com> > Cc: Boris Ostrovsky <boris.ostrov...@oracle.com> > Cc: x...@kernel.org > Cc: xen-de...@lists.xenproject.org > --- > arch/x86/include/asm/e820.h | 10 ++ > 1 file changed, 6 insertions(+), 4 deletions(-) > > diff --git a/arch/x86/include/asm/e820.h b/arch/x86/include/asm/e820.h > index 476b574..aa00d33 100644 > --- a/arch/x86/include/asm/e820.h > +++ b/arch/x86/include/asm/e820.h > @@ -1,13 +1,15 @@ > #ifndef _ASM_X86_E820_H > #define _ASM_X86_E820_H > > -#ifdef CONFIG_EFI > +/* > + * We need to make sure that E820_X_MAX is defined > + * before we include uapi/asm/e820.h > + */ > #include > #define E820_X_MAX (E820MAX + 3 * MAX_NUMNODES) What we need an explanation for in the comment is what does this stand for (what does the 'X' mean?), and what is the magic 3*MAX_NUMNODES about? Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] xen/x86: Increase xen_e820_map to E820_X_MAX possible entries
* Alex Thorltonwrote: > On systems with sufficiently large e820 tables, and several IOAPICs, it > is possible for the XENMEM_machine_memory_map callback (and its > counterpart, XENMEM_memory_map) to attempt to return an e820 table with > more than 128 entries. This callback adds entries to the BIOS-provided > e820 table to account for IOAPIC registers, which, on sufficiently large > systems, can result in an e820 table that is too large to copy back into > xen_e820_map. > > This change simply increases the size of xen_e820_map to E820_X_MAX to > ensure that there is enough room to store the entire e820 map returned > from this callback. This means: #ifdef CONFIG_EFI #include #define E820_X_MAX (E820MAX + 3 * MAX_NUMNODES) #else /* ! CONFIG_EFI */ #define E820_X_MAX E820MAX #endif It's a bit weird to key it off of CONFIG_EFI - why isn't it unconditional? But I have no objections to the principle if it fixes a crash. Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 0/8] x86: audit and remove needless module.h includes
* Paul Gortmaker <paul.gortma...@windriver.com> wrote: > [Re: [PATCH 0/8] x86: audit and remove needless module.h includes] On > 14/07/2016 (Thu 20:39) Ingo Molnar wrote: > > > > > * Paul Gortmaker <paul.gortma...@windriver.com> wrote: > > > > > > I'll continue testing with the setup_percpu.c change left out. > > > > > > Let me know if you want a resend or if you want to just add the > > > asm/desc.h locally or ... > > > > So I tried the asm/desc.h but saw other build failures - for now gave up. > > Can we do this in a separate patch? > > Yes of course. In the meantime I'll investigate further and add more > configs to my testing. Is there a list of the configs used for sanity > testing tip somewhere? No need for you to complicate your testing (unless you have enough hardware resources for that) - I can deal with the occasional build failure. Since I eventually create and boot randconfigs there's no fixed set of configs. :-/ That I enabled SMP on the "allnoconfig" (and forgot about it!) has to do with the fact that even my allnoconfigs are typically bootable (on one of my test systems) so there's a handful of pre-baked configs that always get enabled. But other than those small perturbations I do similar testing to what you did: all{no|def|mod|yes}config on x86-{32|64}, plus randconfigs. Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 0/8] x86: audit and remove needless module.h includes
* Paul Gortmakerwrote: > > I'll continue testing with the setup_percpu.c change left out. > > Let me know if you want a resend or if you want to just add the > asm/desc.h locally or ... So I tried the asm/desc.h but saw other build failures - for now gave up. Can we do this in a separate patch? Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 0/8] x86: audit and remove needless module.h includes
* Paul Gortmakerwrote: > To that end, I have done allmodconfig, allyesconfig and allnoconfig > for both 32 bit and 64 bit x86 with these changes on the linux-next > from today, which presumably has an up to date copy of tip in it. It does, still I get this on allnoconfig with your patches applied: arch/x86/kernel/setup_percpu.c: In function ‘setup_percpu_segment’: arch/x86/kernel/setup_percpu.c:159:2: error: implicit declaration of function ‘pack_descriptor’ [-Werror=implicit-function-declaration] pack_descriptor(, per_cpu_offset(cpu), 0xF, ^ arch/x86/kernel/setup_percpu.c:162:2: error: implicit declaration of function ‘write_gdt_entry’ [-Werror=implicit-function-declaration] write_gdt_entry(get_cpu_gdt_table(cpu), ^ arch/x86/kernel/setup_percpu.c:162:18: error: implicit declaration of function ‘get_cpu_gdt_table’ [-Werror=implicit-function-declaration] write_gdt_entry(get_cpu_gdt_table(cpu), I'll continue testing with the setup_percpu.c change left out. Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 1/7] x86/xen: Simplify set_aliased_prot
* Andy Lutomirskiwrote: > On Wed, May 25, 2016 at 2:50 AM, David Vrabel wrote: > > On 24/05/16 23:48, Andy Lutomirski wrote: > >> In aa1acff356bb ("x86/xen: Probe target addresses in > >> set_aliased_prot() before the hypercall"), I added an explicit probe > >> to work around a hypercall issue. The code can be simplified by > >> using probe_kernel_read. > > > > Acked-by: David Vrabel > > Ingo, can you apply this one patch directly to x86/asm? The rest of > the series is stalled pending my fixing Borislav's review comments > and, more importantly, fixing the bugs that testing it has shaken > loose. This patch is a nice cleanup all by itself, though. Ok, agreed, done. Note that I simplified it some more: - probe_kernel_read(, (unsigned char *)v, 1); + probe_kernel_read(, v, 1); ... because 'v' already has a 'void *' natural type. Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] x86 PAT fixes for stable
* Kani, Toshimitsu <toshi.k...@hpe.com> wrote: > Hello, > > The following patches fix two x86 PAT issues reported in 4.4 [1][2]. Can > you please backport them up to 4.4? Do they apply, build and boot cleanly in that order on top of v4.4, v4.5 and v4.6? If yes then: Acked-by: Ingo Molnar <mi...@kernel.org> Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] x86, locking: Remove ticket (spin)lock implementation
* Peter Zijlstrawrote: > My motivation was that people keep trying to send patches against the ticket > lock code... David did just today, and he's not the first. Yeah, let's just remove dead code ASAP. It's not like it's hard to add (back) new code, should the need arise. Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [GIT PULL] EFI ARM Xen support
* Stefano Stabelliniwrote: > On Tue, 17 May 2016, David Vrabel wrote: > > On 17/05/16 11:12, Stefano Stabellini wrote: > > > On Sat, 14 May 2016, Matt Fleming wrote: > > >> Folks, > > >> > > >> Please pull the following branch which contains support for Xen on ARM > > >> EFI platforms. > > >> > > >> This merge includes a merge of Stefano's xen/linux-next branch to pull > > >> in the prerequisites required for Shannon's commit: > > >> > > >> 11ee5491e5ff ("Xen: EFI: Parse DT parameters for Xen specific UEFI") > > >> > > >> as it needs both the latest changes in the EFI 'next' branch (now > > >> tip/efi/core) and xen/linux-next. > > >> > > >> I have intentionally not included the individual patches as I would > > >> normally do in a pull request, so that commit history created by my > > >> merging of Stefano's branch is preserved, and so that there's no way > > >> to accidentally apply the patches individually. > > >> > > >> The following changes since commit > > >> 6c5450ef66816216e574885cf8d3ddb31ef77428: > > >> > > >> efivarfs: Make efivarfs_file_ioctl() static (2016-05-07 07:06:13 +0200) > > >> > > >> are available in the git repository at: > > >> > > >> git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi.git > > >> efi/arm-xen > > >> > > >> for you to fetch changes up to 11ee5491e5ff519e0d3a7018eb21351cb6955a98: > > >> > > >> Xen: EFI: Parse DT parameters for Xen specific UEFI (2016-05-14 > > >> 19:31:01 +0100) > > > > > > Is this arrangement working for everybody, in particular the tip > > > maintainers? > > > > Yes. > > I meant the x86 tip maintainers (Thomas, Peter and Ingo). I have no particular objections, since this seems to be Xen-next merged to the EFI tree that is already upstream, plus this single commit on top of t: 11ee5491e5ff Xen: EFI: Parse DT parameters for Xen specific UEFI Which, if Matt is fine with it, you could send to Linus too as part of this cycle's Xen pull request. Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] efi_enabled(EFI_PARAVIRT) use
* Stefano Stabellini <sstabell...@kernel.org> wrote: > On Fri, 29 Apr 2016, Ingo Molnar wrote: > > Also, it would be nice to have all things EFI in a single tree, the > > conflicts are > > going to be painful! There's very little reason not to carry this kind of > > commit: > > > > arch/arm/xen/enlighten.c | 6 + > > drivers/firmware/efi/arm-runtime.c | 17 +- > > drivers/firmware/efi/efi.c | 45 > > -- > > 3 files changed, 56 insertions(+), 12 deletions(-) > > > > in the EFI tree. > > That's true. I'll drop this commit from xentip and let Matt pick it up > or request changes as he sees fit. Thanks! Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] efi_enabled(EFI_PARAVIRT) use
* Stephen Rothwell <s...@canb.auug.org.au> wrote: > Hi all, > > Today's linux-next merge of the xen-tip tree got a conflict in: > > drivers/firmware/efi/arm-runtime.c > > between commit: > > 14c43be60166 ("efi/arm*: Drop writable mapping of the UEFI System table") > > from the tip tree and commit: > > 21c8dfaa2327 ("Xen: EFI: Parse DT parameters for Xen specific UEFI") > > from the xen-tip tree. (I've attached 21c8dfaa2327 below, for reference.) Argh: With considerable pain we just got rid of paravirt_enabled() in the x86 tree, and Xen is now reintroducing it in the EFI code. Please don't: if you have to do capability flags then name the flag accordingly to what it does, don't use some generic catch-all naming that will inevitably cause the kind of problems paravirt_enabled() caused... So: NAKed-by: Ingo Molnar <mi...@kernel.org> Also, it would be nice to have all things EFI in a single tree, the conflicts are going to be painful! There's very little reason not to carry this kind of commit: arch/arm/xen/enlighten.c | 6 + drivers/firmware/efi/arm-runtime.c | 17 +- drivers/firmware/efi/efi.c | 45 -- 3 files changed, 56 insertions(+), 12 deletions(-) in the EFI tree. Thanks, Ingo ===> From 21c8dfaa23276be2ae6d580331d8d252cc41e8d9 Mon Sep 17 00:00:00 2001 From: Shannon Zhao <shannon.z...@linaro.org> Date: Thu, 7 Apr 2016 20:03:34 +0800 Subject: [PATCH] Xen: EFI: Parse DT parameters for Xen specific UEFI Add a new function to parse DT parameters for Xen specific UEFI just like the way for normal UEFI. Then it could reuse the existing codes. If Xen supports EFI, initialize runtime services. CC: Matt Fleming <m...@codeblueprint.co.uk> Signed-off-by: Shannon Zhao <shannon.z...@linaro.org> Reviewed-by: Matt Fleming <m...@codeblueprint.co.uk> Reviewed-by: Stefano Stabellini <stefano.stabell...@eu.citrix.com> Tested-by: Julien Grall <julien.gr...@arm.com> --- arch/arm/xen/enlighten.c | 6 + drivers/firmware/efi/arm-runtime.c | 17 +- drivers/firmware/efi/efi.c | 45 -- 3 files changed, 56 insertions(+), 12 deletions(-) diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c index 13e3e9f9b094..e130562d3283 100644 --- a/arch/arm/xen/enlighten.c +++ b/arch/arm/xen/enlighten.c @@ -261,6 +261,12 @@ static int __init fdt_find_hyper_node(unsigned long node, const char *uname, !strncmp(hyper_node.prefix, s, strlen(hyper_node.prefix))) hyper_node.version = s + strlen(hyper_node.prefix); + if (IS_ENABLED(CONFIG_XEN_EFI)) { + /* Check if Xen supports EFI */ + if (of_get_flat_dt_subnode_by_name(node, "uefi") > 0) + set_bit(EFI_PARAVIRT, ); + } + return 0; } diff --git a/drivers/firmware/efi/arm-runtime.c b/drivers/firmware/efi/arm-runtime.c index 6ae21e41a429..ac609b9f0b99 100644 --- a/drivers/firmware/efi/arm-runtime.c +++ b/drivers/firmware/efi/arm-runtime.c @@ -27,6 +27,7 @@ #include #include #include +#include extern u64 efi_system_table; @@ -107,13 +108,19 @@ static int __init arm_enable_runtime_services(void) } set_bit(EFI_SYSTEM_TABLES, ); - if (!efi_virtmap_init()) { - pr_err("No UEFI virtual mapping was installed -- runtime services will not be available\n"); - return -ENOMEM; + if (IS_ENABLED(CONFIG_XEN_EFI) && efi_enabled(EFI_PARAVIRT)) { + /* Set up runtime services function pointers for Xen Dom0 */ + xen_efi_runtime_setup(); + } else { + if (!efi_virtmap_init()) { + pr_err("No UEFI virtual mapping was installed -- runtime services will not be available\n"); + return -ENOMEM; + } + + /* Set up runtime services function pointers */ + efi_native_runtime_setup(); } - /* Set up runtime services function pointers */ - efi_native_runtime_setup(); set_bit(EFI_RUNTIME_SERVICES, ); efi.runtime_version = efi.systab->hdr.revision; diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c index 3a69ed5ecfcb..519c096a7c33 100644 --- a/drivers/firmware/efi/efi.c +++ b/drivers/firmware/efi/efi.c @@ -469,12 +469,14 @@ device_initcall(efi_load_efivars); FIELD_SIZEOF(struct efi_fdt_params, field) \ } -static __initdata struct { +struct params { const char name[32]; const char propname[32]; int offset; int size; -} dt_params[] = { +}; + +static struct params fdt_params[] __initdata = { UEFI_PARAM("System Table", "li
Re: [Xen-devel] [PATCH v2] x86/mm/pat: Fix BUG_ON in mmap_mem on QEMU/i386
* Toshi Kani <toshi.k...@hpe.com> wrote: > The following BUG_ON error was reported on QEMU/i386: > > kernel BUG at arch/x86/mm/physaddr.c:79! > Call Trace: > phys_mem_access_prot_allowed > mmap_mem > ? mmap_region > mmap_region > do_mmap > vm_mmap_pgoff > SyS_mmap_pgoff > do_int80_syscall_32 > entry_INT80_32 > > after commit edfe63ec97ed ("x86/mtrr: Fix Xorg crashes in > Qemu sessions"). > > PAT is now set to disabled state when MTRRs are disabled. > Thus, reactivating the __pa(high_memory) check in > phys_mem_access_prot_allowed(). > > When CONFIG_DEBUG_VIRTUAL is set, __pa() calls __phys_addr(), > which in turn calls slow_virt_to_phys() for 'high_memory'. > Because 'high_memory' is set to (the max direct mapped virt > addr + 1), it is not a valid virtual address. Hence, > slow_virt_to_phys() returns 0 and hit the BUG_ON. Using > __pa_nodebug() instead of __pa() will fix this BUG_ON. > > However, this code block, originally written for Pentiums and > earlier, is no longer adequate since a 32-bit Xen guest has > MTRRs disabled and supports ZONE_HIGHMEM. In this setup, > this code sets UC attribute for accessing RAM in high memory > range. > > Delete this code block as it has been unused for a long time. > > Reported-by: kernel test robot <ying.hu...@linux.intel.com> > Link: https://lkml.org/lkml/2016/4/1/608 > Signed-off-by: Toshi Kani <toshi.k...@hpe.com> > Cc: Thomas Gleixner <t...@linutronix.de> > Cc: Ingo Molnar <mi...@kernel.org> > Cc: H. Peter Anvin <h...@zytor.com> > Cc: Borislav Petkov <b...@suse.de> > Cc: David Vrabel <david.vra...@citrix.com> So you missed the Reviewed-by tag from Boris ... I've added it, but please try to propagate tags! Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v5 07/14] tools/lguest: force disable tboot and apm
* Luis R. Rodriguezwrote: > The paravirt_enabled() check is going away, the area tossed to > the kernel on lguest is not zerored out, so ensure lguest force > disables tboot and apm just in case the kernel file being read might > have this set for whatever reason. > > Signed-off-by: Luis R. Rodriguez > --- > tools/lguest/lguest.c | 6 ++ > 1 file changed, 6 insertions(+) Yeah, so the v4 patch was acked by Rusty: Acked-by: Rusty Russell ... but you didn't add his ack to the v5 patch :-( Please don't do this! Also please double check all past replies to make sure no acks got lost. I noticed this one, but there might be more. Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v5 01/14] x86/boot: enumerate documentation for the x86 hardware_subarch
* Luis R. Rodriguezwrote: > Although hardware_subarch has been in place since the x86 boot > protocol 2.07 it hasn't been used much. Enumerate current possible > values to avoid misuses and help with semantics later at boot > time should this be used further. > > These enums should only ever be used by architecture x86 code, > and all that code should be well contained and compartamentalized, > clarify that as well. > > v2: updates documentation further -- be a bit more pedantic about > annotating care and use of these guys. > v3: Use s/SOC/SoC and also anntoate that both domU and dom0 are > both currently supported through the PV boot path. > > Cc: Andy Shevchenko > Signed-off-by: Luis R. Rodriguez > --- > arch/x86/include/uapi/asm/bootparam.h | 37 > ++- > 1 file changed, 36 insertions(+), 1 deletion(-) > > diff --git a/arch/x86/include/uapi/asm/bootparam.h > b/arch/x86/include/uapi/asm/bootparam.h > index 329254373479..bf9fea2f4591 100644 > --- a/arch/x86/include/uapi/asm/bootparam.h > +++ b/arch/x86/include/uapi/asm/bootparam.h > @@ -157,7 +157,42 @@ struct boot_params { > __u8 _pad9[276]; /* 0xeec */ > } __attribute__((packed)); > > -enum { > +/** > + * enum x86_hardware_subarch - x86 hardware subarchitecture Could you add some prominent warning here, like: > + * WARNING: the 'x86 subarch flag' is only used for legacy hacks, for > platform > + * features that are not easily enumerated or discoverable. You > should > + * not ever use this for new features. Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v3 0/7] Enhance PAT init to fix Xorg crashes
* Toshi Kaniwrote: > A Xorg failure on qemu32 was reported as a regression [1] caused by > 'commit 9cd25aac1f44 ("x86/mm/pat: Emulate PAT when it is disabled")'. > This patch-set fixes the regression. > > Negative effects of this regression were two failures [2] in Xorg on > QEMU with QEMU CPU model "qemu32" (-cpu qemu32), which were triggered > by the fact that its virtual CPU does not support MTRR. > #1. copy_process() failed in the check in reserve_pfn_range() > #2. error path in copy_process() then hit WARN_ON_ONCE in > untrack_pfn(). > > These negative effects are caused by two separate bugs, but they can be > addressed in separate patches. This patch-set addresses the root cause, > a long-standing PAT initialization issue. > > Please see the changelog in patch 4/7 for the details of the issue. > > - Patch 1-2 make necessary enhancement to PAT for the fix without > breaking Xen. > - Patch 3 is cleanup. > - Patch 4 fixes the regression. > - Patch 5 fixes an MTRR issue related with PAT init. > - Patch 6 removes PAT init code from Xen. > - Patch 7 adds PAT init to documentation. > > [1]: https://lkml.org/lkml/2016/3/3/828 > [2]: https://lkml.org/lkml/2016/3/4/775 The changelogs are much improved, I've applied these patches to tip:x86/mm, thanks Toshi! > I'd appreciate if someone can test this patch-set on Xen to verify that > there is no change in "x86/PAT: Configuration [0-7] .." message in dmesg. So I don't have a Xen setup, but hopefully such testing will happen once these changes show up in linux-next, tomorrow or so. Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v4 2/5] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops
* Ingo Molnar <mi...@kernel.org> wrote: > * Andy Lutomirski <l...@amacapital.net> wrote: > > > On Mon, Mar 14, 2016 at 11:40 AM, Linus Torvalds > > <torva...@linux-foundation.org> wrote: > > > On Mon, Mar 14, 2016 at 11:24 AM, Andy Lutomirski <l...@amacapital.net> > > > wrote: > > >> > > >> The code in my queue is, literally: > > >> > > >> bool ex_handler_rdmsr_unsafe(const struct exception_table_entry *fixup, > > >> struct pt_regs *regs, int trapnr) > > >> { > > >> WARN_ONCE(1, "unchecked MSR access error: RDMSR from 0x%x", > > >> (unsigned int)regs->cx); > > >> > > >> /* Pretend that the read succeeded and returned 0. */ > > >> regs->ip = ex_fixup_addr(fixup); > > >> regs->ax = 0; > > >> regs->dx = 0; > > >> return true; > > >> } > > >> EXPORT_SYMBOL(ex_handler_rdmsr_unsafe); > > > > > > I guess I can live with this, as long as we also extend the > > > early-fault handling to work with the special exception handlers. > > > > OK, will do. I need to rewrork the early IDT code a bit so it > > generates a real pt_regs layout, but that's arguably a cleanup anyway. > > Ok, with that's I'm pretty happy about it as well. Note, I think it's pretty clear at this point that this is v4.7 material. Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v4 2/5] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops
* Andy Lutomirskiwrote: > On Mon, Mar 14, 2016 at 11:40 AM, Linus Torvalds > wrote: > > On Mon, Mar 14, 2016 at 11:24 AM, Andy Lutomirski > > wrote: > >> > >> The code in my queue is, literally: > >> > >> bool ex_handler_rdmsr_unsafe(const struct exception_table_entry *fixup, > >> struct pt_regs *regs, int trapnr) > >> { > >> WARN_ONCE(1, "unchecked MSR access error: RDMSR from 0x%x", > >> (unsigned int)regs->cx); > >> > >> /* Pretend that the read succeeded and returned 0. */ > >> regs->ip = ex_fixup_addr(fixup); > >> regs->ax = 0; > >> regs->dx = 0; > >> return true; > >> } > >> EXPORT_SYMBOL(ex_handler_rdmsr_unsafe); > > > > I guess I can live with this, as long as we also extend the > > early-fault handling to work with the special exception handlers. > > OK, will do. I need to rewrork the early IDT code a bit so it > generates a real pt_regs layout, but that's arguably a cleanup anyway. Ok, with that's I'm pretty happy about it as well. Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v4 2/5] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops
* Linus Torvaldswrote: > On Mon, Mar 14, 2016 at 10:17 AM, Andy Lutomirski wrote: > > > > So yes, let's please warn. I'm okay with removing the panic_on_oops > > thing though. (But if anyone suggests that we should stop OOPSing on > > bad kernel page faults, I *will* fight back.) > > Bad kernel page faults are something completely different. They would > be actual bugs regardless. > > The MSR thing has *often* been just silly "this CPU doesn't do that > MSR". Generic bootup setup code etc that just didn't know or care > about the particular badly documented rule for one particular random > CPU version and stepping. > > In fact, when I say "often", I suspect I should really just say > "always". I don't think we've ever found a case where oopsing would > have been the right thing. But it has definitely caused lots of > problems, especially in the early paths where your code doesn't even > work right now. > > Now, when it comes to the warning, I guess I could live with it, but I > think it's stupid to make this a low-level exception handler thing. > > So what I think should be done: > > - make sure that wr/rdmsr_safe() actually works during very early > init. At some point it didn't. > > - get rid of the current wrmsr/rdmsr *entirely*. It's shit. > > - Add this wrapper: > > #define wrmsr(msr, low, high) \ > WARN_ON_ONCE(wrmsr_safe(msr, low, high)) > > and be done with it. We could even decide to make that WARN_ON_ONCE() > be something we could configure out, because it's really a debugging > thing and isn't like it should be all that fatal. > > None of this insane complicated crap that buys us exactly *nothing*, > and depends on fancy new exception handling support etc etc. > > So what's the downside to just doing this simple thing? This was my original suggestion as well. The objection was that sometimes it's performance critical. To which I replied that 1) I highly doubt that a single extra branch of the check is measurable 2) and even if so, then _those_ performance critical cases should be done in some special way (rdmsr_nocheck() or whatever) - at which point the argument was that there's a lot of such cases. Which final line of argument I still don't really buy, btw., but it became a me against everyone else argument and I cowardly punted. Now that the 800 lb gorilla is on my side I of course stand my ground, principles are principles! Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops
* Andy Lutomirski <l...@amacapital.net> wrote: > On Thu, Oct 1, 2015 at 12:15 AM, Ingo Molnar <mi...@kernel.org> wrote: > > > > * Andy Lutomirski <l...@amacapital.net> wrote: > > > >> > These could still be open coded in an inlined fashion, like the > >> > scheduler usage. > >> > >> We could have a raw_rdmsr for those. > >> > >> OTOH, I'm still not 100% convinced that this warn-but-don't-die behavior is > >> worth the effort. This isn't a frequent source of bugs to my knowledge, > >> and we > >> don't try to recover from incorrect cr writes, out-of-bounds MMIO, etc, so > >> do we > >> really gain much by rigging a recovery mechanism for rdmsr and wrmsr > >> failures > >> for code that doesn't use the _safe variants? > > > > It's just the general principle really: don't crash the kernel on bootup. > > There's > > few things more user hostile than that. > > > > Also, this would maintain the status quo: since we now (accidentally) don't > > crash > > the kernel on distro kernels (but silently and unsafely ignore the faulting > > instruction), we should not regress that behavior (by adding the chance to > > crash > > again), but improve upon it. > > Just a heads up: the extable improvements in tip:ras/core make it > straightforward to get the best of all worlds: explicit failure > handling (written in C!), no fast path overhead whatsoever, and no new > garbage in the exception handlers. I _knew_ I should have merged them into tip:x86/mm, not tip:ras/core ;-) I had a quick look at your new MSR series and I'm very happy with that direction! Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v3 2/5] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops
* Andy Lutomirskiwrote: > This demotes an OOPS and likely panic due to a failed non-"safe" MSR > access to a WARN and, for RDMSR, a return value of zero. If > panic_on_oops is set, then failed unsafe MSR accesses will still > oops and panic. > > To be clear, this type of failure should *not* happen. This patch > exists to minimize the chance of nasty undebuggable failures due on > systems that used to work due to a now-fixed CONFIG_PARAVIRT=y bug. > > Signed-off-by: Andy Lutomirski > --- > arch/x86/include/asm/msr.h | 10 -- > arch/x86/mm/extable.c | 33 + > 2 files changed, 41 insertions(+), 2 deletions(-) > > diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h > index 93fb7c1cffda..1487054a1a70 100644 > --- a/arch/x86/include/asm/msr.h > +++ b/arch/x86/include/asm/msr.h > @@ -92,7 +92,10 @@ static inline unsigned long long native_read_msr(unsigned > int msr) > { > DECLARE_ARGS(val, low, high); > > - asm volatile("rdmsr" : EAX_EDX_RET(val, low, high) : "c" (msr)); > + asm volatile("1: rdmsr\n" > + "2:\n" > + _ASM_EXTABLE_HANDLE(1b, 2b, ex_handler_rdmsr_unsafe) > + : EAX_EDX_RET(val, low, high) : "c" (msr)); > if (msr_tracepoint_active(__tracepoint_read_msr)) > do_trace_read_msr(msr, EAX_EDX_VAL(val, low, high), 0); > return EAX_EDX_VAL(val, low, high); > @@ -119,7 +122,10 @@ static inline unsigned long long > native_read_msr_safe(unsigned int msr, > static inline void native_write_msr(unsigned int msr, > unsigned low, unsigned high) > { > - asm volatile("wrmsr" : : "c" (msr), "a"(low), "d" (high) : "memory"); > + asm volatile("1: wrmsr\n" > + "2:\n" > + _ASM_EXTABLE_HANDLE(1b, 2b, ex_handler_wrmsr_unsafe) > + : : "c" (msr), "a"(low), "d" (high) : "memory"); > if (msr_tracepoint_active(__tracepoint_read_msr)) > do_trace_write_msr(msr, ((u64)high << 32 | low), 0); > } > diff --git a/arch/x86/mm/extable.c b/arch/x86/mm/extable.c > index 9dd7e4b7fcde..f310714e6e6d 100644 > --- a/arch/x86/mm/extable.c > +++ b/arch/x86/mm/extable.c > @@ -49,6 +49,39 @@ bool ex_handler_ext(const struct exception_table_entry > *fixup, > } > EXPORT_SYMBOL(ex_handler_ext); > > +bool ex_handler_rdmsr_unsafe(const struct exception_table_entry *fixup, > + struct pt_regs *regs, int trapnr) > +{ > + WARN(1, "unsafe MSR access error: RDMSR from 0x%x", > + (unsigned int)regs->cx); Btw., instead of the safe/unsafe naming (which has an emotional and security secondary attribute), shouldn't we move this over to a _check() (or _checking()) naming instead that we do in other places in the kernel? I.e.: rdmsr(msr, l, h); and: if (rdmsr_check(msr, l, h)) { ... } and then we could name the helpers as _check() and _nocheck() - which is neutral naming. Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v3 2/5] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops
* Andy Lutomirskiwrote: > +bool ex_handler_rdmsr_unsafe(const struct exception_table_entry *fixup, > + struct pt_regs *regs, int trapnr) > +{ > + WARN(1, "unsafe MSR access error: RDMSR from 0x%x", > + (unsigned int)regs->cx); Please make this WARN_ONCE(). There's no point in locking up the system with WARN() spam, should this trigger frequently. > + WARN(1, "unsafe MSR access error: WRMSR to 0x%x (tried to write > 0x%08x%08x)\n", > + (unsigned int)regs->cx, > + (unsigned int)regs->dx, (unsigned int)regs->ax); Ditto. Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v3 01/11] x86/boot: enumerate documentation for the x86 hardware_subarch
* Andy Lutomirski <l...@amacapital.net> wrote: > On Feb 24, 2016 12:33 AM, "Ingo Molnar" <mi...@kernel.org> wrote: > > > > For hard coded platform quirks I'd suggest we add x86_platform.quirks > > flags. For > > example the F00F hack for Xen could be done via: > > > > x86_platform.quirks.idt_remap = 0; > > > > Don't we unconditionally remap the IDT? I think Kees did it for > general purpose hardening due to our complete inability to hide the > IDT address. I.e. I think we can remove the f00f condition entirely. Yeah, indeed - I only judged by the (limited) patch context and assumed the Xen problem was with IDT remapping. But what the quirk really does is only to avoid printing the f00f workaround - i.e. a cosmetic change. I think we should just drop the paravirt_enabled() check. Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v3 01/11] x86/boot: enumerate documentation for the x86 hardware_subarch
* Luis R. Rodriguezwrote: > Although hardware_subarch has been in place since the x86 boot > protocol 2.07 it hasn't been used much. Enumerate current possible > values to avoid misuses and help with semantics later at boot > time should this be used further. > > v2: fix typos > > Cc: Andy Shevchenko > Signed-off-by: Luis R. Rodriguez > --- > arch/x86/include/uapi/asm/bootparam.h | 31 ++- > 1 file changed, 30 insertions(+), 1 deletion(-) > > diff --git a/arch/x86/include/uapi/asm/bootparam.h > b/arch/x86/include/uapi/asm/bootparam.h > index 329254373479..50d5009cf276 100644 > --- a/arch/x86/include/uapi/asm/bootparam.h > +++ b/arch/x86/include/uapi/asm/bootparam.h > @@ -157,7 +157,36 @@ struct boot_params { > __u8 _pad9[276]; /* 0xeec */ > } __attribute__((packed)); > > -enum { > +/** > + * enum x86_hardware_subarch - x86 hardware subarchitecture > + * > + * The x86 hardware_subarch and hardware_subarch_data were added as of the > x86 > + * boot protocol 2.07 to help distinguish and supports custom x86 boot > + * sequences. This enum represents accepted values for the x86 > + * hardware_subarch. Custom x86 boot sequences (not X86_SUBARCH_PC) do not > have > + * or simply do not make use of natural stubs like BIOS or EFI, the > + * hardware_subarch can be used on the Linux entry path to revector to a > + * subarchitecture stub when needed. This subarchitecture stub can be used to > + * set up Linux boot parameters or for special care to account for > nonstandard > + * handling of page tables. > + * > + * KVM and Xen HVM do not have a subarch as these are expected to follow > + * standard x86 boot entries. If there is a genuine need for "hypervisor" > type > + * that should be considered separately in the future. > + * > + * @X86_SUBARCH_PC: Should be used if the hardware is enumerable using > standard > + * PC mechanisms (PCI, ACPI) and doesn't need a special boot flow. > + * @X86_SUBARCH_LGUEST: Used for x86 hypervisor demo, lguest > + * @X86_SUBARCH_XEN: Used for Xen guest types which follow the PV boot path, > + * which start at asm startup_xen() entry point and later jump to the C > + * xen_start_kernel() entry point. > + * @X86_SUBARCH_INTEL_MID: Used for Intel MID (Mobile Internet Device) > platform > + * systems which do not have the PCI legacy interfaces. > + * @X86_SUBARCH_CE4100: Used for Intel CE media processor (CE4100) SOC for > + * for settop boxes and media devices, the use of a subarch for CE4100 > + * is more of a hack... > + */ > +enum x86_hardware_subarch { > X86_SUBARCH_PC = 0, > X86_SUBARCH_LGUEST, > X86_SUBARCH_XEN, No, this is really backwards. While I agree that we want to get rid of paravirt_enabled(), we _dont_ want to spread the use of (arguably broken) boot flags like this! This is one of the cases where the cure is worse than the disease. The 'modern' way to handle platform quirks is to have hardware drivers with auto-detection, and drivers know how to detect whether the hardware is present or not. For legacy cases where no auto-detection is possible, we have per hardware capability flags to turn it off. The 'hardware subarch' flag of the bootloader can be used to install certain quirks, in a single early bootup function - but that should be all: ideally no quirks are needed. We don't want to spread 'subarch flags' into various unrelated code. Let's go over your series and see whether and how that principle can be applied: - patch #1, #2: should be dropped - patch #3, #4: EBDA support. The EBDA BIOS signature is an ancient data structure, starting off at physical memory 0x40E - which is the very first physical memory page of the system. We should add an x86_ebda_bios flag that is set to 1 by default, but which paravirt bootup can set to 0. That would avoid the reservation of the BIOS area and will save a bit of RAM. - patch #5, #6, #7: looks good, does not use a subarch flag - patch #8: f00f workaround. Subarch flag use is wrong. The complication with this workaround is that it uses MM tricks to install an IDT. Could you check whether Xen truly needs this quirk? If yes then there should be a new flag, something like x86_idt_readonly, which is set to 0 but Xen can set it to 1. If that flag is set then the F00F workaround does not have to be installed. Or something like that: the point is to use a specific flag. - Patch #9, #10: RTC support. The problem with RTC platform driver is that it's not possible to detect the RTC reliably - so we sometimes have to quirk it off. Instead of using bitflags, add something like x86_platform.rtc_available, which defaults to 1. Don't add negation to the name and don't use bitflags - use a byte flag. - Patch #11: this patch wants to disable the PNP BIOS code. The
Re: [Xen-devel] [PATCH v4 10/11] dma: rename dma_*_writecombine() to dma_*_wc()
* Luis R. Rodriguez <mcg...@do-not-panic.com> wrote: > On Tue, Aug 25, 2015 at 9:21 PM, Ingo Molnar <mi...@kernel.org> wrote: > > > > * Andrew Morton <a...@linux-foundation.org> wrote: > > > >> > There's a catch-22 issue here either way, for instance this rename patch > >> > has > >> > been being baked for probably 2 releases already but the difficulty has > >> > been > >> > trying to find the appropriate time to merge it without conflict. > >> > > >> > If you do it in the beginning of the merge window, you have to ask > >> > yourself in > >> > what tree it will be done. Since subsystems are topic specific that > >> > means that > >> > subsystem will end up having a conflict at the end of the merge window. > >> > >> Yes it's a special case. I think the best way of handling such things is > >> to get > >> them in to Linus either right at the end of the merge window or the day > >> after he > >> releases -rc1. This is when most people's trees are mostly empty. > > > > Yes, that was the plan last time around as well - but the end of the merge > > window > > is when we have the least maintainer bandwidth as well ... > > > > Anyway, I applied most of the patches (sans the rename), so the rename patch > > should be a lot simpler to execute at the right moment this time around. > > Ingo, should we try this again some time? I have some ideas on how to > make these sorts of changes easier to manage in the future, it > involves having an automatic git rebase option to use Coccinelle for > you if a patch is annotated to have been completely done with > Coccinelle, but future tooling is needed for that [0]. In the meantime > I (or you) can simply run the script at any point in time to catch all > the names as-is in the kernel / point in time we decide to merge this > simple rename. > > [0] http://kernelnewbies.org/KernelProjects/linux-oven So beyond the rename, can we also keep the old names as compatibility helpers, with a #define mapping them to the new names? If so then please (re-)send the changes. Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Linux 4.4 MW: Boot under Xen fails with CONFIG_DEBUG_WX enabled: RIP: ptdump_walk_pgd_level_core
* Stephen Smalleywrote: > On 11/04/2015 06:55 AM, Sander Eikelenboom wrote: > >Hi All, > > > >I just tried to boot with the current linus mergewindow tree under Xen. > >It fails with a kernel panic at boot with the new "CONFIG_DEBUG_WX" > >option enabled. > >Disabling it makes the kernel boot fine. > > > >The splat: > >[ 18.424241] Freeing unused kernel memory: 1104K (822fc000 - > >8241) > >[ 18.430314] Write protecting the kernel read-only data: 18432k > >[ 18.441054] Freeing unused kernel memory: 1144K (880001ae2000 - > >880001c0) > >[ 18.447966] Freeing unused kernel memory: 1560K (88000207a000 - > >88000220) > >[ 18.453947] BUG: unable to handle kernel paging request at > >88055c883000 > >[ 18.459943] IP: [] > >ptdump_walk_pgd_level_core+0x20e/0x440 > >[ 18.465847] PGD 2212067 PUD 0 > >[ 18.471564] Oops: [#1] SMP > >[ 18.477248] Modules linked in: > >[ 18.482918] CPU: 2 PID: 1 Comm: swapper/0 Not tainted > >4.3.0-mw-20151104-linus-doflr+ #1 > >[ 18.488804] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640) , BIOS > >V1.8B1 09/13/2010 > >[ 18.494778] task: 880059b9 ti: 880059b98000 task.ti: > >880059b98000 > >[ 18.500852] RIP: e030:[] [] > >ptdump_walk_pgd_level_core+0x20e/0x440 It would be nice to see which line of code this corresponds to. Doing this: gdb vmlinux list *0x8105af8e should normally do the trick. Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops
* Andy Lutomirskiwrote: > > These could still be open coded in an inlined fashion, like the scheduler > > usage. > > We could have a raw_rdmsr for those. > > OTOH, I'm still not 100% convinced that this warn-but-don't-die behavior is > worth the effort. This isn't a frequent source of bugs to my knowledge, and > we > don't try to recover from incorrect cr writes, out-of-bounds MMIO, etc, so do > we > really gain much by rigging a recovery mechanism for rdmsr and wrmsr failures > for code that doesn't use the _safe variants? It's just the general principle really: don't crash the kernel on bootup. There's few things more user hostile than that. Also, this would maintain the status quo: since we now (accidentally) don't crash the kernel on distro kernels (but silently and unsafely ignore the faulting instruction), we should not regress that behavior (by adding the chance to crash again), but improve upon it. Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops
* Peter Zijlstra <pet...@infradead.org> wrote: > On Mon, Sep 21, 2015 at 09:36:15AM -0700, Linus Torvalds wrote: > > On Mon, Sep 21, 2015 at 1:46 AM, Ingo Molnar <mi...@kernel.org> wrote: > > > > > > Linus, what's your preference? > > > > So quite frankly, is there any reason we don't just implement > > native_read_msr() as just > > > >unsigned long long native_read_msr(unsigned int msr) > >{ > > int err; > > unsigned long long val; > > > > val = native_read_msr_safe(msr, ); > > WARN_ON_ONCE(err); > > return val; > >} > > > > Note: no inline, no nothing. Just put it in arch/x86/lib/msr.c, and be > > done with it. I don't see the downside. > > > > How many msr reads are so critical that the function call > > overhead would matter? Get rid of the inline version of the _safe() > > thing too, and put that thing there too. > > There are a few in the perf code, and esp. on cores without a stack engine > the > call overhead is noticeable. Also note that the perf MSRs are generally > optimized MSRs and less slow (we cannot say fast, they're still MSRs) than > regular MSRs. These could still be open coded in an inlined fashion, like the scheduler usage. Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops
* Linus Torvalds <torva...@linux-foundation.org> wrote: > On Mon, Sep 21, 2015 at 1:46 AM, Ingo Molnar <mi...@kernel.org> wrote: > > > > Linus, what's your preference? > > So quite frankly, is there any reason we don't just implement > native_read_msr() as just > >unsigned long long native_read_msr(unsigned int msr) >{ > int err; > unsigned long long val; > > val = native_read_msr_safe(msr, ); > WARN_ON_ONCE(err); > return val; >} > > Note: no inline, no nothing. Just put it in arch/x86/lib/msr.c, and be > done with it. I don't see the downside. Absolutely! > How many msr reads are so critical that the function call overhead > would > matter? Get rid of the inline version of the _safe() thing too, and put that > thing there too. Only a very low number of them is performance critical (because even hw-accelerated MSR accesses are generally slow so we try to avoid MSR accesses in fast paths as much as possible, via shadowing, etc.) - and in the few cases where we have to access an MSR in a fast path we can do those separately. I'm only worried about the 'default' APIs, i.e. rdmsr() that is used throughout arch/x86/ over a hundred times, not about performance critical code paths that get enough testing and enough attention in general. Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops
* Andy Lutomirskiwrote: > On Sep 20, 2015 5:15 PM, "Linus Torvalds" > wrote: > > > > On Sun, Sep 20, 2015 at 5:02 PM, Andy Lutomirski wrote: > > > This demotes an OOPS and likely panic due to a failed non-"safe" MSR > > > access to a WARN_ON_ONCE and a return of zero (in the RDMSR case). > > > We still write a pr_info entry unconditionally for debugging. > > > > No, this is wrong. > > > > If you really want to do something like this, then just make all MSR reads > > safe. So the only difference between "safe" and "unsafe" is that the unsafe > > version just doesn't check the return value, and silently just returns zero > > for reads (or writes nothing). > > > > To quote Obi-Wan: "Use the exception table, Luke". > > > > Because decoding instructions is just too ugly. We'll do it for CPU errata > > where we might have to do it for user space code too (ie the AMD prefetch > > mess), but for code that _we_ control? Hell no. > > > > So NAK on this. > > My personal preference is to just not do this at all. A couple people > disagree. > If we make the unsafe variants not oops, then I think we want to have the > nice > loud warning, since these issues are bugs if they happen. > > We could certainly use the exception table for this, but it'll result in > bigger > core, since each MSR access will need an exception table entry and an > associated > fixup to call some helper that warns and sets the result to zero. > > I'd be happy to implement that, but only if it'll be applied. Otherwise I'd > rather just drop this patch and keep the rest of the series. Linus, what's your preference? Due to the bug mentioned earlier in this thread all MSR reads are currently 'safe' on all the major Linux distros (which all have CONFIG_PARAVIRT=y), i.e. by 'fixing' them we'd reintroduce random crashes into various fragile pieces of code... To add insult to injury, the current 'silently safe by accident' MSR code isn't so safe: because it leaves the result of the read uninitialized... To fix this all I'd really like to have: - safe MSR reads by default (i.e. never boot crash the kernel on some rare condition - which to most users is either a silent boot hang or an instant restart). Historicaly we had a stream of 'silly boot crashes' due to MSR reads that generate a #GPF. They make Linux less usable around the edges, especially in the x86 non-server (desktop) space where most hardware vendors are either openly Linux hostile, or, at best, Linux oblivious. - proper result-zeroing behavior on exceptions - and we should also generate _some_ sort of warning when MSR exceptions happen in an 'unintended' fashion. Maybe the warning could be put under a (default-enabled) config option for the size conscious. Or we could extend exception table entry encoding to include a 'warning bit', to not bloat the kernel. If the exception handler code encounters such an exception it would generate a one-time warning for that entry, but otherwise not crash the kernel and continue execution with an all-zeroes result for the MSR read. Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops
* Andy Lutomirskiwrote: > This demotes an OOPS and likely panic due to a failed non-"safe" MSR > access to a WARN_ON_ONCE and a return of poisoned values (in the > RDMSR case). We still write a pr_info entry unconditionally for > debugging. > > To be clear, this type of failure should *not* happen. This patch > exists to minimize the chance of nasty undebuggable failures due on > systems that used to work due to a now-fixed CONFIG_PARAVIRT=y bug. > + if (opcode == 0x320f) { > + /* RDMSR */ > + pr_info("bad kernel RDMSR from non-existent MSR 0x%x", > + (unsigned int)regs->cx); > + if (!panic_on_oops) { > + WARN_ON_ONCE(true); > + > + /* Patch it up with deterministic poison. */ > + regs->ax = 0x5aadc0de; > + regs->dx = 0x8badf00d; > + regs->ip += 2; > + return true; IMHO this should really not poison the result, but use zero as the result. The poison might randomly indicate 'present' feature in various registers that might be accessed in a buggy way. Don't send the code further down into la-la-land by giving it a 'success'. And yes, zero can mean success too, but we have to pick a side here ... The warning will be enough to fix these ups, people (and in particular distro testing people) will be watching out for them. Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 0/3] x86/paravirt: Fix baremetal paravirt MSR ops
* Andy Lutomirskiwrote: > Setting CONFIG_PARAVIRT=y has an unintended side effect: it silently > turns all rdmsr and wrmsr operations into the safe variants without > any checks that the operations actually succeed. > > This is IMO awful: it papers over bugs. In particular, KVM gueests > might be unwittingly depending on this behavior because > CONFIG_KVM_GUEST currently depends on CONFIG_PARAVIRT. I'm not > aware of any such problems, but applying this series would be a good > way to shake them out. > > Fix it so that the MSR operations work the same on CONFIG_PARAVIRT=n > and CONFIG_PARAVIRT=y as long as Xen isn't being used. The Xen > maintainers are welcome to make a similar change on top of this. > > Since there's plenty of time before the next merge window, I think > we should apply and fix anything that breaks. No, I think we should at most generate a warning instead, and not crash the kernel via rdmsr()! Most big distro kernels on bare metal have CONFIG_PARAVIRT=y (I checked Ubuntu and Fedora), so we are potentially exposing a lot of users to problems. Crashing the bootup on an unknown MSR is bad. Many MSR reads and writes are non-critical and returning the 'safe' result is much better than crashing or hanging the bootup. ( We should double check that rdmsr()/wrmsr() results are never left uninitialized, but are set to zero or so, for cases where the return code is not checked. ) Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 0/3] x86/paravirt: Fix baremetal paravirt MSR ops
* Andy Lutomirski <l...@amacapital.net> wrote: > On Thu, Sep 17, 2015 at 12:19 AM, Ingo Molnar <mi...@kernel.org> wrote: > > > > * Andy Lutomirski <l...@kernel.org> wrote: > > > >> Setting CONFIG_PARAVIRT=y has an unintended side effect: it silently > >> turns all rdmsr and wrmsr operations into the safe variants without > >> any checks that the operations actually succeed. > >> > >> This is IMO awful: it papers over bugs. In particular, KVM gueests > >> might be unwittingly depending on this behavior because > >> CONFIG_KVM_GUEST currently depends on CONFIG_PARAVIRT. I'm not > >> aware of any such problems, but applying this series would be a good > >> way to shake them out. > >> > >> Fix it so that the MSR operations work the same on CONFIG_PARAVIRT=n > >> and CONFIG_PARAVIRT=y as long as Xen isn't being used. The Xen > >> maintainers are welcome to make a similar change on top of this. > >> > >> Since there's plenty of time before the next merge window, I think > >> we should apply and fix anything that breaks. > > > > No, I think we should at most generate a warning instead, and not crash the > > kernel > > via rdmsr()! > > > > Most big distro kernels on bare metal have CONFIG_PARAVIRT=y (I checked > > Ubuntu and > > Fedora), so we are potentially exposing a lot of users to problems. > > > > Crashing the bootup on an unknown MSR is bad. Many MSR reads and writes are > > non-critical and returning the 'safe' result is much better than crashing or > > hanging the bootup. > > > > Should we do that for CONFIG_PARAVIRT=n, too? Absolutely. PARAVIRT=n should not behave differently from PARAVIRT=y on bare metal. > It would be straightforward to rig this up (temporarily?) on top of these > patches. To keep bloat down, we might want to implement it in > do_general_protection rather than sticking it in native_read_msr. Fair enough. > wrmsr is a different beast, since we can fail due to writing the wrong value > to > an otherwise valid MSR. Given that MSR screwups can very easily be security > holes, I'm not sure that warning and blindly continuing on an unchecked > failed > wrmsr is a good idea. So the fact is that right now we are silently ignoring failures there, and have been doing that for some time. The right first step is to live with that and start generating low-key, once-per-bootup warnings at most, and see how frequent (and how serious) they are. We could add a (default disabled) CONFIG_PANIC_ON_UNKNOWN_MSR=y option if that's really a serious concern. > In any event, I think it's nuts that CONFIG_PARAVIRT changes this > behavior. We should pick something sane and stick with it. Absolutely - and as it happens, the 'does not crash the kernel' PARAVIRT=y accidental behavior is actually quite close to what we wanted for a long time, so let's make it official - and add a warning to make sure we are aware of problems. But don't turn 'potential problems' into showstopper bugs such as a hard to debug early boot crash, which to most Linux users means a black screen on bootup! > > ( We should double check that rdmsr()/wrmsr() results are never left > > uninitialized, but are set to zero or so, for cases where the return code > > is not > > checked. ) > > It sure looks like native_read_msr_safe doesn't clear the output if the rdmsr > fails. Yeah, that should be fixed, to make such soft failures deterministic. Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v4 10/11] dma: rename dma_*_writecombine() to dma_*_wc()
* Andrew Morton a...@linux-foundation.org wrote: There's a catch-22 issue here either way, for instance this rename patch has been being baked for probably 2 releases already but the difficulty has been trying to find the appropriate time to merge it without conflict. If you do it in the beginning of the merge window, you have to ask yourself in what tree it will be done. Since subsystems are topic specific that means that subsystem will end up having a conflict at the end of the merge window. Yes it's a special case. I think the best way of handling such things is to get them in to Linus either right at the end of the merge window or the day after he releases -rc1. This is when most people's trees are mostly empty. Yes, that was the plan last time around as well - but the end of the merge window is when we have the least maintainer bandwidth as well ... Anyway, I applied most of the patches (sans the rename), so the rename patch should be a lot simpler to execute at the right moment this time around. Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v4 10/11] dma: rename dma_*_writecombine() to dma_*_wc()
* Luis R. Rodriguez mcg...@do-not-panic.com wrote: From: Luis R. Rodriguez mcg...@suse.com Rename dma_*_writecombine() to dma_*_wc(), so that the naming is coherent across the various write-combining APIs. The following Coccinelle SmPL patch was used for this simple transformation: @ rename_dma_alloc_writecombine @ expression dev, size, dma_addr, gfp; @@ -dma_alloc_writecombine(dev, size, dma_addr, gfp) +dma_alloc_wc(dev, size, dma_addr, gfp) @ rename_dma_free_writecombine @ expression dev, size, cpu_addr, dma_addr; @@ -dma_free_writecombine(dev, size, cpu_addr, dma_addr) +dma_free_wc(dev, size, cpu_addr, dma_addr) @ rename_dma_mmap_writecombine @ expression dev, vma, cpu_addr, dma_addr, size; @@ -dma_mmap_writecombine(dev, vma, cpu_addr, dma_addr, size) +dma_mmap_wc(dev, vma, cpu_addr, dma_addr, size) Generated-by: Coccinelle SmPL Suggested-by: Ingo Molnar mi...@kernel.org Signed-off-by: Luis R. Rodriguez mcg...@suse.com --- arch/arm/mach-lpc32xx/phy3250.c | 13 ++--- arch/arm/mach-netx/fb.c | 14 ++ arch/arm/mach-nspire/clcd.c | 13 ++--- arch/avr32/include/asm/dma-mapping.h | 20 ++-- arch/avr32/mm/dma-coherent.c | 12 ++-- arch/metag/include/asm/dma-mapping.h | 4 ++-- arch/metag/kernel/dma.c | 6 +++--- drivers/dma/iop-adma.c| 8 drivers/dma/mv_xor.c | 4 ++-- drivers/dma/qcom_bam_dma.c| 14 +++--- drivers/gpu/drm/drm_gem_cma_helper.c | 13 ++--- drivers/gpu/drm/omapdrm/omap_dmm_tiler.c | 13 ++--- drivers/gpu/drm/omapdrm/omap_gem.c| 8 drivers/gpu/drm/sti/sti_cursor.c | 13 ++--- drivers/gpu/drm/sti/sti_gdp.c | 3 +-- drivers/gpu/drm/sti/sti_hqvdp.c | 6 +++--- drivers/gpu/drm/tegra/gem.c | 11 +-- drivers/gpu/host1x/cdma.c | 8 drivers/gpu/host1x/job.c | 10 -- drivers/media/platform/coda/coda-bit.c| 10 +- drivers/video/fbdev/acornfb.c | 4 ++-- drivers/video/fbdev/amba-clcd-versatile.c | 14 ++ drivers/video/fbdev/amba-clcd.c | 4 ++-- drivers/video/fbdev/atmel_lcdfb.c | 9 + drivers/video/fbdev/ep93xx-fb.c | 9 +++-- drivers/video/fbdev/gbefb.c | 8 drivers/video/fbdev/imxfb.c | 12 ++-- drivers/video/fbdev/mx3fb.c | 9 - drivers/video/fbdev/nuc900fb.c| 8 drivers/video/fbdev/omap/lcdc.c | 16 drivers/video/fbdev/pxa168fb.c| 8 drivers/video/fbdev/pxafb.c | 4 ++-- drivers/video/fbdev/s3c-fb.c | 7 +++ drivers/video/fbdev/s3c2410fb.c | 8 drivers/video/fbdev/sa1100fb.c| 8 include/linux/dma-mapping.h | 16 sound/arm/pxa2xx-pcm-lib.c| 20 sound/soc/fsl/imx-pcm-fiq.c | 10 -- sound/soc/nuc900/nuc900-pcm.c | 6 ++ sound/soc/omap/omap-pcm.c | 12 40 files changed, 183 insertions(+), 212 deletions(-) Which kernel is this against? It has conflicts in 3 files with Linus's latest: v4.2-rc8. Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Patch x86/ldt: Make modify_ldt synchronous has been added to the 4.1-stable tree
* Greg KH gre...@linuxfoundation.org wrote: On Thu, Aug 13, 2015 at 06:38:46PM -0700, Andy Lutomirski wrote: On Thu, Aug 13, 2015 at 5:44 PM, gre...@linuxfoundation.org wrote: This is a note to let you know that I've just added the patch titled x86/ldt: Make modify_ldt synchronous This needs: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?h=x86/urgentid=4809146b86c3d41ce588fdb767d021e2a80600dd and its parent. :( As that isn't in Linus's tree yet, I'll move this one back and wait a release or so before it gets merged. I just sent it to Linus, so barring a catastrophy it should be upstream soon. Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] linux-next: manual merge of the xen-tip tree with the tip tree
* Peter Zijlstra pet...@infradead.org wrote: On Wed, Aug 12, 2015 at 02:44:52PM -0400, Boris Ostrovsky wrote: On 08/12/2015 02:36 PM, Peter Zijlstra wrote: On Wed, Aug 12, 2015 at 11:26:41AM -0700, H. Peter Anvin wrote: One option might be to do the addition in assembly, i.e.: i (key), i (index) ... and put the addition into the assembly source. Like so? Seems to build on gcc-4.6. Yes, this builds on 4.4.4. as well. Thanks, Ingo / Hpa, can you make the below patch appear in the right tip branch? Applied, it will show up as the following commit: d420acd816c0 (jump_label/x86: Work around asm build bug on older/backported GCCs) Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v3 1/1] x86: allow to enable/disable modify_ldt at run time
* Willy Tarreau w...@1wt.eu wrote: @@ -276,6 +282,15 @@ asmlinkage int sys_modify_ldt(int func, void __user *ptr, { int ret = -ENOSYS; + if (!sysctl_modify_ldt) { + printk_ratelimited(KERN_INFO + Denied a call to modify_ldt() from %s[%d] (uid: %d). + Adjust sysctl if this was not an exploit attempt.\n, + current-comm, task_pid_nr(current), + from_kuid_munged(current_user_ns(), current_uid())); UI nit: so this message should really tell the user _which_ sysctl to configure, instead of passive-aggressively alluding to the fact that there's a sysctl somewhere that might do the trick... Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v3 1/1] x86: allow to enable/disable modify_ldt at run time
* Willy Tarreau w...@1wt.eu wrote: Hi Ingo, On Wed, Aug 05, 2015 at 10:00:37AM +0200, Ingo Molnar wrote: * Willy Tarreau w...@1wt.eu wrote: @@ -276,6 +282,15 @@ asmlinkage int sys_modify_ldt(int func, void __user *ptr, { int ret = -ENOSYS; + if (!sysctl_modify_ldt) { + printk_ratelimited(KERN_INFO + Denied a call to modify_ldt() from %s[%d] (uid: %d). + Adjust sysctl if this was not an exploit attempt.\n, + current-comm, task_pid_nr(current), + from_kuid_munged(current_user_ns(), current_uid())); UI nit: so this message should really tell the user _which_ sysctl to configure, instead of passive-aggressively alluding to the fact that there's a sysctl somewhere that might do the trick... I agree, I did it first and changed my mind due to the repetition of the word modify_ldt. Here's an updated version instead. Willy From 17b2720cd54df0fde6686c1d85aaed38d679cbe7 Mon Sep 17 00:00:00 2001 From: Willy Tarreau w...@1wt.eu Date: Sat, 25 Jul 2015 12:18:33 +0200 Subject: [PATCH] x86/ldt: allow to disable modify_ldt at runtime For distros who prefer not to take the risk of completely disabling the modify_ldt syscall using CONFIG_MODIFY_LDT_SYSCALL, this patch adds a sysctl to enable or/disable it at runtime, and proposes to disable it by default. This can be a safe alternative. A message is logged if an attempt was stopped so that it's easy to spot if/when it is needed. Cc: Andy Lutomirski l...@kernel.org Cc: Kees Cook keesc...@chromium.org Signed-off-by: Willy Tarreau w...@1wt.eu --- Documentation/sysctl/kernel.txt | 15 +++ arch/x86/Kconfig| 17 + arch/x86/kernel/ldt.c | 16 kernel/sysctl.c | 14 ++ 4 files changed, 62 insertions(+) diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt index 6fccb69..60c7c7a 100644 --- a/Documentation/sysctl/kernel.txt +++ b/Documentation/sysctl/kernel.txt @@ -41,6 +41,7 @@ show up in /proc/sys/kernel: - kptr_restrict - kstack_depth_to_print [ X86 only ] - l2cr[ PPC only ] +- modify_ldt [ X86 only ] - modprobe== Documentation/debugging-modules.txt - modules_disabled - msg_next_id [ sysv ipc ] @@ -391,6 +392,20 @@ This flag controls the L2 cache of G3 processor boards. If == +modify_ldt: (X86 only) + +Enables (1) or disables (0) the modify_ldt syscall. Modifying the LDT +(Local Descriptor Table) may be needed to run a 16-bit or segmented code s/run a/run +such as Dosemu or Wine. This is done via a system call which is not needed s/Dosemu/DOSEMU +to run portable applications, and which can sometimes be abused to exploit +some weaknesses of the architecture, opening new vulnerabilities. So that's pretty vague IMHO, and a bit FUD-ish in character. How about: ... , and which system call exposes complex, rarely used legacy hardware features and semantics that had suffered vulnerabilities in the past. + +This sysctl allows one to increase the system's security by disabling the +system call, or to restore compatibility with specific applications when it +was already disabled. + +== + modules_disabled: A toggle value indicating if modules are allowed to be loaded diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index beabf30..88d10a0 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -2069,6 +2069,23 @@ config MODIFY_LDT_SYSCALL surface. Disabling it removes the modify_ldt(2) system call. Saying 'N' here may make sense for embedded or server kernels. + If really unsure, say 'Y', you'll be able to disable it at runtime. + +config DEFAULT_MODIFY_LDT_SYSCALL + bool Allow userspace to modify the LDT by default + depends on MODIFY_LDT_SYSCALL + default y + ---help--- + Modifying the LDT (Local Descriptor Table) may be needed to run a + 16-bit or segmented code such as Dosemu or Wine. This is done via + a system call which is not needed to run portable applications, + and which can sometimes be abused to exploit some weaknesses of + the architecture, opening new vulnerabilities. + + For this reason this option allows one to enable or disable the + feature at runtime. It is recommended to say 'N' here to leave + the system protected, and to enable it at runtime only if needed + by setting the sys.kernel.modify_ldt sysctl. Here I'd do the same modifications as to the sysctl text above. + if (!sysctl_modify_ldt) { + printk_ratelimited(KERN_INFO + Denied a call to modify_ldt
Re: [Xen-devel] [PATCH v3 1/1] x86: allow to enable/disable modify_ldt at run time
* Willy Tarreau w...@1wt.eu wrote: + if (!sysctl_modify_ldt) { + printk_ratelimited(KERN_INFO + Denied a call to modify_ldt() from %s[%d] (uid: %d). + Adjust the modify_ldt sysctl if this was not an Would it really be so difficult to write this as: Set sys.kernel.modify_ldt = 1 in /etc/sysctl.conf if this was not an exploit attempt. It's just a matter of taste. Normally I consider the kernel distro-agnostic so I don't like to suggest one way to adjust sysctls nor to reference config files. Here we're in a case where only standard distro users may hit the issue, and users of embedded distros will not face this message or will easily translate it into their respective configuration scheme. So OK for this one. So it's a side issue, but it's not a matter of taste at all: why should we end up hurting 99% of Linux users (that use regular distros), just to make it slightly more 'correct' for the weird 1% 'embedded distro' case that decided to put sysctl configuration elsewhere? Users of 'embedded' distros won't normally see kernel messages, and even if they do, the message is crystal clear even to them... Such messages should be as helpful to the regular case as possible. The weird cases will be OK too: and it does not help to make a message unhelpful for _both_ cases. Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v6 4/4] x86/ldt: Make modify_ldt optional
* Andy Lutomirski l...@kernel.org wrote: The modify_ldt syscall exposes a large attack surface and is unnecessary for modern userspace. Make it optional. Reviewed-by: Kees Cook keesc...@chromium.org Signed-off-by: Andy Lutomirski l...@kernel.org --- arch/x86/Kconfig | 17 + arch/x86/include/asm/mmu.h | 2 ++ arch/x86/include/asm/mmu_context.h | 28 +--- arch/x86/kernel/Makefile | 3 ++- arch/x86/kernel/cpu/perf_event.c | 4 arch/x86/kernel/process_64.c | 2 ++ arch/x86/kernel/step.c | 2 ++ kernel/sys_ni.c| 1 + 8 files changed, 51 insertions(+), 8 deletions(-) btw., I fixed a (rare) build failure on MATH_EMULATION=y !MODIFY_LDT_SYSCALL: arch/x86/math-emu/fpu_system.h:21:71: error: ‘mm_context_t’ has no member named ‘ldt’ I took the easy fix: made MATH_EMULATION depend on MODIFY_LDT_SYSCALL for now. Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v9 0/8] pci: add pci_iomap_wc() and pci_ioremap_wc_bar()
* Bjorn Helgaas bhelg...@google.com wrote: Let me know if these are OK or if there are any questions. [0] http://lkml.kernel.org/r/20150625204703.gc4...@pd.tnic [1] http://lkml.kernel.org/r/20150707095012.gq7...@wotan.suse.de Ingo, Just a friendly reminder. Let me know if there are any issues or questions. It would be nice to get an Acked-by from Bjorn for the PCI API bits. I think the actual code of pci_ioremap_wc() and pci_ioremap_wc_bar() is fine (although I might have named it pci_ioremap_bar_wc() for consistency). I declined to merge or ack them myself because they're obvious extensions of pci_ioremap() and pci_ioremap_bar(), and I would prefer that they be exported the same way, i.e., with EXPORT_SYMBOL(), not EXPORT_SYMBOL_GPL(). Huh? AFAICS pci_ioremap_bar() has been a _GPL export for a long time: 1684f5ddd4c0c (Andrew Morton 2008-12-01 14:30:30 -0800 126)#ifdef CONFIG_HAS_IOMEM 1684f5ddd4c0c (Andrew Morton 2008-12-01 14:30:30 -0800 127)void __iomem *pci_ioremap_bar(struct pci_dev *pdev, int bar) 1684f5ddd4c0c (Andrew Morton 2008-12-01 14:30:30 -0800 128){ 1f7bf3bfb5d60 (Bjorn Helgaas 2015-03-12 12:30:11 -0500 129)struct resource *res = pdev-resource[bar]; 1f7bf3bfb5d60 (Bjorn Helgaas 2015-03-12 12:30:11 -0500 130) 1684f5ddd4c0c (Andrew Morton 2008-12-01 14:30:30 -0800 131)/* 1684f5ddd4c0c (Andrew Morton 2008-12-01 14:30:30 -0800 132) * Make sure the BAR is actually a memory resource, not an IO resource 1684f5ddd4c0c (Andrew Morton 2008-12-01 14:30:30 -0800 133) */ 646c0282df042 (Bjorn Helgaas 2015-03-12 12:30:15 -0500 134)if (res-flags IORESOURCE_UNSET || !(res-flags IORESOURCE_MEM)) { 1f7bf3bfb5d60 (Bjorn Helgaas 2015-03-12 12:30:11 -0500 135) dev_warn(pdev-dev, can't ioremap BAR %d: %pR\n, bar, res); 1684f5ddd4c0c (Andrew Morton 2008-12-01 14:30:30 -0800 136) return NULL; 1684f5ddd4c0c (Andrew Morton 2008-12-01 14:30:30 -0800 137)} 1f7bf3bfb5d60 (Bjorn Helgaas 2015-03-12 12:30:11 -0500 138)return ioremap_nocache(res-start, resource_size(res)); 1684f5ddd4c0c (Andrew Morton 2008-12-01 14:30:30 -0800 139)} 1684f5ddd4c0c (Andrew Morton 2008-12-01 14:30:30 -0800 140)EXPORT_SYMBOL_GPL(pci_ioremap_bar); 1684f5ddd4c0c (Andrew Morton 2008-12-01 14:30:30 -0800 141)#endif commit 1684f5ddd4c0c happened in 2008, well before you became PCI maintainer in 2012. and I'd prefer keeping the same EXPORT_SYMBOL_GPL() pattern for the new APIs. (ioremap_wc() is EXPORT_SYMBOL() mostly by accident, it's the odd one out.) Also, FWIIW: I personally got essentially zero feedback and help from proprietary binary kernel module vendors in the past couple of years as x86 maintainer, despite a fair chunk of kernel crashes reported on distro kernels occuring in them... Based on that very negative experience, when we introduce something as complex and as critical as new caching APIs, the last thing I want is to have obscure bugs in binary modules I cannot fix in any reasonable fashion. So even if the parent APIs of new APIs weren't already _GPL exports (as in this case), I'd export them as _GPL in this case. I think using EXPORT_SYMBOL_GPL to express individual political aims rather than as a hint about what might be derived work makes us look like zealots, and that's not my style. As far as I'm concerned it's a pure technological choice: I don't want to export certain types of hard to fix and critical functionality to drivers that I cannot then fix. But I also applied EXPORT_SYMBOL() patches in the past, so I'm not one-sided about it. Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v9 0/8] pci: add pci_iomap_wc() and pci_ioremap_wc_bar()
* Luis R. Rodriguez mcg...@suse.com wrote: On Wed, Jul 08, 2015 at 06:54:11PM -0700, Luis R. Rodriguez wrote: From: Luis R. Rodriguez mcg...@suse.com Ingo, Boris is on vacation, he picked up these patches on his bp#tip-mm tree [0] and they have baked there for a while now. That tree receives 0-day bot testing, but other than that its not clear what other tests were run on these patches. Boris modified the commit logs a bit, and made one optimizaiton to bail early on an PCI ioremap call when it should. These patches have no modifications from what is on Boris' tree and tip-mm branch. The 0 day build bot did find issues on Boris' tree but those are related to ioremap_uc() (already upstream) and its first use on atyfb (not upstream) -- I will be addressing a fix for that ioremap_uc() issue through another patch series prior to posting the final set for atyfb which makes use of ioremap_uc(). No issues have been found with this series. Benh did note some possible issues with expectations with what is done for write-combining for PowerPC [1] but the issue is a rather general long standing issue with semantics of ioremap -- in the case for ioremap_wc() on PowerPC benh notes that writel() will never write-combine as it uses too heavy barriers. Benh notes that although writel_relaxed() today is identical to writel() this can be changed. There are other general semantics issues with ioremap() variant calls -- we seem to have all gotten together to discuss all these issues on a thread where Dan Williams is proposing to unify ioremap prototypes and macro aliases [1], folks intersted on these issues or semantic concerns can drop in and chime there. Let me know if these are OK or if there are any questions. [0] http://lkml.kernel.org/r/20150625204703.gc4...@pd.tnic [1] http://lkml.kernel.org/r/20150707095012.gq7...@wotan.suse.de Ingo, Just a friendly reminder. Let me know if there are any issues or questions. It would be nice to get an Acked-by from Bjorn for the PCI API bits. Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 07/12] x86/virt/guest/xen: Remove use of pgd_list from the Xen guest code
* David Vrabel david.vra...@citrix.com wrote: On 15/06/15 10:05, Ian Campbell wrote: On Sat, 2015-06-13 at 11:49 +0200, Ingo Molnar wrote: xen_mm_pin_all()/unpin_all() are used to implement full guest instance suspend/restore. It's a stop-all method that needs to iterate through all allocated pgds in the system to fix them up for Xen's use. This code uses pgd_list, probably because it was an easy interface. But we want to remove the pgd_list, so convert the code over to walk all tasks in the system. This is an equivalent method. It is not equivalent because pgd_alloc() now populates entries in pgds that are not visible to xen_mm_pin_all() (note how the original code adds the pgd to the pgd_list in pgd_ctor() before calling pgd_prepopulate_pmd()). These newly allocated page tables won't be correctly converted on suspend/resume and the new process will die after resume. So how should the Xen logic be fixed for the new scheme? I can't say I can see through the paravirt complexity here. Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] RIP MTRR - status update for upcoming v4.2
* Andy Lutomirski l...@amacapital.net wrote: On Jun 12, 2015 12:59 AM, Jan Beulich jbeul...@suse.com wrote: On 12.06.15 at 01:23, toshi.k...@hp.com wrote: There are two usages on MTRRs: 1) MTRR entries set by firmware 2) MTRR entries set by OS drivers We can obsolete 2), but we have no control over 1). As UEFI firmwares also set this up, this usage will continue to stay. So, we should not get rid of the MTRR code that looks up the MTRR entries, while we have no need to modify them. Such MTRR entries provide safe guard to /dev/mem, which allows privileged user to access a range that may require UC mapping while the /dev/mem driver blindly maps it with WB. MTRRs converts WB to UC in such a case. But it wouldn't be impossible to simply read the MTRRs upon boot, store the information, disable MTRRs, and correctly use PAT to achieve the same effect (i.e. the blindly maps part of course would need fixing). This may crash and burn badly when we call a UEFI function or an SMI happens. I think we should just leave the MTRRs alone. Not to mention suspend/resume, reboot and other goodies where the firmware might pop up expecting intact MTRRs. Btw., doesn't a lack of MTRRs imply UC? So is 'crash and burn' possible in most cases? Isn't it just 'executes slower than before'? Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] x86/cpu: Fix SMAP check in PVOPS environments
* Rusty Russell ru...@rustcorp.com.au wrote: As the comment in arch/x86/include/asm/paravirt_types.h says: * Get/set interrupt state. save_fl and restore_fl are only * expected to use X86_EFLAGS_IF; all other bits * returned from save_fl are undefined, and may be ignored by * restore_fl. There should be no 'may' about this: under CONFIG_PARAVIRT_DEBUG=y the reminder of the flags should be cleared (or all bits should be set: that will instantly crash if restored verbatim), so that we trip up generic code that relies on non-IF restoration? Thanks, Ingo ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel