Re: [PATCH v2 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops
On 09/21/2015 09:36 AM, Linus Torvalds wrote:
>
> How many msr reads are so critical that the function call
> overhead would matter? Get rid of the inline version of the _safe()
> thing too, and put that thing there too.
>

Probably only the ones that may go in the context switch path.

-hpa

--
To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] x86: Use entire page for the per-cpu GDT only if paravirt-enabled
No, it is a natural result of an implementation which treats setting the A bit as an abnormal flow (e.g. in microcode as opposed to hardware).

On September 29, 2015 7:11:59 PM PDT, ebied...@xmission.com wrote:
>"H. Peter Anvin" <h...@zytor.com> writes:
>
>> On 09/29/2015 06:20 PM, Eric W. Biederman wrote:
>>> Linus Torvalds <torva...@linux-foundation.org> writes:
>>>
>>>> On Tue, Sep 29, 2015 at 1:35 PM, Andy Lutomirski <l...@amacapital.net> wrote:
>>>>>
>>>>> Does anyone know what happens if you stick a non-accessed segment in
>>>>> the GDT, map the GDT RO, and access it?
>>>>
>>>> You should get a #PF, as you guess, but go ahead and test it if you
>>>> want to make sure.
>>>
>>> I tested this by accident once when working on what has become known
>>> as coreboot. Early in boot with your GDT in a EEPROM switching from
>>> real mode to 32bit protected mode causes a write and locks up the
>>> machine when the hardware declines the write to the GDT to set the
>>> accessed bit. As I recall the write kept being retried and retried and
>>> retried...
>>>
>>> Setting the access bit in the GDT cleared up the problem and I did not
>>> look back.
>>>
>>> Way up in 64bit mode something might be different, but I don't know why
>>> cpu designers would waste the silicon.
>>
>> This is totally different from a TLB violation. In your case, the write
>> goes through as far as the CPU is concerned, but when the data is
>> fetched back, it hasn't changed. A write to a TLB-protected location
>> will #PF.
>
>The key point is that a write is generated when the cpu needs to set the
>access bit. I agree the failure points are different. A TLB fault vs a
>case where the hardware did not accept the write.
>
>The idea of a cpu reading back data (and not trusting its cache
>coherency controls) to verify the access bit gets set seems mind
>boggling. That is slow, stupid, racy and incorrect. Incorrect as the
>cpu should only set the access bit once per segment register load.
>
>In my case I am pretty certain it was something very weird with the
>hardware not accepting the write and either not acknowledging the bus
>transaction or cancelling it. In which case the cpu knew the write had
>not made it to the ``memory'' and was trying to cope.
>
>Eric

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: [PATCH] x86: Use entire page for the per-cpu GDT only if paravirt-enabled
On 09/29/2015 06:20 PM, Eric W. Biederman wrote:
> Linus Torvalds writes:
>
>> On Tue, Sep 29, 2015 at 1:35 PM, Andy Lutomirski wrote:
>>>
>>> Does anyone know what happens if you stick a non-accessed segment in
>>> the GDT, map the GDT RO, and access it?
>>
>> You should get a #PF, as you guess, but go ahead and test it if you
>> want to make sure.
>
> I tested this by accident once when working on what has become known
> as coreboot. Early in boot with your GDT in a EEPROM switching from
> real mode to 32bit protected mode causes a write and locks up the
> machine when the hardware declines the write to the GDT to set the
> accessed bit. As I recall the write kept being retried and retried and
> retried...
>
> Setting the access bit in the GDT cleared up the problem and I did not
> look back.
>
> Way up in 64bit mode something might be different, but I don't know why
> cpu designers would waste the silicon.

This is totally different from a TLB violation. In your case, the write goes through as far as the CPU is concerned, but when the data is fetched back, it hasn't changed. A write to a TLB-protected location will #PF.

-hpa
Re: [PATCH] x86: Use entire page for the per-cpu GDT only if paravirt-enabled
SGDT would be easy to use, and it is logical that it is faster since it reads an internal register. SIDT does too but unlike the GDT has a secondary limit (it can never be larger than 4096 bytes) and so all limits in the range 4095-65535 are exactly equivalent.

Anything that causes a write to the GDT will #PF if read-only. So yes, we need to force the accessed bit to be set. This shouldn't be a problem and in fact ought to be a performance improvement.

On September 29, 2015 10:35:38 AM PDT, Andy Lutomirski <l...@amacapital.net> wrote:
>On Sep 29, 2015 2:01 AM, "Ingo Molnar" <mi...@kernel.org> wrote:
>>
>> * Denys Vlasenko <dvlas...@redhat.com> wrote:
>>
>> > On 09/28/2015 09:58 AM, Ingo Molnar wrote:
>> > >
>> > > * Denys Vlasenko <dvlas...@redhat.com> wrote:
>> > >
>> > >> On 09/26/2015 09:50 PM, H. Peter Anvin wrote:
>> > >>> NAK. We really should map the GDT read-only on all 64 bit systems,
>> > >>> since we can't hide the address from SLDT. Same with the IDT.
>> > >>
>> > >> Sorry, I don't understand your point.
>> > >
>> > > So the problem is that right now the SGDT instruction (which is unprivileged)
>> > > leaks the real address of the kernel image:
>> > >
>> > > fomalhaut:~> ./sgdt
>> > > SGDT: 88303fd89000 / 007f
>> > >
>> > > that '88303fd89000' is a kernel address.
>> >
>> > Thank you.
>> > I do know that SGDT and friends are unprivileged on x86
>> > and thus they allow userspace (and guest kernels in paravirt)
>> > learn things they don't need to know.
>> >
>> > I don't see how making GDT page-aligned and page-sized
>> > changes anything in this regard. SGDT will still work,
>> > and still leak GDT address.
>>
>> Well, as I try to explain it in the other part of my mail, doing so enables us to
>> remap the GDT to a less security sensitive virtual address that does not leak the
>> kernel's randomized address:
>>
>> > > Your observation in the changelog and your patch:
>> > >
>> > >>>> It is page-sized because of paravirt. [...]
>> > >
>> > > ... conflicts with the intention to mark (remap) the primary GDT address read-only
>> > > on native kernels as well.
>> > >
>> > > So what we should do instead is to use the page alignment properly and remap the
>> > > GDT to a read-only location, and load that one.
>> >
>> > If we'd have a small GDT (i.e. what my patch does), we still can remap the
>> > entire page which contains small GDT, and simply don't care that some other data
>> > is also visible through that RO page.
>>
>> That's generally considered fragile: suppose an attacker has a limited information
>> leak that can read absolute addresses with system privilege but he doesn't know
>> the kernel's randomized base offset. With a 'partial page' mapping there could be
>> function pointers near the GDT, part of the page the GDT happens to be on, that
>> leak this information.
>>
>> (Same goes for crypto keys or other critical information (like canary information,
>> salts, etc.) accidentally ending up nearby.)
>>
>> Arguably it's a bit tenuous, but when playing remapping games it's generally
>> considered good to be page aligned and page sized, with zero padding.
>>
>> > > This would have a couple of advantages:
>> > >
>> > > - This would give kernel address randomization more teeth on x86.
>> > >
>> > > - An additional advantage would be that rootkits overwriting the GDT would have
>> > >   a bit more work to do.
>> > >
>> > > - A third advantage would be that for NUMA systems we could 'mirror' the GDT into
>> > >   node-local memory and load those. This makes GDT load cache-misses a bit less
>> > >   expensive.
>> >
>> > GDT is per-cpu. Isn't per-cpu memory already NUMA-local?
>>
>> Indeed it is:
>>
>> fomalhaut:~> for ((cpu=1; cpu<9; cpu++)); do taskset $cpu ./sgdt ; done
>> SGDT: 88103fa09000 / 007f
>> SGDT: 88103fa29000 / 007f
>> SGDT: 88103fa29000 / 007f
>> SGDT: 88103fa49000 / 007f
>> SGDT: 88103fa49000 / 007f
>> SGDT: 88103fa49000 / 007f
>> SGDT: 88103fa29000 / 007f
>> SGDT: 88103fa69000 / 007f
>>
>> I confused it wi
Re: [PATCH] x86: Use entire page for the per-cpu GDT only if paravirt-enabled
Ugh. Didn't realize that.

On September 29, 2015 11:22:04 AM PDT, Andy Lutomirski <l...@amacapital.net> wrote:
>On Tue, Sep 29, 2015 at 11:18 AM, H. Peter Anvin <h...@zytor.com> wrote:
>> SGDT would be easy to use, and it is logical that it is faster since
>> it reads an internal register. SIDT does too but unlike the GDT has a
>> secondary limit (it can never be larger than 4096 bytes) and so all
>> limits in the range 4095-65535 are exactly equivalent.
>
>Using the IDT limit would have been a great idea if Intel hadn't
>decided to clobber it on every VM exit.
>
>--Andy
Re: [PATCH] x86: Use entire page for the per-cpu GDT only if paravirt-enabled
On 09/29/2015 11:02 AM, Andy Lutomirski wrote:
> On Tue, Sep 29, 2015 at 10:50 AM, Linus Torvalds wrote:
>> On Tue, Sep 29, 2015 at 1:35 PM, Andy Lutomirski wrote:
>>>
>>> Does anyone know what happens if you stick a non-accessed segment in
>>> the GDT, map the GDT RO, and access it?
>>
>> You should get a #PF, as you guess, but go ahead and test it if you
>> want to make sure.
>
> Then I think that, if we do this, the patch series should first make
> it percpu and fixmapped but RW and then flip it RO as a separate patch
> in case we need to revert the actual RO bit. I don't want to break
> Wine or The Witcher 2 because of this, and we might need various
> fixups. I really hope that no one uses get_thread_area to check
> whether TLS has been accessed.

Of course.

-hpa
Re: [PATCH] x86: Use entire page for the per-cpu GDT only if paravirt-enabled
NAK. We really should map the GDT read-only on all 64 bit systems, since we can't hide the address from SLDT. Same with the IDT.

On September 26, 2015 11:00:40 AM PDT, Denys Vlasenko <dvlas...@redhat.com> wrote:
>We have our GDT in a page-sized per-cpu structure, gdt_page.
>
>On x86_64 kernel, GDT is 128 bytes - only ~3% of that page is used.
>
>It is page-sized because of paravirt. Hypervisors need to know when
>GDT is changed, so they remap it read-only and handle write faults.
>If it's not in its own page, other writes nearby will cause
>those faults too.
>
>In other words, we need GDT to live in a separate page
>only if CONFIG_HYPERVISOR_GUEST=y.
>
>This patch reduces GDT alignment to cacheline-aligned
>if CONFIG_HYPERVISOR_GUEST is not set.
>
>Patch also renames gdt_page to cpu_gdt (mimicking naming of existing
>cpu_tss per-cpu variable), since now it is not always a full page.
>
>$ objdump -x vmlinux | grep .data..percpu | sort
>Before:
>(offset)   (size)
>           wO .data..percpu  4000  irq_stack_union
>4000       wO .data..percpu  5000  exception_stacks
>9000       wO .data..percpu  1000  gdt_page          <<< HERE
>a000       wO .data..percpu  0008  espfix_waddr
>a008       wO .data..percpu  0008  espfix_stack
>...
>00019398   g  .data..percpu        __per_cpu_end
>After:
>           wO .data..percpu  4000  irq_stack_union
>4000       wO .data..percpu  5000  exception_stacks
>9000       wO .data..percpu  0008  espfix_waddr
>9008       wO .data..percpu  0008  espfix_stack
>...
>00013c80   wO .data..percpu  0040  cyc2ns
>00013cc0   wO .data..percpu  22c0  cpu_tss
>00015f80   wO .data..percpu  0080  cpu_gdt           <<< HERE
>00016000   wO .data..percpu  0018  cpu_tlbstate
>...
>00018418   g  .data..percpu  00000000  __per_cpu_end
>
>Run-tested on a 144 CPU machine.
>
>Signed-off-by: Denys Vlasenko <dvlas...@redhat.com>
>CC: Ingo Molnar <mi...@kernel.org>
>CC: H. Peter Anvin <h...@zytor.com>
>CC: Konrad Rzeszutek Wilk <konrad.w...@oracle.com>
>CC: Boris Ostrovsky <boris.ostrov...@oracle.com>
>CC: David Vrabel <david.vra...@citrix.com>
>CC: Joerg Roedel <j...@8bytes.org>
>CC: Gleb Natapov <g...@kernel.org>
>CC: Paolo Bonzini <pbonz...@redhat.com>
>CC: kvm@vger.kernel.org
>CC: x...@kernel.org
>CC: linux-ker...@vger.kernel.org
>---
> arch/x86/entry/entry_32.S        |  2 +-
> arch/x86/include/asm/desc.h      | 16 +++-
> arch/x86/kernel/cpu/common.c     | 10 --
> arch/x86/kernel/cpu/perf_event.c |  2 +-
> arch/x86/kernel/head_32.S        |  4 ++--
> arch/x86/kernel/head_64.S        |  2 +-
> arch/x86/kernel/vmlinux.lds.S    |  2 +-
> arch/x86/tools/relocs.c          |  2 +-
> arch/x86/xen/enlighten.c         |  4 ++--
> 9 files changed, 28 insertions(+), 16 deletions(-)
>
>diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
>index b2909bf..bc6ae1c 100644
>--- a/arch/x86/entry/entry_32.S
>+++ b/arch/x86/entry/entry_32.S
>@@ -429,7 +429,7 @@ ldt_ss:
> * compensating for the offset by changing to the ESPFIX segment with
> * a base address that matches for the difference.
> */
>-#define GDT_ESPFIX_SS PER_CPU_VAR(gdt_page) + (GDT_ENTRY_ESPFIX_SS * 8)
>+#define GDT_ESPFIX_SS PER_CPU_VAR(cpu_gdt) + (GDT_ENTRY_ESPFIX_SS * 8)
> mov %esp, %edx /* load kernel esp */
> mov PT_OLDESP(%esp), %eax /* load userspace esp */
> mov %dx, %ax /* eax: new kernel esp */
>diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
>index 4e10d73..76de300 100644
>--- a/arch/x86/include/asm/desc.h
>+++ b/arch/x86/include/asm/desc.h
>@@ -39,15 +39,21 @@ extern gate_desc idt_table[];
> extern struct desc_ptr debug_idt_descr;
> extern gate_desc debug_idt_table[];
>
>-struct gdt_page {
>+struct cpu_gdt {
> struct desc_struct gdt[GDT_ENTRIES];
>-} __attribute__((aligned(PAGE_SIZE)));
>-
>-DECLARE_PER_CPU_PAGE_ALIGNED(struct gdt_page, gdt_page);
>+}
>+#ifdef CONFIG_HYPERVISOR_GUEST
>+/* Xen et al want GDT to have its own page. They remap it read-only */
>+__attribute__((aligned(PAGE_SIZE)));
>+DECLARE_PER_CPU_PAGE_ALIGNED(struct cpu_gdt, cpu_gdt);
>+#else
>+cacheline_aligned;
>+DECLARE_PER_CPU_ALIGNED(s
Re: [PATCH 0/3] x86/paravirt: Fix baremetal paravirt MSR ops
However, the difference between one CONFIG and another is quite frankly crazy. We should explicitly use the safe versions where this is appropriate, and then yes, we should do this. Yet another reason the paravirt code is batshit crazy.

On September 17, 2015 2:31:34 AM PDT, Borislav Petkov wrote:
>On Thu, Sep 17, 2015 at 09:19:20AM +0200, Ingo Molnar wrote:
>> Most big distro kernels on bare metal have CONFIG_PARAVIRT=y (I checked
>> Ubuntu and Fedora), so we are potentially exposing a lot of users to
>> problems.
>
>+ SUSE.
>
>> Crashing the bootup on an unknown MSR is bad. Many MSR reads and writes
>> are non-critical and returning the 'safe' result is much better than
>> crashing or hanging the bootup.
>
>... and prepending all MSR accesses with feature/CPUID checks is
>probably almost impossible.

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: [v6] kvm/fpu: Enable fully eager restore kvm FPU
On 04/23/2015 08:28 AM, Dave Hansen wrote:
> On 04/23/2015 02:13 PM, Liang Li wrote:
>> When compiling kernel on westmere, the performance of eager FPU
>> is about 0.4% faster than lazy FPU.
>
> Do you have a theory why this is? What does the regression come from?

This is interesting since previous measurements on KVM have had the exact opposite results. I think we need to understand this a lot more.

-hpa
Re: [v3 21/26] x86, irq: Define a global vector for VT-d Posted-Interrupts
On 12/12/2014 07:14 AM, Feng Wu wrote:
> Currently, we use a global vector as the Posted-Interrupts Notification
> Event for all the vCPUs in the system. We need to introduce another
> global vector for VT-d Posted-Interrupts, which will be used to wakeup
> the sleep vCPU when an external interrupt from a direct-assigned device
> happens for that vCPU.
>
> Signed-off-by: Feng Wu <feng...@intel.com>
>
> #ifdef CONFIG_HAVE_KVM
> +void (*wakeup_handler_callback)(void) = NULL;
> +EXPORT_SYMBOL_GPL(wakeup_handler_callback);
> +

Stylistic nitpick: we generally don't explicitly initialize global/static pointer variables to NULL (that happens automatically anyway.)

Other than that,

Acked-by: H. Peter Anvin <h...@linux.intel.com>
Re: [Xen-devel] [RFC] Hypervisor RNG and enumeration
On 10/29/2014 03:37 AM, Andrew Cooper wrote:
>> CPUID with EAX = 0x4F01 and ECX = N MUST return all zeros.
>>
>> To the extent that the hypervisor prefers a given interface, it should
>> specify that interface earlier in the list. For example, KVM might place
>> its KVMKVMKVM signature first in the list to indicate that it should be
>> used by guests in preference to other supported interfaces. Other
>> hypervisors would likely use a different order. The exact semantics of
>> the ordering of the list is beyond the scope of this specification.
>
> How do you evaluate N? It would make more sense for CPUID.4F01[ECX=0]
> to return N in one register, and perhaps preferred interface index in
> another. The signatures can then be obtained from CPUID.4F01[ECX={1 to N}].
>
> That way, a consumer can be confident that they have found all the
> signatures, without relying on an unbounded loop and checking for zeroes.

Yes. Specifically, it should return it in EAX. That is the preferred interface and we are trying to push for that going forward.

-hpa
Re: Standardizing an MSR or other hypercall to get an RNG seed?
On 09/22/2014 06:31 AM, Christopher Covington wrote:
> On 09/19/2014 05:46 PM, H. Peter Anvin wrote:
>> On 09/19/2014 01:46 PM, Andy Lutomirski wrote:
>>>> However, it sounds to me that at least for KVM, it is very easy just
>>>> to emulate the RDRAND instruction. The hypervisor would report to the
>>>> guest that RDRAND is supported in CPUID and then emulate the
>>>> instruction when the guest executes it. KVM already traps guest #UD
>>>> (which would occur if RDRAND executed while it is not supported) - so
>>>> this scheme wouldn’t introduce additional overhead over RDMSR.
>>>
>>> Because then guest user code will think that rdrand is there and will
>>> try to use it, resulting in abysmal performance.
>>
>> Yes, the presence of RDRAND implies a cheap and inexhaustible entropy
>> source.
>
> A guest kernel couldn't make it look like RDRAND is not present to
> guest userspace?

It could, but how would you enumerate that? A new RDRAND-CPL-0 CPUID bit pretty much would be required.

-hpa
Re: Standardizing an MSR or other hypercall to get an RNG seed?
On 09/22/2014 07:17 AM, H. Peter Anvin wrote:
> It could, but how would you enumerate that? A new RDRAND-CPL-0 CPUID
> bit pretty much would be required.

Note that there are two things that differ: the CPL 0-ness and the performance/exhaustibility attributes.

-hpa
Re: Standardizing an MSR or other hypercall to get an RNG seed?
Not really, no.

Sent from my tablet, pardon any formatting problems.

> On Sep 22, 2014, at 06:31, Christopher Covington c...@codeaurora.org wrote:
>
> On 09/19/2014 05:46 PM, H. Peter Anvin wrote:
>> On 09/19/2014 01:46 PM, Andy Lutomirski wrote:
>>>> However, it sounds to me that at least for KVM, it is very easy just
>>>> to emulate the RDRAND instruction. The hypervisor would report to the
>>>> guest that RDRAND is supported in CPUID and then emulate the
>>>> instruction when the guest executes it. KVM already traps guest #UD
>>>> (which would occur if RDRAND executed while it is not supported) - so
>>>> this scheme wouldn’t introduce additional overhead over RDMSR.
>>>
>>> Because then guest user code will think that rdrand is there and will
>>> try to use it, resulting in abysmal performance.
>>
>> Yes, the presence of RDRAND implies a cheap and inexhaustible entropy
>> source.
>
> A guest kernel couldn't make it look like RDRAND is not present to
> guest userspace?
>
> Christopher
>
> --
> Employee of Qualcomm Innovation Center, Inc.
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> hosted by the Linux Foundation.
Re: Standardizing an MSR or other hypercall to get an RNG seed?
On 09/19/2014 09:37 AM, Gleb Natapov wrote:
> Linux detects what hypervisor it runs on very early

Not anywhere close to early enough. We're talking for uses like kASLR.

-hpa
Re: Standardizing an MSR or other hypercall to get an RNG seed?
On 09/19/2014 09:14 AM, Nakajima, Jun wrote:
> I slept on it, and I think using the CPUID instruction alone would be
> simple and efficient:
>
> - We have a huge space for CPUID leaves
> - CPUID also works for user-level
> - It can take an additional 32-bit parameter (ECX), and returns 4
>   32-bit values (EAX, EBX, ECX, and EDX). RDMSR, for example, returns
>   a 64-bit value.
>
> Basically we can use it to implement a hypercall (rather than VMCALL).
> For example,
> - CPUID 0x4801.EAX would return the feature presence (e.g. in EBX), and
>   the result in EDX:EAX (if present) at the same time, or
> - CPUID 0x4801.EAX would return the feature presence only, and CPUID
>   0x4802.EAX (acts like a hypercall) returns up to 4 32-bit values.

There is a huge disadvantage to the fact that CPUID is a user space instruction, though.

-hpa
Re: Standardizing an MSR or other hypercall to get an RNG seed?
On 09/19/2014 09:53 AM, Gleb Natapov wrote:
> On Fri, Sep 19, 2014 at 09:40:07AM -0700, H. Peter Anvin wrote:
>> On 09/19/2014 09:37 AM, Gleb Natapov wrote:
>>> Linux detects what hypervisor it runs on very early
>>
>> Not anywhere close to early enough. We're talking for uses like kASLR.
>
> Still too early to do:
>
>   h = cpuid(HYPERVISOR_SIGNATURE)
>   if (h == KVMKVMKVM) {
>      if (cpuid(kvm_features) & kvm_rnd)
>         rdmsr(kvm_rnd)
>   } else (h == HyperV) {
>      if (cpuid(hv_features) & hv_rnd)
>         rdmsr(hv_rnd)
>   } else (h == XenXenXen) {
>      if (cpuid(xen_features) & xen_rnd)
>         rdmsr(xen_rnd)
>   }

If we need to do chase loops, especially not so...

-hpa
Re: Standardizing an MSR or other hypercall to get an RNG seed?
On 09/19/2014 10:15 AM, Gleb Natapov wrote:
> On Fri, Sep 19, 2014 at 10:08:20AM -0700, H. Peter Anvin wrote:
>> On 09/19/2014 09:53 AM, Gleb Natapov wrote:
>>> Still too early to do:
>>>
>>>   h = cpuid(HYPERVISOR_SIGNATURE)
>>>   if (h == KVMKVMKVM) {
>>>      if (cpuid(kvm_features) & kvm_rnd)
>>>         rdmsr(kvm_rnd)
>>>   } else (h == HyperV) {
>>>      if (cpuid(hv_features) & hv_rnd)
>>>         rdmsr(hv_rnd)
>>>   } else (h == XenXenXen) {
>>>      if (cpuid(xen_features) & xen_rnd)
>>>         rdmsr(xen_rnd)
>>>   }
>>
>> If we need to do chase loops, especially not so...
>
> What loops exactly? As a non native English speaker I fail to
> understand if your answer is yes or no ;)

The above isn't actually the full algorithm used.

-hpa
Re: Standardizing an MSR or other hypercall to get an RNG seed?
On 09/19/2014 10:21 AM, Andy Lutomirski wrote:
>> There is a huge disadvantage to the fact that CPUID is a user space
>> instruction, though.
>
> We can always make cpuid on the leaf in question return all zeros if
> CPL > 0.

Not sure that is better...

-hpa
Re: Standardizing an MSR or other hypercall to get an RNG seed?
On 09/19/2014 01:46 PM, Andy Lutomirski wrote:
>> However, it sounds to me that at least for KVM, it is very easy just
>> to emulate the RDRAND instruction. The hypervisor would report to the
>> guest that RDRAND is supported in CPUID and then emulate the
>> instruction when the guest executes it. KVM already traps guest #UD
>> (which would occur if RDRAND executed while it is not supported) - so
>> this scheme wouldn’t introduce additional overhead over RDMSR.
>
> Because then guest user code will think that rdrand is there and will
> try to use it, resulting in abysmal performance.

Yes, the presence of RDRAND implies a cheap and inexhaustible entropy source.

-hpa
Re: Standardizing an MSR or other hypercall to get an RNG seed?
On 09/19/2014 04:12 PM, Andy Lutomirski wrote:
> To force deterministic execution. I incorrectly thought that the kernel
> could switch RDRAND on and off. It turns out that a hypervisor can do
> this, but not the kernel. Also, determinism is lost anyway because of
> TSX, which *also* can't be turned on and off.

Actually, a much bigger reason is because it lets rogue guest *user space*, even with a well-behaved guest OS, do something potentially harmful to the host.

-hpa
Re: Standardizing an MSR or other hypercall to get an RNG seed?
On 09/19/2014 04:35 PM, Theodore Ts'o wrote:
> On Fri, Sep 19, 2014 at 04:29:53PM -0700, H. Peter Anvin wrote:
>> Actually, a much bigger reason is because it lets rogue guest *user
>> space*, even with a well-behaved guest OS, do something potentially
>> harmful to the host.
>
> Right, but if the host kernel is dependent on the guest OS for
> security, the game is over. The Guest Kernel must NEVER be able to do
> anything harmful to the host. If it can, it is a severe security bug
> in KVM that must be fixed ASAP.

Security and being resource well-behaved are two different things.

-hpa
Re: Standardizing an MSR or other hypercall to get an RNG seed?
On 09/18/2014 07:40 AM, KY Srinivasan wrote: The main questions are what MSR index to use and how to detect the presence of the MSR. I've played with two approaches: 1. Use CPUID to detect the presence of this feature. This is very easy for KVM to implement by using a KVM-specific CPUID feature. The problem is that this will necessarily be KVM-specific, as the guest must first probe for KVM and then probe for the KVM feature. I doubt that Hyper-V, for example, wants to claim to be KVM. If we could standardize a non-hypervisor-specific CPUID feature, then this problem would go away. We would prefer a CPUID feature bit to detect this feature. I guess if we're introducing the concept of pan-OS MSRs we could also have pan-OS CPUID. The real issue is to get a single non-conflicting standard. -hpa
Re: Standardizing an MSR or other hypercall to get an RNG seed?
Quite frankly it might make more sense to define a cross-VM *cpuid* range. The cpuid leaf can just point to the MSR. The big question is who will be willing to be the registrar. On September 18, 2014 11:35:39 AM PDT, Andy Lutomirski <l...@amacapital.net> wrote: On Thu, Sep 18, 2014 at 10:42 AM, Nakajima, Jun <jun.nakaj...@intel.com> wrote: On Thu, Sep 18, 2014 at 10:20 AM, KY Srinivasan <k...@microsoft.com> wrote: -----Original Message----- From: Paolo Bonzini [mailto:paolo.bonz...@gmail.com] On Behalf Of Paolo Bonzini Sent: Thursday, September 18, 2014 10:18 AM To: Nakajima, Jun; KY Srinivasan Cc: Mathew John; Theodore Ts'o; John Starks; kvm list; Gleb Natapov; Niels Ferguson; Andy Lutomirski; David Hepkin; H. Peter Anvin; Jake Oshins; Linux Virtualization Subject: Re: Standardizing an MSR or other hypercall to get an RNG seed? Il 18/09/2014 19:13, Nakajima, Jun ha scritto: In terms of the address for the MSR, I suggest that you choose one from the range between 40000000H - 400000FFH. The SDM (35.1 ARCHITECTURAL MSRS) says "All existing and future processors will not implement any features using any MSR in this range." Hyper-V already defines many synthetic MSRs in this range, and I think it would be reasonable for you to pick one for this to avoid a conflict? KVM is not using any MSR in that range. However, I think it would be better to have the MSR (and perhaps CPUID) outside the hypervisor-reserved ranges, so that it becomes architecturally defined. In some sense it is similar to the HYPERVISOR CPUID feature. Yes, given that we want this to be hypervisor agnostic. Actually, that MSR address range has been reserved for that purpose, along with:
- CPUID.EAX=1, ECX bit 31 (always returns 0 on bare metal)
- CPUID.EAX=4000_00xxH leaves (i.e. HYPERVISOR CPUID)
I don't know whether this is documented anywhere, but Linux tries to detect a hypervisor by searching CPUID leaves 0x400xyz00 for "KVMKVMKVM\0\0\0", so at least Linux can handle the KVM leaves being in a somewhat variable location.
Do we consider this mechanism to work across all hypervisors and guests? That is, could we put something like CrossHVPara\0 somewhere in that range, where each hypervisor would be free to decide exactly where it ends up? --Andy -- Sent from my mobile phone. Please pardon brevity and lack of formatting.
Re: Standardizing an MSR or other hypercall to get an RNG seed?
On 09/18/2014 02:46 PM, David Hepkin wrote: I'm not sure what you mean by "this mechanism"? Are you suggesting that each hypervisor put "CrossHVPara\0" somewhere in the 0x40000000 - 0x400fffff CPUID range, and an OS has to do a full scan of this CPUID range on boot to find it? That seems pretty inefficient. An OS will take 1000's of hypervisor intercepts on every boot just to search this CPUID range. I suggest we come to consensus on a specific CPUID leaf where an OS needs to look to determine if a hypervisor supports this capability. We could define a new CPUID leaf range at a well-defined location, or we could just use one of the existing CPUID leaf ranges implemented by an existing hypervisor. I'm not familiar with the KVM CPUID leaf range, but in the case of Hyper-V, the Hyper-V CPUID leaf range was architected to allow for other hypervisors to implement it and just show through specific capabilities supported by the hypervisor. So, we could define a bit in the Hyper-V CPUID leaf range (since Xen and KVM also implement this range), but that would require Linux to look in that range on boot to discover this capability. Yes, I would agree that if anything we should define a new range unique to this cross-VM interface, e.g. 0x48000000. -hpa
Re: Standardizing an MSR or other hypercall to get an RNG seed?
On 09/18/2014 03:00 PM, Andy Lutomirski wrote: On Thu, Sep 18, 2014 at 2:46 PM, David Hepkin <david...@microsoft.com> wrote: I'm not sure what you mean by "this mechanism"? Are you suggesting that each hypervisor put "CrossHVPara\0" somewhere in the 0x40000000 - 0x400fffff CPUID range, and an OS has to do a full scan of this CPUID range on boot to find it? That seems pretty inefficient. An OS will take 1000's of hypervisor intercepts on every boot just to search this CPUID range. Linux already does this, which is arguably unfortunate. But it's not quite that bad; the KVM and Xen code is only scanning at increments of 0x100. I think that Linux as a guest would have no problem with checking the Hyper-V range or some new range. I don't think that Linux would want to have to set a guest OS identity, and it's not entirely clear to me whether this would be necessary to use the Hyper-V mechanism. We really don't want to have to do this in early code, though. I suggest we come to consensus on a specific CPUID leaf where an OS needs to look to determine if a hypervisor supports this capability. We could define a new CPUID leaf range at a well-defined location, or we could just use one of the existing CPUID leaf ranges implemented by an existing hypervisor. I'm not familiar with the KVM CPUID leaf range, but in the case of Hyper-V, the Hyper-V CPUID leaf range was architected to allow for other hypervisors to implement it and just show through specific capabilities supported by the hypervisor. So, we could define a bit in the Hyper-V CPUID leaf range (since Xen and KVM also implement this range), but that would require Linux to look in that range on boot to discover this capability. I also don't know whether QEMU and KVM would be okay with implementing the host side of the Hyper-V mechanism by default. They would have to implement at least leaves 0x40000001 and 0x40000002, plus correctly reporting zeros through whatever leaf is used for this new feature. Gleb? Paolo?
The problem is what happens with a noncooperating hypervisor. I guess we could put a magic number in one of the leaf registers, but still... -hpa
Re: GET_RNG_SEED hypercall ABI? (Re: [PATCH v5 0/5] random,x86,kvm: Rework arch RNG seeds and get some from kvm)
On 08/27/2014 12:00 AM, Paolo Bonzini wrote: Il 27/08/2014 01:58, Andy Lutomirski ha scritto: hpa pointed out that the ABI that I chose (an MSR from the KVM range and a KVM cpuid bit) is unnecessarily KVM-specific. It would be nice to allocate an MSR that everyone involved can agree on and, rather than relying on a cpuid bit, just have the guest probe for the MSR. This leads to a few questions: 1. How do we allocate an MSR? (For background, this would be an MSR that either returns 64 bits of best-effort cryptographically secure random data or fails with #GP.) Ask Intel? :) I'm going to poke around internally. Intel might as a matter of policy be reluctant to assign an MSR index specifically for software use, but I'll try to find out. -hpa
Re: [PATCH v5 0/5] random,x86,kvm: Rework arch RNG seeds and get some from kvm
On 08/12/2014 12:22 PM, Andy Lutomirski wrote: On Tue, Aug 12, 2014 at 12:17 PM, Theodore Ts'o <ty...@mit.edu> wrote: On Tue, Aug 12, 2014 at 12:11:29PM -0700, Andy Lutomirski wrote: What's the status of this series? I assume that it's too late for at least patches 2-5 to make it into 3.17. Which tree were you hoping this patch series to go through? I was assuming it would go through the x86 tree since the bulk of the changes are in the x86 subsystem (hence my Acked-by). There's some argument that patch 1 should go through the kvm tree. There's no real need for patch 1 and 2-5 to end up in the same kernel release, either. IIRC, Peter had some concerns, and I don't remember if they were all addressed. Peter? I don't know. I rewrote one thing he didn't like and undid the other, but there's plenty of opportunity for this version to be problematic, too. Sorry, I have been heads down on the current merge window. I will look at this for 3.18, presumably after Kernel Summit. The proposed arch_get_rng_seed() is not really what it claims to be; it most definitely does not produce seed-grade randomness, instead it seems to be an arch function for best-effort initialization of the entropy pools -- which is fine, it is just something quite different. I want to look over it more carefully before acking it, though. Andy, are you going to be in Chicago? -hpa
Re: [PATCH v5 0/5] random,x86,kvm: Rework arch RNG seeds and get some from kvm
On 08/13/2014 09:13 AM, Andy Lutomirski wrote: Sounds good to me. FWIW, I'd like to see a second use added in random.c: I think that we should do this, or even all of init_std_data, on resume from suspend and especially on resume from hibernate / kexec. Yes, we should. We also need to make it possible to do this after cloning a VM. -hpa
Re: [PATCH v5 0/5] random,x86,kvm: Rework arch RNG seeds and get some from kvm
On 08/13/2014 11:33 AM, Andy Lutomirski wrote: As for doing arch_random_init after clone/migration, I think we'll need another KVM extension for that, since, AFAIK, we don't actually get notified that we were cloned or migrated. That will be nontrivial. Maybe we can figure that out at KS, too. We don't need a reset when migrated (although it might be a good idea under some circumstances, i.e. if the pools might somehow have gotten exposed) but definitely when cloned. -hpa
Re: [PATCH v5 0/5] random,x86,kvm: Rework arch RNG seeds and get some from kvm
On 08/13/2014 11:44 AM, H. Peter Anvin wrote: On 08/13/2014 11:33 AM, Andy Lutomirski wrote: As for doing arch_random_init after clone/migration, I think we'll need another KVM extension for that, since, AFAIK, we don't actually get notified that we were cloned or migrated. That will be nontrivial. Maybe we can figure that out at KS, too. We don't need a reset when migrated (although it might be a good idea under some circumstances, i.e. if the pools might somehow have gotten exposed) but definitely when cloned. But yes, we need a notification. For obvious reasons there is no suspend event (one can snapshot a running VM) but we need to be notified upon wakeup, *or* we need to give KVM a way to update the necessary state. -hpa
Re: [PATCH 1/1] virtio: rng: add derating factor for use by hwrng core
On 08/11/2014 10:27 PM, Amit Shah wrote: On (Mon) 11 Aug 2014 [15:11:03], H. Peter Anvin wrote: On 08/11/2014 11:49 AM, Amit Shah wrote: The khwrngd thread is started when a hwrng device of sufficient quality is registered. The virtio-rng device is backed by the hypervisor, and we trust the hypervisor to provide real entropy. A malicious hypervisor is a scenario that's ruled out, so we are certain the quality of randomness we receive is perfectly trustworthy. Hence, we use 100% for the factor, indicating maximum confidence in the source. Signed-off-by: Amit Shah <amit.s...@redhat.com> It isn't ruled out, it is just irrelevant: if the hypervisor is malicious, the quality of your random number source is the least of your problems. Yea; I meant "ruled out" in that sense. Should the commit msg be more verbose? Yes, as it is written it is misleading. -hpa
Re: [PATCH 1/1] virtio: rng: add derating factor for use by hwrng core
On 08/11/2014 11:49 AM, Amit Shah wrote: The khwrngd thread is started when a hwrng device of sufficient quality is registered. The virtio-rng device is backed by the hypervisor, and we trust the hypervisor to provide real entropy. A malicious hypervisor is a scenario that's ruled out, so we are certain the quality of randomness we receive is perfectly trustworthy. Hence, we use 100% for the factor, indicating maximum confidence in the source. Signed-off-by: Amit Shah <amit.s...@redhat.com> It isn't ruled out, it is just irrelevant: if the hypervisor is malicious, the quality of your random number source is the least of your problems. -hpa
Re: [PATCH v4 2/5] random: Add and use arch_get_rng_seed
On 07/22/2014 01:44 PM, Andy Lutomirski wrote: But, if Intel's hardware does, in fact, work as documented, then the current code will collect very little entropy on RDSEED-less hardware. I see no great reason that we should do something weaker than following Intel's explicit recommendation for how to seed a PRNG from RDRAND. Very little entropy in the architectural worst case. However, since we are running single-threaded at this point, actual hardware performs orders of magnitude better. Since we run the mixing function (for no particularly good reason -- it is a linear function and doesn't add security) there will be enough delay that RDRAND will in practice catch up and the output will be quite high quality. Since the pool is quite large, the likely outcome is that there will be enough randomness that in practice we would probably be okay if *no* further entropy was ever collected. Another benefit of this split is that it will potentially allow arch_get_rng_seed to be made to work before alternatives are run. There's no fundamental reason that it couldn't work *extremely* early in boot. (The KASLR code is an example of how this might work.) On the other hand, making arch_get_random_long work very early in boot would either slow down all the other callers or add a considerable amount of extra complexity. So I think that this patch is a slight improvement in RNG initialization and will actually result in simpler code. (And yes, if I submit a new version of it, I'll fix the changelog.) There really isn't any significant reason why we could not permit randomness initialization very early in the boot, indeed. It has largely been useless in the past because until the I/O system gets initialized there is no randomness of any kind available on traditional hardware. -hpa
Re: [PATCH v4 2/5] random: Add and use arch_get_rng_seed
On 07/22/2014 02:04 PM, Andy Lutomirski wrote: Just to check: do you mean that RDRAND is very likely to work (i.e. arch_get_random_long will return true) or that RDRAND will actually reseed several times during initialization? I mean that RDRAND will actually reseed several times during initialization. The documented architectural limit is actually extremely conservative. Either way, it isn't really different from seeding from a VM host's /dev/urandom... -hpa
Re: [PATCH v4 2/5] random: Add and use arch_get_rng_seed
On 07/22/2014 02:10 PM, Andy Lutomirski wrote: On Tue, Jul 22, 2014 at 2:08 PM, H. Peter Anvin <h...@zytor.com> wrote: On 07/22/2014 02:04 PM, Andy Lutomirski wrote: Just to check: do you mean that RDRAND is very likely to work (i.e. arch_get_random_long will return true) or that RDRAND will actually reseed several times during initialization? I mean that RDRAND will actually reseed several times during initialization. The documented architectural limit is actually extremely conservative. Either way, it isn't really different from seeding from a VM host's /dev/urandom... Sure it is. The VM host's /dev/urandom makes no guarantee (or AFAIK even any particular effort) to reseed such that the output has some minimum entropy per bit, so there would be no point to reading extra data from it. Depends on what you define as extra data. If the data pulled is less than the size of the output pool, it *may* be fully entropic. (Fun fact: it may even have been fully entropic at the time you pull it, but then turn out not to be later because *another* process consumed data from /dev/urandom without adequate reseeding.) Anyway, I'd be willing to drop the conservative RDRAND logic, but I *still* think that arch_get_rng_seed is a much better interface than arch_get_slow_rng_u64. That I will leave up to you and Ted. -hpa
Re: [PATCH v3 2/5] random,x86: Add arch_get_slow_rng_u64
On 07/17/2014 03:33 AM, Theodore Ts'o wrote: On Wed, Jul 16, 2014 at 09:55:15PM -0700, H. Peter Anvin wrote: On 07/16/2014 05:03 PM, Andy Lutomirski wrote: I meant that prandom isn't using rdrand for early seeding. We should probably fix that. It wouldn't hurt to explicitly use arch_get_random_long() in prandom, but it does use get_random_bytes() in early seed, and for CPUs with RDRAND present, we do use it in init_std_data() in drivers/char/random.c, so prandom is already getting initialized via an RNG (which is effectively a DRBG even if it doesn't pass all of NIST's rules) which is derived from RDRAND. I assumed he was referring to before alternatives. Not sure if we use prandom before that point, though. -hpa
Re: [PATCH 0/4] random,x86,kvm: Add and use MSR_KVM_GET_RNG_SEED
On 07/16/2014 07:07 AM, Andy Lutomirski wrote: This patch has nothing whatsoever to do with how much I trust the CPU vs the hypervisor. It's for the enormous installed base of machines without RDRAND. hpa suggested emulating RDRAND awhile ago, but I think that'll be unusably slow -- the kernel uses RDRAND in various places where it's expected to be fast, and not using it at all will be preferable to causing a VM exit for every few bytes. I've been careful to only use this in the guest in places where a few hundred to a few thousand cycles per 64 bits of RNG seed is acceptable. I suggested emulating RDRAND *but not set the CPUID bit*. We already developed a protocol in KVM/Qemu to enumerate emulated features (created for MOVBE as I recall), specifically to service the semantic "feature X will work but will be substantially slower than normal". -hpa
Re: [PATCH 0/4] random,x86,kvm: Add and use MSR_KVM_GET_RNG_SEED
On 07/16/2014 09:08 AM, Paolo Bonzini wrote: Il 16/07/2014 18:03, H. Peter Anvin ha scritto: I suggested emulating RDRAND *but not set the CPUID bit*. We already developed a protocol in KVM/Qemu to enumerate emulated features (created for MOVBE as I recall), specifically to service the semantic "feature X will work but will be substantially slower than normal". But those will set the CPUID bit. There is currently no way for KVM guests to know if a CPUID bit is real or emulated. OK, so there wasn't any protocol implemented in the end. I sit corrected. -hpa
Re: [PATCH 0/4] random,x86,kvm: Add and use MSR_KVM_GET_RNG_SEED
On 07/16/2014 09:21 AM, Gleb Natapov wrote: On Wed, Jul 16, 2014 at 09:13:23AM -0700, H. Peter Anvin wrote: On 07/16/2014 09:08 AM, Paolo Bonzini wrote: Il 16/07/2014 18:03, H. Peter Anvin ha scritto: I suggested emulating RDRAND *but not set the CPUID bit*. We already developed a protocol in KVM/Qemu to enumerate emulated features (created for MOVBE as I recall), specifically to service the semantic "feature X will work but will be substantially slower than normal". But those will set the CPUID bit. There is currently no way for KVM guests to know if a CPUID bit is real or emulated. OK, so there wasn't any protocol implemented in the end. I sit corrected. That protocol that was implemented is between qemu and kvm, not kvm and a guest. Either which way, the notion was to have a PV CPUID bit like the proposed kvm_get_rng_seed bit, but to have it exercised by executing RDRAND. The biggest reason to *not* do this would be that with an MSR it is not available to guest user space, which may be better under the circumstances. -hpa
Re: [PATCH 0/4] random,x86,kvm: Add and use MSR_KVM_GET_RNG_SEED
On 07/16/2014 02:32 PM, Andy Lutomirski wrote: On the theory that I see no legitimate reason to expose this to guest user space, I think we shouldn't expose it. If we wanted to add a get_random_bytes syscall, that would be an entirely different story, though. Should I send v3 as one series or should I split it into host and guest parts? It doesn't matter... as long as they are separate *patches*. -hpa
Re: [PATCH v3 2/5] random,x86: Add arch_get_slow_rng_u64
On 07/16/2014 02:45 PM, Andy Lutomirski wrote:

diff --git a/arch/x86/include/asm/archslowrng.h b/arch/x86/include/asm/archslowrng.h
new file mode 100644
index 000..c8e8d0d
--- /dev/null
+++ b/arch/x86/include/asm/archslowrng.h
@@ -0,0 +1,30 @@
+/*
+ * This file is part of the Linux kernel.
+ *
+ * Copyright (c) 2014 Andy Lutomirski
+ * Authors: Andy Lutomirski <l...@amacapital.net>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+
+#ifndef ASM_X86_ARCHSLOWRANDOM_H
+#define ASM_X86_ARCHSLOWRANDOM_H
+
+#ifndef CONFIG_ARCH_SLOW_RNG
+# error archslowrng.h should not be included if !CONFIG_ARCH_SLOW_RNG
+#endif
+

I'm *seriously* questioning the wisdom of this. A much saner thing would be to do:

#ifndef CONFIG_ARCH_SLOW_RNG
/* Not supported */
static inline int arch_get_slow_rng_u64(u64 *v)
{
	(void)v;
	return 0;
}
#endif

... which is basically what we do for the archrandom stuff.

I'm also wondering if it makes sense to have a function which prefers arch_get_random*() over this one as a preferred interface. Something like:

int get_random_arch_u64_slow_ok(u64 *v)
{
	int i;
	u64 x = 0;
	unsigned long l;

	for (i = 0; i < 64/BITS_PER_LONG; i++) {
		if (!arch_get_random_long(&l))
			return arch_get_slow_rng_u64(v);
		x |= l << (i*BITS_PER_LONG);
	}

	*v = x;
	return 0;
}

This still doesn't address the issue e.g. on x86 where RDRAND is available but we haven't set up alternatives yet. So it might be that what we really want is to encapsulate this fallback in arch code and do a more direct enumeration.
+
+static int kvm_get_slow_rng_u64(u64 *v)
+{
+	/*
+	 * Allow migration from a hypervisor with the GET_RNG_SEED
+	 * feature to a hypervisor without it.
+	 */
+	if (rdmsrl_safe(MSR_KVM_GET_RNG_SEED, v) == 0)
+		return 1;
+	else
+		return 0;
+}

How about:

	return rdmsrl_safe(MSR_KVM_GET_RNG_SEED, v) == 0;

The naming also feels really inconsistent... -hpa
Re: [PATCH v3 2/5] random,x86: Add arch_get_slow_rng_u64
On 07/16/2014 03:40 PM, Andy Lutomirski wrote: On Wed, Jul 16, 2014 at 3:13 PM, Andy Lutomirski <l...@amacapital.net> wrote: My personal preference is to defer this until some user shows up. I think that even this would be too complicated for KASLR, which is the only extremely early-boot user that I found. Hmm. Does the prandom stuff want to use this? prandom isn't even using rdrand. I'd suggest fixing this separately, or even just waiting until someone goes and deletes prandom. prandom is exactly the opposite; it is designed for when we need possibly low quality random numbers very quickly. RDRAND is actually too slow. -hpa
Re: [PATCH v3 2/5] random,x86: Add arch_get_slow_rng_u64
On 07/16/2014 05:03 PM, Andy Lutomirski wrote: prandom is exactly the opposite; it is designed for when we need possibly low quality random numbers very quickly. RDRAND is actually too slow. I meant that prandom isn't using rdrand for early seeding. We should probably fix that. -hpa
Re: [PATCH 9/9] KVM: x86: smsw emulation is incorrect in 64-bit mode
On 06/05/2014 08:02 AM, Nadav Amit wrote: I'm sorry, I'm missing the place where 64-bit mode is taken into account? It is not, since on 32-bit mode the high-order 16 bits of a register destination are undefined. If I recall correctly, in this case the high-order 16 bits on a native system actually reflect the high-order 16 bits of CR0. This sounds like something that really should be verified experimentally. The above claim seems... odd. -hpa
Re: [PATCH] x86: fix page fault tracing when KVM guest support enabled
On 05/16/2014 12:45 PM, Dave Hansen wrote: From: Dave Hansen <dave.han...@linux.intel.com> I noticed on some of my systems that page fault tracing doesn't work:

	cd /sys/kernel/debug/tracing
	echo 1 > events/exceptions/enable
	cat trace; # nothing shows up

I eventually traced it down to CONFIG_KVM_GUEST. At least in a KVM VM, enabling that option breaks page fault tracing, and disabling fixes it. I tried on some old kernels and this does not appear to be a regression: it never worked. There are two page-fault entry functions today. One when tracing is on and another when it is off. The KVM code calls do_page_fault() directly instead of calling the traced version:

dotraplinkage void __kprobes
do_async_page_fault(struct pt_regs *regs, unsigned long error_code)
{
	enum ctx_state prev_state;

	switch (kvm_read_and_reset_pf_reason()) {
	default:
		do_page_fault(regs, error_code);
		break;
	case KVM_PV_REASON_PAGE_NOT_PRESENT:

I'm also having problems with the page fault tracing on bare metal (same symptom of no trace output). I'm unsure if it's related. Steven had an alternative to this which has zero overhead when tracing is off, where this includes the standard noops even when tracing is disabled. I'm unconvinced that the extra complexity of his approach: http://lkml.kernel.org/r/20140508194508.561ed...@gandalf.local.home is worth it, especially considering that the KVM code is already making page fault entry slower here. This solution is dirt-simple. Gleb, please apply. Signed-off-by: Dave Hansen <dave.han...@linux.intel.com> Cc: Thomas Gleixner <t...@linutronix.de> Cc: x...@kernel.org Cc: Peter Zijlstra <pet...@infradead.org> Cc: Gleb Natapov <g...@redhat.com> Cc: H. Peter Anvin <h...@zytor.com> Cc: kvm@vger.kernel.org Cc: Paolo Bonzini <pbonz...@redhat.com> Cc: Steven Rostedt <rost...@goodmis.org> Acked-by: H. Peter Anvin <h...@linux.intel.com> If Gleb and Paolo are okay with it, I am.
-hpa
Re: x86_64 allyesconfig has screwed up voffset and blows up KVM
On 05/05/2014 11:41 AM, Andy Lutomirski wrote: I'm testing 39bfe90706ab0f588db7cb4d1c0e6d1181e1d2f9. I'm not sure what's going on here. voffset.h contains:

#define VO__end 0xffffffff8111c7a0
#define VO__end 0xffffffff8db9a000
#define VO__text 0xffffffff81000000

because

$ nm vmlinux|grep ' _end'
ffffffff8111c7a0 t _end
ffffffff8db9a000 B _end

The "t _end" implies there is a local symbol _end which I guess the scripts are incorrectly picking up. Taking a look now. -hpa
random: Providing a seed value to VM guests
On 05/01/2014 11:53 AM, Andy Lutomirski wrote: A CPUID leaf or an MSR advertised by a CPUID leaf has another advantage: it's easy to use in the ASLR code -- I don't think there's a real IDT, so there's nothing like rdmsr_safe available. It also avoids doing anything complicated with the boot process to allow the same seed to be used for ASLR and random.c; it can just be invoked twice on boot. At that point we are talking an x86-specific interface, and so we might as well simply emulate RDRAND (urandom) and RDSEED (random) if the CPU doesn't support them. I believe KVM already has a way to report CPUID features that are emulated but supported anyway, i.e. they work but are slow. What's the right forum for this? This thread is probably not it. Change the subject line? -hpa
Re: random: Providing a seed value to VM guests
The normal CPUID bit is unset I believe. On May 1, 2014 12:02:49 PM PDT, Andy Lutomirski l...@amacapital.net wrote: On Thu, May 1, 2014 at 11:59 AM, H. Peter Anvin h...@zytor.com wrote: On 05/01/2014 11:53 AM, Andy Lutomirski wrote: A CPUID leaf or an MSR advertised by a CPUID leaf has another advantage: it's easy to use in the ASLR code -- I don't think there's a real IDT, so there's nothing like rdmsr_safe available. It also avoids doing anything complicated with the boot process to allow the same seed to be used for ASLR and random.c; it can just be invoked twice on boot. At that point we are talking an x86-specific interface, and so we might as well simply emulate RDRAND (urandom) and RDSEED (random) if the CPU doesn't support them. I believe KVM already has a way to report CPUID features that are emulated but supported anyway, i.e. they work but are slow. Do existing kernels and userspace respect this? If the normal bit for RDRAND is unset, then we might be okay, but, if not, then I think this may kill guest performance. Is RDSEED really reasonable here? Won't it slow down by several orders of magnitude? What's the right forum for this? This thread is probably not it. Change the subject line? :) -hpa -- Sent from my mobile phone. Please pardon brevity and lack of formatting. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: random: Providing a seed value to VM guests
As I said... I think KVM has already added an emulated instructions enumeration API. On May 1, 2014 12:26:18 PM PDT, ty...@mit.edu wrote: On Thu, May 01, 2014 at 12:02:49PM -0700, Andy Lutomirski wrote: Is RDSEED really reasonable here? Won't it slow down by several orders of magnitude? That is I think the biggest problem; RDRAND and RDSEED are fast if they are native, but they will involve a VM exit if they need to be emulated. So when an OS might want to use RDRAND and RDSEED might be quite different if we know they are being emulated. Using the RDRAND and RDSEED api certainly makes sense, at least for x86, but I suspect we might want to use a different way of signalling that a VM guest can use RDRAND and RDSEED if they are running on a CPU which doesn't provide that kind of access. Maybe a CPUID extended function parameter, if one could be allocated for use by a Linux hypervisor? - Ted -- Sent from my mobile phone. Please pardon brevity and lack of formatting. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: random: Providing a seed value to VM guests
RDSEED is not synchronous. It is, however, nonblocking. On May 1, 2014 1:16:40 PM PDT, Andy Lutomirski l...@amacapital.net wrote: On May 1, 2014 12:26 PM, ty...@mit.edu wrote: On Thu, May 01, 2014 at 12:02:49PM -0700, Andy Lutomirski wrote: Is RDSEED really reasonable here? Won't it slow down by several orders of magnitude? That is I think the biggest problem; RDRAND and RDSEED are fast if they are native, but they will involve a VM exit if they need to be emulated. So when an OS might want to use RDRAND and RDSEED might be quite different if we know they are being emulated. Using the RDRAND and RDSEED api certainly makes sense, at least for x86, but I suspect we might want to use a different way of signalling that a VM guest can use RDRAND and RDSEED if they are running on a CPU which doesn't provide that kind of access. Maybe a CPUID extended function parameter, if one could be allocated for use by a Linux hypervisor? I'm still not convinced. This will affect userspace as well as the guest kernel, and I don't see why guest user code should be able to access this API. RDRAND for CPL0 only would work, but that seems odd. And I think that RDSEED emulation is asking for trouble. RDSEED is synchronous, but /dev/random is asynchronous. And making bootup wait for even a single byte from /dev/random seems bad. In any event, virtio-rng should be a better interface for this. - Ted -- Sent from my mobile phone. Please pardon brevity and lack of formatting. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: random: Providing a seed value to VM guests
On 05/01/2014 01:56 PM, Andy Lutomirski wrote: Even if we could emulate RDSEED effectively**, I don't really understand what the guest is expected to do with it. And I generally dislike defining an interface with no known sensible users, because it means that there's a good chance that the interface won't end up working. ** Doing this sensibly in the host will be awkward. Is the host supposed to use non-blocking reads of /dev/random? Getting anything remotely fair may be difficult. The host can use nonblocking reads of /dev/random. Fairness would have to be implemented at the host level, but that is true for anything. -hpa -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: random: Providing a seed value to VM guests
On 05/01/2014 03:32 PM, Andy Lutomirski wrote: On Thu, May 1, 2014 at 3:28 PM, ty...@mit.edu wrote: On Thu, May 01, 2014 at 02:06:13PM -0700, Andy Lutomirski wrote: I still don't see the point. What does this do better than virtio-rng? I believe you had been complaining about how complicated it was to set up virtio? And this complexity is also an issue if we want to use it to initialize the RNG used for the kernel text ASLR --- which has to be done very early in the boot process, and where making something as simple as possible is a Good Thing. It's complicated, so it won't be up until much later in the boot process. This is completely fine for /dev/random, but it's a problem for /dev/urandom, ASLR, and such. And since we would want to use RDRAND/RDSEED if it is available *anyway*, perhaps in combination with other things, why not use the RDRAND/RDSEED interface? Because it's awkward. I don't think it simplifies anything. It greatly simplifies discovery, which is a Big Deal[TM] in early code. -hpa -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: random: Providing a seed value to VM guests
On 05/01/2014 03:56 PM, Andy Lutomirski wrote: I think we're comparing: a) cpuid to detect rdrand *or* emulated rdrand followed by rdrand to b) cpuid to detect rdrand or the paravirt seed msr/cpuid call, followed by rdrand or the msr or cpuid read this seems like it barely makes a difference, especially since (a) probably requires detecting KVM anyway. Well, it lets one do something like: if (boot_cpu_has(X86_FEATURE_RDRAND) || boot_cpu_has(X86_FEATURE_RDRAND_SIMULATED)) rdrand_long(...); We need the ifs anyway for early code; the arch_*() interfaces are only available after alternatives run. -hpa -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/5] KVM: x86: RSI/RDI/RCX are zero-extended when affected by string ops
On 04/23/2014 01:53 PM, Nadav Amit wrote: Err, operand size is forced to 64-bits, not address size. The following aspects of near branches are controlled by the effective operand size: • Truncation of the size of the instruction pointer Still, 67h call should not truncate EIP (which your patch does). Yes, I missed it. But if I am not mistaken again, it means that the existing implementation of jmp_rel is broken as well when address-size override prefix is used. In this case, as I see it, the existing masking would cause the carry from the add operation to the lower half of the rip not to be added to the rip higher half. I guess another patch is needed for that as well. Yes, on x86 JMP really should be thought of as MOV ...,IP/EIP/RIP. On some other architectures, e.g. m68k, JMP acts as if it was LEA ...,PC, which causes some serious confusion for people familiar with that model. However, on x86 considering JMP as a MOV to the IP register really is very consistent and will give you the right mental model. -hpa -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: x86: Fix page-tables reserved bits
On 04/16/2014 12:03 PM, Marcelo Tosatti wrote:

	@@ -3550,9 +3550,9 @@ static void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu,
			break;
		case PT64_ROOT_LEVEL:
			context->rsvd_bits_mask[0][3] = exb_bit_rsvd |
	-			rsvd_bits(maxphyaddr, 51) | rsvd_bits(7, 8);
	+			rsvd_bits(maxphyaddr, 51) | rsvd_bits(7, 7);
			context->rsvd_bits_mask[0][2] = exb_bit_rsvd |
	-			rsvd_bits(maxphyaddr, 51) | rsvd_bits(7, 8);
	+			rsvd_bits(maxphyaddr, 51) | rsvd_bits(7, 7);

Bit 7 is not reserved either, for the PDPTE (its PageSize bit).

In long mode (IA-32e), bit 7 is definitely reserved.

-hpa
Re: [PATCH v4 0/4] KVM: enable Intel SMAP for KVM
I would like to see this in 3.15. -hpa On April 13, 2014 2:57:38 PM PDT, Marcelo Tosatti mtosa...@redhat.com wrote: On Fri, Apr 11, 2014 at 08:16:28PM -0400, Paolo Bonzini wrote: Il 10/04/2014 16:01, Marcelo Tosatti ha scritto: On Tue, Apr 08, 2014 at 04:38:08PM -0400, Paolo Bonzini wrote: Il 07/04/2014 21:06, Wu, Feng ha scritto: Even though the tests do not cover the CPL=3/implicit access case, the logic to compute PFERR_RSVD_MASK dynamically is already covered by AC=1. So I'm quite happy with the coverage. Series is Reviewed-by: Paolo Bonzini pbonz...@redhat.com] Thanks very much for your review on this. BTW: Since 3.15 merge window is still open, I am wondering whether there is any possibility to make SMAP into 3.15 with another pull request. This is up to Marcelo who is currently managing the KVM tree. Paolo The merge window is for patches which have been tested in queue/next for sometime. This patch has received no testing other than the developer testing. This is not going to change unfortunately since this is not shipping in any real silicon. The only hope could be to use QEMU's SVM and SMAP emulation. Well, let me know if you want an exception to the rule so i should merge this patchset and submit it for 3.15. Lack of implicit supervisor mode by instructions such as Examples of such implicit... in section 9.3.2, in KVM's emulator, makes the feature incomplete, does it not ? Implicit supervisor mode is handled by KVM emulator using read/write_std. These accesses do not set PFERR_USER_MASK, and should work fine with SMAP. Am I misunderstanding? Paolo Right. -- Sent from my mobile phone. Please pardon brevity and lack of formatting. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/4] KVM: enable Intel SMAP for KVM
On 03/27/2014 04:50 AM, Paolo Bonzini wrote: You also need a matching QEMU patch to enable SMAP on Haswell CPUs (do all Haswells have SMAP?), though only for a 2.1 or newer machine type. But this can be covered later. Haswell does not have SMAP (Ivy Bridge and Haswell do have SMEP, however.) -hpa -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: GPF in intel_pmu_lbr_reset() with qemu -cpu host
Using _safe has its own issues if no one checks the errors.

On March 22, 2014 5:27:59 AM PDT, Gleb Natapov g...@kernel.org wrote:
On Sat, Mar 22, 2014 at 11:05:03AM +0100, Peter Wu wrote:
On Saturday 22 March 2014 10:50:45 Gleb Natapov wrote:
On Fri, Mar 21, 2014 at 12:04:32PM -0700, Venkatesh Srinivas wrote:
On Fri, Mar 21, 2014 at 10:46 AM, Peter Wu pe...@lekensteyn.nl wrote:
[skip]

When -cpu host is used, qemu/kvm passed the host CPUID F/M/S to the guest. intel_pmu_cpu_*() -> intel_pmu_lbr_reset() uses rdmsr() / wrmsr(), rather than the safe variants; if KVM does not support the particular MSRs in question, you will see a #GP(0) there. See https://lkml.org/lkml/2014/3/13/453 for a similar bug in other PMU code.

When the kernel is compiled with guest support all rdmsr()/wrmsr() become _safe(), so the question for Peter is if his guest kernel has guest support enabled?

Linux guest support (CONFIG_HYPERVISOR_GUEST) was not enabled, see .config in the first mail[1]. Enabling that option does not change the situation. With CONFIG_PARAVIRT and CONFIG_KVM_GUEST enabled, the PMU GPF is gone,

Yeah, it should be PARAVIRT indeed since rdmsr()/wrmsr() is substituted by _safe() using paravirt calls.

but now I have a NULL dereference (in rapl_pmu_init). Previously, when `-cpu SandyBridge` was passed to qemu, it would show this:

	[0.016995] Performance Events: unsupported p6 CPU model 42 no PMU driver, software events only.

The same NULL pointer deref would be visible (slightly different addresses, but the Code lines are equal). With `-cpu host`, dmesg contains:

	[0.016445] Performance Events: 16-deep LBR, IvyBridge events, Intel PMU driver.

Full dmesg below.

I am confused. Do you see crash now with -cpu SandyBridge and -cpu host, or -cpu host only?

-- Gleb.

-- Sent from my mobile phone. Please pardon brevity and lack of formatting.
Re: GPF in intel_pmu_lbr_reset() with qemu -cpu host
Calling this a bug in the PMU code is ridiculous. If KVM tells the system it os a specific vendor-family-model-stepping but diverges in behavior then it, by definition, is broken. On March 21, 2014 12:04:32 PM PDT, Venkatesh Srinivas venkate...@google.com wrote: On Fri, Mar 21, 2014 at 10:46 AM, Peter Wu pe...@lekensteyn.nl wrote: cc'ing kvm people and list. On Friday 21 March 2014 18:42:40 Peter Wu wrote: Hi, While trying to run QEMU with `-enable-kvm -host cpu`, I get a GPF in intel_pmu_lbr_reset(): [0.024000] general protection fault: [#1] [0.024000] CPU: 0 PID: 1 Comm: swapper Not tainted 3.14.0-rc7-qemu-00059-g08edb33 #14 [0.024000] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [0.024000] task: 88003e05 ti: 88003e054000 task.ti: 88003e054000 [0.024000] RIP: 0010:[8101148a] [8101148a] intel_pmu_lbr_reset+0x2a/0x80 [0.024000] RSP: :88003e055e78 EFLAGS: 0002 [0.024000] RAX: RBX: 0286 RCX: 0680 [0.024000] RDX: RSI: RDI: [0.024000] RBP: 81622120 R08: 88003ffee0e0 R09: 88003e00bf00 [0.024000] R10: R11: 0004 R12: [0.024000] R13: R14: R15: [0.024000] FS: () GS:8161e000() knlGS: [0.024000] CS: 0010 DS: ES: CR0: 80050033 [0.024000] CR2: 8800019bb000 CR3: 01611000 CR4: 001407b0 [0.024000] Stack: [0.024000] 8101308a 8100e3da 8165ba62 [0.024000] 8165b5bd [0.024000] 81655dcd [0.024000] Call Trace: [0.024000] [8101308a] ? intel_pmu_cpu_starting+0xa/0x80 [0.024000] [8100e3da] ? x86_pmu_notifier+0x5a/0xc0 [0.024000] [8165ba62] ? init_hw_perf_events+0x4a5/0x4dd [0.024000] [8165b5bd] ? check_bugs+0x42/0x42 [0.024000] [81655dcd] ? do_one_initcall+0x76/0xf9 [0.024000] [81276b70] ? rest_init+0x70/0x70 [0.024000] [81655ea7] ? kernel_init_freeable+0x57/0x177 [0.024000] [81276b70] ? rest_init+0x70/0x70 [0.024000] [81276b75] ? kernel_init+0x5/0xe0 [0.024000] [8128067a] ? ret_from_fork+0x7a/0xb0 [0.024000] [81276b70] ? 
rest_init+0x70/0x70 [0.024000] Code: 00 8b 15 02 c4 63 00 85 d2 74 69 f6 05 af c3 63 00 3f 75 2d 85 d2 7e 5c 31 f6 31 c0 0f 1f 44 00 00 8b 0d d2 c3 63 00 89 c2 01 f1 0f 30 83 c6 01 3b 35 d3 c3 63 00 7c e9 f3 c3 0f 1f 80 00 00 00 [0.024000] RIP [8101148a] intel_pmu_lbr_reset+0x2a/0x80 [0.024000] RSP 88003e055e78 [0.024000] ---[ end trace ecbd794f78441b2c ]--- [0.024002] Kernel panic - not syncing: Attempted to kill init! exitcode=0x000b It possibly has something to do with the msr write. Reproducable with: qemu-system-x86_64 -enable-kvm -cpu host -kernel bzImage -m 1G -serial file:ser.txt In the host dmesg, the following is visible when qemu: kvm [4939]: vcpu0 unhandled wrmsr: 0x680 data 0 The full guest dmesg is shown below. The issue occurs also with v3.13.6, v3.12.14, v3.10.33 (other versions are not tested). QEMU: 1.7.0 Host kernel: v3.14-rc5 Guest kernel: v3.14-rc7-59-g08edb33 (.config on the bottom) Kind regards, Peter ### dmesg [0.00] Linux version 3.14.0-rc7-qemu-00059-g08edb33 (pc@antartica) (gcc version 4.8.2 (Ubuntu 4.8.2-16ubuntu6) ) #14 Fri Mar 21 17:30:49 CET 2014 [0.00] Command line: console=ttyS0 loglevel=8 [0.00] KERNEL supported cpus: [0.00] Intel GenuineIntel [0.00] e820: BIOS-provided physical RAM map: [0.00] BIOS-e820: [mem 0x-0x0009fbff] usable [0.00] BIOS-e820: [mem 0x0009fc00-0x0009] reserved [0.00] BIOS-e820: [mem 0x000f-0x000f] reserved [0.00] BIOS-e820: [mem 0x0010-0x3fffdfff] usable [0.00] BIOS-e820: [mem 0x3fffe000-0x3fff] reserved [0.00] BIOS-e820: [mem 0xfeffc000-0xfeff] reserved [0.00] BIOS-e820: [mem 0xfffc-0x] reserved [0.00] NX (Execute Disable) protection: active [0.00] SMBIOS 2.4 present. [0.00] DMI: Bochs Bochs, BIOS Bochs 01/01/2011 [0.00] e820: update [mem 0x-0x0fff] usable == reserved [0.00] e820: remove [mem 0x000a-0x000f] usable [0.00] e820: last_pfn = 0x3fffe max_arch_pfn = 0x4 [0.00] MTRR default type: write-back [0.00] MTRR fixed ranges enabled: [0.00] 0-9
Re: [PATCH v7 06/11] pvqspinlock, x86: Allow unfair queue spinlock in a KVM guest
On 03/20/2014 03:01 PM, Paolo Bonzini wrote: No! Please do what I asked you to do. You are not handling Hyper-V or VMWare. Just use X86_FEATURE_HYPERVISOR and it will cover all hypervisors that actually follow Intel's guidelines. And for those that don't, we should turn on X86_FEATURE_HYPERVISOR in the Linux enumeration code as we do for other features we detect independently. -hpa -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] KVM: x86 emulator: emulate MOVAPS and MOVAPD SSE instructions
After seeing the sheer number of one-off additions, I'm wondering if going through the opcode map systematically to see what is still missing might not be a bad idea.

On March 17, 2014 2:30:43 AM PDT, Paolo Bonzini pbonz...@redhat.com wrote:
Il 15/03/2014 23:42, H. Peter Anvin ha scritto:
Stupid question... what instructions do NOT need emulation in KVM? It would seem that at least anything that touches memory would?

Yes, indeed. Anything that touches memory can be used on MMIO and then needs emulation.

Paolo

On March 15, 2014 1:01:58 PM PDT, Igor Mammedov imamm...@redhat.com wrote:
MS HCK test fails on 32-bit Windows 8.1 due to missing MOVAPS instruction emulation; this series adds it and, while at it, adds emulation of MOVAPD, which is trivial to implement on top of MOVAPS.

Igor Mammedov (2):
  KVM: x86 emulator: emulate MOVAPS
  KVM: x86 emulator: emulate MOVAPD

 arch/x86/kvm/emulate.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

-- Sent from my mobile phone. Please pardon brevity and lack of formatting.
Re: [PATCH 0/2] KVM: x86 emulator: emulate MOVAPS and MOVAPD SSE instructions
On 03/17/2014 10:01 AM, Paolo Bonzini wrote: the emulator). If CS and possibly SS are valid real mode selectors, it should be possible to run big real mode at almost-full speed, taking exits only for memory accesses via other segment registers. It is on my todo list, but not very high. Depending on the exit overhead, it may be a better idea to revert the emulate_invalid_guest_state default to N and let people who care about big real mode specify Y. I'm not sure what you mean with valid real mode selectors; the normal case in big real mode is that either CS = SS = 0 or CS = SS = some program base address. As Big Real Mode is part of the spec for certain things (option ROMs, as we discussed) it probably matters, but especially with the CPUs not supporting unrestricted mode fading into history I suspect it is fine for BRM to be slow on those older processors. The PM transitions that you mentioned are usually only a handful of instructions and thus can be slow. -hpa -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] KVM: x86 emulator: emulate MOVAPS and MOVAPD SSE instructions
MOVAPS, MOVAPD, and MOVDQA are the same operation. They may, architecturally, have different performance characteristics, but nothing that would affect an emulator. On March 15, 2014 1:01:58 PM PDT, Igor Mammedov imamm...@redhat.com wrote: MS HCK test fails on 32-bit Windows 8.1 due to missing MOVAPS instruction emulation, this series adds it and while at it, it adds emulation of MOVAPD which is trivial to implement on top of MOVAPS. Igor Mammedov (2): KVM: x86 emulator: emulate MOVAPS KVM: x86 emulator: emulate MOVAPD arch/x86/kvm/emulate.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) -- Sent from my mobile phone. Please pardon brevity and lack of formatting. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] KVM: x86 emulator: emulate MOVAPS and MOVAPD SSE instructions
Stupid question... what instructions do NOT need emulation in KVM? It would seem that at least anything that touches memory would?

On March 15, 2014 1:01:58 PM PDT, Igor Mammedov imamm...@redhat.com wrote:
MS HCK test fails on 32-bit Windows 8.1 due to missing MOVAPS instruction emulation; this series adds it and, while at it, adds emulation of MOVAPD, which is trivial to implement on top of MOVAPS.

Igor Mammedov (2):
  KVM: x86 emulator: emulate MOVAPS
  KVM: x86 emulator: emulate MOVAPD

 arch/x86/kvm/emulate.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

-- Sent from my mobile phone. Please pardon brevity and lack of formatting.
Re: [qemu64,+smep,+smap] Kernel panic - not syncing: No working init found.
On 02/13/2014 04:45 AM, Fengguang Wu wrote:
Greetings, I find that when running qemu-system-x86_64 -cpu qemu64,+smep,+smap some kernels will 100% produce this error, where the error codes -13 and -14 are -EACCES and -EFAULT. Any ideas?

I notice this is a non-SMAP kernel:

	# CONFIG_X86_SMAP is not set

If the kernel turns on SMAP in CR4 even though SMAP isn't enabled in the kernel, that is a kernel bug. If Qemu enforces SMAP even if it is turned off in CR4, that would be a Qemu bug. I have reproduced the failure locally and am considering both possibilities now.

-hpa
Re: [qemu64,+smep,+smap] Kernel panic - not syncing: No working init found.
On 02/13/2014 06:55 AM, H. Peter Anvin wrote:
On 02/13/2014 04:45 AM, Fengguang Wu wrote:
Greetings, I find that when running qemu-system-x86_64 -cpu qemu64,+smep,+smap some kernels will 100% produce this error, where the error codes -13 and -14 are -EACCES and -EFAULT. Any ideas?

I notice this is a non-SMAP kernel:

	# CONFIG_X86_SMAP is not set

If the kernel turns on SMAP in CR4 even though SMAP isn't enabled in the kernel, that is a kernel bug. If Qemu enforces SMAP even if it is turned off in CR4, that would be a Qemu bug. I have reproduced the failure locally and am considering both possibilities now.

So we do turn on the bit in CR4 even with SMAP compiled out. This is a bug. However, I still get the same failure even with that bug fixed (and qemu "info registers" verifies that it is, indeed, not set), so I'm wondering if there is a bug in Qemu as well. However, staring at the code in Qemu I don't see where that bug would be...

-hpa
Re: [qemu64,+smep,+smap] Kernel panic - not syncing: No working init found.
On 02/13/2014 06:55 AM, H. Peter Anvin wrote:
On 02/13/2014 04:45 AM, Fengguang Wu wrote:
Greetings, I find that when running qemu-system-x86_64 -cpu qemu64,+smep,+smap some kernels will 100% produce this error, where the error codes -13 and -14 are -EACCES and -EFAULT. Any ideas?

I notice this is a non-SMAP kernel:

	# CONFIG_X86_SMAP is not set

If the kernel turns on SMAP in CR4 even though SMAP isn't enabled in the kernel, that is a kernel bug. If Qemu enforces SMAP even if it is turned off in CR4, that would be a Qemu bug. I have reproduced the failure locally and am considering both possibilities now.

No, it is simply a second kernel bug. I have patches for both and will push them momentarily.

-hpa
Re: [PATCH v3 1/4] KVM/X86: Fix xsave cpuid exposing bug
On 01/22/2014 02:21 AM, Paolo Bonzini wrote: Il 21/01/2014 19:59, Liu, Jinsong ha scritto: From 3155a190ce6ebb213e6c724240f4e6620ba67a9d Mon Sep 17 00:00:00 2001 From: Liu Jinsong jinsong@intel.com Date: Fri, 13 Dec 2013 02:32:03 +0800 Subject: [PATCH v3 1/4] KVM/X86: Fix xsave cpuid exposing bug EBX of cpuid(0xD, 0) is dynamic per XCR0 features enable/disable. Bit 63 of XCR0 is reserved for future expansion. Signed-off-by: Liu Jinsong jinsong@intel.com Peter, can I have your acked-by on this? Yes. Acked-by: H. Peter Anvin h...@linux.intel.com -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] X86, mpx: Intel MPX definition
No... we always ask for cpufeature.h patches separately because they sometimes cause conflicts between branches. Borislav Petkov b...@alien8.de wrote: On Sat, Dec 07, 2013 at 02:52:55AM +0800, Qiaowei Ren wrote: Signed-off-by: Qiaowei Ren qiaowei@intel.com Signed-off-by: Xudong Hao xudong@intel.com Signed-off-by: Liu Jinsong jinsong@intel.com --- arch/x86/include/asm/cpufeature.h |2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) This patch should probably be merged with the next one... diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h index d3f5c63..6c2738d 100644 --- a/arch/x86/include/asm/cpufeature.h +++ b/arch/x86/include/asm/cpufeature.h @@ -216,6 +216,7 @@ #define X86_FEATURE_ERMS(9*32+ 9) /* Enhanced REP MOVSB/STOSB */ #define X86_FEATURE_INVPCID (9*32+10) /* Invalidate Processor Context ID */ #define X86_FEATURE_RTM (9*32+11) /* Restricted Transactional Memory */ +#define X86_FEATURE_MPX (9*32+14) /* Memory Protection Extension */ #define X86_FEATURE_RDSEED (9*32+18) /* The RDSEED instruction */ #define X86_FEATURE_ADX (9*32+19) /* The ADCX and ADOX instructions */ #define X86_FEATURE_SMAP(9*32+20) /* Supervisor Mode Access Prevention */ @@ -330,6 +331,7 @@ extern const char * const x86_power_flags[32]; #define cpu_has_perfctr_l2 boot_cpu_has(X86_FEATURE_PERFCTR_L2) #define cpu_has_cx8 boot_cpu_has(X86_FEATURE_CX8) #define cpu_has_cx16boot_cpu_has(X86_FEATURE_CX16) +#define cpu_has_mpx boot_cpu_has(X86_FEATURE_MPX) ... and we're trying to not have more of those macros so people should be simply using boot_cpu_has(X86_FEATURE_YYY). -- Sent from my mobile phone. Please pardon brevity and lack of formatting. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH v3 0/2] Intel MPX feature support at Qemu
On 12/06/2013 08:27 AM, Liu, Jinsong wrote: Eric Blake wrote: On 12/06/2013 07:06 AM, Liu, Jinsong wrote: Intel has released Memory Protection Extensions (MPX) recently. Please refer to http://download-software.intel.com/sites/default/files/319433-015.pdf These 2 patches are version2 to support Intel MPX at qemu side. You still aren't threading correctly, which makes it hard to track your series. Please review http://wiki.qemu.org/Contribute/SubmitAPatch and make sure your 'git send-email' settings allow for proper threading; a good way to test that is to first send the patch series to yourself to ensure your environment is set up correctly. Thanks Blake! will take care and learn using git send-email when I send patches later (i.e. kvm mpx patches). Not to mention that Linux kernel patches should be Cc:'d to linux-ker...@vger.kernel.org. -hpa -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] X86, mpx: Intel MPX xstate feature definition
On 12/06/2013 05:46 AM, Borislav Petkov wrote: I'm guessing this and the struct lwp_struct above is being added so that you can have the LWP XSAVE area size? If so, you don't need it: LWP XSAVE area is 128 bytes at offset 832 according to my manuals so I'd guess having a u8 lwp_area[128] should be fine. Sure, but any reason to *not* document the internal structure? +struct bndregs_struct bndregs; +struct bndcsr_struct bndcsr; /* new processor state extensions will go here */ } __attribute__ ((packed, aligned (64))); diff --git a/arch/x86/include/asm/xsave.h b/arch/x86/include/asm/xsave.h index 0415cda..5cd9de3 100644 --- a/arch/x86/include/asm/xsave.h +++ b/arch/x86/include/asm/xsave.h @@ -9,6 +9,8 @@ #define XSTATE_FP 0x1 #define XSTATE_SSE 0x2 #define XSTATE_YMM 0x4 +#define XSTATE_BNDREGS 0x8 +#define XSTATE_BNDCSR 0x10 #define XSTATE_FPSSE(XSTATE_FP | XSTATE_SSE) @@ -20,10 +22,12 @@ #define XSAVE_YMM_SIZE 256 #define XSAVE_YMM_OFFSET(XSAVE_HDR_SIZE + XSAVE_HDR_OFFSET) +#define XSTATE_FLEXIBLE (XSTATE_FP | XSTATE_SSE | XSTATE_YMM) What's the use of that macro if it is used only once? Documentation seems good enough. Explicitly separating out the features which MUST be eagerly saved seems like a good thing. +#define XSTATE_EAGER(XSTATE_BNDREGS | XSTATE_BNDCSR) /* * These are the features that the OS can handle currently. */ -#define XCNTXT_MASK (XSTATE_FP | XSTATE_SSE | XSTATE_YMM) +#define XCNTXT_MASK (XSTATE_FLEXIBLE | XSTATE_EAGER) -hpa -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 3/3] X86, mpx: Intel MPX xstate feature definition
On 12/06/2013 09:35 AM, Paolo Bonzini wrote:
Sorry for the back-and-forth, but I think this and the removal of XSTATE_FLEXIBLE (perhaps XSTATE_LAZY?) makes your v2 worse than v1. Since Peter already said the same, please undo these changes. Also, how is XSTATE_EAGER used? Should MPX be disabled when xsaveopt is disabled on the kernel command line? (Liu, how would this affect the KVM patches, too?)

There are two options: we could disable MPX etc. or we could force eager saving (using xsave) even if xsaveopt is disabled. It is a hard call to make, but I guess I'm leaning towards the latter; we could add a lazyxsave option to explicitly disable all eager features if there is use for that.

-hpa
Re: [Qemu-devel] [PATCH v2 3/3] X86, mpx: Intel MPX xstate feature definition
On 12/06/2013 12:05 PM, Liu, Jinsong wrote:
>> Since Peter already said the same, please undo these changes. Also,
>> how is XSTATE_EAGER used? Should MPX be disabled when xsaveopt is
>> disabled on the kernel command line? (Liu, how would this affect the
>> KVM patches, too?)
>>
>> Paolo
>
> Currently it seems not, and if needed we can add a new patch on the
> KVM side accordingly when the native MPX patches are checked in.

We need to either disable these features in lazy mode, or we need to
force eager mode if these features are to be supported. The problem
with the latter is that it means forcing eager mode regardless of
whether anything actually *uses* these features.

A third option would be to require applications to use a prctl() or
similar to enable eager-save features.

Thoughts?

	-hpa
Re: [Qemu-devel] [PATCH v2 3/3] X86, mpx: Intel MPX xstate feature definition
On 12/06/2013 04:23 PM, Ren, Qiaowei wrote:
>>> We need to either disable these features in lazy mode, or we need to
>>> force eager mode if these features are to be supported. The problem
>>> with the latter is that it means forcing eager mode regardless of
>>> whether anything actually *uses* these features. A third option
>>> would be to require applications to use a prctl() or similar to
>>> enable eager-save features.
>>
>> The third option seems better -- how do the native MPX patches work,
>> force eager?
>
> It should be the second option, as you can see from xsave.c, which we
> removed from this patch. :)

Ah yes... I missed the fact that that chunk had been dropped from this
patch. It really shouldn't be. I'll substitute the previous version of
the patch.

	-hpa
Re: [Qemu-devel] [PATCH v2 3/3] X86, mpx: Intel MPX xstate feature definition
On 12/06/2013 05:16 PM, Ren, Qiaowei wrote:
> Jinsong thinks that both KVM and the host depend on these feature
> definition header files, so we submitted those files first.

Yes, but we can't turn on the feature without proper protection.

Either way, they are now in tip:x86/cpufeature.

	-hpa
Re: [PATCH 1/4] X86: Intel MPX definiation
On 12/05/2013 08:08 AM, Paolo Bonzini wrote:
> Il 02/12/2013 17:43, Liu, Jinsong wrote:
>> From fbfa537f690eca139a96c6b2636ab5130bf57716 Mon Sep 17 00:00:00 2001
>> From: Liu Jinsong jinsong@intel.com
>> Date: Fri, 29 Nov 2013 01:27:00 +0800
>> Subject: [PATCH 1/4] X86: Intel MPX definiation
>>
>> Signed-off-by: Xudong Hao xudong@intel.com
>> Signed-off-by: Liu Jinsong jinsong@intel.com
>> ---
>>  arch/x86/include/asm/cpufeature.h |    2 ++
>>  arch/x86/include/asm/xsave.h      |    5 ++++-
>>  2 files changed, 6 insertions(+), 1 deletions(-)
>
> hpa/Ingo/Thomas, can you give your Acked-by for this patch? I'm not
> sure of the consequences of changing XCNTXT_MASK.
>
> This series (which was submitted with the wrong threading) wants it so
> that KVM can use fpu_save_init and fpu_restore_checking to save and
> restore the MPX state of the guest.

Hi, I'm currently reviewing internally another set of patches for MPX
support which would at least in part conflict with these. I don't see
the rest of the series -- where was it posted?

Either way:

1. asm/cpufeatures.h patches should always be separate, as we put those
   into a special branch in the -tip tree since they touch so many
   other things.

2. Enabling MPX is only safe with XSTATE_EAGER, which Qiaowei's
   patchset has done correctly.

	-hpa
Re: [PATCH, RFC] x86-64: properly handle FPU code/data selectors
On 10/16/2013 05:00 AM, Jan Beulich wrote:
> Having had reports of certain Windows versions, when put in some
> special driver verification mode, blue-screening due to the FPU state
> having changed across interrupt handler runs (resulting from a host/
> hypervisor side context switch somewhere in the middle of the guest
> interrupt handler execution) on Xen, and assuming that KVM would
> suffer from the same problem, as well as having also noticed (long
> ago) that 32-bit processes don't behave correctly in this regard when
> run on a 64-bit kernel, this is the resulting attempt to port (and
> suitably extend) the Xen side fix to Linux.
>
> The basic idea here is to either use a priori information on the
> intended state layout (in the case of 32-bit processes) or sense the
> proper layout (in the case of KVM guests) by inspecting the already
> saved FPU rip/rdp, and reading their actual values in a second save
> operation. This second save operation could be another [F]XSAVE, but
> on all systems I measured this on using FNSTENV turned out to be the
> faster alternative.

It is not at all clear to me from the description what the flow is that
causes the problem, whatever the problem is. Perhaps it would be if I
weren't horribly sleep-deprived, but the description should be clear
enough that one should be able to tell the problem at a glance. Please
describe the flow that causes trouble.

Is this basically a problem with the 32-bit version of FXSAVE versus
the 64-bit version?

Furthermore, you define X86_FEATURE_NO_FPU_SEL, but you don't set it
anywhere. At least that bit needs to be factored out into a separate
patch.

> +	if (config_enabled(CONFIG_IA32_EMULATION) &&
> +	    test_tsk_thread_flag(tsk, TIF_IA32))

is_ia32_task()?

	-hpa
Re: [PATCH, RFC] x86-64: properly handle FPU code/data selectors
On 10/16/2013 09:07 AM, Jan Beulich wrote:
>> Furthermore, you define X86_FEATURE_NO_FPU_SEL, but you don't set it
>> anywhere. At least that bit needs to be factored out into a separate
>> patch.
>
> That's already being done in get_cpu_cap(), as it's part of
> x86_capability[9].

Ah, sorry, my bad. For some reason I thought you added it to word 3,
but this is a hardware-provided CPUID bit. I, if anyone, should have
known :)

>>> +	if (config_enabled(CONFIG_IA32_EMULATION) &&
>>> +	    test_tsk_thread_flag(tsk, TIF_IA32))
>>
>> is_ia32_task()?
>
> That'd imply that tsk == current in all cases, which I don't think is
> right here.

True. It would be good to have an equivalent predicate function for
another task, though.

This assumes the process doesn't switch modes on us, which it is
allowed to do. For that it really would be better to look at the CS.L
bit, which can be done with the LAR instruction for the current task;
otherwise we'd have to walk the descriptor tables.

	-hpa
Re: [PATCH RESEND V13 14/14] kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor
Raghavendra...

Even with this latest patch this branch is broken:

  arch/x86/kernel/built-in.o:(.discard+0x6108): multiple definition of `__pcpu_unique_lock_waiting'
  arch/x86/xen/built-in.o:(.discard+0x23): first defined here
    CC      drivers/firmware/google/gsmi.o
  arch/x86/kernel/built-in.o:(.discard+0x6108): multiple definition of `__pcpu_unique_lock_waiting'
  arch/x86/xen/built-in.o:(.discard+0x23): first defined here
    CC      sound/core/seq/oss/seq_oss_init.o

This is trivially reproducible by doing a build with "make allyesconfig".

Please fix and *verify* it is fixed before resubmitting. I will be away,
so Ingo will have to handle the resubmission.

	-hpa
Re: [PATCH V12 0/14] Paravirtualized ticket spinlocks
On 08/09/2013 06:00 AM, Konrad Rzeszutek Wilk wrote:
> On Fri, Aug 09, 2013 at 06:20:02PM +0530, Raghavendra K T wrote:
>> On 08/09/2013 04:34 AM, H. Peter Anvin wrote:
>>> Okay, I figured it out. One of several problems with the formatting
>>> of this patchset is that it has one- and two-digit patch numbers in
>>> the headers, which meant that my scripts tried to apply patch 10
>>> first.
>>
>> My bad. I'll send out in uniform digit form next time.
>
> If you use 'git format-patch --subject-prefix "PATCH V14" v3.11-rc4..'
> and 'git send-email --subject "[PATCH V14] bla blah .."' that should
> be automatically taken care of?

Indeed it should.

Another problem with this patchset was that the subject was duplicated
in the body, which meant the tools didn't pick up the From: line. I
ended up having to manually edit them. That seems to have been fixed,
too, in V13.

	-hpa
Re: [PATCH V12 0/14] Paravirtualized ticket spinlocks
On 08/07/2013 06:02 PM, Gleb Natapov wrote:
> On Wed, Aug 07, 2013 at 08:50:12PM -0400, Konrad Rzeszutek Wilk wrote:
>> On Wed, Aug 07, 2013 at 12:15:21PM +0530, Raghavendra K T wrote:
>>> On 08/07/2013 10:18 AM, H. Peter Anvin wrote:
>>>>> Please let me know, if I should rebase again.
>>>>
>>>> tip:master is not a stable branch; it is more like linux-next. We
>>>> need to figure out which topic branches are dependencies for this
>>>> set.
>>>
>>> Okay. I'll start looking at the branches that would get affected.
>>> (Xen, kvm are obvious ones). Please do let me know the branches I
>>> might have to check for.
>>
>> From the Xen standpoint anything past v3.11-rc4 would work.
>
> For KVM as early as past v3.11-rc1 would be OK.

I'm still completely confused as to the base of this patchset. The
first patch has the following hunk for arch/x86/include/asm/paravirt.h:

--- arch/x86/include/asm/paravirt.h
+++ arch/x86/include/asm/paravirt.h
@@ -718,7 +718,7 @@
 	PVOP_VCALLEE2(pv_lock_ops.lock_spinning, lock, ticket);
 }
 
-static __always_inline void ticket_unlock_kick(struct arch_spinlock *lock,
+static __always_inline void __ticket_unlock_kick(struct arch_spinlock *lock,
 						__ticket_t ticket)
 {
 	PVOP_VCALL2(pv_lock_ops.unlock_kick, lock, ticket);

However, there is no ticket_unlock_kick in paravirt.h in either
tip:master or in linus...

	-hpa
Re: [PATCH V12 0/14] Paravirtualized ticket spinlocks
On 08/08/2013 02:13 PM, H. Peter Anvin wrote:
> On 08/07/2013 06:02 PM, Gleb Natapov wrote:
>> On Wed, Aug 07, 2013 at 08:50:12PM -0400, Konrad Rzeszutek Wilk wrote:
>>> On Wed, Aug 07, 2013 at 12:15:21PM +0530, Raghavendra K T wrote:
>>>> On 08/07/2013 10:18 AM, H. Peter Anvin wrote:
>>>>>> Please let me know, if I should rebase again.
>>>>>
>>>>> tip:master is not a stable branch; it is more like linux-next. We
>>>>> need to figure out which topic branches are dependencies for this
>>>>> set.
>>>>
>>>> Okay. I'll start looking at the branches that would get affected.
>>>> (Xen, kvm are obvious ones). Please do let me know the branches I
>>>> might have to check for.
>>>
>>> From the Xen standpoint anything past v3.11-rc4 would work.
>>
>> For KVM as early as past v3.11-rc1 would be OK.
>
> I'm still completely confused as to the base of this patchset. The
> first patch has the following hunk for arch/x86/include/asm/paravirt.h:

Okay, I figured it out. One of several problems with the formatting of
this patchset is that it has one- and two-digit patch numbers in the
headers, which meant that my scripts tried to apply patch 10 first.

	-hpa
Re: [PATCH V12 0/14] Paravirtualized ticket spinlocks
The kbuild test bot is reporting some pretty serious errors for this
patchset. I think these are serious enough that the patchset will need
to be respun.

	-hpa
Re: [PATCH V12 0/14] Paravirtualized ticket spinlocks
On 08/06/2013 04:40 AM, Raghavendra K T wrote:
> This series replaces the existing paravirtualized spinlock mechanism
> with a paravirtualized ticketlock mechanism. The series provides
> implementation for both Xen and KVM. The current set of patches are
> for Xen/x86 spinlock/KVM guest side, to be included against -tip.

What is the baseline for this patchset? I tried to apply it on top of
3.11-rc4 and I got nontrivial conflicts.

	-hpa
Re: [PATCH V12 0/14] Paravirtualized ticket spinlocks
On 08/06/2013 07:54 PM, Raghavendra K T wrote:
> On 08/07/2013 02:31 AM, H. Peter Anvin wrote:
>> What is the baseline for this patchset? I tried to apply it on top
>> of 3.11-rc4 and I got nontrivial conflicts.
>
> I had based it on top of 445363e8 [ Merge branch 'perf/urgent' ] of
> tip. Sorry for not mentioning that. Please let me know, if I should
> rebase again.

tip:master is not a stable branch; it is more like linux-next. We need
to figure out which topic branches are dependencies for this set.

	-hpa
Re: [PATCH V2 4/4] x86: correctly detect hypervisor
On 08/05/2013 07:34 AM, Konrad Rzeszutek Wilk wrote:
> Could you provide me with a git branch so I can test it overnight
> please?

Pull tip:x86/paravirt.

	-hpa
Re: [PATCH RFC V11 0/18] Paravirtualized ticket spinlocks
So, having read through the entire thread I *think* this is what the
status of this patchset is:

1. Patches 1-17 are noncontroversial, and Raghavendra is going to send
   an update split into two patchsets;

2. There are at least two versions of patch 15; I think the PATCH
   RESEND RFC is the right one.

3. Patch 18 is controversial but there are performance numbers; these
   should be integrated in the patch description.

4. People are in general OK with us putting this patchset into -tip
   for testing, once the updated (V12) patchset is posted.

If I'm misunderstanding something, it is because of excessive thread
length as mentioned by Ingo. Either way, I'm going to hold off on
putting it into -tip until tomorrow unless Ingo beats me to it.

	-hpa
RE: [PATCH 4/4] x86: properly handle kvm emulation of hyperv
I don't see how this solves the A emulates B, B emulates A problem?

KY Srinivasan <k...@microsoft.com> wrote:
>> -----Original Message-----
>> From: Paolo Bonzini [mailto:paolo.bonz...@gmail.com] On Behalf Of
>> Paolo Bonzini
>> Sent: Wednesday, July 24, 2013 3:07 AM
>> To: Jason Wang
>> Cc: H. Peter Anvin; KY Srinivasan; t...@linutronix.de;
>> mi...@redhat.com; x...@kernel.org; g...@redhat.com;
>> kvm@vger.kernel.org; linux-ker...@vger.kernel.org
>> Subject: Re: [PATCH 4/4] x86: properly handle kvm emulation of hyperv
>>
>> On 24/07/2013 08:54, Jason Wang wrote:
>>> On 07/24/2013 12:48 PM, H. Peter Anvin wrote:
>>>> On 07/23/2013 09:37 PM, Jason Wang wrote:
>>>>> On 07/23/2013 10:48 PM, H. Peter Anvin wrote:
>>>>>> On 07/23/2013 06:55 AM, KY Srinivasan wrote:
>>>>>>> This strategy of hypervisor detection based on some detection
>>>>>>> order IMHO is not a robust detection strategy. The current
>>>>>>> scheme works since the only hypervisor emulated (by other
>>>>>>> hypervisors) happens to be Hyper-V. What if this were to change.
>>>>>>
>>>>>> One strategy would be to pick the *last* one in the CPUID list,
>>>>>> since the ones before it are logically the one(s) being
>>>>>> emulated...
>>>>>>
>>>>>> 	-hpa
>>>>>
>>>>> How about simply doing a reverse loop from 0x40010000 to
>>>>> 0x40000000?
>>>>
>>>> Not all systems like being poked too far into hyperspace. Just
>>>> remember the last match and walk the list.
>>>>
>>>> 	-hpa
>>>
>>> Ok, but it raises a question - how to know it was the 'last' match
>>> without knowing all signatures of other hypervisors?
>>
>> You can return a priority value from the .detect function. The
>> priority value can simply be the CPUID leaf where the signature was
>> found (or a low value such as 1 if detection was done with DMI).
>> Then you can pick the hypervisor with the highest priority instead
>> of hard-coding the order.
>>
>> Paolo
>
> I like this idea; this allows some guest level control that is what
> we want when we have hypervisors emulating each other.
>
> Regards,
>
> K. Y

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.
RE: [PATCH 4/4] x86: properly handle kvm emulation of hyperv
What I'm suggesting is exactly that, except that the native hypervisor
is later in CPUID space.

KY Srinivasan <k...@microsoft.com> wrote:
>> -----Original Message-----
>> From: H. Peter Anvin [mailto:h...@zytor.com]
>> Sent: Wednesday, July 24, 2013 11:14 AM
>> To: KY Srinivasan; Paolo Bonzini; Jason Wang
>> Cc: t...@linutronix.de; mi...@redhat.com; x...@kernel.org;
>> g...@redhat.com; kvm@vger.kernel.org; linux-ker...@vger.kernel.org
>> Subject: RE: [PATCH 4/4] x86: properly handle kvm emulation of hyperv
>>
>> I don't see how this solves the A emulates B, B emulates A problem?
>
> As Paolo suggested, if there were some priority encoded, the guest
> could make an informed decision. If the guest in question can run on
> both hypervisors A and B, we would rather the guest discover
> hypervisor A when running on A and hypervisor B when running on B.
> The priority encoding could be as simple as surfacing the native
> hypervisor signature earlier in the CPUID space.
>
> K. Y

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.
Re: [PATCH 1/4] x86: introduce hypervisor_cpuid_base()
On 07/23/2013 02:41 AM, Jason Wang wrote:
> +static inline uint32_t hypervisor_cpuid_base(const char *sig, uint32_t leaves)
> +{
> +	uint32_t base, eax, ebx, ecx, edx;
> +	char signature[13];
> +
> +	for (base = 0x40000000; base < 0x40010000; base += 0x100) {
> +		cpuid(base, &eax, &ebx, &ecx, &edx);
> +		*(uint32_t *)(signature + 0) = ebx;
> +		*(uint32_t *)(signature + 4) = ecx;
> +		*(uint32_t *)(signature + 8) = edx;
> +		signature[12] = 0;
> +
> +		if (!strcmp(sig, signature) &&
> +		    (leaves == 0 || ((eax - base) >= leaves)))
> +			return base;
> +	}
> +
> +	return 0;
> +}
> +

Hmm... how about:

	uint32_t sign[3];

	cpuid(base, &eax, &sign[0], &sign[1], &sign[2]);
	if (!memcmp(sig, sign, 12))
		...;

	-hpa
Re: [PATCH 4/4] x86: properly handle kvm emulation of hyperv
On 07/23/2013 06:55 AM, KY Srinivasan wrote:
> This strategy of hypervisor detection based on some detection order
> IMHO is not a robust detection strategy. The current scheme works
> since the only hypervisor emulated (by other hypervisors) happens to
> be Hyper-V. What if this were to change.

One strategy would be to pick the *last* one in the CPUID list, since
the ones before it are logically the one(s) being emulated...

	-hpa
Re: [PATCH 1/4] x86: introduce hypervisor_cpuid_base()
On 07/23/2013 04:16 AM, Paolo Bonzini wrote:
> That's nicer, though strcmp is what the replaced code used to do in
> patches 2 and 3. Note that memcmp requires the caller to use
> "KVMKVMKVM\0\0\0" as the signature (or alternatively
> hypervisor_cpuid_base can copy the argument into another 12-byte
> local variable).

Which is the actual signature, though...

	-hpa
Re: [PATCH 4/4] x86: properly handle kvm emulation of hyperv
On 07/23/2013 10:45 AM, KY Srinivasan wrote:
>> One strategy would be to pick the *last* one in the CPUID list,
>> since the ones before it are logically the one(s) being emulated...
>
> Is it always possible to guarantee this ordering? As a hypothetical,
> what if hypervisor A emulates hypervisor B and hypervisor B emulates
> hypervisor A. In this case we cannot have any order-based detection
> that can yield correct detection. I define correctness as follows: if
> a guest can run on both the hypervisors, the guest should detect the
> true native hypervisor.

My point was that most hypervisors tend to put the native signature at
the end of the list starting at 0x40000000, just to deal with naïve
guests which only look at 0x40000000 and not beyond. So a natural
convention would be to use the last entry in the list you know how to
handle.

	-hpa
Re: [PATCH 1/4] x86: introduce hypervisor_cpuid_base()
On 07/23/2013 09:44 PM, Jason Wang wrote:
> Since it's just a minor optimization, how about just keeping the
> strcmp()?

It's more that it enables the rest of the cleanup, making the code
easier to read.

	-hpa
Re: [PATCH 4/4] x86: properly handle kvm emulation of hyperv
On 07/23/2013 09:37 PM, Jason Wang wrote:
> On 07/23/2013 10:48 PM, H. Peter Anvin wrote:
>> On 07/23/2013 06:55 AM, KY Srinivasan wrote:
>>> This strategy of hypervisor detection based on some detection order
>>> IMHO is not a robust detection strategy. The current scheme works
>>> since the only hypervisor emulated (by other hypervisors) happens
>>> to be Hyper-V. What if this were to change.
>>
>> One strategy would be to pick the *last* one in the CPUID list,
>> since the ones before it are logically the one(s) being emulated...
>>
>> 	-hpa
>
> How about simply doing a reverse loop from 0x40010000 to 0x40000000?

Not all systems like being poked too far into hyperspace. Just
remember the last match and walk the list.

	-hpa