Re: 32-bit color graphic on KVM virtual machines
shacky wrote:
> Hi.
> Is it possible to have 32-bit color graphics on KVM virtual machines?
> I installed a Windows virtual machine, but it only lets me configure a 24-bit color display, and it does not have any display driver installed.

24-bit means 8 bits per RGB channel. 32-bit means 8 bits per RGB channel plus 8 bits of alpha, which isn't very useful on a display. So I wouldn't worry about it. (If you had an 8bpp display, that would be a different story, but those aren't very common.)

Of course, lots of programs use 32-bit offscreen surfaces, but that's a different story.

--Andy

> Is there a way to solve this problem?
> Thank you very much!
> Bye.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
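To make the 24-bit versus 32-bit distinction concrete, here is a minimal C sketch. It is not from the thread, and the helper names are made up for illustration; it only shows that a 32-bit ARGB pixel is the same 24-bit color with an extra alpha byte on top.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration; not part of any KVM or Windows API. */

/* Pack 8-bit R, G, B into the low 24 bits of a word. */
static uint32_t pack_rgb24(uint8_t r, uint8_t g, uint8_t b)
{
	return ((uint32_t)r << 16) | ((uint32_t)g << 8) | (uint32_t)b;
}

/* Same three channels plus an 8-bit alpha in the top byte. */
static uint32_t pack_argb32(uint8_t a, uint8_t r, uint8_t g, uint8_t b)
{
	return ((uint32_t)a << 24) | pack_rgb24(r, g, b);
}

/* Dropping the alpha byte leaves exactly the 24-bit color. */
static uint32_t argb32_to_rgb24(uint32_t argb)
{
	return argb & 0x00FFFFFFu;
}
```

The number of distinct displayable colors is the same in both formats; only offscreen compositing makes use of the extra byte.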
percpu allocation failures in kvm
On 3.7.0 + irrelevant patches, I get this on boot. I've seen it on and off on earlier kernels, I think (although I'm not currently getting it on 3.5).

[ 10.230054] PERCPU: allocation failed, size=304 align=32, alloc from reserved chunk failed
[ 10.230059] Pid: 1026, comm: modprobe Tainted: G W 3.7.0-ama+ #5
[ 10.230060] Call Trace:
[ 10.230070] [81129efb] pcpu_alloc+0x9db/0xa40
[ 10.230074] [810a81ad] ? find_symbol_in_section+0x4d/0x140
[ 10.230077] [810a8160] ? finished_loading+0x50/0x50
[ 10.230080] [810a8af0] ? each_symbol_section+0x30/0x70
[ 10.230083] [810a8b61] ? find_symbol+0x31/0x60
[ 10.230086] [8112a1f3] __alloc_reserved_percpu+0x13/0x20
[ 10.230089] [810ab48d] load_module+0x3ed/0x1b50
[ 10.230093] [81075c3b] ? __srcu_read_unlock+0x4b/0x70

--Andy
Re: percpu allocation failures in kvm
On Fri, Dec 14, 2012 at 5:03 PM, Marcelo Tosatti mtosa...@redhat.com wrote:
> On Thu, Dec 13, 2012 at 09:43:23PM -0800, Andy Lutomirski wrote:
> > On 3.7.0 + irrelevant patches, I get this on boot. I've seen it on and off on earlier kernels, I think (although I'm not currently getting it on 3.5).
> >
> > [ 10.230054] PERCPU: allocation failed, size=304 align=32, alloc from reserved chunk failed
> > [ 10.230059] Pid: 1026, comm: modprobe Tainted: G W 3.7.0-ama+ #5
> > [ 10.230060] Call Trace:
> > [ 10.230070] [81129efb] pcpu_alloc+0x9db/0xa40
> > [ 10.230074] [810a81ad] ? find_symbol_in_section+0x4d/0x140
> > [ 10.230077] [810a8160] ? finished_loading+0x50/0x50
> > [ 10.230080] [810a8af0] ? each_symbol_section+0x30/0x70
> > [ 10.230083] [810a8b61] ? find_symbol+0x31/0x60
> > [ 10.230086] [8112a1f3] __alloc_reserved_percpu+0x13/0x20
> > [ 10.230089] [810ab48d] load_module+0x3ed/0x1b50
> > [ 10.230093] [81075c3b] ? __srcu_read_unlock+0x4b/0x70
> >
> > --Andy
>
> You're loading the kvm module, or loading some other module inside a kvm guest?

This is loading the kvm module on startup. There are no guests.

--
Andy Lutomirski
AMA Capital Management, LLC
[KVM paravirt issue?] Re: vsyscall=emulate regression
Hi, kvm people-

Here's a strange failure. It could be a bug in something RHEL6-specific, but it could be a generic issue that only triggers with a paravirt guest with old userspace on a non-ept host. There was a bug like this on Xen, and I'm wondering whether something's wrong on kvm as well.

For background, a change in 3.1 (IIRC) means that, when vsyscall=emulate or vsyscall=none, the vsyscall page in the fixmap is NX. It seems like Amit's machine is marking the physical PTE present but unreadable.

So I could have messed up, or there could be a subtle bug somewhere. Any ideas?

I'll try to reproduce on a non-ept host later on, but that will involve finding one.

On Wed, Feb 15, 2012 at 3:01 AM, Amit Shah amit.s...@redhat.com wrote:
> On (Tue) 14 Feb 2012 [08:26:22], Andy Lutomirski wrote:
> > On Tue, Feb 14, 2012 at 4:22 AM, Amit Shah amit.s...@redhat.com wrote:
> >
> > Can you try booting the initramfs here:
> > http://web.mit.edu/luto/www/linux/vsyscall_initramfs.img
> > with your kernel image (i.e. qemu-kvm -kernel whatever -initrd vsyscall_initramfs.img -whatever_else) and seeing what happens? It works for me.
>
> This too results in a similar error.

Can you post the exact error? I'm interested in how far it gets before it fails.

> I didn't try a modern distro, but looks like this is enough evidence for now to check the kvm emulator code. I tried the same guests on a newer kernel (Fedora 16's 3.2), and things worked fine except for vsyscall=none, panic message below.

vsyscall=none isn't supposed to work unless you're running a very modern distro *and* you have no legacy static binaries *and* you aren't using anything written in Go (sigh). It will probably either never become the default or will take 5-10 years.

> model name : Intel(R) Core(TM)2 Duo CPU E6550 @ 2.33GHz
> flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm lahf_lm dts tpr_shadow vnmi flexpriority

Hmm. You don't have ept. If your guest kernel supports paravirt, then you might use the hypercall interface instead of programming the fixmap directly.

> This is what I get with vsyscall=none, where emulate and native work fine on the 3.2 kernel on different host hardware, the guest stays the same:
>
> [ 2.874661] debug: unmapping init memory 8167f000..818dc000
> [ 2.876778] Write protecting the kernel read-only data: 6144k
> [ 2.879111] debug: unmapping init memory 880001318000..88000140
> [ 2.881242] debug: unmapping init memory 8800015a..88000160
> [ 2.884637] init[1] vsyscall attempted with vsyscall=none ip:ff600400 cs:33 sp:7fff2f48fe18 ax:7fff2f48fe50 si:7fff2f48ff08 di:0

This line (vsyscall attempted) means that the emulation worked correctly. Your other traces didn't have it or anything like it, which mostly rules out do_emulate_vsyscall issues.

--Andy
Re: [KVM paravirt issue?] Re: vsyscall=emulate regression
On Thu, Feb 16, 2012 at 8:17 AM, Avi Kivity a...@redhat.com wrote:
> On 02/15/2012 09:36 PM, Andy Lutomirski wrote:
> > Hi, kvm people-
> >
> > Here's a strange failure. It could be a bug in something RHEL6-specific, but it could be a generic issue that only triggers with a paravirt guest with old userspace on a non-ept host. There was a bug like this on Xen, and I'm wondering whether something's wrong on kvm as well.
> >
> > For background, a change in 3.1 (IIRC) means that, when vsyscall=emulate or vsyscall=none, the vsyscall page in the fixmap is NX. It seems like Amit's machine is marking the physical PTE present but unreadable.
>
> No such thing as present and unreadable, without EPT.
>
> > So I could have messed up, or there could be a subtle bug somewhere. Any ideas?
>
> What's the code trying to do? Execute an instruction from a non-executable page, trap the #PF, and emulate?
>
> And what are the symptoms? wrong error code for the #PF? That could easily be a kvm bug.

The symptom is that some kind of access to a page that's supposed to be readable, NX is reporting error 5. I'm not quite sure what kind of access is causing that.

> > I'll try to reproduce on a non-ept host later on, but that will involve finding one.
>
> rmmod kvm-intel
> modprobe kvm-intel ept=0

I just tried that and still can't reproduce the problem. FWIW, I also failed to reproduce it on the one RHEL6 machine I have access to.

> > Hmm. You don't have ept. If your guest kernel supports paravirt, then you might use the hypercall interface instead of programming the fixmap directly.
>
> There is no hypercall interface for writing page tables in kvm.

Evidently I was looking at the removed kvm_set_pte stuff :)

> > > This is what I get with vsyscall=none, where emulate and native work fine on the 3.2 kernel on different host hardware, the guest stays the same:
> > >
> > > [ 2.874661] debug: unmapping init memory 8167f000..818dc000
> > > [ 2.876778] Write protecting the kernel read-only data: 6144k
> > > [ 2.879111] debug: unmapping init memory 880001318000..88000140
> > > [ 2.881242] debug: unmapping init memory 8800015a..88000160
> > > [ 2.884637] init[1] vsyscall attempted with vsyscall=none ip:ff600400 cs:33 sp:7fff2f48fe18 ax:7fff2f48fe50 si:7fff2f48ff08 di:0
> >
> > This line (vsyscall attempted) means that the emulation worked correctly. Your other traces didn't have it or anything like it, which mostly rules out do_emulate_vsyscall issues.
>
> Can you point me at the code in question?

The setup code is in arch/x86/kernel/vsyscall_64.c in map_vsyscall. The bad access is to the vsyscall page.

Amit, a trace would be nice. The full output from a test boot of my (updated this morning) initramfs here:
http://web.mit.edu/luto/www/linux/vsyscall_initramfs.img
may give a better hint. The updated code is here:

#include <unistd.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

typedef time_t (*vsys_time_t)(time_t *);

int main()
{
	/* time() entry point inside the legacy vsyscall page, whose base is 0xffffffffff600000 */
	vsys_time_t vsys_time = (vsys_time_t)0xffffffffff600400UL;
	unsigned char *p = (unsigned char *)0xffffffffff600400UL;
	int i;

	/* Plain data read of the page: must succeed if the page is readable. */
	printf("Will try reading...\n");
	printf("The first few bytes are:\n");
	for (i = 0; i < 16; i++) {
		unsigned char c = p[i];
		printf("%02x ", (int)c);
	}
	printf("\n");

	/* Instruction fetch from the NX page: should trap and be emulated. */
	printf("Will try executing...\n");
	printf("The time is %ld\n", (long)(vsys_time(0)));

	printf("All done\n");
	while (1)
		pause();
}

--Andy
Re: [KVM paravirt issue?] Re: vsyscall=emulate regression
On Thu, Feb 16, 2012 at 9:14 AM, Avi Kivity a...@redhat.com wrote:
> On 02/16/2012 06:45 PM, Andy Lutomirski wrote:
> > > > So I could have messed up, or there could be a subtle bug somewhere. Any ideas?
> > >
> > > What's the code trying to do? Execute an instruction from a non-executable page, trap the #PF, and emulate?
> > >
> > > And what are the symptoms? wrong error code for the #PF? That could easily be a kvm bug.
> >
> > The symptom is that some kind of access to a page that's supposed to be readable, NX is reporting error 5. I'm not quite sure what kind of access is causing that.
>
> Might it be a fetch access, with kvm forgetting to set bit 4 correctly?
>
> > > Can you point me at the code in question?
> >
> > The setup code is in arch/x86/kernel/vsyscall_64.c in map_vsyscall. The bad access is to the vsyscall page.
>
> The bad access is on purpose, yes? From fault.c:
>
> #ifdef CONFIG_X86_64
> 	/*
> 	 * Instruction fetch faults in the vsyscall page might need
> 	 * emulation.
> 	 */
> 	if (unlikely((error_code & PF_INSTR) &&
> 		     ((address & ~0xfff) == VSYSCALL_START))) {
> 		if (emulate_vsyscall(regs, address))
> 			return;
> 	}
> #endif
>
> so it seems like kvm doesn't set PF_INSTR?

Yes, this is on purpose, and you're almost certainly right (and I feel dumb for not figuring this out immediately). The error message is:

segfault at ff600400 ip ff600400 sp 7fff103d72f8 error 5

which is garbage. The instruction at 0xff600400 can't fetch itself as data and fault on the data access (at least not in 64-bit mode, as far as I can think of, without evil messing with the TLBs).

So... what do we do about this? This (whitespace-damaged, untested) patch will probably work around it well enough to boot the system:

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 9d74824..52b9522 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -741,8 +741,11 @@ __bad_area_nosemaphore(struct pt_regs *regs, unsigned long
 	 * Instruction fetch faults in the vsyscall page might need
 	 * emulation.
 	 */
-	if (unlikely((error_code & PF_INSTR) &&
+	if (unlikely(address == regs->ip && !(error_code & PF_WRITE) &&
 		     ((address & ~0xfff) == VSYSCALL_START))) {
+		WARN_ONCE(!(error_code & PF_INSTR),
+			  "Fixing up bogus vsyscall read fault -- "
+			  "your hypervisor is buggy.");
 		if (emulate_vsyscall(regs, address))
 			return;
 	}

Before we patch the guest like this, though, it would be nice to know what hosts are affected. If it's just one version of RHEL6, maybe it makes sense to fix the hypervisor and either leave the guest alone or just add a warning saying to fix your hypervisor, like:

WARN_ONCE(address == regs->ip && !(error_code & (PF_INSTR | PF_WRITE)) && user_64bit_mode(regs),
	  "Fishy page fault -- you might need to fix your hypervisor");

near some exit path in the page fault handler. The 64-bit check is because (I think) 32-bit code can mess with regs->ip using a cs offset in the LDT and trigger the warning at will.

--Andy
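For readers decoding the "error 5" discussed in this message, the following is a minimal userspace C sketch, not kernel code. The bit values follow the x86 page-fault error code layout; the function names are hypothetical, and the two predicates only mirror the shape of the original check and of the proposed workaround.

```c
#include <stdbool.h>
#include <stdint.h>

/* x86 page-fault error code bits (same meaning as the PF_* names in the thread). */
#define PF_PROT   (1u << 0)	/* fault on a present page */
#define PF_WRITE  (1u << 1)	/* write access */
#define PF_USER   (1u << 2)	/* access from user mode */
#define PF_RSVD   (1u << 3)	/* reserved bit set in a page-table entry */
#define PF_INSTR  (1u << 4)	/* instruction fetch */

#define VSYSCALL_START 0xffffffffff600000ULL

static bool in_vsyscall_page(uint64_t address)
{
	return (address & ~0xfffULL) == VSYSCALL_START;
}

/* Shape of the original check: trust the hypervisor to report PF_INSTR. */
static bool strict_check(uint32_t error_code, uint64_t address)
{
	return (error_code & PF_INSTR) && in_vsyscall_page(address);
}

/* Shape of the workaround: infer a fetch from "faulting address == ip, not a write". */
static bool lenient_check(uint32_t error_code, uint64_t address, uint64_t ip)
{
	return address == ip && !(error_code & PF_WRITE) && in_vsyscall_page(address);
}
```

With these definitions, error 5 is PF_PROT | PF_USER (a user-mode read of a present page), while a correctly reported fetch fault would be 0x15. The strict predicate rejects error 5; the lenient one accepts it when the faulting address equals the instruction pointer.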
Re: [KVM paravirt issue?] Re: vsyscall=emulate regression
On Thu, Feb 23, 2012 at 8:34 PM, H. Peter Anvin h...@zytor.com wrote:
> On 02/16/2012 09:39 AM, Avi Kivity wrote:
> > Yes, this is on purpose
>
> Why?

I think the "this" refers to the PF_INSTR fault when executing at 0xff600xxx. That's definitely intentional -- it's how vsyscall emulation works. I think it's unintentional that some kvm versions apparently forget to set the PF_INSTR bit.

--Andy

> -hpa
>
> --
> H. Peter Anvin, Intel Open Source Technology Center
> I work for Intel. I don't speak on their behalf.

--
Andy Lutomirski
AMA Capital Management, LLC
Office: (310) 553-5322
Mobile: (650) 906-0647
Re: KVM: x86: use dynamic percpu allocations for shared msrs area
On Thu, Jan 3, 2013 at 5:41 AM, Marcelo Tosatti mtosa...@redhat.com wrote:
> Andy, Mike, can you confirm whether this fixes the percpu allocation failures when loading kvm.ko? TIA
>
> Use dynamic percpu allocations for the shared msrs structure, to avoid using the limited reserved percpu space.
>
> Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Sorry for the amazingly long delay. What kernel does this apply to?

--Andy
[PATCH 3/5] x86: Annotate _ASM_EXTABLE users to distinguish uaccess from everything else
The idea is that the kernel can be much more careful fixing up uaccess exceptions -- page faults on user addresses are the only legitimate reason for a uaccess instruction to fault.

Signed-off-by: Andy Lutomirski l...@amacapital.net
---

I'm not 100% sure what's happening in the KVM code. Can someone familiar with it take a look?

 arch/x86/ia32/ia32entry.S             | 4 +-
 arch/x86/include/asm/asm.h            | 13 ++-
 arch/x86/include/asm/fpu-internal.h   | 6 +-
 arch/x86/include/asm/futex.h          | 8 +-
 arch/x86/include/asm/kvm_host.h       | 2 +-
 arch/x86/include/asm/msr.h            | 4 +-
 arch/x86/include/asm/segment.h        | 2 +-
 arch/x86/include/asm/special_insns.h  | 2 +-
 arch/x86/include/asm/uaccess.h        | 8 +-
 arch/x86/include/asm/word-at-a-time.h | 2 +-
 arch/x86/include/asm/xsave.h          | 6 +-
 arch/x86/kernel/entry_32.S            | 26 ++---
 arch/x86/kernel/entry_64.S            | 6 +-
 arch/x86/kernel/ftrace.c              | 4 +-
 arch/x86/kernel/test_nx.c             | 2 +-
 arch/x86/kernel/test_rodata.c         | 2 +-
 arch/x86/kvm/emulate.c                | 4 +-
 arch/x86/lib/checksum_32.S            | 4 +-
 arch/x86/lib/copy_user_64.S           | 50
 arch/x86/lib/copy_user_nocache_64.S   | 44 +++
 arch/x86/lib/csum-copy_64.S           | 6 +-
 arch/x86/lib/getuser.S                | 12 +-
 arch/x86/lib/mmx_32.c                 | 12 +-
 arch/x86/lib/msr-reg.S                | 4 +-
 arch/x86/lib/putuser.S                | 10 +-
 arch/x86/lib/usercopy_32.c            | 212 +-
 arch/x86/lib/usercopy_64.c            | 4 +-
 arch/x86/mm/init_32.c                 | 2 +-
 arch/x86/um/checksum_32.S             | 4 +-
 arch/x86/xen/xen-asm_32.S             | 2 +-
 30 files changed, 236 insertions(+), 231 deletions(-)

diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
index 474dc1b..8d3b5c2 100644
--- a/arch/x86/ia32/ia32entry.S
+++ b/arch/x86/ia32/ia32entry.S
@@ -149,7 +149,7 @@ ENTRY(ia32_sysenter_target)
 	   32bit zero extended */
 	ASM_STAC
 1:	movl (%rbp),%ebp
-	_ASM_EXTABLE(1b,ia32_badarg)
+	_ASM_EXTABLE_UACCESS(1b,ia32_badarg)
 	ASM_CLAC
 	orl $TS_COMPAT,TI_status+THREAD_INFO(%rsp,RIP-ARGOFFSET)
 	testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
@@ -306,7 +306,7 @@ ENTRY(ia32_cstar_target)
 	/* hardware stack frame is complete now */
 	ASM_STAC
 1:	movl (%r8),%r9d
-	_ASM_EXTABLE(1b,ia32_badarg)
+	_ASM_EXTABLE_UACCESS(1b,ia32_badarg)
 	ASM_CLAC
 	orl $TS_COMPAT,TI_status+THREAD_INFO(%rsp,RIP-ARGOFFSET)
 	testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
diff --git a/arch/x86/include/asm/asm.h b/arch/x86/include/asm/asm.h
index fa47fd4..f48a850 100644
--- a/arch/x86/include/asm/asm.h
+++ b/arch/x86/include/asm/asm.h
@@ -57,14 +57,16 @@
  */
 
 /* There are two bits of extable entry class, added to a signed offset. */
-#define _EXTABLE_CLASS_DEFAULT 0 /* standard uaccess fixup */
+#define _EXTABLE_CLASS_UACCESS 0 /* standard uaccess fixup */
+#define _EXTABLE_CLASS_ANY 0x4000 /* catch any exception */
 #define _EXTABLE_CLASS_EX 0x8000 /* uaccess + set uaccess_err */
 
 /*
  * The biases are the class constants + 0x2000, as signed integers.
  * This can't use ordinary arithmetic -- the assembler isn't that smart.
  */
-#define _EXTABLE_BIAS_DEFAULT 0x2000
+#define _EXTABLE_BIAS_UACCESS 0x2000
+#define _EXTABLE_BIAS_ANY 0x2000 + 0x4000
 #define _EXTABLE_BIAS_EX 0x2000 - 0x8000
 
 #ifdef __ASSEMBLY__
@@ -85,8 +87,11 @@
 	" .popsection\n"
 #endif
 
-#define _ASM_EXTABLE(from,to) \
-	_ASM_EXTABLE_CLASS(from, to, _EXTABLE_BIAS_DEFAULT)
+#define _ASM_EXTABLE_UACCESS(from,to) \
+	_ASM_EXTABLE_CLASS(from, to, _EXTABLE_BIAS_UACCESS)
+
+#define _ASM_EXTABLE_ANY(from,to) \
+	_ASM_EXTABLE_CLASS(from, to, _EXTABLE_BIAS_ANY)
 
 #define _ASM_EXTABLE_EX(from,to) \
 	_ASM_EXTABLE_CLASS(from, to, _EXTABLE_BIAS_EX)
diff --git a/arch/x86/include/asm/fpu-internal.h b/arch/x86/include/asm/fpu-internal.h
index e25cc33..7f86031 100644
--- a/arch/x86/include/asm/fpu-internal.h
+++ b/arch/x86/include/asm/fpu-internal.h
@@ -133,7 +133,7 @@ static inline void sanitize_i387_state(struct task_struct *tsk)
 	"3: movl $-1,%[err]\n" \
 	" jmp 2b\n" \
 	".previous\n" \
-	_ASM_EXTABLE(1b, 3b) \
+	_ASM_EXTABLE_UACCESS(1b, 3b
Re: VDSO pvclock may increase host cpu consumption, is this a problem?
On 03/29/2014 01:47 AM, Zhanghailiang wrote:
> Hi,
> I found that when the Guest is idle, VDSO pvclock may increase host consumption. We can calculate as follows. Correct me if I am wrong.
>
> (Host) 250 * update_pvclock_gtod = 1500 * gettimeofday (Guest)
>
> In the Host, VDSO pvclock introduces a notifier chain, pvclock_gtod_chain in timekeeping.c. It consumes nearly 900 cycles per call. So at 250 Hz, it may consume 225,000 cycles per second, even if no VM is created.
>
> In the Guest, gettimeofday consumes 220 cycles per call with VDSO pvclock. If no-kvmclock-vsyscall is configured, gettimeofday consumes 370 cycles per call. The feature saves 150 cycles per call.
>
> When gettimeofday is called 1500 times, it saves 225,000 cycles, equal to the host consumption.
>
> Both Host and Guest are linux-3.13.6.
>
> So, is the host cpu consumption a problem?

Does pvclock serve any real purpose on systems with fully-functional TSCs? The x86 guest implementation is awful, so it's about 2x slower than TSC. It could be improved a lot, but I'm not sure I understand why it exists in the first place.

I certainly understand the goal of keeping the guest CLOCK_REALTIME in sync with the host, but pvclock seems like overkill for that.

--Andy
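The break-even arithmetic in the quoted report can be checked with a few lines of C. This is only a restatement of the poster's figures (250 Hz, 900 cycles per notifier call, 220 versus 370 cycles per gettimeofday); the constant and function names are made up for illustration.

```c
#include <stdint.h>

/* Figures quoted in the report; names are illustrative only. */
#define HOST_HZ                   250u	/* notifier calls per second */
#define HOST_NOTIFIER_CYCLES      900u	/* cycles per update_pvclock_gtod call */
#define GUEST_GTOD_VDSO_CYCLES    220u	/* gettimeofday with VDSO pvclock */
#define GUEST_GTOD_NO_VDSO_CYCLES 370u	/* gettimeofday with no-kvmclock-vsyscall */

/* Cycles the host spends per second on the notifier chain. */
static uint64_t host_cost_per_second(void)
{
	return (uint64_t)HOST_HZ * HOST_NOTIFIER_CYCLES;
}

/* Cycles the guest saves on each gettimeofday call. */
static uint64_t guest_saving_per_call(void)
{
	return GUEST_GTOD_NO_VDSO_CYCLES - GUEST_GTOD_VDSO_CYCLES;
}

/* gettimeofday calls per second at which saving equals host cost. */
static uint64_t break_even_calls_per_second(void)
{
	return host_cost_per_second() / guest_saving_per_call();
}
```

Under these numbers the host cost is 225,000 cycles per second and the break-even point is 1500 guest calls per second, which matches the equation in the report.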
Re: VDSO pvclock may increase host cpu consumption, is this a problem?
On Mar 31, 2014 8:45 PM, Marcelo Tosatti mtosa...@redhat.com wrote:
> On Mon, Mar 31, 2014 at 10:52:25AM -0700, Andy Lutomirski wrote:
> > On 03/29/2014 01:47 AM, Zhanghailiang wrote:
> > > Hi,
> > > I found that when the Guest is idle, VDSO pvclock may increase host consumption. We can calculate as follows. Correct me if I am wrong.
> > >
> > > (Host) 250 * update_pvclock_gtod = 1500 * gettimeofday (Guest)
> > >
> > > In the Host, VDSO pvclock introduces a notifier chain, pvclock_gtod_chain in timekeeping.c. It consumes nearly 900 cycles per call. So at 250 Hz, it may consume 225,000 cycles per second, even if no VM is created.
> > >
> > > In the Guest, gettimeofday consumes 220 cycles per call with VDSO pvclock. If no-kvmclock-vsyscall is configured, gettimeofday consumes 370 cycles per call. The feature saves 150 cycles per call.
> > >
> > > When gettimeofday is called 1500 times, it saves 225,000 cycles, equal to the host consumption.
> > >
> > > Both Host and Guest are linux-3.13.6.
> > >
> > > So, is the host cpu consumption a problem?
> >
> > Does pvclock serve any real purpose on systems with fully-functional TSCs? The x86 guest implementation is awful, so it's about 2x slower than TSC. It could be improved a lot, but I'm not sure I understand why it exists in the first place.
>
> VM migration.

Why does that need percpu stuff? Wouldn't it be sufficient to interrupt all CPUs (or at least all cpus running in userspace) on migration and update the normal timing data structures?

Even better: have the VM offer to invalidate the physical page containing the kernel's clock data on migration and interrupt one CPU. If another CPU races, it'll fault and wait for the guest kernel to update its timing.

Does the current kvmclock stuff track CLOCK_MONOTONIC and CLOCK_REALTIME separately?

> Can you explain why you consider it so bad ? How you think it could be improved ?

The second rdtsc_barrier looks unnecessary. Even better, if rdtscp is available, then rdtscp can replace rdtsc_barrier, rdtsc, and the getcpu call.

It would also be nice to avoid having two sets of rescalings of the timing data.

> > I certainly understand the goal of keeping the guest CLOCK_REALTIME in sync with the host, but pvclock seems like overkill for that.
>
> VM migration.
Re: VDSO pvclock may increase host cpu consumption, is this a problem?
On Tue, Apr 1, 2014 at 11:01 AM, Marcelo Tosatti mtosa...@redhat.com wrote:
> On Mon, Mar 31, 2014 at 10:33:41PM -0700, Andy Lutomirski wrote:
> > On Mar 31, 2014 8:45 PM, Marcelo Tosatti mtosa...@redhat.com wrote:
> > > On Mon, Mar 31, 2014 at 10:52:25AM -0700, Andy Lutomirski wrote:
> > > > On 03/29/2014 01:47 AM, Zhanghailiang wrote:
> > > > > Hi,
> > > > > I found that when the Guest is idle, VDSO pvclock may increase host consumption. We can calculate as follows. Correct me if I am wrong.
> > > > >
> > > > > (Host) 250 * update_pvclock_gtod = 1500 * gettimeofday (Guest)
> > > > >
> > > > > In the Host, VDSO pvclock introduces a notifier chain, pvclock_gtod_chain in timekeeping.c. It consumes nearly 900 cycles per call. So at 250 Hz, it may consume 225,000 cycles per second, even if no VM is created.
> > > > >
> > > > > In the Guest, gettimeofday consumes 220 cycles per call with VDSO pvclock. If no-kvmclock-vsyscall is configured, gettimeofday consumes 370 cycles per call. The feature saves 150 cycles per call.
> > > > >
> > > > > When gettimeofday is called 1500 times, it saves 225,000 cycles, equal to the host consumption.
> > > > >
> > > > > Both Host and Guest are linux-3.13.6.
> > > > >
> > > > > So, is the host cpu consumption a problem?
> > > >
> > > > Does pvclock serve any real purpose on systems with fully-functional TSCs? The x86 guest implementation is awful, so it's about 2x slower than TSC. It could be improved a lot, but I'm not sure I understand why it exists in the first place.
> > >
> > > VM migration.
> >
> > Why does that need percpu stuff? Wouldn't it be sufficient to interrupt all CPUs (or at least all cpus running in userspace) on migration and update the normal timing data structures?
>
> Are you suggesting to allow interruption of the timekeeping code at any time to update frequency information ?

I'm not sure what you mean by interruption of the timekeeping code. I'm suggesting sending an interrupt to the guest (via a virtio device, presumably) to tell it that it has been paused and resumed.

> This is probably worth getting John's input if you actually want to do this.

I'm not about to :)

Is there any case in which the TSC is stable and the kvmclock data for different cpus is actually different?

> Do you want to do that as a special tsc clocksource driver ?
>
> > Even better: have the VM offer to invalidate the physical page containing the kernel's clock data on migration and interrupt one CPU. If another CPU races, it'll fault and wait for the guest kernel to update its timing.
>
> Perhaps that is a good idea.
>
> > Does the current kvmclock stuff track CLOCK_MONOTONIC and CLOCK_REALTIME separately?
>
> No. kvmclock counting is interrupted on vm pause (the hw clock does not count during vm pause).

Makes sense.

> > > Can you explain why you consider it so bad ? How you think it could be improved ?
> >
> > The second rdtsc_barrier looks unnecessary. Even better, if rdtscp is available, then rdtscp can replace rdtsc_barrier, rdtsc, and the getcpu call.
> >
> > It would also be nice to avoid having two sets of rescalings of the timing data.
>
> Yep, probably good improvements, patches are welcome :-)

I may get to it at some point. No guarantees. I did just rewrite all the mapping-related code for every other x86 vdso timesource, so maybe I should try to add this to the pile. The fact that the data is a variable number of pages makes it messy, though, and since I don't understand why there's a separate structure for each CPU, I'm hesitant to change it too much.

--Andy
Re: VDSO pvclock may increase host cpu consumption, is this a problem?
On Tue, Apr 1, 2014 at 5:12 PM, Marcelo Tosatti mtosa...@redhat.com wrote:
> On Tue, Apr 01, 2014 at 12:17:16PM -0700, Andy Lutomirski wrote:
> > On Tue, Apr 1, 2014 at 11:01 AM, Marcelo Tosatti mtosa...@redhat.com wrote:
> > > On Mon, Mar 31, 2014 at 10:33:41PM -0700, Andy Lutomirski wrote:
> > > > On Mar 31, 2014 8:45 PM, Marcelo Tosatti mtosa...@redhat.com wrote:
> > > > > On Mon, Mar 31, 2014 at 10:52:25AM -0700, Andy Lutomirski wrote:
> > > > > > On 03/29/2014 01:47 AM, Zhanghailiang wrote:
> > > > > > > Hi,
> > > > > > > I found that when the Guest is idle, VDSO pvclock may increase host consumption. We can calculate as follows. Correct me if I am wrong.
> > > > > > >
> > > > > > > (Host) 250 * update_pvclock_gtod = 1500 * gettimeofday (Guest)
> > > > > > >
> > > > > > > In the Host, VDSO pvclock introduces a notifier chain, pvclock_gtod_chain in timekeeping.c. It consumes nearly 900 cycles per call. So at 250 Hz, it may consume 225,000 cycles per second, even if no VM is created.
> > > > > > >
> > > > > > > In the Guest, gettimeofday consumes 220 cycles per call with VDSO pvclock. If no-kvmclock-vsyscall is configured, gettimeofday consumes 370 cycles per call. The feature saves 150 cycles per call.
> > > > > > >
> > > > > > > When gettimeofday is called 1500 times, it saves 225,000 cycles, equal to the host consumption.
> > > > > > >
> > > > > > > Both Host and Guest are linux-3.13.6.
> > > > > > >
> > > > > > > So, is the host cpu consumption a problem?
> > > > > >
> > > > > > Does pvclock serve any real purpose on systems with fully-functional TSCs? The x86 guest implementation is awful, so it's about 2x slower than TSC. It could be improved a lot, but I'm not sure I understand why it exists in the first place.
> > > > >
> > > > > VM migration.
> > > >
> > > > Why does that need percpu stuff? Wouldn't it be sufficient to interrupt all CPUs (or at least all cpus running in userspace) on migration and update the normal timing data structures?
> > >
> > > Are you suggesting to allow interruption of the timekeeping code at any time to update frequency information ?
> >
> > I'm not sure what you mean by interruption of the timekeeping code. I'm suggesting sending an interrupt to the guest (via a virtio device, presumably) to tell it that it has been paused and resumed.
>
> code:
> 1) disable interrupts
> 2) A = RDTSC
> 3) B = SCALE(A, TSC.FREQ)
>
> If migration happens between 2 and 3, you've got an incorrect value.

Fair enough. I guess

1) disable interrupts
2) A = RDTSC
3) B = SCALE(A, TSC.FREQ)

is also bad if (3) blocks due to magic invalidation of the physical page.

--Andy
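The race between steps 2 and 3 is the kind of problem a version counter addresses: the reader snapshots a version, reads the parameters and the TSC, and retries if the version changed or was odd. The C sketch below is a single-threaded illustration of that retry pattern only. The struct, the fake TSC and all names are assumptions made for the example, it is not the kernel's pvclock code, and a real concurrent implementation needs the kernel's own barriers and access rules.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical clock parameters, loosely modeled on a versioned time page. */
struct clock_params {
	atomic_uint version;		/* odd while an update is in progress */
	uint64_t tsc_timestamp;		/* TSC value at the last update */
	uint64_t system_time_ns;	/* nanoseconds at tsc_timestamp */
	uint32_t mul;			/* ns per tick, 0.32 fixed point */
};

/* (delta * mul) >> 32 without needing a 128-bit type. */
static uint64_t mul_shift32(uint64_t delta, uint32_t mul)
{
	uint64_t hi = delta >> 32, lo = delta & 0xffffffffu;
	return hi * mul + ((lo * mul) >> 32);
}

/* Read the clock, retrying if the parameters changed underneath us. */
static uint64_t read_clock(struct clock_params *p, uint64_t (*read_tsc)(void))
{
	unsigned v0, v1;
	uint64_t ns;

	do {
		v0 = atomic_load_explicit(&p->version, memory_order_acquire);
		uint32_t mul = p->mul;
		uint64_t base = p->system_time_ns;
		uint64_t stamp = p->tsc_timestamp;
		uint64_t tsc = read_tsc();	/* "step 2" */
		ns = base + mul_shift32(tsc - stamp, mul);	/* "step 3" */
		atomic_thread_fence(memory_order_acquire);
		v1 = atomic_load_explicit(&p->version, memory_order_relaxed);
	} while ((v0 & 1u) || v0 != v1);

	return ns;
}

/* Demo state: 0.5 ns per tick before the simulated migration. */
static struct clock_params g_clock = {
	.version = 2, .tsc_timestamp = 1000,
	.system_time_ns = 5000, .mul = 0x80000000u,
};
static int g_reads;

/* Fake TSC: the first read is interrupted by a simulated migration. */
static uint64_t fake_tsc(void)
{
	if (g_reads++ == 0) {
		atomic_fetch_add(&g_clock.version, 1);	/* update begins */
		g_clock.mul = 0x40000000u;		/* now 0.25 ns per tick */
		atomic_fetch_add(&g_clock.version, 1);	/* update ends */
	}
	return 3000;
}
```

Calling read_clock(&g_clock, fake_tsc) discards the first result, which mixed the old multiplier with a post-update TSC read, and returns the value computed on the second pass.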
Re: VDSO pvclock may increase host cpu consumption, is this a problem?
On Tue, Apr 1, 2014 at 5:29 PM, Marcelo Tosatti mtosa...@redhat.com wrote: On Tue, Apr 01, 2014 at 12:17:16PM -0700, Andy Lutomirski wrote: On Tue, Apr 1, 2014 at 11:01 AM, Marcelo Tosatti mtosa...@redhat.com wrote: On Mon, Mar 31, 2014 at 10:33:41PM -0700, Andy Lutomirski wrote: On Mar 31, 2014 8:45 PM, Marcelo Tosatti mtosa...@redhat.com wrote: On Mon, Mar 31, 2014 at 10:52:25AM -0700, Andy Lutomirski wrote: On 03/29/2014 01:47 AM, Zhanghailiang wrote:
Hi, I found that when the guest is idle, VDSO pvclock may increase host CPU consumption. We can calculate as follows; correct me if I am wrong. (Host) 250 * update_pvclock_gtod = 1500 * gettimeofday (Guest). In the host, VDSO pvclock introduces a notifier chain, pvclock_gtod_chain, in timekeeping.c. It consumes nearly 900 cycles per call. So at 250 Hz it may consume 225,000 cycles per second, even if no VM is created. In the guest, gettimeofday consumes 220 cycles per call with VDSO pvclock. If no-kvmclock-vsyscall is configured, gettimeofday consumes 370 cycles per call. The feature saves 150 cycles per call. When gettimeofday is called 1500 times, it saves 225,000 cycles, equal to the host consumption. Both host and guest are linux-3.13.6. So, is the host CPU consumption a problem?
Does pvclock serve any real purpose on systems with fully-functional TSCs? The x86 guest implementation is awful, so it's about 2x slower than TSC. It could be improved a lot, but I'm not sure I understand why it exists in the first place.
VM migration.
Why does that need percpu stuff? Wouldn't it be sufficient to interrupt all CPUs (or at least all cpus running in userspace) on migration and update the normal timing data structures?
Are you suggesting to allow interruption of the timekeeping code at any time to update frequency information?
I'm not sure what you mean by interruption of the timekeeping code. I'm suggesting sending an interrupt to the guest (via a virtio device, presumably) to tell it that it has been paused and resumed.
This is probably worth getting John's input if you actually want to do this. I'm not about to :)
Honestly, neither am I at the moment. But I'll think about it.
Is there any case in which the TSC is stable and the kvmclock data for different cpus is actually different?
No. However, kvmclock_data.flags field is an interface for watchdog unpause. Do you want to do that as a special tsc clocksource driver?
Even better: have the VM offer to invalidate the physical page containing the kernel's clock data on migration and interrupt one CPU. If another CPU races, it'll fault and wait for the guest kernel to update its timing.
Perhaps that is a good idea.
Does the current kvmclock stuff track CLOCK_MONOTONIC and CLOCK_REALTIME separately?
No. kvmclock counting is interrupted on vm pause (the hw clock does not count during vm pause).
Makes sense.
Can you explain why you consider it so bad? How do you think it could be improved?
The second rdtsc_barrier looks unnecessary. Even better, if rdtscp is available, then rdtscp can replace rdtsc_barrier, rdtsc, and the getcpu call. It would also be nice to avoid having two sets of rescalings of the timing data.
Yep, probably good improvements, patches are welcome :-)
I may get to it at some point. No guarantees. I did just rewrite all the mapping-related code for every other x86 vdso timesource, so maybe I should try to add this to the pile. The fact that the data is a variable number of pages makes it messy, though, and since I don't understand why there's a separate structure for each CPU, I'm hesitant to change it too much. --Andy
kvmclock.data? Because each VCPU can have different .flags fields for example.
It looks like the vdso kvmclock code only runs if PVCLOCK_TSC_STABLE_BIT is set, which in turn is only the case if the TSC is guaranteed to be monotonic across all CPUs. If we can rely on the fact that that bit will only be set if tsc_to_system_mul and tsc_shift are the same on all CPUs and that (system_time - (tsc_timestamp * mul) >> shift) is the same on all CPUs, then there should be no reason for the vdso to read the pvclock data for anything but CPU 0. That will make it a lot faster and simpler. Can we rely on that?
I wonder what happens if the guest runs ntpd or otherwise uses adjtimex. Presumably it starts drifting relative to the host. --Andy
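The condition discussed at the end of this message can be checked with a small user-space model. This is only a sketch under assumptions: the struct is a simplified stand-in that carries just the fields named in the thread (it is not the kernel's record layout), the rescaling is the usual shift followed by a 32.32 fixed-point multiply, and `pvclock_scale` and `pvclock_ns` are hypothetical helper names.

```c
#include <stdint.h>

/* Simplified stand-in for one vCPU's pvclock record (assumption: only the
 * fields mentioned in the thread, not the real kernel layout). */
struct pvclock_sample {
    uint64_t tsc_timestamp;      /* TSC value when the record was written */
    uint64_t system_time;        /* nanoseconds at tsc_timestamp */
    uint32_t tsc_to_system_mul;  /* 32.32 fixed-point multiplier */
    int8_t   tsc_shift;          /* pre-shift applied to the TSC delta */
};

/* Rescale a TSC delta to nanoseconds: shift, then (delta * mul) >> 32.
 * The 96-bit product is computed in two 64-bit halves to stay portable. */
static uint64_t pvclock_scale(uint64_t delta, uint32_t mul, int shift)
{
    if (shift < 0)
        delta >>= -shift;
    else
        delta <<= shift;
    uint64_t hi = delta >> 32;
    uint64_t lo = delta & 0xffffffffu;
    return hi * mul + ((lo * mul) >> 32);
}

/* Nanoseconds for a given TSC reading according to one vCPU's record. */
static uint64_t pvclock_ns(const struct pvclock_sample *s, uint64_t tsc)
{
    return s->system_time +
           pvclock_scale(tsc - s->tsc_timestamp, s->tsc_to_system_mul, s->tsc_shift);
}
```

If two vCPUs publish records with the same multiplier and shift, and the same value of system_time minus the scaled tsc_timestamp, then `pvclock_ns` returns the same time for the same TSC reading on both. That is the property that would let the vdso read only CPU 0's record.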
Re: VDSO pvclock may increase host cpu consumption, is this a problem?
On Wed, Apr 2, 2014 at 3:05 PM, Marcelo Tosatti mtosa...@redhat.com wrote: On Tue, Apr 01, 2014 at 05:46:34PM -0700, Andy Lutomirski wrote: [...]
It looks like the vdso kvmclock code only runs if PVCLOCK_TSC_STABLE_BIT is set, which in turn is only the case if the TSC is guaranteed to be monotonic across all CPUs. If we can rely on the fact that that bit will only be set if tsc_to_system_mul and tsc_shift are the same on all CPUs and that (system_time - (tsc_timestamp * mul) >> shift) is the same on all CPUs, then there should be no reason for the vdso to read the pvclock data for anything but CPU 0. That will make it a lot faster and simpler. Can we rely on that?
In theory yes, but you would have to handle the PVCLOCK_TSC_STABLE_BIT set -> PVCLOCK_TSC_STABLE_BIT not set transition (and the other way around as well).
Since !STABLE already results in a real syscall for clock_gettime and gettimeofday, I don't
Re: [PATCH] random: Add initialized variable to proc
On Thu, May 1, 2014 at 8:35 AM, Andy Lutomirski l...@amacapital.net wrote: On Thu, May 1, 2014 at 8:05 AM, ty...@mit.edu wrote: On Wed, Apr 30, 2014 at 09:05:00PM -0700, H. Peter Anvin wrote:
Giving the guest a seed would be highly useful, though. There are a number of ways to do that; changing the boot protocol is probably only useful if Qemu itself boots the kernel as opposed to an in-VM bootloader.
So how about simply passing a memory address and an optional offset on the boot command line? That way the hypervisor can drop the seed in some convenient real memory location, and the kernel can just copy it someplace safe, or in the case of kernel ASLR, the relocator can use it to seed its CRNG, and then after it relocates the kernel, it can crank the CRNG to pass a seed to the kernel's urandom driver. That way, we don't have to do something which is ACPI or DT dependent. Maybe there will be embedded architectures where using DT might be more convenient, but this would probably be simplest for KVM/qemu-based VMs, I would think.
One problem with passing a seed in memory like this is that it provides no benefit if the guest reboots without restarting the hypervisor. Using an MSR or something avoids that issue. Passing an address in I/O space that can be read to synchronously obtain a seed would work, but it could still be messy to get the address to propagate through the bootloader and the reboot process.
A CPUID leaf or an MSR advertised by a CPUID leaf has another advantage: it's easy to use in the ASLR code -- I don't think there's a real IDT, so there's nothing like rdmsr_safe available. It also avoids doing anything complicated with the boot process to allow the same seed to be used for ASLR and random.c; it can just be invoked twice on boot.
Here are two easyish ways to do it:
a. Add a new CPUID leaf KVM_CPUID_URANDOM = 0x40000002. The existence of the leaf is signaled by KVM_CPUID_SIGNATURE.eax >= 0x40000002. Reading the leaf either gives all zeros to indicate that it's unsupported or disabled or it gives 256 bits of urandom-style data in rax,rbx,rcx,rdx. 32-bit callers will have trouble extracting more than 128 of those 256 bits, but that should be fine.
b. Add a new MSR_KVM_URANDOM and indicate support using KVM_FEATURE_URANDOM. This is cleaner, since it matches existing practice, but it's awkward to return more than 64 bits at a time from rdmsr. 128 bits is straightforward by cheating and using the high bits in rax and rdx, but that's kind of gross. Clobbering any more registers is awful, and passing a pointer into wrmsr seems overcomplicated.
There's also the hypercall interface, but it looks like hyperv support can interfere with it, and I'm not sure whether the guest needs to cooperate with whatever the magical vmcall patching code is doing.
What's the right forum for this? This thread is probably not it. --Andy
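Option (a) can be sketched as plain C that works on register values already fetched by CPUID, so it stays independent of inline assembly. Everything here is hypothetical: KVM_CPUID_URANDOM was only a proposal in this thread, the leaf number 0x40000002 is assumed, and `seed_leaf_advertised` and `seed_from_leaf` are illustrative names, not kernel or KVM API.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumed number for the proposed leaf; not part of any released KVM ABI. */
#define KVM_CPUID_URANDOM 0x40000002u

/* The leaf exists if the maximum hypervisor leaf reported in
 * KVM_CPUID_SIGNATURE.eax is at least the seed leaf. */
static bool seed_leaf_advertised(uint32_t signature_eax)
{
    return signature_eax >= KVM_CPUID_URANDOM;
}

/* regs holds the four 64-bit register values returned by the leaf.
 * All zeros means "unsupported or disabled"; otherwise the 256 bits
 * are copied out as the seed. */
static bool seed_from_leaf(const uint64_t regs[4], uint8_t seed[32])
{
    if ((regs[0] | regs[1] | regs[2] | regs[3]) == 0)
        return false;
    memcpy(seed, regs, 32);
    return true;
}
```

A 32-bit caller would only see the low half of each register, which is the 128-bit limitation mentioned in the proposal.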
Re: random: Providing a seed value to VM guests
On Thu, May 1, 2014 at 11:59 AM, H. Peter Anvin h...@zytor.com wrote: On 05/01/2014 11:53 AM, Andy Lutomirski wrote: A CPUID leaf or an MSR advertised by a CPUID leaf has another advantage: it's easy to use in the ASLR code -- I don't think there's a real IDT, so there's nothing like rdmsr_safe available. It also avoids doing anything complicated with the boot process to allow the same seed to be used for ASLR and random.c; it can just be invoked twice on boot. At that point we are talking an x86-specific interface, and so we might as well simply emulate RDRAND (urandom) and RDSEED (random) if the CPU doesn't support them. I believe KVM already has a way to report CPUID features that are emulated but supported anyway, i.e. they work but are slow. Do existing kernels and userspace respect this? If the normal bit for RDRAND is unset, then we might be okay, but, if not, then I think this may kill guest performance. Is RDSEED really reasonable here? Won't it slow down by several orders of magnitude? What's the right forum for this? This thread is probably not it. Change the subject line? :) -hpa -- Andy Lutomirski AMA Capital Management, LLC
Re: random: Providing a seed value to VM guests
On May 1, 2014 12:26 PM, ty...@mit.edu wrote: On Thu, May 01, 2014 at 12:02:49PM -0700, Andy Lutomirski wrote: Is RDSEED really reasonable here? Won't it slow down by several orders of magnitude? That is I think the biggest problem; RDRAND and RDSEED are fast if they are native, but they will involve a VM exit if they need to be emulated. So when an OS might want to use RDRAND and RDSEED might be quite different if we know they are being emulated. Using the RDRAND and RDSEED api certainly makes sense, at least for x86, but I suspect we might want to use a different way of signalling that a VM guest can use RDRAND and RDSEED if they are running on a CPU which doesn't provide that kind of access. Maybe a CPUID extended function parameter, if one could be allocated for use by a Linux hypervisor? I'm still not convinced. This will affect userspace as well as the guest kernel, and I don't see why guest user code should be able to access this API. RDRAND for CPL0 only would work, but that seems odd. And I think that RDSEED emulation is asking for trouble. RDSEED is synchronous, but /dev/random is asynchronous. And making bootup wait for even a single byte from /dev/random seems bad. In any event, virtio-rng should be a better interface for this. - Ted
Re: random: Providing a seed value to VM guests
On Thu, May 1, 2014 at 1:30 PM, H. Peter Anvin h...@zytor.com wrote: RDSEED is not synchronous. It is, however, nonblocking. What I mean is: IIUC it's reasonable to call RDSEED a few times in a loop and hope it works. It makes no sense to do that with /dev/random. On May 1, 2014 1:16:40 PM PDT, Andy Lutomirski l...@amacapital.net wrote: On May 1, 2014 12:26 PM, ty...@mit.edu wrote: On Thu, May 01, 2014 at 12:02:49PM -0700, Andy Lutomirski wrote: Is RDSEED really reasonable here? Won't it slow down by several orders of magnitude? That is I think the biggest problem; RDRAND and RDSEED are fast if they are native, but they will involve a VM exit if they need to be emulated. So when an OS might want to use RDRAND and RDSEED might be quite different if we know they are being emulated. Using the RDRAND and RDSEED api certainly makes sense, at least for x86, but I suspect we might want to use a different way of signalling that a VM guest can use RDRAND and RDSEED if they are running on a CPU which doesn't provide that kind of access. Maybe a CPUID extended function parameter, if one could be allocated for use by a Linux hypervisor? I'm still not convinced. This will affect userspace as well as the guest kernel, and I don't see why guest user code should be able to access this API. RDRAND for CPL0 only would work, but that seems odd. And I think that RDSEED emulation is asking for trouble. RDSEED is synchronous, but /dev/random is asynchronous. And making bootup wait for even a single byte from /dev/random seems bad. In any event, virtio-rng should be a better interface for this. - Ted -- Sent from my mobile phone. Please pardon brevity and lack of formatting. -- Andy Lutomirski AMA Capital Management, LLC
Re: random: Providing a seed value to VM guests
On Thu, May 1, 2014 at 1:39 PM, ty...@mit.edu wrote: On Thu, May 01, 2014 at 01:32:55PM -0700, Andy Lutomirski wrote: On Thu, May 1, 2014 at 1:30 PM, H. Peter Anvin h...@zytor.com wrote: RDSEED is not synchronous. It is, however, nonblocking. What I mean is: IIUC it's reasonable to call RDSEED a few times in a loop and hope it works. It makes no sense to do that with /dev/random. RDSEED is allowed to return an error if there is insufficient entropy. So long as the caller understands that this is an emulated instruction, I don't see a problem.
What's the point? I think this is too caught up in x86 architectural stuff. As I see it, the goal is to give guests a way to ask their hosts to give them, immediately and synchronously, some bytes suitable for seeding an RNG. These bytes need not contain true entropy, because the host may not be able to provide entropy in a timely manner. The mechanism should be usable extremely early after boot, it should be usable after a guest reboot, and it should be reliable. I think there's an added benefit if all architectures can implement a semantically equivalent function, even if the interface is completely different.
There's no need for anything new to provide asynchronous and/or very slow true random data -- virtio-rng already exists.*
Emulating RDRAND for this purpose is a little weird because it's normally available to user code and it has the flag indicating failure. We're also not going to want the guest kernel to access it through the arch_get_random interface. Even if we could emulate RDSEED effectively**, I don't really understand what the guest is expected to do with it. And I generally dislike defining an interface with no known sensible users, because it means that there's a good chance that the interface won't end up working.
* I still don't know why it doesn't work for me. I'll fiddle with it, but I think that the right solution is to fix it for this purpose, not to replace it.
** Doing this sensibly in the host will be awkward. Is the host supposed to use non-blocking reads of /dev/random? Getting anything remotely fair may be difficult.
- Ted -- Andy Lutomirski AMA Capital Management, LLC
Re: random: Providing a seed value to VM guests
On Thu, May 1, 2014 at 2:01 PM, H. Peter Anvin h...@zytor.com wrote: On 05/01/2014 01:56 PM, Andy Lutomirski wrote: Even if we could emulate RDSEED effectively**, I don't really understand what the guest is expected to do with it. And I generally dislike defining an interface with no known sensible users, because it means that there's a good chance that the interface won't end up working. ** Doing this sensibly in the host will be awkward. Is the host supposed to use non-blocking reads of /dev/random? Getting anything remotely fair may be difficult. The host can use nonblocking reads of /dev/random. Fairness would have to be implemented at the host level, but that is true for anything.
I still don't see the point. What does this do better than virtio-rng? The ASLR code doesn't even try to use RDSEED. RDSEED is used in add_interrupt_randomness, which shouldn't drain the host's /dev/random even if it could, and it's used in init_std_data. The logic there is:
if (!arch_get_random_seed_long(&rv) && !arch_get_random_long(&rv)) rv = random_get_entropy();
I think this is better achieved by having the host try to supply the highest quality data it can. The third RDSEED use is arch_random_refill. This purpose would be much better served by the khwrng stuff and virtio-rng. So I still claim that fancy emulated RDSEED support will have no users. --Andy
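The fallback order in the init_std_data logic quoted above can be modelled in user space. This is a sketch, not the kernel code: `pick_seed` and the stub sources are hypothetical names, and function pointers stand in for arch_get_random_seed_long, arch_get_random_long and random_get_entropy.

```c
#include <stdbool.h>

/* A source fills *out and returns true, or returns false if unavailable. */
typedef bool (*rng_source)(unsigned long *out);

/* Try the seed-grade source first, then the ordinary hardware RNG, and
 * only then fall back to a cheap value such as a cycle counter. */
static unsigned long pick_seed(rng_source seed_src, rng_source rand_src,
                               unsigned long (*fallback)(void))
{
    unsigned long rv = 0;

    if (!seed_src(&rv) && !rand_src(&rv))
        rv = fallback();
    return rv;
}

/* Stubs used only to exercise the fallback order. */
static bool src_unavailable(unsigned long *out) { (void)out; return false; }
static bool src_fixed(unsigned long *out) { *out = 42; return true; }
static unsigned long fallback_fixed(void) { return 7; }
```

Because of the short-circuit evaluation, the second source is consulted only when the first fails, and the fallback only when both fail.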
Re: random: Providing a seed value to VM guests
On Thu, May 1, 2014 at 3:28 PM, ty...@mit.edu wrote: On Thu, May 01, 2014 at 02:06:13PM -0700, Andy Lutomirski wrote: I still don't see the point. What does this do better than virtio-rng? I believe you had been complaining about how complicated it was to set up virtio? And this complexity is also an issue if we want to use it to initialize the RNG used for the kernel text ASLR --- which has to be done very early in the boot process, and where making something as simple as possible is a Good Thing. It's complicated, so it won't be up until much later in the boot process. This is completely fine for /dev/random, but it's a problem for /dev/urandom, ASLR, and such. And since we would want to use RDRAND/RDSEED if it is available *anyway*, perhaps in combination with other things, why not use the RDRAND/RDSEED interface? Because it's awkward. I don't think it simplifies anything. --Andy
Re: random: Providing a seed value to VM guests
On Thu, May 1, 2014 at 3:46 PM, H. Peter Anvin h...@zytor.com wrote: On 05/01/2014 03:32 PM, Andy Lutomirski wrote: On Thu, May 1, 2014 at 3:28 PM, ty...@mit.edu wrote: On Thu, May 01, 2014 at 02:06:13PM -0700, Andy Lutomirski wrote: I still don't see the point. What does this do better than virtio-rng? I believe you had been complaining about how complicated it was to set up virtio? And this complexity is also an issue if we want to use it to initialize the RNG used for the kernel text ASLR --- which has to be done very early in the boot process, and where making something as simple as possible is a Good Thing. It's complicated, so it won't be up until much later in the boot process. This is completely fine for /dev/random, but it's a problem for /dev/urandom, ASLR, and such. And since we would want to use RDRAND/RDSEED if it is available *anyway*, perhaps in combination with other things, why not use the RDRAND/RDSEED interface? Because it's awkward. I don't think it simplifies anything. It greatly simplifies discovery, which is a Big Deal[TM] in early code.
I think we're comparing:
a) cpuid to detect rdrand *or* emulated rdrand, followed by rdrand
to
b) cpuid to detect rdrand or the paravirt seed msr/cpuid call, followed by rdrand or the msr or cpuid read.
This seems like it barely makes a difference, especially since (a) probably requires detecting KVM anyway. For the real kernel code, it's probably even closer to making no difference, since I don't think we'll want arch_get_random_long to use emulated rdrand. --Andy
x86_64 allyesconfig has screwed up voffset and blows up KVM
I'm testing 39bfe90706ab0f588db7cb4d1c0e6d1181e1d2f9. I'm not sure what's going on here. voffset.h contains:
#define VO__end 0x8111c7a0
#define VO__end 0x8db9a000
#define VO__text 0x8100
because
$ nm vmlinux|grep ' _end'
8111c7a0 t _end
8db9a000 B _end
Booting the resulting image says:
KVM internal error. Suberror: 1
emulation failure
EAX=8001 EBX= ECX=c080 EDX=
ESI=00014630 EDI=0b08f000 EBP=0010 ESP=038f14b8
EIP=00100119 EFL=00010046 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0018 00c09300 DPL=0 DS [-WA]
CS =0010 00c09b00 DPL=0 CS32 [-RA]
SS =0018 00c09300 DPL=0 DS [-WA]
DS =0018 00c09300 DPL=0 DS [-WA]
FS =0018 00c09300 DPL=0 DS [-WA]
GS =0018 00c09300 DPL=0 DS [-WA]
LDT= 00c0
TR =0020 0fff 00808b00 DPL=0 TSS64-busy
GDT= 038e5320 0030
IDT=
CR0=8011 CR2= CR3=0b089000 CR4=0020
DR0= DR1= DR2= DR3=
DR6=0ff0 DR7=0400
EFER=0500
Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
Linus's tree from today doesn't seem any better.
Re: [PATCH 00/10] RFC: userfault
On 07/02/2014 09:50 AM, Andrea Arcangeli wrote: Hello everyone, There's a large CC list for this RFC because this adds two new syscalls (userfaultfd and remap_anon_pages) and MADV_USERFAULT/MADV_NOUSERFAULT, so suggestions on changes to the API or on a completely different API if somebody has better ideas are welcome now.
cc:linux-api -- this is certainly worthy of linux-api discussion.
The combination of these features is what I would propose to implement postcopy live migration in qemu, and in general demand paging of remote memory, hosted in different cloud nodes.
The MADV_USERFAULT feature should be generic enough that it can provide the userfaults to the Android volatile range feature too, on access of reclaimed volatile pages. If the access could ever happen in kernel context through syscalls (not just from userland context), then userfaultfd has to be used to make the userfault unnoticeable to the syscall (no error will be returned). This latter feature is more advanced than what volatile ranges alone could do with SIGBUS so far (but it's optional, if the process doesn't call userfaultfd, the regular SIGBUS will fire, if the fd is closed SIGBUS will also fire for any blocked userfault that was waiting a userfaultfd_write ack).
userfaultfd is also a generic enough feature, that it allows KVM to implement postcopy live migration without having to modify a single line of KVM kernel code. Guest async page faults, FOLL_NOWAIT and all other GUP features work just fine in combination with userfaults (userfaults trigger async page faults in the guest scheduler so those guest processes that aren't waiting for userfaults can keep running in the guest vcpus).
remap_anon_pages is the syscall to use to resolve the userfaults (it's not mandatory, vmsplice will likely still be used in the case of local postcopy live migration just to upgrade the qemu binary, but remap_anon_pages is faster and ideal for transferring memory across the network, it's zerocopy and doesn't touch the vma: it only holds the mmap_sem for reading).
The current behavior of remap_anon_pages is very strict to avoid any chance of memory corruption going unnoticed. mremap is not strict like that: if there's a synchronization bug it would drop the destination range silently resulting in subtle memory corruption for example. remap_anon_pages would return -EEXIST in that case. If there are holes in the source range remap_anon_pages will return -ENOENT.
If remap_anon_pages is used always with 2M naturally aligned addresses, transparent hugepages will not be split. If there could be 4k (or any size) holes in the 2M (or any size) source range, remap_anon_pages should be used with the RAP_ALLOW_SRC_HOLES flag to relax some of its strict checks (-ENOENT won't be returned if RAP_ALLOW_SRC_HOLES is set, remap_anon_pages then will just behave as a noop on any hole in the source range). This flag is generally useful when implementing userfaults with THP granularity, but it shouldn't be set if doing the userfaults with PAGE_SIZE granularity if the developer wants to benefit from the strict -ENOENT behavior.
The remap_anon_pages syscall API is not vectored, as I expect it to be used mainly for demand paging (where there can be just one faulting range per userfault) or for large ranges (with the THP model as an alternative to zapping re-dirtied pages with MADV_DONTNEED with 4k granularity before starting the guest in the destination node) where vectoring isn't going to provide much of a performance advantage (thanks to the THP coarser granularity).
On the rmap side remap_anon_pages doesn't add much complexity: there's no need of nonlinear anon vmas to support it because I added the constraint that it will fail if the mapcount is more than 1. So in general the source range of remap_anon_pages should be marked MADV_DONTFORK to prevent any risk of failure if the process ever forks (like qemu can in some case).
One part that hasn't been tested is the poll() syscall on the userfaultfd because the postcopy migration thread currently is more efficient waiting on blocking read()s (I'll write some code to test poll() too). I also appended below a patch to trinity to exercise remap_anon_pages and userfaultfd and it completes trinity successfully.
The code can be found here:
git clone --reference linux git://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git -b userfault
The branch is rebased so you can get updates for example with:
git fetch && git checkout -f origin/userfault
Comments welcome, thanks! Andrea
From cbe940e13b4cead41e0f862b3abfa3814f235ec3 Mon Sep 17 00:00:00 2001
From: Andrea Arcangeli aarca...@redhat.com
Date: Wed, 2 Jul 2014 18:32:35 +0200
Subject: [PATCH] add remap_anon_pages and userfaultfd
Signed-off-by: Andrea Arcangeli aarca...@redhat.com
---
include/syscalls-x86_64.h | 2 + syscalls/remap_anon_pages.c | 100
Re: [PATCH 08/10] userfaultfd: add new syscall to provide memory externalization
On 07/02/2014 09:50 AM, Andrea Arcangeli wrote: Once a userfaultfd is created, MADV_USERFAULT regions talk through the userfaultfd protocol with the thread responsible for doing the memory externalization of the process.
The protocol starts by userland writing the requested/preferred USERFAULT_PROTOCOL version into the userfault fd (64bit write). If the kernel knows it, it will ack it by allowing userland to read 64bit from the userfault fd that will contain the same 64bit USERFAULT_PROTOCOL version that userland asked for. Otherwise userland will read the __u64 value -1ULL (aka USERFAULTFD_UNKNOWN_PROTOCOL) and it will have to try again by writing an older protocol version if suitable for its usage too, and read it back again until it stops reading -1ULL. After that the userfaultfd protocol starts.
The protocol consists of 64bit reads from the userfault fd that provide userland with the fault addresses. After a userfault address has been read and the fault is resolved by userland, the application must write back 128bits in the form of a [ start, end ] range (64bit each) that will tell the kernel such a range has been mapped. Multiple read userfaults can be resolved in a single range write. poll() can be used to know when there are new userfaults to read (POLLIN) and when there are threads waiting for a wakeup through a range write (POLLOUT).
Signed-off-by: Andrea Arcangeli aarca...@redhat.com
+#ifdef CONFIG_PROC_FS
+static int userfaultfd_show_fdinfo(struct seq_file *m, struct file *f)
+{
+	struct userfaultfd_ctx *ctx = f->private_data;
+	int ret;
+	wait_queue_t *wq;
+	struct userfaultfd_wait_queue *uwq;
+	unsigned long pending = 0, total = 0;
+
+	spin_lock(&ctx->fault_wqh.lock);
+	list_for_each_entry(wq, &ctx->fault_wqh.task_list, task_list) {
+		uwq = container_of(wq, struct userfaultfd_wait_queue, wq);
+		if (uwq->pending)
+			pending++;
+		total++;
+	}
+	spin_unlock(&ctx->fault_wqh.lock);
+
+	ret = seq_printf(m, "pending:\t%lu\ntotal:\t%lu\n", pending, total);
This should show the protocol version, too.
+
+SYSCALL_DEFINE1(userfaultfd, int, flags)
+{
+	int fd, error;
+	struct file *file;
This looks like it can't be used more than once in a process. That will be unfortunate for libraries. Would it be feasible to either have userfaultfd claim a range of addresses or for a vma to be explicitly associated with a userfaultfd? (In the latter case, giant PROT_NONE MAP_NORESERVE mappings could be used.)
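The version handshake described in the quoted commit message can be modelled without a real file descriptor. This is a sketch of the RFC's description only, under assumptions: protocol versions are taken to be small consecutive integers starting at 1, `kernel_ack` is a stand-in for one write followed by one read on the userfault fd, and `negotiate` and `struct userfault_range` are hypothetical names.

```c
#include <stdint.h>

/* -1ULL in the RFC text. */
#define USERFAULTFD_UNKNOWN_PROTOCOL UINT64_MAX

/* Stand-in for the kernel side of one write-then-read round trip: echo
 * the requested version if it is known, otherwise report "unknown". */
static uint64_t kernel_ack(uint64_t requested, uint64_t newest_supported)
{
    if (requested >= 1 && requested <= newest_supported)
        return requested;
    return USERFAULTFD_UNKNOWN_PROTOCOL;
}

/* Userland side: start at the preferred version and retry with older
 * ones until the kernel echoes the version back. */
static uint64_t negotiate(uint64_t preferred, uint64_t oldest_usable,
                          uint64_t kernel_newest)
{
    for (uint64_t v = preferred; v >= oldest_usable && v >= 1; v--) {
        if (kernel_ack(v, kernel_newest) == v)
            return v;
    }
    return USERFAULTFD_UNKNOWN_PROTOCOL;
}

/* After resolving faults, userland writes back a 128-bit range. */
struct userfault_range {
    uint64_t start;
    uint64_t end;
};
```

With a kernel that knows versions up to 2, a caller preferring version 3 but willing to use 1 settles on 2; a caller that can only use 3 gets the unknown-protocol value back.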
[PATCH 0/4] random,x86,kvm: Add and use MSR_KVM_GET_RNG_SEED
virtio-rng is both too complicated and insufficient for initial rng seeding. It's far too complicated to use for KASLR or any other early boot random number needs. It also provides /dev/random-style bits, which means that making guest boot wait for virtio-rng is unacceptably slow, and doing it asynchronously means that /dev/urandom might be predictable when userspace starts.
This introduces a very simple synchronous mechanism to get /dev/urandom-style bits.
This is a KVM change: am I supposed to write a unit test somewhere?
Andy Lutomirski (4):
  x86,kvm: Add MSR_KVM_GET_RNG_SEED and a matching feature bit
  random,x86: Add arch_get_slow_rng_u64
  random: Seed pools from arch_get_slow_rng_u64 at startup
  x86,kaslr: Use MSR_KVM_GET_RNG_SEED for KASLR if available
 Documentation/virtual/kvm/cpuid.txt  |  3 +++
 arch/x86/Kconfig                     |  4 ++++
 arch/x86/boot/compressed/aslr.c      | 27 +++++++++++++++++++++++++++
 arch/x86/include/asm/archslowrng.h   | 30 ++++++++++++++++++++++++++++++
 arch/x86/include/uapi/asm/kvm_para.h |  2 ++
 arch/x86/kernel/kvm.c                | 22 ++++++++++++++++++++++
 arch/x86/kvm/cpuid.c                 |  3 ++-
 arch/x86/kvm/x86.c                   |  4 ++++
 drivers/char/random.c                | 14 +++++++++++++-
 include/linux/random.h               |  9 +++++++++
 10 files changed, 116 insertions(+), 2 deletions(-)
 create mode 100644 arch/x86/include/asm/archslowrng.h
--
1.9.3
[PATCH 1/4] x86,kvm: Add MSR_KVM_GET_RNG_SEED and a matching feature bit
This adds a simple interface to allow a guest to request 64 bits of host nonblocking entropy. This is independent of virtio-rng for a couple of reasons:

 - It's intended to be usable during early boot, when a trivial synchronous interface is needed.

 - virtio-rng gives blocking entropy, and making guest boot wait for the host's /dev/random will cause problems.

MSR_KVM_GET_RNG_SEED is intended to provide 64 bits of best-effort cryptographically secure data for use as a seed. It provides no guarantee that the result contains any actual entropy.

Signed-off-by: Andy Lutomirski <l...@amacapital.net>
---
 Documentation/virtual/kvm/cpuid.txt  | 3 +++
 arch/x86/include/uapi/asm/kvm_para.h | 2 ++
 arch/x86/kvm/cpuid.c                 | 3 ++-
 arch/x86/kvm/x86.c                   | 4 ++++
 4 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/Documentation/virtual/kvm/cpuid.txt b/Documentation/virtual/kvm/cpuid.txt
index 3c65feb..0ab043b 100644
--- a/Documentation/virtual/kvm/cpuid.txt
+++ b/Documentation/virtual/kvm/cpuid.txt
@@ -54,6 +54,9 @@ KVM_FEATURE_PV_UNHALT              ||     7 || guest checks this feature bit
                                    ||       || before enabling paravirtualized
                                    ||       || spinlock support.
 ------------------------------------------------------------------------------
+KVM_FEATURE_GET_RNG_SEED           ||     8 || host provides rng seed data via
+                                   ||       || MSR_KVM_GET_RNG_SEED.
+------------------------------------------------------------------------------
 KVM_FEATURE_CLOCKSOURCE_STABLE_BIT ||    24 || host will warn if no guest-side
                                    ||       || per-cpu warps are expected in
                                    ||       || kvmclock.
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 94dc8ca..e2eaf93 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -24,6 +24,7 @@
 #define KVM_FEATURE_STEAL_TIME		5
 #define KVM_FEATURE_PV_EOI		6
 #define KVM_FEATURE_PV_UNHALT		7
+#define KVM_FEATURE_GET_RNG_SEED	8
 
 /* The last 8 bits are used to indicate how to interpret the flags field
  * in pvclock structure. If no bits are set, all flags are ignored.
@@ -40,6 +41,7 @@
 #define MSR_KVM_ASYNC_PF_EN 0x4b564d02
 #define MSR_KVM_STEAL_TIME 0x4b564d03
 #define MSR_KVM_PV_EOI_EN 0x4b564d04
+#define MSR_KVM_GET_RNG_SEED 0x4b564d05
 
 struct kvm_steal_time {
 	__u64 steal;
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 38a0afe..40d6763 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -479,7 +479,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 			     (1 << KVM_FEATURE_ASYNC_PF) |
 			     (1 << KVM_FEATURE_PV_EOI) |
 			     (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) |
-			     (1 << KVM_FEATURE_PV_UNHALT);
+			     (1 << KVM_FEATURE_PV_UNHALT) |
+			     (1 << KVM_FEATURE_GET_RNG_SEED);
 
 		if (sched_info_on())
 			entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f644933..4e81853 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -48,6 +48,7 @@
 #include <linux/pci.h>
 #include <linux/timekeeper_internal.h>
 #include <linux/pvclock_gtod.h>
+#include <linux/random.h>
 #include <trace/events/kvm.h>
 
 #define CREATE_TRACE_POINTS
@@ -2480,6 +2481,9 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
 	case MSR_KVM_PV_EOI_EN:
 		data = vcpu->arch.pv_eoi.msr_val;
 		break;
+	case MSR_KVM_GET_RNG_SEED:
+		get_random_bytes(&data, sizeof(data));
+		break;
 	case MSR_IA32_P5_MC_ADDR:
 	case MSR_IA32_P5_MC_TYPE:
 	case MSR_IA32_MCG_CAP:
-- 
1.9.3
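For readers following along, here is a minimal userspace sketch of the bit test a guest would perform before touching the new MSR. It only models the check against the EAX value of the KVM features CPUID leaf; kvm_features_has is a hypothetical helper name, not something from the patch, and the real guest code goes through kvm_para_has_feature().

```c
#include <stdbool.h>
#include <stdint.h>

/* Feature bit number taken from the patch above. */
#define KVM_FEATURE_GET_RNG_SEED 8

/*
 * Hypothetical helper (not part of the patch): test one feature bit in
 * the EAX value returned by the KVM features CPUID leaf.
 */
bool kvm_features_has(uint32_t features_eax, unsigned int feature)
{
	return (features_eax & (UINT32_C(1) << feature)) != 0;
}
```

A guest that sees the bit clear should simply not read MSR_KVM_GET_RNG_SEED and fall back to whatever other sources it has.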
[PATCH 3/4] random: Seed pools from arch_get_slow_rng_u64 at startup
This should help solve the problem of guests starting out with predictable RNG state.

Signed-off-by: Andy Lutomirski <l...@amacapital.net>
---
 drivers/char/random.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 0a7ac0a..bd88a24 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1248,7 +1248,7 @@ EXPORT_SYMBOL(get_random_bytes_arch);
  */
 static void init_std_data(struct entropy_store *r)
 {
-	int i;
+	int i, slow_rng_bits = 0;
 	ktime_t now = ktime_get_real();
 	unsigned long rv;
 
@@ -1261,6 +1261,18 @@ static void init_std_data(struct entropy_store *r)
 		mix_pool_bytes(r, &rv, sizeof(rv), NULL);
 	}
 	mix_pool_bytes(r, utsname(), sizeof(*(utsname())), NULL);
+
+	for (i = 0; i < 4; i++) {
+		u64 rv64;
+
+		if (arch_get_slow_rng_u64(&rv64)) {
+			mix_pool_bytes(r, &rv64, sizeof(rv64), NULL);
+			slow_rng_bits += 8 * sizeof(rv64);
+		}
+	}
+	if (slow_rng_bits)
+		pr_info("random: seeded %s pool with %d bits of arch slow rng data\n",
+			r->name, slow_rng_bits);
 }
 
 /*
-- 
1.9.3
[PATCH 2/4] random,x86: Add arch_get_slow_rng_u64
arch_get_slow_rng_u64 tries to get 64 bits of RNG seed data. Unlike arch_get_random_{bytes,seed}, etc., it makes no claims about entropy content. It's also likely to be much slower and should not be used frequently. That being said, it should be fast enough to call several times during boot without any noticeable slowdown.

This initial implementation backs it with MSR_KVM_GET_RNG_SEED if available. The intent is for other hypervisor guest implementations to implement this interface.

Signed-off-by: Andy Lutomirski <l...@amacapital.net>
---
 arch/x86/Kconfig                   |  4 ++++
 arch/x86/include/asm/archslowrng.h | 30 ++++++++++++++++++++++++++++++
 arch/x86/kernel/kvm.c              | 22 ++++++++++++++++++++++
 include/linux/random.h             |  9 +++++++++
 4 files changed, 65 insertions(+)
 create mode 100644 arch/x86/include/asm/archslowrng.h

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index a8f749e..4dfb539 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -593,6 +593,7 @@ config KVM_GUEST
 	bool "KVM Guest support (including kvmclock)"
 	depends on PARAVIRT
 	select PARAVIRT_CLOCK
+	select ARCH_SLOW_RNG
 	default y
 	---help---
 	  This option enables various optimizations for running under the KVM
@@ -627,6 +628,9 @@ config PARAVIRT_TIME_ACCOUNTING
 config PARAVIRT_CLOCK
 	bool
 
+config ARCH_SLOW_RNG
+	bool
+
 endif #HYPERVISOR_GUEST
 
 config NO_BOOTMEM
diff --git a/arch/x86/include/asm/archslowrng.h b/arch/x86/include/asm/archslowrng.h
new file mode 100644
index 0000000..c8e8d0d
--- /dev/null
+++ b/arch/x86/include/asm/archslowrng.h
@@ -0,0 +1,30 @@
+/*
+ * This file is part of the Linux kernel.
+ *
+ * Copyright (c) 2014 Andy Lutomirski
+ * Authors: Andy Lutomirski <l...@amacapital.net>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ */
+
+#ifndef ASM_X86_ARCHSLOWRANDOM_H
+#define ASM_X86_ARCHSLOWRANDOM_H
+
+#ifndef CONFIG_ARCH_SLOW_RNG
+# error archslowrng.h should not be included if !CONFIG_ARCH_SLOW_RNG
+#endif
+
+/*
+ * Performance is irrelevant here, so there's no point in using the
+ * paravirt ops mechanism. Instead just use a function pointer.
+ */
+extern int (*arch_get_slow_rng_u64)(u64 *v);
+
+#endif /* ASM_X86_ARCHSLOWRANDOM_H */
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 3dd8e2c..8d64d28 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -416,6 +416,25 @@ void kvm_disable_steal_time(void)
 	wrmsr(MSR_KVM_STEAL_TIME, 0, 0);
 }
 
+static int nop_get_slow_rng_u64(u64 *v)
+{
+	return 0;
+}
+
+static int kvm_get_slow_rng_u64(u64 *v)
+{
+	/*
+	 * Allow migration from a hypervisor with the GET_RNG_SEED
+	 * feature to a hypervisor without it.
+	 */
+	if (rdmsrl_safe(MSR_KVM_GET_RNG_SEED, v) == 0)
+		return 1;
+	else
+		return 0;
+}
+
+int (*arch_get_slow_rng_u64)(u64 *v) = nop_get_slow_rng_u64;
+
 #ifdef CONFIG_SMP
 static void __init kvm_smp_prepare_boot_cpu(void)
 {
@@ -493,6 +512,9 @@ void __init kvm_guest_init(void)
 	if (kvmclock_vsyscall)
 		kvm_setup_vsyscall_timeinfo();
 
+	if (kvm_para_has_feature(KVM_FEATURE_GET_RNG_SEED))
+		arch_get_slow_rng_u64 = kvm_get_slow_rng_u64;
+
 #ifdef CONFIG_SMP
 	smp_ops.smp_prepare_boot_cpu = kvm_smp_prepare_boot_cpu;
 	register_cpu_notifier(&kvm_cpu_notifier);
diff --git a/include/linux/random.h b/include/linux/random.h
index 57fbbff..ceafbcf 100644
--- a/include/linux/random.h
+++ b/include/linux/random.h
@@ -106,6 +106,15 @@ static inline int arch_has_random_seed(void)
 }
 #endif
 
+#ifdef CONFIG_ARCH_SLOW_RNG
+# include <asm/archslowrng.h>
+#else
+static inline int arch_get_slow_rng_u64(u64 *v)
+{
+	return 0;
+}
+#endif
+
 /* Pseudo random number generator from numerical recipes. */
 static inline u32 next_pseudo_random32(u32 seed)
 {
-- 
1.9.3
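The function-pointer arrangement in this patch (a no-op default that gets swapped for a hypervisor backend at init time) can be modeled in plain userspace C. This is only a sketch under stated assumptions: demo_get_slow_rng_u64 and slow_rng_init are hypothetical stand-ins for kvm_get_slow_rng_u64 and the check in kvm_guest_init(), and the demo backend returns a fixed value, not random data.

```c
#include <stdint.h>

/* Default backend: no slow RNG is available, so report failure (0). */
int nop_get_slow_rng_u64(uint64_t *v)
{
	(void)v;
	return 0;
}

/*
 * Hypothetical stand-in for a hypervisor backend. It stores a fixed
 * constant purely for illustration; it is NOT a source of randomness.
 */
int demo_get_slow_rng_u64(uint64_t *v)
{
	*v = UINT64_C(0x0123456789abcdef);
	return 1;
}

/* Callers always go through this pointer, as in the patch. */
int (*get_slow_rng_u64)(uint64_t *v) = nop_get_slow_rng_u64;

/* Models the feature check done once during guest init. */
void slow_rng_init(int has_feature)
{
	if (has_feature)
		get_slow_rng_u64 = demo_get_slow_rng_u64;
}
```

Callers never need to know which backend is installed: a return of 0 means "nothing was written to *v", so code that runs before init or on a host without the feature still behaves sensibly.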
[PATCH 4/4] x86,kaslr: Use MSR_KVM_GET_RNG_SEED for KASLR if available
It's considerably better than any of the alternatives on KVM.

Signed-off-by: Andy Lutomirski <l...@amacapital.net>
---
 arch/x86/boot/compressed/aslr.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
index fc6091a..8583f0e 100644
--- a/arch/x86/boot/compressed/aslr.c
+++ b/arch/x86/boot/compressed/aslr.c
@@ -5,6 +5,8 @@
 #include <asm/archrandom.h>
 #include <asm/e820.h>
 
+#include <uapi/asm/kvm_para.h>
+
 #include <generated/compile.h>
 #include <linux/module.h>
 #include <linux/uts.h>
@@ -15,6 +17,22 @@
 static const char build_str[] = UTS_RELEASE " (" LINUX_COMPILE_BY "@"
 		LINUX_COMPILE_HOST ") (" LINUX_COMPILER ") " UTS_VERSION;
 
+static bool kvm_para_has_feature(unsigned int feature)
+{
+	u32 kvm_base;
+	u32 features;
+
+	if (!has_cpuflag(X86_FEATURE_HYPERVISOR))
+		return false;
+
+	kvm_base = hypervisor_cpuid_base("KVMKVMKVM\0\0\0", KVM_CPUID_FEATURES);
+	if (!kvm_base)
+		return false;
+
+	features = cpuid_eax(kvm_base | KVM_CPUID_FEATURES);
+	return features & (1UL << feature);
+}
+
 #define I8254_PORT_CONTROL	0x43
 #define I8254_PORT_COUNTER0	0x40
 #define I8254_CMD_READBACK	0xC0
@@ -81,6 +99,15 @@ static unsigned long get_random_long(void)
 		}
 	}
 
+	if (kvm_para_has_feature(KVM_FEATURE_GET_RNG_SEED)) {
+		u64 seed;
+
+		debug_putstr(" MSR_KVM_GET_RNG_SEED");
+		rdmsrl(MSR_KVM_GET_RNG_SEED, seed);
+		random ^= (unsigned long)seed;
+		use_i8254 = false;
+	}
+
 	if (has_cpuflag(X86_FEATURE_TSC)) {
 		debug_putstr(" RDTSC");
 		rdtscll(raw);
-- 
1.9.3
[PATCH kvm-unit-tests] Add a test case for MSR_KVM_GET_RNG_SEED
Signed-off-by: Andy Lutomirski <l...@amacapital.net>
---
 config/config-x86-common.mak |  5 ++++-
 x86/get_rng_seed.c           | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++
 x86/unittests.cfg            |  3 +++
 3 files changed, 57 insertions(+), 1 deletion(-)
 create mode 100644 x86/get_rng_seed.c

diff --git a/config/config-x86-common.mak b/config/config-x86-common.mak
index 0b0da85..201a029 100644
--- a/config/config-x86-common.mak
+++ b/config/config-x86-common.mak
@@ -35,7 +35,8 @@ tests-common = $(TEST_DIR)/vmexit.flat $(TEST_DIR)/tsc.flat \
                $(TEST_DIR)/kvmclock_test.flat $(TEST_DIR)/eventinj.flat \
                $(TEST_DIR)/s3.flat $(TEST_DIR)/pmu.flat \
                $(TEST_DIR)/tsc_adjust.flat $(TEST_DIR)/asyncpf.flat \
-               $(TEST_DIR)/init.flat $(TEST_DIR)/smap.flat
+               $(TEST_DIR)/init.flat $(TEST_DIR)/smap.flat \
+               $(TEST_DIR)/get_rng_seed.flat
 
 ifdef API
 tests-common += api/api-sample
@@ -105,6 +106,8 @@ $(TEST_DIR)/vmx.elf: $(cstart.o) $(TEST_DIR)/vmx.o $(TEST_DIR)/vmx_tests.o
 
 $(TEST_DIR)/debug.elf: $(cstart.o) $(TEST_DIR)/debug.o
 
+$(TEST_DIR)/get_rng_seed.elf: $(cstart.o) $(TEST_DIR)/get_rng_seed.o
+
 arch_clean:
 	$(RM) $(TEST_DIR)/*.o $(TEST_DIR)/*.flat $(TEST_DIR)/*.elf \
 	$(TEST_DIR)/.*.d lib/x86/.*.d
diff --git a/x86/get_rng_seed.c b/x86/get_rng_seed.c
new file mode 100644
index 0000000..b2e1b01
--- /dev/null
+++ b/x86/get_rng_seed.c
@@ -0,0 +1,50 @@
+/*
+ * Simple test for MSR_KVM_GET_RNG_SEED.
+ */
+#include "x86/msr.h"
+#include "x86/processor.h"
+#include "x86/apic-defs.h"
+#include "x86/apic.h"
+#include "x86/desc.h"
+#include "x86/isr.h"
+#include "x86/vm.h"
+
+#include "libcflat.h"
+#include <stdint.h>
+
+#define MSR_KVM_GET_RNG_SEED 0x4b564d05
+
+volatile int ngpfs;
+bool fail;
+
+static void gpf_isr(struct ex_regs *r)
+{
+	ngpfs++;
+	r->rip += 2;
+}
+
+int main(int ac, char **av)
+{
+	int loop = 3;
+	u64 val, prev = 0;
+
+	setup_vm();
+	setup_idt();
+	while (loop--) {
+		val = rdmsr(MSR_KVM_GET_RNG_SEED);
+		printf("rng seed: %llx\n", (unsigned long long)val);
+		if (val == prev)
+			fail = true;
+		prev = val;
+	}
+
+	handle_exception(13, gpf_isr);
+	wrmsr(MSR_KVM_GET_RNG_SEED, 0);
+	if (ngpfs != 1) {
+		printf("error: wrmsr(MSR_KVM_GET_RNG_SEED) should not work\n");
+		fail = true;
+	}
+
+	printf("%s\n", fail ? "FAIL" : "PASS");
+	return fail;
+}
diff --git a/x86/unittests.cfg b/x86/unittests.cfg
index d78fe0e..98e5c7b 100644
--- a/x86/unittests.cfg
+++ b/x86/unittests.cfg
@@ -158,3 +158,6 @@ arch = x86_64
 [debug]
 file = debug.flat
 arch = x86_64
+
+[get_rng_seed]
+file = get_rng_seed.flat
-- 
1.9.3
Re: [PATCH 0/4] random,x86,kvm: Add and use MSR_KVM_GET_RNG_SEED
On Wed, Jul 16, 2014 at 12:36 AM, Paolo Bonzini <pbonz...@redhat.com> wrote:
> On 16/07/2014 09:10, Daniel Borkmann wrote:
>> On 07/16/2014 08:41 AM, Gleb Natapov wrote:
>>> On Tue, Jul 15, 2014 at 07:48:06PM -0700, Andy Lutomirski wrote:
>>>> virtio-rng is both too complicated and insufficient for initial rng seeding. It's far too complicated to use for KASLR or any other early boot random number needs. It also provides /dev/random-style bits, which means that making guest boot wait for virtio-rng is unacceptably slow, and doing it asynchronously means that /dev/urandom might be predictable when userspace starts.
>>>>
>>>> This introduces a very simple synchronous mechanism to get /dev/urandom-style bits.
>>>
>>> Why can't you use RDRAND instruction for that?
>>
>> You mean using it directly? I think simply for the very same reasons as in c2557a303a ...
>
> No, this is very different. This mechanism provides no guarantee that the result contains any actual entropy. In fact, patch 3 adds a call to the new arch_get_slow_rng_u64 just below a call to arch_get_random_long aka RDRAND. I agree with Gleb that it's simpler to just expect a relatively recent processor and use RDRAND.
>
> BTW, the logic for crediting entropy to RDSEED but not RDRAND escapes me. If you trust the processor, you could use Intel's algorithm to force reseeding of RDRAND. If you don't trust the processor, the same paranoia applies to RDRAND and RDSEED.
>
> In a guest you must trust the hypervisor anyway to use RDRAND or RDSEED, since the hypervisor can trap it. A malicious hypervisor is no different from a malicious processor.

This patch has nothing whatsoever to do with how much I trust the CPU vs the hypervisor. It's for the enormous installed base of machines without RDRAND.

hpa suggested emulating RDRAND a while ago, but I think that'll be unusably slow -- the kernel uses RDRAND in various places where it's expected to be fast, and not using it at all will be preferable to causing a VM exit for every few bytes. I've been careful to only use this in the guest in places where a few hundred to a few thousand cycles per 64 bits of RNG seed is acceptable.

> In any case, is there a matching QEMU patch somewhere?

What QEMU change is needed? I admit I'm a bit vague on how QEMU and KVM cooperate here, but there's no state to save and restore. I guess that QEMU wants the ability to turn this on and off for migration. How does that work? I couldn't spot the KVM code that allows this type of control.

--Andy
[PATCH qemu] i386,linux-headers: Add support for kvm_get_rng_seed
This updates x86's kvm_para.h for the feature bit definition and target-i386/cpu.c for the feature name and default.

Signed-off-by: Andy Lutomirski <l...@amacapital.net>
---
 linux-headers/asm-x86/kvm_para.h | 2 ++
 target-i386/cpu.c                | 5 +++--
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/linux-headers/asm-x86/kvm_para.h b/linux-headers/asm-x86/kvm_para.h
index e41c5c1..a9b27ce 100644
--- a/linux-headers/asm-x86/kvm_para.h
+++ b/linux-headers/asm-x86/kvm_para.h
@@ -24,6 +24,7 @@
 #define KVM_FEATURE_STEAL_TIME		5
 #define KVM_FEATURE_PV_EOI		6
 #define KVM_FEATURE_PV_UNHALT		7
+#define KVM_FEATURE_GET_RNG_SEED	8
 
 /* The last 8 bits are used to indicate how to interpret the flags field
  * in pvclock structure. If no bits are set, all flags are ignored.
@@ -40,6 +41,7 @@
 #define MSR_KVM_ASYNC_PF_EN 0x4b564d02
 #define MSR_KVM_STEAL_TIME 0x4b564d03
 #define MSR_KVM_PV_EOI_EN 0x4b564d04
+#define MSR_KVM_GET_RNG_SEED 0x4b564d05
 
 struct kvm_steal_time {
 	__u64 steal;
diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index 8fd1497..4ea7e6c 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -236,7 +236,7 @@ static const char *ext4_feature_name[] = {
 static const char *kvm_feature_name[] = {
     "kvmclock", "kvm_nopiodelay", "kvm_mmu", "kvmclock",
     "kvm_asyncpf", "kvm_steal_time", "kvm_pv_eoi", "kvm_pv_unhalt",
-    NULL, NULL, NULL, NULL,
+    "kvm_get_rng_seed", NULL, NULL, NULL,
     NULL, NULL, NULL, NULL,
     NULL, NULL, NULL, NULL,
     NULL, NULL, NULL, NULL,
@@ -368,7 +368,8 @@ static uint32_t kvm_default_features[FEATURE_WORDS] = {
         (1 << KVM_FEATURE_ASYNC_PF) |
         (1 << KVM_FEATURE_STEAL_TIME) |
         (1 << KVM_FEATURE_PV_EOI) |
-        (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT),
+        (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) |
+        (1 << KVM_FEATURE_GET_RNG_SEED),
     [FEAT_1_ECX] = CPUID_EXT_X2APIC,
 };
 
-- 
1.9.3
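As a rough illustration of why the name has to land in slot 8 of the table, the sketch below models a name-to-bit lookup over an array laid out like kvm_feature_name in this patch. The lookup function kvm_feature_bit is hypothetical and is not QEMU's actual parsing code; it only shows that the array index is what ties the name to the feature bit number.

```c
#include <stddef.h>
#include <string.h>

/*
 * Names copied from the patch above; the index of each entry is the
 * KVM feature bit number. Unlisted slots are NULL (unnamed bits).
 */
const char *kvm_feature_name[32] = {
	"kvmclock", "kvm_nopiodelay", "kvm_mmu", "kvmclock",
	"kvm_asyncpf", "kvm_steal_time", "kvm_pv_eoi", "kvm_pv_unhalt",
	"kvm_get_rng_seed",
};

/* Hypothetical lookup: return the bit for a feature name, or -1. */
int kvm_feature_bit(const char *name)
{
	for (int i = 0; i < 32; i++) {
		if (kvm_feature_name[i] &&
		    strcmp(kvm_feature_name[i], name) == 0)
			return i;
	}
	return -1;
}
```

With this layout, "kvm_get_rng_seed" resolves to bit 8, matching KVM_FEATURE_GET_RNG_SEED in kvm_para.h.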
Re: [PATCH 0/4] random,x86,kvm: Add and use MSR_KVM_GET_RNG_SEED
On Wed, Jul 16, 2014 at 7:32 AM, Paolo Bonzini <pbonz...@redhat.com> wrote:
> On 16/07/2014 16:07, Andy Lutomirski wrote:
>> This patch has nothing whatsoever to do with how much I trust the CPU vs the hypervisor. It's for the enormous installed base of machines without RDRAND.
>
> Ok. I think an MSR is fine, though I don't think it's useful for the guest to use it if it already has RDRAND and/or RDSEED.
>
>>> In any case, is there a matching QEMU patch somewhere?
>>
>> What QEMU change is needed? I admit I'm a bit vague on how QEMU and KVM cooperate here, but there's no state to save and restore. I guess that QEMU wants the ability to turn this on and off for migration. How does that work? I couldn't spot the KVM code that allows this type of control.
>
> It is QEMU that decides which CPUID bits are visible to the guest. By default it blocks bits that it doesn't know about. You would need to add the bit in the kvm_default_features and kvm_feature_name arrays.
>
> For migration, we have versioned machine types, for example pc-2.1. Once the versioned machine type exists, blocking the feature is a one-liner like
>
>     x86_cpu_compat_disable_kvm_features(FEAT_KVM, KVM_FEATURE_NAME);
>
> Unfortunately, QEMU is in hard freeze, so you'd likely be the one creating pc-2.2. This is a boilerplate but relatively complicated patch. But let's cross that bridge when we reach it. For now, you can simply add the bit to the two arrays above.

Done.

NB: Patch 4 of this series is bad due to an asm constraint issue that I haven't figured out yet. I'll send a replacement once I get it working. *sigh* the x86 kernel loading code is a bit of a compilation mess.

> Paolo

-- 
Andy Lutomirski
AMA Capital Management, LLC
[PATCH v2 1/5] x86,kvm: Add MSR_KVM_GET_RNG_SEED and a matching feature bit
This adds a simple interface to allow a guest to request 64 bits of host nonblocking entropy. This is independent of virtio-rng for a couple of reasons:

 - It's intended to be usable during early boot, when a trivial synchronous interface is needed.

 - virtio-rng gives blocking entropy, and making guest boot wait for the host's /dev/random will cause problems.

MSR_KVM_GET_RNG_SEED is intended to provide 64 bits of best-effort cryptographically secure data for use as a seed. It provides no guarantee that the result contains any actual entropy.

Signed-off-by: Andy Lutomirski <l...@amacapital.net>
---
 Documentation/virtual/kvm/cpuid.txt  | 3 +++
 arch/x86/include/uapi/asm/kvm_para.h | 2 ++
 arch/x86/kvm/cpuid.c                 | 3 ++-
 arch/x86/kvm/x86.c                   | 4 ++++
 4 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/Documentation/virtual/kvm/cpuid.txt b/Documentation/virtual/kvm/cpuid.txt
index 3c65feb..0ab043b 100644
--- a/Documentation/virtual/kvm/cpuid.txt
+++ b/Documentation/virtual/kvm/cpuid.txt
@@ -54,6 +54,9 @@ KVM_FEATURE_PV_UNHALT              ||     7 || guest checks this feature bit
                                    ||       || before enabling paravirtualized
                                    ||       || spinlock support.
 ------------------------------------------------------------------------------
+KVM_FEATURE_GET_RNG_SEED           ||     8 || host provides rng seed data via
+                                   ||       || MSR_KVM_GET_RNG_SEED.
+------------------------------------------------------------------------------
 KVM_FEATURE_CLOCKSOURCE_STABLE_BIT ||    24 || host will warn if no guest-side
                                    ||       || per-cpu warps are expected in
                                    ||       || kvmclock.
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 94dc8ca..e2eaf93 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -24,6 +24,7 @@
 #define KVM_FEATURE_STEAL_TIME		5
 #define KVM_FEATURE_PV_EOI		6
 #define KVM_FEATURE_PV_UNHALT		7
+#define KVM_FEATURE_GET_RNG_SEED	8
 
 /* The last 8 bits are used to indicate how to interpret the flags field
  * in pvclock structure. If no bits are set, all flags are ignored.
@@ -40,6 +41,7 @@
 #define MSR_KVM_ASYNC_PF_EN 0x4b564d02
 #define MSR_KVM_STEAL_TIME 0x4b564d03
 #define MSR_KVM_PV_EOI_EN 0x4b564d04
+#define MSR_KVM_GET_RNG_SEED 0x4b564d05
 
 struct kvm_steal_time {
 	__u64 steal;
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 38a0afe..40d6763 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -479,7 +479,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 			     (1 << KVM_FEATURE_ASYNC_PF) |
 			     (1 << KVM_FEATURE_PV_EOI) |
 			     (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) |
-			     (1 << KVM_FEATURE_PV_UNHALT);
+			     (1 << KVM_FEATURE_PV_UNHALT) |
+			     (1 << KVM_FEATURE_GET_RNG_SEED);
 
 		if (sched_info_on())
 			entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f644933..4e81853 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -48,6 +48,7 @@
 #include <linux/pci.h>
 #include <linux/timekeeper_internal.h>
 #include <linux/pvclock_gtod.h>
+#include <linux/random.h>
 #include <trace/events/kvm.h>
 
 #define CREATE_TRACE_POINTS
@@ -2480,6 +2481,9 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
 	case MSR_KVM_PV_EOI_EN:
 		data = vcpu->arch.pv_eoi.msr_val;
 		break;
+	case MSR_KVM_GET_RNG_SEED:
+		get_random_bytes(&data, sizeof(data));
+		break;
 	case MSR_IA32_P5_MC_ADDR:
 	case MSR_IA32_P5_MC_TYPE:
 	case MSR_IA32_MCG_CAP:
-- 
1.9.3
[PATCH v2 5/5] x86,kaslr: Use MSR_KVM_GET_RNG_SEED for KASLR if available
It's considerably better than any of the alternatives on KVM.

Rather than reinventing all of the cpu feature query code, this fixes native_cpuid to work in PIC objects.

I haven't combined it with boot/cpuflags.c's cpuid implementation: including asm/processor.h from boot/cpuflags.c results in a flood of unrelated errors, and fixing it might be messy.

Signed-off-by: Andy Lutomirski <l...@amacapital.net>
---
 arch/x86/boot/compressed/aslr.c  | 27 +++++++++++++++++++++++++++
 arch/x86/include/asm/processor.h | 21 ++++++++++++++++++---
 2 files changed, 45 insertions(+), 3 deletions(-)

diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
index fc6091a..8583f0e 100644
--- a/arch/x86/boot/compressed/aslr.c
+++ b/arch/x86/boot/compressed/aslr.c
@@ -5,6 +5,8 @@
 #include <asm/archrandom.h>
 #include <asm/e820.h>
 
+#include <uapi/asm/kvm_para.h>
+
 #include <generated/compile.h>
 #include <linux/module.h>
 #include <linux/uts.h>
@@ -15,6 +17,22 @@
 static const char build_str[] = UTS_RELEASE " (" LINUX_COMPILE_BY "@"
 		LINUX_COMPILE_HOST ") (" LINUX_COMPILER ") " UTS_VERSION;
 
+static bool kvm_para_has_feature(unsigned int feature)
+{
+	u32 kvm_base;
+	u32 features;
+
+	if (!has_cpuflag(X86_FEATURE_HYPERVISOR))
+		return false;
+
+	kvm_base = hypervisor_cpuid_base("KVMKVMKVM\0\0\0", KVM_CPUID_FEATURES);
+	if (!kvm_base)
+		return false;
+
+	features = cpuid_eax(kvm_base | KVM_CPUID_FEATURES);
+	return features & (1UL << feature);
+}
+
 #define I8254_PORT_CONTROL	0x43
 #define I8254_PORT_COUNTER0	0x40
 #define I8254_CMD_READBACK	0xC0
@@ -81,6 +99,15 @@ static unsigned long get_random_long(void)
 		}
 	}
 
+	if (kvm_para_has_feature(KVM_FEATURE_GET_RNG_SEED)) {
+		u64 seed;
+
+		debug_putstr(" MSR_KVM_GET_RNG_SEED");
+		rdmsrl(MSR_KVM_GET_RNG_SEED, seed);
+		random ^= (unsigned long)seed;
+		use_i8254 = false;
+	}
+
 	if (has_cpuflag(X86_FEATURE_TSC)) {
 		debug_putstr(" RDTSC");
 		rdtscll(raw);
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index a4ea023..6096f3c 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -189,10 +189,25 @@ static inline int have_cpuid_p(void)
 static inline void native_cpuid(unsigned int *eax, unsigned int *ebx,
 				unsigned int *ecx, unsigned int *edx)
 {
-	/* ecx is often an input as well as an output. */
-	asm volatile("cpuid"
+	/*
+	 * This function can be used from the boot code, so it needs
+	 * to avoid using EBX in constraints in PIC mode.
+	 *
+	 * ecx is often an input as well as an output.
+	 */
+	asm volatile(".ifnc %%ebx,%1 ; .ifnc %%rbx,%1 \n\t"
+		     "movl %%ebx,%1\n\t"
+		     ".endif ; .endif \n\t"
+		     "cpuid \n\t"
+		     ".ifnc %%ebx,%1 ; .ifnc %%rbx,%1 \n\t"
+		     "xchgl %%ebx,%1\n\t"
+		     ".endif ; .endif"
 	    : "=a" (*eax),
-	      "=b" (*ebx),
+#if defined(__i386__) && defined(__PIC__)
+	      "=r" (*ebx),	/* gcc won't let us use ebx */
+#else
+	      "=b" (*ebx),	/* ebx is okay */
+#endif
 	      "=c" (*ecx),
 	      "=d" (*edx)
 	    : "0" (*eax), "2" (*ecx)
-- 
1.9.3
[PATCH v2 4/5] random: Log how many bits we managed to seed with in init_std_data
This is useful for making sure that init_std_data is working correctly and for allaying fear when this happens:

random: xyz urandom read with SMALL_NUMBER bits of entropy available

Signed-off-by: Andy Lutomirski <l...@amacapital.net>
---
 drivers/char/random.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index e2c3d02..10e9642 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1251,12 +1251,16 @@ static void init_std_data(struct entropy_store *r)
 	int i;
 	ktime_t now = ktime_get_real();
 	unsigned long rv;
+	int arch_seed_bits = 0, arch_random_bits = 0, slow_rng_bits = 0;
 
 	r->last_pulled = jiffies;
 	mix_pool_bytes(r, &now, sizeof(now), NULL);
 	for (i = r->poolinfo->poolbytes; i > 0; i -= sizeof(rv)) {
-		if (!arch_get_random_seed_long(&rv) &&
-		    !arch_get_random_long(&rv))
+		if (arch_get_random_seed_long(&rv))
+			arch_seed_bits += 8 * sizeof(rv);
+		else if (arch_get_random_long(&rv))
+			arch_random_bits += 8 * sizeof(rv);
+		else
 			rv = random_get_entropy();
 		mix_pool_bytes(r, &rv, sizeof(rv), NULL);
 	}
@@ -1265,10 +1269,14 @@ static void init_std_data(struct entropy_store *r)
 	for (i = 0; i < 4; i++) {
 		u64 rv64;
 
-		if (arch_get_slow_rng_u64(&rv64))
+		if (arch_get_slow_rng_u64(&rv64)) {
 			mix_pool_bytes(r, &rv64, sizeof(rv64), NULL);
+			slow_rng_bits += 8 * sizeof(rv64);
 		}
 	}
+
+	pr_info("random: seeded %s pool with %d bits of arch random seed, %d bits of arch random, and %d bits of arch slow rng\n",
+		r->name, arch_seed_bits, arch_random_bits, slow_rng_bits);
 }
 
 /*
-- 
1.9.3
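The accounting added here follows a simple fallback chain: try the seed source, then the random source, then fall back to a timing value, and count bits only for the first two. A small userspace model of that chain, with the arch sources replaced by function pointers, might look like the following. Everything in it (seed_stats, rng_fn, count_sources, the always_* stubs) is a hypothetical illustration, not kernel code.

```c
#include <stddef.h>

/* Per-source totals, mirroring the counters in the patch. */
struct seed_stats {
	int arch_seed_bits;
	int arch_random_bits;
	int fallback_words;
};

/* Same convention as the arch helpers: nonzero means *v was filled. */
typedef int (*rng_fn)(unsigned long *v);

/* Hypothetical stubs standing in for RDSEED/RDRAND style sources. */
int always_fail(unsigned long *v) { (void)v; return 0; }
int always_ok(unsigned long *v) { *v = 42; return 1; }

/* Walk `words` pool words and record which source served each one. */
struct seed_stats count_sources(rng_fn seed, rng_fn rnd, int words)
{
	struct seed_stats s = { 0, 0, 0 };
	unsigned long rv;

	for (int i = 0; i < words; i++) {
		if (seed(&rv))
			s.arch_seed_bits += 8 * (int)sizeof(rv);
		else if (rnd(&rv))
			s.arch_random_bits += 8 * (int)sizeof(rv);
		else
			s.fallback_words++;	/* kernel uses random_get_entropy() here */
	}
	return s;
}
```

On a machine where only the second source works, all of the bits show up under arch_random_bits, which is the kind of breakdown the new pr_info line is meant to expose.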
[PATCH v2 2/5] random,x86: Add arch_get_slow_rng_u64
arch_get_slow_rng_u64 tries to get 64 bits of RNG seed data. Unlike arch_get_random_{bytes,seed}, etc., it makes no claims about entropy content. It's also likely to be much slower and should not be used frequently. That being said, it should be fast enough to call several times during boot without any noticeable slowdown.

This initial implementation backs it with MSR_KVM_GET_RNG_SEED if available. The intent is for other hypervisor guest implementations to implement this interface.

Signed-off-by: Andy Lutomirski <l...@amacapital.net>
---
 arch/x86/Kconfig                   |  4 ++++
 arch/x86/include/asm/archslowrng.h | 30 ++++++++++++++++++++++++++++++
 arch/x86/kernel/kvm.c              | 22 ++++++++++++++++++++++
 include/linux/random.h             |  9 +++++++++
 4 files changed, 65 insertions(+)
 create mode 100644 arch/x86/include/asm/archslowrng.h

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index a8f749e..4dfb539 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -593,6 +593,7 @@ config KVM_GUEST
 	bool "KVM Guest support (including kvmclock)"
 	depends on PARAVIRT
 	select PARAVIRT_CLOCK
+	select ARCH_SLOW_RNG
 	default y
 	---help---
 	  This option enables various optimizations for running under the KVM
@@ -627,6 +628,9 @@ config PARAVIRT_TIME_ACCOUNTING
 config PARAVIRT_CLOCK
 	bool
 
+config ARCH_SLOW_RNG
+	bool
+
 endif #HYPERVISOR_GUEST
 
 config NO_BOOTMEM
diff --git a/arch/x86/include/asm/archslowrng.h b/arch/x86/include/asm/archslowrng.h
new file mode 100644
index 0000000..c8e8d0d
--- /dev/null
+++ b/arch/x86/include/asm/archslowrng.h
@@ -0,0 +1,30 @@
+/*
+ * This file is part of the Linux kernel.
+ *
+ * Copyright (c) 2014 Andy Lutomirski
+ * Authors: Andy Lutomirski <l...@amacapital.net>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ */
+
+#ifndef ASM_X86_ARCHSLOWRANDOM_H
+#define ASM_X86_ARCHSLOWRANDOM_H
+
+#ifndef CONFIG_ARCH_SLOW_RNG
+# error archslowrng.h should not be included if !CONFIG_ARCH_SLOW_RNG
+#endif
+
+/*
+ * Performance is irrelevant here, so there's no point in using the
+ * paravirt ops mechanism. Instead just use a function pointer.
+ */
+extern int (*arch_get_slow_rng_u64)(u64 *v);
+
+#endif /* ASM_X86_ARCHSLOWRANDOM_H */
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 3dd8e2c..8d64d28 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -416,6 +416,25 @@ void kvm_disable_steal_time(void)
 	wrmsr(MSR_KVM_STEAL_TIME, 0, 0);
 }
 
+static int nop_get_slow_rng_u64(u64 *v)
+{
+	return 0;
+}
+
+static int kvm_get_slow_rng_u64(u64 *v)
+{
+	/*
+	 * Allow migration from a hypervisor with the GET_RNG_SEED
+	 * feature to a hypervisor without it.
+	 */
+	if (rdmsrl_safe(MSR_KVM_GET_RNG_SEED, v) == 0)
+		return 1;
+	else
+		return 0;
+}
+
+int (*arch_get_slow_rng_u64)(u64 *v) = nop_get_slow_rng_u64;
+
 #ifdef CONFIG_SMP
 static void __init kvm_smp_prepare_boot_cpu(void)
 {
@@ -493,6 +512,9 @@ void __init kvm_guest_init(void)
 	if (kvmclock_vsyscall)
 		kvm_setup_vsyscall_timeinfo();
 
+	if (kvm_para_has_feature(KVM_FEATURE_GET_RNG_SEED))
+		arch_get_slow_rng_u64 = kvm_get_slow_rng_u64;
+
 #ifdef CONFIG_SMP
 	smp_ops.smp_prepare_boot_cpu = kvm_smp_prepare_boot_cpu;
 	register_cpu_notifier(&kvm_cpu_notifier);
diff --git a/include/linux/random.h b/include/linux/random.h
index 57fbbff..ceafbcf 100644
--- a/include/linux/random.h
+++ b/include/linux/random.h
@@ -106,6 +106,15 @@ static inline int arch_has_random_seed(void)
 }
 #endif
 
+#ifdef CONFIG_ARCH_SLOW_RNG
+# include <asm/archslowrng.h>
+#else
+static inline int arch_get_slow_rng_u64(u64 *v)
+{
+	return 0;
+}
+#endif
+
 /* Pseudo random number generator from numerical recipes. */
 static inline u32 next_pseudo_random32(u32 seed)
 {
-- 
1.9.3
[PATCH v2 3/5] random: Seed pools from arch_get_slow_rng_u64 at startup
This should help solve the problem of guests starting out with predictable RNG state.

Signed-off-by: Andy Lutomirski <l...@amacapital.net>
---
 drivers/char/random.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 0a7ac0a..e2c3d02 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1261,6 +1261,14 @@ static void init_std_data(struct entropy_store *r)
 		mix_pool_bytes(r, &rv, sizeof(rv), NULL);
 	}
 	mix_pool_bytes(r, utsname(), sizeof(*(utsname())), NULL);
+
+	for (i = 0; i < 4; i++) {
+		u64 rv64;
+
+		if (arch_get_slow_rng_u64(&rv64))
+			mix_pool_bytes(r, &rv64, sizeof(rv64), NULL);
+		}
+	}
 }
 
 /*
-- 
1.9.3
[PATCH v2 0/5] random,x86,kvm: Add and use MSR_KVM_GET_RNG_SEED
virtio-rng is both too complicated and insufficient for initial rng seeding. It's far too complicated to use for KASLR or any other early boot random number needs. It also provides /dev/random-style bits, which means that making guest boot wait for virtio-rng is unacceptably slow, and doing it asynchronously means that /dev/urandom might be predictable when userspace starts.

This introduces a very simple synchronous mechanism to get /dev/urandom-style bits.

I sent the corresponding kvm-unit-tests and qemu changes separately.

There's room for bikeshedding on the name arch_get_slow_rng_u64. I considered arch_get_rng_seed_u64, but that could be confused with arch_get_random_seed_long, which is not interchangeable.

Changes from v1:
 - Split patches 2 and 3
 - Log all arch sources in init_std_data
 - Fix the 32-bit kaslr build

Andy Lutomirski (5):
  x86,kvm: Add MSR_KVM_GET_RNG_SEED and a matching feature bit
  random,x86: Add arch_get_slow_rng_u64
  random: Seed pools from arch_get_slow_rng_u64 at startup
  random: Log how many bits we managed to seed with in init_std_data
  x86,kaslr: Use MSR_KVM_GET_RNG_SEED for KASLR if available

 Documentation/virtual/kvm/cpuid.txt  |  3 +++
 arch/x86/Kconfig                     |  4 ++++
 arch/x86/boot/compressed/aslr.c      | 27 +++++++++++++++++++++++++++
 arch/x86/include/asm/archslowrng.h   | 30 ++++++++++++++++++++++++++++++
 arch/x86/include/asm/processor.h     | 21 ++++++++++++++++++---
 arch/x86/include/uapi/asm/kvm_para.h |  2 ++
 arch/x86/kernel/kvm.c                | 22 ++++++++++++++++++++++
 arch/x86/kvm/cpuid.c                 |  3 ++-
 arch/x86/kvm/x86.c                   |  4 ++++
 drivers/char/random.c                | 20 ++++++++++++++++++--
 include/linux/random.h               |  9 +++++++++
 11 files changed, 139 insertions(+), 6 deletions(-)
 create mode 100644 arch/x86/include/asm/archslowrng.h

-- 
1.9.3
Re: [PATCH v2 0/5] random,x86,kvm: Add and use MSR_KVM_GET_RNG_SEED
On Wed, Jul 16, 2014 at 11:02 AM, Bandan Das b...@redhat.com wrote:
> Andy Lutomirski l...@amacapital.net writes:
>
>> virtio-rng is both too complicated and insufficient for initial rng seeding. It's far too complicated to use for KASLR or any other early boot random number needs. It also provides /dev/random-style bits, which means that making guest boot wait for virtio-rng is unacceptably slow, and doing it asynchronously means that /dev/urandom might be predictable when userspace starts.
>>
>> This introduces a very simple synchronous mechanism to get /dev/urandom-style bits.
>
> Whoa! the cover letter seems more like virtio-rng bashing rather than an introduction to the patchset (and/or its advantages over existing methods) :) That's ok though I guess, these won't be in the commit log.

Yeah, sorry -- I figured that the biggest objection would be "just use virtio-rng".

I'll send a v3 later today -- there's a trivial bisectability bug in this version.

--Andy

>> I sent the corresponding kvm-unit-tests and qemu changes separately.
>>
>> There's room for bikeshedding on the name arch_get_slow_rng_u64. I considered arch_get_rng_seed_u64, but that could be confused with arch_get_random_seed_long, which is not interchangeable.
Re: [PATCH 0/4] random,x86,kvm: Add and use MSR_KVM_GET_RNG_SEED
On Wed, Jul 16, 2014 at 1:20 PM, H. Peter Anvin h...@zytor.com wrote:
> On 07/16/2014 09:21 AM, Gleb Natapov wrote:
>> On Wed, Jul 16, 2014 at 09:13:23AM -0700, H. Peter Anvin wrote:
>>> On 07/16/2014 09:08 AM, Paolo Bonzini wrote:
>>>> On 16/07/2014 18:03, H. Peter Anvin wrote:
>>>>> I suggested emulating RDRAND *but not set the CPUID bit*. We already developed a protocol in KVM/Qemu to enumerate emulated features (created for MOVBE as I recall), specifically to service the semantic "feature X will work but will be substantially slower than normal."
>>>>
>>>> But those will set the CPUID bit. There is currently no way for KVM guests to know if a CPUID bit is real or emulated.
>>>
>>> OK, so there wasn't any protocol implemented in the end. I sit corrected.
>>
>> That protocol that was implemented is between qemu and kvm, not kvm and a guest.
>
> Either which way, the notion was to have a PV CPUID bit like the proposed kvm_get_rng_seed bit, but to have it exercised by executing RDRAND.
>
> The biggest reason to *not* do this would be that with an MSR it is not available to guest user space, which may be better under the circumstances.

On the theory that I see no legitimate reason to expose this to guest user space, I think we shouldn't expose it. If we wanted to add a get_random_bytes syscall, that would be an entirely different story, though.

Should I send v3 as one series or should I split it into host and guest parts?

--Andy
[PATCH v3 0/5] random,x86,kvm: Add and use MSR_KVM_GET_RNG_SEED
This introduces and uses a very simple synchronous mechanism to get /dev/urandom-style bits appropriate for initial KVM PV guest RNG seeding.

virtio-rng is not suitable for this purpose. It's too difficult to enumerate for use in early boot (e.g. KASLR, which runs before we even have an IDT). It also provides /dev/random-style bits, which means that making guest boot wait for virtio-rng is unacceptably slow, and doing it asynchronously means that /dev/urandom might still be predictable when userspace starts.

I sent the corresponding kvm-unit-tests and qemu changes separately.

There's room for bikeshedding on the name arch_get_slow_rng_u64. I considered arch_get_rng_seed_u64, but that could be confused with arch_get_random_seed_long, which is not interchangeable.

Changes from v2:
 - Bisection fix (patch 2 had a misplaced brace). The final state is identical to that of v2.
 - Improve the 0/5 description a little bit.

Changes from v1:
 - Split patches 2 and 3
 - Log all arch sources in init_std_data
 - Fix the 32-bit kaslr build

Andy Lutomirski (5):
  x86,kvm: Add MSR_KVM_GET_RNG_SEED and a matching feature bit
  random,x86: Add arch_get_slow_rng_u64
  random: Seed pools from arch_get_slow_rng_u64 at startup
  random: Log how many bits we managed to seed with in init_std_data
  x86,kaslr: Use MSR_KVM_GET_RNG_SEED for KASLR if available

 Documentation/virtual/kvm/cpuid.txt  |  3 +++
 arch/x86/Kconfig                     |  4 ++++
 arch/x86/boot/compressed/aslr.c      | 27 +++++++++++++++++++++++++++
 arch/x86/include/asm/archslowrng.h   | 30 ++++++++++++++++++++++++++++++
 arch/x86/include/asm/processor.h     | 21 ++++++++++++++++++---
 arch/x86/include/uapi/asm/kvm_para.h |  2 ++
 arch/x86/kernel/kvm.c                | 22 ++++++++++++++++++++++
 arch/x86/kvm/cpuid.c                 |  3 ++-
 arch/x86/kvm/x86.c                   |  4 ++++
 drivers/char/random.c                | 20 ++++++++++++++++++--
 include/linux/random.h               |  9 +++++++++
 11 files changed, 139 insertions(+), 6 deletions(-)
 create mode 100644 arch/x86/include/asm/archslowrng.h

--
1.9.3
[PATCH v3 3/5] random: Seed pools from arch_get_slow_rng_u64 at startup
This should help solve the problem of guests starting out with predictable RNG state.

Signed-off-by: Andy Lutomirski l...@amacapital.net
---
 drivers/char/random.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 0a7ac0a..17ad33d 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1261,6 +1261,13 @@ static void init_std_data(struct entropy_store *r)
 		mix_pool_bytes(r, &rv, sizeof(rv), NULL);
 	}
 	mix_pool_bytes(r, utsname(), sizeof(*(utsname())), NULL);
+
+	for (i = 0; i < 4; i++) {
+		u64 rv64;
+
+		if (arch_get_slow_rng_u64(&rv64))
+			mix_pool_bytes(r, &rv64, sizeof(rv64), NULL);
+	}
 }
 
 /*
--
1.9.3
[PATCH v3 2/5] random,x86: Add arch_get_slow_rng_u64
arch_get_slow_rng_u64 tries to get 64 bits of RNG seed data. Unlike arch_get_random_{bytes,seed}, etc., it makes no claims about entropy content. It's also likely to be much slower and should not be used frequently. That being said, it should be fast enough to call several times during boot without any noticeable slowdown.

This initial implementation backs it with MSR_KVM_GET_RNG_SEED if available. The intent is for other hypervisor guest implementations to implement this interface.

Signed-off-by: Andy Lutomirski l...@amacapital.net
---
 arch/x86/Kconfig                   |  4 ++++
 arch/x86/include/asm/archslowrng.h | 30 ++++++++++++++++++++++++++++++
 arch/x86/kernel/kvm.c              | 22 ++++++++++++++++++++++
 include/linux/random.h             |  9 +++++++++
 4 files changed, 65 insertions(+)
 create mode 100644 arch/x86/include/asm/archslowrng.h

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index a8f749e..4dfb539 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -593,6 +593,7 @@ config KVM_GUEST
 	bool "KVM Guest support (including kvmclock)"
 	depends on PARAVIRT
 	select PARAVIRT_CLOCK
+	select ARCH_SLOW_RNG
 	default y
 	---help---
 	  This option enables various optimizations for running under the KVM
@@ -627,6 +628,9 @@ config PARAVIRT_TIME_ACCOUNTING
 config PARAVIRT_CLOCK
 	bool
 
+config ARCH_SLOW_RNG
+	bool
+
 endif #HYPERVISOR_GUEST
 
 config NO_BOOTMEM
diff --git a/arch/x86/include/asm/archslowrng.h b/arch/x86/include/asm/archslowrng.h
new file mode 100644
index 0000000..c8e8d0d
--- /dev/null
+++ b/arch/x86/include/asm/archslowrng.h
@@ -0,0 +1,30 @@
+/*
+ * This file is part of the Linux kernel.
+ *
+ * Copyright (c) 2014 Andy Lutomirski
+ * Authors: Andy Lutomirski l...@amacapital.net
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ */
+
+#ifndef ASM_X86_ARCHSLOWRANDOM_H
+#define ASM_X86_ARCHSLOWRANDOM_H
+
+#ifndef CONFIG_ARCH_SLOW_RNG
+# error archslowrng.h should not be included if !CONFIG_ARCH_SLOW_RNG
+#endif
+
+/*
+ * Performance is irrelevant here, so there's no point in using the
+ * paravirt ops mechanism. Instead just use a function pointer.
+ */
+extern int (*arch_get_slow_rng_u64)(u64 *v);
+
+#endif /* ASM_X86_ARCHSLOWRANDOM_H */
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 3dd8e2c..8d64d28 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -416,6 +416,25 @@ void kvm_disable_steal_time(void)
 	wrmsr(MSR_KVM_STEAL_TIME, 0, 0);
 }
 
+static int nop_get_slow_rng_u64(u64 *v)
+{
+	return 0;
+}
+
+static int kvm_get_slow_rng_u64(u64 *v)
+{
+	/*
+	 * Allow migration from a hypervisor with the GET_RNG_SEED
+	 * feature to a hypervisor without it.
+	 */
+	if (rdmsrl_safe(MSR_KVM_GET_RNG_SEED, v) == 0)
+		return 1;
+	else
+		return 0;
+}
+
+int (*arch_get_slow_rng_u64)(u64 *v) = nop_get_slow_rng_u64;
+
 #ifdef CONFIG_SMP
 static void __init kvm_smp_prepare_boot_cpu(void)
 {
@@ -493,6 +512,9 @@ void __init kvm_guest_init(void)
 	if (kvmclock_vsyscall)
 		kvm_setup_vsyscall_timeinfo();
 
+	if (kvm_para_has_feature(KVM_FEATURE_GET_RNG_SEED))
+		arch_get_slow_rng_u64 = kvm_get_slow_rng_u64;
+
 #ifdef CONFIG_SMP
 	smp_ops.smp_prepare_boot_cpu = kvm_smp_prepare_boot_cpu;
 	register_cpu_notifier(&kvm_cpu_notifier);
diff --git a/include/linux/random.h b/include/linux/random.h
index 57fbbff..ceafbcf 100644
--- a/include/linux/random.h
+++ b/include/linux/random.h
@@ -106,6 +106,15 @@ static inline int arch_has_random_seed(void)
 }
 #endif
 
+#ifdef CONFIG_ARCH_SLOW_RNG
+# include <asm/archslowrng.h>
+#else
+static inline int arch_get_slow_rng_u64(u64 *v)
+{
+	return 0;
+}
+#endif
+
 /* Pseudo random number generator from numerical recipes. */
 static inline u32 next_pseudo_random32(u32 seed)
 {
--
1.9.3
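For readers who want to poke at the dispatch pattern outside the kernel, here is a minimal userspace sketch of the function-pointer fallback this patch describes. It is an illustration under stated assumptions, not the kernel code: fake_kvm_get_slow_rng_u64, guest_init and collect_seed_words are hypothetical names, and the fake backend just counts upward where the real one reads MSR_KVM_GET_RNG_SEED.

```c
#include <stdint.h>

/* Default backend: reports that no seed data is available. */
static int nop_get_slow_rng_u64(uint64_t *v)
{
	(void)v;
	return 0;
}

/* Hypothetical stand-in for the MSR read; it just counts upward. */
static uint64_t fake_host_value = 0x1234;

static int fake_kvm_get_slow_rng_u64(uint64_t *v)
{
	*v = fake_host_value++;
	return 1;	/* 1 means *v was filled in */
}

/* Callers go through this pointer; it starts out as the no-op. */
static int (*get_slow_rng_u64)(uint64_t *v) = nop_get_slow_rng_u64;

/* Hypothetical init hook: switch backends once the feature is detected. */
static void guest_init(int host_has_feature)
{
	if (host_has_feature)
		get_slow_rng_u64 = fake_kvm_get_slow_rng_u64;
}

/* Same shape as the seeding loop: try n times, keep what succeeds. */
static int collect_seed_words(uint64_t *out, int n)
{
	int i, got = 0;

	for (i = 0; i < n; i++) {
		uint64_t v;

		if (get_slow_rng_u64(&v))
			out[got++] = v;
	}
	return got;
}
```

Before guest_init(1) runs, collect_seed_words gathers nothing; afterwards every call succeeds. Callers never need to know which backend is installed, which is the point of defaulting the pointer to a no-op.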
[PATCH v3 5/5] x86,kaslr: Use MSR_KVM_GET_RNG_SEED for KASLR if available
It's considerably better than any of the alternatives on KVM.

Rather than reinventing all of the cpu feature query code, this fixes native_cpuid to work in PIC objects.

I haven't combined it with boot/cpuflags.c's cpuid implementation: including asm/processor.h from boot/cpuflags.c results in a flood of unrelated errors, and fixing it might be messy.

Signed-off-by: Andy Lutomirski l...@amacapital.net
---
 arch/x86/boot/compressed/aslr.c  | 27 +++++++++++++++++++++++++++
 arch/x86/include/asm/processor.h | 21 ++++++++++++++++++---
 2 files changed, 45 insertions(+), 3 deletions(-)

diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
index fc6091a..8583f0e 100644
--- a/arch/x86/boot/compressed/aslr.c
+++ b/arch/x86/boot/compressed/aslr.c
@@ -5,6 +5,8 @@
 #include <asm/archrandom.h>
 #include <asm/e820.h>
 
+#include <uapi/asm/kvm_para.h>
+
 #include <generated/compile.h>
 #include <linux/module.h>
 #include <linux/uts.h>
@@ -15,6 +17,22 @@
 static const char build_str[] = UTS_RELEASE " (" LINUX_COMPILE_BY "@"
 		LINUX_COMPILE_HOST ") (" LINUX_COMPILER ") " UTS_VERSION;
 
+static bool kvm_para_has_feature(unsigned int feature)
+{
+	u32 kvm_base;
+	u32 features;
+
+	if (!has_cpuflag(X86_FEATURE_HYPERVISOR))
+		return false;
+
+	kvm_base = hypervisor_cpuid_base("KVMKVMKVM\0\0\0", KVM_CPUID_FEATURES);
+	if (!kvm_base)
+		return false;
+
+	features = cpuid_eax(kvm_base | KVM_CPUID_FEATURES);
+	return features & (1UL << feature);
+}
+
 #define I8254_PORT_CONTROL	0x43
 #define I8254_PORT_COUNTER0	0x40
 #define I8254_CMD_READBACK	0xC0
@@ -81,6 +99,15 @@ static unsigned long get_random_long(void)
 		}
 	}
 
+	if (kvm_para_has_feature(KVM_FEATURE_GET_RNG_SEED)) {
+		u64 seed;
+
+		debug_putstr(" MSR_KVM_GET_RNG_SEED");
+		rdmsrl(MSR_KVM_GET_RNG_SEED, seed);
+		random ^= (unsigned long)seed;
+		use_i8254 = false;
+	}
+
 	if (has_cpuflag(X86_FEATURE_TSC)) {
 		debug_putstr(" RDTSC");
 		rdtscll(raw);
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index a4ea023..6096f3c 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -189,10 +189,25 @@ static inline int have_cpuid_p(void)
 static inline void native_cpuid(unsigned int *eax, unsigned int *ebx,
 				unsigned int *ecx, unsigned int *edx)
 {
-	/* ecx is often an input as well as an output. */
-	asm volatile("cpuid"
+	/*
+	 * This function can be used from the boot code, so it needs
+	 * to avoid using EBX in constraints in PIC mode.
+	 *
+	 * ecx is often an input as well as an output.
+	 */
+	asm volatile(".ifnc %%ebx,%1 ; .ifnc %%rbx,%1 \n\t"
+		     "movl %%ebx,%1\n\t"
+		     ".endif ; .endif \n\t"
+		     "cpuid \n\t"
+		     ".ifnc %%ebx,%1 ; .ifnc %%rbx,%1 \n\t"
+		     "xchgl %%ebx,%1\n\t"
+		     ".endif ; .endif"
 	    : "=a" (*eax),
-	      "=b" (*ebx),
+#if defined(__i386__) && defined(__PIC__)
+	      "=r" (*ebx),	/* gcc won't let us use ebx */
+#else
+	      "=b" (*ebx),	/* ebx is okay */
+#endif
 	      "=c" (*ecx),
 	      "=d" (*edx)
 	    : "0" (*eax), "2" (*ecx)
--
1.9.3
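As a rough illustration of how get_random_long combines its inputs in the patch above, the following userspace sketch XORs each available source into a running value and records whether any source was used. struct source and fold_sources are hypothetical names introduced only for this example; used_any plays the role that clearing use_i8254 plays in the patch.

```c
#include <stdint.h>

/* Hypothetical description of one optional entropy source. */
struct source {
	int available;	/* nonzero if the source produced a value */
	uint64_t value;	/* the value it produced */
};

/* XOR every available source into the running value. */
static unsigned long fold_sources(unsigned long start,
				  const struct source *src, int n,
				  int *used_any)
{
	unsigned long random = start;
	int i;

	*used_any = 0;
	for (i = 0; i < n; i++) {
		if (!src[i].available)
			continue;
		/* As in the patch, a 64-bit seed is truncated to a long. */
		random ^= (unsigned long)src[i].value;
		*used_any = 1;
	}
	return random;
}
```

XOR is a reasonable combiner here because mixing in an extra value cannot make the result more predictable, provided that value is independent of the ones already folded in.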
[PATCH v3 4/5] random: Log how many bits we managed to seed with in init_std_data
This is useful for making sure that init_std_data is working correctly and for allaying fear when this happens:

random: xyz urandom read with SMALL_NUMBER bits of entropy available

Signed-off-by: Andy Lutomirski l...@amacapital.net
---
 drivers/char/random.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 17ad33d..10e9642 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1251,12 +1251,16 @@ static void init_std_data(struct entropy_store *r)
 	int i;
 	ktime_t now = ktime_get_real();
 	unsigned long rv;
+	int arch_seed_bits = 0, arch_random_bits = 0, slow_rng_bits = 0;
 
 	r->last_pulled = jiffies;
 	mix_pool_bytes(r, &now, sizeof(now), NULL);
 	for (i = r->poolinfo->poolbytes; i > 0; i -= sizeof(rv)) {
-		if (!arch_get_random_seed_long(&rv) &&
-		    !arch_get_random_long(&rv))
+		if (arch_get_random_seed_long(&rv))
+			arch_seed_bits += 8 * sizeof(rv);
+		else if (arch_get_random_long(&rv))
+			arch_random_bits += 8 * sizeof(rv);
+		else
 			rv = random_get_entropy();
 		mix_pool_bytes(r, &rv, sizeof(rv), NULL);
 	}
@@ -1265,9 +1269,14 @@ static void init_std_data(struct entropy_store *r)
 	for (i = 0; i < 4; i++) {
 		u64 rv64;
 
-		if (arch_get_slow_rng_u64(&rv64))
+		if (arch_get_slow_rng_u64(&rv64)) {
 			mix_pool_bytes(r, &rv64, sizeof(rv64), NULL);
+			slow_rng_bits += 8 * sizeof(rv64);
+		}
 	}
+
+	pr_info("random: seeded %s pool with %d bits of arch random seed, %d bits of arch random, and %d bits of arch slow rng\n",
+		r->name, arch_seed_bits, arch_random_bits, slow_rng_bits);
 }
 
 /*
--
1.9.3
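The accounting in this patch can be tried out in isolation. Below is a small userspace sketch of the same prefer-then-fall-back counting; source_fn, struct seed_stats, source_ok, source_fail and count_sources are hypothetical names for illustration, and the callbacks stand in for arch_get_random_seed_long and arch_get_random_long.

```c
/* Hypothetical source callback: returns 1 and fills *v on success. */
typedef int (*source_fn)(unsigned long *v);

struct seed_stats {
	int seed_bits;		/* bits credited to the preferred source */
	int random_bits;	/* bits credited to the second choice */
	int fallback_words;	/* words that fell through to the last resort */
};

static int source_ok(unsigned long *v)
{
	*v = 42;
	return 1;
}

static int source_fail(unsigned long *v)
{
	(void)v;
	return 0;
}

/* Try the preferred source first, then the second, and count each outcome. */
static struct seed_stats count_sources(int words, source_fn seed_src,
				       source_fn random_src)
{
	struct seed_stats st = { 0, 0, 0 };
	int i;

	for (i = 0; i < words; i++) {
		unsigned long rv;

		if (seed_src(&rv))
			st.seed_bits += 8 * (int)sizeof(rv);
		else if (random_src(&rv))
			st.random_bits += 8 * (int)sizeof(rv);
		else
			st.fallback_words++;
	}
	return st;
}
```

Each word is credited to exactly one source, so the three counters always add up to the number of words requested. That is what makes the resulting log line easy to sanity-check.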
[PATCH v3 1/5] x86,kvm: Add MSR_KVM_GET_RNG_SEED and a matching feature bit
This adds a simple interface to allow a guest to request 64 bits of host nonblocking entropy. This is independent of virtio-rng for a couple of reasons:

 - It's intended to be usable during early boot, when a trivial synchronous interface is needed.

 - virtio-rng gives blocking entropy, and making guest boot wait for the host's /dev/random will cause problems.

MSR_KVM_GET_RNG_SEED is intended to provide 64 bits of best-effort cryptographically secure data for use as a seed. It provides no guarantee that the result contains any actual entropy.

Signed-off-by: Andy Lutomirski l...@amacapital.net
---
 Documentation/virtual/kvm/cpuid.txt  | 3 +++
 arch/x86/include/uapi/asm/kvm_para.h | 2 ++
 arch/x86/kvm/cpuid.c                 | 3 ++-
 arch/x86/kvm/x86.c                   | 4 ++++
 4 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/Documentation/virtual/kvm/cpuid.txt b/Documentation/virtual/kvm/cpuid.txt
index 3c65feb..0ab043b 100644
--- a/Documentation/virtual/kvm/cpuid.txt
+++ b/Documentation/virtual/kvm/cpuid.txt
@@ -54,6 +54,9 @@ KVM_FEATURE_PV_UNHALT              ||     7 || guest checks this feature bit
                                    ||       || before enabling paravirtualized
                                    ||       || spinlock support.
 ------------------------------------------------------------------------------
+KVM_FEATURE_GET_RNG_SEED           ||     8 || host provides rng seed data via
+                                   ||       || MSR_KVM_GET_RNG_SEED.
+------------------------------------------------------------------------------
 KVM_FEATURE_CLOCKSOURCE_STABLE_BIT ||    24 || host will warn if no guest-side
                                    ||       || per-cpu warps are expected in
                                    ||       || kvmclock.
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 94dc8ca..e2eaf93 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -24,6 +24,7 @@
 #define KVM_FEATURE_STEAL_TIME		5
 #define KVM_FEATURE_PV_EOI		6
 #define KVM_FEATURE_PV_UNHALT		7
+#define KVM_FEATURE_GET_RNG_SEED	8
 
 /* The last 8 bits are used to indicate how to interpret the flags field
  * in pvclock structure. If no bits are set, all flags are ignored.
@@ -40,6 +41,7 @@
 #define MSR_KVM_ASYNC_PF_EN 0x4b564d02
 #define MSR_KVM_STEAL_TIME 0x4b564d03
 #define MSR_KVM_PV_EOI_EN 0x4b564d04
+#define MSR_KVM_GET_RNG_SEED 0x4b564d05
 
 struct kvm_steal_time {
 	__u64 steal;
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 38a0afe..40d6763 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -479,7 +479,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 			     (1 << KVM_FEATURE_ASYNC_PF) |
 			     (1 << KVM_FEATURE_PV_EOI) |
 			     (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) |
-			     (1 << KVM_FEATURE_PV_UNHALT);
+			     (1 << KVM_FEATURE_PV_UNHALT) |
+			     (1 << KVM_FEATURE_GET_RNG_SEED);
 
 		if (sched_info_on())
 			entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f644933..4e81853 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -48,6 +48,7 @@
 #include <linux/pci.h>
 #include <linux/timekeeper_internal.h>
 #include <linux/pvclock_gtod.h>
+#include <linux/random.h>
 #include <trace/events/kvm.h>
 
 #define CREATE_TRACE_POINTS
@@ -2480,6 +2481,9 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
 	case MSR_KVM_PV_EOI_EN:
 		data = vcpu->arch.pv_eoi.msr_val;
 		break;
+	case MSR_KVM_GET_RNG_SEED:
+		get_random_bytes(&data, sizeof(data));
+		break;
 	case MSR_IA32_P5_MC_ADDR:
 	case MSR_IA32_P5_MC_TYPE:
 	case MSR_IA32_MCG_CAP:
--
1.9.3
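To make the two constants in this patch concrete, here is a small userspace sketch of how a guest might test the new feature bit in the KVM features word. It is a sketch under assumptions: kvm_features_has and msr_has_kvm_prefix are hypothetical helpers, and the features word is passed in as a plain integer instead of being read with CPUID.

```c
#include <stdint.h>

#define KVM_FEATURE_GET_RNG_SEED 8		/* bit number from the patch */
#define MSR_KVM_GET_RNG_SEED 0x4b564d05u	/* MSR index from the patch */

/* Test one bit of the KVM features word (EAX of the features leaf). */
static int kvm_features_has(uint32_t features_eax, unsigned int feature)
{
	return (features_eax >> feature) & 1u;
}

/*
 * Hypothetical helper: the MSR indices listed in the hunk above all
 * start with the ASCII bytes 'K', 'V', 'M' (0x4b, 0x56, 0x4d).
 */
static int msr_has_kvm_prefix(uint32_t msr)
{
	uint32_t prefix = ((uint32_t)'K' << 16) | ((uint32_t)'V' << 8) |
			  (uint32_t)'M';

	return (msr >> 8) == prefix;
}
```

A guest is expected to check the feature bit before touching the MSR, so a host that does not advertise bit 8 is never asked for seed data.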
Re: [PATCH v3 2/5] random,x86: Add arch_get_slow_rng_u64
On Wed, Jul 16, 2014 at 2:59 PM, H. Peter Anvin h...@zytor.com wrote:
> On 07/16/2014 02:45 PM, Andy Lutomirski wrote:
>> diff --git a/arch/x86/include/asm/archslowrng.h b/arch/x86/include/asm/archslowrng.h
>> new file mode 100644
>> index 0000000..c8e8d0d
>> --- /dev/null
>> +++ b/arch/x86/include/asm/archslowrng.h
>> @@ -0,0 +1,30 @@
>> +/*
>> + * This file is part of the Linux kernel.
>> + *
>> + * Copyright (c) 2014 Andy Lutomirski
>> + * Authors: Andy Lutomirski l...@amacapital.net
>> + *
>> + * This program is free software; you can redistribute it and/or modify it
>> + * under the terms and conditions of the GNU General Public License,
>> + * version 2, as published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope it will be useful, but WITHOUT
>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
>> + * more details.
>> + */
>> +
>> +#ifndef ASM_X86_ARCHSLOWRANDOM_H
>> +#define ASM_X86_ARCHSLOWRANDOM_H
>> +
>> +#ifndef CONFIG_ARCH_SLOW_RNG
>> +# error archslowrng.h should not be included if !CONFIG_ARCH_SLOW_RNG
>> +#endif
>> +
>
> I'm *seriously* questioning the wisdom of this. A much saner thing would be to do:
>
> #ifndef CONFIG_ARCH_SLOW_RNG
>
> /* Not supported */
> static inline int arch_get_slow_rng_u64(u64 *v) { (void)v; return 0; }
>
> #endif
>
> ... which is basically what we do for the archrandom stuff.

The archrandom stuff defines the "not supported" variant in the generic header, which is what I'm doing here. I could wrap all of asm/archslowrng.h in #ifdef CONFIG_ARCH_SLOW_RNG instead of putting the #error in there, but I have no strong preference.

> I'm also wondering if it makes sense to have a function which prefers arch_get_random*() over this one as a preferred interface. Something like:
>
> int get_random_arch_u64_slow_ok(u64 *v)
> {
> 	int i;
> 	u64 x = 0;
> 	unsigned long l;
>
> 	for (i = 0; i < 64/BITS_PER_LONG; i++) {
> 		if (!arch_get_random_long(&l))
> 			return arch_get_slow_rng_u64(v);
>
> 		x |= l << (i*BITS_PER_LONG);
> 	}
> 	*v = l;
> 	return 0;
> }

I played with something like this earlier, but I dropped it when it ended up having exactly one user. I suspect that the highly paranoid will actually prefer seeding with both sources in init_std_data even if RDRAND is available -- it costs very little and it provides a bit of extra assurance.

> This still doesn't address the issue e.g. on x86 where RDRAND is available but we haven't set up alternatives yet. So it might be that what we really want is to encapsulate this fallback in arch code and do a more direct enumeration.

My personal preference is to defer this until some user shows up. I think that even this would be too complicated for KASLR, which is the only extremely early-boot user that I found.

Hmm. Does the prandom stuff want to use this?

>> +
>> +static int kvm_get_slow_rng_u64(u64 *v)
>> +{
>> +	/*
>> +	 * Allow migration from a hypervisor with the GET_RNG_SEED
>> +	 * feature to a hypervisor without it.
>> +	 */
>> +	if (rdmsrl_safe(MSR_KVM_GET_RNG_SEED, v) == 0)
>> +		return 1;
>> +	else
>> +		return 0;
>> +}
>
> How about:
>
> return rdmsrl_safe(MSR_KVM_GET_RNG_SEED, v) == 0;
>
> The naming also feels really inconsistent...

Better ideas welcome. I could call the generic function arch_get_pv_random_seed, but maybe someone will come up with a non-paravirt implementation.

--Andy

> -hpa

--
Andy Lutomirski
AMA Capital Management, LLC
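For anyone experimenting with the combined interface floated in this exchange, here is a userspace sketch of the idea: assemble 64 bits from the fast per-long source and fall back to the slow source if the fast one fails. It is not the code from the thread. The callback types and the names get_u64_slow_ok, fast_ones, fast_fail, slow_42 and slow_fail are hypothetical, and two details are assumptions about the intent: the accumulated value x is what gets stored, and 1 is returned on success to match the other helpers.

```c
#include <limits.h>
#include <stdint.h>

#define BITS_PER_LONG ((int)(sizeof(unsigned long) * CHAR_BIT))

/* Hypothetical callback types: return 1 and fill the output on success. */
typedef int (*fast_fn)(unsigned long *l);
typedef int (*slow_fn)(uint64_t *v);

static int fast_ones(unsigned long *l) { *l = ~0UL; return 1; }
static int fast_fail(unsigned long *l) { (void)l; return 0; }
static int slow_42(uint64_t *v) { *v = 42; return 1; }
static int slow_fail(uint64_t *v) { (void)v; return 0; }

/* Prefer the fast source; use the slow one if any fast read fails. */
static int get_u64_slow_ok(uint64_t *v, fast_fn fast, slow_fn slow)
{
	uint64_t x = 0;
	int i;

	for (i = 0; i < 64 / BITS_PER_LONG; i++) {
		unsigned long l;

		if (!fast(&l))
			return slow(v);

		/* Widen before shifting so a 32-bit long is not shifted by 32. */
		x |= (uint64_t)l << (i * BITS_PER_LONG);
	}
	*v = x;
	return 1;
}
```

The cast to uint64_t matters on 32-bit builds, where shifting an unsigned long by 32 would be undefined behavior.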
Re: [PATCH v3 2/5] random,x86: Add arch_get_slow_rng_u64
On Wed, Jul 16, 2014 at 3:13 PM, Andy Lutomirski l...@amacapital.net wrote:
> My personal preference is to defer this until some user shows up. I think that even this would be too complicated for KASLR, which is the only extremely early-boot user that I found.
>
> Hmm. Does the prandom stuff want to use this?

prandom isn't even using rdrand. I'd suggest fixing this separately, or even just waiting until someone goes and deletes prandom.

--Andy
Re: [PATCH v3 2/5] random,x86: Add arch_get_slow_rng_u64
On Jul 16, 2014 4:00 PM, H. Peter Anvin h...@zytor.com wrote:
> On 07/16/2014 03:40 PM, Andy Lutomirski wrote:
>> On Wed, Jul 16, 2014 at 3:13 PM, Andy Lutomirski l...@amacapital.net wrote:
>>> My personal preference is to defer this until some user shows up. I think that even this would be too complicated for KASLR, which is the only extremely early-boot user that I found.
>>>
>>> Hmm. Does the prandom stuff want to use this?
>>
>> prandom isn't even using rdrand. I'd suggest fixing this separately, or even just waiting until someone goes and deletes prandom.
>
> prandom is exactly the opposite; it is designed for when we need possibly low quality random numbers very quickly. RDRAND is actually too slow.

I meant that prandom isn't using rdrand for early seeding.

--Andy

> -hpa
Re: [PATCH v3 2/5] random,x86: Add arch_get_slow_rng_u64
On Thu, Jul 17, 2014 at 9:39 AM, H. Peter Anvin h...@zytor.com wrote:
> On 07/17/2014 03:33 AM, Theodore Ts'o wrote:
>> On Wed, Jul 16, 2014 at 09:55:15PM -0700, H. Peter Anvin wrote:
>>> On 07/16/2014 05:03 PM, Andy Lutomirski wrote:
>>>> I meant that prandom isn't using rdrand for early seeding.
>>>
>>> We should probably fix that.
>>
>> It wouldn't hurt to explicitly use arch_get_random_long() in prandom, but it does use get_random_bytes() in early seed, and for CPUs with RDRAND present, we do use it in init_std_data() in drivers/char/random.c, so prandom is already getting initialized via an RNG (which is effectively a DRBG even if it doesn't pass all of NIST's rules) which is derived from RDRAND.
>
> I assumed he was referring to before alternatives. Not sure if we use prandom before that point, though.

Unless I'm reading the code wrong, the prandom_reseed_late call can happen after userspace is running.

Anyway, I'm working on a near-complete rewrite of the guest part of all of this.

--Andy

> -hpa

--
Andy Lutomirski
AMA Capital Management, LLC
Re: [PATCH v3 2/5] random,x86: Add arch_get_slow_rng_u64
On Thu, Jul 17, 2014 at 10:32 AM, Theodore Ts'o ty...@mit.edu wrote:
> On Thu, Jul 17, 2014 at 10:12:27AM -0700, Andy Lutomirski wrote:
>> Unless I'm reading the code wrong, the prandom_reseed_late call can happen after userspace is running.
>
> But there is also the prandom_reseed() call, which happens early.

Right -- I missed that.

> - Ted

--
Andy Lutomirski
AMA Capital Management, LLC
Re: [PATCH v3 1/5] x86,kvm: Add MSR_KVM_GET_RNG_SEED and a matching feature bit
On Thu, Jul 17, 2014 at 10:43 AM, Andrew Honig aho...@google.com wrote:
>> +	case MSR_KVM_GET_RNG_SEED:
>> +		get_random_bytes(&data, sizeof(data));
>> +		break;
>
> Should this be rate limited in the interest of conserving randomness? If there ever is an attack on the prng, this would create very favorable conditions for an attacker to exploit it.

IMO if the nonblocking pool has a weakness that requires us to conserve its output, then this is the least of our worries.

--Andy
[PATCH v4 3/5] x86,random: Add an x86 implementation of arch_get_rng_seed
This is closer to Intel's recommended logic for using RDRAND and RDSEED. It will attempt to seed the entire internal state of the RNG pool using RDSEED (with one bit of RDSEED output per bit of state). For any bits that can't be obtained using RDSEED (e.g. if RDSEED is unavailable), it calculates the number of RDRAND reseeds needed to obtain the missing bits from the internal NRBG and then requests enough bits from RDRAND to obtain the full output from at least that many reseeds.

Arguably, arch_get_random_seed could be removed now: I'm having some trouble imagining a sensible non-architecture-specific use of it that wouldn't be better served by arch_get_rng_seed.

Signed-off-by: Andy Lutomirski l...@amacapital.net
---
 arch/x86/include/asm/archrandom.h | 6 +++
 arch/x86/kernel/Makefile | 2 +
 arch/x86/kernel/archrandom.c | 79 +++
 3 files changed, 87 insertions(+)
 create mode 100644 arch/x86/kernel/archrandom.c

diff --git a/arch/x86/include/asm/archrandom.h b/arch/x86/include/asm/archrandom.h
index 69f1366..88f9c5a 100644
--- a/arch/x86/include/asm/archrandom.h
+++ b/arch/x86/include/asm/archrandom.h
@@ -117,6 +117,12 @@ GET_SEED(arch_get_random_seed_int, unsigned int, RDSEED_INT, ASM_NOP4);
 #define arch_has_random()	static_cpu_has(X86_FEATURE_RDRAND)
 #define arch_has_random_seed()	static_cpu_has(X86_FEATURE_RDSEED)
 
+#define __HAVE_ARCH_GET_RNG_SEED
+extern void arch_get_rng_seed(void *ctx,
+			      void (*seed)(void *ctx, u32 data),
+			      int bits_per_source,
+			      const char *log_prefix);
+
 #else
 
 static inline int rdrand_long(unsigned long *v)
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 047f9ff..0718bae 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -92,6 +92,8 @@ obj-$(CONFIG_PARAVIRT)		+= paravirt.o paravirt_patch_$(BITS).o
 obj-$(CONFIG_PARAVIRT_SPINLOCKS)+= paravirt-spinlocks.o
 obj-$(CONFIG_PARAVIRT_CLOCK)	+= pvclock.o
 
+obj-$(CONFIG_ARCH_RANDOM)	+= archrandom.o
+
 obj-$(CONFIG_PCSPKR_PLATFORM)	+= pcspeaker.o
 
 obj-$(CONFIG_X86_CHECK_BIOS_CORRUPTION) += check.o
diff --git a/arch/x86/kernel/archrandom.c b/arch/x86/kernel/archrandom.c
new file mode 100644
index 0000000..5515fc8
--- /dev/null
+++ b/arch/x86/kernel/archrandom.c
@@ -0,0 +1,79 @@
+/*
+ * This file is part of the Linux kernel.
+ *
+ * Copyright (c) 2014 Andy Lutomirski
+ * Authors: Andy Lutomirski l...@amacapital.net
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ */
+
+#include <asm/archrandom.h>
+
+void arch_get_rng_seed(void *ctx,
+		       void (*seed)(void *ctx, u32 data),
+		       int bits_per_source,
+		       const char *log_prefix)
+{
+	int i, longs = (bits_per_source + BITS_PER_LONG - 1) / BITS_PER_LONG;
+	int rdseed_bits = 0, rdrand_bits = 0;
+	int rdrand_longs_wanted = 0;
+	char buf[128] = "";
+	char *msgptr = buf;
+
+	for (i = 0; i < longs; i++) {
+		unsigned long rv;
+
+		if (arch_get_random_seed_long(&rv)) {
+			seed(ctx, (u32)rv);
+#if BITS_PER_LONG > 32
+			seed(ctx, (u32)(rv >> 32));
+#endif
+			rdseed_bits += 8 * sizeof(rv);
+		}
+	}
+	if (rdseed_bits)
+		msgptr += sprintf(msgptr, ", %d bits from RDSEED", rdseed_bits);
+
+	/*
+	 * According to the Intel DRNG Software Implementation Guide 2.0,
+	 * the RDRAND hardware is guaranteed to provide at least 128 bits
+	 * of non-deterministic entropy per 511*128 bits of RDRAND output.
+	 * Nonetheless, the guide suggests using a 512:1 reduction for
+	 * generating seeds.
+	 *
+	 * We use one extra reseed, because we might not own the first
+	 * or last few samples.
+	 *
+	 * We skip using RDRAND for any bits already provided by RDSEED,
+	 * as they use the same underlying entropy source.
+	 */
+	if (rdseed_bits < bits_per_source && arch_has_random()) {
+		int nrbg_bits = bits_per_source - rdseed_bits;
+		int reseeds = (nrbg_bits + 127) / 128 + 1;
+
+		rdrand_longs_wanted = reseeds * 512 * 128 / BITS_PER_LONG;
+	}
+	for (i = 0; i < rdrand_longs_wanted; i++) {
+		unsigned long rv;
+
+		if (arch_get_random_long(&rv)) {
+			seed(ctx, (u32)rv);
+#if BITS_PER_LONG > 32
+			seed(ctx, (u32
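The reseed arithmetic in the comment above is easy to check by hand. Here is a small userspace sketch that lifts just that calculation into a pure function; rdrand_longs_wanted as a function and its has_rdrand and bits_per_long parameters are hypothetical, introduced so the numbers can be tried for both 32-bit and 64-bit longs.

```c
/*
 * How many longs to request from RDRAND so that the output spans
 * enough reseeds to cover the bits RDSEED did not provide.
 */
static int rdrand_longs_wanted(int bits_per_source, int rdseed_bits,
			       int has_rdrand, int bits_per_long)
{
	int nrbg_bits, reseeds;

	if (rdseed_bits >= bits_per_source || !has_rdrand)
		return 0;

	nrbg_bits = bits_per_source - rdseed_bits;
	/* One reseed per 128 bits, rounded up, plus one spare. */
	reseeds = (nrbg_bits + 127) / 128 + 1;
	/* 512 samples of 128 bits per reseed, expressed in longs. */
	return reseeds * 512 * 128 / bits_per_long;
}
```

As a worked example, with 256 bits wanted, nothing from RDSEED and 64-bit longs: reseeds = (256 + 127) / 128 + 1 = 3, so the request is 3 * 512 * 128 / 64 = 3072 longs.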
[PATCH v4 1/5] x86,kvm: Add MSR_KVM_GET_RNG_SEED and a matching feature bit
This adds a simple interface to allow a guest to request 64 bits of
host nonblocking entropy.  This is independent of virtio-rng for a
couple of reasons:

 - It's intended to be usable during early boot, when a trivial
   synchronous interface is needed.

 - virtio-rng gives blocking entropy, and making guest boot wait for
   the host's /dev/random will cause problems.

MSR_KVM_GET_RNG_SEED is intended to provide 64 bits of best-effort
cryptographically secure data for use as a seed.  It provides no
guarantee that the result contains any actual entropy.

Signed-off-by: Andy Lutomirski <l...@amacapital.net>
---
 Documentation/virtual/kvm/cpuid.txt  | 3 +++
 arch/x86/include/uapi/asm/kvm_para.h | 2 ++
 arch/x86/kvm/cpuid.c                 | 3 ++-
 arch/x86/kvm/x86.c                   | 4 ++++
 4 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/Documentation/virtual/kvm/cpuid.txt b/Documentation/virtual/kvm/cpuid.txt
index 3c65feb..0ab043b 100644
--- a/Documentation/virtual/kvm/cpuid.txt
+++ b/Documentation/virtual/kvm/cpuid.txt
@@ -54,6 +54,9 @@ KVM_FEATURE_PV_UNHALT              ||     7 || guest checks this feature bit
                                    ||       || before enabling paravirtualized
                                    ||       || spinlock support.
 ------------------------------------------------------------------------------
+KVM_FEATURE_GET_RNG_SEED           ||     8 || host provides rng seed data via
+                                   ||       || MSR_KVM_GET_RNG_SEED.
+------------------------------------------------------------------------------
 KVM_FEATURE_CLOCKSOURCE_STABLE_BIT ||    24 || host will warn if no guest-side
                                    ||       || per-cpu warps are expected in
                                    ||       || kvmclock.
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 94dc8ca..e2eaf93 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -24,6 +24,7 @@
 #define KVM_FEATURE_STEAL_TIME		5
 #define KVM_FEATURE_PV_EOI		6
 #define KVM_FEATURE_PV_UNHALT		7
+#define KVM_FEATURE_GET_RNG_SEED	8

 /* The last 8 bits are used to indicate how to interpret the flags field
  * in pvclock structure. If no bits are set, all flags are ignored.
@@ -40,6 +41,7 @@
 #define MSR_KVM_ASYNC_PF_EN	0x4b564d02
 #define MSR_KVM_STEAL_TIME	0x4b564d03
 #define MSR_KVM_PV_EOI_EN	0x4b564d04
+#define MSR_KVM_GET_RNG_SEED	0x4b564d05

 struct kvm_steal_time {
 	__u64 steal;
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 38a0afe..40d6763 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -479,7 +479,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 			     (1 << KVM_FEATURE_ASYNC_PF) |
 			     (1 << KVM_FEATURE_PV_EOI) |
 			     (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) |
-			     (1 << KVM_FEATURE_PV_UNHALT);
+			     (1 << KVM_FEATURE_PV_UNHALT) |
+			     (1 << KVM_FEATURE_GET_RNG_SEED);

 		if (sched_info_on())
 			entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f644933..4e81853 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -48,6 +48,7 @@
 #include <linux/pci.h>
 #include <linux/timekeeper_internal.h>
 #include <linux/pvclock_gtod.h>
+#include <linux/random.h>
 #include <trace/events/kvm.h>

 #define CREATE_TRACE_POINTS
@@ -2480,6 +2481,9 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
 	case MSR_KVM_PV_EOI_EN:
 		data = vcpu->arch.pv_eoi.msr_val;
 		break;
+	case MSR_KVM_GET_RNG_SEED:
+		get_random_bytes(&data, sizeof(data));
+		break;
 	case MSR_IA32_P5_MC_ADDR:
 	case MSR_IA32_P5_MC_TYPE:
 	case MSR_IA32_MCG_CAP:
-- 
1.9.3
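For readers who want to see the feature-bit handshake in isolation, here is a minimal userspace sketch. The two helper names (host_advertise_rng_seed, guest_has_feature) are made up for illustration and are not kernel functions; only the numeric constants come from the patch above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Constants copied from the patch above. */
#define KVM_FEATURE_PV_UNHALT		7
#define KVM_FEATURE_GET_RNG_SEED	8
#define MSR_KVM_GET_RNG_SEED		0x4b564d05u

/* Hypothetical helper: the host ORs the new bit into the feature word (EAX). */
static uint32_t host_advertise_rng_seed(uint32_t features_eax)
{
	return features_eax | (UINT32_C(1) << KVM_FEATURE_GET_RNG_SEED);
}

/* Hypothetical helper: the guest tests a single bit of the feature word. */
static bool guest_has_feature(uint32_t features_eax, unsigned int bit)
{
	assert(bit < 32);
	return (features_eax >> bit) & 1u;
}
```

A guest would only read MSR_KVM_GET_RNG_SEED after guest_has_feature() (or the real kvm_para_has_feature()) reports bit 8 as set.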
[PATCH v4 5/5] x86,kaslr: Use MSR_KVM_GET_RNG_SEED for KASLR if available
It's considerably better than any of the alternatives on KVM.

Rather than reinventing all of the cpu feature query code, this fixes
native_cpuid to work in PIC objects.

I haven't combined it with boot/cpuflags.c's cpuid implementation:
including asm/processor.h from boot/cpuflags.c results in a flood of
unrelated errors, and fixing it might be messy.

Signed-off-by: Andy Lutomirski <l...@amacapital.net>
---
 arch/x86/boot/compressed/aslr.c  | 27 +++++++++++++++++++++++++++
 arch/x86/include/asm/processor.h | 21 ++++++++++++++++++---
 2 files changed, 45 insertions(+), 3 deletions(-)

diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
index fc6091a..8583f0e 100644
--- a/arch/x86/boot/compressed/aslr.c
+++ b/arch/x86/boot/compressed/aslr.c
@@ -5,6 +5,8 @@
 #include <asm/archrandom.h>
 #include <asm/e820.h>

+#include <uapi/asm/kvm_para.h>
+
 #include <generated/compile.h>
 #include <linux/module.h>
 #include <linux/uts.h>
@@ -15,6 +17,22 @@
 static const char build_str[] = UTS_RELEASE " (" LINUX_COMPILE_BY "@"
 		LINUX_COMPILE_HOST ") (" LINUX_COMPILER ") " UTS_VERSION;

+static bool kvm_para_has_feature(unsigned int feature)
+{
+	u32 kvm_base;
+	u32 features;
+
+	if (!has_cpuflag(X86_FEATURE_HYPERVISOR))
+		return false;
+
+	kvm_base = hypervisor_cpuid_base("KVMKVMKVM\0\0\0", KVM_CPUID_FEATURES);
+	if (!kvm_base)
+		return false;
+
+	features = cpuid_eax(kvm_base | KVM_CPUID_FEATURES);
+	return features & (1UL << feature);
+}
+
 #define I8254_PORT_CONTROL	0x43
 #define I8254_PORT_COUNTER0	0x40
 #define I8254_CMD_READBACK	0xC0
@@ -81,6 +99,15 @@ static unsigned long get_random_long(void)
 		}
 	}

+	if (kvm_para_has_feature(KVM_FEATURE_GET_RNG_SEED)) {
+		u64 seed;
+
+		debug_putstr(" MSR_KVM_GET_RNG_SEED");
+		rdmsrl(MSR_KVM_GET_RNG_SEED, seed);
+		random ^= (unsigned long)seed;
+		use_i8254 = false;
+	}
+
 	if (has_cpuflag(X86_FEATURE_TSC)) {
 		debug_putstr(" RDTSC");
 		rdtscll(raw);
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index a4ea023..6096f3c 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -189,10 +189,25 @@ static inline int have_cpuid_p(void)
 static inline void native_cpuid(unsigned int *eax, unsigned int *ebx,
 				unsigned int *ecx, unsigned int *edx)
 {
-	/* ecx is often an input as well as an output. */
-	asm volatile("cpuid"
+	/*
+	 * This function can be used from the boot code, so it needs
+	 * to avoid using EBX in constraints in PIC mode.
+	 *
+	 * ecx is often an input as well as an output.
+	 */
+	asm volatile(".ifnc %%ebx,%1 ; .ifnc %%rbx,%1 \n\t"
+		     "movl %%ebx,%1\n\t"
+		     ".endif ; .endif \n\t"
+		     "cpuid \n\t"
+		     ".ifnc %%ebx,%1 ; .ifnc %%rbx,%1 \n\t"
+		     "xchgl %%ebx,%1\n\t"
+		     ".endif ; .endif"
 	    : "=a" (*eax),
-	      "=b" (*ebx),
+#if defined(__i386__) && defined(__PIC__)
+	      "=r" (*ebx),	/* gcc won't let us use ebx */
+#else
+	      "=b" (*ebx),	/* ebx is okay */
+#endif
 	      "=c" (*ecx),
 	      "=d" (*edx)
 	    : "0" (*eax), "2" (*ecx)
-- 
1.9.3
[PATCH v4 2/5] random: Add and use arch_get_rng_seed
Currently, init_std_data contains its own logic for using arch random
sources.  This logic is a bit strange: it reads one long of arch random
data per byte of internal state.

This replaces that logic with a generic function arch_get_rng_seed that
allows arch code to supply its own logic.  The default implementation
tries arch_get_random_seed_long and arch_get_random_long individually,
requesting one bit per bit of internal state being seeded.

Assuming the arch sources are perfect, this is the right thing to do.
They're not, though, so the followup patch attempts to implement the
correct logic on x86.

Signed-off-by: Andy Lutomirski <l...@amacapital.net>
---
 drivers/char/random.c  | 14 +++++++++++---
 include/linux/random.h | 40 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 51 insertions(+), 3 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 0a7ac0a..be7a94e 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1236,6 +1236,10 @@ void get_random_bytes_arch(void *buf, int nbytes)
 }
 EXPORT_SYMBOL(get_random_bytes_arch);

+static void seed_entropy_store(void *ctx, u32 data)
+{
+	mix_pool_bytes((struct entropy_store *)ctx, &data, sizeof(data), NULL);
+}

 /*
  * init_std_data - initialize pool with system data
@@ -1251,15 +1255,19 @@ static void init_std_data(struct entropy_store *r)
 	int i;
 	ktime_t now = ktime_get_real();
 	unsigned long rv;
+	char log_prefix[128];

 	r->last_pulled = jiffies;
 	mix_pool_bytes(r, &now, sizeof(now), NULL);
 	for (i = r->poolinfo->poolbytes; i > 0; i -= sizeof(rv)) {
-		if (!arch_get_random_seed_long(&rv) &&
-		    !arch_get_random_long(&rv))
-			rv = random_get_entropy();
+		rv = random_get_entropy();
 		mix_pool_bytes(r, &rv, sizeof(rv), NULL);
 	}
+
+	sprintf(log_prefix, "random: seeded %s pool", r->name);
+	arch_get_rng_seed(r, seed_entropy_store, 8 * r->poolinfo->poolbytes,
+			  log_prefix);
+
 	mix_pool_bytes(r, utsname(), sizeof(*(utsname())), NULL);
 }

diff --git a/include/linux/random.h b/include/linux/random.h
index 57fbbff..a17065e 100644
--- a/include/linux/random.h
+++ b/include/linux/random.h
@@ -106,6 +106,46 @@ static inline int arch_has_random_seed(void)
 }
 #endif

+#ifndef __HAVE_ARCH_GET_RNG_SEED
+
+/**
+ * arch_get_rng_seed() - get architectural rng seed data
+ * @ctx: context for the seed function
+ * @seed: function to call for each u32 obtained
+ * @bits_per_source: number of bits from each source to try to use
+ * @log_prefix: beginning of log output (may be NULL)
+ *
+ * Synchronously load some architectural entropy or other best-effort
+ * random seed data.  An arch-specific implementation should be no worse
+ * than this generic implementation.  If the arch code does something
+ * interesting, it may log something of the form "log_prefix with
+ * 8 bits of stuff".
+ *
+ * No arch-specific implementation should be any worse than the generic
+ * implementation.
+ */
+static inline void arch_get_rng_seed(void *ctx,
+				     void (*seed)(void *ctx, u32 data),
+				     int bits_per_source,
+				     const char *log_prefix)
+{
+	int i, longs = (bits_per_source + BITS_PER_LONG - 1) / BITS_PER_LONG;
+
+	for (i = 0; i < longs; i++) {
+		unsigned long rv;
+
+		if (arch_get_random_seed_long(&rv) ||
+		    arch_get_random_long(&rv)) {
+			seed(ctx, (u32)rv);
+#if BITS_PER_LONG > 32
+			seed(ctx, (u32)(rv >> 32));
+#endif
+		}
+	}
+}
+
+#endif /* __HAVE_ARCH_GET_RNG_SEED */
+
 /* Pseudo random number generator from numerical recipes. */
 static inline u32 next_pseudo_random32(u32 seed)
 {
-- 
1.9.3
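The callback shape of the generic arch_get_rng_seed can be tried outside the kernel. The following is only a rough userspace model, not the kernel code: it uses a fixed 64-bit word instead of unsigned long, and the source callback plus the three demo helpers (counter_source, failing_source, count_seed) are invented stand-ins for RDSEED/RDRAND and for mix_pool_bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*seed_fn)(void *ctx, uint32_t data);
typedef bool (*word_source_fn)(uint64_t *out);	/* stand-in for RDSEED/RDRAND */

/* Model of the generic loop: one 64-bit word per 64 requested bits. */
static void sketch_get_rng_seed(void *ctx, seed_fn seed,
				int bits_per_source, word_source_fn source)
{
	int i, words = (bits_per_source + 63) / 64;

	assert(seed != NULL && source != NULL);
	for (i = 0; i < words; i++) {
		uint64_t rv;

		if (source(&rv)) {
			/* Each word is handed over as two u32 halves. */
			seed(ctx, (uint32_t)rv);
			seed(ctx, (uint32_t)(rv >> 32));
		}
	}
}

/* Demo source that always succeeds (values are NOT random). */
static bool counter_source(uint64_t *out)
{
	static uint64_t n;

	*out = ++n;
	return true;
}

/* Demo source that always fails, like a CPU without RDSEED/RDRAND. */
static bool failing_source(uint64_t *out)
{
	(void)out;
	return false;
}

/* Demo seed callback: counts how many u32 values were delivered. */
static void count_seed(void *ctx, uint32_t data)
{
	(void)data;
	++*(size_t *)ctx;
}
```

With a 4096-bit request and a source that always succeeds, the callback runs 128 times; with a failing source it is never called, which matches the "don't mix anything if nothing was obtained" behaviour of the patch.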
[PATCH v4 0/5] random,x86,kvm: Rework arch RNG seeds and get some from kvm
This introduces and uses a very simple synchronous mechanism to get
/dev/urandom-style bits appropriate for initial KVM PV guest RNG
seeding.

It also re-works the way that architectural random data is fed into
random.c's pools.  I added a new arch hook called arch_get_rng_seed.
The default implementation uses arch_get_random_seed_long and
arch_get_random_long, but not quite the same way as before.

x86 gets a custom arch_get_rng_seed, which is significantly enhanced
over the generic implementation.  It uses RDSEED less aggressively (the
old implementation requested 4x or 8x as many bits as would fit in the
pool, depending on kernel bitness), but, if using RDRAND, it requests
enough bits to comply with Intel's recommendations.

x86's arch_get_rng_seed will also use KVM_GET_RNG_SEED if available.
If more paravirt seed sources show up, it will be a natural place
to add them.

I sent the corresponding kvm-unit-tests and qemu changes separately.

Changes from v3:
 - Other than KASLR, the guest pieces are completely rewritten.
   Patches 2-4 have essentially nothing in common with v2.

Changes from v2:
 - Bisection fix (patch 2 had a misplaced brace).  The final state is
   identical to that of v2.
 - Improve the 0/5 description a little bit.

Changes from v1:
 - Split patches 2 and 3
 - Log all arch sources in init_std_data
 - Fix the 32-bit kaslr build

Andy Lutomirski (5):
  x86,kvm: Add MSR_KVM_GET_RNG_SEED and a matching feature bit
  random: Add and use arch_get_rng_seed
  x86,random: Add an x86 implementation of arch_get_rng_seed
  x86,random,kvm: Use KVM_GET_RNG_SEED in arch_get_rng_seed
  x86,kaslr: Use MSR_KVM_GET_RNG_SEED for KASLR if available

 Documentation/virtual/kvm/cpuid.txt  |  3 ++
 arch/x86/Kconfig                     |  4 ++
 arch/x86/boot/compressed/aslr.c      | 27 ++
 arch/x86/include/asm/archrandom.h    |  6 +++
 arch/x86/include/asm/kvm_guest.h     |  9
 arch/x86/include/asm/processor.h     | 21 ++--
 arch/x86/include/uapi/asm/kvm_para.h |  2 +
 arch/x86/kernel/Makefile             |  2 +
 arch/x86/kernel/archrandom.c         | 99
 arch/x86/kernel/kvm.c                | 10
 arch/x86/kvm/cpuid.c                 |  3 +-
 arch/x86/kvm/x86.c                   |  4 ++
 drivers/char/random.c                | 14 +++--
 include/linux/random.h               | 40 +++
 14 files changed, 237 insertions(+), 7 deletions(-)
 create mode 100644 arch/x86/kernel/archrandom.c

-- 
1.9.3
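The RDRAND sizing that v4 used ("enough bits to comply with Intel's recommendations") can be worked through with a small standalone function. This is a sketch that copies the arithmetic from the v4 3/5 fragment quoted earlier in this archive; the function name and the explicit bits_per_long parameter are illustrative assumptions, and arch_has_random() is assumed to be true. As I read it, the idea is to request 512 128-bit samples per needed reseed, plus one extra interval.

```c
#include <assert.h>

/*
 * Sketch of the v4 sizing arithmetic (not the kernel function itself).
 * bits_per_long is passed in so both 32- and 64-bit cases can be tried.
 */
static int rdrand_longs_wanted(int bits_per_source, int rdseed_bits,
			       int bits_per_long)
{
	int nrbg_bits, reseeds;

	assert(bits_per_long == 32 || bits_per_long == 64);
	if (rdseed_bits >= bits_per_source)
		return 0;	/* RDSEED already covered the request */

	nrbg_bits = bits_per_source - rdseed_bits;
	reseeds = (nrbg_bits + 127) / 128 + 1;
	return reseeds * 512 * 128 / bits_per_long;
}
```

For a 4096-bit pool with no RDSEED data on a 64-bit kernel this gives 33 reseed intervals and 33792 RDRAND reads, which is the cost that the later discussion in this thread weighs against the simpler one-bit-per-bit approach.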
[PATCH v4 4/5] x86,random,kvm: Use KVM_GET_RNG_SEED in arch_get_rng_seed
This is a straightforward implementation: for each bit of internal RNG
state, request one bit from KVM_GET_RNG_SEED.  This is done even if
RDSEED/RDRAND worked, since KVM_GET_RNG_SEED is likely to provide
cryptographically secure output even if the CPU's RNG is weak or
compromised.

Signed-off-by: Andy Lutomirski <l...@amacapital.net>
---
 arch/x86/Kconfig                 |  4 ++++
 arch/x86/include/asm/kvm_guest.h |  9 +++++++++
 arch/x86/kernel/archrandom.c     | 22 +++++++++++++++++++++-
 arch/x86/kernel/kvm.c            | 10 ++++++++++
 4 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index a8f749e..adfa09c 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -593,6 +593,7 @@ config KVM_GUEST
 	bool "KVM Guest support (including kvmclock)"
 	depends on PARAVIRT
 	select PARAVIRT_CLOCK
+	select ARCH_RANDOM
 	default y
 	---help---
 	  This option enables various optimizations for running under the KVM
@@ -1507,6 +1508,9 @@ config ARCH_RANDOM
 	  If supported, this is a high bandwidth, cryptographically
 	  secure hardware random number generator.

+	  This also enables paravirt RNGs such as KVM's if the relevant
+	  PV guest support is enabled.
+
 config X86_SMAP
 	def_bool y
 	prompt "Supervisor Mode Access Prevention" if EXPERT
diff --git a/arch/x86/include/asm/kvm_guest.h b/arch/x86/include/asm/kvm_guest.h
index a92b176..8c4dbd5 100644
--- a/arch/x86/include/asm/kvm_guest.h
+++ b/arch/x86/include/asm/kvm_guest.h
@@ -3,4 +3,13 @@

 int kvm_setup_vsyscall_timeinfo(void);

+#if defined(CONFIG_KVM_GUEST) && defined(CONFIG_ARCH_RANDOM)
+extern bool kvm_get_rng_seed(u64 *rv);
+#else
+static inline bool kvm_get_rng_seed(u64 *rv)
+{
+	return false;
+}
+#endif
+
 #endif /* _ASM_X86_KVM_GUEST_H */
diff --git a/arch/x86/kernel/archrandom.c b/arch/x86/kernel/archrandom.c
index 5515fc8..3bcfa58 100644
--- a/arch/x86/kernel/archrandom.c
+++ b/arch/x86/kernel/archrandom.c
@@ -15,6 +15,7 @@
  */

 #include <asm/archrandom.h>
+#include <asm/kvm_guest.h>

 void arch_get_rng_seed(void *ctx,
 		       void (*seed)(void *ctx, u32 data),
@@ -22,7 +23,7 @@ void arch_get_rng_seed(void *ctx,
 		       const char *log_prefix)
 {
 	int i, longs = (bits_per_source + BITS_PER_LONG - 1) / BITS_PER_LONG;
-	int rdseed_bits = 0, rdrand_bits = 0;
+	int rdseed_bits = 0, rdrand_bits = 0, kvm_bits = 0;
 	int rdrand_longs_wanted = 0;
 	char buf[128] = "";
 	char *msgptr = buf;
@@ -74,6 +75,25 @@ void arch_get_rng_seed(void *ctx,
 	if (rdrand_bits)
 		msgptr += sprintf(msgptr, ", %d bits from RDRAND", rdrand_bits);

+	/*
+	 * Use KVM_GET_RNG_SEED regardless of whether the CPU RNG worked,
+	 * since it incorporates entropy unavailable to the CPU.  We
+	 * request enough bits for the entire internal RNG state, because
+	 * there's no good reason not to.
+	 */
+	for (i = 0; i < (bits_per_source + 63) / 64; i++) {
+		u64 rv;
+
+		if (kvm_get_rng_seed(&rv)) {
+			seed(ctx, (u32)rv);
+			seed(ctx, (u32)(rv >> 32));
+			kvm_bits += 8 * sizeof(rv);
+		}
+	}
+	if (kvm_bits)
+		msgptr += sprintf(msgptr, ", %d bits from KVM_GET_RNG_BITS",
+				  kvm_bits);
+
 	if (buf[0])
 		pr_info("%s with %s\n", log_prefix, buf + 2);
 }
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 3dd8e2c..bd8783a 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -416,6 +416,16 @@ void kvm_disable_steal_time(void)
 	wrmsr(MSR_KVM_STEAL_TIME, 0, 0);
 }

+bool kvm_get_rng_seed(u64 *v)
+{
+	/*
+	 * Allow migration from a hypervisor with the GET_RNG_SEED
+	 * feature to a hypervisor without it.
+	 */
+	return (kvm_para_has_feature(KVM_FEATURE_GET_RNG_SEED) &&
+		rdmsrl_safe(MSR_KVM_GET_RNG_SEED, v) == 0);
+}
+
 #ifdef CONFIG_SMP
 static void __init kvm_smp_prepare_boot_cpu(void)
 {
-- 
1.9.3
Re: [PATCH v3 2/5] random,x86: Add arch_get_slow_rng_u64
On Thu, Jul 17, 2014 at 11:42 AM, Hannes Frederic Sowa
<han...@stressinduktion.org> wrote:
> On Thu, Jul 17, 2014, at 19:34, Andy Lutomirski wrote:
>> On Thu, Jul 17, 2014 at 10:32 AM, Theodore Ts'o <ty...@mit.edu> wrote:
>>> On Thu, Jul 17, 2014 at 10:12:27AM -0700, Andy Lutomirski wrote:
>>>> Unless I'm reading the code wrong, the prandom_reseed_late call can
>>>> happen after userspace is running.
>>>
>>> But there is also the prandom_reseed() call, which happens early.
>>
>> Right -- I missed that.
>
> prandom_init is a core_initcall, prandom_reseed is a late_initcall.
> During initialization of the network stack we have calls to prandom_u32
> before the late_initcall happens. That said, I think it is not that
> important to seed prandom with rdseed/rdrand as security relevant
> entropy extraction should always use get_random_bytes(), but we should
> do it nonetheless.

Regardless, I don't want to do this as part of this patch series.  One
thing at a time...

--Andy
Re: [PATCH v4 2/5] random: Add and use arch_get_rng_seed
On Tue, Jul 22, 2014 at 6:59 AM, Theodore Ts'o <ty...@mit.edu> wrote:
> On Thu, Jul 17, 2014 at 11:22:17AM -0700, Andy Lutomirski wrote:
>> Currently, init_std_data contains its own logic for using arch
>> random sources.  This logic is a bit strange: it reads one long of
>> arch random data per byte of internal state.
>
> This isn't true.  Check out the init_std_data() a bit more closely.
>
>         unsigned long rv;
>         ...
>         for (i = r->poolinfo->poolbytes; i > 0; i -= sizeof(rv)) {
>         ...
>
> In particular, note the i -= sizeof(rv).  We are reading one bit per
> bit of internal state being seeded.

Whoops, my bad.

>> Assuming the arch sources are perfect, this is the right thing to
>> do.  They're not, though, so the followup patch attempts to
>> implement the correct logic on x86.
>
> ... and that's not a problem because we aren't giving any entropy
> credit --- and this is deliberate, because we don't want to trust
> un-auditable hardware.  We are deliberately trying to be conservative
> here.

True.

But, if Intel's hardware does, in fact, work as documented, then the
current code will collect very little entropy on RDSEED-less hardware.
I see no great reason that we should do something weaker than following
Intel's explicit recommendation for how to seed a PRNG from RDRAND.

> So I don't think either this patch or the next one is needed.  It
> adds far more complexity than is warranted.

The real reason I did this is because I didn't want to pollute the
kernel with yet more arch_get_random_xyz functions.  In the previous
iteration of this patchset, init_std_data had to deal with no less than
three arch random sources.  If Xen adds something (which, IMO, they
should), then either it'll be up to four, or one of them will have to
multiplex.

Another benefit of this split is that it will potentially allow
arch_get_rng_seed to be made to work before alternatives are run.
There's no fundamental reason that it couldn't work *extremely* early
in boot.  (The KASLR code is an example of how this might work.)  On the
other hand, making arch_get_random_long work very early in boot would
either slow down all the other callers or add a considerable amount of
extra complexity.

So I think that this patch is a slight improvement in RNG
initialization and will actually result in simpler code.  (And yes, if
I submit a new version of it, I'll fix the changelog.)

--Andy
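Ted's point about the loop stride can be checked with a few lines of arithmetic. The sketch below mirrors the shape of the init_std_data loop in userspace; the function name is made up, and the 512-byte pool size in the usage note is only an example value, not something stated in this thread.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Mirror of the loop "for (i = poolbytes; i > 0; i -= sizeof(rv))":
 * each iteration consumes one word of word_bytes bytes, so the total
 * number of bits read equals the pool size in bits.
 */
static size_t init_loop_bits_read(int poolbytes, size_t word_bytes)
{
	size_t bits = 0;
	int i;

	assert(word_bytes > 0);
	for (i = poolbytes; i > 0; i -= (int)word_bytes)
		bits += 8 * word_bytes;
	return bits;
}
```

For an example pool of 512 bytes this yields 4096 bits with either 4-byte or 8-byte words, i.e. one bit read per bit of state rather than one long per byte.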
Re: [PATCH v4 2/5] random: Add and use arch_get_rng_seed
On Tue, Jul 22, 2014 at 1:57 PM, H. Peter Anvin <h...@zytor.com> wrote:
> On 07/22/2014 01:44 PM, Andy Lutomirski wrote:
>> But, if Intel's hardware does, in fact, work as documented, then
>> the current code will collect very little entropy on RDSEED-less
>> hardware.  I see no great reason that we should do something weaker
>> than following Intel's explicit recommendation for how to seed a
>> PRNG from RDRAND.
>
> Very little entropy in the architectural worst case.  However, since
> we are running single-threaded at this point, actual hardware performs
> orders of magnitude better.  Since we run the mixing function (for no
> particularly good reason -- it is a linear function and doesn't add
> security) there will be enough delay that RDRAND will in practice
> catch up and the output will be quite high quality.  Since the pool is
> quite large, the likely outcome is that there will be enough
> randomness that in practice we would probably be okay if *no* further
> entropy was ever collected.

Just to check: do you mean the RDRAND is very likely to work (i.e.
arch_get_random_long will return true) or that RDRAND will actually
reseed several times during initialization?

I have no RDRAND-capable hardware, so I can't benchmark it, but I
imagine that we're talking about adding 1-2 ms per boot to ensure that
the pool is filled to capacity with *NRBG* data according to the
architectural specification.

Anyway, the current code is IMO very much encoding some form of
knowledge of how arch_get_random_* work into init_std_data, and I don't
think that's the place for it.

>> Another benefit of this split is that it will potentially allow
>> arch_get_rng_seed to be made to work before alternatives are run.
>> There's no fundamental reason that it couldn't work *extremely*
>> early in boot.  (The KASLR code is an example of how this might
>> work.)  On the other hand, making arch_get_random_long work very
>> early in boot would either slow down all the other callers or add a
>> considerable amount of extra complexity.
>>
>> So I think that this patch is a slight improvement in RNG
>> initialization and will actually result in simpler code.  (And yes,
>> if I submit a new version of it, I'll fix the changelog.)
>
> There really isn't any significant reason why we could not permit
> randomness initialization very early in the boot, indeed.  It has
> largely been useless in the past because until the I/O system gets
> initialized there is no randomness of any kind available on
> traditional hardware.

To me, the question is whether this is a sufficient reason to add
arch_get_rng_data.  If it is, then great.  If not, then I'd like to
know what other way of doing this would be acceptable.  You disliked
arch_get_slow_rng_u64 or whatever I called it, and I agree -- I think
it sucked.

--Andy
Re: [PATCH v4 2/5] random: Add and use arch_get_rng_seed
On Tue, Jul 22, 2014 at 2:08 PM, H. Peter Anvin <h...@zytor.com> wrote:
> On 07/22/2014 02:04 PM, Andy Lutomirski wrote:
>> Just to check: do you mean the RDRAND is very likely to work (i.e.
>> arch_get_random_long will return true) or that RDRAND will actually
>> reseed several times during initialization?
>
> I mean that RDRAND will actually reseed several times during
> initialization.  The documented architectural limit is actually
> extremely conservative.
>
> Either way, it isn't really different from seeding from a VM host's
> /dev/urandom...

Sure it is.  The VM host's /dev/urandom makes no guarantee (or AFAIK
even any particular effort) to reseed such that the output has some
minimum entropy per bit, so there would be no point to reading extra
data from it.

Anyway, I'd be willing to drop the conservative RDRAND logic, but I
*still* think that arch_get_rng_seed is a much better interface than
arch_get_slow_rng_u64.

--Andy
[PATCH v5 0/5] random,x86,kvm: Rework arch RNG seeds and get some from kvm
This introduces and uses a very simple synchronous mechanism to get
/dev/urandom-style bits appropriate for initial KVM PV guest RNG
seeding.

It also re-works the way that architectural random data is fed into
random.c's pools.  I added a new arch hook called arch_get_rng_seed.
The default implementation is more or less the same as the current
code, except that random_get_entropy is now called unconditionally.

x86 gets a custom arch_get_rng_seed.  It will use KVM_GET_RNG_SEED if
available, and, if it does anything, it will log the number of bits
collected from each available architectural source.  If more paravirt
seed sources show up, it will be a natural place to add them.

I sent the corresponding kvm-unit-tests and qemu changes separately.

Changes from v4:
 - Got rid of the RDRAND behavior change.  If this series is accepted,
   I may resend it separately, but I think it's an unrelated issue.
 - Fix up the changelog entries -- I misunderstood how the old code
   worked.
 - Avoid lots of failed attempts to use KVM_GET_RNG_SEED if it's not
   available.

Changes from v3:
 - Other than KASLR, the guest pieces are completely rewritten.
   Patches 2-4 have essentially nothing in common with v2.

Changes from v2:
 - Bisection fix (patch 2 had a misplaced brace).  The final state is
   identical to that of v2.
 - Improve the 0/5 description a little bit.

Changes from v1:
 - Split patches 2 and 3
 - Log all arch sources in init_std_data
 - Fix the 32-bit kaslr build

Andy Lutomirski (5):
  x86,kvm: Add MSR_KVM_GET_RNG_SEED and a matching feature bit
  random: Add and use arch_get_rng_seed
  x86,random: Add an x86 implementation of arch_get_rng_seed
  x86,random,kvm: Use KVM_GET_RNG_SEED in arch_get_rng_seed
  x86,kaslr: Use MSR_KVM_GET_RNG_SEED for KASLR if available

 Documentation/virtual/kvm/cpuid.txt  |  3 ++
 arch/x86/Kconfig                     |  4 ++
 arch/x86/boot/compressed/aslr.c      | 27 +
 arch/x86/include/asm/archrandom.h    |  6 +++
 arch/x86/include/asm/kvm_guest.h     |  9 +
 arch/x86/include/asm/processor.h     | 21 --
 arch/x86/include/uapi/asm/kvm_para.h |  2 +
 arch/x86/kernel/Makefile             |  2 +
 arch/x86/kernel/archrandom.c         | 74
 arch/x86/kernel/kvm.c                | 10 +
 arch/x86/kvm/cpuid.c                 |  3 +-
 arch/x86/kvm/x86.c                   |  4 ++
 drivers/char/random.c                | 14 +--
 include/linux/random.h               | 40 +++
 14 files changed, 212 insertions(+), 7 deletions(-)
 create mode 100644 arch/x86/kernel/archrandom.c

-- 
1.9.3
[PATCH v5 2/5] random: Add and use arch_get_rng_seed
Currently, init_std_data contains its own logic for using arch random
sources.  This replaces that logic with a generic function
arch_get_rng_seed that allows arch code to supply its own logic.  The
default implementation tries arch_get_random_seed_long and
arch_get_random_long individually.

The only functional change here is that random_get_entropy() is used
unconditionally instead of being used only when the arch sources fail.
This may add a tiny amount of security.

Signed-off-by: Andy Lutomirski <l...@amacapital.net>
---
 drivers/char/random.c  | 14 +++++++++++---
 include/linux/random.h | 40 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 51 insertions(+), 3 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 0a7ac0a..be7a94e 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1236,6 +1236,10 @@ void get_random_bytes_arch(void *buf, int nbytes)
 }
 EXPORT_SYMBOL(get_random_bytes_arch);

+static void seed_entropy_store(void *ctx, u32 data)
+{
+	mix_pool_bytes((struct entropy_store *)ctx, &data, sizeof(data), NULL);
+}

 /*
  * init_std_data - initialize pool with system data
@@ -1251,15 +1255,19 @@ static void init_std_data(struct entropy_store *r)
 	int i;
 	ktime_t now = ktime_get_real();
 	unsigned long rv;
+	char log_prefix[128];

 	r->last_pulled = jiffies;
 	mix_pool_bytes(r, &now, sizeof(now), NULL);
 	for (i = r->poolinfo->poolbytes; i > 0; i -= sizeof(rv)) {
-		if (!arch_get_random_seed_long(&rv) &&
-		    !arch_get_random_long(&rv))
-			rv = random_get_entropy();
+		rv = random_get_entropy();
 		mix_pool_bytes(r, &rv, sizeof(rv), NULL);
 	}
+
+	sprintf(log_prefix, "random: seeded %s pool", r->name);
+	arch_get_rng_seed(r, seed_entropy_store, 8 * r->poolinfo->poolbytes,
+			  log_prefix);
+
 	mix_pool_bytes(r, utsname(), sizeof(*(utsname())), NULL);
 }

diff --git a/include/linux/random.h b/include/linux/random.h
index 57fbbff..81a6145 100644
--- a/include/linux/random.h
+++ b/include/linux/random.h
@@ -106,6 +106,46 @@ static inline int arch_has_random_seed(void)
 }
 #endif

+#ifndef __HAVE_ARCH_GET_RNG_SEED
+
+/**
+ * arch_get_rng_seed() - get architectural rng seed data
+ * @ctx: context for the seed function
+ * @seed: function to call for each u32 obtained
+ * @bits_per_source: number of bits from each source to try to use
+ * @log_prefix: beginning of log output (may be NULL)
+ *
+ * Synchronously load some architectural entropy or other best-effort
+ * random seed data.  An arch-specific implementation should be no worse
+ * than this generic implementation.  If the arch code does something
+ * interesting, it may log something of the form "log_prefix with
+ * 8 bits of stuff".
+ *
+ * No arch-specific implementation should be any worse than the generic
+ * implementation.
+ */
+static inline void arch_get_rng_seed(void *ctx,
+				     void (*seed)(void *ctx, u32 data),
+				     int bits_per_source,
+				     const char *log_prefix)
+{
+	int i;
+
+	for (i = 0; i < bits_per_source; i += 8 * sizeof(long)) {
+		unsigned long rv;
+
+		if (arch_get_random_seed_long(&rv) ||
+		    arch_get_random_long(&rv)) {
+			seed(ctx, (u32)rv);
+#if BITS_PER_LONG > 32
+			seed(ctx, (u32)(rv >> 32));
+#endif
+		}
+	}
+}
+
+#endif /* __HAVE_ARCH_GET_RNG_SEED */
+
 /* Pseudo random number generator from numerical recipes. */
 static inline u32 next_pseudo_random32(u32 seed)
 {
-- 
1.9.3
[PATCH v5 5/5] x86,kaslr: Use MSR_KVM_GET_RNG_SEED for KASLR if available
It's considerably better than any of the alternatives on KVM.

Rather than reinventing all of the cpu feature query code, this fixes
native_cpuid to work in PIC objects.

I haven't combined it with boot/cpuflags.c's cpuid implementation:
including asm/processor.h from boot/cpuflags.c results in a flood of
unrelated errors, and fixing it might be messy.

Reviewed-by: Kees Cook <keesc...@chromium.org>
Signed-off-by: Andy Lutomirski <l...@amacapital.net>
---
 arch/x86/boot/compressed/aslr.c  | 27 +++++++++++++++++++++++++++
 arch/x86/include/asm/processor.h | 21 ++++++++++++++++++---
 2 files changed, 45 insertions(+), 3 deletions(-)

diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
index fc6091a..8583f0e 100644
--- a/arch/x86/boot/compressed/aslr.c
+++ b/arch/x86/boot/compressed/aslr.c
@@ -5,6 +5,8 @@
 #include <asm/archrandom.h>
 #include <asm/e820.h>

+#include <uapi/asm/kvm_para.h>
+
 #include <generated/compile.h>
 #include <linux/module.h>
 #include <linux/uts.h>
@@ -15,6 +17,22 @@
 static const char build_str[] = UTS_RELEASE " (" LINUX_COMPILE_BY "@"
 		LINUX_COMPILE_HOST ") (" LINUX_COMPILER ") " UTS_VERSION;

+static bool kvm_para_has_feature(unsigned int feature)
+{
+	u32 kvm_base;
+	u32 features;
+
+	if (!has_cpuflag(X86_FEATURE_HYPERVISOR))
+		return false;
+
+	kvm_base = hypervisor_cpuid_base("KVMKVMKVM\0\0\0", KVM_CPUID_FEATURES);
+	if (!kvm_base)
+		return false;
+
+	features = cpuid_eax(kvm_base | KVM_CPUID_FEATURES);
+	return features & (1UL << feature);
+}
+
 #define I8254_PORT_CONTROL	0x43
 #define I8254_PORT_COUNTER0	0x40
 #define I8254_CMD_READBACK	0xC0
@@ -81,6 +99,15 @@ static unsigned long get_random_long(void)
 		}
 	}

+	if (kvm_para_has_feature(KVM_FEATURE_GET_RNG_SEED)) {
+		u64 seed;
+
+		debug_putstr(" MSR_KVM_GET_RNG_SEED");
+		rdmsrl(MSR_KVM_GET_RNG_SEED, seed);
+		random ^= (unsigned long)seed;
+		use_i8254 = false;
+	}
+
 	if (has_cpuflag(X86_FEATURE_TSC)) {
 		debug_putstr(" RDTSC");
 		rdtscll(raw);
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index a4ea023..6096f3c 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -189,10 +189,25 @@ static inline int have_cpuid_p(void)
 static inline void native_cpuid(unsigned int *eax, unsigned int *ebx,
 				unsigned int *ecx, unsigned int *edx)
 {
-	/* ecx is often an input as well as an output. */
-	asm volatile("cpuid"
+	/*
+	 * This function can be used from the boot code, so it needs
+	 * to avoid using EBX in constraints in PIC mode.
+	 *
+	 * ecx is often an input as well as an output.
+	 */
+	asm volatile(".ifnc %%ebx,%1 ; .ifnc %%rbx,%1 \n\t"
+		     "movl %%ebx,%1\n\t"
+		     ".endif ; .endif \n\t"
+		     "cpuid \n\t"
+		     ".ifnc %%ebx,%1 ; .ifnc %%rbx,%1 \n\t"
+		     "xchgl %%ebx,%1\n\t"
+		     ".endif ; .endif"
 	    : "=a" (*eax),
-	      "=b" (*ebx),
+#if defined(__i386__) && defined(__PIC__)
+	      "=r" (*ebx),	/* gcc won't let us use ebx */
+#else
+	      "=b" (*ebx),	/* ebx is okay */
+#endif
 	      "=c" (*ecx),
 	      "=d" (*edx)
 	    : "0" (*eax), "2" (*ecx)
-- 
1.9.3
[PATCH v5 3/5] x86,random: Add an x86 implementation of arch_get_rng_seed
This does the same thing as the generic implementation, except that
it logs how many bits of each type it collected.  I want to know
whether the initial seeding is working and, if so, whether the RNG
is fast enough.  (I know that hpa assures me that the hardware RNG
is more than fast enough, but I'd still like a direct way to verify
this.)

Arguably, arch_get_random_seed could be removed now: I'm having some
trouble imagining a sensible non-architecture-specific use of it
that wouldn't be better served by arch_get_rng_seed.

Signed-off-by: Andy Lutomirski l...@amacapital.net
---
 arch/x86/include/asm/archrandom.h |  6 +
 arch/x86/kernel/Makefile          |  2 ++
 arch/x86/kernel/archrandom.c      | 51 +++
 3 files changed, 59 insertions(+)
 create mode 100644 arch/x86/kernel/archrandom.c

diff --git a/arch/x86/include/asm/archrandom.h b/arch/x86/include/asm/archrandom.h
index 69f1366..88f9c5a 100644
--- a/arch/x86/include/asm/archrandom.h
+++ b/arch/x86/include/asm/archrandom.h
@@ -117,6 +117,12 @@ GET_SEED(arch_get_random_seed_int, unsigned int, RDSEED_INT, ASM_NOP4);
 #define arch_has_random()	static_cpu_has(X86_FEATURE_RDRAND)
 #define arch_has_random_seed()	static_cpu_has(X86_FEATURE_RDSEED)
 
+#define __HAVE_ARCH_GET_RNG_SEED
+extern void arch_get_rng_seed(void *ctx,
+			      void (*seed)(void *ctx, u32 data),
+			      int bits_per_source,
+			      const char *log_prefix);
+
 #else
 
 static inline int rdrand_long(unsigned long *v)
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 047f9ff..0718bae 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -92,6 +92,8 @@ obj-$(CONFIG_PARAVIRT)		+= paravirt.o paravirt_patch_$(BITS).o
 obj-$(CONFIG_PARAVIRT_SPINLOCKS)+= paravirt-spinlocks.o
 obj-$(CONFIG_PARAVIRT_CLOCK)	+= pvclock.o
 
+obj-$(CONFIG_ARCH_RANDOM)	+= archrandom.o
+
 obj-$(CONFIG_PCSPKR_PLATFORM)	+= pcspeaker.o
 
 obj-$(CONFIG_X86_CHECK_BIOS_CORRUPTION) += check.o
diff --git a/arch/x86/kernel/archrandom.c b/arch/x86/kernel/archrandom.c
new file mode 100644
index 000..47d13b0
--- /dev/null
+++ b/arch/x86/kernel/archrandom.c
@@ -0,0 +1,51 @@
+/*
+ * This file is part of the Linux kernel.
+ *
+ * Copyright (c) 2014 Andy Lutomirski
+ * Authors: Andy Lutomirski l...@amacapital.net
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+
+#include <asm/archrandom.h>
+
+void arch_get_rng_seed(void *ctx,
+		       void (*seed)(void *ctx, u32 data),
+		       int bits_per_source,
+		       const char *log_prefix)
+{
+	int i;
+	int rdseed_bits = 0, rdrand_bits = 0;
+	char buf[128] = "";
+	char *msgptr = buf;
+
+	for (i = 0; i < bits_per_source; i += 8 * sizeof(long)) {
+		unsigned long rv;
+
+		if (arch_get_random_seed_long(&rv))
+			rdseed_bits += 8 * sizeof(rv);
+		else if (arch_get_random_long(&rv))
+			rdrand_bits += 8 * sizeof(rv);
+		else
+			continue;	/* Don't waste time mixing. */
+
+		seed(ctx, (u32)rv);
+#if BITS_PER_LONG > 32
+		seed(ctx, (u32)(rv >> 32));
+#endif
+	}
+
+	if (rdseed_bits)
+		msgptr += sprintf(msgptr, ", %d bits from RDSEED", rdseed_bits);
+	if (rdrand_bits)
+		msgptr += sprintf(msgptr, ", %d bits from RDRAND", rdrand_bits);
+	if (buf[0])
+		pr_info("%s with %s\n", log_prefix, buf + 2);
+}
-- 
1.9.3
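[Editor's sketch, not part of the posted patch.] The log line in the patch above is assembled by appending ", N bits from SOURCE" for each source that contributed and then skipping the first two characters when printing. A minimal userspace illustration of that idiom follows; format_seed_log() is a hypothetical helper invented here, and snprintf into a caller buffer stands in for the kernel's pr_info.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Userspace sketch of the message construction used by the patch.
 * format_seed_log() is a hypothetical name, not a kernel function.
 * Returns 1 and fills 'out' if at least one source contributed,
 * otherwise returns 0 and leaves 'out' as an empty string.
 */
static int format_seed_log(char *out, size_t outlen, const char *log_prefix,
			   int rdseed_bits, int rdrand_bits)
{
	char buf[128] = "";
	char *msgptr = buf;

	assert(out != NULL && outlen > 0 && log_prefix != NULL);

	/* Every entry starts with ", " so the entries can be chained blindly. */
	if (rdseed_bits)
		msgptr += sprintf(msgptr, ", %d bits from RDSEED", rdseed_bits);
	if (rdrand_bits)
		msgptr += sprintf(msgptr, ", %d bits from RDRAND", rdrand_bits);

	if (!buf[0]) {
		out[0] = '\0';
		return 0;	/* nothing collected, nothing to log */
	}

	/* buf + 2 drops the leading ", " of the first entry. */
	snprintf(out, outlen, "%s with %s", log_prefix, buf + 2);
	return 1;
}
```

The point of the idiom is that adding another source later costs one more sprintf and no separator bookkeeping.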
[PATCH v5 4/5] x86,random,kvm: Use KVM_GET_RNG_SEED in arch_get_rng_seed
This is a straightforward implementation: for each bit of internal
RNG state, request one bit from KVM_GET_RNG_SEED.  This is done even
if RDSEED/RDRAND worked, since KVM_GET_RNG_SEED is likely to provide
cryptographically secure output even if the CPU's RNG is weak or
compromised.

Signed-off-by: Andy Lutomirski l...@amacapital.net
---
 arch/x86/Kconfig                 |  4
 arch/x86/include/asm/kvm_guest.h |  9 +
 arch/x86/kernel/archrandom.c     | 25 -
 arch/x86/kernel/kvm.c            | 10 ++
 4 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index a8f749e..adfa09c 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -593,6 +593,7 @@ config KVM_GUEST
 	bool "KVM Guest support (including kvmclock)"
 	depends on PARAVIRT
 	select PARAVIRT_CLOCK
+	select ARCH_RANDOM
 	default y
 	---help---
 	  This option enables various optimizations for running under the KVM
@@ -1507,6 +1508,9 @@ config ARCH_RANDOM
 	  If supported, this is a high bandwidth, cryptographically
 	  secure hardware random number generator.
 
+	  This also enables paravirt RNGs such as KVM's if the relevant
+	  PV guest support is enabled.
+
 config X86_SMAP
 	def_bool y
 	prompt "Supervisor Mode Access Prevention" if EXPERT
diff --git a/arch/x86/include/asm/kvm_guest.h b/arch/x86/include/asm/kvm_guest.h
index a92b176..8c4dbd5 100644
--- a/arch/x86/include/asm/kvm_guest.h
+++ b/arch/x86/include/asm/kvm_guest.h
@@ -3,4 +3,13 @@
 
 int kvm_setup_vsyscall_timeinfo(void);
 
+#if defined(CONFIG_KVM_GUEST) && defined(CONFIG_ARCH_RANDOM)
+extern bool kvm_get_rng_seed(u64 *rv);
+#else
+static inline bool kvm_get_rng_seed(u64 *rv)
+{
+	return false;
+}
+#endif
+
 #endif /* _ASM_X86_KVM_GUEST_H */
diff --git a/arch/x86/kernel/archrandom.c b/arch/x86/kernel/archrandom.c
index 47d13b0..8c8d021 100644
--- a/arch/x86/kernel/archrandom.c
+++ b/arch/x86/kernel/archrandom.c
@@ -15,6 +15,7 @@
  */
 
 #include <asm/archrandom.h>
+#include <asm/kvm_guest.h>
 
 void arch_get_rng_seed(void *ctx,
 		       void (*seed)(void *ctx, u32 data),
@@ -22,7 +23,7 @@ void arch_get_rng_seed(void *ctx,
 		       const char *log_prefix)
 {
 	int i;
-	int rdseed_bits = 0, rdrand_bits = 0;
+	int rdseed_bits = 0, rdrand_bits = 0, kvm_bits = 0;
 	char buf[128] = "";
 	char *msgptr = buf;
 
@@ -42,10 +43,32 @@ void arch_get_rng_seed(void *ctx,
 #endif
 	}
 
+	/*
+	 * Use KVM_GET_RNG_SEED regardless of whether the CPU RNG
+	 * worked, since it incorporates entropy unavailable to the CPU,
+	 * and we shouldn't trust the hardware RNG more than we need to.
+	 * We request enough bits for the entire internal RNG state,
+	 * because there's no good reason not to.
+	 */
+	for (i = 0; i < bits_per_source; i += 64) {
+		u64 rv;
+
+		if (kvm_get_rng_seed(&rv)) {
+			seed(ctx, (u32)rv);
+			seed(ctx, (u32)(rv >> 32));
+			kvm_bits += 8 * sizeof(rv);
+		} else {
+			break;	/* If it fails once, it will keep failing. */
+		}
+	}
+
 	if (rdseed_bits)
 		msgptr += sprintf(msgptr, ", %d bits from RDSEED", rdseed_bits);
 	if (rdrand_bits)
 		msgptr += sprintf(msgptr, ", %d bits from RDRAND", rdrand_bits);
+	if (kvm_bits)
+		msgptr += sprintf(msgptr, ", %d bits from KVM_GET_RNG_BITS",
+				  kvm_bits);
 	if (buf[0])
 		pr_info("%s with %s\n", log_prefix, buf + 2);
 }
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 3dd8e2c..bd8783a 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -416,6 +416,16 @@ void kvm_disable_steal_time(void)
 	wrmsr(MSR_KVM_STEAL_TIME, 0, 0);
 }
 
+bool kvm_get_rng_seed(u64 *v)
+{
+	/*
+	 * Allow migration from a hypervisor with the GET_RNG_SEED
+	 * feature to a hypervisor without it.
+	 */
+	return (kvm_para_has_feature(KVM_FEATURE_GET_RNG_SEED) &&
+		rdmsrl_safe(MSR_KVM_GET_RNG_SEED, v) == 0);
+}
+
 #ifdef CONFIG_SMP
 static void __init kvm_smp_prepare_boot_cpu(void)
 {
-- 
1.9.3
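[Editor's sketch, not part of the posted patch.] The KVM loop above asks for 64 bits per iteration, feeds each result to the callback as two 32-bit words, and gives up on the first failure because a source that fails once is assumed to keep failing. A userspace approximation is shown below. collect_from_source(), demo_source() and demo_seed() are names made up for this illustration; demo_source() merely stands in for something like kvm_get_rng_seed().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 64-bit seed source; returns false when unavailable. */
typedef bool (*seed64_fn)(uint64_t *rv);

/* Callback shape used by the patch: one 32-bit word per call. */
typedef void (*seed32_fn)(void *ctx, uint32_t data);

/* Returns the number of bits actually handed to 'seed'. */
static int collect_from_source(seed64_fn source, void *ctx, seed32_fn seed,
			       int bits_wanted)
{
	int bits = 0;
	int i;

	assert(source != NULL && seed != NULL);

	for (i = 0; i < bits_wanted; i += 64) {
		uint64_t rv;

		if (!source(&rv))
			break;	/* first failure ends the loop, no retries */

		seed(ctx, (uint32_t)rv);		/* low half */
		seed(ctx, (uint32_t)(rv >> 32));	/* high half */
		bits += 64;
	}
	return bits;
}

/* Demo source: succeeds demo_remaining times, then fails for good. */
static int demo_remaining;

static bool demo_source(uint64_t *rv)
{
	if (demo_remaining <= 0)
		return false;
	demo_remaining--;
	*rv = UINT64_C(0x0123456789abcdef);	/* fixed value, demo only */
	return true;
}

/* Demo callback: counts words and remembers the most recent one. */
struct demo_pool {
	int words;
	uint32_t last;
};

static void demo_seed(void *ctx, uint32_t data)
{
	struct demo_pool *pool = ctx;

	pool->words++;
	pool->last = data;
}
```

With demo_remaining set to 2 and 256 bits requested, the loop stops on the third call and reports 128 bits, which mirrors how the patch avoids a long run of failed MSR reads on a host without the feature.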
[PATCH v5 1/5] x86,kvm: Add MSR_KVM_GET_RNG_SEED and a matching feature bit
This adds a simple interface to allow a guest to request 64 bits of
host nonblocking entropy.  This is independent of virtio-rng for a
couple of reasons:

 - It's intended to be usable during early boot, when a trivial
   synchronous interface is needed.

 - virtio-rng gives blocking entropy, and making guest boot wait for
   the host's /dev/random will cause problems.

MSR_KVM_GET_RNG_SEED is intended to provide 64 bits of best-effort
cryptographically secure data for use as a seed.  It provides no
guarantee that the result contains any actual entropy.

Signed-off-by: Andy Lutomirski l...@amacapital.net
---
 Documentation/virtual/kvm/cpuid.txt  | 3 +++
 arch/x86/include/uapi/asm/kvm_para.h | 2 ++
 arch/x86/kvm/cpuid.c                 | 3 ++-
 arch/x86/kvm/x86.c                   | 4
 4 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/Documentation/virtual/kvm/cpuid.txt b/Documentation/virtual/kvm/cpuid.txt
index 3c65feb..0ab043b 100644
--- a/Documentation/virtual/kvm/cpuid.txt
+++ b/Documentation/virtual/kvm/cpuid.txt
@@ -54,6 +54,9 @@ KVM_FEATURE_PV_UNHALT              ||     7 || guest checks this feature bit
                                    ||       || before enabling paravirtualized
                                    ||       || spinlock support.
 ------------------------------------------------------------------------------
+KVM_FEATURE_GET_RNG_SEED           ||     8 || host provides rng seed data via
+                                   ||       || MSR_KVM_GET_RNG_SEED.
+------------------------------------------------------------------------------
 KVM_FEATURE_CLOCKSOURCE_STABLE_BIT ||    24 || host will warn if no guest-side
                                    ||       || per-cpu warps are expected in
                                    ||       || kvmclock.
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 94dc8ca..e2eaf93 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -24,6 +24,7 @@
 #define KVM_FEATURE_STEAL_TIME		5
 #define KVM_FEATURE_PV_EOI		6
 #define KVM_FEATURE_PV_UNHALT		7
+#define KVM_FEATURE_GET_RNG_SEED	8
 
 /* The last 8 bits are used to indicate how to interpret the flags field
  * in pvclock structure. If no bits are set, all flags are ignored.
@@ -40,6 +41,7 @@
 #define MSR_KVM_ASYNC_PF_EN 0x4b564d02
 #define MSR_KVM_STEAL_TIME  0x4b564d03
 #define MSR_KVM_PV_EOI_EN      0x4b564d04
+#define MSR_KVM_GET_RNG_SEED 0x4b564d05
 
 struct kvm_steal_time {
 	__u64 steal;
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 38a0afe..40d6763 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -479,7 +479,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 			     (1 << KVM_FEATURE_ASYNC_PF) |
 			     (1 << KVM_FEATURE_PV_EOI) |
 			     (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) |
-			     (1 << KVM_FEATURE_PV_UNHALT);
+			     (1 << KVM_FEATURE_PV_UNHALT) |
+			     (1 << KVM_FEATURE_GET_RNG_SEED);
 
 		if (sched_info_on())
 			entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f644933..4e81853 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -48,6 +48,7 @@
 #include <linux/pci.h>
 #include <linux/timekeeper_internal.h>
 #include <linux/pvclock_gtod.h>
+#include <linux/random.h>
 #include <trace/events/kvm.h>
 
 #define CREATE_TRACE_POINTS
@@ -2480,6 +2481,9 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
 	case MSR_KVM_PV_EOI_EN:
 		data = vcpu->arch.pv_eoi.msr_val;
 		break;
+	case MSR_KVM_GET_RNG_SEED:
+		get_random_bytes(&data, sizeof(data));
+		break;
 	case MSR_IA32_P5_MC_ADDR:
 	case MSR_IA32_P5_MC_TYPE:
 	case MSR_IA32_MCG_CAP:
-- 
1.9.3
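[Editor's sketch, not part of the posted patch.] The patch above advertises the new MSR through bit 8 of the KVM feature mask returned in EAX. How a guest would test such a bit can be shown in plain C. The four constants are copied from the patch; feature_mask_has() is a hypothetical helper written for this note, and the real guest code goes through kvm_para_has_feature() instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values as they appear in the patch's kvm_para.h hunks. */
#define KVM_FEATURE_PV_UNHALT		7
#define KVM_FEATURE_GET_RNG_SEED	8
#define MSR_KVM_PV_EOI_EN		0x4b564d04
#define MSR_KVM_GET_RNG_SEED		0x4b564d05

/*
 * Hypothetical helper: test one feature bit in a 32-bit EAX mask.
 * An unsigned shift is used so that bit 31 would also be well defined.
 */
static bool feature_mask_has(uint32_t features_eax, unsigned int feature)
{
	assert(feature < 32);
	return (features_eax & (UINT32_C(1) << feature)) != 0;
}
```

The MSR numbers share the prefix 0x4b564d, which is "KVM" in ASCII, and the new MSR simply takes the next free index after MSR_KVM_PV_EOI_EN.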
Re: [PATCH v5 2/5] random: Add and use arch_get_rng_seed
On Wed, Jul 23, 2014 at 9:57 PM, Andy Lutomirski l...@amacapital.net wrote:
> Currently, init_std_data contains its own logic for using arch
> random sources.  This replaces that logic with a generic function
> arch_get_rng_seed that allows arch code to supply its own logic.
> The default implementation tries arch_get_random_seed_long and
> arch_get_random_long individually.
>
> The only functional change here is that random_get_entropy() is used
> unconditionally instead of being used only when the arch sources
> fail.  This may add a tiny amount of security.

tytso, are you okay with this approach?  I'd be happy to rework this
if you prefer some other way of doing it.

--Andy

> Signed-off-by: Andy Lutomirski l...@amacapital.net
> ---
>  drivers/char/random.c  | 14 +++---
>  include/linux/random.h | 40
>  2 files changed, 51 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/char/random.c b/drivers/char/random.c
> index 0a7ac0a..be7a94e 100644
> --- a/drivers/char/random.c
> +++ b/drivers/char/random.c
> @@ -1236,6 +1236,10 @@ void get_random_bytes_arch(void *buf, int nbytes)
>  }
>  EXPORT_SYMBOL(get_random_bytes_arch);
>
> +static void seed_entropy_store(void *ctx, u32 data)
> +{
> +	mix_pool_bytes((struct entropy_store *)ctx, &data, sizeof(data), NULL);
> +}
>
>  /*
>   * init_std_data - initialize pool with system data
> @@ -1251,15 +1255,19 @@ static void init_std_data(struct entropy_store *r)
>  	int i;
>  	ktime_t now = ktime_get_real();
>  	unsigned long rv;
> +	char log_prefix[128];
>
>  	r->last_pulled = jiffies;
>  	mix_pool_bytes(r, &now, sizeof(now), NULL);
>  	for (i = r->poolinfo->poolbytes; i > 0; i -= sizeof(rv)) {
> -		if (!arch_get_random_seed_long(&rv) &&
> -		    !arch_get_random_long(&rv))
> -			rv = random_get_entropy();
> +		rv = random_get_entropy();
>  		mix_pool_bytes(r, &rv, sizeof(rv), NULL);
>  	}
> +
> +	sprintf(log_prefix, "random: seeded %s pool", r->name);
> +	arch_get_rng_seed(r, seed_entropy_store, 8 * r->poolinfo->poolbytes,
> +			  log_prefix);
> +
>  	mix_pool_bytes(r, utsname(), sizeof(*(utsname())), NULL);
>  }
>
> diff --git a/include/linux/random.h b/include/linux/random.h
> index 57fbbff..81a6145 100644
> --- a/include/linux/random.h
> +++ b/include/linux/random.h
> @@ -106,6 +106,46 @@ static inline int arch_has_random_seed(void)
>  }
>  #endif
>
> +#ifndef __HAVE_ARCH_GET_RNG_SEED
> +
> +/**
> + * arch_get_rng_seed() - get architectural rng seed data
> + * @ctx: context for the seed function
> + * @seed: function to call for each u32 obtained
> + * @bits_per_source: number of bits from each source to try to use
> + * @log_prefix: beginning of log output (may be NULL)
> + *
> + * Synchronously load some architectural entropy or other best-effort
> + * random seed data.  An arch-specific implementation should be no worse
> + * than this generic implementation.  If the arch code does something
> + * interesting, it may log something of the form "log_prefix with
> + * 8 bits of stuff".
> + *
> + * No arch-specific implementation should be any worse than the generic
> + * implementation.
> + */
> +static inline void arch_get_rng_seed(void *ctx,
> +				     void (*seed)(void *ctx, u32 data),
> +				     int bits_per_source,
> +				     const char *log_prefix)
> +{
> +	int i;
> +
> +	for (i = 0; i < bits_per_source; i += 8 * sizeof(long)) {
> +		unsigned long rv;
> +
> +		if (arch_get_random_seed_long(&rv) ||
> +		    arch_get_random_long(&rv)) {
> +			seed(ctx, (u32)rv);
> +#if BITS_PER_LONG > 32
> +			seed(ctx, (u32)(rv >> 32));
> +#endif
> +		}
> +	}
> +}
> +
> +#endif /* __HAVE_ARCH_GET_RNG_SEED */
> +
>  /* Pseudo random number generator from numerical recipes. */
>  static inline u32 next_pseudo_random32(u32 seed)
>  {
> --
> 1.9.3

-- 
Andy Lutomirski
AMA Capital Management, LLC
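[Editor's sketch, not part of the quoted patch.] The generic implementation hands the callback one u32 at a time, so an unsigned long is delivered as one word on 32-bit builds and two on 64-bit builds. That split can be approximated in userspace as below. feed_long(), struct word_log and record_word() are hypothetical names, and ULONG_MAX from limits.h is used here in place of the kernel's BITS_PER_LONG.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Callback shape from the patch: one 32-bit word per call. */
typedef void (*seed32_fn)(void *ctx, uint32_t data);

/*
 * Feed one unsigned long to 'seed' in 32-bit pieces, low word first.
 * Returns the number of words delivered (1 or 2).
 */
static int feed_long(unsigned long rv, void *ctx, seed32_fn seed)
{
	assert(seed != NULL);

	seed(ctx, (uint32_t)rv);
#if ULONG_MAX > 0xffffffffUL
	/* Only compiled where unsigned long is wider than 32 bits. */
	seed(ctx, (uint32_t)(rv >> 32));
	return 2;
#else
	return 1;
#endif
}

/* Demo context: records up to two words in arrival order. */
struct word_log {
	int count;
	uint32_t words[2];
};

static void record_word(void *ctx, uint32_t data)
{
	struct word_log *log = ctx;

	if (log->count < 2)
		log->words[log->count] = data;
	log->count++;
}
```

Keeping the callback at u32 granularity means the same seed function serves both word sizes, at the cost of the preprocessor check shown here.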
Re: [PATCH v5 0/5] random,x86,kvm: Rework arch RNG seeds and get some from kvm
On Wed, Jul 23, 2014 at 9:57 PM, Andy Lutomirski l...@amacapital.net wrote:
> This introduces and uses a very simple synchronous mechanism to get
> /dev/urandom-style bits appropriate for initial KVM PV guest RNG
> seeding.
>
> It also re-works the way that architectural random data is fed into
> random.c's pools.  I added a new arch hook called arch_get_rng_seed.
> The default implementation is more or less the same as the current
> code, except that random_get_entropy is now called unconditionally.
>
> x86 gets a custom arch_get_rng_seed.  It will use KVM_GET_RNG_SEED
> if available, and, if it does anything, it will log the number of
> bits collected from each available architectural source.  If more
> paravirt seed sources show up, it will be a natural place to add
> them.
>
> I sent the corresponding kvm-unit-tests and qemu changes separately.

What's the status of this series?  I assume that it's too late for at
least patches 2-5 to make it into 3.17.

--Andy

> Changes from v4:
>  - Got rid of the RDRAND behavior change.  If this series is
>    accepted, I may resend it separately, but I think it's an
>    unrelated issue.
>  - Fix up the changelog entries -- I misunderstood how the old code
>    worked.
>  - Avoid lots of failed attempts to use KVM_GET_RNG_SEED if it's not
>    available.
>
> Changes from v3:
>  - Other than KASLR, the guest pieces are completely rewritten.
>    Patches 2-4 have essentially nothing in common with v2.
>
> Changes from v2:
>  - Bisection fix (patch 2 had a misplaced brace).  The final state is
>    identical to that of v2.
>  - Improve the 0/5 description a little bit.
>
> Changes from v1:
>  - Split patches 2 and 3
>  - Log all arch sources in init_std_data
>  - Fix the 32-bit kaslr build
>
> Andy Lutomirski (5):
>   x86,kvm: Add MSR_KVM_GET_RNG_SEED and a matching feature bit
>   random: Add and use arch_get_rng_seed
>   x86,random: Add an x86 implementation of arch_get_rng_seed
>   x86,random,kvm: Use KVM_GET_RNG_SEED in arch_get_rng_seed
>   x86,kaslr: Use MSR_KVM_GET_RNG_SEED for KASLR if available
>
>  Documentation/virtual/kvm/cpuid.txt  |  3 ++
>  arch/x86/Kconfig                     |  4 ++
>  arch/x86/boot/compressed/aslr.c      | 27 +
>  arch/x86/include/asm/archrandom.h    |  6 +++
>  arch/x86/include/asm/kvm_guest.h     |  9 +
>  arch/x86/include/asm/processor.h     | 21 --
>  arch/x86/include/uapi/asm/kvm_para.h |  2 +
>  arch/x86/kernel/Makefile             |  2 +
>  arch/x86/kernel/archrandom.c         | 74
>  arch/x86/kernel/kvm.c                | 10 +
>  arch/x86/kvm/cpuid.c                 |  3 +-
>  arch/x86/kvm/x86.c                   |  4 ++
>  drivers/char/random.c                | 14 +--
>  include/linux/random.h               | 40 +++
>  14 files changed, 212 insertions(+), 7 deletions(-)
>  create mode 100644 arch/x86/kernel/archrandom.c
>
> --
> 1.9.3

-- 
Andy Lutomirski
AMA Capital Management, LLC
Re: [PATCH v5 0/5] random,x86,kvm: Rework arch RNG seeds and get some from kvm
On Tue, Aug 12, 2014 at 12:17 PM, Theodore Ts'o ty...@mit.edu wrote:
> On Tue, Aug 12, 2014 at 12:11:29PM -0700, Andy Lutomirski wrote:
>> What's the status of this series?  I assume that it's too late for
>> at least patches 2-5 to make it into 3.17.
>
> Which tree were you hoping this patch series to go through?  I was
> assuming it would go through the x86 tree since the bulk of the
> changes are in the x86 subsystem (hence my Acked-by).

There's some argument that patch 1 should go through the kvm tree.
There's no real need for patch 1 and 2-5 to end up in the same kernel
release, either.

> IIRC, Peter had some concerns, and I don't remember if they were all
> addressed.  Peter?

I don't know.  I rewrote one thing he didn't like and undid the other,
but there's plenty of opportunity for this version to be problematic,
too.

--Andy
Re: [PATCH v5 0/5] random,x86,kvm: Rework arch RNG seeds and get some from kvm
On Aug 13, 2014 12:48 AM, H. Peter Anvin h...@zytor.com wrote:
> On 08/12/2014 12:22 PM, Andy Lutomirski wrote:
>> On Tue, Aug 12, 2014 at 12:17 PM, Theodore Ts'o ty...@mit.edu wrote:
>>> On Tue, Aug 12, 2014 at 12:11:29PM -0700, Andy Lutomirski wrote:
>>>> What's the status of this series?  I assume that it's too late
>>>> for at least patches 2-5 to make it into 3.17.
>>>
>>> Which tree were you hoping this patch series to go through?  I was
>>> assuming it would go through the x86 tree since the bulk of the
>>> changes are in the x86 subsystem (hence my Acked-by).
>>
>> There's some argument that patch 1 should go through the kvm tree.
>> There's no real need for patch 1 and 2-5 to end up in the same
>> kernel release, either.
>>
>>> IIRC, Peter had some concerns, and I don't remember if they were
>>> all addressed.  Peter?
>>
>> I don't know.  I rewrote one thing he didn't like and undid the
>> other, but there's plenty of opportunity for this version to be
>> problematic, too.
>
> Sorry, I have been heads down on the current merge window.  I will
> look at this for 3.18, presumably after Kernel Summit.
>
> The proposed arch_get_rng_seed() is not really what it claims to be;
> it most definitely does not produce seed-grade randomness, instead it
> seems to be an arch function for best-effort initialization of the
> entropy pools -- which is fine, it is just something quite different.

Fair enough.  I meant seed as in something that initializes a PRNG
(think srand), not seed as in a promised-to-be-cryptographically-secure
seed for a DRBG.  I can rename it, update the comment, or otherwise
tweak it to make the intent clearer.

> I want to look over it more carefully before acking it, though.

It would also be nice for someone with a Haswell box (and an RDSEED
box) to test it.  I have neither.

> Andy, are you going to be in Chicago?

Yes.

> -hpa
Re: [PATCH v5 0/5] random,x86,kvm: Rework arch RNG seeds and get some from kvm
On Wed, Aug 13, 2014 at 7:32 AM, Theodore Ts'o ty...@mit.edu wrote:
> On Wed, Aug 13, 2014 at 12:48:41AM -0700, H. Peter Anvin wrote:
>> The proposed arch_get_rng_seed() is not really what it claims to
>> be; it most definitely does not produce seed-grade randomness,
>> instead it seems to be an arch function for best-effort
>> initialization of the entropy pools -- which is fine, it is just
>> something quite different.
>
> Without getting into an argument about which definition of seed is
> correct --- it's certainly confusing and different from the RDSEED
> usage of the word seed.
>
> Do we expect that anyone else besides arch_get_rnd_seed() would
> actually want to use it?

If you mean random.c instead of arch_get_rnd_seed, then I don't expect
there to be other users.  Aside from the best-effort bit causing this
to be basically useless on old bare metal, the interface is really
awkward for anything other than the use in random.c.

> I'd argue no; we want the rest of the kernel to either use
> get_random_bytes() or prandom_u32().  Given that, maybe we should
> just call it arch_random_init(), and expect that the only user of
> this interface would be drivers/char/random.c?

Sounds good to me.

FWIW, I'd like to see a second use added in random.c: I think that we
should do this, or even all of init_std_data, on resume from suspend
and especially on resume from hibernate / kexec.

--Andy
Re: [PATCH v5 0/5] random,x86,kvm: Rework arch RNG seeds and get some from kvm
On Wed, Aug 13, 2014 at 11:22 AM, Theodore Ts'o ty...@mit.edu wrote:
> On Wed, Aug 13, 2014 at 10:45:25AM -0700, H. Peter Anvin wrote:
>> On 08/13/2014 09:13 AM, Andy Lutomirski wrote:
>>> Sounds good to me.
>>>
>>> FWIW, I'd like to see a second use added in random.c: I think
>>> that we should do this, or even all of init_std_data, on resume
>>> from suspend and especially on resume from hibernate / kexec.
>>
>> Yes, we should.  We also need to make it possible to do this after
>> cloning a VM.
>
> Agreed.  Can you send a patch?  I can carry the commits to add
> arch_random_init(), the generic version, and the patch to call it
> after suspend/resume.  I'll do this at the very head of the random
> tree, and make sure it gets pushed to Linus early during the next
> merge window.
>
> Does that sound like a plan?  Or does someone want to suggest
> something different?  I'm flexible...

OK.  Here's a proposal.  I'll split the series into two parts.  The
first part will be the arch_random_init generic change and code to
call it after suspend/resume (once I figure out the right callback).
I'll send that to you.  The second part will be the KVM and x86 code,
which will look just like this version (v5) except for the rename.  If
needed, hpa and I can hash out the details we need at KS.

As for doing arch_random_init after clone/migration, I think we'll
need another KVM extension for that, since, AFAIK, we don't actually
get notified that we were cloned or migrated.  That will be
nontrivial.  Maybe we can figure that out at KS, too.

--Andy

> - Ted

-- 
Andy Lutomirski
AMA Capital Management, LLC
Re: [PATCH v5 0/5] random,x86,kvm: Rework arch RNG seeds and get some from kvm
On Wed, Aug 13, 2014 at 7:41 PM, H. Peter Anvin h...@zytor.com wrote:
> On 08/13/2014 11:44 AM, H. Peter Anvin wrote:
>> On 08/13/2014 11:33 AM, Andy Lutomirski wrote:
>>> As for doing arch_random_init after clone/migration, I think
>>> we'll need another KVM extension for that, since, AFAIK, we don't
>>> actually get notified that we were cloned or migrated.  That will
>>> be nontrivial.  Maybe we can figure that out at KS, too.
>>
>> We don't need a reset when migrated (although it might be a good
>> idea under some circumstances, i.e. if the pools might somehow have
>> gotten exposed) but definitely when cloned.
>
> But yes, we need a notification.  For obvious reasons there is no
> suspend event (one can snapshot a running VM) but we need to be
> notified upon wakeup, *or* we need to give KVM a way to update the
> necessary state.

This could presumably use the interrupt mechanism on virtio-rng if
we're willing to depend on having host support for virtio-rng.

v6 (coming in a few minutes) will at least get it right when the
kernel goes through the resume path (i.e. not KVM/QEMU suspend, and
maybe not S0ix either).

--Andy
[PATCH v6 0/7] random,x86,kvm: Rework arch RNG seeds and get some from kvm
This introduces and uses a very simple synchronous mechanism to get
/dev/urandom-style bits appropriate for initial KVM PV guest RNG
seeding.

It also re-works the way that architectural random data is fed into
random.c's pools.  Timekeeping randomness now comes directly from the
timekeeping core rather than being pulled in from init_std_data, and
timekeeping randomness is added both on boot and on resume.

I added a new arch hook called arch_rng_init.  The default
implementation is more or less the same as the current code, except
that random_get_entropy is now called unconditionally.  We now also
call init_std_data on resume.

x86 gets a custom arch_rng_init.  It will use KVM_GET_RNG_SEED if
available, and, if it does anything, it will log the number of bits
collected from each available architectural source.  If more paravirt
seed sources show up, it will be a natural place to add them.

I sent the corresponding kvm-unit-tests and qemu changes separately.

Changes from v5:
 - Moved the generic changes to the beginning.
 - Renamed arch_get_rng_seed to arch_rng_init.
 - The timekeeping change is new.
 - random.c registers a syscore callback to reseed on resume.

Changes from v4:
 - Got rid of the RDRAND behavior change.  If this series is
   accepted, I may resend it separately, but I think it's an
   unrelated issue.
 - Fix up the changelog entries -- I misunderstood how the old code
   worked.
 - Avoid lots of failed attempts to use KVM_GET_RNG_SEED if it's not
   available.

Changes from v3:
 - Other than KASLR, the guest pieces are completely rewritten.
   Patches 2-4 have essentially nothing in common with v2.

Changes from v2:
 - Bisection fix (patch 2 had a misplaced brace).  The final state is
   identical to that of v2.
 - Improve the 0/5 description a little bit.

Changes from v1:
 - Split patches 2 and 3
 - Log all arch sources in init_std_data
 - Fix the 32-bit kaslr build

Andy Lutomirski (7):
  random: Add and use arch_rng_init
  random, timekeeping: Collect timekeeping entropy in the timekeeping
    code
  random: Reseed pools on resume
  x86,kvm: Add MSR_KVM_GET_RNG_SEED and a matching feature bit
  x86,random: Add an x86 implementation of arch_rng_init
  x86,random,kvm: Use KVM_GET_RNG_SEED in arch_rng_init
  x86,kaslr: Use MSR_KVM_GET_RNG_SEED for KASLR if available

 Documentation/virtual/kvm/cpuid.txt  |  3 ++
 arch/x86/Kconfig                     |  4 ++
 arch/x86/boot/compressed/aslr.c      | 27 +
 arch/x86/include/asm/archrandom.h    |  6 +++
 arch/x86/include/asm/kvm_guest.h     |  9 +
 arch/x86/include/asm/processor.h     | 21 --
 arch/x86/include/uapi/asm/kvm_para.h |  2 +
 arch/x86/kernel/Makefile             |  2 +
 arch/x86/kernel/archrandom.c         | 74
 arch/x86/kernel/kvm.c                | 10 +
 arch/x86/kvm/cpuid.c                 |  3 +-
 arch/x86/kvm/x86.c                   |  4 ++
 drivers/char/random.c                | 42
 include/linux/random.h               | 40 +++
 kernel/time/timekeeping.c            | 11 ++
 15 files changed, 246 insertions(+), 12 deletions(-)
 create mode 100644 arch/x86/kernel/archrandom.c

-- 
1.9.3
[PATCH v6 7/7] x86,kaslr: Use MSR_KVM_GET_RNG_SEED for KASLR if available
It's considerably better than any of the alternatives on KVM.

Rather than reinventing all of the cpu feature query code, this fixes
native_cpuid to work in PIC objects.

I haven't combined it with boot/cpuflags.c's cpuid implementation:
including asm/processor.h from boot/cpuflags.c results in a flood of
unrelated errors, and fixing it might be messy.

Reviewed-by: Kees Cook keesc...@chromium.org
Acked-by: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Andy Lutomirski l...@amacapital.net
---
 arch/x86/boot/compressed/aslr.c  | 27 +++
 arch/x86/include/asm/processor.h | 21 ++---
 2 files changed, 45 insertions(+), 3 deletions(-)

diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
index fc6091a..8583f0e 100644
--- a/arch/x86/boot/compressed/aslr.c
+++ b/arch/x86/boot/compressed/aslr.c
@@ -5,6 +5,8 @@
 #include <asm/archrandom.h>
 #include <asm/e820.h>
 
+#include <uapi/asm/kvm_para.h>
+
 #include <generated/compile.h>
 #include <linux/module.h>
 #include <linux/uts.h>
@@ -15,6 +17,22 @@
 static const char build_str[] = UTS_RELEASE " (" LINUX_COMPILE_BY "@"
 		LINUX_COMPILE_HOST ") (" LINUX_COMPILER ") " UTS_VERSION;
 
+static bool kvm_para_has_feature(unsigned int feature)
+{
+	u32 kvm_base;
+	u32 features;
+
+	if (!has_cpuflag(X86_FEATURE_HYPERVISOR))
+		return false;
+
+	kvm_base = hypervisor_cpuid_base("KVMKVMKVM\0\0\0", KVM_CPUID_FEATURES);
+	if (!kvm_base)
+		return false;
+
+	features = cpuid_eax(kvm_base | KVM_CPUID_FEATURES);
+	return features & (1UL << feature);
+}
+
 #define I8254_PORT_CONTROL	0x43
 #define I8254_PORT_COUNTER0	0x40
 #define I8254_CMD_READBACK	0xC0
@@ -81,6 +99,15 @@ static unsigned long get_random_long(void)
 		}
 	}
 
+	if (kvm_para_has_feature(KVM_FEATURE_GET_RNG_SEED)) {
+		u64 seed;
+
+		debug_putstr(" MSR_KVM_GET_RNG_SEED");
+		rdmsrl(MSR_KVM_GET_RNG_SEED, seed);
+		random ^= (unsigned long)seed;
+		use_i8254 = false;
+	}
+
 	if (has_cpuflag(X86_FEATURE_TSC)) {
 		debug_putstr(" RDTSC");
 		rdtscll(raw);
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index a4ea023..6096f3c 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -189,10 +189,25 @@ static inline int have_cpuid_p(void)
 static inline void native_cpuid(unsigned int *eax, unsigned int *ebx,
 				unsigned int *ecx, unsigned int *edx)
 {
-	/* ecx is often an input as well as an output. */
-	asm volatile("cpuid"
+	/*
+	 * This function can be used from the boot code, so it needs
+	 * to avoid using EBX in constraints in PIC mode.
+	 *
+	 * ecx is often an input as well as an output.
+	 */
+	asm volatile(".ifnc %%ebx,%1 ; .ifnc %%rbx,%1 \n\t"
+		     "movl %%ebx,%1\n\t"
+		     ".endif ; .endif \n\t"
+		     "cpuid \n\t"
+		     ".ifnc %%ebx,%1 ; .ifnc %%rbx,%1 \n\t"
+		     "xchgl %%ebx,%1\n\t"
+		     ".endif ; .endif"
 	    : "=a" (*eax),
-	      "=b" (*ebx),
+#if defined(__i386__) && defined(__PIC__)
+	      "=r" (*ebx),	/* gcc won't let us use ebx */
+#else
+	      "=b" (*ebx),	/* ebx is okay */
+#endif
 	      "=c" (*ecx),
 	      "=d" (*edx)
 	    : "0" (*eax), "2" (*ecx)
-- 
1.9.3
[PATCH v6 6/7] x86,random,kvm: Use KVM_GET_RNG_SEED in arch_rng_init
This is a straightforward implementation: for each bit of internal
RNG state, request one bit from KVM_GET_RNG_SEED.  This is done even
if RDSEED/RDRAND worked, since KVM_GET_RNG_SEED is likely to provide
cryptographically secure output even if the CPU's RNG is weak or
compromised.

Acked-by: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Andy Lutomirski l...@amacapital.net
---
 arch/x86/Kconfig                 |  4
 arch/x86/include/asm/kvm_guest.h |  9 +
 arch/x86/kernel/archrandom.c     | 25 -
 arch/x86/kernel/kvm.c            | 10 ++
 4 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index d24887b..ad87278 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -594,6 +594,7 @@ config KVM_GUEST
 	bool "KVM Guest support (including kvmclock)"
 	depends on PARAVIRT
 	select PARAVIRT_CLOCK
+	select ARCH_RANDOM
 	default y
 	---help---
 	  This option enables various optimizations for running under the KVM
@@ -1508,6 +1509,9 @@ config ARCH_RANDOM
 	  If supported, this is a high bandwidth, cryptographically
 	  secure hardware random number generator.
 
+	  This also enables paravirt RNGs such as KVM's if the relevant
+	  PV guest support is enabled.
+
 config X86_SMAP
 	def_bool y
 	prompt "Supervisor Mode Access Prevention" if EXPERT
diff --git a/arch/x86/include/asm/kvm_guest.h b/arch/x86/include/asm/kvm_guest.h
index a92b176..8c4dbd5 100644
--- a/arch/x86/include/asm/kvm_guest.h
+++ b/arch/x86/include/asm/kvm_guest.h
@@ -3,4 +3,13 @@
 
 int kvm_setup_vsyscall_timeinfo(void);
 
+#if defined(CONFIG_KVM_GUEST) && defined(CONFIG_ARCH_RANDOM)
+extern bool kvm_get_rng_seed(u64 *rv);
+#else
+static inline bool kvm_get_rng_seed(u64 *rv)
+{
+	return false;
+}
+#endif
+
 #endif /* _ASM_X86_KVM_GUEST_H */
diff --git a/arch/x86/kernel/archrandom.c b/arch/x86/kernel/archrandom.c
index e8d2ffb..adbaa25 100644
--- a/arch/x86/kernel/archrandom.c
+++ b/arch/x86/kernel/archrandom.c
@@ -15,6 +15,7 @@
  */
 
 #include <asm/archrandom.h>
+#include <asm/kvm_guest.h>
 
 void arch_rng_init(void *ctx,
 		   void (*seed)(void *ctx, u32 data),
@@ -22,7 +23,7 @@ void arch_rng_init(void *ctx,
 		   const char *log_prefix)
 {
 	int i;
-	int rdseed_bits = 0, rdrand_bits = 0;
+	int rdseed_bits = 0, rdrand_bits = 0, kvm_bits = 0;
 	char buf[128] = "";
 	char *msgptr = buf;
 
@@ -42,10 +43,32 @@ void arch_rng_init(void *ctx,
 #endif
 	}
 
+	/*
+	 * Use KVM_GET_RNG_SEED regardless of whether the CPU RNG
+	 * worked, since it incorporates entropy unavailable to the CPU,
+	 * and we shouldn't trust the hardware RNG more than we need to.
+	 * We request enough bits for the entire internal RNG state,
+	 * because there's no good reason not to.
+	 */
+	for (i = 0; i < bits_per_source; i += 64) {
+		u64 rv;
+
+		if (kvm_get_rng_seed(&rv)) {
+			seed(ctx, (u32)rv);
+			seed(ctx, (u32)(rv >> 32));
+			kvm_bits += 8 * sizeof(rv);
+		} else {
+			break;	/* If it fails once, it will keep failing. */
+		}
+	}
+
 	if (rdseed_bits)
 		msgptr += sprintf(msgptr, ", %d bits from RDSEED", rdseed_bits);
 	if (rdrand_bits)
 		msgptr += sprintf(msgptr, ", %d bits from RDRAND", rdrand_bits);
+	if (kvm_bits)
+		msgptr += sprintf(msgptr, ", %d bits from KVM_GET_RNG_BITS",
+				  kvm_bits);
 	if (buf[0])
 		pr_info("%s with %s\n", log_prefix, buf + 2);
 }
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 3dd8e2c..bd8783a 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -416,6 +416,16 @@ void kvm_disable_steal_time(void)
 	wrmsr(MSR_KVM_STEAL_TIME, 0, 0);
 }
 
+bool kvm_get_rng_seed(u64 *v)
+{
+	/*
+	 * Allow migration from a hypervisor with the GET_RNG_SEED
+	 * feature to a hypervisor without it.
+	 */
+	return (kvm_para_has_feature(KVM_FEATURE_GET_RNG_SEED) &&
+		rdmsrl_safe(MSR_KVM_GET_RNG_SEED, v) == 0);
+}
+
 #ifdef CONFIG_SMP
 static void __init kvm_smp_prepare_boot_cpu(void)
 {
-- 
1.9.3
[PATCH v6 5/7] x86,random: Add an x86 implementation of arch_rng_init
This does the same thing as the generic implementation, except that
it logs how many bits of each type it collected.  I want to know
whether the initial seeding is working and, if so, whether the RNG
is fast enough.  (I know that hpa assures me that the hardware RNG
is more than fast enough, but I'd still like a direct way to verify
this.)

Arguably, arch_get_random_seed could be removed now: I'm having some
trouble imagining a sensible non-architecture-specific use of it
that wouldn't be better served by arch_rng_init.

Signed-off-by: Andy Lutomirski <l...@amacapital.net>
---
 arch/x86/include/asm/archrandom.h |  6 +++++
 arch/x86/kernel/Makefile          |  2 ++
 arch/x86/kernel/archrandom.c      | 51 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 59 insertions(+)
 create mode 100644 arch/x86/kernel/archrandom.c

diff --git a/arch/x86/include/asm/archrandom.h b/arch/x86/include/asm/archrandom.h
index 69f1366..5611c21 100644
--- a/arch/x86/include/asm/archrandom.h
+++ b/arch/x86/include/asm/archrandom.h
@@ -117,6 +117,12 @@ GET_SEED(arch_get_random_seed_int, unsigned int, RDSEED_INT, ASM_NOP4);
 #define arch_has_random()	static_cpu_has(X86_FEATURE_RDRAND)
 #define arch_has_random_seed()	static_cpu_has(X86_FEATURE_RDSEED)
 
+#define __HAVE_ARCH_RNG_INIT
+extern void arch_rng_init(void *ctx,
+			  void (*seed)(void *ctx, u32 data),
+			  int bits_per_source,
+			  const char *log_prefix);
+
 #else
 
 static inline int rdrand_long(unsigned long *v)
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 047f9ff..0718bae 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -92,6 +92,8 @@ obj-$(CONFIG_PARAVIRT)		+= paravirt.o paravirt_patch_$(BITS).o
 obj-$(CONFIG_PARAVIRT_SPINLOCKS)+= paravirt-spinlocks.o
 obj-$(CONFIG_PARAVIRT_CLOCK)	+= pvclock.o
 
+obj-$(CONFIG_ARCH_RANDOM)	+= archrandom.o
+
 obj-$(CONFIG_PCSPKR_PLATFORM)	+= pcspeaker.o
 
 obj-$(CONFIG_X86_CHECK_BIOS_CORRUPTION) += check.o
diff --git a/arch/x86/kernel/archrandom.c b/arch/x86/kernel/archrandom.c
new file mode 100644
index 0000000..e8d2ffb
--- /dev/null
+++ b/arch/x86/kernel/archrandom.c
@@ -0,0 +1,51 @@
+/*
+ * This file is part of the Linux kernel.
+ *
+ * Copyright (c) 2014 Andy Lutomirski
+ * Authors: Andy Lutomirski <l...@amacapital.net>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+
+#include <asm/archrandom.h>
+
+void arch_rng_init(void *ctx,
+		   void (*seed)(void *ctx, u32 data),
+		   int bits_per_source,
+		   const char *log_prefix)
+{
+	int i;
+	int rdseed_bits = 0, rdrand_bits = 0;
+	char buf[128] = "";
+	char *msgptr = buf;
+
+	for (i = 0; i < bits_per_source; i += 8 * sizeof(long)) {
+		unsigned long rv;
+
+		if (arch_get_random_seed_long(&rv))
+			rdseed_bits += 8 * sizeof(rv);
+		else if (arch_get_random_long(&rv))
+			rdrand_bits += 8 * sizeof(rv);
+		else
+			continue;	/* Don't waste time mixing. */
+
+		seed(ctx, (u32)rv);
+#if BITS_PER_LONG > 32
+		seed(ctx, (u32)(rv >> 32));
+#endif
+	}
+
+	if (rdseed_bits)
+		msgptr += sprintf(msgptr, ", %d bits from RDSEED", rdseed_bits);
+	if (rdrand_bits)
+		msgptr += sprintf(msgptr, ", %d bits from RDRAND", rdrand_bits);
+	if (buf[0])
+		pr_info("%s with %s\n", log_prefix, buf + 2);
+}
-- 
1.9.3
[PATCH v6 3/7] random: Reseed pools on resume
After a suspend/resume cycle, and especially after hibernating, we
should assume that the random pools might have leaked.  To minimize
the risk this poses, try to collect fresh architectural entropy on
resume.

Signed-off-by: Andy Lutomirski <l...@amacapital.net>
---
 drivers/char/random.c | 26 +++++++++++++++++++++++---
 1 file changed, 23 insertions(+), 3 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 8dc3e3a..0811ad4 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -257,6 +257,7 @@
 #include <linux/kmemcheck.h>
 #include <linux/workqueue.h>
 #include <linux/irq.h>
+#include <linux/syscore_ops.h>
 
 #include <asm/processor.h>
 #include <asm/uaccess.h>
@@ -1279,6 +1280,26 @@ static void init_std_data(struct entropy_store *r)
 	mix_pool_bytes(r, utsname(), sizeof(*(utsname())), NULL);
 }
 
+static void init_all_pools(void)
+{
+	init_std_data(&input_pool);
+	init_std_data(&blocking_pool);
+	init_std_data(&nonblocking_pool);
+}
+
+static void random_resume(void)
+{
+	/*
+	 * After resume (and especially after hibernation / kexec resume),
+	 * make a best-effort attempt to collect fresh entropy.
+	 */
+	init_all_pools();
+}
+
+static struct syscore_ops random_syscore_ops = {
+	.resume = random_resume,
+};
+
 /*
  * Note that setup_arch() may call add_device_randomness()
  * long before we get here. This allows seeding of the pools
@@ -1291,9 +1312,8 @@ static void init_std_data(struct entropy_store *r)
  */
 static int rand_initialize(void)
 {
-	init_std_data(&input_pool);
-	init_std_data(&blocking_pool);
-	init_std_data(&nonblocking_pool);
+	init_all_pools();
+	register_syscore_ops(&random_syscore_ops);
 	return 0;
 }
 early_initcall(rand_initialize);
-- 
1.9.3
[PATCH v6 4/7] x86,kvm: Add MSR_KVM_GET_RNG_SEED and a matching feature bit
This adds a simple interface to allow a guest to request 64 bits of
host nonblocking entropy.  This is independent of virtio-rng for a
couple of reasons:

 - It's intended to be usable during early boot, when a trivial
   synchronous interface is needed.

 - virtio-rng gives blocking entropy, and making guest boot wait for
   the host's /dev/random will cause problems.

MSR_KVM_GET_RNG_SEED is intended to provide 64 bits of best-effort
cryptographically secure data for use as a seed.  It provides no
guarantee that the result contains any actual entropy.

Acked-by: Paolo Bonzini <pbonz...@redhat.com>
Signed-off-by: Andy Lutomirski <l...@amacapital.net>
---
 Documentation/virtual/kvm/cpuid.txt  | 3 +++
 arch/x86/include/uapi/asm/kvm_para.h | 2 ++
 arch/x86/kvm/cpuid.c                 | 3 ++-
 arch/x86/kvm/x86.c                   | 4 ++++
 4 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/Documentation/virtual/kvm/cpuid.txt b/Documentation/virtual/kvm/cpuid.txt
index 3c65feb..0ab043b 100644
--- a/Documentation/virtual/kvm/cpuid.txt
+++ b/Documentation/virtual/kvm/cpuid.txt
@@ -54,6 +54,9 @@ KVM_FEATURE_PV_UNHALT              ||     7 || guest checks this feature bit
                                    ||       || before enabling paravirtualized
                                    ||       || spinlock support.
 ------------------------------------------------------------------------------
+KVM_FEATURE_GET_RNG_SEED           ||     8 || host provides rng seed data via
+                                   ||       || MSR_KVM_GET_RNG_SEED.
+------------------------------------------------------------------------------
 KVM_FEATURE_CLOCKSOURCE_STABLE_BIT ||    24 || host will warn if no guest-side
                                    ||       || per-cpu warps are expected in
                                    ||       || kvmclock.
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 94dc8ca..e2eaf93 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -24,6 +24,7 @@
 #define KVM_FEATURE_STEAL_TIME		5
 #define KVM_FEATURE_PV_EOI		6
 #define KVM_FEATURE_PV_UNHALT		7
+#define KVM_FEATURE_GET_RNG_SEED	8
 
 /* The last 8 bits are used to indicate how to interpret the flags field
  * in pvclock structure. If no bits are set, all flags are ignored.
@@ -40,6 +41,7 @@
 #define MSR_KVM_ASYNC_PF_EN 0x4b564d02
 #define MSR_KVM_STEAL_TIME 0x4b564d03
 #define MSR_KVM_PV_EOI_EN 0x4b564d04
+#define MSR_KVM_GET_RNG_SEED 0x4b564d05
 
 struct kvm_steal_time {
 	__u64 steal;
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 38a0afe..40d6763 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -479,7 +479,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 			     (1 << KVM_FEATURE_ASYNC_PF) |
 			     (1 << KVM_FEATURE_PV_EOI) |
 			     (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) |
-			     (1 << KVM_FEATURE_PV_UNHALT);
+			     (1 << KVM_FEATURE_PV_UNHALT) |
+			     (1 << KVM_FEATURE_GET_RNG_SEED);
 
 		if (sched_info_on())
 			entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ef432f8..695b682 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -48,6 +48,7 @@
 #include <linux/pci.h>
 #include <linux/timekeeper_internal.h>
 #include <linux/pvclock_gtod.h>
+#include <linux/random.h>
 #include <trace/events/kvm.h>
 
 #define CREATE_TRACE_POINTS
@@ -2480,6 +2481,9 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
 	case MSR_KVM_PV_EOI_EN:
 		data = vcpu->arch.pv_eoi.msr_val;
 		break;
+	case MSR_KVM_GET_RNG_SEED:
+		get_random_bytes(&data, sizeof(data));
+		break;
 	case MSR_IA32_P5_MC_ADDR:
 	case MSR_IA32_P5_MC_TYPE:
 	case MSR_IA32_MCG_CAP:
-- 
1.9.3
[PATCH v6 2/7] random, timekeeping: Collect timekeeping entropy in the timekeeping code
Currently, init_std_data calls ktime_get_real().  This imposes
awkward constraints on when init_std_data can be called, and
init_std_data is unlikely to collect the full unpredictable data
available to the timekeeping code, especially after resume.

Remove this code from random.c and add the appropriate
add_device_randomness calls to timekeeping.c instead.

Cc: John Stultz <john.stu...@linaro.org>
Signed-off-by: Andy Lutomirski <l...@amacapital.net>
---
 drivers/char/random.c     |  2 --
 kernel/time/timekeeping.c | 11 +++++++++++
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 7673e60..8dc3e3a 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1263,12 +1263,10 @@ static void seed_entropy_store(void *ctx, u32 data)
 static void init_std_data(struct entropy_store *r)
 {
 	int i;
-	ktime_t now = ktime_get_real();
 	unsigned long rv;
 	char log_prefix[128];
 
 	r->last_pulled = jiffies;
-	mix_pool_bytes(r, &now, sizeof(now), NULL);
 	for (i = r->poolinfo->poolbytes; i > 0; i -= sizeof(rv)) {
 		rv = random_get_entropy();
 		mix_pool_bytes(r, &rv, sizeof(rv), NULL);
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 32d8d6a..9609db9 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -23,6 +23,7 @@
 #include <linux/stop_machine.h>
 #include <linux/pvclock_gtod.h>
 #include <linux/compiler.h>
+#include <linux/random.h>
 
 #include "tick-internal.h"
 #include "ntp_internal.h"
@@ -835,6 +836,9 @@ void __init timekeeping_init(void)
 	memcpy(&shadow_timekeeper, &timekeeper, sizeof(timekeeper));
 
 	write_seqcount_end(&timekeeper_seq);
+
+	add_device_randomness(tk, sizeof(tk));
+
 	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
 }
 
@@ -976,6 +980,13 @@ static void timekeeping_resume(void)
 	timekeeping_suspended = 0;
 	timekeeping_update(tk, TK_MIRROR | TK_CLOCK_WAS_SET);
 	write_seqcount_end(&timekeeper_seq);
+
+	/*
+	 * The timekeeping state has a decent chance of differing
+	 * between resumptions of the same image.
+	 */
+	add_device_randomness(tk, sizeof(tk));
+
 	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
 
 	touch_softlockup_watchdog();
-- 
1.9.3
[PATCH v6 1/7] random: Add and use arch_rng_init
Currently, init_std_data contains its own logic for using arch
random sources.  This replaces that logic with a generic function
arch_rng_init that allows arch code to supply its own logic.  The
default implementation tries arch_get_random_seed_long and
arch_get_random_long individually.

The only functional change here is that random_get_entropy() is used
unconditionally instead of being used only when the arch sources
fail.  This may add a tiny amount of security.

Acked-by: Theodore Ts'o <ty...@mit.edu>
Signed-off-by: Andy Lutomirski <l...@amacapital.net>
---
 drivers/char/random.c  | 14 +++++++++++---
 include/linux/random.h | 40 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 51 insertions(+), 3 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 71529e1..7673e60 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1246,6 +1246,10 @@ void get_random_bytes_arch(void *buf, int nbytes)
 }
 EXPORT_SYMBOL(get_random_bytes_arch);
 
+static void seed_entropy_store(void *ctx, u32 data)
+{
+	mix_pool_bytes((struct entropy_store *)ctx, &data, sizeof(data), NULL);
+}
 
 /*
  * init_std_data - initialize pool with system data
@@ -1261,15 +1265,19 @@ static void init_std_data(struct entropy_store *r)
 	int i;
 	ktime_t now = ktime_get_real();
 	unsigned long rv;
+	char log_prefix[128];
 
 	r->last_pulled = jiffies;
 	mix_pool_bytes(r, &now, sizeof(now), NULL);
 	for (i = r->poolinfo->poolbytes; i > 0; i -= sizeof(rv)) {
-		if (!arch_get_random_seed_long(&rv) &&
-		    !arch_get_random_long(&rv))
-			rv = random_get_entropy();
+		rv = random_get_entropy();
 		mix_pool_bytes(r, &rv, sizeof(rv), NULL);
 	}
+
+	sprintf(log_prefix, "random: seeded %s pool", r->name);
+	arch_rng_init(r, seed_entropy_store, 8 * r->poolinfo->poolbytes,
+		      log_prefix);
+
 	mix_pool_bytes(r, utsname(), sizeof(*(utsname())), NULL);
 }
 
diff --git a/include/linux/random.h b/include/linux/random.h
index 57fbbff..c8d692e 100644
--- a/include/linux/random.h
+++ b/include/linux/random.h
@@ -106,6 +106,46 @@ static inline int arch_has_random_seed(void)
 }
 #endif
 
+#ifndef __HAVE_ARCH_RNG_INIT
+
+/**
+ * arch_rng_init() - get architectural rng seed data
+ * @ctx: context for the seed function
+ * @seed: function to call for each u32 obtained
+ * @bits_per_source: number of bits from each source to try to use
+ * @log_prefix: beginning of log output (may be NULL)
+ *
+ * Synchronously load some architectural entropy or other best-effort
+ * random seed data.  An arch-specific implementation should be no worse
+ * than this generic implementation.  If the arch code does something
+ * interesting, it may log something of the form "log_prefix with
+ * 8 bits of stuff".
+ *
+ * No arch-specific implementation should be any worse than the generic
+ * implementation.
+ */
+static inline void arch_rng_init(void *ctx,
+				 void (*seed)(void *ctx, u32 data),
+				 int bits_per_source,
+				 const char *log_prefix)
+{
+	int i;
+
+	for (i = 0; i < bits_per_source; i += 8 * sizeof(long)) {
+		unsigned long rv;
+
+		if (arch_get_random_seed_long(&rv) ||
+		    arch_get_random_long(&rv)) {
+			seed(ctx, (u32)rv);
+#if BITS_PER_LONG > 32
+			seed(ctx, (u32)(rv >> 32));
+#endif
+		}
+	}
+}
+
+#endif /* __HAVE_ARCH_RNG_INIT */
+
 /* Pseudo random number generator from numerical recipes. */
 static inline u32 next_pseudo_random32(u32 seed)
 {
-- 
1.9.3
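[Editorial sketch, not part of the patch.] The callback shape that arch_rng_init introduces can be tried outside the kernel. The following is a minimal userspace sketch under stated assumptions: `word_source_fn`, `counter_source`, `failing_source`, `count_seed`, and `rng_init_sketch` are hypothetical names invented for illustration, and the "source" is a plain counter rather than RDSEED/RDRAND. It only shows how each machine word is split into u32 pieces and handed to the seed callback, and how a failing source is skipped.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for arch_get_random_long(): true on success. */
typedef bool (*word_source_fn)(unsigned long *out);

/* Same shape as the patch's seed(ctx, u32 data) callback. */
typedef void (*seed_fn)(void *ctx, uint32_t data);

struct seed_stats {
	unsigned int words;	/* number of 32-bit words received */
	uint32_t xor_mix;	/* toy "pool": XOR of everything received */
};

/* Seed callback that just counts and XORs what it is given. */
static void count_seed(void *ctx, uint32_t data)
{
	struct seed_stats *s = ctx;

	s->words++;
	s->xor_mix ^= data;
}

/* Illustrative source: a counter, NOT random data. */
static bool counter_source(unsigned long *out)
{
	static unsigned long next = 1;

	*out = next++;
	return true;
}

/* Illustrative source that always fails, like a CPU without RDRAND. */
static bool failing_source(unsigned long *out)
{
	(void)out;
	return false;
}

/* Request bits_per_source bits, one machine word at a time. */
static void rng_init_sketch(void *ctx, seed_fn seed, int bits_per_source,
			    word_source_fn source)
{
	int i;

	for (i = 0; i < bits_per_source; i += (int)(CHAR_BIT * sizeof(long))) {
		unsigned long rv;

		if (!source(&rv))
			continue;	/* nothing to mix for this word */

		seed(ctx, (uint32_t)rv);
#if ULONG_MAX > 0xffffffffUL
		seed(ctx, (uint32_t)(rv >> 32));	/* upper half on 64-bit longs */
#endif
	}
}
```

With a source that always succeeds, a request for 256 bits delivers eight 32-bit words regardless of whether `long` is 32 or 64 bits wide; with `failing_source` the callback is never invoked.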
Re: [PATCH v6 2/7] random, timekeeping: Collect timekeeping entropy in the timekeeping code
On Wed, Aug 13, 2014 at 10:43 PM, Andy Lutomirski <l...@amacapital.net> wrote:
> Currently, init_std_data calls ktime_get_real().  This imposes
> awkward constraints on when init_std_data can be called, and
> init_std_data is unlikely to collect the full unpredictable data
> available to the timekeeping code, especially after resume.
>
> Remove this code from random.c and add the appropriate
> add_device_randomness calls to timekeeping.c instead.

*sigh* this is buggy:

> +       add_device_randomness(tk, sizeof(tk));

sizeof(*tk)

> +       add_device_randomness(tk, sizeof(tk));

ditto.

I'll fix this for v7, but I'll wait awhile for other comments to reduce spam.

--Andy
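[Editorial sketch, not part of the thread.] The bug called out above is the usual pointer-versus-pointee `sizeof` mix-up: since `tk` is a pointer, `sizeof(tk)` is the size of the pointer, so only the first few bytes of the structure would be mixed in. A small userspace illustration follows, assuming a made-up `struct fake_timekeeper` and a recording stub `fake_add_device_randomness` in place of the kernel function; neither is the real kernel definition.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct timekeeper. */
struct fake_timekeeper {
	uint64_t xtime_sec;
	uint64_t cycle_last;
	uint32_t mult;
	uint32_t shift;
	uint64_t raw_ns[4];
};

/* Remembers how many bytes the last caller asked to mix. */
static size_t mixed_bytes;

/* Stub with the same (buf, size) shape as add_device_randomness(). */
static void fake_add_device_randomness(const void *buf, size_t size)
{
	(void)buf;
	mixed_bytes = size;
}

/* As posted in v6: sizeof(tk) is the size of the pointer. */
static size_t mix_buggy(const struct fake_timekeeper *tk)
{
	fake_add_device_randomness(tk, sizeof(tk));
	return mixed_bytes;
}

/* As proposed for v7: sizeof(*tk) covers the whole structure. */
static size_t mix_fixed(const struct fake_timekeeper *tk)
{
	fake_add_device_randomness(tk, sizeof(*tk));
	return mixed_bytes;
}
```

On a typical 64-bit build the buggy variant mixes 8 bytes while the fixed one mixes the full structure; the exact numbers depend on the platform and on this illustrative struct layout.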
GET_RNG_SEED hypercall ABI? (Re: [PATCH v5 0/5] random,x86,kvm: Rework arch RNG seeds and get some from kvm)
hpa pointed out that the ABI that I chose (an MSR from the KVM range
and a KVM cpuid bit) is unnecessarily KVM-specific.  It would be nice
to allocate an MSR that everyone involved can agree on and, rather
than relying on a cpuid bit, just have the guest probe for the MSR.

This leads to a few questions:

1. How do we allocate an MSR?  (For background, this would be an MSR
that either returns 64 bits of best-effort cryptographically secure
random data or fails with #GP.)

2. For KVM, what's the right way to allow QEMU to turn the feature on
and off?  Is this even necessary?  KVM currently doesn't seem to allow
QEMU to turn any of its MSRs off; it just allows QEMU to ask it to
stop advertising support.

3. QEMU people, can you please fix your RDMSR emulation to send #GP on
failure?  I can work around it for this MSR in the Linux code, but for
Pete's sake... :(

Thanks,
Andy
Re: GET_RNG_SEED hypercall ABI? (Re: [PATCH v5 0/5] random,x86,kvm: Rework arch RNG seeds and get some from kvm)
On Aug 28, 2014 7:17 AM, Gleb Natapov <g...@kernel.org> wrote:
>
> On Tue, Aug 26, 2014 at 04:58:34PM -0700, Andy Lutomirski wrote:
> > hpa pointed out that the ABI that I chose (an MSR from the KVM range
> > and a KVM cpuid bit) is unnecessarily KVM-specific.  It would be nice
> > to allocate an MSR that everyone involved can agree on and, rather
> > than relying on a cpuid bit, just have the guest probe for the MSR.
> >
> CPUID part allows feature to be disabled for machine compatibility purpose
> during migration. Of course interface can explicitly state that one successful
> use of the MSR does not mean that next use will not result in a #GP, but that
> doesn't sound very elegant and is different from any other MSR out there.
>

Is there a non-cpuid interface between QEMU and KVM for this?

AFAICT, even turning off cpuid bits for things like async pf doesn't
actually disable the MSRs (which is arguably an attack surface issue).

--Andy

> --
>         Gleb.
Re: GET_RNG_SEED hypercall ABI? (Re: [PATCH v5 0/5] random,x86,kvm: Rework arch RNG seeds and get some from kvm)
On Thu, Aug 28, 2014 at 12:46 PM, Paolo Bonzini <pbonz...@redhat.com> wrote:
> On 28/08/2014 18:22, Andy Lutomirski wrote:
>> Is there a non-cpuid interface between QEMU and KVM for this?
>
> No.

Hmm.  Then, assuming that someone manages to allocate a
cross-hypervisor MSR number for this, what am I supposed to do in the
KVM code?  Just make it available unconditionally?  I don't see why
that wouldn't work reliably, but it seems like an odd design.

>> AFAICT, even turning off cpuid bits for things like async pf doesn't
>> actually disable the MSRs (which is arguably an attack surface issue).
>
> No, it doesn't.  You cannot disable instructions even if you hide CPUID
> bits, so KVM just extends this to MSRs (both native and paravirtual).
> It sometimes helps too, for example with a particular guest OS that does
> not necessarily check CPUID for bits that are always present on Apple
> hardware...

But I bet that no one assumes that KVM paravirt MSRs are available
even if the feature bit isn't set.  Also, the one and only native
feature flag I tested (rdtscp) actually does work: RDTSCP seems to
send #UD if QEMU is passed -cpu host,-rdtscp.

--Andy
Standardizing an MSR or other hypercall to get an RNG seed?
Hi all-

I would like to standardize on a very simple protocol by which a guest
OS can obtain an RNG seed early in boot.

The main design requirements are:

 - The interface should be very easy to use.  Linux, at least, will
   want to use it extremely early in boot as part of kernel ASLR.  This
   means that PCI and ACPI will not work.

 - It should be synchronous.  We don't want to delay boot while waiting
   for a slow host RNG.  (On Linux, at least, we have a separate
   interface for that: virtio-rng.  I think that Windows has some
   support for virtio-rng as well.)

 - Random numbers obtained through this interface should be
   best-effort.  We want the best quality randomness that the host can
   provide immediately.

It seems to me that the best interface for the actual request for a
random number is rdmsr.  This is supported on all hypervisors and all
virtualization technologies.  It can return a 64 bit random number,
and it is easy to rdmsr the same register more than once to get a
larger random number.

The main questions are what MSR index to use and how to detect the
presence of the MSR.  I've played with two approaches:

1. Use CPUID to detect the presence of this feature.  This is very
easy for KVM to implement by using a KVM-specific CPUID feature.  The
problem is that this will necessarily be KVM-specific, as the guest
must first probe for KVM and then probe for the KVM feature.  I doubt
that Hyper-V, for example, wants to claim to be KVM.  If we could
standardize a non-hypervisor-specific CPUID feature, then this problem
would go away.

2. Detect the existence of the MSR by trying to read it and handling
the #GP(0) that will occur if the MSR is not present.  Linux, at
least, is okay with doing this, and I have code to enable an IDT and
an rdmsr fixup early enough in boot to use it for ASLR.  I don't know
whether other operating systems can do this, though.

The major questions, then, are what enumeration mechanism should be
used and what MSR index should be used.

For the MSR index, we could use an MSR from the Intel range if Intel
were to give explicit approval, thus guaranteeing that nothing would
conflict.  Or we could try to agree on an MSR index in the
0x4000-0x4fff range that is unlikely to conflict with anything.

For enumeration, we could just probe the MSR if all relevant guests
are okay with this or we could standardize on a CPUID-based mechanism.
If we do the latter, I don't know what that mechanism would be.

NB: This thread will be cc'd to Microsoft and possibly Hyper-V people
shortly.  I very much appreciate Jun Nakajima's help with this!

Thanks,
Andy

-- 
Andy Lutomirski
AMA Capital Management, LLC
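[Editorial sketch, not part of the thread.] The proposal above, reading the same 64-bit register repeatedly to build a larger seed and treating a faulting read as "not present", can be modelled in userspace. Everything here is an assumption made for illustration: `seed_read_fn` models a probing read that returns false where a real rdmsr would take #GP, and `fake_seed_present` returns a fixed pattern, not random data. No real MSR is touched.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical model of one probing read of the seed register:
 * returns true and fills *out on success, false where a real
 * rdmsr would fault because the MSR is absent.
 */
typedef bool (*seed_read_fn)(uint64_t *out);

/*
 * Fill buf with up to len bytes, 64 bits per read.  Stops at the
 * first failed read, on the assumption that a missing MSR stays
 * missing.  Returns the number of bytes actually obtained.
 */
static size_t gather_seed(uint8_t *buf, size_t len, seed_read_fn read_seed)
{
	size_t done = 0;

	while (done < len) {
		uint64_t v;
		size_t chunk = len - done < sizeof(v) ? len - done : sizeof(v);

		if (!read_seed(&v))
			break;

		memcpy(buf + done, &v, chunk);
		done += chunk;
	}
	return done;
}

/* Counts reads so the cost per seed is visible. */
static unsigned int fake_reads;

/* Illustrative "host": a fixed pattern, NOT random data. */
static bool fake_seed_present(uint64_t *out)
{
	fake_reads++;
	*out = 0x0123456789abcdefULL * fake_reads;
	return true;
}

/* Illustrative host without the feature. */
static bool fake_seed_absent(uint64_t *out)
{
	(void)out;
	return false;
}
```

A 20-byte request costs three reads in this model (8 + 8 + 4 bytes), which is the per-seed exit cost raised later in the thread; against `fake_seed_absent` the caller gets zero bytes and has to fall back to other sources.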
Re: Standardizing an MSR or other hypercall to get an RNG seed?
On Thu, Sep 18, 2014 at 7:43 AM, H. Peter Anvin <h...@zytor.com> wrote:
> On 09/18/2014 07:40 AM, KY Srinivasan wrote:
>>>
>>> The main questions are what MSR index to use and how to detect the
>>> presence of the MSR.  I've played with two approaches:
>>>
>>> 1. Use CPUID to detect the presence of this feature.  This is very easy for
>>> KVM to implement by using a KVM-specific CPUID feature.  The problem is
>>> that this will necessarily be KVM-specific, as the guest must first probe for
>>> KVM and then probe for the KVM feature.  I doubt that Hyper-V, for example,
>>> wants to claim to be KVM.  If we could standardize a
>>> non-hypervisor-specific CPUID feature, then this problem would go away.
>>
>> We would prefer a CPUID feature bit to detect this feature.
>>
>
> I guess if we're introducing the concept of pan-OS MSRs we could also
> have pan-OS CPUID.  The real issue is to get a single non-conflicting
> standard.

Agreed.

KVM currently puts 0 in 0x4000.EAX, meaning that a feature bit in
Microsoft's leaf 0x4003 would probably not work well for KVM.  I
don't expect that Microsoft wants to start claiming to be KVM for the
purpose of using a KVM-style feature bit, so, if we went the CPUID
route, we would probably need something new.

--Andy

>
>         -hpa
>

-- 
Andy Lutomirski
AMA Capital Management, LLC
Re: Standardizing an MSR or other hypercall to get an RNG seed?
On Thu, Sep 18, 2014 at 8:38 AM, Andy Lutomirski <l...@amacapital.net> wrote:
> On Thu, Sep 18, 2014 at 7:43 AM, H. Peter Anvin <h...@zytor.com> wrote:
>> On 09/18/2014 07:40 AM, KY Srinivasan wrote:
>>>>
>>>> The main questions are what MSR index to use and how to detect the
>>>> presence of the MSR.  I've played with two approaches:
>>>>
>>>> 1. Use CPUID to detect the presence of this feature.  This is very easy for
>>>> KVM to implement by using a KVM-specific CPUID feature.  The problem is
>>>> that this will necessarily be KVM-specific, as the guest must first probe for
>>>> KVM and then probe for the KVM feature.  I doubt that Hyper-V, for example,
>>>> wants to claim to be KVM.  If we could standardize a
>>>> non-hypervisor-specific CPUID feature, then this problem would go away.
>>>
>>> We would prefer a CPUID feature bit to detect this feature.
>>>
>>
>> I guess if we're introducing the concept of pan-OS MSRs we could also
>> have pan-OS CPUID.  The real issue is to get a single non-conflicting
>> standard.
>
> Agreed.
>
> KVM currently puts 0 in 0x4000.EAX, meaning that a feature bit in
> Microsoft's leaf 0x4003 would probably not work well for KVM.  I
> don't expect that Microsoft wants to start claiming to be KVM for the
> purpose of using a KVM-style feature bit, so, if we went the CPUID
> route, we would probably need something new.

Slight correction: QEMU/KVM has optional support for Hyper-V feature
enumeration.  Ideally the RNG seed mechanism would be enabled by
default, but I don't know whether the QEMU maintainers would be okay
with enabling the Hyper-V cpuid mechanism in a default configuration.

--Andy

>
> --Andy
>
>>
>>         -hpa
>>
>
> --
> Andy Lutomirski
> AMA Capital Management, LLC

-- 
Andy Lutomirski
AMA Capital Management, LLC
Re: Standardizing an MSR or other hypercall to get an RNG seed?
On Thu, Sep 18, 2014 at 10:42 AM, Nakajima, Jun jun.nakaj...@intel.com wrote: On Thu, Sep 18, 2014 at 10:20 AM, KY Srinivasan k...@microsoft.com wrote: -Original Message- From: Paolo Bonzini [mailto:paolo.bonz...@gmail.com] On Behalf Of Paolo Bonzini Sent: Thursday, September 18, 2014 10:18 AM To: Nakajima, Jun; KY Srinivasan Cc: Mathew John; Theodore Ts'o; John Starks; kvm list; Gleb Natapov; Niels Ferguson; Andy Lutomirski; David Hepkin; H. Peter Anvin; Jake Oshins; Linux Virtualization Subject: Re: Standardizing an MSR or other hypercall to get an RNG seed? Il 18/09/2014 19:13, Nakajima, Jun ha scritto: In terms of the address for the MSR, I suggest that you choose one from the range between 4000H - 40FFH. The SDM (35.1 ARCHITECTURAL MSRS) says All existing and future processors will not implement any features using any MSR in this range. Hyper-V already defines many synthetic MSRs in this range, and I think it would be reasonable for you to pick one for this to avoid a conflict? KVM is not using any MSR in that range. However, I think it would be better to have the MSR (and perhaps CPUID) outside the hypervisor-reserved ranges, so that it becomes architecturally defined. In some sense it is similar to the HYPERVISOR CPUID feature. Yes, given that we want this to be hypervisor agnostic. Actually, that MSR address range has been reserved for that purpose, along with: - CPUID.EAX=1 - ECX bit 31 (always returns 0 on bare metal) - CPUID.EAX=4000_00xxH leaves (i.e. HYPERVISOR CPUID) I don't know whether this is documented anywhere, but Linux tries to detect a hypervisor by searching CPUID leaves 0x400xyz00 for KVMKVMKVM\0\0\0, so at least Linux can handle the KVM leaves being in a somewhat variable location. Do we consider this mechanism to work across all hypervisors and guests? That is, could we put something like CrossHVPara\0 somewhere in that range, where each hypervisor would be free to decide exactly where it ends up? 
--Andy
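The signature search described above can be sketched roughly as follows. This is only an illustration, not the Linux implementation: the `cpuid_fn` indirection, the `find_hv_leaf` name, and the simulated `fake_cpuid` backend are all hypothetical, and the scan bounds passed by the caller are assumptions. It also skips the CPUID.1:ECX bit 31 check that a real guest would presumably do first.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* One CPUID result: the four registers the instruction writes. */
struct cpuid_regs { uint32_t eax, ebx, ecx, edx; };

/* Hypothetical indirection: a real guest would execute CPUID here. */
typedef struct cpuid_regs (*cpuid_fn)(uint32_t leaf);

/* Scan [first, last) in steps of 0x100 for a 12-byte signature held in
 * EBX, ECX, EDX. Returns the matching base leaf, or 0 if none matches.
 * The caller must keep last small enough that base cannot wrap around. */
static uint32_t find_hv_leaf(cpuid_fn cpuid, const char sig[12],
                             uint32_t first, uint32_t last)
{
    assert(cpuid != NULL && sig != NULL);
    for (uint32_t base = first; base < last; base += 0x100) {
        struct cpuid_regs r = cpuid(base);
        uint32_t words[3] = { r.ebx, r.ecx, r.edx };
        if (memcmp(words, sig, 12) == 0)
            return base;
    }
    return 0;
}

/* Simulated hypervisor, for demonstration only: it answers with the KVM
 * signature at leaf 0x40000100 and with zeros everywhere else. */
static struct cpuid_regs fake_cpuid(uint32_t leaf)
{
    struct cpuid_regs r = { 0, 0, 0, 0 };
    if (leaf == 0x40000100u) {
        uint32_t words[3];
        memcpy(words, "KVMKVMKVM\0\0\0", 12);
        r.eax = 0x40000101u;
        r.ebx = words[0];
        r.ecx = words[1];
        r.edx = words[2];
    }
    return r;
}
```

Calling `find_hv_leaf(fake_cpuid, "KVMKVMKVM\0\0\0", 0x40000000u, 0x40010000u)` finds the simulated KVM leaf at 0x40000100, while a search for "CrossHVPara\0" against the same backend returns 0, since nothing implements that signature.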
Re: Standardizing an MSR or other hypercall to get an RNG seed?
On Thu, Sep 18, 2014 at 11:54 AM, Niels Ferguson ni...@microsoft.com wrote:
> Defining a standard way of transferring random numbers between the host and the guest is an excellent idea. As the person who writes the RNG code in Windows, I have a few comments:
>
> DETECTION: It should be possible to detect this feature through CPUID or similar mechanism. That allows the code that uses this feature to be written without needing the ability to catch CPU exceptions. I could be wrong, but as far as I know there is no support for exception handling in the Windows OS loader where we gather our initial random state.

Linux is like this, too, except that I have experimental code to create an IDT in that code, so we can handle it. I agree, though, that using CPUID in early boot is easier.

> EFFICIENCY: Is there a way we can transfer more bytes per interaction? With a single 64-bit MSR we always need multiple reads to get a seed, and each of them results in a context switch to the host, which is expensive. This is even worse for 32-bit guests. Windows would typically need to fetch 64 bytes of random data at boot and at regular intervals. It is not a show-stopper, but better efficiency would be nice.

I thought about this for a while and didn't come up with anything that wouldn't be messy. We could fudge the MSR rax/rdx high bits to get 128 bits, but that's nonportable and awful to implement. We could return a random number directly from CPUID, but that's weird.

In very informal benchmarking, rdmsr wasn't that bad. On the other hand, I wasn't immediately planning on using the MSR on an ongoing basis on Linux guests except after suspend/resume.

> GUEST-TO-HOST: Can we also define a way to have random values flow from the guest to the host? Guests are also gathering entropy from their own sources, and if we allow the guests to send random data to the host, then the host can treat it as an entropy source and all the VMs on a single host can share their entropy. (This is not a security problem; any reasonable host RNG cannot be hurt even by maliciously chosen entropy inputs.)

wrmsr on the same MSR?

> I don't know much about how hypervisors work on the inside, but maybe we can define a mechanism for standardized hypervisor calls that work on all hypervisors that support this feature. Then we could define a function to do an entropy exchange: the guest provides N bytes of random data to the host, and the host replies with N bytes of random data. The data exchange can now be done through memory.
>
> A standardized hypervisor-call mechanism also seems generally useful for future features, whereas the MSR solution is very limited in what it can do. We might end up with standardized hypervisor-calls in the future for some other reason, and then the MSR solution looks very odd.

I think there'll be resistance to a standardized hypercall mechanism, just because the implementations tend to be complex. Hyper-V uses a special page in guest physical memory that contains a trampoline.

We could use wrmsr to a register where the payload is a pointer to a buffer to receive random bytes, but that loses some of the simplicity of just calling rdmsr a few times.

--Andy
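To make the efficiency concern concrete, here is a minimal sketch of gathering a seed eight bytes at a time. The `seed_read_fn` callback stands in for an RDMSR on the proposed seed MSR, and both it and `fill_seed` are hypothetical names; `fake_seed_read` is a plain counter used only so the sketch runs anywhere, not a random source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns one 64-bit value per call. Stands in for RDMSR on the
 * (hypothetical) seed MSR; each call models one exit to the host. */
typedef uint64_t (*seed_read_fn)(void);

/* Fill buf with len seed bytes, at most 8 bytes per read. Returns the
 * number of reads issued, i.e. the number of guest-to-host transitions. */
static size_t fill_seed(seed_read_fn read_seed, uint8_t *buf, size_t len)
{
    size_t reads = 0;

    assert(read_seed != NULL && buf != NULL);
    while (len > 0) {
        uint64_t v = read_seed();
        size_t n = len < sizeof v ? len : sizeof v;

        memcpy(buf, &v, n);
        buf += n;
        len -= n;
        reads++;
    }
    return reads;
}

/* Demonstration backend only: a counter, NOT a source of randomness. */
static uint64_t fake_seed_read(void)
{
    static uint64_t next = 1;
    return next++;
}
```

Under these assumptions, the 64 bytes that Windows would typically fetch cost eight reads, and therefore eight host transitions, per refill.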
Re: Standardizing an MSR or other hypercall to get an RNG seed?
On Thu, Sep 18, 2014 at 11:58 AM, Paolo Bonzini pbonz...@redhat.com wrote:
>>> Actually, that MSR address range has been reserved for that purpose, along with:
>>> - CPUID.EAX=1 - ECX bit 31 (always returns 0 on bare metal)
>>> - CPUID.EAX=4000_00xxH leaves (i.e. HYPERVISOR CPUID)
>>
>> I don't know whether this is documented anywhere, but Linux tries to detect a hypervisor by searching CPUID leaves 0x400xyz00 for KVMKVMKVM\0\0\0, so at least Linux can handle the KVM leaves being in a somewhat variable location.
>>
>> Do we consider this mechanism to work across all hypervisors and guests? That is, could we put something like CrossHVPara\0 somewhere in that range, where each hypervisor would be free to decide exactly where it ends up?
>
> That's also possible, but extending the hypervisor CPUID range beyond 400000FFH is not officially sanctioned by Intel. Xen started doing that in order to expose both Hyper-V and Xen CPUID leaves, and KVM followed the practice.

Whoops. Might Intel be willing to extend that range to 0x40000000 - 0x400fffff? And would Microsoft be okay with using this mechanism for discovery?

Do we have anyone from VMware in this thread? I don't have any VMware contacts.

--Andy
Re: Standardizing an MSR or other hypercall to get an RNG seed?
On Thu, Sep 18, 2014 at 2:21 PM, Nakajima, Jun jun.nakaj...@intel.com wrote:
> On Thu, Sep 18, 2014 at 12:07 PM, Andy Lutomirski l...@amacapital.net wrote:
>> Might Intel be willing to extend that range to 0x40000000 - 0x400fffff? And would Microsoft be okay with using this mechanism for discovery?
>
> So, for CPUID, the SDM (Table 3-17. Information Returned by CPUID) says today: "No existing or future CPU will return processor identification or feature information if the initial EAX value is in the range 40000000H to 4FFFFFFFH." We can define a cross-VM CPUID range from there. The CPUID can return the index of the MSR if needed.

Right, sorry. I was looking at this sentence in SDM Volume 3 Section 35.1: "MSR address range between 40000000H - 400000FFH is marked as a specially reserved range. All existing and future processors will not implement any features using any MSR in this range."

That's not really a large enough range for us to reserve an MSR for this. However, KVM is already using MSRs outside that range: it uses 0x4b564d00-0x4b564d04 or so. I wonder whether KVM got confused by the differing ranges for CPUID leaves and MSR indices.

Any chance that Intel could reserve a larger range to include the KVM MSRs? It would also be easier if the MSR indices for cross-HV features were constants.

Thanks,
Andy
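The mismatch between the reserved MSR window and the KVM MSR indices can be checked with a few lines of C. This is a minimal sketch that takes the reserved window as 40000000H to 400000FFH, as quoted from the SDM in this thread; the macro and function names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Reserved MSR window as quoted from the SDM in this thread. */
#define HV_MSR_FIRST 0x40000000u
#define HV_MSR_LAST  0x400000FFu

static_assert(HV_MSR_LAST >= HV_MSR_FIRST, "reserved MSR window is empty");

/* True if msr lies inside the reserved window. */
static bool msr_in_reserved_range(uint32_t msr)
{
    return msr >= HV_MSR_FIRST && msr <= HV_MSR_LAST;
}

/* Number of MSR indices the window provides. */
static uint32_t reserved_msr_count(void)
{
    return HV_MSR_LAST - HV_MSR_FIRST + 1;
}
```

With these bounds the window holds 256 indices, and 0x4b564d00, the first of the KVM MSRs mentioned above, falls outside it.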
Re: Standardizing an MSR or other hypercall to get an RNG seed?
On Thu, Sep 18, 2014 at 2:46 PM, David Hepkin david...@microsoft.com wrote:
> I'm not sure what you mean by "this mechanism"? Are you suggesting that each hypervisor put CrossHVPara\0 somewhere in the 0x40000000 - 0x400fffff CPUID range, and an OS has to do a full scan of this CPUID range on boot to find it? That seems pretty inefficient. An OS will take 1000's of hypervisor intercepts on every boot just to search this CPUID range.

Linux already does this, which is arguably unfortunate. But it's not quite that bad; the KVM and Xen code is only scanning at increments of 0x100.

I think that Linux as a guest would have no problem with checking the Hyper-V range or some new range. I don't think that Linux would want to have to set a guest OS identity, and it's not entirely clear to me whether this would be necessary to use the Hyper-V mechanism.

> I suggest we come to consensus on a specific CPUID leaf where an OS needs to look to determine if a hypervisor supports this capability. We could define a new CPUID leaf range at a well-defined location, or we could just use one of the existing CPUID leaf ranges implemented by an existing hypervisor. I'm not familiar with the KVM CPUID leaf range, but in the case of Hyper-V, the Hyper-V CPUID leaf range was architected to allow for other hypervisors to implement it and just show through specific capabilities supported by the hypervisor. So, we could define a bit in the Hyper-V CPUID leaf range (since Xen and KVM also implement this range), but that would require Linux to look in that range on boot to discover this capability.

I also don't know whether QEMU and KVM would be okay with implementing the host side of the Hyper-V mechanism by default. They would have to implement at least leaves 0x40000001 and 0x40000002, plus correctly reporting zeros through whatever leaf is used for this new feature. Gleb? Paolo?
--Andy
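The cost of scanning depends heavily on the stride, which a little arithmetic makes visible. The sketch below counts CPUID probes for an inclusive leaf range; `probe_count` is a hypothetical helper, and the range is taken here as 0x40000000 to 0x400fffff, the extended range discussed in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Number of CPUID probes needed to visit every stride-th leaf in the
 * inclusive range [first, last], starting at first. Each probe is one
 * hypervisor intercept. */
static uint32_t probe_count(uint32_t first, uint32_t last, uint32_t stride)
{
    assert(stride != 0 && last >= first);
    return (last - first) / stride + 1;
}
```

Over that range, `probe_count(0x40000000u, 0x400fffffu, 0x100u)` gives 4096 probes, whereas visiting every leaf (stride 1) gives 1048576. A scanner that stops at the first matching signature, or that covers a narrower range, would issue fewer.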
Re: Standardizing an MSR or other hypercall to get an RNG seed?
On Thu, Sep 18, 2014 at 2:57 PM, H. Peter Anvin h...@zytor.com wrote:
> On 09/18/2014 02:46 PM, David Hepkin wrote:
>> I'm not sure what you mean by "this mechanism"? Are you suggesting that each hypervisor put CrossHVPara\0 somewhere in the 0x40000000 - 0x400fffff CPUID range, and an OS has to do a full scan of this CPUID range on boot to find it? That seems pretty inefficient. An OS will take 1000's of hypervisor intercepts on every boot just to search this CPUID range.
>>
>> I suggest we come to consensus on a specific CPUID leaf where an OS needs to look to determine if a hypervisor supports this capability. We could define a new CPUID leaf range at a well-defined location, or we could just use one of the existing CPUID leaf ranges implemented by an existing hypervisor. I'm not familiar with the KVM CPUID leaf range, but in the case of Hyper-V, the Hyper-V CPUID leaf range was architected to allow for other hypervisors to implement it and just show through specific capabilities supported by the hypervisor. So, we could define a bit in the Hyper-V CPUID leaf range (since Xen and KVM also implement this range), but that would require Linux to look in that range on boot to discover this capability.
>
> Yes, I would agree that if anything we should define a new range unique to this cross-VM interface, e.g. 0x48000000.

So, as a concrete straw-man:

CPUID leaf 0x48000000 would return a maximum leaf number in EAX (e.g. 0x48000001) along with a signature value (e.g. CrossHVPara\0) in EBX, ECX, and EDX.

CPUID 0x48000001.EAX would contain an MSR number to read to get a random number if supported and zero if not supported.

Questions:

1. Can we use a fixed MSR number? This would be a little bit simpler, but it would depend on getting a wider MSR range from Intel.

2. Who would host and maintain such a spec? I could do it on github, but this seems a bit silly. Other options would include Intel, Microsoft, or perhaps the Linux Foundation. I don't know whether Intel or LF would want to do this, and MS isn't exactly vendor-neutral. (Even L-F isn't entirely neutral, since they sort of represent two hypervisors.) Or we could do something temporary and then try to work with a group like OASIS, but that might end up being a lot of work.

--Andy
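A guest-side reading of the straw-man might look like the following sketch. Nothing here is a settled interface: the base leaf is taken as 0x48000000 and the signature as "CrossHVPara\0", both from the proposal above, while `xhv_seed_msr`, the `cpuid_fn` indirection, the two simulated hosts, and the `FAKE_SEED_MSR` value are hypothetical and exist only so the example is self-contained.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct cpuid_regs { uint32_t eax, ebx, ecx, edx; };

/* Hypothetical indirection: a real guest would execute CPUID here. */
typedef struct cpuid_regs (*cpuid_fn)(uint32_t leaf);

#define XHV_BASE_LEAF    0x48000000u     /* straw-man base leaf */
#define XHV_FEATURE_LEAF 0x48000001u     /* EAX = seed MSR index, or 0 */
#define XHV_SIGNATURE    "CrossHVPara\0" /* 12 bytes in EBX, ECX, EDX */
#define FAKE_SEED_MSR    0x40000099u     /* arbitrary placeholder, not a real assignment */

/* Returns the MSR index the host advertises for the RNG seed, or 0 if
 * the signature is missing or the feature leaf is not implemented. */
static uint32_t xhv_seed_msr(cpuid_fn cpuid)
{
    struct cpuid_regs r;
    uint32_t words[3];

    assert(cpuid != NULL);
    r = cpuid(XHV_BASE_LEAF);
    words[0] = r.ebx;
    words[1] = r.ecx;
    words[2] = r.edx;
    if (memcmp(words, XHV_SIGNATURE, 12) != 0)
        return 0;
    if (r.eax < XHV_FEATURE_LEAF)
        return 0;
    return cpuid(XHV_FEATURE_LEAF).eax;
}

/* Simulated host that implements the straw-man. */
static struct cpuid_regs fake_xhv_cpuid(uint32_t leaf)
{
    struct cpuid_regs r = { 0, 0, 0, 0 };
    uint32_t words[3];

    if (leaf == XHV_BASE_LEAF) {
        memcpy(words, XHV_SIGNATURE, 12);
        r.eax = XHV_FEATURE_LEAF;
        r.ebx = words[0];
        r.ecx = words[1];
        r.edx = words[2];
    } else if (leaf == XHV_FEATURE_LEAF) {
        r.eax = FAKE_SEED_MSR;
    }
    return r;
}

/* Simulated host that does not implement the leaves and returns zeros. */
static struct cpuid_regs no_xhv_cpuid(uint32_t leaf)
{
    struct cpuid_regs r = { 0, 0, 0, 0 };
    (void)leaf;
    return r;
}
```

Because discovery is a fixed pair of CPUID reads, this variant avoids the range scan entirely; whether the advertised MSR index could instead be a constant is question 1 above.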