Re: 32-bit color graphic on KVM virtual machines

2010-04-30 Thread Andy Lutomirski

shacky wrote:

Hi.
Is it possible to have 32-bit color graphic on KVM virtual machines?
I installed a Windows virtual machine, but it allows me to configure
only 24-bit color display and it does not have any display driver
installed.


24-bit means 8 bits per RGB channel.  32-bit means 8 bits per RGB 
channel plus 8 bits alpha, which isn't very useful on the display.  So I 
wouldn't worry about it.  (If you had an 8bpp display, that would be a 
different story, but those aren't very common.)


Of course, lots of programs use 32 bit offscreen surfaces, but that's a 
different story.


--Andy



Is there a way to solve this problem?

Thank you very much!
Bye.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html





percpu allocation failures in kvm

2012-12-13 Thread Andy Lutomirski
On 3.7.0 + irrelevant patches, I get this on boot.  I've seen it on
and off on earlier kernels, I think (although I'm not currently
getting it on 3.5).

[   10.230054] PERCPU: allocation failed, size=304 align=32, alloc
from reserved chunk failed
[   10.230059] Pid: 1026, comm: modprobe Tainted: G        W    3.7.0-ama+ #5
[   10.230060] Call Trace:
[   10.230070]  [81129efb] pcpu_alloc+0x9db/0xa40
[   10.230074]  [810a81ad] ? find_symbol_in_section+0x4d/0x140
[   10.230077]  [810a8160] ? finished_loading+0x50/0x50
[   10.230080]  [810a8af0] ? each_symbol_section+0x30/0x70
[   10.230083]  [810a8b61] ? find_symbol+0x31/0x60
[   10.230086]  [8112a1f3] __alloc_reserved_percpu+0x13/0x20
[   10.230089]  [810ab48d] load_module+0x3ed/0x1b50
[   10.230093]  [81075c3b] ? __srcu_read_unlock+0x4b/0x70

--Andy


Re: percpu allocation failures in kvm

2012-12-14 Thread Andy Lutomirski
On Fri, Dec 14, 2012 at 5:03 PM, Marcelo Tosatti mtosa...@redhat.com wrote:
 On Thu, Dec 13, 2012 at 09:43:23PM -0800, Andy Lutomirski wrote:
 On 3.7.0 + irrelevant patches, I get this on boot.  I've seen it on
 and off on earlier kernels, I think (although I'm not currently
 getting it on 3.5).

 [   10.230054] PERCPU: allocation failed, size=304 align=32, alloc
 from reserved chunk failed
 [   10.230059] Pid: 1026, comm: modprobe Tainted: G        W    3.7.0-ama+ #5
 [   10.230060] Call Trace:
 [   10.230070]  [81129efb] pcpu_alloc+0x9db/0xa40
 [   10.230074]  [810a81ad] ? find_symbol_in_section+0x4d/0x140
 [   10.230077]  [810a8160] ? finished_loading+0x50/0x50
 [   10.230080]  [810a8af0] ? each_symbol_section+0x30/0x70
 [   10.230083]  [810a8b61] ? find_symbol+0x31/0x60
 [   10.230086]  [8112a1f3] __alloc_reserved_percpu+0x13/0x20
 [   10.230089]  [810ab48d] load_module+0x3ed/0x1b50
 [   10.230093]  [81075c3b] ? __srcu_read_unlock+0x4b/0x70

 --Andy

 You're loading the kvm module, or loading some other module inside
 a kvm guest?


This is loading the kvm module on startup.  There are no guests.

-- 
Andy Lutomirski
AMA Capital Management, LLC


[KVM paravirt issue?] Re: vsyscall=emulate regression

2012-02-15 Thread Andy Lutomirski
Hi, kvm people-

Here's a strange failure.  It could be a bug in something
RHEL6-specific, but it could be a generic issue that only triggers
with a paravirt guest with old userspace on a non-ept host.  There was
a bug like this on Xen, and I'm wondering whether something's wrong on
kvm as well.

For background, a change in 3.1 (IIRC) means that, when
vsyscall=emulate or vsyscall=none, the vsyscall page in the fixmap is
NX.  It seems like Amit's machine is marking the physical PTE present
but unreadable.  So I could have messed up, or there could be a subtle
bug somewhere.  Any ideas?

I'll try to reproduce on a non-ept host later on, but that will
involve finding one.

On Wed, Feb 15, 2012 at 3:01 AM, Amit Shah amit.s...@redhat.com wrote:
 On (Tue) 14 Feb 2012 [08:26:22], Andy Lutomirski wrote:
 On Tue, Feb 14, 2012 at 4:22 AM, Amit Shah amit.s...@redhat.com wrote:
 Can you try booting the initramfs here:
 http://web.mit.edu/luto/www/linux/vsyscall_initramfs.img
 with your kernel image (i.e. qemu-kvm -kernel whatever -initrd
 vsyscall_initramfs.img -whatever_else) and seeing what happens?  It
 works for me.

 This too results in a similar error.

Can you post the exact error?  I'm interested in how far it gets
before it fails.

 I didn't try a modern distro, but looks like this is enough evidence
 for now to check the kvm emulator code.  I tried the same guests on a
 newer kernel (Fedora 16's 3.2), and things worked fine except for
 vsyscall=none, panic message below.

vsyscall=none isn't supposed to work unless you're running a very
modern distro *and* you have no legacy static binaries *and* you
aren't using anything written in Go (sigh).  It will probably either
never become the default or will take 5-10 years.


 model name      : Intel(R) Core(TM)2 Duo CPU     E6550  @ 2.33GHz
 flags           : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov 
 pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm 
 constant_tsc arch_perfmon pebs bts rep_good aperfmperf pni dtes64 monitor 
 ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm lahf_lm dts tpr_shadow vnmi 
 flexpriority

Hmm.  You don't have ept.  If your guest kernel supports paravirt,
then you might use the hypercall interface instead of programming the
fixmap directly.


 This is what I get with vsyscall=none, where emulate and native work
 fine on the 3.2 kernel on different host hardware, the guest stays the
 same:


 [    2.874661] debug: unmapping init memory 8167f000..818dc000
 [    2.876778] Write protecting the kernel read-only data: 6144k
 [    2.879111] debug: unmapping init memory 880001318000..88000140
 [    2.881242] debug: unmapping init memory 8800015a..88000160
 [    2.884637] init[1] vsyscall attempted with vsyscall=none 
 ip:ff600400 cs:33 sp:7fff2f48fe18 ax:7fff2f48fe50 si:7fff2f48ff08 di:0

This line (vsyscall attempted) means that the emulation worked
correctly.  Your other traces didn't have it or anything like it,
which mostly rules out do_emulate_vsyscall issues.

--Andy


Re: [KVM paravirt issue?] Re: vsyscall=emulate regression

2012-02-16 Thread Andy Lutomirski
On Thu, Feb 16, 2012 at 8:17 AM, Avi Kivity a...@redhat.com wrote:
 On 02/15/2012 09:36 PM, Andy Lutomirski wrote:
 Hi, kvm people-

 Here's a strange failure.  It could be a bug in something
 RHEL6-specific, but it could be a generic issue that only triggers
 with a paravirt guest with old userspace on a non-ept host.  There was
 a bug like this on Xen, and I'm wondering whether something's wrong on
 kvm as well.

 For background, a change in 3.1 (IIRC) means that, when
 vsyscall=emulate or vsyscall=none, the vsyscall page in the fixmap is
 NX.  It seems like Amit's machine is marking the physical PTE present
 but unreadable.

 No such thing as present and unreadable, without EPT.

 So I could have messed up, or there could be a subtle
 bug somewhere.  Any ideas?

 What's the code trying to do?  Execute an instruction from an
 non-executable page, trap the #PF, and emulate?  And what are the
 symptoms? wrong error code for the #PF?  That could easily be a kvm bug.


The symptom is that some kind of access to a page that's supposed to
be readable, NX is reporting error 5.  I'm not quite sure what kind of
access is causing that.


 I'll try to reproduce on a non-ept host later on, but that will
 involve finding one.

 rmmod kvm-intel
 modprobe kvm-intel ept=0

I just tried that and still can't reproduce the problem.  FWIW, I also
failed to reproduce it on the one RHEL6 machine I have access to.


 Hmm.  You don't have ept.  If your guest kernel supports paravirt,
 then you might use the hypercall interface instead of programming the
 fixmap directly.

 There is no hypercall interface for writing page tables in kvm.

Evidently I was looking at the removed kvm_set_pte stuff :)



 
  This is what I get with vsyscall=none, where emulate and native work
  fine on the 3.2 kernel on different host hardware, the guest stays the
  same:
 
 
  [    2.874661] debug: unmapping init memory 
  8167f000..818dc000
  [    2.876778] Write protecting the kernel read-only data: 6144k
  [    2.879111] debug: unmapping init memory 
  880001318000..88000140
  [    2.881242] debug: unmapping init memory 
  8800015a..88000160
  [    2.884637] init[1] vsyscall attempted with vsyscall=none 
  ip:ff600400 cs:33 sp:7fff2f48fe18 ax:7fff2f48fe50 si:7fff2f48ff08 
  di:0

 This line (vsyscall attempted) means that the emulation worked
 correctly.  Your other traces didn't have it or anything like it,
 which mostly rules out do_emulate_vsyscall issues.


 Can you point me at the code in question?

The setup code is in arch/x86/kernel/vsyscall_64.c in map_vsyscall.
The bad access is to the vsyscall page.


 Amit, a trace would be nice.

The full output from a test boot of my (updated this morning) initramfs here:
http://web.mit.edu/luto/www/linux/vsyscall_initramfs.img
may give a better hint.

The updated code is here:

#include <unistd.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

typedef time_t (*vsys_time_t)(time_t *);

int main()
{
  /* Legacy vsyscall entry point for time() on x86-64. */
  vsys_time_t vsys_time = (vsys_time_t)(0xffffffffff600400UL);
  unsigned char *p = (unsigned char *)0xffffffffff600400UL;
  int i;

  /* Data read from the vsyscall page. */
  printf("Will try reading...\n");
  printf("The first few bytes are:\n");
  for (i = 0; i < 16; i++) {
    unsigned char c = p[i];
    printf("%02x ", (int)c);
  }
  printf("\n");

  /* Instruction fetch from the vsyscall page. */
  printf("Will try executing...\n");
  printf("The time is %ld\n", (long)( vsys_time(0) ));

  printf("All done\n");
  while(1)
    pause();
}

--Andy


Re: [KVM paravirt issue?] Re: vsyscall=emulate regression

2012-02-16 Thread Andy Lutomirski
On Thu, Feb 16, 2012 at 9:14 AM, Avi Kivity a...@redhat.com wrote:
 On 02/16/2012 06:45 PM, Andy Lutomirski wrote:
 
  So I could have messed up, or there could be a subtle
  bug somewhere.  Any ideas?
 
  What's the code trying to do?  Execute an instruction from an
  non-executable page, trap the #PF, and emulate?  And what are the
  symptoms? wrong error code for the #PF?  That could easily be a kvm bug.
 

 The symptom is that some kind of access to a page that's supposed to
 be readable, NX is reporting error 5.  I'm not quite sure what kind of
 access is causing that.

 Might it be a fetch access, with kvm forgetting to set bit 4 correctly?

 
  Can you point me at the code in question?

 The setup code is in arch/x86/kernel/vsyscall_64.c in map_vsyscall.
 The bad access is to the vsyscall page.

 The bad access is on purpose, yes?

 From fault.c:

 #ifdef CONFIG_X86_64
                /*
                 * Instruction fetch faults in the vsyscall page might need
                 * emulation.
                 */
                if (unlikely((error_code & PF_INSTR) &&
                             ((address & ~0xfff) == VSYSCALL_START))) {
                        if (emulate_vsyscall(regs, address))
                                return;
                }
 #endif

 so it seems like kvm doesn't set PF_INSTR?

Yes, this is on purpose, and you're almost certainly right (and I feel
dumb for not figuring this out immediately).  The error message is:

segfault at ff600400 ip ff600400 sp 7fff103d72f8 error 5

which is garbage.  The instruction at 0xffffffffff600400 can't fetch
itself as data and fault on the data access (at least not in 64-bit
mode, as far as I can think of, without evil messing with the TLBs).

So... what do we do about this?  This (whitespace-damaged, untested)
patch will probably work around it well enough to boot the system:

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 9d74824..52b9522 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -741,8 +741,11 @@ __bad_area_nosemaphore(struct pt_regs *regs, unsigned long
 * Instruction fetch faults in the vsyscall page might need
 * emulation.
 */
-   if (unlikely((error_code & PF_INSTR) &&
+   if (unlikely(address == regs->ip && !(error_code & PF_WRITE) &&
         ((address & ~0xfff) == VSYSCALL_START))) {
+       WARN_ONCE(!(error_code & PF_INSTR),
+                 "Fixing up bogus vsyscall read fault -- "
+                 "your hypervisor is buggy.");
        if (emulate_vsyscall(regs, address))
            return;
    }

Before we patch the guest like this, though, it would be nice to know
what hosts are affected.  If it's just one version of RHEL6, maybe it
makes sense to fix the hypervisor and either leave the guest alone or
just add a warning saying to fix your hypervisor, like:

WARN_ONCE(address == regs->ip && !(error_code & (PF_INSTR | PF_WRITE)) &&
          user_64bit_mode(regs),
          "Fishy page fault -- you might need to fix your hypervisor");

near some exit path in the page fault handler.  The 64-bit check is
because (I think) 32-bit code can mess with regs->ip using a cs offset
in the LDT and trigger the warning at will.

--Andy


Re: [KVM paravirt issue?] Re: vsyscall=emulate regression

2012-02-24 Thread Andy Lutomirski
On Thu, Feb 23, 2012 at 8:34 PM, H. Peter Anvin h...@zytor.com wrote:
 On 02/16/2012 09:39 AM, Avi Kivity wrote:

 Yes, this is on purpose

 Why?

I think the "this" refers to the PF_INSTR fault when executing at
0xffffffffff600xxx.  That's definitely intentional -- it's how
vsyscall emulation works.

I think it's unintentional that some kvm versions apparently forget to
set the PF_INSTR bit.

--Andy


        -hpa


 --
 H. Peter Anvin, Intel Open Source Technology Center
 I work for Intel.  I don't speak on their behalf.




-- 
Andy Lutomirski
AMA Capital Management, LLC
Office: (310) 553-5322
Mobile: (650) 906-0647


Re: KVM: x86: use dynamic percpu allocations for shared msrs area

2013-02-01 Thread Andy Lutomirski
On Thu, Jan 3, 2013 at 5:41 AM, Marcelo Tosatti mtosa...@redhat.com wrote:

 Andy, Mike, can you confirm whether this fixes the percpu allocation
 failures when loading kvm.ko? TIA

 

 Use dynamic percpu allocations for the shared msrs structure,
 to avoid using the limited reserved percpu space.

 Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Sorry for the amazingly long delay.  What kernel does this apply to?

--Andy


[PATCH 3/5] x86: Annotate _ASM_EXTABLE users to distinguish uaccess from everything else

2013-05-22 Thread Andy Lutomirski
The idea is that the kernel can be much more careful fixing up
uaccess exceptions -- page faults on user addresses are the only
legitimate reason for a uaccess instruction to fault.

Signed-off-by: Andy Lutomirski l...@amacapital.net
---

I'm not 100% sure what's happening in the KVM code.  Can someone familiar
with it take a look?

 arch/x86/ia32/ia32entry.S             |   4 +-
 arch/x86/include/asm/asm.h            |  13 ++-
 arch/x86/include/asm/fpu-internal.h   |   6 +-
 arch/x86/include/asm/futex.h          |   8 +-
 arch/x86/include/asm/kvm_host.h       |   2 +-
 arch/x86/include/asm/msr.h            |   4 +-
 arch/x86/include/asm/segment.h        |   2 +-
 arch/x86/include/asm/special_insns.h  |   2 +-
 arch/x86/include/asm/uaccess.h        |   8 +-
 arch/x86/include/asm/word-at-a-time.h |   2 +-
 arch/x86/include/asm/xsave.h          |   6 +-
 arch/x86/kernel/entry_32.S            |  26 ++---
 arch/x86/kernel/entry_64.S            |   6 +-
 arch/x86/kernel/ftrace.c              |   4 +-
 arch/x86/kernel/test_nx.c             |   2 +-
 arch/x86/kernel/test_rodata.c         |   2 +-
 arch/x86/kvm/emulate.c                |   4 +-
 arch/x86/lib/checksum_32.S            |   4 +-
 arch/x86/lib/copy_user_64.S           |  50
 arch/x86/lib/copy_user_nocache_64.S   |  44 +++
 arch/x86/lib/csum-copy_64.S           |   6 +-
 arch/x86/lib/getuser.S                |  12 +-
 arch/x86/lib/mmx_32.c                 |  12 +-
 arch/x86/lib/msr-reg.S                |   4 +-
 arch/x86/lib/putuser.S                |  10 +-
 arch/x86/lib/usercopy_32.c            | 212 +-
 arch/x86/lib/usercopy_64.c            |   4 +-
 arch/x86/mm/init_32.c                 |   2 +-
 arch/x86/um/checksum_32.S             |   4 +-
 arch/x86/xen/xen-asm_32.S             |   2 +-
 30 files changed, 236 insertions(+), 231 deletions(-)

diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
index 474dc1b..8d3b5c2 100644
--- a/arch/x86/ia32/ia32entry.S
+++ b/arch/x86/ia32/ia32entry.S
@@ -149,7 +149,7 @@ ENTRY(ia32_sysenter_target)
   32bit zero extended */ 
ASM_STAC
 1: movl(%rbp),%ebp
-   _ASM_EXTABLE(1b,ia32_badarg)
+   _ASM_EXTABLE_UACCESS(1b,ia32_badarg)
ASM_CLAC
orl $TS_COMPAT,TI_status+THREAD_INFO(%rsp,RIP-ARGOFFSET)
testl   
$_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
@@ -306,7 +306,7 @@ ENTRY(ia32_cstar_target)
/* hardware stack frame is complete now */  
ASM_STAC
 1: movl(%r8),%r9d
-   _ASM_EXTABLE(1b,ia32_badarg)
+   _ASM_EXTABLE_UACCESS(1b,ia32_badarg)
ASM_CLAC
orl $TS_COMPAT,TI_status+THREAD_INFO(%rsp,RIP-ARGOFFSET)
testl   
$_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
diff --git a/arch/x86/include/asm/asm.h b/arch/x86/include/asm/asm.h
index fa47fd4..f48a850 100644
--- a/arch/x86/include/asm/asm.h
+++ b/arch/x86/include/asm/asm.h
@@ -57,14 +57,16 @@
  */
 
 /* There are two bits of extable entry class, added to a signed offset. */
-#define _EXTABLE_CLASS_DEFAULT 0   /* standard uaccess fixup */
+#define _EXTABLE_CLASS_UACCESS 0   /* standard uaccess fixup */
+#define _EXTABLE_CLASS_ANY 0x4000  /* catch any exception */
 #define _EXTABLE_CLASS_EX  0x8000  /* uaccess + set uaccess_err */
 
 /*
  * The biases are the class constants + 0x2000, as signed integers.
  * This can't use ordinary arithmetic -- the assembler isn't that smart.
  */
-#define _EXTABLE_BIAS_DEFAULT  0x2000
+#define _EXTABLE_BIAS_UACCESS  0x2000
+#define _EXTABLE_BIAS_ANY  0x2000 + 0x4000
 #define _EXTABLE_BIAS_EX   0x2000 - 0x8000
 
 #ifdef __ASSEMBLY__
@@ -85,8 +87,11 @@
 .popsection\n
 #endif
 
-#define _ASM_EXTABLE(from,to)  \
-   _ASM_EXTABLE_CLASS(from, to, _EXTABLE_BIAS_DEFAULT)
+#define _ASM_EXTABLE_UACCESS(from,to)  \
+   _ASM_EXTABLE_CLASS(from, to, _EXTABLE_BIAS_UACCESS)
+
+#define _ASM_EXTABLE_ANY(from,to)  \
+   _ASM_EXTABLE_CLASS(from, to, _EXTABLE_BIAS_ANY)
 
 #define _ASM_EXTABLE_EX(from,to)   \
_ASM_EXTABLE_CLASS(from, to, _EXTABLE_BIAS_EX)
diff --git a/arch/x86/include/asm/fpu-internal.h 
b/arch/x86/include/asm/fpu-internal.h
index e25cc33..7f86031 100644
--- a/arch/x86/include/asm/fpu-internal.h
+++ b/arch/x86/include/asm/fpu-internal.h
@@ -133,7 +133,7 @@ static inline void sanitize_i387_state(struct task_struct 
*tsk)
 3:  movl $-1,%[err]\n\
 jmp  2b\n\
 .previous\n  \
-_ASM_EXTABLE(1b, 3b)   \
+_ASM_EXTABLE_UACCESS(1b, 3b

Re: VDSO pvclock may increase host cpu consumption, is this a problem?

2014-03-31 Thread Andy Lutomirski
On 03/29/2014 01:47 AM, Zhanghailiang wrote:
 Hi,
 I found when Guest is idle, VDSO pvclock may increase host consumption.
 We can calculate as follows. Correct me if I am wrong.
   (Host)250 * update_pvclock_gtod = 1500 * gettimeofday(Guest)
 In Host, VDSO pvclock introduce a notifier chain, pvclock_gtod_chain in 
 timekeeping.c. It consume nearly 900 cycles per call. So in consideration of 
 250 Hz, it may consume 225,000 cycles per second, even no VM is created.
 In Guest, gettimeofday consumes 220 cycles per call with VDSO pvclock. If the 
 no-kvmclock-vsyscall is configured, gettimeofday consumes 370 cycles per 
 call. The feature decrease 150 cycles consumption per call. 
 When call gettimeofday 1500 times,it decrease 225,000 cycles,equal to the 
 host consumption.
 Both Host and Guest is linux-3.13.6.
 So, whether the host cpu consumption is a problem?

Does pvclock serve any real purpose on systems with fully-functional
TSCs?  The x86 guest implementation is awful, so it's about 2x slower
than TSC.  It could be improved a lot, but I'm not sure I understand why
it exists in the first place.

I certainly understand the goal of keeping the guest CLOCK_REALTIME in
sync with the host, but pvclock seems like overkill for that.

--Andy



Re: VDSO pvclock may increase host cpu consumption, is this a problem?

2014-03-31 Thread Andy Lutomirski
On Mar 31, 2014 8:45 PM, Marcelo Tosatti mtosa...@redhat.com wrote:

 On Mon, Mar 31, 2014 at 10:52:25AM -0700, Andy Lutomirski wrote:
  On 03/29/2014 01:47 AM, Zhanghailiang wrote:
   Hi,
   I found when Guest is idle, VDSO pvclock may increase host consumption.
   We can calculate as follows. Correct me if I am wrong.
 (Host)250 * update_pvclock_gtod = 1500 * gettimeofday(Guest)
   In Host, VDSO pvclock introduce a notifier chain, pvclock_gtod_chain in 
   timekeeping.c. It consume nearly 900 cycles per call. So in consideration 
   of 250 Hz, it may consume 225,000 cycles per second, even no VM is 
   created.
   In Guest, gettimeofday consumes 220 cycles per call with VDSO pvclock. If 
   the no-kvmclock-vsyscall is configured, gettimeofday consumes 370 cycles 
   per call. The feature decrease 150 cycles consumption per call.
   When call gettimeofday 1500 times,it decrease 225,000 cycles,equal to the 
   host consumption.
   Both Host and Guest is linux-3.13.6.
   So, whether the host cpu consumption is a problem?
 
  Does pvclock serve any real purpose on systems with fully-functional
  TSCs?  The x86 guest implementation is awful, so it's about 2x slower
  than TSC.  It could be improved a lot, but I'm not sure I understand why
  it exists in the first place.

 VM migration.

Why does that need percpu stuff?  Wouldn't it be sufficient to
interrupt all CPUs (or at least all cpus running in userspace) on
migration and update the normal timing data structures?

Even better: have the VM offer to invalidate the physical page
containing the kernel's clock data on migration and interrupt one CPU.
 If another CPU races, it'll fault and wait for the guest kernel to
update its timing.

Does the current kvmclock stuff track CLOCK_MONOTONIC and
CLOCK_REALTIME separately?


 Can you explain why you consider it so bad ? How you think it could be
 improved ?

The second rdtsc_barrier looks unnecessary.  Even better, if rdtscp is
available, then rdtscp can replace rdtsc_barrier, rdtsc, and the
getcpu call.

It would also be nice to avoid having two sets of rescalings of the timing data.



  I certainly understand the goal of keeping the guest CLOCK_REALTIME in
  sync with the host, but pvclock seems like overkill for that.

 VM migration.



Re: VDSO pvclock may increase host cpu consumption, is this a problem?

2014-04-01 Thread Andy Lutomirski
On Tue, Apr 1, 2014 at 11:01 AM, Marcelo Tosatti mtosa...@redhat.com wrote:
 On Mon, Mar 31, 2014 at 10:33:41PM -0700, Andy Lutomirski wrote:
 On Mar 31, 2014 8:45 PM, Marcelo Tosatti mtosa...@redhat.com wrote:
 
  On Mon, Mar 31, 2014 at 10:52:25AM -0700, Andy Lutomirski wrote:
   On 03/29/2014 01:47 AM, Zhanghailiang wrote:
Hi,
I found when Guest is idle, VDSO pvclock may increase host consumption.
We can calculate as follows. Correct me if I am wrong.
  (Host)250 * update_pvclock_gtod = 1500 * gettimeofday(Guest)
In Host, VDSO pvclock introduce a notifier chain, pvclock_gtod_chain 
in timekeeping.c. It consume nearly 900 cycles per call. So in 
consideration of 250 Hz, it may consume 225,000 cycles per second, 
even no VM is created.
In Guest, gettimeofday consumes 220 cycles per call with VDSO pvclock. 
If the no-kvmclock-vsyscall is configured, gettimeofday consumes 370 
cycles per call. The feature decrease 150 cycles consumption per call.
When call gettimeofday 1500 times,it decrease 225,000 cycles,equal to 
the host consumption.
Both Host and Guest is linux-3.13.6.
So, whether the host cpu consumption is a problem?
  
   Does pvclock serve any real purpose on systems with fully-functional
   TSCs?  The x86 guest implementation is awful, so it's about 2x slower
   than TSC.  It could be improved a lot, but I'm not sure I understand why
   it exists in the first place.
 
  VM migration.

 Why does that need percpu stuff?  Wouldn't it be sufficient to
 interrupt all CPUs (or at least all cpus running in userspace) on
 migration and update the normal timing data structures?

 Are you suggesting to allow interruption of the timekeeping code
 at any time to update frequency information ?

I'm not sure what you mean by interruption of the timekeeping code.
I'm suggesting sending an interrupt to the guest (via a virtio device,
presumably) to tell it that it has been paused and resumed.

This is probably worth getting John's input if you actually want to do
this.  I'm not about to :)

Is there any case in which the TSC is stable and the kvmclock data for
different cpus is actually different?


 Do you want to that as a special tsc clocksource driver ?

 Even better: have the VM offer to invalidate the physical page
 containing the kernel's clock data on migration and interrupt one CPU.
  If another CPU races, it'll fault and wait for the guest kernel to
 update its timing.

 Perhaps that is a good idea.

 Does the current kvmclock stuff track CLOCK_MONOTONIC and
 CLOCK_REALTIME separately?

 No. kvmclock counting is interrupted on vm pause (the hw clock does not
 count during vm pause).

Makes sense.


  Can you explain why you consider it so bad ? How you think it could be
  improved ?

 The second rdtsc_barrier looks unnecessary.  Even better, if rdtscp is
 available, then rdtscp can replace rdtsc_barrier, rdtsc, and the
 getcpu call.

 It would also be nice to avoid having two sets of rescalings of the timing 
 data.

 Yep, probably good improvements, patches are welcome :-)


I may get to it at some point.  No guarantees.  I did just rewrite all
the mapping-related code for every other x86 vdso timesource, so maybe
I should try to add this to the pile.  The fact that the data is a
variable number of pages makes it messy, though, and since I don't
understand why there's a separate structure for each CPU, I'm hesitant
to change it too much.

--Andy


Re: VDSO pvclock may increase host cpu consumption, is this a problem?

2014-04-01 Thread Andy Lutomirski
On Tue, Apr 1, 2014 at 5:12 PM, Marcelo Tosatti mtosa...@redhat.com wrote:
 On Tue, Apr 01, 2014 at 12:17:16PM -0700, Andy Lutomirski wrote:
 On Tue, Apr 1, 2014 at 11:01 AM, Marcelo Tosatti mtosa...@redhat.com wrote:
  On Mon, Mar 31, 2014 at 10:33:41PM -0700, Andy Lutomirski wrote:
  On Mar 31, 2014 8:45 PM, Marcelo Tosatti mtosa...@redhat.com wrote:
  
   On Mon, Mar 31, 2014 at 10:52:25AM -0700, Andy Lutomirski wrote:
On 03/29/2014 01:47 AM, Zhanghailiang wrote:
 Hi,
 I found when Guest is idle, VDSO pvclock may increase host 
 consumption.
 We can calculate as follows. Correct me if I am wrong.
   (Host)250 * update_pvclock_gtod = 1500 * gettimeofday(Guest)
 In Host, VDSO pvclock introduce a notifier chain, 
 pvclock_gtod_chain in timekeeping.c. It consume nearly 900 cycles 
 per call. So in consideration of 250 Hz, it may consume 225,000 
 cycles per second, even no VM is created.
 In Guest, gettimeofday consumes 220 cycles per call with VDSO 
 pvclock. If the no-kvmclock-vsyscall is configured, gettimeofday 
 consumes 370 cycles per call. The feature decrease 150 cycles 
 consumption per call.
 When call gettimeofday 1500 times,it decrease 225,000 cycles,equal 
 to the host consumption.
 Both Host and Guest is linux-3.13.6.
 So, whether the host cpu consumption is a problem?
   
Does pvclock serve any real purpose on systems with fully-functional
TSCs?  The x86 guest implementation is awful, so it's about 2x slower
than TSC.  It could be improved a lot, but I'm not sure I understand 
why
it exists in the first place.
  
   VM migration.
 
  Why does that need percpu stuff?  Wouldn't it be sufficient to
  interrupt all CPUs (or at least all cpus running in userspace) on
  migration and update the normal timing data structures?
 
  Are you suggesting to allow interruption of the timekeeping code
  at any time to update frequency information ?

 I'm not sure what you mean by interruption of the timekeeping code.
 I'm suggesting sending an interrupt to the guest (via a virtio device,
 presumably) to tell it that it has been paused and resumed.

 code:

 1) disable interrupts
 2) A = RDTSC
 3) B = SCALE(A, TSC.FREQ)

 If migration happens between 2 and 3, you've got an incorrect value.


Fair enough.

I guess

1) disable interrupts
2) A = RDTSC
3) B = SCALE(A, TSC.FREQ)

is also bad if (3) blocks due to magic invalidation of the physical page.

--Andy


Re: VDSO pvclock may increase host cpu consumption, is this a problem?

2014-04-01 Thread Andy Lutomirski
On Tue, Apr 1, 2014 at 5:29 PM, Marcelo Tosatti mtosa...@redhat.com wrote:
 On Tue, Apr 01, 2014 at 12:17:16PM -0700, Andy Lutomirski wrote:
 On Tue, Apr 1, 2014 at 11:01 AM, Marcelo Tosatti mtosa...@redhat.com wrote:
  On Mon, Mar 31, 2014 at 10:33:41PM -0700, Andy Lutomirski wrote:
  On Mar 31, 2014 8:45 PM, Marcelo Tosatti mtosa...@redhat.com wrote:
  
   On Mon, Mar 31, 2014 at 10:52:25AM -0700, Andy Lutomirski wrote:
On 03/29/2014 01:47 AM, Zhanghailiang wrote:
 Hi,
 I found when Guest is idle, VDSO pvclock may increase host 
 consumption.
 We can calculate as follows. Correct me if I am wrong.
   (Host)250 * update_pvclock_gtod = 1500 * gettimeofday(Guest)
 In Host, VDSO pvclock introduce a notifier chain, 
 pvclock_gtod_chain in timekeeping.c. It consume nearly 900 cycles 
 per call. So in consideration of 250 Hz, it may consume 225,000 
 cycles per second, even no VM is created.
 In Guest, gettimeofday consumes 220 cycles per call with VDSO 
 pvclock. If the no-kvmclock-vsyscall is configured, gettimeofday 
 consumes 370 cycles per call. The feature decrease 150 cycles 
 consumption per call.
 When call gettimeofday 1500 times,it decrease 225,000 cycles,equal 
 to the host consumption.
 Both Host and Guest is linux-3.13.6.
 So, whether the host cpu consumption is a problem?
   
Does pvclock serve any real purpose on systems with fully-functional
TSCs?  The x86 guest implementation is awful, so it's about 2x slower
than TSC.  It could be improved a lot, but I'm not sure I understand 
why
it exists in the first place.
  
   VM migration.
 
  Why does that need percpu stuff?  Wouldn't it be sufficient to
  interrupt all CPUs (or at least all cpus running in userspace) on
  migration and update the normal timing data structures?
 
  Are you suggesting to allow interruption of the timekeeping code
  at any time to update frequency information ?

 I'm not sure what you mean by interruption of the timekeeping code.
 I'm suggesting sending an interrupt to the guest (via a virtio device,
 presumably) to tell it that it has been paused and resumed.

 This is probably worth getting John's input if you actually want to do
 this.  I'm not about to :)

 Honestly, neither am i at the moment. But i'll think about it.

 Is there any case in which the TSC is stable and the kvmclock data for
 different cpus is actually different?

 No. However, kvmclock_data.flags field is an interface for watchdog
 unpause.

  Do you want to do that as a special tsc clocksource driver?
 
  Even better: have the VM offer to invalidate the physical page
  containing the kernel's clock data on migration and interrupt one CPU.
   If another CPU races, it'll fault and wait for the guest kernel to
  update its timing.
 
  Perhaps that is a good idea.
 
  Does the current kvmclock stuff track CLOCK_MONOTONIC and
  CLOCK_REALTIME separately?
 
  No. kvmclock counting is interrupted on vm pause (the hw clock does not
  count during vm pause).

 Makes sense.

 
   Can you explain why you consider it so bad ? How you think it could be
   improved ?
 
  The second rdtsc_barrier looks unnecessary.  Even better, if rdtscp is
  available, then rdtscp can replace rdtsc_barrier, rdtsc, and the
  getcpu call.
 
  It would also be nice to avoid having two sets of rescalings of the 
  timing data.
 
  Yep, probably good improvements, patches are welcome :-)
 

 I may get to it at some point.  No guarantees.  I did just rewrite all
 the mapping-related code for every other x86 vdso timesource, so maybe
 I should try to add this to the pile.  The fact that the data is a
 variable number of pages makes it messy, though, and since I don't
 understand why there's a separate structure for each CPU, I'm hesitant
 to change it too much.

 --Andy

 kvmclock.data? Because each VCPU can have different .flags fields for
 example.

It looks like the vdso kvmclock code only runs if
PVCLOCK_TSC_STABLE_BIT is set, which in turn is only the case if the
TSC is guaranteed to be monotonic across all CPUs.  If we can rely on
the fact that that bit will only be set if tsc_to_system_mul and
tsc_shift are the same on all CPUs and that (system_time -
(tsc_timestamp * mul) >> shift) is the same on all CPUs, then there
should be no reason for the vdso to read the pvclock data for anything
but CPU 0.  That will make it a lot faster and simpler.

Can we rely on that?

I wonder what happens if the guest runs ntpd or otherwise uses
adjtimex.  Presumably it starts drifting relative to the host.
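
To make the question concrete, here is a rough userspace sketch of the arithmetic I have in mind.  The struct and function names are made up for illustration (they only loosely mirror the real pvclock record), and I'm assuming the usual "shift, then multiply by mul / 2^32" scaling.  If mul, shift, and the offset all match between two records, reading either record should give the same time for a given TSC value, up to integer truncation in the scaling.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one vCPU's pvclock record. */
struct pvclock_sketch {
    uint64_t tsc_timestamp;     /* TSC value when the record was written */
    uint64_t system_time;       /* nanoseconds at tsc_timestamp */
    uint32_t tsc_to_system_mul; /* 32.32 fixed-point multiplier */
    int8_t   tsc_shift;         /* shift applied to the delta before scaling */
};

/* Shift the delta, then multiply by mul / 2^32 without 128-bit math. */
static uint64_t scale_delta(uint64_t delta, uint32_t mul, int8_t shift)
{
    assert(shift > -64 && shift < 64);
    if (shift < 0)
        delta >>= -shift;
    else
        delta <<= shift;
    /* delta = hi * 2^32 + lo, so (delta * mul) >> 32 = hi * mul + ((lo * mul) >> 32) */
    return (delta >> 32) * mul + (((delta & 0xffffffffu) * mul) >> 32);
}

/* Guest-visible nanoseconds for a given TSC reading. */
static uint64_t pvclock_read_ns(const struct pvclock_sketch *p, uint64_t tsc)
{
    assert(tsc >= p->tsc_timestamp);
    return p->system_time +
           scale_delta(tsc - p->tsc_timestamp, p->tsc_to_system_mul, p->tsc_shift);
}

/* The quantity that would have to match across CPUs. */
static int64_t pvclock_offset(const struct pvclock_sketch *p)
{
    return (int64_t)(p->system_time -
                     scale_delta(p->tsc_timestamp, p->tsc_to_system_mul, p->tsc_shift));
}

/* True if both records describe the same TSC-to-nanoseconds line. */
static int same_timeline(const struct pvclock_sketch *a, const struct pvclock_sketch *b)
{
    return a->tsc_to_system_mul == b->tsc_to_system_mul &&
           a->tsc_shift == b->tsc_shift &&
           pvclock_offset(a) == pvclock_offset(b);
}
```

With mul = 0x80000000 (half a nanosecond per tick) and two records taken at different TSC values on the same line, both records yield the same answer, which is the property the vdso would be relying on if it only ever read CPU 0's data.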

--Andy


Re: VDSO pvclock may increase host cpu consumption, is this a problem?

2014-04-02 Thread Andy Lutomirski
On Wed, Apr 2, 2014 at 3:05 PM, Marcelo Tosatti mtosa...@redhat.com wrote:
 On Tue, Apr 01, 2014 at 05:46:34PM -0700, Andy Lutomirski wrote:
 On Tue, Apr 1, 2014 at 5:29 PM, Marcelo Tosatti mtosa...@redhat.com wrote:
  On Tue, Apr 01, 2014 at 12:17:16PM -0700, Andy Lutomirski wrote:
  On Tue, Apr 1, 2014 at 11:01 AM, Marcelo Tosatti mtosa...@redhat.com 
  wrote:
   On Mon, Mar 31, 2014 at 10:33:41PM -0700, Andy Lutomirski wrote:
   On Mar 31, 2014 8:45 PM, Marcelo Tosatti mtosa...@redhat.com wrote:
   
On Mon, Mar 31, 2014 at 10:52:25AM -0700, Andy Lutomirski wrote:
 On 03/29/2014 01:47 AM, Zhanghailiang wrote:
  Hi,
  I found that when the guest is idle, VDSO pvclock may increase host CPU
  consumption.
  We can calculate it as follows. Correct me if I am wrong.
    (Host) 250 * update_pvclock_gtod = 1500 * gettimeofday (Guest)
  In the host, VDSO pvclock introduces a notifier chain,
  pvclock_gtod_chain, in timekeeping.c. It consumes nearly 900
  cycles per call. So at 250 Hz it may consume 225,000 cycles
  per second, even if no VM is created.
  In the guest, gettimeofday consumes 220 cycles per call with VDSO
  pvclock. If no-kvmclock-vsyscall is configured, gettimeofday
  consumes 370 cycles per call. The feature saves 150 cycles
  per call.
  Calling gettimeofday 1500 times saves 225,000 cycles, equal
  to the host consumption.
  Both host and guest run linux-3.13.6.
  So, is the host CPU consumption a problem?

 Does pvclock serve any real purpose on systems with 
 fully-functional
 TSCs?  The x86 guest implementation is awful, so it's about 2x 
 slower
 than TSC.  It could be improved a lot, but I'm not sure I 
 understand why
 it exists in the first place.
   
VM migration.
  
   Why does that need percpu stuff?  Wouldn't it be sufficient to
   interrupt all CPUs (or at least all cpus running in userspace) on
   migration and update the normal timing data structures?
  
   Are you suggesting to allow interruption of the timekeeping code
   at any time to update frequency information ?
 
  I'm not sure what you mean by interruption of the timekeeping code.
  I'm suggesting sending an interrupt to the guest (via a virtio device,
  presumably) to tell it that it has been paused and resumed.
 
  This is probably worth getting John's input if you actually want to do
  this.  I'm not about to :)
 
  Honestly, neither am i at the moment. But i'll think about it.
 
  Is there any case in which the TSC is stable and the kvmclock data for
  different cpus is actually different?
 
  No. However, kvmclock_data.flags field is an interface for watchdog
  unpause.
 
   Do you want to do that as a special tsc clocksource driver?
  
   Even better: have the VM offer to invalidate the physical page
   containing the kernel's clock data on migration and interrupt one CPU.
If another CPU races, it'll fault and wait for the guest kernel to
   update its timing.
  
   Perhaps that is a good idea.
  
   Does the current kvmclock stuff track CLOCK_MONOTONIC and
   CLOCK_REALTIME separately?
  
   No. kvmclock counting is interrupted on vm pause (the hw clock does 
   not
   count during vm pause).
 
  Makes sense.
 
  
Can you explain why you consider it so bad ? How you think it could 
be
improved ?
  
   The second rdtsc_barrier looks unnecessary.  Even better, if rdtscp is
   available, then rdtscp can replace rdtsc_barrier, rdtsc, and the
   getcpu call.
  
   It would also be nice to avoid having two sets of rescalings of the 
   timing data.
  
   Yep, probably good improvements, patches are welcome :-)
  
 
  I may get to it at some point.  No guarantees.  I did just rewrite all
  the mapping-related code for every other x86 vdso timesource, so maybe
  I should try to add this to the pile.  The fact that the data is a
  variable number of pages makes it messy, though, and since I don't
  understand why there's a separate structure for each CPU, I'm hesitant
  to change it too much.
 
  --Andy
 
  kvmclock.data? Because each VCPU can have different .flags fields for
  example.

 It looks like the vdso kvmclock code only runs if
 PVCLOCK_TSC_STABLE_BIT is set, which in turn is only the case if the
 TSC is guaranteed to be monotonic across all CPUs.  If we can rely on
 the fact that that bit will only be set if tsc_to_system_mul and
 tsc_shift are the same on all CPUs and that (system_time -
 (tsc_timestamp * mul) >> shift) is the same on all CPUs, then there
 should be no reason for the vdso to read the pvclock data for anything
 but CPU 0.  That will make it a lot faster and simpler.

 Can we rely on that?

 In theory yes, but you would have to handle

 PVCLOCK_TSC_STABLE_BIT set -> PVCLOCK_TSC_STABLE_BIT not set

 Transition (and the other way around as well).

Since !STABLE already results in a real syscall for clock_gettime and
gettimeofday, I don't

Re: [PATCH] random: Add initialized variable to proc

2014-05-01 Thread Andy Lutomirski
On Thu, May 1, 2014 at 8:35 AM, Andy Lutomirski l...@amacapital.net wrote:
 On Thu, May 1, 2014 at 8:05 AM,  ty...@mit.edu wrote:
 On Wed, Apr 30, 2014 at 09:05:00PM -0700, H. Peter Anvin wrote:

 Giving the guest a seed would be highly useful, though.  There are a
 number of ways to do that; changing the boot protocol is probably
 only useful if Qemu itself boots the kernel as opposed to an in-VM
 bootloader.

 So how about simply passing a memory address and an optional offset on
 the boot command line?  That way the hypervisor can drop the seed in
 some convenient real memory location, and the kernel can just copy it
 someplace safe, or in the case of kernel ASLR, the relocator can use
 it to seed its CRNG, and then after it relocates the kernel, it can
 crank the CRNG to pass a seed to the kernel's urandom driver.

 That way, we don't have to do something which is ACPI or DT dependent.
 Maybe there will be embedded architectures where using DT might be
 more convenient, but this would probably be simplest for KVM/qemu-based
 VM's, I would think.

 One problem with passing a seed in memory like this is that it
 provides no benefit if the guest reboots without restarting the
 hypervisor.  Using an MSR or something avoids that issue.

 Passing an address in I/O space that can be read to synchronously
 obtain a seed would work, but it could still be messy to get the
 address to propagate through the bootloader and the reboot process.


A CPUID leaf or an MSR advertised by a CPUID leaf has another
advantage: it's easy to use in the ASLR code -- I don't think there's
a real IDT, so there's nothing like rdmsr_safe available.  It also
avoids doing anything complicated with the boot process to allow the
same seed to be used for ASLR and random.c; it can just be invoked
twice on boot.

Here are two easyish ways to do it:

a. Add a new CPUID leaf KVM_CPUID_URANDOM = 0x4002.  The existence
of the leaf is signaled by KVM_CPUID_SIGNATURE.eax = 0x4002.
Reading the leaf either gives all zeros to indicate that it's
unsupported or disabled or it gives 256 bits of urandom-style data in
rax,rbx,rcx,rdx.  32-bit callers will have trouble extracting more
than 128 of those 256 bits, but that should be fine.

b. Add a new MSR_KVM_URANDOM and indicate support using
KVM_FEATURE_URANDOM.  This is cleaner, since it matches existing
practice, but it's awkward to return more than 64 bits at a time from
rdmsr.  128 bits is straightforward by cheating and using the high
bits in rax and rdx, but that's kind of gross.  Clobbering any more
registers is awful, and passing a pointer into wrmsr seems
overcomplicated.
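
For option (a), the guest-side handling could look roughly like the sketch below.  Everything here is hypothetical: leaf_regs stands in for whatever the leaf returns in the four registers, and the "wide" flag models a 64-bit caller that can see the full registers.  The only behavior taken from the proposal is that an all-zeros result means "unsupported or disabled" and that 32-bit callers get at most 128 bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical snapshot of the four registers returned by the leaf. */
struct leaf_regs { uint64_t a, b, c, d; };

/*
 * Copy seed bytes out of the register snapshot.
 * Returns the number of bytes written to out; 0 means the leaf read back
 * as all zeros, i.e. unsupported or disabled.
 * wide selects full 64-bit registers (32 bytes total); otherwise only the
 * low 32 bits of each register are used (16 bytes total).
 */
static size_t seed_from_leaf(const struct leaf_regs *r, bool wide, uint8_t out[32])
{
    assert(r && out);

    const uint64_t regs[4] = { r->a, r->b, r->c, r->d };
    size_t n = 0;

    if ((r->a | r->b | r->c | r->d) == 0)
        return 0;

    for (int i = 0; i < 4; i++) {
        if (wide) {
            memcpy(out + n, &regs[i], sizeof(uint64_t));
            n += sizeof(uint64_t);
        } else {
            uint32_t lo = (uint32_t)regs[i];
            memcpy(out + n, &lo, sizeof(lo));
            n += sizeof(lo);
        }
    }
    return n;
}
```

One wart of the all-zeros convention is that a genuinely all-zero seed is indistinguishable from "unsupported", but at 256 bits that should never matter in practice.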

There's also the hypercall interface, but it looks like hyperv support
can interfere with it, and I'm not sure whether the guest needs to
cooperate with whatever the magical vmcall patching code is doing.

What's the right forum for this?  This thread is probably not it.

--Andy


Re: random: Providing a seed value to VM guests

2014-05-01 Thread Andy Lutomirski
On Thu, May 1, 2014 at 11:59 AM, H. Peter Anvin h...@zytor.com wrote:
 On 05/01/2014 11:53 AM, Andy Lutomirski wrote:

 A CPUID leaf or an MSR advertised by a CPUID leaf has another
 advantage: it's easy to use in the ASLR code -- I don't think there's
 a real IDT, so there's nothing like rdmsr_safe available.  It also
 avoids doing anything complicated with the boot process to allow the
 same seed to be used for ASLR and random.c; it can just be invoked
 twice on boot.


 At that point we are talking an x86-specific interface, and so we might
 as well simply emulate RDRAND (urandom) and RDSEED (random) if the CPU
 doesn't support them.  I believe KVM already has a way to report CPUID
 features that are emulated but supported anyway, i.e. they work but
 are slow.

Do existing kernels and userspace respect this?  If the normal bit for
RDRAND is unset, then we might be okay, but, if not, then I think this
may kill guest performance.

Is RDSEED really reasonable here?  Won't it slow down by several
orders of magnitude?


 What's the right forum for this?  This thread is probably not it.

 Change the subject line?

:)


 -hpa





-- 
Andy Lutomirski
AMA Capital Management, LLC


Re: random: Providing a seed value to VM guests

2014-05-01 Thread Andy Lutomirski
On May 1, 2014 12:26 PM, ty...@mit.edu wrote:

 On Thu, May 01, 2014 at 12:02:49PM -0700, Andy Lutomirski wrote:
 
  Is RDSEED really reasonable here?  Won't it slow down by several
  orders of magnitude?

 That is I think the biggest problem; RDRAND and RDSEED are fast if
 they are native, but they will involve a VM exit if they need to be
 emulated.  So when an OS might want to use RDRAND and RDSEED might be
 quite different if we know they are being emulated.

 Using the RDRAND and RDSEED api certainly makes sense, at least for
 x86, but I suspect we might want to use a different way of signalling
 that a VM guest can use RDRAND and RDSEED if they are running on a CPU
 which doesn't provide that kind of access.  Maybe a CPUID extended
 function parameter, if one could be allocated for use by a Linux
 hypervisor?


I'm still not convinced.  This will affect userspace as well as the
guest kernel, and I don't see why guest user code should be able to
access this API.  RDRAND for CPL0 only would work, but that seems odd.

And I think that RDSEED emulation is asking for trouble.  RDSEED is
synchronous, but /dev/random is asynchronous.  And making bootup wait
for even a single byte from /dev/random seems bad.  In any event,
virtio-rng should be a better interface for this.

 - Ted



Re: random: Providing a seed value to VM guests

2014-05-01 Thread Andy Lutomirski
On Thu, May 1, 2014 at 1:30 PM, H. Peter Anvin h...@zytor.com wrote:
 RDSEED is not synchronous.  It is, however, nonblocking.

What I mean is: IIUC it's reasonable to call RDSEED a few times in a
loop and hope it works.  It makes no sense to do that with
/dev/random.
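
Concretely, the pattern I mean is a small bounded retry loop like the sketch below.  The step callback is a stand-in for an RDSEED-style primitive that either succeeds immediately or reports "nothing available right now"; the names are made up, and the flaky_step helper only exists to exercise the loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for an RDSEED-like step: true and *out filled on success. */
typedef bool (*seed_step_fn)(uint64_t *out, void *ctx);

/* Try a nonblocking seed source a bounded number of times. */
static bool seed_with_retries(seed_step_fn step, void *ctx,
                              unsigned max_tries, uint64_t *out)
{
    assert(step && out);
    for (unsigned i = 0; i < max_tries; i++) {
        if (step(out, ctx))
            return true;
    }
    return false;
}

/* Test double: fails a fixed number of times, then succeeds. */
struct flaky {
    unsigned failures_left;
    unsigned calls;
};

static bool flaky_step(uint64_t *out, void *ctx)
{
    struct flaky *f = ctx;
    f->calls++;
    if (f->failures_left > 0) {
        f->failures_left--;
        return false;
    }
    *out = 0x5eedULL;
    return true;
}
```

A handful of retries is cheap when each attempt is a single instruction.  The same loop against a blocking or VM-exiting source is a different story.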


 On May 1, 2014 1:16:40 PM PDT, Andy Lutomirski l...@amacapital.net wrote:
On May 1, 2014 12:26 PM, ty...@mit.edu wrote:

 On Thu, May 01, 2014 at 12:02:49PM -0700, Andy Lutomirski wrote:
 
  Is RDSEED really reasonable here?  Won't it slow down by several
  orders of magnitude?

 That is I think the biggest problem; RDRAND and RDSEED are fast if
 they are native, but they will involve a VM exit if they need to be
 emulated.  So when an OS might want to use RDRAND and RDSEED might be
 quite different if we know they are being emulated.

 Using the RDRAND and RDSEED api certainly makes sense, at least for
 x86, but I suspect we might want to use a different way of signalling
 that a VM guest can use RDRAND and RDSEED if they are running on a
CPU
 which doesn't provide that kind of access.  Maybe a CPUID extended
 function parameter, if one could be allocated for use by a Linux
 hypervisor?


I'm still not convinced.  This will affect userspace as well as the
guest kernel, and I don't see why guest user code should be able to
access this API.  RDRAND for CPL0 only would work, but that seems odd.

And I think that RDSEED emulation is asking for trouble.  RDSEED is
synchronous, but /dev/random is asynchronous.  And making bootup wait
for even a single byte from /dev/random seems bad.  In any event,
virtio-rng should be a better interface for this.

 - Ted


 --
 Sent from my mobile phone.  Please pardon brevity and lack of formatting.



-- 
Andy Lutomirski
AMA Capital Management, LLC


Re: random: Providing a seed value to VM guests

2014-05-01 Thread Andy Lutomirski
On Thu, May 1, 2014 at 1:39 PM,  ty...@mit.edu wrote:
 On Thu, May 01, 2014 at 01:32:55PM -0700, Andy Lutomirski wrote:
 On Thu, May 1, 2014 at 1:30 PM, H. Peter Anvin h...@zytor.com wrote:
  RDSEED is not synchronous.  It is, however, nonblocking.

 What I mean is: IIUC it's reasonable to call RDSEED a few times in a
 loop and hope it works.  It makes no sense to do that with
 /dev/random.

 RDSEED is allowed to return an error if there is insufficient entropy.
 So long as the caller understands that this is an emulated
 instruction, I don't see a problem.

What's the point?

I think this is too caught up in x86 architectural stuff.  As I see
it, the goal is to give guests a way to ask their hosts to give them,
immediately and synchronously, some bytes suitable for seeding an RNG.
 These bytes need not contain true entropy, because the host may not
be able to provide entropy in a timely manner.  The mechanism should
be usable extremely early after boot, it should be usable after a
guest reboot, and it should be reliable.  I think there's an added
benefit if all architectures can implement a semantically equivalent
function, even if the interface is completely different.

There's no need for anything new to provide asynchronous and/or very
slow true random data -- virtio-rng already exists. *

Emulating RDRAND for this purpose is a little weird because it's
normally available to user code and it has the flag indicating
failure.  We're also not going to want the guest kernel to access it
through the arch_get_random interface.

Even if we could emulate RDSEED effectively**, I don't really
understand what the guest is expected to do with it.  And I generally
dislike defining an interface with no known sensible users, because it
means that there's a good chance that the interface won't end up
working.

* I still don't know why it doesn't work for me.  I'll fiddle with it,
but I think that the right solution is to fix it for this purpose, not
to replace it.
** Doing this sensibly in the host will be awkward.  Is the host
supposed to use non-blocking reads of /dev/random?  Getting anything
remotely fair may be difficult.


 - Ted



-- 
Andy Lutomirski
AMA Capital Management, LLC


Re: random: Providing a seed value to VM guests

2014-05-01 Thread Andy Lutomirski
On Thu, May 1, 2014 at 2:01 PM, H. Peter Anvin h...@zytor.com wrote:
 On 05/01/2014 01:56 PM, Andy Lutomirski wrote:

 Even if we could emulate RDSEED effectively**, I don't really
 understand what the guest is expected to do with it.  And I generally
 dislike defining an interface with no known sensible users, because it
 means that there's a good chance that the interface won't end up
 working.

 ** Doing this sensibly in the host will be awkward.  Is the host
 supposed to use non-blocking reads of /dev/random?  Getting anything
 remotely fair may be difficult.

 The host can use nonblocking reads of /dev/random.  Fairness would have
 to be implemented at the host level, but that is true for anything.


I still don't see the point.  What does this do better than virtio-rng?

The ASLR code doesn't even try to use RDSEED.  RDSEED is used in
add_interrupt_randomness, which shouldn't drain the host's /dev/random
even if it could, and it's used in init_std_data.  The logic there is:

if (!arch_get_random_seed_long(&rv) &&
    !arch_get_random_long(&rv))
        rv = random_get_entropy();

I think this is better achieved by having the host try to supply the
highest quality data it can.

The third RDSEED use is arch_random_refill.  This purpose would be
much better served by the khwrng stuff and virtio-rng.

So I still claim that fancy emulated RDSEED support will have no users.

--Andy


Re: random: Providing a seed value to VM guests

2014-05-01 Thread Andy Lutomirski
On Thu, May 1, 2014 at 3:28 PM,  ty...@mit.edu wrote:
 On Thu, May 01, 2014 at 02:06:13PM -0700, Andy Lutomirski wrote:

 I still don't see the point.  What does this do better than virtio-rng?

 I believe you had been complaining about how complicated it was to set
 up virtio?  And this complexity is also an issue if we want to use it
 to initialize the RNG used for the kernel text ASLR --- which has to
 be done very early in the boot process, and where making something as
 simple as possible is a Good Thing.

It's complicated, so it won't be up until much later in the boot
process.  This is completely fine for /dev/random, but it's a problem
for /dev/urandom, ASLR, and such.


 And since we would want to use RDRAND/RDSEED if it is available
 *anyway*, perhaps in combination with other things, why not use the
 RDRAND/RDSEED interface?

Because it's awkward.  I don't think it simplifies anything.

--Andy


Re: random: Providing a seed value to VM guests

2014-05-01 Thread Andy Lutomirski
On Thu, May 1, 2014 at 3:46 PM, H. Peter Anvin h...@zytor.com wrote:
 On 05/01/2014 03:32 PM, Andy Lutomirski wrote:
 On Thu, May 1, 2014 at 3:28 PM,  ty...@mit.edu wrote:
 On Thu, May 01, 2014 at 02:06:13PM -0700, Andy Lutomirski wrote:

 I still don't see the point.  What does this do better than virtio-rng?

 I believe you had been complaining about how complicated it was to set
 up virtio?  And this complexity is also an issue if we want to use it
 to initialize the RNG used for the kernel text ASLR --- which has to
 be done very early in the boot process, and where making something as
 simple as possible is a Good Thing.

 It's complicated, so it won't be up until much later in the boot
 process.  This is completely fine for /dev/random, but it's a problem
 for /dev/urandom, ASLR, and such.


 And since we would want to use RDRAND/RDSEED if it is available
 *anyway*, perhaps in combination with other things, why not use the
 RDRAND/RDSEED interface?

 Because it's awkward.  I don't think it simplifies anything.


 It greatly simplifies discovery, which is a Big Deal[TM] in early code.

I think we're comparing:

a) cpuid to detect rdrand *or* emulated rdrand followed by rdrand

to

b) cpuid to detect rdrand or the paravirt seed msr/cpuid call,
followed by rdrand or the msr or cpuid read

this seems like it barely makes a difference, especially since (a)
probably requires detecting KVM anyway.
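
As a rough sketch of what (b) amounts to in early code: the RDRAND bit is the architectural CPUID.01H:ECX bit 30, while the paravirt feature bit and the cpu_probe struct below are invented purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CPUID_1_ECX_RDRAND     (1u << 30) /* CPUID.01H:ECX bit 30 */
#define PV_FEATURE_SEED_SKETCH (1u << 0)  /* made-up bit, illustration only */

/* Hypothetical summary of what early boot code has already probed. */
struct cpu_probe {
    uint32_t cpuid1_ecx;  /* ECX from CPUID leaf 1 */
    bool on_kvm;          /* hypervisor signature matched KVM */
    uint32_t pv_features; /* paravirt feature bits, if on_kvm */
};

enum seed_source { SEED_NONE, SEED_RDRAND, SEED_PARAVIRT };

/* Prefer native RDRAND, then the paravirt seed call, else nothing. */
static enum seed_source pick_seed_source(const struct cpu_probe *p)
{
    assert(p);
    if (p->cpuid1_ecx & CPUID_1_ECX_RDRAND)
        return SEED_RDRAND;
    if (p->on_kvm && (p->pv_features & PV_FEATURE_SEED_SKETCH))
        return SEED_PARAVIRT;
    return SEED_NONE;
}
```

Option (a) would drop the second branch but would still need something like on_kvm to know whether RDRAND is emulated, so the two end up about the same size.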


For the real kernel code, it's probably even closer to making no
difference, since I don't think we'll want arch_get_random_long to use
emulated rdrand.

--Andy


x86_64 allyesconfig has screwed up voffset and blows up KVM

2014-05-05 Thread Andy Lutomirski
I'm testing 39bfe90706ab0f588db7cb4d1c0e6d1181e1d2f9.  I'm not sure
what's going on here.

voffset.h contains:

#define VO__end 0x8111c7a0
#define VO__end 0x8db9a000
#define VO__text 0x8100

because

$ nm vmlinux|grep ' _end'
8111c7a0 t _end
8db9a000 B _end


Booting the resulting image says:

KVM internal error. Suberror: 1
emulation failure
EAX=8001 EBX= ECX=c080 EDX=
ESI=00014630 EDI=0b08f000 EBP=0010 ESP=038f14b8
EIP=00100119 EFL=00010046 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0018   00c09300 DPL=0 DS   [-WA]
CS =0010   00c09b00 DPL=0 CS32 [-RA]
SS =0018   00c09300 DPL=0 DS   [-WA]
DS =0018   00c09300 DPL=0 DS   [-WA]
FS =0018   00c09300 DPL=0 DS   [-WA]
GS =0018   00c09300 DPL=0 DS   [-WA]
LDT=   00c0
TR =0020  0fff 00808b00 DPL=0 TSS64-busy
GDT= 038e5320 0030
IDT=  
CR0=8011 CR2= CR3=0b089000 CR4=0020
DR0= DR1= DR2=
DR3=
DR6=0ff0 DR7=0400
EFER=0500
Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
?? ?? ?? ?? ?? ??

Linus's tree from today doesn't seem any better.


Re: [PATCH 00/10] RFC: userfault

2014-07-02 Thread Andy Lutomirski
On 07/02/2014 09:50 AM, Andrea Arcangeli wrote:
 Hello everyone,
 
 There's a large CC list for this RFC because this adds two new
 syscalls (userfaultfd and remap_anon_pages) and
 MADV_USERFAULT/MADV_NOUSERFAULT, so suggestions on changes to the API
 or on a completely different API if somebody has better ideas are
 welcome now.

cc:linux-api -- this is certainly worthy of linux-api discussion.

 
 The combination of these features is what I would propose to
 implement postcopy live migration in qemu, and in general demand
 paging of remote memory, hosted in different cloud nodes.
 
 The MADV_USERFAULT feature should be generic enough that it can
 provide the userfaults to the Android volatile range feature too, on
 access of reclaimed volatile pages.
 
 If the access could ever happen in kernel context through syscalls
 (not just from userland context), then userfaultfd has to be used
 to make the userfault unnoticeable to the syscall (no error will be
 returned). This latter feature is more advanced than what volatile
 ranges alone could do with SIGBUS so far (but it's optional, if the
 process doesn't call userfaultfd, the regular SIGBUS will fire, if the
 fd is closed SIGBUS will also fire for any blocked userfault that was
 waiting a userfaultfd_write ack).
 
 userfaultfd is also a generic enough feature, that it allows KVM to
 implement postcopy live migration without having to modify a single
 line of KVM kernel code. Guest async page faults, FOLL_NOWAIT and all
 other GUP features work just fine in combination with userfaults
 (userfaults trigger async page faults in the guest scheduler so those
 guest processes that aren't waiting for userfaults can keep running in
 the guest vcpus).
 
 remap_anon_pages is the syscall to use to resolve the userfaults (it's
 not mandatory, vmsplice will likely still be used in the case of local
 postcopy live migration just to upgrade the qemu binary, but
 remap_anon_pages is faster and ideal for transferring memory across
 the network, it's zerocopy and doesn't touch the vma: it only holds
 the mmap_sem for reading).
 
 The current behavior of remap_anon_pages is very strict to avoid any
 chance of memory corruption going unnoticed. mremap is not strict like
 that: if there's a synchronization bug it would drop the destination
 range silently resulting in subtle memory corruption for
 example. remap_anon_pages would return -EEXIST in that case. If there
 are holes in the source range remap_anon_pages will return -ENOENT.
 
 If remap_anon_pages is used always with 2M naturally aligned
 addresses, transparent hugepages will not be split. If there could
 be 4k (or any size) holes in the 2M (or any size) source range,
 remap_anon_pages should be used with the RAP_ALLOW_SRC_HOLES flag to
 relax some of its strict checks (-ENOENT won't be returned if
 RAP_ALLOW_SRC_HOLES is set, remap_anon_pages then will just behave as
 a noop on any hole in the source range). This flag is generally useful
 when implementing userfaults with THP granularity, but it shouldn't be
 set if doing the userfaults with PAGE_SIZE granularity if the
 developer wants to benefit from the strict -ENOENT behavior.
 
 The remap_anon_pages syscall API is not vectored, as I expect it to be
 used mainly for demand paging (where there can be just one faulting
 range per userfault) or for large ranges (with the THP model as an
 alternative to zapping re-dirtied pages with MADV_DONTNEED with 4k
 granularity before starting the guest in the destination node) where
 vectoring isn't going to provide much of a performance advantage (thanks
 to the THP coarser granularity).
 
 On the rmap side remap_anon_pages doesn't add much complexity: there's
 no need of nonlinear anon vmas to support it because I added the
 constraint that it will fail if the mapcount is more than 1. So in
 general the source range of remap_anon_pages should be marked
 MADV_DONTFORK to prevent any risk of failure if the process ever
 forks (like qemu can in some cases).
 
 One part that hasn't been tested is the poll() syscall on the
 userfaultfd because the postcopy migration thread currently is more
 efficient waiting on blocking read()s (I'll write some code to test
 poll() too). I also appended below a patch to trinity to exercise
 remap_anon_pages and userfaultfd and it completes trinity
 successfully.
 
 The code can be found here:
 
 git clone --reference linux 
 git://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git -b userfault 
 
 The branch is rebased so you can get updates for example with:
 
 git fetch && git checkout -f origin/userfault
 
 Comments welcome, thanks!
 Andrea
 
 From cbe940e13b4cead41e0f862b3abfa3814f235ec3 Mon Sep 17 00:00:00 2001
 From: Andrea Arcangeli aarca...@redhat.com
 Date: Wed, 2 Jul 2014 18:32:35 +0200
 Subject: [PATCH] add remap_anon_pages and userfaultfd
 
 Signed-off-by: Andrea Arcangeli aarca...@redhat.com
 ---
  include/syscalls-x86_64.h   |   2 +
  syscalls/remap_anon_pages.c | 100 
 

Re: [PATCH 08/10] userfaultfd: add new syscall to provide memory externalization

2014-07-02 Thread Andy Lutomirski
On 07/02/2014 09:50 AM, Andrea Arcangeli wrote:
 Once a userfaultfd is created, MADV_USERFAULT regions talk through
 the userfaultfd protocol with the thread responsible for doing the
 memory externalization of the process.
 
 The protocol starts by userland writing the requested/preferred
 USERFAULT_PROTOCOL version into the userfault fd (64bit write), if
 kernel knows it, it will ack it by allowing userland to read 64bit
 from the userfault fd that will contain the same 64bit
 USERFAULT_PROTOCOL version that userland asked. Otherwise userfault
 will read __u64 value -1ULL (aka USERFAULTFD_UNKNOWN_PROTOCOL) and it
 will have to try again by writing an older protocol version if
 suitable for its usage too, and read it back again until it stops
 reading -1ULL. After that the userfaultfd protocol starts.
 
 The protocol consists of userfault fd reads, 64 bits in size,
 providing userland with the fault addresses. After a userfault address has
 been read and the fault is resolved by userland, the application must
 write back 128bits in the form of [ start, end ] range (64bit each)
 that will tell the kernel such a range has been mapped. Multiple read
 userfaults can be resolved in a single range write. poll() can be used
 to know when there are new userfaults to read (POLLIN) and when there
 are threads waiting a wakeup through a range write (POLLOUT).
 
 Signed-off-by: Andrea Arcangeli aarca...@redhat.com

 +#ifdef CONFIG_PROC_FS
 +static int userfaultfd_show_fdinfo(struct seq_file *m, struct file *f)
 +{
 +	struct userfaultfd_ctx *ctx = f->private_data;
 +	int ret;
 +	wait_queue_t *wq;
 +	struct userfaultfd_wait_queue *uwq;
 +	unsigned long pending = 0, total = 0;
 +
 +	spin_lock(&ctx->fault_wqh.lock);
 +	list_for_each_entry(wq, &ctx->fault_wqh.task_list, task_list) {
 +		uwq = container_of(wq, struct userfaultfd_wait_queue, wq);
 +		if (uwq->pending)
 +			pending++;
 +		total++;
 +	}
 +	spin_unlock(&ctx->fault_wqh.lock);
 +
 +	ret = seq_printf(m, "pending:\t%lu\ntotal:\t%lu\n", pending, total);

This should show the protocol version, too.

 +
 +SYSCALL_DEFINE1(userfaultfd, int, flags)
 +{
 + int fd, error;
 + struct file *file;

This looks like it can't be used more than once in a process.  That will
be unfortunate for libraries.  Would it be feasible to either have
userfaultfd claim a range of addresses or for a vma to be explicitly
associated with a userfaultfd?  (In the latter case, giant PROT_NONE
MAP_NORESERVE mappings could be used.)
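
For reference, the version handshake described in the quoted commit
message can be sketched in plain userspace C.  This is only an
illustration under stated assumptions: mock_kernel_ack() and
negotiate_protocol() are hypothetical names, and the kernel side is
simulated by a function rather than driven through a real userfault fd.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define USERFAULTFD_UNKNOWN_PROTOCOL ((uint64_t)-1)

/*
 * Hypothetical stand-in for the kernel side: a known version is acked
 * by echoing it back, anything else yields the "unknown" marker.
 */
static uint64_t mock_kernel_ack(uint64_t requested, uint64_t newest_known)
{
	if (requested >= 1 && requested <= newest_known)
		return requested;
	return USERFAULTFD_UNKNOWN_PROTOCOL;
}

/*
 * Try versions from most to least preferred and return the first one
 * that is echoed back, or USERFAULTFD_UNKNOWN_PROTOCOL if none is.
 */
static uint64_t negotiate_protocol(const uint64_t *preferred, size_t n,
				   uint64_t kernel_newest)
{
	size_t i;

	assert(preferred != NULL || n == 0);
	for (i = 0; i < n; i++) {
		uint64_t reply = mock_kernel_ack(preferred[i], kernel_newest);

		if (reply != USERFAULTFD_UNKNOWN_PROTOCOL)
			return reply;
	}
	return USERFAULTFD_UNKNOWN_PROTOCOL;
}
```

With preferences { 3, 2, 1 } and a simulated kernel that knows up to
version 2, the loop settles on version 2 after one rejected attempt.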



[PATCH 0/4] random,x86,kvm: Add and use MSR_KVM_GET_RNG_SEED

2014-07-15 Thread Andy Lutomirski
virtio-rng is both too complicated and insufficient for initial rng
seeding.  It's far too complicated to use for KASLR or any other
early boot random number needs.  It also provides /dev/random-style
bits, which means that making guest boot wait for virtio-rng is
unacceptably slow, and doing it asynchronously means that
/dev/urandom might be predictable when userspace starts.

This introduces a very simple synchronous mechanism to get
/dev/urandom-style bits.

This is a KVM change: am I supposed to write a unit test somewhere?

Andy Lutomirski (4):
  x86,kvm: Add MSR_KVM_GET_RNG_SEED and a matching feature bit
  random,x86: Add arch_get_slow_rng_u64
  random: Seed pools from arch_get_slow_rng_u64 at startup
  x86,kaslr: Use MSR_KVM_GET_RNG_SEED for KASLR if available

 Documentation/virtual/kvm/cpuid.txt  |  3 +++
 arch/x86/Kconfig |  4 
 arch/x86/boot/compressed/aslr.c  | 27 +++
 arch/x86/include/asm/archslowrng.h   | 30 ++
 arch/x86/include/uapi/asm/kvm_para.h |  2 ++
 arch/x86/kernel/kvm.c| 22 ++
 arch/x86/kvm/cpuid.c |  3 ++-
 arch/x86/kvm/x86.c   |  4 
 drivers/char/random.c| 14 +-
 include/linux/random.h   |  9 +
 10 files changed, 116 insertions(+), 2 deletions(-)
 create mode 100644 arch/x86/include/asm/archslowrng.h

-- 
1.9.3



[PATCH 1/4] x86,kvm: Add MSR_KVM_GET_RNG_SEED and a matching feature bit

2014-07-15 Thread Andy Lutomirski
This adds a simple interface to allow a guest to request 64 bits of
host nonblocking entropy.  This is independent of virtio-rng for a
couple of reasons:

 - It's intended to be usable during early boot, when a trivial
   synchronous interface is needed.

 - virtio-rng gives blocking entropy, and making guest boot wait for
   the host's /dev/random will cause problems.

MSR_KVM_GET_RNG_SEED is intended to provide 64 bits of best-effort
cryptographically secure data for use as a seed.  It provides no
guarantee that the result contains any actual entropy.

Signed-off-by: Andy Lutomirski l...@amacapital.net
---
 Documentation/virtual/kvm/cpuid.txt  | 3 +++
 arch/x86/include/uapi/asm/kvm_para.h | 2 ++
 arch/x86/kvm/cpuid.c | 3 ++-
 arch/x86/kvm/x86.c   | 4 
 4 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/Documentation/virtual/kvm/cpuid.txt b/Documentation/virtual/kvm/cpuid.txt
index 3c65feb..0ab043b 100644
--- a/Documentation/virtual/kvm/cpuid.txt
+++ b/Documentation/virtual/kvm/cpuid.txt
@@ -54,6 +54,9 @@ KVM_FEATURE_PV_UNHALT              ||     7 || guest checks this feature bit
                                    ||       || before enabling paravirtualized
                                    ||       || spinlock support.
 ------------------------------------------------------------------------------
+KVM_FEATURE_GET_RNG_SEED           ||     8 || host provides rng seed data via
+                                   ||       || MSR_KVM_GET_RNG_SEED.
+------------------------------------------------------------------------------
 KVM_FEATURE_CLOCKSOURCE_STABLE_BIT ||    24 || host will warn if no guest-side
                                    ||       || per-cpu warps are expected in
                                    ||       || kvmclock.
diff --git a/arch/x86/include/uapi/asm/kvm_para.h 
b/arch/x86/include/uapi/asm/kvm_para.h
index 94dc8ca..e2eaf93 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -24,6 +24,7 @@
 #define KVM_FEATURE_STEAL_TIME 5
 #define KVM_FEATURE_PV_EOI 6
 #define KVM_FEATURE_PV_UNHALT  7
+#define KVM_FEATURE_GET_RNG_SEED   8
 
 /* The last 8 bits are used to indicate how to interpret the flags field
  * in pvclock structure. If no bits are set, all flags are ignored.
@@ -40,6 +41,7 @@
 #define MSR_KVM_ASYNC_PF_EN 0x4b564d02
 #define MSR_KVM_STEAL_TIME  0x4b564d03
 #define MSR_KVM_PV_EOI_EN  0x4b564d04
+#define MSR_KVM_GET_RNG_SEED 0x4b564d05
 
 struct kvm_steal_time {
__u64 steal;
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 38a0afe..40d6763 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -479,7 +479,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 			     (1 << KVM_FEATURE_ASYNC_PF) |
 			     (1 << KVM_FEATURE_PV_EOI) |
 			     (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) |
-			     (1 << KVM_FEATURE_PV_UNHALT);
+			     (1 << KVM_FEATURE_PV_UNHALT) |
+			     (1 << KVM_FEATURE_GET_RNG_SEED);
 
 		if (sched_info_on())
 			entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f644933..4e81853 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -48,6 +48,7 @@
 #include <linux/pci.h>
 #include <linux/timekeeper_internal.h>
 #include <linux/pvclock_gtod.h>
+#include <linux/random.h>
 #include <trace/events/kvm.h>
 
 #define CREATE_TRACE_POINTS
@@ -2480,6 +2481,9 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
 	case MSR_KVM_PV_EOI_EN:
 		data = vcpu->arch.pv_eoi.msr_val;
 		break;
+	case MSR_KVM_GET_RNG_SEED:
+		get_random_bytes(&data, sizeof(data));
+		break;
 	case MSR_IA32_P5_MC_ADDR:
 	case MSR_IA32_P5_MC_TYPE:
 	case MSR_IA32_MCG_CAP:
-- 
1.9.3
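
The read path added above can be illustrated outside the kernel with a
small mock.  This is a sketch under stated assumptions, not the KVM
code: mock_get_msr() and mock_get_random_u64() are hypothetical names,
and the xorshift generator merely stands in for the host's
get_random_bytes() and is not cryptographically secure.

```c
#include <assert.h>
#include <stdint.h>

#define MSR_KVM_GET_RNG_SEED 0x4b564d05

/* Stand-in for get_random_bytes(); xorshift64, for illustration only. */
static uint64_t mock_state = 0x9e3779b97f4a7c15ULL;

static uint64_t mock_get_random_u64(void)
{
	mock_state ^= mock_state << 13;
	mock_state ^= mock_state >> 7;
	mock_state ^= mock_state << 17;
	return mock_state;
}

/*
 * Mock MSR read dispatcher: returns 0 and fills *pdata for a handled
 * MSR, 1 for an unhandled one.  Every read of the seed MSR hands out
 * fresh data; no per-vcpu state is kept.
 */
static int mock_get_msr(uint32_t msr, uint64_t *pdata)
{
	assert(pdata);
	switch (msr) {
	case MSR_KVM_GET_RNG_SEED:
		*pdata = mock_get_random_u64();
		return 0;
	default:
		return 1;
	}
}
```

The point of the design is visible in the mock: the handler is
stateless and synchronous, which is what makes it usable in early boot.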



[PATCH 3/4] random: Seed pools from arch_get_slow_rng_u64 at startup

2014-07-15 Thread Andy Lutomirski
This should help solve the problem of guests starting out with
predictable RNG state.

Signed-off-by: Andy Lutomirski l...@amacapital.net
---
 drivers/char/random.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 0a7ac0a..bd88a24 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1248,7 +1248,7 @@ EXPORT_SYMBOL(get_random_bytes_arch);
  */
 static void init_std_data(struct entropy_store *r)
 {
-   int i;
+   int i, slow_rng_bits = 0;
ktime_t now = ktime_get_real();
unsigned long rv;
 
@@ -1261,6 +1261,18 @@ static void init_std_data(struct entropy_store *r)
 		mix_pool_bytes(r, &rv, sizeof(rv), NULL);
 	}
 	mix_pool_bytes(r, utsname(), sizeof(*(utsname())), NULL);
+
+	for (i = 0; i < 4; i++) {
+		u64 rv64;
+
+		if (arch_get_slow_rng_u64(&rv64)) {
+			mix_pool_bytes(r, &rv64, sizeof(rv64), NULL);
+			slow_rng_bits += 8 * sizeof(rv64);
+		}
+	}
+	if (slow_rng_bits)
+		pr_info("random: seeded %s pool with %d bits of arch slow rng data\n",
+			r->name, slow_rng_bits);
 }
 
 /*
-- 
1.9.3
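
The bit accounting in the loop above works out to at most 4 * 64 = 256
bits per pool.  A userspace sketch of the same loop, assuming a
hypothetical mock source (struct mock_slow_rng) and a toy mixer in place
of mix_pool_bytes():

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical slow source: succeeds only while calls remain. */
struct mock_slow_rng {
	int available;
	uint64_t next;
};

static int mock_get_slow_rng_u64(struct mock_slow_rng *src, uint64_t *v)
{
	assert(src && v);
	if (src->available <= 0)
		return 0;
	src->available--;
	*v = src->next++;
	return 1;
}

/* Toy mixer standing in for mix_pool_bytes(): rotate, then XOR. */
static void mock_mix(uint64_t *pool, uint64_t v)
{
	*pool = ((*pool << 7) | (*pool >> 57)) ^ v;
}

/* Mirrors the patch: up to four 64-bit reads, counting mixed bits. */
static int seed_from_slow_rng(uint64_t *pool, struct mock_slow_rng *src)
{
	int i, slow_rng_bits = 0;

	for (i = 0; i < 4; i++) {
		uint64_t rv64;

		if (mock_get_slow_rng_u64(src, &rv64)) {
			mock_mix(pool, rv64);
			slow_rng_bits += 8 * (int)sizeof(rv64);
		}
	}
	return slow_rng_bits;
}
```

A source that fails on every call yields 0 bits, which is the case in
which the patch prints nothing.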



[PATCH 2/4] random,x86: Add arch_get_slow_rng_u64

2014-07-15 Thread Andy Lutomirski
arch_get_slow_rng_u64 tries to get 64 bits of RNG seed data.  Unlike
arch_get_random_{bytes,seed}, etc., it makes no claims about entropy
content.  It's also likely to be much slower and should not be used
frequently.  That being said, it should be fast enough to call
several times during boot without any noticeable slowdown.

This initial implementation backs it with MSR_KVM_GET_RNG_SEED if
available.  The intent is for other hypervisor guest implementations
to implement this interface.

Signed-off-by: Andy Lutomirski l...@amacapital.net
---
 arch/x86/Kconfig   |  4 
 arch/x86/include/asm/archslowrng.h | 30 ++
 arch/x86/kernel/kvm.c  | 22 ++
 include/linux/random.h |  9 +
 4 files changed, 65 insertions(+)
 create mode 100644 arch/x86/include/asm/archslowrng.h

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index a8f749e..4dfb539 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -593,6 +593,7 @@ config KVM_GUEST
 	bool "KVM Guest support (including kvmclock)"
depends on PARAVIRT
select PARAVIRT_CLOCK
+   select ARCH_SLOW_RNG
default y
---help---
  This option enables various optimizations for running under the KVM
@@ -627,6 +628,9 @@ config PARAVIRT_TIME_ACCOUNTING
 config PARAVIRT_CLOCK
bool
 
+config ARCH_SLOW_RNG
+   bool
+
 endif #HYPERVISOR_GUEST
 
 config NO_BOOTMEM
diff --git a/arch/x86/include/asm/archslowrng.h b/arch/x86/include/asm/archslowrng.h
new file mode 100644
index 000..c8e8d0d
--- /dev/null
+++ b/arch/x86/include/asm/archslowrng.h
@@ -0,0 +1,30 @@
+/*
+ * This file is part of the Linux kernel.
+ *
+ * Copyright (c) 2014 Andy Lutomirski
+ * Authors: Andy Lutomirski l...@amacapital.net
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+
+#ifndef ASM_X86_ARCHSLOWRANDOM_H
+#define ASM_X86_ARCHSLOWRANDOM_H
+
+#ifndef CONFIG_ARCH_SLOW_RNG
+# error archslowrng.h should not be included if !CONFIG_ARCH_SLOW_RNG
+#endif
+
+/*
+ * Performance is irrelevant here, so there's no point in using the
+ * paravirt ops mechanism.  Instead just use a function pointer.
+ */
+extern int (*arch_get_slow_rng_u64)(u64 *v);
+
+#endif /* ASM_X86_ARCHSLOWRANDOM_H */
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 3dd8e2c..8d64d28 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -416,6 +416,25 @@ void kvm_disable_steal_time(void)
wrmsr(MSR_KVM_STEAL_TIME, 0, 0);
 }
 
+static int nop_get_slow_rng_u64(u64 *v)
+{
+   return 0;
+}
+
+static int kvm_get_slow_rng_u64(u64 *v)
+{
+   /*
+* Allow migration from a hypervisor with the GET_RNG_SEED
+* feature to a hypervisor without it.
+*/
+   if (rdmsrl_safe(MSR_KVM_GET_RNG_SEED, v) == 0)
+   return 1;
+   else
+   return 0;
+}
+
+int (*arch_get_slow_rng_u64)(u64 *v) = nop_get_slow_rng_u64;
+
 #ifdef CONFIG_SMP
 static void __init kvm_smp_prepare_boot_cpu(void)
 {
@@ -493,6 +512,9 @@ void __init kvm_guest_init(void)
if (kvmclock_vsyscall)
kvm_setup_vsyscall_timeinfo();
 
+   if (kvm_para_has_feature(KVM_FEATURE_GET_RNG_SEED))
+   arch_get_slow_rng_u64 = kvm_get_slow_rng_u64;
+
 #ifdef CONFIG_SMP
smp_ops.smp_prepare_boot_cpu = kvm_smp_prepare_boot_cpu;
register_cpu_notifier(kvm_cpu_notifier);
diff --git a/include/linux/random.h b/include/linux/random.h
index 57fbbff..ceafbcf 100644
--- a/include/linux/random.h
+++ b/include/linux/random.h
@@ -106,6 +106,15 @@ static inline int arch_has_random_seed(void)
 }
 #endif
 
+#ifdef CONFIG_ARCH_SLOW_RNG
+# include <asm/archslowrng.h>
+#else
+static inline int arch_get_slow_rng_u64(u64 *v)
+{
+   return 0;
+}
+#endif
+
 /* Pseudo random number generator from numerical recipes. */
 static inline u32 next_pseudo_random32(u32 seed)
 {
-- 
1.9.3
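
The dispatch scheme in this patch (a function pointer that defaults to a
no-op and is switched once at guest init) can be tried in userspace.  A
minimal sketch, assuming hypothetical names mock_hv_get_slow_rng_u64()
and mock_guest_init() and a fixed value in place of the real MSR read:

```c
#include <assert.h>
#include <stdint.h>

/* Default: no slow RNG available, report failure. */
static int nop_get_slow_rng_u64(uint64_t *v)
{
	(void)v;
	return 0;
}

/* Hypothetical hypervisor-backed source; the value is a placeholder. */
static int mock_hv_get_slow_rng_u64(uint64_t *v)
{
	assert(v);
	*v = 0x0123456789abcdefULL;
	return 1;
}

/* Plain function pointer, as in the patch; no paravirt ops needed. */
static int (*get_slow_rng_u64)(uint64_t *v) = nop_get_slow_rng_u64;

/* Stand-in for the guest init hook: switch only if the bit is set. */
static void mock_guest_init(int has_feature)
{
	if (has_feature)
		get_slow_rng_u64 = mock_hv_get_slow_rng_u64;
}
```

Callers never need to check the feature bit themselves; they call
through the pointer and look at the return value.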



[PATCH 4/4] x86,kaslr: Use MSR_KVM_GET_RNG_SEED for KASLR if available

2014-07-15 Thread Andy Lutomirski
It's considerably better than any of the alternatives on KVM.

Signed-off-by: Andy Lutomirski l...@amacapital.net
---
 arch/x86/boot/compressed/aslr.c | 27 +++
 1 file changed, 27 insertions(+)

diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
index fc6091a..8583f0e 100644
--- a/arch/x86/boot/compressed/aslr.c
+++ b/arch/x86/boot/compressed/aslr.c
@@ -5,6 +5,8 @@
 #include <asm/archrandom.h>
 #include <asm/e820.h>
 
+#include <uapi/asm/kvm_para.h>
+
 #include <generated/compile.h>
 #include <linux/module.h>
 #include <linux/uts.h>
@@ -15,6 +17,22 @@
 static const char build_str[] = UTS_RELEASE " (" LINUX_COMPILE_BY "@"
 		LINUX_COMPILE_HOST ") (" LINUX_COMPILER ") " UTS_VERSION;
 
+static bool kvm_para_has_feature(unsigned int feature)
+{
+   u32 kvm_base;
+   u32 features;
+
+   if (!has_cpuflag(X86_FEATURE_HYPERVISOR))
+   return false;
+
+	kvm_base = hypervisor_cpuid_base("KVMKVMKVM\0\0\0", KVM_CPUID_FEATURES);
+	if (!kvm_base)
+		return false;
+
+	features = cpuid_eax(kvm_base | KVM_CPUID_FEATURES);
+	return features & (1UL << feature);
+}
+
 #define I8254_PORT_CONTROL	0x43
 #define I8254_PORT_COUNTER0	0x40
 #define I8254_CMD_READBACK	0xC0
@@ -81,6 +99,15 @@ static unsigned long get_random_long(void)
}
}
 
+	if (kvm_para_has_feature(KVM_FEATURE_GET_RNG_SEED)) {
+		u64 seed;
+
+		debug_putstr(" MSR_KVM_GET_RNG_SEED");
+		rdmsrl(MSR_KVM_GET_RNG_SEED, seed);
+		random ^= (unsigned long)seed;
+		use_i8254 = false;
+	}
+
 	if (has_cpuflag(X86_FEATURE_TSC)) {
 		debug_putstr(" RDTSC");
 		rdtscll(raw);
-- 
1.9.3
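
The feature check above boils down to picking the right CPUID leaf and
testing one bit of EAX.  A userspace sketch of just that arithmetic,
assuming hypothetical helper names features_leaf() and has_feature() and
taking the EAX value as a parameter instead of executing CPUID:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define KVM_CPUID_FEATURES       0x40000001
#define KVM_FEATURE_GET_RNG_SEED 8

/* Leaf to query, given the base at which the KVM signature was found. */
static uint32_t features_leaf(uint32_t kvm_base)
{
	return kvm_base | KVM_CPUID_FEATURES;
}

/* Test a single feature bit in the EAX value of the features leaf. */
static bool has_feature(uint32_t features_eax, unsigned int feature)
{
	assert(feature < 32);
	return (features_eax & (UINT32_C(1) << feature)) != 0;
}
```

For a signature found at base 0x40000100 the features leaf is
0x40000101, and bit 8 of its EAX value decides whether the seed MSR may
be read.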



[PATCH kvm-unit-tests] Add a test case for MSR_KVM_GET_RNG_SEED

2014-07-15 Thread Andy Lutomirski
Signed-off-by: Andy Lutomirski l...@amacapital.net
---
 config/config-x86-common.mak |  5 -
 x86/get_rng_seed.c   | 50 
 x86/unittests.cfg|  3 +++
 3 files changed, 57 insertions(+), 1 deletion(-)
 create mode 100644 x86/get_rng_seed.c

diff --git a/config/config-x86-common.mak b/config/config-x86-common.mak
index 0b0da85..201a029 100644
--- a/config/config-x86-common.mak
+++ b/config/config-x86-common.mak
@@ -35,7 +35,8 @@ tests-common = $(TEST_DIR)/vmexit.flat $(TEST_DIR)/tsc.flat \
$(TEST_DIR)/kvmclock_test.flat  $(TEST_DIR)/eventinj.flat \
$(TEST_DIR)/s3.flat $(TEST_DIR)/pmu.flat \
$(TEST_DIR)/tsc_adjust.flat $(TEST_DIR)/asyncpf.flat \
-   $(TEST_DIR)/init.flat $(TEST_DIR)/smap.flat
+   $(TEST_DIR)/init.flat $(TEST_DIR)/smap.flat \
+   $(TEST_DIR)/get_rng_seed.flat
 
 ifdef API
 tests-common += api/api-sample
@@ -105,6 +106,8 @@ $(TEST_DIR)/vmx.elf: $(cstart.o) $(TEST_DIR)/vmx.o $(TEST_DIR)/vmx_tests.o
 
 $(TEST_DIR)/debug.elf: $(cstart.o) $(TEST_DIR)/debug.o
 
+$(TEST_DIR)/get_rng_seed.elf: $(cstart.o) $(TEST_DIR)/get_rng_seed.o
+
 arch_clean:
$(RM) $(TEST_DIR)/*.o $(TEST_DIR)/*.flat $(TEST_DIR)/*.elf \
$(TEST_DIR)/.*.d lib/x86/.*.d
diff --git a/x86/get_rng_seed.c b/x86/get_rng_seed.c
new file mode 100644
index 000..b2e1b01
--- /dev/null
+++ b/x86/get_rng_seed.c
@@ -0,0 +1,50 @@
+/*
+ * Simple test for MSR_KVM_GET_RNG_SEED.
+ */
+#include "x86/msr.h"
+#include "x86/processor.h"
+#include "x86/apic-defs.h"
+#include "x86/apic.h"
+#include "x86/desc.h"
+#include "x86/isr.h"
+#include "x86/vm.h"
+
+#include "libcflat.h"
+#include <stdint.h>
+
+#define MSR_KVM_GET_RNG_SEED 0x4b564d05
+
+volatile int ngpfs;
+bool fail;
+
+static void gpf_isr(struct ex_regs *r)
+{
+   ngpfs++;
+	r->rip += 2;
+}
+
+int main(int ac, char **av)
+{
+   int loop = 3;
+   u64 val, prev = 0;
+
+	setup_vm();
+	setup_idt();
+	while (loop--) {
+		val = rdmsr(MSR_KVM_GET_RNG_SEED);
+		printf("rng seed: %llx\n", (unsigned long long)val);
+		if (val == prev)
+			fail = true;
+		prev = val;
+	}
+
+	handle_exception(13, gpf_isr);
+	wrmsr(MSR_KVM_GET_RNG_SEED, 0);
+	if (ngpfs != 1) {
+		printf("error: wrmsr(MSR_KVM_GET_RNG_SEED) should not work\n");
+		fail = true;
+	}
+
+	printf("%s\n", fail ? "FAIL" : "PASS");
+	return fail;
+}
diff --git a/x86/unittests.cfg b/x86/unittests.cfg
index d78fe0e..98e5c7b 100644
--- a/x86/unittests.cfg
+++ b/x86/unittests.cfg
@@ -158,3 +158,6 @@ arch = x86_64
 [debug]
 file = debug.flat
 arch = x86_64
+
+[get_rng_seed]
+file = get_rng_seed.flat
-- 
1.9.3
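
The freshness check in the test (no two consecutive reads equal, and the
first read not equal to the initial prev of 0) can be exercised without
a guest.  A sketch assuming hypothetical mock sources in place of
rdmsr(MSR_KVM_GET_RNG_SEED):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sources: one always changes, one is stuck. */
static uint64_t mock_counter_state;

static uint64_t mock_fresh_seed(void)
{
	return ++mock_counter_state;
}

static uint64_t mock_stuck_seed(void)
{
	return 0x1234;
}

/*
 * Same logic as the loop in the test: flag a failure whenever a value
 * repeats the previous one.  As in the test, prev starts at 0, so a
 * zero first read also counts as a failure.
 */
static bool seeds_look_fresh(uint64_t (*read_seed)(void), int loops)
{
	uint64_t val, prev = 0;
	bool fail = false;

	assert(read_seed);
	while (loops--) {
		val = read_seed();
		if (val == prev)
			fail = true;
		prev = val;
	}
	return !fail;
}
```

This is a sanity check only; three distinct values say nothing about
the quality of the seed data.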



Re: [PATCH 0/4] random,x86,kvm: Add and use MSR_KVM_GET_RNG_SEED

2014-07-16 Thread Andy Lutomirski
On Wed, Jul 16, 2014 at 12:36 AM, Paolo Bonzini pbonz...@redhat.com wrote:
 Il 16/07/2014 09:10, Daniel Borkmann ha scritto:

 On 07/16/2014 08:41 AM, Gleb Natapov wrote:

 On Tue, Jul 15, 2014 at 07:48:06PM -0700, Andy Lutomirski wrote:

 virtio-rng is both too complicated and insufficient for initial rng
 seeding.  It's far too complicated to use for KASLR or any other
 early boot random number needs.  It also provides /dev/random-style
 bits, which means that making guest boot wait for virtio-rng is
 unacceptably slow, and doing it asynchronously means that
 /dev/urandom might be predictable when userspace starts.

 This introduces a very simple synchronous mechanism to get
 /dev/urandom-style bits.


 Why can't you use RDRAND instruction for that?


 You mean using it directly? I think simply for the very same reasons
 as in c2557a303a ...


 No, this is very different.  This mechanism provides no guarantee that the
 result contains any actual entropy.  In fact, patch 3 adds a call to the
 new arch_get_slow_rng_u64 just below a call to arch_get_random_long aka
 RDRAND.  I agree with Gleb that it's simpler to just expect a relatively
 recent processor and use RDRAND.

 BTW, the logic for crediting entropy to RDSEED but not RDRAND escapes me.
 If you trust the processor, you could use Intel's algorithm to force
 reseeding of RDRAND.  If you don't trust the processor, the same paranoia
 applies to RDRAND and RDSEED.

 In a guest you must trust the hypervisor anyway to use RDRAND or RDSEED,
 since the hypervisor can trap it.  A malicious hypervisor is no different
 from a malicious processor.


This patch has nothing whatsoever to do with how much I trust the CPU
vs the hypervisor.  It's for the enormous installed base of machines
without RDRAND.

hpa suggested emulating RDRAND a while ago, but I think that'll be
unusably slow -- the kernel uses RDRAND in various places where it's
expected to be fast, and not using it at all will be preferable to
causing a VM exit for every few bytes.  I've been careful to only use
this in the guest in places where a few hundred to a few thousand
cycles per 64 bits of RNG seed is acceptable.

 In any case, is there a matching QEMU patch somewhere?

What QEMU change is needed?  I admit I'm a bit vague on how QEMU and
KVM cooperate here, but there's no state to save and restore.  I guess
that QEMU wants the ability to turn this on and off for migration.
How does that work?  I couldn't spot the KVM code that allows this
type of control.

--Andy


[PATCH qemu] i386,linux-headers: Add support for kvm_get_rng_seed

2014-07-16 Thread Andy Lutomirski
This updates x86's kvm_para.h for the feature bit definition and
target-i386/cpu.c for the feature name and default.

Signed-off-by: Andy Lutomirski l...@amacapital.net
---
 linux-headers/asm-x86/kvm_para.h | 2 ++
 target-i386/cpu.c| 5 +++--
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/linux-headers/asm-x86/kvm_para.h b/linux-headers/asm-x86/kvm_para.h
index e41c5c1..a9b27ce 100644
--- a/linux-headers/asm-x86/kvm_para.h
+++ b/linux-headers/asm-x86/kvm_para.h
@@ -24,6 +24,7 @@
 #define KVM_FEATURE_STEAL_TIME 5
 #define KVM_FEATURE_PV_EOI 6
 #define KVM_FEATURE_PV_UNHALT  7
+#define KVM_FEATURE_GET_RNG_SEED   8
 
 /* The last 8 bits are used to indicate how to interpret the flags field
  * in pvclock structure. If no bits are set, all flags are ignored.
@@ -40,6 +41,7 @@
 #define MSR_KVM_ASYNC_PF_EN 0x4b564d02
 #define MSR_KVM_STEAL_TIME  0x4b564d03
 #define MSR_KVM_PV_EOI_EN  0x4b564d04
+#define MSR_KVM_GET_RNG_SEED 0x4b564d05
 
 struct kvm_steal_time {
__u64 steal;
diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index 8fd1497..4ea7e6c 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -236,7 +236,7 @@ static const char *ext4_feature_name[] = {
 static const char *kvm_feature_name[] = {
     "kvmclock", "kvm_nopiodelay", "kvm_mmu", "kvmclock",
     "kvm_asyncpf", "kvm_steal_time", "kvm_pv_eoi", "kvm_pv_unhalt",
-    NULL, NULL, NULL, NULL,
+    "kvm_get_rng_seed", NULL, NULL, NULL,
 NULL, NULL, NULL, NULL,
 NULL, NULL, NULL, NULL,
 NULL, NULL, NULL, NULL,
@@ -368,7 +368,8 @@ static uint32_t kvm_default_features[FEATURE_WORDS] = {
         (1 << KVM_FEATURE_ASYNC_PF) |
         (1 << KVM_FEATURE_STEAL_TIME) |
         (1 << KVM_FEATURE_PV_EOI) |
-        (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT),
+        (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) |
+        (1 << KVM_FEATURE_GET_RNG_SEED),
 [FEAT_1_ECX] = CPUID_EXT_X2APIC,
 };
 
-- 
1.9.3
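
In the name table above, an entry's position is the CPUID feature bit it
names, so the new string has to sit at index 8.  A standalone sketch of
that relationship, assuming a hypothetical lookup helper
feature_bit_by_name() (this is not QEMU's own parsing code):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Index in the table == feature bit number; unnamed bits stay NULL. */
static const char *const kvm_feature_name[32] = {
	"kvmclock", "kvm_nopiodelay", "kvm_mmu", "kvmclock",
	"kvm_asyncpf", "kvm_steal_time", "kvm_pv_eoi", "kvm_pv_unhalt",
	"kvm_get_rng_seed",
};

/* Return the first bit carrying this name, or -1 if there is none. */
static int feature_bit_by_name(const char *name)
{
	size_t i;

	assert(name);
	for (i = 0; i < sizeof(kvm_feature_name) / sizeof(kvm_feature_name[0]); i++) {
		if (kvm_feature_name[i] && strcmp(kvm_feature_name[i], name) == 0)
			return (int)i;
	}
	return -1;
}
```

Note that "kvmclock" appears twice in the table (bits 0 and 3); the
helper above reports only the first match.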



Re: [PATCH 0/4] random,x86,kvm: Add and use MSR_KVM_GET_RNG_SEED

2014-07-16 Thread Andy Lutomirski
On Wed, Jul 16, 2014 at 7:32 AM, Paolo Bonzini pbonz...@redhat.com wrote:
 Il 16/07/2014 16:07, Andy Lutomirski ha scritto:

 This patch has nothing whatsoever to do with how much I trust the CPU
 vs the hypervisor.  It's for the enormous installed base of machines
 without RDRAND.


 Ok.  I think an MSR is fine, though I don't think it's useful for the guest
 to use it if it already has RDRAND and/or RDSEED.


  In any case, is there a matching QEMU patch somewhere?

 What QEMU change is needed?  I admit I'm a bit vague on how QEMU and
 KVM cooperate here, but there's no state to save and restore.  I guess
 that QEMU wants the ability to turn this on and off for migration.
 How does that work?  I couldn't spot the KVM code that allows this
 type of control.


 It is QEMU who decides the CPUID bits that are visible to the guest.  By
 default it blocks bits that it doesn't know about.  You would need to add
 the bit in the kvm_default_features and kvm_feature_name arrays.

 For migration, we have versioned machine types, for example pc-2.1.
 Once the versioned machine type exists, blocking the feature is a one-liner
 like

 x86_cpu_compat_disable_kvm_features(FEAT_KVM, KVM_FEATURE_NAME);

 Unfortunately, QEMU is in hard freeze, so you'd likely be the one creating
 pc-2.2.  This is a boilerplate but relatively complicated patch.  But let's
 cross that bridge when we reach it.  For now, you can simply add the bit
 to the two arrays above.


Done.

NB: Patch 4 of this series is bad due to an asm constraint issue that
I haven't figured out yet.  I'll send a replacement once I get it
working.  *sigh* the x86 kernel loading code is a bit of a compilation
mess.

 Paolo



-- 
Andy Lutomirski
AMA Capital Management, LLC


[PATCH v2 1/5] x86,kvm: Add MSR_KVM_GET_RNG_SEED and a matching feature bit

2014-07-16 Thread Andy Lutomirski
This adds a simple interface to allow a guest to request 64 bits of
host nonblocking entropy.  This is independent of virtio-rng for a
couple of reasons:

 - It's intended to be usable during early boot, when a trivial
   synchronous interface is needed.

 - virtio-rng gives blocking entropy, and making guest boot wait for
   the host's /dev/random will cause problems.

MSR_KVM_GET_RNG_SEED is intended to provide 64 bits of best-effort
cryptographically secure data for use as a seed.  It provides no
guarantee that the result contains any actual entropy.

Signed-off-by: Andy Lutomirski l...@amacapital.net
---
 Documentation/virtual/kvm/cpuid.txt  | 3 +++
 arch/x86/include/uapi/asm/kvm_para.h | 2 ++
 arch/x86/kvm/cpuid.c | 3 ++-
 arch/x86/kvm/x86.c   | 4 
 4 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/Documentation/virtual/kvm/cpuid.txt b/Documentation/virtual/kvm/cpuid.txt
index 3c65feb..0ab043b 100644
--- a/Documentation/virtual/kvm/cpuid.txt
+++ b/Documentation/virtual/kvm/cpuid.txt
@@ -54,6 +54,9 @@ KVM_FEATURE_PV_UNHALT              ||     7 || guest checks this feature bit
                                    ||       || before enabling paravirtualized
                                    ||       || spinlock support.
 ------------------------------------------------------------------------------
+KVM_FEATURE_GET_RNG_SEED           ||     8 || host provides rng seed data via
+                                   ||       || MSR_KVM_GET_RNG_SEED.
+------------------------------------------------------------------------------
 KVM_FEATURE_CLOCKSOURCE_STABLE_BIT ||    24 || host will warn if no guest-side
                                    ||       || per-cpu warps are expected in
                                    ||       || kvmclock.
diff --git a/arch/x86/include/uapi/asm/kvm_para.h 
b/arch/x86/include/uapi/asm/kvm_para.h
index 94dc8ca..e2eaf93 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -24,6 +24,7 @@
 #define KVM_FEATURE_STEAL_TIME 5
 #define KVM_FEATURE_PV_EOI 6
 #define KVM_FEATURE_PV_UNHALT  7
+#define KVM_FEATURE_GET_RNG_SEED   8
 
 /* The last 8 bits are used to indicate how to interpret the flags field
  * in pvclock structure. If no bits are set, all flags are ignored.
@@ -40,6 +41,7 @@
 #define MSR_KVM_ASYNC_PF_EN 0x4b564d02
 #define MSR_KVM_STEAL_TIME  0x4b564d03
 #define MSR_KVM_PV_EOI_EN  0x4b564d04
+#define MSR_KVM_GET_RNG_SEED 0x4b564d05
 
 struct kvm_steal_time {
__u64 steal;
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 38a0afe..40d6763 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -479,7 +479,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 			     (1 << KVM_FEATURE_ASYNC_PF) |
 			     (1 << KVM_FEATURE_PV_EOI) |
 			     (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) |
-			     (1 << KVM_FEATURE_PV_UNHALT);
+			     (1 << KVM_FEATURE_PV_UNHALT) |
+			     (1 << KVM_FEATURE_GET_RNG_SEED);
 
 		if (sched_info_on())
 			entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f644933..4e81853 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -48,6 +48,7 @@
 #include <linux/pci.h>
 #include <linux/timekeeper_internal.h>
 #include <linux/pvclock_gtod.h>
+#include <linux/random.h>
 #include <trace/events/kvm.h>
 
 #define CREATE_TRACE_POINTS
@@ -2480,6 +2481,9 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
 	case MSR_KVM_PV_EOI_EN:
 		data = vcpu->arch.pv_eoi.msr_val;
 		break;
+	case MSR_KVM_GET_RNG_SEED:
+		get_random_bytes(&data, sizeof(data));
+		break;
 	case MSR_IA32_P5_MC_ADDR:
 	case MSR_IA32_P5_MC_TYPE:
 	case MSR_IA32_MCG_CAP:
-- 
1.9.3



[PATCH v2 5/5] x86,kaslr: Use MSR_KVM_GET_RNG_SEED for KASLR if available

2014-07-16 Thread Andy Lutomirski
It's considerably better than any of the alternatives on KVM.

Rather than reinventing all of the cpu feature query code, this fixes
native_cpuid to work in PIC objects.

I haven't combined it with boot/cpuflags.c's cpuid implementation:
including asm/processor.h from boot/cpuflags.c results in a flood of
unrelated errors, and fixing it might be messy.

Signed-off-by: Andy Lutomirski l...@amacapital.net
---
 arch/x86/boot/compressed/aslr.c  | 27 +++
 arch/x86/include/asm/processor.h | 21 ++---
 2 files changed, 45 insertions(+), 3 deletions(-)

diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
index fc6091a..8583f0e 100644
--- a/arch/x86/boot/compressed/aslr.c
+++ b/arch/x86/boot/compressed/aslr.c
@@ -5,6 +5,8 @@
 #include <asm/archrandom.h>
 #include <asm/e820.h>
 
+#include <uapi/asm/kvm_para.h>
+
 #include <generated/compile.h>
 #include <linux/module.h>
 #include <linux/uts.h>
@@ -15,6 +17,22 @@
 static const char build_str[] = UTS_RELEASE " (" LINUX_COMPILE_BY "@"
 		LINUX_COMPILE_HOST ") (" LINUX_COMPILER ") " UTS_VERSION;
 
+static bool kvm_para_has_feature(unsigned int feature)
+{
+   u32 kvm_base;
+   u32 features;
+
+   if (!has_cpuflag(X86_FEATURE_HYPERVISOR))
+   return false;
+
+	kvm_base = hypervisor_cpuid_base("KVMKVMKVM\0\0\0", KVM_CPUID_FEATURES);
+	if (!kvm_base)
+		return false;
+
+	features = cpuid_eax(kvm_base | KVM_CPUID_FEATURES);
+	return features & (1UL << feature);
+}
+
 #define I8254_PORT_CONTROL	0x43
 #define I8254_PORT_COUNTER0	0x40
 #define I8254_CMD_READBACK	0xC0
@@ -81,6 +99,15 @@ static unsigned long get_random_long(void)
}
}
 
+	if (kvm_para_has_feature(KVM_FEATURE_GET_RNG_SEED)) {
+		u64 seed;
+
+		debug_putstr(" MSR_KVM_GET_RNG_SEED");
+		rdmsrl(MSR_KVM_GET_RNG_SEED, seed);
+		random ^= (unsigned long)seed;
+		use_i8254 = false;
+	}
+
 	if (has_cpuflag(X86_FEATURE_TSC)) {
 		debug_putstr(" RDTSC");
 		rdtscll(raw);
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index a4ea023..6096f3c 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -189,10 +189,25 @@ static inline int have_cpuid_p(void)
 static inline void native_cpuid(unsigned int *eax, unsigned int *ebx,
 				unsigned int *ecx, unsigned int *edx)
 {
-	/* ecx is often an input as well as an output. */
-	asm volatile("cpuid"
+	/*
+	 * This function can be used from the boot code, so it needs
+	 * to avoid using EBX in constraints in PIC mode.
+	 *
+	 * ecx is often an input as well as an output.
+	 */
+	asm volatile(".ifnc %%ebx,%1 ; .ifnc %%rbx,%1   \n\t"
+		     "movl  %%ebx,%1                    \n\t"
+		     ".endif ; .endif                   \n\t"
+		     "cpuid                             \n\t"
+		     ".ifnc %%ebx,%1 ; .ifnc %%rbx,%1   \n\t"
+		     "xchgl %%ebx,%1                    \n\t"
+		     ".endif ; .endif"
 	    : "=a" (*eax),
-	      "=b" (*ebx),
+#if defined(__i386__) && defined(__PIC__)
+	      "=r" (*ebx),	/* gcc won't let us use ebx */
+#else
+	      "=b" (*ebx),	/* ebx is okay */
+#endif
 	      "=c" (*ecx),
 	      "=d" (*edx)
 	    : "0" (*eax), "2" (*ecx)
-- 
1.9.3



[PATCH v2 4/5] random: Log how many bits we managed to seed with in init_std_data

2014-07-16 Thread Andy Lutomirski
This is useful for making sure that init_std_data is working
correctly and for allaying fear when this happens:

random: xyz urandom read with SMALL_NUMBER bits of entropy available

Signed-off-by: Andy Lutomirski l...@amacapital.net
---
 drivers/char/random.c | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index e2c3d02..10e9642 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1251,12 +1251,16 @@ static void init_std_data(struct entropy_store *r)
 	int i;
 	ktime_t now = ktime_get_real();
 	unsigned long rv;
+	int arch_seed_bits = 0, arch_random_bits = 0, slow_rng_bits = 0;
 
 	r->last_pulled = jiffies;
 	mix_pool_bytes(r, &now, sizeof(now), NULL);
 	for (i = r->poolinfo->poolbytes; i > 0; i -= sizeof(rv)) {
-		if (!arch_get_random_seed_long(&rv) &&
-		    !arch_get_random_long(&rv))
+		if (arch_get_random_seed_long(&rv))
+			arch_seed_bits += 8 * sizeof(rv);
+		else if (arch_get_random_long(&rv))
+			arch_random_bits += 8 * sizeof(rv);
+		else
 			rv = random_get_entropy();
 		mix_pool_bytes(r, &rv, sizeof(rv), NULL);
 	}
@@ -1265,10 +1269,14 @@ static void init_std_data(struct entropy_store *r)
 	for (i = 0; i < 4; i++) {
 		u64 rv64;
 
-		if (arch_get_slow_rng_u64(rv64))
+		if (arch_get_slow_rng_u64(&rv64)) {
 			mix_pool_bytes(r, &rv64, sizeof(rv64), NULL);
+			slow_rng_bits += 8 * sizeof(rv64);
 		}
 	}
+
+	pr_info("random: seeded %s pool with %d bits of arch random seed, %d bits of arch random, and %d bits of arch slow rng\n",
+		r->name, arch_seed_bits, arch_random_bits, slow_rng_bits);
 }
 
 /*
-- 
1.9.3
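
The per-source accounting in the first loop follows a simple fallback
chain: seed source first, then the regular arch RNG, then the cycle
counter, which is credited with nothing.  A userspace sketch, assuming
hypothetical names (struct seed_stats, count_seed_sources()), flags in
place of the real arch_get_random_* calls, and a fixed 64-bit word where
the kernel uses unsigned long:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct seed_stats {
	int arch_seed_bits;
	int arch_random_bits;
	int fallback_words;	/* words that came from the last resort */
};

/*
 * Walk the pool one word at a time and credit each word to the first
 * source that is available, mirroring the if / else if / else chain.
 */
static struct seed_stats count_seed_sources(size_t poolbytes,
					    int have_seed, int have_random)
{
	struct seed_stats s = { 0, 0, 0 };
	const long word_bytes = (long)sizeof(uint64_t);
	long i;

	assert(poolbytes % sizeof(uint64_t) == 0);
	for (i = (long)poolbytes; i > 0; i -= word_bytes) {
		if (have_seed)
			s.arch_seed_bits += 8 * (int)word_bytes;
		else if (have_random)
			s.arch_random_bits += 8 * (int)word_bytes;
		else
			s.fallback_words++;
	}
	return s;
}
```

For a 128-byte pool this credits 1024 bits to exactly one of the two
arch sources, or 16 uncredited fallback words when neither is present.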



[PATCH v2 2/5] random,x86: Add arch_get_slow_rng_u64

2014-07-16 Thread Andy Lutomirski
arch_get_slow_rng_u64 tries to get 64 bits of RNG seed data.  Unlike
arch_get_random_{bytes,seed}, etc., it makes no claims about entropy
content.  It's also likely to be much slower and should not be used
frequently.  That being said, it should be fast enough to call
several times during boot without any noticeable slowdown.

This initial implementation backs it with MSR_KVM_GET_RNG_SEED if
available.  The intent is for other hypervisor guest implementations
to implement this interface.

Signed-off-by: Andy Lutomirski <l...@amacapital.net>
---
 arch/x86/Kconfig   |  4 
 arch/x86/include/asm/archslowrng.h | 30 ++
 arch/x86/kernel/kvm.c  | 22 ++
 include/linux/random.h |  9 +
 4 files changed, 65 insertions(+)
 create mode 100644 arch/x86/include/asm/archslowrng.h

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index a8f749e..4dfb539 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -593,6 +593,7 @@ config KVM_GUEST
 	bool "KVM Guest support (including kvmclock)"
 	depends on PARAVIRT
 	select PARAVIRT_CLOCK
+	select ARCH_SLOW_RNG
 	default y
 	---help---
 	  This option enables various optimizations for running under the KVM
@@ -627,6 +628,9 @@ config PARAVIRT_TIME_ACCOUNTING
 config PARAVIRT_CLOCK
 	bool
 
+config ARCH_SLOW_RNG
+	bool
+
 endif #HYPERVISOR_GUEST
 
 config NO_BOOTMEM
diff --git a/arch/x86/include/asm/archslowrng.h b/arch/x86/include/asm/archslowrng.h
new file mode 100644
index 000..c8e8d0d
--- /dev/null
+++ b/arch/x86/include/asm/archslowrng.h
@@ -0,0 +1,30 @@
+/*
+ * This file is part of the Linux kernel.
+ *
+ * Copyright (c) 2014 Andy Lutomirski
+ * Authors: Andy Lutomirski <l...@amacapital.net>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+
+#ifndef ASM_X86_ARCHSLOWRANDOM_H
+#define ASM_X86_ARCHSLOWRANDOM_H
+
+#ifndef CONFIG_ARCH_SLOW_RNG
+# error archslowrng.h should not be included if !CONFIG_ARCH_SLOW_RNG
+#endif
+
+/*
+ * Performance is irrelevant here, so there's no point in using the
+ * paravirt ops mechanism.  Instead just use a function pointer.
+ */
+extern int (*arch_get_slow_rng_u64)(u64 *v);
+
+#endif /* ASM_X86_ARCHSLOWRANDOM_H */
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 3dd8e2c..8d64d28 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -416,6 +416,25 @@ void kvm_disable_steal_time(void)
 	wrmsr(MSR_KVM_STEAL_TIME, 0, 0);
 }
 
+static int nop_get_slow_rng_u64(u64 *v)
+{
+	return 0;
+}
+
+static int kvm_get_slow_rng_u64(u64 *v)
+{
+	/*
+	 * Allow migration from a hypervisor with the GET_RNG_SEED
+	 * feature to a hypervisor without it.
+	 */
+	if (rdmsrl_safe(MSR_KVM_GET_RNG_SEED, v) == 0)
+		return 1;
+	else
+		return 0;
+}
+
+int (*arch_get_slow_rng_u64)(u64 *v) = nop_get_slow_rng_u64;
+
 #ifdef CONFIG_SMP
 static void __init kvm_smp_prepare_boot_cpu(void)
 {
@@ -493,6 +512,9 @@ void __init kvm_guest_init(void)
 	if (kvmclock_vsyscall)
 		kvm_setup_vsyscall_timeinfo();
 
+	if (kvm_para_has_feature(KVM_FEATURE_GET_RNG_SEED))
+		arch_get_slow_rng_u64 = kvm_get_slow_rng_u64;
+
 #ifdef CONFIG_SMP
 	smp_ops.smp_prepare_boot_cpu = kvm_smp_prepare_boot_cpu;
 	register_cpu_notifier(&kvm_cpu_notifier);
diff --git a/include/linux/random.h b/include/linux/random.h
index 57fbbff..ceafbcf 100644
--- a/include/linux/random.h
+++ b/include/linux/random.h
@@ -106,6 +106,15 @@ static inline int arch_has_random_seed(void)
 }
 #endif
 
+#ifdef CONFIG_ARCH_SLOW_RNG
+# include <asm/archslowrng.h>
+#else
+static inline int arch_get_slow_rng_u64(u64 *v)
+{
+	return 0;
+}
+#endif
+
 /* Pseudo random number generator from numerical recipes. */
 static inline u32 next_pseudo_random32(u32 seed)
 {
-- 
1.9.3



[PATCH v2 3/5] random: Seed pools from arch_get_slow_rng_u64 at startup

2014-07-16 Thread Andy Lutomirski
This should help solve the problem of guests starting out with
predictable RNG state.

Signed-off-by: Andy Lutomirski <l...@amacapital.net>
---
 drivers/char/random.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 0a7ac0a..e2c3d02 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1261,6 +1261,14 @@ static void init_std_data(struct entropy_store *r)
 		mix_pool_bytes(r, &rv, sizeof(rv), NULL);
 	}
 	mix_pool_bytes(r, utsname(), sizeof(*(utsname())), NULL);
+
+	for (i = 0; i < 4; i++) {
+		u64 rv64;
+
+		if (arch_get_slow_rng_u64(&rv64))
+			mix_pool_bytes(r, &rv64, sizeof(rv64), NULL);
+		}
+	}
 }
 
 /*
-- 
1.9.3



[PATCH v2 0/5] random,x86,kvm: Add and use MSR_KVM_GET_RNG_SEED

2014-07-16 Thread Andy Lutomirski
virtio-rng is both too complicated and insufficient for initial rng
seeding.  It's far too complicated to use for KASLR or any other
early boot random number needs.  It also provides /dev/random-style
bits, which means that making guest boot wait for virtio-rng is
unacceptably slow, and doing it asynchronously means that
/dev/urandom might be predictable when userspace starts.

This introduces a very simple synchronous mechanism to get
/dev/urandom-style bits.

I sent the corresponding kvm-unit-tests and qemu changes separately.

There's room for bikeshedding on the name arch_get_slow_rng_u64.  I
considered arch_get_rng_seed_u64, but that could be confused with
arch_get_random_seed_long, which is not interchangeable.

Changes from v1:
 - Split patches 2 and 3
 - Log all arch sources in init_std_data
 - Fix the 32-bit kaslr build

Andy Lutomirski (5):
  x86,kvm: Add MSR_KVM_GET_RNG_SEED and a matching feature bit
  random,x86: Add arch_get_slow_rng_u64
  random: Seed pools from arch_get_slow_rng_u64 at startup
  random: Log how many bits we managed to seed with in init_std_data
  x86,kaslr: Use MSR_KVM_GET_RNG_SEED for KASLR if available

 Documentation/virtual/kvm/cpuid.txt  |  3 +++
 arch/x86/Kconfig |  4 
 arch/x86/boot/compressed/aslr.c  | 27 +++
 arch/x86/include/asm/archslowrng.h   | 30 ++
 arch/x86/include/asm/processor.h | 21 ++---
 arch/x86/include/uapi/asm/kvm_para.h |  2 ++
 arch/x86/kernel/kvm.c| 22 ++
 arch/x86/kvm/cpuid.c |  3 ++-
 arch/x86/kvm/x86.c   |  4 
 drivers/char/random.c| 20 ++--
 include/linux/random.h   |  9 +
 11 files changed, 139 insertions(+), 6 deletions(-)
 create mode 100644 arch/x86/include/asm/archslowrng.h

-- 
1.9.3



Re: [PATCH v2 0/5] random,x86,kvm: Add and use MSR_KVM_GET_RNG_SEED

2014-07-16 Thread Andy Lutomirski
On Wed, Jul 16, 2014 at 11:02 AM, Bandan Das <b...@redhat.com> wrote:
> Andy Lutomirski <l...@amacapital.net> writes:
>
>> virtio-rng is both too complicated and insufficient for initial rng
>> seeding.  It's far too complicated to use for KASLR or any other
>> early boot random number needs.  It also provides /dev/random-style
>> bits, which means that making guest boot wait for virtio-rng is
>> unacceptably slow, and doing it asynchronously means that
>> /dev/urandom might be predictable when userspace starts.
>>
>> This introduces a very simple synchronous mechanism to get
>> /dev/urandom-style bits.
>
> Whoa! the cover letter seems more like virtio-rng bashing rather than
> introduction to the patchset (and/or its advantages over existing methods)
> :) That's ok though I guess, these won't be in the commit log.
>

Yeah, sorry -- I figured that the biggest objection would be "just use
virtio-rng".

I'll send a v3 later today -- there's a trivial bisectability bug in
this version.

--Andy

>> I sent the corresponding kvm-unit-tests and qemu changes separately.
>>
>> There's room for bikeshedding on the name arch_get_slow_rng_u64.  I
>> considered arch_get_rng_seed_u64, but that could be confused with
>> arch_get_random_seed_long, which is not interchangeable.
>>
>> Changes from v1:
>>  - Split patches 2 and 3
>>  - Log all arch sources in init_std_data
>>  - Fix the 32-bit kaslr build
>>
>> Andy Lutomirski (5):
>>   x86,kvm: Add MSR_KVM_GET_RNG_SEED and a matching feature bit
>>   random,x86: Add arch_get_slow_rng_u64
>>   random: Seed pools from arch_get_slow_rng_u64 at startup
>>   random: Log how many bits we managed to seed with in init_std_data
>>   x86,kaslr: Use MSR_KVM_GET_RNG_SEED for KASLR if available
>>
>>  Documentation/virtual/kvm/cpuid.txt  |  3 +++
>>  arch/x86/Kconfig |  4 
>>  arch/x86/boot/compressed/aslr.c  | 27 +++
>>  arch/x86/include/asm/archslowrng.h   | 30 ++
>>  arch/x86/include/asm/processor.h | 21 ++---
>>  arch/x86/include/uapi/asm/kvm_para.h |  2 ++
>>  arch/x86/kernel/kvm.c| 22 ++
>>  arch/x86/kvm/cpuid.c |  3 ++-
>>  arch/x86/kvm/x86.c   |  4 
>>  drivers/char/random.c| 20 ++--
>>  include/linux/random.h   |  9 +
>>  11 files changed, 139 insertions(+), 6 deletions(-)
>>  create mode 100644 arch/x86/include/asm/archslowrng.h



-- 
Andy Lutomirski
AMA Capital Management, LLC


Re: [PATCH 0/4] random,x86,kvm: Add and use MSR_KVM_GET_RNG_SEED

2014-07-16 Thread Andy Lutomirski
On Wed, Jul 16, 2014 at 1:20 PM, H. Peter Anvin <h...@zytor.com> wrote:
> On 07/16/2014 09:21 AM, Gleb Natapov wrote:
>> On Wed, Jul 16, 2014 at 09:13:23AM -0700, H. Peter Anvin wrote:
>>> On 07/16/2014 09:08 AM, Paolo Bonzini wrote:
>>>> On 16/07/2014 18:03, H. Peter Anvin wrote:
>>>>> I suggested emulating RDRAND *but not set the CPUID bit*.  We already
>>>>> developed a protocol in KVM/Qemu to enumerate emulated features (created
>>>>> for MOVBE as I recall), specifically to service the semantic "feature X
>>>>> will work but will be substantially slower than normal."
>>>>
>>>> But those will set the CPUID bit.  There is currently no way for KVM
>>>> guests to know if a CPUID bit is real or emulated.
>>>>
>>>
>>> OK, so there wasn't any protocol implemented in the end.  I sit corrected.
>>>
>> That protocol that was implemented is between qemu and kvm, not kvm and a
>> guest.
>>
>
> Either which way, the notion was to have a PV CPUID bit like the
> proposed kvm_get_rng_seed bit, but to have it exercised by executing RDRAND.
>
> The biggest reason to *not* do this would be that with an MSR it is not
> available to guest user space, which may be better under the circumstances.

On the theory that I see no legitimate reason to expose this to guest
user space, I think we shouldn't expose it.  If we wanted to add a
get_random_bytes syscall, that would be an entirely different story,
though.

Should I send v3 as one series or should I split it into host and guest parts?

--Andy


[PATCH v3 0/5] random,x86,kvm: Add and use MSR_KVM_GET_RNG_SEED

2014-07-16 Thread Andy Lutomirski
This introduces and uses a very simple synchronous mechanism to get
/dev/urandom-style bits appropriate for initial KVM PV guest RNG
seeding.

virtio-rng is not suitable for this purpose.  It's too difficult to
enumerate for use in early boot (e.g. KASLR, which runs before we
even have an IDT).  It also provides /dev/random-style bits, which
means that making guest boot wait for virtio-rng is unacceptably
slow, and doing it asynchronously means that /dev/urandom might
still be predictable when userspace starts.
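
As a rough userspace illustration of the synchronous seeding described
above (a sketch, not the kernel code), the following pulls up to four
64-bit words from a seed source and mixes each one that succeeds into a
toy pool.  toy_pool, toy_mix, stub_slow_rng and absent_slow_rng are
hypothetical names made up for this example, and the mixer is not
cryptographic; it only shows the data flow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's entropy pool. */
struct toy_pool {
	uint64_t state;
	unsigned int mixed_bits;
};

/* Toy mixer: NOT cryptographic, it only shows where the data goes. */
static void toy_mix(struct toy_pool *p, const void *buf, size_t len)
{
	const unsigned char *b = buf;
	size_t i;

	assert(p != NULL && buf != NULL);
	for (i = 0; i < len; i++)
		p->state = (p->state ^ b[i]) * 0x100000001b3ULL;
	p->mixed_bits += (unsigned int)(8 * len);
}

/* Seed source convention used here: 1 on success, 0 if unavailable. */
typedef int (*slow_rng_fn)(uint64_t *v);

/* Stub source that always succeeds (deterministic, for illustration). */
static int stub_slow_rng(uint64_t *v)
{
	static uint64_t counter = 0x0123456789abcdefULL;

	*v = counter;
	counter += 0x9e3779b97f4a7c15ULL;
	return 1;
}

/* Stub source that is never available. */
static int absent_slow_rng(uint64_t *v)
{
	(void)v;
	return 0;
}

/*
 * Synchronously ask the source for four 64-bit words and mix in each
 * word that was actually provided.  Returns the number of bits mixed.
 */
static unsigned int seed_pool(struct toy_pool *p, slow_rng_fn src)
{
	unsigned int bits = 0;
	int i;

	for (i = 0; i < 4; i++) {
		uint64_t v;

		if (src(&v)) {
			toy_mix(p, &v, sizeof(v));
			bits += (unsigned int)(8 * sizeof(v));
		}
	}
	return bits;
}
```

The caller never blocks: if the source is absent, the loop simply mixes
nothing and boot continues with whatever other sources exist.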

I sent the corresponding kvm-unit-tests and qemu changes separately.

There's room for bikeshedding on the name arch_get_slow_rng_u64.  I
considered arch_get_rng_seed_u64, but that could be confused with
arch_get_random_seed_long, which is not interchangeable.

Changes from v2:
 - Bisection fix (patch 2 had a misplaced brace).  The final state is
   identical to that of v2.
 - Improve the 0/5 description a little bit.

Changes from v1:
 - Split patches 2 and 3
 - Log all arch sources in init_std_data
 - Fix the 32-bit kaslr build

Andy Lutomirski (5):
  x86,kvm: Add MSR_KVM_GET_RNG_SEED and a matching feature bit
  random,x86: Add arch_get_slow_rng_u64
  random: Seed pools from arch_get_slow_rng_u64 at startup
  random: Log how many bits we managed to seed with in init_std_data
  x86,kaslr: Use MSR_KVM_GET_RNG_SEED for KASLR if available

 Documentation/virtual/kvm/cpuid.txt  |  3 +++
 arch/x86/Kconfig |  4 
 arch/x86/boot/compressed/aslr.c  | 27 +++
 arch/x86/include/asm/archslowrng.h   | 30 ++
 arch/x86/include/asm/processor.h | 21 ++---
 arch/x86/include/uapi/asm/kvm_para.h |  2 ++
 arch/x86/kernel/kvm.c| 22 ++
 arch/x86/kvm/cpuid.c |  3 ++-
 arch/x86/kvm/x86.c   |  4 
 drivers/char/random.c| 20 ++--
 include/linux/random.h   |  9 +
 11 files changed, 139 insertions(+), 6 deletions(-)
 create mode 100644 arch/x86/include/asm/archslowrng.h

-- 
1.9.3



[PATCH v3 3/5] random: Seed pools from arch_get_slow_rng_u64 at startup

2014-07-16 Thread Andy Lutomirski
This should help solve the problem of guests starting out with
predictable RNG state.

Signed-off-by: Andy Lutomirski <l...@amacapital.net>
---
 drivers/char/random.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 0a7ac0a..17ad33d 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1261,6 +1261,13 @@ static void init_std_data(struct entropy_store *r)
 		mix_pool_bytes(r, &rv, sizeof(rv), NULL);
 	}
 	mix_pool_bytes(r, utsname(), sizeof(*(utsname())), NULL);
+
+	for (i = 0; i < 4; i++) {
+		u64 rv64;
+
+		if (arch_get_slow_rng_u64(&rv64))
+			mix_pool_bytes(r, &rv64, sizeof(rv64), NULL);
+	}
 }
 
 /*
-- 
1.9.3



[PATCH v3 2/5] random,x86: Add arch_get_slow_rng_u64

2014-07-16 Thread Andy Lutomirski
arch_get_slow_rng_u64 tries to get 64 bits of RNG seed data.  Unlike
arch_get_random_{bytes,seed}, etc., it makes no claims about entropy
content.  It's also likely to be much slower and should not be used
frequently.  That being said, it should be fast enough to call
several times during boot without any noticeable slowdown.

This initial implementation backs it with MSR_KVM_GET_RNG_SEED if
available.  The intent is for other hypervisor guest implementations
to implement this interface.

Signed-off-by: Andy Lutomirski <l...@amacapital.net>
---
 arch/x86/Kconfig   |  4 
 arch/x86/include/asm/archslowrng.h | 30 ++
 arch/x86/kernel/kvm.c  | 22 ++
 include/linux/random.h |  9 +
 4 files changed, 65 insertions(+)
 create mode 100644 arch/x86/include/asm/archslowrng.h

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index a8f749e..4dfb539 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -593,6 +593,7 @@ config KVM_GUEST
 	bool "KVM Guest support (including kvmclock)"
 	depends on PARAVIRT
 	select PARAVIRT_CLOCK
+	select ARCH_SLOW_RNG
 	default y
 	---help---
 	  This option enables various optimizations for running under the KVM
@@ -627,6 +628,9 @@ config PARAVIRT_TIME_ACCOUNTING
 config PARAVIRT_CLOCK
 	bool
 
+config ARCH_SLOW_RNG
+	bool
+
 endif #HYPERVISOR_GUEST
 
 config NO_BOOTMEM
diff --git a/arch/x86/include/asm/archslowrng.h b/arch/x86/include/asm/archslowrng.h
new file mode 100644
index 000..c8e8d0d
--- /dev/null
+++ b/arch/x86/include/asm/archslowrng.h
@@ -0,0 +1,30 @@
+/*
+ * This file is part of the Linux kernel.
+ *
+ * Copyright (c) 2014 Andy Lutomirski
+ * Authors: Andy Lutomirski <l...@amacapital.net>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+
+#ifndef ASM_X86_ARCHSLOWRANDOM_H
+#define ASM_X86_ARCHSLOWRANDOM_H
+
+#ifndef CONFIG_ARCH_SLOW_RNG
+# error archslowrng.h should not be included if !CONFIG_ARCH_SLOW_RNG
+#endif
+
+/*
+ * Performance is irrelevant here, so there's no point in using the
+ * paravirt ops mechanism.  Instead just use a function pointer.
+ */
+extern int (*arch_get_slow_rng_u64)(u64 *v);
+
+#endif /* ASM_X86_ARCHSLOWRANDOM_H */
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 3dd8e2c..8d64d28 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -416,6 +416,25 @@ void kvm_disable_steal_time(void)
 	wrmsr(MSR_KVM_STEAL_TIME, 0, 0);
 }
 
+static int nop_get_slow_rng_u64(u64 *v)
+{
+	return 0;
+}
+
+static int kvm_get_slow_rng_u64(u64 *v)
+{
+	/*
+	 * Allow migration from a hypervisor with the GET_RNG_SEED
+	 * feature to a hypervisor without it.
+	 */
+	if (rdmsrl_safe(MSR_KVM_GET_RNG_SEED, v) == 0)
+		return 1;
+	else
+		return 0;
+}
+
+int (*arch_get_slow_rng_u64)(u64 *v) = nop_get_slow_rng_u64;
+
 #ifdef CONFIG_SMP
 static void __init kvm_smp_prepare_boot_cpu(void)
 {
@@ -493,6 +512,9 @@ void __init kvm_guest_init(void)
 	if (kvmclock_vsyscall)
 		kvm_setup_vsyscall_timeinfo();
 
+	if (kvm_para_has_feature(KVM_FEATURE_GET_RNG_SEED))
+		arch_get_slow_rng_u64 = kvm_get_slow_rng_u64;
+
 #ifdef CONFIG_SMP
 	smp_ops.smp_prepare_boot_cpu = kvm_smp_prepare_boot_cpu;
 	register_cpu_notifier(&kvm_cpu_notifier);
diff --git a/include/linux/random.h b/include/linux/random.h
index 57fbbff..ceafbcf 100644
--- a/include/linux/random.h
+++ b/include/linux/random.h
@@ -106,6 +106,15 @@ static inline int arch_has_random_seed(void)
 }
 #endif
 
+#ifdef CONFIG_ARCH_SLOW_RNG
+# include <asm/archslowrng.h>
+#else
+static inline int arch_get_slow_rng_u64(u64 *v)
+{
+	return 0;
+}
+#endif
+
 /* Pseudo random number generator from numerical recipes. */
 static inline u32 next_pseudo_random32(u32 seed)
 {
-- 
1.9.3



[PATCH v3 5/5] x86,kaslr: Use MSR_KVM_GET_RNG_SEED for KASLR if available

2014-07-16 Thread Andy Lutomirski
It's considerably better than any of the alternatives on KVM.

Rather than reinventing all of the cpu feature query code, this fixes
native_cpuid to work in PIC objects.

I haven't combined it with boot/cpuflags.c's cpuid implementation:
including asm/processor.h from boot/cpuflags.c results in a flood of
unrelated errors, and fixing it might be messy.

Signed-off-by: Andy Lutomirski <l...@amacapital.net>
---
 arch/x86/boot/compressed/aslr.c  | 27 +++
 arch/x86/include/asm/processor.h | 21 ++---
 2 files changed, 45 insertions(+), 3 deletions(-)

diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
index fc6091a..8583f0e 100644
--- a/arch/x86/boot/compressed/aslr.c
+++ b/arch/x86/boot/compressed/aslr.c
@@ -5,6 +5,8 @@
 #include <asm/archrandom.h>
 #include <asm/e820.h>
 
+#include <uapi/asm/kvm_para.h>
+
 #include <generated/compile.h>
 #include <linux/module.h>
 #include <linux/uts.h>
@@ -15,6 +17,22 @@
 static const char build_str[] = UTS_RELEASE " (" LINUX_COMPILE_BY "@"
 		LINUX_COMPILE_HOST ") (" LINUX_COMPILER ") " UTS_VERSION;
 
+static bool kvm_para_has_feature(unsigned int feature)
+{
+	u32 kvm_base;
+	u32 features;
+
+	if (!has_cpuflag(X86_FEATURE_HYPERVISOR))
+		return false;
+
+	kvm_base = hypervisor_cpuid_base("KVMKVMKVM\0\0\0", KVM_CPUID_FEATURES);
+	if (!kvm_base)
+		return false;
+
+	features = cpuid_eax(kvm_base | KVM_CPUID_FEATURES);
+	return features & (1UL << feature);
+}
+
 #define I8254_PORT_CONTROL	0x43
 #define I8254_PORT_COUNTER0	0x40
 #define I8254_CMD_READBACK	0xC0
@@ -81,6 +99,15 @@ static unsigned long get_random_long(void)
 		}
 	}
 
+	if (kvm_para_has_feature(KVM_FEATURE_GET_RNG_SEED)) {
+		u64 seed;
+
+		debug_putstr(" MSR_KVM_GET_RNG_SEED");
+		rdmsrl(MSR_KVM_GET_RNG_SEED, seed);
+		random ^= (unsigned long)seed;
+		use_i8254 = false;
+	}
+
 	if (has_cpuflag(X86_FEATURE_TSC)) {
 		debug_putstr(" RDTSC");
 		rdtscll(raw);
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index a4ea023..6096f3c 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -189,10 +189,25 @@ static inline int have_cpuid_p(void)
 static inline void native_cpuid(unsigned int *eax, unsigned int *ebx,
 				unsigned int *ecx, unsigned int *edx)
 {
-	/* ecx is often an input as well as an output. */
-	asm volatile("cpuid"
+	/*
+	 * This function can be used from the boot code, so it needs
+	 * to avoid using EBX in constraints in PIC mode.
+	 *
+	 * ecx is often an input as well as an output.
+	 */
+	asm volatile(".ifnc %%ebx,%1 ; .ifnc %%rbx,%1   \n\t"
+		     "movl  %%ebx,%1\n\t"
+		     ".endif ; .endif   \n\t"
+		     "cpuid \n\t"
+		     ".ifnc %%ebx,%1 ; .ifnc %%rbx,%1   \n\t"
+		     "xchgl %%ebx,%1\n\t"
+		     ".endif ; .endif"
 	    : "=a" (*eax),
-	      "=b" (*ebx),
+#if defined(__i386__) && defined(__PIC__)
+	      "=r" (*ebx),	/* gcc won't let us use ebx */
+#else
+	      "=b" (*ebx),	/* ebx is okay */
+#endif
 	      "=c" (*ecx),
 	      "=d" (*edx)
 	    : "0" (*eax), "2" (*ecx)
-- 
1.9.3



[PATCH v3 4/5] random: Log how many bits we managed to seed with in init_std_data

2014-07-16 Thread Andy Lutomirski
This is useful for making sure that init_std_data is working
correctly and for allaying fear when this happens:

random: xyz urandom read with SMALL_NUMBER bits of entropy available

Signed-off-by: Andy Lutomirski <l...@amacapital.net>
---
 drivers/char/random.c | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 17ad33d..10e9642 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1251,12 +1251,16 @@ static void init_std_data(struct entropy_store *r)
 	int i;
 	ktime_t now = ktime_get_real();
 	unsigned long rv;
+	int arch_seed_bits = 0, arch_random_bits = 0, slow_rng_bits = 0;
 
 	r->last_pulled = jiffies;
 	mix_pool_bytes(r, &now, sizeof(now), NULL);
 	for (i = r->poolinfo->poolbytes; i > 0; i -= sizeof(rv)) {
-		if (!arch_get_random_seed_long(&rv) &&
-		    !arch_get_random_long(&rv))
+		if (arch_get_random_seed_long(&rv))
+			arch_seed_bits += 8 * sizeof(rv);
+		else if (arch_get_random_long(&rv))
+			arch_random_bits += 8 * sizeof(rv);
+		else
 			rv = random_get_entropy();
 		mix_pool_bytes(r, &rv, sizeof(rv), NULL);
 	}
@@ -1265,9 +1269,14 @@ static void init_std_data(struct entropy_store *r)
 	for (i = 0; i < 4; i++) {
 		u64 rv64;
 
-		if (arch_get_slow_rng_u64(&rv64))
+		if (arch_get_slow_rng_u64(&rv64)) {
 			mix_pool_bytes(r, &rv64, sizeof(rv64), NULL);
+			slow_rng_bits += 8 * sizeof(rv64);
+		}
 	}
+
+	pr_info("random: seeded %s pool with %d bits of arch random seed, %d bits of arch random, and %d bits of arch slow rng\n",
+		r->name, arch_seed_bits, arch_random_bits, slow_rng_bits);
 }
 
 /*
-- 
1.9.3



[PATCH v3 1/5] x86,kvm: Add MSR_KVM_GET_RNG_SEED and a matching feature bit

2014-07-16 Thread Andy Lutomirski
This adds a simple interface to allow a guest to request 64 bits of
host nonblocking entropy.  This is independent of virtio-rng for a
couple of reasons:

 - It's intended to be usable during early boot, when a trivial
   synchronous interface is needed.

 - virtio-rng gives blocking entropy, and making guest boot wait for
   the host's /dev/random will cause problems.

MSR_KVM_GET_RNG_SEED is intended to provide 64 bits of best-effort
cryptographically secure data for use as a seed.  It provides no
guarantee that the result contains any actual entropy.

Signed-off-by: Andy Lutomirski <l...@amacapital.net>
---
 Documentation/virtual/kvm/cpuid.txt  | 3 +++
 arch/x86/include/uapi/asm/kvm_para.h | 2 ++
 arch/x86/kvm/cpuid.c | 3 ++-
 arch/x86/kvm/x86.c   | 4 
 4 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/Documentation/virtual/kvm/cpuid.txt b/Documentation/virtual/kvm/cpuid.txt
index 3c65feb..0ab043b 100644
--- a/Documentation/virtual/kvm/cpuid.txt
+++ b/Documentation/virtual/kvm/cpuid.txt
@@ -54,6 +54,9 @@ KVM_FEATURE_PV_UNHALT              ||     7 || guest checks this feature bit
                                    ||       || before enabling paravirtualized
                                    ||       || spinlock support.
 ------------------------------------------------------------------------------
+KVM_FEATURE_GET_RNG_SEED           ||     8 || host provides rng seed data via
+                                   ||       || MSR_KVM_GET_RNG_SEED.
+------------------------------------------------------------------------------
 KVM_FEATURE_CLOCKSOURCE_STABLE_BIT ||    24 || host will warn if no guest-side
                                    ||       || per-cpu warps are expected in
                                    ||       || kvmclock.
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 94dc8ca..e2eaf93 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -24,6 +24,7 @@
 #define KVM_FEATURE_STEAL_TIME 5
 #define KVM_FEATURE_PV_EOI 6
 #define KVM_FEATURE_PV_UNHALT  7
+#define KVM_FEATURE_GET_RNG_SEED   8
 
 /* The last 8 bits are used to indicate how to interpret the flags field
  * in pvclock structure. If no bits are set, all flags are ignored.
@@ -40,6 +41,7 @@
 #define MSR_KVM_ASYNC_PF_EN 0x4b564d02
 #define MSR_KVM_STEAL_TIME  0x4b564d03
 #define MSR_KVM_PV_EOI_EN  0x4b564d04
+#define MSR_KVM_GET_RNG_SEED 0x4b564d05
 
 struct kvm_steal_time {
__u64 steal;
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 38a0afe..40d6763 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -479,7 +479,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 			     (1 << KVM_FEATURE_ASYNC_PF) |
 			     (1 << KVM_FEATURE_PV_EOI) |
 			     (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) |
-			     (1 << KVM_FEATURE_PV_UNHALT);
+			     (1 << KVM_FEATURE_PV_UNHALT) |
+			     (1 << KVM_FEATURE_GET_RNG_SEED);
 
 		if (sched_info_on())
 			entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f644933..4e81853 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -48,6 +48,7 @@
 #include <linux/pci.h>
 #include <linux/timekeeper_internal.h>
 #include <linux/pvclock_gtod.h>
+#include <linux/random.h>
 #include <trace/events/kvm.h>
 
 #define CREATE_TRACE_POINTS
@@ -2480,6 +2481,9 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
 	case MSR_KVM_PV_EOI_EN:
 		data = vcpu->arch.pv_eoi.msr_val;
 		break;
+	case MSR_KVM_GET_RNG_SEED:
+		get_random_bytes(&data, sizeof(data));
+		break;
 	case MSR_IA32_P5_MC_ADDR:
 	case MSR_IA32_P5_MC_TYPE:
 	case MSR_IA32_MCG_CAP:
-- 
1.9.3



Re: [PATCH v3 2/5] random,x86: Add arch_get_slow_rng_u64

2014-07-16 Thread Andy Lutomirski
On Wed, Jul 16, 2014 at 2:59 PM, H. Peter Anvin <h...@zytor.com> wrote:
> On 07/16/2014 02:45 PM, Andy Lutomirski wrote:
>> diff --git a/arch/x86/include/asm/archslowrng.h b/arch/x86/include/asm/archslowrng.h
>> new file mode 100644
>> index 000..c8e8d0d
>> --- /dev/null
>> +++ b/arch/x86/include/asm/archslowrng.h
>> @@ -0,0 +1,30 @@
>> +/*
>> + * This file is part of the Linux kernel.
>> + *
>> + * Copyright (c) 2014 Andy Lutomirski
>> + * Authors: Andy Lutomirski <l...@amacapital.net>
>> + *
>> + * This program is free software; you can redistribute it and/or modify it
>> + * under the terms and conditions of the GNU General Public License,
>> + * version 2, as published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope it will be useful, but WITHOUT
>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
>> + * more details.
>> + */
>> +
>> +#ifndef ASM_X86_ARCHSLOWRANDOM_H
>> +#define ASM_X86_ARCHSLOWRANDOM_H
>> +
>> +#ifndef CONFIG_ARCH_SLOW_RNG
>> +# error archslowrng.h should not be included if !CONFIG_ARCH_SLOW_RNG
>> +#endif
>> +
>
> I'm *seriously* questioning the wisdom of this.  A much saner thing
> would be to do:
>
> #ifndef CONFIG_ARCH_SLOW_RNG
>
> /* Not supported */
> static inline int arch_get_slow_rng_u64(u64 *v)
> {
> 	(void)v;
> 	return 0;
> }
>
> #endif
>
> ... which is basically what we do for the archrandom stuff.

The archrandom stuff defines the not supported variant in the
generic header, which is what I'm doing here.  I could wrap all of
asm/archslowrng.h in #ifdef CONFIG_ARCH_SLOW_RNG instead of putting
the #error in there, but I have no strong preference.
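
For illustration only, here is a userspace sketch of the "generic header
supplies the not-supported stub" arrangement discussed above.  The config
symbol is reused as a plain preprocessor macro, the placeholder seed
value and try_seed() are made up for the example, and nothing here is
the kernel's actual header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Userspace analogue of the pattern.  Build with
 * -DCONFIG_ARCH_SLOW_RNG to select the "arch provides it" branch.
 */
#ifdef CONFIG_ARCH_SLOW_RNG
/* What an arch header would supply: a real implementation. */
static inline int arch_get_slow_rng_u64(uint64_t *v)
{
	*v = 0x5eed5eed5eed5eedULL;	/* placeholder value, not random */
	return 1;
}
#else
/* Generic fallback: the facility is not supported. */
static inline int arch_get_slow_rng_u64(uint64_t *v)
{
	(void)v;
	return 0;
}
#endif

/*
 * Hypothetical caller.  It needs no #ifdef of its own, because the
 * stub above makes the call compile (and fail cleanly) everywhere.
 */
static int try_seed(uint64_t *out)
{
	uint64_t v;

	assert(out != NULL);
	if (!arch_get_slow_rng_u64(&v))
		return 0;
	*out = v;
	return 1;
}
```

Either placement of the stub gives callers the same interface; the only
difference is which header carries the #ifdef.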


> I'm also wondering if it makes sense to have a function which prefers
> arch_get_random*() over this one as a preferred interface.  Something like:
>
> int get_random_arch_u64_slow_ok(u64 *v)
> {
> 	int i;
> 	u64 x = 0;
> 	unsigned long l;
>
> 	for (i = 0; i < 64/BITS_PER_LONG; i++) {
> 		if (!arch_get_random_long(&l))
> 			return arch_get_slow_rng_u64(v);
>
> 		x |=  l << (i*BITS_PER_LONG);
> 	}
> 	*v = l;
> 	return 0;
> }

I played with something like this earlier, but I dropped it when it
ended up having exactly one user.  I suspect that the highly paranoid
will actually prefer seeding with both sources in init_std_data even
if RDRAND is available -- it costs very little and it provides a bit
of extra assurance.
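
For reference, a self-contained userspace sketch of the fallback helper
suggested in the quoted draft.  Two things are assumptions on my part
rather than anything confirmed in the thread: the helper stores the
accumulated value x (the draft as quoted stores l), and it returns 1 on
success to match the nonzero-means-success convention of the other
helpers in this series (the draft returns 0).  The stub_* functions and
fast_rng_available are hypothetical stand-ins for the real sources.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_LONG ((int)(sizeof(unsigned long) * CHAR_BIT))

/* Toggle for the stubbed fast (RDRAND-like) source. */
static int fast_rng_available = 1;

/* Hypothetical stand-in for arch_get_random_long(): 1 on success. */
static int stub_arch_get_random_long(unsigned long *v)
{
	if (!fast_rng_available)
		return 0;
	*v = ~0UL;	/* placeholder value, not random */
	return 1;
}

/* Hypothetical stand-in for arch_get_slow_rng_u64(): 1 on success. */
static int stub_arch_get_slow_rng_u64(uint64_t *v)
{
	*v = 0x1122334455667788ULL;	/* placeholder value, not random */
	return 1;
}

/*
 * Prefer the fast source; if any word is unavailable, fall back to the
 * slow source for the whole 64-bit value.  Returns 1 on success.
 */
static int get_random_arch_u64_slow_ok(uint64_t *v)
{
	uint64_t x = 0;
	unsigned long l;
	int i;

	assert(v != NULL);
	for (i = 0; i < 64 / BITS_PER_LONG; i++) {
		if (!stub_arch_get_random_long(&l))
			return stub_arch_get_slow_rng_u64(v);
		/* Widen before shifting so 32-bit longs fill both halves. */
		x |= (uint64_t)l << (i * BITS_PER_LONG);
	}
	*v = x;
	return 1;
}
```

Note that this "either/or" shape is different from seeding with both
sources, which is what init_std_data does in this series.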


> This still doesn't address the issue e.g. on x86 where RDRAND is
> available but we haven't set up alternatives yet.  So it might be that
> what we really want is to encapsulate this fallback in arch code and do
> a more direct enumeration.

My personal preference is to defer this until some user shows up.  I
think that even this would be too complicated for KASLR, which is the
only extremely early-boot user that I found.

Hmm.  Does the prandom stuff want to use this?


>> +
>> +static int kvm_get_slow_rng_u64(u64 *v)
>> +{
>> +	/*
>> +	 * Allow migration from a hypervisor with the GET_RNG_SEED
>> +	 * feature to a hypervisor without it.
>> +	 */
>> +	if (rdmsrl_safe(MSR_KVM_GET_RNG_SEED, v) == 0)
>> +		return 1;
>> +	else
>> +		return 0;
>> +}
>
> How about:
>
> 	return rdmsrl_safe(MSR_KVM_GET_RNG_SEED, v) == 0;
>
> The naming also feels really inconsistent...

Better ideas welcome.  I could call the generic function
arch_get_pv_random_seed, but maybe someone will come up with a
non-paravirt implementation.
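
Whatever the final name, the shape under discussion (a no-op default, a
KVM-backed variant using the one-line return, and a function pointer
switched at init time) can be sketched in userspace roughly as follows.
fake_rdmsrl_safe, msr_present, FAKE_MSR_GET_RNG_SEED and guest_init are
hypothetical stand-ins; the real rdmsrl_safe is a kernel macro and is
only imitated here as "0 on success, nonzero on fault".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* MSR index taken from the patch; used here only as an opaque number. */
#define FAKE_MSR_GET_RNG_SEED 0x4b564d05u

/* Toggle simulating whether the current host implements the MSR. */
static int msr_present = 1;

/* Hypothetical stand-in for rdmsrl_safe(): 0 on success. */
static int fake_rdmsrl_safe(uint32_t msr, uint64_t *v)
{
	(void)msr;
	if (!msr_present)
		return -1;
	*v = 0xfeedfacecafebeefULL;	/* placeholder value, not random */
	return 0;
}

/* Default: no slow RNG available. */
static int nop_get_slow_rng_u64(uint64_t *v)
{
	(void)v;
	return 0;
}

/*
 * KVM-like variant.  A failed read (for example after migrating to a
 * host without the feature) reports "no data" instead of faulting.
 */
static int kvm_like_get_slow_rng_u64(uint64_t *v)
{
	assert(v != NULL);
	return fake_rdmsrl_safe(FAKE_MSR_GET_RNG_SEED, v) == 0;
}

/* Dispatch through a plain function pointer, as in the patch. */
static int (*get_slow_rng_u64)(uint64_t *v) = nop_get_slow_rng_u64;

/* Hypothetical init hook: switch implementations if the feature bit is set. */
static void guest_init(int has_feature)
{
	if (has_feature)
		get_slow_rng_u64 = kvm_like_get_slow_rng_u64;
}
```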

--Andy


> 	-hpa




-- 
Andy Lutomirski
AMA Capital Management, LLC


Re: [PATCH v3 2/5] random,x86: Add arch_get_slow_rng_u64

2014-07-16 Thread Andy Lutomirski
On Wed, Jul 16, 2014 at 3:13 PM, Andy Lutomirski <l...@amacapital.net> wrote:
> My personal preference is to defer this until some user shows up.  I
> think that even this would be too complicated for KASLR, which is the
> only extremely early-boot user that I found.
>
> Hmm.  Does the prandom stuff want to use this?

prandom isn't even using rdrand.  I'd suggest fixing this separately,
or even just waiting until someone goes and deletes prandom.

--Andy


Re: [PATCH v3 2/5] random,x86: Add arch_get_slow_rng_u64

2014-07-16 Thread Andy Lutomirski
On Jul 16, 2014 4:00 PM, H. Peter Anvin <h...@zytor.com> wrote:
>
> On 07/16/2014 03:40 PM, Andy Lutomirski wrote:
>> On Wed, Jul 16, 2014 at 3:13 PM, Andy Lutomirski <l...@amacapital.net> wrote:
>>> My personal preference is to defer this until some user shows up.  I
>>> think that even this would be too complicated for KASLR, which is the
>>> only extremely early-boot user that I found.
>>>
>>> Hmm.  Does the prandom stuff want to use this?
>>
>> prandom isn't even using rdrand.  I'd suggest fixing this separately,
>> or even just waiting until someone goes and deletes prandom.
>>
>
> prandom is exactly the opposite; it is designed for when we need
> possibly low quality random numbers very quickly.  RDRAND is actually
> too slow.

I meant that prandom isn't using rdrand for early seeding.

--Andy


 -hpa




Re: [PATCH v3 2/5] random,x86: Add arch_get_slow_rng_u64

2014-07-17 Thread Andy Lutomirski
On Thu, Jul 17, 2014 at 9:39 AM, H. Peter Anvin h...@zytor.com wrote:
 On 07/17/2014 03:33 AM, Theodore Ts'o wrote:
 On Wed, Jul 16, 2014 at 09:55:15PM -0700, H. Peter Anvin wrote:
 On 07/16/2014 05:03 PM, Andy Lutomirski wrote:

 I meant that prandom isn't using rdrand for early seeding.


 We should probably fix that.

 It wouldn't hurt to explicitly use arch_get_random_long() in prandom,
 but it does use get_random_bytes() in early seed, and for CPU's with
 RDRAND present, we do use it in init_std_data() in
 drivers/char/random.c, so prandom is already getting initialized via
 an RNG (which is effectively a DRBG even if it doesn't pass all of
 NIST's rules) which is derived from RDRAND.


 I assumed he was referring to before alternatives.  Not sure if we use
 prandom before that point, though.

Unless I'm reading the code wrong, the prandom_reseed_late call can
happen after userspace is running.

Anyway, I'm working on a near-complete rewrite of the guest part of all of this.

--Andy


 -hpa





-- 
Andy Lutomirski
AMA Capital Management, LLC


Re: [PATCH v3 2/5] random,x86: Add arch_get_slow_rng_u64

2014-07-17 Thread Andy Lutomirski
On Thu, Jul 17, 2014 at 10:32 AM, Theodore Ts'o ty...@mit.edu wrote:
 On Thu, Jul 17, 2014 at 10:12:27AM -0700, Andy Lutomirski wrote:

 Unless I'm reading the code wrong, the prandom_reseed_late call can
 happen after userspace is running.

 But there is also the prandom_reseed() call, which happens early.


Right -- I missed that.

- Ted



-- 
Andy Lutomirski
AMA Capital Management, LLC


Re: [PATCH v3 1/5] x86,kvm: Add MSR_KVM_GET_RNG_SEED and a matching feature bit

2014-07-17 Thread Andy Lutomirski
On Thu, Jul 17, 2014 at 10:43 AM, Andrew Honig aho...@google.com wrote:
 +   case MSR_KVM_GET_RNG_SEED:
 +   get_random_bytes(&data, sizeof(data));
 +   break;

 Should this be rate limited in the interest of conserving randomness?
 If there ever is an attack on the prng, this would create very
 favorable conditions for an attacker to exploit it.

IMO if the nonblocking pool has a weakness that requires us to
conserve its output, then this is the least of our worries.

--Andy


[PATCH v4 3/5] x86,random: Add an x86 implementation of arch_get_rng_seed

2014-07-17 Thread Andy Lutomirski
This is closer to Intel's recommended logic for using RDRAND and
RDSEED.  It will attempt to seed the entire internal state of the
RNG pool using RDSEED (with one bit of RDSEED output per bit of
state).  For any bits that can't be obtained using RDSEED (e.g. if
RDSEED is unavailable), it calculates the number of RDRAND reseeds
needed to obtain the missing bits from the internal NRBG and then
requests enough bits from RDRAND to obtain the full output from at
least that many reseeds.

Arguably, arch_get_random_seed could be removed now: I'm having some
trouble imagining a sensible non-architecture-specific use of it
that wouldn't be better served by arch_get_rng_seed.
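
The RDRAND arithmetic described above can be sketched in user space as
follows.  This is only an illustration: the helper name
rdrand_longs_wanted() and its bits_per_long parameter are made up for the
example, and the check for RDRAND support that the real code performs is
left out.

```c
#include <assert.h>

/*
 * Hypothetical helper (not part of the patch): how many longs to pull
 * from RDRAND to cover the seed bits that RDSEED did not provide.
 */
static int rdrand_longs_wanted(int bits_per_source, int rdseed_bits,
			       int bits_per_long)
{
	int nrbg_bits, reseeds;

	assert(bits_per_long == 32 || bits_per_long == 64);

	/* RDSEED already covered everything: nothing more to fetch. */
	if (rdseed_bits >= bits_per_source)
		return 0;

	nrbg_bits = bits_per_source - rdseed_bits;

	/* One reseed per 128 missing bits, rounded up, plus one spare. */
	reseeds = (nrbg_bits + 127) / 128 + 1;

	/* Each reseed is assumed to span 512 * 128 bits of RDRAND output. */
	return reseeds * 512 * 128 / bits_per_long;
}
```

For example, 1024 missing bits on a 64-bit kernel need 9 reseeds under
these assumptions, which is why the RDRAND path is so much more expensive
than the RDSEED path.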

Signed-off-by: Andy Lutomirski l...@amacapital.net
---
 arch/x86/include/asm/archrandom.h |  6 +++
 arch/x86/kernel/Makefile  |  2 +
 arch/x86/kernel/archrandom.c  | 79 +++
 3 files changed, 87 insertions(+)
 create mode 100644 arch/x86/kernel/archrandom.c

diff --git a/arch/x86/include/asm/archrandom.h 
b/arch/x86/include/asm/archrandom.h
index 69f1366..88f9c5a 100644
--- a/arch/x86/include/asm/archrandom.h
+++ b/arch/x86/include/asm/archrandom.h
@@ -117,6 +117,12 @@ GET_SEED(arch_get_random_seed_int, unsigned int, 
RDSEED_INT, ASM_NOP4);
 #define arch_has_random()  static_cpu_has(X86_FEATURE_RDRAND)
 #define arch_has_random_seed() static_cpu_has(X86_FEATURE_RDSEED)
 
+#define __HAVE_ARCH_GET_RNG_SEED
+extern void arch_get_rng_seed(void *ctx,
+ void (*seed)(void *ctx, u32 data),
+ int bits_per_source,
+ const char *log_prefix);
+
 #else
 
 static inline int rdrand_long(unsigned long *v)
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 047f9ff..0718bae 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -92,6 +92,8 @@ obj-$(CONFIG_PARAVIRT)+= paravirt.o 
paravirt_patch_$(BITS).o
 obj-$(CONFIG_PARAVIRT_SPINLOCKS)+= paravirt-spinlocks.o
 obj-$(CONFIG_PARAVIRT_CLOCK)   += pvclock.o
 
+obj-$(CONFIG_ARCH_RANDOM)  += archrandom.o
+
 obj-$(CONFIG_PCSPKR_PLATFORM)  += pcspeaker.o
 
 obj-$(CONFIG_X86_CHECK_BIOS_CORRUPTION) += check.o
diff --git a/arch/x86/kernel/archrandom.c b/arch/x86/kernel/archrandom.c
new file mode 100644
index 000..5515fc8
--- /dev/null
+++ b/arch/x86/kernel/archrandom.c
@@ -0,0 +1,79 @@
+/*
+ * This file is part of the Linux kernel.
+ *
+ * Copyright (c) 2014 Andy Lutomirski
+ * Authors: Andy Lutomirski l...@amacapital.net
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+
+#include <asm/archrandom.h>
+
+void arch_get_rng_seed(void *ctx,
+  void (*seed)(void *ctx, u32 data),
+  int bits_per_source,
+  const char *log_prefix)
+{
+   int i, longs = (bits_per_source + BITS_PER_LONG - 1) / BITS_PER_LONG;
+   int rdseed_bits = 0, rdrand_bits = 0;
+   int rdrand_longs_wanted = 0;
+   char buf[128] = "";
+   char *msgptr = buf;
+
+   for (i = 0; i < longs; i++) {
+       unsigned long rv;
+
+       if (arch_get_random_seed_long(&rv)) {
+           seed(ctx, (u32)rv);
+#if BITS_PER_LONG > 32
+           seed(ctx, (u32)(rv >> 32));
+#endif
+           rdseed_bits += 8 * sizeof(rv);
+       }
+   }
+   if (rdseed_bits)
+       msgptr += sprintf(msgptr, ", %d bits from RDSEED", rdseed_bits);
+
+   /*
+    * According to the Intel DRNG Software Implementation Guide 2.0,
+    * the RDRAND hardware is guaranteed to provide at least 128 bits
+    * of non-deterministic entropy per 511*128 bits of RDRAND output.
+    * Nonetheless, the guide suggests using a 512:1 reduction for
+    * generating seeds.
+    *
+    * We use one extra reseed, because we might not own the first
+    * or last few samples.
+    *
+    * We skip using RDRAND for any bits already provided by RDSEED,
+    * as they use the same underlying entropy source.
+    */
+   if (rdseed_bits < bits_per_source && arch_has_random()) {
+       int nrbg_bits = bits_per_source - rdseed_bits;
+       int reseeds = (nrbg_bits + 127) / 128 + 1;
+
+       rdrand_longs_wanted = reseeds * 512 * 128 / BITS_PER_LONG;
+   }
+   for (i = 0; i < rdrand_longs_wanted; i++) {
+       unsigned long rv;
+
+       if (arch_get_random_long(&rv)) {
+           seed(ctx, (u32)rv);
+#if BITS_PER_LONG > 32
+           seed(ctx, (u32

[PATCH v4 1/5] x86,kvm: Add MSR_KVM_GET_RNG_SEED and a matching feature bit

2014-07-17 Thread Andy Lutomirski
This adds a simple interface to allow a guest to request 64 bits of
host nonblocking entropy.  This is independent of virtio-rng for a
couple of reasons:

 - It's intended to be usable during early boot, when a trivial
   synchronous interface is needed.

 - virtio-rng gives blocking entropy, and making guest boot wait for
   the host's /dev/random will cause problems.

MSR_KVM_GET_RNG_SEED is intended to provide 64 bits of best-effort
cryptographically secure data for use as a seed.  It provides no
guarantee that the result contains any actual entropy.
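
A guest would be expected to test the feature bit before reading the MSR.
A minimal sketch of that bit test, assuming the features word has already
been read from the KVM features CPUID leaf (kvm_features_has() is a
hypothetical name used only here; the bit number 8 is the one this patch
assigns):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Value taken from the patch: feature bit 8 in the KVM features leaf. */
#define KVM_FEATURE_GET_RNG_SEED 8

/*
 * Hypothetical helper: test one bit of a features word, as a guest
 * would do with EAX from the KVM features CPUID leaf.
 */
static bool kvm_features_has(uint32_t features_eax, unsigned int feature)
{
	assert(feature < 32);
	return (features_eax & (UINT32_C(1) << feature)) != 0;
}
```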

Signed-off-by: Andy Lutomirski l...@amacapital.net
---
 Documentation/virtual/kvm/cpuid.txt  | 3 +++
 arch/x86/include/uapi/asm/kvm_para.h | 2 ++
 arch/x86/kvm/cpuid.c | 3 ++-
 arch/x86/kvm/x86.c   | 4 
 4 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/Documentation/virtual/kvm/cpuid.txt 
b/Documentation/virtual/kvm/cpuid.txt
index 3c65feb..0ab043b 100644
--- a/Documentation/virtual/kvm/cpuid.txt
+++ b/Documentation/virtual/kvm/cpuid.txt
@@ -54,6 +54,9 @@ KVM_FEATURE_PV_UNHALT              ||     7 || guest checks this feature bit
                                    ||       || before enabling paravirtualized
                                    ||       || spinlock support.
 ------------------------------------------------------------------------------
+KVM_FEATURE_GET_RNG_SEED           ||     8 || host provides rng seed data via
+                                   ||       || MSR_KVM_GET_RNG_SEED.
+------------------------------------------------------------------------------
 KVM_FEATURE_CLOCKSOURCE_STABLE_BIT ||    24 || host will warn if no guest-side
                                    ||       || per-cpu warps are expected in
                                    ||       || kvmclock.
diff --git a/arch/x86/include/uapi/asm/kvm_para.h 
b/arch/x86/include/uapi/asm/kvm_para.h
index 94dc8ca..e2eaf93 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -24,6 +24,7 @@
 #define KVM_FEATURE_STEAL_TIME 5
 #define KVM_FEATURE_PV_EOI 6
 #define KVM_FEATURE_PV_UNHALT  7
+#define KVM_FEATURE_GET_RNG_SEED   8
 
 /* The last 8 bits are used to indicate how to interpret the flags field
  * in pvclock structure. If no bits are set, all flags are ignored.
@@ -40,6 +41,7 @@
 #define MSR_KVM_ASYNC_PF_EN 0x4b564d02
 #define MSR_KVM_STEAL_TIME  0x4b564d03
 #define MSR_KVM_PV_EOI_EN  0x4b564d04
+#define MSR_KVM_GET_RNG_SEED 0x4b564d05
 
 struct kvm_steal_time {
__u64 steal;
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 38a0afe..40d6763 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -479,7 +479,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 
*entry, u32 function,
              (1 << KVM_FEATURE_ASYNC_PF) |
              (1 << KVM_FEATURE_PV_EOI) |
              (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) |
-             (1 << KVM_FEATURE_PV_UNHALT);
+             (1 << KVM_FEATURE_PV_UNHALT) |
+             (1 << KVM_FEATURE_GET_RNG_SEED);
 
        if (sched_info_on())
            entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f644933..4e81853 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -48,6 +48,7 @@
 #include <linux/pci.h>
 #include <linux/timekeeper_internal.h>
 #include <linux/pvclock_gtod.h>
+#include <linux/random.h>
 #include <trace/events/kvm.h>
 
 #define CREATE_TRACE_POINTS
@@ -2480,6 +2481,9 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, 
u64 *pdata)
    case MSR_KVM_PV_EOI_EN:
        data = vcpu->arch.pv_eoi.msr_val;
        break;
+   case MSR_KVM_GET_RNG_SEED:
+       get_random_bytes(&data, sizeof(data));
+       break;
    case MSR_IA32_P5_MC_ADDR:
    case MSR_IA32_P5_MC_TYPE:
    case MSR_IA32_MCG_CAP:
-- 
1.9.3



[PATCH v4 5/5] x86,kaslr: Use MSR_KVM_GET_RNG_SEED for KASLR if available

2014-07-17 Thread Andy Lutomirski
It's considerably better than any of the alternatives on KVM.

Rather than reinventing all of the cpu feature query code, this fixes
native_cpuid to work in PIC objects.

I haven't combined it with boot/cpuflags.c's cpuid implementation:
including asm/processor.h from boot/cpuflags.c results in a flood of
unrelated errors, and fixing it might be messy.

Signed-off-by: Andy Lutomirski l...@amacapital.net
---
 arch/x86/boot/compressed/aslr.c  | 27 +++
 arch/x86/include/asm/processor.h | 21 ++---
 2 files changed, 45 insertions(+), 3 deletions(-)

diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
index fc6091a..8583f0e 100644
--- a/arch/x86/boot/compressed/aslr.c
+++ b/arch/x86/boot/compressed/aslr.c
@@ -5,6 +5,8 @@
 #include <asm/archrandom.h>
 #include <asm/e820.h>
 
+#include <uapi/asm/kvm_para.h>
+
 #include <generated/compile.h>
 #include <linux/module.h>
 #include <linux/uts.h>
@@ -15,6 +17,22 @@
 static const char build_str[] = UTS_RELEASE " (" LINUX_COMPILE_BY "@"
        LINUX_COMPILE_HOST ") (" LINUX_COMPILER ") " UTS_VERSION;
 
+static bool kvm_para_has_feature(unsigned int feature)
+{
+   u32 kvm_base;
+   u32 features;
+
+   if (!has_cpuflag(X86_FEATURE_HYPERVISOR))
+       return false;
+
+   kvm_base = hypervisor_cpuid_base("KVMKVMKVM\0\0\0", KVM_CPUID_FEATURES);
+   if (!kvm_base)
+       return false;
+
+   features = cpuid_eax(kvm_base | KVM_CPUID_FEATURES);
+   return features & (1UL << feature);
+}
+
 #define I8254_PORT_CONTROL 0x43
 #define I8254_PORT_COUNTER0 0x40
 #define I8254_CMD_READBACK 0xC0
@@ -81,6 +99,15 @@ static unsigned long get_random_long(void)
        }
    }
 
+   if (kvm_para_has_feature(KVM_FEATURE_GET_RNG_SEED)) {
+       u64 seed;
+
+       debug_putstr(" MSR_KVM_GET_RNG_SEED");
+       rdmsrl(MSR_KVM_GET_RNG_SEED, seed);
+       random ^= (unsigned long)seed;
+       use_i8254 = false;
+   }
+
    if (has_cpuflag(X86_FEATURE_TSC)) {
        debug_putstr(" RDTSC");
        rdtscll(raw);
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index a4ea023..6096f3c 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -189,10 +189,25 @@ static inline int have_cpuid_p(void)
 static inline void native_cpuid(unsigned int *eax, unsigned int *ebx,
                                 unsigned int *ecx, unsigned int *edx)
 {
-   /* ecx is often an input as well as an output. */
-   asm volatile("cpuid"
+   /*
+    * This function can be used from the boot code, so it needs
+    * to avoid using EBX in constraints in PIC mode.
+    *
+    * ecx is often an input as well as an output.
+    */
+   asm volatile(".ifnc %%ebx,%1 ; .ifnc %%rbx,%1   \n\t"
+                "movl  %%ebx,%1                    \n\t"
+                ".endif ; .endif                   \n\t"
+                "cpuid                             \n\t"
+                ".ifnc %%ebx,%1 ; .ifnc %%rbx,%1   \n\t"
+                "xchgl %%ebx,%1                    \n\t"
+                ".endif ; .endif"
        : "=a" (*eax),
-         "=b" (*ebx),
+#if defined(__i386__) && defined(__PIC__)
+         "=r" (*ebx),  /* gcc won't let us use ebx */
+#else
+         "=b" (*ebx),  /* ebx is okay */
+#endif
          "=c" (*ecx),
          "=d" (*edx)
        : "0" (*eax), "2" (*ecx)
-- 
1.9.3



[PATCH v4 2/5] random: Add and use arch_get_rng_seed

2014-07-17 Thread Andy Lutomirski
Currently, init_std_data contains its own logic for using arch
random sources.  This logic is a bit strange: it reads one long of
arch random data per byte of internal state.

This replaces that logic with a generic function arch_get_rng_seed
that allows arch code to supply its own logic.  The default
implementation tries arch_get_random_seed_long and
arch_get_random_long individually, requesting one bit per bit of
internal state being seeded.

Assuming the arch sources are perfect, this is the right thing to
do.  They're not, though, so the followup patch attempts to
implement the correct logic on x86.
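
The shape of the proposed hook can be sketched in user space as follows.
This is an illustration under assumptions, not the kernel code:
fake_arch_get_random_long() stands in for the real arch sources and
returns a fixed, non-random value, and sketch_get_rng_seed(), seed_fn and
count_words() are names invented for the example.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_BITS_PER_LONG ((int)(sizeof(unsigned long) * CHAR_BIT))

typedef void (*seed_fn)(void *ctx, uint32_t data);

/* Stand-in for an arch source; always "succeeds" with a fixed pattern. */
static bool fake_arch_get_random_long(unsigned long *v)
{
	*v = 0x5a5a5a5aUL;	/* placeholder value, NOT random */
	return true;
}

/* Request bits_per_source bits and hand them to seed() as 32-bit words. */
static void sketch_get_rng_seed(void *ctx, seed_fn seed, int bits_per_source)
{
	int i, longs;

	assert(bits_per_source >= 0);
	longs = (bits_per_source + SKETCH_BITS_PER_LONG - 1) /
		SKETCH_BITS_PER_LONG;

	for (i = 0; i < longs; i++) {
		unsigned long rv;

		if (!fake_arch_get_random_long(&rv))
			continue;

		seed(ctx, (uint32_t)rv);
#if ULONG_MAX > 0xffffffffUL
		/* On 64-bit longs, deliver the upper half as a second word. */
		seed(ctx, (uint32_t)(rv >> 32));
#endif
	}
}

/* Example callback: count how many 32-bit words were delivered. */
static void count_words(void *ctx, uint32_t data)
{
	(void)data;
	++*(int *)ctx;
}
```

Passing the consumer as a callback is what lets arch code decide how many
words to supply per source without the caller knowing about each source.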

Signed-off-by: Andy Lutomirski l...@amacapital.net
---
 drivers/char/random.c  | 14 +++---
 include/linux/random.h | 40 
 2 files changed, 51 insertions(+), 3 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 0a7ac0a..be7a94e 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1236,6 +1236,10 @@ void get_random_bytes_arch(void *buf, int nbytes)
 }
 EXPORT_SYMBOL(get_random_bytes_arch);
 
+static void seed_entropy_store(void *ctx, u32 data)
+{
+   mix_pool_bytes((struct entropy_store *)ctx, &data, sizeof(data), NULL);
+}
 
 /*
  * init_std_data - initialize pool with system data
@@ -1251,15 +1255,19 @@ static void init_std_data(struct entropy_store *r)
    int i;
    ktime_t now = ktime_get_real();
    unsigned long rv;
+   char log_prefix[128];
 
    r->last_pulled = jiffies;
    mix_pool_bytes(r, &now, sizeof(now), NULL);
    for (i = r->poolinfo->poolbytes; i > 0; i -= sizeof(rv)) {
-       if (!arch_get_random_seed_long(&rv) &&
-           !arch_get_random_long(&rv))
-           rv = random_get_entropy();
+       rv = random_get_entropy();
        mix_pool_bytes(r, &rv, sizeof(rv), NULL);
    }
+
+   sprintf(log_prefix, "random: seeded %s pool", r->name);
+   arch_get_rng_seed(r, seed_entropy_store, 8 * r->poolinfo->poolbytes,
+                     log_prefix);
+
    mix_pool_bytes(r, utsname(), sizeof(*(utsname())), NULL);
 }
 
diff --git a/include/linux/random.h b/include/linux/random.h
index 57fbbff..a17065e 100644
--- a/include/linux/random.h
+++ b/include/linux/random.h
@@ -106,6 +106,46 @@ static inline int arch_has_random_seed(void)
 }
 #endif
 
+#ifndef __HAVE_ARCH_GET_RNG_SEED
+
+/**
+ * arch_get_rng_seed() - get architectural rng seed data
+ * @ctx: context for the seed function
+ * @seed: function to call for each u32 obtained
+ * @bits_per_source: number of bits from each source to try to use
+ * @log_prefix: beginning of log output (may be NULL)
+ *
+ * Synchronously load some architectural entropy or other best-effort
+ * random seed data.  An arch-specific implementation should be no worse
+ * than this generic implementation.  If the arch code does something
+ * interesting, it may log something of the form "log_prefix with
+ * 8 bits of stuff".
+ *
+ * No arch-specific implementation should be any worse than the generic
+ * implementation.
+ */
+static inline void arch_get_rng_seed(void *ctx,
+void (*seed)(void *ctx, u32 data),
+int bits_per_source,
+const char *log_prefix)
+{
+   int i, longs = (bits_per_source + BITS_PER_LONG - 1) / BITS_PER_LONG;
+
+   for (i = 0; i < longs; i++) {
+       unsigned long rv;
+
+       if (arch_get_random_seed_long(&rv) ||
+           arch_get_random_long(&rv)) {
+           seed(ctx, (u32)rv);
+#if BITS_PER_LONG > 32
+           seed(ctx, (u32)(rv >> 32));
+#endif
+       }
+   }
+}
+
+#endif /* __HAVE_ARCH_GET_RNG_SEED */
+
 /* Pseudo random number generator from numerical recipes. */
 static inline u32 next_pseudo_random32(u32 seed)
 {
-- 
1.9.3



[PATCH v4 0/5] random,x86,kvm: Rework arch RNG seeds and get some from kvm

2014-07-17 Thread Andy Lutomirski
This introduces and uses a very simple synchronous mechanism to get
/dev/urandom-style bits appropriate for initial KVM PV guest RNG
seeding.

It also re-works the way that architectural random data is fed into
random.c's pools.  I added a new arch hook called arch_get_rng_seed.
The default implementation uses arch_get_random_seed_long and
arch_get_random_long, but not quite the same way as before.

x86 gets a custom arch_get_rng_seed, which is significantly enhanced
over the generic implementation.  It uses RDSEED less aggressively (the
old implementation requested 4x or 8x as many bits as would fit in the
pool, depending on kernel bitness), but, if using RDRAND, it requests
enough bits to comply with Intel's recommendations.

x86's arch_get_rng_seed will also use KVM_GET_RNG_SEED if available.
If more paravirt seed sources show up, it will be a natural place
to add them.

I sent the corresponding kvm-unit-tests and qemu changes separately.

Changes from v3:
 - Other than KASLR, the guest pieces are completely rewritten.
   Patches 2-4 have essentially nothing in common with v2.

Changes from v2:
 - Bisection fix (patch 2 had a misplaced brace).  The final state is
   identical to that of v2.
 - Improve the 0/5 description a little bit.

Changes from v1:
 - Split patches 2 and 3
 - Log all arch sources in init_std_data
 - Fix the 32-bit kaslr build

Andy Lutomirski (5):
  x86,kvm: Add MSR_KVM_GET_RNG_SEED and a matching feature bit
  random: Add and use arch_get_rng_seed
  x86,random: Add an x86 implementation of arch_get_rng_seed
  x86,random,kvm: Use KVM_GET_RNG_SEED in arch_get_rng_seed
  x86,kaslr: Use MSR_KVM_GET_RNG_SEED for KASLR if available

 Documentation/virtual/kvm/cpuid.txt  |  3 ++
 arch/x86/Kconfig |  4 ++
 arch/x86/boot/compressed/aslr.c  | 27 ++
 arch/x86/include/asm/archrandom.h|  6 +++
 arch/x86/include/asm/kvm_guest.h |  9 
 arch/x86/include/asm/processor.h | 21 ++--
 arch/x86/include/uapi/asm/kvm_para.h |  2 +
 arch/x86/kernel/Makefile |  2 +
 arch/x86/kernel/archrandom.c | 99 
 arch/x86/kernel/kvm.c| 10 
 arch/x86/kvm/cpuid.c |  3 +-
 arch/x86/kvm/x86.c   |  4 ++
 drivers/char/random.c| 14 +++--
 include/linux/random.h   | 40 +++
 14 files changed, 237 insertions(+), 7 deletions(-)
 create mode 100644 arch/x86/kernel/archrandom.c

-- 
1.9.3



[PATCH v4 4/5] x86,random,kvm: Use KVM_GET_RNG_SEED in arch_get_rng_seed

2014-07-17 Thread Andy Lutomirski
This is a straightforward implementation: for each bit of internal
RNG state, request one bit from KVM_GET_RNG_SEED.  This is done even
if RDSEED/RDRAND worked, since KVM_GET_RNG_SEED is likely to provide
cryptographically secure output even if the CPU's RNG is weak or
compromised.
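
The one-bit-per-bit loop described above can be sketched like this.  The
names seed_from_u64_source(), source_ok() and source_absent() are
hypothetical; the source callbacks only model a seed MSR that is present
or absent and do not return real random data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef bool (*u64_source_fn)(uint64_t *v);
typedef void (*seed_fn)(void *ctx, uint32_t data);

/*
 * Pull enough 64-bit values to cover bits_per_source bits, feed each
 * one to seed() as two 32-bit words, and return the bits delivered.
 */
static int seed_from_u64_source(void *ctx, seed_fn seed,
				u64_source_fn source, int bits_per_source)
{
	int i, bits = 0;

	assert(bits_per_source >= 0);

	for (i = 0; i < (bits_per_source + 63) / 64; i++) {
		uint64_t rv;

		/* A failing source is skipped, not treated as an error. */
		if (source(&rv)) {
			seed(ctx, (uint32_t)rv);
			seed(ctx, (uint32_t)(rv >> 32));
			bits += 64;
		}
	}
	return bits;
}

/* Models a host that provides the seed (fixed value, NOT random). */
static bool source_ok(uint64_t *v)
{
	*v = UINT64_C(0x0123456789abcdef);
	return true;
}

/* Models a host without the feature, e.g. after migration. */
static bool source_absent(uint64_t *v)
{
	(void)v;
	return false;
}

/* Example callback: count how many 32-bit words were delivered. */
static void count_words(void *ctx, uint32_t data)
{
	(void)data;
	++*(int *)ctx;
}
```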

Signed-off-by: Andy Lutomirski l...@amacapital.net
---
 arch/x86/Kconfig |  4 
 arch/x86/include/asm/kvm_guest.h |  9 +
 arch/x86/kernel/archrandom.c | 22 +-
 arch/x86/kernel/kvm.c| 10 ++
 4 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index a8f749e..adfa09c 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -593,6 +593,7 @@ config KVM_GUEST
    bool "KVM Guest support (including kvmclock)"
    depends on PARAVIRT
    select PARAVIRT_CLOCK
+   select ARCH_RANDOM
    default y
    ---help---
      This option enables various optimizations for running under the KVM
@@ -1507,6 +1508,9 @@ config ARCH_RANDOM
      If supported, this is a high bandwidth, cryptographically
      secure hardware random number generator.
 
+     This also enables paravirt RNGs such as KVM's if the relevant
+     PV guest support is enabled.
+
 config X86_SMAP
    def_bool y
    prompt "Supervisor Mode Access Prevention" if EXPERT
diff --git a/arch/x86/include/asm/kvm_guest.h b/arch/x86/include/asm/kvm_guest.h
index a92b176..8c4dbd5 100644
--- a/arch/x86/include/asm/kvm_guest.h
+++ b/arch/x86/include/asm/kvm_guest.h
@@ -3,4 +3,13 @@
 
 int kvm_setup_vsyscall_timeinfo(void);
 
+#if defined(CONFIG_KVM_GUEST) && defined(CONFIG_ARCH_RANDOM)
+extern bool kvm_get_rng_seed(u64 *rv);
+#else
+static inline bool kvm_get_rng_seed(u64 *rv)
+{
+   return false;
+}
+#endif
+
 #endif /* _ASM_X86_KVM_GUEST_H */
diff --git a/arch/x86/kernel/archrandom.c b/arch/x86/kernel/archrandom.c
index 5515fc8..3bcfa58 100644
--- a/arch/x86/kernel/archrandom.c
+++ b/arch/x86/kernel/archrandom.c
@@ -15,6 +15,7 @@
  */
 
 #include <asm/archrandom.h>
+#include <asm/kvm_guest.h>
 
 void arch_get_rng_seed(void *ctx,
   void (*seed)(void *ctx, u32 data),
@@ -22,7 +23,7 @@ void arch_get_rng_seed(void *ctx,
   const char *log_prefix)
 {
    int i, longs = (bits_per_source + BITS_PER_LONG - 1) / BITS_PER_LONG;
-   int rdseed_bits = 0, rdrand_bits = 0;
+   int rdseed_bits = 0, rdrand_bits = 0, kvm_bits = 0;
    int rdrand_longs_wanted = 0;
    char buf[128] = "";
    char *msgptr = buf;
@@ -74,6 +75,25 @@ void arch_get_rng_seed(void *ctx,
    if (rdrand_bits)
        msgptr += sprintf(msgptr, ", %d bits from RDRAND", rdrand_bits);
 
+   /*
+    * Use KVM_GET_RNG_SEED regardless of whether the CPU RNG worked,
+    * since it incorporates entropy unavailable to the CPU.  We
+    * request enough bits for the entire internal RNG state, because
+    * there's no good reason not to.
+    */
+   for (i = 0; i < (bits_per_source + 63) / 64; i++) {
+       u64 rv;
+
+       if (kvm_get_rng_seed(&rv)) {
+           seed(ctx, (u32)rv);
+           seed(ctx, (u32)(rv >> 32));
+           kvm_bits += 8 * sizeof(rv);
+       }
+   }
+   if (kvm_bits)
+       msgptr += sprintf(msgptr, ", %d bits from KVM_GET_RNG_BITS",
+                         kvm_bits);
+
    if (buf[0])
        pr_info("%s with %s\n", log_prefix, buf + 2);
 }
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 3dd8e2c..bd8783a 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -416,6 +416,16 @@ void kvm_disable_steal_time(void)
wrmsr(MSR_KVM_STEAL_TIME, 0, 0);
 }
 
+bool kvm_get_rng_seed(u64 *v)
+{
+   /*
+    * Allow migration from a hypervisor with the GET_RNG_SEED
+    * feature to a hypervisor without it.
+    */
+   return (kvm_para_has_feature(KVM_FEATURE_GET_RNG_SEED) &&
+           rdmsrl_safe(MSR_KVM_GET_RNG_SEED, v) == 0);
+}
+
 #ifdef CONFIG_SMP
 static void __init kvm_smp_prepare_boot_cpu(void)
 {
-- 
1.9.3



Re: [PATCH v3 2/5] random,x86: Add arch_get_slow_rng_u64

2014-07-17 Thread Andy Lutomirski
On Thu, Jul 17, 2014 at 11:42 AM, Hannes Frederic Sowa
han...@stressinduktion.org wrote:


 On Thu, Jul 17, 2014, at 19:34, Andy Lutomirski wrote:
 On Thu, Jul 17, 2014 at 10:32 AM, Theodore Ts'o ty...@mit.edu wrote:
  On Thu, Jul 17, 2014 at 10:12:27AM -0700, Andy Lutomirski wrote:
 
  Unless I'm reading the code wrong, the prandom_reseed_late call can
  happen after userspace is running.
 
  But there is also the prandom_reseed() call, which happens early.
 

 Right -- I missed that.

 prandom_init is a core_initcall, prandom_reseed is a late_initcall.
 During initialization of the network stack we have calls to prandom_u32
 before the late_initcall happens. That said, I think it is not that
 important to seed prandom with rdseed/rdrand as security relevant
 entropy extraction should always use get_random_bytes(), but we should
 do it nonetheless.


Regardless, I don't want to do this as part of this patch series.  One
thing at a time...

--Andy


Re: [PATCH v4 2/5] random: Add and use arch_get_rng_seed

2014-07-22 Thread Andy Lutomirski
On Tue, Jul 22, 2014 at 6:59 AM, Theodore Ts'o ty...@mit.edu wrote:
 On Thu, Jul 17, 2014 at 11:22:17AM -0700, Andy Lutomirski wrote:
 Currently, init_std_data contains its own logic for using arch
 random sources.  This logic is a bit strange: it reads one long of
 arch random data per byte of internal state.

 This isn't true.  Check out the init_std_data() a bit more closely.

 unsigned long rv;

 ...

 for (i = r->poolinfo->poolbytes; i > 0; i -= sizeof(rv)) {
 ...

 In particular, note the i -= sizeof(rv).  We are reading one bit per
 bit of internal state being seeded.

Whoops, my bad.
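
To make the arithmetic concrete, here is a small user-space sketch of the
loop bound discussed above.  seed_loop_reads() is a hypothetical helper,
and the 512-byte pool size in the example below is just a sample value.

```c
#include <assert.h>

/*
 * Count the iterations of
 *     for (i = poolbytes; i > 0; i -= word_size)
 * i.e. how many words the seeding loop reads for a pool of the given
 * size.
 */
static int seed_loop_reads(int poolbytes, int word_size)
{
	int i, reads = 0;

	assert(word_size > 0);

	for (i = poolbytes; i > 0; i -= word_size)
		reads++;
	return reads;
}
```

With a 512-byte pool and 8-byte longs the loop runs 64 times, so it reads
64 * 64 = 4096 bits, which matches the 4096 bits of pool state.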


 Assuming the arch sources are perfect, this is the right thing to
 do.  They're not, though, so the followup patch attempts to
 implement the correct logic on x86.

 ... and that's not a problem because we aren't giving any entropy
 credit --- and this is deliberate, because we don't want to trust
 un-auditable hardware.  We are deliberately trying to be conservative
 here.

True.

But, if Intel's hardware does, in fact, work as documented, then
the current code will collect very little entropy on RDSEED-less
hardware.  I see no great reason that we should do something weaker
than following Intel's explicit recommendation for how to seed a PRNG
from RDRAND.


 So I don't think either this patch or the next one is needed.  It adds
 far more complexity than is warranted.

The real reason I did this is because I didn't want to pollute the
kernel with yet more arch_get_random_xyz functions.  In the previous
iteration of this patchset, init_std_data had to deal with no less
than three arch random sources.  If Xen adds something (which, IMO,
they should), then either it'll be up to four, or one of them will
have to multiplex.

Another benefit of this split is that it will potentially allow
arch_get_rng_seed to be made to work before alternatives are run.
There's no fundamental reason that it couldn't work *extremely* early
in boot.  (The KASLR code is an example of how this might work.)  On
the other hand, making arch_get_random_long work very early in boot
would either slow down all the other callers or add a considerable
amount of extra complexity.

So I think that this patch is a slight improvement in RNG
initialization and will actually result in simpler code.  (And yes, if
I submit a new version of it, I'll fix the changelog.)

--Andy


Re: [PATCH v4 2/5] random: Add and use arch_get_rng_seed

2014-07-22 Thread Andy Lutomirski
On Tue, Jul 22, 2014 at 1:57 PM, H. Peter Anvin h...@zytor.com wrote:
 On 07/22/2014 01:44 PM, Andy Lutomirski wrote:

 But, if Intel's hardware does, in fact, work as documented, then
 the current code will collect very little entropy on RDSEED-less
 hardware.  I see no great reason that we should do something weaker
 than following Intel's explicit recommendation for how to seed a PRNG
 from RDRAND.


 Very little entropy in the architectural worst case.  However, since we
 are running single-threaded at this point, actual hardware performs
 orders of magnitude better.  Since we run the mixing function (for no
 particularly good reason -- it is a linear function and doesn't add
 security) there will be enough delay that RDRAND will in practice catch
 up and the output will be quite high quality.  Since the pool is quite
 large, the likely outcome is that there will be enough randomness that
 in practice we would probably be okay if *no* further entropy was ever
 collected.

Just to check: do you mean the RDRAND is very likely to work (i.e.
arch_get_random_long will return true) or that RDRAND will actually
reseed several times during initialization?

I have no RDRAND-capable hardware, so I can't benchmark it, but I
imagine that we're talking about adding 1-2 ms per boot to ensure that
the pool is filled to capacity with *NRBG* data according to the
architectural specification.

Anyway, the current code is IMO very much encoding some form of
knowledge of how arch_get_random_* work into init_std_data, and I
don't think that's the place for it.


 Another benefit of this split is that it will potentially allow
 arch_get_rng_seed to be made to work before alternatives are run.
 There's no fundamental reason that it couldn't work *extremely* early
 in boot.  (The KASLR code is an example of how this might work.)  On
 the other hand, making arch_get_random_long work very early in boot
 would either slow down all the other callers or add a considerable
 amount of extra complexity.

 So I think that this patch is a slight improvement in RNG
 initialization and will actually result in simpler code.  (And yes, if
 I submit a new version of it, I'll fix the changelog.)

 There really isn't any significant reason why we could not permit
 randomness initialization very early in the boot, indeed.  It has
 largely been useless in the past because until the I/O system gets
 initialized there is no randomness of any kind available on traditional
 hardware.

To me, the question is whether this is a sufficient reason to add
arch_get_rng_seed.  If it is, then great.  If not, then I'd like to
know what other way of doing this would be acceptable.  You disliked
arch_get_slow_rng_u64 or whatever I called it, and I agree -- I think
it sucked.

--Andy


Re: [PATCH v4 2/5] random: Add and use arch_get_rng_seed

2014-07-22 Thread Andy Lutomirski
On Tue, Jul 22, 2014 at 2:08 PM, H. Peter Anvin h...@zytor.com wrote:
 On 07/22/2014 02:04 PM, Andy Lutomirski wrote:

 Just to check: do you mean the RDRAND is very likely to work (i.e.
 arch_get_random_long will return true) or that RDRAND will actually
 reseed several times during initialization?


 I mean that RDRAND will actually reseed several times during
 initialization.  The documented architectural limit is actually
 extremely conservative.

 Either way, it isn't really different from seeding from a VM host's
 /dev/urandom...


Sure it is.  The VM host's /dev/urandom makes no guarantee (or AFAIK
even any particular effort) to reseed such that the output has some
minimum entropy per bit, so there would be no point to reading extra
data from it.

Anyway, I'd be willing to drop the conservative RDRAND logic, but I
*still* think that arch_get_rng_seed is a much better interface than
arch_get_slow_rng_u64.

--Andy


[PATCH v5 0/5] random,x86,kvm: Rework arch RNG seeds and get some from kvm

2014-07-23 Thread Andy Lutomirski
This introduces and uses a very simple synchronous mechanism to get
/dev/urandom-style bits appropriate for initial KVM PV guest RNG
seeding.

It also re-works the way that architectural random data is fed into
random.c's pools.  I added a new arch hook called arch_get_rng_seed.
The default implementation is more or less the same as the current
code, except that random_get_entropy is now called unconditionally.

x86 gets a custom arch_get_rng_seed.  It will use KVM_GET_RNG_SEED
if available, and, if it does anything, it will log the number of
bits collected from each available architectural source.  If more
paravirt seed sources show up, it will be a natural place to add
them.
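
The callback-style hook described above can be modelled in user space
roughly as follows.  This is only an illustrative sketch, not the kernel
code: fake_arch_random_long, collect_seed, sink_seed and struct seed_sink
are hypothetical names standing in for the arch source, the hook, the
callback and the entropy pool.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sink: counts the words it is handed and XORs them together. */
struct seed_sink {
	unsigned int words;
	uint32_t mix;
};

/* Callback invoked once per 32-bit word of seed data. */
static void sink_seed(void *ctx, uint32_t data)
{
	struct seed_sink *sink = ctx;

	sink->words++;
	sink->mix ^= data;
}

/* Hypothetical stand-in for an arch source; it always succeeds. */
static bool fake_arch_random_long(unsigned long *v)
{
	static unsigned long counter;

	*v = ++counter;
	return true;
}

/* Model of the generic hook: split each long into u32 pieces for the callback. */
static void collect_seed(void *ctx, void (*seed)(void *ctx, uint32_t data),
			 int bits_per_source)
{
	int i;

	assert(seed != NULL);

	for (i = 0; i < bits_per_source; i += CHAR_BIT * (int)sizeof(long)) {
		unsigned long rv;

		if (!fake_arch_random_long(&rv))
			continue;	/* nothing to mix on failure */

		seed(ctx, (uint32_t)rv);
#if ULONG_MAX > 0xffffffffUL
		seed(ctx, (uint32_t)(rv >> 32));
#endif
	}
}
```

Passing the sink as an opaque context keeps the arch side unaware of
what the caller does with each word, which is the point of the split.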

I sent the corresponding kvm-unit-tests and qemu changes separately.

Changes from v4:
 - Got rid of the RDRAND behavior change.  If this series is accepted,
   I may resend it separately, but I think it's an unrelated issue.
 - Fix up the changelog entries -- I misunderstood how the old code
   worked.
 - Avoid lots of failed attempts to use KVM_GET_RNG_SEED if it's not
   available.

Changes from v3:
 - Other than KASLR, the guest pieces are completely rewritten.
   Patches 2-4 have essentially nothing in common with v2.

Changes from v2:
 - Bisection fix (patch 2 had a misplaced brace).  The final state is
   identical to that of v2.
 - Improve the 0/5 description a little bit.

Changes from v1:
 - Split patches 2 and 3
 - Log all arch sources in init_std_data
 - Fix the 32-bit kaslr build

Andy Lutomirski (5):
  x86,kvm: Add MSR_KVM_GET_RNG_SEED and a matching feature bit
  random: Add and use arch_get_rng_seed
  x86,random: Add an x86 implementation of arch_get_rng_seed
  x86,random,kvm: Use KVM_GET_RNG_SEED in arch_get_rng_seed
  x86,kaslr: Use MSR_KVM_GET_RNG_SEED for KASLR if available

 Documentation/virtual/kvm/cpuid.txt  |  3 ++
 arch/x86/Kconfig                     |  4 ++
 arch/x86/boot/compressed/aslr.c      | 27 +
 arch/x86/include/asm/archrandom.h    |  6 +++
 arch/x86/include/asm/kvm_guest.h     |  9 +
 arch/x86/include/asm/processor.h     | 21 --
 arch/x86/include/uapi/asm/kvm_para.h |  2 +
 arch/x86/kernel/Makefile             |  2 +
 arch/x86/kernel/archrandom.c         | 74
 arch/x86/kernel/kvm.c                | 10 +
 arch/x86/kvm/cpuid.c                 |  3 +-
 arch/x86/kvm/x86.c                   |  4 ++
 drivers/char/random.c                | 14 +--
 include/linux/random.h               | 40 +++
 14 files changed, 212 insertions(+), 7 deletions(-)
 create mode 100644 arch/x86/kernel/archrandom.c

-- 
1.9.3



[PATCH v5 2/5] random: Add and use arch_get_rng_seed

2014-07-23 Thread Andy Lutomirski
Currently, init_std_data contains its own logic for using arch
random sources.  This replaces that logic with a generic function
arch_get_rng_seed that allows arch code to supply its own logic.
The default implementation tries arch_get_random_seed_long and
arch_get_random_long individually.

The only functional change here is that random_get_entropy() is used
unconditionally instead of being used only when the arch sources
fail.  This may add a tiny amount of security.

Signed-off-by: Andy Lutomirski l...@amacapital.net
---
 drivers/char/random.c  | 14 +++---
 include/linux/random.h | 40 
 2 files changed, 51 insertions(+), 3 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 0a7ac0a..be7a94e 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1236,6 +1236,10 @@ void get_random_bytes_arch(void *buf, int nbytes)
 }
 EXPORT_SYMBOL(get_random_bytes_arch);
 
+static void seed_entropy_store(void *ctx, u32 data)
+{
+	mix_pool_bytes((struct entropy_store *)ctx, &data, sizeof(data), NULL);
+}
 
 /*
  * init_std_data - initialize pool with system data
@@ -1251,15 +1255,19 @@ static void init_std_data(struct entropy_store *r)
 	int i;
 	ktime_t now = ktime_get_real();
 	unsigned long rv;
+	char log_prefix[128];
 
 	r->last_pulled = jiffies;
 	mix_pool_bytes(r, &now, sizeof(now), NULL);
 	for (i = r->poolinfo->poolbytes; i > 0; i -= sizeof(rv)) {
-		if (!arch_get_random_seed_long(&rv) &&
-		    !arch_get_random_long(&rv))
-			rv = random_get_entropy();
+		rv = random_get_entropy();
 		mix_pool_bytes(r, &rv, sizeof(rv), NULL);
 	}
+
+	sprintf(log_prefix, "random: seeded %s pool", r->name);
+	arch_get_rng_seed(r, seed_entropy_store, 8 * r->poolinfo->poolbytes,
+			  log_prefix);
+
 	mix_pool_bytes(r, utsname(), sizeof(*(utsname())), NULL);
 }
 
diff --git a/include/linux/random.h b/include/linux/random.h
index 57fbbff..81a6145 100644
--- a/include/linux/random.h
+++ b/include/linux/random.h
@@ -106,6 +106,46 @@ static inline int arch_has_random_seed(void)
 }
 #endif
 
+#ifndef __HAVE_ARCH_GET_RNG_SEED
+
+/**
+ * arch_get_rng_seed() - get architectural rng seed data
+ * @ctx: context for the seed function
+ * @seed: function to call for each u32 obtained
+ * @bits_per_source: number of bits from each source to try to use
+ * @log_prefix: beginning of log output (may be NULL)
+ *
+ * Synchronously load some architectural entropy or other best-effort
+ * random seed data.  An arch-specific implementation should be no worse
+ * than this generic implementation.  If the arch code does something
+ * interesting, it may log something of the form "log_prefix with
+ * 8 bits of stuff".
+ *
+ * No arch-specific implementation should be any worse than the generic
+ * implementation.
+ */
+static inline void arch_get_rng_seed(void *ctx,
+				     void (*seed)(void *ctx, u32 data),
+				     int bits_per_source,
+				     const char *log_prefix)
+{
+	int i;
+
+	for (i = 0; i < bits_per_source; i += 8 * sizeof(long)) {
+		unsigned long rv;
+
+		if (arch_get_random_seed_long(&rv) ||
+		    arch_get_random_long(&rv)) {
+			seed(ctx, (u32)rv);
+#if BITS_PER_LONG > 32
+			seed(ctx, (u32)(rv >> 32));
+#endif
+		}
+	}
+}
+
+#endif /* __HAVE_ARCH_GET_RNG_SEED */
+
 /* Pseudo random number generator from numerical recipes. */
 static inline u32 next_pseudo_random32(u32 seed)
 {
-- 
1.9.3



[PATCH v5 5/5] x86,kaslr: Use MSR_KVM_GET_RNG_SEED for KASLR if available

2014-07-23 Thread Andy Lutomirski
It's considerably better than any of the alternatives on KVM.

Rather than reinventing all of the cpu feature query code, this fixes
native_cpuid to work in PIC objects.

I haven't combined it with boot/cpuflags.c's cpuid implementation:
including asm/processor.h from boot/cpuflags.c results in a flood of
unrelated errors, and fixing it might be messy.

Reviewed-by: Kees Cook keesc...@chromium.org
Signed-off-by: Andy Lutomirski l...@amacapital.net
---
 arch/x86/boot/compressed/aslr.c  | 27 +++
 arch/x86/include/asm/processor.h | 21 ++---
 2 files changed, 45 insertions(+), 3 deletions(-)

diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
index fc6091a..8583f0e 100644
--- a/arch/x86/boot/compressed/aslr.c
+++ b/arch/x86/boot/compressed/aslr.c
@@ -5,6 +5,8 @@
 #include <asm/archrandom.h>
 #include <asm/e820.h>
 
+#include <uapi/asm/kvm_para.h>
+
 #include <generated/compile.h>
 #include <linux/module.h>
 #include <linux/uts.h>
@@ -15,6 +17,22 @@
 static const char build_str[] = UTS_RELEASE " (" LINUX_COMPILE_BY "@"
 		LINUX_COMPILE_HOST ") (" LINUX_COMPILER ") " UTS_VERSION;
 
+static bool kvm_para_has_feature(unsigned int feature)
+{
+	u32 kvm_base;
+	u32 features;
+
+	if (!has_cpuflag(X86_FEATURE_HYPERVISOR))
+		return false;
+
+	kvm_base = hypervisor_cpuid_base("KVMKVMKVM\0\0\0", KVM_CPUID_FEATURES);
+	if (!kvm_base)
+		return false;
+
+	features = cpuid_eax(kvm_base | KVM_CPUID_FEATURES);
+	return features & (1UL << feature);
+}
+
 #define I8254_PORT_CONTROL	0x43
 #define I8254_PORT_COUNTER0	0x40
 #define I8254_CMD_READBACK	0xC0
@@ -81,6 +99,15 @@ static unsigned long get_random_long(void)
 		}
 	}
 
+	if (kvm_para_has_feature(KVM_FEATURE_GET_RNG_SEED)) {
+		u64 seed;
+
+		debug_putstr(" MSR_KVM_GET_RNG_SEED");
+		rdmsrl(MSR_KVM_GET_RNG_SEED, seed);
+		random ^= (unsigned long)seed;
+		use_i8254 = false;
+	}
+
 	if (has_cpuflag(X86_FEATURE_TSC)) {
 		debug_putstr(" RDTSC");
 		rdtscll(raw);
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index a4ea023..6096f3c 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -189,10 +189,25 @@ static inline int have_cpuid_p(void)
 static inline void native_cpuid(unsigned int *eax, unsigned int *ebx,
 				unsigned int *ecx, unsigned int *edx)
 {
-	/* ecx is often an input as well as an output. */
-	asm volatile("cpuid"
+	/*
+	 * This function can be used from the boot code, so it needs
+	 * to avoid using EBX in constraints in PIC mode.
+	 *
+	 * ecx is often an input as well as an output.
+	 */
+	asm volatile(".ifnc %%ebx,%1 ; .ifnc %%rbx,%1   \n\t"
+		     "movl  %%ebx,%1\n\t"
+		     ".endif ; .endif   \n\t"
+		     "cpuid \n\t"
+		     ".ifnc %%ebx,%1 ; .ifnc %%rbx,%1   \n\t"
+		     "xchgl %%ebx,%1\n\t"
+		     ".endif ; .endif"
 	    : "=a" (*eax),
-	      "=b" (*ebx),
+#if defined(__i386__) && defined(__PIC__)
+	      "=r" (*ebx),	/* gcc won't let us use ebx */
+#else
+	      "=b" (*ebx),	/* ebx is okay */
+#endif
 	      "=c" (*ecx),
 	      "=d" (*edx)
 	    : "0" (*eax), "2" (*ecx)
-- 
1.9.3



[PATCH v5 3/5] x86,random: Add an x86 implementation of arch_get_rng_seed

2014-07-23 Thread Andy Lutomirski
This does the same thing as the generic implementation, except
that it logs how many bits of each type it collected.  I want to
know whether the initial seeding is working and, if so, whether
the RNG is fast enough.

(I know that hpa assures me that the hardware RNG is more than
 fast enough, but I'd still like a direct way to verify this.)

Arguably, arch_get_random_seed could be removed now: I'm having some
trouble imagining a sensible non-architecture-specific use of it
that wouldn't be better served by arch_get_rng_seed.
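
For illustration, the per-source summary line could be assembled along
these lines in user space.  This is a hedged sketch rather than the
patch itself: format_seed_summary is a hypothetical helper, and it uses
snprintf where the patch uses sprintf and pr_info.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "<prefix> with N bits from RDSEED, M bits from RDRAND" into out.
 * Returns 1 if a line was produced, 0 if no source contributed any bits.
 */
static int format_seed_summary(char *out, size_t outlen, const char *prefix,
			       int rdseed_bits, int rdrand_bits)
{
	char buf[128] = "";
	size_t used = 0;

	assert(out != NULL && outlen > 0);

	/* Every entry is written with a leading ", " separator. */
	if (rdseed_bits)
		used += (size_t)snprintf(buf + used, sizeof(buf) - used,
					 ", %d bits from RDSEED", rdseed_bits);
	if (rdrand_bits)
		used += (size_t)snprintf(buf + used, sizeof(buf) - used,
					 ", %d bits from RDRAND", rdrand_bits);

	if (!buf[0]) {
		out[0] = '\0';
		return 0;	/* stay silent if nothing was collected */
	}

	/* Skip the separator in front of the first entry. */
	snprintf(out, outlen, "%s with %s", prefix, buf + strlen(", "));
	return 1;
}
```

Writing every entry with a leading separator and skipping the first one
at the end avoids tracking whether an entry is the first in the list.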

Signed-off-by: Andy Lutomirski l...@amacapital.net
---
 arch/x86/include/asm/archrandom.h |  6 +
 arch/x86/kernel/Makefile          |  2 ++
 arch/x86/kernel/archrandom.c      | 51 +++
 3 files changed, 59 insertions(+)
 create mode 100644 arch/x86/kernel/archrandom.c

diff --git a/arch/x86/include/asm/archrandom.h b/arch/x86/include/asm/archrandom.h
index 69f1366..88f9c5a 100644
--- a/arch/x86/include/asm/archrandom.h
+++ b/arch/x86/include/asm/archrandom.h
@@ -117,6 +117,12 @@ GET_SEED(arch_get_random_seed_int, unsigned int, RDSEED_INT, ASM_NOP4);
 #define arch_has_random()	static_cpu_has(X86_FEATURE_RDRAND)
 #define arch_has_random_seed()	static_cpu_has(X86_FEATURE_RDSEED)
 
+#define __HAVE_ARCH_GET_RNG_SEED
+extern void arch_get_rng_seed(void *ctx,
+			      void (*seed)(void *ctx, u32 data),
+			      int bits_per_source,
+			      const char *log_prefix);
+
 #else
 
 static inline int rdrand_long(unsigned long *v)
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 047f9ff..0718bae 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -92,6 +92,8 @@ obj-$(CONFIG_PARAVIRT)	+= paravirt.o paravirt_patch_$(BITS).o
 obj-$(CONFIG_PARAVIRT_SPINLOCKS)+= paravirt-spinlocks.o
 obj-$(CONFIG_PARAVIRT_CLOCK)   += pvclock.o
 
+obj-$(CONFIG_ARCH_RANDOM)  += archrandom.o
+
 obj-$(CONFIG_PCSPKR_PLATFORM)  += pcspeaker.o
 
 obj-$(CONFIG_X86_CHECK_BIOS_CORRUPTION) += check.o
diff --git a/arch/x86/kernel/archrandom.c b/arch/x86/kernel/archrandom.c
new file mode 100644
index 0000000..47d13b0
--- /dev/null
+++ b/arch/x86/kernel/archrandom.c
@@ -0,0 +1,51 @@
+/*
+ * This file is part of the Linux kernel.
+ *
+ * Copyright (c) 2014 Andy Lutomirski
+ * Authors: Andy Lutomirski l...@amacapital.net
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+
+#include <asm/archrandom.h>
+
+void arch_get_rng_seed(void *ctx,
+		       void (*seed)(void *ctx, u32 data),
+		       int bits_per_source,
+		       const char *log_prefix)
+{
+	int i;
+	int rdseed_bits = 0, rdrand_bits = 0;
+	char buf[128] = "";
+	char *msgptr = buf;
+
+	for (i = 0; i < bits_per_source; i += 8 * sizeof(long)) {
+		unsigned long rv;
+
+		if (arch_get_random_seed_long(&rv))
+			rdseed_bits += 8 * sizeof(rv);
+		else if (arch_get_random_long(&rv))
+			rdrand_bits += 8 * sizeof(rv);
+		else
+			continue;	/* Don't waste time mixing. */
+
+		seed(ctx, (u32)rv);
+#if BITS_PER_LONG > 32
+		seed(ctx, (u32)(rv >> 32));
+#endif
+	}
+
+	if (rdseed_bits)
+		msgptr += sprintf(msgptr, ", %d bits from RDSEED", rdseed_bits);
+	if (rdrand_bits)
+		msgptr += sprintf(msgptr, ", %d bits from RDRAND", rdrand_bits);
+	if (buf[0])
+		pr_info("%s with %s\n", log_prefix, buf + 2);
+}
-- 
1.9.3



[PATCH v5 4/5] x86,random,kvm: Use KVM_GET_RNG_SEED in arch_get_rng_seed

2014-07-23 Thread Andy Lutomirski
This is a straightforward implementation: for each bit of internal
RNG state, request one bit from KVM_GET_RNG_SEED.  This is done even
if RDSEED/RDRAND worked, since KVM_GET_RNG_SEED is likely to provide
cryptographically secure output even if the CPU's RNG is weak or
compromised.
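
The request loop amounts to something like the user-space sketch below.
It is illustrative only and makes assumptions: struct fake_host and
fake_get_rng_seed are hypothetical stand-ins for the hypervisor
interface, and pull_host_bits is a made-up name for the loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host: hands out `remaining` 64-bit values, then fails. */
struct fake_host {
	int remaining;
	int calls;
};

static bool fake_get_rng_seed(struct fake_host *host, uint64_t *v)
{
	host->calls++;
	if (host->remaining <= 0)
		return false;
	host->remaining--;
	*v = 0x0123456789abcdefULL;	/* placeholder, not random */
	return true;
}

/*
 * Request up to `bits` bits, 64 at a time, and stop at the first
 * failure.  Returns the number of bits actually handed to `seed`.
 */
static int pull_host_bits(struct fake_host *host, int bits,
			  void (*seed)(void *ctx, uint32_t data), void *ctx)
{
	int i, got = 0;

	assert(host != NULL && seed != NULL);

	for (i = 0; i < bits; i += 64) {
		uint64_t rv;

		if (!fake_get_rng_seed(host, &rv))
			break;	/* assume a failing source keeps failing */

		seed(ctx, (uint32_t)rv);
		seed(ctx, (uint32_t)(rv >> 32));
		got += 64;
	}
	return got;
}

/* Trivial callback that only counts the 32-bit words it receives. */
static void count_words(void *ctx, uint32_t data)
{
	(void)data;
	(*(int *)ctx)++;
}
```

Stopping at the first failure bounds the cost on a host without the
feature to a single failed request.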

Signed-off-by: Andy Lutomirski l...@amacapital.net
---
 arch/x86/Kconfig                 |  4
 arch/x86/include/asm/kvm_guest.h |  9 +
 arch/x86/kernel/archrandom.c     | 25 -
 arch/x86/kernel/kvm.c            | 10 ++
 4 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index a8f749e..adfa09c 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -593,6 +593,7 @@ config KVM_GUEST
 	bool "KVM Guest support (including kvmclock)"
 	depends on PARAVIRT
 	select PARAVIRT_CLOCK
+	select ARCH_RANDOM
 	default y
 	---help---
 	  This option enables various optimizations for running under the KVM
@@ -1507,6 +1508,9 @@ config ARCH_RANDOM
 	  If supported, this is a high bandwidth, cryptographically
 	  secure hardware random number generator.
 
+	  This also enables paravirt RNGs such as KVM's if the relevant
+	  PV guest support is enabled.
+
 config X86_SMAP
 	def_bool y
 	prompt "Supervisor Mode Access Prevention" if EXPERT
diff --git a/arch/x86/include/asm/kvm_guest.h b/arch/x86/include/asm/kvm_guest.h
index a92b176..8c4dbd5 100644
--- a/arch/x86/include/asm/kvm_guest.h
+++ b/arch/x86/include/asm/kvm_guest.h
@@ -3,4 +3,13 @@
 
 int kvm_setup_vsyscall_timeinfo(void);
 
+#if defined(CONFIG_KVM_GUEST) && defined(CONFIG_ARCH_RANDOM)
+extern bool kvm_get_rng_seed(u64 *rv);
+#else
+static inline bool kvm_get_rng_seed(u64 *rv)
+{
+	return false;
+}
+#endif
+
 #endif /* _ASM_X86_KVM_GUEST_H */
diff --git a/arch/x86/kernel/archrandom.c b/arch/x86/kernel/archrandom.c
index 47d13b0..8c8d021 100644
--- a/arch/x86/kernel/archrandom.c
+++ b/arch/x86/kernel/archrandom.c
@@ -15,6 +15,7 @@
  */
 
 #include <asm/archrandom.h>
+#include <asm/kvm_guest.h>
 
 void arch_get_rng_seed(void *ctx,
 		       void (*seed)(void *ctx, u32 data),
@@ -22,7 +23,7 @@ void arch_get_rng_seed(void *ctx,
 		       const char *log_prefix)
 {
 	int i;
-	int rdseed_bits = 0, rdrand_bits = 0;
+	int rdseed_bits = 0, rdrand_bits = 0, kvm_bits = 0;
 	char buf[128] = "";
 	char *msgptr = buf;
 
@@ -42,10 +43,32 @@ void arch_get_rng_seed(void *ctx,
 #endif
 	}
 
+	/*
+	 * Use KVM_GET_RNG_SEED regardless of whether the CPU RNG
+	 * worked, since it incorporates entropy unavailable to the CPU,
+	 * and we shouldn't trust the hardware RNG more than we need to.
+	 * We request enough bits for the entire internal RNG state,
+	 * because there's no good reason not to.
+	 */
+	for (i = 0; i < bits_per_source; i += 64) {
+		u64 rv;
+
+		if (kvm_get_rng_seed(&rv)) {
+			seed(ctx, (u32)rv);
+			seed(ctx, (u32)(rv >> 32));
+			kvm_bits += 8 * sizeof(rv);
+		} else {
+			break;	/* If it fails once, it will keep failing. */
+		}
+	}
+
 	if (rdseed_bits)
 		msgptr += sprintf(msgptr, ", %d bits from RDSEED", rdseed_bits);
 	if (rdrand_bits)
 		msgptr += sprintf(msgptr, ", %d bits from RDRAND", rdrand_bits);
+	if (kvm_bits)
+		msgptr += sprintf(msgptr, ", %d bits from KVM_GET_RNG_BITS",
+				  kvm_bits);
 	if (buf[0])
 		pr_info("%s with %s\n", log_prefix, buf + 2);
 }
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 3dd8e2c..bd8783a 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -416,6 +416,16 @@ void kvm_disable_steal_time(void)
 	wrmsr(MSR_KVM_STEAL_TIME, 0, 0);
 }
 
+bool kvm_get_rng_seed(u64 *v)
+{
+	/*
+	 * Allow migration from a hypervisor with the GET_RNG_SEED
+	 * feature to a hypervisor without it.
+	 */
+	return (kvm_para_has_feature(KVM_FEATURE_GET_RNG_SEED) &&
+		rdmsrl_safe(MSR_KVM_GET_RNG_SEED, v) == 0);
+}
+
 #ifdef CONFIG_SMP
 static void __init kvm_smp_prepare_boot_cpu(void)
 {
-- 
1.9.3



[PATCH v5 1/5] x86,kvm: Add MSR_KVM_GET_RNG_SEED and a matching feature bit

2014-07-23 Thread Andy Lutomirski
This adds a simple interface to allow a guest to request 64 bits of
host nonblocking entropy.  This is independent of virtio-rng for a
couple of reasons:

 - It's intended to be usable during early boot, when a trivial
   synchronous interface is needed.

 - virtio-rng gives blocking entropy, and making guest boot wait for
   the host's /dev/random will cause problems.

MSR_KVM_GET_RNG_SEED is intended to provide 64 bits of best-effort
cryptographically secure data for use as a seed.  It provides no
guarantee that the result contains any actual entropy.
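
On the guest side, detecting the interface reduces to testing one bit
of the KVM features word.  A minimal user-space sketch, assuming the
features word has already been read from CPUID; features_has is a
hypothetical helper, and the bit numbers are the ones this patch adds
to kvm_para.h:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define KVM_FEATURE_PV_UNHALT		7
#define KVM_FEATURE_GET_RNG_SEED	8	/* bit number proposed by this patch */

/* Test a single feature bit in an already-fetched features word. */
static bool features_has(uint32_t features_eax, unsigned int feature)
{
	assert(feature < 32);
	return (features_eax >> feature) & 1u;
}
```

A guest would only read MSR_KVM_GET_RNG_SEED after this check succeeds.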

Signed-off-by: Andy Lutomirski l...@amacapital.net
---
 Documentation/virtual/kvm/cpuid.txt  | 3 +++
 arch/x86/include/uapi/asm/kvm_para.h | 2 ++
 arch/x86/kvm/cpuid.c                 | 3 ++-
 arch/x86/kvm/x86.c                   | 4
 4 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/Documentation/virtual/kvm/cpuid.txt b/Documentation/virtual/kvm/cpuid.txt
index 3c65feb..0ab043b 100644
--- a/Documentation/virtual/kvm/cpuid.txt
+++ b/Documentation/virtual/kvm/cpuid.txt
@@ -54,6 +54,9 @@ KVM_FEATURE_PV_UNHALT              ||     7 || guest checks this feature bit
                                    ||       || before enabling paravirtualized
                                    ||       || spinlock support.
 ------------------------------------------------------------------------------
+KVM_FEATURE_GET_RNG_SEED           ||     8 || host provides rng seed data via
+                                   ||       || MSR_KVM_GET_RNG_SEED.
+------------------------------------------------------------------------------
 KVM_FEATURE_CLOCKSOURCE_STABLE_BIT ||    24 || host will warn if no guest-side
                                    ||       || per-cpu warps are expected in
                                    ||       || kvmclock.
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 94dc8ca..e2eaf93 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -24,6 +24,7 @@
 #define KVM_FEATURE_STEAL_TIME 5
 #define KVM_FEATURE_PV_EOI 6
 #define KVM_FEATURE_PV_UNHALT  7
+#define KVM_FEATURE_GET_RNG_SEED   8
 
 /* The last 8 bits are used to indicate how to interpret the flags field
  * in pvclock structure. If no bits are set, all flags are ignored.
@@ -40,6 +41,7 @@
 #define MSR_KVM_ASYNC_PF_EN 0x4b564d02
 #define MSR_KVM_STEAL_TIME  0x4b564d03
 #define MSR_KVM_PV_EOI_EN  0x4b564d04
+#define MSR_KVM_GET_RNG_SEED 0x4b564d05
 
 struct kvm_steal_time {
__u64 steal;
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 38a0afe..40d6763 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -479,7 +479,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 			     (1 << KVM_FEATURE_ASYNC_PF) |
 			     (1 << KVM_FEATURE_PV_EOI) |
 			     (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) |
-			     (1 << KVM_FEATURE_PV_UNHALT);
+			     (1 << KVM_FEATURE_PV_UNHALT) |
+			     (1 << KVM_FEATURE_GET_RNG_SEED);
 
 		if (sched_info_on())
 			entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f644933..4e81853 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -48,6 +48,7 @@
 #include <linux/pci.h>
 #include <linux/timekeeper_internal.h>
 #include <linux/pvclock_gtod.h>
+#include <linux/random.h>
 #include <trace/events/kvm.h>
 
 #define CREATE_TRACE_POINTS
@@ -2480,6 +2481,9 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
 	case MSR_KVM_PV_EOI_EN:
 		data = vcpu->arch.pv_eoi.msr_val;
 		break;
+	case MSR_KVM_GET_RNG_SEED:
+		get_random_bytes(&data, sizeof(data));
+		break;
 	case MSR_IA32_P5_MC_ADDR:
 	case MSR_IA32_P5_MC_TYPE:
 	case MSR_IA32_MCG_CAP:
-- 
1.9.3



Re: [PATCH v5 2/5] random: Add and use arch_get_rng_seed

2014-07-29 Thread Andy Lutomirski
On Wed, Jul 23, 2014 at 9:57 PM, Andy Lutomirski l...@amacapital.net wrote:
 Currently, init_std_data contains its own logic for using arch
 random sources.  This replaces that logic with a generic function
 arch_get_rng_seed that allows arch code to supply its own logic.
 The default implementation tries arch_get_random_seed_long and
 arch_get_random_long individually.

 The only functional change here is that random_get_entropy() is used
 unconditionally instead of being used only when the arch sources
 fail.  This may add a tiny amount of security.

tytso, are you okay with this approach?  I'd be happy to rework this
if you prefer some other way of doing it.

--Andy


 Signed-off-by: Andy Lutomirski l...@amacapital.net
 ---
  drivers/char/random.c  | 14 +++---
  include/linux/random.h | 40 
  2 files changed, 51 insertions(+), 3 deletions(-)

 diff --git a/drivers/char/random.c b/drivers/char/random.c
 index 0a7ac0a..be7a94e 100644
 --- a/drivers/char/random.c
 +++ b/drivers/char/random.c
 @@ -1236,6 +1236,10 @@ void get_random_bytes_arch(void *buf, int nbytes)
  }
  EXPORT_SYMBOL(get_random_bytes_arch);

 +static void seed_entropy_store(void *ctx, u32 data)
 +{
 +   mix_pool_bytes((struct entropy_store *)ctx, &data, sizeof(data), NULL);
 +}

  /*
   * init_std_data - initialize pool with system data
 @@ -1251,15 +1255,19 @@ static void init_std_data(struct entropy_store *r)
 int i;
 ktime_t now = ktime_get_real();
 unsigned long rv;
 +   char log_prefix[128];

 r->last_pulled = jiffies;
 mix_pool_bytes(r, &now, sizeof(now), NULL);
 for (i = r->poolinfo->poolbytes; i > 0; i -= sizeof(rv)) {
 -   if (!arch_get_random_seed_long(&rv) &&
 -   !arch_get_random_long(&rv))
 -   rv = random_get_entropy();
 +   rv = random_get_entropy();
 mix_pool_bytes(r, &rv, sizeof(rv), NULL);
 }
 +
 +   sprintf(log_prefix, "random: seeded %s pool", r->name);
 +   arch_get_rng_seed(r, seed_entropy_store, 8 * r->poolinfo->poolbytes,
 + log_prefix);
 +
 mix_pool_bytes(r, utsname(), sizeof(*(utsname())), NULL);
  }

 diff --git a/include/linux/random.h b/include/linux/random.h
 index 57fbbff..81a6145 100644
 --- a/include/linux/random.h
 +++ b/include/linux/random.h
 @@ -106,6 +106,46 @@ static inline int arch_has_random_seed(void)
  }
  #endif

 +#ifndef __HAVE_ARCH_GET_RNG_SEED
 +
 +/**
 + * arch_get_rng_seed() - get architectural rng seed data
 + * @ctx: context for the seed function
 + * @seed: function to call for each u32 obtained
 + * @bits_per_source: number of bits from each source to try to use
 + * @log_prefix: beginning of log output (may be NULL)
 + *
 + * Synchronously load some architectural entropy or other best-effort
 + * random seed data.  An arch-specific implementation should be no worse
 + * than this generic implementation.  If the arch code does something
 + * interesting, it may log something of the form "log_prefix with
 + * 8 bits of stuff".
 + *
 + * No arch-specific implementation should be any worse than the generic
 + * implementation.
 + */
 +static inline void arch_get_rng_seed(void *ctx,
 +void (*seed)(void *ctx, u32 data),
 +int bits_per_source,
 +const char *log_prefix)
 +{
 +   int i;
 +
 +   for (i = 0; i < bits_per_source; i += 8 * sizeof(long)) {
 +   unsigned long rv;
 +
 +   if (arch_get_random_seed_long(&rv) ||
 +   arch_get_random_long(&rv)) {
 +   seed(ctx, (u32)rv);
 +#if BITS_PER_LONG > 32
 +   seed(ctx, (u32)(rv >> 32));
 +#endif
 +   }
 +   }
 +}
 +
 +#endif /* __HAVE_ARCH_GET_RNG_SEED */
 +
  /* Pseudo random number generator from numerical recipes. */
  static inline u32 next_pseudo_random32(u32 seed)
  {
 --
 1.9.3




-- 
Andy Lutomirski
AMA Capital Management, LLC


Re: [PATCH v5 0/5] random,x86,kvm: Rework arch RNG seeds and get some from kvm

2014-08-12 Thread Andy Lutomirski
On Wed, Jul 23, 2014 at 9:57 PM, Andy Lutomirski l...@amacapital.net wrote:
 This introduces and uses a very simple synchronous mechanism to get
 /dev/urandom-style bits appropriate for initial KVM PV guest RNG
 seeding.

 It also re-works the way that architectural random data is fed into
 random.c's pools.  I added a new arch hook called arch_get_rng_seed.
 The default implementation is more or less the same as the current
 code, except that random_get_entropy is now called unconditionally.

 x86 gets a custom arch_get_rng_seed.  It will use KVM_GET_RNG_SEED
 if available, and, if it does anything, it will log the number of
 bits collected from each available architectural source.  If more
 paravirt seed sources show up, it will be a natural place to add
 them.

 I sent the corresponding kvm-unit-tests and qemu changes separately.

What's the status of this series?  I assume that it's too late for at
least patches 2-5 to make it into 3.17.

--Andy


 Changes from v4:
  - Got rid of the RDRAND behavior change.  If this series is accepted,
I may resend it separately, but I think it's an unrelated issue.
  - Fix up the changelog entries -- I misunderstood how the old code
worked.
  - Avoid lots of failed attempts to use KVM_GET_RNG_SEED if it's not
available.

 Changes from v3:
  - Other than KASLR, the guest pieces are completely rewritten.
Patches 2-4 have essentially nothing in common with v2.

 Changes from v2:
  - Bisection fix (patch 2 had a misplaced brace).  The final state is
identical to that of v2.
  - Improve the 0/5 description a little bit.

 Changes from v1:
  - Split patches 2 and 3
  - Log all arch sources in init_std_data
  - Fix the 32-bit kaslr build

 Andy Lutomirski (5):
   x86,kvm: Add MSR_KVM_GET_RNG_SEED and a matching feature bit
   random: Add and use arch_get_rng_seed
   x86,random: Add an x86 implementation of arch_get_rng_seed
   x86,random,kvm: Use KVM_GET_RNG_SEED in arch_get_rng_seed
   x86,kaslr: Use MSR_KVM_GET_RNG_SEED for KASLR if available

  Documentation/virtual/kvm/cpuid.txt  |  3 ++
  arch/x86/Kconfig                     |  4 ++
  arch/x86/boot/compressed/aslr.c      | 27 +
  arch/x86/include/asm/archrandom.h    |  6 +++
  arch/x86/include/asm/kvm_guest.h     |  9 +
  arch/x86/include/asm/processor.h     | 21 --
  arch/x86/include/uapi/asm/kvm_para.h |  2 +
  arch/x86/kernel/Makefile             |  2 +
  arch/x86/kernel/archrandom.c         | 74
  arch/x86/kernel/kvm.c                | 10 +
  arch/x86/kvm/cpuid.c                 |  3 +-
  arch/x86/kvm/x86.c                   |  4 ++
  drivers/char/random.c                | 14 +--
  include/linux/random.h               | 40 +++
  14 files changed, 212 insertions(+), 7 deletions(-)
  create mode 100644 arch/x86/kernel/archrandom.c

 --
 1.9.3




-- 
Andy Lutomirski
AMA Capital Management, LLC


Re: [PATCH v5 0/5] random,x86,kvm: Rework arch RNG seeds and get some from kvm

2014-08-12 Thread Andy Lutomirski
On Tue, Aug 12, 2014 at 12:17 PM, Theodore Ts'o ty...@mit.edu wrote:
 On Tue, Aug 12, 2014 at 12:11:29PM -0700, Andy Lutomirski wrote:

 What's the status of this series?  I assume that it's too late for at
 least patches 2-5 to make it into 3.17.

 Which tree were you hoping this patch series to go through?  I was
 assuming it would go through the x86 tree since the bulk of the
 changes are in the x86 subsystem (hence my Acked-by).

There's some argument that patch 1 should go through the kvm tree.
There's no real need for patch 1 and 2-5 to end up in the same kernel
release, either.


 IIRC, Peter had some concerns, and I don't remember if they were all
 addressed.  Peter?


I don't know.  I rewrote one thing he didn't like and undid the other,
but there's plenty of opportunity for this version to be problematic, too.

--Andy


Re: [PATCH v5 0/5] random,x86,kvm: Rework arch RNG seeds and get some from kvm

2014-08-13 Thread Andy Lutomirski
On Aug 13, 2014 12:48 AM, H. Peter Anvin h...@zytor.com wrote:

 On 08/12/2014 12:22 PM, Andy Lutomirski wrote:
  On Tue, Aug 12, 2014 at 12:17 PM, Theodore Ts'o ty...@mit.edu wrote:
  On Tue, Aug 12, 2014 at 12:11:29PM -0700, Andy Lutomirski wrote:
 
  What's the status of this series?  I assume that it's too late for at
  least patches 2-5 to make it into 3.17.
 
  Which tree were you hoping this patch series to go through?  I was
   assuming it would go through the x86 tree since the bulk of the
   changes are in the x86 subsystem (hence my Acked-by).
 
  There's some argument that patch 1 should go through the kvm tree.
  There's no real need for patch 1 and 2-5 to end up in the same kernel
  release, either.
 
 
  IIRC, Peter had some concerns, and I don't remember if they were all
  addressed.  Peter?
 
 
  I don't know.  I rewrote one thing he didn't like and undid the other,
  but there's plenty of opportunity for this version to be problematic, too.
 

 Sorry, I have been heads down on the current merge window.  I will look
 at this for 3.18, presumably after Kernel Summit.

 The proposed arch_get_rng_seed() is not really what it claims to be; it
 most definitely does not produce seed-grade randomness, instead it seems
 to be an arch function for best-effort initialization of the entropy
 pools -- which is fine, it is just something quite different.

Fair enough.  I meant "seed" as in something that initializes a PRNG
(think srand), not "seed" as in a
promised-to-be-cryptographically-secure seed for a DRBG.

I can rename it, update the comment, or otherwise tweak it to make the
intent clearer.


 I want to look over it more carefully before acking it, though.

It would also be nice for someone with a Haswell box (and an RDSEED
box) to test it.  I have neither.


 Andy, are you going to be in Chicago?

Yes.


 -hpa



Re: [PATCH v5 0/5] random,x86,kvm: Rework arch RNG seeds and get some from kvm

2014-08-13 Thread Andy Lutomirski
On Wed, Aug 13, 2014 at 7:32 AM, Theodore Ts'o ty...@mit.edu wrote:
 On Wed, Aug 13, 2014 at 12:48:41AM -0700, H. Peter Anvin wrote:
 The proposed arch_get_rng_seed() is not really what it claims to be; it
 most definitely does not produce seed-grade randomness, instead it seems
 to be an arch function for best-effort initialization of the entropy
 pools -- which is fine, it is just something quite different.

 Without getting into an argument about which definition of seed is
 correct --- it's certainly confusing and different from the RDSEED
 usage of the word seed.

 Do we expect that anyone else besides arch_get_rnd_seed() would
 actually want to use it?

If you mean random.c instead of arch_get_rnd_seed, then I don't expect
there to be other users.  Aside from the best-effort bit causing
this to be basically useless on old bare metal, the interface is
really awkward for anything other than the use in random.c.

 I'd argue no; we want the rest of the kernel
 to either use get_random_bytes() or prandom_u32().  Given that, maybe
 we should just call it arch_random_init(), and expect that the only
 user of this interface would be drivers/char/random.c?

Sounds good to me.

FWIW, I'd like to see a second use added in random.c: I think that we
should do this, or even all of init_std_data, on resume from suspend
and especially on resume from hibernate / kexec.

--Andy


Re: [PATCH v5 0/5] random,x86,kvm: Rework arch RNG seeds and get some from kvm

2014-08-13 Thread Andy Lutomirski
On Wed, Aug 13, 2014 at 11:22 AM, Theodore Ts'o ty...@mit.edu wrote:
 On Wed, Aug 13, 2014 at 10:45:25AM -0700, H. Peter Anvin wrote:
 On 08/13/2014 09:13 AM, Andy Lutomirski wrote:
 
  Sounds good to me.
 
  FWIW, I'd like to see a second use added in random.c: I think that we
  should do this, or even all of init_std_data, on resume from suspend
  and especially on resume from hibernate / kexec.
 

 Yes, we should.  We also need to make it possible to do this after
 cloning a VM.

 Agreed.  Can you send a patch?

 I can carry the commits to add arch_random_init() the generic version,
 and the patch to call it after suspend/resume.  I'll do this at the
 very head of the random tree, and make sure it gets pushed to Linus
 early during the next merge window.

 Does that sound like a plan?  Or does someone want to suggest
 something different?  I'm flexible...

OK.  Here's a proposal.  I'll split the series into two parts.  The
first part will be the arch_random_init generic change and code to
call it after suspend/resume (once I figure out the right callback).
I'll send that to you.

The second part will be the KVM and x86 code, which will look just
like this version (v5) except for the rename.  If needed, hpa and I
can hash out the details at KS.

As for doing arch_random_init after clone/migration, I think we'll
need another KVM extension for that, since, AFAIK, we don't actually
get notified that we were cloned or migrated.  That will be
nontrivial.  Maybe we can figure that out at KS, too.

--Andy


 - Ted



-- 
Andy Lutomirski
AMA Capital Management, LLC


Re: [PATCH v5 0/5] random,x86,kvm: Rework arch RNG seeds and get some from kvm

2014-08-13 Thread Andy Lutomirski
On Wed, Aug 13, 2014 at 7:41 PM, H. Peter Anvin h...@zytor.com wrote:
 On 08/13/2014 11:44 AM, H. Peter Anvin wrote:
 On 08/13/2014 11:33 AM, Andy Lutomirski wrote:

 As for doing arch_random_init after clone/migration, I think we'll
 need another KVM extension for that, since, AFAIK, we don't actually
 get notified that we were cloned or migrated.  That will be
 nontrivial.  Maybe we can figure that out at KS, too.


 We don't need a reset when migrated (although it might be a good idea
 under some circumstances, i.e. if the pools might somehow have gotten
 exposed) but definitely when cloned.


 But yes, we need a notification.  For obvious reasons there is no
 suspend event (one can snapshot a running VM) but we need to be notified
 upon wakeup, *or* we need to give KVM a way to update the necessary state.

This could presumably use the interrupt mechanism on virtio-rng if
we're willing to depend on having host support for virtio-rng.

v6 (coming in a few minutes) will at least get it right when the
kernel goes through the resume path (i.e. not KVM/QEMU suspend, and
maybe not S0ix either).

--Andy


[PATCH v6 0/7] random,x86,kvm: Rework arch RNG seeds and get some from kvm

2014-08-13 Thread Andy Lutomirski
This introduces and uses a very simple synchronous mechanism to get
/dev/urandom-style bits appropriate for initial KVM PV guest RNG
seeding.

It also re-works the way that architectural random data is fed into
random.c's pools.  Timekeeping randomness now comes directly from
the timekeeping core rather than being pulled in from init_std_data,
and timekeeping randomness is added both on boot and on resume.  I
added a new arch hook called arch_rng_init.  The default
implementation is more or less the same as the current code, except
that random_get_entropy is now called unconditionally.  We now also
call init_std_data on resume.

x86 gets a custom arch_rng_init.  It will use KVM_GET_RNG_SEED if
available, and, if it does anything, it will log the number of bits
collected from each available architectural source.  If more
paravirt seed sources show up, it will be a natural place to add
them.
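
For readers who want a feel for the shape of the hook, here is a
minimal user-space sketch, not the kernel code.  It assumes a
callback-plus-context interface like the one described above; the
names struct pool, pool_seed, fake_arch_source and
model_arch_rng_init are hypothetical, the fake source is a plain LCG
(not random), and unlike the real hook this model returns the bit
count instead of logging it.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical stand-in for an entropy pool: XORs and counts words. */
struct pool {
	uint32_t acc;
	int words;
};

/* Callback with the same shape as the seed() argument in the series. */
static void pool_seed(void *ctx, uint32_t data)
{
	struct pool *p = ctx;

	p->acc ^= data;
	p->words++;
}

/* Hypothetical architectural source; a real one may fail at any time. */
static bool fake_arch_source(uint32_t *out)
{
	static uint32_t state = 0x12345678u;

	state = state * 1664525u + 1013904223u;	/* LCG, NOT random */
	*out = state;
	return true;
}

/*
 * Model of the hook: feed up to bits_per_source bits to seed() and
 * return how many bits were actually delivered.
 */
static int model_arch_rng_init(void *ctx,
			       void (*seed)(void *ctx, uint32_t data),
			       int bits_per_source)
{
	int bits = 0;
	int i;

	for (i = 0; i < bits_per_source; i += 32) {
		uint32_t v;

		if (!fake_arch_source(&v))
			continue;	/* nothing to mix this round */
		seed(ctx, v);
		bits += 32;
	}
	return bits;
}
```

The point of passing ctx and seed separately is that the arch code
never needs to know what a pool looks like; it only pushes words.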

I sent the corresponding kvm-unit-tests and qemu changes separately.

Changes from v5:
 - Moved the generic changes to the beginning.
 - Renamed arch_get_rng_seed to arch_rng_init.
 - The timekeeping change is new.
 - random.c registers a syscore callback to reseed on resume.

Changes from v4:
 - Got rid of the RDRAND behavior change.  If this series is accepted,
   I may resend it separately, but I think it's an unrelated issue.
 - Fix up the changelog entries -- I misunderstood how the old code
   worked.
 - Avoid lots of failed attempts to use KVM_GET_RNG_SEED if it's not
   available.

Changes from v3:
 - Other than KASLR, the guest pieces are completely rewritten.
   Patches 2-4 have essentially nothing in common with v2.

Changes from v2:
 - Bisection fix (patch 2 had a misplaced brace).  The final state is
   identical to that of v2.
 - Improve the 0/5 description a little bit.

Changes from v1:
 - Split patches 2 and 3
 - Log all arch sources in init_std_data
 - Fix the 32-bit kaslr build

Andy Lutomirski (7):
  random: Add and use arch_rng_init
  random, timekeeping: Collect timekeeping entropy in the timekeeping
code
  random: Reseed pools on resume
  x86,kvm: Add MSR_KVM_GET_RNG_SEED and a matching feature bit
  x86,random: Add an x86 implementation of arch_rng_init
  x86,random,kvm: Use KVM_GET_RNG_SEED in arch_rng_init
  x86,kaslr: Use MSR_KVM_GET_RNG_SEED for KASLR if available

 Documentation/virtual/kvm/cpuid.txt  |  3 ++
 arch/x86/Kconfig                     |  4 ++
 arch/x86/boot/compressed/aslr.c      | 27 +
 arch/x86/include/asm/archrandom.h    |  6 +++
 arch/x86/include/asm/kvm_guest.h     |  9 +
 arch/x86/include/asm/processor.h     | 21 --
 arch/x86/include/uapi/asm/kvm_para.h |  2 +
 arch/x86/kernel/Makefile             |  2 +
 arch/x86/kernel/archrandom.c         | 74
 arch/x86/kernel/kvm.c                | 10 +
 arch/x86/kvm/cpuid.c                 |  3 +-
 arch/x86/kvm/x86.c                   |  4 ++
 drivers/char/random.c                | 42
 include/linux/random.h               | 40 +++
 kernel/time/timekeeping.c            | 11 ++
 15 files changed, 246 insertions(+), 12 deletions(-)
 create mode 100644 arch/x86/kernel/archrandom.c

-- 
1.9.3



[PATCH v6 7/7] x86,kaslr: Use MSR_KVM_GET_RNG_SEED for KASLR if available

2014-08-13 Thread Andy Lutomirski
It's considerably better than any of the alternatives on KVM.

Rather than reinventing all of the cpu feature query code, this fixes
native_cpuid to work in PIC objects.

I haven't combined it with boot/cpuflags.c's cpuid implementation:
including asm/processor.h from boot/cpuflags.c results in a flood of
unrelated errors, and fixing it might be messy.

Reviewed-by: Kees Cook <keesc...@chromium.org>
Acked-by: Paolo Bonzini <pbonz...@redhat.com>
Signed-off-by: Andy Lutomirski <l...@amacapital.net>
---
 arch/x86/boot/compressed/aslr.c  | 27 +++
 arch/x86/include/asm/processor.h | 21 ++---
 2 files changed, 45 insertions(+), 3 deletions(-)

diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
index fc6091a..8583f0e 100644
--- a/arch/x86/boot/compressed/aslr.c
+++ b/arch/x86/boot/compressed/aslr.c
@@ -5,6 +5,8 @@
 #include <asm/archrandom.h>
 #include <asm/e820.h>
 
+#include <uapi/asm/kvm_para.h>
+
 #include <generated/compile.h>
 #include <linux/module.h>
 #include <linux/uts.h>
@@ -15,6 +17,22 @@
 static const char build_str[] = UTS_RELEASE " (" LINUX_COMPILE_BY "@"
 		LINUX_COMPILE_HOST ") (" LINUX_COMPILER ") " UTS_VERSION;
 
+static bool kvm_para_has_feature(unsigned int feature)
+{
+	u32 kvm_base;
+	u32 features;
+
+	if (!has_cpuflag(X86_FEATURE_HYPERVISOR))
+		return false;
+
+	kvm_base = hypervisor_cpuid_base("KVMKVMKVM\0\0\0", KVM_CPUID_FEATURES);
+	if (!kvm_base)
+		return false;
+
+	features = cpuid_eax(kvm_base | KVM_CPUID_FEATURES);
+	return features & (1UL << feature);
+}
+
 #define I8254_PORT_CONTROL	0x43
 #define I8254_PORT_COUNTER0	0x40
 #define I8254_CMD_READBACK	0xC0
@@ -81,6 +99,15 @@ static unsigned long get_random_long(void)
 		}
 	}
 
+	if (kvm_para_has_feature(KVM_FEATURE_GET_RNG_SEED)) {
+		u64 seed;
+
+		debug_putstr(" MSR_KVM_GET_RNG_SEED");
+		rdmsrl(MSR_KVM_GET_RNG_SEED, seed);
+		random ^= (unsigned long)seed;
+		use_i8254 = false;
+	}
+
 	if (has_cpuflag(X86_FEATURE_TSC)) {
 		debug_putstr(" RDTSC");
 		rdtscll(raw);
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index a4ea023..6096f3c 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -189,10 +189,25 @@ static inline int have_cpuid_p(void)
 static inline void native_cpuid(unsigned int *eax, unsigned int *ebx,
 				unsigned int *ecx, unsigned int *edx)
 {
-	/* ecx is often an input as well as an output. */
-	asm volatile("cpuid"
+	/*
+	 * This function can be used from the boot code, so it needs
+	 * to avoid using EBX in constraints in PIC mode.
+	 *
+	 * ecx is often an input as well as an output.
+	 */
+	asm volatile(".ifnc %%ebx,%1 ; .ifnc %%rbx,%1   \n\t"
+		     "movl  %%ebx,%1                    \n\t"
+		     ".endif ; .endif                   \n\t"
+		     "cpuid                             \n\t"
+		     ".ifnc %%ebx,%1 ; .ifnc %%rbx,%1   \n\t"
+		     "xchgl %%ebx,%1                    \n\t"
+		     ".endif ; .endif"
 	    : "=a" (*eax),
-	      "=b" (*ebx),
+#if defined(__i386__) && defined(__PIC__)
+	      "=r" (*ebx),	/* gcc won't let us use ebx */
+#else
+	      "=b" (*ebx),	/* ebx is okay */
+#endif
 	      "=c" (*ecx),
 	      "=d" (*edx)
 	    : "0" (*eax), "2" (*ecx)
-- 
1.9.3
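
As an aside for readers, the two small operations this patch relies
on (testing one bit of the KVM feature word and folding the seed into
the running value with XOR) can be sketched in plain user-space C.
This is only an illustration: feature_word_has and mix_seed are
hypothetical helper names, and the bit number is the one defined in
patch 4/7.

```c
#include <stdint.h>
#include <stdbool.h>

/* Bit number taken from patch 4/7 of this series. */
#define KVM_FEATURE_GET_RNG_SEED 8

/* Test one feature bit in a CPUID-style 32-bit feature word. */
static bool feature_word_has(uint32_t features, unsigned int feature)
{
	if (feature >= 32)
		return false;	/* shifting by >= the width is undefined */
	return (features & (UINT32_C(1) << feature)) != 0;
}

/* Fold a 64-bit seed into a running value with XOR. */
static unsigned long mix_seed(unsigned long random, uint64_t seed)
{
	/* On a 32-bit build the cast keeps only the low 32 bits. */
	return random ^ (unsigned long)seed;
}
```

XOR is used so that a bad seed can never make the running value more
predictable than it already was.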



[PATCH v6 6/7] x86,random,kvm: Use KVM_GET_RNG_SEED in arch_rng_init

2014-08-13 Thread Andy Lutomirski
This is a straightforward implementation: for each bit of internal
RNG state, request one bit from KVM_GET_RNG_SEED.  This is done even
if RDSEED/RDRAND worked, since KVM_GET_RNG_SEED is likely to provide
cryptographically secure output even if the CPU's RNG is weak or
compromised.

Acked-by: Paolo Bonzini <pbonz...@redhat.com>
Signed-off-by: Andy Lutomirski <l...@amacapital.net>
---
 arch/x86/Kconfig                 |  4
 arch/x86/include/asm/kvm_guest.h |  9 +
 arch/x86/kernel/archrandom.c     | 25 -
 arch/x86/kernel/kvm.c            | 10 ++
 4 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index d24887b..ad87278 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -594,6 +594,7 @@ config KVM_GUEST
 	bool "KVM Guest support (including kvmclock)"
 	depends on PARAVIRT
 	select PARAVIRT_CLOCK
+	select ARCH_RANDOM
 	default y
 	---help---
 	  This option enables various optimizations for running under the KVM
@@ -1508,6 +1509,9 @@ config ARCH_RANDOM
 	  If supported, this is a high bandwidth, cryptographically
 	  secure hardware random number generator.
 
+	  This also enables paravirt RNGs such as KVM's if the relevant
+	  PV guest support is enabled.
+
 config X86_SMAP
 	def_bool y
 	prompt "Supervisor Mode Access Prevention" if EXPERT
diff --git a/arch/x86/include/asm/kvm_guest.h b/arch/x86/include/asm/kvm_guest.h
index a92b176..8c4dbd5 100644
--- a/arch/x86/include/asm/kvm_guest.h
+++ b/arch/x86/include/asm/kvm_guest.h
@@ -3,4 +3,13 @@
 
 int kvm_setup_vsyscall_timeinfo(void);
 
+#if defined(CONFIG_KVM_GUEST) && defined(CONFIG_ARCH_RANDOM)
+extern bool kvm_get_rng_seed(u64 *rv);
+#else
+static inline bool kvm_get_rng_seed(u64 *rv)
+{
+	return false;
+}
+#endif
+
 #endif /* _ASM_X86_KVM_GUEST_H */
diff --git a/arch/x86/kernel/archrandom.c b/arch/x86/kernel/archrandom.c
index e8d2ffb..adbaa25 100644
--- a/arch/x86/kernel/archrandom.c
+++ b/arch/x86/kernel/archrandom.c
@@ -15,6 +15,7 @@
  */
 
 #include <asm/archrandom.h>
+#include <asm/kvm_guest.h>
 
 void arch_rng_init(void *ctx,
 		   void (*seed)(void *ctx, u32 data),
@@ -22,7 +23,7 @@ void arch_rng_init(void *ctx,
 		   const char *log_prefix)
 {
 	int i;
-	int rdseed_bits = 0, rdrand_bits = 0;
+	int rdseed_bits = 0, rdrand_bits = 0, kvm_bits = 0;
 	char buf[128] = "";
 	char *msgptr = buf;
 
@@ -42,10 +43,32 @@ void arch_rng_init(void *ctx,
 #endif
 	}
 
+	/*
+	 * Use KVM_GET_RNG_SEED regardless of whether the CPU RNG
+	 * worked, since it incorporates entropy unavailable to the CPU,
+	 * and we shouldn't trust the hardware RNG more than we need to.
+	 * We request enough bits for the entire internal RNG state,
+	 * because there's no good reason not to.
+	 */
+	for (i = 0; i < bits_per_source; i += 64) {
+		u64 rv;
+
+		if (kvm_get_rng_seed(&rv)) {
+			seed(ctx, (u32)rv);
+			seed(ctx, (u32)(rv >> 32));
+			kvm_bits += 8 * sizeof(rv);
+		} else {
+			break;	/* If it fails once, it will keep failing. */
+		}
+	}
+
 	if (rdseed_bits)
 		msgptr += sprintf(msgptr, ", %d bits from RDSEED", rdseed_bits);
 	if (rdrand_bits)
 		msgptr += sprintf(msgptr, ", %d bits from RDRAND", rdrand_bits);
+	if (kvm_bits)
+		msgptr += sprintf(msgptr, ", %d bits from KVM_GET_RNG_BITS",
+				  kvm_bits);
 	if (buf[0])
 		pr_info("%s with %s\n", log_prefix, buf + 2);
 }
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 3dd8e2c..bd8783a 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -416,6 +416,16 @@ void kvm_disable_steal_time(void)
 	wrmsr(MSR_KVM_STEAL_TIME, 0, 0);
 }
 
+bool kvm_get_rng_seed(u64 *v)
+{
+	/*
+	 * Allow migration from a hypervisor with the GET_RNG_SEED
+	 * feature to a hypervisor without it.
+	 */
+	return (kvm_para_has_feature(KVM_FEATURE_GET_RNG_SEED) &&
+		rdmsrl_safe(MSR_KVM_GET_RNG_SEED, v) == 0);
+}
+
 #ifdef CONFIG_SMP
 static void __init kvm_smp_prepare_boot_cpu(void)
 {
-- 
1.9.3
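
The loop in this patch requests 64 bits at a time, hands each value
to the 32-bit seed callback as two halves, and gives up at the first
failure.  A rough user-space model of that control flow follows.  It
is a sketch under stated assumptions: struct fake_host,
fake_get_rng_seed and collect_seed_words are hypothetical, and the
"host" hands out predictable placeholder values rather than random
data.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical source that succeeds a fixed number of times, then fails. */
struct fake_host {
	int remaining;
	uint64_t next;
};

static bool fake_get_rng_seed(struct fake_host *h, uint64_t *rv)
{
	if (h->remaining <= 0)
		return false;
	h->remaining--;
	*rv = h->next;
	h->next += 0x0101010101010101ull;	/* placeholder, NOT random */
	return true;
}

/*
 * Collect up to bits_per_source bits as 32-bit words.  out must hold
 * two words per 64-bit chunk, with bits_per_source rounded up to a
 * multiple of 64.  Returns the number of bits collected.
 */
static int collect_seed_words(struct fake_host *h, int bits_per_source,
			      uint32_t *out)
{
	int bits = 0;
	int i;

	for (i = 0; i < bits_per_source; i += 64) {
		uint64_t rv;

		if (!fake_get_rng_seed(h, &rv))
			break;		/* one failure ends the loop */
		out[bits / 32] = (uint32_t)rv;			/* low half */
		out[bits / 32 + 1] = (uint32_t)(rv >> 32);	/* high half */
		bits += 64;
	}
	return bits;
}
```

Breaking out instead of retrying matches the comment in the patch:
if the source fails once, further attempts are expected to fail too.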



[PATCH v6 5/7] x86,random: Add an x86 implementation of arch_rng_init

2014-08-13 Thread Andy Lutomirski
This does the same thing as the generic implementation, except
that it logs how many bits of each type it collected.  I want to
know whether the initial seeding is working and, if so, whether
the RNG is fast enough.

(I know that hpa assures me that the hardware RNG is more than
 fast enough, but I'd still like a direct way to verify this.)

Arguably, arch_get_random_seed could be removed now: I'm having some
trouble imagining a sensible non-architecture-specific use of it
that wouldn't be better served by arch_rng_init.

Signed-off-by: Andy Lutomirski <l...@amacapital.net>
---
 arch/x86/include/asm/archrandom.h |  6 +
 arch/x86/kernel/Makefile          |  2 ++
 arch/x86/kernel/archrandom.c      | 51 +++
 3 files changed, 59 insertions(+)
 create mode 100644 arch/x86/kernel/archrandom.c

diff --git a/arch/x86/include/asm/archrandom.h b/arch/x86/include/asm/archrandom.h
index 69f1366..5611c21 100644
--- a/arch/x86/include/asm/archrandom.h
+++ b/arch/x86/include/asm/archrandom.h
@@ -117,6 +117,12 @@ GET_SEED(arch_get_random_seed_int, unsigned int, RDSEED_INT, ASM_NOP4);
 #define arch_has_random()	static_cpu_has(X86_FEATURE_RDRAND)
 #define arch_has_random_seed()	static_cpu_has(X86_FEATURE_RDSEED)
 
+#define __HAVE_ARCH_RNG_INIT
+extern void arch_rng_init(void *ctx,
+			  void (*seed)(void *ctx, u32 data),
+			  int bits_per_source,
+			  const char *log_prefix);
+
 #else
 
 static inline int rdrand_long(unsigned long *v)
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 047f9ff..0718bae 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -92,6 +92,8 @@ obj-$(CONFIG_PARAVIRT)		+= paravirt.o paravirt_patch_$(BITS).o
 obj-$(CONFIG_PARAVIRT_SPINLOCKS)+= paravirt-spinlocks.o
 obj-$(CONFIG_PARAVIRT_CLOCK)   += pvclock.o
 
+obj-$(CONFIG_ARCH_RANDOM)  += archrandom.o
+
 obj-$(CONFIG_PCSPKR_PLATFORM)  += pcspeaker.o
 
 obj-$(CONFIG_X86_CHECK_BIOS_CORRUPTION) += check.o
diff --git a/arch/x86/kernel/archrandom.c b/arch/x86/kernel/archrandom.c
new file mode 100644
index 000..e8d2ffb
--- /dev/null
+++ b/arch/x86/kernel/archrandom.c
@@ -0,0 +1,51 @@
+/*
+ * This file is part of the Linux kernel.
+ *
+ * Copyright (c) 2014 Andy Lutomirski
+ * Authors: Andy Lutomirski <l...@amacapital.net>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+
+#include <asm/archrandom.h>
+
+void arch_rng_init(void *ctx,
+		   void (*seed)(void *ctx, u32 data),
+		   int bits_per_source,
+		   const char *log_prefix)
+{
+	int i;
+	int rdseed_bits = 0, rdrand_bits = 0;
+	char buf[128] = "";
+	char *msgptr = buf;
+
+	for (i = 0; i < bits_per_source; i += 8 * sizeof(long)) {
+		unsigned long rv;
+
+		if (arch_get_random_seed_long(&rv))
+			rdseed_bits += 8 * sizeof(rv);
+		else if (arch_get_random_long(&rv))
+			rdrand_bits += 8 * sizeof(rv);
+		else
+			continue;	/* Don't waste time mixing. */
+
+		seed(ctx, (u32)rv);
+#if BITS_PER_LONG > 32
+		seed(ctx, (u32)(rv >> 32));
+#endif
+	}
+
+	if (rdseed_bits)
+		msgptr += sprintf(msgptr, ", %d bits from RDSEED", rdseed_bits);
+	if (rdrand_bits)
+		msgptr += sprintf(msgptr, ", %d bits from RDRAND", rdrand_bits);
+	if (buf[0])
+		pr_info("%s with %s\n", log_prefix, buf + 2);
+}
-- 
1.9.3
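
The logging at the end of arch_rng_init uses a small trick: every
part is appended with a leading ", " and the final message starts at
buf + 2 so the first separator is dropped.  The following user-space
sketch shows that idea in isolation.  format_seed_log is a
hypothetical helper, and it uses snprintf rather than the kernel's
sprintf and pr_info.

```c
#include <stdio.h>
#include <string.h>

/*
 * Build "<prefix> with A, B" from optional parts.  Each part is
 * appended with a leading ", "; the first separator is skipped at
 * the end by printing from buf + 2.  Two short parts always fit in
 * buf, so the running offset never exceeds its size.
 */
static int format_seed_log(char *out, size_t outlen, const char *log_prefix,
			   int rdseed_bits, int rdrand_bits)
{
	char buf[128] = "";
	size_t used = 0;

	if (rdseed_bits)
		used += snprintf(buf + used, sizeof(buf) - used,
				 ", %d bits from RDSEED", rdseed_bits);
	if (rdrand_bits)
		used += snprintf(buf + used, sizeof(buf) - used,
				 ", %d bits from RDRAND", rdrand_bits);
	if (!buf[0]) {
		if (outlen)
			out[0] = '\0';
		return 0;	/* nothing collected, nothing to log */
	}
	return snprintf(out, outlen, "%s with %s", log_prefix, buf + 2);
}
```

With a prefix of "random: seeded input pool", 64 RDSEED bits and 4032
RDRAND bits, this would produce a line of the same form as the one
the patch logs.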



[PATCH v6 3/7] random: Reseed pools on resume

2014-08-13 Thread Andy Lutomirski
After a suspend/resume cycle, and especially after hibernating, we
should assume that the random pools might have leaked.  To minimize
the risk this poses, try to collect fresh architectural entropy on
resume.

Signed-off-by: Andy Lutomirski <l...@amacapital.net>
---
 drivers/char/random.c | 26 +++---
 1 file changed, 23 insertions(+), 3 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 8dc3e3a..0811ad4 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -257,6 +257,7 @@
 #include <linux/kmemcheck.h>
 #include <linux/workqueue.h>
 #include <linux/irq.h>
+#include <linux/syscore_ops.h>
 
 #include <asm/processor.h>
 #include <asm/uaccess.h>
@@ -1279,6 +1280,26 @@ static void init_std_data(struct entropy_store *r)
 	mix_pool_bytes(r, utsname(), sizeof(*(utsname())), NULL);
 }
 
+static void init_all_pools(void)
+{
+	init_std_data(&input_pool);
+	init_std_data(&blocking_pool);
+	init_std_data(&nonblocking_pool);
+}
+
+static void random_resume(void)
+{
+	/*
+	 * After resume (and especially after hibernation / kexec resume),
+	 * make a best-effort attempt to collect fresh entropy.
+	 */
+	init_all_pools();
+}
+
+static struct syscore_ops random_syscore_ops = {
+	.resume = random_resume,
+};
+
 /*
  * Note that setup_arch() may call add_device_randomness()
  * long before we get here. This allows seeding of the pools
@@ -1291,9 +1312,8 @@ static void init_std_data(struct entropy_store *r)
  */
 static int rand_initialize(void)
 {
-	init_std_data(&input_pool);
-	init_std_data(&blocking_pool);
-	init_std_data(&nonblocking_pool);
+	init_all_pools();
+	register_syscore_ops(&random_syscore_ops);
 	return 0;
 }
 early_initcall(rand_initialize);
-- 
1.9.3
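
For anyone unfamiliar with the registration pattern used here, the
idea is simply "register a table of callbacks once, have the resume
path walk the list".  Below is a simplified user-space model of that
pattern.  Everything prefixed with model_ is hypothetical; the real
struct syscore_ops has more members and the real resume hook reseeds
the pools instead of bumping a counter.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's struct syscore_ops. */
struct model_syscore_ops {
	void (*resume)(void);
	struct model_syscore_ops *next;
};

static struct model_syscore_ops *model_ops_list;

/* Add a set of callbacks to the list; meant to run once at init time. */
static void model_register_syscore_ops(struct model_syscore_ops *ops)
{
	ops->next = model_ops_list;
	model_ops_list = ops;
}

/* What a resume path would do: call every registered resume hook. */
static void model_syscore_resume(void)
{
	struct model_syscore_ops *ops;

	for (ops = model_ops_list; ops; ops = ops->next)
		if (ops->resume)
			ops->resume();
}

static int model_reseed_count;

/* Stand-in for random_resume(): here it only counts invocations. */
static void model_random_resume(void)
{
	model_reseed_count++;
}

static struct model_syscore_ops model_random_ops = {
	.resume = model_random_resume,
};
```

Registering model_random_ops once and then calling
model_syscore_resume() for each simulated resume runs the hook once
per resume, which is the behavior the patch wants for reseeding.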



[PATCH v6 4/7] x86,kvm: Add MSR_KVM_GET_RNG_SEED and a matching feature bit

2014-08-13 Thread Andy Lutomirski
This adds a simple interface to allow a guest to request 64 bits of
host nonblocking entropy.  This is independent of virtio-rng for a
couple of reasons:

 - It's intended to be usable during early boot, when a trivial
   synchronous interface is needed.

 - virtio-rng gives blocking entropy, and making guest boot wait for
   the host's /dev/random will cause problems.

MSR_KVM_GET_RNG_SEED is intended to provide 64 bits of best-effort
cryptographically secure data for use as a seed.  It provides no
guarantee that the result contains any actual entropy.

Acked-by: Paolo Bonzini <pbonz...@redhat.com>
Signed-off-by: Andy Lutomirski <l...@amacapital.net>
---
 Documentation/virtual/kvm/cpuid.txt  | 3 +++
 arch/x86/include/uapi/asm/kvm_para.h | 2 ++
 arch/x86/kvm/cpuid.c                 | 3 ++-
 arch/x86/kvm/x86.c                   | 4
 4 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/Documentation/virtual/kvm/cpuid.txt b/Documentation/virtual/kvm/cpuid.txt
index 3c65feb..0ab043b 100644
--- a/Documentation/virtual/kvm/cpuid.txt
+++ b/Documentation/virtual/kvm/cpuid.txt
@@ -54,6 +54,9 @@ KVM_FEATURE_PV_UNHALT              ||     7 || guest checks this feature bit
                                    ||       || before enabling paravirtualized
                                    ||       || spinlock support.
 ------------------------------------------------------------------------------
+KVM_FEATURE_GET_RNG_SEED           ||     8 || host provides rng seed data via
+                                   ||       || MSR_KVM_GET_RNG_SEED.
+------------------------------------------------------------------------------
 KVM_FEATURE_CLOCKSOURCE_STABLE_BIT ||    24 || host will warn if no guest-side
                                    ||       || per-cpu warps are expected in
                                    ||       || kvmclock.
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 94dc8ca..e2eaf93 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -24,6 +24,7 @@
 #define KVM_FEATURE_STEAL_TIME 5
 #define KVM_FEATURE_PV_EOI 6
 #define KVM_FEATURE_PV_UNHALT  7
+#define KVM_FEATURE_GET_RNG_SEED   8
 
 /* The last 8 bits are used to indicate how to interpret the flags field
  * in pvclock structure. If no bits are set, all flags are ignored.
@@ -40,6 +41,7 @@
 #define MSR_KVM_ASYNC_PF_EN 0x4b564d02
 #define MSR_KVM_STEAL_TIME  0x4b564d03
 #define MSR_KVM_PV_EOI_EN  0x4b564d04
+#define MSR_KVM_GET_RNG_SEED 0x4b564d05
 
 struct kvm_steal_time {
__u64 steal;
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 38a0afe..40d6763 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -479,7 +479,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 			     (1 << KVM_FEATURE_ASYNC_PF) |
 			     (1 << KVM_FEATURE_PV_EOI) |
 			     (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) |
-			     (1 << KVM_FEATURE_PV_UNHALT);
+			     (1 << KVM_FEATURE_PV_UNHALT) |
+			     (1 << KVM_FEATURE_GET_RNG_SEED);
 
 		if (sched_info_on())
 			entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ef432f8..695b682 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -48,6 +48,7 @@
 #include <linux/pci.h>
 #include <linux/timekeeper_internal.h>
 #include <linux/pvclock_gtod.h>
+#include <linux/random.h>
 #include <trace/events/kvm.h>
 
 #define CREATE_TRACE_POINTS
@@ -2480,6 +2481,9 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
 	case MSR_KVM_PV_EOI_EN:
 		data = vcpu->arch.pv_eoi.msr_val;
 		break;
+	case MSR_KVM_GET_RNG_SEED:
+		get_random_bytes(&data, sizeof(data));
+		break;
 	case MSR_IA32_P5_MC_ADDR:
 	case MSR_IA32_P5_MC_TYPE:
 	case MSR_IA32_MCG_CAP:
-- 
1.9.3
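
The host side of this interface is tiny: a read of the new MSR index
returns 64 freshly generated bits, and no state is kept between
reads.  A user-space model of that dispatch is sketched below.  The
MSR index is the one from the kvm_para.h hunk; model_get_msr, fill_fn
and demo_fill are hypothetical names, and demo_fill is a
deterministic placeholder for the host's random byte source.

```c
#include <stdint.h>
#include <stddef.h>

/* MSR index taken from the kvm_para.h hunk in this patch. */
#define MSR_KVM_GET_RNG_SEED 0x4b564d05u

/* Hypothetical byte source standing in for the host's get_random_bytes(). */
typedef void (*fill_fn)(void *buf, size_t len);

/*
 * Model of an MSR read handler: returns 0 and stores a value for a
 * known index, or 1 for an index this model does not handle.
 */
static int model_get_msr(uint32_t msr, uint64_t *pdata, fill_fn fill)
{
	uint64_t data;

	switch (msr) {
	case MSR_KVM_GET_RNG_SEED:
		fill(&data, sizeof(data));	/* fresh 64 bits per read */
		break;
	default:
		return 1;
	}
	*pdata = data;
	return 0;
}

/* Deterministic filler for demonstration only; NOT random. */
static void demo_fill(void *buf, size_t len)
{
	unsigned char *p = buf;
	size_t i;

	for (i = 0; i < len; i++)
		p[i] = (unsigned char)(0xA0 + i);
}
```

Because each read is independent, a guest can call this as early in
boot as it can execute an MSR read, which is the property the commit
message asks for.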



[PATCH v6 2/7] random, timekeeping: Collect timekeeping entropy in the timekeeping code

2014-08-13 Thread Andy Lutomirski
Currently, init_std_data calls ktime_get_real().  This imposes
awkward constraints on when init_std_data can be called, and
init_std_data is unlikely to collect the full unpredictable data
available to the timekeeping code, especially after resume.

Remove this code from random.c and add the appropriate
add_device_randomness calls to timekeeping.c instead.

Cc: John Stultz <john.stu...@linaro.org>
Signed-off-by: Andy Lutomirski <l...@amacapital.net>
---
 drivers/char/random.c |  2 --
 kernel/time/timekeeping.c | 11 +++
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 7673e60..8dc3e3a 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1263,12 +1263,10 @@ static void seed_entropy_store(void *ctx, u32 data)
 static void init_std_data(struct entropy_store *r)
 {
 	int i;
-	ktime_t now = ktime_get_real();
 	unsigned long rv;
 	char log_prefix[128];
 
 	r->last_pulled = jiffies;
-	mix_pool_bytes(r, &now, sizeof(now), NULL);
 	for (i = r->poolinfo->poolbytes; i > 0; i -= sizeof(rv)) {
 		rv = random_get_entropy();
 		mix_pool_bytes(r, &rv, sizeof(rv), NULL);
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 32d8d6a..9609db9 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -23,6 +23,7 @@
 #include <linux/stop_machine.h>
 #include <linux/pvclock_gtod.h>
 #include <linux/compiler.h>
+#include <linux/random.h>
 
 #include "tick-internal.h"
 #include "ntp_internal.h"
@@ -835,6 +836,9 @@ void __init timekeeping_init(void)
 	memcpy(&shadow_timekeeper, &timekeeper, sizeof(timekeeper));
 
 	write_seqcount_end(&timekeeper_seq);
+
+	add_device_randomness(tk, sizeof(tk));
+
 	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
 }
 
@@ -976,6 +980,13 @@ static void timekeeping_resume(void)
 	timekeeping_suspended = 0;
 	timekeeping_update(tk, TK_MIRROR | TK_CLOCK_WAS_SET);
 	write_seqcount_end(&timekeeper_seq);
+
+	/*
+	 * The timekeeping state has a decent chance of differing
+	 * between resumptions of the same image.
+	 */
+	add_device_randomness(tk, sizeof(tk));
+
 	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
 
 	touch_softlockup_watchdog();
-- 
1.9.3



[PATCH v6 1/7] random: Add and use arch_rng_init

2014-08-13 Thread Andy Lutomirski
Currently, init_std_data contains its own logic for using arch
random sources.  This replaces that logic with a generic function
arch_rng_init that allows arch code to supply its own logic.  The
default implementation tries arch_get_random_seed_long and
arch_get_random_long individually.

The only functional change here is that random_get_entropy() is used
unconditionally instead of being used only when the arch sources
fail.  This may add a tiny amount of security.

Acked-by: Theodore Ts'o ty...@mit.edu
Signed-off-by: Andy Lutomirski l...@amacapital.net
---
 drivers/char/random.c  | 14 +++++++++++---
 include/linux/random.h | 40 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 51 insertions(+), 3 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 71529e1..7673e60 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1246,6 +1246,10 @@ void get_random_bytes_arch(void *buf, int nbytes)
 }
 EXPORT_SYMBOL(get_random_bytes_arch);
 
+static void seed_entropy_store(void *ctx, u32 data)
+{
+	mix_pool_bytes((struct entropy_store *)ctx, &data, sizeof(data), NULL);
+}
 
 /*
  * init_std_data - initialize pool with system data
@@ -1261,15 +1265,19 @@ static void init_std_data(struct entropy_store *r)
 	int i;
 	ktime_t now = ktime_get_real();
 	unsigned long rv;
+	char log_prefix[128];
 
 	r->last_pulled = jiffies;
 	mix_pool_bytes(r, &now, sizeof(now), NULL);
 	for (i = r->poolinfo->poolbytes; i > 0; i -= sizeof(rv)) {
-		if (!arch_get_random_seed_long(&rv) &&
-		    !arch_get_random_long(&rv))
-			rv = random_get_entropy();
+		rv = random_get_entropy();
 		mix_pool_bytes(r, &rv, sizeof(rv), NULL);
 	}
+
+	sprintf(log_prefix, "random: seeded %s pool", r->name);
+	arch_rng_init(r, seed_entropy_store, 8 * r->poolinfo->poolbytes,
+		      log_prefix);
+
 	mix_pool_bytes(r, utsname(), sizeof(*(utsname())), NULL);
 }
 
diff --git a/include/linux/random.h b/include/linux/random.h
index 57fbbff..c8d692e 100644
--- a/include/linux/random.h
+++ b/include/linux/random.h
@@ -106,6 +106,46 @@ static inline int arch_has_random_seed(void)
 }
 #endif
 
+#ifndef __HAVE_ARCH_RNG_INIT
+
+/**
+ * arch_rng_init() - get architectural rng seed data
+ * @ctx: context for the seed function
+ * @seed: function to call for each u32 obtained
+ * @bits_per_source: number of bits from each source to try to use
+ * @log_prefix: beginning of log output (may be NULL)
+ *
+ * Synchronously load some architectural entropy or other best-effort
+ * random seed data.  An arch-specific implementation should be no worse
+ * than this generic implementation.  If the arch code does something
+ * interesting, it may log something of the form log_prefix with
+ * 8 bits of stuff.
+ *
+ * No arch-specific implementation should be any worse than the generic
+ * implementation.
+ */
+static inline void arch_rng_init(void *ctx,
+				 void (*seed)(void *ctx, u32 data),
+				 int bits_per_source,
+				 const char *log_prefix)
+{
+	int i;
+
+	for (i = 0; i < bits_per_source; i += 8 * sizeof(long)) {
+		unsigned long rv;
+
+		if (arch_get_random_seed_long(&rv) ||
+		    arch_get_random_long(&rv)) {
+			seed(ctx, (u32)rv);
+#if BITS_PER_LONG > 32
+			seed(ctx, (u32)(rv >> 32));
+#endif
+		}
+	}
+}
+
+#endif /* __HAVE_ARCH_RNG_INIT */
+
 /* Pseudo random number generator from numerical recipes. */
 static inline u32 next_pseudo_random32(u32 seed)
 {
-- 
1.9.3
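
For readers who want to poke at the callback shape outside the kernel, here is a minimal user-space sketch of the pattern the patch describes: a loop asks a source for word-sized values and hands each one to a seed callback as 32-bit halves. The names struct seed_log, log_seed, fake_arch_get_random_long and model_rng_init are all hypothetical stand-ins, the source is a fixed constant rather than a real RNG, and the word size is assumed to be 64 bits. It is not the kernel implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical collector standing in for an entropy pool. */
struct seed_log {
	uint32_t words[8];
	size_t count;
};

/* Callback with the same shape as the patch's seed function. */
static void log_seed(void *ctx, uint32_t data)
{
	struct seed_log *log = ctx;

	assert(log != NULL);
	if (log->count < 8)
		log->words[log->count++] = data;
}

/* Hypothetical stand-in for an arch source; it always succeeds here. */
static int fake_arch_get_random_long(uint64_t *v)
{
	*v = 0x1122334455667788ULL;
	return 1;
}

/* Request bits_per_source bits, 64 at a time, delivered as u32 halves. */
static void model_rng_init(void *ctx, void (*seed)(void *, uint32_t),
			   int bits_per_source)
{
	int i;

	for (i = 0; i < bits_per_source; i += 64) {
		uint64_t rv;

		if (fake_arch_get_random_long(&rv)) {
			seed(ctx, (uint32_t)rv);		/* low half */
			seed(ctx, (uint32_t)(rv >> 32));	/* high half */
		}
	}
}
```

Calling model_rng_init(&log, log_seed, 128) on a zeroed struct seed_log should leave four words in the log, low half first for each 64-bit value.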



Re: [PATCH v6 2/7] random, timekeeping: Collect timekeeping entropy in the timekeeping code

2014-08-14 Thread Andy Lutomirski
On Wed, Aug 13, 2014 at 10:43 PM, Andy Lutomirski l...@amacapital.net wrote:
 Currently, init_std_data calls ktime_get_real().  This imposes
 awkward constraints on when init_std_data can be called, and
 init_std_data is unlikely to collect the full unpredictable data
 available to the timekeeping code, especially after resume.

 Remove this code from random.c and add the appropriate
 add_device_randomness calls to timekeeping.c instead.

*sigh* this is buggy:


 +   add_device_randomness(tk, sizeof(tk));

sizeof(*tk)

 +   add_device_randomness(tk, sizeof(tk));

ditto.

I'll fix this for v7, but I'll wait awhile for other comments to reduce spam.

--Andy
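
The bug called out in this message is the usual pointer-versus-object sizeof mix-up. A minimal sketch, using a made-up struct fake_timekeeper in place of the kernel's struct timekeeper (whose real layout is not reproduced here), shows how many bytes each spelling would pass along:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct timekeeper. */
struct fake_timekeeper {
	unsigned long long cycle_last;
	unsigned long long xtime_sec;
	unsigned int mult;
	unsigned int shift;
};

/* Byte count that sizeof(tk) yields: the size of the pointer itself. */
static size_t bytes_mixed_buggy(const struct fake_timekeeper *tk)
{
	assert(tk != NULL);
	return sizeof(tk);
}

/* Byte count that sizeof(*tk) yields: the size of the whole structure. */
static size_t bytes_mixed_fixed(const struct fake_timekeeper *tk)
{
	assert(tk != NULL);
	return sizeof(*tk);
}
```

With sizeof(tk), only a pointer's worth of bytes starting at tk would be mixed in, so most of the timekeeping state would never reach the pool.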


GET_RNG_SEED hypercall ABI? (Re: [PATCH v5 0/5] random,x86,kvm: Rework arch RNG seeds and get some from kvm)

2014-08-26 Thread Andy Lutomirski
hpa pointed out that the ABI that I chose (an MSR from the KVM range
and a KVM cpuid bit) is unnecessarily KVM-specific.  It would be nice
to allocate an MSR that everyone involved can agree on and, rather
than relying on a cpuid bit, just have the guest probe for the MSR.

This leads to a few questions:

1. How do we allocate an MSR?  (For background, this would be an MSR
that either returns 64 bits of best-effort cryptographically secure
random data or fails with #GP.)

2. For KVM, what's the right way to allow QEMU to turn the feature on
and off?  Is this even necessary?  KVM currently doesn't seem to allow
QEMU to turn any of its MSRs off; it just allows QEMU to ask it to
stop advertising support.

3. QEMU people, can you please fix your RDMSR emulation to send #GP on
failure?  I can work around it for this MSR in the Linux code, but for
Pete's sake... :(

Thanks,
Andy


Re: GET_RNG_SEED hypercall ABI? (Re: [PATCH v5 0/5] random,x86,kvm: Rework arch RNG seeds and get some from kvm)

2014-08-28 Thread Andy Lutomirski
On Aug 28, 2014 7:17 AM, Gleb Natapov g...@kernel.org wrote:

 On Tue, Aug 26, 2014 at 04:58:34PM -0700, Andy Lutomirski wrote:
  hpa pointed out that the ABI that I chose (an MSR from the KVM range
  and a KVM cpuid bit) is unnecessarily KVM-specific.  It would be nice
  to allocate an MSR that everyone involved can agree on and, rather
  than relying on a cpuid bit, just have the guest probe for the MSR.
 
 CPUID part allows feature to be disabled for machine compatibility purpose
 during migration. Of course interface can explicitly state that one successful
 use of the MSR does not mean that next use will not result in a #GP, but that
 doesn't sound very elegant and is different from any other MSR out there.


Is there a non-cpuid interface between QEMU and KVM for this?

AFAICT, even turning off cpuid bits for things like async pf doesn't
actually disable the MSRs (which is arguably an attack surface issue).

--Andy

 --
 Gleb.


Re: GET_RNG_SEED hypercall ABI? (Re: [PATCH v5 0/5] random,x86,kvm: Rework arch RNG seeds and get some from kvm)

2014-08-28 Thread Andy Lutomirski
On Thu, Aug 28, 2014 at 12:46 PM, Paolo Bonzini pbonz...@redhat.com wrote:
 Il 28/08/2014 18:22, Andy Lutomirski ha scritto:
 Is there a non-cpuid interface between QEMU and KVM for this?

 No.

Hmm.  Then, assuming that someone manages to allocate a
cross-hypervisor MSR number for this, what am I supposed to do in the
KVM code?  Just make it available unconditionally?  I don't see why
that wouldn't work reliably, but it seems like an odd design.


 AFAICT, even turning off cpuid bits for things like async pf doesn't
 actually disable the MSRs (which is arguably an attack surface issue).

 No, it doesn't.  You cannot disable instructions even if you hide CPUID
 bits, so KVM just extends this to MSRs (both native and paravirtual). It
 sometimes helps too, for example with a particular guest OS that does
 not necessarily check CPUID for bits that are always present on Apple
 hardware...

But I bet that no one assumes that KVM paravirt MSRs are available
even if the feature bit isn't set.

Also, the one and only native feature flag I tested (rdtscp) actually
does work: RDTSCP seems to send #UD if QEMU is passed -cpu
host,-rdtscp.

--Andy


Standardizing an MSR or other hypercall to get an RNG seed?

2014-09-17 Thread Andy Lutomirski
Hi all-

I would like to standardize on a very simple protocol by which a guest
OS can obtain an RNG seed early in boot.

The main design requirements are:

 - The interface should be very easy to use.  Linux, at least, will
want to use it extremely early in boot as part of kernel ASLR.  This
means that PCI and ACPI will not work.

 - It should be synchronous.  We don't want to delay boot while
waiting for a slow host RNG.  (On Linux, at least, we have a separate
interface for that: virtio-rng.  I think that Windows has some support
for virtio-rng as well.)

 - Random numbers obtained through this interface should be
best-effort.  We want the best quality randomness that the host can
provide immediately.

It seems to me that the best interface for the actual request for a
random number is rdmsr.  This is supported on all hypervisors and all
virtualization technologies.  It can return a 64 bit random number,
and it is easy to rdmsr the same register more than once to get a
larger random number.

The main questions are what MSR index to use and how to detect the
presence of the MSR.  I've played with two approaches:

1. Use CPUID to detect the presence of this feature.  This is very
easy for KVM to implement by using a KVM-specific CPUID feature.  The
problem is that this will necessarily be KVM-specific, as the guest
must first probe for KVM and then probe for the KVM feature.  I doubt
that Hyper-V, for example, wants to claim to be KVM.  If we could
standardize a non-hypervisor-specific CPUID feature, then this problem
would go away.

2. Detect the existence of the MSR by trying to read it and handling
the #GP(0) that will occur if the MSR is not present.  Linux, at
least, is okay with doing this, and I have code to enable an IDT and
an rdmsr fixup early enough in boot to use it for ASLR.  I don't know
whether other operating systems can do this, though.

The major questions, then, are what enumeration mechanism should be
used and what MSR index should be used.

For the MSR index, we could use an MSR from the Intel range if Intel
were to give explicit approval, thus guaranteeing that nothing would
conflict.  Or we could try to agree on an MSR index in the
0x40000000-0x4fffffff range that is unlikely to conflict with
anything.

For enumeration, we could just probe the MSR if all relevant guests
are okay with this or we could standardize on a CPUID-based mechanism.
If we do the latter, I don't know what that mechanism would be.
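
To make the "rdmsr the same register more than once" idea concrete, here is a rough sketch of a guest-side loop that fills a seed buffer 64 bits at a time and gives up if a read fails. Everything in it is an assumption for illustration: the rdmsr_fn callback type, the fill_seed helper, and the two fake readers stand in for a real rdmsr with an exception fixup, and no MSR index has actually been agreed on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical MSR reader: returns 0 and stores 64 bits on success, or
 * nonzero where a real guest would have taken #GP.
 */
typedef int (*rdmsr_fn)(uint32_t msr, uint64_t *val);

/* Fill buf with len bytes by reading the same MSR repeatedly. */
static int fill_seed(rdmsr_fn rdmsr, uint32_t msr, uint8_t *buf, size_t len)
{
	size_t off = 0;

	assert(rdmsr != NULL && buf != NULL);
	while (off < len) {
		uint64_t v;
		size_t n = len - off < sizeof(v) ? len - off : sizeof(v);

		if (rdmsr(msr, &v))
			return -1;	/* MSR absent, or host declined */
		memcpy(buf + off, &v, n);
		off += n;
	}
	return 0;
}

/* Test doubles; the values are fixed patterns, not random data. */
static unsigned int fake_reads;

static int fake_rdmsr_ok(uint32_t msr, uint64_t *val)
{
	(void)msr;
	*val = 0x0101010101010101ULL * ++fake_reads;
	return 0;
}

static int fake_rdmsr_gp(uint32_t msr, uint64_t *val)
{
	(void)msr;
	(void)val;
	return -1;
}
```

A 20-byte request against fake_rdmsr_ok takes three reads (8 + 8 + 4 bytes), which is the cost trade-off of a 64-bit-per-read interface.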

NB: This thread will be cc'd to Microsoft and possibly Hyper-V people
shortly.  I very much appreciate Jun Nakajima's help with this!

Thanks,
Andy

-- 
Andy Lutomirski
AMA Capital Management, LLC


Re: Standardizing an MSR or other hypercall to get an RNG seed?

2014-09-18 Thread Andy Lutomirski
On Thu, Sep 18, 2014 at 7:43 AM, H. Peter Anvin h...@zytor.com wrote:
 On 09/18/2014 07:40 AM, KY Srinivasan wrote:

 The main questions are what MSR index to use and how to detect the
 presence of the MSR.  I've played with two approaches:

 1. Use CPUID to detect the presence of this feature.  This is very easy for
 KVM to implement by using a KVM-specific CPUID feature.  The problem is
 that this will necessarily be KVM-specific, as the guest must first probe 
 for
 KVM and then probe for the KVM feature.  I doubt that Hyper-V, for
 example, wants to claim to be KVM.  If we could standardize a non-
 hypervisor-specific CPUID feature, then this problem would go away.

 We would prefer a CPUID feature bit to detect this feature.


 I guess if we're introducing the concept of pan-OS MSRs we could also
 have pan-OS CPUID.  The real issue is to get a single non-conflicting
 standard.

Agreed.

KVM currently puts 0 in 0x40000000.EAX, meaning that a feature bit in
Microsoft's leaf 0x40000003 would probably not work well for KVM.  I
don't expect that Microsoft wants to start claiming to be KVM for the
purpose of using a KVM-style feature bit, so, if we went the CPUID
route, we would probably need something new.

--Andy


 -hpa





-- 
Andy Lutomirski
AMA Capital Management, LLC


Re: Standardizing an MSR or other hypercall to get an RNG seed?

2014-09-18 Thread Andy Lutomirski
On Thu, Sep 18, 2014 at 8:38 AM, Andy Lutomirski l...@amacapital.net wrote:
 On Thu, Sep 18, 2014 at 7:43 AM, H. Peter Anvin h...@zytor.com wrote:
 On 09/18/2014 07:40 AM, KY Srinivasan wrote:

 The main questions are what MSR index to use and how to detect the
 presence of the MSR.  I've played with two approaches:

 1. Use CPUID to detect the presence of this feature.  This is very easy for
 KVM to implement by using a KVM-specific CPUID feature.  The problem is
 that this will necessarily be KVM-specific, as the guest must first probe 
 for
 KVM and then probe for the KVM feature.  I doubt that Hyper-V, for
 example, wants to claim to be KVM.  If we could standardize a non-
 hypervisor-specific CPUID feature, then this problem would go away.

 We would prefer a CPUID feature bit to detect this feature.


 I guess if we're introducing the concept of pan-OS MSRs we could also
 have pan-OS CPUID.  The real issue is to get a single non-conflicting
 standard.

 Agreed.

 KVM currently puts 0 in 0x40000000.EAX, meaning that a feature bit in
 Microsoft's leaf 0x40000003 would probably not work well for KVM.  I
 don't expect that Microsoft wants to start claiming to be KVM for the
 purpose of using a KVM-style feature bit, so, if we went the CPUID
 route, we would probably need something new.

Slight correction: QEMU/KVM has optional support for Hyper-V feature
enumeration.  Ideally the RNG seed mechanism would be enabled by
default, but I don't know whether the QEMU maintainers would be okay
with enabling the Hyper-V cpuid mechanism in a default configuration.

--Andy


 --Andy


 -hpa





 --
 Andy Lutomirski
 AMA Capital Management, LLC



-- 
Andy Lutomirski
AMA Capital Management, LLC


Re: Standardizing an MSR or other hypercall to get an RNG seed?

2014-09-18 Thread Andy Lutomirski
On Thu, Sep 18, 2014 at 10:42 AM, Nakajima, Jun jun.nakaj...@intel.com wrote:
 On Thu, Sep 18, 2014 at 10:20 AM, KY Srinivasan k...@microsoft.com wrote:


 -Original Message-
 From: Paolo Bonzini [mailto:paolo.bonz...@gmail.com] On Behalf Of Paolo
 Bonzini
 Sent: Thursday, September 18, 2014 10:18 AM
 To: Nakajima, Jun; KY Srinivasan
 Cc: Mathew John; Theodore Ts'o; John Starks; kvm list; Gleb Natapov; Niels
 Ferguson; Andy Lutomirski; David Hepkin; H. Peter Anvin; Jake Oshins; Linux
 Virtualization
 Subject: Re: Standardizing an MSR or other hypercall to get an RNG seed?

 Il 18/09/2014 19:13, Nakajima, Jun ha scritto:
  In terms of the address for the MSR, I suggest that you choose one
  from the range between 40000000H - 400000FFH. The SDM (35.1
  ARCHITECTURAL MSRS) says "All existing and future processors will not
  implement any features using any MSR in this range." Hyper-V already
  defines many synthetic MSRs in this range, and I think it would be
  reasonable for you to pick one for this to avoid a conflict?

 KVM is not using any MSR in that range.

 However, I think it would be better to have the MSR (and perhaps CPUID)
 outside the hypervisor-reserved ranges, so that it becomes architecturally
 defined.  In some sense it is similar to the HYPERVISOR CPUID feature.

 Yes, given that we want this to be hypervisor agnostic.


 Actually, that MSR address range has been reserved for that purpose, along 
 with:
 - CPUID.EAX=1 -> ECX bit 31 (always returns 0 on bare metal)
 - CPUID.EAX=4000_00xxH leaves (i.e. HYPERVISOR CPUID)

I don't know whether this is documented anywhere, but Linux tries to
detect a hypervisor by searching CPUID leaves 0x400xyz00 for
"KVMKVMKVM\0\0\0", so at least Linux can handle the KVM leaves being
in a somewhat variable location.

Do we consider this mechanism to work across all hypervisors and
guests?  That is, could we put something like "CrossHVPara\0"
somewhere in that range, where each hypervisor would be free to decide
exactly where it ends up?
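
A signature scan of this kind can be sketched as follows. This is an illustration only: the cpuid_fn callback and fake_cpuid test double are hypothetical so the loop can run without a hypervisor, and the scan bounds (0x40000000 up to but not including 0x40010000, in steps of 0x100) are an assumption about what a guest might choose rather than a documented requirement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical CPUID accessor; regs holds eax, ebx, ecx, edx in order. */
typedef void (*cpuid_fn)(uint32_t leaf, uint32_t regs[4]);

/*
 * Return the base leaf whose EBX:ECX:EDX holds the 12-byte signature,
 * or 0 if no candidate leaf matches.
 */
static uint32_t find_hv_base(cpuid_fn cpuid, const char sig[12])
{
	uint32_t base;

	assert(cpuid != NULL && sig != NULL);
	for (base = 0x40000000u; base < 0x40010000u; base += 0x100) {
		uint32_t regs[4];

		cpuid(base, regs);
		if (memcmp(&regs[1], sig, 12) == 0)
			return base;
	}
	return 0;
}

/* Test double: pretends the KVM leaves live at 0x40000100. */
static void fake_cpuid(uint32_t leaf, uint32_t regs[4])
{
	memset(regs, 0, 4 * sizeof(regs[0]));
	if (leaf == 0x40000100u) {
		regs[0] = 0x40000101u;	/* maximum leaf in this block */
		memcpy(&regs[1], "KVMKVMKVM\0\0\0", 12);
	}
}
```

With these bounds the worst case is 256 CPUID executions, each of which traps to the hypervisor.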

--Andy


Re: Standardizing an MSR or other hypercall to get an RNG seed?

2014-09-18 Thread Andy Lutomirski
On Thu, Sep 18, 2014 at 11:54 AM, Niels Ferguson ni...@microsoft.com wrote:
 Defining a standard way of transferring random numbers between the host and 
 the guest is an excellent idea.

 As the person who writes the RNG code in Windows, I have a few comments:

 DETECTION:
 It should be possible to detect this feature through CPUID or similar 
 mechanism. That allows the code that uses this feature to be written without 
 needing the ability to catch CPU exceptions. I could be wrong, but as far as 
 I know there is no support for exception handling in the Windows OS loader 
 where we gather our initial random state.


Linux is like this, too, except that I have experimental code to
create an IDT in that code, so we can handle it.  I agree, though,
that using CPUID in early boot is easier.

 EFFICIENCY:
 Is there a way we can transfer more bytes per interaction? With a single 
 64-bit MSR we always need multiple reads to get a seed, and each of them 
 results in a context switch to the host, which is expensive. This is even 
 worse for 32-bit guests. Windows would typically need to fetch 64 bytes of 
 random data at boot and at regular intervals. It is not a show-stopper, but 
 better efficiency would be nice.

I thought about this for a while and didn't come up with anything that
wouldn't be messy.  We could fudge the MSR rax/rdx high bits to get 128
bits, but that's nonportable and awful to implement.  We could return
a random number directly from CPUID, but that's weird.

In very informal benchmarking, rdmsr wasn't that bad.  On the other
hand, I wasn't immediately planning on using the msr on an ongoing
basis on Linux guests except after suspend/resume.


 GUEST-TO-HOST:
 Can we also define a way to have random values flow from the guest to the 
 host? Guests are also gathering entropy from their own sources, and if we 
 allow the guests to send random data to the host, then the host can treat it 
 as an entropy source and all the VMs on a single host can share their 
 entropy. (This is not a security problem; any reasonable host RNG cannot be 
 hurt even by maliciously chosen entropy inputs.)


wrmsr on the same MSR?


 I don't know much about how hypervisors work on the inside, but maybe we can 
 define a mechanism for standardized hypervisor calls that work on all 
 hypervisors that support this feature. Then we could define a function to do 
 an entropy exchange: the guest provides N bytes of random data to the host, 
 and the host replies with N bytes of random data. The data exchange can now 
 be done through memory.

 A standardized hypervisor-call mechanism also seems generally useful for 
 future features, whereas the MSR solution is very limited in what it can do. 
 We might end up with standardized hypervisor-calls in the future for some 
 other reason, and then the MSR solution looks very odd.

I think there'll be resistance to a standardized hypercall mechanism,
just because the implementations tend to be complex.  Hyper-V uses a
special page in guest physical memory that contains a trampoline.

We could use wrmsr to a register where the payload is a pointer to a
buffer to receive random bytes, but that loses some of the simplicity
of just calling rdmsr a few times.

--Andy


Re: Standardizing an MSR or other hypercall to get an RNG seed?

2014-09-18 Thread Andy Lutomirski
On Thu, Sep 18, 2014 at 11:58 AM, Paolo Bonzini pbonz...@redhat.com wrote:

  Actually, that MSR address range has been reserved for that purpose, along
  with:
  - CPUID.EAX=1 -> ECX bit 31 (always returns 0 on bare metal)
  - CPUID.EAX=4000_00xxH leaves (i.e. HYPERVISOR CPUID)

 I don't know whether this is documented anywhere, but Linux tries to
 detect a hypervisor by searching CPUID leaves 0x400xyz00 for
 "KVMKVMKVM\0\0\0", so at least Linux can handle the KVM leaves being
 in a somewhat variable location.

 Do we consider this mechanism to work across all hypervisors and
 guests?  That is, could we put something like "CrossHVPara\0"
 somewhere in that range, where each hypervisor would be free to decide
 exactly where it ends up?

 That's also possible, but extending the hypervisor CPUID range
 beyond 400000FFH is not officially sanctioned by Intel.

 Xen started doing that in order to expose both Hyper-V and Xen
 CPUID leaves, and KVM followed the practice.


Whoops.

Might Intel be willing to extend that range to 0x40000000 -
0x400fffff?  And would Microsoft be okay with using this mechanism for
discovery?

Do we have anyone from VMware in this thread?  I don't have any VMware contacts.

--Andy


Re: Standardizing an MSR or other hypercall to get an RNG seed?

2014-09-18 Thread Andy Lutomirski
On Thu, Sep 18, 2014 at 2:21 PM, Nakajima, Jun jun.nakaj...@intel.com wrote:
 On Thu, Sep 18, 2014 at 12:07 PM, Andy Lutomirski l...@amacapital.net wrote:

 Might Intel be willing to extend that range to 0x40000000 -
 0x400fffff?  And would Microsoft be okay with using this mechanism for
 discovery?

 So, for CPUID, the SDM (Table 3-17. Information Returned by CPUID) says today:
 "No existing or future CPU will return processor identification or
 feature information if the initial EAX value is in the range 40000000H
 to 4FFFFFFFH."

 We can define a cross-VM CPUID range from there. The CPUID can return
 the index of the MSR if needed.

Right, sorry.  I was looking at this sentence in SDM Volume 3 Section 35.1:

"MSR address range between 40000000H - 400000FFH is marked as a
specially reserved range. All existing and
future processors will not implement any features using any MSR in this range."

That's not really a large enough range for us to reserve an MSR for
this.  However, KVM is already using MSRs outside that range: it uses
0x4b564d00-0x4b564d04 or so.  I wonder whether KVM got confused by the
differing ranges for cpuid leaves and MSR indices.

Any chance that Intel could reserve a larger range to include the KVM
MSRs?  It would also be easier if the MSR indices for cross-HV
features were constants.

Thanks,
Andy


Re: Standardizing an MSR or other hypercall to get an RNG seed?

2014-09-18 Thread Andy Lutomirski
On Thu, Sep 18, 2014 at 2:46 PM, David Hepkin david...@microsoft.com wrote:
 I'm not sure what you mean by "this mechanism?"  Are you suggesting that each 
 hypervisor put "CrossHVPara\0" somewhere in the 0x40000000 - 0x400fffff CPUID 
 range, and an OS has to do a full scan of this CPUID range on boot to find 
 it?  That seems pretty inefficient.  An OS will take 1000's of hypervisor 
 intercepts on every boot just to search this CPUID range.

Linux already does this, which is arguably unfortunate.  But it's not
quite that bad; the KVM and Xen code is only scanning at increments of
0x100.

I think that Linux as a guest would have no problem with checking the
Hyper-V range or some new range.  I don't think that Linux would want
to have to set a guest OS identity, and it's not entirely clear to me
whether this would be necessary to use the Hyper-V mechanism.


 I suggest we come to consensus on a specific CPUID leaf where an OS needs to 
 look to determine if a hypervisor supports this capability.  We could define 
 a new CPUID leaf range at a well-defined location, or we could just use one 
 of the existing CPUID leaf ranges implemented by an existing hypervisor.  I'm 
 not familiar with the KVM CPUID leaf range, but in the case of Hyper-V, the 
 Hyper-V CPUID leaf range was architected to allow for other hypervisors to 
 implement it and just show through specific capabilities supported by the 
 hypervisor.  So, we could define a bit in the Hyper-V CPUID leaf range (since 
 Xen and KVM also implement this range), but that would require Linux to look 
 in that range on boot to discover this capability.

I also don't know whether QEMU and KVM would be okay with implementing
the host side of the Hyper-V mechanism by default.  They would have to
implement at least leaves 0x40000001 and 0x40000002, plus correctly
reporting zeros through whatever leaf is used for this new feature.
Gleb?  Paolo?

--Andy


Re: Standardizing an MSR or other hypercall to get an RNG seed?

2014-09-18 Thread Andy Lutomirski
On Thu, Sep 18, 2014 at 2:57 PM, H. Peter Anvin h...@zytor.com wrote:
 On 09/18/2014 02:46 PM, David Hepkin wrote:
 I'm not sure what you mean by "this mechanism?"  Are you suggesting that 
 each hypervisor put "CrossHVPara\0" somewhere in the 0x40000000 - 0x400fffff 
 CPUID range, and an OS has to do a full scan of this CPUID range on boot to 
 find it?  That seems pretty inefficient.  An OS will take 1000's of 
 hypervisor intercepts on every boot just to search this CPUID range.

 I suggest we come to consensus on a specific CPUID leaf where an OS needs to 
 look to determine if a hypervisor supports this capability.  We could define 
 a new CPUID leaf range at a well-defined location, or we could just use one 
 of the existing CPUID leaf ranges implemented by an existing hypervisor.  
 I'm not familiar with the KVM CPUID leaf range, but in the case of Hyper-V, 
 the Hyper-V CPUID leaf range was architected to allow for other hypervisors 
 to implement it and just show through specific capabilities supported by the 
 hypervisor.  So, we could define a bit in the Hyper-V CPUID leaf range 
 (since Xen and KVM also implement this range), but that would require Linux 
 to look in that range on boot to discover this capability.


 Yes, I would agree that if anything we should define a new range unique
 to this cross-VM interface, e.g. 0x48000000.

So, as a concrete straw-man:

CPUID leaf 0x48000000 would return a maximum leaf number in EAX (e.g.
0x48000001) along with a signature value (e.g. "CrossHVPara\0") in
EBX, ECX, and EDX.

CPUID 0x48000001.EAX would contain an MSR number to read to get a
random number if supported and zero if not supported.
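
As a rough illustration of how a guest might consume this straw-man, the sketch below checks the signature and maximum leaf at a base leaf, then reads the MSR index from the next leaf. The base leaf is passed in as a parameter, and the cpuid_fn callback, the find_rng_msr helper, both fake CPUID implementations, and the MSR index 0x12345678 are all invented for the example; none of this is an agreed interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical CPUID accessor; regs holds eax, ebx, ecx, edx in order. */
typedef void (*cpuid_fn)(uint32_t leaf, uint32_t regs[4]);

/*
 * Return the RNG-seed MSR index advertised under the straw-man scheme,
 * or 0 if the signature is missing, the feature leaf is out of range,
 * or the hypervisor reports no such MSR.
 */
static uint32_t find_rng_msr(cpuid_fn cpuid, uint32_t base)
{
	uint32_t regs[4];

	assert(cpuid != NULL);
	cpuid(base, regs);
	if (memcmp(&regs[1], "CrossHVPara\0", 12) != 0)
		return 0;
	if (regs[0] < base + 1)
		return 0;

	cpuid(base + 1, regs);
	return regs[0];
}

/* Test double: a hypervisor that implements the straw-man leaves. */
static void fake_cross_hv_cpuid(uint32_t leaf, uint32_t regs[4])
{
	memset(regs, 0, 4 * sizeof(regs[0]));
	if (leaf == 0x48000000u) {
		regs[0] = 0x48000001u;		/* maximum leaf */
		memcpy(&regs[1], "CrossHVPara\0", 12);
	} else if (leaf == 0x48000001u) {
		regs[0] = 0x12345678u;		/* made-up MSR index */
	}
}

/* Test double: nothing implemented, every leaf reads as zero. */
static void fake_bare_cpuid(uint32_t leaf, uint32_t regs[4])
{
	(void)leaf;
	memset(regs, 0, 4 * sizeof(regs[0]));
}
```

Returning the MSR index through CPUID sidesteps the first question that follows, at the cost of one extra CPUID exit at boot.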

Questions:

1. Can we use a fixed MSR number?  This would be a little bit simpler,
but it would depend on getting a wider MSR range from Intel.

2. Who would host and maintain such a spec?  I could do it on github,
but this seems a bit silly.  Other options would include Intel,
Microsoft, or perhaps the Linux Foundation.  I don't know whether
Intel or LF would want to do this, and MS isn't exactly
vendor-neutral.  (Even L-F isn't entirely neutral, since they sort of
represent two hypervisors.)  Or we could do something temporary and
then try to work with a group like OASIS, but that might end up being
a lot of work.

--Andy

