Re: [PATCH v2 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops

2015-09-30 Thread H. Peter Anvin
On 09/21/2015 09:36 AM, Linus Torvalds wrote:
> 
> How many msr reads are so critical that the function call
> overhead would matter? Get rid of the inline version of the _safe()
> thing too, and put that thing there too.
> 

Probably only the ones that may go in the context switch path.

-hpa


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] x86: Use entire page for the per-cpu GDT only if paravirt-enabled

2015-09-29 Thread H. Peter Anvin
No, it is a natural result of an implementation which treats setting the A bit as 
an abnormal flow (e.g. in microcode as opposed to hardware).

On September 29, 2015 7:11:59 PM PDT, ebied...@xmission.com wrote:
>"H. Peter Anvin" <h...@zytor.com> writes:
>
>> On 09/29/2015 06:20 PM, Eric W. Biederman wrote:
>>> Linus Torvalds <torva...@linux-foundation.org> writes:
>>> 
>>>> On Tue, Sep 29, 2015 at 1:35 PM, Andy Lutomirski <l...@amacapital.net> wrote:
>>>>>
>>>>> Does anyone know what happens if you stick a non-accessed segment in
>>>>> the GDT, map the GDT RO, and access it?
>>>>
>>>> You should get a #PF, as you guess, but go ahead and test it if you
>>>> want to make sure.
>>> 
>>> I tested this by accident once when working on what has become known
>>> as coreboot.  Early in boot with your GDT in an EEPROM switching from
>>> real mode to 32bit protected mode causes a write and locks up the
>>> machine when the hardware declines the write to the GDT to set the
>>> accessed bit.  As I recall the write kept being retried and retried and
>>> retried...
>>> 
>>> Setting the access bit in the GDT cleared up the problem and I did not
>>> look back.
>>> 
>>> Way up in 64bit mode something might be different, but I don't know why
>>> cpu designers would waste the silicon.
>>> 
>>
>> This is totally different from a TLB violation.  In your case, the write
>> goes through as far as the CPU is concerned, but when the data is
>> fetched back, it hasn't changed.  A write to a TLB-protected location
>> will #PF.
>
>The key point is that a write is generated when the cpu needs to set the
>access bit.  I agree the failure points are different.  A TLB fault vs a
>case where the hardware did not accept the write.
>
>The idea of a cpu reading back data (and not trusting its cache
>coherency controls) to verify the access bit gets set seems mind
>boggling.  That is slow, stupid, racy and incorrect.  Incorrect as the
>cpu should only set the access bit once per segment register load.
>
>In my case I am pretty certain it was something very weird with the
>hardware not accepting the write and either not acknowledging the bus
>transaction or cancelling it.  In which case the cpu knew the write had
>not made it to the ``memory'' and was trying to cope.
>
>Eric

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH] x86: Use entire page for the per-cpu GDT only if paravirt-enabled

2015-09-29 Thread H. Peter Anvin
On 09/29/2015 06:20 PM, Eric W. Biederman wrote:
> Linus Torvalds writes:
> 
>> On Tue, Sep 29, 2015 at 1:35 PM, Andy Lutomirski wrote:
>>>
>>> Does anyone know what happens if you stick a non-accessed segment in
>>> the GDT, map the GDT RO, and access it?
>>
>> You should get a #PF, as you guess, but go ahead and test it if you
>> want to make sure.
> 
> I tested this by accident once when working on what has become known
> as coreboot.  Early in boot with your GDT in an EEPROM switching from
> real mode to 32bit protected mode causes a write and locks up the
> machine when the hardware declines the write to the GDT to set the
> accessed bit.  As I recall the write kept being retried and retried and
> retried...
> 
> Setting the access bit in the GDT cleared up the problem and I did not
> look back.
> 
> Way up in 64bit mode something might be different, but I don't know why
> cpu designers would waste the silicon.
> 

This is totally different from a TLB violation.  In your case, the write
goes through as far as the CPU is concerned, but when the data is
fetched back, it hasn't changed.  A write to a TLB-protected location
will #PF.

-hpa


Re: [PATCH] x86: Use entire page for the per-cpu GDT only if paravirt-enabled

2015-09-29 Thread H. Peter Anvin
SGDT would be easy to use, and it is logical that it is faster since it reads 
an internal register.  SIDT does too but unlike the GDT has a secondary limit 
(it can never be larger than 4096 bytes) and so all limits in the range 
4095-65535 are exactly equivalent.
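
To illustrate the equivalence of those limits: in 64-bit mode the IDT holds at most 256 gate descriptors of 16 bytes each, so nothing beyond byte offset 4095 is ever consulted. A minimal sketch in C; the helper name and constant below are illustrative only and are not kernel APIs:

```c
#include <stdint.h>

/* Assumption: 64-bit mode, 256 vectors x 16-byte gates = 4096 bytes. */
#define IDT_MAX_BYTES 4096u

/* Illustrative helper (hypothetical, not a kernel function): clamp an IDT
 * limit to the largest value that still affects which vectors are usable. */
static uint16_t idt_effective_limit(uint16_t limit)
{
	return limit >= IDT_MAX_BYTES - 1 ? IDT_MAX_BYTES - 1 : limit;
}
```

Any limit from 4095 up to 65535 maps to the same effective value, which is why the upper bits of the limit carry no architectural meaning.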

Anything that causes a write to the GDT will #PF if read-only.  So yes, we need 
to force the accessed bit to set.  This shouldn't be a problem and in fact 
ought to be a performance improvement.
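
As a rough sketch of what "forcing the accessed bit" means for a code or data descriptor: the access byte is byte 5 of the 8-byte descriptor and the A flag is its lowest bit, i.e. bit 40 of the 64-bit value. The helper names here are made up for illustration and are not taken from the kernel:

```c
#include <stdint.h>

/* Accessed (A) flag of a code/data segment descriptor: bit 0 of the
 * access byte (byte 5), which is bit 40 of the 64-bit descriptor.
 * System descriptors interpret the type field differently. */
#define DESC_ACCESSED (1ULL << 40)

/* Hypothetical helpers, not kernel APIs. */
static uint64_t desc_set_accessed(uint64_t desc)
{
	return desc | DESC_ACCESSED;
}

static int desc_is_accessed(uint64_t desc)
{
	return (desc & DESC_ACCESSED) != 0;
}
```

With the bit already set in every descriptor, loading a segment register never needs to write back to the GDT, so a read-only mapping is not disturbed.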

On September 29, 2015 10:35:38 AM PDT, Andy Lutomirski <l...@amacapital.net> wrote:
>On Sep 29, 2015 2:01 AM, "Ingo Molnar" <mi...@kernel.org> wrote:
>>
>>
>> * Denys Vlasenko <dvlas...@redhat.com> wrote:
>>
>> > On 09/28/2015 09:58 AM, Ingo Molnar wrote:
>> > >
>> > > * Denys Vlasenko <dvlas...@redhat.com> wrote:
>> > >
>> > >> On 09/26/2015 09:50 PM, H. Peter Anvin wrote:
>> > >>> NAK.  We really should map the GDT read-only on all 64 bit systems,
>> > >>> since we can't hide the address from SGDT.  Same with the IDT.
>> > >>
>> > >> Sorry, I don't understand your point.
>> > >
>> > > So the problem is that right now the SGDT instruction (which is unprivileged)
>> > > leaks the real address of the kernel image:
>> > >
>> > >  fomalhaut:~> ./sgdt
>> > >  SGDT: 88303fd89000 / 007f
>> > >
>> > > that '88303fd89000' is a kernel address.
>> >
>> > Thank you.
>> > I do know that SGDT and friends are unprivileged on x86
>> > and thus they allow userspace (and guest kernels in paravirt)
>> > learn things they don't need to know.
>> >
>> > I don't see how making GDT page-aligned and page-sized
>> > changes anything in this regard. SGDT will still work,
>> > and still leak GDT address.
>>
>> Well, as I try to explain it in the other part of my mail, doing so enables us to
>> remap the GDT to a less security sensitive virtual address that does not leak the
>> kernel's randomized address:
>>
>> > > Your observation in the changelog and your patch:
>> > >
>> > >>>> It is page-sized because of paravirt. [...]
>> > >
>> > > ... conflicts with the intention to mark (remap) the primary GDT address read-only
>> > > on native kernels as well.
>> > >
>> > > So what we should do instead is to use the page alignment properly and remap the
>> > > GDT to a read-only location, and load that one.
>> >
>> > If we'd have a small GDT (i.e. what my patch does), we still can remap the
>> > entire page which contains small GDT, and simply don't care that some other data
>> > is also visible through that RO page.
>>
>> That's generally considered fragile: suppose an attacker has a limited information
>> leak that can read absolute addresses with system privilege but he doesn't know
>> the kernel's randomized base offset. With a 'partial page' mapping there could be
>> function pointers near the GDT, part of the page the GDT happens to be on, that
>> leak this information.
>>
>> (Same goes for crypto keys or other critical information (like canary information,
>> salts, etc.) accidentally ending up nearby.)
>>
>> Arguably it's a bit tenuous, but when playing remapping games it's generally
>> considered good to be page aligned and page sized, with zero padding.
>>
>> > > This would have a couple of advantages:
>> > >
>> > >  - This would give kernel address randomization more teeth on x86.
>> > >
>> > >  - An additional advantage would be that rootkits overwriting the GDT would have
>> > >    a bit more work to do.
>> > >
>> > >  - A third advantage would be that for NUMA systems we could 'mirror' the GDT into
>> > >    node-local memory and load those. This makes GDT load cache-misses a bit less
>> > >    expensive.
>> >
>> > GDT is per-cpu. Isn't per-cpu memory already NUMA-local?
>>
>> Indeed it is:
>>
>> fomalhaut:~> for ((cpu=1; cpu<9; cpu++)); do taskset $cpu ./sgdt ; done
>> SGDT: 88103fa09000 / 007f
>> SGDT: 88103fa29000 / 007f
>> SGDT: 88103fa29000 / 007f
>> SGDT: 88103fa49000 / 007f
>> SGDT: 88103fa49000 / 007f
>> SGDT: 88103fa49000 / 007f
>> SGDT: 88103fa29000 / 007f
>> SGDT: 88103fa69000 / 007f
>>
>> I confused it wi

Re: [PATCH] x86: Use entire page for the per-cpu GDT only if paravirt-enabled

2015-09-29 Thread H. Peter Anvin
Ugh.  Didn't realize that.

On September 29, 2015 11:22:04 AM PDT, Andy Lutomirski <l...@amacapital.net> wrote:
>On Tue, Sep 29, 2015 at 11:18 AM, H. Peter Anvin <h...@zytor.com> wrote:
>> SGDT would be easy to use, and it is logical that it is faster since
>> it reads an internal register.  SIDT does too but unlike the GDT has a
>> secondary limit (it can never be larger than 4096 bytes) and so all
>> limits in the range 4095-65535 are exactly equivalent.
>>
>
>Using the IDT limit would have been a great idea if Intel hadn't
>decided to clobber it on every VM exit.
>
>--Andy


Re: [PATCH] x86: Use entire page for the per-cpu GDT only if paravirt-enabled

2015-09-29 Thread H. Peter Anvin
On 09/29/2015 11:02 AM, Andy Lutomirski wrote:
> On Tue, Sep 29, 2015 at 10:50 AM, Linus Torvalds
>  wrote:
>> On Tue, Sep 29, 2015 at 1:35 PM, Andy Lutomirski  wrote:
>>>
>>> Does anyone know what happens if you stick a non-accessed segment in
>>> the GDT, map the GDT RO, and access it?
>>
>> You should get a #PF, as you guess, but go ahead and test it if you
>> want to make sure.
> 
> Then I think that, if we do this, the patch series should first make
> it percpu and fixmapped but RW and then flip it RO as a separate patch
> in case we need to revert the actual RO bit.  I don't want to break
> Wine or The Witcher 2 because of this, and we might need various
> fixups.  I really hope that no one uses get_thread_area to check
> whether TLS has been accessed.
> 

Of course.

-hpa


Re: [PATCH] x86: Use entire page for the per-cpu GDT only if paravirt-enabled

2015-09-26 Thread H. Peter Anvin
NAK.  We really should map the GDT read-only on all 64 bit systems, since we 
can't hide the address from SGDT.  Same with the IDT.
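
For reference, the size figures in the changelog quoted below can be checked with a few lines of C. This is only an arithmetic sketch: the constants are copied from the quoted text and objdump output, and the function names are invented for the example.

```c
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096u
#define GDT_BYTES       128u   /* x86_64 GDT size stated in the changelog */

/* Share of a page occupied by the GDT, in hundredths of a percent
 * (integer division, so 3.125% is reported as 312). */
static unsigned gdt_page_share(void)
{
	return GDT_BYTES * 10000u / PAGE_SIZE_BYTES;
}

/* Shrink of the per-cpu area: __per_cpu_end before minus after,
 * both values taken from the quoted objdump listing. */
static uint32_t percpu_saving(void)
{
	return 0x19398u - 0x18418u;
}
```

The saving works out to 0xf80 bytes per CPU, which is exactly one page minus the 128-byte GDT.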

On September 26, 2015 11:00:40 AM PDT, Denys Vlasenko <dvlas...@redhat.com> wrote:
>We have our GDT in a page-sized per-cpu structure, gdt_page.
>
>On x86_64 kernel, GDT is 128 bytes - only ~3% of that page is used.
>
>It is page-sized because of paravirt. Hypervisors need to know when
>GDT is changed, so they remap it read-only and handle write faults.
>If it's not in its own page, other writes nearby will cause
>those faults too.
>
>In other words, we need GDT to live in a separate page
>only if CONFIG_HYPERVISOR_GUEST=y.
>
>This patch reduces GDT alignment to cacheline-aligned
>if CONFIG_HYPERVISOR_GUEST is not set.
>
>Patch also renames gdt_page to cpu_gdt (mimicking naming of existing
>cpu_tss per-cpu variable), since now it is not always a full page.
>
>$ objdump -x vmlinux | grep .data..percpu | sort
>Before:
>(offset)(size)
>  wO .data..percpu  4000 irq_stack_union
>4000  wO .data..percpu  5000 exception_stacks
>9000  wO .data..percpu  1000 gdt_page  <<< HERE
>  a000  wO .data..percpu  0008 espfix_waddr
>  a008  wO .data..percpu  0008 espfix_stack
>...
> 00019398 g   .data..percpu   __per_cpu_end
>After:
>  wO .data..percpu  4000 irq_stack_union
>4000  wO .data..percpu  5000 exception_stacks
>  9000  wO .data..percpu  0008 espfix_waddr
>  9008  wO .data..percpu  0008 espfix_stack
>...
>00013c80  wO .data..percpu  0040 cyc2ns
>00013cc0  wO .data..percpu  22c0 cpu_tss
>00015f80  wO .data..percpu  0080 cpu_gdt  <<< HERE
>  00016000  wO .data..percpu  0018 cpu_tlbstate
>...
> 00018418 g   .data..percpu  00000000 __per_cpu_end
>
>Run-tested on a 144 CPU machine.
>
>Signed-off-by: Denys Vlasenko <dvlas...@redhat.com>
>CC: Ingo Molnar <mi...@kernel.org>
>CC: H. Peter Anvin <h...@zytor.com>
>CC: Konrad Rzeszutek Wilk <konrad.w...@oracle.com>
>CC: Boris Ostrovsky <boris.ostrov...@oracle.com>
>CC: David Vrabel <david.vra...@citrix.com>
>CC: Joerg Roedel <j...@8bytes.org>
>CC: Gleb Natapov <g...@kernel.org>
>CC: Paolo Bonzini <pbonz...@redhat.com>
>CC: kvm@vger.kernel.org
>CC: x...@kernel.org
>CC: linux-ker...@vger.kernel.org
>---
> arch/x86/entry/entry_32.S        |  2 +-
> arch/x86/include/asm/desc.h      | 16 +++-
> arch/x86/kernel/cpu/common.c     | 10 --
> arch/x86/kernel/cpu/perf_event.c |  2 +-
> arch/x86/kernel/head_32.S        |  4 ++--
> arch/x86/kernel/head_64.S        |  2 +-
> arch/x86/kernel/vmlinux.lds.S    |  2 +-
> arch/x86/tools/relocs.c          |  2 +-
> arch/x86/xen/enlighten.c         |  4 ++--
> 9 files changed, 28 insertions(+), 16 deletions(-)
>
>diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
>index b2909bf..bc6ae1c 100644
>--- a/arch/x86/entry/entry_32.S
>+++ b/arch/x86/entry/entry_32.S
>@@ -429,7 +429,7 @@ ldt_ss:
>  * compensating for the offset by changing to the ESPFIX segment with
>  * a base address that matches for the difference.
>  */
>-#define GDT_ESPFIX_SS PER_CPU_VAR(gdt_page) + (GDT_ENTRY_ESPFIX_SS * 8)
>+#define GDT_ESPFIX_SS PER_CPU_VAR(cpu_gdt) + (GDT_ENTRY_ESPFIX_SS * 8)
>   mov %esp, %edx  /* load kernel esp */
>   mov PT_OLDESP(%esp), %eax   /* load userspace esp */
>   mov %dx, %ax/* eax: new kernel esp */
>diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
>index 4e10d73..76de300 100644
>--- a/arch/x86/include/asm/desc.h
>+++ b/arch/x86/include/asm/desc.h
>@@ -39,15 +39,21 @@ extern gate_desc idt_table[];
> extern struct desc_ptr debug_idt_descr;
> extern gate_desc debug_idt_table[];
> 
>-struct gdt_page {
>+struct cpu_gdt {
>   struct desc_struct gdt[GDT_ENTRIES];
>-} __attribute__((aligned(PAGE_SIZE)));
>-
>-DECLARE_PER_CPU_PAGE_ALIGNED(struct gdt_page, gdt_page);
>+}
>+#ifdef CONFIG_HYPERVISOR_GUEST
>+/* Xen et al want GDT to have its own page. They remap it read-only */
>+__attribute__((aligned(PAGE_SIZE)));
>+DECLARE_PER_CPU_PAGE_ALIGNED(struct cpu_gdt, cpu_gdt);
>+#else
>+cacheline_aligned;
>+DECLARE_PER_CPU_ALIGNED(s

Re: [PATCH 0/3] x86/paravirt: Fix baremetal paravirt MSR ops

2015-09-17 Thread H. Peter Anvin
However, the difference between one CONFIG and another is quite frankly crazy.  
We should explicitly use the safe versions where this is appropriate, and then 
yes, we should do this.

Yet another reason the paravirt code is batshit crazy.

On September 17, 2015 2:31:34 AM PDT, Borislav Petkov wrote:
>On Thu, Sep 17, 2015 at 09:19:20AM +0200, Ingo Molnar wrote:
>> Most big distro kernels on bare metal have CONFIG_PARAVIRT=y (I checked Ubuntu and
>> Fedora), so we are potentially exposing a lot of users to problems.
>
>+ SUSE.
>
>> Crashing the bootup on an unknown MSR is bad. Many MSR reads and writes are
>> non-critical and returning the 'safe' result is much better than crashing or
>> hanging the bootup.
>
>... and prepending all MSR accesses with feature/CPUID checks is probably almost
>impossible.


Re: [v6] kvm/fpu: Enable fully eager restore kvm FPU

2015-04-23 Thread H. Peter Anvin
On 04/23/2015 08:28 AM, Dave Hansen wrote:
> On 04/23/2015 02:13 PM, Liang Li wrote:
>> When compiling kernel on westmere, the performance of eager FPU
>> is about 0.4% faster than lazy FPU.
> 
> Do you have a theory why this is?  What does the regression come from?
 

This is interesting since previous measurements on KVM have had the
exact opposite results.  I think we need to understand this a lot more.

-hpa


Re: [v3 21/26] x86, irq: Define a global vector for VT-d Posted-Interrupts

2015-01-30 Thread H. Peter Anvin
On 12/12/2014 07:14 AM, Feng Wu wrote:
> Currently, we use a global vector as the Posted-Interrupts
> Notification Event for all the vCPUs in the system. We need
> to introduce another global vector for VT-d Posted-Interrupts,
> which will be used to wake up the sleeping vCPU when an external
> interrupt from a direct-assigned device happens for that vCPU.
> 
> Signed-off-by: Feng Wu <feng...@intel.com>
> 

>  #ifdef CONFIG_HAVE_KVM
> +void (*wakeup_handler_callback)(void) = NULL;
> +EXPORT_SYMBOL_GPL(wakeup_handler_callback);
> +

Stylistic nitpick: we generally don't explicitly initialize
global/static pointer variables to NULL (that happens automatically anyway.)
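
A small illustration of why the explicit initializer is redundant: C guarantees that objects with static storage duration start out zeroed, so a file-scope function pointer is NULL without being told so. The names below are hypothetical and only mirror the shape of the quoted declaration:

```c
#include <stddef.h>

/* Static storage duration: implicitly initialized to a null pointer,
 * so no "= NULL" is needed. Name is illustrative only. */
static void (*wakeup_handler_sketch)(void);

/* Returns 1 while no handler has been installed. */
static int wakeup_handler_is_unset(void)
{
	return wakeup_handler_sketch == NULL;
}
```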

Other than that,

Acked-by: H. Peter Anvin <h...@linux.intel.com>


Re: [Xen-devel] [RFC] Hypervisor RNG and enumeration

2014-10-29 Thread H. Peter Anvin
On 10/29/2014 03:37 AM, Andrew Cooper wrote:

>> CPUID with EAX = 0x4F01 and ECX >= N MUST return all zeros.
>>
>> To the extent that the hypervisor prefers a given interface, it should
>> specify that interface earlier in the list.  For example, KVM might place
>> its KVMKVMKVM signature first in the list to indicate that it should be
>> used by guests in preference to other supported interfaces.  Other
>> hypervisors would likely use a different order.
>>
>> The exact semantics of the ordering of the list is beyond the scope of
>> this specification.
> 
> How do you evaluate N?
> 
> It would make more sense for CPUID.4F01[ECX=0] to return N in one
> register, and perhaps preferred interface index in another.  The
> signatures can then be obtained from CPUID.4F01[ECX={1 to N}].
> 
> That way, a consumer can be confident that they have found all the
> signatures, without relying on an unbounded loop and checking for zeroes
Yes.  Specifically, it should return it in EAX.  That is the preferred
interface and we are trying to push for that going forward.
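
The bounded enumeration discussed above could look roughly like the following sketch. It assumes subleaf 0 returns the count N in EAX and that each signature is 12 bytes in EBX, ECX, EDX (the layout used by existing hypervisor signature leaves); the leaf number is written as it appears in this thread, and `cpuid_fn`, `hv_enumerate` and `fake_cpuid` are names made up for the example so it can run without a hypervisor.

```c
#include <stdint.h>
#include <string.h>

struct cpuid_regs { uint32_t eax, ebx, ecx, edx; };

/* Caller-supplied CPUID accessor (hypothetical; lets the sketch be tested). */
typedef struct cpuid_regs (*cpuid_fn)(uint32_t leaf, uint32_t subleaf);

#define HV_ENUM_LEAF 0x4F01u   /* leaf number as written in the thread */
#define HV_SIG_LEN   12

/* Read N from subleaf 0 (EAX), then copy at most `max` signatures from
 * subleaves 1..N. Returns the number of signatures copied. */
static uint32_t hv_enumerate(cpuid_fn cpuid, char sigs[][HV_SIG_LEN + 1],
			     uint32_t max)
{
	uint32_t n = cpuid(HV_ENUM_LEAF, 0).eax;
	uint32_t i;

	if (n > max)
		n = max;
	for (i = 0; i < n; i++) {
		struct cpuid_regs r = cpuid(HV_ENUM_LEAF, i + 1);

		memcpy(sigs[i] + 0, &r.ebx, 4);
		memcpy(sigs[i] + 4, &r.ecx, 4);
		memcpy(sigs[i] + 8, &r.edx, 4);
		sigs[i][HV_SIG_LEN] = '\0';
	}
	return n;
}

/* Stand-in for real hardware: advertises two interfaces. */
static struct cpuid_regs fake_cpuid(uint32_t leaf, uint32_t subleaf)
{
	static const char *const names[] = { "KVMKVMKVM\0\0\0", "XenVMMXenVMM" };
	struct cpuid_regs r = { 0, 0, 0, 0 };

	if (leaf != HV_ENUM_LEAF)
		return r;
	if (subleaf == 0) {
		r.eax = 2;
	} else if (subleaf <= 2) {
		memcpy(&r.ebx, names[subleaf - 1] + 0, 4);
		memcpy(&r.ecx, names[subleaf - 1] + 4, 4);
		memcpy(&r.edx, names[subleaf - 1] + 8, 4);
	}
	return r;
}
```

Because the loop bound comes from a single register read, the consumer never has to probe subleaves until it sees zeros.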

-hpa


Re: Standardizing an MSR or other hypercall to get an RNG seed?

2014-09-22 Thread H. Peter Anvin
On 09/22/2014 06:31 AM, Christopher Covington wrote:
> On 09/19/2014 05:46 PM, H. Peter Anvin wrote:
>> On 09/19/2014 01:46 PM, Andy Lutomirski wrote:
>>>>
>>>> However, it sounds to me that at least for KVM, it is very easy just to
>>>> emulate the RDRAND instruction. The hypervisor would report to the guest
>>>> that RDRAND is supported in CPUID and then emulate the instruction when
>>>> guest executes it. KVM already traps guest #UD (which would occur if
>>>> RDRAND executed while it is not supported) - so this scheme wouldn’t
>>>> introduce additional overhead over RDMSR.
>>>
>>> Because then guest user code will think that rdrand is there and will
>>> try to use it, resulting in abysmal performance.
>>>
>>
>> Yes, the presence of RDRAND implies a cheap and inexhaustible entropy
>> source.
> 
> A guest kernel couldn't make it look like RDRAND is not present to guest
> userspace?

It could, but how would you enumerate that?  A new RDRAND-CPL-0 CPUID
bit pretty much would be required.

-hpa


Re: Standardizing an MSR or other hypercall to get an RNG seed?

2014-09-22 Thread H. Peter Anvin
On 09/22/2014 07:17 AM, H. Peter Anvin wrote:
 
> It could, but how would you enumerate that?  A new RDRAND-CPL-0 CPUID
> bit pretty much would be required.
 

Note that there are two things that differ: the CPL 0-ness and the
performance/exhaustibility attributes.

-hpa


Re: Standardizing an MSR or other hypercall to get an RNG seed?

2014-09-22 Thread H. Peter Anvin
Not really, no.

Sent from my tablet, pardon any formatting problems.

> On Sep 22, 2014, at 06:31, Christopher Covington <c...@codeaurora.org> wrote:
> 
>> On 09/19/2014 05:46 PM, H. Peter Anvin wrote:
>>> On 09/19/2014 01:46 PM, Andy Lutomirski wrote:
>>>>> 
>>>>> However, it sounds to me that at least for KVM, it is very easy just to
>>>>> emulate the RDRAND instruction. The hypervisor would report to the guest
>>>>> that RDRAND is supported in CPUID and then emulate the instruction when
>>>>> guest executes it. KVM already traps guest #UD (which would occur if
>>>>> RDRAND executed while it is not supported) - so this scheme wouldn’t
>>>>> introduce additional overhead over RDMSR.
>>>> 
>>>> Because then guest user code will think that rdrand is there and will
>>>> try to use it, resulting in abysmal performance.
>>> 
>>> Yes, the presence of RDRAND implies a cheap and inexhaustible entropy
>>> source.
>> 
>> A guest kernel couldn't make it look like RDRAND is not present to guest
>> userspace?
>> 
>> Christopher
>> 
>> -- 
>> Employee of Qualcomm Innovation Center, Inc.
>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
>> hosted by the Linux Foundation.


Re: Standardizing an MSR or other hypercall to get an RNG seed?

2014-09-19 Thread H. Peter Anvin
On 09/19/2014 09:37 AM, Gleb Natapov wrote:

> Linux detects what hypervisor it runs on very early

Not anywhere close to early enough.  We're talking for uses like kASLR.

-hpa


Re: Standardizing an MSR or other hypercall to get an RNG seed?

2014-09-19 Thread H. Peter Anvin
On 09/19/2014 09:14 AM, Nakajima, Jun wrote:
 
> I slept on it, and I think using the CPUID instruction alone would be
> simple and efficient:
> - We have a huge space for CPUID leaves
> - CPUID also works for user-level
> - It can take an additional 32-bit parameter (ECX), and returns 4
> 32-bit values (EAX, EBX, ECX, and EDX).  RDMSR, for example, returns a
> 64-bit value.
> 
> Basically we can use it to implement a hypercall (rather than VMCALL).
> 
> For example,
> - CPUID 0x4801.EAX would return the feature presence (e.g. in
> EBX), and the result in EDX:EAX (if present) at the same time, or
> - CPUID 0x4801.EAX would return the feature presence only, and
> CPUID 0x4802.EAX (acts like a hypercall) returns up to 4 32-bit
> values.
 

There is a huge disadvantage to the fact that CPUID is a user space
instruction, though.

-hpa


Re: Standardizing an MSR or other hypercall to get an RNG seed?

2014-09-19 Thread H. Peter Anvin
On 09/19/2014 09:53 AM, Gleb Natapov wrote:
> On Fri, Sep 19, 2014 at 09:40:07AM -0700, H. Peter Anvin wrote:
>> On 09/19/2014 09:37 AM, Gleb Natapov wrote:
>>>
>>> Linux detects what hypervisor it runs on very early
>>
>> Not anywhere close to early enough.  We're talking for uses like kASLR.
>>
> Still too early to do:
> 
> h = cpuid(HYPERVIOR_SIGNATURE)
> if (h == KVMKVMKVM) {
>    if (cpuid(kvm_features) & kvm_rnd)
>       rdmsr(kvm_rnd)
> } else if (h == HyperV) {
>    if (cpuid(hv_features) & hv_rnd)
>       rdmsr(hv_rnd)
> } else if (h == XenXenXen) {
>    if (cpuid(xen_features) & xen_rnd)
>       rdmsr(xen_rnd)
> }
 

If we need to do chase loops, especially not so...

-hpa


Re: Standardizing an MSR or other hypercall to get an RNG seed?

2014-09-19 Thread H. Peter Anvin
On 09/19/2014 10:15 AM, Gleb Natapov wrote:
> On Fri, Sep 19, 2014 at 10:08:20AM -0700, H. Peter Anvin wrote:
>> On 09/19/2014 09:53 AM, Gleb Natapov wrote:
>>> On Fri, Sep 19, 2014 at 09:40:07AM -0700, H. Peter Anvin wrote:
>>>> On 09/19/2014 09:37 AM, Gleb Natapov wrote:
>>>>>
>>>>> Linux detects what hypervisor it runs on very early
>>>>
>>>> Not anywhere close to early enough.  We're talking for uses like kASLR.
>>>>
>>> Still too early to do:
>>>
>>> h = cpuid(HYPERVIOR_SIGNATURE)
>>> if (h == KVMKVMKVM) {
>>>    if (cpuid(kvm_features) & kvm_rnd)
>>>       rdmsr(kvm_rnd)
>>> } else if (h == HyperV) {
>>>    if (cpuid(hv_features) & hv_rnd)
>>>       rdmsr(hv_rnd)
>>> } else if (h == XenXenXen) {
>>>    if (cpuid(xen_features) & xen_rnd)
>>>       rdmsr(xen_rnd)
>>> }
>>>
>>
>> If we need to do chase loops, especially not so...
>>
> What loops exactly? As a non native English speaker I fail to understand
> if your answer is yes or no ;)
 

The above isn't actually the full algorithm used.

-hpa


Re: Standardizing an MSR or other hypercall to get an RNG seed?

2014-09-19 Thread H. Peter Anvin
On 09/19/2014 10:21 AM, Andy Lutomirski wrote:

>> There is a huge disadvantage to the fact that CPUID is a user space
>> instruction, though.
> 
> We can always make cpuid on the leaf in question return all zeros if CPL > 0.
 

Not sure that is better...

-hpa


Re: Standardizing an MSR or other hypercall to get an RNG seed?

2014-09-19 Thread H. Peter Anvin
On 09/19/2014 01:46 PM, Andy Lutomirski wrote:

>> However, it sounds to me that at least for KVM, it is very easy just to
>> emulate the RDRAND instruction. The hypervisor would report to the guest
>> that RDRAND is supported in CPUID and then emulate the instruction when guest
>> executes it. KVM already traps guest #UD (which would occur if RDRAND
>> executed while it is not supported) - so this scheme wouldn’t introduce
>> additional overhead over RDMSR.
> 
> Because then guest user code will think that rdrand is there and will
> try to use it, resulting in abysmal performance.
 

Yes, the presence of RDRAND implies a cheap and inexhaustible entropy
source.

-hpa


Re: Standardizing an MSR or other hypercall to get an RNG seed?

2014-09-19 Thread H. Peter Anvin
On 09/19/2014 04:12 PM, Andy Lutomirski wrote:
 
> To force deterministic execution.
> 
> I incorrectly thought that the kernel could switch RDRAND on and off.
> It turns out that a hypervisor can do this, but not the kernel.  Also,
> determinism is lost anyway because of TSX, which *also* can't be
> turned on and off.
 

Actually, a much bigger reason is because it lets rogue guest *user
space*, even with a well-behaved guest OS, do something potentially
harmful to the host.

-hpa




Re: Standardizing an MSR or other hypercall to get an RNG seed?

2014-09-19 Thread H. Peter Anvin
On 09/19/2014 04:35 PM, Theodore Ts'o wrote:
 On Fri, Sep 19, 2014 at 04:29:53PM -0700, H. Peter Anvin wrote:

 Actually, a much bigger reason is because it lets rogue guest *user
 space*, even with a well-behaved guest OS, do something potentially
 harmful to the host.
 
 Right, but if the host kernel is dependent on the guest OS for
 security, the game is over.  The Guest Kernel must NEVER be able to
 do anything harmful to the host.  If it can, it is a severe security
 bug in KVM that must be fixed ASAP.
 

Being secure and being well-behaved with resources are two different things.

-hpa




Re: Standardizing an MSR or other hypercall to get an RNG seed?

2014-09-18 Thread H. Peter Anvin
On 09/18/2014 07:40 AM, KY Srinivasan wrote:

 The main questions are what MSR index to use and how to detect the
 presence of the MSR.  I've played with two approaches:

 1. Use CPUID to detect the presence of this feature.  This is very easy for
 KVM to implement by using a KVM-specific CPUID feature.  The problem is
 that this will necessarily be KVM-specific, as the guest must first probe for
 KVM and then probe for the KVM feature.  I doubt that Hyper-V, for
 example, wants to claim to be KVM.  If we could standardize a non-
 hypervisor-specific CPUID feature, then this problem would go away.
 
 We would prefer a CPUID feature bit to detect this feature.
  

I guess if we're introducing the concept of pan-OS MSRs we could also
have pan-OS CPUID.  The real issue is to get a single non-conflicting
standard.

-hpa




Re: Standardizing an MSR or other hypercall to get an RNG seed?

2014-09-18 Thread H. Peter Anvin
Quite frankly it might make more sense to define a cross-VM *cpuid* range.  The 
cpuid leaf can just point to the MSR.  The big question is who will be willing 
to be the registrar.

On September 18, 2014 11:35:39 AM PDT, Andy Lutomirski l...@amacapital.net 
wrote:
On Thu, Sep 18, 2014 at 10:42 AM, Nakajima, Jun
jun.nakaj...@intel.com wrote:
 On Thu, Sep 18, 2014 at 10:20 AM, KY Srinivasan k...@microsoft.com
wrote:


 -Original Message-
 From: Paolo Bonzini [mailto:paolo.bonz...@gmail.com] On Behalf Of
Paolo
 Bonzini
 Sent: Thursday, September 18, 2014 10:18 AM
 To: Nakajima, Jun; KY Srinivasan
 Cc: Mathew John; Theodore Ts'o; John Starks; kvm list; Gleb
Natapov; Niels
 Ferguson; Andy Lutomirski; David Hepkin; H. Peter Anvin; Jake
Oshins; Linux
 Virtualization
 Subject: Re: Standardizing an MSR or other hypercall to get an RNG
seed?

 Il 18/09/2014 19:13, Nakajima, Jun ha scritto:
  In terms of the address for the MSR, I suggest that you choose one
  from the range between 40000000H - 400000FFH. The SDM (35.1
  ARCHITECTURAL MSRS) says "All existing and future processors will not
  implement any features using any MSR in this range." Hyper-V already
  defines many synthetic MSRs in this range, and I think it would be
  reasonable for you to pick one for this to avoid a conflict?

 KVM is not using any MSR in that range.

 However, I think it would be better to have the MSR (and perhaps
CPUID)
 outside the hypervisor-reserved ranges, so that it becomes
architecturally
 defined.  In some sense it is similar to the HYPERVISOR CPUID
feature.

 Yes, given that we want this to be hypervisor agnostic.


 Actually, that MSR address range has been reserved for that purpose,
along with:
 - CPUID.EAX=1 -> ECX bit 31 (always returns 0 on bare metal)
 - CPUID.EAX=4000_00xxH leaves (i.e. HYPERVISOR CPUID)

I don't know whether this is documented anywhere, but Linux tries to
detect a hypervisor by searching CPUID leaves 0x400xyz00 for
"KVMKVMKVM\0\0\0", so at least Linux can handle the KVM leaves being
in a somewhat variable location.

Do we consider this mechanism to work across all hypervisors and
guests?  That is, could we put something like "CrossHVPara\0"
somewhere in that range, where each hypervisor would be free to decide
exactly where it ends up?

--Andy

-- 
Sent from my mobile phone.  Please pardon brevity and lack of formatting.


Re: Standardizing an MSR or other hypercall to get an RNG seed?

2014-09-18 Thread H. Peter Anvin
On 09/18/2014 02:46 PM, David Hepkin wrote:
 I'm not sure what you mean by this mechanism?  Are you suggesting that each 
 hypervisor put CrossHVPara\0 somewhere in the 0x4000 - 0x400f CPUID 
 range, and an OS has to do a full scan of this CPUID range on boot to find 
 it?  That seems pretty inefficient.  An OS will take 1000's of hypervisor 
 intercepts on every boot just to search this CPUID range.
 
 I suggest we come to consensus on a specific CPUID leaf where an OS needs to 
 look to determine if a hypervisor supports this capability.  We could define 
 a new CPUID leaf range at a well-defined location, or we could just use one 
 of the existing CPUID leaf ranges implemented by an existing hypervisor.  I'm 
 not familiar with the KVM CPUID leaf range, but in the case of Hyper-V, the 
 Hyper-V CPUID leaf range was architected to allow for other hypervisors to 
 implement it and just show through specific capabilities supported by the 
 hypervisor.  So, we could define a bit in the Hyper-V CPUID leaf range (since 
 Xen and KVM also implement this range), but that would require Linux to look 
 in that range on boot to discover this capability.
 

Yes, I would agree that if anything we should define a new range unique
to this cross-VM interface, e.g. 0x4800.

-hpa




Re: Standardizing an MSR or other hypercall to get an RNG seed?

2014-09-18 Thread H. Peter Anvin
On 09/18/2014 03:00 PM, Andy Lutomirski wrote:
 On Thu, Sep 18, 2014 at 2:46 PM, David Hepkin david...@microsoft.com wrote:
 I'm not sure what you mean by this mechanism?  Are you suggesting that 
 each hypervisor put CrossHVPara\0 somewhere in the 0x4000 - 0x400f 
 CPUID range, and an OS has to do a full scan of this CPUID range on boot to 
 find it?  That seems pretty inefficient.  An OS will take 1000's of 
 hypervisor intercepts on every boot just to search this CPUID range.
 
 Linux already does this, which is arguably unfortunate.  But it's not
 quite that bad; the KVM and Xen code is only scanning at increments of
 0x100.
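
For reference, the scan being discussed can be sketched roughly as
below.  This is a userspace model, not the kernel's code: the cpuid_fn
callback and mock_cpuid() are hypothetical stand-ins for the real CPUID
instruction, and the 0x40000000 - 0x40010000 bounds are my understanding
of what Linux scans.  At a stride of 0x100 that is at most 256 CPUID
executions rather than thousands.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical callback: fills regs[0..3] with EAX, EBX, ECX, EDX. */
typedef void (*cpuid_fn)(uint32_t leaf, uint32_t regs[4]);

/*
 * Scan the hypervisor CPUID range in steps of 0x100 for a 12-byte
 * signature held in EBX:ECX:EDX.  Returns the base leaf where the
 * signature was found, or 0 if it was not found.
 */
uint32_t find_hv_base(cpuid_fn cpuid, const char sig[12])
{
	uint32_t base, regs[4];

	for (base = 0x40000000u; base < 0x40010000u; base += 0x100) {
		cpuid(base, regs);
		if (memcmp(sig, &regs[1], 12) == 0)
			return base;
	}
	return 0;
}

/* Demonstration-only mock: pretends KVM's leaves start at 0x40000100. */
void mock_cpuid(uint32_t leaf, uint32_t regs[4])
{
	memset(regs, 0, 4 * sizeof(regs[0]));
	if (leaf == 0x40000100u) {
		regs[0] = leaf + 1;	/* highest leaf in this block */
		memcpy(&regs[1], "KVMKVMKVM\0\0\0", 12);
	}
}
```

With the mock above, find_hv_base(mock_cpuid, "KVMKVMKVM\0\0\0") lands
on 0x40000100, and a lookup for "CrossHVPara\0" returns 0 because no
leaf advertises it.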
 
 I think that Linux as a guest would have no problem with checking the
 Hyper-V range or some new range.  I don't think that Linux would want
 to have to set a guest OS identity, and it's not entirely clear to me
 whether this would be necessary to use the Hyper-V mechanism.
 

We really don't want to have to do this in early code, though.


 I suggest we come to consensus on a specific CPUID leaf where an OS needs to 
 look to determine if a hypervisor supports this capability.  We could define 
 a new CPUID leaf range at a well-defined location, or we could just use one 
 of the existing CPUID leaf ranges implemented by an existing hypervisor.  
 I'm not familiar with the KVM CPUID leaf range, but in the case of Hyper-V, 
 the Hyper-V CPUID leaf range was architected to allow for other hypervisors 
 to implement it and just show through specific capabilities supported by the 
 hypervisor.  So, we could define a bit in the Hyper-V CPUID leaf range 
 (since Xen and KVM also implement this range), but that would require Linux 
 to look in that range on boot to discover this capability.
 
 I also don't know whether QEMU and KVM would be okay with implementing
 the host side of the Hyper-V mechanism by default.  They would have to
 implement at least leaves 0x4001 and 0x402, plus correctly
 reporting zeros through whatever leaf is used for this new feature.
 Gleb?  Paolo?
 

The problem is what happens with a noncooperating hypervisor.  I guess
we could put a magic number in one of the leaf registers, but still...

-hpa




Re: GET_RNG_SEED hypercall ABI? (Re: [PATCH v5 0/5] random,x86,kvm: Rework arch RNG seeds and get some from kvm)

2014-08-27 Thread H. Peter Anvin
On 08/27/2014 12:00 AM, Paolo Bonzini wrote:
 Il 27/08/2014 01:58, Andy Lutomirski ha scritto:
 hpa pointed out that the ABI that I chose (an MSR from the KVM range
 and a KVM cpuid bit) is unnecessarily KVM-specific.  It would be nice
 to allocate an MSR that everyone involved can agree on and, rather
 than relying on a cpuid bit, just have the guest probe for the MSR.

 This leads to a few questions:

 1. How do we allocate an MSR?  (For background, this would be an MSR
 that either returns 64 bits of best-effort cryptographically secure
 random data or fails with #GP.)
 
 Ask Intel? :)

I'm going to poke around internally.  Intel might as a matter of policy
be reluctant to assign an MSR index specifically for software use, but
I'll try to find out.
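
Whatever index ends up being used, the guest-side probe described above
(read the MSR, treat a #GP as "not present") might look something like
this userspace model.  Everything named here is an assumption for
illustration: HYPOTHETICAL_MSR_RNG_SEED is a made-up index, and the
rdmsr_safe_fn callback stands in for a fault-tolerant MSR read such as
the kernel's rdmsrl_safe().

```c
#include <stddef.h>
#include <stdint.h>

/* Made-up index for illustration; no index was agreed on in this thread. */
#define HYPOTHETICAL_MSR_RNG_SEED 0x12345678u

/*
 * Models a fault-tolerant MSR read: returns 0 on success and nonzero
 * if the access faulted (#GP), i.e. the MSR is not implemented.
 */
typedef int (*rdmsr_safe_fn)(uint32_t msr, uint64_t *val);

/*
 * Fetch up to n 64-bit seed words.  Stops at the first failed read,
 * which also covers a host that stops offering the MSR later on.
 * Returns the number of words actually obtained.
 */
size_t get_seed_words(rdmsr_safe_fn rd, uint64_t *out, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (rd(HYPOTHETICAL_MSR_RNG_SEED, &out[i]) != 0)
			break;
	return i;
}

/* Demonstration-only mocks for a host with and without the MSR. */
int mock_present(uint32_t msr, uint64_t *val)
{
	(void)msr;
	*val = 0x0123456789abcdefULL;	/* fixed value, not random */
	return 0;
}

int mock_absent(uint32_t msr, uint64_t *val)
{
	(void)msr;
	(void)val;
	return -1;
}
```

The point of the shape is that the caller never needs a separate
enumeration step: zero words back simply means the feature is absent.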

-hpa



Re: [PATCH v5 0/5] random,x86,kvm: Rework arch RNG seeds and get some from kvm

2014-08-13 Thread H. Peter Anvin
On 08/12/2014 12:22 PM, Andy Lutomirski wrote:
 On Tue, Aug 12, 2014 at 12:17 PM, Theodore Ts'o ty...@mit.edu wrote:
 On Tue, Aug 12, 2014 at 12:11:29PM -0700, Andy Lutomirski wrote:

 What's the status of this series?  I assume that it's too late for at
 least patches 2-5 to make it into 3.17.

 Which tree were you hoping this patch series to go through?  I was
 assuming it would go through the x86 tree since the bulk of the
 changes are in the x86 subsystem (hence my Acked-by).
 
 There's some argument that patch 1 should go through the kvm tree.
 There's no real need for patch 1 and 2-5 to end up in the same kernel
 release, either.
 

 IIRC, Peter had some concerns, and I don't remember if they were all
 addressed.  Peter?

 
 I don't know.  I rewrote one thing he didn't like and undid the other,
 but there's plenty of opportunity for this version to be problematic, too.
 

Sorry, I have been heads down on the current merge window.  I will look
at this for 3.18, presumably after Kernel Summit.

The proposed arch_get_rng_seed() is not really what it claims to be; it
most definitely does not produce seed-grade randomness, instead it seems
to be an arch function for best-effort initialization of the entropy
pools -- which is fine, it is just something quite different.

I want to look over it more carefully before acking it, though.

Andy, are you going to be in Chicago?

-hpa



Re: [PATCH v5 0/5] random,x86,kvm: Rework arch RNG seeds and get some from kvm

2014-08-13 Thread H. Peter Anvin
On 08/13/2014 09:13 AM, Andy Lutomirski wrote:
 
 Sounds good to me.
 
 FWIW, I'd like to see a second use added in random.c: I think that we
 should do this, or even all of init_std_data, on resume from suspend
 and especially on resume from hibernate / kexec.
 

Yes, we should.  We also need to make it possible to do this after
cloning a VM.

-hpa




Re: [PATCH v5 0/5] random,x86,kvm: Rework arch RNG seeds and get some from kvm

2014-08-13 Thread H. Peter Anvin
On 08/13/2014 11:33 AM, Andy Lutomirski wrote:
 
 As for doing arch_random_init after clone/migration, I think we'll
 need another KVM extension for that, since, AFAIK, we don't actually
 get notified that we were cloned or migrated.  That will be
 nontrivial.  Maybe we can figure that out at KS, too.
 

We don't need a reset when migrated (although it might be a good idea
under some circumstances, i.e. if the pools might somehow have gotten
exposed) but definitely when cloned.

-hpa




Re: [PATCH v5 0/5] random,x86,kvm: Rework arch RNG seeds and get some from kvm

2014-08-13 Thread H. Peter Anvin
On 08/13/2014 11:44 AM, H. Peter Anvin wrote:
 On 08/13/2014 11:33 AM, Andy Lutomirski wrote:

 As for doing arch_random_init after clone/migration, I think we'll
 need another KVM extension for that, since, AFAIK, we don't actually
 get notified that we were cloned or migrated.  That will be
 nontrivial.  Maybe we can figure that out at KS, too.

 
 We don't need a reset when migrated (although it might be a good idea
 under some circumstances, i.e. if the pools might somehow have gotten
 exposed) but definitely when cloned.
 

But yes, we need a notification.  For obvious reasons there is no
suspend event (one can snapshot a running VM) but we need to be notified
upon wakeup, *or* we need to give KVM a way to update the necessary state.
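
One way the "let the host update the necessary state" variant could
work is a host-maintained generation counter that the guest compares
against the last value it saw.  The sketch below is purely hypothetical:
no such KVM interface is described in this thread, and struct rng_state
and rng_check_generation() are names invented for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical guest-side bookkeeping. */
struct rng_state {
	uint64_t seen_generation;	/* last host generation observed */
	unsigned int reseed_count;	/* stand-in for real reseed work */
};

/*
 * Compare the host-provided generation with the one we last saw.  A
 * change means the VM was cloned or restored from a snapshot, so the
 * pools must be reinitialized.  Returns true if a reseed was triggered.
 */
bool rng_check_generation(struct rng_state *st, uint64_t host_generation)
{
	if (host_generation == st->seen_generation)
		return false;

	st->seen_generation = host_generation;
	st->reseed_count++;	/* real code would rerun pool initialization */
	return true;
}
```

Since there is no suspend event, the check would have to run on wakeup
or before random data is handed out, not just once at boot.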

-hpa




Re: [PATCH 1/1] virtio: rng: add derating factor for use by hwrng core

2014-08-12 Thread H. Peter Anvin
On 08/11/2014 10:27 PM, Amit Shah wrote:
 On (Mon) 11 Aug 2014 [15:11:03], H. Peter Anvin wrote:
 On 08/11/2014 11:49 AM, Amit Shah wrote:
 The khwrngd thread is started when a hwrng device of sufficient
 quality is registered.  The virtio-rng device is backed by the
 hypervisor, and we trust the hypervisor to provide real entropy.  A
 malicious hypervisor is a scenario that's ruled out, so we are certain
 the quality of randomness we receive is perfectly trustworthy.  Hence,
 we use 100% for the factor, indicating maximum confidence in the source.

 Signed-off-by: Amit Shah amit.s...@redhat.com

 It isn't ruled out, it is just irrelevant: if the hypervisor is
 malicious, the quality of your random number source is the least of your
 problems.
 
 Yea; I meant ruled out in that sense.  Should the commit msg be more
 verbose?
 

Yes, as it is written it is misleading.

-hpa




Re: [PATCH 1/1] virtio: rng: add derating factor for use by hwrng core

2014-08-11 Thread H. Peter Anvin
On 08/11/2014 11:49 AM, Amit Shah wrote:
 The khwrngd thread is started when a hwrng device of sufficient
 quality is registered.  The virtio-rng device is backed by the
 hypervisor, and we trust the hypervisor to provide real entropy.  A
 malicious hypervisor is a scenario that's ruled out, so we are certain
 the quality of randomness we receive is perfectly trustworthy.  Hence,
 we use 100% for the factor, indicating maximum confidence in the source.
 
 Signed-off-by: Amit Shah amit.s...@redhat.com

It isn't ruled out, it is just irrelevant: if the hypervisor is
malicious, the quality of your random number source is the least of your
problems.
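
To make the derating factor concrete: assuming the hwrng core credits
entropy in proportion to a quality value out of 1024 (that scaling is my
recollection of the khwrngd code, not something stated in this thread),
the arithmetic is roughly as follows.  credited_entropy_bits() is a
name invented for the example.

```c
#include <stddef.h>

/*
 * Entropy credited for nbytes of device output, assuming quality is
 * expressed as entropy bits per 1024 bits of input (an assumption).
 */
unsigned int credited_entropy_bits(size_t nbytes, unsigned int quality)
{
	return (unsigned int)((nbytes * 8 * quality) >> 10);
}
```

Under that assumption a 64-byte read is credited 512 bits at quality
1024, 500 bits at quality 1000, and nothing at quality 0, which is why
the choice of factor is a statement of trust in the source.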

-hpa




Re: [PATCH v4 2/5] random: Add and use arch_get_rng_seed

2014-07-22 Thread H. Peter Anvin
On 07/22/2014 01:44 PM, Andy Lutomirski wrote:
 
 But, if Intel's hardware does, in fact, work as documented, then
 the current code will collect very little entropy on RDSEED-less
 hardware.  I see no great reason that we should do something weaker
 than following Intel's explicit recommendation for how to seed a PRNG
 from RDRAND.
 

Very little entropy in the architectural worst case.  However, since we
are running single-threaded at this point, actual hardware performs
orders of magnitude better.  Since we run the mixing function (for no
particularly good reason -- it is a linear function and doesn't add
security) there will be enough delay that RDRAND will in practice catch
up and the output will be quite high quality.  Since the pool is quite
large, the likely outcome is that there will be enough randomness that
in practice we would probably be okay if *no* further entropy was ever
collected.

 Another benefit of this split is that it will potentially allow
 arch_get_rng_seed to be made to work before alternatives are run.
 There's no fundamental reason that it couldn't work *extremely* early
 in boot.  (The KASLR code is an example of how this might work.)  On
 the other hand, making arch_get_random_long work very early in boot
 would either slow down all the other callers or add a considerable
 amount of extra complexity.
 
 So I think that this patch is a slight improvement in RNG
 initialization and will actually result in simpler code.  (And yes, if
 I submit a new version of it, I'll fix the changelog.)

There really isn't any significant reason why we could not permit
randomness initialization very early in the boot, indeed.  It has
largely been useless in the past because until the I/O system gets
initialized there is no randomness of any kind available on traditional
hardware.

-hpa




Re: [PATCH v4 2/5] random: Add and use arch_get_rng_seed

2014-07-22 Thread H. Peter Anvin
On 07/22/2014 02:04 PM, Andy Lutomirski wrote:
 
 Just to check: do you mean the RDRAND is very likely to work (i.e.
 arch_get_random_long will return true) or that RDRAND will actually
 reseed several times during initialization?
 

I mean that RDRAND will actually reseed several times during
initialization.  The documented architectural limit is actually
extremely conservative.

Either way, it isn't really different from seeding from a VM host's
/dev/urandom...

-hpa



Re: [PATCH v4 2/5] random: Add and use arch_get_rng_seed

2014-07-22 Thread H. Peter Anvin
On 07/22/2014 02:10 PM, Andy Lutomirski wrote:
 On Tue, Jul 22, 2014 at 2:08 PM, H. Peter Anvin h...@zytor.com wrote:
 On 07/22/2014 02:04 PM, Andy Lutomirski wrote:

 Just to check: do you mean the RDRAND is very likely to work (i.e.
 arch_get_random_long will return true) or that RDRAND will actually
 reseed several times during initialization?


 I mean that RDRAND will actually reseed several times during
 initialization.  The documented architectural limit is actually
 extremely conservative.

 Either way, it isn't really different from seeding from a VM host's
 /dev/urandom...
 
 Sure it is.  The VM host's /dev/urandom makes no guarantee (or AFAIK
 even any particular effort) to reseed such that the output has some
 minimum entropy per bit, so there would be no point to reading extra
 data from it.

Depends on what you define as extra data.  If the data pulled is less
than the size of the output pool, it *may* be fully entropic.

(Fun fact: it may even have been fully entropic at the time you pull it,
but then turn out not to be later because *another* process consumed
data from /dev/urandom without adequate reseeding.)

 Anyway, I'd be willing to drop the conservative RDRAND logic, but I
 *still* think that arch_get_rng_seed is a much better interface than
 arch_get_slow_rng_u64.

That I will leave up to you and Ted.

-hpa



Re: [PATCH v3 2/5] random,x86: Add arch_get_slow_rng_u64

2014-07-17 Thread H. Peter Anvin
On 07/17/2014 03:33 AM, Theodore Ts'o wrote:
 On Wed, Jul 16, 2014 at 09:55:15PM -0700, H. Peter Anvin wrote:
 On 07/16/2014 05:03 PM, Andy Lutomirski wrote:

 I meant that prandom isn't using rdrand for early seeding.


 We should probably fix that.
 
 It wouldn't hurt to explicitly use arch_get_random_long() in prandom,
 but it does use get_random_bytes() in early seed, and for CPU's with
 RDRAND present, we do use it in init_std_data() in
 drivers/char/random.c, so prandom is already getting initialized via
 an RNG (which is effectively a DRBG even if it doesn't pass all of
 NIST's rules) which is derived from RDRAND.
 

I assumed he was referring to before alternatives.  Not sure if we use
prandom before that point, though.

-hpa




Re: [PATCH 0/4] random,x86,kvm: Add and use MSR_KVM_GET_RNG_SEED

2014-07-16 Thread H. Peter Anvin
On 07/16/2014 07:07 AM, Andy Lutomirski wrote:
 
 This patch has nothing whatsoever to do with how much I trust the CPU
 vs the hypervisor.  It's for the enormous installed base of machines
 without RDRAND.
 
 hpa suggested emulating RDRAND awhile ago, but I think that'll be
 unusably slow -- the kernel uses RDRAND in various places where it's
 expected to be fast, and not using it at all will be preferable to
 causing a VM exit for every few bytes.  I've been careful to only use
 this in the guest in places where a few hundred to a few thousand
 cycles per 64 bits of RNG seed is acceptable.
 

I suggested emulating RDRAND *but not set the CPUID bit*.  We already
developed a protocol in KVM/Qemu to enumerate emulated features (created
for MOVBE as I recall), specifically to service the semantic "feature X
will work but will be substantially slower than normal."

-hpa




Re: [PATCH 0/4] random,x86,kvm: Add and use MSR_KVM_GET_RNG_SEED

2014-07-16 Thread H. Peter Anvin
On 07/16/2014 09:08 AM, Paolo Bonzini wrote:
 Il 16/07/2014 18:03, H. Peter Anvin ha scritto:
 I suggested emulating RDRAND *but not set the CPUID bit*.  We already
 developed a protocol in KVM/Qemu to enumerate emulated features (created
 for MOVBE as I recall), specifically to service the semantic "feature X
 will work but will be substantially slower than normal."
 
 But those will set the CPUID bit.  There is currently no way for KVM
 guests to know if a CPUID bit is real or emulated.
 

OK, so there wasn't any protocol implemented in the end.  I sit corrected.

-hpa




Re: [PATCH 0/4] random,x86,kvm: Add and use MSR_KVM_GET_RNG_SEED

2014-07-16 Thread H. Peter Anvin
On 07/16/2014 09:21 AM, Gleb Natapov wrote:
 On Wed, Jul 16, 2014 at 09:13:23AM -0700, H. Peter Anvin wrote:
 On 07/16/2014 09:08 AM, Paolo Bonzini wrote:
 Il 16/07/2014 18:03, H. Peter Anvin ha scritto:
 I suggested emulating RDRAND *but not set the CPUID bit*.  We already
 developed a protocol in KVM/Qemu to enumerate emulated features (created
 for MOVBE as I recall), specifically to service the semantic "feature X
 will work but will be substantially slower than normal."

 But those will set the CPUID bit.  There is currently no way for KVM
 guests to know if a CPUID bit is real or emulated.


 OK, so there wasn't any protocol implemented in the end.  I sit corrected.

 That protocol that was implemented is between qemu and kvm, not kvm and a 
 guest.
 

Either which way, the notion was to have a PV CPUID bit like the
proposed kvm_get_rng_seed bit, but to have it exercised by executing RDRAND.

The biggest reason to *not* do this would be that with an MSR it is not
available to guest user space, which may be better under the circumstances.
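
The guest-side decision this implies can be sketched as follows.  Both
inputs are hypothetical: cpuid_rdrand models the ordinary CPUID feature
bit, and pv_emulated_rdrand models the proposed (not implemented)
paravirtual bit meaning "RDRAND executes, but is emulated and slow".

```c
#include <stdbool.h>

enum rdrand_use {
	RDRAND_NONE,		/* do not execute RDRAND at all */
	RDRAND_SEED_ONLY,	/* emulated: acceptable for seeding only */
	RDRAND_FAST		/* native: usable on hot paths */
};

/*
 * Decide how RDRAND may be used.  The native CPUID bit wins; the
 * paravirtual bit only ever enables the slow, seed-only use.
 */
enum rdrand_use classify_rdrand(bool cpuid_rdrand, bool pv_emulated_rdrand)
{
	if (cpuid_rdrand)
		return RDRAND_FAST;
	if (pv_emulated_rdrand)
		return RDRAND_SEED_ONLY;
	return RDRAND_NONE;
}
```

Guest user space sees only the ordinary CPUID bit, so it would never
pick the slow emulated path on its own, though it could still execute
the instruction directly.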

-hpa




Re: [PATCH 0/4] random,x86,kvm: Add and use MSR_KVM_GET_RNG_SEED

2014-07-16 Thread H. Peter Anvin
On 07/16/2014 02:32 PM, Andy Lutomirski wrote:
 
 On the theory that I see no legitimate reason to expose this to guest
 user space, I think we shouldn't expose it.  If we wanted to add a
 get_random_bytes syscall, that would be an entirely different story,
 though.
 
 Should I send v3 as one series or should I split it into host and guest parts?
 

It doesn't matter... as long as they are separate *patches*.

-hpa


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 2/5] random,x86: Add arch_get_slow_rng_u64

2014-07-16 Thread H. Peter Anvin
On 07/16/2014 02:45 PM, Andy Lutomirski wrote:
 diff --git a/arch/x86/include/asm/archslowrng.h 
 b/arch/x86/include/asm/archslowrng.h
 new file mode 100644
 index 0000000..c8e8d0d
 --- /dev/null
 +++ b/arch/x86/include/asm/archslowrng.h
 @@ -0,0 +1,30 @@
 +/*
 + * This file is part of the Linux kernel.
 + *
 + * Copyright (c) 2014 Andy Lutomirski
 + * Authors: Andy Lutomirski l...@amacapital.net
 + *
 + * This program is free software; you can redistribute it and/or modify it
 + * under the terms and conditions of the GNU General Public License,
 + * version 2, as published by the Free Software Foundation.
 + *
 + * This program is distributed in the hope it will be useful, but WITHOUT
 + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
 + * more details.
 + */
 +
 +#ifndef ASM_X86_ARCHSLOWRANDOM_H
 +#define ASM_X86_ARCHSLOWRANDOM_H
 +
 +#ifndef CONFIG_ARCH_SLOW_RNG
 +# error archslowrng.h should not be included if !CONFIG_ARCH_SLOW_RNG
 +#endif
 +

I'm *seriously* questioning the wisdom of this.  A much saner thing
would be to do:

#ifndef CONFIG_ARCH_SLOW_RNG

/* Not supported */
static inline int arch_get_slow_rng_u64(u64 *v)
{
	(void)v;
	return 0;
}

#endif

... which is basically what we do for the archrandom stuff.

I'm also wondering if it makes sense to have a function which prefers
arch_get_random*() over this one as a preferred interface.  Something like:

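int get_random_arch_u64_slow_ok(u64 *v)
{
	int i;
	u64 x = 0;
	unsigned long l;

	/* Assemble 64 bits from one or two longs, low word first */
	for (i = 0; i < 64/BITS_PER_LONG; i++) {
		/* Fast source unavailable: fall back to the slow one */
		if (!arch_get_random_long(&l))
			return arch_get_slow_rng_u64(v);

		x |= (u64)l << (i*BITS_PER_LONG);
	}
	*v = x;
	return 1;	/* nonzero on success, like arch_get_slow_rng_u64() */
}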

This still doesn't address the issue e.g. on x86 where RDRAND is
available but we haven't set up alternatives yet.  So it might be that
what we really want is to encapsulate this fallback in arch code and do
a more direct enumeration.
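
To show the fallback shape outside the kernel, here is a small userspace
model of the "prefer the fast source, fall back to the slow one" idea.
It is a sketch under stated assumptions: the fast source is modelled as
a 32-bit word generator (as on a 32-bit kernel), success is nonzero, and
fast_fn, slow_fn and all the mock functions are invented for the
example.

```c
#include <stdint.h>

#define WORD_BITS 32	/* models BITS_PER_LONG on a 32-bit kernel */

typedef int (*fast_fn)(uint32_t *v);	/* returns 1 on success */
typedef int (*slow_fn)(uint64_t *v);	/* returns 1 on success */

/*
 * Build a 64-bit value from the fast source, low word first.  If the
 * fast source fails at any point, discard the partial result and ask
 * the slow source for the whole value instead.
 */
int get_u64_slow_ok(fast_fn fast, slow_fn slow, uint64_t *v)
{
	uint64_t x = 0;
	uint32_t w;
	int i;

	for (i = 0; i < 64 / WORD_BITS; i++) {
		if (!fast(&w))
			return slow(v);
		x |= (uint64_t)w << (i * WORD_BITS);
	}
	*v = x;
	return 1;
}

/* Demonstration-only mocks with fixed, non-random outputs. */
int fast_ok(uint32_t *v)   { *v = 0xdeadbeefu; return 1; }
int fast_fail(uint32_t *v) { (void)v; return 0; }
int slow_ok(uint64_t *v)   { *v = 42; return 1; }
int slow_none(uint64_t *v) { (void)v; return 0; }
```

Passing the sources in as callbacks is only for the demonstration; in
arch code the choice would be made by direct enumeration, which is what
would let it work before alternatives are set up.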

 +
 +static int kvm_get_slow_rng_u64(u64 *v)
 +{
 + /*
 +  * Allow migration from a hypervisor with the GET_RNG_SEED
 +  * feature to a hypervisor without it.
 +  */
 + if (rdmsrl_safe(MSR_KVM_GET_RNG_SEED, v) == 0)
 + return 1;
 + else
 + return 0;
 +}

How about:

return rdmsrl_safe(MSR_KVM_GET_RNG_SEED, v) == 0;

The naming also feels really inconsistent...

-hpa



Re: [PATCH v3 2/5] random,x86: Add arch_get_slow_rng_u64

2014-07-16 Thread H. Peter Anvin
On 07/16/2014 03:40 PM, Andy Lutomirski wrote:
 On Wed, Jul 16, 2014 at 3:13 PM, Andy Lutomirski l...@amacapital.net wrote:
 My personal preference is to defer this until some user shows up.  I
 think that even this would be too complicated for KASLR, which is the
 only extremely early-boot user that I found.

 Hmm.  Does the prandom stuff want to use this?
 
 prandom isn't even using rdrand.  I'd suggest fixing this separately,
 or even just waiting until someone goes and deletes prandom.
 

prandom is exactly the opposite; it is designed for when we need
possibly low quality random numbers very quickly.  RDRAND is actually
too slow.

-hpa




Re: [PATCH v3 2/5] random,x86: Add arch_get_slow_rng_u64

2014-07-16 Thread H. Peter Anvin
On 07/16/2014 05:03 PM, Andy Lutomirski wrote:

 prandom is exactly the opposite; it is designed for when we need
 possibly low quality random numbers very quickly.  RDRAND is actually
 too slow.
 
 I meant that prandom isn't using rdrand for early seeding.
 

We should probably fix that.

-hpa




Re: [PATCH 9/9] KVM: x86: smsw emulation is incorrect in 64-bit mode

2014-06-05 Thread H. Peter Anvin
On 06/05/2014 08:02 AM, Nadav Amit wrote:
 I'm sorry, I'm missing the place where 64-bit mode is taken into account?
 It is not, since on 32-bit mode the high-order 16 bits of a register 
 destination are undefined. 
 If I recall correctly, in this case the high-order 16-bits on native
system actually reflect the high-order 16-bits of CR0.

This sounds like something that really should be verified
experimentally.  The above claim seems... odd.

-hpa




Re: [PATCH] x86: fix page fault tracing when KVM guest support enabled

2014-05-16 Thread H. Peter Anvin
On 05/16/2014 12:45 PM, Dave Hansen wrote:
 From: Dave Hansen dave.han...@linux.intel.com
 
 I noticed on some of my systems that page fault tracing doesn't
 work:
 
   cd /sys/kernel/debug/tracing
   echo 1 > events/exceptions/enable
   cat trace;
   # nothing shows up
 
 I eventually traced it down to CONFIG_KVM_GUEST.  At least in a
 KVM VM, enabling that option breaks page fault tracing, and
 disabling fixes it.  I tried on some old kernels and this does
 not appear to be a regression: it never worked.
 
 There are two page-fault entry functions today.  One when tracing
 is on and another when it is off.  The KVM code calls do_page_fault()
 directly instead of calling the traced version:
 
 dotraplinkage void __kprobes
 do_async_page_fault(struct pt_regs *regs, unsigned long error_code)
 {
 	enum ctx_state prev_state;

 	switch (kvm_read_and_reset_pf_reason()) {
 	default:
 		do_page_fault(regs, error_code);
 		break;
 	case KVM_PV_REASON_PAGE_NOT_PRESENT:
 
 I'm also having problems with the page fault tracing on bare
 metal (same symptom of no trace output).  I'm unsure if it's
 related.
 
 Steven had an alternative to this which has zero overhead when
 tracing is off where this includes the standard noops even when
 tracing is disabled.  I'm unconvinced that the extra complexity
 of his approach:
 
   http://lkml.kernel.org/r/20140508194508.561ed...@gandalf.local.home
 
 is worth it, especially considering that the KVM code is already
 making page fault entry slower here.  This solution is
 dirt-simple.
 
 Gleb, please apply.
 
 Signed-off-by: Dave Hansen dave.han...@linux.intel.com
 Cc: Thomas Gleixner t...@linutronix.de
 Cc: x...@kernel.org
 Cc: Peter Zijlstra pet...@infradead.org
 Cc: Gleb Natapov g...@redhat.com
 Cc: H. Peter Anvin h...@zytor.com
 Cc: kvm@vger.kernel.org
 Cc: Paolo Bonzini pbonz...@redhat.com
 Cc: Steven Rostedt rost...@goodmis.org

Acked-by: H. Peter Anvin h...@linux.intel.com

If Gleb and Paolo are okay with it, I am.

-hpa






Re: x86_64 allyesconfig has screwed up voffset and blows up KVM

2014-05-05 Thread H. Peter Anvin
On 05/05/2014 11:41 AM, Andy Lutomirski wrote:
 I'm testing 39bfe90706ab0f588db7cb4d1c0e6d1181e1d2f9.  I'm not sure
 what's going on here.
 
 voffset.h contains:
 
 #define VO__end 0x8111c7a0
 #define VO__end 0x8db9a000
 #define VO__text 0x8100
 
 because
 
 $ nm vmlinux|grep ' _end'
 8111c7a0 t _end
 8db9a000 B _end
 

The "t _end" implies there is a local symbol _end which I guess the
scripts are incorrectly picking up.  Taking a look now.

-hpa




random: Providing a seed value to VM guests

2014-05-01 Thread H. Peter Anvin
On 05/01/2014 11:53 AM, Andy Lutomirski wrote:
 
 A CPUID leaf or an MSR advertised by a CPUID leaf has another
 advantage: it's easy to use in the ASLR code -- I don't think there's
 a real IDT, so there's nothing like rdmsr_safe available.  It also
 avoids doing anything complicated with the boot process to allow the
 same seed to be used for ASLR and random.c; it can just be invoked
 twice on boot.
 

At that point we are talking an x86-specific interface, and so we might
as well simply emulate RDRAND (urandom) and RDSEED (random) if the CPU
doesn't support them.  I believe KVM already has a way to report CPUID
features that are emulated but supported anyway, i.e. they work but
are slow.

 What's the right forum for this?  This thread is probably not it.

Change the subject line?

-hpa




Re: random: Providing a seed value to VM guests

2014-05-01 Thread H. Peter Anvin
The normal CPUID bit is unset I believe.

On May 1, 2014 12:02:49 PM PDT, Andy Lutomirski l...@amacapital.net wrote:
On Thu, May 1, 2014 at 11:59 AM, H. Peter Anvin h...@zytor.com wrote:
 On 05/01/2014 11:53 AM, Andy Lutomirski wrote:

 A CPUID leaf or an MSR advertised by a CPUID leaf has another
 advantage: it's easy to use in the ASLR code -- I don't think
there's
 a real IDT, so there's nothing like rdmsr_safe available.  It also
 avoids doing anything complicated with the boot process to allow the
 same seed to be used for ASLR and random.c; it can just be invoked
 twice on boot.


 At that point we are talking an x86-specific interface, and so we
might
 as well simply emulate RDRAND (urandom) and RDSEED (random) if the
CPU
 doesn't support them.  I believe KVM already has a way to report
CPUID
 features that are emulated but supported anyway, i.e. they work but
 are slow.

Do existing kernels and userspace respect this?  If the normal bit for
RDRAND is unset, then we might be okay, but, if not, then I think this
may kill guest performance.

Is RDSEED really reasonable here?  Won't it slow down by several
orders of magnitude?


 What's the right forum for this?  This thread is probably not it.

 Change the subject line?

:)


 -hpa



-- 
Sent from my mobile phone.  Please pardon brevity and lack of formatting.


Re: random: Providing a seed value to VM guests

2014-05-01 Thread H. Peter Anvin
As I said... I think KVM has already added an emulated instructions enumeration 
API.

On May 1, 2014 12:26:18 PM PDT, ty...@mit.edu wrote:
On Thu, May 01, 2014 at 12:02:49PM -0700, Andy Lutomirski wrote:
 
 Is RDSEED really reasonable here?  Won't it slow down by several
 orders of magnitude?

That is I think the biggest problem; RDRAND and RDSEED are fast if
they are native, but they will involve a VM exit if they need to be
emulated.  So when an OS might want to use RDRAND and RDSEED might be
quite different if we know they are being emulated.

Using the RDRAND and RDSEED api certainly makes sense, at least for
x86, but I suspect we might want to use a different way of signalling
that a VM guest can use RDRAND and RDSEED if they are running on a CPU
which doesn't provide that kind of access.  Maybe a CPUID extended
function parameter, if one could be allocated for use by a Linux
hypervisor?

   - Ted

-- 
Sent from my mobile phone.  Please pardon brevity and lack of formatting.


Re: random: Providing a seed value to VM guests

2014-05-01 Thread H. Peter Anvin
RDSEED is not synchronous.  It is, however, nonblocking.
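
A minimal sketch of what "nonblocking" means for a caller, assuming a seed source that either delivers a value or reports "nothing available yet" (as RDSEED does through the carry flag). The names seed_step_fn, seed_with_retry and fake_seed_step are hypothetical and used only for illustration; the stand-in source replaces the real instruction so the example does not depend on hardware support.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical callback: returns true and fills *out when a seed is
 * available, false when the source has nothing to give right now. */
typedef bool (*seed_step_fn)(uint64_t *out, void *ctx);

/* Try the source a bounded number of times; never wait indefinitely. */
static bool seed_with_retry(seed_step_fn step, void *ctx,
			    unsigned max_tries, uint64_t *out)
{
	for (unsigned i = 0; i < max_tries; i++) {
		if (step(out, ctx))
			return true;
	}
	return false;	/* caller must cope with "no seed yet" */
}

/* Stand-in source (assumption): fails *ctx times, then yields a fixed
 * value.  A real implementation would issue RDSEED here instead. */
static bool fake_seed_step(uint64_t *out, void *ctx)
{
	unsigned *failures_left = ctx;

	if (*failures_left > 0) {
		(*failures_left)--;
		return false;
	}
	*out = 0x1234ULL;
	return true;
}
```

The point of the bounded loop is that failure is reported back to the caller rather than hidden behind a wait; what the caller does next (fall back, retry later) is a policy decision outside this sketch.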

On May 1, 2014 1:16:40 PM PDT, Andy Lutomirski l...@amacapital.net wrote:
On May 1, 2014 12:26 PM, ty...@mit.edu wrote:

 On Thu, May 01, 2014 at 12:02:49PM -0700, Andy Lutomirski wrote:
 
  Is RDSEED really reasonable here?  Won't it slow down by several
  orders of magnitude?

 That is I think the biggest problem; RDRAND and RDSEED are fast if
 they are native, but they will involve a VM exit if they need to be
 emulated.  So when an OS might want to use RDRAND and RDSEED might be
 quite different if we know they are being emulated.

 Using the RDRAND and RDSEED api certainly makes sense, at least for
 x86, but I suspect we might want to use a different way of signalling
 that a VM guest can use RDRAND and RDSEED if they are running on a
CPU
 which doesn't provide that kind of access.  Maybe a CPUID extended
 function parameter, if one could be allocated for use by a Linux
 hypervisor?


I'm still not convinced.  This will affect userspace as well as the
guest kernel, and I don't see why guest user code should be able to
access this API.  RDRAND for CPL0 only would work, but that seems odd.

And I think that RDSEED emulation is asking for trouble.  RDSEED is
synchronous, but /dev/random is asynchronous.  And making bootup wait
for even a single byte from /dev/random seems bad.  In any event,
virtio-rng should be a better interface for this.

 - Ted


-- 
Sent from my mobile phone.  Please pardon brevity and lack of formatting.


Re: random: Providing a seed value to VM guests

2014-05-01 Thread H. Peter Anvin
On 05/01/2014 01:56 PM, Andy Lutomirski wrote:
 
 Even if we could emulate RDSEED effectively**, I don't really
 understand what the guest is expected to do with it.  And I generally
 dislike defining an interface with no known sensible users, because it
 means that there's a good chance that the interface won't end up
 working.

 ** Doing this sensibly in the host will be awkward.  Is the host
 supposed to use non-blocking reads of /dev/random?  Getting anything
 remotely fair may be difficult.

The host can use nonblocking reads of /dev/random.  Fairness would have
to be implemented at the host level, but that is true for anything.

-hpa




Re: random: Providing a seed value to VM guests

2014-05-01 Thread H. Peter Anvin
On 05/01/2014 03:32 PM, Andy Lutomirski wrote:
 On Thu, May 1, 2014 at 3:28 PM,  ty...@mit.edu wrote:
 On Thu, May 01, 2014 at 02:06:13PM -0700, Andy Lutomirski wrote:

 I still don't see the point.  What does this do better than virtio-rng?

 I believe you had been complaining about how complicated it was to set
 up virtio?  And this complexity is also an issue if we want to use it
 to initialize the RNG used for the kernel text ASLR --- which has to
 be done very early in the boot process, and where making something as
 simple as possible is a Good Thing.
 
 It's complicated, so it won't be up until much later in the boot
 process.  This is completely fine for /dev/random, but it's a problem
 for /dev/urandom, ASLR, and such.
 

 And since we would want to use RDRAND/RDSEED if it is available
 *anyway*, perhaps in combination with other things, why not use the
 RDRAND/RDSEED interface?
 
 Because it's awkward.  I don't think it simplifies anything.
 

It greatly simplifies discovery, which is a Big Deal[TM] in early code.

-hpa




Re: random: Providing a seed value to VM guests

2014-05-01 Thread H. Peter Anvin
On 05/01/2014 03:56 PM, Andy Lutomirski wrote:
 
 I think we're comparing:
 
 a) cpuid to detect rdrand *or* emulated rdrand followed by rdrand
 
 to
 
 b) cpuid to detect rdrand or the paravirt seed msr/cpuid call,
 followed by rdrand or the msr or cpuid read
 
 this seems like it barely makes a difference, especially since (a)
 probably requires detecting KVM anyway.

Well, it lets one do something like:

	if (boot_cpu_has(X86_FEATURE_RDRAND) ||
	    boot_cpu_has(X86_FEATURE_RDRAND_SIMULATED))
		rdrand_long(...);

We need the ifs anyway for early code; the arch_*() interfaces are only
available after alternatives run.

-hpa



Re: [PATCH 4/5] KVM: x86: RSI/RDI/RCX are zero-extended when affected by string ops

2014-04-23 Thread H. Peter Anvin
On 04/23/2014 01:53 PM, Nadav Amit wrote:

 Err, operand size is forced to 64-bits, not address size.

 The following aspects of near branches are controlled by the effective
 operand size:
   • Truncation of the size of the instruction pointer

 Still, 67h call should not truncate EIP (which your patch does).

 Yes, I missed it.
 But if I am not mistaken again, it means that the existing
 implementation of jmp_rel is broken as well when address-size override
 prefix is used. In this case, as I see it, the existing masking would
 cause the carry from the add operation to the lower half of the rip not
 to be added to the rip higher half.
 
 I guess another patch is needed for that as well.
 

Yes, on x86 JMP really should be thought of as MOV ...,IP/EIP/RIP.  On
some other architectures, e.g. m68k, JMP acts as if it was
LEA ...,PC, which causes some serious confusion for people familiar
with that model.  However, on x86 considering JMP as a MOV to the IP
register really is very consistent and will give you the right mental model.

-hpa
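
The "JMP is a MOV to IP" model can be sketched as follows, assuming the instruction pointer is written at the effective operand size and zero-extended, so the whole sum is computed first and only then truncated. The helper names assign_ip and jmp_rel are illustrative and are not claimed to match the emulator's actual code.

```c
#include <stdint.h>

/* Write a branch target to the instruction pointer at the given operand
 * size (in bytes).  Narrow sizes truncate and zero-extend, exactly like
 * a MOV to a 16- or 32-bit register would. */
static uint64_t assign_ip(uint64_t target, int op_bytes)
{
	switch (op_bytes) {
	case 2:
		return (uint16_t)target;
	case 4:
		return (uint32_t)target;
	default:
		return target;	/* 64-bit operand size: no truncation */
	}
}

/* Relative jump: add the displacement to the full IP first, so any carry
 * propagates into the upper half, then assign at operand size. */
static uint64_t jmp_rel(uint64_t ip, int64_t rel, int op_bytes)
{
	return assign_ip(ip + (uint64_t)rel, op_bytes);
}
```

Under this model, masking only the low half of the sum while preserving the old upper half would lose the carry, which is the failure mode described in the quoted message.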




Re: [PATCH] KVM: x86: Fix page-tables reserved bits

2014-04-16 Thread H. Peter Anvin
On 04/16/2014 12:03 PM, Marcelo Tosatti wrote:
 @@ -3550,9 +3550,9 @@ static void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu,
   break;
   case PT64_ROOT_LEVEL:
   context->rsvd_bits_mask[0][3] = exb_bit_rsvd |
  -rsvd_bits(maxphyaddr, 51) | rsvd_bits(7, 8);
  +rsvd_bits(maxphyaddr, 51) | rsvd_bits(7, 7);
   context->rsvd_bits_mask[0][2] = exb_bit_rsvd |
  -rsvd_bits(maxphyaddr, 51) | rsvd_bits(7, 8);
  +rsvd_bits(maxphyaddr, 51) | rsvd_bits(7, 7);
 
 Bit 7 is not reserved either, for the PDPTE (its PageSize bit).
 

In long mode (IA-32e), bit 7 is definitely reserved.

-hpa
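
To make the difference between rsvd_bits(7, 8) and rsvd_bits(7, 7) in the quoted hunk concrete, here is a small sketch of an inclusive bit-range mask helper. It is written in the shape of the rsvd_bits(s, e) calls in the patch, but it is a standalone reimplementation, and example_rsvd_mask is a hypothetical wrapper added only for illustration.

```c
#include <stdint.h>

/* Mask with bits s..e (inclusive) set. */
static inline uint64_t rsvd_bits(int s, int e)
{
	return ((1ULL << (e - s + 1)) - 1) << s;
}

/* Hypothetical wrapper: physical-address bits maxphyaddr..51 plus an
 * extra low range lo..hi, combined the way the patch combines them. */
static inline uint64_t example_rsvd_mask(int maxphyaddr, int lo, int hi)
{
	return rsvd_bits(maxphyaddr, 51) | rsvd_bits(lo, hi);
}
```

With these definitions rsvd_bits(7, 8) covers bits 7 and 8 (0x180) while rsvd_bits(7, 7) covers bit 7 alone (0x80), so the patch stops treating bit 8 as reserved at those two levels and keeps bit 7.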



Re: [PATCH v4 0/4] KVM: enable Intel SMAP for KVM

2014-04-13 Thread H. Peter Anvin
I would like to see this in 3.15.

 -hpa

On April 13, 2014 2:57:38 PM PDT, Marcelo Tosatti mtosa...@redhat.com wrote:
On Fri, Apr 11, 2014 at 08:16:28PM -0400, Paolo Bonzini wrote:
 Il 10/04/2014 16:01, Marcelo Tosatti ha scritto:
 On Tue, Apr 08, 2014 at 04:38:08PM -0400, Paolo Bonzini wrote:
 Il 07/04/2014 21:06, Wu, Feng ha scritto:
 Even though the tests do not cover the CPL=3/implicit access
case, the
 logic to compute PFERR_RSVD_MASK dynamically is already covered
by AC=1.
   So I'm quite happy with the coverage.  Series is
 
 Reviewed-by: Paolo Bonzini pbonz...@redhat.com]
 Thanks very much for your review on this.
 BTW: Since 3.15 merge window is still open, I am wondering whether
there is
 any possibility to make SMAP into 3.15 with another pull request.
 
 
 This is up to Marcelo who is currently managing the KVM tree.
 
 Paolo
 
 The merge window is for patches which have been tested in queue/next
 for sometime. This patch has received no testing other than the
 developer testing.
 
 This is not going to change unfortunately since this is not shipping
 in any real silicon.  The only hope could be to use QEMU's SVM and
 SMAP emulation.

Well, let me know if you want an exception to the rule so i should
merge this patchset and submit it for 3.15.

 
 Lack of implicit supervisor mode by instructions such as Examples
of
 such implicit... in section 9.3.2, in KVM's emulator, makes the
feature
 incomplete, does it not ?
 
 Implicit supervisor mode is handled by KVM emulator using
 read/write_std.  These accesses do not set PFERR_USER_MASK, and
 should work fine with SMAP.  Am I misunderstanding?
 
 Paolo

Right.

-- 
Sent from my mobile phone.  Please pardon brevity and lack of formatting.


Re: [PATCH 0/4] KVM: enable Intel SMAP for KVM

2014-03-27 Thread H. Peter Anvin
On 03/27/2014 04:50 AM, Paolo Bonzini wrote:
 
 You also need a matching QEMU patch to enable SMAP on Haswell CPUs (do
 all Haswells have SMAP?), though only for a 2.1 or newer machine type.
 But this can be covered later.
 

Haswell does not have SMAP (Ivy Bridge and Haswell do have SMEP, however.)

-hpa




Re: GPF in intel_pmu_lbr_reset() with qemu -cpu host

2014-03-22 Thread H. Peter Anvin
Using _safe has its own issues if no one checks the errors.
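
A rough illustration of that concern, assuming a "safe" accessor that returns an error code instead of faulting. fake_rdmsr_safe is a stand-in written for this sketch (it is not the kernel's rdmsr_safe), and the MSR numbers and values in it are made up, apart from 0x680, which is the MSR reported as unhandled later in this thread.

```c
#include <errno.h>
#include <stdint.h>

/* Stand-in for a fault-tolerant MSR read (assumption): pretends only
 * MSR 0x10 exists and reports -EIO for everything else. */
static int fake_rdmsr_safe(uint32_t msr, uint64_t *val)
{
	if (msr == 0x10) {
		*val = 42;
		return 0;
	}
	return -EIO;
}

/* The caller checks the return value; ignoring it would leave 'val'
 * uninitialized and silently propagate garbage. */
static uint64_t read_msr_or_default(uint32_t msr, uint64_t fallback)
{
	uint64_t val;

	if (fake_rdmsr_safe(msr, &val))
		return fallback;
	return val;
}
```

A _safe variant only converts a crash into an error code; if the caller drops that code, the failure is hidden rather than handled.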

On March 22, 2014 5:27:59 AM PDT, Gleb Natapov g...@kernel.org wrote:
On Sat, Mar 22, 2014 at 11:05:03AM +0100, Peter Wu wrote:
 On Saturday 22 March 2014 10:50:45 Gleb Natapov wrote:
  On Fri, Mar 21, 2014 at 12:04:32PM -0700, Venkatesh Srinivas wrote:
   On Fri, Mar 21, 2014 at 10:46 AM, Peter Wu pe...@lekensteyn.nl
wrote:
  [skip]
  
   When -cpu host is used, qemu/kvm passed the host CPUID F/M/S to
the
   guest. intel_pmu_cpu_*() - intel_pmu_lbr_reset() uses rdmsr() /
   wrmsr(), rather than the safe variants; if KVM does not support
the
   particular MSRs in question, you will see a #GP(0) there. See
   https://lkml.org/lkml/2014/3/13/453 for a similar bug other PMU
code.
   
  When kernel is compiled with guest support all rdmsr()/wrmsr()
become _safe(),
  so the question for Peter is if his guest kernel has guest support
enabled?
 
 Linux guest support (CONFIG_HYPERVISOR_GUEST) was not enabled, see
 .config in the first mail[1]. Enabling that option does not change
the
 situation.
 
 With CONFIG_PARAVIRT and CONFIG_KVM_GUEST enabled, the PMU GPF is
gone,
Yeah, it should be PARAVIRT indeed since rdmsr()/wrmsr() is substituted
by _safe()
using paravirt calls.

 but now I have a NULL dereference (in rapl_pmu_init). Previously,
when
 `-cpu SandyBridge` was passed to qemu, it would show this:
 
 [0.016995] Performance Events: unsupported p6 CPU model 42 no
PMU driver, software events only.
 
 The same NULL pointer deref would be visible (slightly different
 addresses, but the Code lines are equal). With `-host`, the NULL
deref
 with `-cpu host` contains:
 
 [0.016445] Performance Events: 16-deep LBR, IvyBridge events,
Intel PMU driver.
 
 Full dmesg below.
 
I am confused. Do you see crash now with -cpu SandyBridge and -cpu
host, or -cpu host only?

--
   Gleb.

-- 
Sent from my mobile phone.  Please pardon brevity and lack of formatting.


Re: GPF in intel_pmu_lbr_reset() with qemu -cpu host

2014-03-21 Thread H. Peter Anvin
Calling this a bug in the PMU code is ridiculous.  If KVM tells the system it 
is a specific vendor-family-model-stepping but diverges in behavior then it, by 
definition, is broken.

On March 21, 2014 12:04:32 PM PDT, Venkatesh Srinivas venkate...@google.com 
wrote:
On Fri, Mar 21, 2014 at 10:46 AM, Peter Wu pe...@lekensteyn.nl wrote:
 cc'ing kvm people and list.

 On Friday 21 March 2014 18:42:40 Peter Wu wrote:
 Hi,

 While trying to run QEMU with `-enable-kvm -cpu host`, I get a GPF
in
 intel_pmu_lbr_reset():

 [0.024000] general protection fault:  [#1]
 [0.024000] CPU: 0 PID: 1 Comm: swapper Not tainted
3.14.0-rc7-qemu-00059-g08edb33 #14
 [0.024000] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
 [0.024000] task: 88003e05 ti: 88003e054000 task.ti:
88003e054000
 [0.024000] RIP: 0010:[8101148a]  [8101148a]
intel_pmu_lbr_reset+0x2a/0x80
 [0.024000] RSP: :88003e055e78  EFLAGS: 0002
 [0.024000] RAX:  RBX: 0286 RCX:
0680
 [0.024000] RDX:  RSI:  RDI:

 [0.024000] RBP: 81622120 R08: 88003ffee0e0 R09:
88003e00bf00
 [0.024000] R10:  R11: 0004 R12:

 [0.024000] R13:  R14:  R15:

 [0.024000] FS:  () GS:8161e000()
knlGS:
 [0.024000] CS:  0010 DS:  ES:  CR0: 80050033
 [0.024000] CR2: 8800019bb000 CR3: 01611000 CR4:
001407b0
 [0.024000] Stack:
 [0.024000]  8101308a 8100e3da 8165ba62

 [0.024000]  8165b5bd  

 [0.024000]  81655dcd  

 [0.024000] Call Trace:
 [0.024000]  [8101308a] ?
intel_pmu_cpu_starting+0xa/0x80
 [0.024000]  [8100e3da] ? x86_pmu_notifier+0x5a/0xc0
 [0.024000]  [8165ba62] ?
init_hw_perf_events+0x4a5/0x4dd
 [0.024000]  [8165b5bd] ? check_bugs+0x42/0x42
 [0.024000]  [81655dcd] ? do_one_initcall+0x76/0xf9
 [0.024000]  [81276b70] ? rest_init+0x70/0x70
 [0.024000]  [81655ea7] ?
kernel_init_freeable+0x57/0x177
 [0.024000]  [81276b70] ? rest_init+0x70/0x70
 [0.024000]  [81276b75] ? kernel_init+0x5/0xe0
 [0.024000]  [8128067a] ? ret_from_fork+0x7a/0xb0
 [0.024000]  [81276b70] ? rest_init+0x70/0x70
 [0.024000] Code: 00 8b 15 02 c4 63 00 85 d2 74 69 f6 05 af c3 63
00 3f 75 2d 85 d2 7e 5c 31 f6 31 c0 0f 1f 44 00 00 8b 0d d2 c3 63 00 89
c2 01 f1 0f 30 83 c6 01 3b 35 d3 c3 63 00 7c e9 f3 c3 0f 1f 80 00 00
00
 [0.024000] RIP  [8101148a]
intel_pmu_lbr_reset+0x2a/0x80
 [0.024000]  RSP 88003e055e78
 [0.024000] ---[ end trace ecbd794f78441b2c ]---
 [0.024002] Kernel panic - not syncing: Attempted to kill init!
exitcode=0x000b


 It possibly has something to do with the msr write. Reproducible
with:

 qemu-system-x86_64 -enable-kvm -cpu host -kernel bzImage -m 1G
-serial file:ser.txt

 In the host dmesg, the following is visible when qemu:

 kvm [4939]: vcpu0 unhandled wrmsr: 0x680 data 0

 The full guest dmesg is shown below. The issue occurs also with
 v3.13.6, v3.12.14, v3.10.33 (other versions are not tested).

 QEMU: 1.7.0
 Host kernel: v3.14-rc5
 Guest kernel: v3.14-rc7-59-g08edb33 (.config on the bottom)

 Kind regards,
 Peter

 ### dmesg
 [0.00] Linux version 3.14.0-rc7-qemu-00059-g08edb33
(pc@antartica) (gcc version 4.8.2 (Ubuntu 4.8.2-16ubuntu6) ) #14 Fri
Mar 21 17:30:49 CET 2014
 [0.00] Command line: console=ttyS0 loglevel=8
 [0.00] KERNEL supported cpus:
 [0.00]   Intel GenuineIntel
 [0.00] e820: BIOS-provided physical RAM map:
 [0.00] BIOS-e820: [mem
0x-0x0009fbff] usable
 [0.00] BIOS-e820: [mem
0x0009fc00-0x0009] reserved
 [0.00] BIOS-e820: [mem
0x000f-0x000f] reserved
 [0.00] BIOS-e820: [mem
0x0010-0x3fffdfff] usable
 [0.00] BIOS-e820: [mem
0x3fffe000-0x3fff] reserved
 [0.00] BIOS-e820: [mem
0xfeffc000-0xfeff] reserved
 [0.00] BIOS-e820: [mem
0xfffc-0x] reserved
 [0.00] NX (Execute Disable) protection: active
 [0.00] SMBIOS 2.4 present.
 [0.00] DMI: Bochs Bochs, BIOS Bochs 01/01/2011
 [0.00] e820: update [mem 0x-0x0fff] usable ==
reserved
 [0.00] e820: remove [mem 0x000a-0x000f] usable
 [0.00] e820: last_pfn = 0x3fffe max_arch_pfn = 0x4
 [0.00] MTRR default type: write-back
 [0.00] MTRR fixed ranges enabled:
 [0.00]   0-9 

Re: [PATCH v7 06/11] pvqspinlock, x86: Allow unfair queue spinlock in a KVM guest

2014-03-20 Thread H. Peter Anvin
On 03/20/2014 03:01 PM, Paolo Bonzini wrote:
 
 No!  Please do what I asked you to do.  You are not handling Hyper-V or
 VMWare.  Just use X86_FEATURE_HYPERVISOR and it will cover all
 hypervisors that actually follow Intel's guidelines.
 

And for those that don't, we should turn on X86_FEATURE_HYPERVISOR in
the Linux enumeration code as we do for other features we detect
independently.

-hpa
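
As a sketch of the check being discussed: the hypervisor-present flag is reported in bit 31 of ECX for CPUID leaf 1. The helpers below operate on an ECX value passed in by the caller, so they do not execute CPUID themselves; force_hypervisor_bit is a hypothetical illustration of turning the bit on when a hypervisor was detected by other means, not the kernel's actual mechanism.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID.1:ECX bit 31, the hypervisor-present flag. */
#define CPUID_1_ECX_HYPERVISOR (1U << 31)

/* True if the given CPUID.1:ECX value advertises a hypervisor. */
static bool ecx_has_hypervisor(uint32_t ecx)
{
	return (ecx & CPUID_1_ECX_HYPERVISOR) != 0;
}

/* Hypothetical: set the flag in a cached feature word when a hypervisor
 * that does not advertise itself was detected independently. */
static uint32_t force_hypervisor_bit(uint32_t ecx, bool detected_independently)
{
	return detected_independently ? (ecx | CPUID_1_ECX_HYPERVISOR) : ecx;
}
```

Keeping one feature bit as the single source of truth means later code needs only one test, regardless of which hypervisor is underneath.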




Re: [PATCH 0/2] KVM: x86 emulator: emulate MOVAPS and MOVAPD SSE instructions

2014-03-17 Thread H. Peter Anvin
After seeing the sheer number of one-off additions, I'm wondering if going 
through the opcode map systematically and seeing what is still missing might 
not be a bad idea.

On March 17, 2014 2:30:43 AM PDT, Paolo Bonzini pbonz...@redhat.com wrote:
Il 15/03/2014 23:42, H. Peter Anvin ha scritto:
 Stupid question... what instructions do NOT need emulation in KVM? It
would seem that at least anything that touches memory would?

Yes, indeed.  Anything that touches memory can be used on MMIO and then

needs emulation.

Paolo

 On March 15, 2014 1:01:58 PM PDT, Igor Mammedov imamm...@redhat.com
wrote:
 MS HCK test fails on 32-bit Windows 8.1 due to missing MOVAPS
 instruction emulation, this series adds it and while at it,
 it adds emulation of MOVAPD which is trivial to implement on
 top of MOVAPS.

 Igor Mammedov (2):
  KVM: x86 emulator: emulate MOVAPS
  KVM: x86 emulator: emulate MOVAPD

 arch/x86/kvm/emulate.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)


-- 
Sent from my mobile phone.  Please pardon brevity and lack of formatting.


Re: [PATCH 0/2] KVM: x86 emulator: emulate MOVAPS and MOVAPD SSE instructions

2014-03-17 Thread H. Peter Anvin
On 03/17/2014 10:01 AM, Paolo Bonzini wrote:
 the emulator).
 
 If CS and possibly SS are valid real mode selectors, it should be
 possible to run big real mode at almost-full speed, taking exits only
 for memory accesses via other segment registers.  It is on my todo list,
 but not very high.  Depending on the exit overhead, it may be a better
 idea to revert the emulate_invalid_guest_state default to N and let
 people who care about big real mode specify Y.
 

I'm not sure what you mean with valid real mode selectors; the normal
case in big real mode is that either CS = SS = 0 or CS = SS = some
program base address.

As Big Real Mode is part of the spec for certain things (option ROMs, as
we discussed) it probably matters, but especially with the CPUs not
supporting unrestricted mode fading into history I suspect it is fine
for BRM to be slow on those older processors.

The PM transitions that you mentioned are usually only a handful of
instructions and thus can be slow.

-hpa




Re: [PATCH 0/2] KVM: x86 emulator: emulate MOVAPS and MOVAPD SSE instructions

2014-03-15 Thread H. Peter Anvin
MOVAPS, MOVAPD, and MOVDQA are the same operation.  They may, architecturally, 
have different performance characteristics, but nothing that would affect an 
emulator.

On March 15, 2014 1:01:58 PM PDT, Igor Mammedov imamm...@redhat.com wrote:
MS HCK test fails on 32-bit Windows 8.1 due to missing MOVAPS
instruction emulation, this series adds it and while at it,
it adds emulation of MOVAPD which is trivial to implement on
top of MOVAPS.

Igor Mammedov (2):
  KVM: x86 emulator: emulate MOVAPS
  KVM: x86 emulator: emulate MOVAPD

 arch/x86/kvm/emulate.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

-- 
Sent from my mobile phone.  Please pardon brevity and lack of formatting.


Re: [PATCH 0/2] KVM: x86 emulator: emulate MOVAPS and MOVAPD SSE instructions

2014-03-15 Thread H. Peter Anvin
Stupid question... what instructions do NOT need emulation in KVM? It would seem 
that at least anything that touches memory would?

On March 15, 2014 1:01:58 PM PDT, Igor Mammedov imamm...@redhat.com wrote:
MS HCK test fails on 32-bit Windows 8.1 due to missing MOVAPS
instruction emulation, this series adds it and while at it,
it adds emulation of MOVAPD which is trivial to implement on
top of MOVAPS.

Igor Mammedov (2):
  KVM: x86 emulator: emulate MOVAPS
  KVM: x86 emulator: emulate MOVAPD

 arch/x86/kvm/emulate.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

-- 
Sent from my mobile phone.  Please pardon brevity and lack of formatting.


Re: [qemu64,+smep,+smap] Kernel panic - not syncing: No working init found.

2014-02-13 Thread H. Peter Anvin
On 02/13/2014 04:45 AM, Fengguang Wu wrote:
 Greetings,
 
 I find that when running
 
 qemu-system-x86_64 -cpu qemu64,+smep,+smap
 
 Some kernels will 100% produce this error, where the error code
 -13,-14 are -EACCES and -EFAULT:
 
 Any ideas?
 

I notice this is a non-SMAP kernel:

# CONFIG_X86_SMAP is not set

If the kernel turns on SMAP in CR4 even though SMAP isn't enabled in the
kernel, that is a kernel bug.  If Qemu enforces SMAP even if it is
turned off in CR4, that would be a Qemu bug.  I have reproduced the
failure locally and am considering both possibilities now.

-hpa
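
The rule stated above can be sketched as a pure function: CR4.SMAP (bit 21) should end up set only when the CPU advertises SMAP and the kernel was built with SMAP support. The function name setup_smap_bit and its boolean parameters are assumptions for illustration; real kernel code reads and writes CR4 and consults its own feature and config machinery.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR4_SMEP (1ULL << 20)	/* Supervisor Mode Execution Prevention */
#define CR4_SMAP (1ULL << 21)	/* Supervisor Mode Access Prevention */

/* Return the CR4 value to load: SMAP is enabled only if both the CPU
 * feature and the build-time option are present, and cleared otherwise. */
static uint64_t setup_smap_bit(uint64_t cr4, bool cpu_has_smap, bool config_smap)
{
	if (cpu_has_smap && config_smap)
		return cr4 | CR4_SMAP;
	return cr4 & ~CR4_SMAP;
}
```

Clearing the bit explicitly in the "not configured" case covers the situation where something earlier in boot left it set.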



Re: [qemu64,+smep,+smap] Kernel panic - not syncing: No working init found.

2014-02-13 Thread H. Peter Anvin
On 02/13/2014 06:55 AM, H. Peter Anvin wrote:
 On 02/13/2014 04:45 AM, Fengguang Wu wrote:
 Greetings,

 I find that when running

 qemu-system-x86_64 -cpu qemu64,+smep,+smap

 Some kernels will 100% produce this error, where the error code
 -13,-14 are -EACCES and -EFAULT:

 Any ideas?

 
 I notice this is a non-SMAP kernel:
 
 # CONFIG_X86_SMAP is not set
 
 If the kernel turns on SMAP in CR4 even though SMAP isn't enabled in the
 kernel, that is a kernel bug.  If Qemu enforces SMAP even if it is
 turned off in CR4, that would be a Qemu bug.  I have reproduced the
 failure locally and am considering both possibilities now.
 

So we do turn on the bit in CR4 even with SMAP compiled out.  This is a
bug.  However, I still get the same failure even with that bug fixed
(and qemu info registers verify that it is, indeed, not set) so I'm
wondering if there is a bug in Qemu as well.  However, staring at the
code in Qemu I don't see where that bug would be...

-hpa




Re: [qemu64,+smep,+smap] Kernel panic - not syncing: No working init found.

2014-02-13 Thread H. Peter Anvin
On 02/13/2014 06:55 AM, H. Peter Anvin wrote:
 On 02/13/2014 04:45 AM, Fengguang Wu wrote:
 Greetings,

 I find that when running

 qemu-system-x86_64 -cpu qemu64,+smep,+smap

 Some kernels will 100% produce this error, where the error code
 -13,-14 are -EACCES and -EFAULT:

 Any ideas?

 
 I notice this is a non-SMAP kernel:
 
 # CONFIG_X86_SMAP is not set
 
 If the kernel turns on SMAP in CR4 even though SMAP isn't enabled in the
 kernel, that is a kernel bug.  If Qemu enforces SMAP even if it is
 turned off in CR4, that would be a Qemu bug.  I have reproduced the
 failure locally and am considering both possibilities now.
 

No, it is simply a second kernel bug.  I have patches for both and will
push them momentarily.

-hpa




Re: [PATCH v3 1/4] KVM/X86: Fix xsave cpuid exposing bug

2014-01-22 Thread H. Peter Anvin
On 01/22/2014 02:21 AM, Paolo Bonzini wrote:
 Il 21/01/2014 19:59, Liu, Jinsong ha scritto:
 From 3155a190ce6ebb213e6c724240f4e6620ba67a9d Mon Sep 17 00:00:00 2001
 From: Liu Jinsong jinsong@intel.com
 Date: Fri, 13 Dec 2013 02:32:03 +0800
 Subject: [PATCH v3 1/4] KVM/X86: Fix xsave cpuid exposing bug

 EBX of cpuid(0xD, 0) is dynamic per XCR0 features enable/disable.
 Bit 63 of XCR0 is reserved for future expansion.

 Signed-off-by: Liu Jinsong jinsong@intel.com
 
 Peter, can I have your acked-by on this?
 

Yes.

Acked-by: H. Peter Anvin h...@linux.intel.com



Re: [PATCH 2/3] X86, mpx: Intel MPX definition

2013-12-06 Thread H. Peter Anvin
No... we always ask for cpufeature.h patches separately because they sometimes 
cause conflicts between branches.

Borislav Petkov b...@alien8.de wrote:
On Sat, Dec 07, 2013 at 02:52:55AM +0800, Qiaowei Ren wrote:
 
 Signed-off-by: Qiaowei Ren qiaowei@intel.com
 Signed-off-by: Xudong Hao xudong@intel.com
 Signed-off-by: Liu Jinsong jinsong@intel.com
 ---
  arch/x86/include/asm/cpufeature.h |2 ++
  1 files changed, 2 insertions(+), 0 deletions(-)

This patch should probably be merged with the next one...

 
 diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
 index d3f5c63..6c2738d 100644
 --- a/arch/x86/include/asm/cpufeature.h
 +++ b/arch/x86/include/asm/cpufeature.h
 @@ -216,6 +216,7 @@
  #define X86_FEATURE_ERMS    (9*32+ 9) /* Enhanced REP MOVSB/STOSB */
  #define X86_FEATURE_INVPCID (9*32+10) /* Invalidate Processor Context ID */
  #define X86_FEATURE_RTM     (9*32+11) /* Restricted Transactional Memory */
 +#define X86_FEATURE_MPX     (9*32+14) /* Memory Protection Extension */
  #define X86_FEATURE_RDSEED  (9*32+18) /* The RDSEED instruction */
  #define X86_FEATURE_ADX     (9*32+19) /* The ADCX and ADOX instructions */
  #define X86_FEATURE_SMAP    (9*32+20) /* Supervisor Mode Access Prevention */
 @@ -330,6 +331,7 @@ extern const char * const x86_power_flags[32];
  #define cpu_has_perfctr_l2  boot_cpu_has(X86_FEATURE_PERFCTR_L2)
  #define cpu_has_cx8         boot_cpu_has(X86_FEATURE_CX8)
  #define cpu_has_cx16        boot_cpu_has(X86_FEATURE_CX16)
 +#define cpu_has_mpx         boot_cpu_has(X86_FEATURE_MPX)

... and we're trying to not have more of those macros so people should
be simply using boot_cpu_has(X86_FEATURE_YYY).

-- 
Sent from my mobile phone.  Please pardon brevity and lack of formatting.
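
The `(9*32+14)` form in the quoted hunk encodes a capability word and a bit within that word. A minimal sketch of that encoding follows; the `EX_` names and the `cap_set`/`cap_has` helpers are hypothetical stand-ins for illustration and are not the kernel's `boot_cpu_has()` implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* A feature number is word * 32 + bit, as in the quoted #defines. */
#define EX_FEATURE(word, bit) ((word) * 32 + (bit))
#define EX_FEATURE_MPX        EX_FEATURE(9, 14)
#define EX_FEATURE_SMAP       EX_FEATURE(9, 20)

/* Hypothetical helpers operating on an array of 32-bit capability words. */
void cap_set(uint32_t *caps, unsigned int feature)
{
    caps[feature / 32] |= 1u << (feature % 32);
}

bool cap_has(const uint32_t *caps, unsigned int feature)
{
    return (caps[feature / 32] >> (feature % 32)) & 1u;
}
```

Because every feature is addressed through one generic test, a per-feature wrapper macro such as `cpu_has_mpx` adds no information, which is the point made in the quoted review comment.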


Re: [Qemu-devel] [PATCH v3 0/2] Intel MPX feature support at Qemu

2013-12-06 Thread H. Peter Anvin
On 12/06/2013 08:27 AM, Liu, Jinsong wrote:
 Eric Blake wrote:
 On 12/06/2013 07:06 AM, Liu, Jinsong wrote:
 Intel has released Memory Protection Extensions (MPX) recently.
 Please refer to
 http://download-software.intel.com/sites/default/files/319433-015.pdf

 These 2 patches are version2 to support Intel MPX at qemu side.

 You still aren't threading correctly, which makes it hard to track
 your series.  Please review
 http://wiki.qemu.org/Contribute/SubmitAPatch and make sure your 'git
 send-email' settings allow for proper threading; a good way to test
 that is to first send the patch series to yourself to ensure your
 environment is set up correctly. 
 
 Thanks Blake! will take care and learn using git send-email when I send 
 patches later (i.e. kvm mpx patches).
 

Not to mention that Linux kernel patches should be Cc:'d to
linux-ker...@vger.kernel.org.

-hpa




Re: [PATCH 3/3] X86, mpx: Intel MPX xstate feature definition

2013-12-06 Thread H. Peter Anvin
On 12/06/2013 05:46 AM, Borislav Petkov wrote:
 
 I'm guessing this and the struct lwp_struct above is being added so that
 you can have the LWP XSAVE area size? If so, you don't need it: LWP
 XSAVE area is 128 bytes at offset 832 according to my manuals so I'd
 guess having a u8 lwp_area[128] should be fine.
 

Sure, but any reason to *not* document the internal structure?

 
 +struct bndregs_struct bndregs;
 +struct bndcsr_struct bndcsr;
  /* new processor state extensions will go here */
  } __attribute__ ((packed, aligned (64)));
  
 diff --git a/arch/x86/include/asm/xsave.h b/arch/x86/include/asm/xsave.h
 index 0415cda..5cd9de3 100644
 --- a/arch/x86/include/asm/xsave.h
 +++ b/arch/x86/include/asm/xsave.h
 @@ -9,6 +9,8 @@
  #define XSTATE_FP       0x1
  #define XSTATE_SSE      0x2
  #define XSTATE_YMM      0x4
 +#define XSTATE_BNDREGS  0x8
 +#define XSTATE_BNDCSR   0x10
  
  #define XSTATE_FPSSE    (XSTATE_FP | XSTATE_SSE)
  
 @@ -20,10 +22,12 @@
  #define XSAVE_YMM_SIZE      256
  #define XSAVE_YMM_OFFSET    (XSAVE_HDR_SIZE + XSAVE_HDR_OFFSET)
  
 +#define XSTATE_FLEXIBLE (XSTATE_FP | XSTATE_SSE | XSTATE_YMM)
 
 What's the use of that macro if it is used only once?

Documentation seems good enough.  Explicitly separating out the features
which MUST be eagerly saved seems like a good thing.

 +#define XSTATE_EAGER    (XSTATE_BNDREGS | XSTATE_BNDCSR)
  /*
   * These are the features that the OS can handle currently.
   */
 -#define XCNTXT_MASK     (XSTATE_FP | XSTATE_SSE | XSTATE_YMM)
 +#define XCNTXT_MASK     (XSTATE_FLEXIBLE | XSTATE_EAGER)
  

-hpa
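
As a small illustration of the split being defended here, the sketch below reuses the mask values from the quoted hunk and adds one hypothetical helper, `xstate_needs_eager`, that reports which enabled features fall in the must-save-eagerly group.

```c
#include <stdint.h>

/* Values as in the quoted xsave.h hunk. */
#define XSTATE_FP       0x1
#define XSTATE_SSE      0x2
#define XSTATE_YMM      0x4
#define XSTATE_BNDREGS  0x8
#define XSTATE_BNDCSR   0x10

#define XSTATE_FLEXIBLE (XSTATE_FP | XSTATE_SSE | XSTATE_YMM)
#define XSTATE_EAGER    (XSTATE_BNDREGS | XSTATE_BNDCSR)
#define XCNTXT_MASK     (XSTATE_FLEXIBLE | XSTATE_EAGER)

/* Hypothetical helper: the subset of enabled features that must be
 * saved eagerly; zero means lazy saving is still permissible. */
uint64_t xstate_needs_eager(uint64_t enabled)
{
    return enabled & XSTATE_EAGER;
}
```

Keeping the two groups as named masks documents the constraint at the definition site, even if each macro is referenced only once.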




Re: [PATCH v2 3/3] X86, mpx: Intel MPX xstate feature definition

2013-12-06 Thread H. Peter Anvin
On 12/06/2013 09:35 AM, Paolo Bonzini wrote:
 
 Sorry for the back-and-forth, but I think this and the removal of
 XSTATE_FLEXIBLE (perhaps XSTATE_LAZY?) makes your v2 worse than v1.
 
 Since Peter already said the same, please undo these changes.
 
 Also, how is XSTATE_EAGER used?  Should MPX be disabled when xsaveopt is
 disabled on the kernel command line?  (Liu, how would this affect the
 KVM patches, too?)
 

There are two options: we could disable MPX etc. or we could force eager
saving (using xsave) even if xsaveopt is disabled.  It is a hard call to
make, but I guess I'm leaning towards the latter; we could add a
"lazyxsave" option to explicitly disable all eager features if there is
use for that.

-hpa




Re: [Qemu-devel] [PATCH v2 3/3] X86, mpx: Intel MPX xstate feature definition

2013-12-06 Thread H. Peter Anvin
On 12/06/2013 12:05 PM, Liu, Jinsong wrote:

 Since Peter already said the same, please undo these changes.

 Also, how is XSTATE_EAGER used?  Should MPX be disabled when xsaveopt
 is disabled on the kernel command line?  (Liu, how would this affect
 the KVM patches, too?)

 Paolo
 
 Currently seems no, and if needed we can add a new patch at kvm side 
 accordingly when native mpx patches checked in.
 

We need to either disable these features in lazy mode, or we need to
force eager mode if these features are to be supported.  The problem
with the latter is that it means forcing eager mode regardless of if
anything actually *uses* these features.

A third option would be to require applications to use a prctl() or
similar to enable eager-save features.

Thoughts?

-hpa
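
The three options laid out above can be compared side by side in a short sketch. Everything except the XSTATE_* values is hypothetical (the enum, the struct, and `choose_fpu_mode`); it only models the decision, not how the kernel implements FPU switching.

```c
#include <stdbool.h>
#include <stdint.h>

#define XSTATE_BNDREGS 0x8
#define XSTATE_BNDCSR  0x10
#define XSTATE_EAGER   (XSTATE_BNDREGS | XSTATE_BNDCSR)

/* Hypothetical names for the three options in the message above. */
enum eager_policy { DISABLE_WHEN_LAZY, FORCE_EAGER, TASK_OPT_IN };

struct fpu_mode {
    bool eager;         /* save/restore FPU state eagerly */
    uint64_t features;  /* xstate features left enabled */
};

struct fpu_mode choose_fpu_mode(enum eager_policy policy, uint64_t supported,
                                bool lazy_requested, bool task_opted_in)
{
    struct fpu_mode m = { .eager = !lazy_requested, .features = supported };
    bool has_eager = (supported & XSTATE_EAGER) != 0;

    switch (policy) {
    case DISABLE_WHEN_LAZY:
        /* Option 1: lazy mode wins, eager-only features are masked out. */
        if (lazy_requested)
            m.features &= ~(uint64_t)XSTATE_EAGER;
        break;
    case FORCE_EAGER:
        /* Option 2: features win, even if nothing ever uses them. */
        if (has_eager)
            m.eager = true;
        break;
    case TASK_OPT_IN:
        /* Option 3: eager only for tasks that asked, e.g. via prctl(). */
        if (has_eager && task_opted_in)
            m.eager = true;
        else if (lazy_requested)
            m.features &= ~(uint64_t)XSTATE_EAGER;
        break;
    }
    return m;
}
```

The sketch makes the cost of the second option visible: with MPX-capable hardware it returns eager mode for every caller, regardless of whether any task touches the bounds registers.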



Re: [Qemu-devel] [PATCH v2 3/3] X86, mpx: Intel MPX xstate feature definition

2013-12-06 Thread H. Peter Anvin
On 12/06/2013 04:23 PM, Ren, Qiaowei wrote:

 We need to either disable these features in lazy mode, or we need to
 force eager mode if these features are to be supported.  The problem
 with the latter is that it means forcing eager mode regardless of if
 anything actually *uses* these features.

 A third option would be to require applications to use a prctl() or
 similar to enable eager-save features.


 The third option seems better -- how does native mpx patches work, force
 eager?

 It should be the second option, as you can see xsave.c which we remove from 
 this patch. :)
 

Ah yes... I missed the fact that that chunk had been dropped from this
patch.  It really shouldn't be.

I'll substitute the previous version of the patch.

-hpa



Re: [Qemu-devel] [PATCH v2 3/3] X86, mpx: Intel MPX xstate feature definition

2013-12-06 Thread H. Peter Anvin
On 12/06/2013 05:16 PM, Ren, Qiaowei wrote:
 Jinsong thinks that both kvm and host depend on these feature definition
 header files, so we submitted the files they depend on first.

Yes, but we can't turn on the feature without proper protection.  Either
way, they are now in tip:x86/cpufeature.

-hpa




Re: [PATCH 1/4] X86: Intel MPX definiation

2013-12-05 Thread H. Peter Anvin
On 12/05/2013 08:08 AM, Paolo Bonzini wrote:
 On 02/12/2013 17:43, Liu, Jinsong wrote:
 From fbfa537f690eca139a96c6b2636ab5130bf57716 Mon Sep 17 00:00:00 2001
 From: Liu Jinsong jinsong@intel.com
 Date: Fri, 29 Nov 2013 01:27:00 +0800
 Subject: [PATCH 1/4] X86: Intel MPX definiation

 Signed-off-by: Xudong Hao xudong@intel.com
 Signed-off-by: Liu Jinsong jinsong@intel.com
 ---
  arch/x86/include/asm/cpufeature.h |2 ++
  arch/x86/include/asm/xsave.h  |5 -
  2 files changed, 6 insertions(+), 1 deletions(-)

 
 hpa/Ingo/Thomas, can you give your Acked-by for this patch?
 
 I'm not sure of the consequences of changing XCNTXT_MASK.  This series
 (which was submitted with the wrong threading) wants it so that KVM can
 use fpu_save_init and fpu_restore_checking to save and restore the MPX
 state of the guest.
 

Hi, I'm currently reviewing internally another set of patches for MPX
support which would at least in part conflict with these.  I don't see
the rest of the series -- where was it posted?

Either way:

1. asm/cpufeatures.h patches should always be separate, as we put those
into a special branch into the -tip tree since they touch so many other
things.

2. Enabling MPX is only safe with XSTATE_EAGER, which Qiaowei's patchset
has done correctly.

-hpa



Re: [PATCH, RFC] x86-64: properly handle FPU code/data selectors

2013-10-16 Thread H. Peter Anvin
On 10/16/2013 05:00 AM, Jan Beulich wrote:
 Having had reports of certain Windows versions, when put in some
 special driver verification mode, blue-screening due to the FPU state
 having changed across interrupt handler runs (resulting from a host/
 hypervisor side context switch somewhere in the middle of the guest
 interrupt handler execution) on Xen, and assuming that KVM would suffer
 from the same problem, as well as having also noticed (long ago) that
 32-bit processes don't behave correctly in this regard when run on a
 64-bit kernel, this is the resulting attempt to port (and suitably
 extend) the Xen side fix to Linux.
 
 The basic idea here is to either use a priori information on the
 intended state layout (in the case of 32-bit processes) or sense the
 proper layout (in the case of KVM guests) by inspecting the already
 saved FPU rip/rdp, and reading their actual values in a second save
 operation.
 
 This second save operation could be another [F]XSAVE, but on all
 systems I measured this on using FNSTENV turned out to be the faster
 alternative.

It is not at all clear to me from the description what the flow is that
causes the problem, whatever the problem is.  Perhaps it should be if I
wasn't horribly sleep-deprived, but the description should be clear
enough that one should be able to tell the problem at a glance.

Please describe the flow that causes trouble.

Is this basically a problem with the 32-bit version of FXSAVE versus the
64-bit version?

Furthermore, you define X86_FEATURE_NO_FPU_SEL, but you don't set it
anywhere.  At least that bit needs to be factored out into a separate patch.

+   if (config_enabled(CONFIG_IA32_EMULATION) &&
+   test_tsk_thread_flag(tsk, TIF_IA32))

is_ia32_task()?

-hpa
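
For readers following the 32-bit versus 64-bit FXSAVE question, the sketch below shows the two interpretations of bytes 8 to 23 of the save image and one possible reading of "sensing" the layout from the already saved pointers. The struct and the helper `fpu_ptrs_fit_32bit` are illustrative assumptions, not the code from Jan's patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* First 32 bytes of an FXSAVE image. Bytes 8..23 hold either 64-bit
 * instruction/data pointers (REX.W form) or 32-bit offsets plus
 * 16-bit selectors (non-REX.W form). */
struct fxsave_hdr {
    uint16_t fcw;
    uint16_t fsw;
    uint8_t  ftw;
    uint8_t  rsvd1;
    uint16_t fop;
    union {
        struct {
            uint64_t rip;   /* FPU instruction pointer */
            uint64_t rdp;   /* FPU data pointer */
        } p64;
        struct {
            uint32_t fip;   /* instruction pointer offset */
            uint16_t fcs;   /* instruction pointer selector */
            uint16_t rsvd2;
            uint32_t fdp;   /* data pointer offset */
            uint16_t fds;   /* data pointer selector */
            uint16_t rsvd3;
        } p32;
    } u;
    uint32_t mxcsr;
    uint32_t mxcsr_mask;
};

/* Hypothetical check: if both saved 64-bit pointers fit in 32 bits, the
 * state may have come from 32-bit code, so the selectors could matter
 * and a second save in the 32-bit layout might be needed. */
bool fpu_ptrs_fit_32bit(const struct fxsave_hdr *h)
{
    return (h->u.p64.rip >> 32) == 0 && (h->u.p64.rdp >> 32) == 0;
}
```

The 64-bit form has no room for the selectors, which is why a single save cannot preserve both the full pointers and FCS/FDS.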




Re: [PATCH, RFC] x86-64: properly handle FPU code/data selectors

2013-10-16 Thread H. Peter Anvin
On 10/16/2013 09:07 AM, Jan Beulich wrote:
 
 Furthermore, you define X86_FEATURE_NO_FPU_SEL, but you don't set it
 anywhere.  At least that bit needs to be factored out into a separate patch.
 
 That's already being done in get_cpu_cap(), as it's part of
 x86_capability[9].
 

Ah, sorry, my bad.  For some reason I thought you added it to word 3,
but this is a hardware-provided CPUID bit.  I, if anyone, should have
known :)

 +if (config_enabled(CONFIG_IA32_EMULATION) &&
 +test_tsk_thread_flag(tsk, TIF_IA32))

 is_ia32_task()?
 
 That'd imply that tsk == current in all cases, which I don't
 think is right here.

True.  It would be good to have an equivalent predicate function for
another task, though.

This assumes the process doesn't switch modes on us, which it is allowed
to do.  For that it really would be better to look at the CS.L bit,
which can be done with the LAR instruction for the current task;
otherwise we'd have to walk the descriptor tables.

-hpa
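
The CS.L test mentioned above comes down to one bit in the value LAR returns. The sketch below only decodes such a value; executing LAR itself (inline assembly on the current CS selector) is left out, and the helper name `cs_is_64bit` is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* LAR returns the second dword of the descriptor masked with
 * 0x00F0FF00; the flags nibble sits in bits 20..23. */
#define LAR_AVL_BIT (1u << 20)
#define LAR_L_BIT   (1u << 21)  /* 64-bit code segment */
#define LAR_DB_BIT  (1u << 22)  /* default operand size */
#define LAR_G_BIT   (1u << 23)  /* granularity */

/* Hypothetical helper: true if the access-rights value describes a
 * 64-bit code segment, i.e. CS.L is set. */
bool cs_is_64bit(uint32_t lar_value)
{
    return (lar_value & LAR_L_BIT) != 0;
}
```

Unlike a per-task flag such as TIF_IA32, this reflects the mode the code is actually running in at the time of the check, which matters if a process switches modes.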



Re: [PATCH RESEND V13 14/14] kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor

2013-08-13 Thread H. Peter Anvin
Raghavendra...

Even with this latest patch this branch is broken:

:(.discard+0x6108): multiple definition of `__pcpu_unique_lock_waiting'
arch/x86/xen/built-in.o:(.discard+0x23): first defined here
  CC  drivers/firmware/google/gsmi.o
arch/x86/kernel/built-in.o:(.discard+0x6108): multiple definition of
`__pcpu_unique_lock_waiting'
arch/x86/xen/built-in.o:(.discard+0x23): first defined here
  CC  sound/core/seq/oss/seq_oss_init.o

This is trivially reproducible by doing a build with make allyesconfig.

Please fix and *verify* it is fixed before resubmitting.

I will be away so Ingo will have to handle the resubmission.

-hpa



Re: [PATCH V12 0/14] Paravirtualized ticket spinlocks

2013-08-09 Thread H. Peter Anvin
On 08/09/2013 06:00 AM, Konrad Rzeszutek Wilk wrote:
 On Fri, Aug 09, 2013 at 06:20:02PM +0530, Raghavendra K T wrote:
 On 08/09/2013 04:34 AM, H. Peter Anvin wrote:

 Okay, I figured it out.

 One of several problems with the formatting of this patchset is that it
 has one- and two-digit patch numbers in the headers, which meant that my
 scripts tried to apply patch 10 first.


 My bad. I 'll send out in uniform digit form next time.

 
 If you use 'git format-patch --subject-prefix "PATCH V14" v3.11-rc4..'
 and 'git send-email --subject "[PATCH V14] bla blah .."'
 that should be automatically taken care of?
 

Indeed it should.

Another problem with this patchset was that the subject was duplicated
in the body, which meant the tools didn't pick up the From: line.  I
ended up having to manually edit them.

That seems to have been fixed, too, in V13.

-hpa



Re: [PATCH V12 0/14] Paravirtualized ticket spinlocks

2013-08-08 Thread H. Peter Anvin
On 08/07/2013 06:02 PM, Gleb Natapov wrote:
 On Wed, Aug 07, 2013 at 08:50:12PM -0400, Konrad Rzeszutek Wilk wrote:
 On Wed, Aug 07, 2013 at 12:15:21PM +0530, Raghavendra K T wrote:
 On 08/07/2013 10:18 AM, H. Peter Anvin wrote:
 Please let me know, if I should rebase again.


 tip:master is not a stable branch; it is more like linux-next.  We need
 to figure out which topic branches are dependencies for this set.

 Okay. I 'll start looking at the branches that would get affected.
 (Xen, kvm are obvious ones).
 Please do let me know the branches I might have to check for.

 From the Xen standpoint anything past v3.11-rc4 would work.

 For KVM as early as past v3.11-rc1 would be OK.
 

I'm still completely confused as to the base of this patchset.  The
first patch has the following hunk for arch/x86/include/asm/paravirt.h:


--- arch/x86/include/asm/paravirt.h
+++ arch/x86/include/asm/paravirt.h
@@ -718,7 +718,7 @@
 	PVOP_VCALLEE2(pv_lock_ops.lock_spinning, lock, ticket);
 }

-static __always_inline void ticket_unlock_kick(struct arch_spinlock *lock,
+static __always_inline void __ticket_unlock_kick(struct arch_spinlock *lock,
 						__ticket_t ticket)
 {
 	PVOP_VCALL2(pv_lock_ops.unlock_kick, lock, ticket);


However, there is no ticket_unlock_kick in paravirt.h in either
tip:master or linus...

-hpa





Re: [PATCH V12 0/14] Paravirtualized ticket spinlocks

2013-08-08 Thread H. Peter Anvin
On 08/08/2013 02:13 PM, H. Peter Anvin wrote:
 On 08/07/2013 06:02 PM, Gleb Natapov wrote:
 On Wed, Aug 07, 2013 at 08:50:12PM -0400, Konrad Rzeszutek Wilk wrote:
 On Wed, Aug 07, 2013 at 12:15:21PM +0530, Raghavendra K T wrote:
 On 08/07/2013 10:18 AM, H. Peter Anvin wrote:
 Please let me know, if I should rebase again.


 tip:master is not a stable branch; it is more like linux-next.  We need
 to figure out which topic branches are dependencies for this set.

 Okay. I 'll start looking at the branches that would get affected.
 (Xen, kvm are obvious ones).
 Please do let me know the branches I might have to check for.

 From the Xen standpoint anything past v3.11-rc4 would work.

 For KVM as early as past v3.11-rc1 would be OK.

 
 I'm still completely confused as to the base of this patchset.  The
 first patch has the following hunk for arch/x86/include/asm/paravirt.h:
 

Okay, I figured it out.

One of several problems with the formatting of this patchset is that it
has one- and two-digit patch numbers in the headers, which meant that my
scripts tried to apply patch 10 first.

-hpa
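
The ordering problem described above is what happens when subjects are sorted as plain strings. A minimal sketch follows, assuming subjects of the form "[PATCH V12 10/14] ..."; the helper `patch_index` is hypothetical and not part of any real patch-handling tool.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: extract N from a subject containing "N/M".
 * Returns -1 if no such number is found. */
int patch_index(const char *subject)
{
    const char *slash = strchr(subject, '/');
    const char *p;

    if (!slash)
        return -1;
    p = slash;
    while (p > subject && isdigit((unsigned char)p[-1]))
        p--;                 /* walk back over the digits before '/' */
    if (p == slash)
        return -1;           /* no digits in front of the slash */
    return atoi(p);          /* atoi stops at the '/' */
}
```

A string comparison places "[PATCH V12 10/14]" before "[PATCH V12 2/14]" because '1' sorts before '2', whereas comparing the extracted indices, or zero-padding the numbers as git format-patch does, gives the intended order.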




Re: [PATCH V12 0/14] Paravirtualized ticket spinlocks

2013-08-08 Thread H. Peter Anvin
The kbuild test bot is reporting some pretty serious errors for this
patchset.  I think these are serious enough that the patchset will need
to be respun.

-hpa



Re: [PATCH V12 0/14] Paravirtualized ticket spinlocks

2013-08-06 Thread H. Peter Anvin
On 08/06/2013 04:40 AM, Raghavendra K T wrote:
 This series replaces the existing paravirtualized spinlock mechanism
 with a paravirtualized ticketlock mechanism. The series provides
 implementation for both Xen and KVM.
 
 The current set of patches are for Xen/x86 spinlock/KVM guest side, to be
 included against -tip.
 

What is the baseline for this patchset?  I tried to apply it on top of
3.11-rc4 and I got nontrivial conflicts.

-hpa




Re: [PATCH V12 0/14] Paravirtualized ticket spinlocks

2013-08-06 Thread H. Peter Anvin
On 08/06/2013 07:54 PM, Raghavendra K T wrote:
 On 08/07/2013 02:31 AM, H. Peter Anvin wrote:

 What is the baseline for this patchset?  I tried to apply it on top of
 3.11-rc4 and I got nontrivial conflicts.

 
 I had based it on top of 445363e8 [ Merge branch 'perf/urgent']
 of tip.  Sorry for not mentioning that.
 
 Please let me know, if I should rebase again.
 

tip:master is not a stable branch; it is more like linux-next.  We need
to figure out which topic branches are dependencies for this set.

-hpa



Re: [PATCH V2 4/4] x86: correctly detect hypervisor

2013-08-05 Thread H. Peter Anvin
On 08/05/2013 07:34 AM, Konrad Rzeszutek Wilk wrote:
 
 Could you provide me with a git branch so I can test it overnight please?
 

Pull tip:x86/paravirt.

-hpa




Re: [PATCH RFC V11 0/18] Paravirtualized ticket spinlocks

2013-08-05 Thread H. Peter Anvin
So, having read through the entire thread I *think* this is what the
status of this patchset is:

1. Patches 1-17 are noncontroversial, Raghavendra is going to send an
   update split into two patchsets;
2. There are at least two versions of patch 15; I think the PATCH
   RESEND RFC is the right one.
3. Patch 18 is controversial but there are performance numbers; these
   should be integrated in the patch description.
4. People are in general OK with us putting this patchset into -tip for
   testing, once the updated (V12) patchset is posted.

If I'm misunderstanding something, it is because of excessive thread
length as mentioned by Ingo.

Either way, I'm going to hold off on putting it into -tip until tomorrow
unless Ingo beats me to it.

-hpa



RE: [PATCH 4/4] x86: properly handle kvm emulation of hyperv

2013-07-24 Thread H. Peter Anvin
I don't see how this solves the A emulates B, B emulates A problem?

KY Srinivasan k...@microsoft.com wrote:


 -Original Message-
 From: Paolo Bonzini [mailto:paolo.bonz...@gmail.com] On Behalf Of Paolo Bonzini
 Sent: Wednesday, July 24, 2013 3:07 AM
 To: Jason Wang
 Cc: H. Peter Anvin; KY Srinivasan; t...@linutronix.de; mi...@redhat.com;
 x...@kernel.org; g...@redhat.com; kvm@vger.kernel.org;
 linux-ker...@vger.kernel.org
 Subject: Re: [PATCH 4/4] x86: properly handle kvm emulation of hyperv
 
 On 24/07/2013 08:54, Jason Wang wrote:
  On 07/24/2013 12:48 PM, H. Peter Anvin wrote:
  On 07/23/2013 09:37 PM, Jason Wang wrote:
  On 07/23/2013 10:48 PM, H. Peter Anvin wrote:
  On 07/23/2013 06:55 AM, KY Srinivasan wrote:
  This strategy of hypervisor detection based on some detection order
  IMHO is not a robust detection strategy. The current scheme works
  since the only hypervisor emulated (by other hypervisors happens to
  be Hyper-V). What if this were to change.
 
  One strategy would be to pick the *last* one in the CPUID list, since
  the ones before it are logically the one(s) being emulated...
 
  -hpa
 
  How about simply does a reverse loop from 0x4001 to 0x4001?
 
  Not all systems like being poked too far into hyperspace.  Just remember
  the last match and walk the list.
 
  -hpa
 
  Ok, but it raises a question - how to know it was the 'last' match
  without knowing all signatures of other hyper-visor?
 
 You can return a priority value from the .detect function.  The
 priority value can simply be the CPUID leaf where the signature was
 found (or a low value such as 1 if detection was done with DMI).
 
 Then you can pick the hypervisor with the highest priority instead of
 hard-coding the order.

I like this idea; this allows some guest level control that is what we want
when we have hypervisors emulating each other.


Regards,

K. Y 
 
 Paolo
 
 

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.


RE: [PATCH 4/4] x86: properly handle kvm emulation of hyperv

2013-07-24 Thread H. Peter Anvin
What I'm suggesting is exactly that except that the native hypervisor is later 
in CPUID space.

KY Srinivasan k...@microsoft.com wrote:


 -Original Message-
 From: H. Peter Anvin [mailto:h...@zytor.com]
 Sent: Wednesday, July 24, 2013 11:14 AM
 To: KY Srinivasan; Paolo Bonzini; Jason Wang
 Cc: t...@linutronix.de; mi...@redhat.com; x...@kernel.org; g...@redhat.com;
 kvm@vger.kernel.org; linux-ker...@vger.kernel.org
 Subject: RE: [PATCH 4/4] x86: properly handle kvm emulation of hyperv
 
 I don't see how this solves the A emulates B, B emulates A problem?

As Paolo suggested, if there were some priority encoded, the guest could
make an informed decision. If the guest under question can run on both
hypervisors A and B, we would rather the guest discover hypervisor A when
running on A and hypervisor B when running on B. The priority encoding
could be as simple as surfacing the native hypervisor signature earlier in
the CPUID space.

K. Y
 
 KY Srinivasan k...@microsoft.com wrote:
 
 
  -Original Message-
  From: Paolo Bonzini [mailto:paolo.bonz...@gmail.com] On Behalf Of Paolo Bonzini
  Sent: Wednesday, July 24, 2013 3:07 AM
  To: Jason Wang
  Cc: H. Peter Anvin; KY Srinivasan; t...@linutronix.de; mi...@redhat.com;
  x...@kernel.org; g...@redhat.com; kvm@vger.kernel.org;
  linux-ker...@vger.kernel.org
  Subject: Re: [PATCH 4/4] x86: properly handle kvm emulation of hyperv
 
  On 24/07/2013 08:54, Jason Wang wrote:
   On 07/24/2013 12:48 PM, H. Peter Anvin wrote:
   On 07/23/2013 09:37 PM, Jason Wang wrote:
   On 07/23/2013 10:48 PM, H. Peter Anvin wrote:
   On 07/23/2013 06:55 AM, KY Srinivasan wrote:
   This strategy of hypervisor detection based on some detection order
   IMHO is not a robust detection strategy. The current scheme works
   since the only hypervisor emulated (by other hypervisors happens to
   be Hyper-V). What if this were to change.
  
   One strategy would be to pick the *last* one in the CPUID list, since
   the ones before it are logically the one(s) being emulated...
  
   -hpa
  
   How about simply does a reverse loop from 0x4001 to 0x4001?
  
   Not all systems like being poked too far into hyperspace.  Just remember
   the last match and walk the list.
  
   -hpa
  
   Ok, but it raises a question - how to know it was the 'last' match
   without knowing all signatures of other hyper-visor?
 
  You can return a priority value from the .detect function.  The
  priority value can simply be the CPUID leaf where the signature was
  found (or a low value such as 1 if detection was done with DMI).
 
  Then you can pick the hypervisor with the highest priority instead of
  hard-coding the order.
 
 I like this idea; this allows some guest level control that is what we want
 when we have hypervisors emulating each other.
 
 
 Regards,
 
 K. Y
 
  Paolo
 
 
 --
 Sent from my mobile phone. Please excuse brevity and lack of formatting.
 
 

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/4] x86: introduce hypervisor_cpuid_base()

2013-07-23 Thread H. Peter Anvin
On 07/23/2013 02:41 AM, Jason Wang wrote:
  
 +static inline uint32_t hypervisor_cpuid_base(const char *sig, uint32_t leaves)
 +{
 +	uint32_t base, eax, ebx, ecx, edx;
 +	char signature[13];
 +
 +	for (base = 0x40000000; base < 0x40010000; base += 0x100) {
 +		cpuid(base, &eax, &ebx, &ecx, &edx);
 +		*(uint32_t *)(signature + 0) = ebx;
 +		*(uint32_t *)(signature + 4) = ecx;
 +		*(uint32_t *)(signature + 8) = edx;
 +		signature[12] = 0;
 +
 +		if (!strcmp(sig, signature) &&
 +		    (leaves == 0 || ((eax - base) >= leaves)))
 +			return base;
 +	}
 +
 +	return 0;
 +}
 +

Hmm... how about:

uint32_t sign[3];

cpuid(base, &eax, &sign[0], &sign[1], &sign[2]);

if (!memcmp(sig, sign, 12) && ...);

-hpa
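
The memcmp() variant suggested above can be tried outside the kernel with the register values passed in directly. This is a sketch under stated assumptions: `hv_signature_matches` is a hypothetical name, the registers are supplied by the caller instead of being read with CPUID, and a little-endian host (as on x86) is assumed.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: compare a 12-byte hypervisor signature against
 * the EBX/ECX/EDX values returned by a hypervisor CPUID base leaf.
 * sig must point to at least 12 bytes, padded with NULs as needed. */
bool hv_signature_matches(const char *sig,
                          uint32_t ebx, uint32_t ecx, uint32_t edx)
{
    uint32_t sign[3] = { ebx, ecx, edx };

    /* No temporary string and no NUL terminator are needed. */
    return memcmp(sig, sign, 12) == 0;
}
```

Because all 12 bytes are compared, a signature shorter than 12 characters has to be written with its trailing NUL padding, which is the caveat raised elsewhere in this thread for the KVM signature.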



Re: [PATCH 4/4] x86: properly handle kvm emulation of hyperv

2013-07-23 Thread H. Peter Anvin
On 07/23/2013 06:55 AM, KY Srinivasan wrote:
 
 This strategy of hypervisor detection based on some detection order IMHO is
 not a robust detection strategy. The current scheme works since the only
 hypervisor emulated (by other hypervisors) happens to be Hyper-V. What if
 this were to change?
 

One strategy would be to pick the *last* one in the CPUID list, since
the ones before it are logically the one(s) being emulated...

-hpa




Re: [PATCH 1/4] x86: introduce hypervisor_cpuid_base()

2013-07-23 Thread H. Peter Anvin
On 07/23/2013 04:16 AM, Paolo Bonzini wrote:
 
 That's nicer, though strcmp is what the replaced code used to do in
 patches 2 and 3.
 
 Note that memcmp requires the caller to use KVMKVMKVM\0\0 as the
 signature (or alternatively hypervisor_cpuid_base can copy the argument
 into another 12-byte local variable).
 

Which is the actual signature, though...

-hpa




Re: [PATCH 4/4] x86: properly handle kvm emulation of hyperv

2013-07-23 Thread H. Peter Anvin
On 07/23/2013 10:45 AM, KY Srinivasan wrote:

 One strategy would be to pick the *last* one in the CPUID list, since
 the ones before it are logically the one(s) being emulated...
 
 Is it always possible to guarantee this ordering? As a hypothetical, what if
 hypervisor A emulates hypervisor B and hypervisor B emulates hypervisor A?
 In this case no order-based detection can yield correct detection. I define
 correctness as follows:
 
 If a guest can run on both hypervisors, the guest should detect the true
 native hypervisor.
 

My point was that most hypervisors tend to put the native signature at
the end of the list starting at 0x40000000, just to deal with naïve
guests which only look at 0x40000000 and not beyond.  So a natural
convention would be to use the last entry in the list you know how to
handle.

-hpa




Re: [PATCH 1/4] x86: introduce hypervisor_cpuid_base()

2013-07-23 Thread H. Peter Anvin
On 07/23/2013 09:44 PM, Jason Wang wrote:
 
 Since it's just a minor optimization. How about just keep using the
 strcmp()?
 

It's more that it enables the rest of the cleanup, making the code
easier to read.

-hpa



Re: [PATCH 4/4] x86: properly handle kvm emulation of hyperv

2013-07-23 Thread H. Peter Anvin
On 07/23/2013 09:37 PM, Jason Wang wrote:
 On 07/23/2013 10:48 PM, H. Peter Anvin wrote:
 On 07/23/2013 06:55 AM, KY Srinivasan wrote:
 This strategy of hypervisor detection based on some detection order IMHO is
 not a robust detection strategy. The current scheme works since the only
 hypervisor emulated (by other hypervisors) happens to be Hyper-V. What if
 this were to change?

 One strategy would be to pick the *last* one in the CPUID list, since
 the ones before it are logically the one(s) being emulated...

  -hpa

 
 How about simply does a reverse loop from 0x4001 to 0x4001?
 

Not all systems like being poked too far into hyperspace.  Just remember
the last match and walk the list.

-hpa
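
"Remember the last match and walk the list" can be sketched without touching CPUID by modelling the leaves as an array of signatures. The names here are hypothetical: entry i of `present` stands for the signature found at leaf HV_BASE + i * HV_STEP, and a NULL entry stands for the first leaf that does not answer, where the walk stops.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HV_BASE 0x40000000u  /* first hypervisor CPUID leaf */
#define HV_STEP 0x100u       /* spacing between signature leaves */

/* Walk forward through the signatures that are present and return the
 * base leaf of the last one that appears in the known list, or 0 if
 * none is recognized. Both arrays are NULL-terminated. */
uint32_t last_known_hypervisor(const char *const *present,
                               const char *const *known)
{
    uint32_t found = 0;
    size_t i, k;

    for (i = 0; present[i]; i++) {
        for (k = 0; known[k]; k++) {
            if (strcmp(present[i], known[k]) == 0)
                found = HV_BASE + (uint32_t)i * HV_STEP;  /* keep latest */
        }
    }
    return found;
}
```

The walk never probes beyond the first unpopulated leaf, which addresses the concern about poking too far into the hypervisor range, and a guest that only understands the emulated interface still finds it at the earlier leaf.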


