date:20150917

Re: [PATCH] ARM/ARM64: KVM: remove 'config KVM_ARM_MAX_VCPUS'

2015-09-17 Thread Ming Lei

On Wed, Sep 2, 2015 at 7:42 PM, Ming Lei  wrote:
> On Wed, Sep 2, 2015 at 6:25 PM, Christoffer Dall
>  wrote:
>> On Wed, Sep 02, 2015 at 02:31:21PM +0800, Ming Lei wrote:
>>> This patch removes config option of KVM_ARM_MAX_VCPUS,
>>> and like other ARCHs, just choose the maximum allowed
>>> value from hardware, and follows the reasons:
>>>
>>> 1) from distribution view, the option has to be
>>> defined as the max allowed value because it need to
>>> meet all kinds of virtulization applications and
>>> need to support most of SoCs;
>>>
>>> 2) using a bigger value doesn't introduce extra memory
>>> consumption, and the help text in Kconfig isn't accurate
>>> because kvm_vpu structure isn't allocated until request
>>> of creating VCPU is sent from QEMU;
>>
>> This used to be true because of the vgic bitmaps, but that is now
>> dynamically allocated, so I believe you're correct in saying that the
>> text is no longer accurate.
>>
>>>
>>> 3) the main effect is that the field of vcpus[] in 'struct kvm'
>>> becomes a bit bigger(sizeof(void *) per vcpu) and need more cache
>>> lines to hold the structure, but 'struct kvm' is one generic struct,
>>> and it has worked well on other ARCHs already in this way. Also,
>>> the world switch frequecy is often low, for example, it is ~2000
>>> when running kernel building load in VM from APM xgene KVM host,
>>> so the effect is very small, and the difference can't be observed
>>> in my test at all.
>>
>> While I'm not prinicipally opposed to removing this option, I have to
>> point out that this analysis is far far over-simplified.  You have
>> chosen a workload which excercised only CPU and memory virtualization,
>> mostly solved by the hardware virtualization support, and therefore you
>> don't see many exits.
>>
>> Try running an I/O bound workload, or something which involves a lot of
>> virtual IPIs, and you'll see a higher number of exits.
>
> Yeah, the frequency of exits becomes higher(6600/sec) when I run a
> totally I/O benchmark(fio: 4 jobs, bs 4k, libaio over virtio-blk) in a
> quad-core VM, but it is still not high enough to cause any difference
> on the test result.
>
>>
>> However, I still doubt that the effects will be noticable in the grand
>> scheme of things.
>>>
>>> Cc: Dann Frazier 
>>> Cc: Christoffer Dall 
>>> Cc: Marc Zyngier 
>>> Cc: kvm...@lists.cs.columbia.edu
>>> Cc: kvm@vger.kernel.org
>>> Signed-off-by: Ming Lei 
>>> ---
>>>  arch/arm/include/asm/kvm_host.h   |  8 ++--
>>>  arch/arm/kvm/Kconfig  | 11 ---
>>>  arch/arm64/include/asm/kvm_host.h |  8 ++--
>>>  arch/arm64/kvm/Kconfig| 11 ---
>>>  include/kvm/arm_vgic.h|  6 +-
>>>  virt/kvm/arm/vgic-v3.c|  2 +-
>>>  6 files changed, 6 insertions(+), 40 deletions(-)
>>>
>>> diff --git a/arch/arm/include/asm/kvm_host.h 
>>> b/arch/arm/include/asm/kvm_host.h
>>> index dcba0fa..c8c226a 100644
>>> --- a/arch/arm/include/asm/kvm_host.h
>>> +++ b/arch/arm/include/asm/kvm_host.h
>>> @@ -29,12 +29,6 @@
>>>
>>>  #define __KVM_HAVE_ARCH_INTC_INITIALIZED
>>>
>>> -#if defined(CONFIG_KVM_ARM_MAX_VCPUS)
>>> -#define KVM_MAX_VCPUS CONFIG_KVM_ARM_MAX_VCPUS
>>> -#else
>>> -#define KVM_MAX_VCPUS 0
>>> -#endif
>>> -
>>>  #define KVM_USER_MEM_SLOTS 32
>>>  #define KVM_PRIVATE_MEM_SLOTS 4
>>>  #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
>>> @@ -44,6 +38,8 @@
>>>
>>>  #include 
>>>
>>> +#define KVM_MAX_VCPUS VGIC_V2_MAX_CPUS
>>> +
>>>  u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode);
>>>  int __attribute_const__ kvm_target_cpu(void);
>>>  int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
>>> diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
>>> index bfb915d..210ecca 100644
>>> --- a/arch/arm/kvm/Kconfig
>>> +++ b/arch/arm/kvm/Kconfig
>>> @@ -45,15 +45,4 @@ config KVM_ARM_HOST
>>>   ---help---
>>> Provides host support for ARM processors.
>>>
>>> -config KVM_ARM_MAX_VCPUS
>>> - int "Number maximum supported virtual CPUs per VM"
>>> - depends on KVM_ARM_HOST
>>> - default 4
>>> - help
>>> -   Static number of max supported virtual CPUs per VM.
>>> -
>>> -   If you choose a high number, the vcpu structures will be quite
>>> -   large, so only choose a reasonable number that you expect to
>>> -   actually use.
>>> -
>>>  endif # VIRTUALIZATION
>>> diff --git a/arch/arm64/include/asm/kvm_host.h 
>>> b/arch/arm64/include/asm/kvm_host.h
>>> index 415938d..3fb58ea 100644
>>> --- a/arch/arm64/include/asm/kvm_host.h
>>> +++ b/arch/arm64/include/asm/kvm_host.h
>>> @@ -30,12 +30,6 @@
>>>
>>>  #define __KVM_HAVE_ARCH_INTC_INITIALIZED
>>>
>>> -#if defined(CONFIG_KVM_ARM_MAX_VCPUS)
>>> -#define KVM_MAX_VCPUS CONFIG_KVM_ARM_MAX_VCPUS
>>> -#else
>>> -#define KVM_MAX_VCPUS 0
>>> -#endif
>>> -
>>>  #define KVM_USER_MEM_SLOTS 32
>>>  #define

Re: [PATCH] ARM/ARM64: KVM: remove 'config KVM_ARM_MAX_VCPUS'

2015-09-17 Thread Marc Zyngier

On 17/09/15 07:08, Ming Lei wrote:
> On Wed, Sep 2, 2015 at 7:42 PM, Ming Lei  wrote:
>> On Wed, Sep 2, 2015 at 6:25 PM, Christoffer Dall
>>  wrote:
>>> On Wed, Sep 02, 2015 at 02:31:21PM +0800, Ming Lei wrote:
 This patch removes config option of KVM_ARM_MAX_VCPUS,
 and like other ARCHs, just choose the maximum allowed
 value from hardware, and follows the reasons:

 1) from distribution view, the option has to be
 defined as the max allowed value because it need to
 meet all kinds of virtulization applications and
 need to support most of SoCs;

 2) using a bigger value doesn't introduce extra memory
 consumption, and the help text in Kconfig isn't accurate
 because kvm_vpu structure isn't allocated until request
 of creating VCPU is sent from QEMU;
>>>
>>> This used to be true because of the vgic bitmaps, but that is now
>>> dynamically allocated, so I believe you're correct in saying that the
>>> text is no longer accurate.
>>>

 3) the main effect is that the field of vcpus[] in 'struct kvm'
 becomes a bit bigger(sizeof(void *) per vcpu) and need more cache
 lines to hold the structure, but 'struct kvm' is one generic struct,
 and it has worked well on other ARCHs already in this way. Also,
 the world switch frequecy is often low, for example, it is ~2000
 when running kernel building load in VM from APM xgene KVM host,
 so the effect is very small, and the difference can't be observed
 in my test at all.
>>>
>>> While I'm not prinicipally opposed to removing this option, I have to
>>> point out that this analysis is far far over-simplified.  You have
>>> chosen a workload which excercised only CPU and memory virtualization,
>>> mostly solved by the hardware virtualization support, and therefore you
>>> don't see many exits.
>>>
>>> Try running an I/O bound workload, or something which involves a lot of
>>> virtual IPIs, and you'll see a higher number of exits.
>>
>> Yeah, the frequency of exits becomes higher(6600/sec) when I run a
>> totally I/O benchmark(fio: 4 jobs, bs 4k, libaio over virtio-blk) in a
>> quad-core VM, but it is still not high enough to cause any difference
>> on the test result.
>>
>>>
>>> However, I still doubt that the effects will be noticable in the grand
>>> scheme of things.

 Cc: Dann Frazier 
 Cc: Christoffer Dall 
 Cc: Marc Zyngier 
 Cc: kvm...@lists.cs.columbia.edu
 Cc: kvm@vger.kernel.org
 Signed-off-by: Ming Lei 
 ---
  arch/arm/include/asm/kvm_host.h   |  8 ++--
  arch/arm/kvm/Kconfig  | 11 ---
  arch/arm64/include/asm/kvm_host.h |  8 ++--
  arch/arm64/kvm/Kconfig| 11 ---
  include/kvm/arm_vgic.h|  6 +-
  virt/kvm/arm/vgic-v3.c|  2 +-
  6 files changed, 6 insertions(+), 40 deletions(-)

 diff --git a/arch/arm/include/asm/kvm_host.h 
 b/arch/arm/include/asm/kvm_host.h
 index dcba0fa..c8c226a 100644
 --- a/arch/arm/include/asm/kvm_host.h
 +++ b/arch/arm/include/asm/kvm_host.h
 @@ -29,12 +29,6 @@

  #define __KVM_HAVE_ARCH_INTC_INITIALIZED

 -#if defined(CONFIG_KVM_ARM_MAX_VCPUS)
 -#define KVM_MAX_VCPUS CONFIG_KVM_ARM_MAX_VCPUS
 -#else
 -#define KVM_MAX_VCPUS 0
 -#endif
 -
  #define KVM_USER_MEM_SLOTS 32
  #define KVM_PRIVATE_MEM_SLOTS 4
  #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
 @@ -44,6 +38,8 @@

  #include 

 +#define KVM_MAX_VCPUS VGIC_V2_MAX_CPUS
 +
  u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode);
  int __attribute_const__ kvm_target_cpu(void);
  int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
 diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
 index bfb915d..210ecca 100644
 --- a/arch/arm/kvm/Kconfig
 +++ b/arch/arm/kvm/Kconfig
 @@ -45,15 +45,4 @@ config KVM_ARM_HOST
   ---help---
 Provides host support for ARM processors.

 -config KVM_ARM_MAX_VCPUS
 - int "Number maximum supported virtual CPUs per VM"
 - depends on KVM_ARM_HOST
 - default 4
 - help
 -   Static number of max supported virtual CPUs per VM.
 -
 -   If you choose a high number, the vcpu structures will be quite
 -   large, so only choose a reasonable number that you expect to
 -   actually use.
 -
  endif # VIRTUALIZATION
 diff --git a/arch/arm64/include/asm/kvm_host.h 
 b/arch/arm64/include/asm/kvm_host.h
 index 415938d..3fb58ea 100644
 --- a/arch/arm64/include/asm/kvm_host.h
 +++ b/arch/arm64/include/asm/kvm_host.h
 @@ -30,12 +30,6 @@

  #define __KVM_HAVE_ARCH_INTC_INITIALIZED

 -#if defined(CONFIG_KVM_ARM_MAX_VCPUS)
 -#define

Re: [PATCH 0/3] x86/paravirt: Fix baremetal paravirt MSR ops

2015-09-17 Thread Ingo Molnar

* Andy Lutomirski  wrote:

> Setting CONFIG_PARAVIRT=y has an unintended side effect: it silently
> turns all rdmsr and wrmsr operations into the safe variants without
> any checks that the operations actually succeed.
> 
> This is IMO awful: it papers over bugs.  In particular, KVM gueests
> might be unwittingly depending on this behavior because
> CONFIG_KVM_GUEST currently depends on CONFIG_PARAVIRT.  I'm not
> aware of any such problems, but applying this series would be a good
> way to shake them out.
> 
> Fix it so that the MSR operations work the same on CONFIG_PARAVIRT=n
> and CONFIG_PARAVIRT=y as long as Xen isn't being used.  The Xen
> maintainers are welcome to make a similar change on top of this.
> 
> Since there's plenty of time before the next merge window, I think
> we should apply and fix anything that breaks.

No, I think we should at most generate a warning instead, and not crash the 
kernel 
via rdmsr()!

Most big distro kernels on bare metal have CONFIG_PARAVIRT=y (I checked Ubuntu 
and 
Fedora), so we are potentially exposing a lot of users to problems.

Crashing the bootup on an unknown MSR is bad. Many MSR reads and writes are 
non-critical and returning the 'safe' result is much better than crashing or 
hanging the bootup.

( We should double check that rdmsr()/wrmsr() results are never left 
  uninitialized, but are set to zero or so, for cases where the return code is 
not 
  checked. )

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 00/22] KVM: ARM64: Add guest PMU support

2015-09-17 Thread Shannon Zhao

Hi Wei,

On 2015/9/17 13:56, Wei Huang wrote:
> 
> 
> On 09/16/2015 08:32 PM, Shannon Zhao wrote:
>> Hi Wei,
>>
>> On 2015/9/17 5:07, Wei Huang wrote:
>>> I am testing this series. 
>> Thanks for your time and help.
>>
>>> The first question is: do you plan to add ACPI
>>> support in QEMU?
> 
> I saw "KVM_{SET/GET}_DEVICE_ATTR failed: Invalid argument" while using
> your QEMU tree (PMU_v2 branch). A quick debugging:
> 
>From this log, it might fail at below check:
+   if (reg < VGIC_NR_SGIS || reg > dev->kvm->arch.vgic.nr_irqs)
+   return -EINVAL;

> (1) dmesg on host kernel didn't show any vPMU initialization errors. So
> I suspect the problem is related to QEMU.
> (2) Commit 58771bc2a78 worked fine. So probably the problem was
> introduced by new PMU code.
> 
> Have you seen it before?
> 

Oh, I didn't see this. And I checkout the code on git.linaro.org, it's
same with my local code.

Could you add some print in kvm_arm_pmu_set_irq of hw/misc/arm_pmu_kvm.c
and kvm_arm_pmu_set_attr, kvm_arm_pmu_set_irq of virt/kvm/arm/pmu.c.

Thanks,

-- 
Shannon

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 08/18] nvdimm: init backend memory mapping and config data area

2015-09-17 Thread Xiao Guangrong




On 09/16/2015 12:07 AM, Paolo Bonzini wrote:



On 26/08/2015 12:40, Xiao Guangrong wrote:


+
+size = get_file_size(fd);
+buf = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);


I guess the user will want to choose between MAP_SHARED and MAP_PRIVATE.
This can be added in the future.


Good idea, it will allow guest to write data but discards its content
after it exits. Will implement O_RDONLY + MAP_PRIVATE in the near future.


FWIW, if Igor's backend/frontend idea is implemented, the choice between
MAP_SHARED and MAP_PRIVATE should belong in the backend.


Yes. I can not agree with you more! :)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Xen-devel] [PATCH 0/3] x86/paravirt: Fix baremetal paravirt MSR ops

2015-09-17 Thread Andrew Cooper

On 17/09/15 00:33, Andy Lutomirski wrote:
> Setting CONFIG_PARAVIRT=y has an unintended side effect: it silently
> turns all rdmsr and wrmsr operations into the safe variants without
> any checks that the operations actually succeed.
>
> This is IMO awful: it papers over bugs.  In particular, KVM gueests
> might be unwittingly depending on this behavior because
> CONFIG_KVM_GUEST currently depends on CONFIG_PARAVIRT.  I'm not
> aware of any such problems, but applying this series would be a good
> way to shake them out.
>
> Fix it so that the MSR operations work the same on CONFIG_PARAVIRT=n
> and CONFIG_PARAVIRT=y as long as Xen isn't being used.  The Xen
> maintainers are welcome to make a similar change on top of this.

The Xen side of things need some further modification before this would
be a safe operation to perform.

On the wrmsr side of things alone, this is the list of things Xen
currently objects to and injects #GP faults for.

(XEN) traps.c:2692:d0v0 Domain attempted WRMSR c081 from
0xe023e008 to 0x00230010.
(XEN) traps.c:2692:d0v0 Domain attempted WRMSR c082 from
0x82d0b000 to 0x81560060.
(XEN) traps.c:2692:d0v0 Domain attempted WRMSR c083 from
0x82d0b020 to 0x81558100.
(XEN) traps.c:2692:d0v0 Domain attempted WRMSR 0174 from
0xe008 to 0x0010.
(XEN) traps.c:2692:d0v0 Domain attempted WRMSR 0175 from
0x8300ac0f7fc0 to 0x.
(XEN) traps.c:2692:d0v0 Domain attempted WRMSR 0176 from
0x82d08023fd50 to 0x815616d0.
(XEN) traps.c:2692:d0v0 Domain attempted WRMSR c083 from
0x82d0b020 to 0x81561910.
(XEN) traps.c:2692:d0v0 Domain attempted WRMSR c084 from
0x00074700 to 0x00047700.

However, it would be certainly be worth teaching PVops not to play with
MSRs it doesn't own.

~Andrew
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/3] x86/paravirt: Fix baremetal paravirt MSR ops

2015-09-17 Thread Borislav Petkov

On Thu, Sep 17, 2015 at 09:19:20AM +0200, Ingo Molnar wrote:
> Most big distro kernels on bare metal have CONFIG_PARAVIRT=y (I checked 
> Ubuntu and 
> Fedora), so we are potentially exposing a lot of users to problems.

+ SUSE.

> Crashing the bootup on an unknown MSR is bad. Many MSR reads and writes are 
> non-critical and returning the 'safe' result is much better than crashing or 
> hanging the bootup.

... and prepending all MSR accesses with feature/CPUID checks is probably almost
impossible.

-- 
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/3] x86/paravirt: Fix baremetal paravirt MSR ops

2015-09-17 Thread Paolo Bonzini



On 17/09/2015 10:58, Peter Zijlstra wrote:
> But the far greater problem I have with the whole virt thing is that
> you cannot use rdmsr_safe() to probe if an MSR exists at all because, as
> you told me, these virt thingies return 0 for all 'unknown' MSRs instead
> of faulting.

At least for KVM, that behavior is opt-in (the ignore_msrs parameter)
and no distro that I know enables it by default.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/3] x86/paravirt: Fix baremetal paravirt MSR ops

2015-09-17 Thread Paolo Bonzini



On 17/09/2015 11:31, Borislav Petkov wrote:
> 
>> > Crashing the bootup on an unknown MSR is bad. Many MSR reads and writes 
>> > are 
>> > non-critical and returning the 'safe' result is much better than crashing 
>> > or 
>> > hanging the bootup.
> ... and prepending all MSR accesses with feature/CPUID checks is probably 
> almost
> impossible.

That's not a big deal, that's what *_safe is for.  The problem is that
there are definitely some cases where the *_safe version is not being used.

I agree with Ingo that we should start with a WARN.  For example:

- give the read_msr and write_msr hooks the same prototype as the safe
variants

- make the virt platforms always return "no error" for the unsafe
variants (I understand if your first reaction is "ouch", but this
effectively is already the current behavior)

- change rdmsr/wrmsr/rdmsrl/wrmsrl to WARN if the read_msr and write_msr
hooks return an error

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/3] arm64: KVM: Disable virtual timer even if the guest is not using it

2015-09-17 Thread Marc Zyngier

On 17/09/15 12:17, Christoffer Dall wrote:
> On Wed, Sep 16, 2015 at 04:58:06PM +0100, Marc Zyngier wrote:
>> When running a guest with the architected timer disabled (with QEMU and
>> the kernel_irqchip=off option, for example), it is important to make
>> sure the timer gets turned off. Otherwise, the guest may try to
>> enable it anyway, leading to a screaming HW interrupt.
>>
>> The fix is to unconditionally turn off the virtual timer on guest
>> exit.
>>
>> Cc: sta...@vger.kernel.org
>> Signed-off-by: Marc Zyngier 
>> ---
>>  arch/arm64/kvm/hyp.S | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
>> index 6addf97..38f5434 100644
>> --- a/arch/arm64/kvm/hyp.S
>> +++ b/arch/arm64/kvm/hyp.S
>> @@ -570,8 +570,6 @@ alternative_endif
> 
> The context confuses me; did you happen to base this on your VHE
> patches?

No, that's on top of 4.3-rc1, which happens to have this:

[...]
alternative_if_not ARM64_HAS_SYSREG_GIC_CPUIF
bl  __restore_vgic_v2_state
alternative_else
bl  __restore_vgic_v3_state
alternative_endif
.endm

.macro save_timer_state
[...]

and for some reason git doesn't use save_timer_state as the context anchor.

>>  mrs x3, cntv_ctl_el0
>>  and x3, x3, #3
>>  str w3, [x0, #VCPU_TIMER_CNTV_CTL]
>> -bic x3, x3, #1  // Clear Enable
>> -msr cntv_ctl_el0, x3
>>  
>>  isb
>>  
>> @@ -579,6 +577,8 @@ alternative_endif
>>  str x3, [x0, #VCPU_TIMER_CNTV_CVAL]
>>  
>>  1:
>> +msr cntv_ctl_el0, xzr
>> +
> 
> We could have a comment here, but ok.

I'll add something.

>>  // Allow physical timer/counter access for the host
>>  mrs x2, cnthctl_el2
>>  orr x2, x2, #3
>> -- 
>> 2.1.4
>>
> 
> Otherwise:
> 
> Reviewed-by: Christoffer Dall 
> 

Thanks!

M.
-- 
Jazz is not dead. It just smells funny...
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] KVM: add halt_attempted_poll to VCPU stats

2015-09-17 Thread David Hildenbrand

> Am 15.09.2015 um 18:27 schrieb Paolo Bonzini:
> > This new statistic can help diagnosing VCPUs that, for any reason,
> > trigger bad behavior of halt_poll_ns autotuning.
> > 
> > For example, say halt_poll_ns = 48, and wakeups are spaced exactly
> > like 479us, 481us, 479us, 481us. Then KVM always fails polling and wastes
> > 10+20+40+80+160+320+480 = 1110 microseconds out of every
> > 479+481+479+481+479+481+479 = 3359 microseconds. The VCPU then
> > is consuming about 30% more CPU than it would use without
> > polling.  This would show as an abnormally high number of
> > attempted polling compared to the successful polls.
> > 
> > Cc: Christian Borntraeger  > Cc: David Matlack 
> > Signed-off-by: Paolo Bonzini 
> 
> Acked-by: Christian Borntraeger 
> 
> yes, this will help to detect some bad cases, but not all.
> 
> PS: 
> upstream maintenance keeps me really busy at the moment :-)
> I am looking into a case right now, where auto polling goes 
> completely nuts on my system:
> 
> guest1: 8vcpusguest2: 1 vcpu
> iperf with 25 process (-P25) from guest1 to guest2.
> 
> I/O interrupts on s390 are floating (pending on all CPUs) so on 
> ALL VCPUs that go to sleep, polling will consider any pending
> network interrupt as successful poll. So with auto polling the
> guest consumes up to 5 host CPUs without auto polling only 1.
> Reducing  halt_poll_ns to 10 seems to work (goes back to 
> 1 cpu).
> 
> The proper way might be to feedback the result of the
> interrupt dequeue into the heuristics. Don't know yet how
> to handle that properly.
> 
> Christian

I think the main problem is that we have two different kinds of wakeups, and
they can't be properly reported for now. "runnability" says nothing about the
"reason".

a) "forced wakeup" - "local workload"
- "local" interrupt/timer pending
- signal
- (request bits, nested irqs ... for some archs)
-> Impacts halt_poll_ns

b) "trial wakeup" - "floating" workload
-> floating interrupts
Another vcpu might be faster and dequeue the floating interrupt.
However, if we have a high I/O load, we want all VCPUs to reduce their
halt_poll_ns value. Special cases would also be:
- Only one VCPU in the system
- Only one VCPU running
- Only one VCPU that is enabled for this type of interrupt
-> Impacts halt_poll_ns only partially

So kvm_arch_vcpu_runnable() returns true for multiple
VCPUs, although only one might get the interrupt. If we could change that
internally (one VCPU reserving that interrupt), we might get this working out of
the box. As kvm_arch_vcpu_runnable is on the hot path and also called from other
VCPUs, this isn't that trivial. Will play with it. Until then, I'll prepare a
patch to disable it for s390x, just as Paolo suggested.

David

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/3] arm64: KVM: Disable virtual timer even if the guest is not using it

2015-09-17 Thread Christoffer Dall

On Wed, Sep 16, 2015 at 04:58:06PM +0100, Marc Zyngier wrote:
> When running a guest with the architected timer disabled (with QEMU and
> the kernel_irqchip=off option, for example), it is important to make
> sure the timer gets turned off. Otherwise, the guest may try to
> enable it anyway, leading to a screaming HW interrupt.
> 
> The fix is to unconditionally turn off the virtual timer on guest
> exit.
> 
> Cc: sta...@vger.kernel.org
> Signed-off-by: Marc Zyngier 
> ---
>  arch/arm64/kvm/hyp.S | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
> index 6addf97..38f5434 100644
> --- a/arch/arm64/kvm/hyp.S
> +++ b/arch/arm64/kvm/hyp.S
> @@ -570,8 +570,6 @@ alternative_endif

The context confuses me; did you happen to base this on your VHE
patches?

>   mrs x3, cntv_ctl_el0
>   and x3, x3, #3
>   str w3, [x0, #VCPU_TIMER_CNTV_CTL]
> - bic x3, x3, #1  // Clear Enable
> - msr cntv_ctl_el0, x3
>  
>   isb
>  
> @@ -579,6 +577,8 @@ alternative_endif
>   str x3, [x0, #VCPU_TIMER_CNTV_CVAL]
>  
>  1:
> + msr cntv_ctl_el0, xzr
> +

We could have a comment here, but ok.

>   // Allow physical timer/counter access for the host
>   mrs x2, cnthctl_el2
>   orr x2, x2, #3
> -- 
> 2.1.4
> 

Otherwise:

Reviewed-by: Christoffer Dall 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/3] x86/paravirt: Fix baremetal paravirt MSR ops

2015-09-17 Thread H. Peter Anvin

However, the difference between one CONFIG and another is quite frankly crazy.  
We should explicitly use the safe versions where this is appropriate, and then 
yes, we should do this.

Yet another reason the paravirt code is batshit crazy.

On September 17, 2015 2:31:34 AM PDT, Borislav Petkov  wrote:
>On Thu, Sep 17, 2015 at 09:19:20AM +0200, Ingo Molnar wrote:
>> Most big distro kernels on bare metal have CONFIG_PARAVIRT=y (I
>checked Ubuntu and 
>> Fedora), so we are potentially exposing a lot of users to problems.
>
>+ SUSE.
>
>> Crashing the bootup on an unknown MSR is bad. Many MSR reads and
>writes are 
>> non-critical and returning the 'safe' result is much better than
>crashing or 
>> hanging the bootup.
>
>... and prepending all MSR accesses with feature/CPUID checks is
>probably almost
>impossible.

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 08/18] nvdimm: init backend memory mapping and config data area

2015-09-17 Thread Xiao Guangrong




On 09/16/2015 12:06 AM, Paolo Bonzini wrote:



On 25/08/2015 18:03, Stefan Hajnoczi wrote:


+static uint64_t get_file_size(int fd)
+{
+struct stat stat_buf;
+uint64_t size;
+
+if (fstat(fd, _buf) < 0) {
+return 0;
+}
+
+if (S_ISREG(stat_buf.st_mode)) {
+return stat_buf.st_size;
+}
+
+if (S_ISBLK(stat_buf.st_mode) && !ioctl(fd, BLKGETSIZE64, )) {
+return size;
+}

#ifdef __linux__ for ioctl(fd, BLKGETSIZE64, )?

There is nothing Linux-specific about emulating NVDIMMs so this code
should compile on all platforms.


The code from block/raw-posix.c and block/raw-win32.c's raw_getlength
should probably be extracted to a new function in utils/, and reused here.



The function you pointed out is really complex - it mixed 9 platforms and each
platform has its own specific implementation. It is hard for us to verify the
change.

I'd prefer to make it for Linux specific first then share it to other platforms
if it's needed in the future.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 08/18] nvdimm: init backend memory mapping and config data area

2015-09-17 Thread Xiao Guangrong




On 09/16/2015 12:10 AM, Paolo Bonzini wrote:



On 01/09/2015 11:14, Stefan Hajnoczi wrote:


When I was digging into live migration code, i noticed that the same MR name may
cause the name "idstr", please refer to qemu_ram_set_idstr().

Since nvdimm devices do not have parent-bus, it will trigger the abort() in that
function.

I see.  The other devices that use a constant name are on a bus so the
abort doesn't trigger.


However, the MR name must be the same across the two machines.  Indices
are not friendly to hotplug.  Even though hotplug isn't supported now,
we should prepare and try not to change migration format when we support
hotplug in the future.



Thanks for your reminder.


Is there any other fixed value that we can use, for example the base
address of the NVDIMM?


How about use object_get_canonical_path(OBJECT(dev)) (the @dev is NVDIMM
device) ?
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 08/18] nvdimm: init backend memory mapping and config data area

2015-09-17 Thread Xiao Guangrong




On 09/17/2015 05:04 PM, Igor Mammedov wrote:

On Thu, 17 Sep 2015 16:39:12 +0800
Xiao Guangrong  wrote:




On 09/16/2015 12:10 AM, Paolo Bonzini wrote:



On 01/09/2015 11:14, Stefan Hajnoczi wrote:


When I was digging into live migration code, i noticed that the same MR name may
cause the name "idstr", please refer to qemu_ram_set_idstr().

Since nvdimm devices do not have parent-bus, it will trigger the abort() in that
function.

I see.  The other devices that use a constant name are on a bus so the
abort doesn't trigger.


However, the MR name must be the same across the two machines.  Indices
are not friendly to hotplug.  Even though hotplug isn't supported now,
we should prepare and try not to change migration format when we support
hotplug in the future.



Thanks for your reminder.


Is there any other fixed value that we can use, for example the base
address of the NVDIMM?


How about use object_get_canonical_path(OBJECT(dev)) (the @dev is NVDIMM
device) ?

if you use split backend/frotnend idea then existing backends
already have a stable name derived from backend's ID and you won't need to care
about it.



Yes, i am using this idea and addressing your suggestion that use
memory_region_init_alias() to partly map hostmem to guest's address
space.

The code is like this:

/* get the memory region from backend memory. */
mr = host_memory_backend_get_memory(dimm->hostmem, errp);

/* nvdimm_nr will map to guest address space. */
memory_region_init_alias(>nvdimm_mr, OBJECT(dev),
 object_get_canonical_path(OBJECT(dev)), mr, 0,
 size - nvdimm->label_size);

/* the label size at the end of the file used as label_data of NVDIMM. */
..

So there are two memory regions, one is the backend-mem and another one
is nvdimm_mr in the example above. The name i am worried about is the name
of nvdimm_mr.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v8 09/13] KVM: Add an arch specific hooks in 'struct kvm_kernel_irqfd'

2015-09-17 Thread Paolo Bonzini



On 17/09/2015 03:51, Wu, Feng wrote:
> 
> 
>> -Original Message-
>> From: Paolo Bonzini [mailto:pbonz...@redhat.com]
>> Sent: Wednesday, September 16, 2015 5:27 PM
>> To: Wu, Feng; alex.william...@redhat.com; j...@8bytes.org;
>> mtosa...@redhat.com
>> Cc: eric.au...@linaro.org; kvm@vger.kernel.org;
>> io...@lists.linux-foundation.org; linux-ker...@vger.kernel.org
>> Subject: Re: [PATCH v8 09/13] KVM: Add an arch specific hooks in 'struct
>> kvm_kernel_irqfd'
>>
>>
>>
>> On 16/09/2015 10:50, Feng Wu wrote:
>>> +int kvm_arch_update_irqfd_routing(struct kvm *kvm, unsigned int host_irq,
>>> +  uint32_t guest_irq, bool set)
>>> +{
>>> +   return !kvm_x86_ops->update_pi_irte ? -EINVAL :
>>> +   kvm_x86_ops->update_pi_irte(kvm, host_irq, guest_irq, set);
>>> +}
>>> +
>>
>> Just use "if" here.  No need to resend if this is the only comment.
> 
> I am sorry, I don't quite understand. Do you mean I don't need to include
> this patch in v9? If so, what about other patches with your Reviewed-by?

No, I just wrote this email first.  If this was the only comment, I
could have fixed it up myself.  But since there were a few other
comments, please send v9 with all the required changes.

Thanks,

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 00/22] KVM: ARM64: Add guest PMU support

2015-09-17 Thread Shannon Zhao

Hi Andrew,

On 2015/9/17 17:30, Andrew Jones wrote:
> On Thu, Sep 17, 2015 at 09:32:34AM +0800, Shannon Zhao wrote:
>> > Hi Wei,
>> > 
>> > On 2015/9/17 5:07, Wei Huang wrote:
>>> > > I am testing this series. 
>> > Thanks for your time and help.
>> > 
>>> > > The first question is: do you plan to add ACPI
>>> > > support in QEMU?
>> > To the completeness, this should be added. Maybe this could be added at
>> > v3. But I have a look at the kernel PMU driver, it doesn't support
>> > probing through ACPI, although there are some patches[1] out-of-tree.
>> > 
>>> > > My in-house kernel uses ACPI for device probing. I had
>>> > > to force "acpi=off" when I test this patch series.
>> > Guest kernel only boots with ACPI when you add "acpi=force". No need to
>> > add "acpi=off".
> The Red Hat kernel uses ACPI by default, when tables are present (Thanks
> again for your QEMU table generation patches!)
> 
Oh, I see. Thanks for your explanation.

-- 
Shannon

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v4] ppc/spapr: Implement H_RANDOM hypercall in QEMU

2015-09-17 Thread Thomas Huth

The PAPR interface defines a hypercall to pass high-quality
hardware generated random numbers to guests. Recent kernels can
already provide this hypercall to the guest if the right hardware
random number generator is available. But in case the user wants
to use another source like EGD, or QEMU is running with an older
kernel, we should also have this call in QEMU, so that guests that
do not support virtio-rng yet can get good random numbers, too.

This patch now adds a new pseudo-device to QEMU that either
directly provides this hypercall to the guest or is able to
enable the in-kernel hypercall if available. The in-kernel
hypercall can be enabled with the use-kvm property, e.g.:

 qemu-system-ppc64 -device spapr-rng,use-kvm=true

For handling the hypercall in QEMU instead, a "RngBackend" is
required since the hypercall should provide "good" random data
instead of pseudo-random (like from a "simple" library function
like rand() or g_random_int()). Since there are multiple RngBackends
available, the user must select an appropriate back-end via the
"rng" property of the device, e.g.:

 qemu-system-ppc64 -object rng-random,filename=/dev/hwrng,id=gid0 \
   -device spapr-rng,rng=gid0 ...

See http://wiki.qemu-project.org/Features-Done/VirtIORNG for
other example of specifying RngBackends.

Signed-off-by: Thomas Huth 
---
 hw/ppc/Makefile.objs   |   2 +-
 hw/ppc/spapr.c |   8 +++
 hw/ppc/spapr_rng.c | 186 +
 include/hw/ppc/spapr.h |   4 ++
 target-ppc/kvm.c   |   9 +++
 target-ppc/kvm_ppc.h   |   5 ++
 6 files changed, 213 insertions(+), 1 deletion(-)
 create mode 100644 hw/ppc/spapr_rng.c

diff --git a/hw/ppc/Makefile.objs b/hw/ppc/Makefile.objs
index c8ab06e..c1ffc77 100644
--- a/hw/ppc/Makefile.objs
+++ b/hw/ppc/Makefile.objs
@@ -3,7 +3,7 @@ obj-y += ppc.o ppc_booke.o
 # IBM pSeries (sPAPR)
 obj-$(CONFIG_PSERIES) += spapr.o spapr_vio.o spapr_events.o
 obj-$(CONFIG_PSERIES) += spapr_hcall.o spapr_iommu.o spapr_rtas.o
-obj-$(CONFIG_PSERIES) += spapr_pci.o spapr_rtc.o spapr_drc.o
+obj-$(CONFIG_PSERIES) += spapr_pci.o spapr_rtc.o spapr_drc.o spapr_rng.o
 ifeq ($(CONFIG_PCI)$(CONFIG_PSERIES)$(CONFIG_LINUX), yyy)
 obj-y += spapr_pci_vfio.o
 endif
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index bf0c64f..34e7d24 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -768,6 +768,14 @@ static void spapr_finalize_fdt(sPAPRMachineState *spapr,
 exit(1);
 }
 
+if (object_resolve_path_type("", TYPE_SPAPR_RNG, NULL)) {
+ret = spapr_rng_populate_dt(fdt);
+if (ret < 0) {
+fprintf(stderr, "could not set up rng device in the fdt\n");
+exit(1);
+}
+}
+
 QLIST_FOREACH(phb, >phbs, list) {
 ret = spapr_populate_pci_dt(phb, PHANDLE_XICP, fdt);
 }
diff --git a/hw/ppc/spapr_rng.c b/hw/ppc/spapr_rng.c
new file mode 100644
index 000..ed43d5e
--- /dev/null
+++ b/hw/ppc/spapr_rng.c
@@ -0,0 +1,186 @@
+/*
+ * QEMU sPAPR random number generator "device" for H_RANDOM hypercall
+ *
+ * Copyright 2015 Thomas Huth, Red Hat Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License,
+ * or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see .
+ */
+
+#include "qemu/error-report.h"
+#include "sysemu/sysemu.h"
+#include "sysemu/device_tree.h"
+#include "sysemu/rng.h"
+#include "hw/ppc/spapr.h"
+#include "kvm_ppc.h"
+
+#define SPAPR_RNG(obj) \
+OBJECT_CHECK(sPAPRRngState, (obj), TYPE_SPAPR_RNG)
+
+struct sPAPRRngState {
+/*< private >*/
+DeviceState ds;
+RngBackend *backend;
+bool use_kvm;
+};
+typedef struct sPAPRRngState sPAPRRngState;
+
+struct HRandomData {
+QemuSemaphore sem;
+union {
+uint64_t v64;
+uint8_t v8[8];
+} val;
+int received;
+};
+typedef struct HRandomData HRandomData;
+
+/* Callback function for the RngBackend */
+static void random_recv(void *dest, const void *src, size_t size)
+{
+HRandomData *hrdp = dest;
+
+if (src && size > 0) {
+assert(size + hrdp->received <= sizeof(hrdp->val.v8));
+memcpy(>val.v8[hrdp->received], src, size);
+hrdp->received += size;
+}
+
+qemu_sem_post(>sem);
+}
+
+/* Handler for the H_RANDOM hypercall */
+static target_ulong h_random(PowerPCCPU *cpu, sPAPRMachineState *spapr,
+ target_ulong opcode, target_ulong *args)
+{
+sPAPRRngState *rngstate;
+HRandomData

Re: [PATCH v2 08/18] nvdimm: init backend memory mapping and config data area

2015-09-17 Thread Igor Mammedov

On Thu, 17 Sep 2015 16:39:12 +0800
Xiao Guangrong  wrote:

> 
> 
> On 09/16/2015 12:10 AM, Paolo Bonzini wrote:
> >
> >
> > On 01/09/2015 11:14, Stefan Hajnoczi wrote:
> 
>  When I was digging into live migration code, i noticed that the same MR 
>  name may
>  cause the name "idstr", please refer to qemu_ram_set_idstr().
> 
>  Since nvdimm devices do not have parent-bus, it will trigger the abort() 
>  in that
>  function.
> >> I see.  The other devices that use a constant name are on a bus so the
> >> abort doesn't trigger.
> >
> > However, the MR name must be the same across the two machines.  Indices
> > are not friendly to hotplug.  Even though hotplug isn't supported now,
> > we should prepare and try not to change migration format when we support
> > hotplug in the future.
> >
> 
> Thanks for your reminder.
> 
> > Is there any other fixed value that we can use, for example the base
> > address of the NVDIMM?
> 
> How about use object_get_canonical_path(OBJECT(dev)) (the @dev is NVDIMM
> device) ?
if you use split backend/frotnend idea then existing backends
already have a stable name derived from backend's ID and you won't need to care
about it.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/3] x86/paravirt: Fix baremetal paravirt MSR ops

2015-09-17 Thread Peter Zijlstra

On Wed, Sep 16, 2015 at 04:33:11PM -0700, Andy Lutomirski wrote:
> Setting CONFIG_PARAVIRT=y has an unintended side effect: it silently
> turns all rdmsr and wrmsr operations into the safe variants without
> any checks that the operations actually succeed.
> 
> This is IMO awful: it papers over bugs.  In particular, KVM gueests
> might be unwittingly depending on this behavior because
> CONFIG_KVM_GUEST currently depends on CONFIG_PARAVIRT.  I'm not
> aware of any such problems, but applying this series would be a good
> way to shake them out.
> 
> Fix it so that the MSR operations work the same on CONFIG_PARAVIRT=n
> and CONFIG_PARAVIRT=y as long as Xen isn't being used.  The Xen
> maintainers are welcome to make a similar change on top of this.
> 
> Since there's plenty of time before the next merge window, I think
> we should apply and fix anything that breaks.
> 
> Doing this is probably a prerequisite to sanely decoupling
> CONFIG_KVM_GUEST and CONFIG_PARAVIRT, which would probably make
> Arjan and the rest of the Clear Containers people happy :)

So I actually like this, although by Ingo's argument, its a tad risky.

But the far greater problem I have with the whole virt thing is that
you cannot use rdmsr_safe() to probe if an MSR exists at all because, as
you told me, these virt thingies return 0 for all 'unknown' MSRs instead
of faulting.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 00/22] KVM: ARM64: Add guest PMU support

2015-09-17 Thread Andrew Jones

On Thu, Sep 17, 2015 at 09:32:34AM +0800, Shannon Zhao wrote:
> Hi Wei,
> 
> On 2015/9/17 5:07, Wei Huang wrote:
> > I am testing this series. 
> Thanks for your time and help.
> 
> > The first question is: do you plan to add ACPI
> > support in QEMU?
> To the completeness, this should be added. Maybe this could be added at
> v3. But I have a look at the kernel PMU driver, it doesn't support
> probing through ACPI, although there are some patches[1] out-of-tree.
> 
> > My in-house kernel uses ACPI for device probing. I had
> > to force "acpi=off" when I test this patch series.
> Guest kernel only boots with ACPI when you add "acpi=force". No need to
> add "acpi=off".

The Red Hat kernel uses ACPI by default, when tables are present (Thanks
again for your QEMU table generation patches!)

drew

> 
> Thanks,
> 
> [1] http://marc.info/?l=linaro-acpi=137949337925645=2
> -- 
> Shannon
> ___
> kvmarm mailing list
> kvm...@lists.cs.columbia.edu
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 08/18] nvdimm: init backend memory mapping and config data area

2015-09-17 Thread Paolo Bonzini



On 17/09/2015 11:14, Xiao Guangrong wrote:
> 
> 
> /* get the memory region from backend memory. */
> mr = host_memory_backend_get_memory(dimm->hostmem, errp);
> 
> /* nvdimm_nr will map to guest address space. */
> memory_region_init_alias(>nvdimm_mr, OBJECT(dev),
>  object_get_canonical_path(OBJECT(dev)), mr, 0,
>  size - nvdimm->label_size);

You can just use "memory" here for the name here.  The name only needs
to be unique for RAM memory regions, and dimm->hostmem will take care of it.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v8 03/13] KVM: Define a new interface kvm_intr_is_single_vcpu()

2015-09-17 Thread Paolo Bonzini



On 17/09/2015 05:17, Wu, Feng wrote:
>>> > > +   if (irq->dest_mode == APIC_DEST_PHYSICAL) {
>>> > > +   if (irq->dest_id == 0xFF)
>>> > > +   goto out;
>>> > > +
>>> > > +   if (irq->dest_id >= ARRAY_SIZE(map->phys_map)) {
>> > 
>> > Warning here is wrong, the guest can trigger it.
> Could you please share more information about how the guest
> triggers these conditions (including the following two), Thanks
> a lot!

irq->dest_id is a 16-bit value, so it can be > 255.

> + if (!kvm_apic_logical_map_valid(map)) {
> + WARN_ON_ONCE(1);

Here, the guest can trigger it by setting a few APICs in flat mode and
others in cluster mode, for example.

> + if (cid >= ARRAY_SIZE(map->logical_map)) {
> + WARN_ON_ONCE(1);

In x2apic mode irq->dest_id could have bits 12..15 set.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/3] x86/paravirt: Fix baremetal paravirt MSR ops

2015-09-17 Thread Peter Zijlstra

On Thu, Sep 17, 2015 at 01:40:30PM +0200, Paolo Bonzini wrote:
> 
> 
> On 17/09/2015 10:58, Peter Zijlstra wrote:
> > But the far greater problem I have with the whole virt thing is that
> > you cannot use rdmsr_safe() to probe if an MSR exists at all because, as
> > you told me, these virt thingies return 0 for all 'unknown' MSRs instead
> > of faulting.
> 
> At least for KVM, that behavior is opt-in (the ignore_msrs parameter)
> and no distro that I know enables it by default.

Ah, that would be good news. Andy earlier argued I could not rely on
rdmsr_safe() faulting on unknown MSRs. If practically we can there's
some code I can simplify :-)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] [PATCH v2 08/18] nvdimm: init backend memory mapping and config data area

2015-09-17 Thread Xiao Guangrong




On 09/17/2015 05:34 PM, Paolo Bonzini wrote:



On 17/09/2015 11:14, Xiao Guangrong wrote:



/* get the memory region from backend memory. */
mr = host_memory_backend_get_memory(dimm->hostmem, errp);

/* nvdimm_nr will map to guest address space. */
memory_region_init_alias(>nvdimm_mr, OBJECT(dev),
  object_get_canonical_path(OBJECT(dev)), mr, 0,
  size - nvdimm->label_size);


You can just use "memory" here for the name here.  The name only needs
to be unique for RAM memory regions, and dimm->hostmem will take care of it.



Okay. I will try it, thank you, Paolo.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 7/8] arm64: KVM: Remove all traces of the ThumbEE registers

2015-09-17 Thread Marc Zyngier

From: Will Deacon 

Although the ThumbEE registers and traps were present in earlier
versions of the v8 architecture, it was retrospectively removed and so
we can do the same.

Whilst this breaks migrating a guest started on a previous version of
the kernel, it is much better to kill these (non existent) registers
as soon as possible.

Reviewed-by: Marc Zyngier 
Signed-off-by: Will Deacon 
[maz: added commend about migration]
Signed-off-by: Marc Zyngier 
---
 arch/arm64/include/asm/kvm_arm.h |  1 -
 arch/arm64/include/asm/kvm_asm.h |  4 +---
 arch/arm64/kvm/hyp.S | 22 --
 arch/arm64/kvm/sys_regs.c|  7 ---
 4 files changed, 5 insertions(+), 29 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index cbc5e1a..9694f26 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -172,7 +172,6 @@
 #define VTTBR_VMID_MASK  (UL(0xFF) << VTTBR_VMID_SHIFT)
 
 /* Hyp System Trap Register */
-#define HSTR_EL2_TTEE  (1 << 16)
 #define HSTR_EL2_T(x)  (1 << x)
 
 /* Hyp Coproccessor Trap Register Shifts */
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 67fa0de..5e37710 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -53,9 +53,7 @@
 #defineIFSR32_EL2  25  /* Instruction Fault Status Register */
 #defineFPEXC32_EL2 26  /* Floating-Point Exception Control 
Register */
 #defineDBGVCR32_EL227  /* Debug Vector Catch Register */
-#defineTEECR32_EL1 28  /* ThumbEE Configuration Register */
-#defineTEEHBR32_EL129  /* ThumbEE Handler Base Register */
-#defineNR_SYS_REGS 30
+#defineNR_SYS_REGS 28
 
 /* 32bit mapping */
 #define c0_MPIDR   (MPIDR_EL1 * 2) /* MultiProcessor ID Register */
diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
index 60a83e2..8563477 100644
--- a/arch/arm64/kvm/hyp.S
+++ b/arch/arm64/kvm/hyp.S
@@ -433,20 +433,13 @@
mrs x5, ifsr32_el2
stp x4, x5, [x3]
 
-   skip_fpsimd_state x8, 3f
+   skip_fpsimd_state x8, 2f
mrs x6, fpexc32_el2
str x6, [x3, #16]
-3:
-   skip_debug_state x8, 2f
+2:
+   skip_debug_state x8, 1f
mrs x7, dbgvcr32_el2
str x7, [x3, #24]
-2:
-   skip_tee_state x8, 1f
-
-   add x3, x2, #CPU_SYSREG_OFFSET(TEECR32_EL1)
-   mrs x4, teecr32_el1
-   mrs x5, teehbr32_el1
-   stp x4, x5, [x3]
 1:
 .endm
 
@@ -466,16 +459,9 @@
msr dacr32_el2, x4
msr ifsr32_el2, x5
 
-   skip_debug_state x8, 2f
+   skip_debug_state x8, 1f
ldr x7, [x3, #24]
msr dbgvcr32_el2, x7
-2:
-   skip_tee_state x8, 1f
-
-   add x3, x2, #CPU_SYSREG_OFFSET(TEECR32_EL1)
-   ldp x4, x5, [x3]
-   msr teecr32_el1, x4
-   msr teehbr32_el1, x5
 1:
 .endm
 
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 1d0463e..d03d3af 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -539,13 +539,6 @@ static const struct sys_reg_desc sys_reg_descs[] = {
{ Op0(0b10), Op1(0b000), CRn(0b0111), CRm(0b1110), Op2(0b110),
  trap_dbgauthstatus_el1 },
 
-   /* TEECR32_EL1 */
-   { Op0(0b10), Op1(0b010), CRn(0b), CRm(0b), Op2(0b000),
- NULL, reset_val, TEECR32_EL1, 0 },
-   /* TEEHBR32_EL1 */
-   { Op0(0b10), Op1(0b010), CRn(0b0001), CRm(0b), Op2(0b000),
- NULL, reset_val, TEEHBR32_EL1, 0 },
-
/* MDCCSR_EL1 */
{ Op0(0b10), Op1(0b011), CRn(0b), CRm(0b0001), Op2(0b000),
  trap_raz_wi },
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 6/8] arm: KVM: Disable virtual timer even if the guest is not using it

2015-09-17 Thread Marc Zyngier

When running a guest with the architected timer disabled (with QEMU and
the kernel_irqchip=off option, for example), it is important to make
sure the timer gets turned off. Otherwise, the guest may try to
enable it anyway, leading to a screaming HW interrupt.

The fix is to unconditionally turn off the virtual timer on guest
exit.

Cc: sta...@vger.kernel.org
Signed-off-by: Marc Zyngier 
---
 arch/arm/kvm/interrupts_head.S | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/arm/kvm/interrupts_head.S b/arch/arm/kvm/interrupts_head.S
index 702740d..51a5950 100644
--- a/arch/arm/kvm/interrupts_head.S
+++ b/arch/arm/kvm/interrupts_head.S
@@ -515,8 +515,7 @@ ARM_BE8(rev r6, r6  )
 
mrc p15, 0, r2, c14, c3, 1  @ CNTV_CTL
str r2, [vcpu, #VCPU_TIMER_CNTV_CTL]
-   bic r2, #1  @ Clear ENABLE
-   mcr p15, 0, r2, c14, c3, 1  @ CNTV_CTL
+
isb
 
mrrcp15, 3, rr_lo_hi(r2, r3), c14   @ CNTV_CVAL
@@ -529,6 +528,9 @@ ARM_BE8(rev r6, r6  )
mcrrp15, 4, r2, r2, c14 @ CNTVOFF
 
 1:
+   mov r2, #0  @ Clear ENABLE
+   mcr p15, 0, r2, c14, c3, 1  @ CNTV_CTL
+
@ Allow physical timer/counter access for the host
mrc p15, 4, r2, c14, c1, 0  @ CNTHCTL
orr r2, r2, #(CNTHCTL_PL1PCEN | CNTHCTL_PL1PCTEN)
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 8/8] arm/arm64: KVM: Remove 'config KVM_ARM_MAX_VCPUS'

2015-09-17 Thread Marc Zyngier

From: Ming Lei 

This patch removes config option of KVM_ARM_MAX_VCPUS,
and like other ARCHs, just choose the maximum allowed
value from hardware, and follows the reasons:

1) from distribution view, the option has to be
defined as the max allowed value because it need to
meet all kinds of virtulization applications and
need to support most of SoCs;

2) using a bigger value doesn't introduce extra memory
consumption, and the help text in Kconfig isn't accurate
because kvm_vpu structure isn't allocated until request
of creating VCPU is sent from QEMU;

3) the main effect is that the field of vcpus[] in 'struct kvm'
becomes a bit bigger(sizeof(void *) per vcpu) and need more cache
lines to hold the structure, but 'struct kvm' is one generic struct,
and it has worked well on other ARCHs already in this way. Also,
the world switch frequecy is often low, for example, it is ~2000
when running kernel building load in VM from APM xgene KVM host,
so the effect is very small, and the difference can't be observed
in my test at all.

Cc: Dann Frazier 
Signed-off-by: Ming Lei 
Reviewed-by: Christoffer Dall 
Signed-off-by: Marc Zyngier 
---
 arch/arm/include/asm/kvm_host.h   |  8 ++--
 arch/arm/kvm/Kconfig  | 11 ---
 arch/arm64/include/asm/kvm_host.h |  8 ++--
 arch/arm64/kvm/Kconfig| 11 ---
 include/kvm/arm_vgic.h|  6 +-
 virt/kvm/arm/vgic-v3.c|  2 +-
 6 files changed, 6 insertions(+), 40 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index dcba0fa..c8c226a 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -29,12 +29,6 @@
 
 #define __KVM_HAVE_ARCH_INTC_INITIALIZED
 
-#if defined(CONFIG_KVM_ARM_MAX_VCPUS)
-#define KVM_MAX_VCPUS CONFIG_KVM_ARM_MAX_VCPUS
-#else
-#define KVM_MAX_VCPUS 0
-#endif
-
 #define KVM_USER_MEM_SLOTS 32
 #define KVM_PRIVATE_MEM_SLOTS 4
 #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
@@ -44,6 +38,8 @@
 
 #include 
 
+#define KVM_MAX_VCPUS VGIC_V2_MAX_CPUS
+
 u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode);
 int __attribute_const__ kvm_target_cpu(void);
 int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
index bfb915d..210ecca 100644
--- a/arch/arm/kvm/Kconfig
+++ b/arch/arm/kvm/Kconfig
@@ -45,15 +45,4 @@ config KVM_ARM_HOST
---help---
  Provides host support for ARM processors.
 
-config KVM_ARM_MAX_VCPUS
-   int "Number maximum supported virtual CPUs per VM"
-   depends on KVM_ARM_HOST
-   default 4
-   help
- Static number of max supported virtual CPUs per VM.
-
- If you choose a high number, the vcpu structures will be quite
- large, so only choose a reasonable number that you expect to
- actually use.
-
 endif # VIRTUALIZATION
diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index 415938d..3fb58ea 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -30,12 +30,6 @@
 
 #define __KVM_HAVE_ARCH_INTC_INITIALIZED
 
-#if defined(CONFIG_KVM_ARM_MAX_VCPUS)
-#define KVM_MAX_VCPUS CONFIG_KVM_ARM_MAX_VCPUS
-#else
-#define KVM_MAX_VCPUS 0
-#endif
-
 #define KVM_USER_MEM_SLOTS 32
 #define KVM_PRIVATE_MEM_SLOTS 4
 #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
@@ -43,6 +37,8 @@
 #include 
 #include 
 
+#define KVM_MAX_VCPUS VGIC_V3_MAX_CPUS
+
 #define KVM_VCPU_MAX_FEATURES 3
 
 int __attribute_const__ kvm_target_cpu(void);
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index bfffe8f..5c7e920 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -41,15 +41,4 @@ config KVM_ARM_HOST
---help---
  Provides host support for ARM processors.
 
-config KVM_ARM_MAX_VCPUS
-   int "Number maximum supported virtual CPUs per VM"
-   depends on KVM_ARM_HOST
-   default 4
-   help
- Static number of max supported virtual CPUs per VM.
-
- If you choose a high number, the vcpu structures will be quite
- large, so only choose a reasonable number that you expect to
- actually use.
-
 endif # VIRTUALIZATION
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index d901f1a..4e14dac 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -35,11 +35,7 @@
 #define VGIC_V3_MAX_LRS16
 #define VGIC_MAX_IRQS  1024
 #define VGIC_V2_MAX_CPUS   8
-
-/* Sanity checks... */
-#if (KVM_MAX_VCPUS > 255)
-#error Too many KVM VCPUs, the VGIC only supports up to 255 VCPUs for now
-#endif
+#define VGIC_V3_MAX_CPUS   255
 
 #if (VGIC_NR_IRQS_LEGACY & 31)
 #error "VGIC_NR_IRQS must be a multiple of 32"
diff --git a/virt/kvm/arm/vgic-v3.c b/virt/kvm/arm/vgic-v3.c
index afbf925..7dd5d62 100644
--- a/virt/kvm/arm/vgic-v3.c
+++

[PATCH 3/8] arm: KVM: Fix incorrect device to IPA mapping

2015-09-17 Thread Marc Zyngier

From: Marek Majtyka 

A critical bug has been found in device memory stage1 translation for
VMs with more then 4GB of address space. Once vm_pgoff size is smaller
then pa (which is true for LPAE case, u32 and u64 respectively) some
more significant bits of pa may be lost as a shift operation is performed
on u32 and later cast onto u64.

Example: vm_pgoff(u32)=0x00210030, PAGE_SHIFT=12
expected pa(u64):   0x00201003
produced pa(u64):   0x1003

The fix is to change the order of operations (casting first onto phys_addr_t
and then shifting).

Reviewed-by: Marc Zyngier 
[maz: fixed changelog and patch formatting]
Cc: sta...@vger.kernel.org
Signed-off-by: Marek Majtyka 
Signed-off-by: Marc Zyngier 
---
 arch/arm/kvm/mmu.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 7b42012..6984342 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -1792,8 +1792,10 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
if (vma->vm_flags & VM_PFNMAP) {
gpa_t gpa = mem->guest_phys_addr +
(vm_start - mem->userspace_addr);
-   phys_addr_t pa = (vma->vm_pgoff << PAGE_SHIFT) +
-vm_start - vma->vm_start;
+   phys_addr_t pa;
+
+   pa = (phys_addr_t)vma->vm_pgoff << PAGE_SHIFT;
+   pa += vm_start - vma->vm_start;
 
/* IO region dirty page logging not allowed */
if (memslot->flags & KVM_MEM_LOG_DIRTY_PAGES)
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 4/8] arm/arm64: KVM: vgic: Check for !irqchip_in_kernel() when mapping resources

2015-09-17 Thread Marc Zyngier

From: Pavel Fedin 

Until b26e5fdac43c ("arm/arm64: KVM: introduce per-VM ops"),
kvm_vgic_map_resources() used to include a check on irqchip_in_kernel(),
and vgic_v2_map_resources() still has it.

But now vm_ops are not initialized until we call kvm_vgic_create().
Therefore kvm_vgic_map_resources() can being called without a VGIC,
and we die because vm_ops.map_resources is NULL.

Fixing this restores QEMU's kernel-irqchip=off option to a working state,
allowing to use GIC emulation in userspace.

Fixes: b26e5fdac43c ("arm/arm64: KVM: introduce per-VM ops")
Cc: sta...@vger.kernel.org
Signed-off-by: Pavel Fedin 
[maz: reworked commit message]
Signed-off-by: Marc Zyngier 
---
 arch/arm/kvm/arm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index ce404a5..dc017ad 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -446,7 +446,7 @@ static int kvm_vcpu_first_run_init(struct kvm_vcpu *vcpu)
 * Map the VGIC hardware resources before running a vcpu the first
 * time on this VM.
 */
-   if (unlikely(!vgic_ready(kvm))) {
+   if (unlikely(irqchip_in_kernel(kvm) && !vgic_ready(kvm))) {
ret = kvm_vgic_map_resources(kvm);
if (ret)
return ret;
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 5/8] arm64: KVM: Disable virtual timer even if the guest is not using it

2015-09-17 Thread Marc Zyngier

When running a guest with the architected timer disabled (with QEMU and
the kernel_irqchip=off option, for example), it is important to make
sure the timer gets turned off. Otherwise, the guest may try to
enable it anyway, leading to a screaming HW interrupt.

The fix is to unconditionally turn off the virtual timer on guest
exit.

Cc: sta...@vger.kernel.org
Reviewed-by: Christoffer Dall 
Signed-off-by: Marc Zyngier 
---
 arch/arm64/kvm/hyp.S | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
index 39aa322..60a83e2 100644
--- a/arch/arm64/kvm/hyp.S
+++ b/arch/arm64/kvm/hyp.S
@@ -562,8 +562,6 @@
mrs x3, cntv_ctl_el0
and x3, x3, #3
str w3, [x0, #VCPU_TIMER_CNTV_CTL]
-   bic x3, x3, #1  // Clear Enable
-   msr cntv_ctl_el0, x3
 
isb
 
@@ -571,6 +569,9 @@
str x3, [x0, #VCPU_TIMER_CNTV_CVAL]
 
 1:
+   // Disable the virtual timer
+   msr cntv_ctl_el0, xzr
+
// Allow physical timer/counter access for the host
mrs x2, cnthctl_el2
orr x2, x2, #3
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Xen-devel] [PATCH 0/3] x86/paravirt: Fix baremetal paravirt MSR ops

2015-09-17 Thread Boris Ostrovsky


On 09/17/2015 05:10 AM, Andrew Cooper wrote:

On 17/09/15 00:33, Andy Lutomirski wrote:

Setting CONFIG_PARAVIRT=y has an unintended side effect: it silently
turns all rdmsr and wrmsr operations into the safe variants without
any checks that the operations actually succeed.

This is IMO awful: it papers over bugs.  In particular, KVM gueests
might be unwittingly depending on this behavior because
CONFIG_KVM_GUEST currently depends on CONFIG_PARAVIRT.  I'm not
aware of any such problems, but applying this series would be a good
way to shake them out.

Fix it so that the MSR operations work the same on CONFIG_PARAVIRT=n
and CONFIG_PARAVIRT=y as long as Xen isn't being used.  The Xen
maintainers are welcome to make a similar change on top of this.

The Xen side of things need some further modification before this would
be a safe operation to perform.

On the wrmsr side of things alone, this is the list of things Xen
currently objects to and injects #GP faults for.

(XEN) traps.c:2692:d0v0 Domain attempted WRMSR c081 from
0xe023e008 to 0x00230010.
(XEN) traps.c:2692:d0v0 Domain attempted WRMSR c082 from
0x82d0b000 to 0x81560060.
(XEN) traps.c:2692:d0v0 Domain attempted WRMSR c083 from
0x82d0b020 to 0x81558100.
(XEN) traps.c:2692:d0v0 Domain attempted WRMSR 0174 from
0xe008 to 0x0010.
(XEN) traps.c:2692:d0v0 Domain attempted WRMSR 0175 from
0x8300ac0f7fc0 to 0x.
(XEN) traps.c:2692:d0v0 Domain attempted WRMSR 0176 from
0x82d08023fd50 to 0x815616d0.
(XEN) traps.c:2692:d0v0 Domain attempted WRMSR c083 from
0x82d0b020 to 0x81561910.
(XEN) traps.c:2692:d0v0 Domain attempted WRMSR c084 from
0x00074700 to 0x00047700.

However, it would be certainly be worth teaching PVops not to play with
MSRs it doesn't own.


PVops already knows about those. There is even has a comment about how 
we shouldn't touch those MSRs. And yet three lines later we still write 
them.


-boris

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/3] x86/paravirt: Fix baremetal paravirt MSR ops

2015-09-17 Thread Andy Lutomirski

On Thu, Sep 17, 2015 at 12:19 AM, Ingo Molnar  wrote:
>
> * Andy Lutomirski  wrote:
>
>> Setting CONFIG_PARAVIRT=y has an unintended side effect: it silently
>> turns all rdmsr and wrmsr operations into the safe variants without
>> any checks that the operations actually succeed.
>>
>> This is IMO awful: it papers over bugs.  In particular, KVM gueests
>> might be unwittingly depending on this behavior because
>> CONFIG_KVM_GUEST currently depends on CONFIG_PARAVIRT.  I'm not
>> aware of any such problems, but applying this series would be a good
>> way to shake them out.
>>
>> Fix it so that the MSR operations work the same on CONFIG_PARAVIRT=n
>> and CONFIG_PARAVIRT=y as long as Xen isn't being used.  The Xen
>> maintainers are welcome to make a similar change on top of this.
>>
>> Since there's plenty of time before the next merge window, I think
>> we should apply and fix anything that breaks.
>
> No, I think we should at most generate a warning instead, and not crash the 
> kernel
> via rdmsr()!
>
> Most big distro kernels on bare metal have CONFIG_PARAVIRT=y (I checked 
> Ubuntu and
> Fedora), so we are potentially exposing a lot of users to problems.
>
> Crashing the bootup on an unknown MSR is bad. Many MSR reads and writes are
> non-critical and returning the 'safe' result is much better than crashing or
> hanging the bootup.
>

Should we do that for CONFIG_PARAVIRT=n, too?

It would be straightforward to rig this up (temporarily?) on top of
these patches.  To keep bloat down, we might want to implement it in
do_general_protection rather than sticking it in native_read_msr.

wrmsr is a different beast, since we can fail due to writing the wrong
value to an otherwise valid MSR.  Given that MSR screwups can very
easily be security holes, I'm not sure that warning and blindly
continuing on an unchecked failed wrmsr is a good idea.

In any event, I think it's nuts that CONFIG_PARAVIRT changes this
behavior.  We should pick something sane and stick with it.
CONFIG_PARAVIRT=n appears to work just fine on bare metal in lots of
deployments, especially pre-KVM.  CONFIG_PARAVIRT=n also appears to
work fine under KVM, and I use it frequently.

> ( We should double check that rdmsr()/wrmsr() results are never left
>   uninitialized, but are set to zero or so, for cases where the return code 
> is not
>   checked. )

It sure looks like native_read_msr_safe doesn't clear the output if
the rdmsr fails.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/3] x86/paravirt: Fix baremetal paravirt MSR ops

2015-09-17 Thread Peter Zijlstra

On Thu, Sep 17, 2015 at 08:17:18AM -0700, Andy Lutomirski wrote:

> > Ah, that would be good news. Andy earlier argued I could not rely on
> > rdmsr_safe() faulting on unknown MSRs. If practically we can there's
> > some code I can simplify :-)
> 
> I was taking about QEMU TCG, not KVM.

Just for my education, TCG is the thing without _any_ hardware assist?
The thing you fear to use because it cannot boot a kernel this side of
tomorrow etc.. ?
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/3] x86/paravirt: Fix baremetal paravirt MSR ops

2015-09-17 Thread Andy Lutomirski

On Thu, Sep 17, 2015 at 8:17 AM, Peter Zijlstra  wrote:
> On Thu, Sep 17, 2015 at 08:17:18AM -0700, Andy Lutomirski wrote:
>
>> > Ah, that would be good news. Andy earlier argued I could not rely on
>> > rdmsr_safe() faulting on unknown MSRs. If practically we can there's
>> > some code I can simplify :-)
>>
>> I was taking about QEMU TCG, not KVM.
>
> Just for my education, TCG is the thing without _any_ hardware assist?

Yes.

> The thing you fear to use because it cannot boot a kernel this side of
> tomorrow etc.. ?

I actually use it on a semi-regular basis.  It appears to boot a
normal kernel correctly and surprisingly quickly.  It's important for
a silly reason.  Asking KVM on an Intel host to emulate an AMD CPU or
vice versa results in a chimera: I get an Opteron that's
"GenuineIntel", which causes Linux to treat it as Intel, which means
it gets the Intel quirks (SYSENTER in long mode, no
X86_BUG_SYSRET_SS_ATTRS, etc) instead of the AMD quirks (SYSCALL in
compat mode, etc).  So, if I want to test compat SYSCALL, I have to
use TCG, which will actually do a decent job of emulating an AMD CPU
for me.

Maybe Paolo can fix QEMU to fail bad MSR accesses for real...

--Andy

-- 
Andy Lutomirski
AMA Capital Management, LLC
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/3] x86/paravirt: Fix baremetal paravirt MSR ops

2015-09-17 Thread Paolo Bonzini



On 17/09/2015 17:27, Arjan van de Ven wrote:
>>
>>> ( We should double check that rdmsr()/wrmsr() results are never left
>>>uninitialized, but are set to zero or so, for cases where the
>>> return code is not
>>>checked. )
>>
>> It sure looks like native_read_msr_safe doesn't clear the output if
>> the rdmsr fails.
> 
> I'd suggest to return some poison not just 0...

What about 0 + WARN?

Paolo

> less likely to get interesting surprises that are insane hard to
> debug/diagnose
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/3] x86/paravirt: Fix baremetal paravirt MSR ops

2015-09-17 Thread Paolo Bonzini



On 17/09/2015 17:26, Andy Lutomirski wrote:
> Maybe Paolo can fix QEMU to fail bad MSR accesses for real...

I was afraid of someone proposing exactly that. :)

I can do it since the list of MSRs can be lifted from KVM.  Let's first
see the direction these patches go...

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v8 03/13] KVM: Define a new interface kvm_intr_is_single_vcpu()

2015-09-17 Thread Radim Krčmář

2015-09-17 16:24+0200, Paolo Bonzini:
> I think kvm_apic_match_logical_addr for MSI and IOAPIC interrupts is
> buggy in x2apic mode.
> 
> It does:
> 
> if (apic_x2apic_mode(apic))
> return ((logical_id >> 16) == (mda >> 16))
>&& (logical_id & mda & 0x) != 0;
> 
> But mda is only 8-bits for MSI and IOAPIC interrupts.
> 
> Radim, should kvm_apic_mda also handle the !ipi && x2apic_mda && dest_id
> != APIC_BROADCAST case?  It never triggers with Linux because it uses
> only the physical mode (that's not super-easy to see;
> ioapic_set_affinity looks for the RTEs in irq_data->chip_data and that
> is allocated with kzalloc).

KVM handles that case, it's just convoluted.
(I wish we scrapped the IR-less x2APIC mode.)

For interrupts from MSI and IOxAPIC:
- Flat logical interrupts are delivered as if we had natural
  (CPU0<->bit0, CPU1<->bit1, ...) flat logical xAPIC for first 8 VCPUs.
- Cluster logical doesn't work much, it's interpreted like flat logical.
  I didn't care about xAPIC cluster because Linux, the sole user of our
  paravirtualized x2APIC, doesn't configure it.

I'll paste kvm_apic_mda() source for better explanation:

  static u32 kvm_apic_mda(unsigned int dest_id, struct kvm_lapic *source,
struct kvm_lapic *target)
  {
bool ipi = source != NULL;
bool x2apic_mda = apic_x2apic_mode(ipi ? source : target);
  
if (!ipi && dest_id == APIC_BROADCAST && x2apic_mda)
return X2APIC_BROADCAST;
  
return x2apic_mda ? dest_id : SET_APIC_DEST_FIELD(dest_id);
  }

MSI/IOxAPIC interrupt means that source is NULL and if the target is in
x2APIC mode, the original 'dest_id' is returned as mda => a flat logical
xAPIC to 0x0f will get interpreted as (cluster) logical x2APIC 0xf in
kvm_apic_match_logical_addr().
xAPIC address are only 8 bit long so they always get delivered to x2APIC
cluster 0, where first 16 bits work like xAPIC flat logical mode.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] kvm: fix preemption warnings in kvm_vcpu_block

2015-09-17 Thread Dominik Dingel

Commit f78195129963 ("kvm: add halt_poll_ns module parameter") calls, with
enabled preemption, single_task_running. When CONFIG_DEBUG_PREEMPT is
enabled that will result in a debug_smp_processor_id() call.

Cc:   # 4.2.x
Signed-off-by: Dominik Dingel 
---
 virt/kvm/kvm_main.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 54534de..ce67dd6 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1971,6 +1971,7 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
 
start = cur = ktime_get();
if (vcpu->halt_poll_ns) {
+   bool solo;
ktime_t stop = ktime_add_ns(ktime_get(), vcpu->halt_poll_ns);
 
do {
@@ -1982,8 +1983,13 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
++vcpu->stat.halt_successful_poll;
goto out;
}
+
+   preempt_disable();
+   solo = single_task_running();
+   preempt_enable();
+
cur = ktime_get();
-   } while (single_task_running() && ktime_before(cur, stop));
+   } while (solo && ktime_before(cur, stop));
}
 
for (;;) {
-- 
2.3.8

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 5/8] arm64: KVM: Disable virtual timer even if the guest is not using it

2015-09-17 Thread Marc Zyngier

On 17/09/15 16:02, Paolo Bonzini wrote:
> 
> 
> On 17/09/2015 16:46, Marc Zyngier wrote:
>> When running a guest with the architected timer disabled (with QEMU and
>> the kernel_irqchip=off option, for example), it is important to make
>> sure the timer gets turned off. Otherwise, the guest may try to
>> enable it anyway, leading to a screaming HW interrupt.
>>
>> The fix is to unconditionally turn off the virtual timer on guest
>> exit.
>>
>> Cc: sta...@vger.kernel.org
>> Reviewed-by: Christoffer Dall 
>> Signed-off-by: Marc Zyngier 
>> ---
>>  arch/arm64/kvm/hyp.S | 5 +++--
>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
>> index 39aa322..60a83e2 100644
>> --- a/arch/arm64/kvm/hyp.S
>> +++ b/arch/arm64/kvm/hyp.S
>> @@ -562,8 +562,6 @@
>>  mrs x3, cntv_ctl_el0
>>  and x3, x3, #3
>>  str w3, [x0, #VCPU_TIMER_CNTV_CTL]
>> -bic x3, x3, #1  // Clear Enable
>> -msr cntv_ctl_el0, x3
>>  
>>  isb
>>  
>> @@ -571,6 +569,9 @@
>>  str x3, [x0, #VCPU_TIMER_CNTV_CVAL]
>>  
>>  1:
>> +// Disable the virtual timer
>> +msr cntv_ctl_el0, xzr
>> +
>>  // Allow physical timer/counter access for the host
>>  mrs x2, cnthctl_el2
>>  orr x2, x2, #3
>>
> 
> It looks like here in restore_timer_state:
> 
> ldr w2, [x0, #VCPU_TIMER_CNTV_CTL]
> and x2, x2, #3
> msr cntv_ctl_el0, x2
> 
> the "and" would be unnecessary if kvm_arm_timer_set_reg remembered to 
> do it.  Something like this, which would also make the code more 
> consistent between arm and arm64...
> 
> diff --git a/arch/arm/kvm/interrupts_head.S b/arch/arm/kvm/interrupts_head.S
> index 702740d37465..93e322b4d242 100644
> --- a/arch/arm/kvm/interrupts_head.S
> +++ b/arch/arm/kvm/interrupts_head.S
> @@ -514,6 +514,7 @@ ARM_BE8(rev   r6, r6  )
>   beq 1f
>  
>   mrc p15, 0, r2, c14, c3, 1  @ CNTV_CTL
> + and r2, r2, #3

I don't think we need this. Exposing the ISTATUS bit to the kernel (or
even userspace) is not really a problem (that's actually an interesting
piece of information), and restoring it is not possible since it is
read-only.

We should drop the equivalent 'and' from the arm64 version.

>   str r2, [vcpu, #VCPU_TIMER_CNTV_CTL]
>   bic r2, #1  @ Clear ENABLE
>   mcr p15, 0, r2, c14, c3, 1  @ CNTV_CTL
> @@ -566,7 +567,6 @@ ARM_BE8(rev   r6, r6  )
>   isb
>  
>   ldr r2, [vcpu, #VCPU_TIMER_CNTV_CTL]
> - and r2, r2, #3
>   mcr p15, 0, r2, c14, c3, 1  @ CNTV_CTL
>  1:
>  .endm
> diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
> index 10915aaf0b01..bfcd3f3a947b 100644
> --- a/arch/arm64/kvm/hyp.S
> +++ b/arch/arm64/kvm/hyp.S
> @@ -887,7 +887,6 @@ alternative_endif
>   isb
>  
>   ldr w2, [x0, #VCPU_TIMER_CNTV_CTL]
> - and x2, x2, #3
>   msr cntv_ctl_el0, x2
>  1:
>  .endm
> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> index 98c95f2fcba4..9b03c9f5abbf 100644
> --- a/virt/kvm/arm/arch_timer.c
> +++ b/virt/kvm/arm/arch_timer.c
> @@ -218,7 +218,7 @@ int kvm_arm_timer_set_reg(struct kvm_vcpu *vcpu, u64 
> regid, u64 value)
>  
>   switch (regid) {
>   case KVM_REG_ARM_TIMER_CTL:
> - timer->cntv_ctl = value;
> + timer->cntv_ctl = value & (ARCH_TIMER_CTRL_IT_MASK | 
> ARCH_TIMER_CTRL_ENABLE);
>   break;
>   case KVM_REG_ARM_TIMER_CNT:
>   vcpu->kvm->arch.timer.cntvoff = kvm_phys_timer_read() - value;
> 

Otherwise looks pretty good. Can you send an updated patch?

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/3] x86/paravirt: Fix baremetal paravirt MSR ops

2015-09-17 Thread Arjan van de Ven




( We should double check that rdmsr()/wrmsr() results are never left
   uninitialized, but are set to zero or so, for cases where the return code is 
not
   checked. )


It sure looks like native_read_msr_safe doesn't clear the output if
the rdmsr fails.


I'd suggest to return some poison not just 0...
less likely to get interesting surprises that are insane hard to debug/diagnose



--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 5/8] arm64: KVM: Disable virtual timer even if the guest is not using it

2015-09-17 Thread Paolo Bonzini



On 17/09/2015 17:28, Marc Zyngier wrote:
> > diff --git a/arch/arm/kvm/interrupts_head.S b/arch/arm/kvm/interrupts_head.S
> > index 702740d37465..93e322b4d242 100644
> > --- a/arch/arm/kvm/interrupts_head.S
> > +++ b/arch/arm/kvm/interrupts_head.S
> > @@ -514,6 +514,7 @@ ARM_BE8(rev r6, r6  )
> > beq 1f
> >  
> > mrc p15, 0, r2, c14, c3, 1  @ CNTV_CTL
> > +   and r2, r2, #3
> 
> I don't think we need this. Exposing the ISTATUS bit to the kernel (or
> even userspace) is not really a problem (that's actually an interesting
> piece of information), and restoring it is not possible since it is
> read-only.
> 
> We should drop the equivalent 'and' from the arm64 version.

Ok.  I'll resend the thing as a proper patch.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/3] x86/paravirt: Fix baremetal paravirt MSR ops

2015-09-17 Thread Arjan van de Ven


On 9/17/2015 8:29 AM, Paolo Bonzini wrote:



On 17/09/2015 17:27, Arjan van de Ven wrote:



( We should double check that rdmsr()/wrmsr() results are never left
uninitialized, but are set to zero or so, for cases where the
return code is not
checked. )


It sure looks like native_read_msr_safe doesn't clear the output if
the rdmsr fails.


I'd suggest to return some poison not just 0...


What about 0 + WARN?


why 0?

0xdeadbeef or any other pattern (even 0x3636363636) makes more sense (of course 
also WARN... but most folks don't read dmesg for WARNs)

(it's the same thing we do for list or slab poison stuff)

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/3] x86/paravirt: Fix baremetal paravirt MSR ops

2015-09-17 Thread Borislav Petkov

On Thu, Sep 17, 2015 at 01:39:26PM +0200, Paolo Bonzini wrote:
> That's not a big deal, that's what *_safe is for.  The problem is that
> there are definitely some cases where the *_safe version is not being used.

I mean to do feature checks which assure you that those MSRs are
there so you don't need the safe variants. And that is not always
easy/possible.

-- 
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/3] x86/paravirt: Fix baremetal paravirt MSR ops

2015-09-17 Thread Paolo Bonzini



On 17/09/2015 17:31, Arjan van de Ven wrote:
>>
>> What about 0 + WARN?
> 
> why 0?
> 
> 0xdeadbeef or any other pattern (even 0x3636363636) makes more sense (of
> course also WARN... but most folks don't read dmesg for WARNs)
> 
> (it's the same thing we do for list or slab poison stuff)

Sorry, brain fart.  That makes sense for the safe variants.  It would
require some auditing though.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v8 03/13] KVM: Define a new interface kvm_intr_is_single_vcpu()

2015-09-17 Thread Paolo Bonzini



On 17/09/2015 17:58, Radim Krčmář wrote:
> For interrupts from MSI and IOxAPIC:
> - Flat logical interrupts are delivered as if we had natural
>   (CPU0<->bit0, CPU1<->bit1, ...) flat logical xAPIC for first 8 VCPUs.
> - Cluster logical doesn't work much, it's interpreted like flat logical.
>   I didn't care about xAPIC cluster because Linux, the sole user of our
>   paravirtualized x2APIC, doesn't configure it.
> 
> I'll paste kvm_apic_mda() source for better explanation:
> 
>   static u32 kvm_apic_mda(unsigned int dest_id, struct kvm_lapic *source,
> struct kvm_lapic *target)
>   {
>   bool ipi = source != NULL;
>   bool x2apic_mda = apic_x2apic_mode(ipi ? source : target);
>   
>   if (!ipi && dest_id == APIC_BROADCAST && x2apic_mda)
>   return X2APIC_BROADCAST;
>   
>   return x2apic_mda ? dest_id : SET_APIC_DEST_FIELD(dest_id);
>   }
> 
> MSI/IOxAPIC interrupt means that source is NULL and if the target is in
> x2APIC mode, the original 'dest_id' is returned as mda => a flat logical
> xAPIC to 0x0f will get interpreted as (cluster) logical x2APIC 0xf in
> kvm_apic_match_logical_addr().
> xAPIC address are only 8 bit long so they always get delivered to x2APIC
> cluster 0, where first 16 bits work like xAPIC flat logical mode.

Ok, I was wondering whether this was the correct interpretation.  Thanks!

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] kvmtool: don't rely on $HOME

2015-09-17 Thread Will Deacon

On Thu, Sep 17, 2015 at 03:03:15PM +0100, Alban Crequy wrote:
> kvm__set_dir() called in main() and kvm__get_dir() rely on $HOME. But in
> some environments (such as starting lkvm through systemd-run), $HOME is
> undefined. This causes bind() to use a socket path containing "(null)"
> and this fails. The current code does not check errors returned by
> realpath().
> 
> Symptoms:
> 
> | bind: No such file or directory
> |   Error: Failed adding socket to epoll
> |  Warning: Failed init: kvm_ipc__init
> |
> |  Fatal: Initialisation failed
> 
> This bug was first reported on https://github.com/coreos/rkt/issues/1393
> 
> Instead of using "$HOME/.lkvm/" (i.e. "/root/.lkvm/"), this patch uses
> "/var/lib/lkvm/". This also improve the error reporting by printing the
> socket filename.

Hmm, but that requires lkvm to be run with sufficient privileges to
write to /var/lib, which I don't think is generally the case. I think we
have a few options:

  (1) Try /var/lib/lkvm if $HOME is NULL
  (2) Use an alternative environment variable for the pid prefix
  (3) Add a --pid command line option for the pidfile
  (4) ???

Any preferences? What do other projects do?

Will
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Xen-devel] [PATCH 0/3] x86/paravirt: Fix baremetal paravirt MSR ops

2015-09-17 Thread Andrew Cooper

On 17/09/15 16:27, Borislav Petkov wrote:
> On Thu, Sep 17, 2015 at 01:39:26PM +0200, Paolo Bonzini wrote:
>> That's not a big deal, that's what *_safe is for.  The problem is that
>> there are definitely some cases where the *_safe version is not being used.
> I mean to do feature checks which assure you that those MSRs are
> there so you don't need the safe variants. And that is not always
> easy/possible.
>

There are plenty of non-architectural MSRs in use which don't have
feature bits.

Xen used to have problems booting when using the masking MSRs when
booting virtualised.  Nowadays it uses a cpu vendor check and _safe()
probe to detect support.

~Andrew
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Xen-devel] [PATCH 0/3] x86/paravirt: Fix baremetal paravirt MSR ops

2015-09-17 Thread Borislav Petkov

On Thu, Sep 17, 2015 at 04:32:53PM +0100, Andrew Cooper wrote:
> There are plenty of non-architectural MSRs in use which don't have
> feature bits.

That's exactly what the "possible" adjective was supposed to represent.

-- 
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

RE: [PATCH v8 03/13] KVM: Define a new interface kvm_intr_is_single_vcpu()

2015-09-17 Thread Wu, Feng



> -Original Message-
> From: Paolo Bonzini [mailto:pbonz...@redhat.com]
> Sent: Thursday, September 17, 2015 5:42 PM
> To: Wu, Feng; alex.william...@redhat.com; j...@8bytes.org;
> mtosa...@redhat.com
> Cc: eric.au...@linaro.org; kvm@vger.kernel.org;
> io...@lists.linux-foundation.org; linux-ker...@vger.kernel.org
> Subject: Re: [PATCH v8 03/13] KVM: Define a new interface
> kvm_intr_is_single_vcpu()
> 
> 
> 
> On 17/09/2015 05:17, Wu, Feng wrote:
> >>> > > + if (irq->dest_mode == APIC_DEST_PHYSICAL) {
> >>> > > + if (irq->dest_id == 0xFF)
> >>> > > + goto out;
> >>> > > +
> >>> > > + if (irq->dest_id >= ARRAY_SIZE(map->phys_map)) {
> >> >
> >> > Warning here is wrong, the guest can trigger it.
> > Could you please share more information about how the guest
> > triggers these conditions (including the following two), Thanks
> > a lot!
> 
> irq->dest_id is a 16-bit value, so it can be > 255.

Yes, irq->dest_id is defined as u32, but by looking the current KVM
code, seems desst_id is used as an u8 variable, even in x2apic mode
the dest_id will not beyond 255 (except for broadcast dest in in x2apic
mode). Correct me if I am wrong. Thanks a lot!

> 
> > +   if (!kvm_apic_logical_map_valid(map)) {
> > +   WARN_ON_ONCE(1);
> 
> Here, the guest can trigger it by setting a few APICs in flat mode and
> others in cluster mode, for example.

Oh, right, the logical map works only when the destination mode of all
the vCPUs are the same.

> 
> > +   if (cid >= ARRAY_SIZE(map->logical_map)) {
> > +   WARN_ON_ONCE(1);
> 
> In x2apic mode irq->dest_id could have bits 12..15 set.

cid is gotten from bit 16 ..31 of the ldr (in apic_logical_id()), and
in x2apic mode, ldr is constructed in kvm_apic_set_x2apic_id() as
below:

u32 ldr = ((id >> 4) << 16) | (1 << (id & 0xf));

So in fact, cid is (id >> 4), I cannot think of why cid can beyond 15.
Do I miss something here? Thanks!

Thanks,
Feng

> 
> Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [GIT PULL] Second set of KVM/ARM updates for 4.3-rc2

2015-09-17 Thread Paolo Bonzini



On 17/09/2015 16:46, Marc Zyngier wrote:
> Hi Paolo,
> 
> We've had a "nice" collection of fixes trickling in this week, and
> since both Christoffer and I are away next week, I've decided to send
> everything your way a bit early.

Sure, thanks.

I'll send the pull request to Linus tomorrow.

Paolo

 Fairly random stuff to be honnest,
> but a negative diffstat can't be that bad! :-)
> 
> Thanks,
> 
>M.
> 
> The following changes since commit 0c0672922dcc70ffba11d96385e98e42fb3ae08d:
> 
>   arm/arm64: KVM: Fix PSCI affinity info return value for non valid cores 
> (2015-09-04 17:02:48 +0100)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git 
> tags/kvm-arm-for-4.3-rc2-2
> 
> for you to fetch changes up to ef748917b529847277f07c98c55e1c0ce416449f:
> 
>   arm/arm64: KVM: Remove 'config KVM_ARM_MAX_VCPUS' (2015-09-17 13:13:27 
> +0100)
> 
> 
> Second set of KVM/ARM changes for 4.3-rc2
> 
> - Workaround for a Cortex-A57 erratum
> - Bug fix for the debugging infrastructure
> - Fix for 32bit guests with more than 4GB of address space
>   on a 32bit host
> - A number of fixes for the (unusual) case when we don't use
>   the in-kernel GIC emulation
> - Removal of ThumbEE handling on arm64, since these have been
>   dropped from the architecture before anyone actually ever
>   built a CPU
> - Remove the KVM_ARM_MAX_VCPUS limitation which has become
>   fairly pointless
> 
> 
> Marc Zyngier (3):
>   arm64: KVM: Fix user access for debug registers
>   arm64: KVM: Disable virtual timer even if the guest is not using it
>   arm: KVM: Disable virtual timer even if the guest is not using it
> 
> Marek Majtyka (1):
>   arm: KVM: Fix incorrect device to IPA mapping
> 
> Ming Lei (1):
>   arm/arm64: KVM: Remove 'config KVM_ARM_MAX_VCPUS'
> 
> Pavel Fedin (1):
>   arm/arm64: KVM: vgic: Check for !irqchip_in_kernel() when mapping 
> resources
> 
> Will Deacon (2):
>   KVM: arm64: add workaround for Cortex-A57 erratum #852523
>   arm64: KVM: Remove all traces of the ThumbEE registers
> 
>  arch/arm/include/asm/kvm_host.h   |  8 ++--
>  arch/arm/kvm/Kconfig  | 11 ---
>  arch/arm/kvm/arm.c|  2 +-
>  arch/arm/kvm/interrupts_head.S|  6 --
>  arch/arm/kvm/mmu.c|  6 --
>  arch/arm64/include/asm/kvm_arm.h  |  1 -
>  arch/arm64/include/asm/kvm_asm.h  |  4 +---
>  arch/arm64/include/asm/kvm_host.h |  8 ++--
>  arch/arm64/kvm/Kconfig| 11 ---
>  arch/arm64/kvm/hyp.S  | 31 ++-
>  arch/arm64/kvm/sys_regs.c | 15 ---
>  include/kvm/arm_vgic.h|  6 +-
>  virt/kvm/arm/vgic-v3.c|  2 +-
>  13 files changed, 30 insertions(+), 81 deletions(-)
> 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 2/8] arm64: KVM: Fix user access for debug registers

2015-09-17 Thread Marc Zyngier

When setting the debug register from userspace, make sure that
copy_from_user() is called with its parameters in the expected
order. It otherwise doesn't do what you think.

Fixes: 84e690bfbed1 ("KVM: arm64: introduce vcpu->arch.debug_ptr")
Reported-by: Peter Maydell 
Cc: Alex Bennée 
Reviewed-by: Christoffer Dall 
Signed-off-by: Marc Zyngier 
---
 arch/arm64/kvm/sys_regs.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index b41607d..1d0463e 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -272,7 +272,7 @@ static int set_bvr(struct kvm_vcpu *vcpu, const struct 
sys_reg_desc *rd,
 {
__u64 *r = >arch.vcpu_debug_state.dbg_bvr[rd->reg];
 
-   if (copy_from_user(uaddr, r, KVM_REG_SIZE(reg->id)) != 0)
+   if (copy_from_user(r, uaddr, KVM_REG_SIZE(reg->id)) != 0)
return -EFAULT;
return 0;
 }
@@ -314,7 +314,7 @@ static int set_bcr(struct kvm_vcpu *vcpu, const struct 
sys_reg_desc *rd,
 {
__u64 *r = >arch.vcpu_debug_state.dbg_bcr[rd->reg];
 
-   if (copy_from_user(uaddr, r, KVM_REG_SIZE(reg->id)) != 0)
+   if (copy_from_user(r, uaddr, KVM_REG_SIZE(reg->id)) != 0)
return -EFAULT;
 
return 0;
@@ -358,7 +358,7 @@ static int set_wvr(struct kvm_vcpu *vcpu, const struct 
sys_reg_desc *rd,
 {
__u64 *r = >arch.vcpu_debug_state.dbg_wvr[rd->reg];
 
-   if (copy_from_user(uaddr, r, KVM_REG_SIZE(reg->id)) != 0)
+   if (copy_from_user(r, uaddr, KVM_REG_SIZE(reg->id)) != 0)
return -EFAULT;
return 0;
 }
@@ -400,7 +400,7 @@ static int set_wcr(struct kvm_vcpu *vcpu, const struct 
sys_reg_desc *rd,
 {
__u64 *r = >arch.vcpu_debug_state.dbg_wcr[rd->reg];
 
-   if (copy_from_user(uaddr, r, KVM_REG_SIZE(reg->id)) != 0)
+   if (copy_from_user(r, uaddr, KVM_REG_SIZE(reg->id)) != 0)
return -EFAULT;
return 0;
 }
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/3] x86/paravirt: Fix baremetal paravirt MSR ops

2015-09-17 Thread Andy Lutomirski

On Sep 17, 2015 5:33 AM, "Peter Zijlstra"  wrote:
>
> On Thu, Sep 17, 2015 at 01:40:30PM +0200, Paolo Bonzini wrote:
> >
> >
> > On 17/09/2015 10:58, Peter Zijlstra wrote:
> > > But the far greater problem I have with the whole virt thing is that
> > > you cannot use rdmsr_safe() to probe if an MSR exists at all because, as
> > > you told me, these virt thingies return 0 for all 'unknown' MSRs instead
> > > of faulting.
> >
> > At least for KVM, that behavior is opt-in (the ignore_msrs parameter)
> > and no distro that I know enables it by default.
>
> Ah, that would be good news. Andy earlier argued I could not rely on
> rdmsr_safe() faulting on unknown MSRs. If practically we can there's
> some code I can simplify :-)

I was taking about QEMU TCG, not KVM.


--Andy
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] kvmtool: don't rely on $HOME

2015-09-17 Thread Alban Crequy

kvm__set_dir() called in main() and kvm__get_dir() rely on $HOME. But in
some environments (such as starting lkvm through systemd-run), $HOME is
undefined. This causes bind() to use a socket path containing "(null)"
and this fails. The current code does not check errors returned by
realpath().

Symptoms:

| bind: No such file or directory
|   Error: Failed adding socket to epoll
|  Warning: Failed init: kvm_ipc__init
|
|  Fatal: Initialisation failed

This bug was first reported on https://github.com/coreos/rkt/issues/1393

Instead of using "$HOME/.lkvm/" (i.e. "/root/.lkvm/"), this patch uses
"/var/lib/lkvm/". This also improve the error reporting by printing the
socket filename.

Signed-off-by: Alban Crequy 
---
 include/kvm/kvm.h |  5 ++---
 kvm-ipc.c |  1 +
 kvm.c | 26 --
 main.c|  6 +-
 4 files changed, 16 insertions(+), 22 deletions(-)

diff --git a/include/kvm/kvm.h b/include/kvm/kvm.h
index 37155db..368f256 100644
--- a/include/kvm/kvm.h
+++ b/include/kvm/kvm.h
@@ -16,8 +16,7 @@
 #define SIGKVMEXIT (SIGRTMIN + 0)
 #define SIGKVMPAUSE(SIGRTMIN + 1)
 
-#define KVM_PID_FILE_PATH  "/.lkvm/"
-#define HOME_DIR   getenv("HOME")
+#define KVM_PID_FILE_PATH  "/var/lib/lkvm/"
 #define KVM_BINARY_NAME"lkvm"
 
 #ifndef PAGE_SIZE
@@ -70,7 +69,7 @@ struct kvm {
int vm_state;
 };
 
-void kvm__set_dir(const char *fmt, ...);
+int kvm__set_dir(void);
 const char *kvm__get_dir(void);
 
 int kvm__init(struct kvm *kvm);
diff --git a/kvm-ipc.c b/kvm-ipc.c
index 857b0dc..d94456c 100644
--- a/kvm-ipc.c
+++ b/kvm-ipc.c
@@ -60,6 +60,7 @@ static int kvm__create_socket(struct kvm *kvm)
r = bind(s, (struct sockaddr *), len);
if (r < 0) {
perror("bind");
+   pr_err("Cannot bind on %s", full_name);
goto fail;
}
 
diff --git a/kvm.c b/kvm.c
index 10ed230..482f47b 100644
--- a/kvm.c
+++ b/kvm.c
@@ -63,31 +63,21 @@ extern struct kvm_ext kvm_req_ext[];
 
 static char kvm_dir[PATH_MAX];
 
-static int set_dir(const char *fmt, va_list args)
+int kvm__set_dir(void)
 {
-   char tmp[PATH_MAX];
+   int err;
 
-   vsnprintf(tmp, sizeof(tmp), fmt, args);
-
-   mkdir(tmp, 0777);
-
-   if (!realpath(tmp, kvm_dir))
-   return -errno;
+   err = mkdir(KVM_PID_FILE_PATH, 0700);
+   if (err != 0 && errno != EEXIST) {
+   perror("mkdir " KVM_PID_FILE_PATH);
+   return 1;
+   }
 
-   strcat(kvm_dir, "/");
+   snprintf(kvm_dir, sizeof(kvm_dir), KVM_PID_FILE_PATH);
 
return 0;
 }
 
-void kvm__set_dir(const char *fmt, ...)
-{
-   va_list args;
-
-   va_start(args, fmt);
-   set_dir(fmt, args);
-   va_end(args);
-}
-
 const char *kvm__get_dir(void)
 {
return kvm_dir;
diff --git a/main.c b/main.c
index 05bc82c..22cc4e2 100644
--- a/main.c
+++ b/main.c
@@ -13,7 +13,11 @@ static int handle_kvm_command(int argc, char **argv)
 
 int main(int argc, char *argv[])
 {
-   kvm__set_dir("%s/%s", HOME_DIR, KVM_PID_FILE_PATH);
+   int ret;
+
+   ret = kvm__set_dir();
+   if (ret != 0)
+   return ret;
 
return handle_kvm_command(argc - 1, [1]);
 }
-- 
2.4.3

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v8 03/13] KVM: Define a new interface kvm_intr_is_single_vcpu()

2015-09-17 Thread Paolo Bonzini

>> On 17/09/2015 05:17, Wu, Feng wrote:
>>> +   if (irq->dest_mode == APIC_DEST_PHYSICAL) {
>>> +   if (irq->dest_id == 0xFF)
>>> +   goto out;
>>> +
>>> +   if (irq->dest_id >= ARRAY_SIZE(map->phys_map)) {
>
> Warning here is wrong, the guest can trigger it.
>>> Could you please share more information about how the guest
>>> triggers these conditions (including the following two), Thanks
>>> a lot!
>>
>> irq->dest_id is a 16-bit value, so it can be > 255.
> 
> Yes, irq->dest_id is defined as u32, but by looking the current KVM
> code, seems desst_id is used as an u8 variable, even in x2apic mode
> the dest_id will not beyond 255 (except for broadcast dest in in x2apic
> mode). Correct me if I am wrong. Thanks a lot!

Actually you're right, the MSI destination is only 8 bits.  I was
confused because of

#define  MSI_ADDR_DEST_ID_MASK  0x000

in arch/x86/include/asm/msidef.h.  But there may be a bug, see below...

>>> +   if (cid >= ARRAY_SIZE(map->logical_map)) {
>>> +   WARN_ON_ONCE(1);
>>
>> In x2apic mode irq->dest_id could have bits 12..15 set.
> 
> cid is gotten from bit 16 ..31 of the ldr (in apic_logical_id()), and
> in x2apic mode, ldr is constructed in kvm_apic_set_x2apic_id() as
> below:
> 
> u32 ldr = ((id >> 4) << 16) | (1 << (id & 0xf));
> 
> So in fact, cid is (id >> 4), I cannot think of why cid can beyond 15.

I think kvm_apic_match_logical_addr for MSI and IOAPIC interrupts is
buggy in x2apic mode.

It does:

if (apic_x2apic_mode(apic))
return ((logical_id >> 16) == (mda >> 16))
   && (logical_id & mda & 0x) != 0;

But mda is only 8-bits for MSI and IOAPIC interrupts.

Radim, should kvm_apic_mda also handle the !ipi && x2apic_mda && dest_id
!= APIC_BROADCAST case?  It never triggers with Linux because it uses
only the physical mode (that's not super-easy to see;
ioapic_set_affinity looks for the RTEs in irq_data->chip_data and that
is allocated with kzalloc).

> Do I miss something here? Thanks!

No, you were right.

But still I think the WARNs are unnecessary; it is conceivable that some
future chipset adds support for more than 8-bits in the dest_id.

Paolo

> Thanks,
> Feng
> 
>>
>> Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 1/8] KVM: arm64: add workaround for Cortex-A57 erratum #852523

2015-09-17 Thread Marc Zyngier

From: Will Deacon 

When restoring the system register state for an AArch32 guest at EL2,
writes to DACR32_EL2 may not be correctly synchronised by Cortex-A57,
which can lead to the guest effectively running with junk in the DACR
and running into unexpected domain faults.

This patch works around the issue by re-ordering our restoration of the
AArch32 register aliases so that they happen before the AArch64 system
registers. Ensuring that the registers are restored in this order
guarantees that they will be correctly synchronised by the core.

Cc: 
Reviewed-by: Marc Zyngier 
Signed-off-by: Will Deacon 
Signed-off-by: Marc Zyngier 
---
 arch/arm64/kvm/hyp.S | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
index 8188f6a..39aa322 100644
--- a/arch/arm64/kvm/hyp.S
+++ b/arch/arm64/kvm/hyp.S
@@ -745,6 +745,9 @@ ENTRY(__kvm_vcpu_run)
// Guest context
add x2, x0, #VCPU_CONTEXT
 
+   // We must restore the 32-bit state before the sysregs, thanks
+   // to Cortex-A57 erratum #852523.
+   restore_guest_32bit_state
bl __restore_sysregs
 
skip_debug_state x3, 1f
@@ -752,7 +755,6 @@ ENTRY(__kvm_vcpu_run)
kern_hyp_va x3
bl  __restore_debug
 1:
-   restore_guest_32bit_state
restore_guest_regs
 
// That's it, no more messing around.
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[GIT PULL] Second set of KVM/ARM updates for 4.3-rc2

2015-09-17 Thread Marc Zyngier

Hi Paolo,

We've had a "nice" collection of fixes trickling in this week, and
since both Christoffer and I are away next week, I've decided to send
everything your way a bit early. Fairly random stuff to be honnest,
but a negative diffstat can't be that bad! :-)

Thanks,

   M.

The following changes since commit 0c0672922dcc70ffba11d96385e98e42fb3ae08d:

  arm/arm64: KVM: Fix PSCI affinity info return value for non valid cores 
(2015-09-04 17:02:48 +0100)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git 
tags/kvm-arm-for-4.3-rc2-2

for you to fetch changes up to ef748917b529847277f07c98c55e1c0ce416449f:

  arm/arm64: KVM: Remove 'config KVM_ARM_MAX_VCPUS' (2015-09-17 13:13:27 +0100)


Second set of KVM/ARM changes for 4.3-rc2

- Workaround for a Cortex-A57 erratum
- Bug fix for the debugging infrastructure
- Fix for 32bit guests with more than 4GB of address space
  on a 32bit host
- A number of fixes for the (unusual) case when we don't use
  the in-kernel GIC emulation
- Removal of ThumbEE handling on arm64, since these have been
  dropped from the architecture before anyone actually ever
  built a CPU
- Remove the KVM_ARM_MAX_VCPUS limitation which has become
  fairly pointless


Marc Zyngier (3):
  arm64: KVM: Fix user access for debug registers
  arm64: KVM: Disable virtual timer even if the guest is not using it
  arm: KVM: Disable virtual timer even if the guest is not using it

Marek Majtyka (1):
  arm: KVM: Fix incorrect device to IPA mapping

Ming Lei (1):
  arm/arm64: KVM: Remove 'config KVM_ARM_MAX_VCPUS'

Pavel Fedin (1):
  arm/arm64: KVM: vgic: Check for !irqchip_in_kernel() when mapping 
resources

Will Deacon (2):
  KVM: arm64: add workaround for Cortex-A57 erratum #852523
  arm64: KVM: Remove all traces of the ThumbEE registers

 arch/arm/include/asm/kvm_host.h   |  8 ++--
 arch/arm/kvm/Kconfig  | 11 ---
 arch/arm/kvm/arm.c|  2 +-
 arch/arm/kvm/interrupts_head.S|  6 --
 arch/arm/kvm/mmu.c|  6 --
 arch/arm64/include/asm/kvm_arm.h  |  1 -
 arch/arm64/include/asm/kvm_asm.h  |  4 +---
 arch/arm64/include/asm/kvm_host.h |  8 ++--
 arch/arm64/kvm/Kconfig| 11 ---
 arch/arm64/kvm/hyp.S  | 31 ++-
 arch/arm64/kvm/sys_regs.c | 15 ---
 include/kvm/arm_vgic.h|  6 +-
 virt/kvm/arm/vgic-v3.c|  2 +-
 13 files changed, 30 insertions(+), 81 deletions(-)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 5/8] arm64: KVM: Disable virtual timer even if the guest is not using it

2015-09-17 Thread Paolo Bonzini



On 17/09/2015 16:46, Marc Zyngier wrote:
> When running a guest with the architected timer disabled (with QEMU and
> the kernel_irqchip=off option, for example), it is important to make
> sure the timer gets turned off. Otherwise, the guest may try to
> enable it anyway, leading to a screaming HW interrupt.
> 
> The fix is to unconditionally turn off the virtual timer on guest
> exit.
> 
> Cc: sta...@vger.kernel.org
> Reviewed-by: Christoffer Dall 
> Signed-off-by: Marc Zyngier 
> ---
>  arch/arm64/kvm/hyp.S | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
> index 39aa322..60a83e2 100644
> --- a/arch/arm64/kvm/hyp.S
> +++ b/arch/arm64/kvm/hyp.S
> @@ -562,8 +562,6 @@
>   mrs x3, cntv_ctl_el0
>   and x3, x3, #3
>   str w3, [x0, #VCPU_TIMER_CNTV_CTL]
> - bic x3, x3, #1  // Clear Enable
> - msr cntv_ctl_el0, x3
>  
>   isb
>  
> @@ -571,6 +569,9 @@
>   str x3, [x0, #VCPU_TIMER_CNTV_CVAL]
>  
>  1:
> + // Disable the virtual timer
> + msr cntv_ctl_el0, xzr
> +
>   // Allow physical timer/counter access for the host
>   mrs x2, cnthctl_el2
>   orr x2, x2, #3
> 

It looks like here in restore_timer_state:

ldr w2, [x0, #VCPU_TIMER_CNTV_CTL]
and x2, x2, #3
msr cntv_ctl_el0, x2

the "and" would be unnecessary if kvm_arm_timer_set_reg remembered to 
do it.  Something like this, which would also make the code more 
consistent between arm and arm64...

diff --git a/arch/arm/kvm/interrupts_head.S b/arch/arm/kvm/interrupts_head.S
index 702740d37465..93e322b4d242 100644
--- a/arch/arm/kvm/interrupts_head.S
+++ b/arch/arm/kvm/interrupts_head.S
@@ -514,6 +514,7 @@ ARM_BE8(rev r6, r6  )
beq 1f
 
mrc p15, 0, r2, c14, c3, 1  @ CNTV_CTL
+   and r2, r2, #3
str r2, [vcpu, #VCPU_TIMER_CNTV_CTL]
bic r2, #1  @ Clear ENABLE
mcr p15, 0, r2, c14, c3, 1  @ CNTV_CTL
@@ -566,7 +567,6 @@ ARM_BE8(rev r6, r6  )
isb
 
ldr r2, [vcpu, #VCPU_TIMER_CNTV_CTL]
-   and r2, r2, #3
mcr p15, 0, r2, c14, c3, 1  @ CNTV_CTL
 1:
 .endm
diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
index 10915aaf0b01..bfcd3f3a947b 100644
--- a/arch/arm64/kvm/hyp.S
+++ b/arch/arm64/kvm/hyp.S
@@ -887,7 +887,6 @@ alternative_endif
isb
 
ldr w2, [x0, #VCPU_TIMER_CNTV_CTL]
-   and x2, x2, #3
msr cntv_ctl_el0, x2
 1:
 .endm
diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
index 98c95f2fcba4..9b03c9f5abbf 100644
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -218,7 +218,7 @@ int kvm_arm_timer_set_reg(struct kvm_vcpu *vcpu, u64 regid, 
u64 value)
 
switch (regid) {
case KVM_REG_ARM_TIMER_CTL:
-   timer->cntv_ctl = value;
+   timer->cntv_ctl = value & (ARCH_TIMER_CTRL_IT_MASK | 
ARCH_TIMER_CTRL_ENABLE);
break;
case KVM_REG_ARM_TIMER_CNT:
vcpu->kvm->arch.timer.cntvoff = kvm_phys_timer_read() - value;
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: single_task_running() vs. preemption warnings (was Re: [PATCH] kvm: fix preemption warnings in kvm_vcpu_block)

2015-09-17 Thread Dominik Dingel

On Thu, 17 Sep 2015 18:45:00 +0200
Paolo Bonzini  wrote:

> 
> 
> On 17/09/2015 18:27, Dominik Dingel wrote:
> > +   preempt_disable();
> > +   solo = single_task_running();
> > +   preempt_enable();
> > +
> > cur = ktime_get();
> > -   } while (single_task_running() && ktime_before(cur, stop));
> 
> That's the obvious way to fix it, but the TOCTTOU problem (which was in
> the buggy code too) is obvious too. :)  And the only other user of
> single_task_running() in drivers/crypto/mcryptd.c has the same issue.

Right, worst thing we fly another round.

I am not sure about the case for mcryptd.c. I think it might be that the worker
there is bounded to one cpu and will not be migrated.

I really need to look more in the details what is happening with that worker.

> In fact, because of the way the function is used ("maybe I can do a
> little bit of work before going to sleep") it will likely be called many
> times in a loop.  This in turn means that:
> 
> - any wrong result due to a concurrent process migration would be
> rectified very soon
> 
> - preempt_disable()/preempt_enable() can actually be just as expensive
> or more expensive than single_task_running() itself.
> 
> Therefore, I wonder if single_task_running() should just use
> raw_smp_processor_id().  At least the TOCTTOU issue can be clearly
> documented in the function comment, instead of being hidden behind each
> of the callers.

Yes to be useful it should probably call raw_smp_processor_id,
and as a lot of code actually already does just does that I do not really see 
much
down sides.

@Tim, would it be okay if I change single_task_running and add a specific 
comment on top?

> Thanks,
> 
> Paolo
> 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/3] x86/paravirt: Fix baremetal paravirt MSR ops

2015-09-17 Thread Ingo Molnar


* Andy Lutomirski  wrote:

> On Thu, Sep 17, 2015 at 12:19 AM, Ingo Molnar  wrote:
> >
> > * Andy Lutomirski  wrote:
> >
> >> Setting CONFIG_PARAVIRT=y has an unintended side effect: it silently
> >> turns all rdmsr and wrmsr operations into the safe variants without
> >> any checks that the operations actually succeed.
> >>
> >> This is IMO awful: it papers over bugs.  In particular, KVM gueests
> >> might be unwittingly depending on this behavior because
> >> CONFIG_KVM_GUEST currently depends on CONFIG_PARAVIRT.  I'm not
> >> aware of any such problems, but applying this series would be a good
> >> way to shake them out.
> >>
> >> Fix it so that the MSR operations work the same on CONFIG_PARAVIRT=n
> >> and CONFIG_PARAVIRT=y as long as Xen isn't being used.  The Xen
> >> maintainers are welcome to make a similar change on top of this.
> >>
> >> Since there's plenty of time before the next merge window, I think
> >> we should apply and fix anything that breaks.
> >
> > No, I think we should at most generate a warning instead, and not crash the 
> > kernel
> > via rdmsr()!
> >
> > Most big distro kernels on bare metal have CONFIG_PARAVIRT=y (I checked 
> > Ubuntu and
> > Fedora), so we are potentially exposing a lot of users to problems.
> >
> > Crashing the bootup on an unknown MSR is bad. Many MSR reads and writes are
> > non-critical and returning the 'safe' result is much better than crashing or
> > hanging the bootup.
> >
> 
> Should we do that for CONFIG_PARAVIRT=n, too?

Absolutely. PARAVIRT=n should not behave differently from PARAVIRT=y on bare 
metal.

> It would be straightforward to rig this up (temporarily?) on top of these 
> patches.  To keep bloat down, we might want to implement it in 
> do_general_protection rather than sticking it in native_read_msr.

Fair enough.

> wrmsr is a different beast, since we can fail due to writing the wrong value 
> to 
> an otherwise valid MSR.  Given that MSR screwups can very easily be security 
> holes, I'm not sure that warning and blindly continuing on an unchecked 
> failed 
> wrmsr is a good idea.

So the fact is that right now we are silently ignoring failures there, and have 
been doing that for some time. The right first step is to live with that and 
start 
generating low-key, once-per-bootup warnings at most, and see how frequent (and 
how serious) they are.

We could add a (default disabled) CONFIG_PANIC_ON_UNKNOWN_MSR=y option if 
that's 
really a serious concern.

> In any event, I think it's nuts that CONFIG_PARAVIRT changes this
> behavior.  We should pick something sane and stick with it.

Absolutely - and as it happens, the 'does not crash the kernel' PARAVIRT=y 
accidental behavior is actually quite close to what we wanted for a long time, 
so 
let's make it official - and add a warning to make sure we are aware of 
problems. 

But don't turn 'potential problems' into showstopper bugs such as a hard to 
debug 
early boot crash, which to most Linux users means a black screen on bootup!

> > ( We should double check that rdmsr()/wrmsr() results are never left
> >   uninitialized, but are set to zero or so, for cases where the return code 
> > is not
> >   checked. )
> 
> It sure looks like native_read_msr_safe doesn't clear the output if the rdmsr 
> fails.

Yeah, that should be fixed, to make such soft failures deterministic.

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

single_task_running() vs. preemption warnings (was Re: [PATCH] kvm: fix preemption warnings in kvm_vcpu_block)

2015-09-17 Thread Paolo Bonzini

On 17/09/2015 18:27, Dominik Dingel wrote:
> + preempt_disable();
> + solo = single_task_running();
> + preempt_enable();
> +
>   cur = ktime_get();
> - } while (single_task_running() && ktime_before(cur, stop));

That's the obvious way to fix it, but the TOCTTOU problem (which was in
the buggy code too) is obvious too. :)  And the only other user of
single_task_running() in drivers/crypto/mcryptd.c has the same issue.

In fact, because of the way the function is used ("maybe I can do a
little bit of work before going to sleep") it will likely be called many
times in a loop.  This in turn means that:

- any wrong result due to a concurrent process migration would be
rectified very soon

- preempt_disable()/preempt_enable() can actually be just as expensive
or more expensive than single_task_running() itself.

Therefore, I wonder if single_task_running() should just use
raw_smp_processor_id().  At least the TOCTTOU issue can be clearly
documented in the function comment, instead of being hidden behind each
of the callers.

Thanks,

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/3] x86/paravirt: Fix baremetal paravirt MSR ops

2015-09-17 Thread Andy Lutomirski

On Thu, Sep 17, 2015 at 10:30 AM, Ingo Molnar  wrote:
>
> * Andy Lutomirski  wrote:
>
>> On Thu, Sep 17, 2015 at 12:19 AM, Ingo Molnar  wrote:
>> >
>> > * Andy Lutomirski  wrote:
>> >
>> >> Setting CONFIG_PARAVIRT=y has an unintended side effect: it silently
>> >> turns all rdmsr and wrmsr operations into the safe variants without
>> >> any checks that the operations actually succeed.
>> >>
>> >> This is IMO awful: it papers over bugs.  In particular, KVM gueests
>> >> might be unwittingly depending on this behavior because
>> >> CONFIG_KVM_GUEST currently depends on CONFIG_PARAVIRT.  I'm not
>> >> aware of any such problems, but applying this series would be a good
>> >> way to shake them out.
>> >>
>> >> Fix it so that the MSR operations work the same on CONFIG_PARAVIRT=n
>> >> and CONFIG_PARAVIRT=y as long as Xen isn't being used.  The Xen
>> >> maintainers are welcome to make a similar change on top of this.
>> >>
>> >> Since there's plenty of time before the next merge window, I think
>> >> we should apply and fix anything that breaks.
>> >
>> > No, I think we should at most generate a warning instead, and not crash 
>> > the kernel
>> > via rdmsr()!
>> >
>> > Most big distro kernels on bare metal have CONFIG_PARAVIRT=y (I checked 
>> > Ubuntu and
>> > Fedora), so we are potentially exposing a lot of users to problems.
>> >
>> > Crashing the bootup on an unknown MSR is bad. Many MSR reads and writes are
>> > non-critical and returning the 'safe' result is much better than crashing 
>> > or
>> > hanging the bootup.
>> >
>>
>> Should we do that for CONFIG_PARAVIRT=n, too?
>
> Absolutely. PARAVIRT=n should not behave differently from PARAVIRT=y on bare
> metal.
>
>> It would be straightforward to rig this up (temporarily?) on top of these
>> patches.  To keep bloat down, we might want to implement it in
>> do_general_protection rather than sticking it in native_read_msr.
>
> Fair enough.
>
>> wrmsr is a different beast, since we can fail due to writing the wrong value 
>> to
>> an otherwise valid MSR.  Given that MSR screwups can very easily be security
>> holes, I'm not sure that warning and blindly continuing on an unchecked 
>> failed
>> wrmsr is a good idea.
>
> So the fact is that right now we are silently ignoring failures there, and 
> have
> been doing that for some time. The right first step is to live with that and 
> start
> generating low-key, once-per-bootup warnings at most, and see how frequent 
> (and
> how serious) they are.
>
> We could add a (default disabled) CONFIG_PANIC_ON_UNKNOWN_MSR=y option if 
> that's
> really a serious concern.

Could we abuse panic_on_oops for this purpose?

>
>> In any event, I think it's nuts that CONFIG_PARAVIRT changes this
>> behavior.  We should pick something sane and stick with it.
>
> Absolutely - and as it happens, the 'does not crash the kernel' PARAVIRT=y
> accidental behavior is actually quite close to what we wanted for a long 
> time, so
> let's make it official - and add a warning to make sure we are aware of 
> problems.
>
> But don't turn 'potential problems' into showstopper bugs such as a hard to 
> debug
> early boot crash, which to most Linux users means a black screen on bootup!
>

Fair enough.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: single_task_running() vs. preemption warnings (was Re: [PATCH] kvm: fix preemption warnings in kvm_vcpu_block)

2015-09-17 Thread Tim Chen

On Thu, 2015-09-17 at 19:07 +0200, Dominik Dingel wrote:
> On Thu, 17 Sep 2015 18:45:00 +0200
> Paolo Bonzini  wrote:
> 
> > 
> > 
> > On 17/09/2015 18:27, Dominik Dingel wrote:
> > > + preempt_disable();
> > > + solo = single_task_running();
> > > + preempt_enable();
> > > +
> > >   cur = ktime_get();
> > > - } while (single_task_running() && ktime_before(cur, stop));
> > 
> > That's the obvious way to fix it, but the TOCTTOU problem (which was in
> > the buggy code too) is obvious too. :)  And the only other user of
> > single_task_running() in drivers/crypto/mcryptd.c has the same issue.
> 
> Right, worst thing we fly another round.
> 
> I am not sure about the case for mcryptd.c. I think it might be that the 
> worker
> there is bounded to one cpu and will not be migrated.
> 
> I really need to look more in the details what is happening with that worker.
> 
> > In fact, because of the way the function is used ("maybe I can do a
> > little bit of work before going to sleep") it will likely be called many
> > times in a loop.  This in turn means that:
> > 
> > - any wrong result due to a concurrent process migration would be
> > rectified very soon
> > 
> > - preempt_disable()/preempt_enable() can actually be just as expensive
> > or more expensive than single_task_running() itself.
> > 
> > Therefore, I wonder if single_task_running() should just use
> > raw_smp_processor_id().  At least the TOCTTOU issue can be clearly
> > documented in the function comment, instead of being hidden behind each
> > of the callers.
> 
> Yes to be useful it should probably call raw_smp_processor_id,
> and as a lot of code actually already does just does that I do not really see 
> much
> down sides.
> 
> @Tim, would it be okay if I change single_task_running and add a specific 
> comment on top?

I have no objection to change single_task_running to use
raw_smp_processor_id.  The worker in mcryptd is bound to
the cpu so it has no migration/preemption issue.  So it shouldn't care
which smp_processor_id version is being used.  Yes, please add a comment
to alert the user of this caveat should you change single_task_running.

Thanks.

Tim


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: rdmsr_safe in Linux PV (under Xen) gets an #GP:Re: [Fedora-xen] Running fedora xen on top of KVM?

2015-09-17 Thread Andy Lutomirski

On Thu, Sep 17, 2015 at 1:10 PM, Konrad Rzeszutek Wilk
 wrote:
> On Wed, Sep 16, 2015 at 06:39:03PM -0400, Cole Robinson wrote:
>> On 09/16/2015 05:08 PM, Konrad Rzeszutek Wilk wrote:
>> > On Wed, Sep 16, 2015 at 05:04:31PM -0400, Cole Robinson wrote:
>> >> On 09/16/2015 04:07 PM, M A Young wrote:
>> >>> On Wed, 16 Sep 2015, Cole Robinson wrote:
>> >>>
>>  Unfortunately I couldn't get anything else extra out of xen using any 
>>  of these
>>  options or the ones Major recommended... in fact I couldn't get 
>>  anything to
>>  the serial console at all. console=con1 would seem to redirect messages 
>>  since
>>  they wouldn't show up on the graphical display, but nothing went to the 
>>  serial
>>  log. Maybe I'm missing something...
>> >>>
>> >>> That should be console=com1 so you have a typo either in this message or
>> >>> in your tests.
>> >>>
>> >>
>> >> Yeah that was it :/ So here's the crash output use -cpu host:
>> >>
>> >> - Cole
>> >>
>>
>> 
>>
>> >> about to get started...
>> >> (XEN) traps.c:459:d0v0 Unhandled general protection fault fault/trap 
>> >> [#13] on
>> >> VCPU 0 [ec=]
>> >> (XEN) domain_crash_sync called from entry.S: fault at 82d08023a5d3
>> >> create_bounce_frame+0x12b/0x13a
>> >> (XEN) Domain 0 (vcpu#0) crashed on cpu#0:
>> >> (XEN) [ Xen-4.5.1  x86_64  debug=n  Not tainted ]
>> >> (XEN) CPU:0
>> >> (XEN) RIP:e033:[]
>> >
>> > That is the Linux kernel EIP. Can you figure out what is at 
>> > 810032b0 ?
>> >
>> > gdb vmlinux and then
>> > x/20i 0x810032b0
>> >
>> > can help with that.
>> >
>>
>> Updated to the latest kernel 4.1.6-201.fc22.x86_64. Trace is now:
>>
>> about to get started...
>> (XEN) traps.c:459:d0v0 Unhandled general protection fault fault/trap [#13] on
>> VCPU 0 [ec=]

What exactly does this mean?

>> (XEN) domain_crash_sync called from entry.S: fault at 82d08023a5d3
>> create_bounce_frame+0x12b/0x13a
>> (XEN) Domain 0 (vcpu#0) crashed on cpu#0:
>> (XEN) [ Xen-4.5.1  x86_64  debug=n  Not tainted ]
>> (XEN) CPU:0
>> (XEN) RIP:e033:[]
>> (XEN) RFLAGS: 0282   EM: 1   CONTEXT: pv guest
>> (XEN) rax: 0015   rbx: 81c03e1c   rcx: c0010112
>> (XEN) rdx: 0001   rsi: 81c03e1c   rdi: c0010112
>> (XEN) rbp: 81c03df8   rsp: 81c03da0   r8:  81c03e28
>> (XEN) r9:  81c03e2c   r10:    r11: 
>> (XEN) r12: 81d25a60   r13: 0400   r14: 
>> (XEN) r15:    cr0: 80050033   cr4: 000406f0
>> (XEN) cr3: 75c0b000   cr2: 
>> (XEN) ds:    es:    fs:    gs:    ss: e02b   cs: e033
>> (XEN) Guest stack trace from rsp=81c03da0:
>> (XEN)c0010112   810031f0
>> (XEN)0001e030 00010082 81c03de0 e02b
>> (XEN) 000c 81c03e1c 81c03e48
>> (XEN)8102a7a4 81c03e48 8102aa3b 81c03e48
>> (XEN)cf1fa5f5e026f464 0100 81c03ef8 0400
>> (XEN) 81c03e58 81d5d142 81c03ee8
>> (XEN)81d58b56   81c03e88
>> (XEN)810f8a39 81c03ee8 81798b13 0010
>> (XEN)81c03ef8 81c03eb8 cf1fa5f5e026f464 81f1de9c
>> (XEN)  81df7920 
>> (XEN) 81c03f28 81d51c74 cf1fa5f5e026f464
>> (XEN) 81c03f60 81c03f5c 
>> (XEN) 81c03f38 81d51339 81c03ff8
>> (XEN)81d548b1  00600f12 00010800
>> (XEN)03010032 0005  
>> (XEN)   
>> (XEN)   
>> (XEN)   
>> (XEN)   
>> (XEN)0f0060c0c748 c305  
>> (XEN) Domain 0 crashed: rebooting machine in 5 seconds.
>>
>>
>> gdb output:
>>
>> (gdb) x/20i 0x810031f0
>>0x810031f0 : rdmsr
>
> Fantastic! So we have some rdmsr that makes KVM inject an
> GP.

What's the scenario?  Is this Xen on KVM?

Why didn't the guest print anything?

Is the issue here that the guest died due to failure to handle an
RDMSR failure or did the *hypervisor* die?

It looks like null_trap_bounce is returning true, which suggests that
the failure is happening before the guest

rdmsr_safe in Linux PV (under Xen) gets an #GP:Re: [Fedora-xen] Running fedora xen on top of KVM?

2015-09-17 Thread Konrad Rzeszutek Wilk

On Wed, Sep 16, 2015 at 06:39:03PM -0400, Cole Robinson wrote:
> On 09/16/2015 05:08 PM, Konrad Rzeszutek Wilk wrote:
> > On Wed, Sep 16, 2015 at 05:04:31PM -0400, Cole Robinson wrote:
> >> On 09/16/2015 04:07 PM, M A Young wrote:
> >>> On Wed, 16 Sep 2015, Cole Robinson wrote:
> >>>
>  Unfortunately I couldn't get anything else extra out of xen using any of 
>  these
>  options or the ones Major recommended... in fact I couldn't get anything 
>  to
>  the serial console at all. console=con1 would seem to redirect messages 
>  since
>  they wouldn't show up on the graphical display, but nothing went to the 
>  serial
>  log. Maybe I'm missing something...
> >>>
> >>> That should be console=com1 so you have a typo either in this message or 
> >>> in your tests.
> >>>
> >>
> >> Yeah that was it :/ So here's the crash output use -cpu host:
> >>
> >> - Cole
> >>
> 
> 
> 
> >> about to get started...
> >> (XEN) traps.c:459:d0v0 Unhandled general protection fault fault/trap [#13] 
> >> on
> >> VCPU 0 [ec=]
> >> (XEN) domain_crash_sync called from entry.S: fault at 82d08023a5d3
> >> create_bounce_frame+0x12b/0x13a
> >> (XEN) Domain 0 (vcpu#0) crashed on cpu#0:
> >> (XEN) [ Xen-4.5.1  x86_64  debug=n  Not tainted ]
> >> (XEN) CPU:0
> >> (XEN) RIP:e033:[]
> > 
> > That is the Linux kernel EIP. Can you figure out what is at 
> > 810032b0 ?
> > 
> > gdb vmlinux and then
> > x/20i 0x810032b0
> > 
> > can help with that.
> > 
> 
> Updated to the latest kernel 4.1.6-201.fc22.x86_64. Trace is now:
> 
> about to get started...
> (XEN) traps.c:459:d0v0 Unhandled general protection fault fault/trap [#13] on
> VCPU 0 [ec=]
> (XEN) domain_crash_sync called from entry.S: fault at 82d08023a5d3
> create_bounce_frame+0x12b/0x13a
> (XEN) Domain 0 (vcpu#0) crashed on cpu#0:
> (XEN) [ Xen-4.5.1  x86_64  debug=n  Not tainted ]
> (XEN) CPU:0
> (XEN) RIP:e033:[]
> (XEN) RFLAGS: 0282   EM: 1   CONTEXT: pv guest
> (XEN) rax: 0015   rbx: 81c03e1c   rcx: c0010112
> (XEN) rdx: 0001   rsi: 81c03e1c   rdi: c0010112
> (XEN) rbp: 81c03df8   rsp: 81c03da0   r8:  81c03e28
> (XEN) r9:  81c03e2c   r10:    r11: 
> (XEN) r12: 81d25a60   r13: 0400   r14: 
> (XEN) r15:    cr0: 80050033   cr4: 000406f0
> (XEN) cr3: 75c0b000   cr2: 
> (XEN) ds:    es:    fs:    gs:    ss: e02b   cs: e033
> (XEN) Guest stack trace from rsp=81c03da0:
> (XEN)c0010112   810031f0
> (XEN)0001e030 00010082 81c03de0 e02b
> (XEN) 000c 81c03e1c 81c03e48
> (XEN)8102a7a4 81c03e48 8102aa3b 81c03e48
> (XEN)cf1fa5f5e026f464 0100 81c03ef8 0400
> (XEN) 81c03e58 81d5d142 81c03ee8
> (XEN)81d58b56   81c03e88
> (XEN)810f8a39 81c03ee8 81798b13 0010
> (XEN)81c03ef8 81c03eb8 cf1fa5f5e026f464 81f1de9c
> (XEN)  81df7920 
> (XEN) 81c03f28 81d51c74 cf1fa5f5e026f464
> (XEN) 81c03f60 81c03f5c 
> (XEN) 81c03f38 81d51339 81c03ff8
> (XEN)81d548b1  00600f12 00010800
> (XEN)03010032 0005  
> (XEN)   
> (XEN)   
> (XEN)   
> (XEN)   
> (XEN)0f0060c0c748 c305  
> (XEN) Domain 0 crashed: rebooting machine in 5 seconds.
> 
> 
> gdb output:
> 
> (gdb) x/20i 0x810031f0
>0x810031f0 : rdmsr

Fantastic! So we have some rdmsr that makes KVM inject an
GP.

Looking at the stack you have some other values:
81c03de0, 81c03e1c .. they should correspond
to other functions calling this one. If you do 'nm --defined vmlinux | grep 
81c03e1'
that should give an idea where they are. Or use 'gdb'.

That will give us an stack - and we can find what type of MSR
this is. Oh wait, it is on the registers: c0010112

Ok, so where in the code is that MSR ah, that looks to be:
 #define MSR_K8_TSEG_ADDR0xc0010112

Re: [Xen-devel] rdmsr_safe in Linux PV (under Xen) gets an #GP:Re: [Fedora-xen] Running fedora xen on top of KVM?

2015-09-17 Thread Andrew Cooper

On 17/09/2015 21:23, Andy Lutomirski wrote:
> On Thu, Sep 17, 2015 at 1:10 PM, Konrad Rzeszutek Wilk
>  wrote:
>> On Wed, Sep 16, 2015 at 06:39:03PM -0400, Cole Robinson wrote:
>>> On 09/16/2015 05:08 PM, Konrad Rzeszutek Wilk wrote:
 On Wed, Sep 16, 2015 at 05:04:31PM -0400, Cole Robinson wrote:
> On 09/16/2015 04:07 PM, M A Young wrote:
>> On Wed, 16 Sep 2015, Cole Robinson wrote:
>>
>>> Unfortunately I couldn't get anything else extra out of xen using any 
>>> of these
>>> options or the ones Major recommended... in fact I couldn't get 
>>> anything to
>>> the serial console at all. console=con1 would seem to redirect messages 
>>> since
>>> they wouldn't show up on the graphical display, but nothing went to the 
>>> serial
>>> log. Maybe I'm missing something...
>> That should be console=com1 so you have a typo either in this message or
>> in your tests.
>>
> Yeah that was it :/ So here's the crash output use -cpu host:
>
> - Cole
>
>>> 
>>>
> about to get started...
> (XEN) traps.c:459:d0v0 Unhandled general protection fault fault/trap 
> [#13] on
> VCPU 0 [ec=]
> (XEN) domain_crash_sync called from entry.S: fault at 82d08023a5d3
> create_bounce_frame+0x12b/0x13a
> (XEN) Domain 0 (vcpu#0) crashed on cpu#0:
> (XEN) [ Xen-4.5.1  x86_64  debug=n  Not tainted ]
> (XEN) CPU:0
> (XEN) RIP:e033:[]
 That is the Linux kernel EIP. Can you figure out what is at 
 810032b0 ?

 gdb vmlinux and then
 x/20i 0x810032b0

 can help with that.

>>> Updated to the latest kernel 4.1.6-201.fc22.x86_64. Trace is now:
>>>
>>> about to get started...
>>> (XEN) traps.c:459:d0v0 Unhandled general protection fault fault/trap [#13] 
>>> on
>>> VCPU 0 [ec=]
> What exactly does this mean?

This means that there was  #GP fault originating from dom0 context, but
dom0 has not yet registered a #GP handler with Xen.  (I already have a
patch pending to correct the wording of that error message.)

Would be a double fault on native.

>
>>> (XEN) domain_crash_sync called from entry.S: fault at 82d08023a5d3
>>> create_bounce_frame+0x12b/0x13a
>>> (XEN) Domain 0 (vcpu#0) crashed on cpu#0:
>>> (XEN) [ Xen-4.5.1  x86_64  debug=n  Not tainted ]
>>> (XEN) CPU:0
>>> (XEN) RIP:e033:[]
>>> (XEN) RFLAGS: 0282   EM: 1   CONTEXT: pv guest
>>> (XEN) rax: 0015   rbx: 81c03e1c   rcx: c0010112
>>> (XEN) rdx: 0001   rsi: 81c03e1c   rdi: c0010112
>>> (XEN) rbp: 81c03df8   rsp: 81c03da0   r8:  81c03e28
>>> (XEN) r9:  81c03e2c   r10:    r11: 
>>> (XEN) r12: 81d25a60   r13: 0400   r14: 
>>> (XEN) r15:    cr0: 80050033   cr4: 000406f0
>>> (XEN) cr3: 75c0b000   cr2: 
>>> (XEN) ds:    es:    fs:    gs:    ss: e02b   cs: e033
>>> (XEN) Guest stack trace from rsp=81c03da0:
>>> (XEN)c0010112   810031f0
>>> (XEN)0001e030 00010082 81c03de0 e02b
>>> (XEN) 000c 81c03e1c 81c03e48
>>> (XEN)8102a7a4 81c03e48 8102aa3b 81c03e48
>>> (XEN)cf1fa5f5e026f464 0100 81c03ef8 0400
>>> (XEN) 81c03e58 81d5d142 81c03ee8
>>> (XEN)81d58b56   81c03e88
>>> (XEN)810f8a39 81c03ee8 81798b13 0010
>>> (XEN)81c03ef8 81c03eb8 cf1fa5f5e026f464 81f1de9c
>>> (XEN)  81df7920 
>>> (XEN) 81c03f28 81d51c74 cf1fa5f5e026f464
>>> (XEN) 81c03f60 81c03f5c 
>>> (XEN) 81c03f38 81d51339 81c03ff8
>>> (XEN)81d548b1  00600f12 00010800
>>> (XEN)03010032 0005  
>>> (XEN)   
>>> (XEN)   
>>> (XEN)   
>>> (XEN)   
>>> (XEN)0f0060c0c748 c305  
>>> (XEN) Domain 0 crashed: rebooting machine in 5 seconds.
>>>
>>>
>>> gdb output:
>>>
>>> (gdb) x/20i 0x810031f0
>>>0x810031f0 : rdmsr
>> Fantastic! So we have some rdmsr that makes KVM

[PATCH 0/2] x86/msr: MSR access failure changes

2015-09-17 Thread Andy Lutomirski

This applies on top of my earlier MSR series.

Andy Lutomirski (2):
  x86/msr: Carry on after a non-"safe" MSR access fails without
!panic_on_oops
  x86/msr: Set the return value to zero when native_rdmsr_safe fails

 arch/x86/include/asm/msr.h |  5 -
 arch/x86/kernel/traps.c| 51 ++
 2 files changed, 55 insertions(+), 1 deletion(-)

-- 
2.4.3

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 2/2] x86/msr: Set the return value to zero when native_rdmsr_safe fails

2015-09-17 Thread Andy Lutomirski

This will cause unchecked native_rdmsr_safe failures to return
deterministic results.

We could poison the results with something other than zero, but that
would increase code size.

Signed-off-by: Andy Lutomirski 
---
 arch/x86/include/asm/msr.h | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index 77d8b284e4a7..9eda52205d42 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -73,7 +73,10 @@ static inline unsigned long long 
native_read_msr_safe(unsigned int msr,
asm volatile("2: rdmsr ; xor %[err],%[err]\n"
 "1:\n\t"
 ".section .fixup,\"ax\"\n\t"
-"3:  mov %[fault],%[err] ; jmp 1b\n\t"
+"3: mov %[fault],%[err]\n\t"
+"xorl %%eax, %%eax\n\t"
+"xorl %%edx, %%edx\n\t"
+"jmp 1b\n\t"
 ".previous\n\t"
 _ASM_EXTABLE(2b, 3b)
 : [err] "=r" (*err), EAX_EDX_RET(val, low, high)
-- 
2.4.3

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops

2015-09-17 Thread Andy Lutomirski

This demotes an OOPS and likely panic due to a failed non-"safe" MSR
access to a WARN_ON_ONCE and a return of poisoned values (in the
RDMSR case).  We still write a pr_info entry unconditionally for
debugging.

To be clear, this type of failure should *not* happen.  This patch
exists to minimize the chance of nasty undebuggable failures due on
systems that used to work due to a now-fixed CONFIG_PARAVIRT=y bug.

Signed-off-by: Andy Lutomirski 
---
 arch/x86/kernel/traps.c | 51 +
 1 file changed, 51 insertions(+)

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 346eec73f7db..b7731765017a 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -437,6 +437,54 @@ exit_trap:
do_trap(X86_TRAP_BR, SIGSEGV, "bounds", regs, error_code, NULL);
 }
 
+static bool paper_over_kernel_gpf(struct pt_regs *regs)
+{
+   /*
+* Try to decode the opcode that failed.  So far, we only care
+* about boring two-byte unprefixed opcodes, so we don't need
+* the full instruction decoder machinery.
+*/
+   u16 opcode;
+
+   if (probe_kernel_read(, (const void *)regs->ip, sizeof(opcode)))
+   return false;
+
+   if (opcode == 0x320f) {
+   /* RDMSR */
+   pr_info("bad kernel RDMSR from non-existent MSR 0x%x",
+   (unsigned int)regs->cx);
+   if (!panic_on_oops) {
+   WARN_ON_ONCE(true);
+
+   /* Patch it up with deterministic poison. */
+   regs->ax = 0x5aadc0de;
+   regs->dx = 0x8badf00d;
+   regs->ip += 2;
+   return true;
+   } else {
+   /* Don't fix it up. */
+   return false;
+   }
+   } else if (opcode == 0x300f) {
+   /* WRMSR */
+   pr_info("bad kernel WRMSR writing 0x%08x%08x to MSR 0x%x",
+   (unsigned int)regs->dx, (unsigned int)regs->ax,
+   (unsigned int)regs->cx);
+   if (!panic_on_oops) {
+   WARN_ON_ONCE(true);
+
+   /* Pretend it worked and carry on. */
+   regs->ip += 2;
+   return true;
+   } else {
+   /* Don't fix it up. */
+   return false;
+   }
+   }
+
+   return false;
+}
+
 dotraplinkage void
 do_general_protection(struct pt_regs *regs, long error_code)
 {
@@ -456,6 +504,9 @@ do_general_protection(struct pt_regs *regs, long error_code)
if (fixup_exception(regs))
return;
 
+   if (paper_over_kernel_gpf(regs))
+   return;
+
tsk->thread.error_code = error_code;
tsk->thread.trap_nr = X86_TRAP_GP;
if (notify_die(DIE_GPF, "general protection fault", regs, 
error_code,
-- 
2.4.3

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 2/2] KVM/arm: enable armv7 fp/simd lazy switch

2015-09-17 Thread Mario Smarduch

Adds code to enable fp/simd lazy switch. On each entry check if fp/simd
registers have been switched to guest, if no set the trap flag. On trap 
switch fp/simd registers and set vfp_lazy to true and disable trapping. 
When the vcpu is about to be put, then context switch fp/simd registers
save guest and restore host and reset the vfp_lazy state to enable trapping 
again.

Signed-off-by: Mario Smarduch 
---
 arch/arm/kvm/arm.c| 17 +
 arch/arm/kvm/interrupts.S | 40 +---
 2 files changed, 46 insertions(+), 11 deletions(-)

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index ce404a5..0acbb69 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -105,6 +105,20 @@ void kvm_arch_check_processor_compat(void *rtn)
*(int *)rtn = 0;
 }
 
+/**
+ * kvm_switch_fp_regs() - switch guest/host VFP/SIMD registers
+ * @vcpu:  pointer to vcpu structure.
+ *
+ */
+static void kvm_switch_fp_regs(struct kvm_vcpu *vcpu)
+{
+#ifdef CONFIG_ARM
+   if (vcpu->arch.vfp_lazy == 1) {
+   kvm_call_hyp(__kvm_restore_host_vfp_state, vcpu);
+   vcpu->arch.vfp_lazy = 0;
+   }
+#endif
+}
 
 /**
  * kvm_arch_init_vm - initializes a VM data structure
@@ -295,6 +309,9 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 {
+   /* Check if Guest accessed VFP registers */
+   kvm_switch_fp_regs(vcpu);
+
/*
 * The arch-generic KVM code expects the cpu field of a vcpu to be -1
 * if the vcpu is no longer assigned to a cpu.  This is used for the
diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
index 900ef6d..a47acc1 100644
--- a/arch/arm/kvm/interrupts.S
+++ b/arch/arm/kvm/interrupts.S
@@ -96,6 +96,24 @@ ENTRY(__kvm_flush_vm_context)
bx  lr
 ENDPROC(__kvm_flush_vm_context)
 
+/**
+ * void __kvm_restore_host_vfp_state(struct vcpu *vcpu) - Executes a lazy
+ * fp/simd switch, saves the guest, restores host.
+ *
+ */
+ENTRY(__kvm_restore_host_vfp_state)
+   push{r3-r7}
+
+   add r7, r0, #VCPU_VFP_GUEST
+   store_vfp_state r7
+
+   add r7, r0, #VCPU_VFP_HOST
+   ldr r7, [r7]
+   restore_vfp_state r7
+
+   pop {r3-r7}
+   bx  lr
+ENDPROC(__kvm_restore_host_vfp_state)
 
 /
  *  Hypervisor world-switch code
@@ -131,7 +149,14 @@ ENTRY(__kvm_vcpu_run)
 
@ Trap coprocessor CRx accesses
set_hstr vmentry
+
+   ldr r1, [vcpu, #VCPU_VFP_LAZY]
+   cmp r1, #1
+   beq skip_guest_vfp_trap
+
set_hcptr vmentry, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11))
+skip_guest_vfp_trap:
+
set_hdcr vmentry
 
@ Write configured ID register into MIDR alias
@@ -170,22 +195,12 @@ __kvm_vcpu_return:
@ Don't trap coprocessor accesses for host kernel
set_hstr vmexit
set_hdcr vmexit
-   set_hcptr vmexit, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11)), 
after_vfp_restore
+   set_hcptr vmexit, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11))
 
 #ifdef CONFIG_VFPv3
-   @ Switch VFP/NEON hardware state to the host's
-   add r7, vcpu, #VCPU_VFP_GUEST
-   store_vfp_state r7
-   add r7, vcpu, #VCPU_VFP_HOST
-   ldr r7, [r7]
-   restore_vfp_state r7
-
-after_vfp_restore:
@ Restore FPEXC_EN which we clobbered on entry
pop {r2}
VFPFMXR FPEXC, r2
-#else
-after_vfp_restore:
 #endif
 
@ Reset Hyp-role
@@ -485,6 +500,9 @@ switch_to_guest_vfp:
@ NEON/VFP used.  Turn on VFP access.
set_hcptr vmtrap, (HCPTR_TCP(10) | HCPTR_TCP(11))
 
+   mov r1, #1
+   str r1, [vcpu, #VCPU_VFP_LAZY]
+
@ Switch VFP/NEON hardware state to the guest's
add r7, r0, #VCPU_VFP_HOST
ldr r7, [r7]
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 0/2] KVM/arm: add fp/simd lazy switch support

2015-09-17 Thread Mario Smarduch

These patches enable armv7 fp/simd lazy switch. On guest entry fp/simd
access trap is set, and on guest first access fp/simd registers are switched -
host saved, guest restored. CPU continues with guest fp/simd content until
vcpu_put where guest is saved and host is restored. 

Running floating point workload illustrates reduction of fp/simd context 
switches the amount depends on the load. For a light load with with FP 
application running only 2% of all exits result in calls to lazy switch.

arm64 version is in test and appears to work fine, remaining work is
boot arm32 guest on arm64 and verify operation. Initial intent was to post
all patches at once, but arm64 version will be posted soon.

Mario Smarduch (2):
  add hooks for armv7 vfp/simd lazy switch support
  enable armv7 vfp/simd lazy switch

 arch/arm/include/asm/kvm_asm.h  |  1 +
 arch/arm/include/asm/kvm_host.h |  3 +++
 arch/arm/kernel/asm-offsets.c   |  1 +
 arch/arm/kvm/arm.c  | 17 +
 arch/arm/kvm/interrupts.S   | 40 +---
 5 files changed, 51 insertions(+), 11 deletions(-)

-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 1/2] KVM/arm: add hooks for armv7 fp/simd lazy switch support

2015-09-17 Thread Mario Smarduch

Basic hooks are added to support fp/simd lazy switch. A vcpu flag to track 
fp/simd state, offset into the vcpu structure and switch prototype function.

Signed-off-by: Mario Smarduch 
---
 arch/arm/include/asm/kvm_asm.h  | 1 +
 arch/arm/include/asm/kvm_host.h | 3 +++
 arch/arm/kernel/asm-offsets.c   | 1 +
 3 files changed, 5 insertions(+)

diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
index 194c91b..4b45d79 100644
--- a/arch/arm/include/asm/kvm_asm.h
+++ b/arch/arm/include/asm/kvm_asm.h
@@ -97,6 +97,7 @@ extern char __kvm_hyp_code_end[];
 extern void __kvm_flush_vm_context(void);
 extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
 extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
+extern void __kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu);
 
 extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
 #endif
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index dcba0fa..4858f6c 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -111,6 +111,9 @@ struct kvm_vcpu_arch {
/* Interrupt related fields */
u32 irq_lines;  /* IRQ and FIQ levels */
 
+   /* Track fp/simd lazy switch */
+   u32 vfp_lazy;
+
/* Exception Information */
struct kvm_vcpu_fault_info fault;
 
diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
index 871b826..4a80802f 100644
--- a/arch/arm/kernel/asm-offsets.c
+++ b/arch/arm/kernel/asm-offsets.c
@@ -191,6 +191,7 @@ int main(void)
   DEFINE(VCPU_HPFAR,   offsetof(struct kvm_vcpu, arch.fault.hpfar));
   DEFINE(VCPU_HYP_PC,  offsetof(struct kvm_vcpu, arch.fault.hyp_pc));
   DEFINE(VCPU_VGIC_CPU,offsetof(struct kvm_vcpu, 
arch.vgic_cpu));
+  DEFINE(VCPU_VFP_LAZY,offsetof(struct kvm_vcpu, 
arch.vfp_lazy));
   DEFINE(VGIC_V2_CPU_HCR,  offsetof(struct vgic_cpu, vgic_v2.vgic_hcr));
   DEFINE(VGIC_V2_CPU_VMCR, offsetof(struct vgic_cpu, vgic_v2.vgic_vmcr));
   DEFINE(VGIC_V2_CPU_MISR, offsetof(struct vgic_cpu, vgic_v2.vgic_misr));
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

RE: [PATCH v8 03/13] KVM: Define a new interface kvm_intr_is_single_vcpu()

2015-09-17 Thread Wu, Feng



> -Original Message-
> From: Paolo Bonzini [mailto:pbonz...@redhat.com]
> Sent: Friday, September 18, 2015 12:00 AM
> To: Radim Krčmář
> Cc: Wu, Feng; alex.william...@redhat.com; j...@8bytes.org;
> mtosa...@redhat.com; eric.au...@linaro.org; kvm@vger.kernel.org;
> io...@lists.linux-foundation.org; linux-ker...@vger.kernel.org
> Subject: Re: [PATCH v8 03/13] KVM: Define a new interface
> kvm_intr_is_single_vcpu()
> 
> 
> 
> On 17/09/2015 17:58, Radim Krčmář wrote:
> > For interrupts from MSI and IOxAPIC:
> > - Flat logical interrupts are delivered as if we had natural
> >   (CPU0<->bit0, CPU1<->bit1, ...) flat logical xAPIC for first 8 VCPUs.
> > - Cluster logical doesn't work much, it's interpreted like flat logical.
> >   I didn't care about xAPIC cluster because Linux, the sole user of our
> >   paravirtualized x2APIC, doesn't configure it.
> >
> > I'll paste kvm_apic_mda() source for better explanation:
> >
> >   static u32 kvm_apic_mda(unsigned int dest_id, struct kvm_lapic *source,
> > struct kvm_lapic
> *target)
> >   {
> > bool ipi = source != NULL;
> > bool x2apic_mda = apic_x2apic_mode(ipi ? source : target);
> >
> > if (!ipi && dest_id == APIC_BROADCAST && x2apic_mda)
> > return X2APIC_BROADCAST;
> >
> > return x2apic_mda ? dest_id : SET_APIC_DEST_FIELD(dest_id);
> >   }
> >
> > MSI/IOxAPIC interrupt means that source is NULL and if the target is in
> > x2APIC mode, the original 'dest_id' is returned as mda => a flat logical
> > xAPIC to 0x0f will get interpreted as (cluster) logical x2APIC 0xf in
> > kvm_apic_match_logical_addr().
> > xAPIC address are only 8 bit long so they always get delivered to x2APIC
> > cluster 0, where first 16 bits work like xAPIC flat logical mode.
> 
> Ok, I was wondering whether this was the correct interpretation.  Thanks!

Paolo, I don't think Radim clarify your concern, right? Since mda is 8-bit, it
is wrong with mda >> 16, this is your concern, right?

Thanks,
Feng

> 
> Paolo
N�r��yb�X��ǧv�^�)޺{.n�+h����ܨ}���Ơz�:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf

RE: [PATCH v8 03/13] KVM: Define a new interface kvm_intr_is_single_vcpu()

2015-09-17 Thread Wu, Feng



> -Original Message-
> From: Paolo Bonzini [mailto:pbonz...@redhat.com]
> Sent: Thursday, September 17, 2015 10:25 PM
> To: Wu, Feng; alex.william...@redhat.com; j...@8bytes.org;
> mtosa...@redhat.com
> Cc: eric.au...@linaro.org; kvm@vger.kernel.org;
> io...@lists.linux-foundation.org; linux-ker...@vger.kernel.org; Radim
> Krčmář
> Subject: Re: [PATCH v8 03/13] KVM: Define a new interface
> kvm_intr_is_single_vcpu()
> 
> >> On 17/09/2015 05:17, Wu, Feng wrote:
> >>> + if (irq->dest_mode == APIC_DEST_PHYSICAL) {
> >>> + if (irq->dest_id == 0xFF)
> >>> + goto out;
> >>> +
> >>> + if (irq->dest_id >= ARRAY_SIZE(map->phys_map)) {
> >
> > Warning here is wrong, the guest can trigger it.
> >>> Could you please share more information about how the guest
> >>> triggers these conditions (including the following two), Thanks
> >>> a lot!
> >>
> >> irq->dest_id is a 16-bit value, so it can be > 255.
> >
> > Yes, irq->dest_id is defined as u32, but by looking the current KVM
> > code, seems desst_id is used as an u8 variable, even in x2apic mode
> > the dest_id will not beyond 255 (except for broadcast dest in in x2apic
> > mode). Correct me if I am wrong. Thanks a lot!
> 
> Actually you're right, the MSI destination is only 8 bits.  I was
> confused because of
> 
> #defineMSI_ADDR_DEST_ID_MASK  0x000
> 
> in arch/x86/include/asm/msidef.h.  But there may be a bug, see below...
> 
> >>> + if (cid >= ARRAY_SIZE(map->logical_map)) {
> >>> + WARN_ON_ONCE(1);
> >>
> >> In x2apic mode irq->dest_id could have bits 12..15 set.
> >
> > cid is gotten from bit 16 ..31 of the ldr (in apic_logical_id()), and
> > in x2apic mode, ldr is constructed in kvm_apic_set_x2apic_id() as
> > below:
> >
> > u32 ldr = ((id >> 4) << 16) | (1 << (id & 0xf));
> >
> > So in fact, cid is (id >> 4), I cannot think of why cid can beyond 15.
> 
> I think kvm_apic_match_logical_addr for MSI and IOAPIC interrupts is
> buggy in x2apic mode.
> 
> It does:
> 
> if (apic_x2apic_mode(apic))
> return ((logical_id >> 16) == (mda >> 16))
>&& (logical_id & mda & 0x) != 0;
> 
> But mda is only 8-bits for MSI and IOAPIC interrupts.
> 
> Radim, should kvm_apic_mda also handle the !ipi && x2apic_mda && dest_id
> != APIC_BROADCAST case?  It never triggers with Linux because it uses
> only the physical mode (that's not super-easy to see;
> ioapic_set_affinity looks for the RTEs in irq_data->chip_data and that
> is allocated with kzalloc).
> 
> > Do I miss something here? Thanks!
> 
> No, you were right.
> 
> But still I think the WARNs are unnecessary; it is conceivable that some
> future chipset adds support for more than 8-bits in the dest_id.

No problem, I agree with it. Here I just want clarify some questions, thanks
for the elaboration!

Thanks,
Feng

> 
> Paolo
> 
> > Thanks,
> > Feng
> >
> >>
> >> Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

75 matches

Mail list logo