Re: [PATCH] kvm: x86: fix comment about {mmu,nested_mmu}.gva_to_gpa

2016-01-07 Thread Paolo Bonzini


On 30/12/2015 17:26, David Matlack wrote:
> The comment had the meaning of mmu.gva_to_gpa and nested_mmu.gva_to_gpa
> swapped. Fix that, and also add some details describing how each translation
> works.
> 
> Signed-off-by: David Matlack 
> ---
>  arch/x86/kvm/mmu.c | 10 ++
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index e7c2c14..098a9c2 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -4058,10 +4058,12 @@ static void init_kvm_nested_mmu(struct kvm_vcpu *vcpu)
>   g_context->inject_page_fault = kvm_inject_page_fault;
>  
>   /*
> -  * Note that arch.mmu.gva_to_gpa translates l2_gva to l1_gpa. The
> -  * translation of l2_gpa to l1_gpa addresses is done using the
> -  * arch.nested_mmu.gva_to_gpa function. Basically the gva_to_gpa
> -  * functions between mmu and nested_mmu are swapped.
> +  * Note that arch.mmu.gva_to_gpa translates l2_gpa to l1_gpa using
> +  * L1's nested page tables (e.g. EPT12). The nested translation
> +  * of l2_gva to l1_gpa is done by arch.nested_mmu.gva_to_gpa using
> +  * L2's page tables as the first level of translation and L1's
> +  * nested page tables as the second level of translation. Basically
> +  * the gva_to_gpa functions between mmu and nested_mmu are swapped.
>*/
>   if (!is_paging(vcpu)) {
>   g_context->nx = false;
> 

Applied, thanks.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PULL] KVM/ARM updates for 4.5

2016-01-07 Thread Paolo Bonzini


On 24/12/2015 12:12, Marc Zyngier wrote:
> Hi Paolo,
> 
> THis is the first pull request for the 4.5 merge window. Not much in
> terms of features, but a rewrite of our 64bit world switch, making it
> a lot nicer, maintainable, and much more likely to cope with things
> like VHE. Also support 16bit VMIDs for systems that need to run that
> many VMs concurrently.
> 
> I was really hoping that the PMU code would make it this time around,
> but it got slightly delayed, and the holiday season didn't help. If
> we're lucky enough (read: if all known issues have been addressed), I
> may send you another pull request early in the new year.
> 
> In the mean time, please pull!

Pulled, thanks.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND] kvm:x86:Make sure kvm_write_guest successes for first call in kvm_write_wall_clock

2016-01-07 Thread Paolo Bonzini


On 30/12/2015 19:08, Nicholas Krause wrote:
> This makes sure that kvm_write_guest successes for the first call
> in order to make sure that the wall clock is successfully written
> to the host system before being calucated as required by the
> guest system.
> 
> Signed-off-by: Nicholas Krause 
> ---
>  arch/x86/kvm/x86.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index eed3228..7551f30 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1167,7 +1167,8 @@ static void kvm_write_wall_clock(struct kvm *kvm, gpa_t 
> wall_clock)
>  
>   ++version;
>  
> - kvm_write_guest(kvm, wall_clock, , sizeof(version));
> + if (kvm_write_guest(kvm, wall_clock, , sizeof(version)))
> + return;
>  
>   /*
>* The guest calculates current wall clock time by adding
> 

Applied, thanks.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] KVM: Remove unused KVM_REQ_KICK to save a bit in vcpu->requests

2016-01-07 Thread Paolo Bonzini


On 07/01/2016 12:43, Takuya Yoshikawa wrote:
> Signed-off-by: Takuya Yoshikawa 
> ---
>  include/linux/kvm_host.h | 45 ++---
>  1 file changed, 22 insertions(+), 23 deletions(-)
> 
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 61c3e6c..ca9b93e 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -122,29 +122,28 @@ static inline bool is_error_page(struct page *page)
>  #define KVM_REQ_UNHALT 6
>  #define KVM_REQ_MMU_SYNC   7
>  #define KVM_REQ_CLOCK_UPDATE   8
> -#define KVM_REQ_KICK   9

I'd prefer to leave just an /* unused: 9 */ here.

This patch can go in for 4.5.  Regarding the other patch,
KVM_REQ_MCLOCK_INPROGRESS is indeed not really necessary, see
http://www.spinics.net/lists/kvm/msg95944.html and the follow-up.  Were
you thinking of the same?  If so I would prefer to have some comments.

Paolo

> -#define KVM_REQ_DEACTIVATE_FPU10
> -#define KVM_REQ_EVENT 11
> -#define KVM_REQ_APF_HALT  12
> -#define KVM_REQ_STEAL_UPDATE  13
> -#define KVM_REQ_NMI   14
> -#define KVM_REQ_PMU   15
> -#define KVM_REQ_PMI   16
> -#define KVM_REQ_WATCHDOG  17
> -#define KVM_REQ_MASTERCLOCK_UPDATE 18
> -#define KVM_REQ_MCLOCK_INPROGRESS 19
> -#define KVM_REQ_EPR_EXIT  20
> -#define KVM_REQ_SCAN_IOAPIC   21
> -#define KVM_REQ_GLOBAL_CLOCK_UPDATE 22
> -#define KVM_REQ_ENABLE_IBS23
> -#define KVM_REQ_DISABLE_IBS   24
> -#define KVM_REQ_APIC_PAGE_RELOAD  25
> -#define KVM_REQ_SMI   26
> -#define KVM_REQ_HV_CRASH  27
> -#define KVM_REQ_IOAPIC_EOI_EXIT   28
> -#define KVM_REQ_HV_RESET  29
> -#define KVM_REQ_HV_EXIT   30
> -#define KVM_REQ_HV_STIMER 31
> +#define KVM_REQ_DEACTIVATE_FPU 9
> +#define KVM_REQ_EVENT 10
> +#define KVM_REQ_APF_HALT  11
> +#define KVM_REQ_STEAL_UPDATE  12
> +#define KVM_REQ_NMI   13
> +#define KVM_REQ_PMU   14
> +#define KVM_REQ_PMI   15
> +#define KVM_REQ_WATCHDOG  16
> +#define KVM_REQ_MASTERCLOCK_UPDATE 17
> +#define KVM_REQ_MCLOCK_INPROGRESS 18
> +#define KVM_REQ_EPR_EXIT  19
> +#define KVM_REQ_SCAN_IOAPIC   20
> +#define KVM_REQ_GLOBAL_CLOCK_UPDATE 21
> +#define KVM_REQ_ENABLE_IBS22
> +#define KVM_REQ_DISABLE_IBS   23
> +#define KVM_REQ_APIC_PAGE_RELOAD  24
> +#define KVM_REQ_SMI   25
> +#define KVM_REQ_HV_CRASH  26
> +#define KVM_REQ_IOAPIC_EOI_EXIT   27
> +#define KVM_REQ_HV_RESET  28
> +#define KVM_REQ_HV_EXIT   29
> +#define KVM_REQ_HV_STIMER 30
>  
>  #define KVM_USERSPACE_IRQ_SOURCE_ID  0
>  #define KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID 1
> 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: renumber architecture-dependent requests

2016-01-07 Thread Paolo Bonzini


On 07/01/2016 16:27, Christian Borntraeger wrote:
> On 01/07/2016 03:17 PM, Paolo Bonzini wrote:
>> Leave room for 4 more arch-independent requests.
> 
> The patch subject is wrong.
> 
> "renumber architecture-dependent requests"
> 
> --> "renumber kvm requests"
> 
> as we also renumber the architecture independent ones.

Right.  Ok with that change?

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/4] KVM: move architecture-dependent requests to arch/

2016-01-07 Thread Paolo Bonzini


On 07/01/2016 16:54, Christian Borntraeger wrote:
> On 01/07/2016 03:17 PM, Paolo Bonzini wrote:
> 
> Can you add at least a one line patch description?

Yes, and it will be more than one line. :)

"Since the numbers now overlap, it makes sense to enumerate
them in asm/kvm_host.h rather than linux/kvm_host.h.  Functions
that refer to architecture-specific requests are also moved
to arch/."

Paolo

>> Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
> 
> 
> Reviewed-by: Christian Borntraeger <borntrae...@de.ibm.com>
>> ---
>>  arch/powerpc/include/asm/kvm_host.h |  4 
>>  arch/s390/include/asm/kvm_host.h|  4 
>>  arch/x86/include/asm/kvm_host.h | 28 +++
>>  arch/x86/kvm/x86.c  | 10 ++
>>  include/linux/kvm_host.h| 38 
>> ++---
>>  virt/kvm/kvm_main.c | 10 --
>>  6 files changed, 48 insertions(+), 46 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/kvm_host.h 
>> b/arch/powerpc/include/asm/kvm_host.h
>> index cfa758c6b4f6..271fefbbe521 100644
>> --- a/arch/powerpc/include/asm/kvm_host.h
>> +++ b/arch/powerpc/include/asm/kvm_host.h
>> @@ -50,6 +50,10 @@
>>  #define KVM_NR_IRQCHIPS  1
>>  #define KVM_IRQCHIP_NUM_PINS 256
>>
>> +/* PPC-specific vcpu->requests bit members */
>> +#define KVM_REQ_WATCHDOG   8
>> +#define KVM_REQ_EPR_EXIT   9
>> +
>>  #include 
>>
>>  #define KVM_ARCH_WANT_MMU_NOTIFIER
>> diff --git a/arch/s390/include/asm/kvm_host.h 
>> b/arch/s390/include/asm/kvm_host.h
>> index c83144110ea9..31fe20f4d129 100644
>> --- a/arch/s390/include/asm/kvm_host.h
>> +++ b/arch/s390/include/asm/kvm_host.h
>> @@ -39,6 +39,10 @@
>>  #define KVM_IRQCHIP_NUM_PINS 4096
>>  #define KVM_HALT_POLL_NS_DEFAULT 0
>>
>> +/* s390-specific vcpu->requests bit members */
>> +#define KVM_REQ_ENABLE_IBS 8
>> +#define KVM_REQ_DISABLE_IBS9
>> +
>>  #define SIGP_CTRL_C 0x80
>>  #define SIGP_CTRL_SCN_MASK  0x3f
>>
>> diff --git a/arch/x86/include/asm/kvm_host.h 
>> b/arch/x86/include/asm/kvm_host.h
>> index a7c89876698b..44adbb819041 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -46,6 +46,31 @@
>>
>>  #define KVM_IRQCHIP_NUM_PINS  KVM_IOAPIC_NUM_PINS
>>
>> +/* x86-specific vcpu->requests bit members */
>> +#define KVM_REQ_MIGRATE_TIMER  8
>> +#define KVM_REQ_REPORT_TPR_ACCESS  9
>> +#define KVM_REQ_TRIPLE_FAULT  10
>> +#define KVM_REQ_MMU_SYNC  11
>> +#define KVM_REQ_CLOCK_UPDATE  12
>> +#define KVM_REQ_DEACTIVATE_FPU13
>> +#define KVM_REQ_EVENT 14
>> +#define KVM_REQ_APF_HALT  15
>> +#define KVM_REQ_STEAL_UPDATE  16
>> +#define KVM_REQ_NMI   17
>> +#define KVM_REQ_PMU   18
>> +#define KVM_REQ_PMI   19
>> +#define KVM_REQ_SMI   20
>> +#define KVM_REQ_MASTERCLOCK_UPDATE 21
>> +#define KVM_REQ_MCLOCK_INPROGRESS 22
>> +#define KVM_REQ_SCAN_IOAPIC   23
>> +#define KVM_REQ_GLOBAL_CLOCK_UPDATE 24
>> +#define KVM_REQ_APIC_PAGE_RELOAD  25
>> +#define KVM_REQ_HV_CRASH  26
>> +#define KVM_REQ_IOAPIC_EOI_EXIT   27
>> +#define KVM_REQ_HV_RESET  28
>> +#define KVM_REQ_HV_EXIT   29
>> +#define KVM_REQ_HV_STIMER 30
>> +
>>  #define CR0_RESERVED_BITS   \
>>  (~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
>>| X86_CR0_ET | X86_CR0_NE | X86_CR0_WP | X86_CR0_AM \
>> @@ -1268,6 +1293,9 @@ u64 kvm_read_l1_tsc(struct kvm_vcpu *vcpu, u64 
>> host_tsc);
>>  unsigned long kvm_get_linear_rip(struct kvm_vcpu *vcpu);
>>  bool kvm_is_linear_rip(struct kvm_vcpu *vcpu, unsigned long linear_rip);
>>
>> +void kvm_make_mclock_inprogress_request(struct kvm *kvm);
>> +void kvm_make_scan_ioapic_request(struct kvm *kvm);
>> +
>>  void kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
>>   struct kvm_async_pf *work);
>>  void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 102c3028513f..dc6b37f47cd7 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -1686,6 +1686,11 @@ static void pvclock_update_vm_gtod_copy(struct kvm 
>> *kvm)
>>  #endif
>>  

Re: [PATCH v1 0/6] KVM: Hyper-V SynIC timers migration fixes

2016-01-07 Thread Paolo Bonzini


On 23/12/2015 12:28, Andrey Smetanin wrote:
> During testing of Windows 2012R2 guest migration with
> Hyper-V SynIC timers enabled we found several bugs
> which lead to restoring guest in a hung state.
> 
> This patch series provides several fixes to make the
> migration of guest with Hyper-V SynIC timers enabled
> succeed.
> 
> The series applies on top of
> 'kvm/x86: Remove Hyper-V SynIC timer stopping'
> previously sent.
> 
> Signed-off-by: Andrey Smetanin <asmeta...@virtuozzo.com>
> Reviewed-by: Roman Kagan <rka...@virtuozzo.com>
> CC: Gleb Natapov <g...@kernel.org>
> CC: Paolo Bonzini <pbonz...@redhat.com>
> CC: Roman Kagan <rka...@virtuozzo.com>
> CC: Denis V. Lunev <d...@openvz.org>
> CC: qemu-de...@nongnu.org
> 
> Andrey Smetanin (6):
>   kvm/x86: Drop stimer_stop() function
>   kvm/x86: Hyper-V unify stimer_start() and stimer_restart()
>   kvm/x86: Reorg stimer_expiration() to better control timer restart
>   kvm/x86: Hyper-V fix SynIC timer disabling condition
>   kvm/x86: Skip SynIC vector check for QEMU side
>   kvm/x86: Update SynIC timers on guest entry only
> 
>  arch/x86/kvm/hyperv.c | 112 
> +++---
>  arch/x86/kvm/x86.c|   6 +++
>  2 files changed, 58 insertions(+), 60 deletions(-)
> 

Applied, let me know if I should fix up patch 3 (the bug is preexisting
anyway, if it is a bug as I suspect).

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 3/7] kvm/x86: Hyper-V unify stimer_start() and stimer_restart()

2016-01-07 Thread Paolo Bonzini


On 28/12/2015 16:27, Andrey Smetanin wrote:
> This will be used in future to start Hyper-V SynIC timer
> in several places by one logic in one function.
> 
> Changes v2:
> * drop stimer->count == 0 check inside stimer_start()
> * comment stimer_start() assumptions

Can you replace comments with WARN_ON_ONCE?

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: QUESTION: mmu.gva_to_gpa versus nested_mmu.gva_to_gpa

2015-12-30 Thread Paolo Bonzini


On 29/12/2015 17:37, David Matlack wrote:
>> > Yes, it's correct.

s/it's/you're/ :)

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Is kvm-kmod still supported?

2015-12-28 Thread Paolo Bonzini


On 28/12/2015 23:09, Estrada, Zachary J wrote:
> I've been maintaining a fork for research and tinkering. Is the kvm-kmod
> standalone module still supported or should I be using the full Linux
> tree? I find kvm-kmod convenient to keep the source independent of the
> kernel tree, but I also want to be using the latest and greatest.
> 
> The repository I'm referring to is here:
> http://git.kiszka.org/?p=kvm-kmod.git;a=summary
> 
> However, the "next" branch appears to be out of sync with the latest
> upstream in the Linux tree.

I'm using it occasionally and I have patches that bring it up to date,
but only with recent base kernel versions (3.10+ *should* work).

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 0/4] scsi: cleanup ioctl headers and provide UAPI versions

2015-12-28 Thread Paolo Bonzini


On 25/09/2015 11:27, Paolo Bonzini wrote:
> This is v3 of the series to provide an "official" sg.h header (and
> scsi_ioctl.h too, though it's basically obsolete) together with the other
> userspace API definitions.  The change from v2 to v3 is that defaults
> for sg.c are not exported in include/uapi/linux/sg.c.
> 
> Paolo
> 
> 2.5.0
> 

What happened to these patches?...

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: QUESTION: mmu.gva_to_gpa versus nested_mmu.gva_to_gpa

2015-12-28 Thread Paolo Bonzini


On 28/12/2015 23:23, David Matlack wrote:
> I'm wondering if this comment in mmu.c:init_kvm_nested_mmu is correct (at
> least in the context of Nested EPT):
> 
> 4055 /*
> 4056  * Note that arch.mmu.gva_to_gpa translates l2_gva to l1_gpa. The
> 4057  * translation of l2_gpa to l1_gpa addresses is done using the
> 4058  * arch.nested_mmu.gva_to_gpa function. Basically the gva_to_gpa
> 4059  * functions between mmu and nested_mmu are swapped.
> 4060  */
> 
> nested_mmu.get_cr3 gets set to get_cr3, which I believe will return L2's cr3.
> In vmx.c:nested_ept_init_mmu_context, mmu.get_cr3 is set to
> nested_ept_get_cr3, which should be the root of EPT12. Given these get_cr3
> functions, shouldn't nested_mmu.gva_to_gpa translate l2_gva->l2_gpa and
> mmu.gva_to_gpa translate l2_gpa->l1_gpa?

Yes, it's correct.  It can be trivially seen by looking at
kvm_init_shadow_ept_mmu's usage of >arch.mmu.  This is obviously a
l2_gpa to l1_gpa translation.

Whether the roles are swapped, depends on whether you think of
"nested_mmu" as "nested guest" or "nested virtualization"  nested_mmu is
the MMU for the nested guest, mmu is the MMU for the L1 guest and it's
the one that takes care of nested virtualization.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[GIT PULL] KVM fixes for v4.4-rc7

2015-12-22 Thread Paolo Bonzini
Linus,

The following changes since commit 6764e5ebd5c62236d082f9ae030674467d0b2779:

  Merge tag 'vfio-v4.4-rc5' of git://github.com/awilliam/linux-vfio (2015-12-09 
16:52:12 -0800)

are available in the git repository at:


  git://git.kernel.org/pub/scm/virt/kvm/kvm.git tags/for-linus

for you to fetch changes up to 0185604c2d82c560dab2f2933a18f797e74ab5a8:

  KVM: x86: Reload pit counters for all channels when restoring state 
(2015-12-22 15:36:26 +0100)


KVM/ARM fixes for v4.4-rc7

- A series of fixes to the MTRR emulation, tested in the BZ by several users
  so they should be safe this late

- A fix for a division by zero

- Two very simple ARM and PPC fixes


Alexis Dambricourt (1):
  KVM: MTRR: fix fixed MTRR segment look up

Andrew Honig (1):
  KVM: x86: Reload pit counters for all channels when restoring state

Christoffer Dall (1):
  KVM: arm/arm64: vgic: Fix kvm_vgic_map_is_active's dist check

Haozhong Zhang (1):
  KVM: VMX: Fix host initiated access to guest MSR_TSC_AUX

Paolo Bonzini (5):
  kvm: x86: move tracepoints outside extended quiescent state
  Merge branch 'kvm-ppc-fixes' of git://git.kernel.org/.../paulus/powerpc 
into kvm-master
  Merge tag 'kvm-arm-for-v4.4-rc6' of 
git://git.kernel.org/.../kvmarm/kvmarm into HEAD
  KVM: MTRR: observe maxphyaddr from guest CPUID, not host
  KVM: MTRR: treat memory as writeback if MTRR is disabled in guest CPUID

Paul Mackerras (1):
  KVM: PPC: Book3S HV: Prohibit setting illegal transaction state in MSR

 arch/powerpc/kvm/book3s_hv.c |  6 ++
 arch/x86/kvm/cpuid.h |  8 
 arch/x86/kvm/mtrr.c  | 25 +++--
 arch/x86/kvm/svm.c   |  4 ++--
 arch/x86/kvm/vmx.c   |  7 ---
 arch/x86/kvm/x86.c   | 12 
 virt/kvm/arm/vgic.c  |  2 +-
 7 files changed, 48 insertions(+), 16 deletions(-)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 5/5] kvm/x86: Hyper-V kvm exit

2015-12-18 Thread Paolo Bonzini


On 18/12/2015 16:19, Pavel Fedin wrote:
> As far as i understand this code, KVM_EXIT_HYPERV is called when one
> of three MSRs are accessed. But, shouldn't we have implemented 
> instead something more generic, like KVM_EXIT_REG_IO, which would
> work similar to KVM_EXIT_PIO or KVM_EXIT_MMIO, but carry register 
> code and value?

Yes, we considered that.  There were actually patches for this as well.
 However, in this case the register is still emulated in the kernel, and
userspace just gets informed of the new value.

> This would allow us to solve the same task which we have done here,
> but this solution would be reusable for other devices and other 
> archirectures. What if in future we have more system registers to
> emulate in userspace?

If we do get that, we will just rename KVM_EXIT_HYPERV to
KVM_EXIT_MSR_ACCESS, and KVM_EXIT_HYPERV_SYNIC to
KVM_EXIT_MSR_HYPERV_SYNIC, and struct kvm_hyperv_exit to kvm_msr_exit.

Actually, we can do it now.  What do you guys think?  Peter?  I might
even be convinced to document the capability (in Documentation/ and
uapi/) even if the upstream kernel doesn't provide it.  We still have a
lot of time until 4.5 is out, it can be done after the merge window even.

Paolo

> I write this because at one point i suggested similar thing for ARM64
> (but i never actually wrote it), to emulate physical CP15 timer. And
> it would require exactly the same capability - process some trapped
> system register accesses in userspace.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: x86: MMU: Use clear_page() instead of init_shadow_page_table()

2015-12-18 Thread Paolo Bonzini


On 18/12/2015 10:54, Takuya Yoshikawa wrote:
> Not just in order to clean up the code, but to make it faster by using
> enhanced instructions: the initialization became 20-30% faster on our
> testing machine.
> 
> Signed-off-by: Takuya Yoshikawa 

Applied locally, but I'm not sure when it will appear on kernel.org...
it depends on the amount of testing I can do next week before the
holidays.  Anyway, it's not lost. :)

Paolo

> ---
>  arch/x86/kvm/mmu.c | 10 +-
>  1 file changed, 1 insertion(+), 9 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index a1a3d19..7f5a82b 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -2041,14 +2041,6 @@ static void mmu_sync_children(struct kvm_vcpu *vcpu,
>   }
>  }
>  
> -static void init_shadow_page_table(struct kvm_mmu_page *sp)
> -{
> - int i;
> -
> - for (i = 0; i < PT64_ENT_PER_PAGE; ++i)
> - sp->spt[i] = 0ull;
> -}
> -
>  static void __clear_sp_write_flooding_count(struct kvm_mmu_page *sp)
>  {
>   sp->write_flooding_count = 0;
> @@ -2128,7 +2120,7 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct 
> kvm_vcpu *vcpu,
>   account_shadowed(vcpu->kvm, sp);
>   }
>   sp->mmu_valid_gen = vcpu->kvm->arch.mmu_valid_gen;
> - init_shadow_page_table(sp);
> + clear_page(sp->spt);
>   trace_kvm_mmu_get_page(sp, true);
>   return sp;
>  }
> 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PULL take #2] KVM/ARM fixes for v4.4-rc6

2015-12-18 Thread Paolo Bonzini


On 18/12/2015 15:05, Marc Zyngier wrote:
> Hi Paolo,
> 
> We have a one line fix for the VGIC this time around, fixing a patch
> that went in -rc2. Oh well. Hopefully this is the last one for v4.4.
> And yes, the right patch is following the pull-request this time...
> 
> Please pull!

Pulled, thanks.

Paolo

> Thanks,
> 
>   M.
> 
> The following changes since commit 0de58f852875a0f0dcfb120bb8433e4e73c7803b:
> 
>   ARM/arm64: KVM: correct PTE uncachedness check (2015-12-04 16:30:17 +)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git 
> tags/kvm-arm-for-v4.4-rc6
> 
> for you to fetch changes up to fdec12c12ed4333afb49c9948c29fbd5fb52da97:
> 
>   KVM: arm/arm64: vgic: Fix kvm_vgic_map_is_active's dist check (2015-12-11 
> 16:33:31 +)
> 
> 
> KVM/ARM fixes for v4.4-rc6
> 
> - Fix for the active interrupt detection code, affecting
>   the timer interrupt injection.
> 
> 
> Christoffer Dall (1):
>   KVM: arm/arm64: vgic: Fix kvm_vgic_map_is_active's dist check
> 
>  virt/kvm/arm/vgic.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 5/5] kvm/x86: Hyper-V kvm exit

2015-12-18 Thread Paolo Bonzini


On 18/12/2015 19:10, Peter Hornyack wrote:
> On brief inspection of Andrey's patch (I have not been following
> closely) it looks like the kvm_hyperv_exit struct that's returned to
> userspace contains more data (control, evt_page, and msg_page fields)
> than simply the value of the MSR, so would the desired SynIC exit fit
> into a general-purpose exit for MSR emulation?

This would be a special case that is used even if the hyperv MSRs are
emulated in the kernel.  Other exit subcodes than
KVM_EXIT_(MSR_)HYPERV_SYNIC would be used for your usecase of for MSR
emulation in userspace.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 2/5] KVM: add KVM_EXIT_MSR exit reason and capability.

2015-12-18 Thread Paolo Bonzini


On 18/08/2015 20:46, Peter Hornyack wrote:
> Define KVM_EXIT_MSR, a new exit reason for accesses to MSRs that kvm
> does not handle. Define KVM_CAP_UNHANDLED_MSR_EXITS, a vm-wide
> capability that guards the new exit reason and which can be enabled via
> the KVM_ENABLE_CAP ioctl.
> 
> Signed-off-by: Peter Hornyack 
> ---
>  Documentation/virtual/kvm/api.txt | 48 
> +++
>  include/uapi/linux/kvm.h  | 14 
>  2 files changed, 62 insertions(+)

Let's instead add:

- KVM_CAP_MSR_EXITS

- KVM_CAP_ENABLE_MSR_EXIT

- KVM_CAP_DISABLE_MSR_EXIT

and 3 kinds of exits: KVM_EXIT_MSR_READ, KVM_EXIT_MSR_WRITE,
KVM_EXIT_MSR_AFTER_WRITE.  The first two use four fields (type, msr,
data, handled) and #GP if handled=0 is zero on the next entry.  The last
one uses the first three fields and can be used for HyperV.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND] kvm:x86:Fix error handling in the function kvm_write_wall_clock

2015-12-17 Thread Paolo Bonzini


On 17/12/2015 03:30, Nicholas Krause wrote:
> This fixes error handling in the function kvm_write_wall_clock
> by checking if any of the calls to kvm_write_guest have failed
> inside this paricutlar function and if so print to the console
> with pr_err that we are unable to write the data to the guest
> system to warn the user of this failure before directly returning
> to the caller of the function kvm_write_wall_check as we cannot
> continue the function to this function after a failed call to
> 
> Signed-off-by: Nicholas Krause 

I think I have explained before why this is mostly unnecessary, but the
patch can be saved.  I'll go through the problems again first.

> - kvm_write_guest(kvm, wall_clock, , sizeof(version));
> + if (kvm_write_guest(kvm, wall_clock, , sizeof(version))) {
> + pr_err("Unable to correctly write data to guest system\n");
> + return;
> + }

You've written a message to the log that can be triggered by the guest
(by writing an invalid value to the wall clock MSR).  We don't let the
guest spam the host logs.

>   /*
>* The guest calculates current wall clock time by adding
> @@ -1168,10 +1171,16 @@ static void kvm_write_wall_clock(struct kvm *kvm, 
> gpa_t wall_clock)
>   wc.nsec = boot.tv_nsec;
>   wc.version = version;
>  
> - kvm_write_guest(kvm, wall_clock, , sizeof(wc));
> + if (kvm_write_guest(kvm, wall_clock, , sizeof(wc))) {
> + pr_err("Unable to correctly write data to guest system\n");
> + return;
> + }

The other kvm_write_guest will probably also fail but, if it doesn't,
you've left an odd version in the data structure. The guest will loop
forever waiting for the even value.

Plus, same problem with logs.

>   version++;
> - kvm_write_guest(kvm, wall_clock, , sizeof(version));
> + if (kvm_write_guest(kvm, wall_clock, , sizeof(version))) {
> + pr_err("Unable to correctly write data to guest system\n");
> + return;
> + }
>  }

Same problem with logs, and the return is not useful.

Can you send a patch that only adds a return if the *first*
kvm_write_guest fails?  You can leave aside the others, and not add any
pr_err.

Thanks,

Paolo

>  static uint32_t div_frac(uint32_t dividend, uint32_t divisor)
> 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/16] MIPS: KVM: Misc trivial cleanups

2015-12-17 Thread Paolo Bonzini


On 17/12/2015 00:49, James Hogan wrote:
> This patchset contains a bunch of miscellaneous cleanups (which are
> mostly trivial) for MIPS KVM & MIPS headers, such as:
> - Style/whitespace fixes
> - General cleanup and removal of dead code.
> - Moving/refactoring of general MIPS definitions out of arch/mips/kvm/
>   and into arch/mips/include/asm/ so they can be shared with the rest of
>   arch/mips. Specifically COP0 register bits, exception codes, cache
>   ops, & instruction opcodes.
> - Add MAINTAINERS entry for MIPS KVM.
> 
> Due to the interaction with other arch/mips/ code, I think it makes
> sense for these to go via the MIPS tree.

No objection.

Acked-by: Paolo Bonzini <pbonz...@redhat.com>

I think I'd use s8/u8 instead of int8_t/uint8_t in patch 15, but really
that's just me.  I'm fine either way, and that's really the only comment
I have on the series. :)

Paolo

> James Hogan (16):
>   MIPS: KVM: Trivial whitespace and style fixes
>   MIPS: KVM: Drop some unused definitions from kvm_host.h
>   MIPS: Move definition of DC bit to mipsregs.h
>   MIPS: KVM: Drop unused kvm_mips_host_tlb_inv_index()
>   MIPS: KVM: Convert EXPORT_SYMBOL to _GPL
>   MIPS: KVM: Refactor added offsetof()s
>   MIPS: KVM: Make kvm_mips_{init,exit}() static
>   MIPS: Move Cause.ExcCode trap codes to mipsregs.h
>   MIPS: Update trap codes
>   MIPS: Use EXCCODE_ constants with set_except_vector()
>   MIPS: Break down cacheops.h definitions
>   MIPS: KVM: Use cacheops.h definitions
>   MIPS: Move KVM specific opcodes into asm/inst.h
>   MIPS: KVM: Add missing newline to kvm_err()
>   MIPS: KVM: Consistent use of uint*_t in MMIO handling
>   MAINTAINERS: Add KVM for MIPS entry
> 
>  MAINTAINERS   |   8 +++
>  arch/mips/Kconfig |   3 +-
>  arch/mips/include/asm/cacheops.h  | 106 --
>  arch/mips/include/asm/kvm_host.h  |  39 +
>  arch/mips/include/asm/mipsregs.h  |  34 +++
>  arch/mips/include/uapi/asm/inst.h |   3 +-
>  arch/mips/kernel/cpu-bugs64.c |   8 +--
>  arch/mips/kernel/traps.c  |  52 -
>  arch/mips/kvm/callback.c  |   2 +-
>  arch/mips/kvm/dyntrans.c  |  10 +---
>  arch/mips/kvm/emulate.c   | 118 
> --
>  arch/mips/kvm/interrupt.c |   8 +--
>  arch/mips/kvm/locore.S|  12 ++--
>  arch/mips/kvm/mips.c  |  38 ++--
>  arch/mips/kvm/opcode.h|  22 ---
>  arch/mips/kvm/tlb.c   |  77 +++--
>  arch/mips/kvm/trap_emul.c |   1 -
>  17 files changed, 245 insertions(+), 296 deletions(-)
>  delete mode 100644 arch/mips/kvm/opcode.h
> 
> Cc: Ralf Baechle <r...@linux-mips.org>
> Cc: Paolo Bonzini <pbonz...@redhat.com>
> Cc: Gleb Natapov <g...@kernel.org>
> Cc: linux-m...@linux-mips.org
> Cc: kvm@vger.kernel.org
> 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 0/3] KVM-UNIT-TESTS: Hyper-V SynIC timers test

2015-12-17 Thread Paolo Bonzini


On 16/12/2015 19:51, Denis V. Lunev wrote:
> On 12/08/2015 07:36 PM, Andrey Smetanin wrote:
>> The test checks Hyper-V SynIC timers functionality.
>> The test runs on every vCPU and performs start/stop
>> of periodic/one-shot timers (with period=1ms) and checks
>> validity of received expiration messages in appropriate
>> ISR's.
>>
>> Changes v2:
>> * Share generic Hyper-V tests code
>> * Hyper-V SynIC timers test fixes to improve
>> readability and output
>>
>> Signed-off-by: Andrey Smetanin <asmeta...@virtuozzo.com>
>> Reviewed-by: Roman Kagan <rka...@virtuozzo.com>
>> CC: Paolo Bonzini <pbonz...@redhat.com>
>> CC: Marcelo Tosatti <mtosa...@redhat.com>
>> CC: Roman Kagan <rka...@virtuozzo.com>
>> CC: Denis V. Lunev <d...@openvz.org>
>> CC: qemu-de...@nongnu.org
>>
>> Andrey Smetanin (3):
>>lib/x86: Make free_page() available to call
>>x86/hyperv: Move Hyper-V generic code into hyperv.h/hyperv.c
>>x86: Hyper-V SynIC timers test
>>
>>   config/config-x86-common.mak |   8 +-
>>   lib/x86/msr.h|  23 ---
>>   lib/x86/vm.h |   1 +
>>   x86/hyperv.c |  24 +++
>>   x86/hyperv.h | 183 +
>>   x86/hyperv_stimer.c  | 376
>> +++
>>   x86/hyperv_synic.c   |  42 +
>>   x86/unittests.cfg|   5 +
>>   8 files changed, 603 insertions(+), 59 deletions(-)
>>   create mode 100644 x86/hyperv.c
>>   create mode 100644 x86/hyperv.h
>>   create mode 100644 x86/hyperv_stimer.c
>>
> ping :)

I was waiting for the QEMU 2.5 release so that I can merge the support
patches first.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PULL 17/45] target-i386/kvm: Hyper-V SynIC timers MSR's support

2015-12-17 Thread Paolo Bonzini
From: Andrey Smetanin <asmeta...@virtuozzo.com>

Hyper-V SynIC timers are host timers that are configurable
by guest through corresponding MSR's (HV_X64_MSR_STIMER*).
Guest setup and use fired by host events(SynIC interrupt
and appropriate timer expiration message) as guest clock
events.

The state of Hyper-V SynIC timers are stored in corresponding
MSR's. This patch seria implements such MSR's support and migration.

Signed-off-by: Andrey Smetanin <asmeta...@virtuozzo.com>
CC: Paolo Bonzini <pbonz...@redhat.com>
CC: Richard Henderson <r...@twiddle.net>
CC: Eduardo Habkost <ehabk...@redhat.com>
CC: "Andreas Färber" <afaer...@suse.de>
CC: Marcelo Tosatti <mtosa...@redhat.com>
CC: Denis V. Lunev <d...@openvz.org>
CC: Roman Kagan <rka...@virtuozzo.com>
CC: kvm@vger.kernel.org

Message-Id: <1448464885-8300-3-git-send-email-asmeta...@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
---
 target-i386/cpu-qom.h |  1 +
 target-i386/cpu.c |  1 +
 target-i386/cpu.h |  2 ++
 target-i386/kvm.c | 50 +-
 target-i386/machine.c | 29 +
 5 files changed, 82 insertions(+), 1 deletion(-)

diff --git a/target-i386/cpu-qom.h b/target-i386/cpu-qom.h
index 7ea5b34..5f9d960 100644
--- a/target-i386/cpu-qom.h
+++ b/target-i386/cpu-qom.h
@@ -95,6 +95,7 @@ typedef struct X86CPU {
 bool hyperv_vpindex;
 bool hyperv_runtime;
 bool hyperv_synic;
+bool hyperv_stimer;
 bool check_cpuid;
 bool enforce_cpuid;
 bool expose_kvm;
diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index 2fdd855..92f7cc1 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -3147,6 +3147,7 @@ static Property x86_cpu_properties[] = {
 DEFINE_PROP_BOOL("hv-vpindex", X86CPU, hyperv_vpindex, false),
 DEFINE_PROP_BOOL("hv-runtime", X86CPU, hyperv_runtime, false),
 DEFINE_PROP_BOOL("hv-synic", X86CPU, hyperv_synic, false),
+DEFINE_PROP_BOOL("hv-stimer", X86CPU, hyperv_stimer, false),
 DEFINE_PROP_BOOL("check", X86CPU, check_cpuid, true),
 DEFINE_PROP_BOOL("enforce", X86CPU, enforce_cpuid, false),
 DEFINE_PROP_BOOL("kvm", X86CPU, expose_kvm, true),
diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index 1168ddf..595891e 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -925,6 +925,8 @@ typedef struct CPUX86State {
 uint64_t msr_hv_synic_evt_page;
 uint64_t msr_hv_synic_msg_page;
 uint64_t msr_hv_synic_sint[HV_SYNIC_SINT_COUNT];
+uint64_t msr_hv_stimer_config[HV_SYNIC_STIMER_COUNT];
+uint64_t msr_hv_stimer_count[HV_SYNIC_STIMER_COUNT];
 
 /* exception/interrupt handling */
 int error_code;
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index d1c2c81..7692b59 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -90,6 +90,7 @@ static bool has_msr_hv_reset;
 static bool has_msr_hv_vpindex;
 static bool has_msr_hv_runtime;
 static bool has_msr_hv_synic;
+static bool has_msr_hv_stimer;
 static bool has_msr_mtrr;
 static bool has_msr_xss;
 
@@ -526,7 +527,8 @@ static bool hyperv_enabled(X86CPU *cpu)
 cpu->hyperv_reset ||
 cpu->hyperv_vpindex ||
 cpu->hyperv_runtime ||
-cpu->hyperv_synic);
+cpu->hyperv_synic ||
+cpu->hyperv_stimer);
 }
 
 static Error *invtsc_mig_blocker;
@@ -630,6 +632,13 @@ int kvm_arch_init_vcpu(CPUState *cs)
 env->msr_hv_synic_sint[sint] = HV_SYNIC_SINT_MASKED;
 }
 }
+if (cpu->hyperv_stimer) {
+if (!has_msr_hv_stimer) {
+fprintf(stderr, "Hyper-V timers aren't supported by kernel\n");
+return -ENOSYS;
+}
+c->eax |= HV_X64_MSR_SYNTIMER_AVAILABLE;
+}
 c = _data.entries[cpuid_i++];
 c->function = HYPERV_CPUID_ENLIGHTMENT_INFO;
 if (cpu->hyperv_relaxed_timing) {
@@ -980,6 +989,10 @@ static int kvm_get_supported_msrs(KVMState *s)
 has_msr_hv_synic = true;
 continue;
 }
+if (kvm_msr_list->indices[i] == HV_X64_MSR_STIMER0_CONFIG) {
+has_msr_hv_stimer = true;
+continue;
+}
 }
 }
 
@@ -1558,6 +1571,19 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
   env->msr_hv_synic_sint[j]);
 }
 }
+if (has_msr_hv_stimer) {
+int j;
+
+for (j = 0; j < ARRAY_SIZE(env->msr_hv_stimer_config); j++) {
+kvm_msr_entry_set([n++], HV_X64_MSR_STIMER0_CONFIG + j*2,
+env->msr_hv_stimer_config[j]);
+}
+
+for (j = 0; j < ARRAY_SIZE(env->msr_hv_

[PULL 16/45] hw/misc: Hyper-V test device 'hyperv-testdev'

2015-12-17 Thread Paolo Bonzini
From: Andrey Smetanin <asmeta...@virtuozzo.com>

'hyperv-testdev' will be used by kvm-unit-tests
to setup Hyper-V SynIC SINT's routing and to inject
Hyper-V SynIC SINT's.

Hyper-V test device is ISA type device that creates 0x3000
IO memory region and catches write access into it. Every
write operation data decoded into ctl code and parameters
for Hyper-V test device.

Signed-off-by: Andrey Smetanin <asmeta...@virtuozzo.com>
Reviewed-by: Roman Kagan <rka...@virtuozzo.com>
Signed-off-by: Denis V. Lunev <d...@openvz.org>
CC: Paolo Bonzini <pbonz...@redhat.com>
CC: Richard Henderson <r...@twiddle.net>
CC: Eduardo Habkost <ehabk...@redhat.com>
CC: "Andreas Färber" <afaer...@suse.de>
CC: Marcelo Tosatti <mtosa...@redhat.com>
CC: Roman Kagan <rka...@virtuozzo.com>
CC: Denis V. Lunev <d...@openvz.org>
CC: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
---
 default-configs/i386-softmmu.mak   |   1 +
 default-configs/x86_64-softmmu.mak |   1 +
 hw/misc/Makefile.objs  |   1 +
 hw/misc/hyperv_testdev.c   | 167 +
 4 files changed, 170 insertions(+)
 create mode 100644 hw/misc/hyperv_testdev.c

diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
index 43c96d1..57ccff3 100644
--- a/default-configs/i386-softmmu.mak
+++ b/default-configs/i386-softmmu.mak
@@ -50,3 +50,4 @@ CONFIG_XIO3130=y
 CONFIG_IOH3420=y
 CONFIG_I82801B11=y
 CONFIG_SMBIOS=y
+CONFIG_HYPERV_TESTDEV=$(CONFIG_KVM)
diff --git a/default-configs/x86_64-softmmu.mak 
b/default-configs/x86_64-softmmu.mak
index dfb8095..30f5e75 100644
--- a/default-configs/x86_64-softmmu.mak
+++ b/default-configs/x86_64-softmmu.mak
@@ -50,3 +50,4 @@ CONFIG_XIO3130=y
 CONFIG_IOH3420=y
 CONFIG_I82801B11=y
 CONFIG_SMBIOS=y
+CONFIG_HYPERV_TESTDEV=$(CONFIG_KVM)
diff --git a/hw/misc/Makefile.objs b/hw/misc/Makefile.objs
index 8a235df..d4765c2 100644
--- a/hw/misc/Makefile.objs
+++ b/hw/misc/Makefile.objs
@@ -43,3 +43,4 @@ obj-$(CONFIG_STM32F2XX_SYSCFG) += stm32f2xx_syscfg.o
 
 obj-$(CONFIG_PVPANIC) += pvpanic.o
 obj-$(CONFIG_EDU) += edu.o
+obj-$(CONFIG_HYPERV_TESTDEV) += hyperv_testdev.o
diff --git a/hw/misc/hyperv_testdev.c b/hw/misc/hyperv_testdev.c
new file mode 100644
index 000..d88844a
--- /dev/null
+++ b/hw/misc/hyperv_testdev.c
@@ -0,0 +1,167 @@
+/*
+ * QEMU KVM Hyper-V test device to support Hyper-V kvm-unit-tests
+ *
+ * Copyright (C) 2015 Andrey Smetanin <asmeta...@virtuozzo.com>
+ *
+ * Authors:
+ *  Andrey Smetanin <asmeta...@virtuozzo.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "hw/hw.h"
+#include "hw/qdev.h"
+#include "hw/isa/isa.h"
+#include "sysemu/kvm.h"
+#include "linux/kvm.h"
+#include "target-i386/hyperv.h"
+#include "kvm_i386.h"
+
+#define HV_TEST_DEV_MAX_SINT_ROUTES 64
+
+struct HypervTestDev {
+ISADevice parent_obj;
+MemoryRegion sint_control;
+HvSintRoute *sint_route[HV_TEST_DEV_MAX_SINT_ROUTES];
+};
+typedef struct HypervTestDev HypervTestDev;
+
+#define TYPE_HYPERV_TEST_DEV "hyperv-testdev"
+#define HYPERV_TEST_DEV(obj) \
+OBJECT_CHECK(HypervTestDev, (obj), TYPE_HYPERV_TEST_DEV)
+
+enum {
+HV_TEST_DEV_SINT_ROUTE_CREATE = 1,
+HV_TEST_DEV_SINT_ROUTE_DESTROY,
+HV_TEST_DEV_SINT_ROUTE_SET_SINT
+};
+
+static int alloc_sint_route_index(HypervTestDev *dev)
+{
+int i;
+
+for (i = 0; i < ARRAY_SIZE(dev->sint_route); i++) {
+if (dev->sint_route[i] == NULL) {
+return i;
+}
+}
+return -1;
+}
+
+static void free_sint_route_index(HypervTestDev *dev, int i)
+{
+assert(i >= 0 && i < ARRAY_SIZE(dev->sint_route));
+dev->sint_route[i] = NULL;
+}
+
+static int find_sint_route_index(HypervTestDev *dev, uint32_t vcpu_id,
+ uint32_t sint)
+{
+HvSintRoute *sint_route;
+int i;
+
+for (i = 0; i < ARRAY_SIZE(dev->sint_route); i++) {
+sint_route = dev->sint_route[i];
+if (sint_route && sint_route->vcpu_id == vcpu_id &&
+sint_route->sint == sint) {
+return i;
+}
+}
+return -1;
+}
+
+static void hv_synic_test_dev_control(HypervTestDev *dev, uint32_t ctl,
+  uint32_t vcpu_id, uint32_t sint)
+{
+int i;
+HvSintRoute *sint_route;
+
+switch (ctl) {
+case HV_TEST_DEV_SINT_ROUTE_CREATE:
+i = alloc_sint_route_index(dev);
+assert(i >= 0);
+sint_route = kvm_hv_sint_route_create(vcpu_id, sint, NULL);
+assert(sint_route);
+dev->sint_route[i] = sint_route;
+break;
+case HV_TEST_DEV_SINT_ROUTE_DESTROY:
+i = find_sint_route_index(dev, 

[PULL 13/45] target-i386/kvm: Hyper-V SynIC MSR's support

2015-12-17 Thread Paolo Bonzini
From: Andrey Smetanin <asmeta...@virtuozzo.com>

This patch does Hyper-V Synthetic interrupt
controller(Hyper-V SynIC) MSR's support and
migration. Hyper-V SynIC is enabled by cpu's
'hv-synic' option.

This patch does not allow cpu creation if
'hv-synic' option specified but kernel
doesn't support Hyper-V SynIC.

Changes v3:
* removed 'msr_hv_synic_version' migration because
it's value always the same
* moved SynIC msr's initialization into kvm_arch_init_vcpu

Signed-off-by: Andrey Smetanin <asmeta...@virtuozzo.com>
Reviewed-by: Roman Kagan <rka...@virtuozzo.com>
Signed-off-by: Denis V. Lunev <d...@openvz.org>
CC: Paolo Bonzini <pbonz...@redhat.com>
CC: Richard Henderson <r...@twiddle.net>
CC: Eduardo Habkost <ehabk...@redhat.com>
CC: "Andreas Färber" <afaer...@suse.de>
CC: Marcelo Tosatti <mtosa...@redhat.com>
CC: Roman Kagan <rka...@virtuozzo.com>
CC: Denis V. Lunev <d...@openvz.org>
CC: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
---
 target-i386/cpu-qom.h |  1 +
 target-i386/cpu.c |  1 +
 target-i386/cpu.h |  5 
 target-i386/kvm.c | 66 ++-
 target-i386/machine.c | 37 +
 5 files changed, 109 insertions(+), 1 deletion(-)

diff --git a/target-i386/cpu-qom.h b/target-i386/cpu-qom.h
index e3bfe9d..7ea5b34 100644
--- a/target-i386/cpu-qom.h
+++ b/target-i386/cpu-qom.h
@@ -94,6 +94,7 @@ typedef struct X86CPU {
 bool hyperv_reset;
 bool hyperv_vpindex;
 bool hyperv_runtime;
+bool hyperv_synic;
 bool check_cpuid;
 bool enforce_cpuid;
 bool expose_kvm;
diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index 11e5e39..2fdd855 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -3146,6 +3146,7 @@ static Property x86_cpu_properties[] = {
 DEFINE_PROP_BOOL("hv-reset", X86CPU, hyperv_reset, false),
 DEFINE_PROP_BOOL("hv-vpindex", X86CPU, hyperv_vpindex, false),
 DEFINE_PROP_BOOL("hv-runtime", X86CPU, hyperv_runtime, false),
+DEFINE_PROP_BOOL("hv-synic", X86CPU, hyperv_synic, false),
 DEFINE_PROP_BOOL("check", X86CPU, check_cpuid, true),
 DEFINE_PROP_BOOL("enforce", X86CPU, enforce_cpuid, false),
 DEFINE_PROP_BOOL("kvm", X86CPU, expose_kvm, true),
diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index 84edfd0..1168ddf 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -920,6 +920,11 @@ typedef struct CPUX86State {
 uint64_t msr_hv_tsc;
 uint64_t msr_hv_crash_params[HV_X64_MSR_CRASH_PARAMS];
 uint64_t msr_hv_runtime;
+uint64_t msr_hv_synic_control;
+uint64_t msr_hv_synic_version;
+uint64_t msr_hv_synic_evt_page;
+uint64_t msr_hv_synic_msg_page;
+uint64_t msr_hv_synic_sint[HV_SYNIC_SINT_COUNT];
 
 /* exception/interrupt handling */
 int error_code;
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 6dc9846..ca0708a 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -86,6 +86,7 @@ static bool has_msr_hv_crash;
 static bool has_msr_hv_reset;
 static bool has_msr_hv_vpindex;
 static bool has_msr_hv_runtime;
+static bool has_msr_hv_synic;
 static bool has_msr_mtrr;
 static bool has_msr_xss;
 
@@ -521,7 +522,8 @@ static bool hyperv_enabled(X86CPU *cpu)
 cpu->hyperv_crash ||
 cpu->hyperv_reset ||
 cpu->hyperv_vpindex ||
-cpu->hyperv_runtime);
+cpu->hyperv_runtime ||
+cpu->hyperv_synic);
 }
 
 static Error *invtsc_mig_blocker;
@@ -610,6 +612,21 @@ int kvm_arch_init_vcpu(CPUState *cs)
 if (cpu->hyperv_runtime && has_msr_hv_runtime) {
 c->eax |= HV_X64_MSR_VP_RUNTIME_AVAILABLE;
 }
+if (cpu->hyperv_synic) {
+int sint;
+
+if (!has_msr_hv_synic ||
+kvm_vcpu_enable_cap(cs, KVM_CAP_HYPERV_SYNIC, 0)) {
+fprintf(stderr, "Hyper-V SynIC is not supported by kernel\n");
+return -ENOSYS;
+}
+
+c->eax |= HV_X64_MSR_SYNIC_AVAILABLE;
+env->msr_hv_synic_version = HV_SYNIC_VERSION_1;
+for (sint = 0; sint < ARRAY_SIZE(env->msr_hv_synic_sint); sint++) {
+env->msr_hv_synic_sint[sint] = HV_SYNIC_SINT_MASKED;
+}
+}
 c = _data.entries[cpuid_i++];
 c->function = HYPERV_CPUID_ENLIGHTMENT_INFO;
 if (cpu->hyperv_relaxed_timing) {
@@ -956,6 +973,10 @@ static int kvm_get_supported_msrs(KVMState *s)
 has_msr_hv_runtime = true;
 continue;
 }
+if (kvm_msr_list->indices[i] == HV_X64_MSR_SCONTROL) {
+has_msr_hv_synic = true;
+continue;
+}
 }
 }
 
@@ -1517,6 +1538,23 @@ stat

[PULL 15/45] target-i386/hyperv: Hyper-V SynIC SINT routing and vcpu exit

2015-12-17 Thread Paolo Bonzini
From: Andrey Smetanin <asmeta...@virtuozzo.com>

Hyper-V SynIC(synthetic interrupt controller) helpers for
Hyper-V SynIC irq routing setup, irq injection, irq ack
notifications event/message pages changes tracking for future use.

Signed-off-by: Andrey Smetanin <asmeta...@virtuozzo.com>
Reviewed-by: Roman Kagan <rka...@virtuozzo.com>
Signed-off-by: Denis V. Lunev <d...@openvz.org>
CC: Paolo Bonzini <pbonz...@redhat.com>
CC: Richard Henderson <r...@twiddle.net>
CC: Eduardo Habkost <ehabk...@redhat.com>
CC: "Andreas Färber" <afaer...@suse.de>
CC: Marcelo Tosatti <mtosa...@redhat.com>
CC: Roman Kagan <rka...@virtuozzo.com>
CC: Denis V. Lunev <d...@openvz.org>
CC: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
---
 target-i386/Makefile.objs |   2 +-
 target-i386/hyperv.c  | 127 ++
 target-i386/hyperv.h  |  42 +++
 target-i386/kvm.c |   6 +++
 4 files changed, 176 insertions(+), 1 deletion(-)
 create mode 100644 target-i386/hyperv.c
 create mode 100644 target-i386/hyperv.h

diff --git a/target-i386/Makefile.objs b/target-i386/Makefile.objs
index 437d997..2255f46 100644
--- a/target-i386/Makefile.objs
+++ b/target-i386/Makefile.objs
@@ -3,5 +3,5 @@ obj-y += excp_helper.o fpu_helper.o cc_helper.o int_helper.o 
svm_helper.o
 obj-y += smm_helper.o misc_helper.o mem_helper.o seg_helper.o
 obj-y += gdbstub.o
 obj-$(CONFIG_SOFTMMU) += machine.o arch_memory_mapping.o arch_dump.o monitor.o
-obj-$(CONFIG_KVM) += kvm.o
+obj-$(CONFIG_KVM) += kvm.o hyperv.o
 obj-$(call lnot,$(CONFIG_KVM)) += kvm-stub.o
diff --git a/target-i386/hyperv.c b/target-i386/hyperv.c
new file mode 100644
index 000..e79b173
--- /dev/null
+++ b/target-i386/hyperv.c
@@ -0,0 +1,127 @@
+/*
+ * QEMU KVM Hyper-V support
+ *
+ * Copyright (C) 2015 Andrey Smetanin <asmeta...@virtuozzo.com>
+ *
+ * Authors:
+ *  Andrey Smetanin <asmeta...@virtuozzo.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "hyperv.h"
+#include "standard-headers/asm-x86/hyperv.h"
+
+int kvm_hv_handle_exit(X86CPU *cpu, struct kvm_hyperv_exit *exit)
+{
+CPUX86State *env = >env;
+
+switch (exit->type) {
+case KVM_EXIT_HYPERV_SYNIC:
+if (!cpu->hyperv_synic) {
+return -1;
+}
+
+/*
+ * For now just track changes in SynIC control and msg/evt pages msr's.
+ * When SynIC messaging/events processing will be added in future
+ * here we will do messages queues flushing and pages remapping.
+ */
+switch (exit->u.synic.msr) {
+case HV_X64_MSR_SCONTROL:
+env->msr_hv_synic_control = exit->u.synic.control;
+break;
+case HV_X64_MSR_SIMP:
+env->msr_hv_synic_msg_page = exit->u.synic.msg_page;
+break;
+case HV_X64_MSR_SIEFP:
+env->msr_hv_synic_evt_page = exit->u.synic.evt_page;
+break;
+default:
+return -1;
+}
+return 0;
+default:
+return -1;
+}
+}
+
+static void kvm_hv_sint_ack_handler(EventNotifier *notifier)
+{
+HvSintRoute *sint_route = container_of(notifier, HvSintRoute,
+   sint_ack_notifier);
+event_notifier_test_and_clear(notifier);
+if (sint_route->sint_ack_clb) {
+sint_route->sint_ack_clb(sint_route);
+}
+}
+
+HvSintRoute *kvm_hv_sint_route_create(uint32_t vcpu_id, uint32_t sint,
+  HvSintAckClb sint_ack_clb)
+{
+HvSintRoute *sint_route;
+int r, gsi;
+
+sint_route = g_malloc0(sizeof(*sint_route));
+r = event_notifier_init(_route->sint_set_notifier, false);
+if (r) {
+goto err;
+}
+
+r = event_notifier_init(_route->sint_ack_notifier, false);
+if (r) {
+goto err_sint_set_notifier;
+}
+
+event_notifier_set_handler(_route->sint_ack_notifier,
+   kvm_hv_sint_ack_handler);
+
+gsi = kvm_irqchip_add_hv_sint_route(kvm_state, vcpu_id, sint);
+if (gsi < 0) {
+goto err_gsi;
+}
+
+r = kvm_irqchip_add_irqfd_notifier_gsi(kvm_state,
+   _route->sint_set_notifier,
+   _route->sint_ack_notifier, 
gsi);
+if (r) {
+goto err_irqfd;
+}
+sint_route->gsi = gsi;
+sint_route->sint_ack_clb = sint_ack_clb;
+sint_route->vcpu_id = vcpu_id;
+sint_route->sint = sint;
+
+return sint_route;
+
+err_irqfd:
+kvm_irqchip_release_virq(kvm_state, gsi);
+err_gsi:
+event_notifier_set_handler(_route->sint_ack_notifier, NULL);
+event_notifier_cleanup(_route->sint_ack_notifier);
+err_

[PULL 14/45] kvm: Hyper-V SynIC irq routing support

2015-12-17 Thread Paolo Bonzini
From: Andrey Smetanin <asmeta...@virtuozzo.com>

Signed-off-by: Andrey Smetanin <asmeta...@virtuozzo.com>
Reviewed-by: Roman Kagan <rka...@virtuozzo.com>
Signed-off-by: Denis V. Lunev <d...@openvz.org>
CC: Paolo Bonzini <pbonz...@redhat.com>
CC: Richard Henderson <r...@twiddle.net>
CC: Eduardo Habkost <ehabk...@redhat.com>
CC: "Andreas Färber" <afaer...@suse.de>
CC: Marcelo Tosatti <mtosa...@redhat.com>
CC: Roman Kagan <rka...@virtuozzo.com>
CC: Denis V. Lunev <d...@openvz.org>
CC: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
---
 include/sysemu/kvm.h |  1 +
 kvm-all.c| 33 +
 2 files changed, 34 insertions(+)

diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index b31f325..9a569f1 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -455,6 +455,7 @@ int kvm_irqchip_update_msi_route(KVMState *s, int virq, 
MSIMessage msg,
 void kvm_irqchip_release_virq(KVMState *s, int virq);
 
 int kvm_irqchip_add_adapter_route(KVMState *s, AdapterInfo *adapter);
+int kvm_irqchip_add_hv_sint_route(KVMState *s, uint32_t vcpu, uint32_t sint);
 
 int kvm_irqchip_add_irqfd_notifier_gsi(KVMState *s, EventNotifier *n,
EventNotifier *rn, int virq);
diff --git a/kvm-all.c b/kvm-all.c
index ed707fe..e78a378 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1300,6 +1300,34 @@ int kvm_irqchip_add_adapter_route(KVMState *s, 
AdapterInfo *adapter)
 return virq;
 }
 
+int kvm_irqchip_add_hv_sint_route(KVMState *s, uint32_t vcpu, uint32_t sint)
+{
+struct kvm_irq_routing_entry kroute = {};
+int virq;
+
+if (!kvm_gsi_routing_enabled()) {
+return -ENOSYS;
+}
+if (!kvm_check_extension(s, KVM_CAP_HYPERV_SYNIC)) {
+return -ENOSYS;
+}
+virq = kvm_irqchip_get_virq(s);
+if (virq < 0) {
+return virq;
+}
+
+kroute.gsi = virq;
+kroute.type = KVM_IRQ_ROUTING_HV_SINT;
+kroute.flags = 0;
+kroute.u.hv_sint.vcpu = vcpu;
+kroute.u.hv_sint.sint = sint;
+
+kvm_add_routing_entry(s, );
+kvm_irqchip_commit_routes(s);
+
+return virq;
+}
+
 #else /* !KVM_CAP_IRQ_ROUTING */
 
 void kvm_init_irq_routing(KVMState *s)
@@ -1325,6 +1353,11 @@ int kvm_irqchip_add_adapter_route(KVMState *s, 
AdapterInfo *adapter)
 return -ENOSYS;
 }
 
+int kvm_irqchip_add_hv_sint_route(KVMState *s, uint32_t vcpu, uint32_t sint)
+{
+return -ENOSYS;
+}
+
 static int kvm_irqchip_assign_irqfd(KVMState *s, int fd, int virq, bool assign)
 {
 abort();
-- 
2.5.0


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/5] Threaded MSI interrupt for VFIO PCI device

2015-12-16 Thread Paolo Bonzini
Alex,

can you take a look at the extension to the irq bypass interface in
patch 2?  I'm not sure I understand what is the case where you have
multiple consumers for the same token.

Paolo

On 03/12/2015 19:22, Yunhong Jiang wrote:
> When assigning a VFIO device to a KVM guest with low latency requirement, it  
> is better to handle the interrupt in the hard interrupt context, to reduce 
> the context switch to/from the IRQ thread.
> 
> Based on discussion on https://lkml.org/lkml/2015/10/26/764, the VFIO msi 
> interrupt is changed to use request_threaded_irq(). The primary interrupt 
> handler tries to set the guest interrupt atomically. If it fails to achieve 
> it, a threaded interrupt handler will be invoked.
> 
> The irq_bypass manager is extended for this purpose. The KVM eventfd will 
> provide a irqbypass consumer to handle the interrupt at hard interrupt 
> context. The producer will invoke the consumer's handler then.
> 
> Yunhong Jiang (5):
>   Extract the irqfd_wakeup_pollin/irqfd_wakeup_pollup
>   Support runtime irq_bypass consumer
>   Support threaded interrupt handling on VFIO
>   Add the irq handling consumer
>   Expose x86 kvm_arch_set_irq_inatomic()
> 
>  arch/x86/kvm/Kconfig  |   1 +
>  drivers/vfio/pci/vfio_pci_intrs.c |  39 ++--
>  include/linux/irqbypass.h |   8 +++
>  include/linux/kvm_host.h  |  19 +-
>  include/linux/kvm_irqfd.h |   1 +
>  virt/kvm/Kconfig  |   3 +
>  virt/kvm/eventfd.c| 131 
> ++
>  virt/lib/irqbypass.c  |  82 ++--
>  8 files changed, 214 insertions(+), 70 deletions(-)
> 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [GIT PULL 0/4] KVM: s390 features and fixes for 4.5 (kvm/next)

2015-12-16 Thread Paolo Bonzini


On 15/12/2015 20:23, Christian Borntraeger wrote:
> Paolo,
> 
> here is the 2nd part of the s390 queue for 4.5
> 
> The following changes since commit 460146348518a1c4e810d01baf81847f8c6a1c73:
> 
>   Merge tag 'kvm-s390-next-4.5-1' of 
> git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD 
> (2015-12-02 13:50:04 +0100)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux.git  
> tags/kvm-s390-next-4.5-2
> 
> for you to fetch changes up to 32e6b236d26946eb076d1450bfb8f9978f15d6b9:
> 
>   KVM: s390: consider system MHA for guest storage (2015-12-15 17:08:22 +0100)
> 
> 
> KVM: s390 features and fixes for 4.5 (kvm/next)
> 
> Some small cleanups
> - use assignment instead of memcpy
> - use %pK for kernel pointers
> 
> Changes regarding guest memory size
> - Fix an off-by-one error in our guest memory interface (we might
> use unnecessarily big page tables, e.g. 3 levels for a 2GB guest
> instead of 2 levels)
> - We now ask the machine about the max. supported guest address
>   and limit accordingly.
> 
> 
> Christian Borntraeger (2):
>   KVM: s390: use assignment instead of memcpy
>   KVM: s390: obey kptr_restrict in traces
> 
> Dominik Dingel (1):
>   KVM: s390: fix mismatch between user and in-kernel guest limit
> 
> Guenther Hutzl (1):
>   KVM: s390: consider system MHA for guest storage
> 
>  Documentation/virtual/kvm/devices/vm.txt |  3 ++-
>  arch/s390/include/asm/kvm_host.h |  1 +
>  arch/s390/include/uapi/asm/kvm.h |  2 ++
>  arch/s390/kvm/kvm-s390.c | 44 
> 
>  arch/s390/kvm/trace-s390.h   |  6 ++---
>  arch/s390/mm/pgtable.c   |  4 +--
>  drivers/s390/char/sclp_early.c   |  8 +-
>  7 files changed, 50 insertions(+), 18 deletions(-)
> 

Pulled (but not pushed yet), thanks.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: KVM with PCI forwarding really slow after 4.1

2015-12-16 Thread Paolo Bonzini


On 16/12/2015 18:55, Michael Büsch wrote:
>>> On 01/12/2015 18:09, Michael Büsch wrote:
> I use "-device pci-assign,host=00:1a.0" to forward a USB
> host chip to a Win7 32 bit inside of qemu/kvm. That used to
> work pretty well, but it broke horribly somewhere after
> 4.1. With recent kernels the virtual machine boots, but is
> _very_ slow. It takes hours to boot. If PCI forwarding is
> disabled, everything is fine.
>>> 
>>> This has been reported already, I'm going to look at it this
>>> week.
> 
> Are there any news regarding this issue?

It's being discussed in Bugzilla
(https://bugzilla.kernel.org/show_bug.cgi?id=107561).

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Question] Switching VCPU CPL from the hypervisor ?

2015-12-16 Thread Paolo Bonzini


On 15/12/2015 18:02, Hebbal Yacine wrote:
> What I want to do is: when a controlled process is in user mode, i
> change its cpl to 0, force it to execute a code that is injected in the
> VM, set back its cpl to 3 and let it run like if nothing happened

Could you inject an SMI and place your code in the guest firmware's SMM
handler?  What input is needed by this CPL=0 code?

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1] kvm/x86: Remove Hyper-V SynIC timer stopping

2015-12-16 Thread Paolo Bonzini


On 14/12/2015 18:01, Andrey Smetanin wrote:
> hostguest
> start periodic stimer
> start periodic timer
> timer expires after 15ms
> send expiration message into guest
> restart periodic timer
> doing something
> timer expires again after 15 ms
> msg slot is still not cleared so
> setup ->msg_pending
> restart periodic timer
> doing something
> process timer msg and clear slot
> so ->msg_pending was set:
> send EOM into host
> received EOM
> queued call of kvm_hv_process_stimers()
> by KVM_REQ_HV_STIMER
> 
> kvm_hv_process_stimers():
> ...
> stimer_stop()
> if (time_now >= stimer->exp_time)
> stimer_expiration(stimer);
> But time_now  < stimer->exp_time, so stimer_expiration is not called
> in this case and timer is not restarted. so guest lose timer.

Great, this explains it.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/5] Threaded MSI interrupt for VFIO PCI device

2015-12-16 Thread Paolo Bonzini


On 16/12/2015 20:15, Alex Williamson wrote:
> The consumers would be, for instance, Intel PI + the threaded handler
> added in this series.  These run independently, the PI bypass simply
> makes the interrupt disappear from the host when it catches it, but if
> the vCPU isn't running in the right place at the time of the interrupt,
> it gets delivered to the host, in which case the secondary consumer
> implementing handle_irq() provides a lower latency injection than the
> eventfd path.  If PI isn't supported, only this latter consumer is
> registered.

I would implement the two in a single consumer, knowing that only one of
the two parts would effectively run.  But because of the possibility of
multiple consumers implementing handle_irq(), I am not sure if this is
feasible.

> On the surface it seems like a reasonable solution, though having
> multiple consumers implementing handle_irq() seems problematic.  Do we
> get multiple injections if we call them all?

Indeed.

> Should we have some way
> to prioritize one handler versus another?  Perhaps KVM should have a
> single unified consumer that can provide that sort of logic, though we
> still need the srcu code added here to protect against registration and
> irq_handler() races.  Thanks,

I'm happy to see that we have the same doubts. :)

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/5] target-i386: kvm: Increase MSR entry array limits, check for array overrun

2015-12-16 Thread Paolo Bonzini


On 16/12/2015 20:06, Eduardo Habkost wrote:
> We are dangerously close to the array limits in kvm_put_msrs()
> and kvm_get_msrs(): with the default mcg_cap configuration, we
> can set up to 148 MSRs in kvm_put_msrs(), and if we allow mcg_cap
> to be changed, we can write up to 236 MSRs[1].
> 
> This series changes the code to allocate a buffer once per VCPU,
> increase buffer size to 4096 bytes (that can hold up to 255 MSR
> entries), and check array limits before appending new entries.

Thanks, it's a good improvement.

Reviewed-by: Paolo Bonzini <pbonz...@redhat.com>

> [1] I have checked the limits by copying and pasting the
> kvm_put_msrs() code to a new file, replacing the "if" lines,
> copying the macro definitions, and adding a helper macro to
> keep track of the kvm_msr_entry_set() calls. The code can be
> seen at:
> https://gist.github.com/ehabkost/08d4177a33b8648a71ef
> 
> Eduardo Habkost (5):
>   target-i386: kvm: Allocate kvm_msrs struct once per VCPU
>   target-i386: kvm: Increase MSR_BUF_SIZE
>   target-i386: kvm: Simplify MSR array construction
>   target-i386: kvm: Simplify MSR setting functions
>   target-i386: kvm: Eliminate kvm_msr_entry_set()
> 
>  target-i386/cpu-qom.h |   4 +
>  target-i386/kvm.c | 322 
> +++---
>  2 files changed, 149 insertions(+), 177 deletions(-)
> 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH kvm-unit-tests 1/4] lib: add setjmp header and x86 implementation

2015-12-15 Thread Paolo Bonzini


On 15/12/2015 17:43, Andrew Jones wrote:
> How about making this a "real" test, i.e.
> 
> report("longjmp", i == 10);
> return report_summary();
> 
> I have patches that allow adding timeouts to tests, that I've been
> thinking about posting upstream. With those we could add a short
> timeout to this one, allowing us to get the FAIL when we loop
> forever, as well as when we never loop.

Good idea.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Question] Switching VCPU CPL from the hypervisor ?

2015-12-15 Thread Paolo Bonzini


On 15/12/2015 17:20, Yacine HEBBAL wrote:
> Hi,
> I working on an application in which I control an arbitrary process to
> execute an a given code (injected code for example). I want the process I'm
> controlling to execute my code with root privilege. Is it possible to
> arbitrary switch vcpu cpl to 0 from the hypervisor level (process is in user
> mode) ? I'm aware that I can do this using some hacks and emulation or by
> controlling the process just after it enters or just before it quits kernel
> mode (but I need to wait and intercept these events). Is there a
> straightforward technique to switch vcpu cpl from the hypervisor level at
> demand ?

Would a hypercall do?  VMCALL can be executed from CPL 3.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kvmclock doesn't work, help?

2015-12-15 Thread Paolo Bonzini


On 14/12/2015 23:31, Andy Lutomirski wrote:
> > RAW TSC NTP corrected TSC
> > t0  10  10
> > t1  20  19.99
> > t2  30  29.98
> > t3  40  39.97
> > t4  50  49.96
> >
> > ...
> >
> > if you suddenly switch from RAW TSC to NTP corrected TSC,
> > you can see what will happen.
>
> Sure, but why would you ever switch from one to the other?

The guest uses the raw TSC and systemtime = 0 until suspend.  After
resume, the TSC certainly increases at the same rate as before, but the
raw TSC restarted counting from 0 and systemtime has increased slower
than the guest kvmclock.

Paolo

> The only things that need to be monotonic are the output from
> vread_pvclock and the in-kernel equivalent, I think.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH kvm-unit-tests 0/6] Improve the output of test runners

2015-12-15 Thread Paolo Bonzini


On 14/12/2015 22:24, Radim Krčmář wrote:
> This series is a mix of patches that change the output of run_tests.sh
> and x86-run.  The output of ./run_tests.sh now looks like this:

I like the idea, thanks!  I agree with Andrew about pretty much
everything, except that I like having the summary close to PASS/FAIL.

>> qemu-kvm: Property '.hv-synic' not found
>> skip hyperv_synic (failed $(echo quit | $qemu -enable-kvm -cpu 
>> kvm64,hv_synic -device hyperv-testdev -monitor stdio > /dev/null))

Here I'd just print "failed check".

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests 1/4] lib: add setjmp header and x86 implementation

2015-12-15 Thread Paolo Bonzini
Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
---
 config/config-i386.mak   |  2 ++
 config/config-x86-common.mak |  4 +++-
 config/config-x86_64.mak |  2 ++
 lib/setjmp.h | 12 
 lib/x86/setjmp32.S   | 25 +
 lib/x86/setjmp64.S   | 27 +++
 x86/setjmp.c | 19 +++
 7 files changed, 90 insertions(+), 1 deletion(-)
 create mode 100644 lib/setjmp.h
 create mode 100644 lib/x86/setjmp32.S
 create mode 100644 lib/x86/setjmp64.S
 create mode 100644 x86/setjmp.c

diff --git a/config/config-i386.mak b/config/config-i386.mak
index 691381c..e353387 100644
--- a/config/config-i386.mak
+++ b/config/config-i386.mak
@@ -3,6 +3,8 @@ bits = 32
 ldarch = elf32-i386
 CFLAGS += -I $(KERNELDIR)/include
 
+cflatobjs += lib/x86/setjmp32.o
+
 tests = $(TEST_DIR)/taskswitch.flat $(TEST_DIR)/taskswitch2.flat \
$(TEST_DIR)/cmpxchg8b.flat
 
diff --git a/config/config-x86-common.mak b/config/config-x86-common.mak
index f64874d..2bb2f46 100644
--- a/config/config-x86-common.mak
+++ b/config/config-x86-common.mak
@@ -34,7 +34,7 @@ tests-common = $(TEST_DIR)/vmexit.flat $(TEST_DIR)/tsc.flat \
$(TEST_DIR)/realmode.flat $(TEST_DIR)/msr.flat \
$(TEST_DIR)/hypercall.flat $(TEST_DIR)/sieve.flat \
$(TEST_DIR)/kvmclock_test.flat  $(TEST_DIR)/eventinj.flat \
-   $(TEST_DIR)/s3.flat $(TEST_DIR)/pmu.flat \
+   $(TEST_DIR)/s3.flat $(TEST_DIR)/pmu.flat 
$(TEST_DIR)/setjmp.flat \
$(TEST_DIR)/tsc_adjust.flat $(TEST_DIR)/asyncpf.flat \
$(TEST_DIR)/init.flat $(TEST_DIR)/smap.flat \
$(TEST_DIR)/hyperv_synic.flat
@@ -115,6 +115,8 @@ $(TEST_DIR)/memory.elf: $(cstart.o) $(TEST_DIR)/memory.o
 
 $(TEST_DIR)/hyperv_synic.elf: $(cstart.o) $(TEST_DIR)/hyperv_synic.o
 
+$(TEST_DIR)/setjmp.elf: $(cstart.o) $(TEST_DIR)/setjmp.o
+
 arch_clean:
$(RM) $(TEST_DIR)/*.o $(TEST_DIR)/*.flat $(TEST_DIR)/*.elf \
$(TEST_DIR)/.*.d lib/x86/.*.d
diff --git a/config/config-x86_64.mak b/config/config-x86_64.mak
index 1764701..d190be8 100644
--- a/config/config-x86_64.mak
+++ b/config/config-x86_64.mak
@@ -3,6 +3,8 @@ bits = 64
 ldarch = elf64-x86-64
 CFLAGS += -mno-red-zone
 
+cflatobjs += lib/x86/setjmp64.o
+
 tests = $(TEST_DIR)/access.flat $(TEST_DIR)/apic.flat \
  $(TEST_DIR)/emulator.flat $(TEST_DIR)/idt_test.flat \
  $(TEST_DIR)/xsave.flat $(TEST_DIR)/rmap_chain.flat \
diff --git a/lib/setjmp.h b/lib/setjmp.h
new file mode 100644
index 000..334f466
--- /dev/null
+++ b/lib/setjmp.h
@@ -0,0 +1,12 @@
+#ifndef LIBCFLAT_SETJMP_H
+#define LIBCFLAT_SETJMP_H 1
+
+typedef struct jmp_buf_tag {
+   long int regs[8];
+} jmp_buf[1];
+
+extern int setjmp (struct jmp_buf_tag env[1]);
+extern void longjmp (struct jmp_buf_tag env[1], int val)
+ __attribute__ ((__noreturn__));
+
+#endif /* setjmp.h  */
diff --git a/lib/x86/setjmp32.S b/lib/x86/setjmp32.S
new file mode 100644
index 000..b0be7c2
--- /dev/null
+++ b/lib/x86/setjmp32.S
@@ -0,0 +1,25 @@
+.globl setjmp
+setjmp:
+   mov (%esp), %ecx// get return EIP
+   mov 4(%esp), %eax   // get jmp_buf
+   mov %ecx, (%eax)
+   mov %esp, 4(%eax)
+   mov %ebp, 8(%eax)
+   mov %ebx, 12(%eax)
+   mov %esi, 16(%eax)
+   mov %edi, 20(%eax)
+   xor %eax, %eax
+   ret
+
+.globl longjmp
+longjmp:
+   mov 8(%esp), %eax   // get return value
+   mov 4(%esp), %ecx   // get jmp_buf
+   mov 20(%ecx), %edi
+   mov 16(%ecx), %esi
+   mov 12(%ecx), %ebx
+   mov 8(%ecx), %ebp
+   mov 4(%ecx), %esp
+   mov (%ecx), %ecx// get saved EIP
+   mov %ecx, (%esp)// and store it on the stack
+   ret
diff --git a/lib/x86/setjmp64.S b/lib/x86/setjmp64.S
new file mode 100644
index 000..c8ae790
--- /dev/null
+++ b/lib/x86/setjmp64.S
@@ -0,0 +1,27 @@
+.globl setjmp
+setjmp:
+   mov (%rsp), %rsi
+   mov %rsi, (%rdi)
+   mov %rsp, 0x8(%rdi)
+   mov %rbp, 0x10(%rdi)
+   mov %rbx, 0x18(%rdi)
+   mov %r12, 0x20(%rdi)
+   mov %r13, 0x28(%rdi)
+   mov %r14, 0x30(%rdi)
+   mov %r15, 0x38(%rdi)
+   xor %eax, %eax
+   ret
+
+.globl longjmp
+longjmp:
+   mov %esi, %eax
+   mov 0x38(%rdi), %r15
+   mov 0x30(%rdi), %r14
+   mov 0x28(%rdi), %r13
+   mov 0x20(%rdi), %r12
+   mov 0x18(%rdi), %rbx
+   mov 0x10(%rdi), %rbp
+   mov 0x8(%rdi), %rsp
+   mov (%rdi), %rsi
+   mov %rsi, (%rsp)
+   ret
diff --git a/x86/setjmp.c b/x86/setjmp.c
new file mode 100644
index 000..46f0d9c
--- /dev/null
+++ b/x86/setjmp.c
@@ -0,0 +1,19 @@
+#include "stdio.h"
+#include "setjmp.h"
+
+int main()
+{
+volatile int i;
+jmp_buf j;
+
+if (setjmp(j) == 0) {
+   i = 0;
+}
+printf("%d\n", i);
+if (++i < 10) {
+   longjmp(j, 1);
+   

[PATCH kvm-unit-tests 2/4] x86: replace set_exception_return with longjmp-based implementation

2015-12-15 Thread Paolo Bonzini
set_exception_return forces exceptions handlers to return to a specific
address instead of returning to the instruction address pushed by the
CPU at the time of the exception. The unit tests apic.c and vmx.c use
this functionality to recover from expected exceptions.

When using set_exception_return one would have to be careful not to modify
the stack (such as by doing a function call) as triggering the exception
will likely jump us past the instructions which undo the stack manipulation
(such as a ret).  This is unnecessarily brittle, and C already has a
mechanism to do non-local returns---setjmp.  Now that libcflat includes
an implementation of setjmp, replace set_exception_return with a wrapper
that takes care of restoring the processor flags as well.

Reported-by: David Matlack <dmatl...@google.com>
Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
---
 lib/x86/desc.c | 15 +++
 lib/x86/desc.h |  6 +-
 x86/apic.c |  8 
 x86/vmx.c  | 18 +-
 4 files changed, 29 insertions(+), 18 deletions(-)

diff --git a/lib/x86/desc.c b/lib/x86/desc.c
index 4760026..acf29e3 100644
--- a/lib/x86/desc.c
+++ b/lib/x86/desc.c
@@ -1,6 +1,7 @@
 #include "libcflat.h"
 #include "desc.h"
 #include "processor.h"
+#include 
 
 void set_idt_entry(int vec, void *addr, int dpl)
 {
@@ -315,12 +316,18 @@ void setup_alt_stack(void)
 #endif
 
 static bool exception;
-static void *exception_return;
+static jmp_buf *exception_jmpbuf;
+
+static void exception_handler_longjmp(void)
+{
+   longjmp(*exception_jmpbuf, 1);
+}
 
 static void exception_handler(struct ex_regs *regs)
 {
+   /* longjmp must happen after iret, so do not do it now.  */
exception = true;
-   regs->rip = (unsigned long)exception_return;
+   regs->rip = (unsigned long)_handler_longjmp;
 }
 
 bool test_for_exception(unsigned int ex, void (*trigger_func)(void *data),
@@ -333,7 +340,7 @@ bool test_for_exception(unsigned int ex, void 
(*trigger_func)(void *data),
return exception;
 }
 
-void set_exception_return(void *addr)
+void __set_exception_jmpbuf(jmp_buf *addr)
 {
-   exception_return = addr;
+   exception_jmpbuf = addr;
 }
diff --git a/lib/x86/desc.h b/lib/x86/desc.h
index 7733151..be52fd4 100644
--- a/lib/x86/desc.h
+++ b/lib/x86/desc.h
@@ -1,6 +1,8 @@
 #ifndef __IDT_TEST__
 #define __IDT_TEST__
 
+#include 
+
 void setup_idt(void);
 void setup_alt_stack(void);
 
@@ -155,6 +157,8 @@ void handle_exception(u8 v, void (*func)(struct ex_regs 
*regs));
 
 bool test_for_exception(unsigned int ex, void (*trigger_func)(void *data),
void *data);
-void set_exception_return(void *addr);
+void __set_exception_jmpbuf(jmp_buf *addr);
+#define set_exception_jmpbuf(jmpbuf) \
+   (setjmp(jmpbuf) ? : (__set_exception_jmpbuf(&(jmpbuf)), 0))
 
 #endif
diff --git a/x86/apic.c b/x86/apic.c
index d4eec52..2e04c82 100644
--- a/x86/apic.c
+++ b/x86/apic.c
@@ -63,10 +63,10 @@ static void test_tsc_deadline_timer(void)
 
 static void do_write_apicbase(void *data)
 {
-set_exception_return(&);
-wrmsr(MSR_IA32_APICBASE, *(u64 *)data);
-resume:
-barrier();
+jmp_buf jmpbuf;
+if (set_exception_jmpbuf(jmpbuf) == 0) {
+wrmsr(MSR_IA32_APICBASE, *(u64 *)data);
+}
 }
 
 void test_enable_x2apic(void)
diff --git a/x86/vmx.c b/x86/vmx.c
index 47593bd..ee9aca8 100644
--- a/x86/vmx.c
+++ b/x86/vmx.c
@@ -619,19 +619,19 @@ static void init_vmx(void)
 
 static void do_vmxon_off(void *data)
 {
-   set_exception_return(&);
-   vmx_on();
-   vmx_off();
-resume:
-   barrier();
+   jmp_buf jmpbuf;
+   if (set_exception_jmpbuf(jmpbuf) == 0) {
+   vmx_on();
+   vmx_off();
+   }
 }
 
 static void do_write_feature_control(void *data)
 {
-   set_exception_return(&);
-   wrmsr(MSR_IA32_FEATURE_CONTROL, 0);
-resume:
-   barrier();
+   jmp_buf jmpbuf;
+   if (set_exception_jmpbuf(jmpbuf) == 0) {
+   wrmsr(MSR_IA32_FEATURE_CONTROL, 0);
+   }
 }
 
 static int test_vmx_feature_control(void)
-- 
2.5.0


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests 4/4] x86: apic: cleanup

2015-12-15 Thread Paolo Bonzini
Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
---
 x86/apic.c | 26 +++---
 1 file changed, 11 insertions(+), 15 deletions(-)

diff --git a/x86/apic.c b/x86/apic.c
index de19724..dfaea35 100644
--- a/x86/apic.c
+++ b/x86/apic.c
@@ -66,7 +66,7 @@ static bool do_write_apicbase(u64 data)
 jmp_buf jmpbuf;
 int ret;
 if (set_exception_jmpbuf(GP_VECTOR, jmpbuf) == 0) {
-wrmsr(MSR_IA32_APICBASE, data);
+wrmsr(MSR_IA32_APICBASE, APIC_DEFAULT_PHYS_BASE | APIC_BSP | data);
 ret = 0;
 } else {
 ret = 1;
@@ -77,36 +77,32 @@ static bool do_write_apicbase(u64 data)
 
 void test_enable_x2apic(void)
 {
-u64 invalid_state = APIC_DEFAULT_PHYS_BASE | APIC_BSP | APIC_EXTD;
-u64 apic_enabled = APIC_DEFAULT_PHYS_BASE | APIC_BSP | APIC_EN;
-u64 x2apic_enabled =
-APIC_DEFAULT_PHYS_BASE | APIC_BSP | APIC_EN | APIC_EXTD;
-
 if (enable_x2apic()) {
 printf("x2apic enabled\n");
 
 report("x2apic enabled to invalid state",
-   do_write_apicbase(invalid_state));
+   do_write_apicbase(APIC_EXTD));
 report("x2apic enabled to apic enabled",
-   do_write_apicbase(apic_enabled));
+   do_write_apicbase(APIC_EN));
 
-wrmsr(MSR_IA32_APICBASE, APIC_DEFAULT_PHYS_BASE | APIC_BSP);
+report("x2apic enabled to disabled state",
+   !do_write_apicbase(0));
 report("disabled to invalid state",
-   do_write_apicbase(invalid_state));
+   do_write_apicbase(APIC_EXTD));
 report("disabled to x2apic enabled",
-   do_write_apicbase(x2apic_enabled));
+   do_write_apicbase(APIC_EN | APIC_EXTD));
 
-wrmsr(MSR_IA32_APICBASE, apic_enabled);
+wrmsr(MSR_IA32_APICBASE, APIC_EN);
 report("apic enabled to invalid state",
-   do_write_apicbase(invalid_state));
+   do_write_apicbase(APIC_EXTD));
 
-wrmsr(MSR_IA32_APICBASE, x2apic_enabled);
+wrmsr(MSR_IA32_APICBASE, APIC_EN | APIC_EXTD);
 apic_write(APIC_SPIV, 0x1ff);
 } else {
 printf("x2apic not detected\n");
 
 report("enable unsupported x2apic",
-   do_write_apicbase(x2apic_enabled));
+   do_write_apicbase(APIC_EN | APIC_EXTD));
 }
 }
 
-- 
2.5.0

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests 0/3] use setjmp/longjmp to catch exceptions

2015-12-15 Thread Paolo Bonzini
This is an attempt to fix David's reported problem with set_exception_return
and make it more robust.

Patch 1 introduces setjmp; patches 2 and 3 replace test_for_exception
and set_exception_return with setjmp/longjmp.  Patch 4 provides further
cleanups.

Paolo

Paolo Bonzini (4):
  lib: add setjmp header and x86 implementation
  x86: replace set_exception_return with longjmp-based implementation
  x86: remove test_for_exception
  x86: apic: cleanup

 config/config-i386.mak   |  2 ++
 config/config-x86-common.mak |  4 +++-
 config/config-x86_64.mak |  2 ++
 lib/setjmp.h | 12 +++
 lib/x86/desc.c   | 24 +
 lib/x86/desc.h   |  8 ---
 lib/x86/setjmp32.S   | 25 ++
 lib/x86/setjmp64.S   | 27 
 x86/apic.c   | 50 
 x86/setjmp.c | 19 +
 x86/vmx.c| 43 ++---
 11 files changed, 154 insertions(+), 62 deletions(-)
 create mode 100644 lib/setjmp.h
 create mode 100644 lib/x86/setjmp32.S
 create mode 100644 lib/x86/setjmp64.S
 create mode 100644 x86/setjmp.c

-- 
2.5.0

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests 3/4] x86: remove test_for_exception

2015-12-15 Thread Paolo Bonzini
Test functions know whether an exception was generated simply by checking
the last value returned by set_exception_jmpbuf.  The exception number is
passed to set_exception_jmpbuf so that it can set up the exception handler.

Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
---
 lib/x86/desc.c | 13 +
 lib/x86/desc.h |  8 +++-
 x86/apic.c | 34 +-
 x86/vmx.c  | 29 +++--
 4 files changed, 40 insertions(+), 44 deletions(-)

diff --git a/lib/x86/desc.c b/lib/x86/desc.c
index acf29e3..acf66b6 100644
--- a/lib/x86/desc.c
+++ b/lib/x86/desc.c
@@ -315,7 +315,6 @@ void setup_alt_stack(void)
 }
 #endif
 
-static bool exception;
 static jmp_buf *exception_jmpbuf;
 
 static void exception_handler_longjmp(void)
@@ -326,21 +325,11 @@ static void exception_handler_longjmp(void)
 static void exception_handler(struct ex_regs *regs)
 {
/* longjmp must happen after iret, so do not do it now.  */
-   exception = true;
regs->rip = (unsigned long)_handler_longjmp;
 }
 
-bool test_for_exception(unsigned int ex, void (*trigger_func)(void *data),
-   void *data)
+void __set_exception_jmpbuf(unsigned int ex, jmp_buf *addr)
 {
handle_exception(ex, exception_handler);
-   exception = false;
-   trigger_func(data);
-   handle_exception(ex, NULL);
-   return exception;
-}
-
-void __set_exception_jmpbuf(jmp_buf *addr)
-{
exception_jmpbuf = addr;
 }
diff --git a/lib/x86/desc.h b/lib/x86/desc.h
index be52fd4..fceabd8 100644
--- a/lib/x86/desc.h
+++ b/lib/x86/desc.h
@@ -155,10 +155,8 @@ void set_intr_alt_stack(int e, void *fn);
 void print_current_tss_info(void);
 void handle_exception(u8 v, void (*func)(struct ex_regs *regs));
 
-bool test_for_exception(unsigned int ex, void (*trigger_func)(void *data),
-   void *data);
-void __set_exception_jmpbuf(jmp_buf *addr);
-#define set_exception_jmpbuf(jmpbuf) \
-   (setjmp(jmpbuf) ? : (__set_exception_jmpbuf(&(jmpbuf)), 0))
+void __set_exception_jmpbuf(unsigned int ex, jmp_buf *addr);
+#define set_exception_jmpbuf(ex, jmpbuf) \
+   (setjmp(jmpbuf) ? : (__set_exception_jmpbuf((ex), &(jmpbuf)), 0))
 
 #endif
diff --git a/x86/apic.c b/x86/apic.c
index 2e04c82..de19724 100644
--- a/x86/apic.c
+++ b/x86/apic.c
@@ -61,12 +61,18 @@ static void test_tsc_deadline_timer(void)
 }
 }
 
-static void do_write_apicbase(void *data)
+static bool do_write_apicbase(u64 data)
 {
 jmp_buf jmpbuf;
-if (set_exception_jmpbuf(jmpbuf) == 0) {
-wrmsr(MSR_IA32_APICBASE, *(u64 *)data);
+int ret;
+if (set_exception_jmpbuf(GP_VECTOR, jmpbuf) == 0) {
+wrmsr(MSR_IA32_APICBASE, data);
+ret = 0;
+} else {
+ret = 1;
 }
+handle_exception(GP_VECTOR, NULL);
+return ret;
 }
 
 void test_enable_x2apic(void)
@@ -80,24 +86,19 @@ void test_enable_x2apic(void)
 printf("x2apic enabled\n");
 
 report("x2apic enabled to invalid state",
-   test_for_exception(GP_VECTOR, do_write_apicbase,
-  _state));
+   do_write_apicbase(invalid_state));
 report("x2apic enabled to apic enabled",
-   test_for_exception(GP_VECTOR, do_write_apicbase,
-  _enabled));
+   do_write_apicbase(apic_enabled));
 
 wrmsr(MSR_IA32_APICBASE, APIC_DEFAULT_PHYS_BASE | APIC_BSP);
 report("disabled to invalid state",
-   test_for_exception(GP_VECTOR, do_write_apicbase,
-  _state));
+   do_write_apicbase(invalid_state));
 report("disabled to x2apic enabled",
-   test_for_exception(GP_VECTOR, do_write_apicbase,
-  _enabled));
+   do_write_apicbase(x2apic_enabled));
 
 wrmsr(MSR_IA32_APICBASE, apic_enabled);
 report("apic enabled to invalid state",
-   test_for_exception(GP_VECTOR, do_write_apicbase,
-  _state));
+   do_write_apicbase(invalid_state));
 
 wrmsr(MSR_IA32_APICBASE, x2apic_enabled);
 apic_write(APIC_SPIV, 0x1ff);
@@ -105,8 +106,7 @@ void test_enable_x2apic(void)
 printf("x2apic not detected\n");
 
 report("enable unsupported x2apic",
-   test_for_exception(GP_VECTOR, do_write_apicbase,
-  _enabled));
+   do_write_apicbase(x2apic_enabled));
 }
 }
 
@@ -128,11 +128,11 @@ static void test_apicbase(void)
 
 value = orig_apicbase | (1UL << cpuid_maxphyaddr());
 report("reserved physaddr bits",
-   test_for_exception(GP_VECTOR, do_write_apicbase, ));
+   do_write_apicbase(value));
 
 value = orig_apicbase | 1;
 report("reserved low bits",
-  

Re: [PATCH] target-i386: check vcpu features before accessing MSR_TSC_AUX

2015-12-14 Thread Paolo Bonzini


On 14/12/2015 12:07, Haozhong Zhang wrote:
> This patch fix a bug that prevents VM rebooting on recent versions of
> KVM (from commit 9dbe6cf).
> 
> kvm_get_msrs() is called to save guest MSR_TSC_AUX and other MSRs across
> rebooting. It only checks whether KVM exposes MSR_TSC_AUX to userspace.
> However, if vcpu does not support rdtscp (e.g. kvm64), current KVM will
> fail the saving and thus all other MSRs following it will fail in
> kvm_get_msrs(). As a result, from KVM commit 9dbe6cf that exposes
> MSR_TSC_AUX, VM can not successfully reboot.
> 
> This patch fixes this bug by adding the missing rdtscp feature checks.

That commit is not in any released kernel.  It's better if we just check
msr_info->host_initiated in vmx_get_msr and vmx_set_msr. Can you prepare
a patch?

Thanks,

Paolo

> Signed-off-by: Haozhong Zhang 
> ---
>  target-i386/kvm.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/target-i386/kvm.c b/target-i386/kvm.c
> index 6dc9846..cc842c6 100644
> --- a/target-i386/kvm.c
> +++ b/target-i386/kvm.c
> @@ -1414,7 +1414,8 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
>  if (has_msr_hsave_pa) {
>  kvm_msr_entry_set([n++], MSR_VM_HSAVE_PA, env->vm_hsave);
>  }
> -if (has_msr_tsc_aux) {
> +if (has_msr_tsc_aux &&
> +(env->features[FEAT_8000_0001_EDX] & CPUID_EXT2_RDTSCP)) {
>  kvm_msr_entry_set([n++], MSR_TSC_AUX, env->tsc_aux);
>  }
>  if (has_msr_tsc_adjust) {
> @@ -1793,7 +1794,8 @@ static int kvm_get_msrs(X86CPU *cpu)
>  if (has_msr_hsave_pa) {
>  msrs[n++].index = MSR_VM_HSAVE_PA;
>  }
> -if (has_msr_tsc_aux) {
> +if (has_msr_tsc_aux &&
> +(env->features[FEAT_8000_0001_EDX] & CPUID_EXT2_RDTSCP)) {
>  msrs[n++].index = MSR_TSC_AUX;
>  }
>  if (has_msr_tsc_adjust) {
> 

This commit is not in any released kernel.  It's better if we just check
msr_info->host_initiated in vmx_get_msr and vmx_set_msr.  Can you
prepare a patch?

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1] kvm/x86: Remove Hyper-V SynIC timer stopping

2015-12-14 Thread Paolo Bonzini


On 14/12/2015 16:33, Andrey Smetanin wrote:
> It's possible that guest send us Hyper-V EOM at the middle
> of Hyper-V SynIC timer running, so we start processing of Hyper-V
> SynIC timers in vcpu context and stop the Hyper-V SynIC timer
> uncoditionally and lose time expiration which Windows 2012R2 guest
> expects.
> 
> The patch fixes such situation by not stopping Hyper-V SynIC timer
> at all, because it's safe to restart it without stop in vcpu context
> and timer callback always returns HRTIMER_NORESTART.

Can you summarize with a "picture" what is the bad race?

The patch seems safe, but I'd like to have a better understanding of
what goes wrong.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: MTRR: fix fixed MTRR segment look up

2015-12-14 Thread Paolo Bonzini


On 14/12/2015 15:39, Alexis D...t wrote:
> It fixes the slow-down of VM running with pci-passthrough, since some MTRR
> range changed from MTRR_TYPE_WRBACK to MTRR_TYPE_UNCACHABLE.
> 
> Fixes: fa61213746a ("KVM: MTRR: simplify kvm_mtrr_get_guest_memory_type")
> Bugzilla: (https://bugzilla.kernel.org/show_bug.cgi?id=107561)
> 
> Signed-off-by: Alexis Dambricourt 
> ---
>  arch/x86/kvm/mtrr.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/mtrr.c b/arch/x86/kvm/mtrr.c
> index 9e8bf13..adc54e1 100644
> --- a/arch/x86/kvm/mtrr.c
> +++ b/arch/x86/kvm/mtrr.c
> @@ -267,7 +267,7 @@ static int fixed_mtrr_addr_to_seg(u64 addr)
> 
> for (seg = 0; seg < seg_num; seg++) {
> mtrr_seg = _seg_table[seg];
> -   if (mtrr_seg->start >= addr && addr < mtrr_seg->end)
> +   if (mtrr_seg->start <= addr && addr < mtrr_seg->end)

So this if could never be true.

> return seg;
> }
> 
> --
> 

While that's embarrassing, :)  it would only apply to memory in the 640K-1M
range, while the logs in the BZ have

nov 22 17:06:49 Core-i7-5.lan kernel: vmx_get_mt_mask got the following: cpu=6, 
vcpu=0, gfn=67fe00, MMIO=0, cache=0

Note that this is a page above 4GB, which is why in the BZ I thought that
the culprit was MAXPHYADDR (and I still believe it is, at least for the
OVMF case---there are probably two bugs, and your patch fixes one).

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: MTRR: fix fixed MTRR segment look up

2015-12-14 Thread Paolo Bonzini


On 14/12/2015 17:06, Alexis D...t wrote:
> From what I see in my case, it was especially gfn from 0x0 to 0xA0
> which are deadly impacted by the wrong cache property.

Yes, indeed.  I'll send the patch to Linus this week; I've now posted to
the BZ a second patch for GFNs above 4GB.

Paolo

> 2015-12-14 16:36 GMT+01:00 Paolo Bonzini <pbonz...@redhat.com>:
>> >
>> >
>> > On 14/12/2015 15:39, Alexis D...t wrote:
>>> >> It fixes the slow-down of VM running with pci-passthrough, since some 
>>> >> MTRR
>>> >> range changed from MTRR_TYPE_WRBACK to MTRR_TYPE_UNCACHABLE.
>>> >>
>>> >> Fixes: fa61213746a ("KVM: MTRR: simplify kvm_mtrr_get_guest_memory_type")
>>> >> Bugzilla: (https://bugzilla.kernel.org/show_bug.cgi?id=107561)
>>> >>
>>> >> Signed-off-by: Alexis Dambricourt <alexis.dambrico...@gmail.com>
>>> >> ---
>>> >>  arch/x86/kvm/mtrr.c | 2 +-
>>> >>  1 file changed, 1 insertion(+), 1 deletion(-)
>>> >>
>>> >> diff --git a/arch/x86/kvm/mtrr.c b/arch/x86/kvm/mtrr.c
>>> >> index 9e8bf13..adc54e1 100644
>>> >> --- a/arch/x86/kvm/mtrr.c
>>> >> +++ b/arch/x86/kvm/mtrr.c
>>> >> @@ -267,7 +267,7 @@ static int fixed_mtrr_addr_to_seg(u64 addr)
>>> >>
>>> >> for (seg = 0; seg < seg_num; seg++) {
>>> >> mtrr_seg = _seg_table[seg];
>>> >> -   if (mtrr_seg->start >= addr && addr < mtrr_seg->end)
>>> >> +   if (mtrr_seg->start <= addr && addr < mtrr_seg->end)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kvmclock doesn't work, help?

2015-12-14 Thread Paolo Bonzini


On 11/12/2015 22:57, Andy Lutomirski wrote:
> I'm still not seeing the issue.
> 
> The formula is:
> 
> (((rdtsc - pvti->tsc_timestamp) * pvti->tsc_to_system_mul) >>
> pvti->tsc_shift) + pvti->system_time
> 
> Obviously, if you reset pvti->tsc_timestamp to the current tsc value
> after suspend/resume, you would also need to update system_time.
> 
> I don't see what this has to do with suspend/resume or with whether
> the effective scale factor is greater than or less than one.  The only
> suspend/resume interaction I can see is that, if the host allows the
> guest-observed TSC value to jump (which is arguably a bug, what that's
> not important here), it needs to update pvti before resuming the
> guest.

Which is not an issue, since freezing obviously gets all CPUs out of
guest mode.

Marcelo, can you provide an example with made-up values for tsc and pvti?

> Can you clarify concretely what goes wrong here?
> 
> (I'm also at a bit of a loss as to why this needs both system_time and
> tsc_timestamp.  They're redundant in the sense that you could set
> tsc_timestamp to zero and subtract (tsc_timestamp * tsc_to_system_mul) >>
> tsc_shift to system_time without changing the result of the
> calculation.)

You would have to ensure that all elements of pvti are rounded correctly
whenever the base TSC is updated.  Doable, but it does seem simpler to
keep subtract-TSC and add-nanoseconds separate.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] target-i386: check vcpu features before accessing MSR_TSC_AUX

2015-12-14 Thread Paolo Bonzini


On 14/12/2015 14:54, Haozhong Zhang wrote:
>> > That commit is not in any released kernel.
> Right, it's currently only in kvm next. But I assume it would finally come
> into a released kernel.

Yes, but until it is, it's easier (and better) to fix KVM instead of QEMU.

> > It's better if we just check
> > msr_info->host_initiated in vmx_get_msr and vmx_set_msr. Can you prepare
> > a patch?
>
> Yes, I'll send a KVM patch later. And then this QEMU patch is not
> needed any more.

Great, thanks.

paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm: x86: move tracepoints outside extended quiescent state

2015-12-11 Thread Paolo Bonzini


On 11/12/2015 11:22, Borislav Petkov wrote:
> On Thu, Dec 10, 2015 at 07:15:19PM +0100, Paolo Bonzini wrote:
>> Yeah, wait_lapic_expire also have to be moved before __kvm_guest_enter.
> 
> Yeah, v2 doesn't splat on the Intel box anymore but the AMD box still
> has, and it is a different problem. With the v2 applied, it still
> explodes, see below.

Yes, I didn't expect it to fix anything.  I just wanted to pinpoint it
to kvm-amd.

> And I'm willing to bet good money on that shadow pages fun.

You can disable it (well, make it take a few days to appear) with this:

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 484079efea5b..a9070e260c72 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -496,7 +496,7 @@ static struct kvm_memslots *kvm_alloc_memslots(void)
 * Init kvm generation close to the maximum to easily test the
 * code of handling generation number wrap-around.
 */
-   slots->generation = -150;
+   slots->generation = 0;
for (i = 0; i < KVM_MEM_SLOTS_NUM; i++)
slots->id_to_index[i] = slots->memslots[i].id = i;

but it would not be AMD-specific.

Anyway if this theory is true:

> [  959.466549] kernel tried to execute NX-protected page - exploit attempt? 
> (uid: 1000)
> 
> line basically says that we're pagefaulting when trying to fetch
> instructions, i.e., we're trying to execute something from a page, rIP
> points to 0x8800b9f9bdf0 and that is most likely a page belonging to
> kvm, which, however, is for some reason not executable (anymore?).

It would be a kvm hypervisor page, not a kvm guest page, hence unrelated
to the zapping thing.

Can you grab the kallsyms before making it crash?  I will get to it next
week.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [GIT PULL] Please pull my kvm-ppc-fixes branch

2015-12-11 Thread Paolo Bonzini


On 10/12/2015 04:12, Paul Mackerras wrote:
>   git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc.git 
> kvm-ppc-fixes

Pulled, thanks.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [GIT PULL] Please pull my kvm-ppc-fixes branch

2015-12-11 Thread Paolo Bonzini


On 10/12/2015 04:12, Paul Mackerras wrote:
>   git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc.git 
> kvm-ppc-fixes

Pulled, thanks.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm: x86: move tracepoints outside extended quiescent state

2015-12-11 Thread Paolo Bonzini


On 11/12/2015 12:41, Borislav Petkov wrote:
> On Fri, Dec 11, 2015 at 11:41:30AM +0100, Paolo Bonzini wrote:
>> It would be a kvm hypervisor page, not a kvm guest page, hence unrelated
>> to the zapping thing.
> 
> Ah right, guest pages should be userspace addresses, come to think of
> it.
> 
>> Can you grab the kallsyms before making it crash?
> 
> Attached. It was a different corruption this time, see below. This time
> we don't even have a page table, PGD is 0, rIP is 1. (Fun :-))

Hmm, you had:

- RIP=0 in the original report (start_this_handle)
- RIP=0 in the second (mutex_lock_nested in ext4)
- RIP=1 now

The more interesting one is the other one which doesn't have a small RIP,
because it has RIP that is slightly larger than the stack pointer, meaning
it's likely a frame pointer.  And this means in turn that the call trace
is correct, and the bug might have happened closer to the actual corruption.

[  959.548625] RIP: 0010:[]  [] 
0x8800b9f9bdf0
[  959.556338] RSP: 0018:8800b9f9bde0  EFLAGS: 00010206
[  959.618579] Stack:
[  959.620607]  a02d5e17 8800b7d48000 8800b9f9be08 
a02bdb1f
[  959.628104]   8800b9f9be98 a02bdc7b 
8804242a4400
[  959.635601]  0070 4000 81a3c1e0 
8800b7ca5e00
[  959.643114] Call Trace:
[  959.645599]  [] ? kvm_arch_vcpu_put+0x17/0x40 [kvm]
[  959.652081]  [] ? vcpu_put+0x1f/0x60 [kvm]
[  959.657782]  [] ? kvm_vcpu_ioctl+0x11b/0x6f0 [kvm]
[  959.664169]  [] ? do_vfs_ioctl+0x2e0/0x540
[  959.669855]  [] ? __fget_light+0x29/0x90
[  959.675364]  [] ? SyS_ioctl+0x4c/0x90
[  959.680618]  [] ? entry_SYSCALL_64_fastpath+0x16/0x6f

My wild guess is that RSP is getting corrupted, but I guess I'll have to try
to reproduce to figure out what happens.

The last thing I need from you (hopefully) is a Kconfig.  If you have some
time, it would be great to check if you can reproduce it with an older kernel
version---trying 4.4-rc1 and 4.3 would be great.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 01/11] qapi: Rename qjson.h to qobject-json.h

2015-12-11 Thread Paolo Bonzini
xer.o json-streamer.o json-parser.o
> diff --git a/qobject/qjson.c b/qobject/qobject-json.c
> similarity index 99%
> rename from qobject/qjson.c
> rename to qobject/qobject-json.c
> index 41d9d65..8fc65a4 100644
> --- a/qobject/qjson.c
> +++ b/qobject/qobject-json.c
> @@ -14,7 +14,7 @@
>  #include "qapi/qmp/json-lexer.h"
>  #include "qapi/qmp/json-parser.h"
>  #include "qapi/qmp/json-streamer.h"
> -#include "qapi/qmp/qjson.h"
> +#include "qapi/qmp/qobject-json.h"
>  #include "qapi/qmp/qint.h"
>  #include "qapi/qmp/qlist.h"
>  #include "qapi/qmp/qbool.h"
> diff --git a/target-s390x/kvm.c b/target-s390x/kvm.c
> index 75a0e5d..40dd729 100644
> --- a/target-s390x/kvm.c
> +++ b/target-s390x/kvm.c
> @@ -36,7 +36,7 @@
>  #include "hw/hw.h"
>  #include "cpu.h"
>  #include "sysemu/device_tree.h"
> -#include "qapi/qmp/qjson.h"
> +#include "qapi/qmp/qobject-json.h"
>  #include "exec/gdbstub.h"
>  #include "exec/address-spaces.h"
>  #include "trace.h"
> diff --git a/tests/.gitignore b/tests/.gitignore
> index 1e55722..6d3416d 100644
> --- a/tests/.gitignore
> +++ b/tests/.gitignore
> @@ -1,8 +1,8 @@
>  check-qdict
>  check-qfloat
>  check-qint
> -check-qjson
>  check-qlist
> +check-qobject-json
>  check-qstring
>  check-qom-interface
>  check-qom-proplist
> diff --git a/tests/Makefile b/tests/Makefile
> index 053c1ae..17d74e2 100644
> --- a/tests/Makefile
> +++ b/tests/Makefile
> @@ -16,8 +16,8 @@ check-unit-y += tests/check-qstring$(EXESUF)
>  gcov-files-check-qstring-y = qobject/qstring.c
>  check-unit-y += tests/check-qlist$(EXESUF)
>  gcov-files-check-qlist-y = qobject/qlist.c
> -check-unit-y += tests/check-qjson$(EXESUF)
> -gcov-files-check-qjson-y = qobject/qjson.c
> +check-unit-y += tests/check-qobject-json$(EXESUF)
> +gcov-files-check-qobject-json-y = qobject/qobject-json.c
>  check-unit-y += tests/test-qmp-output-visitor$(EXESUF)
>  gcov-files-test-qmp-output-visitor-y = qapi/qmp-output-visitor.c
>  check-unit-y += tests/test-qmp-input-visitor$(EXESUF)
> @@ -360,7 +360,7 @@ GENERATED_HEADERS += tests/test-qapi-types.h 
> tests/test-qapi-visit.h \
>   tests/test-qmp-introspect.h
> 
>  test-obj-y = tests/check-qint.o tests/check-qstring.o tests/check-qdict.o \
> - tests/check-qlist.o tests/check-qfloat.o tests/check-qjson.o \
> + tests/check-qlist.o tests/check-qfloat.o tests/check-qobject-json.o \
>   tests/test-coroutine.o tests/test-string-output-visitor.o \
>   tests/test-string-input-visitor.o tests/test-qmp-output-visitor.o \
>   tests/test-qmp-input-visitor.o tests/test-qmp-input-strict.o \
> @@ -387,7 +387,7 @@ tests/check-qstring$(EXESUF): tests/check-qstring.o 
> $(test-util-obj-y)
>  tests/check-qdict$(EXESUF): tests/check-qdict.o $(test-util-obj-y)
>  tests/check-qlist$(EXESUF): tests/check-qlist.o $(test-util-obj-y)
>  tests/check-qfloat$(EXESUF): tests/check-qfloat.o $(test-util-obj-y)
> -tests/check-qjson$(EXESUF): tests/check-qjson.o $(test-util-obj-y)
> +tests/check-qobject-json$(EXESUF): tests/check-qobject-json.o 
> $(test-util-obj-y)
>  tests/check-qom-interface$(EXESUF): tests/check-qom-interface.o 
> $(test-qom-obj-y)
>  tests/check-qom-proplist$(EXESUF): tests/check-qom-proplist.o 
> $(test-qom-obj-y)
>  tests/test-coroutine$(EXESUF): tests/test-coroutine.o $(test-block-obj-y)
> diff --git a/tests/check-qjson.c b/tests/check-qobject-json.c
> similarity index 99%
> rename from tests/check-qjson.c
> rename to tests/check-qobject-json.c
> index 61e9bfb..9c4e53a 100644
> --- a/tests/check-qjson.c
> +++ b/tests/check-qobject-json.c
> @@ -18,7 +18,7 @@
>  #include "qapi/qmp/qlist.h"
>  #include "qapi/qmp/qfloat.h"
>  #include "qapi/qmp/qbool.h"
> -#include "qapi/qmp/qjson.h"
> +#include "qapi/qmp/qobject-json.h"
> 
>  #include "qemu-common.h"
> 
> @@ -63,7 +63,7 @@ static void escaped_string(void)
> 
>  g_assert(obj != NULL);
>  g_assert(qobject_type(obj) == QTYPE_QSTRING);
> -
> +
>  str = qobject_to_qstring(obj);
>  g_assert_cmpstr(qstring_get_str(str), ==, test_cases[i].decoded);
> 
> @@ -98,7 +98,7 @@ static void simple_string(void)
> 
>  g_assert(obj != NULL);
>  g_assert(qobject_type(obj) == QTYPE_QSTRING);
> -
> +
>  str = qobject_to_qstring(obj);
>  g_assert(strcmp(qstring_get_str(str), test_cases[i].decoded) == 0);
> 
> @@ -106,7 +106,7 @@ static void simple_string(void)
>  g_assert(strcmp(qstring_get_str(str), test_c

Re: [PATCH 2/5] x86, vdso, pvclock: Simplify and speed up the vdso pvclock reader

2015-12-11 Thread Paolo Bonzini


On 11/12/2015 08:52, Ingo Molnar wrote:
> 
> * Paolo Bonzini <pbonz...@redhat.com> wrote:
> 
>>
>>
>> On 10/12/2015 00:12, Andy Lutomirski wrote:
>>> From: Andy Lutomirski <l...@amacapital.net>
>>>
>>> The pvclock vdso code was too abstracted to understand easily and
>>> excessively paranoid.  Simplify it for a huge speedup.
>>>
>>> This opens the door for additional simplifications, as the vdso no
>>> longer accesses the pvti for any vcpu other than vcpu 0.
>>>
>>> Before, vclock_gettime using kvm-clock took about 45ns on my machine.
>>> With this change, it takes 29ns, which is almost as fast as the pure TSC
>>> implementation.
>>>
>>> Signed-off-by: Andy Lutomirski <l...@amacapital.net>
>>> ---
>>>  arch/x86/entry/vdso/vclock_gettime.c | 81 
>>> 
>>>  1 file changed, 46 insertions(+), 35 deletions(-)
>>>
>>> diff --git a/arch/x86/entry/vdso/vclock_gettime.c 
>>> b/arch/x86/entry/vdso/vclock_gettime.c
>>> index ca94fa649251..c325ba1bdddf 100644
>>> --- a/arch/x86/entry/vdso/vclock_gettime.c
>>> +++ b/arch/x86/entry/vdso/vclock_gettime.c
>>> @@ -78,47 +78,58 @@ static notrace const struct pvclock_vsyscall_time_info 
>>> *get_pvti(int cpu)
>>>  
>>>  static notrace cycle_t vread_pvclock(int *mode)
>>>  {
>>> -   const struct pvclock_vsyscall_time_info *pvti;
>>> +   const struct pvclock_vcpu_time_info *pvti = _pvti(0)->pvti;
>>> cycle_t ret;
>>> -   u64 last;
>>> -   u32 version;
>>> -   u8 flags;
>>> -   unsigned cpu, cpu1;
>>> -
>>> +   u64 tsc, pvti_tsc;
>>> +   u64 last, delta, pvti_system_time;
>>> +   u32 version, pvti_tsc_to_system_mul, pvti_tsc_shift;
>>>  
>>> /*
>>> -* Note: hypervisor must guarantee that:
>>> -* 1. cpu ID number maps 1:1 to per-CPU pvclock time info.
>>> -* 2. that per-CPU pvclock time info is updated if the
>>> -*underlying CPU changes.
>>> -* 3. that version is increased whenever underlying CPU
>>> -*changes.
>>> +* Note: The kernel and hypervisor must guarantee that cpu ID
>>> +* number maps 1:1 to per-CPU pvclock time info.
>>> +*
>>> +* Because the hypervisor is entirely unaware of guest userspace
>>> +* preemption, it cannot guarantee that per-CPU pvclock time
>>> +* info is updated if the underlying CPU changes or that that
>>> +* version is increased whenever underlying CPU changes.
>>>  *
>>> +* On KVM, we are guaranteed that pvti updates for any vCPU are
>>> +* atomic as seen by *all* vCPUs.  This is an even stronger
>>> +* guarantee than we get with a normal seqlock.
>>> +*
>>> +* On Xen, we don't appear to have that guarantee, but Xen still
>>> +* supplies a valid seqlock using the version field.
>>> +
>>> +* We only do pvclock vdso timing at all if
>>> +* PVCLOCK_TSC_STABLE_BIT is set, and we interpret that bit to
>>> +* mean that all vCPUs have matching pvti and that the TSC is
>>> +* synced, so we can just look at vCPU 0's pvti.
>>>  */
>>> -   do {
>>> -   cpu = __getcpu() & VGETCPU_CPU_MASK;
>>> -   /* TODO: We can put vcpu id into higher bits of pvti.version.
>>> -* This will save a couple of cycles by getting rid of
>>> -* __getcpu() calls (Gleb).
>>> -*/
>>> -
>>> -   pvti = get_pvti(cpu);
>>> -
>>> -   version = __pvclock_read_cycles(>pvti, , );
>>> -
>>> -   /*
>>> -* Test we're still on the cpu as well as the version.
>>> -* We could have been migrated just after the first
>>> -* vgetcpu but before fetching the version, so we
>>> -* wouldn't notice a version change.
>>> -*/
>>> -   cpu1 = __getcpu() & VGETCPU_CPU_MASK;
>>> -   } while (unlikely(cpu != cpu1 ||
>>> - (pvti->pvti.version & 1) ||
>>> - pvti->pvti.version != version));
>>> -
>>> -   if (unlikely(!(flags & PVCLOCK_TSC_STABLE_BIT)))
>>> +
>>> +   if (unlikely(!(pvti->flags & PVCLOCK_TSC_STABLE_BIT))) {
>>> *mode = VCLOCK_NONE;
>>> + 

Re: [PATCH] kvm: x86: move tracepoints outside extended quiescent state

2015-12-10 Thread Paolo Bonzini


On 10/12/2015 19:09, Borislav Petkov wrote:
> On Thu, Dec 10, 2015 at 06:38:57PM +0100, Paolo Bonzini wrote:
>> Invoking tracepoints within kvm_guest_enter/kvm_guest_exit causes a
>> lockdep splat.
>>
>> Cc: sta...@vger.kernel.org
>> Reported-by: Borislav Petkov <b...@alien8.de>
>> Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
>> ---
>>  arch/x86/kvm/svm.c | 4 ++--
>>  arch/x86/kvm/vmx.c | 3 ++-
>>  arch/x86/kvm/x86.c | 2 +-
>>  3 files changed, 5 insertions(+), 4 deletions(-)
> 
> Looks like you missed some...

Yeah, wait_lapic_expire also have to be moved before __kvm_guest_enter.

Paolo

> [  144.296364] kvm: zapping shadow pages for mmio generation wraparound
> [  164.699053] kvm: zapping shadow pages for mmio generation wraparound
> [  312.115767] kvm: zapping shadow pages for mmio generation wraparound
> [  432.277585] kvm: zapping shadow pages for mmio generation wraparound
> 
> [  434.547820] ===
> [  434.552020] [ INFO: suspicious RCU usage. ]
> [  434.556223] 4.4.0-rc4+ #1 Not tainted
> [  434.559886] ---
> [  434.564072] arch/x86/kvm/trace.h:971 suspicious rcu_dereference_check() 
> usage!
> [  434.571303] 
>other info that might help us debug this:
> 
> [  434.579324] 
>RCU used illegally from idle CPU!
>rcu_scheduler_active = 1, debug_locks = 0
> [  434.590209] RCU used illegally from extended quiescent state!
> [  434.595971] 1 lock held by qemu-system-x86/2402:
> [  434.600596]  #0:  (>mutex){+.+.+.}, at: [] 
> vcpu_load+0x1c/0x80 [kvm]
> [  434.609146] 
>stack backtrace:
> [  434.613526] CPU: 4 PID: 2402 Comm: qemu-system-x86 Not tainted 4.4.0-rc4+ 
> #1
> [  434.620583] Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A13 
> 05/11/2014
> [  434.627987]  0001 88042f79fcf0 813c2cfc 
> 880435aa
> [  434.635443]  88042f79fd20 810c5157 88042fd48000 
> 000295a85563
> [  434.642886]  000295de483f  88042f79fd58 
> a023ec6e
> [  434.650334] Call Trace:
> [  434.652804]  [] dump_stack+0x4e/0x82
> [  434.657950]  [] lockdep_rcu_suspicious+0xe7/0x120
> [  434.664239]  [] wait_lapic_expire+0xfe/0x1e0 [kvm]
> [  434.670606]  [] kvm_arch_vcpu_ioctl_run+0x76e/0x19c0 
> [kvm]
> [  434.677674]  [] ? kvm_arch_vcpu_ioctl_run+0x8b2/0x19c0 
> [kvm]
> [  434.684905]  [] ? mutex_lock_killable_nested+0x29c/0x4c0
> [  434.691792]  [] kvm_vcpu_ioctl+0x342/0x700 [kvm]
> [  434.697984]  [] ? __lock_is_held+0x4d/0x70
> [  434.703655]  [] ? __fget+0xfe/0x200
> [  434.708719]  [] do_vfs_ioctl+0x301/0x550
> [  434.714208]  [] ? __fget_light+0x2a/0x90
> [  434.719700]  [] SyS_ioctl+0x41/0x70
> [  434.724754]  [] entry_SYSCALL_64_fastpath+0x16/0x7a
> [  437.411818] kvm [2400]: vcpu0 unhandled rdmsr: 0x606
> [  437.416898] kvm [2400]: vcpu0 unhandled rdmsr: 0x34
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: x86: Don't report guest userspace emulation error to userspace, why ?

2015-12-10 Thread Paolo Bonzini
> Paolo Bonzini <pbonz...@redhat.com> writes:
> > On 10/12/2015 18:58, Bandan Das wrote:
> >>> > Allowing userspace to stop the guest with an emulation failure is a
> >> This one I don't :) Userspace started the guest after all, there are other
> >> ways for it to kill the guest if it wanted to.
> >
> > I mean allowing guest userspace to stop the guest.
> 
> Sure! Userspace (Qemu) can just reenter the guest when it sees the failure.
> Doing it in the host kvm seems overkill.

Most userspaces will get it wrong.  Doing it once makes sure that you
do it right.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: x86: Don't report guest userspace emulation error to userspace, why ?

2015-12-10 Thread Paolo Bonzini


On 09/12/2015 23:18, Bandan Das wrote:
> Commit a2b9e6c1a35afcc09:
> 
> KVM: x86: Don't report guest userspace emulation error to userspace
> 
> Commit fc3a9157d314 ("KVM: X86: Don't report L2 emulation failures to
> user-space") disabled the reporting of L2 (nested guest) emulation 
> failures to
> userspace due to race-condition between a vmexit and the instruction 
> emulator.
> The same rational applies also to userspace applications that are 
> permitted by
> the guest OS to access MMIO area or perform PIO.
> 
> This patch extends the current behavior - of injecting a #UD instead of
> reporting it to userspace - also for guest userspace code.
> 
> I searched the archives but failed in finding anything. Can someone please
> explain why this is needed ? Or, why not let userspace decide what to do based
> on the cpl, whether to continue execution or kill the guest ? Is the 
> assumption
> here that this is what userspace always wants ?

Not what userspace always wants, but what the guest kernel always wants.

Allowing userspace to stop the guest with an emulation failure is a
possible denial of service, similar to L2 stopping L1 with an emulation
failure.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/5] x86/vdso: Get pvclock data from the vvar VMA instead of the fixmap

2015-12-10 Thread Paolo Bonzini
 __pa(pvti) >> PAGE_SHIFT,
> +   PAGE_SIZE,
> +   PAGE_READONLY);
> +
> + if (ret)
> + goto up_fail;
> + }
> +
>  up_fail:
>   if (ret)
>   current->mm->context.vdso = NULL;
> diff --git a/arch/x86/include/asm/pvclock.h b/arch/x86/include/asm/pvclock.h
> index 7a6bed5c08bc..3864398c7cb2 100644
> --- a/arch/x86/include/asm/pvclock.h
> +++ b/arch/x86/include/asm/pvclock.h
> @@ -4,6 +4,15 @@
>  #include 
>  #include 
>  
> +#ifdef CONFIG_PARAVIRT_CLOCK
> +extern struct pvclock_vsyscall_time_info *pvclock_pvti_cpu0_va(void);
> +#else
> +static inline struct pvclock_vsyscall_time_info *pvclock_pvti_cpu0_va(void)
> +{
> + return NULL;
> +}
> +#endif
> +
>  /* some helper functions for xen and kvm pv clock sources */
>  cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src);
>  u8 pvclock_read_flags(struct pvclock_vcpu_time_info *src);
> diff --git a/arch/x86/include/asm/vdso.h b/arch/x86/include/asm/vdso.h
> index 756de9190aec..deabaf9759b6 100644
> --- a/arch/x86/include/asm/vdso.h
> +++ b/arch/x86/include/asm/vdso.h
> @@ -22,6 +22,7 @@ struct vdso_image {
>  
>   long sym_vvar_page;
>   long sym_hpet_page;
> + long sym_pvclock_page;
>   long sym_VDSO32_NOTE_MASK;
>   long sym___kernel_sigreturn;
>   long sym___kernel_rt_sigreturn;
> diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
> index 2bd81e302427..ec1b06dc82d2 100644
> --- a/arch/x86/kernel/kvmclock.c
> +++ b/arch/x86/kernel/kvmclock.c
> @@ -45,6 +45,11 @@ early_param("no-kvmclock", parse_no_kvmclock);
>  static struct pvclock_vsyscall_time_info *hv_clock;
>  static struct pvclock_wall_clock wall_clock;
>  
> +struct pvclock_vsyscall_time_info *pvclock_pvti_cpu0_va(void)
> +{
> + return hv_clock;
> +}
> +
>  /*
>   * The wallclock is the time of day when we booted. Since then, some time may
>   * have elapsed since the hypervisor wrote the data. So we try to account for
> 

Acked-by: Paolo Bonzini <pbonz...@redhat.com>
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/5] x86/vdso: Enable vdso pvclock access on all vdso variants

2015-12-10 Thread Paolo Bonzini


On 10/12/2015 00:12, Andy Lutomirski wrote:
> Now that pvclock doesn't require access to the fixmap, all vdso
> variants can use it.
> 
> The kernel side isn't wired up for 32-bit kernels yet, but this
> covers 32-bit and x32 userspace on 64-bit kernels.
> 
> Signed-off-by: Andy Lutomirski <l...@kernel.org>
> ---
>  arch/x86/entry/vdso/vclock_gettime.c | 91 
> 
>  1 file changed, 40 insertions(+), 51 deletions(-)
> 
> diff --git a/arch/x86/entry/vdso/vclock_gettime.c 
> b/arch/x86/entry/vdso/vclock_gettime.c
> index 59a98c25bde7..8602f06c759f 100644
> --- a/arch/x86/entry/vdso/vclock_gettime.c
> +++ b/arch/x86/entry/vdso/vclock_gettime.c
> @@ -17,8 +17,10 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
> +#include 
>  
>  #define gtod ((vsyscall_gtod_data))
>  
> @@ -43,10 +45,6 @@ extern u8 pvclock_page
>  
>  #ifndef BUILD_VDSO32
>  
> -#include 
> -#include 
> -#include 
> -
>  notrace static long vdso_fallback_gettime(long clock, struct timespec *ts)
>  {
>   long ret;
> @@ -64,8 +62,42 @@ notrace static long vdso_fallback_gtod(struct timeval *tv, 
> struct timezone *tz)
>   return ret;
>  }
>  
> -#ifdef CONFIG_PARAVIRT_CLOCK
>  
> +#else
> +
> +notrace static long vdso_fallback_gettime(long clock, struct timespec *ts)
> +{
> + long ret;
> +
> + asm(
> + "mov %%ebx, %%edx \n"
> + "mov %2, %%ebx \n"
> + "call __kernel_vsyscall \n"
> + "mov %%edx, %%ebx \n"
> + : "=a" (ret)
> + : "0" (__NR_clock_gettime), "g" (clock), "c" (ts)
> + : "memory", "edx");
> + return ret;
> +}
> +
> +notrace static long vdso_fallback_gtod(struct timeval *tv, struct timezone 
> *tz)
> +{
> + long ret;
> +
> + asm(
> + "mov %%ebx, %%edx \n"
> + "mov %2, %%ebx \n"
> + "call __kernel_vsyscall \n"
> + "mov %%edx, %%ebx \n"
> + : "=a" (ret)
> + : "0" (__NR_gettimeofday), "g" (tv), "c" (tz)
> + : "memory", "edx");
> + return ret;
> +}
> +
> +#endif
> +
> +#ifdef CONFIG_PARAVIRT_CLOCK
>  static notrace const struct pvclock_vsyscall_time_info *get_pvti0(void)
>  {
>   return (const struct pvclock_vsyscall_time_info *)_page;
> @@ -109,9 +141,9 @@ static notrace cycle_t vread_pvclock(int *mode)
>   do {
>   version = pvti->version;
>  
> - /* This is also a read barrier, so we'll read version first. */
> - tsc = rdtsc_ordered();
> + smp_rmb();
>  
> + tsc = rdtsc_ordered();
>   pvti_tsc_to_system_mul = pvti->tsc_to_system_mul;
>   pvti_tsc_shift = pvti->tsc_shift;
>   pvti_system_time = pvti->system_time;
> @@ -126,7 +158,7 @@ static notrace cycle_t vread_pvclock(int *mode)
>   pvclock_scale_delta(delta, pvti_tsc_to_system_mul,
>   pvti_tsc_shift);
>  
> - /* refer to tsc.c read_tsc() comment for rationale */
> + /* refer to vread_tsc() comment for rationale */
>   last = gtod->cycle_last;
>  
>   if (likely(ret >= last))
> @@ -136,49 +168,6 @@ static notrace cycle_t vread_pvclock(int *mode)
>  }
>  #endif
>  
> -#else
> -
> -notrace static long vdso_fallback_gettime(long clock, struct timespec *ts)
> -{
> - long ret;
> -
> - asm(
> - "mov %%ebx, %%edx \n"
> - "mov %2, %%ebx \n"
> - "call __kernel_vsyscall \n"
> - "mov %%edx, %%ebx \n"
> - : "=a" (ret)
> - : "0" (__NR_clock_gettime), "g" (clock), "c" (ts)
> - : "memory", "edx");
> - return ret;
> -}
> -
> -notrace static long vdso_fallback_gtod(struct timeval *tv, struct timezone 
> *tz)
> -{
> - long ret;
> -
> - asm(
> - "mov %%ebx, %%edx \n"
> - "mov %2, %%ebx \n"
> - "call __kernel_vsyscall \n"
> - "mov %%edx, %%ebx \n"
> - : "=a" (ret)
> - : "0" (__NR_gettimeofday), "g" (tv), "c" (tz)
> - : "memory", "edx");
> - return ret;
> -}
> -
> -#ifdef CONFIG_PARAVIRT_CLOCK
> -
> -static notrace cycle_t vread_pvclock(int *mode)
> -{
> - *mode = VCLOCK_NONE;
> - return 0;
> -}
> -#endif
> -
> -#endif
> -
>  notrace static cycle_t vread_tsc(void)
>  {
>   cycle_t ret = (cycle_t)rdtsc_ordered();
> 

Acked-by: Paolo Bonzini <pbonz...@redhat.com>
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/5] x86/vdso: Remove pvclock fixmap machinery

2015-12-10 Thread Paolo Bonzini


On 10/12/2015 00:12, Andy Lutomirski wrote:
> Signed-off-by: Andy Lutomirski <l...@kernel.org>
> ---
>  arch/x86/entry/vdso/vclock_gettime.c |  1 -
>  arch/x86/entry/vdso/vma.c|  1 +
>  arch/x86/include/asm/fixmap.h|  5 -
>  arch/x86/include/asm/pvclock.h   |  5 -
>  arch/x86/kernel/kvmclock.c   |  6 --
>  arch/x86/kernel/pvclock.c| 24 
>  6 files changed, 1 insertion(+), 41 deletions(-)
> 
> diff --git a/arch/x86/entry/vdso/vclock_gettime.c 
> b/arch/x86/entry/vdso/vclock_gettime.c
> index 5dd363d54348..59a98c25bde7 100644
> --- a/arch/x86/entry/vdso/vclock_gettime.c
> +++ b/arch/x86/entry/vdso/vclock_gettime.c
> @@ -45,7 +45,6 @@ extern u8 pvclock_page
>  
>  #include 
>  #include 
> -#include 
>  #include 
>  
>  notrace static long vdso_fallback_gettime(long clock, struct timespec *ts)
> diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
> index aa828191c654..b8f69e264ac4 100644
> --- a/arch/x86/entry/vdso/vma.c
> +++ b/arch/x86/entry/vdso/vma.c
> @@ -12,6 +12,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
> index f80d70009ff8..6d7d0e52ed5a 100644
> --- a/arch/x86/include/asm/fixmap.h
> +++ b/arch/x86/include/asm/fixmap.h
> @@ -19,7 +19,6 @@
>  #include 
>  #include 
>  #include 
> -#include 
>  #ifdef CONFIG_X86_32
>  #include 
>  #include 
> @@ -72,10 +71,6 @@ enum fixed_addresses {
>  #ifdef CONFIG_X86_VSYSCALL_EMULATION
>   VSYSCALL_PAGE = (FIXADDR_TOP - VSYSCALL_ADDR) >> PAGE_SHIFT,
>  #endif
> -#ifdef CONFIG_PARAVIRT_CLOCK
> - PVCLOCK_FIXMAP_BEGIN,
> - PVCLOCK_FIXMAP_END = PVCLOCK_FIXMAP_BEGIN+PVCLOCK_VSYSCALL_NR_PAGES-1,
> -#endif
>  #endif
>   FIX_DBGP_BASE,
>   FIX_EARLYCON_MEM_BASE,
> diff --git a/arch/x86/include/asm/pvclock.h b/arch/x86/include/asm/pvclock.h
> index 3864398c7cb2..66df22b2e0c9 100644
> --- a/arch/x86/include/asm/pvclock.h
> +++ b/arch/x86/include/asm/pvclock.h
> @@ -100,10 +100,5 @@ struct pvclock_vsyscall_time_info {
>  } __attribute__((__aligned__(SMP_CACHE_BYTES)));
>  
>  #define PVTI_SIZE sizeof(struct pvclock_vsyscall_time_info)
> -#define PVCLOCK_VSYSCALL_NR_PAGES (((NR_CPUS-1)/(PAGE_SIZE/PVTI_SIZE))+1)
> -
> -int __init pvclock_init_vsyscall(struct pvclock_vsyscall_time_info *i,
> -  int size);
> -struct pvclock_vcpu_time_info *pvclock_get_vsyscall_time_info(int cpu);
>  
>  #endif /* _ASM_X86_PVCLOCK_H */
> diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
> index ec1b06dc82d2..72cef58693c7 100644
> --- a/arch/x86/kernel/kvmclock.c
> +++ b/arch/x86/kernel/kvmclock.c
> @@ -310,7 +310,6 @@ int __init kvm_setup_vsyscall_timeinfo(void)
>  {
>  #ifdef CONFIG_X86_64
>   int cpu;
> - int ret;
>   u8 flags;
>   struct pvclock_vcpu_time_info *vcpu_time;
>   unsigned int size;
> @@ -330,11 +329,6 @@ int __init kvm_setup_vsyscall_timeinfo(void)
>   return 1;
>   }
>  
> - if ((ret = pvclock_init_vsyscall(hv_clock, size))) {
> - put_cpu();
> - return ret;
> - }
> -
>   put_cpu();
>  
>   kvm_clock.archdata.vclock_mode = VCLOCK_PVCLOCK;
> diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
> index 2f355d229a58..99bfc025111d 100644
> --- a/arch/x86/kernel/pvclock.c
> +++ b/arch/x86/kernel/pvclock.c
> @@ -140,27 +140,3 @@ void pvclock_read_wallclock(struct pvclock_wall_clock 
> *wall_clock,
>  
>   set_normalized_timespec(ts, now.tv_sec, now.tv_nsec);
>  }
> -
> -#ifdef CONFIG_X86_64
> -/*
> - * Initialize the generic pvclock vsyscall state.  This will allocate
> - * a/some page(s) for the per-vcpu pvclock information, set up a
> - * fixmap mapping for the page(s)
> - */
> -
> -int __init pvclock_init_vsyscall(struct pvclock_vsyscall_time_info *i,
> -  int size)
> -{
> - int idx;
> -
> - WARN_ON (size != PVCLOCK_VSYSCALL_NR_PAGES*PAGE_SIZE);
> -
> - for (idx = 0; idx <= (PVCLOCK_FIXMAP_END-PVCLOCK_FIXMAP_BEGIN); idx++) {
> - __set_fixmap(PVCLOCK_FIXMAP_BEGIN + idx,
> -  __pa(i) + (idx*PAGE_SIZE),
> -  PAGE_KERNEL_VVAR);
> - }
> -
> - return 0;
> -}
> -#endif
> 

Acked-by: Paolo Bonzini <pbonz...@redhat.com>
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/5] x86, vdso, pvclock: Simplify and speed up the vdso pvclock reader

2015-12-10 Thread Paolo Bonzini


On 10/12/2015 00:12, Andy Lutomirski wrote:
> From: Andy Lutomirski <l...@amacapital.net>
> 
> The pvclock vdso code was too abstracted to understand easily and
> excessively paranoid.  Simplify it for a huge speedup.
> 
> This opens the door for additional simplifications, as the vdso no
> longer accesses the pvti for any vcpu other than vcpu 0.
> 
> Before, vclock_gettime using kvm-clock took about 45ns on my machine.
> With this change, it takes 29ns, which is almost as fast as the pure TSC
> implementation.
> 
> Signed-off-by: Andy Lutomirski <l...@amacapital.net>
> ---
>  arch/x86/entry/vdso/vclock_gettime.c | 81 
> 
>  1 file changed, 46 insertions(+), 35 deletions(-)
> 
> diff --git a/arch/x86/entry/vdso/vclock_gettime.c 
> b/arch/x86/entry/vdso/vclock_gettime.c
> index ca94fa649251..c325ba1bdddf 100644
> --- a/arch/x86/entry/vdso/vclock_gettime.c
> +++ b/arch/x86/entry/vdso/vclock_gettime.c
> @@ -78,47 +78,58 @@ static notrace const struct pvclock_vsyscall_time_info 
> *get_pvti(int cpu)
>  
>  static notrace cycle_t vread_pvclock(int *mode)
>  {
> - const struct pvclock_vsyscall_time_info *pvti;
> + const struct pvclock_vcpu_time_info *pvti = _pvti(0)->pvti;
>   cycle_t ret;
> - u64 last;
> - u32 version;
> - u8 flags;
> - unsigned cpu, cpu1;
> -
> + u64 tsc, pvti_tsc;
> + u64 last, delta, pvti_system_time;
> + u32 version, pvti_tsc_to_system_mul, pvti_tsc_shift;
>  
>   /*
> -  * Note: hypervisor must guarantee that:
> -  * 1. cpu ID number maps 1:1 to per-CPU pvclock time info.
> -  * 2. that per-CPU pvclock time info is updated if the
> -  *underlying CPU changes.
> -  * 3. that version is increased whenever underlying CPU
> -  *changes.
> +  * Note: The kernel and hypervisor must guarantee that cpu ID
> +  * number maps 1:1 to per-CPU pvclock time info.
> +  *
> +  * Because the hypervisor is entirely unaware of guest userspace
> +  * preemption, it cannot guarantee that per-CPU pvclock time
> +  * info is updated if the underlying CPU changes or that that
> +  * version is increased whenever underlying CPU changes.
>*
> +  * On KVM, we are guaranteed that pvti updates for any vCPU are
> +  * atomic as seen by *all* vCPUs.  This is an even stronger
> +  * guarantee than we get with a normal seqlock.
> +  *
> +  * On Xen, we don't appear to have that guarantee, but Xen still
> +  * supplies a valid seqlock using the version field.
> +
> +  * We only do pvclock vdso timing at all if
> +  * PVCLOCK_TSC_STABLE_BIT is set, and we interpret that bit to
> +  * mean that all vCPUs have matching pvti and that the TSC is
> +  * synced, so we can just look at vCPU 0's pvti.
>*/
> - do {
> - cpu = __getcpu() & VGETCPU_CPU_MASK;
> - /* TODO: We can put vcpu id into higher bits of pvti.version.
> -  * This will save a couple of cycles by getting rid of
> -  * __getcpu() calls (Gleb).
> -  */
> -
> - pvti = get_pvti(cpu);
> -
> - version = __pvclock_read_cycles(>pvti, , );
> -
> - /*
> -  * Test we're still on the cpu as well as the version.
> -  * We could have been migrated just after the first
> -  * vgetcpu but before fetching the version, so we
> -  * wouldn't notice a version change.
> -  */
> - cpu1 = __getcpu() & VGETCPU_CPU_MASK;
> - } while (unlikely(cpu != cpu1 ||
> -   (pvti->pvti.version & 1) ||
> -   pvti->pvti.version != version));
> -
> - if (unlikely(!(flags & PVCLOCK_TSC_STABLE_BIT)))
> +
> + if (unlikely(!(pvti->flags & PVCLOCK_TSC_STABLE_BIT))) {
>   *mode = VCLOCK_NONE;
> + return 0;
> + }
> +
> + do {
> + version = pvti->version;
> +
> + /* This is also a read barrier, so we'll read version first. */
> + tsc = rdtsc_ordered();
> +
> + pvti_tsc_to_system_mul = pvti->tsc_to_system_mul;
> + pvti_tsc_shift = pvti->tsc_shift;
> + pvti_system_time = pvti->system_time;
> + pvti_tsc = pvti->tsc_timestamp;
> +
> + /* Make sure that the version double-check is last. */
> +     smp_rmb();
> + } while (unlikely((version & 1) || version != pvti->version));
> +
> + delta = tsc - pvti_tsc;
> + ret = pvti_system_time +
> +  

Re: freeze host when injecting NMIs in the guest, at least in 4.4-rc4+

2015-12-10 Thread Paolo Bonzini


On 10/12/2015 17:44, Borislav Petkov wrote:
> Yap,
> 
> this is clearly a qemu/kvm issue. Lemme remove ext4 folks from CC. So
> here's what happens:
> 
> I boot a kvm guest, connect to its monitor (qemu is started with
> "-monitor pty") and on the monitor I issue a couple of times the "nmi"
> command. It doesn't explode immediately but it happens pretty often and
> when it happens, the *host*(!) freezes with some nasty corruption, see
> below.
> 
> Thoughts, suggestions, ideas?

Can you try it on Intel?

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: x86: Don't report guest userspace emulation error to userspace, why ?

2015-12-10 Thread Paolo Bonzini


On 10/12/2015 18:58, Bandan Das wrote:
>> > Allowing userspace to stop the guest with an emulation failure is a
> This one I don't :) Userspace started the guest after all, there are other
> ways for it to kill the guest if it wanted to.

I mean allowing guest userspace to stop the guest.

Paolo

>> > possible denial of service, similar to L2 stopping L1 with an emulation
>> > failure.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: freeze host when injecting NMIs in the guest, at least in 4.4-rc4+

2015-12-10 Thread Paolo Bonzini


On 10/12/2015 17:53, Borislav Petkov wrote:
> Just did, there it splats even when booting the guest, without even
> injecting NMIs:
> 
> [  113.233992] ===
> [  113.238192] [ INFO: suspicious RCU usage. ]
> [  113.242393] 4.4.0-rc4+ #1 Not tainted
> [  113.246056] ---
> [  113.250257] arch/x86/kvm/trace.h:29 suspicious rcu_dereference_check() 
> usage!
> [  113.257400] 
>other info that might help us debug this:
> 
> [  113.265432] 
>RCU used illegally from idle CPU!
>rcu_scheduler_active = 1, debug_locks = 0
> [  113.276321] RCU used illegally from extended quiescent state!
> [  113.282083] 1 lock held by qemu-system-x86/2275:
> [  113.286711]  #0:  (>mutex){+.+.+.}, at: [] 
> vcpu_load+0x1c/0x80 [kvm]
> [  113.295270] 
>stack backtrace:
> [  113.299644] CPU: 4 PID: 2275 Comm: qemu-system-x86 Not tainted 4.4.0-rc4+ 
> #1
> [  113.306706] Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A13 
> 05/11/2014
> [  113.314122]  0001 88042f263d28 813c2cfc 
> 880432d53000
> [  113.321575]  88042f263d58 810c5157 88042f268000 
> 
> [  113.329027]   0001 88042f263e10 
> a01ee7c8
> [  113.336483] Call Trace:
> [  113.338939]  [] dump_stack+0x4e/0x82
> [  113.344098]  [] lockdep_rcu_suspicious+0xe7/0x120
> [  113.350385]  [] kvm_arch_vcpu_ioctl_run+0x16f8/0x19c0 
> [kvm]
> [  113.357544]  [] ? kvm_arch_vcpu_ioctl_run+0xd0/0x19c0 
> [kvm]
> [  113.364695]  [] kvm_vcpu_ioctl+0x342/0x700 [kvm]
> [  113.370896]  [] ? __lock_is_held+0x4d/0x70
> [  113.376583]  [] ? __fget+0xfe/0x200
> [  113.381662]  [] do_vfs_ioctl+0x301/0x550
> [  113.387156]  [] ? __fget_light+0x2a/0x90
> [  113.392654]  [] SyS_ioctl+0x41/0x70
> [  113.397739]  [] entry_SYSCALL_64_fastpath+0x16/0x7a
> [  113.755008] kvm: zapping shadow pages for mmio generation wraparound

Wow, this must have been there forever...

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] kvm: x86: move tracepoints outside extended quiescent state

2015-12-10 Thread Paolo Bonzini
Invoking tracepoints within kvm_guest_enter/kvm_guest_exit causes a
lockdep splat.

Cc: sta...@vger.kernel.org
Reported-by: Borislav Petkov <b...@alien8.de>
Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
---
 arch/x86/kvm/svm.c | 4 ++--
 arch/x86/kvm/vmx.c | 3 ++-
 arch/x86/kvm/x86.c | 2 +-
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 83a1c643f9a5..899c40f826dd 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3422,6 +3422,8 @@ static int handle_exit(struct kvm_vcpu *vcpu)
struct kvm_run *kvm_run = vcpu->run;
u32 exit_code = svm->vmcb->control.exit_code;
 
+   trace_kvm_exit(exit_code, vcpu, KVM_ISA_SVM);
+
if (!is_cr_intercept(svm, INTERCEPT_CR0_WRITE))
vcpu->arch.cr0 = svm->vmcb->save.cr0;
if (npt_enabled)
@@ -3892,8 +3894,6 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
vcpu->arch.regs[VCPU_REGS_RSP] = svm->vmcb->save.rsp;
vcpu->arch.regs[VCPU_REGS_RIP] = svm->vmcb->save.rip;
 
-   trace_kvm_exit(svm->vmcb->control.exit_code, vcpu, KVM_ISA_SVM);
-
if (unlikely(svm->vmcb->control.exit_code == SVM_EXIT_NMI))
kvm_before_handle_nmi(>vcpu);
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index af823a388c19..6b5605607849 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -8042,6 +8042,8 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
u32 exit_reason = vmx->exit_reason;
u32 vectoring_info = vmx->idt_vectoring_info;
 
+   trace_kvm_exit(exit_reason, vcpu, KVM_ISA_VMX);
+
/*
 * Flush logged GPAs PML buffer, this will make dirty_bitmap more
 * updated. Another good is, in kvm_vm_ioctl_get_dirty_log, before
@@ -8668,7 +8670,6 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
vmx->loaded_vmcs->launched = 1;
 
vmx->exit_reason = vmcs_read32(VM_EXIT_REASON);
-   trace_kvm_exit(vmx->exit_reason, vcpu, KVM_ISA_VMX);
 
/*
 * the KVM_REQ_EVENT optimization bit is only on for one entry, and if
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index eed32283d22c..2a8c035ec7fe 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6515,6 +6515,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
if (req_immediate_exit)
smp_send_reschedule(vcpu->cpu);
 
+   trace_kvm_entry(vcpu->vcpu_id);
__kvm_guest_enter();
 
if (unlikely(vcpu->arch.switch_db_regs)) {
@@ -6527,7 +6528,6 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
vcpu->arch.switch_db_regs &= ~KVM_DEBUGREG_RELOAD;
}
 
-   trace_kvm_entry(vcpu->vcpu_id);
wait_lapic_expire(vcpu);
kvm_x86_ops->run(vcpu);
 
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] kvm: x86: move tracepoints outside extended quiescent state

2015-12-10 Thread Paolo Bonzini
Invoking tracepoints within kvm_guest_enter/kvm_guest_exit causes a
lockdep splat.

Cc: sta...@vger.kernel.org
Reported-by: Borislav Petkov <b...@alien8.de>
Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
---
 arch/x86/kvm/svm.c | 4 ++--
 arch/x86/kvm/vmx.c | 3 ++-
 arch/x86/kvm/x86.c | 4 ++--
 3 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 83a1c643f9a5..899c40f826dd 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3422,6 +3422,8 @@ static int handle_exit(struct kvm_vcpu *vcpu)
struct kvm_run *kvm_run = vcpu->run;
u32 exit_code = svm->vmcb->control.exit_code;
 
+   trace_kvm_exit(exit_code, vcpu, KVM_ISA_SVM);
+
if (!is_cr_intercept(svm, INTERCEPT_CR0_WRITE))
vcpu->arch.cr0 = svm->vmcb->save.cr0;
if (npt_enabled)
@@ -3892,8 +3894,6 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
vcpu->arch.regs[VCPU_REGS_RSP] = svm->vmcb->save.rsp;
vcpu->arch.regs[VCPU_REGS_RIP] = svm->vmcb->save.rip;
 
-   trace_kvm_exit(svm->vmcb->control.exit_code, vcpu, KVM_ISA_SVM);
-
if (unlikely(svm->vmcb->control.exit_code == SVM_EXIT_NMI))
kvm_before_handle_nmi(>vcpu);
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index af823a388c19..6b5605607849 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -8042,6 +8042,8 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
u32 exit_reason = vmx->exit_reason;
u32 vectoring_info = vmx->idt_vectoring_info;
 
+   trace_kvm_exit(exit_reason, vcpu, KVM_ISA_VMX);
+
/*
 * Flush logged GPAs PML buffer, this will make dirty_bitmap more
 * updated. Another good is, in kvm_vm_ioctl_get_dirty_log, before
@@ -8668,7 +8670,6 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
vmx->loaded_vmcs->launched = 1;
 
vmx->exit_reason = vmcs_read32(VM_EXIT_REASON);
-   trace_kvm_exit(vmx->exit_reason, vcpu, KVM_ISA_VMX);
 
/*
 * the KVM_REQ_EVENT optimization bit is only on for one entry, and if
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index eed32283d22c..b84ba4b17757 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6515,6 +6515,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
if (req_immediate_exit)
smp_send_reschedule(vcpu->cpu);
 
+   trace_kvm_entry(vcpu->vcpu_id);
+   wait_lapic_expire(vcpu);
__kvm_guest_enter();
 
if (unlikely(vcpu->arch.switch_db_regs)) {
@@ -6527,8 +6529,6 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
vcpu->arch.switch_db_regs &= ~KVM_DEBUGREG_RELOAD;
}
 
-   trace_kvm_entry(vcpu->vcpu_id);
-   wait_lapic_expire(vcpu);
kvm_x86_ops->run(vcpu);
 
/*
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH kvm-unit-tests] x86: always inline functions called after set_exception_return

2015-12-09 Thread Paolo Bonzini


On 07/12/2015 21:36, David Matlack wrote:
> set_exception_return forces exceptions handlers to return to a specific
> address instead of returning to the instruction address pushed by the
> CPU at the time of the exception. The unit tests apic.c and vmx.c use
> this functionality to recover from expected exceptions.
> 
> When using set_exception_return we have to be careful not to modify the
> stack (such as by doing a function call) as triggering the exception will
> likely jump us past the instructions which undo the stack manipulation
> (such as a ret). To accomplish this, declare all functions called after
> set_exception_return as __always_inline, so that the compiler always
> inlines them.

set_exception_return is generally not a great idea IMHO---thanks for
looking at it.

A couple years ago we discussed adding setjmp/longjmp to libcflat
(http://www.spinics.net/lists/kvm/msg94159.html which is however missing
a 32-bit version).  Making the exceptions do a longjmp would be a much
safer option.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kvmclock doesn't work, help?

2015-12-09 Thread Paolo Bonzini


On 09/12/2015 23:27, Andy Lutomirski wrote:
> On Wed, Dec 9, 2015 at 2:12 PM, Paolo Bonzini <pbonz...@redhat.com> wrote:
>> On 09/12/2015 22:49, Andy Lutomirski wrote:
>>> On Wed, Dec 9, 2015 at 1:16 PM, Paolo Bonzini <pbonz...@redhat.com> wrote:
>>>>
>>>>
>>>> On 09/12/2015 22:10, Andy Lutomirski wrote:
>>>>> Can we please stop making kvmclock more complex?  It's a beast right
>>>>> now, and not in a good way.  It's far too tangled with the vclock
>>>>> machinery on both the host and guest sides, the pvclock stuff is not
>>>>> well thought out (even in principle in an ABI sense), and it's never
>>>>> been clear to my what problem exactly the kvmclock stuff is supposed
>>>>> to solve.
>>>>
>>>> It's supposed to solve the problem that:
>>>>
>>>> - not all hosts have a working TSC
>>>
>>> Fine, but we don't need any vdso integration for that.
>>
>> Well, you still want a fast time source.  That was a given. :)
> 
> If the host can't figure out how to give *itself* a fast time source,
> I'd be surprised if KVM can manage to give the guest a fast, reliable
> time source.

There's no vdso integration unless the host has a constant, nonstop
(fully "working") TSC.  That's the meaning of PVCLOCK_TSC_STABLE_BIT.

So, correction:  if you can pull it off, you still want a fast time
source.  Otherwise, you still want one that is as fast as possible,
especially on the kernel side.

>>>> - even if they all do, virtual machines can be migrated (or
>>>> saved/restored) to a host with a different TSC frequency
>>>>
>>>> - any MMIO- or PIO-based mechanism to access the current time is orders
>>>> of magnitude slower than the TSC and less precise too.
>>
>> the problem is if you want to solve all three of them.  The first
>> two are solved by the ACPI PM timer with a decent resolution (70
>> ns---much faster anyway than an I/O port access).  The third is solved
>> by TSC.  To solve all three, you need kvmclock.
> 
> Still confused.  Is kvmclock really used in cases where even the host
> can't pull of working TSC?

You can certainly provide kvmclock even if you lack constant-rate or
nonstop TSC.  Those are only a requirement for vdso.

If the host has a constant-rate TSC, but the rate differs per physical
CPU (common on older NUMA machines), you can easily provide a working
kvmclock.  It cannot support vdso because you'll need to read the time
from a non-preemptable section, but it will work because KVM can update
the kvmclock parameters on VCPU migration, and it's still faster than
anything else.  (The purpose of the now-gone migration notifiers was to
support vdso even in this case).

If the host doesn't even have constant-rate TSC, you can still provide
kernel-only kvmclock reads through cpufreq notifiers.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kvmclock doesn't work, help?

2015-12-09 Thread Paolo Bonzini


On 09/12/2015 22:10, Andy Lutomirski wrote:
> Can we please stop making kvmclock more complex?  It's a beast right
> now, and not in a good way.  It's far too tangled with the vclock
> machinery on both the host and guest sides, the pvclock stuff is not
> well thought out (even in principle in an ABI sense), and it's never
> been clear to my what problem exactly the kvmclock stuff is supposed
> to solve.

It's supposed to solve the problem that:

- not all hosts have a working TSC

- even if they all do, virtual machines can be migrated (or
saved/restored) to a host with a different TSC frequency

- any MMIO- or PIO-based mechanism to access the current time is orders
of magnitude slower than the TSC and less precise too.

> I'm somewhat tempted to suggest that we delete kvmclock entirely and
> start over.  A correctly functioning KVM guest using TSC (i.e.
> ignoring kvmclock entirely) seems to work rather more reliably and
> considerably faster than a kvmclock guest.

If all your hosts have a working TSC and you don't do migration or
save/restore, that's a valid configuration.  It's not a good default,
however.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kvmclock doesn't work, help?

2015-12-09 Thread Paolo Bonzini


On 09/12/2015 22:49, Andy Lutomirski wrote:
> On Wed, Dec 9, 2015 at 1:16 PM, Paolo Bonzini <pbonz...@redhat.com> wrote:
>>
>>
>> On 09/12/2015 22:10, Andy Lutomirski wrote:
>>> Can we please stop making kvmclock more complex?  It's a beast right
>>> now, and not in a good way.  It's far too tangled with the vclock
>>> machinery on both the host and guest sides, the pvclock stuff is not
>>> well thought out (even in principle in an ABI sense), and it's never
>>> been clear to my what problem exactly the kvmclock stuff is supposed
>>> to solve.
>>
>> It's supposed to solve the problem that:
>>
>> - not all hosts have a working TSC
> 
> Fine, but we don't need any vdso integration for that.

Well, you still want a fast time source.  That was a given. :)

>> - even if they all do, virtual machines can be migrated (or
>> saved/restored) to a host with a different TSC frequency
>> 
>> - any MMIO- or PIO-based mechanism to access the current time is orders
>> of magnitude slower than the TSC and less precise too.
> 
> Yup.  But TSC by itself gets that benefit, too.

Yes, the problem is if you want to solve all three of them.  The first
two are solved by the ACPI PM timer with a decent resolution (70
ns---much faster anyway than an I/O port access).  The third is solved
by TSC.  To solve all three, you need kvmclock.

>>> I'm somewhat tempted to suggest that we delete kvmclock entirely and
>>> start over.  A correctly functioning KVM guest using TSC (i.e.
>>> ignoring kvmclock entirely) seems to work rather more reliably and
>>> considerably faster than a kvmclock guest.
>>
>> If all your hosts have a working TSC and you don't do migration or
>> save/restore, that's a valid configuration.  It's not a good default,
>> however.
> 
> Er?
> 
> kvmclock is still really quite slow and buggy.

Unless it takes 3-4000 clock cycles for a gettimeofday, which it
shouldn't even with vdso disabled, it's definitely not slower than PIO.

> And the patch I identified is definitely a problem here:
> 
> [  136.131241] KVM: disabling fast timing permanently due to inability
> to recover from suspend
> 
> I got that on the host with this whitespace-damaged patch:
> 
> if (backwards_tsc) {
> u64 delta_cyc = max_tsc - local_tsc;
> +   if (!backwards_tsc_observed)
> +   pr_warn("KVM: disabling fast timing
> permanently due to inability to recover from suspend\n");
> 
> when I suspended and resumed.
> 
> Can anyone explain what problem
> 16a9602158861687c78b6de6dc6a79e6e8a9136f is supposed to solve?  On
> brief inspection, it just seems to be incorrect.  Shouldn't KVM's
> normal TSC logic handle that case right?  After all, all vcpus should
> be paused when we resume from suspend.  At worst, we should just need
> kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu) on all vcpus.  (Actually,
> shouldn't we do that regardless of which way the TSC jumped on
> suspend/resume?  After all, the jTSC-to-wall-clock offset is quite
> likely to change except on the very small handful of CPUs (if any)
> that keep the TSC running in S3 and hibernate.

I don't recall the details of that patch, so Marcelo will have to answer
this, or Alex too since he chimed in the original thread.  At least it
should be made conditional on the existence of a VM at suspend time (and
the master clock stuff should be made per VM, as I suggested at
https://www.mail-archive.com/kvm@vger.kernel.org/msg102316.html).

It would indeed be great if the master clock could be dropped.  But I'm
definitely missing some of the subtle details. :(

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm: Dump guest rIP when the guest tried something unsupported

2015-12-04 Thread Paolo Bonzini


On 20/11/2015 19:52, Borislav Petkov wrote:
> From: Borislav Petkov 
> 
> It looks like this in action:
> 
>   kvm [5197]: vcpu0, guest rIP: 0x810187ba unhandled rdmsr: 0xc001102
> 
> and helps to pinpoint quickly where in the guest we did the unsupported
> thing.
> 
> Signed-off-by: Borislav Petkov 
> ---
>  include/linux/kvm_host.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 5706a2108f0a..597f6607c440 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -439,7 +439,8 @@ struct kvm {
>  
>  /* The guest did something we don't support. */
>  #define vcpu_unimpl(vcpu, fmt, ...)  \
> - kvm_pr_unimpl("vcpu%i " fmt, (vcpu)->vcpu_id, ## __VA_ARGS__)
> + kvm_pr_unimpl("vcpu%i, guest rIP: 0x%lx " fmt,  \
> + (vcpu)->vcpu_id, kvm_rip_read(vcpu), ## __VA_ARGS__)
>  
>  #define vcpu_debug(vcpu, fmt, ...)   \
>   kvm_debug("vcpu%i " fmt, (vcpu)->vcpu_id, ## __VA_ARGS__)
> 

Thanks, applying this to kvm/queue.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/9] drivers/hv: replace enum hv_message_type by u32

2015-12-04 Thread Paolo Bonzini


On 04/12/2015 15:33, Denis V. Lunev wrote:
> On 12/02/2015 03:22 PM, Paolo Bonzini wrote:
>>
>> On 30/11/2015 17:22, Andrey Smetanin wrote:
>>> enum hv_message_type inside struct hv_message, hv_post_message
>>> is not size portable. Replace enum by u32.
>> It's only non-portable inside structs.  Okay to apply just these:
>>
>> @@ -172,7 +174,7 @@ union hv_message_flags {
>>
>>   /* Define synthetic interrupt controller message header. */
>>   struct hv_message_header {
>> -u32 message_type;
>> +enum hv_message_type message_type;
>>   u8 payload_size;
>>   union hv_message_flags message_flags;
>>   u8 reserved[2];
>> @@ -345,7 +347,7 @@ enum hv_call_code {
>>   struct hv_input_post_message {
>>   union hv_connection_id connectionid;
>>   u32 reserved;
>> -u32 message_type;
>> +enum hv_message_type message_type;
>>   u32 payload_size;
>>   u64 payload[HV_MESSAGE_PAYLOAD_QWORD_COUNT];
>>   };
>>
>> ?
>>
>> Paolo
> sorry for the delay.
> 
> Andrey is on vacation till the end of the week.
> 
> This could be not enough for some compilers as this exact
> enum could be signed and unsigned depends upon the
> implementation of the compiler and if it is signed we
> can face signed/unsigned comparison in ifs.

But why is that a problem?  The issue is pre-existing anyway; the only
one that can cause bugs when moving code to uapi/ (i.e. which means it
can be used on non-x86 platforms) is the size of the enum, not the
signedness.

Paolo

> Vitaly, by the way, this code is a bit rotten. The only caller of
> hv_post_message calls it with message type exactly written
> as "1", which is not defined in the enum.
> 
> /*
>  * vmbus_post_msg - Send a msg on the vmbus's message connection
>  */
> int vmbus_post_msg(void *buffer, size_t buflen)
> {
> ...
> ret = hv_post_message(conn_id, 1, buffer, buflen);
> 
> Den
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[GIT PULL] KVM fixes for 4.4-rc4

2015-12-04 Thread Paolo Bonzini
Linus,

The following changes since commit 31ade3b83e1821da5fbb2f11b5b3d4ab2ec39db8:

  Linux 4.4-rc3 (2015-11-29 18:58:26 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/virt/kvm/kvm.git tags/for-linus

for you to fetch changes up to 09922076003ad66de41ea14d2f8c3b4a16ec7774:

  Merge tag 'kvm-arm-for-v4.4-rc4' of 
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master 
(2015-12-04 18:32:32 +0100)



KVM/ARM fixes for v4.4-rc4

- A series of fixes to deal with the aliasing between the sp and xzr register
- A fix for the cache flush fix that went in -rc3


Ard Biesheuvel (1):
  ARM/arm64: KVM: correct PTE uncachedness check

Paolo Bonzini (1):
  Merge tag 'kvm-arm-for-v4.4-rc4' of 
git://git.kernel.org/.../kvmarm/kvmarm into kvm-master

Pavel Fedin (4):
  arm64: KVM: Correctly handle zero register during MMIO
  arm64: KVM: Remove const from struct sys_reg_params
  arm64: KVM: Correctly handle zero register in system register accesses
  arm64: KVM: Get rid of old vcpu_reg()

 arch/arm/include/asm/kvm_emulate.h   |  12 
 arch/arm/kvm/mmio.c  |   5 +-
 arch/arm/kvm/mmu.c   |   4 +-
 arch/arm/kvm/psci.c  |  20 +++---
 arch/arm64/include/asm/kvm_emulate.h |  18 +++--
 arch/arm64/kvm/handle_exit.c |   2 +-
 arch/arm64/kvm/sys_regs.c| 123 +--
 arch/arm64/kvm/sys_regs.h|   8 +--
 arch/arm64/kvm/sys_regs_generic_v8.c |   4 +-
 9 files changed, 107 insertions(+), 89 deletions(-)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/9] drivers/hv: replace enum hv_message_type by u32

2015-12-04 Thread Paolo Bonzini


On 04/12/2015 17:55, Denis V. Lunev wrote:
> On 12/04/2015 05:41 PM, Paolo Bonzini wrote:
>>
>> On 04/12/2015 15:33, Denis V. Lunev wrote:
>>> On 12/02/2015 03:22 PM, Paolo Bonzini wrote:
>>>> On 30/11/2015 17:22, Andrey Smetanin wrote:
>>>>> enum hv_message_type inside struct hv_message, hv_post_message
>>>>> is not size portable. Replace enum by u32.
>>>> It's only non-portable inside structs.  Okay to apply just these:
>>>>
>>>> @@ -172,7 +174,7 @@ union hv_message_flags {
>>>>
>>>>/* Define synthetic interrupt controller message header. */
>>>>struct hv_message_header {
>>>> -u32 message_type;
>>>> +enum hv_message_type message_type;
>>>>u8 payload_size;
>>>>union hv_message_flags message_flags;
>>>>u8 reserved[2];
>>>> @@ -345,7 +347,7 @@ enum hv_call_code {
>>>>struct hv_input_post_message {
>>>>union hv_connection_id connectionid;
>>>>u32 reserved;
>>>> -u32 message_type;
>>>> +enum hv_message_type message_type;
>>>>u32 payload_size;
>>>>u64 payload[HV_MESSAGE_PAYLOAD_QWORD_COUNT];
>>>>};
>>>>
>>>> ?
>>>>
>>>> Paolo
>>> sorry for the delay.
>>>
>>> Andrey is on vacation till the end of the week.
>>>
>>> This could be not enough for some compilers as this exact
>>> enum could be signed and unsigned depends upon the
>>> implementation of the compiler and if it is signed we
>>> can face signed/unsigned comparison in ifs.
>> But why is that a problem?  The issue is pre-existing anyway; the only
>> one that can cause bugs when moving code to uapi/ (i.e. which means it
>> can be used on non-x86 platforms) is the size of the enum, not the
>> signedness.
>
> we are now comparing enum with enum which are the same type.
> With the change you are proposing we will compare enum
> with u32 which are different.

This is only an issue in C++.

> Original suggestion from Andrey was safe in this respect.

Sure, but it makes code less clear.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [GIT PULL] KVM/ARM fixes for 4.4-rc4

2015-12-04 Thread Paolo Bonzini


On 04/12/2015 18:17, Marc Zyngier wrote:
> Hi Paolo,
> 
> This pull request contains a number of fixes for 4.4-rc4 (or -rc5 if
> we already missed the boat).
> 
> The first part is a very nice catch from Pavel, who noticed that we
> were not dealing very well (if at all) with the aliasing between two
> registers.
> 
> The last patch from Ard is a fix for a fix that was merged in
> -rc3. Hopefully we got it right this time.
> 
> Please pull!
> 
>M.
> 
> The following changes since commit fbb4574ce9a37e15a9872860bf202f2be5bdf6c4:
> 
>   arm64: kvm: report original PAR_EL1 upon panic (2015-11-24 18:20:58 +0100)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git 
> tags/kvm-arm-for-v4.4-rc4
> 
> for you to fetch changes up to 0de58f852875a0f0dcfb120bb8433e4e73c7803b:
> 
>   ARM/arm64: KVM: correct PTE uncachedness check (2015-12-04 16:30:17 +)
> 
> 
> KVM/ARM fixes for v4.4-rc4
> 
> - A series of fixes to deal with the aliasing between the sp and xzr register
> - A fix for the cache flush fix that went in -rc3
> 
> 
> Ard Biesheuvel (1):
>   ARM/arm64: KVM: correct PTE uncachedness check
> 
> Pavel Fedin (4):
>   arm64: KVM: Correctly handle zero register during MMIO
>   arm64: KVM: Remove const from struct sys_reg_params
>   arm64: KVM: Correctly handle zero register in system register accesses
>   arm64: KVM: Get rid of old vcpu_reg()
> 
>  arch/arm/include/asm/kvm_emulate.h   |  12 
>  arch/arm/kvm/mmio.c  |   5 +-
>  arch/arm/kvm/mmu.c   |   4 +-
>  arch/arm/kvm/psci.c  |  20 +++---
>  arch/arm64/include/asm/kvm_emulate.h |  18 +++--
>  arch/arm64/kvm/handle_exit.c |   2 +-
>  arch/arm64/kvm/sys_regs.c| 123 
> +--
>  arch/arm64/kvm/sys_regs.h|   8 +--
>  arch/arm64/kvm/sys_regs_generic_v8.c |   4 +-
>  9 files changed, 107 insertions(+), 89 deletions(-)
> 

Pulled, thanks.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/9] drivers/hv: replace enum hv_message_type by u32

2015-12-04 Thread Paolo Bonzini

> >> we are now comparing enum with enum which are the same type.
> >> With the change you are proposing we will compare enum
> >> with u32 which are different.
> > This is only an issue in C++.
> >
> >> Original suggestion from Andrey was safe in this respect.
> > Sure, but it makes code less clear.
> >
> > Paolo
> 
> ok, this seems reasonable. Why not to reduce the patch :)
> We'll send an update on Monday.
> 
> Are there any other issue with the patchset?

No, I can also do the change myself. Check kvm/queue.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: VMX: fix the writing POSTED_INTR_NV

2015-12-03 Thread Paolo Bonzini


On 03/12/2015 06:29, roy.qing...@gmail.com wrote:
> From: Li RongQing  
> 
> POSTED_INTR_NV is 16bit, should not use 64bit write function
> 
> [ 5311.676074] vmwrite error: reg 3 value 0 (err 12)
>   [ 5311.680001] CPU: 49 PID: 4240 Comm: qemu-system-i38 Tainted: G I 
> 4.1.13-WR8.0.0.0_standard #1
>   [ 5311.689343] Hardware name: Intel Corporation S2600WT2/S2600WT2, BIOS 
> SE5C610.86B.01.01.0008.021120151325 02/11/2015
>   [ 5311.699550]   e69a7e1c c1950de1  e69a7e38 
> fafcff45 fafebd24
>   [ 5311.706924] 0003  000c b6a06dfa e69a7e40 fafcff79 
> e69a7eb0 fafd5f57
>   [ 5311.714296] e69a7ec0 c1080600  0001 c0e18018 01be 
>  0b43
>   [ 5311.721651] Call Trace:
>   [ 5311.722942] [] dump_stack+0x4b/0x75
>   [ 5311.726467] [] vmwrite_error+0x35/0x40 [kvm_intel]
>   [ 5311.731444] [] vmcs_writel+0x29/0x30 [kvm_intel]
>   [ 5311.736228] [] vmx_create_vcpu+0x337/0xb90 [kvm_intel]
>   [ 5311.741600] [] ? dequeue_task_fair+0x2e0/0xf60
>   [ 5311.746197] [] kvm_arch_vcpu_create+0x3a/0x70 [kvm]
>   [ 5311.751278] [] kvm_vm_ioctl+0x14d/0x640 [kvm]
>   [ 5311.755771] [] ? free_pages_prepare+0x1a4/0x2d0
>   [ 5311.760455] [] ? debug_smp_processor_id+0x12/0x20
>   [ 5311.765333] [] ? sched_move_task+0xbe/0x170
>   [ 5311.769621] [] ? kmem_cache_free+0x213/0x230
>   [ 5311.774016] [] ? kvm_set_memory_region+0x60/0x60 [kvm]
>   [ 5311.779379] [] do_vfs_ioctl+0x2e2/0x500
>   [ 5311.783285] [] ? kmem_cache_free+0x213/0x230
>   [ 5311.787677] [] ? __mmdrop+0x63/0xd0
>   [ 5311.791196] [] ? __mmdrop+0x63/0xd0
>   [ 5311.794712] [] ? __mmdrop+0x63/0xd0
>   [ 5311.798234] [] ? __fget+0x57/0x90
>   [ 5311.801559] [] ? __fget_light+0x22/0x50
>   [ 5311.805464] [] SyS_ioctl+0x80/0x90
>   [ 5311.808885] [] sysenter_do_call+0x12/0x12
>   [ 5312.059280] kvm: zapping shadow pages for mmio generation wraparound
>   [ 5313.678415] kvm [4231]: vcpu0 disabled perfctr wrmsr: 0xc2 data 0x
>   [ 5313.726518] kvm [4231]: vcpu0 unhandled rdmsr: 0x570
> 
> Signed-off-by: Li RongQing 
> Cc: Yang Zhang 
> ---
>  arch/x86/kvm/vmx.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 5eb56ed..418e084 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -4780,7 +4780,7 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
>  
>   vmcs_write16(GUEST_INTR_STATUS, 0);
>  
> - vmcs_write64(POSTED_INTR_NV, POSTED_INTR_VECTOR);
> + vmcs_write16(POSTED_INTR_NV, POSTED_INTR_VECTOR);
>   vmcs_write64(POSTED_INTR_DESC_ADDR, __pa((>pi_desc)));
>   }
>  
> @@ -9505,7 +9505,7 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, 
> struct vmcs12 *vmcs12)
>*/
>   vmx->nested.posted_intr_nv = vmcs12->posted_intr_nv;
>   vmx->nested.pi_pending = false;
> - vmcs_write64(POSTED_INTR_NV, POSTED_INTR_VECTOR);
> + vmcs_write16(POSTED_INTR_NV, POSTED_INTR_VECTOR);
>   vmcs_write64(POSTED_INTR_DESC_ADDR,
>   page_to_phys(vmx->nested.pi_desc_page) +
>   (unsigned long)(vmcs12->posted_intr_desc_addr &
> 

Indeed this is a problem for 32-bit hosts.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM: VMX: fix read/write sizes of VMCS fields

2015-12-03 Thread Paolo Bonzini
In theory this should have broken EPT on 32-bit kernels (due to
reading the high part of natural-width field GUEST_CR3).  Not sure
if no one noticed or the processor behaves differently from the
documentation.

Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
---
 arch/x86/kvm/vmx.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index c39737ff0581..b1af1e48070b 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4868,7 +4868,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool 
init_event)
 
seg_setup(VCPU_SREG_CS);
vmcs_write16(GUEST_CS_SELECTOR, 0xf000);
-   vmcs_write32(GUEST_CS_BASE, 0x);
+   vmcs_writel(GUEST_CS_BASE, 0xul);
 
seg_setup(VCPU_SREG_DS);
seg_setup(VCPU_SREG_ES);
@@ -4904,7 +4904,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool 
init_event)
 
vmcs_write32(GUEST_ACTIVITY_STATE, GUEST_ACTIVITY_ACTIVE);
vmcs_write32(GUEST_INTERRUPTIBILITY_INFO, 0);
-   vmcs_write32(GUEST_PENDING_DBG_EXCEPTIONS, 0);
+   vmcs_writel(GUEST_PENDING_DBG_EXCEPTIONS, 0);
 
setup_msrs(vmx);
 
@@ -7893,7 +7893,7 @@ static void dump_vmcs(void)
u32 pin_based_exec_ctrl = vmcs_read32(PIN_BASED_VM_EXEC_CONTROL);
u32 secondary_exec_control = 0;
unsigned long cr4 = vmcs_readl(GUEST_CR4);
-   u64 efer = vmcs_readl(GUEST_IA32_EFER);
+   u64 efer = vmcs_read64(GUEST_IA32_EFER);
int i, n;
 
if (cpu_has_secondary_exec_ctrls())
@@ -10159,7 +10159,7 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, 
struct vmcs12 *vmcs12,
 * Additionally, restore L2's PDPTR to vmcs12.
 */
if (enable_ept) {
-   vmcs12->guest_cr3 = vmcs_read64(GUEST_CR3);
+   vmcs12->guest_cr3 = vmcs_readl(GUEST_CR3);
vmcs12->guest_pdptr0 = vmcs_read64(GUEST_PDPTR0);
vmcs12->guest_pdptr1 = vmcs_read64(GUEST_PDPTR1);
vmcs12->guest_pdptr2 = vmcs_read64(GUEST_PDPTR2);
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM: VMX: fix read/write sizes of VMCS fields in dump_vmcs

2015-12-03 Thread Paolo Bonzini
This was not printing the high parts of several 64-bit fields on
32-bit kernels.  Separate from the previous one to make the patches
easier to review.

Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
---
 arch/x86/kvm/vmx.c | 39 ---
 1 file changed, 20 insertions(+), 19 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index b1af1e48070b..b1a453d78155 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -7909,10 +7909,10 @@ static void dump_vmcs(void)
if ((secondary_exec_control & SECONDARY_EXEC_ENABLE_EPT) &&
(cr4 & X86_CR4_PAE) && !(efer & EFER_LMA))
{
-   pr_err("PDPTR0 = 0x%016lx  PDPTR1 = 0x%016lx\n",
-  vmcs_readl(GUEST_PDPTR0), vmcs_readl(GUEST_PDPTR1));
-   pr_err("PDPTR2 = 0x%016lx  PDPTR3 = 0x%016lx\n",
-  vmcs_readl(GUEST_PDPTR2), vmcs_readl(GUEST_PDPTR3));
+   pr_err("PDPTR0 = 0x%016llx  PDPTR1 = 0x%016llx\n",
+  vmcs_read64(GUEST_PDPTR0), vmcs_read64(GUEST_PDPTR1));
+   pr_err("PDPTR2 = 0x%016llx  PDPTR3 = 0x%016llx\n",
+  vmcs_read64(GUEST_PDPTR2), vmcs_read64(GUEST_PDPTR3));
}
pr_err("RSP = 0x%016lx  RIP = 0x%016lx\n",
   vmcs_readl(GUEST_RSP), vmcs_readl(GUEST_RIP));
@@ -7933,16 +7933,16 @@ static void dump_vmcs(void)
vmx_dump_sel("TR:  ", GUEST_TR_SELECTOR);
if ((vmexit_ctl & (VM_EXIT_SAVE_IA32_PAT | VM_EXIT_SAVE_IA32_EFER)) ||
(vmentry_ctl & (VM_ENTRY_LOAD_IA32_PAT | VM_ENTRY_LOAD_IA32_EFER)))
-   pr_err("EFER = 0x%016llx  PAT = 0x%016lx\n",
-  efer, vmcs_readl(GUEST_IA32_PAT));
-   pr_err("DebugCtl = 0x%016lx  DebugExceptions = 0x%016lx\n",
-  vmcs_readl(GUEST_IA32_DEBUGCTL),
+   pr_err("EFER = 0x%016llx  PAT = 0x%016llx\n",
+  efer, vmcs_read64(GUEST_IA32_PAT));
+   pr_err("DebugCtl = 0x%016llx  DebugExceptions = 0x%016lx\n",
+  vmcs_read64(GUEST_IA32_DEBUGCTL),
   vmcs_readl(GUEST_PENDING_DBG_EXCEPTIONS));
if (vmentry_ctl & VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL)
-   pr_err("PerfGlobCtl = 0x%016lx\n",
-  vmcs_readl(GUEST_IA32_PERF_GLOBAL_CTRL));
+   pr_err("PerfGlobCtl = 0x%016llx\n",
+  vmcs_read64(GUEST_IA32_PERF_GLOBAL_CTRL));
if (vmentry_ctl & VM_ENTRY_LOAD_BNDCFGS)
-   pr_err("BndCfgS = 0x%016lx\n", vmcs_readl(GUEST_BNDCFGS));
+   pr_err("BndCfgS = 0x%016llx\n", vmcs_read64(GUEST_BNDCFGS));
pr_err("Interruptibility = %08x  ActivityState = %08x\n",
   vmcs_read32(GUEST_INTERRUPTIBILITY_INFO),
   vmcs_read32(GUEST_ACTIVITY_STATE));
@@ -7971,11 +7971,12 @@ static void dump_vmcs(void)
   vmcs_read32(HOST_IA32_SYSENTER_CS),
   vmcs_readl(HOST_IA32_SYSENTER_EIP));
if (vmexit_ctl & (VM_EXIT_LOAD_IA32_PAT | VM_EXIT_LOAD_IA32_EFER))
-   pr_err("EFER = 0x%016lx  PAT = 0x%016lx\n",
-  vmcs_readl(HOST_IA32_EFER), vmcs_readl(HOST_IA32_PAT));
+   pr_err("EFER = 0x%016llx  PAT = 0x%016llx\n",
+  vmcs_read64(HOST_IA32_EFER),
+  vmcs_read64(HOST_IA32_PAT));
if (vmexit_ctl & VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL)
-   pr_err("PerfGlobCtl = 0x%016lx\n",
-  vmcs_readl(HOST_IA32_PERF_GLOBAL_CTRL));
+   pr_err("PerfGlobCtl = 0x%016llx\n",
+  vmcs_read64(HOST_IA32_PERF_GLOBAL_CTRL));
 
pr_err("*** Control State ***\n");
pr_err("PinBased=%08x CPUBased=%08x SecondaryExec=%08x\n",
@@ -7998,16 +7999,16 @@ static void dump_vmcs(void)
pr_err("IDTVectoring: info=%08x errcode=%08x\n",
   vmcs_read32(IDT_VECTORING_INFO_FIELD),
   vmcs_read32(IDT_VECTORING_ERROR_CODE));
-   pr_err("TSC Offset = 0x%016lx\n", vmcs_readl(TSC_OFFSET));
+   pr_err("TSC Offset = 0x%016llx\n", vmcs_read64(TSC_OFFSET));
if (secondary_exec_control & SECONDARY_EXEC_TSC_SCALING)
-   pr_err("TSC Multiplier = 0x%016lx\n",
-  vmcs_readl(TSC_MULTIPLIER));
+   pr_err("TSC Multiplier = 0x%016llx\n",
+  vmcs_read64(TSC_MULTIPLIER));
if (cpu_based_exec_ctrl & CPU_BASED_TPR_SHADOW)
pr_err("TPR Threshold = 0x%02x\n", vmcs_read32(TPR_THRESHOLD));
if (pin_based_exec_ctrl & PIN_BASED_POSTED_INTR)
pr_err("PostedIntrVec = 0x%02x\n", vmcs_read16

[PATCH] KVM: vmx: detect mismatched size in VMCS read/write

2015-12-03 Thread Paolo Bonzini
Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
---
I am sending this as RFC because the error messages it produces are
very ugly.  Because of inlining, the original line is lost.  The
alternative is to change vmcs_read/write/checkXX into macros, but
then you need to have a single huge BUILD_BUG_ON or BUILD_BUG_ON_MSG
because multiple BUILD_BUG_ON* with the same __LINE__ are not
supported well.
---
 arch/x86/kvm/vmx.c | 100 -
 1 file changed, 83 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index b1a453d78155..62d958a1ec0f 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1447,7 +1447,51 @@ static inline void ept_sync_context(u64 eptp)
}
 }
 
-static __always_inline unsigned long vmcs_readl(unsigned long field)
+static __always_inline void vmcs_check16(unsigned long field)
+{
+BUILD_BUG_ON_MSG(__builtin_constant_p(field) && ((field) & 0x6001) == 
0x2000,
+"16-bit accessor invalid for 64-bit field");
+BUILD_BUG_ON_MSG(__builtin_constant_p(field) && ((field) & 0x6001) == 
0x2001,
+"16-bit accessor invalid for 64-bit high field");
+BUILD_BUG_ON_MSG(__builtin_constant_p(field) && ((field) & 0x6000) == 
0x4000,
+"16-bit accessor invalid for 32-bit high field");
+BUILD_BUG_ON_MSG(__builtin_constant_p(field) && ((field) & 0x6000) == 
0x6000,
+"16-bit accessor invalid for natural width field");
+}
+
+static __always_inline void vmcs_check32(unsigned long field)
+{
+BUILD_BUG_ON_MSG(__builtin_constant_p(field) && ((field) & 0x6000) == 
0,
+"32-bit accessor invalid for 16-bit field");
+BUILD_BUG_ON_MSG(__builtin_constant_p(field) && ((field) & 0x6000) == 
0x6000,
+"32-bit accessor invalid for natural width field");
+}
+
+static __always_inline void vmcs_check64(unsigned long field)
+{
+BUILD_BUG_ON_MSG(__builtin_constant_p(field) && ((field) & 0x6000) == 
0,
+"64-bit accessor invalid for 16-bit field");
+BUILD_BUG_ON_MSG(__builtin_constant_p(field) && ((field) & 0x6001) == 
0x2001,
+"64-bit accessor invalid for 64-bit high field");
+BUILD_BUG_ON_MSG(__builtin_constant_p(field) && ((field) & 0x6000) == 
0x4000,
+"64-bit accessor invalid for 32-bit field");
+BUILD_BUG_ON_MSG(__builtin_constant_p(field) && ((field) & 0x6000) == 
0x6000,
+"64-bit accessor invalid for natural width field");
+}
+
+static __always_inline void vmcs_checkl(unsigned long field)
+{
+BUILD_BUG_ON_MSG(__builtin_constant_p(field) && ((field) & 0x6000) == 
0,
+"Natural width accessor invalid for 16-bit field");
+BUILD_BUG_ON_MSG(__builtin_constant_p(field) && ((field) & 0x6001) == 
0x2000,
+"Natural width accessor invalid for 64-bit field");
+BUILD_BUG_ON_MSG(__builtin_constant_p(field) && ((field) & 0x6001) == 
0x2001,
+"Natural width accessor invalid for 64-bit high 
field");
+BUILD_BUG_ON_MSG(__builtin_constant_p(field) && ((field) & 0x6000) == 
0x4000,
+"Natural width accessor invalid for 32-bit field");
+}
+
+static __always_inline unsigned long __vmcs_readl(unsigned long field)
 {
unsigned long value;
 
@@ -1458,23 +1502,32 @@ static __always_inline unsigned long 
vmcs_readl(unsigned long field)
 
 static __always_inline u16 vmcs_read16(unsigned long field)
 {
-   return vmcs_readl(field);
+   vmcs_check16(field);
+   return __vmcs_readl(field);
 }
 
 static __always_inline u32 vmcs_read32(unsigned long field)
 {
-   return vmcs_readl(field);
+   vmcs_check32(field);
+   return __vmcs_readl(field);
 }
 
 static __always_inline u64 vmcs_read64(unsigned long field)
 {
+   vmcs_check64(field);
 #ifdef CONFIG_X86_64
-   return vmcs_readl(field);
+   return __vmcs_readl(field);
 #else
-   return vmcs_readl(field) | ((u64)vmcs_readl(field+1) << 32);
+   return __vmcs_readl(field) | ((u64)__vmcs_readl(field+1) << 32);
 #endif
 }
 
+static __always_inline unsigned long vmcs_readl(unsigned long field)
+{
+   vmcs_checkl(field);
+   return __vmcs_readl(field);
+}
+
 static noinline void vmwrite_error(unsigned long field, unsigned long value)
 {
printk(KERN_ERR "vmwrite error: reg %lx value %lx (err %d)\n",
@@ -1482,7 +1535,7 @@ static noinline void vmwrite

Re: [GIT PULL 04/23] KVM: s390: rewrite vcpu_post_run and drop out early

2015-12-02 Thread Paolo Bonzini


On 02/12/2015 12:06, Christian Borntraeger wrote:
> + memcpy(>run->s.regs.gprs[14], >arch.sie_block->gg14, 16);

This is preexisting but... boy it's ugly. :)

Do you gain much over the simpler

vcpu->run->s.regs.gprs[14] = vcpu->arch.sie_block->gg14;
vcpu->run->s.regs.gprs[15] = vcpu->arch.sie_block->gg15;

?

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/9] drivers/hv: replace enum hv_message_type by u32

2015-12-02 Thread Paolo Bonzini


On 30/11/2015 17:22, Andrey Smetanin wrote:
> enum hv_message_type inside struct hv_message, hv_post_message
> is not size portable. Replace enum by u32.

It's only non-portable inside structs.  Okay to apply just these:

@@ -172,7 +174,7 @@ union hv_message_flags {

 /* Define synthetic interrupt controller message header. */
 struct hv_message_header {
-   u32 message_type;
+   enum hv_message_type message_type;
u8 payload_size;
union hv_message_flags message_flags;
u8 reserved[2];
@@ -345,7 +347,7 @@ enum hv_call_code {
 struct hv_input_post_message {
union hv_connection_id connectionid;
u32 reserved;
-   u32 message_type;
+   enum hv_message_type message_type;
u32 payload_size;
u64 payload[HV_MESSAGE_PAYLOAD_QWORD_COUNT];
 };

?

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [GIT PULL 00/23] KVM: s390 features, kvm_get_vcpu_by_id and stat for 4.5

2015-12-02 Thread Paolo Bonzini


On 02/12/2015 12:06, Christian Borntraeger wrote:
> Paolo,
> 
> here is the first s390 pull request for 4.5. It also contains the
> remaining vcpu lookup changes and an improved cleanup of the kvm_stat
> exit path.
> I have deferred the kvm_stat per VM patches.
> 
> The s390 changes are:
> - ESCA support (up to 248 CPUs)
> - detection if KVM works (e.g. for nested virtualization)
> - cleanups
> 
> The following changes since commit bb11c6c96544737aede6a2eb92e5c6bc8b46534b:
> 
>   KVM: x86: MMU: Remove unused parameter parent_pte from kvm_mmu_get_page() 
> (2015-11-26 15:31:36 +0100)
> 
> are available in the git reposi at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux.git  
> tags/kvm-s390-next-4.5-1
> 
> for you to fetch changes up to 2f8a43d45d14ad62b105ed99151b453c12df7149:
> 
>   KVM: s390: remove redudant assigment of error code (2015-11-30 12:47:13 
> +0100)
> 
> 
> KVM: s390 features, kvm_get_vcpu_by_id and stat
> 
> Several features for s390
> 1. ESCA support (up to 248 vCPUs)
> 2. KVM detection: we  can now detect if we support KVM (e.g. does KVM
>under KVM work?)
> 
> kvm_stat:
> 1. cleanup
> 
> kvm_get_vcpu_by_id:
> 1. Use kvm_get_vcpu_by_id where appropriate
> 2. Apply a heuristic to optimize for ID VCPU == No. VCPU
> 
> 
> Christian Borntraeger (1):
>   KVM: s390: remove redudant assigment of error code
> 
> David Hildenbrand (12):
>   KVM: Use common function for VCPU lookup by id
>   KVM: use heuristic for fast VCPU lookup by id
>   KVM: s390: rewrite vcpu_post_run and drop out early
>   KVM: s390: fast path for sca_ext_call_pending
>   KVM: s390: we always have a SCA
>   KVM: s390: fix SCA related races and double use
>   KVM: s390: always set/clear the SCA sda field
>   KVM: s390: cleanup sca_add_vcpu
>   KVM: s390: don't switch to ESCA for ucontrol
>   s390/sclp: introduce check for SIE
>   s390: show virtualization support in /proc/cpuinfo
>   KVM: s390: don't load kvm without virtualization support
> 
> Eugene (jno) Dvurechenski (8):
>   s390/sclp: introduce checks for ESCA and HVS
>   KVM: s390: Generalize access to IPTE controls
>   KVM: s390: Generalize access to SIGP controls
>   KVM: s390: Provide SCA-aware helpers for VCPU add/del
>   KVM: s390: Introduce new structures
>   KVM: s390: Make provisions for ESCA utilization
>   KVM: s390: Introduce switching code
>   KVM: s390: Enable up to 248 VCPUs per VM
> 
> Heiko Carstens (1):
>   KVM: s390: remove pointless test_facility(2) check
> 
> Janosch Frank (1):
>   KVM: Remove unnecessary debugfs dentry references
> 
>  arch/powerpc/kvm/book3s_hv.c |  10 +-
>  arch/s390/include/asm/elf.h  |   7 ++
>  arch/s390/include/asm/kvm_host.h |  49 +++-
>  arch/s390/include/asm/sclp.h |   8 +-
>  arch/s390/kernel/processor.c |   6 +
>  arch/s390/kernel/setup.c |   9 ++
>  arch/s390/kvm/diag.c |  11 +-
>  arch/s390/kvm/gaccess.c  |  38 +--
>  arch/s390/kvm/intercept.c|   7 +-
>  arch/s390/kvm/interrupt.c| 133 +-
>  arch/s390/kvm/kvm-s390.c | 237 
> +++
>  arch/s390/kvm/kvm-s390.h |   7 ++
>  drivers/s390/char/sclp_early.c   |   8 +-
>  include/linux/kvm_host.h |   6 +-
>  virt/kvm/kvm_main.c  |  30 ++---
>  15 files changed, 407 insertions(+), 159 deletions(-)
> 

Pulled, thanks.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [GIT PULL 04/23] KVM: s390: rewrite vcpu_post_run and drop out early

2015-12-02 Thread Paolo Bonzini


On 02/12/2015 14:04, Christian Borntraeger wrote:
>> > Do you gain much over the simpler
>> > 
>> >vcpu->run->s.regs.gprs[14] = vcpu->arch.sie_block->gg14;
>> >vcpu->run->s.regs.gprs[15] = vcpu->arch.sie_block->gg15;
>> > 
> Its just legacy code from the old days.
> There is a difference, but it seems to a missed opportunity from gcc
> 
> vcpu->arch.sie_block->gg14 = vcpu->run->s.regs.gprs[14];
> 839c:   e3 30 f0 b8 00 04   lg  %r3,184(%r15)
> 83a2:   e3 10 32 40 00 04   lg  %r1,576(%r3)
> 83a8:   e3 20 30 80 00 04   lg  %r2,128(%r3)
> 83ae:   e3 20 21 b8 00 04   lg  %r2,440(%r2)
> 83b4:   e3 20 10 a0 00 24   stg %r2,160(%r1)
> vcpu->arch.sie_block->gg15 = vcpu->run->s.regs.gprs[15];
> 83ba:   e3 10 32 40 00 04   lg  %r1,576(%r3)
> 83c0:   e3 20 30 80 00 04   lg  %r2,128(%r3)
> 83c6:   e3 20 21 c0 00 04   lg  %r2,448(%r2)
> 83cc:   e3 20 10 a8 00 24   stg %r2,168(%r1)
> 
> gcc seems to reuse and reload %r2 and %r3, maybe register pressure.

More likely to be -fno-strict-aliasing. :(

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests] x86: use asm volatile for flags and segment register read/writes

2015-12-02 Thread Paolo Bonzini
The effects of a move from or to these registers is not entirely described
by the asm's operands.  Therefore, it may happen that the compiler
moves the asm around in ways that break tests.

In one case, the compiler marked read_ss() as pure and thus subjected
it to common subexpression elimination:

u16 ss = read_ss();

// check for null segment load
*mem = 0;
asm volatile("mov %0, %%ss" : : "m"(*mem));
report("mov null, %%ss", read_ss() == 0);

This caused a spurious failure of the test.

Reported-by: Lucas Meneguel Rodrigues <l...@scylladb.com>
Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
---
 lib/x86/processor.h | 26 +-
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/lib/x86/processor.h b/lib/x86/processor.h
index 1816807..95cea1a 100644
--- a/lib/x86/processor.h
+++ b/lib/x86/processor.h
@@ -61,7 +61,7 @@ static inline u16 read_cs(void)
 {
 unsigned val;
 
-asm ("mov %%cs, %0" : "=mr"(val));
+asm volatile ("mov %%cs, %0" : "=mr"(val));
 return val;
 }
 
@@ -69,7 +69,7 @@ static inline u16 read_ds(void)
 {
 unsigned val;
 
-asm ("mov %%ds, %0" : "=mr"(val));
+asm volatile ("mov %%ds, %0" : "=mr"(val));
 return val;
 }
 
@@ -77,7 +77,7 @@ static inline u16 read_es(void)
 {
 unsigned val;
 
-asm ("mov %%es, %0" : "=mr"(val));
+asm volatile ("mov %%es, %0" : "=mr"(val));
 return val;
 }
 
@@ -85,7 +85,7 @@ static inline u16 read_ss(void)
 {
 unsigned val;
 
-asm ("mov %%ss, %0" : "=mr"(val));
+asm volatile ("mov %%ss, %0" : "=mr"(val));
 return val;
 }
 
@@ -93,7 +93,7 @@ static inline u16 read_fs(void)
 {
 unsigned val;
 
-asm ("mov %%fs, %0" : "=mr"(val));
+asm volatile ("mov %%fs, %0" : "=mr"(val));
 return val;
 }
 
@@ -101,45 +101,45 @@ static inline u16 read_gs(void)
 {
 unsigned val;
 
-asm ("mov %%gs, %0" : "=mr"(val));
+asm volatile ("mov %%gs, %0" : "=mr"(val));
 return val;
 }
 
 static inline unsigned long read_rflags(void)
 {
unsigned long f;
-   asm ("pushf; pop %0\n\t" : "=rm"(f));
+   asm volatile ("pushf; pop %0\n\t" : "=rm"(f));
return f;
 }
 
 static inline void write_ds(unsigned val)
 {
-asm ("mov %0, %%ds" : : "rm"(val) : "memory");
+asm volatile ("mov %0, %%ds" : : "rm"(val) : "memory");
 }
 
 static inline void write_es(unsigned val)
 {
-asm ("mov %0, %%es" : : "rm"(val) : "memory");
+asm volatile ("mov %0, %%es" : : "rm"(val) : "memory");
 }
 
 static inline void write_ss(unsigned val)
 {
-asm ("mov %0, %%ss" : : "rm"(val) : "memory");
+asm volatile ("mov %0, %%ss" : : "rm"(val) : "memory");
 }
 
 static inline void write_fs(unsigned val)
 {
-asm ("mov %0, %%fs" : : "rm"(val) : "memory");
+asm volatile ("mov %0, %%fs" : : "rm"(val) : "memory");
 }
 
 static inline void write_gs(unsigned val)
 {
-asm ("mov %0, %%gs" : : "rm"(val) : "memory");
+asm volatile ("mov %0, %%gs" : : "rm"(val) : "memory");
 }
 
 static inline void write_rflags(unsigned long f)
 {
-   asm ("push %0; popf\n\t" : : "rm"(f));
+asm volatile ("push %0; popf\n\t" : : "rm"(f));
 }
 
 static inline u64 rdmsr(u32 index)
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: gva_to_gpa function internals

2015-12-01 Thread Paolo Bonzini


On 01/12/2015 19:30, Yacine HEBBAL wrote:
> Hi all,
> I'm trying to build some tools on top of kvm in order to debug, monitor and
> reverse engineer the guest OS (ubuntu 12.04, 32 bits)
> One of my tools walks through (and prints) the guest paging data structures
> as following: cr3 -> pdpte -> pde -> pte -> page (PAE paging, 32 bits)
> 
> According to my logs some accessed kernel PTEs are not present (pte =
> 9090909090909090) in all processes address spaces (even from init process
> cr3), however when I use the function kvm_read_guest_virt_helper on their
> corresponding virtual addresses (GVAs), I get a correct content (content
> correctness checked using system.map file).
> Just after calling kvm_read_guest_virt_helper, I check again the PTE
> corresponding to the read gva, I see that they are unmapped (invalid, always
> 9090909090909090)
> 
> I investigated a little the code of kvm_read_guest_virt_helper, this
> function calls vcpu->arch.walk_mmu->gva_to_gpa(vcpu, gva, ...) which in turn
> calls other functions until FNAME(walk_addr_generic) which seems to do the
> translation.
> walk_addr_generic seems to do the translation starting from cr3 of the
> current process (in line: mmu->get_cr3(vcpu);) and works fine regardless of
> the identity of the current process (i.e. current cr3).
> 
> So how the function gva_to_gpa is able to the read correctly any GVA that my
> tool sees invalid (unmapped) in the paging structures, knowing that my tool
> is able to read and display correctly a content of (thousands) many other 
> GVAs ?
> I would be very thankful for any feedback :)

Unfortunately that's impossible to know without knowing your tool.  How
does it read guest memory?

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [for-2.6 PATCH 1/3] target-i386: Define structs for layout of xsave area

2015-12-01 Thread Paolo Bonzini


On 01/12/2015 18:20, Richard Henderson wrote:
>>
>> X86XSaveArea will be used only when loading/saving state using
>> xsave, not for executing regular instructions.
> 
> ... like the regular instruction xsave?
> 
> https://patchwork.ozlabs.org/patch/493318/

Right, but that's a helper anyway.

>> In X86CPU, the
>> data is already stored as XMMReg unions (the one with the
>> XMM_[BWDQ] helpers).
> 
> Of course.  But those unions are arranged to be in big-endian format on
> big-endian hosts.  So we need to swap the data back to little-endian
> format for storage into guest memory.

Yes, you can use byte moves with XMM_B (more obvious), or stq_le_p with
XMM_Q (faster I guess---though the compiler might optimize the former on
little-endian hosts).  Either works with an uint8_t[] destination.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: KVM with PCI forwarding really slow after 4.1

2015-12-01 Thread Paolo Bonzini


On 01/12/2015 18:09, Michael Büsch wrote:
> Hi,
> 
> I use "-device pci-assign,host=00:1a.0" to forward a USB host chip
> to a Win7 32 bit inside of qemu/kvm. That used to work pretty well,
> but it broke horribly somewhere after 4.1. With recent kernels the
> virtual machine boots, but is _very_ slow. It takes hours to boot. 
> If PCI forwarding is disabled, everything is fine.

This has been reported already, I'm going to look at it this week.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/11] KVM: x86: track guest page access

2015-12-01 Thread Paolo Bonzini


On 30/11/2015 19:26, Xiao Guangrong wrote:
> This patchset introduces the feature which allows us to track page
> access in guest. Currently, only write access tracking is implemented
> in this version.
> 
> Four APIs are introduces:
> - kvm_page_track_add_page(kvm, gfn, mode), single guest page @gfn is
>   added into the track pool of the guest instance represented by @kvm,
>   @mode specifies which kind of access on the @gfn is tracked
>   
> - kvm_page_track_remove_page(kvm, gfn, mode), is the opposed operation
>   of kvm_page_track_add_page() which removes @gfn from the tracking pool.
>   gfn is no tracked after its last user is gone
> 
> - kvm_page_track_register_notifier(kvm, n), register a notifier so that
>   the event triggered by page tracking will be received, at that time,
>   the callback of n->track_write() will be called
> 
> - kvm_page_track_unregister_notifier(kvm, n), does the opposed operation
>   of kvm_page_track_register_notifier(), which unlinks the notifier and
>   stops receiving the tracked event
> 
> The first user of page track is non-leaf shadow page tables as they are
> always write protected. It also gains performance improvement because
> page track speeds up page fault handler for the tracked pages. The
> performance result of kernel building is as followings:
> 
>before   after
> real 461.63   real 455.48
> user 4529.55  user 4557.88
> sys 1995.39   sys 1922.57

For KVM-GT, as far as I know Andrea Arcangeli is working on extending
userfaultfd to tracking write faults only.  Perhaps KVM-GT can do
something similar, where KVM gets the write tracking functionality for
free through the MMU notifiers.  Any thoughts on this?

Applying your technique to non-leaf shadow pages actually makes this
series quite interesting. :)  Shadow paging is still in use for nested
EPT, so it's always a good idea to speed it up.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 0/3] target-i386: Use C struct for xsave area layout, offsets & sizes

2015-12-01 Thread Paolo Bonzini
On 30/11/2015 18:34, Eduardo Habkost wrote:
> target-i386/cpu.c:ext_save_area uses magic numbers for the xsave
> area offets and sizes, and target-i386/kvm.c:kvm_{put,get}_xsave()
> uses offset macros and bit manipulation to access the xsave area.
> This series changes both to use C structs for those operations.
> 
> I still need to figure out a way to write unit tests for the new
> code. Maybe I will just copy and paste the new and old functions,
> and test them locally (checking if they give the same results
> when translating blobs of random bytes).
> 
> Changes v1 -> v2:
> * Use uint8_t[8*n] instead of uint64_t[n] for register data
> * Keep the QEMU_BUILD_BUG_ON lines
> 
> v1 -> v2 diff below:
> 
>   diff --git a/target-i386/cpu.h b/target-i386/cpu.h
>   index 3d1d01e..41f55ef 100644
>   --- a/target-i386/cpu.h
>   +++ b/target-i386/cpu.h
>   @@ -818,7 +818,7 @@ typedef union X86LegacyXSaveArea {
>uint32_t mxcsr;
>uint32_t mxcsr_mask;
>FPReg fpregs[8];
>   -uint64_t xmm_regs[16][2];
>   +uint8_t xmm_regs[16][16];
>};
>uint8_t data[512];
>} X86LegacyXSaveArea;
>   @@ -831,7 +831,7 @@ typedef struct X86XSaveHeader {
> 
>/* Ext. save area 2: AVX State */
>typedef struct XSaveAVX {
>   -uint64_t ymmh[16][2];
>   +uint8_t ymmh[16][16];
>} XSaveAVX;
> 
>/* Ext. save area 3: BNDREG */
>   @@ -852,12 +852,12 @@ typedef struct XSaveOpmask {
> 
>/* Ext. save area 6: ZMM_Hi256 */
>typedef struct XSaveZMM_Hi256 {
>   -uint64_t zmm_hi256[16][4];
>   +uint8_t zmm_hi256[16][32];
>} XSaveZMM_Hi256;
> 
>/* Ext. save area 7: Hi16_ZMM */
>typedef struct XSaveHi16_ZMM {
>   -XMMReg hi16_zmm[16];
>   +uint8_t hi16_zmm[16][64];
>} XSaveHi16_ZMM;
> 
>typedef struct X86XSaveArea {
>   diff --git a/target-i386/kvm.c b/target-i386/kvm.c
>   index 5e7ec70..98249e4 100644
>   --- a/target-i386/kvm.c
>   +++ b/target-i386/kvm.c
>   @@ -1203,6 +1203,43 @@ static int kvm_put_fpu(X86CPU *cpu)
>return kvm_vcpu_ioctl(CPU(cpu), KVM_SET_FPU, );
>}
> 
>   +#define XSAVE_FCW_FSW 0
>   +#define XSAVE_FTW_FOP 1
>   +#define XSAVE_CWD_RIP 2
>   +#define XSAVE_CWD_RDP 4
>   +#define XSAVE_MXCSR   6
>   +#define XSAVE_ST_SPACE8
>   +#define XSAVE_XMM_SPACE   40
>   +#define XSAVE_XSTATE_BV   128
>   +#define XSAVE_YMMH_SPACE  144
>   +#define XSAVE_BNDREGS 240
>   +#define XSAVE_BNDCSR  256
>   +#define XSAVE_OPMASK  272
>   +#define XSAVE_ZMM_Hi256   288
>   +#define XSAVE_Hi16_ZMM416
>   +
>   +#define XSAVE_BYTE_OFFSET(word_offset) \
>   +((word_offset)*sizeof(((struct kvm_xsave*)0)->region[0]))
>   +
>   +#define ASSERT_OFFSET(word_offset, field) \
>   +QEMU_BUILD_BUG_ON(XSAVE_BYTE_OFFSET(word_offset) != \
>   +  offsetof(X86XSaveArea, field))
>   +
>   +ASSERT_OFFSET(XSAVE_FCW_FSW, legacy.fcw);
>   +ASSERT_OFFSET(XSAVE_FTW_FOP, legacy.ftw);
>   +ASSERT_OFFSET(XSAVE_CWD_RIP, legacy.fpip);
>   +ASSERT_OFFSET(XSAVE_CWD_RDP, legacy.fpdp);
>   +ASSERT_OFFSET(XSAVE_MXCSR, legacy.mxcsr);
>   +ASSERT_OFFSET(XSAVE_ST_SPACE, legacy.fpregs);
>   +ASSERT_OFFSET(XSAVE_XMM_SPACE, legacy.xmm_regs);
>   +ASSERT_OFFSET(XSAVE_XSTATE_BV, header.xstate_bv);
>   +ASSERT_OFFSET(XSAVE_YMMH_SPACE, avx_state);
>   +ASSERT_OFFSET(XSAVE_BNDREGS, bndreg_state);
>   +ASSERT_OFFSET(XSAVE_BNDCSR, bndcsr_state);
>   +ASSERT_OFFSET(XSAVE_OPMASK, opmask_state);
>   +ASSERT_OFFSET(XSAVE_ZMM_Hi256, zmm_hi256_state);
>   +ASSERT_OFFSET(XSAVE_Hi16_ZMM, hi16_zmm_state);
>   +
>static int kvm_put_xsave(X86CPU *cpu)
>{
>CPUX86State *env = >env;
>   @@ -1239,17 +1276,17 @@ static int kvm_put_xsave(X86CPU *cpu)
>sizeof env->opmask_regs);
> 
>for (i = 0; i < CPU_NB_REGS; i++) {
>   -X86LegacyXSaveArea *legacy = >legacy;
>   -XSaveAVX *avx = >avx_state;
>   -XSaveZMM_Hi256 *zmm_hi256 = >zmm_hi256_state;
>   -stq_p(>xmm_regs[i][0], env->xmm_regs[i].XMM_Q(0));
>   -stq_p(>xmm_regs[i][1], env->xmm_regs[i].XMM_Q(1));
>   -stq_p(>ymmh[i][0],env->xmm_regs[i].XMM_Q(2));
>   -stq_p(>ymmh[i][1],env->xmm_regs[i].XMM_Q(3));
>   -stq_p(_hi256->zmm_hi256[i][0], env->xmm_regs[i].XMM_Q(4));
>   -stq_p(_hi256->zmm_hi256[i][1], env->xmm_regs[i].XMM_Q(5));
>   -stq_p(_hi256->zmm_hi256[i][2], env->xmm_regs[i].XMM_Q(6));
>   -stq_p(_hi256->zmm_hi256[i][3], env->xmm_regs[i].XMM_Q(7));
>   +uint8_t *xmm = xsave->legacy.xmm_regs[i];
>   +uint8_t *ymmh = xsave->avx_state.ymmh[i];
>   +uint8_t *zmmh = xsave->zmm_hi256_state.zmm_hi256[i];
>   +stq_p(xmm, env->xmm_regs[i].XMM_Q(0));
>   +stq_p(xmm+8,   env->xmm_regs[i].XMM_Q(1));
>   +stq_p(ymmh,env->xmm_regs[i].XMM_Q(2));
>   +stq_p(ymmh+8,  env->xmm_regs[i].XMM_Q(3));
>   +stq_p(zmmh,

Re: [PATCH v2 0/3] target-i386: Use C struct for xsave area layout, offsets & sizes

2015-12-01 Thread Paolo Bonzini


On 30/11/2015 18:34, Eduardo Habkost wrote:
> target-i386/cpu.c:ext_save_area uses magic numbers for the xsave
> area offets and sizes, and target-i386/kvm.c:kvm_{put,get}_xsave()
> uses offset macros and bit manipulation to access the xsave area.
> This series changes both to use C structs for those operations.
> 
> I still need to figure out a way to write unit tests for the new
> code. Maybe I will just copy and paste the new and old functions,
> and test them locally (checking if they give the same results
> when translating blobs of random bytes).

I think it's easier to use small guests (i.e. kvm-unit-tests) to test
this code.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 0/3] target-i386: Use C struct for xsave area layout, offsets & sizes

2015-12-01 Thread Paolo Bonzini


On 01/12/2015 16:25, Eduardo Habkost wrote:
> > I think it's easier to use small guests (i.e. kvm-unit-tests) to test
> > this code.
>
> I agree it's easier, but how likely it is to catch bugs in the
> save/load code? If the code corrupts a register, we need to
> trigger a save/load cycle at the exact moment the guest code is
> using that register. Do we have something that helps us
> repeatedly save/load CPU state while kvm-unit-tests is running?

A vmware magic port read should do that.  Put VMPORT_MAGIC in EAX and
VMPORT_CMD_GETVERSION in ECX, then do a 32-bit in from port 0x5658.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 2/2] KVM: Make KVM_CAP_IRQFD dependent on KVM_CAP_IRQCHIP

2015-12-01 Thread Paolo Bonzini


On 30/11/2015 15:38, Cornelia Huck wrote:
> It obviously
> requires an irqchip; but if you need some configuration/enablement
> beforehand, you'll get different values depending on when you retrieve
> the cap. So does KVM_CAP_IRQFD mean "irqfds are available in principle"
> or "everything has been setup for usage of irqfds"? I'd assume the
> former.

It should be the former, yes.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/11] KVM: x86: track guest page access

2015-12-01 Thread Paolo Bonzini


On 01/12/2015 16:02, Andrea Arcangeli wrote:
> > Applying your technique to non-leaf shadow pages actually makes this
> > series quite interesting. :)  Shadow paging is still in use for nested
> > EPT, so it's always a good idea to speed it up.
> 
> I don't have the full picture of how userfaultfd write tracking could
> also fit in the leaf/non-leaf shadow pagetable write tracking yet but
> it's good to think about it.

It's unrelated.

Xiao wrote this series for KVM-GT.  I'm suggesting that he uses
userfaultfd write tracking (or similar techniques---but anyway
implemented out of KVM) for KVM-GT.  The benefit is that KVM-GT is then
unrelated to KVM, similar to legacy KVM device assignment vs. VFIO.

However, he also applied this new API to shadow pagetable write
tracking.  He gets measurable (~2%) performance improvement.  We can
look separately at how to get a similar performance improvement, even if
KVM-GT will not use the new page tracking API.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [for-2.6 PATCH 1/3] target-i386: Define structs for layout of xsave area

2015-11-30 Thread Paolo Bonzini


On 28/11/2015 20:56, Eduardo Habkost wrote:
> +/* Ext. save area 2: AVX State */
> +typedef struct XSaveAVX {
> +uint64_t ymmh[16][2];
> +} XSaveAVX;
> +

Because this is always little endian, I would write it as uint8_t[16][16].

> +/* Ext. save area 6: ZMM_Hi256 */
> +typedef struct XSaveZMM_Hi256 {
> +uint64_t zmm_hi256[16][4];
> +} XSaveZMM_Hi256;

Same here (uint8_t[16][32]).

> +/* Ext. save area 7: Hi16_ZMM */
> +typedef struct XSaveHi16_ZMM {
> +XMMReg hi16_zmm[16];
> +} XSaveHi16_ZMM;

This is uint8_t[16][64] or uint64_t[16][8].

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [for-2.6 PATCH 0/3] target-i386: Use C struct for xsave area layout, offsets & sizes

2015-11-30 Thread Paolo Bonzini


On 28/11/2015 20:56, Eduardo Habkost wrote:
> I still need to figure out a way to write unit tests for the new
> code. Maybe I will just copy and paste the new and old functions,
> and test them locally (checking if they give the same results
> when translating blobs of random bytes).

Aren't the QEMU_BUILD_BUG_ON enough?  No need to delete them in patch 3,
though perhaps you can remove the #defines.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 0/5] KVM-UNIT-TESTS: Hyper-V SynIC timers test

2015-11-27 Thread Paolo Bonzini


On 26/11/2015 17:29, Andrey Smetanin wrote:
> The test checks Hyper-V SynIC timers functionality.
> The test runs on every vCPU and performs start/stop
> of periodic/one-shot timers (with period=1ms) and checks
> validity of received expiration messages in appropriate
> ISR's.
> 
> Signed-off-by: Andrey Smetanin <asmeta...@virtuozzo.com>
> Reviewed-by: Roman Kagan <rka...@virtuozzo.com>
> CC: Paolo Bonzini <pbonz...@redhat.com>
> CC: Marcelo Tosatti <mtosa...@redhat.com>
> CC: Roman Kagan <rka...@virtuozzo.com>
> CC: Denis V. Lunev <d...@openvz.org>
> CC: qemu-de...@nongnu.org
> 
> Andrey Smetanin (5):
>   lib/x86: Added Hyper-V MSR's availability bits into msr.h
>   lib/x86: Added HV_X64_MSR_TIME_REF_COUNT value into msr.h
>   lib/x86: Added Hyper-V SynIC timers MSR's values
>   lib/x86: Make free_page() available to call
>   x86: Hyper-V SynIC timers test

In addition to my comments on 5/5, can you make instead a hyperv.h file
with all that you need in the two testcases?

Paolo

>  config/config-x86-common.mak |   4 +-
>  lib/x86/msr.h|  19 ++
>  lib/x86/vm.h |   1 +
>  x86/hyperv_stimer.c  | 500 
> +++
>  x86/hyperv_synic.c   |   2 +-
>  x86/unittests.cfg|   5 +
>  6 files changed, 529 insertions(+), 2 deletions(-)
>  create mode 100644 x86/hyperv_stimer.c
> 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 5/5] x86: Hyper-V SynIC timers test

2015-11-27 Thread Paolo Bonzini


On 27/11/2015 12:30, Andrey Smetanin wrote:
>>>
>>> +
>>> +static void stimer_test_cleanup(void *ctx)
>>> +{
>>> +irq_enable();
>>
>> Why enable again?
> I'll remove it.

I guess you can remove the one in stimer_test_prepare too.  If the
interrupts are disabled you don't get the IPI either, do you?

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 5/5] x86: Hyper-V SynIC timers test

2015-11-27 Thread Paolo Bonzini
The test logic is good, but the glue can be improved a bit so that the
output is more useful if it breaks.

On 26/11/2015 17:29, Andrey Smetanin wrote:
> The test checks Hyper-V SynIC timers functionality.
> The test runs on every vCPU and performs start/stop
> of periodic/one-shot timers (with period=1ms) and checks
> validity of received expiration messages in appropriate
> ISR's.
> 
> Signed-off-by: Andrey Smetanin <asmeta...@virtuozzo.com>
> Reviewed-by: Roman Kagan <rka...@virtuozzo.com>
> CC: Paolo Bonzini <pbonz...@redhat.com>
> CC: Marcelo Tosatti <mtosa...@redhat.com>
> CC: Roman Kagan <rka...@virtuozzo.com>
> CC: Denis V. Lunev <d...@openvz.org>
> CC: qemu-de...@nongnu.org
> 
> ---
>  config/config-x86-common.mak |   4 +-
>  x86/hyperv_stimer.c  | 500 
> +++
>  x86/unittests.cfg|   5 +
>  3 files changed, 508 insertions(+), 1 deletion(-)
>  create mode 100644 x86/hyperv_stimer.c
> 
> diff --git a/config/config-x86-common.mak b/config/config-x86-common.mak
> index f64874d..a75be87 100644
> --- a/config/config-x86-common.mak
> +++ b/config/config-x86-common.mak
> @@ -37,7 +37,7 @@ tests-common = $(TEST_DIR)/vmexit.flat $(TEST_DIR)/tsc.flat 
> \
> $(TEST_DIR)/s3.flat $(TEST_DIR)/pmu.flat \
> $(TEST_DIR)/tsc_adjust.flat $(TEST_DIR)/asyncpf.flat \
> $(TEST_DIR)/init.flat $(TEST_DIR)/smap.flat \
> -   $(TEST_DIR)/hyperv_synic.flat
> +   $(TEST_DIR)/hyperv_synic.flat $(TEST_DIR)/hyperv_stimer.flat \
>  
>  ifdef API
>  tests-common += api/api-sample
> @@ -115,6 +115,8 @@ $(TEST_DIR)/memory.elf: $(cstart.o) $(TEST_DIR)/memory.o
>  
>  $(TEST_DIR)/hyperv_synic.elf: $(cstart.o) $(TEST_DIR)/hyperv_synic.o
>  
> +$(TEST_DIR)/hyperv_stimer.elf: $(cstart.o) $(TEST_DIR)/hyperv_stimer.o
> +
>  arch_clean:
>   $(RM) $(TEST_DIR)/*.o $(TEST_DIR)/*.flat $(TEST_DIR)/*.elf \
>   $(TEST_DIR)/.*.d lib/x86/.*.d
> diff --git a/x86/hyperv_stimer.c b/x86/hyperv_stimer.c
> new file mode 100644
> index 000..e9186ca
> --- /dev/null
> +++ b/x86/hyperv_stimer.c
> @@ -0,0 +1,500 @@
> +#include "libcflat.h"
> +#include "processor.h"
> +#include "msr.h"
> +#include "isr.h"
> +#include "vm.h"
> +#include "apic.h"
> +#include "desc.h"
> +#include "io.h"
> +#include "smp.h"
> +#include "atomic.h"
> +
> +#define MAX_CPUS 4
> +#define HYPERV_CPUID_FEATURES   0x4003
> +
> +#define HV_SYNIC_CONTROL_ENABLE (1ULL << 0)
> +#define HV_SYNIC_SIMP_ENABLE(1ULL << 0)
> +#define HV_SYNIC_SIEFP_ENABLE   (1ULL << 0)
> +#define HV_SYNIC_SINT_MASKED(1ULL << 16)
> +#define HV_SYNIC_SINT_AUTO_EOI  (1ULL << 17)
> +#define HV_SYNIC_SINT_VECTOR_MASK   (0xFF)
> +#define HV_SYNIC_SINT_COUNT 16
> +
> +#define HV_STIMER_ENABLE(1ULL << 0)
> +#define HV_STIMER_PERIODIC  (1ULL << 1)
> +#define HV_STIMER_LAZY  (1ULL << 2)
> +#define HV_STIMER_AUTOENABLE(1ULL << 3)
> +#define HV_STIMER_SINT(config)  (__u8)(((config) >> 16) & 0x0F)
> +
> +#define HV_SYNIC_STIMER_COUNT   (4)
> +
> +/* Define synthetic interrupt controller message constants. */
> +#define HV_MESSAGE_SIZE (256)
> +#define HV_MESSAGE_PAYLOAD_BYTE_COUNT   (240)
> +#define HV_MESSAGE_PAYLOAD_QWORD_COUNT  (30)
> +
> +/* Define hypervisor message types. */
> +enum hv_message_type {
> +HVMSG_NONE  = 0x,
> +
> +/* Memory access messages. */
> +HVMSG_UNMAPPED_GPA  = 0x8000,
> +HVMSG_GPA_INTERCEPT = 0x8001,
> +
> +/* Timer notification messages. */
> +HVMSG_TIMER_EXPIRED = 0x8010,
> +
> +/* Error messages. */
> +HVMSG_INVALID_VP_REGISTER_VALUE = 0x8020,
> +HVMSG_UNRECOVERABLE_EXCEPTION   = 0x8021,
> +HVMSG_UNSUPPORTED_FEATURE   = 0x8022,
> +
> +/* Trace buffer complete messages. */
> +HVMSG_EVENTLOG_BUFFERCOMPLETE   = 0x8040,
> +
> +/* Platform-specific processor intercept messages. */
> +HVMSG_X64_IOPORT_INTERCEPT  = 0x8001,
> +HVMSG_X64_MSR_INTERCEPT = 0x80010001,
> +HVMSG_X64_CPUID_INTERCEPT   = 0x80010002,
> +HVMSG_X64_EXCEPTION_INTERC

Re: [PATCH v1 2/7] drivers/hv: Move struct hv_message into UAPI Hyper-V x86 header

2015-11-27 Thread Paolo Bonzini


On 25/11/2015 16:20, Andrey Smetanin wrote:
> This struct is required for Hyper-V SynIC timers implementation inside KVM
> and for upcoming Hyper-V VMBus support by userspace(QEMU). So place it into
> Hyper-V UAPI header.
> 
> Signed-off-by: Andrey Smetanin <asmeta...@virtuozzo.com>
> Reviewed-by: Roman Kagan <rka...@virtuozzo.com>
> CC: Gleb Natapov <g...@kernel.org>
> CC: Paolo Bonzini <pbonz...@redhat.com>
> CC: "K. Y. Srinivasan" <k...@microsoft.com>
> CC: Haiyang Zhang <haiya...@microsoft.com>
> CC: Vitaly Kuznetsov <vkuzn...@redhat.com>
> CC: Roman Kagan <rka...@virtuozzo.com>
> CC: Denis V. Lunev <d...@openvz.org>
> CC: qemu-de...@nongnu.org
> ---
>  arch/x86/include/uapi/asm/hyperv.h | 91 
> ++
>  drivers/hv/hyperv_vmbus.h  | 91 
> --
>  2 files changed, 91 insertions(+), 91 deletions(-)
> 
> diff --git a/arch/x86/include/uapi/asm/hyperv.h 
> b/arch/x86/include/uapi/asm/hyperv.h
> index 07981f0..e86d77e 100644
> --- a/arch/x86/include/uapi/asm/hyperv.h
> +++ b/arch/x86/include/uapi/asm/hyperv.h
> @@ -271,4 +271,95 @@ typedef struct _HV_REFERENCE_TSC_PAGE {
>  
>  #define HV_SYNIC_STIMER_COUNT(4)
>  
> +/* Define synthetic interrupt controller message constants. */
> +#define HV_MESSAGE_SIZE  (256)
> +#define HV_MESSAGE_PAYLOAD_BYTE_COUNT(240)
> +#define HV_MESSAGE_PAYLOAD_QWORD_COUNT   (30)
> +
> +/* Define hypervisor message types. */
> +enum hv_message_type {
> + HVMSG_NONE  = 0x,
> +
> + /* Memory access messages. */
> + HVMSG_UNMAPPED_GPA  = 0x8000,
> + HVMSG_GPA_INTERCEPT = 0x8001,
> +
> + /* Timer notification messages. */
> + HVMSG_TIMER_EXPIRED = 0x8010,
> +
> + /* Error messages. */
> + HVMSG_INVALID_VP_REGISTER_VALUE = 0x8020,
> + HVMSG_UNRECOVERABLE_EXCEPTION   = 0x8021,
> + HVMSG_UNSUPPORTED_FEATURE   = 0x8022,
> +
> + /* Trace buffer complete messages. */
> + HVMSG_EVENTLOG_BUFFERCOMPLETE   = 0x8040,
> +
> + /* Platform-specific processor intercept messages. */
> + HVMSG_X64_IOPORT_INTERCEPT  = 0x8001,
> + HVMSG_X64_MSR_INTERCEPT = 0x80010001,
> + HVMSG_X64_CPUID_INTERCEPT   = 0x80010002,
> + HVMSG_X64_EXCEPTION_INTERCEPT   = 0x80010003,
> + HVMSG_X64_APIC_EOI  = 0x80010004,
> + HVMSG_X64_LEGACY_FP_ERROR   = 0x80010005
> +};
> +
> +/* Define synthetic interrupt controller message flags. */
> +union hv_message_flags {
> + __u8 asu8;
> + struct {
> + __u8 msg_pending:1;
> + __u8 reserved:7;
> + };
> +};
> +
> +/* Define port identifier type. */
> +union hv_port_id {
> + __u32 asu32;
> + struct {
> + __u32 id:24;
> + __u32 reserved:8;
> + } u;
> +};
> +
> +/* Define port type. */
> +enum hv_port_type {
> + HVPORT_MSG  = 1,
> + HVPORT_EVENT= 2,
> + HVPORT_MONITOR  = 3
> +};
> +
> +/* Define synthetic interrupt controller message header. */
> +struct hv_message_header {
> + enum hv_message_type message_type;

Do not declare this as an enum, declare it as __u32 to make the size
portable.  It can be a patch on top.

KY, can you ack these two patches?

Paolo

> + __u8 payload_size;
> + union hv_message_flags message_flags;
> + __u8 reserved[2];
> + union {
> + __u64 sender;
> + union hv_port_id port;
> + };
> +};
> +
> +/* Define timer message payload structure. */
> +struct hv_timer_message_payload {
> + __u32 timer_index;
> + __u32 reserved;
> + __u64 expiration_time;  /* When the timer expired */
> + __u64 delivery_time;/* When the message was delivered */
> +};
> +
> +/* Define synthetic interrupt controller message format. */
> +struct hv_message {
> + struct hv_message_header header;
> + union {
> + __u64 payload[HV_MESSAGE_PAYLOAD_QWORD_COUNT];
> + } u;
> +};
> +
> +/* Define the synthetic interrupt message page layout. */
> +struct hv_message_page {
> + struct hv_message sint_message[HV_SYNIC_SINT_COUNT];
> +};
> +
>  #endif
> diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
> index 46e23d1..d22230c 100644
> --- a/drivers/hv/hyperv_vmbus.h
> +++ b/drivers/hv/hyperv_vmbus.h
> @@ -63,10 +63,6 @@ enum hv_cpuid_function {
>  /* Define version of the synthet

  1   2   3   4   5   6   7   8   9   10   >