Re: [PATCH] KVM: x86: fix bogus warning about reserved bits

2015-09-23 Thread Xiao Guangrong



On 09/23/2015 05:36 PM, Paolo Bonzini wrote:



On 23/09/2015 09:56, Borislav Petkov wrote:

On Tue, Sep 22, 2015 at 11:04:38PM +0200, Paolo Bonzini wrote:

Let's add more debugging output:


Here you go:

[   50.474002] walk_shadow_page_get_mmio_spte: detect reserved bits on spte, addr 0xb8000 (level 4, 0xf00f8)
[   50.484249] walk_shadow_page_get_mmio_spte: detect reserved bits on spte, addr 0xb8000 (level 3, 0xf0078)
[   50.494492] walk_shadow_page_get_mmio_spte: detect reserved bits on spte, addr 0xb8000 (level 2, 0xf0078)


And another patch, which both cranks up the debugging a bit and
tries another fix:

diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index dd05b9cef6ae..b2f49bb15ba1 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -105,8 +105,15 @@ static inline bool guest_cpuid_has_x2apic(struct kvm_vcpu *vcpu)
  static inline bool guest_cpuid_is_amd(struct kvm_vcpu *vcpu)
  {
struct kvm_cpuid_entry2 *best;
+   static bool first;

best = kvm_find_cpuid_entry(vcpu, 0, 0);
+   if (first && best) {
+   printk("cpuid(0).ebx = %x\n", best->ebx);
+   first = false;
+   } else if (first)
+   printk_ratelimited("cpuid(0) not initialized yet\n");
+
return best && best->ebx == X86EMUL_CPUID_VENDOR_AuthenticAMD_ebx;
  }

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index bf1122e9c7bf..f50b280ffee1 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3625,7 +3625,7 @@ static void
  __reset_rsvds_bits_mask(struct kvm_vcpu *vcpu,
struct rsvd_bits_validate *rsvd_check,
int maxphyaddr, int level, bool nx, bool gbpages,
-   bool pse)
+   bool pse, bool amd)
  {
u64 exb_bit_rsvd = 0;
u64 gbpages_bit_rsvd = 0;
@@ -3642,7 +3642,7 @@ __reset_rsvds_bits_mask(struct kvm_vcpu *vcpu,
 * Non-leaf PML4Es and PDPEs reserve bit 8 (which would be the G bit for
 * leaf entries) on AMD CPUs only.
 */
-   if (guest_cpuid_is_amd(vcpu))
+   if (amd)
nonleaf_bit8_rsvd = rsvd_bits(8, 8);

switch (level) {
@@ -3710,7 +3710,7 @@ static void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu,
__reset_rsvds_bits_mask(vcpu, &context->guest_rsvd_check,
cpuid_maxphyaddr(vcpu), context->root_level,
context->nx, guest_cpuid_has_gbpages(vcpu),
-   is_pse(vcpu));
+   is_pse(vcpu), guest_cpuid_is_amd(vcpu));
  }

  static void
@@ -3760,13 +3760,25 @@ static void reset_rsvds_bits_mask_ept(struct kvm_vcpu *vcpu,
  void
  reset_shadow_zero_bits_mask(struct kvm_vcpu *vcpu, struct kvm_mmu *context)
  {
+   /*
+* Passing "true" to the last argument is okay; it adds a check
+* on bit 8 of the SPTEs which KVM doesn't use anyway.
+*/
__reset_rsvds_bits_mask(vcpu, &context->shadow_zero_check,
boot_cpu_data.x86_phys_bits,
context->shadow_root_level, context->nx,
-   guest_cpuid_has_gbpages(vcpu), is_pse(vcpu));
+   guest_cpuid_has_gbpages(vcpu), is_pse(vcpu),
+   true);
  }
  EXPORT_SYMBOL_GPL(reset_shadow_zero_bits_mask);

+static inline bool
+boot_cpu_is_amd(void)
+{
+   WARN_ON_ONCE(!tdp_enabled);
+   return shadow_x_mask != 0;


shadow_x_mask != 0 indicates an Intel CPU.

Borislav, could you please check shadow_x_mask == 0 instead and test it again?

Furthermore, using guest_cpuid_is_amd() to detect the hardware CPU vendor is wrong,
as userspace can fool KVM. We should test the host CPUID or introduce an intel/amd
callback instead.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Questions about KVM TSC trapping

2015-09-23 Thread yangoliver

David,

Sorry for late reply. See my inline comments.


On Tue, 15 Sep 2015, David Matlack wrote:

> On Tue, Sep 15, 2015 at 12:04 AM, Oliver Yang  wrote:
> > Hi Guys,
> >
> > I found below patch for KVM TSC trapping / migration support,
> >
> > https://lkml.org/lkml/2011/1/6/90
> >
> > It seemed the patch were not merged in Linux mainline.
> >
> > So I have 3 questions here,
> >
> > 1.  Can KVM support TSC trapping today? If not, what is the plan?
> 
> Not without a patch. Did you want to trap RDTSC? RDTSC works without
> trapping thanks to hardware support.
In fact, I think TSC trapping is the only way to make the TSC appear synchronized to the guest OS.

For example, user-space applications can execute rdtsc directly, which
may cause problems if there is no TSC emulation.
> 
> >
> > 2. What is the solution if my SMP Linux guest OS doesn't have reliable
> > TSC?
> 
> If you are seeing an unreliable TSC in your guest, maybe your host
> hardware does not support a synchronized TSC across CPUs. I can't
> recall how to check for this. There might be a flag in your host's
> /proc/cpuinfo.

Yes, I know how to check the hardware capability.

My main question was whether KVM supports TSC synchronization even when
running on hardware with an unreliable TSC.
> 
> >
> > Because the no TSC trapping support, will kvmclock driver handle all
> > TSC sync issues?
> >
> > 3. What if my Linux guest doesn't have kvmclock driver?
> 
> The guest will use a different clock source (e.g. acpi-pm). Note the
> RDTSC[P] instruction will still work just fine.

The main problem is user applications: mine calls rdtsc in a
multi-threaded environment.

> >
> > Does that mean I shouldn't run TSC sensitive application in my guest
> > OS?
> >
> > BTW, my application is written with lots of rdtsc instructions, and
> > which performs well in VMware guest.
> 
> Does it not work well in KVM?

Not quite sure; we plan to run our application on KVM.
VMware ESX 5.x supports TSC synchronization even when the underlying hardware
TSC is unreliable, so I'm wondering about the KVM case.


Re: [PATCH 2/2] Revert "KVM: x86: zero kvmclock_offset when vcpu0 initializes kvmclock system MSR"

2015-09-23 Thread Marcelo Tosatti
On Tue, Sep 22, 2015 at 09:52:49PM +0200, Paolo Bonzini wrote:
> 
> 
> On 22/09/2015 21:01, Marcelo Tosatti wrote:
> > On Fri, Sep 18, 2015 at 05:54:30PM +0200, Radim Krčmář wrote:
> >> Shifting pvclock_vcpu_time_info.system_time on write to KVM system time
> >> MSR is a change of ABI.  Probably only 2.6.16 based SLES 10 breaks due
> >> to its custom enhancements to kvmclock, but KVM never declared the MSR
> >> only for one-shot initialization.  (Doc says that only one write is
> >> needed.)
> >>
> >> This reverts commit b7e60c5aedd2b63f16ef06fde4f81ca032211bc5.
> >> And adds a note to the definition of PVCLOCK_COUNTS_FROM_ZERO.
> >>
> >> Signed-off-by: Radim Krčmář 
> >> ---
> >>  arch/x86/include/asm/pvclock-abi.h | 1 +
> >>  arch/x86/kvm/x86.c | 4 
> >>  2 files changed, 1 insertion(+), 4 deletions(-)
> >>
> >> diff --git a/arch/x86/include/asm/pvclock-abi.h b/arch/x86/include/asm/pvclock-abi.h
> >> index 655e07a48f6c..67f08230103a 100644
> >> --- a/arch/x86/include/asm/pvclock-abi.h
> >> +++ b/arch/x86/include/asm/pvclock-abi.h
> >> @@ -41,6 +41,7 @@ struct pvclock_wall_clock {
> >>  
> >>  #define PVCLOCK_TSC_STABLE_BIT(1 << 0)
> >>  #define PVCLOCK_GUEST_STOPPED (1 << 1)
> >> +/* PVCLOCK_COUNTS_FROM_ZERO broke ABI and can't be used anymore. */
> >>  #define PVCLOCK_COUNTS_FROM_ZERO (1 << 2)
> >>  #endif /* __ASSEMBLY__ */
> >>  #endif /* _ASM_X86_PVCLOCK_ABI_H */
> >> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> >> index 18d59b584dee..34d33f4757d2 100644
> >> --- a/arch/x86/kvm/x86.c
> >> +++ b/arch/x86/kvm/x86.c
> >> @@ -1707,8 +1707,6 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
> >>vcpu->pvclock_set_guest_stopped_request = false;
> >>}
> >>  
> >> -  pvclock_flags |= PVCLOCK_COUNTS_FROM_ZERO;
> >> -
> >>/* If the host uses TSC clocksource, then it is stable */
> >>if (use_master_clock)
> >>pvclock_flags |= PVCLOCK_TSC_STABLE_BIT;
> >> @@ -2006,8 +2004,6 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> >>&vcpu->requests);
> >>  
> >>ka->boot_vcpu_runs_old_kvmclock = tmp;
> >> -
> >> -  ka->kvmclock_offset = -get_kernel_ns();
> >>}
> >>  
> >>vcpu->arch.time = data;
> >> -- 
> >> 2.5.2
> > 
> > ACK
> 
> So I suppose you changed your mind :) but can you explain the reasoning?
> 
> Paolo

The patch is correct. The overflow issue that was raised only occurs
without PVCLOCK_TSC_STABLE_BIT.



Re: [PATCH 11/15] mm, dax, pmem: introduce __pfn_t

2015-09-23 Thread Williams, Dan J
[ adding kvm@vger.kernel.org ]

On Wed, 2015-09-23 at 09:02 -0700, Dave Hansen wrote:
> On 09/22/2015 09:42 PM, Dan Williams wrote:
> >  /*
> > + * __pfn_t: encapsulates a page-frame number that is optionally backed
> > + * by memmap (struct page).  Whether a __pfn_t has a 'struct page'
> > + * backing is indicated by flags in the low bits of the value;
> > + */
> > +typedef struct {
> > +   unsigned long val;
> > +} __pfn_t;
> > +
> > +/*
> > + * PFN_SG_CHAIN - pfn is a pointer to the next scatterlist entry
> > + * PFN_SG_LAST - pfn references a page and is the last scatterlist entry
> > + * PFN_DEV - pfn is not covered by system memmap by default
> > + * PFN_MAP - pfn has a dynamic page mapping established by a device driver
> > + */
> > +enum {
> > +   PFN_SHIFT = 4,
> > +   PFN_MASK = (1UL << PFN_SHIFT) - 1,
> > +   PFN_SG_CHAIN = (1UL << 0),
> > +   PFN_SG_LAST = (1UL << 1),
> > +   PFN_DEV = (1UL << 2),
> > +   PFN_MAP = (1UL << 3),
> > +};
> 
> Please forgive a little bikeshedding here...
> 
> Why __pfn_t?  Because the KVM code has a pfn_t?  If so, I think we
> should rescue pfn_t from KVM and give them a kvm_pfn_t.

Sounds good, once 0day has a chance to chew on the conversion I'll send
it out.

> I think you should do one of two things:  Make PFN_SHIFT 12 so that a
> physical addr can be stored in a __pfn_t with no work.  Or, use the
> *high* 12 bits of __pfn_t.val.
> 

pfn to pfn_t conversions are more likely, so the high bits sounds
better.

> If you use the high bits, *and* make it store a plain pfn when all the
> bits are 0, then you get a zero-cost pfn<->__pfn_t conversion which will
> hopefully generate the exact same code which is there today.
> 
> The one disadvantage here is that it makes it more likely that somebody
> that's just setting __pfn_t.val=foo will get things subtly wrong
> somehow, but that it will work most of the time.
> 
> Also, about naming...  PFN_SHIFT is pretty awful name for this.  It
> probably needs to be __PFN_T_SOMETHING.  We don't want folks doing
> craziness like:
> 
>   unsigned long phys_addr = pfn << PFN_SHIFT.
> 
> Which *looks* OK.

Heh, true.  It's now PFN_FLAGS_MASK in the reworked patch...


8<---
Subject: mm, dax, pmem: introduce pfn_t

From: Dan Williams 

In preparation for enabling get_user_pages() operations on dax mappings,
introduce a type that encapsulates a page-frame-number that can also be
used to encode other information.  This other information is the
historical "page_link" encoding in a scatterlist, but can also denote
"device memory".  Where "device memory" is a set of pfns that are not
part of the kernel's linear mapping by default, but are accessed via the
same memory controller as ram.  The motivation for this new type is
large capacity persistent memory that optionally has struct page entries
in the 'memmap'.

When a driver, like pmem, has established a devm_memremap_pages()
mapping it needs to communicate to upper layers that the pfn has a page
backing.  This property will be leveraged in a later patch to enable
dax-gup.  For now, update all the ->direct_access() implementations to
communicate whether the returned pfn range is mapped.

Cc: Christoph Hellwig 
Cc: Dave Hansen 
Cc: Andrew Morton 
Cc: Ross Zwisler 
Signed-off-by: Dan Williams 
---
 arch/powerpc/sysdev/axonram.c |8 ++---
 drivers/block/brd.c   |4 +--
 drivers/nvdimm/pmem.c |   27 +
 drivers/s390/block/dcssblk.c  |   10 +++---
 fs/block_dev.c|2 +
 fs/dax.c  |   23 ---
 include/linux/blkdev.h|4 +--
 include/linux/mm.h|   65 +
 include/linux/pfn.h   |9 ++
 9 files changed, 112 insertions(+), 40 deletions(-)

diff --git a/arch/powerpc/sysdev/axonram.c b/arch/powerpc/sysdev/axonram.c
index 24ffab2572e8..ec3b072d20cf 100644
--- a/arch/powerpc/sysdev/axonram.c
+++ b/arch/powerpc/sysdev/axonram.c
@@ -141,15 +141,13 @@ axon_ram_make_request(struct request_queue *queue, struct bio *bio)
  */
 static long
 axon_ram_direct_access(struct block_device *device, sector_t sector,
-  void __pmem **kaddr, unsigned long *pfn)
+  void __pmem **kaddr, pfn_t *pfn)
 {
struct axon_ram_bank *bank = device->bd_disk->private_data;
loff_t offset = (loff_t)sector << AXON_RAM_SECTOR_SHIFT;
-   void *addr = (void *)(bank->ph_addr + offset);
-
-   *kaddr = (void __pmem *)addr;
-   *pfn = virt_to_phys(addr) >> PAGE_SHIFT;
 
+   *kaddr = (void __pmem __force *) bank->io_addr + offset;
+   *pfn = phys_to_pfn_t(bank->ph_addr + offset, PFN_DEV);
return bank->size - offset;
 }
 
diff --git a/drivers/block/brd.c b/drivers/block/brd.c
index f645a71ae827..796112e9174b 100644
--- a/drivers/block/brd.c
+++ b/drivers/block/brd.c
@@ -374,7 +374,7 @@ static int brd_rw_page(struct block_device *bdev, sector_t sector,
 
 #ifdef CONFIG_BLK_DEV_RAM_DAX
 static long b

[PATCH v2] Fix AF_PACKET ABI breakage in 4.2

2015-09-23 Thread David Woodhouse
Commit 7d82410950aa ("virtio: add explicit big-endian support to memory
accessors") accidentally changed the virtio_net header used by
AF_PACKET with PACKET_VNET_HDR from host-endian to big-endian.

Since virtio_legacy_is_little_endian() is a very long identifier,
define a VIO_LE macro and use that throughout the code instead of the
hard-coded 'false' for little-endian.

This restores the ABI to match 4.1 and earlier kernels, and makes my
test program work again.

Signed-off-by: David Woodhouse 
---
On Wed, 2015-09-23 at 11:09 -0700, David Miller wrote:
> > +#define VIO_LE virtio_legacy_is_little_endian()
> 
> When you define a shorthand macro, the defines to a function call,
> make the macro have parenthesis too.

In which case I suppose it also wants to be lower-case. Although
"function call" is a bit strong since it's effectively just a constant.
I'm still wondering if it'd be nicer just to use (__force u16) instead.

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 7b8e39a..aa4b15c 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -230,6 +230,8 @@ struct packet_skb_cb {
} sa;
 };
 
+#define vio_le() virtio_legacy_is_little_endian()
+
 #define PACKET_SKB_CB(__skb)   ((struct packet_skb_cb *)((__skb)->cb))
 
 #define GET_PBDQC_FROM_RB(x)   ((struct tpacket_kbdq_core *)(&(x)->prb_bdqc))
@@ -2680,15 +2682,15 @@ static int packet_snd(struct socket *sock, struct msghdr *msg, size_t len)
goto out_unlock;
 
if ((vnet_hdr.flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) &&
-   (__virtio16_to_cpu(false, vnet_hdr.csum_start) +
-__virtio16_to_cpu(false, vnet_hdr.csum_offset) + 2 >
- __virtio16_to_cpu(false, vnet_hdr.hdr_len)))
-   vnet_hdr.hdr_len = __cpu_to_virtio16(false,
-__virtio16_to_cpu(false, vnet_hdr.csum_start) +
-   __virtio16_to_cpu(false, vnet_hdr.csum_offset) + 2);
+   (__virtio16_to_cpu(vio_le(), vnet_hdr.csum_start) +
+__virtio16_to_cpu(vio_le(), vnet_hdr.csum_offset) + 2 >
+ __virtio16_to_cpu(vio_le(), vnet_hdr.hdr_len)))
+   vnet_hdr.hdr_len = __cpu_to_virtio16(vio_le(),
+__virtio16_to_cpu(vio_le(), vnet_hdr.csum_start) +
+   __virtio16_to_cpu(vio_le(), vnet_hdr.csum_offset) + 2);
 
err = -EINVAL;
-   if (__virtio16_to_cpu(false, vnet_hdr.hdr_len) > len)
+   if (__virtio16_to_cpu(vio_le(), vnet_hdr.hdr_len) > len)
goto out_unlock;
 
if (vnet_hdr.gso_type != VIRTIO_NET_HDR_GSO_NONE) {
@@ -2731,7 +2733,7 @@ static int packet_snd(struct socket *sock, struct msghdr *msg, size_t len)
hlen = LL_RESERVED_SPACE(dev);
tlen = dev->needed_tailroom;
skb = packet_alloc_skb(sk, hlen + tlen, hlen, len,
-  __virtio16_to_cpu(false, vnet_hdr.hdr_len),
+  __virtio16_to_cpu(vio_le(), vnet_hdr.hdr_len),
   msg->msg_flags & MSG_DONTWAIT, &err);
if (skb == NULL)
goto out_unlock;
@@ -2778,8 +2780,8 @@ static int packet_snd(struct socket *sock, struct msghdr *msg, size_t len)
 
if (po->has_vnet_hdr) {
if (vnet_hdr.flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) {
-   u16 s = __virtio16_to_cpu(false, vnet_hdr.csum_start);
-   u16 o = __virtio16_to_cpu(false, vnet_hdr.csum_offset);
+   u16 s = __virtio16_to_cpu(vio_le(), vnet_hdr.csum_start);
+   u16 o = __virtio16_to_cpu(vio_le(), vnet_hdr.csum_offset);
if (!skb_partial_csum_set(skb, s, o)) {
err = -EINVAL;
goto out_free;
@@ -2787,7 +2789,7 @@ static int packet_snd(struct socket *sock, struct msghdr *msg, size_t len)
}
 
skb_shinfo(skb)->gso_size =
-   __virtio16_to_cpu(false, vnet_hdr.gso_size);
+   __virtio16_to_cpu(vio_le(), vnet_hdr.gso_size);
skb_shinfo(skb)->gso_type = gso_type;
 
/* Header must be checked, and gso_segs computed. */
@@ -3161,9 +3163,9 @@ static int packet_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
 
/* This is a hint as to how much should be linear. */
vnet_hdr.hdr_len =
-   __cpu_to_virtio16(false, skb_headlen(skb));
+   __cpu_to_virtio16(vio_le(), skb_headlen(skb));
vnet_hdr.gso_size =
-   __cpu_to_virtio16(false, sinfo->gso_size);
+   __cpu_to_virtio16(vio_le(), sinfo->gso_size);
if (sinfo->

Re: [PATCH] Fix AF_PACKET ABI breakage in 4.2

2015-09-23 Thread David Miller
From: David Woodhouse 
Date: Wed, 23 Sep 2015 15:44:30 +0100

> +#define VIO_LE virtio_legacy_is_little_endian()

When you define a shorthand macro, the defines to a function call,
make the macro have parenthesis too.


Re: [PATCH v4 07/11] KVM: arm/arm64: vgic: Allow HW interrupts to be queued to a guest

2015-09-23 Thread Andre Przywara
Hi Marc,

I know that this patch is already merged, but 

On 07/08/15 16:45, Marc Zyngier wrote:
> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> index 51c9900..9d009d2 100644
...
> @@ -1364,6 +1397,39 @@ static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
>   return level_pending;
>  }
>  
> +/*
> + * Save the physical active state, and reset it to inactive.
> + *
> + * Return 1 if HW interrupt went from active to inactive, and 0 otherwise.
> + */
> +static int vgic_sync_hwirq(struct kvm_vcpu *vcpu, struct vgic_lr vlr)
> +{
> + struct irq_phys_map *map;
> + int ret;
> +
> + if (!(vlr.state & LR_HW))
> + return 0;
> +
> + map = vgic_irq_map_search(vcpu, vlr.irq);
> + BUG_ON(!map || !map->active);
> +
> + ret = irq_get_irqchip_state(map->irq,
> + IRQCHIP_STATE_ACTIVE,
> + &map->active);
> +
> + WARN_ON(ret);
> +
> + if (map->active) {
> + ret = irq_set_irqchip_state(map->irq,
> + IRQCHIP_STATE_ACTIVE,
> + false);
> + WARN_ON(ret);
> + return 0;
> + }
> +
> + return 1;
> +}
> +
>  /* Sync back the VGIC state after a guest run */
>  static void __kvm_vgic_sync_hwstate(struct kvm_vcpu *vcpu)
>  {
> @@ -1378,14 +1444,31 @@ static void __kvm_vgic_sync_hwstate(struct kvm_vcpu *vcpu)
>   elrsr = vgic_get_elrsr(vcpu);
>   elrsr_ptr = u64_to_bitmask(&elrsr);
>  
> - /* Clear mappings for empty LRs */
> - for_each_set_bit(lr, elrsr_ptr, vgic->nr_lr) {
> + /* Deal with HW interrupts, and clear mappings for empty LRs */
> + for (lr = 0; lr < vgic->nr_lr; lr++) {
>   struct vgic_lr vlr;
>  
> - if (!test_and_clear_bit(lr, vgic_cpu->lr_used))
> + if (!test_bit(lr, vgic_cpu->lr_used))
>   continue;
>  
>   vlr = vgic_get_lr(vcpu, lr);
> + if (vgic_sync_hwirq(vcpu, vlr)) {
> + /*
> +  * So this is a HW interrupt that the guest
> +  * EOI-ed. Clean the LR state and allow the
> +  * interrupt to be sampled again.
> +  */
> + vlr.state = 0;
> + vlr.hwirq = 0;
> + vgic_set_lr(vcpu, lr, vlr);
> + vgic_irq_clear_queued(vcpu, vlr.irq);

Isn't this line altering common VGIC state without holding the lock?
Eric removed the coarse dist->lock around the whole
__kvm_vgic_sync_hwstate() function, we take it now in
vgic_process_maintenance(), but don't hold it here AFAICT.
As long as we are only dealing with private timer IRQs this is probably
not a problem, but the IRQ number could be a SPI as well, right?

Cheers,
Andre.

> + set_bit(lr, elrsr_ptr);
> + }
> +
> + if (!test_bit(lr, elrsr_ptr))
> + continue;
> +
> + clear_bit(lr, vgic_cpu->lr_used);
>  
>   BUG_ON(vlr.irq >= dist->nr_irqs);
>   vgic_cpu->vgic_irq_lr_map[vlr.irq] = LR_EMPTY;
> 


Re: [PATCH v2 7/8] arm/arm64: KVM: Rework the arch timer to use level-triggered semantics

2015-09-23 Thread Andre Przywara
Hi Christoffer,

> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> index 9ed8d53..f4ea950 100644
> --- a/virt/kvm/arm/vgic.c
> +++ b/virt/kvm/arm/vgic.c
> @@ -1422,34 +1422,43 @@ static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
>  /*
>   * Save the physical active state, and reset it to inactive.
>   *
> - * Return 1 if HW interrupt went from active to inactive, and 0 otherwise.
> + * Return true if there's a pending level triggered interrupt line to queue.
>   */
> -static int vgic_sync_hwirq(struct kvm_vcpu *vcpu, struct vgic_lr vlr)
> +static bool vgic_sync_hwirq(struct kvm_vcpu *vcpu, int lr, struct vgic_lr vlr)
>  {
>   struct irq_phys_map *map;
> + bool phys_active;
>   int ret;
>  
>   if (!(vlr.state & LR_HW))
>   return 0;
>  
>   map = vgic_irq_map_search(vcpu, vlr.irq);
> - BUG_ON(!map || !map->active);
> + BUG_ON(!map);
>  
>   ret = irq_get_irqchip_state(map->irq,
>   IRQCHIP_STATE_ACTIVE,
> - &map->active);
> + &phys_active);
>  
>   WARN_ON(ret);
>  
> - if (map->active) {
> + if (phys_active) {
> + /*
> +  * Interrupt still marked as active on the physical
> +  * distributor, so guest did not EOI it yet.  Reset to
> +  * non-active so that other VMs can see interrupts from this
> +  * device.
> +  */
>   ret = irq_set_irqchip_state(map->irq,
>   IRQCHIP_STATE_ACTIVE,
>   false);
>   WARN_ON(ret);
> - return 0;
> + return false;
>   }
>  
> - return 1;
> + /* Mapped edge-triggered interrupts not yet supported. */
> + WARN_ON(vgic_irq_is_edge(vcpu, vlr.irq));
> + return process_level_irq(vcpu, lr, vlr);

Don't you miss the dist->lock here? The other call to
process_level_irq() certainly does it, and Eric recently removed the
coarse grained lock around the whole __kvm_vgic_sync_hwstate() function.
So we don't hold the lock here, but we change quite some common VGIC
state in there.

Cheers.
Andre.

>  }
>  
>  /* Sync back the VGIC state after a guest run */
> @@ -1474,18 +1483,8 @@ static void __kvm_vgic_sync_hwstate(struct kvm_vcpu *vcpu)
>   continue;
>  
>   vlr = vgic_get_lr(vcpu, lr);
> - if (vgic_sync_hwirq(vcpu, vlr)) {
> - /*
> -  * So this is a HW interrupt that the guest
> -  * EOI-ed. Clean the LR state and allow the
> -  * interrupt to be sampled again.
> -  */
> - vlr.state = 0;
> - vlr.hwirq = 0;
> - vgic_set_lr(vcpu, lr, vlr);
> - vgic_irq_clear_queued(vcpu, vlr.irq);
> - set_bit(lr, elrsr_ptr);
> - }
> + if (vgic_sync_hwirq(vcpu, lr, vlr))
> + level_pending = true;
>  
>   if (!test_bit(lr, elrsr_ptr))
>   continue;
> @@ -1861,30 +1860,6 @@ static void vgic_free_phys_irq_map_rcu(struct rcu_head *rcu)
>  }
>  
>  /**
> - * kvm_vgic_get_phys_irq_active - Return the active state of a mapped IRQ
> - *
> - * Return the logical active state of a mapped interrupt. This doesn't
> - * necessarily reflects the current HW state.
> - */
> -bool kvm_vgic_get_phys_irq_active(struct irq_phys_map *map)
> -{
> - BUG_ON(!map);
> - return map->active;
> -}
> -
> -/**
> - * kvm_vgic_set_phys_irq_active - Set the active state of a mapped IRQ
> - *
> - * Set the logical active state of a mapped interrupt. This doesn't
> - * immediately affects the HW state.
> - */
> -void kvm_vgic_set_phys_irq_active(struct irq_phys_map *map, bool active)
> -{
> - BUG_ON(!map);
> - map->active = active;
> -}
> -
> -/**
>   * kvm_vgic_unmap_phys_irq - Remove a virtual to physical IRQ mapping
>   * @vcpu: The VCPU pointer
>   * @map: The pointer to a mapping obtained through kvm_vgic_map_phys_irq
> @@ -2112,10 +2087,14 @@ int vgic_init(struct kvm *kvm)
>   if (i < VGIC_NR_SGIS)
>   vgic_bitmap_set_irq_val(&dist->irq_enabled,
>   vcpu->vcpu_id, i, 1);
> - if (i < VGIC_NR_PRIVATE_IRQS)
> + if (i < VGIC_NR_SGIS)
>   vgic_bitmap_set_irq_val(&dist->irq_cfg,
>   vcpu->vcpu_id, i,
>   VGIC_CFG_EDGE);
> + else if (i < VGIC_NR_PRIVATE_IRQS) /* PPIs */
> + vgic_bitmap_set_irq_val(&dist->irq_cfg,
> + vcpu->vcpu_id, i,
> + 

[PATCH] Fix AF_PACKET ABI breakage in 4.2

2015-09-23 Thread David Woodhouse
Commit 7d82410950aa ("virtio: add explicit big-endian support to memory
accessors") accidentally changed the virtio_net header used by
AF_PACKET with PACKET_VNET_HDR from host-endian to big-endian.

Since virtio_legacy_is_little_endian() is a very long identifier,
define a VIO_LE macro and use that throughout the code instead of the
hard-coded 'false' for little-endian.

This restores the ABI to match 4.1 and earlier kernels, and makes my
test program work again.

Signed-off-by: David Woodhouse 
---
Or perhaps we should just use (__force u16) and (__force __virtio16)
since that's all we really want anyway?

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 7b8e39a..d646623 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -230,6 +230,8 @@ struct packet_skb_cb {
} sa;
 };
 
+#define VIO_LE virtio_legacy_is_little_endian()
+
 #define PACKET_SKB_CB(__skb)   ((struct packet_skb_cb *)((__skb)->cb))
 
 #define GET_PBDQC_FROM_RB(x)   ((struct tpacket_kbdq_core *)(&(x)->prb_bdqc))
@@ -2680,15 +2682,15 @@ static int packet_snd(struct socket *sock, struct msghdr *msg, size_t len)
goto out_unlock;
 
if ((vnet_hdr.flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) &&
-   (__virtio16_to_cpu(false, vnet_hdr.csum_start) +
-__virtio16_to_cpu(false, vnet_hdr.csum_offset) + 2 >
- __virtio16_to_cpu(false, vnet_hdr.hdr_len)))
-   vnet_hdr.hdr_len = __cpu_to_virtio16(false,
-__virtio16_to_cpu(false, vnet_hdr.csum_start) +
-   __virtio16_to_cpu(false, vnet_hdr.csum_offset) + 2);
+   (__virtio16_to_cpu(VIO_LE, vnet_hdr.csum_start) +
+__virtio16_to_cpu(VIO_LE, vnet_hdr.csum_offset) + 2 >
+ __virtio16_to_cpu(VIO_LE, vnet_hdr.hdr_len)))
+   vnet_hdr.hdr_len = __cpu_to_virtio16(VIO_LE,
+__virtio16_to_cpu(VIO_LE, vnet_hdr.csum_start) +
+   __virtio16_to_cpu(VIO_LE, vnet_hdr.csum_offset) + 2);
 
err = -EINVAL;
-   if (__virtio16_to_cpu(false, vnet_hdr.hdr_len) > len)
+   if (__virtio16_to_cpu(VIO_LE, vnet_hdr.hdr_len) > len)
goto out_unlock;
 
if (vnet_hdr.gso_type != VIRTIO_NET_HDR_GSO_NONE) {
@@ -2731,7 +2733,7 @@ static int packet_snd(struct socket *sock, struct msghdr *msg, size_t len)
hlen = LL_RESERVED_SPACE(dev);
tlen = dev->needed_tailroom;
skb = packet_alloc_skb(sk, hlen + tlen, hlen, len,
-  __virtio16_to_cpu(false, vnet_hdr.hdr_len),
+  __virtio16_to_cpu(VIO_LE, vnet_hdr.hdr_len),
   msg->msg_flags & MSG_DONTWAIT, &err);
if (skb == NULL)
goto out_unlock;
@@ -2778,8 +2780,8 @@ static int packet_snd(struct socket *sock, struct msghdr *msg, size_t len)
 
if (po->has_vnet_hdr) {
if (vnet_hdr.flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) {
-   u16 s = __virtio16_to_cpu(false, vnet_hdr.csum_start);
-   u16 o = __virtio16_to_cpu(false, vnet_hdr.csum_offset);
+   u16 s = __virtio16_to_cpu(VIO_LE, vnet_hdr.csum_start);
+   u16 o = __virtio16_to_cpu(VIO_LE, vnet_hdr.csum_offset);
if (!skb_partial_csum_set(skb, s, o)) {
err = -EINVAL;
goto out_free;
@@ -2787,7 +2789,7 @@ static int packet_snd(struct socket *sock, struct msghdr *msg, size_t len)
}
 
skb_shinfo(skb)->gso_size =
-   __virtio16_to_cpu(false, vnet_hdr.gso_size);
+   __virtio16_to_cpu(VIO_LE, vnet_hdr.gso_size);
skb_shinfo(skb)->gso_type = gso_type;
 
/* Header must be checked, and gso_segs computed. */
@@ -3161,9 +3163,9 @@ static int packet_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
 
/* This is a hint as to how much should be linear. */
vnet_hdr.hdr_len =
-   __cpu_to_virtio16(false, skb_headlen(skb));
+   __cpu_to_virtio16(VIO_LE, skb_headlen(skb));
vnet_hdr.gso_size =
-   __cpu_to_virtio16(false, sinfo->gso_size);
+   __cpu_to_virtio16(VIO_LE, sinfo->gso_size);
if (sinfo->gso_type & SKB_GSO_TCPV4)
vnet_hdr.gso_type = VIRTIO_NET_HDR_GSO_TCPV4;
else if (sinfo->gso_type & SKB_GSO_TCPV6)
@@ -3181,9 +3183,9 @@ static int packet_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
 
if (skb->ip_summed == CHECKSUM_PARTIAL) {
  

Re: [PATCH] KVM: x86: fix bogus warning about reserved bits

2015-09-23 Thread Paolo Bonzini


On 23/09/2015 13:07, Borislav Petkov wrote:
>> > +  static bool first;
>> >  
>> >best = kvm_find_cpuid_entry(vcpu, 0, 0);
>> > +  if (first && best) {
>> > +  printk("cpuid(0).ebx = %x\n", best->ebx);
>> > +  first = false;
>> > +  } else if (first)
>> > +  printk_ratelimited("cpuid(0) not initialized yet\n");
>> > +
>> >return best && best->ebx == X86EMUL_CPUID_VENDOR_AuthenticAMD_ebx;
> Do I see it correctly that that "first" thing is never true?
> 
> In any case, I changed it to initialize to true but still no output from
> that function.
> 
> [  102.448438] walk_shadow_page_get_mmio_spte: detect reserved bits on spte, addr 0xb8000 (level 4, 0xf00f8)
> [  102.458706] walk_shadow_page_get_mmio_spte: detect reserved bits on spte, addr 0xb8000 (level 3, 0xf0078)
> [  102.468955] walk_shadow_page_get_mmio_spte: detect reserved bits on spte, addr 0xb8000 (level 2, 0xf0078)

Yeah, but I didn't totally need the output so I didn't bother answering
with yet another v2.  I'm a bit stymied and will try to find an AMD
machine inside RH (it's always a pain to install latest-and-greatest
kernel on unknown machines).  It's probably also time to buy one too...

Paolo


Re: [PATCH] KVM: x86: fix bogus warning about reserved bits

2015-09-23 Thread Borislav Petkov
On Wed, Sep 23, 2015 at 11:36:47AM +0200, Paolo Bonzini wrote:
> And another patch, which both cranks up the debugging a bit and
> tries another fix:
> 
> diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
> index dd05b9cef6ae..b2f49bb15ba1 100644
> --- a/arch/x86/kvm/cpuid.h
> +++ b/arch/x86/kvm/cpuid.h
> @@ -105,8 +105,15 @@ static inline bool guest_cpuid_has_x2apic(struct kvm_vcpu *vcpu)
>  static inline bool guest_cpuid_is_amd(struct kvm_vcpu *vcpu)
>  {
>   struct kvm_cpuid_entry2 *best;
> + static bool first;
>  
>   best = kvm_find_cpuid_entry(vcpu, 0, 0);
> + if (first && best) {
> + printk("cpuid(0).ebx = %x\n", best->ebx);
> + first = false;
> + } else if (first)
> + printk_ratelimited("cpuid(0) not initialized yet\n");
> +
>   return best && best->ebx == X86EMUL_CPUID_VENDOR_AuthenticAMD_ebx;

Do I see it correctly that that "first" thing is never true?

In any case, I changed it to initialize to true but still no output from
that function.

[  102.448438] walk_shadow_page_get_mmio_spte: detect reserved bits on spte, addr 0xb8000 (level 4, 0xf00f8)
[  102.458706] walk_shadow_page_get_mmio_spte: detect reserved bits on spte, addr 0xb8000 (level 3, 0xf0078)
[  102.468955] walk_shadow_page_get_mmio_spte: detect reserved bits on spte, addr 0xb8000 (level 2, 0xf0078)
[  102.479337] dump hierarchy:
[  102.482152] -- spte 0x416edb027 level 4.
[  102.482154] -- spte 0x416eda027 level 3.
[  102.482155] -- spte 0x416ed5027 level 2.
[  102.482157] -- spte 0x000b8f67 level 1.
[  102.482158] [ cut here ]
[  102.482196] WARNING: CPU: 6 PID: 3550 at arch/x86/kvm/mmu.c:3396 handle_mmio_page_fault.part.57+0x1a/0x20 [kvm]()
[  102.482236] Modules linked in: tun sha256_ssse3 sha256_generic drbg binfmt_misc ipv6 vfat fat fuse dm_crypt dm_mod kvm_amd kvm crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd amd64_edac_mod fam15h_power k10temp edac_core amdkfd amd_iommu_v2 radeon acpi_cpufreq
[  102.482240] CPU: 6 PID: 3550 Comm: qemu-system-x86 Not tainted 4.3.0-rc2+ #1
[  102.482242] Hardware name: To be filled by O.E.M. To be filled by 
O.E.M./M5A97 EVO R2.0, BIOS 1503 01/16/2013
[  102.482249]  a030c992 880424b8fb78 812c758a 

[  102.482253]  880424b8fbb0 810534c1 8804160e 
000f
[  102.482257]  000b8000   
880424b8fbc0
[  102.482259] Call Trace:
[  102.482268]  [] dump_stack+0x4e/0x84
[  102.482273]  [] warn_slowpath_common+0x91/0xd0
[  102.482276]  [] warn_slowpath_null+0x1a/0x20
[  102.482306]  [] handle_mmio_page_fault.part.57+0x1a/0x20 
[kvm]
[  102.482334]  [] tdp_page_fault+0x2a0/0x2b0 [kvm]
[  102.482340]  [] ? __lock_acquire+0x57d/0x17a0
[  102.482369]  [] kvm_mmu_page_fault+0x35/0x240 [kvm]
[  102.482376]  [] pf_interception+0x108/0x1d0 [kvm_amd]
[  102.482381]  [] handle_exit+0x150/0xa40 [kvm_amd]
[  102.482408]  [] ? kvm_arch_vcpu_ioctl_run+0x4c8/0x16f0 
[kvm]
[  102.482435]  [] kvm_arch_vcpu_ioctl_run+0x533/0x16f0 [kvm]
[  102.482461]  [] ? kvm_arch_vcpu_ioctl_run+0x4c8/0x16f0 
[kvm]
[  102.482466]  [] ? mutex_lock_killable_nested+0x312/0x480
[  102.482485]  [] ? kvm_vcpu_ioctl+0x79/0x6f0 [kvm]
[  102.482490]  [] ? preempt_count_sub+0xb3/0x110
[  102.482509]  [] kvm_vcpu_ioctl+0x33f/0x6f0 [kvm]
[  102.482515]  [] do_vfs_ioctl+0x2d7/0x530
[  102.482519]  [] ? __fget_light+0x29/0x90
[  102.482523]  [] SyS_ioctl+0x4c/0x90
[  102.482527]  [] entry_SYSCALL_64_fastpath+0x16/0x73
[  102.482531] ---[ end trace b8899512fc52cf2e ]---

Thanks.

-- 
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.


[PATCH 1/2] KVM: introduce __vmx_flush_tlb to handle specific vpid

2015-09-23 Thread Wanpeng Li
Introduce __vmx_flush_tlb() to handle a specific vpid.

Signed-off-by: Wanpeng Li 
---
 arch/x86/kvm/vmx.c | 21 +
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 794c529..7188c5e 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1343,13 +1343,13 @@ static void loaded_vmcs_clear(struct loaded_vmcs *loaded_vmcs)
 __loaded_vmcs_clear, loaded_vmcs, 1);
 }
 
-static inline void vpid_sync_vcpu_single(struct vcpu_vmx *vmx)
+static inline void vpid_sync_vcpu_single(int vpid)
 {
-   if (vmx->vpid == 0)
+   if (vpid == 0)
return;
 
if (cpu_has_vmx_invvpid_single())
-   __invvpid(VMX_VPID_EXTENT_SINGLE_CONTEXT, vmx->vpid, 0);
+   __invvpid(VMX_VPID_EXTENT_SINGLE_CONTEXT, vpid, 0);
 }
 
 static inline void vpid_sync_vcpu_global(void)
@@ -1358,10 +1358,10 @@ static inline void vpid_sync_vcpu_global(void)
__invvpid(VMX_VPID_EXTENT_ALL_CONTEXT, 0, 0);
 }
 
-static inline void vpid_sync_context(struct vcpu_vmx *vmx)
+static inline void vpid_sync_context(int vpid)
 {
if (cpu_has_vmx_invvpid_single())
-   vpid_sync_vcpu_single(vmx);
+   vpid_sync_vcpu_single(vpid);
else
vpid_sync_vcpu_global();
 }
@@ -3450,9 +3450,9 @@ static void exit_lmode(struct kvm_vcpu *vcpu)
 
 #endif
 
-static void vmx_flush_tlb(struct kvm_vcpu *vcpu)
+static inline void __vmx_flush_tlb(struct kvm_vcpu *vcpu, int vpid)
 {
-   vpid_sync_context(to_vmx(vcpu));
+   vpid_sync_context(vpid);
if (enable_ept) {
if (!VALID_PAGE(vcpu->arch.mmu.root_hpa))
return;
@@ -3460,6 +3460,11 @@ static void vmx_flush_tlb(struct kvm_vcpu *vcpu)
}
 }
 
+static void vmx_flush_tlb(struct kvm_vcpu *vcpu)
+{
+   __vmx_flush_tlb(vcpu, to_vmx(vcpu)->vpid);
+}
+
 static void vmx_decache_cr0_guest_bits(struct kvm_vcpu *vcpu)
 {
ulong cr0_guest_owned_bits = vcpu->arch.cr0_guest_owned_bits;
@@ -4795,7 +4800,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
vmx_fpu_activate(vcpu);
update_exception_bitmap(vcpu);
 
-   vpid_sync_context(vmx);
+   vpid_sync_context(vmx->vpid);
 }
 
 /*
-- 
1.9.1



Re: [PATCH] KVM: nVMX: emulate the INVVPID instruction

2015-09-23 Thread Wanpeng Li

On 9/23/15 4:39 PM, Paolo Bonzini wrote:


On 23/09/2015 09:59, Wanpeng Li wrote:

Add the INVVPID instruction emulation.

Reviewed-by: Wincy Van 
Signed-off-by: Wanpeng Li 
---
  arch/x86/include/asm/vmx.h |  1 +
  arch/x86/kvm/vmx.c | 23 ++-
  2 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index d25f32a..69f3d71 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -397,6 +397,7 @@ enum vmcs_field {
  #define IDENTITY_PAGETABLE_PRIVATE_MEMSLOT(KVM_USER_MEM_SLOTS + 2)
  
  #define VMX_NR_VPIDS(1 << 16)

+#define VMX_VPID_EXTENT_INDIVIDUAL_ADDR	0
  #define VMX_VPID_EXTENT_SINGLE_CONTEXT1
  #define VMX_VPID_EXTENT_ALL_CONTEXT   2
  
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c

index 6ad991a..794c529 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -7189,7 +7189,28 @@ static int handle_invept(struct kvm_vcpu *vcpu)
  
  static int handle_invvpid(struct kvm_vcpu *vcpu)

  {
-   kvm_queue_exception(vcpu, UD_VECTOR);
+   u32 vmx_instruction_info;
+   unsigned long type;
+
+   if (!nested_vmx_check_permission(vcpu))
+   return 1;
+
+   vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO);
+   type = kvm_register_readl(vcpu, (vmx_instruction_info >> 28) & 0xf);
+
+   switch (type) {
+   case VMX_VPID_EXTENT_INDIVIDUAL_ADDR:
+   case VMX_VPID_EXTENT_SINGLE_CONTEXT:
+   case VMX_VPID_EXTENT_ALL_CONTEXT:
+   vmx_flush_tlb(vcpu);
+   nested_vmx_succeed(vcpu);
+   break;
+   default:
+   nested_vmx_failInvalid(vcpu);
+   break;
+   }
+
+   skip_emulated_instruction(vcpu);
return 1;
  }
  


This is not enough.  You need to add a VPID argument to
vpid_sync_vcpu_single, and inline vmx_flush_tlb in handle_invvpid so
that it can use the new VPID argument of vpid_sync_vcpu_single.

Note that the "all context" variant can be mapped to
vpid_sync_vcpu_single with vpid02 as the argument (a nice side effect of
your vpid02 design).



Got it. Just sent out patches to handle this. :-)

Regards,
Wanpeng Li


[PATCH 2/2] KVM: nVMX: fix flush vpid01 during nested vmentry/vmexit

2015-09-23 Thread Wanpeng Li
vpid_sync_vcpu_single() still flushes vpid01 during nested
vmentry/vmexit, because vmx->vpid is what gets passed to invvpid.
Fix this by passing vpid02 through __vmx_flush_tlb() so that
the right vpid is flushed.

Signed-off-by: Wanpeng Li 
---
 arch/x86/kvm/vmx.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 7188c5e..31fb631 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -7207,7 +7207,7 @@ static int handle_invvpid(struct kvm_vcpu *vcpu)
case VMX_VPID_EXTENT_INDIVIDUAL_ADDR:
case VMX_VPID_EXTENT_SINGLE_CONTEXT:
case VMX_VPID_EXTENT_ALL_CONTEXT:
-   vmx_flush_tlb(vcpu);
+   __vmx_flush_tlb(vcpu, to_vmx(vcpu)->nested.vpid02);
nested_vmx_succeed(vcpu);
break;
default:
@@ -9501,7 +9501,7 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
vmcs_write16(VIRTUAL_PROCESSOR_ID, vmx->nested.vpid02);
	if (vmcs12->virtual_processor_id != vmx->nested.last_vpid) {
		vmx->nested.last_vpid = vmcs12->virtual_processor_id;
-		vmx_flush_tlb(vcpu);
+		__vmx_flush_tlb(vcpu, to_vmx(vcpu)->nested.vpid02);
}
} else {
vmcs_write16(VIRTUAL_PROCESSOR_ID, vmx->vpid);
-- 
1.9.1



Re: [PATCH] KVM: x86: fix bogus warning about reserved bits

2015-09-23 Thread Paolo Bonzini


On 23/09/2015 09:56, Borislav Petkov wrote:
> On Tue, Sep 22, 2015 at 11:04:38PM +0200, Paolo Bonzini wrote:
>> Let's add more debugging output:
> 
> Here you go:
> 
> [   50.474002] walk_shadow_page_get_mmio_spte: detect reserved bits on spte, addr 0xb8000 (level 4, 0xf00f8)
> [   50.484249] walk_shadow_page_get_mmio_spte: detect reserved bits on spte, addr 0xb8000 (level 3, 0xf0078)
> [   50.494492] walk_shadow_page_get_mmio_spte: detect reserved bits on spte, addr 0xb8000 (level 2, 0xf0078)

And another patch, which both cranks up the debugging a bit and
tries another fix:

diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index dd05b9cef6ae..b2f49bb15ba1 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -105,8 +105,15 @@ static inline bool guest_cpuid_has_x2apic(struct kvm_vcpu *vcpu)
 static inline bool guest_cpuid_is_amd(struct kvm_vcpu *vcpu)
 {
struct kvm_cpuid_entry2 *best;
+   static bool first;
 
best = kvm_find_cpuid_entry(vcpu, 0, 0);
+   if (first && best) {
+   printk("cpuid(0).ebx = %x\n", best->ebx);
+   first = false;
+   } else if (first)
+   printk_ratelimited("cpuid(0) not initialized yet\n");
+
return best && best->ebx == X86EMUL_CPUID_VENDOR_AuthenticAMD_ebx;
 }
 
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index bf1122e9c7bf..f50b280ffee1 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3625,7 +3625,7 @@ static void
 __reset_rsvds_bits_mask(struct kvm_vcpu *vcpu,
struct rsvd_bits_validate *rsvd_check,
int maxphyaddr, int level, bool nx, bool gbpages,
-   bool pse)
+   bool pse, bool amd)
 {
u64 exb_bit_rsvd = 0;
u64 gbpages_bit_rsvd = 0;
@@ -3642,7 +3642,7 @@ __reset_rsvds_bits_mask(struct kvm_vcpu *vcpu,
 * Non-leaf PML4Es and PDPEs reserve bit 8 (which would be the G bit for
 * leaf entries) on AMD CPUs only.
 */
-   if (guest_cpuid_is_amd(vcpu))
+   if (amd)
nonleaf_bit8_rsvd = rsvd_bits(8, 8);
 
switch (level) {
@@ -3710,7 +3710,7 @@ static void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu,
__reset_rsvds_bits_mask(vcpu, &context->guest_rsvd_check,
cpuid_maxphyaddr(vcpu), context->root_level,
context->nx, guest_cpuid_has_gbpages(vcpu),
-   is_pse(vcpu));
+   is_pse(vcpu), guest_cpuid_is_amd(vcpu));
 }
 
 static void
@@ -3760,13 +3760,25 @@ static void reset_rsvds_bits_mask_ept(struct kvm_vcpu *vcpu,
 void
 reset_shadow_zero_bits_mask(struct kvm_vcpu *vcpu, struct kvm_mmu *context)
 {
+   /*
+* Passing "true" to the last argument is okay; it adds a check
+* on bit 8 of the SPTEs which KVM doesn't use anyway.
+*/
__reset_rsvds_bits_mask(vcpu, &context->shadow_zero_check,
boot_cpu_data.x86_phys_bits,
context->shadow_root_level, context->nx,
-   guest_cpuid_has_gbpages(vcpu), is_pse(vcpu));
+   guest_cpuid_has_gbpages(vcpu), is_pse(vcpu),
+   true);
 }
 EXPORT_SYMBOL_GPL(reset_shadow_zero_bits_mask);
 
+static inline bool
+boot_cpu_is_amd(void)
+{
+   WARN_ON_ONCE(!tdp_enabled);
+   return shadow_x_mask == 0;
+}
+
 /*
  * the direct page table on host, use as much mmu features as
  * possible, however, kvm currently does not do execution-protection.
@@ -3775,11 +3787,11 @@ static void
 reset_tdp_shadow_zero_bits_mask(struct kvm_vcpu *vcpu,
struct kvm_mmu *context)
 {
-   if (guest_cpuid_is_amd(vcpu))
+   if (boot_cpu_is_amd())
__reset_rsvds_bits_mask(vcpu, &context->shadow_zero_check,
boot_cpu_data.x86_phys_bits,
context->shadow_root_level, false,
-   cpu_has_gbpages, true);
+   cpu_has_gbpages, true, true);
else
__reset_rsvds_bits_mask_ept(&context->shadow_zero_check,
boot_cpu_data.x86_phys_bits,



Applies on top of everything else you've got already.

Paolo


Re: [PATCH] KVM: nVMX: emulate the INVVPID instruction

2015-09-23 Thread Paolo Bonzini


On 23/09/2015 09:59, Wanpeng Li wrote:
> Add the INVVPID instruction emulation.
> 
> Reviewed-by: Wincy Van 
> Signed-off-by: Wanpeng Li 
> ---
>  arch/x86/include/asm/vmx.h |  1 +
>  arch/x86/kvm/vmx.c | 23 ++-
>  2 files changed, 23 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
> index d25f32a..69f3d71 100644
> --- a/arch/x86/include/asm/vmx.h
> +++ b/arch/x86/include/asm/vmx.h
> @@ -397,6 +397,7 @@ enum vmcs_field {
>  #define IDENTITY_PAGETABLE_PRIVATE_MEMSLOT   (KVM_USER_MEM_SLOTS + 2)
>  
>  #define VMX_NR_VPIDS (1 << 16)
> +#define VMX_VPID_EXTENT_INDIVIDUAL_ADDR  0
>  #define VMX_VPID_EXTENT_SINGLE_CONTEXT   1
>  #define VMX_VPID_EXTENT_ALL_CONTEXT  2
>  
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 6ad991a..794c529 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -7189,7 +7189,28 @@ static int handle_invept(struct kvm_vcpu *vcpu)
>  
>  static int handle_invvpid(struct kvm_vcpu *vcpu)
>  {
> - kvm_queue_exception(vcpu, UD_VECTOR);
> + u32 vmx_instruction_info;
> + unsigned long type;
> +
> + if (!nested_vmx_check_permission(vcpu))
> + return 1;
> +
> + vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO);
> + type = kvm_register_readl(vcpu, (vmx_instruction_info >> 28) & 0xf);
> +
> + switch (type) {
> + case VMX_VPID_EXTENT_INDIVIDUAL_ADDR:
> + case VMX_VPID_EXTENT_SINGLE_CONTEXT:
> + case VMX_VPID_EXTENT_ALL_CONTEXT:
> + vmx_flush_tlb(vcpu);
> + nested_vmx_succeed(vcpu);
> + break;
> + default:
> + nested_vmx_failInvalid(vcpu);
> + break;
> + }
> +
> + skip_emulated_instruction(vcpu);
>   return 1;
>  }
>  
> 

This is not enough.  You need to add a VPID argument to
vpid_sync_vcpu_single, and inline vmx_flush_tlb in handle_invvpid so
that it can use the new VPID argument of vpid_sync_vcpu_single.

Note that the "all context" variant can be mapped to
vpid_sync_vcpu_single with vpid02 as the argument (a nice side effect of
your vpid02 design).

However, I have applied the patch to kvm/queue.  Please send the changes
separately, and I will squash them in the existing VPID patch.

Paolo


Re: [PATCH] KVM: x86: fix bogus warning about reserved bits

2015-09-23 Thread Paolo Bonzini


On 23/09/2015 09:56, Borislav Petkov wrote:
> [   50.474002] walk_shadow_page_get_mmio_spte: detect reserved bits on spte, addr 0xb8000 (level 4, 0xf00f8)
> [   50.484249] walk_shadow_page_get_mmio_spte: detect reserved bits on spte, addr 0xb8000 (level 3, 0xf0078)
> [   50.494492] walk_shadow_page_get_mmio_spte: detect reserved bits on spte, addr 0xb8000 (level 2, 0xf0078)

It's checking against EPT format, no surprise that it complains...

Paolo


[PATCH] KVM: nVMX: emulate the INVVPID instruction

2015-09-23 Thread Wanpeng Li
Add the INVVPID instruction emulation.

Reviewed-by: Wincy Van 
Signed-off-by: Wanpeng Li 
---
 arch/x86/include/asm/vmx.h |  1 +
 arch/x86/kvm/vmx.c | 23 ++-
 2 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index d25f32a..69f3d71 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -397,6 +397,7 @@ enum vmcs_field {
 #define IDENTITY_PAGETABLE_PRIVATE_MEMSLOT (KVM_USER_MEM_SLOTS + 2)
 
 #define VMX_NR_VPIDS   (1 << 16)
+#define VMX_VPID_EXTENT_INDIVIDUAL_ADDR	0
 #define VMX_VPID_EXTENT_SINGLE_CONTEXT 1
 #define VMX_VPID_EXTENT_ALL_CONTEXT2
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 6ad991a..794c529 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -7189,7 +7189,28 @@ static int handle_invept(struct kvm_vcpu *vcpu)
 
 static int handle_invvpid(struct kvm_vcpu *vcpu)
 {
-   kvm_queue_exception(vcpu, UD_VECTOR);
+   u32 vmx_instruction_info;
+   unsigned long type;
+
+   if (!nested_vmx_check_permission(vcpu))
+   return 1;
+
+   vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO);
+   type = kvm_register_readl(vcpu, (vmx_instruction_info >> 28) & 0xf);
+
+   switch (type) {
+   case VMX_VPID_EXTENT_INDIVIDUAL_ADDR:
+   case VMX_VPID_EXTENT_SINGLE_CONTEXT:
+   case VMX_VPID_EXTENT_ALL_CONTEXT:
+   vmx_flush_tlb(vcpu);
+   nested_vmx_succeed(vcpu);
+   break;
+   default:
+   nested_vmx_failInvalid(vcpu);
+   break;
+   }
+
+   skip_emulated_instruction(vcpu);
return 1;
 }
 
-- 
1.9.1



Re: [PATCH] KVM: x86: fix bogus warning about reserved bits

2015-09-23 Thread Borislav Petkov
On Tue, Sep 22, 2015 at 11:04:38PM +0200, Paolo Bonzini wrote:
> Let's add more debugging output:

Here you go:

[   50.474002] walk_shadow_page_get_mmio_spte: detect reserved bits on spte, addr 0xb8000 (level 4, 0xf00f8)
[   50.484249] walk_shadow_page_get_mmio_spte: detect reserved bits on spte, addr 0xb8000 (level 3, 0xf0078)
[   50.494492] walk_shadow_page_get_mmio_spte: detect reserved bits on spte, addr 0xb8000 (level 2, 0xf0078)
[   50.504767] dump hierarchy:
[   50.507595] -- spte 0x416533027 level 4.
[   50.507595] -- spte 0x416534027 level 3.
[   50.507596] -- spte 0x416535027 level 2.
[   50.507596] -- spte 0x000b8f67 level 1.
[   50.507597] [ cut here ]
[   50.507616] WARNING: CPU: 4 PID: 3539 at arch/x86/kvm/mmu.c:3396 handle_mmio_page_fault.part.57+0x1a/0x20 [kvm]()
[   50.507630] Modules linked in: tun sha256_ssse3 sha256_generic drbg 
binfmt_misc ipv6 vfat fat fuse dm_crypt dm_mod kvm_amd kvm crc32_pclmul 
aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd 
amd64_edac_mod k10temp edac_core fam15h_power amdkfd amd_iommu_v2 radeon 
acpi_cpufreq
[   50.507632] CPU: 4 PID: 3539 Comm: qemu-system-x86 Not tainted 4.3.0-rc2+ #2
[   50.507633] Hardware name: To be filled by O.E.M. To be filled by 
O.E.M./M5A97 EVO R2.0, BIOS 1503 01/16/2013
[   50.507635]  a0433932 880416973b78 812c758a 

[   50.507637]  880416973bb0 810534c1 8804231c 
000f
[   50.507638]  000b8000   
880416973bc0
[   50.507639] Call Trace:
[   50.507643]  [] dump_stack+0x4e/0x84
[   50.507646]  [] warn_slowpath_common+0x91/0xd0
[   50.507647]  [] warn_slowpath_null+0x1a/0x20
[   50.507657]  [] handle_mmio_page_fault.part.57+0x1a/0x20 
[kvm]
[   50.507667]  [] tdp_page_fault+0x2a0/0x2b0 [kvm]
[   50.507673]  [] ? __lock_acquire+0x57d/0x17a0
[   50.507682]  [] kvm_mmu_page_fault+0x35/0x240 [kvm]
[   50.507685]  [] pf_interception+0x108/0x1d0 [kvm_amd]
[   50.507688]  [] handle_exit+0x150/0xa40 [kvm_amd]
[   50.507697]  [] ? kvm_arch_vcpu_ioctl_run+0x4c8/0x16f0 
[kvm]
[   50.507706]  [] kvm_arch_vcpu_ioctl_run+0x533/0x16f0 [kvm]
[   50.507715]  [] ? kvm_arch_vcpu_ioctl_run+0x4c8/0x16f0 
[kvm]
[   50.507717]  [] ? mutex_lock_killable_nested+0x312/0x480
[   50.507724]  [] ? kvm_vcpu_ioctl+0x79/0x6f0 [kvm]
[   50.507726]  [] ? preempt_count_sub+0xb3/0x110
[   50.507733]  [] kvm_vcpu_ioctl+0x33f/0x6f0 [kvm]
[   50.507735]  [] do_vfs_ioctl+0x2d7/0x530
[   50.507737]  [] ? __fget_light+0x29/0x90
[   50.507738]  [] SyS_ioctl+0x4c/0x90
[   50.507740]  [] entry_SYSCALL_64_fastpath+0x16/0x73
[   50.507741] ---[ end trace ff23795fcc279cbd ]---

> Thus same as before.
> 
> Just to be safe, can you try using "-cpu host" on the QEMU command
> line and see if it changes anything?  This would catch things such
> as an Intel CPUID on an AMD host.

Here's my full qemu command:

qemu-system-x86_64 -enable-kvm -gdb tcp::1234 -cpu host -m 2048 -hda 
/home/boris/kvm/debian/sid-x86_64.img -hdb /home/boris/kvm/swap.img -boot 
menu=off,order=c -localtime -net nic,model=rtl8139 -net 
user,hostfwd=tcp::1235-:22 -usbdevice tablet -kernel 
/home/boris/kernel/linux-2.6/arch/x86/boot/bzImage -append "root=/dev/sda1 
resume=/dev/sdb1 debug ignore_loglevel log_buf_len=16M earlyprintk=ttyS0,115200 
console=ttyS0,115200 console=tty0 " -monitor pty -virtfs 
local,path=/tmp,mount_tag=tmp,security_model=none -serial 
file:/home/boris/kvm/test-x86_64-1235.log -snapshot -name "Debian x86_64:1235" 
-smp 8

and that splats too:

[  146.891735] walk_shadow_page_get_mmio_spte: detect reserved bits on spte, addr 0xb8000 (level 4, 0xf00f8)
[  146.901981] walk_shadow_page_get_mmio_spte: detect reserved bits on spte, addr 0xb8000 (level 3, 0xf0078)
[  146.912224] walk_shadow_page_get_mmio_spte: detect reserved bits on spte, addr 0xb8000 (level 2, 0xf0078)
[  146.922496] dump hierarchy:
[  146.925331] -- spte 0x37d47027 level 4.
[  146.925332] -- spte 0x37d46027 level 3.
[  146.925332] -- spte 0xb9faa027 level 2.
[  146.925333] -- spte 0x000b8f67 level 1.
[  146.925333] [ cut here ]
[  146.925351] WARNING: CPU: 6 PID: 3753 at arch/x86/kvm/mmu.c:3396 handle_mmio_page_fault.part.57+0x1a/0x20 [kvm]()
[  146.925371] Modules linked in: tun sha256_ssse3 sha256_generic drbg 
binfmt_misc ipv6 vfat fat fuse dm_crypt dm_mod kvm_amd kvm crc32_pclmul 
aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd 
amd64_edac_mod k10temp edac_core fam15h_power amdkfd amd_iommu_v2 radeon 
acpi_cpufreq
[  146.925373] CPU: 6 PID: 3753 Comm: qemu-system-x86 Tainted: G W 4.3.0-rc2+ #2
[  146.925374] Hardware name: To be filled by O.E.M. To be filled by 
O.E.M./M5A97 EVO R2.0, BIOS 1503 01/16/2013
[  146.925376]  a0433932 880423377b78 812c758a 

[  146.925378]  8804233

Re: [PATCH 1/1] target-i386: get/put MSR_TSC_AUX across reset and migration

2015-09-23 Thread Paolo Bonzini


On 23/09/2015 08:27, Amit Shah wrote:
> There's one report of migration breaking due to missing MSR_TSC_AUX
> save/restore.  Fix this by adding a new subsection that saves the state
> of this MSR.
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=1261797

It turns out that the MSR is already saved/restored into the migration
stream!  However, the commit that introduced RDTSCP support (commit
1b05007, "target-i386: add RDTSCP support", 2009-09-19) was written for
TCG, and we ended up forgetting to fish the value out of KVM and send it
back in.

The KVM bits are okay.  Eduardo, can you undo the machine.c hunk or
should Amit send v2?

Paolo

> Reported-by: Xiaoqing Wei 
> Signed-off-by: Amit Shah 
> CC: Paolo Bonzini 
> CC: Juan Quintela 
> CC: "Dr. David Alan Gilbert" 
> CC: Marcelo Tosatti 
> CC: Richard Henderson 
> CC: Eduardo Habkost 
> ---
>  target-i386/kvm.c | 14 ++
>  target-i386/machine.c | 20 
>  2 files changed, 34 insertions(+)
> 
> diff --git a/target-i386/kvm.c b/target-i386/kvm.c
> index 7b0ba17..80d1a7e 100644
> --- a/target-i386/kvm.c
> +++ b/target-i386/kvm.c
> @@ -67,6 +67,7 @@ const KVMCapabilityInfo kvm_arch_required_capabilities[] = {
>  
>  static bool has_msr_star;
>  static bool has_msr_hsave_pa;
> +static bool has_msr_tsc_aux;
>  static bool has_msr_tsc_adjust;
>  static bool has_msr_tsc_deadline;
>  static bool has_msr_feature_control;
> @@ -825,6 +826,10 @@ static int kvm_get_supported_msrs(KVMState *s)
>  has_msr_hsave_pa = true;
>  continue;
>  }
> +if (kvm_msr_list->indices[i] == MSR_TSC_AUX) {
> +has_msr_tsc_aux = true;
> +continue;
> +}
>  if (kvm_msr_list->indices[i] == MSR_TSC_ADJUST) {
>  has_msr_tsc_adjust = true;
>  continue;
> @@ -1299,6 +1304,9 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
>  if (has_msr_hsave_pa) {
>  kvm_msr_entry_set(&msrs[n++], MSR_VM_HSAVE_PA, env->vm_hsave);
>  }
> +if (has_msr_tsc_aux) {
> +kvm_msr_entry_set(&msrs[n++], MSR_TSC_AUX, env->tsc_aux);
> +}
>  if (has_msr_tsc_adjust) {
>  kvm_msr_entry_set(&msrs[n++], MSR_TSC_ADJUST, env->tsc_adjust);
>  }
> @@ -1671,6 +1679,9 @@ static int kvm_get_msrs(X86CPU *cpu)
>  if (has_msr_hsave_pa) {
>  msrs[n++].index = MSR_VM_HSAVE_PA;
>  }
> +if (has_msr_tsc_aux) {
> +msrs[n++].index = MSR_TSC_AUX;
> +}
>  if (has_msr_tsc_adjust) {
>  msrs[n++].index = MSR_TSC_ADJUST;
>  }
> @@ -1820,6 +1831,9 @@ static int kvm_get_msrs(X86CPU *cpu)
>  case MSR_IA32_TSC:
>  env->tsc = msrs[i].data;
>  break;
> +case MSR_TSC_AUX:
> +env->tsc_aux = msrs[i].data;
> +break;
>  case MSR_TSC_ADJUST:
>  env->tsc_adjust = msrs[i].data;
>  break;
> diff --git a/target-i386/machine.c b/target-i386/machine.c
> index 9fa0563..116693d 100644
> --- a/target-i386/machine.c
> +++ b/target-i386/machine.c
> @@ -453,6 +453,25 @@ static const VMStateDescription vmstate_fpop_ip_dp = {
>  }
>  };
>  
> +static bool tsc_aux_needed(void *opaque)
> +{
> +X86CPU *cpu = opaque;
> +CPUX86State *env = &cpu->env;
> +
> +return env->tsc_aux != 0;
> +}
> +
> +static const VMStateDescription vmstate_msr_tsc_aux = {
> +.name = "cpu/msr_tsc_aux",
> +.version_id = 1,
> +.minimum_version_id = 1,
> +.needed = tsc_aux_needed,
> +.fields = (VMStateField[]) {
> +VMSTATE_UINT64(env.tsc_aux, X86CPU),
> +VMSTATE_END_OF_LIST()
> +}
> +};
> +
>  static bool tsc_adjust_needed(void *opaque)
>  {
>  X86CPU *cpu = opaque;
> @@ -871,6 +890,7 @@ VMStateDescription vmstate_x86_cpu = {
>  &vmstate_msr_hyperv_crash,
>  &vmstate_avx512,
>  &vmstate_xss,
> +&vmstate_msr_tsc_aux,
>  NULL
>  }
>  };
> 