[GIT PULL] KVM fixes for 4.3-rc6

2015-10-14 Thread Paolo Bonzini
Linus,

The following changes since commit d2922422c48df93f3edff7d872ee4f3191fefb08:

  Use WARN_ON_ONCE for missing X86_FEATURE_NRIPS (2015-10-01 14:59:37 +0200)

are available in the git repository at:

  git://git.kernel.org/pub/scm/virt/kvm/kvm.git tags/for-linus

for you to fetch changes up to b10d92a54dac25a6152f1aa1ffc95c12908035ce:

  KVM: x86: fix RSM into 64-bit protected mode (2015-10-14 16:39:52 +0200)


Bug fixes for system management mode emulation.  The first two patches
fix SMM emulation on Nehalem processors.  The others fix some cases
that became apparent as work progressed on the firmware side.


Paolo Bonzini (6):
  KVM: x86: build kvm_userspace_memory_region in x86_set_memory_region
  KVM: x86: map/unmap private slots in __x86_set_memory_region
  KVM: x86: clean up kvm_arch_vcpu_runnable
  KVM: x86: fix SMI to halted VCPU
  KVM: x86: fix previous commit for 32-bit
  KVM: x86: fix RSM into 64-bit protected mode

 arch/x86/include/asm/kvm_host.h |   6 +-
 arch/x86/kvm/emulate.c  |  10 ++-
 arch/x86/kvm/vmx.c  |  26 ++--
 arch/x86/kvm/x86.c  | 135 ++--
 4 files changed, 90 insertions(+), 87 deletions(-)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] VFIO: platform: clear IRQ_NOAUTOEN when de-assigning the IRQ

2015-10-14 Thread Eric Auger
The vfio platform driver currently sets IRQ_NOAUTOEN before calling
request_irq() so that user masking is handled properly. However, it
does not clear the flag when de-assigning the IRQ. This causes problems
when the native driver is loaded again, since it may not explicitly
enable the IRQ. The problem was observed with the xgbe driver.

Signed-off-by: Eric Auger 
---
 drivers/vfio/platform/vfio_platform_irq.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/vfio/platform/vfio_platform_irq.c b/drivers/vfio/platform/vfio_platform_irq.c
index 88bba57..46d4750 100644
--- a/drivers/vfio/platform/vfio_platform_irq.c
+++ b/drivers/vfio/platform/vfio_platform_irq.c
@@ -185,6 +185,7 @@ static int vfio_set_trigger(struct vfio_platform_device *vdev, int index,
int ret;
 
if (irq->trigger) {
+   irq_clear_status_flags(irq->hwirq, IRQ_NOAUTOEN);
free_irq(irq->hwirq, irq);
kfree(irq->name);
eventfd_ctx_put(irq->trigger);
-- 
1.9.1
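
The flag imbalance described in the patch above can be illustrated with a small user-space model in C. This is only a sketch, not kernel code: `struct model_irq` and all `model_*` functions are hypothetical stand-ins, and only the flag name mirrors the patch. The model assumes that requesting a line auto-enables it unless the flag is set, and that freeing the line leaves the flag in place unless it is cleared explicitly.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; this is not the kernel's irq_desc. */
#define MODEL_IRQ_NOAUTOEN (1u << 0)

struct model_irq {
    unsigned int status;   /* sticky status flags */
    bool enabled;          /* is the line currently unmasked? */
    bool requested;        /* does a handler own the line? */
};

static void model_set_status_flags(struct model_irq *irq, unsigned int f)
{
    irq->status |= f;
}

static void model_clear_status_flags(struct model_irq *irq, unsigned int f)
{
    irq->status &= ~f;
}

/* Requesting the line enables it unless NOAUTOEN is set. */
static void model_request_irq(struct model_irq *irq)
{
    irq->requested = true;
    irq->enabled = !(irq->status & MODEL_IRQ_NOAUTOEN);
}

/* Freeing the line disables it but leaves the status flags alone. */
static void model_free_irq(struct model_irq *irq)
{
    irq->requested = false;
    irq->enabled = false;
}

/* VFIO-style assign: keep the line masked until user space unmasks it. */
static void model_vfio_assign(struct model_irq *irq)
{
    model_set_status_flags(irq, MODEL_IRQ_NOAUTOEN);
    model_request_irq(irq);
}

/* De-assign; 'clear_flag' selects the behaviour with or without the fix. */
static void model_vfio_deassign(struct model_irq *irq, bool clear_flag)
{
    if (clear_flag)
        model_clear_status_flags(irq, MODEL_IRQ_NOAUTOEN);
    model_free_irq(irq);
}
```

In this model, a later `model_request_irq()` by a "native driver" leaves the line disabled if the de-assign skipped the clear, and enables it if the flag was cleared first.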



Re: [PATCH v3 27/32] nvdimm: support DSM_CMD_IMPLEMENTED function

2015-10-14 Thread Xiao Guangrong



On 10/14/2015 05:41 PM, Stefan Hajnoczi wrote:
> On Sun, Oct 11, 2015 at 11:52:59AM +0800, Xiao Guangrong wrote:
>> +out->len = sizeof(out->status);
>
> out->len is uint16_t, it needs cpu_to_le16().  There may be other
> instances in this patch series.

out->len is used only internally and is invisible to the guest OS, i.e.
we write and read this value ourselves. I think it is okay.
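
For readers following the byte-order point, here is a minimal sketch of what an explicit little-endian store of a 16-bit length looks like. `store_le16()` and `load_le16()` are hypothetical helpers written for this note, not QEMU's `cpu_to_le16()`; they only show that the byte layout is fixed regardless of host endianness, which matters whenever the other side of the buffer may read the field.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helpers: write and read a 16-bit value in little-endian
 * byte order, independent of the host's native byte order.
 */
static void store_le16(uint8_t *dst, uint16_t v)
{
    dst[0] = (uint8_t)(v & 0xff);   /* low byte first */
    dst[1] = (uint8_t)(v >> 8);     /* high byte second */
}

static uint16_t load_le16(const uint8_t *src)
{
    return (uint16_t)(src[0] | ((uint16_t)src[1] << 8));
}
```

If a field really is written and read only by the same host code, as argued in the reply above, the conversion is a no-op in effect; the helpers matter once the value crosses into guest-visible memory.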


Re: [PATCH] VFIO: platform: AMD xgbe reset module

2015-10-14 Thread Arnd Bergmann
On Wednesday 14 October 2015 15:33:12 Eric Auger wrote:
> --- a/drivers/vfio/platform/vfio_platform_common.c
> +++ b/drivers/vfio/platform/vfio_platform_common.c
> @@ -31,6 +31,11 @@ static const struct vfio_platform_reset_combo 
> reset_lookup_table[] = {
> .reset_function_name = "vfio_platform_calxedaxgmac_reset",
> .module_name = "vfio-platform-calxedaxgmac",
> },
> +   {
> +   .compat = "amd,xgbe-seattle-v1a",
> +   .reset_function_name = "vfio_platform_amdxgbe_reset",
> +   .module_name = "vfio-platform-amdxgbe",
> +   },
>  };
>  
>  static void vfio_platform_get_reset(struct vfio_platform_device *vdev,
> 

This is causing build errors for me when CONFIG_MODULES is disabled.

Could this please be restructured so vfio_platform_get_reset does
not attempt to call __symbol_get() but instead has the drivers
register themselves properly to a subsystem?

I don't see any way this could be fixed otherwise. The problem
of course showed up with calxedaxgmac already, but I'd prefer not
to see anything added there until the common code has been improved.

Arnd
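
The restructuring requested above, where reset drivers register themselves instead of being looked up by symbol name, could look roughly like the following user-space sketch. All `model_*` names are hypothetical and chosen for illustration; this is not the interface the vfio platform code uses, and a real kernel version would additionally need locking and module reference handling.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct model_device;                      /* opaque in this sketch */
typedef int (*model_reset_fn)(struct model_device *dev);

struct model_reset_node {
    const char *compat;                   /* device-tree compatible string */
    model_reset_fn reset;                 /* reset callback for that device */
    struct model_reset_node *next;
};

static struct model_reset_node *model_reset_list;

/* Called by each reset driver from its own init code. */
static void model_register_reset(struct model_reset_node *node)
{
    node->next = model_reset_list;
    model_reset_list = node;
}

/* Called by each reset driver from its exit code. */
static void model_unregister_reset(struct model_reset_node *node)
{
    struct model_reset_node **p;

    for (p = &model_reset_list; *p; p = &(*p)->next) {
        if (*p == node) {
            *p = node->next;
            node->next = NULL;
            return;
        }
    }
}

/* Core lookup: a plain list walk, no symbol lookup by name. */
static model_reset_fn model_find_reset(const char *compat)
{
    struct model_reset_node *n;

    for (n = model_reset_list; n; n = n->next)
        if (strcmp(n->compat, compat) == 0)
            return n->reset;
    return NULL;
}

/* Stand-in for a driver-provided reset routine. */
static int model_amdxgbe_reset(struct model_device *dev)
{
    (void)dev;
    return 0;
}
```

Because the core only walks a list that drivers populate themselves, nothing in this scheme depends on module symbol lookup, which is the property the build failure with CONFIG_MODULES disabled calls for.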


Re: [PATCH v2] kvm: svm: Only propagate next_rip when guest supports it

2015-10-14 Thread Paolo Bonzini


On 14/10/2015 15:10, Joerg Roedel wrote:
> From 94ee662c527683c26ea5fa98a5a8f2c798c58470 Mon Sep 17 00:00:00 2001
> From: Joerg Roedel 
> Date: Wed, 7 Oct 2015 13:38:19 +0200
> Subject: [PATCH] kvm: svm: Only propagate next_rip when guest supports it
> 
> Currently we always write the next_rip of the shadow vmcb to
> the guests vmcb when we emulate a vmexit. This could confuse
> the guest when its cpuid indicated no support for the
> next_rip feature.
> 
> Fix this by only propagating next_rip if the guest actually
> supports it.
> 
> Cc: Bandan Das 
> Cc: Dirk Mueller 
> Tested-By: Dirk Mueller 
> Signed-off-by: Joerg Roedel 
> ---
>  arch/x86/kvm/cpuid.h | 21 +
>  arch/x86/kvm/svm.c   | 11 ++-
>  2 files changed, 31 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
> index dd05b9c..effca1f 100644
> --- a/arch/x86/kvm/cpuid.h
> +++ b/arch/x86/kvm/cpuid.h
> @@ -133,4 +133,25 @@ static inline bool guest_cpuid_has_mpx(struct kvm_vcpu 
> *vcpu)
>   best = kvm_find_cpuid_entry(vcpu, 7, 0);
>   return best && (best->ebx & bit(X86_FEATURE_MPX));
>  }
> +
> +/*
> + * NRIPS is provided through cpuidfn 0x8000000a.edx bit 3
> + */
> +#define BIT_NRIPS	3
> +
> +static inline bool guest_cpuid_has_nrips(struct kvm_vcpu *vcpu)
> +{
> + struct kvm_cpuid_entry2 *best;
> +
> + best = kvm_find_cpuid_entry(vcpu, 0x8000000a, 0);
> +
> + /*
> +  * NRIPS is a scattered cpuid feature, so we can't use
> +  * X86_FEATURE_NRIPS here (X86_FEATURE_NRIPS would be bit
> +  * position 8, not 3).
> +  */
> + return best && (best->edx & bit(BIT_NRIPS));
> +}
> +#undef BIT_NRIPS
> +
>  #endif
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 2f9ed1f..e9e3294 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -159,6 +159,9 @@ struct vcpu_svm {
>   u32 apf_reason;
>  
>   u64  tsc_ratio;
> +
> + /* cached guest cpuid flags for faster access */
> + bool nrips_enabled  : 1;
>  };
>  
>  static DEFINE_PER_CPU(u64, current_tsc_ratio);
> @@ -2365,7 +2368,9 @@ static int nested_svm_vmexit(struct vcpu_svm *svm)
>   nested_vmcb->control.exit_info_2   = vmcb->control.exit_info_2;
>   nested_vmcb->control.exit_int_info = vmcb->control.exit_int_info;
>   nested_vmcb->control.exit_int_info_err = 
> vmcb->control.exit_int_info_err;
> - nested_vmcb->control.next_rip  = vmcb->control.next_rip;
> +
> + if (svm->nrips_enabled)
> + nested_vmcb->control.next_rip  = vmcb->control.next_rip;
>  
>   /*
>* If we emulate a VMRUN/#VMEXIT in the same host #vmexit cycle we have
> @@ -4098,6 +4103,10 @@ static u64 svm_get_mt_mask(struct kvm_vcpu *vcpu, 
> gfn_t gfn, bool is_mmio)
>  
>  static void svm_cpuid_update(struct kvm_vcpu *vcpu)
>  {
> + struct vcpu_svm *svm = to_svm(vcpu);
> +
> + /* Update nrips enabled cache */
> + svm->nrips_enabled = !!guest_cpuid_has_nrips(&svm->vcpu);
>  }
>  
>  static void svm_set_supported_cpuid(u32 func, struct kvm_cpuid_entry2 *entry)
> 

Applied to kvm/queue, thanks.

Paolo
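
The logic of the applied patch can be summarised in a small user-space model. This is a sketch under stated assumptions: `struct model_cpuid_entry` and the `model_*` functions are hypothetical and much simpler than KVM's real structures; only the leaf number and bit position come from the patch comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_SVM_FEATURE_LEAF 0x8000000aU /* CPUID leaf named in the patch */
#define MODEL_NRIPS_EDX_BIT    3           /* EDX bit 3, per the patch comment */

/* Hypothetical stand-in for one guest-visible CPUID entry. */
struct model_cpuid_entry {
    uint32_t function;
    uint32_t edx;
};

static const struct model_cpuid_entry *
model_find_cpuid(const struct model_cpuid_entry *tbl, size_t n, uint32_t fn)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (tbl[i].function == fn)
            return &tbl[i];
    return NULL;
}

/* True only if the guest-visible CPUID advertises NRIPS. */
static bool model_guest_has_nrips(const struct model_cpuid_entry *tbl, size_t n)
{
    const struct model_cpuid_entry *e =
        model_find_cpuid(tbl, n, MODEL_SVM_FEATURE_LEAF);

    return e && (e->edx & (1u << MODEL_NRIPS_EDX_BIT));
}

/* Copy next_rip into the guest-visible field only when the feature is on. */
static void model_propagate_next_rip(bool nrips_enabled, uint64_t hw_next_rip,
                                     uint64_t *guest_next_rip)
{
    if (nrips_enabled)
        *guest_next_rip = hw_next_rip;
}
```

Caching the result of the CPUID lookup, as the patch does in `nrips_enabled`, keeps the table walk out of the exit path.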


Re: [PATCH v3 27/32] nvdimm: support DSM_CMD_IMPLEMENTED function

2015-10-14 Thread Xiao Guangrong



On 10/14/2015 05:40 PM, Stefan Hajnoczi wrote:
> On Sun, Oct 11, 2015 at 11:52:59AM +0800, Xiao Guangrong wrote:
>>   static void dsm_write(void *opaque, hwaddr addr,
>> uint64_t val, unsigned size)
>>   {
>> +NVDIMMState *state = opaque;
>> +MemoryRegion *dsm_ram_mr;
>> +dsm_in *in;
>> +dsm_out *out;
>> +uint32_t revision, function, handle;
>> +
>>   if (val != NOTIFY_VALUE) {
>>   fprintf(stderr, "BUG: unexepected notify value 0x%" PRIx64, val);
>>   }
>> +
>> +dsm_ram_mr = memory_region_find(&state->mr, state->page_size,
>> +state->page_size).mr;
>> +memory_region_unref(dsm_ram_mr);
>> +in = memory_region_get_ram_ptr(dsm_ram_mr);
>
> This looks suspicious.  Shouldn't the memory_region_unref(dsm_ram_mr)
> happen after we're done using it?

This region is kept alive for as long as QEMU is running, so it is okay.
The same style is used in other code, for example:
line 208 in hw/s390x/sclp.c.



>> +out = (dsm_out *)in;
>> +
>> +revision = in->arg1;
>> +function = in->arg2;
>> +handle = in->handle;
>> +le32_to_cpus(&revision);
>> +le32_to_cpus(&function);
>> +le32_to_cpus(&handle);
>> +
>> +nvdebug("UUID " UUID_FMT ".\n", in->arg0[0], in->arg0[1], in->arg0[2],
>> +in->arg0[3], in->arg0[4], in->arg0[5], in->arg0[6],
>> +in->arg0[7], in->arg0[8], in->arg0[9], in->arg0[10],
>> +in->arg0[11], in->arg0[12], in->arg0[13], in->arg0[14],
>> +in->arg0[15]);
>> +nvdebug("Revision %#x Function %#x Handler %#x.\n", revision, function,
>> +handle);
>> +
>> +if (revision != DSM_REVISION) {
>> +nvdebug("Revision %#x is not supported, expect %#x.\n",
>> +revision, DSM_REVISION);
>> +goto exit;
>> +}
>> +
>> +if (!handle) {
>> +if (!dsm_is_root_uuid(in->arg0)) {
>
> Please don't dereference 'in' or pass it to other functions.  Avoid race
> conditions with guest vcpus by copying in the entire dsm_in struct.
>
> This is like a system call - the kernel cannot trust userspace memory
> and must copy in before accessing data.  The same rules apply.



It's a little different for QEMU:
- the memory address is always valid for QEMU; that is not always true for
  the kernel, due to context switches

- we check the header before using its data. For example, when we get
  data from GET_NAMESPACE_DATA, we have already read @offset and @length from
  the memory and then copy memory based on these values, which means
  userspace has no chance to cause a buffer overflow by increasing these
  values at runtime.

  Our case is simple, whereas it is difficult for the kernel to do
  check_all_before_use, as many paths may be involved.

- it is okay for the guest to change some data; the worst case is that the
  label data is corrupted, and that is caused by the guest itself. The kernel
  also supports this kind of behaviour, e.g. network TX zero copy, where the
  userspace page is being transferred while userspace can still access it.

- it's 4K in size on x86; a full copy wastes too much CPU time.
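
The two positions in this exchange differ on how much needs to be copied, not on whether checked values must come from a private copy. A minimal sketch of the copy-then-validate pattern for just the header follows. The layout of `struct model_dsm_hdr` and the value of `MODEL_DSM_REVISION` are assumptions made for illustration; the real `dsm_in` layout is not shown in this thread.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header layout, assumed for illustration only. */
struct model_dsm_hdr {
    uint32_t handle;
    uint32_t revision;
    uint32_t function;
};

#define MODEL_DSM_REVISION 1u /* assumed value, not taken from the patch */

/*
 * Copy the header out of guest-shared memory once, then validate and use
 * only the private copy. Returns 0 on success, -1 if the revision in the
 * snapshot is not the supported one.
 */
static int model_fetch_hdr(const void *shared, struct model_dsm_hdr *snap)
{
    memcpy(snap, shared, sizeof(*snap));
    if (snap->revision != MODEL_DSM_REVISION)
        return -1;
    return 0;
}
```

Once the snapshot is taken, later guest writes to the shared page cannot change the values that were validated, and the copy is a few bytes rather than the full 4K page.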








[PATCH] VFIO: platform: AMD xgbe reset module

2015-10-14 Thread Eric Auger
This patch introduces a module that registers and implements a low-level
reset function for the AMD XGBE device.

It performs the following actions:
- reset the PHY
- disable auto-negotiation
- disable & clear auto-negotiation IRQ
- soft-reset the MAC

Those tiny pieces of code are inherited from the native xgbe driver.

Signed-off-by: Eric Auger 
---
 drivers/vfio/platform/reset/Kconfig|   7 ++
 drivers/vfio/platform/reset/Makefile   |   2 +
 .../vfio/platform/reset/vfio_platform_amdxgbe.c| 130 +
 drivers/vfio/platform/vfio_platform_common.c   |   5 +
 4 files changed, 144 insertions(+)
 create mode 100644 drivers/vfio/platform/reset/vfio_platform_amdxgbe.c

diff --git a/drivers/vfio/platform/reset/Kconfig b/drivers/vfio/platform/reset/Kconfig
index 746b96b..ed9bb28 100644
--- a/drivers/vfio/platform/reset/Kconfig
+++ b/drivers/vfio/platform/reset/Kconfig
@@ -5,3 +5,10 @@ config VFIO_PLATFORM_CALXEDAXGMAC_RESET
  Enables the VFIO platform driver to handle reset for Calxeda xgmac
 
  If you don't know what to do here, say N.
+config VFIO_PLATFORM_AMDXGBE_RESET
+   tristate "VFIO support for AMD XGBE reset"
+   depends on VFIO_PLATFORM
+   help
+ Enables the VFIO platform driver to handle reset for AMD XGBE
+
+ If you don't know what to do here, say N.
diff --git a/drivers/vfio/platform/reset/Makefile b/drivers/vfio/platform/reset/Makefile
index 2a486af..93f4e23 100644
--- a/drivers/vfio/platform/reset/Makefile
+++ b/drivers/vfio/platform/reset/Makefile
@@ -1,5 +1,7 @@
 vfio-platform-calxedaxgmac-y := vfio_platform_calxedaxgmac.o
+vfio-platform-amdxgbe-y := vfio_platform_amdxgbe.o
 
 ccflags-y += -Idrivers/vfio/platform
 
 obj-$(CONFIG_VFIO_PLATFORM_CALXEDAXGMAC_RESET) += vfio-platform-calxedaxgmac.o
+obj-$(CONFIG_VFIO_PLATFORM_AMDXGBE_RESET) += vfio-platform-amdxgbe.o
diff --git a/drivers/vfio/platform/reset/vfio_platform_amdxgbe.c b/drivers/vfio/platform/reset/vfio_platform_amdxgbe.c
new file mode 100644
index 0000000..bd2189b
--- /dev/null
+++ b/drivers/vfio/platform/reset/vfio_platform_amdxgbe.c
@@ -0,0 +1,130 @@
+/*
+ * VFIO platform driver specialized for AMD xgbe reset
+ * reset code is inherited from AMD xgbe native driver
+ *
+ * Copyright (c) 2015 Linaro Ltd.
+ *  www.linaro.org
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "vfio_platform_private.h"
+
+#define DRIVER_VERSION  "0.1"
+#define DRIVER_AUTHOR   "Eric Auger "
+#define DRIVER_DESC "Reset support for AMD xgbe vfio platform device"
+
+#define DMA_MR 0x3000
+#define MAC_VR 0x0110
+#define DMA_ISR 0x3008
+#define MAC_ISR 0x00b0
+#define PCS_MMD_SELECT 0xff
+#define MDIO_AN_INT 0x8002
+#define MDIO_AN_INTMASK 0x8001
+
+static unsigned int xmdio_read(void *ioaddr, unsigned int mmd,
+  unsigned int reg)
+{
+   unsigned int mmd_address, value;
+
+   mmd_address = (mmd << 16) | ((reg) & 0xffff);
+   iowrite32(mmd_address >> 8, ioaddr + (PCS_MMD_SELECT << 2));
+   value = ioread32(ioaddr + ((mmd_address & 0xff) << 2));
+   return value;
+}
+
+static void xmdio_write(void *ioaddr, unsigned int mmd,
+   unsigned int reg, unsigned int value)
+{
+   unsigned int mmd_address;
+
+   mmd_address = (mmd << 16) | ((reg) & 0xffff);
+   iowrite32(mmd_address >> 8, ioaddr + (PCS_MMD_SELECT << 2));
+   iowrite32(value, ioaddr + ((mmd_address & 0xff) << 2));
+}
+
+int vfio_platform_amdxgbe_reset(struct vfio_platform_device *vdev)
+{
+   struct vfio_platform_region xgmac_regs = vdev->regions[0];
+   struct vfio_platform_region xpcs_regs = vdev->regions[1];
+   u32 dma_mr_value, pcs_value, value;
+   unsigned int count;
+
+   if (!xgmac_regs.ioaddr) {
+   xgmac_regs.ioaddr =
+   ioremap_nocache(xgmac_regs.addr, xgmac_regs.size);
+   if (!xgmac_regs.ioaddr)
+   return -ENOMEM;
+   }
+   if (!xpcs_regs.ioaddr) {
+   xpcs_regs.ioaddr =
+   ioremap_nocache(xpcs_regs.addr, xpcs_regs.size);
+   if (!xpcs_regs.ioaddr)
+   

Re: [PATCH v3 27/32] nvdimm: support DSM_CMD_IMPLEMENTED function

2015-10-14 Thread Eduardo Habkost
On Wed, Oct 14, 2015 at 10:50:40PM +0800, Xiao Guangrong wrote:
> On 10/14/2015 05:40 PM, Stefan Hajnoczi wrote:
> >On Sun, Oct 11, 2015 at 11:52:59AM +0800, Xiao Guangrong wrote:
> >>  static void dsm_write(void *opaque, hwaddr addr,
> >>uint64_t val, unsigned size)
> >>  {
> >>+NVDIMMState *state = opaque;
> >>+MemoryRegion *dsm_ram_mr;
> >>+dsm_in *in;
> >>+dsm_out *out;
> >>+uint32_t revision, function, handle;
> >>+
> >>  if (val != NOTIFY_VALUE) {
> >>  fprintf(stderr, "BUG: unexepected notify value 0x%" PRIx64, val);
> >>  }
> >>+
> >>+dsm_ram_mr = memory_region_find(&state->mr, state->page_size,
> >>+state->page_size).mr;
> >>+memory_region_unref(dsm_ram_mr);
> >>+in = memory_region_get_ram_ptr(dsm_ram_mr);
> >
> >This looks suspicious.  Shouldn't the memory_region_unref(dsm_ram_mr)
> >happen after we're done using it?
> 
> This region is keep-alive during QEMU's running, it is okay.  The same
> style is applied to other codes, for example: line 208 in
> hw/s390x/sclp.c.

In sclp.c (assign_storage()), the memory region is never used after
memory_region_unref() is called. In unassign_storage(), sclp.c owns an
additional reference, grabbed by assign_storage().

-- 
Eduardo


Re: [edk2] KVM: MTRR: fix memory type handling if MTRR is completely disabled

2015-10-14 Thread Janusz
On 14.10.2015 at 10:32, Xiao Guangrong wrote:
>
>
> On 10/14/2015 04:24 PM, Xiao Guangrong wrote:
>>
>>
>> On 10/14/2015 03:37 PM, Janusz wrote:
>>> I was able to run my virtual machine with this, but had very high cpu
>>> usage when something happen in it like booting system. once, my virtual
>>> machine hang and I couln't even get my mouse / keyboard back from qemu.
>>> When I did vga passthrough, I didn't get any video output, and cpu
>>> usage
>>> was also high. Tried it on 4.3
>>
>> Which tree are you using? Is it kvm tree?
>> Could you please work on queue brancn on current kvm tree based on
>> top commit 73917739334c6509: KVM: x86: fix SMI to halted VCPU.
>>
>> Hmm... interesting, this diff works on my box...
>
> Forgot to say that i built my test env following the instructions on
> kvm-wiki:
> http://www.linux-kvm.org/page/OVMF
>
> My test script is attached, and i will try to build the env like yours
> as much
> as possible...
I cloned git://git.kernel.org/pub/scm/virt/kvm/kvm.git at commit
73917739334c6509, but this breaks my system...
Slim is not able to start i3, xdm does not kill X when I stop xdm, and qemu
is not able to start unless I use the -nographic option.
Log from qemu on that kernel version:
xcb_connection_has_error() returned true
No protocol specified
Could not initialize SDL(No available video device) - exiting

On the main kernel branch I don't have those problems.

I tried to run with -nographic, and tried pc-i440fx-2.1, but I get the same
problem as before: high cpu usage and no graphics on my GPU.
I don't know if it will help, but this is my log from the option -global
isa-debugcon.iobase=0x402 -debugcon file:fedora.ovmf.log:
https://bpaste.net/show/36c54dba68c2



Re: [edk2] KVM: MTRR: fix memory type handling if MTRR is completely disabled

2015-10-14 Thread Janusz
I was able to run my virtual machine with this, but had very high cpu
usage whenever something happened in it, such as booting the system. Once,
my virtual machine hung and I couldn't even get my mouse / keyboard back
from qemu. When I did vga passthrough, I didn't get any video output, and
cpu usage was also high. Tried it on 4.3

On 14.10.2015 at 05:58, Xiao Guangrong wrote:
>
> Janusz,
>
> Could you please try this:
>
> $ git diff
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 185fc16..bdd564f 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4957,12 +4957,14 @@ static int handle_emulation_failure(struct
> kvm_vcpu *vcpu)
>
> ++vcpu->stat.insn_emulation_fail;
> trace_kvm_emulate_insn_failed(vcpu);
> +#if 0
> if (!is_guest_mode(vcpu) && kvm_x86_ops->get_cpl(vcpu) == 0) {
> vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
> vcpu->run->internal.suberror =
> KVM_INTERNAL_ERROR_EMULATION;
> vcpu->run->internal.ndata = 0;
> r = EMULATE_FAIL;
> }
> +#endif
> kvm_queue_exception(vcpu, UD_VECTOR);
>
> return r;
>
> To see if the issue still there?
>
>
> On 10/02/2015 10:38 PM, Janusz wrote:
>> On 01.10.2015 at 16:18, Paolo Bonzini wrote:
>>>
>>> On 01/10/2015 16:12, Janusz wrote:
>>>> Now, I can also add, that the problem is only when I allow VM to use
>>>> more than one core, so with option  for example:
>>>> -smp 8,cores=4,threads=2,sockets=1 and other combinations like -smp
>>>> 4,threads=1 its not working, and without it I am always running VM
>>>> without problems
>>>>
>>>> Any ideas what can it be? or any idea what would help to find out what
>>>> is causing this?
>>> I am going to send a revert of the patch tomorrow.
>>>
>>> Paolo
>> Thanks, but revert patch doesn't help, so something else is wrong here
>>



Re: [edk2] KVM: MTRR: fix memory type handling if MTRR is completely disabled

2015-10-14 Thread Janusz
On 14.10.2015 at 11:13, Janusz wrote:
> On 14.10.2015 at 10:32, Xiao Guangrong wrote:
>>
>> On 10/14/2015 04:24 PM, Xiao Guangrong wrote:
>>>
>>> On 10/14/2015 03:37 PM, Janusz wrote:
>>>> I was able to run my virtual machine with this, but had very high cpu
>>>> usage when something happen in it like booting system. once, my virtual
>>>> machine hang and I couln't even get my mouse / keyboard back from qemu.
>>>> When I did vga passthrough, I didn't get any video output, and cpu
>>>> usage
>>>> was also high. Tried it on 4.3
>>> Which tree are you using? Is it kvm tree?
>>> Could you please work on queue brancn on current kvm tree based on
>>> top commit 73917739334c6509: KVM: x86: fix SMI to halted VCPU.
>>>
>>> Hmm... interesting, this diff works on my box...
>> Forgot to say that i built my test env following the instructions on
>> kvm-wiki:
>> http://www.linux-kvm.org/page/OVMF
>>
>> My test script is attached, and i will try to build the env like yours
>> as much
>> as possible...
> I attach my script. I see that you are using pc-i440fx-2.1 - I use
> default, I think its pc-i440fx-2.4, tried 2.3 some time ago and I get
> the same problem. I will try with 2.1 after work
> I am using  master from main kernel  tree, will also try this tree you
> mentioned after work
I am sending this one more time, as my message was rejected by intel
servers because of attached script... Script:
https://bpaste.net/show/8467c3af8b18


Re: [edk2] KVM: MTRR: fix memory type handling if MTRR is completely disabled

2015-10-14 Thread Xiao Guangrong



On 10/14/2015 04:24 PM, Xiao Guangrong wrote:
> On 10/14/2015 03:37 PM, Janusz wrote:
>> I was able to run my virtual machine with this, but had very high cpu
>> usage when something happen in it like booting system. once, my virtual
>> machine hang and I couln't even get my mouse / keyboard back from qemu.
>> When I did vga passthrough, I didn't get any video output, and cpu usage
>> was also high. Tried it on 4.3
>
> Which tree are you using? Is it kvm tree?
> Could you please work on queue brancn on current kvm tree based on
> top commit 73917739334c6509: KVM: x86: fix SMI to halted VCPU.
>
> Hmm... interesting, this diff works on my box...

Forgot to say that I built my test env following the instructions on kvm-wiki:
http://www.linux-kvm.org/page/OVMF

My test script is attached, and I will try to build the env like yours as much
as possible...


ovmf.sh
Description: application/shellscript


Re: [edk2] KVM: MTRR: fix memory type handling if MTRR is completely disabled

2015-10-14 Thread Janusz
On 14.10.2015 at 10:32, Xiao Guangrong wrote:
>
>
> On 10/14/2015 04:24 PM, Xiao Guangrong wrote:
>>
>>
>> On 10/14/2015 03:37 PM, Janusz wrote:
>>> I was able to run my virtual machine with this, but had very high cpu
>>> usage when something happen in it like booting system. once, my virtual
>>> machine hang and I couln't even get my mouse / keyboard back from qemu.
>>> When I did vga passthrough, I didn't get any video output, and cpu
>>> usage
>>> was also high. Tried it on 4.3
>>
>> Which tree are you using? Is it kvm tree?
>> Could you please work on queue brancn on current kvm tree based on
>> top commit 73917739334c6509: KVM: x86: fix SMI to halted VCPU.
>>
>> Hmm... interesting, this diff works on my box...
>
> Forgot to say that i built my test env following the instructions on
> kvm-wiki:
> http://www.linux-kvm.org/page/OVMF
>
> My test script is attached, and i will try to build the env like yours
> as much
> as possible...
I attach my script. I see that you are using pc-i440fx-2.1 - I use
default, I think its pc-i440fx-2.4, tried 2.3 some time ago and I get
the same problem. I will try with 2.1 after work
I am using  master from main kernel  tree, will also try this tree you
mentioned after work


kvm.sh
Description: application/shellscript


Re: [edk2] KVM: MTRR: fix memory type handling if MTRR is completely disabled

2015-10-14 Thread Laszlo Ersek
On 10/14/15 10:32, Xiao Guangrong wrote:
> 
> 
> On 10/14/2015 04:24 PM, Xiao Guangrong wrote:
>>
>>
>> On 10/14/2015 03:37 PM, Janusz wrote:
>>> I was able to run my virtual machine with this, but had very high cpu
>>> usage when something happen in it like booting system. once, my virtual
>>> machine hang and I couln't even get my mouse / keyboard back from qemu.
>>> When I did vga passthrough, I didn't get any video output, and cpu usage
>>> was also high. Tried it on 4.3
>>
>> Which tree are you using? Is it kvm tree?
>> Could you please work on queue brancn on current kvm tree based on
>> top commit 73917739334c6509: KVM: x86: fix SMI to halted VCPU.
>>
>> Hmm... interesting, this diff works on my box...
> 
> Forgot to say that i built my test env following the instructions on
> kvm-wiki:
> http://www.linux-kvm.org/page/OVMF

Wow! Someone actually cares about the whitepaper. Thank you. :)

Laszlo

> 
> My test script is attached, and i will try to build the env like yours
> as much
> as possible...



Re: [edk2] KVM: MTRR: fix memory type handling if MTRR is completely disabled

2015-10-14 Thread Xiao Guangrong



On 10/14/2015 03:37 PM, Janusz wrote:
> I was able to run my virtual machine with this, but had very high cpu
> usage when something happen in it like booting system. once, my virtual
> machine hang and I couln't even get my mouse / keyboard back from qemu.
> When I did vga passthrough, I didn't get any video output, and cpu usage
> was also high. Tried it on 4.3

Which tree are you using? Is it the kvm tree?
Could you please work on the queue branch of the current kvm tree, based on
top commit 73917739334c6509: KVM: x86: fix SMI to halted VCPU.

Hmm... interesting, this diff works on my box...



> On 14.10.2015 at 05:58, Xiao Guangrong wrote:
>>
>> Janusz,
>>
>> Could you please try this:
>>
>> $ git diff
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 185fc16..bdd564f 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -4957,12 +4957,14 @@ static int handle_emulation_failure(struct
>> kvm_vcpu *vcpu)
>>
>>  ++vcpu->stat.insn_emulation_fail;
>>  trace_kvm_emulate_insn_failed(vcpu);
>> +#if 0
>>  if (!is_guest_mode(vcpu) && kvm_x86_ops->get_cpl(vcpu) == 0) {
>>  vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
>>  vcpu->run->internal.suberror =
>> KVM_INTERNAL_ERROR_EMULATION;
>>  vcpu->run->internal.ndata = 0;
>>  r = EMULATE_FAIL;
>>  }
>> +#endif
>>  kvm_queue_exception(vcpu, UD_VECTOR);
>>
>>  return r;
>>
>> To see if the issue still there?
>>
>>
>> On 10/02/2015 10:38 PM, Janusz wrote:
>>> On 01.10.2015 at 16:18, Paolo Bonzini wrote:
>>>>
>>>> On 01/10/2015 16:12, Janusz wrote:
>>>>> Now, I can also add, that the problem is only when I allow VM to use
>>>>> more than one core, so with option  for example:
>>>>> -smp 8,cores=4,threads=2,sockets=1 and other combinations like -smp
>>>>> 4,threads=1 its not working, and without it I am always running VM
>>>>> without problems
>>>>>
>>>>> Any ideas what can it be? or any idea what would help to find out what
>>>>> is causing this?
>>>>
>>>> I am going to send a revert of the patch tomorrow.
>>>>
>>>> Paolo
>>>
>>> Thanks, but revert patch doesn't help, so something else is wrong here







Re: [PATCH v3 00/16] KVM: arm64: GICv3 ITS emulation

2015-10-14 Thread Eric Auger
Hi Andre, Pavel
On 10/12/2015 05:18 PM, Pavel Fedin wrote:
>  Hello!
> 
>> Also what is the status of Eric's IRQ routing support? Should this go in
>> first now?
> 
>  I'd say without vITS there's nothing to use IRQ routing with. It could go in 
> and just lay around
> silently, so that it's not forgotten, but for example current qemu just knows 
> that with GICv2m it
> should use hardcoded linear MSI->SPI mapping.
Currently the gsi routing applies on top of ITS emulation series. I am
going to rebase it soon. It can go in 4.5 with ITS emulation series.

Best Regards

Eric
> 
> Kind regards,
> Pavel Fedin
> Expert Engineer
> Samsung Electronics Research center Russia
> 
> 



RE: [PATCH v3 00/16] KVM: arm64: GICv3 ITS emulation

2015-10-14 Thread Pavel Fedin
 Hello!

> Currently the gsi routing applies on top of ITS emulation series. I am
> going to rebase it soon. It can go in 4.5 with ITS emulation series.

 Ah, yes, of course, because it reuses API definitions. I forgot this.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia




Re: [Qemu-devel] [PATCH v3 00/32] implement vNVDIMM

2015-10-14 Thread Dan Williams
On Tue, Oct 13, 2015 at 9:03 PM, Xiao Guangrong
 wrote:
>> Label-less DIMMs are tested as part of the unit test [1] and the
>> "memmap=nn!ss" kernel parameter that registers a persistent-memory
>> address range without a DIMM.  What error do you see when label
>> support is disabled?
>>
>> [1]: https://github.com/pmem/ndctl/blob/master/README.md
>>
>
> After revert my commits on NVDIMM driver, yeah, it works.
>
> Okay, i will drop the namespace part and make it as label-less
> instead.
>
> Thank you, Dan!
>

Good to hear.  There are still cases where a guest would likely want
to submit a _DSM, like retrieving address range scrub results from the
ACPI0012 root device, so the ASL work is still needed.  However, I
think the bulk of the storage functionality can be had without
storing/retrieving labels in the guest.


just an observation about USB

2015-10-14 Thread Eric S. Johansson

 update from the  NaturallySpeaking in a VM project.

 don't remember what I told you before but, yes I can now send 
keystroke events generated by speech recognition in the Windows guest 
into the Linux input queue. I can also extract information from the 
Linux side, and have it modify the grammar on the Windows side. The 
result of  activating that grammar  is that I can execute code on either 
side  in response to speech recognition commands. it's fragile as all 
hell but I'm the only one using it so far. :-)


Latency is a bit longer than I like. USB and network connections break 
every time I come out of suspend part at least I don't have to use 
Windows all the time.


One thing is puzzling though. Windows, at idle, consumes something like 
15 to 20% CPU according to top. When I turn on NaturallySpeaking, the 
utilization climbs to roughly 30 to 40%. When I turn on the microphone, 
utilization jumps to 80-110%.  In other words, it takes up a 
whole core.


I can live with it. I chalk it up to the cost of having a disability 
(a.k.a. cripple tax).


I hope my observations are useful. If you want me to monitor 
anything, let me know and I'll try to fit it into my daily routine.



Re: just an observation about USB

2015-10-14 Thread Paolo Bonzini


On 14/10/2015 21:39, Eric S. Johansson wrote:
> An update from the NaturallySpeaking-in-a-VM project.
> 
> I don't remember what I told you before, but yes, I can now send keystroke
> events generated by speech recognition in the Windows guest into the
> Linux input queue. I can also extract information from the Linux side
> and have it modify the grammar on the Windows side. The result of
> activating that grammar is that I can execute code on either side in
> response to speech recognition commands. It's fragile as all hell, but
> I'm the only one using it so far. :-)

That's awesome!  What was the problem?

> Latency is a bit longer than I like. USB and network connections break
> every time I come out of suspend, but at least I don't have to use
> Windows all the time.
> 
> One thing is puzzling though. Windows, at idle, consumes something like
> 15 to 20% CPU according to top. When I turn on NaturallySpeaking, the
> utilization climbs to roughly 30 to 40%. When I turn on the microphone,
> utilization jumps to 80-110%.  In other words, it takes up a
> whole core.

USB is really expensive because it's all done through polling.  Do that
in hardware, and your computer is a bit hotter; do that in software
(that's what VMs do) and your computer doubles as a frying pan.

If you have USB3 drivers in Windows, you can try using a USB3
controller.  But it's probably going to waste a lot of processing power
too, because USB audio uses a lot of small packets, making it basically
the worst case.

Paolo
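
To put rough numbers on the "lot of small packets" point, here is a
back-of-the-envelope sketch. It assumes a full-speed USB audio device that
sends one isochronous packet per 1 ms frame; the helper name and the PCM
formats are illustrative assumptions, not measurements from this setup.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: full-speed USB, one isochronous packet per 1 ms frame. */
#define USB_FS_FRAMES_PER_SEC 1000u

/* Payload bytes carried by each 1 ms packet for a given PCM format. */
static uint32_t usb_audio_bytes_per_packet(uint32_t sample_rate_hz,
                                           uint32_t bytes_per_sample,
                                           uint32_t channels)
{
    /* Keep the arithmetic exact: rates such as 44100 Hz need uneven packets. */
    assert(sample_rate_hz % USB_FS_FRAMES_PER_SEC == 0);
    return (sample_rate_hz / USB_FS_FRAMES_PER_SEC) * bytes_per_sample * channels;
}
```

Under those assumptions a 16 kHz, 16-bit mono microphone produces 32 bytes
per packet, 1000 times a second, and an emulated controller has to handle
each of those packets in software.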


Re: just an observation about USB

2015-10-14 Thread Eric S. Johansson



On 10/14/2015 04:04 PM, Paolo Bonzini wrote:


> On 14/10/2015 21:39, Eric S. Johansson wrote:
>> An update from the NaturallySpeaking-in-a-VM project.
>>
>> I don't remember what I told you before, but yes, I can now send keystroke
>> events generated by speech recognition in the Windows guest into the
>> Linux input queue. I can also extract information from the Linux side
>> and have it modify the grammar on the Windows side. The result of
>> activating that grammar is that I can execute code on either side in
>> response to speech recognition commands. It's fragile as all hell, but
>> I'm the only one using it so far. :-)
>
> That's awesome!  What was the problem?

I would have to say most of the problems were because I just didn't 
know enough. Once I found the right people and gained a bit more 
knowledge about subsystems I had never touched, it came together pretty easily.


I'm living with this for a while to get a feel for what I need to do 
next. It looks like the two things that would be most important are 
communicating window status (i.e. whether it is in a text area or not) and 
creating something like Select-and-Say without actually using it, 
because Nuance isn't talking about how to make it work.


The first is important so that I know when to dump keystrokes from 
inappropriate recognition. Take Thunderbird, for example. You only want 
generalized dictation in text regions, such as when composing this email. 
You don't want it happening where keystroke commands are active, such as 
the navigation windows. Let me tell you, I have lost more email to 
misrecognition errors at the wrong time than to anything else.


The second is important to enable correction and speech-driven editing.





>> Latency is a bit longer than I like. USB and network connections break
>> every time I come out of suspend, but at least I don't have to use
>> Windows all the time.
>>
>> One thing is puzzling though. Windows, at idle, consumes something like
>> 15 to 20% CPU according to top. When I turn on NaturallySpeaking, the
>> utilization climbs to roughly 30 to 40%. When I turn on the microphone,
>> utilization jumps to 80-110%.  In other words, it takes up a
>> whole core.
>
> USB is really expensive because it's all done through polling.  Do that
> in hardware, and your computer is a bit hotter; do that in software
> (that's what VMs do) and your computer doubles as a frying pan.
>
> If you have USB3 drivers in Windows, you can try using a USB3
> controller.  But it's probably going to waste a lot of processing power
> too, because USB audio uses a lot of small packets, making it basically
> the worst case.


 Okay, then let's try to solve this a different way. What's the 
cleanest, lowest latency way of delivering audio to a virtual machine 
that doesn't use USB in the virtual machine?


I will say that my experience here, and this note about USB explaining 
why my laptop gets so hot, reinforces where I want to go with this model 
of accessibility tools. It's nice to be able to make this happen in a VM, 
but I think the better solution is to keep all of the accessibility 
tools, such as speech recognition or text-to-speech, in a tablet-like 
device, so you can dedicate all of the horsepower and carry all of 
the accessibility interface in a dedicated platform. Then it should be 
relatively simple[1] to put a small bit of software on the machine 
where you do your work and make that box accessible to a disabled user.


I've simulated this with 2 laptops and it worked really well, much 
better than with a virtual machine. The challenge is finding a suitable 
secondary device that can run Windows and NaturallySpeaking, plus 
whatever else, and that isn't too large, too expensive, or too slow.


http://nuance.custhelp.com/app/answers/detail/a_id/16262/~/system-requirements-for-dragon-naturallyspeaking-13

From past experience, I can tell you that the specs are good for at 
least 2 releases as long as you are running nothing else on that machine.


--- eric

[1]  you can stop laughing now. :-)




[PATCH] KVM: x86: move steal time initialization to vcpu entry time

2015-10-14 Thread Marcelo Tosatti

As reported at https://bugs.launchpad.net/qemu/+bug/1494350, 
it is possible to have vcpu->arch.st.last_steal initialized 
from a thread other than the vcpu thread, say the iothread, via 
KVM_SET_MSRS.

This can cause an overflow later (when subtracting from the vcpu
thread's sched_info.run_delay).

To avoid that, move steal time accumulation to vcpu entry time, 
before copying steal time data to the guest.

Signed-off-by: Marcelo Tosatti 

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8f0f6ec..0e0332e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2030,6 +2030,8 @@ static void accumulate_steal_time(struct kvm_vcpu *vcpu)
 
 static void record_steal_time(struct kvm_vcpu *vcpu)
 {
+	accumulate_steal_time(vcpu);
+
 	if (!(vcpu->arch.st.msr_val & KVM_MSR_ENABLED))
 		return;
 
@@ -2182,12 +2184,6 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		if (!(data & KVM_MSR_ENABLED))
 			break;
 
-		vcpu->arch.st.last_steal = current->sched_info.run_delay;
-
-		preempt_disable();
-		accumulate_steal_time(vcpu);
-		preempt_enable();
-
 		kvm_make_request(KVM_REQ_STEAL_UPDATE, vcpu);
 
 		break;
@@ -2830,7 +2826,6 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 		vcpu->cpu = cpu;
 	}
 
-	accumulate_steal_time(vcpu);
 	kvm_make_request(KVM_REQ_STEAL_UPDATE, vcpu);
 }
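
For readers following along, the overflow described in the commit message
can be reproduced with a small user-space model. This is only a sketch: the
struct and function names below are hypothetical stand-ins, not the kernel's
types, and the run_delay values are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Models current->sched_info.run_delay for one thread. */
struct thread_sched {
    uint64_t run_delay;
};

/* Models the two steal-time fields the patch is concerned with. */
struct vcpu_steal {
    uint64_t last_steal;
    uint64_t accum_steal;
};

/* Accumulate the run_delay growth since the last sample (unsigned math). */
static void accumulate(struct vcpu_steal *st, const struct thread_sched *cur)
{
    assert(st != NULL && cur != NULL);
    /* Wraps around if last_steal was sampled from a thread with a larger run_delay. */
    uint64_t delta = cur->run_delay - st->last_steal;

    st->last_steal = cur->run_delay;
    st->accum_steal += delta;
}
```

If last_steal is seeded from an iothread whose run_delay is 5000 and the
vcpu thread's run_delay is only 100, the delta wraps to a value close to
2^64. Sampling only from the vcpu thread, as the patch does, keeps both
operands on the same clock.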
 


Re: [PATCH] KVM: VMX: enable LBR virtualization

2015-10-14 Thread Jian Zhou

On 12/10/2015 20:44, Paolo Bonzini wrote:



> On 12/10/2015 14:10, Jian Zhou wrote:
>> ping...
>
> I think your expectations for review RTT are a bit too optimistic.
> I have only worked 4 hours since you posted the patch...  But it was on
> my list anyway, so let's do it.

  Thanks to Paolo for taking the time to review and for the valuable comments. :)


> First of all, you should move the implementation entirely into vmx.c,
> because these new hooks are not acceptable:
>
>> +	void (*vmcs_write64)(unsigned long field, u64 value);
>> +	u64 (*vmcs_read64)(unsigned long field);
>> +
>> +	int (*add_atomic_switch_msr)(struct kvm_vcpu *vcpu, u32 msr, u64 guest_val, u64 host_val);
>> +	void (*disable_intercept_guest_msr)(struct kvm_vcpu *vcpu, u32 msr);
>
> x86.c must not have any knowledge of VMX internals such as the VMCS.
> Also, AMD have their own implementation of LBR virtualization.

  ok. These hooks will be modified in the next patch.


> In addition, the MSR numbers may differ between the guest and the host,
> because it is possible to emulate e.g. a Core CPU on a Core 2 CPU.  So I
> recommend against using the atomic switch mechanism for the from/to MSRs.

  The vLBR feature depends on vPMU, and enabling vPMU requires setting
  the "cpu mode" in the guest XML to host-passthrough. I think the MSR
  numbers are the same in the guest and the host in this scenario.


> Instead, if GUEST_DEBUGCTL.LBR = 1 you can read the from/to MSRs into an
> array stored in struct kvm_arch_vcpu at vmexit time.  Reading the
> from/to MSRs should cause a vmexit in the first implementation.  Any
> optimization of this can be done separately.

  ok. I will compare this method to atomic switch mechanism.

> As a benefit, this will force you to implement a mechanism for passing
> the contents of the MSRs to userspace and read them back.  This is
> necessary for debugging and for migration.  You will also have to
> implement support for the feature in QEMU in order to support migration
> of virtual machines that use LBRs.

  ok. Migration will be supported in the next patch.


>> +/* Core 2 and Atom last-branch recording */
>> +#define MSR_C2_LASTBRANCH_TOS			0x01c9
>> +#define MSR_C2_LASTBRANCH_0_FROM_IP		0x0040
>> +#define MSR_C2_LASTBRANCH_0_TO_IP		0x0060
>> +#define NUM_MSR_C2_LASTBRANCH_FROM_TO		4
>> +#define NUM_MSR_ATOM_LASTBRANCH_FROM_TO	8
>> +
>> +struct lbr_info {
>> +	u32 base, count;
>> +} p4_lbr[] = {
>> +	{ MSR_LBR_SELECT,               1 },
>> +	{ MSR_P4_LER_FROM_LIP,          1 },
>> +	{ MSR_P4_LER_TO_LIP,            1 },
>> +	{ MSR_P4_LASTBRANCH_TOS,        1 },
>> +	{ MSR_P4_LASTBRANCH_0_FROM_LIP, NUM_MSR_P4_LASTBRANCH_FROM_TO },
>> +	{ MSR_P4_LASTBRANCH_0_TO_LIP,   NUM_MSR_P4_LASTBRANCH_FROM_TO },
>> +	{ 0, 0 }
>> +}, c2_lbr[] = {
>> +	{ MSR_LBR_SELECT,               1 },
>> +	{ MSR_IA32_LASTINTFROMIP,       1 },
>> +	{ MSR_IA32_LASTINTTOIP,         1 },
>> +	{ MSR_C2_LASTBRANCH_TOS,        1 },
>> +	{ MSR_C2_LASTBRANCH_0_FROM_IP,  NUM_MSR_C2_LASTBRANCH_FROM_TO },
>> +	{ MSR_C2_LASTBRANCH_0_TO_IP,    NUM_MSR_C2_LASTBRANCH_FROM_TO },
>> +	{ 0, 0 }
>> +}, nh_lbr[] = {
>> +	{ MSR_LBR_SELECT,               1 },
>> +	{ MSR_IA32_LASTINTFROMIP,       1 },
>> +	{ MSR_IA32_LASTINTTOIP,         1 },
>> +	{ MSR_C2_LASTBRANCH_TOS,        1 },
>
> Nehalem has 16 LBR records, not 4.  I suggest that you reorganize the
> tables so that it is easy to match them against tables in the Intel SDM.
> Note that you have to compute the number of records according to the
> _guest_ CPUID, not the host CPUID.  For simplicity I suggest that you
> only enable LBR virtualization if the host machine has 16 LBR entries,

  ok. The table will be reorganized in the next patch.

> and make it possible to disable it through a kvm_intel module parameter.

  A kvm_intel module parameter will be added to permanently disable
  LBR virtualization in the next patch.

  Thanks and regards,

  Jian

  Thanks and regards,

  Jian


> Thanks,
>
> Paolo
>
>> +	{ MSR_P4_LASTBRANCH_0_FROM_LIP, NUM_MSR_P4_LASTBRANCH_FROM_TO },
>> +	{ MSR_P4_LASTBRANCH_0_TO_LIP,   NUM_MSR_P4_LASTBRANCH_FROM_TO },
>> +	{ 0, 0 }
>> +}, at_lbr[] = {
>> +	{ MSR_LBR_SELECT,               1 },
>> +	{ MSR_IA32_LASTINTFROMIP,       1 },
>> +	{ MSR_IA32_LASTINTTOIP,         1 },
>> +	{ MSR_C2_LASTBRANCH_TOS,        1 },
>> +	{ MSR_C2_LASTBRANCH_0_FROM_IP,  NUM_MSR_ATOM_LASTBRANCH_FROM_TO },
>> +	{ MSR_C2_LASTBRANCH_0_TO_IP,    NUM_MSR_ATOM_LASTBRANCH_FROM_TO },
>> +	{ 0, 0 }
>> +};
>> +
>> +static const struct lbr_info *last_branch_msr_get(void)
>> +{
>> +	switch ( boot_cpu_data.x86 )
>> +	{
>> +	case 6:
>> +		switch ( boot_cpu_data.x86_model )
>> +		{
>> +		/* Core2 Duo */
>> +		case 15:
>> +		/* Enhanced Core */
>> +		case 23:
>> +			return c2_lbr;
>> +			break;
>> +

Re: [PATCH] KVM: VMX: enable LBR virtualization

2015-10-14 Thread Paolo Bonzini


On 14/10/2015 13:26, Jian Zhou wrote:
> On 12/10/2015 20:44, Paolo Bonzini wrote:
>> In addition, the MSR numbers may differ between the guest and the host,
>> because it is possible to emulate e.g. a Core CPU on a Core 2 CPU.  So I
>> recommend against using the atomic switch mechanism for the from/to MSRs.
> 
>   The vLBR feature depends on vPMU, and enabling vPMU requires setting
>   the "cpu mode" in the guest XML to host-passthrough. I think the MSR
>   numbers are the same in the guest and the host in this scenario.

Does it depend on vPMU _for Linux guests_ or in general?  My impression
is that LBR can be used by the guest independent of the PMU.  You should
also write a unit test for kvm-unit-tests to test the behavior of your
implementation.

Thanks,

Paolo
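
As an illustration of the approach suggested earlier in this thread (save
the from/to MSRs into a per-vCPU array at vmexit when GUEST_DEBUGCTL.LBR is
set), here is a minimal user-space sketch. Everything in it is an
assumption made for illustration: struct lbr_state, lbr_save_on_vmexit()
and the fake_rdmsr() stand-in do not exist in KVM, and a real
implementation would read the MSRs with the kernel's own accessors.

```c
#include <assert.h>
#include <stdint.h>

#define DEBUGCTL_LBR     (1ULL << 0)  /* LBR enable bit of IA32_DEBUGCTL */
#define LBR_MAX_ENTRIES  16           /* depth suggested as the host requirement */

/* Hypothetical per-vCPU storage for the saved records. */
struct lbr_state {
    uint64_t from[LBR_MAX_ENTRIES];
    uint64_t to[LBR_MAX_ENTRIES];
    unsigned int nr;                  /* number of valid records */
};

/* Stand-in for an MSR read so the sketch runs outside the kernel. */
typedef uint64_t (*msr_read_fn)(uint32_t msr);

static uint64_t fake_rdmsr(uint32_t msr)
{
    return 0x1000u + msr;             /* arbitrary, recognisable value */
}

/* Save nr from/to pairs if the guest has LBR recording enabled. */
static void lbr_save_on_vmexit(struct lbr_state *st, uint64_t guest_debugctl,
                               uint32_t from_base, uint32_t to_base,
                               unsigned int nr, msr_read_fn rd)
{
    assert(nr <= LBR_MAX_ENTRIES);
    st->nr = 0;
    if (!(guest_debugctl & DEBUGCTL_LBR))
        return;                       /* guest is not recording branches */
    for (unsigned int i = 0; i < nr; i++) {
        st->from[i] = rd(from_base + i);
        st->to[i] = rd(to_base + i);
    }
    st->nr = nr;
}
```

Because the records end up in a plain array, the same structure could be
handed to userspace for debugging or migration, which is the side benefit
mentioned in the review. The record count passed in would have to come
from the guest CPUID, not the host's.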


RE: [PATCH v9 17/18] KVM: Update Posted-Interrupts Descriptor when vCPU is blocked

2015-10-14 Thread Wu, Feng


> -Original Message-
> From: David Matlack [mailto:dmatl...@google.com]
> Sent: Thursday, October 15, 2015 7:41 AM
> To: Wu, Feng 
> Cc: Paolo Bonzini ; alex.william...@redhat.com; Joerg
> Roedel ; Marcelo Tosatti ;
> eric.au...@linaro.org; kvm list ; iommu@lists.linux-
> foundation.org; linux-ker...@vger.kernel.org
> Subject: Re: [PATCH v9 17/18] KVM: Update Posted-Interrupts Descriptor when
> vCPU is blocked
> 
> Hi Feng.
> 
> On Fri, Sep 18, 2015 at 7:29 AM, Feng Wu  wrote:
> > This patch updates the Posted-Interrupts Descriptor when vCPU
> > is blocked.
> >
> > pre-block:
> > - Add the vCPU to the blocked per-CPU list
> > - Set 'NV' to POSTED_INTR_WAKEUP_VECTOR
> >
> > post-block:
> > - Remove the vCPU from the per-CPU list
> 
> I'm wondering what happens if a posted interrupt arrives at the IOMMU
> after pre-block and before post-block.
> 
> In pre_block, NV is set to POSTED_INTR_WAKEUP_VECTOR. IIUC, this means
> future posted interrupts will not trigger "Posted-Interrupt Processing"
> (PIR will not get copied to VIRR). Instead, the IOMMU will do ON := 1,
> PIR |= (1 << vector), and send POSTED_INTR_WAKEUP_VECTOR. PIWV calls
> wakeup_handler which does kvm_vcpu_kick. kvm_vcpu_kick does a wait-queue
> wakeup and possibly a scheduler ipi.
> 
> But the VCPU is sitting in kvm_vcpu_block. It spins and/or schedules
> (wait queue) until it has a reason to wake up. I couldn't find a code
> path from kvm_vcpu_block that leads to checking ON or PIR. How does the
> blocked VCPU "receive" the posted interrupt? (And when does Posted-
> Interrupt Processing get triggered?)

In pre_block, it also changes the 'NDST' field to the pCPU on which the vCPU
is put on the per-CPU list 'blocked_vcpu_on_cpu', so when posted interrupts
come in, the wakeup notification event is sent to that pCPU. Then, in
the wakeup_handler, it can find the vCPU on the per-CPU list, so
kvm_vcpu_kick can wake it up.

Thanks,
Feng
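
The flow described above can be modelled in user space roughly as follows.
This is a simplified sketch, not the code from the patch: the struct
layouts, the WAKEUP_VECTOR placeholder value, the fixed pCPU count and the
check of 'ON' inside wakeup_handler() are all assumptions made so that the
example is self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NR_PCPUS       4
#define WAKEUP_VECTOR  0xf1   /* placeholder for POSTED_INTR_WAKEUP_VECTOR */

/* Simplified posted-interrupt descriptor; field names follow the thread. */
struct pi_desc {
    uint64_t pir[4];          /* 256 pending-interrupt bits */
    bool on;                  /* outstanding notification */
    uint8_t nv;               /* notification vector */
    uint32_t ndst;            /* notification destination (a pCPU id here) */
};

struct vcpu {
    struct pi_desc pi;
    bool blocked;
    struct vcpu *next;        /* link in the per-pCPU blocked list */
};

static struct vcpu *blocked_vcpu_on_cpu[NR_PCPUS];

/* pre_block: queue the vCPU on this pCPU and redirect notifications to it. */
static void pre_block(struct vcpu *v, uint32_t pcpu)
{
    assert(pcpu < NR_PCPUS);
    v->next = blocked_vcpu_on_cpu[pcpu];
    blocked_vcpu_on_cpu[pcpu] = v;
    v->pi.ndst = pcpu;
    v->pi.nv = WAKEUP_VECTOR;
    v->blocked = true;
}

/* What the IOMMU does for a device interrupt; returns the pCPU to notify. */
static uint32_t post_interrupt(struct pi_desc *pi, uint8_t vector)
{
    pi->pir[vector / 64] |= 1ULL << (vector % 64);
    pi->on = true;
    return pi->ndst;
}

/* Runs on the notified pCPU: wake every blocked vCPU with ON set. */
static void wakeup_handler(uint32_t pcpu)
{
    for (struct vcpu *v = blocked_vcpu_on_cpu[pcpu]; v; v = v->next)
        if (v->pi.on)
            v->blocked = false;   /* stands in for kvm_vcpu_kick() */
}
```

The point of the model is that 'NDST' and the per-CPU list are updated
together in pre_block, so the wakeup vector always lands on the pCPU whose
list contains the blocked vCPU.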

> 
> Thanks!
> 
> >
> > Signed-off-by: Feng Wu 
> > ---
> > v9:
> > - Add description for blocked_vcpu_on_cpu_lock in
> Documentation/virtual/kvm/locking.txt
> > - Check !kvm_arch_has_assigned_device(vcpu->kvm) first, then
> >   !irq_remapping_cap(IRQ_POSTING_CAP)
> >
> > v8:
> > - Rename 'pi_pre_block' to 'pre_block'
> > - Rename 'pi_post_block' to 'post_block'
> > - Change some comments
> > - Only add the vCPU to the blocking list when the VM has assigned devices.
> >
> >  Documentation/virtual/kvm/locking.txt |  12 +++
> >  arch/x86/include/asm/kvm_host.h       |  13 +++
> >  arch/x86/kvm/vmx.c                    | 153 ++
> >  arch/x86/kvm/x86.c                    |  53 +---
> >  include/linux/kvm_host.h              |   3 +
> >  virt/kvm/kvm_main.c                   |   3 +
> >  6 files changed, 227 insertions(+), 10 deletions(-)
> >
> > diff --git a/Documentation/virtual/kvm/locking.txt
> b/Documentation/virtual/kvm/locking.txt
> > index d68af4d..19f94a6 100644
> > --- a/Documentation/virtual/kvm/locking.txt
> > +++ b/Documentation/virtual/kvm/locking.txt
> > @@ -166,3 +166,15 @@ Comment:   The srcu read lock must be held while accessing memslots (e.g.
> > MMIO/PIO address->device structure mapping (kvm->buses).
> > The srcu index can be stored in kvm_vcpu->srcu_idx per vcpu
> > if it is needed by multiple functions.
> > +
> > +Name:          blocked_vcpu_on_cpu_lock
> > +Type:          spinlock_t
> > +Arch:          x86
> > +Protects:      blocked_vcpu_on_cpu
> > +Comment:       This is a per-CPU lock and it is used for VT-d posted-interrupts.
> > +               When VT-d posted-interrupts is supported and the VM has assigned
> > +               devices, we put the blocked vCPU on the list blocked_vcpu_on_cpu
> > +               protected by blocked_vcpu_on_cpu_lock, when VT-d hardware issues
> > +               wakeup notification event since external interrupts from the
> > +               assigned devices happens, we will find the vCPU on the list to
> > +               wakeup.
> > diff --git a/arch/x86/include/asm/kvm_host.h
> b/arch/x86/include/asm/kvm_host.h
> > index 0ddd353..304fbb5 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -552,6 +552,8 @@ struct kvm_vcpu_arch {
> >  */
> > bool write_fault_to_shadow_pgtable;
> >
> > +   bool halted;
> > +
> > /* set at EPT violation at this point */
> > unsigned long exit_qualification;
> >
> > @@ -864,6 +866,17 @@ struct kvm_x86_ops {
> > /* pmu operations of sub-arch */
> > const struct kvm_pmu_ops *pmu_ops;
> >
> > +   /*
> > +* Architecture specific hooks for vCPU blocking due to
> > +* HLT instruction.
> > +* Returns for .pre_block():
> > +*- 0 

Re: [PATCH v9 17/18] KVM: Update Posted-Interrupts Descriptor when vCPU is blocked

2015-10-14 Thread David Matlack
Hi Feng.

On Fri, Sep 18, 2015 at 7:29 AM, Feng Wu  wrote:
> This patch updates the Posted-Interrupts Descriptor when vCPU
> is blocked.
>
> pre-block:
> - Add the vCPU to the blocked per-CPU list
> - Set 'NV' to POSTED_INTR_WAKEUP_VECTOR
>
> post-block:
> - Remove the vCPU from the per-CPU list

I'm wondering what happens if a posted interrupt arrives at the IOMMU
after pre-block and before post-block.

In pre_block, NV is set to POSTED_INTR_WAKEUP_VECTOR. IIUC, this means
future posted interrupts will not trigger "Posted-Interrupt Processing"
(PIR will not get copied to VIRR). Instead, the IOMMU will do ON := 1,
PIR |= (1 << vector), and send POSTED_INTR_WAKEUP_VECTOR. PIWV calls
wakeup_handler which does kvm_vcpu_kick. kvm_vcpu_kick does a wait-queue
wakeup and possibly a scheduler ipi.

But the VCPU is sitting in kvm_vcpu_block. It spins and/or schedules
(wait queue) until it has a reason to wake up. I couldn't find a code
path from kvm_vcpu_block that leads to checking ON or PIR. How does the
blocked VCPU "receive" the posted interrupt? (And when does Posted-
Interrupt Processing get triggered?)

Thanks!

>
> Signed-off-by: Feng Wu 
> ---
> v9:
> - Add description for blocked_vcpu_on_cpu_lock in 
> Documentation/virtual/kvm/locking.txt
> - Check !kvm_arch_has_assigned_device(vcpu->kvm) first, then
>   !irq_remapping_cap(IRQ_POSTING_CAP)
>
> v8:
> - Rename 'pi_pre_block' to 'pre_block'
> - Rename 'pi_post_block' to 'post_block'
> - Change some comments
> - Only add the vCPU to the blocking list when the VM has assigned devices.
>
>  Documentation/virtual/kvm/locking.txt |  12 +++
>  arch/x86/include/asm/kvm_host.h       |  13 +++
>  arch/x86/kvm/vmx.c                    | 153 ++
>  arch/x86/kvm/x86.c                    |  53 +---
>  include/linux/kvm_host.h              |   3 +
>  virt/kvm/kvm_main.c                   |   3 +
>  6 files changed, 227 insertions(+), 10 deletions(-)
>
> diff --git a/Documentation/virtual/kvm/locking.txt 
> b/Documentation/virtual/kvm/locking.txt
> index d68af4d..19f94a6 100644
> --- a/Documentation/virtual/kvm/locking.txt
> +++ b/Documentation/virtual/kvm/locking.txt
> @@ -166,3 +166,15 @@ Comment:   The srcu read lock must be held while accessing memslots (e.g.
> MMIO/PIO address->device structure mapping (kvm->buses).
> The srcu index can be stored in kvm_vcpu->srcu_idx per vcpu
> if it is needed by multiple functions.
> +
> +Name:          blocked_vcpu_on_cpu_lock
> +Type:          spinlock_t
> +Arch:          x86
> +Protects:      blocked_vcpu_on_cpu
> +Comment:       This is a per-CPU lock and it is used for VT-d posted-interrupts.
> +               When VT-d posted-interrupts is supported and the VM has assigned
> +               devices, we put the blocked vCPU on the list blocked_vcpu_on_cpu
> +               protected by blocked_vcpu_on_cpu_lock, when VT-d hardware issues
> +               wakeup notification event since external interrupts from the
> +               assigned devices happens, we will find the vCPU on the list to
> +               wakeup.
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 0ddd353..304fbb5 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -552,6 +552,8 @@ struct kvm_vcpu_arch {
>  */
> bool write_fault_to_shadow_pgtable;
>
> +   bool halted;
> +
> /* set at EPT violation at this point */
> unsigned long exit_qualification;
>
> @@ -864,6 +866,17 @@ struct kvm_x86_ops {
> /* pmu operations of sub-arch */
> const struct kvm_pmu_ops *pmu_ops;
>
> +   /*
> +* Architecture specific hooks for vCPU blocking due to
> +* HLT instruction.
> +* Returns for .pre_block():
> +*- 0 means continue to block the vCPU.
> +*- 1 means we cannot block the vCPU since some event
> +*happens during this period, such as, 'ON' bit in
> +*posted-interrupts descriptor is set.
> +*/
> +   int (*pre_block)(struct kvm_vcpu *vcpu);
> +   void (*post_block)(struct kvm_vcpu *vcpu);
> int (*update_pi_irte)(struct kvm *kvm, unsigned int host_irq,
>   uint32_t guest_irq, bool set);
>  };
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 902a67d..9968896 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -879,6 +879,13 @@ static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
>  static DEFINE_PER_CPU(struct list_head, loaded_vmcss_on_cpu);
>  static DEFINE_PER_CPU(struct desc_ptr, host_gdt);
>
> +/*
> + * We maintian a per-CPU linked-list of vCPU, so in wakeup_handler() we
> + * can find which vCPU should be waken up.
> + */
> +static DEFINE_PER_CPU(struct list_head, blocked_vcpu_on_cpu);
> +static DEFINE_PER_CPU(spinlock_t, 

[PATCH v3 5/5] KVM: nVMX: expose VPID capability to L1

2015-10-14 Thread Wanpeng Li
Expose VPID capability to L1. For nested guests, we don't do anything 
specific for single-context invalidation. Hence, only advertise support 
for global-context invalidation. The major benefit of nested VPID comes 
from having separate VPIDs when switching between L1 and L2, and also 
when L2's vCPUs are not scheduled in/out on L1.

Reviewed-by: Wincy Van 
Signed-off-by: Wanpeng Li 
---
 arch/x86/kvm/vmx.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 2a54cc7..0b558ae 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2622,7 +2622,11 @@ static void nested_vmx_setup_ctls_msrs(struct vcpu_vmx *vmx)
 	} else
 		vmx->nested.nested_vmx_ept_caps = 0;
 
-	vmx->nested.nested_vmx_vpid_caps = 0;
+	if (enable_vpid)
+		vmx->nested.nested_vmx_vpid_caps = VMX_VPID_INVVPID_BIT |
+				VMX_VPID_EXTENT_GLOBAL_CONTEXT_BIT;
+	else
+		vmx->nested.nested_vmx_vpid_caps = 0;
 
 	if (enable_unrestricted_guest)
 		vmx->nested.nested_vmx_secondary_ctls_high |=
@@ -2739,7 +2743,8 @@ static int vmx_get_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
 		break;
 	case MSR_IA32_VMX_EPT_VPID_CAP:
 		/* Currently, no nested vpid support */
-		*pdata = vmx->nested.nested_vmx_ept_caps;
+		*pdata = vmx->nested.nested_vmx_ept_caps |
+			((u64)vmx->nested.nested_vmx_vpid_caps << 32);
 		break;
 	default:
 		return 1;
-- 
2.1.0
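
For reference, the way the patch joins the two capability words when L1
reads the MSR can be checked in isolation. This is a stand-alone sketch:
the two bit definitions are written here as I understand them from
arch/x86/include/asm/vmx.h (bit positions relative to bit 32 of the MSR),
and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, expressed relative to bit 32 of IA32_VMX_EPT_VPID_CAP. */
#define VMX_VPID_INVVPID_BIT                (1ULL << 0)   /* MSR bit 32 */
#define VMX_VPID_EXTENT_GLOBAL_CONTEXT_BIT  (1ULL << 10)  /* MSR bit 42 */

/* VPID capabilities advertised to L1: INVVPID with global context only. */
static uint32_t nested_vpid_caps(bool enable_vpid)
{
    if (!enable_vpid)
        return 0;
    return (uint32_t)(VMX_VPID_INVVPID_BIT | VMX_VPID_EXTENT_GLOBAL_CONTEXT_BIT);
}

/* EPT capabilities occupy the low half of the MSR, VPID the high half. */
static uint64_t ept_vpid_cap_msr(uint32_t ept_caps, uint32_t vpid_caps)
{
    return (uint64_t)ept_caps | ((uint64_t)vpid_caps << 32);
}
```

With VPID enabled and no EPT capabilities, the resulting value has exactly
bits 32 and 42 set, which matches the "set bit 32 of the VMX_EPT_VPID_CAP
MSR" item in the series changelog.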



[PATCH v3 4/5] KVM: nVMX: nested VPID emulation

2015-10-14 Thread Wanpeng Li
VPID is used to tag address spaces and avoid TLB flushes. Currently L0 uses
the same VPID to run L1 and all its guests. KVM flushes the VPID when
switching between L1 and L2.

This patch advertises VPID to the L1 hypervisor, so that the address spaces
of L1 and L2 can be treated separately and the TLB flush when switching
between L1 and L2 can be avoided. For each nested vmentry, if vpid12 has
changed, reuse the shadow vpid w/ an invvpid.

Performance:

run lmbench on L2 w/ 3.5 kernel.

Context switching - times in microseconds - smaller is better
-------------------------------------------------------------------------
Host                 OS  2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
                         ctxsw  ctxsw  ctxsw ctxsw  ctxsw   ctxsw   ctxsw
--------- ------------- ------ ------ ------ ------ ------ ------- -------
kernel    Linux 3.5.0-1 1.2200 1.3700 1.4500 4.7800 2.3300     5.6 2.88000  nested VPID
kernel    Linux 3.5.0-1 1.2600 1.4300 1.5600   12.7   12.9 3.49000 7.46000  vanilla

Reviewed-by: Jan Kiszka 
Reviewed-by: Wincy Van 
Signed-off-by: Wanpeng Li 
---
 arch/x86/kvm/vmx.c | 39 ---
 1 file changed, 32 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index ca0b526..2a54cc7 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -426,6 +426,9 @@ struct nested_vmx {
/* to migrate it to L2 if VM_ENTRY_LOAD_DEBUG_CONTROLS is off */
u64 vmcs01_debugctl;
 
+   u16 vpid02;
+   u16 last_vpid;
+
u32 nested_vmx_procbased_ctls_low;
u32 nested_vmx_procbased_ctls_high;
u32 nested_vmx_true_procbased_ctls_low;
@@ -1213,6 +1216,11 @@ static inline bool nested_cpu_has_virt_x2apic_mode(struct vmcs12 *vmcs12)
return nested_cpu_has2(vmcs12, SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE);
 }
 
+static inline bool nested_cpu_has_vpid(struct vmcs12 *vmcs12)
+{
+   return nested_cpu_has2(vmcs12, SECONDARY_EXEC_ENABLE_VPID);
+}
+
 static inline bool nested_cpu_has_apic_reg_virt(struct vmcs12 *vmcs12)
 {
return nested_cpu_has2(vmcs12, SECONDARY_EXEC_APIC_REGISTER_VIRT);
@@ -2590,6 +2598,7 @@ static void nested_vmx_setup_ctls_msrs(struct vcpu_vmx *vmx)
SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES |
SECONDARY_EXEC_RDTSCP |
SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE |
+   SECONDARY_EXEC_ENABLE_VPID |
SECONDARY_EXEC_APIC_REGISTER_VIRT |
SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY |
SECONDARY_EXEC_WBINVD_EXITING |
@@ -6832,6 +6841,7 @@ static void free_nested(struct vcpu_vmx *vmx)
return;
 
vmx->nested.vmxon = false;
+   free_vpid(vmx->nested.vpid02);
nested_release_vmcs12(vmx);
if (enable_shadow_vmcs)
free_vmcs(vmx->nested.current_shadow_vmcs);
@@ -7393,7 +7403,7 @@ static int handle_invvpid(struct kvm_vcpu *vcpu)
VMXERR_INVALID_OPERAND_TO_INVEPT_INVVPID);
return 1;
}
-   vmx_flush_tlb(vcpu);
+   __vmx_flush_tlb(vcpu, to_vmx(vcpu)->nested.vpid02);
nested_vmx_succeed(vcpu);
break;
default:
@@ -8773,8 +8783,10 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
goto free_vmcs;
}
 
-   if (nested)
+   if (nested) {
nested_vmx_setup_ctls_msrs(vmx);
+   vmx->nested.vpid02 = allocate_vpid();
+   }
 
vmx->nested.posted_intr_nv = -1;
vmx->nested.current_vmptr = -1ull;
@@ -8795,6 +8807,7 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
 	return &vmx->vcpu;
 
 free_vmcs:
+   free_vpid(vmx->nested.vpid02);
free_loaded_vmcs(vmx->loaded_vmcs);
 free_msrs:
kfree(vmx->guest_msrs);
@@ -9679,12 +9692,24 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 
if (enable_vpid) {
/*
-* Trivially support vpid by letting L2s share their parent
-* L1's vpid. TODO: move to a more elaborate solution, giving
-* each L2 its own vpid and exposing the vpid feature to L1.
+* There is no direct mapping between vpid02 and vpid12, the
+* vpid02 is per-vCPU for L0 and reused while the value of
+* vpid12 is changed w/ one invvpid during nested vmentry.
+* The vpid12 is allocated by L1 for L2, so it will not
+* influence global bitmap(for vpid01 and vpid02 allocation)
+* even if spawn a lot of nested vCPUs.
 */
-   vmcs_write16(VIRTUAL_PROCESSOR_ID, vmx->vpid);
-   vmx_flush_tlb(vcpu);
+   if (nested_cpu_has_vpid(vmcs12) && vmx->nested.vpid02) {
+   

[PATCH v3 0/5] KVM: nVMX: nested VPID emulation

2015-10-14 Thread Wanpeng Li
v2 -> v3:
 * separate nested_vmx_vpid_caps and move checks to patch 3/5,
   only rejoin them when reading the MSR.

v1 -> v2:
 * set bit 32 of the VMX_EPT_VPID_CAP MSR
 * check against the supported types in the implementation of 
   the INVVPID instruction
 * the memory operand must always be read even if it isn't needed 
   (e.g., for type==global), similar to INVEPT
 * for single-context invalidation to check that VPID != 0, though in 
   practice that doesn't matter because we don't want to support
   single-context invalidation
 * don't set msr's ept related bits if !enable_ept 


VPID is used to tag address spaces and avoid TLB flushes. Currently L0 uses
the same VPID to run L1 and all its guests. KVM flushes the VPID when
switching between L1 and L2.

This patch advertises VPID to the L1 hypervisor, so that the address spaces
of L1 and L2 can be treated separately and the TLB flush when switching
between L1 and L2 can be avoided. For each nested vmentry, if vpid12 has
changed, reuse the shadow vpid w/ an invvpid.

Performance:

run lmbench on L2 w/ 3.5 kernel.

Context switching - times in microseconds - smaller is better
-------------------------------------------------------------------------
Host                 OS  2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
                         ctxsw  ctxsw  ctxsw ctxsw  ctxsw   ctxsw   ctxsw
--------- ------------- ------ ------ ------ ------ ------ ------- -------
kernel    Linux 3.5.0-1 1.2200 1.3700 1.4500 4.7800 2.3300     5.6 2.88000  nested VPID
kernel    Linux 3.5.0-1 1.2600 1.4300 1.5600   12.7   12.9 3.49000 7.46000  vanilla

Wanpeng Li (5):
  KVM: VMX: adjust interface to allocate/free_vpid
  KVM: VMX: introduce __vmx_flush_tlb to handle specific vpid
  KVM: nVMX: emulate the INVVPID instruction
  KVM: nVMX: nested VPID emulation
  KVM: nVMX: expose VPID capability to L1

 arch/x86/include/asm/vmx.h |   1 +
 arch/x86/kvm/vmx.c | 151 -
 2 files changed, 123 insertions(+), 29 deletions(-)

-- 
2.1.0
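
The rule from the description ("for each nested vmentry, if vpid12 is
changed, reuse shadow vpid w/ an invvpid") can be sketched as a small
user-space model. The names mirror the fields added in patch 4/5 (vpid02,
last_vpid), but the function itself and the flush counter are hypothetical
and only illustrate the intended behaviour; they are not taken from the
patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct nested_vpid {
    uint16_t vpid02;      /* VPID that L0 uses to run L2 (0 = none allocated) */
    uint16_t last_vpid;   /* vpid12 value seen on the previous nested vmentry */
    unsigned int flushes; /* counts TLB flushes, for illustration only */
};

/* Returns the VPID to load for L2 and records whether a flush was needed. */
static uint16_t nested_entry_vpid(struct nested_vpid *n, uint16_t vpid01,
                                  bool l1_uses_vpid, uint16_t vpid12)
{
    if (l1_uses_vpid && n->vpid02) {
        if (vpid12 != n->last_vpid) {
            n->last_vpid = vpid12;
            n->flushes++;     /* stands in for an INVVPID on vpid02 */
        }
        return n->vpid02;
    }
    /* Fallback: share L1's VPID and flush on every switch. */
    n->flushes++;
    return vpid01;
}
```

Repeated entries with an unchanged vpid12 cost no flush in this model,
which is where the context-switch improvement in the lmbench numbers above
would be expected to come from.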



RE: [PATCH v3 14/16] KVM: arm64: implement ITS command queue command handlers

2015-10-14 Thread Pavel Fedin
 Hello!

> -Original Message-
> From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On Behalf 
> Of Andre Przywara
> Sent: Wednesday, October 07, 2015 5:55 PM
> To: marc.zyng...@arm.com; christoffer.d...@linaro.org
> Cc: eric.au...@linaro.org; p.fe...@samsung.com; kvm...@lists.cs.columbia.edu; 
> linux-arm-
> ker...@lists.infradead.org; kvm@vger.kernel.org
> Subject: [PATCH v3 14/16] KVM: arm64: implement ITS command queue command 
> handlers
> 
> The connection between a device, an event ID, the LPI number and the
> allocated CPU is stored in in-memory tables in a GICv3, but their
> format is not specified by the spec. Instead software uses a command
> queue in a ring buffer to let the ITS implementation use their own
> format.
> Implement handlers for the various ITS commands and let them store
> the requested relation into our own data structures.
> To avoid kmallocs inside the ITS spinlock, we preallocate possibly
> needed memory outside of the lock and free that if it turns out to
> be not needed (mostly error handling).
> Error handling is very basic at this point, as we don't have a good
> way of communicating errors to the guest (usually a SError).
> The INT command handler is missing at this point, as we gain the
> capability of actually injecting MSIs into the guest only later on.
> 
> Signed-off-by: Andre Przywara 
> ---
> Changelog v2..v3:
> - adjust handlers to new pendbaser/propbaser locking scheme
> - properly free ITTEs (including pending bitmap)
> - fix handling of unmapped collections
> 
>  include/linux/irqchip/arm-gic-v3.h |   5 +-
>  virt/kvm/arm/its-emul.c| 502 
> -
>  virt/kvm/arm/its-emul.h|  11 +
>  3 files changed, 516 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/irqchip/arm-gic-v3.h 
> b/include/linux/irqchip/arm-gic-v3.h
> index ef274a9..27c0e75 100644
> --- a/include/linux/irqchip/arm-gic-v3.h
> +++ b/include/linux/irqchip/arm-gic-v3.h
> @@ -255,7 +255,10 @@
>   */
>  #define GITS_CMD_MAPD            0x08
>  #define GITS_CMD_MAPC            0x09
> -#define GITS_CMD_MAPVI           0x0a
> +#define GITS_CMD_MAPTI           0x0a
> +/* older GIC documentation used MAPVI for this command */
> +#define GITS_CMD_MAPVI           GITS_CMD_MAPTI
> +#define GITS_CMD_MAPI            0x0b
>  #define GITS_CMD_MOVI            0x01
>  #define GITS_CMD_DISCARD         0x0f
>  #define GITS_CMD_INV             0x0c
> diff --git a/virt/kvm/arm/its-emul.c b/virt/kvm/arm/its-emul.c
> index 7a8c5db..642effb 100644
> --- a/virt/kvm/arm/its-emul.c
> +++ b/virt/kvm/arm/its-emul.c
> @@ -22,6 +22,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
> 
>  #include 
>  #include 
> @@ -64,6 +65,34 @@ struct its_itte {
>   unsigned long *pending;
>  };
> 
> +static struct its_device *find_its_device(struct kvm *kvm, u32 device_id)
> +{
> + struct vgic_its *its = &kvm->arch.vgic.its;
> + struct its_device *device;
> +
> + list_for_each_entry(device, &its->device_list, dev_list)
> + if (device_id == device->device_id)
> + return device;
> +
> + return NULL;
> +}
> +
> +static struct its_itte *find_itte(struct kvm *kvm, u32 device_id, u32 
> event_id)
> +{
> + struct its_device *device;
> + struct its_itte *itte;
> +
> + device = find_its_device(kvm, device_id);
> + if (device == NULL)
> + return NULL;
> +
> + list_for_each_entry(itte, &device->itt, itte_list)
> + if (itte->event_id == event_id)
> + return itte;
> +
> + return NULL;
> +}
> +
>  /* To be used as an iterator this macro misses the enclosing parentheses */
>  #define for_each_lpi(dev, itte, kvm) \
>   list_for_each_entry(dev, &(kvm)->arch.vgic.its.device_list, dev_list) \
> @@ -81,6 +110,19 @@ static struct its_itte *find_itte_by_lpi(struct kvm *kvm, 
> int lpi)
>   return NULL;
>  }
> 
> +static struct its_collection *find_collection(struct kvm *kvm, int coll_id)
> +{
> + struct its_collection *collection;
> +
> + list_for_each_entry(collection, &kvm->arch.vgic.its.collection_list,
> + coll_list) {
> + if (coll_id == collection->collection_id)
> + return collection;
> + }
> +
> + return NULL;
> +}
> +
>  #define LPI_PROP_ENABLE_BIT(p)   ((p) & LPI_PROP_ENABLED)
>  #define LPI_PROP_PRIORITY(p) ((p) & 0xfc)
> 
> @@ -352,13 +394,471 @@ static void its_free_itte(struct its_itte *itte)
>   kfree(itte);
>  }
> 
> +static u64 its_cmd_mask_field(u64 *its_cmd, int word, int shift, int size)
> +{
> + return (le64_to_cpu(its_cmd[word]) >> shift) & (BIT_ULL(size) - 1);
> +}
> +
> +#define its_cmd_get_command(cmd) its_cmd_mask_field(cmd, 0,  0,  8)
> +#define its_cmd_get_deviceid(cmd) its_cmd_mask_field(cmd, 0, 32, 32)
> +#define its_cmd_get_id(cmd)

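As a rough userspace illustration of the field accessors quoted above: each ITS command is a block of 64-bit little-endian words, and every field is described by a (word, shift, size) triple. The sketch below mirrors its_cmd_mask_field(); the le64_load() helper and the cmd_* names are assumptions made for this example and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for le64_to_cpu(): decode a little-endian
 * 64-bit word byte by byte, so the result does not depend on the
 * endianness of the host. */
static uint64_t le64_load(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/* Extract `size` bits starting at bit `shift` of 64-bit word `word`.
 * Only valid for 0 < size < 64, which covers the quoted accessors. */
static uint64_t cmd_mask_field(const uint8_t *cmd, int word, int shift,
			       int size)
{
	assert(size > 0 && size < 64);
	return (le64_load(cmd + 8 * word) >> shift) &
	       ((UINT64_C(1) << size) - 1);
}

#define cmd_get_command(cmd)  cmd_mask_field(cmd, 0,  0,  8)
#define cmd_get_deviceid(cmd) cmd_mask_field(cmd, 0, 32, 32)
```

Decoding from a byte array rather than casting to u64 * keeps the sketch independent of host byte order and alignment, which is the job le64_to_cpu() does in the kernel version.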
[PATCH v3 3/5] KVM: nVMX: emulate the INVVPID instruction

2015-10-14 Thread Wanpeng Li
Add the INVVPID instruction emulation.

Reviewed-by: Wincy Van 
Signed-off-by: Wanpeng Li 
---
 arch/x86/include/asm/vmx.h |  1 +
 arch/x86/kvm/vmx.c | 61 +-
 2 files changed, 61 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index d25f32a..aa336ff 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -416,6 +416,7 @@ enum vmcs_field {
 #define VMX_EPT_EXTENT_CONTEXT_BIT (1ull << 25)
 #define VMX_EPT_EXTENT_GLOBAL_BIT  (1ull << 26)
 
+#define VMX_VPID_INVVPID_BIT                (1ull << 0) /* (32 - 32) */
 #define VMX_VPID_EXTENT_SINGLE_CONTEXT_BIT  (1ull << 9) /* (41 - 32) */
 #define VMX_VPID_EXTENT_GLOBAL_CONTEXT_BIT  (1ull << 10) /* (42 - 32) */
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index d21b9a6..ca0b526 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -442,6 +442,7 @@ struct nested_vmx {
u32 nested_vmx_misc_low;
u32 nested_vmx_misc_high;
u32 nested_vmx_ept_caps;
+   u32 nested_vmx_vpid_caps;
 };
 
 #define POSTED_INTR_ON  0
@@ -2612,6 +2613,8 @@ static void nested_vmx_setup_ctls_msrs(struct vcpu_vmx 
*vmx)
} else
vmx->nested.nested_vmx_ept_caps = 0;
 
+   vmx->nested.nested_vmx_vpid_caps = 0;
+
if (enable_unrestricted_guest)
vmx->nested.nested_vmx_secondary_ctls_high |=
SECONDARY_EXEC_UNRESTRICTED_GUEST;
@@ -7343,7 +7346,63 @@ static int handle_invept(struct kvm_vcpu *vcpu)
 
 static int handle_invvpid(struct kvm_vcpu *vcpu)
 {
-   kvm_queue_exception(vcpu, UD_VECTOR);
+   struct vcpu_vmx *vmx = to_vmx(vcpu);
+   u32 vmx_instruction_info;
+   unsigned long type, types;
+   gva_t gva;
+   struct x86_exception e;
+   int vpid;
+
+   if (!(vmx->nested.nested_vmx_secondary_ctls_high &
+ SECONDARY_EXEC_ENABLE_VPID) ||
+   !(vmx->nested.nested_vmx_vpid_caps & VMX_VPID_INVVPID_BIT)) {
+   kvm_queue_exception(vcpu, UD_VECTOR);
+   return 1;
+   }
+
+   if (!nested_vmx_check_permission(vcpu))
+   return 1;
+
+   vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO);
+   type = kvm_register_readl(vcpu, (vmx_instruction_info >> 28) & 0xf);
+
+   types = (vmx->nested.nested_vmx_vpid_caps >> 8) & 0x7;
+
+   if (!(types & (1UL << type))) {
+   nested_vmx_failValid(vcpu,
+   VMXERR_INVALID_OPERAND_TO_INVEPT_INVVPID);
+   return 1;
+   }
+
+   /* according to the intel vmx instruction reference, the memory
+* operand is read even if it isn't needed (e.g., for type==global)
+*/
+   if (get_vmx_mem_address(vcpu, vmcs_readl(EXIT_QUALIFICATION),
+   vmx_instruction_info, false, &gva))
+   return 1;
+   if (kvm_read_guest_virt(&vcpu->arch.emulate_ctxt, gva, &vpid,
+   sizeof(u32), &e)) {
+   kvm_inject_page_fault(vcpu, &e);
+   return 1;
+   }
+
+   switch (type) {
+   case VMX_VPID_EXTENT_ALL_CONTEXT:
+   if (get_vmcs12(vcpu)->virtual_processor_id == 0) {
+   nested_vmx_failValid(vcpu,
+   VMXERR_INVALID_OPERAND_TO_INVEPT_INVVPID);
+   return 1;
+   }
+   vmx_flush_tlb(vcpu);
+   nested_vmx_succeed(vcpu);
+   break;
+   default:
+   /* Trap single context invalidation invvpid calls */
+   BUG_ON(1);
+   break;
+   }
+
+   skip_emulated_instruction(vcpu);
return 1;
 }
 
-- 
2.1.0

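A minimal sketch of the extent-type check in handle_invvpid() above, written as a standalone function. The bit positions are copied from the vmx.h hunk; the function name and the explicit range check on `type` are additions for this sketch, not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions within the upper 32 bits of the VPID capability word,
 * as defined in the vmx.h hunk of the patch. */
#define VPID_INVVPID_BIT               (1u << 0)
#define VPID_EXTENT_SINGLE_CONTEXT_BIT (1u << 9)
#define VPID_EXTENT_GLOBAL_CONTEXT_BIT (1u << 10)

/* Returns true if INVVPID extent `type` is advertised in `vpid_caps`.
 * Bits 8..10 of the capability word map to extent types 0..2. */
static bool invvpid_type_supported(uint32_t vpid_caps, unsigned long type)
{
	unsigned long types = (vpid_caps >> 8) & 0x7;

	/* Sketch-only guard: keep the shift below within the 3-bit mask. */
	if (type > 2)
		return false;
	return (types & (1UL << type)) != 0;
}
```

With only the global-context bit advertised, type 2 is accepted and type 1 is rejected, which corresponds to the nested_vmx_failValid() path in the patch.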


[PATCH v3 2/5] KVM: VMX: introduce __vmx_flush_tlb to handle specific vpid

2015-10-14 Thread Wanpeng Li
Introduce __vmx_flush_tlb() to flush the TLB for a specific vpid. It will be
used by later patches. Note that the "all context" variant can
be mapped to vpid_sync_vcpu_single with vpid02 as the argument
(a nice side effect of the vpid02 design).

Reviewed-by: Wincy Van 
Signed-off-by: Wanpeng Li 
---
 arch/x86/kvm/vmx.c | 21 +
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 1a0e336..d21b9a6 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1392,13 +1392,13 @@ static void loaded_vmcs_clear(struct loaded_vmcs 
*loaded_vmcs)
 __loaded_vmcs_clear, loaded_vmcs, 1);
 }
 
-static inline void vpid_sync_vcpu_single(struct vcpu_vmx *vmx)
+static inline void vpid_sync_vcpu_single(int vpid)
 {
-   if (vmx->vpid == 0)
+   if (vpid == 0)
return;
 
if (cpu_has_vmx_invvpid_single())
-   __invvpid(VMX_VPID_EXTENT_SINGLE_CONTEXT, vmx->vpid, 0);
+   __invvpid(VMX_VPID_EXTENT_SINGLE_CONTEXT, vpid, 0);
 }
 
 static inline void vpid_sync_vcpu_global(void)
@@ -1407,10 +1407,10 @@ static inline void vpid_sync_vcpu_global(void)
__invvpid(VMX_VPID_EXTENT_ALL_CONTEXT, 0, 0);
 }
 
-static inline void vpid_sync_context(struct vcpu_vmx *vmx)
+static inline void vpid_sync_context(int vpid)
 {
if (cpu_has_vmx_invvpid_single())
-   vpid_sync_vcpu_single(vmx);
+   vpid_sync_vcpu_single(vpid);
else
vpid_sync_vcpu_global();
 }
@@ -3563,9 +3563,9 @@ static void exit_lmode(struct kvm_vcpu *vcpu)
 
 #endif
 
-static void vmx_flush_tlb(struct kvm_vcpu *vcpu)
+static inline void __vmx_flush_tlb(struct kvm_vcpu *vcpu, int vpid)
 {
-   vpid_sync_context(to_vmx(vcpu));
+   vpid_sync_context(vpid);
if (enable_ept) {
if (!VALID_PAGE(vcpu->arch.mmu.root_hpa))
return;
@@ -3573,6 +3573,11 @@ static void vmx_flush_tlb(struct kvm_vcpu *vcpu)
}
 }
 
+static void vmx_flush_tlb(struct kvm_vcpu *vcpu)
+{
+   __vmx_flush_tlb(vcpu, to_vmx(vcpu)->vpid);
+}
+
 static void vmx_decache_cr0_guest_bits(struct kvm_vcpu *vcpu)
 {
ulong cr0_guest_owned_bits = vcpu->arch.cr0_guest_owned_bits;
@@ -4924,7 +4929,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool 
init_event)
vmx_fpu_activate(vcpu);
update_exception_bitmap(vcpu);
 
-   vpid_sync_context(vmx);
+   vpid_sync_context(vmx->vpid);
 }
 
 /*
-- 
2.1.0



[PATCH v3 1/5] KVM: VMX: adjust interface to allocate/free_vpid

2015-10-14 Thread Wanpeng Li
Adjust allocate/free_vpid so that they can be reused for the nested vpid.

Reviewed-by: Wincy Van 
Signed-off-by: Wanpeng Li 
---
 arch/x86/kvm/vmx.c | 25 -
 1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index c5c2283..1a0e336 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4277,29 +4277,28 @@ static int alloc_identity_pagetable(struct kvm *kvm)
return r;
 }
 
-static void allocate_vpid(struct vcpu_vmx *vmx)
+static int allocate_vpid(void)
 {
int vpid;
 
-   vmx->vpid = 0;
if (!enable_vpid)
-   return;
+   return 0;
    spin_lock(&vmx_vpid_lock);
vpid = find_first_zero_bit(vmx_vpid_bitmap, VMX_NR_VPIDS);
-   if (vpid < VMX_NR_VPIDS) {
-   vmx->vpid = vpid;
+   if (vpid < VMX_NR_VPIDS)
__set_bit(vpid, vmx_vpid_bitmap);
-   }
+   else
+   vpid = 0;
    spin_unlock(&vmx_vpid_lock);
+   return vpid;
 }
 
-static void free_vpid(struct vcpu_vmx *vmx)
+static void free_vpid(int vpid)
 {
-   if (!enable_vpid)
+   if (!enable_vpid || vpid == 0)
return;
    spin_lock(&vmx_vpid_lock);
-   if (vmx->vpid != 0)
-   __clear_bit(vmx->vpid, vmx_vpid_bitmap);
+   __clear_bit(vpid, vmx_vpid_bitmap);
    spin_unlock(&vmx_vpid_lock);
 }
 
@@ -8643,7 +8642,7 @@ static void vmx_free_vcpu(struct kvm_vcpu *vcpu)
 
if (enable_pml)
vmx_disable_pml(vmx);
-   free_vpid(vmx);
+   free_vpid(vmx->vpid);
leave_guest_mode(vcpu);
vmx_load_vmcs01(vcpu);
free_nested(vmx);
@@ -8662,7 +8661,7 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, 
unsigned int id)
if (!vmx)
return ERR_PTR(-ENOMEM);
 
-   allocate_vpid(vmx);
+   vmx->vpid = allocate_vpid();
 
    err = kvm_vcpu_init(&vmx->vcpu, kvm, id);
if (err)
@@ -8738,7 +8737,7 @@ free_msrs:
 uninit_vcpu:
    kvm_vcpu_uninit(&vmx->vcpu);
 free_vcpu:
-   free_vpid(vmx);
+   free_vpid(vmx->vpid);
kmem_cache_free(kvm_vcpu_cache, vmx);
return ERR_PTR(err);
 }
-- 
2.1.0

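The interface change in this patch can be sketched in userspace as follows. This is only an illustration under stated assumptions: the bitmap is a single 64-bit word, NR_VPIDS is an arbitrary small value, locking is omitted, and bit 0 is assumed to be pre-set so that VPID 0 can double as the "no VPID" return value.

```c
#include <assert.h>
#include <stdint.h>

#define NR_VPIDS 64	/* illustrative; the kernel uses VMX_NR_VPIDS */

/* Bit 0 starts out set so that VPID 0 is never handed out; the
 * allocator can then return 0 to mean "no VPID available". */
static uint64_t vpid_bitmap = 1;

/* Returns a free VPID in 1..NR_VPIDS-1, or 0 if none is left. */
static int allocate_vpid(void)
{
	for (int vpid = 0; vpid < NR_VPIDS; vpid++) {
		if (!(vpid_bitmap & (UINT64_C(1) << vpid))) {
			vpid_bitmap |= UINT64_C(1) << vpid;
			return vpid;
		}
	}
	return 0;
}

/* Releasing VPID 0 is a no-op, matching the patched free_vpid(). */
static void free_vpid(int vpid)
{
	if (vpid == 0)
		return;
	vpid_bitmap &= ~(UINT64_C(1) << vpid);
}
```

Because the allocator no longer touches struct vcpu_vmx, a caller can keep the result wherever it likes, for example `vmx->vpid = allocate_vpid();` as in the patch, or a second field for a nested VPID.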


Re: Inconsistent guest OS disk size compared to volume.img size

2015-10-14 Thread Stefan Hajnoczi
On Tue, Sep 29, 2015 at 12:02:17AM -0700, Jay Fishman wrote:
> I  have looked all over the internet but I can not even find a
> reference to this issue.
> 
> 
> I have installed the following on Linux Mint 17.1
> QEMU emulator version 2.0.0 (Debian 2.0.0+dfsg-2ubuntu1.19), Fabrice Bellard
> 
> On that, I have created a Ubuntu 14.04.3 LTS guest and created a
> storage volume of 12.88GB. The format that I used was raw.
> 
> The host uses a physical mirrored drive and I did NOT use LVM (ext4
> was the format type)
> 
> When installing the guest, I selected to "use entire disk" and again I
> did NOT use LVM (ext4 was the format type)
> 
> 
> After installation, the guest reports I am using 23.8% of 4.84GB. Why
> is the disk size 4.84GB instead of 12.88GB?
> 
> The guest's virtual disk appears to have shrunk to little more than a third of its size?

If you still need help with this, please provide the following
information:

1. Output of "fdisk -lu /dev/vda" and "df -h /" from inside the guest.

   You may need to adjust the block device path if the root file system
   isn't on the first virtio-blk device (e.g. /dev/vdb or /dev/sda).

2. Output of "stat disk.img" from the host, where "disk.img" is the
   filename.

Thanks,
Stefan


Re: [PATCH v3 27/32] nvdimm: support DSM_CMD_IMPLEMENTED function

2015-10-14 Thread Stefan Hajnoczi
On Sun, Oct 11, 2015 at 11:52:59AM +0800, Xiao Guangrong wrote:
>  static void dsm_write(void *opaque, hwaddr addr,
>uint64_t val, unsigned size)
>  {
> +NVDIMMState *state = opaque;
> +MemoryRegion *dsm_ram_mr;
> +dsm_in *in;
> +dsm_out *out;
> +uint32_t revision, function, handle;
> +
>  if (val != NOTIFY_VALUE) {
>  fprintf(stderr, "BUG: unexepected notify value 0x%" PRIx64, val);
>  }
> +
> +dsm_ram_mr = memory_region_find(&state->mr, state->page_size,
> +state->page_size).mr;
> +memory_region_unref(dsm_ram_mr);
> +in = memory_region_get_ram_ptr(dsm_ram_mr);

This looks suspicious.  Shouldn't the memory_region_unref(dsm_ram_mr)
happen after we're done using it?

> +out = (dsm_out *)in;
> +
> +revision = in->arg1;
> +function = in->arg2;
> +handle = in->handle;
> +le32_to_cpus(&revision);
> +le32_to_cpus(&function);
> +le32_to_cpus(&handle);
> +
> +nvdebug("UUID " UUID_FMT ".\n", in->arg0[0], in->arg0[1], in->arg0[2],
> +in->arg0[3], in->arg0[4], in->arg0[5], in->arg0[6],
> +in->arg0[7], in->arg0[8], in->arg0[9], in->arg0[10],
> +in->arg0[11], in->arg0[12], in->arg0[13], in->arg0[14],
> +in->arg0[15]);
> +nvdebug("Revision %#x Function %#x Handler %#x.\n", revision, function,
> +handle);
> +
> +if (revision != DSM_REVISION) {
> +nvdebug("Revision %#x is not supported, expect %#x.\n",
> +revision, DSM_REVISION);
> +goto exit;
> +}
> +
> +if (!handle) {
> +if (!dsm_is_root_uuid(in->arg0)) {

Please don't dereference 'in' or pass it to other functions.  Avoid race
conditions with guest vcpus by copying in the entire dsm_in struct.

This is like a system call - the kernel cannot trust userspace memory
and must copy in before accessing data.  The same rules apply.
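A minimal sketch of the copy-in pattern described above, assuming a simplified request layout. The struct dsm_req type, the DSM_REVISION_SKETCH value and the dsm_fetch() name are hypothetical; the real dsm_in in the series has more fields, and byte-order conversion is left out here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified request layout. */
struct dsm_req {
	uint32_t handle;
	uint32_t revision;
	uint32_t function;
};

#define DSM_REVISION_SKETCH 1	/* illustrative value */

/* Copy the request out of guest-writable memory exactly once, then
 * validate and use only the private copy. Returns 0 on success and
 * -1 if the snapshot fails validation. */
static int dsm_fetch(const void *shared, struct dsm_req *out)
{
	memcpy(out, shared, sizeof(*out));

	if (out->revision != DSM_REVISION_SKETCH)
		return -1;
	return 0;
}
```

After dsm_fetch() returns, later writes by the guest to the shared page cannot change what the handler sees, so a value that passed validation stays the value that is acted upon.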


Re: [edk2] KVM: MTRR: fix memory type handling if MTRR is completely disabled

2015-10-14 Thread Xiao Guangrong



On 10/14/2015 05:47 PM, Laszlo Ersek wrote:
> On 10/14/15 10:32, Xiao Guangrong wrote:
>> On 10/14/2015 04:24 PM, Xiao Guangrong wrote:
>>> On 10/14/2015 03:37 PM, Janusz wrote:
>>>> I was able to run my virtual machine with this, but had very high cpu
>>>> usage when something happens in it, like booting the system. Once, my
>>>> virtual machine hung and I couldn't even get my mouse / keyboard back
>>>> from qemu. When I did vga passthrough, I didn't get any video output,
>>>> and cpu usage was also high. Tried it on 4.3
>>>
>>> Which tree are you using? Is it the kvm tree?
>>> Could you please work on the queue branch of the current kvm tree, based
>>> on top commit 73917739334c6509: KVM: x86: fix SMI to halted VCPU.
>>>
>>> Hmm... interesting, this diff works on my box...
>>
>> Forgot to say that I built my test env following the instructions on
>> kvm-wiki:
>> http://www.linux-kvm.org/page/OVMF
>
> Wow! Someone actually cares about the whitepaper. Thank you. :)

:)

The document is really useful to me, thanks for your contribution.



Re: [edk2] KVM: MTRR: fix memory type handling if MTRR is completely disabled

2015-10-14 Thread Xiao Guangrong



On 10/15/2015 02:08 AM, Janusz wrote:
> On 14.10.2015 at 10:32, Xiao Guangrong wrote:
>> On 10/14/2015 04:24 PM, Xiao Guangrong wrote:
>>> On 10/14/2015 03:37 PM, Janusz wrote:
>>>> I was able to run my virtual machine with this, but had very high cpu
>>>> usage when something happens in it, like booting the system. Once, my
>>>> virtual machine hung and I couldn't even get my mouse / keyboard back
>>>> from qemu. When I did vga passthrough, I didn't get any video output,
>>>> and cpu usage was also high. Tried it on 4.3
>>>
>>> Which tree are you using? Is it the kvm tree?
>>> Could you please work on the queue branch of the current kvm tree, based
>>> on top commit 73917739334c6509: KVM: x86: fix SMI to halted VCPU.
>>>
>>> Hmm... interesting, this diff works on my box...
>>
>> Forgot to say that I built my test env following the instructions on
>> kvm-wiki:
>> http://www.linux-kvm.org/page/OVMF
>>
>> My test script is attached, and I will try to build the env like yours
>> as much as possible...
>
> I cloned git://git.kernel.org/pub/scm/virt/kvm/kvm.git at commit
> 73917739334c6509, but this is breaking my system...
> Slim is not able to start i3, xdm is not killing X when I stop xdm, and
> qemu is not able to start unless I use the -nographic option.
> Log from qemu on that kernel version:
> xcb_connection_has_error() returned true
> No protocol specified
> Could not initialize SDL(No available video device) - exiting
>
> On the main kernel branch I don't have those problems.
>
> I tried to run with -nographic, and tried pc-i440fx-2.1, but got the same
> problem as before: high cpu usage and no graphics on my GPU.
> I don't know if that will help, but this is my log from the option -global
> isa-debugcon.iobase=0x402 -debugcon file:fedora.ovmf.log:
> https://bpaste.net/show/36c54dba68c2


Well, the bug may not be in KVM. When it happened, I saw that OVMF
detected only 1 CPU; here is the log from OVMF's debug console:

  Flushing GCD
  Flushing GCD
  Flushing GCD
  Flushing GCD
  Flushing GCD
  Flushing GCD
  Flushing GCD
  Flushing GCD
  Flushing GCD
  Flushing GCDs
Detect CPU count: 1

So the startup code has been freed while the APs are still running;
I think that is why we saw the vCPUs executing at unexpected addresses.

After digging into OVMF's code, I noticed that the BSP waits for the APs
for a fixed period. However, recent KVM changes require zapping all
mappings whenever CR0.CD changes, which means the APs need more time to
start up.

With the following change to OVMF, the bug is completely gone on my side:

--- a/UefiCpuPkg/CpuDxe/ApStartup.c
+++ b/UefiCpuPkg/CpuDxe/ApStartup.c
@@ -454,7 +454,9 @@ StartApsStackless (
   //
   // Wait 100 milliseconds for APs to arrive at the ApEntryPoint routine
   //
-  MicroSecondDelay (100 * 1000);
+  MicroSecondDelay (10 * 100 * 1000);

   return EFI_SUCCESS;
 }

Janusz, could you please check this instead? You can switch to your
previous kernel to do this test.




[PATCH v2] kvm: svm: Only propagate next_rip when guest supports it

2015-10-14 Thread Joerg Roedel
On Fri, Oct 09, 2015 at 01:15:11PM +0200, Paolo Bonzini wrote:
> This could be a bit expensive to do on every vmexit.  Can you benchmark
> it with kvm-unit-tests, or just cache the result in struct vcpu_svm?

Yes, caching it is certainly a good idea. I updated the patch:

>From 94ee662c527683c26ea5fa98a5a8f2c798c58470 Mon Sep 17 00:00:00 2001
From: Joerg Roedel 
Date: Wed, 7 Oct 2015 13:38:19 +0200
Subject: [PATCH] kvm: svm: Only propagate next_rip when guest supports it

Currently we always write the next_rip of the shadow vmcb to
the guests vmcb when we emulate a vmexit. This could confuse
the guest when its cpuid indicated no support for the
next_rip feature.

Fix this by only propagating next_rip if the guest actually
supports it.

Cc: Bandan Das 
Cc: Dirk Mueller 
Tested-By: Dirk Mueller 
Signed-off-by: Joerg Roedel 
---
 arch/x86/kvm/cpuid.h | 21 +
 arch/x86/kvm/svm.c   | 11 ++-
 2 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index dd05b9c..effca1f 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -133,4 +133,25 @@ static inline bool guest_cpuid_has_mpx(struct kvm_vcpu 
*vcpu)
best = kvm_find_cpuid_entry(vcpu, 7, 0);
return best && (best->ebx & bit(X86_FEATURE_MPX));
 }
+
+/*
+ * NRIPS is provided through cpuidfn 0x8000000a.edx bit 3
+ */
+#define BIT_NRIPS  3
+
+static inline bool guest_cpuid_has_nrips(struct kvm_vcpu *vcpu)
+{
+   struct kvm_cpuid_entry2 *best;
+
+   best = kvm_find_cpuid_entry(vcpu, 0x8000000a, 0);
+
+   /*
+* NRIPS is a scattered cpuid feature, so we can't use
+* X86_FEATURE_NRIPS here (X86_FEATURE_NRIPS would be bit
+* position 8, not 3).
+*/
+   return best && (best->edx & bit(BIT_NRIPS));
+}
+#undef BIT_NRIPS
+
 #endif
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 2f9ed1f..e9e3294 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -159,6 +159,9 @@ struct vcpu_svm {
u32 apf_reason;
 
u64  tsc_ratio;
+
+   /* cached guest cpuid flags for faster access */
+   bool nrips_enabled  : 1;
 };
 
 static DEFINE_PER_CPU(u64, current_tsc_ratio);
@@ -2365,7 +2368,9 @@ static int nested_svm_vmexit(struct vcpu_svm *svm)
nested_vmcb->control.exit_info_2   = vmcb->control.exit_info_2;
nested_vmcb->control.exit_int_info = vmcb->control.exit_int_info;
nested_vmcb->control.exit_int_info_err = 
vmcb->control.exit_int_info_err;
-   nested_vmcb->control.next_rip  = vmcb->control.next_rip;
+
+   if (svm->nrips_enabled)
+   nested_vmcb->control.next_rip  = vmcb->control.next_rip;
 
/*
 * If we emulate a VMRUN/#VMEXIT in the same host #vmexit cycle we have
@@ -4098,6 +4103,10 @@ static u64 svm_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t 
gfn, bool is_mmio)
 
 static void svm_cpuid_update(struct kvm_vcpu *vcpu)
 {
+   struct vcpu_svm *svm = to_svm(vcpu);
+
+   /* Update nrips enabled cache */
+   svm->nrips_enabled = !!guest_cpuid_has_nrips(&svm->vcpu);
 }
 
 static void svm_set_supported_cpuid(u32 func, struct kvm_cpuid_entry2 *entry)
-- 
1.8.4.5



[PATCH] KVM: x86: fix RSM into 64-bit protected mode

2015-10-14 Thread Paolo Bonzini
In order to get into 64-bit protected mode, CS.L must be 0.  This
is always the case when executing RSM, so it is enough to load the
segments after CR0 and CR4.

Fixes: 660a5d517aaab9187f93854425c4c63f4a09195c
Cc: sta...@vger.kernel.org
Signed-off-by: Paolo Bonzini 
---
 arch/x86/kvm/emulate.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index e7a4fde5d631..2392541a96e6 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2418,7 +2418,7 @@ static int rsm_load_state_64(struct x86_emulate_ctxt 
*ctxt, u64 smbase)
u64 val, cr0, cr4;
u32 base3;
u16 selector;
-   int i;
+   int i, r;
 
for (i = 0; i < 16; i++)
*reg_write(ctxt, i) = GET_SMSTATE(u64, smbase, 0x7ff8 - i * 8);
@@ -2460,13 +2460,17 @@ static int rsm_load_state_64(struct x86_emulate_ctxt 
*ctxt, u64 smbase)
    dt.address =                GET_SMSTATE(u64, smbase, 0x7e68);
    ctxt->ops->set_gdt(ctxt, &dt);
 
+   r = rsm_enter_protected_mode(ctxt, cr0, cr4);
+   if (r != X86EMUL_CONTINUE)
+   return r;
+
for (i = 0; i < 6; i++) {
-   int r = rsm_load_seg_64(ctxt, smbase, i);
+   r = rsm_load_seg_64(ctxt, smbase, i);
if (r != X86EMUL_CONTINUE)
return r;
}
 
-   return rsm_enter_protected_mode(ctxt, cr0, cr4);
+   return X86EMUL_CONTINUE;
 }
 
 static int em_rsm(struct x86_emulate_ctxt *ctxt)
-- 
2.5.0



Re: [PATCH v3 27/32] nvdimm: support DSM_CMD_IMPLEMENTED function

2015-10-14 Thread Xiao Guangrong



On 10/15/2015 01:06 AM, Eduardo Habkost wrote:
> On Wed, Oct 14, 2015 at 10:50:40PM +0800, Xiao Guangrong wrote:
>> On 10/14/2015 05:40 PM, Stefan Hajnoczi wrote:
>>> On Sun, Oct 11, 2015 at 11:52:59AM +0800, Xiao Guangrong wrote:
>>>>   static void dsm_write(void *opaque, hwaddr addr,
>>>> uint64_t val, unsigned size)
>>>>   {
>>>> +NVDIMMState *state = opaque;
>>>> +MemoryRegion *dsm_ram_mr;
>>>> +dsm_in *in;
>>>> +dsm_out *out;
>>>> +uint32_t revision, function, handle;
>>>> +
>>>>   if (val != NOTIFY_VALUE) {
>>>>   fprintf(stderr, "BUG: unexepected notify value 0x%" PRIx64, val);
>>>>   }
>>>> +
>>>> +dsm_ram_mr = memory_region_find(&state->mr, state->page_size,
>>>> +state->page_size).mr;
>>>> +memory_region_unref(dsm_ram_mr);
>>>> +in = memory_region_get_ram_ptr(dsm_ram_mr);
>>>
>>> This looks suspicious.  Shouldn't the memory_region_unref(dsm_ram_mr)
>>> happen after we're done using it?
>>
>> This region is kept alive for as long as QEMU runs, so it is okay.  The
>> same style is used in other code, for example: line 208 in
>> hw/s390x/sclp.c.
>
> In sclp.c (assign_storage()), the memory region is never used after
> memory_region_unref() is called. In unassign_storage(), sclp.c owns an
> additional reference, grabbed by assign_storage().

Ah... I got it, thank you for pointing it out.


Re: AIO requests may be disordered by Qemu-kvm iothread with disk cache=writethrough, Bug or Feature?

2015-10-14 Thread Stefan Hajnoczi
On Thu, Oct 08, 2015 at 07:59:56PM +0800, charlie.song wrote:
> We recently tried to use Linux AIO from a guest OS and found that the IOthread 
> mechanism of QEMU-KVM reorders I/O requests from the guest OS 
> even when the AIO write requests are issued in order from a single thread. 
> This does not happen on the host OS, however.

I think you are describing a situation where a guest submits multiple
overlapping I/O requests at the same time.

virtio-blk does not guarantee a specific request ordering, so the
application needs to wait for request completion if ordering matters.

io_submit(2) also does not make guarantees about ordering.

Stefan


Re: [PATCH v3 27/32] nvdimm: support DSM_CMD_IMPLEMENTED function

2015-10-14 Thread Stefan Hajnoczi
On Sun, Oct 11, 2015 at 11:52:59AM +0800, Xiao Guangrong wrote:
> +out->len = sizeof(out->status);

out->len is uint16_t, it needs cpu_to_le16().  There may be other
instances in this patch series.
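For illustration, the effect of cpu_to_le16() followed by a store can be written portably as below. The store_le16() name is made up for this sketch; it simply shows the byte order the guest-visible field is expected to have.

```c
#include <assert.h>
#include <stdint.h>

/* Store a 16-bit value in little-endian byte order regardless of the
 * endianness of the host. */
static void store_le16(uint8_t *dst, uint16_t val)
{
	dst[0] = (uint8_t)(val & 0xff);	/* low byte first */
	dst[1] = (uint8_t)(val >> 8);	/* then the high byte */
}
```

On a little-endian host a plain assignment happens to produce the same bytes, which is why a missing conversion tends to go unnoticed until the code runs on a big-endian host.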


Re: [PATCH] KVM: PPC: e500: fix couple of shift operations on 64 bits

2015-10-14 Thread Paul Mackerras
On Thu, Oct 01, 2015 at 03:58:03PM +0300, Laurentiu Tudor wrote:
> Fix couple of cases where we shift left a 32-bit
> value thus might get truncated results on 64-bit
> targets.
> 
> Signed-off-by: Laurentiu Tudor 
> Suggested-by: Scott Wood 

Thanks, applied to my kvm-ppc-next branch.

Paul.
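A small sketch of the class of bug the quoted patch describes: a left shift carried out in 32 bits loses its high bits before the result is widened. The function names are invented for this example and the values are arbitrary; the narrow variant is only defined for shift counts below 32.

```c
#include <assert.h>
#include <stdint.h>

/* Shift performed in 32 bits: bits shifted past bit 31 are lost
 * before the result is widened to 64 bits. Requires n < 32. */
static uint64_t shift_narrow(uint32_t val, unsigned int n)
{
	return (uint32_t)(val << n);
}

/* Widen first, then shift: the high bits survive. */
static uint64_t shift_wide(uint32_t val, unsigned int n)
{
	return (uint64_t)val << n;
}
```

For val = 0x80000001 and n = 4 the narrow version yields 0x10, while the widened version yields 0x800000010.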
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][v2] KVM: PPC: e500: Emulate TMCFG0 TMRN register

2015-10-14 Thread Paul Mackerras
On Fri, Sep 25, 2015 at 06:02:23PM +0300, Laurentiu Tudor wrote:
> Emulate TMCFG0 TMRN register exposing one HW thread per vcpu.
> 
> Signed-off-by: Mihai Caraman 
> [laurentiu.tu...@freescale.com: rebased on latest kernel, use
>  define instead of hardcoded value, moved code in own function]
> Signed-off-by: Laurentiu Tudor 

Thanks, applied to my kvm-ppc-next branch.

Paul.


Re: [PATCH] powerpc/e6500: add TMCFG0 register definition

2015-10-14 Thread Paul Mackerras
On Wed, Sep 23, 2015 at 06:06:22PM +0300, Laurentiu Tudor wrote:
> The register is not currently used in the base kernel
> but will be in a forthcoming kvm patch.
> 
> Signed-off-by: Laurentiu Tudor 

Thanks, applied to my kvm-ppc-next branch.

Paul.


Re: [PATCH 15/19] KVM: PPC: e500: fix handling local_sid_lookup result

2015-10-14 Thread Paul Mackerras
On Thu, Sep 24, 2015 at 04:00:23PM +0200, Andrzej Hajda wrote:
> The function can return negative value.
> 
> The problem has been detected using proposed semantic patch
> scripts/coccinelle/tests/assign_signed_to_unsigned.cocci [1].
> 
> [1]: http://permalink.gmane.org/gmane.linux.kernel/2046107
> 
> Signed-off-by: Andrzej Hajda 

Thanks, applied to my kvm-ppc-next branch.

Paul.


[PATCH] KVM: PPC: Book3S HV: Make H_REMOVE return correct HPTE value for absent HPTEs

2015-10-14 Thread Paul Mackerras
This fixes a bug where the old HPTE value returned by H_REMOVE has
the valid bit clear if the HPTE was an absent HPTE, as happens for
HPTEs for emulated MMIO pages and for RAM pages that have been paged
out by the host.  If the absent bit is set, we clear it and set the
valid bit, because from the guest's point of view, the HPTE is valid.

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c 
b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index c1df9bb..97e7f8c 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -470,6 +470,8 @@ long kvmppc_do_h_remove(struct kvm *kvm, unsigned long 
flags,
note_hpte_modification(kvm, rev);
unlock_hpte(hpte, 0);
 
+   if (v & HPTE_V_ABSENT)
+   v = (v & ~HPTE_V_ABSENT) | HPTE_V_VALID;
hpret[0] = v;
hpret[1] = r;
return H_SUCCESS;
-- 
2.6.1

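The two added lines can be read as a small pure function on the first HPTE doubleword. In the sketch below the flag values are illustrative stand-ins, since the kernel defines HPTE_V_VALID and HPTE_V_ABSENT in its own headers, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative flag values, assumed for this sketch only. */
#define SK_HPTE_V_VALID  UINT64_C(0x01)
#define SK_HPTE_V_ABSENT UINT64_C(0x20)

/* Translate the host's view of an HPTE into what the guest should
 * see: an "absent" entry is reported as a valid one. */
static uint64_t hpte_v_for_guest(uint64_t v)
{
	if (v & SK_HPTE_V_ABSENT)
		v = (v & ~SK_HPTE_V_ABSENT) | SK_HPTE_V_VALID;
	return v;
}
```

Entries that are already valid, or that have neither bit set, pass through unchanged, so the conversion is safe to apply unconditionally to the value returned in hpret[0].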


[PATCH] KVM: PPC: Book3S HV: Don't fall back to smaller HPT size in allocation ioctl

2015-10-14 Thread Paul Mackerras
Currently the KVM_PPC_ALLOCATE_HTAB ioctl will try to allocate the requested
size of HPT, and if that is not possible, then try to allocate smaller
sizes (by factors of 2) until either a minimum is reached or the
allocation succeeds.  This is not ideal for userspace, particularly in
migration scenarios, where the destination VM really does require the
size requested.  Also, the minimum HPT size of 256kB may be
insufficient for the guest to run successfully.

This removes the fallback to smaller sizes on allocation failure for
the KVM_PPC_ALLOCATE_HTAB ioctl.  The fallback still exists for the
case where the HPT is allocated at the time the first VCPU is run, if
no HPT has been allocated by ioctl by that time.

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 1f9c0a1..10722b1 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -70,7 +70,8 @@ long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
 	}
 
 	/* Lastly try successively smaller sizes from the page allocator */
-	while (!hpt && order > PPC_MIN_HPT_ORDER) {
+	/* Only do this if userspace didn't specify a size via ioctl */
+	while (!hpt && order > PPC_MIN_HPT_ORDER && !htab_orderp) {
 		hpt = __get_free_pages(GFP_KERNEL|__GFP_ZERO|__GFP_REPEAT|
 				       __GFP_NOWARN, order - PAGE_SHIFT);
 		if (!hpt)
-- 
2.6.1
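
To illustrate the policy change, here is a simplified user-space model of
the fallback decision.  It is a sketch only, not the kernel code: the
allocator is replaced by a hypothetical callback, and pick_hpt_order(),
alloc_up_to_order_20() and MIN_HPT_ORDER are made-up names.  The value 18
is chosen because 2^18 bytes is the 256kB minimum mentioned in the
description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed minimum order: 2^18 bytes = 256kB. */
#define MIN_HPT_ORDER	18

/* Hypothetical allocator hook: true if 2^order bytes can be allocated. */
typedef bool (*try_alloc_fn)(unsigned int order);

/*
 * Return the order that was allocated, or -1 on failure.  When the size
 * came from the ioctl, only the requested order is tried; otherwise
 * successively smaller orders are tried down to MIN_HPT_ORDER.
 */
int pick_hpt_order(try_alloc_fn try_alloc, unsigned int order,
		   bool size_from_ioctl)
{
	assert(try_alloc != NULL);

	if (try_alloc(order))
		return (int)order;

	while (!size_from_ioctl && order > MIN_HPT_ORDER) {
		--order;
		if (try_alloc(order))
			return (int)order;
	}
	return -1;
}

/* Example allocator for experimentation: succeeds only up to order 20. */
bool alloc_up_to_order_20(unsigned int order)
{
	return order <= 20;
}
```

With this model, a request for order 24 that came from the ioctl fails
outright, while the same request made implicitly at first VCPU run falls
back to order 20.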


