[PATCHv4] kvmppc: Implement H_LOGICAL_CI_{LOAD,STORE} in KVM

2015-04-20 Thread David Gibson
On POWER, storage caching is usually configured via the MMU - attributes
such as cache-inhibited are stored in the TLB and the hashed page table.

This makes correctly performing cache inhibited IO accesses awkward when
the MMU is turned off (real mode).  Some CPU models provide special
registers to control the cache attributes of real mode loads and stores but
this is not at all consistent.  This is a problem in particular for SLOF,
the firmware used on KVM guests, which runs entirely in real mode, but
which needs to do IO to load the kernel.

To simplify this, qemu implements two special hypercalls, H_LOGICAL_CI_LOAD
and H_LOGICAL_CI_STORE which simulate a cache-inhibited load or store to
a logical address (aka guest physical address).  SLOF uses these for IO.

However, because these are implemented within qemu, not the host kernel,
these bypass any IO devices emulated within KVM itself.  The simplest way
to see this problem is to attempt to boot a KVM guest from a virtio-blk
device with iothread / dataplane enabled.  The iothread code relies on an
in-kernel implementation of the virtio queue notification, which is not
triggered by the IO hcalls, and so the guest will stall in SLOF unable to
load the guest OS.

This patch addresses this by providing in-kernel implementations of the
2 hypercalls, which correctly scan the KVM IO bus.  Any access to an
address not handled by the KVM IO bus will cause a VM exit, hitting the
qemu implementation as before.

Note that a userspace change is also required, in order to enable these
new hcall implementations with KVM_CAP_PPC_ENABLE_HCALL.

Signed-off-by: David Gibson da...@gibson.dropbear.id.au
---
 arch/powerpc/include/asm/kvm_book3s.h |  3 ++
 arch/powerpc/kvm/book3s.c | 76 +++
 arch/powerpc/kvm/book3s_hv.c  | 12 ++
 arch/powerpc/kvm/book3s_pr_papr.c | 28 +
 4 files changed, 119 insertions(+)

Changes in v4:
 * Rebase onto 4.0+, correct for changed signature of kvm_io_bus_{read,write}

Alex, I saw from some build system notifications that you seemed to
hit some troubles compiling the last version of this patch. This
should fix it - hope it's not too late to get into 4.1.

diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index 9930904..b91e74a 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -288,6 +288,9 @@ static inline bool kvmppc_supports_magic_page(struct kvm_vcpu *vcpu)
 	return !is_kvmppc_hv_enabled(vcpu->kvm);
 }
 
+extern int kvmppc_h_logical_ci_load(struct kvm_vcpu *vcpu);
+extern int kvmppc_h_logical_ci_store(struct kvm_vcpu *vcpu);
+
 /* Magic register values loaded into r3 and r4 before the 'sc' assembly
  * instruction for the OSI hypercalls */
 #define OSI_SC_MAGIC_R3			0x113724FA
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index cfbcdc6..453a8a4 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -821,6 +821,82 @@ void kvmppc_core_destroy_vm(struct kvm *kvm)
 #endif
 }
 
+int kvmppc_h_logical_ci_load(struct kvm_vcpu *vcpu)
+{
+	unsigned long size = kvmppc_get_gpr(vcpu, 4);
+	unsigned long addr = kvmppc_get_gpr(vcpu, 5);
+	u64 buf;
+	int ret;
+
+	if (!is_power_of_2(size) || (size > sizeof(buf)))
+		return H_TOO_HARD;
+
+	ret = kvm_io_bus_read(vcpu, KVM_MMIO_BUS, addr, size, &buf);
+	if (ret != 0)
+		return H_TOO_HARD;
+
+	switch (size) {
+	case 1:
+		kvmppc_set_gpr(vcpu, 4, *(u8 *)&buf);
+		break;
+
+	case 2:
+		kvmppc_set_gpr(vcpu, 4, be16_to_cpu(*(__be16 *)&buf));
+		break;
+
+	case 4:
+		kvmppc_set_gpr(vcpu, 4, be32_to_cpu(*(__be32 *)&buf));
+		break;
+
+	case 8:
+		kvmppc_set_gpr(vcpu, 4, be64_to_cpu(*(__be64 *)&buf));
+		break;
+
+	default:
+		BUG();
+	}
+
+	return H_SUCCESS;
+}
+EXPORT_SYMBOL_GPL(kvmppc_h_logical_ci_load);
+
+int kvmppc_h_logical_ci_store(struct kvm_vcpu *vcpu)
+{
+	unsigned long size = kvmppc_get_gpr(vcpu, 4);
+	unsigned long addr = kvmppc_get_gpr(vcpu, 5);
+	unsigned long val = kvmppc_get_gpr(vcpu, 6);
+	u64 buf;
+	int ret;
+
+	switch (size) {
+	case 1:
+		*(u8 *)&buf = val;
+		break;
+
+	case 2:
+		*(__be16 *)&buf = cpu_to_be16(val);
+		break;
+
+	case 4:
+		*(__be32 *)&buf = cpu_to_be32(val);
+		break;
+
+	case 8:
+		*(__be64 *)&buf = cpu_to_be64(val);
+		break;
+
+	default:
+		return H_TOO_HARD;
+	}
+
+	ret = kvm_io_bus_write(vcpu, KVM_MMIO_BUS, addr, size, &buf);
+	if (ret != 0)
+		return H_TOO_HARD;
+
+	return H_SUCCESS;
+}
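
To illustrate the size check and byte-order handling that the two functions above perform, here is a minimal user-space sketch. It is not the kernel code: pack_be(), unpack_be() and the SKETCH_* return codes are hypothetical names, and plain shifts stand in for the cpu_to_be*/be*_to_cpu helpers. Sizes other than 1, 2, 4 or 8 are punted, in the same spirit as returning H_TOO_HARD.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical return codes standing in for H_SUCCESS / H_TOO_HARD. */
enum { SKETCH_OK = 0, SKETCH_PUNT = 1 };

/* Accept only access sizes of 1, 2, 4 or 8 bytes. */
static int size_is_valid(size_t size)
{
    return size != 0 && (size & (size - 1)) == 0 && size <= sizeof(uint64_t);
}

/* Store path: serialize the low 'size' bytes of val, most significant first. */
static int pack_be(uint8_t *buf, size_t size, uint64_t val)
{
    if (!size_is_valid(size))
        return SKETCH_PUNT;
    for (size_t i = 0; i < size; i++)
        buf[i] = (uint8_t)(val >> (8 * (size - 1 - i)));
    return SKETCH_OK;
}

/* Load path: rebuild a host integer from 'size' big-endian bytes. */
static int unpack_be(const uint8_t *buf, size_t size, uint64_t *val)
{
    uint64_t v = 0;

    if (!size_is_valid(size))
        return SKETCH_PUNT;
    for (size_t i = 0; i < size; i++)
        v = (v << 8) | buf[i];
    *val = v;
    return SKETCH_OK;
}
```

Packing 0x11223344 with size 4 yields the bytes 11 22 33 44 regardless of host endianness, and unpacking them returns the original value.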

Non-exiting rdpmc on KVM guests?

2015-04-20 Thread Andy Lutomirski
I just wrote a little perf self-monitoring tool that uses rdpmc to
count cycles.  Performance sucks under KVM (VMX).

How hard would it be to avoid rdpmc exits in cases where the host and
guest pmu configurations are compatible as seen by rdpmc?  I'm mostly
ignorant of how the PMU counter offsets and such work.

(Also, grr, Intel, couldn't you come up with a better interface for this?)

--Andy
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 4/4] target-arm: kvm - add support for HW assisted debug

2015-04-20 Thread Peter Maydell
On 31 March 2015 at 16:40, Alex Bennée alex.ben...@linaro.org wrote:
 From: Alex Bennée a...@bennee.com

 This adds basic support for HW assisted debug. The ioctl interface to
 KVM allows us to pass an implementation defined number of break and
 watch point registers. When KVM_GUESTDBG_USE_HW_BP is specified these
 debug registers will be installed in place on the world switch into the
 guest.

 The hardware is actually capable of more advanced matching but it is
 unclear if this expressiveness is available via the gdbstub protocol.

 Signed-off-by: Alex Bennée alex.ben...@linaro.org

 ---
 v2
   - correct setting of PMC/BAS/MASK
   - improved commentary
   - added helper function to check watchpoint in range
   - fix find/deletion of watchpoints

 diff --git a/target-arm/kvm.c b/target-arm/kvm.c
 index ae0f8b2..d1adf5f 100644
 --- a/target-arm/kvm.c
 +++ b/target-arm/kvm.c
 @@ -17,6 +17,7 @@

  #include "qemu-common.h"
  #include "qemu/timer.h"
 +#include "qemu/error-report.h"
  #include "sysemu/sysemu.h"
  #include "sysemu/kvm.h"
  #include "kvm_arm.h"
 @@ -476,6 +477,8 @@ void kvm_arch_post_run(CPUState *cs, struct kvm_run *run)

  #define HSR_EC_SHIFT            26
  #define HSR_EC_SOFT_STEP        0x32
 +#define HSR_EC_HW_BKPT          0x30
 +#define HSR_EC_HW_WATCH         0x34
  #define HSR_EC_SW_BKPT          0x3c

  static int kvm_handle_debug(CPUState *cs, struct kvm_run *run)
 @@ -496,6 +499,16 @@ static int kvm_handle_debug(CPUState *cs, struct kvm_run *run)
              return true;
          }
          break;
 +    case HSR_EC_HW_BKPT:
 +        if (kvm_arm_find_hw_breakpoint(cs, arch_info->pc)) {
 +            return true;
 +        }
 +        break;
 +    case HSR_EC_HW_WATCH:
 +        if (kvm_arm_find_hw_watchpoint(cs, arch_info->far)) {
 +            return true;
 +        }
 +        break;
      default:
          error_report("%s: unhandled debug exit (%x, %llx)\n",
                       __func__, arch_info->hsr, arch_info->pc);
 @@ -556,6 +569,10 @@ void kvm_arch_update_guest_debug(CPUState *cs, struct kvm_guest_debug *dbg)
      if (kvm_sw_breakpoints_active(cs)) {
          dbg->control |= KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_USE_SW_BP;
      }
 +    if (kvm_hw_breakpoints_active(cs)) {
 +        dbg->control |= KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_USE_HW_BP;
 +        kvm_copy_hw_breakpoint_data(&dbg->arch);
 +    }
  }

  /* C6.6.29 BRK instruction */
 @@ -582,26 +599,6 @@ int kvm_arch_remove_sw_breakpoint(CPUState *cs, struct kvm_sw_breakpoint *bp)
      return 0;
  }

 -int kvm_arch_insert_hw_breakpoint(target_ulong addr,
 -                                  target_ulong len, int type)
 -{
 -    qemu_log_mask(LOG_UNIMP, "%s: not implemented\n", __func__);
 -    return -EINVAL;
 -}
 -
 -int kvm_arch_remove_hw_breakpoint(target_ulong addr,
 -                                  target_ulong len, int type)
 -{
 -    qemu_log_mask(LOG_UNIMP, "%s: not implemented\n", __func__);
 -    return -EINVAL;
 -}
 -
 -
 -void kvm_arch_remove_all_hw_breakpoints(void)
 -{
 -    qemu_log_mask(LOG_UNIMP, "%s: not implemented\n", __func__);
 -}
 -
  void kvm_arch_init_irq_routing(KVMState *s)
  {
  }
 diff --git a/target-arm/kvm64.c b/target-arm/kvm64.c
 index 8cf3a62..dbe81a7 100644
 --- a/target-arm/kvm64.c
 +++ b/target-arm/kvm64.c
 @@ -2,6 +2,7 @@
   * ARM implementation of KVM hooks, 64 bit specific code
   *
   * Copyright Mian-M. Hamayun 2013, Virtual Open Systems
 + * Copyright Alex Bennée 2014, Linaro
   *
   * This work is licensed under the terms of the GNU GPL, version 2 or later.
   * See the COPYING file in the top-level directory.
 @@ -12,11 +13,17 @@
  #include <sys/types.h>
  #include <sys/ioctl.h>
  #include <sys/mman.h>
 +#include <sys/ptrace.h>
 +#include <asm/ptrace.h>

Do we really need the asm/ include?

 +#include <linux/elf.h>
  #include <linux/kvm.h>

  #include "qemu-common.h"
  #include "qemu/timer.h"
 +#include "qemu/host-utils.h"
 +#include "qemu/error-report.h"
 +#include "exec/gdbstub.h"
  #include "sysemu/sysemu.h"
  #include "sysemu/kvm.h"
  #include "kvm_arm.h"
 @@ -24,6 +31,312 @@
  #include "internals.h"
  #include "hw/arm/arm.h"

 +/* Max and current break/watch point counts */
 +int max_hw_bp, max_hw_wp;
 +int cur_hw_bp, cur_hw_wp;
 +struct kvm_guest_debug_arch guest_debug_registers;

How does this work in an SMP guest?

 +
 +/**
 + * kvm_arm_init_debug()
 + * @cs: CPUState
 + *
 + * kvm_check_extension returns 0 if we have no debug registers or the
 + * number we have.
 + *
 + */
 +static void kvm_arm_init_debug(CPUState *cs)
 +{
 +    max_hw_wp = kvm_check_extension(cs->kvm_state, KVM_CAP_GUEST_DEBUG_HW_WPS);
 +    max_hw_bp = kvm_check_extension(cs->kvm_state, KVM_CAP_GUEST_DEBUG_HW_BPS);
 +    return;
 +}
 +
 +/**
 + * insert_hw_breakpoint()
 + * @addr: address of breakpoint
 + *
 + * See ARM ARM D2.9.1 for details but here we are only going to create
 + * simple un-linked breakpoints (i.e. we don't chain breakpoints
 + * together to match address and context or vmid). The hardware is
 + * capable of fancier 

Re: [GIT PULL] First batch of KVM changes for 4.1

2015-04-20 Thread Andy Lutomirski
On Mon, Apr 20, 2015 at 9:59 AM, Paolo Bonzini pbonz...@redhat.com wrote:


 On 17/04/2015 22:18, Marcelo Tosatti wrote:
 The bug which this is fixing is very rare, have no memory of a report.

 In fact, its even difficult to create a synthetic reproducer.

 But then why was the task migration notifier even in Jeremy's original
 code for Xen?  Was it supposed to work even on non-synchronized TSC?

 If that's the case, then it could be reverted indeed; but then why did
 you commit this patch to 4.1?  Did you think of something that would
 cause the seqcount-like protocol to fail, and that turned out not to be
 the case later?  I was only following the mailing list sparsely in March.

I don't think anyone ever tried that hard to test this stuff.  There
was an infinite loop that Firefox was triggering as a KVM guest
somewhat reliably until a couple months ago in the same vdso code.  :(

--Andy


Re: KVM: x86: fix kvmclock write race (v2)

2015-04-20 Thread Radim Krčmář
2015-04-17 21:23-0300, Marcelo Tosatti:
 
 From: Radim Krčmář rkrc...@redhat.com
 
 As noted by Andy Lutomirski, kvm does not follow the documented version
 protocol. Fix it.
 
 Note: this bug results in a race which can occur if the following three
 conditions are met:
 
 1) There is KVM guest time update (there is one every 5 minutes).
 
 2) Which races with a thread in the guest in the following way:
 The execution of these 29 instructions has to take at _least_
 2 seconds (rebalance interval is 1 second).
 
 lsl    %r9w,%esi
 mov    %esi,%r8d
 and    $0x3f,%esi
 and    $0xfff,%r8d
 test   $0xfc0,%r8d
 jne    0xa12 <vread_pvclock+210>
 shl    $0x6,%rsi
 mov    -0xa01000(%rsi),%r10d
 data32 xchg %ax,%ax
 data32 xchg %ax,%ax
 rdtsc
 shl    $0x20,%rdx
 mov    %eax,%eax
 movsbl -0xa00fe4(%rsi),%ecx
 or     %rax,%rdx
 sub    -0xa00ff8(%rsi),%rdx
 mov    -0xa00fe8(%rsi),%r11d
 mov    %rdx,%rax
 shl    %cl,%rax
 test   %ecx,%ecx
 js     0xa08 <vread_pvclock+200>
 mov    %r11d,%edx
 movzbl -0xa00fe3(%rsi),%ecx
 mov    -0xa00ff0(%rsi),%r11
 mul    %rdx
 shrd   $0x20,%rdx,%rax
 data32 xchg %ax,%ax
 data32 xchg %ax,%ax
 lsl    %r9w,%edx
 
 3) Scheduler moves the task, while executing these 29 instructions, to a
 destination processor, then back to the source processor.
 
 4) Source processor, after has been moved back from destination,
 perceives data out of order as written by processor performing guest
 time update (item 1), with string mov.
 
 Given the rarity of this condition, and the fact it was never observed
 or reported, reverting pvclock vsyscall on systems whose host is
 susceptible to the race, seems an excessive measure.
 
 Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
 Cc: sta...@kernel.org

Thanks.

Reviewed-or-Signed-off-by: Radim Krčmář rkrc...@redhat.com

Like most code, I would have written it differently now :)

 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 + kvm_write_guest_cached(v->kvm, &vcpu->pv_time,
 + &guest_hv_clock,
 + sizeof(guest_hv_clock));

The easiest optimization is replacing sizeof(guest_hv_clock) with
  offsetof(typeof(guest_hv_clock), version) + sizeof(guest_hv_clock.version)
because kvm_write_guest_cached() allows writing of prefixes.
This still won't get optimized to a simple MOV at compile time, but
it saves a few mov bytes.

(Offset of version is 0 now, so using 'sizeof guest_hv_clock.version' is
 just a minor offence and saves some hard to read code.)
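
A minimal sketch of the prefix-write idea, under stated assumptions: struct clock_info is a hypothetical stand-in for the guest-visible clock record with the version field first, write_guest() is a plain memcpy standing in for a guest-memory write helper that takes a length, and the memory barriers that a real implementation needs between the three writes are omitted. It is not the patch itself.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of the guest-visible clock record; version comes first. */
struct clock_info {
    uint32_t version;
    uint32_t pad0;
    uint64_t tsc_timestamp;
    uint64_t system_time;
    uint32_t tsc_to_system_mul;
    int8_t   tsc_shift;
    uint8_t  flags;
    uint8_t  pad[2];
};

/* Length of the prefix that ends right after the version field. */
static size_t version_prefix_len(void)
{
    return offsetof(struct clock_info, version) +
           sizeof(((struct clock_info *)0)->version);
}

/* Stand-in for a guest-memory write helper that accepts a length. */
static void write_guest(void *guest, const void *host, size_t len)
{
    memcpy(guest, host, len);
}

/*
 * Publish an update with a seqcount-like protocol: make the version odd,
 * write the whole record, then make the version even again.
 * Assumes the guest copy starts with an even version; barriers omitted.
 */
static void publish(struct clock_info *guest, struct clock_info *host)
{
    host->version = guest->version + 1;              /* odd: update in progress */
    write_guest(guest, host, version_prefix_len());  /* version only */
    write_guest(guest, host, sizeof(*host));         /* full record */
    host->version++;                                 /* even: update complete */
    write_guest(guest, host, version_prefix_len());  /* version only */
}
```

With this layout the prefix is 4 bytes, so the two version-only writes touch far less guest memory than a full-record write.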


Re: [PATCH v2 0/4] QEMU support for KVM Guest Debug on arm64

2015-04-20 Thread Alex Bennée

Alex Bennée alex.ben...@linaro.org writes:

 Hi,

 I thought I'd sent V1 to the list but apparently not. Anyway this
 patch series provides the QEMU side of guest debug support for arm64.
 I'm assuming the first patch will be dropped when a proper merge of
 the linux-headers is done once the kernel side is upstreamed.

 There is nothing particularly special about the implementation details
 although some of the bit-fiddling is a little fiddly.

 GIT Repos:

 The patch series is based off a recent master and can be found at:

 https://github.com/stsquad/qemu
 branch: kvm/guest-debug-v2

 The kernel patches for this series are based off a v4.0-rc6 and can be
 found at:

 https://git.linaro.org/people/alex.bennee/linux.git
 branch: guest-debug/4.0-rc6-v2

 Alex Bennée (4):
   linux-headers: partial sync from my kernel tree (DEV)
   target-arm: kvm - implement software breakpoints
   target-arm: kvm - support for single step
   target-arm: kvm - add support for HW assisted debug

<snip>

The kernel side has had two rounds of review and is getting into shape.
It would be nice if I could get some review of the QEMU side for balance
;-)

-- 
Alex Bennée


Re: [PATCH] uio: add irq control support to uio_pci_generic

2015-04-20 Thread Michael S. Tsirkin
On Thu, Apr 16, 2015 at 02:21:10PM -0700, Stephen Hemminger wrote:
 On Thu, 16 Apr 2015 09:43:24 +0200
 Michael S. Tsirkin m...@redhat.com wrote:
 
  On Wed, Apr 15, 2015 at 09:59:34AM -0700, Stephen Hemminger wrote:
   The driver already supported INTX interrupts but had no in kernel
   function to enable and disable them.
   
   It is possible for userspace to do this by accessing PCI config
   directly, but this racy
  
  How is it racy? We have userspace using this interface,
  if there's a race I want to fix it.
 
 There is nothing to prevent two threads in user space doing 
 read/modify write at the same time.

Well that's a userspace bug then - so let's drop that
from commit log lest people think this fixes some
kernel bugs. read/modify/write to the same register
is at least an easy to grasp problem, creating
an extra interface for the same function opens up
the possibility that some userspace will do
read/modify/write from one thread with irqcontrol
from another thread, creating more races.
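
The lost update behind the read/modify/write problem can be shown deterministically. This is only a sketch: cmd_reg simulates the 16-bit PCI command register in ordinary memory, cfg_read()/cfg_write() are hypothetical stand-ins for config-space accessors, and the two "threads" are interleaved by hand rather than run concurrently.

```c
#include <stdint.h>

#define CMD_BUS_MASTER   0x0004u  /* PCI command register: bus master enable */
#define CMD_INTX_DISABLE 0x0400u  /* PCI command register: INTx disable */

/* Simulated command register; a real one lives in PCI config space. */
static uint16_t cmd_reg;

static uint16_t cfg_read(void)    { return cmd_reg; }
static void cfg_write(uint16_t v) { cmd_reg = v; }

/* Two unlocked read/modify/write sequences, interleaved so both read first. */
static uint16_t racy_interleaving(void)
{
    cmd_reg = 0;
    uint16_t a = cfg_read();          /* thread A reads 0 */
    uint16_t b = cfg_read();          /* thread B reads 0 before A writes */
    cfg_write(a | CMD_INTX_DISABLE);  /* A masks INTx */
    cfg_write(b | CMD_BUS_MASTER);    /* B overwrites A's update */
    return cmd_reg;
}

/* The same two updates with each sequence completing before the next starts. */
static uint16_t serialized(void)
{
    cmd_reg = 0;
    cfg_write(cfg_read() | CMD_INTX_DISABLE);
    cfg_write(cfg_read() | CMD_BUS_MASTER);
    return cmd_reg;
}
```

In the interleaved case the INTx disable bit is lost, which is why the callers have to serialize their accesses to the register, for example with a lock shared by every thread that touches it.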

 The bigger issue is that DPDK needs to support multiple UIO
 interface types. And with current model there is no abstraction.
 The way to enable/disable IRQ is different depending on the UIO
 drivers.

OK compatibility with other devices might be useful, but what are the
other UIO drivers DPDK supports? I only found support for igb_uio so
far, and that doesn't seem to be upstream.

-- 
MST


[PATCH v2] KVM: SVM: Sync g_pat with guest-written PAT value

2015-04-20 Thread Jan Kiszka
When hardware supports the g_pat VMCB field, we can use it for emulating
the PAT configuration that the guest configures by writing to the
corresponding MSR.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---

Changes in v2:
 - add mark_dirty as found missing by Radim

 arch/x86/kvm/svm.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index ce741b8..68fdddc 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3245,6 +3245,16 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
 	case MSR_VM_IGNNE:
 		vcpu_unimpl(vcpu, "unimplemented wrmsr: 0x%x data 0x%llx\n", ecx, data);
 		break;
+	case MSR_IA32_CR_PAT:
+		if (npt_enabled) {
+			if (!kvm_mtrr_valid(vcpu, MSR_IA32_CR_PAT, data))
+				return 1;
+			svm->vmcb->save.g_pat = data;
+			mark_dirty(svm->vmcb, VMCB_NPT);
+			vcpu->arch.pat = data;
+			break;
+		}
+		/* fall through */
 	default:
 		return kvm_set_msr_common(vcpu, msr);
 	}
-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux


Re: [RFC][PATCH] KVM: SVM: Sync g_pat with guest-written PAT value

2015-04-20 Thread Jan Kiszka
On 2015-04-20 19:37, Jan Kiszka wrote:
 On 2015-04-20 19:33, Radim Krčmář wrote:
 2015-04-20 19:21+0200, Jan Kiszka:
 On 2015-04-20 19:16, Radim Krčmář wrote:
 2015-04-20 18:14+0200, Radim Krčmář:
 Tested-by: Radim Krčmář rkrc...@redhat.com

 Uncached accesses were roughly 20x slower.
 In case anyone wanted to reproduce, I used this as a kvm-unit-test:

 ---
 | [code]

 Great, thanks. Will you push it to the unit tests? Could raise
 motivations to fix the !NPT/EPT case.

 It can't be included in `run_tests.sh`, because we intentionally ignore
 PAT for normal RAM on VMX and the test does fail ...
 
 That ignoring is encoded into the EPT? Hmm... Maybe we can create a
 ivshmem device and use that as test target.

And do you also know why it is ignored on Intel? Side effects on the host?

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux


Re: [RFC][PATCH] KVM: SVM: Sync g_pat with guest-written PAT value

2015-04-20 Thread Jan Kiszka
On 2015-04-20 19:33, Radim Krčmář wrote:
 2015-04-20 19:21+0200, Jan Kiszka:
 On 2015-04-20 19:16, Radim Krčmář wrote:
 2015-04-20 18:14+0200, Radim Krčmář:
 Tested-by: Radim Krčmář rkrc...@redhat.com

 Uncached accesses were roughly 20x slower.
 In case anyone wanted to reproduce, I used this as a kvm-unit-test:

 ---
 | [code]

 Great, thanks. Will you push it to the unit tests? Could raise
 motivations to fix the !NPT/EPT case.
 
 It can't be included in `run_tests.sh`, because we intentionally ignore
 PAT for normal RAM on VMX and the test does fail ...

That ignoring is encoded into the EPT? Hmm... Maybe we can create a
ivshmem device and use that as test target.

 
 I'll think how to make the test use fool-proof first, and also look how
 to fix the !NPT/EPT without affecting the case we care about too much.
 (And if we can do a similar trick with NPT.)
 

OK.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux


Re: [PATCH v2] KVM: SVM: Sync g_pat with guest-written PAT value

2015-04-20 Thread Radim Krčmář
2015-04-20 19:25+0200, Jan Kiszka:
 When hardware supports the g_pat VMCB field, we can use it for emulating
 the PAT configuration that the guest configures by writing to the
 corresponding MSR.
 
 Signed-off-by: Jan Kiszka jan.kis...@siemens.com
 ---
 
 Changes in v2:
  - add mark_dirty as found missing by Radim

Thanks.

Reviewed-by: Radim Krčmář rkrc...@redhat.com


Re: x2apic issues with Solaris and Xen guests

2015-04-20 Thread Jan Kiszka
On 2015-04-20 19:07, Stefan Hajnoczi wrote:
 I wonder whether the following two x2apic issues are related:
 
 Solaris 10 U11 network doesn't work
 https://bugzilla.redhat.com/show_bug.cgi?id=1040500
 
 kvm - fails to setup timer interrupt via io-apic
 (Thanks to Michael Tokarev for posting this link)
 https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=528077#68
 
 It seems KVM's x2apic emulation works with regular Linux and Windows
 guests, but not necessarily with other OSes.

KVM's x2apic is kind of paravirtual - without VT-d interrupt remapping.
That may confuse the guest, though it should work. But Xen already
refuses to pick it according to the second report:

| (XEN) Not enabling x2APIC: depends on iommu_supports_eim.

 
 Has anyone looked into this?

Not yet. Is there a handy reproduction guest image? Or maybe someone
would like to start with tracing what the guest and the host do.

Jan






Re: [RFC][PATCH] KVM: SVM: Sync g_pat with guest-written PAT value

2015-04-20 Thread Radim Krčmář
2015-04-20 19:21+0200, Jan Kiszka:
 On 2015-04-20 19:16, Radim Krčmář wrote:
  2015-04-20 18:14+0200, Radim Krčmář:
  Tested-by: Radim Krčmář rkrc...@redhat.com
  
  Uncached accesses were roughly 20x slower.
  In case anyone wanted to reproduce, I used this as a kvm-unit-test:
  
  ---
| [code]
 
 Great, thanks. Will you push it to the unit tests? Could raise
 motivations to fix the !NPT/EPT case.

It can't be included in `run_tests.sh`, because we intentionally ignore
PAT for normal RAM on VMX and the test does fail ...

I'll first think about how to make the test foolproof to use, and also look at
how to fix the !NPT/EPT case without affecting the case we care about too much.
(And whether we can do a similar trick with NPT.)


Re: x2apic issues with Solaris and Xen guests

2015-04-20 Thread Michael Tokarev
20.04.2015 20:29, Jan Kiszka wrote:
 On 2015-04-20 19:07, Stefan Hajnoczi wrote:
 I wonder whether the following two x2apic issues are related:

 Solaris 10 U11 network doesn't work
 https://bugzilla.redhat.com/show_bug.cgi?id=1040500

 kvm - fails to setup timer interrupt via io-apic
 (Thanks to Michael Tokarev for posting this link)
 https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=528077#68
[]
 Has anyone looked into this?
 
 Not yet. Is there a handy reproduction guest image? Or maybe someone
 would like to start with tracing what the guest and the host do.

The second link gives a trivial reproducer, you need just the
xen hypervisor binary and some kernel.  This should be easy
too, because it happens right on boot.  But I guess it requires
some inner knowledge of xen early boot machinery.

Thanks,

/mjt



Re: [GIT PULL] First batch of KVM changes for 4.1

2015-04-20 Thread Paolo Bonzini


On 17/04/2015 22:18, Marcelo Tosatti wrote:
 The bug which this is fixing is very rare, have no memory of a report.
 
 In fact, its even difficult to create a synthetic reproducer.

But then why was the task migration notifier even in Jeremy's original
code for Xen?  Was it supposed to work even on non-synchronized TSC?

If that's the case, then it could be reverted indeed; but then why did
you commit this patch to 4.1?  Did you think of something that would
cause the seqcount-like protocol to fail, and that turned out not to be
the case later?  I was only following the mailing list sparsely in March.

Paolo


Re: [RFC][PATCH] KVM: SVM: Sync g_pat with guest-written PAT value

2015-04-20 Thread Radim Krčmář
2015-04-20 18:14+0200, Radim Krčmář:
 Tested-by: Radim Krčmář rkrc...@redhat.com

Uncached accesses were roughly 20x slower.
In case anyone wanted to reproduce, I used this as a kvm-unit-test:

---
#include "processor.h"

#define NR_TOP_LOOPS 24
#define NR_MEM_LOOPS 10
#define MEM_ELEMENTS 1024

static volatile u64 pat_test_memory[MEM_ELEMENTS];

static void flush_tlb(void)
{
	write_cr3(read_cr3());
}

static void set_pat(u64 val)
{
	wrmsr(0x277, val);
	flush_tlb();

}

static u64 time_memory_accesses(void)
{
	u64 tsc_before = rdtsc();

	for (unsigned loop = 0; loop < NR_MEM_LOOPS; loop++)
		for (unsigned i = 0; i < MEM_ELEMENTS; i++)
			pat_test_memory[i]++;

	return rdtsc() - tsc_before;
}

int main(int argc, char **argv)
{
	unsigned error = 0;

	for (unsigned loop = 0; loop < NR_TOP_LOOPS; loop++) {
		u64 time_uc, time_wb;

		set_pat(0);
		time_uc = time_memory_accesses();

		set_pat(0x0606060606060606ULL);
		time_wb = time_memory_accesses();

		if (time_uc < time_wb * 4)
			error++;

		printf("%02d uc: %10lld wb: %8lld\n", loop, time_uc, time_wb);
	}

	report("guest PAT", !error);

	return report_summary();
}
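
For readers unfamiliar with the two PAT values used in the test: the PAT MSR (0x277) holds eight one-byte entries, and the low three bits of each byte select a memory type, with 0 meaning uncacheable and 6 meaning write-back. The helpers below are a hedged sketch for building and decoding such values; pat_fill() and pat_entry() are hypothetical names and are not part of kvm-unit-tests.

```c
#include <stdint.h>

/* Memory type encodings used in PAT entries. */
enum pat_type {
    PAT_UC       = 0,  /* uncacheable */
    PAT_WC       = 1,  /* write combining */
    PAT_WT       = 4,  /* write through */
    PAT_WP       = 5,  /* write protected */
    PAT_WB       = 6,  /* write back */
    PAT_UC_MINUS = 7   /* uncached, overridable by MTRRs */
};

/* Build a PAT value whose eight entries all use the same memory type. */
static uint64_t pat_fill(enum pat_type type)
{
    uint64_t pat = 0;

    for (unsigned i = 0; i < 8; i++)
        pat |= (uint64_t)type << (8 * i);
    return pat;
}

/* Extract the memory type of entry idx (0..7) from a PAT value. */
static unsigned pat_entry(uint64_t pat, unsigned idx)
{
    return (unsigned)((pat >> (8 * idx)) & 0x7);
}
```

So set_pat(0) in the test makes every entry uncacheable, and set_pat(0x0606060606060606ULL) makes every entry write-back, which is the same value pat_fill(PAT_WB) produces.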


Re: [RFC][PATCH] KVM: SVM: Sync g_pat with guest-written PAT value

2015-04-20 Thread Radim Krčmář
2015-04-20 19:16+0200, Radim Krčmář:
 Uncached accesses were roughly 20x slower.

Sorry, a zero is missing there ... they were 200 times slower.


x2apic issues with Solaris and Xen guests

2015-04-20 Thread Stefan Hajnoczi
I wonder whether the following two x2apic issues are related:

Solaris 10 U11 network doesn't work
https://bugzilla.redhat.com/show_bug.cgi?id=1040500

kvm - fails to setup timer interrupt via io-apic
(Thanks to Michael Tokarev for posting this link)
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=528077#68

It seems KVM's x2apic emulation works with regular Linux and Windows
guests, but not necessarily with other OSes.

Has anyone looked into this?

Stefan


Re: [RFC][PATCH] KVM: SVM: Sync g_pat with guest-written PAT value

2015-04-20 Thread Jan Kiszka
On 2015-04-20 19:16, Radim Krčmář wrote:
 2015-04-20 18:14+0200, Radim Krčmář:
 Tested-by: Radim Krčmář rkrc...@redhat.com
 
 Uncached accesses were roughly 20x slower.
 In case anyone wanted to reproduce, I used this as a kvm-unit-test:
 
 ---
 #include "processor.h"
 
 #define NR_TOP_LOOPS 24
 #define NR_MEM_LOOPS 10
 #define MEM_ELEMENTS 1024
 
 static volatile u64 pat_test_memory[MEM_ELEMENTS];
 
 static void flush_tlb(void)
 {
     write_cr3(read_cr3());
 }
 
 static void set_pat(u64 val)
 {
     wrmsr(0x277, val);
     flush_tlb();
 
 }
 
 static u64 time_memory_accesses(void)
 {
     u64 tsc_before = rdtsc();
 
     for (unsigned loop = 0; loop < NR_MEM_LOOPS; loop++)
         for (unsigned i = 0; i < MEM_ELEMENTS; i++)
             pat_test_memory[i]++;
 
     return rdtsc() - tsc_before;
 }
 
 int main(int argc, char **argv)
 {
     unsigned error = 0;
 
     for (unsigned loop = 0; loop < NR_TOP_LOOPS; loop++) {
         u64 time_uc, time_wb;
 
         set_pat(0);
         time_uc = time_memory_accesses();
 
         set_pat(0x0606060606060606ULL);
         time_wb = time_memory_accesses();
 
         if (time_uc < time_wb * 4)
             error++;
 
         printf("%02d uc: %10lld wb: %8lld\n", loop, time_uc, time_wb);
     }
 
     report("guest PAT", !error);
 
     return report_summary();
 }
 

Great, thanks. Will you push it to the unit tests? It could raise
motivation to fix the !NPT/EPT case.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux


Re: [PATCH] uio: add irq control support to uio_pci_generic

2015-04-20 Thread Michael S. Tsirkin
On Mon, Apr 20, 2015 at 08:33:18AM -0700, Stephen Hemminger wrote:
 On Mon, 20 Apr 2015 15:59:06 +0200
 Michael S. Tsirkin m...@redhat.com wrote:
 
  On Thu, Apr 16, 2015 at 02:21:10PM -0700, Stephen Hemminger wrote:
   On Thu, 16 Apr 2015 09:43:24 +0200
   Michael S. Tsirkin m...@redhat.com wrote:
   
On Wed, Apr 15, 2015 at 09:59:34AM -0700, Stephen Hemminger wrote:
 The driver already supported INTX interrupts but had no in kernel
 function to enable and disable them.
 
 It is possible for userspace to do this by accessing PCI config
 directly, but this racy

How is it racy? We have userspace using this interface,
if there's a race I want to fix it.
   
   There is nothing to prevent two threads in user space doing 
   read/modify write at the same time.
  
  Well that's a userspace bug then - so let's drop that
  from commit log lest people think this fixes some
  kernel bugs. read/modify/write to the same register
  is at least an easy to grasp problem, creating
  an extra interface for the same function opens up
  the possibility that some userspace will do
  read/modify/write from one thread with irqcontrol
  from another thread, creating more races.
  
   The bigger issue is that DPDK needs to support multiple UIO
   interface types. And with current model there is no abstraction.
   The way to enable/disable IRQ is different depending on the UIO
   drivers.
  
  OK compatibility with other devices might be useful, but what are the
  other UIO drivers DPDK supports? I only found support for igb_uio so
  far, and that doesn't seem to be upstream.
  
 
 Currently, DPDK supports:
   igb_uio, uio_pci_generic (as well as vfio)
 
 There are additional drivers which have been submitted but not accepted,
 for Xen and Hyper-V, both of which require special uio drivers.

Well, vfio doesn't have irq_control, does it?  So I'd say it's best to
wait and see before we commit to a new ABI.  You probably need to
support existing kernels anyway; if igb_uio makes it upstream,
then adding an interface that's consistent with it will make sense.
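
The read/modify/write race discussed above can be avoided in userspace by serializing access to the command register. A minimal sketch, assuming a plain variable stands in for the device's PCI command register (a real program would read and write offset 0x04 of the config space instead); intx_set_enabled() is a hypothetical helper, not an existing DPDK or kernel interface:

```c
#include <pthread.h>
#include <stdint.h>

#define PCI_COMMAND_INTX_DISABLE 0x400  /* bit 10 of the PCI command register */

/* Stand-in for the device's command register (assumption for this sketch). */
static uint16_t fake_command_reg;
static pthread_mutex_t cmd_lock = PTHREAD_MUTEX_INITIALIZER;

/* Enable (1) or disable (0) INTx under a lock, so that concurrent
 * callers cannot interleave their read/modify/write sequences. */
static void intx_set_enabled(int enable)
{
    pthread_mutex_lock(&cmd_lock);
    uint16_t cmd = fake_command_reg;                    /* read */
    if (enable)
        cmd &= (uint16_t)~PCI_COMMAND_INTX_DISABLE;     /* modify */
    else
        cmd |= PCI_COMMAND_INTX_DISABLE;
    fake_command_reg = cmd;                             /* write */
    pthread_mutex_unlock(&cmd_lock);
}
```

The lock only helps if every writer of the register goes through the same helper; mixing it with a second interface from another thread reintroduces the race.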

-- 
MST


Re: [RFC][PATCH] KVM: SVM: Sync g_pat with guest-written PAT value

2015-04-20 Thread Radim Krčmář
2015-04-13 08:58+0200, Jan Kiszka:
 When hardware supports the g_pat VMCB field, we can use it for emulating
 the PAT configuration that the guest configures by writing to the
 corresponding MSR.
 
 Signed-off-by: Jan Kiszka jan.kis...@siemens.com
 ---
 
 RFC because it is only compile-tested.
 
 diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
 @@ -3245,6 +3245,15 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
      case MSR_VM_IGNNE:
          vcpu_unimpl(vcpu, "unimplemented wrmsr: 0x%x data 0x%llx\n", ecx, data);
          break;
 +    case MSR_IA32_CR_PAT:
 +        if (npt_enabled) {
 +            if (!kvm_mtrr_valid(vcpu, MSR_IA32_CR_PAT, data))
 +                return 1;
 +            svm->vmcb->save.g_pat = data;
 +            vcpu->arch.pat = data;

Disregarding my Reviewed-by, the code is missing:

  mark_dirty(svm->vmcb, VMCB_NPT);
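
The reason for the missing call: the VMCB clean bits tell the CPU which cached state groups are still valid, and g_pat belongs to the nested-paging group, so the corresponding clean bit has to be cleared after the field is changed. A simplified standalone model, assuming a cut-down structure in place of the kernel's struct vmcb; the names ending in _sketch are hypothetical:

```c
#include <stdint.h>

enum { VMCB_NPT = 4 };  /* clean bit covering nested paging state (nCR3, g_pat) */

/* Heavily simplified stand-in for the kernel's struct vmcb. */
struct vmcb_sketch {
    uint32_t clean;   /* one bit per state group; 1 = cached copy still valid */
    uint64_t g_pat;
};

/* Clear one clean bit so the state group is reloaded on the next VMRUN. */
static void mark_dirty_sketch(struct vmcb_sketch *vmcb, int bit)
{
    vmcb->clean &= ~(1u << bit);
}

/* Update the guest PAT and invalidate the cached nested-paging state. */
static void set_guest_pat_sketch(struct vmcb_sketch *vmcb, uint64_t data)
{
    vmcb->g_pat = data;
    mark_dirty_sketch(vmcb, VMCB_NPT);
}
```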

Also,

Tested-by: Radim Krčmář rkrc...@redhat.com


Re: [PATCH] uio: add irq control support to uio_pci_generic

2015-04-20 Thread Stephen Hemminger
On Mon, 20 Apr 2015 15:59:06 +0200
Michael S. Tsirkin m...@redhat.com wrote:

 On Thu, Apr 16, 2015 at 02:21:10PM -0700, Stephen Hemminger wrote:
  On Thu, 16 Apr 2015 09:43:24 +0200
  Michael S. Tsirkin m...@redhat.com wrote:
  
   On Wed, Apr 15, 2015 at 09:59:34AM -0700, Stephen Hemminger wrote:
The driver already supported INTX interrupts but had no in kernel
function to enable and disable them.

It is possible for userspace to do this by accessing PCI config
directly, but this racy
   
   How is it racy? We have userspace using this interface,
   if there's a race I want to fix it.
  
  There is nothing to prevent two threads in user space doing 
  read/modify write at the same time.
 
 Well that's a userspace bug then - so let's drop that
 from commit log lest people think this fixes some
 kernel bugs. read/modify/write to the same register
 is at least an easy to grasp problem, creating
 an extra interface for the same function opens up
 the possibility that some userspace will do
 read/modify/write from one thread with irqcontrol
 from another thread, creating more races.
 
  The bigger issue is that DPDK needs to support multiple UIO
  interface types. And with current model there is no abstraction.
  The way to enable/disable IRQ is different depending on the UIO
  drivers.
 
 OK compatibility with other devices might be useful, but what are the
 other UIO drivers DPDK supports? I only found support for igb_uio so
 far, and that doesn't seem to be upstream.
 

Currently, DPDK supports:
  igb_uio, uio_pci_generic (as well as vfio)

There are additional drivers which have been submitted but not accepted,
for Xen and Hyper-V, both of which require special uio drivers.



Re: [PATCH V2 3/5] KVM: x86/vPMU: Create vPMU interface for VMX and SVM

2015-04-20 Thread Wei Huang

snip

+/* check if msr_idx is a valid index to access PMU */
+inline int kvm_pmu_check_msr_idx(struct kvm_vcpu *vcpu, unsigned msr_idx)


If we really want it inline, it's better done in header.
(I think GCC would inline this in-module anyway, but other modules still
  have to call it.)


+{
+   return kvm_pmu_ops->check_msr_idx(vcpu, msr_idx);
+}
+

| [...]

+bool kvm_pmu_is_msr(struct kvm_vcpu *vcpu, u32 msr)


(Might make sense to inline these trivial wrappers.)

Hi Radim,

I forgot to mention that I didn't make these functions inline in V3.
For an inline to work across source files, I would have to explicitly
use extern, so I decided not to touch it in V3 yet. If you insist, I
will change it.


Another solution is to replace the functions with direct
kvm_pmu_ops->blah_blah() calls, but this looks less appealing to me.
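
One common way to get such wrappers inlined in every caller is to define them as static inline in a shared header. A minimal single-file sketch of that layout, assuming hypothetical names throughout (vcpu_sketch, pmu_ops_sketch, and the demo callback are not from the patch):

```c
#include <stdbool.h>
#include <stdint.h>

/* --- would live in a shared header, e.g. a hypothetical pmu.h --- */
struct vcpu_sketch { int id; };

struct pmu_ops_sketch {
    bool (*is_pmu_msr)(struct vcpu_sketch *vcpu, uint32_t msr);
};

extern struct pmu_ops_sketch *pmu_ops_sketch;

/* static inline: each file including the header gets its own inlinable copy. */
static inline bool pmu_is_msr_sketch(struct vcpu_sketch *vcpu, uint32_t msr)
{
    return pmu_ops_sketch->is_pmu_msr(vcpu, msr);
}

/* --- would live in exactly one .c file --- */
static bool demo_is_pmu_msr(struct vcpu_sketch *vcpu, uint32_t msr)
{
    (void)vcpu;
    return msr == 0xc1;   /* arbitrary demo value */
}

static struct pmu_ops_sketch demo_ops = { .is_pmu_msr = demo_is_pmu_msr };
struct pmu_ops_sketch *pmu_ops_sketch = &demo_ops;
```

Only the ops pointer needs an extern declaration here; the wrapper itself has internal linkage in every file that includes the header.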


Thanks,
-Wei




+{
+   return kvm_pmu_ops->is_pmu_msr(vcpu, msr);
+}
+
+int kvm_pmu_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *data)
+{
+   return kvm_pmu_ops->get_msr(vcpu, msr, data);
+}
+
+int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
+{
+   return kvm_pmu_ops->set_msr(vcpu, msr_info);
+   



snip


Re: [RFC][PATCH] KVM: SVM: Sync g_pat with guest-written PAT value

2015-04-20 Thread Radim Krčmář
2015-04-20 19:45+0200, Jan Kiszka:
 On 2015-04-20 19:37, Jan Kiszka wrote:
  On 2015-04-20 19:33, Radim Krčmář wrote:
  2015-04-20 19:21+0200, Jan Kiszka:
  On 2015-04-20 19:16, Radim Krčmář wrote:
  2015-04-20 18:14+0200, Radim Krčmář:
  Tested-by: Radim Krčmář rkrc...@redhat.com
 
  Uncached accesses were roughly 20x slower.
  In case anyone wanted to reproduce, I used this as a kvm-unit-test:
 
  ---
  | [code]
 
  Great, thanks. Will you push it to the unit tests? Could raise
  motivations to fix the !NPT/EPT case.
 
  It can't be included in `run_tests.sh`, because we intentionally ignore
  PAT for normal RAM on VMX and the test does fail ...
  
  That ignoring is encoded into the EPT?

Yes, it's the VMX_EPT_IPAT_BIT.

 And do you also know why is it ignored on Intel? Side effects on the host?

I think it is an optimization exclusive to Intel.
We know that the other side is not real hardware (which could bypass CPU
caches when accessing memory), so there is little reason to slow the
guest down.
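
As a rough illustration of how that choice is encoded: a leaf EPT entry carries the memory type in bits 5:3 and the "ignore PAT" flag in bit 6. The sketch below is a simplified model, not the kernel's actual function; ept_memtype_bits() and the macro names are hypothetical:

```c
#include <stdint.h>

#define EPT_MT_SHIFT   3             /* EPT memory type lives in bits 5:3 */
#define EPT_IPAT_BIT   (1ull << 6)   /* "ignore PAT" flag */
#define MT_UC          0ull
#define MT_WB          6ull

/* Memory-type bits for a leaf EPT entry in this simplified model. */
static uint64_t ept_memtype_bits(int is_mmio)
{
    if (is_mmio)
        return MT_UC << EPT_MT_SHIFT;                /* uncached, guest PAT still applies */
    return (MT_WB << EPT_MT_SHIFT) | EPT_IPAT_BIT;   /* force WB, ignore guest PAT */
}
```

With IPAT set on normal RAM, the guest's PAT write has no effect on the effective memory type, which is why the timing test fails on VMX.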

 Hmm... Maybe we can create a
  ivshmem device and use that as test target.

Good idea, thanks.
(I haven't used it yet, so parts of it might be enough to do what is needed
 without creating a dependency on the whole ivshmem system.)


Re: x2apic issues with Solaris and Xen guests

2015-04-20 Thread Bandan Das
Michael Tokarev m...@tls.msk.ru writes:

 20.04.2015 20:29, Jan Kiszka wrote:
 On 2015-04-20 19:07, Stefan Hajnoczi wrote:
 I wonder whether the following two x2apic issues are related:

 Solaris 10 U11 network doesn't work
 https://bugzilla.redhat.com/show_bug.cgi?id=1040500

 kvm - fails to setup timer interrupt via io-apic
 (Thanks to Michael Tokarev for posting this link)
 https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=528077#68
 []
 Has anyone looked into this?
 
 Not yet. Is there a handy reproduction guest image? Or maybe someone
 would like to start with tracing what the guest and the host do.

 The second link gives a trivial reproducer, you need just the
 Xen hypervisor binary and some kernel.  This should be easy
 too, because it happens right on boot.  But I guess it requires
 some inner knowledge of Xen early boot machinery.

Have you tried Radim's patch?

commit c806a6ad35bfa6c92249cd0ca4772d5ac3f8cb68
Author: Radim Krčmář rkrc...@redhat.com
Date:   Wed Mar 18 19:38:22 2015 +0100

KVM: x86: call irq notifiers with directed EOI


 Thanks,

 /mjt



Re: [RFC][PATCH] KVM: SVM: Sync g_pat with guest-written PAT value

2015-04-20 Thread Jan Kiszka
On 2015-04-20 20:33, Radim Krčmář wrote:
 2015-04-20 19:45+0200, Jan Kiszka:
 On 2015-04-20 19:37, Jan Kiszka wrote:
 On 2015-04-20 19:33, Radim Krčmář wrote:
 2015-04-20 19:21+0200, Jan Kiszka:
 On 2015-04-20 19:16, Radim Krčmář wrote:
 2015-04-20 18:14+0200, Radim Krčmář:
 Tested-by: Radim Krčmář rkrc...@redhat.com

 Uncached accesses were roughly 20x slower.
 In case anyone wanted to reproduce, I used this as a kvm-unit-test:

 ---
 | [code]

 Great, thanks. Will you push it to the unit tests? Could raise
 motivations to fix the !NPT/EPT case.

 It can't be included in `run_tests.sh`, because we intentionally ignore
 PAT for normal RAM on VMX and the test does fail ...

 That ignoring is encoded into the EPT?
 
 Yes, it's the VMX_EPT_IPAT_BIT.
 
 And do you also know why is it ignored on Intel? Side effects on the host?
 
 I think it is an optimization exclusive to Intel.
 We know that the other side is not real hardware, which could avoid CPU
 caches when accessing memory, so there is little reason to slow the
 guest down.

If the guest pushes data for DMA into RAM, it may assume that it lands
there directly, without the need for explicit flushes, because it has
caching disabled - no?

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux


Re: [PATCH v2 2/4] target-arm: kvm - implement software breakpoints

2015-04-20 Thread Peter Maydell
On 31 March 2015 at 16:40, Alex Bennée alex.ben...@linaro.org wrote:
 These don't involve messing around with debug registers, just setting
 the breakpoint instruction in memory. GDB will not use this mechanism if
 it can't access the memory to write the breakpoint.

 All the kernel has to do is ensure the hypervisor traps the breakpoint
 exceptions and returns to userspace.

 Signed-off-by: Alex Bennée alex.ben...@linaro.org

 --
 v2
   - handle debug exit with new hsr exception info
   - add verbosity to UNIMP message

 diff --git a/target-arm/kvm.c b/target-arm/kvm.c
 index 72c1fa1..290c1fe 100644
 --- a/target-arm/kvm.c
 +++ b/target-arm/kvm.c
 @@ -25,6 +25,7 @@
  #include "hw/arm/arm.h"

  const KVMCapabilityInfo kvm_arch_required_capabilities[] = {
 +    KVM_CAP_INFO(SET_GUEST_DEBUG),
      KVM_CAP_LAST_INFO
  };

Doesn't this mean we'll suddenly stop working on older
kernels that didn't implement this capability? It would be
nicer to merely disable the breakpoint/debug support, rather
than refuse to run at all...

 @@ -466,9 +467,57 @@ void kvm_arch_post_run(CPUState *cs, struct kvm_run *run)
  {
  }

 +/* See ARM ARM D7.2.27 ESR_ELx, Exception Syndrome Register

You should probably say 'v8 ARM ARM'.

 +**
 +** To minimise translating between kernel and user-space the kernel
 +** ABI just provides user-space with the full exception syndrome
 +** register value to be decoded in QEMU.

What's with the weird '**' comment format?

 +*/
 +
 +#define HSR_EC_SHIFT            26
 +#define HSR_EC_SW_BKPT          0x3c
 +
 +static int kvm_handle_debug(CPUState *cs, struct kvm_run *run)
 +{
 +    struct kvm_debug_exit_arch *arch_info = &run->debug.arch;
 +    int hsr_ec = arch_info->hsr >> HSR_EC_SHIFT;
 +
 +    switch (hsr_ec) {
 +    case HSR_EC_SW_BKPT:
 +        if (kvm_find_sw_breakpoint(cs, arch_info->pc)) {
 +            return true;
 +        }
 +        break;
 +    default:
 +        error_report("%s: unhandled debug exit (%x, %llx)\n",
 +                     __func__, arch_info->hsr, arch_info->pc);

Is this intended to be a "can't happen" case, or is it something
a guest can trigger?

 +    }
 +
 +    /* If we don't handle this it could be it really is for the
 +       guest to handle */

(suboptimal multiline comment format)

 +    qemu_log_mask(LOG_UNIMP,
 +                  "%s: re-injecting exception not yet implemented (0x%x, %llx)\n",
 +                  __func__, hsr_ec, arch_info->pc);

When does this happen? Guest userspace program hits a hardcoded
breakpoint insn while we're trying to debug the VM?

If we just return to the guest in that situation will we
try to re-execute the bkpt insn and loop round taking
exceptions forever?

 +
 +    return false;
 +}
 +
  int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
  {
 -    return 0;
 +    int ret = 0;
 +
 +    switch (run->exit_reason) {
 +    case KVM_EXIT_DEBUG:
 +        if (kvm_handle_debug(cs, run)) {
 +            ret = EXCP_DEBUG;
 +        } /* otherwise return to guest */
 +        break;
 +    default:
 +        qemu_log_mask(LOG_UNIMP, "%s: un-handled exit reason %d\n",
 +                      __func__, run->exit_reason);
 +        break;
 +    }
 +    return ret;
  }

  bool kvm_arch_stop_on_emulation_error(CPUState *cs)
 @@ -493,14 +542,33 @@ int kvm_arch_on_sigbus(int code, void *addr)

  void kvm_arch_update_guest_debug(CPUState *cs, struct kvm_guest_debug *dbg)
  {
 -    qemu_log_mask(LOG_UNIMP, "%s: not implemented\n", __func__);
 +    if (kvm_sw_breakpoints_active(cs)) {
 +        dbg->control |= KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_USE_SW_BP;
 +    }
  }

 -int kvm_arch_insert_sw_breakpoint(CPUState *cs,
 -                                  struct kvm_sw_breakpoint *bp)
 +/* C6.6.29 BRK instruction */

This comment would be better placed next to the magic number it's
explaining.

 +int kvm_arch_insert_sw_breakpoint(CPUState *cs, struct kvm_sw_breakpoint *bp)
  {
 -    qemu_log_mask(LOG_UNIMP, "%s: not implemented\n", __func__);
 -    return -EINVAL;
 +    static const uint32_t brk = 0xd4200000;
 +
 +    if (cpu_memory_rw_debug(cs, bp->pc, (uint8_t *)&bp->saved_insn, 4, 0) ||
 +        cpu_memory_rw_debug(cs, bp->pc, (uint8_t *)&brk, 4, 1)) {

Does this work correctly for big-endian hosts?
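
To make the concern concrete: A64 instructions are stored little-endian in memory, so copying a host-endian uint32_t byte by byte would write the bytes in the wrong order on a big-endian host. A minimal sketch of a host-independent encoding, assuming BRK #0 (0xd4200000) as the breakpoint instruction; AARCH64_BRK_INSN and store_le32() are hypothetical names, not existing QEMU helpers:

```c
#include <stdint.h>

#define AARCH64_BRK_INSN 0xd4200000u   /* BRK #0 */

/* Store a 32-bit value as little-endian bytes, independent of host byte order. */
static void store_le32(uint8_t out[4], uint32_t val)
{
    out[0] = (uint8_t)(val & 0xff);
    out[1] = (uint8_t)((val >> 8) & 0xff);
    out[2] = (uint8_t)((val >> 16) & 0xff);
    out[3] = (uint8_t)((val >> 24) & 0xff);
}
```

The resulting four-byte buffer could then be handed to the memory-write call unchanged on any host, and the named constant also avoids repeating the magic number.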

 +        return -EINVAL;
 +    }
 +    return 0;
 +}
 +
 +int kvm_arch_remove_sw_breakpoint(CPUState *cs, struct kvm_sw_breakpoint *bp)
 +{
 +    static uint32_t brk;
 +
 +    if (cpu_memory_rw_debug(cs, bp->pc, (uint8_t *)&brk, 4, 0) ||
 +        brk != 0xd4200000 ||

If you're going to use this magic number twice please name it...

 +        cpu_memory_rw_debug(cs, bp->pc, (uint8_t *)&bp->saved_insn, 4, 1)) {
 +        return -EINVAL;
 +    }
 +    return 0;
  }

  int kvm_arch_insert_hw_breakpoint(target_ulong addr,
 @@ -517,12 +585,6 @@ int kvm_arch_remove_hw_breakpoint(target_ulong addr,
  return -EINVAL;
  }

 -int kvm_arch_remove_sw_breakpoint(CPUState *cs,
 -  struct kvm_sw_breakpoint *bp)
 -{
 -

Re: [PATCH v2 3/4] target-arm: kvm - support for single step

2015-04-20 Thread Peter Maydell
On 31 March 2015 at 16:40, Alex Bennée alex.ben...@linaro.org wrote:
 This adds support for single-step. There isn't much to do on the QEMU
 side as after we set-up the request for single step via the debug ioctl
 it is all handled within the kernel.

 Signed-off-by: Alex Bennée alex.ben...@linaro.org

 ---
 v2
   - convert to using HSR_EC

 diff --git a/target-arm/kvm.c b/target-arm/kvm.c
 index 290c1fe..ae0f8b2 100644
 --- a/target-arm/kvm.c
 +++ b/target-arm/kvm.c
 @@ -475,6 +475,7 @@ void kvm_arch_post_run(CPUState *cs, struct kvm_run *run)
  */

  #define HSR_EC_SHIFT            26
 +#define HSR_EC_SOFT_STEP        0x32
  #define HSR_EC_SW_BKPT          0x3c

We already include internals.h in this file, so can you just use
the EC_* constants and ARM_EL_EC_SHIFT rather than defining
new ones? (Applies for patch 1 as well.)
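
For reference, the exception class occupies bits [31:26] of the syndrome value, so extracting it is a shift and a mask. A minimal standalone sketch; the macro and function names are local to this example and are not the constants from internals.h:

```c
#include <stdint.h>

#define SYN_EC_SHIFT 26
#define SYN_EC_MASK  0x3fu

#define EC_SOFT_STEP_LOWER_EL 0x32   /* software step from a lower exception level */
#define EC_BRK_AARCH64        0x3c   /* BRK instruction executed in AArch64 state */

/* Extract the 6-bit exception class from a 32-bit syndrome value. */
static unsigned syndrome_ec(uint32_t syndrome)
{
    return (syndrome >> SYN_EC_SHIFT) & SYN_EC_MASK;
}
```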

  static int kvm_handle_debug(CPUState *cs, struct kvm_run *run)
 @@ -483,6 +484,13 @@ static int kvm_handle_debug(CPUState *cs, struct kvm_run *run)
      int hsr_ec = arch_info->hsr >> HSR_EC_SHIFT;

      switch (hsr_ec) {
 +    case HSR_EC_SOFT_STEP:
 +        if (cs->singlestep_enabled) {
 +            return true;
 +        } else {
 +            error_report("Came out of SINGLE STEP when not enabled");

This can only happen if there's a kernel bug, right?

 +        }
 +        break;
      case HSR_EC_SW_BKPT:
          if (kvm_find_sw_breakpoint(cs, arch_info->pc)) {
              return true;
 @@ -542,6 +550,9 @@ int kvm_arch_on_sigbus(int code, void *addr)

  void kvm_arch_update_guest_debug(CPUState *cs, struct kvm_guest_debug *dbg)
  {
 +    if (cs->singlestep_enabled) {
 +        dbg->control |= KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_SINGLESTEP;
 +    }
      if (kvm_sw_breakpoints_active(cs)) {
          dbg->control |= KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_USE_SW_BP;
      }
 --
 2.3.4



thanks
-- PMM