Re: [PATCH net-next RFC 1/3] virtio: support for urgent descriptors

2014-10-15 Thread Rusty Russell
Jason Wang jasow...@redhat.com writes:
 Below should be useful for some experiments Jason is doing.
 I thought I'd send it out for early review/feedback.

 The event idx feature allows us to defer interrupts until
 a specific # of descriptors have been used.
 Sometimes it might be useful to get an interrupt after
 a specific descriptor, regardless.
 This adds a descriptor flag for this, and an API
 to create an urgent output descriptor.
 This is still an RFC:
 we'll need a feature bit for drivers to detect this,
 but we've run out of feature bits for virtio 0.X.
 For experimentation purposes, drivers can assume
 this is set, or add a driver-specific feature bit.

 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 Signed-off-by: Jason Wang jasow...@redhat.com

The new VRING_DESC_F_URGENT bit is theoretically nicer, but for
networking (which tends to take packets in order) couldn't we just set
the event counter to give us a tx interrupt at the packet we want?

Cheers,
Rusty.
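
For reference, a minimal sketch of that alternative, using the standard event-idx layout from include/uapi/linux/virtio_ring.h; the helper name and its placement are assumptions, not something in the posted patch:

#include <linux/virtio_ring.h>

/* With VIRTIO_RING_F_EVENT_IDX negotiated, the driver asks for a used-buffer
 * interrupt at a specific point by writing that index into the used_event
 * field (the trailing u16 of the avail ring); the device checks it with
 * vring_need_event() before signalling. */
static void request_tx_irq_at(struct vring *vr, u16 used_idx_target)
{
	vring_used_event(vr) = used_idx_target;
}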


Re: [PATCH v4 03/25] virtio-pci: move freeze/restore to virtio core

2014-10-15 Thread Paul Bolle
On Mon, 2014-10-13 at 10:48 +0300, Michael S. Tsirkin wrote:
 This is in preparation for extending config changed event handling
 in the core.
 Wrapping these in an API also seems to make for cleaner code.
 
 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 Reviewed-by: Cornelia Huck cornelia.h...@de.ibm.com

This patch landed in today's linux-next (next-20141015).

  include/linux/virtio.h  |  6 +
  drivers/virtio/virtio.c | 54 +
  drivers/virtio/virtio_pci.c | 54 ++---
  3 files changed, 62 insertions(+), 52 deletions(-)
 
 diff --git a/include/linux/virtio.h b/include/linux/virtio.h
 index 3c19bd3..8df7ba8 100644
 --- a/include/linux/virtio.h
 +++ b/include/linux/virtio.h
 @@ -78,6 +78,7 @@ bool virtqueue_is_broken(struct virtqueue *vq);
  /**
   * virtio_device - representation of a device using virtio
   * @index: unique position on the virtio bus
 + * @failed: saved value for CONFIG_S_FAILED bit (for restore)

s/CONFIG_S_FAILED/VIRTIO_CONFIG_S_FAILED/?

   * @dev: underlying device.
   * @id: the device type identification (used to match it with a driver).
   * @config: the configuration ops for this device.


Paul Bolle



Re: [PATCH net-next RFC 0/3] virtio-net: Conditionally enable tx interrupt

2014-10-15 Thread Jason Wang
On 10/15/2014 07:06 AM, Michael S. Tsirkin wrote:
 On Tue, Oct 14, 2014 at 02:53:27PM -0400, David Miller wrote:
  From: Jason Wang jasow...@redhat.com
  Date: Sat, 11 Oct 2014 15:16:43 +0800
  
   We currently free old transmitted packets in ndo_start_xmit(), so any
   packet must also be orphaned there. This was done to reduce the overhead
   of tx interrupts and achieve better performance. But it may not work for
   some protocols such as TCP streams: TCP depends on the value of
   sk_wmem_alloc to implement various optimizations for streams of small
   packets, such as TCP small queues and auto corking. Orphaning packets
   early in ndo_start_xmit() more or less disables such things, since
   sk_wmem_alloc is no longer accurate. This leads to extra low throughput
   for TCP streams of small writes.

   This series tries to solve the issue by enabling tx interrupts for all
   TCP packets other than the ones with the push bit set or pure ACKs. This
   is done through the support of urgent descriptors, which can force an
   interrupt for a specified packet. If the tx interrupt is enabled for a
   packet, there's no need to orphan it in ndo_start_xmit(); we can free it
   in tx napi, which is scheduled by the tx interrupt. Then sk_wmem_alloc
   is more accurate than before and TCP can batch more for small writes.
   Larger skbs are produced by TCP in this case, improving both throughput
   and cpu utilization.

   Tests show great improvements on small-write TCP streams. For most of
   the other cases, throughput and cpu utilization are the same as before.
   Only in a few cases was more cpu utilization noticed, which needs more
   investigation.

   Review and comments are welcome.
  
  I think proper accounting and queueing (at all levels, not just TCP
  sockets) is more important than trying to skim a bunch of cycles by
  avoiding TX interrupts.
  
  Having an event to free the SKB is absolutely essential for the stack
  to operate correctly.
  
  And with virtio-net you don't even have the excuse of "the HW
  unfortunately doesn't have an appropriate TX event".
  
  So please don't play games, and instead use TX interrupts all the
  time.  You can mitigate them in various ways, but don't turn them on
  selectively based upon traffic type, that's terrible.
  
  You can even use ->xmit_more to defer the TX interrupt indication to
  the final TX packet in the chain.
 I guess we can just defer the kick; the interrupt will naturally be
 deferred as well.
 This should solve the problem for old hosts as well.

Interrupts were delayed but not reduced; to support this we need to publish
the avail idx as the used event. This should reduce tx interrupts in the case
of bulk dequeuing.

I will draft a new RFC series containing this.
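
As a rough illustration of the deferred kick being discussed (not the posted patches; the helper is mine and all error handling is omitted), the idea is to notify the host only on the last skb of an xmit_more batch, which on an event-idx host defers the tx interrupt along with it:

#include <linux/skbuff.h>
#include <linux/virtio.h>

/* Kick once per batch rather than once per packet; the host-side interrupt
 * follows the kick, so it is deferred along with it. */
static void kick_if_batch_done(struct virtqueue *vq, const struct sk_buff *skb)
{
	if (!skb->xmit_more)
		virtqueue_kick(vq);
}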

 We'll also need to implement bql for this.
 Something like the below?
 Completely untested - posting here to see if I figured the
 API out correctly. Has to be applied on top of the previous patch.

Looks so. I believe it's better to have, but not a must.
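
For context, byte queue limits (BQL) come down to two accounting calls; a minimal sketch of where they would sit, with the queue lookup and wrapper names being mine rather than the posted patch:

#include <linux/netdevice.h>

/* After queueing an skb of @bytes on tx queue @qnum: */
static void bql_account_sent(struct net_device *dev, int qnum, unsigned int bytes)
{
	netdev_tx_sent_queue(netdev_get_tx_queue(dev, qnum), bytes);
}

/* In the tx completion (napi) path, after freeing @pkts skbs totalling @bytes: */
static void bql_account_completed(struct net_device *dev, int qnum,
				  unsigned int pkts, unsigned int bytes)
{
	netdev_tx_completed_queue(netdev_get_tx_queue(dev, qnum), pkts, bytes);
}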


Re: [patch 3/4] KVM: MMU: reload request from GET_DIRTY_LOG path

2014-10-15 Thread Gleb Natapov
On Mon, Oct 13, 2014 at 05:52:38AM -0300, Marcelo Tosatti wrote:
 On Fri, Oct 10, 2014 at 04:09:29PM +0300, Gleb Natapov wrote:
  On Wed, Oct 08, 2014 at 04:22:31PM -0300, Marcelo Tosatti wrote:
 
 Argh, let's try again:
 
 skip_pinned = true
 --
 
 mark page dirty, keep spte intact
 
 called from get dirty log path.
 
 skip_pinned = false
 ---
 reload remote mmu
 destroy pinned spte.
 
 called from: dirty log enablement, rmap write protect (unused for pinned
 sptes)
 
 
 Note this behaviour is your suggestion:

Yes, I remember that, and I thought we would not need this skip_pinned
at all. For the rmap write protect case there shouldn't be any pinned
pages, but why does dirty log enablement set skip_pinned to false? Why not
mark pinned pages as dirty just like you do in the get dirty log path?
   
   Because if it's a large spte, it must be nuked (or marked read-only,
   which, for pinned sptes, is not possible).
   
  If a large page has one small page pinned inside it, its spte will
  be marked as pinned, correct?
 
 Correct.
 
  We did nuke large ptes here until very
  recently (c126d94f2c90ed9d), but we cannot drop a pte here anyway without
  kicking all vcpus out of guest mode. Do you really need the additional
  skip_pinned parameter, though? Why not check if the spte is large instead?
 
 Nuke only if large spte is found? Can do that, instead.
 
  So why not have per slot pinned page list (Xiao suggested the same) and do:
 
 The interface is per-vcpu (that is registration of pinned pages is
 performed on a per-vcpu basis).
 
PEBS is per-cpu, but that does not mean that pinning should be per-cpu; it
can be done globally with ref counting.

  spte_write_protect() {
    if (is_pinned(spte)) {
      if (large(spte))
        // cannot drop while vcpus are running
        mmu_reload_pinned_vcpus();
      else
        return false;
    }
  }


  get_dirty_log() {
    for_each(pinned page i)
      mark_dirty(i);
  }
 
 That is effectively the same as what this patchset does, except that the
 spte pinned bit is checked at spte_write_protect instead of looping over a
 pinned page list. I fail to see a huge advantage there.

I think spte_write_protect is a strange place to mark pages dirty, but
otherwise yes the effect is the same, so definitely not a huge difference.
If global pinned list is a PITA in your opinion leave it as is.
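
For illustration, a globally ref-counted pinned-page entry of the kind being discussed could look roughly like this; the structure and field names are hypothetical, not something from the posted patchset:

#include <linux/list.h>
#include <linux/kvm_types.h>

struct kvm_pinned_page {
	struct list_head link;		/* on a per-memslot (or per-VM) list */
	gfn_t gfn;			/* guest frame that must stay writable */
	int refcount;			/* number of vcpus that registered it */
};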

 I'll drop the skip_pinned parameter and use is_large_pte check instead.
 
Thanks,

--
Gleb.


Re: [PATCH v11 1/6] KVM: Add architecture-specific TLB flush implementations

2014-10-15 Thread Alexander Graf


On 23.09.14 02:54, Mario Smarduch wrote:
 Add support to declare an architecture-specific TLB flush function, for now
 on ARMv7.
 
 Signed-off-by: Mario Smarduch m.smard...@samsung.com
 ---
  include/linux/kvm_host.h |1 +
  virt/kvm/Kconfig |3 +++
  virt/kvm/kvm_main.c  |4 
  3 files changed, 8 insertions(+)
 
 diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
 index ec4e3bd..a49a6df 100644
 --- a/include/linux/kvm_host.h
 +++ b/include/linux/kvm_host.h
 @@ -592,6 +592,7 @@ void kvm_load_guest_fpu(struct kvm_vcpu *vcpu);
  void kvm_put_guest_fpu(struct kvm_vcpu *vcpu);
  
  void kvm_flush_remote_tlbs(struct kvm *kvm);
 +void kvm_arch_flush_remote_tlbs(struct kvm *kvm);
  void kvm_reload_remote_mmus(struct kvm *kvm);
  void kvm_make_mclock_inprogress_request(struct kvm *kvm);
  void kvm_make_scan_ioapic_request(struct kvm *kvm);
 diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
 index 13f2d19..f1efaa5 100644
 --- a/virt/kvm/Kconfig
 +++ b/virt/kvm/Kconfig
 @@ -34,3 +34,6 @@ config HAVE_KVM_CPU_RELAX_INTERCEPT
  
  config KVM_VFIO
 bool
 +
 +config HAVE_KVM_ARCH_TLB_FLUSH_ALL
 +   bool
 diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
 index 4b6c01b..d0a24f5 100644
 --- a/virt/kvm/kvm_main.c
 +++ b/virt/kvm/kvm_main.c
 @@ -186,12 +186,16 @@ static bool make_all_cpus_request(struct kvm *kvm, 
 unsigned int req)
  
  void kvm_flush_remote_tlbs(struct kvm *kvm)
  {
 +#ifdef CONFIG_HAVE_KVM_ARCH_TLB_FLUSH_ALL

Please make this an #ifndef on the outer definition and just define your
own copy of this function inside of arch/arm/kvm :).


Alex
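
Concretely, the suggestion amounts to something like the sketch below; the file placement and the arm-side body are my guesses at what Alex means, not part of the series:

/* virt/kvm/kvm_main.c: keep the generic version only where no arch override exists */
#ifndef CONFIG_HAVE_KVM_ARCH_TLB_FLUSH_ALL
void kvm_flush_remote_tlbs(struct kvm *kvm)
{
	long dirty_count = kvm->tlbs_dirty;

	smp_mb();
	if (make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH))
		++kvm->stat.remote_tlb_flush;
	cmpxchg(&kvm->tlbs_dirty, dirty_count, 0);
}
EXPORT_SYMBOL_GPL(kvm_flush_remote_tlbs);
#endif

/* arch/arm/kvm/mmu.c (or similar): the architecture supplies its own copy */
void kvm_flush_remote_tlbs(struct kvm *kvm)
{
	/* arch-specific TLB invalidation for all vcpus goes here */
}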

 + kvm_arch_flush_remote_tlbs(kvm);
 +#else
   long dirty_count = kvm->tlbs_dirty;
  
   smp_mb();
   if (make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH))
   ++kvm->stat.remote_tlb_flush;
   cmpxchg(&kvm->tlbs_dirty, dirty_count, 0);
 +#endif
  }
  EXPORT_SYMBOL_GPL(kvm_flush_remote_tlbs);
  
 


Re: [PATCH v11 2/6] KVM: Add generic implementation of kvm_vm_ioctl_get_dirty_log

2014-10-15 Thread Alexander Graf


On 23.09.14 02:54, Mario Smarduch wrote:
 Add support for a generic implementation of the dirty log read function. For
 now x86_64 and ARMv7 share the generic dirty log read; other architectures
 call their architecture-specific functions.
 
 Signed-off-by: Mario Smarduch m.smard...@samsung.com
 ---
  arch/arm/kvm/Kconfig  |1 +
  arch/arm/kvm/arm.c|2 +-
  arch/arm64/kvm/Kconfig|1 +
  arch/ia64/kvm/Kconfig |1 +
  arch/ia64/kvm/kvm-ia64.c  |2 +-
  arch/mips/kvm/Kconfig |1 +
  arch/mips/kvm/mips.c  |2 +-
  arch/powerpc/kvm/Kconfig  |1 +
  arch/powerpc/kvm/book3s.c |2 +-
  arch/powerpc/kvm/booke.c  |2 +-
  arch/s390/kvm/Kconfig |1 +
  arch/s390/kvm/kvm-s390.c  |2 +-
  arch/x86/kvm/x86.c|   86 --
  include/linux/kvm_host.h  |6 +++
  virt/kvm/Kconfig  |3 ++
  virt/kvm/kvm_main.c   |   91 +
  16 files changed, 112 insertions(+), 92 deletions(-)
 
 diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
 index 4be5bb1..cd9bb1c 100644
 --- a/arch/arm/kvm/Kconfig
 +++ b/arch/arm/kvm/Kconfig
 @@ -23,6 +23,7 @@ config KVM
   select HAVE_KVM_CPU_RELAX_INTERCEPT
   select KVM_MMIO
   select KVM_ARM_HOST
 + select HAVE_KVM_ARCH_DIRTY_LOG

I think we're better off treating the common dirty log option as the
odd case. So instead of adding this option for every arch, just set it
on x86 and later on arm. Then you can ...
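
To make that concrete (the config symbol name below is hypothetical): only the architectures sharing the generic implementation would select an option, and the generic copy would be compiled under it, leaving every other architecture's kvm_vm_ioctl_get_dirty_log() untouched:

/* virt/kvm/kvm_main.c, sketch of the inverted convention */
#ifdef CONFIG_HAVE_KVM_GENERIC_DIRTY_LOG	/* selected only by x86, later by arm */
int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
{
	/* the shared x86/ARM implementation goes here */
	return 0;
}
#endif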

   depends on ARM_VIRT_EXT && ARM_LPAE && !CPU_BIG_ENDIAN
   ---help---
 Support hosting virtualized guest machines. You will also
 diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
 index 3c82b37..c52b2bd 100644
 --- a/arch/arm/kvm/arm.c
 +++ b/arch/arm/kvm/arm.c
 @@ -774,7 +774,7 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
   }
  }
  
 -int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
 +int kvm_arch_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log 
 *log)
  {
   return -EINVAL;
  }
 diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
 index 8ba85e9..40a8d19 100644
 --- a/arch/arm64/kvm/Kconfig
 +++ b/arch/arm64/kvm/Kconfig
 @@ -26,6 +26,7 @@ config KVM
   select KVM_ARM_HOST
   select KVM_ARM_VGIC
   select KVM_ARM_TIMER
 + select HAVE_KVM_ARCH_DIRTY_LOG
   ---help---
 Support hosting virtualized guest machines.
  
 diff --git a/arch/ia64/kvm/Kconfig b/arch/ia64/kvm/Kconfig
 index 990b864..217f10a 100644
 --- a/arch/ia64/kvm/Kconfig
 +++ b/arch/ia64/kvm/Kconfig
 @@ -28,6 +28,7 @@ config KVM
   select HAVE_KVM_IRQ_ROUTING
   select KVM_APIC_ARCHITECTURE
   select KVM_MMIO
 + select HAVE_KVM_ARCH_DIRTY_LOG
   ---help---
 Support hosting fully virtualized guest machines using hardware
 virtualization extensions.  You will need a fairly recent
 diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
 index 6a4309b..3166df5 100644
 --- a/arch/ia64/kvm/kvm-ia64.c
 +++ b/arch/ia64/kvm/kvm-ia64.c
 @@ -1812,7 +1812,7 @@ static void kvm_ia64_sync_dirty_log(struct kvm *kvm,
   spin_unlock(kvm-arch.dirty_log_lock);
  }
  
 -int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 +int kvm_arch_vm_ioctl_get_dirty_log(struct kvm *kvm,
   struct kvm_dirty_log *log)
  {
   int r;
 diff --git a/arch/mips/kvm/Kconfig b/arch/mips/kvm/Kconfig
 index 30e334e..b57f49e 100644
 --- a/arch/mips/kvm/Kconfig
 +++ b/arch/mips/kvm/Kconfig
 @@ -20,6 +20,7 @@ config KVM
   select PREEMPT_NOTIFIERS
   select ANON_INODES
   select KVM_MMIO
 + select HAVE_KVM_ARCH_DIRTY_LOG
   ---help---
 Support for hosting Guest kernels.
 Currently supported on MIPS32 processors.
 diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
 index d687c6e..885fdfe 100644
 --- a/arch/mips/kvm/mips.c
 +++ b/arch/mips/kvm/mips.c
 @@ -791,7 +791,7 @@ out:
  }
  
  /* Get (and clear) the dirty memory log for a memory slot. */
 -int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
 +int kvm_arch_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log 
 *log)
  {
   struct kvm_memory_slot *memslot;
   unsigned long ga, ga_end;
 diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
 index d6a53b9..4f28a82 100644
 --- a/arch/powerpc/kvm/Kconfig
 +++ b/arch/powerpc/kvm/Kconfig
 @@ -21,6 +21,7 @@ config KVM
   select PREEMPT_NOTIFIERS
   select ANON_INODES
   select HAVE_KVM_EVENTFD
 + select HAVE_KVM_ARCH_DIRTY_LOG
  
  config KVM_BOOK3S_HANDLER
   bool
 diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
 index c254c27..304faa1 100644
 --- a/arch/powerpc/kvm/book3s.c
 +++ b/arch/powerpc/kvm/book3s.c
 @@ -815,7 +815,7 @@ int kvmppc_core_check_requests(struct kvm_vcpu *vcpu)
   return vcpu-kvm-arch.kvm_ops-check_requests(vcpu);
  }
  
 -int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct 

[PATCH v2] target-ppc: kvm: Fix memory overflow issue about strncat()

2014-10-15 Thread Chen Gang
strncat() always appends an additional '\0' to the destination buffer, so
it needs 1 extra byte for it, or it may cause a memory overflow; other
areas within QEMU already account for this.

We can also use g_strdup_printf() instead of strncat(), which may be easier
to understand.
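
A standalone illustration of the off-by-one described above (not taken from the patch; the path and property strings here are made up):

#include <limits.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    char buf[PATH_MAX] = "/proc/device-tree/cpus/cpu@0";
    const char *propname = "clock-frequency";

    /* Buggy bound: strncat() may copy sizeof(buf) - strlen(buf) bytes and
     * then still append '\0', writing one byte past the end of buf when the
     * buffer is nearly full:
     *     strncat(buf, propname, sizeof(buf) - strlen(buf));
     * The safe bound leaves room for the terminator: */
    strncat(buf, "/", sizeof(buf) - strlen(buf) - 1);
    strncat(buf, propname, sizeof(buf) - strlen(buf) - 1);

    printf("%s\n", buf);
    return 0;
}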

Signed-off-by: Chen Gang gang.chen.5...@gmail.com
---
 target-ppc/kvm.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index 66e7ce5..cea6a87 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -1782,7 +1782,7 @@ static int kvmppc_find_cpu_dt(char *buf, int buf_len)
  * format) */
 static uint64_t kvmppc_read_int_cpu_dt(const char *propname)
 {
-char buf[PATH_MAX];
+char buf[PATH_MAX], *tmp;
 union {
 uint32_t v32;
 uint64_t v64;
@@ -1794,10 +1794,10 @@ static uint64_t kvmppc_read_int_cpu_dt(const char 
*propname)
 return -1;
 }
 
-strncat(buf, "/", sizeof(buf) - strlen(buf));
-strncat(buf, propname, sizeof(buf) - strlen(buf));
+tmp = g_strdup_printf("%s/%s", buf, propname);
 
-f = fopen(buf, "rb");
+f = fopen(tmp, "rb");
+g_free(tmp);
 if (!f) {
 return -1;
 }
-- 
1.9.3


Re: [PATCH v2] target-ppc: kvm: Fix memory overflow issue about strncat()

2014-10-15 Thread Alexander Graf


On 15.10.14 15:48, Chen Gang wrote:
 strncat() always appends an additional '\0' to the destination buffer, so
 it needs 1 extra byte for it, or it may cause a memory overflow; other
 areas within QEMU already account for this.
 
 We can also use g_strdup_printf() instead of strncat(), which may be easier
 to understand.
 
 Signed-off-by: Chen Gang gang.chen.5...@gmail.com

Thanks, applied to ppc-next. And sorry for not replying earlier.


Alex


Re: [PATCH v2 01/15] arm/arm64: KVM: rework MPIDR assignment and add accessors

2014-10-15 Thread Christoffer Dall
On Thu, Aug 21, 2014 at 02:06:42PM +0100, Andre Przywara wrote:
 The virtual MPIDR registers (containing topology information) for the
 guest are currently mapped linearly to the vcpu_id. Improve this
 mapping for arm64 by using three levels to not artificially limit the
 number of vCPUs. Also add an accessor to later allow easier access to
 a vCPU with a given MPIDR.
 Use this new accessor in the PSCI emulation.
 
 Signed-off-by: Andre Przywara andre.przyw...@arm.com
 ---
  arch/arm/include/asm/kvm_emulate.h   |2 +-
  arch/arm/include/asm/kvm_host.h  |2 ++
  arch/arm/kvm/arm.c   |   15 +++
  arch/arm/kvm/psci.c  |   15 ---
  arch/arm64/include/asm/kvm_emulate.h |3 ++-
  arch/arm64/include/asm/kvm_host.h|2 ++
  arch/arm64/kvm/sys_regs.c|   11 +--
  7 files changed, 35 insertions(+), 15 deletions(-)
 
 diff --git a/arch/arm/include/asm/kvm_emulate.h 
 b/arch/arm/include/asm/kvm_emulate.h
 index 69b7469..18e45f7 100644
 --- a/arch/arm/include/asm/kvm_emulate.h
 +++ b/arch/arm/include/asm/kvm_emulate.h
 @@ -159,7 +159,7 @@ static inline u32 kvm_vcpu_hvc_get_imm(struct kvm_vcpu 
 *vcpu)
  
  static inline unsigned long kvm_vcpu_get_mpidr(struct kvm_vcpu *vcpu)
  {
 - return vcpu->arch.cp15[c0_MPIDR];
 + return vcpu->arch.cp15[c0_MPIDR] & MPIDR_HWID_BITMASK;

why?

  }
  
  static inline void kvm_vcpu_set_be(struct kvm_vcpu *vcpu)
 diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
 index 6dfb404..cf99ad0 100644
 --- a/arch/arm/include/asm/kvm_host.h
 +++ b/arch/arm/include/asm/kvm_host.h
 @@ -233,4 +233,6 @@ static inline void vgic_arch_setup(const struct 
 vgic_params *vgic)
  int kvm_perf_init(void);
  int kvm_perf_teardown(void);
  
 +struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
 +
  #endif /* __ARM_KVM_HOST_H__ */
 diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
 index 7d5065e..addb990 100644
 --- a/arch/arm/kvm/arm.c
 +++ b/arch/arm/kvm/arm.c
 @@ -990,6 +990,21 @@ static void check_kvm_target_cpu(void *ret)
   *(int *)ret = kvm_target_cpu();
  }
  
 +struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr)
 +{
 + unsigned long c_mpidr;
 + struct kvm_vcpu *vcpu;
 + int i;
 +
 + mpidr &= MPIDR_HWID_BITMASK;
 + kvm_for_each_vcpu(i, vcpu, kvm) {
 + c_mpidr = kvm_vcpu_get_mpidr(vcpu);
 + if (c_mpidr == mpidr)
 + return vcpu;
 + }
 + return NULL;
 +}
 +
  /**
   * Initialize Hyp-mode and memory mappings on all CPUs.
   */
 diff --git a/arch/arm/kvm/psci.c b/arch/arm/kvm/psci.c
 index 09cf377..49f0992 100644
 --- a/arch/arm/kvm/psci.c
 +++ b/arch/arm/kvm/psci.c
 @@ -21,6 +21,7 @@
  #include asm/cputype.h
  #include asm/kvm_emulate.h
  #include asm/kvm_psci.h
 +#include asm/kvm_host.h
  
  /*
   * This is an implementation of the Power State Coordination Interface
 @@ -65,25 +66,17 @@ static void kvm_psci_vcpu_off(struct kvm_vcpu *vcpu)
  static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
  {
   struct kvm *kvm = source_vcpu-kvm;
 - struct kvm_vcpu *vcpu = NULL, *tmp;
 + struct kvm_vcpu *vcpu = NULL;
   wait_queue_head_t *wq;
   unsigned long cpu_id;
   unsigned long context_id;
 - unsigned long mpidr;
   phys_addr_t target_pc;
 - int i;
  
 - cpu_id = *vcpu_reg(source_vcpu, 1);
 + cpu_id = *vcpu_reg(source_vcpu, 1) & MPIDR_HWID_BITMASK;
   if (vcpu_mode_is_32bit(source_vcpu))
   cpu_id &= ~((u32) 0);
  
 - kvm_for_each_vcpu(i, tmp, kvm) {
 - mpidr = kvm_vcpu_get_mpidr(tmp);
 - if ((mpidr & MPIDR_HWID_BITMASK) == (cpu_id & MPIDR_HWID_BITMASK)) {
 - vcpu = tmp;
 - break;
 - }
 - }
 + vcpu = kvm_mpidr_to_vcpu(kvm, cpu_id);
  
   /*
* Make sure the caller requested a valid CPU and that the CPU is
 diff --git a/arch/arm64/include/asm/kvm_emulate.h 
 b/arch/arm64/include/asm/kvm_emulate.h
 index fdc3e21..4b8d636 100644
 --- a/arch/arm64/include/asm/kvm_emulate.h
 +++ b/arch/arm64/include/asm/kvm_emulate.h
 @@ -27,6 +27,7 @@
  #include asm/kvm_arm.h
  #include asm/kvm_mmio.h
  #include asm/ptrace.h
 +#include asm/cputype.h
  
  unsigned long *vcpu_reg32(const struct kvm_vcpu *vcpu, u8 reg_num);
  unsigned long *vcpu_spsr32(const struct kvm_vcpu *vcpu);
 @@ -179,7 +180,7 @@ static inline u8 kvm_vcpu_trap_get_fault(const struct 
 kvm_vcpu *vcpu)
  
  static inline unsigned long kvm_vcpu_get_mpidr(struct kvm_vcpu *vcpu)
  {
 - return vcpu_sys_reg(vcpu, MPIDR_EL1);
 + return vcpu_sys_reg(vcpu, MPIDR_EL1) & MPIDR_HWID_BITMASK;

why?

  }
  
  static inline void kvm_vcpu_set_be(struct kvm_vcpu *vcpu)
 diff --git a/arch/arm64/include/asm/kvm_host.h 
 b/arch/arm64/include/asm/kvm_host.h
 index e10c45a..017fbae 100644
 --- a/arch/arm64/include/asm/kvm_host.h
 +++ 

Re: [PATCH v2 04/15] arm/arm64: KVM: wrap 64 bit MMIO accesses with two 32 bit ones

2014-10-15 Thread Christoffer Dall
On Thu, Aug 21, 2014 at 02:06:45PM +0100, Andre Przywara wrote:
 Some GICv3 registers can and will be accessed as 64 bit registers.
 Currently the register handling code can only deal with 32 bit
 accesses, so we do two consecutive calls to cover this.
 
 Signed-off-by: Andre Przywara andre.przyw...@arm.com
 ---
  virt/kvm/arm/vgic.c |   48 +---
  1 file changed, 45 insertions(+), 3 deletions(-)
 
 diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
 index 3b6f78d..bef9aa0 100644
 --- a/virt/kvm/arm/vgic.c
 +++ b/virt/kvm/arm/vgic.c
 @@ -926,6 +926,48 @@ static bool vgic_validate_access(const struct vgic_dist 
 *dist,
  }
  
  /*
 + * Call the respective handler function for the given range.
 + * We split up any 64 bit accesses into two consecutive 32 bit
 + * handler calls and merge the result afterwards.
 + */
 +static bool call_range_handler(struct kvm_vcpu *vcpu,
 +struct kvm_exit_mmio *mmio,
 +unsigned long offset,
 +const struct mmio_range *range)
 +{
 + u32 *data32 = (void *)mmio->data;
 + struct kvm_exit_mmio mmio32;
 + bool ret;
 +
 + if (likely(mmio->len <= 4))
 + return range->handle_mmio(vcpu, mmio, offset);
 +
 + /*
 +  * We assume that any access greater than 4 bytes is actually

Is this an assumption or something that will always hold true at this
point in the code?  If the former, which situations could it not hold
and what would happen?  If the latter, we should just state that.

 +  * 8 bytes long, caused by a 64-bit access
 +  */
 +
 + mmio32.len = 4;
 + mmio32.is_write = mmio->is_write;
 +
 + mmio32.phys_addr = mmio->phys_addr + 4;
 + if (mmio->is_write)
 + *(u32 *)mmio32.data = data32[1];
 + ret = range->handle_mmio(vcpu, &mmio32, offset + 4);
 + if (!mmio->is_write)
 + data32[1] = *(u32 *)mmio32.data;
 +
 + mmio32.phys_addr = mmio->phys_addr;
 + if (mmio->is_write)
 + *(u32 *)mmio32.data = data32[0];
 + ret |= range->handle_mmio(vcpu, &mmio32, offset);
 + if (!mmio->is_write)
 + data32[0] = *(u32 *)mmio32.data;

won't this break on a BE system?
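
A self-contained illustration of the concern (plain userspace C, not kernel code): a 64-bit value viewed as two u32 words has its low half in words[0] on a little-endian host but its high half there on a big-endian one, so splitting the 8-byte mmio data buffer by array index alone is endian-dependent:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
	uint64_t reg = 0x1122334455667788ULL;
	uint32_t words[2];

	memcpy(words, &reg, sizeof(reg));
	/* LE host: words[0] == 0x55667788, words[1] == 0x11223344
	 * BE host: words[0] == 0x11223344, words[1] == 0x55667788 */
	printf("words[0]=%08" PRIx32 " words[1]=%08" PRIx32 "\n",
	       words[0], words[1]);
	return 0;
}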

 +
 + return ret;
 +}
 +
 +/*
   * vgic_handle_mmio_range - handle an in-kernel MMIO access
   * @vcpu:pointer to the vcpu performing the access
   * @run: pointer to the kvm_run structure
 @@ -956,10 +998,10 @@ static bool vgic_handle_mmio_range(struct kvm_vcpu 
 *vcpu, struct kvm_run *run,
   spin_lock(vcpu-kvm-arch.vgic.lock);
   offset -= range-base;
   if (vgic_validate_access(dist, range, offset)) {
 - updated_state = range-handle_mmio(vcpu, mmio, offset);
 + updated_state = call_range_handler(vcpu, mmio, offset, range);
   } else {
 - vgic_reg_access(mmio, NULL, offset,
 - ACCESS_READ_RAZ | ACCESS_WRITE_IGNORED);
 + if (!mmio-is_write)
 + memset(mmio-data, 0, mmio-len);
   updated_state = false;
   }
   spin_unlock(vcpu-kvm-arch.vgic.lock);
 -- 
 1.7.9.5
 

Thanks,
-Christoffer


Re: [PATCH v2 05/15] arm/arm64: KVM: introduce per-VM ops

2014-10-15 Thread Christoffer Dall
On Thu, Aug 21, 2014 at 02:06:46PM +0100, Andre Przywara wrote:
 Currently we only have one virtual GIC model supported, so all guests
 use the same emulation code. With the addition of another model we
 end up with different guests using potentially different vGIC models,
 so we have to split up some functions to be per VM.
 Introduce a vgic_vm_ops struct to hold function pointers for those
 functions that are different and provide the necessary code to
 initialize them.
 This includes functions that depend on the emulated GIC model only
 and functions that depend on the combination of host and guest GIC.
 
 Signed-off-by: Andre Przywara andre.przyw...@arm.com
 ---
  include/kvm/arm_vgic.h |   18 --
  virt/kvm/arm/vgic-v2.c |   17 +++--
  virt/kvm/arm/vgic-v3.c |   16 +++--
  virt/kvm/arm/vgic.c|   92 +++-
  4 files changed, 113 insertions(+), 30 deletions(-)
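
For readers skimming the diff: the per-VM dispatch boils down to indirect calls through the new vm_ops struct. The wrapper below is only my sketch of a call site; the vm_ops field and the get_lr pointer it uses do come from the hunks that follow:

#include <linux/kvm_host.h>
#include <kvm/arm_vgic.h>

static inline struct vgic_lr vgic_get_lr(const struct kvm_vcpu *vcpu, int lr)
{
	return vcpu->kvm->arch.vgic.vm_ops.get_lr(vcpu, lr);
}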
 
 diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
 index 4feac9a..7e7c99e 100644
 --- a/include/kvm/arm_vgic.h
 +++ b/include/kvm/arm_vgic.h
 @@ -99,8 +99,6 @@ struct vgic_vmcr {
  };
  
  struct vgic_ops {
 - struct vgic_lr  (*get_lr)(const struct kvm_vcpu *, int);
 - void(*set_lr)(struct kvm_vcpu *, int, struct vgic_lr);
   void(*sync_lr_elrsr)(struct kvm_vcpu *, int, struct vgic_lr);
   u64 (*get_elrsr)(const struct kvm_vcpu *vcpu);
   u64 (*get_eisr)(const struct kvm_vcpu *vcpu);
 @@ -123,6 +121,17 @@ struct vgic_params {
   unsigned intmaint_irq;
   /* Virtual control interface base address */
   void __iomem*vctrl_base;
 + bool (*init_emul)(struct kvm *kvm, int type);
 +};
 +
 +struct vgic_vm_ops {
 + struct vgic_lr  (*get_lr)(const struct kvm_vcpu *, int);
 + void(*set_lr)(struct kvm_vcpu *, int, struct vgic_lr);
 + bool(*handle_mmio)(struct kvm_vcpu *, struct kvm_run *,
 +struct kvm_exit_mmio *);
 + bool(*queue_sgi)(struct kvm_vcpu *vcpu, int irq);
 + void(*unqueue_sgi)(struct kvm_vcpu *vcpu, int irq, int source);
 + int (*vgic_init)(struct kvm *kvm, const struct vgic_params *params);
  };
  
  struct vgic_dist {
 @@ -131,6 +140,9 @@ struct vgic_dist {
   boolin_kernel;
   boolready;
  
 + /* vGIC model the kernel emulates for the guest (GICv2 or GICv3) */
 + u32 vgic_model;
 +
   int nr_cpus;
   int nr_irqs;
  
 @@ -168,6 +180,8 @@ struct vgic_dist {
  
   /* Bitmap indicating which CPU has something pending */
   unsigned long   irq_pending_on_cpu;
 +
 + struct vgic_vm_ops  vm_ops;
  #endif
  };
  
 diff --git a/virt/kvm/arm/vgic-v2.c b/virt/kvm/arm/vgic-v2.c
 index 01124ef..90947c6 100644
 --- a/virt/kvm/arm/vgic-v2.c
 +++ b/virt/kvm/arm/vgic-v2.c
 @@ -161,8 +161,6 @@ static void vgic_v2_enable(struct kvm_vcpu *vcpu)
  }
  
  static const struct vgic_ops vgic_v2_ops = {
 - .get_lr = vgic_v2_get_lr,
 - .set_lr = vgic_v2_set_lr,
   .sync_lr_elrsr  = vgic_v2_sync_lr_elrsr,
   .get_elrsr  = vgic_v2_get_elrsr,
   .get_eisr   = vgic_v2_get_eisr,
 @@ -176,6 +174,20 @@ static const struct vgic_ops vgic_v2_ops = {
  
  static struct vgic_params vgic_v2_params;
  
 +static bool vgic_v2_init_emul(struct kvm *kvm, int type)
 +{
 + struct vgic_vm_ops *vm_ops = &kvm->arch.vgic.vm_ops;
 +
 + switch (type) {
 + case KVM_DEV_TYPE_ARM_VGIC_V2:
 + vm_ops->get_lr = vgic_v2_get_lr;
 + vm_ops->set_lr = vgic_v2_set_lr;
 + return true;
 + }
 +
 + return false;
 +}
 +
  /**
   * vgic_v2_probe - probe for a GICv2 compatible interrupt controller in DT
   * @node:pointer to the DT node
 @@ -214,6 +226,7 @@ int vgic_v2_probe(struct device_node *vgic_node,
   ret = -ENOMEM;
   goto out;
   }
 + vgic->init_emul = vgic_v2_init_emul;

this feels overly complicated, can't each host-support function just
export this function and we call it directly from the vgic creation
code?

That or just keep the host-specific functions the way they are and
handle the split within these functions?

Bah, I'm not sure, this patch is just terrible to review...  Perhaps
it's because we're doing some combination of introducing infrastructure
and also changing random bits and naming to be version-specific.

  
   vgic-nr_lr = readl_relaxed(vgic-vctrl_base + GICH_VTR);
   vgic-nr_lr = (vgic-nr_lr  0x3f) + 1;
 diff --git a/virt/kvm/arm/vgic-v3.c b/virt/kvm/arm/vgic-v3.c
 index 1c2c8ee..a38339e 100644
 --- a/virt/kvm/arm/vgic-v3.c
 +++ b/virt/kvm/arm/vgic-v3.c
 @@ -157,8 +157,6 @@ static void vgic_v3_enable(struct kvm_vcpu *vcpu)
  }
  
  static const struct vgic_ops vgic_v3_ops = {
 - .get_lr = vgic_v3_get_lr,
 - .set_lr = 

Re: [PATCH v2 06/15] arm/arm64: KVM: make the maximum number of vCPUs a per-VM value

2014-10-15 Thread Christoffer Dall
On Thu, Aug 21, 2014 at 02:06:47PM +0100, Andre Przywara wrote:
 Currently the maximum number of vCPUs supported is a global value
 limited by the used GIC model. GICv3 will lift this limit, but we
 still need to observe it for guests using GICv2.
 So the maximum number of vCPUs is a per-VM value, depending on the
 GIC model the guest uses.
 Store and check the value in struct kvm_arch, but keep it down to
 8 for now.
 
 Signed-off-by: Andre Przywara andre.przyw...@arm.com
 ---
  arch/arm/include/asm/kvm_host.h   |1 +
  arch/arm/kvm/arm.c|6 ++
  arch/arm64/include/asm/kvm_host.h |3 +++
  virt/kvm/arm/vgic-v2.c|5 +
  virt/kvm/arm/vgic-v3.c|6 ++
  5 files changed, 21 insertions(+)
 
 diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
 index cf99ad0..0638419 100644
 --- a/arch/arm/include/asm/kvm_host.h
 +++ b/arch/arm/include/asm/kvm_host.h
 @@ -67,6 +67,7 @@ struct kvm_arch {
  
   /* Interrupt controller */
   struct vgic_distvgic;
 + int max_vcpus;
  };
  
  #define KVM_NR_MEM_OBJS 40
 diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
 index 157e1b6..190f05f 100644
 --- a/arch/arm/kvm/arm.c
 +++ b/arch/arm/kvm/arm.c
 @@ -142,6 +142,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
  
   /* Mark the initial VMID generation invalid */
   kvm-arch.vmid_gen = 0;
 + kvm-arch.max_vcpus = CONFIG_KVM_ARM_MAX_VCPUS;
  
   return ret;
  out_free_stage2_pgd:
 @@ -223,6 +224,11 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, 
 unsigned int id)
   int err;
   struct kvm_vcpu *vcpu;
  
 + if (id >= kvm->arch.max_vcpus) {
 + err = -EINVAL;
 + goto out;
 + }
 +
   vcpu = kmem_cache_zalloc(kvm_vcpu_cache, GFP_KERNEL);
   if (!vcpu) {
   err = -ENOMEM;
 diff --git a/arch/arm64/include/asm/kvm_host.h 
 b/arch/arm64/include/asm/kvm_host.h
 index 017fbae..a325161 100644
 --- a/arch/arm64/include/asm/kvm_host.h
 +++ b/arch/arm64/include/asm/kvm_host.h
 @@ -58,6 +58,9 @@ struct kvm_arch {
   /* VTTBR value associated with above pgd and vmid */
   u64vttbr;
  
 + /* The maximum number of vCPUs depends on the used GIC model */
 + int max_vcpus;
 +
   /* Interrupt controller */
   struct vgic_distvgic;
  
 diff --git a/virt/kvm/arm/vgic-v2.c b/virt/kvm/arm/vgic-v2.c
 index 90947c6..c7362ff 100644
 --- a/virt/kvm/arm/vgic-v2.c
 +++ b/virt/kvm/arm/vgic-v2.c
 @@ -177,11 +177,16 @@ static struct vgic_params vgic_v2_params;
  static bool vgic_v2_init_emul(struct kvm *kvm, int type)
  {
   struct vgic_vm_ops *vm_ops = kvm-arch.vgic.vm_ops;
 + int nr_vcpus;
  
   switch (type) {
   case KVM_DEV_TYPE_ARM_VGIC_V2:
 + nr_vcpus = atomic_read(&kvm->online_vcpus);
 + if (nr_vcpus > 8)
 + return false;
   vm_ops-get_lr = vgic_v2_get_lr;
   vm_ops-set_lr = vgic_v2_set_lr;
 + kvm-arch.max_vcpus = 8;
   return true;
   }
  
 diff --git a/virt/kvm/arm/vgic-v3.c b/virt/kvm/arm/vgic-v3.c
 index a38339e..86e8b99 100644
 --- a/virt/kvm/arm/vgic-v3.c
 +++ b/virt/kvm/arm/vgic-v3.c
 @@ -171,11 +171,17 @@ static const struct vgic_ops vgic_v3_ops = {
  static bool vgic_v3_init_emul_compat(struct kvm *kvm, int type)
  {
   struct vgic_vm_ops *vm_ops = kvm-arch.vgic.vm_ops;
 + int nr_vcpus;
  
   switch (type) {
   case KVM_DEV_TYPE_ARM_VGIC_V2:
 + nr_vcpus = atomic_read(&kvm->online_vcpus);
 + if (nr_vcpus > 8)
 + return false;
 +

I have a feeling we could be seeing this error a bit, could we dedicate
an error code for the purpose or print a ratelimited warning or
something?
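
One possible shape for that (the helper, message text and placement are mine, not anything in the series):

#include <linux/printk.h>

static bool vgic_check_vcpu_limit(int nr_vcpus)
{
	if (nr_vcpus > 8) {
		pr_warn_ratelimited("kvm: GICv2 emulation supports at most 8 vcpus\n");
		return false;
	}
	return true;
}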

   vm_ops-get_lr = vgic_v3_get_lr;
   vm_ops-set_lr = vgic_v3_set_lr;
 + kvm-arch.max_vcpus = 8;
   return true;
   }
   return false;
 -- 
 1.7.9.5
 
Thanks,
-Christoffer


Re: [PATCH v2 07/15] arm/arm64: KVM: make the value of ICC_SRE_EL1 a per-VM variable

2014-10-15 Thread Christoffer Dall
On Thu, Aug 21, 2014 at 02:06:48PM +0100, Andre Przywara wrote:
 ICC_SRE_EL1 is a system register allowing msr/mrs accesses to the
 GIC CPU interface for EL1 (guests). Currently we force it to 0, but
 for proper GICv3 support we have to allow guests to use it (depending
 on their selected virtual GIC model).
 So add ICC_SRE_EL1 to the list of saved/restored registers on a
 world switch, but actually disallow a guest to change it by only
 restoring a fixed, once-initialized value.
 This value depends on the GIC model userland has chosen for a guest.
 
 Signed-off-by: Andre Przywara andre.przyw...@arm.com
 ---
  arch/arm64/kernel/asm-offsets.c |1 +
  arch/arm64/kvm/vgic-v3-switch.S |   14 +-
  include/kvm/arm_vgic.h  |1 +
  virt/kvm/arm/vgic-v3.c  |9 +++--
  4 files changed, 18 insertions(+), 7 deletions(-)
 
 diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
 index 9a9fce0..9d34486 100644
 --- a/arch/arm64/kernel/asm-offsets.c
 +++ b/arch/arm64/kernel/asm-offsets.c
 @@ -140,6 +140,7 @@ int main(void)
DEFINE(VGIC_V2_CPU_ELRSR,  offsetof(struct vgic_cpu, vgic_v2.vgic_elrsr));
DEFINE(VGIC_V2_CPU_APR,offsetof(struct vgic_cpu, vgic_v2.vgic_apr));
DEFINE(VGIC_V2_CPU_LR, offsetof(struct vgic_cpu, vgic_v2.vgic_lr));
 +  DEFINE(VGIC_V3_CPU_SRE,offsetof(struct vgic_cpu, vgic_v3.vgic_sre));
DEFINE(VGIC_V3_CPU_HCR,offsetof(struct vgic_cpu, vgic_v3.vgic_hcr));
DEFINE(VGIC_V3_CPU_VMCR,   offsetof(struct vgic_cpu, vgic_v3.vgic_vmcr));
DEFINE(VGIC_V3_CPU_MISR,   offsetof(struct vgic_cpu, vgic_v3.vgic_misr));
 diff --git a/arch/arm64/kvm/vgic-v3-switch.S b/arch/arm64/kvm/vgic-v3-switch.S
 index d160469..617a012 100644
 --- a/arch/arm64/kvm/vgic-v3-switch.S
 +++ b/arch/arm64/kvm/vgic-v3-switch.S
 @@ -148,17 +148,18 @@
   * x0: Register pointing to VCPU struct
   */
  .macro   restore_vgic_v3_state
 - // Disable SRE_EL1 access. Necessary, otherwise
 - // ICH_VMCR_EL2.VFIQEn becomes one, and FIQ happens...
 - msr_s   ICC_SRE_EL1, xzr
 - isb
 -
   // Compute the address of struct vgic_cpu
   add x3, x0, #VCPU_VGIC_CPU
  
   // Restore all interesting registers
   ldr w4, [x3, #VGIC_V3_CPU_HCR]
   ldr w5, [x3, #VGIC_V3_CPU_VMCR]
 + ldr w25, [x3, #VGIC_V3_CPU_SRE]
 +
 + msr_s   ICC_SRE_EL1, x25
 +
 + // make sure SRE is valid before writing the other registers
 + isb
  
   msr_s   ICH_HCR_EL2, x4
   msr_s   ICH_VMCR_EL2, x5
 @@ -244,9 +245,12 @@
   dsb sy
  
   // Prevent the guest from touching the GIC system registers
 + // if SRE isn't enabled for GICv3 emulation
 + cbnz x25, 1f
   mrs_s   x5, ICC_SRE_EL2
   and x5, x5, #~ICC_SRE_EL2_ENABLE
   msr_s   ICC_SRE_EL2, x5
 +1:
  .endm
  
  ENTRY(__save_vgic_v3_state)
 diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
 index 7e7c99e..8aa8482 100644
 --- a/include/kvm/arm_vgic.h
 +++ b/include/kvm/arm_vgic.h
 @@ -199,6 +199,7 @@ struct vgic_v3_cpu_if {
  #ifdef CONFIG_ARM_GIC_V3
   u32 vgic_hcr;
   u32 vgic_vmcr;
 + u32 vgic_sre;   /* Restored only, change ignored */
   u32 vgic_misr;  /* Saved only */
   u32 vgic_eisr;  /* Saved only */
   u32 vgic_elrsr; /* Saved only */
 diff --git a/virt/kvm/arm/vgic-v3.c b/virt/kvm/arm/vgic-v3.c
 index 86e8b99..8af8037 100644
 --- a/virt/kvm/arm/vgic-v3.c
 +++ b/virt/kvm/arm/vgic-v3.c
 @@ -145,15 +145,20 @@ static void vgic_v3_set_vmcr(struct kvm_vcpu *vcpu, 
 struct vgic_vmcr *vmcrp)
  
  static void vgic_v3_enable(struct kvm_vcpu *vcpu)
  {
 + struct vgic_v3_cpu_if *vgic_v3;
 +
 + vgic_v3 = vcpu-arch.vgic_cpu.vgic_v3;
   /*
* By forcing VMCR to zero, the GIC will restore the binary
* points to their reset values. Anything else resets to zero
* anyway.
*/
 - vcpu-arch.vgic_cpu.vgic_v3.vgic_vmcr = 0;
 + vgic_v3-vgic_vmcr = 0;
 +
 + vgic_v3-vgic_sre   = 0;
   why the tab ^^^ ?
  
   /* Get the show on the road... */
 - vcpu-arch.vgic_cpu.vgic_v3.vgic_hcr = ICH_HCR_EN;
 + vgic_v3-vgic_hcr = ICH_HCR_EN;
  }
  
  static const struct vgic_ops vgic_v3_ops = {
 -- 
 1.7.9.5
 

Besides the nit:

Reviewed-by: Christoffer Dall christoffer.d...@linaro.org


Re: Nested VMX guest tsc_deadline OOPS

2014-10-15 Thread Andy Lutomirski
On Tue, Oct 14, 2014 at 3:46 AM, Bandan Das b...@redhat.com wrote:
 Andy Lutomirski l...@amacapital.net writes:

 I don't know what's going on here, but a nested VMX boot using -cpu
 host on SNB on L0 and L1 fails with the OOPS below.  I kind of suspect
 that there's both a KVM bug and an APIC driver bug here.

 What kernel version is this ? What are you running for L1 and L2 ?

The problem seems to be that, under nested virtualization without
kvm-clock enabled, TSC calibration in the L2 guest fails completely.
I have no idea whether this is a bug in Linux, KVM, or QEMU.

The rest of the problem seems to be that the apic code falls over
completely on systems that advertise a TSC deadline timer if TSC
calibration fails.  I sent a patch to fix this.  With that patch
applied, L2 can boot, but it still has an unusable TSC.
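
As a toy model of that failure mode (ordinary userspace C, not kernel code): a calibrated frequency of zero turns a later period computation into an integer divide by zero, which on x86 raises #DE, the same class of fault as the clockevents_config() divide error quoted below:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

static uint64_t period_ns(uint64_t freq_hz)
{
	return 1000000000ULL / freq_hz;		/* faults when freq_hz == 0 */
}

int main(void)
{
	printf("%" PRIu64 "\n", period_ns(0));	/* divide error, like the oops */
	return 0;
}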

--Andy


 -cpu host,-tsc-deadline on L1 works around the OOPS.

 [0.184000] Freeing SMP alternatives memory: 32K (81ff
 - 81ff8000)
 [0.188000] [ cut here ]
 [0.189000] WARNING: CPU: 0 PID: 1 at
 arch/x86/kernel/apic/apic.c:1393 setup_local_APIC+0x26c/0x320()
 [0.19] Modules linked in:
 [0.192000] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.17.0+ #459
 [0.193000] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
 [0.194000]  0009 8800070a3e40 818588a7
 
 [0.199000]  8800070a3e78 8108db68 
 00f0
 [0.205000]    0001
 8800070a3e88
 [0.21] Call Trace:
 [0.211000]  [818588a7] dump_stack+0x45/0x56
 [0.212000]  [8108db68] warn_slowpath_common+0x78/0xa0
 [0.213000]  [8108dc45] warn_slowpath_null+0x15/0x20
 [0.214000]  [8107744c] setup_local_APIC+0x26c/0x320
 [0.215000]  [81f06a74] native_smp_prepare_cpus+0x255/0x312
 [0.216000]  [81ef702d] kernel_init_freeable+0x94/0x1d1
 [0.217000]  [81851aa0] ? rest_init+0x80/0x80
 [0.218000]  [81851aa9] kernel_init+0x9/0xf0
 [0.219000]  [8186276c] ret_from_fork+0x7c/0xb0
 [0.22]  [81851aa0] ? rest_init+0x80/0x80
 [0.221000] ---[ end trace f5f19be5716e445c ]---
 [0.234000] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
 [0.244000] smpboot: CPU0: Intel(R) Core(TM) i7-3930K CPU @ 3.20GHz
 (fam: 06, model: 2d, stepping: 06)
 [0.248000] divide error:  [#1] SMP
 [0.248000] Modules linked in:
 [0.248000] CPU: 0 PID: 1 Comm: swapper/0 Tainted: GW
 3.17.0+ #459
 [0.248000] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
 [0.248000] task: 880007098000 ti: 8800070a task.ti:
 8800070a
 [0.248000] RIP: 0010:[810eb9ef]  [810eb9ef]
 clockevents_config.part.3+0x1f/0xa0
 [0.248000] RSP: :8800070a3e58  EFLAGS: 00010246
 [0.248000] RAX:  RBX: 880007c0cd00 RCX: 
 
 [0.248000] RDX:  RSI:  RDI: 
 
 [0.248000] RBP: 8800070a3e70 R08: 0001 R09: 
 82036b90
 [0.248000] R10: 00a7 R11: 00a6 R12: 
 b018
 [0.248000] R13: 0040 R14: b020 R15: 
 
 [0.248000] FS:  () GS:880007c0()
 knlGS:
 [0.248000] CS:  0010 DS:  ES:  CR0: 80050033
 [0.248000] CR2: 8800020dd000 CR3: 01e11000 CR4: 
 000406f0
 [0.248000] Stack:
 [0.248000]  880007c0cd00 b018 0040
 8800070a3e88
 [0.248000]  810ebbcb 880007c0cd00 8800070a3e98
 8107646f
 [0.248000]  8800070a3ed8 81f086d0 0053
 
 [0.248000] Call Trace:
 [0.248000]  [810ebbcb] 
 clockevents_config_and_register+0x1b/0x30
 [0.248000]  [8107646f] setup_APIC_timer+0x10f/0x170
 [0.248000]  [81f086d0] setup_boot_APIC_clock+0x4ae/0x4ba
 [0.248000]  [81f06b0e] native_smp_prepare_cpus+0x2ef/0x312
 [0.248000]  [81ef702d] kernel_init_freeable+0x94/0x1d1
 [0.248000]  [81851aa0] ? rest_init+0x80/0x80
 [0.248000]  [81851aa9] kernel_init+0x9/0xf0
 [0.248000]  [8186276c] ret_from_fork+0x7c/0xb0
 [0.248000]  [81851aa0] ? rest_init+0x80/0x80
 [0.248000] Code: c3 66 66 2e 0f 1f 84 00 00 00 00 00 55 31 d2 89
 f1 89 f6 41 b8 01 00 00 00 48 89 e5 41 55 41 54 53 48 89 fb 48 8b 7f
 70 48 89 f8 48 f7 f6 48 85 c0 74 0b 48 3d 58 02 00 00 41 89 c0 77 4e
 4c 8d
 [0.248000] RIP  [810eb9ef] clockevents_config.part.3+0x1f/0xa0
 [0.248000]  RSP 8800070a3e58
 [0.249000] ---[ end trace f5f19be5716e445d ]---
 [0.25] Kernel panic - not syncing: Attempted to kill init!
 exitcode=0x000b
 [0.25]
 [

Re: reduce networking latency

2014-10-15 Thread David Xu
2014-09-29 5:04 GMT-04:00 Michael S. Tsirkin m...@redhat.com:
 On Wed, Sep 24, 2014 at 02:40:53PM -0400, David Xu wrote:
 Hi Michael,

 I found this interesting project from KVM TODO website:

 allow handling short packets from softirq or VCPU context
  Plan:
We are going through the scheduler 3 times
(could be up to 5 if softirqd is involved)
    Consider RX: host irq -> io thread -> VCPU thread ->
    guest irq -> guest thread.
This adds a lot of latency.
We can cut it by some 1.5x if we do a bit of work
either in the VCPU or softirq context.
  Testing: netperf TCP RR - should be improved drastically
   netperf TCP STREAM guest to host - no regression

 Would you mind saying more about the work either in the vCPU or
 softirq context?

 For TX, we might be able to execute it directly from VCPU context.
 For RX, from softirq context.


Which step is removed for TX and RX compared with vanilla? Or what's
the new path?

TX: guest thread -> host irq?
RX: host irq -> ?

Thanks.

 Why it is only for short packets handling?

 That's just one idea to avoid doing too much work like this.

 Doing too much work in VCPU context would break pipelining,
 likely degrading stream performance.
 Work in softirq context is not accounted against the correct
 cgroups, doing a lot of work there will mean guest can steal
 CPU from other guests.

 Thanks a
 lot!


 Regards,

 Cong


Re: [RFC PATCH] arm/arm64: KVM: Fix BE accesses to GICv2 EISR and ELRSR regs

2014-10-15 Thread Victor Kamensky
On 14 October 2014 08:21, Victor Kamensky victor.kamen...@linaro.org wrote:
 On 14 October 2014 02:47, Marc Zyngier marc.zyng...@arm.com wrote:
 On Sun, Sep 28 2014 at 03:04:26 PM, Christoffer Dall 
 christoffer.d...@linaro.org wrote:
 The EISR and ELRSR registers are 32-bit registers on GICv2, and we
 store these as an array of two such registers on the vgic vcpu struct.
 However, we access them as a single 64-bit value or as a bitmap pointer
 in the generic vgic code, which breaks BE support.

 Instead, store them as u64 values on the vgic structure and do the
 word-swapping in the assembly code, which already handles the byte order
 for BE systems.

 Signed-off-by: Christoffer Dall christoffer.d...@linaro.org

 (still going through my email backlog, hence the delay).

 This looks like a valuable fix. Haven't had a chance to try it (no BE
 setup at hand) but maybe Victor can help reproducing this?.

 I'll give it a spin.

Tested-by: Victor Kamensky victor.kamen...@linaro.org

Tested on v3.17 + this fix on TC2 (V7) and Mustang (V8) with a BE
kvm host, and tried different combinations of BE/LE V7/V8 guests. All looks
good.

Only with the latest qemu in BE V8 mode on v3.17 without this
fix was I able to reproduce the issue that Will spotted. With kvmtool
and older qemu, V8 BE code never hit the vgic_v2_set_lr function, which
is why we did not run into it before. I guess the qemu pl011 fix
mentioned in 1f2bb4acc125 uncovered the vgic_v2_set_lr
code path and this BE issue. With this patch it works fine now.

Thanks,
Victor

 Thanks,
 Victor

 Acked-by: Marc Zyngier marc.zyng...@arm.com

 M.
 --
 Jazz is not dead. It just smells funny.
 ___
 kvmarm mailing list
 kvm...@lists.cs.columbia.edu
 https://lists.cs.columbia.edu/mailman/listinfo/kvmarm