Re: [PATCH v4 8/8] macvtap/tun: add VNET_BE flag

2015-04-22 Thread Michael S. Tsirkin
On Wed, Apr 22, 2015 at 12:01:29PM +0200, Greg Kurz wrote:
 On Tue, 21 Apr 2015 20:30:23 +0200
 Michael S. Tsirkin m...@redhat.com wrote:
 
  On Tue, Apr 21, 2015 at 06:22:20PM +0200, Greg Kurz wrote:
   On Tue, 21 Apr 2015 16:06:33 +0200
   Michael S. Tsirkin m...@redhat.com wrote:
   
On Fri, Apr 10, 2015 at 12:20:21PM +0200, Greg Kurz wrote:
 The VNET_LE flag was introduced to fix accesses to virtio 1.0 headers
 that are always little-endian. It can also be used to handle the 
 special
 case of a legacy little-endian device implemented by a big-endian 
 host.
 
 Let's add a flag and ioctls for big-endian devices as well. If both 
 flags
 are set, little-endian wins.
 
 Since this isn't a common use case, the feature is controlled by a 
 kernel
 config option (not set by default).
 
 Both macvtap and tun are covered by this patch since they share the 
 same
 API with userland.
 
 Signed-off-by: Greg Kurz gk...@linux.vnet.ibm.com
 ---
  drivers/net/Kconfig |   12 
  drivers/net/macvtap.c   |   60 
 +-
  drivers/net/tun.c   |   62 
 ++-
  include/uapi/linux/if_tun.h |2 +
  4 files changed, 134 insertions(+), 2 deletions(-)
 
 diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
 index df51d60..f0e23a0 100644
 --- a/drivers/net/Kconfig
 +++ b/drivers/net/Kconfig
 @@ -244,6 +244,18 @@ config TUN
  
 If you don't know what to use this for, you don't need it.
  
 +config TUN_VNET_BE
 + bool "Support for big-endian vnet headers"
 + default n
 + ---help---
 +   This option allows TUN/TAP and MACVTAP device drivers to parse
 +   vnet headers that are in big-endian byte order. It is useful
 +   when the headers come from a big-endian legacy virtio driver 
 and
 +   the host is little-endian.
 +
 +   Unless you have a little-endian system hosting a big-endian 
 virtual
 +   machine with a virtio NIC, you should say N.
 +

should mention cross-endian, not big-endian, right?

   
   The current TUN_VNET_LE related code is already doing cross-endian: 
   without
   this patch, one can already run a LE guest on a BE host... wouldn't it be
   confusing to mention cross-endian only when the guest is BE ?
  
  Hmm I think no - LE is also useful for virtio 1 - this is what it was
  intended for after all.
  
   What about having a completely distinct implementation for cross-endian 
   that
   doesn't reuse the existing code and defines then ?
  
  I think implementation and interface are fine, just the documentation
  can be improved a bit.
  
  How about:
  Support for cross-endian vnet headers on little-endian kernels.
  
  Accordingly CONFIG_TUN_VNET_CROSS_LE
  
  ?
  
 
 Sure. And what about also renaming the ioctl to TUNSETVNETCROSSLE then ?
 
 --
 Greg

I think not.

-- 
MST
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
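
For reference, the flag precedence described in the patch ("if both flags are set, little-endian wins") can be sketched in a few lines of C. This is only an illustration, not code from the patch: the SKETCH_* flag values and the sketch_* helper names are hypothetical, and the fallback to host-native byte order when no flag is set is an assumption about legacy behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits for this sketch; the real values live in the drivers. */
#define SKETCH_VNET_LE 0x1u
#define SKETCH_VNET_BE 0x2u

/* Decide whether the vnet header fields are little-endian. */
static bool sketch_vnet_is_le(unsigned int flags, bool host_is_le)
{
	if (flags & SKETCH_VNET_LE)
		return true;		/* LE wins, even if BE is also set */
	if (flags & SKETCH_VNET_BE)
		return false;
	return host_is_le;		/* assumed default: native byte order */
}

/* Convert a 16-bit header field from device byte order to host byte order. */
static uint16_t sketch_vnet16_to_cpu(unsigned int flags, bool host_is_le,
				     uint16_t v)
{
	if (sketch_vnet_is_le(flags, host_is_le) == host_is_le)
		return v;		/* same byte order, nothing to do */
	return (uint16_t)((v << 8) | (v >> 8));
}
```

With only the BE flag set on a little-endian host, every 16-bit field gets byte-swapped; with both flags set, the header is treated as little-endian.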


Re: [PATCH 0/2] KVM/ARM updates for v4.1, take 2

2015-04-22 Thread Paolo Bonzini


On 22/04/2015 17:08, Marc Zyngier wrote:
 Paolo, Marcelo,
 
 This is the second pull request for the KVM/ARM updates targeting
 v4.1. Not much to see this time, just a couple of boring fixes.

Pulled.

Paolo


Re: [PATCH stable] KVM: x86: Fix lost interrupt on irr_pending race

2015-04-22 Thread Luis Henriques
On Tue, Apr 21, 2015 at 10:47:37AM +0200, Paolo Bonzini wrote:
 
 
 On 21/04/2015 09:52, Paolo Bonzini wrote:
  From: Nadav Amit na...@cs.technion.ac.il
  
  [ upstream commit f210f7572bedf3320599e8b2d8e8ec2d96270d0b ]
  
  apic_find_highest_irr assumes irr_pending is set if any vector in APIC_IRR 
  is
  set.  If this assumption is broken and apicv is disabled, the injection of
  interrupts may be deferred until another interrupt is delivered to the 
  guest.
  Ultimately, if no other interrupt should be injected to that vCPU, the 
  pending
  interrupt may be lost.
  
  commit 56cc2406d68c (KVM: nVMX: fix acknowledge interrupt on exit when 
  APICv
  is in use) changed the behavior of apic_clear_irr so irr_pending is cleared
  after setting APIC_IRR vector. After this commit, if apic_set_irr and
  apic_clear_irr run simultaneously, a race may occur, resulting in APIC_IRR
  vector set, and irr_pending cleared. In the following example, assume a 
  single
  vector is set in IRR prior to calling apic_clear_irr:
  
  apic_set_irr                    apic_clear_irr
  ------------                    --------------
  apic->irr_pending = true;
                                  apic_clear_vector(...);
                                  vec = apic_search_irr(apic);
                                  // => vec == -1
  apic_set_vector(...);
                                  apic->irr_pending = (vec != -1);
                                  // => apic->irr_pending == false
  
  Nonetheless, it appears the race might even occur prior to this commit:
  
  apic_set_irr                    apic_clear_irr
  ------------                    --------------
  apic->irr_pending = true;
                                  apic->irr_pending = false;
                                  apic_clear_vector(...);
                                  if (apic_search_irr(apic) != -1)
                                          apic->irr_pending = true;
                                  // => apic->irr_pending == false
  apic_set_vector(...);
  
  Fixing this issue by:
  1. Restoring the previous behavior of apic_clear_irr: clear irr_pending, 
  call
 apic_clear_vector, and then if APIC_IRR is non-zero, set irr_pending.
  2. On apic_set_irr: first call apic_set_vector, then set irr_pending.
  
  Signed-off-by: Nadav Amit na...@cs.technion.ac.il
  Fixes: 33e4c68656a2e461b296ce714ec322978de85412
  Cc: sta...@vger.kernel.org # 2.6.32+
  Signed-off-by: Paolo Bonzini pbonz...@redhat.com
  ---
  The race was reported in 3.17+ by Brad Campbell and in
  2.6.32 by Saso Slavicic, so it qualifies for stable.
 
 Patch for kernels before 3.17:
 

Thanks Paolo.  I was going to apply this backport to the 3.16 kernel
but it looks like the original commit is a clean cherry-pick.  Shall I
still apply your backport, or do you think the original commit should
be applied instead?

Cheers,
--
Luís

 diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
 index 6e8ce5a1a05d..e0e5642dae41 100644
 --- a/arch/x86/kvm/lapic.c
 +++ b/arch/x86/kvm/lapic.c
 @@ -341,8 +341,12 @@ EXPORT_SYMBOL_GPL(kvm_apic_update_irr);
  
  static inline void apic_set_irr(int vec, struct kvm_lapic *apic)
  {
 - apic->irr_pending = true;
   apic_set_vector(vec, apic->regs + APIC_IRR);
 + /*
 +  * irr_pending must be true if any interrupt is pending; set it after
 +  * APIC_IRR to avoid race with apic_clear_irr
 +  */
 + apic->irr_pending = true;
  }
  
  static inline int apic_search_irr(struct kvm_lapic *apic)
 
 
 Thanks,
 
 Paolo
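
For reference, the two-step fix described in the commit message (clear irr_pending, clear the vector, then re-scan; and on the set side, set the vector before the flag) can be sketched as below. This is a simplified, single-threaded illustration under stated assumptions: the sketch_* names are hypothetical, the IRR is shrunk to 32 bits, and the real code's atomic bit operations are replaced by plain assignments, so it shows only the ordering, not the concurrency.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct sketch_apic {
	uint32_t irr;		/* one bit per pending vector (only 32 here) */
	bool irr_pending;	/* must be true whenever any IRR bit is set */
};

/* Return the highest pending vector, or -1 if none is pending. */
static int sketch_search_irr(const struct sketch_apic *apic)
{
	for (int vec = 31; vec >= 0; vec--)
		if (apic->irr & (1u << vec))
			return vec;
	return -1;
}

static void sketch_set_irr(struct sketch_apic *apic, int vec)
{
	apic->irr |= 1u << vec;		/* set the vector first ... */
	apic->irr_pending = true;	/* ... and only then the flag */
}

static void sketch_clear_irr(struct sketch_apic *apic, int vec)
{
	apic->irr_pending = false;	/* clear the flag first */
	apic->irr &= ~(1u << vec);	/* then clear the vector */
	if (sketch_search_irr(apic) != -1)
		apic->irr_pending = true;	/* something else is still pending */
}
```

With this ordering, a setter that lands anywhere inside sketch_clear_irr() writes irr_pending = true as its last step, so the flag cannot end up false while a vector is still set.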


Re: [GSoC] project proposal

2015-04-22 Thread Paolo Bonzini


On 22/04/2015 10:51, Catalin Vasile wrote:
 If we want a mainstream userspace backend that could interact with a
 lot of crypto engines, we could use OpenSSL (it can actually use
 cryptodev and AF_ALG as engines).
 For now, until mid June (my diploma project presentation) I still want
 to use vhost as a backend for the sole purpose of having a finished
 backend which now I have a good grasp upon.
 If the finished work would be good enough work to be merged upstream
 will be talked later.
 As a GSoC project, OpenSSL as a backend would continue the
 virtio-crypto development, as it's not uncommon to have multiple types
 of backends.
 The current work on virtio-crypto qemu and guest module is pretty
 backend agnostic, and could allow future development(use of other
 backends and other features).

OpenSSL's license is not compatible with QEMU, hence the suggestion of
using gnutls.

Paolo


Re: [PATCH stable] KVM: x86: Fix lost interrupt on irr_pending race

2015-04-22 Thread Paolo Bonzini


On 22/04/2015 15:34, Luis Henriques wrote:
 Thanks Paolo.  I was going to apply this backport to the 3.16 kernel
 but it looks like the original commit is a clean cherry-pick.  Shall I
 still apply your backport, or do you think the original commit should
 be applied instead?

Indeed you're right.  I wrote the backport for 3.16(.0).  However,
commit 56cc2406d68c0f09505c389e276f27a99f495cbd was marked for stable,
so it's necessary to cherry-pick the entire patch on the stable kernel
where the buggy commit was backported.

That should be, according to the sta...@vger.kernel.org archives,
3.10.54+, 3.13.11.7+, 3.14.18+, 3.16.2+.

Paolo

 Cheers,
 --
 Luís
 
 diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
 index 6e8ce5a1a05d..e0e5642dae41 100644
 --- a/arch/x86/kvm/lapic.c
 +++ b/arch/x86/kvm/lapic.c
 @@ -341,8 +341,12 @@ EXPORT_SYMBOL_GPL(kvm_apic_update_irr);
  
  static inline void apic_set_irr(int vec, struct kvm_lapic *apic)
  {
 -apic->irr_pending = true;
  apic_set_vector(vec, apic->regs + APIC_IRR);
 +/*
 + * irr_pending must be true if any interrupt is pending; set it after
 + * APIC_IRR to avoid race with apic_clear_irr
 + */
 +apic->irr_pending = true;
  }
  
  static inline int apic_search_irr(struct kvm_lapic *apic)


 Thanks,

 Paolo


Re: [PATCH stable] KVM: x86: Fix lost interrupt on irr_pending race

2015-04-22 Thread Luis Henriques
On Wed, Apr 22, 2015 at 03:47:04PM +0200, Paolo Bonzini wrote:
 
 
 On 22/04/2015 15:34, Luis Henriques wrote:
  Thanks Paolo.  I was going to apply this backport to the 3.16 kernel
  but it looks like the original commit is a clean cherry-pick.  Shall I
  still apply your backport, or do you think the original commit should
  be applied instead?
 
 Indeed you're right.  I wrote the backport for 3.16(.0).  However,
 commit 56cc2406d68c0f09505c389e276f27a99f495cbd was marked for stable,
 so it's necessary to cherry-pick the entire patch on the stable kernel
 where the buggy commit was backported.
 
 That should be, according to the sta...@vger.kernel.org archives,
 3.10.54+, 3.13.11.7+, 3.14.18+, 3.16.2+.
 

Great, thanks for the quick reply.  I'll queue the (entire) fix for
the 3.16 kernel.

Cheers,
--
Luís

 Paolo
 
  Cheers,
  --
  Luís
  
  diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
  index 6e8ce5a1a05d..e0e5642dae41 100644
  --- a/arch/x86/kvm/lapic.c
  +++ b/arch/x86/kvm/lapic.c
  @@ -341,8 +341,12 @@ EXPORT_SYMBOL_GPL(kvm_apic_update_irr);
   
   static inline void apic_set_irr(int vec, struct kvm_lapic *apic)
   {
  -  apic->irr_pending = true;
 apic_set_vector(vec, apic->regs + APIC_IRR);
  +  /*
  +   * irr_pending must be true if any interrupt is pending; set it after
  +   * APIC_IRR to avoid race with apic_clear_irr
  +   */
  +  apic->irr_pending = true;
   }
   
   static inline int apic_search_irr(struct kvm_lapic *apic)
 
 
  Thanks,
 
  Paolo


[PATCH 2/2] KVM: arm/arm64: check IRQ number on userland injection

2015-04-22 Thread Marc Zyngier
From: Andre Przywara andre.przyw...@arm.com

When userland injects a SPI via the KVM_IRQ_LINE ioctl we currently
only check it against a fixed limit, which historically is set
to 127. With the new dynamic IRQ allocation the effective limit may
actually be smaller (64).
So when now a malicious or buggy userland injects a SPI in that
range, we spill over on our VGIC bitmaps and bytemaps memory.
I could trigger a host kernel NULL pointer dereference with current
mainline by injecting some bogus IRQ number from a hacked kvmtool:
-

DEBUG: kvm_vgic_inject_irq(kvm, cpu=0, irq=114, level=1)
DEBUG: vgic_update_irq_pending(kvm, cpu=0, irq=114, level=1)
DEBUG: IRQ #114 still in the game, writing to bytemap now...
Unable to handle kernel NULL pointer dereference at virtual address 
pgd = ffc07652e000
[] *pgd=f658b003, *pud=f658b003, *pmd=
Internal error: Oops: 9606 [#1] PREEMPT SMP
Modules linked in:
CPU: 1 PID: 1053 Comm: lkvm-msi-irqinj Not tainted 4.0.0-rc7+ #3027
Hardware name: FVP Base (DT)
task: ffc0774e9680 ti: ffc0765a8000 task.ti: ffc0765a8000
PC is at kvm_vgic_inject_irq+0x234/0x310
LR is at kvm_vgic_inject_irq+0x30c/0x310
pc : [ffcae0a8] lr : [ffcae180] pstate: 8145
.

So this patch fixes this by checking the SPI number against the
actual limit. Also we remove the former legacy hard limit of
127 in the ioctl code.

Signed-off-by: Andre Przywara andre.przyw...@arm.com
Reviewed-by: Christoffer Dall christoffer.d...@linaro.org
CC: sta...@vger.kernel.org # 4.0, 3.19, 3.18
[maz: wrap KVM_ARM_IRQ_GIC_MAX with #ifndef __KERNEL__,
as suggested by Christopher Covington]
Signed-off-by: Marc Zyngier marc.zyng...@arm.com
---
 arch/arm/include/uapi/asm/kvm.h   | 8 +++-
 arch/arm/kvm/arm.c| 3 +--
 arch/arm64/include/uapi/asm/kvm.h | 8 +++-
 virt/kvm/arm/vgic.c   | 3 +++
 4 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/arch/arm/include/uapi/asm/kvm.h b/arch/arm/include/uapi/asm/kvm.h
index 2499867..df3f60c 100644
--- a/arch/arm/include/uapi/asm/kvm.h
+++ b/arch/arm/include/uapi/asm/kvm.h
@@ -195,8 +195,14 @@ struct kvm_arch_memory_slot {
 #define KVM_ARM_IRQ_CPU_IRQ  0
 #define KVM_ARM_IRQ_CPU_FIQ  1
 
-/* Highest supported SPI, from VGIC_NR_IRQS */
+/*
+ * This used to hold the highest supported SPI, but it is now obsolete
+ * and only here to provide source code level compatibility with older
+ * userland. The highest SPI number can be set via KVM_DEV_ARM_VGIC_GRP_NR_IRQS.
+ */
+#ifndef __KERNEL__
 #define KVM_ARM_IRQ_GIC_MAX  127
+#endif
 
 /* One single KVM irqchip, ie. the VGIC */
 #define KVM_NR_IRQCHIPS  1
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 6f53645..d9631ec 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -671,8 +671,7 @@ int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_level,
if (!irqchip_in_kernel(kvm))
return -ENXIO;
 
-   if (irq_num < VGIC_NR_PRIVATE_IRQS ||
-   irq_num > KVM_ARM_IRQ_GIC_MAX)
+   if (irq_num < VGIC_NR_PRIVATE_IRQS)
return -EINVAL;
 
return kvm_vgic_inject_irq(kvm, 0, irq_num, level);
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index c154c0b..d268320 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -188,8 +188,14 @@ struct kvm_arch_memory_slot {
 #define KVM_ARM_IRQ_CPU_IRQ  0
 #define KVM_ARM_IRQ_CPU_FIQ  1
 
-/* Highest supported SPI, from VGIC_NR_IRQS */
+/*
+ * This used to hold the highest supported SPI, but it is now obsolete
+ * and only here to provide source code level compatibility with older
+ * userland. The highest SPI number can be set via KVM_DEV_ARM_VGIC_GRP_NR_IRQS.
+ */
+#ifndef __KERNEL__
 #define KVM_ARM_IRQ_GIC_MAX  127
+#endif
 
 /* One single KVM irqchip, ie. the VGIC */
 #define KVM_NR_IRQCHIPS  1
diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 7ed7873..78fb820 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -1561,6 +1561,9 @@ int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int irq_num,
goto out;
}
 
+   if (irq_num >= kvm->arch.vgic.nr_irqs)
+   return -EINVAL;
+
vcpu_id = vgic_update_irq_pending(kvm, cpuid, irq_num, level);
if (vcpu_id >= 0) {
/* kick the specified vcpu */
-- 
2.1.4
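
For reference, the check this patch adds boils down to validating the SPI number against the per-VM limit rather than the old fixed constant. A minimal sketch, assuming a hypothetical sketch_vgic structure and a private-IRQ count of 32 (16 SGIs plus 16 PPIs); none of the sketch_* names come from the kernel:

```c
#include <assert.h>
#include <errno.h>

#define SKETCH_NR_PRIVATE_IRQS 32	/* assumed: 16 SGIs + 16 PPIs */

struct sketch_vgic {
	unsigned int nr_irqs;	/* per-VM limit, may be as small as 64 */
};

/* Return 0 if irq_num is an SPI this VM actually has, -EINVAL otherwise. */
static int sketch_check_spi(const struct sketch_vgic *vgic,
			    unsigned int irq_num)
{
	if (irq_num < SKETCH_NR_PRIVATE_IRQS)
		return -EINVAL;		/* not an SPI at all */
	if (irq_num >= vgic->nr_irqs)
		return -EINVAL;		/* beyond the bitmaps allocated for this VM */
	return 0;
}
```

With nr_irqs set to 64, IRQ 114 from the trace above is rejected before any bitmap or bytemap is touched, while IRQ 40 is still accepted.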



Re: [GSoC] project proposal

2015-04-22 Thread Catalin Vasile
I found my way through its API.
http://www.gnutls.org/manual/gnutls.html#Cryptographic-API
Does anyone know if it has a one-shot givencrypt (generate IV and
encrypt as one job)?
I see an option to get random data, but I was wondering whether there
is a one-shot option.

On Wed, Apr 22, 2015 at 4:43 PM, Paolo Bonzini pbonz...@redhat.com wrote:


 On 22/04/2015 10:51, Catalin Vasile wrote:
 If we want a mainstream userspace backend that could interact with a
 lot of crypto engines, we could use OpenSSL (it can actually use
 cryptodev and AF_ALG as engines).
 For now, until mid June (my diploma project presentation) I still want
 to use vhost as a backend for the sole purpose of having a finished
 backend which now I have a good grasp upon.
 If the finished work would be good enough work to be merged upstream
 will be talked later.
 As a GSoC project, OpenSSL as a backend would continue the
 virtio-crypto development, as it's not uncommon to have multiple types
 of backends.
 The current work on virtio-crypto qemu and guest module is pretty
 backend agnostic, and could allow future development(use of other
 backends and other features).

 OpenSSL's license is not compatible with QEMU, hence the suggestion of
 using gnutls.

 Paolo


[PATCH 1/2] KVM: arm: irqfd: fix value returned by kvm_irq_map_gsi

2015-04-22 Thread Marc Zyngier
From: Eric Auger eric.au...@linaro.org

irqfd/arm currently does not support routing. kvm_irq_map_gsi is
supposed to return all the routing entries associated with the
provided gsi and return the number of those entries. We should
return 0 at this point.

Signed-off-by: Eric Auger eric.au...@linaro.org
Acked-by: Christoffer Dall christoffer.d...@linaro.org
Signed-off-by: Marc Zyngier marc.zyng...@arm.com
---
 virt/kvm/arm/vgic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 8d550ff..7ed7873 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -2141,7 +2141,7 @@ int kvm_irq_map_gsi(struct kvm *kvm,
struct kvm_kernel_irq_routing_entry *entries,
int gsi)
 {
-   return gsi;
+   return 0;
 }
 
 int kvm_irq_map_chip_pin(struct kvm *kvm, unsigned irqchip, unsigned pin)
-- 
2.1.4
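
For reference, here is a small sketch of why the return value matters. It assumes, as the commit message states, that the function returns the number of routing entries it filled in, and that a caller then walks exactly that many entries. The sketch_* names and the four-entry array are hypothetical and do not mirror the real caller.

```c
#include <assert.h>

struct sketch_entry {
	int gsi;
	int pin;
};

/* No routing support: nothing is written, so report zero entries. */
static int sketch_map_gsi(struct sketch_entry *entries, int gsi)
{
	(void)entries;
	(void)gsi;
	return 0;
}

/* A caller that walks as many entries as the callee reported. */
static int sketch_count_deliveries(int gsi)
{
	struct sketch_entry entries[4];
	int n = sketch_map_gsi(entries, gsi);
	int delivered = 0;

	for (int i = 0; i < n; i++)
		delivered++;	/* a real caller would read entries[i] here */
	return delivered;
}
```

Had sketch_map_gsi() returned gsi instead, the loop would have walked that many entries of an array that was never filled in.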



[PATCH 0/2] KVM/ARM updates for v4.1, take 2

2015-04-22 Thread Marc Zyngier
Paolo, Marcelo,

This is the second pull request for the KVM/ARM updates targeting
v4.1. Not much to see this time, just a couple of boring fixes.

Thanks,

M.

The following changes since commit b79013b2449c23f1f505bdf39c5a6c330338b244:

  Merge tag 'staging-4.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging (2015-04-13 17:37:33 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git tags/kvm-arm-for-4.1-take2

for you to fetch changes up to fd1d0ddf2ae92fb3df42ed476939861806c5d785:

  KVM: arm/arm64: check IRQ number on userland injection (2015-04-22 15:42:24 +0100)


KVM/ARM changes for v4.1, take #2:

Rather small this time:

- a fix for a nasty bug with virtual IRQ injection
- a fix for irqfd


Andre Przywara (1):
  KVM: arm/arm64: check IRQ number on userland injection

Eric Auger (1):
  KVM: arm: irqfd: fix value returned by kvm_irq_map_gsi

 arch/arm/include/uapi/asm/kvm.h   | 8 +++-
 arch/arm/kvm/arm.c| 3 +--
 arch/arm64/include/uapi/asm/kvm.h | 8 +++-
 virt/kvm/arm/vgic.c   | 5 -
 4 files changed, 19 insertions(+), 5 deletions(-)


Re: Non-exiting rdpmc on KVM guests?

2015-04-22 Thread Peter Zijlstra
On Tue, Apr 21, 2015 at 02:10:53PM -0700, Andy Lutomirski wrote:

 One question is whether we care if we leak unrelated counters to the
 guest.  (We already leak them to unrelated user tasks, so this is
 hopefully not a big deal.  OTOH, the API is different for guests.)

Good question indeed. I really do not know.

 Another question is whether it's even worth trying to optimize this.

I think I just ran into a bunch of people who think virt pmu stuff is
important, but we'll have to see if they follow up with the effort of
actually doing the work.

My only concern is that they'll not make a mess of things ;-)


Re: [PATCH v2 00/12] Remaining improvements for HV KVM

2015-04-22 Thread Paul Mackerras
On Wed, Apr 15, 2015 at 10:16:41PM +0200, Alexander Graf wrote:
 
 
 On 14.04.15 13:56, Paul Mackerras wrote:
  Did you forget to push it out or something?  Your kvm-ppc-queue branch
  is still at 4.0-rc1 as far as I can see.
 
 Oops, not sure how that happened. Does it show up correctly for you now?

Yes, it's fine now, thanks.

Paul.


Re: [PATCH V3 4/4] KVM: x86/vPMU: Enable PMU handling for AMD PERFCTRn and EVNTSELn MSRs

2015-04-22 Thread Radim Krčmář
2015-04-18 02:23-0400, Wei Huang:
 This patch enables AMD guest VM to access (R/W) PMU related MSRs, which
 include PERFCTR[0..3] and EVNTSEL[0..3].
 
 Signed-off-by: Wei Huang w...@redhat.com
 ---

Reviewed-by: Radim Krčmář rkrc...@redhat.com

 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 @@ -2268,27 +2268,17 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct 
 msr_data *msr_info)
* which we perfectly emulate ;-). Any other value should be at least
* reported, some guests depend on them.

(This comment is a bit outdated now too.)

*/
 - case MSR_K7_EVNTSEL0:
 - case MSR_K7_EVNTSEL1:
 - case MSR_K7_EVNTSEL2:
 - case MSR_K7_EVNTSEL3:
 - if (data != 0)
 - vcpu_unimpl(vcpu, "unimplemented perfctr wrmsr: "
 - "0x%x data 0x%llx\n", msr, data);
 - break;
 - /* at least RHEL 4 unconditionally writes to the perfctr registers,
 -  * so we ignore writes to make it happy.
 -  */
 @@ -2513,6 +2503,12 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, 
 u64 *pdata)
   case MSR_K7_EVNTSEL0:
   case MSR_K7_EVNTSEL1:
   case MSR_K7_EVNTSEL2:
|   case MSR_K7_EVNTSEL3:
|   case MSR_K7_PERFCTR0:
   case MSR_K7_PERFCTR1:
   case MSR_K7_PERFCTR2:
   case MSR_K7_PERFCTR3:

(As we depend on continuous ranges anyway, the GCCism comes to mind:
 'case MSR_K7_PERFCTR0 ... MSR_K7_PERFCTR3:')

   case MSR_P6_PERFCTR0:
   case MSR_P6_PERFCTR1:
   case MSR_P6_EVNTSEL0:
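
For reference, the case-range form suggested above looks roughly like this. Case ranges are a GNU C extension (accepted by GCC and Clang), not standard C. The SKETCH_* constants mirror the K7 MSR numbers (PERFCTR3 is C001_0007h), and the helper name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* K7 performance MSRs occupy two contiguous ranges. */
#define SKETCH_MSR_K7_EVNTSEL0 0xc0010000u
#define SKETCH_MSR_K7_EVNTSEL3 0xc0010003u
#define SKETCH_MSR_K7_PERFCTR0 0xc0010004u
#define SKETCH_MSR_K7_PERFCTR3 0xc0010007u

/* Return true if msr is one of the eight K7 PMU MSRs. */
static bool sketch_is_k7_pmu_msr(uint32_t msr)
{
	switch (msr) {
	case SKETCH_MSR_K7_EVNTSEL0 ... SKETCH_MSR_K7_EVNTSEL3:
	case SKETCH_MSR_K7_PERFCTR0 ... SKETCH_MSR_K7_PERFCTR3:
		return true;
	default:
		return false;
	}
}
```

The two range labels replace eight individual case labels, at the cost of relying on the MSR numbers staying contiguous.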


Re: [PATCH V3 0/4] KVM vPMU support for AMD CPUs

2015-04-22 Thread Radim Krčmář
2015-04-18 02:23-0400, Wei Huang:
 Currently KVM only supports vPMU for Intel CPUs. This patchset enables
 KVM vPMU support for AMD platform by creating a common PMU interface for
 x86. By refactoring, PMU related MSR accesses from guest VMs are dispatched
 to corresponding functions defined in arch specific files.
 
 V3:
   * Rebase the code to the latest of KVM tree (queue branch);
   * Branch out the Intel specific code from pmu.c to pmu_intel.c, in order
 to reflect the change history more accurately;
   * Name the parameters/variables more consistently (use msr, idx, 
 pmc_idx) across files;
   * Fix issues (whitespaces, macro names, ...) based on Radim's V2 comments;
   * Fix the MSR_K7_PERFCTRn and MSR_K7_EVNTSELn access code (in patch 4);

I still wasn't happy about the API, especially naming, but didn't find
any bugs in functionality, also

Tested-by: Radim Krčmář rkrc...@redhat.com

I didn't give reviewed-by to all patches, but if we want it fast,
there's no problem in fixing some stuff later.  (Though it usually
doesn't happen and we end up with bad code.)


Re: [PATCH V3 3/4] KVM: x86/vPMU: Implement AMD vPMU code for KVM

2015-04-22 Thread Radim Krčmář
2015-04-18 02:23-0400, Wei Huang:
 This patch replaces the empty AMD vPMU functions (in pmu_amd.c) with real
 implementation.
 
 Signed-off-by: Wei Huang w...@redhat.com
 ---

Reviewed-by: Radim Krčmář rkrc...@redhat.com


Re: [PATCH V3 2/4] KVM: x86/vPMU: Create vPMU interface for VMX and SVM

2015-04-22 Thread Radim Krčmář
2015-04-18 02:23-0400, Wei Huang:
 This patch splits existing vPMU code into a common vPMU interface (pmc.c)
 and Intel specific vPMU code (pmu_intel.c) using the following steps:
 
 - Part of architectural vPMU code is extracted and moved to pmu_intel.c
   file. They are hooked up with the newly-defined intel_pmu_ops, which will
   be called from pmu interface;
 - Create a dummy pmu_amd.c file for AMD SVM with empty functions;
 
 All architectural vPMU functions are now called via PMU function dispatcher
 (kvm_pmu_ops). This function dispatcher is initialized by calling
 kvm_x86_ops->get_pmu_ops() at the beginning. Also note that Intel and AMD
 modules are now generated by combining their corresponding arch files
 (vmx.c/svm.c) and pmu files (pmu_intel.c/pmu_amd.c).
 
 Signed-off-by: Wei Huang w...@redhat.com
 ---
 diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
 @@ -19,83 +18,41 @@
 +/* NOTE:
 + * - Each perf counter is defined as struct kvm_pmc;
 + * - There are two types of perf counters: general purpose (gp) and fixed.
 + *   gp counters are stored in gp_counters[] and fixed counters are stored
 + *   in fixed_counters[] respectively. Both of them are part of struct
 + *   kvm_pmu;
 + * - pmu.c understands the difference between gp counters and fixed counters.
 + *   However AMD doesn't support fixed-counters;
 + * - There are three types of index to access perf counters (PMC):
 + * 1. MSR (named msr): For example Intel has MSR_IA32_PERFCTRn and AMD
 + *has MSR_K7_PERFCTRn.
 + * 2. MSR Index (named idx):

Unless it's named msr :(

 + This normally is used by RDPMC instruction.
 + *For instance AMD RDPMC instruction uses 0000_0003h in ECX to access
 + *C001_0007h (MSR_K7_PERFCTR3). Intel has a similar mechanism, except
 + *that it also supports fixed counters. idx can be used as an index to
 + *gp and fixed counters.
 + * 3. Global PMC Index (named pmc_idx): pmc_idx is an index specific to 
 PMU
 + *code. Each pmc_idx, stored in kvm_pmc.idx field, is unique across
 + *all perf counters (both gp and fixed). The mapping relationship
 + *between pmc_idx and perf counters is as the following:
 + ** Intel: [0 .. INTEL_PMC_MAX_GENERIC-1] = gp counters
 + * [INTEL_PMC_IDX_FIXED .. INTEL_PMC_IDX_FIXED + 2] = fixed
 + ** AMD:   [0 .. AMD64_NUM_COUNTERS-1] = gp counters
 + */

The declaration from [1/4] will hopefully help to show what I dislike:

  struct kvm_pmu_ops {
int (*check_msr)(struct kvm_vcpu *vcpu, unsigned msr);
struct kvm_pmc *(*msr_to_pmc)(struct kvm_vcpu *vcpu, unsigned idx);
bool (*is_pmu_msr)(struct kvm_vcpu *vcpu, u32 msr);
int (*get_msr)(struct kvm_vcpu *vcpu, u32 index, u64 *data);
int (*set_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr_info);
  };

This makes you wonder how to use those two similar checks and what you
gain by converting to pmc ...

There are actually two groups of meaning for msr
  1) check_msr, msr_to_pmc (msr = PMC identifier)
  2) is_pmu_msr, get_msr, set_msr  (msr = MSR identifier)

And even after you know there are two meanings, only the position in
declaration really helps to distinguish them, which is far from what I'd
call good naming for API.
(I think that 'check_msr' goes well with 'get_msr' and 'set_msr', and
 wrappers just prepend 'kvm_pmu_'.)

Any different names (ideally not very similar) would work better.
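
For reference, the pmc_idx mapping from the quoted comment (gp counters at [0 .. MAX_GENERIC-1], fixed counters at [IDX_FIXED .. IDX_FIXED + 2]) can be sketched as follows. All sketch_* names are hypothetical, and the value 32 used for both SKETCH_MAX_GENERIC and SKETCH_IDX_FIXED is an assumption standing in for INTEL_PMC_MAX_GENERIC and INTEL_PMC_IDX_FIXED.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_GENERIC 32	/* assumed stand-in for INTEL_PMC_MAX_GENERIC */
#define SKETCH_IDX_FIXED   32	/* assumed stand-in for INTEL_PMC_IDX_FIXED */
#define SKETCH_MAX_FIXED   3	/* IDX_FIXED .. IDX_FIXED + 2 */

struct sketch_pmc {
	int idx;	/* global pmc_idx of this counter */
};

struct sketch_pmu {
	struct sketch_pmc gp_counters[SKETCH_MAX_GENERIC];
	struct sketch_pmc fixed_counters[SKETCH_MAX_FIXED];
};

/* Map a global pmc_idx to its counter, or NULL if it is out of range. */
static struct sketch_pmc *sketch_pmc_idx_to_pmc(struct sketch_pmu *pmu,
						int pmc_idx)
{
	if (pmc_idx >= 0 && pmc_idx < SKETCH_MAX_GENERIC)
		return &pmu->gp_counters[pmc_idx];
	if (pmc_idx >= SKETCH_IDX_FIXED &&
	    pmc_idx < SKETCH_IDX_FIXED + SKETCH_MAX_FIXED)
		return &pmu->fixed_counters[pmc_idx - SKETCH_IDX_FIXED];
	return NULL;
}
```

An AMD variant would keep only the first branch, since the quoted comment notes that AMD has no fixed counters.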


Re: Non-exiting rdpmc on KVM guests?

2015-04-22 Thread Paolo Bonzini


On 21/04/2015 22:51, Peter Zijlstra wrote:
  However, if you take into account that RDPMC can also be used
  to read an inactive counter, and that multiple guests fight for the
  same host counters, it's even harder to ensure that the guest counter
  indices match those on the host.

 That doesn't make sense, only a single vcpu task will ever run at any
 one time.

Right, but it puts more pressure on the scheduler which could end up
going more often through the slow path.

Paolo


Re: [GIT PULL] First batch of KVM changes for 4.1

2015-04-22 Thread Marcelo Tosatti
On Mon, Apr 20, 2015 at 01:27:58PM -0700, Andy Lutomirski wrote:
 On Mon, Apr 20, 2015 at 9:59 AM, Paolo Bonzini pbonz...@redhat.com wrote:
 
 
  On 17/04/2015 22:18, Marcelo Tosatti wrote:
  The bug which this is fixing is very rare, have no memory of a report.
 
  In fact, it's even difficult to create a synthetic reproducer.
 
  But then why was the task migration notifier even in Jeremy's original
  code for Xen?  Was it supposed to work even on non-synchronized TSC?
 
  If that's the case, then it could be reverted indeed; but then why did
  you commit this patch to 4.1?  Did you think of something that would
  cause the seqcount-like protocol to fail, and that turned out not to be
  the case later?  I was only following the mailing list sparsely in March.
 
 I don't think anyone ever tried that hard to test this stuff.  There
 was an infinite loop that Firefox was triggering as a KVM guest
 somewhat reliably until a couple months ago in the same vdso code.  :(

https://bugzilla.redhat.com/show_bug.cgi?id=1174664

--- Comment #5 from Juan Quintela quint...@redhat.com ---

Another round

# dmesg | grep msr
[0.00] kvm-clock: Using msrs 4b564d01 and 4b564d00
[0.00] kvm-clock: cpu 0, msr 1:1ffd8001, primary cpu clock
[0.00] kvm-stealtime: cpu 0, msr 11fc0d100
[0.041174] kvm-clock: cpu 1, msr 1:1ffd8041, secondary cpu clock
[0.053011] kvm-stealtime: cpu 1, msr 11fc8d100


After start:

[root@trasno yum.repos.d]# virsh qemu-monitor-command --hmp browser  'xp /8x 0x1ffd8000'
1ffd8000: 0x3b401060 0xfffc7f4b 0x3b42d040 0xfffc7f4b
1ffd8010: 0x3b42d460 0xfffc7f4b 0x3b42d4c0 0xfffc7f4b


[root@trasno yum.repos.d]# virsh qemu-monitor-command --hmp browser  'xp /8x 0x1ffd8040'
1ffd8040: 0x3b42d700 0xfffc7f4b 0x3b42d760 0xfffc7f4b
1ffd8050: 0x3b42d7c0 0xfffc7f4b 0x3b42d820 0xfffc7f4b

When firefox hangs

[root@trasno yum.repos.d]# virsh qemu-monitor-command --hmp browser  'xp /8x 0x1ffd8000'
1ffd8000: 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a
1ffd8010: 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a


[root@trasno yum.repos.d]# virsh qemu-monitor-command --hmp browser  'xp /8x 0x1ffd8040'
1ffd8040: 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a
1ffd8050: 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a




Re: [GIT PULL] First batch of KVM changes for 4.1

2015-04-22 Thread Marcelo Tosatti
On Wed, Apr 22, 2015 at 11:01:49PM +0200, Paolo Bonzini wrote:
 
 
 On 22/04/2015 22:56, Marcelo Tosatti wrote:
   But then why was the task migration notifier even in Jeremy's original
   code for Xen? 
  To cover for the vcpu1 -> vcpu2 -> vcpu1 case, I believe.
 
 Ok, to cover it for non-synchronized TSC.  While KVM requires
 synchronized TSC.
 
   If that's the case, then it could be reverted indeed; but then why did
   you commit this patch to 4.1? 
  
  Because it fixes the problem Andy reported (see Subject: KVM: x86: fix
  kvmclock write race (v2) on kvm@). As long as you have Radim's
  fix on top.
 
 But if it's so rare, and it was known that fixing the host protocol was
 just as good a solution, why was the guest fix committed?

I don't know. Should have fixed the host protocol.

 I'm just trying to understand.  I am worried that this patch was rushed
 in; so far I had assumed it wasn't (a revert of a revert is rare enough
 that you don't do it lightly...) but maybe I was wrong.

Yes it was rushed in.

 Right now I cannot even decide whether to revert it (and please Peter in
 the process :)) or submit the Kconfig symbol patch officially.
 
 Paolo


Re: [GIT PULL] First batch of KVM changes for 4.1

2015-04-22 Thread Marcelo Tosatti
On Mon, Apr 20, 2015 at 06:59:04PM +0200, Paolo Bonzini wrote:
 
 
 On 17/04/2015 22:18, Marcelo Tosatti wrote:
  The bug which this is fixing is very rare, have no memory of a report.
  
  In fact, it's even difficult to create a synthetic reproducer.
 
 But then why was the task migration notifier even in Jeremy's original
 code for Xen? 

To cover for the vcpu1 -> vcpu2 -> vcpu1 case, I believe.

 Was it supposed to work even on non-synchronized TSC?

Yes it is supposed to work on non-synchronized TSC.

 If that's the case, then it could be reverted indeed; but then why did
 you commit this patch to 4.1? 

Because it fixes the problem Andy reported (see Subject: KVM: x86: fix
kvmclock write race (v2) on kvm@). As long as you have Radim's
fix on top.

 Did you think of something that would
 cause the seqcount-like protocol to fail, and that turned out not to be
 the case later?  I was only following the mailing list sparsely in March.

No.



Re: [GIT PULL] First batch of KVM changes for 4.1

2015-04-22 Thread Paolo Bonzini


On 22/04/2015 22:56, Marcelo Tosatti wrote:
  But then why was the task migration notifier even in Jeremy's original
  code for Xen? 
 To cover for the vcpu1 -> vcpu2 -> vcpu1 case, I believe.

Ok, to cover it for non-synchronized TSC.  While KVM requires
synchronized TSC.

  If that's the case, then it could be reverted indeed; but then why did
  you commit this patch to 4.1? 
 
 Because it fixes the problem Andy reported (see Subject: KVM: x86: fix
 kvmclock write race (v2) on kvm@). As long as you have Radim's
 fix on top.

But if it's so rare, and it was known that fixing the host protocol was
just as good a solution, why was the guest fix committed?

I'm just trying to understand.  I am worried that this patch was rushed
in; so far I had assumed it wasn't (a revert of a revert is rare enough
that you don't do it lightly...) but maybe I was wrong.

Right now I cannot even decide whether to revert it (and please Peter in
the process :)) or submit the Kconfig symbol patch officially.

Paolo


Re: [GSoC] project proposal

2015-04-22 Thread Stefan Hajnoczi
On Tue, Apr 21, 2015 at 04:07:56PM +0200, Paolo Bonzini wrote:
 On 21/04/2015 16:07, Catalin Vasile wrote:
  I don't get the part with getting cryptodev upstream.
  I don't know what getting cryptodev upstream actually implies.
  From what I know, cryptodev is done (it is a functional project) but was
  rejected from the Linux kernel,
  and there isn't actually a way to get it upstream.
 
 Yes, I agree.

The limitations of AF_ALG need to be addressed somehow, so what is the next
step?

Stefan




Re: [GSoC] project proposal

2015-04-22 Thread Stefan Hajnoczi
On Tue, Apr 21, 2015 at 05:24:55PM +0300, Catalin Vasile wrote:
 Can you give me more details on GnuTLS?
 I'm going through some documentation and code and I see that it
 doesn't actually have separate encryption and authentication
 primitives.

gnutls is a natural choice because QEMU already uses it for TLS, but if
it doesn't support the primitives you need, then AF_ALG could be used
directly.

http://www.gnutls.org/manual/gnutls.html#Using-GnuTLS-as-a-cryptographic-library

Stefan




[RFC PATCH 3/3] kvm/powerpc: report guest steal time in host

2015-04-22 Thread Naveen N. Rao
On powerpc, kvm tracks both the guest steal time as well as the time
when guest was idle and this gets sent in to the guest through DTL. The
guest accounts these entries as either steal time or idle time based on
the last running task. Since the true guest idle status is not visible
to the host, we can't accurately report the guest steal time in the
host.

However, tracking the guest vcpu cede status can get us a reasonable
(within 5% variation) vcpu steal time since guest vcpus cede the
processor on entering the idle task. To do this, we introduce a new
field ceded_st in kvm_vcpu_arch structure to accurately track the guest
vcpu cede status (this is needed since the existing ceded field is
modified before we can use it). During DTL entry creation, we check this
flag and account the time as stolen if the guest vcpu had not ceded.

Tests show that the steal time being reported in the host with this
approach is around 5% higher than the steal time shown in guest. Please
suggest if there are ways to get more accurate steal time information in
the host.
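
The accounting rule described above can be sketched in plain C as follows.
This is only an illustration under stated assumptions: struct vcpu_sketch and
account_dtl_entry() are hypothetical stand-ins, not kernel code, and the
tbacct_lock locking done by the real kvmppc_create_dtl_entry() is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the per-vcpu state in the patch. */
struct vcpu_sketch {
    bool ceded_st;        /* true while the guest vcpu has ceded (is idle) */
    uint64_t busy_stolen; /* off-CPU time accumulated since the last DTL entry */
    uint64_t gstime;      /* steal time accounted to the host-side task */
};

/*
 * When a DTL entry is created, fold the pending stolen time into gstime
 * only if the vcpu had not ceded. Returns the amount accounted as steal.
 */
static uint64_t account_dtl_entry(struct vcpu_sketch *v, uint64_t stolen)
{
    assert(v != NULL);

    stolen += v->busy_stolen;
    v->busy_stolen = 0;

    /* A ceded vcpu was idle, so time off the CPU is not steal time. */
    if (v->ceded_st || stolen == 0)
        return 0;

    v->gstime += stolen;
    return stolen;
}
```

The point of the separate flag is that the decision is taken at DTL entry
creation time, so it has to reflect the cede state at that moment.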

Signed-off-by: Naveen N. Rao naveen.n@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/kvm_host.h | 1 +
 arch/powerpc/kernel/asm-offsets.c   | 1 +
 arch/powerpc/kvm/book3s_hv.c| 2 ++
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 3 +++
 4 files changed, 7 insertions(+)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 8ef0512..7db48c4 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -655,6 +655,7 @@ struct kvm_vcpu_arch {
u64 busy_preempt;
 
u32 emul_inst;
+   u8 ceded_st;
 #endif
 };
 
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 4717859..765c7c4 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -521,6 +521,7 @@ int main(void)
DEFINE(VCPU_DEC_EXPIRES, offsetof(struct kvm_vcpu, arch.dec_expires));
DEFINE(VCPU_PENDING_EXC, offsetof(struct kvm_vcpu, arch.pending_exceptions));
DEFINE(VCPU_CEDED, offsetof(struct kvm_vcpu, arch.ceded));
+   DEFINE(VCPU_CEDED_ST, offsetof(struct kvm_vcpu, arch.ceded_st));
DEFINE(VCPU_PRODDED, offsetof(struct kvm_vcpu, arch.prodded));
DEFINE(VCPU_MMCR, offsetof(struct kvm_vcpu, arch.mmcr));
DEFINE(VCPU_PMC, offsetof(struct kvm_vcpu, arch.pmc));
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index de74756..ad7c0e3 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -545,6 +545,8 @@ static void kvmppc_create_dtl_entry(struct kvm_vcpu *vcpu,
spin_lock_irq(&vcpu->arch.tbacct_lock);
stolen += vcpu->arch.busy_stolen;
vcpu->arch.busy_stolen = 0;
+   if (!vcpu->arch.ceded_st && stolen)
+   (pid_task(vcpu->pid, PIDTYPE_PID))->gstime += stolen;
spin_unlock_irq(&vcpu->arch.tbacct_lock);
if (!dt || !vpa)
return;
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 6cbf163..28f304e 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -873,6 +873,7 @@ deliver_guest_interrupt:
 fast_guest_return:
li  r0,0
stb r0,VCPU_CEDED(r4)   /* cancel cede */
+   stb r0,VCPU_CEDED_ST(r4)/* cancel cede */
mtspr   SPRN_HSRR0,r10
mtspr   SPRN_HSRR1,r11
 
@@ -1889,6 +1890,7 @@ _GLOBAL(kvmppc_h_cede)
std r11,VCPU_MSR(r3)
li  r0,1
stb r0,VCPU_CEDED(r3)
+   stb r0,VCPU_CEDED_ST(r3)
sync/* order setting ceded vs. testing prodded */
lbz r5,VCPU_PRODDED(r3)
cmpwi   r5,0
@@ -2052,6 +2054,7 @@ kvm_cede_prodded:
stb r0,VCPU_PRODDED(r3)
sync/* order testing prodded vs. clearing ceded */
stb r0,VCPU_CEDED(r3)
+   stb r0,VCPU_CEDED_ST(r3)
li  r3,H_SUCCESS
blr
 
-- 
2.3.5

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC PATCH 2/3] kvm/x86: report guest steal time in host

2015-04-22 Thread Naveen N. Rao
Report guest steal time in host task statistics. On x86, this is just
the scheduler run_delay.

Signed-off-by: Naveen N. Rao naveen.n@linux.vnet.ibm.com
---
 arch/x86/kvm/x86.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0ee725f..737b0e4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2094,6 +2094,7 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
 
vcpu->arch.st.steal.steal += vcpu->arch.st.accum_steal;
vcpu->arch.st.steal.version += 2;
+   current->gstime += vcpu->arch.st.accum_steal;
vcpu->arch.st.accum_steal = 0;
 
kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.st.stime,
-- 
2.3.5



[RFC PATCH 0/3] Report guest steal time in host

2015-04-22 Thread Naveen N. Rao
Steal time accounts for the time during which a guest vcpu was ready to
run, but was not scheduled to run by the hypervisor. This is particularly
relevant in cloud environments, where customers would want to use this as an
indicator that their guests are being throttled. However, as it stands today,
guest steal time information is not visible from the hypervisor.

For cloud service providers, this is problematic since they would want to
overcommit cpu resources to achieve optimum resource utilization while at the
same time ensuring guests are not throttled. It is useful for service providers
to have access to the guest steal time data so that they can base their
overcommit/guest packing decisions on this. Higher guest steal time can be used
as a trigger to change how the guests are scheduled, or even migrate guests out
of a system.
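
As a rough illustration of how such a trigger could be computed from two
samples of a cumulative steal counter, here is a minimal C sketch. The names
steal_sample, steal_percent() and should_rebalance() are hypothetical and
the threshold is purely an example; nothing here is part of this patchset.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sample of one vcpu thread, taken by a monitoring tool. */
struct steal_sample {
    uint64_t gstime_ticks; /* cumulative guest steal time, in clock ticks */
    uint64_t wall_ticks;   /* wall-clock time of the sample, in clock ticks */
};

/* Percentage of the interval during which the vcpu was runnable but not run. */
static double steal_percent(struct steal_sample prev, struct steal_sample cur)
{
    assert(cur.wall_ticks >= prev.wall_ticks);
    assert(cur.gstime_ticks >= prev.gstime_ticks);

    uint64_t wall = cur.wall_ticks - prev.wall_ticks;
    if (wall == 0)
        return 0.0; /* no time elapsed, nothing to report */

    return 100.0 * (double)(cur.gstime_ticks - prev.gstime_ticks) / (double)wall;
}

/* Example policy hook: flag a vcpu whose steal exceeds a chosen threshold. */
static int should_rebalance(double steal_pct, double threshold_pct)
{
    return steal_pct > threshold_pct;
}
```

A management stack would sample each vcpu thread periodically and feed the
result into whatever scheduling or migration policy it already has.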

This patchset attempts to make the guest steal times available in the host.
This is achieved by introducing a new field in per-task statistics
(/proc/pid/stat and /proc/pid/task/pid/stat) to accumulate per-vcpu steal
time. Programs (such as pidstat) can then be enhanced to report this
information on a per-thread basis [If there is a better place/way to expose
this, please let me know]. As an example, with pidstat on ppc64:

Guest steal time information using mpstat:
-

[root@rhel7-img ~]# mpstat -P ALL 1
Linux 3.19.0nnr (rhel7-img) 04/15/2015  _ppc64_ (4 CPU)

03:13:23 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
03:13:24 PM  all   12.25    0.00    1.25    0.00    1.00    2.25   13.75    0.00    0.00   69.50
03:13:24 PM    0   46.53    0.00    0.00    0.00    0.00    4.95   45.54    0.00    0.00    2.97
03:13:24 PM    1    0.00    0.00    0.00    0.00    0.00    4.04    3.03    0.00    0.00   92.93
03:13:24 PM    2    0.00    0.00    0.00    0.00    3.96    0.99    2.97    0.00    0.00   92.08
03:13:24 PM    3    3.00    0.00    4.00    0.00    0.00    0.00    4.00    0.00    0.00   89.00

03:13:24 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
03:13:25 PM  all   12.59    0.00    0.00    0.00    0.00    0.25   12.35    0.00    0.00   74.81
03:13:25 PM    0   50.00    0.00    0.00    0.00    0.00    0.98   49.02    0.00    0.00    0.00
03:13:25 PM    1    0.98    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00   99.02
03:13:25 PM    2    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
03:13:25 PM    3    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

03:13:25 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
03:13:26 PM  all   12.99    0.00    0.00    0.00    0.25    0.00   12.75    0.00    0.00   74.02
03:13:26 PM    0   51.96    0.00    0.00    0.00    0.00    0.00   48.04    0.00    0.00    0.00
03:13:26 PM    1    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
03:13:26 PM    2    0.00    0.00    0.00    0.00    0.98    0.00    2.94    0.00    0.00   96.08
03:13:26 PM    3    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

03:13:26 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
03:13:27 PM  all   12.53    0.00    1.00    0.25    0.00    0.25   12.03    0.00    0.00   73.93
03:13:27 PM    0   51.02    0.00    0.00    0.00    0.00    0.00   48.98    0.00    0.00    0.00
03:13:27 PM    1    0.00    0.00    4.04    0.00    0.00    0.00    0.00    0.00    0.00   95.96
03:13:27 PM    2    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
03:13:27 PM    3    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

Average:     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
Average:     all   12.91    0.00    0.54    0.01    0.04    0.12   12.39    0.00    0.00   74.00
Average:       0   51.36    0.00    0.03    0.00    0.03    0.26   48.27    0.00    0.00    0.05
Average:       1    0.02    0.00    1.54    0.02    0.02    0.15    0.36    0.00    0.00   97.89
Average:       2    0.00    0.00    0.52    0.00    0.09    0.02    0.36    0.00    0.00   99.02
Average:       3    0.05    0.00    0.07    0.00    0.02    0.09    0.34    0.00    0.00   99.43

Steal time information in host using (locally modified) pidstat:
---

[naveen@xx sysstat]$ ./pidstat -C qemu -tIu 1
Linux 3.19.0nnr (xx.in.ibm.com) 04/15/2015  _ppc64_ (64 CPU)

04:43:20 AM   UID      TGID       TID    %usr %system  %guest    %CPU  %steal   CPU  Command
04:43:22 AM  1008      3001         -    0.00    0.00   54.21    3.39   45.79    12  qemu-system-ppc
04:43:22 AM  1008         -      3005    0.00    0.00   54.21    3.39    0.00    12  |__qemu-system-ppc

04:43:22 AM   UID  


[RFC PATCH 1/3] procfs: add guest steal time in /proc/pid/stat

2015-04-22 Thread Naveen N. Rao
Introduce a field in /proc/pid/stat to expose guest steal time.
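
For tools that want to consume the new value, a minimal userspace sketch in C
might look like the following. It assumes a kernel carrying this patch, where
the steal time is the final field of the stat line, in clock ticks; on a
kernel without the patch the last field means something else.
parse_last_stat_field() is a hypothetical helper, not an existing API.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse the last space-separated field of a /proc/<pid>/stat line as an
 * unsigned decimal. Returns 0 and stores the value on success, or -1 if
 * the line is empty or the field is not a number. Scanning from the end
 * avoids trouble with spaces inside the "(comm)" field.
 */
static int parse_last_stat_field(const char *line, unsigned long long *out)
{
    assert(line != NULL && out != NULL);

    size_t len = strlen(line);

    /* Drop the trailing newline and any trailing spaces. */
    while (len > 0 && (line[len - 1] == '\n' || line[len - 1] == ' '))
        len--;
    if (len == 0)
        return -1;

    /* Walk back to the start of the last field. */
    size_t start = len;
    while (start > 0 && line[start - 1] != ' ')
        start--;

    /* The field must consist of decimal digits only. */
    for (size_t i = start; i < len; i++)
        if (line[i] < '0' || line[i] > '9')
            return -1;

    errno = 0;
    unsigned long long value = strtoull(line + start, NULL, 10);
    if (errno == ERANGE)
        return -1;

    *out = value;
    return 0;
}
```

The result is in clock ticks; dividing by sysconf(_SC_CLK_TCK) gives seconds.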

Signed-off-by: Naveen N. Rao naveen.n@linux.vnet.ibm.com
---
 fs/proc/array.c   | 6 ++
 include/linux/sched.h | 7 +++
 kernel/fork.c | 2 +-
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/fs/proc/array.c b/fs/proc/array.c
index 1295a00..d86f00e 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -363,6 +363,7 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
unsigned long rsslim = 0;
char tcomm[sizeof(task->comm)];
unsigned long flags;
+   cputime_t gstime;
 
state = *get_task_state(task);
vsize = eip = esp = 0;
@@ -382,6 +383,7 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
sigemptyset(&sigcatch);
cutime = cstime = utime = stime = 0;
cgtime = gtime = 0;
+   gstime = 0;
 
if (lock_task_sighand(task, &flags)) {
struct signal_struct *sig = task->signal;
@@ -410,6 +412,7 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
min_flt += t->min_flt;
maj_flt += t->maj_flt;
gtime += task_gtime(t);
+   gstime += task_gstime(t);
} while_each_thread(task, t);
 
min_flt += sig->min_flt;
@@ -432,6 +435,7 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
maj_flt = task->maj_flt;
task_cputime_adjusted(task, &utime, &stime);
gtime = task_gtime(task);
+   gstime = task_gstime(task);
}
 
/* scale priority and nice values from timeslices to -20..20 */
@@ -505,6 +509,8 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
else
seq_put_decimal_ll(m, ' ', 0);
 
+   seq_put_decimal_ull(m, ' ', cputime_to_clock_t(gstime));
+
seq_putc(m, '\n');
if (mm)
mmput(mm);
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 0eabab9..cb57954 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1429,6 +1429,7 @@ struct task_struct {
 
cputime_t utime, stime, utimescaled, stimescaled;
cputime_t gtime;
+   cputime_t gstime;
 #ifndef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
struct cputime prev_cputime;
 #endif
@@ -1955,6 +1956,12 @@ static inline cputime_t task_gtime(struct task_struct *t)
return t->gtime;
 }
 #endif
+
+static inline cputime_t task_gstime(struct task_struct *t)
+{
+   return t->gstime;
+}
+
 extern void task_cputime_adjusted(struct task_struct *p, cputime_t *ut, cputime_t *st);
 extern void thread_group_cputime_adjusted(struct task_struct *p, cputime_t *ut, cputime_t *st);
 
diff --git a/kernel/fork.c b/kernel/fork.c
index cf65139..529ebe5 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1293,7 +1293,7 @@ static struct task_struct *copy_process(unsigned long clone_flags,
 
init_sigpending(&p->pending);
 
-   p->utime = p->stime = p->gtime = 0;
+   p->utime = p->stime = p->gtime = p->gstime = 0;
p->utimescaled = p->stimescaled = 0;
 #ifndef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
p->prev_cputime.utime = p->prev_cputime.stime = 0;
-- 
2.3.5



Re: [PATCH] KVM: x86: tweak types of fields in kvm_lapic_irq

2015-04-22 Thread Paolo Bonzini


On 22/04/2015 11:35, Radim Krčmář wrote:
  Change the level field to bool, since we assign 1 sometimes, but
  just mask icr_low with APIC_INT_ASSERT in apic_send_ipi.
 
 Would be more consistent to change that assignment instead ...
 If we dropped the idea that struct kvm_lapic_irq fields can be bitORed
 to get the ICR, we could also easily change trig_mode/dest_mode to bool
 level_trig/logical_dest.  (I can do a followup patch.)

Right, I thought of both.  However, level is something that has an
obviously understandable meaning as a bool, while trig_mode/dest_mode as
you said have to be renamed as well.

You're right on the u8 type for vector, too.  But I probably will end up
not committing this patch at all...

Paolo


Re: [PATCH] KVM: x86: tweak types of fields in kvm_lapic_irq

2015-04-22 Thread Radim Krčmář
2015-04-21 19:01+0200, Paolo Bonzini:
 Change to u16 if they only contain data in the low 16 bits.
 
 Change the level field to bool, since we assign 1 sometimes, but
 just mask icr_low with APIC_INT_ASSERT in apic_send_ipi.

Would be more consistent to change that assignment instead ...
If we dropped the idea that struct kvm_lapic_irq fields can be bitORed
to get the ICR, we could also easily change trig_mode/dest_mode to bool
level_trig/logical_dest.  (I can do a followup patch.)
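
To make the trade-off concrete, here is a small standalone C sketch of the
bool variant. It is not the kernel code: struct lapic_irq_sketch and both
helpers are hypothetical, and the mask values are reproduced from memory of
the x86 APIC definitions, so they should be checked against the headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed ICR low-word masks; verify against the kernel headers. */
#define APIC_VECTOR_MASK   0x000FFu
#define APIC_MODE_MASK     0x00700u
#define APIC_DEST_MASK     0x00800u
#define APIC_INT_ASSERT    0x04000u
#define APIC_INT_LEVELTRIG 0x08000u

/* Hypothetical layout using bools for the single-bit fields. */
struct lapic_irq_sketch {
    uint8_t vector;
    uint16_t delivery_mode; /* still kept in ICR bit positions */
    bool logical_dest;
    bool level;
    bool level_trig;
};

static struct lapic_irq_sketch decode_icr_low(uint32_t icr_low)
{
    struct lapic_irq_sketch irq;

    irq.vector = (uint8_t)(icr_low & APIC_VECTOR_MASK);
    irq.delivery_mode = (uint16_t)(icr_low & APIC_MODE_MASK);
    irq.logical_dest = (icr_low & APIC_DEST_MASK) != 0;
    irq.level = (icr_low & APIC_INT_ASSERT) != 0;
    irq.level_trig = (icr_low & APIC_INT_LEVELTRIG) != 0;
    return irq;
}

/* With bools the fields can no longer simply be ORed together. */
static uint32_t encode_icr_low(const struct lapic_irq_sketch *irq)
{
    assert(irq != 0);
    return (uint32_t)irq->vector |
           (uint32_t)irq->delivery_mode |
           (irq->logical_dest ? APIC_DEST_MASK : 0u) |
           (irq->level ? APIC_INT_ASSERT : 0u) |
           (irq->level_trig ? APIC_INT_LEVELTRIG : 0u);
}
```

The cost of the bool fields is the explicit re-encoding step; the benefit is
that callers can assign 1 or true without caring about bit positions.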

 Signed-off-by: Paolo Bonzini pbonz...@redhat.com
 ---
  arch/x86/include/asm/kvm_host.h | 8 
  arch/x86/kvm/lapic.c| 2 +-
  2 files changed, 5 insertions(+), 5 deletions(-)
 
 diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
 index 3a19e30f0be0..dc83b43d0850 100644
 --- a/arch/x86/include/asm/kvm_host.h
 +++ b/arch/x86/include/asm/kvm_host.h
 @@ -689,10 +689,10 @@ struct msr_data {
  
  struct kvm_lapic_irq {
   u32 vector;

Vector can be u8.

 - u32 delivery_mode;
 - u32 dest_mode;
 - u32 level;
 - u32 trig_mode;
 + u16 delivery_mode;
 + u16 dest_mode;
 + bool level;
 + u16 trig_mode;

I'd prefer to have the u8 vector as well, but it works,
Reviewed-by: Radim Krčmář rkrc...@redhat.com

   u32 shorthand;
   u32 dest_id;
  };
 diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
 index abf165330881..ba585d0c42c5 100644
 --- a/arch/x86/kvm/lapic.c
 +++ b/arch/x86/kvm/lapic.c
 @@ -914,7 +914,7 @@ static void apic_send_ipi(struct kvm_lapic *apic)
   irq.vector = icr_low & APIC_VECTOR_MASK;
   irq.delivery_mode = icr_low & APIC_MODE_MASK;
   irq.dest_mode = icr_low & APIC_DEST_MASK;
 - irq.level = icr_low & APIC_INT_ASSERT;
 + irq.level = (icr_low & APIC_INT_ASSERT) != 0;
   irq.trig_mode = icr_low & APIC_INT_LEVELTRIG;
   irq.shorthand = icr_low & APIC_SHORT_MASK;
   if (apic_x2apic_mode(apic))
 -- 
 1.8.3.1
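
For reference, a minimal standalone sketch of why the field widths matter when the fields keep the raw masked ICR bits. The mask values below are assumptions that mirror the x86 ICR layout (vector in bits 0-7, assert in bit 14), and the helper names are made up for illustration; this is not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values mirroring the x86 ICR layout (bits 0-7 vector, bit 14 assert). */
#define ICR_VECTOR_MASK 0x000FFu
#define ICR_INT_ASSERT  0x04000u

/* An 8-bit field silently drops bit 14, so this always returns 0. */
static uint8_t level_as_u8(uint32_t icr_low)
{
	return (uint8_t)(icr_low & ICR_INT_ASSERT);
}

/* A bool field only records whether the bit was set. */
static bool level_as_bool(uint32_t icr_low)
{
	return (icr_low & ICR_INT_ASSERT) != 0;
}

/* The vector occupies the low 8 bits, so a u8 holds it without loss. */
static uint8_t vector_of(uint32_t icr_low)
{
	return (uint8_t)(icr_low & ICR_VECTOR_MASK);
}

/* Hypothetical ICR value: assert bit set, vector 0x31. */
static void icr_examples(void)
{
	assert(level_as_u8(0x4031u) == 0);
	assert(level_as_bool(0x4031u));
	assert(vector_of(0x4031u) == 0x31);
}
```

Fields that keep the masked value as-is (delivery_mode, dest_mode, trig_mode) need at least 16 bits, while vector fits in 8 and level only needs a truth value.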
 


Re: [PATCH v2 06/10] KVM: arm64: guest debug, add SW break point support

2015-04-22 Thread Alex Bennée

Zhichao Huang zhichao.hu...@linaro.org writes:

 On Tue, Mar 31, 2015 at 04:08:04PM +0100, Alex Bennée wrote:
 This adds support for SW breakpoints inserted by userspace.

 We do this by trapping all BKPT exceptions in the
 hypervisor (MDCR_EL2_TDE).

 why should we trap all debug exceptions?

 The trap for cp14 register r/w seems enough to record the relevant
 information to context switch the dbg registers when necessary.

Let's think about the case where the SW breakpoint exception occurs:

If KVM didn't trap it and pass it back to userspace to handle, the
exception would have to be delivered to the guest. The guest, not having
inserted the breakpoint in the first place, would get very confused.

So what we actually do is re-route the exception to the hypervisor and
stop the VM and return to userspace with the debug information. Once in
QEMU we check to see if the SW breakpoint was one of the ones we
inserted at which point control is passed back to the host GDB (attached
via the GDB stub in QEMU). If it is not a breakpoint which was set-up by
the host then it must be one for the guest at which point we need to
ensure the exception is delivered to the guest for it to process.
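
A minimal sketch of that routing decision, assuming userspace keeps a plain table of the addresses where it planted breakpoints. The type and function names are hypothetical and do not correspond to actual QEMU or KVM symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Who should handle a software breakpoint that stopped the VM. */
enum bkpt_owner { BKPT_HOST_DEBUGGER, BKPT_GUEST };

/* Hypothetical table of guest addresses where the host inserted breakpoints. */
struct sw_bkpt_table {
	const uint64_t *addrs;
	size_t count;
};

/* True if the trapping PC matches a breakpoint the host inserted. */
static bool bkpt_is_ours(const struct sw_bkpt_table *t, uint64_t pc)
{
	assert(t != NULL);
	for (size_t i = 0; i < t->count; i++) {
		if (t->addrs[i] == pc)
			return true;
	}
	return false;
}

/* Host-inserted breakpoints go to the debugger; anything else is the guest's. */
static enum bkpt_owner route_sw_bkpt(const struct sw_bkpt_table *t, uint64_t pc)
{
	return bkpt_is_ours(t, pc) ? BKPT_HOST_DEBUGGER : BKPT_GUEST;
}
```

In the BKPT_GUEST case the exception still has to be injected back into the guest, which this sketch leaves out.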

-- 
Alex Bennée


Re: [PATCH v4 8/8] macvtap/tun: add VNET_BE flag

2015-04-22 Thread Greg Kurz
On Tue, 21 Apr 2015 20:30:23 +0200
Michael S. Tsirkin m...@redhat.com wrote:

 On Tue, Apr 21, 2015 at 06:22:20PM +0200, Greg Kurz wrote:
  On Tue, 21 Apr 2015 16:06:33 +0200
  Michael S. Tsirkin m...@redhat.com wrote:
  
   On Fri, Apr 10, 2015 at 12:20:21PM +0200, Greg Kurz wrote:
The VNET_LE flag was introduced to fix accesses to virtio 1.0 headers
that are always little-endian. It can also be used to handle the special
case of a legacy little-endian device implemented by a big-endian host.

Let's add a flag and ioctls for big-endian devices as well. If both flags
are set, little-endian wins.

Since this isn't a common use case, the feature is controlled by a kernel
config option (not set by default).

Both macvtap and tun are covered by this patch since they share the same
API with userland.

Signed-off-by: Greg Kurz gk...@linux.vnet.ibm.com
---
 drivers/net/Kconfig         |   12 
 drivers/net/macvtap.c       |   60 +-
 drivers/net/tun.c           |   62 ++-
 include/uapi/linux/if_tun.h |    2 +
 4 files changed, 134 insertions(+), 2 deletions(-)

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index df51d60..f0e23a0 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -244,6 +244,18 @@ config TUN
 
  If you don't know what to use this for, you don't need it.
 
+config TUN_VNET_BE
+   bool "Support for big-endian vnet headers"
+   default n
+   ---help---
+ This option allows TUN/TAP and MACVTAP device drivers to parse
+ vnet headers that are in big-endian byte order. It is useful
+ when the headers come from a big-endian legacy virtio driver and
+ the host is little-endian.
+
+ Unless you have a little-endian system hosting a big-endian virtual
+ machine with a virtio NIC, you should say N.
+
   
   should mention cross-endian, not big-endian, right?
   
  
  The current TUN_VNET_LE related code is already doing cross-endian: without
  this patch, one can already run a LE guest on a BE host... wouldn't it be
  confusing to mention cross-endian only when the guest is BE ?
 
 Hmm I think no - LE is also useful for virtio 1 - this is what it was
 intended for after all.
 
  What about having a completely distinct implementation for cross-endian that
  doesn't reuse the existing code and defines then ?
 
 I think implementation and interface are fine, just the documentation
 can be improved a bit.
 
 How about:
   Support for cross-endian vnet headers on little-endian kernels.
 
 Accordingly CONFIG_TUN_VNET_CROSS_LE
 
 ?
 

Sure. And what about also renaming the ioctl to TUNSETVNETCROSSLE then ?

--
Greg



Re: [RFC PATCH 0/3] Report guest steal time in host

2015-04-22 Thread Christian Borntraeger
Am 22.04.2015 um 12:24 schrieb Naveen N. Rao:
 Steal time accounts the time duration during which a guest vcpu was ready to
 run, but was not scheduled to run by the hypervisor. This is particularly
 relevant in cloud environment where customers would want to use this as an
 indicator that their guests are being throttled. However, as it stands today,
 guest steal time information is not visible from the hypervisor.
 
 For cloud service providers, this is problematic since they would want to
 overcommit cpu resources to achieve optimum resource utilization while at the
 same time ensuring guests are not throttled. It is useful for service providers
 to have access to the guest steal time data so that they can base their
 overcommit/guest packing decisions on this. Higher guest steal time can be used
 as a trigger to change how the guests are scheduled, or even migrate guests out
 of a system.
 
 This patchset attempts to make the guest steal times available in the host.
 This is achieved by introducing a new field in per-task statistics
 (/proc/<pid>/stat and /proc/<pid>/task/<pid>/stat) to accumulate per-vcpu steal
 time. Programs (such as pidstat) can then be enhanced to report this
 information on a per-thread basis [If there is a better place/way to expose
 this, please let me know]. As an example, with pidstat on ppc64:
 
 Guest steal time information using mpstat:
 ------------------------------------------
 
 [root@rhel7-img ~]# mpstat -P ALL 1
 Linux 3.19.0nnr (rhel7-img)   04/15/2015  _ppc64_ (4 CPU)
 
 03:13:23 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
 03:13:24 PM  all   12.25    0.00    1.25    0.00    1.00    2.25   13.75    0.00    0.00   69.50
 03:13:24 PM    0   46.53    0.00    0.00    0.00    0.00    4.95   45.54    0.00    0.00    2.97
 03:13:24 PM    1    0.00    0.00    0.00    0.00    0.00    4.04    3.03    0.00    0.00   92.93
 03:13:24 PM    2    0.00    0.00    0.00    0.00    3.96    0.99    2.97    0.00    0.00   92.08
 03:13:24 PM    3    3.00    0.00    4.00    0.00    0.00    0.00    4.00    0.00    0.00   89.00
 
 03:13:24 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
 03:13:25 PM  all   12.59    0.00    0.00    0.00    0.00    0.25   12.35    0.00    0.00   74.81
 03:13:25 PM    0   50.00    0.00    0.00    0.00    0.00    0.98   49.02    0.00    0.00    0.00
 03:13:25 PM    1    0.98    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00   99.02
 03:13:25 PM    2    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
 03:13:25 PM    3    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
 
 03:13:25 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
 03:13:26 PM  all   12.99    0.00    0.00    0.00    0.25    0.00   12.75    0.00    0.00   74.02
 03:13:26 PM    0   51.96    0.00    0.00    0.00    0.00    0.00   48.04    0.00    0.00    0.00
 03:13:26 PM    1    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
 03:13:26 PM    2    0.00    0.00    0.00    0.00    0.98    0.00    2.94    0.00    0.00   96.08
 03:13:26 PM    3    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
 
 03:13:26 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
 03:13:27 PM  all   12.53    0.00    1.00    0.25    0.00    0.25   12.03    0.00    0.00   73.93
 03:13:27 PM    0   51.02    0.00    0.00    0.00    0.00    0.00   48.98    0.00    0.00    0.00
 03:13:27 PM    1    0.00    0.00    4.04    0.00    0.00    0.00    0.00    0.00    0.00   95.96
 03:13:27 PM    2    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
 03:13:27 PM    3    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
 
 Average:     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
 Average:     all   12.91    0.00    0.54    0.01    0.04    0.12   12.39    0.00    0.00   74.00
 Average:       0   51.36    0.00    0.03    0.00    0.03    0.26   48.27    0.00    0.00    0.05
 Average:       1    0.02    0.00    1.54    0.02    0.02    0.15    0.36    0.00    0.00   97.89
 Average:       2    0.00    0.00    0.52    0.00    0.09    0.02    0.36    0.00    0.00   99.02
 Average:       3    0.05    0.00    0.07    0.00    0.02    0.09    0.34    0.00    0.00   99.43
 
 Steal time information in host using (locally modified) pidstat:
 ----------------------------------------------------------------
 
 [naveen@xx sysstat]$ ./pidstat -C qemu -tIu 1
 Linux 3.19.0nnr (xx.in.ibm.com)   04/15/2015  _ppc64_ (64 CPU)
 
 04:43:20 AM   UID      TGID       TID    %usr %system  %guest    %CPU  %steal   CPU  Command
 04:43:22 AM  1008      3001         -    0.00    0.00   54.21    3.39   45.79
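
As background for the guest-side %steal column above, here is a rough sketch of how such a percentage can be derived from two samples of the aggregate "cpu" line in /proc/stat, where steal is the eighth counter. The struct and function names are made up for illustration, and the real mpstat computation may differ in detail.

```c
#include <assert.h>
#include <stdio.h>

/* First eight counters of the aggregate "cpu" line in /proc/stat, in ticks. */
struct cpu_sample {
	unsigned long long user, nice, system, idle, iowait, irq, softirq, steal;
};

/* Parse the aggregate "cpu ..." line; returns 0 on success, -1 otherwise. */
static int parse_cpu_line(const char *line, struct cpu_sample *s)
{
	assert(line != NULL && s != NULL);
	int n = sscanf(line, "cpu %llu %llu %llu %llu %llu %llu %llu %llu",
		       &s->user, &s->nice, &s->system, &s->idle,
		       &s->iowait, &s->irq, &s->softirq, &s->steal);
	return n == 8 ? 0 : -1;
}

/* Guest time is already folded into user/nice, so it is not added again. */
static unsigned long long total_ticks(const struct cpu_sample *s)
{
	return s->user + s->nice + s->system + s->idle +
	       s->iowait + s->irq + s->softirq + s->steal;
}

/* Share of the interval between two samples that was stolen, in percent. */
static double steal_percent(const struct cpu_sample *before,
			    const struct cpu_sample *after)
{
	unsigned long long dt = total_ticks(after) - total_ticks(before);

	if (dt == 0)
		return 0.0;
	return 100.0 * (double)(after->steal - before->steal) / (double)dt;
}
```

The parser is only meant for the aggregate line; per-CPU lines such as "cpu0 ..." would need their own format string.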
 

Re: [RFC PATCH 0/3] Report guest steal time in host

2015-04-22 Thread Naveen N. Rao
On 2015/04/22 01:05PM, Christian Borntraeger wrote:
 Am 22.04.2015 um 12:24 schrieb Naveen N. Rao:
  Steal time accounts the time duration during which a guest vcpu was ready to
  run, but was not scheduled to run by the hypervisor. This is particularly
  relevant in cloud environment where customers would want to use this as an
  indicator that their guests are being throttled. However, as it stands today,
  guest steal time information is not visible from the hypervisor.
  
  For cloud service providers, this is problematic since they would want to
  overcommit cpu resources to achieve optimum resource utilization while at the
  same time ensuring guests are not throttled. It is useful for service providers
  to have access to the guest steal time data so that they can base their
  overcommit/guest packing decisions on this. Higher guest steal time can be used
  as a trigger to change how the guests are scheduled, or even migrate guests out
  of a system.
  
  This patchset attempts to make the guest steal times available in the host.
  This is achieved by introducing a new field in per-task statistics
  (/proc/<pid>/stat and /proc/<pid>/task/<pid>/stat) to accumulate per-vcpu steal
  time. Programs (such as pidstat) can then be enhanced to report this
  information on a per-thread basis [If there is a better place/way to expose
  this, please let me know]. As an example, with pidstat on ppc64:
  
  Guest steal time information using mpstat:
  ------------------------------------------
  
  [root@rhel7-img ~]# mpstat -P ALL 1
  Linux 3.19.0nnr (rhel7-img) 04/15/2015  _ppc64_ (4 CPU)
  
  03:13:23 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
  03:13:24 PM  all   12.25    0.00    1.25    0.00    1.00    2.25   13.75    0.00    0.00   69.50
  03:13:24 PM    0   46.53    0.00    0.00    0.00    0.00    4.95   45.54    0.00    0.00    2.97
  03:13:24 PM    1    0.00    0.00    0.00    0.00    0.00    4.04    3.03    0.00    0.00   92.93
  03:13:24 PM    2    0.00    0.00    0.00    0.00    3.96    0.99    2.97    0.00    0.00   92.08
  03:13:24 PM    3    3.00    0.00    4.00    0.00    0.00    0.00    4.00    0.00    0.00   89.00
  
  03:13:24 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
  03:13:25 PM  all   12.59    0.00    0.00    0.00    0.00    0.25   12.35    0.00    0.00   74.81
  03:13:25 PM    0   50.00    0.00    0.00    0.00    0.00    0.98   49.02    0.00    0.00    0.00
  03:13:25 PM    1    0.98    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00   99.02
  03:13:25 PM    2    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
  03:13:25 PM    3    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
  
  03:13:25 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
  03:13:26 PM  all   12.99    0.00    0.00    0.00    0.25    0.00   12.75    0.00    0.00   74.02
  03:13:26 PM    0   51.96    0.00    0.00    0.00    0.00    0.00   48.04    0.00    0.00    0.00
  03:13:26 PM    1    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
  03:13:26 PM    2    0.00    0.00    0.00    0.00    0.98    0.00    2.94    0.00    0.00   96.08
  03:13:26 PM    3    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
  
  03:13:26 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
  03:13:27 PM  all   12.53    0.00    1.00    0.25    0.00    0.25   12.03    0.00    0.00   73.93
  03:13:27 PM    0   51.02    0.00    0.00    0.00    0.00    0.00   48.98    0.00    0.00    0.00
  03:13:27 PM    1    0.00    0.00    4.04    0.00    0.00    0.00    0.00    0.00    0.00   95.96
  03:13:27 PM    2    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
  03:13:27 PM    3    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
  
  Average:     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
  Average:     all   12.91    0.00    0.54    0.01    0.04    0.12   12.39    0.00    0.00   74.00
  Average:       0   51.36    0.00    0.03    0.00    0.03    0.26   48.27    0.00    0.00    0.05
  Average:       1    0.02    0.00    1.54    0.02    0.02    0.15    0.36    0.00    0.00   97.89
  Average:       2    0.00    0.00    0.52    0.00    0.09    0.02    0.36    0.00    0.00   99.02
  Average:       3    0.05    0.00    0.07    0.00    0.02    0.09    0.34    0.00    0.00   99.43
  
  Steal time information in host using (locally modified) pidstat:
  ---
  
  [naveen@xx sysstat]$ ./pidstat -C qemu -tIu 1
  Linux 3.19.0nnr (xx.in.ibm.com) 04/15/2015  _ppc64_ (64 CPU)
  
  04:43:20 

Re: [PATCH v4 7/8] vhost: feature to set the vring endianness

2015-04-22 Thread Michael S. Tsirkin
On Wed, Apr 22, 2015 at 11:08:54AM +0200, Greg Kurz wrote:
 On Tue, 21 Apr 2015 20:25:03 +0200
 Michael S. Tsirkin m...@redhat.com wrote:
 [ ... ]
 @@ -630,6 +634,53 @@ static long vhost_set_memory(struct vhost_dev *d, struct vhost_memory __user *m)
   return 0;
  }
  
 +#ifdef CONFIG_VHOST_SET_ENDIAN_LEGACY
 +static long vhost_set_vring_big_endian(struct vhost_virtqueue *vq,
 +int __user *argp)
 +{
 + struct vhost_vring_state s;
 +
 + if (vq->private_data)
 + return -EBUSY;
 +
 + if (copy_from_user(&s, argp, sizeof(s)))
 + return -EFAULT;
 +
 + if (s.num && s.num != 1)

s.num & ~0x1

   
   Since s.num is unsigned and I assume this won't change, what about
   s.num > 1 as suggested by Cornelia ?
  
  I just tried and gcc optimizes
  s.num != 0 && s.num != 1 to s.num > 1
  
  The former will be more readable once we
  replace 0 and 1 with defines.
  
  So ignore my advice, keep code as is but use defines.
  
 
 Ok.
 
 [ ... ] 
 --- a/include/uapi/linux/vhost.h
 +++ b/include/uapi/linux/vhost.h
 @@ -103,6 +103,15 @@ struct vhost_memory {
  /* Get accessor: reads index, writes value in num */
  #define VHOST_GET_VRING_BASE _IOWR(VHOST_VIRTIO, 0x12, struct vhost_vring_state)
  
 +/* Set the vring byte order in num. This is a legacy only API that is simply
 + * ignored when VIRTIO_F_VERSION_1 is set.
 + * 0 to set to little-endian
 + * 1 to set to big-endian

How about defines for these?

   
   Ok. I'll put the defines here so that all the cross-endian stuff
   lies in the same hunk. Is it ok for you ?
  
  Fine.
  
 + * other values return EINVAL.
  
  Please also add a note saying that not all kernel configurations support this
  ioctl, but all configurations that support SET also support GET.
  
 
 Ok.
 
 + */
 +#define VHOST_SET_VRING_BIG_ENDIAN _IOW(VHOST_VIRTIO, 0x13, struct vhost_vring_state)
 +#define VHOST_GET_VRING_BIG_ENDIAN _IOW(VHOST_VIRTIO, 0x14, struct vhost_vring_state)
 +
  /* The following ioctls use eventfd file descriptors to signal and poll
   * for events. */
  

  
  I'm inclined to think VHOST_SET_VRING_ENDIAN is a slightly better name.
  What do you think?
  
 
 Or VHOST_SET_VRING_CROSS_ENDIAN ? I like the idea to keep a hint that this
 API is for cross-endian only... like the rest of this series.
 
 --
 Greg

I think VHOST_SET_VRING_CROSS_ENDIAN is not a good name -
it would imply 1 for cross endian, 0 for native endian.
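
A minimal sketch of the check discussed above once the magic numbers become defines, assuming 0 selects little-endian and 1 selects big-endian as in the patch comment. The define and function names here are placeholders for illustration; the thread has not settled on any names.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder names for the two accepted values of vhost_vring_state.num. */
#define SKETCH_VRING_LITTLE_ENDIAN 0u
#define SKETCH_VRING_BIG_ENDIAN    1u

/* Validate num and report the requested byte order; other values are rejected. */
static int vring_endian_from_num(unsigned int num, bool *big_endian)
{
	assert(big_endian != NULL);
	if (num != SKETCH_VRING_LITTLE_ENDIAN && num != SKETCH_VRING_BIG_ENDIAN)
		return -EINVAL;
	*big_endian = (num == SKETCH_VRING_BIG_ENDIAN);
	return 0;
}
```

With named values the comparison reads as a whitelist of byte orders, and the output is left untouched when the input is rejected.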

-- 
MST


Re: [GSoC] project proposal

2015-04-22 Thread Catalin Vasile
On Wed, Apr 22, 2015 at 11:20 AM, Stefan Hajnoczi stefa...@gmail.com wrote:
 On Tue, Apr 21, 2015 at 04:07:56PM +0200, Paolo Bonzini wrote:
 On 21/04/2015 16:07, Catalin Vasile wrote:
  I don't get the part with getting cryptodev upstream.
  I don't know what getting cryptodev upstream actually implies.
  From what I know, cryptodev is a finished, functional project that was
  rejected by the Linux kernel,
  and there isn't actually a way to get it upstream.

 Yes, I agree.

 The limitations of AF_ALG need to be addressed somehow, so what is the next
 step?

 Stefan

If we want a mainstream userspace backend that can interact with a
lot of crypto engines, we could use OpenSSL (it can actually use
cryptodev and AF_ALG as engines).
For now, until mid-June (my diploma project presentation), I still want
to use vhost as a backend, for the sole purpose of having a finished
backend that I now have a good grasp of.
Whether the finished work is good enough to be merged upstream
can be discussed later.
As a GSoC project, OpenSSL as a backend would continue the
virtio-crypto development, as it's not uncommon to have multiple types
of backends.
The current work on the virtio-crypto QEMU and guest module is pretty
backend agnostic, and could allow future development (use of other
backends and other features).


Re: [PATCH v4 7/8] vhost: feature to set the vring endianness

2015-04-22 Thread Greg Kurz
On Tue, 21 Apr 2015 20:25:03 +0200
Michael S. Tsirkin m...@redhat.com wrote:
[ ... ]
@@ -630,6 +634,53 @@ static long vhost_set_memory(struct vhost_dev *d, struct vhost_memory __user *m)
return 0;
 }
 
+#ifdef CONFIG_VHOST_SET_ENDIAN_LEGACY
+static long vhost_set_vring_big_endian(struct vhost_virtqueue *vq,
+  int __user *argp)
+{
+   struct vhost_vring_state s;
+
+   if (vq->private_data)
+   return -EBUSY;
+
+   if (copy_from_user(&s, argp, sizeof(s)))
+   return -EFAULT;
+
+   if (s.num && s.num != 1)
   
   s.num & ~0x1
   
  
  Since s.num is unsigned and I assume this won't change, what about
  s.num > 1 as suggested by Cornelia ?
 
 I just tried and gcc optimizes
 s.num != 0 && s.num != 1 to s.num > 1
 
 The former will be more readable once we
 replace 0 and 1 with defines.
 
 So ignore my advice, keep code as is but use defines.
 

Ok.

[ ... ] 
--- a/include/uapi/linux/vhost.h
+++ b/include/uapi/linux/vhost.h
@@ -103,6 +103,15 @@ struct vhost_memory {
 /* Get accessor: reads index, writes value in num */
 #define VHOST_GET_VRING_BASE _IOWR(VHOST_VIRTIO, 0x12, struct vhost_vring_state)
 
+/* Set the vring byte order in num. This is a legacy only API that is simply
+ * ignored when VIRTIO_F_VERSION_1 is set.
+ * 0 to set to little-endian
+ * 1 to set to big-endian
   
   How about defines for these?
   
  
  Ok. I'll put the defines here so that all the cross-endian stuff
  lies in the same hunk. Is it ok for you ?
 
 Fine.
 
+ * other values return EINVAL.
 
 Please also add a note saying that not all kernel configurations support this
 ioctl, but all configurations that support SET also support GET.
 

Ok.

+ */
+#define VHOST_SET_VRING_BIG_ENDIAN _IOW(VHOST_VIRTIO, 0x13, struct vhost_vring_state)
+#define VHOST_GET_VRING_BIG_ENDIAN _IOW(VHOST_VIRTIO, 0x14, struct vhost_vring_state)
+
 /* The following ioctls use eventfd file descriptors to signal and poll
  * for events. */
 
   
 
 I'm inclined to think VHOST_SET_VRING_ENDIAN is a slightly better name.
 What do you think?
 

Or VHOST_SET_VRING_CROSS_ENDIAN ? I like the idea to keep a hint that this
API is for cross-endian only... like the rest of this series.

--
Greg



Re: [GSoC] project proposal

2015-04-22 Thread Catalin Vasile
In those examples the algorithms are used as part of standard protocols,
not as standalone algorithms.
CryptoAPI itself offers basic primitives, such as encryption and
authentication, which can be combined however you like.
Some combinations can result in other protocol implementations as well.

On Wed, Apr 22, 2015 at 11:27 AM, Stefan Hajnoczi stefa...@gmail.com wrote:
 On Tue, Apr 21, 2015 at 05:24:55PM +0300, Catalin Vasile wrote:
 Can you give me more details on GnuTLS?
 I'm going through some documentation and code and I see that it
 doesn't actually have separate encryption and authentication
 primitives.

 gnutls is a natural choice because QEMU already uses it for TLS, but if
 it doesn't support the primitives you need, then AF_ALG could be used
 directly.

 http://www.gnutls.org/manual/gnutls.html#Using-GnuTLS-as-a-cryptographic-library

 Stefan