date:20121004

RE: [PATCH 1/6] KVM: PPC: booke: use vcpu reference from thread_struct

2012-10-04 Thread Bhushan Bharat-R65777



 -Original Message-
 From: Alexander Graf [mailto:ag...@suse.de]
 Sent: Monday, September 24, 2012 9:58 PM
 To: Bhushan Bharat-R65777
 Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; Bhushan Bharat-R65777
 Subject: Re: [PATCH 1/6] KVM: PPC: booke: use vcpu reference from 
 thread_struct
 
 
 On 21.08.2012, at 15:51, Bharat Bhushan wrote:
 
  Like other places, use thread_struct to get vcpu reference.
 
 Please remove the definition of SPRN_SPRG_R/WVCPU as well.

Ok

Thanks
-Bharat

 
 
 Alex
 
 
  Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
  ---
  arch/powerpc/kernel/asm-offsets.c   |2 +-
  arch/powerpc/kvm/booke_interrupts.S |6 ++
  2 files changed, 3 insertions(+), 5 deletions(-)
 
  diff --git a/arch/powerpc/kernel/asm-offsets.c
  b/arch/powerpc/kernel/asm-offsets.c
  index 85b05c4..fbb999c 100644
  --- a/arch/powerpc/kernel/asm-offsets.c
  +++ b/arch/powerpc/kernel/asm-offsets.c
  @@ -116,7 +116,7 @@ int main(void)
  #ifdef CONFIG_KVM_BOOK3S_32_HANDLER
  DEFINE(THREAD_KVM_SVCPU, offsetof(struct thread_struct,
  kvm_shadow_vcpu)); #endif -#ifdef CONFIG_KVM_BOOKE_HV
  +#if defined(CONFIG_KVM)  defined(CONFIG_BOOKE)
  DEFINE(THREAD_KVM_VCPU, offsetof(struct thread_struct, kvm_vcpu));
  #endif
 
  diff --git a/arch/powerpc/kvm/booke_interrupts.S
  b/arch/powerpc/kvm/booke_interrupts.S
  index bb46b32..ca16d57 100644
  --- a/arch/powerpc/kvm/booke_interrupts.S
  +++ b/arch/powerpc/kvm/booke_interrupts.S
  @@ -56,7 +56,8 @@
  _GLOBAL(kvmppc_handler_\ivor_nr)
  /* Get pointer to vcpu and record exit number. */
  mtspr   \scratch , r4
  -   mfspr   r4, SPRN_SPRG_RVCPU
  +   mfspr   r4, SPRN_SPRG_THREAD
  +   lwz r4, THREAD_KVM_VCPU(r4)
  stw r3, VCPU_GPR(R3)(r4)
  stw r5, VCPU_GPR(R5)(r4)
  stw r6, VCPU_GPR(R6)(r4)
  @@ -402,9 +403,6 @@ lightweight_exit:
  lwz r8, kvmppc_booke_handlers@l(r8)
  mtspr   SPRN_IVPR, r8
 
  -   /* Save vcpu pointer for the exception handlers. */
  -   mtspr   SPRN_SPRG_WVCPU, r4
  -
  lwz r5, VCPU_SHARED(r4)
 
  /* Can't switch the stack pointer until after IVPR is switched,
  --
  1.7.0.4
 
 
 


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/3] virtio-net: inline header support

2012-10-04 Thread Paolo Bonzini

Il 04/10/2012 02:11, Rusty Russell ha scritto:
   There's a reason I haven't done this.  I really, really dislike my
   implemention isn't broken feature bits.  We could have an infinite
   number of them, for each bug in each device.
 
  However, this bug affects (almost) all implementations and (almost) all
  devices.  It even makes sense to reserve a transport feature bit for it
  instead of a device feature bit.

 Perhaps, but we have to fix the bugs first!

Yes. :)  Isn't that what mst's patch does?

 As I said, my torture patch broke qemu immediately.  Since noone has
 leapt onto fixing that, I'll take a look now...

I can look at virtio-scsi.

Paolo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

RE: [PATCH 4/6] KVM: PPC: debug stub interface parameter defined

2012-10-04 Thread Bhushan Bharat-R65777



 -Original Message-
 From: Alexander Graf [mailto:ag...@suse.de]
 Sent: Monday, September 24, 2012 9:09 PM
 To: Bhushan Bharat-R65777
 Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; Bhushan Bharat-R65777
 Subject: Re: [PATCH 4/6] KVM: PPC: debug stub interface parameter defined
 
 
 On 21.08.2012, at 15:51, Bharat Bhushan wrote:
 
  This patch defines the interface parameter for KVM_SET_GUEST_DEBUG
  ioctl support. Follow up patches will use this for setting up hardware
  breakpoints, watchpoints and software breakpoints.
 
  Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
  ---
  arch/powerpc/include/asm/kvm.h |   33 +
  arch/powerpc/kvm/book3s.c  |6 ++
  arch/powerpc/kvm/booke.c   |6 ++
  arch/powerpc/kvm/powerpc.c |6 --
  4 files changed, 45 insertions(+), 6 deletions(-)
 
  diff --git a/arch/powerpc/include/asm/kvm.h
  b/arch/powerpc/include/asm/kvm.h index 3c14202..61b197e 100644
  --- a/arch/powerpc/include/asm/kvm.h
  +++ b/arch/powerpc/include/asm/kvm.h
  @@ -269,8 +269,41 @@ struct kvm_debug_exit_arch {
 
  /* for KVM_SET_GUEST_DEBUG */
  struct kvm_guest_debug_arch {
  +   struct {
  +   /* H/W breakpoint/watchpoint address */
  +   __u64 addr;
  +   /*
  +* Type denotes h/w breakpoint, read watchpoint, write
  +* watchpoint or watchpoint (both read and write).
  +*/
  +#define KVMPPC_DEBUG_NOTYPE0x0
  +#define KVMPPC_DEBUG_BREAKPOINT(1UL  1)
  +#define KVMPPC_DEBUG_WATCH_WRITE   (1UL  2)
  +#define KVMPPC_DEBUG_WATCH_READ(1UL  3)
  +   __u32 type;
  +   __u32 pad1;
 
 Why the padding?

Not sure why, I will remove this.

 
  +   __u64 pad2;
  +   } bp[16];
 
 Why 16?

I think for now 6 (4 iac + 2 dac) is sufficient for BOOKE. We kept 16 to have 
some room for future and other platforms.

Thanks
-Bharat
 
  };
 
  +/* Debug related defines */
  +/*
  + * kvm_guest_debug-control is a 32 bit field. The lower 16 bits are
  +generic
  + * and upper 16 bits are architecture specific. Architecture specific
  +defines
  + * that ioctl is for setting hardware breakpoint or software breakpoint.
  + */
  +#define KVM_GUESTDBG_USE_SW_BP 0x0001
  +#define KVM_GUESTDBG_USE_HW_BP 0x0002
  +
  +/* When setting software breakpoint, Change the software breakpoint
  + * instruction to special trap instruction and set
  +KVM_GUESTDBG_USE_SW_BP
  + * flag in kvm_guest_debug-control. KVM does keep track of software
  + * breakpoints. So when KVM_GUESTDBG_USE_SW_BP flag is set and
  +special trap
  + * instruction is executed by guest then exit to userspace.
  + * NOTE: A Nice interface can be added to get the special trap instruction.
  + */
  +#define KVMPPC_INST_GUEST_GDB  0x7C00021C  /* ehpriv OC=0 
  */
 
 This definitely has to be passed to user space (which writes that instruction
 into guest phys memory). Other PPC subarchs will use different instructions.
 Just model it as a read-only ONE_REG.
 
 
 Alex
 


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] hw: Add test device for unittests execution

2012-10-04 Thread Paolo Bonzini

Il 04/10/2012 05:49, Lucas Meneghel Rodrigues ha scritto:
 Add a test device which supports the kvmctl ioports,
 so one can run the KVM unittest suite [1].
 
 Usage:
 
 qemu -device testdev
 
 1) Removed port 0xf1, since now kvm-unit-tests use
serial
 
 2) Removed exit code port 0xf4, since that can be
replaced by
 
 -device isa-debugexit,iobase=0xf4,access-size=2

Can you include this in your series?

 3) Removed ram size port 0xd1, since guest memory
size can be retrieved from firmware, there's a
patch for kvm-unit-tests including an API to
retrieve that value.
 
 [1] Preliminary versions of this patch were posted
 to the mailing list about a year ago, I re-read the
 comments of the thread, and had guidance from
 Paolo about which ports to remove from the test
 device.
 
 CC: Paolo Bonzini pbonz...@redhat.com
 Signed-off-by: Gerd Hoffmann kra...@redhat.com
 Signed-off-by: Avi Kivity a...@redhat.com
 Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
 Signed-off-by: Lucas Meneghel Rodrigues l...@redhat.com
 ---
  hw/i386/Makefile.objs |   1 +
  hw/testdev.c  | 131 
 ++
  2 files changed, 132 insertions(+)
  create mode 100644 hw/testdev.c
 
 diff --git a/hw/i386/Makefile.objs b/hw/i386/Makefile.objs
 index 8c764bb..64d2787 100644
 --- a/hw/i386/Makefile.objs
 +++ b/hw/i386/Makefile.objs
 @@ -11,5 +11,6 @@ obj-$(CONFIG_XEN_PCI_PASSTHROUGH) += xen-host-pci-device.o
  obj-$(CONFIG_XEN_PCI_PASSTHROUGH) += xen_pt.o xen_pt_config_init.o 
 xen_pt_msi.o
  obj-y += kvm/
  obj-$(CONFIG_SPICE) += qxl.o qxl-logger.o qxl-render.o
 +obj-y += testdev.o
  
  obj-y := $(addprefix ../,$(obj-y))
 diff --git a/hw/testdev.c b/hw/testdev.c
 new file mode 100644
 index 000..44070f2
 --- /dev/null
 +++ b/hw/testdev.c
 @@ -0,0 +1,131 @@
 +#include sys/mman.h
 +#include hw.h
 +#include qdev.h
 +#include isa.h
 +
 +struct testdev {
 +ISADevice dev;
 +MemoryRegion iomem;
 +CharDriverState *chr;

This member is not necessary anymore.

Paolo

 +};
 +
 +#define TYPE_TESTDEV testdev
 +#define TESTDEV(obj) \
 + OBJECT_CHECK(struct testdev, (obj), TYPE_TESTDEV)
 +
 +static void test_device_irq_line(void *opaque, uint32_t addr, uint32_t data)
 +{
 +struct testdev *dev = opaque;
 +
 +qemu_set_irq(isa_get_irq(dev-dev, addr - 0x2000), !!data);
 +}
 +
 +static uint32 test_device_ioport_data;
 +
 +static void test_device_ioport_write(void *opaque, uint32_t addr, uint32_t 
 data)
 +{
 +test_device_ioport_data = data;
 +}
 +
 +static uint32_t test_device_ioport_read(void *opaque, uint32_t addr)
 +{
 +return test_device_ioport_data;
 +}
 +
 +static void test_device_flush_page(void *opaque, uint32_t addr, uint32_t 
 data)
 +{
 +target_phys_addr_t len = 4096;
 +void *a = cpu_physical_memory_map(data  ~0xffful, len, 0);
 +
 +mprotect(a, 4096, PROT_NONE);
 +mprotect(a, 4096, PROT_READ|PROT_WRITE);
 +cpu_physical_memory_unmap(a, len, 0, 0);
 +}
 +
 +static char *iomem_buf;
 +
 +static uint32_t test_iomem_readb(void *opaque, target_phys_addr_t addr)
 +{
 +return iomem_buf[addr];
 +}
 +
 +static uint32_t test_iomem_readw(void *opaque, target_phys_addr_t addr)
 +{
 +return *(uint16_t*)(iomem_buf + addr);
 +}
 +
 +static uint32_t test_iomem_readl(void *opaque, target_phys_addr_t addr)
 +{
 +return *(uint32_t*)(iomem_buf + addr);
 +}
 +
 +static void test_iomem_writeb(void *opaque, target_phys_addr_t addr, 
 uint32_t val)
 +{
 +iomem_buf[addr] = val;
 +}
 +
 +static void test_iomem_writew(void *opaque, target_phys_addr_t addr, 
 uint32_t val)
 +{
 +*(uint16_t*)(iomem_buf + addr) = val;
 +}
 +
 +static void test_iomem_writel(void *opaque, target_phys_addr_t addr, 
 uint32_t val)
 +{
 +*(uint32_t*)(iomem_buf + addr) = val;
 +}
 +
 +static const MemoryRegionOps test_iomem_ops = {
 +.old_mmio = {
 +.read = { test_iomem_readb, test_iomem_readw, test_iomem_readl, },
 +.write = { test_iomem_writeb, test_iomem_writew, test_iomem_writel, 
 },
 +},
 +.endianness = DEVICE_LITTLE_ENDIAN,
 +};
 +
 +static int init_test_device(ISADevice *isa)
 +{
 +struct testdev *dev = DO_UPCAST(struct testdev, dev, isa);
 +
 +register_ioport_read(0xe0, 1, 1, test_device_ioport_read, dev);
 +register_ioport_write(0xe0, 1, 1, test_device_ioport_write, dev);
 +register_ioport_read(0xe0, 1, 2, test_device_ioport_read, dev);
 +register_ioport_write(0xe0, 1, 2, test_device_ioport_write, dev);
 +register_ioport_read(0xe0, 1, 4, test_device_ioport_read, dev);
 +register_ioport_write(0xe0, 1, 4, test_device_ioport_write, dev);
 +register_ioport_write(0xe4, 1, 4, test_device_flush_page, dev);
 +register_ioport_write(0x2000, 24, 1, test_device_irq_line, NULL);
 +iomem_buf = g_malloc0(0x1);
 +memory_region_init_io(dev-iomem, test_iomem_ops, dev,
 +  testdev, 0x1);
 +memory_region_add_subregion(isa_address_space(dev-dev), 0xff00,
 +

Re: [PATCH RFC 1/2] kvm: Handle undercommitted guest case in PLE handler

2012-10-04 Thread Gleb Natapov

On Wed, Oct 03, 2012 at 04:56:57PM +0200, Avi Kivity wrote:
 On 10/03/2012 04:17 PM, Raghavendra K T wrote:
  * Avi Kivity a...@redhat.com [2012-09-30 13:13:09]:
  
  On 09/30/2012 01:07 PM, Gleb Natapov wrote:
   On Sun, Sep 30, 2012 at 10:18:17AM +0200, Avi Kivity wrote:
   On 09/28/2012 08:16 AM, Raghavendra K T wrote:

   
+struct pv_sched_info {
+   unsigned long   sched_bitmap;

Thinking, whether we need something similar to cpumask here?
Only thing is we are representing guest (v)cpumask.

   
   DECLARE_BITMAP(sched_bitmap, KVM_MAX_VCPUS)
   
   vcpu_id can be greater than KVM_MAX_VCPUS.
  
  Use the index into the vcpu table as the bitmap index then.  In fact
  it's better because then the lookup to get the vcpu pointer is trivial.
  
  Did you mean, while setting the bitmap,
  
  we should do 
  for (i = 1..n)
  if (kvm-vcpus[i] == vcpu) set ith position in bitmap?
 
 You can store i in the vcpu itself:
 
   set_bit(vcpu-index, kvm-preempted);
 
This will make the fact that vcpus are stored in an array into API
instead of implementation detail :( There were patches for vcpu
destruction that changed the array to rculist. Well, it will be still
possible to make the array rcu protected and copy it every time vcpu is
deleted/added I guess.

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

RE: [PATCH 4/6] KVM: PPC: debug stub interface parameter defined

2012-10-04 Thread Bhushan Bharat-R65777



 -Original Message-
 From: Alexander Graf [mailto:ag...@suse.de]
 Sent: Monday, September 24, 2012 9:09 PM
 To: Bhushan Bharat-R65777
 Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; Bhushan Bharat-R65777
 Subject: Re: [PATCH 4/6] KVM: PPC: debug stub interface parameter defined
 
 
 On 21.08.2012, at 15:51, Bharat Bhushan wrote:
 
  This patch defines the interface parameter for KVM_SET_GUEST_DEBUG
  ioctl support. Follow up patches will use this for setting up hardware
  breakpoints, watchpoints and software breakpoints.
 
  Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
  ---
  arch/powerpc/include/asm/kvm.h |   33 +
  arch/powerpc/kvm/book3s.c  |6 ++
  arch/powerpc/kvm/booke.c   |6 ++
  arch/powerpc/kvm/powerpc.c |6 --
  4 files changed, 45 insertions(+), 6 deletions(-)
 
  diff --git a/arch/powerpc/include/asm/kvm.h
  b/arch/powerpc/include/asm/kvm.h index 3c14202..61b197e 100644
  --- a/arch/powerpc/include/asm/kvm.h
  +++ b/arch/powerpc/include/asm/kvm.h
  @@ -269,8 +269,41 @@ struct kvm_debug_exit_arch {
 
  /* for KVM_SET_GUEST_DEBUG */
  struct kvm_guest_debug_arch {
  +   struct {
  +   /* H/W breakpoint/watchpoint address */
  +   __u64 addr;
  +   /*
  +* Type denotes h/w breakpoint, read watchpoint, write
  +* watchpoint or watchpoint (both read and write).
  +*/
  +#define KVMPPC_DEBUG_NOTYPE0x0
  +#define KVMPPC_DEBUG_BREAKPOINT(1UL  1)
  +#define KVMPPC_DEBUG_WATCH_WRITE   (1UL  2)
  +#define KVMPPC_DEBUG_WATCH_READ(1UL  3)
  +   __u32 type;
  +   __u32 pad1;
 
 Why the padding?
 
  +   __u64 pad2;
  +   } bp[16];
 
 Why 16?
 
  };
 
  +/* Debug related defines */
  +/*
  + * kvm_guest_debug-control is a 32 bit field. The lower 16 bits are
  +generic
  + * and upper 16 bits are architecture specific. Architecture specific
  +defines
  + * that ioctl is for setting hardware breakpoint or software breakpoint.
  + */
  +#define KVM_GUESTDBG_USE_SW_BP 0x0001
  +#define KVM_GUESTDBG_USE_HW_BP 0x0002
  +
  +/* When setting software breakpoint, Change the software breakpoint
  + * instruction to special trap instruction and set
  +KVM_GUESTDBG_USE_SW_BP
  + * flag in kvm_guest_debug-control. KVM does keep track of software
  + * breakpoints. So when KVM_GUESTDBG_USE_SW_BP flag is set and
  +special trap
  + * instruction is executed by guest then exit to userspace.
  + * NOTE: A Nice interface can be added to get the special trap instruction.
  + */
  +#define KVMPPC_INST_GUEST_GDB  0x7C00021C  /* ehpriv OC=0 
  */
 
 This definitely has to be passed to user space (which writes that instruction
 into guest phys memory). Other PPC subarchs will use different instructions.
 Just model it as a read-only ONE_REG.

Ok.

Thanks
-Bharat


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/3] virtio-net: inline header support

2012-10-04 Thread Rusty Russell

Anthony Liguori anth...@codemonkey.ws writes:
 lguest fix is pending in my queue.  lkvm and qemu are broken; lkvm isn't
 ever going to be merged, so I'm not sure what its status is?  But I'm
 determined to fix qemu, and hence my torture patch to make sure this
 doesn't creep in again.

 There are even more implementations out there and I'd wager they all
 rely on framing.

Worse, both virtio_blk (for scsi commands) and virtio_scsi explicitly
and inescapably rely on framing.  The spec conflicts clearly with
itself.

Such layering violations are always a mistake, but I can't blame anyone
else for my lack of attention :(

Here's the spec change:
commit 7e74459bb966ccbaad9e4bf361d1178b7f400b79
Author: Rusty Russell ru...@rustcorp.com.au
Date:   Thu Oct 4 17:11:27 2012 +0930

No longer assume framing is independent of messages.  *sniff*

Signed-off-by: Rusty Russell ru...@rustcorp.com.au

--- virtio-spec.txt 2012-10-04 17:13:04.988259234 +0930
+++ virtio-spec.txt.new 2012-10-04 17:12:54.624258969 +0930
@@ -880,19 +880,19 @@
 
   Message Framing
 
-The descriptors used for a buffer should not effect the semantics
-of the message, except for the total length of the buffer. For
-example, a network buffer consists of a 10 byte header followed
-by the network packet. Whether this is presented in the ring
-descriptor chain as (say) a 10 byte buffer and a 1514 byte
-buffer, or a single 1524 byte buffer, or even three buffers,
-should have no effect.
+Unless stated otherwise, it is expected that headers within a
+message are contained within their own descriptors. For example,
+a network buffer consists of a 10 or 12 byte header followed by
+the network packet. An implementation should expect that this
+header will be within the first descriptor, and that the
+remainder of the data will begin on the second descriptor.
 
-In particular, no implementation should use the descriptor
-boundaries to determine the size of any header in a request.[footnote:
-The current qemu device implementations mistakenly insist that
-the first descriptor cover the header in these cases exactly, so
-a cautious driver should arrange it so.
+[footnote:
+It was previously asserted that framing should be independent of
+message contents, yet invariably drivers layed out messages in
+reliable ways and devices assumed it. In addition, the
+specifications for virtio_blk and virtio_scsi require intuiting
+field lengths from frame boundaries.
 ]
 
   Device Improvements

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] [PATCH] hw: Add test device for unittests execution

2012-10-04 Thread Peter Maydell

On 4 October 2012 04:49, Lucas Meneghel Rodrigues l...@redhat.com wrote:
 Add a test device which supports the kvmctl ioports,
 so one can run the KVM unittest suite [1].

 Usage:

 qemu -device testdev

 1) Removed port 0xf1, since now kvm-unit-tests use
serial

 2) Removed exit code port 0xf4, since that can be
replaced by

 -device isa-debugexit,iobase=0xf4,access-size=2

 3) Removed ram size port 0xd1, since guest memory
size can be retrieved from firmware, there's a
patch for kvm-unit-tests including an API to
retrieve that value.

 [1] Preliminary versions of this patch were posted
 to the mailing list about a year ago, I re-read the
 comments of the thread, and had guidance from
 Paolo about which ports to remove from the test
 device.

General remark: there's no documentation anywhere in
this patch. I don't necessarily mean user-facing docs,
but at least a descriptive comment saying what the
heck this device is and what it does would be helpful.



 CC: Paolo Bonzini pbonz...@redhat.com
 Signed-off-by: Gerd Hoffmann kra...@redhat.com
 Signed-off-by: Avi Kivity a...@redhat.com
 Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
 Signed-off-by: Lucas Meneghel Rodrigues l...@redhat.com
 ---
  hw/i386/Makefile.objs |   1 +
  hw/testdev.c  | 131 
 ++
  2 files changed, 132 insertions(+)
  create mode 100644 hw/testdev.c

 diff --git a/hw/i386/Makefile.objs b/hw/i386/Makefile.objs
 index 8c764bb..64d2787 100644
 --- a/hw/i386/Makefile.objs
 +++ b/hw/i386/Makefile.objs
 @@ -11,5 +11,6 @@ obj-$(CONFIG_XEN_PCI_PASSTHROUGH) += xen-host-pci-device.o
  obj-$(CONFIG_XEN_PCI_PASSTHROUGH) += xen_pt.o xen_pt_config_init.o 
 xen_pt_msi.o
  obj-y += kvm/
  obj-$(CONFIG_SPICE) += qxl.o qxl-logger.o qxl-render.o
 +obj-y += testdev.o

...the device is useful even in non-KVM configs, then?

  obj-y := $(addprefix ../,$(obj-y))
 diff --git a/hw/testdev.c b/hw/testdev.c
 new file mode 100644
 index 000..44070f2
 --- /dev/null
 +++ b/hw/testdev.c
 @@ -0,0 +1,131 @@
 +#include sys/mman.h

This file needs a leading comment with the usual copyright/license/
description of what the file does.

 +#include hw.h
 +#include qdev.h
 +#include isa.h
 +
 +struct testdev {
 +ISADevice dev;
 +MemoryRegion iomem;
 +CharDriverState *chr;
 +};
 +
 +#define TYPE_TESTDEV testdev
 +#define TESTDEV(obj) \
 + OBJECT_CHECK(struct testdev, (obj), TYPE_TESTDEV)
 +
 +static void test_device_irq_line(void *opaque, uint32_t addr, uint32_t data)
 +{
 +struct testdev *dev = opaque;
 +
 +qemu_set_irq(isa_get_irq(dev-dev, addr - 0x2000), !!data);
 +}
 +
 +static uint32 test_device_ioport_data;
 +
 +static void test_device_ioport_write(void *opaque, uint32_t addr, uint32_t 
 data)
 +{
 +test_device_ioport_data = data;
 +}
 +
 +static uint32_t test_device_ioport_read(void *opaque, uint32_t addr)
 +{
 +return test_device_ioport_data;
 +}
 +
 +static void test_device_flush_page(void *opaque, uint32_t addr, uint32_t 
 data)
 +{
 +target_phys_addr_t len = 4096;
 +void *a = cpu_physical_memory_map(data  ~0xffful, len, 0);
 +
 +mprotect(a, 4096, PROT_NONE);
 +mprotect(a, 4096, PROT_READ|PROT_WRITE);
 +cpu_physical_memory_unmap(a, len, 0, 0);
 +}
 +
 +static char *iomem_buf;
 +
 +static uint32_t test_iomem_readb(void *opaque, target_phys_addr_t addr)
 +{
 +return iomem_buf[addr];
 +}
 +
 +static uint32_t test_iomem_readw(void *opaque, target_phys_addr_t addr)
 +{
 +return *(uint16_t*)(iomem_buf + addr);
 +}
 +
 +static uint32_t test_iomem_readl(void *opaque, target_phys_addr_t addr)
 +{
 +return *(uint32_t*)(iomem_buf + addr);
 +}
 +
 +static void test_iomem_writeb(void *opaque, target_phys_addr_t addr, 
 uint32_t val)
 +{
 +iomem_buf[addr] = val;
 +}
 +
 +static void test_iomem_writew(void *opaque, target_phys_addr_t addr, 
 uint32_t val)
 +{
 +*(uint16_t*)(iomem_buf + addr) = val;
 +}
 +
 +static void test_iomem_writel(void *opaque, target_phys_addr_t addr, 
 uint32_t val)
 +{
 +*(uint32_t*)(iomem_buf + addr) = val;
 +}
 +
 +static const MemoryRegionOps test_iomem_ops = {
 +.old_mmio = {
 +.read = { test_iomem_readb, test_iomem_readw, test_iomem_readl, },
 +.write = { test_iomem_writeb, test_iomem_writew, test_iomem_writel, 
 },
 +},
 +.endianness = DEVICE_LITTLE_ENDIAN,
 +};
 +
 +static int init_test_device(ISADevice *isa)
 +{
 +struct testdev *dev = DO_UPCAST(struct testdev, dev, isa);
 +
 +register_ioport_read(0xe0, 1, 1, test_device_ioport_read, dev);
 +register_ioport_write(0xe0, 1, 1, test_device_ioport_write, dev);
 +register_ioport_read(0xe0, 1, 2, test_device_ioport_read, dev);
 +register_ioport_write(0xe0, 1, 2, test_device_ioport_write, dev);
 +register_ioport_read(0xe0, 1, 4, test_device_ioport_read, dev);
 +register_ioport_write(0xe0, 1, 4, test_device_ioport_write, dev);
 +register_ioport_write(0xe4, 1, 4, test_device_flush_page,

Re: [Qemu-devel] [PATCH] hw: Add test device for unittests execution

2012-10-04 Thread Paolo Bonzini

Il 04/10/2012 10:02, Peter Maydell ha scritto:
 On 4 October 2012 04:49, Lucas Meneghel Rodrigues l...@redhat.com wrote:
 Add a test device which supports the kvmctl ioports,
 so one can run the KVM unittest suite [1].

 Usage:

 qemu -device testdev

 1) Removed port 0xf1, since now kvm-unit-tests use
serial

 2) Removed exit code port 0xf4, since that can be
replaced by

 -device isa-debugexit,iobase=0xf4,access-size=2

 3) Removed ram size port 0xd1, since guest memory
size can be retrieved from firmware, there's a
patch for kvm-unit-tests including an API to
retrieve that value.

 [1] Preliminary versions of this patch were posted
 to the mailing list about a year ago, I re-read the
 comments of the thread, and had guidance from
 Paolo about which ports to remove from the test
 device.
 
 General remark: there's no documentation anywhere in
 this patch. I don't necessarily mean user-facing docs,
 but at least a descriptive comment saying what the
 heck this device is and what it does would be helpful.
 
 

 CC: Paolo Bonzini pbonz...@redhat.com
 Signed-off-by: Gerd Hoffmann kra...@redhat.com
 Signed-off-by: Avi Kivity a...@redhat.com
 Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
 Signed-off-by: Lucas Meneghel Rodrigues l...@redhat.com
 ---
  hw/i386/Makefile.objs |   1 +
  hw/testdev.c  | 131 
 ++
  2 files changed, 132 insertions(+)
  create mode 100644 hw/testdev.c

 diff --git a/hw/i386/Makefile.objs b/hw/i386/Makefile.objs
 index 8c764bb..64d2787 100644
 --- a/hw/i386/Makefile.objs
 +++ b/hw/i386/Makefile.objs
 @@ -11,5 +11,6 @@ obj-$(CONFIG_XEN_PCI_PASSTHROUGH) += xen-host-pci-device.o
  obj-$(CONFIG_XEN_PCI_PASSTHROUGH) += xen_pt.o xen_pt_config_init.o 
 xen_pt_msi.o
  obj-y += kvm/
  obj-$(CONFIG_SPICE) += qxl.o qxl-logger.o qxl-render.o
 +obj-y += testdev.o
 
 ...the device is useful even in non-KVM configs, then?
 
  obj-y := $(addprefix ../,$(obj-y))
 diff --git a/hw/testdev.c b/hw/testdev.c
 new file mode 100644
 index 000..44070f2
 --- /dev/null
 +++ b/hw/testdev.c
 @@ -0,0 +1,131 @@
 +#include sys/mman.h
 
 This file needs a leading comment with the usual copyright/license/
 description of what the file does.
 
 +#include hw.h
 +#include qdev.h
 +#include isa.h
 +
 +struct testdev {
 +ISADevice dev;
 +MemoryRegion iomem;
 +CharDriverState *chr;
 +};
 +
 +#define TYPE_TESTDEV testdev
 +#define TESTDEV(obj) \
 + OBJECT_CHECK(struct testdev, (obj), TYPE_TESTDEV)
 +
 +static void test_device_irq_line(void *opaque, uint32_t addr, uint32_t data)
 +{
 +struct testdev *dev = opaque;
 +
 +qemu_set_irq(isa_get_irq(dev-dev, addr - 0x2000), !!data);
 +}
 +
 +static uint32 test_device_ioport_data;
 +
 +static void test_device_ioport_write(void *opaque, uint32_t addr, uint32_t 
 data)
 +{
 +test_device_ioport_data = data;
 +}
 +
 +static uint32_t test_device_ioport_read(void *opaque, uint32_t addr)
 +{
 +return test_device_ioport_data;
 +}
 +
 +static void test_device_flush_page(void *opaque, uint32_t addr, uint32_t 
 data)
 +{
 +target_phys_addr_t len = 4096;
 +void *a = cpu_physical_memory_map(data  ~0xffful, len, 0);
 +
 +mprotect(a, 4096, PROT_NONE);
 +mprotect(a, 4096, PROT_READ|PROT_WRITE);
 +cpu_physical_memory_unmap(a, len, 0, 0);
 +}
 +
 +static char *iomem_buf;
 +
 +static uint32_t test_iomem_readb(void *opaque, target_phys_addr_t addr)
 +{
 +return iomem_buf[addr];
 +}
 +
 +static uint32_t test_iomem_readw(void *opaque, target_phys_addr_t addr)
 +{
 +return *(uint16_t*)(iomem_buf + addr);
 +}
 +
 +static uint32_t test_iomem_readl(void *opaque, target_phys_addr_t addr)
 +{
 +return *(uint32_t*)(iomem_buf + addr);
 +}
 +
 +static void test_iomem_writeb(void *opaque, target_phys_addr_t addr, 
 uint32_t val)
 +{
 +iomem_buf[addr] = val;
 +}
 +
 +static void test_iomem_writew(void *opaque, target_phys_addr_t addr, 
 uint32_t val)
 +{
 +*(uint16_t*)(iomem_buf + addr) = val;
 +}
 +
 +static void test_iomem_writel(void *opaque, target_phys_addr_t addr, 
 uint32_t val)
 +{
 +*(uint32_t*)(iomem_buf + addr) = val;
 +}
 +
 +static const MemoryRegionOps test_iomem_ops = {
 +.old_mmio = {
 +.read = { test_iomem_readb, test_iomem_readw, test_iomem_readl, },
 +.write = { test_iomem_writeb, test_iomem_writew, test_iomem_writel, 
 },
 +},
 +.endianness = DEVICE_LITTLE_ENDIAN,
 +};
 +
 +static int init_test_device(ISADevice *isa)
 +{
 +struct testdev *dev = DO_UPCAST(struct testdev, dev, isa);
 +
 +register_ioport_read(0xe0, 1, 1, test_device_ioport_read, dev);
 +register_ioport_write(0xe0, 1, 1, test_device_ioport_write, dev);
 +register_ioport_read(0xe0, 1, 2, test_device_ioport_read, dev);
 +register_ioport_write(0xe0, 1, 2, test_device_ioport_write, dev);
 +register_ioport_read(0xe0, 1, 4, test_device_ioport_read, dev);
 +register_ioport_write(0xe0, 1, 4, test_device_ioport_write, dev);
 +

Re: [Qemu-devel] [patch 2/6] Use machine options to emulate -no-kvm-irqchip

2012-10-04 Thread Jan Kiszka

On 2012-10-03 20:26, Marcelo Tosatti wrote:
 On Wed, Oct 03, 2012 at 07:24:48PM +0200, Jan Kiszka wrote:
 On 2012-10-03 19:16, Anthony Liguori wrote:
 Jan Kiszka jan.kis...@web.de writes:

 On 2012-10-03 17:03, Marcelo Tosatti wrote:
 On Wed, Oct 03, 2012 at 09:40:17AM -0500, Anthony Liguori wrote:
 Marcelo Tosatti mtosa...@redhat.com writes:

 Commit 3ad763fcba5bd0ec5a79d4a9b6baeef119dd4a3d from qemu-kvm.git.

 From: Jan Kiszka jan.kis...@siemens.com
 
 Upstream is moving towards this mechanism, so start using it in qemu-kvm
 already to configure the specific defaults: kvm enabled on, just like
 in-kernel irqchips.

 Signed-off-by: Marcelo Tosatti mtosa...@redhat.com


 Reviewed-by: Anthony Liguori aligu...@us.ibm.com

 Although it's a little odd to have From: Jan without a SoB...

 Agree, Jan can you ACK?

 I wasn't able to join the call yesterday: Is there a removal schedule
 associated with those switches? Also, why pushing things upstream, even
 when only for one release, that have been loudly deprecated for a while
 in qemu-kvm? Some switches are lacking deprecated warnings on the
 console, and -no-kvm is missing completely. I tend to focus on patch 1 
 5, dropping the rest - based on relevance for production use.

 The distros need to keep these flags to do the switch.

 Why? Should be documented in commit log.

  I see no point
 in deprecating them since they're trivially easy to maintain.

 Given the level of cr** we already have in the command line, they are
 kind of noise, yes. But even then, these patches are not consistent as
 pointed out above.

 Also, they should not be documented to avoid being spread. That's what
 we did with other deprecated switches in QEMU.

 Jan
 
 Jan,
 
 You're comments to the patch are:
 
 - No documentation.

See e.g. how -M is handled in qemu-options.hx.

 - Expiration date.

Anthony said forever, but I think we should remove all those that
issue deprecation warnings after 1-2 years.

 - Changelog explaining what?? (didnt get that). Perhaps better changelog
   in general?

I'm still failing to understand who could depend on -no-kvm-irqchip or
-no-kvm-pit. And I don't understand why -no-kvm was not included. Soe
the reasons for include -X should be provided. Also check your patch
subjects again, at least one was wrong.

Jan




signature.asc
Description: OpenPGP digital signature

[virt][PATCH 2/4] virt: Cleanup unused import in libvirt_xml.py

2012-10-04 Thread Jiří Župka

Signed-off-by: Jiří Župka jzu...@redhat.com
---
 virttest/libvirt_xml.py |6 ++
 1 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/virttest/libvirt_xml.py b/virttest/libvirt_xml.py
index bedc496..4376111 100644
--- a/virttest/libvirt_xml.py
+++ b/virttest/libvirt_xml.py
@@ -8,10 +8,8 @@
 xml_utils module documentation for more information on working with
 XMLTreeFile instances.
 
-
-import logging, os.path
-from autotest.client.shared import error, xml_utils
-from virttest import libvirt_vm, virsh
+from autotest.client.shared import xml_utils
+from virttest import virsh
 
 
 class LibvirtXMLError(Exception):
-- 
1.7.7.6

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[virt][PATCH 3/4] virt: Adds kvm,libvirt check of machine model

2012-10-04 Thread Jiří Župka

Signed-off-by: Jiří Župka jzu...@redhat.com
---
 virttest/kvm_vm.py |   12 ++--
 virttest/libvirt_vm.py |   13 +++--
 2 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/virttest/kvm_vm.py b/virttest/kvm_vm.py
index 9877d55..c51cd1b 100644
--- a/virttest/kvm_vm.py
+++ b/virttest/kvm_vm.py
@@ -830,7 +830,6 @@ class VM(virt_vm.BaseVM):
 cmd = 
 return cmd
 
-
 def add_machine_type(hlp, machine_type):
 if has_option(hlp, machine) or has_option(hlp, M):
 return  -M %s % machine_type
@@ -916,6 +915,7 @@ class VM(virt_vm.BaseVM):
 self.qemu_binary = qemu_binary
 hlp = commands.getoutput(%s -help % qemu_binary)
 support_cpu_model = commands.getoutput(%s -cpu ?list % qemu_binary)
+support_machine_type = commands.getoutput(%s -M ? % qemu_binary)
 
 device_help = 
 if has_option(hlp, device):
@@ -1165,7 +1165,15 @@ class VM(virt_vm.BaseVM):
 
 machine_type = params.get(machine_type)
 if machine_type:
-qemu_cmd += add_machine_type(hlp, machine_type)
+m_types = []
+for m in support_machine_type.splitlines()[1:]:
+m_types.append(m.split()[0])
+
+if machine_type in m_types:
+qemu_cmd += add_machine_type(hlp, machine_type)
+else:
+raise error.TestNAError(Not supported machine type %s. %
+(machine_type))
 
 for cdrom in params.objects(cdroms):
 cd_format = params.get(cd_format, )
diff --git a/virttest/libvirt_vm.py b/virttest/libvirt_vm.py
index 4e03835..c46c139 100644
--- a/virttest/libvirt_vm.py
+++ b/virttest/libvirt_vm.py
@@ -8,7 +8,7 @@ import time, os, logging, fcntl, re, shutil, urlparse, tempfile
 from autotest.client.shared import error
 from autotest.client import utils, os_dep
 from xml.dom import minidom
-import utils_misc, virt_vm, storage, aexpect, remote, virsh
+import utils_misc, virt_vm, storage, aexpect, remote, virsh, libvirt_xml
 
 
 def libvirtd_restart():
@@ -521,6 +521,11 @@ class VM(virt_vm.BaseVM):
 
 help = utils.system_output(%s --help % virt_install_binary)
 
+# Find all machine type. There isn't filtred any machine type even if
+# domain doesn't support this machine type. Filtering could be added.
+me = libvirt_xml.LibvirtXML().getroot().findall(./guest/arch/machine)
+support_machine_type = map(lambda m: m.text, me)
+
 # Start constructing the qemu command
 virt_install_cmd = 
 # Set the X11 display parameter if requested
@@ -542,7 +547,11 @@ class VM(virt_vm.BaseVM):
 
 machine_type = params.get(machine_type)
 if machine_type:
-virt_install_cmd += add_machine_type(help, machine_type)
+if machine_type in support_machine_type:
+virt_install_cmd += add_machine_type(help, machine_type)
+else:
+raise error.TestNAError(Not supported machine type %s. %
+(machine_type))
 
 mem = params.get(mem)
 if mem:
-- 
1.7.7.6

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[virt][PATCH 4/4] virt: Adds multihost machine type test variants.

2012-10-04 Thread Jiří Župka

This patch groups multihost test into one group to which is
added machine types variant of test.

Signed-off-by: Jiří Župka jzu...@redhat.com
---
 kvm/cfg/multi-host-tests.cfg.sample |6 +-
 shared/cfg/subtests.cfg.sample  |  164 ++-
 2 files changed, 107 insertions(+), 63 deletions(-)

diff --git a/kvm/cfg/multi-host-tests.cfg.sample 
b/kvm/cfg/multi-host-tests.cfg.sample
index bc0ce03..d47e756 100644
--- a/kvm/cfg/multi-host-tests.cfg.sample
+++ b/kvm/cfg/multi-host-tests.cfg.sample
@@ -21,7 +21,8 @@ variants:
 only no_pci_assignable
 only no_9p_export
 only smallpages
-only Fedora.15.64
+only pc
+only Fedora.17.64
 only migrate_multi_host
 
 # Runs qemu, f16 64 bit guest OS, install, boot, shutdown
@@ -37,7 +38,8 @@ variants:
 only no_pci_assignable
 only no_9p_export
 only smallpages
-only Fedora.15.64
+only pc
+only Fedora.17.64
 only cpuflags_multi_host
 
 only qemu_migrate_multi_host
diff --git a/shared/cfg/subtests.cfg.sample b/shared/cfg/subtests.cfg.sample
index ab62e12..957bb9c 100644
--- a/shared/cfg/subtests.cfg.sample
+++ b/shared/cfg/subtests.cfg.sample
@@ -960,52 +960,111 @@ variants:
 - monotonic_time:
 test_control_file = monotonic_time.control
 
-- migrate_multi_host: install setup image_copy unattended_install.cdrom
-type = migration_multi_host
-vms = vm1
-start_vm = no
-migration_test_command = help
-migration_bg_command = cd /tmp; nohup tcpdump -q -i any -t ip host 
localhost
-migration_bg_check_command = pgrep tcpdump
-migration_bg_kill_command = pkill tcpdump
-kill_vm_on_error = yes
-iterations = 2
-used_mem = 1024
-mig_timeout = 4800
-disk_prepare_timeout = 360
-comm_port = 13234
-regain_ip_cmd = killall dhclient; sleep 10; dhclient;
+- @multi_host:
 variants:
-#Migration protocol.
--tcp:
+- migrate_multi_host: install setup image_copy 
unattended_install.cdrom
+type = migration_multi_host
+vms = vm1
+start_vm = no
+migration_test_command = help
+migration_bg_command = cd /tmp; nohup tcpdump -q -i any -t ip 
host localhost
+migration_bg_check_command = pgrep tcpdump
+migration_bg_kill_command = pkill tcpdump
+kill_vm_on_error = yes
+iterations = 2
+used_mem = 1024
+mig_timeout = 4800
+disk_prepare_timeout = 360
+comm_port = 13234
+regain_ip_cmd = killall dhclient; sleep 10; dhclient;
 variants:
-- @default:
-type = migration_multi_host
-- measure_migration_speed:
-only Linux
-mig_speed = 1G
-type = migration_multi_host_with_speed_measurement
--fd:
-type = migration_multi_host_fd
-
-- migration_multi_host_with_file_transfer: install setup image_copy 
unattended_install.cdrom
-type = migration_multi_host_with_file_transfer
-vms = vm1
-start_vm = no
-kill_vm_on_error = yes
-used_mem = 1024
-mig_timeout = 4800
-disk_prepare_timeout = 360
-comm_port = 13234
-#path where file is stored on guest.
-guest_path = /tmp/file
-#size of generated file.
-file_size = 500
-transfer_timeout = 240
-#Transfer speed in Mb
-transfer_speed = 100
-#Count of migration during file transfer.
-migrate_count = 3
+#Migration protocol.
+-tcp:
+variants:
+- @default:
+type = migration_multi_host
+- measure_migration_speed:
+only Linux
+mig_speed = 1G
+type = 
migration_multi_host_with_speed_measurement
+-fd:
+type = migration_multi_host_fd
+
+- migration_multi_host_with_file_transfer: install setup 
image_copy unattended_install.cdrom
+type = migration_multi_host_with_file_transfer
+vms = vm1
+start_vm = no
+kill_vm_on_error = yes
+used_mem = 1024
+mig_timeout = 4800
+disk_prepare_timeout = 360
+comm_port = 13234
+#path where file is stored on guest.
+guest_path = /tmp/file
+#size of generated file.
+file_size = 500
+

Re: qemu-kvm: remove boot=on|off drive parameter compatibility

2012-10-04 Thread Jan Kiszka

On 2012-10-03 15:19, Paolo Bonzini wrote:
 Il 03/10/2012 12:57, Lucas Meneghel Rodrigues ha scritto:
 Yep, I did send patches with the testdev device present on qemu-kvm.git
 to qemu.git a while ago, but there were many comments on the review, I
 ended up not implementing everything that was asked and the patches were
 archived.

 If nobody wants to step up to port it, I'll re-read the original thread
 and will spin up new patches (and try to go through the end with it).
 Executing the KVM unittests is something that we can't afford to lose,
 so I'd say it's important on this last mile effort to get rid of qemu-kvm.
 
 Absolutely, IIRC the problem was that testdev did a little bit of
 everything... let's see what's the functionality of testdev:
 
 - write (port 0xf1), can be replaced in autotest with:
 -device isa-debugcon,iobase=0xf1,chardev=...
 
 - exit code (port 0xf4), see this series:
 http://lists.gnu.org/archive/html/qemu-devel/2012-07/msg00818.html
 
 - ram size (port 0xd1).  If we can also patch kvm-unittests, the memory
 is available in the CMOS or in fwcfg.  Here is the SeaBIOS code:
 
 u32 rs = ((inb_cmos(0x34)  16) | (inb_cmos(0x35)  24));
 if (rs)
 rs += 16 * 1024 * 1024;
 else
 rs = (((inb_cmos(0x30)  10) | (inb_cmos(0x31)  18))
   + 1 * 1024 * 1024);
 
 The rest (ports 0xe0..0xe7, 0x2000..0x2017, MMIO) can be left in testdev.

IIRC, one of the biggest problem with testdev was its hack to inject
interrupts.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH RFC 1/2] kvm: Handle undercommitted guest case in PLE handler

2012-10-04 Thread Raghavendra K T


On 10/03/2012 10:35 PM, Avi Kivity wrote:

On 10/03/2012 02:22 PM, Raghavendra K T wrote:

So I think it's worth trying again with ple_window of 2-4.



Hi Avi,

I ran different benchmarks increasing ple_window, and results does not
seem to be encouraging for increasing ple_window.


Thanks for testing! Comments below.


Results:
16 core PLE machine with 16 vcpu guest.

base kernel = 3.6-rc5 + ple handler optimization patch
base_pleopt_8k = base kernel + ple window = 8k
base_pleopt_16k = base kernel + ple window = 16k
base_pleopt_32k = base kernel + ple window = 32k


Percentage improvements of benchmarks w.r.t base_pleopt with ple_window = 4096

base_pleopt_8k  base_pleopt_16k base_pleopt_32k
-   

kernbench_1x-5.54915-15.94529   -44.31562
kernbench_2x-7.89399-17.75039   -37.73498


So, 44% degradation even with no overcommit?  That's surprising.


Yes. Kernbench was run with #threads = #vcpu * 2 as usual. Is it
spending 8 times the original ple_window cycles for 16 vcpus
significant?




I also got perf top output to analyse the difference. Difference comes
because of flushtlb (and also spinlock).


That's in the guest, yes?


Yes. Perf is in guest.





Ebizzy run for 4k ple_window
-  87.20%  [kernel]  [k] arch_local_irq_restore
- arch_local_irq_restore
   - 100.00% _raw_spin_unlock_irqrestore
  + 52.89% release_pages
  + 47.10% pagevec_lru_move_fn
-   5.71%  [kernel]  [k] arch_local_irq_restore
- arch_local_irq_restore
   + 86.03% default_send_IPI_mask_allbutself_phys
   + 13.96% default_send_IPI_mask_sequence_phys
-   3.10%  [kernel]  [k] smp_call_function_many
  smp_call_function_many


Ebizzy run for 32k ple_window

-  91.40%  [kernel]  [k] arch_local_irq_restore
- arch_local_irq_restore
   - 100.00% _raw_spin_unlock_irqrestore
  + 53.13% release_pages
  + 46.86% pagevec_lru_move_fn
-   4.38%  [kernel]  [k] smp_call_function_many
  smp_call_function_many
-   2.51%  [kernel]  [k] arch_local_irq_restore
- arch_local_irq_restore
   + 90.76% default_send_IPI_mask_allbutself_phys
   + 9.24% default_send_IPI_mask_sequence_phys



Both the 4k and the 32k results are crazy.  Why is
arch_local_irq_restore() so prominent?  Do you have a very high
interrupt rate in the guest?


How to measure if I have high interrupt rate in guest?
From /proc/interrupt numbers I am not able to judge :(

I went back and got the results on a 32 core machine with 32 vcpu guest.
Strangely, I got result supporting the claim that increasing ple_window 
helps for non-overcommitted scenario.


32 core 32 vcpu guest 1x scenarios.

ple_gap = 0
kernbench: Elapsed Time 38.61
ebizzy: 7463 records/s

ple_window = 4k
kernbench: Elapsed Time 43.5067
ebizzy:2528 records/s

ple_window = 32k
kernebench : Elapsed Time 39.4133
ebizzy: 7196 records/s


perf top for ebizzy for above:
ple_gap = 0
-  84.74%  [kernel]  [k] arch_local_irq_restore
   - arch_local_irq_restore
  - 100.00% _raw_spin_unlock_irqrestore
 + 50.96% release_pages
 + 49.02% pagevec_lru_move_fn
-   6.57%  [kernel]  [k] arch_local_irq_restore
   - arch_local_irq_restore
  + 92.54% default_send_IPI_mask_allbutself_phys
  + 7.46% default_send_IPI_mask_sequence_phys
-   1.54%  [kernel]  [k] smp_call_function_many
 smp_call_function_many

ple_window = 32k
-  84.47%  [kernel]  [k] arch_local_irq_restore
   + arch_local_irq_restore
-   6.46%  [kernel]  [k] arch_local_irq_restore
   - arch_local_irq_restore
  + 93.51% default_send_IPI_mask_allbutself_phys
  + 6.49% default_send_IPI_mask_sequence_phys
-   1.80%  [kernel]  [k] smp_call_function_many
   - smp_call_function_many
  + 99.98% native_flush_tlb_others


ple_window = 4k
-  91.35%  [kernel]  [k] arch_local_irq_restore
   - arch_local_irq_restore
  - 100.00% _raw_spin_unlock_irqrestore
 + 53.19% release_pages
 + 46.81% pagevec_lru_move_fn
-   3.90%  [kernel]  [k] smp_call_function_many
 smp_call_function_many
-   2.94%  [kernel]  [k] arch_local_irq_restore
   - arch_local_irq_restore
  + 93.12% default_send_IPI_mask_allbutself_phys
  + 6.88% default_send_IPI_mask_sequence_phys

Let me know if I can try something here..
/me confused :(

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH RFC 0/2] kvm: Improving undercommit,overcommit scenarios in PLE handler

2012-10-04 Thread Raghavendra K T


On 10/03/2012 10:55 PM, Avi Kivity wrote:

On 10/03/2012 04:29 PM, Raghavendra K T wrote:

* Avi Kivity a...@redhat.com [2012-09-27 14:03:59]:


On 09/27/2012 01:23 PM, Raghavendra K T wrote:



[...]

2) looking at the result (comparing A  C) , I do feel we have
significant in iterating over vcpus (when compared to even vmexit)
so We still would need undercommit fix sugested by PeterZ (improving by
140%). ?


Looking only at the current runqueue?  My worry is that it misses a lot
of cases.  Maybe try the current runqueue first and then others.



Okay. Do you mean we can have something like

+   if (rq-nr_running == 1  p_rq-nr_running == 1) {
+   yielded = -ESRCH;
+   goto out_irq;
+   }

in the Peter's patch ?

( I thought lot about  or || . Both seem to have their own cons ).
But that should be only when we have short term imbalance, as PeterZ
told.


I'm missing the context.  What is p_rq?


p_rq is the run queue of target vcpu.
What I was trying below was to address Rik concern. Suppose
rq of source vcpu has one task, but target probably has two task,
with a eligible vcpu waiting to be scheduled.



What I mean was:

   if can_yield_to_process_in_current_rq
  do that
   else if can_yield_to_process_in_other_rq
  do that
   else
  return -ESRCH


I think you are saying we have to check the run queue of the
source vcpu, if we have a vcpu belonging to same VM and try yield to 
that? ignoring whatever the target vcpu we received for yield_to.


Or is it that kvm_vcpu_yield_to should now check the vcpus of same vm
belonging to same run queue first. If we don't succeed, go again for
a vcpu in different runqueue.
Does it add more overhead especially in = 1x scenario?

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

RE: [PATCH 6/6] KVM: booke/bookehv: Add debug stub support

2012-10-04 Thread Bhushan Bharat-R65777



 -Original Message-
 From: Alexander Graf [mailto:ag...@suse.de]
 Sent: Monday, September 24, 2012 9:50 PM
 To: Bhushan Bharat-R65777
 Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; Bhushan Bharat-R65777
 Subject: Re: [PATCH 6/6] KVM: booke/bookehv: Add debug stub support
 
 
 On 21.08.2012, at 15:52, Bharat Bhushan wrote:
 
  This patch adds the debug stub support on booke/bookehv.
  Now QEMU debug stub can use hw breakpoint, watchpoint and
  software breakpoint to debug guest.
 
  Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
  ---
  arch/powerpc/include/asm/kvm.h|   29 ++-
  arch/powerpc/include/asm/kvm_host.h   |5 +
  arch/powerpc/kernel/asm-offsets.c |   26 ++
  arch/powerpc/kvm/booke.c  |  144 
  +
  arch/powerpc/kvm/booke_interrupts.S   |  110 +
  arch/powerpc/kvm/bookehv_interrupts.S |  141 
  +++-
  arch/powerpc/kvm/e500mc.c |3 +-
  7 files changed, 435 insertions(+), 23 deletions(-)
 
  diff --git a/arch/powerpc/include/asm/kvm.h b/arch/powerpc/include/asm/kvm.h
  index 61b197e..53479ea 100644
  --- a/arch/powerpc/include/asm/kvm.h
  +++ b/arch/powerpc/include/asm/kvm.h
  @@ -25,6 +25,7 @@
  /* Select powerpc specific features in linux/kvm.h */
  #define __KVM_HAVE_SPAPR_TCE
  #define __KVM_HAVE_PPC_SMT
  +#define __KVM_HAVE_GUEST_DEBUG
 
  struct kvm_regs {
  __u64 pc;
  @@ -264,7 +265,31 @@ struct kvm_fpu {
  __u64 fpr[32];
  };
 
  +
  +/*
  + * Defines for h/w breakpoint, watchpoint (read, write or both) and
  + * software breakpoint.
  + * These are used as type in KVM_SET_GUEST_DEBUG ioctl and status
  + * for KVM_DEBUG_EXIT.
  + */
  +#define KVMPPC_DEBUG_NONE  0x0
  +#define KVMPPC_DEBUG_BREAKPOINT(1UL  1)
  +#define KVMPPC_DEBUG_WATCH_WRITE   (1UL  2)
  +#define KVMPPC_DEBUG_WATCH_READ(1UL  3)
  struct kvm_debug_exit_arch {
  +   __u64 pc;
  +   /*
  +* exception - returns the exception number. If the KVM_DEBUG_EXIT
  +* exit is not handled (say not h/w breakpoint or software breakpoint
  +* set for this address) by qemu then it is supposed to inject this
  +* exception to guest.
  +*/
  +   __u32 exception;
  +   /*
  +* exiting to userspace because of h/w breakpoint, watchpoint
  +* (read, write or both) and software breakpoint.
  +*/
  +   __u32 status;
  };
 
  /* for KVM_SET_GUEST_DEBUG */
  @@ -276,10 +301,6 @@ struct kvm_guest_debug_arch {
   * Type denotes h/w breakpoint, read watchpoint, write
   * watchpoint or watchpoint (both read and write).
   */
  -#define KVMPPC_DEBUG_NOTYPE0x0
  -#define KVMPPC_DEBUG_BREAKPOINT(1UL  1)
  -#define KVMPPC_DEBUG_WATCH_WRITE   (1UL  2)
  -#define KVMPPC_DEBUG_WATCH_READ(1UL  3)
  __u32 type;
  __u32 pad1;
  __u64 pad2;
  diff --git a/arch/powerpc/include/asm/kvm_host.h
 b/arch/powerpc/include/asm/kvm_host.h
  index c7219c1..3ba465a 100644
  --- a/arch/powerpc/include/asm/kvm_host.h
  +++ b/arch/powerpc/include/asm/kvm_host.h
  @@ -496,7 +496,12 @@ struct kvm_vcpu_arch {
  u32 mmucfg;
  u32 epr;
  u32 crit_save;
  +   /* guest debug registers*/
  struct kvmppc_booke_debug_reg dbg_reg;
  +   /* shadow debug registers */
  +   struct kvmppc_booke_debug_reg shadow_dbg_reg;
  +   /* host debug registers*/
  +   struct kvmppc_booke_debug_reg host_dbg_reg;
  #endif
  gpa_t paddr_accessed;
  gva_t vaddr_accessed;
  diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-
 offsets.c
  index 555448e..6987821 100644
  --- a/arch/powerpc/kernel/asm-offsets.c
  +++ b/arch/powerpc/kernel/asm-offsets.c
  @@ -564,6 +564,32 @@ int main(void)
  DEFINE(VCPU_FAULT_DEAR, offsetof(struct kvm_vcpu, arch.fault_dear));
  DEFINE(VCPU_FAULT_ESR, offsetof(struct kvm_vcpu, arch.fault_esr));
  DEFINE(VCPU_CRIT_SAVE, offsetof(struct kvm_vcpu, arch.crit_save));
  +   DEFINE(VCPU_DBSR, offsetof(struct kvm_vcpu, arch.dbsr));
  +   DEFINE(VCPU_SHADOW_DBG, offsetof(struct kvm_vcpu, arch.shadow_dbg_reg));
  +   DEFINE(VCPU_HOST_DBG, offsetof(struct kvm_vcpu, arch.host_dbg_reg));
  +   DEFINE(KVMPPC_DBG_DBCR0, offsetof(struct kvmppc_booke_debug_reg,
  + dbcr0));
  +   DEFINE(KVMPPC_DBG_DBCR1, offsetof(struct kvmppc_booke_debug_reg,
  + dbcr1));
  +   DEFINE(KVMPPC_DBG_DBCR2, offsetof(struct kvmppc_booke_debug_reg,
  + dbcr2));
  +#ifdef CONFIG_KVM_E500MC
  +   DEFINE(KVMPPC_DBG_DBCR4, offsetof(struct kvmppc_booke_debug_reg,
  + dbcr4));
  +#endif
  +   DEFINE(KVMPPC_DBG_IAC1, offsetof(struct kvmppc_booke_debug_reg,
  +iac[0]));
  +   DEFINE(KVMPPC_DBG_IAC2, offsetof(struct kvmppc_booke_debug_reg,
  +

Re: [PATCH 6/6] KVM: booke/bookehv: Add debug stub support

2012-10-04 Thread Alexander Graf


On 04.10.2012, at 13:06, Bhushan Bharat-R65777 wrote:

 
 
 -Original Message-
 From: Alexander Graf [mailto:ag...@suse.de]
 Sent: Monday, September 24, 2012 9:50 PM
 To: Bhushan Bharat-R65777
 Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; Bhushan Bharat-R65777
 Subject: Re: [PATCH 6/6] KVM: booke/bookehv: Add debug stub support
 
 
 On 21.08.2012, at 15:52, Bharat Bhushan wrote:
 
 This patch adds the debug stub support on booke/bookehv.
 Now QEMU debug stub can use hw breakpoint, watchpoint and
 software breakpoint to debug guest.
 
 Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
 ---
 arch/powerpc/include/asm/kvm.h|   29 ++-
 arch/powerpc/include/asm/kvm_host.h   |5 +
 arch/powerpc/kernel/asm-offsets.c |   26 ++
 arch/powerpc/kvm/booke.c  |  144 
 +
 arch/powerpc/kvm/booke_interrupts.S   |  110 +
 arch/powerpc/kvm/bookehv_interrupts.S |  141 
 +++-
 arch/powerpc/kvm/e500mc.c |3 +-
 7 files changed, 435 insertions(+), 23 deletions(-)
 
 diff --git a/arch/powerpc/include/asm/kvm.h b/arch/powerpc/include/asm/kvm.h
 index 61b197e..53479ea 100644
 --- a/arch/powerpc/include/asm/kvm.h
 +++ b/arch/powerpc/include/asm/kvm.h
 @@ -25,6 +25,7 @@
 /* Select powerpc specific features in linux/kvm.h */
 #define __KVM_HAVE_SPAPR_TCE
 #define __KVM_HAVE_PPC_SMT
 +#define __KVM_HAVE_GUEST_DEBUG
 
 struct kvm_regs {
 __u64 pc;
 @@ -264,7 +265,31 @@ struct kvm_fpu {
 __u64 fpr[32];
 };
 
 +
 +/*
 + * Defines for h/w breakpoint, watchpoint (read, write or both) and
 + * software breakpoint.
 + * These are used as type in KVM_SET_GUEST_DEBUG ioctl and status
 + * for KVM_DEBUG_EXIT.
 + */
 +#define KVMPPC_DEBUG_NONE  0x0
 +#define KVMPPC_DEBUG_BREAKPOINT(1UL  1)
 +#define KVMPPC_DEBUG_WATCH_WRITE   (1UL  2)
 +#define KVMPPC_DEBUG_WATCH_READ(1UL  3)
 struct kvm_debug_exit_arch {
 +   __u64 pc;
 +   /*
 +* exception - returns the exception number. If the KVM_DEBUG_EXIT
 +* exit is not handled (say not h/w breakpoint or software breakpoint
 +* set for this address) by qemu then it is supposed to inject this
 +* exception to guest.
 +*/
 +   __u32 exception;
 +   /*
 +* exiting to userspace because of h/w breakpoint, watchpoint
 +* (read, write or both) and software breakpoint.
 +*/
 +   __u32 status;
 };
 
 /* for KVM_SET_GUEST_DEBUG */
 @@ -276,10 +301,6 @@ struct kvm_guest_debug_arch {
  * Type denotes h/w breakpoint, read watchpoint, write
  * watchpoint or watchpoint (both read and write).
  */
 -#define KVMPPC_DEBUG_NOTYPE0x0
 -#define KVMPPC_DEBUG_BREAKPOINT(1UL  1)
 -#define KVMPPC_DEBUG_WATCH_WRITE   (1UL  2)
 -#define KVMPPC_DEBUG_WATCH_READ(1UL  3)
 __u32 type;
 __u32 pad1;
 __u64 pad2;
 diff --git a/arch/powerpc/include/asm/kvm_host.h
 b/arch/powerpc/include/asm/kvm_host.h
 index c7219c1..3ba465a 100644
 --- a/arch/powerpc/include/asm/kvm_host.h
 +++ b/arch/powerpc/include/asm/kvm_host.h
 @@ -496,7 +496,12 @@ struct kvm_vcpu_arch {
 u32 mmucfg;
 u32 epr;
 u32 crit_save;
 +   /* guest debug registers*/
 struct kvmppc_booke_debug_reg dbg_reg;
 +   /* shadow debug registers */
 +   struct kvmppc_booke_debug_reg shadow_dbg_reg;
 +   /* host debug registers*/
 +   struct kvmppc_booke_debug_reg host_dbg_reg;
 #endif
 gpa_t paddr_accessed;
 gva_t vaddr_accessed;
 diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-
 offsets.c
 index 555448e..6987821 100644
 --- a/arch/powerpc/kernel/asm-offsets.c
 +++ b/arch/powerpc/kernel/asm-offsets.c
 @@ -564,6 +564,32 @@ int main(void)
 DEFINE(VCPU_FAULT_DEAR, offsetof(struct kvm_vcpu, arch.fault_dear));
 DEFINE(VCPU_FAULT_ESR, offsetof(struct kvm_vcpu, arch.fault_esr));
 DEFINE(VCPU_CRIT_SAVE, offsetof(struct kvm_vcpu, arch.crit_save));
 +   DEFINE(VCPU_DBSR, offsetof(struct kvm_vcpu, arch.dbsr));
 +   DEFINE(VCPU_SHADOW_DBG, offsetof(struct kvm_vcpu, arch.shadow_dbg_reg));
 +   DEFINE(VCPU_HOST_DBG, offsetof(struct kvm_vcpu, arch.host_dbg_reg));
 +   DEFINE(KVMPPC_DBG_DBCR0, offsetof(struct kvmppc_booke_debug_reg,
 + dbcr0));
 +   DEFINE(KVMPPC_DBG_DBCR1, offsetof(struct kvmppc_booke_debug_reg,
 + dbcr1));
 +   DEFINE(KVMPPC_DBG_DBCR2, offsetof(struct kvmppc_booke_debug_reg,
 + dbcr2));
 +#ifdef CONFIG_KVM_E500MC
 +   DEFINE(KVMPPC_DBG_DBCR4, offsetof(struct kvmppc_booke_debug_reg,
 + dbcr4));
 +#endif
 +   DEFINE(KVMPPC_DBG_IAC1, offsetof(struct kvmppc_booke_debug_reg,
 +iac[0]));
 +   DEFINE(KVMPPC_DBG_IAC2, offsetof(struct kvmppc_booke_debug_reg,
 +iac[1]));
 +

Re: qemu-kvm: remove boot=on|off drive parameter compatibility

2012-10-04 Thread Lucas Meneghel Rodrigues


On 10/04/2012 07:48 AM, Jan Kiszka wrote:

On 2012-10-03 15:19, Paolo Bonzini wrote:

Il 03/10/2012 12:57, Lucas Meneghel Rodrigues ha scritto:

Yep, I did send patches with the testdev device present on qemu-kvm.git
to qemu.git a while ago, but there were many comments on the review, I
ended up not implementing everything that was asked and the patches were
archived.

If nobody wants to step up to port it, I'll re-read the original thread
and will spin up new patches (and try to go through the end with it).
Executing the KVM unittests is something that we can't afford to lose,
so I'd say it's important on this last mile effort to get rid of qemu-kvm.


Absolutely, IIRC the problem was that testdev did a little bit of
everything... let's see what's the functionality of testdev:

- write (port 0xf1), can be replaced in autotest with:
-device isa-debugcon,iobase=0xf1,chardev=...

- exit code (port 0xf4), see this series:
http://lists.gnu.org/archive/html/qemu-devel/2012-07/msg00818.html

- ram size (port 0xd1).  If we can also patch kvm-unittests, the memory
is available in the CMOS or in fwcfg.  Here is the SeaBIOS code:

 u32 rs = ((inb_cmos(0x34)  16) | (inb_cmos(0x35)  24));
 if (rs)
 rs += 16 * 1024 * 1024;
 else
 rs = (((inb_cmos(0x30)  10) | (inb_cmos(0x31)  18))
   + 1 * 1024 * 1024);

The rest (ports 0xe0..0xe7, 0x2000..0x2017, MMIO) can be left in testdev.


IIRC, one of the biggest problem with testdev was its hack to inject
interrupts.


Jan, I assume this commit helps to fix this, right?

commit b334ec567f1de9a60349991e7b75083d569ddb0a
Author: Jan Kiszka jan.kis...@siemens.com
Date:   Fri Mar 2 10:30:47 2012 +0100

qemu-kvm: Use upstream kvm-i8259

Drop the qemu-kvm version in favor of the equivalent upstream
implementation. This allows to move the i8259 back into the hwlib.

Note that this also drops the testdev hack and restores proper
isa_get_irq. If testdev scripts exist that inject  IRQ15, they need
fixing. Testing for these interrupts on the PIIX3 makes no practical
sense anyway as those lines are unused.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
Signed-off-by: Avi Kivity a...@redhat.com


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: qemu-kvm: remove boot=on|off drive parameter compatibility

2012-10-04 Thread Jan Kiszka

On 2012-10-04 14:10, Lucas Meneghel Rodrigues wrote:
 On 10/04/2012 07:48 AM, Jan Kiszka wrote:
 On 2012-10-03 15:19, Paolo Bonzini wrote:
 Il 03/10/2012 12:57, Lucas Meneghel Rodrigues ha scritto:
 Yep, I did send patches with the testdev device present on qemu-kvm.git
 to qemu.git a while ago, but there were many comments on the review, I
 ended up not implementing everything that was asked and the patches were
 archived.

 If nobody wants to step up to port it, I'll re-read the original thread
 and will spin up new patches (and try to go through the end with it).
 Executing the KVM unittests is something that we can't afford to lose,
 so I'd say it's important on this last mile effort to get rid of qemu-kvm.

 Absolutely, IIRC the problem was that testdev did a little bit of
 everything... let's see what's the functionality of testdev:

 - write (port 0xf1), can be replaced in autotest with:
 -device isa-debugcon,iobase=0xf1,chardev=...

 - exit code (port 0xf4), see this series:
 http://lists.gnu.org/archive/html/qemu-devel/2012-07/msg00818.html

 - ram size (port 0xd1).  If we can also patch kvm-unittests, the memory
 is available in the CMOS or in fwcfg.  Here is the SeaBIOS code:

  u32 rs = ((inb_cmos(0x34)  16) | (inb_cmos(0x35)  24));
  if (rs)
  rs += 16 * 1024 * 1024;
  else
  rs = (((inb_cmos(0x30)  10) | (inb_cmos(0x31)  18))
+ 1 * 1024 * 1024);

 The rest (ports 0xe0..0xe7, 0x2000..0x2017, MMIO) can be left in testdev.

 IIRC, one of the biggest problem with testdev was its hack to inject
 interrupts.
 
 Jan, I assume this commit helps to fix this, right?
 
 commit b334ec567f1de9a60349991e7b75083d569ddb0a
 Author: Jan Kiszka jan.kis...@siemens.com
 Date:   Fri Mar 2 10:30:47 2012 +0100
 
  qemu-kvm: Use upstream kvm-i8259
 
  Drop the qemu-kvm version in favor of the equivalent upstream
  implementation. This allows to move the i8259 back into the hwlib.
 
  Note that this also drops the testdev hack and restores proper
  isa_get_irq. If testdev scripts exist that inject  IRQ15, they need
  fixing. Testing for these interrupts on the PIIX3 makes no practical
  sense anyway as those lines are unused.
 
  Signed-off-by: Jan Kiszka jan.kis...@siemens.com
  Signed-off-by: Avi Kivity a...@redhat.com

Yes, this improved it a lot as we no longer depend on additional
changes. I'm not sure if there was resistance beyond that.

When cleaning up the code: register_ioport_read must be replaced with
the memory API.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH RFC 1/2] kvm: Handle undercommitted guest case in PLE handler

2012-10-04 Thread Avi Kivity

On 10/04/2012 12:49 PM, Raghavendra K T wrote:
 On 10/03/2012 10:35 PM, Avi Kivity wrote:
 On 10/03/2012 02:22 PM, Raghavendra K T wrote:
 So I think it's worth trying again with ple_window of 2-4.


 Hi Avi,

 I ran different benchmarks increasing ple_window, and results does not
 seem to be encouraging for increasing ple_window.

 Thanks for testing! Comments below.

 Results:
 16 core PLE machine with 16 vcpu guest.

 base kernel = 3.6-rc5 + ple handler optimization patch
 base_pleopt_8k = base kernel + ple window = 8k
 base_pleopt_16k = base kernel + ple window = 16k
 base_pleopt_32k = base kernel + ple window = 32k


 Percentage improvements of benchmarks w.r.t base_pleopt with
 ple_window = 4096

 base_pleopt_8kbase_pleopt_16kbase_pleopt_32k
 -   

 kernbench_1x-5.54915-15.94529-44.31562
 kernbench_2x-7.89399-17.75039-37.73498

 So, 44% degradation even with no overcommit?  That's surprising.
 
 Yes. Kernbench was run with #threads = #vcpu * 2 as usual. Is it
 spending 8 times the original ple_window cycles for 16 vcpus
 significant?

A PLE exit when not overcommitted cannot do any good, it is better to
spin in the guest rather that look for candidates on the host.  In fact
when we benchmark we often disable PLE completely.

 

 I also got perf top output to analyse the difference. Difference comes
 because of flushtlb (and also spinlock).

 That's in the guest, yes?
 
 Yes. Perf is in guest.
 


 Ebizzy run for 4k ple_window
 -  87.20%  [kernel]  [k] arch_local_irq_restore
 - arch_local_irq_restore
- 100.00% _raw_spin_unlock_irqrestore
   + 52.89% release_pages
   + 47.10% pagevec_lru_move_fn
 -   5.71%  [kernel]  [k] arch_local_irq_restore
 - arch_local_irq_restore
+ 86.03% default_send_IPI_mask_allbutself_phys
+ 13.96% default_send_IPI_mask_sequence_phys
 -   3.10%  [kernel]  [k] smp_call_function_many
   smp_call_function_many


 Ebizzy run for 32k ple_window

 -  91.40%  [kernel]  [k] arch_local_irq_restore
 - arch_local_irq_restore
- 100.00% _raw_spin_unlock_irqrestore
   + 53.13% release_pages
   + 46.86% pagevec_lru_move_fn
 -   4.38%  [kernel]  [k] smp_call_function_many
   smp_call_function_many
 -   2.51%  [kernel]  [k] arch_local_irq_restore
 - arch_local_irq_restore
+ 90.76% default_send_IPI_mask_allbutself_phys
+ 9.24% default_send_IPI_mask_sequence_phys


 Both the 4k and the 32k results are crazy.  Why is
 arch_local_irq_restore() so prominent?  Do you have a very high
 interrupt rate in the guest?
 
 How to measure if I have high interrupt rate in guest?
 From /proc/interrupt numbers I am not able to judge :(

'vmstat 1'

 
 I went back and got the results on a 32 core machine with 32 vcpu guest.
 Strangely, I got result supporting the claim that increasing ple_window
 helps for non-overcommitted scenario.
 
 32 core 32 vcpu guest 1x scenarios.
 
 ple_gap = 0
 kernbench: Elapsed Time 38.61
 ebizzy: 7463 records/s
 
 ple_window = 4k
 kernbench: Elapsed Time 43.5067
 ebizzy:2528 records/s
 
 ple_window = 32k
 kernebench : Elapsed Time 39.4133
 ebizzy: 7196 records/s

So maybe something was wrong with the first measurement.

 
 
 perf top for ebizzy for above:
 ple_gap = 0
 -  84.74%  [kernel]  [k] arch_local_irq_restore
- arch_local_irq_restore
   - 100.00% _raw_spin_unlock_irqrestore
  + 50.96% release_pages
  + 49.02% pagevec_lru_move_fn
 -   6.57%  [kernel]  [k] arch_local_irq_restore
- arch_local_irq_restore
   + 92.54% default_send_IPI_mask_allbutself_phys
   + 7.46% default_send_IPI_mask_sequence_phys
 -   1.54%  [kernel]  [k] smp_call_function_many
  smp_call_function_many

Again the numbers are ridiculously high for arch_local_irq_restore.
Maybe there's a bad perf/kvm interaction when we're injecting an
interrupt, I can't believe we're spending 84% of the time running the
popf instruction.

 
 ple_window = 32k
 -  84.47%  [kernel]  [k] arch_local_irq_restore
+ arch_local_irq_restore
 -   6.46%  [kernel]  [k] arch_local_irq_restore
- arch_local_irq_restore
   + 93.51% default_send_IPI_mask_allbutself_phys
   + 6.49% default_send_IPI_mask_sequence_phys
 -   1.80%  [kernel]  [k] smp_call_function_many
- smp_call_function_many
   + 99.98% native_flush_tlb_others
 
 
 ple_window = 4k
 -  91.35%  [kernel]  [k] arch_local_irq_restore
- arch_local_irq_restore
   - 100.00% _raw_spin_unlock_irqrestore
  + 53.19% release_pages
  + 46.81% pagevec_lru_move_fn
 -   3.90%  [kernel]  [k] smp_call_function_many
  smp_call_function_many
 -   2.94%  [kernel]  [k] arch_local_irq_restore
- arch_local_irq_restore
   + 93.12% default_send_IPI_mask_allbutself_phys
   + 6.88% default_send_IPI_mask_sequence_phys
 
 Let me know if I can try something here..
 /me confused :(

Re: [PATCH RFC 0/2] kvm: Improving undercommit,overcommit scenarios in PLE handler

2012-10-04 Thread Avi Kivity

On 10/04/2012 12:56 PM, Raghavendra K T wrote:
 On 10/03/2012 10:55 PM, Avi Kivity wrote:
 On 10/03/2012 04:29 PM, Raghavendra K T wrote:
 * Avi Kivity a...@redhat.com [2012-09-27 14:03:59]:

 On 09/27/2012 01:23 PM, Raghavendra K T wrote:

 [...]
 2) looking at the result (comparing A  C) , I do feel we have
 significant in iterating over vcpus (when compared to even vmexit)
 so We still would need undercommit fix sugested by PeterZ
 (improving by
 140%). ?

 Looking only at the current runqueue?  My worry is that it misses a lot
 of cases.  Maybe try the current runqueue first and then others.


 Okay. Do you mean we can have something like

 +   if (rq-nr_running == 1  p_rq-nr_running == 1) {
 +   yielded = -ESRCH;
 +   goto out_irq;
 +   }

 in the Peter's patch ?

 ( I thought lot about  or || . Both seem to have their own cons ).
 But that should be only when we have short term imbalance, as PeterZ
 told.

 I'm missing the context.  What is p_rq?
 
 p_rq is the run queue of target vcpu.
 What I was trying below was to address Rik concern. Suppose
 rq of source vcpu has one task, but target probably has two task,
 with a eligible vcpu waiting to be scheduled.
 

 What I mean was:

if can_yield_to_process_in_current_rq
   do that
else if can_yield_to_process_in_other_rq
   do that
else
   return -ESRCH
 
 I think you are saying we have to check the run queue of the
 source vcpu, if we have a vcpu belonging to same VM and try yield to
 that? ignoring whatever the target vcpu we received for yield_to.
 
 Or is it that kvm_vcpu_yield_to should now check the vcpus of same vm
 belonging to same run queue first. If we don't succeed, go again for
 a vcpu in different runqueue.

Right.  Prioritize vcpus that are cheap to yield to.  But may return bad
results if all vcpus on the current runqueue are spinners, so probably
not a good idea.

 Does it add more overhead especially in = 1x scenario?

The current runqueue should have just our vcpu in that case, so low
overhead.  But it's a bad idea due to the above scenario.

-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/3] virtio-net: inline header support

2012-10-04 Thread Rusty Russell

Paolo Bonzini pbonz...@redhat.com writes:

 Il 04/10/2012 02:11, Rusty Russell ha scritto:
   There's a reason I haven't done this.  I really, really dislike my
   implemention isn't broken feature bits.  We could have an infinite
   number of them, for each bug in each device.
 
  However, this bug affects (almost) all implementations and (almost) all
  devices.  It even makes sense to reserve a transport feature bit for it
  instead of a device feature bit.

 Perhaps, but we have to fix the bugs first!

 Yes. :)  Isn't that what mst's patch does?

 As I said, my torture patch broke qemu immediately.  Since noone has
 leapt onto fixing that, I'll take a look now...

 I can look at virtio-scsi.

Actually, you can't, see my reply to Anthony...

Message-ID: 87lifm1y1n@rustcorp.com.au

Cheers,
Rusty.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

RE: [PATCH 06/15] KVM: ARM: Initial skeleton to compile KVM support

2012-10-04 Thread Min-gyu Kim

 -Original Message-
 From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On
 Behalf Of Christoffer Dall
 Sent: Monday, October 01, 2012 4:22 AM
 To: Will Deacon
 Cc: kvm@vger.kernel.org; linux-arm-ker...@lists.infradead.org;
 kvm...@lists.cs.columbia.edu; rusty.russ...@linaro.org; a...@redhat.com;
 marc.zyng...@arm.com
 Subject: Re: [PATCH 06/15] KVM: ARM: Initial skeleton to compile KVM
 support

 On Thu, Sep 27, 2012 at 10:13 AM, Will Deacon will.dea...@arm.com wrote:
  On Wed, Sep 26, 2012 at 02:43:14AM +0100, Christoffer Dall wrote:
  On 09/25/2012 11:20 AM, Will Deacon wrote:
   +/* Multiprocessor Affinity Register */
   +#define MPIDR_CPUID(0x3  0)

   I'm fairly sure we already have code under arch/arm/ for dealing
   with the mpidr. Let's re-use that rather than reinventing it here.

  I see some defines in topology.c - do you want some of these factored
  out into a header file that we can then also use from kvm? If so,
where?

  I guess either in topology.h or a new header (topology-bits.h).

   +#define EXCEPTION_NONE  0
   +#define EXCEPTION_RESET 0x80
   +#define EXCEPTION_UNDEFINED 0x40
   +#define EXCEPTION_SOFTWARE  0x20
   +#define EXCEPTION_PREFETCH  0x10
   +#define EXCEPTION_DATA  0x08
   +#define EXCEPTION_IMPRECISE 0x04
   +#define EXCEPTION_IRQ   0x02
   +#define EXCEPTION_FIQ   0x01

   Why the noise?

  these are simply cruft from a previous life of KVM/ARM.

  Ok, then please get rid of them.

   +static inline enum vcpu_mode vcpu_mode(struct kvm_vcpu *vcpu) {
   +   u8 mode = __vcpu_mode(vcpu-arch.regs.cpsr);
   +   BUG_ON(mode == 0xf);
   +   return mode;
   +}

   I noticed that you have a fair few BUG_ONs throughout the series.
   Fair enough, but for hyp code is that really the right thing to do?
   Killing the guest could make more sense, perhaps?

  the idea is to have BUG_ONs that are indeed BUG_ONs that we want to
  catch explicitly on the host. We have had a pass over the code to
  change all the BUG_ONs that can be provoked by the guest and inject
  the proper exceptions into the guest in this case. If you find places
  where this is not the case, it should be changed, and do let me know.

  Ok, so are you saying that a BUG_ON due to some detected inconsistency
  with one guest may not necessarily terminate the other guests? BUG_ONs
  in the host seem like a bad idea if the host is able to continue with
  a subset of guests.

 No, I'm saying a BUG_ON is an actual BUG, it should not happen and there
 should be nowhere where a guest can cause a BUG_ON to occur in the host,
 because that would be a bug.

 We basically never kill a guest unless really extreme things happen (like
 we cannot allocate a pte, in which case we return -ENOMEM). If a guest
 does something really weird, that guest will receive the appropriate
 exception (undefined, prefetch abort, ...)

I agree with Will. It seems to be overreacting to kill the entire system.

From the code above, BUG_ON case clearly points out that there happened a
serious bug case. However, killing the corresponding VM may not cause any
further problem.
Then leave some logs for debugging and killing the VM seems to be enough.

Let's assume KVM for ARM is distributed with a critical bug.
If the case is defended by BUG_ON, it will cause host to shutdown.
If the case is defended by killing VM, it will cause VM to shutdown.
In my opinion, latter case seems to be better.

I looked for a guide on BUG_ON and found this:
 http://yarchive.net/comp/linux/BUG.html

   +static inline u32 *vcpu_pc(struct kvm_vcpu *vcpu) {
   +   return vcpu_reg(vcpu, 15); }

   If you stick a struct pt_regs into struct kvm_regs, you could reuse
   ARM_pc here etc.

  I prefer not to, because we'd have those registers presumably for usr
  mode and then we only define the others explicit. I think it's much
  clearer to look at kvm_regs today.

  I disagree and think that you should reuse as much of the arch/arm/
  code as possible. Not only does it make it easier to read by people
  who are familiar with that code (and in turn get you more reviewers)
  but it also means that we limit the amount of duplication that we have.

 Reusing a struct just for the sake of reusing is not necessarily an
 improvement. Some times it complicates things, and some times it's
 misleading. To me, pt_regs carry the meaning that these are the registers
 for a user space process that traps into the kernel - in KVM we emulate a
 virtual CPU and that current definition is quite clear.

 The argument that more people will review the code if the struct contains
 a pt_regs field rather than a usr_regs field is completely invalid,
 because I'm sure everyone that reviews virtualization code will know that
 user mode is a mode on the cpu and it has some registers and this is the
 state we store when we context switch a VM - pt_regs could be read as the
 regs that we stored in the

Re: [PATCH RFC 1/2] kvm: Handle undercommitted guest case in PLE handler

2012-10-04 Thread Peter Zijlstra

On Thu, 2012-10-04 at 14:41 +0200, Avi Kivity wrote:
 
 Again the numbers are ridiculously high for arch_local_irq_restore.
 Maybe there's a bad perf/kvm interaction when we're injecting an
 interrupt, I can't believe we're spending 84% of the time running the
 popf instruction. 

Smells like a software fallback that doesn't do NMI, hrtimer based
sampling typically hits popf where we re-enable interrupts.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/3] virtio-net: inline header support

2012-10-04 Thread Paolo Bonzini

Il 04/10/2012 14:51, Rusty Russell ha scritto:
 Paolo Bonzini pbonz...@redhat.com writes:
 
 Il 04/10/2012 02:11, Rusty Russell ha scritto:
 There's a reason I haven't done this.  I really, really dislike my
 implemention isn't broken feature bits.  We could have an infinite
 number of them, for each bug in each device.

 However, this bug affects (almost) all implementations and (almost) all
 devices.  It even makes sense to reserve a transport feature bit for it
 instead of a device feature bit.

 Perhaps, but we have to fix the bugs first!

 Yes. :)  Isn't that what mst's patch does?

 As I said, my torture patch broke qemu immediately.  Since noone has
 leapt onto fixing that, I'll take a look now...

 I can look at virtio-scsi.
 
 Actually, you can't, see my reply to Anthony...
 
 Message-ID: 87lifm1y1n@rustcorp.com.au

struct virtio_scsi_req_cmd {
// Read-only
u8 lun[8];
u64 id;
u8 task_attr;
u8 prio;
u8 crn;
char cdb[cdb_size];
char dataout[];
// Write-only part
u32 sense_len;
u32 residual;
u16 status_qualifier;
u8 status;
u8 response;
u8 sense[sense_size];
char datain[];
};

where cdb_size and sense_size come from configuration space.  The device
right now expects everything before dataout/datain to be in a single
descriptor, but that's in no way part of the spec.  Am I missing
something egregious?

Paolo

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 06/15] KVM: ARM: Initial skeleton to compile KVM support

2012-10-04 Thread Christoffer Dall

On Thu, Oct 4, 2012 at 9:02 AM, Min-gyu Kim mingyu84@samsung.com wrote:

 -Original Message-
 From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On
 Behalf Of Christoffer Dall
 Sent: Monday, October 01, 2012 4:22 AM
 To: Will Deacon
 Cc: kvm@vger.kernel.org; linux-arm-ker...@lists.infradead.org;
 kvm...@lists.cs.columbia.edu; rusty.russ...@linaro.org; a...@redhat.com;
 marc.zyng...@arm.com
 Subject: Re: [PATCH 06/15] KVM: ARM: Initial skeleton to compile KVM
 support

 On Thu, Sep 27, 2012 at 10:13 AM, Will Deacon will.dea...@arm.com wrote:
  On Wed, Sep 26, 2012 at 02:43:14AM +0100, Christoffer Dall wrote:
  On 09/25/2012 11:20 AM, Will Deacon wrote:
   +/* Multiprocessor Affinity Register */
   +#define MPIDR_CPUID(0x3  0)

   I'm fairly sure we already have code under arch/arm/ for dealing
   with the mpidr. Let's re-use that rather than reinventing it here.

  I see some defines in topology.c - do you want some of these factored
  out into a header file that we can then also use from kvm? If so,
 where?

  I guess either in topology.h or a new header (topology-bits.h).

   +#define EXCEPTION_NONE  0
   +#define EXCEPTION_RESET 0x80
   +#define EXCEPTION_UNDEFINED 0x40
   +#define EXCEPTION_SOFTWARE  0x20
   +#define EXCEPTION_PREFETCH  0x10
   +#define EXCEPTION_DATA  0x08
   +#define EXCEPTION_IMPRECISE 0x04
   +#define EXCEPTION_IRQ   0x02
   +#define EXCEPTION_FIQ   0x01

   Why the noise?

  these are simply cruft from a previous life of KVM/ARM.

  Ok, then please get rid of them.

   +static inline enum vcpu_mode vcpu_mode(struct kvm_vcpu *vcpu) {
   +   u8 mode = __vcpu_mode(vcpu-arch.regs.cpsr);
   +   BUG_ON(mode == 0xf);
   +   return mode;
   +}

   I noticed that you have a fair few BUG_ONs throughout the series.
   Fair enough, but for hyp code is that really the right thing to do?
   Killing the guest could make more sense, perhaps?

  the idea is to have BUG_ONs that are indeed BUG_ONs that we want to
  catch explicitly on the host. We have had a pass over the code to
  change all the BUG_ONs that can be provoked by the guest and inject
  the proper exceptions into the guest in this case. If you find places
  where this is not the case, it should be changed, and do let me know.

  Ok, so are you saying that a BUG_ON due to some detected inconsistency
  with one guest may not necessarily terminate the other guests? BUG_ONs
  in the host seem like a bad idea if the host is able to continue with
  a subset of guests.

 No, I'm saying a BUG_ON is an actual BUG, it should not happen and there
 should be nowhere where a guest can cause a BUG_ON to occur in the host,
 because that would be a bug.

 We basically never kill a guest unless really extreme things happen (like
 we cannot allocate a pte, in which case we return -ENOMEM). If a guest
 does something really weird, that guest will receive the appropriate
 exception (undefined, prefetch abort, ...)

 I agree with Will. It seems to be overreacting to kill the entire system.

 From the code above, BUG_ON case clearly points out that there happened a
 serious bug case. However, killing the corresponding VM may not cause any
 further problem.
 Then leave some logs for debugging and killing the VM seems to be enough.

 Let's assume KVM for ARM is distributed with a critical bug.
 If the case is defended by BUG_ON, it will cause host to shutdown.
 If the case is defended by killing VM, it will cause VM to shutdown.
 In my opinion, latter case seems to be better.

 I looked for a guide on BUG_ON and found this:
  http://yarchive.net/comp/linux/BUG.html

I completely agree with all this, no further argument is needed. The
point of a BUG_ON is to explicitly state the reason for a bug that
will anyhow cause the host kernel to overall malfunction. The above
bug_on statement is long gone (see new patches), and if you see other
cases like this, the code should have been tested and we can remove
the BUG_ON.

   +static inline u32 *vcpu_pc(struct kvm_vcpu *vcpu) {
   +   return vcpu_reg(vcpu, 15); }

   If you stick a struct pt_regs into struct kvm_regs, you could reuse
   ARM_pc here etc.

  I prefer not to, because we'd have those registers presumably for usr
  mode and then we only define the others explicit. I think it's much
  clearer to look at kvm_regs today.

  I disagree and think that you should reuse as much of the arch/arm/
  code as possible. Not only does it make it easier to read by people
  who are familiar with that code (and in turn get you more reviewers)
  but it also means that we limit the amount of duplication that we have.

 Reusing a struct just for the sake of reusing is not necessarily an
 improvement. Some times it complicates things, and some times it's
 misleading. To me, pt_regs carry the meaning that these are the registers
 for a user space process that traps into the kernel - in KVM we emulate a
 virtual CPU and

Re: [kvmarm] [PATCH 06/15] KVM: ARM: Initial skeleton to compile KVM support

2012-10-04 Thread Avi Kivity

On 09/25/2012 05:20 PM, Will Deacon wrote:
 +   case KVM_GET_REG_LIST: {
 +   struct kvm_reg_list __user *user_list = argp;
 +   struct kvm_reg_list reg_list;
 +   unsigned n;
 +
 +   if (copy_from_user(reg_list, user_list, sizeof reg_list))
 +   return -EFAULT;
 +   n = reg_list.n;
 +   reg_list.n = kvm_arm_num_regs(vcpu);
 +   if (copy_to_user(user_list, reg_list, sizeof reg_list))
 +   return -EFAULT;
 +   if (n  reg_list.n)
 +   return -E2BIG;
 +   return kvm_arm_copy_reg_indices(vcpu, user_list-reg);
 
 kvm_reg_list sounds like it could be done using a regset instead.

Wouldn't those regsets be userspace oriented?

For example, the GPRs returned here include all the shadowed interrupt
registers (or however they're called) while most user oriented APIs
would only include the user visible registers.

FWIW, we're trying to move to an architecture independent ABI for KVM
registers, but that's a lot of work since we need to make sure all the
weird x86 registers (and non-register state) fit into that.  Maybe that
ABI will be regset based, but I don't want to block the ARM port on this.

-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] [patch 2/6] Use machine options to emulate -no-kvm-irqchip

2012-10-04 Thread Anthony Liguori

Jan Kiszka jan.kis...@web.de writes:

 On 2012-10-03 20:26, Marcelo Tosatti wrote:
 On Wed, Oct 03, 2012 at 07:24:48PM +0200, Jan Kiszka wrote:
 On 2012-10-03 19:16, Anthony Liguori wrote:
 Jan Kiszka jan.kis...@web.de writes:

 On 2012-10-03 17:03, Marcelo Tosatti wrote:
 On Wed, Oct 03, 2012 at 09:40:17AM -0500, Anthony Liguori wrote:
 Marcelo Tosatti mtosa...@redhat.com writes:

 Commit 3ad763fcba5bd0ec5a79d4a9b6baeef119dd4a3d from qemu-kvm.git.

 From: Jan Kiszka jan.kis...@siemens.com
 
 Upstream is moving towards this mechanism, so start using it in 
 qemu-kvm
 already to configure the specific defaults: kvm enabled on, just like
 in-kernel irqchips.

 Signed-off-by: Marcelo Tosatti mtosa...@redhat.com


 Reviewed-by: Anthony Liguori aligu...@us.ibm.com

 Although it's a little odd to have From: Jan without a SoB...

 Agree, Jan can you ACK?

 I wasn't able to join the call yesterday: Is there a removal schedule
 associated with those switches? Also, why pushing things upstream, even
 when only for one release, that have been loudly deprecated for a while
 in qemu-kvm? Some switches are lacking deprecated warnings on the
 console, and -no-kvm is missing completely. I tend to focus on patch 1 
 5, dropping the rest - based on relevance for production use.

 The distros need to keep these flags to do the switch.

 Why? Should be documented in commit log.

  I see no point
 in deprecating them since they're trivially easy to maintain.

 Given the level of cr** we already have in the command line, they are
 kind of noise, yes. But even then, these patches are not consistent as
 pointed out above.

 Also, they should not be documented to avoid being spread. That's what
 we did with other deprecated switches in QEMU.

 Jan
 
 Jan,
 
 You're comments to the patch are:
 
 - No documentation.

 See e.g. how -M is handled in qemu-options.hx.

 - Expiration date.

 Anthony said forever, but I think we should remove all those that
 issue deprecation warnings after 1-2 years.

 - Changelog explaining what?? (didnt get that). Perhaps better changelog
   in general?

 I'm still failing to understand who could depend on -no-kvm-irqchip or
 -no-kvm-pit. And I don't understand why -no-kvm was not included. Soe
 the reasons for include -X should be provided. Also check your patch
 subjects again, at least one was wrong.

-no-kvm should be included too.

I just ran across a user that was injecting '-no-kvm-irqchip' in their
libvirt XML via a custom attribute.  It turned out it was to work around
broken MSI support in their funky guest they were running.  It was the
wrong solution to the problem but they were doing it regardless.

The point is, there are users in the wild using these options.  There's
no reason to remove them if they are trivial to maintain (and they are
in their current form).

Regards,

Anthony Liguori


 Jan

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

RE: [PATCH 6/6] KVM: booke/bookehv: Add debug stub support

2012-10-04 Thread Bhushan Bharat-R65777



 -Original Message-
 From: Alexander Graf [mailto:ag...@suse.de]
 Sent: Thursday, October 04, 2012 4:56 PM
 To: Bhushan Bharat-R65777
 Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org
 Subject: Re: [PATCH 6/6] KVM: booke/bookehv: Add debug stub support
 
 
 On 04.10.2012, at 13:06, Bhushan Bharat-R65777 wrote:
 
 
 
  -Original Message-
  From: Alexander Graf [mailto:ag...@suse.de]
  Sent: Monday, September 24, 2012 9:50 PM
  To: Bhushan Bharat-R65777
  Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; Bhushan
  Bharat-R65777
  Subject: Re: [PATCH 6/6] KVM: booke/bookehv: Add debug stub support
 
 
  On 21.08.2012, at 15:52, Bharat Bhushan wrote:
 
  This patch adds the debug stub support on booke/bookehv.
  Now QEMU debug stub can use hw breakpoint, watchpoint and software
  breakpoint to debug guest.
 
  Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
  ---
  arch/powerpc/include/asm/kvm.h|   29 ++-
  arch/powerpc/include/asm/kvm_host.h   |5 +
  arch/powerpc/kernel/asm-offsets.c |   26 ++
  arch/powerpc/kvm/booke.c  |  144 
  +--
 --
  arch/powerpc/kvm/booke_interrupts.S   |  110 +
  arch/powerpc/kvm/bookehv_interrupts.S |  141
 +++-
  arch/powerpc/kvm/e500mc.c |3 +-
  7 files changed, 435 insertions(+), 23 deletions(-)
 
  diff --git a/arch/powerpc/include/asm/kvm.h
  b/arch/powerpc/include/asm/kvm.h index 61b197e..53479ea 100644
  --- a/arch/powerpc/include/asm/kvm.h
  +++ b/arch/powerpc/include/asm/kvm.h
  @@ -25,6 +25,7 @@
  /* Select powerpc specific features in linux/kvm.h */ #define
  __KVM_HAVE_SPAPR_TCE #define __KVM_HAVE_PPC_SMT
  +#define __KVM_HAVE_GUEST_DEBUG
 
  struct kvm_regs {
__u64 pc;
  @@ -264,7 +265,31 @@ struct kvm_fpu {
__u64 fpr[32];
  };
 
  +
  +/*
  + * Defines for h/w breakpoint, watchpoint (read, write or both) and
  + * software breakpoint.
  + * These are used as type in KVM_SET_GUEST_DEBUG ioctl and status
  + * for KVM_DEBUG_EXIT.
  + */
  +#define KVMPPC_DEBUG_NONE0x0
  +#define KVMPPC_DEBUG_BREAKPOINT  (1UL  1)
  +#define KVMPPC_DEBUG_WATCH_WRITE (1UL  2)
  +#define KVMPPC_DEBUG_WATCH_READ  (1UL  3)
  struct kvm_debug_exit_arch {
  + __u64 pc;
  + /*
  +  * exception - returns the exception number. If the KVM_DEBUG_EXIT
  +  * exit is not handled (say not h/w breakpoint or software breakpoint
  +  * set for this address) by qemu then it is supposed to inject this
  +  * exception to guest.
  +  */
  + __u32 exception;
  + /*
  +  * exiting to userspace because of h/w breakpoint, watchpoint
  +  * (read, write or both) and software breakpoint.
  +  */
  + __u32 status;
  };
 
  /* for KVM_SET_GUEST_DEBUG */
  @@ -276,10 +301,6 @@ struct kvm_guest_debug_arch {
 * Type denotes h/w breakpoint, read watchpoint, write
 * watchpoint or watchpoint (both read and write).
 */
  -#define KVMPPC_DEBUG_NOTYPE  0x0
  -#define KVMPPC_DEBUG_BREAKPOINT  (1UL  1)
  -#define KVMPPC_DEBUG_WATCH_WRITE (1UL  2)
  -#define KVMPPC_DEBUG_WATCH_READ  (1UL  3)
__u32 type;
__u32 pad1;
__u64 pad2;
  diff --git a/arch/powerpc/include/asm/kvm_host.h
  b/arch/powerpc/include/asm/kvm_host.h
  index c7219c1..3ba465a 100644
  --- a/arch/powerpc/include/asm/kvm_host.h
  +++ b/arch/powerpc/include/asm/kvm_host.h
  @@ -496,7 +496,12 @@ struct kvm_vcpu_arch {
u32 mmucfg;
u32 epr;
u32 crit_save;
  + /* guest debug registers*/
struct kvmppc_booke_debug_reg dbg_reg;
  + /* shadow debug registers */
  + struct kvmppc_booke_debug_reg shadow_dbg_reg;
  + /* host debug registers*/
  + struct kvmppc_booke_debug_reg host_dbg_reg;
  #endif
gpa_t paddr_accessed;
gva_t vaddr_accessed;
  diff --git a/arch/powerpc/kernel/asm-offsets.c
  b/arch/powerpc/kernel/asm-
  offsets.c
  index 555448e..6987821 100644
  --- a/arch/powerpc/kernel/asm-offsets.c
  +++ b/arch/powerpc/kernel/asm-offsets.c
  @@ -564,6 +564,32 @@ int main(void)
DEFINE(VCPU_FAULT_DEAR, offsetof(struct kvm_vcpu, arch.fault_dear));
DEFINE(VCPU_FAULT_ESR, offsetof(struct kvm_vcpu, arch.fault_esr));
DEFINE(VCPU_CRIT_SAVE, offsetof(struct kvm_vcpu, arch.crit_save));
  + DEFINE(VCPU_DBSR, offsetof(struct kvm_vcpu, arch.dbsr));
  + DEFINE(VCPU_SHADOW_DBG, offsetof(struct kvm_vcpu, arch.shadow_dbg_reg));
  + DEFINE(VCPU_HOST_DBG, offsetof(struct kvm_vcpu, arch.host_dbg_reg));
  + DEFINE(KVMPPC_DBG_DBCR0, offsetof(struct kvmppc_booke_debug_reg,
  +   dbcr0));
  + DEFINE(KVMPPC_DBG_DBCR1, offsetof(struct kvmppc_booke_debug_reg,
  +   dbcr1));
  + DEFINE(KVMPPC_DBG_DBCR2, offsetof(struct kvmppc_booke_debug_reg,
  +   dbcr2));
  +#ifdef CONFIG_KVM_E500MC
  + DEFINE(KVMPPC_DBG_DBCR4, offsetof(struct kvmppc_booke_debug_reg,
  +

[GIT PULL] KVM updates for the 3.7 merge window

2012-10-04 Thread Avi Kivity

Linus, please pull from the repo and tag at:

  git://git.kernel.org/pub/scm/virt/kvm/kvm.git tags/kvm-3.7-1

to receive the KVM updates for the 3.7 merge windows.  Highlights of the
changes for this release include support for vfio level triggered interrupts,
improved big real mode support on older Intels, a streamlines guest page table
walker, guest APIC speedups, PIO optimizations, better overcommit handling, and
read-only memory.

Shortlog and diffstat follow.


KVM updates for the 3.7 merge window


Alex Williamson (1):
  KVM: Add resampling irqfds for level triggered interrupts

Alexander Graf (1):
  KVM: Add ppc hypercall documentation

Avi Kivity (36):
  Merge branch 'queue' into next
  KVM: Don't update PPR on any APIC read
  KVM: Remove internal timer abstraction
  KVM: Simplify kvm_timer
  KVM: Simplify kvm_pit_timer
  KVM: fold kvm_pit_timer into kvm_kpit_state
  Merge remote-tracking branch 'upstream' into next
  KVM: VMX: Advertize RDTSC exiting to nested guests
  KVM: x86 emulator: access GPRs on demand
  KVM: VMX: Separate saving pre-realmode state from setting segments
  KVM: VMX: Fix incorrect lookup of segment S flag in fix_pmode_dataseg()
  KVM: VMX: Use kvm_segment to save protected-mode segments when entering 
realmode
  KVM: VMX: Retain limit and attributes when entering protected mode
  KVM: VMX: Allow real mode emulation using vm86 with dpl=0
  KVM: VMX: Allow vm86 virtualization of big real mode
  KVM: x86 emulator: Leave segment limit and attributs alone in real mode
  KVM: x86 emulator: Check segment limits in real mode too
  KVM: x86 emulator: Fix #GP error code during linearization
  KVM: VMX: Return real real-mode segment data even if 
emulate_invalid_guest_state=1
  KVM: VMX: Preserve segment limit and access rights in real mode
  KVM: VMX: Save all segment data in real mode
  KVM: VMX: Ignore segment G and D bits when considering whether we can 
virtualize
  KVM: VMX: Make lto-friendly
  KVM: VMX: Make use of asm.h
  KVM: SVM: Make use of asm.h
  KVM: MMU: Push clean gpte write protection out of gpte_access()
  KVM: MMU: Optimize gpte_access() slightly
  KVM: MMU: Move gpte_access() out of paging_tmpl.h
  KVM: MMU: Update accessed and dirty bits after guest pagetable walk
  KVM: MMU: Optimize pte permission checks
  KVM: MMU: Simplify walk_addr_generic() loop
  KVM: MMU: Optimize is_last_gpte()
  KVM: MMU: Eliminate eperm temporary
  KVM: MMU: Avoid access/dirty update loop if all is well
  KVM: MMU: Eliminate pointless temporary 'ac'
  Merge branch 'queue' into next

Christian Borntraeger (1):
  KVM: s390: Fix vcpu_load handling in interrupt code

Christoffer Dall (1):
  KVM: Move KVM_IRQ_LINE to arch-generic code

Cornelia Huck (3):
  s390/dis: Instruction decoding interface
  KVM: s390: Add architectural trace events
  KVM: s390: Add implementation-specific trace events

Florian Westphal (1):
  KVM guest: disable stealtime on reboot to avoid mem corruption

Gavin Shan (1):
  KVM: PPC: book3s: fix build error caused by gfn_to_hva_memslot()

Gleb Natapov (18):
  KVM: x86 emulator: drop unneeded call to get_segment()
  KVM: x86: update KVM_SAVE_MSRS_BEGIN to correct value
  KVM: clean up kvm_(set|get)_apic_base
  KVM: use kvm_lapic_set_base() to change apic_base
  KVM: mark apic enabled on start up
  jump_label: Export jump_label_rate_limit()
  KVM: use jump label to optimize checking for HW enabled APIC in APIC_BASE 
MSR
  KVM: use jump label to optimize checking for SW enabled apic in spurious 
interrupt register
  KVM: use jump label to optimize checking for in kernel local apic presence
  KVM: inline kvm_apic_present() and kvm_lapic_enabled()
  KVM: correctly detect APIC SW state in kvm_apic_post_state_restore()
  KVM: VMX: restore MSR_IA32_DEBUGCTLMSR after VMEXIT
  KVM: cleanup pic reset
  KVM: Provide userspace IO exit completion callback
  KVM: emulator: make x86 emulation modes enum instead of defines
  KVM: emulator: string_addr_inc() cleanup
  KVM: emulator: optimize rep ins handling
  KVM: optimize apic interrupt delivery

Guo Chao (7):
  KVM: VMX: Fix typos
  KVM: SVM: Fix typos
  KVM: x86: Fix typos in x86.c
  KVM: x86: Fix typos in emulate.c
  KVM: x86: Fix typos in cpuid.c
  KVM: x86: Fix typos in lapic.c
  KVM: x86: Fix typos in pmu.c

Jan Kiszka (2):
  KVM: Improve wording of KVM_SET_USER_MEMORY_REGION documentation
  KVM: x86: Fix guest debug across vcpu INIT reset

Liu, Jinsong (1):
  KVM: Depend on HIGH_RES_TIMERS

Marcelo Tosatti (7):
  KVM: x86: fix pvclock guest stopped flag reporting
  x86: KVM guest: merge CONFIG_KVM_CLOCK into

Re: vfio problems

2012-10-04 Thread Alex Williamson

On Thu, 2012-10-04 at 11:40 +0200, Dominic Eschweiler wrote:
 Hi,
 
 I just recently started to play with vfio, since the new Kernel 3.6
 comes directly with an integrated vfio-stack. My problem currently is,
 that I'm not able to bind the vfio-pci to the (unused) smbus controller
 in my laptop. 
 
 Here are the steps I did and the related results:
 
 
 # modprobe vfio-pci
 
 [ 1609.065705] VFIO - User Level meta-driver version: 0.3
 
 # lspci
 
 ...
 00:1f.3 SMBus: Intel Corporation 6 Series/C200 Series Chipset Family
 SMBus Controller (rev 05)
 ...
 
 # lspci -n -s :00:1f.3
 
 00:1f.3 0c05: 8086:1c22 (rev 05)
 
 # echo 8086 1c22  /sys/bus/pci/drivers/vfio-pci/new_id
 
 [ 4485.759148] vfio-pci: probe of :00:1f.3 failed with error -22
 
 
 I omitted the unbind step, which is described in the documentation,
 since the device is not claimed by any driver at all. Also the error
 -22 statement isn't really helping in this case. 
 
 Any ideas?
 

What does this report?

readlink /sys/bus/pci/devices/\:00\:1f.3/iommu_group

The likely cause of an EINVAL for an endpoint device is that it's not
part of an IOMMU group, which may mean you don't have an IOMMU enabled.
You can also look in /sys/kernel/iommu_groups to see the groups.
Thanks,

Alex

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] [patch 2/6] Use machine options to emulate -no-kvm-irqchip

2012-10-04 Thread Jan Kiszka

On 2012-10-04 16:21, Anthony Liguori wrote:
 -no-kvm should be included too.

Reminds me that we still need to agree on the final default accel strategy.

 
 I just ran across a user that was injecting '-no-kvm-irqchip' in their
 libvirt XML via a custom attribute.  It turned out it was to work around
 broken MSI support in their funky guest they were running.  It was the
 wrong solution to the problem but they were doing it regardless.
 
 The point is, there are users in the wild using these options.  There's
 no reason to remove them if they are trivial to maintain (and they are
 in their current form).

So let's define a consistent policy for them all:
 - warn on the command line on use
 - avoid adding them to the help or other user documentation
 - keep them until we rework the whole command line

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 6/6] KVM: booke/bookehv: Add debug stub support

2012-10-04 Thread Alexander Graf


On 04.10.2012, at 16:22, Bhushan Bharat-R65777 wrote:

 
 
 -Original Message-
 From: Alexander Graf [mailto:ag...@suse.de]
 Sent: Thursday, October 04, 2012 4:56 PM
 To: Bhushan Bharat-R65777
 Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org
 Subject: Re: [PATCH 6/6] KVM: booke/bookehv: Add debug stub support
 
 
 On 04.10.2012, at 13:06, Bhushan Bharat-R65777 wrote:
 
 
 
 -Original Message-
 From: Alexander Graf [mailto:ag...@suse.de]
 Sent: Monday, September 24, 2012 9:50 PM
 To: Bhushan Bharat-R65777
 Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; Bhushan
 Bharat-R65777
 Subject: Re: [PATCH 6/6] KVM: booke/bookehv: Add debug stub support
 
 
 On 21.08.2012, at 15:52, Bharat Bhushan wrote:
 
 This patch adds the debug stub support on booke/bookehv.
 Now QEMU debug stub can use hw breakpoint, watchpoint and software
 breakpoint to debug guest.
 
 Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
 ---
 arch/powerpc/include/asm/kvm.h|   29 ++-
 arch/powerpc/include/asm/kvm_host.h   |5 +
 arch/powerpc/kernel/asm-offsets.c |   26 ++
 arch/powerpc/kvm/booke.c  |  144 
 +--
 --
 arch/powerpc/kvm/booke_interrupts.S   |  110 +
 arch/powerpc/kvm/bookehv_interrupts.S |  141
 +++-
 arch/powerpc/kvm/e500mc.c |3 +-
 7 files changed, 435 insertions(+), 23 deletions(-)
 
 diff --git a/arch/powerpc/include/asm/kvm.h
 b/arch/powerpc/include/asm/kvm.h index 61b197e..53479ea 100644
 --- a/arch/powerpc/include/asm/kvm.h
 +++ b/arch/powerpc/include/asm/kvm.h
 @@ -25,6 +25,7 @@
 /* Select powerpc specific features in linux/kvm.h */ #define
 __KVM_HAVE_SPAPR_TCE #define __KVM_HAVE_PPC_SMT
 +#define __KVM_HAVE_GUEST_DEBUG
 
 struct kvm_regs {
   __u64 pc;
 @@ -264,7 +265,31 @@ struct kvm_fpu {
   __u64 fpr[32];
 };
 
 +
 +/*
 + * Defines for h/w breakpoint, watchpoint (read, write or both) and
 + * software breakpoint.
 + * These are used as type in KVM_SET_GUEST_DEBUG ioctl and status
 + * for KVM_DEBUG_EXIT.
 + */
 +#define KVMPPC_DEBUG_NONE0x0
 +#define KVMPPC_DEBUG_BREAKPOINT  (1UL  1)
 +#define KVMPPC_DEBUG_WATCH_WRITE (1UL  2)
 +#define KVMPPC_DEBUG_WATCH_READ  (1UL  3)
 struct kvm_debug_exit_arch {
 + __u64 pc;
 + /*
 +  * exception - returns the exception number. If the KVM_DEBUG_EXIT
 +  * exit is not handled (say not h/w breakpoint or software breakpoint
 +  * set for this address) by qemu then it is supposed to inject this
 +  * exception to guest.
 +  */
 + __u32 exception;
 + /*
 +  * exiting to userspace because of h/w breakpoint, watchpoint
 +  * (read, write or both) and software breakpoint.
 +  */
 + __u32 status;
 };
 
 /* for KVM_SET_GUEST_DEBUG */
 @@ -276,10 +301,6 @@ struct kvm_guest_debug_arch {
* Type denotes h/w breakpoint, read watchpoint, write
* watchpoint or watchpoint (both read and write).
*/
 -#define KVMPPC_DEBUG_NOTYPE  0x0
 -#define KVMPPC_DEBUG_BREAKPOINT  (1UL  1)
 -#define KVMPPC_DEBUG_WATCH_WRITE (1UL  2)
 -#define KVMPPC_DEBUG_WATCH_READ  (1UL  3)
   __u32 type;
   __u32 pad1;
   __u64 pad2;
 diff --git a/arch/powerpc/include/asm/kvm_host.h
 b/arch/powerpc/include/asm/kvm_host.h
 index c7219c1..3ba465a 100644
 --- a/arch/powerpc/include/asm/kvm_host.h
 +++ b/arch/powerpc/include/asm/kvm_host.h
 @@ -496,7 +496,12 @@ struct kvm_vcpu_arch {
   u32 mmucfg;
   u32 epr;
   u32 crit_save;
 + /* guest debug registers*/
   struct kvmppc_booke_debug_reg dbg_reg;
 + /* shadow debug registers */
 + struct kvmppc_booke_debug_reg shadow_dbg_reg;
 + /* host debug registers*/
 + struct kvmppc_booke_debug_reg host_dbg_reg;
 #endif
   gpa_t paddr_accessed;
   gva_t vaddr_accessed;
 diff --git a/arch/powerpc/kernel/asm-offsets.c
 b/arch/powerpc/kernel/asm-
 offsets.c
 index 555448e..6987821 100644
 --- a/arch/powerpc/kernel/asm-offsets.c
 +++ b/arch/powerpc/kernel/asm-offsets.c
 @@ -564,6 +564,32 @@ int main(void)
   DEFINE(VCPU_FAULT_DEAR, offsetof(struct kvm_vcpu, arch.fault_dear));
   DEFINE(VCPU_FAULT_ESR, offsetof(struct kvm_vcpu, arch.fault_esr));
   DEFINE(VCPU_CRIT_SAVE, offsetof(struct kvm_vcpu, arch.crit_save));
 + DEFINE(VCPU_DBSR, offsetof(struct kvm_vcpu, arch.dbsr));
 + DEFINE(VCPU_SHADOW_DBG, offsetof(struct kvm_vcpu, arch.shadow_dbg_reg));
 + DEFINE(VCPU_HOST_DBG, offsetof(struct kvm_vcpu, arch.host_dbg_reg));
 + DEFINE(KVMPPC_DBG_DBCR0, offsetof(struct kvmppc_booke_debug_reg,
 +   dbcr0));
 + DEFINE(KVMPPC_DBG_DBCR1, offsetof(struct kvmppc_booke_debug_reg,
 +   dbcr1));
 + DEFINE(KVMPPC_DBG_DBCR2, offsetof(struct kvmppc_booke_debug_reg,
 +   dbcr2));
 +#ifdef CONFIG_KVM_E500MC
 + DEFINE(KVMPPC_DBG_DBCR4, offsetof(struct kvmppc_booke_debug_reg,
 +   dbcr4));
 +#endif
 +

Re: [PATCH RFC 1/2] kvm: Handle undercommitted guest case in PLE handler

2012-10-04 Thread Andrew Theurer

On Thu, 2012-10-04 at 14:41 +0200, Avi Kivity wrote:
 On 10/04/2012 12:49 PM, Raghavendra K T wrote:
  On 10/03/2012 10:35 PM, Avi Kivity wrote:
  On 10/03/2012 02:22 PM, Raghavendra K T wrote:
  So I think it's worth trying again with ple_window of 2-4.
 
 
  Hi Avi,
 
  I ran different benchmarks increasing ple_window, and results does not
  seem to be encouraging for increasing ple_window.
 
  Thanks for testing! Comments below.
 
  Results:
  16 core PLE machine with 16 vcpu guest.
 
  base kernel = 3.6-rc5 + ple handler optimization patch
  base_pleopt_8k = base kernel + ple window = 8k
  base_pleopt_16k = base kernel + ple window = 16k
  base_pleopt_32k = base kernel + ple window = 32k
 
 
  Percentage improvements of benchmarks w.r.t base_pleopt with
  ple_window = 4096
 
  base_pleopt_8kbase_pleopt_16kbase_pleopt_32k
  - 

 
  kernbench_1x-5.54915-15.94529-44.31562
  kernbench_2x-7.89399-17.75039-37.73498
 
  So, 44% degradation even with no overcommit?  That's surprising.
  
  Yes. Kernbench was run with #threads = #vcpu * 2 as usual. Is it
  spending 8 times the original ple_window cycles for 16 vcpus
  significant?
 
 A PLE exit when not overcommitted cannot do any good, it is better to
 spin in the guest rather that look for candidates on the host.  In fact
 when we benchmark we often disable PLE completely.

Agreed.  However, I really do not understand why the kernbench regressed
with bigger ple_window.  It should stay the same or improve.  Raghu, do
you have perf data for the kernbench runs?
 
  
 
  I also got perf top output to analyse the difference. Difference comes
  because of flushtlb (and also spinlock).
 
  That's in the guest, yes?
  
  Yes. Perf is in guest.
  
 
 
  Ebizzy run for 4k ple_window
  -  87.20%  [kernel]  [k] arch_local_irq_restore
  - arch_local_irq_restore
 - 100.00% _raw_spin_unlock_irqrestore
+ 52.89% release_pages
+ 47.10% pagevec_lru_move_fn
  -   5.71%  [kernel]  [k] arch_local_irq_restore
  - arch_local_irq_restore
 + 86.03% default_send_IPI_mask_allbutself_phys
 + 13.96% default_send_IPI_mask_sequence_phys
  -   3.10%  [kernel]  [k] smp_call_function_many
smp_call_function_many
 
 
  Ebizzy run for 32k ple_window
 
  -  91.40%  [kernel]  [k] arch_local_irq_restore
  - arch_local_irq_restore
 - 100.00% _raw_spin_unlock_irqrestore
+ 53.13% release_pages
+ 46.86% pagevec_lru_move_fn
  -   4.38%  [kernel]  [k] smp_call_function_many
smp_call_function_many
  -   2.51%  [kernel]  [k] arch_local_irq_restore
  - arch_local_irq_restore
 + 90.76% default_send_IPI_mask_allbutself_phys
 + 9.24% default_send_IPI_mask_sequence_phys
 
 
  Both the 4k and the 32k results are crazy.  Why is
  arch_local_irq_restore() so prominent?  Do you have a very high
  interrupt rate in the guest?
  
  How to measure if I have high interrupt rate in guest?
  From /proc/interrupt numbers I am not able to judge :(
 
 'vmstat 1'
 
  
  I went back and got the results on a 32 core machine with 32 vcpu guest.
  Strangely, I got result supporting the claim that increasing ple_window
  helps for non-overcommitted scenario.
  
  32 core 32 vcpu guest 1x scenarios.
  
  ple_gap = 0
  kernbench: Elapsed Time 38.61
  ebizzy: 7463 records/s
  
  ple_window = 4k
  kernbench: Elapsed Time 43.5067
  ebizzy:2528 records/s
  
  ple_window = 32k
  kernebench : Elapsed Time 39.4133
  ebizzy: 7196 records/s
 
 So maybe something was wrong with the first measurement.

OK, this is more in line with what I expected for kernbench.  FWIW, in
order to show an improvement for a larger ple_window, we really need a
workload which we know has a longer lock holding time (without factoring
in LHP).  We have noticed this on IO based locks mostly.  We saw it with
a massive disk IO test (qla2xxx lock), and also with a large web serving
test (some vfs related lock, but I forget what exactly it was).
 
  
  
  perf top for ebizzy for above:
  ple_gap = 0
  -  84.74%  [kernel]  [k] arch_local_irq_restore
 - arch_local_irq_restore
- 100.00% _raw_spin_unlock_irqrestore
   + 50.96% release_pages
   + 49.02% pagevec_lru_move_fn
  -   6.57%  [kernel]  [k] arch_local_irq_restore
 - arch_local_irq_restore
+ 92.54% default_send_IPI_mask_allbutself_phys
+ 7.46% default_send_IPI_mask_sequence_phys
  -   1.54%  [kernel]  [k] smp_call_function_many
   smp_call_function_many
 
 Again the numbers are ridiculously high for arch_local_irq_restore.
 Maybe there's a bad perf/kvm interaction when we're injecting an
 interrupt, I can't believe we're spending 84% of the time running the
 popf instruction.

I do have a feeling that ebizzy just has too many variables and LHP is
just one of many problems.  However, am I curious what perf kvm from

Re: [PATCH RFC 1/2] kvm: Handle undercommitted guest case in PLE handler

2012-10-04 Thread Avi Kivity

On 10/04/2012 03:07 PM, Peter Zijlstra wrote:
 On Thu, 2012-10-04 at 14:41 +0200, Avi Kivity wrote:
 
 Again the numbers are ridiculously high for arch_local_irq_restore.
 Maybe there's a bad perf/kvm interaction when we're injecting an
 interrupt, I can't believe we're spending 84% of the time running the
 popf instruction. 
 
 Smells like a software fallback that doesn't do NMI, hrtimer based
 sampling typically hits popf where we re-enable interrupts.

Good nose, that's probably it.  Raghavendra, can you ensure that the PMU
is properly exposed?  'dmesg' in the guest will tell.  If it isn't, -cpu
host will expose it (and a good idea anyway to get best performance).

-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

RE: [PATCH 6/6] KVM: booke/bookehv: Add debug stub support

2012-10-04 Thread Bhushan Bharat-R65777



  -static int emulation_exit(struct kvm_run *run, struct kvm_vcpu
  *vcpu)
  +static int emulation_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
  + int exit_nr)
  {
  enum emulation_result er;
 
  +   if (unlikely(vcpu-guest_debug  KVM_GUESTDBG_USE_SW_BP) 
  +vcpu-arch.last_inst == KVMPPC_INST_GUEST_GDB) {
 
  This belongs into the normal emulation code path, behind the same
  switch() that everything else goes through.
 
  I am not sure I understood correctly. Below is the reason why I
  placed this
  code here.
  Instruction where software breakpoint is to be set is replaced by ehpriv
  instruction. On e500v2, this is not a valid instruction can causes
  program interrupt. On e500mc, ehpriv is a valid instruction. Both
  the exit path calls emulation_exit(), so we have placed the code in this
 function.
  Do you want this code to be moved in program interrupt exit path for
  e500v2
  and BOOKE_INTERRUPT_HV_PRIV for e500mc?
 
  Ok, in this patch you do (basically):
 
  int emulation_exit()
  {
 if (inst == DEBUG_INST) {
 debug_stuff();
 return;
 }
 
 switch (inst) {
 case INST_A:
 foo();
 
 }
  }
 
  Are not we doing something like this:
  int emulation_exit()
  {
  if (inst == DEBUG_INST) {
  debug_stuff();
  return;
  }
 
  status = kvmppc_emulate_instruction()
  switch (status) {
  case FAIL:
  foo();
  case DONE:
  foo1();
  
  }
  }
 
  Do you want something like this:
 
  int emulation_exit()
  {
 
  status = kvmppc_emulate_instruction()
  switch (status) {
  case FAIL:
  if (inst == DEBUG_INST) {
  debug_stuff();
return;
  }
  foo();
 
  case DONE:
  foo1();
  
  }
  }
 
 No, I want the DEBUG_INST be handled the same as any other instruction we
 emulate.

I would like to understand what you are thinking:
What I derived is , add the instruction in kvmppc_emulate_instruction() (or its 
child function) which, 
1) fill the relevant information in run- , kvmppc_account_exit(vcpu, 
DEBUG_EXITS); and returns EMULATION_DONE
 And in emulation_exit()
 status = kvmppc_emulate_instruction()
 switch (status) {
case EMULATION_DONE:
if (inst == DEBUG)
return RESUME_HOST;
 }
 Or
2) kvmppc_account_exit(vcpu, DEBUG_EXITS); returns EMULATION_DONE;
And in emulation_exit()
 status = kvmppc_emulate_instruction()
 switch (status) {
case EMULATION_DONE:
if (inst == DEBUG) {
fill run- 
return RESUME_HOST;
}
 }

Or
3) kvmppc_account_exit(vcpu, DEBUG_EXITS); returns a new status type 
(EMULATION_DEBUG_INST)
And in emulation_exit()
 status = kvmppc_emulate_instruction()
 switch (status) {
case EMULATION_DEBUG_INST:
fill run- 
return RESUME_HOST;
 }

 
 
  what I want is:
 
  int emulation_exit()
  {
 switch (inst) {
 case INST_A:
 foo(); break;
 case DEBUG_INST:
 debug_stuff(); break;
 
 }
  }
 
 
 
 
  +   run-exit_reason = KVM_EXIT_DEBUG;
  +   run-debug.arch.pc = vcpu-arch.pc;
  +   run-debug.arch.exception = exit_nr;
  +   run-debug.arch.status = 0;
  +   kvmppc_account_exit(vcpu, DEBUG_EXITS);
  +   return RESUME_HOST;
  +   }
  +
  er = kvmppc_emulate_instruction(run, vcpu);
  switch (er) {
  case EMULATE_DONE:
  @@ -697,6 +711,44 @@ static int emulation_exit(struct kvm_run
  *run, struct
  kvm_vcpu *vcpu)
  default:
  BUG();
  }
  +
  +   if (unlikely(vcpu-guest_debug  KVM_GUESTDBG_ENABLE) 
  +   (vcpu-guest_debug  KVM_GUESTDBG_SINGLESTEP)) {
 
  I don't understand how this is supposed to work. When we enable
  singlestep, why would we end up in emulation_exit()?
 
  When singlestep is enabled then we set DBCR0[ICMP] and the debug
  handler
  should be able to handle this. I think you are right.
 
 
  +   run-exit_reason = KVM_EXIT_DEBUG;
  +   return RESUME_HOST;
  +   }
  +}
  +
  +static int kvmppc_handle_debug(struct kvm_run *run, struct
  +kvm_vcpu
  +*vcpu) {
  +   u32 dbsr;
  +
  +#ifndef CONFIG_KVM_BOOKE_HV
  +   if (cpu_has_feature(CPU_FTR_DEBUG_LVL_EXC))
  +   vcpu-arch.pc = mfspr(SPRN_DSRR0);
  +   else
  +   vcpu-arch.pc = mfspr(SPRN_CSRR0); #endif
 
  Why doesn't this get handled in the asm code that recovers from the
  respective exceptions?
 
  Yes. I will remove this.
 
 
  +   dbsr = vcpu-arch.dbsr;
  +
  +   run-debug.arch.pc = vcpu-arch.pc;
  +

vfio: missing patch in linux 3.6

2012-10-04 Thread Andreas Hartmann

Hello Alex,

I just tested vfio as part of linux 3.6 and detected, that it doesn't
work because of the following missing patch:

http://news.gmane.org/find-root.php?group=gmane.linux.kernel.pciarticle=16422

Could you please get it applied for the next stable release?


Thanks,
kind regards,
Andreas Hartmann
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: vfio: missing patch in linux 3.6

2012-10-04 Thread Alex Williamson

On Thu, 2012-10-04 at 17:13 +0200, Andreas Hartmann wrote:
 Hello Alex,
 
 I just tested vfio as part of linux 3.6 and detected, that it doesn't
 work because of the following missing patch:
 
 http://news.gmane.org/find-root.php?group=gmane.linux.kernel.pciarticle=16422
 
 Could you please get it applied for the next stable release?

Hi Andreas,

This patch needs to go through Bjorn's PCI tree, but as noted in the
comments I have some outstanding questions that need to be answered,
probably by Joerg.  Thanks,

Alex

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] [patch 2/6] Use machine options to emulate -no-kvm-irqchip

2012-10-04 Thread Andreas Färber

Am 04.10.2012 16:30, schrieb Jan Kiszka:
 On 2012-10-04 16:21, Anthony Liguori wrote:
 -no-kvm should be included too.
 
 Reminds me that we still need to agree on the final default accel strategy.
 

 I just ran across a user that was injecting '-no-kvm-irqchip' in their
 libvirt XML via a custom attribute.  It turned out it was to work around
 broken MSI support in their funky guest they were running.  It was the
 wrong solution to the problem but they were doing it regardless.

 The point is, there are users in the wild using these options.  There's
 no reason to remove them if they are trivial to maintain (and they are
 in their current form).
 
 So let's define a consistent policy for them all:
  - warn on the command line on use

  - avoid adding them to the help or other user documentation

That's dangerous - at some point someone will notice and propose a patch
documenting them and the reviewers may have forgotten by then why it was
not documented in the first place. Better clearly document them in help
output as DEPRECATED, to be removed in future versions or so.

Andreas

  - keep them until we rework the whole command line
 
 Jan
 


-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 6/6] KVM: booke/bookehv: Add debug stub support

2012-10-04 Thread Alexander Graf


On 04.10.2012, at 17:19, Bhushan Bharat-R65777 wrote:

 
 
 -static int emulation_exit(struct kvm_run *run, struct kvm_vcpu
 *vcpu)
 +static int emulation_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
 + int exit_nr)
 {
 enum emulation_result er;
 
 +   if (unlikely(vcpu-guest_debug  KVM_GUESTDBG_USE_SW_BP) 
 +vcpu-arch.last_inst == KVMPPC_INST_GUEST_GDB) {
 
 This belongs into the normal emulation code path, behind the same
 switch() that everything else goes through.
 
 I am not sure I understood correctly. Below is the reason why I
 placed this
 code here.
 Instruction where software breakpoint is to be set is replaced by ehpriv
 instruction. On e500v2, this is not a valid instruction can causes
 program interrupt. On e500mc, ehpriv is a valid instruction. Both
 the exit path calls emulation_exit(), so we have placed the code in this
 function.
 Do you want this code to be moved in program interrupt exit path for
 e500v2
 and BOOKE_INTERRUPT_HV_PRIV for e500mc?
 
 Ok, in this patch you do (basically):
 
 int emulation_exit()
 {
   if (inst == DEBUG_INST) {
   debug_stuff();
   return;
   }
 
   switch (inst) {
   case INST_A:
   foo();
   
   }
 }
 
 Are not we doing something like this:
 int emulation_exit()
 {
if (inst == DEBUG_INST) {
debug_stuff();
return;
}
 
status = kvmppc_emulate_instruction()
switch (status) {
case FAIL:
foo();
case DONE:
 foo1();

}
 }
 
 Do you want something like this:
 
 int emulation_exit()
 {
 
status = kvmppc_emulate_instruction()
switch (status) {
case FAIL:
 if (inst == DEBUG_INST) {
 debug_stuff();
   return;
 }
foo();
 
case DONE:
 foo1();

}
 }
 
 No, I want the DEBUG_INST be handled the same as any other instruction we
 emulate.
 
 I would like to understand what you are thinking:
 What I derived is , add the instruction in kvmppc_emulate_instruction() (or 
 its child function) which, 
 1) fill the relevant information in run- , kvmppc_account_exit(vcpu, 
 DEBUG_EXITS); and returns EMULATION_DONE
 And in emulation_exit()
 status = kvmppc_emulate_instruction()
 switch (status) {
   case EMULATION_DONE:
   if (inst == DEBUG)
   return RESUME_HOST;
 }
 Or
 2) kvmppc_account_exit(vcpu, DEBUG_EXITS); returns EMULATION_DONE;
 And in emulation_exit()
 status = kvmppc_emulate_instruction()
 switch (status) {
   case EMULATION_DONE:
   if (inst == DEBUG) {
   fill run- 
   return RESUME_HOST;
   }
 }
 
 Or
 3) kvmppc_account_exit(vcpu, DEBUG_EXITS); returns a new status type 
 (EMULATION_DEBUG_INST)
 And in emulation_exit()
 status = kvmppc_emulate_instruction()
 switch (status) {
   case EMULATION_DEBUG_INST:
   fill run- 
   return RESUME_HOST;
 }

This one :).

 
 
 
 what I want is:
 
 int emulation_exit()
 {
   switch (inst) {
   case INST_A:
   foo(); break;
   case DEBUG_INST:
   debug_stuff(); break;
   
   }
 }
 
 
 
 
 +   run-exit_reason = KVM_EXIT_DEBUG;
 +   run-debug.arch.pc = vcpu-arch.pc;
 +   run-debug.arch.exception = exit_nr;
 +   run-debug.arch.status = 0;
 +   kvmppc_account_exit(vcpu, DEBUG_EXITS);
 +   return RESUME_HOST;
 +   }
 +
 er = kvmppc_emulate_instruction(run, vcpu);
 switch (er) {
 case EMULATE_DONE:
 @@ -697,6 +711,44 @@ static int emulation_exit(struct kvm_run
 *run, struct
 kvm_vcpu *vcpu)
 default:
 BUG();
 }
 +
 +   if (unlikely(vcpu-guest_debug  KVM_GUESTDBG_ENABLE) 
 +   (vcpu-guest_debug  KVM_GUESTDBG_SINGLESTEP)) {
 
 I don't understand how this is supposed to work. When we enable
 singlestep, why would we end up in emulation_exit()?
 
 When singlestep is enabled then we set DBCR0[ICMP] and the debug
 handler
 should be able to handle this. I think you are right.
 
 
 +   run-exit_reason = KVM_EXIT_DEBUG;
 +   return RESUME_HOST;
 +   }
 +}
 +
 +static int kvmppc_handle_debug(struct kvm_run *run, struct
 +kvm_vcpu
 +*vcpu) {
 +   u32 dbsr;
 +
 +#ifndef CONFIG_KVM_BOOKE_HV
 +   if (cpu_has_feature(CPU_FTR_DEBUG_LVL_EXC))
 +   vcpu-arch.pc = mfspr(SPRN_DSRR0);
 +   else
 +   vcpu-arch.pc = mfspr(SPRN_CSRR0); #endif
 
 Why doesn't this get handled in the asm code that recovers from the
 respective exceptions?
 
 Yes. I will remove this.
 
 
 +   dbsr = vcpu-arch.dbsr;
 +
 +   run-debug.arch.pc = vcpu-arch.pc;
 +   run-debug.arch.status = 0;
 +   vcpu-arch.dbsr = 0;
 +
 +   if (dbsr  (DBSR_IAC1 | DBSR_IAC2

Re: vfio: missing patch in linux 3.6

2012-10-04 Thread Andreas Hartmann

Alex Williamson wrote:
 On Thu, 2012-10-04 at 17:13 +0200, Andreas Hartmann wrote:
 Hello Alex,
 
 I just tested vfio as part of linux 3.6 and detected, that it
 doesn't work because of the following missing patch:
 
 http://news.gmane.org/find-root.php?group=gmane.linux.kernel.pciarticle=16422


 
Could you please get it applied for the next stable release?
 
 Hi Andreas,
 
 This patch needs to go through Bjorn's PCI tree, but as noted in
 the comments I have some outstanding questions that need to be
 answered, probably by Joerg.  Thanks,

Sorry Alex,
I missed your additional questions because this patch just works
pretty fine here with kernel 3.4.x.

But you're right, your questions should be resolved before freeing it
to the wild.

@Joerg:
Please, liberate me from compiling and patching each kernel myself to
get it working and give all the other users out there a chance to use
this pretty code and the IOMMU support of AMD's hardware :-) by
answering Alex' questions.


Thanks,
kind regards,
Andreas
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] [patch 2/6] Use machine options to emulate -no-kvm-irqchip

2012-10-04 Thread Jan Kiszka

On 2012-10-04 17:36, Andreas Färber wrote:
 Am 04.10.2012 16:30, schrieb Jan Kiszka:
 On 2012-10-04 16:21, Anthony Liguori wrote:
 -no-kvm should be included too.

 Reminds me that we still need to agree on the final default accel strategy.


 I just ran across a user that was injecting '-no-kvm-irqchip' in their
 libvirt XML via a custom attribute.  It turned out it was to work around
 broken MSI support in their funky guest they were running.  It was the
 wrong solution to the problem but they were doing it regardless.

 The point is, there are users in the wild using these options.  There's
 no reason to remove them if they are trivial to maintain (and they are
 in their current form).

 So let's define a consistent policy for them all:
  - warn on the command line on use
 
  - avoid adding them to the help or other user documentation
 
 That's dangerous - at some point someone will notice and propose a patch
 documenting them and the reviewers may have forgotten by then why it was
 not documented in the first place. Better clearly document them in help
 output as DEPRECATED, to be removed in future versions or so.

-M is marked as deprecated in the file that you would have to mess up.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC PATCH] KVM: x86: Skip request checking branches in vcpu_enter_guest() more effectively

2012-10-04 Thread Takuya Yoshikawa

On Mon, 24 Sep 2012 09:16:12 +0200
Gleb Natapov g...@redhat.com wrote:

 Yes, for guests that do not enable steal time KVM_REQ_STEAL_UPDATE
 should be never set, but currently it is. The patch (not tested) should
 fix this.

Thinking a bit more about KVM_REQ_STEAL_UPDATE...

 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index 901ad00..01572f5 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -1544,6 +1544,8 @@ static void accumulate_steal_time(struct kvm_vcpu *vcpu)
   delta = current-sched_info.run_delay - vcpu-arch.st.last_steal;
   vcpu-arch.st.last_steal = current-sched_info.run_delay;
   vcpu-arch.st.accum_steal = delta;
 +
 + kvm_make_request(KVM_REQ_STEAL_UPDATE, vcpu);
  }

Even without this flag, we know when we should record the steal time:
that's when vcpu-arch.st.accum_steal != 0.

  
  static void record_steal_time(struct kvm_vcpu *vcpu)

So we can live without KVM_REQ_STEAL_UPDATE and make record_steal_time()
do the job when vcpu-arch.st.accum_steal != 0 so that it can be called
unconditionally in vcpu_enter_guest().

Why didn't we do that?  The only reason I can think of is we wanted to
eliminate that check for the zero-requests case.

But does such a micro-optimization really help us?

Thanks,
Takuya
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] [patch 2/6] Use machine options to emulate -no-kvm-irqchip

2012-10-04 Thread Andreas Färber

Am 04.10.2012 18:04, schrieb Jan Kiszka:
 On 2012-10-04 17:36, Andreas Färber wrote:
 Am 04.10.2012 16:30, schrieb Jan Kiszka:
 On 2012-10-04 16:21, Anthony Liguori wrote:
  - avoid adding them to the help or other user documentation

 That's dangerous - at some point someone will notice and propose a patch
 documenting them and the reviewers may have forgotten by then why it was
 not documented in the first place. Better clearly document them in help
 output as DEPRECATED, to be removed in future versions or so.
 
 -M is marked as deprecated in the file that you would have to mess up.

If it's not within +/- 3 lines and goes through qemu-trivial then that
point is moot. Don't see how deprecation of -no-kvm-irqchip et al. is
related to -M being deprecated either way.

Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: vfio: missing patch in linux 3.6

2012-10-04 Thread Roedel, Joerg

On Thu, Oct 04, 2012 at 09:26:39AM -0600, Alex Williamson wrote:
 On Thu, 2012-10-04 at 17:13 +0200, Andreas Hartmann wrote:
  Hello Alex,
  
  I just tested vfio as part of linux 3.6 and detected, that it doesn't
  work because of the following missing patch:
  
  http://news.gmane.org/find-root.php?group=gmane.linux.kernel.pciarticle=16422
  
  Could you please get it applied for the next stable release?
 
 Hi Andreas,
 
 This patch needs to go through Bjorn's PCI tree, but as noted in the
 comments I have some outstanding questions that need to be answered,
 probably by Joerg.  Thanks,

What are the open questions? I have confirmation from the hardware
people that the south bridges are peer-2-peer safe. Other questions?


Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: vfio: missing patch in linux 3.6

2012-10-04 Thread Alex Williamson

On Thu, 2012-10-04 at 18:14 +0200, Roedel, Joerg wrote:
 On Thu, Oct 04, 2012 at 09:26:39AM -0600, Alex Williamson wrote:
  On Thu, 2012-10-04 at 17:13 +0200, Andreas Hartmann wrote:
   Hello Alex,
   
   I just tested vfio as part of linux 3.6 and detected, that it doesn't
   work because of the following missing patch:
   
   http://news.gmane.org/find-root.php?group=gmane.linux.kernel.pciarticle=16422
   
   Could you please get it applied for the next stable release?
  
  Hi Andreas,
  
  This patch needs to go through Bjorn's PCI tree, but as noted in the
  comments I have some outstanding questions that need to be answered,
  probably by Joerg.  Thanks,
 
 What are the open questions? I have confirmation from the hardware
 people that the south bridges are peer-2-peer safe. Other questions?

Hi Joerg,

There are a couple questions in the link above.  Since the devices don't
expose a PCIe capability, we probably need to add a check to look at the
upstream device and verify we're not on a legacy bus where ACS can't be
enforced.  Then there's the general question of whether the confirmation
of no peer-to-peer applies to every case where we might see this device
(some of them seem to have history that pre-dates this specific package
implementation) or do we need to try to identify specific package
properties in addition to just a device ID?  Thanks,

Alex

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] [patch 2/6] Use machine options to emulate -no-kvm-irqchip

2012-10-04 Thread Jan Kiszka

On 2012-10-04 18:13, Andreas Färber wrote:
 Am 04.10.2012 18:04, schrieb Jan Kiszka:
 On 2012-10-04 17:36, Andreas Färber wrote:
 Am 04.10.2012 16:30, schrieb Jan Kiszka:
 On 2012-10-04 16:21, Anthony Liguori wrote:
  - avoid adding them to the help or other user documentation

 That's dangerous - at some point someone will notice and propose a patch
 documenting them and the reviewers may have forgotten by then why it was
 not documented in the first place. Better clearly document them in help
 output as DEPRECATED, to be removed in future versions or so.

 -M is marked as deprecated in the file that you would have to mess up.
 
 If it's not within +/- 3 lines and goes through qemu-trivial then that
 point is moot. Don't see how deprecation of -no-kvm-irqchip et al. is
 related to -M being deprecated either way.

I'm only citing an example that these switches should follow:

[qemu-options.hx]
HXCOMM Deprecated by -machine
DEF(M, HAS_ARG, QEMU_OPTION_M, , QEMU_ARCH_ALL)

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 0/2] VFIO PCI INTx fixes

2012-10-04 Thread Alex Williamson

These patches are now available in my next tree to fix a couple
issues with PCI INTx.  Thanks,

Alex

---

Alex Williamson (2):
  vfio: Fix PCI INTx disable consistency
  vfio: Move PCI INTx eventfd setting earlier


 drivers/vfio/pci/vfio_pci_intrs.c |   18 +++---
 1 file changed, 15 insertions(+), 3 deletions(-)
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 1/2] vfio: Move PCI INTx eventfd setting earlier

2012-10-04 Thread Alex Williamson

We need to be ready to recieve an interrupt as soon as we call
request_irq, so our eventfd context setting needs to be moved
earlier.  Without this, an interrupt from our device or one
sharing the interrupt line can pass a NULL into eventfd_signal
and oops.

Cc: sta...@vger.kernel.org
Signed-off-by: Alex Williamson alex.william...@redhat.com
---

 drivers/vfio/pci/vfio_pci_intrs.c |5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c 
b/drivers/vfio/pci/vfio_pci_intrs.c
index d8dedc7..c8139a5 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -400,19 +400,20 @@ static int vfio_intx_set_signal(struct vfio_pci_device 
*vdev, int fd)
return PTR_ERR(trigger);
}
 
+   vdev-ctx[0].trigger = trigger;
+
if (!vdev-pci_2_3)
irqflags = 0;
 
ret = request_irq(pdev-irq, vfio_intx_handler,
  irqflags, vdev-ctx[0].name, vdev);
if (ret) {
+   vdev-ctx[0].trigger = NULL;
kfree(vdev-ctx[0].name);
eventfd_ctx_put(trigger);
return ret;
}
 
-   vdev-ctx[0].trigger = trigger;
-
/*
 * INTx disable will stick across the new irq setup,
 * disable_irq won't.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 2/2] vfio: Fix PCI INTx disable consistency

2012-10-04 Thread Alex Williamson

The virq_disabled flag tracks the userspace view of INTx masking
across interrupt mode changes, but we're not consistently applying
this to the interrupt and masking handler notion of the device.
Currently if the user sets DisINTx while in MSI or MSIX mode, then
returns to INTx mode (ex. rebooting a qemu guest), the hardware has
DisINTx+, but the management of INTx thinks it's enabled, making it
impossible to actually clear DisINTx.  Fix this by updating the
handler state when INTx is re-enabled.

Cc: sta...@vger.kernel.org
Signed-off-by: Alex Williamson alex.william...@redhat.com
---

 drivers/vfio/pci/vfio_pci_intrs.c |   13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c 
b/drivers/vfio/pci/vfio_pci_intrs.c
index c8139a5..3639371 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -366,6 +366,17 @@ static int vfio_intx_enable(struct vfio_pci_device *vdev)
return -ENOMEM;
 
vdev-num_ctx = 1;
+
+   /*
+* If the virtual interrupt is masked, restore it.  Devices
+* supporting DisINTx can be masked at the hardware level
+* here, non-PCI-2.3 devices will have to wait until the
+* interrupt is enabled.
+*/
+   vdev-ctx[0].masked = vdev-virq_disabled;
+   if (vdev-pci_2_3)
+   pci_intx(vdev-pdev, !vdev-ctx[0].masked);
+
vdev-irq_type = VFIO_PCI_INTX_IRQ_INDEX;
 
return 0;
@@ -419,7 +430,7 @@ static int vfio_intx_set_signal(struct vfio_pci_device 
*vdev, int fd)
 * disable_irq won't.
 */
spin_lock_irqsave(vdev-irqlock, flags);
-   if (!vdev-pci_2_3  (vdev-ctx[0].masked || vdev-virq_disabled))
+   if (!vdev-pci_2_3  vdev-ctx[0].masked)
disable_irq_nosync(pdev-irq);
spin_unlock_irqrestore(vdev-irqlock, flags);
 

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] [patch 2/6] Use machine options to emulate -no-kvm-irqchip

2012-10-04 Thread Marcelo Tosatti

On Thu, Oct 04, 2012 at 05:36:38PM +0200, Andreas Färber wrote:
 Am 04.10.2012 16:30, schrieb Jan Kiszka:
  On 2012-10-04 16:21, Anthony Liguori wrote:
  -no-kvm should be included too.
  
  Reminds me that we still need to agree on the final default accel strategy.
  
 
  I just ran across a user that was injecting '-no-kvm-irqchip' in their
  libvirt XML via a custom attribute.  It turned out it was to work around
  broken MSI support in their funky guest they were running.  It was the
  wrong solution to the problem but they were doing it regardless.
 
  The point is, there are users in the wild using these options.  There's
  no reason to remove them if they are trivial to maintain (and they are
  in their current form).
  
  So let's define a consistent policy for them all:
   - warn on the command line on use
 
   - avoid adding them to the help or other user documentation
 
 That's dangerous - at some point someone will notice and propose a patch
 documenting them and the reviewers may have forgotten by then why it was
 not documented in the first place. Better clearly document them in help
 output as DEPRECATED, to be removed in future versions or so.

Adding the policy to docs/qemukvm-compat-commands-policy.txt should workaround
that problem.

And a friendly text on the announce e-mail.

 Andreas
 
   - keep them until we rework the whole command line
  
  Jan
  

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] [patch 2/6] Use machine options to emulate -no-kvm-irqchip

2012-10-04 Thread Marcelo Tosatti

On Thu, Oct 04, 2012 at 04:30:26PM +0200, Jan Kiszka wrote:
 On 2012-10-04 16:21, Anthony Liguori wrote:
  -no-kvm should be included too.
 
 Reminds me that we still need to agree on the final default accel strategy.

Default accel=kvm for x86_64 targets, no fallback.

Markus's argument is pretty convincing to me:

Friendly perhaps, generating an infinite series of questions why is my
guest slow as molasses? certainly.

And for each instance of the question, there's an unknown number of
users who give QEMU a quick try, screw up KVM unknowingly, observe the
glacial speed, and conclude it's crap.

  I just ran across a user that was injecting '-no-kvm-irqchip' in their
  libvirt XML via a custom attribute.  It turned out it was to work around
  broken MSI support in their funky guest they were running.  It was the
  wrong solution to the problem but they were doing it regardless.
  
  The point is, there are users in the wild using these options.  There's
  no reason to remove them if they are trivial to maintain (and they are
  in their current form).
 
 So let's define a consistent policy for them all:
  - warn on the command line on use
  - avoid adding them to the help or other user documentation
  - keep them until we rework the whole command line
 
 Jan
 
 -- 
 Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
 Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: qemu-kvm: remove boot=on|off drive parameter compatibility

2012-10-04 Thread Lucas Meneghel Rodrigues


On 10/04/2012 09:27 AM, Jan Kiszka wrote:

On 2012-10-04 14:10, Lucas Meneghel Rodrigues wrote:

On 10/04/2012 07:48 AM, Jan Kiszka wrote:

On 2012-10-03 15:19, Paolo Bonzini wrote:

Il 03/10/2012 12:57, Lucas Meneghel Rodrigues ha scritto:

Yep, I did send patches with the testdev device present on qemu-kvm.git
to qemu.git a while ago, but there were many comments on the review, I
ended up not implementing everything that was asked and the patches were
archived.

If nobody wants to step up to port it, I'll re-read the original thread
and will spin up new patches (and try to go through the end with it).
Executing the KVM unittests is something that we can't afford to lose,
so I'd say it's important on this last mile effort to get rid of qemu-kvm.


Absolutely, IIRC the problem was that testdev did a little bit of
everything... let's see what's the functionality of testdev:

- write (port 0xf1), can be replaced in autotest with:
-device isa-debugcon,iobase=0xf1,chardev=...

- exit code (port 0xf4), see this series:
http://lists.gnu.org/archive/html/qemu-devel/2012-07/msg00818.html

- ram size (port 0xd1).  If we can also patch kvm-unittests, the memory
is available in the CMOS or in fwcfg.  Here is the SeaBIOS code:

  u32 rs = ((inb_cmos(0x34)  16) | (inb_cmos(0x35)  24));
  if (rs)
  rs += 16 * 1024 * 1024;
  else
  rs = (((inb_cmos(0x30)  10) | (inb_cmos(0x31)  18))
+ 1 * 1024 * 1024);

The rest (ports 0xe0..0xe7, 0x2000..0x2017, MMIO) can be left in testdev.


IIRC, one of the biggest problem with testdev was its hack to inject
interrupts.


Jan, I assume this commit helps to fix this, right?

commit b334ec567f1de9a60349991e7b75083d569ddb0a
Author: Jan Kiszka jan.kis...@siemens.com
Date:   Fri Mar 2 10:30:47 2012 +0100

  qemu-kvm: Use upstream kvm-i8259

  Drop the qemu-kvm version in favor of the equivalent upstream
  implementation. This allows to move the i8259 back into the hwlib.

  Note that this also drops the testdev hack and restores proper
  isa_get_irq. If testdev scripts exist that inject  IRQ15, they need
  fixing. Testing for these interrupts on the PIIX3 makes no practical
  sense anyway as those lines are unused.

  Signed-off-by: Jan Kiszka jan.kis...@siemens.com
  Signed-off-by: Avi Kivity a...@redhat.com


Yes, this improved it a lot as we no longer depend on additional
changes. I'm not sure if there was resistance beyond that.

When cleaning up the code: register_ioport_read must be replaced with
the memory API.


I did look at the MemoryRegionOps/memory_region_init_io and still did 
not figure out how to port things. I'll send a v2 addressing all the 
comments made so far but this one, just to see if people are OK with the 
direction of the full patch, then if you could give me some pointers of 
how to do this conversion, it'd be great.


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: qemu-kvm: remove boot=on|off drive parameter compatibility

2012-10-04 Thread Jan Kiszka

On 2012-10-04 19:21, Lucas Meneghel Rodrigues wrote:
 On 10/04/2012 09:27 AM, Jan Kiszka wrote:
 On 2012-10-04 14:10, Lucas Meneghel Rodrigues wrote:
 On 10/04/2012 07:48 AM, Jan Kiszka wrote:
 On 2012-10-03 15:19, Paolo Bonzini wrote:
 Il 03/10/2012 12:57, Lucas Meneghel Rodrigues ha scritto:
 Yep, I did send patches with the testdev device present on qemu-kvm.git
 to qemu.git a while ago, but there were many comments on the review, I
 ended up not implementing everything that was asked and the patches were
 archived.

 If nobody wants to step up to port it, I'll re-read the original thread
 and will spin up new patches (and try to go through the end with it).
 Executing the KVM unittests is something that we can't afford to lose,
 so I'd say it's important on this last mile effort to get rid of 
 qemu-kvm.

 Absolutely, IIRC the problem was that testdev did a little bit of
 everything... let's see what's the functionality of testdev:

 - write (port 0xf1), can be replaced in autotest with:
 -device isa-debugcon,iobase=0xf1,chardev=...

 - exit code (port 0xf4), see this series:
 http://lists.gnu.org/archive/html/qemu-devel/2012-07/msg00818.html

 - ram size (port 0xd1).  If we can also patch kvm-unittests, the memory
 is available in the CMOS or in fwcfg.  Here is the SeaBIOS code:

   u32 rs = ((inb_cmos(0x34)  16) | (inb_cmos(0x35)  24));
   if (rs)
   rs += 16 * 1024 * 1024;
   else
   rs = (((inb_cmos(0x30)  10) | (inb_cmos(0x31)  18))
 + 1 * 1024 * 1024);

 The rest (ports 0xe0..0xe7, 0x2000..0x2017, MMIO) can be left in testdev.

 IIRC, one of the biggest problem with testdev was its hack to inject
 interrupts.

 Jan, I assume this commit helps to fix this, right?

 commit b334ec567f1de9a60349991e7b75083d569ddb0a
 Author: Jan Kiszka jan.kis...@siemens.com
 Date:   Fri Mar 2 10:30:47 2012 +0100

   qemu-kvm: Use upstream kvm-i8259

   Drop the qemu-kvm version in favor of the equivalent upstream
   implementation. This allows to move the i8259 back into the hwlib.

   Note that this also drops the testdev hack and restores proper
   isa_get_irq. If testdev scripts exist that inject  IRQ15, they need
   fixing. Testing for these interrupts on the PIIX3 makes no practical
   sense anyway as those lines are unused.

   Signed-off-by: Jan Kiszka jan.kis...@siemens.com
   Signed-off-by: Avi Kivity a...@redhat.com

 Yes, this improved it a lot as we no longer depend on additional
 changes. I'm not sure if there was resistance beyond that.

 When cleaning up the code: register_ioport_read must be replaced with
 the memory API.
 
 I did look at the MemoryRegionOps/memory_region_init_io and still did 
 not figure out how to port things. I'll send a v2 addressing all the 
 comments made so far but this one, just to see if people are OK with the 
 direction of the full patch, then if you could give me some pointers of 
 how to do this conversion, it'd be great.
 

See e.g. http://thread.gmane.org/gmane.comp.emulators.qemu/171217

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] [patch 2/6] Use machine options to emulate -no-kvm-irqchip

2012-10-04 Thread Anthony Liguori

Marcelo Tosatti mtosa...@redhat.com writes:

 On Thu, Oct 04, 2012 at 04:30:26PM +0200, Jan Kiszka wrote:
 On 2012-10-04 16:21, Anthony Liguori wrote:
  -no-kvm should be included too.
 
 Reminds me that we still need to agree on the final default accel strategy.

 Default accel=kvm for x86_64 targets, no fallback.

Ack.

BTW, if you want to override this, you can just do:

[machine]
accel=kvm:tcg

In /etc/qemu/target-i386.conf

So the distros can do whatever they like in their packages.

We should instruct users how to do this in an error message too in case
they want to permanently disable kvm because they don't have appropriate
hardware suport.

Regards,

Anthony Liguori

 Markus's argument is pretty convincing to me:

 Friendly perhaps, generating an infinite series of questions why is my
 guest slow as molasses? certainly.

 And for each instance of the question, there's an unknown number of
 users who give QEMU a quick try, screw up KVM unknowingly, observe the
 glacial speed, and conclude it's crap.

  I just ran across a user that was injecting '-no-kvm-irqchip' in their
  libvirt XML via a custom attribute.  It turned out it was to work around
  broken MSI support in their funky guest they were running.  It was the
  wrong solution to the problem but they were doing it regardless.
  
  The point is, there are users in the wild using these options.  There's
  no reason to remove them if they are trivial to maintain (and they are
  in their current form).
 
 So let's define a consistent policy for them all:
  - warn on the command line on use
  - avoid adding them to the help or other user documentation
  - keep them until we rework the whole command line
 
 Jan
 
 -- 
 Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
 Corporate Competence Center Embedded Linux
 --
 To unsubscribe from this list: send the line unsubscribe kvm in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] Proposal for virtio standardization.

2012-10-04 Thread Anthony Liguori

Rusty Russell ru...@rustcorp.com.au writes:

 Hi all,

   I've had several requests for a more formal approach to the
 virtio draft spec, and (after some soul-searching) I'd like to try that.

   The proposal is to use OASIS as the standards body, as it's
 fairly light-weight as these things go.  For me this means paperwork and
 setting up a Working Group and getting the right people involved as
 Voting members starting with the current contributors; for most of you
 it just means a new mailing list, though I'll be cross-posting any
 drafts and major changes here anyway.

   I believe that a documented standard (aka virtio 1.0) will
 increase visibility and adoption in areas outside our normal linux/kvm
 universe.  There's been some of that already, but this is the clearest
 path to accelerate it.  Not the easiest path, but I believe that a solid
 I/O standard is a Good Thing for everyone.

   Yet I also want to decouple new and experimental development
 from the standards effort; running code comes first.  New feature bits
 and new device numbers should be reservable without requiring a full
 spec change.

 So the essence of my proposal is:
 1) I start a Working Group within OASIS where we can aim for virtio spec
1.0.

 2) The current spec is textually reordered so the core is clearly
bus-independent, with PCI, mmio, etc appendices.

 3) Various clarifications, formalizations and cleanups to the spec text,
and possibly elimination of old deprecated features.

 4) The only significant change to the spec is that we use PCI
capabilities, so we can have infinite feature bits.
(see 
 http://lists.linuxfoundation.org/pipermail/virtualization/2011-December/019198.html)

We discussed this on IRC last night.  I don't think PCI capabilites are
a good mechanism to use...

PCI capabilities are there to organize how the PCI config space is
allocated to allow vendor extensions to co-exist with future PCI
extensions.

But we've never used the PCI config space within virtio-pci.  We do
everything in BAR0.  I don't think there's any real advantage of using
the config space vs. a BAR for virtio-pci.

The nice thing about using a BAR is that the region is MMIO or PIO.
This maps really nicely to non-PCI transports too.  But extending the
PCI config space (especially dealing with capability allocation) is
pretty gnarly and there isn't an obvious equivalent outside of PCI.

There are very devices that we emulate today that make use of extended
PCI device registers outside the platform devices (that have no BARs).

Regards,

Anthony Liguori


 5) Changes to the ring layout and other such things are deferred to a
future virtio version; whether this is done within OASIS or
externally depends on how well this works for the 1.0 release.

 Thoughts?
 Rusty.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 00/15] QEMU KVM_GET_SUPPORTED_CPUID cleanups and fixes

2012-10-04 Thread Eduardo Habkost

Most of this series are just cleanups that will help when making -cpu
check/enforce work properly, with some fixes.

In addition to code movements, the main changes are:
 - x2apic won't be enabled if in-kernel irqchip is disabled
   (patch 10)
 - CPUID feature bit filtering is done much earlier, and inside 
target-i386/cpu.c
   (patch 13)
 - CPUID leaf 7 feature bits are now filterd based on GET_SUPPORTED_CPUID too
   (patch 15)

Eduardo Habkost (15):
  i386: kvm: kvm_arch_get_supported_cpuid: move R_EDX hack outside of
for loop
  i386: kvm: kvm_arch_get_supported_cpuid: clean up has_kvm_features
check
  i386: kvm: kvm_arch_get_supported_cpuid: use 'entry' variable
  i386: kvm: extract register switch to cpuid_entry_get_reg() function
  i386: kvm: extract CPUID entry lookup to cpuid_find_entry() function
  i386: kvm: extract try_get_cpuid() loop to get_supported_cpuid()
function
  i386: kvm: kvm_arch_get_supported_cpuid: replace if+switch with
single 'if'
  i386: kvm: set CPUID_EXT_HYPERVISOR on kvm_arch_get_supported_cpuid()
  i386: kvm: set CPUID_EXT_TSC_DEADLINE_TIMER on
kvm_arch_get_supported_cpuid()
  i386: kvm: x2apic is not supported without in-kernel irqchip
  i386: kvm: mask cpuid_kvm_features earlier
  i386: kvm: mask cpuid_ext4_features bits earlier
  i386: kvm: filter CPUID feature words earlier, on cpu.c
  i386: kvm: reformat filter_features_for_kvm() code
  i386: kvm: filter CPUID leaf 7 based on GET_SUPPORTED_CPUID, too

 kvm.h |   1 +
 target-i386/cpu.c |  30 +++
 target-i386/kvm.c | 153 --
 3 files changed, 122 insertions(+), 62 deletions(-)

-- 
1.7.11.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 01/15] i386: kvm: kvm_arch_get_supported_cpuid: move R_EDX hack outside of for loop

2012-10-04 Thread Eduardo Habkost

The for loop will become a separate function, so clean it up so it can
become independent from the bit hacking for R_EDX.

No behavior change[1], just code movement.

[1] Well, only if the kernel returned CPUID leafs 1 or 0x8001 as
unsupported, but there's no kernel version that does that.

Signed-off-by: Eduardo Habkost ehabk...@redhat.com
---
 target-i386/kvm.c | 31 ++-
 1 file changed, 18 insertions(+), 13 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 5b18383..8b4ab34 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -155,24 +155,29 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, 
uint32_t function,
 break;
 case R_EDX:
 ret = cpuid-entries[i].edx;
-switch (function) {
-case 1:
-/* KVM before 2.6.30 misreports the following features */
-ret |= CPUID_MTRR | CPUID_PAT | CPUID_MCE | CPUID_MCA;
-break;
-case 0x8001:
-/* On Intel, kvm returns cpuid according to the Intel spec,
- * so add missing bits according to the AMD spec:
- */
-cpuid_1_edx = kvm_arch_get_supported_cpuid(s, 1, 0, R_EDX);
-ret |= cpuid_1_edx  CPUID_EXT2_AMD_ALIASES;
-break;
-}
 break;
 }
 }
 }
 
+/* Fixups for the data returned by KVM, below */
+
+if (reg == R_EDX) {
+switch (function) {
+case 1:
+/* KVM before 2.6.30 misreports the following features */
+ret |= CPUID_MTRR | CPUID_PAT | CPUID_MCE | CPUID_MCA;
+break;
+case 0x8001:
+/* On Intel, kvm returns cpuid according to the Intel spec,
+ * so add missing bits according to the AMD spec:
+ */
+cpuid_1_edx = kvm_arch_get_supported_cpuid(s, 1, 0, R_EDX);
+ret |= cpuid_1_edx  CPUID_EXT2_AMD_ALIASES;
+break;
+}
+}
+
 g_free(cpuid);
 
 /* fallback for older kernels */
-- 
1.7.11.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 02/15] i386: kvm: kvm_arch_get_supported_cpuid: clean up has_kvm_features check

2012-10-04 Thread Eduardo Habkost

Instead of a function-specific has_kvm_features variable, simply use a
found variable that will be checked in case we have to use the legacy
get_para_features() interface.

No behavior change, just code cleanup.

Signed-off-by: Eduardo Habkost ehabk...@redhat.com
---
 target-i386/kvm.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 8b4ab34..9ebde66 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -130,7 +130,7 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t 
function,
 int i, max;
 uint32_t ret = 0;
 uint32_t cpuid_1_edx;
-int has_kvm_features = 0;
+bool found = false;
 
 max = 1;
 while ((cpuid = try_get_cpuid(s, max)) == NULL) {
@@ -140,9 +140,7 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t 
function,
 for (i = 0; i  cpuid-nent; ++i) {
 if (cpuid-entries[i].function == function 
 cpuid-entries[i].index == index) {
-if (cpuid-entries[i].function == KVM_CPUID_FEATURES) {
-has_kvm_features = 1;
-}
+found = true;
 switch (reg) {
 case R_EAX:
 ret = cpuid-entries[i].eax;
@@ -181,7 +179,7 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t 
function,
 g_free(cpuid);
 
 /* fallback for older kernels */
-if (!has_kvm_features  (function == KVM_CPUID_FEATURES)) {
+if ((function == KVM_CPUID_FEATURES)  !found) {
 ret = get_para_features(s);
 }
 
-- 
1.7.11.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 15/15] i386: kvm: filter CPUID leaf 7 based on GET_SUPPORTED_CPUID, too

2012-10-04 Thread Eduardo Habkost

Now that CPUID leaf 7 features can be enabled/disabled on the
command-line, we need to filter them properly using GET_SUPPORTED_CPUID,
at the same place where other features are filtered out.

Signed-off-by: Eduardo Habkost ehabk...@redhat.com
---
 target-i386/cpu.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index 7921b1b..d8cd40a 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -1376,6 +1376,8 @@ static void filter_features_for_kvm(X86CPU *cpu)
 kvm_arch_get_supported_cpuid(s, 0x8001, 0, R_ECX);
 env-cpuid_svm_features  =
 kvm_arch_get_supported_cpuid(s, 0x800A, 0, R_EDX);
+env-cpuid_7_0_ebx_features =
+kvm_arch_get_supported_cpuid(s, 7, 0, R_EBX);
 env-cpuid_kvm_features =
 kvm_arch_get_supported_cpuid(s, KVM_CPUID_FEATURES, 0, R_EAX);
 env-cpuid_ext4_features =
-- 
1.7.11.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 06/15] i386: kvm: extract try_get_cpuid() loop to get_supported_cpuid() function

2012-10-04 Thread Eduardo Habkost

No behavior change, just code movement.

Signed-off-by: Eduardo Habkost ehabk...@redhat.com
---
 target-i386/kvm.c | 19 ++-
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 3519028..7115921 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -98,6 +98,19 @@ static struct kvm_cpuid2 *try_get_cpuid(KVMState *s, int max)
 return cpuid;
 }
 
+/* Run KVM_GET_SUPPORTED_CPUID ioctl(), allocating a buffer large enough
+ * for all entries.
+ */
+static struct kvm_cpuid2 *get_supported_cpuid(KVMState *s)
+{
+struct kvm_cpuid2 *cpuid;
+int max = 1;
+while ((cpuid = try_get_cpuid(s, max)) == NULL) {
+max *= 2;
+}
+return cpuid;
+}
+
 struct kvm_para_features {
 int cap;
 int feature;
@@ -166,15 +179,11 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, 
uint32_t function,
   uint32_t index, int reg)
 {
 struct kvm_cpuid2 *cpuid;
-int max;
 uint32_t ret = 0;
 uint32_t cpuid_1_edx;
 bool found = false;
 
-max = 1;
-while ((cpuid = try_get_cpuid(s, max)) == NULL) {
-max *= 2;
-}
+cpuid = get_supported_cpuid(s);
 
 struct kvm_cpuid_entry2 *entry = cpuid_find_entry(cpuid, function, index);
 if (entry) {
-- 
1.7.11.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 03/15] i386: kvm: kvm_arch_get_supported_cpuid: use 'entry' variable

2012-10-04 Thread Eduardo Habkost

The reg switch will be moved to a separate function, so store the entry
pointer in a variable.

No behavior change, just code movement.

Signed-off-by: Eduardo Habkost ehabk...@redhat.com
---
 target-i386/kvm.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 9ebde66..22e8564 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -140,19 +140,20 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, 
uint32_t function,
 for (i = 0; i  cpuid-nent; ++i) {
 if (cpuid-entries[i].function == function 
 cpuid-entries[i].index == index) {
+struct kvm_cpuid_entry2 *entry = cpuid-entries[i];
 found = true;
 switch (reg) {
 case R_EAX:
-ret = cpuid-entries[i].eax;
+ret = entry-eax;
 break;
 case R_EBX:
-ret = cpuid-entries[i].ebx;
+ret = entry-ebx;
 break;
 case R_ECX:
-ret = cpuid-entries[i].ecx;
+ret = entry-ecx;
 break;
 case R_EDX:
-ret = cpuid-entries[i].edx;
+ret = entry-edx;
 break;
 }
 }
-- 
1.7.11.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 04/15] i386: kvm: extract register switch to cpuid_entry_get_reg() function

2012-10-04 Thread Eduardo Habkost

No behavior change: just code movement.

Signed-off-by: Eduardo Habkost ehabk...@redhat.com
---
 target-i386/kvm.c | 37 +++--
 1 file changed, 23 insertions(+), 14 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 22e8564..ae51573 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -123,6 +123,28 @@ static int get_para_features(KVMState *s)
 }
 
 
+/* Returns the value for a specific register on the cpuid entry
+ */
+static uint32_t cpuid_entry_get_reg(struct kvm_cpuid_entry2 *entry, int reg)
+{
+uint32_t ret = 0;
+switch (reg) {
+case R_EAX:
+ret = entry-eax;
+break;
+case R_EBX:
+ret = entry-ebx;
+break;
+case R_ECX:
+ret = entry-ecx;
+break;
+case R_EDX:
+ret = entry-edx;
+break;
+}
+return ret;
+}
+
 uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t function,
   uint32_t index, int reg)
 {
@@ -142,20 +164,7 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, 
uint32_t function,
 cpuid-entries[i].index == index) {
 struct kvm_cpuid_entry2 *entry = cpuid-entries[i];
 found = true;
-switch (reg) {
-case R_EAX:
-ret = entry-eax;
-break;
-case R_EBX:
-ret = entry-ebx;
-break;
-case R_ECX:
-ret = entry-ecx;
-break;
-case R_EDX:
-ret = entry-edx;
-break;
-}
+ret = cpuid_entry_get_reg(entry, reg);
 }
 }
 
-- 
1.7.11.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 13/15] i386: kvm: filter CPUID feature words earlier, on cpu.c

2012-10-04 Thread Eduardo Habkost

cpu.c contains the code that will check if all requested CPU features
are available, so the filtering of KVM features must be there, so we can
implement check and enforce properly.

The only point where kvm_arch_init_vcpu() is called on i386 is:

- cpu_x86_init()
  - x86_cpu_realize() (after cpu_x86_register() is called)
- qemu_init_vcpu()
  - qemu_kvm_start_vcpu()
- qemu_kvm_thread_fn() (on a new thread)
  - kvm_init_vcpu()
- kvm_arch_init_vcpu()

With this patch, the filtering will be done earlier, at:
- cpu_x86_init()
  - cpu_x86_register() (before x86_cpu_realize() is called)

Also, the KVM CPUID filtering will now be done at the same place where
the TCG CPUID feature filtering is done. Later, the code can be changed
to use the same filtering code for the check and enforce modes, as
now the cpu.c code knows exactly which CPU features are going to be
exposed to the guest (and much earlier).

One thing I was worrying about when doing this is that
kvm_arch_get_supported_cpuid() depends on kvm_irqchip_in_kernel(), and
maybe the 'kvm_kernel_irqchip' global variable wasn't initialized yet at
CPU creation time. But kvm_kernel_irqchip is initialized during
kvm_init(), that is called very early (much earlier than the machine
init function), and kvm_init() is already a requirement to run the
GET_SUPPORTED_CPUID ioctl() (as kvm_init() initializes the kvm_state
global variable).

Side note: it would be nice to keep KVM-specific code inside kvm.c. The
problem is that properly implementing -cpu check/enforce code (that's
inside cpu.c) depends directly on the feature bit filtering done using
kvm_arch_get_supported_cpuid(). Currently -cpu check/enforce is broken
because it simply uses the host CPU feature bits instead of
GET_SUPPORTED_CPUID, and we need to fix that.

Signed-off-by: Eduardo Habkost ehabk...@redhat.com
---
 kvm.h |  1 +
 target-i386/cpu.c | 30 ++
 target-i386/kvm.c | 18 --
 3 files changed, 31 insertions(+), 18 deletions(-)

diff --git a/kvm.h b/kvm.h
index dea2998..150f95d 100644
--- a/kvm.h
+++ b/kvm.h
@@ -20,6 +20,7 @@
 
 #ifdef CONFIG_KVM
 #include linux/kvm.h
+#include linux/kvm_para.h
 #endif
 
 extern int kvm_allowed;
diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index bb1e44e..a5c1628 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -1360,6 +1360,32 @@ CpuDefinitionInfoList *arch_query_cpu_definitions(Error 
**errp)
 return cpu_list;
 }
 
+#ifdef CONFIG_KVM
+static void filter_features_for_kvm(X86CPU *cpu)
+{
+CPUX86State *env = cpu-env;
+KVMState *s = kvm_state;
+
+env-cpuid_features = kvm_arch_get_supported_cpuid(s, 1, 0, R_EDX);
+
+env-cpuid_ext_features = kvm_arch_get_supported_cpuid(s, 1, 0, R_ECX);
+
+env-cpuid_ext2_features = kvm_arch_get_supported_cpuid(s, 0x8001,
+ 0, R_EDX);
+env-cpuid_ext3_features = kvm_arch_get_supported_cpuid(s, 0x8001,
+ 0, R_ECX);
+env-cpuid_svm_features  = kvm_arch_get_supported_cpuid(s, 0x800A,
+ 0, R_EDX);
+
+env-cpuid_kvm_features =
+kvm_arch_get_supported_cpuid(s, KVM_CPUID_FEATURES, 0, R_EAX);
+
+env-cpuid_ext4_features = kvm_arch_get_supported_cpuid(s, 0xC001,
+ 0, R_EDX);
+
+}
+#endif
+
 int cpu_x86_register(X86CPU *cpu, const char *cpu_model)
 {
 CPUX86State *env = cpu-env;
@@ -1417,6 +1443,10 @@ int cpu_x86_register(X86CPU *cpu, const char *cpu_model)
 );
 env-cpuid_ext3_features = TCG_EXT3_FEATURES;
 env-cpuid_svm_features = TCG_SVM_FEATURES;
+} else {
+#ifdef CONFIG_KVM
+filter_features_for_kvm(cpu);
+#endif
 }
 object_property_set_str(OBJECT(cpu), def-model_id, model-id, error);
 if (error_is_set(error)) {
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 335b3e7..0257083 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -410,30 +410,12 @@ int kvm_arch_init_vcpu(CPUX86State *env)
 struct kvm_cpuid2 cpuid;
 struct kvm_cpuid_entry2 entries[100];
 } QEMU_PACKED cpuid_data;
-KVMState *s = env-kvm_state;
 uint32_t limit, i, j, cpuid_i;
 uint32_t unused;
 struct kvm_cpuid_entry2 *c;
 uint32_t signature[3];
 int r;
 
-env-cpuid_features = kvm_arch_get_supported_cpuid(s, 1, 0, R_EDX);
-
-env-cpuid_ext_features = kvm_arch_get_supported_cpuid(s, 1, 0, R_ECX);
-
-env-cpuid_ext2_features = kvm_arch_get_supported_cpuid(s, 0x8001,
- 0, R_EDX);
-env-cpuid_ext3_features = kvm_arch_get_supported_cpuid(s, 0x8001,
- 0, R_ECX);
-env-cpuid_svm_features  = kvm_arch_get_supported_cpuid(s, 0x800A,
-

[PATCH 05/15] i386: kvm: extract CPUID entry lookup to cpuid_find_entry() function

2012-10-04 Thread Eduardo Habkost

No behavior change, just code movement.

Signed-off-by: Eduardo Habkost ehabk...@redhat.com
---
 target-i386/kvm.c | 30 ++
 1 file changed, 22 insertions(+), 8 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index ae51573..3519028 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -145,11 +145,28 @@ static uint32_t cpuid_entry_get_reg(struct 
kvm_cpuid_entry2 *entry, int reg)
 return ret;
 }
 
+/* Find matching entry for function/index on kvm_cpuid2 struct
+ */
+static struct kvm_cpuid_entry2 *cpuid_find_entry(struct kvm_cpuid2 *cpuid,
+ uint32_t function,
+ uint32_t index)
+{
+int i;
+for (i = 0; i  cpuid-nent; ++i) {
+if (cpuid-entries[i].function == function 
+cpuid-entries[i].index == index) {
+return cpuid-entries[i];
+}
+}
+/* not found: */
+return NULL;
+}
+
 uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t function,
   uint32_t index, int reg)
 {
 struct kvm_cpuid2 *cpuid;
-int i, max;
+int max;
 uint32_t ret = 0;
 uint32_t cpuid_1_edx;
 bool found = false;
@@ -159,13 +176,10 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, 
uint32_t function,
 max *= 2;
 }
 
-for (i = 0; i  cpuid-nent; ++i) {
-if (cpuid-entries[i].function == function 
-cpuid-entries[i].index == index) {
-struct kvm_cpuid_entry2 *entry = cpuid-entries[i];
-found = true;
-ret = cpuid_entry_get_reg(entry, reg);
-}
+struct kvm_cpuid_entry2 *entry = cpuid_find_entry(cpuid, function, index);
+if (entry) {
+found = true;
+ret = cpuid_entry_get_reg(entry, reg);
 }
 
 /* Fixups for the data returned by KVM, below */
-- 
1.7.11.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 07/15] i386: kvm: kvm_arch_get_supported_cpuid: replace if+switch with single 'if'

2012-10-04 Thread Eduardo Habkost

Additional fixups will be added, and making them a single 'if/else if'
chain makes it clearer than two nested switch statements.

Signed-off-by: Eduardo Habkost ehabk...@redhat.com
---
 target-i386/kvm.c | 23 +--
 1 file changed, 9 insertions(+), 14 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 7115921..084b40c 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -193,20 +193,15 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, 
uint32_t function,
 
 /* Fixups for the data returned by KVM, below */
 
-if (reg == R_EDX) {
-switch (function) {
-case 1:
-/* KVM before 2.6.30 misreports the following features */
-ret |= CPUID_MTRR | CPUID_PAT | CPUID_MCE | CPUID_MCA;
-break;
-case 0x8001:
-/* On Intel, kvm returns cpuid according to the Intel spec,
- * so add missing bits according to the AMD spec:
- */
-cpuid_1_edx = kvm_arch_get_supported_cpuid(s, 1, 0, R_EDX);
-ret |= cpuid_1_edx  CPUID_EXT2_AMD_ALIASES;
-break;
-}
+if (function == 1  reg == R_EDX) {
+/* KVM before 2.6.30 misreports the following features */
+ret |= CPUID_MTRR | CPUID_PAT | CPUID_MCE | CPUID_MCA;
+} else if (function == 0x8001  reg == R_EDX) {
+/* On Intel, kvm returns cpuid according to the Intel spec,
+ * so add missing bits according to the AMD spec:
+ */
+cpuid_1_edx = kvm_arch_get_supported_cpuid(s, 1, 0, R_EDX);
+ret |= cpuid_1_edx  CPUID_EXT2_AMD_ALIASES;
 }
 
 g_free(cpuid);
-- 
1.7.11.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 14/15] i386: kvm: reformat filter_features_for_kvm() code

2012-10-04 Thread Eduardo Habkost

Cosmetic, but it will also help to make futher patches easier to review.

Signed-off-by: Eduardo Habkost ehabk...@redhat.com
---
 target-i386/cpu.c | 28 +---
 1 file changed, 13 insertions(+), 15 deletions(-)

diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index a5c1628..7921b1b 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -1366,22 +1366,20 @@ static void filter_features_for_kvm(X86CPU *cpu)
 CPUX86State *env = cpu-env;
 KVMState *s = kvm_state;
 
-env-cpuid_features = kvm_arch_get_supported_cpuid(s, 1, 0, R_EDX);
-
-env-cpuid_ext_features = kvm_arch_get_supported_cpuid(s, 1, 0, R_ECX);
-
-env-cpuid_ext2_features = kvm_arch_get_supported_cpuid(s, 0x8001,
- 0, R_EDX);
-env-cpuid_ext3_features = kvm_arch_get_supported_cpuid(s, 0x8001,
- 0, R_ECX);
-env-cpuid_svm_features  = kvm_arch_get_supported_cpuid(s, 0x800A,
- 0, R_EDX);
-
+env-cpuid_features =
+kvm_arch_get_supported_cpuid(s, 1, 0, R_EDX);
+env-cpuid_ext_features =
+kvm_arch_get_supported_cpuid(s, 1, 0, R_ECX);
+env-cpuid_ext2_features =
+kvm_arch_get_supported_cpuid(s, 0x8001, 0, R_EDX);
+env-cpuid_ext3_features =
+kvm_arch_get_supported_cpuid(s, 0x8001, 0, R_ECX);
+env-cpuid_svm_features  =
+kvm_arch_get_supported_cpuid(s, 0x800A, 0, R_EDX);
 env-cpuid_kvm_features =
-kvm_arch_get_supported_cpuid(s, KVM_CPUID_FEATURES, 0, R_EAX);
-
-env-cpuid_ext4_features = kvm_arch_get_supported_cpuid(s, 0xC001,
- 0, R_EDX);
+kvm_arch_get_supported_cpuid(s, KVM_CPUID_FEATURES, 0, R_EAX);
+env-cpuid_ext4_features =
+kvm_arch_get_supported_cpuid(s, 0xC001, 0, R_EDX);
 
 }
 #endif
-- 
1.7.11.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 08/15] i386: kvm: set CPUID_EXT_HYPERVISOR on kvm_arch_get_supported_cpuid()

2012-10-04 Thread Eduardo Habkost

Full grep for kvm_arch_get_supported_cpuid:

   kvm.h:uint32_t kvm_arch_get_supported_cpuid(KVMState *env, uint32_t function,
   target-i386/cpu.c:x86_cpu_def-cpuid_7_0_ebx_features = 
kvm_arch_get_supported_cpuid(kvm_state, 0x7, 0, R_EBX);
   target-i386/cpu.c:*eax = kvm_arch_get_supported_cpuid(s, 0xA, 
count, R_EAX);
   target-i386/cpu.c:*ebx = kvm_arch_get_supported_cpuid(s, 0xA, 
count, R_EBX);
   target-i386/cpu.c:*ecx = kvm_arch_get_supported_cpuid(s, 0xA, 
count, R_ECX);
   target-i386/cpu.c:*edx = kvm_arch_get_supported_cpuid(s, 0xA, 
count, R_EDX);
   target-i386/cpu.c:*eax = kvm_arch_get_supported_cpuid(s, 0xd, 
count, R_EAX);
   target-i386/cpu.c:*ebx = kvm_arch_get_supported_cpuid(s, 0xd, 
count, R_EBX);
   target-i386/cpu.c:*ecx = kvm_arch_get_supported_cpuid(s, 0xd, 
count, R_ECX);
   target-i386/cpu.c:*edx = kvm_arch_get_supported_cpuid(s, 0xd, 
count, R_EDX);
   target-i386/kvm.c:uint32_t kvm_arch_get_supported_cpuid(KVMState *s, 
uint32_t function,
   target-i386/kvm.c:cpuid_1_edx = kvm_arch_get_supported_cpuid(s, 1, 
0, R_EDX);
   target-i386/kvm.c:env-cpuid_features = kvm_arch_get_supported_cpuid(s, 
1, 0, R_EDX);
 * target-i386/kvm.c:env-cpuid_ext_features = 
kvm_arch_get_supported_cpuid(s, 1, 0, R_ECX);
   target-i386/kvm.c:env-cpuid_ext2_features = 
kvm_arch_get_supported_cpuid(s, 0x8001,
   target-i386/kvm.c:env-cpuid_ext3_features = 
kvm_arch_get_supported_cpuid(s, 0x8001,
   target-i386/kvm.c:env-cpuid_svm_features  = 
kvm_arch_get_supported_cpuid(s, 0x800A,
   target-i386/kvm.c:kvm_arch_get_supported_cpuid(s, 
KVM_CPUID_FEATURES, 0, R_EAX);
   target-i386/kvm.c:kvm_arch_get_supported_cpuid(s, 0xC001, 0, 
R_EDX);

Note that there is only one call for CPUID[1].ECX above (*), and it is
the one that gets hacked to include CPUID_EXT_HYPERVISOR, so we can
simply make kvm_arch_get_supported_cpuid() set it, to let the rest of
the code automatically know that the flag can be safely set by QEMU.

Signed-off-by: Eduardo Habkost ehabk...@redhat.com
---
 target-i386/kvm.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 084b40c..1f451e3 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -196,6 +196,11 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, 
uint32_t function,
 if (function == 1  reg == R_EDX) {
 /* KVM before 2.6.30 misreports the following features */
 ret |= CPUID_MTRR | CPUID_PAT | CPUID_MCE | CPUID_MCA;
+} else if (function == 1  reg == R_ECX) {
+/* We can set the hypervisor flag, even if KVM does not return it on
+ * GET_SUPPORTED_CPUID
+ */
+ret |= CPUID_EXT_HYPERVISOR;
 } else if (function == 0x8001  reg == R_EDX) {
 /* On Intel, kvm returns cpuid according to the Intel spec,
  * so add missing bits according to the AMD spec:
@@ -399,10 +404,8 @@ int kvm_arch_init_vcpu(CPUX86State *env)
 
 env-cpuid_features = kvm_arch_get_supported_cpuid(s, 1, 0, R_EDX);
 
-i = env-cpuid_ext_features  CPUID_EXT_HYPERVISOR;
 j = env-cpuid_ext_features  CPUID_EXT_TSC_DEADLINE_TIMER;
 env-cpuid_ext_features = kvm_arch_get_supported_cpuid(s, 1, 0, R_ECX);
-env-cpuid_ext_features |= i;
 if (j  kvm_irqchip_in_kernel() 
 kvm_check_extension(s, KVM_CAP_TSC_DEADLINE_TIMER)) {
 env-cpuid_ext_features |= CPUID_EXT_TSC_DEADLINE_TIMER;
-- 
1.7.11.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 09/15] i386: kvm: set CPUID_EXT_TSC_DEADLINE_TIMER on kvm_arch_get_supported_cpuid()

2012-10-04 Thread Eduardo Habkost

This moves the CPUID_EXT_TSC_DEADLINE_TIMER CPUID flag hacking from
kvm_arch_init_vcpu() to kvm_arch_get_supported_cpuid().

Full git grep for kvm_arch_get_supported_cpuid:

   kvm.h:uint32_t kvm_arch_get_supported_cpuid(KVMState *env, uint32_t function,
   target-i386/cpu.c:x86_cpu_def-cpuid_7_0_ebx_features = 
kvm_arch_get_supported_cpuid(kvm_state, 0x7, 0, R_EBX);
   target-i386/cpu.c:*eax = kvm_arch_get_supported_cpuid(s, 0xA, 
count, R_EAX);
   target-i386/cpu.c:*ebx = kvm_arch_get_supported_cpuid(s, 0xA, 
count, R_EBX);
   target-i386/cpu.c:*ecx = kvm_arch_get_supported_cpuid(s, 0xA, 
count, R_ECX);
   target-i386/cpu.c:*edx = kvm_arch_get_supported_cpuid(s, 0xA, 
count, R_EDX);
   target-i386/cpu.c:*eax = kvm_arch_get_supported_cpuid(s, 0xd, 
count, R_EAX);
   target-i386/cpu.c:*ebx = kvm_arch_get_supported_cpuid(s, 0xd, 
count, R_EBX);
   target-i386/cpu.c:*ecx = kvm_arch_get_supported_cpuid(s, 0xd, 
count, R_ECX);
   target-i386/cpu.c:*edx = kvm_arch_get_supported_cpuid(s, 0xd, 
count, R_EDX);
   target-i386/kvm.c:uint32_t kvm_arch_get_supported_cpuid(KVMState *s, 
uint32_t function,
   target-i386/kvm.c:cpuid_1_edx = kvm_arch_get_supported_cpuid(s, 1, 
0, R_EDX);
   target-i386/kvm.c:env-cpuid_features = kvm_arch_get_supported_cpuid(s, 
1, 0, R_EDX);
 * target-i386/kvm.c:env-cpuid_ext_features = 
kvm_arch_get_supported_cpuid(s, 1, 0, R_ECX);
   target-i386/kvm.c:env-cpuid_ext2_features = 
kvm_arch_get_supported_cpuid(s, 0x8001,
   target-i386/kvm.c:env-cpuid_ext3_features = 
kvm_arch_get_supported_cpuid(s, 0x8001,
   target-i386/kvm.c:env-cpuid_svm_features  = 
kvm_arch_get_supported_cpuid(s, 0x800A,
   target-i386/kvm.c:kvm_arch_get_supported_cpuid(s, 
KVM_CPUID_FEATURES, 0, R_EAX);
   target-i386/kvm.c:kvm_arch_get_supported_cpuid(s, 0xC001, 0, 
R_EDX);

Note that there is only one call for CPUID[1].ECX above (*), and it is
the one that gets hacked to include CPUID_EXT_TSC_DEADLINE_TIMER, so we
can simply make kvm_arch_get_supported_cpuid() set it, to let the rest
of the code know the flag can be safely set by QEMU.

One thing I was worrying about when doing this is that now
kvm_arch_get_supported_cpuid() depends on kvm_irqchip_in_kernel(). But
the 'kvm_kernel_irqchip' global variable is initialized during
kvm_init(), that is called very early, and kvm_init() is already a
requirement to run the GET_SUPPORTED_CPUID ioctl() (as kvm_init() is the
function that initializes the 'kvm_state' global variable).

Signed-off-by: Eduardo Habkost ehabk...@redhat.com
---
 target-i386/kvm.c | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 1f451e3..704ad33 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -201,6 +201,14 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, 
uint32_t function,
  * GET_SUPPORTED_CPUID
  */
 ret |= CPUID_EXT_HYPERVISOR;
+/* tsc-deadline flag is not returned by GET_SUPPORTED_CPUID, but it
+ * can be enabled if the kernel has KVM_CAP_TSC_DEADLINE_TIMER,
+ * and the irqchip is in the kernel.
+ */
+if (kvm_irqchip_in_kernel() 
+kvm_check_extension(s, KVM_CAP_TSC_DEADLINE_TIMER)) {
+ret |= CPUID_EXT_TSC_DEADLINE_TIMER;
+}
 } else if (function == 0x8001  reg == R_EDX) {
 /* On Intel, kvm returns cpuid according to the Intel spec,
  * so add missing bits according to the AMD spec:
@@ -404,12 +412,7 @@ int kvm_arch_init_vcpu(CPUX86State *env)
 
 env-cpuid_features = kvm_arch_get_supported_cpuid(s, 1, 0, R_EDX);
 
-j = env-cpuid_ext_features  CPUID_EXT_TSC_DEADLINE_TIMER;
 env-cpuid_ext_features = kvm_arch_get_supported_cpuid(s, 1, 0, R_ECX);
-if (j  kvm_irqchip_in_kernel() 
-kvm_check_extension(s, KVM_CAP_TSC_DEADLINE_TIMER)) {
-env-cpuid_ext_features |= CPUID_EXT_TSC_DEADLINE_TIMER;
-}
 
 env-cpuid_ext2_features = kvm_arch_get_supported_cpuid(s, 0x8001,
  0, R_EDX);
-- 
1.7.11.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 11/15] i386: kvm: mask cpuid_kvm_features earlier

2012-10-04 Thread Eduardo Habkost

Instead of masking the KVM feature bits very late (while building the
KVM_SET_CPUID2 data), mask it out on env-cpuid_kvm_features, at the
same point where the other feature words are masked out.

Signed-off-by: Eduardo Habkost ehabk...@redhat.com
---
 target-i386/kvm.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 590f4d5..3f4bee5 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -428,6 +428,9 @@ int kvm_arch_init_vcpu(CPUX86State *env)
 env-cpuid_svm_features  = kvm_arch_get_supported_cpuid(s, 0x800A,
  0, R_EDX);
 
+env-cpuid_kvm_features =
+kvm_arch_get_supported_cpuid(s, KVM_CPUID_FEATURES, 0, R_EAX);
+
 cpuid_i = 0;
 
 /* Paravirtualization CPUIDs */
@@ -448,8 +451,7 @@ int kvm_arch_init_vcpu(CPUX86State *env)
 c = cpuid_data.entries[cpuid_i++];
 memset(c, 0, sizeof(*c));
 c-function = KVM_CPUID_FEATURES;
-c-eax = env-cpuid_kvm_features 
-kvm_arch_get_supported_cpuid(s, KVM_CPUID_FEATURES, 0, R_EAX);
+c-eax = env-cpuid_kvm_features;
 
 if (hyperv_enabled()) {
 memcpy(signature, Hv#1\0\0\0\0\0\0\0\0, 12);
-- 
1.7.11.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 12/15] i386: kvm: mask cpuid_ext4_features bits earlier

2012-10-04 Thread Eduardo Habkost

This way all the filtering by GET_SUPPORTED_CPUID is being done at the
same place in the code.

Signed-off-by: Eduardo Habkost ehabk...@redhat.com
---
 target-i386/kvm.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 3f4bee5..335b3e7 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -431,6 +431,9 @@ int kvm_arch_init_vcpu(CPUX86State *env)
 env-cpuid_kvm_features =
 kvm_arch_get_supported_cpuid(s, KVM_CPUID_FEATURES, 0, R_EAX);
 
+env-cpuid_ext4_features = kvm_arch_get_supported_cpuid(s, 0xC001,
+ 0, R_EDX);
+
 cpuid_i = 0;
 
 /* Paravirtualization CPUIDs */
@@ -572,8 +575,6 @@ int kvm_arch_init_vcpu(CPUX86State *env)
 
 /* Call Centaur's CPUID instructions they are supported. */
 if (env-cpuid_xlevel2  0) {
-env-cpuid_ext4_features =
-kvm_arch_get_supported_cpuid(s, 0xC001, 0, R_EDX);
 cpu_x86_cpuid(env, 0xC000, 0, limit, unused, unused, unused);
 
 for (i = 0xC000; i = limit; i++) {
-- 
1.7.11.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 10/15] i386: kvm: x2apic is not supported without in-kernel irqchip

2012-10-04 Thread Eduardo Habkost

This is necessary so that x2apic is not improperly enabled when the
in-kernel irqchip is disabled.

This won't generate a warning with -cpu ...,check because the current
check/enforce code is broken (it checks the host CPU data directly,
instead of using kvm_arch_get_supported_cpuid()), but it will be
eventually fixed to properly report the missing x2apic flag.

Signed-off-by: Eduardo Habkost ehabk...@redhat.com
---
 target-i386/kvm.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 704ad33..590f4d5 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -209,6 +209,13 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, 
uint32_t function,
 kvm_check_extension(s, KVM_CAP_TSC_DEADLINE_TIMER)) {
 ret |= CPUID_EXT_TSC_DEADLINE_TIMER;
 }
+
+/* x2apic is reported by GET_SUPPORTED_CPUID, but it can't be enabled
+ * without the in-kernel irqchip
+ */
+if (!kvm_irqchip_in_kernel()) {
+ret = ~CPUID_EXT_X2APIC;
+}
 } else if (function == 0x8001  reg == R_EDX) {
 /* On Intel, kvm returns cpuid according to the Intel spec,
  * so add missing bits according to the AMD spec:
-- 
1.7.11.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 1/3] kvm tools: Fix powerpc build errors caused by recent changes

2012-10-04 Thread Michael Ellerman

Several caused by commit 8074303 remove global kvm object,
ioport__setup_arch(), term_getc_iov()  term_getc() in the
spapr_hvcons.c code, and kvm_cpu__reboot() in rtas_power_off().

Commit 221b584 move active_console into struct kvm_config added
checks in h_put_term_char()  h_get_term_char() of
kvm-cfg.active_console but needs to be vcpu-kvm-cfg.active_console.

That commit also missed updates to term_putc()  term_getc() in
spapr_rtas.c, and I'm guessing that we need similar checks of
active_console in rtas_put_term_char()  rtas_get_term_char().

Signed-off-by: Michael Ellerman mich...@ellerman.id.au
---
 tools/kvm/powerpc/ioport.c   |2 +-
 tools/kvm/powerpc/spapr_hvcons.c |6 +++---
 tools/kvm/powerpc/spapr_rtas.c   |   14 +-
 3 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/tools/kvm/powerpc/ioport.c b/tools/kvm/powerpc/ioport.c
index a8e4dc3..264fb7e 100644
--- a/tools/kvm/powerpc/ioport.c
+++ b/tools/kvm/powerpc/ioport.c
@@ -12,7 +12,7 @@
 
 #include stdlib.h
 
-void ioport__setup_arch(void)
+void ioport__setup_arch(struct kvm *kvm)
 {
/* PPC has no legacy ioports to set up */
 }
diff --git a/tools/kvm/powerpc/spapr_hvcons.c b/tools/kvm/powerpc/spapr_hvcons.c
index 1fe4bdb..0bdf75b 100644
--- a/tools/kvm/powerpc/spapr_hvcons.c
+++ b/tools/kvm/powerpc/spapr_hvcons.c
@@ -50,7 +50,7 @@ static unsigned long h_put_term_char(struct kvm_cpu *vcpu, 
unsigned long opcode,
do {
int ret;
 
-   if (kvm-cfg.active_console == CONSOLE_HV)
+   if (vcpu-kvm-cfg.active_console == CONSOLE_HV)
ret = term_putc_iov(iov, 1, 0);
else
ret = 0;
@@ -74,14 +74,14 @@ static unsigned long h_get_term_char(struct kvm_cpu *vcpu, 
unsigned long opcode,
union hv_chario data;
struct iovec iov;
 
-   if (kvm-cfg.active_console != CONSOLE_HV)
+   if (vcpu-kvm-cfg.active_console != CONSOLE_HV)
return H_SUCCESS;
 
if (term_readable(0)) {
iov.iov_base = data.buf;
iov.iov_len = 16;
 
-   *len = term_getc_iov(iov, 1, 0);
+   *len = term_getc_iov(vcpu-kvm, iov, 1, 0);
*char0_7 = be64_to_cpu(data.a.char0_7);
*char8_15 = be64_to_cpu(data.a.char8_15);
} else {
diff --git a/tools/kvm/powerpc/spapr_rtas.c b/tools/kvm/powerpc/spapr_rtas.c
index 14a3462..c81d82b 100644
--- a/tools/kvm/powerpc/spapr_rtas.c
+++ b/tools/kvm/powerpc/spapr_rtas.c
@@ -41,7 +41,7 @@ static void rtas_display_character(struct kvm_cpu *vcpu,
uint32_t nret, target_ulong rets)
 {
char c = rtas_ld(vcpu-kvm, args, 0);
-   term_putc(CONSOLE_HV, c, 1, 0);
+   term_putc(c, 1, 0);
rtas_st(vcpu-kvm, rets, 0, 0);
 }
 
@@ -52,7 +52,10 @@ static void rtas_put_term_char(struct kvm_cpu *vcpu,
   uint32_t nret, target_ulong rets)
 {
char c = rtas_ld(vcpu-kvm, args, 0);
-   term_putc(CONSOLE_HV, c, 1, 0);
+
+   if (vcpu-kvm-cfg.active_console == CONSOLE_HV)
+   term_putc(c, 1, 0);
+
rtas_st(vcpu-kvm, rets, 0, 0);
 }
 
@@ -62,8 +65,9 @@ static void rtas_get_term_char(struct kvm_cpu *vcpu,
   uint32_t nret, target_ulong rets)
 {
int c;
-   if (term_readable(CONSOLE_HV, 0) 
-   (c = term_getc(CONSOLE_HV, 0)) = 0) {
+
+   if (vcpu-kvm-cfg.active_console == CONSOLE_HV  term_readable(0) 
+   (c = term_getc(vcpu-kvm, 0)) = 0) {
rtas_st(vcpu-kvm, rets, 0, 0);
rtas_st(vcpu-kvm, rets, 1, c);
} else {
@@ -115,7 +119,7 @@ static void rtas_power_off(struct kvm_cpu *vcpu,
rtas_st(vcpu-kvm, rets, 0, -3);
return;
}
-   kvm_cpu__reboot();
+   kvm_cpu__reboot(vcpu-kvm);
 }
 
 static void rtas_query_cpu_stopped_state(struct kvm_cpu *vcpu,
-- 
1.7.9.5

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 2/3] kvm tools: Fix segfault on powerpc in xics_register()

2012-10-04 Thread Michael Ellerman

In commit 06e6648 move kvm_cpus into struct kvm, kvm_cpu__init() became
kvm_cpu__arch_init() called from a new kvm_cpu__init(), and the call was moved
from the end of the init sequence to much earlier, and in particular prior to
irq__init().

This leads to a segfault on powerpc, because kvm_cpu__arch_init() calls into
xics_cpu_register(), which dereferences vcpu-kvm.icp which is uninitialised
until irq__init().

Later in commit a48488d use init/exit where possible, irq__init() was pulled
out of the init sequence and made a dev_base_init() routine, on x86. On powerpc
the call to irq__init() was dropped entirely.

Finally, we now have a circular dependency between kvm_cpu__init() (which needs
kvm-arch.icp), and irq__init() (which needs kvm-nrcpus). This is caused by
the combination of commit 89f40a7 move nrcpus into struct kvm_config,
which moved the global nrcpus into kvm-cfg, and commit 06e6648 move kvm_cpus
into struct kvm, which moved the setup of kvm-nrcpus from kvm-cfg into
kvm_cpu__init().

To fix it we drop irq__init() entirely, if we ever have a non xics irq option
we can bring it back. We turn xics_system_init() into xics_init(), and have it
do the allocation and setup of the icp/ics, including the per-vcpu setup,
removing the dependency from kvm_cpu__init() (via kvm_cpu__arch_init()).

xics_init() is a base_init() routine, it can't be core, which should be early
enough, fingers crossed.

Finally drop irq__exit(), it does nothing and is never called.

Signed-off-by: Michael Ellerman mich...@ellerman.id.au
---
 tools/kvm/powerpc/irq.c |   19 ---
 tools/kvm/powerpc/kvm-cpu.c |3 ---
 tools/kvm/powerpc/xics.c|   38 +++---
 tools/kvm/powerpc/xics.h|5 -
 4 files changed, 23 insertions(+), 42 deletions(-)

diff --git a/tools/kvm/powerpc/irq.c b/tools/kvm/powerpc/irq.c
index 6d134c5..e89fa3b 100644
--- a/tools/kvm/powerpc/irq.c
+++ b/tools/kvm/powerpc/irq.c
@@ -26,8 +26,6 @@
 #include xics.h
 #include spapr_pci.h
 
-#define XICS_IRQS   1024
-
 /*
  * FIXME: The code in this file assumes an SPAPR guest, using XICS.  Make
  * generic  cope with multiple PPC platform types.
@@ -51,23 +49,6 @@ int irq__register_device(u32 dev, u8 *num, u8 *pin, u8 *line)
return 0;
 }
 
-int irq__init(struct kvm *kvm)
-{
-   /*
-* kvm-nr_cpus is now valid; for /now/, pass
-* this to xics_system_init(), which assumes servers
-* are numbered 0..nrcpus.  This may not really be true,
-* but it is OK currently.
-*/
-   kvm-arch.icp = xics_system_init(XICS_IRQS, kvm-nrcpus);
-   return 0;
-}
-
-int irq__exit(struct kvm *kvm)
-{
-   return 0;
-}
-
 int irq__add_msix_route(struct kvm *kvm, struct msi_msg *msg)
 {
die(__FUNCTION__);
diff --git a/tools/kvm/powerpc/kvm-cpu.c b/tools/kvm/powerpc/kvm-cpu.c
index 6aaf424..8fce121 100644
--- a/tools/kvm/powerpc/kvm-cpu.c
+++ b/tools/kvm/powerpc/kvm-cpu.c
@@ -93,9 +93,6 @@ struct kvm_cpu *kvm_cpu__arch_init(struct kvm *kvm, unsigned 
long cpu_id)
 */
vcpu-is_running = true;
 
-   /* Register with IRQ controller (FIXME, assumes XICS) */
-   xics_cpu_register(vcpu);
-
return vcpu;
 }
 
diff --git a/tools/kvm/powerpc/xics.c b/tools/kvm/powerpc/xics.c
index 1cf9558..d4b5caa 100644
--- a/tools/kvm/powerpc/xics.c
+++ b/tools/kvm/powerpc/xics.c
@@ -18,6 +18,8 @@
 #include stdio.h
 #include malloc.h
 
+#define XICS_NUM_IRQS  1024
+
 
 /* #define DEBUG_XICS yes */
 #ifdef DEBUG_XICS
@@ -441,26 +443,19 @@ static void rtas_int_on(struct kvm_cpu *vcpu, uint32_t 
token,
rtas_st(vcpu-kvm, rets, 0, 0); /* Success */
 }
 
-void xics_cpu_register(struct kvm_cpu *vcpu)
-{
-   if (vcpu-cpu_id  vcpu-kvm-arch.icp-nr_servers)
-   vcpu-kvm-arch.icp-ss[vcpu-cpu_id].cpu = vcpu;
-   else
-   die(Setting invalid server for cpuid %ld\n, vcpu-cpu_id);
-}
-
-struct icp_state *xics_system_init(unsigned int nr_irqs, unsigned int nr_cpus)
+static int xics_init(struct kvm *kvm)
 {
int max_server_num;
unsigned int i;
struct icp_state *icp;
struct ics_state *ics;
+   int j;
 
-   max_server_num = nr_cpus;
+   max_server_num = kvm-nrcpus;
 
icp = malloc(sizeof(*icp));
icp-nr_servers = max_server_num + 1;
-   icp-ss = malloc(icp-nr_servers*sizeof(struct icp_server_state));
+   icp-ss = malloc(icp-nr_servers * sizeof(struct icp_server_state));
 
for (i = 0; i  icp-nr_servers; i++) {
icp-ss[i].xirr = 0;
@@ -475,14 +470,14 @@ struct icp_state *xics_system_init(unsigned int nr_irqs, 
unsigned int nr_cpus)
 */
 
ics = malloc(sizeof(*ics));
-   ics-nr_irqs = nr_irqs;
+   ics-nr_irqs = XICS_NUM_IRQS;
ics-offset = XICS_IRQ_OFFSET;
-   ics-irqs = malloc(nr_irqs * sizeof(struct ics_irq_state));
+   ics-irqs = malloc(ics-nr_irqs * sizeof(struct ics_irq_state));
 
icp-ics = ics;

[PATCH 3/3] kvm tools: Do setup_fdt() later, get powerpc to boot again

2012-10-04 Thread Michael Ellerman

In commit e3d3ced kernel load/firmware cleanup, the call to
kvm__arch_setup_firmware() was moved. Previously more or less at the end
of the init sequence, but that commit moved it into kvm__init() which
is a core_init() call and so runs quite early.

This broke booting powerpc guests, as setup_fdt() needs to be called
later in the setup sequence. In particular it looks at kvm-nrcpus,
which is uninitialised at that point.

In general setup_fdt() needs to run late in the sequence, as it encodes
the setup of the machine into the device tree.

So move setup_fdt() out of kvm__arch_setup_firmware() and make it a
firmware_init() call of its own.

With this patch I am able to boot guests again on HV KVM.

Signed-off-by: Michael Ellerman mich...@ellerman.id.au
---
 tools/kvm/powerpc/kvm.c |8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/tools/kvm/powerpc/kvm.c b/tools/kvm/powerpc/kvm.c
index e4f5315..d675265 100644
--- a/tools/kvm/powerpc/kvm.c
+++ b/tools/kvm/powerpc/kvm.c
@@ -286,7 +286,7 @@ static void generate_segment_page_sizes(struct 
kvm_ppc_smmu_info *info, struct f
  * and whilst most PPC targets will require CPU/memory nodes, others like RTAS
  * should eventually be added separately.
  */
-static void setup_fdt(struct kvm *kvm)
+static int setup_fdt(struct kvm *kvm)
 {
uint64_tmem_reg_property[] = { 0, cpu_to_be64(kvm-ram_size) };
int smp_cpus = kvm-nrcpus;
@@ -488,7 +488,10 @@ static void setup_fdt(struct kvm *kvm)
_FDT(fdt_pack(fdt_dest));
 
free(segment_page_sizes.value);
+
+   return 0;
 }
+firmware_init(setup_fdt);
 
 /**
  * kvm__arch_setup_firmware
@@ -517,9 +520,6 @@ int kvm__arch_setup_firmware(struct kvm *kvm)
 
/* Load SLOF */
 
-   /* Init FDT */
-   setup_fdt(kvm);
-
return 0;
 }
 
-- 
1.7.9.5

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] [PATCH] kvm: Set default accelerator to kvm if the host supports it

2012-10-04 Thread Alexander Graf


On 03.10.2012, at 22:26, Peter Maydell wrote:

 On 3 October 2012 21:01, Blue Swirl blauwir...@gmail.com wrote:
 On Mon, Oct 1, 2012 at 4:20 PM, Anthony Liguori anth...@codemonkey.ws 
 wrote:
 Jan Kiszka jan.kis...@siemens.com writes:
 +/* The default accelerator depends on the availability of KVM. */
 +p = kvm_configured ? kvm : tcg;
 }
 
 Blue/Aurelien, any objections?
 
 No, maybe a message could be printed that says that the default has
 changed, for a few releases.
 
 I've lost track of the conversation, are we currently proposing
 the accelerator default to be kvm (as per the original patch
 you quote here) or kvm:tcg ?
 
 I'm not entirely sure which I prefer from an ARM perspective
 For some time to come and for a lot of targets (ie any target
 CPU except A15), having a default of kvm is going to cause
 existing working commandlines to stop working. [I expect that
 ARM-host qemu binaries will be built with CONFIG_KVM once ARM
 KVM support lands, but the same binary will be run on hosts
 without virtualization extensions.] On the other hand, perhaps
 there just aren't really very many people who run QEMU on
 ARM hosts, and so we can ignore them :-)

We get similar problems on PPC. Take the following example:

  $ qemu-system-ppc -M mpc8544ds -kernel uImage -nographic

That command line would work just fine if you run it on a G5 today. However, if 
we switch to accel=kvm:tcg, it will stop working, because kvm on the G5 can not 
virtualize an e500 CPU.

I don't know any good way around that one either the way the code is layered 
today. We only know the type of CPU we want to create after we made the 
decision whether we do KVM or not.


Alex

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] [PATCH] kvm: Set default accelerator to kvm if the host supports it

2012-10-04 Thread Anthony Liguori

Alexander Graf ag...@suse.de writes:

 On 03.10.2012, at 22:26, Peter Maydell wrote:

 On 3 October 2012 21:01, Blue Swirl blauwir...@gmail.com wrote:
 On Mon, Oct 1, 2012 at 4:20 PM, Anthony Liguori anth...@codemonkey.ws 
 wrote:
 Jan Kiszka jan.kis...@siemens.com writes:
 +/* The default accelerator depends on the availability of KVM. */
 +p = kvm_configured ? kvm : tcg;
 }
 
 Blue/Aurelien, any objections?
 
 No, maybe a message could be printed that says that the default has
 changed, for a few releases.
 
 I've lost track of the conversation, are we currently proposing
 the accelerator default to be kvm (as per the original patch
 you quote here) or kvm:tcg ?
 
 I'm not entirely sure which I prefer from an ARM perspective
 For some time to come and for a lot of targets (ie any target
 CPU except A15), having a default of kvm is going to cause
 existing working commandlines to stop working. [I expect that
 ARM-host qemu binaries will be built with CONFIG_KVM once ARM
 KVM support lands, but the same binary will be run on hosts
 without virtualization extensions.] On the other hand, perhaps
 there just aren't really very many people who run QEMU on
 ARM hosts, and so we can ignore them :-)

 We get similar problems on PPC. Take the following example:

   $ qemu-system-ppc -M mpc8544ds -kernel uImage -nographic

But do you really expect people to do this?  I have to believe that
people running on PPC hardware and running qemu-system-ppc most likely
want to do KVM...

Kernel development seems unlikely...  The folks that I know doing PPC
kernel development with QEMU still qemu-system-ppc on x86.

Regards,

Anthony Liguori


 That command line would work just fine if you run it on a G5 today. However, 
 if we switch to accel=kvm:tcg, it will stop working, because kvm on the G5 
 can not virtualize an e500 CPU.

 I don't know any good way around that one either the way the code is layered 
 today. We only know the type of CPU we want to create after we made the 
 decision whether we do KVM or not.


 Alex
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

RE: [PATCH v2 06/14] KVM: ARM: Memory virtualization setup

2012-10-04 Thread Min-gyu Kim



 -Original Message-
 From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On
 Behalf Of Christoffer Dall
 Sent: Monday, October 01, 2012 6:11 PM
 To: kvm@vger.kernel.org; linux-arm-ker...@lists.infradead.org;
 kvm...@lists.cs.columbia.edu
 Cc: Marc Zyngier
 Subject: [PATCH v2 06/14] KVM: ARM: Memory virtualization setup
 
 +static void stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache
 *cache,
 +phys_addr_t addr, const pte_t *new_pte) {
 + pgd_t *pgd;
 + pud_t *pud;
 + pmd_t *pmd;
 + pte_t *pte, old_pte;
 +
 + /* Create 2nd stage page table mapping - Level 1 */
 + pgd = kvm-arch.pgd + pgd_index(addr);
 + pud = pud_offset(pgd, addr);
 + if (pud_none(*pud)) {
 + if (!cache)
 + return; /* ignore calls from kvm_set_spte_hva */
 + pmd = mmu_memory_cache_alloc(cache);
 + pud_populate(NULL, pud, pmd);
 + pmd += pmd_index(addr);
 + get_page(virt_to_page(pud));
 + } else
 + pmd = pmd_offset(pud, addr);
 +
 + /* Create 2nd stage page table mapping - Level 2 */
 + if (pmd_none(*pmd)) {
 + if (!cache)
 + return; /* ignore calls from kvm_set_spte_hva */
 + pte = mmu_memory_cache_alloc(cache);
 + clean_pte_table(pte);
 + pmd_populate_kernel(NULL, pmd, pte);
 + pte += pte_index(addr);
 + get_page(virt_to_page(pmd));
 + } else
 + pte = pte_offset_kernel(pmd, addr);
 +
 + /* Create 2nd stage page table mapping - Level 3 */
 + old_pte = *pte;
 + set_pte_ext(pte, *new_pte, 0);
 + if (pte_present(old_pte))
 + __kvm_tlb_flush_vmid(kvm);
 + else
 + get_page(virt_to_page(pte));
 +}


I'm not sure about the 3-level page table, but isn't it necessary to
clean the page table for 2nd level?
There are two mmu_memory_cache_alloc calls. One has following clean_pte_table
and the other doesn't have. 

And why do you ignore calls from kvm_set_spte_hva? It is supposed to happen when
host moves the page, right? Then you ignore the case because it can be handled
later when fault actually happens? Is there any other reason that I miss?

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

RE: [PATCH 1/6] KVM: PPC: booke: use vcpu reference from thread_struct

2012-10-04 Thread Bhushan Bharat-R65777



 -Original Message-
 From: Alexander Graf [mailto:ag...@suse.de]
 Sent: Monday, September 24, 2012 9:58 PM
 To: Bhushan Bharat-R65777
 Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; Bhushan Bharat-R65777
 Subject: Re: [PATCH 1/6] KVM: PPC: booke: use vcpu reference from 
 thread_struct
 
 
 On 21.08.2012, at 15:51, Bharat Bhushan wrote:
 
  Like other places, use thread_struct to get vcpu reference.
 
 Please remove the definition of SPRN_SPRG_R/WVCPU as well.

Ok

Thanks
-Bharat

 
 
 Alex
 
 
  Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
  ---
  arch/powerpc/kernel/asm-offsets.c   |2 +-
  arch/powerpc/kvm/booke_interrupts.S |6 ++
  2 files changed, 3 insertions(+), 5 deletions(-)
 
  diff --git a/arch/powerpc/kernel/asm-offsets.c
  b/arch/powerpc/kernel/asm-offsets.c
  index 85b05c4..fbb999c 100644
  --- a/arch/powerpc/kernel/asm-offsets.c
  +++ b/arch/powerpc/kernel/asm-offsets.c
  @@ -116,7 +116,7 @@ int main(void)
  #ifdef CONFIG_KVM_BOOK3S_32_HANDLER
  DEFINE(THREAD_KVM_SVCPU, offsetof(struct thread_struct,
  kvm_shadow_vcpu)); #endif -#ifdef CONFIG_KVM_BOOKE_HV
  +#if defined(CONFIG_KVM)  defined(CONFIG_BOOKE)
  DEFINE(THREAD_KVM_VCPU, offsetof(struct thread_struct, kvm_vcpu));
  #endif
 
  diff --git a/arch/powerpc/kvm/booke_interrupts.S
  b/arch/powerpc/kvm/booke_interrupts.S
  index bb46b32..ca16d57 100644
  --- a/arch/powerpc/kvm/booke_interrupts.S
  +++ b/arch/powerpc/kvm/booke_interrupts.S
  @@ -56,7 +56,8 @@
  _GLOBAL(kvmppc_handler_\ivor_nr)
  /* Get pointer to vcpu and record exit number. */
  mtspr   \scratch , r4
  -   mfspr   r4, SPRN_SPRG_RVCPU
  +   mfspr   r4, SPRN_SPRG_THREAD
  +   lwz r4, THREAD_KVM_VCPU(r4)
  stw r3, VCPU_GPR(R3)(r4)
  stw r5, VCPU_GPR(R5)(r4)
  stw r6, VCPU_GPR(R6)(r4)
  @@ -402,9 +403,6 @@ lightweight_exit:
  lwz r8, kvmppc_booke_handlers@l(r8)
  mtspr   SPRN_IVPR, r8
 
  -   /* Save vcpu pointer for the exception handlers. */
  -   mtspr   SPRN_SPRG_WVCPU, r4
  -
  lwz r5, VCPU_SHARED(r4)
 
  /* Can't switch the stack pointer until after IVPR is switched,
  --
  1.7.0.4
 
 
 


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

RE: [PATCH 4/6] KVM: PPC: debug stub interface parameter defined

2012-10-04 Thread Bhushan Bharat-R65777



 -Original Message-
 From: Alexander Graf [mailto:ag...@suse.de]
 Sent: Monday, September 24, 2012 9:09 PM
 To: Bhushan Bharat-R65777
 Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; Bhushan Bharat-R65777
 Subject: Re: [PATCH 4/6] KVM: PPC: debug stub interface parameter defined
 
 
 On 21.08.2012, at 15:51, Bharat Bhushan wrote:
 
  This patch defines the interface parameter for KVM_SET_GUEST_DEBUG
  ioctl support. Follow up patches will use this for setting up hardware
  breakpoints, watchpoints and software breakpoints.
 
  Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
  ---
  arch/powerpc/include/asm/kvm.h |   33 +
  arch/powerpc/kvm/book3s.c  |6 ++
  arch/powerpc/kvm/booke.c   |6 ++
  arch/powerpc/kvm/powerpc.c |6 --
  4 files changed, 45 insertions(+), 6 deletions(-)
 
  diff --git a/arch/powerpc/include/asm/kvm.h
  b/arch/powerpc/include/asm/kvm.h index 3c14202..61b197e 100644
  --- a/arch/powerpc/include/asm/kvm.h
  +++ b/arch/powerpc/include/asm/kvm.h
  @@ -269,8 +269,41 @@ struct kvm_debug_exit_arch {
 
  /* for KVM_SET_GUEST_DEBUG */
  struct kvm_guest_debug_arch {
  +   struct {
  +   /* H/W breakpoint/watchpoint address */
  +   __u64 addr;
  +   /*
  +* Type denotes h/w breakpoint, read watchpoint, write
  +* watchpoint or watchpoint (both read and write).
  +*/
  +#define KVMPPC_DEBUG_NOTYPE0x0
  +#define KVMPPC_DEBUG_BREAKPOINT(1UL  1)
  +#define KVMPPC_DEBUG_WATCH_WRITE   (1UL  2)
  +#define KVMPPC_DEBUG_WATCH_READ(1UL  3)
  +   __u32 type;
  +   __u32 pad1;
 
 Why the padding?

Not sure why, I will remove this.

 
  +   __u64 pad2;
  +   } bp[16];
 
 Why 16?

I think for now 6 (4 iac + 2 dac) is sufficient for BOOKE. We kept 16 to have 
some room for future and other platforms.

Thanks
-Bharat
 
  };
 
  +/* Debug related defines */
  +/*
  + * kvm_guest_debug-control is a 32 bit field. The lower 16 bits are
  +generic
  + * and upper 16 bits are architecture specific. Architecture specific
  +defines
  + * that ioctl is for setting hardware breakpoint or software breakpoint.
  + */
  +#define KVM_GUESTDBG_USE_SW_BP 0x0001
  +#define KVM_GUESTDBG_USE_HW_BP 0x0002
  +
  +/* When setting software breakpoint, Change the software breakpoint
  + * instruction to special trap instruction and set
  +KVM_GUESTDBG_USE_SW_BP
  + * flag in kvm_guest_debug-control. KVM does keep track of software
  + * breakpoints. So when KVM_GUESTDBG_USE_SW_BP flag is set and
  +special trap
  + * instruction is executed by guest then exit to userspace.
  + * NOTE: A Nice interface can be added to get the special trap instruction.
  + */
  +#define KVMPPC_INST_GUEST_GDB  0x7C00021C  /* ehpriv OC=0 
  */
 
 This definitely has to be passed to user space (which writes that instruction
 into guest phys memory). Other PPC subarchs will use different instructions.
 Just model it as a read-only ONE_REG.
 
 
 Alex
 


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

RE: [PATCH 4/6] KVM: PPC: debug stub interface parameter defined

2012-10-04 Thread Bhushan Bharat-R65777



 -Original Message-
 From: Alexander Graf [mailto:ag...@suse.de]
 Sent: Monday, September 24, 2012 9:09 PM
 To: Bhushan Bharat-R65777
 Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; Bhushan Bharat-R65777
 Subject: Re: [PATCH 4/6] KVM: PPC: debug stub interface parameter defined
 
 
 On 21.08.2012, at 15:51, Bharat Bhushan wrote:
 
  This patch defines the interface parameter for KVM_SET_GUEST_DEBUG
  ioctl support. Follow up patches will use this for setting up hardware
  breakpoints, watchpoints and software breakpoints.
 
  Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
  ---
  arch/powerpc/include/asm/kvm.h |   33 +
  arch/powerpc/kvm/book3s.c  |6 ++
  arch/powerpc/kvm/booke.c   |6 ++
  arch/powerpc/kvm/powerpc.c |6 --
  4 files changed, 45 insertions(+), 6 deletions(-)
 
  diff --git a/arch/powerpc/include/asm/kvm.h
  b/arch/powerpc/include/asm/kvm.h index 3c14202..61b197e 100644
  --- a/arch/powerpc/include/asm/kvm.h
  +++ b/arch/powerpc/include/asm/kvm.h
  @@ -269,8 +269,41 @@ struct kvm_debug_exit_arch {
 
  /* for KVM_SET_GUEST_DEBUG */
  struct kvm_guest_debug_arch {
  +   struct {
  +   /* H/W breakpoint/watchpoint address */
  +   __u64 addr;
  +   /*
  +* Type denotes h/w breakpoint, read watchpoint, write
  +* watchpoint or watchpoint (both read and write).
  +*/
  +#define KVMPPC_DEBUG_NOTYPE0x0
  +#define KVMPPC_DEBUG_BREAKPOINT(1UL  1)
  +#define KVMPPC_DEBUG_WATCH_WRITE   (1UL  2)
  +#define KVMPPC_DEBUG_WATCH_READ(1UL  3)
  +   __u32 type;
  +   __u32 pad1;
 
 Why the padding?
 
  +   __u64 pad2;
  +   } bp[16];
 
 Why 16?
 
  };
 
  +/* Debug related defines */
  +/*
  + * kvm_guest_debug-control is a 32 bit field. The lower 16 bits are
  +generic
  + * and upper 16 bits are architecture specific. Architecture specific
  +defines
  + * that ioctl is for setting hardware breakpoint or software breakpoint.
  + */
  +#define KVM_GUESTDBG_USE_SW_BP 0x0001
  +#define KVM_GUESTDBG_USE_HW_BP 0x0002
  +
  +/* When setting software breakpoint, Change the software breakpoint
  + * instruction to special trap instruction and set
  +KVM_GUESTDBG_USE_SW_BP
  + * flag in kvm_guest_debug-control. KVM does keep track of software
  + * breakpoints. So when KVM_GUESTDBG_USE_SW_BP flag is set and
  +special trap
  + * instruction is executed by guest then exit to userspace.
  + * NOTE: A Nice interface can be added to get the special trap instruction.
  + */
  +#define KVMPPC_INST_GUEST_GDB  0x7C00021C  /* ehpriv OC=0 
  */
 
 This definitely has to be passed to user space (which writes that instruction
 into guest phys memory). Other PPC subarchs will use different instructions.
 Just model it as a read-only ONE_REG.

Ok.

Thanks
-Bharat


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

RE: [PATCH 6/6] KVM: booke/bookehv: Add debug stub support

2012-10-04 Thread Bhushan Bharat-R65777



 -Original Message-
 From: Alexander Graf [mailto:ag...@suse.de]
 Sent: Monday, September 24, 2012 9:50 PM
 To: Bhushan Bharat-R65777
 Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; Bhushan Bharat-R65777
 Subject: Re: [PATCH 6/6] KVM: booke/bookehv: Add debug stub support
 
 
 On 21.08.2012, at 15:52, Bharat Bhushan wrote:
 
  This patch adds the debug stub support on booke/bookehv.
  Now QEMU debug stub can use hw breakpoint, watchpoint and
  software breakpoint to debug guest.
 
  Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
  ---
  arch/powerpc/include/asm/kvm.h|   29 ++-
  arch/powerpc/include/asm/kvm_host.h   |5 +
  arch/powerpc/kernel/asm-offsets.c |   26 ++
  arch/powerpc/kvm/booke.c  |  144 
  +
  arch/powerpc/kvm/booke_interrupts.S   |  110 +
  arch/powerpc/kvm/bookehv_interrupts.S |  141 
  +++-
  arch/powerpc/kvm/e500mc.c |3 +-
  7 files changed, 435 insertions(+), 23 deletions(-)
 
  diff --git a/arch/powerpc/include/asm/kvm.h b/arch/powerpc/include/asm/kvm.h
  index 61b197e..53479ea 100644
  --- a/arch/powerpc/include/asm/kvm.h
  +++ b/arch/powerpc/include/asm/kvm.h
  @@ -25,6 +25,7 @@
  /* Select powerpc specific features in linux/kvm.h */
  #define __KVM_HAVE_SPAPR_TCE
  #define __KVM_HAVE_PPC_SMT
  +#define __KVM_HAVE_GUEST_DEBUG
 
  struct kvm_regs {
  __u64 pc;
  @@ -264,7 +265,31 @@ struct kvm_fpu {
  __u64 fpr[32];
  };
 
  +
  +/*
  + * Defines for h/w breakpoint, watchpoint (read, write or both) and
  + * software breakpoint.
  + * These are used as type in KVM_SET_GUEST_DEBUG ioctl and status
  + * for KVM_DEBUG_EXIT.
  + */
  +#define KVMPPC_DEBUG_NONE  0x0
  +#define KVMPPC_DEBUG_BREAKPOINT(1UL  1)
  +#define KVMPPC_DEBUG_WATCH_WRITE   (1UL  2)
  +#define KVMPPC_DEBUG_WATCH_READ(1UL  3)
  struct kvm_debug_exit_arch {
  +   __u64 pc;
  +   /*
  +* exception - returns the exception number. If the KVM_DEBUG_EXIT
  +* exit is not handled (say not h/w breakpoint or software breakpoint
  +* set for this address) by qemu then it is supposed to inject this
  +* exception to guest.
  +*/
  +   __u32 exception;
  +   /*
  +* exiting to userspace because of h/w breakpoint, watchpoint
  +* (read, write or both) and software breakpoint.
  +*/
  +   __u32 status;
  };
 
  /* for KVM_SET_GUEST_DEBUG */
  @@ -276,10 +301,6 @@ struct kvm_guest_debug_arch {
   * Type denotes h/w breakpoint, read watchpoint, write
   * watchpoint or watchpoint (both read and write).
   */
  -#define KVMPPC_DEBUG_NOTYPE0x0
  -#define KVMPPC_DEBUG_BREAKPOINT(1UL  1)
  -#define KVMPPC_DEBUG_WATCH_WRITE   (1UL  2)
  -#define KVMPPC_DEBUG_WATCH_READ(1UL  3)
  __u32 type;
  __u32 pad1;
  __u64 pad2;
  diff --git a/arch/powerpc/include/asm/kvm_host.h
 b/arch/powerpc/include/asm/kvm_host.h
  index c7219c1..3ba465a 100644
  --- a/arch/powerpc/include/asm/kvm_host.h
  +++ b/arch/powerpc/include/asm/kvm_host.h
  @@ -496,7 +496,12 @@ struct kvm_vcpu_arch {
  u32 mmucfg;
  u32 epr;
  u32 crit_save;
  +   /* guest debug registers*/
  struct kvmppc_booke_debug_reg dbg_reg;
  +   /* shadow debug registers */
  +   struct kvmppc_booke_debug_reg shadow_dbg_reg;
  +   /* host debug registers*/
  +   struct kvmppc_booke_debug_reg host_dbg_reg;
  #endif
  gpa_t paddr_accessed;
  gva_t vaddr_accessed;
  diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-
 offsets.c
  index 555448e..6987821 100644
  --- a/arch/powerpc/kernel/asm-offsets.c
  +++ b/arch/powerpc/kernel/asm-offsets.c
  @@ -564,6 +564,32 @@ int main(void)
  DEFINE(VCPU_FAULT_DEAR, offsetof(struct kvm_vcpu, arch.fault_dear));
  DEFINE(VCPU_FAULT_ESR, offsetof(struct kvm_vcpu, arch.fault_esr));
  DEFINE(VCPU_CRIT_SAVE, offsetof(struct kvm_vcpu, arch.crit_save));
  +   DEFINE(VCPU_DBSR, offsetof(struct kvm_vcpu, arch.dbsr));
  +   DEFINE(VCPU_SHADOW_DBG, offsetof(struct kvm_vcpu, arch.shadow_dbg_reg));
  +   DEFINE(VCPU_HOST_DBG, offsetof(struct kvm_vcpu, arch.host_dbg_reg));
  +   DEFINE(KVMPPC_DBG_DBCR0, offsetof(struct kvmppc_booke_debug_reg,
  + dbcr0));
  +   DEFINE(KVMPPC_DBG_DBCR1, offsetof(struct kvmppc_booke_debug_reg,
  + dbcr1));
  +   DEFINE(KVMPPC_DBG_DBCR2, offsetof(struct kvmppc_booke_debug_reg,
  + dbcr2));
  +#ifdef CONFIG_KVM_E500MC
  +   DEFINE(KVMPPC_DBG_DBCR4, offsetof(struct kvmppc_booke_debug_reg,
  + dbcr4));
  +#endif
  +   DEFINE(KVMPPC_DBG_IAC1, offsetof(struct kvmppc_booke_debug_reg,
  +iac[0]));
  +   DEFINE(KVMPPC_DBG_IAC2, offsetof(struct kvmppc_booke_debug_reg,
  +

Re: [PATCH 6/6] KVM: booke/bookehv: Add debug stub support

2012-10-04 Thread Alexander Graf


On 04.10.2012, at 13:06, Bhushan Bharat-R65777 wrote:

 
 
 -Original Message-
 From: Alexander Graf [mailto:ag...@suse.de]
 Sent: Monday, September 24, 2012 9:50 PM
 To: Bhushan Bharat-R65777
 Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; Bhushan Bharat-R65777
 Subject: Re: [PATCH 6/6] KVM: booke/bookehv: Add debug stub support
 
 
 On 21.08.2012, at 15:52, Bharat Bhushan wrote:
 
 This patch adds the debug stub support on booke/bookehv.
 Now QEMU debug stub can use hw breakpoint, watchpoint and
 software breakpoint to debug guest.
 
 Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
 ---
 arch/powerpc/include/asm/kvm.h|   29 ++-
 arch/powerpc/include/asm/kvm_host.h   |5 +
 arch/powerpc/kernel/asm-offsets.c |   26 ++
 arch/powerpc/kvm/booke.c  |  144 
 +
 arch/powerpc/kvm/booke_interrupts.S   |  110 +
 arch/powerpc/kvm/bookehv_interrupts.S |  141 
 +++-
 arch/powerpc/kvm/e500mc.c |3 +-
 7 files changed, 435 insertions(+), 23 deletions(-)
 
 diff --git a/arch/powerpc/include/asm/kvm.h b/arch/powerpc/include/asm/kvm.h
 index 61b197e..53479ea 100644
 --- a/arch/powerpc/include/asm/kvm.h
 +++ b/arch/powerpc/include/asm/kvm.h
 @@ -25,6 +25,7 @@
 /* Select powerpc specific features in linux/kvm.h */
 #define __KVM_HAVE_SPAPR_TCE
 #define __KVM_HAVE_PPC_SMT
 +#define __KVM_HAVE_GUEST_DEBUG
 
 struct kvm_regs {
 __u64 pc;
 @@ -264,7 +265,31 @@ struct kvm_fpu {
 __u64 fpr[32];
 };
 
 +
 +/*
 + * Defines for h/w breakpoint, watchpoint (read, write or both) and
 + * software breakpoint.
 + * These are used as type in KVM_SET_GUEST_DEBUG ioctl and status
 + * for KVM_DEBUG_EXIT.
 + */
 +#define KVMPPC_DEBUG_NONE  0x0
 +#define KVMPPC_DEBUG_BREAKPOINT(1UL  1)
 +#define KVMPPC_DEBUG_WATCH_WRITE   (1UL  2)
 +#define KVMPPC_DEBUG_WATCH_READ(1UL  3)
 struct kvm_debug_exit_arch {
 +   __u64 pc;
 +   /*
 +* exception - returns the exception number. If the KVM_DEBUG_EXIT
 +* exit is not handled (say not h/w breakpoint or software breakpoint
 +* set for this address) by qemu then it is supposed to inject this
 +* exception to guest.
 +*/
 +   __u32 exception;
 +   /*
 +* exiting to userspace because of h/w breakpoint, watchpoint
 +* (read, write or both) and software breakpoint.
 +*/
 +   __u32 status;
 };
 
 /* for KVM_SET_GUEST_DEBUG */
 @@ -276,10 +301,6 @@ struct kvm_guest_debug_arch {
  * Type denotes h/w breakpoint, read watchpoint, write
  * watchpoint or watchpoint (both read and write).
  */
 -#define KVMPPC_DEBUG_NOTYPE0x0
 -#define KVMPPC_DEBUG_BREAKPOINT(1UL  1)
 -#define KVMPPC_DEBUG_WATCH_WRITE   (1UL  2)
 -#define KVMPPC_DEBUG_WATCH_READ(1UL  3)
 __u32 type;
 __u32 pad1;
 __u64 pad2;
 diff --git a/arch/powerpc/include/asm/kvm_host.h
 b/arch/powerpc/include/asm/kvm_host.h
 index c7219c1..3ba465a 100644
 --- a/arch/powerpc/include/asm/kvm_host.h
 +++ b/arch/powerpc/include/asm/kvm_host.h
 @@ -496,7 +496,12 @@ struct kvm_vcpu_arch {
 u32 mmucfg;
 u32 epr;
 u32 crit_save;
 +   /* guest debug registers*/
 struct kvmppc_booke_debug_reg dbg_reg;
 +   /* shadow debug registers */
 +   struct kvmppc_booke_debug_reg shadow_dbg_reg;
 +   /* host debug registers*/
 +   struct kvmppc_booke_debug_reg host_dbg_reg;
 #endif
 gpa_t paddr_accessed;
 gva_t vaddr_accessed;
 diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-
 offsets.c
 index 555448e..6987821 100644
 --- a/arch/powerpc/kernel/asm-offsets.c
 +++ b/arch/powerpc/kernel/asm-offsets.c
 @@ -564,6 +564,32 @@ int main(void)
 DEFINE(VCPU_FAULT_DEAR, offsetof(struct kvm_vcpu, arch.fault_dear));
 DEFINE(VCPU_FAULT_ESR, offsetof(struct kvm_vcpu, arch.fault_esr));
 DEFINE(VCPU_CRIT_SAVE, offsetof(struct kvm_vcpu, arch.crit_save));
 +   DEFINE(VCPU_DBSR, offsetof(struct kvm_vcpu, arch.dbsr));
 +   DEFINE(VCPU_SHADOW_DBG, offsetof(struct kvm_vcpu, arch.shadow_dbg_reg));
 +   DEFINE(VCPU_HOST_DBG, offsetof(struct kvm_vcpu, arch.host_dbg_reg));
 +   DEFINE(KVMPPC_DBG_DBCR0, offsetof(struct kvmppc_booke_debug_reg,
 + dbcr0));
 +   DEFINE(KVMPPC_DBG_DBCR1, offsetof(struct kvmppc_booke_debug_reg,
 + dbcr1));
 +   DEFINE(KVMPPC_DBG_DBCR2, offsetof(struct kvmppc_booke_debug_reg,
 + dbcr2));
 +#ifdef CONFIG_KVM_E500MC
 +   DEFINE(KVMPPC_DBG_DBCR4, offsetof(struct kvmppc_booke_debug_reg,
 + dbcr4));
 +#endif
 +   DEFINE(KVMPPC_DBG_IAC1, offsetof(struct kvmppc_booke_debug_reg,
 +iac[0]));
 +   DEFINE(KVMPPC_DBG_IAC2, offsetof(struct kvmppc_booke_debug_reg,
 +iac[1]));
 +

[PULL 00/56] ppc patch queue 2012-10-04

2012-10-04 Thread Alexander Graf

Hi Avi / Marcelo,

This is my current patch queue for ppc.  Please pull.

Changes include:

  * add support for idle hcall on booke
  * icache clear on map
  * mmu notifier support for e500 and book3s_pr
  * revive the 440 support slightly (still not 100% happy)
  * unify booke and book3s_pr entry/exit code a bit
  * add watchdog emulation for booke
  * reset and migratbility fixes for book3s_64_hv
  * rework book3s_64_hv memslot locking
  * small bug fixes

Alex


The following changes since commit 1e08ec4a130e2745d96df169e67c58df98a07311:
  Gleb Natapov (1):
KVM: optimize apic interrupt delivery

are available in the git repository at:

  git://github.com/agraf/linux-2.6.git for-upstream

Alexander Graf (28):
  KVM: PPC: PR: Use generic tracepoint for guest exit
  KVM: PPC: Expose SYNC cap based on mmu notifiers
  KVM: PPC: BookE: Expose remote TLB flushes in debugfs
  KVM: PPC: E500: Fix clear_tlb_refs
  KVM: PPC: BookE: Add check_requests helper function
  KVM: PPC: BookE: Add support for vcpu-mode
  KVM: PPC: E500: Implement MMU notifiers
  KVM: PPC: BookE: Add some more trace points
  KVM: PPC: BookE: No duplicate request != 0 check
  KVM: PPC: Use same kvmppc_prepare_to_enter code for booke and book3s_pr
  KVM: PPC: Book3s: PR: Add (dumb) MMU Notifier support
  KVM: PPC: BookE: Drop redundant vcpu-mode set
  KVM: PPC: Book3S: PR: Only do resched check once per exit
  KVM: PPC: Exit guest context while handling exit
  KVM: PPC: Book3S: PR: Indicate we're out of guest mode
  KVM: PPC: Consistentify vcpu exit path
  KVM: PPC: Book3S: PR: Rework irq disabling
  KVM: PPC: Move kvm_guest_enter call into generic code
  KVM: PPC: Ignore EXITING_GUEST_MODE mode
  KVM: PPC: Add return value in prepare_to_enter
  KVM: PPC: Add return value to core_check_requests
  KVM: PPC: 44x: Initialize PVR
  KVM: PPC: BookE: Add MCSR SPR support
  KVM: PPC: Use symbols for exit trace
  KVM: PPC: E500: Remove E500_TLB_DIRTY flag
  KVM: PPC: 440: Implement mtdcrx
  KVM: PPC: 440: Implement mfdcrx
  KVM: PPC: BookE: Support FPU on non-hv systems

Bharat Bhushan (3):
  KVM: PPC: booke: Add watchdog emulation
  booke: Added ONE_REG interface for IAC/DAC debug registers
  Document IACx/DACx registers access using ONE_REG API

Julia Lawall (1):
  arch/powerpc/kvm/e500_tlb.c: fix error return code

Liu Yu-B13201 (3):
  KVM: PPC: Add support for ePAPR idle hcall in host kernel
  KVM: PPC: ev_idle hcall support for e500 guests
  PPC: Don't use hardcoded opcode for ePAPR hcall invocation

Mihai Caraman (1):
  KVM: PPC: bookehv: Allow duplicate calls of DO_KVM macro

Paul Mackerras (11):
  KVM: PPC: Quieten message about allocating linear regions
  KVM: PPC: Book3S HV: Take the SRCU read lock before looking up memslots
  KVM: PPC: Move kvm-arch.slot_phys into memslot.arch
  KVM: PPC: Book3S HV: Handle memory slot deletion and modification 
correctly
  KVM: Move some PPC ioctl definitions to the correct place
  KVM: PPC: Book3S HV: Fix updates of vcpu-cpu
  KVM: PPC: Book3S HV: Remove bogus update of physical thread IDs
  KVM: PPC: Book3S HV: Fix calculation of guest phys address for MMIO 
emulation
  KVM: PPC: Book3S: Get/set guest SPRs using the GET/SET_ONE_REG interface
  KVM: PPC: Book3S: Get/set guest FP regs using the GET/SET_ONE_REG 
interface
  KVM: PPC: Book3S HV: Provide a way for userspace to get/set per-vCPU areas

Scott Wood (5):
  powerpc/fsl-soc: use CONFIG_EPAPR_PARAVIRT for hcalls
  powerpc/epapr: export epapr_hypercall_start
  KVM: PPC: e500: fix allocation size error on g2h_tlb1_map
  KVM: PPC: e500: MMU API: fix leak of shared_tlb_pages
  KVM: PPC: set IN_GUEST_MODE before checking requests

Stuart Yoder (4):
  PPC: epapr: create define for return code value of success
  KVM: PPC: use definitions in epapr header for hcalls
  KVM: PPC: add pvinfo for hcall opcodes on e500mc/e5500
  PPC: select EPAPR_PARAVIRT for all users of epapr hcalls

 Documentation/virtual/kvm/api.txt   |   49 -
 arch/powerpc/include/asm/Kbuild |1 +
 arch/powerpc/include/asm/epapr_hcalls.h |   36 ++--
 arch/powerpc/include/asm/fsl_hcalls.h   |   36 ++--
 arch/powerpc/include/asm/kvm.h  |   59 +
 arch/powerpc/include/asm/kvm_book3s.h   |2 +-
 arch/powerpc/include/asm/kvm_booke_hv_asm.h |4 +-
 arch/powerpc/include/asm/kvm_host.h |   38 +++-
 arch/powerpc/include/asm/kvm_para.h |   21 +-
 arch/powerpc/include/asm/kvm_ppc.h  |   64 +-
 arch/powerpc/include/asm/reg_booke.h|7 +
 arch/powerpc/kernel/epapr_hcalls.S  |   28 +++
 arch/powerpc/kernel/epapr_paravirt.c|   11 +-
 arch/powerpc/kernel/kvm.c   |2 +-
 arch/powerpc/kernel/ppc_ksyms.c |5 +
 arch/powerpc/kvm/44x.c

[PATCH 01/56] PPC: epapr: create define for return code value of success

2012-10-04 Thread Alexander Graf

From: Stuart Yoder stuart.yo...@freescale.com

Signed-off-by: Stuart Yoder stuart.yo...@freescale.com
Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/include/asm/epapr_hcalls.h |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/include/asm/epapr_hcalls.h 
b/arch/powerpc/include/asm/epapr_hcalls.h
index bf2c06c..c0c7adc 100644
--- a/arch/powerpc/include/asm/epapr_hcalls.h
+++ b/arch/powerpc/include/asm/epapr_hcalls.h
@@ -88,7 +88,8 @@
 #define _EV_HCALL_TOKEN(id, num) (((id)  16) | (num))
 #define EV_HCALL_TOKEN(hcall_num) _EV_HCALL_TOKEN(EV_EPAPR_VENDOR_ID, 
hcall_num)
 
-/* epapr error codes */
+/* epapr return codes */
+#define EV_SUCCESS 0
 #define EV_EPERM   1   /* Operation not permitted */
 #define EV_ENOENT  2   /*  Entry Not Found */
 #define EV_EIO 3   /* I/O error occured */
-- 
1.6.0.2

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 03/56] KVM: PPC: add pvinfo for hcall opcodes on e500mc/e5500

2012-10-04 Thread Alexander Graf

From: Stuart Yoder stuart.yo...@freescale.com

Signed-off-by: Liu Yu yu@freescale.com
[stuart: factored this out from idle hcall support in host patch]
Signed-off-by: Stuart Yoder stuart.yo...@freescale.com
Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/powerpc.c |   10 +-
 1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 0368a93..a478e66 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -751,9 +751,16 @@ int kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct 
vm_fault *vmf)
 
 static int kvm_vm_ioctl_get_pvinfo(struct kvm_ppc_pvinfo *pvinfo)
 {
+   u32 inst_nop = 0x6000;
+#ifdef CONFIG_KVM_BOOKE_HV
+   u32 inst_sc1 = 0x4422;
+   pvinfo-hcall[0] = inst_sc1;
+   pvinfo-hcall[1] = inst_nop;
+   pvinfo-hcall[2] = inst_nop;
+   pvinfo-hcall[3] = inst_nop;
+#else
u32 inst_lis = 0x3c00;
u32 inst_ori = 0x6000;
-   u32 inst_nop = 0x6000;
u32 inst_sc = 0x4402;
u32 inst_imm_mask = 0x;
 
@@ -770,6 +777,7 @@ static int kvm_vm_ioctl_get_pvinfo(struct kvm_ppc_pvinfo 
*pvinfo)
pvinfo-hcall[1] = inst_ori | (KVM_SC_MAGIC_R0  inst_imm_mask);
pvinfo-hcall[2] = inst_sc;
pvinfo-hcall[3] = inst_nop;
+#endif
 
return 0;
 }
-- 
1.6.0.2

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 02/56] KVM: PPC: use definitions in epapr header for hcalls

2012-10-04 Thread Alexander Graf

From: Stuart Yoder stuart.yo...@freescale.com

Signed-off-by: Stuart Yoder stuart.yo...@freescale.com
Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/include/asm/kvm_para.h |   21 +++--
 arch/powerpc/kernel/kvm.c   |2 +-
 arch/powerpc/kvm/powerpc.c  |   10 +-
 3 files changed, 17 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_para.h 
b/arch/powerpc/include/asm/kvm_para.h
index c18916b..a168ce3 100644
--- a/arch/powerpc/include/asm/kvm_para.h
+++ b/arch/powerpc/include/asm/kvm_para.h
@@ -75,9 +75,10 @@ struct kvm_vcpu_arch_shared {
 };
 
 #define KVM_SC_MAGIC_R00x4b564d21 /* KVM! */
-#define HC_VENDOR_KVM  (42  16)
-#define HC_EV_SUCCESS  0
-#define HC_EV_UNIMPLEMENTED12
+
+#define KVM_HCALL_TOKEN(num) _EV_HCALL_TOKEN(EV_KVM_VENDOR_ID, num)
+
+#include asm/epapr_hcalls.h
 
 #define KVM_FEATURE_MAGIC_PAGE 1
 
@@ -121,7 +122,7 @@ static unsigned long kvm_hypercall(unsigned long *in,
   unsigned long *out,
   unsigned long nr)
 {
-   return HC_EV_UNIMPLEMENTED;
+   return EV_UNIMPLEMENTED;
 }
 
 #endif
@@ -132,7 +133,7 @@ static inline long kvm_hypercall0_1(unsigned int nr, 
unsigned long *r2)
unsigned long out[8];
unsigned long r;
 
-   r = kvm_hypercall(in, out, nr | HC_VENDOR_KVM);
+   r = kvm_hypercall(in, out, KVM_HCALL_TOKEN(nr));
*r2 = out[0];
 
return r;
@@ -143,7 +144,7 @@ static inline long kvm_hypercall0(unsigned int nr)
unsigned long in[8];
unsigned long out[8];
 
-   return kvm_hypercall(in, out, nr | HC_VENDOR_KVM);
+   return kvm_hypercall(in, out, KVM_HCALL_TOKEN(nr));
 }
 
 static inline long kvm_hypercall1(unsigned int nr, unsigned long p1)
@@ -152,7 +153,7 @@ static inline long kvm_hypercall1(unsigned int nr, unsigned 
long p1)
unsigned long out[8];
 
in[0] = p1;
-   return kvm_hypercall(in, out, nr | HC_VENDOR_KVM);
+   return kvm_hypercall(in, out, KVM_HCALL_TOKEN(nr));
 }
 
 static inline long kvm_hypercall2(unsigned int nr, unsigned long p1,
@@ -163,7 +164,7 @@ static inline long kvm_hypercall2(unsigned int nr, unsigned 
long p1,
 
in[0] = p1;
in[1] = p2;
-   return kvm_hypercall(in, out, nr | HC_VENDOR_KVM);
+   return kvm_hypercall(in, out, KVM_HCALL_TOKEN(nr));
 }
 
 static inline long kvm_hypercall3(unsigned int nr, unsigned long p1,
@@ -175,7 +176,7 @@ static inline long kvm_hypercall3(unsigned int nr, unsigned 
long p1,
in[0] = p1;
in[1] = p2;
in[2] = p3;
-   return kvm_hypercall(in, out, nr | HC_VENDOR_KVM);
+   return kvm_hypercall(in, out, KVM_HCALL_TOKEN(nr));
 }
 
 static inline long kvm_hypercall4(unsigned int nr, unsigned long p1,
@@ -189,7 +190,7 @@ static inline long kvm_hypercall4(unsigned int nr, unsigned 
long p1,
in[1] = p2;
in[2] = p3;
in[3] = p4;
-   return kvm_hypercall(in, out, nr | HC_VENDOR_KVM);
+   return kvm_hypercall(in, out, KVM_HCALL_TOKEN(nr));
 }
 
 
diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c
index 867db1d..a61b133 100644
--- a/arch/powerpc/kernel/kvm.c
+++ b/arch/powerpc/kernel/kvm.c
@@ -419,7 +419,7 @@ static void kvm_map_magic_page(void *data)
in[0] = KVM_MAGIC_PAGE;
in[1] = KVM_MAGIC_PAGE;
 
-   kvm_hypercall(in, out, HC_VENDOR_KVM | KVM_HC_PPC_MAP_MAGIC_PAGE);
+   kvm_hypercall(in, out, KVM_HCALL_TOKEN(KVM_HC_PPC_MAP_MAGIC_PAGE));
 
*features = out[0];
 }
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 4d213b8..0368a93 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -67,18 +67,18 @@ int kvmppc_kvm_pv(struct kvm_vcpu *vcpu)
}
 
switch (nr) {
-   case HC_VENDOR_KVM | KVM_HC_PPC_MAP_MAGIC_PAGE:
+   case KVM_HCALL_TOKEN(KVM_HC_PPC_MAP_MAGIC_PAGE):
{
vcpu-arch.magic_page_pa = param1;
vcpu-arch.magic_page_ea = param2;
 
r2 = KVM_MAGIC_FEAT_SR | KVM_MAGIC_FEAT_MAS0_TO_SPRG7;
 
-   r = HC_EV_SUCCESS;
+   r = EV_SUCCESS;
break;
}
-   case HC_VENDOR_KVM | KVM_HC_FEATURES:
-   r = HC_EV_SUCCESS;
+   case KVM_HCALL_TOKEN(KVM_HC_FEATURES):
+   r = EV_SUCCESS;
 #if defined(CONFIG_PPC_BOOK3S) || defined(CONFIG_KVM_E500V2)
/* XXX Missing magic page on 44x */
r2 |= (1  KVM_FEATURE_MAGIC_PAGE);
@@ -87,7 +87,7 @@ int kvmppc_kvm_pv(struct kvm_vcpu *vcpu)
/* Second return value is in r4 */
break;
default:
-   r = HC_EV_UNIMPLEMENTED;
+   r = EV_UNIMPLEMENTED;
break;
}
 
-- 
1.6.0.2

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo

[PATCH 06/56] PPC: select EPAPR_PARAVIRT for all users of epapr hcalls

2012-10-04 Thread Alexander Graf

From: Stuart Yoder stuart.yo...@freescale.com

Signed-off-by: Stuart Yoder stuart.yo...@freescale.com
Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/platforms/Kconfig |1 +
 drivers/tty/Kconfig|1 +
 drivers/virt/Kconfig   |1 +
 3 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/platforms/Kconfig b/arch/powerpc/platforms/Kconfig
index e7a896a..48a920d 100644
--- a/arch/powerpc/platforms/Kconfig
+++ b/arch/powerpc/platforms/Kconfig
@@ -90,6 +90,7 @@ config MPIC
 config PPC_EPAPR_HV_PIC
bool
default n
+   select EPAPR_PARAVIRT
 
 config MPIC_WEIRD
bool
diff --git a/drivers/tty/Kconfig b/drivers/tty/Kconfig
index 830cd62..aa99cd2 100644
--- a/drivers/tty/Kconfig
+++ b/drivers/tty/Kconfig
@@ -358,6 +358,7 @@ config TRACE_SINK
 config PPC_EPAPR_HV_BYTECHAN
tristate ePAPR hypervisor byte channel driver
depends on PPC
+   select EPAPR_PARAVIRT
help
  This driver creates /dev entries for each ePAPR hypervisor byte
  channel, thereby allowing applications to communicate with byte
diff --git a/drivers/virt/Kconfig b/drivers/virt/Kconfig
index 2dcdbc9..99ebdde 100644
--- a/drivers/virt/Kconfig
+++ b/drivers/virt/Kconfig
@@ -15,6 +15,7 @@ if VIRT_DRIVERS
 config FSL_HV_MANAGER
tristate Freescale hypervisor management driver
depends on FSL_SOC
+   select EPAPR_PARAVIRT
help
   The Freescale hypervisor management driver provides several services
  to drivers and applications related to the Freescale hypervisor:
-- 
1.6.0.2

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 04/56] KVM: PPC: Add support for ePAPR idle hcall in host kernel

2012-10-04 Thread Alexander Graf

From: Liu Yu-B13201 yu@freescale.com

And add a new flag definition in kvm_ppc_pvinfo to indicate
whether the host supports the EV_IDLE hcall.

Signed-off-by: Liu Yu yu@freescale.com
[stuart.yo...@freescale.com: cleanup,fixes for conditions allowing idle]
Signed-off-by: Stuart Yoder stuart.yo...@freescale.com
[agraf: fix typo]
Signed-off-by: Alexander Graf ag...@suse.de
---
 Documentation/virtual/kvm/api.txt |7 +--
 arch/powerpc/include/asm/Kbuild   |1 +
 arch/powerpc/kvm/powerpc.c|   10 --
 include/linux/kvm.h   |2 ++
 4 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index 36befa7..11b5d31 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1194,12 +1194,15 @@ struct kvm_ppc_pvinfo {
 This ioctl fetches PV specific information that need to be passed to the guest
 using the device tree or other means from vm context.
 
-For now the only implemented piece of information distributed here is an array
-of 4 instructions that make up a hypercall.
+The hcall array defines 4 instructions that make up a hypercall.
 
 If any additional field gets added to this structure later on, a bit for that
 additional piece of information will be set in the flags bitmap.
 
+The flags bitmap is defined as:
+
+   /* the host supports the ePAPR idle hcall
+   #define KVM_PPC_PVINFO_FLAGS_EV_IDLE   (10)
 
 4.48 KVM_ASSIGN_PCI_DEVICE
 
diff --git a/arch/powerpc/include/asm/Kbuild b/arch/powerpc/include/asm/Kbuild
index 7e313f1..13d6b7b 100644
--- a/arch/powerpc/include/asm/Kbuild
+++ b/arch/powerpc/include/asm/Kbuild
@@ -34,5 +34,6 @@ header-y += termios.h
 header-y += types.h
 header-y += ucontext.h
 header-y += unistd.h
+header-y += epapr_hcalls.h
 
 generic-y += rwsem.h
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index a478e66..dbf56e1 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -38,8 +38,7 @@
 
 int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
 {
-   return !(v-arch.shared-msr  MSR_WE) ||
-  !!(v-arch.pending_exceptions) ||
+   return !!(v-arch.pending_exceptions) ||
   v-requests;
 }
 
@@ -86,6 +85,11 @@ int kvmppc_kvm_pv(struct kvm_vcpu *vcpu)
 
/* Second return value is in r4 */
break;
+   case EV_HCALL_TOKEN(EV_IDLE):
+   r = EV_SUCCESS;
+   kvm_vcpu_block(vcpu);
+   clear_bit(KVM_REQ_UNHALT, vcpu-requests);
+   break;
default:
r = EV_UNIMPLEMENTED;
break;
@@ -779,6 +783,8 @@ static int kvm_vm_ioctl_get_pvinfo(struct kvm_ppc_pvinfo 
*pvinfo)
pvinfo-hcall[3] = inst_nop;
 #endif
 
+   pvinfo-flags = KVM_PPC_PVINFO_FLAGS_EV_IDLE;
+
return 0;
 }
 
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index d808694..6be840a 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -477,6 +477,8 @@ struct kvm_ppc_smmu_info {
struct kvm_ppc_one_seg_page_size sps[KVM_PPC_PAGE_SIZES_MAX_SZ];
 };
 
+#define KVM_PPC_PVINFO_FLAGS_EV_IDLE   (10)
+
 #define KVMIO 0xAE
 
 /* machine type bits, to be used as argument to KVM_CREATE_VM */
-- 
1.6.0.2

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 07/56] powerpc/fsl-soc: use CONFIG_EPAPR_PARAVIRT for hcalls

2012-10-04 Thread Alexander Graf

From: Scott Wood scottw...@freescale.com

Signed-off-by: Scott Wood scottw...@freescale.com
Signed-off-by: Stuart Yoder stuart.yo...@freescale.com
Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/sysdev/fsl_msi.c |9 +++--
 arch/powerpc/sysdev/fsl_soc.c |2 ++
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/sysdev/fsl_msi.c b/arch/powerpc/sysdev/fsl_msi.c
index 6e097de..7e2b2f2 100644
--- a/arch/powerpc/sysdev/fsl_msi.c
+++ b/arch/powerpc/sysdev/fsl_msi.c
@@ -236,7 +236,6 @@ static void fsl_msi_cascade(unsigned int irq, struct 
irq_desc *desc)
u32 intr_index;
u32 have_shift = 0;
struct fsl_msi_cascade_data *cascade_data;
-   unsigned int ret;
 
cascade_data = irq_get_handler_data(irq);
msi_data = cascade_data-msi_data;
@@ -268,7 +267,9 @@ static void fsl_msi_cascade(unsigned int irq, struct 
irq_desc *desc)
case FSL_PIC_IP_IPIC:
msir_value = fsl_msi_read(msi_data-msi_regs, msir_index * 0x4);
break;
-   case FSL_PIC_IP_VMPIC:
+#ifdef CONFIG_EPAPR_PARAVIRT
+   case FSL_PIC_IP_VMPIC: {
+   unsigned int ret;
ret = fh_vmpic_get_msir(virq_to_hw(irq), msir_value);
if (ret) {
pr_err(fsl-msi: fh_vmpic_get_msir() failed for 
@@ -277,6 +278,8 @@ static void fsl_msi_cascade(unsigned int irq, struct 
irq_desc *desc)
}
break;
}
+#endif
+   }
 
while (msir_value) {
intr_index = ffs(msir_value) - 1;
@@ -508,10 +511,12 @@ static const struct of_device_id fsl_of_msi_ids[] = {
.compatible = fsl,ipic-msi,
.data = (void *)ipic_msi_feature,
},
+#ifdef CONFIG_EPAPR_PARAVIRT
{
.compatible = fsl,vmpic-msi,
.data = (void *)vmpic_msi_feature,
},
+#endif
{}
 };
 
diff --git a/arch/powerpc/sysdev/fsl_soc.c b/arch/powerpc/sysdev/fsl_soc.c
index c449dbd..97118dc 100644
--- a/arch/powerpc/sysdev/fsl_soc.c
+++ b/arch/powerpc/sysdev/fsl_soc.c
@@ -253,6 +253,7 @@ struct platform_diu_data_ops diu_ops;
 EXPORT_SYMBOL(diu_ops);
 #endif
 
+#ifdef CONFIG_EPAPR_PARAVIRT
 /*
  * Restart the current partition
  *
@@ -278,3 +279,4 @@ void fsl_hv_halt(void)
pr_info(hv exit\n);
fh_partition_stop(-1);
 }
+#endif
-- 
1.6.0.2

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 08/56] PPC: Don't use hardcoded opcode for ePAPR hcall invocation

2012-10-04 Thread Alexander Graf

From: Liu Yu-B13201 yu@freescale.com

Signed-off-by: Liu Yu yu@freescale.com
Signed-off-by: Stuart Yoder stuart.yo...@freescale.com
Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/include/asm/epapr_hcalls.h |   22 +-
 arch/powerpc/include/asm/fsl_hcalls.h   |   36 +++---
 2 files changed, 29 insertions(+), 29 deletions(-)

diff --git a/arch/powerpc/include/asm/epapr_hcalls.h 
b/arch/powerpc/include/asm/epapr_hcalls.h
index 833ce2c..b8d9445 100644
--- a/arch/powerpc/include/asm/epapr_hcalls.h
+++ b/arch/powerpc/include/asm/epapr_hcalls.h
@@ -195,7 +195,7 @@ static inline unsigned int ev_int_set_config(unsigned int 
interrupt,
r5  = priority;
r6  = destination;
 
-   __asm__ __volatile__ (sc 1
+   asm volatile(blepapr_hypercall_start
: +r (r11), +r (r3), +r (r4), +r (r5), +r (r6)
: : EV_HCALL_CLOBBERS4
);
@@ -224,7 +224,7 @@ static inline unsigned int ev_int_get_config(unsigned int 
interrupt,
r11 = EV_HCALL_TOKEN(EV_INT_GET_CONFIG);
r3 = interrupt;
 
-   __asm__ __volatile__ (sc 1
+   asm volatile(blepapr_hypercall_start
: +r (r11), +r (r3), =r (r4), =r (r5), =r (r6)
: : EV_HCALL_CLOBBERS4
);
@@ -254,7 +254,7 @@ static inline unsigned int ev_int_set_mask(unsigned int 
interrupt,
r3 = interrupt;
r4 = mask;
 
-   __asm__ __volatile__ (sc 1
+   asm volatile(blepapr_hypercall_start
: +r (r11), +r (r3), +r (r4)
: : EV_HCALL_CLOBBERS2
);
@@ -279,7 +279,7 @@ static inline unsigned int ev_int_get_mask(unsigned int 
interrupt,
r11 = EV_HCALL_TOKEN(EV_INT_GET_MASK);
r3 = interrupt;
 
-   __asm__ __volatile__ (sc 1
+   asm volatile(blepapr_hypercall_start
: +r (r11), +r (r3), =r (r4)
: : EV_HCALL_CLOBBERS2
);
@@ -307,7 +307,7 @@ static inline unsigned int ev_int_eoi(unsigned int 
interrupt)
r11 = EV_HCALL_TOKEN(EV_INT_EOI);
r3 = interrupt;
 
-   __asm__ __volatile__ (sc 1
+   asm volatile(blepapr_hypercall_start
: +r (r11), +r (r3)
: : EV_HCALL_CLOBBERS1
);
@@ -346,7 +346,7 @@ static inline unsigned int ev_byte_channel_send(unsigned 
int handle,
r7 = be32_to_cpu(p[2]);
r8 = be32_to_cpu(p[3]);
 
-   __asm__ __volatile__ (sc 1
+   asm volatile(blepapr_hypercall_start
: +r (r11), +r (r3),
  +r (r4), +r (r5), +r (r6), +r (r7), +r (r8)
: : EV_HCALL_CLOBBERS6
@@ -385,7 +385,7 @@ static inline unsigned int ev_byte_channel_receive(unsigned 
int handle,
r3 = handle;
r4 = *count;
 
-   __asm__ __volatile__ (sc 1
+   asm volatile(blepapr_hypercall_start
: +r (r11), +r (r3), +r (r4),
  =r (r5), =r (r6), =r (r7), =r (r8)
: : EV_HCALL_CLOBBERS6
@@ -423,7 +423,7 @@ static inline unsigned int ev_byte_channel_poll(unsigned 
int handle,
r11 = EV_HCALL_TOKEN(EV_BYTE_CHANNEL_POLL);
r3 = handle;
 
-   __asm__ __volatile__ (sc 1
+   asm volatile(blepapr_hypercall_start
: +r (r11), +r (r3), =r (r4), =r (r5)
: : EV_HCALL_CLOBBERS3
);
@@ -456,7 +456,7 @@ static inline unsigned int ev_int_iack(unsigned int handle,
r11 = EV_HCALL_TOKEN(EV_INT_IACK);
r3 = handle;
 
-   __asm__ __volatile__ (sc 1
+   asm volatile(blepapr_hypercall_start
: +r (r11), +r (r3), =r (r4)
: : EV_HCALL_CLOBBERS2
);
@@ -480,7 +480,7 @@ static inline unsigned int ev_doorbell_send(unsigned int 
handle)
r11 = EV_HCALL_TOKEN(EV_DOORBELL_SEND);
r3 = handle;
 
-   __asm__ __volatile__ (sc 1
+   asm volatile(blepapr_hypercall_start
: +r (r11), +r (r3)
: : EV_HCALL_CLOBBERS1
);
@@ -500,7 +500,7 @@ static inline unsigned int ev_idle(void)
 
r11 = EV_HCALL_TOKEN(EV_IDLE);
 
-   __asm__ __volatile__ (sc 1
+   asm volatile(blepapr_hypercall_start
: +r (r11), =r (r3)
: : EV_HCALL_CLOBBERS1
);
diff --git a/arch/powerpc/include/asm/fsl_hcalls.h 
b/arch/powerpc/include/asm/fsl_hcalls.h
index 922d9b5..3abb583 100644
--- a/arch/powerpc/include/asm/fsl_hcalls.h
+++ b/arch/powerpc/include/asm/fsl_hcalls.h
@@ -96,7 +96,7 @@ static inline unsigned int fh_send_nmi(unsigned int vcpu_mask)
r11 = FH_HCALL_TOKEN(FH_SEND_NMI);
r3 = vcpu_mask;
 
-   __asm__ __volatile__ (sc 1
+   asm volatile(blepapr_hypercall_start
: +r (r11), +r (r3)
: : EV_HCALL_CLOBBERS1
);
@@ -151,7 +151,7 @@ static inline unsigned int fh_partition_get_dtprop(int 
handle,
r9 = (uint32_t)propvalue_addr;

[PATCH 10/56] KVM: PPC: Expose SYNC cap based on mmu notifiers

2012-10-04 Thread Alexander Graf

Semantically, the SYNC cap means that we have mmu notifiers available.
Express this in our #ifdef'ery around the feature, so that we can be sure
we don't miss out on ppc targets when they get their implementation.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/powerpc.c |8 +++-
 1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index dbf56e1..45fe433 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -264,10 +264,16 @@ int kvm_dev_ioctl_check_extension(long ext)
if (cpu_has_feature(CPU_FTR_ARCH_201))
r = 2;
break;
+#endif
case KVM_CAP_SYNC_MMU:
+#ifdef CONFIG_KVM_BOOK3S_64_HV
r = cpu_has_feature(CPU_FTR_ARCH_206) ? 1 : 0;
-   break;
+#elif defined(KVM_ARCH_WANT_MMU_NOTIFIER)
+   r = 1;
+#else
+   r = 0;
 #endif
+   break;
case KVM_CAP_NR_VCPUS:
/*
 * Recommending a number of CPUs is somewhat arbitrary; we
-- 
1.6.0.2

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 05/56] KVM: PPC: ev_idle hcall support for e500 guests

2012-10-04 Thread Alexander Graf

From: Liu Yu-B13201 yu@freescale.com

Signed-off-by: Liu Yu yu@freescale.com
[varun: 64-bit changes]
Signed-off-by: Varun Sethi varun.se...@freescale.com
Signed-off-by: Stuart Yoder stuart.yo...@freescale.com
Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/include/asm/epapr_hcalls.h |   11 ++-
 arch/powerpc/kernel/epapr_hcalls.S  |   28 
 arch/powerpc/kernel/epapr_paravirt.c|   11 ++-
 3 files changed, 44 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/epapr_hcalls.h 
b/arch/powerpc/include/asm/epapr_hcalls.h
index c0c7adc..833ce2c 100644
--- a/arch/powerpc/include/asm/epapr_hcalls.h
+++ b/arch/powerpc/include/asm/epapr_hcalls.h
@@ -50,10 +50,6 @@
 #ifndef _EPAPR_HCALLS_H
 #define _EPAPR_HCALLS_H
 
-#include linux/types.h
-#include linux/errno.h
-#include asm/byteorder.h
-
 #define EV_BYTE_CHANNEL_SEND   1
 #define EV_BYTE_CHANNEL_RECEIVE2
 #define EV_BYTE_CHANNEL_POLL   3
@@ -109,6 +105,11 @@
 #define EV_UNIMPLEMENTED   12  /* Unimplemented hypercall */
 #define EV_BUFFER_OVERFLOW 13  /* Caller-supplied buffer too small */
 
+#ifndef __ASSEMBLY__
+#include linux/types.h
+#include linux/errno.h
+#include asm/byteorder.h
+
 /*
  * Hypercall register clobber list
  *
@@ -506,5 +507,5 @@ static inline unsigned int ev_idle(void)
 
return r3;
 }
-
+#endif /* !__ASSEMBLY__ */
 #endif
diff --git a/arch/powerpc/kernel/epapr_hcalls.S 
b/arch/powerpc/kernel/epapr_hcalls.S
index 697b390..62c0dc2 100644
--- a/arch/powerpc/kernel/epapr_hcalls.S
+++ b/arch/powerpc/kernel/epapr_hcalls.S
@@ -8,13 +8,41 @@
  */
 
 #include linux/threads.h
+#include asm/epapr_hcalls.h
 #include asm/reg.h
 #include asm/page.h
 #include asm/cputable.h
 #include asm/thread_info.h
 #include asm/ppc_asm.h
+#include asm/asm-compat.h
 #include asm/asm-offsets.h
 
+/* epapr_ev_idle() was derived from e500_idle() */
+_GLOBAL(epapr_ev_idle)
+   CURRENT_THREAD_INFO(r3, r1)
+   PPC_LL  r4, TI_LOCAL_FLAGS(r3)  /* set napping bit */
+   ori r4, r4,_TLF_NAPPING /* so when we take an exception */
+   PPC_STL r4, TI_LOCAL_FLAGS(r3)  /* it will return to our caller */
+
+   wrteei  1
+
+idle_loop:
+   LOAD_REG_IMMEDIATE(r11, EV_HCALL_TOKEN(EV_IDLE))
+
+.global epapr_ev_idle_start
+epapr_ev_idle_start:
+   li  r3, -1
+   nop
+   nop
+   nop
+
+   /*
+* Guard against spurious wakeups from a hypervisor --
+* only interrupt will cause us to return to LR due to
+* _TLF_NAPPING.
+*/
+   b   idle_loop
+
 /* Hypercall entry point. Will be patched with device tree instructions. */
 .global epapr_hypercall_start
 epapr_hypercall_start:
diff --git a/arch/powerpc/kernel/epapr_paravirt.c 
b/arch/powerpc/kernel/epapr_paravirt.c
index 028aeae..f3eab85 100644
--- a/arch/powerpc/kernel/epapr_paravirt.c
+++ b/arch/powerpc/kernel/epapr_paravirt.c
@@ -21,6 +21,10 @@
 #include asm/epapr_hcalls.h
 #include asm/cacheflush.h
 #include asm/code-patching.h
+#include asm/machdep.h
+
+extern void epapr_ev_idle(void);
+extern u32 epapr_ev_idle_start[];
 
 bool epapr_paravirt_enabled;
 
@@ -41,8 +45,13 @@ static int __init epapr_paravirt_init(void)
if (len % 4 || len  (4 * 4))
return -ENODEV;
 
-   for (i = 0; i  (len / 4); i++)
+   for (i = 0; i  (len / 4); i++) {
patch_instruction(epapr_hypercall_start + i, insts[i]);
+   patch_instruction(epapr_ev_idle_start + i, insts[i]);
+   }
+
+   if (of_get_property(hyper_node, has-idle, NULL))
+   ppc_md.power_save = epapr_ev_idle;
 
epapr_paravirt_enabled = true;
 
-- 
1.6.0.2

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 11/56] KVM: PPC: BookE: Expose remote TLB flushes in debugfs

2012-10-04 Thread Alexander Graf

We're already counting remote TLB flushes in a variable, but don't export
it to user space yet. Do so, so we know what's going on.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/booke.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 7ce2ed0..1d4ce9a 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -63,6 +63,7 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
{ halt_wakeup, VCPU_STAT(halt_wakeup) },
{ doorbell, VCPU_STAT(dbell_exits) },
{ guest doorbell, VCPU_STAT(gdbell_exits) },
+   { remote_tlb_flush, VM_STAT(remote_tlb_flush) },
{ NULL }
 };
 
-- 
1.6.0.2

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 13/56] KVM: PPC: Quieten message about allocating linear regions

2012-10-04 Thread Alexander Graf

From: Paul Mackerras pau...@samba.org

This is printed once for every RMA or HPT region that get
preallocated.  If one preallocates hundreds of such regions
(in order to run hundreds of KVM guests), that gets rather
painful, so make it a bit quieter.

Signed-off-by: Paul Mackerras pau...@samba.org
Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/book3s_hv_builtin.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c 
b/arch/powerpc/kvm/book3s_hv_builtin.c
index fb4eac2..ec0a9e5 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -157,8 +157,8 @@ static void __init kvm_linear_init_one(ulong size, int 
count, int type)
linear_info = alloc_bootmem(count * sizeof(struct kvmppc_linear_info));
for (i = 0; i  count; ++i) {
linear = alloc_bootmem_align(size, size);
-   pr_info(Allocated KVM %s at %p (%ld MB)\n, typestr, linear,
-   size  20);
+   pr_debug(Allocated KVM %s at %p (%ld MB)\n, typestr, linear,
+size  20);
linear_info[i].base_virt = linear;
linear_info[i].base_pfn = __pa(linear)  PAGE_SHIFT;
linear_info[i].npages = npages;
-- 
1.6.0.2

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 12/56] KVM: PPC: E500: Fix clear_tlb_refs

2012-10-04 Thread Alexander Graf

Our mapping code assumes that TLB0 entries are always mapped. However, after
calling clear_tlb_refs() this is no longer the case.

Map them dynamically if we find an entry unmapped in TLB0.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/e500_tlb.c |8 ++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/e500_tlb.c b/arch/powerpc/kvm/e500_tlb.c
index ff38b66..b56b6e1 100644
--- a/arch/powerpc/kvm/e500_tlb.c
+++ b/arch/powerpc/kvm/e500_tlb.c
@@ -1039,8 +1039,12 @@ void kvmppc_mmu_map(struct kvm_vcpu *vcpu, u64 eaddr, 
gpa_t gpaddr,
sesel = 0; /* unused */
priv = vcpu_e500-gtlb_priv[tlbsel][esel];
 
-   kvmppc_e500_setup_stlbe(vcpu, gtlbe, BOOK3E_PAGESZ_4K,
-   priv-ref, eaddr, stlbe);
+   /* Only triggers after clear_tlb_refs */
+   if (unlikely(!(priv-ref.flags  E500_TLB_VALID)))
+   kvmppc_e500_tlb0_map(vcpu_e500, esel, stlbe);
+   else
+   kvmppc_e500_setup_stlbe(vcpu, gtlbe, BOOK3E_PAGESZ_4K,
+   priv-ref, eaddr, stlbe);
break;
 
case 1: {
-- 
1.6.0.2

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 14/56] powerpc/epapr: export epapr_hypercall_start

2012-10-04 Thread Alexander Graf

From: Scott Wood scottw...@freescale.com

This fixes breakage introduced by the following commit:

  commit 6d2d82627f4f1e96a33664ace494fa363e0495cb
  Author: Liu Yu-B13201 yu@freescale.com
  Date:   Tue Jul 3 05:48:56 2012 +

PPC: Don't use hardcoded opcode for ePAPR hcall invocation

when a driver that uses ePAPR hypercalls is built as a module.

Reported-by: Geert Uytterhoeven ge...@linux-m68k.org
Signed-off-by: Scott Wood scottw...@freescale.com
Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kernel/ppc_ksyms.c |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kernel/ppc_ksyms.c b/arch/powerpc/kernel/ppc_ksyms.c
index 3e40315..e597dde 100644
--- a/arch/powerpc/kernel/ppc_ksyms.c
+++ b/arch/powerpc/kernel/ppc_ksyms.c
@@ -43,6 +43,7 @@
 #include asm/dcr.h
 #include asm/ftrace.h
 #include asm/switch_to.h
+#include asm/epapr_hcalls.h
 
 #ifdef CONFIG_PPC32
 extern void transfer_to_handler(void);
@@ -192,3 +193,7 @@ EXPORT_SYMBOL(__arch_hweight64);
 #ifdef CONFIG_PPC_BOOK3S_64
 EXPORT_SYMBOL_GPL(mmu_psize_defs);
 #endif
+
+#ifdef CONFIG_EPAPR_PARAVIRT
+EXPORT_SYMBOL(epapr_hypercall_start);
+#endif
-- 
1.6.0.2

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 16/56] KVM: PPC: BookE: Add support for vcpu-mode

2012-10-04 Thread Alexander Graf

Generic KVM code might want to know whether we are inside guest context
or outside. It also wants to be able to push us out of guest context.

Add support to the BookE code for the generic vcpu-mode field that describes
the above states.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/booke.c |   11 +++
 1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index bcf87fe..70a86c0 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -501,6 +501,15 @@ static int kvmppc_prepare_to_enter(struct kvm_vcpu *vcpu)
continue;
}
 
+   if (vcpu-mode == EXITING_GUEST_MODE) {
+   r = 1;
+   break;
+   }
+
+   /* Going into guest context! Yay! */
+   vcpu-mode = IN_GUEST_MODE;
+   smp_wmb();
+
break;
}
 
@@ -572,6 +581,8 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct 
kvm_vcpu *vcpu)
kvm_guest_exit();
 
 out:
+   vcpu-mode = OUTSIDE_GUEST_MODE;
+   smp_wmb();
local_irq_enable();
return ret;
 }
-- 
1.6.0.2

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 15/56] KVM: PPC: BookE: Add check_requests helper function

2012-10-04 Thread Alexander Graf

We need a central place to check for pending requests in. Add one that
only does the timer check we already do in a different place.

Later, this central function can be extended by more checks.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/booke.c |   24 +---
 1 files changed, 17 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 1d4ce9a..bcf87fe 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -419,13 +419,6 @@ static void kvmppc_core_check_exceptions(struct kvm_vcpu 
*vcpu)
unsigned long *pending = vcpu-arch.pending_exceptions;
unsigned int priority;
 
-   if (vcpu-requests) {
-   if (kvm_check_request(KVM_REQ_PENDING_TIMER, vcpu)) {
-   smp_mb();
-   update_timer_ints(vcpu);
-   }
-   }
-
priority = __ffs(*pending);
while (priority  BOOKE_IRQPRIO_MAX) {
if (kvmppc_booke_irqprio_deliver(vcpu, priority))
@@ -461,6 +454,14 @@ int kvmppc_core_prepare_to_enter(struct kvm_vcpu *vcpu)
return r;
 }
 
+static void kvmppc_check_requests(struct kvm_vcpu *vcpu)
+{
+   if (vcpu-requests) {
+   if (kvm_check_request(KVM_REQ_PENDING_TIMER, vcpu))
+   update_timer_ints(vcpu);
+   }
+}
+
 /*
  * Common checks before entering the guest world.  Call with interrupts
  * disabled.
@@ -485,6 +486,15 @@ static int kvmppc_prepare_to_enter(struct kvm_vcpu *vcpu)
break;
}
 
+   smp_mb();
+   if (vcpu-requests) {
+   /* Make sure we process requests preemptable */
+   local_irq_enable();
+   kvmppc_check_requests(vcpu);
+   local_irq_disable();
+   continue;
+   }
+
if (kvmppc_core_prepare_to_enter(vcpu)) {
/* interrupts got enabled in between, so we
   are back at square 1 */
-- 
1.6.0.2

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

1 2 >

1 - 100 of 145 matches

Mail list logo