date:20130109

RE: [PATCH v8 2/3] x86, apicv: add virtual interrupt delivery support

2013-01-09 Thread Zhang, Yang Z

Gleb Natapov wrote on 2013-01-08:
 On Tue, Jan 08, 2013 at 11:57:59AM -0200, Marcelo Tosatti wrote:
 On Tue, Jan 08, 2013 at 12:57:42PM +, Zhang, Yang Z wrote:
 Gleb Natapov wrote on 2013-01-08:
 On Mon, Jan 07, 2013 at 07:32:39PM -0200, Marcelo Tosatti wrote:
 On Mon, Jan 07, 2013 at 07:48:43PM +0200, Gleb Natapov wrote:
 ioapic_write (or any other ioapic update)
 lock()
 perform update
 make_all_vcpus_request(KVM_REQ_UPDATE_EOI_BITMAP) (*)
 unlock()
 
 (*) Similarly to TLB flush.
 
 The advantage is that all work becomes vcpu local. The end result
 is much simpler code.
 What complexity will it remove?
 
 Synchronization between multiple CPUs (except the KVM_REQ_ bit
 processing, which is infrastructure shared by other parts of KVM).
 
 Synchronization is just a lock around bitmap access. Can be replaced
 with RCU if it turns to be performance problem.
 
 We agreed that performance is non issue here.
 Yes, if the code is indeed simpler we can take the hit, although
 recalculating bitmap 255 times instead of one for -smp 255 looks like a
 little bit excessive, but I do not see considerable simplification (if
 at all).
 
 So as far as I understand you are proposing:
 
 vcpu0 or io thread:   |vcpu1:
 ioapic_write (or other ioapic update) |
  lock(exitbitmap) |
  if (on vcpu) |
ioapic_update_my_eoi_exitmap() |
  make_all_vcpus_request(update)   |if (update requested)
   |
 ioapic_update_my_eoi_exitmap()
  unlock(exitbitmap)   |
 The current patch logic is this:
 
 vcpu0 or io thread:   |  vcpu1:
 ioapic_write (or other ioapic update) |
  lock(exitbitmap) |
  ioapic_update_all_eoi_exitmaps() |
  make request on each vcpu|
  kick each vcpu   | if (update requested)
  unlock(exitbitmap)   |lock(exitbitmap)
   |load_exitbitmap()
   |unlock(exitbitmap)
 If I described correctly what you are proposing I do not
 see simplification since the bulk of the complexity is in the
 ioapic_update_(my|all)_eoi_exitmap() and they will be the same in both
 implementations. Actually I do see complication in your idea introduced
 by the fact that the case when update is done from vcpu thread have to
 be handled specially.
 
 The proposed patch may be simplified further by
 make_all_vcpus_request_async(update)(*) instead of making request and
 kicking each vcpu individually. In fact the way it is done now is buggy
 since requests are made only for vcpus with bit set in their bitmask,
 but if bit is cleared request is not made so vcpu can run with stale
 bitmask.
 ok, how about the follow logic:
 ioapic_write()
 lock()
 clear_eoi_exitmap_on_all_vcpus()
 perform update(no make request)
 make_all_vcpus_request(like tlb flush)
 unlock()
 
 Why not just
 
 ioapic writer / map updater context
 --
 
 ioapic_write()
 make_all_vcpus_request()
 
 
 (no special lock taken)
 
 
 vcpu context, entry
 --
 
  if(check_request(KVM_REQ_, )) {
  ioapic_lock();  (*)
  update local EOI exit bitmap from IOAPIC
In my patch, it traverses IOAPIC entry once and only updates target vcpus's eoi 
exit bitmap. Then make request for all vcpus.
With your suggestion , all vcpus will traverse all IOAPIC entries. Though 
ioapic entry write is rare, it's still not reasonable.

  ioapic_unlock();
  }
 
 Fine by me. Looks simpler.
 
 
 
 (*) plus any other lock that paths that update the map take
 
 
 
 
 
 
 Best regards,
 Yang
 
 --
   Gleb.


Best regards,
Yang

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] [PATCH 10/12] virtio-net: multiqueue support

2013-01-09 Thread Wanlong Gao

On 01/08/2013 06:14 PM, Jason Wang wrote:
 On 01/08/2013 06:00 PM, Wanlong Gao wrote:
 On 01/08/2013 05:51 PM, Jason Wang wrote:
 On 01/08/2013 05:49 PM, Wanlong Gao wrote:
 On 01/08/2013 05:29 PM, Jason Wang wrote:
 On 01/08/2013 05:07 PM, Wanlong Gao wrote:
 On 12/28/2012 06:32 PM, Jason Wang wrote:
 +} else if (nc-peer-info-type !=  NET_CLIENT_OPTIONS_KIND_TAP) {
 +ret = -1;
 +} else {
 +ret = tap_detach(nc-peer);
 +}
 +
 +return ret;
 +}
 +
 +static void virtio_net_set_queues(VirtIONet *n)
 +{
 +int i;
 +
 +for (i = 0; i  n-max_queues; i++) {
 +if (i  n-curr_queues) {
 +assert(!peer_attach(n, i));
 +} else {
 +assert(!peer_detach(n, i));
 I got a assert here,
 qemu-system-x86_64: /work/git/qemu/hw/virtio-net.c:330: 
 virtio_net_set_queues: Assertion `!peer_detach(n, i)' failed.

 Any thoughts?

 Thanks,
 Wanlong Gao
 Thanks for the testing, which steps or cases did you met this assertion,
 migration, reboot or just changing the number of virtqueues?
 I use the 3.8-rc2 to test it again, I saw this tag has the multi-tap 
 support.

 I just can't start the QEMU use  -netdev 
 tap,id=hostnet0,queues=2,fd=%d,fd=%d -device 
 virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:ce:7b:29,bus=pci.0,addr=0x3

 I pre-opened two tap fds, did I missing something?
 Nothing missed :) It should work.

 Could you please try not use fd=X and let qemu to create the file
 descriptors by itself? Btw, how did you create the two tap fds?
 Can it create descriptors itself? I get 
 qemu-system-x86_64: -netdev tap,id=hostnet0,queues=2: Device 'tap' could not 
 be initialized
 
 You need prepare an ifup script which default at /etc/qemu-ifup (like
 following). Or you may try to add a script=no after:
 
 #!/bin/sh
 
 switch=kvmbr0
 
 /sbin/ifconfig $1 0.0.0.0 up
 /usr/sbin/brctl addif $switch $1
 /usr/sbin/brctl stp $switch off
 
 This will let qemu create a tap fd itself and make it to be connected to
 a port of the bridge caled kvmbr0.

But how to support multi-queue in this way?
I got guest kernel panic when using this way and set queues=4.

Thanks,
Wanlong Gao


 I create the tap fd like this, and dup create the second fd, third fd, right?
 
 The second and third fd should be created with TUNSETIFF with the same
 tap_name also. Btw, you need to specify a IFF_MULTI_QUEUE flag to tell
 the kernel you want to create a multiqueue tap device, otherwise the
 second and third calling of TUNSETIFF will fail.
 
 Thanks

  int tap_fd = open(/dev/net/tun, O_RDWR);
  int vhost_fd = open(/dev/vhost-net, O_RDWR);
  char *tap_name = tap;
  char cmd[2048];
  char brctl[256];
  char netup[256];
  struct ifreq ifr;
  if (tap_fd  0) {
  printf(open tun device failed\n);
  return -1;
  }
  if (vhost_fd  0) {
  printf(open vhost-net device failed\n);
  return -1;
  }
  memset(ifr, 0, sizeof(ifr));
  memcpy(ifr.ifr_name, tap_name, sizeof(tap_name));
  ifr.ifr_flags = IFF_TAP | IFF_NO_PI;

  /*
   * setup tap net device
   */
  if (ioctl(tap_fd, TUNSETIFF, ifr)  0) {
  printf(setup tap net device failed\n);
  return -1;
  }

  sprintf(brctl, brctl addif virbr0 %s, tap_name);
  sprintf(netup, ifconfig %s up, tap_name);
  system(brctl);
  system(netup);

 Thanks,
 Wanlong Gao


 Thanks
 Thanks,
 Wanlong Gao

 +}
 +}
 +}
 +
 +static void virtio_net_set_multiqueue(VirtIONet *n, int multiqueue, 
 int ctrl);
 +
 --
 To unsubscribe from this list: send the line unsubscribe kvm in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

 
 

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: what's the different for qemu --eanble-kvm and accel=kvm and qemu(when kvm kmod load)

2013-01-09 Thread Stefan Hajnoczi

On Mon, Jan 07, 2013 at 05:08:13PM +0800, lei yang wrote:
 On Mon, Jan 7, 2013 at 4:58 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
  On Sun, Jan 6, 2013 at 12:27 PM, lei yang yanglei.f...@gmail.com wrote:
  What's the different with below combos?
 
  The difference is historical, it's just how the command-line options
  evolved over time.
 
  1)qemu --enable-kvm
 
  The old way.  Still useful because it's slightly easier to type than
  --machine accel=kvm.
 
  2)qemu accel=kvm
 
  The modern way.
 
  3)qemu without above parameters when kvm kmod has been load
 
  There is a difference in behavior between QEMU and qemu-kvm here:
 
  QEMU uses TCG and not KVM by default, regardless of whether the kvm.ko
  module has been loaded or not.  qemu-kvm uses KVM by default, if
  available.
 
  The qemu-kvm fork has been retired so it's best not to rely on this
  behavior.  Future distro packages will be built from QEMU and unless a
  code change is made, the default accelerator is TCG.
 
 
 Thanks fro the explain
 
 So if we want use kvm we need to explicitly add --enable-kvm or   
 accel=kvm
 regardless kvm.ko load or not

Yes.

 How can we check we are using TCG or KVM, can we check this in guestos
 or check this with monitor
 can you show me the exactly command?

The QEMU monitor info kvm command displays the status.

Stefan
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] virtio-spec: fix two typos

2013-01-09 Thread Stefan Hajnoczi

On Wed, Jan 09, 2013 at 03:55:23PM +0800, ak...@redhat.com wrote:
 From: Amos Kong ak...@redhat.com
 
 VIRTIO_NET_F_VTRL_VQ - VIRTIO_NET_F_CTRL_VQ
 VIRTIO_NET_CTRL_MQ is defined to 4 in kernel code
 
 Signed-off-by: Amos Kong ak...@redhat.com
 ---
  virtio-spec.lyx | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

Reviewed-by: Stefan Hajnoczi stefa...@redhat.com
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH qom-cpu 2/7] target-i386: Don't set any KVM flag by default if KVM is disabled

2013-01-09 Thread Igor Mammedov

On Mon,  7 Jan 2013 16:20:43 -0200
Eduardo Habkost ehabk...@redhat.com wrote:

 This is a cleanup that tries to solve two small issues:
 
  - We don't need a separate kvm_pv_eoi_features variable just to keep a
constant calculated at compile-time, and this style would require
adding a separate variable (that's declared twice because of the
CONFIG_KVM ifdef) for each feature that's going to be enabled/disable
by machine-type compat code.
  - The pc-1.3 code is setting the kvm_pv_eoi flag on cpuid_kvm_features
even when KVM is disabled at runtime. This small incosistency in
the cpuid_kvm_features field isn't a problem today because
cpuid_kvm_features is ignored by the TCG code, but it may cause
unexpected problems later when refactoring the CPUID handling code.
 
 This patch eliminates the kvm_pv_eoi_features variable and simply uses
 kvm_enabled() inside the enable_kvm_pv_eoi() compat function, so it
 enables kvm_pv_eoi only if KVM is enabled. I believe this makes the
 behavior of enable_kvm_pv_eoi() clearer and easier to understand.

Subj doesn't match what patch actually does.
Have you meant Don't set kvm_pv_eoi flag by default if KVM is disabled?
Although eliminate kvm_pv_eoi_features variable might better describe what
patch is doing.

 
 Signed-off-by: Eduardo Habkost ehabk...@redhat.com
 ---
 Cc: kvm@vger.kernel.org
 Cc: Michael S. Tsirkin m...@redhat.com
 Cc: Gleb Natapov g...@redhat.com
 Cc: Marcelo Tosatti mtosa...@redhat.com
 
 Changes v2:
  - Coding style fix
 
 Changes v3:
  - Eliminate #ifdef by using the fake KVM_FEATURE_PV_EOI #define
 ---
  target-i386/cpu.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)
 
 diff --git a/target-i386/cpu.c b/target-i386/cpu.c
 index 951e206..40400ac 100644
 --- a/target-i386/cpu.c
 +++ b/target-i386/cpu.c
 @@ -164,15 +164,15 @@ static uint32_t kvm_default_features = (1 
 KVM_FEATURE_CLOCKSOURCE) | (1  KVM_FEATURE_ASYNC_PF) |
  (1  KVM_FEATURE_STEAL_TIME) |
  (1  KVM_FEATURE_CLOCKSOURCE_STABLE_BIT);
 -static const uint32_t kvm_pv_eoi_features = (0x1  KVM_FEATURE_PV_EOI);
  #else
  static uint32_t kvm_default_features = 0;
 -static const uint32_t kvm_pv_eoi_features = 0;
  #endif
  
  void enable_kvm_pv_eoi(void)
  {
 -kvm_default_features |= kvm_pv_eoi_features;
 +if (kvm_enabled()) {
 +kvm_default_features |= (1UL  KVM_FEATURE_PV_EOI);
 +}
  }
  
  void host_cpuid(uint32_t function, uint32_t count,

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] [PATCH 10/12] virtio-net: multiqueue support

2013-01-09 Thread Jason Wang

On 01/09/2013 04:23 PM, Wanlong Gao wrote:
 On 01/08/2013 06:14 PM, Jason Wang wrote:
 On 01/08/2013 06:00 PM, Wanlong Gao wrote:
 On 01/08/2013 05:51 PM, Jason Wang wrote:
 On 01/08/2013 05:49 PM, Wanlong Gao wrote:
 On 01/08/2013 05:29 PM, Jason Wang wrote:
 On 01/08/2013 05:07 PM, Wanlong Gao wrote:
 On 12/28/2012 06:32 PM, Jason Wang wrote:
 +} else if (nc-peer-info-type !=  NET_CLIENT_OPTIONS_KIND_TAP) {
 +ret = -1;
 +} else {
 +ret = tap_detach(nc-peer);
 +}
 +
 +return ret;
 +}
 +
 +static void virtio_net_set_queues(VirtIONet *n)
 +{
 +int i;
 +
 +for (i = 0; i  n-max_queues; i++) {
 +if (i  n-curr_queues) {
 +assert(!peer_attach(n, i));
 +} else {
 +assert(!peer_detach(n, i));
 I got a assert here,
 qemu-system-x86_64: /work/git/qemu/hw/virtio-net.c:330: 
 virtio_net_set_queues: Assertion `!peer_detach(n, i)' failed.

 Any thoughts?

 Thanks,
 Wanlong Gao
 Thanks for the testing, which steps or cases did you met this assertion,
 migration, reboot or just changing the number of virtqueues?
 I use the 3.8-rc2 to test it again, I saw this tag has the multi-tap 
 support.

 I just can't start the QEMU use  -netdev 
 tap,id=hostnet0,queues=2,fd=%d,fd=%d -device 
 virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:ce:7b:29,bus=pci.0,addr=0x3

 I pre-opened two tap fds, did I missing something?
 Nothing missed :) It should work.

 Could you please try not use fd=X and let qemu to create the file
 descriptors by itself? Btw, how did you create the two tap fds?
 Can it create descriptors itself? I get 
 qemu-system-x86_64: -netdev tap,id=hostnet0,queues=2: Device 'tap' could 
 not be initialized
 You need prepare an ifup script which default at /etc/qemu-ifup (like
 following). Or you may try to add a script=no after:

 #!/bin/sh

 switch=kvmbr0

 /sbin/ifconfig $1 0.0.0.0 up
 /usr/sbin/brctl addif $switch $1
 /usr/sbin/brctl stp $switch off

 This will let qemu create a tap fd itself and make it to be connected to
 a port of the bridge caled kvmbr0.
 But how to support multi-queue in this way?

Qemu will create the necessary multiqueue tap by itself, see patch 0/12.
 I got guest kernel panic when using this way and set queues=4.

Does it happens w/o or w/ a fd parameter? What's the qemu command line?
Did you meet it during boot time?

Thanks

 Thanks,
 Wanlong Gao

 I create the tap fd like this, and dup create the second fd, third fd, 
 right?
 The second and third fd should be created with TUNSETIFF with the same
 tap_name also. Btw, you need to specify a IFF_MULTI_QUEUE flag to tell
 the kernel you want to create a multiqueue tap device, otherwise the
 second and third calling of TUNSETIFF will fail.

 Thanks
 int tap_fd = open(/dev/net/tun, O_RDWR);
 int vhost_fd = open(/dev/vhost-net, O_RDWR);
 char *tap_name = tap;
 char cmd[2048];
 char brctl[256];
 char netup[256];
 struct ifreq ifr;
 if (tap_fd  0) {
 printf(open tun device failed\n);
 return -1;
 }
 if (vhost_fd  0) {
 printf(open vhost-net device failed\n);
 return -1;
 }
 memset(ifr, 0, sizeof(ifr));
 memcpy(ifr.ifr_name, tap_name, sizeof(tap_name));
 ifr.ifr_flags = IFF_TAP | IFF_NO_PI;

 /*
  * setup tap net device
  */
 if (ioctl(tap_fd, TUNSETIFF, ifr)  0) {
 printf(setup tap net device failed\n);
 return -1;
 }

 sprintf(brctl, brctl addif virbr0 %s, tap_name);
 sprintf(netup, ifconfig %s up, tap_name);
 system(brctl);
 system(netup);

 Thanks,
 Wanlong Gao


 Thanks
 Thanks,
 Wanlong Gao

 +}
 +}
 +}
 +
 +static void virtio_net_set_multiqueue(VirtIONet *n, int multiqueue, 
 int ctrl);
 +
 --
 To unsubscribe from this list: send the line unsubscribe kvm in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

 --
 To unsubscribe from this list: send the line unsubscribe kvm in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 01/12] tap: multiqueue support

2013-01-09 Thread Stefan Hajnoczi

On Fri, Dec 28, 2012 at 06:31:53PM +0800, Jason Wang wrote:
 diff --git a/qapi-schema.json b/qapi-schema.json
 index 5dfa052..583eb7c 100644
 --- a/qapi-schema.json
 +++ b/qapi-schema.json
 @@ -2465,7 +2465,7 @@
  { 'type': 'NetdevTapOptions',
'data': {
  '*ifname': 'str',
 -'*fd': 'str',
 +'*fd': ['String'],

This change is not backwards-compatible.  You need to add a '*fds':
['String'] field instead.

  '*script': 'str',
  '*downscript': 'str',
  '*helper': 'str',
 @@ -2473,7 +2473,8 @@
  '*vnet_hdr':   'bool',
  '*vhost':  'bool',
  '*vhostfd':'str',
 -'*vhostforce': 'bool' } }
 +'*vhostforce': 'bool',
 +'*queues': 'uint32'} }

The 'queues' parameter should not be necessary when fd passing is used
since we can learn the number of queues by looking at the list length.

Stefan
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] [PATCH 10/12] virtio-net: multiqueue support

2013-01-09 Thread Wanlong Gao

On 01/09/2013 05:30 PM, Jason Wang wrote:
 On 01/09/2013 04:23 PM, Wanlong Gao wrote:
 On 01/08/2013 06:14 PM, Jason Wang wrote:
 On 01/08/2013 06:00 PM, Wanlong Gao wrote:
 On 01/08/2013 05:51 PM, Jason Wang wrote:
 On 01/08/2013 05:49 PM, Wanlong Gao wrote:
 On 01/08/2013 05:29 PM, Jason Wang wrote:
 On 01/08/2013 05:07 PM, Wanlong Gao wrote:
 On 12/28/2012 06:32 PM, Jason Wang wrote:
 +} else if (nc-peer-info-type !=  NET_CLIENT_OPTIONS_KIND_TAP) 
 {
 +ret = -1;
 +} else {
 +ret = tap_detach(nc-peer);
 +}
 +
 +return ret;
 +}
 +
 +static void virtio_net_set_queues(VirtIONet *n)
 +{
 +int i;
 +
 +for (i = 0; i  n-max_queues; i++) {
 +if (i  n-curr_queues) {
 +assert(!peer_attach(n, i));
 +} else {
 +assert(!peer_detach(n, i));
 I got a assert here,
 qemu-system-x86_64: /work/git/qemu/hw/virtio-net.c:330: 
 virtio_net_set_queues: Assertion `!peer_detach(n, i)' failed.

 Any thoughts?

 Thanks,
 Wanlong Gao
 Thanks for the testing, which steps or cases did you met this assertion,
 migration, reboot or just changing the number of virtqueues?
 I use the 3.8-rc2 to test it again, I saw this tag has the multi-tap 
 support.

 I just can't start the QEMU use  -netdev 
 tap,id=hostnet0,queues=2,fd=%d,fd=%d -device 
 virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:ce:7b:29,bus=pci.0,addr=0x3

 I pre-opened two tap fds, did I missing something?
 Nothing missed :) It should work.

 Could you please try not use fd=X and let qemu to create the file
 descriptors by itself? Btw, how did you create the two tap fds?
 Can it create descriptors itself? I get 
 qemu-system-x86_64: -netdev tap,id=hostnet0,queues=2: Device 'tap' could 
 not be initialized
 You need prepare an ifup script which default at /etc/qemu-ifup (like
 following). Or you may try to add a script=no after:

 #!/bin/sh

 switch=kvmbr0

 /sbin/ifconfig $1 0.0.0.0 up
 /usr/sbin/brctl addif $switch $1
 /usr/sbin/brctl stp $switch off

 This will let qemu create a tap fd itself and make it to be connected to
 a port of the bridge caled kvmbr0.
 But how to support multi-queue in this way?
 
 Qemu will create the necessary multiqueue tap by itself, see patch 0/12.
 I got guest kernel panic when using this way and set queues=4.
 
 Does it happens w/o or w/ a fd parameter? What's the qemu command line?
 Did you meet it during boot time?

The QEMU command line is 

/work/git/qemu/x86_64-softmmu/qemu-system-x86_64 -name f17 -M pc-0.15 
-enable-kvm -m 3096 \
-smp 4,sockets=4,cores=1,threads=1 \
-uuid c31a9f3e-4161-c53a-339c-5dc36d0497cb -no-user-config -nodefaults \
-chardev 
socket,id=charmonitor,path=/var/lib/libvirt/qemu/f17.monitor,server,nowait \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=utc -no-shutdown \
-device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
-device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0xb,num_queues=4,hotplug=on \
-device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 \
-drive file=/vm/f17.img,if=none,id=drive-virtio-disk0,format=qcow2 \
-device 
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
 \
-drive file=/vm2/f17-kernel.img,if=none,id=drive-virtio-disk1,format=qcow2 \
-device 
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk1,id=virtio-disk1
 \
-drive file=/vm/virtio-scsi/scsi3.img,if=none,id=drive-scsi0-0-2-0,format=raw \
-device 
scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=2,drive=drive-scsi0-0-2-0,id=scsi0-0-2-0,removable=on
 \
-drive file=/vm/virtio-scsi/scsi4.img,if=none,id=drive-scsi0-0-3-0,format=raw \
-device 
scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=3,drive=drive-scsi0-0-3-0,id=scsi0-0-3-0
 \
-drive file=/vm/virtio-scsi/scsi1.img,if=none,id=drive-scsi0-0-0-0,format=raw \
-device 
scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0
 \
-drive file=/vm/virtio-scsi/scsi2.img,if=none,id=drive-scsi0-0-1-0,format=raw \
-device 
scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=1,drive=drive-scsi0-0-1-0,id=scsi0-0-1-0
 \
-chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 \
-chardev file,id=charserial1,path=/vm/f17.log \
-device isa-serial,chardev=charserial1,id=serial1 \
-device usb-tablet,id=input0 -vga std \
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 \
-netdev tap,id=hostnet0,vhost=on,queues=4 \
-device 
virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:ce:7b:29,bus=pci.0,addr=0x3 
\
-monitor stdio

I got panic just after booting the system, did nothing,  waited for a while, 
the guest panicked.

[   28.053004] BUG: soft lockup - CPU#1 stuck for 23s! [ip:592]
[   28.053004] Modules linked in: ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 
xt_state nf_conntrack ip6table_filter ip6_tables uinput joydev microcode 
virtio_balloon pcspkr virtio_net i2c_piix4 i2c_core virtio_scsi virtio_blk 
floppy
[   28.053004] CPU 1 
[   28.053004] Pid: 592, comm: ip Not tainted 3.8.0-rc1-net+ #3 Bochs

Re: [PATCH v5 01/12] KVM: ARM: Introduce KVM_SET_DEVICE_ADDRESS ioctl

2013-01-09 Thread Alexander Graf



Am 09.01.2013 um 00:49 schrieb Christoffer Dall c.d...@virtualopensystems.com:

 On Tue, Jan 8, 2013 at 6:29 PM, Scott Wood scottw...@freescale.com wrote:
 On 01/08/2013 05:17:05 PM, Christoffer Dall wrote:
 
 On Tue, Jan 8, 2013 at 5:36 PM, Scott Wood scottw...@freescale.com
 wrote:
 On 01/08/2013 12:41:30 PM, Christoffer Dall wrote:
 +struct kvm_device_address {
 +   __u64 id;
 +   __u64 addr;
 +};
 
 
 What about this is really specific to addresses?  Can't we set other
 device
 parameters this way?
 
 Sort of like a device equivalent of PPC's one-reg interface.
 
 This has been discussed a number of times,
 
 
 Sorry, this patch was just pointed out to me today.  I googled the patch
 title but couldn't find this discussion.
 
 
 
 I believe it was mainly discussed at the KVM Forum in person.
 
 and one or the other there
 is a need for userspace to tell KVM to present memory-mapped devices
 at a given address. It was also considered to make this specific to
 irqchip initialization, but irqchips are different and a lot of that
 code is x86-specific, so that approach was discarded.
 
 This *could* look something like this:
 
 struct kvm_device_param {
u64 dev_id;
u64 param_id;
u64 value;
 };
 
 but that has less clear, or at least less specific, semantics.
 
 
 Why is it less clear?  You need to have device-specific documentation for
 what id means, so why not also an enumeration of params?  Or just keep
 it as is, and rename address to value.  Whether dev and param are
 combined is orthogonal from whether it's used for non-address things.
 
 less clear in the sense that you have to look at more code to see what
 it does. I'm not saying that it's too unclear, at all, I'm actually
 fine with it, but to make my point, we can make an ioctl that's called
 do_something() that takes a struct with val0, val1, val2, val3, ...
 
 
 If you leave it as address, either we'll have it being used for
 non-address things regardless of the name (Not a typewriter!), or there'll
 end up being yet more unnecessary ioctls, or device-specific things will end
 up getting shoved into CPU interfaces such as one-reg.  For example, on MPIC
 we need to be able to specify the version of the chip to emulate in addition
 to the address at which it lives.
 
 Also, why is it documented as an arm interface?  Shouldn't it be a generic
 interface, with other architectures currently not implementing any IDs?
 What in the kvm_arch_vm_ioctl() wrapper is arm-specific?
 
 As I remember the argument for keeping this the point was that there
 were other preferred methods for other archs to do this, and that ARM
 was the only platform that had this explicit need, but maybe I'm
 making this up.
 
 I'll let Peter and Alex respond this as well, and if they're fine with
 changing it to what I proposed above, then let's do that, and we can
 make it a non arm-specific interface.

I think we should make thus at least potentially generic. In fact, I wouldn't 
even mind calling it DEV_REG with the same semantics as ONE_REG, just that it 
also gets a unique dev id that gets created during in-kernel device creation 
and that it's a vm ioctl.

That way we wouldn't block our path to create two in-kernel irqchips one day.


Alex

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: FreeBSD-amd64 fails to start with SMP on quemu-kvm

2013-01-09 Thread Gleb Natapov

On Tue, Jan 08, 2013 at 09:45:35PM +0100, Artur Samborski wrote:
 W dniu 08.01.2013 00:00, Marcelo Tosatti pisze:
 On Mon, Jan 07, 2013 at 06:13:22PM +0100, Artur Samborski wrote:
 Hello,
 
 When i try to run FreeBSD-amd64 on more than 1 vcpu in quemu-kvm
 (Fedora Core 17) eg. to run FreeBSD-9.0-RELEASE-amd64 with:
 
 qemu-kvm -m 1024m -cpu host -smp 2 -cdrom
 /storage/iso/FreeBSD-9.0-RELEASE-amd64-dvd1.iso
 
 it freezes KVM with:
 
 KVM internal error. Suberror: 1
 emulation failure
 RAX=80b0d4c0 RBX=0009f000 RCX=c080
 RDX=
 RSI=d238 RDI= RBP=
 RSP=
 R8 = R9 = R10=
 R11=
 R12= R13= R14=
 R15=
 RIP=0009f076 RFL=00010086 [--S--P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
 ES =   f300 DPL=3 DS16 [-WA]
 CS =0008   00209900 DPL=0 CS64 [--A]
 SS =9f00 0009f000  f300 DPL=3 DS16 [-WA]
 DS =0018   00c09300 DPL=0 DS   [-WA]
 FS =   f300 DPL=3 DS16 [-WA]
 GS =   f300 DPL=3 DS16 [-WA]
 LDT=   8200 DPL=0 LDT
 TR =   8b00 DPL=0 TSS64-busy
 GDT= 0009f080 0020
 IDT=  
 CR0=8011 CR2= CR3=0009c000 CR4=0030
 DR0= DR1= DR2=
 DR3=
 DR6=0ff0 DR7=0400
 EFER=0501
 Code=00 00 00 80 0f 22 c0 ea 70 f0 09 00 08 00 48 b8 c0 d4 b0 80
 ff ff ff ff ff e0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 00 99 20 00 ff ff 00 00
 
 Artur,
 
 Can you check whether
 
 https://patchwork-mail.kernel.org/patch/1942681/
 
 fixes your problem
 
 
 Hi, thanks for the reply.
 
 Unfortunately, the patch does not help. Attempt to start FreeBSD
 amd64 via quemu-kvm with -smp parameter greater than 1 fails in
 exactly the same way as before.
 
 The patch was applied to the kernel from the 3.6.11-1.fc17.src.rpm package.
 
 Do I need some additional patches?
 
 
Can you try queue branch from kvm.git?

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/4] ARM updates for kvmtool

2013-01-09 Thread Pekka Enberg

On Mon, Jan 7, 2013 at 8:14 PM, Will Deacon will.dea...@arm.com wrote:
 Hello kvm hackers,

 This patch series introduces some updates to the ARM (AArch32) kvm tools
 code:

 - virtio mmio fixes to deal with guest page sizes != 4k (in
   preparation for AArch64, which I will post separately).
 - .dtb dumping via the lkvm command line
 - Support for PSCI firmware as a replacement to the spin-table
   based SMP boot code

 The last option was implemented after discussion on the linux-arm-kernel
 list when adding support for the mach-virt platform. I hope to upstream
 the kernel-side part of the implementation for 3.9 and expect the kvm
 bits to follow once that has been merged.

 All feedback welcome.

Applied, thanks Will!
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/2] Add support for ARMv8 CPUs to kvmtool

2013-01-09 Thread Pekka Enberg

Hi Will,

On Mon, Jan 7, 2013 at 8:43 PM, Will Deacon will.dea...@arm.com wrote:
 Hello again,

 These two patches add support for ARMv8 processors running an AArch64 instance
 of kvm to kvmtool. Both AArch32 and AArch64 guests are supported and, in the
 case of the latter, the guest page size may be either 64k or 4k.

 This depends on the ARM updates series I just posted:

   https://lists.cs.columbia.edu/pipermail/kvmarm/2013-January/004505.html

 Feedback welcome,

So will ARMv7 continue to work after this patch even without the ARM
updates? If so, I'm happy to merge the patch before the updates hit
v3.9.

Pekka
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/2] Add support for ARMv8 CPUs to kvmtool

2013-01-09 Thread Will Deacon

On Wed, Jan 09, 2013 at 11:16:42AM +, Pekka Enberg wrote:
 Hi Will,

Hello,

 On Mon, Jan 7, 2013 at 8:43 PM, Will Deacon will.dea...@arm.com wrote:
  These two patches add support for ARMv8 processors running an AArch64 
  instance
  of kvm to kvmtool. Both AArch32 and AArch64 guests are supported and, in the
  case of the latter, the guest page size may be either 64k or 4k.
 
  This depends on the ARM updates series I just posted:
 
https://lists.cs.columbia.edu/pipermail/kvmarm/2013-January/004505.html
 
  Feedback welcome,
 
 So will ARMv7 continue to work after this patch even without the ARM
 updates? If so, I'm happy to merge the patch before the updates hit
 v3.9.

I think we're getting our wires crossed a bit here, so I'll try to explain
my madness:

- Mainline ARM kernels cannot be booted by kvmtool yet, you
  currently have to use my virt/mach branch from

git://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git

  I hope to upstream this for 3.9 (it has gone through the necessary
  review on the ARM kernel list).

- As part of the review mentioned above, some kvmtool changes were
  required to support SMP with the latest virt/mach. You just merged
  these in my ARM updates series.

- The ARMv8 kvmtool code builds on top of the series you just merged
  and won't apply without it. Applying it won't have any adverse
  effects on ARMv7 (AArch32). In fact, the two series sit together
  on my kvmtool/arm branch from

git://git.kernel.org/pub/scm/linux/kernel/git/will/kvmtool.git

  but I posted them separately to avoid holding up the ARM updates.

Does that answer you question or have I made things worse?!

Cheers,

Will
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH qom-cpu 2/7] target-i386: Don't set any KVM flag by default if KVM is disabled

2013-01-09 Thread Eduardo Habkost

On Wed, Jan 09, 2013 at 09:41:37AM -0200, Eduardo Habkost wrote:
[...]
 Andreas, please ignore this patch (it is not necessary anymore as this
 series doesn't include machine-type compatibility code for kvm_mmu).
 Patches 2-7 don't depend on this patch and should apply cleanly without
 it.

I mean: patches 1 and 3-7 don't depend on this patch and can be applied
cleanly without it.

-- 
Eduardo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/4] ARM updates for kvmtool

2013-01-09 Thread Will Deacon

On Wed, Jan 09, 2013 at 11:13:58AM +, Pekka Enberg wrote:
 On Mon, Jan 7, 2013 at 8:14 PM, Will Deacon will.dea...@arm.com wrote:
  Hello kvm hackers,
 
  This patch series introduces some updates to the ARM (AArch32) kvm tools
  code:
 
  - virtio mmio fixes to deal with guest page sizes != 4k (in
preparation for AArch64, which I will post separately).
  - .dtb dumping via the lkvm command line
  - Support for PSCI firmware as a replacement to the spin-table
based SMP boot code
 
  The last option was implemented after discussion on the linux-arm-kernel
  list when adding support for the mach-virt platform. I hope to upstream
  the kernel-side part of the implementation for 3.9 and expect the kvm
  bits to follow once that has been merged.
 
  All feedback welcome.
 
 Applied, thanks Will!

Great, cheers Pekka.

Will
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/2] Add support for ARMv8 CPUs to kvmtool

2013-01-09 Thread Pekka Enberg

On Wed, Jan 9, 2013 at 1:33 PM, Will Deacon will.dea...@arm.com wrote:
 I think we're getting our wires crossed a bit here, so I'll try to explain
 my madness:

 - Mainline ARM kernels cannot be booted by kvmtool yet, you
   currently have to use my virt/mach branch from

 git://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git

   I hope to upstream this for 3.9 (it has gone through the necessary
   review on the ARM kernel list).

 - As part of the review mentioned above, some kvmtool changes were
   required to support SMP with the latest virt/mach. You just merged
   these in my ARM updates series.

 - The ARMv8 kvmtool code builds on top of the series you just merged
   and won't apply without it. Applying it won't have any adverse
   effects on ARMv7 (AArch32). In fact, the two series sit together
   on my kvmtool/arm branch from

 git://git.kernel.org/pub/scm/linux/kernel/git/will/kvmtool.git

   but I posted them separately to avoid holding up the ARM updates.

 Does that answer you question or have I made things worse?!

Yes, it answers my question. I'll go ahead and apply the ARMv8 patches.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH qom-cpu 2/7] target-i386: Don't set any KVM flag by default if KVM is disabled

2013-01-09 Thread Eduardo Habkost

On Wed, Jan 09, 2013 at 10:46:12AM +0100, Igor Mammedov wrote:
 On Mon,  7 Jan 2013 16:20:43 -0200
 Eduardo Habkost ehabk...@redhat.com wrote:
 
  This is a cleanup that tries to solve two small issues:
  
   - We don't need a separate kvm_pv_eoi_features variable just to keep a
 constant calculated at compile-time, and this style would require
 adding a separate variable (that's declared twice because of the
 CONFIG_KVM ifdef) for each feature that's going to be enabled/disable
 by machine-type compat code.
   - The pc-1.3 code is setting the kvm_pv_eoi flag on cpuid_kvm_features
 even when KVM is disabled at runtime. This small incosistency in
 the cpuid_kvm_features field isn't a problem today because
 cpuid_kvm_features is ignored by the TCG code, but it may cause
 unexpected problems later when refactoring the CPUID handling code.
  
  This patch eliminates the kvm_pv_eoi_features variable and simply uses
  kvm_enabled() inside the enable_kvm_pv_eoi() compat function, so it
  enables kvm_pv_eoi only if KVM is enabled. I believe this makes the
  behavior of enable_kvm_pv_eoi() clearer and easier to understand.
 
 Subj doesn't match what patch actually does.
 Have you meant Don't set kvm_pv_eoi flag by default if KVM is disabled?
 Although eliminate kvm_pv_eoi_features variable might better describe what
 patch is doing.


True. The other flags in kvm_default_features may be already set and
will be copied to cpuid_kvm_features even if kvm_enabled() is false.

I had a previous version of this patch that also changed the code using
kvm_default_features to check kvm_enabled(), but I removed that part and
kept only the kvm_pv_eoi_features variable removal.

Andreas, please ignore this patch (it is not necessary anymore as this
series doesn't include machine-type compatibility code for kvm_mmu).
Patches 2-7 don't depend on this patch and should apply cleanly without
it.

I will send a new version later, probably with a separate patch to
ignore kvm_default_features if kvm_enabled() is false (so there's no
need to even check kvm_enabled() inside enable_kvm_pv_eoi()).

 
  
  Signed-off-by: Eduardo Habkost ehabk...@redhat.com
  ---
  Cc: kvm@vger.kernel.org
  Cc: Michael S. Tsirkin m...@redhat.com
  Cc: Gleb Natapov g...@redhat.com
  Cc: Marcelo Tosatti mtosa...@redhat.com
  
  Changes v2:
   - Coding style fix
  
  Changes v3:
   - Eliminate #ifdef by using the fake KVM_FEATURE_PV_EOI #define
  ---
   target-i386/cpu.c | 6 +++---
   1 file changed, 3 insertions(+), 3 deletions(-)
  
  diff --git a/target-i386/cpu.c b/target-i386/cpu.c
  index 951e206..40400ac 100644
  --- a/target-i386/cpu.c
  +++ b/target-i386/cpu.c
  @@ -164,15 +164,15 @@ static uint32_t kvm_default_features = (1 
  KVM_FEATURE_CLOCKSOURCE) | (1  KVM_FEATURE_ASYNC_PF) |
   (1  KVM_FEATURE_STEAL_TIME) |
   (1  KVM_FEATURE_CLOCKSOURCE_STABLE_BIT);
  -static const uint32_t kvm_pv_eoi_features = (0x1  KVM_FEATURE_PV_EOI);
   #else
   static uint32_t kvm_default_features = 0;
  -static const uint32_t kvm_pv_eoi_features = 0;
   #endif
   
   void enable_kvm_pv_eoi(void)
   {
  -kvm_default_features |= kvm_pv_eoi_features;
  +if (kvm_enabled()) {
  +kvm_default_features |= (1UL  KVM_FEATURE_PV_EOI);
  +}
   }
   
   void host_cpuid(uint32_t function, uint32_t count,
 

-- 
Eduardo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

KVM versions, machine types and failed migrations

2013-01-09 Thread Vangelis Koukis

Hello,

I'd like to ask a few questions about the way migrations work in KVM
among different emulated machine types and different versions of the
qemu-kvm package. I am sending to both the kvm@ and qemu-devel@ lists,
please redirect me if I was wrong in doing so.

In a nutshell: while trying to live-migrate a VM on ~okeanos [1], we
see VM migrations fail silently if going from kvm 1.0 to kvm 1.1.
The source VM is frozen, info migrate on the source monitor reports
success, but the VM is dead upon arrival on the destination process.
Please see [3] for the exact package versions for qemu-kvm we have
tested with.

Migration works if the destination kvm has been started with the same
machine type as the source VM, e.g., using -M pc-1.0 specifically on
the destination, when migrating a pc-1.0 machine from kvm 1.0 to
kvm 1.1.

How does the machine type specified with -M work in the case of
migrations? Are migrations expected to fail if the machine type is
different between source and destination process? If yes, shouldn't KVM be
able to detect this and abort the migration instead of failing silently?

For every (src, dst) pair of package version / machine type,
we saw the following:

 dst:   1.0/pc-0.12  1.0/pc-1.0  1.1/pc-1.1
src:
1.0/pc-0.12 ok   ok  fails silently
1.0/pc-1.0  fails silently   ok  fails silently
1.1/pc-1.1  fails silently   fails silently  ok

Machine types pc-0.12 and pc-1.0 were run with qemu-kvm package version
1.0, machine type pc-1.1 with qemu-kvm package version 1.1.

Also, the migration 1.0/pc-0.12 to 1.0/pc-1.0 seems to work... How can
it work or even be allowed, given that the guest finds itself running on
a different hardware configuration after the migration?

Regarding different package versions of qemu-kvm, it seems migrations do
not work from source 0.12.5 to any other version *even* if -M pc-0.12 is
specified at the incoming KVM process. For versions = 1.0 everything
works provided the machine type on the destination is the same as on the
source.

Our goal is to patch Ganeti [2] so that it sets the destination machine
type to that of the source specifically, ensuring migrations work
seamlessly after a KVM upgrade. Is there a way to retrieve the machine
type of a running KVM process through a monitor command?

Thank you,
Vangelis.

[1] ~okeanos IaaS: http://okeanos.io
[2] Ganeti: https://code.google.com/p/ganeti/
[3] We tested on Debian Squeeze with package versions:

kvm 0.12: qemu-kvm_0.12.5+dfsg-5+squeeze8_amd64.deb
kvm 1.0: qemu-kvm_1.0+dfsg-8~bpo60+1_amd64.deb
kvm 1.1: qemu-kvm_1.1.2+dfsg-2~bpo60+1_amd64.deb

-- 
Vangelis Koukis
vkou...@grnet.gr
OpenPGP public key ID:
pub  1024D/1D038E97 2003-07-13 Vangelis Koukis vkou...@cslab.ece.ntua.gr
 Key fingerprint = C5CD E02E 2C78 7C10 8A00  53D8 FBFC 3799 1D03 8E97

Only those who will risk going too far
can possibly find out how far one can go.
-- T.S. Eliot


signature.asc
Description: Digital signature

Re: KVM versions, machine types and failed migrations

2013-01-09 Thread Daniel P. Berrange

On Wed, Jan 09, 2013 at 02:23:50PM +0200, Vangelis Koukis wrote:
 Hello,
 
 I'd like to ask a few questions about the way migrations work in KVM
 among different emulated machine types and different versions of the
 qemu-kvm package. I am sending to both the kvm@ and qemu-devel@ lists,
 please redirect me if I was wrong in doing so.
 
 In a nutshell: while trying to live-migrate a VM on ~okeanos [1], we
 see VM migrations fail silently if going from kvm 1.0 to kvm 1.1.
 The source VM is frozen, info migrate on the source monitor reports
 success, but the VM is dead upon arrival on the destination process.
 Please see [3] for the exact package versions for qemu-kvm we have
 tested with.
 
 Migration works if the destination kvm has been started with the same
 machine type as the source VM, e.g., using -M pc-1.0 specifically on
 the destination, when migrating a pc-1.0 machine from kvm 1.0 to
 kvm 1.1.
 
 How does the machine type specified with -M work in the case of
 migrations? Are migrations expected to fail if the machine type is
 different between source and destination process? If yes, shouldn't KVM be
 able to detect this and abort the migration instead of failing silently?

When doing migration, the fundamental requirement is that the guest
OS visible machine ABI must not change. Thus there are three key
things to take care of when launching QEMU on the migration target
host.

 - The device PCI/USB addresses must be identical to the source
 - The machine type must be identical to the source
 - The CPU model must be identical to the source

If you don't follow those requirements, either QEMU or the guest OS
or both will crash  burn during migration  you get to keep both
pieces :-)

 Regarding different package versions of qemu-kvm, it seems migrations do
 not work from source 0.12.5 to any other version *even* if -M pc-0.12 is
 specified at the incoming KVM process. For versions = 1.0 everything
 works provided the machine type on the destination is the same as on the
 source.

Some older versions of QEMU were buggy causing the machine type to
not correctly preserve ABI.

 Our goal is to patch Ganeti [2] so that it sets the destination machine
 type to that of the source specifically, ensuring migrations work
 seamlessly after a KVM upgrade. Is there a way to retrieve the machine
 type of a running KVM process through a monitor command?

IIRC there is not a monitor command for this. The general approach
to dealing with migration stability should be to launch QEMU with a
canonical hardware configuration. This means explicitly setting a machine
type, CPU model and PCI/USB devices addresses upfront. NB you should not
use 'pc' as a machine type - if you query the list of machine types from
QEMU, it will tell you what 'pc' corresponds to (pc-1.2) and then use the
versioned type so you have a known machine type.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [okeanos-dev] Re: KVM versions, machine types and failed migrations

2013-01-09 Thread Vangelis Koukis

On Wed, Jan 09, 2013 at 01:10:45pm +, Daniel P. Berrange wrote:
 When doing migration, the fundamental requirement is that the guest
 OS visible machine ABI must not change. Thus there are three key
 things to take care of when launching QEMU on the migration target
 host.
 
  - The device PCI/USB addresses must be identical to the source
  - The machine type must be identical to the source
  - The CPU model must be identical to the source
 

Thanks for the detailed list of requirements, we'll take it into account
for the relevant Ganeti patch.

 If you don't follow those requirements, either QEMU or the guest OS
 or both will crash  burn during migration  you get to keep both
 pieces :-)
 

My point is, are these requirements left up to the caller of kvm
-incoming to satisfy? Since the migration will most probably break,
wouldn't it be best for QEMU to detect this and complain loudly, instead
of continuing with the migration, failing silently and destroying the
VM?

Sure there could be some yes, do it, I know it is going to break
option, which will make QEMU proceed with the migration. However, in 99%
of the cases this is just user error, e.g. the user has upgraded the
version on the other end and has not specified -M explicitly. It would
be best if QEMU was able to detect and warn the user about what is going
to happen, because it does lead to the VM dying.

  Regarding different package versions of qemu-kvm, it seems migrations do
  not work from source 0.12.5 to any other version *even* if -M pc-0.12 is
  specified at the incoming KVM process. For versions = 1.0 everything
  works provided the machine type on the destination is the same as on the
  source.
 
 Some older versions of QEMU were buggy causing the machine type to
 not correctly preserve ABI.
 
  Our goal is to patch Ganeti [2] so that it sets the destination machine
  type to that of the source specifically, ensuring migrations work
  seamlessly after a KVM upgrade. Is there a way to retrieve the machine
  type of a running KVM process through a monitor command?
 
 IIRC there is not a monitor command for this.

 The general approach
 to dealing with migration stability should be to launch QEMU with a
 canonical hardware configuration. This means explicitly setting a machine
 type, CPU model and PCI/USB devices addresses upfront. NB you should not
 use 'pc' as a machine type - if you query the list of machine types from
 QEMU, it will tell you what 'pc' corresponds to (pc-1.2) and then use the
 versioned type so you have a known machine type.
 

This is exactly what we're trying to do: specify -M explicitly in the
kvm command line, instead of letting the default pc machine type
change arbitrarily whenever the qemu-kvm package gets upgraded.

Thanks again,
Vangelis.

-- 
Vangelis Koukis
vkou...@grnet.gr
OpenPGP public key ID:
pub  1024D/1D038E97 2003-07-13 Vangelis Koukis vkou...@cslab.ece.ntua.gr
 Key fingerprint = C5CD E02E 2C78 7C10 8A00  53D8 FBFC 3799 1D03 8E97

Only those who will risk going too far
can possibly find out how far one can go.
-- T.S. Eliot


signature.asc
Description: Digital signature

Re: [PATCH v5 12/12] ARM: KVM: Add VGIC configuration option

2013-01-09 Thread Sergei Shtylyov


Hello.

On 08-01-2013 22:42, Christoffer Dall wrote:


From: Marc Zyngier marc.zyng...@arm.com



It is now possible to select the VGIC configuration option.



Signed-off-by: Marc Zyngier marc.zyng...@arm.com
Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
---
  arch/arm/kvm/Kconfig |8 
  1 file changed, 8 insertions(+)



diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
index 05227cb..d32e33f 100644
--- a/arch/arm/kvm/Kconfig
+++ b/arch/arm/kvm/Kconfig
@@ -51,6 +51,14 @@ config KVM_ARM_MAX_VCPUS
  large, so only choose a reasonable number that you expect to
  actually use.

+config KVM_ARM_VGIC
+bool KVM support for Virtual GIC


   Please indent with tab, as below.


+   depends on KVM_ARM_HOST  OF
+   select HAVE_KVM_IRQCHIP
+   default y
+   ---help---
+ Adds support for a hardware assisted, in-kernel GIC emulation.
+


WBR, Sergei


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] [PATCH 00/12] Multiqueue virtio-net

2013-01-09 Thread Stefan Hajnoczi

On Fri, Dec 28, 2012 at 06:31:52PM +0800, Jason Wang wrote:
 Perf Numbers:
 
 Two Intel Xeon 5620 with direct connected intel 82599EB
 Host/Guest kernel: David net tree
 vhost enabled
 
 - lots of improvents of both latency and cpu utilization in request-reponse 
 test
 - get regression of guest sending small packets which because TCP tends to 
 batch
   less when the latency were improved
 
 1q/2q/4q
 TCP_RR
  size #sessions trans.rate  norm trans.rate  norm trans.rate  norm
 1 1 9393.26   595.64  9408.18   597.34  9375.19   584.12
 1 2072162.1   2214.24 129880.22 2456.13 196949.81 2298.13
 1 50107513.38 2653.99 139721.93 2490.58 259713.82 2873.57
 1 100   126734.63 2676.54 145553.5  2406.63 265252.68 2943
 64 19453.42   632.33  9371.37   616.13  9338.19   615.97
 64 20   70620.03  2093.68 125155.75 2409.15 191239.91 2253.32
 64 50   1069662448.29 146518.67 2514.47 242134.07 2720.91
 64 100  117046.35 2394.56 190153.09 2696.82 238881.29 2704.41
 256 1   8733.29   736.36  8701.07   680.83  8608.92   530.1
 256 20  69279.89  2274.45 115103.07 2299.76 144555.16 1963.53
 256 50  97676.02  2296.09 150719.57 2522.92 254510.5  3028.44
 256 100 150221.55 2949.56 197569.3  2790.92 300695.78 3494.83
 TCP_CRR
  size #sessions trans.rate  norm trans.rate  norm trans.rate  norm
 1 1 2848.37  163.41 2230.39  130.89 2013.09  120.47
 1 2023434.5  562.11 31057.43 531.07 49488.28 564.41
 1 5028514.88 582.17 40494.23 605.92 60113.35 654.97
 1 100   28827.22 584.73 48813.25 661.6  61783.62 676.56
 64 12780.08  159.4  2201.07  127.96 2006.8   117.63
 64 20   23318.51 564.47 30982.44 530.24 49734.95 566.13
 64 50   28585.72 582.54 40576.7  610.08 60167.89 656.56
 64 100  28747.37 584.17 49081.87 667.87 60612.94 662
 256 1   2772.08  160.51 2231.84  131.05 2003.62  113.45
 256 20  23086.35 559.8  30929.09 528.16 48454.9  555.22
 256 50  28354.7  579.85 40578.31 60760261.71 657.87
 256 100 28844.55 585.67 48541.86 659.08 61941.07 676.72
 TCP_STREAM guest receiving
  size #sessions throughput  norm throughput  norm throughput  norm
 1 1 16.27   1.33   16.11.12   16.13   0.99
 1 2 33.04   2.08   32.96   2.19   32.75   1.98
 1 4 66.62   6.83   68.35.56   66.14   2.65
 64 1896.55  56.67  914.02  58.14  898.9   61.56
 64 21830.46 91.02  1812.02 64.59  1835.57 66.26
 64 43626.61 142.55 3636.25 100.64 3607.46 75.03
 256 1   2619.49 131.23 2543.19 129.03 2618.69 132.39
 256 2   5136.58 203.02 5163.31 141.11 5236.51 149.4
 256 4   7063.99 242.83 9365.4  208.49 9421.03 159.94
 512 1   3592.43 165.24 3603.12 167.19 3552.5  169.57
 512 2   7042.62 246.59 7068.46 180.87 7258.52 186.3
 512 4   6996.08 241.49 9298.34 206.12 9418.52 159.33
 1024 1  4339.54 192.95 4370.2  191.92 4211.72 192.49
 1024 2  7439.45 254.77 9403.99 215.24 9120.82 222.67
 1024 4  7953.86 272.11 9403.87 208.23 9366.98 159.49
 4096 1  7696.28 272.04 7611.41 270.38 7778.71 267.76
 4096 2  7530.35 261.1  8905.43 246.27 8990.18 267.57
 4096 4  7121.6  247.02 9411.75 206.71 9654.96 184.67
 16384 1 7795.73 268.54 7780.94 267.2  7634.26 260.73
 16384 2 7436.57 255.81 9381.86 220.85 9392220.36
 16384 4 7199.07 247.81 9420.96 205.87 9373.69 159.57
 TCP_MAERTS guest sending
  size #sessions throughput  norm throughput  norm throughput  norm
 1 1 15.94   0.62   15.55   0.61   15.13   0.59
 1 2 36.11   0.83   32.46   0.69   32.28   0.69
 1 4 71.59   1  68.91   0.94   61.52   0.77
 64 1630.71  22.52  622.11  22.35  605.09  21.84
 64 21442.36 30.57  1292.15 25.82  1282.67 25.55
 64 43186.79 42.59  2844.96 36.03  2529.69 30.06
 256 1   1760.96 58.07  1738.44 57.43  1695.99 56.19
 256 2   4834.23 95.19  3524.85 64.21  3511.94 64.45
 256 4   9324.63 145.74 8956.49 116.39 6720.17 73.86
 512 1   2678.03 84.1   2630.68 82.93  2636.54 82.57
 512 2   9368.17 195.61 9408.82 204.53 5316.3  92.99
 512 4   9186.34 209.68 9358.72 183.82 9489.29 160.42
 1024 1  3620.71 109.88 3625.54 109.83 3606.61 112.35
 1024 2  9429258.32 7082.79 120.55 7403.53 134.78
 1024 4  9430.66 290.44 9499.29 232.31 9414.6  190.92
 4096 1  9339.28 296.48 9374.23 372.88 9348.76 298.49
 4096 2  9410.53 378.69 9412.61 286.18 9409.75 278.31
 4096 4  9487.35 374.1  9556.91 288.81 9441.94 221.64
 16384 1 9380.43 403.8  9379.78 399.13 9382.42 393.55
 16384 2 9367.69 406.93 9415.04 312.68 9409.29 300.9
 16384 4 9391.96 405.17 9695.12 310.54 9423.76 223.47

Trying to understand the performance results:

What is the host device configuration?  tap + bridge?

Did you use host CPU affinity for the vhost threads?

Can multiqueue tap take advantage of multiqueue host NICs or is
virtio-net multiqueue unaware of the physical NIC multiqueue
capabilities?

The results seem pretty mixed - as a user it's not obvious what to
choose as a good all-round setting.  Any observations on how multiqueue
should be configured?

What is the norm statistic?

Stefan
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a

Re: [PATCH v5 01/12] KVM: ARM: Introduce KVM_SET_DEVICE_ADDRESS ioctl

2013-01-09 Thread Peter Maydell

On 9 January 2013 10:02, Alexander Graf ag...@suse.de wrote:
 I think we should make thus at least potentially generic. In fact, I wouldn't 
 even
 mind calling it DEV_REG with the same semantics as ONE_REG, just that it also
 gets a unique dev id that gets created during in-kernel device creation and 
 that
 it's a vm ioctl.

Well, we might want a DEV_REG, but you might as well just make ONE_REG
OK as a VM ioctl, since there's no reason not to have not-per-cpu but not
device registers. ONE_REG already supports dividing up the ID space so
you can say which device or whatever is being used, because we had
things like GIC registers in mind when we put that API together.

However, this shouldn't be DEV_REG, because this isn't actually setting state
in the irqchip device, it's configuring the kernel's part of the system model
[compare wiring up irq routing, which also has a custom ioctl rather than a
generic one]. As such it definitely needs to happen only before the VM is
actually started and not during VM execution, unlike a DEV_REG which would
presumably be generally valid.

 That way we wouldn't block our path to create two in-kernel irqchips one day.

Er, SET_DEVICE_ADDRESS doesn't block us doing that; that's why it has
an ID parameter.

The discussion at the KVM forum, as I recall it was basically:
 (a) some other ppc irqchips also want to set the base address for where
their memory mapped registers live, so this isn't a totally ARM specific
weirdness
 (b) we didn't need to tangle up and delay the KVM ARM work with a vague
and unspecified desire for general configurability

-- PMM
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v5 01/12] KVM: ARM: Introduce KVM_SET_DEVICE_ADDRESS ioctl

2013-01-09 Thread Alexander Graf


On 09.01.2013, at 15:48, Peter Maydell wrote:

 On 9 January 2013 10:02, Alexander Graf ag...@suse.de wrote:
 I think we should make thus at least potentially generic. In fact, I 
 wouldn't even
 mind calling it DEV_REG with the same semantics as ONE_REG, just that it also
 gets a unique dev id that gets created during in-kernel device creation and 
 that
 it's a vm ioctl.
 
 Well, we might want a DEV_REG, but you might as well just make ONE_REG
 OK as a VM ioctl, since there's no reason not to have not-per-cpu but not
 device registers. ONE_REG already supports dividing up the ID space so
 you can say which device or whatever is being used, because we had
 things like GIC registers in mind when we put that API together.

ONE_REG's address space today doesn't include a field for the target device, 
because that's already predetermined by the fd you run it on. Hence my 
suggestion to add it as separate field, because then we keep the id space 
mask-free.

 However, this shouldn't be DEV_REG, because this isn't actually setting state
 in the irqchip device, it's configuring the kernel's part of the system model
 [compare wiring up irq routing, which also has a custom ioctl rather than a
 generic one]. As such it definitely needs to happen only before the VM is
 actually started and not during VM execution, unlike a DEV_REG which would
 presumably be generally valid.

The same can be true for ONE_REG/SET_SREGS. On PPC we support setting the PVR 
(CPU type) via SREGS because ONE_REG only came later. Setting the CPU type only 
makes sense before you first start using a vcpu. After that point it should be 
forbidden.

So in a way, the irqchip memory address location is a device register, as it 
has all the properties that a register on that device would have. It is

  * per device
  * one id for one purpose
  * may or may not be writable during certain times

We would also keep the gates open to get a new register id one day for a second 
address. Imagine an interrupt controller that splits its address mappings into 
PIC base registers and MSI target registers, though the MSI registers 
logically belong to the PIC. Then the ioctl as designed here gets stuck too.

 
 That way we wouldn't block our path to create two in-kernel irqchips one day.
 
 Er, SET_DEVICE_ADDRESS doesn't block us doing that; that's why it has
 an ID parameter.
 
 The discussion at the KVM forum, as I recall it was basically:
 (a) some other ppc irqchips also want to set the base address for where
 their memory mapped registers live, so this isn't a totally ARM specific
 weirdness

Yes.

 (b) we didn't need to tangle up and delay the KVM ARM work with a vague
 and unspecified desire for general configurability

Yeah, that was the basic idea. Considering that the patch set hasn't been going 
in for another 2 months after that discussion indicates that this isn't too 
much of an issue though :).

In fact, I do recall myself pointing out that we need some padding in that 
ioctl to accomodate for later usages like the ones above. Then we could later 
rename (or copy name) it and everyone'd be happy.


Alex

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v5 01/12] KVM: ARM: Introduce KVM_SET_DEVICE_ADDRESS ioctl

2013-01-09 Thread Christoffer Dall

On Wed, Jan 9, 2013 at 10:11 AM, Peter Maydell peter.mayd...@linaro.org wrote:
 On 9 January 2013 14:58, Alexander Graf ag...@suse.de wrote:

 On 09.01.2013, at 15:48, Peter Maydell wrote:

 On 9 January 2013 10:02, Alexander Graf ag...@suse.de wrote:
 I think we should make thus at least potentially generic. In fact, I 
 wouldn't even
 mind calling it DEV_REG with the same semantics as ONE_REG, just that it 
 also
 gets a unique dev id that gets created during in-kernel device creation 
 and that
 it's a vm ioctl.

 Well, we might want a DEV_REG, but you might as well just make ONE_REG
 OK as a VM ioctl, since there's no reason not to have not-per-cpu but not
 device registers. ONE_REG already supports dividing up the ID space so
 you can say which device or whatever is being used, because we had
 things like GIC registers in mind when we put that API together.

 ONE_REG's address space today doesn't include a field for the target device,

 There's 16 bits of register group type.

 because that's already predetermined by the fd you run it on. Hence my
 suggestion to add it as separate field, because then we keep the id space
 mask-free.

 Er, it already has masks for the different parts of ID space. That's a 
 feature,
 not a bug.

 However, this shouldn't be DEV_REG, because this isn't actually setting 
 state
 in the irqchip device, it's configuring the kernel's part of the system 
 model
 [compare wiring up irq routing, which also has a custom ioctl rather than a
 generic one]. As such it definitely needs to happen only before the VM is
 actually started and not during VM execution, unlike a DEV_REG which would
 presumably be generally valid.

 The same can be true for ONE_REG/SET_SREGS. On PPC we support setting
 the PVR (CPU type) via SREGS because ONE_REG only came later. Setting
 the CPU type only makes sense before you first start using a vcpu. After that
 point it should be forbidden.

 So in a way, the irqchip memory address location is a device register, as it 
 has
 all the properties that a register on that device would have. It is

   * per device
   * one id for one purpose
   * may or may not be writable during certain times

 There's a distinction between things you need to do in machine setup
 and things you need to propagate for migration. It should be possible
 to do a migration by doing a complete copy of all ONE_REG (and DEV_REG
 if we have it) state from source to destination, so it needs to be always
 possible to write them.

 We would also keep the gates open to get a new register id one day for a 
 second
 address. Imagine an interrupt controller that splits its address mappings 
 into
 PIC base registers and MSI target registers, though the MSI registers 
 logically
 belong to the PIC. Then the ioctl as designed here gets stuck too.

 This is exactly what ARM's GIC has already. We have several memory regions
 (one for the distributor, one for the cpu interface), and map them with 
 several
 calls to SET_DEVICE_ADDRESS.

 (b) we didn't need to tangle up and delay the KVM ARM work with a vague
 and unspecified desire for general configurability

 Yeah, that was the basic idea. Considering that the patch set hasn't been 
 going
 in for another 2 months after that discussion indicates that this isn't too 
 much of
 an issue though :).

well, now it's actually going in, and it seems like this is the last
issue. Let's hold our breaths here for a second and let me write up a
new device attribute API and send that out, and if you guys can
possibly live with it, then please consider accepting it.

-Christoffer
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH qom-cpu 4/7] target-i386/cpu: Introduce FeatureWord typedefs

2013-01-09 Thread Gleb Natapov

On Mon, Jan 07, 2013 at 04:20:45PM -0200, Eduardo Habkost wrote:
 This introduces a FeatureWord enum, FeatureWordInfo struct (with
 generation information about a feature word), and a FeatureWordArray
 typedef, and changes add_flagname_to_bitmaps() code and
 cpu_x86_parse_featurestr() to use the new typedefs instead of separate
 variables for each feature word.
 
 This will help us keep the code at kvm_check_features_against_host(),
 cpu_x86_parse_featurestr() and add_flagname_to_bitmaps() sane while
 adding new feature name arrays.
 
 Signed-off-by: Eduardo Habkost ehabk...@redhat.com
 ---
  target-i386/cpu.c | 97 
 +++
  target-i386/cpu.h | 15 +
  2 files changed, 63 insertions(+), 49 deletions(-)
 
 diff --git a/target-i386/cpu.c b/target-i386/cpu.c
 index b09b625..7d62d48 100644
 --- a/target-i386/cpu.c
 +++ b/target-i386/cpu.c
 @@ -124,6 +124,20 @@ static const char *cpuid_7_0_ebx_feature_name[] = {
  NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
  };
  
 +typedef struct FeatureWordInfo {
 +const char **feat_names;
 +} FeatureWordInfo;
 +
 +static FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
 +[FEAT_1_EDX] = { .feat_names = feature_name },
 +[FEAT_1_ECX] = { .feat_names = ext_feature_name },
 +[FEAT_8000_0001_EDX] = { .feat_names = ext2_feature_name },
 +[FEAT_8000_0001_ECX] = { .feat_names = ext3_feature_name },
 +[FEAT_KVM]   = { .feat_names = kvm_feature_name },
 +[FEAT_SVM]   = { .feat_names = svm_feature_name },
 +[FEAT_7_0_EBX] = { .feat_names = cpuid_7_0_ebx_feature_name },
 +};
 +
  const char *get_register_name_32(unsigned int reg)
  {
  static const char *reg_names[CPU_NB_REGS32] = {
 @@ -271,23 +285,20 @@ static bool lookup_feature(uint32_t *pval, const char 
 *s, const char *e,
  return found;
  }
  
 -static void add_flagname_to_bitmaps(const char *flagname, uint32_t *features,
 -uint32_t *ext_features,
 -uint32_t *ext2_features,
 -uint32_t *ext3_features,
 -uint32_t *kvm_features,
 -uint32_t *svm_features,
 -uint32_t *cpuid_7_0_ebx_features)
 +static void add_flagname_to_bitmaps(const char *flagname,
 +FeatureWordArray words)
  {
 -if (!lookup_feature(features, flagname, NULL, feature_name) 
 -!lookup_feature(ext_features, flagname, NULL, ext_feature_name) 
 -!lookup_feature(ext2_features, flagname, NULL, ext2_feature_name) 
 -!lookup_feature(ext3_features, flagname, NULL, ext3_feature_name) 
 -!lookup_feature(kvm_features, flagname, NULL, kvm_feature_name) 
 -!lookup_feature(svm_features, flagname, NULL, svm_feature_name) 
 -!lookup_feature(cpuid_7_0_ebx_features, flagname, NULL,
 -cpuid_7_0_ebx_feature_name))
 -fprintf(stderr, CPU feature %s not found\n, flagname);
 +FeatureWord w;
 +for (w = 0; w  FEATURE_WORDS; w++) {
 +FeatureWordInfo *wi = feature_word_info[w];
 +if (wi-feat_names 
I would move patch 6 before that one and drop this check here, but if
you disagree it is your call.

 +lookup_feature(words[w], flagname, NULL, wi-feat_names)) {
 +break;
 +}
 +}
 +if (w == FEATURE_WORDS) {
 +fprintf(stderr, CPU feature %s not found\n, flagname);
 +}
  }
  
  typedef struct x86_def_t {
 @@ -1256,35 +1267,23 @@ static int cpu_x86_parse_featurestr(x86_def_t 
 *x86_cpu_def, char *features)
  unsigned int i;
  char *featurestr; /* Single 'key=value string being parsed */
  /* Features to be added */
 -uint32_t plus_features = 0, plus_ext_features = 0;
 -uint32_t plus_ext2_features = 0, plus_ext3_features = 0;
 -uint32_t plus_kvm_features = kvm_default_features, plus_svm_features = 0;
 -uint32_t plus_7_0_ebx_features = 0;
 +FeatureWordArray plus_features = {
 +[FEAT_KVM] = kvm_default_features,
 +};
  /* Features to be removed */
 -uint32_t minus_features = 0, minus_ext_features = 0;
 -uint32_t minus_ext2_features = 0, minus_ext3_features = 0;
 -uint32_t minus_kvm_features = 0, minus_svm_features = 0;
 -uint32_t minus_7_0_ebx_features = 0;
 +FeatureWordArray minus_features = { 0 };
  uint32_t numvalue;
  
 -add_flagname_to_bitmaps(hypervisor, plus_features,
 -plus_ext_features, plus_ext2_features, plus_ext3_features,
 -plus_kvm_features, plus_svm_features,  plus_7_0_ebx_features);
 +add_flagname_to_bitmaps(hypervisor, plus_features);
  
  featurestr = features ? strtok(features, ,) : NULL;
  
  while (featurestr) {
  char *val;
  if (featurestr[0] == '+') {
 -add_flagname_to_bitmaps(featurestr + 1, plus_features,
 -

Re: [PATCH v5 01/12] KVM: ARM: Introduce KVM_SET_DEVICE_ADDRESS ioctl

2013-01-09 Thread Peter Maydell

On 9 January 2013 14:58, Alexander Graf ag...@suse.de wrote:

 On 09.01.2013, at 15:48, Peter Maydell wrote:

 On 9 January 2013 10:02, Alexander Graf ag...@suse.de wrote:
 I think we should make thus at least potentially generic. In fact, I 
 wouldn't even
 mind calling it DEV_REG with the same semantics as ONE_REG, just that it 
 also
 gets a unique dev id that gets created during in-kernel device creation and 
 that
 it's a vm ioctl.

 Well, we might want a DEV_REG, but you might as well just make ONE_REG
 OK as a VM ioctl, since there's no reason not to have not-per-cpu but not
 device registers. ONE_REG already supports dividing up the ID space so
 you can say which device or whatever is being used, because we had
 things like GIC registers in mind when we put that API together.

 ONE_REG's address space today doesn't include a field for the target device,

There's 16 bits of register group type.

 because that's already predetermined by the fd you run it on. Hence my
 suggestion to add it as separate field, because then we keep the id space
 mask-free.

Er, it already has masks for the different parts of ID space. That's a feature,
not a bug.

 However, this shouldn't be DEV_REG, because this isn't actually setting state
 in the irqchip device, it's configuring the kernel's part of the system model
 [compare wiring up irq routing, which also has a custom ioctl rather than a
 generic one]. As such it definitely needs to happen only before the VM is
 actually started and not during VM execution, unlike a DEV_REG which would
 presumably be generally valid.

 The same can be true for ONE_REG/SET_SREGS. On PPC we support setting
 the PVR (CPU type) via SREGS because ONE_REG only came later. Setting
 the CPU type only makes sense before you first start using a vcpu. After that
 point it should be forbidden.

 So in a way, the irqchip memory address location is a device register, as it 
 has
 all the properties that a register on that device would have. It is

   * per device
   * one id for one purpose
   * may or may not be writable during certain times

There's a distinction between things you need to do in machine setup
and things you need to propagate for migration. It should be possible
to do a migration by doing a complete copy of all ONE_REG (and DEV_REG
if we have it) state from source to destination, so it needs to be always
possible to write them.

 We would also keep the gates open to get a new register id one day for a 
 second
 address. Imagine an interrupt controller that splits its address mappings into
 PIC base registers and MSI target registers, though the MSI registers 
 logically
 belong to the PIC. Then the ioctl as designed here gets stuck too.

This is exactly what ARM's GIC has already. We have several memory regions
(one for the distributor, one for the cpu interface), and map them with several
calls to SET_DEVICE_ADDRESS.

 (b) we didn't need to tangle up and delay the KVM ARM work with a vague
 and unspecified desire for general configurability

 Yeah, that was the basic idea. Considering that the patch set hasn't been 
 going
 in for another 2 months after that discussion indicates that this isn't too 
 much of
 an issue though :).

We might get there faster if people didn't keep nitpicking the APIs at the
last minute :-)

-- PMM
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v5 01/12] KVM: ARM: Introduce KVM_SET_DEVICE_ADDRESS ioctl

2013-01-09 Thread Alexander Graf


On 09.01.2013, at 16:11, Peter Maydell wrote:

 On 9 January 2013 14:58, Alexander Graf ag...@suse.de wrote:
 
 On 09.01.2013, at 15:48, Peter Maydell wrote:
 
 On 9 January 2013 10:02, Alexander Graf ag...@suse.de wrote:
 I think we should make thus at least potentially generic. In fact, I 
 wouldn't even
 mind calling it DEV_REG with the same semantics as ONE_REG, just that it 
 also
 gets a unique dev id that gets created during in-kernel device creation 
 and that
 it's a vm ioctl.
 
 Well, we might want a DEV_REG, but you might as well just make ONE_REG
 OK as a VM ioctl, since there's no reason not to have not-per-cpu but not
 device registers. ONE_REG already supports dividing up the ID space so
 you can say which device or whatever is being used, because we had
 things like GIC registers in mind when we put that API together.
 
 ONE_REG's address space today doesn't include a field for the target device,
 
 There's 16 bits of register group type.

That's register group, not device id. It's constant per id. If you have an 
ARM register, it will always stay an ARM register. It's not going to change. 
But if you reuse those bits for the device id, they are going to change, 
because a device id is similar to a handle that should be returned by the 
device create ioctl.

 
 because that's already predetermined by the fd you run it on. Hence my
 suggestion to add it as separate field, because then we keep the id space
 mask-free.
 
 Er, it already has masks for the different parts of ID space. That's a 
 feature,
 not a bug.

All those masks are constant masks and only used to make reading the IDs 
easier. They never change within a single type of a register.

 
 However, this shouldn't be DEV_REG, because this isn't actually setting 
 state
 in the irqchip device, it's configuring the kernel's part of the system 
 model
 [compare wiring up irq routing, which also has a custom ioctl rather than a
 generic one]. As such it definitely needs to happen only before the VM is
 actually started and not during VM execution, unlike a DEV_REG which would
 presumably be generally valid.
 
 The same can be true for ONE_REG/SET_SREGS. On PPC we support setting
 the PVR (CPU type) via SREGS because ONE_REG only came later. Setting
 the CPU type only makes sense before you first start using a vcpu. After that
 point it should be forbidden.
 
 So in a way, the irqchip memory address location is a device register, as it 
 has
 all the properties that a register on that device would have. It is
 
  * per device
  * one id for one purpose
  * may or may not be writable during certain times
 
 There's a distinction between things you need to do in machine setup
 and things you need to propagate for migration. It should be possible
 to do a migration by doing a complete copy of all ONE_REG (and DEV_REG
 if we have it) state from source to destination, so it needs to be always
 possible to write them.

You need to somewhere encode a list of registers you want to synchronize for 
migration anyways. Don't include this register in that list then.

 
 We would also keep the gates open to get a new register id one day for a 
 second
 address. Imagine an interrupt controller that splits its address mappings 
 into
 PIC base registers and MSI target registers, though the MSI registers 
 logically
 belong to the PIC. Then the ioctl as designed here gets stuck too.
 
 This is exactly what ARM's GIC has already. We have several memory regions
 (one for the distributor, one for the cpu interface), and map them with 
 several
 calls to SET_DEVICE_ADDRESS.

I see.

 
 (b) we didn't need to tangle up and delay the KVM ARM work with a vague
 and unspecified desire for general configurability
 
 Yeah, that was the basic idea. Considering that the patch set hasn't been 
 going
 in for another 2 months after that discussion indicates that this isn't too 
 much of
 an issue though :).
 
 We might get there faster if people didn't keep nitpicking the APIs at the
 last minute :-)

I guess we just had too much bad experience with designing horrible APIs in the 
past on the PPC side ;).


Alex

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v8 2/3] x86, apicv: add virtual interrupt delivery support

2013-01-09 Thread Marcelo Tosatti

On Wed, Jan 09, 2013 at 08:07:31AM +, Zhang, Yang Z wrote:
 if(check_request(KVM_REQ_, )) {
 ioapic_lock();  (*)
 update local EOI exit bitmap from IOAPIC
 In my patch, it traverses IOAPIC entry once and only updates target vcpus's 
 eoi exit bitmap. Then make request for all vcpus.
 With your suggestion , all vcpus will traverse all IOAPIC entries. Though 
 ioapic entry write is rare, it's still not reasonable.

It should be fast, and very rare (as in once during system
initialization, or device hotplug). 

Is there a particular case that makes it necessary to optimize scanning?

 
 ioapic_unlock();
 }
  
  Fine by me. Looks simpler.
  
  
  
  (*) plus any other lock that paths that update the map take
  
  
  
  
  
  
  Best regards,
  Yang
  
  --
  Gleb.
 
 
 Best regards,
 Yang
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH qom-cpu 0/7] disable kvm_mmu + -cpu enforce fixes (v3)

2013-01-09 Thread Gleb Natapov

On Mon, Jan 07, 2013 at 04:20:41PM -0200, Eduardo Habkost wrote:
 Changes on v3:
  - Patches 3-9 from v2 are now already on qom-cpu tree
  - Remove CONFIG_KVM #ifdefs by declaring fake KVM_* #defines on sysemu/kvm.h
  - Refactor code that uses the feature word arrays
(to make it easier to add a new feature name array)
  - Add feature name array for CPUID leaf 0xC001
 
 Changes on v2:
  - Now both the kvm_mmu-disable and -cpu enforce changes are on the same
series
  - Coding style fixes
 
 Git tree for reference:
   git://github.com/ehabkost/qemu-hacks.git cpu-enforce-all.v3
   https://github.com/ehabkost/qemu-hacks/tree/cpu-enforce-all.v3
 
 The changes are a bit intrusive, but:
 
  - The longer we take to make enforce strict as it should (and make libvirt
finally use it), more users will have VMs with migration-unsafe 
 unpredictable
guest ABIs. For this reason, I would like to get this into QEMU 1.4.
  - The changes in this series should affect only users that are already using
the enforce flag, and I believe whoever is using the enforce flag 
 really
want the strict behavior introduced by this series.
 
 
 
Reviewed-by: Gleb Natapov g...@redhat.com

Small comment on patch 4. Fill free to ignore.

 Eduardo Habkost (7):
   kvm: Add fake KVM constants to avoid #ifdefs on KVM-specific code
   target-i386: Don't set any KVM flag by default if KVM is disabled
   target-i386: Disable kvm_mmu by default
   target-i386/cpu: Introduce FeatureWord typedefs
   target-i386: kvm_check_features_against_host(): Use feature_word_info
   target-i386/cpu.c: Add feature name array for ext4_features
   target-i386: check/enforce: Check all feature words
 
  include/sysemu/kvm.h |  14 
  target-i386/cpu.c| 193 
 ---
  target-i386/cpu.h|  15 
  3 files changed, 150 insertions(+), 72 deletions(-)
 
 -- 
 1.7.11.7

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 01/12] tap: multiqueue support

2013-01-09 Thread Jason Wang

On 01/09/2013 05:56 PM, Stefan Hajnoczi wrote:
 On Fri, Dec 28, 2012 at 06:31:53PM +0800, Jason Wang wrote:
 diff --git a/qapi-schema.json b/qapi-schema.json
 index 5dfa052..583eb7c 100644
 --- a/qapi-schema.json
 +++ b/qapi-schema.json
 @@ -2465,7 +2465,7 @@
  { 'type': 'NetdevTapOptions',
'data': {
  '*ifname': 'str',
 -'*fd': 'str',
 +'*fd': ['String'],
 This change is not backwards-compatible.  You need to add a '*fds':
 ['String'] field instead.

I'm not quite understand this case, I think it still work when we we
just specify one fd.
  '*script': 'str',
  '*downscript': 'str',
  '*helper': 'str',
 @@ -2473,7 +2473,8 @@
  '*vnet_hdr':   'bool',
  '*vhost':  'bool',
  '*vhostfd':'str',
 -'*vhostforce': 'bool' } }
 +'*vhostforce': 'bool',
 +'*queues': 'uint32'} }
 The 'queues' parameter should not be necessary when fd passing is used
 since we can learn the number of queues by looking at the list length.

Ok.

 Stefan

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] [PATCH 10/12] virtio-net: multiqueue support

2013-01-09 Thread Jason Wang

On 01/09/2013 06:01 PM, Wanlong Gao wrote:
 On 01/09/2013 05:30 PM, Jason Wang wrote:
 On 01/09/2013 04:23 PM, Wanlong Gao wrote:
 On 01/08/2013 06:14 PM, Jason Wang wrote:
 On 01/08/2013 06:00 PM, Wanlong Gao wrote:
 On 01/08/2013 05:51 PM, Jason Wang wrote:
 On 01/08/2013 05:49 PM, Wanlong Gao wrote:
 On 01/08/2013 05:29 PM, Jason Wang wrote:
 On 01/08/2013 05:07 PM, Wanlong Gao wrote:
 On 12/28/2012 06:32 PM, Jason Wang wrote:
 +} else if (nc-peer-info-type !=  
 NET_CLIENT_OPTIONS_KIND_TAP) {
 +ret = -1;
 +} else {
 +ret = tap_detach(nc-peer);
 +}
 +
 +return ret;
 +}
 +
 +static void virtio_net_set_queues(VirtIONet *n)
 +{
 +int i;
 +
 +for (i = 0; i  n-max_queues; i++) {
 +if (i  n-curr_queues) {
 +assert(!peer_attach(n, i));
 +} else {
 +assert(!peer_detach(n, i));
 I got a assert here,
 qemu-system-x86_64: /work/git/qemu/hw/virtio-net.c:330: 
 virtio_net_set_queues: Assertion `!peer_detach(n, i)' failed.

 Any thoughts?

 Thanks,
 Wanlong Gao
 Thanks for the testing, which steps or cases did you met this 
 assertion,
 migration, reboot or just changing the number of virtqueues?
 I use the 3.8-rc2 to test it again, I saw this tag has the multi-tap 
 support.

 I just can't start the QEMU use  -netdev 
 tap,id=hostnet0,queues=2,fd=%d,fd=%d -device 
 virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:ce:7b:29,bus=pci.0,addr=0x3

 I pre-opened two tap fds, did I missing something?
 Nothing missed :) It should work.

 Could you please try not use fd=X and let qemu to create the file
 descriptors by itself? Btw, how did you create the two tap fds?
 Can it create descriptors itself? I get 
 qemu-system-x86_64: -netdev tap,id=hostnet0,queues=2: Device 'tap' could 
 not be initialized
 You need prepare an ifup script which default at /etc/qemu-ifup (like
 following). Or you may try to add a script=no after:

 #!/bin/sh

 switch=kvmbr0

 /sbin/ifconfig $1 0.0.0.0 up
 /usr/sbin/brctl addif $switch $1
 /usr/sbin/brctl stp $switch off

 This will let qemu create a tap fd itself and make it to be connected to
 a port of the bridge caled kvmbr0.
 But how to support multi-queue in this way?
 Qemu will create the necessary multiqueue tap by itself, see patch 0/12.
 I got guest kernel panic when using this way and set queues=4.
 Does it happens w/o or w/ a fd parameter? What's the qemu command line?
 Did you meet it during boot time?
 The QEMU command line is 

 /work/git/qemu/x86_64-softmmu/qemu-system-x86_64 -name f17 -M pc-0.15 
 -enable-kvm -m 3096 \
 -smp 4,sockets=4,cores=1,threads=1 \
 -uuid c31a9f3e-4161-c53a-339c-5dc36d0497cb -no-user-config -nodefaults \
 -chardev 
 socket,id=charmonitor,path=/var/lib/libvirt/qemu/f17.monitor,server,nowait \
 -mon chardev=charmonitor,id=monitor,mode=control \
 -rtc base=utc -no-shutdown \
 -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0xb,num_queues=4,hotplug=on \
 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 \
 -drive file=/vm/f17.img,if=none,id=drive-virtio-disk0,format=qcow2 \
 -device 
 virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
  \
 -drive file=/vm2/f17-kernel.img,if=none,id=drive-virtio-disk1,format=qcow2 \
 -device 
 virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk1,id=virtio-disk1
  \
 -drive file=/vm/virtio-scsi/scsi3.img,if=none,id=drive-scsi0-0-2-0,format=raw 
 \
 -device 
 scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=2,drive=drive-scsi0-0-2-0,id=scsi0-0-2-0,removable=on
  \
 -drive file=/vm/virtio-scsi/scsi4.img,if=none,id=drive-scsi0-0-3-0,format=raw 
 \
 -device 
 scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=3,drive=drive-scsi0-0-3-0,id=scsi0-0-3-0
  \
 -drive file=/vm/virtio-scsi/scsi1.img,if=none,id=drive-scsi0-0-0-0,format=raw 
 \
 -device 
 scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0
  \
 -drive file=/vm/virtio-scsi/scsi2.img,if=none,id=drive-scsi0-0-1-0,format=raw 
 \
 -device 
 scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=1,drive=drive-scsi0-0-1-0,id=scsi0-0-1-0
  \
 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 
 \
 -chardev file,id=charserial1,path=/vm/f17.log \
 -device isa-serial,chardev=charserial1,id=serial1 \
 -device usb-tablet,id=input0 -vga std \
 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 \
 -netdev tap,id=hostnet0,vhost=on,queues=4 \
 -device 
 virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:ce:7b:29,bus=pci.0,addr=0x3
  \
 -monitor stdio

 I got panic just after booting the system, did nothing,  waited for a while, 
 the guest panicked.

 [   28.053004] BUG: soft lockup - CPU#1 stuck for 23s! [ip:592]
 [   28.053004] Modules linked in: ip6t_REJECT nf_conntrack_ipv6 
 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables uinput joydev 
 microcode virtio_balloon pcspkr virtio_net i2c_piix4 i2c_core virtio_scsi

Re: [PATCH v5 01/12] KVM: ARM: Introduce KVM_SET_DEVICE_ADDRESS ioctl

2013-01-09 Thread Alexander Graf


On 09.01.2013, at 16:22, Marc Zyngier wrote:

 On Wed, 9 Jan 2013 15:11:39 +, Peter Maydell
 peter.mayd...@linaro.org
 wrote:
 On 9 January 2013 14:58, Alexander Graf ag...@suse.de wrote:
 
 Yeah, that was the basic idea. Considering that the patch set hasn't
 been going
 in for another 2 months after that discussion indicates that this isn't
 too much of
 an issue though :).
 
 We might get there faster if people didn't keep nitpicking the APIs at
 the
 last minute :-)
 
 Exactly. We're trying hard to get the damned thing merged, and the
 permanent API churn quickly becomes a burden.

As I said earlier, we have had a lot of experience in creating really bad APIs 
in the past.

But how about making this one specific? Call it KVM_ARM_SET_VGIC_ADDRESS, keep 
the rest as it is, resend it, and later we can come up with an actually generic 
interface.


Alex

 
 My understanding was that a consensus was reached 2 months ago. Has
 something fundamentally changed? Something that makes this API invalid? If
 not, can we please keep this one?
 
 Thanks,
 
M.
 -- 
 Who you jivin' with that Cosmik Debris?

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] [PATCH 00/12] Multiqueue virtio-net

2013-01-09 Thread Michael S. Tsirkin

On Wed, Jan 09, 2013 at 03:29:24PM +0100, Stefan Hajnoczi wrote:
 On Fri, Dec 28, 2012 at 06:31:52PM +0800, Jason Wang wrote:
  Perf Numbers:
  
  Two Intel Xeon 5620 with direct connected intel 82599EB
  Host/Guest kernel: David net tree
  vhost enabled
  
  - lots of improvents of both latency and cpu utilization in request-reponse 
  test
  - get regression of guest sending small packets which because TCP tends to 
  batch
less when the latency were improved
  
  1q/2q/4q
  TCP_RR
   size #sessions trans.rate  norm trans.rate  norm trans.rate  norm
  1 1 9393.26   595.64  9408.18   597.34  9375.19   584.12
  1 2072162.1   2214.24 129880.22 2456.13 196949.81 2298.13
  1 50107513.38 2653.99 139721.93 2490.58 259713.82 2873.57
  1 100   126734.63 2676.54 145553.5  2406.63 265252.68 2943
  64 19453.42   632.33  9371.37   616.13  9338.19   615.97
  64 20   70620.03  2093.68 125155.75 2409.15 191239.91 2253.32
  64 50   1069662448.29 146518.67 2514.47 242134.07 2720.91
  64 100  117046.35 2394.56 190153.09 2696.82 238881.29 2704.41
  256 1   8733.29   736.36  8701.07   680.83  8608.92   530.1
  256 20  69279.89  2274.45 115103.07 2299.76 144555.16 1963.53
  256 50  97676.02  2296.09 150719.57 2522.92 254510.5  3028.44
  256 100 150221.55 2949.56 197569.3  2790.92 300695.78 3494.83
  TCP_CRR
   size #sessions trans.rate  norm trans.rate  norm trans.rate  norm
  1 1 2848.37  163.41 2230.39  130.89 2013.09  120.47
  1 2023434.5  562.11 31057.43 531.07 49488.28 564.41
  1 5028514.88 582.17 40494.23 605.92 60113.35 654.97
  1 100   28827.22 584.73 48813.25 661.6  61783.62 676.56
  64 12780.08  159.4  2201.07  127.96 2006.8   117.63
  64 20   23318.51 564.47 30982.44 530.24 49734.95 566.13
  64 50   28585.72 582.54 40576.7  610.08 60167.89 656.56
  64 100  28747.37 584.17 49081.87 667.87 60612.94 662
  256 1   2772.08  160.51 2231.84  131.05 2003.62  113.45
  256 20  23086.35 559.8  30929.09 528.16 48454.9  555.22
  256 50  28354.7  579.85 40578.31 60760261.71 657.87
  256 100 28844.55 585.67 48541.86 659.08 61941.07 676.72
  TCP_STREAM guest receiving
   size #sessions throughput  norm throughput  norm throughput  norm
  1 1 16.27   1.33   16.11.12   16.13   0.99
  1 2 33.04   2.08   32.96   2.19   32.75   1.98
  1 4 66.62   6.83   68.35.56   66.14   2.65
  64 1896.55  56.67  914.02  58.14  898.9   61.56
  64 21830.46 91.02  1812.02 64.59  1835.57 66.26
  64 43626.61 142.55 3636.25 100.64 3607.46 75.03
  256 1   2619.49 131.23 2543.19 129.03 2618.69 132.39
  256 2   5136.58 203.02 5163.31 141.11 5236.51 149.4
  256 4   7063.99 242.83 9365.4  208.49 9421.03 159.94
  512 1   3592.43 165.24 3603.12 167.19 3552.5  169.57
  512 2   7042.62 246.59 7068.46 180.87 7258.52 186.3
  512 4   6996.08 241.49 9298.34 206.12 9418.52 159.33
  1024 1  4339.54 192.95 4370.2  191.92 4211.72 192.49
  1024 2  7439.45 254.77 9403.99 215.24 9120.82 222.67
  1024 4  7953.86 272.11 9403.87 208.23 9366.98 159.49
  4096 1  7696.28 272.04 7611.41 270.38 7778.71 267.76
  4096 2  7530.35 261.1  8905.43 246.27 8990.18 267.57
  4096 4  7121.6  247.02 9411.75 206.71 9654.96 184.67
  16384 1 7795.73 268.54 7780.94 267.2  7634.26 260.73
  16384 2 7436.57 255.81 9381.86 220.85 9392220.36
  16384 4 7199.07 247.81 9420.96 205.87 9373.69 159.57
  TCP_MAERTS guest sending
   size #sessions throughput  norm throughput  norm throughput  norm
  1 1 15.94   0.62   15.55   0.61   15.13   0.59
  1 2 36.11   0.83   32.46   0.69   32.28   0.69
  1 4 71.59   1  68.91   0.94   61.52   0.77
  64 1630.71  22.52  622.11  22.35  605.09  21.84
  64 21442.36 30.57  1292.15 25.82  1282.67 25.55
  64 43186.79 42.59  2844.96 36.03  2529.69 30.06
  256 1   1760.96 58.07  1738.44 57.43  1695.99 56.19
  256 2   4834.23 95.19  3524.85 64.21  3511.94 64.45
  256 4   9324.63 145.74 8956.49 116.39 6720.17 73.86
  512 1   2678.03 84.1   2630.68 82.93  2636.54 82.57
  512 2   9368.17 195.61 9408.82 204.53 5316.3  92.99
  512 4   9186.34 209.68 9358.72 183.82 9489.29 160.42
  1024 1  3620.71 109.88 3625.54 109.83 3606.61 112.35
  1024 2  9429258.32 7082.79 120.55 7403.53 134.78
  1024 4  9430.66 290.44 9499.29 232.31 9414.6  190.92
  4096 1  9339.28 296.48 9374.23 372.88 9348.76 298.49
  4096 2  9410.53 378.69 9412.61 286.18 9409.75 278.31
  4096 4  9487.35 374.1  9556.91 288.81 9441.94 221.64
  16384 1 9380.43 403.8  9379.78 399.13 9382.42 393.55
  16384 2 9367.69 406.93 9415.04 312.68 9409.29 300.9
  16384 4 9391.96 405.17 9695.12 310.54 9423.76 223.47
 
 Trying to understand the performance results:
 
 What is the host device configuration?  tap + bridge?
 
 Did you use host CPU affinity for the vhost threads?
 
 Can multiqueue tap take advantage of multiqueue host NICs or is
 virtio-net multiqueue unaware of the physical NIC multiqueue
 capabilities?
 
 The results seem pretty mixed - as a user it's not obvious what to
 choose as a good all-round setting.

Yes, this I

Re: [Qemu-devel] [PATCH 00/12] Multiqueue virtio-net

2013-01-09 Thread Jason Wang

On 01/09/2013 11:32 PM, Michael S. Tsirkin wrote:
 On Wed, Jan 09, 2013 at 03:29:24PM +0100, Stefan Hajnoczi wrote:
 On Fri, Dec 28, 2012 at 06:31:52PM +0800, Jason Wang wrote:
 Perf Numbers:

 Two Intel Xeon 5620 with direct connected intel 82599EB
 Host/Guest kernel: David net tree
 vhost enabled

 - lots of improvents of both latency and cpu utilization in request-reponse 
 test
 - get regression of guest sending small packets which because TCP tends to 
 batch
   less when the latency were improved

 1q/2q/4q
 TCP_RR
  size #sessions trans.rate  norm trans.rate  norm trans.rate  norm
 1 1 9393.26   595.64  9408.18   597.34  9375.19   584.12
 1 2072162.1   2214.24 129880.22 2456.13 196949.81 2298.13
 1 50107513.38 2653.99 139721.93 2490.58 259713.82 2873.57
 1 100   126734.63 2676.54 145553.5  2406.63 265252.68 2943
 64 19453.42   632.33  9371.37   616.13  9338.19   615.97
 64 20   70620.03  2093.68 125155.75 2409.15 191239.91 2253.32
 64 50   1069662448.29 146518.67 2514.47 242134.07 2720.91
 64 100  117046.35 2394.56 190153.09 2696.82 238881.29 2704.41
 256 1   8733.29   736.36  8701.07   680.83  8608.92   530.1
 256 20  69279.89  2274.45 115103.07 2299.76 144555.16 1963.53
 256 50  97676.02  2296.09 150719.57 2522.92 254510.5  3028.44
 256 100 150221.55 2949.56 197569.3  2790.92 300695.78 3494.83
 TCP_CRR
  size #sessions trans.rate  norm trans.rate  norm trans.rate  norm
 1 1 2848.37  163.41 2230.39  130.89 2013.09  120.47
 1 2023434.5  562.11 31057.43 531.07 49488.28 564.41
 1 5028514.88 582.17 40494.23 605.92 60113.35 654.97
 1 100   28827.22 584.73 48813.25 661.6  61783.62 676.56
 64 12780.08  159.4  2201.07  127.96 2006.8   117.63
 64 20   23318.51 564.47 30982.44 530.24 49734.95 566.13
 64 50   28585.72 582.54 40576.7  610.08 60167.89 656.56
 64 100  28747.37 584.17 49081.87 667.87 60612.94 662
 256 1   2772.08  160.51 2231.84  131.05 2003.62  113.45
 256 20  23086.35 559.8  30929.09 528.16 48454.9  555.22
 256 50  28354.7  579.85 40578.31 60760261.71 657.87
 256 100 28844.55 585.67 48541.86 659.08 61941.07 676.72
 TCP_STREAM guest receiving
  size #sessions throughput  norm throughput  norm throughput  norm
 1 1 16.27   1.33   16.11.12   16.13   0.99
 1 2 33.04   2.08   32.96   2.19   32.75   1.98
 1 4 66.62   6.83   68.35.56   66.14   2.65
 64 1896.55  56.67  914.02  58.14  898.9   61.56
 64 21830.46 91.02  1812.02 64.59  1835.57 66.26
 64 43626.61 142.55 3636.25 100.64 3607.46 75.03
 256 1   2619.49 131.23 2543.19 129.03 2618.69 132.39
 256 2   5136.58 203.02 5163.31 141.11 5236.51 149.4
 256 4   7063.99 242.83 9365.4  208.49 9421.03 159.94
 512 1   3592.43 165.24 3603.12 167.19 3552.5  169.57
 512 2   7042.62 246.59 7068.46 180.87 7258.52 186.3
 512 4   6996.08 241.49 9298.34 206.12 9418.52 159.33
 1024 1  4339.54 192.95 4370.2  191.92 4211.72 192.49
 1024 2  7439.45 254.77 9403.99 215.24 9120.82 222.67
 1024 4  7953.86 272.11 9403.87 208.23 9366.98 159.49
 4096 1  7696.28 272.04 7611.41 270.38 7778.71 267.76
 4096 2  7530.35 261.1  8905.43 246.27 8990.18 267.57
 4096 4  7121.6  247.02 9411.75 206.71 9654.96 184.67
 16384 1 7795.73 268.54 7780.94 267.2  7634.26 260.73
 16384 2 7436.57 255.81 9381.86 220.85 9392220.36
 16384 4 7199.07 247.81 9420.96 205.87 9373.69 159.57
 TCP_MAERTS guest sending
  size #sessions throughput  norm throughput  norm throughput  norm
 1 1 15.94   0.62   15.55   0.61   15.13   0.59
 1 2 36.11   0.83   32.46   0.69   32.28   0.69
 1 4 71.59   1  68.91   0.94   61.52   0.77
 64 1630.71  22.52  622.11  22.35  605.09  21.84
 64 21442.36 30.57  1292.15 25.82  1282.67 25.55
 64 43186.79 42.59  2844.96 36.03  2529.69 30.06
 256 1   1760.96 58.07  1738.44 57.43  1695.99 56.19
 256 2   4834.23 95.19  3524.85 64.21  3511.94 64.45
 256 4   9324.63 145.74 8956.49 116.39 6720.17 73.86
 512 1   2678.03 84.1   2630.68 82.93  2636.54 82.57
 512 2   9368.17 195.61 9408.82 204.53 5316.3  92.99
 512 4   9186.34 209.68 9358.72 183.82 9489.29 160.42
 1024 1  3620.71 109.88 3625.54 109.83 3606.61 112.35
 1024 2  9429258.32 7082.79 120.55 7403.53 134.78
 1024 4  9430.66 290.44 9499.29 232.31 9414.6  190.92
 4096 1  9339.28 296.48 9374.23 372.88 9348.76 298.49
 4096 2  9410.53 378.69 9412.61 286.18 9409.75 278.31
 4096 4  9487.35 374.1  9556.91 288.81 9441.94 221.64
 16384 1 9380.43 403.8  9379.78 399.13 9382.42 393.55
 16384 2 9367.69 406.93 9415.04 312.68 9409.29 300.9
 16384 4 9391.96 405.17 9695.12 310.54 9423.76 223.47
 Trying to understand the performance results:

 What is the host device configuration?  tap + bridge?

Yes.

 Did you use host CPU affinity for the vhost threads?

I use numactl to pin cpu threads and vhost threads in the same numa node.
 Can multiqueue tap take advantage of multiqueue host NICs or is
 virtio-net multiqueue unaware of the physical NIC multiqueue
 capabilities?

Tap is unware of the physical multiqueue NIC, but we can benefit from it
since we

Re: [PATCH v5 01/12] KVM: ARM: Introduce KVM_SET_DEVICE_ADDRESS ioctl

2013-01-09 Thread Marc Zyngier

On Wed, 9 Jan 2013 15:11:39 +, Peter Maydell
peter.mayd...@linaro.org
wrote:
 On 9 January 2013 14:58, Alexander Graf ag...@suse.de wrote:

 Yeah, that was the basic idea. Considering that the patch set hasn't
 been going
 in for another 2 months after that discussion indicates that this isn't
 too much of
 an issue though :).
 
 We might get there faster if people didn't keep nitpicking the APIs at
the
 last minute :-)

Exactly. We're trying hard to get the damned thing merged, and the
permanent API churn quickly becomes a burden.

My understanding was that a consensus was reached 2 months ago. Has
something fundamentally changed? Something that makes this API invalid? If
not, can we please keep this one?

Thanks,

M.
-- 
Who you jivin' with that Cosmik Debris?
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v5 01/12] KVM: ARM: Introduce KVM_SET_DEVICE_ADDRESS ioctl

2013-01-09 Thread Alexander Graf


On 09.01.2013, at 16:50, Marc Zyngier wrote:

 On Wed, 9 Jan 2013 16:28:03 +0100, Alexander Graf ag...@suse.de wrote:
 On 09.01.2013, at 16:22, Marc Zyngier wrote:
 
 On Wed, 9 Jan 2013 15:11:39 +, Peter Maydell
 peter.mayd...@linaro.org
 wrote:
 On 9 January 2013 14:58, Alexander Graf ag...@suse.de wrote:
 
 Yeah, that was the basic idea. Considering that the patch set hasn't
 been going
 in for another 2 months after that discussion indicates that this
 isn't
 too much of
 an issue though :).
 
 We might get there faster if people didn't keep nitpicking the APIs at
 the
 last minute :-)
 
 Exactly. We're trying hard to get the damned thing merged, and the
 permanent API churn quickly becomes a burden.
 
 As I said earlier, we have had a lot of experience in creating really
 bad
 APIs in the past.
 
 Is this one really bad? Again, what changed in the meantime that makes you
 think this API is wrong?

I complained about it 2 months ago already.

 But how about making this one specific? Call it
 KVM_ARM_SET_VGIC_ADDRESS,
 keep the rest as it is, resend it, and later we can come up with an
 actually generic interface.
 
 Let's pretend you never wrote that, shall we? ;-)

Why? The worst thing that happened to us so far were interfaces that looked 
generic and extensible and turned out not to be (see SET_SREGS in the ppc 
code). We have 2 options to circumvent this:

  1) Commonly agree on an interface that actually is extensible and use it 
throughout the code
  2) Create a very specific interface for a single use case, keep it 
implemented forever and as soon as we need more, implement a more generic one 
that goes back to 1

Since the ARM patches have been out of tree for too long, I'm happy with going 
route 2 until we go back to square 1.


Alex

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v5 01/12] KVM: ARM: Introduce KVM_SET_DEVICE_ADDRESS ioctl

2013-01-09 Thread Marc Zyngier

On 09/01/13 15:56, Alexander Graf wrote:
 
 On 09.01.2013, at 16:50, Marc Zyngier wrote:
 
 On Wed, 9 Jan 2013 16:28:03 +0100, Alexander Graf ag...@suse.de wrote:
 On 09.01.2013, at 16:22, Marc Zyngier wrote:

 On Wed, 9 Jan 2013 15:11:39 +, Peter Maydell
 peter.mayd...@linaro.org
 wrote:
 On 9 January 2013 14:58, Alexander Graf ag...@suse.de wrote:

 Yeah, that was the basic idea. Considering that the patch set hasn't
 been going
 in for another 2 months after that discussion indicates that this
 isn't
 too much of
 an issue though :).

 We might get there faster if people didn't keep nitpicking the APIs at
 the
 last minute :-)

 Exactly. We're trying hard to get the damned thing merged, and the
 permanent API churn quickly becomes a burden.

 As I said earlier, we have had a lot of experience in creating really
 bad
 APIs in the past.

 Is this one really bad? Again, what changed in the meantime that makes you
 think this API is wrong?
 
 I complained about it 2 months ago already.
 
 But how about making this one specific? Call it
 KVM_ARM_SET_VGIC_ADDRESS,
 keep the rest as it is, resend it, and later we can come up with an
 actually generic interface.

 Let's pretend you never wrote that, shall we? ;-)
 
 Why? The worst thing that happened to us so far were interfaces that looked 
 generic and extensible and turned out not to be (see SET_SREGS in the ppc 
 code). We have 2 options to circumvent this:
 
   1) Commonly agree on an interface that actually is extensible and use it 
 throughout the code
   2) Create a very specific interface for a single use case, keep it 
 implemented forever and as soon as we need more, implement a more generic 
 one that goes back to 1
 
 Since the ARM patches have been out of tree for too long, I'm happy with 
 going route 2 until we go back to square 1.

I really don't want to see that. Either we keep the API as it is, or we
change it for something that is really better and used by other
architectures. No point in turning it upside down if we're the only one
doing it.

Oh, and as we're aiming for 3.9, it'd better be ready soon...

M.
-- 
Jazz is not dead. It just smells funny...

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v5 01/12] KVM: ARM: Introduce KVM_SET_DEVICE_ADDRESS ioctl

2013-01-09 Thread Marc Zyngier

On Wed, 9 Jan 2013 16:28:03 +0100, Alexander Graf ag...@suse.de wrote:
 On 09.01.2013, at 16:22, Marc Zyngier wrote:
 
 On Wed, 9 Jan 2013 15:11:39 +, Peter Maydell
 peter.mayd...@linaro.org
 wrote:
 On 9 January 2013 14:58, Alexander Graf ag...@suse.de wrote:
 
 Yeah, that was the basic idea. Considering that the patch set hasn't
 been going
 in for another 2 months after that discussion indicates that this
isn't
 too much of
 an issue though :).
 
 We might get there faster if people didn't keep nitpicking the APIs at
 the
 last minute :-)
 
 Exactly. We're trying hard to get the damned thing merged, and the
 permanent API churn quickly becomes a burden.
 
 As I said earlier, we have had a lot of experience in creating really
bad
 APIs in the past.

Is this one really bad? Again, what changed in the meantime that makes you
think this API is wrong?

 But how about making this one specific? Call it
KVM_ARM_SET_VGIC_ADDRESS,
 keep the rest as it is, resend it, and later we can come up with an
 actually generic interface.

Let's pretend you never wrote that, shall we? ;-)

M.
-- 
Fast, cheap, reliable. Pick two.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v5.1 1/2] KVM: ARM: Introduce KVM_SET_DEVICE_ADDRESS ioctl

2013-01-09 Thread Christoffer Dall

On ARM (and possibly other architectures) some bits are specific to the
model being emulated for the guest and user space needs a way to tell
the kernel about those bits.  An example is mmio device base addresses,
where KVM must know the base address for a given device to properly
emulate mmio accesses within a certain address range or directly map a
device with virtualiation extensions into the guest address space.

We try to make this API slightly more generic than for our specific use,
but so far only the VGIC uses this feature.

Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
---
 Documentation/virtual/kvm/api.txt |   37 +
 arch/arm/include/uapi/asm/kvm.h   |   13 +
 arch/arm/kvm/arm.c|   23 ++-
 include/uapi/linux/kvm.h  |8 
 4 files changed, 80 insertions(+), 1 deletion(-)

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index 38066a7a..c2ebd9e 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2206,6 +2206,43 @@ This ioctl returns the guest registers that are 
supported for the
 KVM_GET_ONE_REG/KVM_SET_ONE_REG calls.
 
 
+4.80 KVM_ARM_SET_DEVICE_ADDR
+
+Capability: KVM_CAP_ARM_SET_DEVICE_ADDR
+Architectures: arm
+Type: vm ioctl
+Parameters: struct kvm_arm_device_address (in)
+Returns: 0 on success, -1 on error
+Errors:
+  ENODEV: The device id is unknown
+  ENXIO:  Device not supported on current system
+  EEXIST: Address already set
+  E2BIG:  Address outside guest physical address space
+
+struct kvm_arm_device_addr {
+   __u64 id;
+   __u64 addr;
+};
+
+Specify a device address in the guest's physical address space where guests
+can access emulated or directly exposed devices, which the host kernel needs
+to know about. The id field is an architecture specific identifier for a
+specific device.
+
+ARM divides the id field into two parts, a device id and an address type id
+specific to the individual device.
+
+  bits:  | 63...   32 | 31...16 | 15...0 |
+  field: |0x  | device id   |  addr type id  |
+
+ARM currently only require this when using the in-kernel GIC support for the
+hardware VGIC features, using KVM_ARM_DEVICE_VGIC_V2 as the device id.  When
+setting the base address for the guest's mapping of the VGIC virtual CPU
+and distributor interface, the ioctl must be called after calling
+KVM_CREATE_IRQCHIP, but before calling KVM_RUN on any of the VCPUs.  Calling
+this ioctl twice for any of the base addresses will return -EEXIST.
+
+
 5. The kvm_run structure
 
 
diff --git a/arch/arm/include/uapi/asm/kvm.h b/arch/arm/include/uapi/asm/kvm.h
index 73b9615..09911a7 100644
--- a/arch/arm/include/uapi/asm/kvm.h
+++ b/arch/arm/include/uapi/asm/kvm.h
@@ -65,6 +65,19 @@ struct kvm_regs {
 #define KVM_ARM_TARGET_CORTEX_A15  0
 #define KVM_ARM_NUM_TARGETS1
 
+/* KVM_SET_DEVICE_ADDRESS ioctl id encoding */
+#define KVM_DEVICE_TYPE_SHIFT  0
+#define KVM_DEVICE_TYPE_MASK   (0x  KVM_DEVICE_TYPE_SHIFT)
+#define KVM_DEVICE_ID_SHIFT16
+#define KVM_DEVICE_ID_MASK (0x  KVM_DEVICE_ID_SHIFT)
+
+/* Supported device IDs */
+#define KVM_ARM_DEVICE_VGIC_V2 0
+
+/* Supported VGIC address types  */
+#define KVM_VGIC_V2_ADDR_TYPE_DIST 0
+#define KVM_VGIC_V2_ADDR_TYPE_CPU  1
+
 struct kvm_vcpu_init {
__u32 target;
__u32 features[7];
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index f42d828..ce6e455 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -165,6 +165,8 @@ int kvm_dev_ioctl_check_extension(long ext)
case KVM_CAP_COALESCED_MMIO:
r = KVM_COALESCED_MMIO_PAGE_OFFSET;
break;
+   case KVM_CAP_ARM_SET_DEVICE_ADDR:
+   r = 1;
case KVM_CAP_NR_VCPUS:
r = num_online_cpus();
break;
@@ -805,10 +807,29 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct 
kvm_dirty_log *log)
return -EINVAL;
 }
 
+static int kvm_vm_ioctl_set_device_addr(struct kvm *kvm,
+   struct kvm_arm_device_addr *dev_addr)
+{
+   return -ENODEV;
+}
+
 long kvm_arch_vm_ioctl(struct file *filp,
   unsigned int ioctl, unsigned long arg)
 {
-   return -EINVAL;
+   struct kvm *kvm = filp-private_data;
+   void __user *argp = (void __user *)arg;
+
+   switch (ioctl) {
+   case KVM_ARM_SET_DEVICE_ADDR: {
+   struct kvm_arm_device_addr dev_addr;
+
+   if (copy_from_user(dev_addr, argp, sizeof(dev_addr)))
+   return -EFAULT;
+   return kvm_vm_ioctl_set_device_addr(kvm, dev_addr);
+   }
+   default:
+   return -EINVAL;
+   }
 }
 
 static void cpu_init_hyp_mode(void *vector)
diff --git a/include/uapi/linux/kvm.h

[PATCH v5.1 2/2] ARM: KVM: VGIC accept vcpu and dist base addresses from user space

2013-01-09 Thread Christoffer Dall

User space defines the model to emulate to a guest and should therefore
decide which addresses are used for both the virtual CPU interface
directly mapped in the guest physical address space and for the emulated
distributor interface, which is mapped in software by the in-kernel VGIC
support.

Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
---
 Documentation/virtual/kvm/api.txt |1 +
 arch/arm/include/asm/kvm_vgic.h   |9 +
 arch/arm/include/uapi/asm/kvm.h   |3 ++
 arch/arm/kvm/arm.c|   14 
 arch/arm/kvm/vgic.c   |   62 +
 5 files changed, 88 insertions(+), 1 deletion(-)

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index c2ebd9e..497fd48 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2218,6 +2218,7 @@ Errors:
   ENXIO:  Device not supported on current system
   EEXIST: Address already set
   E2BIG:  Address outside guest physical address space
+  EBUSY:  Address overlaps with other device range
 
 struct kvm_arm_device_addr {
__u64 id;
diff --git a/arch/arm/include/asm/kvm_vgic.h b/arch/arm/include/asm/kvm_vgic.h
index fcfd530..270dcd2 100644
--- a/arch/arm/include/asm/kvm_vgic.h
+++ b/arch/arm/include/asm/kvm_vgic.h
@@ -22,6 +22,9 @@
 #include asm/hardware/gic.h
 
 struct vgic_dist {
+   /* Distributor and vcpu interface mapping in the guest */
+   phys_addr_t vgic_dist_base;
+   phys_addr_t vgic_cpu_base;
 };
 
 struct vgic_cpu {
@@ -33,6 +36,7 @@ struct kvm_run;
 struct kvm_exit_mmio;
 
 #ifdef CONFIG_KVM_ARM_VGIC
+int kvm_vgic_set_addr(struct kvm *kvm, unsigned long type, u64 addr);
 bool vgic_handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run,
  struct kvm_exit_mmio *mmio);
 
@@ -42,6 +46,11 @@ static inline int kvm_vgic_hyp_init(void)
return 0;
 }
 
+static inline int kvm_vgic_set_addr(struct kvm *kvm, unsigned long type, u64 
addr)
+{
+   return 0;
+}
+
 static inline int kvm_vgic_init(struct kvm *kvm)
 {
return 0;
diff --git a/arch/arm/include/uapi/asm/kvm.h b/arch/arm/include/uapi/asm/kvm.h
index 09911a7..94e893b 100644
--- a/arch/arm/include/uapi/asm/kvm.h
+++ b/arch/arm/include/uapi/asm/kvm.h
@@ -78,6 +78,9 @@ struct kvm_regs {
 #define KVM_VGIC_V2_ADDR_TYPE_DIST 0
 #define KVM_VGIC_V2_ADDR_TYPE_CPU  1
 
+#define KVM_VGIC_V2_DIST_SIZE  0x1000
+#define KVM_VGIC_V2_CPU_SIZE   0x2000
+
 struct kvm_vcpu_init {
__u32 target;
__u32 features[7];
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index fcb0dfc..b4e7571 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -858,7 +858,19 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct 
kvm_dirty_log *log)
 static int kvm_vm_ioctl_set_device_addr(struct kvm *kvm,
struct kvm_arm_device_addr *dev_addr)
 {
-   return -ENODEV;
+   unsigned long dev_id, type;
+
+   dev_id = (dev_addr-id  KVM_DEVICE_ID_MASK)  KVM_DEVICE_ID_SHIFT;
+   type = (dev_addr-id  KVM_DEVICE_TYPE_MASK)  KVM_DEVICE_TYPE_SHIFT;
+
+   switch (dev_id) {
+   case KVM_ARM_DEVICE_VGIC_V2:
+   if (!vgic_present)
+   return -ENXIO;
+   return kvm_vgic_set_addr(kvm, type, dev_addr-addr);
+   default:
+   return -ENODEV;
+   }
 }
 
 long kvm_arch_vm_ioctl(struct file *filp,
diff --git a/arch/arm/kvm/vgic.c b/arch/arm/kvm/vgic.c
index 1feee5a..cdb7671 100644
--- a/arch/arm/kvm/vgic.c
+++ b/arch/arm/kvm/vgic.c
@@ -22,6 +22,9 @@
 #include linux/io.h
 #include asm/kvm_emulate.h
 
+#define VGIC_ADDR_UNDEF(-1)
+#define IS_VGIC_ADDR_UNDEF(_x)  ((_x) == VGIC_ADDR_UNDEF)
+
 #define ACCESS_READ_VALUE  (1  0)
 #define ACCESS_READ_RAZ(0  0)
 #define ACCESS_READ_MASK(x)((x)  (1  0))
@@ -142,3 +145,62 @@ bool vgic_handle_mmio(struct kvm_vcpu *vcpu, struct 
kvm_run *run,
 {
return KVM_EXIT_MMIO;
 }
+
+static bool vgic_ioaddr_overlap(struct kvm *kvm)
+{
+   phys_addr_t dist = kvm-arch.vgic.vgic_dist_base;
+   phys_addr_t cpu = kvm-arch.vgic.vgic_cpu_base;
+
+   if (IS_VGIC_ADDR_UNDEF(dist) || IS_VGIC_ADDR_UNDEF(cpu))
+   return 0;
+   if ((dist = cpu  dist + KVM_VGIC_V2_DIST_SIZE  cpu) ||
+   (cpu = dist  cpu + KVM_VGIC_V2_CPU_SIZE  dist))
+   return -EBUSY;
+   return 0;
+}
+
+static int vgic_ioaddr_assign(struct kvm *kvm, phys_addr_t *ioaddr,
+ phys_addr_t addr, phys_addr_t size)
+{
+   int ret;
+
+   if (!IS_VGIC_ADDR_UNDEF(*ioaddr))
+   return -EEXIST;
+   if (addr + size  addr)
+   return -EINVAL;
+
+   ret = vgic_ioaddr_overlap(kvm);
+   if (ret)
+   return ret;
+   *ioaddr = addr;
+   return ret;
+}
+
+int kvm_vgic_set_addr(struct kvm *kvm, unsigned long type, u64

[PATCH v5.1 0/2] KVM: ARM: Rename KVM_SET_DEVICE_ADDRESS

2013-01-09 Thread Christoffer Dall

Renames the KVM_SET_DEVICE_ADDRESS to KVM_ARM_SET_DEVICE_ADDR
to make it obvious that this is ARM specific in lack of a better generic
interface.

Once we agree on a better interface the KVM/ARM code can also take
advantage of that, but until then we don't want to hold up the KVM/ARM
patches.

A new complete series of these patches including this change can be
pulled from:
  git://github.com/virtualopensystems/linux-kvm-arm.git
  kvm-arm-v15-devaddr-v2

-Christoffer

---

Christoffer Dall (2):
  KVM: ARM: Introduce KVM_SET_DEVICE_ADDRESS ioctl
  ARM: KVM: VGIC accept vcpu and dist base addresses from user space


 Documentation/virtual/kvm/api.txt   |   38 ++
 arch/arm/include/asm/hardware/gic.h |   25 
 arch/arm/include/asm/kvm_host.h |   18 +++
 arch/arm/include/asm/kvm_vgic.h |   89 +++
 arch/arm/include/uapi/asm/kvm.h |   16 +++
 arch/arm/kvm/Makefile   |1 
 arch/arm/kvm/arm.c  |   92 +++-
 arch/arm/kvm/interrupts.S   |4 +
 arch/arm/kvm/mmio.c |3 +
 arch/arm/kvm/vgic.c |  206 +++
 include/uapi/linux/kvm.h|8 +
 11 files changed, 498 insertions(+), 2 deletions(-)
 create mode 100644 arch/arm/include/asm/kvm_vgic.h
 create mode 100644 arch/arm/kvm/vgic.c

-- 
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v5 01/12] KVM: ARM: Introduce KVM_SET_DEVICE_ADDRESS ioctl

2013-01-09 Thread Christoffer Dall

On Wed, Jan 9, 2013 at 11:12 AM, Marc Zyngier marc.zyng...@arm.com wrote:
 On 09/01/13 15:56, Alexander Graf wrote:

 On 09.01.2013, at 16:50, Marc Zyngier wrote:

 On Wed, 9 Jan 2013 16:28:03 +0100, Alexander Graf ag...@suse.de wrote:
 On 09.01.2013, at 16:22, Marc Zyngier wrote:

 On Wed, 9 Jan 2013 15:11:39 +, Peter Maydell
 peter.mayd...@linaro.org
 wrote:
 On 9 January 2013 14:58, Alexander Graf ag...@suse.de wrote:

 Yeah, that was the basic idea. Considering that the patch set hasn't
 been going
 in for another 2 months after that discussion indicates that this
 isn't
 too much of
 an issue though :).

 We might get there faster if people didn't keep nitpicking the APIs at
 the
 last minute :-)

 Exactly. We're trying hard to get the damned thing merged, and the
 permanent API churn quickly becomes a burden.

 As I said earlier, we have had a lot of experience in creating really
 bad
 APIs in the past.

 Is this one really bad? Again, what changed in the meantime that makes you
 think this API is wrong?

 I complained about it 2 months ago already.

 But how about making this one specific? Call it
 KVM_ARM_SET_VGIC_ADDRESS,
 keep the rest as it is, resend it, and later we can come up with an
 actually generic interface.

 Let's pretend you never wrote that, shall we? ;-)

 Why? The worst thing that happened to us so far were interfaces that looked 
 generic and extensible and turned out not to be (see SET_SREGS in the ppc 
 code). We have 2 options to circumvent this:

   1) Commonly agree on an interface that actually is extensible and use it 
 throughout the code
   2) Create a very specific interface for a single use case, keep it 
 implemented forever and as soon as we need more, implement a more generic 
 one that goes back to 1

 Since the ARM patches have been out of tree for too long, I'm happy with 
 going route 2 until we go back to square 1.

 I really don't want to see that. Either we keep the API as it is, or we
 change it for something that is really better and used by other
 architectures. No point in turning it upside down if we're the only one
 doing it.

 Oh, and as we're aiming for 3.9, it'd better be ready soon...

ok, I renamed it to KVM_ARM_SET_DEVICE_ADDR, using the same binary
format, so user space tools remain compatible.

I'll be happy to help out with a more generic API, once the kvm/arm
patches reach upstream.

Thanks so far!
-Christoffer
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v5 12/12] ARM: KVM: Add VGIC configuration option

2013-01-09 Thread Christoffer Dall

On Wed, Jan 9, 2013 at 8:28 AM, Sergei Shtylyov sshtyl...@mvista.com wrote:
 Hello.


 On 08-01-2013 22:42, Christoffer Dall wrote:

 From: Marc Zyngier marc.zyng...@arm.com


 It is now possible to select the VGIC configuration option.


 Signed-off-by: Marc Zyngier marc.zyng...@arm.com
 Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
 ---
   arch/arm/kvm/Kconfig |8 
   1 file changed, 8 insertions(+)


 diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
 index 05227cb..d32e33f 100644
 --- a/arch/arm/kvm/Kconfig
 +++ b/arch/arm/kvm/Kconfig
 @@ -51,6 +51,14 @@ config KVM_ARM_MAX_VCPUS
   large, so only choose a reasonable number that you expect to
   actually use.

 +config KVM_ARM_VGIC
 +bool KVM support for Virtual GIC


Please indent with tab, as below.



fixed both, thanks.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [kvmarm] [PATCH v5.1 0/2] KVM: ARM: Rename KVM_SET_DEVICE_ADDRESS

2013-01-09 Thread Alexander Graf


On 09.01.2013, at 17:26, Christoffer Dall wrote:

 Renames the KVM_SET_DEVICE_ADDRESS to KVM_ARM_SET_DEVICE_ADDR
 to make it obvious that this is ARM specific in lack of a better generic
 interface.
 
 Once we agree on a better interface the KVM/ARM code can also take
 advantage of that, but until then we don't want to hold up the KVM/ARM
 patches.

Works for me. Scott, are you happy with this one too? We can start to introduce 
(and fix ARM) with a generic ioctl in the MPIC patches then.


Alex

 
 A new complete series of these patches including this change can be
 pulled from:
  git://github.com/virtualopensystems/linux-kvm-arm.git
  kvm-arm-v15-devaddr-v2
 
 -Christoffer
 
 ---
 
 Christoffer Dall (2):
  KVM: ARM: Introduce KVM_SET_DEVICE_ADDRESS ioctl
  ARM: KVM: VGIC accept vcpu and dist base addresses from user space
 
 
 Documentation/virtual/kvm/api.txt   |   38 ++
 arch/arm/include/asm/hardware/gic.h |   25 
 arch/arm/include/asm/kvm_host.h |   18 +++
 arch/arm/include/asm/kvm_vgic.h |   89 +++
 arch/arm/include/uapi/asm/kvm.h |   16 +++
 arch/arm/kvm/Makefile   |1 
 arch/arm/kvm/arm.c  |   92 +++-
 arch/arm/kvm/interrupts.S   |4 +
 arch/arm/kvm/mmio.c |3 +
 arch/arm/kvm/vgic.c |  206 +++
 include/uapi/linux/kvm.h|8 +
 11 files changed, 498 insertions(+), 2 deletions(-)
 create mode 100644 arch/arm/include/asm/kvm_vgic.h
 create mode 100644 arch/arm/kvm/vgic.c
 
 -- 
 ___
 kvmarm mailing list
 kvm...@lists.cs.columbia.edu
 https://lists.cs.columbia.edu/cucslists/listinfo/kvmarm

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

RE: Guest performance is reduced after live migration

2013-01-09 Thread Mark Petersen

Here's what I have found so far...

Ubuntu 10.04 performed within +/- 2% so I'm not including those results.It 
seems that it's more of an issue of disk access, so I'm going to run some more 
disk specific benchmarks and I'll those results later.   I'd be happy to run 
any other perf tests that might help track down the problem as well.

Qemu command line:
/usr/bin/kvm -name one-3 -S -M pc-1.2 -cpu Westmere -enable-kvm -m 73728 -smp 
16,sockets=2,cores=8,threads=1 -uuid 3ebea329-cfbb-3447-0b49-b41e078a3ede 
-no-user-config -nodefaults -chardev 
socket,id=charmonitor,path=/var/lib/libvirt/qemu/one-3.monitor,server,nowait 
-mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew 
-no-kvm-pit-reinjection -no-shutdown -device 
piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive 
file=/var/lib/one//datastores/0/3/disk.0,if=none,id=drive-virtio-disk0,format=raw,cache=none
 -device 
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
 -drive 
file=/var/lib/one//datastores/0/3/disk.1,if=none,id=drive-ide0-0-0,readonly=on,format=raw
 -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -netdev 
tap,fd=23,id=hostnet0,vhost=on,vhostfd=25 -device 
virtio-net-pci,netdev=hostnet0,id=net0,mac=02:00:0a:64:02:fe,bus=pci.0,addr=0x3 
-vnc 0.0.0.0:3,password -vga cirrus -incoming tcp:0.0.0.0:49155 -device 
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5

Initial Boot Benchmarks
===
Huge Page Usage
Physical Host: 2627584kB
QEMU Process: 2478080kB

Using phoronix pts/compuational
ImageMagick - 31.54s
Linux Kernel 3.1 - 43.91s
Mplayer - 30.49s
PHP - 22.25s


After Live Migration Benchmarks
===
Huge Page Usage
Physical Host: 3174400kB
QEMU Process: 3151872kB

Using phoronix pts/compilation
ImageMagick - 43.29s
Linux Kernel 3.1 - 76.67s
Mplayer - 45.41s
PHP - 29.1s



-Original Message-
From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On Behalf Of 
Mark Petersen
Sent: Wednesday, January 02, 2013 7:32 PM
To: Marcelo Tosatti
Cc: kvm@vger.kernel.org; shouta.ueh...@jp.yokogawa.com
Subject: RE: Guest performance is reduced after live migration

I believe I disabled huge pages on the guest and host previously, but I'll test 
a few scenarios and look at transparent hugepage usage specifically again over 
the next couple days and report back.


Below is a command line used for testing.

/usr/bin/kvm - qemu-x86_64

/usr/bin/kvm -name one-483 -S -M pc-1.2 -cpu Westmere -enable-kvm -m 73728 -smp 
16,sockets=2,cores=8,threads=1 -uuid a844146a-0d72-a661-fe6c-cb6b7a4ba240 
-no-user-config -nodefaults -chardev 
socket,id=charmonitor,path=/var/lib/libvirt/qemu/one-483.monitor,server,nowait 
-mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown 
-device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive 
file=/var/lib/one//datastores/0/483/disk.0,if=none,id=drive-virtio-disk0,format=raw,cache=none
 -device 
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
 -drive 
file=/var/lib/one//datastores/0/483/disk.1,if=none,id=drive-ide0-0-0,readonly=on,format=raw
 -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -netdev 
tap,fd=26,id=hostnet0,vhost=on,vhostfd=27 -device 
virtio-net-pci,netdev=hostnet0,id=net0,mac=02:00:0a:02:02:4b,bus=pci.0,addr=0x3 
-vnc 0.0.0.0 -vga cirrus -device 
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5


-Original Message-
From: Marcelo Tosatti [mailto:mtosa...@redhat.com]
Sent: Wednesday, January 02, 2013 6:49 PM
To: Mark Petersen
Cc: kvm@vger.kernel.org; shouta.ueh...@jp.yokogawa.com
Subject: Re: Guest performance is reduced after live migration

On Wed, Jan 02, 2013 at 11:56:11PM +, Mark Petersen wrote:
 I don't think it's related to huge pages...
 
 I was using phoronix-test-suite to run benchmarks.  The 'batch/compilation' 
 group shows the slowdown for all tests, the 'batch/computation' show some 
 performance degradation, but not nearly as significant.

Huge pages in the host, for the qemu-kvm process, i mean.
Without huge pages backing guest memory in the host, 4k EPT TLB entries will be 
used instead of 2MB EPT TLB entries.

 You could probably easily test this way without phoronix -  Start a VM with 
 almost nothing running.  Download mainline Linux kernel, compile.  This takes 
 about 45 seconds in my case (72GB memory, 16 virtual CPUs, idle physical host 
 running this VM.)  Run as many times as you want, still takes ~45 seconds.
 
 Migrate to a new idle host, kernel compile now takes ~90 seconds, wait
 3 hours (should give khugepaged a change to do its thing I imagine),

Please verify its the case (by checking how much memory is backed by hugepages).

http://www.mjmwired.net/kernel/Documentation/vm/transhuge.txt
Monitoring Usage.


 kernel compiles still take 90 seconds.  Reboot virtual machine (run 'shutdown 
 -r now', reboot, whatever.)  First compile

Re: [PATCH 1/2] VFIO-AER: Vfio-pci driver changes for supporting AER

2013-01-09 Thread Alex Williamson

On Wed, 2013-01-09 at 06:26 +, Pandarathil, Vijaymohan R wrote:
   - New ioctl which is used to pass the eventfd that is signaled when
   an error occurs in the vfio_pci_device
 
   - Register pci_error_handler for the vfio_pci driver
 
   - When the device encounters an error, the error handler registered by
   the vfio_pci driver gets invoked by the AER infrastructure
 
   - In the error handler, signal the eventfd registered for the device.
 
   - This results in the qemu eventfd handler getting invoked and
   appropriate action taken for the guest.
 
 Signed-off-by: Vijay Mohan Pandarathil vijaymohan.pandarat...@hp.com
 ---
  drivers/vfio/pci/vfio_pci.c | 29 +
  drivers/vfio/pci/vfio_pci_private.h |  1 +
  drivers/vfio/vfio.c |  8 
  include/linux/vfio.h|  1 +
  include/uapi/linux/vfio.h   |  9 +
  5 files changed, 48 insertions(+)
 
 diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
 index 6c11994..4ae9526 100644
 --- a/drivers/vfio/pci/vfio_pci.c
 +++ b/drivers/vfio/pci/vfio_pci.c
 @@ -207,6 +207,8 @@ static long vfio_pci_ioctl(void *device_data,
   if (vdev-reset_works)
   info.flags |= VFIO_DEVICE_FLAGS_RESET;
  
 + info.flags |= VFIO_DEVICE_FLAGS_AER_NOTIFY;
 +

This appears to be a PCI specific flag, so the name should include
_PCI_.  We also support non-PCIe devices and it seems like it would be
possible to not have AER support available, so shouldn't this be
conditional?

   info.num_regions = VFIO_PCI_NUM_REGIONS;
   info.num_irqs = VFIO_PCI_NUM_IRQS;
  
 @@ -348,6 +350,19 @@ static long vfio_pci_ioctl(void *device_data,
  
   return ret;
  
 + } else if (cmd == VFIO_DEVICE_SET_ERRFD) {
 + int32_t fd = (int32_t)arg;
 +
 + if (fd  0)
 + return -EINVAL;
 +
 + vdev-err_trigger = eventfd_ctx_fdget(fd);
 +
 + if (IS_ERR(vdev-err_trigger))
 + return PTR_ERR(vdev-err_trigger);
 +
 + return 0;
 +

I'm not sure why we wouldn't describe this as just another interrupt
from the device and configure it via SET_IRQ.  This ioctl has very
limited use and doesn't follow any of the conventions of all the other
vfio ioctls.

   } else if (cmd == VFIO_DEVICE_RESET)
   return vdev-reset_works ?
   pci_reset_function(vdev-pdev) : -EINVAL;
 @@ -527,11 +542,25 @@ static void vfio_pci_remove(struct pci_dev *pdev)
   kfree(vdev);
  }
  
 +static pci_ers_result_t vfio_err_detected(struct pci_dev *pdev,
 + pci_channel_state_t state)
 +{
 + struct vfio_pci_device *vdev = vfio_get_vdev(pdev-dev);
 +
 + eventfd_signal(vdev-err_trigger, 1);
 + return PCI_ERS_RESULT_CAN_RECOVER;
 +}

What if err_trigger hasn't been set?

 +
 +static const struct pci_error_handlers vfio_err_handlers = {
 + .error_detected = vfio_err_detected,
 +};
 +
  static struct pci_driver vfio_pci_driver = {
   .name   = vfio-pci,
   .id_table   = NULL, /* only dynamic ids */
   .probe  = vfio_pci_probe,
   .remove = vfio_pci_remove,
 + .err_handler= vfio_err_handlers,
  };
  
  static void __exit vfio_pci_cleanup(void)
 diff --git a/drivers/vfio/pci/vfio_pci_private.h 
 b/drivers/vfio/pci/vfio_pci_private.h
 index 611827c..daee62f 100644
 --- a/drivers/vfio/pci/vfio_pci_private.h
 +++ b/drivers/vfio/pci/vfio_pci_private.h
 @@ -55,6 +55,7 @@ struct vfio_pci_device {
   boolbardirty;
   struct pci_saved_state  *pci_saved_state;
   atomic_trefcnt;
 + struct eventfd_ctx  *err_trigger;
  };
  
  #define is_intx(vdev) (vdev-irq_type == VFIO_PCI_INTX_IRQ_INDEX)
 diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
 index 56097c6..5ed5a54 100644
 --- a/drivers/vfio/vfio.c
 +++ b/drivers/vfio/vfio.c
 @@ -693,6 +693,14 @@ void *vfio_del_group_dev(struct device *dev)
  }
  EXPORT_SYMBOL_GPL(vfio_del_group_dev);
  
 +void *vfio_get_vdev(struct device *dev)
 +{
 + struct vfio_device *device = dev_get_drvdata(dev);
 +
 + return device-device_data;
 +}
 +EXPORT_SYMBOL_GPL(vfio_get_vdev);
 +

This is unsafe.  How do we know dev is a vfio device?  How do we keep
that drvdata valid while you're using it?  I think you want to export
the existing vfio_group_get_device() and vfio_device_put().  Thanks,

Alex

  /**
   * VFIO base fd, /dev/vfio/vfio
   */
 diff --git a/include/linux/vfio.h b/include/linux/vfio.h
 index ab9e862..3c97b03 100644
 --- a/include/linux/vfio.h
 +++ b/include/linux/vfio.h
 @@ -45,6 +45,7 @@ extern int vfio_add_group_dev(struct device *dev,
 void *device_data);
  
  extern void *vfio_del_group_dev(struct device *dev);
 +extern void *vfio_get_vdev(struct device *dev);
  
  /**
   *

Re: [PATCH 2/2] QEMU-AER: Qemu changes to support AER for VFIO-PCI devices

2013-01-09 Thread Alex Williamson

On Wed, 2013-01-09 at 06:26 +, Pandarathil, Vijaymohan R wrote:
   - Create eventfd per vfio device assigned to a guest and register an
   event handler
 
   - This fd is passed to the vfio_pci driver through a new ioctl
 
   - When the device encounters an error, the eventfd is signaled
   and the qemu eventfd handler gets invoked.
 
   - In the handler decide what action to take. Current action taken
   is to terminate the guest.
 
 Signed-off-by: Vijay Mohan Pandarathil vijaymohan.pandarat...@hp.com
 ---
  hw/vfio_pci.c  | 56 
 ++
  linux-headers/linux/vfio.h |  9 
  2 files changed, 65 insertions(+)
 
 diff --git a/hw/vfio_pci.c b/hw/vfio_pci.c
 index 28c8303..9c3c28b 100644
 --- a/hw/vfio_pci.c
 +++ b/hw/vfio_pci.c
 @@ -38,6 +38,7 @@
  #include qemu/error-report.h
  #include qemu/queue.h
  #include qemu/range.h
 +#include sysemu/sysemu.h
  
  /* #define DEBUG_VFIO */
  #ifdef DEBUG_VFIO
 @@ -130,6 +131,8 @@ typedef struct VFIODevice {
  QLIST_ENTRY(VFIODevice) next;
  struct VFIOGroup *group;
  bool reset_works;
 +EventNotifier errfd;

Misleading name, it's an EventNotifier not an fd.

 +__u32 dev_info_flags;
  } VFIODevice;
  
  typedef struct VFIOGroup {
 @@ -1805,6 +1808,8 @@ static int vfio_get_device(VFIOGroup *group, const char 
 *name, VFIODevice *vdev)
  DPRINTF(Device %s flags: %u, regions: %u, irgs: %u\n, name,
  dev_info.flags, dev_info.num_regions, dev_info.num_irqs);
  
 +vdev-dev_info_flags = dev_info.flags;
 +

We test the VFIO_DEVICE_FLAGS_RESET elsewhere by caching the bit into a
variable, why not do that here too?

  if (!(dev_info.flags  VFIO_DEVICE_FLAGS_PCI)) {
  error_report(vfio: Um, this isn't a PCI device\n);
  goto error;
 @@ -1900,6 +1905,55 @@ static void vfio_put_device(VFIODevice *vdev)
  }
  }
  
 +static void vfio_errfd_handler(void *opaque)
 +{
 +VFIODevice *vdev = opaque;
 +
 +if (!event_notifier_test_and_clear(vdev-errfd)) {
 +return;
 +}
 +
 +/*
 + * TBD. Retrieve the error details and decide what action
 + * needs to be taken. One of the actions could be to pass
 + * the error to the guest and have the guest driver recover
 + * the error. This requires that PCIe capabilities be
 + * exposed to the guest. At present, we just terminate the
 + * guest to contain the error.
 + */
 +error_report(%s(%04x:%02x:%02x.%x) 
 +Unrecoverable error detected... Terminating guest\n,
 +__func__, vdev-host.domain, vdev-host.bus, vdev-host.slot,
 +vdev-host.function);
 +
 +qemu_system_shutdown_request();

I would have figured hw_error

 +return;
 +}
 +
 +static void vfio_register_errfd(VFIODevice *vdev)
 +{
 +int32_t pfd;

pfd is used elsewhere in vfio as Pointer to FD, this is just a fd.

 +int ret;
 +
 +if (!(vdev-dev_info_flags  VFIO_DEVICE_FLAGS_AER_NOTIFY)) {
 +error_report(vfio: Warning: Error notification not supported for 
 the device\n);

This should probably just be a debug print.

 +return;
 +}
 +if (event_notifier_init(vdev-errfd, 0)) {
 +error_report(vfio: Warning: Unable to init event notifier for error 
 detection\n);
 +return;
 +}
 +pfd = event_notifier_get_fd(vdev-errfd);
 +qemu_set_fd_handler(pfd, vfio_errfd_handler, NULL, vdev);
 +
 +ret = ioctl(vdev-fd, VFIO_DEVICE_SET_ERRFD, pfd);
 +if (ret) {
 +error_report(vfio: Warning: Failed to setup error fd: %d\n, ret);
 +qemu_set_fd_handler(pfd, NULL, NULL, vdev);
 +event_notifier_cleanup(vdev-errfd);
 +}
 +return;
 +}
  static int vfio_initfn(PCIDevice *pdev)
  {
  VFIODevice *pvdev, *vdev = DO_UPCAST(VFIODevice, pdev, pdev);
 @@ -2010,6 +2064,8 @@ static int vfio_initfn(PCIDevice *pdev)
  }
  }
  
 +vfio_register_errfd(vdev);
 +

No cleanup in exitfn?!  Thanks,

Alex

  return 0;
  
  out_teardown:
 diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
 index 4758d1b..0ca4eeb 100644
 --- a/linux-headers/linux/vfio.h
 +++ b/linux-headers/linux/vfio.h
 @@ -147,6 +147,7 @@ struct vfio_device_info {
   __u32   flags;
  #define VFIO_DEVICE_FLAGS_RESET  (1  0)/* Device supports 
 reset */
  #define VFIO_DEVICE_FLAGS_PCI(1  1)/* vfio-pci device */
 +#define VFIO_DEVICE_FLAGS_AER_NOTIFY (1  2)   /* Supports aer notification 
 */
   __u32   num_regions;/* Max region index + 1 */
   __u32   num_irqs;   /* Max IRQ index + 1 */
  };
 @@ -288,6 +289,14 @@ struct vfio_irq_set {
   */
  #define VFIO_DEVICE_RESET_IO(VFIO_TYPE, VFIO_BASE + 11)
  
 +/**
 + * VFIO_DEVICE_SET_ERRFD - _IO(VFIO_TYPE, VFIO_BASE + 12)
 + *
 + * Pass the eventfd to the vfio-pci driver for signalling any device
 + * error notifications
 + */
 +#define VFIO_DEVICE_SET_ERRFD   _IO(VFIO_TYPE, VFIO_BASE

[RFC 02/12] target-i386: Don't set any KVM flag by default if KVM is disabled

2013-01-09 Thread Eduardo Habkost

This is a cleanup that tries to solve two small issues:

 - We don't need a separate kvm_pv_eoi_features variable just to keep a
   constant calculated at compile-time, and this style would require
   adding a separate variable (that's declared twice because of the
   CONFIG_KVM ifdef) for each feature that's going to be enabled/disable
   by machine-type compat code.
 - The pc-1.3 code is setting the kvm_pv_eoi flag on cpuid_kvm_features
   even when KVM is disabled at runtime. This small incosistency in
   the cpuid_kvm_features field isn't a problem today because
   cpuid_kvm_features is ignored by the TCG code, but it may cause
   unexpected problems later when refactoring the CPUID handling code.

This patch eliminates the kvm_pv_eoi_features variable and simply uses
kvm_enabled() inside the enable_kvm_pv_eoi() compat function, so it
enables kvm_pv_eoi only if KVM is enabled. I believe this makes the
behavior of enable_kvm_pv_eoi() clearer and easier to understand.

Signed-off-by: Eduardo Habkost ehabk...@redhat.com
---
Cc: kvm@vger.kernel.org
Cc: Michael S. Tsirkin m...@redhat.com
Cc: Gleb Natapov g...@redhat.com
Cc: Marcelo Tosatti mtosa...@redhat.com

Changes v2:
 - Coding style fix

Changes v3:
 - Eliminate #ifdef by using the fake KVM_FEATURE_PV_EOI #define

Changes v4:
 - Check kvm_enabled() when actually using kvm_default_features
 - Eliminate Yet Another #ifdef by using the fake KVM_FEATURE_*
   #defines on kvm_default_features initialization
---
 target-i386/cpu.c | 16 ++--
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index e4dc370..57a22b7 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -206,22 +206,16 @@ typedef struct model_features_t {
 int check_cpuid = 0;
 int enforce_cpuid = 0;
 
-#if defined(CONFIG_KVM)
 static uint32_t kvm_default_features = (1  KVM_FEATURE_CLOCKSOURCE) |
 (1  KVM_FEATURE_NOP_IO_DELAY) |
 (1  KVM_FEATURE_CLOCKSOURCE2) |
 (1  KVM_FEATURE_ASYNC_PF) |
 (1  KVM_FEATURE_STEAL_TIME) |
 (1  KVM_FEATURE_CLOCKSOURCE_STABLE_BIT);
-static const uint32_t kvm_pv_eoi_features = (0x1  KVM_FEATURE_PV_EOI);
-#else
-static uint32_t kvm_default_features = 0;
-static const uint32_t kvm_pv_eoi_features = 0;
-#endif
 
 void enable_kvm_pv_eoi(void)
 {
-kvm_default_features |= kvm_pv_eoi_features;
+kvm_default_features |= (1UL  KVM_FEATURE_PV_EOI);
 }
 
 void host_cpuid(uint32_t function, uint32_t count,
@@ -1343,13 +1337,15 @@ static int cpu_x86_parse_featurestr(x86_def_t 
*x86_cpu_def, char *features)
 unsigned int i;
 char *featurestr; /* Single 'key=value string being parsed */
 /* Features to be added */
-FeatureWordArray plus_features = {
-[FEAT_KVM] = kvm_default_features,
-};
+FeatureWordArray plus_features = { 0 };
 /* Features to be removed */
 FeatureWordArray minus_features = { 0 };
 uint32_t numvalue;
 
+if (kvm_enabled()) {
+plus_features[FEAT_KVM] = kvm_default_features;
+}
+
 add_flagname_to_bitmaps(hypervisor, plus_features);
 
 featurestr = features ? strtok(features, ,) : NULL;
-- 
1.7.11.7

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[RFC 04/12] kvm: Create kvm_arch_vcpu_id() function

2013-01-09 Thread Eduardo Habkost

This will allow each architecture to define how the VCPU ID is set on
the KVM_CREATE_VCPU ioctl call.

Signed-off-by: Eduardo Habkost ehabk...@redhat.com
---
Cc: kvm@vger.kernel.org
Cc: Michael S. Tsirkin m...@redhat.com
Cc: Gleb Natapov g...@redhat.com
Cc: Marcelo Tosatti mtosa...@redhat.com

Changes v2:
 - Get CPUState as argument instead of CPUArchState
---
 include/sysemu/kvm.h | 3 +++
 kvm-all.c| 2 +-
 target-i386/kvm.c| 5 +
 target-ppc/kvm.c | 5 +
 target-s390x/kvm.c   | 5 +
 5 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index 22acf91..384ee66 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -196,6 +196,9 @@ int kvm_arch_init(KVMState *s);
 
 int kvm_arch_init_vcpu(CPUState *cpu);
 
+/* Returns VCPU ID to be used on KVM_CREATE_VCPU ioctl() */
+unsigned long kvm_arch_vcpu_id(CPUState *cpu);
+
 void kvm_arch_reset_vcpu(CPUState *cpu);
 
 int kvm_arch_on_sigbus_vcpu(CPUState *cpu, int code, void *addr);
diff --git a/kvm-all.c b/kvm-all.c
index 6e2164b..d37359f 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -222,7 +222,7 @@ int kvm_init_vcpu(CPUState *cpu)
 
 DPRINTF(kvm_init_vcpu\n);
 
-ret = kvm_vm_ioctl(s, KVM_CREATE_VCPU, cpu-cpu_index);
+ret = kvm_vm_ioctl(s, KVM_CREATE_VCPU, kvm_arch_vcpu_id(cpu));
 if (ret  0) {
 DPRINTF(kvm_create_vcpu failed\n);
 goto err;
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 3acff40..5f3f789 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -411,6 +411,11 @@ static void cpu_update_state(void *opaque, int running, 
RunState state)
 }
 }
 
+unsigned long kvm_arch_vcpu_id(CPUState *cpu)
+{
+return cpu-cpu_index;
+}
+
 int kvm_arch_init_vcpu(CPUState *cs)
 {
 struct {
diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index e4a387f..bdcd133 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -384,6 +384,11 @@ static inline void kvm_fixup_page_sizes(PowerPCCPU *cpu)
 
 #endif /* !defined (TARGET_PPC64) */
 
+unsigned long kvm_arch_vcpu_id(CPUState *cpu)
+{
+return cpu-cpu_index;
+}
+
 int kvm_arch_init_vcpu(CPUState *cs)
 {
 PowerPCCPU *cpu = POWERPC_CPU(cs);
diff --git a/target-s390x/kvm.c b/target-s390x/kvm.c
index 6ec5e6d..bd9864c 100644
--- a/target-s390x/kvm.c
+++ b/target-s390x/kvm.c
@@ -72,6 +72,11 @@ int kvm_arch_init(KVMState *s)
 return 0;
 }
 
+unsigned long kvm_arch_vcpu_id(CPUState *cpu)
+{
+return cpu-cpu_index;
+}
+
 int kvm_arch_init_vcpu(CPUState *cpu)
 {
 int ret = 0;
-- 
1.7.11.7

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[RFC 01/12] kvm: add KVM_FEATURE_CLOCKSOURCE_STABLE_BIT fake #define

2013-01-09 Thread Eduardo Habkost

Signed-off-by: Eduardo Habkost ehabk...@redhat.com
---
Cc: kvm@vger.kernel.org
Cc: Michael S. Tsirkin m...@redhat.com
Cc: Gleb Natapov g...@redhat.com
Cc: Marcelo Tosatti mtosa...@redhat.com
---
 include/sysemu/kvm.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index 6bdd513..22acf91 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -36,6 +36,7 @@
 #define KVM_FEATURE_ASYNC_PF 0
 #define KVM_FEATURE_STEAL_TIME   0
 #define KVM_FEATURE_PV_EOI   0
+#define KVM_FEATURE_CLOCKSOURCE_STABLE_BIT 0
 #endif
 
 extern int kvm_allowed;
-- 
1.7.11.7

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[RFC 03/12] pc: Reverse pc_init_pci() compatibility logic

2013-01-09 Thread Eduardo Habkost

Currently, the pc-1.4 machine init function enables PV EOI and then
calls the pc-1.2 machine init function. The problem with this approach
is that now we can't enable any compatibility code inside the pc-1.2
init function because it would enable the compatibility behavior on
pc-1.3 and pc-1.4 as well.

This reverses the logic so that the pc-1.2 machine init function will
disable PV EOI, and then call the pc-1.4 machine init function. This way
we can change the pc-1.2 init function to enable compatibility behavior,
and the pc-1.4 init function would just use the default behavior.

It would be interesting to eventually change pc_init_pci_no_kvmclock()
and pc_init_isa() to reuse pc_init_pci_1_2() as well (so we don't need
to duplicate compatibility code on those two functions). But this will
be probably much easier to do after we create a PCInitArgs struct for
the PC initialization arguments.

Signed-off-by: Eduardo Habkost ehabk...@redhat.com
---
Cc: kvm@vger.kernel.org
Cc: Michael S. Tsirkin m...@redhat.com
Cc: Gleb Natapov g...@redhat.com
Cc: Marcelo Tosatti mtosa...@redhat.com
---
 hw/pc_piix.c  | 22 +-
 target-i386/cpu.c |  5 +++--
 target-i386/cpu.h |  2 +-
 3 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/hw/pc_piix.c b/hw/pc_piix.c
index 2b3d58b..13d7cc8 100644
--- a/hw/pc_piix.c
+++ b/hw/pc_piix.c
@@ -233,12 +233,14 @@ static void pc_init_pci(QEMUMachineInitArgs *args)
  initrd_filename, cpu_model, 1, 1);
 }
 
-static void pc_init_pci_1_3(QEMUMachineInitArgs *args)
+/* PC machine init function for pc-0.14 to pc-1.2 */
+static void pc_init_pci_1_2(QEMUMachineInitArgs *args)
 {
-enable_kvm_pv_eoi();
+disable_kvm_pv_eoi();
 pc_init_pci(args);
 }
 
+/* PC init function for pc-0.10 to pc-0.13, and reused by xenfv */
 static void pc_init_pci_no_kvmclock(QEMUMachineInitArgs *args)
 {
 ram_addr_t ram_size = args-ram_size;
@@ -247,6 +249,7 @@ static void pc_init_pci_no_kvmclock(QEMUMachineInitArgs 
*args)
 const char *kernel_cmdline = args-kernel_cmdline;
 const char *initrd_filename = args-initrd_filename;
 const char *boot_device = args-boot_device;
+disable_kvm_pv_eoi();
 pc_init1(get_system_memory(),
  get_system_io(),
  ram_size, boot_device,
@@ -264,6 +267,7 @@ static void pc_init_isa(QEMUMachineInitArgs *args)
 const char *boot_device = args-boot_device;
 if (cpu_model == NULL)
 cpu_model = 486;
+disable_kvm_pv_eoi();
 pc_init1(get_system_memory(),
  get_system_io(),
  ram_size, boot_device,
@@ -286,7 +290,7 @@ static QEMUMachine pc_machine_v1_4 = {
 .name = pc-1.4,
 .alias = pc,
 .desc = Standard PC,
-.init = pc_init_pci_1_3,
+.init = pc_init_pci,
 .max_cpus = 255,
 .is_default = 1,
 };
@@ -301,7 +305,7 @@ static QEMUMachine pc_machine_v1_4 = {
 static QEMUMachine pc_machine_v1_3 = {
 .name = pc-1.3,
 .desc = Standard PC,
-.init = pc_init_pci_1_3,
+.init = pc_init_pci,
 .max_cpus = 255,
 .compat_props = (GlobalProperty[]) {
 PC_COMPAT_1_3,
@@ -340,7 +344,7 @@ static QEMUMachine pc_machine_v1_3 = {
 static QEMUMachine pc_machine_v1_2 = {
 .name = pc-1.2,
 .desc = Standard PC,
-.init = pc_init_pci,
+.init = pc_init_pci_1_2,
 .max_cpus = 255,
 .compat_props = (GlobalProperty[]) {
 PC_COMPAT_1_2,
@@ -383,7 +387,7 @@ static QEMUMachine pc_machine_v1_2 = {
 static QEMUMachine pc_machine_v1_1 = {
 .name = pc-1.1,
 .desc = Standard PC,
-.init = pc_init_pci,
+.init = pc_init_pci_1_2,
 .max_cpus = 255,
 .compat_props = (GlobalProperty[]) {
 PC_COMPAT_1_1,
@@ -418,7 +422,7 @@ static QEMUMachine pc_machine_v1_1 = {
 static QEMUMachine pc_machine_v1_0 = {
 .name = pc-1.0,
 .desc = Standard PC,
-.init = pc_init_pci,
+.init = pc_init_pci_1_2,
 .max_cpus = 255,
 .compat_props = (GlobalProperty[]) {
 PC_COMPAT_1_0,
@@ -433,7 +437,7 @@ static QEMUMachine pc_machine_v1_0 = {
 static QEMUMachine pc_machine_v0_15 = {
 .name = pc-0.15,
 .desc = Standard PC,
-.init = pc_init_pci,
+.init = pc_init_pci_1_2,
 .max_cpus = 255,
 .compat_props = (GlobalProperty[]) {
 PC_COMPAT_0_15,
@@ -465,7 +469,7 @@ static QEMUMachine pc_machine_v0_15 = {
 static QEMUMachine pc_machine_v0_14 = {
 .name = pc-0.14,
 .desc = Standard PC,
-.init = pc_init_pci,
+.init = pc_init_pci_1_2,
 .max_cpus = 255,
 .compat_props = (GlobalProperty[]) {
 PC_COMPAT_0_14, 
diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index 57a22b7..492656c 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -211,11 +211,12 @@ static uint32_t kvm_default_features = (1  
KVM_FEATURE_CLOCKSOURCE) |
 (1  KVM_FEATURE_CLOCKSOURCE2) |
 (1  KVM_FEATURE_ASYNC_PF) |
 (1  KVM_FEATURE_STEAL_TIME) |
+(1  KVM_FEATURE_PV_EOI) |
 (1  KVM_FEATURE_CLOCKSOURCE_STABLE_BIT);

[RFC 11/12] target-i386: Topology APIC ID utility functions

2013-01-09 Thread Eduardo Habkost

Signed-off-by: Eduardo Habkost ehabk...@redhat.com
---
Changes v1 - v2:
 - Support 32-bit APIC IDs (in case x2APIC is going to be used)
 - Coding style changes
 - Use TARGET_I386_TOPOLOGY_H instead of __QEMU_X86_TOPOLOGY_H__
 - Rename topo_make_apic_id() to topo_apicid_for_cpu()
 - Rename __make_apicid() to topo_make_apicid()
 - Spaces around operators on test-x86-cpuid.c, as requested by
   Blue Swirl
 - Make test-x86-cpuid a target-specific test

Changes v2 - v3:
 - Add documentation pointers to the code
 - Rename bits_for_count() to bitwidth_for_count()
 - Remove unused apicid_*_id() functions

Changes v3 - v4:
 - Remove now-obsolete FIXME comment from test-x86-cpuid.c
 - Change bitops.h include to qemu/bitops.h
---
 target-i386/topology.h | 133 +
 tests/.gitignore   |   1 +
 tests/Makefile |   4 ++
 tests/test-x86-cpuid.c | 101 +
 4 files changed, 239 insertions(+)
 create mode 100644 target-i386/topology.h
 create mode 100644 tests/test-x86-cpuid.c

diff --git a/target-i386/topology.h b/target-i386/topology.h
new file mode 100644
index 000..833ab47
--- /dev/null
+++ b/target-i386/topology.h
@@ -0,0 +1,133 @@
+/*
+ *  x86 CPU topology data structures and functions
+ *
+ *  Copyright (c) 2012 Red Hat Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the Software), to 
deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+#ifndef TARGET_I386_TOPOLOGY_H
+#define TARGET_I386_TOPOLOGY_H
+
+/* This file implements the APIC-ID-based CPU topology enumeration logic,
+ * documented at the following document:
+ *   Intel® 64 Architecture Processor Topology Enumeration
+ *   
http://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration/
+ *
+ * This code should be compatible with AMD's Extended Method described at:
+ *   AMD CPUID Specification (Publication #25481)
+ *   Section 3: Multiple Core Calcuation
+ * as long as:
+ *  nr_threads is set to 1;
+ *  OFFSET_IDX is assumed to be 0;
+ *  CPUID Fn8000_0008_ECX[ApicIdCoreIdSize[3:0]] is set to apicid_core_width().
+ */
+
+#include stdint.h
+#include string.h
+
+#include qemu/bitops.h
+
+/* APIC IDs can be 32-bit, but beware: APIC IDs  255 require x2APIC support
+ */
+typedef uint32_t apic_id_t;
+
+/* Return the bit width needed for 'count' IDs
+ */
+static unsigned bitwidth_for_count(unsigned count)
+{
+g_assert(count = 1);
+if (count == 1) {
+return 0;
+}
+return bitops_flsl(count - 1) + 1;
+}
+
+/* Bit width of the SMT_ID (thread ID) field on the APIC ID
+ */
+static inline unsigned apicid_smt_width(unsigned nr_cores, unsigned nr_threads)
+{
+return bitwidth_for_count(nr_threads);
+}
+
+/* Bit width of the Core_ID field
+ */
+static inline unsigned apicid_core_width(unsigned nr_cores, unsigned 
nr_threads)
+{
+return bitwidth_for_count(nr_cores);
+}
+
+/* Bit offset of the Core_ID field
+ */
+static inline unsigned apicid_core_offset(unsigned nr_cores,
+  unsigned nr_threads)
+{
+return apicid_smt_width(nr_cores, nr_threads);
+}
+
+/* Bit offset of the Pkg_ID (socket ID) field
+ */
+static inline unsigned apicid_pkg_offset(unsigned nr_cores, unsigned 
nr_threads)
+{
+return apicid_core_offset(nr_cores, nr_threads) + \
+   apicid_core_width(nr_cores, nr_threads);
+}
+
+/* Make APIC ID for the CPU based on Pkg_ID, Core_ID, SMT_ID
+ *
+ * The caller must make sure core_id  nr_cores and smt_id  nr_threads.
+ */
+static inline apic_id_t topo_make_apicid(unsigned nr_cores,
+ unsigned nr_threads,
+ unsigned pkg_id, unsigned core_id,
+ unsigned smt_id)
+{
+return (pkg_id   apicid_pkg_offset(nr_cores, nr_threads)) | \
+   (core_id  apicid_core_offset(nr_cores, nr_threads)) | \
+   smt_id;
+}
+
+/* Calculate thread/core/package IDs for a specific topology,

[RFC 06/12] fw_cfg: Remove FW_CFG_MAX_CPUS from fw_cfg_init()

2013-01-09 Thread Eduardo Habkost

PC will not use max_cpus for that field, so move it outside the common
code so it can use a different value on PC.

Signed-off-by: Eduardo Habkost ehabk...@redhat.com
---
 hw/fw_cfg.c   | 1 -
 hw/pc.c   | 2 +-
 hw/ppc_newworld.c | 1 +
 hw/ppc_oldworld.c | 1 +
 hw/sun4m.c| 3 +++
 hw/sun4u.c| 1 +
 6 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/hw/fw_cfg.c b/hw/fw_cfg.c
index 26f7125..0f13437 100644
--- a/hw/fw_cfg.c
+++ b/hw/fw_cfg.c
@@ -519,7 +519,6 @@ FWCfgState *fw_cfg_init(uint32_t ctl_port, uint32_t 
data_port,
 fw_cfg_add_bytes(s, FW_CFG_UUID, qemu_uuid, 16);
 fw_cfg_add_i16(s, FW_CFG_NOGRAPHIC, (uint16_t)(display_type == 
DT_NOGRAPHIC));
 fw_cfg_add_i16(s, FW_CFG_NB_CPUS, (uint16_t)smp_cpus);
-fw_cfg_add_i16(s, FW_CFG_MAX_CPUS, (uint16_t)max_cpus);
 fw_cfg_add_i16(s, FW_CFG_BOOT_MENU, (uint16_t)boot_menu);
 fw_cfg_bootsplash(s);
 fw_cfg_reboot(s);
diff --git a/hw/pc.c b/hw/pc.c
index df0c48e..beb816b 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -550,7 +550,7 @@ static void *bochs_bios_init(void)
 int i, j;
 
 fw_cfg = fw_cfg_init(BIOS_CFG_IOPORT, BIOS_CFG_IOPORT + 1, 0, 0);
-
+fw_cfg_add_i16(fw_cfg, FW_CFG_MAX_CPUS, (uint16_t)max_cpus);
 fw_cfg_add_i32(fw_cfg, FW_CFG_ID, 1);
 fw_cfg_add_i64(fw_cfg, FW_CFG_RAM_SIZE, (uint64_t)ram_size);
 fw_cfg_add_bytes(fw_cfg, FW_CFG_ACPI_TABLES, (uint8_t *)acpi_tables,
diff --git a/hw/ppc_newworld.c b/hw/ppc_newworld.c
index fabcc08..fdecfcf 100644
--- a/hw/ppc_newworld.c
+++ b/hw/ppc_newworld.c
@@ -397,6 +397,7 @@ static void ppc_core99_init(QEMUMachineInitArgs *args)
 /* No PCI init: the BIOS will do it */
 
 fw_cfg = fw_cfg_init(0, 0, CFG_ADDR, CFG_ADDR + 2);
+fw_cfg_add_i16(fw_cfg, FW_CFG_MAX_CPUS, (uint16_t)max_cpus);
 fw_cfg_add_i32(fw_cfg, FW_CFG_ID, 1);
 fw_cfg_add_i64(fw_cfg, FW_CFG_RAM_SIZE, (uint64_t)ram_size);
 fw_cfg_add_i16(fw_cfg, FW_CFG_MACHINE_ID, machine_arch);
diff --git a/hw/ppc_oldworld.c b/hw/ppc_oldworld.c
index fff5129..7406c10 100644
--- a/hw/ppc_oldworld.c
+++ b/hw/ppc_oldworld.c
@@ -296,6 +296,7 @@ static void ppc_heathrow_init(QEMUMachineInitArgs *args)
 /* No PCI init: the BIOS will do it */
 
 fw_cfg = fw_cfg_init(0, 0, CFG_ADDR, CFG_ADDR + 2);
+fw_cfg_add_i16(fw_cfg, FW_CFG_MAX_CPUS, (uint16_t)max_cpus);
 fw_cfg_add_i32(fw_cfg, FW_CFG_ID, 1);
 fw_cfg_add_i64(fw_cfg, FW_CFG_RAM_SIZE, (uint64_t)ram_size);
 fw_cfg_add_i16(fw_cfg, FW_CFG_MACHINE_ID, ARCH_HEATHROW);
diff --git a/hw/sun4m.c b/hw/sun4m.c
index 0d84b37..316e94c 100644
--- a/hw/sun4m.c
+++ b/hw/sun4m.c
@@ -1021,6 +1021,7 @@ static void sun4m_hw_init(const struct sun4m_hwdef 
*hwdef, ram_addr_t RAM_size,
  hwdef-ecc_version);
 
 fw_cfg = fw_cfg_init(0, 0, CFG_ADDR, CFG_ADDR + 2);
+fw_cfg_add_i16(fw_cfg, FW_CFG_MAX_CPUS, (uint16_t)max_cpus);
 fw_cfg_add_i32(fw_cfg, FW_CFG_ID, 1);
 fw_cfg_add_i64(fw_cfg, FW_CFG_RAM_SIZE, (uint64_t)ram_size);
 fw_cfg_add_i16(fw_cfg, FW_CFG_MACHINE_ID, hwdef-machine_id);
@@ -1658,6 +1659,7 @@ static void sun4d_hw_init(const struct sun4d_hwdef 
*hwdef, ram_addr_t RAM_size,
Sun4d);
 
 fw_cfg = fw_cfg_init(0, 0, CFG_ADDR, CFG_ADDR + 2);
+fw_cfg_add_i16(fw_cfg, FW_CFG_MAX_CPUS, (uint16_t)max_cpus);
 fw_cfg_add_i32(fw_cfg, FW_CFG_ID, 1);
 fw_cfg_add_i64(fw_cfg, FW_CFG_RAM_SIZE, (uint64_t)ram_size);
 fw_cfg_add_i16(fw_cfg, FW_CFG_MACHINE_ID, hwdef-machine_id);
@@ -1858,6 +1860,7 @@ static void sun4c_hw_init(const struct sun4c_hwdef 
*hwdef, ram_addr_t RAM_size,
Sun4c);
 
 fw_cfg = fw_cfg_init(0, 0, CFG_ADDR, CFG_ADDR + 2);
+fw_cfg_add_i16(fw_cfg, FW_CFG_MAX_CPUS, (uint16_t)max_cpus);
 fw_cfg_add_i32(fw_cfg, FW_CFG_ID, 1);
 fw_cfg_add_i64(fw_cfg, FW_CFG_RAM_SIZE, (uint64_t)ram_size);
 fw_cfg_add_i16(fw_cfg, FW_CFG_MACHINE_ID, hwdef-machine_id);
diff --git a/hw/sun4u.c b/hw/sun4u.c
index cbfd217..d114af3 100644
--- a/hw/sun4u.c
+++ b/hw/sun4u.c
@@ -878,6 +878,7 @@ static void sun4uv_init(MemoryRegion *address_space_mem,
(uint8_t *)nd_table[0].macaddr);
 
 fw_cfg = fw_cfg_init(BIOS_CFG_IOPORT, BIOS_CFG_IOPORT + 1, 0, 0);
+fw_cfg_add_i16(fw_cfg, FW_CFG_MAX_CPUS, (uint16_t)max_cpus);
 fw_cfg_add_i32(fw_cfg, FW_CFG_ID, 1);
 fw_cfg_add_i64(fw_cfg, FW_CFG_RAM_SIZE, (uint64_t)ram_size);
 fw_cfg_add_i16(fw_cfg, FW_CFG_MACHINE_ID, hwdef-machine_id);
-- 
1.7.11.7

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[RFC 09/12] pc: Set fw_cfg data based on APIC ID calculation

2013-01-09 Thread Eduardo Habkost

This changes FW_CFG_MAX_CPUS and FW_CFG_NUMA to use apic_id_for_cpu(),
so the NUMA table can be based on the APIC IDs, instead of CPU index
(SeaBIOS knows nothing about CPU indexes, just APIC IDs).

Signed-off-by: Eduardo Habkost ehabk...@redhat.com
---
Changes v2:
 - Get PC object as argument
 - Add more detailed comments explaining the reason for FW_CFG_MAX_CPUS
   not being simply 'max_cpus'

Changes v3:
 - Use PCInitArgs instead of PC object

Changes v4:
 - Don't use PCInitArgs, just add the necessary data for apic_id_limit()
   as argument
 - Rename function to pc_apic_id_limit()
 - Rename max_apic_id to apic_id_limit
---
 hw/pc.c | 44 +---
 1 file changed, 37 insertions(+), 7 deletions(-)

diff --git a/hw/pc.c b/hw/pc.c
index beb816b..38ae807 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -541,6 +541,18 @@ int e820_add_entry(uint64_t address, uint64_t length, 
uint32_t type)
 return index;
 }
 
+/* Calculates the limit to CPU APIC ID values
+ *
+ * This function returns the limit for the APIC ID value, so that all
+ * CPU APIC IDs are  pc_apic_id_limit().
+ *
+ * This is used for FW_CFG_MAX_CPUS. See comments on bochs_bios_init().
+ */
+static unsigned int pc_apic_id_limit(unsigned int max_cpus)
+{
+return apic_id_for_cpu(max_cpus - 1) + 1;
+}
+
 static void *bochs_bios_init(void)
 {
 void *fw_cfg;
@@ -548,9 +560,24 @@ static void *bochs_bios_init(void)
 size_t smbios_len;
 uint64_t *numa_fw_cfg;
 int i, j;
+unsigned int apic_id_limit = pc_apic_id_limit(max_cpus);
 
 fw_cfg = fw_cfg_init(BIOS_CFG_IOPORT, BIOS_CFG_IOPORT + 1, 0, 0);
-fw_cfg_add_i16(fw_cfg, FW_CFG_MAX_CPUS, (uint16_t)max_cpus);
+/* FW_CFG_MAX_CPUS is a bit confusing/problematic on x86:
+ *
+ * SeaBIOS needs FW_CFG_MAX_CPUS for CPU hotplug, but the CPU hotplug
+ * QEMU-SeaBIOS interface is not based on the CPU index, but on the 
APIC
+ * ID of hotplugged CPUs[1]. This means that FW_CFG_MAX_CPUS is not the
+ * maximum number of CPUs, but the limit to the APIC ID values SeaBIOS
+ * may see.
+ *
+ * So, this means we must not use max_cpus, here, but the maximum possible
+ * APIC ID value, plus one.
+ *
+ * [1] The only kind of CPU identifier used between SeaBIOS and QEMU is
+ * the APIC ID, not the CPU index
+ */
+fw_cfg_add_i16(fw_cfg, FW_CFG_MAX_CPUS, (uint16_t)apic_id_limit);
 fw_cfg_add_i32(fw_cfg, FW_CFG_ID, 1);
 fw_cfg_add_i64(fw_cfg, FW_CFG_RAM_SIZE, (uint64_t)ram_size);
 fw_cfg_add_bytes(fw_cfg, FW_CFG_ACPI_TABLES, (uint8_t *)acpi_tables,
@@ -570,21 +597,24 @@ static void *bochs_bios_init(void)
  * of nodes, one word for each VCPU-node and one word for each node to
  * hold the amount of memory.
  */
-numa_fw_cfg = g_malloc0((1 + max_cpus + nb_numa_nodes) * 8);
+numa_fw_cfg = g_malloc0((1 + apic_id_limit + nb_numa_nodes) * 8);
 numa_fw_cfg[0] = cpu_to_le64(nb_numa_nodes);
-for (i = 0; i  max_cpus; i++) {
+unsigned int cpu_idx;
+for (cpu_idx = 0; cpu_idx  max_cpus; cpu_idx++) {
+unsigned int apic_id = apic_id_for_cpu(cpu_idx);
+assert(apic_id  apic_id_limit);
 for (j = 0; j  nb_numa_nodes; j++) {
-if (test_bit(i, node_cpumask[j])) {
-numa_fw_cfg[i + 1] = cpu_to_le64(j);
+if (test_bit(cpu_idx, node_cpumask[j])) {
+numa_fw_cfg[apic_id + 1] = cpu_to_le64(j);
 break;
 }
 }
 }
 for (i = 0; i  nb_numa_nodes; i++) {
-numa_fw_cfg[max_cpus + 1 + i] = cpu_to_le64(node_mem[i]);
+numa_fw_cfg[apic_id_limit + 1 + i] = cpu_to_le64(node_mem[i]);
 }
 fw_cfg_add_bytes(fw_cfg, FW_CFG_NUMA, (uint8_t *)numa_fw_cfg,
- (1 + max_cpus + nb_numa_nodes) * 8);
+ (1 + apic_id_limit + nb_numa_nodes) * 8);
 
 return fw_cfg;
 }
-- 
1.7.11.7

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[RFC 10/12] tests: Support target-specific unit tests

2013-01-09 Thread Eduardo Habkost

To make unit tests that depend on target-specific files, use
check-unit-arch-y and test-obj-arch-y.

Signed-off-by: Eduardo Habkost ehabk...@redhat.com
---
 tests/Makefile | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/tests/Makefile b/tests/Makefile
index b09a343..fa96d1a 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -74,8 +74,6 @@ test-qapi-obj-y =  $(qobject-obj-y) $(qapi-obj-y) qemu-tool.o
 test-qapi-obj-y += tests/test-qapi-visit.o tests/test-qapi-types.o
 test-qapi-obj-y += module.o
 
-$(test-obj-y): QEMU_INCLUDES += -Itests
-
 tests/check-qint$(EXESUF): tests/check-qint.o qint.o
 tests/check-qstring$(EXESUF): tests/check-qstring.o qstring.o
 tests/check-qdict$(EXESUF): tests/check-qdict.o qdict.o qfloat.o qint.o 
qstring.o qbool.o qlist.o
@@ -111,9 +109,20 @@ tests/m48t59-test$(EXESUF): tests/m48t59-test.o 
$(trace-obj-y)
 tests/fdc-test$(EXESUF): tests/fdc-test.o tests/libqtest.o $(trace-obj-y)
 tests/hd-geo-test$(EXESUF): tests/hd-geo-test.o tests/libqtest.o $(trace-obj-y)
 
-# QTest rules
+# unit test rules:
 
 TARGETS=$(patsubst %-softmmu,%, $(filter %-softmmu,$(TARGET_DIRS)))
+
+# target-specific tests/objs:
+test-obj-y += $(foreach TARGET,$(TARGETS), $(test-obj-$(TARGET)-y))
+check-unit-y += $(foreach TARGET,$(TARGETS), $(check-unit-$(TARGET)-y))
+
+$(foreach TARGET,$(TARGETS),$(eval $(test-obj-$(TARGET)-y): QEMU_INCLUDES += 
-Itarget-$(TARGET)))
+
+$(test-obj-y): QEMU_INCLUDES += -Itests
+
+# QTest rules
+
 QTEST_TARGETS=$(foreach TARGET,$(TARGETS), $(if $(check-qtest-$(TARGET)-y), 
$(TARGET),))
 check-qtest-$(CONFIG_POSIX)=$(foreach TARGET,$(TARGETS), 
$(check-qtest-$(TARGET)-y))
 
-- 
1.7.11.7

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[RFC 08/12] cpus.h: Make constant smp_cores/smp_threads available on *-user

2013-01-09 Thread Eduardo Habkost

The code that calculates the APIC ID will use smp_cores/smp_threads, so
just define them as 1 on *-user to avoid #ifdefs in the code.

Signed-off-by: Eduardo Habkost ehabk...@redhat.com
---
 include/sysemu/cpus.h | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/include/sysemu/cpus.h b/include/sysemu/cpus.h
index 81bd817..f7f6854 100644
--- a/include/sysemu/cpus.h
+++ b/include/sysemu/cpus.h
@@ -13,9 +13,16 @@ void cpu_synchronize_all_post_init(void);
 
 void qtest_clock_warp(int64_t dest);
 
+#ifndef CONFIG_USER_ONLY
 /* vl.c */
 extern int smp_cores;
 extern int smp_threads;
+#else
+/* *-user doesn't have configurable SMP topology */
+#define smp_cores   1
+#define smp_threads 1
+#endif
+
 void set_numa_modes(void);
 void set_cpu_log(const char *optarg);
 void set_cpu_log_filename(const char *optarg);
-- 
1.7.11.7

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[RFC 05/12] target-i386: kvm: Set vcpu_id to APIC ID instead of CPU index

2013-01-09 Thread Eduardo Habkost

The CPU ID in KVM is supposed to be the APIC ID, so change the
KVM_CREATE_VCPU call to match it. The current behavior didn't break
anything yet because today the APIC ID is assumed to be equal to the CPU
index, but this won't be true in the future.

Signed-off-by: Eduardo Habkost ehabk...@redhat.com
---
Cc: kvm@vger.kernel.org
Cc: Michael S. Tsirkin m...@redhat.com
Cc: Gleb Natapov g...@redhat.com
Cc: Marcelo Tosatti mtosa...@redhat.com

Changes v2:
 - Change only i386 code (kvm_arch_vcpu_id())

Changes v3:
 - Get CPUState as argument instead of CPUArchState
---
 target-i386/kvm.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 5f3f789..c440809 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -411,9 +411,10 @@ static void cpu_update_state(void *opaque, int running, 
RunState state)
 }
 }
 
-unsigned long kvm_arch_vcpu_id(CPUState *cpu)
+unsigned long kvm_arch_vcpu_id(CPUState *cs)
 {
-return cpu-cpu_index;
+X86CPU *cpu = X86_CPU(cs);
+return cpu-env.cpuid_apic_id;
 }
 
 int kvm_arch_init_vcpu(CPUState *cs)
-- 
1.7.11.7

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[RFC 07/12] target-i386/cpu: Introduce apic_id_for_cpu() function

2013-01-09 Thread Eduardo Habkost

This function will be used by both the CPU initialization code and the
fw_cfg table initialization code.

Later this function will be updated to generate APIC IDs according to
the CPU topology.

Signed-off-by: Eduardo Habkost ehabk...@redhat.com
---
 target-i386/cpu.c | 17 -
 target-i386/cpu.h |  2 ++
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index 492656c..33787dc 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -2192,6 +2192,21 @@ void x86_cpu_realize(Object *obj, Error **errp)
 cpu_reset(CPU(cpu));
 }
 
+/* Calculates initial APIC ID for a specific CPU index
+ *
+ * Currently we need to be able to calculate the APIC ID from the CPU index
+ * alone (without requiring a CPU object), as the QEMU-Seabios interfaces 
have
+ * no concept of CPU index, and the NUMA tables on fw_cfg need the APIC ID of
+ * all CPUs up to max_cpus.
+ */
+uint32_t apic_id_for_cpu(unsigned int cpu_index)
+{
+/* right now APIC ID == CPU index. this will eventually change to use
+ * the CPU topology configuration properly
+ */
+return cpu_index;
+}
+
 static void x86_cpu_initfn(Object *obj)
 {
 CPUState *cs = CPU(obj);
@@ -2226,7 +2241,7 @@ static void x86_cpu_initfn(Object *obj)
 x86_cpuid_get_tsc_freq,
 x86_cpuid_set_tsc_freq, NULL, NULL, NULL);
 
-env-cpuid_apic_id = cs-cpu_index;
+env-cpuid_apic_id = apic_id_for_cpu(cs-cpu_index);
 
 /* init various static tables used in TCG mode */
 if (tcg_enabled()  !inited) {
diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index f3c9df5..dbd9899 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -1238,4 +1238,6 @@ void disable_kvm_pv_eoi(void);
 /* Return name of 32-bit register, from a R_* constant */
 const char *get_register_name_32(unsigned int reg);
 
+uint32_t apic_id_for_cpu(unsigned int cpu_index);
+
 #endif /* CPU_I386_H */
-- 
1.7.11.7

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[RFC 12/12] pc: Generate APIC IDs according to CPU topology

2013-01-09 Thread Eduardo Habkost

This keeps compatibility on machine-types pc-1.2 and older, and prints a
warning in case the requested configuration won't get the correct
topology.

I couldn't think of a better way to warn about broken topology when in
compat mode other than using error_report(). The warning message will be
probably be buried in a log file somewhere, but it's better than
nothing.

Signed-off-by: Eduardo Habkost ehabk...@redhat.com
---
Changes v1 - v2:
 - Move code to cpu.c
 - keep using cpu_index on *-user
 - Use SMP.contiguous_apic_ids global property
 - Prints warning in case the compatibility mode will expose incorrect
   topology

Changes v2 - v3:
 - Now all code is inside hw/pc.c
 - Use a real PC class and a contiguous_apic_ids property

Changes v3 - v4:
 - Instead of using a global property, use a separate machine init
   function and a PCInitArgs field, to implement compatibility mode
 - Use error_report() instead of fprintf(stderr) for the warning
 - Use a field on PCInitArgs instead of a static variable to check
   if warning was already printed

Changes v4 - v5:
 - Don't use PCInitArgs: simply add a enable_compat_apic_id_mode()
   function and a static compat_apic_id_mode variable, to enable the
   compatibility mode
 - Move APIC ID calculation code to cpu.c
---
 hw/pc_piix.c  | 13 +++--
 target-i386/cpu.c | 28 
 target-i386/cpu.h |  1 +
 3 files changed, 36 insertions(+), 6 deletions(-)

diff --git a/hw/pc_piix.c b/hw/pc_piix.c
index 13d7cc8..be4fea1 100644
--- a/hw/pc_piix.c
+++ b/hw/pc_piix.c
@@ -233,11 +233,18 @@ static void pc_init_pci(QEMUMachineInitArgs *args)
  initrd_filename, cpu_model, 1, 1);
 }
 
+
+static void pc_init_pci_1_3(QEMUMachineInitArgs *args)
+{
+enable_compat_apic_id_mode();
+pc_init_pci(args);
+}
+
 /* PC machine init function for pc-0.14 to pc-1.2 */
 static void pc_init_pci_1_2(QEMUMachineInitArgs *args)
 {
 disable_kvm_pv_eoi();
-pc_init_pci(args);
+pc_init_pci_1_3(args);
 }
 
 /* PC init function for pc-0.10 to pc-0.13, and reused by xenfv */
@@ -250,6 +257,7 @@ static void pc_init_pci_no_kvmclock(QEMUMachineInitArgs 
*args)
 const char *initrd_filename = args-initrd_filename;
 const char *boot_device = args-boot_device;
 disable_kvm_pv_eoi();
+enable_compat_apic_id_mode();
 pc_init1(get_system_memory(),
  get_system_io(),
  ram_size, boot_device,
@@ -268,6 +276,7 @@ static void pc_init_isa(QEMUMachineInitArgs *args)
 if (cpu_model == NULL)
 cpu_model = 486;
 disable_kvm_pv_eoi();
+enable_compat_apic_id_mode();
 pc_init1(get_system_memory(),
  get_system_io(),
  ram_size, boot_device,
@@ -305,7 +314,7 @@ static QEMUMachine pc_machine_v1_4 = {
 static QEMUMachine pc_machine_v1_3 = {
 .name = pc-1.3,
 .desc = Standard PC,
-.init = pc_init_pci,
+.init = pc_init_pci_1_3,
 .max_cpus = 255,
 .compat_props = (GlobalProperty[]) {
 PC_COMPAT_1_3,
diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index 33787dc..f34192c 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -23,6 +23,8 @@
 
 #include cpu.h
 #include sysemu/kvm.h
+#include sysemu/cpus.h
+#include topology.h
 
 #include qemu/option.h
 #include qemu/config-file.h
@@ -2192,6 +2194,14 @@ void x86_cpu_realize(Object *obj, Error **errp)
 cpu_reset(CPU(cpu));
 }
 
+/* Enables contiguous-apic-ID mode, for compatibility */
+static bool compat_apic_id_mode;
+
+void enable_compat_apic_id_mode(void)
+{
+compat_apic_id_mode = true;
+}
+
 /* Calculates initial APIC ID for a specific CPU index
  *
  * Currently we need to be able to calculate the APIC ID from the CPU index
@@ -2201,10 +2211,20 @@ void x86_cpu_realize(Object *obj, Error **errp)
  */
 uint32_t apic_id_for_cpu(unsigned int cpu_index)
 {
-/* right now APIC ID == CPU index. this will eventually change to use
- * the CPU topology configuration properly
- */
-return cpu_index;
+uint32_t correct_id;
+static bool warned;
+
+correct_id = topo_apicid_for_cpu(smp_cores, smp_threads, cpu_index);
+if (compat_apic_id_mode) {
+if (cpu_index != correct_id  !warned) {
+error_report(APIC IDs set in compatibility mode, 
+ CPU topology won't match the configuration);
+warned = true;
+}
+return cpu_index;
+} else {
+return correct_id;
+}
 }
 
 static void x86_cpu_initfn(Object *obj)
diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index dbd9899..d9a9e8f 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -1239,5 +1239,6 @@ void disable_kvm_pv_eoi(void);
 const char *get_register_name_32(unsigned int reg);
 
 uint32_t apic_id_for_cpu(unsigned int cpu_index);
+void enable_compat_apic_id_mode(void);
 
 #endif /* CPU_I386_H */
-- 
1.7.11.7

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo

[RFC 00/12] target-i386: Fix APIC-ID-based topology (v4)

2013-01-09 Thread Eduardo Habkost

This series uses a much simpler approach than the previous versions:

 - The APIC ID calculation code is now inside cpu.c
 - It doesn't require touching the PC CPU creation code at all
 - It simply uses a static variable to enable the compat behavior on pc-1.3
   and older
   - I considered making the compat-apic-id setting a global property for the
 X86CPU objects, the only problem is that the fw_cfg initialization code
 on pc.c also depends on the compat behavior

I am sending this as RFC because it depends on two not-included-yet series:
 - Andreas' qom-cpu-7 branch
 - My cpu-enforce fixes series

I hope to be able to get this buf fix into QEMU 1.4. I don't know if we should
try to get this before soft freeze, or we can include it after that.

Git tree for reference:
  git://github.com/ehabkost/qemu-hacks.git apicid-topology.v4
  https://github.com/ehabkost/qemu-hacks/tree/apicid-topology.v4

Eduardo Habkost (12):
  kvm: add KVM_FEATURE_CLOCKSOURCE_STABLE_BIT fake #define
  target-i386: Don't set any KVM flag by default if KVM is disabled
  pc: Reverse pc_init_pci() compatibility logic
  kvm: Create kvm_arch_vcpu_id() function
  target-i386: kvm: Set vcpu_id to APIC ID instead of CPU index
  fw_cfg: Remove FW_CFG_MAX_CPUS from fw_cfg_init()
  target-i386/cpu: Introduce apic_id_for_cpu() function
  cpus.h: Make constant smp_cores/smp_threads available on *-user
  pc: Set fw_cfg data based on APIC ID calculation
  tests: Support target-specific unit tests
  target-i386: Topology  APIC ID utility functions
  pc: Generate APIC IDs according to CPU topology

 hw/fw_cfg.c|   1 -
 hw/pc.c|  44 +---
 hw/pc_piix.c   |  27 +++---
 hw/ppc_newworld.c  |   1 +
 hw/ppc_oldworld.c  |   1 +
 hw/sun4m.c |   3 ++
 hw/sun4u.c |   1 +
 include/sysemu/cpus.h  |   7 +++
 include/sysemu/kvm.h   |   4 ++
 kvm-all.c  |   2 +-
 target-i386/cpu.c  |  56 -
 target-i386/cpu.h  |   5 +-
 target-i386/kvm.c  |   6 +++
 target-i386/topology.h | 133 +
 target-ppc/kvm.c   |   5 ++
 target-s390x/kvm.c |   5 ++
 tests/.gitignore   |   1 +
 tests/Makefile |  19 +--
 tests/test-x86-cpuid.c | 101 +
 19 files changed, 390 insertions(+), 32 deletions(-)
 create mode 100644 target-i386/topology.h
 create mode 100644 tests/test-x86-cpuid.c

-- 
1.7.11.7

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/2] VFIO-AER: Vfio-pci driver changes for supporting AER

2013-01-09 Thread Alex Williamson

On Wed, 2013-01-09 at 10:52 -0700, Alex Williamson wrote:
 On Wed, 2013-01-09 at 06:26 +, Pandarathil, Vijaymohan R wrote:
  - New ioctl which is used to pass the eventfd that is signaled when
an error occurs in the vfio_pci_device
  
  - Register pci_error_handler for the vfio_pci driver
  
  - When the device encounters an error, the error handler registered by
the vfio_pci driver gets invoked by the AER infrastructure
  
  - In the error handler, signal the eventfd registered for the device.
  
  - This results in the qemu eventfd handler getting invoked and
appropriate action taken for the guest.
  
  Signed-off-by: Vijay Mohan Pandarathil vijaymohan.pandarat...@hp.com
  ---
   drivers/vfio/pci/vfio_pci.c | 29 +
   drivers/vfio/pci/vfio_pci_private.h |  1 +
   drivers/vfio/vfio.c |  8 
   include/linux/vfio.h|  1 +
   include/uapi/linux/vfio.h   |  9 +
   5 files changed, 48 insertions(+)
  
  diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
  index 6c11994..4ae9526 100644
  --- a/drivers/vfio/pci/vfio_pci.c
  +++ b/drivers/vfio/pci/vfio_pci.c
  @@ -207,6 +207,8 @@ static long vfio_pci_ioctl(void *device_data,
  if (vdev-reset_works)
  info.flags |= VFIO_DEVICE_FLAGS_RESET;
   
  +   info.flags |= VFIO_DEVICE_FLAGS_AER_NOTIFY;
  +
 
 This appears to be a PCI specific flag, so the name should include
 _PCI_.  We also support non-PCIe devices and it seems like it would be
 possible to not have AER support available, so shouldn't this be
 conditional?
 
  info.num_regions = VFIO_PCI_NUM_REGIONS;
  info.num_irqs = VFIO_PCI_NUM_IRQS;
   
  @@ -348,6 +350,19 @@ static long vfio_pci_ioctl(void *device_data,
   
  return ret;
   
  +   } else if (cmd == VFIO_DEVICE_SET_ERRFD) {
  +   int32_t fd = (int32_t)arg;
  +
  +   if (fd  0)
  +   return -EINVAL;
  +
  +   vdev-err_trigger = eventfd_ctx_fdget(fd);
  +
  +   if (IS_ERR(vdev-err_trigger))
  +   return PTR_ERR(vdev-err_trigger);
  +
  +   return 0;
  +
 
 I'm not sure why we wouldn't describe this as just another interrupt
 from the device and configure it via SET_IRQ.  This ioctl has very
 limited use and doesn't follow any of the conventions of all the other
 vfio ioctls.
 
  } else if (cmd == VFIO_DEVICE_RESET)
  return vdev-reset_works ?
  pci_reset_function(vdev-pdev) : -EINVAL;
  @@ -527,11 +542,25 @@ static void vfio_pci_remove(struct pci_dev *pdev)
  kfree(vdev);
   }
   
  +static pci_ers_result_t vfio_err_detected(struct pci_dev *pdev,
  +   pci_channel_state_t state)
  +{
  +   struct vfio_pci_device *vdev = vfio_get_vdev(pdev-dev);
  +
  +   eventfd_signal(vdev-err_trigger, 1);
  +   return PCI_ERS_RESULT_CAN_RECOVER;
  +}
 
 What if err_trigger hasn't been set?
 
  +
  +static const struct pci_error_handlers vfio_err_handlers = {
  +   .error_detected = vfio_err_detected,
  +};
  +
   static struct pci_driver vfio_pci_driver = {
  .name   = vfio-pci,
  .id_table   = NULL, /* only dynamic ids */
  .probe  = vfio_pci_probe,
  .remove = vfio_pci_remove,
  +   .err_handler= vfio_err_handlers,
   };
   
   static void __exit vfio_pci_cleanup(void)
  diff --git a/drivers/vfio/pci/vfio_pci_private.h 
  b/drivers/vfio/pci/vfio_pci_private.h
  index 611827c..daee62f 100644
  --- a/drivers/vfio/pci/vfio_pci_private.h
  +++ b/drivers/vfio/pci/vfio_pci_private.h
  @@ -55,6 +55,7 @@ struct vfio_pci_device {
  boolbardirty;
  struct pci_saved_state  *pci_saved_state;
  atomic_trefcnt;
  +   struct eventfd_ctx  *err_trigger;
   };
   
   #define is_intx(vdev) (vdev-irq_type == VFIO_PCI_INTX_IRQ_INDEX)
  diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
  index 56097c6..5ed5a54 100644
  --- a/drivers/vfio/vfio.c
  +++ b/drivers/vfio/vfio.c
  @@ -693,6 +693,14 @@ void *vfio_del_group_dev(struct device *dev)
   }
   EXPORT_SYMBOL_GPL(vfio_del_group_dev);
   
  +void *vfio_get_vdev(struct device *dev)
  +{
  +   struct vfio_device *device = dev_get_drvdata(dev);
  +
  +   return device-device_data;
  +}
  +EXPORT_SYMBOL_GPL(vfio_get_vdev);
  +
 
 This is unsafe.  How do we know dev is a vfio device?  How do we keep
 that drvdata valid while you're using it?  I think you want to export
 the existing vfio_group_get_device() and vfio_device_put().  Thanks,

vfio_group_get_device() isn't quite what you want either since it
assumes a reference count on the group.  You'll want something like
vfio_add_group_dev() or vfio_dev_present(), perhaps moving the
iommu_group_get - vfio_group_get_from_iommu - vfio_group_get_device
into a helper function.  Thanks,

Alex

   /**
* VFIO base fd,

[PATCH 1/2] vfio-pci: Make host MSI-X enable track guest

2013-01-09 Thread Alex Williamson

Guests typically enable MSI-X with all of the vectors in the MSI-X
vector table masked.  Only when the vector is enabled does the vector
get unmasked, resulting in a vector_use callback.  These two points,
enable and unmask, correspond to pci_enable_msix() and request_irq()
for Linux guests.  Some drivers rely on VF/PF or PF/fw communication
channels that expect the physical state of the device to match the
guest visible state of the device.  They don't appreciate lazily
enabling MSI-X on the physical device.

To solve this, enable MSI-X with a single vector when the MSI-X
capability is enabled and immediate disable the vector.  This leaves
the physical device in exactly the same state between host and guest.
Furthermore, the brief gap where we enable vector 0, it fires into
userspace, not KVM, so the guest doesn't get spurious interrupts.
Ideally we could call VFIO_DEVICE_SET_IRQS with the right parameters
to enable MSI-X with zero vectors, but this will currently return an
error as the Linux MSI-X interfaces do not allow it.

Signed-off-by: Alex Williamson alex.william...@redhat.com
Cc: qemu-sta...@nongnu.org
---
 hw/vfio_pci.c |   30 ++
 1 file changed, 26 insertions(+), 4 deletions(-)

diff --git a/hw/vfio_pci.c b/hw/vfio_pci.c
index 28c8303..8ec1faf 100644
--- a/hw/vfio_pci.c
+++ b/hw/vfio_pci.c
@@ -562,8 +562,8 @@ static int vfio_enable_vectors(VFIODevice *vdev, bool msix)
 return ret;
 }
 
-static int vfio_msix_vector_use(PCIDevice *pdev,
-unsigned int nr, MSIMessage msg)
+static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
+   MSIMessage *msg, IOHandler *handler)
 {
 VFIODevice *vdev = DO_UPCAST(VFIODevice, pdev, pdev);
 VFIOMSIVector *vector;
@@ -587,7 +587,7 @@ static int vfio_msix_vector_use(PCIDevice *pdev,
  * Attempt to enable route through KVM irqchip,
  * default to userspace handling if unavailable.
  */
-vector-virq = kvm_irqchip_add_msi_route(kvm_state, msg);
+vector-virq = msg ? kvm_irqchip_add_msi_route(kvm_state, *msg) : -1;
 if (vector-virq  0 ||
 kvm_irqchip_add_irqfd_notifier(kvm_state, vector-interrupt,
vector-virq)  0) {
@@ -596,7 +596,7 @@ static int vfio_msix_vector_use(PCIDevice *pdev,
 vector-virq = -1;
 }
 qemu_set_fd_handler(event_notifier_get_fd(vector-interrupt),
-vfio_msi_interrupt, NULL, vector);
+handler, NULL, vector);
 }
 
 /*
@@ -639,6 +639,12 @@ static int vfio_msix_vector_use(PCIDevice *pdev,
 return 0;
 }
 
+static int vfio_msix_vector_use(PCIDevice *pdev,
+unsigned int nr, MSIMessage msg)
+{
+return vfio_msix_vector_do_use(pdev, nr, msg, vfio_msi_interrupt);
+}
+
 static void vfio_msix_vector_release(PCIDevice *pdev, unsigned int nr)
 {
 VFIODevice *vdev = DO_UPCAST(VFIODevice, pdev, pdev);
@@ -697,6 +703,22 @@ static void vfio_enable_msix(VFIODevice *vdev)
 
 vdev-interrupt = VFIO_INT_MSIX;
 
+/*
+ * Some communication channels between VF  PF or PF  fw rely on the
+ * physical state of the device and expect that enabling MSI-X from the
+ * guest enables the same on the host.  When our guest is Linux, the
+ * guest driver call to pci_enable_msix() sets the enabling bit in the
+ * MSI-X capability, but leaves the vector table masked.  We therefore
+ * can't rely on a vector_use callback (from request_irq() in the guest)
+ * to switch the physical device into MSI-X mode because that may come a
+ * long time after pci_enable_msix().  This code enables vector 0 with
+ * triggering to userspace, then immediately release the vector, leaving
+ * the physical device with no vectors enabled, but MSI-X enabled, just
+ * like the guest view.
+ */
+vfio_msix_vector_do_use(vdev-pdev, 0, NULL, NULL);
+vfio_msix_vector_release(vdev-pdev, 0);
+
 if (msix_set_vector_notifiers(vdev-pdev, vfio_msix_vector_use,
   vfio_msix_vector_release, NULL)) {
 error_report(vfio: msix_set_vector_notifiers failed\n);

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 2/2] vfio-pci: Loosen sanity checks to allow future features

2013-01-09 Thread Alex Williamson

VFIO_PCI_NUM_REGIONS and VFIO_PCI_NUM_IRQS should never have been
used in this manner as it locks a specific kernel implementation.
Future features may introduce new regions or interrupt entries
(VGA may add legacy ranges, AER might add an IRQ for error
signalling).  Fix this before it gets us into trouble.

Signed-off-by: Alex Williamson alex.william...@redhat.com
Cc: qemu-sta...@nongnu.org
---
 hw/vfio_pci.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/vfio_pci.c b/hw/vfio_pci.c
index 8ec1faf..c51ae67 100644
--- a/hw/vfio_pci.c
+++ b/hw/vfio_pci.c
@@ -1837,13 +1837,13 @@ static int vfio_get_device(VFIOGroup *group, const char 
*name, VFIODevice *vdev)
 error_report(Warning, device %s does not support reset\n, name);
 }
 
-if (dev_info.num_regions != VFIO_PCI_NUM_REGIONS) {
+if (dev_info.num_regions  VFIO_PCI_CONFIG_REGION_INDEX + 1) {
 error_report(vfio: unexpected number of io regions %u\n,
  dev_info.num_regions);
 goto error;
 }
 
-if (dev_info.num_irqs != VFIO_PCI_NUM_IRQS) {
+if (dev_info.num_irqs  VFIO_PCI_MSIX_IRQ_INDEX + 1) {
 error_report(vfio: unexpected number of irqs %u\n, 
dev_info.num_irqs);
 goto error;
 }

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PULL 0/2] vfio-pci: Fixes for 1.4 stable

2013-01-09 Thread Alex Williamson

Anthony,

The following changes since commit 560c30b1db1d40fe45c5104185367c4de43399d3:

  Merge remote-tracking branch 'kraxel/usb.75' into staging (2013-01-08 
10:36:20 -0600)

are available in the git repository at:


  git://github.com/awilliam/qemu-vfio.git tags/qemu-1.4-vfio-20130109.0

for you to fetch changes up to 8fc94e5a8046e349e07976f9bcaffbcd5833f3a2:

  vfio-pci: Loosen sanity checks to allow future features (2013-01-08 14:10:03 
-0700)


vfio-pci: Fixes for qemu 1.4  stable


Alex Williamson (2):
  vfio-pci: Make host MSI-X enable track guest
  vfio-pci: Loosen sanity checks to allow future features

 hw/vfio_pci.c | 34 --
 1 file changed, 28 insertions(+), 6 deletions(-)
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [kvmarm] [PATCH v5.1 0/2] KVM: ARM: Rename KVM_SET_DEVICE_ADDRESS

2013-01-09 Thread Scott Wood


On 01/09/2013 10:48:47 AM, Alexander Graf wrote:


On 09.01.2013, at 17:26, Christoffer Dall wrote:

 Renames the KVM_SET_DEVICE_ADDRESS to KVM_ARM_SET_DEVICE_ADDR
 to make it obvious that this is ARM specific in lack of a better  
generic

 interface.

 Once we agree on a better interface the KVM/ARM code can also take
 advantage of that, but until then we don't want to hold up the  
KVM/ARM

 patches.

Works for me. Scott, are you happy with this one too?


Not really, given that it will stay around forever even after something  
new is introduced.


If you're going to change the name, why not just change it to  
KVM_SET_DEVICE_CONFIG?  Can we change the name later if nothing else  
changes (so it's still binary compatible)?


We can start to introduce (and fix ARM) with a generic ioctl in the  
MPIC patches then.


The ioctl is already generic, except for its name.

-Scott
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [kvmarm] [PATCH v5.1 0/2] KVM: ARM: Rename KVM_SET_DEVICE_ADDRESS

2013-01-09 Thread Alexander Graf


On 09.01.2013, at 20:50, Scott Wood wrote:

 On 01/09/2013 10:48:47 AM, Alexander Graf wrote:
 On 09.01.2013, at 17:26, Christoffer Dall wrote:
  Renames the KVM_SET_DEVICE_ADDRESS to KVM_ARM_SET_DEVICE_ADDR
  to make it obvious that this is ARM specific in lack of a better generic
  interface.
 
  Once we agree on a better interface the KVM/ARM code can also take
  advantage of that, but until then we don't want to hold up the KVM/ARM
  patches.
 Works for me. Scott, are you happy with this one too?
 
 Not really, given that it will stay around forever even after something new 
 is introduced.

But only in ARM specific code.

 If you're going to change the name, why not just change it to 
 KVM_SET_DEVICE_CONFIG?  Can we change the name later if nothing else changes 
 (so it's still binary compatible)?

Because that again implies that it's generic enough. And to reach that 
conclusion will take more time than we should spend on this now.

 We can start to introduce (and fix ARM) with a generic ioctl in the MPIC 
 patches then.
 
 The ioctl is already generic, except for its name.

It's making a few wrong assumptions:

  * maximum size of value is u64
  * combining device id (variable) with addr type id (const) into a single 
field. It could just be split into multiple fields
  * the id is 100% architecture specific. It shouldn't be. At least the device 
id field should be generic.

I'm not sure if we can come up with more problems in that API when staring at 
it a bit longer and/or we would actually start using it for more things. So for 
the sake of not holding up the ARM code, I'm perfectly fine to clutter ARM's 
ioctl handling code with an ioctl that is already deprecated at its 
introduction, as long as we don't hold everything else up meanwhile.


Alex

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

vhost-net thread getting stuck ?

2013-01-09 Thread Chegu Vinod



Hello,

'am running into an issue with the latest bits. [ Pl. see below. The 
vhost thread seems to be getting
stuck while trying to memcopy...perhaps a bad address?.  ] Wondering if 
this is a known issue or

some recent regression ?

'am using the latest qemu (from qemu.git) and the latest kvm.git kernel 
on the host. Started the

guest using the following command line

/usr/local/bin/qemu-system-x86_64 \
-enable-kvm \
-cpu host \
-smp sockets=8,cores=10,threads=1 \
-numa node,nodeid=0,cpus=0-9,mem=64g \
-numa node,nodeid=1,cpus=10-19,mem=64g \
-numa node,nodeid=2,cpus=20-29,mem=64g \
-numa node,nodeid=3,cpus=30-39,mem=64g \
-numa node,nodeid=4,cpus=40-49,mem=64g \
-numa node,nodeid=5,cpus=50-59,mem=64g \
-numa node,nodeid=6,cpus=60-69,mem=64g \
-numa node,nodeid=7,cpus=70-79,mem=64g \
-m 524288 \
-mem-path /dev/hugepages \
-name vm2 \
-chardev 
socket,id=charmonitor,path=/var/lib/libvirt/qemu/vm2.monitor,server,now

ait \
-drive 
file=/dev/libvirt_lvm2/vm2,if=none,id=drive-virtio-disk0,format=raw,cache

=none,aio=native \
-device 
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=v

irtio-disk0,bootindex=1 \
-monitor stdio \
-net nic,model=virtio,macaddr=52:54:00:71:01:02,netdev=nic-0 \
-netdev tap,id=nic-0,ifname=tap0,script=no,downscript=no,vhost=on \
-vnc :4


Was just doing a basic kernel build in the guest when it hung with the 
following in the

dmesg of the host.

Thanks
Vinod

BUG: soft lockup - CPU#46 stuck for 23s! [vhost-135220:135231]
Modules linked in: kvm_intel kvm fuse ip6table_filter ip6_tables 
ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state 
nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter 
ip_tables bridge stp llc autofs4 sunrpc pcc_cpufreq ipv6 vhost_net 
macvtap macvlan tun uinput iTCO_wdt iTCO_vendor_support coretemp 
crc32c_intel ghash_clmulni_intel microcode pcspkr mlx4_core be2net 
lpc_ich mfd_core hpilo hpwdt i7core_edac edac_core sg netxen_nic ext4 
mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif aesni_intel ablk_helper 
cryptd lrw aes_x86_64 xts gf128mul pata_acpi ata_generic ata_piix hpsa 
lpfc scsi_transport_fc scsi_tgt radeon ttm drm_kms_helper drm 
i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod [last 
unloaded: kvm]

CPU 46
Pid: 135231, comm: vhost-135220 Not tainted 3.7.0+ #1 HP ProLiant DL980 G7
RIP: 0010:[8147bab0]  [8147bab0] 
skb_flow_dissect+0x1b0/0x440

RSP: 0018:881ffd131bc8  EFLAGS: 0246
RAX: 8a1f7dc70c00 RBX:  RCX: 7fa0
RDX:  RSI: 881ffd131c68 RDI: 8a1ff1bd6c80
RBP: 881ffd131c58 R08: 881ffd131bf8 R09: 8a1ff1bd6c80
R10: 0010 R11: 0004 R12: 8a1ff1bd6c80
R13: 000b R14: 8147330b R15: 881ffd131b58
FS:  () GS:8a1fff98() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 003d5c810dc0 CR3: 009f77c04000 CR4: 27e0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process vhost-135220 (pid: 135231, threadinfo 881ffd13, task 
881ffcb754c0)

Stack:
 881ffd131c18 81477b90 00e2 2b289bcc58ce
 881ffd131ce4 00a2  00a2
 00a2 00a2 881ffd131c88 937e754e
Call Trace:
 [81477b90] ? memcpy_fromiovecend+0x90/0xd0
 [8147f3ca] __skb_get_rxhash+0x1a/0xe0
 [a03c90f8] tun_get_user+0x468/0x660 [tun]
 [81090010] ? __sdt_alloc+0x80/0x1a0
 [a03c934d] tun_sendmsg+0x5d/0x80 [tun]
 [a0468e8a] handle_tx+0x34a/0x680 [vhost_net]
 [a04691f5] handle_tx_kick+0x15/0x20 [vhost_net]
 [a0466dfc] vhost_worker+0x10c/0x1c0 [vhost_net]
 [a0466cf0] ? vhost_attach_cgroups_work+0x30/0x30 [vhost_net]
 [a0466cf0] ? vhost_attach_cgroups_work+0x30/0x30 [vhost_net]
 [8107ecfe] kthread+0xce/0xe0
 [8107ec30] ? kthread_freezable_should_stop+0x70/0x70
 [815537ac] ret_from_fork+0x7c/0xb0
 [8107ec30] ? kthread_freezable_should_stop+0x70/0x70
Code: b6 50 06 48 89 ce 48 c1 ee 20 31 f1 41 89 0e 48 8b 48 20 48 33 48 
18 48 89 c8 48 c1 e8 20 31 c1 41 89 4e 04 e9 35 ff ff ff 66 90 0f b6 
50 09 e9 1a ff ff ff 0f 1f 80 00 00 00 00 41 8b 44 24 68

[root@hydra11 kvm_rik]#
Message from syslogd@hydra11 at Jan  9 13:06:58 ...
 kernel:BUG: soft lockup - CPU#46 stuck for 22s! [vhost-135220:135231]

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [kvmarm] [PATCH v5.1 0/2] KVM: ARM: Rename KVM_SET_DEVICE_ADDRESS

2013-01-09 Thread Scott Wood


On 01/09/2013 02:12:16 PM, Alexander Graf wrote:


On 09.01.2013, at 20:50, Scott Wood wrote:

 On 01/09/2013 10:48:47 AM, Alexander Graf wrote:
 On 09.01.2013, at 17:26, Christoffer Dall wrote:
  Renames the KVM_SET_DEVICE_ADDRESS to KVM_ARM_SET_DEVICE_ADDR
  to make it obvious that this is ARM specific in lack of a better  
generic

  interface.
 
  Once we agree on a better interface the KVM/ARM code can also  
take
  advantage of that, but until then we don't want to hold up the  
KVM/ARM

  patches.
 Works for me. Scott, are you happy with this one too?

 Not really, given that it will stay around forever even after  
something new is introduced.


But only in ARM specific code.


...which I'll probably have to deal with when Freescale's  
virtualization-capable ARM chips come along.  I don't see it's only in  
that other architecture as it might as well not exist.


 If you're going to change the name, why not just change it to  
KVM_SET_DEVICE_CONFIG?  Can we change the name later if nothing else  
changes (so it's still binary compatible)?


Because that again implies that it's generic enough. And to reach  
that conclusion will take more time than we should spend on this now.


If the conclusion later on is that it is good enough, can the name be  
changed then?


 We can start to introduce (and fix ARM) with a generic ioctl in  
the MPIC patches then.


 The ioctl is already generic, except for its name.

It's making a few wrong assumptions:

  * maximum size of value is u64


This is tolerable IMHO.

  * combining device id (variable) with addr type id (const) into a  
single field. It could just be split into multiple fields


I agree, but that could be lived with as well.

I get that there's a tradeoff between getting something in now, versus  
waiting until the API is more refined.  Tagging it with a particular  
ISA seems like an odd way of saying soon to be deprecated, though.   
What happens if we're still squabbling over the perfect replacement API  
when we're trying to push PPC MPIC stuff in?


Perhaps the threshold for an API becoming permanent should not be  
acceptance into the tree, but rather the removal of an experimental  
tag (including a way of shutting off experimental APIs to make sure  
you're not depending on them).  Sort of like CONFIG_EXPERIMENTAL,  
except actually used for its intended purpose (distributions should  
have it *off* by default), and preferably managed at runtime.  Sort of  
like drivers/staging, except for APIs rather than drivers.  Changes at  
that point should require more justification than before merging, but  
would not have the strict compatibility requirement that  
non-experimental APIs have.  This would make collaboration and testing  
easier on APIs that aren't ready to be permanent.


  * the id is 100% architecture specific. It shouldn't be. At least  
the device id field should be generic.


That's a documentation issue that could be changed to have all  
architectures adopt what is currently specified for ARM, without  
breaking compatibility.


I'm not sure if we can come up with more problems in that API when  
staring at it a bit longer and/or we would actually start using it  
for more things. So for the sake of not holding up the ARM code, I'm  
perfectly fine to clutter ARM's ioctl handling code with an ioctl  
that is already deprecated at its introduction, as long as we don't  
hold everything else up meanwhile.


I'm not in a position to block it, and if I were I presumably would  
have seen this in time for feedback to matter.  That's different from  
actually being happy. :-)


-Scott
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] [PATCH 2/2] QEMU-AER: Qemu changes to support AER for VFIO-PCI devices

2013-01-09 Thread Blue Swirl

On Wed, Jan 9, 2013 at 6:26 AM, Pandarathil, Vijaymohan R
vijaymohan.pandarat...@hp.com wrote:
 - Create eventfd per vfio device assigned to a guest and register an
   event handler

 - This fd is passed to the vfio_pci driver through a new ioctl

 - When the device encounters an error, the eventfd is signaled
   and the qemu eventfd handler gets invoked.

 - In the handler decide what action to take. Current action taken
   is to terminate the guest.

 Signed-off-by: Vijay Mohan Pandarathil vijaymohan.pandarat...@hp.com
 ---
  hw/vfio_pci.c  | 56 
 ++
  linux-headers/linux/vfio.h |  9 
  2 files changed, 65 insertions(+)

 diff --git a/hw/vfio_pci.c b/hw/vfio_pci.c
 index 28c8303..9c3c28b 100644
 --- a/hw/vfio_pci.c
 +++ b/hw/vfio_pci.c
 @@ -38,6 +38,7 @@
  #include qemu/error-report.h
  #include qemu/queue.h
  #include qemu/range.h
 +#include sysemu/sysemu.h

  /* #define DEBUG_VFIO */
  #ifdef DEBUG_VFIO
 @@ -130,6 +131,8 @@ typedef struct VFIODevice {
  QLIST_ENTRY(VFIODevice) next;
  struct VFIOGroup *group;
  bool reset_works;
 +EventNotifier errfd;
 +__u32 dev_info_flags;

QEMU is not kernel code, please use uint32_t.

  } VFIODevice;

  typedef struct VFIOGroup {
 @@ -1805,6 +1808,8 @@ static int vfio_get_device(VFIOGroup *group, const char 
 *name, VFIODevice *vdev)
  DPRINTF(Device %s flags: %u, regions: %u, irgs: %u\n, name,
  dev_info.flags, dev_info.num_regions, dev_info.num_irqs);

 +vdev-dev_info_flags = dev_info.flags;
 +
  if (!(dev_info.flags  VFIO_DEVICE_FLAGS_PCI)) {
  error_report(vfio: Um, this isn't a PCI device\n);
  goto error;
 @@ -1900,6 +1905,55 @@ static void vfio_put_device(VFIODevice *vdev)
  }
  }

 +static void vfio_errfd_handler(void *opaque)
 +{
 +VFIODevice *vdev = opaque;
 +
 +if (!event_notifier_test_and_clear(vdev-errfd)) {
 +return;
 +}
 +
 +/*
 + * TBD. Retrieve the error details and decide what action
 + * needs to be taken. One of the actions could be to pass
 + * the error to the guest and have the guest driver recover
 + * the error. This requires that PCIe capabilities be
 + * exposed to the guest. At present, we just terminate the
 + * guest to contain the error.
 + */
 +error_report(%s(%04x:%02x:%02x.%x) 
 +Unrecoverable error detected... Terminating guest\n,
 +__func__, vdev-host.domain, vdev-host.bus, vdev-host.slot,
 +vdev-host.function);
 +
 +qemu_system_shutdown_request();
 +return;
 +}
 +
 +static void vfio_register_errfd(VFIODevice *vdev)
 +{
 +int32_t pfd;
 +int ret;
 +
 +if (!(vdev-dev_info_flags  VFIO_DEVICE_FLAGS_AER_NOTIFY)) {
 +error_report(vfio: Warning: Error notification not supported for 
 the device\n);
 +return;
 +}
 +if (event_notifier_init(vdev-errfd, 0)) {
 +error_report(vfio: Warning: Unable to init event notifier for error 
 detection\n);
 +return;
 +}
 +pfd = event_notifier_get_fd(vdev-errfd);
 +qemu_set_fd_handler(pfd, vfio_errfd_handler, NULL, vdev);
 +
 +ret = ioctl(vdev-fd, VFIO_DEVICE_SET_ERRFD, pfd);
 +if (ret) {
 +error_report(vfio: Warning: Failed to setup error fd: %d\n, ret);
 +qemu_set_fd_handler(pfd, NULL, NULL, vdev);
 +event_notifier_cleanup(vdev-errfd);
 +}
 +return;
 +}
  static int vfio_initfn(PCIDevice *pdev)
  {
  VFIODevice *pvdev, *vdev = DO_UPCAST(VFIODevice, pdev, pdev);
 @@ -2010,6 +2064,8 @@ static int vfio_initfn(PCIDevice *pdev)
  }
  }

 +vfio_register_errfd(vdev);
 +
  return 0;

  out_teardown:
 diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
 index 4758d1b..0ca4eeb 100644
 --- a/linux-headers/linux/vfio.h
 +++ b/linux-headers/linux/vfio.h
 @@ -147,6 +147,7 @@ struct vfio_device_info {
 __u32   flags;
  #define VFIO_DEVICE_FLAGS_RESET(1  0)/* Device supports 
 reset */
  #define VFIO_DEVICE_FLAGS_PCI  (1  1)/* vfio-pci device */
 +#define VFIO_DEVICE_FLAGS_AER_NOTIFY (1  2)   /* Supports aer notification 
 */
 __u32   num_regions;/* Max region index + 1 */
 __u32   num_irqs;   /* Max IRQ index + 1 */

These are verbatim copies of kernel headers so it's OK here.

  };
 @@ -288,6 +289,14 @@ struct vfio_irq_set {
   */
  #define VFIO_DEVICE_RESET  _IO(VFIO_TYPE, VFIO_BASE + 11)

 +/**
 + * VFIO_DEVICE_SET_ERRFD - _IO(VFIO_TYPE, VFIO_BASE + 12)
 + *
 + * Pass the eventfd to the vfio-pci driver for signalling any device
 + * error notifications
 + */
 +#define VFIO_DEVICE_SET_ERRFD   _IO(VFIO_TYPE, VFIO_BASE + 12)
 +
  /*
   * The VFIO-PCI bus driver makes use of the following fixed region and
   * IRQ index mapping.  Unimplemented regions return a size of zero.
 --
 1.7.11.3


--
To unsubscribe from this list: send the

Re: [kvmarm] [PATCH v5.1 0/2] KVM: ARM: Rename KVM_SET_DEVICE_ADDRESS

2013-01-09 Thread Alexander Graf



Am 09.01.2013 um 22:15 schrieb Scott Wood scottw...@freescale.com:

 On 01/09/2013 02:12:16 PM, Alexander Graf wrote:
 On 09.01.2013, at 20:50, Scott Wood wrote:
  On 01/09/2013 10:48:47 AM, Alexander Graf wrote:
  On 09.01.2013, at 17:26, Christoffer Dall wrote:
   Renames the KVM_SET_DEVICE_ADDRESS to KVM_ARM_SET_DEVICE_ADDR
   to make it obvious that this is ARM specific in lack of a better generic
   interface.
  
   Once we agree on a better interface the KVM/ARM code can also take
   advantage of that, but until then we don't want to hold up the KVM/ARM
   patches.
  Works for me. Scott, are you happy with this one too?
 
  Not really, given that it will stay around forever even after something 
  new is introduced.
 But only in ARM specific code.
 
 ...which I'll probably have to deal with when Freescale's 
 virtualization-capable ARM chips come along.  I don't see it's only in that 
 other architecture as it might as well not exist.

I'm saying it's limiting its scope to a few lines of code in arch an arch 
specific file and makes everyone aware that we really need to come to a 
conclusion soon.

 
  If you're going to change the name, why not just change it to 
  KVM_SET_DEVICE_CONFIG?  Can we change the name later if nothing else 
  changes (so it's still binary compatible)?
 Because that again implies that it's generic enough. And to reach that 
 conclusion will take more time than we should spend on this now.
 
 If the conclusion later on is that it is good enough, can the name be changed 
 then?

We can add another generic name, yes. Changing it won't work because it'd break 
compatibility.

 
  We can start to introduce (and fix ARM) with a generic ioctl in the MPIC 
  patches then.
 
  The ioctl is already generic, except for its name.
 It's making a few wrong assumptions:
  * maximum size of value is u64
 
 This is tolerable IMHO.
 
  * combining device id (variable) with addr type id (const) into a single 
 field. It could just be split into multiple fields
 
 I agree, but that could be lived with as well.
 
 I get that there's a tradeoff between getting something in now, versus 
 waiting until the API is more refined.  Tagging it with a particular ISA 
 seems like an odd way of saying soon to be deprecated, though.  What 
 happens if we're still squabbling over the perfect replacement API when we're 
 trying to push PPC MPIC stuff in?

Then we're the ones who have to come up with a good interface.

 
 Perhaps the threshold for an API becoming permanent should not be 
 acceptance into the tree, but rather the removal of an experimental tag 
 (including a way of shutting off experimental APIs to make sure you're not 
 depending on them).  Sort of like CONFIG_EXPERIMENTAL, except actually used 
 for its intended purpose (distributions should have it *off* by default), and 
 preferably managed at runtime.  Sort of like drivers/staging, except for APIs 
 rather than drivers.  Changes at that point should require more justification 
 than before merging, but would not have the strict compatibility requirement 
 that non-experimental APIs have.  This would make collaboration and testing 
 easier on APIs that aren't ready to be permanent.

This tag does exist. It's called not in Linus' tree :).

 
  * the id is 100% architecture specific. It shouldn't be. At least the 
 device id field should be generic.
 
 That's a documentation issue that could be changed to have all architectures 
 adopt what is currently specified for ARM, without breaking compatibility.

Depends on how we want to design the final layout. I don't have the impression 
that we've reached final conclusion on this interface. That's all I'm saying.

 
 I'm not sure if we can come up with more problems in that API when staring 
 at it a bit longer and/or we would actually start using it for more things. 
 So for the sake of not holding up the ARM code, I'm perfectly fine to 
 clutter ARM's ioctl handling code with an ioctl that is already deprecated 
 at its introduction, as long as we don't hold everything else up meanwhile.
 
 I'm not in a position to block it, and if I were I presumably would have seen 
 this in time for feedback to matter.  That's different from actually being 
 happy. :-)

For the ARM specific ioctl it has my ack. I'd say let's go with that and start 
to work on a good interface ASAP. But we need an ok from Marcelo or Gleb too.


Alex

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [kvmarm] [PATCH v5.1 0/2] KVM: ARM: Rename KVM_SET_DEVICE_ADDRESS

2013-01-09 Thread Scott Wood


On 01/09/2013 03:37:20 PM, Alexander Graf wrote:



Am 09.01.2013 um 22:15 schrieb Scott Wood scottw...@freescale.com:

 I get that there's a tradeoff between getting something in now,  
versus waiting until the API is more refined.  Tagging it with a  
particular ISA seems like an odd way of saying soon to be  
deprecated, though.  What happens if we're still squabbling over the  
perfect replacement API when we're trying to push PPC MPIC stuff in?


Then we're the ones who have to come up with a good interface.


How about another bad one, with PPC in the name, and some pleas to  
hurry things up? :-)


It's not as if there haven't been last-minute requests for API changes  
on the PPC side in the past...


 Perhaps the threshold for an API becoming permanent should not be  
acceptance into the tree, but rather the removal of an experimental  
tag (including a way of shutting off experimental APIs to make sure  
you're not depending on them).  Sort of like CONFIG_EXPERIMENTAL,  
except actually used for its intended purpose (distributions should  
have it *off* by default), and preferably managed at runtime.  Sort  
of like drivers/staging, except for APIs rather than drivers.   
Changes at that point should require more justification than before  
merging, but would not have the strict compatibility requirement that  
non-experimental APIs have.  This would make collaboration and  
testing easier on APIs that aren't ready to be permanent.


This tag does exist. It's called not in Linus' tree :).


Which makes it a pain for multiple people to work on a new feature,  
especially when it spans components such as KVM and QEMU, and means  
that it gets less testing before the point of no return.


-Scott
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [kvmarm] [PATCH v5.1 0/2] KVM: ARM: Rename KVM_SET_DEVICE_ADDRESS

2013-01-09 Thread Christoffer Dall

On Wed, Jan 9, 2013 at 5:10 PM, Scott Wood scottw...@freescale.com wrote:
 On 01/09/2013 03:37:20 PM, Alexander Graf wrote:



 Am 09.01.2013 um 22:15 schrieb Scott Wood scottw...@freescale.com:

  I get that there's a tradeoff between getting something in now, versus
  waiting until the API is more refined.  Tagging it with a particular ISA
  seems like an odd way of saying soon to be deprecated, though.  What
  happens if we're still squabbling over the perfect replacement API when
  we're trying to push PPC MPIC stuff in?

 Then we're the ones who have to come up with a good interface.


 How about another bad one, with PPC in the name, and some pleas to hurry
 things up? :-)

 It's not as if there haven't been last-minute requests for API changes on
 the PPC side in the past...



This is getting out of hand.

Do you have another API for PPC, which was send for review and not
commented on several months ago that we can unify right now?

If not, let's go with the ARM name and work on the generic API in the mean time.

The end result will be something along 5 lines in a header files and 3
lines in a switch case that return -EINVAL if the interface is
completely deprecated later on, which is not a big problem.

-Christoffer
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [kvmarm] [PATCH v5.1 0/2] KVM: ARM: Rename KVM_SET_DEVICE_ADDRESS

2013-01-09 Thread Alexander Graf


On 09.01.2013, at 23:10, Scott Wood wrote:

 On 01/09/2013 03:37:20 PM, Alexander Graf wrote:
 Am 09.01.2013 um 22:15 schrieb Scott Wood scottw...@freescale.com:
  I get that there's a tradeoff between getting something in now, versus 
  waiting until the API is more refined.  Tagging it with a particular ISA 
  seems like an odd way of saying soon to be deprecated, though.  What 
  happens if we're still squabbling over the perfect replacement API when 
  we're trying to push PPC MPIC stuff in?
 Then we're the ones who have to come up with a good interface.
 
 How about another bad one, with PPC in the name, and some pleas to hurry 
 things up? :-)

Then I'd be the maintainer of that one and tell you whatever I think would be 
reasonable given the circumstances :).

 It's not as if there haven't been last-minute requests for API changes on the 
 PPC side in the past...

I don't remember a full PPC target merge ever relying on an API change like 
this.

In fact, in this particular case one could merge all of the patches except for 
the particular ioctl implementation and just give the respective addresses 
default values until there is an API to set them, similar to how we did things 
with PVR in the beginning.

But this is something that's always up to the respective maintainers and I'm 
not going to interfere with how they do their job :).

 
  Perhaps the threshold for an API becoming permanent should not be 
  acceptance into the tree, but rather the removal of an experimental tag 
  (including a way of shutting off experimental APIs to make sure you're not 
  depending on them).  Sort of like CONFIG_EXPERIMENTAL, except actually 
  used for its intended purpose (distributions should have it *off* by 
  default), and preferably managed at runtime.  Sort of like 
  drivers/staging, except for APIs rather than drivers.  Changes at that 
  point should require more justification than before merging, but would not 
  have the strict compatibility requirement that non-experimental APIs have. 
   This would make collaboration and testing easier on APIs that aren't 
  ready to be permanent.
 This tag does exist. It's called not in Linus' tree :).
 
 Which makes it a pain for multiple people to work on a new feature, 
 especially when it spans components such as KVM and QEMU, and means that it 
 gets less testing before the point of no return.

I agree, but we're not going to solve this one now :). In fact, it'd be even 
harder to solve (with questionable results) than the actual ioctl in question.


Alex

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [kvmarm] [PATCH v5.1 0/2] KVM: ARM: Rename KVM_SET_DEVICE_ADDRESS

2013-01-09 Thread Alexander Graf


On 09.01.2013, at 23:26, Christoffer Dall wrote:

 On Wed, Jan 9, 2013 at 5:10 PM, Scott Wood scottw...@freescale.com wrote:
 On 01/09/2013 03:37:20 PM, Alexander Graf wrote:
 
 
 
 Am 09.01.2013 um 22:15 schrieb Scott Wood scottw...@freescale.com:
 
 I get that there's a tradeoff between getting something in now, versus
 waiting until the API is more refined.  Tagging it with a particular ISA
 seems like an odd way of saying soon to be deprecated, though.  What
 happens if we're still squabbling over the perfect replacement API when
 we're trying to push PPC MPIC stuff in?
 
 Then we're the ones who have to come up with a good interface.
 
 
 How about another bad one, with PPC in the name, and some pleas to hurry
 things up? :-)
 
 It's not as if there haven't been last-minute requests for API changes on
 the PPC side in the past...
 
 
 
 This is getting out of hand.
 
 Do you have another API for PPC, which was send for review and not
 commented on several months ago that we can unify right now?
 
 If not, let's go with the ARM name and work on the generic API in the mean 
 time.
 
 The end result will be something along 5 lines in a header files and 3
 lines in a switch case that return -EINVAL if the interface is
 completely deprecated later on, which is not a big problem.

Agreed [1].

So what exactly are we waiting for? Acks from kvm maintainers, right?


Alex

[1] It'd probably end up being an ioctl that converts from one calling 
convention to another, resulting in maybe 5 lines in the switch case.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

RE: [PATCH v8 2/3] x86, apicv: add virtual interrupt delivery support

2013-01-09 Thread Zhang, Yang Z

Marcelo Tosatti wrote on 2013-01-09:
 On Wed, Jan 09, 2013 at 08:07:31AM +, Zhang, Yang Z wrote:
if(check_request(KVM_REQ_, )) {
ioapic_lock();  (*)
update local EOI exit bitmap from IOAPIC
 In my patch, it traverses IOAPIC entry once and only updates target
 vcpus's eoi exit bitmap. Then make request for all vcpus. With your
 suggestion , all vcpus will traverse all IOAPIC entries. Though ioapic
 entry write is rare, it's still not reasonable.
 
 It should be fast, and very rare (as in once during system
 initialization, or device hotplug).
Ok. Will revise the patch follow your suggestion.

 
 Is there a particular case that makes it necessary to optimize scanning?
No. 

 
ioapic_unlock();
}
 
 Fine by me. Looks simpler.
 
 
 
 (*) plus any other lock that paths that update the map take
 
 
 
 
 
 
 Best regards,
 Yang
 
 --
 Gleb.
 
 
 Best regards,
 Yang


Best regards,
Yang

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] vhost-net thread getting stuck ?

2013-01-09 Thread Jason Wang

On 01/10/2013 04:25 AM, Chegu Vinod wrote:

 Hello,

 'am running into an issue with the latest bits. [ Pl. see below. The
 vhost thread seems to be getting
 stuck while trying to memcopy...perhaps a bad address?.  ] Wondering
 if this is a known issue or
 some recent regression ?

Hi:

Looks like the issue has been fixed in following commits, does you tree
contain these?

499744209b2cbca66c42119226e5470da3bb7040 and
76fe45812a3b134c39170ca32dfd4b7217d33145.

They have been merged in to Linus 3.8-rc tree.

Thanks

 'am using the latest qemu (from qemu.git) and the latest kvm.git
 kernel on the host. Started the
 guest using the following command line

 /usr/local/bin/qemu-system-x86_64 \
 -enable-kvm \
 -cpu host \
 -smp sockets=8,cores=10,threads=1 \
 -numa node,nodeid=0,cpus=0-9,mem=64g \
 -numa node,nodeid=1,cpus=10-19,mem=64g \
 -numa node,nodeid=2,cpus=20-29,mem=64g \
 -numa node,nodeid=3,cpus=30-39,mem=64g \
 -numa node,nodeid=4,cpus=40-49,mem=64g \
 -numa node,nodeid=5,cpus=50-59,mem=64g \
 -numa node,nodeid=6,cpus=60-69,mem=64g \
 -numa node,nodeid=7,cpus=70-79,mem=64g \
 -m 524288 \
 -mem-path /dev/hugepages \
 -name vm2 \
 -chardev
 socket,id=charmonitor,path=/var/lib/libvirt/qemu/vm2.monitor,server,now
 ait \
 -drive
 file=/dev/libvirt_lvm2/vm2,if=none,id=drive-virtio-disk0,format=raw,cache
 =none,aio=native \
 -device
 virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=v
 irtio-disk0,bootindex=1 \
 -monitor stdio \
 -net nic,model=virtio,macaddr=52:54:00:71:01:02,netdev=nic-0 \
 -netdev tap,id=nic-0,ifname=tap0,script=no,downscript=no,vhost=on \
 -vnc :4


 Was just doing a basic kernel build in the guest when it hung with the
 following in the
 dmesg of the host.

 Thanks
 Vinod

 BUG: soft lockup - CPU#46 stuck for 23s! [vhost-135220:135231]
 Modules linked in: kvm_intel kvm fuse ip6table_filter ip6_tables
 ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state
 nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter
 ip_tables bridge stp llc autofs4 sunrpc pcc_cpufreq ipv6 vhost_net
 macvtap macvlan tun uinput iTCO_wdt iTCO_vendor_support coretemp
 crc32c_intel ghash_clmulni_intel microcode pcspkr mlx4_core be2net
 lpc_ich mfd_core hpilo hpwdt i7core_edac edac_core sg netxen_nic ext4
 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif aesni_intel ablk_helper
 cryptd lrw aes_x86_64 xts gf128mul pata_acpi ata_generic ata_piix hpsa
 lpfc scsi_transport_fc scsi_tgt radeon ttm drm_kms_helper drm
 i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod [last
 unloaded: kvm]
 CPU 46
 Pid: 135231, comm: vhost-135220 Not tainted 3.7.0+ #1 HP ProLiant
 DL980 G7
 RIP: 0010:[8147bab0]  [8147bab0]
 skb_flow_dissect+0x1b0/0x440
 RSP: 0018:881ffd131bc8  EFLAGS: 0246
 RAX: 8a1f7dc70c00 RBX:  RCX: 7fa0
 RDX:  RSI: 881ffd131c68 RDI: 8a1ff1bd6c80
 RBP: 881ffd131c58 R08: 881ffd131bf8 R09: 8a1ff1bd6c80
 R10: 0010 R11: 0004 R12: 8a1ff1bd6c80
 R13: 000b R14: 8147330b R15: 881ffd131b58
 FS:  () GS:8a1fff98()
 knlGS:
 CS:  0010 DS:  ES:  CR0: 8005003b
 CR2: 003d5c810dc0 CR3: 009f77c04000 CR4: 27e0
 DR0:  DR1:  DR2: 
 DR3:  DR6: 0ff0 DR7: 0400
 Process vhost-135220 (pid: 135231, threadinfo 881ffd13, task
 881ffcb754c0)
 Stack:
  881ffd131c18 81477b90 00e2 2b289bcc58ce
  881ffd131ce4 00a2  00a2
  00a2 00a2 881ffd131c88 937e754e
 Call Trace:
  [81477b90] ? memcpy_fromiovecend+0x90/0xd0
  [8147f3ca] __skb_get_rxhash+0x1a/0xe0
  [a03c90f8] tun_get_user+0x468/0x660 [tun]
  [81090010] ? __sdt_alloc+0x80/0x1a0
  [a03c934d] tun_sendmsg+0x5d/0x80 [tun]
  [a0468e8a] handle_tx+0x34a/0x680 [vhost_net]
  [a04691f5] handle_tx_kick+0x15/0x20 [vhost_net]
  [a0466dfc] vhost_worker+0x10c/0x1c0 [vhost_net]
  [a0466cf0] ? vhost_attach_cgroups_work+0x30/0x30 [vhost_net]
  [a0466cf0] ? vhost_attach_cgroups_work+0x30/0x30 [vhost_net]
  [8107ecfe] kthread+0xce/0xe0
  [8107ec30] ? kthread_freezable_should_stop+0x70/0x70
  [815537ac] ret_from_fork+0x7c/0xb0
  [8107ec30] ? kthread_freezable_should_stop+0x70/0x70
 Code: b6 50 06 48 89 ce 48 c1 ee 20 31 f1 41 89 0e 48 8b 48 20 48 33
 48 18 48 89 c8 48 c1 e8 20 31 c1 41 89 4e 04 e9 35 ff ff ff 66 90 0f
 b6 50 09 e9 1a ff ff ff 0f 1f 80 00 00 00 00 41 8b 44 24 68
 [root@hydra11 kvm_rik]#
 Message from syslogd@hydra11 at Jan  9 13:06:58 ...
  kernel:BUG: soft lockup - CPU#46 stuck for 22s! [vhost-135220:135231]



--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to

Re: [RFC PATCH 0/9] KVM: PPC: In-kernel PAPR interrupt controller emulation

2013-01-09 Thread Paul Mackerras

On Sat, Dec 15, 2012 at 01:06:13PM +1100, Benjamin Herrenschmidt wrote:
 On Sat, 2012-12-15 at 01:46 +0100, Alexander Graf wrote:
  On 05.11.2012, at 04:18, Paul Mackerras wrote:
  
   This series implements in-kernel emulation of the XICS interrupt
   controller specified in the PAPR specification and used by pseries
   guests.  Since the XICS is organized a bit differently to the
   interrupt controllers used on x86 machines, I have defined some new
   ioctls for setting up the XICS and for saving and restoring its
   state.  It may be possible to use some of the currently x86-specific
   ioctls instead.
  
  Is this one already following the new world order that we discussed during 
  KVM Forum? :)
 
 The new world order  (sorry, looks like nobody took notes and
 people expect me to do a write up from memory now ... :-)
 
 Well, mostly we agreed that the x86 stuff wasn't going to work for us.
 
 So basically what we discussed boils down to:
 
  - Move the existing generic KVM ioctl to create the kernel APIC to
 x86 since it's no as-is useful for other archs who, among other things,
 might need to pass argument based on the machine type (type of interrupt
 subsystem for example, etc...)

Assuming you're talking about KVM_CREATE_IRQCHIP, it is already
handled entirely in arch code (arch/x86/kvm/x86.c and
arch/ia64/kvm/kvm-ia64.c), along with KVM_GET_IRQCHIP and
KVM_SET_IRQCHIP.

  - Once that's done, well, instanciating interrupt controller objects
 becomes pretty much an arch specific matter. We could try to keep some
 ioctls somewhat common though it's not *that* useful because the various
 types  arguments are going to be fairly arch specific, so goes for
 configuration. Examples of what could be kept common are:
 
* Create irq chip, takes at least a type argument, possibly a few
 more type-specific args (or a union, but then let's keep space in there
 since we can't change the size of the struct later as it would impact
 the ioctl number afaik).

The existing KVM_CREATE_IRQCHIP is defined as _IO(...) which means
that it doesn't read or write memory, but there is still the 3rd
argument to the ioctl, which would give us 64 bits to indicate the
type of the top-level IRQ controller (XICS, MPIC, ...), but not much
more.

It sounds like the agreement at the meeting was more along the lines
of the KVM_CREATE_IRQCHIP_ARGS ioctl (as in the patches I posted)
which can be called multiple times to instantiate pieces of the
interrupt framework, rather than having a KVM_CREATE_IRQCHIP that gets
called once early on to say that there we are using in-kernel
interrupt controller emulation, followed by other calls to configure
the various parts of the framework.  Is that accurate?

   * Assigning the address of an irq chip when it can change (see ARM
 patches)
 
  - The existing business with irqfd etc... is mostly ok, except the
 interfaces messing around with MSIs (see virtio-pci use of kvm functions
 directly). The assignment of an irq number for an MSI must be a hook,
 probably a PCI controller hook, so x86 can get it done via its existing
 kernel interfaces and sane architectures can keep the assignment in qemu
 where it belongs.

So, if I've understood you correctly about what was agreed, the set of
ioctls that is implemented in the patches I posted is in line with
what was agreed, isn't it?  If not, where does it differ?  (To recap,
I had KVM_CREATE_IRQCHIP_ARGS, KVM_IRQCHIP_GET_SOURCES and
KVM_IRQCHIP_SET_SOURCES, plus a one-reg interface to get/set the
vcpu-specific state.)

Paul.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] vhost-net thread getting stuck ?

2013-01-09 Thread Chegu Vinod


On 1/9/2013 8:35 PM, Jason Wang wrote:

On 01/10/2013 04:25 AM, Chegu Vinod wrote:

Hello,

'am running into an issue with the latest bits. [ Pl. see below. The
vhost thread seems to be getting
stuck while trying to memcopy...perhaps a bad address?.  ] Wondering
if this is a known issue or
some recent regression ?

Hi:

Looks like the issue has been fixed in following commits, does you tree
contain these?

499744209b2cbca66c42119226e5470da3bb7040 and
76fe45812a3b134c39170ca32dfd4b7217d33145.

They have been merged in to Linus 3.8-rc tree.


I was using kvm.git kernel (as of today morning)looks like the fixes 
aren't there yet.


Will try the Linus's 3.8-rc tree.

Thanks!
Vinod



Thanks

'am using the latest qemu (from qemu.git) and the latest kvm.git
kernel on the host. Started the
guest using the following command line

/usr/local/bin/qemu-system-x86_64 \
-enable-kvm \
-cpu host \
-smp sockets=8,cores=10,threads=1 \
-numa node,nodeid=0,cpus=0-9,mem=64g \
-numa node,nodeid=1,cpus=10-19,mem=64g \
-numa node,nodeid=2,cpus=20-29,mem=64g \
-numa node,nodeid=3,cpus=30-39,mem=64g \
-numa node,nodeid=4,cpus=40-49,mem=64g \
-numa node,nodeid=5,cpus=50-59,mem=64g \
-numa node,nodeid=6,cpus=60-69,mem=64g \
-numa node,nodeid=7,cpus=70-79,mem=64g \
-m 524288 \
-mem-path /dev/hugepages \
-name vm2 \
-chardev
socket,id=charmonitor,path=/var/lib/libvirt/qemu/vm2.monitor,server,now
ait \
-drive
file=/dev/libvirt_lvm2/vm2,if=none,id=drive-virtio-disk0,format=raw,cache
=none,aio=native \
-device
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=v
irtio-disk0,bootindex=1 \
-monitor stdio \
-net nic,model=virtio,macaddr=52:54:00:71:01:02,netdev=nic-0 \
-netdev tap,id=nic-0,ifname=tap0,script=no,downscript=no,vhost=on \
-vnc :4


Was just doing a basic kernel build in the guest when it hung with the
following in the
dmesg of the host.

Thanks
Vinod

BUG: soft lockup - CPU#46 stuck for 23s! [vhost-135220:135231]
Modules linked in: kvm_intel kvm fuse ip6table_filter ip6_tables
ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state
nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter
ip_tables bridge stp llc autofs4 sunrpc pcc_cpufreq ipv6 vhost_net
macvtap macvlan tun uinput iTCO_wdt iTCO_vendor_support coretemp
crc32c_intel ghash_clmulni_intel microcode pcspkr mlx4_core be2net
lpc_ich mfd_core hpilo hpwdt i7core_edac edac_core sg netxen_nic ext4
mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif aesni_intel ablk_helper
cryptd lrw aes_x86_64 xts gf128mul pata_acpi ata_generic ata_piix hpsa
lpfc scsi_transport_fc scsi_tgt radeon ttm drm_kms_helper drm
i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod [last
unloaded: kvm]
CPU 46
Pid: 135231, comm: vhost-135220 Not tainted 3.7.0+ #1 HP ProLiant
DL980 G7
RIP: 0010:[8147bab0]  [8147bab0]
skb_flow_dissect+0x1b0/0x440
RSP: 0018:881ffd131bc8  EFLAGS: 0246
RAX: 8a1f7dc70c00 RBX:  RCX: 7fa0
RDX:  RSI: 881ffd131c68 RDI: 8a1ff1bd6c80
RBP: 881ffd131c58 R08: 881ffd131bf8 R09: 8a1ff1bd6c80
R10: 0010 R11: 0004 R12: 8a1ff1bd6c80
R13: 000b R14: 8147330b R15: 881ffd131b58
FS:  () GS:8a1fff98()
knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 003d5c810dc0 CR3: 009f77c04000 CR4: 27e0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process vhost-135220 (pid: 135231, threadinfo 881ffd13, task
881ffcb754c0)
Stack:
  881ffd131c18 81477b90 00e2 2b289bcc58ce
  881ffd131ce4 00a2  00a2
  00a2 00a2 881ffd131c88 937e754e
Call Trace:
  [81477b90] ? memcpy_fromiovecend+0x90/0xd0
  [8147f3ca] __skb_get_rxhash+0x1a/0xe0
  [a03c90f8] tun_get_user+0x468/0x660 [tun]
  [81090010] ? __sdt_alloc+0x80/0x1a0
  [a03c934d] tun_sendmsg+0x5d/0x80 [tun]
  [a0468e8a] handle_tx+0x34a/0x680 [vhost_net]
  [a04691f5] handle_tx_kick+0x15/0x20 [vhost_net]
  [a0466dfc] vhost_worker+0x10c/0x1c0 [vhost_net]
  [a0466cf0] ? vhost_attach_cgroups_work+0x30/0x30 [vhost_net]
  [a0466cf0] ? vhost_attach_cgroups_work+0x30/0x30 [vhost_net]
  [8107ecfe] kthread+0xce/0xe0
  [8107ec30] ? kthread_freezable_should_stop+0x70/0x70
  [815537ac] ret_from_fork+0x7c/0xb0
  [8107ec30] ? kthread_freezable_should_stop+0x70/0x70
Code: b6 50 06 48 89 ce 48 c1 ee 20 31 f1 41 89 0e 48 8b 48 20 48 33
48 18 48 89 c8 48 c1 e8 20 31 c1 41 89 4e 04 e9 35 ff ff ff 66 90 0f
b6 50 09 e9 1a ff ff ff 0f 1f 80 00 00 00 00 41 8b 44 24 68
[root@hydra11 kvm_rik]#
Message from syslogd@hydra11 at Jan  9 13:06:58 ...
  kernel:BUG: soft lockup - CPU#46 stuck for 22s! [vhost-135220:135231]



.

RE: Installation of Windows 8 hangs with KVM

2013-01-09 Thread Ren, Yongjie

 -Original Message-
 From: Gleb Natapov [mailto:g...@redhat.com]
 Sent: Monday, January 07, 2013 5:21 PM
 To: Ren, Yongjie
 Cc: Stefan Pietsch; kvm@vger.kernel.org
 Subject: Re: Installation of Windows 8 hangs with KVM

 On Mon, Jan 07, 2013 at 09:13:37AM +, Ren, Yongjie wrote:
   -Original Message-
   From: kvm-ow...@vger.kernel.org
 [mailto:kvm-ow...@vger.kernel.org]
   On Behalf Of Gleb Natapov
   Sent: Monday, January 07, 2013 4:54 PM
   To: Ren, Yongjie
   Cc: Stefan Pietsch; kvm@vger.kernel.org
   Subject: Re: Installation of Windows 8 hangs with KVM

   On Mon, Jan 07, 2013 at 08:38:59AM +, Ren, Yongjie wrote:
 -Original Message-
 From: kvm-ow...@vger.kernel.org
   [mailto:kvm-ow...@vger.kernel.org]
 On Behalf Of Stefan Pietsch
 Sent: Monday, January 07, 2013 2:25 AM
 To: Gleb Natapov
 Cc: kvm@vger.kernel.org
 Subject: Re: Installation of Windows 8 hangs with KVM

 * Gleb Natapov g...@redhat.com [2013-01-06 11:11]:
  On Fri, Jan 04, 2013 at 10:58:33PM +0100, Stefan Pietsch wrote:
   Hi all,

   when I run KVM with this command the Windows 8 installation
   stops
 with
   error code 0x005D:
   kvm -m 1024 -hda win8.img -cdrom windows_8_x86.iso

   After adding the option -cpu host the installation proceeds to
 a
   black
   screen and hangs.

   With Virtualbox the installation succeeds.
   The host CPU is an Intel Core Duo L2400.

   Do you have any suggestions?

  What is your kernel/qemu version?

 I'm using Debian unstable.

 qemu-kvm 1.1.2+dfsg-3
 Linux version 3.2.0-4-686-pae (debian-ker...@lists.debian.org) (gcc
 version 4.6.3 (Debian 4.6.3-14) ) #1 SMP Debian 3.2.35-2

you met issue only for 32bit Win8 (not 64 bit Win8), right?
I think it's the same issue as the below bug I reported.
https://bugs.launchpad.net/qemu/+bug/1007269
You can try with '-cpu coreduo' or '-cpu core2duo' in qemu-kvm
 command
   line.

This should be a known issue which is caused by missing 'SEP' CPU
 flag.
See another bug in Redhat bugzilla.
https://bugzilla.redhat.com/show_bug.cgi?id=821741

   That was RHEL kernel bug. Doubt Debian one has it.

  I don't think so. It should be a qemu bug (also described in that RHEL
 bugzilla).
 https://bugzilla.redhat.com/show_bug.cgi?id=821463 is the kernel one.

  In my SandyBridge platform, 32bit Win8 guest can boot up with '-cpu
 SandyBridge,+sep' in qemu-kvm CLI.
  But it can't boot up with '-cpu SandyBridge'.

 Which qemu version? Master has sep in SandyBridge definition. In any
 case -cpu host should have sep enabled.

Oh, sorry for my bad memory. I'm not sure it's about SEP.
I shouldn't have taken 'SandyBridge' model as an example because 32bit Win8 can 
boot up on 'SandyBridge' model.
'qemu64' is the default CPU model in QEMU/KVM on x86-64, but its 'family 
number' definition is not accurate. 
The vendor of 'qemu64' is defined as 'AMD'.
For AMD processors, AMD K8 CPU (e.g. AMD Family 15) firstly introduced SSE3 
instruction set.
'qemu64' already has 'CPUID_EXT_SSE3' in its ext_features, but its family 
number is only 6 (not equal or bigger than 15).
If I update the family number of 'qemu64' to '15', 32bit Win8 guest can boot up 
with the default 'qemu64' CPU model.
32bit Win8 may have some requirements of a 64bit CPU model.
The following patch can be a fix for 'qemu64' definition. 

diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index 78bd61e..89b203b 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -363,7 +363,7 @@ static x86_def_t builtin_x86_defs[] = {
 .vendor1 = CPUID_VENDOR_AMD_1,
 .vendor2 = CPUID_VENDOR_AMD_2,
 .vendor3 = CPUID_VENDOR_AMD_3,
-.family = 6,
+.family = 15,
 .model = 2,
 .stepping = 3,
 .features = PPRO_FEATURES |
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] [PATCH 10/12] virtio-net: multiqueue support

2013-01-09 Thread Jason Wang

On Wednesday, January 09, 2013 11:26:33 PM Jason Wang wrote:
 On 01/09/2013 06:01 PM, Wanlong Gao wrote:
  On 01/09/2013 05:30 PM, Jason Wang wrote:
  On 01/09/2013 04:23 PM, Wanlong Gao wrote:
  On 01/08/2013 06:14 PM, Jason Wang wrote:
  On 01/08/2013 06:00 PM, Wanlong Gao wrote:
  On 01/08/2013 05:51 PM, Jason Wang wrote:
  On 01/08/2013 05:49 PM, Wanlong Gao wrote:
  On 01/08/2013 05:29 PM, Jason Wang wrote:
  On 01/08/2013 05:07 PM, Wanlong Gao wrote:
  On 12/28/2012 06:32 PM, Jason Wang wrote:
  +} else if (nc-peer-info-type != 
  NET_CLIENT_OPTIONS_KIND_TAP) {
  +ret = -1;
  +} else {
  +ret = tap_detach(nc-peer);
  +}
  +
  +return ret;
  +}
  +
[...]
  I got guest kernel panic when using this way and set queues=4.
  
  Does it happens w/o or w/ a fd parameter? What's the qemu command line?
  Did you meet it during boot time?
  
  The QEMU command line is
  
  /work/git/qemu/x86_64-softmmu/qemu-system-x86_64 -name f17 -M pc-0.15
  -enable-kvm -m 3096 \ -smp 4,sockets=4,cores=1,threads=1 \
  -uuid c31a9f3e-4161-c53a-339c-5dc36d0497cb -no-user-config -nodefaults \
  -chardev
  socket,id=charmonitor,path=/var/lib/libvirt/qemu/f17.monitor,server,nowai
  t \ -mon chardev=charmonitor,id=monitor,mode=control \
  -rtc base=utc -no-shutdown \
  -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
  -device
  virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0xb,num_queues=4,hotplug=on \
  -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 \
  -drive file=/vm/f17.img,if=none,id=drive-virtio-disk0,format=qcow2 \
  -device
  virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=vi
  rtio-disk0,bootindex=1 \ -drive
  file=/vm2/f17-kernel.img,if=none,id=drive-virtio-disk1,format=qcow2 \
  -device
  virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk1,id=vi
  rtio-disk1 \ -drive
  file=/vm/virtio-scsi/scsi3.img,if=none,id=drive-scsi0-0-2-0,format=raw \
  -device
  scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=2,drive=drive-scsi0-0-2-0,id=
  scsi0-0-2-0,removable=on \ -drive
  file=/vm/virtio-scsi/scsi4.img,if=none,id=drive-scsi0-0-3-0,format=raw \
  -device
  scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=3,drive=drive-scsi0-0-3-0,id=
  scsi0-0-3-0 \ -drive
  file=/vm/virtio-scsi/scsi1.img,if=none,id=drive-scsi0-0-0-0,format=raw \
  -device
  scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=
  scsi0-0-0-0 \ -drive
  file=/vm/virtio-scsi/scsi2.img,if=none,id=drive-scsi0-0-1-0,format=raw \
  -device
  scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=1,drive=drive-scsi0-0-1-0,id=
  scsi0-0-1-0 \ -chardev pty,id=charserial0 -device
  isa-serial,chardev=charserial0,id=serial0 \ -chardev
  file,id=charserial1,path=/vm/f17.log \
  -device isa-serial,chardev=charserial1,id=serial1 \
  -device usb-tablet,id=input0 -vga std \
  -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 \
  -netdev tap,id=hostnet0,vhost=on,queues=4 \
  -device
  virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:ce:7b:29,bus=pci.0,ad
  dr=0x3 \ -monitor stdio
  
  I got panic just after booting the system, did nothing,  waited for a
  while, the guest panicked.
  
  [   28.053004] BUG: soft lockup - CPU#1 stuck for 23s! [ip:592]
  [   28.053004] Modules linked in: ip6t_REJECT nf_conntrack_ipv6
  nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables uinput
  joydev microcode virtio_balloon pcspkr virtio_net i2c_piix4 i2c_core
  virtio_scsi virtio_blk floppy [   28.053004] CPU 1
  [   28.053004] Pid: 592, comm: ip Not tainted 3.8.0-rc1-net+ #3 Bochs
  Bochs
  [   28.053004] RIP: 0010:[8137a9ab]  [8137a9ab]
  virtqueue_get_buf+0xb/0x120 [   28.053004] RSP: 0018:8800bc913550 
  EFLAGS: 0246
  [   28.053004] RAX:  RBX: 8800bc49c000 RCX:
  8800bc49e000 [   28.053004] RDX:  RSI:
  8800bc913584 RDI: 8800bcfd4000 [   28.053004] RBP:
  8800bc913558 R08: 8800bcfd0800 R09:  [  
  28.053004] R10: 8800bc49c000 R11: 880036cc4de0 R12:
  8800bcfd4000 [   28.053004] R13: 8800bc913558 R14:
  8137ad73 R15: 000200d0 [   28.053004] FS: 
  7fb27a589740() GS:8800c148() knlGS: [
28.053004] CS:  0010 DS:  ES:  CR0: 80050033
  [   28.053004] CR2: 00640530 CR3: baeff000 CR4:
  06e0 [   28.053004] DR0:  DR1:
   DR2:  [   28.053004] DR3:
   DR6: 0ff0 DR7: 0400 [  
  28.053004] Process ip (pid: 592, threadinfo 8800bc912000, task
  880036da2e20) [   28.053004] Stack:
  [   28.053004]  8800bcfd0800 8800bc913638 a003e9bb
  8800bc913656 [   28.053004]  00010002 8800c17ebb08
  0050ff10 ea0002f244c0 [   28.053004]  00020582
    ea0002f244c0 [   28.053004] Call
  Trace:
  [   28.053004]

Re: [Qemu-devel] [PATCH 10/12] virtio-net: multiqueue support

2013-01-09 Thread Wanlong Gao

On 01/10/2013 02:43 PM, Jason Wang wrote:
 On Wednesday, January 09, 2013 11:26:33 PM Jason Wang wrote:
 On 01/09/2013 06:01 PM, Wanlong Gao wrote:
 On 01/09/2013 05:30 PM, Jason Wang wrote:
 On 01/09/2013 04:23 PM, Wanlong Gao wrote:
 On 01/08/2013 06:14 PM, Jason Wang wrote:
 On 01/08/2013 06:00 PM, Wanlong Gao wrote:
 On 01/08/2013 05:51 PM, Jason Wang wrote:
 On 01/08/2013 05:49 PM, Wanlong Gao wrote:
 On 01/08/2013 05:29 PM, Jason Wang wrote:
 On 01/08/2013 05:07 PM, Wanlong Gao wrote:
 On 12/28/2012 06:32 PM, Jason Wang wrote:
 +} else if (nc-peer-info-type != 
 NET_CLIENT_OPTIONS_KIND_TAP) {
 +ret = -1;
 +} else {
 +ret = tap_detach(nc-peer);
 +}
 +
 +return ret;
 +}
 +
 [...]
 I got guest kernel panic when using this way and set queues=4.

 Does it happens w/o or w/ a fd parameter? What's the qemu command line?
 Did you meet it during boot time?

 The QEMU command line is

 /work/git/qemu/x86_64-softmmu/qemu-system-x86_64 -name f17 -M pc-0.15
 -enable-kvm -m 3096 \ -smp 4,sockets=4,cores=1,threads=1 \
 -uuid c31a9f3e-4161-c53a-339c-5dc36d0497cb -no-user-config -nodefaults \
 -chardev
 socket,id=charmonitor,path=/var/lib/libvirt/qemu/f17.monitor,server,nowai
 t \ -mon chardev=charmonitor,id=monitor,mode=control \
 -rtc base=utc -no-shutdown \
 -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
 -device
 virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0xb,num_queues=4,hotplug=on \
 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 \
 -drive file=/vm/f17.img,if=none,id=drive-virtio-disk0,format=qcow2 \
 -device
 virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=vi
 rtio-disk0,bootindex=1 \ -drive
 file=/vm2/f17-kernel.img,if=none,id=drive-virtio-disk1,format=qcow2 \
 -device
 virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk1,id=vi
 rtio-disk1 \ -drive
 file=/vm/virtio-scsi/scsi3.img,if=none,id=drive-scsi0-0-2-0,format=raw \
 -device
 scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=2,drive=drive-scsi0-0-2-0,id=
 scsi0-0-2-0,removable=on \ -drive
 file=/vm/virtio-scsi/scsi4.img,if=none,id=drive-scsi0-0-3-0,format=raw \
 -device
 scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=3,drive=drive-scsi0-0-3-0,id=
 scsi0-0-3-0 \ -drive
 file=/vm/virtio-scsi/scsi1.img,if=none,id=drive-scsi0-0-0-0,format=raw \
 -device
 scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=
 scsi0-0-0-0 \ -drive
 file=/vm/virtio-scsi/scsi2.img,if=none,id=drive-scsi0-0-1-0,format=raw \
 -device
 scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=1,drive=drive-scsi0-0-1-0,id=
 scsi0-0-1-0 \ -chardev pty,id=charserial0 -device
 isa-serial,chardev=charserial0,id=serial0 \ -chardev
 file,id=charserial1,path=/vm/f17.log \
 -device isa-serial,chardev=charserial1,id=serial1 \
 -device usb-tablet,id=input0 -vga std \
 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 \
 -netdev tap,id=hostnet0,vhost=on,queues=4 \
 -device
 virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:ce:7b:29,bus=pci.0,ad
 dr=0x3 \ -monitor stdio

 I got panic just after booting the system, did nothing,  waited for a
 while, the guest panicked.

 [   28.053004] BUG: soft lockup - CPU#1 stuck for 23s! [ip:592]
 [   28.053004] Modules linked in: ip6t_REJECT nf_conntrack_ipv6
 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables uinput
 joydev microcode virtio_balloon pcspkr virtio_net i2c_piix4 i2c_core
 virtio_scsi virtio_blk floppy [   28.053004] CPU 1
 [   28.053004] Pid: 592, comm: ip Not tainted 3.8.0-rc1-net+ #3 Bochs
 Bochs
 [   28.053004] RIP: 0010:[8137a9ab]  [8137a9ab]
 virtqueue_get_buf+0xb/0x120 [   28.053004] RSP: 0018:8800bc913550 
 EFLAGS: 0246
 [   28.053004] RAX:  RBX: 8800bc49c000 RCX:
 8800bc49e000 [   28.053004] RDX:  RSI:
 8800bc913584 RDI: 8800bcfd4000 [   28.053004] RBP:
 8800bc913558 R08: 8800bcfd0800 R09:  [  
 28.053004] R10: 8800bc49c000 R11: 880036cc4de0 R12:
 8800bcfd4000 [   28.053004] R13: 8800bc913558 R14:
 8137ad73 R15: 000200d0 [   28.053004] FS: 
 7fb27a589740() GS:8800c148() knlGS: [
   28.053004] CS:  0010 DS:  ES:  CR0: 80050033
 [   28.053004] CR2: 00640530 CR3: baeff000 CR4:
 06e0 [   28.053004] DR0:  DR1:
  DR2:  [   28.053004] DR3:
  DR6: 0ff0 DR7: 0400 [  
 28.053004] Process ip (pid: 592, threadinfo 8800bc912000, task
 880036da2e20) [   28.053004] Stack:
 [   28.053004]  8800bcfd0800 8800bc913638 a003e9bb
 8800bc913656 [   28.053004]  00010002 8800c17ebb08
 0050ff10 ea0002f244c0 [   28.053004]  00020582
   ea0002f244c0 [   28.053004] Call
 Trace:
 [   28.053004]  [a003e9bb]
 virtnet_send_command.constprop.26+0x24b/0x270

Re: [PATCH] virtio-spec: fix two typos

2013-01-09 Thread Rusty Russell

Stefan Hajnoczi stefa...@gmail.com writes:

 On Wed, Jan 09, 2013 at 03:55:23PM +0800, ak...@redhat.com wrote:
 From: Amos Kong ak...@redhat.com
 
 VIRTIO_NET_F_VTRL_VQ - VIRTIO_NET_F_CTRL_VQ
 VIRTIO_NET_CTRL_MQ is defined to 4 in kernel code
 
 Signed-off-by: Amos Kong ak...@redhat.com
 ---
  virtio-spec.lyx | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

 Reviewed-by: Stefan Hajnoczi stefa...@redhat.com

Thanks, applied.

Cheers,
Rusty.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] [PATCH 10/12] virtio-net: multiqueue support

2013-01-09 Thread Jason Wang

On Thursday, January 10, 2013 02:49:14 PM Wanlong Gao wrote:
 On 01/10/2013 02:43 PM, Jason Wang wrote:
  On Wednesday, January 09, 2013 11:26:33 PM Jason Wang wrote:
  On 01/09/2013 06:01 PM, Wanlong Gao wrote:
  On 01/09/2013 05:30 PM, Jason Wang wrote:
  On 01/09/2013 04:23 PM, Wanlong Gao wrote:
  On 01/08/2013 06:14 PM, Jason Wang wrote:
  On 01/08/2013 06:00 PM, Wanlong Gao wrote:
  On 01/08/2013 05:51 PM, Jason Wang wrote:
  On 01/08/2013 05:49 PM, Wanlong Gao wrote:
  On 01/08/2013 05:29 PM, Jason Wang wrote:
  On 01/08/2013 05:07 PM, Wanlong Gao wrote:
  On 12/28/2012 06:32 PM, Jason Wang wrote:
  +} else if (nc-peer-info-type !=
  NET_CLIENT_OPTIONS_KIND_TAP) {
  +ret = -1;
  +} else {
  +ret = tap_detach(nc-peer);
  +}
  +
  +return ret;
  +}
  +
  
  [...]
  
  I got guest kernel panic when using this way and set queues=4.
  
  Does it happens w/o or w/ a fd parameter? What's the qemu command line?
  Did you meet it during boot time?
  
  The QEMU command line is
  
  /work/git/qemu/x86_64-softmmu/qemu-system-x86_64 -name f17 -M pc-0.15
  -enable-kvm -m 3096 \ -smp 4,sockets=4,cores=1,threads=1 \
  -uuid c31a9f3e-4161-c53a-339c-5dc36d0497cb -no-user-config -nodefaults \
  -chardev
  socket,id=charmonitor,path=/var/lib/libvirt/qemu/f17.monitor,server,nowa
  i
  t \ -mon chardev=charmonitor,id=monitor,mode=control \
  -rtc base=utc -no-shutdown \
  -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
  -device
  virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0xb,num_queues=4,hotplug=on \
  -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 \
  -drive file=/vm/f17.img,if=none,id=drive-virtio-disk0,format=qcow2 \
  -device
  virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=v
  i
  rtio-disk0,bootindex=1 \ -drive
  file=/vm2/f17-kernel.img,if=none,id=drive-virtio-disk1,format=qcow2 \
  -device
  virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk1,id=v
  i
  rtio-disk1 \ -drive
  file=/vm/virtio-scsi/scsi3.img,if=none,id=drive-scsi0-0-2-0,format=raw \
  -device
  scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=2,drive=drive-scsi0-0-2-0,id
  =
  scsi0-0-2-0,removable=on \ -drive
  file=/vm/virtio-scsi/scsi4.img,if=none,id=drive-scsi0-0-3-0,format=raw \
  -device
  scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=3,drive=drive-scsi0-0-3-0,id
  =
  scsi0-0-3-0 \ -drive
  file=/vm/virtio-scsi/scsi1.img,if=none,id=drive-scsi0-0-0-0,format=raw \
  -device
  scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id
  =
  scsi0-0-0-0 \ -drive
  file=/vm/virtio-scsi/scsi2.img,if=none,id=drive-scsi0-0-1-0,format=raw \
  -device
  scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=1,drive=drive-scsi0-0-1-0,id
  =
  scsi0-0-1-0 \ -chardev pty,id=charserial0 -device
  isa-serial,chardev=charserial0,id=serial0 \ -chardev
  file,id=charserial1,path=/vm/f17.log \
  -device isa-serial,chardev=charserial1,id=serial1 \
  -device usb-tablet,id=input0 -vga std \
  -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 \
  -netdev tap,id=hostnet0,vhost=on,queues=4 \
  -device
  virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:ce:7b:29,bus=pci.0,a
  d
  dr=0x3 \ -monitor stdio
  
  I got panic just after booting the system, did nothing,  waited for a
  while, the guest panicked.
  
  [   28.053004] BUG: soft lockup - CPU#1 stuck for 23s! [ip:592]
  [   28.053004] Modules linked in: ip6t_REJECT nf_conntrack_ipv6
  nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables uinput
  joydev microcode virtio_balloon pcspkr virtio_net i2c_piix4 i2c_core
  virtio_scsi virtio_blk floppy [   28.053004] CPU 1
  [   28.053004] Pid: 592, comm: ip Not tainted 3.8.0-rc1-net+ #3 Bochs
  Bochs
  [   28.053004] RIP: 0010:[8137a9ab]  [8137a9ab]
  virtqueue_get_buf+0xb/0x120 [   28.053004] RSP: 0018:8800bc913550
  EFLAGS: 0246
  [   28.053004] RAX:  RBX: 8800bc49c000 RCX:
  8800bc49e000 [   28.053004] RDX:  RSI:
  8800bc913584 RDI: 8800bcfd4000 [   28.053004] RBP:
  8800bc913558 R08: 8800bcfd0800 R09:  [
  28.053004] R10: 8800bc49c000 R11: 880036cc4de0 R12:
  8800bcfd4000 [   28.053004] R13: 8800bc913558 R14:
  8137ad73 R15: 000200d0 [   28.053004] FS:
  7fb27a589740() GS:8800c148() knlGS:
  [
  
28.053004] CS:  0010 DS:  ES:  CR0: 80050033
  
  [   28.053004] CR2: 00640530 CR3: baeff000 CR4:
  06e0 [   28.053004] DR0:  DR1:
   DR2:  [   28.053004] DR3:
   DR6: 0ff0 DR7: 0400 [
  28.053004] Process ip (pid: 592, threadinfo 8800bc912000, task
  880036da2e20) [   28.053004] Stack:
  [   28.053004]  8800bcfd0800 8800bc913638 a003e9bb
  8800bc913656 [   28.053004]  00010002 8800c17ebb08
  0050ff10 ea0002f244c0 [

[PATCH v9 0/3] x86, apicv: Add APIC virtualization support

2013-01-09 Thread Yang Zhang

From: Yang Zhang yang.z.zh...@intel.com

APIC virtualization is a new feature which can eliminate most of VM exit
when vcpu handle a interrupt:

APIC register virtualization:
APIC read access doesn't cause APIC-access VM exits.
APIC write becomes trap-like.

Virtual interrupt delivery:
Virtual interrupt delivery avoids KVM to inject vAPIC interrupts
manually, which is fully taken care of by the hardware.

Please refer to Intel SDM volume 3, chapter 29 for more details.
Changes v8 to v9:
 * Update eoi exit bitmap by vcpu itself.
 * Enable virtualize x2apic mode when guest is using x2apic.
 * Rebase on top of KVM upstream

Changes v7 to v8:
 * According Marcelo's suggestion, add comments for irr_pending and isr_count,
   since the two valiables have different meaning when using apicv.
 * Set highest bit in vISR to SVI after migation.
 * Use spinlock to access eoi exit bitmap synchronously.
 * Enable virtualize x2apic mode when guest is using x2apic
 * Rebased on top of KVM upstream.

Changes v6 to v7:
 * fix a bug when set exit bitmap.
 * Rebased on top of KVM upstream.

Changes v5 to v6:
 * minor adjustments according gleb's comments
 * Rebased on top of KVM upstream.

Changes v4 to v5:
 * Set eoi exit bitmap when an interrupt has notifier registered.
 * Use request to track ioapic entry's modification.
 * Rebased on top of KVM upstream.

Changes v3 to v4:
 * use one option to control both register virtualization and virtual interrupt
   delivery.
 * Update eoi exit bitmap when programing ioapic or programing apic's 
id/dfr/ldr.
 * Rebased on top of KVM upstream.

Changes v2 to v3:
 * Drop Posted Interrupt patch from v3.
   According Gleb's suggestion, we will use global vector for all VCPUs as 
notification
   event vector. So we will rewrite the Posted Interrupt patch. And resend it 
later.
 * Use TMR to set the eoi exiting bitmap. We only want to set eoi exiting 
bitmap for
   those interrupt which is level trigger or has notifier in EOI write path. So 
TMR is
   enough to distinguish the interrupt trigger mode.
 * Simplify some code according Gleb's comments.
 * rebased on top of KVM upstream.

Changes v1 to v2:
 * Add Posted Interrupt support in this series patch.
 * Since there is a notifer hook in vAPIC EOI for PIT interrupt. So always Set 
PIT
   interrupt in eoi exit bitmap to force vmexit when EOI to interrupt.
 * Rebased on top of KVM upstream

Yang Zhang (3):
  x86, apicv: add APICv register virtualization support
  x86, apicv: add virtual x2apic support
  x86, apicv: add virtual interrupt delivery support

 arch/x86/include/asm/kvm_host.h |7 +
 arch/x86/include/asm/vmx.h  |   14 ++
 arch/x86/kvm/irq.c  |   56 +-
 arch/x86/kvm/lapic.c|   92 ++---
 arch/x86/kvm/lapic.h|   25 +++
 arch/x86/kvm/svm.c  |   24 +++
 arch/x86/kvm/vmx.c  |  404 ++-
 arch/x86/kvm/x86.c  |   14 ++-
 include/linux/kvm_host.h|3 +
 virt/kvm/ioapic.c   |   18 ++
 virt/kvm/ioapic.h   |4 +
 virt/kvm/irq_comm.c |   22 ++
 virt/kvm/kvm_main.c |5 +
 13 files changed, 643 insertions(+), 45 deletions(-)

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v9 1/3] x86, apicv: add APICv register virtualization support

2013-01-09 Thread Yang Zhang

- APIC read doesn't cause VM-Exit
- APIC write becomes trap-like

Signed-off-by: Kevin Tian kevin.t...@intel.com
Signed-off-by: Yang Zhang yang.z.zh...@intel.com
---
 arch/x86/include/asm/vmx.h |2 ++
 arch/x86/kvm/lapic.c   |   15 +++
 arch/x86/kvm/lapic.h   |2 ++
 arch/x86/kvm/vmx.c |   33 -
 4 files changed, 51 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index e385df9..44c3f7e 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -66,6 +66,7 @@
 #define EXIT_REASON_EPT_MISCONFIG   49
 #define EXIT_REASON_WBINVD  54
 #define EXIT_REASON_XSETBV  55
+#define EXIT_REASON_APIC_WRITE  56
 #define EXIT_REASON_INVPCID 58
 
 #define VMX_EXIT_REASONS \
@@ -141,6 +142,7 @@
 #define SECONDARY_EXEC_ENABLE_VPID  0x0020
 #define SECONDARY_EXEC_WBINVD_EXITING  0x0040
 #define SECONDARY_EXEC_UNRESTRICTED_GUEST  0x0080
+#define SECONDARY_EXEC_APIC_REGISTER_VIRT   0x0100
 #define SECONDARY_EXEC_PAUSE_LOOP_EXITING  0x0400
 #define SECONDARY_EXEC_ENABLE_INVPCID  0x1000
 
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 9392f52..0664c13 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1212,6 +1212,21 @@ void kvm_lapic_set_eoi(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_lapic_set_eoi);
 
+/* emulate APIC access in a trap manner */
+void kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset)
+{
+   u32 val = 0;
+
+   /* hw has done the conditional check and inst decode */
+   offset = 0xff0;
+
+   apic_reg_read(vcpu-arch.apic, offset, 4, val);
+
+   /* TODO: optimize to just emulate side effect w/o one more write */
+   apic_reg_write(vcpu-arch.apic, offset, val);
+}
+EXPORT_SYMBOL_GPL(kvm_apic_write_nodecode);
+
 void kvm_free_lapic(struct kvm_vcpu *vcpu)
 {
struct kvm_lapic *apic = vcpu-arch.apic;
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index e5ebf9f..9a8ee22 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -64,6 +64,8 @@ int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu);
 u64 kvm_get_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu);
 void kvm_set_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu, u64 data);
 
+void kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset);
+
 void kvm_lapic_set_vapic_addr(struct kvm_vcpu *vcpu, gpa_t vapic_addr);
 void kvm_lapic_sync_from_vapic(struct kvm_vcpu *vcpu);
 void kvm_lapic_sync_to_vapic(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 55dfc37..688f43f 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -84,6 +84,9 @@ module_param(vmm_exclusive, bool, S_IRUGO);
 static bool __read_mostly fasteoi = 1;
 module_param(fasteoi, bool, S_IRUGO);
 
+static bool __read_mostly enable_apicv_reg_vid = 1;
+module_param(enable_apicv_reg_vid, bool, S_IRUGO);
+
 /*
  * If nested=1, nested virtualization is supported, i.e., guests may use
  * VMX and be a hypervisor for its own guests. If nested=0, guests may not
@@ -764,6 +767,12 @@ static inline bool 
cpu_has_vmx_virtualize_apic_accesses(void)
SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
 }
 
+static inline bool cpu_has_vmx_apic_register_virt(void)
+{
+   return vmcs_config.cpu_based_2nd_exec_ctrl 
+   SECONDARY_EXEC_APIC_REGISTER_VIRT;
+}
+
 static inline bool cpu_has_vmx_flexpriority(void)
 {
return cpu_has_vmx_tpr_shadow() 
@@ -2541,7 +2550,8 @@ static __init int setup_vmcs_config(struct vmcs_config 
*vmcs_conf)
SECONDARY_EXEC_UNRESTRICTED_GUEST |
SECONDARY_EXEC_PAUSE_LOOP_EXITING |
SECONDARY_EXEC_RDTSCP |
-   SECONDARY_EXEC_ENABLE_INVPCID;
+   SECONDARY_EXEC_ENABLE_INVPCID |
+   SECONDARY_EXEC_APIC_REGISTER_VIRT;
if (adjust_vmx_controls(min2, opt2,
MSR_IA32_VMX_PROCBASED_CTLS2,
_cpu_based_2nd_exec_control)  0)
@@ -2552,6 +2562,11 @@ static __init int setup_vmcs_config(struct vmcs_config 
*vmcs_conf)
SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES))
_cpu_based_exec_control = ~CPU_BASED_TPR_SHADOW;
 #endif
+
+   if (!(_cpu_based_exec_control  CPU_BASED_TPR_SHADOW))
+   _cpu_based_2nd_exec_control = ~(
+   SECONDARY_EXEC_APIC_REGISTER_VIRT);
+
if (_cpu_based_2nd_exec_control  SECONDARY_EXEC_ENABLE_EPT) {
/* CR3 accesses and invlpg don't need to cause VM Exits when EPT
   enabled */
@@ -2749,6 +2764,9 @@ static __init int hardware_setup(void)
if (!cpu_has_vmx_ple())
ple_gap = 0;
 
+   if (!cpu_has_vmx_apic_register_virt())
+

[PATCH v9 3/3] x86, apicv: add virtual interrupt delivery support

2013-01-09 Thread Yang Zhang

From: Yang Zhang yang.z.zh...@intel.com

Virtual interrupt delivery avoids KVM to inject vAPIC interrupts
manually, which is fully taken care of by the hardware. This needs
some special awareness into existing interrupr injection path:

- for pending interrupt, instead of direct injection, we may need
  update architecture specific indicators before resuming to guest.

- A pending interrupt, which is masked by ISR, should be also
  considered in above update action, since hardware will decide
  when to inject it at right time. Current has_interrupt and
  get_interrupt only returns a valid vector from injection p.o.v.

Signed-off-by: Kevin Tian kevin.t...@intel.com
Signed-off-by: Yang Zhang yang.z.zh...@intel.com
---
 arch/x86/include/asm/kvm_host.h |5 +
 arch/x86/include/asm/vmx.h  |   11 +++
 arch/x86/kvm/irq.c  |   56 +++-
 arch/x86/kvm/lapic.c|   72 +--
 arch/x86/kvm/lapic.h|   23 +
 arch/x86/kvm/svm.c  |   18 
 arch/x86/kvm/vmx.c  |  191 +--
 arch/x86/kvm/x86.c  |   14 +++-
 include/linux/kvm_host.h|3 +
 virt/kvm/ioapic.c   |   18 
 virt/kvm/ioapic.h   |4 +
 virt/kvm/irq_comm.c |   22 +
 virt/kvm/kvm_main.c |5 +
 13 files changed, 399 insertions(+), 43 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 572a562..f471856 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -697,6 +697,10 @@ struct kvm_x86_ops {
void (*enable_nmi_window)(struct kvm_vcpu *vcpu);
void (*enable_irq_window)(struct kvm_vcpu *vcpu);
void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr);
+   int (*has_virtual_interrupt_delivery)(struct kvm_vcpu *vcpu);
+   void (*update_apic_irq)(struct kvm_vcpu *vcpu, int max_irr);
+   void (*update_eoi_exitmap)(struct kvm_vcpu *vcpu);
+   void (*set_svi)(int isr);
void (*enable_virtual_x2apic_mode)(struct kvm_vcpu *vcpu);
void (*disable_virtual_x2apic_mode)(struct kvm_vcpu *vcpu);
int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
@@ -993,6 +997,7 @@ int kvm_age_hva(struct kvm *kvm, unsigned long hva);
 int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);
 void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
 int cpuid_maxphyaddr(struct kvm_vcpu *vcpu);
+int kvm_cpu_has_injectable_intr(struct kvm_vcpu *v);
 int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu);
 int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu);
 int kvm_cpu_get_interrupt(struct kvm_vcpu *v);
diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 0a54df0..694586c 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -62,6 +62,7 @@
 #define EXIT_REASON_MCE_DURING_VMENTRY  41
 #define EXIT_REASON_TPR_BELOW_THRESHOLD 43
 #define EXIT_REASON_APIC_ACCESS 44
+#define EXIT_REASON_EOI_INDUCED 45
 #define EXIT_REASON_EPT_VIOLATION   48
 #define EXIT_REASON_EPT_MISCONFIG   49
 #define EXIT_REASON_WBINVD  54
@@ -144,6 +145,7 @@
 #define SECONDARY_EXEC_WBINVD_EXITING  0x0040
 #define SECONDARY_EXEC_UNRESTRICTED_GUEST  0x0080
 #define SECONDARY_EXEC_APIC_REGISTER_VIRT   0x0100
+#define SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY0x0200
 #define SECONDARY_EXEC_PAUSE_LOOP_EXITING  0x0400
 #define SECONDARY_EXEC_ENABLE_INVPCID  0x1000
 
@@ -181,6 +183,7 @@ enum vmcs_field {
GUEST_GS_SELECTOR   = 0x080a,
GUEST_LDTR_SELECTOR = 0x080c,
GUEST_TR_SELECTOR   = 0x080e,
+   GUEST_INTR_STATUS   = 0x0810,
HOST_ES_SELECTOR= 0x0c00,
HOST_CS_SELECTOR= 0x0c02,
HOST_SS_SELECTOR= 0x0c04,
@@ -208,6 +211,14 @@ enum vmcs_field {
APIC_ACCESS_ADDR_HIGH   = 0x2015,
EPT_POINTER = 0x201a,
EPT_POINTER_HIGH= 0x201b,
+   EOI_EXIT_BITMAP0= 0x201c,
+   EOI_EXIT_BITMAP0_HIGH   = 0x201d,
+   EOI_EXIT_BITMAP1= 0x201e,
+   EOI_EXIT_BITMAP1_HIGH   = 0x201f,
+   EOI_EXIT_BITMAP2= 0x2020,
+   EOI_EXIT_BITMAP2_HIGH   = 0x2021,
+   EOI_EXIT_BITMAP3= 0x2022,
+   EOI_EXIT_BITMAP3_HIGH   = 0x2023,
GUEST_PHYSICAL_ADDRESS  = 0x2400,
GUEST_PHYSICAL_ADDRESS_HIGH = 0x2401,
VMCS_LINK_POINTER   = 0x2800,
diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c
index b111aee..e113440 100644
--- a/arch/x86/kvm/irq.c
+++ b/arch/x86/kvm/irq.c
@@ -38,6 +38,38 @@ int kvm_cpu_has_pending_timer(struct kvm_vcpu

[PATCH v9 2/3] x86, apicv: add virtual x2apic support

2013-01-09 Thread Yang Zhang

From: Yang Zhang yang.z.zh...@intel.com

basically to benefit from apicv, we need to enable virtualized x2apic mode.
Currently, we only enable it when guest is really using x2apic.

Also, clear MSR bitmap for corresponding x2apic MSRs when guest enabled x2apic:
0x800 - 0x8ff: no read intercept for apicv register virtualization,
   except APIC ID and TMCCT.
APIC ID and TMCCT: need software's assistance to get right value.
TPR,EOI,SELF-IPI: no write intercept for virtual interrupt delivery.

Signed-off-by: Kevin Tian kevin.t...@intel.com
Signed-off-by: Yang Zhang yang.z.zh...@intel.com
---
 arch/x86/include/asm/kvm_host.h |2 +
 arch/x86/include/asm/vmx.h  |1 +
 arch/x86/kvm/lapic.c|5 +-
 arch/x86/kvm/svm.c  |6 +
 arch/x86/kvm/vmx.c  |  194 +--
 5 files changed, 200 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c431b33..572a562 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -697,6 +697,8 @@ struct kvm_x86_ops {
void (*enable_nmi_window)(struct kvm_vcpu *vcpu);
void (*enable_irq_window)(struct kvm_vcpu *vcpu);
void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr);
+   void (*enable_virtual_x2apic_mode)(struct kvm_vcpu *vcpu);
+   void (*disable_virtual_x2apic_mode)(struct kvm_vcpu *vcpu);
int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
int (*get_tdp_level)(void);
u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio);
diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 44c3f7e..0a54df0 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -139,6 +139,7 @@
 #define SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES 0x0001
 #define SECONDARY_EXEC_ENABLE_EPT   0x0002
 #define SECONDARY_EXEC_RDTSCP  0x0008
+#define SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE   0x0010
 #define SECONDARY_EXEC_ENABLE_VPID  0x0020
 #define SECONDARY_EXEC_WBINVD_EXITING  0x0040
 #define SECONDARY_EXEC_UNRESTRICTED_GUEST  0x0080
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 0664c13..ec38906 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1328,7 +1328,10 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value)
u32 id = kvm_apic_id(apic);
u32 ldr = ((id  4)  16) | (1  (id  0xf));
kvm_apic_set_ldr(apic, ldr);
-   }
+   kvm_x86_ops-enable_virtual_x2apic_mode(vcpu);
+   } else
+   kvm_x86_ops-disable_virtual_x2apic_mode(vcpu);
+
apic-base_address = apic-vcpu-arch.apic_base 
 MSR_IA32_APICBASE_BASE;
 
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index d29d3cd..0b82cb1 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3571,6 +3571,11 @@ static void update_cr8_intercept(struct kvm_vcpu *vcpu, 
int tpr, int irr)
set_cr_intercept(svm, INTERCEPT_CR8_WRITE);
 }
 
+static void svm_enable_virtual_x2apic_mode(struct kvm_vcpu *vcpu)
+{
+   return;
+}
+
 static int svm_nmi_allowed(struct kvm_vcpu *vcpu)
 {
struct vcpu_svm *svm = to_svm(vcpu);
@@ -4290,6 +4295,7 @@ static struct kvm_x86_ops svm_x86_ops = {
.enable_nmi_window = enable_nmi_window,
.enable_irq_window = enable_irq_window,
.update_cr8_intercept = update_cr8_intercept,
+   .enable_virtual_x2apic_mode = svm_enable_virtual_x2apic_mode,
 
.set_tss_addr = svm_set_tss_addr,
.get_tdp_level = get_npt_level,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 688f43f..b203ce7 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -433,6 +433,8 @@ struct vcpu_vmx {
 
bool rdtscp_enabled;
 
+   bool virtual_x2apic_enabled;
+
/* Support for a guest hypervisor (nested VMX) */
struct nested_vmx nested;
 };
@@ -767,12 +769,23 @@ static inline bool 
cpu_has_vmx_virtualize_apic_accesses(void)
SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
 }
 
+static inline bool cpu_has_vmx_virtualize_x2apic_mode(void)
+{
+   return vmcs_config.cpu_based_2nd_exec_ctrl 
+   SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE;
+}
+
 static inline bool cpu_has_vmx_apic_register_virt(void)
 {
return vmcs_config.cpu_based_2nd_exec_ctrl 
SECONDARY_EXEC_APIC_REGISTER_VIRT;
 }
 
+static inline bool cpu_has_vmx_virtual_intr_delivery(void)
+{
+   return false;
+}
+
 static inline bool cpu_has_vmx_flexpriority(void)
 {
return cpu_has_vmx_tpr_shadow() 
@@ -2544,6 +2557,7 @@ static __init int setup_vmcs_config(struct vmcs_config 
*vmcs_conf)
if (_cpu_based_exec_control  CPU_BASED_ACTIVATE_SECONDARY_CONTROLS) {
min2 = 0;
opt2 =

Re: [PATCH v9 2/3] x86, apicv: add virtual x2apic support

2013-01-09 Thread Gleb Natapov

On Thu, Jan 10, 2013 at 03:26:07PM +0800, Yang Zhang wrote:
 From: Yang Zhang yang.z.zh...@intel.com
 
 basically to benefit from apicv, we need to enable virtualized x2apic mode.
 Currently, we only enable it when guest is really using x2apic.
 
 Also, clear MSR bitmap for corresponding x2apic MSRs when guest enabled 
 x2apic:
 0x800 - 0x8ff: no read intercept for apicv register virtualization,
  except APIC ID and TMCCT.
 APIC ID and TMCCT: need software's assistance to get right value.
 TPR,EOI,SELF-IPI: no write intercept for virtual interrupt delivery.
 
 Signed-off-by: Kevin Tian kevin.t...@intel.com
 Signed-off-by: Yang Zhang yang.z.zh...@intel.com
 ---
  arch/x86/include/asm/kvm_host.h |2 +
  arch/x86/include/asm/vmx.h  |1 +
  arch/x86/kvm/lapic.c|5 +-
  arch/x86/kvm/svm.c  |6 +
  arch/x86/kvm/vmx.c  |  194 
 +--
  5 files changed, 200 insertions(+), 8 deletions(-)
 
 diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
 index c431b33..572a562 100644
 --- a/arch/x86/include/asm/kvm_host.h
 +++ b/arch/x86/include/asm/kvm_host.h
 @@ -697,6 +697,8 @@ struct kvm_x86_ops {
   void (*enable_nmi_window)(struct kvm_vcpu *vcpu);
   void (*enable_irq_window)(struct kvm_vcpu *vcpu);
   void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr);
 + void (*enable_virtual_x2apic_mode)(struct kvm_vcpu *vcpu);
 + void (*disable_virtual_x2apic_mode)(struct kvm_vcpu *vcpu);
Make one callback with enable/disable parameter. And do not forget SVM.


   int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
   int (*get_tdp_level)(void);
   u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio);
 diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
 index 44c3f7e..0a54df0 100644
 --- a/arch/x86/include/asm/vmx.h
 +++ b/arch/x86/include/asm/vmx.h
 @@ -139,6 +139,7 @@
  #define SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES 0x0001
  #define SECONDARY_EXEC_ENABLE_EPT   0x0002
  #define SECONDARY_EXEC_RDTSCP0x0008
 +#define SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE   0x0010
  #define SECONDARY_EXEC_ENABLE_VPID  0x0020
  #define SECONDARY_EXEC_WBINVD_EXITING0x0040
  #define SECONDARY_EXEC_UNRESTRICTED_GUEST0x0080
 diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
 index 0664c13..ec38906 100644
 --- a/arch/x86/kvm/lapic.c
 +++ b/arch/x86/kvm/lapic.c
 @@ -1328,7 +1328,10 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 
 value)
   u32 id = kvm_apic_id(apic);
   u32 ldr = ((id  4)  16) | (1  (id  0xf));
   kvm_apic_set_ldr(apic, ldr);
 - }
 + kvm_x86_ops-enable_virtual_x2apic_mode(vcpu);
 + } else
 + kvm_x86_ops-disable_virtual_x2apic_mode(vcpu);
 +
You just broke SVM.

   apic-base_address = apic-vcpu-arch.apic_base 
MSR_IA32_APICBASE_BASE;
  
 diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
 index d29d3cd..0b82cb1 100644
 --- a/arch/x86/kvm/svm.c
 +++ b/arch/x86/kvm/svm.c
 @@ -3571,6 +3571,11 @@ static void update_cr8_intercept(struct kvm_vcpu 
 *vcpu, int tpr, int irr)
   set_cr_intercept(svm, INTERCEPT_CR8_WRITE);
  }
  
 +static void svm_enable_virtual_x2apic_mode(struct kvm_vcpu *vcpu)
 +{
 + return;
 +}
 +
  static int svm_nmi_allowed(struct kvm_vcpu *vcpu)
  {
   struct vcpu_svm *svm = to_svm(vcpu);
 @@ -4290,6 +4295,7 @@ static struct kvm_x86_ops svm_x86_ops = {
   .enable_nmi_window = enable_nmi_window,
   .enable_irq_window = enable_irq_window,
   .update_cr8_intercept = update_cr8_intercept,
 + .enable_virtual_x2apic_mode = svm_enable_virtual_x2apic_mode,
  
   .set_tss_addr = svm_set_tss_addr,
   .get_tdp_level = get_npt_level,
 diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
 index 688f43f..b203ce7 100644
 --- a/arch/x86/kvm/vmx.c
 +++ b/arch/x86/kvm/vmx.c
 @@ -433,6 +433,8 @@ struct vcpu_vmx {
  
   bool rdtscp_enabled;
  
 + bool virtual_x2apic_enabled;
 +
   /* Support for a guest hypervisor (nested VMX) */
   struct nested_vmx nested;
  };
 @@ -767,12 +769,23 @@ static inline bool 
 cpu_has_vmx_virtualize_apic_accesses(void)
   SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
  }
  
 +static inline bool cpu_has_vmx_virtualize_x2apic_mode(void)
 +{
 + return vmcs_config.cpu_based_2nd_exec_ctrl 
 + SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE;
 +}
 +
  static inline bool cpu_has_vmx_apic_register_virt(void)
  {
   return vmcs_config.cpu_based_2nd_exec_ctrl 
   SECONDARY_EXEC_APIC_REGISTER_VIRT;
  }
  
 +static inline bool cpu_has_vmx_virtual_intr_delivery(void)
 +{
 + return false;
 +}
 +
  static inline bool cpu_has_vmx_flexpriority(void)
  {
   return cpu_has_vmx_tpr_shadow() 
 @@ -2544,6 +2557,7

Re: [RFC PATCH 0/9] KVM: PPC: In-kernel PAPR interrupt controller emulation

2013-01-09 Thread Paul Mackerras

On Sat, Dec 15, 2012 at 01:06:13PM +1100, Benjamin Herrenschmidt wrote:
 On Sat, 2012-12-15 at 01:46 +0100, Alexander Graf wrote:
  On 05.11.2012, at 04:18, Paul Mackerras wrote:
  
   This series implements in-kernel emulation of the XICS interrupt
   controller specified in the PAPR specification and used by pseries
   guests.  Since the XICS is organized a bit differently to the
   interrupt controllers used on x86 machines, I have defined some new
   ioctls for setting up the XICS and for saving and restoring its
   state.  It may be possible to use some of the currently x86-specific
   ioctls instead.
  
  Is this one already following the new world order that we discussed during 
  KVM Forum? :)
 
 The new world order  (sorry, looks like nobody took notes and
 people expect me to do a write up from memory now ... :-)
 
 Well, mostly we agreed that the x86 stuff wasn't going to work for us.
 
 So basically what we discussed boils down to:
 
  - Move the existing generic KVM ioctl to create the kernel APIC to
 x86 since it's no as-is useful for other archs who, among other things,
 might need to pass argument based on the machine type (type of interrupt
 subsystem for example, etc...)

Assuming you're talking about KVM_CREATE_IRQCHIP, it is already
handled entirely in arch code (arch/x86/kvm/x86.c and
arch/ia64/kvm/kvm-ia64.c), along with KVM_GET_IRQCHIP and
KVM_SET_IRQCHIP.

  - Once that's done, well, instanciating interrupt controller objects
 becomes pretty much an arch specific matter. We could try to keep some
 ioctls somewhat common though it's not *that* useful because the various
 types  arguments are going to be fairly arch specific, so goes for
 configuration. Examples of what could be kept common are:
 
* Create irq chip, takes at least a type argument, possibly a few
 more type-specific args (or a union, but then let's keep space in there
 since we can't change the size of the struct later as it would impact
 the ioctl number afaik).

The existing KVM_CREATE_IRQCHIP is defined as _IO(...) which means
that it doesn't read or write memory, but there is still the 3rd
argument to the ioctl, which would give us 64 bits to indicate the
type of the top-level IRQ controller (XICS, MPIC, ...), but not much
more.

It sounds like the agreement at the meeting was more along the lines
of the KVM_CREATE_IRQCHIP_ARGS ioctl (as in the patches I posted)
which can be called multiple times to instantiate pieces of the
interrupt framework, rather than having a KVM_CREATE_IRQCHIP that gets
called once early on to say that there we are using in-kernel
interrupt controller emulation, followed by other calls to configure
the various parts of the framework.  Is that accurate?

   * Assigning the address of an irq chip when it can change (see ARM
 patches)
 
  - The existing business with irqfd etc... is mostly ok, except the
 interfaces messing around with MSIs (see virtio-pci use of kvm functions
 directly). The assignment of an irq number for an MSI must be a hook,
 probably a PCI controller hook, so x86 can get it done via its existing
 kernel interfaces and sane architectures can keep the assignment in qemu
 where it belongs.

So, if I've understood you correctly about what was agreed, the set of
ioctls that is implemented in the patches I posted is in line with
what was agreed, isn't it?  If not, where does it differ?  (To recap,
I had KVM_CREATE_IRQCHIP_ARGS, KVM_IRQCHIP_GET_SOURCES and
KVM_IRQCHIP_SET_SOURCES, plus a one-reg interface to get/set the
vcpu-specific state.)

Paul.
--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

90 matches

Mail list logo