RE: [PATCH v8 2/3] x86, apicv: add virtual interrupt delivery support
Gleb Natapov wrote on 2013-01-08: On Tue, Jan 08, 2013 at 11:57:59AM -0200, Marcelo Tosatti wrote: On Tue, Jan 08, 2013 at 12:57:42PM +, Zhang, Yang Z wrote: Gleb Natapov wrote on 2013-01-08: On Mon, Jan 07, 2013 at 07:32:39PM -0200, Marcelo Tosatti wrote: On Mon, Jan 07, 2013 at 07:48:43PM +0200, Gleb Natapov wrote: ioapic_write (or any other ioapic update) lock() perform update make_all_vcpus_request(KVM_REQ_UPDATE_EOI_BITMAP) (*) unlock() (*) Similarly to TLB flush. The advantage is that all work becomes vcpu local. The end result is much simpler code. What complexity will it remove? Synchronization between multiple CPUs (except the KVM_REQ_ bit processing, which is infrastructure shared by other parts of KVM). Synchronization is just a lock around bitmap access. Can be replaced with RCU if it turns to be performance problem. We agreed that performance is non issue here. Yes, if the code is indeed simpler we can take the hit, although recalculating bitmap 255 times instead of one for -smp 255 looks like a little bit excessive, but I do not see considerable simplification (if at all). So as far as I understand you are proposing: vcpu0 or io thread: |vcpu1: ioapic_write (or other ioapic update) | lock(exitbitmap) | if (on vcpu) | ioapic_update_my_eoi_exitmap() | make_all_vcpus_request(update) |if (update requested) | ioapic_update_my_eoi_exitmap() unlock(exitbitmap) | The current patch logic is this: vcpu0 or io thread: | vcpu1: ioapic_write (or other ioapic update) | lock(exitbitmap) | ioapic_update_all_eoi_exitmaps() | make request on each vcpu| kick each vcpu | if (update requested) unlock(exitbitmap) |lock(exitbitmap) |load_exitbitmap() |unlock(exitbitmap) If I described correctly what you are proposing I do not see simplification since the bulk of the complexity is in the ioapic_update_(my|all)_eoi_exitmap() and they will be the same in both implementations. Actually I do see complication in your idea introduced by the fact that the case when update is done from vcpu thread have to be handled specially. The proposed patch may be simplified further by make_all_vcpus_request_async(update)(*) instead of making request and kicking each vcpu individually. In fact the way it is done now is buggy since requests are made only for vcpus with bit set in their bitmask, but if bit is cleared request is not made so vcpu can run with stale bitmask. ok, how about the follow logic: ioapic_write() lock() clear_eoi_exitmap_on_all_vcpus() perform update(no make request) make_all_vcpus_request(like tlb flush) unlock() Why not just ioapic writer / map updater context -- ioapic_write() make_all_vcpus_request() (no special lock taken) vcpu context, entry -- if(check_request(KVM_REQ_, )) { ioapic_lock(); (*) update local EOI exit bitmap from IOAPIC In my patch, it traverses IOAPIC entry once and only updates target vcpus's eoi exit bitmap. Then make request for all vcpus. With your suggestion , all vcpus will traverse all IOAPIC entries. Though ioapic entry write is rare, it's still not reasonable. ioapic_unlock(); } Fine by me. Looks simpler. (*) plus any other lock that paths that update the map take Best regards, Yang -- Gleb. Best regards, Yang -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH 10/12] virtio-net: multiqueue support
On 01/08/2013 06:14 PM, Jason Wang wrote: On 01/08/2013 06:00 PM, Wanlong Gao wrote: On 01/08/2013 05:51 PM, Jason Wang wrote: On 01/08/2013 05:49 PM, Wanlong Gao wrote: On 01/08/2013 05:29 PM, Jason Wang wrote: On 01/08/2013 05:07 PM, Wanlong Gao wrote: On 12/28/2012 06:32 PM, Jason Wang wrote: +} else if (nc-peer-info-type != NET_CLIENT_OPTIONS_KIND_TAP) { +ret = -1; +} else { +ret = tap_detach(nc-peer); +} + +return ret; +} + +static void virtio_net_set_queues(VirtIONet *n) +{ +int i; + +for (i = 0; i n-max_queues; i++) { +if (i n-curr_queues) { +assert(!peer_attach(n, i)); +} else { +assert(!peer_detach(n, i)); I got a assert here, qemu-system-x86_64: /work/git/qemu/hw/virtio-net.c:330: virtio_net_set_queues: Assertion `!peer_detach(n, i)' failed. Any thoughts? Thanks, Wanlong Gao Thanks for the testing, which steps or cases did you met this assertion, migration, reboot or just changing the number of virtqueues? I use the 3.8-rc2 to test it again, I saw this tag has the multi-tap support. I just can't start the QEMU use -netdev tap,id=hostnet0,queues=2,fd=%d,fd=%d -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:ce:7b:29,bus=pci.0,addr=0x3 I pre-opened two tap fds, did I missing something? Nothing missed :) It should work. Could you please try not use fd=X and let qemu to create the file descriptors by itself? Btw, how did you create the two tap fds? Can it create descriptors itself? I get qemu-system-x86_64: -netdev tap,id=hostnet0,queues=2: Device 'tap' could not be initialized You need prepare an ifup script which default at /etc/qemu-ifup (like following). Or you may try to add a script=no after: #!/bin/sh switch=kvmbr0 /sbin/ifconfig $1 0.0.0.0 up /usr/sbin/brctl addif $switch $1 /usr/sbin/brctl stp $switch off This will let qemu create a tap fd itself and make it to be connected to a port of the bridge caled kvmbr0. But how to support multi-queue in this way? I got guest kernel panic when using this way and set queues=4. Thanks, Wanlong Gao I create the tap fd like this, and dup create the second fd, third fd, right? The second and third fd should be created with TUNSETIFF with the same tap_name also. Btw, you need to specify a IFF_MULTI_QUEUE flag to tell the kernel you want to create a multiqueue tap device, otherwise the second and third calling of TUNSETIFF will fail. Thanks int tap_fd = open(/dev/net/tun, O_RDWR); int vhost_fd = open(/dev/vhost-net, O_RDWR); char *tap_name = tap; char cmd[2048]; char brctl[256]; char netup[256]; struct ifreq ifr; if (tap_fd 0) { printf(open tun device failed\n); return -1; } if (vhost_fd 0) { printf(open vhost-net device failed\n); return -1; } memset(ifr, 0, sizeof(ifr)); memcpy(ifr.ifr_name, tap_name, sizeof(tap_name)); ifr.ifr_flags = IFF_TAP | IFF_NO_PI; /* * setup tap net device */ if (ioctl(tap_fd, TUNSETIFF, ifr) 0) { printf(setup tap net device failed\n); return -1; } sprintf(brctl, brctl addif virbr0 %s, tap_name); sprintf(netup, ifconfig %s up, tap_name); system(brctl); system(netup); Thanks, Wanlong Gao Thanks Thanks, Wanlong Gao +} +} +} + +static void virtio_net_set_multiqueue(VirtIONet *n, int multiqueue, int ctrl); + -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: what's the different for qemu --eanble-kvm and accel=kvm and qemu(when kvm kmod load)
On Mon, Jan 07, 2013 at 05:08:13PM +0800, lei yang wrote: On Mon, Jan 7, 2013 at 4:58 PM, Stefan Hajnoczi stefa...@gmail.com wrote: On Sun, Jan 6, 2013 at 12:27 PM, lei yang yanglei.f...@gmail.com wrote: What's the different with below combos? The difference is historical, it's just how the command-line options evolved over time. 1)qemu --enable-kvm The old way. Still useful because it's slightly easier to type than --machine accel=kvm. 2)qemu accel=kvm The modern way. 3)qemu without above parameters when kvm kmod has been load There is a difference in behavior between QEMU and qemu-kvm here: QEMU uses TCG and not KVM by default, regardless of whether the kvm.ko module has been loaded or not. qemu-kvm uses KVM by default, if available. The qemu-kvm fork has been retired so it's best not to rely on this behavior. Future distro packages will be built from QEMU and unless a code change is made, the default accelerator is TCG. Thanks fro the explain So if we want use kvm we need to explicitly add --enable-kvm or accel=kvm regardless kvm.ko load or not Yes. How can we check we are using TCG or KVM, can we check this in guestos or check this with monitor can you show me the exactly command? The QEMU monitor info kvm command displays the status. Stefan -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] virtio-spec: fix two typos
On Wed, Jan 09, 2013 at 03:55:23PM +0800, ak...@redhat.com wrote: From: Amos Kong ak...@redhat.com VIRTIO_NET_F_VTRL_VQ - VIRTIO_NET_F_CTRL_VQ VIRTIO_NET_CTRL_MQ is defined to 4 in kernel code Signed-off-by: Amos Kong ak...@redhat.com --- virtio-spec.lyx | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) Reviewed-by: Stefan Hajnoczi stefa...@redhat.com -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH qom-cpu 2/7] target-i386: Don't set any KVM flag by default if KVM is disabled
On Mon, 7 Jan 2013 16:20:43 -0200 Eduardo Habkost ehabk...@redhat.com wrote: This is a cleanup that tries to solve two small issues: - We don't need a separate kvm_pv_eoi_features variable just to keep a constant calculated at compile-time, and this style would require adding a separate variable (that's declared twice because of the CONFIG_KVM ifdef) for each feature that's going to be enabled/disable by machine-type compat code. - The pc-1.3 code is setting the kvm_pv_eoi flag on cpuid_kvm_features even when KVM is disabled at runtime. This small incosistency in the cpuid_kvm_features field isn't a problem today because cpuid_kvm_features is ignored by the TCG code, but it may cause unexpected problems later when refactoring the CPUID handling code. This patch eliminates the kvm_pv_eoi_features variable and simply uses kvm_enabled() inside the enable_kvm_pv_eoi() compat function, so it enables kvm_pv_eoi only if KVM is enabled. I believe this makes the behavior of enable_kvm_pv_eoi() clearer and easier to understand. Subj doesn't match what patch actually does. Have you meant Don't set kvm_pv_eoi flag by default if KVM is disabled? Although eliminate kvm_pv_eoi_features variable might better describe what patch is doing. Signed-off-by: Eduardo Habkost ehabk...@redhat.com --- Cc: kvm@vger.kernel.org Cc: Michael S. Tsirkin m...@redhat.com Cc: Gleb Natapov g...@redhat.com Cc: Marcelo Tosatti mtosa...@redhat.com Changes v2: - Coding style fix Changes v3: - Eliminate #ifdef by using the fake KVM_FEATURE_PV_EOI #define --- target-i386/cpu.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/target-i386/cpu.c b/target-i386/cpu.c index 951e206..40400ac 100644 --- a/target-i386/cpu.c +++ b/target-i386/cpu.c @@ -164,15 +164,15 @@ static uint32_t kvm_default_features = (1 KVM_FEATURE_CLOCKSOURCE) | (1 KVM_FEATURE_ASYNC_PF) | (1 KVM_FEATURE_STEAL_TIME) | (1 KVM_FEATURE_CLOCKSOURCE_STABLE_BIT); -static const uint32_t kvm_pv_eoi_features = (0x1 KVM_FEATURE_PV_EOI); #else static uint32_t kvm_default_features = 0; -static const uint32_t kvm_pv_eoi_features = 0; #endif void enable_kvm_pv_eoi(void) { -kvm_default_features |= kvm_pv_eoi_features; +if (kvm_enabled()) { +kvm_default_features |= (1UL KVM_FEATURE_PV_EOI); +} } void host_cpuid(uint32_t function, uint32_t count, -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH 10/12] virtio-net: multiqueue support
On 01/09/2013 04:23 PM, Wanlong Gao wrote: On 01/08/2013 06:14 PM, Jason Wang wrote: On 01/08/2013 06:00 PM, Wanlong Gao wrote: On 01/08/2013 05:51 PM, Jason Wang wrote: On 01/08/2013 05:49 PM, Wanlong Gao wrote: On 01/08/2013 05:29 PM, Jason Wang wrote: On 01/08/2013 05:07 PM, Wanlong Gao wrote: On 12/28/2012 06:32 PM, Jason Wang wrote: +} else if (nc-peer-info-type != NET_CLIENT_OPTIONS_KIND_TAP) { +ret = -1; +} else { +ret = tap_detach(nc-peer); +} + +return ret; +} + +static void virtio_net_set_queues(VirtIONet *n) +{ +int i; + +for (i = 0; i n-max_queues; i++) { +if (i n-curr_queues) { +assert(!peer_attach(n, i)); +} else { +assert(!peer_detach(n, i)); I got a assert here, qemu-system-x86_64: /work/git/qemu/hw/virtio-net.c:330: virtio_net_set_queues: Assertion `!peer_detach(n, i)' failed. Any thoughts? Thanks, Wanlong Gao Thanks for the testing, which steps or cases did you met this assertion, migration, reboot or just changing the number of virtqueues? I use the 3.8-rc2 to test it again, I saw this tag has the multi-tap support. I just can't start the QEMU use -netdev tap,id=hostnet0,queues=2,fd=%d,fd=%d -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:ce:7b:29,bus=pci.0,addr=0x3 I pre-opened two tap fds, did I missing something? Nothing missed :) It should work. Could you please try not use fd=X and let qemu to create the file descriptors by itself? Btw, how did you create the two tap fds? Can it create descriptors itself? I get qemu-system-x86_64: -netdev tap,id=hostnet0,queues=2: Device 'tap' could not be initialized You need prepare an ifup script which default at /etc/qemu-ifup (like following). Or you may try to add a script=no after: #!/bin/sh switch=kvmbr0 /sbin/ifconfig $1 0.0.0.0 up /usr/sbin/brctl addif $switch $1 /usr/sbin/brctl stp $switch off This will let qemu create a tap fd itself and make it to be connected to a port of the bridge caled kvmbr0. But how to support multi-queue in this way? Qemu will create the necessary multiqueue tap by itself, see patch 0/12. I got guest kernel panic when using this way and set queues=4. Does it happens w/o or w/ a fd parameter? What's the qemu command line? Did you meet it during boot time? Thanks Thanks, Wanlong Gao I create the tap fd like this, and dup create the second fd, third fd, right? The second and third fd should be created with TUNSETIFF with the same tap_name also. Btw, you need to specify a IFF_MULTI_QUEUE flag to tell the kernel you want to create a multiqueue tap device, otherwise the second and third calling of TUNSETIFF will fail. Thanks int tap_fd = open(/dev/net/tun, O_RDWR); int vhost_fd = open(/dev/vhost-net, O_RDWR); char *tap_name = tap; char cmd[2048]; char brctl[256]; char netup[256]; struct ifreq ifr; if (tap_fd 0) { printf(open tun device failed\n); return -1; } if (vhost_fd 0) { printf(open vhost-net device failed\n); return -1; } memset(ifr, 0, sizeof(ifr)); memcpy(ifr.ifr_name, tap_name, sizeof(tap_name)); ifr.ifr_flags = IFF_TAP | IFF_NO_PI; /* * setup tap net device */ if (ioctl(tap_fd, TUNSETIFF, ifr) 0) { printf(setup tap net device failed\n); return -1; } sprintf(brctl, brctl addif virbr0 %s, tap_name); sprintf(netup, ifconfig %s up, tap_name); system(brctl); system(netup); Thanks, Wanlong Gao Thanks Thanks, Wanlong Gao +} +} +} + +static void virtio_net_set_multiqueue(VirtIONet *n, int multiqueue, int ctrl); + -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 01/12] tap: multiqueue support
On Fri, Dec 28, 2012 at 06:31:53PM +0800, Jason Wang wrote: diff --git a/qapi-schema.json b/qapi-schema.json index 5dfa052..583eb7c 100644 --- a/qapi-schema.json +++ b/qapi-schema.json @@ -2465,7 +2465,7 @@ { 'type': 'NetdevTapOptions', 'data': { '*ifname': 'str', -'*fd': 'str', +'*fd': ['String'], This change is not backwards-compatible. You need to add a '*fds': ['String'] field instead. '*script': 'str', '*downscript': 'str', '*helper': 'str', @@ -2473,7 +2473,8 @@ '*vnet_hdr': 'bool', '*vhost': 'bool', '*vhostfd':'str', -'*vhostforce': 'bool' } } +'*vhostforce': 'bool', +'*queues': 'uint32'} } The 'queues' parameter should not be necessary when fd passing is used since we can learn the number of queues by looking at the list length. Stefan -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH 10/12] virtio-net: multiqueue support
On 01/09/2013 05:30 PM, Jason Wang wrote: On 01/09/2013 04:23 PM, Wanlong Gao wrote: On 01/08/2013 06:14 PM, Jason Wang wrote: On 01/08/2013 06:00 PM, Wanlong Gao wrote: On 01/08/2013 05:51 PM, Jason Wang wrote: On 01/08/2013 05:49 PM, Wanlong Gao wrote: On 01/08/2013 05:29 PM, Jason Wang wrote: On 01/08/2013 05:07 PM, Wanlong Gao wrote: On 12/28/2012 06:32 PM, Jason Wang wrote: +} else if (nc-peer-info-type != NET_CLIENT_OPTIONS_KIND_TAP) { +ret = -1; +} else { +ret = tap_detach(nc-peer); +} + +return ret; +} + +static void virtio_net_set_queues(VirtIONet *n) +{ +int i; + +for (i = 0; i n-max_queues; i++) { +if (i n-curr_queues) { +assert(!peer_attach(n, i)); +} else { +assert(!peer_detach(n, i)); I got a assert here, qemu-system-x86_64: /work/git/qemu/hw/virtio-net.c:330: virtio_net_set_queues: Assertion `!peer_detach(n, i)' failed. Any thoughts? Thanks, Wanlong Gao Thanks for the testing, which steps or cases did you met this assertion, migration, reboot or just changing the number of virtqueues? I use the 3.8-rc2 to test it again, I saw this tag has the multi-tap support. I just can't start the QEMU use -netdev tap,id=hostnet0,queues=2,fd=%d,fd=%d -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:ce:7b:29,bus=pci.0,addr=0x3 I pre-opened two tap fds, did I missing something? Nothing missed :) It should work. Could you please try not use fd=X and let qemu to create the file descriptors by itself? Btw, how did you create the two tap fds? Can it create descriptors itself? I get qemu-system-x86_64: -netdev tap,id=hostnet0,queues=2: Device 'tap' could not be initialized You need prepare an ifup script which default at /etc/qemu-ifup (like following). Or you may try to add a script=no after: #!/bin/sh switch=kvmbr0 /sbin/ifconfig $1 0.0.0.0 up /usr/sbin/brctl addif $switch $1 /usr/sbin/brctl stp $switch off This will let qemu create a tap fd itself and make it to be connected to a port of the bridge caled kvmbr0. But how to support multi-queue in this way? Qemu will create the necessary multiqueue tap by itself, see patch 0/12. I got guest kernel panic when using this way and set queues=4. Does it happens w/o or w/ a fd parameter? What's the qemu command line? Did you meet it during boot time? The QEMU command line is /work/git/qemu/x86_64-softmmu/qemu-system-x86_64 -name f17 -M pc-0.15 -enable-kvm -m 3096 \ -smp 4,sockets=4,cores=1,threads=1 \ -uuid c31a9f3e-4161-c53a-339c-5dc36d0497cb -no-user-config -nodefaults \ -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/f17.monitor,server,nowait \ -mon chardev=charmonitor,id=monitor,mode=control \ -rtc base=utc -no-shutdown \ -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \ -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0xb,num_queues=4,hotplug=on \ -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 \ -drive file=/vm/f17.img,if=none,id=drive-virtio-disk0,format=qcow2 \ -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \ -drive file=/vm2/f17-kernel.img,if=none,id=drive-virtio-disk1,format=qcow2 \ -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk1,id=virtio-disk1 \ -drive file=/vm/virtio-scsi/scsi3.img,if=none,id=drive-scsi0-0-2-0,format=raw \ -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=2,drive=drive-scsi0-0-2-0,id=scsi0-0-2-0,removable=on \ -drive file=/vm/virtio-scsi/scsi4.img,if=none,id=drive-scsi0-0-3-0,format=raw \ -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=3,drive=drive-scsi0-0-3-0,id=scsi0-0-3-0 \ -drive file=/vm/virtio-scsi/scsi1.img,if=none,id=drive-scsi0-0-0-0,format=raw \ -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0 \ -drive file=/vm/virtio-scsi/scsi2.img,if=none,id=drive-scsi0-0-1-0,format=raw \ -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=1,drive=drive-scsi0-0-1-0,id=scsi0-0-1-0 \ -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 \ -chardev file,id=charserial1,path=/vm/f17.log \ -device isa-serial,chardev=charserial1,id=serial1 \ -device usb-tablet,id=input0 -vga std \ -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 \ -netdev tap,id=hostnet0,vhost=on,queues=4 \ -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:ce:7b:29,bus=pci.0,addr=0x3 \ -monitor stdio I got panic just after booting the system, did nothing, waited for a while, the guest panicked. [ 28.053004] BUG: soft lockup - CPU#1 stuck for 23s! [ip:592] [ 28.053004] Modules linked in: ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables uinput joydev microcode virtio_balloon pcspkr virtio_net i2c_piix4 i2c_core virtio_scsi virtio_blk floppy [ 28.053004] CPU 1 [ 28.053004] Pid: 592, comm: ip Not tainted 3.8.0-rc1-net+ #3 Bochs
Re: [PATCH v5 01/12] KVM: ARM: Introduce KVM_SET_DEVICE_ADDRESS ioctl
Am 09.01.2013 um 00:49 schrieb Christoffer Dall c.d...@virtualopensystems.com: On Tue, Jan 8, 2013 at 6:29 PM, Scott Wood scottw...@freescale.com wrote: On 01/08/2013 05:17:05 PM, Christoffer Dall wrote: On Tue, Jan 8, 2013 at 5:36 PM, Scott Wood scottw...@freescale.com wrote: On 01/08/2013 12:41:30 PM, Christoffer Dall wrote: +struct kvm_device_address { + __u64 id; + __u64 addr; +}; What about this is really specific to addresses? Can't we set other device parameters this way? Sort of like a device equivalent of PPC's one-reg interface. This has been discussed a number of times, Sorry, this patch was just pointed out to me today. I googled the patch title but couldn't find this discussion. I believe it was mainly discussed at the KVM Forum in person. and one or the other there is a need for userspace to tell KVM to present memory-mapped devices at a given address. It was also considered to make this specific to irqchip initialization, but irqchips are different and a lot of that code is x86-specific, so that approach was discarded. This *could* look something like this: struct kvm_device_param { u64 dev_id; u64 param_id; u64 value; }; but that has less clear, or at least less specific, semantics. Why is it less clear? You need to have device-specific documentation for what id means, so why not also an enumeration of params? Or just keep it as is, and rename address to value. Whether dev and param are combined is orthogonal from whether it's used for non-address things. less clear in the sense that you have to look at more code to see what it does. I'm not saying that it's too unclear, at all, I'm actually fine with it, but to make my point, we can make an ioctl that's called do_something() that takes a struct with val0, val1, val2, val3, ... If you leave it as address, either we'll have it being used for non-address things regardless of the name (Not a typewriter!), or there'll end up being yet more unnecessary ioctls, or device-specific things will end up getting shoved into CPU interfaces such as one-reg. For example, on MPIC we need to be able to specify the version of the chip to emulate in addition to the address at which it lives. Also, why is it documented as an arm interface? Shouldn't it be a generic interface, with other architectures currently not implementing any IDs? What in the kvm_arch_vm_ioctl() wrapper is arm-specific? As I remember the argument for keeping this the point was that there were other preferred methods for other archs to do this, and that ARM was the only platform that had this explicit need, but maybe I'm making this up. I'll let Peter and Alex respond this as well, and if they're fine with changing it to what I proposed above, then let's do that, and we can make it a non arm-specific interface. I think we should make thus at least potentially generic. In fact, I wouldn't even mind calling it DEV_REG with the same semantics as ONE_REG, just that it also gets a unique dev id that gets created during in-kernel device creation and that it's a vm ioctl. That way we wouldn't block our path to create two in-kernel irqchips one day. Alex -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: FreeBSD-amd64 fails to start with SMP on quemu-kvm
On Tue, Jan 08, 2013 at 09:45:35PM +0100, Artur Samborski wrote: W dniu 08.01.2013 00:00, Marcelo Tosatti pisze: On Mon, Jan 07, 2013 at 06:13:22PM +0100, Artur Samborski wrote: Hello, When i try to run FreeBSD-amd64 on more than 1 vcpu in quemu-kvm (Fedora Core 17) eg. to run FreeBSD-9.0-RELEASE-amd64 with: qemu-kvm -m 1024m -cpu host -smp 2 -cdrom /storage/iso/FreeBSD-9.0-RELEASE-amd64-dvd1.iso it freezes KVM with: KVM internal error. Suberror: 1 emulation failure RAX=80b0d4c0 RBX=0009f000 RCX=c080 RDX= RSI=d238 RDI= RBP= RSP= R8 = R9 = R10= R11= R12= R13= R14= R15= RIP=0009f076 RFL=00010086 [--S--P-] CPL=0 II=0 A20=1 SMM=0 HLT=0 ES = f300 DPL=3 DS16 [-WA] CS =0008 00209900 DPL=0 CS64 [--A] SS =9f00 0009f000 f300 DPL=3 DS16 [-WA] DS =0018 00c09300 DPL=0 DS [-WA] FS = f300 DPL=3 DS16 [-WA] GS = f300 DPL=3 DS16 [-WA] LDT= 8200 DPL=0 LDT TR = 8b00 DPL=0 TSS64-busy GDT= 0009f080 0020 IDT= CR0=8011 CR2= CR3=0009c000 CR4=0030 DR0= DR1= DR2= DR3= DR6=0ff0 DR7=0400 EFER=0501 Code=00 00 00 80 0f 22 c0 ea 70 f0 09 00 08 00 48 b8 c0 d4 b0 80 ff ff ff ff ff e0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 99 20 00 ff ff 00 00 Artur, Can you check whether https://patchwork-mail.kernel.org/patch/1942681/ fixes your problem Hi, thanks for the reply. Unfortunately, the patch does not help. Attempt to start FreeBSD amd64 via quemu-kvm with -smp parameter greater than 1 fails in exactly the same way as before. The patch was applied to the kernel from the 3.6.11-1.fc17.src.rpm package. Do I need some additional patches? Can you try queue branch from kvm.git? -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/4] ARM updates for kvmtool
On Mon, Jan 7, 2013 at 8:14 PM, Will Deacon will.dea...@arm.com wrote: Hello kvm hackers, This patch series introduces some updates to the ARM (AArch32) kvm tools code: - virtio mmio fixes to deal with guest page sizes != 4k (in preparation for AArch64, which I will post separately). - .dtb dumping via the lkvm command line - Support for PSCI firmware as a replacement to the spin-table based SMP boot code The last option was implemented after discussion on the linux-arm-kernel list when adding support for the mach-virt platform. I hope to upstream the kernel-side part of the implementation for 3.9 and expect the kvm bits to follow once that has been merged. All feedback welcome. Applied, thanks Will! -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] Add support for ARMv8 CPUs to kvmtool
Hi Will, On Mon, Jan 7, 2013 at 8:43 PM, Will Deacon will.dea...@arm.com wrote: Hello again, These two patches add support for ARMv8 processors running an AArch64 instance of kvm to kvmtool. Both AArch32 and AArch64 guests are supported and, in the case of the latter, the guest page size may be either 64k or 4k. This depends on the ARM updates series I just posted: https://lists.cs.columbia.edu/pipermail/kvmarm/2013-January/004505.html Feedback welcome, So will ARMv7 continue to work after this patch even without the ARM updates? If so, I'm happy to merge the patch before the updates hit v3.9. Pekka -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] Add support for ARMv8 CPUs to kvmtool
On Wed, Jan 09, 2013 at 11:16:42AM +, Pekka Enberg wrote: Hi Will, Hello, On Mon, Jan 7, 2013 at 8:43 PM, Will Deacon will.dea...@arm.com wrote: These two patches add support for ARMv8 processors running an AArch64 instance of kvm to kvmtool. Both AArch32 and AArch64 guests are supported and, in the case of the latter, the guest page size may be either 64k or 4k. This depends on the ARM updates series I just posted: https://lists.cs.columbia.edu/pipermail/kvmarm/2013-January/004505.html Feedback welcome, So will ARMv7 continue to work after this patch even without the ARM updates? If so, I'm happy to merge the patch before the updates hit v3.9. I think we're getting our wires crossed a bit here, so I'll try to explain my madness: - Mainline ARM kernels cannot be booted by kvmtool yet, you currently have to use my virt/mach branch from git://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git I hope to upstream this for 3.9 (it has gone through the necessary review on the ARM kernel list). - As part of the review mentioned above, some kvmtool changes were required to support SMP with the latest virt/mach. You just merged these in my ARM updates series. - The ARMv8 kvmtool code builds on top of the series you just merged and won't apply without it. Applying it won't have any adverse effects on ARMv7 (AArch32). In fact, the two series sit together on my kvmtool/arm branch from git://git.kernel.org/pub/scm/linux/kernel/git/will/kvmtool.git but I posted them separately to avoid holding up the ARM updates. Does that answer you question or have I made things worse?! Cheers, Will -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH qom-cpu 2/7] target-i386: Don't set any KVM flag by default if KVM is disabled
On Wed, Jan 09, 2013 at 09:41:37AM -0200, Eduardo Habkost wrote: [...] Andreas, please ignore this patch (it is not necessary anymore as this series doesn't include machine-type compatibility code for kvm_mmu). Patches 2-7 don't depend on this patch and should apply cleanly without it. I mean: patches 1 and 3-7 don't depend on this patch and can be applied cleanly without it. -- Eduardo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/4] ARM updates for kvmtool
On Wed, Jan 09, 2013 at 11:13:58AM +, Pekka Enberg wrote: On Mon, Jan 7, 2013 at 8:14 PM, Will Deacon will.dea...@arm.com wrote: Hello kvm hackers, This patch series introduces some updates to the ARM (AArch32) kvm tools code: - virtio mmio fixes to deal with guest page sizes != 4k (in preparation for AArch64, which I will post separately). - .dtb dumping via the lkvm command line - Support for PSCI firmware as a replacement to the spin-table based SMP boot code The last option was implemented after discussion on the linux-arm-kernel list when adding support for the mach-virt platform. I hope to upstream the kernel-side part of the implementation for 3.9 and expect the kvm bits to follow once that has been merged. All feedback welcome. Applied, thanks Will! Great, cheers Pekka. Will -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] Add support for ARMv8 CPUs to kvmtool
On Wed, Jan 9, 2013 at 1:33 PM, Will Deacon will.dea...@arm.com wrote: I think we're getting our wires crossed a bit here, so I'll try to explain my madness: - Mainline ARM kernels cannot be booted by kvmtool yet, you currently have to use my virt/mach branch from git://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git I hope to upstream this for 3.9 (it has gone through the necessary review on the ARM kernel list). - As part of the review mentioned above, some kvmtool changes were required to support SMP with the latest virt/mach. You just merged these in my ARM updates series. - The ARMv8 kvmtool code builds on top of the series you just merged and won't apply without it. Applying it won't have any adverse effects on ARMv7 (AArch32). In fact, the two series sit together on my kvmtool/arm branch from git://git.kernel.org/pub/scm/linux/kernel/git/will/kvmtool.git but I posted them separately to avoid holding up the ARM updates. Does that answer you question or have I made things worse?! Yes, it answers my question. I'll go ahead and apply the ARMv8 patches. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH qom-cpu 2/7] target-i386: Don't set any KVM flag by default if KVM is disabled
On Wed, Jan 09, 2013 at 10:46:12AM +0100, Igor Mammedov wrote: On Mon, 7 Jan 2013 16:20:43 -0200 Eduardo Habkost ehabk...@redhat.com wrote: This is a cleanup that tries to solve two small issues: - We don't need a separate kvm_pv_eoi_features variable just to keep a constant calculated at compile-time, and this style would require adding a separate variable (that's declared twice because of the CONFIG_KVM ifdef) for each feature that's going to be enabled/disable by machine-type compat code. - The pc-1.3 code is setting the kvm_pv_eoi flag on cpuid_kvm_features even when KVM is disabled at runtime. This small incosistency in the cpuid_kvm_features field isn't a problem today because cpuid_kvm_features is ignored by the TCG code, but it may cause unexpected problems later when refactoring the CPUID handling code. This patch eliminates the kvm_pv_eoi_features variable and simply uses kvm_enabled() inside the enable_kvm_pv_eoi() compat function, so it enables kvm_pv_eoi only if KVM is enabled. I believe this makes the behavior of enable_kvm_pv_eoi() clearer and easier to understand. Subj doesn't match what patch actually does. Have you meant Don't set kvm_pv_eoi flag by default if KVM is disabled? Although eliminate kvm_pv_eoi_features variable might better describe what patch is doing. True. The other flags in kvm_default_features may be already set and will be copied to cpuid_kvm_features even if kvm_enabled() is false. I had a previous version of this patch that also changed the code using kvm_default_features to check kvm_enabled(), but I removed that part and kept only the kvm_pv_eoi_features variable removal. Andreas, please ignore this patch (it is not necessary anymore as this series doesn't include machine-type compatibility code for kvm_mmu). Patches 2-7 don't depend on this patch and should apply cleanly without it. I will send a new version later, probably with a separate patch to ignore kvm_default_features if kvm_enabled() is false (so there's no need to even check kvm_enabled() inside enable_kvm_pv_eoi()). Signed-off-by: Eduardo Habkost ehabk...@redhat.com --- Cc: kvm@vger.kernel.org Cc: Michael S. Tsirkin m...@redhat.com Cc: Gleb Natapov g...@redhat.com Cc: Marcelo Tosatti mtosa...@redhat.com Changes v2: - Coding style fix Changes v3: - Eliminate #ifdef by using the fake KVM_FEATURE_PV_EOI #define --- target-i386/cpu.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/target-i386/cpu.c b/target-i386/cpu.c index 951e206..40400ac 100644 --- a/target-i386/cpu.c +++ b/target-i386/cpu.c @@ -164,15 +164,15 @@ static uint32_t kvm_default_features = (1 KVM_FEATURE_CLOCKSOURCE) | (1 KVM_FEATURE_ASYNC_PF) | (1 KVM_FEATURE_STEAL_TIME) | (1 KVM_FEATURE_CLOCKSOURCE_STABLE_BIT); -static const uint32_t kvm_pv_eoi_features = (0x1 KVM_FEATURE_PV_EOI); #else static uint32_t kvm_default_features = 0; -static const uint32_t kvm_pv_eoi_features = 0; #endif void enable_kvm_pv_eoi(void) { -kvm_default_features |= kvm_pv_eoi_features; +if (kvm_enabled()) { +kvm_default_features |= (1UL KVM_FEATURE_PV_EOI); +} } void host_cpuid(uint32_t function, uint32_t count, -- Eduardo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
KVM versions, machine types and failed migrations
Hello, I'd like to ask a few questions about the way migrations work in KVM among different emulated machine types and different versions of the qemu-kvm package. I am sending to both the kvm@ and qemu-devel@ lists, please redirect me if I was wrong in doing so. In a nutshell: while trying to live-migrate a VM on ~okeanos [1], we see VM migrations fail silently if going from kvm 1.0 to kvm 1.1. The source VM is frozen, info migrate on the source monitor reports success, but the VM is dead upon arrival on the destination process. Please see [3] for the exact package versions for qemu-kvm we have tested with. Migration works if the destination kvm has been started with the same machine type as the source VM, e.g., using -M pc-1.0 specifically on the destination, when migrating a pc-1.0 machine from kvm 1.0 to kvm 1.1. How does the machine type specified with -M work in the case of migrations? Are migrations expected to fail if the machine type is different between source and destination process? If yes, shouldn't KVM be able to detect this and abort the migration instead of failing silently? For every (src, dst) pair of package version / machine type, we saw the following: dst: 1.0/pc-0.12 1.0/pc-1.0 1.1/pc-1.1 src: 1.0/pc-0.12 ok ok fails silently 1.0/pc-1.0 fails silently ok fails silently 1.1/pc-1.1 fails silently fails silently ok Machine types pc-0.12 and pc-1.0 were run with qemu-kvm package version 1.0, machine type pc-1.1 with qemu-kvm package version 1.1. Also, the migration 1.0/pc-0.12 to 1.0/pc-1.0 seems to work... How can it work or even be allowed, given that the guest finds itself running on a different hardware configuration after the migration? Regarding different package versions of qemu-kvm, it seems migrations do not work from source 0.12.5 to any other version *even* if -M pc-0.12 is specified at the incoming KVM process. For versions = 1.0 everything works provided the machine type on the destination is the same as on the source. Our goal is to patch Ganeti [2] so that it sets the destination machine type to that of the source specifically, ensuring migrations work seamlessly after a KVM upgrade. Is there a way to retrieve the machine type of a running KVM process through a monitor command? Thank you, Vangelis. [1] ~okeanos IaaS: http://okeanos.io [2] Ganeti: https://code.google.com/p/ganeti/ [3] We tested on Debian Squeeze with package versions: kvm 0.12: qemu-kvm_0.12.5+dfsg-5+squeeze8_amd64.deb kvm 1.0: qemu-kvm_1.0+dfsg-8~bpo60+1_amd64.deb kvm 1.1: qemu-kvm_1.1.2+dfsg-2~bpo60+1_amd64.deb -- Vangelis Koukis vkou...@grnet.gr OpenPGP public key ID: pub 1024D/1D038E97 2003-07-13 Vangelis Koukis vkou...@cslab.ece.ntua.gr Key fingerprint = C5CD E02E 2C78 7C10 8A00 53D8 FBFC 3799 1D03 8E97 Only those who will risk going too far can possibly find out how far one can go. -- T.S. Eliot signature.asc Description: Digital signature
Re: KVM versions, machine types and failed migrations
On Wed, Jan 09, 2013 at 02:23:50PM +0200, Vangelis Koukis wrote: Hello, I'd like to ask a few questions about the way migrations work in KVM among different emulated machine types and different versions of the qemu-kvm package. I am sending to both the kvm@ and qemu-devel@ lists, please redirect me if I was wrong in doing so. In a nutshell: while trying to live-migrate a VM on ~okeanos [1], we see VM migrations fail silently if going from kvm 1.0 to kvm 1.1. The source VM is frozen, info migrate on the source monitor reports success, but the VM is dead upon arrival on the destination process. Please see [3] for the exact package versions for qemu-kvm we have tested with. Migration works if the destination kvm has been started with the same machine type as the source VM, e.g., using -M pc-1.0 specifically on the destination, when migrating a pc-1.0 machine from kvm 1.0 to kvm 1.1. How does the machine type specified with -M work in the case of migrations? Are migrations expected to fail if the machine type is different between source and destination process? If yes, shouldn't KVM be able to detect this and abort the migration instead of failing silently? When doing migration, the fundamental requirement is that the guest OS visible machine ABI must not change. Thus there are three key things to take care of when launching QEMU on the migration target host. - The device PCI/USB addresses must be identical to the source - The machine type must be identical to the source - The CPU model must be identical to the source If you don't follow those requirements, either QEMU or the guest OS or both will crash burn during migration you get to keep both pieces :-) Regarding different package versions of qemu-kvm, it seems migrations do not work from source 0.12.5 to any other version *even* if -M pc-0.12 is specified at the incoming KVM process. For versions = 1.0 everything works provided the machine type on the destination is the same as on the source. Some older versions of QEMU were buggy causing the machine type to not correctly preserve ABI. Our goal is to patch Ganeti [2] so that it sets the destination machine type to that of the source specifically, ensuring migrations work seamlessly after a KVM upgrade. Is there a way to retrieve the machine type of a running KVM process through a monitor command? IIRC there is not a monitor command for this. The general approach to dealing with migration stability should be to launch QEMU with a canonical hardware configuration. This means explicitly setting a machine type, CPU model and PCI/USB devices addresses upfront. NB you should not use 'pc' as a machine type - if you query the list of machine types from QEMU, it will tell you what 'pc' corresponds to (pc-1.2) and then use the versioned type so you have a known machine type. Regards, Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [okeanos-dev] Re: KVM versions, machine types and failed migrations
On Wed, Jan 09, 2013 at 01:10:45pm +, Daniel P. Berrange wrote: When doing migration, the fundamental requirement is that the guest OS visible machine ABI must not change. Thus there are three key things to take care of when launching QEMU on the migration target host. - The device PCI/USB addresses must be identical to the source - The machine type must be identical to the source - The CPU model must be identical to the source Thanks for the detailed list of requirements, we'll take it into account for the relevant Ganeti patch. If you don't follow those requirements, either QEMU or the guest OS or both will crash burn during migration you get to keep both pieces :-) My point is, are these requirements left up to the caller of kvm -incoming to satisfy? Since the migration will most probably break, wouldn't it be best for QEMU to detect this and complain loudly, instead of continuing with the migration, failing silently and destroying the VM? Sure there could be some yes, do it, I know it is going to break option, which will make QEMU proceed with the migration. However, in 99% of the cases this is just user error, e.g. the user has upgraded the version on the other end and has not specified -M explicitly. It would be best if QEMU was able to detect and warn the user about what is going to happen, because it does lead to the VM dying. Regarding different package versions of qemu-kvm, it seems migrations do not work from source 0.12.5 to any other version *even* if -M pc-0.12 is specified at the incoming KVM process. For versions = 1.0 everything works provided the machine type on the destination is the same as on the source. Some older versions of QEMU were buggy causing the machine type to not correctly preserve ABI. Our goal is to patch Ganeti [2] so that it sets the destination machine type to that of the source specifically, ensuring migrations work seamlessly after a KVM upgrade. Is there a way to retrieve the machine type of a running KVM process through a monitor command? IIRC there is not a monitor command for this. The general approach to dealing with migration stability should be to launch QEMU with a canonical hardware configuration. This means explicitly setting a machine type, CPU model and PCI/USB devices addresses upfront. NB you should not use 'pc' as a machine type - if you query the list of machine types from QEMU, it will tell you what 'pc' corresponds to (pc-1.2) and then use the versioned type so you have a known machine type. This is exactly what we're trying to do: specify -M explicitly in the kvm command line, instead of letting the default pc machine type change arbitrarily whenever the qemu-kvm package gets upgraded. Thanks again, Vangelis. -- Vangelis Koukis vkou...@grnet.gr OpenPGP public key ID: pub 1024D/1D038E97 2003-07-13 Vangelis Koukis vkou...@cslab.ece.ntua.gr Key fingerprint = C5CD E02E 2C78 7C10 8A00 53D8 FBFC 3799 1D03 8E97 Only those who will risk going too far can possibly find out how far one can go. -- T.S. Eliot signature.asc Description: Digital signature
Re: [PATCH v5 12/12] ARM: KVM: Add VGIC configuration option
Hello. On 08-01-2013 22:42, Christoffer Dall wrote: From: Marc Zyngier marc.zyng...@arm.com It is now possible to select the VGIC configuration option. Signed-off-by: Marc Zyngier marc.zyng...@arm.com Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com --- arch/arm/kvm/Kconfig |8 1 file changed, 8 insertions(+) diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig index 05227cb..d32e33f 100644 --- a/arch/arm/kvm/Kconfig +++ b/arch/arm/kvm/Kconfig @@ -51,6 +51,14 @@ config KVM_ARM_MAX_VCPUS large, so only choose a reasonable number that you expect to actually use. +config KVM_ARM_VGIC +bool KVM support for Virtual GIC Please indent with tab, as below. + depends on KVM_ARM_HOST OF + select HAVE_KVM_IRQCHIP + default y + ---help--- + Adds support for a hardware assisted, in-kernel GIC emulation. + WBR, Sergei -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH 00/12] Multiqueue virtio-net
On Fri, Dec 28, 2012 at 06:31:52PM +0800, Jason Wang wrote: Perf Numbers: Two Intel Xeon 5620 with direct connected intel 82599EB Host/Guest kernel: David net tree vhost enabled - lots of improvents of both latency and cpu utilization in request-reponse test - get regression of guest sending small packets which because TCP tends to batch less when the latency were improved 1q/2q/4q TCP_RR size #sessions trans.rate norm trans.rate norm trans.rate norm 1 1 9393.26 595.64 9408.18 597.34 9375.19 584.12 1 2072162.1 2214.24 129880.22 2456.13 196949.81 2298.13 1 50107513.38 2653.99 139721.93 2490.58 259713.82 2873.57 1 100 126734.63 2676.54 145553.5 2406.63 265252.68 2943 64 19453.42 632.33 9371.37 616.13 9338.19 615.97 64 20 70620.03 2093.68 125155.75 2409.15 191239.91 2253.32 64 50 1069662448.29 146518.67 2514.47 242134.07 2720.91 64 100 117046.35 2394.56 190153.09 2696.82 238881.29 2704.41 256 1 8733.29 736.36 8701.07 680.83 8608.92 530.1 256 20 69279.89 2274.45 115103.07 2299.76 144555.16 1963.53 256 50 97676.02 2296.09 150719.57 2522.92 254510.5 3028.44 256 100 150221.55 2949.56 197569.3 2790.92 300695.78 3494.83 TCP_CRR size #sessions trans.rate norm trans.rate norm trans.rate norm 1 1 2848.37 163.41 2230.39 130.89 2013.09 120.47 1 2023434.5 562.11 31057.43 531.07 49488.28 564.41 1 5028514.88 582.17 40494.23 605.92 60113.35 654.97 1 100 28827.22 584.73 48813.25 661.6 61783.62 676.56 64 12780.08 159.4 2201.07 127.96 2006.8 117.63 64 20 23318.51 564.47 30982.44 530.24 49734.95 566.13 64 50 28585.72 582.54 40576.7 610.08 60167.89 656.56 64 100 28747.37 584.17 49081.87 667.87 60612.94 662 256 1 2772.08 160.51 2231.84 131.05 2003.62 113.45 256 20 23086.35 559.8 30929.09 528.16 48454.9 555.22 256 50 28354.7 579.85 40578.31 60760261.71 657.87 256 100 28844.55 585.67 48541.86 659.08 61941.07 676.72 TCP_STREAM guest receiving size #sessions throughput norm throughput norm throughput norm 1 1 16.27 1.33 16.11.12 16.13 0.99 1 2 33.04 2.08 32.96 2.19 32.75 1.98 1 4 66.62 6.83 68.35.56 66.14 2.65 64 1896.55 56.67 914.02 58.14 898.9 61.56 64 21830.46 91.02 1812.02 64.59 1835.57 66.26 64 43626.61 142.55 3636.25 100.64 3607.46 75.03 256 1 2619.49 131.23 2543.19 129.03 2618.69 132.39 256 2 5136.58 203.02 5163.31 141.11 5236.51 149.4 256 4 7063.99 242.83 9365.4 208.49 9421.03 159.94 512 1 3592.43 165.24 3603.12 167.19 3552.5 169.57 512 2 7042.62 246.59 7068.46 180.87 7258.52 186.3 512 4 6996.08 241.49 9298.34 206.12 9418.52 159.33 1024 1 4339.54 192.95 4370.2 191.92 4211.72 192.49 1024 2 7439.45 254.77 9403.99 215.24 9120.82 222.67 1024 4 7953.86 272.11 9403.87 208.23 9366.98 159.49 4096 1 7696.28 272.04 7611.41 270.38 7778.71 267.76 4096 2 7530.35 261.1 8905.43 246.27 8990.18 267.57 4096 4 7121.6 247.02 9411.75 206.71 9654.96 184.67 16384 1 7795.73 268.54 7780.94 267.2 7634.26 260.73 16384 2 7436.57 255.81 9381.86 220.85 9392220.36 16384 4 7199.07 247.81 9420.96 205.87 9373.69 159.57 TCP_MAERTS guest sending size #sessions throughput norm throughput norm throughput norm 1 1 15.94 0.62 15.55 0.61 15.13 0.59 1 2 36.11 0.83 32.46 0.69 32.28 0.69 1 4 71.59 1 68.91 0.94 61.52 0.77 64 1630.71 22.52 622.11 22.35 605.09 21.84 64 21442.36 30.57 1292.15 25.82 1282.67 25.55 64 43186.79 42.59 2844.96 36.03 2529.69 30.06 256 1 1760.96 58.07 1738.44 57.43 1695.99 56.19 256 2 4834.23 95.19 3524.85 64.21 3511.94 64.45 256 4 9324.63 145.74 8956.49 116.39 6720.17 73.86 512 1 2678.03 84.1 2630.68 82.93 2636.54 82.57 512 2 9368.17 195.61 9408.82 204.53 5316.3 92.99 512 4 9186.34 209.68 9358.72 183.82 9489.29 160.42 1024 1 3620.71 109.88 3625.54 109.83 3606.61 112.35 1024 2 9429258.32 7082.79 120.55 7403.53 134.78 1024 4 9430.66 290.44 9499.29 232.31 9414.6 190.92 4096 1 9339.28 296.48 9374.23 372.88 9348.76 298.49 4096 2 9410.53 378.69 9412.61 286.18 9409.75 278.31 4096 4 9487.35 374.1 9556.91 288.81 9441.94 221.64 16384 1 9380.43 403.8 9379.78 399.13 9382.42 393.55 16384 2 9367.69 406.93 9415.04 312.68 9409.29 300.9 16384 4 9391.96 405.17 9695.12 310.54 9423.76 223.47 Trying to understand the performance results: What is the host device configuration? tap + bridge? Did you use host CPU affinity for the vhost threads? Can multiqueue tap take advantage of multiqueue host NICs or is virtio-net multiqueue unaware of the physical NIC multiqueue capabilities? The results seem pretty mixed - as a user it's not obvious what to choose as a good all-round setting. Any observations on how multiqueue should be configured? What is the norm statistic? Stefan -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a
Re: [PATCH v5 01/12] KVM: ARM: Introduce KVM_SET_DEVICE_ADDRESS ioctl
On 9 January 2013 10:02, Alexander Graf ag...@suse.de wrote: I think we should make thus at least potentially generic. In fact, I wouldn't even mind calling it DEV_REG with the same semantics as ONE_REG, just that it also gets a unique dev id that gets created during in-kernel device creation and that it's a vm ioctl. Well, we might want a DEV_REG, but you might as well just make ONE_REG OK as a VM ioctl, since there's no reason not to have not-per-cpu but not device registers. ONE_REG already supports dividing up the ID space so you can say which device or whatever is being used, because we had things like GIC registers in mind when we put that API together. However, this shouldn't be DEV_REG, because this isn't actually setting state in the irqchip device, it's configuring the kernel's part of the system model [compare wiring up irq routing, which also has a custom ioctl rather than a generic one]. As such it definitely needs to happen only before the VM is actually started and not during VM execution, unlike a DEV_REG which would presumably be generally valid. That way we wouldn't block our path to create two in-kernel irqchips one day. Er, SET_DEVICE_ADDRESS doesn't block us doing that; that's why it has an ID parameter. The discussion at the KVM forum, as I recall it was basically: (a) some other ppc irqchips also want to set the base address for where their memory mapped registers live, so this isn't a totally ARM specific weirdness (b) we didn't need to tangle up and delay the KVM ARM work with a vague and unspecified desire for general configurability -- PMM -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v5 01/12] KVM: ARM: Introduce KVM_SET_DEVICE_ADDRESS ioctl
On 09.01.2013, at 15:48, Peter Maydell wrote: On 9 January 2013 10:02, Alexander Graf ag...@suse.de wrote: I think we should make thus at least potentially generic. In fact, I wouldn't even mind calling it DEV_REG with the same semantics as ONE_REG, just that it also gets a unique dev id that gets created during in-kernel device creation and that it's a vm ioctl. Well, we might want a DEV_REG, but you might as well just make ONE_REG OK as a VM ioctl, since there's no reason not to have not-per-cpu but not device registers. ONE_REG already supports dividing up the ID space so you can say which device or whatever is being used, because we had things like GIC registers in mind when we put that API together. ONE_REG's address space today doesn't include a field for the target device, because that's already predetermined by the fd you run it on. Hence my suggestion to add it as separate field, because then we keep the id space mask-free. However, this shouldn't be DEV_REG, because this isn't actually setting state in the irqchip device, it's configuring the kernel's part of the system model [compare wiring up irq routing, which also has a custom ioctl rather than a generic one]. As such it definitely needs to happen only before the VM is actually started and not during VM execution, unlike a DEV_REG which would presumably be generally valid. The same can be true for ONE_REG/SET_SREGS. On PPC we support setting the PVR (CPU type) via SREGS because ONE_REG only came later. Setting the CPU type only makes sense before you first start using a vcpu. After that point it should be forbidden. So in a way, the irqchip memory address location is a device register, as it has all the properties that a register on that device would have. It is * per device * one id for one purpose * may or may not be writable during certain times We would also keep the gates open to get a new register id one day for a second address. Imagine an interrupt controller that splits its address mappings into PIC base registers and MSI target registers, though the MSI registers logically belong to the PIC. Then the ioctl as designed here gets stuck too. That way we wouldn't block our path to create two in-kernel irqchips one day. Er, SET_DEVICE_ADDRESS doesn't block us doing that; that's why it has an ID parameter. The discussion at the KVM forum, as I recall it was basically: (a) some other ppc irqchips also want to set the base address for where their memory mapped registers live, so this isn't a totally ARM specific weirdness Yes. (b) we didn't need to tangle up and delay the KVM ARM work with a vague and unspecified desire for general configurability Yeah, that was the basic idea. Considering that the patch set hasn't been going in for another 2 months after that discussion indicates that this isn't too much of an issue though :). In fact, I do recall myself pointing out that we need some padding in that ioctl to accomodate for later usages like the ones above. Then we could later rename (or copy name) it and everyone'd be happy. Alex -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v5 01/12] KVM: ARM: Introduce KVM_SET_DEVICE_ADDRESS ioctl
On Wed, Jan 9, 2013 at 10:11 AM, Peter Maydell peter.mayd...@linaro.org wrote: On 9 January 2013 14:58, Alexander Graf ag...@suse.de wrote: On 09.01.2013, at 15:48, Peter Maydell wrote: On 9 January 2013 10:02, Alexander Graf ag...@suse.de wrote: I think we should make thus at least potentially generic. In fact, I wouldn't even mind calling it DEV_REG with the same semantics as ONE_REG, just that it also gets a unique dev id that gets created during in-kernel device creation and that it's a vm ioctl. Well, we might want a DEV_REG, but you might as well just make ONE_REG OK as a VM ioctl, since there's no reason not to have not-per-cpu but not device registers. ONE_REG already supports dividing up the ID space so you can say which device or whatever is being used, because we had things like GIC registers in mind when we put that API together. ONE_REG's address space today doesn't include a field for the target device, There's 16 bits of register group type. because that's already predetermined by the fd you run it on. Hence my suggestion to add it as separate field, because then we keep the id space mask-free. Er, it already has masks for the different parts of ID space. That's a feature, not a bug. However, this shouldn't be DEV_REG, because this isn't actually setting state in the irqchip device, it's configuring the kernel's part of the system model [compare wiring up irq routing, which also has a custom ioctl rather than a generic one]. As such it definitely needs to happen only before the VM is actually started and not during VM execution, unlike a DEV_REG which would presumably be generally valid. The same can be true for ONE_REG/SET_SREGS. On PPC we support setting the PVR (CPU type) via SREGS because ONE_REG only came later. Setting the CPU type only makes sense before you first start using a vcpu. After that point it should be forbidden. So in a way, the irqchip memory address location is a device register, as it has all the properties that a register on that device would have. It is * per device * one id for one purpose * may or may not be writable during certain times There's a distinction between things you need to do in machine setup and things you need to propagate for migration. It should be possible to do a migration by doing a complete copy of all ONE_REG (and DEV_REG if we have it) state from source to destination, so it needs to be always possible to write them. We would also keep the gates open to get a new register id one day for a second address. Imagine an interrupt controller that splits its address mappings into PIC base registers and MSI target registers, though the MSI registers logically belong to the PIC. Then the ioctl as designed here gets stuck too. This is exactly what ARM's GIC has already. We have several memory regions (one for the distributor, one for the cpu interface), and map them with several calls to SET_DEVICE_ADDRESS. (b) we didn't need to tangle up and delay the KVM ARM work with a vague and unspecified desire for general configurability Yeah, that was the basic idea. Considering that the patch set hasn't been going in for another 2 months after that discussion indicates that this isn't too much of an issue though :). well, now it's actually going in, and it seems like this is the last issue. Let's hold our breaths here for a second and let me write up a new device attribute API and send that out, and if you guys can possibly live with it, then please consider accepting it. -Christoffer -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH qom-cpu 4/7] target-i386/cpu: Introduce FeatureWord typedefs
On Mon, Jan 07, 2013 at 04:20:45PM -0200, Eduardo Habkost wrote: This introduces a FeatureWord enum, FeatureWordInfo struct (with generation information about a feature word), and a FeatureWordArray typedef, and changes add_flagname_to_bitmaps() code and cpu_x86_parse_featurestr() to use the new typedefs instead of separate variables for each feature word. This will help us keep the code at kvm_check_features_against_host(), cpu_x86_parse_featurestr() and add_flagname_to_bitmaps() sane while adding new feature name arrays. Signed-off-by: Eduardo Habkost ehabk...@redhat.com --- target-i386/cpu.c | 97 +++ target-i386/cpu.h | 15 + 2 files changed, 63 insertions(+), 49 deletions(-) diff --git a/target-i386/cpu.c b/target-i386/cpu.c index b09b625..7d62d48 100644 --- a/target-i386/cpu.c +++ b/target-i386/cpu.c @@ -124,6 +124,20 @@ static const char *cpuid_7_0_ebx_feature_name[] = { NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, }; +typedef struct FeatureWordInfo { +const char **feat_names; +} FeatureWordInfo; + +static FeatureWordInfo feature_word_info[FEATURE_WORDS] = { +[FEAT_1_EDX] = { .feat_names = feature_name }, +[FEAT_1_ECX] = { .feat_names = ext_feature_name }, +[FEAT_8000_0001_EDX] = { .feat_names = ext2_feature_name }, +[FEAT_8000_0001_ECX] = { .feat_names = ext3_feature_name }, +[FEAT_KVM] = { .feat_names = kvm_feature_name }, +[FEAT_SVM] = { .feat_names = svm_feature_name }, +[FEAT_7_0_EBX] = { .feat_names = cpuid_7_0_ebx_feature_name }, +}; + const char *get_register_name_32(unsigned int reg) { static const char *reg_names[CPU_NB_REGS32] = { @@ -271,23 +285,20 @@ static bool lookup_feature(uint32_t *pval, const char *s, const char *e, return found; } -static void add_flagname_to_bitmaps(const char *flagname, uint32_t *features, -uint32_t *ext_features, -uint32_t *ext2_features, -uint32_t *ext3_features, -uint32_t *kvm_features, -uint32_t *svm_features, -uint32_t *cpuid_7_0_ebx_features) +static void add_flagname_to_bitmaps(const char *flagname, +FeatureWordArray words) { -if (!lookup_feature(features, flagname, NULL, feature_name) -!lookup_feature(ext_features, flagname, NULL, ext_feature_name) -!lookup_feature(ext2_features, flagname, NULL, ext2_feature_name) -!lookup_feature(ext3_features, flagname, NULL, ext3_feature_name) -!lookup_feature(kvm_features, flagname, NULL, kvm_feature_name) -!lookup_feature(svm_features, flagname, NULL, svm_feature_name) -!lookup_feature(cpuid_7_0_ebx_features, flagname, NULL, -cpuid_7_0_ebx_feature_name)) -fprintf(stderr, CPU feature %s not found\n, flagname); +FeatureWord w; +for (w = 0; w FEATURE_WORDS; w++) { +FeatureWordInfo *wi = feature_word_info[w]; +if (wi-feat_names I would move patch 6 before that one and drop this check here, but if you disagree it is your call. +lookup_feature(words[w], flagname, NULL, wi-feat_names)) { +break; +} +} +if (w == FEATURE_WORDS) { +fprintf(stderr, CPU feature %s not found\n, flagname); +} } typedef struct x86_def_t { @@ -1256,35 +1267,23 @@ static int cpu_x86_parse_featurestr(x86_def_t *x86_cpu_def, char *features) unsigned int i; char *featurestr; /* Single 'key=value string being parsed */ /* Features to be added */ -uint32_t plus_features = 0, plus_ext_features = 0; -uint32_t plus_ext2_features = 0, plus_ext3_features = 0; -uint32_t plus_kvm_features = kvm_default_features, plus_svm_features = 0; -uint32_t plus_7_0_ebx_features = 0; +FeatureWordArray plus_features = { +[FEAT_KVM] = kvm_default_features, +}; /* Features to be removed */ -uint32_t minus_features = 0, minus_ext_features = 0; -uint32_t minus_ext2_features = 0, minus_ext3_features = 0; -uint32_t minus_kvm_features = 0, minus_svm_features = 0; -uint32_t minus_7_0_ebx_features = 0; +FeatureWordArray minus_features = { 0 }; uint32_t numvalue; -add_flagname_to_bitmaps(hypervisor, plus_features, -plus_ext_features, plus_ext2_features, plus_ext3_features, -plus_kvm_features, plus_svm_features, plus_7_0_ebx_features); +add_flagname_to_bitmaps(hypervisor, plus_features); featurestr = features ? strtok(features, ,) : NULL; while (featurestr) { char *val; if (featurestr[0] == '+') { -add_flagname_to_bitmaps(featurestr + 1, plus_features, -
Re: [PATCH v5 01/12] KVM: ARM: Introduce KVM_SET_DEVICE_ADDRESS ioctl
On 9 January 2013 14:58, Alexander Graf ag...@suse.de wrote: On 09.01.2013, at 15:48, Peter Maydell wrote: On 9 January 2013 10:02, Alexander Graf ag...@suse.de wrote: I think we should make thus at least potentially generic. In fact, I wouldn't even mind calling it DEV_REG with the same semantics as ONE_REG, just that it also gets a unique dev id that gets created during in-kernel device creation and that it's a vm ioctl. Well, we might want a DEV_REG, but you might as well just make ONE_REG OK as a VM ioctl, since there's no reason not to have not-per-cpu but not device registers. ONE_REG already supports dividing up the ID space so you can say which device or whatever is being used, because we had things like GIC registers in mind when we put that API together. ONE_REG's address space today doesn't include a field for the target device, There's 16 bits of register group type. because that's already predetermined by the fd you run it on. Hence my suggestion to add it as separate field, because then we keep the id space mask-free. Er, it already has masks for the different parts of ID space. That's a feature, not a bug. However, this shouldn't be DEV_REG, because this isn't actually setting state in the irqchip device, it's configuring the kernel's part of the system model [compare wiring up irq routing, which also has a custom ioctl rather than a generic one]. As such it definitely needs to happen only before the VM is actually started and not during VM execution, unlike a DEV_REG which would presumably be generally valid. The same can be true for ONE_REG/SET_SREGS. On PPC we support setting the PVR (CPU type) via SREGS because ONE_REG only came later. Setting the CPU type only makes sense before you first start using a vcpu. After that point it should be forbidden. So in a way, the irqchip memory address location is a device register, as it has all the properties that a register on that device would have. It is * per device * one id for one purpose * may or may not be writable during certain times There's a distinction between things you need to do in machine setup and things you need to propagate for migration. It should be possible to do a migration by doing a complete copy of all ONE_REG (and DEV_REG if we have it) state from source to destination, so it needs to be always possible to write them. We would also keep the gates open to get a new register id one day for a second address. Imagine an interrupt controller that splits its address mappings into PIC base registers and MSI target registers, though the MSI registers logically belong to the PIC. Then the ioctl as designed here gets stuck too. This is exactly what ARM's GIC has already. We have several memory regions (one for the distributor, one for the cpu interface), and map them with several calls to SET_DEVICE_ADDRESS. (b) we didn't need to tangle up and delay the KVM ARM work with a vague and unspecified desire for general configurability Yeah, that was the basic idea. Considering that the patch set hasn't been going in for another 2 months after that discussion indicates that this isn't too much of an issue though :). We might get there faster if people didn't keep nitpicking the APIs at the last minute :-) -- PMM -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v5 01/12] KVM: ARM: Introduce KVM_SET_DEVICE_ADDRESS ioctl
On 09.01.2013, at 16:11, Peter Maydell wrote: On 9 January 2013 14:58, Alexander Graf ag...@suse.de wrote: On 09.01.2013, at 15:48, Peter Maydell wrote: On 9 January 2013 10:02, Alexander Graf ag...@suse.de wrote: I think we should make thus at least potentially generic. In fact, I wouldn't even mind calling it DEV_REG with the same semantics as ONE_REG, just that it also gets a unique dev id that gets created during in-kernel device creation and that it's a vm ioctl. Well, we might want a DEV_REG, but you might as well just make ONE_REG OK as a VM ioctl, since there's no reason not to have not-per-cpu but not device registers. ONE_REG already supports dividing up the ID space so you can say which device or whatever is being used, because we had things like GIC registers in mind when we put that API together. ONE_REG's address space today doesn't include a field for the target device, There's 16 bits of register group type. That's register group, not device id. It's constant per id. If you have an ARM register, it will always stay an ARM register. It's not going to change. But if you reuse those bits for the device id, they are going to change, because a device id is similar to a handle that should be returned by the device create ioctl. because that's already predetermined by the fd you run it on. Hence my suggestion to add it as separate field, because then we keep the id space mask-free. Er, it already has masks for the different parts of ID space. That's a feature, not a bug. All those masks are constant masks and only used to make reading the IDs easier. They never change within a single type of a register. However, this shouldn't be DEV_REG, because this isn't actually setting state in the irqchip device, it's configuring the kernel's part of the system model [compare wiring up irq routing, which also has a custom ioctl rather than a generic one]. As such it definitely needs to happen only before the VM is actually started and not during VM execution, unlike a DEV_REG which would presumably be generally valid. The same can be true for ONE_REG/SET_SREGS. On PPC we support setting the PVR (CPU type) via SREGS because ONE_REG only came later. Setting the CPU type only makes sense before you first start using a vcpu. After that point it should be forbidden. So in a way, the irqchip memory address location is a device register, as it has all the properties that a register on that device would have. It is * per device * one id for one purpose * may or may not be writable during certain times There's a distinction between things you need to do in machine setup and things you need to propagate for migration. It should be possible to do a migration by doing a complete copy of all ONE_REG (and DEV_REG if we have it) state from source to destination, so it needs to be always possible to write them. You need to somewhere encode a list of registers you want to synchronize for migration anyways. Don't include this register in that list then. We would also keep the gates open to get a new register id one day for a second address. Imagine an interrupt controller that splits its address mappings into PIC base registers and MSI target registers, though the MSI registers logically belong to the PIC. Then the ioctl as designed here gets stuck too. This is exactly what ARM's GIC has already. We have several memory regions (one for the distributor, one for the cpu interface), and map them with several calls to SET_DEVICE_ADDRESS. I see. (b) we didn't need to tangle up and delay the KVM ARM work with a vague and unspecified desire for general configurability Yeah, that was the basic idea. Considering that the patch set hasn't been going in for another 2 months after that discussion indicates that this isn't too much of an issue though :). We might get there faster if people didn't keep nitpicking the APIs at the last minute :-) I guess we just had too much bad experience with designing horrible APIs in the past on the PPC side ;). Alex -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v8 2/3] x86, apicv: add virtual interrupt delivery support
On Wed, Jan 09, 2013 at 08:07:31AM +, Zhang, Yang Z wrote: if(check_request(KVM_REQ_, )) { ioapic_lock(); (*) update local EOI exit bitmap from IOAPIC In my patch, it traverses IOAPIC entry once and only updates target vcpus's eoi exit bitmap. Then make request for all vcpus. With your suggestion , all vcpus will traverse all IOAPIC entries. Though ioapic entry write is rare, it's still not reasonable. It should be fast, and very rare (as in once during system initialization, or device hotplug). Is there a particular case that makes it necessary to optimize scanning? ioapic_unlock(); } Fine by me. Looks simpler. (*) plus any other lock that paths that update the map take Best regards, Yang -- Gleb. Best regards, Yang -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH qom-cpu 0/7] disable kvm_mmu + -cpu enforce fixes (v3)
On Mon, Jan 07, 2013 at 04:20:41PM -0200, Eduardo Habkost wrote: Changes on v3: - Patches 3-9 from v2 are now already on qom-cpu tree - Remove CONFIG_KVM #ifdefs by declaring fake KVM_* #defines on sysemu/kvm.h - Refactor code that uses the feature word arrays (to make it easier to add a new feature name array) - Add feature name array for CPUID leaf 0xC001 Changes on v2: - Now both the kvm_mmu-disable and -cpu enforce changes are on the same series - Coding style fixes Git tree for reference: git://github.com/ehabkost/qemu-hacks.git cpu-enforce-all.v3 https://github.com/ehabkost/qemu-hacks/tree/cpu-enforce-all.v3 The changes are a bit intrusive, but: - The longer we take to make enforce strict as it should (and make libvirt finally use it), more users will have VMs with migration-unsafe unpredictable guest ABIs. For this reason, I would like to get this into QEMU 1.4. - The changes in this series should affect only users that are already using the enforce flag, and I believe whoever is using the enforce flag really want the strict behavior introduced by this series. Reviewed-by: Gleb Natapov g...@redhat.com Small comment on patch 4. Fill free to ignore. Eduardo Habkost (7): kvm: Add fake KVM constants to avoid #ifdefs on KVM-specific code target-i386: Don't set any KVM flag by default if KVM is disabled target-i386: Disable kvm_mmu by default target-i386/cpu: Introduce FeatureWord typedefs target-i386: kvm_check_features_against_host(): Use feature_word_info target-i386/cpu.c: Add feature name array for ext4_features target-i386: check/enforce: Check all feature words include/sysemu/kvm.h | 14 target-i386/cpu.c| 193 --- target-i386/cpu.h| 15 3 files changed, 150 insertions(+), 72 deletions(-) -- 1.7.11.7 -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 01/12] tap: multiqueue support
On 01/09/2013 05:56 PM, Stefan Hajnoczi wrote: On Fri, Dec 28, 2012 at 06:31:53PM +0800, Jason Wang wrote: diff --git a/qapi-schema.json b/qapi-schema.json index 5dfa052..583eb7c 100644 --- a/qapi-schema.json +++ b/qapi-schema.json @@ -2465,7 +2465,7 @@ { 'type': 'NetdevTapOptions', 'data': { '*ifname': 'str', -'*fd': 'str', +'*fd': ['String'], This change is not backwards-compatible. You need to add a '*fds': ['String'] field instead. I'm not quite understand this case, I think it still work when we we just specify one fd. '*script': 'str', '*downscript': 'str', '*helper': 'str', @@ -2473,7 +2473,8 @@ '*vnet_hdr': 'bool', '*vhost': 'bool', '*vhostfd':'str', -'*vhostforce': 'bool' } } +'*vhostforce': 'bool', +'*queues': 'uint32'} } The 'queues' parameter should not be necessary when fd passing is used since we can learn the number of queues by looking at the list length. Ok. Stefan -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH 10/12] virtio-net: multiqueue support
On 01/09/2013 06:01 PM, Wanlong Gao wrote: On 01/09/2013 05:30 PM, Jason Wang wrote: On 01/09/2013 04:23 PM, Wanlong Gao wrote: On 01/08/2013 06:14 PM, Jason Wang wrote: On 01/08/2013 06:00 PM, Wanlong Gao wrote: On 01/08/2013 05:51 PM, Jason Wang wrote: On 01/08/2013 05:49 PM, Wanlong Gao wrote: On 01/08/2013 05:29 PM, Jason Wang wrote: On 01/08/2013 05:07 PM, Wanlong Gao wrote: On 12/28/2012 06:32 PM, Jason Wang wrote: +} else if (nc-peer-info-type != NET_CLIENT_OPTIONS_KIND_TAP) { +ret = -1; +} else { +ret = tap_detach(nc-peer); +} + +return ret; +} + +static void virtio_net_set_queues(VirtIONet *n) +{ +int i; + +for (i = 0; i n-max_queues; i++) { +if (i n-curr_queues) { +assert(!peer_attach(n, i)); +} else { +assert(!peer_detach(n, i)); I got a assert here, qemu-system-x86_64: /work/git/qemu/hw/virtio-net.c:330: virtio_net_set_queues: Assertion `!peer_detach(n, i)' failed. Any thoughts? Thanks, Wanlong Gao Thanks for the testing, which steps or cases did you met this assertion, migration, reboot or just changing the number of virtqueues? I use the 3.8-rc2 to test it again, I saw this tag has the multi-tap support. I just can't start the QEMU use -netdev tap,id=hostnet0,queues=2,fd=%d,fd=%d -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:ce:7b:29,bus=pci.0,addr=0x3 I pre-opened two tap fds, did I missing something? Nothing missed :) It should work. Could you please try not use fd=X and let qemu to create the file descriptors by itself? Btw, how did you create the two tap fds? Can it create descriptors itself? I get qemu-system-x86_64: -netdev tap,id=hostnet0,queues=2: Device 'tap' could not be initialized You need prepare an ifup script which default at /etc/qemu-ifup (like following). Or you may try to add a script=no after: #!/bin/sh switch=kvmbr0 /sbin/ifconfig $1 0.0.0.0 up /usr/sbin/brctl addif $switch $1 /usr/sbin/brctl stp $switch off This will let qemu create a tap fd itself and make it to be connected to a port of the bridge caled kvmbr0. But how to support multi-queue in this way? Qemu will create the necessary multiqueue tap by itself, see patch 0/12. I got guest kernel panic when using this way and set queues=4. Does it happens w/o or w/ a fd parameter? What's the qemu command line? Did you meet it during boot time? The QEMU command line is /work/git/qemu/x86_64-softmmu/qemu-system-x86_64 -name f17 -M pc-0.15 -enable-kvm -m 3096 \ -smp 4,sockets=4,cores=1,threads=1 \ -uuid c31a9f3e-4161-c53a-339c-5dc36d0497cb -no-user-config -nodefaults \ -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/f17.monitor,server,nowait \ -mon chardev=charmonitor,id=monitor,mode=control \ -rtc base=utc -no-shutdown \ -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \ -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0xb,num_queues=4,hotplug=on \ -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 \ -drive file=/vm/f17.img,if=none,id=drive-virtio-disk0,format=qcow2 \ -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \ -drive file=/vm2/f17-kernel.img,if=none,id=drive-virtio-disk1,format=qcow2 \ -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk1,id=virtio-disk1 \ -drive file=/vm/virtio-scsi/scsi3.img,if=none,id=drive-scsi0-0-2-0,format=raw \ -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=2,drive=drive-scsi0-0-2-0,id=scsi0-0-2-0,removable=on \ -drive file=/vm/virtio-scsi/scsi4.img,if=none,id=drive-scsi0-0-3-0,format=raw \ -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=3,drive=drive-scsi0-0-3-0,id=scsi0-0-3-0 \ -drive file=/vm/virtio-scsi/scsi1.img,if=none,id=drive-scsi0-0-0-0,format=raw \ -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0 \ -drive file=/vm/virtio-scsi/scsi2.img,if=none,id=drive-scsi0-0-1-0,format=raw \ -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=1,drive=drive-scsi0-0-1-0,id=scsi0-0-1-0 \ -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 \ -chardev file,id=charserial1,path=/vm/f17.log \ -device isa-serial,chardev=charserial1,id=serial1 \ -device usb-tablet,id=input0 -vga std \ -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 \ -netdev tap,id=hostnet0,vhost=on,queues=4 \ -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:ce:7b:29,bus=pci.0,addr=0x3 \ -monitor stdio I got panic just after booting the system, did nothing, waited for a while, the guest panicked. [ 28.053004] BUG: soft lockup - CPU#1 stuck for 23s! [ip:592] [ 28.053004] Modules linked in: ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables uinput joydev microcode virtio_balloon pcspkr virtio_net i2c_piix4 i2c_core virtio_scsi
Re: [PATCH v5 01/12] KVM: ARM: Introduce KVM_SET_DEVICE_ADDRESS ioctl
On 09.01.2013, at 16:22, Marc Zyngier wrote: On Wed, 9 Jan 2013 15:11:39 +, Peter Maydell peter.mayd...@linaro.org wrote: On 9 January 2013 14:58, Alexander Graf ag...@suse.de wrote: Yeah, that was the basic idea. Considering that the patch set hasn't been going in for another 2 months after that discussion indicates that this isn't too much of an issue though :). We might get there faster if people didn't keep nitpicking the APIs at the last minute :-) Exactly. We're trying hard to get the damned thing merged, and the permanent API churn quickly becomes a burden. As I said earlier, we have had a lot of experience in creating really bad APIs in the past. But how about making this one specific? Call it KVM_ARM_SET_VGIC_ADDRESS, keep the rest as it is, resend it, and later we can come up with an actually generic interface. Alex My understanding was that a consensus was reached 2 months ago. Has something fundamentally changed? Something that makes this API invalid? If not, can we please keep this one? Thanks, M. -- Who you jivin' with that Cosmik Debris? -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH 00/12] Multiqueue virtio-net
On Wed, Jan 09, 2013 at 03:29:24PM +0100, Stefan Hajnoczi wrote: On Fri, Dec 28, 2012 at 06:31:52PM +0800, Jason Wang wrote: Perf Numbers: Two Intel Xeon 5620 with direct connected intel 82599EB Host/Guest kernel: David net tree vhost enabled - lots of improvents of both latency and cpu utilization in request-reponse test - get regression of guest sending small packets which because TCP tends to batch less when the latency were improved 1q/2q/4q TCP_RR size #sessions trans.rate norm trans.rate norm trans.rate norm 1 1 9393.26 595.64 9408.18 597.34 9375.19 584.12 1 2072162.1 2214.24 129880.22 2456.13 196949.81 2298.13 1 50107513.38 2653.99 139721.93 2490.58 259713.82 2873.57 1 100 126734.63 2676.54 145553.5 2406.63 265252.68 2943 64 19453.42 632.33 9371.37 616.13 9338.19 615.97 64 20 70620.03 2093.68 125155.75 2409.15 191239.91 2253.32 64 50 1069662448.29 146518.67 2514.47 242134.07 2720.91 64 100 117046.35 2394.56 190153.09 2696.82 238881.29 2704.41 256 1 8733.29 736.36 8701.07 680.83 8608.92 530.1 256 20 69279.89 2274.45 115103.07 2299.76 144555.16 1963.53 256 50 97676.02 2296.09 150719.57 2522.92 254510.5 3028.44 256 100 150221.55 2949.56 197569.3 2790.92 300695.78 3494.83 TCP_CRR size #sessions trans.rate norm trans.rate norm trans.rate norm 1 1 2848.37 163.41 2230.39 130.89 2013.09 120.47 1 2023434.5 562.11 31057.43 531.07 49488.28 564.41 1 5028514.88 582.17 40494.23 605.92 60113.35 654.97 1 100 28827.22 584.73 48813.25 661.6 61783.62 676.56 64 12780.08 159.4 2201.07 127.96 2006.8 117.63 64 20 23318.51 564.47 30982.44 530.24 49734.95 566.13 64 50 28585.72 582.54 40576.7 610.08 60167.89 656.56 64 100 28747.37 584.17 49081.87 667.87 60612.94 662 256 1 2772.08 160.51 2231.84 131.05 2003.62 113.45 256 20 23086.35 559.8 30929.09 528.16 48454.9 555.22 256 50 28354.7 579.85 40578.31 60760261.71 657.87 256 100 28844.55 585.67 48541.86 659.08 61941.07 676.72 TCP_STREAM guest receiving size #sessions throughput norm throughput norm throughput norm 1 1 16.27 1.33 16.11.12 16.13 0.99 1 2 33.04 2.08 32.96 2.19 32.75 1.98 1 4 66.62 6.83 68.35.56 66.14 2.65 64 1896.55 56.67 914.02 58.14 898.9 61.56 64 21830.46 91.02 1812.02 64.59 1835.57 66.26 64 43626.61 142.55 3636.25 100.64 3607.46 75.03 256 1 2619.49 131.23 2543.19 129.03 2618.69 132.39 256 2 5136.58 203.02 5163.31 141.11 5236.51 149.4 256 4 7063.99 242.83 9365.4 208.49 9421.03 159.94 512 1 3592.43 165.24 3603.12 167.19 3552.5 169.57 512 2 7042.62 246.59 7068.46 180.87 7258.52 186.3 512 4 6996.08 241.49 9298.34 206.12 9418.52 159.33 1024 1 4339.54 192.95 4370.2 191.92 4211.72 192.49 1024 2 7439.45 254.77 9403.99 215.24 9120.82 222.67 1024 4 7953.86 272.11 9403.87 208.23 9366.98 159.49 4096 1 7696.28 272.04 7611.41 270.38 7778.71 267.76 4096 2 7530.35 261.1 8905.43 246.27 8990.18 267.57 4096 4 7121.6 247.02 9411.75 206.71 9654.96 184.67 16384 1 7795.73 268.54 7780.94 267.2 7634.26 260.73 16384 2 7436.57 255.81 9381.86 220.85 9392220.36 16384 4 7199.07 247.81 9420.96 205.87 9373.69 159.57 TCP_MAERTS guest sending size #sessions throughput norm throughput norm throughput norm 1 1 15.94 0.62 15.55 0.61 15.13 0.59 1 2 36.11 0.83 32.46 0.69 32.28 0.69 1 4 71.59 1 68.91 0.94 61.52 0.77 64 1630.71 22.52 622.11 22.35 605.09 21.84 64 21442.36 30.57 1292.15 25.82 1282.67 25.55 64 43186.79 42.59 2844.96 36.03 2529.69 30.06 256 1 1760.96 58.07 1738.44 57.43 1695.99 56.19 256 2 4834.23 95.19 3524.85 64.21 3511.94 64.45 256 4 9324.63 145.74 8956.49 116.39 6720.17 73.86 512 1 2678.03 84.1 2630.68 82.93 2636.54 82.57 512 2 9368.17 195.61 9408.82 204.53 5316.3 92.99 512 4 9186.34 209.68 9358.72 183.82 9489.29 160.42 1024 1 3620.71 109.88 3625.54 109.83 3606.61 112.35 1024 2 9429258.32 7082.79 120.55 7403.53 134.78 1024 4 9430.66 290.44 9499.29 232.31 9414.6 190.92 4096 1 9339.28 296.48 9374.23 372.88 9348.76 298.49 4096 2 9410.53 378.69 9412.61 286.18 9409.75 278.31 4096 4 9487.35 374.1 9556.91 288.81 9441.94 221.64 16384 1 9380.43 403.8 9379.78 399.13 9382.42 393.55 16384 2 9367.69 406.93 9415.04 312.68 9409.29 300.9 16384 4 9391.96 405.17 9695.12 310.54 9423.76 223.47 Trying to understand the performance results: What is the host device configuration? tap + bridge? Did you use host CPU affinity for the vhost threads? Can multiqueue tap take advantage of multiqueue host NICs or is virtio-net multiqueue unaware of the physical NIC multiqueue capabilities? The results seem pretty mixed - as a user it's not obvious what to choose as a good all-round setting. Yes, this I
Re: [Qemu-devel] [PATCH 00/12] Multiqueue virtio-net
On 01/09/2013 11:32 PM, Michael S. Tsirkin wrote: On Wed, Jan 09, 2013 at 03:29:24PM +0100, Stefan Hajnoczi wrote: On Fri, Dec 28, 2012 at 06:31:52PM +0800, Jason Wang wrote: Perf Numbers: Two Intel Xeon 5620 with direct connected intel 82599EB Host/Guest kernel: David net tree vhost enabled - lots of improvents of both latency and cpu utilization in request-reponse test - get regression of guest sending small packets which because TCP tends to batch less when the latency were improved 1q/2q/4q TCP_RR size #sessions trans.rate norm trans.rate norm trans.rate norm 1 1 9393.26 595.64 9408.18 597.34 9375.19 584.12 1 2072162.1 2214.24 129880.22 2456.13 196949.81 2298.13 1 50107513.38 2653.99 139721.93 2490.58 259713.82 2873.57 1 100 126734.63 2676.54 145553.5 2406.63 265252.68 2943 64 19453.42 632.33 9371.37 616.13 9338.19 615.97 64 20 70620.03 2093.68 125155.75 2409.15 191239.91 2253.32 64 50 1069662448.29 146518.67 2514.47 242134.07 2720.91 64 100 117046.35 2394.56 190153.09 2696.82 238881.29 2704.41 256 1 8733.29 736.36 8701.07 680.83 8608.92 530.1 256 20 69279.89 2274.45 115103.07 2299.76 144555.16 1963.53 256 50 97676.02 2296.09 150719.57 2522.92 254510.5 3028.44 256 100 150221.55 2949.56 197569.3 2790.92 300695.78 3494.83 TCP_CRR size #sessions trans.rate norm trans.rate norm trans.rate norm 1 1 2848.37 163.41 2230.39 130.89 2013.09 120.47 1 2023434.5 562.11 31057.43 531.07 49488.28 564.41 1 5028514.88 582.17 40494.23 605.92 60113.35 654.97 1 100 28827.22 584.73 48813.25 661.6 61783.62 676.56 64 12780.08 159.4 2201.07 127.96 2006.8 117.63 64 20 23318.51 564.47 30982.44 530.24 49734.95 566.13 64 50 28585.72 582.54 40576.7 610.08 60167.89 656.56 64 100 28747.37 584.17 49081.87 667.87 60612.94 662 256 1 2772.08 160.51 2231.84 131.05 2003.62 113.45 256 20 23086.35 559.8 30929.09 528.16 48454.9 555.22 256 50 28354.7 579.85 40578.31 60760261.71 657.87 256 100 28844.55 585.67 48541.86 659.08 61941.07 676.72 TCP_STREAM guest receiving size #sessions throughput norm throughput norm throughput norm 1 1 16.27 1.33 16.11.12 16.13 0.99 1 2 33.04 2.08 32.96 2.19 32.75 1.98 1 4 66.62 6.83 68.35.56 66.14 2.65 64 1896.55 56.67 914.02 58.14 898.9 61.56 64 21830.46 91.02 1812.02 64.59 1835.57 66.26 64 43626.61 142.55 3636.25 100.64 3607.46 75.03 256 1 2619.49 131.23 2543.19 129.03 2618.69 132.39 256 2 5136.58 203.02 5163.31 141.11 5236.51 149.4 256 4 7063.99 242.83 9365.4 208.49 9421.03 159.94 512 1 3592.43 165.24 3603.12 167.19 3552.5 169.57 512 2 7042.62 246.59 7068.46 180.87 7258.52 186.3 512 4 6996.08 241.49 9298.34 206.12 9418.52 159.33 1024 1 4339.54 192.95 4370.2 191.92 4211.72 192.49 1024 2 7439.45 254.77 9403.99 215.24 9120.82 222.67 1024 4 7953.86 272.11 9403.87 208.23 9366.98 159.49 4096 1 7696.28 272.04 7611.41 270.38 7778.71 267.76 4096 2 7530.35 261.1 8905.43 246.27 8990.18 267.57 4096 4 7121.6 247.02 9411.75 206.71 9654.96 184.67 16384 1 7795.73 268.54 7780.94 267.2 7634.26 260.73 16384 2 7436.57 255.81 9381.86 220.85 9392220.36 16384 4 7199.07 247.81 9420.96 205.87 9373.69 159.57 TCP_MAERTS guest sending size #sessions throughput norm throughput norm throughput norm 1 1 15.94 0.62 15.55 0.61 15.13 0.59 1 2 36.11 0.83 32.46 0.69 32.28 0.69 1 4 71.59 1 68.91 0.94 61.52 0.77 64 1630.71 22.52 622.11 22.35 605.09 21.84 64 21442.36 30.57 1292.15 25.82 1282.67 25.55 64 43186.79 42.59 2844.96 36.03 2529.69 30.06 256 1 1760.96 58.07 1738.44 57.43 1695.99 56.19 256 2 4834.23 95.19 3524.85 64.21 3511.94 64.45 256 4 9324.63 145.74 8956.49 116.39 6720.17 73.86 512 1 2678.03 84.1 2630.68 82.93 2636.54 82.57 512 2 9368.17 195.61 9408.82 204.53 5316.3 92.99 512 4 9186.34 209.68 9358.72 183.82 9489.29 160.42 1024 1 3620.71 109.88 3625.54 109.83 3606.61 112.35 1024 2 9429258.32 7082.79 120.55 7403.53 134.78 1024 4 9430.66 290.44 9499.29 232.31 9414.6 190.92 4096 1 9339.28 296.48 9374.23 372.88 9348.76 298.49 4096 2 9410.53 378.69 9412.61 286.18 9409.75 278.31 4096 4 9487.35 374.1 9556.91 288.81 9441.94 221.64 16384 1 9380.43 403.8 9379.78 399.13 9382.42 393.55 16384 2 9367.69 406.93 9415.04 312.68 9409.29 300.9 16384 4 9391.96 405.17 9695.12 310.54 9423.76 223.47 Trying to understand the performance results: What is the host device configuration? tap + bridge? Yes. Did you use host CPU affinity for the vhost threads? I use numactl to pin cpu threads and vhost threads in the same numa node. Can multiqueue tap take advantage of multiqueue host NICs or is virtio-net multiqueue unaware of the physical NIC multiqueue capabilities? Tap is unware of the physical multiqueue NIC, but we can benefit from it since we
Re: [PATCH v5 01/12] KVM: ARM: Introduce KVM_SET_DEVICE_ADDRESS ioctl
On Wed, 9 Jan 2013 15:11:39 +, Peter Maydell peter.mayd...@linaro.org wrote: On 9 January 2013 14:58, Alexander Graf ag...@suse.de wrote: Yeah, that was the basic idea. Considering that the patch set hasn't been going in for another 2 months after that discussion indicates that this isn't too much of an issue though :). We might get there faster if people didn't keep nitpicking the APIs at the last minute :-) Exactly. We're trying hard to get the damned thing merged, and the permanent API churn quickly becomes a burden. My understanding was that a consensus was reached 2 months ago. Has something fundamentally changed? Something that makes this API invalid? If not, can we please keep this one? Thanks, M. -- Who you jivin' with that Cosmik Debris? -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v5 01/12] KVM: ARM: Introduce KVM_SET_DEVICE_ADDRESS ioctl
On 09.01.2013, at 16:50, Marc Zyngier wrote: On Wed, 9 Jan 2013 16:28:03 +0100, Alexander Graf ag...@suse.de wrote: On 09.01.2013, at 16:22, Marc Zyngier wrote: On Wed, 9 Jan 2013 15:11:39 +, Peter Maydell peter.mayd...@linaro.org wrote: On 9 January 2013 14:58, Alexander Graf ag...@suse.de wrote: Yeah, that was the basic idea. Considering that the patch set hasn't been going in for another 2 months after that discussion indicates that this isn't too much of an issue though :). We might get there faster if people didn't keep nitpicking the APIs at the last minute :-) Exactly. We're trying hard to get the damned thing merged, and the permanent API churn quickly becomes a burden. As I said earlier, we have had a lot of experience in creating really bad APIs in the past. Is this one really bad? Again, what changed in the meantime that makes you think this API is wrong? I complained about it 2 months ago already. But how about making this one specific? Call it KVM_ARM_SET_VGIC_ADDRESS, keep the rest as it is, resend it, and later we can come up with an actually generic interface. Let's pretend you never wrote that, shall we? ;-) Why? The worst thing that happened to us so far were interfaces that looked generic and extensible and turned out not to be (see SET_SREGS in the ppc code). We have 2 options to circumvent this: 1) Commonly agree on an interface that actually is extensible and use it throughout the code 2) Create a very specific interface for a single use case, keep it implemented forever and as soon as we need more, implement a more generic one that goes back to 1 Since the ARM patches have been out of tree for too long, I'm happy with going route 2 until we go back to square 1. Alex -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v5 01/12] KVM: ARM: Introduce KVM_SET_DEVICE_ADDRESS ioctl
On 09/01/13 15:56, Alexander Graf wrote: On 09.01.2013, at 16:50, Marc Zyngier wrote: On Wed, 9 Jan 2013 16:28:03 +0100, Alexander Graf ag...@suse.de wrote: On 09.01.2013, at 16:22, Marc Zyngier wrote: On Wed, 9 Jan 2013 15:11:39 +, Peter Maydell peter.mayd...@linaro.org wrote: On 9 January 2013 14:58, Alexander Graf ag...@suse.de wrote: Yeah, that was the basic idea. Considering that the patch set hasn't been going in for another 2 months after that discussion indicates that this isn't too much of an issue though :). We might get there faster if people didn't keep nitpicking the APIs at the last minute :-) Exactly. We're trying hard to get the damned thing merged, and the permanent API churn quickly becomes a burden. As I said earlier, we have had a lot of experience in creating really bad APIs in the past. Is this one really bad? Again, what changed in the meantime that makes you think this API is wrong? I complained about it 2 months ago already. But how about making this one specific? Call it KVM_ARM_SET_VGIC_ADDRESS, keep the rest as it is, resend it, and later we can come up with an actually generic interface. Let's pretend you never wrote that, shall we? ;-) Why? The worst thing that happened to us so far were interfaces that looked generic and extensible and turned out not to be (see SET_SREGS in the ppc code). We have 2 options to circumvent this: 1) Commonly agree on an interface that actually is extensible and use it throughout the code 2) Create a very specific interface for a single use case, keep it implemented forever and as soon as we need more, implement a more generic one that goes back to 1 Since the ARM patches have been out of tree for too long, I'm happy with going route 2 until we go back to square 1. I really don't want to see that. Either we keep the API as it is, or we change it for something that is really better and used by other architectures. No point in turning it upside down if we're the only one doing it. Oh, and as we're aiming for 3.9, it'd better be ready soon... M. -- Jazz is not dead. It just smells funny... -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v5 01/12] KVM: ARM: Introduce KVM_SET_DEVICE_ADDRESS ioctl
On Wed, 9 Jan 2013 16:28:03 +0100, Alexander Graf ag...@suse.de wrote: On 09.01.2013, at 16:22, Marc Zyngier wrote: On Wed, 9 Jan 2013 15:11:39 +, Peter Maydell peter.mayd...@linaro.org wrote: On 9 January 2013 14:58, Alexander Graf ag...@suse.de wrote: Yeah, that was the basic idea. Considering that the patch set hasn't been going in for another 2 months after that discussion indicates that this isn't too much of an issue though :). We might get there faster if people didn't keep nitpicking the APIs at the last minute :-) Exactly. We're trying hard to get the damned thing merged, and the permanent API churn quickly becomes a burden. As I said earlier, we have had a lot of experience in creating really bad APIs in the past. Is this one really bad? Again, what changed in the meantime that makes you think this API is wrong? But how about making this one specific? Call it KVM_ARM_SET_VGIC_ADDRESS, keep the rest as it is, resend it, and later we can come up with an actually generic interface. Let's pretend you never wrote that, shall we? ;-) M. -- Fast, cheap, reliable. Pick two. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v5.1 1/2] KVM: ARM: Introduce KVM_SET_DEVICE_ADDRESS ioctl
On ARM (and possibly other architectures) some bits are specific to the model being emulated for the guest and user space needs a way to tell the kernel about those bits. An example is mmio device base addresses, where KVM must know the base address for a given device to properly emulate mmio accesses within a certain address range or directly map a device with virtualiation extensions into the guest address space. We try to make this API slightly more generic than for our specific use, but so far only the VGIC uses this feature. Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com --- Documentation/virtual/kvm/api.txt | 37 + arch/arm/include/uapi/asm/kvm.h | 13 + arch/arm/kvm/arm.c| 23 ++- include/uapi/linux/kvm.h |8 4 files changed, 80 insertions(+), 1 deletion(-) diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 38066a7a..c2ebd9e 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -2206,6 +2206,43 @@ This ioctl returns the guest registers that are supported for the KVM_GET_ONE_REG/KVM_SET_ONE_REG calls. +4.80 KVM_ARM_SET_DEVICE_ADDR + +Capability: KVM_CAP_ARM_SET_DEVICE_ADDR +Architectures: arm +Type: vm ioctl +Parameters: struct kvm_arm_device_address (in) +Returns: 0 on success, -1 on error +Errors: + ENODEV: The device id is unknown + ENXIO: Device not supported on current system + EEXIST: Address already set + E2BIG: Address outside guest physical address space + +struct kvm_arm_device_addr { + __u64 id; + __u64 addr; +}; + +Specify a device address in the guest's physical address space where guests +can access emulated or directly exposed devices, which the host kernel needs +to know about. The id field is an architecture specific identifier for a +specific device. + +ARM divides the id field into two parts, a device id and an address type id +specific to the individual device. + + Â bits: | 63... 32 | 31...16 | 15...0 | + field: |0x | device id | addr type id | + +ARM currently only require this when using the in-kernel GIC support for the +hardware VGIC features, using KVM_ARM_DEVICE_VGIC_V2 as the device id. When +setting the base address for the guest's mapping of the VGIC virtual CPU +and distributor interface, the ioctl must be called after calling +KVM_CREATE_IRQCHIP, but before calling KVM_RUN on any of the VCPUs. Calling +this ioctl twice for any of the base addresses will return -EEXIST. + + 5. The kvm_run structure diff --git a/arch/arm/include/uapi/asm/kvm.h b/arch/arm/include/uapi/asm/kvm.h index 73b9615..09911a7 100644 --- a/arch/arm/include/uapi/asm/kvm.h +++ b/arch/arm/include/uapi/asm/kvm.h @@ -65,6 +65,19 @@ struct kvm_regs { #define KVM_ARM_TARGET_CORTEX_A15 0 #define KVM_ARM_NUM_TARGETS1 +/* KVM_SET_DEVICE_ADDRESS ioctl id encoding */ +#define KVM_DEVICE_TYPE_SHIFT 0 +#define KVM_DEVICE_TYPE_MASK (0x KVM_DEVICE_TYPE_SHIFT) +#define KVM_DEVICE_ID_SHIFT16 +#define KVM_DEVICE_ID_MASK (0x KVM_DEVICE_ID_SHIFT) + +/* Supported device IDs */ +#define KVM_ARM_DEVICE_VGIC_V2 0 + +/* Supported VGIC address types */ +#define KVM_VGIC_V2_ADDR_TYPE_DIST 0 +#define KVM_VGIC_V2_ADDR_TYPE_CPU 1 + struct kvm_vcpu_init { __u32 target; __u32 features[7]; diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c index f42d828..ce6e455 100644 --- a/arch/arm/kvm/arm.c +++ b/arch/arm/kvm/arm.c @@ -165,6 +165,8 @@ int kvm_dev_ioctl_check_extension(long ext) case KVM_CAP_COALESCED_MMIO: r = KVM_COALESCED_MMIO_PAGE_OFFSET; break; + case KVM_CAP_ARM_SET_DEVICE_ADDR: + r = 1; case KVM_CAP_NR_VCPUS: r = num_online_cpus(); break; @@ -805,10 +807,29 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log) return -EINVAL; } +static int kvm_vm_ioctl_set_device_addr(struct kvm *kvm, + struct kvm_arm_device_addr *dev_addr) +{ + return -ENODEV; +} + long kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg) { - return -EINVAL; + struct kvm *kvm = filp-private_data; + void __user *argp = (void __user *)arg; + + switch (ioctl) { + case KVM_ARM_SET_DEVICE_ADDR: { + struct kvm_arm_device_addr dev_addr; + + if (copy_from_user(dev_addr, argp, sizeof(dev_addr))) + return -EFAULT; + return kvm_vm_ioctl_set_device_addr(kvm, dev_addr); + } + default: + return -EINVAL; + } } static void cpu_init_hyp_mode(void *vector) diff --git a/include/uapi/linux/kvm.h
[PATCH v5.1 2/2] ARM: KVM: VGIC accept vcpu and dist base addresses from user space
User space defines the model to emulate to a guest and should therefore decide which addresses are used for both the virtual CPU interface directly mapped in the guest physical address space and for the emulated distributor interface, which is mapped in software by the in-kernel VGIC support. Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com --- Documentation/virtual/kvm/api.txt |1 + arch/arm/include/asm/kvm_vgic.h |9 + arch/arm/include/uapi/asm/kvm.h |3 ++ arch/arm/kvm/arm.c| 14 arch/arm/kvm/vgic.c | 62 + 5 files changed, 88 insertions(+), 1 deletion(-) diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index c2ebd9e..497fd48 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -2218,6 +2218,7 @@ Errors: ENXIO: Device not supported on current system EEXIST: Address already set E2BIG: Address outside guest physical address space + EBUSY: Address overlaps with other device range struct kvm_arm_device_addr { __u64 id; diff --git a/arch/arm/include/asm/kvm_vgic.h b/arch/arm/include/asm/kvm_vgic.h index fcfd530..270dcd2 100644 --- a/arch/arm/include/asm/kvm_vgic.h +++ b/arch/arm/include/asm/kvm_vgic.h @@ -22,6 +22,9 @@ #include asm/hardware/gic.h struct vgic_dist { + /* Distributor and vcpu interface mapping in the guest */ + phys_addr_t vgic_dist_base; + phys_addr_t vgic_cpu_base; }; struct vgic_cpu { @@ -33,6 +36,7 @@ struct kvm_run; struct kvm_exit_mmio; #ifdef CONFIG_KVM_ARM_VGIC +int kvm_vgic_set_addr(struct kvm *kvm, unsigned long type, u64 addr); bool vgic_handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run, struct kvm_exit_mmio *mmio); @@ -42,6 +46,11 @@ static inline int kvm_vgic_hyp_init(void) return 0; } +static inline int kvm_vgic_set_addr(struct kvm *kvm, unsigned long type, u64 addr) +{ + return 0; +} + static inline int kvm_vgic_init(struct kvm *kvm) { return 0; diff --git a/arch/arm/include/uapi/asm/kvm.h b/arch/arm/include/uapi/asm/kvm.h index 09911a7..94e893b 100644 --- a/arch/arm/include/uapi/asm/kvm.h +++ b/arch/arm/include/uapi/asm/kvm.h @@ -78,6 +78,9 @@ struct kvm_regs { #define KVM_VGIC_V2_ADDR_TYPE_DIST 0 #define KVM_VGIC_V2_ADDR_TYPE_CPU 1 +#define KVM_VGIC_V2_DIST_SIZE 0x1000 +#define KVM_VGIC_V2_CPU_SIZE 0x2000 + struct kvm_vcpu_init { __u32 target; __u32 features[7]; diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c index fcb0dfc..b4e7571 100644 --- a/arch/arm/kvm/arm.c +++ b/arch/arm/kvm/arm.c @@ -858,7 +858,19 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log) static int kvm_vm_ioctl_set_device_addr(struct kvm *kvm, struct kvm_arm_device_addr *dev_addr) { - return -ENODEV; + unsigned long dev_id, type; + + dev_id = (dev_addr-id KVM_DEVICE_ID_MASK) KVM_DEVICE_ID_SHIFT; + type = (dev_addr-id KVM_DEVICE_TYPE_MASK) KVM_DEVICE_TYPE_SHIFT; + + switch (dev_id) { + case KVM_ARM_DEVICE_VGIC_V2: + if (!vgic_present) + return -ENXIO; + return kvm_vgic_set_addr(kvm, type, dev_addr-addr); + default: + return -ENODEV; + } } long kvm_arch_vm_ioctl(struct file *filp, diff --git a/arch/arm/kvm/vgic.c b/arch/arm/kvm/vgic.c index 1feee5a..cdb7671 100644 --- a/arch/arm/kvm/vgic.c +++ b/arch/arm/kvm/vgic.c @@ -22,6 +22,9 @@ #include linux/io.h #include asm/kvm_emulate.h +#define VGIC_ADDR_UNDEF(-1) +#define IS_VGIC_ADDR_UNDEF(_x) ((_x) == VGIC_ADDR_UNDEF) + #define ACCESS_READ_VALUE (1 0) #define ACCESS_READ_RAZ(0 0) #define ACCESS_READ_MASK(x)((x) (1 0)) @@ -142,3 +145,62 @@ bool vgic_handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run, { return KVM_EXIT_MMIO; } + +static bool vgic_ioaddr_overlap(struct kvm *kvm) +{ + phys_addr_t dist = kvm-arch.vgic.vgic_dist_base; + phys_addr_t cpu = kvm-arch.vgic.vgic_cpu_base; + + if (IS_VGIC_ADDR_UNDEF(dist) || IS_VGIC_ADDR_UNDEF(cpu)) + return 0; + if ((dist = cpu dist + KVM_VGIC_V2_DIST_SIZE cpu) || + (cpu = dist cpu + KVM_VGIC_V2_CPU_SIZE dist)) + return -EBUSY; + return 0; +} + +static int vgic_ioaddr_assign(struct kvm *kvm, phys_addr_t *ioaddr, + phys_addr_t addr, phys_addr_t size) +{ + int ret; + + if (!IS_VGIC_ADDR_UNDEF(*ioaddr)) + return -EEXIST; + if (addr + size addr) + return -EINVAL; + + ret = vgic_ioaddr_overlap(kvm); + if (ret) + return ret; + *ioaddr = addr; + return ret; +} + +int kvm_vgic_set_addr(struct kvm *kvm, unsigned long type, u64
[PATCH v5.1 0/2] KVM: ARM: Rename KVM_SET_DEVICE_ADDRESS
Renames the KVM_SET_DEVICE_ADDRESS to KVM_ARM_SET_DEVICE_ADDR to make it obvious that this is ARM specific in lack of a better generic interface. Once we agree on a better interface the KVM/ARM code can also take advantage of that, but until then we don't want to hold up the KVM/ARM patches. A new complete series of these patches including this change can be pulled from: git://github.com/virtualopensystems/linux-kvm-arm.git kvm-arm-v15-devaddr-v2 -Christoffer --- Christoffer Dall (2): KVM: ARM: Introduce KVM_SET_DEVICE_ADDRESS ioctl ARM: KVM: VGIC accept vcpu and dist base addresses from user space Documentation/virtual/kvm/api.txt | 38 ++ arch/arm/include/asm/hardware/gic.h | 25 arch/arm/include/asm/kvm_host.h | 18 +++ arch/arm/include/asm/kvm_vgic.h | 89 +++ arch/arm/include/uapi/asm/kvm.h | 16 +++ arch/arm/kvm/Makefile |1 arch/arm/kvm/arm.c | 92 +++- arch/arm/kvm/interrupts.S |4 + arch/arm/kvm/mmio.c |3 + arch/arm/kvm/vgic.c | 206 +++ include/uapi/linux/kvm.h|8 + 11 files changed, 498 insertions(+), 2 deletions(-) create mode 100644 arch/arm/include/asm/kvm_vgic.h create mode 100644 arch/arm/kvm/vgic.c -- -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v5 01/12] KVM: ARM: Introduce KVM_SET_DEVICE_ADDRESS ioctl
On Wed, Jan 9, 2013 at 11:12 AM, Marc Zyngier marc.zyng...@arm.com wrote: On 09/01/13 15:56, Alexander Graf wrote: On 09.01.2013, at 16:50, Marc Zyngier wrote: On Wed, 9 Jan 2013 16:28:03 +0100, Alexander Graf ag...@suse.de wrote: On 09.01.2013, at 16:22, Marc Zyngier wrote: On Wed, 9 Jan 2013 15:11:39 +, Peter Maydell peter.mayd...@linaro.org wrote: On 9 January 2013 14:58, Alexander Graf ag...@suse.de wrote: Yeah, that was the basic idea. Considering that the patch set hasn't been going in for another 2 months after that discussion indicates that this isn't too much of an issue though :). We might get there faster if people didn't keep nitpicking the APIs at the last minute :-) Exactly. We're trying hard to get the damned thing merged, and the permanent API churn quickly becomes a burden. As I said earlier, we have had a lot of experience in creating really bad APIs in the past. Is this one really bad? Again, what changed in the meantime that makes you think this API is wrong? I complained about it 2 months ago already. But how about making this one specific? Call it KVM_ARM_SET_VGIC_ADDRESS, keep the rest as it is, resend it, and later we can come up with an actually generic interface. Let's pretend you never wrote that, shall we? ;-) Why? The worst thing that happened to us so far were interfaces that looked generic and extensible and turned out not to be (see SET_SREGS in the ppc code). We have 2 options to circumvent this: 1) Commonly agree on an interface that actually is extensible and use it throughout the code 2) Create a very specific interface for a single use case, keep it implemented forever and as soon as we need more, implement a more generic one that goes back to 1 Since the ARM patches have been out of tree for too long, I'm happy with going route 2 until we go back to square 1. I really don't want to see that. Either we keep the API as it is, or we change it for something that is really better and used by other architectures. No point in turning it upside down if we're the only one doing it. Oh, and as we're aiming for 3.9, it'd better be ready soon... ok, I renamed it to KVM_ARM_SET_DEVICE_ADDR, using the same binary format, so user space tools remain compatible. I'll be happy to help out with a more generic API, once the kvm/arm patches reach upstream. Thanks so far! -Christoffer -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v5 12/12] ARM: KVM: Add VGIC configuration option
On Wed, Jan 9, 2013 at 8:28 AM, Sergei Shtylyov sshtyl...@mvista.com wrote: Hello. On 08-01-2013 22:42, Christoffer Dall wrote: From: Marc Zyngier marc.zyng...@arm.com It is now possible to select the VGIC configuration option. Signed-off-by: Marc Zyngier marc.zyng...@arm.com Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com --- arch/arm/kvm/Kconfig |8 1 file changed, 8 insertions(+) diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig index 05227cb..d32e33f 100644 --- a/arch/arm/kvm/Kconfig +++ b/arch/arm/kvm/Kconfig @@ -51,6 +51,14 @@ config KVM_ARM_MAX_VCPUS large, so only choose a reasonable number that you expect to actually use. +config KVM_ARM_VGIC +bool KVM support for Virtual GIC Please indent with tab, as below. fixed both, thanks. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [kvmarm] [PATCH v5.1 0/2] KVM: ARM: Rename KVM_SET_DEVICE_ADDRESS
On 09.01.2013, at 17:26, Christoffer Dall wrote: Renames the KVM_SET_DEVICE_ADDRESS to KVM_ARM_SET_DEVICE_ADDR to make it obvious that this is ARM specific in lack of a better generic interface. Once we agree on a better interface the KVM/ARM code can also take advantage of that, but until then we don't want to hold up the KVM/ARM patches. Works for me. Scott, are you happy with this one too? We can start to introduce (and fix ARM) with a generic ioctl in the MPIC patches then. Alex A new complete series of these patches including this change can be pulled from: git://github.com/virtualopensystems/linux-kvm-arm.git kvm-arm-v15-devaddr-v2 -Christoffer --- Christoffer Dall (2): KVM: ARM: Introduce KVM_SET_DEVICE_ADDRESS ioctl ARM: KVM: VGIC accept vcpu and dist base addresses from user space Documentation/virtual/kvm/api.txt | 38 ++ arch/arm/include/asm/hardware/gic.h | 25 arch/arm/include/asm/kvm_host.h | 18 +++ arch/arm/include/asm/kvm_vgic.h | 89 +++ arch/arm/include/uapi/asm/kvm.h | 16 +++ arch/arm/kvm/Makefile |1 arch/arm/kvm/arm.c | 92 +++- arch/arm/kvm/interrupts.S |4 + arch/arm/kvm/mmio.c |3 + arch/arm/kvm/vgic.c | 206 +++ include/uapi/linux/kvm.h|8 + 11 files changed, 498 insertions(+), 2 deletions(-) create mode 100644 arch/arm/include/asm/kvm_vgic.h create mode 100644 arch/arm/kvm/vgic.c -- ___ kvmarm mailing list kvm...@lists.cs.columbia.edu https://lists.cs.columbia.edu/cucslists/listinfo/kvmarm -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: Guest performance is reduced after live migration
Here's what I have found so far... Ubuntu 10.04 performed within +/- 2% so I'm not including those results.It seems that it's more of an issue of disk access, so I'm going to run some more disk specific benchmarks and I'll those results later. I'd be happy to run any other perf tests that might help track down the problem as well. Qemu command line: /usr/bin/kvm -name one-3 -S -M pc-1.2 -cpu Westmere -enable-kvm -m 73728 -smp 16,sockets=2,cores=8,threads=1 -uuid 3ebea329-cfbb-3447-0b49-b41e078a3ede -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/one-3.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -no-kvm-pit-reinjection -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/var/lib/one//datastores/0/3/disk.0,if=none,id=drive-virtio-disk0,format=raw,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive file=/var/lib/one//datastores/0/3/disk.1,if=none,id=drive-ide0-0-0,readonly=on,format=raw -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -netdev tap,fd=23,id=hostnet0,vhost=on,vhostfd=25 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=02:00:0a:64:02:fe,bus=pci.0,addr=0x3 -vnc 0.0.0.0:3,password -vga cirrus -incoming tcp:0.0.0.0:49155 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 Initial Boot Benchmarks === Huge Page Usage Physical Host: 2627584kB QEMU Process: 2478080kB Using phoronix pts/compuational ImageMagick - 31.54s Linux Kernel 3.1 - 43.91s Mplayer - 30.49s PHP - 22.25s After Live Migration Benchmarks === Huge Page Usage Physical Host: 3174400kB QEMU Process: 3151872kB Using phoronix pts/compilation ImageMagick - 43.29s Linux Kernel 3.1 - 76.67s Mplayer - 45.41s PHP - 29.1s -Original Message- From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On Behalf Of Mark Petersen Sent: Wednesday, January 02, 2013 7:32 PM To: Marcelo Tosatti Cc: kvm@vger.kernel.org; shouta.ueh...@jp.yokogawa.com Subject: RE: Guest performance is reduced after live migration I believe I disabled huge pages on the guest and host previously, but I'll test a few scenarios and look at transparent hugepage usage specifically again over the next couple days and report back. Below is a command line used for testing. /usr/bin/kvm - qemu-x86_64 /usr/bin/kvm -name one-483 -S -M pc-1.2 -cpu Westmere -enable-kvm -m 73728 -smp 16,sockets=2,cores=8,threads=1 -uuid a844146a-0d72-a661-fe6c-cb6b7a4ba240 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/one-483.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/var/lib/one//datastores/0/483/disk.0,if=none,id=drive-virtio-disk0,format=raw,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive file=/var/lib/one//datastores/0/483/disk.1,if=none,id=drive-ide0-0-0,readonly=on,format=raw -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -netdev tap,fd=26,id=hostnet0,vhost=on,vhostfd=27 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=02:00:0a:02:02:4b,bus=pci.0,addr=0x3 -vnc 0.0.0.0 -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 -Original Message- From: Marcelo Tosatti [mailto:mtosa...@redhat.com] Sent: Wednesday, January 02, 2013 6:49 PM To: Mark Petersen Cc: kvm@vger.kernel.org; shouta.ueh...@jp.yokogawa.com Subject: Re: Guest performance is reduced after live migration On Wed, Jan 02, 2013 at 11:56:11PM +, Mark Petersen wrote: I don't think it's related to huge pages... I was using phoronix-test-suite to run benchmarks. The 'batch/compilation' group shows the slowdown for all tests, the 'batch/computation' show some performance degradation, but not nearly as significant. Huge pages in the host, for the qemu-kvm process, i mean. Without huge pages backing guest memory in the host, 4k EPT TLB entries will be used instead of 2MB EPT TLB entries. You could probably easily test this way without phoronix - Start a VM with almost nothing running. Download mainline Linux kernel, compile. This takes about 45 seconds in my case (72GB memory, 16 virtual CPUs, idle physical host running this VM.) Run as many times as you want, still takes ~45 seconds. Migrate to a new idle host, kernel compile now takes ~90 seconds, wait 3 hours (should give khugepaged a change to do its thing I imagine), Please verify its the case (by checking how much memory is backed by hugepages). http://www.mjmwired.net/kernel/Documentation/vm/transhuge.txt Monitoring Usage. kernel compiles still take 90 seconds. Reboot virtual machine (run 'shutdown -r now', reboot, whatever.) First compile
Re: [PATCH 1/2] VFIO-AER: Vfio-pci driver changes for supporting AER
On Wed, 2013-01-09 at 06:26 +, Pandarathil, Vijaymohan R wrote: - New ioctl which is used to pass the eventfd that is signaled when an error occurs in the vfio_pci_device - Register pci_error_handler for the vfio_pci driver - When the device encounters an error, the error handler registered by the vfio_pci driver gets invoked by the AER infrastructure - In the error handler, signal the eventfd registered for the device. - This results in the qemu eventfd handler getting invoked and appropriate action taken for the guest. Signed-off-by: Vijay Mohan Pandarathil vijaymohan.pandarat...@hp.com --- drivers/vfio/pci/vfio_pci.c | 29 + drivers/vfio/pci/vfio_pci_private.h | 1 + drivers/vfio/vfio.c | 8 include/linux/vfio.h| 1 + include/uapi/linux/vfio.h | 9 + 5 files changed, 48 insertions(+) diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c index 6c11994..4ae9526 100644 --- a/drivers/vfio/pci/vfio_pci.c +++ b/drivers/vfio/pci/vfio_pci.c @@ -207,6 +207,8 @@ static long vfio_pci_ioctl(void *device_data, if (vdev-reset_works) info.flags |= VFIO_DEVICE_FLAGS_RESET; + info.flags |= VFIO_DEVICE_FLAGS_AER_NOTIFY; + This appears to be a PCI specific flag, so the name should include _PCI_. We also support non-PCIe devices and it seems like it would be possible to not have AER support available, so shouldn't this be conditional? info.num_regions = VFIO_PCI_NUM_REGIONS; info.num_irqs = VFIO_PCI_NUM_IRQS; @@ -348,6 +350,19 @@ static long vfio_pci_ioctl(void *device_data, return ret; + } else if (cmd == VFIO_DEVICE_SET_ERRFD) { + int32_t fd = (int32_t)arg; + + if (fd 0) + return -EINVAL; + + vdev-err_trigger = eventfd_ctx_fdget(fd); + + if (IS_ERR(vdev-err_trigger)) + return PTR_ERR(vdev-err_trigger); + + return 0; + I'm not sure why we wouldn't describe this as just another interrupt from the device and configure it via SET_IRQ. This ioctl has very limited use and doesn't follow any of the conventions of all the other vfio ioctls. } else if (cmd == VFIO_DEVICE_RESET) return vdev-reset_works ? pci_reset_function(vdev-pdev) : -EINVAL; @@ -527,11 +542,25 @@ static void vfio_pci_remove(struct pci_dev *pdev) kfree(vdev); } +static pci_ers_result_t vfio_err_detected(struct pci_dev *pdev, + pci_channel_state_t state) +{ + struct vfio_pci_device *vdev = vfio_get_vdev(pdev-dev); + + eventfd_signal(vdev-err_trigger, 1); + return PCI_ERS_RESULT_CAN_RECOVER; +} What if err_trigger hasn't been set? + +static const struct pci_error_handlers vfio_err_handlers = { + .error_detected = vfio_err_detected, +}; + static struct pci_driver vfio_pci_driver = { .name = vfio-pci, .id_table = NULL, /* only dynamic ids */ .probe = vfio_pci_probe, .remove = vfio_pci_remove, + .err_handler= vfio_err_handlers, }; static void __exit vfio_pci_cleanup(void) diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h index 611827c..daee62f 100644 --- a/drivers/vfio/pci/vfio_pci_private.h +++ b/drivers/vfio/pci/vfio_pci_private.h @@ -55,6 +55,7 @@ struct vfio_pci_device { boolbardirty; struct pci_saved_state *pci_saved_state; atomic_trefcnt; + struct eventfd_ctx *err_trigger; }; #define is_intx(vdev) (vdev-irq_type == VFIO_PCI_INTX_IRQ_INDEX) diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c index 56097c6..5ed5a54 100644 --- a/drivers/vfio/vfio.c +++ b/drivers/vfio/vfio.c @@ -693,6 +693,14 @@ void *vfio_del_group_dev(struct device *dev) } EXPORT_SYMBOL_GPL(vfio_del_group_dev); +void *vfio_get_vdev(struct device *dev) +{ + struct vfio_device *device = dev_get_drvdata(dev); + + return device-device_data; +} +EXPORT_SYMBOL_GPL(vfio_get_vdev); + This is unsafe. How do we know dev is a vfio device? How do we keep that drvdata valid while you're using it? I think you want to export the existing vfio_group_get_device() and vfio_device_put(). Thanks, Alex /** * VFIO base fd, /dev/vfio/vfio */ diff --git a/include/linux/vfio.h b/include/linux/vfio.h index ab9e862..3c97b03 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -45,6 +45,7 @@ extern int vfio_add_group_dev(struct device *dev, void *device_data); extern void *vfio_del_group_dev(struct device *dev); +extern void *vfio_get_vdev(struct device *dev); /** *
Re: [PATCH 2/2] QEMU-AER: Qemu changes to support AER for VFIO-PCI devices
On Wed, 2013-01-09 at 06:26 +, Pandarathil, Vijaymohan R wrote: - Create eventfd per vfio device assigned to a guest and register an event handler - This fd is passed to the vfio_pci driver through a new ioctl - When the device encounters an error, the eventfd is signaled and the qemu eventfd handler gets invoked. - In the handler decide what action to take. Current action taken is to terminate the guest. Signed-off-by: Vijay Mohan Pandarathil vijaymohan.pandarat...@hp.com --- hw/vfio_pci.c | 56 ++ linux-headers/linux/vfio.h | 9 2 files changed, 65 insertions(+) diff --git a/hw/vfio_pci.c b/hw/vfio_pci.c index 28c8303..9c3c28b 100644 --- a/hw/vfio_pci.c +++ b/hw/vfio_pci.c @@ -38,6 +38,7 @@ #include qemu/error-report.h #include qemu/queue.h #include qemu/range.h +#include sysemu/sysemu.h /* #define DEBUG_VFIO */ #ifdef DEBUG_VFIO @@ -130,6 +131,8 @@ typedef struct VFIODevice { QLIST_ENTRY(VFIODevice) next; struct VFIOGroup *group; bool reset_works; +EventNotifier errfd; Misleading name, it's an EventNotifier not an fd. +__u32 dev_info_flags; } VFIODevice; typedef struct VFIOGroup { @@ -1805,6 +1808,8 @@ static int vfio_get_device(VFIOGroup *group, const char *name, VFIODevice *vdev) DPRINTF(Device %s flags: %u, regions: %u, irgs: %u\n, name, dev_info.flags, dev_info.num_regions, dev_info.num_irqs); +vdev-dev_info_flags = dev_info.flags; + We test the VFIO_DEVICE_FLAGS_RESET elsewhere by caching the bit into a variable, why not do that here too? if (!(dev_info.flags VFIO_DEVICE_FLAGS_PCI)) { error_report(vfio: Um, this isn't a PCI device\n); goto error; @@ -1900,6 +1905,55 @@ static void vfio_put_device(VFIODevice *vdev) } } +static void vfio_errfd_handler(void *opaque) +{ +VFIODevice *vdev = opaque; + +if (!event_notifier_test_and_clear(vdev-errfd)) { +return; +} + +/* + * TBD. Retrieve the error details and decide what action + * needs to be taken. One of the actions could be to pass + * the error to the guest and have the guest driver recover + * the error. This requires that PCIe capabilities be + * exposed to the guest. At present, we just terminate the + * guest to contain the error. + */ +error_report(%s(%04x:%02x:%02x.%x) +Unrecoverable error detected... Terminating guest\n, +__func__, vdev-host.domain, vdev-host.bus, vdev-host.slot, +vdev-host.function); + +qemu_system_shutdown_request(); I would have figured hw_error +return; +} + +static void vfio_register_errfd(VFIODevice *vdev) +{ +int32_t pfd; pfd is used elsewhere in vfio as Pointer to FD, this is just a fd. +int ret; + +if (!(vdev-dev_info_flags VFIO_DEVICE_FLAGS_AER_NOTIFY)) { +error_report(vfio: Warning: Error notification not supported for the device\n); This should probably just be a debug print. +return; +} +if (event_notifier_init(vdev-errfd, 0)) { +error_report(vfio: Warning: Unable to init event notifier for error detection\n); +return; +} +pfd = event_notifier_get_fd(vdev-errfd); +qemu_set_fd_handler(pfd, vfio_errfd_handler, NULL, vdev); + +ret = ioctl(vdev-fd, VFIO_DEVICE_SET_ERRFD, pfd); +if (ret) { +error_report(vfio: Warning: Failed to setup error fd: %d\n, ret); +qemu_set_fd_handler(pfd, NULL, NULL, vdev); +event_notifier_cleanup(vdev-errfd); +} +return; +} static int vfio_initfn(PCIDevice *pdev) { VFIODevice *pvdev, *vdev = DO_UPCAST(VFIODevice, pdev, pdev); @@ -2010,6 +2064,8 @@ static int vfio_initfn(PCIDevice *pdev) } } +vfio_register_errfd(vdev); + No cleanup in exitfn?! Thanks, Alex return 0; out_teardown: diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h index 4758d1b..0ca4eeb 100644 --- a/linux-headers/linux/vfio.h +++ b/linux-headers/linux/vfio.h @@ -147,6 +147,7 @@ struct vfio_device_info { __u32 flags; #define VFIO_DEVICE_FLAGS_RESET (1 0)/* Device supports reset */ #define VFIO_DEVICE_FLAGS_PCI(1 1)/* vfio-pci device */ +#define VFIO_DEVICE_FLAGS_AER_NOTIFY (1 2) /* Supports aer notification */ __u32 num_regions;/* Max region index + 1 */ __u32 num_irqs; /* Max IRQ index + 1 */ }; @@ -288,6 +289,14 @@ struct vfio_irq_set { */ #define VFIO_DEVICE_RESET_IO(VFIO_TYPE, VFIO_BASE + 11) +/** + * VFIO_DEVICE_SET_ERRFD - _IO(VFIO_TYPE, VFIO_BASE + 12) + * + * Pass the eventfd to the vfio-pci driver for signalling any device + * error notifications + */ +#define VFIO_DEVICE_SET_ERRFD _IO(VFIO_TYPE, VFIO_BASE
[RFC 02/12] target-i386: Don't set any KVM flag by default if KVM is disabled
This is a cleanup that tries to solve two small issues: - We don't need a separate kvm_pv_eoi_features variable just to keep a constant calculated at compile-time, and this style would require adding a separate variable (that's declared twice because of the CONFIG_KVM ifdef) for each feature that's going to be enabled/disable by machine-type compat code. - The pc-1.3 code is setting the kvm_pv_eoi flag on cpuid_kvm_features even when KVM is disabled at runtime. This small incosistency in the cpuid_kvm_features field isn't a problem today because cpuid_kvm_features is ignored by the TCG code, but it may cause unexpected problems later when refactoring the CPUID handling code. This patch eliminates the kvm_pv_eoi_features variable and simply uses kvm_enabled() inside the enable_kvm_pv_eoi() compat function, so it enables kvm_pv_eoi only if KVM is enabled. I believe this makes the behavior of enable_kvm_pv_eoi() clearer and easier to understand. Signed-off-by: Eduardo Habkost ehabk...@redhat.com --- Cc: kvm@vger.kernel.org Cc: Michael S. Tsirkin m...@redhat.com Cc: Gleb Natapov g...@redhat.com Cc: Marcelo Tosatti mtosa...@redhat.com Changes v2: - Coding style fix Changes v3: - Eliminate #ifdef by using the fake KVM_FEATURE_PV_EOI #define Changes v4: - Check kvm_enabled() when actually using kvm_default_features - Eliminate Yet Another #ifdef by using the fake KVM_FEATURE_* #defines on kvm_default_features initialization --- target-i386/cpu.c | 16 ++-- 1 file changed, 6 insertions(+), 10 deletions(-) diff --git a/target-i386/cpu.c b/target-i386/cpu.c index e4dc370..57a22b7 100644 --- a/target-i386/cpu.c +++ b/target-i386/cpu.c @@ -206,22 +206,16 @@ typedef struct model_features_t { int check_cpuid = 0; int enforce_cpuid = 0; -#if defined(CONFIG_KVM) static uint32_t kvm_default_features = (1 KVM_FEATURE_CLOCKSOURCE) | (1 KVM_FEATURE_NOP_IO_DELAY) | (1 KVM_FEATURE_CLOCKSOURCE2) | (1 KVM_FEATURE_ASYNC_PF) | (1 KVM_FEATURE_STEAL_TIME) | (1 KVM_FEATURE_CLOCKSOURCE_STABLE_BIT); -static const uint32_t kvm_pv_eoi_features = (0x1 KVM_FEATURE_PV_EOI); -#else -static uint32_t kvm_default_features = 0; -static const uint32_t kvm_pv_eoi_features = 0; -#endif void enable_kvm_pv_eoi(void) { -kvm_default_features |= kvm_pv_eoi_features; +kvm_default_features |= (1UL KVM_FEATURE_PV_EOI); } void host_cpuid(uint32_t function, uint32_t count, @@ -1343,13 +1337,15 @@ static int cpu_x86_parse_featurestr(x86_def_t *x86_cpu_def, char *features) unsigned int i; char *featurestr; /* Single 'key=value string being parsed */ /* Features to be added */ -FeatureWordArray plus_features = { -[FEAT_KVM] = kvm_default_features, -}; +FeatureWordArray plus_features = { 0 }; /* Features to be removed */ FeatureWordArray minus_features = { 0 }; uint32_t numvalue; +if (kvm_enabled()) { +plus_features[FEAT_KVM] = kvm_default_features; +} + add_flagname_to_bitmaps(hypervisor, plus_features); featurestr = features ? strtok(features, ,) : NULL; -- 1.7.11.7 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC 04/12] kvm: Create kvm_arch_vcpu_id() function
This will allow each architecture to define how the VCPU ID is set on the KVM_CREATE_VCPU ioctl call. Signed-off-by: Eduardo Habkost ehabk...@redhat.com --- Cc: kvm@vger.kernel.org Cc: Michael S. Tsirkin m...@redhat.com Cc: Gleb Natapov g...@redhat.com Cc: Marcelo Tosatti mtosa...@redhat.com Changes v2: - Get CPUState as argument instead of CPUArchState --- include/sysemu/kvm.h | 3 +++ kvm-all.c| 2 +- target-i386/kvm.c| 5 + target-ppc/kvm.c | 5 + target-s390x/kvm.c | 5 + 5 files changed, 19 insertions(+), 1 deletion(-) diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h index 22acf91..384ee66 100644 --- a/include/sysemu/kvm.h +++ b/include/sysemu/kvm.h @@ -196,6 +196,9 @@ int kvm_arch_init(KVMState *s); int kvm_arch_init_vcpu(CPUState *cpu); +/* Returns VCPU ID to be used on KVM_CREATE_VCPU ioctl() */ +unsigned long kvm_arch_vcpu_id(CPUState *cpu); + void kvm_arch_reset_vcpu(CPUState *cpu); int kvm_arch_on_sigbus_vcpu(CPUState *cpu, int code, void *addr); diff --git a/kvm-all.c b/kvm-all.c index 6e2164b..d37359f 100644 --- a/kvm-all.c +++ b/kvm-all.c @@ -222,7 +222,7 @@ int kvm_init_vcpu(CPUState *cpu) DPRINTF(kvm_init_vcpu\n); -ret = kvm_vm_ioctl(s, KVM_CREATE_VCPU, cpu-cpu_index); +ret = kvm_vm_ioctl(s, KVM_CREATE_VCPU, kvm_arch_vcpu_id(cpu)); if (ret 0) { DPRINTF(kvm_create_vcpu failed\n); goto err; diff --git a/target-i386/kvm.c b/target-i386/kvm.c index 3acff40..5f3f789 100644 --- a/target-i386/kvm.c +++ b/target-i386/kvm.c @@ -411,6 +411,11 @@ static void cpu_update_state(void *opaque, int running, RunState state) } } +unsigned long kvm_arch_vcpu_id(CPUState *cpu) +{ +return cpu-cpu_index; +} + int kvm_arch_init_vcpu(CPUState *cs) { struct { diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c index e4a387f..bdcd133 100644 --- a/target-ppc/kvm.c +++ b/target-ppc/kvm.c @@ -384,6 +384,11 @@ static inline void kvm_fixup_page_sizes(PowerPCCPU *cpu) #endif /* !defined (TARGET_PPC64) */ +unsigned long kvm_arch_vcpu_id(CPUState *cpu) +{ +return cpu-cpu_index; +} + int kvm_arch_init_vcpu(CPUState *cs) { PowerPCCPU *cpu = POWERPC_CPU(cs); diff --git a/target-s390x/kvm.c b/target-s390x/kvm.c index 6ec5e6d..bd9864c 100644 --- a/target-s390x/kvm.c +++ b/target-s390x/kvm.c @@ -72,6 +72,11 @@ int kvm_arch_init(KVMState *s) return 0; } +unsigned long kvm_arch_vcpu_id(CPUState *cpu) +{ +return cpu-cpu_index; +} + int kvm_arch_init_vcpu(CPUState *cpu) { int ret = 0; -- 1.7.11.7 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC 01/12] kvm: add KVM_FEATURE_CLOCKSOURCE_STABLE_BIT fake #define
Signed-off-by: Eduardo Habkost ehabk...@redhat.com --- Cc: kvm@vger.kernel.org Cc: Michael S. Tsirkin m...@redhat.com Cc: Gleb Natapov g...@redhat.com Cc: Marcelo Tosatti mtosa...@redhat.com --- include/sysemu/kvm.h | 1 + 1 file changed, 1 insertion(+) diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h index 6bdd513..22acf91 100644 --- a/include/sysemu/kvm.h +++ b/include/sysemu/kvm.h @@ -36,6 +36,7 @@ #define KVM_FEATURE_ASYNC_PF 0 #define KVM_FEATURE_STEAL_TIME 0 #define KVM_FEATURE_PV_EOI 0 +#define KVM_FEATURE_CLOCKSOURCE_STABLE_BIT 0 #endif extern int kvm_allowed; -- 1.7.11.7 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC 03/12] pc: Reverse pc_init_pci() compatibility logic
Currently, the pc-1.4 machine init function enables PV EOI and then calls the pc-1.2 machine init function. The problem with this approach is that now we can't enable any compatibility code inside the pc-1.2 init function because it would enable the compatibility behavior on pc-1.3 and pc-1.4 as well. This reverses the logic so that the pc-1.2 machine init function will disable PV EOI, and then call the pc-1.4 machine init function. This way we can change the pc-1.2 init function to enable compatibility behavior, and the pc-1.4 init function would just use the default behavior. It would be interesting to eventually change pc_init_pci_no_kvmclock() and pc_init_isa() to reuse pc_init_pci_1_2() as well (so we don't need to duplicate compatibility code on those two functions). But this will be probably much easier to do after we create a PCInitArgs struct for the PC initialization arguments. Signed-off-by: Eduardo Habkost ehabk...@redhat.com --- Cc: kvm@vger.kernel.org Cc: Michael S. Tsirkin m...@redhat.com Cc: Gleb Natapov g...@redhat.com Cc: Marcelo Tosatti mtosa...@redhat.com --- hw/pc_piix.c | 22 +- target-i386/cpu.c | 5 +++-- target-i386/cpu.h | 2 +- 3 files changed, 17 insertions(+), 12 deletions(-) diff --git a/hw/pc_piix.c b/hw/pc_piix.c index 2b3d58b..13d7cc8 100644 --- a/hw/pc_piix.c +++ b/hw/pc_piix.c @@ -233,12 +233,14 @@ static void pc_init_pci(QEMUMachineInitArgs *args) initrd_filename, cpu_model, 1, 1); } -static void pc_init_pci_1_3(QEMUMachineInitArgs *args) +/* PC machine init function for pc-0.14 to pc-1.2 */ +static void pc_init_pci_1_2(QEMUMachineInitArgs *args) { -enable_kvm_pv_eoi(); +disable_kvm_pv_eoi(); pc_init_pci(args); } +/* PC init function for pc-0.10 to pc-0.13, and reused by xenfv */ static void pc_init_pci_no_kvmclock(QEMUMachineInitArgs *args) { ram_addr_t ram_size = args-ram_size; @@ -247,6 +249,7 @@ static void pc_init_pci_no_kvmclock(QEMUMachineInitArgs *args) const char *kernel_cmdline = args-kernel_cmdline; const char *initrd_filename = args-initrd_filename; const char *boot_device = args-boot_device; +disable_kvm_pv_eoi(); pc_init1(get_system_memory(), get_system_io(), ram_size, boot_device, @@ -264,6 +267,7 @@ static void pc_init_isa(QEMUMachineInitArgs *args) const char *boot_device = args-boot_device; if (cpu_model == NULL) cpu_model = 486; +disable_kvm_pv_eoi(); pc_init1(get_system_memory(), get_system_io(), ram_size, boot_device, @@ -286,7 +290,7 @@ static QEMUMachine pc_machine_v1_4 = { .name = pc-1.4, .alias = pc, .desc = Standard PC, -.init = pc_init_pci_1_3, +.init = pc_init_pci, .max_cpus = 255, .is_default = 1, }; @@ -301,7 +305,7 @@ static QEMUMachine pc_machine_v1_4 = { static QEMUMachine pc_machine_v1_3 = { .name = pc-1.3, .desc = Standard PC, -.init = pc_init_pci_1_3, +.init = pc_init_pci, .max_cpus = 255, .compat_props = (GlobalProperty[]) { PC_COMPAT_1_3, @@ -340,7 +344,7 @@ static QEMUMachine pc_machine_v1_3 = { static QEMUMachine pc_machine_v1_2 = { .name = pc-1.2, .desc = Standard PC, -.init = pc_init_pci, +.init = pc_init_pci_1_2, .max_cpus = 255, .compat_props = (GlobalProperty[]) { PC_COMPAT_1_2, @@ -383,7 +387,7 @@ static QEMUMachine pc_machine_v1_2 = { static QEMUMachine pc_machine_v1_1 = { .name = pc-1.1, .desc = Standard PC, -.init = pc_init_pci, +.init = pc_init_pci_1_2, .max_cpus = 255, .compat_props = (GlobalProperty[]) { PC_COMPAT_1_1, @@ -418,7 +422,7 @@ static QEMUMachine pc_machine_v1_1 = { static QEMUMachine pc_machine_v1_0 = { .name = pc-1.0, .desc = Standard PC, -.init = pc_init_pci, +.init = pc_init_pci_1_2, .max_cpus = 255, .compat_props = (GlobalProperty[]) { PC_COMPAT_1_0, @@ -433,7 +437,7 @@ static QEMUMachine pc_machine_v1_0 = { static QEMUMachine pc_machine_v0_15 = { .name = pc-0.15, .desc = Standard PC, -.init = pc_init_pci, +.init = pc_init_pci_1_2, .max_cpus = 255, .compat_props = (GlobalProperty[]) { PC_COMPAT_0_15, @@ -465,7 +469,7 @@ static QEMUMachine pc_machine_v0_15 = { static QEMUMachine pc_machine_v0_14 = { .name = pc-0.14, .desc = Standard PC, -.init = pc_init_pci, +.init = pc_init_pci_1_2, .max_cpus = 255, .compat_props = (GlobalProperty[]) { PC_COMPAT_0_14, diff --git a/target-i386/cpu.c b/target-i386/cpu.c index 57a22b7..492656c 100644 --- a/target-i386/cpu.c +++ b/target-i386/cpu.c @@ -211,11 +211,12 @@ static uint32_t kvm_default_features = (1 KVM_FEATURE_CLOCKSOURCE) | (1 KVM_FEATURE_CLOCKSOURCE2) | (1 KVM_FEATURE_ASYNC_PF) | (1 KVM_FEATURE_STEAL_TIME) | +(1 KVM_FEATURE_PV_EOI) | (1 KVM_FEATURE_CLOCKSOURCE_STABLE_BIT);
[RFC 11/12] target-i386: Topology APIC ID utility functions
Signed-off-by: Eduardo Habkost ehabk...@redhat.com --- Changes v1 - v2: - Support 32-bit APIC IDs (in case x2APIC is going to be used) - Coding style changes - Use TARGET_I386_TOPOLOGY_H instead of __QEMU_X86_TOPOLOGY_H__ - Rename topo_make_apic_id() to topo_apicid_for_cpu() - Rename __make_apicid() to topo_make_apicid() - Spaces around operators on test-x86-cpuid.c, as requested by Blue Swirl - Make test-x86-cpuid a target-specific test Changes v2 - v3: - Add documentation pointers to the code - Rename bits_for_count() to bitwidth_for_count() - Remove unused apicid_*_id() functions Changes v3 - v4: - Remove now-obsolete FIXME comment from test-x86-cpuid.c - Change bitops.h include to qemu/bitops.h --- target-i386/topology.h | 133 + tests/.gitignore | 1 + tests/Makefile | 4 ++ tests/test-x86-cpuid.c | 101 + 4 files changed, 239 insertions(+) create mode 100644 target-i386/topology.h create mode 100644 tests/test-x86-cpuid.c diff --git a/target-i386/topology.h b/target-i386/topology.h new file mode 100644 index 000..833ab47 --- /dev/null +++ b/target-i386/topology.h @@ -0,0 +1,133 @@ +/* + * x86 CPU topology data structures and functions + * + * Copyright (c) 2012 Red Hat Inc. + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the Software), to deal + * in the Software without restriction, including without limitation the rights + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + * copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN + * THE SOFTWARE. + */ +#ifndef TARGET_I386_TOPOLOGY_H +#define TARGET_I386_TOPOLOGY_H + +/* This file implements the APIC-ID-based CPU topology enumeration logic, + * documented at the following document: + * Intel® 64 Architecture Processor Topology Enumeration + * http://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration/ + * + * This code should be compatible with AMD's Extended Method described at: + * AMD CPUID Specification (Publication #25481) + * Section 3: Multiple Core Calcuation + * as long as: + * nr_threads is set to 1; + * OFFSET_IDX is assumed to be 0; + * CPUID Fn8000_0008_ECX[ApicIdCoreIdSize[3:0]] is set to apicid_core_width(). + */ + +#include stdint.h +#include string.h + +#include qemu/bitops.h + +/* APIC IDs can be 32-bit, but beware: APIC IDs 255 require x2APIC support + */ +typedef uint32_t apic_id_t; + +/* Return the bit width needed for 'count' IDs + */ +static unsigned bitwidth_for_count(unsigned count) +{ +g_assert(count = 1); +if (count == 1) { +return 0; +} +return bitops_flsl(count - 1) + 1; +} + +/* Bit width of the SMT_ID (thread ID) field on the APIC ID + */ +static inline unsigned apicid_smt_width(unsigned nr_cores, unsigned nr_threads) +{ +return bitwidth_for_count(nr_threads); +} + +/* Bit width of the Core_ID field + */ +static inline unsigned apicid_core_width(unsigned nr_cores, unsigned nr_threads) +{ +return bitwidth_for_count(nr_cores); +} + +/* Bit offset of the Core_ID field + */ +static inline unsigned apicid_core_offset(unsigned nr_cores, + unsigned nr_threads) +{ +return apicid_smt_width(nr_cores, nr_threads); +} + +/* Bit offset of the Pkg_ID (socket ID) field + */ +static inline unsigned apicid_pkg_offset(unsigned nr_cores, unsigned nr_threads) +{ +return apicid_core_offset(nr_cores, nr_threads) + \ + apicid_core_width(nr_cores, nr_threads); +} + +/* Make APIC ID for the CPU based on Pkg_ID, Core_ID, SMT_ID + * + * The caller must make sure core_id nr_cores and smt_id nr_threads. + */ +static inline apic_id_t topo_make_apicid(unsigned nr_cores, + unsigned nr_threads, + unsigned pkg_id, unsigned core_id, + unsigned smt_id) +{ +return (pkg_id apicid_pkg_offset(nr_cores, nr_threads)) | \ + (core_id apicid_core_offset(nr_cores, nr_threads)) | \ + smt_id; +} + +/* Calculate thread/core/package IDs for a specific topology,
[RFC 06/12] fw_cfg: Remove FW_CFG_MAX_CPUS from fw_cfg_init()
PC will not use max_cpus for that field, so move it outside the common code so it can use a different value on PC. Signed-off-by: Eduardo Habkost ehabk...@redhat.com --- hw/fw_cfg.c | 1 - hw/pc.c | 2 +- hw/ppc_newworld.c | 1 + hw/ppc_oldworld.c | 1 + hw/sun4m.c| 3 +++ hw/sun4u.c| 1 + 6 files changed, 7 insertions(+), 2 deletions(-) diff --git a/hw/fw_cfg.c b/hw/fw_cfg.c index 26f7125..0f13437 100644 --- a/hw/fw_cfg.c +++ b/hw/fw_cfg.c @@ -519,7 +519,6 @@ FWCfgState *fw_cfg_init(uint32_t ctl_port, uint32_t data_port, fw_cfg_add_bytes(s, FW_CFG_UUID, qemu_uuid, 16); fw_cfg_add_i16(s, FW_CFG_NOGRAPHIC, (uint16_t)(display_type == DT_NOGRAPHIC)); fw_cfg_add_i16(s, FW_CFG_NB_CPUS, (uint16_t)smp_cpus); -fw_cfg_add_i16(s, FW_CFG_MAX_CPUS, (uint16_t)max_cpus); fw_cfg_add_i16(s, FW_CFG_BOOT_MENU, (uint16_t)boot_menu); fw_cfg_bootsplash(s); fw_cfg_reboot(s); diff --git a/hw/pc.c b/hw/pc.c index df0c48e..beb816b 100644 --- a/hw/pc.c +++ b/hw/pc.c @@ -550,7 +550,7 @@ static void *bochs_bios_init(void) int i, j; fw_cfg = fw_cfg_init(BIOS_CFG_IOPORT, BIOS_CFG_IOPORT + 1, 0, 0); - +fw_cfg_add_i16(fw_cfg, FW_CFG_MAX_CPUS, (uint16_t)max_cpus); fw_cfg_add_i32(fw_cfg, FW_CFG_ID, 1); fw_cfg_add_i64(fw_cfg, FW_CFG_RAM_SIZE, (uint64_t)ram_size); fw_cfg_add_bytes(fw_cfg, FW_CFG_ACPI_TABLES, (uint8_t *)acpi_tables, diff --git a/hw/ppc_newworld.c b/hw/ppc_newworld.c index fabcc08..fdecfcf 100644 --- a/hw/ppc_newworld.c +++ b/hw/ppc_newworld.c @@ -397,6 +397,7 @@ static void ppc_core99_init(QEMUMachineInitArgs *args) /* No PCI init: the BIOS will do it */ fw_cfg = fw_cfg_init(0, 0, CFG_ADDR, CFG_ADDR + 2); +fw_cfg_add_i16(fw_cfg, FW_CFG_MAX_CPUS, (uint16_t)max_cpus); fw_cfg_add_i32(fw_cfg, FW_CFG_ID, 1); fw_cfg_add_i64(fw_cfg, FW_CFG_RAM_SIZE, (uint64_t)ram_size); fw_cfg_add_i16(fw_cfg, FW_CFG_MACHINE_ID, machine_arch); diff --git a/hw/ppc_oldworld.c b/hw/ppc_oldworld.c index fff5129..7406c10 100644 --- a/hw/ppc_oldworld.c +++ b/hw/ppc_oldworld.c @@ -296,6 +296,7 @@ static void ppc_heathrow_init(QEMUMachineInitArgs *args) /* No PCI init: the BIOS will do it */ fw_cfg = fw_cfg_init(0, 0, CFG_ADDR, CFG_ADDR + 2); +fw_cfg_add_i16(fw_cfg, FW_CFG_MAX_CPUS, (uint16_t)max_cpus); fw_cfg_add_i32(fw_cfg, FW_CFG_ID, 1); fw_cfg_add_i64(fw_cfg, FW_CFG_RAM_SIZE, (uint64_t)ram_size); fw_cfg_add_i16(fw_cfg, FW_CFG_MACHINE_ID, ARCH_HEATHROW); diff --git a/hw/sun4m.c b/hw/sun4m.c index 0d84b37..316e94c 100644 --- a/hw/sun4m.c +++ b/hw/sun4m.c @@ -1021,6 +1021,7 @@ static void sun4m_hw_init(const struct sun4m_hwdef *hwdef, ram_addr_t RAM_size, hwdef-ecc_version); fw_cfg = fw_cfg_init(0, 0, CFG_ADDR, CFG_ADDR + 2); +fw_cfg_add_i16(fw_cfg, FW_CFG_MAX_CPUS, (uint16_t)max_cpus); fw_cfg_add_i32(fw_cfg, FW_CFG_ID, 1); fw_cfg_add_i64(fw_cfg, FW_CFG_RAM_SIZE, (uint64_t)ram_size); fw_cfg_add_i16(fw_cfg, FW_CFG_MACHINE_ID, hwdef-machine_id); @@ -1658,6 +1659,7 @@ static void sun4d_hw_init(const struct sun4d_hwdef *hwdef, ram_addr_t RAM_size, Sun4d); fw_cfg = fw_cfg_init(0, 0, CFG_ADDR, CFG_ADDR + 2); +fw_cfg_add_i16(fw_cfg, FW_CFG_MAX_CPUS, (uint16_t)max_cpus); fw_cfg_add_i32(fw_cfg, FW_CFG_ID, 1); fw_cfg_add_i64(fw_cfg, FW_CFG_RAM_SIZE, (uint64_t)ram_size); fw_cfg_add_i16(fw_cfg, FW_CFG_MACHINE_ID, hwdef-machine_id); @@ -1858,6 +1860,7 @@ static void sun4c_hw_init(const struct sun4c_hwdef *hwdef, ram_addr_t RAM_size, Sun4c); fw_cfg = fw_cfg_init(0, 0, CFG_ADDR, CFG_ADDR + 2); +fw_cfg_add_i16(fw_cfg, FW_CFG_MAX_CPUS, (uint16_t)max_cpus); fw_cfg_add_i32(fw_cfg, FW_CFG_ID, 1); fw_cfg_add_i64(fw_cfg, FW_CFG_RAM_SIZE, (uint64_t)ram_size); fw_cfg_add_i16(fw_cfg, FW_CFG_MACHINE_ID, hwdef-machine_id); diff --git a/hw/sun4u.c b/hw/sun4u.c index cbfd217..d114af3 100644 --- a/hw/sun4u.c +++ b/hw/sun4u.c @@ -878,6 +878,7 @@ static void sun4uv_init(MemoryRegion *address_space_mem, (uint8_t *)nd_table[0].macaddr); fw_cfg = fw_cfg_init(BIOS_CFG_IOPORT, BIOS_CFG_IOPORT + 1, 0, 0); +fw_cfg_add_i16(fw_cfg, FW_CFG_MAX_CPUS, (uint16_t)max_cpus); fw_cfg_add_i32(fw_cfg, FW_CFG_ID, 1); fw_cfg_add_i64(fw_cfg, FW_CFG_RAM_SIZE, (uint64_t)ram_size); fw_cfg_add_i16(fw_cfg, FW_CFG_MACHINE_ID, hwdef-machine_id); -- 1.7.11.7 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC 09/12] pc: Set fw_cfg data based on APIC ID calculation
This changes FW_CFG_MAX_CPUS and FW_CFG_NUMA to use apic_id_for_cpu(), so the NUMA table can be based on the APIC IDs, instead of CPU index (SeaBIOS knows nothing about CPU indexes, just APIC IDs). Signed-off-by: Eduardo Habkost ehabk...@redhat.com --- Changes v2: - Get PC object as argument - Add more detailed comments explaining the reason for FW_CFG_MAX_CPUS not being simply 'max_cpus' Changes v3: - Use PCInitArgs instead of PC object Changes v4: - Don't use PCInitArgs, just add the necessary data for apic_id_limit() as argument - Rename function to pc_apic_id_limit() - Rename max_apic_id to apic_id_limit --- hw/pc.c | 44 +--- 1 file changed, 37 insertions(+), 7 deletions(-) diff --git a/hw/pc.c b/hw/pc.c index beb816b..38ae807 100644 --- a/hw/pc.c +++ b/hw/pc.c @@ -541,6 +541,18 @@ int e820_add_entry(uint64_t address, uint64_t length, uint32_t type) return index; } +/* Calculates the limit to CPU APIC ID values + * + * This function returns the limit for the APIC ID value, so that all + * CPU APIC IDs are pc_apic_id_limit(). + * + * This is used for FW_CFG_MAX_CPUS. See comments on bochs_bios_init(). + */ +static unsigned int pc_apic_id_limit(unsigned int max_cpus) +{ +return apic_id_for_cpu(max_cpus - 1) + 1; +} + static void *bochs_bios_init(void) { void *fw_cfg; @@ -548,9 +560,24 @@ static void *bochs_bios_init(void) size_t smbios_len; uint64_t *numa_fw_cfg; int i, j; +unsigned int apic_id_limit = pc_apic_id_limit(max_cpus); fw_cfg = fw_cfg_init(BIOS_CFG_IOPORT, BIOS_CFG_IOPORT + 1, 0, 0); -fw_cfg_add_i16(fw_cfg, FW_CFG_MAX_CPUS, (uint16_t)max_cpus); +/* FW_CFG_MAX_CPUS is a bit confusing/problematic on x86: + * + * SeaBIOS needs FW_CFG_MAX_CPUS for CPU hotplug, but the CPU hotplug + * QEMU-SeaBIOS interface is not based on the CPU index, but on the APIC + * ID of hotplugged CPUs[1]. This means that FW_CFG_MAX_CPUS is not the + * maximum number of CPUs, but the limit to the APIC ID values SeaBIOS + * may see. + * + * So, this means we must not use max_cpus, here, but the maximum possible + * APIC ID value, plus one. + * + * [1] The only kind of CPU identifier used between SeaBIOS and QEMU is + * the APIC ID, not the CPU index + */ +fw_cfg_add_i16(fw_cfg, FW_CFG_MAX_CPUS, (uint16_t)apic_id_limit); fw_cfg_add_i32(fw_cfg, FW_CFG_ID, 1); fw_cfg_add_i64(fw_cfg, FW_CFG_RAM_SIZE, (uint64_t)ram_size); fw_cfg_add_bytes(fw_cfg, FW_CFG_ACPI_TABLES, (uint8_t *)acpi_tables, @@ -570,21 +597,24 @@ static void *bochs_bios_init(void) * of nodes, one word for each VCPU-node and one word for each node to * hold the amount of memory. */ -numa_fw_cfg = g_malloc0((1 + max_cpus + nb_numa_nodes) * 8); +numa_fw_cfg = g_malloc0((1 + apic_id_limit + nb_numa_nodes) * 8); numa_fw_cfg[0] = cpu_to_le64(nb_numa_nodes); -for (i = 0; i max_cpus; i++) { +unsigned int cpu_idx; +for (cpu_idx = 0; cpu_idx max_cpus; cpu_idx++) { +unsigned int apic_id = apic_id_for_cpu(cpu_idx); +assert(apic_id apic_id_limit); for (j = 0; j nb_numa_nodes; j++) { -if (test_bit(i, node_cpumask[j])) { -numa_fw_cfg[i + 1] = cpu_to_le64(j); +if (test_bit(cpu_idx, node_cpumask[j])) { +numa_fw_cfg[apic_id + 1] = cpu_to_le64(j); break; } } } for (i = 0; i nb_numa_nodes; i++) { -numa_fw_cfg[max_cpus + 1 + i] = cpu_to_le64(node_mem[i]); +numa_fw_cfg[apic_id_limit + 1 + i] = cpu_to_le64(node_mem[i]); } fw_cfg_add_bytes(fw_cfg, FW_CFG_NUMA, (uint8_t *)numa_fw_cfg, - (1 + max_cpus + nb_numa_nodes) * 8); + (1 + apic_id_limit + nb_numa_nodes) * 8); return fw_cfg; } -- 1.7.11.7 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC 10/12] tests: Support target-specific unit tests
To make unit tests that depend on target-specific files, use check-unit-arch-y and test-obj-arch-y. Signed-off-by: Eduardo Habkost ehabk...@redhat.com --- tests/Makefile | 15 --- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/tests/Makefile b/tests/Makefile index b09a343..fa96d1a 100644 --- a/tests/Makefile +++ b/tests/Makefile @@ -74,8 +74,6 @@ test-qapi-obj-y = $(qobject-obj-y) $(qapi-obj-y) qemu-tool.o test-qapi-obj-y += tests/test-qapi-visit.o tests/test-qapi-types.o test-qapi-obj-y += module.o -$(test-obj-y): QEMU_INCLUDES += -Itests - tests/check-qint$(EXESUF): tests/check-qint.o qint.o tests/check-qstring$(EXESUF): tests/check-qstring.o qstring.o tests/check-qdict$(EXESUF): tests/check-qdict.o qdict.o qfloat.o qint.o qstring.o qbool.o qlist.o @@ -111,9 +109,20 @@ tests/m48t59-test$(EXESUF): tests/m48t59-test.o $(trace-obj-y) tests/fdc-test$(EXESUF): tests/fdc-test.o tests/libqtest.o $(trace-obj-y) tests/hd-geo-test$(EXESUF): tests/hd-geo-test.o tests/libqtest.o $(trace-obj-y) -# QTest rules +# unit test rules: TARGETS=$(patsubst %-softmmu,%, $(filter %-softmmu,$(TARGET_DIRS))) + +# target-specific tests/objs: +test-obj-y += $(foreach TARGET,$(TARGETS), $(test-obj-$(TARGET)-y)) +check-unit-y += $(foreach TARGET,$(TARGETS), $(check-unit-$(TARGET)-y)) + +$(foreach TARGET,$(TARGETS),$(eval $(test-obj-$(TARGET)-y): QEMU_INCLUDES += -Itarget-$(TARGET))) + +$(test-obj-y): QEMU_INCLUDES += -Itests + +# QTest rules + QTEST_TARGETS=$(foreach TARGET,$(TARGETS), $(if $(check-qtest-$(TARGET)-y), $(TARGET),)) check-qtest-$(CONFIG_POSIX)=$(foreach TARGET,$(TARGETS), $(check-qtest-$(TARGET)-y)) -- 1.7.11.7 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC 08/12] cpus.h: Make constant smp_cores/smp_threads available on *-user
The code that calculates the APIC ID will use smp_cores/smp_threads, so just define them as 1 on *-user to avoid #ifdefs in the code. Signed-off-by: Eduardo Habkost ehabk...@redhat.com --- include/sysemu/cpus.h | 7 +++ 1 file changed, 7 insertions(+) diff --git a/include/sysemu/cpus.h b/include/sysemu/cpus.h index 81bd817..f7f6854 100644 --- a/include/sysemu/cpus.h +++ b/include/sysemu/cpus.h @@ -13,9 +13,16 @@ void cpu_synchronize_all_post_init(void); void qtest_clock_warp(int64_t dest); +#ifndef CONFIG_USER_ONLY /* vl.c */ extern int smp_cores; extern int smp_threads; +#else +/* *-user doesn't have configurable SMP topology */ +#define smp_cores 1 +#define smp_threads 1 +#endif + void set_numa_modes(void); void set_cpu_log(const char *optarg); void set_cpu_log_filename(const char *optarg); -- 1.7.11.7 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC 05/12] target-i386: kvm: Set vcpu_id to APIC ID instead of CPU index
The CPU ID in KVM is supposed to be the APIC ID, so change the KVM_CREATE_VCPU call to match it. The current behavior didn't break anything yet because today the APIC ID is assumed to be equal to the CPU index, but this won't be true in the future. Signed-off-by: Eduardo Habkost ehabk...@redhat.com --- Cc: kvm@vger.kernel.org Cc: Michael S. Tsirkin m...@redhat.com Cc: Gleb Natapov g...@redhat.com Cc: Marcelo Tosatti mtosa...@redhat.com Changes v2: - Change only i386 code (kvm_arch_vcpu_id()) Changes v3: - Get CPUState as argument instead of CPUArchState --- target-i386/kvm.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/target-i386/kvm.c b/target-i386/kvm.c index 5f3f789..c440809 100644 --- a/target-i386/kvm.c +++ b/target-i386/kvm.c @@ -411,9 +411,10 @@ static void cpu_update_state(void *opaque, int running, RunState state) } } -unsigned long kvm_arch_vcpu_id(CPUState *cpu) +unsigned long kvm_arch_vcpu_id(CPUState *cs) { -return cpu-cpu_index; +X86CPU *cpu = X86_CPU(cs); +return cpu-env.cpuid_apic_id; } int kvm_arch_init_vcpu(CPUState *cs) -- 1.7.11.7 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC 07/12] target-i386/cpu: Introduce apic_id_for_cpu() function
This function will be used by both the CPU initialization code and the fw_cfg table initialization code. Later this function will be updated to generate APIC IDs according to the CPU topology. Signed-off-by: Eduardo Habkost ehabk...@redhat.com --- target-i386/cpu.c | 17 - target-i386/cpu.h | 2 ++ 2 files changed, 18 insertions(+), 1 deletion(-) diff --git a/target-i386/cpu.c b/target-i386/cpu.c index 492656c..33787dc 100644 --- a/target-i386/cpu.c +++ b/target-i386/cpu.c @@ -2192,6 +2192,21 @@ void x86_cpu_realize(Object *obj, Error **errp) cpu_reset(CPU(cpu)); } +/* Calculates initial APIC ID for a specific CPU index + * + * Currently we need to be able to calculate the APIC ID from the CPU index + * alone (without requiring a CPU object), as the QEMU-Seabios interfaces have + * no concept of CPU index, and the NUMA tables on fw_cfg need the APIC ID of + * all CPUs up to max_cpus. + */ +uint32_t apic_id_for_cpu(unsigned int cpu_index) +{ +/* right now APIC ID == CPU index. this will eventually change to use + * the CPU topology configuration properly + */ +return cpu_index; +} + static void x86_cpu_initfn(Object *obj) { CPUState *cs = CPU(obj); @@ -2226,7 +2241,7 @@ static void x86_cpu_initfn(Object *obj) x86_cpuid_get_tsc_freq, x86_cpuid_set_tsc_freq, NULL, NULL, NULL); -env-cpuid_apic_id = cs-cpu_index; +env-cpuid_apic_id = apic_id_for_cpu(cs-cpu_index); /* init various static tables used in TCG mode */ if (tcg_enabled() !inited) { diff --git a/target-i386/cpu.h b/target-i386/cpu.h index f3c9df5..dbd9899 100644 --- a/target-i386/cpu.h +++ b/target-i386/cpu.h @@ -1238,4 +1238,6 @@ void disable_kvm_pv_eoi(void); /* Return name of 32-bit register, from a R_* constant */ const char *get_register_name_32(unsigned int reg); +uint32_t apic_id_for_cpu(unsigned int cpu_index); + #endif /* CPU_I386_H */ -- 1.7.11.7 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC 12/12] pc: Generate APIC IDs according to CPU topology
This keeps compatibility on machine-types pc-1.2 and older, and prints a warning in case the requested configuration won't get the correct topology. I couldn't think of a better way to warn about broken topology when in compat mode other than using error_report(). The warning message will be probably be buried in a log file somewhere, but it's better than nothing. Signed-off-by: Eduardo Habkost ehabk...@redhat.com --- Changes v1 - v2: - Move code to cpu.c - keep using cpu_index on *-user - Use SMP.contiguous_apic_ids global property - Prints warning in case the compatibility mode will expose incorrect topology Changes v2 - v3: - Now all code is inside hw/pc.c - Use a real PC class and a contiguous_apic_ids property Changes v3 - v4: - Instead of using a global property, use a separate machine init function and a PCInitArgs field, to implement compatibility mode - Use error_report() instead of fprintf(stderr) for the warning - Use a field on PCInitArgs instead of a static variable to check if warning was already printed Changes v4 - v5: - Don't use PCInitArgs: simply add a enable_compat_apic_id_mode() function and a static compat_apic_id_mode variable, to enable the compatibility mode - Move APIC ID calculation code to cpu.c --- hw/pc_piix.c | 13 +++-- target-i386/cpu.c | 28 target-i386/cpu.h | 1 + 3 files changed, 36 insertions(+), 6 deletions(-) diff --git a/hw/pc_piix.c b/hw/pc_piix.c index 13d7cc8..be4fea1 100644 --- a/hw/pc_piix.c +++ b/hw/pc_piix.c @@ -233,11 +233,18 @@ static void pc_init_pci(QEMUMachineInitArgs *args) initrd_filename, cpu_model, 1, 1); } + +static void pc_init_pci_1_3(QEMUMachineInitArgs *args) +{ +enable_compat_apic_id_mode(); +pc_init_pci(args); +} + /* PC machine init function for pc-0.14 to pc-1.2 */ static void pc_init_pci_1_2(QEMUMachineInitArgs *args) { disable_kvm_pv_eoi(); -pc_init_pci(args); +pc_init_pci_1_3(args); } /* PC init function for pc-0.10 to pc-0.13, and reused by xenfv */ @@ -250,6 +257,7 @@ static void pc_init_pci_no_kvmclock(QEMUMachineInitArgs *args) const char *initrd_filename = args-initrd_filename; const char *boot_device = args-boot_device; disable_kvm_pv_eoi(); +enable_compat_apic_id_mode(); pc_init1(get_system_memory(), get_system_io(), ram_size, boot_device, @@ -268,6 +276,7 @@ static void pc_init_isa(QEMUMachineInitArgs *args) if (cpu_model == NULL) cpu_model = 486; disable_kvm_pv_eoi(); +enable_compat_apic_id_mode(); pc_init1(get_system_memory(), get_system_io(), ram_size, boot_device, @@ -305,7 +314,7 @@ static QEMUMachine pc_machine_v1_4 = { static QEMUMachine pc_machine_v1_3 = { .name = pc-1.3, .desc = Standard PC, -.init = pc_init_pci, +.init = pc_init_pci_1_3, .max_cpus = 255, .compat_props = (GlobalProperty[]) { PC_COMPAT_1_3, diff --git a/target-i386/cpu.c b/target-i386/cpu.c index 33787dc..f34192c 100644 --- a/target-i386/cpu.c +++ b/target-i386/cpu.c @@ -23,6 +23,8 @@ #include cpu.h #include sysemu/kvm.h +#include sysemu/cpus.h +#include topology.h #include qemu/option.h #include qemu/config-file.h @@ -2192,6 +2194,14 @@ void x86_cpu_realize(Object *obj, Error **errp) cpu_reset(CPU(cpu)); } +/* Enables contiguous-apic-ID mode, for compatibility */ +static bool compat_apic_id_mode; + +void enable_compat_apic_id_mode(void) +{ +compat_apic_id_mode = true; +} + /* Calculates initial APIC ID for a specific CPU index * * Currently we need to be able to calculate the APIC ID from the CPU index @@ -2201,10 +2211,20 @@ void x86_cpu_realize(Object *obj, Error **errp) */ uint32_t apic_id_for_cpu(unsigned int cpu_index) { -/* right now APIC ID == CPU index. this will eventually change to use - * the CPU topology configuration properly - */ -return cpu_index; +uint32_t correct_id; +static bool warned; + +correct_id = topo_apicid_for_cpu(smp_cores, smp_threads, cpu_index); +if (compat_apic_id_mode) { +if (cpu_index != correct_id !warned) { +error_report(APIC IDs set in compatibility mode, + CPU topology won't match the configuration); +warned = true; +} +return cpu_index; +} else { +return correct_id; +} } static void x86_cpu_initfn(Object *obj) diff --git a/target-i386/cpu.h b/target-i386/cpu.h index dbd9899..d9a9e8f 100644 --- a/target-i386/cpu.h +++ b/target-i386/cpu.h @@ -1239,5 +1239,6 @@ void disable_kvm_pv_eoi(void); const char *get_register_name_32(unsigned int reg); uint32_t apic_id_for_cpu(unsigned int cpu_index); +void enable_compat_apic_id_mode(void); #endif /* CPU_I386_H */ -- 1.7.11.7 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo
[RFC 00/12] target-i386: Fix APIC-ID-based topology (v4)
This series uses a much simpler approach than the previous versions: - The APIC ID calculation code is now inside cpu.c - It doesn't require touching the PC CPU creation code at all - It simply uses a static variable to enable the compat behavior on pc-1.3 and older - I considered making the compat-apic-id setting a global property for the X86CPU objects, the only problem is that the fw_cfg initialization code on pc.c also depends on the compat behavior I am sending this as RFC because it depends on two not-included-yet series: - Andreas' qom-cpu-7 branch - My cpu-enforce fixes series I hope to be able to get this buf fix into QEMU 1.4. I don't know if we should try to get this before soft freeze, or we can include it after that. Git tree for reference: git://github.com/ehabkost/qemu-hacks.git apicid-topology.v4 https://github.com/ehabkost/qemu-hacks/tree/apicid-topology.v4 Eduardo Habkost (12): kvm: add KVM_FEATURE_CLOCKSOURCE_STABLE_BIT fake #define target-i386: Don't set any KVM flag by default if KVM is disabled pc: Reverse pc_init_pci() compatibility logic kvm: Create kvm_arch_vcpu_id() function target-i386: kvm: Set vcpu_id to APIC ID instead of CPU index fw_cfg: Remove FW_CFG_MAX_CPUS from fw_cfg_init() target-i386/cpu: Introduce apic_id_for_cpu() function cpus.h: Make constant smp_cores/smp_threads available on *-user pc: Set fw_cfg data based on APIC ID calculation tests: Support target-specific unit tests target-i386: Topology APIC ID utility functions pc: Generate APIC IDs according to CPU topology hw/fw_cfg.c| 1 - hw/pc.c| 44 +--- hw/pc_piix.c | 27 +++--- hw/ppc_newworld.c | 1 + hw/ppc_oldworld.c | 1 + hw/sun4m.c | 3 ++ hw/sun4u.c | 1 + include/sysemu/cpus.h | 7 +++ include/sysemu/kvm.h | 4 ++ kvm-all.c | 2 +- target-i386/cpu.c | 56 - target-i386/cpu.h | 5 +- target-i386/kvm.c | 6 +++ target-i386/topology.h | 133 + target-ppc/kvm.c | 5 ++ target-s390x/kvm.c | 5 ++ tests/.gitignore | 1 + tests/Makefile | 19 +-- tests/test-x86-cpuid.c | 101 + 19 files changed, 390 insertions(+), 32 deletions(-) create mode 100644 target-i386/topology.h create mode 100644 tests/test-x86-cpuid.c -- 1.7.11.7 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] VFIO-AER: Vfio-pci driver changes for supporting AER
On Wed, 2013-01-09 at 10:52 -0700, Alex Williamson wrote: On Wed, 2013-01-09 at 06:26 +, Pandarathil, Vijaymohan R wrote: - New ioctl which is used to pass the eventfd that is signaled when an error occurs in the vfio_pci_device - Register pci_error_handler for the vfio_pci driver - When the device encounters an error, the error handler registered by the vfio_pci driver gets invoked by the AER infrastructure - In the error handler, signal the eventfd registered for the device. - This results in the qemu eventfd handler getting invoked and appropriate action taken for the guest. Signed-off-by: Vijay Mohan Pandarathil vijaymohan.pandarat...@hp.com --- drivers/vfio/pci/vfio_pci.c | 29 + drivers/vfio/pci/vfio_pci_private.h | 1 + drivers/vfio/vfio.c | 8 include/linux/vfio.h| 1 + include/uapi/linux/vfio.h | 9 + 5 files changed, 48 insertions(+) diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c index 6c11994..4ae9526 100644 --- a/drivers/vfio/pci/vfio_pci.c +++ b/drivers/vfio/pci/vfio_pci.c @@ -207,6 +207,8 @@ static long vfio_pci_ioctl(void *device_data, if (vdev-reset_works) info.flags |= VFIO_DEVICE_FLAGS_RESET; + info.flags |= VFIO_DEVICE_FLAGS_AER_NOTIFY; + This appears to be a PCI specific flag, so the name should include _PCI_. We also support non-PCIe devices and it seems like it would be possible to not have AER support available, so shouldn't this be conditional? info.num_regions = VFIO_PCI_NUM_REGIONS; info.num_irqs = VFIO_PCI_NUM_IRQS; @@ -348,6 +350,19 @@ static long vfio_pci_ioctl(void *device_data, return ret; + } else if (cmd == VFIO_DEVICE_SET_ERRFD) { + int32_t fd = (int32_t)arg; + + if (fd 0) + return -EINVAL; + + vdev-err_trigger = eventfd_ctx_fdget(fd); + + if (IS_ERR(vdev-err_trigger)) + return PTR_ERR(vdev-err_trigger); + + return 0; + I'm not sure why we wouldn't describe this as just another interrupt from the device and configure it via SET_IRQ. This ioctl has very limited use and doesn't follow any of the conventions of all the other vfio ioctls. } else if (cmd == VFIO_DEVICE_RESET) return vdev-reset_works ? pci_reset_function(vdev-pdev) : -EINVAL; @@ -527,11 +542,25 @@ static void vfio_pci_remove(struct pci_dev *pdev) kfree(vdev); } +static pci_ers_result_t vfio_err_detected(struct pci_dev *pdev, + pci_channel_state_t state) +{ + struct vfio_pci_device *vdev = vfio_get_vdev(pdev-dev); + + eventfd_signal(vdev-err_trigger, 1); + return PCI_ERS_RESULT_CAN_RECOVER; +} What if err_trigger hasn't been set? + +static const struct pci_error_handlers vfio_err_handlers = { + .error_detected = vfio_err_detected, +}; + static struct pci_driver vfio_pci_driver = { .name = vfio-pci, .id_table = NULL, /* only dynamic ids */ .probe = vfio_pci_probe, .remove = vfio_pci_remove, + .err_handler= vfio_err_handlers, }; static void __exit vfio_pci_cleanup(void) diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h index 611827c..daee62f 100644 --- a/drivers/vfio/pci/vfio_pci_private.h +++ b/drivers/vfio/pci/vfio_pci_private.h @@ -55,6 +55,7 @@ struct vfio_pci_device { boolbardirty; struct pci_saved_state *pci_saved_state; atomic_trefcnt; + struct eventfd_ctx *err_trigger; }; #define is_intx(vdev) (vdev-irq_type == VFIO_PCI_INTX_IRQ_INDEX) diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c index 56097c6..5ed5a54 100644 --- a/drivers/vfio/vfio.c +++ b/drivers/vfio/vfio.c @@ -693,6 +693,14 @@ void *vfio_del_group_dev(struct device *dev) } EXPORT_SYMBOL_GPL(vfio_del_group_dev); +void *vfio_get_vdev(struct device *dev) +{ + struct vfio_device *device = dev_get_drvdata(dev); + + return device-device_data; +} +EXPORT_SYMBOL_GPL(vfio_get_vdev); + This is unsafe. How do we know dev is a vfio device? How do we keep that drvdata valid while you're using it? I think you want to export the existing vfio_group_get_device() and vfio_device_put(). Thanks, vfio_group_get_device() isn't quite what you want either since it assumes a reference count on the group. You'll want something like vfio_add_group_dev() or vfio_dev_present(), perhaps moving the iommu_group_get - vfio_group_get_from_iommu - vfio_group_get_device into a helper function. Thanks, Alex /** * VFIO base fd,
[PATCH 1/2] vfio-pci: Make host MSI-X enable track guest
Guests typically enable MSI-X with all of the vectors in the MSI-X vector table masked. Only when the vector is enabled does the vector get unmasked, resulting in a vector_use callback. These two points, enable and unmask, correspond to pci_enable_msix() and request_irq() for Linux guests. Some drivers rely on VF/PF or PF/fw communication channels that expect the physical state of the device to match the guest visible state of the device. They don't appreciate lazily enabling MSI-X on the physical device. To solve this, enable MSI-X with a single vector when the MSI-X capability is enabled and immediate disable the vector. This leaves the physical device in exactly the same state between host and guest. Furthermore, the brief gap where we enable vector 0, it fires into userspace, not KVM, so the guest doesn't get spurious interrupts. Ideally we could call VFIO_DEVICE_SET_IRQS with the right parameters to enable MSI-X with zero vectors, but this will currently return an error as the Linux MSI-X interfaces do not allow it. Signed-off-by: Alex Williamson alex.william...@redhat.com Cc: qemu-sta...@nongnu.org --- hw/vfio_pci.c | 30 ++ 1 file changed, 26 insertions(+), 4 deletions(-) diff --git a/hw/vfio_pci.c b/hw/vfio_pci.c index 28c8303..8ec1faf 100644 --- a/hw/vfio_pci.c +++ b/hw/vfio_pci.c @@ -562,8 +562,8 @@ static int vfio_enable_vectors(VFIODevice *vdev, bool msix) return ret; } -static int vfio_msix_vector_use(PCIDevice *pdev, -unsigned int nr, MSIMessage msg) +static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr, + MSIMessage *msg, IOHandler *handler) { VFIODevice *vdev = DO_UPCAST(VFIODevice, pdev, pdev); VFIOMSIVector *vector; @@ -587,7 +587,7 @@ static int vfio_msix_vector_use(PCIDevice *pdev, * Attempt to enable route through KVM irqchip, * default to userspace handling if unavailable. */ -vector-virq = kvm_irqchip_add_msi_route(kvm_state, msg); +vector-virq = msg ? kvm_irqchip_add_msi_route(kvm_state, *msg) : -1; if (vector-virq 0 || kvm_irqchip_add_irqfd_notifier(kvm_state, vector-interrupt, vector-virq) 0) { @@ -596,7 +596,7 @@ static int vfio_msix_vector_use(PCIDevice *pdev, vector-virq = -1; } qemu_set_fd_handler(event_notifier_get_fd(vector-interrupt), -vfio_msi_interrupt, NULL, vector); +handler, NULL, vector); } /* @@ -639,6 +639,12 @@ static int vfio_msix_vector_use(PCIDevice *pdev, return 0; } +static int vfio_msix_vector_use(PCIDevice *pdev, +unsigned int nr, MSIMessage msg) +{ +return vfio_msix_vector_do_use(pdev, nr, msg, vfio_msi_interrupt); +} + static void vfio_msix_vector_release(PCIDevice *pdev, unsigned int nr) { VFIODevice *vdev = DO_UPCAST(VFIODevice, pdev, pdev); @@ -697,6 +703,22 @@ static void vfio_enable_msix(VFIODevice *vdev) vdev-interrupt = VFIO_INT_MSIX; +/* + * Some communication channels between VF PF or PF fw rely on the + * physical state of the device and expect that enabling MSI-X from the + * guest enables the same on the host. When our guest is Linux, the + * guest driver call to pci_enable_msix() sets the enabling bit in the + * MSI-X capability, but leaves the vector table masked. We therefore + * can't rely on a vector_use callback (from request_irq() in the guest) + * to switch the physical device into MSI-X mode because that may come a + * long time after pci_enable_msix(). This code enables vector 0 with + * triggering to userspace, then immediately release the vector, leaving + * the physical device with no vectors enabled, but MSI-X enabled, just + * like the guest view. + */ +vfio_msix_vector_do_use(vdev-pdev, 0, NULL, NULL); +vfio_msix_vector_release(vdev-pdev, 0); + if (msix_set_vector_notifiers(vdev-pdev, vfio_msix_vector_use, vfio_msix_vector_release, NULL)) { error_report(vfio: msix_set_vector_notifiers failed\n); -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/2] vfio-pci: Loosen sanity checks to allow future features
VFIO_PCI_NUM_REGIONS and VFIO_PCI_NUM_IRQS should never have been used in this manner as it locks a specific kernel implementation. Future features may introduce new regions or interrupt entries (VGA may add legacy ranges, AER might add an IRQ for error signalling). Fix this before it gets us into trouble. Signed-off-by: Alex Williamson alex.william...@redhat.com Cc: qemu-sta...@nongnu.org --- hw/vfio_pci.c |4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/hw/vfio_pci.c b/hw/vfio_pci.c index 8ec1faf..c51ae67 100644 --- a/hw/vfio_pci.c +++ b/hw/vfio_pci.c @@ -1837,13 +1837,13 @@ static int vfio_get_device(VFIOGroup *group, const char *name, VFIODevice *vdev) error_report(Warning, device %s does not support reset\n, name); } -if (dev_info.num_regions != VFIO_PCI_NUM_REGIONS) { +if (dev_info.num_regions VFIO_PCI_CONFIG_REGION_INDEX + 1) { error_report(vfio: unexpected number of io regions %u\n, dev_info.num_regions); goto error; } -if (dev_info.num_irqs != VFIO_PCI_NUM_IRQS) { +if (dev_info.num_irqs VFIO_PCI_MSIX_IRQ_INDEX + 1) { error_report(vfio: unexpected number of irqs %u\n, dev_info.num_irqs); goto error; } -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PULL 0/2] vfio-pci: Fixes for 1.4 stable
Anthony, The following changes since commit 560c30b1db1d40fe45c5104185367c4de43399d3: Merge remote-tracking branch 'kraxel/usb.75' into staging (2013-01-08 10:36:20 -0600) are available in the git repository at: git://github.com/awilliam/qemu-vfio.git tags/qemu-1.4-vfio-20130109.0 for you to fetch changes up to 8fc94e5a8046e349e07976f9bcaffbcd5833f3a2: vfio-pci: Loosen sanity checks to allow future features (2013-01-08 14:10:03 -0700) vfio-pci: Fixes for qemu 1.4 stable Alex Williamson (2): vfio-pci: Make host MSI-X enable track guest vfio-pci: Loosen sanity checks to allow future features hw/vfio_pci.c | 34 -- 1 file changed, 28 insertions(+), 6 deletions(-) -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [kvmarm] [PATCH v5.1 0/2] KVM: ARM: Rename KVM_SET_DEVICE_ADDRESS
On 01/09/2013 10:48:47 AM, Alexander Graf wrote: On 09.01.2013, at 17:26, Christoffer Dall wrote: Renames the KVM_SET_DEVICE_ADDRESS to KVM_ARM_SET_DEVICE_ADDR to make it obvious that this is ARM specific in lack of a better generic interface. Once we agree on a better interface the KVM/ARM code can also take advantage of that, but until then we don't want to hold up the KVM/ARM patches. Works for me. Scott, are you happy with this one too? Not really, given that it will stay around forever even after something new is introduced. If you're going to change the name, why not just change it to KVM_SET_DEVICE_CONFIG? Can we change the name later if nothing else changes (so it's still binary compatible)? We can start to introduce (and fix ARM) with a generic ioctl in the MPIC patches then. The ioctl is already generic, except for its name. -Scott -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [kvmarm] [PATCH v5.1 0/2] KVM: ARM: Rename KVM_SET_DEVICE_ADDRESS
On 09.01.2013, at 20:50, Scott Wood wrote: On 01/09/2013 10:48:47 AM, Alexander Graf wrote: On 09.01.2013, at 17:26, Christoffer Dall wrote: Renames the KVM_SET_DEVICE_ADDRESS to KVM_ARM_SET_DEVICE_ADDR to make it obvious that this is ARM specific in lack of a better generic interface. Once we agree on a better interface the KVM/ARM code can also take advantage of that, but until then we don't want to hold up the KVM/ARM patches. Works for me. Scott, are you happy with this one too? Not really, given that it will stay around forever even after something new is introduced. But only in ARM specific code. If you're going to change the name, why not just change it to KVM_SET_DEVICE_CONFIG? Can we change the name later if nothing else changes (so it's still binary compatible)? Because that again implies that it's generic enough. And to reach that conclusion will take more time than we should spend on this now. We can start to introduce (and fix ARM) with a generic ioctl in the MPIC patches then. The ioctl is already generic, except for its name. It's making a few wrong assumptions: * maximum size of value is u64 * combining device id (variable) with addr type id (const) into a single field. It could just be split into multiple fields * the id is 100% architecture specific. It shouldn't be. At least the device id field should be generic. I'm not sure if we can come up with more problems in that API when staring at it a bit longer and/or we would actually start using it for more things. So for the sake of not holding up the ARM code, I'm perfectly fine to clutter ARM's ioctl handling code with an ioctl that is already deprecated at its introduction, as long as we don't hold everything else up meanwhile. Alex -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
vhost-net thread getting stuck ?
Hello, 'am running into an issue with the latest bits. [ Pl. see below. The vhost thread seems to be getting stuck while trying to memcopy...perhaps a bad address?. ] Wondering if this is a known issue or some recent regression ? 'am using the latest qemu (from qemu.git) and the latest kvm.git kernel on the host. Started the guest using the following command line /usr/local/bin/qemu-system-x86_64 \ -enable-kvm \ -cpu host \ -smp sockets=8,cores=10,threads=1 \ -numa node,nodeid=0,cpus=0-9,mem=64g \ -numa node,nodeid=1,cpus=10-19,mem=64g \ -numa node,nodeid=2,cpus=20-29,mem=64g \ -numa node,nodeid=3,cpus=30-39,mem=64g \ -numa node,nodeid=4,cpus=40-49,mem=64g \ -numa node,nodeid=5,cpus=50-59,mem=64g \ -numa node,nodeid=6,cpus=60-69,mem=64g \ -numa node,nodeid=7,cpus=70-79,mem=64g \ -m 524288 \ -mem-path /dev/hugepages \ -name vm2 \ -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/vm2.monitor,server,now ait \ -drive file=/dev/libvirt_lvm2/vm2,if=none,id=drive-virtio-disk0,format=raw,cache =none,aio=native \ -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=v irtio-disk0,bootindex=1 \ -monitor stdio \ -net nic,model=virtio,macaddr=52:54:00:71:01:02,netdev=nic-0 \ -netdev tap,id=nic-0,ifname=tap0,script=no,downscript=no,vhost=on \ -vnc :4 Was just doing a basic kernel build in the guest when it hung with the following in the dmesg of the host. Thanks Vinod BUG: soft lockup - CPU#46 stuck for 23s! [vhost-135220:135231] Modules linked in: kvm_intel kvm fuse ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter ip_tables bridge stp llc autofs4 sunrpc pcc_cpufreq ipv6 vhost_net macvtap macvlan tun uinput iTCO_wdt iTCO_vendor_support coretemp crc32c_intel ghash_clmulni_intel microcode pcspkr mlx4_core be2net lpc_ich mfd_core hpilo hpwdt i7core_edac edac_core sg netxen_nic ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif aesni_intel ablk_helper cryptd lrw aes_x86_64 xts gf128mul pata_acpi ata_generic ata_piix hpsa lpfc scsi_transport_fc scsi_tgt radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: kvm] CPU 46 Pid: 135231, comm: vhost-135220 Not tainted 3.7.0+ #1 HP ProLiant DL980 G7 RIP: 0010:[8147bab0] [8147bab0] skb_flow_dissect+0x1b0/0x440 RSP: 0018:881ffd131bc8 EFLAGS: 0246 RAX: 8a1f7dc70c00 RBX: RCX: 7fa0 RDX: RSI: 881ffd131c68 RDI: 8a1ff1bd6c80 RBP: 881ffd131c58 R08: 881ffd131bf8 R09: 8a1ff1bd6c80 R10: 0010 R11: 0004 R12: 8a1ff1bd6c80 R13: 000b R14: 8147330b R15: 881ffd131b58 FS: () GS:8a1fff98() knlGS: CS: 0010 DS: ES: CR0: 8005003b CR2: 003d5c810dc0 CR3: 009f77c04000 CR4: 27e0 DR0: DR1: DR2: DR3: DR6: 0ff0 DR7: 0400 Process vhost-135220 (pid: 135231, threadinfo 881ffd13, task 881ffcb754c0) Stack: 881ffd131c18 81477b90 00e2 2b289bcc58ce 881ffd131ce4 00a2 00a2 00a2 00a2 881ffd131c88 937e754e Call Trace: [81477b90] ? memcpy_fromiovecend+0x90/0xd0 [8147f3ca] __skb_get_rxhash+0x1a/0xe0 [a03c90f8] tun_get_user+0x468/0x660 [tun] [81090010] ? __sdt_alloc+0x80/0x1a0 [a03c934d] tun_sendmsg+0x5d/0x80 [tun] [a0468e8a] handle_tx+0x34a/0x680 [vhost_net] [a04691f5] handle_tx_kick+0x15/0x20 [vhost_net] [a0466dfc] vhost_worker+0x10c/0x1c0 [vhost_net] [a0466cf0] ? vhost_attach_cgroups_work+0x30/0x30 [vhost_net] [a0466cf0] ? vhost_attach_cgroups_work+0x30/0x30 [vhost_net] [8107ecfe] kthread+0xce/0xe0 [8107ec30] ? kthread_freezable_should_stop+0x70/0x70 [815537ac] ret_from_fork+0x7c/0xb0 [8107ec30] ? kthread_freezable_should_stop+0x70/0x70 Code: b6 50 06 48 89 ce 48 c1 ee 20 31 f1 41 89 0e 48 8b 48 20 48 33 48 18 48 89 c8 48 c1 e8 20 31 c1 41 89 4e 04 e9 35 ff ff ff 66 90 0f b6 50 09 e9 1a ff ff ff 0f 1f 80 00 00 00 00 41 8b 44 24 68 [root@hydra11 kvm_rik]# Message from syslogd@hydra11 at Jan 9 13:06:58 ... kernel:BUG: soft lockup - CPU#46 stuck for 22s! [vhost-135220:135231] -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [kvmarm] [PATCH v5.1 0/2] KVM: ARM: Rename KVM_SET_DEVICE_ADDRESS
On 01/09/2013 02:12:16 PM, Alexander Graf wrote: On 09.01.2013, at 20:50, Scott Wood wrote: On 01/09/2013 10:48:47 AM, Alexander Graf wrote: On 09.01.2013, at 17:26, Christoffer Dall wrote: Renames the KVM_SET_DEVICE_ADDRESS to KVM_ARM_SET_DEVICE_ADDR to make it obvious that this is ARM specific in lack of a better generic interface. Once we agree on a better interface the KVM/ARM code can also take advantage of that, but until then we don't want to hold up the KVM/ARM patches. Works for me. Scott, are you happy with this one too? Not really, given that it will stay around forever even after something new is introduced. But only in ARM specific code. ...which I'll probably have to deal with when Freescale's virtualization-capable ARM chips come along. I don't see it's only in that other architecture as it might as well not exist. If you're going to change the name, why not just change it to KVM_SET_DEVICE_CONFIG? Can we change the name later if nothing else changes (so it's still binary compatible)? Because that again implies that it's generic enough. And to reach that conclusion will take more time than we should spend on this now. If the conclusion later on is that it is good enough, can the name be changed then? We can start to introduce (and fix ARM) with a generic ioctl in the MPIC patches then. The ioctl is already generic, except for its name. It's making a few wrong assumptions: * maximum size of value is u64 This is tolerable IMHO. * combining device id (variable) with addr type id (const) into a single field. It could just be split into multiple fields I agree, but that could be lived with as well. I get that there's a tradeoff between getting something in now, versus waiting until the API is more refined. Tagging it with a particular ISA seems like an odd way of saying soon to be deprecated, though. What happens if we're still squabbling over the perfect replacement API when we're trying to push PPC MPIC stuff in? Perhaps the threshold for an API becoming permanent should not be acceptance into the tree, but rather the removal of an experimental tag (including a way of shutting off experimental APIs to make sure you're not depending on them). Sort of like CONFIG_EXPERIMENTAL, except actually used for its intended purpose (distributions should have it *off* by default), and preferably managed at runtime. Sort of like drivers/staging, except for APIs rather than drivers. Changes at that point should require more justification than before merging, but would not have the strict compatibility requirement that non-experimental APIs have. This would make collaboration and testing easier on APIs that aren't ready to be permanent. * the id is 100% architecture specific. It shouldn't be. At least the device id field should be generic. That's a documentation issue that could be changed to have all architectures adopt what is currently specified for ARM, without breaking compatibility. I'm not sure if we can come up with more problems in that API when staring at it a bit longer and/or we would actually start using it for more things. So for the sake of not holding up the ARM code, I'm perfectly fine to clutter ARM's ioctl handling code with an ioctl that is already deprecated at its introduction, as long as we don't hold everything else up meanwhile. I'm not in a position to block it, and if I were I presumably would have seen this in time for feedback to matter. That's different from actually being happy. :-) -Scott -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH 2/2] QEMU-AER: Qemu changes to support AER for VFIO-PCI devices
On Wed, Jan 9, 2013 at 6:26 AM, Pandarathil, Vijaymohan R vijaymohan.pandarat...@hp.com wrote: - Create eventfd per vfio device assigned to a guest and register an event handler - This fd is passed to the vfio_pci driver through a new ioctl - When the device encounters an error, the eventfd is signaled and the qemu eventfd handler gets invoked. - In the handler decide what action to take. Current action taken is to terminate the guest. Signed-off-by: Vijay Mohan Pandarathil vijaymohan.pandarat...@hp.com --- hw/vfio_pci.c | 56 ++ linux-headers/linux/vfio.h | 9 2 files changed, 65 insertions(+) diff --git a/hw/vfio_pci.c b/hw/vfio_pci.c index 28c8303..9c3c28b 100644 --- a/hw/vfio_pci.c +++ b/hw/vfio_pci.c @@ -38,6 +38,7 @@ #include qemu/error-report.h #include qemu/queue.h #include qemu/range.h +#include sysemu/sysemu.h /* #define DEBUG_VFIO */ #ifdef DEBUG_VFIO @@ -130,6 +131,8 @@ typedef struct VFIODevice { QLIST_ENTRY(VFIODevice) next; struct VFIOGroup *group; bool reset_works; +EventNotifier errfd; +__u32 dev_info_flags; QEMU is not kernel code, please use uint32_t. } VFIODevice; typedef struct VFIOGroup { @@ -1805,6 +1808,8 @@ static int vfio_get_device(VFIOGroup *group, const char *name, VFIODevice *vdev) DPRINTF(Device %s flags: %u, regions: %u, irgs: %u\n, name, dev_info.flags, dev_info.num_regions, dev_info.num_irqs); +vdev-dev_info_flags = dev_info.flags; + if (!(dev_info.flags VFIO_DEVICE_FLAGS_PCI)) { error_report(vfio: Um, this isn't a PCI device\n); goto error; @@ -1900,6 +1905,55 @@ static void vfio_put_device(VFIODevice *vdev) } } +static void vfio_errfd_handler(void *opaque) +{ +VFIODevice *vdev = opaque; + +if (!event_notifier_test_and_clear(vdev-errfd)) { +return; +} + +/* + * TBD. Retrieve the error details and decide what action + * needs to be taken. One of the actions could be to pass + * the error to the guest and have the guest driver recover + * the error. This requires that PCIe capabilities be + * exposed to the guest. At present, we just terminate the + * guest to contain the error. + */ +error_report(%s(%04x:%02x:%02x.%x) +Unrecoverable error detected... Terminating guest\n, +__func__, vdev-host.domain, vdev-host.bus, vdev-host.slot, +vdev-host.function); + +qemu_system_shutdown_request(); +return; +} + +static void vfio_register_errfd(VFIODevice *vdev) +{ +int32_t pfd; +int ret; + +if (!(vdev-dev_info_flags VFIO_DEVICE_FLAGS_AER_NOTIFY)) { +error_report(vfio: Warning: Error notification not supported for the device\n); +return; +} +if (event_notifier_init(vdev-errfd, 0)) { +error_report(vfio: Warning: Unable to init event notifier for error detection\n); +return; +} +pfd = event_notifier_get_fd(vdev-errfd); +qemu_set_fd_handler(pfd, vfio_errfd_handler, NULL, vdev); + +ret = ioctl(vdev-fd, VFIO_DEVICE_SET_ERRFD, pfd); +if (ret) { +error_report(vfio: Warning: Failed to setup error fd: %d\n, ret); +qemu_set_fd_handler(pfd, NULL, NULL, vdev); +event_notifier_cleanup(vdev-errfd); +} +return; +} static int vfio_initfn(PCIDevice *pdev) { VFIODevice *pvdev, *vdev = DO_UPCAST(VFIODevice, pdev, pdev); @@ -2010,6 +2064,8 @@ static int vfio_initfn(PCIDevice *pdev) } } +vfio_register_errfd(vdev); + return 0; out_teardown: diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h index 4758d1b..0ca4eeb 100644 --- a/linux-headers/linux/vfio.h +++ b/linux-headers/linux/vfio.h @@ -147,6 +147,7 @@ struct vfio_device_info { __u32 flags; #define VFIO_DEVICE_FLAGS_RESET(1 0)/* Device supports reset */ #define VFIO_DEVICE_FLAGS_PCI (1 1)/* vfio-pci device */ +#define VFIO_DEVICE_FLAGS_AER_NOTIFY (1 2) /* Supports aer notification */ __u32 num_regions;/* Max region index + 1 */ __u32 num_irqs; /* Max IRQ index + 1 */ These are verbatim copies of kernel headers so it's OK here. }; @@ -288,6 +289,14 @@ struct vfio_irq_set { */ #define VFIO_DEVICE_RESET _IO(VFIO_TYPE, VFIO_BASE + 11) +/** + * VFIO_DEVICE_SET_ERRFD - _IO(VFIO_TYPE, VFIO_BASE + 12) + * + * Pass the eventfd to the vfio-pci driver for signalling any device + * error notifications + */ +#define VFIO_DEVICE_SET_ERRFD _IO(VFIO_TYPE, VFIO_BASE + 12) + /* * The VFIO-PCI bus driver makes use of the following fixed region and * IRQ index mapping. Unimplemented regions return a size of zero. -- 1.7.11.3 -- To unsubscribe from this list: send the
Re: [kvmarm] [PATCH v5.1 0/2] KVM: ARM: Rename KVM_SET_DEVICE_ADDRESS
Am 09.01.2013 um 22:15 schrieb Scott Wood scottw...@freescale.com: On 01/09/2013 02:12:16 PM, Alexander Graf wrote: On 09.01.2013, at 20:50, Scott Wood wrote: On 01/09/2013 10:48:47 AM, Alexander Graf wrote: On 09.01.2013, at 17:26, Christoffer Dall wrote: Renames the KVM_SET_DEVICE_ADDRESS to KVM_ARM_SET_DEVICE_ADDR to make it obvious that this is ARM specific in lack of a better generic interface. Once we agree on a better interface the KVM/ARM code can also take advantage of that, but until then we don't want to hold up the KVM/ARM patches. Works for me. Scott, are you happy with this one too? Not really, given that it will stay around forever even after something new is introduced. But only in ARM specific code. ...which I'll probably have to deal with when Freescale's virtualization-capable ARM chips come along. I don't see it's only in that other architecture as it might as well not exist. I'm saying it's limiting its scope to a few lines of code in arch an arch specific file and makes everyone aware that we really need to come to a conclusion soon. If you're going to change the name, why not just change it to KVM_SET_DEVICE_CONFIG? Can we change the name later if nothing else changes (so it's still binary compatible)? Because that again implies that it's generic enough. And to reach that conclusion will take more time than we should spend on this now. If the conclusion later on is that it is good enough, can the name be changed then? We can add another generic name, yes. Changing it won't work because it'd break compatibility. We can start to introduce (and fix ARM) with a generic ioctl in the MPIC patches then. The ioctl is already generic, except for its name. It's making a few wrong assumptions: * maximum size of value is u64 This is tolerable IMHO. * combining device id (variable) with addr type id (const) into a single field. It could just be split into multiple fields I agree, but that could be lived with as well. I get that there's a tradeoff between getting something in now, versus waiting until the API is more refined. Tagging it with a particular ISA seems like an odd way of saying soon to be deprecated, though. What happens if we're still squabbling over the perfect replacement API when we're trying to push PPC MPIC stuff in? Then we're the ones who have to come up with a good interface. Perhaps the threshold for an API becoming permanent should not be acceptance into the tree, but rather the removal of an experimental tag (including a way of shutting off experimental APIs to make sure you're not depending on them). Sort of like CONFIG_EXPERIMENTAL, except actually used for its intended purpose (distributions should have it *off* by default), and preferably managed at runtime. Sort of like drivers/staging, except for APIs rather than drivers. Changes at that point should require more justification than before merging, but would not have the strict compatibility requirement that non-experimental APIs have. This would make collaboration and testing easier on APIs that aren't ready to be permanent. This tag does exist. It's called not in Linus' tree :). * the id is 100% architecture specific. It shouldn't be. At least the device id field should be generic. That's a documentation issue that could be changed to have all architectures adopt what is currently specified for ARM, without breaking compatibility. Depends on how we want to design the final layout. I don't have the impression that we've reached final conclusion on this interface. That's all I'm saying. I'm not sure if we can come up with more problems in that API when staring at it a bit longer and/or we would actually start using it for more things. So for the sake of not holding up the ARM code, I'm perfectly fine to clutter ARM's ioctl handling code with an ioctl that is already deprecated at its introduction, as long as we don't hold everything else up meanwhile. I'm not in a position to block it, and if I were I presumably would have seen this in time for feedback to matter. That's different from actually being happy. :-) For the ARM specific ioctl it has my ack. I'd say let's go with that and start to work on a good interface ASAP. But we need an ok from Marcelo or Gleb too. Alex -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [kvmarm] [PATCH v5.1 0/2] KVM: ARM: Rename KVM_SET_DEVICE_ADDRESS
On 01/09/2013 03:37:20 PM, Alexander Graf wrote: Am 09.01.2013 um 22:15 schrieb Scott Wood scottw...@freescale.com: I get that there's a tradeoff between getting something in now, versus waiting until the API is more refined. Tagging it with a particular ISA seems like an odd way of saying soon to be deprecated, though. What happens if we're still squabbling over the perfect replacement API when we're trying to push PPC MPIC stuff in? Then we're the ones who have to come up with a good interface. How about another bad one, with PPC in the name, and some pleas to hurry things up? :-) It's not as if there haven't been last-minute requests for API changes on the PPC side in the past... Perhaps the threshold for an API becoming permanent should not be acceptance into the tree, but rather the removal of an experimental tag (including a way of shutting off experimental APIs to make sure you're not depending on them). Sort of like CONFIG_EXPERIMENTAL, except actually used for its intended purpose (distributions should have it *off* by default), and preferably managed at runtime. Sort of like drivers/staging, except for APIs rather than drivers. Changes at that point should require more justification than before merging, but would not have the strict compatibility requirement that non-experimental APIs have. This would make collaboration and testing easier on APIs that aren't ready to be permanent. This tag does exist. It's called not in Linus' tree :). Which makes it a pain for multiple people to work on a new feature, especially when it spans components such as KVM and QEMU, and means that it gets less testing before the point of no return. -Scott -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [kvmarm] [PATCH v5.1 0/2] KVM: ARM: Rename KVM_SET_DEVICE_ADDRESS
On Wed, Jan 9, 2013 at 5:10 PM, Scott Wood scottw...@freescale.com wrote: On 01/09/2013 03:37:20 PM, Alexander Graf wrote: Am 09.01.2013 um 22:15 schrieb Scott Wood scottw...@freescale.com: I get that there's a tradeoff between getting something in now, versus waiting until the API is more refined. Tagging it with a particular ISA seems like an odd way of saying soon to be deprecated, though. What happens if we're still squabbling over the perfect replacement API when we're trying to push PPC MPIC stuff in? Then we're the ones who have to come up with a good interface. How about another bad one, with PPC in the name, and some pleas to hurry things up? :-) It's not as if there haven't been last-minute requests for API changes on the PPC side in the past... This is getting out of hand. Do you have another API for PPC, which was send for review and not commented on several months ago that we can unify right now? If not, let's go with the ARM name and work on the generic API in the mean time. The end result will be something along 5 lines in a header files and 3 lines in a switch case that return -EINVAL if the interface is completely deprecated later on, which is not a big problem. -Christoffer -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [kvmarm] [PATCH v5.1 0/2] KVM: ARM: Rename KVM_SET_DEVICE_ADDRESS
On 09.01.2013, at 23:10, Scott Wood wrote: On 01/09/2013 03:37:20 PM, Alexander Graf wrote: Am 09.01.2013 um 22:15 schrieb Scott Wood scottw...@freescale.com: I get that there's a tradeoff between getting something in now, versus waiting until the API is more refined. Tagging it with a particular ISA seems like an odd way of saying soon to be deprecated, though. What happens if we're still squabbling over the perfect replacement API when we're trying to push PPC MPIC stuff in? Then we're the ones who have to come up with a good interface. How about another bad one, with PPC in the name, and some pleas to hurry things up? :-) Then I'd be the maintainer of that one and tell you whatever I think would be reasonable given the circumstances :). It's not as if there haven't been last-minute requests for API changes on the PPC side in the past... I don't remember a full PPC target merge ever relying on an API change like this. In fact, in this particular case one could merge all of the patches except for the particular ioctl implementation and just give the respective addresses default values until there is an API to set them, similar to how we did things with PVR in the beginning. But this is something that's always up to the respective maintainers and I'm not going to interfere with how they do their job :). Perhaps the threshold for an API becoming permanent should not be acceptance into the tree, but rather the removal of an experimental tag (including a way of shutting off experimental APIs to make sure you're not depending on them). Sort of like CONFIG_EXPERIMENTAL, except actually used for its intended purpose (distributions should have it *off* by default), and preferably managed at runtime. Sort of like drivers/staging, except for APIs rather than drivers. Changes at that point should require more justification than before merging, but would not have the strict compatibility requirement that non-experimental APIs have. This would make collaboration and testing easier on APIs that aren't ready to be permanent. This tag does exist. It's called not in Linus' tree :). Which makes it a pain for multiple people to work on a new feature, especially when it spans components such as KVM and QEMU, and means that it gets less testing before the point of no return. I agree, but we're not going to solve this one now :). In fact, it'd be even harder to solve (with questionable results) than the actual ioctl in question. Alex -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [kvmarm] [PATCH v5.1 0/2] KVM: ARM: Rename KVM_SET_DEVICE_ADDRESS
On 09.01.2013, at 23:26, Christoffer Dall wrote: On Wed, Jan 9, 2013 at 5:10 PM, Scott Wood scottw...@freescale.com wrote: On 01/09/2013 03:37:20 PM, Alexander Graf wrote: Am 09.01.2013 um 22:15 schrieb Scott Wood scottw...@freescale.com: I get that there's a tradeoff between getting something in now, versus waiting until the API is more refined. Tagging it with a particular ISA seems like an odd way of saying soon to be deprecated, though. What happens if we're still squabbling over the perfect replacement API when we're trying to push PPC MPIC stuff in? Then we're the ones who have to come up with a good interface. How about another bad one, with PPC in the name, and some pleas to hurry things up? :-) It's not as if there haven't been last-minute requests for API changes on the PPC side in the past... This is getting out of hand. Do you have another API for PPC, which was send for review and not commented on several months ago that we can unify right now? If not, let's go with the ARM name and work on the generic API in the mean time. The end result will be something along 5 lines in a header files and 3 lines in a switch case that return -EINVAL if the interface is completely deprecated later on, which is not a big problem. Agreed [1]. So what exactly are we waiting for? Acks from kvm maintainers, right? Alex [1] It'd probably end up being an ioctl that converts from one calling convention to another, resulting in maybe 5 lines in the switch case. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH v8 2/3] x86, apicv: add virtual interrupt delivery support
Marcelo Tosatti wrote on 2013-01-09: On Wed, Jan 09, 2013 at 08:07:31AM +, Zhang, Yang Z wrote: if(check_request(KVM_REQ_, )) { ioapic_lock(); (*) update local EOI exit bitmap from IOAPIC In my patch, it traverses IOAPIC entry once and only updates target vcpus's eoi exit bitmap. Then make request for all vcpus. With your suggestion , all vcpus will traverse all IOAPIC entries. Though ioapic entry write is rare, it's still not reasonable. It should be fast, and very rare (as in once during system initialization, or device hotplug). Ok. Will revise the patch follow your suggestion. Is there a particular case that makes it necessary to optimize scanning? No. ioapic_unlock(); } Fine by me. Looks simpler. (*) plus any other lock that paths that update the map take Best regards, Yang -- Gleb. Best regards, Yang Best regards, Yang -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] vhost-net thread getting stuck ?
On 01/10/2013 04:25 AM, Chegu Vinod wrote: Hello, 'am running into an issue with the latest bits. [ Pl. see below. The vhost thread seems to be getting stuck while trying to memcopy...perhaps a bad address?. ] Wondering if this is a known issue or some recent regression ? Hi: Looks like the issue has been fixed in following commits, does you tree contain these? 499744209b2cbca66c42119226e5470da3bb7040 and 76fe45812a3b134c39170ca32dfd4b7217d33145. They have been merged in to Linus 3.8-rc tree. Thanks 'am using the latest qemu (from qemu.git) and the latest kvm.git kernel on the host. Started the guest using the following command line /usr/local/bin/qemu-system-x86_64 \ -enable-kvm \ -cpu host \ -smp sockets=8,cores=10,threads=1 \ -numa node,nodeid=0,cpus=0-9,mem=64g \ -numa node,nodeid=1,cpus=10-19,mem=64g \ -numa node,nodeid=2,cpus=20-29,mem=64g \ -numa node,nodeid=3,cpus=30-39,mem=64g \ -numa node,nodeid=4,cpus=40-49,mem=64g \ -numa node,nodeid=5,cpus=50-59,mem=64g \ -numa node,nodeid=6,cpus=60-69,mem=64g \ -numa node,nodeid=7,cpus=70-79,mem=64g \ -m 524288 \ -mem-path /dev/hugepages \ -name vm2 \ -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/vm2.monitor,server,now ait \ -drive file=/dev/libvirt_lvm2/vm2,if=none,id=drive-virtio-disk0,format=raw,cache =none,aio=native \ -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=v irtio-disk0,bootindex=1 \ -monitor stdio \ -net nic,model=virtio,macaddr=52:54:00:71:01:02,netdev=nic-0 \ -netdev tap,id=nic-0,ifname=tap0,script=no,downscript=no,vhost=on \ -vnc :4 Was just doing a basic kernel build in the guest when it hung with the following in the dmesg of the host. Thanks Vinod BUG: soft lockup - CPU#46 stuck for 23s! [vhost-135220:135231] Modules linked in: kvm_intel kvm fuse ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter ip_tables bridge stp llc autofs4 sunrpc pcc_cpufreq ipv6 vhost_net macvtap macvlan tun uinput iTCO_wdt iTCO_vendor_support coretemp crc32c_intel ghash_clmulni_intel microcode pcspkr mlx4_core be2net lpc_ich mfd_core hpilo hpwdt i7core_edac edac_core sg netxen_nic ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif aesni_intel ablk_helper cryptd lrw aes_x86_64 xts gf128mul pata_acpi ata_generic ata_piix hpsa lpfc scsi_transport_fc scsi_tgt radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: kvm] CPU 46 Pid: 135231, comm: vhost-135220 Not tainted 3.7.0+ #1 HP ProLiant DL980 G7 RIP: 0010:[8147bab0] [8147bab0] skb_flow_dissect+0x1b0/0x440 RSP: 0018:881ffd131bc8 EFLAGS: 0246 RAX: 8a1f7dc70c00 RBX: RCX: 7fa0 RDX: RSI: 881ffd131c68 RDI: 8a1ff1bd6c80 RBP: 881ffd131c58 R08: 881ffd131bf8 R09: 8a1ff1bd6c80 R10: 0010 R11: 0004 R12: 8a1ff1bd6c80 R13: 000b R14: 8147330b R15: 881ffd131b58 FS: () GS:8a1fff98() knlGS: CS: 0010 DS: ES: CR0: 8005003b CR2: 003d5c810dc0 CR3: 009f77c04000 CR4: 27e0 DR0: DR1: DR2: DR3: DR6: 0ff0 DR7: 0400 Process vhost-135220 (pid: 135231, threadinfo 881ffd13, task 881ffcb754c0) Stack: 881ffd131c18 81477b90 00e2 2b289bcc58ce 881ffd131ce4 00a2 00a2 00a2 00a2 881ffd131c88 937e754e Call Trace: [81477b90] ? memcpy_fromiovecend+0x90/0xd0 [8147f3ca] __skb_get_rxhash+0x1a/0xe0 [a03c90f8] tun_get_user+0x468/0x660 [tun] [81090010] ? __sdt_alloc+0x80/0x1a0 [a03c934d] tun_sendmsg+0x5d/0x80 [tun] [a0468e8a] handle_tx+0x34a/0x680 [vhost_net] [a04691f5] handle_tx_kick+0x15/0x20 [vhost_net] [a0466dfc] vhost_worker+0x10c/0x1c0 [vhost_net] [a0466cf0] ? vhost_attach_cgroups_work+0x30/0x30 [vhost_net] [a0466cf0] ? vhost_attach_cgroups_work+0x30/0x30 [vhost_net] [8107ecfe] kthread+0xce/0xe0 [8107ec30] ? kthread_freezable_should_stop+0x70/0x70 [815537ac] ret_from_fork+0x7c/0xb0 [8107ec30] ? kthread_freezable_should_stop+0x70/0x70 Code: b6 50 06 48 89 ce 48 c1 ee 20 31 f1 41 89 0e 48 8b 48 20 48 33 48 18 48 89 c8 48 c1 e8 20 31 c1 41 89 4e 04 e9 35 ff ff ff 66 90 0f b6 50 09 e9 1a ff ff ff 0f 1f 80 00 00 00 00 41 8b 44 24 68 [root@hydra11 kvm_rik]# Message from syslogd@hydra11 at Jan 9 13:06:58 ... kernel:BUG: soft lockup - CPU#46 stuck for 22s! [vhost-135220:135231] -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to
Re: [RFC PATCH 0/9] KVM: PPC: In-kernel PAPR interrupt controller emulation
On Sat, Dec 15, 2012 at 01:06:13PM +1100, Benjamin Herrenschmidt wrote: On Sat, 2012-12-15 at 01:46 +0100, Alexander Graf wrote: On 05.11.2012, at 04:18, Paul Mackerras wrote: This series implements in-kernel emulation of the XICS interrupt controller specified in the PAPR specification and used by pseries guests. Since the XICS is organized a bit differently to the interrupt controllers used on x86 machines, I have defined some new ioctls for setting up the XICS and for saving and restoring its state. It may be possible to use some of the currently x86-specific ioctls instead. Is this one already following the new world order that we discussed during KVM Forum? :) The new world order (sorry, looks like nobody took notes and people expect me to do a write up from memory now ... :-) Well, mostly we agreed that the x86 stuff wasn't going to work for us. So basically what we discussed boils down to: - Move the existing generic KVM ioctl to create the kernel APIC to x86 since it's no as-is useful for other archs who, among other things, might need to pass argument based on the machine type (type of interrupt subsystem for example, etc...) Assuming you're talking about KVM_CREATE_IRQCHIP, it is already handled entirely in arch code (arch/x86/kvm/x86.c and arch/ia64/kvm/kvm-ia64.c), along with KVM_GET_IRQCHIP and KVM_SET_IRQCHIP. - Once that's done, well, instanciating interrupt controller objects becomes pretty much an arch specific matter. We could try to keep some ioctls somewhat common though it's not *that* useful because the various types arguments are going to be fairly arch specific, so goes for configuration. Examples of what could be kept common are: * Create irq chip, takes at least a type argument, possibly a few more type-specific args (or a union, but then let's keep space in there since we can't change the size of the struct later as it would impact the ioctl number afaik). The existing KVM_CREATE_IRQCHIP is defined as _IO(...) which means that it doesn't read or write memory, but there is still the 3rd argument to the ioctl, which would give us 64 bits to indicate the type of the top-level IRQ controller (XICS, MPIC, ...), but not much more. It sounds like the agreement at the meeting was more along the lines of the KVM_CREATE_IRQCHIP_ARGS ioctl (as in the patches I posted) which can be called multiple times to instantiate pieces of the interrupt framework, rather than having a KVM_CREATE_IRQCHIP that gets called once early on to say that there we are using in-kernel interrupt controller emulation, followed by other calls to configure the various parts of the framework. Is that accurate? * Assigning the address of an irq chip when it can change (see ARM patches) - The existing business with irqfd etc... is mostly ok, except the interfaces messing around with MSIs (see virtio-pci use of kvm functions directly). The assignment of an irq number for an MSI must be a hook, probably a PCI controller hook, so x86 can get it done via its existing kernel interfaces and sane architectures can keep the assignment in qemu where it belongs. So, if I've understood you correctly about what was agreed, the set of ioctls that is implemented in the patches I posted is in line with what was agreed, isn't it? If not, where does it differ? (To recap, I had KVM_CREATE_IRQCHIP_ARGS, KVM_IRQCHIP_GET_SOURCES and KVM_IRQCHIP_SET_SOURCES, plus a one-reg interface to get/set the vcpu-specific state.) Paul. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] vhost-net thread getting stuck ?
On 1/9/2013 8:35 PM, Jason Wang wrote: On 01/10/2013 04:25 AM, Chegu Vinod wrote: Hello, 'am running into an issue with the latest bits. [ Pl. see below. The vhost thread seems to be getting stuck while trying to memcopy...perhaps a bad address?. ] Wondering if this is a known issue or some recent regression ? Hi: Looks like the issue has been fixed in following commits, does you tree contain these? 499744209b2cbca66c42119226e5470da3bb7040 and 76fe45812a3b134c39170ca32dfd4b7217d33145. They have been merged in to Linus 3.8-rc tree. I was using kvm.git kernel (as of today morning)looks like the fixes aren't there yet. Will try the Linus's 3.8-rc tree. Thanks! Vinod Thanks 'am using the latest qemu (from qemu.git) and the latest kvm.git kernel on the host. Started the guest using the following command line /usr/local/bin/qemu-system-x86_64 \ -enable-kvm \ -cpu host \ -smp sockets=8,cores=10,threads=1 \ -numa node,nodeid=0,cpus=0-9,mem=64g \ -numa node,nodeid=1,cpus=10-19,mem=64g \ -numa node,nodeid=2,cpus=20-29,mem=64g \ -numa node,nodeid=3,cpus=30-39,mem=64g \ -numa node,nodeid=4,cpus=40-49,mem=64g \ -numa node,nodeid=5,cpus=50-59,mem=64g \ -numa node,nodeid=6,cpus=60-69,mem=64g \ -numa node,nodeid=7,cpus=70-79,mem=64g \ -m 524288 \ -mem-path /dev/hugepages \ -name vm2 \ -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/vm2.monitor,server,now ait \ -drive file=/dev/libvirt_lvm2/vm2,if=none,id=drive-virtio-disk0,format=raw,cache =none,aio=native \ -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=v irtio-disk0,bootindex=1 \ -monitor stdio \ -net nic,model=virtio,macaddr=52:54:00:71:01:02,netdev=nic-0 \ -netdev tap,id=nic-0,ifname=tap0,script=no,downscript=no,vhost=on \ -vnc :4 Was just doing a basic kernel build in the guest when it hung with the following in the dmesg of the host. Thanks Vinod BUG: soft lockup - CPU#46 stuck for 23s! [vhost-135220:135231] Modules linked in: kvm_intel kvm fuse ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter ip_tables bridge stp llc autofs4 sunrpc pcc_cpufreq ipv6 vhost_net macvtap macvlan tun uinput iTCO_wdt iTCO_vendor_support coretemp crc32c_intel ghash_clmulni_intel microcode pcspkr mlx4_core be2net lpc_ich mfd_core hpilo hpwdt i7core_edac edac_core sg netxen_nic ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif aesni_intel ablk_helper cryptd lrw aes_x86_64 xts gf128mul pata_acpi ata_generic ata_piix hpsa lpfc scsi_transport_fc scsi_tgt radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: kvm] CPU 46 Pid: 135231, comm: vhost-135220 Not tainted 3.7.0+ #1 HP ProLiant DL980 G7 RIP: 0010:[8147bab0] [8147bab0] skb_flow_dissect+0x1b0/0x440 RSP: 0018:881ffd131bc8 EFLAGS: 0246 RAX: 8a1f7dc70c00 RBX: RCX: 7fa0 RDX: RSI: 881ffd131c68 RDI: 8a1ff1bd6c80 RBP: 881ffd131c58 R08: 881ffd131bf8 R09: 8a1ff1bd6c80 R10: 0010 R11: 0004 R12: 8a1ff1bd6c80 R13: 000b R14: 8147330b R15: 881ffd131b58 FS: () GS:8a1fff98() knlGS: CS: 0010 DS: ES: CR0: 8005003b CR2: 003d5c810dc0 CR3: 009f77c04000 CR4: 27e0 DR0: DR1: DR2: DR3: DR6: 0ff0 DR7: 0400 Process vhost-135220 (pid: 135231, threadinfo 881ffd13, task 881ffcb754c0) Stack: 881ffd131c18 81477b90 00e2 2b289bcc58ce 881ffd131ce4 00a2 00a2 00a2 00a2 881ffd131c88 937e754e Call Trace: [81477b90] ? memcpy_fromiovecend+0x90/0xd0 [8147f3ca] __skb_get_rxhash+0x1a/0xe0 [a03c90f8] tun_get_user+0x468/0x660 [tun] [81090010] ? __sdt_alloc+0x80/0x1a0 [a03c934d] tun_sendmsg+0x5d/0x80 [tun] [a0468e8a] handle_tx+0x34a/0x680 [vhost_net] [a04691f5] handle_tx_kick+0x15/0x20 [vhost_net] [a0466dfc] vhost_worker+0x10c/0x1c0 [vhost_net] [a0466cf0] ? vhost_attach_cgroups_work+0x30/0x30 [vhost_net] [a0466cf0] ? vhost_attach_cgroups_work+0x30/0x30 [vhost_net] [8107ecfe] kthread+0xce/0xe0 [8107ec30] ? kthread_freezable_should_stop+0x70/0x70 [815537ac] ret_from_fork+0x7c/0xb0 [8107ec30] ? kthread_freezable_should_stop+0x70/0x70 Code: b6 50 06 48 89 ce 48 c1 ee 20 31 f1 41 89 0e 48 8b 48 20 48 33 48 18 48 89 c8 48 c1 e8 20 31 c1 41 89 4e 04 e9 35 ff ff ff 66 90 0f b6 50 09 e9 1a ff ff ff 0f 1f 80 00 00 00 00 41 8b 44 24 68 [root@hydra11 kvm_rik]# Message from syslogd@hydra11 at Jan 9 13:06:58 ... kernel:BUG: soft lockup - CPU#46 stuck for 22s! [vhost-135220:135231] .
RE: Installation of Windows 8 hangs with KVM
-Original Message- From: Gleb Natapov [mailto:g...@redhat.com] Sent: Monday, January 07, 2013 5:21 PM To: Ren, Yongjie Cc: Stefan Pietsch; kvm@vger.kernel.org Subject: Re: Installation of Windows 8 hangs with KVM On Mon, Jan 07, 2013 at 09:13:37AM +, Ren, Yongjie wrote: -Original Message- From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On Behalf Of Gleb Natapov Sent: Monday, January 07, 2013 4:54 PM To: Ren, Yongjie Cc: Stefan Pietsch; kvm@vger.kernel.org Subject: Re: Installation of Windows 8 hangs with KVM On Mon, Jan 07, 2013 at 08:38:59AM +, Ren, Yongjie wrote: -Original Message- From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On Behalf Of Stefan Pietsch Sent: Monday, January 07, 2013 2:25 AM To: Gleb Natapov Cc: kvm@vger.kernel.org Subject: Re: Installation of Windows 8 hangs with KVM * Gleb Natapov g...@redhat.com [2013-01-06 11:11]: On Fri, Jan 04, 2013 at 10:58:33PM +0100, Stefan Pietsch wrote: Hi all, when I run KVM with this command the Windows 8 installation stops with error code 0x005D: kvm -m 1024 -hda win8.img -cdrom windows_8_x86.iso After adding the option -cpu host the installation proceeds to a black screen and hangs. With Virtualbox the installation succeeds. The host CPU is an Intel Core Duo L2400. Do you have any suggestions? What is your kernel/qemu version? I'm using Debian unstable. qemu-kvm 1.1.2+dfsg-3 Linux version 3.2.0-4-686-pae (debian-ker...@lists.debian.org) (gcc version 4.6.3 (Debian 4.6.3-14) ) #1 SMP Debian 3.2.35-2 you met issue only for 32bit Win8 (not 64 bit Win8), right? I think it's the same issue as the below bug I reported. https://bugs.launchpad.net/qemu/+bug/1007269 You can try with '-cpu coreduo' or '-cpu core2duo' in qemu-kvm command line. This should be a known issue which is caused by missing 'SEP' CPU flag. See another bug in Redhat bugzilla. https://bugzilla.redhat.com/show_bug.cgi?id=821741 That was RHEL kernel bug. Doubt Debian one has it. I don't think so. It should be a qemu bug (also described in that RHEL bugzilla). https://bugzilla.redhat.com/show_bug.cgi?id=821463 is the kernel one. In my SandyBridge platform, 32bit Win8 guest can boot up with '-cpu SandyBridge,+sep' in qemu-kvm CLI. But it can't boot up with '-cpu SandyBridge'. Which qemu version? Master has sep in SandyBridge definition. In any case -cpu host should have sep enabled. Oh, sorry for my bad memory. I'm not sure it's about SEP. I shouldn't have taken 'SandyBridge' model as an example because 32bit Win8 can boot up on 'SandyBridge' model. 'qemu64' is the default CPU model in QEMU/KVM on x86-64, but its 'family number' definition is not accurate. The vendor of 'qemu64' is defined as 'AMD'. For AMD processors, AMD K8 CPU (e.g. AMD Family 15) firstly introduced SSE3 instruction set. 'qemu64' already has 'CPUID_EXT_SSE3' in its ext_features, but its family number is only 6 (not equal or bigger than 15). If I update the family number of 'qemu64' to '15', 32bit Win8 guest can boot up with the default 'qemu64' CPU model. 32bit Win8 may have some requirements of a 64bit CPU model. The following patch can be a fix for 'qemu64' definition. diff --git a/target-i386/cpu.c b/target-i386/cpu.c index 78bd61e..89b203b 100644 --- a/target-i386/cpu.c +++ b/target-i386/cpu.c @@ -363,7 +363,7 @@ static x86_def_t builtin_x86_defs[] = { .vendor1 = CPUID_VENDOR_AMD_1, .vendor2 = CPUID_VENDOR_AMD_2, .vendor3 = CPUID_VENDOR_AMD_3, -.family = 6, +.family = 15, .model = 2, .stepping = 3, .features = PPRO_FEATURES | -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH 10/12] virtio-net: multiqueue support
On Wednesday, January 09, 2013 11:26:33 PM Jason Wang wrote: On 01/09/2013 06:01 PM, Wanlong Gao wrote: On 01/09/2013 05:30 PM, Jason Wang wrote: On 01/09/2013 04:23 PM, Wanlong Gao wrote: On 01/08/2013 06:14 PM, Jason Wang wrote: On 01/08/2013 06:00 PM, Wanlong Gao wrote: On 01/08/2013 05:51 PM, Jason Wang wrote: On 01/08/2013 05:49 PM, Wanlong Gao wrote: On 01/08/2013 05:29 PM, Jason Wang wrote: On 01/08/2013 05:07 PM, Wanlong Gao wrote: On 12/28/2012 06:32 PM, Jason Wang wrote: +} else if (nc-peer-info-type != NET_CLIENT_OPTIONS_KIND_TAP) { +ret = -1; +} else { +ret = tap_detach(nc-peer); +} + +return ret; +} + [...] I got guest kernel panic when using this way and set queues=4. Does it happens w/o or w/ a fd parameter? What's the qemu command line? Did you meet it during boot time? The QEMU command line is /work/git/qemu/x86_64-softmmu/qemu-system-x86_64 -name f17 -M pc-0.15 -enable-kvm -m 3096 \ -smp 4,sockets=4,cores=1,threads=1 \ -uuid c31a9f3e-4161-c53a-339c-5dc36d0497cb -no-user-config -nodefaults \ -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/f17.monitor,server,nowai t \ -mon chardev=charmonitor,id=monitor,mode=control \ -rtc base=utc -no-shutdown \ -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \ -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0xb,num_queues=4,hotplug=on \ -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 \ -drive file=/vm/f17.img,if=none,id=drive-virtio-disk0,format=qcow2 \ -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=vi rtio-disk0,bootindex=1 \ -drive file=/vm2/f17-kernel.img,if=none,id=drive-virtio-disk1,format=qcow2 \ -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk1,id=vi rtio-disk1 \ -drive file=/vm/virtio-scsi/scsi3.img,if=none,id=drive-scsi0-0-2-0,format=raw \ -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=2,drive=drive-scsi0-0-2-0,id= scsi0-0-2-0,removable=on \ -drive file=/vm/virtio-scsi/scsi4.img,if=none,id=drive-scsi0-0-3-0,format=raw \ -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=3,drive=drive-scsi0-0-3-0,id= scsi0-0-3-0 \ -drive file=/vm/virtio-scsi/scsi1.img,if=none,id=drive-scsi0-0-0-0,format=raw \ -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id= scsi0-0-0-0 \ -drive file=/vm/virtio-scsi/scsi2.img,if=none,id=drive-scsi0-0-1-0,format=raw \ -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=1,drive=drive-scsi0-0-1-0,id= scsi0-0-1-0 \ -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 \ -chardev file,id=charserial1,path=/vm/f17.log \ -device isa-serial,chardev=charserial1,id=serial1 \ -device usb-tablet,id=input0 -vga std \ -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 \ -netdev tap,id=hostnet0,vhost=on,queues=4 \ -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:ce:7b:29,bus=pci.0,ad dr=0x3 \ -monitor stdio I got panic just after booting the system, did nothing, waited for a while, the guest panicked. [ 28.053004] BUG: soft lockup - CPU#1 stuck for 23s! [ip:592] [ 28.053004] Modules linked in: ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables uinput joydev microcode virtio_balloon pcspkr virtio_net i2c_piix4 i2c_core virtio_scsi virtio_blk floppy [ 28.053004] CPU 1 [ 28.053004] Pid: 592, comm: ip Not tainted 3.8.0-rc1-net+ #3 Bochs Bochs [ 28.053004] RIP: 0010:[8137a9ab] [8137a9ab] virtqueue_get_buf+0xb/0x120 [ 28.053004] RSP: 0018:8800bc913550 EFLAGS: 0246 [ 28.053004] RAX: RBX: 8800bc49c000 RCX: 8800bc49e000 [ 28.053004] RDX: RSI: 8800bc913584 RDI: 8800bcfd4000 [ 28.053004] RBP: 8800bc913558 R08: 8800bcfd0800 R09: [ 28.053004] R10: 8800bc49c000 R11: 880036cc4de0 R12: 8800bcfd4000 [ 28.053004] R13: 8800bc913558 R14: 8137ad73 R15: 000200d0 [ 28.053004] FS: 7fb27a589740() GS:8800c148() knlGS: [ 28.053004] CS: 0010 DS: ES: CR0: 80050033 [ 28.053004] CR2: 00640530 CR3: baeff000 CR4: 06e0 [ 28.053004] DR0: DR1: DR2: [ 28.053004] DR3: DR6: 0ff0 DR7: 0400 [ 28.053004] Process ip (pid: 592, threadinfo 8800bc912000, task 880036da2e20) [ 28.053004] Stack: [ 28.053004] 8800bcfd0800 8800bc913638 a003e9bb 8800bc913656 [ 28.053004] 00010002 8800c17ebb08 0050ff10 ea0002f244c0 [ 28.053004] 00020582 ea0002f244c0 [ 28.053004] Call Trace: [ 28.053004]
Re: [Qemu-devel] [PATCH 10/12] virtio-net: multiqueue support
On 01/10/2013 02:43 PM, Jason Wang wrote: On Wednesday, January 09, 2013 11:26:33 PM Jason Wang wrote: On 01/09/2013 06:01 PM, Wanlong Gao wrote: On 01/09/2013 05:30 PM, Jason Wang wrote: On 01/09/2013 04:23 PM, Wanlong Gao wrote: On 01/08/2013 06:14 PM, Jason Wang wrote: On 01/08/2013 06:00 PM, Wanlong Gao wrote: On 01/08/2013 05:51 PM, Jason Wang wrote: On 01/08/2013 05:49 PM, Wanlong Gao wrote: On 01/08/2013 05:29 PM, Jason Wang wrote: On 01/08/2013 05:07 PM, Wanlong Gao wrote: On 12/28/2012 06:32 PM, Jason Wang wrote: +} else if (nc-peer-info-type != NET_CLIENT_OPTIONS_KIND_TAP) { +ret = -1; +} else { +ret = tap_detach(nc-peer); +} + +return ret; +} + [...] I got guest kernel panic when using this way and set queues=4. Does it happens w/o or w/ a fd parameter? What's the qemu command line? Did you meet it during boot time? The QEMU command line is /work/git/qemu/x86_64-softmmu/qemu-system-x86_64 -name f17 -M pc-0.15 -enable-kvm -m 3096 \ -smp 4,sockets=4,cores=1,threads=1 \ -uuid c31a9f3e-4161-c53a-339c-5dc36d0497cb -no-user-config -nodefaults \ -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/f17.monitor,server,nowai t \ -mon chardev=charmonitor,id=monitor,mode=control \ -rtc base=utc -no-shutdown \ -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \ -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0xb,num_queues=4,hotplug=on \ -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 \ -drive file=/vm/f17.img,if=none,id=drive-virtio-disk0,format=qcow2 \ -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=vi rtio-disk0,bootindex=1 \ -drive file=/vm2/f17-kernel.img,if=none,id=drive-virtio-disk1,format=qcow2 \ -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk1,id=vi rtio-disk1 \ -drive file=/vm/virtio-scsi/scsi3.img,if=none,id=drive-scsi0-0-2-0,format=raw \ -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=2,drive=drive-scsi0-0-2-0,id= scsi0-0-2-0,removable=on \ -drive file=/vm/virtio-scsi/scsi4.img,if=none,id=drive-scsi0-0-3-0,format=raw \ -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=3,drive=drive-scsi0-0-3-0,id= scsi0-0-3-0 \ -drive file=/vm/virtio-scsi/scsi1.img,if=none,id=drive-scsi0-0-0-0,format=raw \ -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id= scsi0-0-0-0 \ -drive file=/vm/virtio-scsi/scsi2.img,if=none,id=drive-scsi0-0-1-0,format=raw \ -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=1,drive=drive-scsi0-0-1-0,id= scsi0-0-1-0 \ -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 \ -chardev file,id=charserial1,path=/vm/f17.log \ -device isa-serial,chardev=charserial1,id=serial1 \ -device usb-tablet,id=input0 -vga std \ -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 \ -netdev tap,id=hostnet0,vhost=on,queues=4 \ -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:ce:7b:29,bus=pci.0,ad dr=0x3 \ -monitor stdio I got panic just after booting the system, did nothing, waited for a while, the guest panicked. [ 28.053004] BUG: soft lockup - CPU#1 stuck for 23s! [ip:592] [ 28.053004] Modules linked in: ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables uinput joydev microcode virtio_balloon pcspkr virtio_net i2c_piix4 i2c_core virtio_scsi virtio_blk floppy [ 28.053004] CPU 1 [ 28.053004] Pid: 592, comm: ip Not tainted 3.8.0-rc1-net+ #3 Bochs Bochs [ 28.053004] RIP: 0010:[8137a9ab] [8137a9ab] virtqueue_get_buf+0xb/0x120 [ 28.053004] RSP: 0018:8800bc913550 EFLAGS: 0246 [ 28.053004] RAX: RBX: 8800bc49c000 RCX: 8800bc49e000 [ 28.053004] RDX: RSI: 8800bc913584 RDI: 8800bcfd4000 [ 28.053004] RBP: 8800bc913558 R08: 8800bcfd0800 R09: [ 28.053004] R10: 8800bc49c000 R11: 880036cc4de0 R12: 8800bcfd4000 [ 28.053004] R13: 8800bc913558 R14: 8137ad73 R15: 000200d0 [ 28.053004] FS: 7fb27a589740() GS:8800c148() knlGS: [ 28.053004] CS: 0010 DS: ES: CR0: 80050033 [ 28.053004] CR2: 00640530 CR3: baeff000 CR4: 06e0 [ 28.053004] DR0: DR1: DR2: [ 28.053004] DR3: DR6: 0ff0 DR7: 0400 [ 28.053004] Process ip (pid: 592, threadinfo 8800bc912000, task 880036da2e20) [ 28.053004] Stack: [ 28.053004] 8800bcfd0800 8800bc913638 a003e9bb 8800bc913656 [ 28.053004] 00010002 8800c17ebb08 0050ff10 ea0002f244c0 [ 28.053004] 00020582 ea0002f244c0 [ 28.053004] Call Trace: [ 28.053004] [a003e9bb] virtnet_send_command.constprop.26+0x24b/0x270
Re: [PATCH] virtio-spec: fix two typos
Stefan Hajnoczi stefa...@gmail.com writes: On Wed, Jan 09, 2013 at 03:55:23PM +0800, ak...@redhat.com wrote: From: Amos Kong ak...@redhat.com VIRTIO_NET_F_VTRL_VQ - VIRTIO_NET_F_CTRL_VQ VIRTIO_NET_CTRL_MQ is defined to 4 in kernel code Signed-off-by: Amos Kong ak...@redhat.com --- virtio-spec.lyx | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) Reviewed-by: Stefan Hajnoczi stefa...@redhat.com Thanks, applied. Cheers, Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH 10/12] virtio-net: multiqueue support
On Thursday, January 10, 2013 02:49:14 PM Wanlong Gao wrote: On 01/10/2013 02:43 PM, Jason Wang wrote: On Wednesday, January 09, 2013 11:26:33 PM Jason Wang wrote: On 01/09/2013 06:01 PM, Wanlong Gao wrote: On 01/09/2013 05:30 PM, Jason Wang wrote: On 01/09/2013 04:23 PM, Wanlong Gao wrote: On 01/08/2013 06:14 PM, Jason Wang wrote: On 01/08/2013 06:00 PM, Wanlong Gao wrote: On 01/08/2013 05:51 PM, Jason Wang wrote: On 01/08/2013 05:49 PM, Wanlong Gao wrote: On 01/08/2013 05:29 PM, Jason Wang wrote: On 01/08/2013 05:07 PM, Wanlong Gao wrote: On 12/28/2012 06:32 PM, Jason Wang wrote: +} else if (nc-peer-info-type != NET_CLIENT_OPTIONS_KIND_TAP) { +ret = -1; +} else { +ret = tap_detach(nc-peer); +} + +return ret; +} + [...] I got guest kernel panic when using this way and set queues=4. Does it happens w/o or w/ a fd parameter? What's the qemu command line? Did you meet it during boot time? The QEMU command line is /work/git/qemu/x86_64-softmmu/qemu-system-x86_64 -name f17 -M pc-0.15 -enable-kvm -m 3096 \ -smp 4,sockets=4,cores=1,threads=1 \ -uuid c31a9f3e-4161-c53a-339c-5dc36d0497cb -no-user-config -nodefaults \ -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/f17.monitor,server,nowa i t \ -mon chardev=charmonitor,id=monitor,mode=control \ -rtc base=utc -no-shutdown \ -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \ -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0xb,num_queues=4,hotplug=on \ -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 \ -drive file=/vm/f17.img,if=none,id=drive-virtio-disk0,format=qcow2 \ -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=v i rtio-disk0,bootindex=1 \ -drive file=/vm2/f17-kernel.img,if=none,id=drive-virtio-disk1,format=qcow2 \ -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk1,id=v i rtio-disk1 \ -drive file=/vm/virtio-scsi/scsi3.img,if=none,id=drive-scsi0-0-2-0,format=raw \ -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=2,drive=drive-scsi0-0-2-0,id = scsi0-0-2-0,removable=on \ -drive file=/vm/virtio-scsi/scsi4.img,if=none,id=drive-scsi0-0-3-0,format=raw \ -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=3,drive=drive-scsi0-0-3-0,id = scsi0-0-3-0 \ -drive file=/vm/virtio-scsi/scsi1.img,if=none,id=drive-scsi0-0-0-0,format=raw \ -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id = scsi0-0-0-0 \ -drive file=/vm/virtio-scsi/scsi2.img,if=none,id=drive-scsi0-0-1-0,format=raw \ -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=1,drive=drive-scsi0-0-1-0,id = scsi0-0-1-0 \ -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 \ -chardev file,id=charserial1,path=/vm/f17.log \ -device isa-serial,chardev=charserial1,id=serial1 \ -device usb-tablet,id=input0 -vga std \ -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 \ -netdev tap,id=hostnet0,vhost=on,queues=4 \ -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:ce:7b:29,bus=pci.0,a d dr=0x3 \ -monitor stdio I got panic just after booting the system, did nothing, waited for a while, the guest panicked. [ 28.053004] BUG: soft lockup - CPU#1 stuck for 23s! [ip:592] [ 28.053004] Modules linked in: ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables uinput joydev microcode virtio_balloon pcspkr virtio_net i2c_piix4 i2c_core virtio_scsi virtio_blk floppy [ 28.053004] CPU 1 [ 28.053004] Pid: 592, comm: ip Not tainted 3.8.0-rc1-net+ #3 Bochs Bochs [ 28.053004] RIP: 0010:[8137a9ab] [8137a9ab] virtqueue_get_buf+0xb/0x120 [ 28.053004] RSP: 0018:8800bc913550 EFLAGS: 0246 [ 28.053004] RAX: RBX: 8800bc49c000 RCX: 8800bc49e000 [ 28.053004] RDX: RSI: 8800bc913584 RDI: 8800bcfd4000 [ 28.053004] RBP: 8800bc913558 R08: 8800bcfd0800 R09: [ 28.053004] R10: 8800bc49c000 R11: 880036cc4de0 R12: 8800bcfd4000 [ 28.053004] R13: 8800bc913558 R14: 8137ad73 R15: 000200d0 [ 28.053004] FS: 7fb27a589740() GS:8800c148() knlGS: [ 28.053004] CS: 0010 DS: ES: CR0: 80050033 [ 28.053004] CR2: 00640530 CR3: baeff000 CR4: 06e0 [ 28.053004] DR0: DR1: DR2: [ 28.053004] DR3: DR6: 0ff0 DR7: 0400 [ 28.053004] Process ip (pid: 592, threadinfo 8800bc912000, task 880036da2e20) [ 28.053004] Stack: [ 28.053004] 8800bcfd0800 8800bc913638 a003e9bb 8800bc913656 [ 28.053004] 00010002 8800c17ebb08 0050ff10 ea0002f244c0 [
[PATCH v9 0/3] x86, apicv: Add APIC virtualization support
From: Yang Zhang yang.z.zh...@intel.com APIC virtualization is a new feature which can eliminate most of VM exit when vcpu handle a interrupt: APIC register virtualization: APIC read access doesn't cause APIC-access VM exits. APIC write becomes trap-like. Virtual interrupt delivery: Virtual interrupt delivery avoids KVM to inject vAPIC interrupts manually, which is fully taken care of by the hardware. Please refer to Intel SDM volume 3, chapter 29 for more details. Changes v8 to v9: * Update eoi exit bitmap by vcpu itself. * Enable virtualize x2apic mode when guest is using x2apic. * Rebase on top of KVM upstream Changes v7 to v8: * According Marcelo's suggestion, add comments for irr_pending and isr_count, since the two valiables have different meaning when using apicv. * Set highest bit in vISR to SVI after migation. * Use spinlock to access eoi exit bitmap synchronously. * Enable virtualize x2apic mode when guest is using x2apic * Rebased on top of KVM upstream. Changes v6 to v7: * fix a bug when set exit bitmap. * Rebased on top of KVM upstream. Changes v5 to v6: * minor adjustments according gleb's comments * Rebased on top of KVM upstream. Changes v4 to v5: * Set eoi exit bitmap when an interrupt has notifier registered. * Use request to track ioapic entry's modification. * Rebased on top of KVM upstream. Changes v3 to v4: * use one option to control both register virtualization and virtual interrupt delivery. * Update eoi exit bitmap when programing ioapic or programing apic's id/dfr/ldr. * Rebased on top of KVM upstream. Changes v2 to v3: * Drop Posted Interrupt patch from v3. According Gleb's suggestion, we will use global vector for all VCPUs as notification event vector. So we will rewrite the Posted Interrupt patch. And resend it later. * Use TMR to set the eoi exiting bitmap. We only want to set eoi exiting bitmap for those interrupt which is level trigger or has notifier in EOI write path. So TMR is enough to distinguish the interrupt trigger mode. * Simplify some code according Gleb's comments. * rebased on top of KVM upstream. Changes v1 to v2: * Add Posted Interrupt support in this series patch. * Since there is a notifer hook in vAPIC EOI for PIT interrupt. So always Set PIT interrupt in eoi exit bitmap to force vmexit when EOI to interrupt. * Rebased on top of KVM upstream Yang Zhang (3): x86, apicv: add APICv register virtualization support x86, apicv: add virtual x2apic support x86, apicv: add virtual interrupt delivery support arch/x86/include/asm/kvm_host.h |7 + arch/x86/include/asm/vmx.h | 14 ++ arch/x86/kvm/irq.c | 56 +- arch/x86/kvm/lapic.c| 92 ++--- arch/x86/kvm/lapic.h| 25 +++ arch/x86/kvm/svm.c | 24 +++ arch/x86/kvm/vmx.c | 404 ++- arch/x86/kvm/x86.c | 14 ++- include/linux/kvm_host.h|3 + virt/kvm/ioapic.c | 18 ++ virt/kvm/ioapic.h |4 + virt/kvm/irq_comm.c | 22 ++ virt/kvm/kvm_main.c |5 + 13 files changed, 643 insertions(+), 45 deletions(-) -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v9 1/3] x86, apicv: add APICv register virtualization support
- APIC read doesn't cause VM-Exit - APIC write becomes trap-like Signed-off-by: Kevin Tian kevin.t...@intel.com Signed-off-by: Yang Zhang yang.z.zh...@intel.com --- arch/x86/include/asm/vmx.h |2 ++ arch/x86/kvm/lapic.c | 15 +++ arch/x86/kvm/lapic.h |2 ++ arch/x86/kvm/vmx.c | 33 - 4 files changed, 51 insertions(+), 1 deletions(-) diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h index e385df9..44c3f7e 100644 --- a/arch/x86/include/asm/vmx.h +++ b/arch/x86/include/asm/vmx.h @@ -66,6 +66,7 @@ #define EXIT_REASON_EPT_MISCONFIG 49 #define EXIT_REASON_WBINVD 54 #define EXIT_REASON_XSETBV 55 +#define EXIT_REASON_APIC_WRITE 56 #define EXIT_REASON_INVPCID 58 #define VMX_EXIT_REASONS \ @@ -141,6 +142,7 @@ #define SECONDARY_EXEC_ENABLE_VPID 0x0020 #define SECONDARY_EXEC_WBINVD_EXITING 0x0040 #define SECONDARY_EXEC_UNRESTRICTED_GUEST 0x0080 +#define SECONDARY_EXEC_APIC_REGISTER_VIRT 0x0100 #define SECONDARY_EXEC_PAUSE_LOOP_EXITING 0x0400 #define SECONDARY_EXEC_ENABLE_INVPCID 0x1000 diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 9392f52..0664c13 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -1212,6 +1212,21 @@ void kvm_lapic_set_eoi(struct kvm_vcpu *vcpu) } EXPORT_SYMBOL_GPL(kvm_lapic_set_eoi); +/* emulate APIC access in a trap manner */ +void kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset) +{ + u32 val = 0; + + /* hw has done the conditional check and inst decode */ + offset = 0xff0; + + apic_reg_read(vcpu-arch.apic, offset, 4, val); + + /* TODO: optimize to just emulate side effect w/o one more write */ + apic_reg_write(vcpu-arch.apic, offset, val); +} +EXPORT_SYMBOL_GPL(kvm_apic_write_nodecode); + void kvm_free_lapic(struct kvm_vcpu *vcpu) { struct kvm_lapic *apic = vcpu-arch.apic; diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h index e5ebf9f..9a8ee22 100644 --- a/arch/x86/kvm/lapic.h +++ b/arch/x86/kvm/lapic.h @@ -64,6 +64,8 @@ int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu); u64 kvm_get_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu); void kvm_set_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu, u64 data); +void kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset); + void kvm_lapic_set_vapic_addr(struct kvm_vcpu *vcpu, gpa_t vapic_addr); void kvm_lapic_sync_from_vapic(struct kvm_vcpu *vcpu); void kvm_lapic_sync_to_vapic(struct kvm_vcpu *vcpu); diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 55dfc37..688f43f 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -84,6 +84,9 @@ module_param(vmm_exclusive, bool, S_IRUGO); static bool __read_mostly fasteoi = 1; module_param(fasteoi, bool, S_IRUGO); +static bool __read_mostly enable_apicv_reg_vid = 1; +module_param(enable_apicv_reg_vid, bool, S_IRUGO); + /* * If nested=1, nested virtualization is supported, i.e., guests may use * VMX and be a hypervisor for its own guests. If nested=0, guests may not @@ -764,6 +767,12 @@ static inline bool cpu_has_vmx_virtualize_apic_accesses(void) SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES; } +static inline bool cpu_has_vmx_apic_register_virt(void) +{ + return vmcs_config.cpu_based_2nd_exec_ctrl + SECONDARY_EXEC_APIC_REGISTER_VIRT; +} + static inline bool cpu_has_vmx_flexpriority(void) { return cpu_has_vmx_tpr_shadow() @@ -2541,7 +2550,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf) SECONDARY_EXEC_UNRESTRICTED_GUEST | SECONDARY_EXEC_PAUSE_LOOP_EXITING | SECONDARY_EXEC_RDTSCP | - SECONDARY_EXEC_ENABLE_INVPCID; + SECONDARY_EXEC_ENABLE_INVPCID | + SECONDARY_EXEC_APIC_REGISTER_VIRT; if (adjust_vmx_controls(min2, opt2, MSR_IA32_VMX_PROCBASED_CTLS2, _cpu_based_2nd_exec_control) 0) @@ -2552,6 +2562,11 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf) SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES)) _cpu_based_exec_control = ~CPU_BASED_TPR_SHADOW; #endif + + if (!(_cpu_based_exec_control CPU_BASED_TPR_SHADOW)) + _cpu_based_2nd_exec_control = ~( + SECONDARY_EXEC_APIC_REGISTER_VIRT); + if (_cpu_based_2nd_exec_control SECONDARY_EXEC_ENABLE_EPT) { /* CR3 accesses and invlpg don't need to cause VM Exits when EPT enabled */ @@ -2749,6 +2764,9 @@ static __init int hardware_setup(void) if (!cpu_has_vmx_ple()) ple_gap = 0; + if (!cpu_has_vmx_apic_register_virt()) +
[PATCH v9 3/3] x86, apicv: add virtual interrupt delivery support
From: Yang Zhang yang.z.zh...@intel.com Virtual interrupt delivery avoids KVM to inject vAPIC interrupts manually, which is fully taken care of by the hardware. This needs some special awareness into existing interrupr injection path: - for pending interrupt, instead of direct injection, we may need update architecture specific indicators before resuming to guest. - A pending interrupt, which is masked by ISR, should be also considered in above update action, since hardware will decide when to inject it at right time. Current has_interrupt and get_interrupt only returns a valid vector from injection p.o.v. Signed-off-by: Kevin Tian kevin.t...@intel.com Signed-off-by: Yang Zhang yang.z.zh...@intel.com --- arch/x86/include/asm/kvm_host.h |5 + arch/x86/include/asm/vmx.h | 11 +++ arch/x86/kvm/irq.c | 56 +++- arch/x86/kvm/lapic.c| 72 +-- arch/x86/kvm/lapic.h| 23 + arch/x86/kvm/svm.c | 18 arch/x86/kvm/vmx.c | 191 +-- arch/x86/kvm/x86.c | 14 +++- include/linux/kvm_host.h|3 + virt/kvm/ioapic.c | 18 virt/kvm/ioapic.h |4 + virt/kvm/irq_comm.c | 22 + virt/kvm/kvm_main.c |5 + 13 files changed, 399 insertions(+), 43 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 572a562..f471856 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -697,6 +697,10 @@ struct kvm_x86_ops { void (*enable_nmi_window)(struct kvm_vcpu *vcpu); void (*enable_irq_window)(struct kvm_vcpu *vcpu); void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr); + int (*has_virtual_interrupt_delivery)(struct kvm_vcpu *vcpu); + void (*update_apic_irq)(struct kvm_vcpu *vcpu, int max_irr); + void (*update_eoi_exitmap)(struct kvm_vcpu *vcpu); + void (*set_svi)(int isr); void (*enable_virtual_x2apic_mode)(struct kvm_vcpu *vcpu); void (*disable_virtual_x2apic_mode)(struct kvm_vcpu *vcpu); int (*set_tss_addr)(struct kvm *kvm, unsigned int addr); @@ -993,6 +997,7 @@ int kvm_age_hva(struct kvm *kvm, unsigned long hva); int kvm_test_age_hva(struct kvm *kvm, unsigned long hva); void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte); int cpuid_maxphyaddr(struct kvm_vcpu *vcpu); +int kvm_cpu_has_injectable_intr(struct kvm_vcpu *v); int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu); int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu); int kvm_cpu_get_interrupt(struct kvm_vcpu *v); diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h index 0a54df0..694586c 100644 --- a/arch/x86/include/asm/vmx.h +++ b/arch/x86/include/asm/vmx.h @@ -62,6 +62,7 @@ #define EXIT_REASON_MCE_DURING_VMENTRY 41 #define EXIT_REASON_TPR_BELOW_THRESHOLD 43 #define EXIT_REASON_APIC_ACCESS 44 +#define EXIT_REASON_EOI_INDUCED 45 #define EXIT_REASON_EPT_VIOLATION 48 #define EXIT_REASON_EPT_MISCONFIG 49 #define EXIT_REASON_WBINVD 54 @@ -144,6 +145,7 @@ #define SECONDARY_EXEC_WBINVD_EXITING 0x0040 #define SECONDARY_EXEC_UNRESTRICTED_GUEST 0x0080 #define SECONDARY_EXEC_APIC_REGISTER_VIRT 0x0100 +#define SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY0x0200 #define SECONDARY_EXEC_PAUSE_LOOP_EXITING 0x0400 #define SECONDARY_EXEC_ENABLE_INVPCID 0x1000 @@ -181,6 +183,7 @@ enum vmcs_field { GUEST_GS_SELECTOR = 0x080a, GUEST_LDTR_SELECTOR = 0x080c, GUEST_TR_SELECTOR = 0x080e, + GUEST_INTR_STATUS = 0x0810, HOST_ES_SELECTOR= 0x0c00, HOST_CS_SELECTOR= 0x0c02, HOST_SS_SELECTOR= 0x0c04, @@ -208,6 +211,14 @@ enum vmcs_field { APIC_ACCESS_ADDR_HIGH = 0x2015, EPT_POINTER = 0x201a, EPT_POINTER_HIGH= 0x201b, + EOI_EXIT_BITMAP0= 0x201c, + EOI_EXIT_BITMAP0_HIGH = 0x201d, + EOI_EXIT_BITMAP1= 0x201e, + EOI_EXIT_BITMAP1_HIGH = 0x201f, + EOI_EXIT_BITMAP2= 0x2020, + EOI_EXIT_BITMAP2_HIGH = 0x2021, + EOI_EXIT_BITMAP3= 0x2022, + EOI_EXIT_BITMAP3_HIGH = 0x2023, GUEST_PHYSICAL_ADDRESS = 0x2400, GUEST_PHYSICAL_ADDRESS_HIGH = 0x2401, VMCS_LINK_POINTER = 0x2800, diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c index b111aee..e113440 100644 --- a/arch/x86/kvm/irq.c +++ b/arch/x86/kvm/irq.c @@ -38,6 +38,38 @@ int kvm_cpu_has_pending_timer(struct kvm_vcpu
[PATCH v9 2/3] x86, apicv: add virtual x2apic support
From: Yang Zhang yang.z.zh...@intel.com basically to benefit from apicv, we need to enable virtualized x2apic mode. Currently, we only enable it when guest is really using x2apic. Also, clear MSR bitmap for corresponding x2apic MSRs when guest enabled x2apic: 0x800 - 0x8ff: no read intercept for apicv register virtualization, except APIC ID and TMCCT. APIC ID and TMCCT: need software's assistance to get right value. TPR,EOI,SELF-IPI: no write intercept for virtual interrupt delivery. Signed-off-by: Kevin Tian kevin.t...@intel.com Signed-off-by: Yang Zhang yang.z.zh...@intel.com --- arch/x86/include/asm/kvm_host.h |2 + arch/x86/include/asm/vmx.h |1 + arch/x86/kvm/lapic.c|5 +- arch/x86/kvm/svm.c |6 + arch/x86/kvm/vmx.c | 194 +-- 5 files changed, 200 insertions(+), 8 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index c431b33..572a562 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -697,6 +697,8 @@ struct kvm_x86_ops { void (*enable_nmi_window)(struct kvm_vcpu *vcpu); void (*enable_irq_window)(struct kvm_vcpu *vcpu); void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr); + void (*enable_virtual_x2apic_mode)(struct kvm_vcpu *vcpu); + void (*disable_virtual_x2apic_mode)(struct kvm_vcpu *vcpu); int (*set_tss_addr)(struct kvm *kvm, unsigned int addr); int (*get_tdp_level)(void); u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio); diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h index 44c3f7e..0a54df0 100644 --- a/arch/x86/include/asm/vmx.h +++ b/arch/x86/include/asm/vmx.h @@ -139,6 +139,7 @@ #define SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES 0x0001 #define SECONDARY_EXEC_ENABLE_EPT 0x0002 #define SECONDARY_EXEC_RDTSCP 0x0008 +#define SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE 0x0010 #define SECONDARY_EXEC_ENABLE_VPID 0x0020 #define SECONDARY_EXEC_WBINVD_EXITING 0x0040 #define SECONDARY_EXEC_UNRESTRICTED_GUEST 0x0080 diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 0664c13..ec38906 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -1328,7 +1328,10 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value) u32 id = kvm_apic_id(apic); u32 ldr = ((id 4) 16) | (1 (id 0xf)); kvm_apic_set_ldr(apic, ldr); - } + kvm_x86_ops-enable_virtual_x2apic_mode(vcpu); + } else + kvm_x86_ops-disable_virtual_x2apic_mode(vcpu); + apic-base_address = apic-vcpu-arch.apic_base MSR_IA32_APICBASE_BASE; diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index d29d3cd..0b82cb1 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -3571,6 +3571,11 @@ static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr) set_cr_intercept(svm, INTERCEPT_CR8_WRITE); } +static void svm_enable_virtual_x2apic_mode(struct kvm_vcpu *vcpu) +{ + return; +} + static int svm_nmi_allowed(struct kvm_vcpu *vcpu) { struct vcpu_svm *svm = to_svm(vcpu); @@ -4290,6 +4295,7 @@ static struct kvm_x86_ops svm_x86_ops = { .enable_nmi_window = enable_nmi_window, .enable_irq_window = enable_irq_window, .update_cr8_intercept = update_cr8_intercept, + .enable_virtual_x2apic_mode = svm_enable_virtual_x2apic_mode, .set_tss_addr = svm_set_tss_addr, .get_tdp_level = get_npt_level, diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 688f43f..b203ce7 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -433,6 +433,8 @@ struct vcpu_vmx { bool rdtscp_enabled; + bool virtual_x2apic_enabled; + /* Support for a guest hypervisor (nested VMX) */ struct nested_vmx nested; }; @@ -767,12 +769,23 @@ static inline bool cpu_has_vmx_virtualize_apic_accesses(void) SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES; } +static inline bool cpu_has_vmx_virtualize_x2apic_mode(void) +{ + return vmcs_config.cpu_based_2nd_exec_ctrl + SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE; +} + static inline bool cpu_has_vmx_apic_register_virt(void) { return vmcs_config.cpu_based_2nd_exec_ctrl SECONDARY_EXEC_APIC_REGISTER_VIRT; } +static inline bool cpu_has_vmx_virtual_intr_delivery(void) +{ + return false; +} + static inline bool cpu_has_vmx_flexpriority(void) { return cpu_has_vmx_tpr_shadow() @@ -2544,6 +2557,7 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf) if (_cpu_based_exec_control CPU_BASED_ACTIVATE_SECONDARY_CONTROLS) { min2 = 0; opt2 =
Re: [PATCH v9 2/3] x86, apicv: add virtual x2apic support
On Thu, Jan 10, 2013 at 03:26:07PM +0800, Yang Zhang wrote: From: Yang Zhang yang.z.zh...@intel.com basically to benefit from apicv, we need to enable virtualized x2apic mode. Currently, we only enable it when guest is really using x2apic. Also, clear MSR bitmap for corresponding x2apic MSRs when guest enabled x2apic: 0x800 - 0x8ff: no read intercept for apicv register virtualization, except APIC ID and TMCCT. APIC ID and TMCCT: need software's assistance to get right value. TPR,EOI,SELF-IPI: no write intercept for virtual interrupt delivery. Signed-off-by: Kevin Tian kevin.t...@intel.com Signed-off-by: Yang Zhang yang.z.zh...@intel.com --- arch/x86/include/asm/kvm_host.h |2 + arch/x86/include/asm/vmx.h |1 + arch/x86/kvm/lapic.c|5 +- arch/x86/kvm/svm.c |6 + arch/x86/kvm/vmx.c | 194 +-- 5 files changed, 200 insertions(+), 8 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index c431b33..572a562 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -697,6 +697,8 @@ struct kvm_x86_ops { void (*enable_nmi_window)(struct kvm_vcpu *vcpu); void (*enable_irq_window)(struct kvm_vcpu *vcpu); void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr); + void (*enable_virtual_x2apic_mode)(struct kvm_vcpu *vcpu); + void (*disable_virtual_x2apic_mode)(struct kvm_vcpu *vcpu); Make one callback with enable/disable parameter. And do not forget SVM. int (*set_tss_addr)(struct kvm *kvm, unsigned int addr); int (*get_tdp_level)(void); u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio); diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h index 44c3f7e..0a54df0 100644 --- a/arch/x86/include/asm/vmx.h +++ b/arch/x86/include/asm/vmx.h @@ -139,6 +139,7 @@ #define SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES 0x0001 #define SECONDARY_EXEC_ENABLE_EPT 0x0002 #define SECONDARY_EXEC_RDTSCP0x0008 +#define SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE 0x0010 #define SECONDARY_EXEC_ENABLE_VPID 0x0020 #define SECONDARY_EXEC_WBINVD_EXITING0x0040 #define SECONDARY_EXEC_UNRESTRICTED_GUEST0x0080 diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 0664c13..ec38906 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -1328,7 +1328,10 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value) u32 id = kvm_apic_id(apic); u32 ldr = ((id 4) 16) | (1 (id 0xf)); kvm_apic_set_ldr(apic, ldr); - } + kvm_x86_ops-enable_virtual_x2apic_mode(vcpu); + } else + kvm_x86_ops-disable_virtual_x2apic_mode(vcpu); + You just broke SVM. apic-base_address = apic-vcpu-arch.apic_base MSR_IA32_APICBASE_BASE; diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index d29d3cd..0b82cb1 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -3571,6 +3571,11 @@ static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr) set_cr_intercept(svm, INTERCEPT_CR8_WRITE); } +static void svm_enable_virtual_x2apic_mode(struct kvm_vcpu *vcpu) +{ + return; +} + static int svm_nmi_allowed(struct kvm_vcpu *vcpu) { struct vcpu_svm *svm = to_svm(vcpu); @@ -4290,6 +4295,7 @@ static struct kvm_x86_ops svm_x86_ops = { .enable_nmi_window = enable_nmi_window, .enable_irq_window = enable_irq_window, .update_cr8_intercept = update_cr8_intercept, + .enable_virtual_x2apic_mode = svm_enable_virtual_x2apic_mode, .set_tss_addr = svm_set_tss_addr, .get_tdp_level = get_npt_level, diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 688f43f..b203ce7 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -433,6 +433,8 @@ struct vcpu_vmx { bool rdtscp_enabled; + bool virtual_x2apic_enabled; + /* Support for a guest hypervisor (nested VMX) */ struct nested_vmx nested; }; @@ -767,12 +769,23 @@ static inline bool cpu_has_vmx_virtualize_apic_accesses(void) SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES; } +static inline bool cpu_has_vmx_virtualize_x2apic_mode(void) +{ + return vmcs_config.cpu_based_2nd_exec_ctrl + SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE; +} + static inline bool cpu_has_vmx_apic_register_virt(void) { return vmcs_config.cpu_based_2nd_exec_ctrl SECONDARY_EXEC_APIC_REGISTER_VIRT; } +static inline bool cpu_has_vmx_virtual_intr_delivery(void) +{ + return false; +} + static inline bool cpu_has_vmx_flexpriority(void) { return cpu_has_vmx_tpr_shadow() @@ -2544,6 +2557,7
Re: [RFC PATCH 0/9] KVM: PPC: In-kernel PAPR interrupt controller emulation
On Sat, Dec 15, 2012 at 01:06:13PM +1100, Benjamin Herrenschmidt wrote: On Sat, 2012-12-15 at 01:46 +0100, Alexander Graf wrote: On 05.11.2012, at 04:18, Paul Mackerras wrote: This series implements in-kernel emulation of the XICS interrupt controller specified in the PAPR specification and used by pseries guests. Since the XICS is organized a bit differently to the interrupt controllers used on x86 machines, I have defined some new ioctls for setting up the XICS and for saving and restoring its state. It may be possible to use some of the currently x86-specific ioctls instead. Is this one already following the new world order that we discussed during KVM Forum? :) The new world order (sorry, looks like nobody took notes and people expect me to do a write up from memory now ... :-) Well, mostly we agreed that the x86 stuff wasn't going to work for us. So basically what we discussed boils down to: - Move the existing generic KVM ioctl to create the kernel APIC to x86 since it's no as-is useful for other archs who, among other things, might need to pass argument based on the machine type (type of interrupt subsystem for example, etc...) Assuming you're talking about KVM_CREATE_IRQCHIP, it is already handled entirely in arch code (arch/x86/kvm/x86.c and arch/ia64/kvm/kvm-ia64.c), along with KVM_GET_IRQCHIP and KVM_SET_IRQCHIP. - Once that's done, well, instanciating interrupt controller objects becomes pretty much an arch specific matter. We could try to keep some ioctls somewhat common though it's not *that* useful because the various types arguments are going to be fairly arch specific, so goes for configuration. Examples of what could be kept common are: * Create irq chip, takes at least a type argument, possibly a few more type-specific args (or a union, but then let's keep space in there since we can't change the size of the struct later as it would impact the ioctl number afaik). The existing KVM_CREATE_IRQCHIP is defined as _IO(...) which means that it doesn't read or write memory, but there is still the 3rd argument to the ioctl, which would give us 64 bits to indicate the type of the top-level IRQ controller (XICS, MPIC, ...), but not much more. It sounds like the agreement at the meeting was more along the lines of the KVM_CREATE_IRQCHIP_ARGS ioctl (as in the patches I posted) which can be called multiple times to instantiate pieces of the interrupt framework, rather than having a KVM_CREATE_IRQCHIP that gets called once early on to say that there we are using in-kernel interrupt controller emulation, followed by other calls to configure the various parts of the framework. Is that accurate? * Assigning the address of an irq chip when it can change (see ARM patches) - The existing business with irqfd etc... is mostly ok, except the interfaces messing around with MSIs (see virtio-pci use of kvm functions directly). The assignment of an irq number for an MSI must be a hook, probably a PCI controller hook, so x86 can get it done via its existing kernel interfaces and sane architectures can keep the assignment in qemu where it belongs. So, if I've understood you correctly about what was agreed, the set of ioctls that is implemented in the patches I posted is in line with what was agreed, isn't it? If not, where does it differ? (To recap, I had KVM_CREATE_IRQCHIP_ARGS, KVM_IRQCHIP_GET_SOURCES and KVM_IRQCHIP_SET_SOURCES, plus a one-reg interface to get/set the vcpu-specific state.) Paul. -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html