RE: [RFC][PATCH v4 05/18] Add a function to indicate if device use external buffer.
> +static int dev_is_mpassthru(struct net_device *dev) > >bool return value should be better here. > >-- >Regards, >Changli Gao(xiao...@gmail.com) Thanks, would fix that. Thanks Xiaohui -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [RFC][PATCH v4 00/18] Provide a zero-copy method on KVM virtio-net.
> > The idea is simple, just to pin the guest VM user space and then let > > host NIC driver has the chance to directly DMA to it. > > > >Isn't it much easier to map the RX ring of the network device into the > >guest's address space, have DMA map calls translate guest addresses to > >physical/DMA addresses as well as do all of this crazy page pinning > >stuff, and provide the translations and protections via the IOMMU? >This means we need guest know how the specific network device works. >So we won't be able to, for example, move guest between different hosts. >There are other problems: many physical systems do not have an iommu, >some guest OS-es do not support DMA map calls, doing VM exit >on each DMA map call might turn out to be very slow. And so on. This solution is what now we can think of to implement zero-copy. Some modifications are made to net core to try to avoid network device driver changes. The major change is to __alloc_skb(), in which we added a dev parameter to indicate whether the device will DMA to/from guest/user buffer which is pointed by host skb->data. We also modify skb_release_data() and skb_reserve(). We made it now works with ixgbe driver with PS mode disabled, and got some performance data with it. using netperf with GSO/TSO disabled, 10G NIC, disabled packet split mode, with raw socket case compared to vhost. bindwidth will be from 1.1Gbps to 1.7Gbps CPU % from 120%-140% to 140%-160% We are now trying to get decent performance data with advanced features. Do you have any other concerns with this solution? >> What's being proposed here looks a bit over-engineered. >This is an attempt to reduce overhead for virtio (paravirtualization). >'Don't use PV' is kind of an alternative, but I do not >think it's a simpler one. -- MST -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] KVM, Fix QEMU-KVM is killed by guest SRAO MCE
On Wed, 2010-04-28 at 17:47 +0800, Avi Kivity wrote: > On 04/28/2010 05:56 AM, Huang Ying wrote: > > > >>> > >>> Just want to use the side effect of copy_from_user, SIGBUS will be sent > >>> to current process because the page touched is marked as poisoned. That > >>> is, failure is expected, so the return value is not checked. > >>> > >>> > >> What if the failure doesn't happen? Say, someone mmap()ed over the page. > >> > > Sorry, not get your idea clearly. hva is re-mmap()ed? We just read the > > hva, not write, so I think it should be OK here. > > > > > > We don't generate a signal in this case. Does the code continue to work > correctly (not sure what correctly is in this case... should probably > just continue). > > There's also the possibility of -EFAULT. I think signal should be generated for copy_from_user, because the hva is poisoned now. The signal will not generated only if the hva is re-mmap()ped to some other physical page, but this should be impossible unless we have memory hotadd/hotremove in KVM. If the signal is not generated, lost or overwritten, guest will continue, and if the hva is still poisoned, the page fault will be triggered again; if the hva is not poisoned, there will be no further page fault. Best Regards, Huang Ying -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Ignore SRAO MCE if another MCE is being processed
On Thu, 2010-04-29 at 00:33 +0800, Avi Kivity wrote: > On 04/28/2010 09:08 AM, Huang Ying wrote: > > > > There is some test script and document for this in: > > > > git://git.kernel.org/pub/scm/utils/cpu/mce/mce-test.git > > > > test script is in "kvm" directory, testing document is kvm/README > > > > > > This is a good candidate for kvm-autotest. We'll get little user > coverage for this feature, so it's good to get it machine tested regularly. Yes. We will working on this. Best Regards, Huang Ying -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
High CPU load on target host after migration
Hi, I have been toying around with kvm / libvirt / virt-manager and it's migration feature. Both host machines are running a 2.6.33 Kernel. One host is a Dual Quad Core Intel Xeon E5520 @ 2.27GHz and the other is a Dual Quad Core Intel E5420 @ 2.50GHz. Migrating Linux machines works great but Windows XP SP3 is giving me a headache. The migration process finishes without any crash/error but the CPU load on the target host is ~50% on two cores. There is no CPU intensive task running inside the VM though. "Removing" the network card from the VM and migrating it between the two machines doesn't seem to trigger the high CPU load. As network driver I tried the realtek and Red Hat virtio driver but it doesn't seem to make a difference. Any insights on what could cause this situation and how to best debug it? Cheers, Thomas -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RHEL6.0 PATCH] qemu-kvm: fix crash on reboot with vhost-net
On Wed, Apr 28, 2010 at 11:16:15PM +0300, Michael S. Tsirkin wrote: > Bugzilla: 585940 > Upstream status: applied on qemu-kvm.git, > commit 992cc816c42f2e93db033919a9ddbfcd1da4 Please disregard, sent to this list in error. Sorry about the noise. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Can I simulate a virtual Dual-Head Graphiccard?
On Wednesday, April 28, 2010 03:08:24 pm Axel Kittenberger wrote: > Hello, > > This is a question I was not able to answer with a search. I've been > using kvm now quite successfully as server side solution. Now I want to > use it on a particular desktop to have a Windows 7 Guest on a native > Linux system. Well this desktop has two Screens, and I'm sure its > expected to have the Guest also on both screens. > > Supposevly I could just simulate a very wide Screen and have the host > split it (either SDL or VNC). However, this is not quite the same, as > the guest will think exactly that, 1 wide screen. Meaning it will put > all the messageboxes exactly in the middle between the two screens, have > the startbar spawn on both screens and what not. So two screens are > handled a tad differently than one wide. > > Is this possible with kvm? > Either simulate a dual head mapping to one wide SDL/VNC display. Or > having two SDL/VNC displays? No. It was brought up before on the qemu list I believe. I think the gist was that qemu didn't support more than one vga card. > > Kind regards, > Axel Kittenberger > > > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCHv7] add mergeable buffers support to vhost_net
This patch adds mergeable receive buffer support to vhost_net. Signed-off-by: David L Stevens diff -ruNp net-next-v0/drivers/vhost/net.c net-next-v7/drivers/vhost/net.c --- net-next-v0/drivers/vhost/net.c 2010-04-24 21:36:54.0 -0700 +++ net-next-v7/drivers/vhost/net.c 2010-04-28 12:26:18.0 -0700 @@ -74,6 +74,23 @@ static int move_iovec_hdr(struct iovec * } return seg; } +/* Copy iovec entries for len bytes from iovec. Return segments used. */ +static int copy_iovec_hdr(const struct iovec *from, struct iovec *to, + size_t len, int iovcount) +{ + int seg = 0; + size_t size; + while (len && seg < iovcount) { + size = min(from->iov_len, len); + to->iov_base = from->iov_base; + to->iov_len = size; + len -= size; + ++from; + ++to; + ++seg; + } + return seg; +} /* Caller must have TX VQ lock */ static void tx_poll_stop(struct vhost_net *net) @@ -109,7 +126,7 @@ static void handle_tx(struct vhost_net * }; size_t len, total_len = 0; int err, wmem; - size_t hdr_size; + size_t vhost_hlen; struct socket *sock = rcu_dereference(vq->private_data); if (!sock) return; @@ -128,13 +145,13 @@ static void handle_tx(struct vhost_net * if (wmem < sock->sk->sk_sndbuf / 2) tx_poll_stop(net); - hdr_size = vq->hdr_size; + vhost_hlen = vq->vhost_hlen; for (;;) { - head = vhost_get_vq_desc(&net->dev, vq, vq->iov, -ARRAY_SIZE(vq->iov), -&out, &in, -NULL, NULL); + head = vhost_get_desc(&net->dev, vq, vq->iov, + ARRAY_SIZE(vq->iov), + &out, &in, + NULL, NULL); /* Nothing new? Wait for eventfd to tell us they refilled. */ if (head == vq->num) { wmem = atomic_read(&sock->sk->sk_wmem_alloc); @@ -155,20 +172,20 @@ static void handle_tx(struct vhost_net * break; } /* Skip header. TODO: support TSO. */ - s = move_iovec_hdr(vq->iov, vq->hdr, hdr_size, out); + s = move_iovec_hdr(vq->iov, vq->hdr, vhost_hlen, out); msg.msg_iovlen = out; len = iov_length(vq->iov, out); /* Sanity check */ if (!len) { vq_err(vq, "Unexpected header len for TX: " "%zd expected %zd\n", - iov_length(vq->hdr, s), hdr_size); + iov_length(vq->hdr, s), vhost_hlen); break; } /* TODO: Check specific error and bomb out unless ENOBUFS? */ err = sock->ops->sendmsg(NULL, sock, &msg, len); if (unlikely(err < 0)) { - vhost_discard_vq_desc(vq); + vhost_discard_desc(vq, 1); tx_poll_start(net, sock); break; } @@ -187,12 +204,25 @@ static void handle_tx(struct vhost_net * unuse_mm(net->dev.mm); } +static int vhost_head_len(struct vhost_virtqueue *vq, struct sock *sk) +{ + struct sk_buff *head; + int len = 0; + + lock_sock(sk); + head = skb_peek(&sk->sk_receive_queue); + if (head) + len = head->len + vq->sock_hlen; + release_sock(sk); + return len; +} + /* Expects to be always run from workqueue - which acts as * read-size critical section for our kind of RCU. */ static void handle_rx(struct vhost_net *net) { struct vhost_virtqueue *vq = &net->dev.vqs[VHOST_NET_VQ_RX]; - unsigned head, out, in, log, s; + unsigned in, log, s; struct vhost_log *vq_log; struct msghdr msg = { .msg_name = NULL, @@ -203,14 +233,14 @@ static void handle_rx(struct vhost_net * .msg_flags = MSG_DONTWAIT, }; - struct virtio_net_hdr hdr = { - .flags = 0, - .gso_type = VIRTIO_NET_HDR_GSO_NONE + struct virtio_net_hdr_mrg_rxbuf hdr = { + .hdr.flags = 0, + .hdr.gso_type = VIRTIO_NET_HDR_GSO_NONE }; size_t len, total_len = 0; - int err; - size_t hdr_size; + int err, headcount, datalen; + size_t vhost_hlen; struct socket *sock = rcu_dereference(vq->private_data); if (!sock || skb_queue_empty(&sock->sk->sk_receive_queue)) return; @@ -218,18 +248,19 @@ static void handle_rx(struct vhost_net * use_mm(net->dev.mm); mutex_lock(&vq->mutex); vhost_disa
[RHEL6.0 PATCH] qemu-kvm: fix crash on reboot with vhost-net
Bugzilla: 585940 Upstream status: applied on qemu-kvm.git, commit 992cc816c42f2e93db033919a9ddbfcd1da4 When vhost-net is disabled on reboot, we set msix mask notifier to NULL to disable further mask/unmask notifications. Code currently tries to pass this NULL to notifier, leading to a crash. The right thing to do is to add explicit APIs to enable/disable notifications. Now when disabling notifications: - if vector is masked, we don't need to notify backend, just disable future notifications - if vector is unmasked, invoke callback to unassign backend, then disable future notifications This patch also polls notifier before closing it, to make sure we don't lose events if poll callback didn't have time to run. Signed-off-by: Michael S. Tsirkin --- hw/msix.c | 40 +++- hw/msix.h |1 + hw/virtio-pci.c |7 +-- 3 files changed, 41 insertions(+), 7 deletions(-) diff --git a/hw/msix.c b/hw/msix.c index 3fcf3a1..94e3981 100644 --- a/hw/msix.c +++ b/hw/msix.c @@ -610,14 +610,44 @@ void msix_unuse_all_vectors(PCIDevice *dev) int msix_set_mask_notifier(PCIDevice *dev, unsigned vector, void *opaque) { +int r; +if (vector >= dev->msix_entries_nr || !dev->msix_entry_used[vector]) +return 0; + +assert(dev->msix_mask_notifier); +assert(opaque); +assert(!dev->msix_mask_notifier_opaque[vector]); + +if (msix_is_masked(dev, vector)) { +return 0; +} +r = dev->msix_mask_notifier(dev, vector, opaque, +msix_is_masked(dev, vector)); +if (r < 0) { +return r; +} +dev->msix_mask_notifier_opaque[vector] = opaque; +return r; +} + +int msix_unset_mask_notifier(PCIDevice *dev, unsigned vector) +{ int r = 0; if (vector >= dev->msix_entries_nr || !dev->msix_entry_used[vector]) return 0; -if (dev->msix_mask_notifier) -r = dev->msix_mask_notifier(dev, vector, opaque, -msix_is_masked(dev, vector)); -if (r >= 0) -dev->msix_mask_notifier_opaque[vector] = opaque; +assert(dev->msix_mask_notifier); +assert(dev->msix_mask_notifier_opaque[vector]); + +if (msix_is_masked(dev, vector)) { +return 0; +} +r = dev->msix_mask_notifier(dev, vector, +dev->msix_mask_notifier_opaque[vector], +msix_is_masked(dev, vector)); +if (r < 0) { +return r; +} +dev->msix_mask_notifier_opaque[vector] = NULL; return r; } diff --git a/hw/msix.h b/hw/msix.h index f167231..6b21ffb 100644 --- a/hw/msix.h +++ b/hw/msix.h @@ -34,4 +34,5 @@ void msix_reset(PCIDevice *dev); extern int msix_supported; int msix_set_mask_notifier(PCIDevice *dev, unsigned vector, void *opaque); +int msix_unset_mask_notifier(PCIDevice *dev, unsigned vector); #endif diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c index cba188c..22f7fa0 100644 --- a/hw/virtio-pci.c +++ b/hw/virtio-pci.c @@ -437,10 +437,13 @@ static int virtio_pci_guest_notifier(void *opaque, int n, bool assign) msix_set_mask_notifier(&proxy->pci_dev, virtio_queue_vector(proxy->vdev, n), vq); } else { -msix_set_mask_notifier(&proxy->pci_dev, - virtio_queue_vector(proxy->vdev, n), NULL); +msix_unset_mask_notifier(&proxy->pci_dev, +virtio_queue_vector(proxy->vdev, n)); qemu_set_fd_handler(event_notifier_get_fd(notifier), NULL, NULL, NULL); +/* Test and clear notifier before closing it, + * in case poll callback didn't have time to run. */ +virtio_pci_guest_notifier_read(vq); event_notifier_cleanup(notifier); } -- 1.7.1.rc1.22.g3163 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Can I simulate a virtual Dual-Head Graphiccard?
Hello, This is a question I was not able to answer with a search. I've been using kvm now quite successfully as server side solution. Now I want to use it on a particular desktop to have a Windows 7 Guest on a native Linux system. Well this desktop has two Screens, and I'm sure its expected to have the Guest also on both screens. Supposevly I could just simulate a very wide Screen and have the host split it (either SDL or VNC). However, this is not quite the same, as the guest will think exactly that, 1 wide screen. Meaning it will put all the messageboxes exactly in the middle between the two screens, have the startbar spawn on both screens and what not. So two screens are handled a tad differently than one wide. Is this possible with kvm? Either simulate a dual head mapping to one wide SDL/VNC display. Or having two SDL/VNC displays? Kind regards, Axel Kittenberger -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv2] qemu-kvm: fix crash on reboot with vhost-net
On Wed, Apr 28, 2010 at 02:43:05PM -0300, Marcelo Tosatti wrote: > > Signed-off-by: Michael S. Tsirkin > > Applied, thanks. Can you tell me what commit id it has pls (for backport to rhel6). -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Regression in vga performance between 0.11.1 and 0.12.1.1
Hi, I noticed that certain guests (for example, Ubuntu 9.04, Ubuntu 9.10, and the Ubuntu 10.04 release candidate) show dramatically (~100x) slower graphical output when running under qemu-kvm-0.12.1.1 than under qemu-kvm-0.11.1. Other guests, notably Windows XP and Windows Vista, run fine under both version of qemu. The regression is still present in qemu-kvm-0.12.3. Here's the information you request when submitting a bug report: * what cpu model (examples: Intel Core Duo, Intel Core 2 Duo, AMD Opteron 2210) The host has two Xeon X5650 cpus. * what kvm version you are using. If you're using git directly, provide the output of 'git describe'. I'm using the kvm included with the stock 2.6.32.12 kernel, and two different qemu-kvms: version 0.11.1, which works fine, and version 0.12.1.1, which has the regression. * the host kernel version I'm running kernel 2.6.32.12 on the host. * what host kernel arch you are using (i386 or x86_64) i386. * what guest you are using, including OS type (Linux, Windows, Solaris, etc.), bitness (32 or 64), kernel version Guests that are affected: Ubuntu 9.04 - kernel 2.6.28-18, 32 bits Ubuntu 9.10 - kernel 2.6.31-20, 32 bits Ubuntu 10.04 - kernel 2.6.32-21, 32-bits Guests that aren't affected: Windows XP SP3, Windows Vista SP2: both 32 bits * the qemu command line you are using to start the guest Here's one that works: DISPLAY=:0.2 ~/qemu0111/bin/qemu-system-x86_64 -m 1536 -soundhw es1370 -smp 4 -usb -usbdevice tablet -vga std -cpu core2duo -snapshot -name "VKoala (r/o)",process="VKoala" -net nic,macaddr=DE:AD:BE:EF:00:08 -net tap,vlan=0,ifname=tap6,script=no,downscript=no /vm/ubuntu-9.10/0006.img Here's one that works, but has drastically slower graphics (the only change is which qemu-system-x86_64 I'm running.) DISPLAY=:0.2 ~/qemu01211/bin/qemu-system-x86_64 -m 1536 -soundhw es1370 -smp 4 -usb -usbdevice tablet -vga std -cpu core2duo -snapshot -name "VKoala (r/o)",process="VKoala" -net nic,macaddr=DE:AD:BE:EF:00:08 -net tap,vlan=0,ifname=tap6,script=no,downscript=no /vm/ubuntu-9.10/0006.img Here are two that work fine with either binary: DISPLAY=:0.2 ~/qemu01211/bin/qemu-system-x86_64 -m 1536 -soundhw es1370 -usb -usbdevice tablet -localtime -vga std -cpu core2duo -snapshot -name "VXP (r/o)",process="VXP" -net nic,macaddr=DE:AD:BE:EF:00:01 -net tap,vlan=0,ifname=tap0,script=no,downscript=no /vm/xp/0024.img DISPLAY=:0.2 ~/qemu01211/bin/qemu-system-x86_64 -m 1536 -soundhw es1370 -smp 2 -usb -usbdevice tablet -localtime -vga std -cpu core2duo -snapshot -name "VVista (r/o)",process="VVista" -net nic,macaddr=DE:AD:BE:EF:00:02 -net tap,vlan=0,ifname=tap1,script=no,downscript=no /vm/vista/0009.img * whether the problem goes away if using the -no-kvm-irqchip or -no-kvm-pit switch. Neither of these switches change the results. * whether the problem also appears with the -no-kvm switch. This switch makes everything run so much slower that I can't tell if it changes the results. Please cc me if you have any questions, as I'm not subscribed to the kvm list. Thanks for working on kvm, it's a great piece of software! Aloha, Adam Greenblatt -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] kvm mmu: reduce 50% memory usage
On Wed, Apr 28, 2010 at 07:57:01PM +0800, Lai Jiangshan wrote: > > I think users will enable tdp when their hardwares support ept or npt. > This patch can reduce about 50% kvm mmu memory usage for they. > > This simple patch use the fact that: > > When sp->role.direct is set, sp->gfns does not contain any essential > information, leaf sptes reachable from this sp are for a continuate > guest physical memory range(a linear range). > So sp->gfns[i](if it was set) equals to sp->gfn + i. (PT_PAGE_TABLE_LEVEL) > Obviously, it is not essential information, we can calculate it when need. > > It means we don't need sp->gfns when sp->role.direct=1, > Thus we can save one page usage for every kvm_mmu_page. > > Note: > Access to sp->gfns must be wrapped by kvm_mmu_page_get_gfn() > or kvm_mmu_page_set_gfn(). > It is only exposed in FNAME(sync_page). > > Signed-off-by: Lai Jiangshan Applied, thanks. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv2] qemu-kvm: fix crash on reboot with vhost-net
On Wed, Apr 28, 2010 at 12:27:38PM +0300, Michael S. Tsirkin wrote: > When vhost-net is disabled on reboot, we set msix mask notifier > to NULL to disable further mask/unmask notifications. > Code currently tries to pass this NULL to notifier, > leading to a crash. The right thing to do is > to add explicit APIs to enable/disable notifications. > Now when disabling notifications: > - if vector is masked, we don't need to notify backend, > just disable future notifications > - if vector is unmasked, invoke callback to unassign backend, > then disable future notifications > > This patch also polls notifier before closing it, > to make sure we don't lose events if poll callback > didn't have time to run. > > Signed-off-by: Michael S. Tsirkin Applied, thanks. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Ignore SRAO MCE if another MCE is being processed
On 04/28/2010 09:08 AM, Huang Ying wrote: There is some test script and document for this in: git://git.kernel.org/pub/scm/utils/cpu/mce/mce-test.git test script is in "kvm" directory, testing document is kvm/README This is a good candidate for kvm-autotest. We'll get little user coverage for this feature, so it's good to get it machine tested regularly. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 0/10] KVM MMU: allow more shadow pages become asynchronous
On Wed, Apr 28, 2010 at 11:54:31AM +0800, Xiao Guangrong wrote: > Changlog v3: > > Those changes all form Avi's suggestion, thanks. > > - use smart way to fix the bug in patch 1 > - remove duplicates code in patch 5 > - check error code and fix forgot release page in patch 9 > - sync shadow pages in a batch instead of one by one > > And, there is one TODO thing: > Marker shadow page as unsync at create time avoid write-protect, > this idea is from Avi: > > |Another interesting case is to create new shadow pages in the unsync state. > |That can help when the guest starts a short lived process: we can avoid write > |protecting its pagetables completely > > I'll send the patch out after this patchset applied. > > Changlog v2: > > - when level is PT_DIRECTORY_LEVEL, the 'offset' should be > 'role.quadrant << 8', thanks Avi for point it out > > - keep invlpg code in paging_tmpl.h address Avi's suggestion > > - split kvm_sync_page() into kvm_sync_page() and kvm_sync_page_transient() > to clarify the code address Avi's suggestion Applied, thanks. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] Re: [PATCH 06/10] kvm: remove explicit kvm_arch_reset_vcpu from kvm_init_vcpu
On Wed, Apr 28, 2010 at 01:22:14PM -0300, Marcelo Tosatti wrote: > On Wed, Apr 28, 2010 at 10:39:06AM -0500, Anthony Liguori wrote: > > On 04/26/2010 12:59 PM, Marcelo Tosatti wrote: > > >This is now done via the initialization's qemu_system_reset call. > > > > > >Signed-off-by: Avi Kivity > > >--- > > > kvm-all.c |1 - > > > 1 files changed, 0 insertions(+), 1 deletions(-) > > > > > >diff --git a/kvm-all.c b/kvm-all.c > > >index 9c8aa7d..eabb097 100644 > > >--- a/kvm-all.c > > >+++ b/kvm-all.c > > >@@ -208,7 +208,6 @@ int kvm_init_vcpu(CPUState *env) > > > ret = kvm_arch_init_vcpu(env); > > > if (ret == 0) { > > > qemu_register_reset(kvm_reset_vcpu, env); > > >-kvm_arch_reset_vcpu(env); > > > } > > > err: > > > return ret; > > > > This breaks -enable-kvm in upstream qemu. No progress is ever made > > in the guest. > > These patches have been autotested, and just confirmed manually that > this patch does not break -enable-kvm (as the reset handler is now > called through the notifier). > > What is your qemu command line? Just drop it please. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] Re: [PATCH 06/10] kvm: remove explicit kvm_arch_reset_vcpu from kvm_init_vcpu
On 04/28/2010 11:22 AM, Marcelo Tosatti wrote: On Wed, Apr 28, 2010 at 10:39:06AM -0500, Anthony Liguori wrote: On 04/26/2010 12:59 PM, Marcelo Tosatti wrote: This is now done via the initialization's qemu_system_reset call. Signed-off-by: Avi Kivity --- kvm-all.c |1 - 1 files changed, 0 insertions(+), 1 deletions(-) diff --git a/kvm-all.c b/kvm-all.c index 9c8aa7d..eabb097 100644 --- a/kvm-all.c +++ b/kvm-all.c @@ -208,7 +208,6 @@ int kvm_init_vcpu(CPUState *env) ret = kvm_arch_init_vcpu(env); if (ret == 0) { qemu_register_reset(kvm_reset_vcpu, env); -kvm_arch_reset_vcpu(env); } err: return ret; This breaks -enable-kvm in upstream qemu. No progress is ever made in the guest. These patches have been autotested, and just confirmed manually that this patch does not break -enable-kvm (as the reset handler is now called through the notifier). What is your qemu command line? x86_64-softmmu/qemu-system-x86_64 -hda ~/images/linux.img -snapshot -enable-kvm -L ~/git/qemu/pc-bios This is with the very latest tip. I can publish a branch if that's helpful. If you have an old tree, I wouldn't be surprised if it didn't show up. Regards, Anthony Liguori -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] Add test for ljmp.
On Tue, Apr 27, 2010 at 03:56:04PM +0300, Gleb Natapov wrote: > Test that ljmp with operand in IO memory works. > > Signed-off-by: Gleb Natapov Applied both, thanks. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Ignore SRAO MCE if another MCE is being processed
On Tue, Apr 27, 2010 at 03:10:49PM +0800, Huang Ying wrote: > In common cases, guest SRAO MCE will cause corresponding poisoned page > be un-mapped in host and SIGBUS be sent to QEMU-KVM, then QEMU-KVM > will relay the MCE to guest OS. > > But it is possible that the poisoned page is accessed in guest after > un-mapped in host and before MCE is relayed to guest OS. So that, the > SRAR SIGBUS is sent to QEMU-KVM before the SRAO SIGBUS, and if > QEMU-KVM relays them to guest OS one by one, guest system may reset, > because the SRAO MCE may be triggered while the SRAR MCE is being > processed. In fact, the SRAO MCE can be ignored in this situation, so > that the guest system is given opportunity to survive. > > Signed-off-by: Huang Ying Applied, thanks. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] Re: [PATCH 06/10] kvm: remove explicit kvm_arch_reset_vcpu from kvm_init_vcpu
On Wed, Apr 28, 2010 at 10:39:06AM -0500, Anthony Liguori wrote: > On 04/26/2010 12:59 PM, Marcelo Tosatti wrote: > >This is now done via the initialization's qemu_system_reset call. > > > >Signed-off-by: Avi Kivity > >--- > > kvm-all.c |1 - > > 1 files changed, 0 insertions(+), 1 deletions(-) > > > >diff --git a/kvm-all.c b/kvm-all.c > >index 9c8aa7d..eabb097 100644 > >--- a/kvm-all.c > >+++ b/kvm-all.c > >@@ -208,7 +208,6 @@ int kvm_init_vcpu(CPUState *env) > > ret = kvm_arch_init_vcpu(env); > > if (ret == 0) { > > qemu_register_reset(kvm_reset_vcpu, env); > >-kvm_arch_reset_vcpu(env); > > } > > err: > > return ret; > > This breaks -enable-kvm in upstream qemu. No progress is ever made > in the guest. These patches have been autotested, and just confirmed manually that this patch does not break -enable-kvm (as the reset handler is now called through the notifier). What is your qemu command line? -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCHv2 07/23] KVM: x86 emulator: add (set|get)_msr callbacks to x86_emulate_ops
Add (set|get)_msr callbacks to x86_emulate_ops instead of calling them directly. Signed-off-by: Gleb Natapov --- arch/x86/include/asm/kvm_emulate.h |2 ++ arch/x86/kvm/emulate.c | 36 ++-- arch/x86/kvm/x86.c |2 ++ 3 files changed, 22 insertions(+), 18 deletions(-) diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h index c37296d..f751657 100644 --- a/arch/x86/include/asm/kvm_emulate.h +++ b/arch/x86/include/asm/kvm_emulate.h @@ -139,6 +139,8 @@ struct x86_emulate_ops { void (*set_rflags)(struct kvm_vcpu *vcpu, unsigned long rflags); int (*get_dr)(int dr, unsigned long *dest, struct kvm_vcpu *vcpu); int (*set_dr)(int dr, unsigned long value, struct kvm_vcpu *vcpu); + int (*set_msr)(struct kvm_vcpu *vcpu, u32 msr_index, u64 data); + int (*get_msr)(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata); }; /* Type, address-of, and value of an instruction's operand. */ diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index c54f381..37b20c3 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -1875,7 +1875,7 @@ setup_syscalls_segments(struct x86_emulate_ctxt *ctxt, } static int -emulate_syscall(struct x86_emulate_ctxt *ctxt) +emulate_syscall(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops) { struct decode_cache *c = &ctxt->decode; struct kvm_segment cs, ss; @@ -1890,7 +1890,7 @@ emulate_syscall(struct x86_emulate_ctxt *ctxt) setup_syscalls_segments(ctxt, &cs, &ss); - kvm_x86_ops->get_msr(ctxt->vcpu, MSR_STAR, &msr_data); + ops->get_msr(ctxt->vcpu, MSR_STAR, &msr_data); msr_data >>= 32; cs.selector = (u16)(msr_data & 0xfffc); ss.selector = (u16)(msr_data + 8); @@ -1907,17 +1907,17 @@ emulate_syscall(struct x86_emulate_ctxt *ctxt) #ifdef CONFIG_X86_64 c->regs[VCPU_REGS_R11] = ctxt->eflags & ~EFLG_RF; - kvm_x86_ops->get_msr(ctxt->vcpu, - ctxt->mode == X86EMUL_MODE_PROT64 ? - MSR_LSTAR : MSR_CSTAR, &msr_data); + ops->get_msr(ctxt->vcpu, +ctxt->mode == X86EMUL_MODE_PROT64 ? +MSR_LSTAR : MSR_CSTAR, &msr_data); c->eip = msr_data; - kvm_x86_ops->get_msr(ctxt->vcpu, MSR_SYSCALL_MASK, &msr_data); + ops->get_msr(ctxt->vcpu, MSR_SYSCALL_MASK, &msr_data); ctxt->eflags &= ~(msr_data | EFLG_RF); #endif } else { /* legacy mode */ - kvm_x86_ops->get_msr(ctxt->vcpu, MSR_STAR, &msr_data); + ops->get_msr(ctxt->vcpu, MSR_STAR, &msr_data); c->eip = (u32)msr_data; ctxt->eflags &= ~(EFLG_VM | EFLG_IF | EFLG_RF); @@ -1927,7 +1927,7 @@ emulate_syscall(struct x86_emulate_ctxt *ctxt) } static int -emulate_sysenter(struct x86_emulate_ctxt *ctxt) +emulate_sysenter(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops) { struct decode_cache *c = &ctxt->decode; struct kvm_segment cs, ss; @@ -1949,7 +1949,7 @@ emulate_sysenter(struct x86_emulate_ctxt *ctxt) setup_syscalls_segments(ctxt, &cs, &ss); - kvm_x86_ops->get_msr(ctxt->vcpu, MSR_IA32_SYSENTER_CS, &msr_data); + ops->get_msr(ctxt->vcpu, MSR_IA32_SYSENTER_CS, &msr_data); switch (ctxt->mode) { case X86EMUL_MODE_PROT32: if ((msr_data & 0xfffc) == 0x0) { @@ -1979,17 +1979,17 @@ emulate_sysenter(struct x86_emulate_ctxt *ctxt) kvm_x86_ops->set_segment(ctxt->vcpu, &cs, VCPU_SREG_CS); kvm_x86_ops->set_segment(ctxt->vcpu, &ss, VCPU_SREG_SS); - kvm_x86_ops->get_msr(ctxt->vcpu, MSR_IA32_SYSENTER_EIP, &msr_data); + ops->get_msr(ctxt->vcpu, MSR_IA32_SYSENTER_EIP, &msr_data); c->eip = msr_data; - kvm_x86_ops->get_msr(ctxt->vcpu, MSR_IA32_SYSENTER_ESP, &msr_data); + ops->get_msr(ctxt->vcpu, MSR_IA32_SYSENTER_ESP, &msr_data); c->regs[VCPU_REGS_RSP] = msr_data; return X86EMUL_CONTINUE; } static int -emulate_sysexit(struct x86_emulate_ctxt *ctxt) +emulate_sysexit(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops) { struct decode_cache *c = &ctxt->decode; struct kvm_segment cs, ss; @@ -2012,7 +2012,7 @@ emulate_sysexit(struct x86_emulate_ctxt *ctxt) cs.dpl = 3; ss.dpl = 3; - kvm_x86_ops->get_msr(ctxt->vcpu, MSR_IA32_SYSENTER_CS, &msr_data); + ops->get_msr(ctxt->vcpu, MSR_IA32_SYSENTER_CS, &msr_data); switch (usermode) { case X86EMUL_MODE_PROT32: cs.selector = (u16)(msr_data + 16); @@ -3099,7 +3099,7 @@ twobyte_insn: } break; case 0x05: /* syscall */ - rc = emulate_syscall(ctxt); + rc = emulate_syscall(ctxt, ops); if (rc != X86EMUL_CONTI
[PATCHv2 05/23] KVM: x86 emulator: handle "far address" source operand.
ljmp/lcall instruction operand contains address and segment. It can be 10 bytes long. Currently we decode it as two different operands. Fix it by introducing new kind of operand that can hold entire far address. Signed-off-by: Gleb Natapov --- arch/x86/include/asm/kvm_emulate.h |6 +++- arch/x86/kvm/emulate.c | 56 --- 2 files changed, 37 insertions(+), 25 deletions(-) diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h index 288cbed..69a64a6 100644 --- a/arch/x86/include/asm/kvm_emulate.h +++ b/arch/x86/include/asm/kvm_emulate.h @@ -143,7 +143,11 @@ struct x86_emulate_ops { struct operand { enum { OP_REG, OP_MEM, OP_IMM, OP_NONE } type; unsigned int bytes; - unsigned long val, orig_val, *ptr; + unsigned long orig_val, *ptr; + union { + unsigned long val; + char valptr[sizeof(unsigned long) + 2]; + }; }; struct fetch_cache { diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index fbc555b..9b19838 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -67,6 +67,8 @@ #define SrcImmUByte (8<<4) /* 8-bit unsigned immediate operand. */ #define SrcImmU (9<<4) /* Immediate operand, unsigned */ #define SrcSI (0xa<<4) /* Source is in the DS:RSI */ +#define SrcImmFAddr (0xb<<4) /* Source is immediate far address */ +#define SrcMemFAddr (0xc<<4) /* Source is far address in memory */ #define SrcMask (0xf<<4) /* Generic ModRM decode. */ #define ModRM (1<<8) @@ -88,10 +90,6 @@ #define Src2CL (1<<29) #define Src2ImmByte (2<<29) #define Src2One (3<<29) -#define Src2Imm16 (4<<29) -#define Src2Mem16 (5<<29) /* Used for Ep encoding. First argument has to be - in memory and second argument is located - immediately after the first one in memory. */ #define Src2Mask(7<<29) enum { @@ -175,7 +173,7 @@ static u32 opcode_table[256] = { /* 0x90 - 0x97 */ DstReg, DstReg, DstReg, DstReg, DstReg, DstReg, DstReg, DstReg, /* 0x98 - 0x9F */ - 0, 0, SrcImm | Src2Imm16 | No64, 0, + 0, 0, SrcImmFAddr | No64, 0, ImplicitOps | Stack, ImplicitOps | Stack, 0, 0, /* 0xA0 - 0xA7 */ ByteOp | DstReg | SrcMem | Mov | MemAbs, DstReg | SrcMem | Mov | MemAbs, @@ -215,7 +213,7 @@ static u32 opcode_table[256] = { ByteOp | SrcImmUByte | DstAcc, SrcImmUByte | DstAcc, /* 0xE8 - 0xEF */ SrcImm | Stack, SrcImm | ImplicitOps, - SrcImmU | Src2Imm16 | No64, SrcImmByte | ImplicitOps, + SrcImmFAddr | No64, SrcImmByte | ImplicitOps, SrcNone | ByteOp | DstAcc, SrcNone | DstAcc, SrcNone | ByteOp | DstAcc, SrcNone | DstAcc, /* 0xF0 - 0xF7 */ @@ -350,7 +348,7 @@ static u32 group_table[] = { [Group5*8] = DstMem | SrcNone | ModRM, DstMem | SrcNone | ModRM, SrcMem | ModRM | Stack, 0, - SrcMem | ModRM | Stack, SrcMem | ModRM | Src2Mem16 | ImplicitOps, + SrcMem | ModRM | Stack, SrcMemFAddr | ModRM | ImplicitOps, SrcMem | ModRM | Stack, 0, [Group7*8] = 0, 0, ModRM | SrcMem | Priv, ModRM | SrcMem | Priv, @@ -576,6 +574,13 @@ static u32 group2_table[] = { (_type)_x; \ }) +#define insn_fetch_arr(_arr, _size, _eip)\ +({ rc = do_insn_fetch(ctxt, ops, (_eip), _arr, (_size)); \ + if (rc != X86EMUL_CONTINUE) \ + goto done; \ + (_eip) += (_size); \ +}) + static inline unsigned long ad_mask(struct decode_cache *c) { return (1UL << (c->ad_bytes << 3)) - 1; @@ -1160,6 +1165,17 @@ done_prefixes: c->regs[VCPU_REGS_RSI]); c->src.val = 0; break; + case SrcImmFAddr: + c->src.type = OP_IMM; + c->src.ptr = (unsigned long *)c->eip; + c->src.bytes = c->op_bytes + 2; + insn_fetch_arr(c->src.valptr, c->src.bytes, c->eip); + break; + case SrcMemFAddr: + c->src.type = OP_MEM; + c->src.ptr = (unsigned long *)c->modrm_ea; + c->src.bytes = c->op_bytes + 2; + break; } /* @@ -1179,22 +1195,10 @@ done_prefixes: c->src2.bytes = 1; c->src2.val = insn_fetch(u8, 1, c->eip); break; - case Src2Imm16: - c->src2.type = OP_IMM; - c->src2.ptr = (unsigned long *)c->eip; - c->src2.bytes = 2; - c->src2.val = insn_fetch(u16, 2, c->eip); - break; case Src2One: c->src2.bytes = 1; c->src2.va
[PATCHv2 06/23] KVM: x86 emulator: add (set|get)_dr callbacks to x86_emulate_ops
Add (set|get)_dr callbacks to x86_emulate_ops instead of calling them directly. Signed-off-by: Gleb Natapov --- arch/x86/include/asm/kvm_emulate.h |2 ++ arch/x86/include/asm/kvm_host.h|4 arch/x86/kvm/emulate.c |7 +-- arch/x86/kvm/x86.c | 12 ++-- 4 files changed, 13 insertions(+), 12 deletions(-) diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h index 69a64a6..c37296d 100644 --- a/arch/x86/include/asm/kvm_emulate.h +++ b/arch/x86/include/asm/kvm_emulate.h @@ -137,6 +137,8 @@ struct x86_emulate_ops { void (*set_cr)(int cr, ulong val, struct kvm_vcpu *vcpu); int (*cpl)(struct kvm_vcpu *vcpu); void (*set_rflags)(struct kvm_vcpu *vcpu, unsigned long rflags); + int (*get_dr)(int dr, unsigned long *dest, struct kvm_vcpu *vcpu); + int (*set_dr)(int dr, unsigned long value, struct kvm_vcpu *vcpu); }; /* Type, address-of, and value of an instruction's operand. */ diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 3f0007b..74cb6ac 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -590,10 +590,6 @@ void kvm_emulate_cpuid(struct kvm_vcpu *vcpu); int kvm_emulate_halt(struct kvm_vcpu *vcpu); int emulate_invlpg(struct kvm_vcpu *vcpu, gva_t address); int emulate_clts(struct kvm_vcpu *vcpu); -int emulator_get_dr(struct x86_emulate_ctxt *ctxt, int dr, - unsigned long *dest); -int emulator_set_dr(struct x86_emulate_ctxt *ctxt, int dr, - unsigned long value); void kvm_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg); int kvm_load_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector, int seg); diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 9b19838..c54f381 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -3132,7 +3132,7 @@ twobyte_insn: kvm_queue_exception(ctxt->vcpu, UD_VECTOR); goto done; } - emulator_get_dr(ctxt, c->modrm_reg, &c->regs[c->modrm_rm]); + ops->get_dr(c->modrm_reg, &c->regs[c->modrm_rm], ctxt->vcpu); c->dst.type = OP_NONE; /* no writeback */ break; case 0x22: /* mov reg, cr */ @@ -3145,7 +3145,10 @@ twobyte_insn: kvm_queue_exception(ctxt->vcpu, UD_VECTOR); goto done; } - emulator_set_dr(ctxt, c->modrm_reg, c->regs[c->modrm_rm]); + + ops->set_dr(c->modrm_reg,c->regs[c->modrm_rm] & + ((ctxt->mode == X86EMUL_MODE_PROT64) ? ~0ULL : ~0U), + ctxt->vcpu); c->dst.type = OP_NONE; /* no writeback */ break; case 0x30: diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 6b2ce1d..c0d6e4c 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3558,16 +3558,14 @@ int emulate_clts(struct kvm_vcpu *vcpu) return X86EMUL_CONTINUE; } -int emulator_get_dr(struct x86_emulate_ctxt *ctxt, int dr, unsigned long *dest) +int emulator_get_dr(int dr, unsigned long *dest, struct kvm_vcpu *vcpu) { - return kvm_get_dr(ctxt->vcpu, dr, dest); + return kvm_get_dr(vcpu, dr, dest); } -int emulator_set_dr(struct x86_emulate_ctxt *ctxt, int dr, unsigned long value) +int emulator_set_dr(int dr, unsigned long value, struct kvm_vcpu *vcpu) { - unsigned long mask = (ctxt->mode == X86EMUL_MODE_PROT64) ? ~0ULL : ~0U; - - return kvm_set_dr(ctxt->vcpu, dr, value & mask); + return kvm_set_dr(vcpu, dr, value); } void kvm_report_emulation_failure(struct kvm_vcpu *vcpu, const char *context) @@ -3749,6 +3747,8 @@ static struct x86_emulate_ops emulate_ops = { .set_cr = emulator_set_cr, .cpl = emulator_get_cpl, .set_rflags = emulator_set_rflags, + .get_dr = emulator_get_dr, + .set_dr = emulator_set_dr, }; static void cache_all_regs(struct kvm_vcpu *vcpu) -- 1.6.5 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCHv2 14/23] KVM: x86 emulator: x86_emulate_insn() return -1 only in case of emulation failure
Currently emulator returns -1 when emulation failed or IO is needed. Caller tries to guess whether emulation failed by looking at other variables. Make it easier for caller to recognise error condition by always returning -1 in case of failure. For this new emulator internal return value X86EMUL_IO_NEEDED is introduced. It is used to distinguish between error condition (which returns X86EMUL_UNHANDLEABLE) and condition that requires IO exit to userspace to continue emulation. Signed-off-by: Gleb Natapov --- arch/x86/include/asm/kvm_emulate.h |1 + arch/x86/kvm/x86.c | 36 ++-- 2 files changed, 19 insertions(+), 18 deletions(-) diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h index 0cf4311..777240d 100644 --- a/arch/x86/include/asm/kvm_emulate.h +++ b/arch/x86/include/asm/kvm_emulate.h @@ -53,6 +53,7 @@ struct x86_emulate_ctxt; #define X86EMUL_PROPAGATE_FAULT 2 /* propagate a generated fault to guest */ #define X86EMUL_RETRY_INSTR 3 /* retry the instruction for some reason */ #define X86EMUL_CMPXCHG_FAILED 4 /* cmpxchg did not see expected value */ +#define X86EMUL_IO_NEEDED 5 /* IO is needed to complete emulation */ struct x86_emulate_ops { /* diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index aa629a0..bbfabbd 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3213,7 +3213,7 @@ static int kvm_read_guest_virt_helper(gva_t addr, void *val, unsigned int bytes, } ret = kvm_read_guest(vcpu->kvm, gpa, data, toread); if (ret < 0) { - r = X86EMUL_UNHANDLEABLE; + r = X86EMUL_IO_NEEDED; goto out; } @@ -3269,7 +3269,7 @@ static int kvm_write_guest_virt_system(gva_t addr, void *val, } ret = kvm_write_guest(vcpu->kvm, gpa, data, towrite); if (ret < 0) { - r = X86EMUL_UNHANDLEABLE; + r = X86EMUL_IO_NEEDED; goto out; } @@ -3329,7 +3329,7 @@ mmio: vcpu->run->mmio.len = vcpu->mmio_size = bytes; vcpu->run->mmio.is_write = vcpu->mmio_is_write = 0; - return X86EMUL_UNHANDLEABLE; + return X86EMUL_IO_NEEDED; } int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa, @@ -3801,8 +3801,6 @@ int emulate_instruction(struct kvm_vcpu *vcpu, */ cache_all_regs(vcpu); - vcpu->mmio_is_write = 0; - if (!(emulation_type & EMULTYPE_NO_DECODE)) { int cs_db, cs_l; kvm_x86_ops->get_cs_db_l_bits(vcpu, &cs_db, &cs_l); @@ -3876,24 +3874,26 @@ restart: return EMULATE_DO_MMIO; } - if (r) { - if (kvm_mmu_unprotect_page_virt(vcpu, cr2)) - goto done; - if (!vcpu->mmio_needed) { - ++vcpu->stat.insn_emulation_fail; - trace_kvm_emulate_insn_failed(vcpu); - kvm_report_emulation_failure(vcpu, "mmio"); - return EMULATE_FAIL; - } + if (vcpu->mmio_needed) { + if (vcpu->mmio_is_write) + vcpu->mmio_needed = 0; return EMULATE_DO_MMIO; } - if (vcpu->mmio_is_write) { - vcpu->mmio_needed = 0; - return EMULATE_DO_MMIO; + if (r) { /* emulation failed */ + /* +* if emulation was due to access to shadowed page table +* and it failed try to unshadow page and re-entetr the +* guest to let CPU execute the instruction. +*/ + if (kvm_mmu_unprotect_page_virt(vcpu, cr2)) + return EMULATE_DONE; + + trace_kvm_emulate_insn_failed(vcpu); + kvm_report_emulation_failure(vcpu, "mmio"); + return EMULATE_FAIL; } -done: if (vcpu->arch.exception.pending) vcpu->arch.emulate_ctxt.restart = false; -- 1.6.5 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCHv2 10/23] KVM: x86 emulator: make set_cr() callback return error if it fails
Make set_cr() callback return error if it fails instead of injecting #GP behind emulator's back. Signed-off-by: Gleb Natapov --- arch/x86/include/asm/kvm_emulate.h |2 +- arch/x86/kvm/emulate.c | 10 ++- arch/x86/kvm/x86.c | 148 ++-- 3 files changed, 84 insertions(+), 76 deletions(-) diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h index df53ba2..6c4f491 100644 --- a/arch/x86/include/asm/kvm_emulate.h +++ b/arch/x86/include/asm/kvm_emulate.h @@ -135,7 +135,7 @@ struct x86_emulate_ops { unsigned long (*get_cached_segment_base)(int seg, struct kvm_vcpu *vcpu); void (*get_gdt)(struct desc_ptr *dt, struct kvm_vcpu *vcpu); ulong (*get_cr)(int cr, struct kvm_vcpu *vcpu); - void (*set_cr)(int cr, ulong val, struct kvm_vcpu *vcpu); + int (*set_cr)(int cr, ulong val, struct kvm_vcpu *vcpu); int (*cpl)(struct kvm_vcpu *vcpu); void (*set_rflags)(struct kvm_vcpu *vcpu, unsigned long rflags); int (*get_dr)(int dr, unsigned long *dest, struct kvm_vcpu *vcpu); diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index deb17a1..9f4b22e 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -2272,7 +2272,10 @@ static int load_state_from_tss32(struct x86_emulate_ctxt *ctxt, struct decode_cache *c = &ctxt->decode; int ret; - ops->set_cr(3, tss->cr3, ctxt->vcpu); + if (ops->set_cr(3, tss->cr3, ctxt->vcpu)) { + kvm_inject_gp(ctxt->vcpu, 0); + return X86EMUL_PROPAGATE_FAULT; + } c->eip = tss->eip; ctxt->eflags = tss->eflags | 2; c->regs[VCPU_REGS_RAX] = tss->eax; @@ -3135,7 +3138,10 @@ twobyte_insn: c->dst.type = OP_NONE; /* no writeback */ break; case 0x22: /* mov reg, cr */ - ops->set_cr(c->modrm_reg, c->modrm_val, ctxt->vcpu); + if (ops->set_cr(c->modrm_reg, c->modrm_val, ctxt->vcpu)) { + kvm_inject_gp(ctxt->vcpu, 0); + goto done; + } c->dst.type = OP_NONE; break; case 0x23: /* mov from reg to dr */ diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 2829e8f..3cb016f 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -414,57 +414,49 @@ out: return changed; } -void kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0) +static int __kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0) { cr0 |= X86_CR0_ET; #ifdef CONFIG_X86_64 - if (cr0 & 0xUL) { - kvm_inject_gp(vcpu, 0); - return; - } + if (cr0 & 0xUL) + return 1; #endif cr0 &= ~CR0_RESERVED_BITS; - if ((cr0 & X86_CR0_NW) && !(cr0 & X86_CR0_CD)) { - kvm_inject_gp(vcpu, 0); - return; - } + if ((cr0 & X86_CR0_NW) && !(cr0 & X86_CR0_CD)) + return 1; - if ((cr0 & X86_CR0_PG) && !(cr0 & X86_CR0_PE)) { - kvm_inject_gp(vcpu, 0); - return; - } + if ((cr0 & X86_CR0_PG) && !(cr0 & X86_CR0_PE)) + return 1; if (!is_paging(vcpu) && (cr0 & X86_CR0_PG)) { #ifdef CONFIG_X86_64 if ((vcpu->arch.efer & EFER_LME)) { int cs_db, cs_l; - if (!is_pae(vcpu)) { - kvm_inject_gp(vcpu, 0); - return; - } + if (!is_pae(vcpu)) + return 1; kvm_x86_ops->get_cs_db_l_bits(vcpu, &cs_db, &cs_l); - if (cs_l) { - kvm_inject_gp(vcpu, 0); - return; - - } + if (cs_l) + return 1; } else #endif - if (is_pae(vcpu) && !load_pdptrs(vcpu, vcpu->arch.cr3)) { - kvm_inject_gp(vcpu, 0); - return; - } - + if (is_pae(vcpu) && !load_pdptrs(vcpu, vcpu->arch.cr3)) + return 1; } kvm_x86_ops->set_cr0(vcpu, cr0); kvm_mmu_reset_context(vcpu); - return; + return 0; +} + +void kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0) +{ + if (__kvm_set_cr0(vcpu, cr0)) + kvm_inject_gp(vcpu, 0); } EXPORT_SYMBOL_GPL(kvm_set_cr0); @@ -474,61 +466,56 @@ void kvm_lmsw(struct kvm_vcpu *vcpu, unsigned long msw) } EXPORT_SYMBOL_GPL(kvm_lmsw); -void kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4) +int __kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4) { unsigned long old_cr4 = kvm_read_cr4(vcpu); unsigned long pdptr_bits = X86_CR4_PGE | X86_CR4_PSE | X86_
[PATCHv2 08/23] KVM: x86 emulator: add get_cached_segment_base() callback to x86_emulate_ops.
On VMX it is expensive to call get_cached_descriptor() just to get segment base since multiple vmcs_reads are done instead of only one. Introduce new call back get_cached_segment_base() for efficiency. Signed-off-by: Gleb Natapov --- arch/x86/include/asm/kvm_emulate.h |1 + arch/x86/kvm/emulate.c | 13 + arch/x86/kvm/x86.c |7 +++ 3 files changed, 9 insertions(+), 12 deletions(-) diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h index f751657..df53ba2 100644 --- a/arch/x86/include/asm/kvm_emulate.h +++ b/arch/x86/include/asm/kvm_emulate.h @@ -132,6 +132,7 @@ struct x86_emulate_ops { int seg, struct kvm_vcpu *vcpu); u16 (*get_segment_selector)(int seg, struct kvm_vcpu *vcpu); void (*set_segment_selector)(u16 sel, int seg, struct kvm_vcpu *vcpu); + unsigned long (*get_cached_segment_base)(int seg, struct kvm_vcpu *vcpu); void (*get_gdt)(struct desc_ptr *dt, struct kvm_vcpu *vcpu); ulong (*get_cr)(int cr, struct kvm_vcpu *vcpu); void (*set_cr)(int cr, ulong val, struct kvm_vcpu *vcpu); diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 37b20c3..41c77a1 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -2097,17 +2097,6 @@ static bool emulator_io_permited(struct x86_emulate_ctxt *ctxt, return true; } -static u32 get_cached_descriptor_base(struct x86_emulate_ctxt *ctxt, - struct x86_emulate_ops *ops, - int seg) -{ - struct desc_struct desc; - if (ops->get_cached_descriptor(&desc, seg, ctxt->vcpu)) - return get_desc_base(&desc); - else - return ~0; -} - static void save_state_to_tss16(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops, struct tss_segment_16 *tss) @@ -2383,7 +2372,7 @@ static int emulator_do_task_switch(struct x86_emulate_ctxt *ctxt, int ret; u16 old_tss_sel = ops->get_segment_selector(VCPU_SREG_TR, ctxt->vcpu); ulong old_tss_base = - get_cached_descriptor_base(ctxt, ops, VCPU_SREG_TR); + ops->get_cached_segment_base(VCPU_SREG_TR, ctxt->vcpu); u32 desc_limit; /* FIXME: old_tss_base == ~0 ? */ diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 69db0c8..2829e8f 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3652,6 +3652,12 @@ static void emulator_get_gdt(struct desc_ptr *dt, struct kvm_vcpu *vcpu) kvm_x86_ops->get_gdt(vcpu, dt); } +static unsigned long emulator_get_cached_segment_base(int seg, + struct kvm_vcpu *vcpu) +{ + return get_segment_base(vcpu, seg); +} + static bool emulator_get_cached_descriptor(struct desc_struct *desc, int seg, struct kvm_vcpu *vcpu) { @@ -3742,6 +3748,7 @@ static struct x86_emulate_ops emulate_ops = { .set_cached_descriptor = emulator_set_cached_descriptor, .get_segment_selector = emulator_get_segment_selector, .set_segment_selector = emulator_set_segment_selector, + .get_cached_segment_base = emulator_get_cached_segment_base, .get_gdt = emulator_get_gdt, .get_cr = emulator_get_cr, .set_cr = emulator_set_cr, -- 1.6.5 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCHv2 11/23] KVM: x86 emulator: make (get|set)_dr() callback return error if it fails
Make (get|set)_dr() callback return error if it fails instead of injecting exception behind emulator's back. Signed-off-by: Gleb Natapov --- arch/x86/kvm/emulate.c | 11 ++-- arch/x86/kvm/x86.c | 63 --- 2 files changed, 45 insertions(+), 29 deletions(-) diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 9f4b22e..df4e9a0 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -3151,9 +3151,14 @@ twobyte_insn: goto done; } - ops->set_dr(c->modrm_reg,c->regs[c->modrm_rm] & - ((ctxt->mode == X86EMUL_MODE_PROT64) ? ~0ULL : ~0U), - ctxt->vcpu); + if (ops->set_dr(c->modrm_reg, c->regs[c->modrm_rm] & + ((ctxt->mode == X86EMUL_MODE_PROT64) ? +~0ULL : ~0U), ctxt->vcpu) < 0) { + /* #UD condition is already handled by the code above */ + kvm_inject_gp(ctxt->vcpu, 0); + goto done; + } + c->dst.type = OP_NONE; /* no writeback */ break; case 0x30: diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 3cb016f..7b4d113 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -573,7 +573,7 @@ unsigned long kvm_get_cr8(struct kvm_vcpu *vcpu) } EXPORT_SYMBOL_GPL(kvm_get_cr8); -int kvm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long val) +static int __kvm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long val) { switch (dr) { case 0 ... 3: @@ -582,29 +582,21 @@ int kvm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long val) vcpu->arch.eff_db[dr] = val; break; case 4: - if (kvm_read_cr4_bits(vcpu, X86_CR4_DE)) { - kvm_queue_exception(vcpu, UD_VECTOR); - return 1; - } + if (kvm_read_cr4_bits(vcpu, X86_CR4_DE)) + return 1; /* #UD */ /* fall through */ case 6: - if (val & 0xULL) { - kvm_inject_gp(vcpu, 0); - return 1; - } + if (val & 0xULL) + return -1; /* #GP */ vcpu->arch.dr6 = (val & DR6_VOLATILE) | DR6_FIXED_1; break; case 5: - if (kvm_read_cr4_bits(vcpu, X86_CR4_DE)) { - kvm_queue_exception(vcpu, UD_VECTOR); - return 1; - } + if (kvm_read_cr4_bits(vcpu, X86_CR4_DE)) + return 1; /* #UD */ /* fall through */ default: /* 7 */ - if (val & 0xULL) { - kvm_inject_gp(vcpu, 0); - return 1; - } + if (val & 0xULL) + return -1; /* #GP */ vcpu->arch.dr7 = (val & DR7_VOLATILE) | DR7_FIXED_1; if (!(vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP)) { kvm_x86_ops->set_dr7(vcpu, vcpu->arch.dr7); @@ -615,28 +607,37 @@ int kvm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long val) return 0; } + +int kvm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long val) +{ + int res; + + res = __kvm_set_dr(vcpu, dr, val); + if (res > 0) + kvm_queue_exception(vcpu, UD_VECTOR); + else if (res < 0) + kvm_inject_gp(vcpu, 0); + + return res; +} EXPORT_SYMBOL_GPL(kvm_set_dr); -int kvm_get_dr(struct kvm_vcpu *vcpu, int dr, unsigned long *val) +static int _kvm_get_dr(struct kvm_vcpu *vcpu, int dr, unsigned long *val) { switch (dr) { case 0 ... 3: *val = vcpu->arch.db[dr]; break; case 4: - if (kvm_read_cr4_bits(vcpu, X86_CR4_DE)) { - kvm_queue_exception(vcpu, UD_VECTOR); + if (kvm_read_cr4_bits(vcpu, X86_CR4_DE)) return 1; - } /* fall through */ case 6: *val = vcpu->arch.dr6; break; case 5: - if (kvm_read_cr4_bits(vcpu, X86_CR4_DE)) { - kvm_queue_exception(vcpu, UD_VECTOR); + if (kvm_read_cr4_bits(vcpu, X86_CR4_DE)) return 1; - } /* fall through */ default: /* 7 */ *val = vcpu->arch.dr7; @@ -645,6 +646,15 @@ int kvm_get_dr(struct kvm_vcpu *vcpu, int dr, unsigned long *val) return 0; } + +int kvm_get_dr(struct kvm_vcpu *vcpu, int dr, unsigned long *val) +{ + if (_kvm_get_dr(vcpu, dr, val)) { + kvm_queue_exception(vcpu, UD_VECTOR); +
[PATCHv2 09/23] KVM: x86 emulator: cleanup some direct calls into kvm to use existing callbacks
Use callbacks from x86_emulate_ops to access segments instead of calling into kvm directly. Signed-off-by: Gleb Natapov --- arch/x86/kvm/emulate.c | 200 +--- 1 files changed, 105 insertions(+), 95 deletions(-) diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 41c77a1..deb17a1 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -622,31 +622,35 @@ static void set_seg_override(struct decode_cache *c, int seg) c->seg_override = seg; } -static unsigned long seg_base(struct x86_emulate_ctxt *ctxt, int seg) +static unsigned long seg_base(struct x86_emulate_ctxt *ctxt, + struct x86_emulate_ops *ops, int seg) { if (ctxt->mode == X86EMUL_MODE_PROT64 && seg < VCPU_SREG_FS) return 0; - return kvm_x86_ops->get_segment_base(ctxt->vcpu, seg); + return ops->get_cached_segment_base(seg, ctxt->vcpu); } static unsigned long seg_override_base(struct x86_emulate_ctxt *ctxt, + struct x86_emulate_ops *ops, struct decode_cache *c) { if (!c->has_seg_override) return 0; - return seg_base(ctxt, c->seg_override); + return seg_base(ctxt, ops, c->seg_override); } -static unsigned long es_base(struct x86_emulate_ctxt *ctxt) +static unsigned long es_base(struct x86_emulate_ctxt *ctxt, +struct x86_emulate_ops *ops) { - return seg_base(ctxt, VCPU_SREG_ES); + return seg_base(ctxt, ops, VCPU_SREG_ES); } -static unsigned long ss_base(struct x86_emulate_ctxt *ctxt) +static unsigned long ss_base(struct x86_emulate_ctxt *ctxt, +struct x86_emulate_ops *ops) { - return seg_base(ctxt, VCPU_SREG_SS); + return seg_base(ctxt, ops, VCPU_SREG_SS); } static int do_fetch_insn_byte(struct x86_emulate_ctxt *ctxt, @@ -941,7 +945,7 @@ x86_decode_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops) memset(c, 0, sizeof(struct decode_cache)); c->eip = ctxt->eip; c->fetch.start = c->fetch.end = c->eip; - ctxt->cs_base = seg_base(ctxt, VCPU_SREG_CS); + ctxt->cs_base = seg_base(ctxt, ops, VCPU_SREG_CS); memcpy(c->regs, ctxt->vcpu->arch.regs, sizeof c->regs); switch (mode) { @@ -1065,7 +1069,7 @@ done_prefixes: set_seg_override(c, VCPU_SREG_DS); if (!(!c->twobyte && c->b == 0x8d)) - c->modrm_ea += seg_override_base(ctxt, c); + c->modrm_ea += seg_override_base(ctxt, ops, c); if (c->ad_bytes != 8) c->modrm_ea = (u32)c->modrm_ea; @@ -1161,7 +1165,7 @@ done_prefixes: c->src.type = OP_MEM; c->src.bytes = (c->d & ByteOp) ? 1 : c->op_bytes; c->src.ptr = (unsigned long *) - register_address(c, seg_override_base(ctxt, c), + register_address(c, seg_override_base(ctxt, ops, c), c->regs[VCPU_REGS_RSI]); c->src.val = 0; break; @@ -1257,7 +1261,7 @@ done_prefixes: c->dst.type = OP_MEM; c->dst.bytes = (c->d & ByteOp) ? 1 : c->op_bytes; c->dst.ptr = (unsigned long *) - register_address(c, es_base(ctxt), + register_address(c, es_base(ctxt, ops), c->regs[VCPU_REGS_RDI]); c->dst.val = 0; break; @@ -1516,7 +1520,8 @@ exception: return X86EMUL_PROPAGATE_FAULT; } -static inline void emulate_push(struct x86_emulate_ctxt *ctxt) +static inline void emulate_push(struct x86_emulate_ctxt *ctxt, + struct x86_emulate_ops *ops) { struct decode_cache *c = &ctxt->decode; @@ -1524,7 +1529,7 @@ static inline void emulate_push(struct x86_emulate_ctxt *ctxt) c->dst.bytes = c->op_bytes; c->dst.val = c->src.val; register_address_increment(c, &c->regs[VCPU_REGS_RSP], -c->op_bytes); - c->dst.ptr = (void *) register_address(c, ss_base(ctxt), + c->dst.ptr = (void *) register_address(c, ss_base(ctxt, ops), c->regs[VCPU_REGS_RSP]); } @@ -1535,7 +1540,7 @@ static int emulate_pop(struct x86_emulate_ctxt *ctxt, struct decode_cache *c = &ctxt->decode; int rc; - rc = read_emulated(ctxt, ops, register_address(c, ss_base(ctxt), + rc = read_emulated(ctxt, ops, register_address(c, ss_base(ctxt, ops), c->regs[VCPU_REGS_RSP]), dest, len); if (rc != X86EMUL_CONTINUE) @@ -1588,15 +1593,14 @@ static int emulate_popf(struct x86_emulate_ctxt *ctxt, return rc; } -static void emulate_push_sreg(struct x86_emulate_ctxt *ctxt, int
[PATCHv2 18/23] KVM: x86 emulator: advance RIP outside x86 emulator code
Return new RIP as part of instruction emulation result instead of updating KVM's RIP from x86 emulator code. Signed-off-by: Gleb Natapov --- arch/x86/kvm/emulate.c |7 --- arch/x86/kvm/x86.c |4 +++- 2 files changed, 7 insertions(+), 4 deletions(-) diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 83d8fe4..8ff6e82 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -2496,8 +2496,9 @@ int emulator_task_switch(struct x86_emulate_ctxt *ctxt, if (rc == X86EMUL_CONTINUE) { memcpy(ctxt->vcpu->arch.regs, c->regs, sizeof c->regs); - kvm_rip_write(ctxt->vcpu, c->eip); rc = writeback(ctxt, ops); + if (rc == X86EMUL_CONTINUE) + ctxt->eip = c->eip; } return (rc == X86EMUL_UNHANDLEABLE) ? -1 : 0; @@ -2554,7 +2555,7 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops) if (address_mask(c, c->regs[VCPU_REGS_RCX]) == 0) { string_done: ctxt->restart = false; - kvm_rip_write(ctxt->vcpu, c->eip); + ctxt->eip = c->eip; goto done; } /* The second termination condition only applies for REPE @@ -3032,7 +3033,7 @@ writeback: ctxt->decode.mem_read.end = 0; /* Commit shadow register state. */ memcpy(ctxt->vcpu->arch.regs, c->regs, sizeof c->regs); - kvm_rip_write(ctxt->vcpu, c->eip); + ctxt->eip = c->eip; ops->set_rflags(ctxt->vcpu, ctxt->eflags); done: diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 445769b..9429ab1 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3879,6 +3879,7 @@ restart: shadow_mask = vcpu->arch.emulate_ctxt.interruptibility; kvm_x86_ops->set_interrupt_shadow(vcpu, shadow_mask); + kvm_rip_write(vcpu, vcpu->arch.emulate_ctxt.eip); if (vcpu->arch.pio.count) { if (!vcpu->arch.pio.in) @@ -4882,7 +4883,8 @@ int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int reason, if (ret) return EMULATE_FAIL; - + + kvm_rip_write(vcpu, vcpu->arch.emulate_ctxt.eip); kvm_x86_ops->set_rflags(vcpu, vcpu->arch.emulate_ctxt.eflags); return EMULATE_DONE; } -- 1.6.5 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCHv2 12/23] KVM: x86 emulator: fix X86EMUL_RETRY_INSTR and X86EMUL_CMPXCHG_FAILED values
Currently X86EMUL_PROPAGATE_FAULT, X86EMUL_RETRY_INSTR and X86EMUL_CMPXCHG_FAILED have the same value so caller cannot distinguish why function such as emulator_cmpxchg_emulated() (which can return both X86EMUL_PROPAGATE_FAULT and X86EMUL_CMPXCHG_FAILED) failed. Signed-off-by: Gleb Natapov --- arch/x86/include/asm/kvm_emulate.h |5 +++-- 1 files changed, 3 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h index 6c4f491..0cf4311 100644 --- a/arch/x86/include/asm/kvm_emulate.h +++ b/arch/x86/include/asm/kvm_emulate.h @@ -51,8 +51,9 @@ struct x86_emulate_ctxt; #define X86EMUL_UNHANDLEABLE1 /* Terminate emulation but return success to the caller. */ #define X86EMUL_PROPAGATE_FAULT 2 /* propagate a generated fault to guest */ -#define X86EMUL_RETRY_INSTR 2 /* retry the instruction for some reason */ -#define X86EMUL_CMPXCHG_FAILED 2 /* cmpxchg did not see expected value */ +#define X86EMUL_RETRY_INSTR 3 /* retry the instruction for some reason */ +#define X86EMUL_CMPXCHG_FAILED 4 /* cmpxchg did not see expected value */ + struct x86_emulate_ops { /* * read_std: Read bytes of standard (non-emulated/special) memory. -- 1.6.5 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCHv2 13/23] KVM: fill in run->mmio details in (read|write)_emulated function.
Fill in run->mmio details in (read|write)_emulated function just like pio does. There is no point in filling only vcpu fields there just to copy them into vcpu->run a little bit later. Signed-off-by: Gleb Natapov --- arch/x86/kvm/x86.c | 25 + 1 files changed, 9 insertions(+), 16 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 7b4d113..aa629a0 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3324,9 +3324,10 @@ mmio: trace_kvm_mmio(KVM_TRACE_MMIO_READ_UNSATISFIED, bytes, gpa, 0); vcpu->mmio_needed = 1; - vcpu->mmio_phys_addr = gpa; - vcpu->mmio_size = bytes; - vcpu->mmio_is_write = 0; + vcpu->run->exit_reason = KVM_EXIT_MMIO; + vcpu->run->mmio.phys_addr = vcpu->mmio_phys_addr = gpa; + vcpu->run->mmio.len = vcpu->mmio_size = bytes; + vcpu->run->mmio.is_write = vcpu->mmio_is_write = 0; return X86EMUL_UNHANDLEABLE; } @@ -3374,10 +3375,11 @@ mmio: return X86EMUL_CONTINUE; vcpu->mmio_needed = 1; - vcpu->mmio_phys_addr = gpa; - vcpu->mmio_size = bytes; - vcpu->mmio_is_write = 1; - memcpy(vcpu->mmio_data, val, bytes); + vcpu->run->exit_reason = KVM_EXIT_MMIO; + vcpu->run->mmio.phys_addr = vcpu->mmio_phys_addr = gpa; + vcpu->run->mmio.len = vcpu->mmio_size = bytes; + vcpu->run->mmio.is_write = vcpu->mmio_is_write = 1; + memcpy(vcpu->run->mmio.data, val, bytes); return X86EMUL_CONTINUE; } @@ -3788,7 +3790,6 @@ int emulate_instruction(struct kvm_vcpu *vcpu, { int r, shadow_mask; struct decode_cache *c; - struct kvm_run *run = vcpu->run; kvm_clear_exception_queue(vcpu); vcpu->arch.mmio_fault_cr2 = cr2; @@ -3875,14 +3876,6 @@ restart: return EMULATE_DO_MMIO; } - if (r || vcpu->mmio_is_write) { - run->exit_reason = KVM_EXIT_MMIO; - run->mmio.phys_addr = vcpu->mmio_phys_addr; - memcpy(run->mmio.data, vcpu->mmio_data, 8); - run->mmio.len = vcpu->mmio_size; - run->mmio.is_write = vcpu->mmio_is_write; - } - if (r) { if (kvm_mmu_unprotect_page_virt(vcpu, cr2)) goto done; -- 1.6.5 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCHv2 19/23] KVM: x86 emulator: set RFLAGS outside x86 emulator code.
Removes the need for set_flags() callback. Signed-off-by: Gleb Natapov --- arch/x86/include/asm/kvm_emulate.h |1 - arch/x86/kvm/emulate.c |1 - arch/x86/kvm/x86.c |7 +-- 3 files changed, 1 insertions(+), 8 deletions(-) diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h index b7e00cb..a87d95f 100644 --- a/arch/x86/include/asm/kvm_emulate.h +++ b/arch/x86/include/asm/kvm_emulate.h @@ -142,7 +142,6 @@ struct x86_emulate_ops { ulong (*get_cr)(int cr, struct kvm_vcpu *vcpu); int (*set_cr)(int cr, ulong val, struct kvm_vcpu *vcpu); int (*cpl)(struct kvm_vcpu *vcpu); - void (*set_rflags)(struct kvm_vcpu *vcpu, unsigned long rflags); int (*get_dr)(int dr, unsigned long *dest, struct kvm_vcpu *vcpu); int (*set_dr)(int dr, unsigned long value, struct kvm_vcpu *vcpu); int (*set_msr)(struct kvm_vcpu *vcpu, u32 msr_index, u64 data); diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 8ff6e82..eee5b4d 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -3034,7 +3034,6 @@ writeback: /* Commit shadow register state. */ memcpy(ctxt->vcpu->arch.regs, c->regs, sizeof c->regs); ctxt->eip = c->eip; - ops->set_rflags(ctxt->vcpu, ctxt->eflags); done: return (rc == X86EMUL_UNHANDLEABLE) ? -1 : 0; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 9429ab1..344d9dd 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3744,11 +3744,6 @@ static void emulator_set_segment_selector(u16 sel, int seg, kvm_set_segment(vcpu, &kvm_seg, seg); } -static void emulator_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags) -{ - kvm_x86_ops->set_rflags(vcpu, rflags); -} - static struct x86_emulate_ops emulate_ops = { .read_std= kvm_read_guest_virt_system, .write_std = kvm_write_guest_virt_system, @@ -3767,7 +3762,6 @@ static struct x86_emulate_ops emulate_ops = { .get_cr = emulator_get_cr, .set_cr = emulator_set_cr, .cpl = emulator_get_cpl, - .set_rflags = emulator_set_rflags, .get_dr = emulator_get_dr, .set_dr = emulator_set_dr, .set_msr = kvm_set_msr, @@ -3879,6 +3873,7 @@ restart: shadow_mask = vcpu->arch.emulate_ctxt.interruptibility; kvm_x86_ops->set_interrupt_shadow(vcpu, shadow_mask); + kvm_x86_ops->set_rflags(vcpu, vcpu->arch.emulate_ctxt.eflags); kvm_rip_write(vcpu, vcpu->arch.emulate_ctxt.eip); if (vcpu->arch.pio.count) { -- 1.6.5 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCHv2 17/23] KVM: handle emulation failure case first.
If emulation failed return immediately. Signed-off-by: Gleb Natapov --- arch/x86/kvm/x86.c | 31 +++ 1 files changed, 15 insertions(+), 16 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 696b34b..445769b 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3862,22 +3862,6 @@ int emulate_instruction(struct kvm_vcpu *vcpu, restart: r = x86_emulate_insn(&vcpu->arch.emulate_ctxt, &emulate_ops); - shadow_mask = vcpu->arch.emulate_ctxt.interruptibility; - - if (r == 0) - kvm_x86_ops->set_interrupt_shadow(vcpu, shadow_mask); - - if (vcpu->arch.pio.count) { - if (!vcpu->arch.pio.in) - vcpu->arch.pio.count = 0; - return EMULATE_DO_MMIO; - } - - if (vcpu->mmio_needed) { - if (vcpu->mmio_is_write) - vcpu->mmio_needed = 0; - return EMULATE_DO_MMIO; - } if (r) { /* emulation failed */ /* @@ -3893,6 +3877,21 @@ restart: return EMULATE_FAIL; } + shadow_mask = vcpu->arch.emulate_ctxt.interruptibility; + kvm_x86_ops->set_interrupt_shadow(vcpu, shadow_mask); + + if (vcpu->arch.pio.count) { + if (!vcpu->arch.pio.in) + vcpu->arch.pio.count = 0; + return EMULATE_DO_MMIO; + } + + if (vcpu->mmio_needed) { + if (vcpu->mmio_is_write) + vcpu->mmio_needed = 0; + return EMULATE_DO_MMIO; + } + if (vcpu->arch.exception.pending) vcpu->arch.emulate_ctxt.restart = false; -- 1.6.5 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCHv2 15/23] KVM: remove export of emulator_write_emulated().
It is not called directly outside of the file it's defined in anymore. Signed-off-by: Gleb Natapov --- arch/x86/include/asm/kvm_host.h |5 - arch/x86/kvm/x86.c |1 - 2 files changed, 0 insertions(+), 6 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 74cb6ac..ed48904 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -627,11 +627,6 @@ void kvm_inject_nmi(struct kvm_vcpu *vcpu); void fx_init(struct kvm_vcpu *vcpu); -int emulator_write_emulated(unsigned long addr, - const void *val, - unsigned int bytes, - struct kvm_vcpu *vcpu); - void kvm_mmu_flush_tlb(struct kvm_vcpu *vcpu); void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new, int bytes, diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index bbfabbd..224b98d 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3403,7 +3403,6 @@ int emulator_write_emulated(unsigned long addr, } return emulator_write_emulated_onepage(addr, val, bytes, vcpu); } -EXPORT_SYMBOL_GPL(emulator_write_emulated); #define CMPXCHG_TYPE(t, ptr, old, new) \ (cmpxchg((t *)(ptr), *(t *)(old), *(t *)(new)) == *(t *)(old)) -- 1.6.5 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCHv2 22/23] KVM: x86 emulator: move interruptibility state tracking out of emulator
Emulator shouldn't access vcpu directly. Signed-off-by: Gleb Natapov --- arch/x86/kvm/emulate.c | 19 ++- arch/x86/kvm/x86.c | 20 +--- 2 files changed, 19 insertions(+), 20 deletions(-) diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 4cc8368..e2def3b 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -1843,20 +1843,6 @@ static inline int writeback(struct x86_emulate_ctxt *ctxt, return X86EMUL_CONTINUE; } -static void toggle_interruptibility(struct x86_emulate_ctxt *ctxt, u32 mask) -{ - u32 int_shadow = kvm_x86_ops->get_interrupt_shadow(ctxt->vcpu, mask); - /* -* an sti; sti; sequence only disable interrupts for the first -* instruction. So, if the last instruction, be it emulated or -* not, left the system with the INT_STI flag enabled, it -* means that the last instruction is an sti. We should not -* leave the flag on in this case. The same goes for mov ss -*/ - if (!(int_shadow & mask)) - ctxt->interruptibility = mask; -} - static inline void setup_syscalls_segments(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops, struct desc_struct *cs, @@ -2516,7 +2502,6 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops) int rc = X86EMUL_CONTINUE; int saved_dst_type = c->dst.type; - ctxt->interruptibility = 0; ctxt->decode.mem_read.pos = 0; if (ctxt->mode == X86EMUL_MODE_PROT64 && (c->d & No64)) { @@ -2789,7 +2774,7 @@ special_insn: } if (c->modrm_reg == VCPU_SREG_SS) - toggle_interruptibility(ctxt, KVM_X86_SHADOW_INT_MOV_SS); + ctxt->interruptibility = KVM_X86_SHADOW_INT_MOV_SS; rc = load_segment_descriptor(ctxt, ops, sel, c->modrm_reg); @@ -2958,7 +2943,7 @@ special_insn: if (emulator_bad_iopl(ctxt, ops)) kvm_inject_gp(ctxt->vcpu, 0); else { - toggle_interruptibility(ctxt, KVM_X86_SHADOW_INT_STI); + ctxt->interruptibility = KVM_X86_SHADOW_INT_STI; ctxt->eflags |= X86_EFLAGS_IF; c->dst.type = OP_NONE; /* Disable writeback. */ } diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 5fda84e..f7e8732 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3776,12 +3776,26 @@ static void cache_all_regs(struct kvm_vcpu *vcpu) vcpu->arch.regs_dirty = ~0; } +static void toggle_interruptibility(struct kvm_vcpu *vcpu, u32 mask) +{ + u32 int_shadow = kvm_x86_ops->get_interrupt_shadow(vcpu, mask); + /* +* an sti; sti; sequence only disable interrupts for the first +* instruction. So, if the last instruction, be it emulated or +* not, left the system with the INT_STI flag enabled, it +* means that the last instruction is an sti. We should not +* leave the flag on in this case. The same goes for mov ss +*/ + if (!(int_shadow & mask)) + kvm_x86_ops->set_interrupt_shadow(vcpu, mask); +} + int emulate_instruction(struct kvm_vcpu *vcpu, unsigned long cr2, u16 error_code, int emulation_type) { - int r, shadow_mask; + int r; struct decode_cache *c = &vcpu->arch.emulate_ctxt.decode; kvm_clear_exception_queue(vcpu); @@ -3809,6 +3823,7 @@ int emulate_instruction(struct kvm_vcpu *vcpu, ? X86EMUL_MODE_PROT32 : X86EMUL_MODE_PROT16; memset(c, 0, sizeof(struct decode_cache)); memcpy(c->regs, vcpu->arch.regs, sizeof c->regs); + vcpu->arch.emulate_ctxt.interruptibility = 0; r = x86_decode_insn(&vcpu->arch.emulate_ctxt, &emulate_ops); trace_kvm_emulate_insn_start(vcpu); @@ -3876,8 +3891,7 @@ restart: return EMULATE_FAIL; } - shadow_mask = vcpu->arch.emulate_ctxt.interruptibility; - kvm_x86_ops->set_interrupt_shadow(vcpu, shadow_mask); + toggle_interruptibility(vcpu, vcpu->arch.emulate_ctxt.interruptibility); kvm_x86_ops->set_rflags(vcpu, vcpu->arch.emulate_ctxt.eflags); memcpy(vcpu->arch.regs, c->regs, sizeof c->regs); kvm_rip_write(vcpu, vcpu->arch.emulate_ctxt.eip); -- 1.6.5 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCHv2 16/23] KVM: do not inject #PF in (read|write)_emulated() callbacks
Return error to x86 emulator instead of injection exception behind its back. Signed-off-by: Gleb Natapov --- arch/x86/include/asm/kvm_emulate.h |3 +++ arch/x86/kvm/emulate.c | 12 +++- arch/x86/kvm/x86.c | 28 ++-- 3 files changed, 28 insertions(+), 15 deletions(-) diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h index 777240d..b7e00cb 100644 --- a/arch/x86/include/asm/kvm_emulate.h +++ b/arch/x86/include/asm/kvm_emulate.h @@ -94,6 +94,7 @@ struct x86_emulate_ops { int (*read_emulated)(unsigned long addr, void *val, unsigned int bytes, +unsigned int *error, struct kvm_vcpu *vcpu); /* @@ -106,6 +107,7 @@ struct x86_emulate_ops { int (*write_emulated)(unsigned long addr, const void *val, unsigned int bytes, + unsigned int *error, struct kvm_vcpu *vcpu); /* @@ -120,6 +122,7 @@ struct x86_emulate_ops { const void *old, const void *new, unsigned int bytes, + unsigned int *error, struct kvm_vcpu *vcpu); int (*pio_in_emulated)(int size, unsigned short port, void *val, diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index df4e9a0..83d8fe4 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -1277,6 +1277,7 @@ static int read_emulated(struct x86_emulate_ctxt *ctxt, { int rc; struct read_cache *mc = &ctxt->decode.mem_read; + u32 err; while (size) { int n = min(size, 8u); @@ -1284,7 +1285,10 @@ static int read_emulated(struct x86_emulate_ctxt *ctxt, if (mc->pos < mc->end) goto read_cached; - rc = ops->read_emulated(addr, mc->data + mc->end, n, ctxt->vcpu); + rc = ops->read_emulated(addr, mc->data + mc->end, n, &err, + ctxt->vcpu); + if (rc == X86EMUL_PROPAGATE_FAULT) + kvm_inject_page_fault(ctxt->vcpu, addr, err); if (rc != X86EMUL_CONTINUE) return rc; mc->end += n; @@ -1789,6 +1793,7 @@ static inline int writeback(struct x86_emulate_ctxt *ctxt, { int rc; struct decode_cache *c = &ctxt->decode; + u32 err; switch (c->dst.type) { case OP_REG: @@ -1817,13 +1822,18 @@ static inline int writeback(struct x86_emulate_ctxt *ctxt, &c->dst.orig_val, &c->dst.val, c->dst.bytes, + &err, ctxt->vcpu); else rc = ops->write_emulated( (unsigned long)c->dst.ptr, &c->dst.val, c->dst.bytes, + &err, ctxt->vcpu); + if (rc == X86EMUL_PROPAGATE_FAULT) + kvm_inject_page_fault(ctxt->vcpu, + (unsigned long)c->dst.ptr, err); if (rc != X86EMUL_CONTINUE) return rc; break; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 224b98d..696b34b 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3284,10 +3284,10 @@ out: static int emulator_read_emulated(unsigned long addr, void *val, unsigned int bytes, + unsigned int *error_code, struct kvm_vcpu *vcpu) { gpa_t gpa; - u32 error_code; if (vcpu->mmio_read_completed) { memcpy(val, vcpu->mmio_data, bytes); @@ -3297,12 +3297,10 @@ static int emulator_read_emulated(unsigned long addr, return X86EMUL_CONTINUE; } - gpa = kvm_mmu_gva_to_gpa_read(vcpu, addr, &error_code); + gpa = kvm_mmu_gva_to_gpa_read(vcpu, addr, error_code); - if (gpa == UNMAPPED_GVA) { - kvm_inject_page_fault(vcpu, addr, error_code); + if (gpa == UNMAPPED_GVA) return X86EMUL_PROPAGATE_FAULT; - } /* For APIC access vmexit */ if ((gpa & PAGE_MASK) == APIC_DEFAULT_PHYS_BASE) @@ -3347,17 +3345,15 @@ int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa, static int emulator_write_emulated_onepage(unsigned long addr,
[PATCHv2 21/23] KVM: x86 exmulator: handle shadowed registers outside emulator.
Emulator shouldn't access vcpu directly. Signed-off-by: Gleb Natapov --- arch/x86/kvm/emulate.c | 15 --- arch/x86/kvm/x86.c | 16 +--- 2 files changed, 13 insertions(+), 18 deletions(-) diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 5d786d5..4cc8368 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -941,12 +941,9 @@ x86_decode_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops) /* we cannot decode insn before we complete previous rep insn */ WARN_ON(ctxt->restart); - /* Shadow copy of register state. Committed on successful emulation. */ - memset(c, 0, sizeof(struct decode_cache)); c->eip = ctxt->eip; c->fetch.start = c->fetch.end = c->eip; ctxt->cs_base = seg_base(ctxt, ops, VCPU_SREG_CS); - memcpy(c->regs, ctxt->vcpu->arch.regs, sizeof c->regs); switch (mode) { case X86EMUL_MODE_REAL: @@ -2486,16 +2483,13 @@ int emulator_task_switch(struct x86_emulate_ctxt *ctxt, struct decode_cache *c = &ctxt->decode; int rc; - memset(c, 0, sizeof(struct decode_cache)); c->eip = ctxt->eip; - memcpy(c->regs, ctxt->vcpu->arch.regs, sizeof c->regs); c->dst.type = OP_NONE; rc = emulator_do_task_switch(ctxt, ops, tss_selector, reason, has_error_code, error_code); if (rc == X86EMUL_CONTINUE) { - memcpy(ctxt->vcpu->arch.regs, c->regs, sizeof c->regs); rc = writeback(ctxt, ops); if (rc == X86EMUL_CONTINUE) ctxt->eip = c->eip; @@ -2525,13 +2519,6 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops) ctxt->interruptibility = 0; ctxt->decode.mem_read.pos = 0; - /* Shadow copy of register state. Committed on successful emulation. -* NOTE: we can copy them from vcpu as x86_decode_insn() doesn't -* modify them. -*/ - - memcpy(c->regs, ctxt->vcpu->arch.regs, sizeof c->regs); - if (ctxt->mode == X86EMUL_MODE_PROT64 && (c->d & No64)) { kvm_queue_exception(ctxt->vcpu, UD_VECTOR); goto done; @@ -3031,8 +3018,6 @@ writeback: * without decoding */ ctxt->decode.mem_read.end = 0; - /* Commit shadow register state. */ - memcpy(ctxt->vcpu->arch.regs, c->regs, sizeof c->regs); ctxt->eip = c->eip; done: diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 344d9dd..5fda84e 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3782,7 +3782,7 @@ int emulate_instruction(struct kvm_vcpu *vcpu, int emulation_type) { int r, shadow_mask; - struct decode_cache *c; + struct decode_cache *c = &vcpu->arch.emulate_ctxt.decode; kvm_clear_exception_queue(vcpu); vcpu->arch.mmio_fault_cr2 = cr2; @@ -3807,13 +3807,14 @@ int emulate_instruction(struct kvm_vcpu *vcpu, ? X86EMUL_MODE_VM86 : cs_l ? X86EMUL_MODE_PROT64 : cs_db ? X86EMUL_MODE_PROT32 : X86EMUL_MODE_PROT16; + memset(c, 0, sizeof(struct decode_cache)); + memcpy(c->regs, vcpu->arch.regs, sizeof c->regs); r = x86_decode_insn(&vcpu->arch.emulate_ctxt, &emulate_ops); trace_kvm_emulate_insn_start(vcpu); /* Only allow emulation of specific instructions on #UD * (namely VMMCALL, sysenter, sysexit, syscall)*/ - c = &vcpu->arch.emulate_ctxt.decode; if (emulation_type & EMULTYPE_TRAP_UD) { if (!c->twobyte) return EMULATE_FAIL; @@ -3854,6 +3855,10 @@ int emulate_instruction(struct kvm_vcpu *vcpu, return EMULATE_DONE; } + /* this is needed for vmware backdor interface to work since it + changes registers values during IO operation */ + memcpy(c->regs, vcpu->arch.regs, sizeof c->regs); + restart: r = x86_emulate_insn(&vcpu->arch.emulate_ctxt, &emulate_ops); @@ -3874,6 +3879,7 @@ restart: shadow_mask = vcpu->arch.emulate_ctxt.interruptibility; kvm_x86_ops->set_interrupt_shadow(vcpu, shadow_mask); kvm_x86_ops->set_rflags(vcpu, vcpu->arch.emulate_ctxt.eflags); + memcpy(vcpu->arch.regs, c->regs, sizeof c->regs); kvm_rip_write(vcpu, vcpu->arch.emulate_ctxt.eip); if (vcpu->arch.pio.count) { @@ -4857,6 +4863,7 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu, int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int reason, bool has_error_code, u32 error_code) { + struct decode_cache *c = &vcpu->arch.emulate_ctxt.decode; int cs_db, cs_l, ret; cache_all_regs(vcpu); @@ -4871,6 +4878,8 @@ int kvm_task_switch(struct kvm_vcpu *vcpu, u16
[PATCHv2 23/23] KVM: x86 emulator: do not inject exception directly into vcpu
Return exception as a result of instruction emulation and handle injection in KVM code. Signed-off-by: Gleb Natapov --- arch/x86/include/asm/kvm_emulate.h |6 ++ arch/x86/kvm/emulate.c | 124 ++-- arch/x86/kvm/x86.c | 20 +- 3 files changed, 100 insertions(+), 50 deletions(-) diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h index a87d95f..51cfd73 100644 --- a/arch/x86/include/asm/kvm_emulate.h +++ b/arch/x86/include/asm/kvm_emulate.h @@ -216,6 +216,12 @@ struct x86_emulate_ctxt { int interruptibility; bool restart; /* restart string instruction after writeback */ + + int exception; /* exception that happens during emulation or -1 */ + u32 error_code; /* error code for exception */ + bool error_code_valid; + unsigned long cr2; /* faulted address in case of #PF */ + /* decode cache */ struct decode_cache decode; }; diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index e2def3b..50ce993 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -653,6 +653,37 @@ static unsigned long ss_base(struct x86_emulate_ctxt *ctxt, return seg_base(ctxt, ops, VCPU_SREG_SS); } +static void emulate_exception(struct x86_emulate_ctxt *ctxt, int vec, + u32 error, bool valid) +{ + ctxt->exception = vec; + ctxt->error_code = error; + ctxt->error_code_valid = valid; + ctxt->restart = false; +} + +static void emulate_gp(struct x86_emulate_ctxt *ctxt, int err) +{ + emulate_exception(ctxt, GP_VECTOR, err, true); +} + +static void emulate_pf(struct x86_emulate_ctxt *ctxt, unsigned long addr, + int err) +{ + ctxt->cr2 = addr; + emulate_exception(ctxt, PF_VECTOR, err, true); +} + +static void emulate_ud(struct x86_emulate_ctxt *ctxt) +{ + emulate_exception(ctxt, UD_VECTOR, 0, false); +} + +static void emulate_ts(struct x86_emulate_ctxt *ctxt, int err) +{ + emulate_exception(ctxt, TS_VECTOR, err, true); +} + static int do_fetch_insn_byte(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops, unsigned long eip, u8 *dest) @@ -1285,7 +1316,7 @@ static int read_emulated(struct x86_emulate_ctxt *ctxt, rc = ops->read_emulated(addr, mc->data + mc->end, n, &err, ctxt->vcpu); if (rc == X86EMUL_PROPAGATE_FAULT) - kvm_inject_page_fault(ctxt->vcpu, addr, err); + emulate_pf(ctxt, addr, err); if (rc != X86EMUL_CONTINUE) return rc; mc->end += n; @@ -1366,13 +1397,13 @@ static int read_segment_descriptor(struct x86_emulate_ctxt *ctxt, get_descriptor_table_ptr(ctxt, ops, selector, &dt); if (dt.size < index * 8 + 7) { - kvm_inject_gp(ctxt->vcpu, selector & 0xfffc); + emulate_gp(ctxt, selector & 0xfffc); return X86EMUL_PROPAGATE_FAULT; } addr = dt.address + index * 8; ret = ops->read_std(addr, desc, sizeof *desc, ctxt->vcpu, &err); if (ret == X86EMUL_PROPAGATE_FAULT) - kvm_inject_page_fault(ctxt->vcpu, addr, err); + emulate_pf(ctxt, addr, err); return ret; } @@ -1391,14 +1422,14 @@ static int write_segment_descriptor(struct x86_emulate_ctxt *ctxt, get_descriptor_table_ptr(ctxt, ops, selector, &dt); if (dt.size < index * 8 + 7) { - kvm_inject_gp(ctxt->vcpu, selector & 0xfffc); + emulate_gp(ctxt, selector & 0xfffc); return X86EMUL_PROPAGATE_FAULT; } addr = dt.address + index * 8; ret = ops->write_std(addr, desc, sizeof *desc, ctxt->vcpu, &err); if (ret == X86EMUL_PROPAGATE_FAULT) - kvm_inject_page_fault(ctxt->vcpu, addr, err); + emulate_pf(ctxt, addr, err); return ret; } @@ -1517,7 +1548,7 @@ load: ops->set_cached_descriptor(&seg_desc, seg, ctxt->vcpu); return X86EMUL_CONTINUE; exception: - kvm_queue_exception_e(ctxt->vcpu, err_vec, err_code); + emulate_exception(ctxt, err_vec, err_code, true); return X86EMUL_PROPAGATE_FAULT; } @@ -1578,7 +1609,7 @@ static int emulate_popf(struct x86_emulate_ctxt *ctxt, break; case X86EMUL_MODE_VM86: if (iopl < 3) { - kvm_inject_gp(ctxt->vcpu, 0); + emulate_gp(ctxt, 0); return X86EMUL_PROPAGATE_FAULT; } change_mask |= EFLG_IF; @@ -1829,7 +1860,7 @@ static inline int writeback(struct x86_emulate_ctxt *ctxt, &err, ctxt->vcpu); if (rc == X86EMUL_PR
[PATCHv2 00/23] next round of emulator cleanups
This is the next round of emulator cleanups. Make it even more detached from kvm. First patch introduces IO read cache which is needed to correctly emulate instructions that require more then one IO read exit during emulation. ChangeLog; v1->v2: - add x86_emulator_ops callback to get cached segment base - use double underscore on helper functions - fold "remove unneeded initialization" patch Gleb Natapov (23): KVM: x86 emulator: introduce read cache. KVM: x86 emulator: fix Move r/m16 to segment register decoding. KVM: x86 emulator: cleanup xchg emulation. KVM: x86 emulator: cleanup nop emulation KVM: x86 emulator: handle "far address" source operand. KVM: x86 emulator: add (set|get)_dr callbacks to x86_emulate_ops KVM: x86 emulator: add (set|get)_msr callbacks to x86_emulate_ops KVM: x86 emulator: add get_cached_segment_base() callback to x86_emulate_ops. KVM: x86 emulator: cleanup some direct calls into kvm to use existing callbacks KVM: x86 emulator: make set_cr() callback return error if it fails KVM: x86 emulator: make (get|set)_dr() callback return error if it fails KVM: x86 emulator: fix X86EMUL_RETRY_INSTR and X86EMUL_CMPXCHG_FAILED values KVM: fill in run->mmio details in (read|write)_emulated function. KVM: x86 emulator: x86_emulate_insn() return -1 only in case of emulation failure KVM: remove export of emulator_write_emulated(). KVM: do not inject #PF in (read|write)_emulated() callbacks KVM: handle emulation failure case first. KVM: x86 emulator: advance RIP outside x86 emulator code KVM: x86 emulator: set RFLAGS outside x86 emulator code. KVM: x86 emulator: use shadowed register in emulate_sysexit() KVM: x86 exmulator: handle shadowed registers outside emulator. KVM: x86 emulator: move interruptibility state tracking out of emulator KVM: x86 emulator: do not inject exception directly into vcpu arch/x86/include/asm/kvm_emulate.h | 30 ++- arch/x86/include/asm/kvm_host.h|9 - arch/x86/kvm/emulate.c | 559 arch/x86/kvm/x86.c | 382 ++--- 4 files changed, 549 insertions(+), 431 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCHv2 20/23] KVM: x86 emulator: use shadowed register in emulate_sysexit()
emulate_sysexit() should use shadowed registers copy instead of looking into vcpu state directly. Signed-off-by: Gleb Natapov --- arch/x86/kvm/emulate.c |4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index eee5b4d..5d786d5 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -2063,8 +2063,8 @@ emulate_sysexit(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops) ops->set_cached_descriptor(&ss, VCPU_SREG_SS, ctxt->vcpu); ops->set_segment_selector(ss_sel, VCPU_SREG_SS, ctxt->vcpu); - c->eip = ctxt->vcpu->arch.regs[VCPU_REGS_RDX]; - c->regs[VCPU_REGS_RSP] = ctxt->vcpu->arch.regs[VCPU_REGS_RCX]; + c->eip = c->regs[VCPU_REGS_RDX]; + c->regs[VCPU_REGS_RSP] = c->regs[VCPU_REGS_RCX]; return X86EMUL_CONTINUE; } -- 1.6.5 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCHv2 04/23] KVM: x86 emulator: cleanup nop emulation
Make it more explicit what we are checking for. Signed-off-by: Gleb Natapov --- arch/x86/kvm/emulate.c |4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index ea5c6fd..fbc555b 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -2799,8 +2799,8 @@ special_insn: goto done; break; case 0x90: /* nop / xchg r8,rax */ - if (!(c->rex_prefix & 1)) { /* nop */ - c->dst.type = OP_NONE; + if (c->dst.ptr == (unsigned long *)&c->regs[VCPU_REGS_RAX]) { + c->dst.type = OP_NONE; /* nop */ break; } case 0x91 ... 0x97: /* xchg reg,rax */ -- 1.6.5 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCHv2 02/23] KVM: x86 emulator: fix Move r/m16 to segment register decoding.
This instruction does not need generic decoding for its dst operand. Signed-off-by: Gleb Natapov --- arch/x86/kvm/emulate.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 6f40337..efb7853 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -171,7 +171,7 @@ static u32 opcode_table[256] = { ByteOp | DstMem | SrcReg | ModRM | Mov, DstMem | SrcReg | ModRM | Mov, ByteOp | DstReg | SrcMem | ModRM | Mov, DstReg | SrcMem | ModRM | Mov, DstMem | SrcReg | ModRM | Mov, ModRM | DstReg, - DstReg | SrcMem | ModRM | Mov, Group | Group1A, + ImplicitOps | SrcMem | ModRM, Group | Group1A, /* 0x90 - 0x97 */ DstReg, DstReg, DstReg, DstReg, DstReg, DstReg, DstReg, DstReg, /* 0x98 - 0x9F */ -- 1.6.5 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCHv2 03/23] KVM: x86 emulator: cleanup xchg emulation.
Dst operand is already initialized during decoding stage. No need to reinitialize. Signed-off-by: Gleb Natapov --- arch/x86/kvm/emulate.c |4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index efb7853..ea5c6fd 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -2804,8 +2804,8 @@ special_insn: break; } case 0x91 ... 0x97: /* xchg reg,rax */ - c->src.type = c->dst.type = OP_REG; - c->src.bytes = c->dst.bytes = c->op_bytes; + c->src.type = OP_REG; + c->src.bytes = c->op_bytes; c->src.ptr = (unsigned long *) &c->regs[VCPU_REGS_RAX]; c->src.val = *(c->src.ptr); goto xchg; -- 1.6.5 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCHv2 01/23] KVM: x86 emulator: introduce read cache.
Introduce read cache which is needed for instruction that require more then one exit to userspace. After returning from userspace the instruction will be re-executed with cached read value. Signed-off-by: Gleb Natapov --- arch/x86/include/asm/kvm_emulate.h |1 + arch/x86/kvm/emulate.c | 56 +++- 2 files changed, 43 insertions(+), 14 deletions(-) diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h index 0b2729b..288cbed 100644 --- a/arch/x86/include/asm/kvm_emulate.h +++ b/arch/x86/include/asm/kvm_emulate.h @@ -186,6 +186,7 @@ struct decode_cache { unsigned long modrm_val; struct fetch_cache fetch; struct read_cache io_read; + struct read_cache mem_read; }; struct x86_emulate_ctxt { diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 5ac0bb4..6f40337 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -1263,6 +1263,33 @@ done: return (rc == X86EMUL_UNHANDLEABLE) ? -1 : 0; } +static int read_emulated(struct x86_emulate_ctxt *ctxt, +struct x86_emulate_ops *ops, +unsigned long addr, void *dest, unsigned size) +{ + int rc; + struct read_cache *mc = &ctxt->decode.mem_read; + + while (size) { + int n = min(size, 8u); + size -= n; + if (mc->pos < mc->end) + goto read_cached; + + rc = ops->read_emulated(addr, mc->data + mc->end, n, ctxt->vcpu); + if (rc != X86EMUL_CONTINUE) + return rc; + mc->end += n; + + read_cached: + memcpy(dest, mc->data + mc->pos, n); + mc->pos += n; + dest += n; + addr += n; + } + return X86EMUL_CONTINUE; +} + static int pio_in_emulated(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops, unsigned int size, unsigned short port, @@ -1504,9 +1531,9 @@ static int emulate_pop(struct x86_emulate_ctxt *ctxt, struct decode_cache *c = &ctxt->decode; int rc; - rc = ops->read_emulated(register_address(c, ss_base(ctxt), -c->regs[VCPU_REGS_RSP]), - dest, len, ctxt->vcpu); + rc = read_emulated(ctxt, ops, register_address(c, ss_base(ctxt), + c->regs[VCPU_REGS_RSP]), + dest, len); if (rc != X86EMUL_CONTINUE) return rc; @@ -2475,6 +2502,7 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops) int saved_dst_type = c->dst.type; ctxt->interruptibility = 0; + ctxt->decode.mem_read.pos = 0; /* Shadow copy of register state. Committed on successful emulation. * NOTE: we can copy them from vcpu as x86_decode_insn() doesn't @@ -2529,20 +2557,16 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops) } if (c->src.type == OP_MEM) { - rc = ops->read_emulated((unsigned long)c->src.ptr, - &c->src.val, - c->src.bytes, - ctxt->vcpu); + rc = read_emulated(ctxt, ops, (unsigned long)c->src.ptr, + &c->src.val, c->src.bytes); if (rc != X86EMUL_CONTINUE) goto done; c->src.orig_val = c->src.val; } if (c->src2.type == OP_MEM) { - rc = ops->read_emulated((unsigned long)c->src2.ptr, - &c->src2.val, - c->src2.bytes, - ctxt->vcpu); + rc = read_emulated(ctxt, ops, (unsigned long)c->src2.ptr, + &c->src2.val, c->src2.bytes); if (rc != X86EMUL_CONTINUE) goto done; } @@ -2553,8 +2577,8 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops) if ((c->dst.type == OP_MEM) && !(c->d & Mov)) { /* optimisation - avoid slow emulated read if Mov */ - rc = ops->read_emulated((unsigned long)c->dst.ptr, &c->dst.val, - c->dst.bytes, ctxt->vcpu); + rc = read_emulated(ctxt, ops, (unsigned long)c->dst.ptr, + &c->dst.val, c->dst.bytes); if (rc != X86EMUL_CONTINUE) goto done; } @@ -2981,7 +3005,11 @@ writeback: (rc->end != 0 && rc->end == rc->pos)) ctxt->restart = false; } - + /* +* reset read cache here in case st
Re: [PATCH] virtio-spec: document block CMD and FLUSH
On Tue, Apr 20, 2010 at 12:26:27AM +0300, Michael S. Tsirkin wrote: > On Fri, Feb 19, 2010 at 12:22:20AM +0200, Michael S. Tsirkin wrote: > > I took a stub at documenting CMD and FLUSH request types in virtio > > block. > > Any comments? Rusty? -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 5/5] test: access: don't expect fetch fault indication if !efer.nx
Bit 4 of the page-fault error code can only be set if efer.nx=1. Signed-off-by: Avi Kivity --- kvm/user/test/x86/access.c |2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) diff --git a/kvm/user/test/x86/access.c b/kvm/user/test/x86/access.c index 5addd15..3338fbc 100644 --- a/kvm/user/test/x86/access.c +++ b/kvm/user/test/x86/access.c @@ -463,6 +463,8 @@ no_pte: fault: if (!at->expected_fault) at->ignore_pde = 0; +if (!at->flags[AC_CPU_EFER_NX]) +at->expected_error &= ~PFERR_FETCH_MASK; } static void ac_test_check(ac_test_t *at, _Bool *success_ret, _Bool cond, -- 1.7.0.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/5] test: access: report successful tests optionally
There are so many tests, no point in reporting successful ones. Leave the capability optional in case a failure triggers a kvm or testsuite crash before the report is complete and we need to see which test failed. Signed-off-by: Avi Kivity --- kvm/user/test/x86/access.c | 15 +-- 1 files changed, 13 insertions(+), 2 deletions(-) diff --git a/kvm/user/test/x86/access.c b/kvm/user/test/x86/access.c index 0906691..c7a7075 100644 --- a/kvm/user/test/x86/access.c +++ b/kvm/user/test/x86/access.c @@ -6,6 +6,8 @@ #define true 1 #define false 0 +static _Bool verbose = false; + typedef unsigned long pt_element_t; #define PAGE_SIZE ((pt_element_t)4096) @@ -145,6 +147,9 @@ typedef struct { unsigned long linear_addr; } __attribute__((packed)) descriptor_table_t; + +static void ac_test_show(ac_test_t *at); + void lidt(idt_entry_t *idt, int nentries) { descriptor_table_t dt; @@ -469,6 +474,10 @@ static void ac_test_check(ac_test_t *at, _Bool *success_ret, _Bool cond, *success_ret = false; +if (!verbose) { +ac_test_show(at); +} + va_start(ap, fmt); vsnprintf(buf, sizeof(buf), fmt, ap); va_end(ap); @@ -565,7 +574,7 @@ int ac_test_do_access(ac_test_t *at) ac_test_check(at, &success, *at->pdep != at->expected_pde, "pde %x expected %x", *at->pdep, at->expected_pde); -if (success) { +if (success && verbose) { printf("PASS\n"); } return success; @@ -590,7 +599,9 @@ int ac_test_exec(ac_test_t *at) { int r; -ac_test_show(at); +if (verbose) { +ac_test_show(at); +} ac_test_setup_pte(at); r = ac_test_do_access(at); return r; -- 1.7.0.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/5] test: access: allow the processor not to set pde.a if a fault occurs
Some processors only set accessed bits if the translation is valid; allow this behaviour. This squelchs errors running with EPT. Signed-off-by: Avi Kivity --- kvm/user/test/x86/access.c | 25 - 1 files changed, 20 insertions(+), 5 deletions(-) diff --git a/kvm/user/test/x86/access.c b/kvm/user/test/x86/access.c index c7a7075..5addd15 100644 --- a/kvm/user/test/x86/access.c +++ b/kvm/user/test/x86/access.c @@ -137,6 +137,7 @@ typedef struct { pt_element_t expected_pte; pt_element_t *pdep; pt_element_t expected_pde; +pt_element_t ignore_pde; int expected_fault; unsigned expected_error; idt_entry_t idt[256]; @@ -370,6 +371,7 @@ void ac_test_setup_pte(ac_test_t *at) if (at->ptep) at->expected_pte = *at->ptep; at->expected_pde = *at->pdep; +at->ignore_pde = 0; at->expected_fault = 0; at->expected_error = PFERR_PRESENT_MASK; @@ -416,13 +418,17 @@ void ac_test_setup_pte(ac_test_t *at) if (at->flags[AC_ACCESS_FETCH] && at->flags[AC_PDE_NX]) at->expected_fault = 1; -if (at->expected_fault) +if (!at->flags[AC_PDE_ACCESSED]) +at->ignore_pde = PT_ACCESSED_MASK; + +if (!pde_valid) goto fault; -at->expected_pde |= PT_ACCESSED_MASK; +if (!at->expected_fault) +at->expected_pde |= PT_ACCESSED_MASK; if (at->flags[AC_PDE_PSE]) { - if (at->flags[AC_ACCESS_WRITE]) + if (at->flags[AC_ACCESS_WRITE] && !at->expected_fault) at->expected_pde |= PT_DIRTY_MASK; goto no_pte; } @@ -455,7 +461,8 @@ void ac_test_setup_pte(ac_test_t *at) no_pte: fault: -; +if (!at->expected_fault) +at->ignore_pde = 0; } static void ac_test_check(ac_test_t *at, _Bool *success_ret, _Bool cond, @@ -484,6 +491,13 @@ static void ac_test_check(ac_test_t *at, _Bool *success_ret, _Bool cond, printf("FAIL: %s\n", buf); } +static int pt_match(pt_element_t pte1, pt_element_t pte2, pt_element_t ignore) +{ +pte1 &= ~ignore; +pte2 &= ~ignore; +return pte1 == pte2; +} + int ac_test_do_access(ac_test_t *at) { static unsigned unique = 42; @@ -571,7 +585,8 @@ int ac_test_do_access(ac_test_t *at) "error code %x expected %x", e, at->expected_error); ac_test_check(at, &success, at->ptep && *at->ptep != at->expected_pte, "pte %x expected %x", *at->ptep, at->expected_pte); -ac_test_check(at, &success, *at->pdep != at->expected_pde, +ac_test_check(at, &success, + !pt_match(*at->pdep, at->expected_pde, at->ignore_pde), "pde %x expected %x", *at->pdep, at->expected_pde); if (success && verbose) { -- 1.7.0.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/5] test: access: split off test case formatting from execution
Signed-off-by: Avi Kivity --- kvm/user/test/x86/access.c | 10 -- 1 files changed, 8 insertions(+), 2 deletions(-) diff --git a/kvm/user/test/x86/access.c b/kvm/user/test/x86/access.c index 5eadff8..dbc1213 100644 --- a/kvm/user/test/x86/access.c +++ b/kvm/user/test/x86/access.c @@ -557,9 +557,8 @@ int ac_test_do_access(ac_test_t *at) return 1; } -int ac_test_exec(ac_test_t *at) +static void ac_test_show(ac_test_t *at) { -int r; char line[5000]; *line = 0; @@ -571,6 +570,13 @@ int ac_test_exec(ac_test_t *at) } strcat(line, ": "); printf("%s", line); +} + +int ac_test_exec(ac_test_t *at) +{ +int r; + +ac_test_show(at); ac_test_setup_pte(at); r = ac_test_do_access(at); return r; -- 1.7.0.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/5] Fixes for mmu access unit test
The mmu access unit test fails when testing real processor (e.g. with npt or ept). In one case this was because the test was too restrictive (more than one result is allowed) and in another both the test and kvm shadow mmu are faulty. This patchset fixes the test. Another patch will follow fixing kvm (but only after nnpt is merged). Avi Kivity (5): test: access: split off test case formatting from execution test: access: consolidate test failure reporting into a function test: access: report successful tests optionally test: access: allow the processor not to set pde.a if a fault occurs test: access: don't expect fetch fault indication if !efer.nx kvm/user/test/x86/access.c | 106 1 files changed, 77 insertions(+), 29 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/5] test: access: consolidate test failure reporting into a function
Signed-off-by: Avi Kivity --- kvm/user/test/x86/access.c | 60 +++- 1 files changed, 37 insertions(+), 23 deletions(-) diff --git a/kvm/user/test/x86/access.c b/kvm/user/test/x86/access.c index dbc1213..0906691 100644 --- a/kvm/user/test/x86/access.c +++ b/kvm/user/test/x86/access.c @@ -453,6 +453,28 @@ fault: ; } +static void ac_test_check(ac_test_t *at, _Bool *success_ret, _Bool cond, + const char *fmt, ...) +{ +va_list ap; +char buf[500]; + +if (!*success_ret) { +return; +} + +if (!cond) { +return; +} + +*success_ret = false; + +va_start(ap, fmt); +vsnprintf(buf, sizeof(buf), fmt, ap); +va_end(ap); +printf("FAIL: %s\n", buf); +} + int ac_test_do_access(ac_test_t *at) { static unsigned unique = 42; @@ -460,6 +482,7 @@ int ac_test_do_access(ac_test_t *at) unsigned e; static unsigned char user_stack[4096]; unsigned long rsp; +_Bool success = true; ++unique; @@ -531,30 +554,21 @@ int ac_test_do_access(ac_test_t *at) "jmp back_to_kernel \n\t" ".section .text"); -if (fault && !at->expected_fault) { - printf("FAIL: unexpected fault\n"); - return 0; -} -if (!fault && at->expected_fault) { - printf("FAIL: unexpected access\n"); - return 0; -} -if (fault && e != at->expected_error) { - printf("FAIL: error code %x expected %x\n", e, at->expected_error); - return 0; -} -if (at->ptep && *at->ptep != at->expected_pte) { - printf("FAIL: pte %x expected %x\n", *at->ptep, at->expected_pte); - return 0; +ac_test_check(at, &success, fault && !at->expected_fault, + "unexpected fault"); +ac_test_check(at, &success, !fault && at->expected_fault, + "unexpected access"); +ac_test_check(at, &success, fault && e != at->expected_error, + "error code %x expected %x", e, at->expected_error); +ac_test_check(at, &success, at->ptep && *at->ptep != at->expected_pte, + "pte %x expected %x", *at->ptep, at->expected_pte); +ac_test_check(at, &success, *at->pdep != at->expected_pde, + "pde %x expected %x", *at->pdep, at->expected_pde); + +if (success) { +printf("PASS\n"); } - -if (*at->pdep != at->expected_pde) { - printf("FAIL: pde %x expected %x\n", *at->pdep, at->expected_pde); - return 0; -} - -printf("PASS\n"); -return 1; +return success; } static void ac_test_show(ac_test_t *at) -- 1.7.0.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 06/10] kvm: remove explicit kvm_arch_reset_vcpu from kvm_init_vcpu
On 04/26/2010 12:59 PM, Marcelo Tosatti wrote: This is now done via the initialization's qemu_system_reset call. Signed-off-by: Avi Kivity --- kvm-all.c |1 - 1 files changed, 0 insertions(+), 1 deletions(-) diff --git a/kvm-all.c b/kvm-all.c index 9c8aa7d..eabb097 100644 --- a/kvm-all.c +++ b/kvm-all.c @@ -208,7 +208,6 @@ int kvm_init_vcpu(CPUState *env) ret = kvm_arch_init_vcpu(env); if (ret == 0) { qemu_register_reset(kvm_reset_vcpu, env); -kvm_arch_reset_vcpu(env); } err: return ret; This breaks -enable-kvm in upstream qemu. No progress is ever made in the guest. Regards, Anthony Liguori -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 15/22] KVM: MMU: Introduce kvm_read_guest_page_x86()
On Tue, Apr 27, 2010 at 04:35:06PM +0300, Avi Kivity wrote: > Perhaps have > >struct kvm_mmu base_mmu; >struct kvm_mmu nested_mmu; >struct kvm_mmu *mmu; Okay, the pointer is only used in the gva_to_gpa path which is the minority of mmu context usages. I will introduce a struct kvm_mmu *walk_mmu; and use it in the page walking code. Joerg -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Potential thread synchronization issue in qcow2.c and qcow2-cluster.c
Hi there, I am reading the snapshot source code of qemu-kvm-0.12.3, and am puzzled by the thread synchronization issue in qcow2.c and qcow2-cluster.c. Could someone please enlighten me? Thanks! Specifically, I found that BDRVQcowState.cluster_allocs, which is a QLIST_HEAD, may be accessed concurrently by two threads, but I did not see how the two thread properly synchronize with each other to avoid race conditions. I profiled all executions of qemu_mutex_lock_iothread(), and found that it only protects the vl.c:main_loop_wai() thread but does NOT protect the qemu-kvm.c:kvm_cpu_exec() thread. Did I miss something or is this a defect? Note that qemu-kvm.c:kvm_cpu_exec() is executed, instead of kvm-all.c:kvm_cpu_exec(). Please see below for more details. Here is an example of how two threads may reach qcow2. Stack trace of thread 1: main -> main_loop -> kvm_main_loop -> main_loop_wait -> posix_aio_read -> posix_aio_process_queue -> qcow_aio_write_cb -> qcow2_alloc_cluster_offset (which may modify BDRVQcowState.cluster_allocs) Stack trace of thread 2: ap_main_loop -> ... -> kvm_handle_io -> ... -> qcow_aio_writev -> qcow_aio_write_cb -> qcow2_alloc_cluster_offset (which may modify BDRVQcowState.cluster_allocs) Here is the trace showing that qemu_mutex_lock_iothread() does not protect the thread that executes. kvm_cpu_exec()->...->qcow_aio_write_cb(). home/ctang/kvm/qemu-kvm-0.12.3/qemu-kvm.c : 2530thread: b7e056d0 /home/ctang/kvm/bin/qemu-system-x86_64(qemu_mutex_unlock_iothread+0x1a) [0x8092242] /home/ctang/kvm/bin/qemu-system-x86_64(main_loop_wait+0x221) [0x806edef] /home/ctang/kvm/bin/qemu-system-x86_64(kvm_main_loop+0x1ff) [0x80916a1] /home/ctang/kvm/bin/qemu-system-x86_64 [0x806f5c2] /home/ctang/kvm/bin/qemu-system-x86_64(main+0x2e2c) [0x80736d1] /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe5) [0xb7e33775] /home/ctang/kvm/bin/qemu-system-x86_64 [0x8068bb1] block/qcow2-cluster.c : 721thread: b7dc2b90 /home/ctang/kvm/bin/qemu-system-x86_64(qcow2_alloc_cluster_offset+0x3c) [0x81175fa] /home/ctang/kvm/bin/qemu-system-x86_64(qcow_aio_write_cb+0x158) [0x8111d73] /home/ctang/kvm/bin/qemu-system-x86_64(qcow_aio_writev+0x94) [0x8112054] /home/ctang/kvm/bin/qemu-system-x86_64(bdrv_aio_writev+0xe1) [0x80fa8e9] /home/ctang/kvm/bin/qemu-system-x86_64 [0x81f4a96] /home/ctang/kvm/bin/qemu-system-x86_64 [0x81f4c04] /home/ctang/kvm/bin/qemu-system-x86_64(dma_bdrv_write+0x48) [0x81f4cbf] /home/ctang/kvm/bin/qemu-system-x86_64 [0x80a437c] /home/ctang/kvm/bin/qemu-system-x86_64(bmdma_cmd_writeb+0x73) [0x80a9503] /home/ctang/kvm/bin/qemu-system-x86_64 [0x812b1eb] /home/ctang/kvm/bin/qemu-system-x86_64(cpu_outb+0x27) [0x812b4e6] /home/ctang/kvm/bin/qemu-system-x86_64 [0x808d267] /home/ctang/kvm/bin/qemu-system-x86_64(kvm_run+0x2f4) [0x808f4b8] /home/ctang/kvm/bin/qemu-system-x86_64(kvm_cpu_exec+0x56) [0x80907b2] /home/ctang/kvm/bin/qemu-system-x86_64 [0x8090f4d] /home/ctang/kvm/bin/qemu-system-x86_64 [0x8091098] /lib/tls/i686/cmov/libpthread.so.0 [0xb7fd24ff] /lib/tls/i686/cmov/libc.so.6(clone+0x5e) [0xb7f0149e] /home/ctang/kvm/qemu-kvm-0.12.3/qemu-kvm.c : 2537thread: b7e056d0 /home/ctang/kvm/bin/qemu-system-x86_64(qemu_mutex_lock_iothread+0x1a) [0x809229d] /home/ctang/kvm/bin/qemu-system-x86_64(main_loop_wait+0x25c) [0x806ee2a] /home/ctang/kvm/bin/qemu-system-x86_64(kvm_main_loop+0x1ff) [0x80916a1] /home/ctang/kvm/bin/qemu-system-x86_64 [0x806f5c2] /home/ctang/kvm/bin/qemu-system-x86_64(main+0x2e2c) [0x80736d1] /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe5) [0xb7e33775] /home/ctang/kvm/bin/qemu-system-x86_64 [0x8068bb1] -- Regards, Chunqiang Tang -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 12/22] KVM: MMU: Implement nested gva_to_gpa functions
On Tue, Apr 27, 2010 at 03:37:40PM +0300, Avi Kivity wrote: > On 04/27/2010 01:38 PM, Joerg Roedel wrote: > >This patch adds the functions to do a nested l2_gva to > >l1_gpa page table walk. > > > >Signed-off-by: Joerg Roedel > >--- > > arch/x86/include/asm/kvm_host.h |4 > > arch/x86/kvm/mmu.c |8 > > arch/x86/kvm/paging_tmpl.h | 31 +++ > > 3 files changed, 43 insertions(+), 0 deletions(-) > > > >diff --git a/arch/x86/include/asm/kvm_host.h > >b/arch/x86/include/asm/kvm_host.h > >index ccdbd2f..3cbfb51 100644 > >--- a/arch/x86/include/asm/kvm_host.h > >+++ b/arch/x86/include/asm/kvm_host.h > >@@ -287,6 +287,10 @@ struct kvm_vcpu_arch { > > bool tpr_access_reporting; > > > > struct kvm_mmu mmu; > >+ > >+/* Used for two dimensional paging emulation */ > >+struct kvm_mmu nested_mmu; > >+ > > /* only needed in kvm_pv_mmu_op() path, but it's hot so > > * put it here to avoid allocation */ > > struct kvm_pv_mmu_op_buffer mmu_op_buffer; > >diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c > >index 4e0bfdb..8bd40b5 100644 > >--- a/arch/x86/kvm/mmu.c > >+++ b/arch/x86/kvm/mmu.c > >@@ -2144,6 +2144,14 @@ static gpa_t nonpaging_gva_to_gpa(struct kvm_vcpu > >*vcpu, gva_t vaddr, > > return vaddr; > > } > > > >+static gpa_t nonpaging_gva_to_gpa_nested(struct kvm_vcpu *vcpu, gva_t vaddr, > >+ u32 access, u32 *error) > >+{ > >+if (error) > >+*error = 0; > > Why allow error == NULL? In the emulator there are some functions passing NULL for the error code. In general it doesn't matter much because when we track this information in the vcpu struct these error parameters can be removed. I just let this outside the scope of this patchset and make that cleanup later. > > > > >+static gpa_t FNAME(gva_to_gpa_nested)(struct kvm_vcpu *vcpu, gva_t vaddr, > >+ u32 access, u32 *error) > >+{ > >+struct guest_walker walker; > >+gpa_t gpa = UNMAPPED_GVA; > >+int r; > >+ > >+r = FNAME(walk_addr_nested)(&walker, vcpu, vaddr, > >+!!(access& PFERR_WRITE_MASK), > >+!!(access& PFERR_USER_MASK), > >+!!(access& PFERR_FETCH_MASK)); > > Those !! are unneeded. Ok, I remove them. Joerg -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/5] Fix EFER.NX=0 with EPT
Currently we run with EFER.NX=1 on the guest even if the guest value is 0. This is fine with shadow, since we check bit 63 when instantiating a page table, and fault if bit 63 is set while EFER.NX is clear. This doesn't work with EPT, since we no longer get the change to check guest ptes. So we need to run with EFER.NX=0. This is complicated by the fact that if we switch EFER.NX on the host, we'll trap immediately, since some host pages are mapped with the NX bit set. As a result, we need to switch the MSR atomically during guest entry and exit. This patchset implements the complications described above. Avi Kivity (5): KVM: Let vcpu structure alignment be determined at runtime KVM: VMX: Add definition for msr autoload entry KVM: VMX: Add definitions for guest and host EFER autoswitch vmcs entries KVM: VMX: Add facility to atomically switch MSRs on guest entry/exit KVM: VMX: Atomically switch efer if EPT && !EFER.NX arch/ia64/kvm/vmm.c|2 +- arch/powerpc/kvm/44x.c |2 +- arch/powerpc/kvm/book3s.c |3 +- arch/powerpc/kvm/e500.c|2 +- arch/s390/kvm/kvm-s390.c |2 +- arch/x86/include/asm/vmx.h | 12 +++- arch/x86/kvm/svm.c |2 +- arch/x86/kvm/vmx.c | 63 +++- include/linux/kvm_host.h |2 +- virt/kvm/kvm_main.c|7 +++-- 10 files changed, 85 insertions(+), 12 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/5] KVM: Let vcpu structure alignment be determined at runtime
vmx and svm vcpus have different contents and therefore may have different alignmment requirements. Let each specify its required alignment. Signed-off-by: Avi Kivity --- arch/ia64/kvm/vmm.c |2 +- arch/powerpc/kvm/44x.c|2 +- arch/powerpc/kvm/book3s.c |3 ++- arch/powerpc/kvm/e500.c |2 +- arch/s390/kvm/kvm-s390.c |2 +- arch/x86/kvm/svm.c|2 +- arch/x86/kvm/vmx.c|3 ++- include/linux/kvm_host.h |2 +- virt/kvm/kvm_main.c |7 --- 9 files changed, 14 insertions(+), 11 deletions(-) diff --git a/arch/ia64/kvm/vmm.c b/arch/ia64/kvm/vmm.c index 7a62f75..f0b9cac 100644 --- a/arch/ia64/kvm/vmm.c +++ b/arch/ia64/kvm/vmm.c @@ -51,7 +51,7 @@ static int __init kvm_vmm_init(void) vmm_fpswa_interface = fpswa_interface; /*Register vmm data to kvm side*/ - return kvm_init(&vmm_info, 1024, THIS_MODULE); + return kvm_init(&vmm_info, 1024, 0, THIS_MODULE); } static void __exit kvm_vmm_exit(void) diff --git a/arch/powerpc/kvm/44x.c b/arch/powerpc/kvm/44x.c index 689a57c..73c0a3f 100644 --- a/arch/powerpc/kvm/44x.c +++ b/arch/powerpc/kvm/44x.c @@ -147,7 +147,7 @@ static int __init kvmppc_44x_init(void) if (r) return r; - return kvm_init(NULL, sizeof(struct kvmppc_vcpu_44x), THIS_MODULE); + return kvm_init(NULL, sizeof(struct kvmppc_vcpu_44x), 0, THIS_MODULE); } static void __exit kvmppc_44x_exit(void) diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index 9f97dbe..16e3f4b 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -1384,7 +1384,8 @@ int __kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu) static int kvmppc_book3s_init(void) { - return kvm_init(NULL, sizeof(struct kvmppc_vcpu_book3s), THIS_MODULE); + return kvm_init(NULL, sizeof(struct kvmppc_vcpu_book3s), 0, + THIS_MODULE); } static void kvmppc_book3s_exit(void) diff --git a/arch/powerpc/kvm/e500.c b/arch/powerpc/kvm/e500.c index 669a5c5..bc2b400 100644 --- a/arch/powerpc/kvm/e500.c +++ b/arch/powerpc/kvm/e500.c @@ -161,7 +161,7 @@ static int __init kvmppc_e500_init(void) flush_icache_range(kvmppc_booke_handlers, kvmppc_booke_handlers + max_ivor + kvmppc_handler_len); - return kvm_init(NULL, sizeof(struct kvmppc_vcpu_e500), THIS_MODULE); + return kvm_init(NULL, sizeof(struct kvmppc_vcpu_e500), 0, THIS_MODULE); } static void __init kvmppc_e500_exit(void) diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c index ee7c713..8093e6f 100644 --- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c @@ -752,7 +752,7 @@ gfn_t unalias_gfn(struct kvm *kvm, gfn_t gfn) static int __init kvm_s390_init(void) { int ret; - ret = kvm_init(NULL, sizeof(struct kvm_vcpu), THIS_MODULE); + ret = kvm_init(NULL, sizeof(struct kvm_vcpu), 0, THIS_MODULE); if (ret) return ret; diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 889f660..2511664 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -3319,7 +3319,7 @@ static struct kvm_x86_ops svm_x86_ops = { static int __init svm_init(void) { return kvm_init(&svm_x86_ops, sizeof(struct vcpu_svm), - THIS_MODULE); + __alignof__(struct vcpu_svm), THIS_MODULE); } static void __exit svm_exit(void) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 875b785..2e87296 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -4245,7 +4245,8 @@ static int __init vmx_init(void) set_bit(0, vmx_vpid_bitmap); /* 0 is reserved for host */ - r = kvm_init(&vmx_x86_ops, sizeof(struct vcpu_vmx), THIS_MODULE); + r = kvm_init(&vmx_x86_ops, sizeof(struct vcpu_vmx), +__alignof__(struct vcpu_vmx), THIS_MODULE); if (r) goto out3; diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index ce027d5..7cb116a 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -243,7 +243,7 @@ void kvm_vcpu_uninit(struct kvm_vcpu *vcpu); void vcpu_load(struct kvm_vcpu *vcpu); void vcpu_put(struct kvm_vcpu *vcpu); -int kvm_init(void *opaque, unsigned int vcpu_size, +int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align, struct module *module); void kvm_exit(void); diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 9ab1a77..f032806 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -2178,7 +2178,7 @@ static void kvm_sched_out(struct preempt_notifier *pn, kvm_arch_vcpu_put(vcpu); } -int kvm_init(void *opaque, unsigned int vcpu_size, +int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align, struct module *module) { int r; @@ -2228,8 +2228,9 @@ int kvm_init(void *opaque, unsigned int vcpu_size, goto out_free_4; /* A k
[PATCH 5/5] KVM: VMX: Atomically switch efer if EPT && !EFER.NX
When EPT is enabled, we cannot emulate EFER.NX=0 through the shadow page tables. This causes accesses through ptes with bit 63 set to succeed instead of failing a reserved bit check. Signed-off-by: Avi Kivity --- arch/x86/kvm/vmx.c | 11 +++ 1 files changed, 11 insertions(+), 0 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index ae22dcf..cc78fee 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -678,6 +678,17 @@ static bool update_transition_efer(struct vcpu_vmx *vmx, int efer_offset) guest_efer |= host_efer & ignore_bits; vmx->guest_msrs[efer_offset].data = guest_efer; vmx->guest_msrs[efer_offset].mask = ~ignore_bits; + + clear_atomic_switch_msr(vmx, MSR_EFER); + /* On ept, can't emulate nx, and must switch nx atomically */ + if (enable_ept && ((vmx->vcpu.arch.efer ^ host_efer) & EFER_NX)) { + guest_efer = vmx->vcpu.arch.efer; + if (!(guest_efer & EFER_LMA)) + guest_efer &= ~EFER_LME; + add_atomic_switch_msr(vmx, MSR_EFER, guest_efer, host_efer); + return false; + } + return true; } -- 1.7.0.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/5] KVM: VMX: Add definition for msr autoload entry
Signed-off-by: Avi Kivity --- arch/x86/include/asm/vmx.h |8 +++- 1 files changed, 7 insertions(+), 1 deletions(-) diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h index fb9a080..4497318 100644 --- a/arch/x86/include/asm/vmx.h +++ b/arch/x86/include/asm/vmx.h @@ -25,6 +25,8 @@ * */ +#include + /* * Definitions of Primary Processor-Based VM-Execution Controls. */ @@ -394,6 +396,10 @@ enum vmcs_field { #define ASM_VMX_INVEPT ".byte 0x66, 0x0f, 0x38, 0x80, 0x08" #define ASM_VMX_INVVPID ".byte 0x66, 0x0f, 0x38, 0x81, 0x08" - +struct vmx_msr_entry { + u32 index; + u32 reserved; + u64 value; +} __aligned(16); #endif -- 1.7.0.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/5] KVM: VMX: Add definitions for guest and host EFER autoswitch vmcs entries
Signed-off-by: Avi Kivity --- arch/x86/include/asm/vmx.h |4 1 files changed, 4 insertions(+), 0 deletions(-) diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h index 4497318..9e6779f 100644 --- a/arch/x86/include/asm/vmx.h +++ b/arch/x86/include/asm/vmx.h @@ -122,6 +122,8 @@ enum vmcs_field { GUEST_IA32_DEBUGCTL_HIGH= 0x2803, GUEST_IA32_PAT = 0x2804, GUEST_IA32_PAT_HIGH = 0x2805, + GUEST_IA32_EFER = 0x2806, + GUEST_IA32_EFER_HIGH= 0x2807, GUEST_PDPTR0= 0x280a, GUEST_PDPTR0_HIGH = 0x280b, GUEST_PDPTR1= 0x280c, @@ -132,6 +134,8 @@ enum vmcs_field { GUEST_PDPTR3_HIGH = 0x2811, HOST_IA32_PAT = 0x2c00, HOST_IA32_PAT_HIGH = 0x2c01, + HOST_IA32_EFER = 0x2c02, + HOST_IA32_EFER_HIGH = 0x2c03, PIN_BASED_VM_EXEC_CONTROL = 0x4000, CPU_BASED_VM_EXEC_CONTROL = 0x4002, EXCEPTION_BITMAP= 0x4004, -- 1.7.0.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/5] KVM: VMX: Add facility to atomically switch MSRs on guest entry/exit
Some guest msr values cannot be used on the host (for example. EFER.NX=0), so we need to switch them atomically during guest entry or exit. Add a facility to program the vmx msr autoload registers accordingly. Signed-off-by: Avi Kivity --- arch/x86/kvm/vmx.c | 49 + 1 files changed, 49 insertions(+), 0 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 2e87296..ae22dcf 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -98,6 +98,8 @@ module_param(ple_gap, int, S_IRUGO); static int ple_window = KVM_VMX_DEFAULT_PLE_WINDOW; module_param(ple_window, int, S_IRUGO); +#define NR_AUTOLOAD_MSRS 1 + struct vmcs { u32 revision_id; u32 abort; @@ -125,6 +127,11 @@ struct vcpu_vmx { u64 msr_guest_kernel_gs_base; #endif struct vmcs *vmcs; + struct msr_autoload { + unsigned nr; + struct vmx_msr_entry guest[NR_AUTOLOAD_MSRS]; + struct vmx_msr_entry host[NR_AUTOLOAD_MSRS]; + } msr_autoload; struct { int loaded; u16 fs_sel, gs_sel, ldt_sel; @@ -595,6 +602,46 @@ static void update_exception_bitmap(struct kvm_vcpu *vcpu) vmcs_write32(EXCEPTION_BITMAP, eb); } +static void clear_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr) +{ + unsigned i; + struct msr_autoload *m = &vmx->msr_autoload; + + for (i = 0; i < m->nr; ++i) + if (m->guest[i].index == msr) + break; + + if (i == m->nr) + return; + --m->nr; + m->guest[i] = m->guest[m->nr]; + m->host[i] = m->host[m->nr]; + vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, m->nr); + vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, m->nr); +} + +static void add_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr, + u64 guest_val, u64 host_val) +{ + unsigned i; + struct msr_autoload *m = &vmx->msr_autoload; + + for (i = 0; i < m->nr; ++i) + if (m->guest[i].index == msr) + break; + + if (i == m->nr) { + ++m->nr; + vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, m->nr); + vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, m->nr); + } + + m->guest[i].index = msr; + m->guest[i].value = guest_val; + m->host[i].index = msr; + m->host[i].value = host_val; +} + static void reload_tss(void) { /* @@ -2470,7 +2517,9 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx) vmcs_writel(HOST_RIP, kvm_vmx_return); /* 22.2.5 */ vmcs_write32(VM_EXIT_MSR_STORE_COUNT, 0); vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, 0); + vmcs_write64(VM_EXIT_MSR_LOAD_ADDR, __pa(vmx->msr_autoload.host)); vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, 0); + vmcs_write64(VM_ENTRY_MSR_LOAD_ADDR, __pa(vmx->msr_autoload.guest)); rdmsr(MSR_IA32_SYSENTER_CS, host_sysenter_cs, junk); vmcs_write32(HOST_IA32_SYSENTER_CS, host_sysenter_cs); -- 1.7.0.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/9] KVM test: Log the content from guest serial console
On 04/26/2010 01:04 PM, Jason Wang wrote: > This patch tries to get the content of guest serial and log it into > the debug directoy of the testcase through a dedicated thread which is > created in the preprocessing and ended in the postprocessing. The > params of serial_mode must be set to "dump" in order to make use of > this feature. > > Signed-off-by: Jason Wang > --- > client/tests/kvm/kvm_preprocessing.py | 59 > +++- > client/tests/kvm/tests_base.cfg.sample |1 + > 2 files changed, 59 insertions(+), 1 deletions(-) > > diff --git a/client/tests/kvm/kvm_preprocessing.py > b/client/tests/kvm/kvm_preprocessing.py > index 4b9290c..50d0e35 100644 > --- a/client/tests/kvm/kvm_preprocessing.py > +++ b/client/tests/kvm/kvm_preprocessing.py > @@ -1,4 +1,5 @@ > import sys, os, time, commands, re, logging, signal, glob, threading, shutil > +import socket, select > from autotest_lib.client.bin import test, utils > from autotest_lib.client.common_lib import error > import kvm_vm, kvm_utils, kvm_subprocess, ppm_utils > @@ -13,7 +14,8 @@ except ImportError: > > _screendump_thread = None > _screendump_thread_termination_event = None > - > +_serialdump_thread = None > +_serialdump_thread_termination_event = None > > def preprocess_image(test, params): > """ > @@ -267,6 +269,16 @@ def preprocess(test, params, env): >args=(test, params, env)) > _screendump_thread.start() > > +# Start the serial dump thread > +if params.get("serial_mode") == "dump": > +logging.debug("Starting serialdump thread") > +global _serialdump_thread, _serialdump_thread_termination_event > +_serialdump_thread_termination_event = threading.Event() > +_serialdump_thread = threading.Thread(target=_dump_serial_console, > + args=(test, params, env)) > +_serialdump_thread.start() > + > + > > def postprocess(test, params, env): > """ > @@ -286,6 +298,13 @@ def postprocess(test, params, env): > _screendump_thread_termination_event.set() > _screendump_thread.join(10) > > +# Terminate the serialdump thread > +global _serialdump_thread, _serialdump_thread_termination_event > +if _serialdump_thread: > +logging.debug("Terminating serialdump thread...") > +_serialdump_thread_termination_event.set() > +_serialdump_thread.join(10) > + > # Warn about corrupt PPM files > for f in glob.glob(os.path.join(test.debugdir, "*.ppm")): > if not ppm_utils.image_verify_ppm_file(f): > @@ -450,3 +469,41 @@ def _take_screendumps(test, params, env): > if _screendump_thread_termination_event.isSet(): > break > _screendump_thread_termination_event.wait(delay) > + > +def _dump_serial_console(test, params, env): > +global _serialdump_thread_termination_event > +rs = [] > +files = {} > + > +while True: > +for vm in kvm_utils.env_get_all_vms(env): > +if not files.has_key(vm): You should probably add "and not vm.is_dead()" to this condition. Otherwise we'll get lots of "could not connect to serial socket" messages for dead VMs. Style note: AFAIK in the Autotest project the form "if not vm in files" is preferred. > +try: > +serial_socket = socket.socket(socket.AF_UNIX, > + socket.SOCK_STREAM) > +serial_socket.setblocking(False) > +serial_socket.connect(vm.serial_file_name) > +except: > +logging.debug("Could not connect to serial socket for > %s" % > + vm.name) > +continue > +rs.append(serial_socket) > +serial_dump_filename = os.path.join(test.debugdir, > +"serial-%s" % vm.name) > +files[vm] = [serial_socket, file(serial_dump_filename, "a+")] > + > +r, w, x = select.select(rs, [], [], 0.5) > +for vm in files.keys(): > +[s ,d] = files[vm] For consistency, please consider changing this list to a tuple, i.e. s, d = files[vm] or (s, d) = files[vm]. > +if s in r: > +data = s.recv(16384) > +if len(data) == 0: Style note: AFAIK the preferred form is "if not data". Sorry for the petty comments. Overall the patch looks good. > +rs.remove(s) > +files.pop(vm) > +else: > +d.write(data) > + > +if _serialdump_thread_termination_event.isSet(): > +break > + > + > diff --git a/client/tests/kvm/tests_base.cfg.sample > b/client/tests/kvm/tests_base.cfg.sample > index 9f82ffb..169a69e 100644 > --- a/client/tests/kvm/tests_base.cfg.sample > +++
Re: [PATCH 08/23] KVM: x86 emulator: cleanup some direct calls into kvm to use existing callbacks
On 04/28/2010 12:33 PM, Gleb Natapov wrote: On Wed, Apr 28, 2010 at 11:59:54AM +0300, Avi Kivity wrote: On 04/27/2010 03:15 PM, Gleb Natapov wrote: Use callbacks from x86_emulate_ops to access segments instead of calling into kvm directly. -static unsigned long seg_base(struct x86_emulate_ctxt *ctxt, int seg) +static unsigned long seg_base(struct x86_emulate_ctxt *ctxt, + struct x86_emulate_ops *ops, int seg) { - if (ctxt->mode == X86EMUL_MODE_PROT64&& seg< VCPU_SREG_FS) - return 0; + unsigned long base; - return kvm_x86_ops->get_segment_base(ctxt->vcpu, seg); get_segment_base() is only one vmread on intel, but you replace it with reading the entire segment. Didn't what to have separate x86_emulate_ops callback for reading segment base. If this is serious performance concern it can be added It will definitely hurt nested vmx. For ordinary vmx perhaps not so much, but better not to regress. + if (ctxt->mode == X86EMUL_MODE_PROT64) { + u64 val; + switch (seg) { + case VCPU_SREG_FS: + ops->get_msr(ctxt->vcpu, MSR_FS_BASE,&val); + break; + case VCPU_SREG_GS: + ops->get_msr(ctxt->vcpu, MSR_GS_BASE,&val); + break; + default: + val = 0; + break; + } Why this ugliness? get_cached_descriptor() should do this. get_cached_descriptor() returns struct desc_struct which is 32 bit only. There is not 64bit segment descriptors. Ah. Well either extend desc_struct, or return something which works everywhere. static unsigned long seg_override_base(struct x86_emulate_ctxt *ctxt, + struct x86_emulate_ops *ops, struct decode_cache *c) { if (!c->has_seg_override) return 0; - return seg_base(ctxt, c->seg_override); + return seg_base(ctxt, ops, c->seg_override); } Sticking ops into ctxt would reduce the size of these patches. But it will introduce intermediate huge patch. But I agree that this should be done eventually. Planned to do it somewhere at the end. Near the patch that change x86_emulate_ops callbacks to get ctxt instead of vcpu as a parameter. Would have been better up front IMO, but don't destroy your patchset just for this. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 9/9] KVM test: Redirect the console to serial for all linux guests
On 04/26/2010 01:04 PM, Jason Wang wrote: > As we have the ability to dump the content from serial console or use > a session through it, we need to redirect the console to serial > through unattended files to make use of it. > > Signed-off-by: Jason Wang > --- > client/tests/kvm/unattended/Fedora-10.ks |2 +- > client/tests/kvm/unattended/Fedora-11.ks |2 +- > client/tests/kvm/unattended/Fedora-12.ks |2 +- > client/tests/kvm/unattended/Fedora-8.ks |2 +- > client/tests/kvm/unattended/Fedora-9.ks |2 +- > client/tests/kvm/unattended/OpenSUSE-11.xml |1 + > client/tests/kvm/unattended/RHEL-3-series.ks |2 +- > client/tests/kvm/unattended/RHEL-4-series.ks |2 +- > client/tests/kvm/unattended/RHEL-5-series.ks |2 +- > client/tests/kvm/unattended/SLES-11.xml |1 + > 10 files changed, 10 insertions(+), 8 deletions(-) > > diff --git a/client/tests/kvm/unattended/Fedora-10.ks > b/client/tests/kvm/unattended/Fedora-10.ks > index 61e59d7..8628036 100644 > --- a/client/tests/kvm/unattended/Fedora-10.ks > +++ b/client/tests/kvm/unattended/Fedora-10.ks > @@ -11,7 +11,7 @@ firewall --enabled --ssh > selinux --enforcing > timezone --utc America/New_York > firstboot --disable > -bootloader --location=mbr > +bootloader --location=mbr --append="console=ttyS0,115200" I see no reason to completely silence the regular console. We can let some output go to the regular console as well as the serial one by using --append="console=tty0 console=ttyS0" > zerombr > clearpart --all --initlabel > autopart > diff --git a/client/tests/kvm/unattended/Fedora-11.ks > b/client/tests/kvm/unattended/Fedora-11.ks > index 0be7d06..5ce8ee2 100644 > --- a/client/tests/kvm/unattended/Fedora-11.ks > +++ b/client/tests/kvm/unattended/Fedora-11.ks > @@ -10,7 +10,7 @@ firewall --enabled --ssh > selinux --enforcing > timezone --utc America/New_York > firstboot --disable > -bootloader --location=mbr > +bootloader --location=mbr --append="console=ttyS0,115200" > zerombr > > clearpart --all --initlabel > diff --git a/client/tests/kvm/unattended/Fedora-12.ks > b/client/tests/kvm/unattended/Fedora-12.ks > index 0be7d06..5ce8ee2 100644 > --- a/client/tests/kvm/unattended/Fedora-12.ks > +++ b/client/tests/kvm/unattended/Fedora-12.ks > @@ -10,7 +10,7 @@ firewall --enabled --ssh > selinux --enforcing > timezone --utc America/New_York > firstboot --disable > -bootloader --location=mbr > +bootloader --location=mbr --append="console=ttyS0,115200" > zerombr > > clearpart --all --initlabel > diff --git a/client/tests/kvm/unattended/Fedora-8.ks > b/client/tests/kvm/unattended/Fedora-8.ks > index f4a872d..884d556 100644 > --- a/client/tests/kvm/unattended/Fedora-8.ks > +++ b/client/tests/kvm/unattended/Fedora-8.ks > @@ -11,7 +11,7 @@ firewall --enabled --ssh > selinux --enforcing > timezone --utc America/New_York > firstboot --disable > -bootloader --location=mbr > +bootloader --location=mbr --append="console=ttyS0,115200" > zerombr > clearpart --all --initlabel > autopart > diff --git a/client/tests/kvm/unattended/Fedora-9.ks > b/client/tests/kvm/unattended/Fedora-9.ks > index f4a872d..884d556 100644 > --- a/client/tests/kvm/unattended/Fedora-9.ks > +++ b/client/tests/kvm/unattended/Fedora-9.ks > @@ -11,7 +11,7 @@ firewall --enabled --ssh > selinux --enforcing > timezone --utc America/New_York > firstboot --disable > -bootloader --location=mbr > +bootloader --location=mbr --append="console=ttyS0,115200" > zerombr > clearpart --all --initlabel > autopart > diff --git a/client/tests/kvm/unattended/OpenSUSE-11.xml > b/client/tests/kvm/unattended/OpenSUSE-11.xml > index 7dd44fa..2a4fec0 100644 > --- a/client/tests/kvm/unattended/OpenSUSE-11.xml > +++ b/client/tests/kvm/unattended/OpenSUSE-11.xml > @@ -50,6 +50,7 @@ > edd > > > +console=ttyS0,115200 > grub > > > diff --git a/client/tests/kvm/unattended/RHEL-3-series.ks > b/client/tests/kvm/unattended/RHEL-3-series.ks > index 884b386..26b1130 100644 > --- a/client/tests/kvm/unattended/RHEL-3-series.ks > +++ b/client/tests/kvm/unattended/RHEL-3-series.ks > @@ -10,7 +10,7 @@ rootpw 123456 > firewall --enabled --ssh > timezone America/New_York > firstboot --disable > -bootloader --location=mbr > +bootloader --location=mbr --append="console=ttyS0,115200" > clearpart --all --initlabel > autopart > reboot > diff --git a/client/tests/kvm/unattended/RHEL-4-series.ks > b/client/tests/kvm/unattended/RHEL-4-series.ks > index ce4a430..f2f934f 100644 > --- a/client/tests/kvm/unattended/RHEL-4-series.ks > +++ b/client/tests/kvm/unattended/RHEL-4-series.ks > @@ -11,7 +11,7 @@ firewall --enabled --ssh > selinux --enforcing > timezone --utc America/New_York > firstboot --disable > -bootloader --location=mbr > +bootloader --location=mbr --append="console=ttyS0,115200" > zerombr > clearpart --all --initlabel > autopart > diff --git a/client/tests/kvm/unattended/RHEL-5-serie
Re: [PATCH] kvm mmu: reduce 50% memory usage
On 04/28/2010 02:57 PM, Lai Jiangshan wrote: I think users will enable tdp when their hardwares support ept or npt. This patch can reduce about 50% kvm mmu memory usage for they. Good one! -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 7/9] KVM test: Introduce the local_login()
On 04/26/2010 01:04 PM, Jason Wang wrote: > This patch introduces a new method which is used to log into the guest > through the guest serial console. The serial_mode must be set to > "session" in order to make use of this patch. In what cases would we want to use this feature? The serial console is not supported by all guests and I'm not sure it supports multiple concurrent sessions (does it?), so it's probably not possible to use it reliably as a replacement for the regular remote shell servers, or even as an alternative variant. > Signed-off-by: Jason Wang > --- > client/tests/kvm/kvm_vm.py | 25 + > 1 files changed, 25 insertions(+), 0 deletions(-) > > diff --git a/client/tests/kvm/kvm_vm.py b/client/tests/kvm/kvm_vm.py > index 0cdf925..a22893b 100755 > --- a/client/tests/kvm/kvm_vm.py > +++ b/client/tests/kvm/kvm_vm.py > @@ -814,7 +814,32 @@ class VM: > "command", "")) > return session > > +def local_login(self, timeout=240): > +""" > +Log into the guest via serial console > +If timeout expires while waiting for output from the guest (e.g. a > +password prompt or a shell prompt) -- fail. > +""" > + > +serial_mode = self.params.get("serial_mode") > +username = self.params.get("username", "") > +password = self.params.get("password", "") > +prompt = self.params.get("shell_prompt", "[\#\$]") > +linesep = eval("'%s'" % self.params.get("shell_linesep", r"\n")) > > +if serial_mode != "session": > +logging.debug("serial_mode is not session") > +return None > +else: > +command = "nc -U %s" % self.serial_file_name > +assist = self.params.get("prompt_assist") > +session = kvm_utils.remote_login(command, password, prompt, > linesep, > + timeout, "", username) ^ You probably meant to pass the prompt assist string to remote_login() but instead you're passing "". > +if session: > + > session.set_status_test_command(self.params.get("status_test_" > +"command", > "")) > +return session > + > def copy_files_to(self, local_path, remote_path, nic_index=0, > timeout=300): > """ > Transfer files to the guest. > > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Autotest] [PATCH 8/9] KVM test: Create the background threads before calling process()
On 04/26/2010 01:04 PM, Jason Wang wrote: > If the screendump and scrialdump threads are created after the > process(), we may lose the progress tracking of guest shutting > down. So this patch creates them before calling process() in preprocess. > > Signed-off-by: Jason Wang > --- > client/tests/kvm/kvm_preprocessing.py |5 ++--- > 1 files changed, 2 insertions(+), 3 deletions(-) > > diff --git a/client/tests/kvm/kvm_preprocessing.py > b/client/tests/kvm/kvm_preprocessing.py > index 50d0e35..73e835a 100644 > --- a/client/tests/kvm/kvm_preprocessing.py > +++ b/client/tests/kvm/kvm_preprocessing.py > @@ -257,9 +257,6 @@ def preprocess(test, params, env): > int(params.get("pre_command_timeout", "600")), > params.get("pre_command_noncritical") == "yes") > > -# Preprocess all VMs and images > -process(test, params, env, preprocess_image, preprocess_vm) > - > # Start the screendump thread > if params.get("take_regular_screendumps") == "yes": > logging.debug("Starting screendump thread") > @@ -278,6 +275,8 @@ def preprocess(test, params, env): >args=(test, params, env)) > _serialdump_thread.start() > > +# Preprocess all VMs and images > +process(test, params, env, preprocess_image, preprocess_vm) > > > def postprocess(test, params, env): The initial shutdown procedure is carried out automatically by the preprocessor in order to prepare the VMs for the current test, and is not part of the test. During the procedure VMs from a previous test are shut down and/or restarted. I think it'll be confusing (or at least irrelevant) for the user to see a Fedora guest shutting down at the beginning of a Windows test. Also, it will be inconsistent with the pre_*.ppm screendumps which are taken after the new VMs are started. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] kvm mmu: reduce 50% memory usage
I think users will enable tdp when their hardwares support ept or npt. This patch can reduce about 50% kvm mmu memory usage for they. This simple patch use the fact that: When sp->role.direct is set, sp->gfns does not contain any essential information, leaf sptes reachable from this sp are for a continuate guest physical memory range(a linear range). So sp->gfns[i](if it was set) equals to sp->gfn + i. (PT_PAGE_TABLE_LEVEL) Obviously, it is not essential information, we can calculate it when need. It means we don't need sp->gfns when sp->role.direct=1, Thus we can save one page usage for every kvm_mmu_page. Note: Access to sp->gfns must be wrapped by kvm_mmu_page_get_gfn() or kvm_mmu_page_set_gfn(). It is only exposed in FNAME(sync_page). Signed-off-by: Lai Jiangshan --- mmu.c | 38 +- paging_tmpl.h |3 +++ 2 files changed, 32 insertions(+), 9 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index ddfa865..a5c6719 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -393,6 +393,22 @@ static void mmu_free_rmap_desc(struct kvm_rmap_desc *rd) kfree(rd); } +static gfn_t kvm_mmu_page_get_gfn(struct kvm_mmu_page *sp, int index) +{ + if (!sp->role.direct) + return sp->gfns[index]; + + return sp->gfn + (index << ((sp->role.level - 1) * PT64_LEVEL_BITS)); +} + +static void kvm_mmu_page_set_gfn(struct kvm_mmu_page *sp, int index, gfn_t gfn) +{ + if (sp->role.direct) + BUG_ON(gfn != kvm_mmu_page_get_gfn(sp, index)); + else + sp->gfns[index] = gfn; +} + /* * Return the pointer to the largepage write count for a given * gfn, handling slots that are not large page aligned. @@ -543,7 +559,7 @@ static int rmap_add(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn) return count; gfn = unalias_gfn(vcpu->kvm, gfn); sp = page_header(__pa(spte)); - sp->gfns[spte - sp->spt] = gfn; + kvm_mmu_page_set_gfn(sp, spte - sp->spt, gfn); rmapp = gfn_to_rmap(vcpu->kvm, gfn, sp->role.level); if (!*rmapp) { rmap_printk("rmap_add: %p %llx 0->1\n", spte, *spte); @@ -601,6 +617,7 @@ static void rmap_remove(struct kvm *kvm, u64 *spte) struct kvm_rmap_desc *prev_desc; struct kvm_mmu_page *sp; pfn_t pfn; + gfn_t gfn; unsigned long *rmapp; int i; @@ -612,7 +629,8 @@ static void rmap_remove(struct kvm *kvm, u64 *spte) kvm_set_pfn_accessed(pfn); if (is_writable_pte(*spte)) kvm_set_pfn_dirty(pfn); - rmapp = gfn_to_rmap(kvm, sp->gfns[spte - sp->spt], sp->role.level); + gfn = kvm_mmu_page_get_gfn(sp, spte - sp->spt); + rmapp = gfn_to_rmap(kvm, gfn, sp->role.level); if (!*rmapp) { printk(KERN_ERR "rmap_remove: %p %llx 0->BUG\n", spte, *spte); BUG(); @@ -896,7 +914,8 @@ static void kvm_mmu_free_page(struct kvm *kvm, struct kvm_mmu_page *sp) ASSERT(is_empty_shadow_page(sp->spt)); list_del(&sp->link); __free_page(virt_to_page(sp->spt)); - __free_page(virt_to_page(sp->gfns)); + if (!sp->role.direct) + __free_page(virt_to_page(sp->gfns)); kfree(sp); ++kvm->arch.n_free_mmu_pages; } @@ -907,13 +926,15 @@ static unsigned kvm_page_table_hashfn(gfn_t gfn) } static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu, - u64 *parent_pte) + u64 *parent_pte, int direct) { struct kvm_mmu_page *sp; sp = mmu_memory_cache_alloc(&vcpu->arch.mmu_page_header_cache, sizeof *sp); sp->spt = mmu_memory_cache_alloc(&vcpu->arch.mmu_page_cache, PAGE_SIZE); - sp->gfns = mmu_memory_cache_alloc(&vcpu->arch.mmu_page_cache, PAGE_SIZE); + if (!direct) + sp->gfns = mmu_memory_cache_alloc(&vcpu->arch.mmu_page_cache, + PAGE_SIZE); set_page_private(virt_to_page(sp->spt), (unsigned long)sp); list_add(&sp->link, &vcpu->kvm->arch.active_mmu_pages); bitmap_zero(sp->slot_bitmap, KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS); @@ -1347,7 +1368,7 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu, return sp; } ++vcpu->kvm->stat.mmu_cache_miss; - sp = kvm_mmu_alloc_page(vcpu, parent_pte); + sp = kvm_mmu_alloc_page(vcpu, parent_pte, direct); if (!sp) return sp; sp->gfn = gfn; @@ -3314,7 +3335,7 @@ void inspect_spte_has_rmap(struct kvm *kvm, u64 *sptep) if (*sptep & PT_WRITABLE_MASK) { rev_sp = page_header(__pa(sptep)); - gfn = rev_sp->gfns[sptep - rev_sp->spt]; + gfn = kvm_mmu_page_get_gfn(rev_sp, sptep - rev_sp->spt); if (!gfn_to_memslot(kvm, gfn)) {
Re: [PATCH 11/22] KVM: MMU: Add infrastructure for two-level page walker
On 04/28/2010 02:03 PM, Joerg Roedel wrote: On Tue, Apr 27, 2010 at 03:34:10PM +0300, Avi Kivity wrote: On 04/27/2010 01:38 PM, Joerg Roedel wrote: This patch introduces a mmu-callback to translate gpa addresses in the walk_addr code. This is later used to translate l2_gpa addresses into l1_gpa addresses. +static inline gfn_t gpa_to_gfn(gpa_t gpa) +{ + return (gfn_t)gpa>> PAGE_SHIFT; +} + This overflows on 32-bit, since gpa_t is u64 and gfn_t is ulong. Thinking again about it, on 32 bit the physical address width is only 36 bits. So there shouldn't be an overflow, no? It's limited by MAXPHYADDR (at least on Intel) even on 32-bits. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/9] KVM test: Add the ability to send the username in remote_login()
On 04/26/2010 01:03 PM, Jason Wang wrote: > In order to let the serial console work, we must let the > remote_login() send the username when needed. > > Signed-off-by: Jason Wang > --- > client/tests/kvm/kvm_utils.py | 14 ++ > 1 files changed, 10 insertions(+), 4 deletions(-) > > diff --git a/client/tests/kvm/kvm_utils.py b/client/tests/kvm/kvm_utils.py > index 9adbaee..1ea0852 100644 > --- a/client/tests/kvm/kvm_utils.py > +++ b/client/tests/kvm/kvm_utils.py > @@ -452,7 +452,7 @@ def check_kvm_source_dir(source_dir): > # guests. > > def remote_login(command, password, prompt, linesep="\n", timeout=10, > - prompt_assist = None): > + prompt_assist = None, username = None): ^ ^ ^ ^ Same here: PEP 8 says no spaces around keyword arguments or default values, and pretty much all of the KVM test code conforms with PEP 8, so let's be consistent. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/9] KVM test: Introduce the prompt assist
On 04/26/2010 01:03 PM, Jason Wang wrote: > Sometimes we need to send an assist string to a session in order to > get the prompt especially when re-connecting to an already logged > serial session. This patch send the assist string before doing the > pattern matching of remote_login. Can you give an example of a prompt assist string, and a typical usage example? What guests require prompt assist strings? > Signed-off-by: Jason Wang > --- > client/tests/kvm/kvm_utils.py |9 +++-- > 1 files changed, 7 insertions(+), 2 deletions(-) > > diff --git a/client/tests/kvm/kvm_utils.py b/client/tests/kvm/kvm_utils.py > index 25f3c8c..9adbaee 100644 > --- a/client/tests/kvm/kvm_utils.py > +++ b/client/tests/kvm/kvm_utils.py > @@ -451,7 +451,8 @@ def check_kvm_source_dir(source_dir): > # The following are functions used for SSH, SCP and Telnet communication with > # guests. > > -def remote_login(command, password, prompt, linesep="\n", timeout=10): > +def remote_login(command, password, prompt, linesep="\n", timeout=10, > + prompt_assist = None): ^ ^ These spaces do not conform with PEP 8. > """ > Log into a remote host (guest) using SSH or Telnet. Run the given command > using kvm_spawn and provide answers to the questions asked. If timeout > @@ -468,7 +469,8 @@ def remote_login(command, password, prompt, linesep="\n", > timeout=10): > @param timeout: The maximal time duration (in seconds) to wait for each > step of the login procedure (i.e. the "Are you sure" prompt, the > password prompt, the shell prompt, etc) > - > +@prarm prompt_assist: An assistant string sent before the pattern Typo^ > +matching in order to get the prompt for some kinds of > shell_client. > @return Return the kvm_spawn object on success and None on failure. > """ > sub = kvm_subprocess.kvm_shell_session(command, > @@ -479,6 +481,9 @@ def remote_login(command, password, prompt, linesep="\n", > timeout=10): > > logging.debug("Trying to login with command '%s'" % command) > > +if prompt_assist is not None: > +sub.sendline(prompt_assist) > + > while True: > (match, text) = sub.read_until_last_line_matches( > [r"[Aa]re you sure", r"[Pp]assword:\s*$", > r"^\s*[Ll]ogin:\s*$", > > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 11/22] KVM: MMU: Add infrastructure for two-level page walker
On 04/28/2010 01:52 PM, Joerg Roedel wrote: On Tue, Apr 27, 2010 at 03:34:10PM +0300, Avi Kivity wrote: On 04/27/2010 01:38 PM, Joerg Roedel wrote: This patch introduces a mmu-callback to translate gpa addresses in the walk_addr code. This is later used to translate l2_gpa addresses into l1_gpa addresses. +static inline gfn_t gpa_to_gfn(gpa_t gpa) +{ + return (gfn_t)gpa>> PAGE_SHIFT; +} + This overflows on 32-bit, since gpa_t is u64 and gfn_t is ulong. Hm, this is a problem outside of this patchset too (for 32bit hosts). The best solution is probably to convert gfn_t to u64 too. If you cast like (gfn_t)(gpa >> PAGE_SHIFT) you avoid the overflow for MAXPHYADDR < 48. However, I agree that converting gfn_t to u64 is best, the minor performance degradation is in no way comparable to the corruption that results from a miscast. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/1] KVM: x86: avoid unnecessary bitmap allocation when memslot is clean
On 04/27/2010 04:46 PM, Takuya Yoshikawa wrote: (2010/04/27 22:18), Avi Kivity wrote: Furthermore, the reduced allocations seem to produce good effects for other cases too. Actually, I observed that the time for the ioctl was more stable than the original one and the average time for dirty slots was also reduced by some extent. Can you explain why the dirty slots were improved? I cannot do exactly, but vmalloc() might affect the following allocations? But actually, I could check how much one vmalloc() consumed our time, and this will be completely removed by our "moving dirty bitmaps to user space". Better to remove this explanation about dirty slots? I guess so. BTW, maybe good news for us! 1. I found one place in which set_bit_to_user() is locally defined using current helpers. Interesting, where? -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/9] KVM test: Make the login re suitable for serial console
On 04/26/2010 01:03 PM, Jason Wang wrote: > Current matching re ^\s*[Ll]ogin:\s*$ is not suitable for the serial > console, so change it to [Ll]ogin:. > > Signed-off-by: Jason Wang > --- > client/tests/kvm/kvm_utils.py |2 +- > 1 files changed, 1 insertions(+), 1 deletions(-) > > diff --git a/client/tests/kvm/kvm_utils.py b/client/tests/kvm/kvm_utils.py > index 1ea0852..bb42314 100644 > --- a/client/tests/kvm/kvm_utils.py > +++ b/client/tests/kvm/kvm_utils.py > @@ -488,7 +488,7 @@ def remote_login(command, password, prompt, linesep="\n", > timeout=10, > > while True: > (match, text) = sub.read_until_last_line_matches( > -[r"[Aa]re you sure", r"[Pp]assword:\s*$", > r"^\s*[Ll]ogin:\s*$", > +[r"[Aa]re you sure", r"[Pp]assword:\s*$", r"[Ll]ogin:", This RE will also match "Last login: Wed Apr 28 14:07:28 2010 from localhost.localdomain" so we should probably settle for [Ll]ogin:\s*$. > r"[Cc]onnection.*closed", r"[Cc]onnection.*refused", > r"[Pp]lease wait", prompt], > timeout=timeout, internal_timeout=0.5) > > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 11/22] KVM: MMU: Add infrastructure for two-level page walker
On Tue, Apr 27, 2010 at 03:34:10PM +0300, Avi Kivity wrote: > On 04/27/2010 01:38 PM, Joerg Roedel wrote: > >This patch introduces a mmu-callback to translate gpa > >addresses in the walk_addr code. This is later used to > >translate l2_gpa addresses into l1_gpa addresses. > > > >+static inline gfn_t gpa_to_gfn(gpa_t gpa) > >+{ > >+return (gfn_t)gpa>> PAGE_SHIFT; > >+} > >+ > > This overflows on 32-bit, since gpa_t is u64 and gfn_t is ulong. Thinking again about it, on 32 bit the physical address width is only 36 bits. So there shouldn't be an overflow, no? Joerg -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 11/22] KVM: MMU: Add infrastructure for two-level page walker
On Tue, Apr 27, 2010 at 03:34:10PM +0300, Avi Kivity wrote: > On 04/27/2010 01:38 PM, Joerg Roedel wrote: > >This patch introduces a mmu-callback to translate gpa > >addresses in the walk_addr code. This is later used to > >translate l2_gpa addresses into l1_gpa addresses. > > > >+static inline gfn_t gpa_to_gfn(gpa_t gpa) > >+{ > >+return (gfn_t)gpa>> PAGE_SHIFT; > >+} > >+ > > This overflows on 32-bit, since gpa_t is u64 and gfn_t is ulong. Hm, this is a problem outside of this patchset too (for 32bit hosts). The best solution is probably to convert gfn_t to u64 too. Joerg -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [SeaBIOS] About cpu_set, CPU hotplug and related subjects
On Wed, Apr 28, 2010 at 12:41:51PM +0200, Jes Sorensen wrote: > On 04/28/10 12:30, Gleb Natapov wrote: > > On Wed, Apr 28, 2010 at 11:31:00AM +0200, Jes Sorensen wrote: > >> On 04/22/10 03:12, Kevin O'Connor wrote: > >> Generating the more complex tables dynamically would be preferred, but > >> it requires like half an AML compiler in Seabios, so it kinda stalled > >> there > >> > > We can try to be smart and generate most of the code statically and only > > minimum that absolutely required dynamically. Haven't looked at how > > simple dynamic part can be made. > > I looked at it briefly and ran away screaming :) I am sure it can be This is normal reaction to ACPI of a healthy human being. > done, bit it would require pretty good understanding of the AML encodings. > > The CPU declarations are particularly tricky as they get pretty big and > complex and need to live in the DSDT, whereas a lot of other things we > can shift off to separate SSDT tables and only put the minimum that > needs to be generated dynamically in it's own table. > We can generate complex code statically and call it from dynamically generated CPU declarations. -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [SeaBIOS] About cpu_set, CPU hotplug and related subjects
On 04/28/10 12:30, Gleb Natapov wrote: > On Wed, Apr 28, 2010 at 11:31:00AM +0200, Jes Sorensen wrote: >> On 04/22/10 03:12, Kevin O'Connor wrote: >> Generating the more complex tables dynamically would be preferred, but >> it requires like half an AML compiler in Seabios, so it kinda stalled >> there >> > We can try to be smart and generate most of the code statically and only > minimum that absolutely required dynamically. Haven't looked at how > simple dynamic part can be made. I looked at it briefly and ran away screaming :) I am sure it can be done, bit it would require pretty good understanding of the AML encodings. The CPU declarations are particularly tricky as they get pretty big and complex and need to live in the DSDT, whereas a lot of other things we can shift off to separate SSDT tables and only put the minimum that needs to be generated dynamically in it's own table. Cheers, Jes -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 0/10] KVM MMU: allow more shadow pages become asynchronous
On 04/28/2010 06:54 AM, Xiao Guangrong wrote: Changlog v3: Reviewed-by: Avi Kivity -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [SeaBIOS] About cpu_set, CPU hotplug and related subjects
On Wed, Apr 28, 2010 at 11:31:00AM +0200, Jes Sorensen wrote: > On 04/22/10 03:12, Kevin O'Connor wrote: > > As I understand it, the hotplug support was only in the kvm copy of > > bochs bios. It also limited the number of cpus one could use (I think > > 16). > > > > The current smp support in SeaBIOS doesn't limit the number of cpus. > > > > So, there has been reluctance to just port the old kvm bios code > > forward. > > I believe the number was limited to 15, due to limits in a certain > proprietary operating system. > > The issue with the AML code in BOCHS vs the code in Seabios is a bit > tricky. The tables in BOCHS are a fair bit more complicated with the > hotplug code and therefore statically generated, whereas Seabios > generates them dynamically at boot time. > > Generating the more complex tables dynamically would be preferred, but > it requires like half an AML compiler in Seabios, so it kinda stalled > there > We can try to be smart and generate most of the code statically and only minimum that absolutely required dynamically. Haven't looked at how simple dynamic part can be made. -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/1] KVM: x86: avoid unnecessary bitmap allocation when memslot is clean
On 04/28/2010 07:17 AM, Takuya Yoshikawa wrote: (2010/04/27 22:46), Takuya Yoshikawa wrote: (2010/04/27 22:18), Avi Kivity wrote: Furthermore, the reduced allocations seem to produce good effects for other cases too. Actually, I observed that the time for the ioctl was more stable than the original one and the average time for dirty slots was also reduced by some extent. Can you explain why the dirty slots were improved? I cannot do exactly, but vmalloc() might affect the following allocations? Oh, this might be the effect of tlb flush or something? - I'm not so confident about mm, but vmalloc(),vfree() does this kind of thing? Yes, vmalloc() does tlb flushes, perhaps that's the cause. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCHv2] qemu-kvm: fix crash on reboot with vhost-net
When vhost-net is disabled on reboot, we set msix mask notifier to NULL to disable further mask/unmask notifications. Code currently tries to pass this NULL to notifier, leading to a crash. The right thing to do is to add explicit APIs to enable/disable notifications. Now when disabling notifications: - if vector is masked, we don't need to notify backend, just disable future notifications - if vector is unmasked, invoke callback to unassign backend, then disable future notifications This patch also polls notifier before closing it, to make sure we don't lose events if poll callback didn't have time to run. Signed-off-by: Michael S. Tsirkin --- Changes from v1: Separate APIs to set and unset notifiers Test and clear notifier before destroying it hw/msix.c | 40 +++- hw/msix.h |1 + hw/virtio-pci.c |7 +-- 3 files changed, 41 insertions(+), 7 deletions(-) diff --git a/hw/msix.c b/hw/msix.c index 3ec8805..8f9a621 100644 --- a/hw/msix.c +++ b/hw/msix.c @@ -609,14 +609,44 @@ void msix_unuse_all_vectors(PCIDevice *dev) int msix_set_mask_notifier(PCIDevice *dev, unsigned vector, void *opaque) { +int r; +if (vector >= dev->msix_entries_nr || !dev->msix_entry_used[vector]) +return 0; + +assert(dev->msix_mask_notifier); +assert(opaque); +assert(!dev->msix_mask_notifier_opaque[vector]); + +if (msix_is_masked(dev, vector)) { +return 0; +} +r = dev->msix_mask_notifier(dev, vector, opaque, +msix_is_masked(dev, vector)); +if (r < 0) { +return r; +} +dev->msix_mask_notifier_opaque[vector] = opaque; +return r; +} + +int msix_unset_mask_notifier(PCIDevice *dev, unsigned vector) +{ int r = 0; if (vector >= dev->msix_entries_nr || !dev->msix_entry_used[vector]) return 0; -if (dev->msix_mask_notifier) -r = dev->msix_mask_notifier(dev, vector, opaque, -msix_is_masked(dev, vector)); -if (r >= 0) -dev->msix_mask_notifier_opaque[vector] = opaque; +assert(dev->msix_mask_notifier); +assert(dev->msix_mask_notifier_opaque[vector]); + +if (msix_is_masked(dev, vector)) { +return 0; +} +r = dev->msix_mask_notifier(dev, vector, +dev->msix_mask_notifier_opaque[vector], +msix_is_masked(dev, vector)); +if (r < 0) { +return r; +} +dev->msix_mask_notifier_opaque[vector] = NULL; return r; } diff --git a/hw/msix.h b/hw/msix.h index f167231..6b21ffb 100644 --- a/hw/msix.h +++ b/hw/msix.h @@ -34,4 +34,5 @@ void msix_reset(PCIDevice *dev); extern int msix_supported; int msix_set_mask_notifier(PCIDevice *dev, unsigned vector, void *opaque); +int msix_unset_mask_notifier(PCIDevice *dev, unsigned vector); #endif diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c index 99a588c..c4bc633 100644 --- a/hw/virtio-pci.c +++ b/hw/virtio-pci.c @@ -462,10 +462,13 @@ static int virtio_pci_set_guest_notifier(void *opaque, int n, bool assign) msix_set_mask_notifier(&proxy->pci_dev, virtio_queue_vector(proxy->vdev, n), vq); } else { -msix_set_mask_notifier(&proxy->pci_dev, - virtio_queue_vector(proxy->vdev, n), NULL); +msix_unset_mask_notifier(&proxy->pci_dev, +virtio_queue_vector(proxy->vdev, n)); qemu_set_fd_handler(event_notifier_get_fd(notifier), NULL, NULL, NULL); +/* Test and clear notifier before closing it, + * in case poll callback didn't have time to run. */ +virtio_pci_guest_notifier_read(vq); event_notifier_cleanup(notifier); } -- 1.7.1.rc1.22.g3163 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 08/23] KVM: x86 emulator: cleanup some direct calls into kvm to use existing callbacks
On Wed, Apr 28, 2010 at 11:59:54AM +0300, Avi Kivity wrote: > On 04/27/2010 03:15 PM, Gleb Natapov wrote: > >Use callbacks from x86_emulate_ops to access segments instead of calling > >into kvm directly. > > > > > > > >-static unsigned long seg_base(struct x86_emulate_ctxt *ctxt, int seg) > >+static unsigned long seg_base(struct x86_emulate_ctxt *ctxt, > >+ struct x86_emulate_ops *ops, int seg) > > { > >-if (ctxt->mode == X86EMUL_MODE_PROT64&& seg< VCPU_SREG_FS) > >-return 0; > >+unsigned long base; > > > >-return kvm_x86_ops->get_segment_base(ctxt->vcpu, seg); > > get_segment_base() is only one vmread on intel, but you replace it > with reading the entire segment. > Didn't what to have separate x86_emulate_ops callback for reading segment base. If this is serious performance concern it can be added > >+if (ctxt->mode == X86EMUL_MODE_PROT64) { > >+u64 val; > >+switch (seg) { > >+case VCPU_SREG_FS: > >+ops->get_msr(ctxt->vcpu, MSR_FS_BASE,&val); > >+break; > >+case VCPU_SREG_GS: > >+ops->get_msr(ctxt->vcpu, MSR_GS_BASE,&val); > >+break; > >+default: > >+val = 0; > >+break; > >+} > > Why this ugliness? get_cached_descriptor() should do this. > get_cached_descriptor() returns struct desc_struct which is 32 bit only. There is not 64bit segment descriptors. > > > > static unsigned long seg_override_base(struct x86_emulate_ctxt *ctxt, > >+ struct x86_emulate_ops *ops, > >struct decode_cache *c) > > { > > if (!c->has_seg_override) > > return 0; > > > >-return seg_base(ctxt, c->seg_override); > >+return seg_base(ctxt, ops, c->seg_override); > > } > > Sticking ops into ctxt would reduce the size of these patches. > But it will introduce intermediate huge patch. But I agree that this should be done eventually. Planned to do it somewhere at the end. Near the patch that change x86_emulate_ops callbacks to get ctxt instead of vcpu as a parameter. -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
KVM Networking Issues
Hi, Can anyone shed some light on a questions I have if possible? I know KVM uses dnsmasq for DHCP and DNS, but is it possible to not use this and use another DHCP server to dish out IP addresses? Now if my understanding is right, there is just a NAT created to access the VM? How is this done? The reason I ask is that I am trying to put together a lab that is running a kickstart server, which uses dhcp, so i would like my DHCP server to handle all networking, but I need to know how to create the rest of the setup so that i can assign a static IP and dhcp is only used for deployment / PXE. I hope iv explained that so it makes sense? Hope someone can help. Kind Regards Tony -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv2] qemu-kvm: fix crash on reboot with vhost-net
"Michael S. Tsirkin" wrote: > When vhost-net is disabled on reboot, we set msix mask notifier > to NULL to disable further mask/unmask notifications. > Code currently tries to pass this NULL to notifier, > leading to a crash. The right thing to do is > to add explicit APIs to enable/disable notifications. > Now when disabling notifications: > - if vector is masked, we don't need to notify backend, > just disable future notifications > - if vector is unmasked, invoke callback to unassign backend, > then disable future notifications > > This patch also polls notifier before closing it, > to make sure we don't lose events if poll callback > didn't have time to run. > > Signed-off-by: Michael S. Tsirkin > --- > > Changes from v1: > Separate APIs to set and unset notifiers > Test and clear notifier before destroying it > > hw/msix.c | 40 +++- > hw/msix.h |1 + > hw/virtio-pci.c |7 +-- > 3 files changed, 41 insertions(+), 7 deletions(-) Acked-by: Juan Quintela This patch addresses the issues that I pointed in previous mail and fixes the problem at hand. Creating a better API for interaction between virtio-pci <-> msix still needs more work/thought, already discussed that with mst. Later, Juan. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 15/23] KVM: do not inject #PF in (read|write)_emulated() callbacks
On 04/27/2010 03:15 PM, Gleb Natapov wrote: Return error to x86 emulator instead of injection exception behind its back. Signed-off-by: Gleb Natapov --- arch/x86/include/asm/kvm_emulate.h |3 +++ arch/x86/kvm/emulate.c | 12 +++- arch/x86/kvm/x86.c | 28 ++-- 3 files changed, 28 insertions(+), 15 deletions(-) diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h index ae4af86..b977ccf 100644 --- a/arch/x86/include/asm/kvm_emulate.h +++ b/arch/x86/include/asm/kvm_emulate.h @@ -94,6 +94,7 @@ struct x86_emulate_ops { int (*read_emulated)(unsigned long addr, void *val, unsigned int bytes, +unsigned int *error, struct kvm_vcpu *vcpu); The fault may be at a different address than addr, if we cross a page boundary. Need a struct here. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] KVM, Fix QEMU-KVM is killed by guest SRAO MCE
On 04/28/2010 05:56 AM, Huang Ying wrote: Just want to use the side effect of copy_from_user, SIGBUS will be sent to current process because the page touched is marked as poisoned. That is, failure is expected, so the return value is not checked. What if the failure doesn't happen? Say, someone mmap()ed over the page. Sorry, not get your idea clearly. hva is re-mmap()ed? We just read the hva, not write, so I think it should be OK here. We don't generate a signal in this case. Does the code continue to work correctly (not sure what correctly is in this case... should probably just continue). There's also the possibility of -EFAULT. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2] KVM: x86: avoid unnecessary bitmap allocation when memslot is clean
Hi Marcelo, Avi, I updated the patch as follows. Changelog: 1. Inserted one "r = -ENOMEM;" line following Avi's advice. 2. Little change of explanation about performance improvements. I'm now testing and cleaning up my next patch series based on this, so please apply this if this makes sense and has no problems. Thanks, Takuya === Although we always allocate a new dirty bitmap in x86's get_dirty_log(), it is only used as a zero-source of copy_to_user() and freed right after that when memslot is clean. This patch uses clear_user() instead of doing this unnecessary zero-source allocation. Performance improvement: as we can expect easily, the time needed to allocate a bitmap is completely reduced. In my test, the improved ioctl was about 4 to 10 times faster than the original one for clean slots. Furthermore, reducing memory allocations and copies will produce good effects to caches too. Signed-off-by: Takuya Yoshikawa --- arch/x86/kvm/x86.c | 37 +++-- 1 files changed, 23 insertions(+), 14 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 6b2ce1d..b95a211 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2744,7 +2744,6 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot; unsigned long n; unsigned long is_dirty = 0; - unsigned long *dirty_bitmap = NULL; mutex_lock(&kvm->slots_lock); @@ -2759,27 +2758,30 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, n = kvm_dirty_bitmap_bytes(memslot); - r = -ENOMEM; - dirty_bitmap = vmalloc(n); - if (!dirty_bitmap) - goto out; - memset(dirty_bitmap, 0, n); - for (i = 0; !is_dirty && i < n/sizeof(long); i++) is_dirty = memslot->dirty_bitmap[i]; /* If nothing is dirty, don't bother messing with page tables. */ if (is_dirty) { struct kvm_memslots *slots, *old_slots; + unsigned long *dirty_bitmap; spin_lock(&kvm->mmu_lock); kvm_mmu_slot_remove_write_access(kvm, log->slot); spin_unlock(&kvm->mmu_lock); - slots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL); - if (!slots) - goto out_free; + r = -ENOMEM; + dirty_bitmap = vmalloc(n); + if (!dirty_bitmap) + goto out; + memset(dirty_bitmap, 0, n); + r = -ENOMEM; + slots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL); + if (!slots) { + vfree(dirty_bitmap); + goto out; + } memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots)); slots->memslots[log->slot].dirty_bitmap = dirty_bitmap; @@ -2788,13 +2790,20 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, synchronize_srcu_expedited(&kvm->srcu); dirty_bitmap = old_slots->memslots[log->slot].dirty_bitmap; kfree(old_slots); + + r = -EFAULT; + if (copy_to_user(log->dirty_bitmap, dirty_bitmap, n)) { + vfree(dirty_bitmap); + goto out; + } + vfree(dirty_bitmap); + } else { + r = -EFAULT; + if (clear_user(log->dirty_bitmap, n)) + goto out; } r = 0; - if (copy_to_user(log->dirty_bitmap, dirty_bitmap, n)) - r = -EFAULT; -out_free: - vfree(dirty_bitmap); out: mutex_unlock(&kvm->slots_lock); return r; -- 1.6.3.3 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [SeaBIOS] About cpu_set, CPU hotplug and related subjects
On 04/22/10 03:12, Kevin O'Connor wrote: > As I understand it, the hotplug support was only in the kvm copy of > bochs bios. It also limited the number of cpus one could use (I think > 16). > > The current smp support in SeaBIOS doesn't limit the number of cpus. > > So, there has been reluctance to just port the old kvm bios code > forward. I believe the number was limited to 15, due to limits in a certain proprietary operating system. The issue with the AML code in BOCHS vs the code in Seabios is a bit tricky. The tables in BOCHS are a fair bit more complicated with the hotplug code and therefore statically generated, whereas Seabios generates them dynamically at boot time. Generating the more complex tables dynamically would be preferred, but it requires like half an AML compiler in Seabios, so it kinda stalled there Cheers, Jes -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Pulling regularly from QEMU git repository
On 04/28/2010 07:37 AM, Gerhard Wiesinger wrote: Hello, I saw that KVM git repository is not updated from QEMU repository very often. Is it possible to integrate QEMU git tree into KVM git regularly as qemu git tree is very dynamic. E.g. last commit was more than 2 weeks ago. I think it is necessary to integrate early and often as a lot of bugs are fixed. We usually pull at least once a week, but sometimes we are lazy. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 15/23] KVM: do not inject #PF in (read|write)_emulated() callbacks
On Wed, Apr 28, 2010 at 12:11:41PM +0300, Avi Kivity wrote: > On 04/27/2010 03:15 PM, Gleb Natapov wrote: > >Return error to x86 emulator instead of injection exception behind its back. > > > >Signed-off-by: Gleb Natapov > >--- > > arch/x86/include/asm/kvm_emulate.h |3 +++ > > arch/x86/kvm/emulate.c | 12 +++- > > arch/x86/kvm/x86.c | 28 ++-- > > 3 files changed, 28 insertions(+), 15 deletions(-) > > > >diff --git a/arch/x86/include/asm/kvm_emulate.h > >b/arch/x86/include/asm/kvm_emulate.h > >index ae4af86..b977ccf 100644 > >--- a/arch/x86/include/asm/kvm_emulate.h > >+++ b/arch/x86/include/asm/kvm_emulate.h > >@@ -94,6 +94,7 @@ struct x86_emulate_ops { > > int (*read_emulated)(unsigned long addr, > > void *val, > > unsigned int bytes, > >+ unsigned int *error, > > struct kvm_vcpu *vcpu); > > > > The fault may be at a different address than addr, if we cross a > page boundary. Need a struct here. > Correct and you can find couple of FIXME in emulator.c that say so. I'll fix that later. This is not the problem that was introduced by this patch set. What do you mean we need struct here? Struct with error code + error location? After looking at the code: only write_emulated handle cross page access correctly now. -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 08/23] KVM: x86 emulator: cleanup some direct calls into kvm to use existing callbacks
On 04/27/2010 03:15 PM, Gleb Natapov wrote: Use callbacks from x86_emulate_ops to access segments instead of calling into kvm directly. -static unsigned long seg_base(struct x86_emulate_ctxt *ctxt, int seg) +static unsigned long seg_base(struct x86_emulate_ctxt *ctxt, + struct x86_emulate_ops *ops, int seg) { - if (ctxt->mode == X86EMUL_MODE_PROT64&& seg< VCPU_SREG_FS) - return 0; + unsigned long base; - return kvm_x86_ops->get_segment_base(ctxt->vcpu, seg); get_segment_base() is only one vmread on intel, but you replace it with reading the entire segment. + if (ctxt->mode == X86EMUL_MODE_PROT64) { + u64 val; + switch (seg) { + case VCPU_SREG_FS: + ops->get_msr(ctxt->vcpu, MSR_FS_BASE,&val); + break; + case VCPU_SREG_GS: + ops->get_msr(ctxt->vcpu, MSR_GS_BASE,&val); + break; + default: + val = 0; + break; + } Why this ugliness? get_cached_descriptor() should do this. static unsigned long seg_override_base(struct x86_emulate_ctxt *ctxt, + struct x86_emulate_ops *ops, struct decode_cache *c) { if (!c->has_seg_override) return 0; - return seg_base(ctxt, c->seg_override); + return seg_base(ctxt, ops, c->seg_override); } Sticking ops into ctxt would reduce the size of these patches. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 22/23] KVM: remove unneeded initialization.
On 04/27/2010 03:15 PM, Gleb Natapov wrote: This initialization is no longer needed. Signed-off-by: Gleb Natapov --- arch/x86/kvm/x86.c |2 -- 1 files changed, 0 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index ed15b20..29e2d3b 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3801,8 +3801,6 @@ int emulate_instruction(struct kvm_vcpu *vcpu, */ cache_all_regs(vcpu); - vcpu->mmio_is_write = 0; - if (!(emulation_type& EMULTYPE_NO_DECODE)) { int cs_db, cs_l; kvm_x86_ops->get_cs_db_l_bits(vcpu,&cs_db,&cs_l); Best to fold into the patch that made it unnecessary. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 09/23] KVM: x86 emulator: make set_cr() callback return error if it fails
On 04/27/2010 03:15 PM, Gleb Natapov wrote: Make set_cr() callback return error if it fails instead of injecting #GP behind emulator's back. -void kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0) +static int _kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0) { standard practice is double underscore. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html