RE: [RFC][PATCH v4 05/18] Add a function to indicate if device use external buffer.

2010-04-28 Thread Xin, Xiaohui
> +static int dev_is_mpassthru(struct net_device *dev)
>
>bool return value should be better here.
>
>-- 
>Regards,
>Changli Gao(xiao...@gmail.com)

Thanks, would fix that.

Thanks
Xiaohui
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [RFC][PATCH v4 00/18] Provide a zero-copy method on KVM virtio-net.

2010-04-28 Thread Xin, Xiaohui
> > The idea is simple, just to pin the guest VM user space and then let
> > host NIC driver has the chance to directly DMA to it.
> >
> >Isn't it much easier to map the RX ring of the network device into the
> >guest's address space, have DMA map calls translate guest addresses to
> >physical/DMA addresses as well as do all of this crazy page pinning
> >stuff, and provide the translations and protections via the IOMMU?

>This means we need guest know how the specific network device works.
>So we won't be able to, for example, move guest between different hosts.
>There are other problems: many physical systems do not have an iommu,
>some guest OS-es do not support DMA map calls, doing VM exit
>on each DMA map call might turn out to be very slow. And so on.

This solution is what now we can think of to implement zero-copy. Some
modifications are made to net core to try to avoid network device driver
changes.  The major change is to __alloc_skb(), in which we added a dev
parameter to indicate whether the device will DMA to/from guest/user buffer
which is pointed by host skb->data. We also modify skb_release_data() and
skb_reserve(). We made it now works with ixgbe driver with PS mode disabled,
and got some performance data with it.
  
using netperf with GSO/TSO disabled, 10G NIC, 
disabled packet split mode, with raw socket case compared to vhost.

bindwidth will be from 1.1Gbps to 1.7Gbps
CPU % from 120%-140% to 140%-160%

We are now trying to get decent performance data with advanced features.
Do you have any other concerns with this solution? 

>> What's being proposed here looks a bit over-engineered.

>This is an attempt to reduce overhead for virtio (paravirtualization).
>'Don't use PV' is kind of an alternative, but I do not
>think it's a simpler one.

-- 
MST
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] KVM, Fix QEMU-KVM is killed by guest SRAO MCE

2010-04-28 Thread Huang Ying
On Wed, 2010-04-28 at 17:47 +0800, Avi Kivity wrote:
> On 04/28/2010 05:56 AM, Huang Ying wrote:
> >
> >>>
> >>> Just want to use the side effect of copy_from_user, SIGBUS will be sent
> >>> to current process because the page touched is marked as poisoned. That
> >>> is, failure is expected, so the return value is not checked.
> >>>
> >>>
> >> What if the failure doesn't happen?  Say, someone mmap()ed over the page.
> >>  
> > Sorry, not get your idea clearly. hva is re-mmap()ed? We just read the
> > hva, not write, so I think it should be OK here.
> >
> >
> 
> We don't generate a signal in this case.  Does the code continue to work 
> correctly (not sure what correctly is in this case... should probably 
> just continue).
> 
> There's also the possibility of -EFAULT.

I think signal should be generated for copy_from_user, because the hva
is poisoned now. The signal will not generated only if the hva is
re-mmap()ped to some other physical page, but this should be impossible
unless we have memory hotadd/hotremove in KVM.

If the signal is not generated, lost or overwritten, guest will
continue, and if the hva is still poisoned, the page fault will be
triggered again; if the hva is not poisoned, there will be no further
page fault.

Best Regards,
Huang Ying



--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Ignore SRAO MCE if another MCE is being processed

2010-04-28 Thread Huang Ying
On Thu, 2010-04-29 at 00:33 +0800, Avi Kivity wrote:
> On 04/28/2010 09:08 AM, Huang Ying wrote:
> >
> > There is some test script and document for this in:
> >
> > git://git.kernel.org/pub/scm/utils/cpu/mce/mce-test.git
> >
> > test script is in "kvm" directory, testing document is kvm/README
> >
> >
> 
> This is a good candidate for kvm-autotest.  We'll get little user 
> coverage for this feature, so it's good to get it machine tested regularly.

Yes. We will working on this.

Best Regards,
Huang Ying

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


High CPU load on target host after migration

2010-04-28 Thread Thomas Beinicke
Hi,

I have been toying around with kvm / libvirt / virt-manager and it's migration 
feature.
Both host machines are running a 2.6.33 Kernel.

One host is a Dual Quad Core Intel Xeon E5520  @ 2.27GHz and the other is a 
Dual Quad Core Intel E5420  @ 2.50GHz.

Migrating Linux machines works great but Windows XP SP3 is giving me a 
headache.
The migration process finishes without any crash/error but the CPU load on the 
target host is ~50% on two cores. There is no CPU intensive task running 
inside the VM though.
"Removing" the network card from the VM and migrating it between the two 
machines doesn't seem to trigger the high CPU load.
As network driver I tried the realtek and Red Hat virtio driver but it doesn't 
seem to make a difference.

Any insights on what could cause this situation and how to best debug it?

Cheers,

Thomas
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RHEL6.0 PATCH] qemu-kvm: fix crash on reboot with vhost-net

2010-04-28 Thread Michael S. Tsirkin
On Wed, Apr 28, 2010 at 11:16:15PM +0300, Michael S. Tsirkin wrote:
> Bugzilla: 585940
> Upstream status: applied on qemu-kvm.git,
> commit 992cc816c42f2e93db033919a9ddbfcd1da4

Please disregard, sent to this list in error.
Sorry about the noise.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Can I simulate a virtual Dual-Head Graphiccard?

2010-04-28 Thread Brian Jackson
On Wednesday, April 28, 2010 03:08:24 pm Axel Kittenberger wrote:
> Hello,
> 
> This is a question I was not able to answer with a search. I've been
> using kvm now quite successfully as server side solution. Now I want to
> use it on a particular desktop to have a Windows 7 Guest on a native
> Linux system. Well this desktop has two Screens, and I'm sure its
> expected to have the Guest also on both screens.
> 
> Supposevly I could just simulate a very wide Screen and have the host
> split it (either SDL or VNC). However, this is not quite the same, as
> the guest will think exactly that, 1 wide screen. Meaning it will put
> all the messageboxes exactly in the middle between the two screens, have
> the startbar spawn on both screens and what not. So two screens are
> handled a tad differently than one wide.
> 
> Is this possible with kvm?
> Either simulate a dual head mapping to one wide SDL/VNC display. Or
> having two SDL/VNC displays?


No. It was brought up before on the qemu list I believe. I think the gist was 
that qemu didn't support more than one vga card.


> 
> Kind regards,
> Axel Kittenberger
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCHv7] add mergeable buffers support to vhost_net

2010-04-28 Thread David L Stevens
This patch adds mergeable receive buffer support to vhost_net.

Signed-off-by: David L Stevens 

diff -ruNp net-next-v0/drivers/vhost/net.c net-next-v7/drivers/vhost/net.c
--- net-next-v0/drivers/vhost/net.c 2010-04-24 21:36:54.0 -0700
+++ net-next-v7/drivers/vhost/net.c 2010-04-28 12:26:18.0 -0700
@@ -74,6 +74,23 @@ static int move_iovec_hdr(struct iovec *
}
return seg;
 }
+/* Copy iovec entries for len bytes from iovec. Return segments used. */
+static int copy_iovec_hdr(const struct iovec *from, struct iovec *to,
+ size_t len, int iovcount)
+{
+   int seg = 0;
+   size_t size;
+   while (len && seg < iovcount) {
+   size = min(from->iov_len, len);
+   to->iov_base = from->iov_base;
+   to->iov_len = size;
+   len -= size;
+   ++from;
+   ++to;
+   ++seg;
+   }
+   return seg;
+}
 
 /* Caller must have TX VQ lock */
 static void tx_poll_stop(struct vhost_net *net)
@@ -109,7 +126,7 @@ static void handle_tx(struct vhost_net *
};
size_t len, total_len = 0;
int err, wmem;
-   size_t hdr_size;
+   size_t vhost_hlen;
struct socket *sock = rcu_dereference(vq->private_data);
if (!sock)
return;
@@ -128,13 +145,13 @@ static void handle_tx(struct vhost_net *
 
if (wmem < sock->sk->sk_sndbuf / 2)
tx_poll_stop(net);
-   hdr_size = vq->hdr_size;
+   vhost_hlen = vq->vhost_hlen;
 
for (;;) {
-   head = vhost_get_vq_desc(&net->dev, vq, vq->iov,
-ARRAY_SIZE(vq->iov),
-&out, &in,
-NULL, NULL);
+   head = vhost_get_desc(&net->dev, vq, vq->iov,
+ ARRAY_SIZE(vq->iov),
+ &out, &in,
+ NULL, NULL);
/* Nothing new?  Wait for eventfd to tell us they refilled. */
if (head == vq->num) {
wmem = atomic_read(&sock->sk->sk_wmem_alloc);
@@ -155,20 +172,20 @@ static void handle_tx(struct vhost_net *
break;
}
/* Skip header. TODO: support TSO. */
-   s = move_iovec_hdr(vq->iov, vq->hdr, hdr_size, out);
+   s = move_iovec_hdr(vq->iov, vq->hdr, vhost_hlen, out);
msg.msg_iovlen = out;
len = iov_length(vq->iov, out);
/* Sanity check */
if (!len) {
vq_err(vq, "Unexpected header len for TX: "
   "%zd expected %zd\n",
-  iov_length(vq->hdr, s), hdr_size);
+  iov_length(vq->hdr, s), vhost_hlen);
break;
}
/* TODO: Check specific error and bomb out unless ENOBUFS? */
err = sock->ops->sendmsg(NULL, sock, &msg, len);
if (unlikely(err < 0)) {
-   vhost_discard_vq_desc(vq);
+   vhost_discard_desc(vq, 1);
tx_poll_start(net, sock);
break;
}
@@ -187,12 +204,25 @@ static void handle_tx(struct vhost_net *
unuse_mm(net->dev.mm);
 }
 
+static int vhost_head_len(struct vhost_virtqueue *vq, struct sock *sk)
+{
+   struct sk_buff *head;
+   int len = 0;
+
+   lock_sock(sk);
+   head = skb_peek(&sk->sk_receive_queue);
+   if (head)
+   len = head->len + vq->sock_hlen;
+   release_sock(sk);
+   return len;
+}
+
 /* Expects to be always run from workqueue - which acts as
  * read-size critical section for our kind of RCU. */
 static void handle_rx(struct vhost_net *net)
 {
struct vhost_virtqueue *vq = &net->dev.vqs[VHOST_NET_VQ_RX];
-   unsigned head, out, in, log, s;
+   unsigned in, log, s;
struct vhost_log *vq_log;
struct msghdr msg = {
.msg_name = NULL,
@@ -203,14 +233,14 @@ static void handle_rx(struct vhost_net *
.msg_flags = MSG_DONTWAIT,
};
 
-   struct virtio_net_hdr hdr = {
-   .flags = 0,
-   .gso_type = VIRTIO_NET_HDR_GSO_NONE
+   struct virtio_net_hdr_mrg_rxbuf hdr = {
+   .hdr.flags = 0,
+   .hdr.gso_type = VIRTIO_NET_HDR_GSO_NONE
};
 
size_t len, total_len = 0;
-   int err;
-   size_t hdr_size;
+   int err, headcount, datalen;
+   size_t vhost_hlen;
struct socket *sock = rcu_dereference(vq->private_data);
if (!sock || skb_queue_empty(&sock->sk->sk_receive_queue))
return;
@@ -218,18 +248,19 @@ static void handle_rx(struct vhost_net *
use_mm(net->dev.mm);
mutex_lock(&vq->mutex);
vhost_disa

[RHEL6.0 PATCH] qemu-kvm: fix crash on reboot with vhost-net

2010-04-28 Thread Michael S. Tsirkin
Bugzilla: 585940
Upstream status: applied on qemu-kvm.git,
commit 992cc816c42f2e93db033919a9ddbfcd1da4

When vhost-net is disabled on reboot, we set msix mask notifier
to NULL to disable further mask/unmask notifications.
Code currently tries to pass this NULL to notifier,
leading to a crash.  The right thing to do is
to add explicit APIs to enable/disable notifications.
Now when disabling notifications:
- if vector is masked, we don't need to notify backend,
  just disable future notifications
- if vector is unmasked, invoke callback to unassign backend,
  then disable future notifications

This patch also polls notifier before closing it,
to make sure we don't lose events if poll callback
didn't have time to run.

Signed-off-by: Michael S. Tsirkin 
---
 hw/msix.c   |   40 +++-
 hw/msix.h   |1 +
 hw/virtio-pci.c |7 +--
 3 files changed, 41 insertions(+), 7 deletions(-)

diff --git a/hw/msix.c b/hw/msix.c
index 3fcf3a1..94e3981 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -610,14 +610,44 @@ void msix_unuse_all_vectors(PCIDevice *dev)
 
 int msix_set_mask_notifier(PCIDevice *dev, unsigned vector, void *opaque)
 {
+int r;
+if (vector >= dev->msix_entries_nr || !dev->msix_entry_used[vector])
+return 0;
+
+assert(dev->msix_mask_notifier);
+assert(opaque);
+assert(!dev->msix_mask_notifier_opaque[vector]);
+
+if (msix_is_masked(dev, vector)) {
+return 0;
+}
+r = dev->msix_mask_notifier(dev, vector, opaque,
+msix_is_masked(dev, vector));
+if (r < 0) {
+return r;
+}
+dev->msix_mask_notifier_opaque[vector] = opaque;
+return r;
+}
+
+int msix_unset_mask_notifier(PCIDevice *dev, unsigned vector)
+{
 int r = 0;
 if (vector >= dev->msix_entries_nr || !dev->msix_entry_used[vector])
 return 0;
 
-if (dev->msix_mask_notifier)
-r = dev->msix_mask_notifier(dev, vector, opaque,
-msix_is_masked(dev, vector));
-if (r >= 0)
-dev->msix_mask_notifier_opaque[vector] = opaque;
+assert(dev->msix_mask_notifier);
+assert(dev->msix_mask_notifier_opaque[vector]);
+
+if (msix_is_masked(dev, vector)) {
+return 0;
+}
+r = dev->msix_mask_notifier(dev, vector,
+dev->msix_mask_notifier_opaque[vector],
+msix_is_masked(dev, vector));
+if (r < 0) {
+return r;
+}
+dev->msix_mask_notifier_opaque[vector] = NULL;
 return r;
 }
diff --git a/hw/msix.h b/hw/msix.h
index f167231..6b21ffb 100644
--- a/hw/msix.h
+++ b/hw/msix.h
@@ -34,4 +34,5 @@ void msix_reset(PCIDevice *dev);
 extern int msix_supported;
 
 int msix_set_mask_notifier(PCIDevice *dev, unsigned vector, void *opaque);
+int msix_unset_mask_notifier(PCIDevice *dev, unsigned vector);
 #endif
diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index cba188c..22f7fa0 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -437,10 +437,13 @@ static int virtio_pci_guest_notifier(void *opaque, int n, 
bool assign)
 msix_set_mask_notifier(&proxy->pci_dev,
virtio_queue_vector(proxy->vdev, n), vq);
 } else {
-msix_set_mask_notifier(&proxy->pci_dev,
-   virtio_queue_vector(proxy->vdev, n), NULL);
+msix_unset_mask_notifier(&proxy->pci_dev,
+virtio_queue_vector(proxy->vdev, n));
 qemu_set_fd_handler(event_notifier_get_fd(notifier),
 NULL, NULL, NULL);
+/* Test and clear notifier before closing it,
+ * in case poll callback didn't have time to run. */
+virtio_pci_guest_notifier_read(vq);
 event_notifier_cleanup(notifier);
 }
 
-- 
1.7.1.rc1.22.g3163
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Can I simulate a virtual Dual-Head Graphiccard?

2010-04-28 Thread Axel Kittenberger

Hello,

This is a question I was not able to answer with a search. I've been 
using kvm now quite successfully as server side solution. Now I want to 
use it on a particular desktop to have a Windows 7 Guest on a native 
Linux system. Well this desktop has two Screens, and I'm sure its 
expected to have the Guest also on both screens.


Supposevly I could just simulate a very wide Screen and have the host 
split it (either SDL or VNC). However, this is not quite the same, as 
the guest will think exactly that, 1 wide screen. Meaning it will put 
all the messageboxes exactly in the middle between the two screens, have 
the startbar spawn on both screens and what not. So two screens are 
handled a tad differently than one wide.


Is this possible with kvm?
Either simulate a dual head mapping to one wide SDL/VNC display. Or 
having two SDL/VNC displays?


Kind regards,
Axel Kittenberger


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv2] qemu-kvm: fix crash on reboot with vhost-net

2010-04-28 Thread Michael S. Tsirkin
On Wed, Apr 28, 2010 at 02:43:05PM -0300, Marcelo Tosatti wrote:
> > Signed-off-by: Michael S. Tsirkin 
> 
> Applied, thanks.

Can you tell me what commit id it has pls (for backport to rhel6).
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Regression in vga performance between 0.11.1 and 0.12.1.1

2010-04-28 Thread Adam Greenblatt

Hi,
  I noticed that certain guests (for example, Ubuntu 9.04, Ubuntu 9.10,
and the Ubuntu 10.04 release candidate) show dramatically (~100x) slower
graphical output when running under qemu-kvm-0.12.1.1 than under 
qemu-kvm-0.11.1.

Other guests, notably Windows XP and Windows Vista, run fine under both
version of qemu.  The regression is still present in qemu-kvm-0.12.3.
  Here's the information you request when submitting a bug report:

   * what cpu model (examples: Intel Core Duo, Intel Core 2 Duo, AMD 
Opteron 2210)


  The host has two Xeon X5650 cpus.

   * what kvm version you are using. If you're using git directly, 
provide the output of 'git describe'.


  I'm using the kvm included with the stock 2.6.32.12 kernel, and 
two different
  qemu-kvms: version 0.11.1, which works fine, and version 
0.12.1.1, which has

  the regression.

   * the host kernel version

  I'm running kernel 2.6.32.12 on the host.

   * what host kernel arch you are using (i386 or x86_64)

  i386.

   * what guest you are using, including OS type (Linux, Windows, 
Solaris, etc.), bitness (32 or 64), kernel version


  Guests that are affected:
 Ubuntu 9.04 - kernel 2.6.28-18, 32 bits
 Ubuntu 9.10 - kernel 2.6.31-20, 32 bits
 Ubuntu 10.04 - kernel 2.6.32-21, 32-bits
  Guests that aren't affected:
 Windows XP SP3, Windows Vista SP2: both 32 bits

   * the qemu command line you are using to start the guest

  Here's one that works:

DISPLAY=:0.2 ~/qemu0111/bin/qemu-system-x86_64 -m 1536 -soundhw es1370 
-smp 4 -usb -usbdevice tablet -vga std -cpu core2duo -snapshot -name 
"VKoala (r/o)",process="VKoala" -net nic,macaddr=DE:AD:BE:EF:00:08 -net 
tap,vlan=0,ifname=tap6,script=no,downscript=no /vm/ubuntu-9.10/0006.img


  Here's one that works, but has drastically slower graphics (the only
  change is which qemu-system-x86_64 I'm running.)

DISPLAY=:0.2 ~/qemu01211/bin/qemu-system-x86_64 -m 1536 -soundhw es1370 
-smp 4 -usb -usbdevice tablet -vga std -cpu core2duo -snapshot -name 
"VKoala (r/o)",process="VKoala" -net nic,macaddr=DE:AD:BE:EF:00:08 -net 
tap,vlan=0,ifname=tap6,script=no,downscript=no /vm/ubuntu-9.10/0006.img


  Here are two that work fine with either binary:

DISPLAY=:0.2 ~/qemu01211/bin/qemu-system-x86_64 -m 1536 -soundhw es1370 
-usb -usbdevice tablet -localtime -vga std -cpu core2duo -snapshot -name 
"VXP (r/o)",process="VXP" -net nic,macaddr=DE:AD:BE:EF:00:01 -net 
tap,vlan=0,ifname=tap0,script=no,downscript=no /vm/xp/0024.img


DISPLAY=:0.2 ~/qemu01211/bin/qemu-system-x86_64 -m 1536 -soundhw es1370 
-smp 2 -usb -usbdevice tablet -localtime -vga std -cpu core2duo 
-snapshot -name "VVista (r/o)",process="VVista" -net 
nic,macaddr=DE:AD:BE:EF:00:02 -net 
tap,vlan=0,ifname=tap1,script=no,downscript=no /vm/vista/0009.img


   * whether the problem goes away if using the -no-kvm-irqchip or 
-no-kvm-pit switch.


  Neither of these switches change the results.

   * whether the problem also appears with the -no-kvm switch.

  This switch makes everything run so much slower that I can't tell 
if it

  changes the results.

Please cc me if you have any questions, as I'm not subscribed to the kvm 
list.

Thanks for working on kvm, it's a great piece of software!  Aloha,
  Adam Greenblatt


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm mmu: reduce 50% memory usage

2010-04-28 Thread Marcelo Tosatti
On Wed, Apr 28, 2010 at 07:57:01PM +0800, Lai Jiangshan wrote:
> 
> I think users will enable tdp when their hardwares support ept or npt.
> This patch can reduce about 50% kvm mmu memory usage for they.
> 
> This simple patch use the fact that:
> 
> When sp->role.direct is set, sp->gfns does not contain any essential
> information, leaf sptes reachable from this sp are for a continuate
> guest physical memory range(a linear range).
> So sp->gfns[i](if it was set) equals to sp->gfn + i. (PT_PAGE_TABLE_LEVEL)
> Obviously, it is not essential information, we can calculate it when need.
> 
> It means we don't need sp->gfns when sp->role.direct=1,
> Thus we can save one page usage for every kvm_mmu_page.
> 
> Note:
> Access to sp->gfns must be wrapped by kvm_mmu_page_get_gfn()
> or kvm_mmu_page_set_gfn().
> It is only exposed in FNAME(sync_page).
> 
> Signed-off-by: Lai Jiangshan 

Applied, thanks.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv2] qemu-kvm: fix crash on reboot with vhost-net

2010-04-28 Thread Marcelo Tosatti
On Wed, Apr 28, 2010 at 12:27:38PM +0300, Michael S. Tsirkin wrote:
> When vhost-net is disabled on reboot, we set msix mask notifier
> to NULL to disable further mask/unmask notifications.
> Code currently tries to pass this NULL to notifier,
> leading to a crash.  The right thing to do is
> to add explicit APIs to enable/disable notifications.
> Now when disabling notifications:
> - if vector is masked, we don't need to notify backend,
>   just disable future notifications
> - if vector is unmasked, invoke callback to unassign backend,
>   then disable future notifications
> 
> This patch also polls notifier before closing it,
> to make sure we don't lose events if poll callback
> didn't have time to run.
> 
> Signed-off-by: Michael S. Tsirkin 

Applied, thanks.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Ignore SRAO MCE if another MCE is being processed

2010-04-28 Thread Avi Kivity

On 04/28/2010 09:08 AM, Huang Ying wrote:


There is some test script and document for this in:

git://git.kernel.org/pub/scm/utils/cpu/mce/mce-test.git

test script is in "kvm" directory, testing document is kvm/README

   


This is a good candidate for kvm-autotest.  We'll get little user 
coverage for this feature, so it's good to get it machine tested regularly.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 0/10] KVM MMU: allow more shadow pages become asynchronous

2010-04-28 Thread Marcelo Tosatti
On Wed, Apr 28, 2010 at 11:54:31AM +0800, Xiao Guangrong wrote:
> Changlog v3:
> 
> Those changes all form Avi's suggestion, thanks.
> 
> - use smart way to fix the bug in patch 1
> - remove duplicates code in patch 5
> - check error code and fix forgot release page in patch 9
> - sync shadow pages in a batch instead of one by one
> 
> And, there is one TODO thing:
> Marker shadow page as unsync at create time avoid write-protect,
> this idea is from Avi:
> 
> |Another interesting case is to create new shadow pages in the unsync state.
> |That can help when the guest starts a short lived process: we can avoid write
> |protecting its pagetables completely
> 
> I'll send the patch out after this patchset applied.
>  
> Changlog v2:
> 
> - when level is PT_DIRECTORY_LEVEL, the 'offset' should be
>   'role.quadrant << 8', thanks Avi for point it out
> 
> - keep invlpg code in paging_tmpl.h address Avi's suggestion
> 
> - split kvm_sync_page() into kvm_sync_page() and kvm_sync_page_transient()
>   to clarify the code address Avi's suggestion

Applied, thanks.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] Re: [PATCH 06/10] kvm: remove explicit kvm_arch_reset_vcpu from kvm_init_vcpu

2010-04-28 Thread Marcelo Tosatti
On Wed, Apr 28, 2010 at 01:22:14PM -0300, Marcelo Tosatti wrote:
> On Wed, Apr 28, 2010 at 10:39:06AM -0500, Anthony Liguori wrote:
> > On 04/26/2010 12:59 PM, Marcelo Tosatti wrote:
> > >This is now done via the initialization's qemu_system_reset call.
> > >
> > >Signed-off-by: Avi Kivity
> > >---
> > >  kvm-all.c |1 -
> > >  1 files changed, 0 insertions(+), 1 deletions(-)
> > >
> > >diff --git a/kvm-all.c b/kvm-all.c
> > >index 9c8aa7d..eabb097 100644
> > >--- a/kvm-all.c
> > >+++ b/kvm-all.c
> > >@@ -208,7 +208,6 @@ int kvm_init_vcpu(CPUState *env)
> > >  ret = kvm_arch_init_vcpu(env);
> > >  if (ret == 0) {
> > >  qemu_register_reset(kvm_reset_vcpu, env);
> > >-kvm_arch_reset_vcpu(env);
> > >  }
> > >  err:
> > >  return ret;
> > 
> > This breaks -enable-kvm in upstream qemu.  No progress is ever made
> > in the guest.
> 
> These patches have been autotested, and just confirmed manually that
> this patch does not break -enable-kvm (as the reset handler is now 
> called through the notifier).
> 
> What is your qemu command line?

Just drop it please.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] Re: [PATCH 06/10] kvm: remove explicit kvm_arch_reset_vcpu from kvm_init_vcpu

2010-04-28 Thread Anthony Liguori

On 04/28/2010 11:22 AM, Marcelo Tosatti wrote:

On Wed, Apr 28, 2010 at 10:39:06AM -0500, Anthony Liguori wrote:
   

On 04/26/2010 12:59 PM, Marcelo Tosatti wrote:
 

This is now done via the initialization's qemu_system_reset call.

Signed-off-by: Avi Kivity
---
  kvm-all.c |1 -
  1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index 9c8aa7d..eabb097 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -208,7 +208,6 @@ int kvm_init_vcpu(CPUState *env)
  ret = kvm_arch_init_vcpu(env);
  if (ret == 0) {
  qemu_register_reset(kvm_reset_vcpu, env);
-kvm_arch_reset_vcpu(env);
  }
  err:
  return ret;
   

This breaks -enable-kvm in upstream qemu.  No progress is ever made
in the guest.
 

These patches have been autotested, and just confirmed manually that
this patch does not break -enable-kvm (as the reset handler is now
called through the notifier).

What is your qemu command line?
   


x86_64-softmmu/qemu-system-x86_64 -hda ~/images/linux.img -snapshot 
-enable-kvm -L ~/git/qemu/pc-bios


This is with the very latest tip.  I can publish a branch if that's 
helpful.  If you have an old tree, I wouldn't be surprised if it didn't 
show up.


Regards,

Anthony Liguori


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] Add test for ljmp.

2010-04-28 Thread Marcelo Tosatti
On Tue, Apr 27, 2010 at 03:56:04PM +0300, Gleb Natapov wrote:
> Test that ljmp with operand in IO memory works.
> 
> Signed-off-by: Gleb Natapov 

Applied both, thanks.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Ignore SRAO MCE if another MCE is being processed

2010-04-28 Thread Marcelo Tosatti
On Tue, Apr 27, 2010 at 03:10:49PM +0800, Huang Ying wrote:
> In common cases, guest SRAO MCE will cause corresponding poisoned page
> be un-mapped in host and SIGBUS be sent to QEMU-KVM, then QEMU-KVM
> will relay the MCE to guest OS.
> 
> But it is possible that the poisoned page is accessed in guest after
> un-mapped in host and before MCE is relayed to guest OS. So that, the
> SRAR SIGBUS is sent to QEMU-KVM before the SRAO SIGBUS, and if
> QEMU-KVM relays them to guest OS one by one, guest system may reset,
> because the SRAO MCE may be triggered while the SRAR MCE is being
> processed. In fact, the SRAO MCE can be ignored in this situation, so
> that the guest system is given opportunity to survive.
> 
> Signed-off-by: Huang Ying 

Applied, thanks.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] Re: [PATCH 06/10] kvm: remove explicit kvm_arch_reset_vcpu from kvm_init_vcpu

2010-04-28 Thread Marcelo Tosatti
On Wed, Apr 28, 2010 at 10:39:06AM -0500, Anthony Liguori wrote:
> On 04/26/2010 12:59 PM, Marcelo Tosatti wrote:
> >This is now done via the initialization's qemu_system_reset call.
> >
> >Signed-off-by: Avi Kivity
> >---
> >  kvm-all.c |1 -
> >  1 files changed, 0 insertions(+), 1 deletions(-)
> >
> >diff --git a/kvm-all.c b/kvm-all.c
> >index 9c8aa7d..eabb097 100644
> >--- a/kvm-all.c
> >+++ b/kvm-all.c
> >@@ -208,7 +208,6 @@ int kvm_init_vcpu(CPUState *env)
> >  ret = kvm_arch_init_vcpu(env);
> >  if (ret == 0) {
> >  qemu_register_reset(kvm_reset_vcpu, env);
> >-kvm_arch_reset_vcpu(env);
> >  }
> >  err:
> >  return ret;
> 
> This breaks -enable-kvm in upstream qemu.  No progress is ever made
> in the guest.

These patches have been autotested, and just confirmed manually that
this patch does not break -enable-kvm (as the reset handler is now 
called through the notifier).

What is your qemu command line?

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCHv2 07/23] KVM: x86 emulator: add (set|get)_msr callbacks to x86_emulate_ops

2010-04-28 Thread Gleb Natapov
Add (set|get)_msr callbacks to x86_emulate_ops instead of calling
them directly.

Signed-off-by: Gleb Natapov 
---
 arch/x86/include/asm/kvm_emulate.h |2 ++
 arch/x86/kvm/emulate.c |   36 ++--
 arch/x86/kvm/x86.c |2 ++
 3 files changed, 22 insertions(+), 18 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h 
b/arch/x86/include/asm/kvm_emulate.h
index c37296d..f751657 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -139,6 +139,8 @@ struct x86_emulate_ops {
void (*set_rflags)(struct kvm_vcpu *vcpu, unsigned long rflags);
int (*get_dr)(int dr, unsigned long *dest, struct kvm_vcpu *vcpu);
int (*set_dr)(int dr, unsigned long value, struct kvm_vcpu *vcpu);
+   int (*set_msr)(struct kvm_vcpu *vcpu, u32 msr_index, u64 data);
+   int (*get_msr)(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata);
 };
 
 /* Type, address-of, and value of an instruction's operand. */
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index c54f381..37b20c3 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1875,7 +1875,7 @@ setup_syscalls_segments(struct x86_emulate_ctxt *ctxt,
 }
 
 static int
-emulate_syscall(struct x86_emulate_ctxt *ctxt)
+emulate_syscall(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
 {
struct decode_cache *c = &ctxt->decode;
struct kvm_segment cs, ss;
@@ -1890,7 +1890,7 @@ emulate_syscall(struct x86_emulate_ctxt *ctxt)
 
setup_syscalls_segments(ctxt, &cs, &ss);
 
-   kvm_x86_ops->get_msr(ctxt->vcpu, MSR_STAR, &msr_data);
+   ops->get_msr(ctxt->vcpu, MSR_STAR, &msr_data);
msr_data >>= 32;
cs.selector = (u16)(msr_data & 0xfffc);
ss.selector = (u16)(msr_data + 8);
@@ -1907,17 +1907,17 @@ emulate_syscall(struct x86_emulate_ctxt *ctxt)
 #ifdef CONFIG_X86_64
c->regs[VCPU_REGS_R11] = ctxt->eflags & ~EFLG_RF;
 
-   kvm_x86_ops->get_msr(ctxt->vcpu,
-   ctxt->mode == X86EMUL_MODE_PROT64 ?
-   MSR_LSTAR : MSR_CSTAR, &msr_data);
+   ops->get_msr(ctxt->vcpu,
+ctxt->mode == X86EMUL_MODE_PROT64 ?
+MSR_LSTAR : MSR_CSTAR, &msr_data);
c->eip = msr_data;
 
-   kvm_x86_ops->get_msr(ctxt->vcpu, MSR_SYSCALL_MASK, &msr_data);
+   ops->get_msr(ctxt->vcpu, MSR_SYSCALL_MASK, &msr_data);
ctxt->eflags &= ~(msr_data | EFLG_RF);
 #endif
} else {
/* legacy mode */
-   kvm_x86_ops->get_msr(ctxt->vcpu, MSR_STAR, &msr_data);
+   ops->get_msr(ctxt->vcpu, MSR_STAR, &msr_data);
c->eip = (u32)msr_data;
 
ctxt->eflags &= ~(EFLG_VM | EFLG_IF | EFLG_RF);
@@ -1927,7 +1927,7 @@ emulate_syscall(struct x86_emulate_ctxt *ctxt)
 }
 
 static int
-emulate_sysenter(struct x86_emulate_ctxt *ctxt)
+emulate_sysenter(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
 {
struct decode_cache *c = &ctxt->decode;
struct kvm_segment cs, ss;
@@ -1949,7 +1949,7 @@ emulate_sysenter(struct x86_emulate_ctxt *ctxt)
 
setup_syscalls_segments(ctxt, &cs, &ss);
 
-   kvm_x86_ops->get_msr(ctxt->vcpu, MSR_IA32_SYSENTER_CS, &msr_data);
+   ops->get_msr(ctxt->vcpu, MSR_IA32_SYSENTER_CS, &msr_data);
switch (ctxt->mode) {
case X86EMUL_MODE_PROT32:
if ((msr_data & 0xfffc) == 0x0) {
@@ -1979,17 +1979,17 @@ emulate_sysenter(struct x86_emulate_ctxt *ctxt)
kvm_x86_ops->set_segment(ctxt->vcpu, &cs, VCPU_SREG_CS);
kvm_x86_ops->set_segment(ctxt->vcpu, &ss, VCPU_SREG_SS);
 
-   kvm_x86_ops->get_msr(ctxt->vcpu, MSR_IA32_SYSENTER_EIP, &msr_data);
+   ops->get_msr(ctxt->vcpu, MSR_IA32_SYSENTER_EIP, &msr_data);
c->eip = msr_data;
 
-   kvm_x86_ops->get_msr(ctxt->vcpu, MSR_IA32_SYSENTER_ESP, &msr_data);
+   ops->get_msr(ctxt->vcpu, MSR_IA32_SYSENTER_ESP, &msr_data);
c->regs[VCPU_REGS_RSP] = msr_data;
 
return X86EMUL_CONTINUE;
 }
 
 static int
-emulate_sysexit(struct x86_emulate_ctxt *ctxt)
+emulate_sysexit(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
 {
struct decode_cache *c = &ctxt->decode;
struct kvm_segment cs, ss;
@@ -2012,7 +2012,7 @@ emulate_sysexit(struct x86_emulate_ctxt *ctxt)
 
cs.dpl = 3;
ss.dpl = 3;
-   kvm_x86_ops->get_msr(ctxt->vcpu, MSR_IA32_SYSENTER_CS, &msr_data);
+   ops->get_msr(ctxt->vcpu, MSR_IA32_SYSENTER_CS, &msr_data);
switch (usermode) {
case X86EMUL_MODE_PROT32:
cs.selector = (u16)(msr_data + 16);
@@ -3099,7 +3099,7 @@ twobyte_insn:
}
break;
case 0x05:  /* syscall */
-   rc = emulate_syscall(ctxt);
+   rc = emulate_syscall(ctxt, ops);
if (rc != X86EMUL_CONTI

[PATCHv2 05/23] KVM: x86 emulator: handle "far address" source operand.

2010-04-28 Thread Gleb Natapov
ljmp/lcall instruction operand contains address and segment.
It can be 10 bytes long. Currently we decode it as two different
operands. Fix it by introducing new kind of operand that can hold
entire far address.

Signed-off-by: Gleb Natapov 
---
 arch/x86/include/asm/kvm_emulate.h |6 +++-
 arch/x86/kvm/emulate.c |   56 ---
 2 files changed, 37 insertions(+), 25 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h 
b/arch/x86/include/asm/kvm_emulate.h
index 288cbed..69a64a6 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -143,7 +143,11 @@ struct x86_emulate_ops {
 struct operand {
enum { OP_REG, OP_MEM, OP_IMM, OP_NONE } type;
unsigned int bytes;
-   unsigned long val, orig_val, *ptr;
+   unsigned long orig_val, *ptr;
+   union {
+   unsigned long val;
+   char valptr[sizeof(unsigned long) + 2];
+   };
 };
 
 struct fetch_cache {
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index fbc555b..9b19838 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -67,6 +67,8 @@
 #define SrcImmUByte (8<<4)  /* 8-bit unsigned immediate operand. */
 #define SrcImmU (9<<4)  /* Immediate operand, unsigned */
 #define SrcSI   (0xa<<4)   /* Source is in the DS:RSI */
+#define SrcImmFAddr (0xb<<4)   /* Source is immediate far address */
+#define SrcMemFAddr (0xc<<4)   /* Source is far address in memory */
 #define SrcMask (0xf<<4)
 /* Generic ModRM decode. */
 #define ModRM   (1<<8)
@@ -88,10 +90,6 @@
 #define Src2CL  (1<<29)
 #define Src2ImmByte (2<<29)
 #define Src2One (3<<29)
-#define Src2Imm16   (4<<29)
-#define Src2Mem16   (5<<29) /* Used for Ep encoding. First argument has to be
-  in memory and second argument is located
-  immediately after the first one in memory. */
 #define Src2Mask(7<<29)
 
 enum {
@@ -175,7 +173,7 @@ static u32 opcode_table[256] = {
/* 0x90 - 0x97 */
DstReg, DstReg, DstReg, DstReg, DstReg, DstReg, DstReg, DstReg,
/* 0x98 - 0x9F */
-   0, 0, SrcImm | Src2Imm16 | No64, 0,
+   0, 0, SrcImmFAddr | No64, 0,
ImplicitOps | Stack, ImplicitOps | Stack, 0, 0,
/* 0xA0 - 0xA7 */
ByteOp | DstReg | SrcMem | Mov | MemAbs, DstReg | SrcMem | Mov | MemAbs,
@@ -215,7 +213,7 @@ static u32 opcode_table[256] = {
ByteOp | SrcImmUByte | DstAcc, SrcImmUByte | DstAcc,
/* 0xE8 - 0xEF */
SrcImm | Stack, SrcImm | ImplicitOps,
-   SrcImmU | Src2Imm16 | No64, SrcImmByte | ImplicitOps,
+   SrcImmFAddr | No64, SrcImmByte | ImplicitOps,
SrcNone | ByteOp | DstAcc, SrcNone | DstAcc,
SrcNone | ByteOp | DstAcc, SrcNone | DstAcc,
/* 0xF0 - 0xF7 */
@@ -350,7 +348,7 @@ static u32 group_table[] = {
[Group5*8] =
DstMem | SrcNone | ModRM, DstMem | SrcNone | ModRM,
SrcMem | ModRM | Stack, 0,
-   SrcMem | ModRM | Stack, SrcMem | ModRM | Src2Mem16 | ImplicitOps,
+   SrcMem | ModRM | Stack, SrcMemFAddr | ModRM | ImplicitOps,
SrcMem | ModRM | Stack, 0,
[Group7*8] =
0, 0, ModRM | SrcMem | Priv, ModRM | SrcMem | Priv,
@@ -576,6 +574,13 @@ static u32 group2_table[] = {
(_type)_x;  \
 })
 
+#define insn_fetch_arr(_arr, _size, _eip)\
+({ rc = do_insn_fetch(ctxt, ops, (_eip), _arr, (_size));   \
+   if (rc != X86EMUL_CONTINUE) \
+   goto done;  \
+   (_eip) += (_size);  \
+})
+
 static inline unsigned long ad_mask(struct decode_cache *c)
 {
return (1UL << (c->ad_bytes << 3)) - 1;
@@ -1160,6 +1165,17 @@ done_prefixes:
 c->regs[VCPU_REGS_RSI]);
c->src.val = 0;
break;
+   case SrcImmFAddr:
+   c->src.type = OP_IMM;
+   c->src.ptr = (unsigned long *)c->eip;
+   c->src.bytes = c->op_bytes + 2;
+   insn_fetch_arr(c->src.valptr, c->src.bytes, c->eip);
+   break;
+   case SrcMemFAddr:
+   c->src.type = OP_MEM;
+   c->src.ptr = (unsigned long *)c->modrm_ea;
+   c->src.bytes = c->op_bytes + 2;
+   break;
}
 
/*
@@ -1179,22 +1195,10 @@ done_prefixes:
c->src2.bytes = 1;
c->src2.val = insn_fetch(u8, 1, c->eip);
break;
-   case Src2Imm16:
-   c->src2.type = OP_IMM;
-   c->src2.ptr = (unsigned long *)c->eip;
-   c->src2.bytes = 2;
-   c->src2.val = insn_fetch(u16, 2, c->eip);
-   break;
case Src2One:
c->src2.bytes = 1;
c->src2.va

[PATCHv2 06/23] KVM: x86 emulator: add (set|get)_dr callbacks to x86_emulate_ops

2010-04-28 Thread Gleb Natapov
Add (set|get)_dr callbacks to x86_emulate_ops instead of calling
them directly.

Signed-off-by: Gleb Natapov 
---
 arch/x86/include/asm/kvm_emulate.h |2 ++
 arch/x86/include/asm/kvm_host.h|4 
 arch/x86/kvm/emulate.c |7 +--
 arch/x86/kvm/x86.c |   12 ++--
 4 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h 
b/arch/x86/include/asm/kvm_emulate.h
index 69a64a6..c37296d 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -137,6 +137,8 @@ struct x86_emulate_ops {
void (*set_cr)(int cr, ulong val, struct kvm_vcpu *vcpu);
int (*cpl)(struct kvm_vcpu *vcpu);
void (*set_rflags)(struct kvm_vcpu *vcpu, unsigned long rflags);
+   int (*get_dr)(int dr, unsigned long *dest, struct kvm_vcpu *vcpu);
+   int (*set_dr)(int dr, unsigned long value, struct kvm_vcpu *vcpu);
 };
 
 /* Type, address-of, and value of an instruction's operand. */
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 3f0007b..74cb6ac 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -590,10 +590,6 @@ void kvm_emulate_cpuid(struct kvm_vcpu *vcpu);
 int kvm_emulate_halt(struct kvm_vcpu *vcpu);
 int emulate_invlpg(struct kvm_vcpu *vcpu, gva_t address);
 int emulate_clts(struct kvm_vcpu *vcpu);
-int emulator_get_dr(struct x86_emulate_ctxt *ctxt, int dr,
-   unsigned long *dest);
-int emulator_set_dr(struct x86_emulate_ctxt *ctxt, int dr,
-   unsigned long value);
 
 void kvm_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg);
 int kvm_load_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector, int seg);
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 9b19838..c54f381 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -3132,7 +3132,7 @@ twobyte_insn:
kvm_queue_exception(ctxt->vcpu, UD_VECTOR);
goto done;
}
-   emulator_get_dr(ctxt, c->modrm_reg, &c->regs[c->modrm_rm]);
+   ops->get_dr(c->modrm_reg, &c->regs[c->modrm_rm], ctxt->vcpu);
c->dst.type = OP_NONE;  /* no writeback */
break;
case 0x22: /* mov reg, cr */
@@ -3145,7 +3145,10 @@ twobyte_insn:
kvm_queue_exception(ctxt->vcpu, UD_VECTOR);
goto done;
}
-   emulator_set_dr(ctxt, c->modrm_reg, c->regs[c->modrm_rm]);
+
+   ops->set_dr(c->modrm_reg,c->regs[c->modrm_rm] &
+   ((ctxt->mode == X86EMUL_MODE_PROT64) ? ~0ULL : ~0U),
+   ctxt->vcpu);
c->dst.type = OP_NONE;  /* no writeback */
break;
case 0x30:
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6b2ce1d..c0d6e4c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3558,16 +3558,14 @@ int emulate_clts(struct kvm_vcpu *vcpu)
return X86EMUL_CONTINUE;
 }
 
-int emulator_get_dr(struct x86_emulate_ctxt *ctxt, int dr, unsigned long *dest)
+int emulator_get_dr(int dr, unsigned long *dest, struct kvm_vcpu *vcpu)
 {
-   return kvm_get_dr(ctxt->vcpu, dr, dest);
+   return kvm_get_dr(vcpu, dr, dest);
 }
 
-int emulator_set_dr(struct x86_emulate_ctxt *ctxt, int dr, unsigned long value)
+int emulator_set_dr(int dr, unsigned long value, struct kvm_vcpu *vcpu)
 {
-   unsigned long mask = (ctxt->mode == X86EMUL_MODE_PROT64) ? ~0ULL : ~0U;
-
-   return kvm_set_dr(ctxt->vcpu, dr, value & mask);
+   return kvm_set_dr(vcpu, dr, value);
 }
 
 void kvm_report_emulation_failure(struct kvm_vcpu *vcpu, const char *context)
@@ -3749,6 +3747,8 @@ static struct x86_emulate_ops emulate_ops = {
.set_cr  = emulator_set_cr,
.cpl = emulator_get_cpl,
.set_rflags  = emulator_set_rflags,
+   .get_dr  = emulator_get_dr,
+   .set_dr  = emulator_set_dr,
 };
 
 static void cache_all_regs(struct kvm_vcpu *vcpu)
-- 
1.6.5

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCHv2 14/23] KVM: x86 emulator: x86_emulate_insn() return -1 only in case of emulation failure

2010-04-28 Thread Gleb Natapov
Currently emulator returns -1 when emulation failed or IO is needed.
Caller tries to guess whether emulation failed by looking at other
variables. Make it easier for caller to recognise error condition by
always returning -1 in case of failure. For this new emulator
internal return value X86EMUL_IO_NEEDED is introduced. It is used to
distinguish between error condition (which returns X86EMUL_UNHANDLEABLE)
and condition that requires IO exit to userspace to continue emulation.

Signed-off-by: Gleb Natapov 
---
 arch/x86/include/asm/kvm_emulate.h |1 +
 arch/x86/kvm/x86.c |   36 ++--
 2 files changed, 19 insertions(+), 18 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h 
b/arch/x86/include/asm/kvm_emulate.h
index 0cf4311..777240d 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -53,6 +53,7 @@ struct x86_emulate_ctxt;
 #define X86EMUL_PROPAGATE_FAULT 2 /* propagate a generated fault to guest */
 #define X86EMUL_RETRY_INSTR 3 /* retry the instruction for some reason */
 #define X86EMUL_CMPXCHG_FAILED  4 /* cmpxchg did not see expected value */
+#define X86EMUL_IO_NEEDED   5 /* IO is needed to complete emulation */
 
 struct x86_emulate_ops {
/*
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index aa629a0..bbfabbd 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3213,7 +3213,7 @@ static int kvm_read_guest_virt_helper(gva_t addr, void 
*val, unsigned int bytes,
}
ret = kvm_read_guest(vcpu->kvm, gpa, data, toread);
if (ret < 0) {
-   r = X86EMUL_UNHANDLEABLE;
+   r = X86EMUL_IO_NEEDED;
goto out;
}
 
@@ -3269,7 +3269,7 @@ static int kvm_write_guest_virt_system(gva_t addr, void 
*val,
}
ret = kvm_write_guest(vcpu->kvm, gpa, data, towrite);
if (ret < 0) {
-   r = X86EMUL_UNHANDLEABLE;
+   r = X86EMUL_IO_NEEDED;
goto out;
}
 
@@ -3329,7 +3329,7 @@ mmio:
vcpu->run->mmio.len = vcpu->mmio_size = bytes;
vcpu->run->mmio.is_write = vcpu->mmio_is_write = 0;
 
-   return X86EMUL_UNHANDLEABLE;
+   return X86EMUL_IO_NEEDED;
 }
 
 int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa,
@@ -3801,8 +3801,6 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
 */
cache_all_regs(vcpu);
 
-   vcpu->mmio_is_write = 0;
-
if (!(emulation_type & EMULTYPE_NO_DECODE)) {
int cs_db, cs_l;
kvm_x86_ops->get_cs_db_l_bits(vcpu, &cs_db, &cs_l);
@@ -3876,24 +3874,26 @@ restart:
return EMULATE_DO_MMIO;
}
 
-   if (r) {
-   if (kvm_mmu_unprotect_page_virt(vcpu, cr2))
-   goto done;
-   if (!vcpu->mmio_needed) {
-   ++vcpu->stat.insn_emulation_fail;
-   trace_kvm_emulate_insn_failed(vcpu);
-   kvm_report_emulation_failure(vcpu, "mmio");
-   return EMULATE_FAIL;
-   }
+   if (vcpu->mmio_needed) {
+   if (vcpu->mmio_is_write)
+   vcpu->mmio_needed = 0;
return EMULATE_DO_MMIO;
}
 
-   if (vcpu->mmio_is_write) {
-   vcpu->mmio_needed = 0;
-   return EMULATE_DO_MMIO;
+   if (r) { /* emulation failed */
+   /*
+* if emulation was due to access to shadowed page table
+* and it failed try to unshadow page and re-entetr the
+* guest to let CPU execute the instruction.
+*/
+   if (kvm_mmu_unprotect_page_virt(vcpu, cr2))
+   return EMULATE_DONE;
+
+   trace_kvm_emulate_insn_failed(vcpu);
+   kvm_report_emulation_failure(vcpu, "mmio");
+   return EMULATE_FAIL;
}
 
-done:
if (vcpu->arch.exception.pending)
vcpu->arch.emulate_ctxt.restart = false;
 
-- 
1.6.5

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCHv2 10/23] KVM: x86 emulator: make set_cr() callback return error if it fails

2010-04-28 Thread Gleb Natapov
Make set_cr() callback return error if it fails instead of injecting #GP
behind emulator's back.

Signed-off-by: Gleb Natapov 
---
 arch/x86/include/asm/kvm_emulate.h |2 +-
 arch/x86/kvm/emulate.c |   10 ++-
 arch/x86/kvm/x86.c |  148 ++--
 3 files changed, 84 insertions(+), 76 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h 
b/arch/x86/include/asm/kvm_emulate.h
index df53ba2..6c4f491 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -135,7 +135,7 @@ struct x86_emulate_ops {
unsigned long (*get_cached_segment_base)(int seg, struct kvm_vcpu 
*vcpu);
void (*get_gdt)(struct desc_ptr *dt, struct kvm_vcpu *vcpu);
ulong (*get_cr)(int cr, struct kvm_vcpu *vcpu);
-   void (*set_cr)(int cr, ulong val, struct kvm_vcpu *vcpu);
+   int (*set_cr)(int cr, ulong val, struct kvm_vcpu *vcpu);
int (*cpl)(struct kvm_vcpu *vcpu);
void (*set_rflags)(struct kvm_vcpu *vcpu, unsigned long rflags);
int (*get_dr)(int dr, unsigned long *dest, struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index deb17a1..9f4b22e 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2272,7 +2272,10 @@ static int load_state_from_tss32(struct x86_emulate_ctxt 
*ctxt,
struct decode_cache *c = &ctxt->decode;
int ret;
 
-   ops->set_cr(3, tss->cr3, ctxt->vcpu);
+   if (ops->set_cr(3, tss->cr3, ctxt->vcpu)) {
+   kvm_inject_gp(ctxt->vcpu, 0);
+   return X86EMUL_PROPAGATE_FAULT;
+   }
c->eip = tss->eip;
ctxt->eflags = tss->eflags | 2;
c->regs[VCPU_REGS_RAX] = tss->eax;
@@ -3135,7 +3138,10 @@ twobyte_insn:
c->dst.type = OP_NONE;  /* no writeback */
break;
case 0x22: /* mov reg, cr */
-   ops->set_cr(c->modrm_reg, c->modrm_val, ctxt->vcpu);
+   if (ops->set_cr(c->modrm_reg, c->modrm_val, ctxt->vcpu)) {
+   kvm_inject_gp(ctxt->vcpu, 0);
+   goto done;
+   }
c->dst.type = OP_NONE;
break;
case 0x23: /* mov from reg to dr */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2829e8f..3cb016f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -414,57 +414,49 @@ out:
return changed;
 }
 
-void kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
+static int __kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
 {
cr0 |= X86_CR0_ET;
 
 #ifdef CONFIG_X86_64
-   if (cr0 & 0xUL) {
-   kvm_inject_gp(vcpu, 0);
-   return;
-   }
+   if (cr0 & 0xUL)
+   return 1;
 #endif
 
cr0 &= ~CR0_RESERVED_BITS;
 
-   if ((cr0 & X86_CR0_NW) && !(cr0 & X86_CR0_CD)) {
-   kvm_inject_gp(vcpu, 0);
-   return;
-   }
+   if ((cr0 & X86_CR0_NW) && !(cr0 & X86_CR0_CD))
+   return 1;
 
-   if ((cr0 & X86_CR0_PG) && !(cr0 & X86_CR0_PE)) {
-   kvm_inject_gp(vcpu, 0);
-   return;
-   }
+   if ((cr0 & X86_CR0_PG) && !(cr0 & X86_CR0_PE))
+   return 1;
 
if (!is_paging(vcpu) && (cr0 & X86_CR0_PG)) {
 #ifdef CONFIG_X86_64
if ((vcpu->arch.efer & EFER_LME)) {
int cs_db, cs_l;
 
-   if (!is_pae(vcpu)) {
-   kvm_inject_gp(vcpu, 0);
-   return;
-   }
+   if (!is_pae(vcpu))
+   return 1;
kvm_x86_ops->get_cs_db_l_bits(vcpu, &cs_db, &cs_l);
-   if (cs_l) {
-   kvm_inject_gp(vcpu, 0);
-   return;
-
-   }
+   if (cs_l)
+   return 1;
} else
 #endif
-   if (is_pae(vcpu) && !load_pdptrs(vcpu, vcpu->arch.cr3)) {
-   kvm_inject_gp(vcpu, 0);
-   return;
-   }
-
+   if (is_pae(vcpu) && !load_pdptrs(vcpu, vcpu->arch.cr3))
+   return 1;
}
 
kvm_x86_ops->set_cr0(vcpu, cr0);
 
kvm_mmu_reset_context(vcpu);
-   return;
+   return 0;
+}
+
+void kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
+{
+   if (__kvm_set_cr0(vcpu, cr0))
+   kvm_inject_gp(vcpu, 0);
 }
 EXPORT_SYMBOL_GPL(kvm_set_cr0);
 
@@ -474,61 +466,56 @@ void kvm_lmsw(struct kvm_vcpu *vcpu, unsigned long msw)
 }
 EXPORT_SYMBOL_GPL(kvm_lmsw);
 
-void kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
+int __kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
 {
unsigned long old_cr4 = kvm_read_cr4(vcpu);
unsigned long pdptr_bits = X86_CR4_PGE | X86_CR4_PSE | X86_

[PATCHv2 08/23] KVM: x86 emulator: add get_cached_segment_base() callback to x86_emulate_ops.

2010-04-28 Thread Gleb Natapov
On VMX it is expensive to call get_cached_descriptor() just to get segment
base since multiple vmcs_reads are done instead of only one. Introduce
new call back get_cached_segment_base() for efficiency.

Signed-off-by: Gleb Natapov 
---
 arch/x86/include/asm/kvm_emulate.h |1 +
 arch/x86/kvm/emulate.c |   13 +
 arch/x86/kvm/x86.c |7 +++
 3 files changed, 9 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h 
b/arch/x86/include/asm/kvm_emulate.h
index f751657..df53ba2 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -132,6 +132,7 @@ struct x86_emulate_ops {
  int seg, struct kvm_vcpu *vcpu);
u16 (*get_segment_selector)(int seg, struct kvm_vcpu *vcpu);
void (*set_segment_selector)(u16 sel, int seg, struct kvm_vcpu *vcpu);
+   unsigned long (*get_cached_segment_base)(int seg, struct kvm_vcpu 
*vcpu);
void (*get_gdt)(struct desc_ptr *dt, struct kvm_vcpu *vcpu);
ulong (*get_cr)(int cr, struct kvm_vcpu *vcpu);
void (*set_cr)(int cr, ulong val, struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 37b20c3..41c77a1 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2097,17 +2097,6 @@ static bool emulator_io_permited(struct x86_emulate_ctxt 
*ctxt,
return true;
 }
 
-static u32 get_cached_descriptor_base(struct x86_emulate_ctxt *ctxt,
- struct x86_emulate_ops *ops,
- int seg)
-{
-   struct desc_struct desc;
-   if (ops->get_cached_descriptor(&desc, seg, ctxt->vcpu))
-   return get_desc_base(&desc);
-   else
-   return ~0;
-}
-
 static void save_state_to_tss16(struct x86_emulate_ctxt *ctxt,
struct x86_emulate_ops *ops,
struct tss_segment_16 *tss)
@@ -2383,7 +2372,7 @@ static int emulator_do_task_switch(struct 
x86_emulate_ctxt *ctxt,
int ret;
u16 old_tss_sel = ops->get_segment_selector(VCPU_SREG_TR, ctxt->vcpu);
ulong old_tss_base =
-   get_cached_descriptor_base(ctxt, ops, VCPU_SREG_TR);
+   ops->get_cached_segment_base(VCPU_SREG_TR, ctxt->vcpu);
u32 desc_limit;
 
/* FIXME: old_tss_base == ~0 ? */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 69db0c8..2829e8f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3652,6 +3652,12 @@ static void emulator_get_gdt(struct desc_ptr *dt, struct 
kvm_vcpu *vcpu)
kvm_x86_ops->get_gdt(vcpu, dt);
 }
 
+static unsigned long emulator_get_cached_segment_base(int seg,
+ struct kvm_vcpu *vcpu)
+{
+   return get_segment_base(vcpu, seg);
+}
+
 static bool emulator_get_cached_descriptor(struct desc_struct *desc, int seg,
   struct kvm_vcpu *vcpu)
 {
@@ -3742,6 +3748,7 @@ static struct x86_emulate_ops emulate_ops = {
.set_cached_descriptor = emulator_set_cached_descriptor,
.get_segment_selector = emulator_get_segment_selector,
.set_segment_selector = emulator_set_segment_selector,
+   .get_cached_segment_base = emulator_get_cached_segment_base,
.get_gdt = emulator_get_gdt,
.get_cr  = emulator_get_cr,
.set_cr  = emulator_set_cr,
-- 
1.6.5

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCHv2 11/23] KVM: x86 emulator: make (get|set)_dr() callback return error if it fails

2010-04-28 Thread Gleb Natapov
Make (get|set)_dr() callback return error if it fails instead of
injecting exception behind emulator's back.

Signed-off-by: Gleb Natapov 
---
 arch/x86/kvm/emulate.c |   11 ++--
 arch/x86/kvm/x86.c |   63 ---
 2 files changed, 45 insertions(+), 29 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 9f4b22e..df4e9a0 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -3151,9 +3151,14 @@ twobyte_insn:
goto done;
}
 
-   ops->set_dr(c->modrm_reg,c->regs[c->modrm_rm] &
-   ((ctxt->mode == X86EMUL_MODE_PROT64) ? ~0ULL : ~0U),
-   ctxt->vcpu);
+   if (ops->set_dr(c->modrm_reg, c->regs[c->modrm_rm] &
+   ((ctxt->mode == X86EMUL_MODE_PROT64) ?
+~0ULL : ~0U), ctxt->vcpu) < 0) {
+   /* #UD condition is already handled by the code above */
+   kvm_inject_gp(ctxt->vcpu, 0);
+   goto done;
+   }
+
c->dst.type = OP_NONE;  /* no writeback */
break;
case 0x30:
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3cb016f..7b4d113 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -573,7 +573,7 @@ unsigned long kvm_get_cr8(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_get_cr8);
 
-int kvm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long val)
+static int __kvm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long val)
 {
switch (dr) {
case 0 ... 3:
@@ -582,29 +582,21 @@ int kvm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned 
long val)
vcpu->arch.eff_db[dr] = val;
break;
case 4:
-   if (kvm_read_cr4_bits(vcpu, X86_CR4_DE)) {
-   kvm_queue_exception(vcpu, UD_VECTOR);
-   return 1;
-   }
+   if (kvm_read_cr4_bits(vcpu, X86_CR4_DE))
+   return 1; /* #UD */
/* fall through */
case 6:
-   if (val & 0xULL) {
-   kvm_inject_gp(vcpu, 0);
-   return 1;
-   }
+   if (val & 0xULL)
+   return -1; /* #GP */
vcpu->arch.dr6 = (val & DR6_VOLATILE) | DR6_FIXED_1;
break;
case 5:
-   if (kvm_read_cr4_bits(vcpu, X86_CR4_DE)) {
-   kvm_queue_exception(vcpu, UD_VECTOR);
-   return 1;
-   }
+   if (kvm_read_cr4_bits(vcpu, X86_CR4_DE))
+   return 1; /* #UD */
/* fall through */
default: /* 7 */
-   if (val & 0xULL) {
-   kvm_inject_gp(vcpu, 0);
-   return 1;
-   }
+   if (val & 0xULL)
+   return -1; /* #GP */
vcpu->arch.dr7 = (val & DR7_VOLATILE) | DR7_FIXED_1;
if (!(vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP)) {
kvm_x86_ops->set_dr7(vcpu, vcpu->arch.dr7);
@@ -615,28 +607,37 @@ int kvm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned 
long val)
 
return 0;
 }
+
+int kvm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long val)
+{
+   int res;
+
+   res = __kvm_set_dr(vcpu, dr, val);
+   if (res > 0)
+   kvm_queue_exception(vcpu, UD_VECTOR);
+   else if (res < 0)
+   kvm_inject_gp(vcpu, 0);
+
+   return res;
+}
 EXPORT_SYMBOL_GPL(kvm_set_dr);
 
-int kvm_get_dr(struct kvm_vcpu *vcpu, int dr, unsigned long *val)
+static int _kvm_get_dr(struct kvm_vcpu *vcpu, int dr, unsigned long *val)
 {
switch (dr) {
case 0 ... 3:
*val = vcpu->arch.db[dr];
break;
case 4:
-   if (kvm_read_cr4_bits(vcpu, X86_CR4_DE)) {
-   kvm_queue_exception(vcpu, UD_VECTOR);
+   if (kvm_read_cr4_bits(vcpu, X86_CR4_DE))
return 1;
-   }
/* fall through */
case 6:
*val = vcpu->arch.dr6;
break;
case 5:
-   if (kvm_read_cr4_bits(vcpu, X86_CR4_DE)) {
-   kvm_queue_exception(vcpu, UD_VECTOR);
+   if (kvm_read_cr4_bits(vcpu, X86_CR4_DE))
return 1;
-   }
/* fall through */
default: /* 7 */
*val = vcpu->arch.dr7;
@@ -645,6 +646,15 @@ int kvm_get_dr(struct kvm_vcpu *vcpu, int dr, unsigned 
long *val)
 
return 0;
 }
+
+int kvm_get_dr(struct kvm_vcpu *vcpu, int dr, unsigned long *val)
+{
+   if (_kvm_get_dr(vcpu, dr, val)) {
+   kvm_queue_exception(vcpu, UD_VECTOR);
+   

[PATCHv2 09/23] KVM: x86 emulator: cleanup some direct calls into kvm to use existing callbacks

2010-04-28 Thread Gleb Natapov
Use callbacks from x86_emulate_ops to access segments instead of calling
into kvm directly.

Signed-off-by: Gleb Natapov 
---
 arch/x86/kvm/emulate.c |  200 +---
 1 files changed, 105 insertions(+), 95 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 41c77a1..deb17a1 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -622,31 +622,35 @@ static void set_seg_override(struct decode_cache *c, int 
seg)
c->seg_override = seg;
 }
 
-static unsigned long seg_base(struct x86_emulate_ctxt *ctxt, int seg)
+static unsigned long seg_base(struct x86_emulate_ctxt *ctxt,
+ struct x86_emulate_ops *ops, int seg)
 {
if (ctxt->mode == X86EMUL_MODE_PROT64 && seg < VCPU_SREG_FS)
return 0;
 
-   return kvm_x86_ops->get_segment_base(ctxt->vcpu, seg);
+   return ops->get_cached_segment_base(seg, ctxt->vcpu);
 }
 
 static unsigned long seg_override_base(struct x86_emulate_ctxt *ctxt,
+  struct x86_emulate_ops *ops,
   struct decode_cache *c)
 {
if (!c->has_seg_override)
return 0;
 
-   return seg_base(ctxt, c->seg_override);
+   return seg_base(ctxt, ops, c->seg_override);
 }
 
-static unsigned long es_base(struct x86_emulate_ctxt *ctxt)
+static unsigned long es_base(struct x86_emulate_ctxt *ctxt,
+struct x86_emulate_ops *ops)
 {
-   return seg_base(ctxt, VCPU_SREG_ES);
+   return seg_base(ctxt, ops, VCPU_SREG_ES);
 }
 
-static unsigned long ss_base(struct x86_emulate_ctxt *ctxt)
+static unsigned long ss_base(struct x86_emulate_ctxt *ctxt,
+struct x86_emulate_ops *ops)
 {
-   return seg_base(ctxt, VCPU_SREG_SS);
+   return seg_base(ctxt, ops, VCPU_SREG_SS);
 }
 
 static int do_fetch_insn_byte(struct x86_emulate_ctxt *ctxt,
@@ -941,7 +945,7 @@ x86_decode_insn(struct x86_emulate_ctxt *ctxt, struct 
x86_emulate_ops *ops)
memset(c, 0, sizeof(struct decode_cache));
c->eip = ctxt->eip;
c->fetch.start = c->fetch.end = c->eip;
-   ctxt->cs_base = seg_base(ctxt, VCPU_SREG_CS);
+   ctxt->cs_base = seg_base(ctxt, ops, VCPU_SREG_CS);
memcpy(c->regs, ctxt->vcpu->arch.regs, sizeof c->regs);
 
switch (mode) {
@@ -1065,7 +1069,7 @@ done_prefixes:
set_seg_override(c, VCPU_SREG_DS);
 
if (!(!c->twobyte && c->b == 0x8d))
-   c->modrm_ea += seg_override_base(ctxt, c);
+   c->modrm_ea += seg_override_base(ctxt, ops, c);
 
if (c->ad_bytes != 8)
c->modrm_ea = (u32)c->modrm_ea;
@@ -1161,7 +1165,7 @@ done_prefixes:
c->src.type = OP_MEM;
c->src.bytes = (c->d & ByteOp) ? 1 : c->op_bytes;
c->src.ptr = (unsigned long *)
-   register_address(c,  seg_override_base(ctxt, c),
+   register_address(c,  seg_override_base(ctxt, ops, c),
 c->regs[VCPU_REGS_RSI]);
c->src.val = 0;
break;
@@ -1257,7 +1261,7 @@ done_prefixes:
c->dst.type = OP_MEM;
c->dst.bytes = (c->d & ByteOp) ? 1 : c->op_bytes;
c->dst.ptr = (unsigned long *)
-   register_address(c, es_base(ctxt),
+   register_address(c, es_base(ctxt, ops),
 c->regs[VCPU_REGS_RDI]);
c->dst.val = 0;
break;
@@ -1516,7 +1520,8 @@ exception:
return X86EMUL_PROPAGATE_FAULT;
 }
 
-static inline void emulate_push(struct x86_emulate_ctxt *ctxt)
+static inline void emulate_push(struct x86_emulate_ctxt *ctxt,
+   struct x86_emulate_ops *ops)
 {
struct decode_cache *c = &ctxt->decode;
 
@@ -1524,7 +1529,7 @@ static inline void emulate_push(struct x86_emulate_ctxt 
*ctxt)
c->dst.bytes = c->op_bytes;
c->dst.val = c->src.val;
register_address_increment(c, &c->regs[VCPU_REGS_RSP], -c->op_bytes);
-   c->dst.ptr = (void *) register_address(c, ss_base(ctxt),
+   c->dst.ptr = (void *) register_address(c, ss_base(ctxt, ops),
   c->regs[VCPU_REGS_RSP]);
 }
 
@@ -1535,7 +1540,7 @@ static int emulate_pop(struct x86_emulate_ctxt *ctxt,
struct decode_cache *c = &ctxt->decode;
int rc;
 
-   rc = read_emulated(ctxt, ops, register_address(c, ss_base(ctxt),
+   rc = read_emulated(ctxt, ops, register_address(c, ss_base(ctxt, ops),
   c->regs[VCPU_REGS_RSP]),
   dest, len);
if (rc != X86EMUL_CONTINUE)
@@ -1588,15 +1593,14 @@ static int emulate_popf(struct x86_emulate_ctxt *ctxt,
return rc;
 }
 
-static void emulate_push_sreg(struct x86_emulate_ctxt *ctxt, int

[PATCHv2 18/23] KVM: x86 emulator: advance RIP outside x86 emulator code

2010-04-28 Thread Gleb Natapov
Return new RIP as part of instruction emulation result instead of
updating KVM's RIP from x86 emulator code.

Signed-off-by: Gleb Natapov 
---
 arch/x86/kvm/emulate.c |7 ---
 arch/x86/kvm/x86.c |4 +++-
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 83d8fe4..8ff6e82 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2496,8 +2496,9 @@ int emulator_task_switch(struct x86_emulate_ctxt *ctxt,
 
if (rc == X86EMUL_CONTINUE) {
memcpy(ctxt->vcpu->arch.regs, c->regs, sizeof c->regs);
-   kvm_rip_write(ctxt->vcpu, c->eip);
rc = writeback(ctxt, ops);
+   if (rc == X86EMUL_CONTINUE)
+   ctxt->eip = c->eip;
}
 
return (rc == X86EMUL_UNHANDLEABLE) ? -1 : 0;
@@ -2554,7 +2555,7 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct 
x86_emulate_ops *ops)
if (address_mask(c, c->regs[VCPU_REGS_RCX]) == 0) {
string_done:
ctxt->restart = false;
-   kvm_rip_write(ctxt->vcpu, c->eip);
+   ctxt->eip = c->eip;
goto done;
}
/* The second termination condition only applies for REPE
@@ -3032,7 +3033,7 @@ writeback:
ctxt->decode.mem_read.end = 0;
/* Commit shadow register state. */
memcpy(ctxt->vcpu->arch.regs, c->regs, sizeof c->regs);
-   kvm_rip_write(ctxt->vcpu, c->eip);
+   ctxt->eip = c->eip;
ops->set_rflags(ctxt->vcpu, ctxt->eflags);
 
 done:
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 445769b..9429ab1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3879,6 +3879,7 @@ restart:
 
shadow_mask = vcpu->arch.emulate_ctxt.interruptibility;
kvm_x86_ops->set_interrupt_shadow(vcpu, shadow_mask);
+   kvm_rip_write(vcpu, vcpu->arch.emulate_ctxt.eip);
 
if (vcpu->arch.pio.count) {
if (!vcpu->arch.pio.in)
@@ -4882,7 +4883,8 @@ int kvm_task_switch(struct kvm_vcpu *vcpu, u16 
tss_selector, int reason,
 
if (ret)
return EMULATE_FAIL;
-
+   
+   kvm_rip_write(vcpu, vcpu->arch.emulate_ctxt.eip);
kvm_x86_ops->set_rflags(vcpu, vcpu->arch.emulate_ctxt.eflags);
return EMULATE_DONE;
 }
-- 
1.6.5

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCHv2 12/23] KVM: x86 emulator: fix X86EMUL_RETRY_INSTR and X86EMUL_CMPXCHG_FAILED values

2010-04-28 Thread Gleb Natapov
Currently X86EMUL_PROPAGATE_FAULT, X86EMUL_RETRY_INSTR and
X86EMUL_CMPXCHG_FAILED have the same value so caller cannot
distinguish why function such as emulator_cmpxchg_emulated()
(which can return both X86EMUL_PROPAGATE_FAULT and
X86EMUL_CMPXCHG_FAILED) failed.

Signed-off-by: Gleb Natapov 
---
 arch/x86/include/asm/kvm_emulate.h |5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h 
b/arch/x86/include/asm/kvm_emulate.h
index 6c4f491..0cf4311 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -51,8 +51,9 @@ struct x86_emulate_ctxt;
 #define X86EMUL_UNHANDLEABLE1
 /* Terminate emulation but return success to the caller. */
 #define X86EMUL_PROPAGATE_FAULT 2 /* propagate a generated fault to guest */
-#define X86EMUL_RETRY_INSTR 2 /* retry the instruction for some reason */
-#define X86EMUL_CMPXCHG_FAILED  2 /* cmpxchg did not see expected value */
+#define X86EMUL_RETRY_INSTR 3 /* retry the instruction for some reason */
+#define X86EMUL_CMPXCHG_FAILED  4 /* cmpxchg did not see expected value */
+
 struct x86_emulate_ops {
/*
 * read_std: Read bytes of standard (non-emulated/special) memory.
-- 
1.6.5

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCHv2 13/23] KVM: fill in run->mmio details in (read|write)_emulated function.

2010-04-28 Thread Gleb Natapov
Fill in run->mmio details in (read|write)_emulated function just like
pio does. There is no point in filling only vcpu fields there just to
copy them into vcpu->run a little bit later.

Signed-off-by: Gleb Natapov 
---
 arch/x86/kvm/x86.c |   25 +
 1 files changed, 9 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7b4d113..aa629a0 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3324,9 +3324,10 @@ mmio:
trace_kvm_mmio(KVM_TRACE_MMIO_READ_UNSATISFIED, bytes, gpa, 0);
 
vcpu->mmio_needed = 1;
-   vcpu->mmio_phys_addr = gpa;
-   vcpu->mmio_size = bytes;
-   vcpu->mmio_is_write = 0;
+   vcpu->run->exit_reason = KVM_EXIT_MMIO;
+   vcpu->run->mmio.phys_addr = vcpu->mmio_phys_addr = gpa;
+   vcpu->run->mmio.len = vcpu->mmio_size = bytes;
+   vcpu->run->mmio.is_write = vcpu->mmio_is_write = 0;
 
return X86EMUL_UNHANDLEABLE;
 }
@@ -3374,10 +3375,11 @@ mmio:
return X86EMUL_CONTINUE;
 
vcpu->mmio_needed = 1;
-   vcpu->mmio_phys_addr = gpa;
-   vcpu->mmio_size = bytes;
-   vcpu->mmio_is_write = 1;
-   memcpy(vcpu->mmio_data, val, bytes);
+   vcpu->run->exit_reason = KVM_EXIT_MMIO;
+   vcpu->run->mmio.phys_addr = vcpu->mmio_phys_addr = gpa;
+   vcpu->run->mmio.len = vcpu->mmio_size = bytes;
+   vcpu->run->mmio.is_write = vcpu->mmio_is_write = 1;
+   memcpy(vcpu->run->mmio.data, val, bytes);
 
return X86EMUL_CONTINUE;
 }
@@ -3788,7 +3790,6 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
 {
int r, shadow_mask;
struct decode_cache *c;
-   struct kvm_run *run = vcpu->run;
 
kvm_clear_exception_queue(vcpu);
vcpu->arch.mmio_fault_cr2 = cr2;
@@ -3875,14 +3876,6 @@ restart:
return EMULATE_DO_MMIO;
}
 
-   if (r || vcpu->mmio_is_write) {
-   run->exit_reason = KVM_EXIT_MMIO;
-   run->mmio.phys_addr = vcpu->mmio_phys_addr;
-   memcpy(run->mmio.data, vcpu->mmio_data, 8);
-   run->mmio.len = vcpu->mmio_size;
-   run->mmio.is_write = vcpu->mmio_is_write;
-   }
-
if (r) {
if (kvm_mmu_unprotect_page_virt(vcpu, cr2))
goto done;
-- 
1.6.5

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCHv2 19/23] KVM: x86 emulator: set RFLAGS outside x86 emulator code.

2010-04-28 Thread Gleb Natapov
Removes the need for set_flags() callback.

Signed-off-by: Gleb Natapov 
---
 arch/x86/include/asm/kvm_emulate.h |1 -
 arch/x86/kvm/emulate.c |1 -
 arch/x86/kvm/x86.c |7 +--
 3 files changed, 1 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h 
b/arch/x86/include/asm/kvm_emulate.h
index b7e00cb..a87d95f 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -142,7 +142,6 @@ struct x86_emulate_ops {
ulong (*get_cr)(int cr, struct kvm_vcpu *vcpu);
int (*set_cr)(int cr, ulong val, struct kvm_vcpu *vcpu);
int (*cpl)(struct kvm_vcpu *vcpu);
-   void (*set_rflags)(struct kvm_vcpu *vcpu, unsigned long rflags);
int (*get_dr)(int dr, unsigned long *dest, struct kvm_vcpu *vcpu);
int (*set_dr)(int dr, unsigned long value, struct kvm_vcpu *vcpu);
int (*set_msr)(struct kvm_vcpu *vcpu, u32 msr_index, u64 data);
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 8ff6e82..eee5b4d 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -3034,7 +3034,6 @@ writeback:
/* Commit shadow register state. */
memcpy(ctxt->vcpu->arch.regs, c->regs, sizeof c->regs);
ctxt->eip = c->eip;
-   ops->set_rflags(ctxt->vcpu, ctxt->eflags);
 
 done:
return (rc == X86EMUL_UNHANDLEABLE) ? -1 : 0;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9429ab1..344d9dd 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3744,11 +3744,6 @@ static void emulator_set_segment_selector(u16 sel, int 
seg,
kvm_set_segment(vcpu, &kvm_seg, seg);
 }
 
-static void emulator_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
-{
-   kvm_x86_ops->set_rflags(vcpu, rflags);
-}
-
 static struct x86_emulate_ops emulate_ops = {
.read_std= kvm_read_guest_virt_system,
.write_std   = kvm_write_guest_virt_system,
@@ -3767,7 +3762,6 @@ static struct x86_emulate_ops emulate_ops = {
.get_cr  = emulator_get_cr,
.set_cr  = emulator_set_cr,
.cpl = emulator_get_cpl,
-   .set_rflags  = emulator_set_rflags,
.get_dr  = emulator_get_dr,
.set_dr  = emulator_set_dr,
.set_msr = kvm_set_msr,
@@ -3879,6 +3873,7 @@ restart:
 
shadow_mask = vcpu->arch.emulate_ctxt.interruptibility;
kvm_x86_ops->set_interrupt_shadow(vcpu, shadow_mask);
+   kvm_x86_ops->set_rflags(vcpu, vcpu->arch.emulate_ctxt.eflags);
kvm_rip_write(vcpu, vcpu->arch.emulate_ctxt.eip);
 
if (vcpu->arch.pio.count) {
-- 
1.6.5

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCHv2 17/23] KVM: handle emulation failure case first.

2010-04-28 Thread Gleb Natapov
If emulation failed return immediately.

Signed-off-by: Gleb Natapov 
---
 arch/x86/kvm/x86.c |   31 +++
 1 files changed, 15 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 696b34b..445769b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3862,22 +3862,6 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
 
 restart:
r = x86_emulate_insn(&vcpu->arch.emulate_ctxt, &emulate_ops);
-   shadow_mask = vcpu->arch.emulate_ctxt.interruptibility;
-
-   if (r == 0)
-   kvm_x86_ops->set_interrupt_shadow(vcpu, shadow_mask);
-
-   if (vcpu->arch.pio.count) {
-   if (!vcpu->arch.pio.in)
-   vcpu->arch.pio.count = 0;
-   return EMULATE_DO_MMIO;
-   }
-
-   if (vcpu->mmio_needed) {
-   if (vcpu->mmio_is_write)
-   vcpu->mmio_needed = 0;
-   return EMULATE_DO_MMIO;
-   }
 
if (r) { /* emulation failed */
/*
@@ -3893,6 +3877,21 @@ restart:
return EMULATE_FAIL;
}
 
+   shadow_mask = vcpu->arch.emulate_ctxt.interruptibility;
+   kvm_x86_ops->set_interrupt_shadow(vcpu, shadow_mask);
+
+   if (vcpu->arch.pio.count) {
+   if (!vcpu->arch.pio.in)
+   vcpu->arch.pio.count = 0;
+   return EMULATE_DO_MMIO;
+   }
+
+   if (vcpu->mmio_needed) {
+   if (vcpu->mmio_is_write)
+   vcpu->mmio_needed = 0;
+   return EMULATE_DO_MMIO;
+   }
+
if (vcpu->arch.exception.pending)
vcpu->arch.emulate_ctxt.restart = false;
 
-- 
1.6.5

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCHv2 15/23] KVM: remove export of emulator_write_emulated().

2010-04-28 Thread Gleb Natapov
It is not called directly outside of the file it's defined in anymore.

Signed-off-by: Gleb Natapov 
---
 arch/x86/include/asm/kvm_host.h |5 -
 arch/x86/kvm/x86.c  |1 -
 2 files changed, 0 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 74cb6ac..ed48904 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -627,11 +627,6 @@ void kvm_inject_nmi(struct kvm_vcpu *vcpu);
 
 void fx_init(struct kvm_vcpu *vcpu);
 
-int emulator_write_emulated(unsigned long addr,
-   const void *val,
-   unsigned int bytes,
-   struct kvm_vcpu *vcpu);
-
 void kvm_mmu_flush_tlb(struct kvm_vcpu *vcpu);
 void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
   const u8 *new, int bytes,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index bbfabbd..224b98d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3403,7 +3403,6 @@ int emulator_write_emulated(unsigned long addr,
}
return emulator_write_emulated_onepage(addr, val, bytes, vcpu);
 }
-EXPORT_SYMBOL_GPL(emulator_write_emulated);
 
 #define CMPXCHG_TYPE(t, ptr, old, new) \
(cmpxchg((t *)(ptr), *(t *)(old), *(t *)(new)) == *(t *)(old))
-- 
1.6.5

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCHv2 22/23] KVM: x86 emulator: move interruptibility state tracking out of emulator

2010-04-28 Thread Gleb Natapov
Emulator shouldn't access vcpu directly.

Signed-off-by: Gleb Natapov 
---
 arch/x86/kvm/emulate.c |   19 ++-
 arch/x86/kvm/x86.c |   20 +---
 2 files changed, 19 insertions(+), 20 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 4cc8368..e2def3b 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1843,20 +1843,6 @@ static inline int writeback(struct x86_emulate_ctxt 
*ctxt,
return X86EMUL_CONTINUE;
 }
 
-static void toggle_interruptibility(struct x86_emulate_ctxt *ctxt, u32 mask)
-{
-   u32 int_shadow = kvm_x86_ops->get_interrupt_shadow(ctxt->vcpu, mask);
-   /*
-* an sti; sti; sequence only disable interrupts for the first
-* instruction. So, if the last instruction, be it emulated or
-* not, left the system with the INT_STI flag enabled, it
-* means that the last instruction is an sti. We should not
-* leave the flag on in this case. The same goes for mov ss
-*/
-   if (!(int_shadow & mask))
-   ctxt->interruptibility = mask;
-}
-
 static inline void
 setup_syscalls_segments(struct x86_emulate_ctxt *ctxt,
struct x86_emulate_ops *ops, struct desc_struct *cs,
@@ -2516,7 +2502,6 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct 
x86_emulate_ops *ops)
int rc = X86EMUL_CONTINUE;
int saved_dst_type = c->dst.type;
 
-   ctxt->interruptibility = 0;
ctxt->decode.mem_read.pos = 0;
 
if (ctxt->mode == X86EMUL_MODE_PROT64 && (c->d & No64)) {
@@ -2789,7 +2774,7 @@ special_insn:
}
 
if (c->modrm_reg == VCPU_SREG_SS)
-   toggle_interruptibility(ctxt, 
KVM_X86_SHADOW_INT_MOV_SS);
+   ctxt->interruptibility = KVM_X86_SHADOW_INT_MOV_SS;
 
rc = load_segment_descriptor(ctxt, ops, sel, c->modrm_reg);
 
@@ -2958,7 +2943,7 @@ special_insn:
if (emulator_bad_iopl(ctxt, ops))
kvm_inject_gp(ctxt->vcpu, 0);
else {
-   toggle_interruptibility(ctxt, KVM_X86_SHADOW_INT_STI);
+   ctxt->interruptibility = KVM_X86_SHADOW_INT_STI;
ctxt->eflags |= X86_EFLAGS_IF;
c->dst.type = OP_NONE;  /* Disable writeback. */
}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5fda84e..f7e8732 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3776,12 +3776,26 @@ static void cache_all_regs(struct kvm_vcpu *vcpu)
vcpu->arch.regs_dirty = ~0;
 }
 
+static void toggle_interruptibility(struct kvm_vcpu *vcpu, u32 mask)
+{
+   u32 int_shadow = kvm_x86_ops->get_interrupt_shadow(vcpu, mask);
+   /*
+* an sti; sti; sequence only disable interrupts for the first
+* instruction. So, if the last instruction, be it emulated or
+* not, left the system with the INT_STI flag enabled, it
+* means that the last instruction is an sti. We should not
+* leave the flag on in this case. The same goes for mov ss
+*/
+   if (!(int_shadow & mask))
+   kvm_x86_ops->set_interrupt_shadow(vcpu, mask);
+}
+
 int emulate_instruction(struct kvm_vcpu *vcpu,
unsigned long cr2,
u16 error_code,
int emulation_type)
 {
-   int r, shadow_mask;
+   int r;
struct decode_cache *c = &vcpu->arch.emulate_ctxt.decode;
 
kvm_clear_exception_queue(vcpu);
@@ -3809,6 +3823,7 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
? X86EMUL_MODE_PROT32 : X86EMUL_MODE_PROT16;
memset(c, 0, sizeof(struct decode_cache));
memcpy(c->regs, vcpu->arch.regs, sizeof c->regs);
+   vcpu->arch.emulate_ctxt.interruptibility = 0;
 
r = x86_decode_insn(&vcpu->arch.emulate_ctxt, &emulate_ops);
trace_kvm_emulate_insn_start(vcpu);
@@ -3876,8 +3891,7 @@ restart:
return EMULATE_FAIL;
}
 
-   shadow_mask = vcpu->arch.emulate_ctxt.interruptibility;
-   kvm_x86_ops->set_interrupt_shadow(vcpu, shadow_mask);
+   toggle_interruptibility(vcpu, vcpu->arch.emulate_ctxt.interruptibility);
kvm_x86_ops->set_rflags(vcpu, vcpu->arch.emulate_ctxt.eflags);
memcpy(vcpu->arch.regs, c->regs, sizeof c->regs);
kvm_rip_write(vcpu, vcpu->arch.emulate_ctxt.eip);
-- 
1.6.5

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCHv2 16/23] KVM: do not inject #PF in (read|write)_emulated() callbacks

2010-04-28 Thread Gleb Natapov
Return error to x86 emulator instead of injection exception behind its back.

Signed-off-by: Gleb Natapov 
---
 arch/x86/include/asm/kvm_emulate.h |3 +++
 arch/x86/kvm/emulate.c |   12 +++-
 arch/x86/kvm/x86.c |   28 ++--
 3 files changed, 28 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h 
b/arch/x86/include/asm/kvm_emulate.h
index 777240d..b7e00cb 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -94,6 +94,7 @@ struct x86_emulate_ops {
int (*read_emulated)(unsigned long addr,
 void *val,
 unsigned int bytes,
+unsigned int *error,
 struct kvm_vcpu *vcpu);
 
/*
@@ -106,6 +107,7 @@ struct x86_emulate_ops {
int (*write_emulated)(unsigned long addr,
  const void *val,
  unsigned int bytes,
+ unsigned int *error,
  struct kvm_vcpu *vcpu);
 
/*
@@ -120,6 +122,7 @@ struct x86_emulate_ops {
const void *old,
const void *new,
unsigned int bytes,
+   unsigned int *error,
struct kvm_vcpu *vcpu);
 
int (*pio_in_emulated)(int size, unsigned short port, void *val,
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index df4e9a0..83d8fe4 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1277,6 +1277,7 @@ static int read_emulated(struct x86_emulate_ctxt *ctxt,
 {
int rc;
struct read_cache *mc = &ctxt->decode.mem_read;
+   u32 err;
 
while (size) {
int n = min(size, 8u);
@@ -1284,7 +1285,10 @@ static int read_emulated(struct x86_emulate_ctxt *ctxt,
if (mc->pos < mc->end)
goto read_cached;

-   rc = ops->read_emulated(addr, mc->data + mc->end, n, 
ctxt->vcpu);
+   rc = ops->read_emulated(addr, mc->data + mc->end, n, &err,
+   ctxt->vcpu);
+   if (rc == X86EMUL_PROPAGATE_FAULT)
+   kvm_inject_page_fault(ctxt->vcpu, addr, err);
if (rc != X86EMUL_CONTINUE)
return rc;
mc->end += n;
@@ -1789,6 +1793,7 @@ static inline int writeback(struct x86_emulate_ctxt *ctxt,
 {
int rc;
struct decode_cache *c = &ctxt->decode;
+   u32 err;
 
switch (c->dst.type) {
case OP_REG:
@@ -1817,13 +1822,18 @@ static inline int writeback(struct x86_emulate_ctxt 
*ctxt,
&c->dst.orig_val,
&c->dst.val,
c->dst.bytes,
+   &err,
ctxt->vcpu);
else
rc = ops->write_emulated(
(unsigned long)c->dst.ptr,
&c->dst.val,
c->dst.bytes,
+   &err,
ctxt->vcpu);
+   if (rc == X86EMUL_PROPAGATE_FAULT)
+   kvm_inject_page_fault(ctxt->vcpu,
+ (unsigned long)c->dst.ptr, err);
if (rc != X86EMUL_CONTINUE)
return rc;
break;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 224b98d..696b34b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3284,10 +3284,10 @@ out:
 static int emulator_read_emulated(unsigned long addr,
  void *val,
  unsigned int bytes,
+ unsigned int *error_code,
  struct kvm_vcpu *vcpu)
 {
gpa_t gpa;
-   u32 error_code;
 
if (vcpu->mmio_read_completed) {
memcpy(val, vcpu->mmio_data, bytes);
@@ -3297,12 +3297,10 @@ static int emulator_read_emulated(unsigned long addr,
return X86EMUL_CONTINUE;
}
 
-   gpa = kvm_mmu_gva_to_gpa_read(vcpu, addr, &error_code);
+   gpa = kvm_mmu_gva_to_gpa_read(vcpu, addr, error_code);
 
-   if (gpa == UNMAPPED_GVA) {
-   kvm_inject_page_fault(vcpu, addr, error_code);
+   if (gpa == UNMAPPED_GVA)
return X86EMUL_PROPAGATE_FAULT;
-   }
 
/* For APIC access vmexit */
if ((gpa & PAGE_MASK) == APIC_DEFAULT_PHYS_BASE)
@@ -3347,17 +3345,15 @@ int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t 
gpa,
 static int emulator_write_emulated_onepage(unsigned long addr,

[PATCHv2 21/23] KVM: x86 exmulator: handle shadowed registers outside emulator.

2010-04-28 Thread Gleb Natapov
Emulator shouldn't access vcpu directly.

Signed-off-by: Gleb Natapov 
---
 arch/x86/kvm/emulate.c |   15 ---
 arch/x86/kvm/x86.c |   16 +---
 2 files changed, 13 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 5d786d5..4cc8368 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -941,12 +941,9 @@ x86_decode_insn(struct x86_emulate_ctxt *ctxt, struct 
x86_emulate_ops *ops)
/* we cannot decode insn before we complete previous rep insn */
WARN_ON(ctxt->restart);
 
-   /* Shadow copy of register state. Committed on successful emulation. */
-   memset(c, 0, sizeof(struct decode_cache));
c->eip = ctxt->eip;
c->fetch.start = c->fetch.end = c->eip;
ctxt->cs_base = seg_base(ctxt, ops, VCPU_SREG_CS);
-   memcpy(c->regs, ctxt->vcpu->arch.regs, sizeof c->regs);
 
switch (mode) {
case X86EMUL_MODE_REAL:
@@ -2486,16 +2483,13 @@ int emulator_task_switch(struct x86_emulate_ctxt *ctxt,
struct decode_cache *c = &ctxt->decode;
int rc;
 
-   memset(c, 0, sizeof(struct decode_cache));
c->eip = ctxt->eip;
-   memcpy(c->regs, ctxt->vcpu->arch.regs, sizeof c->regs);
c->dst.type = OP_NONE;
 
rc = emulator_do_task_switch(ctxt, ops, tss_selector, reason,
 has_error_code, error_code);
 
if (rc == X86EMUL_CONTINUE) {
-   memcpy(ctxt->vcpu->arch.regs, c->regs, sizeof c->regs);
rc = writeback(ctxt, ops);
if (rc == X86EMUL_CONTINUE)
ctxt->eip = c->eip;
@@ -2525,13 +2519,6 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct 
x86_emulate_ops *ops)
ctxt->interruptibility = 0;
ctxt->decode.mem_read.pos = 0;
 
-   /* Shadow copy of register state. Committed on successful emulation.
-* NOTE: we can copy them from vcpu as x86_decode_insn() doesn't
-* modify them.
-*/
-
-   memcpy(c->regs, ctxt->vcpu->arch.regs, sizeof c->regs);
-
if (ctxt->mode == X86EMUL_MODE_PROT64 && (c->d & No64)) {
kvm_queue_exception(ctxt->vcpu, UD_VECTOR);
goto done;
@@ -3031,8 +3018,6 @@ writeback:
 * without decoding
 */
ctxt->decode.mem_read.end = 0;
-   /* Commit shadow register state. */
-   memcpy(ctxt->vcpu->arch.regs, c->regs, sizeof c->regs);
ctxt->eip = c->eip;
 
 done:
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 344d9dd..5fda84e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3782,7 +3782,7 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
int emulation_type)
 {
int r, shadow_mask;
-   struct decode_cache *c;
+   struct decode_cache *c = &vcpu->arch.emulate_ctxt.decode;
 
kvm_clear_exception_queue(vcpu);
vcpu->arch.mmio_fault_cr2 = cr2;
@@ -3807,13 +3807,14 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
? X86EMUL_MODE_VM86 : cs_l
? X86EMUL_MODE_PROT64 : cs_db
? X86EMUL_MODE_PROT32 : X86EMUL_MODE_PROT16;
+   memset(c, 0, sizeof(struct decode_cache));
+   memcpy(c->regs, vcpu->arch.regs, sizeof c->regs);
 
r = x86_decode_insn(&vcpu->arch.emulate_ctxt, &emulate_ops);
trace_kvm_emulate_insn_start(vcpu);
 
/* Only allow emulation of specific instructions on #UD
 * (namely VMMCALL, sysenter, sysexit, syscall)*/
-   c = &vcpu->arch.emulate_ctxt.decode;
if (emulation_type & EMULTYPE_TRAP_UD) {
if (!c->twobyte)
return EMULATE_FAIL;
@@ -3854,6 +3855,10 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
return EMULATE_DONE;
}
 
+   /* this is needed for vmware backdor interface to work since it
+  changes registers values  during IO operation */
+   memcpy(c->regs, vcpu->arch.regs, sizeof c->regs);
+
 restart:
r = x86_emulate_insn(&vcpu->arch.emulate_ctxt, &emulate_ops);
 
@@ -3874,6 +3879,7 @@ restart:
shadow_mask = vcpu->arch.emulate_ctxt.interruptibility;
kvm_x86_ops->set_interrupt_shadow(vcpu, shadow_mask);
kvm_x86_ops->set_rflags(vcpu, vcpu->arch.emulate_ctxt.eflags);
+   memcpy(vcpu->arch.regs, c->regs, sizeof c->regs);
kvm_rip_write(vcpu, vcpu->arch.emulate_ctxt.eip);
 
if (vcpu->arch.pio.count) {
@@ -4857,6 +4863,7 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
 int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int reason,
bool has_error_code, u32 error_code)
 {
+   struct decode_cache *c = &vcpu->arch.emulate_ctxt.decode;
int cs_db, cs_l, ret;
cache_all_regs(vcpu);
 
@@ -4871,6 +4878,8 @@ int kvm_task_switch(struct kvm_vcpu *vcpu, u16 

[PATCHv2 23/23] KVM: x86 emulator: do not inject exception directly into vcpu

2010-04-28 Thread Gleb Natapov
Return exception as a result of instruction emulation and handle
injection in KVM code.

Signed-off-by: Gleb Natapov 
---
 arch/x86/include/asm/kvm_emulate.h |6 ++
 arch/x86/kvm/emulate.c |  124 ++--
 arch/x86/kvm/x86.c |   20 +-
 3 files changed, 100 insertions(+), 50 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h 
b/arch/x86/include/asm/kvm_emulate.h
index a87d95f..51cfd73 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -216,6 +216,12 @@ struct x86_emulate_ctxt {
int interruptibility;
 
bool restart; /* restart string instruction after writeback */
+
+   int exception; /* exception that happens during emulation or -1 */
+   u32 error_code; /* error code for exception */
+   bool error_code_valid;
+   unsigned long cr2; /* faulted address in case of #PF */
+
/* decode cache */
struct decode_cache decode;
 };
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index e2def3b..50ce993 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -653,6 +653,37 @@ static unsigned long ss_base(struct x86_emulate_ctxt *ctxt,
return seg_base(ctxt, ops, VCPU_SREG_SS);
 }
 
+static void emulate_exception(struct x86_emulate_ctxt *ctxt, int vec,
+ u32 error, bool valid)
+{
+   ctxt->exception = vec;
+   ctxt->error_code = error;
+   ctxt->error_code_valid = valid;
+   ctxt->restart = false;
+}
+
+static void emulate_gp(struct x86_emulate_ctxt *ctxt, int err)
+{
+   emulate_exception(ctxt, GP_VECTOR, err, true);
+}
+
+static void emulate_pf(struct x86_emulate_ctxt *ctxt, unsigned long addr,
+  int err)
+{
+   ctxt->cr2 = addr;
+   emulate_exception(ctxt, PF_VECTOR, err, true);
+}
+
+static void emulate_ud(struct x86_emulate_ctxt *ctxt)
+{
+   emulate_exception(ctxt, UD_VECTOR, 0, false);
+}
+
+static void emulate_ts(struct x86_emulate_ctxt *ctxt, int err)
+{
+   emulate_exception(ctxt, TS_VECTOR, err, true);
+}
+
 static int do_fetch_insn_byte(struct x86_emulate_ctxt *ctxt,
  struct x86_emulate_ops *ops,
  unsigned long eip, u8 *dest)
@@ -1285,7 +1316,7 @@ static int read_emulated(struct x86_emulate_ctxt *ctxt,
rc = ops->read_emulated(addr, mc->data + mc->end, n, &err,
ctxt->vcpu);
if (rc == X86EMUL_PROPAGATE_FAULT)
-   kvm_inject_page_fault(ctxt->vcpu, addr, err);
+   emulate_pf(ctxt, addr, err);
if (rc != X86EMUL_CONTINUE)
return rc;
mc->end += n;
@@ -1366,13 +1397,13 @@ static int read_segment_descriptor(struct 
x86_emulate_ctxt *ctxt,
get_descriptor_table_ptr(ctxt, ops, selector, &dt);
 
if (dt.size < index * 8 + 7) {
-   kvm_inject_gp(ctxt->vcpu, selector & 0xfffc);
+   emulate_gp(ctxt, selector & 0xfffc);
return X86EMUL_PROPAGATE_FAULT;
}
addr = dt.address + index * 8;
ret = ops->read_std(addr, desc, sizeof *desc, ctxt->vcpu,  &err);
if (ret == X86EMUL_PROPAGATE_FAULT)
-   kvm_inject_page_fault(ctxt->vcpu, addr, err);
+   emulate_pf(ctxt, addr, err);
 
return ret;
 }
@@ -1391,14 +1422,14 @@ static int write_segment_descriptor(struct 
x86_emulate_ctxt *ctxt,
get_descriptor_table_ptr(ctxt, ops, selector, &dt);
 
if (dt.size < index * 8 + 7) {
-   kvm_inject_gp(ctxt->vcpu, selector & 0xfffc);
+   emulate_gp(ctxt, selector & 0xfffc);
return X86EMUL_PROPAGATE_FAULT;
}
 
addr = dt.address + index * 8;
ret = ops->write_std(addr, desc, sizeof *desc, ctxt->vcpu, &err);
if (ret == X86EMUL_PROPAGATE_FAULT)
-   kvm_inject_page_fault(ctxt->vcpu, addr, err);
+   emulate_pf(ctxt, addr, err);
 
return ret;
 }
@@ -1517,7 +1548,7 @@ load:
ops->set_cached_descriptor(&seg_desc, seg, ctxt->vcpu);
return X86EMUL_CONTINUE;
 exception:
-   kvm_queue_exception_e(ctxt->vcpu, err_vec, err_code);
+   emulate_exception(ctxt, err_vec, err_code, true);
return X86EMUL_PROPAGATE_FAULT;
 }
 
@@ -1578,7 +1609,7 @@ static int emulate_popf(struct x86_emulate_ctxt *ctxt,
break;
case X86EMUL_MODE_VM86:
if (iopl < 3) {
-   kvm_inject_gp(ctxt->vcpu, 0);
+   emulate_gp(ctxt, 0);
return X86EMUL_PROPAGATE_FAULT;
}
change_mask |= EFLG_IF;
@@ -1829,7 +1860,7 @@ static inline int writeback(struct x86_emulate_ctxt *ctxt,
&err,
ctxt->vcpu);
if (rc == X86EMUL_PR

[PATCHv2 00/23] next round of emulator cleanups

2010-04-28 Thread Gleb Natapov
This is the next round of emulator cleanups. Make it even more detached
from kvm. First patch introduces IO read cache which is needed to
correctly emulate instructions that require more then one IO read exit
during emulation.

ChangeLog;

 v1->v2:
  - add x86_emulator_ops callback to get cached segment base
  - use double underscore on helper functions
  - fold "remove unneeded initialization" patch

Gleb Natapov (23):
  KVM: x86 emulator: introduce read cache.
  KVM: x86 emulator: fix Move r/m16 to segment register decoding.
  KVM: x86 emulator: cleanup xchg emulation.
  KVM: x86 emulator: cleanup nop emulation
  KVM: x86 emulator: handle "far address" source operand.
  KVM: x86 emulator: add (set|get)_dr callbacks to x86_emulate_ops
  KVM: x86 emulator: add (set|get)_msr callbacks to x86_emulate_ops
  KVM: x86 emulator: add get_cached_segment_base() callback to
x86_emulate_ops.
  KVM: x86 emulator: cleanup some direct calls into kvm to use existing
callbacks
  KVM: x86 emulator: make set_cr() callback return error if it fails
  KVM: x86 emulator: make (get|set)_dr() callback return error if it
fails
  KVM: x86 emulator: fix X86EMUL_RETRY_INSTR and X86EMUL_CMPXCHG_FAILED
values
  KVM: fill in run->mmio details in (read|write)_emulated function.
  KVM: x86 emulator: x86_emulate_insn() return -1 only in case of
emulation failure
  KVM: remove export of emulator_write_emulated().
  KVM: do not inject #PF in (read|write)_emulated() callbacks
  KVM: handle emulation failure case first.
  KVM: x86 emulator: advance RIP outside x86 emulator code
  KVM: x86 emulator: set RFLAGS outside x86 emulator code.
  KVM: x86 emulator: use shadowed register in emulate_sysexit()
  KVM: x86 exmulator: handle shadowed registers outside emulator.
  KVM: x86 emulator: move interruptibility state tracking out of
emulator
  KVM: x86 emulator: do not inject exception directly into vcpu

 arch/x86/include/asm/kvm_emulate.h |   30 ++-
 arch/x86/include/asm/kvm_host.h|9 -
 arch/x86/kvm/emulate.c |  559 
 arch/x86/kvm/x86.c |  382 ++---
 4 files changed, 549 insertions(+), 431 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCHv2 20/23] KVM: x86 emulator: use shadowed register in emulate_sysexit()

2010-04-28 Thread Gleb Natapov
emulate_sysexit() should use shadowed registers copy instead of
looking into vcpu state directly.

Signed-off-by: Gleb Natapov 
---
 arch/x86/kvm/emulate.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index eee5b4d..5d786d5 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2063,8 +2063,8 @@ emulate_sysexit(struct x86_emulate_ctxt *ctxt, struct 
x86_emulate_ops *ops)
ops->set_cached_descriptor(&ss, VCPU_SREG_SS, ctxt->vcpu);
ops->set_segment_selector(ss_sel, VCPU_SREG_SS, ctxt->vcpu);
 
-   c->eip = ctxt->vcpu->arch.regs[VCPU_REGS_RDX];
-   c->regs[VCPU_REGS_RSP] = ctxt->vcpu->arch.regs[VCPU_REGS_RCX];
+   c->eip = c->regs[VCPU_REGS_RDX];
+   c->regs[VCPU_REGS_RSP] = c->regs[VCPU_REGS_RCX];
 
return X86EMUL_CONTINUE;
 }
-- 
1.6.5

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCHv2 04/23] KVM: x86 emulator: cleanup nop emulation

2010-04-28 Thread Gleb Natapov
Make it more explicit what we are checking for.

Signed-off-by: Gleb Natapov 
---
 arch/x86/kvm/emulate.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index ea5c6fd..fbc555b 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2799,8 +2799,8 @@ special_insn:
goto done;
break;
case 0x90: /* nop / xchg r8,rax */
-   if (!(c->rex_prefix & 1)) { /* nop */
-   c->dst.type = OP_NONE;
+   if (c->dst.ptr == (unsigned long *)&c->regs[VCPU_REGS_RAX]) {
+   c->dst.type = OP_NONE;  /* nop */
break;
}
case 0x91 ... 0x97: /* xchg reg,rax */
-- 
1.6.5

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCHv2 02/23] KVM: x86 emulator: fix Move r/m16 to segment register decoding.

2010-04-28 Thread Gleb Natapov
This instruction does not need generic decoding for its dst operand.

Signed-off-by: Gleb Natapov 
---
 arch/x86/kvm/emulate.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 6f40337..efb7853 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -171,7 +171,7 @@ static u32 opcode_table[256] = {
ByteOp | DstMem | SrcReg | ModRM | Mov, DstMem | SrcReg | ModRM | Mov,
ByteOp | DstReg | SrcMem | ModRM | Mov, DstReg | SrcMem | ModRM | Mov,
DstMem | SrcReg | ModRM | Mov, ModRM | DstReg,
-   DstReg | SrcMem | ModRM | Mov, Group | Group1A,
+   ImplicitOps | SrcMem | ModRM, Group | Group1A,
/* 0x90 - 0x97 */
DstReg, DstReg, DstReg, DstReg, DstReg, DstReg, DstReg, DstReg,
/* 0x98 - 0x9F */
-- 
1.6.5

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCHv2 03/23] KVM: x86 emulator: cleanup xchg emulation.

2010-04-28 Thread Gleb Natapov
Dst operand is already initialized during decoding stage. No need to
reinitialize.

Signed-off-by: Gleb Natapov 
---
 arch/x86/kvm/emulate.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index efb7853..ea5c6fd 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2804,8 +2804,8 @@ special_insn:
break;
}
case 0x91 ... 0x97: /* xchg reg,rax */
-   c->src.type = c->dst.type = OP_REG;
-   c->src.bytes = c->dst.bytes = c->op_bytes;
+   c->src.type = OP_REG;
+   c->src.bytes = c->op_bytes;
c->src.ptr = (unsigned long *) &c->regs[VCPU_REGS_RAX];
c->src.val = *(c->src.ptr);
goto xchg;
-- 
1.6.5

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCHv2 01/23] KVM: x86 emulator: introduce read cache.

2010-04-28 Thread Gleb Natapov
Introduce read cache which is needed for instruction that require more
then one exit to userspace. After returning from userspace the instruction
will be re-executed with cached read value.

Signed-off-by: Gleb Natapov 
---
 arch/x86/include/asm/kvm_emulate.h |1 +
 arch/x86/kvm/emulate.c |   56 +++-
 2 files changed, 43 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h 
b/arch/x86/include/asm/kvm_emulate.h
index 0b2729b..288cbed 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -186,6 +186,7 @@ struct decode_cache {
unsigned long modrm_val;
struct fetch_cache fetch;
struct read_cache io_read;
+   struct read_cache mem_read;
 };
 
 struct x86_emulate_ctxt {
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 5ac0bb4..6f40337 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1263,6 +1263,33 @@ done:
return (rc == X86EMUL_UNHANDLEABLE) ? -1 : 0;
 }
 
+static int read_emulated(struct x86_emulate_ctxt *ctxt,
+struct x86_emulate_ops *ops,
+unsigned long addr, void *dest, unsigned size)
+{
+   int rc;
+   struct read_cache *mc = &ctxt->decode.mem_read;
+
+   while (size) {
+   int n = min(size, 8u);
+   size -= n;
+   if (mc->pos < mc->end)
+   goto read_cached;
+   
+   rc = ops->read_emulated(addr, mc->data + mc->end, n, 
ctxt->vcpu);
+   if (rc != X86EMUL_CONTINUE)
+   return rc;
+   mc->end += n;
+
+   read_cached:
+   memcpy(dest, mc->data + mc->pos, n);
+   mc->pos += n;
+   dest += n;
+   addr += n;
+   }
+   return X86EMUL_CONTINUE;
+}
+
 static int pio_in_emulated(struct x86_emulate_ctxt *ctxt,
   struct x86_emulate_ops *ops,
   unsigned int size, unsigned short port,
@@ -1504,9 +1531,9 @@ static int emulate_pop(struct x86_emulate_ctxt *ctxt,
struct decode_cache *c = &ctxt->decode;
int rc;
 
-   rc = ops->read_emulated(register_address(c, ss_base(ctxt),
-c->regs[VCPU_REGS_RSP]),
-   dest, len, ctxt->vcpu);
+   rc = read_emulated(ctxt, ops, register_address(c, ss_base(ctxt),
+  c->regs[VCPU_REGS_RSP]),
+  dest, len);
if (rc != X86EMUL_CONTINUE)
return rc;
 
@@ -2475,6 +2502,7 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct 
x86_emulate_ops *ops)
int saved_dst_type = c->dst.type;
 
ctxt->interruptibility = 0;
+   ctxt->decode.mem_read.pos = 0;
 
/* Shadow copy of register state. Committed on successful emulation.
 * NOTE: we can copy them from vcpu as x86_decode_insn() doesn't
@@ -2529,20 +2557,16 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct 
x86_emulate_ops *ops)
}
 
if (c->src.type == OP_MEM) {
-   rc = ops->read_emulated((unsigned long)c->src.ptr,
-   &c->src.val,
-   c->src.bytes,
-   ctxt->vcpu);
+   rc = read_emulated(ctxt, ops, (unsigned long)c->src.ptr,
+   &c->src.val, c->src.bytes);
if (rc != X86EMUL_CONTINUE)
goto done;
c->src.orig_val = c->src.val;
}
 
if (c->src2.type == OP_MEM) {
-   rc = ops->read_emulated((unsigned long)c->src2.ptr,
-   &c->src2.val,
-   c->src2.bytes,
-   ctxt->vcpu);
+   rc = read_emulated(ctxt, ops, (unsigned long)c->src2.ptr,
+   &c->src2.val, c->src2.bytes);
if (rc != X86EMUL_CONTINUE)
goto done;
}
@@ -2553,8 +2577,8 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct 
x86_emulate_ops *ops)
 
if ((c->dst.type == OP_MEM) && !(c->d & Mov)) {
/* optimisation - avoid slow emulated read if Mov */
-   rc = ops->read_emulated((unsigned long)c->dst.ptr, &c->dst.val,
-   c->dst.bytes, ctxt->vcpu);
+   rc = read_emulated(ctxt, ops, (unsigned long)c->dst.ptr,
+  &c->dst.val, c->dst.bytes);
if (rc != X86EMUL_CONTINUE)
goto done;
}
@@ -2981,7 +3005,11 @@ writeback:
(rc->end != 0 && rc->end == rc->pos))
ctxt->restart = false;
}
-
+   /*
+* reset read cache here in case st

Re: [PATCH] virtio-spec: document block CMD and FLUSH

2010-04-28 Thread Michael S. Tsirkin
On Tue, Apr 20, 2010 at 12:26:27AM +0300, Michael S. Tsirkin wrote:
> On Fri, Feb 19, 2010 at 12:22:20AM +0200, Michael S. Tsirkin wrote:
> > I took a stub at documenting CMD and FLUSH request types in virtio
> > block.
> 
> Any comments?

Rusty?
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 5/5] test: access: don't expect fetch fault indication if !efer.nx

2010-04-28 Thread Avi Kivity
Bit 4 of the page-fault error code can only be set if efer.nx=1.

Signed-off-by: Avi Kivity 
---
 kvm/user/test/x86/access.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/kvm/user/test/x86/access.c b/kvm/user/test/x86/access.c
index 5addd15..3338fbc 100644
--- a/kvm/user/test/x86/access.c
+++ b/kvm/user/test/x86/access.c
@@ -463,6 +463,8 @@ no_pte:
 fault:
 if (!at->expected_fault)
 at->ignore_pde = 0;
+if (!at->flags[AC_CPU_EFER_NX])
+at->expected_error &= ~PFERR_FETCH_MASK;
 }
 
 static void ac_test_check(ac_test_t *at, _Bool *success_ret, _Bool cond,
-- 
1.7.0.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/5] test: access: report successful tests optionally

2010-04-28 Thread Avi Kivity
There are so many tests, no point in reporting successful ones.  Leave the
capability optional in case a failure triggers a kvm or testsuite crash before
the report is complete and we need to see which test failed.

Signed-off-by: Avi Kivity 
---
 kvm/user/test/x86/access.c |   15 +--
 1 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/kvm/user/test/x86/access.c b/kvm/user/test/x86/access.c
index 0906691..c7a7075 100644
--- a/kvm/user/test/x86/access.c
+++ b/kvm/user/test/x86/access.c
@@ -6,6 +6,8 @@
 #define true 1
 #define false 0
 
+static _Bool verbose = false;
+
 typedef unsigned long pt_element_t;
 
 #define PAGE_SIZE ((pt_element_t)4096)
@@ -145,6 +147,9 @@ typedef struct {
 unsigned long linear_addr;
 } __attribute__((packed)) descriptor_table_t;
 
+
+static void ac_test_show(ac_test_t *at);
+
 void lidt(idt_entry_t *idt, int nentries)
 {
 descriptor_table_t dt;
@@ -469,6 +474,10 @@ static void ac_test_check(ac_test_t *at, _Bool 
*success_ret, _Bool cond,
 
 *success_ret = false;
 
+if (!verbose) {
+ac_test_show(at);
+}
+
 va_start(ap, fmt);
 vsnprintf(buf, sizeof(buf), fmt, ap);
 va_end(ap);
@@ -565,7 +574,7 @@ int ac_test_do_access(ac_test_t *at)
 ac_test_check(at, &success, *at->pdep != at->expected_pde,
   "pde %x expected %x", *at->pdep, at->expected_pde);
 
-if (success) {
+if (success && verbose) {
 printf("PASS\n");
 }
 return success;
@@ -590,7 +599,9 @@ int ac_test_exec(ac_test_t *at)
 {
 int r;
 
-ac_test_show(at);
+if (verbose) {
+ac_test_show(at);
+}
 ac_test_setup_pte(at);
 r = ac_test_do_access(at);
 return r;
-- 
1.7.0.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/5] test: access: allow the processor not to set pde.a if a fault occurs

2010-04-28 Thread Avi Kivity
Some processors only set accessed bits if the translation is valid; allow
this behaviour.  This squelchs errors running with EPT.

Signed-off-by: Avi Kivity 
---
 kvm/user/test/x86/access.c |   25 -
 1 files changed, 20 insertions(+), 5 deletions(-)

diff --git a/kvm/user/test/x86/access.c b/kvm/user/test/x86/access.c
index c7a7075..5addd15 100644
--- a/kvm/user/test/x86/access.c
+++ b/kvm/user/test/x86/access.c
@@ -137,6 +137,7 @@ typedef struct {
 pt_element_t expected_pte;
 pt_element_t *pdep;
 pt_element_t expected_pde;
+pt_element_t ignore_pde;
 int expected_fault;
 unsigned expected_error;
 idt_entry_t idt[256];
@@ -370,6 +371,7 @@ void ac_test_setup_pte(ac_test_t *at)
 if (at->ptep)
at->expected_pte = *at->ptep;
 at->expected_pde = *at->pdep;
+at->ignore_pde = 0;
 at->expected_fault = 0;
 at->expected_error = PFERR_PRESENT_MASK;
 
@@ -416,13 +418,17 @@ void ac_test_setup_pte(ac_test_t *at)
 if (at->flags[AC_ACCESS_FETCH] && at->flags[AC_PDE_NX])
at->expected_fault = 1;
 
-if (at->expected_fault)
+if (!at->flags[AC_PDE_ACCESSED])
+at->ignore_pde = PT_ACCESSED_MASK;
+
+if (!pde_valid)
goto fault;
 
-at->expected_pde |= PT_ACCESSED_MASK;
+if (!at->expected_fault)
+at->expected_pde |= PT_ACCESSED_MASK;
 
 if (at->flags[AC_PDE_PSE]) {
-   if (at->flags[AC_ACCESS_WRITE])
+   if (at->flags[AC_ACCESS_WRITE] && !at->expected_fault)
at->expected_pde |= PT_DIRTY_MASK;
goto no_pte;
 }
@@ -455,7 +461,8 @@ void ac_test_setup_pte(ac_test_t *at)
 
 no_pte:
 fault:
-;
+if (!at->expected_fault)
+at->ignore_pde = 0;
 }
 
 static void ac_test_check(ac_test_t *at, _Bool *success_ret, _Bool cond,
@@ -484,6 +491,13 @@ static void ac_test_check(ac_test_t *at, _Bool 
*success_ret, _Bool cond,
 printf("FAIL: %s\n", buf);
 }
 
+static int pt_match(pt_element_t pte1, pt_element_t pte2, pt_element_t ignore)
+{
+pte1 &= ~ignore;
+pte2 &= ~ignore;
+return pte1 == pte2;
+}
+
 int ac_test_do_access(ac_test_t *at)
 {
 static unsigned unique = 42;
@@ -571,7 +585,8 @@ int ac_test_do_access(ac_test_t *at)
   "error code %x expected %x", e, at->expected_error);
 ac_test_check(at, &success, at->ptep && *at->ptep != at->expected_pte,
   "pte %x expected %x", *at->ptep, at->expected_pte);
-ac_test_check(at, &success, *at->pdep != at->expected_pde,
+ac_test_check(at, &success,
+  !pt_match(*at->pdep, at->expected_pde, at->ignore_pde),
   "pde %x expected %x", *at->pdep, at->expected_pde);
 
 if (success && verbose) {
-- 
1.7.0.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/5] test: access: split off test case formatting from execution

2010-04-28 Thread Avi Kivity
Signed-off-by: Avi Kivity 
---
 kvm/user/test/x86/access.c |   10 --
 1 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/kvm/user/test/x86/access.c b/kvm/user/test/x86/access.c
index 5eadff8..dbc1213 100644
--- a/kvm/user/test/x86/access.c
+++ b/kvm/user/test/x86/access.c
@@ -557,9 +557,8 @@ int ac_test_do_access(ac_test_t *at)
 return 1;
 }
 
-int ac_test_exec(ac_test_t *at)
+static void ac_test_show(ac_test_t *at)
 {
-int r;
 char line[5000];
 
 *line = 0;
@@ -571,6 +570,13 @@ int ac_test_exec(ac_test_t *at)
}
 strcat(line, ": ");
 printf("%s", line);
+}
+
+int ac_test_exec(ac_test_t *at)
+{
+int r;
+
+ac_test_show(at);
 ac_test_setup_pte(at);
 r = ac_test_do_access(at);
 return r;
-- 
1.7.0.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/5] Fixes for mmu access unit test

2010-04-28 Thread Avi Kivity
The mmu access unit test fails when testing real processor (e.g. with npt or
ept).  In one case this was because the test was too restrictive (more than
one result is allowed) and in another both the test and kvm shadow mmu are
faulty.

This patchset fixes the test.  Another patch will follow fixing kvm (but only
after nnpt is merged).

Avi Kivity (5):
  test: access: split off test case formatting from execution
  test: access: consolidate test failure reporting into a function
  test: access: report successful tests optionally
  test: access: allow the processor not to set pde.a if a fault occurs
  test: access: don't expect fetch fault indication if !efer.nx

 kvm/user/test/x86/access.c |  106 
 1 files changed, 77 insertions(+), 29 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/5] test: access: consolidate test failure reporting into a function

2010-04-28 Thread Avi Kivity
Signed-off-by: Avi Kivity 
---
 kvm/user/test/x86/access.c |   60 +++-
 1 files changed, 37 insertions(+), 23 deletions(-)

diff --git a/kvm/user/test/x86/access.c b/kvm/user/test/x86/access.c
index dbc1213..0906691 100644
--- a/kvm/user/test/x86/access.c
+++ b/kvm/user/test/x86/access.c
@@ -453,6 +453,28 @@ fault:
 ;
 }
 
+static void ac_test_check(ac_test_t *at, _Bool *success_ret, _Bool cond,
+  const char *fmt, ...)
+{
+va_list ap;
+char buf[500];
+
+if (!*success_ret) {
+return;
+}
+
+if (!cond) {
+return;
+}
+
+*success_ret = false;
+
+va_start(ap, fmt);
+vsnprintf(buf, sizeof(buf), fmt, ap);
+va_end(ap);
+printf("FAIL: %s\n", buf);
+}
+
 int ac_test_do_access(ac_test_t *at)
 {
 static unsigned unique = 42;
@@ -460,6 +482,7 @@ int ac_test_do_access(ac_test_t *at)
 unsigned e;
 static unsigned char user_stack[4096];
 unsigned long rsp;
+_Bool success = true;
 
 ++unique;
 
@@ -531,30 +554,21 @@ int ac_test_do_access(ac_test_t *at)
  "jmp back_to_kernel \n\t"
  ".section .text");
 
-if (fault && !at->expected_fault) {
-   printf("FAIL: unexpected fault\n");
-   return 0;
-}
-if (!fault && at->expected_fault) {
-   printf("FAIL: unexpected access\n");
-   return 0;
-}
-if (fault && e != at->expected_error) {
-   printf("FAIL: error code %x expected %x\n", e, at->expected_error);
-   return 0;
-}
-if (at->ptep && *at->ptep != at->expected_pte) {
-   printf("FAIL: pte %x expected %x\n", *at->ptep, at->expected_pte);
-   return 0;
+ac_test_check(at, &success, fault && !at->expected_fault,
+  "unexpected fault");
+ac_test_check(at, &success, !fault && at->expected_fault,
+  "unexpected access");
+ac_test_check(at, &success, fault && e != at->expected_error,
+  "error code %x expected %x", e, at->expected_error);
+ac_test_check(at, &success, at->ptep && *at->ptep != at->expected_pte,
+  "pte %x expected %x", *at->ptep, at->expected_pte);
+ac_test_check(at, &success, *at->pdep != at->expected_pde,
+  "pde %x expected %x", *at->pdep, at->expected_pde);
+
+if (success) {
+printf("PASS\n");
 }
-
-if (*at->pdep != at->expected_pde) {
-   printf("FAIL: pde %x expected %x\n", *at->pdep, at->expected_pde);
-   return 0;
-}
-
-printf("PASS\n");
-return 1;
+return success;
 }
 
 static void ac_test_show(ac_test_t *at)
-- 
1.7.0.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 06/10] kvm: remove explicit kvm_arch_reset_vcpu from kvm_init_vcpu

2010-04-28 Thread Anthony Liguori

On 04/26/2010 12:59 PM, Marcelo Tosatti wrote:

This is now done via the initialization's qemu_system_reset call.

Signed-off-by: Avi Kivity
---
  kvm-all.c |1 -
  1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index 9c8aa7d..eabb097 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -208,7 +208,6 @@ int kvm_init_vcpu(CPUState *env)
  ret = kvm_arch_init_vcpu(env);
  if (ret == 0) {
  qemu_register_reset(kvm_reset_vcpu, env);
-kvm_arch_reset_vcpu(env);
  }
  err:
  return ret;
   


This breaks -enable-kvm in upstream qemu.  No progress is ever made in 
the guest.


Regards,

Anthony Liguori


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 15/22] KVM: MMU: Introduce kvm_read_guest_page_x86()

2010-04-28 Thread Joerg Roedel
On Tue, Apr 27, 2010 at 04:35:06PM +0300, Avi Kivity wrote:
> Perhaps have
> 
>struct kvm_mmu base_mmu;
>struct kvm_mmu nested_mmu;
>struct kvm_mmu *mmu;

Okay, the pointer is only used in the gva_to_gpa path which is the
minority of mmu context usages. I will introduce a

struct kvm_mmu *walk_mmu;

and use it in the page walking code.

Joerg


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Potential thread synchronization issue in qcow2.c and qcow2-cluster.c

2010-04-28 Thread Chunqiang (CQ) Tang
Hi there,

I am reading the snapshot source code of qemu-kvm-0.12.3, and am
puzzled by the thread synchronization issue in qcow2.c and
qcow2-cluster.c. Could someone please enlighten me? Thanks!

Specifically, I found that BDRVQcowState.cluster_allocs, which is a
QLIST_HEAD, may be accessed concurrently by two threads, but I did not
see how the two thread properly synchronize with each other to avoid
race conditions. I profiled all executions of
qemu_mutex_lock_iothread(), and found that
it only protects the vl.c:main_loop_wai() thread but does NOT protect
the qemu-kvm.c:kvm_cpu_exec() thread. Did I miss something or is this
a defect?  Note that qemu-kvm.c:kvm_cpu_exec() is
executed, instead of kvm-all.c:kvm_cpu_exec(). Please
see below for more details.

Here is an example of how two threads may reach qcow2.

Stack trace of thread 1:
main -> main_loop -> kvm_main_loop -> main_loop_wait -> posix_aio_read
-> posix_aio_process_queue -> qcow_aio_write_cb ->
qcow2_alloc_cluster_offset (which may modify
BDRVQcowState.cluster_allocs)

Stack trace of thread 2:
ap_main_loop -> ... -> kvm_handle_io -> ... -> qcow_aio_writev ->
qcow_aio_write_cb -> qcow2_alloc_cluster_offset (which may modify
BDRVQcowState.cluster_allocs)

Here is the trace showing that
qemu_mutex_lock_iothread() does not protect the thread
that executes. kvm_cpu_exec()->...->qcow_aio_write_cb().

home/ctang/kvm/qemu-kvm-0.12.3/qemu-kvm.c : 2530thread: b7e056d0
   /home/ctang/kvm/bin/qemu-system-x86_64(qemu_mutex_unlock_iothread+0x1a)
[0x8092242]
   /home/ctang/kvm/bin/qemu-system-x86_64(main_loop_wait+0x221) [0x806edef]
   /home/ctang/kvm/bin/qemu-system-x86_64(kvm_main_loop+0x1ff) [0x80916a1]
   /home/ctang/kvm/bin/qemu-system-x86_64 [0x806f5c2]
   /home/ctang/kvm/bin/qemu-system-x86_64(main+0x2e2c) [0x80736d1]
   /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe5) [0xb7e33775]
   /home/ctang/kvm/bin/qemu-system-x86_64 [0x8068bb1]

block/qcow2-cluster.c : 721thread: b7dc2b90
   /home/ctang/kvm/bin/qemu-system-x86_64(qcow2_alloc_cluster_offset+0x3c)
[0x81175fa]
   /home/ctang/kvm/bin/qemu-system-x86_64(qcow_aio_write_cb+0x158)
[0x8111d73]
   /home/ctang/kvm/bin/qemu-system-x86_64(qcow_aio_writev+0x94) [0x8112054]
   /home/ctang/kvm/bin/qemu-system-x86_64(bdrv_aio_writev+0xe1) [0x80fa8e9]
   /home/ctang/kvm/bin/qemu-system-x86_64 [0x81f4a96]
   /home/ctang/kvm/bin/qemu-system-x86_64 [0x81f4c04]
   /home/ctang/kvm/bin/qemu-system-x86_64(dma_bdrv_write+0x48) [0x81f4cbf]
   /home/ctang/kvm/bin/qemu-system-x86_64 [0x80a437c]
   /home/ctang/kvm/bin/qemu-system-x86_64(bmdma_cmd_writeb+0x73)
[0x80a9503]
   /home/ctang/kvm/bin/qemu-system-x86_64 [0x812b1eb]
   /home/ctang/kvm/bin/qemu-system-x86_64(cpu_outb+0x27) [0x812b4e6]
   /home/ctang/kvm/bin/qemu-system-x86_64 [0x808d267]
   /home/ctang/kvm/bin/qemu-system-x86_64(kvm_run+0x2f4) [0x808f4b8]
   /home/ctang/kvm/bin/qemu-system-x86_64(kvm_cpu_exec+0x56) [0x80907b2]
   /home/ctang/kvm/bin/qemu-system-x86_64 [0x8090f4d]
   /home/ctang/kvm/bin/qemu-system-x86_64 [0x8091098]
   /lib/tls/i686/cmov/libpthread.so.0 [0xb7fd24ff]
   /lib/tls/i686/cmov/libc.so.6(clone+0x5e) [0xb7f0149e]

/home/ctang/kvm/qemu-kvm-0.12.3/qemu-kvm.c : 2537thread: b7e056d0
   /home/ctang/kvm/bin/qemu-system-x86_64(qemu_mutex_lock_iothread+0x1a)
[0x809229d]
   /home/ctang/kvm/bin/qemu-system-x86_64(main_loop_wait+0x25c) [0x806ee2a]
   /home/ctang/kvm/bin/qemu-system-x86_64(kvm_main_loop+0x1ff) [0x80916a1]
   /home/ctang/kvm/bin/qemu-system-x86_64 [0x806f5c2]
   /home/ctang/kvm/bin/qemu-system-x86_64(main+0x2e2c) [0x80736d1]
   /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe5) [0xb7e33775]
   /home/ctang/kvm/bin/qemu-system-x86_64 [0x8068bb1]

-- 
Regards,
Chunqiang Tang
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 12/22] KVM: MMU: Implement nested gva_to_gpa functions

2010-04-28 Thread Joerg Roedel
On Tue, Apr 27, 2010 at 03:37:40PM +0300, Avi Kivity wrote:
> On 04/27/2010 01:38 PM, Joerg Roedel wrote:
> >This patch adds the functions to do a nested l2_gva to
> >l1_gpa page table walk.
> >
> >Signed-off-by: Joerg Roedel
> >---
> >  arch/x86/include/asm/kvm_host.h |4 
> >  arch/x86/kvm/mmu.c  |8 
> >  arch/x86/kvm/paging_tmpl.h  |   31 +++
> >  3 files changed, 43 insertions(+), 0 deletions(-)
> >
> >diff --git a/arch/x86/include/asm/kvm_host.h 
> >b/arch/x86/include/asm/kvm_host.h
> >index ccdbd2f..3cbfb51 100644
> >--- a/arch/x86/include/asm/kvm_host.h
> >+++ b/arch/x86/include/asm/kvm_host.h
> >@@ -287,6 +287,10 @@ struct kvm_vcpu_arch {
> > bool tpr_access_reporting;
> >
> > struct kvm_mmu mmu;
> >+
> >+/* Used for two dimensional paging emulation */
> >+struct kvm_mmu nested_mmu;
> >+
> > /* only needed in kvm_pv_mmu_op() path, but it's hot so
> >  * put it here to avoid allocation */
> > struct kvm_pv_mmu_op_buffer mmu_op_buffer;
> >diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> >index 4e0bfdb..8bd40b5 100644
> >--- a/arch/x86/kvm/mmu.c
> >+++ b/arch/x86/kvm/mmu.c
> >@@ -2144,6 +2144,14 @@ static gpa_t nonpaging_gva_to_gpa(struct kvm_vcpu 
> >*vcpu, gva_t vaddr,
> > return vaddr;
> >  }
> >
> >+static gpa_t nonpaging_gva_to_gpa_nested(struct kvm_vcpu *vcpu, gva_t vaddr,
> >+ u32 access, u32 *error)
> >+{
> >+if (error)
> >+*error = 0;
> 
> Why allow error == NULL?

In the emulator there are some functions passing NULL for the error
code. In general it doesn't matter much because when we track this
information in the vcpu struct these error parameters can be removed. I
just let this outside the scope of this patchset and make that cleanup
later.

> 
> >
> >+static gpa_t FNAME(gva_to_gpa_nested)(struct kvm_vcpu *vcpu, gva_t vaddr,
> >+  u32 access, u32 *error)
> >+{
> >+struct guest_walker walker;
> >+gpa_t gpa = UNMAPPED_GVA;
> >+int r;
> >+
> >+r = FNAME(walk_addr_nested)(&walker, vcpu, vaddr,
> >+!!(access&  PFERR_WRITE_MASK),
> >+!!(access&  PFERR_USER_MASK),
> >+!!(access&  PFERR_FETCH_MASK));
> 
> Those !! are unneeded.

Ok, I remove them.

Joerg

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/5] Fix EFER.NX=0 with EPT

2010-04-28 Thread Avi Kivity
Currently we run with EFER.NX=1 on the guest even if the guest value is 0.
This is fine with shadow, since we check bit 63 when instantiating a page
table, and fault if bit 63 is set while EFER.NX is clear.

This doesn't work with EPT, since we no longer get the change to check guest
ptes.  So we need to run with EFER.NX=0.

This is complicated by the fact that if we switch EFER.NX on the host, we'll
trap immediately, since some host pages are mapped with the NX bit set.  As
a result, we need to switch the MSR atomically during guest entry and exit.

This patchset implements the complications described above.

Avi Kivity (5):
  KVM: Let vcpu structure alignment be determined at runtime
  KVM: VMX: Add definition for msr autoload entry
  KVM: VMX: Add definitions for guest and host EFER autoswitch vmcs
entries
  KVM: VMX: Add facility to atomically switch MSRs on guest entry/exit
  KVM: VMX: Atomically switch efer if EPT && !EFER.NX

 arch/ia64/kvm/vmm.c|2 +-
 arch/powerpc/kvm/44x.c |2 +-
 arch/powerpc/kvm/book3s.c  |3 +-
 arch/powerpc/kvm/e500.c|2 +-
 arch/s390/kvm/kvm-s390.c   |2 +-
 arch/x86/include/asm/vmx.h |   12 +++-
 arch/x86/kvm/svm.c |2 +-
 arch/x86/kvm/vmx.c |   63 +++-
 include/linux/kvm_host.h   |2 +-
 virt/kvm/kvm_main.c|7 +++--
 10 files changed, 85 insertions(+), 12 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/5] KVM: Let vcpu structure alignment be determined at runtime

2010-04-28 Thread Avi Kivity
vmx and svm vcpus have different contents and therefore may have different
alignmment requirements.  Let each specify its required alignment.

Signed-off-by: Avi Kivity 
---
 arch/ia64/kvm/vmm.c   |2 +-
 arch/powerpc/kvm/44x.c|2 +-
 arch/powerpc/kvm/book3s.c |3 ++-
 arch/powerpc/kvm/e500.c   |2 +-
 arch/s390/kvm/kvm-s390.c  |2 +-
 arch/x86/kvm/svm.c|2 +-
 arch/x86/kvm/vmx.c|3 ++-
 include/linux/kvm_host.h  |2 +-
 virt/kvm/kvm_main.c   |7 ---
 9 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/arch/ia64/kvm/vmm.c b/arch/ia64/kvm/vmm.c
index 7a62f75..f0b9cac 100644
--- a/arch/ia64/kvm/vmm.c
+++ b/arch/ia64/kvm/vmm.c
@@ -51,7 +51,7 @@ static int __init  kvm_vmm_init(void)
vmm_fpswa_interface = fpswa_interface;
 
/*Register vmm data to kvm side*/
-   return kvm_init(&vmm_info, 1024, THIS_MODULE);
+   return kvm_init(&vmm_info, 1024, 0, THIS_MODULE);
 }
 
 static void __exit kvm_vmm_exit(void)
diff --git a/arch/powerpc/kvm/44x.c b/arch/powerpc/kvm/44x.c
index 689a57c..73c0a3f 100644
--- a/arch/powerpc/kvm/44x.c
+++ b/arch/powerpc/kvm/44x.c
@@ -147,7 +147,7 @@ static int __init kvmppc_44x_init(void)
if (r)
return r;
 
-   return kvm_init(NULL, sizeof(struct kvmppc_vcpu_44x), THIS_MODULE);
+   return kvm_init(NULL, sizeof(struct kvmppc_vcpu_44x), 0, THIS_MODULE);
 }
 
 static void __exit kvmppc_44x_exit(void)
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 9f97dbe..16e3f4b 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -1384,7 +1384,8 @@ int __kvmppc_vcpu_run(struct kvm_run *kvm_run, struct 
kvm_vcpu *vcpu)
 
 static int kvmppc_book3s_init(void)
 {
-   return kvm_init(NULL, sizeof(struct kvmppc_vcpu_book3s), THIS_MODULE);
+   return kvm_init(NULL, sizeof(struct kvmppc_vcpu_book3s), 0,
+   THIS_MODULE);
 }
 
 static void kvmppc_book3s_exit(void)
diff --git a/arch/powerpc/kvm/e500.c b/arch/powerpc/kvm/e500.c
index 669a5c5..bc2b400 100644
--- a/arch/powerpc/kvm/e500.c
+++ b/arch/powerpc/kvm/e500.c
@@ -161,7 +161,7 @@ static int __init kvmppc_e500_init(void)
flush_icache_range(kvmppc_booke_handlers,
kvmppc_booke_handlers + max_ivor + kvmppc_handler_len);
 
-   return kvm_init(NULL, sizeof(struct kvmppc_vcpu_e500), THIS_MODULE);
+   return kvm_init(NULL, sizeof(struct kvmppc_vcpu_e500), 0, THIS_MODULE);
 }
 
 static void __init kvmppc_e500_exit(void)
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index ee7c713..8093e6f 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -752,7 +752,7 @@ gfn_t unalias_gfn(struct kvm *kvm, gfn_t gfn)
 static int __init kvm_s390_init(void)
 {
int ret;
-   ret = kvm_init(NULL, sizeof(struct kvm_vcpu), THIS_MODULE);
+   ret = kvm_init(NULL, sizeof(struct kvm_vcpu), 0, THIS_MODULE);
if (ret)
return ret;
 
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 889f660..2511664 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3319,7 +3319,7 @@ static struct kvm_x86_ops svm_x86_ops = {
 static int __init svm_init(void)
 {
return kvm_init(&svm_x86_ops, sizeof(struct vcpu_svm),
- THIS_MODULE);
+   __alignof__(struct vcpu_svm), THIS_MODULE);
 }
 
 static void __exit svm_exit(void)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 875b785..2e87296 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4245,7 +4245,8 @@ static int __init vmx_init(void)
 
set_bit(0, vmx_vpid_bitmap); /* 0 is reserved for host */
 
-   r = kvm_init(&vmx_x86_ops, sizeof(struct vcpu_vmx), THIS_MODULE);
+   r = kvm_init(&vmx_x86_ops, sizeof(struct vcpu_vmx),
+__alignof__(struct vcpu_vmx), THIS_MODULE);
if (r)
goto out3;
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ce027d5..7cb116a 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -243,7 +243,7 @@ void kvm_vcpu_uninit(struct kvm_vcpu *vcpu);
 void vcpu_load(struct kvm_vcpu *vcpu);
 void vcpu_put(struct kvm_vcpu *vcpu);
 
-int kvm_init(void *opaque, unsigned int vcpu_size,
+int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
  struct module *module);
 void kvm_exit(void);
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 9ab1a77..f032806 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2178,7 +2178,7 @@ static void kvm_sched_out(struct preempt_notifier *pn,
kvm_arch_vcpu_put(vcpu);
 }
 
-int kvm_init(void *opaque, unsigned int vcpu_size,
+int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
  struct module *module)
 {
int r;
@@ -2228,8 +2228,9 @@ int kvm_init(void *opaque, unsigned int vcpu_size,
goto out_free_4;
 
/* A k

[PATCH 5/5] KVM: VMX: Atomically switch efer if EPT && !EFER.NX

2010-04-28 Thread Avi Kivity
When EPT is enabled, we cannot emulate EFER.NX=0 through the shadow page
tables.  This causes accesses through ptes with bit 63 set to succeed instead
of failing a reserved bit check.

Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/vmx.c |   11 +++
 1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index ae22dcf..cc78fee 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -678,6 +678,17 @@ static bool update_transition_efer(struct vcpu_vmx *vmx, 
int efer_offset)
guest_efer |= host_efer & ignore_bits;
vmx->guest_msrs[efer_offset].data = guest_efer;
vmx->guest_msrs[efer_offset].mask = ~ignore_bits;
+
+   clear_atomic_switch_msr(vmx, MSR_EFER);
+   /* On ept, can't emulate nx, and must switch nx atomically */
+   if (enable_ept && ((vmx->vcpu.arch.efer ^ host_efer) & EFER_NX)) {
+   guest_efer = vmx->vcpu.arch.efer;
+   if (!(guest_efer & EFER_LMA))
+   guest_efer &= ~EFER_LME;
+   add_atomic_switch_msr(vmx, MSR_EFER, guest_efer, host_efer);
+   return false;
+   }
+
return true;
 }
 
-- 
1.7.0.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/5] KVM: VMX: Add definition for msr autoload entry

2010-04-28 Thread Avi Kivity
Signed-off-by: Avi Kivity 
---
 arch/x86/include/asm/vmx.h |8 +++-
 1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index fb9a080..4497318 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -25,6 +25,8 @@
  *
  */
 
+#include 
+
 /*
  * Definitions of Primary Processor-Based VM-Execution Controls.
  */
@@ -394,6 +396,10 @@ enum vmcs_field {
 #define ASM_VMX_INVEPT   ".byte 0x66, 0x0f, 0x38, 0x80, 0x08"
 #define ASM_VMX_INVVPID  ".byte 0x66, 0x0f, 0x38, 0x81, 0x08"
 
-
+struct vmx_msr_entry {
+   u32 index;
+   u32 reserved;
+   u64 value;
+} __aligned(16);
 
 #endif
-- 
1.7.0.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/5] KVM: VMX: Add definitions for guest and host EFER autoswitch vmcs entries

2010-04-28 Thread Avi Kivity
Signed-off-by: Avi Kivity 
---
 arch/x86/include/asm/vmx.h |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 4497318..9e6779f 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -122,6 +122,8 @@ enum vmcs_field {
GUEST_IA32_DEBUGCTL_HIGH= 0x2803,
GUEST_IA32_PAT  = 0x2804,
GUEST_IA32_PAT_HIGH = 0x2805,
+   GUEST_IA32_EFER = 0x2806,
+   GUEST_IA32_EFER_HIGH= 0x2807,
GUEST_PDPTR0= 0x280a,
GUEST_PDPTR0_HIGH   = 0x280b,
GUEST_PDPTR1= 0x280c,
@@ -132,6 +134,8 @@ enum vmcs_field {
GUEST_PDPTR3_HIGH   = 0x2811,
HOST_IA32_PAT   = 0x2c00,
HOST_IA32_PAT_HIGH  = 0x2c01,
+   HOST_IA32_EFER  = 0x2c02,
+   HOST_IA32_EFER_HIGH = 0x2c03,
PIN_BASED_VM_EXEC_CONTROL   = 0x4000,
CPU_BASED_VM_EXEC_CONTROL   = 0x4002,
EXCEPTION_BITMAP= 0x4004,
-- 
1.7.0.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/5] KVM: VMX: Add facility to atomically switch MSRs on guest entry/exit

2010-04-28 Thread Avi Kivity
Some guest msr values cannot be used on the host (for example. EFER.NX=0),
so we need to switch them atomically during guest entry or exit.

Add a facility to program the vmx msr autoload registers accordingly.

Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/vmx.c |   49 +
 1 files changed, 49 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 2e87296..ae22dcf 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -98,6 +98,8 @@ module_param(ple_gap, int, S_IRUGO);
 static int ple_window = KVM_VMX_DEFAULT_PLE_WINDOW;
 module_param(ple_window, int, S_IRUGO);
 
+#define NR_AUTOLOAD_MSRS 1
+
 struct vmcs {
u32 revision_id;
u32 abort;
@@ -125,6 +127,11 @@ struct vcpu_vmx {
u64   msr_guest_kernel_gs_base;
 #endif
struct vmcs  *vmcs;
+   struct msr_autoload {
+   unsigned nr;
+   struct vmx_msr_entry guest[NR_AUTOLOAD_MSRS];
+   struct vmx_msr_entry host[NR_AUTOLOAD_MSRS];
+   } msr_autoload;
struct {
int   loaded;
u16   fs_sel, gs_sel, ldt_sel;
@@ -595,6 +602,46 @@ static void update_exception_bitmap(struct kvm_vcpu *vcpu)
vmcs_write32(EXCEPTION_BITMAP, eb);
 }
 
+static void clear_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr)
+{
+   unsigned i;
+   struct msr_autoload *m = &vmx->msr_autoload;
+
+   for (i = 0; i < m->nr; ++i)
+   if (m->guest[i].index == msr)
+   break;
+
+   if (i == m->nr)
+   return;
+   --m->nr;
+   m->guest[i] = m->guest[m->nr];
+   m->host[i] = m->host[m->nr];
+   vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, m->nr);
+   vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, m->nr);
+}
+
+static void add_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr,
+ u64 guest_val, u64 host_val)
+{
+   unsigned i;
+   struct msr_autoload *m = &vmx->msr_autoload;
+
+   for (i = 0; i < m->nr; ++i)
+   if (m->guest[i].index == msr)
+   break;
+
+   if (i == m->nr) {
+   ++m->nr;
+   vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, m->nr);
+   vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, m->nr);
+   }
+
+   m->guest[i].index = msr;
+   m->guest[i].value = guest_val;
+   m->host[i].index = msr;
+   m->host[i].value = host_val;
+}
+
 static void reload_tss(void)
 {
/*
@@ -2470,7 +2517,9 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
vmcs_writel(HOST_RIP, kvm_vmx_return); /* 22.2.5 */
vmcs_write32(VM_EXIT_MSR_STORE_COUNT, 0);
vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, 0);
+   vmcs_write64(VM_EXIT_MSR_LOAD_ADDR, __pa(vmx->msr_autoload.host));
vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, 0);
+   vmcs_write64(VM_ENTRY_MSR_LOAD_ADDR, __pa(vmx->msr_autoload.guest));
 
rdmsr(MSR_IA32_SYSENTER_CS, host_sysenter_cs, junk);
vmcs_write32(HOST_IA32_SYSENTER_CS, host_sysenter_cs);
-- 
1.7.0.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/9] KVM test: Log the content from guest serial console

2010-04-28 Thread Michael Goldish
On 04/26/2010 01:04 PM, Jason Wang wrote:
> This patch tries to get the content of guest serial and log it into
> the debug directoy of the testcase through a dedicated thread which is
> created in the preprocessing and ended in the postprocessing. The
> params of serial_mode must be set to "dump" in order to make use of
> this feature.
> 
> Signed-off-by: Jason Wang 
> ---
>  client/tests/kvm/kvm_preprocessing.py  |   59 
> +++-
>  client/tests/kvm/tests_base.cfg.sample |1 +
>  2 files changed, 59 insertions(+), 1 deletions(-)
> 
> diff --git a/client/tests/kvm/kvm_preprocessing.py 
> b/client/tests/kvm/kvm_preprocessing.py
> index 4b9290c..50d0e35 100644
> --- a/client/tests/kvm/kvm_preprocessing.py
> +++ b/client/tests/kvm/kvm_preprocessing.py
> @@ -1,4 +1,5 @@
>  import sys, os, time, commands, re, logging, signal, glob, threading, shutil
> +import socket, select
>  from autotest_lib.client.bin import test, utils
>  from autotest_lib.client.common_lib import error
>  import kvm_vm, kvm_utils, kvm_subprocess, ppm_utils
> @@ -13,7 +14,8 @@ except ImportError:
>  
>  _screendump_thread = None
>  _screendump_thread_termination_event = None
> -
> +_serialdump_thread = None
> +_serialdump_thread_termination_event = None
>  
>  def preprocess_image(test, params):
>  """
> @@ -267,6 +269,16 @@ def preprocess(test, params, env):
>args=(test, params, env))
>  _screendump_thread.start()
>  
> +# Start the serial dump thread
> +if params.get("serial_mode") == "dump":
> +logging.debug("Starting serialdump thread")
> +global _serialdump_thread, _serialdump_thread_termination_event
> +_serialdump_thread_termination_event = threading.Event()
> +_serialdump_thread = threading.Thread(target=_dump_serial_console,
> +  args=(test, params, env))
> +_serialdump_thread.start()
> +
> +
>  
>  def postprocess(test, params, env):
>  """
> @@ -286,6 +298,13 @@ def postprocess(test, params, env):
>  _screendump_thread_termination_event.set()
>  _screendump_thread.join(10)
>  
> +# Terminate the serialdump thread
> +global _serialdump_thread, _serialdump_thread_termination_event
> +if _serialdump_thread:
> +logging.debug("Terminating serialdump thread...")
> +_serialdump_thread_termination_event.set()
> +_serialdump_thread.join(10)
> +
>  # Warn about corrupt PPM files
>  for f in glob.glob(os.path.join(test.debugdir, "*.ppm")):
>  if not ppm_utils.image_verify_ppm_file(f):
> @@ -450,3 +469,41 @@ def _take_screendumps(test, params, env):
>  if _screendump_thread_termination_event.isSet():
>  break
>  _screendump_thread_termination_event.wait(delay)
> +
> +def _dump_serial_console(test, params, env):
> +global _serialdump_thread_termination_event
> +rs = []
> +files = {}
> +
> +while True:
> +for vm in kvm_utils.env_get_all_vms(env):
> +if not files.has_key(vm):

You should probably add "and not vm.is_dead()" to this condition.
Otherwise we'll get lots of "could not connect to serial socket"
messages for dead VMs.

Style note: AFAIK in the Autotest project the form "if not vm in files"
is preferred.

> +try:
> +serial_socket = socket.socket(socket.AF_UNIX,
> +  socket.SOCK_STREAM)
> +serial_socket.setblocking(False)
> +serial_socket.connect(vm.serial_file_name)
> +except:
> +logging.debug("Could not connect to serial socket for 
> %s" %
> +  vm.name)
> +continue
> +rs.append(serial_socket)
> +serial_dump_filename = os.path.join(test.debugdir,
> +"serial-%s" % vm.name)
> +files[vm] = [serial_socket, file(serial_dump_filename, "a+")]
> +
> +r, w, x = select.select(rs, [], [], 0.5)
> +for vm in files.keys():
> +[s ,d] = files[vm]

For consistency, please consider changing this list to a tuple, i.e.
s, d = files[vm] or (s, d) = files[vm].

> +if s in r:
> +data = s.recv(16384)
> +if len(data) == 0:

Style note: AFAIK the preferred form is "if not data".

Sorry for the petty comments.  Overall the patch looks good.

> +rs.remove(s)
> +files.pop(vm)
> +else:
> +d.write(data)
> +
> +if _serialdump_thread_termination_event.isSet():
> +break
> +
> +
> diff --git a/client/tests/kvm/tests_base.cfg.sample 
> b/client/tests/kvm/tests_base.cfg.sample
> index 9f82ffb..169a69e 100644
> --- a/client/tests/kvm/tests_base.cfg.sample
> +++

Re: [PATCH 08/23] KVM: x86 emulator: cleanup some direct calls into kvm to use existing callbacks

2010-04-28 Thread Avi Kivity

On 04/28/2010 12:33 PM, Gleb Natapov wrote:

On Wed, Apr 28, 2010 at 11:59:54AM +0300, Avi Kivity wrote:
   

On 04/27/2010 03:15 PM, Gleb Natapov wrote:
 

Use callbacks from x86_emulate_ops to access segments instead of calling
into kvm directly.



-static unsigned long seg_base(struct x86_emulate_ctxt *ctxt, int seg)
+static unsigned long seg_base(struct x86_emulate_ctxt *ctxt,
+ struct x86_emulate_ops *ops, int seg)
  {
-   if (ctxt->mode == X86EMUL_MODE_PROT64&&   seg<   VCPU_SREG_FS)
-   return 0;
+   unsigned long base;

-   return kvm_x86_ops->get_segment_base(ctxt->vcpu, seg);
   

get_segment_base() is only one vmread on intel, but you replace it
with reading the entire segment.

 

Didn't what to have separate x86_emulate_ops callback for reading
segment base. If this is serious performance concern it can be added
   


It will definitely hurt nested vmx.  For ordinary vmx perhaps not so 
much, but better not to regress.



+   if (ctxt->mode == X86EMUL_MODE_PROT64) {
+   u64 val;
+   switch (seg) {
+   case VCPU_SREG_FS:
+   ops->get_msr(ctxt->vcpu, MSR_FS_BASE,&val);
+   break;
+   case VCPU_SREG_GS:
+   ops->get_msr(ctxt->vcpu, MSR_GS_BASE,&val);
+   break;
+   default:
+   val = 0;
+   break;
+   }
   

Why this ugliness?  get_cached_descriptor() should do this.

 

get_cached_descriptor() returns struct desc_struct which is 32 bit only.
There is not 64bit segment descriptors.
   


Ah.  Well either extend desc_struct, or return something which works 
everywhere.



  static unsigned long seg_override_base(struct x86_emulate_ctxt *ctxt,
+  struct x86_emulate_ops *ops,
   struct decode_cache *c)
  {
if (!c->has_seg_override)
return 0;

-   return seg_base(ctxt, c->seg_override);
+   return seg_base(ctxt, ops, c->seg_override);
  }
   

Sticking ops into ctxt would reduce the size of these patches.

 

But it will introduce intermediate huge patch. But I agree that this
should be done eventually. Planned to do it somewhere at the end. Near
the patch that change x86_emulate_ops callbacks to get ctxt instead of
vcpu as a parameter.
   


Would have been better up front IMO, but don't destroy your patchset 
just for this.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 9/9] KVM test: Redirect the console to serial for all linux guests

2010-04-28 Thread Michael Goldish
On 04/26/2010 01:04 PM, Jason Wang wrote:
> As we have the ability to dump the content from serial console or use
> a session through it, we need to redirect the console to serial
> through unattended files to make use of it.
> 
> Signed-off-by: Jason Wang 
> ---
>  client/tests/kvm/unattended/Fedora-10.ks |2 +-
>  client/tests/kvm/unattended/Fedora-11.ks |2 +-
>  client/tests/kvm/unattended/Fedora-12.ks |2 +-
>  client/tests/kvm/unattended/Fedora-8.ks  |2 +-
>  client/tests/kvm/unattended/Fedora-9.ks  |2 +-
>  client/tests/kvm/unattended/OpenSUSE-11.xml  |1 +
>  client/tests/kvm/unattended/RHEL-3-series.ks |2 +-
>  client/tests/kvm/unattended/RHEL-4-series.ks |2 +-
>  client/tests/kvm/unattended/RHEL-5-series.ks |2 +-
>  client/tests/kvm/unattended/SLES-11.xml  |1 +
>  10 files changed, 10 insertions(+), 8 deletions(-)
> 
> diff --git a/client/tests/kvm/unattended/Fedora-10.ks 
> b/client/tests/kvm/unattended/Fedora-10.ks
> index 61e59d7..8628036 100644
> --- a/client/tests/kvm/unattended/Fedora-10.ks
> +++ b/client/tests/kvm/unattended/Fedora-10.ks
> @@ -11,7 +11,7 @@ firewall --enabled --ssh
>  selinux --enforcing
>  timezone --utc America/New_York
>  firstboot --disable
> -bootloader --location=mbr
> +bootloader --location=mbr --append="console=ttyS0,115200"

I see no reason to completely silence the regular console.  We can let
some output go to the regular console as well as the serial one by using

--append="console=tty0 console=ttyS0"

>  zerombr
>  clearpart --all --initlabel
>  autopart
> diff --git a/client/tests/kvm/unattended/Fedora-11.ks 
> b/client/tests/kvm/unattended/Fedora-11.ks
> index 0be7d06..5ce8ee2 100644
> --- a/client/tests/kvm/unattended/Fedora-11.ks
> +++ b/client/tests/kvm/unattended/Fedora-11.ks
> @@ -10,7 +10,7 @@ firewall --enabled --ssh
>  selinux --enforcing
>  timezone --utc America/New_York
>  firstboot --disable
> -bootloader --location=mbr
> +bootloader --location=mbr --append="console=ttyS0,115200"
>  zerombr
>  
>  clearpart --all --initlabel
> diff --git a/client/tests/kvm/unattended/Fedora-12.ks 
> b/client/tests/kvm/unattended/Fedora-12.ks
> index 0be7d06..5ce8ee2 100644
> --- a/client/tests/kvm/unattended/Fedora-12.ks
> +++ b/client/tests/kvm/unattended/Fedora-12.ks
> @@ -10,7 +10,7 @@ firewall --enabled --ssh
>  selinux --enforcing
>  timezone --utc America/New_York
>  firstboot --disable
> -bootloader --location=mbr
> +bootloader --location=mbr --append="console=ttyS0,115200"
>  zerombr
>  
>  clearpart --all --initlabel
> diff --git a/client/tests/kvm/unattended/Fedora-8.ks 
> b/client/tests/kvm/unattended/Fedora-8.ks
> index f4a872d..884d556 100644
> --- a/client/tests/kvm/unattended/Fedora-8.ks
> +++ b/client/tests/kvm/unattended/Fedora-8.ks
> @@ -11,7 +11,7 @@ firewall --enabled --ssh
>  selinux --enforcing
>  timezone --utc America/New_York
>  firstboot --disable
> -bootloader --location=mbr
> +bootloader --location=mbr --append="console=ttyS0,115200"
>  zerombr
>  clearpart --all --initlabel
>  autopart
> diff --git a/client/tests/kvm/unattended/Fedora-9.ks 
> b/client/tests/kvm/unattended/Fedora-9.ks
> index f4a872d..884d556 100644
> --- a/client/tests/kvm/unattended/Fedora-9.ks
> +++ b/client/tests/kvm/unattended/Fedora-9.ks
> @@ -11,7 +11,7 @@ firewall --enabled --ssh
>  selinux --enforcing
>  timezone --utc America/New_York
>  firstboot --disable
> -bootloader --location=mbr
> +bootloader --location=mbr --append="console=ttyS0,115200"
>  zerombr
>  clearpart --all --initlabel
>  autopart
> diff --git a/client/tests/kvm/unattended/OpenSUSE-11.xml 
> b/client/tests/kvm/unattended/OpenSUSE-11.xml
> index 7dd44fa..2a4fec0 100644
> --- a/client/tests/kvm/unattended/OpenSUSE-11.xml
> +++ b/client/tests/kvm/unattended/OpenSUSE-11.xml
> @@ -50,6 +50,7 @@
>  edd
>
>  
> +console=ttyS0,115200
>  grub
>  
>
> diff --git a/client/tests/kvm/unattended/RHEL-3-series.ks 
> b/client/tests/kvm/unattended/RHEL-3-series.ks
> index 884b386..26b1130 100644
> --- a/client/tests/kvm/unattended/RHEL-3-series.ks
> +++ b/client/tests/kvm/unattended/RHEL-3-series.ks
> @@ -10,7 +10,7 @@ rootpw 123456
>  firewall --enabled --ssh
>  timezone America/New_York
>  firstboot --disable
> -bootloader --location=mbr
> +bootloader --location=mbr --append="console=ttyS0,115200"
>  clearpart --all --initlabel
>  autopart
>  reboot
> diff --git a/client/tests/kvm/unattended/RHEL-4-series.ks 
> b/client/tests/kvm/unattended/RHEL-4-series.ks
> index ce4a430..f2f934f 100644
> --- a/client/tests/kvm/unattended/RHEL-4-series.ks
> +++ b/client/tests/kvm/unattended/RHEL-4-series.ks
> @@ -11,7 +11,7 @@ firewall --enabled --ssh
>  selinux --enforcing
>  timezone --utc America/New_York
>  firstboot --disable
> -bootloader --location=mbr
> +bootloader --location=mbr --append="console=ttyS0,115200"
>  zerombr
>  clearpart --all --initlabel
>  autopart
> diff --git a/client/tests/kvm/unattended/RHEL-5-serie

Re: [PATCH] kvm mmu: reduce 50% memory usage

2010-04-28 Thread Avi Kivity

On 04/28/2010 02:57 PM, Lai Jiangshan wrote:

I think users will enable tdp when their hardwares support ept or npt.
This patch can reduce about 50% kvm mmu memory usage for they.

   


Good one!

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 7/9] KVM test: Introduce the local_login()

2010-04-28 Thread Michael Goldish
On 04/26/2010 01:04 PM, Jason Wang wrote:
> This patch introduces a new method which is used to log into the guest
> through the guest serial console. The serial_mode must be set to
> "session" in order to make use of this patch.

In what cases would we want to use this feature?  The serial console is
not supported by all guests and I'm not sure it supports multiple
concurrent sessions (does it?), so it's probably not possible to use it
reliably as a replacement for the regular remote shell servers, or even
as an alternative variant.

> Signed-off-by: Jason Wang 
> ---
>  client/tests/kvm/kvm_vm.py |   25 +
>  1 files changed, 25 insertions(+), 0 deletions(-)
> 
> diff --git a/client/tests/kvm/kvm_vm.py b/client/tests/kvm/kvm_vm.py
> index 0cdf925..a22893b 100755
> --- a/client/tests/kvm/kvm_vm.py
> +++ b/client/tests/kvm/kvm_vm.py
> @@ -814,7 +814,32 @@ class VM:
>  "command", ""))
>  return session
>  
> +def local_login(self, timeout=240):
> +"""
> +Log into the guest via serial console
> +If timeout expires while waiting for output from the guest (e.g. a
> +password prompt or a shell prompt) -- fail.
> +"""
> +
> +serial_mode = self.params.get("serial_mode")
> +username = self.params.get("username", "")
> +password = self.params.get("password", "")
> +prompt = self.params.get("shell_prompt", "[\#\$]")
> +linesep = eval("'%s'" % self.params.get("shell_linesep", r"\n"))
>  
> +if serial_mode != "session":
> +logging.debug("serial_mode is not session")
> +return None
> +else:
> +command = "nc -U %s"  % self.serial_file_name
> +assist = self.params.get("prompt_assist")
> +session = kvm_utils.remote_login(command, password, prompt, 
> linesep,
> + timeout, "", username)
 ^
You probably meant to pass the prompt assist string to remote_login()
but instead you're passing "".

> +if session:
> +
> session.set_status_test_command(self.params.get("status_test_"
> +"command", 
> ""))
> +return session
> +
>  def copy_files_to(self, local_path, remote_path, nic_index=0, 
> timeout=300):
>  """
>  Transfer files to the guest.
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Autotest] [PATCH 8/9] KVM test: Create the background threads before calling process()

2010-04-28 Thread Michael Goldish
On 04/26/2010 01:04 PM, Jason Wang wrote:
> If the screendump and scrialdump threads are created after the
> process(), we may lose the progress tracking of guest shutting
> down. So this patch creates them before calling process() in preprocess.
> 
> Signed-off-by: Jason Wang 
> ---
>  client/tests/kvm/kvm_preprocessing.py |5 ++---
>  1 files changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/client/tests/kvm/kvm_preprocessing.py 
> b/client/tests/kvm/kvm_preprocessing.py
> index 50d0e35..73e835a 100644
> --- a/client/tests/kvm/kvm_preprocessing.py
> +++ b/client/tests/kvm/kvm_preprocessing.py
> @@ -257,9 +257,6 @@ def preprocess(test, params, env):
>  int(params.get("pre_command_timeout", "600")),
>  params.get("pre_command_noncritical") == "yes")
>  
> -# Preprocess all VMs and images
> -process(test, params, env, preprocess_image, preprocess_vm)
> -
>  # Start the screendump thread
>  if params.get("take_regular_screendumps") == "yes":
>  logging.debug("Starting screendump thread")
> @@ -278,6 +275,8 @@ def preprocess(test, params, env):
>args=(test, params, env))
>  _serialdump_thread.start()
>  
> +# Preprocess all VMs and images
> +process(test, params, env, preprocess_image, preprocess_vm)
>  
>  
>  def postprocess(test, params, env):

The initial shutdown procedure is carried out automatically by the
preprocessor in order to prepare the VMs for the current test, and is
not part of the test.  During the procedure VMs from a previous test are
shut down and/or restarted.  I think it'll be confusing (or at least
irrelevant) for the user to see a Fedora guest shutting down at the
beginning of a Windows test.  Also, it will be inconsistent with the
pre_*.ppm screendumps which are taken after the new VMs are started.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] kvm mmu: reduce 50% memory usage

2010-04-28 Thread Lai Jiangshan

I think users will enable tdp when their hardwares support ept or npt.
This patch can reduce about 50% kvm mmu memory usage for they.

This simple patch use the fact that:

When sp->role.direct is set, sp->gfns does not contain any essential
information, leaf sptes reachable from this sp are for a continuate
guest physical memory range(a linear range).
So sp->gfns[i](if it was set) equals to sp->gfn + i. (PT_PAGE_TABLE_LEVEL)
Obviously, it is not essential information, we can calculate it when need.

It means we don't need sp->gfns when sp->role.direct=1,
Thus we can save one page usage for every kvm_mmu_page.

Note:
Access to sp->gfns must be wrapped by kvm_mmu_page_get_gfn()
or kvm_mmu_page_set_gfn().
It is only exposed in FNAME(sync_page).

Signed-off-by: Lai Jiangshan 
---
 mmu.c |   38 +-
 paging_tmpl.h |3 +++
 2 files changed, 32 insertions(+), 9 deletions(-)
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index ddfa865..a5c6719 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -393,6 +393,22 @@ static void mmu_free_rmap_desc(struct kvm_rmap_desc *rd)
kfree(rd);
 }
 
+static gfn_t kvm_mmu_page_get_gfn(struct kvm_mmu_page *sp, int index)
+{
+   if (!sp->role.direct)
+   return sp->gfns[index];
+
+   return sp->gfn + (index << ((sp->role.level - 1) * PT64_LEVEL_BITS));
+}
+
+static void kvm_mmu_page_set_gfn(struct kvm_mmu_page *sp, int index, gfn_t gfn)
+{
+   if (sp->role.direct)
+   BUG_ON(gfn != kvm_mmu_page_get_gfn(sp, index));
+   else
+   sp->gfns[index] = gfn;
+}
+
 /*
  * Return the pointer to the largepage write count for a given
  * gfn, handling slots that are not large page aligned.
@@ -543,7 +559,7 @@ static int rmap_add(struct kvm_vcpu *vcpu, u64 *spte, gfn_t 
gfn)
return count;
gfn = unalias_gfn(vcpu->kvm, gfn);
sp = page_header(__pa(spte));
-   sp->gfns[spte - sp->spt] = gfn;
+   kvm_mmu_page_set_gfn(sp, spte - sp->spt, gfn);
rmapp = gfn_to_rmap(vcpu->kvm, gfn, sp->role.level);
if (!*rmapp) {
rmap_printk("rmap_add: %p %llx 0->1\n", spte, *spte);
@@ -601,6 +617,7 @@ static void rmap_remove(struct kvm *kvm, u64 *spte)
struct kvm_rmap_desc *prev_desc;
struct kvm_mmu_page *sp;
pfn_t pfn;
+   gfn_t gfn;
unsigned long *rmapp;
int i;
 
@@ -612,7 +629,8 @@ static void rmap_remove(struct kvm *kvm, u64 *spte)
kvm_set_pfn_accessed(pfn);
if (is_writable_pte(*spte))
kvm_set_pfn_dirty(pfn);
-   rmapp = gfn_to_rmap(kvm, sp->gfns[spte - sp->spt], sp->role.level);
+   gfn = kvm_mmu_page_get_gfn(sp, spte - sp->spt);
+   rmapp = gfn_to_rmap(kvm, gfn, sp->role.level);
if (!*rmapp) {
printk(KERN_ERR "rmap_remove: %p %llx 0->BUG\n", spte, *spte);
BUG();
@@ -896,7 +914,8 @@ static void kvm_mmu_free_page(struct kvm *kvm, struct 
kvm_mmu_page *sp)
ASSERT(is_empty_shadow_page(sp->spt));
list_del(&sp->link);
__free_page(virt_to_page(sp->spt));
-   __free_page(virt_to_page(sp->gfns));
+   if (!sp->role.direct)
+   __free_page(virt_to_page(sp->gfns));
kfree(sp);
++kvm->arch.n_free_mmu_pages;
 }
@@ -907,13 +926,15 @@ static unsigned kvm_page_table_hashfn(gfn_t gfn)
 }
 
 static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu,
-  u64 *parent_pte)
+  u64 *parent_pte, int direct)
 {
struct kvm_mmu_page *sp;
 
sp = mmu_memory_cache_alloc(&vcpu->arch.mmu_page_header_cache, sizeof 
*sp);
sp->spt = mmu_memory_cache_alloc(&vcpu->arch.mmu_page_cache, PAGE_SIZE);
-   sp->gfns = mmu_memory_cache_alloc(&vcpu->arch.mmu_page_cache, 
PAGE_SIZE);
+   if (!direct)
+   sp->gfns = mmu_memory_cache_alloc(&vcpu->arch.mmu_page_cache,
+ PAGE_SIZE);
set_page_private(virt_to_page(sp->spt), (unsigned long)sp);
list_add(&sp->link, &vcpu->kvm->arch.active_mmu_pages);
bitmap_zero(sp->slot_bitmap, KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS);
@@ -1347,7 +1368,7 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct 
kvm_vcpu *vcpu,
return sp;
}
++vcpu->kvm->stat.mmu_cache_miss;
-   sp = kvm_mmu_alloc_page(vcpu, parent_pte);
+   sp = kvm_mmu_alloc_page(vcpu, parent_pte, direct);
if (!sp)
return sp;
sp->gfn = gfn;
@@ -3314,7 +3335,7 @@ void inspect_spte_has_rmap(struct kvm *kvm, u64 *sptep)
 
if (*sptep & PT_WRITABLE_MASK) {
rev_sp = page_header(__pa(sptep));
-   gfn = rev_sp->gfns[sptep - rev_sp->spt];
+   gfn = kvm_mmu_page_get_gfn(rev_sp, sptep - rev_sp->spt);
 
if (!gfn_to_memslot(kvm, gfn)) {
 

Re: [PATCH 11/22] KVM: MMU: Add infrastructure for two-level page walker

2010-04-28 Thread Avi Kivity

On 04/28/2010 02:03 PM, Joerg Roedel wrote:

On Tue, Apr 27, 2010 at 03:34:10PM +0300, Avi Kivity wrote:
   

On 04/27/2010 01:38 PM, Joerg Roedel wrote:
 

This patch introduces a mmu-callback to translate gpa
addresses in the walk_addr code. This is later used to
translate l2_gpa addresses into l1_gpa addresses.

+static inline gfn_t gpa_to_gfn(gpa_t gpa)
+{
+   return (gfn_t)gpa>>   PAGE_SHIFT;
+}
+
   

This overflows on 32-bit, since gpa_t is u64 and gfn_t is ulong.
 

Thinking again about it, on 32 bit the physical address width is only 36
bits. So there shouldn't be an overflow, no?
   


It's limited by MAXPHYADDR (at least on Intel) even on 32-bits.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/9] KVM test: Add the ability to send the username in remote_login()

2010-04-28 Thread Michael Goldish
On 04/26/2010 01:03 PM, Jason Wang wrote:
> In order to let the serial console work, we must let the
> remote_login() send the username when needed.
> 
> Signed-off-by: Jason Wang 
> ---
>  client/tests/kvm/kvm_utils.py |   14 ++
>  1 files changed, 10 insertions(+), 4 deletions(-)
> 
> diff --git a/client/tests/kvm/kvm_utils.py b/client/tests/kvm/kvm_utils.py
> index 9adbaee..1ea0852 100644
> --- a/client/tests/kvm/kvm_utils.py
> +++ b/client/tests/kvm/kvm_utils.py
> @@ -452,7 +452,7 @@ def check_kvm_source_dir(source_dir):
>  # guests.
>  
>  def remote_login(command, password, prompt, linesep="\n", timeout=10,
> - prompt_assist = None):
> + prompt_assist = None, username = None):
 ^ ^  ^ ^
Same here: PEP 8 says no spaces around keyword arguments or default
values, and pretty much all of the KVM test code conforms with PEP 8, so
let's be consistent.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/9] KVM test: Introduce the prompt assist

2010-04-28 Thread Michael Goldish
On 04/26/2010 01:03 PM, Jason Wang wrote:
> Sometimes we need to send an assist string to a session in order to
> get the prompt especially when re-connecting to an already logged
> serial session. This patch send the assist string before doing the
> pattern matching of remote_login.

Can you give an example of a prompt assist string, and a typical usage
example?  What guests require prompt assist strings?

> Signed-off-by: Jason Wang 
> ---
>  client/tests/kvm/kvm_utils.py |9 +++--
>  1 files changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/client/tests/kvm/kvm_utils.py b/client/tests/kvm/kvm_utils.py
> index 25f3c8c..9adbaee 100644
> --- a/client/tests/kvm/kvm_utils.py
> +++ b/client/tests/kvm/kvm_utils.py
> @@ -451,7 +451,8 @@ def check_kvm_source_dir(source_dir):
>  # The following are functions used for SSH, SCP and Telnet communication with
>  # guests.
>  
> -def remote_login(command, password, prompt, linesep="\n", timeout=10):
> +def remote_login(command, password, prompt, linesep="\n", timeout=10,
> + prompt_assist = None):
 ^ ^
These spaces do not conform with PEP 8.

>  """
>  Log into a remote host (guest) using SSH or Telnet. Run the given command
>  using kvm_spawn and provide answers to the questions asked. If timeout
> @@ -468,7 +469,8 @@ def remote_login(command, password, prompt, linesep="\n", 
> timeout=10):
>  @param timeout: The maximal time duration (in seconds) to wait for each
>  step of the login procedure (i.e. the "Are you sure" prompt, the
>  password prompt, the shell prompt, etc)
> -
> +@prarm prompt_assist: An assistant string sent before the pattern

Typo^

> +matching in order to get the prompt for some kinds of 
> shell_client.
>  @return Return the kvm_spawn object on success and None on failure.
>  """
>  sub = kvm_subprocess.kvm_shell_session(command,
> @@ -479,6 +481,9 @@ def remote_login(command, password, prompt, linesep="\n", 
> timeout=10):
>  
>  logging.debug("Trying to login with command '%s'" % command)
>  
> +if prompt_assist is not None:
> +sub.sendline(prompt_assist)
> +
>  while True:
>  (match, text) = sub.read_until_last_line_matches(
>  [r"[Aa]re you sure", r"[Pp]assword:\s*$", 
> r"^\s*[Ll]ogin:\s*$",
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 11/22] KVM: MMU: Add infrastructure for two-level page walker

2010-04-28 Thread Avi Kivity

On 04/28/2010 01:52 PM, Joerg Roedel wrote:

On Tue, Apr 27, 2010 at 03:34:10PM +0300, Avi Kivity wrote:
   

On 04/27/2010 01:38 PM, Joerg Roedel wrote:
 

This patch introduces a mmu-callback to translate gpa
addresses in the walk_addr code. This is later used to
translate l2_gpa addresses into l1_gpa addresses.

+static inline gfn_t gpa_to_gfn(gpa_t gpa)
+{
+   return (gfn_t)gpa>>   PAGE_SHIFT;
+}
+
   

This overflows on 32-bit, since gpa_t is u64 and gfn_t is ulong.
 

Hm, this is a problem outside of this patchset too (for 32bit hosts).
The best solution is probably to convert gfn_t to u64 too.
   


If you cast like

   (gfn_t)(gpa >> PAGE_SHIFT)

you avoid the overflow for MAXPHYADDR < 48.  However, I agree that 
converting gfn_t to u64 is best, the minor performance degradation is in 
no way comparable to the corruption that results from a miscast.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/1] KVM: x86: avoid unnecessary bitmap allocation when memslot is clean

2010-04-28 Thread Avi Kivity

On 04/27/2010 04:46 PM, Takuya Yoshikawa wrote:

(2010/04/27 22:18), Avi Kivity wrote:


Furthermore, the reduced allocations seem to produce good effects for
other cases too. Actually, I observed that the time for the ioctl was
more stable than the original one and the average time for dirty slots
was also reduced by some extent.


Can you explain why the dirty slots were improved?


I cannot do exactly, but vmalloc() might affect the following 
allocations?


But actually, I could check how much one vmalloc() consumed our time,
and this will be completely removed by our "moving dirty bitmaps to 
user space".



Better to remove this explanation about dirty slots?


I guess so.



BTW, maybe good news for us!

1. I found one place in which set_bit_to_user() is locally defined using
current helpers.


Interesting, where?

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/9] KVM test: Make the login re suitable for serial console

2010-04-28 Thread Michael Goldish
On 04/26/2010 01:03 PM, Jason Wang wrote:
> Current matching re ^\s*[Ll]ogin:\s*$ is not suitable for the serial
> console, so change it to [Ll]ogin:.
> 
> Signed-off-by: Jason Wang 
> ---
>  client/tests/kvm/kvm_utils.py |2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/client/tests/kvm/kvm_utils.py b/client/tests/kvm/kvm_utils.py
> index 1ea0852..bb42314 100644
> --- a/client/tests/kvm/kvm_utils.py
> +++ b/client/tests/kvm/kvm_utils.py
> @@ -488,7 +488,7 @@ def remote_login(command, password, prompt, linesep="\n", 
> timeout=10,
>  
>  while True:
>  (match, text) = sub.read_until_last_line_matches(
> -[r"[Aa]re you sure", r"[Pp]assword:\s*$", 
> r"^\s*[Ll]ogin:\s*$",
> +[r"[Aa]re you sure", r"[Pp]assword:\s*$", r"[Ll]ogin:",

This RE will also match

"Last login: Wed Apr 28 14:07:28 2010 from localhost.localdomain"

so we should probably settle for [Ll]ogin:\s*$.

>   r"[Cc]onnection.*closed", r"[Cc]onnection.*refused",
>   r"[Pp]lease wait", prompt],
>   timeout=timeout, internal_timeout=0.5)
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 11/22] KVM: MMU: Add infrastructure for two-level page walker

2010-04-28 Thread Joerg Roedel
On Tue, Apr 27, 2010 at 03:34:10PM +0300, Avi Kivity wrote:
> On 04/27/2010 01:38 PM, Joerg Roedel wrote:
> >This patch introduces a mmu-callback to translate gpa
> >addresses in the walk_addr code. This is later used to
> >translate l2_gpa addresses into l1_gpa addresses.
> >
> >+static inline gfn_t gpa_to_gfn(gpa_t gpa)
> >+{
> >+return (gfn_t)gpa>>  PAGE_SHIFT;
> >+}
> >+
> 
> This overflows on 32-bit, since gpa_t is u64 and gfn_t is ulong.

Thinking again about it, on 32 bit the physical address width is only 36
bits. So there shouldn't be an overflow, no?

Joerg


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 11/22] KVM: MMU: Add infrastructure for two-level page walker

2010-04-28 Thread Joerg Roedel
On Tue, Apr 27, 2010 at 03:34:10PM +0300, Avi Kivity wrote:
> On 04/27/2010 01:38 PM, Joerg Roedel wrote:
> >This patch introduces a mmu-callback to translate gpa
> >addresses in the walk_addr code. This is later used to
> >translate l2_gpa addresses into l1_gpa addresses.
> >
> >+static inline gfn_t gpa_to_gfn(gpa_t gpa)
> >+{
> >+return (gfn_t)gpa>>  PAGE_SHIFT;
> >+}
> >+
> 
> This overflows on 32-bit, since gpa_t is u64 and gfn_t is ulong.

Hm, this is a problem outside of this patchset too (for 32bit hosts).
The best solution is probably to convert gfn_t to u64 too.

Joerg


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [SeaBIOS] About cpu_set, CPU hotplug and related subjects

2010-04-28 Thread Gleb Natapov
On Wed, Apr 28, 2010 at 12:41:51PM +0200, Jes Sorensen wrote:
> On 04/28/10 12:30, Gleb Natapov wrote:
> > On Wed, Apr 28, 2010 at 11:31:00AM +0200, Jes Sorensen wrote:
> >> On 04/22/10 03:12, Kevin O'Connor wrote:
> >> Generating the more complex tables dynamically would be preferred, but
> >> it requires like half an AML compiler in Seabios, so it kinda stalled
> >> there
> >>
> > We can try to be smart and generate most of the code statically and only
> > minimum that absolutely required dynamically. Haven't looked at how
> > simple dynamic part can be made.
> 
> I looked at it briefly and ran away screaming :) I am sure it can be
This is normal reaction to ACPI of a healthy human being.

> done, bit it would require pretty good understanding of the AML encodings.
> 
> The CPU declarations are particularly tricky as they get pretty big and
> complex and need to live in the DSDT, whereas a lot of other things we
> can shift off to separate SSDT tables and only put the minimum that
> needs to be generated dynamically in it's own table.
> 
We can generate complex code statically and call it from dynamically
generated CPU declarations.

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [SeaBIOS] About cpu_set, CPU hotplug and related subjects

2010-04-28 Thread Jes Sorensen
On 04/28/10 12:30, Gleb Natapov wrote:
> On Wed, Apr 28, 2010 at 11:31:00AM +0200, Jes Sorensen wrote:
>> On 04/22/10 03:12, Kevin O'Connor wrote:
>> Generating the more complex tables dynamically would be preferred, but
>> it requires like half an AML compiler in Seabios, so it kinda stalled
>> there
>>
> We can try to be smart and generate most of the code statically and only
> minimum that absolutely required dynamically. Haven't looked at how
> simple dynamic part can be made.

I looked at it briefly and ran away screaming :) I am sure it can be
done, bit it would require pretty good understanding of the AML encodings.

The CPU declarations are particularly tricky as they get pretty big and
complex and need to live in the DSDT, whereas a lot of other things we
can shift off to separate SSDT tables and only put the minimum that
needs to be generated dynamically in it's own table.

Cheers,
Jes
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 0/10] KVM MMU: allow more shadow pages become asynchronous

2010-04-28 Thread Avi Kivity

On 04/28/2010 06:54 AM, Xiao Guangrong wrote:

Changlog v3:
   


Reviewed-by: Avi Kivity 

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [SeaBIOS] About cpu_set, CPU hotplug and related subjects

2010-04-28 Thread Gleb Natapov
On Wed, Apr 28, 2010 at 11:31:00AM +0200, Jes Sorensen wrote:
> On 04/22/10 03:12, Kevin O'Connor wrote:
> > As I understand it, the hotplug support was only in the kvm copy of
> > bochs bios.  It also limited the number of cpus one could use (I think
> > 16).
> > 
> > The current smp support in SeaBIOS doesn't limit the number of cpus.
> > 
> > So, there has been reluctance to just port the old kvm bios code
> > forward.
> 
> I believe the number was limited to 15, due to limits in a certain
> proprietary operating system.
> 
> The issue with the AML code in BOCHS vs the code in Seabios is a bit
> tricky. The tables in BOCHS are a fair bit more complicated with the
> hotplug code and therefore statically generated, whereas Seabios
> generates them dynamically at boot time.
> 
> Generating the more complex tables dynamically would be preferred, but
> it requires like half an AML compiler in Seabios, so it kinda stalled
> there
> 
We can try to be smart and generate most of the code statically and only
minimum that absolutely required dynamically. Haven't looked at how
simple dynamic part can be made.

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/1] KVM: x86: avoid unnecessary bitmap allocation when memslot is clean

2010-04-28 Thread Avi Kivity

On 04/28/2010 07:17 AM, Takuya Yoshikawa wrote:

(2010/04/27 22:46), Takuya Yoshikawa wrote:

(2010/04/27 22:18), Avi Kivity wrote:


Furthermore, the reduced allocations seem to produce good effects for
other cases too. Actually, I observed that the time for the ioctl was
more stable than the original one and the average time for dirty slots
was also reduced by some extent.


Can you explain why the dirty slots were improved?


I cannot do exactly, but vmalloc() might affect the following 
allocations?





Oh, this might be the effect of tlb flush or something?
 - I'm not so confident about mm, but vmalloc(),vfree() does this kind 
of thing?


Yes, vmalloc() does tlb flushes, perhaps that's the cause.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCHv2] qemu-kvm: fix crash on reboot with vhost-net

2010-04-28 Thread Michael S. Tsirkin
When vhost-net is disabled on reboot, we set msix mask notifier
to NULL to disable further mask/unmask notifications.
Code currently tries to pass this NULL to notifier,
leading to a crash.  The right thing to do is
to add explicit APIs to enable/disable notifications.
Now when disabling notifications:
- if vector is masked, we don't need to notify backend,
  just disable future notifications
- if vector is unmasked, invoke callback to unassign backend,
  then disable future notifications

This patch also polls notifier before closing it,
to make sure we don't lose events if poll callback
didn't have time to run.

Signed-off-by: Michael S. Tsirkin 
---

Changes from v1:
Separate APIs to set and unset notifiers
Test and clear notifier before destroying it

 hw/msix.c   |   40 +++-
 hw/msix.h   |1 +
 hw/virtio-pci.c |7 +--
 3 files changed, 41 insertions(+), 7 deletions(-)

diff --git a/hw/msix.c b/hw/msix.c
index 3ec8805..8f9a621 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -609,14 +609,44 @@ void msix_unuse_all_vectors(PCIDevice *dev)
 
 int msix_set_mask_notifier(PCIDevice *dev, unsigned vector, void *opaque)
 {
+int r;
+if (vector >= dev->msix_entries_nr || !dev->msix_entry_used[vector])
+return 0;
+
+assert(dev->msix_mask_notifier);
+assert(opaque);
+assert(!dev->msix_mask_notifier_opaque[vector]);
+
+if (msix_is_masked(dev, vector)) {
+return 0;
+}
+r = dev->msix_mask_notifier(dev, vector, opaque,
+msix_is_masked(dev, vector));
+if (r < 0) {
+return r;
+}
+dev->msix_mask_notifier_opaque[vector] = opaque;
+return r;
+}
+
+int msix_unset_mask_notifier(PCIDevice *dev, unsigned vector)
+{
 int r = 0;
 if (vector >= dev->msix_entries_nr || !dev->msix_entry_used[vector])
 return 0;
 
-if (dev->msix_mask_notifier)
-r = dev->msix_mask_notifier(dev, vector, opaque,
-msix_is_masked(dev, vector));
-if (r >= 0)
-dev->msix_mask_notifier_opaque[vector] = opaque;
+assert(dev->msix_mask_notifier);
+assert(dev->msix_mask_notifier_opaque[vector]);
+
+if (msix_is_masked(dev, vector)) {
+return 0;
+}
+r = dev->msix_mask_notifier(dev, vector,
+dev->msix_mask_notifier_opaque[vector],
+msix_is_masked(dev, vector));
+if (r < 0) {
+return r;
+}
+dev->msix_mask_notifier_opaque[vector] = NULL;
 return r;
 }
diff --git a/hw/msix.h b/hw/msix.h
index f167231..6b21ffb 100644
--- a/hw/msix.h
+++ b/hw/msix.h
@@ -34,4 +34,5 @@ void msix_reset(PCIDevice *dev);
 extern int msix_supported;
 
 int msix_set_mask_notifier(PCIDevice *dev, unsigned vector, void *opaque);
+int msix_unset_mask_notifier(PCIDevice *dev, unsigned vector);
 #endif
diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index 99a588c..c4bc633 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -462,10 +462,13 @@ static int virtio_pci_set_guest_notifier(void *opaque, 
int n, bool assign)
 msix_set_mask_notifier(&proxy->pci_dev,
virtio_queue_vector(proxy->vdev, n), vq);
 } else {
-msix_set_mask_notifier(&proxy->pci_dev,
-   virtio_queue_vector(proxy->vdev, n), NULL);
+msix_unset_mask_notifier(&proxy->pci_dev,
+virtio_queue_vector(proxy->vdev, n));
 qemu_set_fd_handler(event_notifier_get_fd(notifier),
 NULL, NULL, NULL);
+/* Test and clear notifier before closing it,
+ * in case poll callback didn't have time to run. */
+virtio_pci_guest_notifier_read(vq);
 event_notifier_cleanup(notifier);
 }
 
-- 
1.7.1.rc1.22.g3163
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 08/23] KVM: x86 emulator: cleanup some direct calls into kvm to use existing callbacks

2010-04-28 Thread Gleb Natapov
On Wed, Apr 28, 2010 at 11:59:54AM +0300, Avi Kivity wrote:
> On 04/27/2010 03:15 PM, Gleb Natapov wrote:
> >Use callbacks from x86_emulate_ops to access segments instead of calling
> >into kvm directly.
> >
> >
> >
> >-static unsigned long seg_base(struct x86_emulate_ctxt *ctxt, int seg)
> >+static unsigned long seg_base(struct x86_emulate_ctxt *ctxt,
> >+  struct x86_emulate_ops *ops, int seg)
> >  {
> >-if (ctxt->mode == X86EMUL_MODE_PROT64&&  seg<  VCPU_SREG_FS)
> >-return 0;
> >+unsigned long base;
> >
> >-return kvm_x86_ops->get_segment_base(ctxt->vcpu, seg);
> 
> get_segment_base() is only one vmread on intel, but you replace it
> with reading the entire segment.
> 
Didn't what to have separate x86_emulate_ops callback for reading
segment base. If this is serious performance concern it can be added

> >+if (ctxt->mode == X86EMUL_MODE_PROT64) {
> >+u64 val;
> >+switch (seg) {
> >+case VCPU_SREG_FS:
> >+ops->get_msr(ctxt->vcpu, MSR_FS_BASE,&val);
> >+break;
> >+case VCPU_SREG_GS:
> >+ops->get_msr(ctxt->vcpu, MSR_GS_BASE,&val);
> >+break;
> >+default:
> >+val = 0;
> >+break;
> >+}
> 
> Why this ugliness?  get_cached_descriptor() should do this.
> 
get_cached_descriptor() returns struct desc_struct which is 32 bit only.
There is not 64bit segment descriptors.

> >
> >  static unsigned long seg_override_base(struct x86_emulate_ctxt *ctxt,
> >+   struct x86_emulate_ops *ops,
> >struct decode_cache *c)
> >  {
> > if (!c->has_seg_override)
> > return 0;
> >
> >-return seg_base(ctxt, c->seg_override);
> >+return seg_base(ctxt, ops, c->seg_override);
> >  }
> 
> Sticking ops into ctxt would reduce the size of these patches.
> 
But it will introduce intermediate huge patch. But I agree that this
should be done eventually. Planned to do it somewhere at the end. Near
the patch that change x86_emulate_ops callbacks to get ctxt instead of
vcpu as a parameter.

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


KVM Networking Issues

2010-04-28 Thread Anthony Davis

Hi,

Can anyone shed some light on a questions I have if possible?

I know KVM uses dnsmasq for DHCP and DNS, but is it possible to not  
use this and use another DHCP server to dish out IP addresses? Now if  
my understanding is right, there is just a NAT created to access the  
VM? How is this done?


The reason I ask is that I am trying to put together a lab that is  
running a kickstart server, which uses dhcp, so i would like my DHCP  
server to handle all networking, but I need to know how to create the  
rest of the setup so that i can assign a static IP and dhcp is only  
used for deployment / PXE.


I hope iv explained that so it makes sense?

Hope someone can help.

Kind Regards

Tony
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv2] qemu-kvm: fix crash on reboot with vhost-net

2010-04-28 Thread Juan Quintela
"Michael S. Tsirkin"  wrote:
> When vhost-net is disabled on reboot, we set msix mask notifier
> to NULL to disable further mask/unmask notifications.
> Code currently tries to pass this NULL to notifier,
> leading to a crash.  The right thing to do is
> to add explicit APIs to enable/disable notifications.
> Now when disabling notifications:
> - if vector is masked, we don't need to notify backend,
>   just disable future notifications
> - if vector is unmasked, invoke callback to unassign backend,
>   then disable future notifications
>
> This patch also polls notifier before closing it,
> to make sure we don't lose events if poll callback
> didn't have time to run.
>
> Signed-off-by: Michael S. Tsirkin 
> ---
>
> Changes from v1:
>   Separate APIs to set and unset notifiers
>   Test and clear notifier before destroying it
>
>  hw/msix.c   |   40 +++-
>  hw/msix.h   |1 +
>  hw/virtio-pci.c |7 +--
>  3 files changed, 41 insertions(+), 7 deletions(-)

Acked-by: Juan Quintela 

This patch addresses the issues that I pointed in previous mail and
fixes the problem at hand.

Creating a better API for interaction between virtio-pci <-> msix still
needs more work/thought, already discussed that with mst.

Later, Juan.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 15/23] KVM: do not inject #PF in (read|write)_emulated() callbacks

2010-04-28 Thread Avi Kivity

On 04/27/2010 03:15 PM, Gleb Natapov wrote:

Return error to x86 emulator instead of injection exception behind its back.

Signed-off-by: Gleb Natapov
---
  arch/x86/include/asm/kvm_emulate.h |3 +++
  arch/x86/kvm/emulate.c |   12 +++-
  arch/x86/kvm/x86.c |   28 ++--
  3 files changed, 28 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h 
b/arch/x86/include/asm/kvm_emulate.h
index ae4af86..b977ccf 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -94,6 +94,7 @@ struct x86_emulate_ops {
int (*read_emulated)(unsigned long addr,
 void *val,
 unsigned int bytes,
+unsigned int *error,
 struct kvm_vcpu *vcpu);

   


The fault may be at a different address than addr, if we cross a page 
boundary.  Need a struct here.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] KVM, Fix QEMU-KVM is killed by guest SRAO MCE

2010-04-28 Thread Avi Kivity

On 04/28/2010 05:56 AM, Huang Ying wrote:
   


Just want to use the side effect of copy_from_user, SIGBUS will be sent
to current process because the page touched is marked as poisoned. That
is, failure is expected, so the return value is not checked.

   

What if the failure doesn't happen?  Say, someone mmap()ed over the page.
 

Sorry, not get your idea clearly. hva is re-mmap()ed? We just read the
hva, not write, so I think it should be OK here.

   


We don't generate a signal in this case.  Does the code continue to work 
correctly (not sure what correctly is in this case... should probably 
just continue).


There's also the possibility of -EFAULT.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] KVM: x86: avoid unnecessary bitmap allocation when memslot is clean

2010-04-28 Thread Takuya Yoshikawa
Hi Marcelo, Avi,

I updated the patch as follows.

Changelog:
 1. Inserted one "r = -ENOMEM;" line following Avi's advice.
 2. Little change of explanation about performance improvements.

I'm now testing and cleaning up my next patch series based on this,
so please apply this if this makes sense and has no problems.

Thanks,
  Takuya

===

Although we always allocate a new dirty bitmap in x86's get_dirty_log(),
it is only used as a zero-source of copy_to_user() and freed right after
that when memslot is clean. This patch uses clear_user() instead of doing
this unnecessary zero-source allocation.

Performance improvement: as we can expect easily, the time needed to
allocate a bitmap is completely reduced. In my test, the improved ioctl
was about 4 to 10 times faster than the original one for clean slots.
Furthermore, reducing memory allocations and copies will produce good
effects to caches too.

Signed-off-by: Takuya Yoshikawa 
---
 arch/x86/kvm/x86.c |   37 +++--
 1 files changed, 23 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6b2ce1d..b95a211 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2744,7 +2744,6 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
struct kvm_memory_slot *memslot;
unsigned long n;
unsigned long is_dirty = 0;
-   unsigned long *dirty_bitmap = NULL;
 
mutex_lock(&kvm->slots_lock);
 
@@ -2759,27 +2758,30 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 
n = kvm_dirty_bitmap_bytes(memslot);
 
-   r = -ENOMEM;
-   dirty_bitmap = vmalloc(n);
-   if (!dirty_bitmap)
-   goto out;
-   memset(dirty_bitmap, 0, n);
-
for (i = 0; !is_dirty && i < n/sizeof(long); i++)
is_dirty = memslot->dirty_bitmap[i];
 
/* If nothing is dirty, don't bother messing with page tables. */
if (is_dirty) {
struct kvm_memslots *slots, *old_slots;
+   unsigned long *dirty_bitmap;
 
spin_lock(&kvm->mmu_lock);
kvm_mmu_slot_remove_write_access(kvm, log->slot);
spin_unlock(&kvm->mmu_lock);
 
-   slots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
-   if (!slots)
-   goto out_free;
+   r = -ENOMEM;
+   dirty_bitmap = vmalloc(n);
+   if (!dirty_bitmap)
+   goto out;
+   memset(dirty_bitmap, 0, n);
 
+   r = -ENOMEM;
+   slots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
+   if (!slots) {
+   vfree(dirty_bitmap);
+   goto out;
+   }
memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots));
slots->memslots[log->slot].dirty_bitmap = dirty_bitmap;
 
@@ -2788,13 +2790,20 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
synchronize_srcu_expedited(&kvm->srcu);
dirty_bitmap = old_slots->memslots[log->slot].dirty_bitmap;
kfree(old_slots);
+
+   r = -EFAULT;
+   if (copy_to_user(log->dirty_bitmap, dirty_bitmap, n)) {
+   vfree(dirty_bitmap);
+   goto out;
+   }
+   vfree(dirty_bitmap);
+   } else {
+   r = -EFAULT;
+   if (clear_user(log->dirty_bitmap, n))
+   goto out;
}
 
r = 0;
-   if (copy_to_user(log->dirty_bitmap, dirty_bitmap, n))
-   r = -EFAULT;
-out_free:
-   vfree(dirty_bitmap);
 out:
mutex_unlock(&kvm->slots_lock);
return r;
-- 
1.6.3.3

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [SeaBIOS] About cpu_set, CPU hotplug and related subjects

2010-04-28 Thread Jes Sorensen
On 04/22/10 03:12, Kevin O'Connor wrote:
> As I understand it, the hotplug support was only in the kvm copy of
> bochs bios.  It also limited the number of cpus one could use (I think
> 16).
> 
> The current smp support in SeaBIOS doesn't limit the number of cpus.
> 
> So, there has been reluctance to just port the old kvm bios code
> forward.

I believe the number was limited to 15, due to limits in a certain
proprietary operating system.

The issue with the AML code in BOCHS vs the code in Seabios is a bit
tricky. The tables in BOCHS are a fair bit more complicated with the
hotplug code and therefore statically generated, whereas Seabios
generates them dynamically at boot time.

Generating the more complex tables dynamically would be preferred, but
it requires like half an AML compiler in Seabios, so it kinda stalled
there

Cheers,
Jes
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Pulling regularly from QEMU git repository

2010-04-28 Thread Avi Kivity

On 04/28/2010 07:37 AM, Gerhard Wiesinger wrote:

Hello,

I saw that KVM git repository is not updated from QEMU repository very 
often. Is it possible to integrate QEMU git tree into KVM git 
regularly as qemu git tree is very dynamic. E.g. last commit was more 
than 2 weeks ago. I think it is necessary to integrate early and often 
as a lot of bugs are fixed.




We usually pull at least once a week, but sometimes we are lazy.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 15/23] KVM: do not inject #PF in (read|write)_emulated() callbacks

2010-04-28 Thread Gleb Natapov
On Wed, Apr 28, 2010 at 12:11:41PM +0300, Avi Kivity wrote:
> On 04/27/2010 03:15 PM, Gleb Natapov wrote:
> >Return error to x86 emulator instead of injection exception behind its back.
> >
> >Signed-off-by: Gleb Natapov
> >---
> >  arch/x86/include/asm/kvm_emulate.h |3 +++
> >  arch/x86/kvm/emulate.c |   12 +++-
> >  arch/x86/kvm/x86.c |   28 ++--
> >  3 files changed, 28 insertions(+), 15 deletions(-)
> >
> >diff --git a/arch/x86/include/asm/kvm_emulate.h 
> >b/arch/x86/include/asm/kvm_emulate.h
> >index ae4af86..b977ccf 100644
> >--- a/arch/x86/include/asm/kvm_emulate.h
> >+++ b/arch/x86/include/asm/kvm_emulate.h
> >@@ -94,6 +94,7 @@ struct x86_emulate_ops {
> > int (*read_emulated)(unsigned long addr,
> >  void *val,
> >  unsigned int bytes,
> >+ unsigned int *error,
> >  struct kvm_vcpu *vcpu);
> >
> 
> The fault may be at a different address than addr, if we cross a
> page boundary.  Need a struct here.
> 
Correct and you can find couple of FIXME in emulator.c that say so.
I'll fix that later. This is not the problem that was introduced by this
patch set. What do you mean we need struct here? Struct with
error code + error location?

After looking at the code: only write_emulated handle cross page access
correctly now.

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 08/23] KVM: x86 emulator: cleanup some direct calls into kvm to use existing callbacks

2010-04-28 Thread Avi Kivity

On 04/27/2010 03:15 PM, Gleb Natapov wrote:

Use callbacks from x86_emulate_ops to access segments instead of calling
into kvm directly.



-static unsigned long seg_base(struct x86_emulate_ctxt *ctxt, int seg)
+static unsigned long seg_base(struct x86_emulate_ctxt *ctxt,
+ struct x86_emulate_ops *ops, int seg)
  {
-   if (ctxt->mode == X86EMUL_MODE_PROT64&&  seg<  VCPU_SREG_FS)
-   return 0;
+   unsigned long base;

-   return kvm_x86_ops->get_segment_base(ctxt->vcpu, seg);
   


get_segment_base() is only one vmread on intel, but you replace it with 
reading the entire segment.



+   if (ctxt->mode == X86EMUL_MODE_PROT64) {
+   u64 val;
+   switch (seg) {
+   case VCPU_SREG_FS:
+   ops->get_msr(ctxt->vcpu, MSR_FS_BASE,&val);
+   break;
+   case VCPU_SREG_GS:
+   ops->get_msr(ctxt->vcpu, MSR_GS_BASE,&val);
+   break;
+   default:
+   val = 0;
+   break;
+   }
   


Why this ugliness?  get_cached_descriptor() should do this.



  static unsigned long seg_override_base(struct x86_emulate_ctxt *ctxt,
+  struct x86_emulate_ops *ops,
   struct decode_cache *c)
  {
if (!c->has_seg_override)
return 0;

-   return seg_base(ctxt, c->seg_override);
+   return seg_base(ctxt, ops, c->seg_override);
  }
   


Sticking ops into ctxt would reduce the size of these patches.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 22/23] KVM: remove unneeded initialization.

2010-04-28 Thread Avi Kivity

On 04/27/2010 03:15 PM, Gleb Natapov wrote:

This initialization is no longer needed.

Signed-off-by: Gleb Natapov
---
  arch/x86/kvm/x86.c |2 --
  1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ed15b20..29e2d3b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3801,8 +3801,6 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
 */
cache_all_regs(vcpu);

-   vcpu->mmio_is_write = 0;
-
if (!(emulation_type&  EMULTYPE_NO_DECODE)) {
int cs_db, cs_l;
kvm_x86_ops->get_cs_db_l_bits(vcpu,&cs_db,&cs_l);
   


Best to fold into the patch that made it unnecessary.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 09/23] KVM: x86 emulator: make set_cr() callback return error if it fails

2010-04-28 Thread Avi Kivity

On 04/27/2010 03:15 PM, Gleb Natapov wrote:

Make set_cr() callback return error if it fails instead of injecting #GP
behind emulator's back.



-void kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
+static int _kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
  {
   


standard practice is double underscore.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html