date:20111205

Re: [PATCH] kvm: make vcpu life cycle separated from kvm instance

2011-12-05 Thread Gleb Natapov

On Mon, Dec 05, 2011 at 01:39:37PM +0800, Liu ping fan wrote:
 On Sun, Dec 4, 2011 at 8:10 PM, Gleb Natapov g...@redhat.com wrote:
  On Sun, Dec 04, 2011 at 07:53:37PM +0800, Liu ping fan wrote:
  On Sat, Dec 3, 2011 at 2:26 AM, Jan Kiszka jan.kis...@siemens.com wrote:
   On 2011-12-02 07:26, Liu Ping Fan wrote:
   From: Liu Ping Fan pingf...@linux.vnet.ibm.com
  
   Currently, vcpu can be destructed only when kvm instance destroyed.
   Change this to vcpu's destruction taken when its refcnt is zero,
   and then vcpu MUST and CAN be destroyed before kvm's destroy.
  
   I'm lacking the big picture yet (would be good to have in the change log
   - at least I'm too lazy to read the code):
  
   What increments the refcnt, what decrements it again? IOW, how does user
   space controls the life-cycle of a vcpu after your changes?
  
  In local APIC mode, delivering IPI to target APIC, target's refcnt is
  incremented, and decremented when finished. At other times, using RCU to
  Why is this needed?
 
 Suppose the following scene:
 
 #define kvm_for_each_vcpu(idx, vcpup, kvm) \
 for (idx = 0; \
  idx < atomic_read(&kvm->online_vcpus) && \
  (vcpup = kvm_get_vcpu(kvm, idx)) != NULL; \
  idx++)
 
 --
 Here kvm_vcpu's destruction is called
   vcpup->vcpu_id ...  //oops!
 
 
And this is exactly how your code looks. i.e you do not increment
reference count in most of the loops, you only increment it twice
(in pic_unlock() and kvm_irq_delivery_to_apic()) because you are using
vcpu outside of rcu_read_lock() protected section and I do not see why
not just extend protected section to include kvm_vcpu_kick(). As far as
I can see this function does not sleep.

What should protect vcpu from disappearing in your example above is RCU
itself if you are using it right. But since I do not see any calls to
rcu_assign_pointer()/rcu_dereference() I doubt you are using it right
actually.
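
For readers following along, the publish/read pairing that
rcu_assign_pointer()/rcu_dereference() provide can be sketched in userspace
with C11 atomics. This is only an analogy under stated assumptions:
vcpu_slot, publish_vcpu(), unpublish_vcpu() and read_vcpu_id() are
hypothetical names, release/acquire ordering stands in for the kernel
primitives, and there is no grace period here (nothing equivalent to
synchronize_rcu() or call_rcu()), so it says nothing about when an old
object may be freed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct vcpu {
    int vcpu_id;
};

/* Hypothetical shared slot, standing in for one kvm->vcpus[] entry. */
static _Atomic(struct vcpu *) vcpu_slot;

/* Writer side: initialise the object first, then publish the pointer. */
static void publish_vcpu(struct vcpu *v, int id)
{
    assert(v != NULL);
    v->vcpu_id = id;
    atomic_store_explicit(&vcpu_slot, v, memory_order_release);
}

/* Writer side: unpublish; readers that load afterwards see NULL. */
static void unpublish_vcpu(void)
{
    atomic_store_explicit(&vcpu_slot, (struct vcpu *)NULL,
                          memory_order_release);
}

/* Reader side: load the pointer once, then use only that local copy. */
static int read_vcpu_id(void)
{
    struct vcpu *v = atomic_load_explicit(&vcpu_slot, memory_order_acquire);

    return v ? v->vcpu_id : -1;
}
```

The point of the pairing is that a reader either sees NULL or a fully
initialised object, never a half-built one. Keeping the object alive while
readers still hold the local copy is the part RCU's grace period adds.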

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[net-next RFC PATCH 0/5] Series short description

2011-12-05 Thread Jason Wang

multiple queue virtio-net: flow steering through host/guest cooperation

Hello all:

This is a rough series that adds guest/host cooperation for flow
steering, based on Krish Kumar's multiple queue virtio-net
driver patch 3/3 (http://lwn.net/Articles/467283/).

The idea is simple: the backend passes the rxhash to the guest, and the
guest tells the backend the hash-to-queue mapping when necessary. The
backend can then choose the queue based on the hash value of the
packet.  The table is just a page shared between userspace and the
backend.

Patch 1 enables passing the rxhash through vnet_hdr to the
guest.
Patches 2 and 3 implement a very simple flow director for tap and
macvtap. The tap part is based on the multiqueue tap patches posted by me
(http://lwn.net/Articles/459270/).
Patch 4 implements a method for a virtio device to find the irq of a
specific virtqueue, in order to do device-specific interrupt
optimization.
Patch 5 is the guest driver part, which uses accelerated RFS to
program the flow director, with some optimizations of irq affinity
and tx queue selection.

This is just a prototype that demonstrates the idea; there are still
things that need to be discussed:

- An alternative to the shared page is the ctrl vq. A shared table is
  preferable because of the delay of the ctrl vq itself.
- Optimization of irq affinity and tx queue selection

Comments are welcome, thanks!

---

Jason Wang (5):
  virtio_net: passing rxhash through vnet_hdr
  tuntap: simple flow director support
  macvtap: flow director support
  virtio: introduce a method to get the irq of a specific virtqueue
  virtio-net: flow director support


 drivers/lguest/lguest_device.c |8 ++
 drivers/net/macvlan.c  |4 +
 drivers/net/macvtap.c  |   42 -
 drivers/net/tun.c  |  105 --
 drivers/net/virtio_net.c   |  189 +++-
 drivers/s390/kvm/kvm_virtio.c  |6 +
 drivers/vhost/net.c|   10 +-
 drivers/vhost/vhost.h  |5 +
 drivers/virtio/virtio_mmio.c   |8 ++
 drivers/virtio/virtio_pci.c|   12 +++
 include/linux/if_macvlan.h |1 
 include/linux/if_tun.h |   11 ++
 include/linux/virtio_config.h  |4 +
 include/linux/virtio_net.h |   16 +++
 14 files changed, 377 insertions(+), 44 deletions(-)

-- 
Signature

[net-next RFC PATCH 1/5] virtio_net: passing rxhash through vnet_hdr

2011-12-05 Thread Jason Wang

This patch enables passing the rxhash value to the guest
through vnet_hdr. This is useful when the guest wants to cooperate
with the virtual device to steer a flow to a dedicated guest cpu.

This feature is negotiated through VIRTIO_NET_F_GUEST_RXHASH.

Signed-off-by: Jason Wang jasow...@redhat.com
---
 drivers/net/macvtap.c  |   10 ++
 drivers/net/tun.c  |   44 +---
 drivers/net/virtio_net.c   |   26 ++
 drivers/vhost/net.c|   10 +++---
 drivers/vhost/vhost.h  |5 +++--
 include/linux/if_tun.h |1 +
 include/linux/virtio_net.h |   10 +-
 7 files changed, 73 insertions(+), 33 deletions(-)

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 7c88d13..504c745 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -760,16 +760,17 @@ static ssize_t macvtap_put_user(struct macvtap_queue *q,
 	int vnet_hdr_len = 0;
 
 	if (q->flags & IFF_VNET_HDR) {
-		struct virtio_net_hdr vnet_hdr;
+		struct virtio_net_hdr_rxhash vnet_hdr;
 		vnet_hdr_len = q->vnet_hdr_sz;
 		if ((len -= vnet_hdr_len) < 0)
 			return -EINVAL;
 
-		ret = macvtap_skb_to_vnet_hdr(skb, &vnet_hdr);
+		ret = macvtap_skb_to_vnet_hdr(skb, &vnet_hdr.hdr.hdr);
 		if (ret)
 			return ret;
 
-		if (memcpy_toiovecend(iv, (void *)&vnet_hdr, 0, sizeof(vnet_hdr)))
+		vnet_hdr.rxhash = skb->rxhash;
+		if (memcpy_toiovecend(iv, (void *)&vnet_hdr, 0, q->vnet_hdr_sz))
 			return -EFAULT;
 	}
 
@@ -890,7 +891,8 @@ static long macvtap_ioctl(struct file *file, unsigned int cmd,
return ret;
 
case TUNGETFEATURES:
-   if (put_user(IFF_TAP | IFF_NO_PI | IFF_VNET_HDR, up))
+   if (put_user(IFF_TAP | IFF_NO_PI | IFF_VNET_HDR | IFF_RXHASH,
+up))
return -EFAULT;
return 0;
 
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index afb11d1..7d22b4b 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -869,49 +869,55 @@ static ssize_t tun_put_user(struct tun_file *tfile,
}
 
 	if (tfile->flags & TUN_VNET_HDR) {
-		struct virtio_net_hdr gso = { 0 }; /* no info leak */
-		if ((len -= tfile->vnet_hdr_sz) < 0)
+		struct virtio_net_hdr_rxhash hdr;
+		struct virtio_net_hdr *gso = (struct virtio_net_hdr *)&hdr;
+
+		if ((len -= tfile->vnet_hdr_sz) < 0 ||
+		    tfile->vnet_hdr_sz > sizeof(struct virtio_net_hdr_rxhash))
 			return -EINVAL;
 
+		memset(&hdr, 0, sizeof(hdr));
 		if (skb_is_gso(skb)) {
 			struct skb_shared_info *sinfo = skb_shinfo(skb);
 
 			/* This is a hint as to how much should be linear. */
-			gso.hdr_len = skb_headlen(skb);
-			gso.gso_size = sinfo->gso_size;
+			gso->hdr_len = skb_headlen(skb);
+			gso->gso_size = sinfo->gso_size;
 			if (sinfo->gso_type & SKB_GSO_TCPV4)
-				gso.gso_type = VIRTIO_NET_HDR_GSO_TCPV4;
+				gso->gso_type = VIRTIO_NET_HDR_GSO_TCPV4;
 			else if (sinfo->gso_type & SKB_GSO_TCPV6)
-				gso.gso_type = VIRTIO_NET_HDR_GSO_TCPV6;
+				gso->gso_type = VIRTIO_NET_HDR_GSO_TCPV6;
 			else if (sinfo->gso_type & SKB_GSO_UDP)
-				gso.gso_type = VIRTIO_NET_HDR_GSO_UDP;
+				gso->gso_type = VIRTIO_NET_HDR_GSO_UDP;
 			else {
 				pr_err("unexpected GSO type: "
 				       "0x%x, gso_size %d, hdr_len %d\n",
-				       sinfo->gso_type, gso.gso_size,
-				       gso.hdr_len);
+				       sinfo->gso_type, gso->gso_size,
+				       gso->hdr_len);
 				print_hex_dump(KERN_ERR, "tun: ",
 					       DUMP_PREFIX_NONE,
 					       16, 1, skb->head,
-					       min((int)gso.hdr_len, 64), true);
+					       min((int)gso->hdr_len, 64),
+					       true);
 				WARN_ON_ONCE(1);
 				return -EINVAL;
 			}
 			if (sinfo->gso_type & SKB_GSO_TCP_ECN)
-				gso.gso_type |= VIRTIO_NET_HDR_GSO_ECN;
+				gso->gso_type |= VIRTIO_NET_HDR_GSO_ECN;

[net-next RFC PATCH 2/5] tuntap: simple flow director support

2011-12-05 Thread Jason Wang

This patch adds a simple flow director to the tun/tap device. It is just a
page that contains the hash-to-queue mapping, which can be changed by
user space. The backend (tap/macvtap) queries this table to get
the desired queue for a packet when it sends packets to userspace.

The page address is set through a new ioctl, TUNSETFD, and the page
stays pinned until the device exits or another page is specified.
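
The lookup described above can be sketched in plain C as follows. This is a
minimal userspace illustration, not the patch's code: select_queue() is a
hypothetical helper, `table` stands in for the mapped shared page (NULL when
user space has not installed one), and the 0xFF mask mirrors TAP_HASH_MASK
from the diff below. Unlike the patch, the sketch does not special-case a
zero rxhash; it simply falls through to the scaling step.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TAP_HASH_MASK 0xFF	/* 256 table slots, as in the patch */

/*
 * Pick a queue index for a packet with the given rxhash.
 * A table entry is honoured only if it names an existing queue;
 * otherwise the 32-bit hash is scaled into [0, numqueues).
 */
static uint32_t select_queue(const uint16_t *table, uint32_t rxhash,
			     uint32_t numqueues)
{
	assert(numqueues > 0);

	if (table) {
		uint32_t q = table[rxhash & TAP_HASH_MASK];

		if (q < numqueues)
			return q;
	}

	return (uint32_t)(((uint64_t)rxhash * numqueues) >> 32);
}
```

Because only the low 8 bits of the hash index the table, flows whose hashes
share those bits land in the same slot and are steered together.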

Signed-off-by: Jason Wang jasow...@redhat.com
---
 drivers/net/tun.c  |   63 
 include/linux/if_tun.h |   10 
 2 files changed, 62 insertions(+), 11 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 7d22b4b..2efaf81 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -64,6 +64,7 @@
 #include <linux/nsproxy.h>
 #include <linux/virtio_net.h>
 #include <linux/rcupdate.h>
+#include <linux/highmem.h>
 #include <net/net_namespace.h>
 #include <net/netns/generic.h>
 #include <net/rtnetlink.h>
@@ -109,6 +110,7 @@ struct tap_filter {
 };
 
 #define MAX_TAP_QUEUES (NR_CPUS < 16 ? NR_CPUS : 16)
+#define TAP_HASH_MASK  0xFF
 
 struct tun_file {
struct sock sk;
@@ -128,6 +130,7 @@ struct tun_sock;
 
 struct tun_struct {
struct tun_file *tfiles[MAX_TAP_QUEUES];
+   struct page *fd_page[1];
unsigned intnumqueues;
unsigned intflags;
uid_t   owner;
@@ -156,7 +159,7 @@ static struct tun_file *tun_get_queue(struct net_device *dev,
 	struct tun_struct *tun = netdev_priv(dev);
 	struct tun_file *tfile = NULL;
 	int numqueues = tun->numqueues;
-	__u32 rxq;
+	__u32 rxq, rxhash;
 
 	BUG_ON(!rcu_read_lock_held());
 
@@ -168,6 +171,22 @@ static struct tun_file *tun_get_queue(struct net_device *dev,
 		goto out;
 	}
 
+	rxhash = skb_get_rxhash(skb);
+	if (rxhash) {
+		if (tun->fd_page[0]) {
+			u16 *table = kmap_atomic(tun->fd_page[0]);
+			rxq = table[rxhash & TAP_HASH_MASK];
+			kunmap_atomic(table);
+			if (rxq < numqueues) {
+				tfile = rcu_dereference(tun->tfiles[rxq]);
+				goto out;
+			}
+		}
+		rxq = ((u64)rxhash * numqueues) >> 32;
+		tfile = rcu_dereference(tun->tfiles[rxq]);
+		goto out;
+	}
+
 	if (likely(skb_rx_queue_recorded(skb))) {
 		rxq = skb_get_rx_queue(skb);
 
@@ -178,14 +197,6 @@ static struct tun_file *tun_get_queue(struct net_device *dev,
 		goto out;
 	}
 
-	/* Check if we can use flow to select a queue */
-	rxq = skb_get_rxhash(skb);
-	if (rxq) {
-		u32 idx = ((u64)rxq * numqueues) >> 32;
-		tfile = rcu_dereference(tun->tfiles[idx]);
-		goto out;
-	}
-
 	tfile = rcu_dereference(tun->tfiles[0]);
 out:
 	return tfile;
@@ -1020,6 +1031,14 @@ out:
return ret;
 }
 
+static void tun_destructor(struct net_device *dev)
+{
+	struct tun_struct *tun = netdev_priv(dev);
+	if (tun->fd_page[0])
+		put_page(tun->fd_page[0]);
+	free_netdev(dev);
+}
+
 static void tun_setup(struct net_device *dev)
 {
 	struct tun_struct *tun = netdev_priv(dev);
@@ -1028,7 +1047,7 @@ static void tun_setup(struct net_device *dev)
 	tun->group = -1;
 
 	dev->ethtool_ops = &tun_ethtool_ops;
-	dev->destructor = free_netdev;
+	dev->destructor = tun_destructor;
 }
 
 /* Trivial set of netlink ops to allow deleting tun or tap
@@ -1230,6 +1249,7 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
 		tun = netdev_priv(dev);
 		tun->dev = dev;
 		tun->flags = flags;
+		tun->fd_page[0] = NULL;
 
 		security_tun_dev_post_create(&tfile->sk);
 
@@ -1353,6 +1373,7 @@ static long __tun_chr_ioctl(struct file *file, unsigned 
int cmd,
struct net_device *dev = NULL;
void __user* argp = (void __user*)arg;
struct ifreq ifr;
+   struct tun_fd tfd;
int ret;
 
if (cmd == TUNSETIFF || cmd == TUNATTACHQUEUE || _IOC_TYPE(cmd) == 0x89)
@@ -1364,7 +1385,8 @@ static long __tun_chr_ioctl(struct file *file, unsigned 
int cmd,
 * This is needed because we never checked for invalid flags on
 * TUNSETIFF. */
return put_user(IFF_TUN | IFF_TAP | IFF_NO_PI | IFF_ONE_QUEUE |
-   IFF_VNET_HDR | IFF_MULTI_QUEUE | IFF_RXHASH,
+   IFF_VNET_HDR | IFF_MULTI_QUEUE | IFF_RXHASH |
+   IFF_FD,
(unsigned int __user*)argp);
}
 
@@ -1476,6 +1498,25 @@ static long __tun_chr_ioctl(struct file *file, unsigned 
int cmd,
ret = set_offload(tun, arg);

[net-next RFC PATCH 3/5] macvtap: flow director support

2011-12-05 Thread Jason Wang

Signed-off-by: Jason Wang jasow...@redhat.com
---
 drivers/net/macvlan.c  |4 
 drivers/net/macvtap.c  |   36 ++--
 include/linux/if_macvlan.h |1 +
 3 files changed, 39 insertions(+), 2 deletions(-)

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index 7413497..b0cb7ce 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -706,6 +706,7 @@ int macvlan_common_newlink(struct net *src_net, struct net_device *dev,
 	vlan->port     = port;
 	vlan->receive  = receive;
 	vlan->forward  = forward;
+	vlan->fd_page[0] = NULL;
 
 	vlan->mode     = MACVLAN_MODE_VEPA;
 	if (data && data[IFLA_MACVLAN_MODE])
@@ -749,6 +750,9 @@ void macvlan_dellink(struct net_device *dev, struct list_head *head)
 {
 	struct macvlan_dev *vlan = netdev_priv(dev);
 
+	if (vlan->fd_page[0])
+		put_page(vlan->fd_page[0]);
+
 	list_del(&vlan->list);
 	unregister_netdevice_queue(dev, head);
 }
diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 504c745..a34eb84 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -14,6 +14,7 @@
 #include <linux/wait.h>
 #include <linux/cdev.h>
 #include <linux/fs.h>
+#include <linux/highmem.h>
 
 #include <net/net_namespace.h>
 #include <net/rtnetlink.h>
@@ -62,6 +63,8 @@ static DEFINE_IDR(minor_idr);
 static struct class *macvtap_class;
 static struct cdev macvtap_cdev;
 
+#define TAP_HASH_MASK 0xFF
+
 static const struct proto_ops macvtap_socket_ops;
 
 /*
@@ -189,6 +192,11 @@ static struct macvtap_queue *macvtap_get_queue(struct net_device *dev,
 	/* Check if we can use flow to select a queue */
 	rxq = skb_get_rxhash(skb);
 	if (rxq) {
+		if (vlan->fd_page[0]) {
+			u16 *table = kmap_atomic(vlan->fd_page[0]);
+			rxq = table[rxq & TAP_HASH_MASK];
+			kunmap_atomic(table);
+		}
 		tap = rcu_dereference(vlan->taps[rxq % numvtaps]);
 		if (tap)
 			goto out;
@@ -851,6 +859,7 @@ static long macvtap_ioctl(struct file *file, unsigned int cmd,
 {
 	struct macvtap_queue *q = file->private_data;
 	struct macvlan_dev *vlan;
+	struct tun_fd tfd;
 	void __user *argp = (void __user *)arg;
 	struct ifreq __user *ifr = argp;
 	unsigned int __user *up = argp;
@@ -891,8 +900,8 @@ static long macvtap_ioctl(struct file *file, unsigned int 
cmd,
return ret;
 
case TUNGETFEATURES:
-   if (put_user(IFF_TAP | IFF_NO_PI | IFF_VNET_HDR | IFF_RXHASH,
-up))
+   if (put_user(IFF_TAP | IFF_NO_PI | IFF_VNET_HDR | IFF_RXHASH |
+IFF_FD, up))
return -EFAULT;
return 0;
 
@@ -918,6 +927,29 @@ static long macvtap_ioctl(struct file *file, unsigned int cmd,
 		q->vnet_hdr_sz = s;
 		return 0;
 
+	case TUNSETFD:
+		rcu_read_lock_bh();
+		vlan = rcu_dereference(q->vlan);
+		if (!vlan)
+			ret = -ENOLINK;
+		else {
+			if (copy_from_user(&tfd, argp, sizeof(tfd)))
+				ret = -EFAULT;
+			if (vlan->fd_page[0]) {
+				put_page(vlan->fd_page[0]);
+				vlan->fd_page[0] = NULL;
+			}
+
+			/* put_page() in macvlan_dellink() */
+			if (get_user_pages_fast(tfd.addr, 1, 0,
+						&vlan->fd_page[0]) != 1)
+				ret = -EFAULT;
+			else
+				ret = 0;
+		}
+		rcu_read_unlock_bh();
+		return ret;
+
 	case TUNSETOFFLOAD:
 		/* let the user check for future flags */
 		if (arg & ~(TUN_F_CSUM | TUN_F_TSO4 | TUN_F_TSO6 |
diff --git a/include/linux/if_macvlan.h b/include/linux/if_macvlan.h
index d103dca..69a87a1 100644
--- a/include/linux/if_macvlan.h
+++ b/include/linux/if_macvlan.h
@@ -65,6 +65,7 @@ struct macvlan_dev {
struct macvtap_queue*taps[MAX_MACVTAP_QUEUES];
int numvtaps;
int minor;
+   struct page *fd_page[1];
 };
 
 static inline void macvlan_count_rx(const struct macvlan_dev *vlan,


[net-next RFC PATCH 4/5] virtio: introduce a method to get the irq of a specific virtqueue

2011-12-05 Thread Jason Wang

Device-specific irq configuration may be needed in order to do some
optimization. So a new config operation is needed to get the irq of a
virtqueue.
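
As a rough picture of what such an operation looks like from the caller's
side, here is a small self-contained sketch. All names (fake_dev, fake_ops,
fake_vq, pci_like_get_vq_irq(), vq_irq()) are hypothetical stand-ins, not
kernel types; the PCI-like implementation only mirrors the INTx-versus-MSI-X
choice made by vp_get_vq_irq() in the diff below, and the -1 return for a
transport without the callback is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

struct fake_dev;

struct fake_vq {
	unsigned int msix_vector;	/* index into the device's MSI-X table */
};

/* Per-transport operations; get_vq_irq is optional in this sketch. */
struct fake_ops {
	int (*get_vq_irq)(const struct fake_dev *dev, const struct fake_vq *vq);
};

struct fake_dev {
	const struct fake_ops *ops;
	int intx_enabled;		/* one shared legacy interrupt */
	int shared_irq;
	const int *msix_irqs;		/* irq number per MSI-X vector */
};

/* PCI-like transport: shared irq with INTx, per-vector irq with MSI-X. */
static int pci_like_get_vq_irq(const struct fake_dev *dev,
			       const struct fake_vq *vq)
{
	if (dev->intx_enabled)
		return dev->shared_irq;
	return dev->msix_irqs[vq->msix_vector];
}

/* Driver-side wrapper: returns -1 when the transport has no such op. */
static int vq_irq(const struct fake_dev *dev, const struct fake_vq *vq)
{
	assert(dev && dev->ops && vq);

	if (!dev->ops->get_vq_irq)
		return -1;
	return dev->ops->get_vq_irq(dev, vq);
}
```

A driver could then feed the returned number to its irq affinity setup
without knowing which transport the device sits on.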

Signed-off-by: Jason Wang jasow...@redhat.com
---
 drivers/lguest/lguest_device.c |8 
 drivers/s390/kvm/kvm_virtio.c  |6 ++
 drivers/virtio/virtio_mmio.c   |8 
 drivers/virtio/virtio_pci.c|   12 
 include/linux/virtio_config.h  |4 
 5 files changed, 38 insertions(+), 0 deletions(-)

diff --git a/drivers/lguest/lguest_device.c b/drivers/lguest/lguest_device.c
index 595d731..6483bff 100644
--- a/drivers/lguest/lguest_device.c
+++ b/drivers/lguest/lguest_device.c
@@ -386,6 +386,13 @@ static const char *lg_bus_name(struct virtio_device *vdev)
 	return "";
 }
 
+static int lg_get_vq_irq(struct virtio_device *vdev, struct virtqueue *vq)
+{
+	struct lguest_vq_info *lvq = vq->priv;
+
+	return lvq->config.irq;
+}
+
 /* The ops structure which hooks everything together. */
 static struct virtio_config_ops lguest_config_ops = {
.get_features = lg_get_features,
@@ -398,6 +405,7 @@ static struct virtio_config_ops lguest_config_ops = {
.find_vqs = lg_find_vqs,
.del_vqs = lg_del_vqs,
.bus_name = lg_bus_name,
+   .get_vq_irq = lg_get_vq_irq,
 };
 
 /*
diff --git a/drivers/s390/kvm/kvm_virtio.c b/drivers/s390/kvm/kvm_virtio.c
index 8af868b..a8d5ca1 100644
--- a/drivers/s390/kvm/kvm_virtio.c
+++ b/drivers/s390/kvm/kvm_virtio.c
@@ -268,6 +268,11 @@ static const char *kvm_bus_name(struct virtio_device *vdev)
 	return "";
 }
 
+static int kvm_get_vq_irq(struct virtio_device *vdev, struct virtqueue *vq)
+{
+   return 0x2603;
+}
+
 /*
  * The config ops structure as defined by virtio config
  */
@@ -282,6 +287,7 @@ static struct virtio_config_ops kvm_vq_configspace_ops = {
.find_vqs = kvm_find_vqs,
.del_vqs = kvm_del_vqs,
.bus_name = kvm_bus_name,
+   .get_vq_irq = kvm_get_vq_irq,
 };
 
 /*
diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
index 2f57380..309d471 100644
--- a/drivers/virtio/virtio_mmio.c
+++ b/drivers/virtio/virtio_mmio.c
@@ -368,6 +368,13 @@ static const char *vm_bus_name(struct virtio_device *vdev)
 	return vm_dev->pdev->name;
 }
 
+static int vm_get_vq_irq(struct virtio_device *vdev, struct virtqueue *vq)
+{
+	struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
+
+	return platform_get_irq(vm_dev->pdev, 0);
+}
+
 static struct virtio_config_ops virtio_mmio_config_ops = {
.get= vm_get,
.set= vm_set,
@@ -379,6 +386,7 @@ static struct virtio_config_ops virtio_mmio_config_ops = {
.get_features   = vm_get_features,
.finalize_features = vm_finalize_features,
.bus_name   = vm_bus_name,
+   .get_vq_irq = vm_get_vq_irq,
 };
 
 
diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
index 229ea56..4f99164 100644
--- a/drivers/virtio/virtio_pci.c
+++ b/drivers/virtio/virtio_pci.c
@@ -583,6 +583,17 @@ static const char *vp_bus_name(struct virtio_device *vdev)
 	return pci_name(vp_dev->pci_dev);
 }
 
+static int vp_get_vq_irq(struct virtio_device *vdev, struct virtqueue *vq)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+	struct virtio_pci_vq_info *info = vq->priv;
+
+	if (vp_dev->intx_enabled)
+		return vp_dev->pci_dev->irq;
+	else
+		return vp_dev->msix_entries[info->msix_vector].vector;
+}
+
 static struct virtio_config_ops virtio_pci_config_ops = {
.get= vp_get,
.set= vp_set,
@@ -594,6 +605,7 @@ static struct virtio_config_ops virtio_pci_config_ops = {
.get_features   = vp_get_features,
.finalize_features = vp_finalize_features,
.bus_name   = vp_bus_name,
+   .get_vq_irq = vp_get_vq_irq,
 };
 
 static void virtio_pci_release_dev(struct device *_d)
diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
index 63f98d0..7b783a6 100644
--- a/include/linux/virtio_config.h
+++ b/include/linux/virtio_config.h
@@ -104,6 +104,9 @@
  * vdev: the virtio_device
  *  This returns a pointer to the bus name a la pci_name from which
  *  the caller can then copy.
+ * @get_vq_irq: get the irq numer of the specific virt queue.
+ *  vdev: the virtio_device
+ *  vq: the virtqueue
  */
 typedef void vq_callback_t(struct virtqueue *);
 struct virtio_config_ops {
@@ -122,6 +125,7 @@ struct virtio_config_ops {
u32 (*get_features)(struct virtio_device *vdev);
void (*finalize_features)(struct virtio_device *vdev);
const char *(*bus_name)(struct virtio_device *vdev);
+   int (*get_vq_irq)(struct virtio_device *vdev, struct virtqueue *vq);
 };
 
 /* If driver didn't advertise the feature, it will never appear. */


[net-next RFC PATCH 5/5] virtio-net: flow director support

2011-12-05 Thread Jason Wang

In order to let the packets of a flow be delivered to the desired
guest cpu, we can cooperate with the device by programming the flow
director, which is just a hash-to-queue table.

This cooperation is done through the accelerated RFS support: a
device-specific flow steering method, virtnet_fd(), is used to modify
the flow director based on the rfs mapping. The desired queue is
calculated through reverse mapping of the irq affinity table. In order
to parallelize the ingress path, the irq affinity of the rx queues is
also provided by the driver.

In addition to accelerated RFS, we can also use the guest scheduler to
balance the load of TX and reduce lock contention on the egress path,
so smp_processor_id() is used for tx queue selection.
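
The two guest-side steps described above can be sketched in userspace C.
This is an illustration under stated assumptions, not the driver code:
program_flow() and wrap_txq() are hypothetical helpers, `table` stands in
for the mapped flow director page, and the cpu number is passed in as a
plain integer instead of being read from smp_processor_id().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TAP_HASH_MASK 0xFF	/* same mask as the backend lookup */

/* Record "packets with this hash should arrive on rxq_index". */
static void program_flow(uint16_t *table, uint32_t rxhash, uint16_t rxq_index)
{
	assert(table != NULL);
	table[rxhash & TAP_HASH_MASK] = rxq_index;
}

/*
 * Fold a candidate tx queue (for example the current cpu number) into
 * the range of queues the device really has, by repeated subtraction.
 */
static unsigned int wrap_txq(unsigned int candidate,
			     unsigned int num_tx_queues)
{
	assert(num_tx_queues > 0);

	while (candidate >= num_tx_queues)
		candidate -= num_tx_queues;
	return candidate;
}
```

Note that a later program_flow() call for a hash with the same low 8 bits
overwrites the earlier entry, so the table is a best-effort hint rather
than an exact per-flow map.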

Signed-off-by: Jason Wang jasow...@redhat.com
---
 drivers/net/virtio_net.c   |  165 +++-
 include/linux/virtio_net.h |6 ++
 2 files changed, 169 insertions(+), 2 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 0d871f8..89bb5e7 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -26,6 +26,10 @@
 #include <linux/scatterlist.h>
 #include <linux/if_vlan.h>
 #include <linux/slab.h>
+#include <linux/highmem.h>
+#include <linux/cpu_rmap.h>
+#include <linux/interrupt.h>
+#include <linux/cpumask.h>
 
 static int napi_weight = 128;
 module_param(napi_weight, int, 0444);
@@ -40,6 +44,7 @@ module_param(gso, bool, 0444);
 
 #define VIRTNET_SEND_COMMAND_SG_MAX    2
 #define VIRTNET_DRIVER_VERSION "1.0.0"
+#define TAP_HASH_MASK 0xFF
 
 struct virtnet_send_stats {
struct u64_stats_sync syncp;
@@ -89,6 +94,9 @@ struct receive_queue {
 
/* Active rx statistics */
struct virtnet_recv_stats __percpu *stats;
+
+   /* FIXME: per vector instead of per queue ?? */
+   cpumask_var_t affinity_mask;
 };
 
 struct virtnet_info {
@@ -110,6 +118,11 @@ struct virtnet_info {
 
/* Host will pass rxhash to us. */
bool has_rxhash;
+
+   /* A page of flow director */
+   struct page *fd_page;
+
+   cpumask_var_t affinity_mask;
 };
 
 struct skb_vnet_hdr {
@@ -386,6 +399,7 @@ static void receive_buf(struct receive_queue *rq, void *buf, unsigned int len)
 	if (vi->has_rxhash)
 		skb->rxhash = hdr->rhdr.rxhash;
 
+	skb_record_rx_queue(skb, rq->vq->queue_index / 2);
 	netif_receive_skb(skb);
 	return;
 
@@ -722,6 +736,19 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 	return NETDEV_TX_OK;
 }
 
+static int virtnet_set_fd(struct net_device *dev, u32 pfn)
+{
+	struct virtnet_info *vi = netdev_priv(dev);
+	struct virtio_device *vdev = vi->vdev;
+
+	if (virtio_has_feature(vdev, VIRTIO_NET_F_HOST_FD)) {
+		vdev->config->set(vdev,
+				  offsetof(struct virtio_net_config_fd, addr),
+				  &pfn, sizeof(u32));
+	}
+	return 0;
+}
+
 static int virtnet_set_mac_address(struct net_device *dev, void *p)
 {
struct virtnet_info *vi = netdev_priv(dev);
@@ -1017,6 +1044,39 @@ static int virtnet_change_mtu(struct net_device *dev, int new_mtu)
 	return 0;
 }
 
+#ifdef CONFIG_RFS_ACCEL
+
+int virtnet_fd(struct net_device *net_dev, const struct sk_buff *skb,
+	       u16 rxq_index, u32 flow_id)
+{
+	struct virtnet_info *vi = netdev_priv(net_dev);
+	u16 *table = NULL;
+
+	if (skb->protocol != htons(ETH_P_IP) || !skb->rxhash)
+		return -EPROTONOSUPPORT;
+
+	table = kmap_atomic(vi->fd_page);
+	table[skb->rxhash & TAP_HASH_MASK] = rxq_index;
+	kunmap_atomic(table);
+
+	return 0;
+}
+#endif
+
+static u16 virtnet_select_queue(struct net_device *dev, struct sk_buff *skb)
+{
+	int txq = skb_rx_queue_recorded(skb) ? skb_get_rx_queue(skb) :
+					       smp_processor_id();
+
+	/* As we make use of the accelerate rfs which let the scheduler to
+	 * balance the load, it make sense to choose the tx queue also based on
+	 * the processor id?
+	 */
+	while (unlikely(txq >= dev->real_num_tx_queues))
+		txq -= dev->real_num_tx_queues;
+	return txq;
+}
+
 static const struct net_device_ops virtnet_netdev = {
.ndo_open= virtnet_open,
.ndo_stop= virtnet_close,
@@ -1028,9 +1088,13 @@ static const struct net_device_ops virtnet_netdev = {
.ndo_get_stats64 = virtnet_stats,
.ndo_vlan_rx_add_vid = virtnet_vlan_rx_add_vid,
.ndo_vlan_rx_kill_vid = virtnet_vlan_rx_kill_vid,
+   .ndo_select_queue= virtnet_select_queue,
 #ifdef CONFIG_NET_POLL_CONTROLLER
.ndo_poll_controller = virtnet_netpoll,
 #endif
+#ifdef CONFIG_RFS_ACCEL
+   .ndo_rx_flow_steer   = virtnet_fd,
+#endif
 };
 
 static void virtnet_update_status(struct virtnet_info *vi)
@@ -1272,12 +1336,76 @@ static int virtnet_setup_vqs(struct virtnet_info *vi)

winXP Standard PC HAL and qemu-kvm >= 0.15

2011-12-05 Thread Michael Tokarev

As it turned out, a Windows XP machine does not work in
qemu-kvm >= 0.15 (it loses network and USB entirely)
if it is using the Standard PC HAL.  In 0.14 it worked
fine, but not in 0.15 (I haven't tried any in-between
versions yet).

There are several HAL types available in winXP: these
are Uniprocessor PC with MPS (or Multiprocessor),
also two ACPI types, and Standard PC.  All the other
HAL types appear to work fine, but not Standard PC.

I haven't debugged further yet, because it was
not easy to find out what was causing the regression
and how to reproduce it, and also because I don't think
it is the right HAL for a qemu-kvm guest anyway.

So, if anybody has some thoughts about this issue,
and especially if you know a way to switch the winXP HAL
type to some ACPI variant without reinstalling, please
speak up.. ;)

Debian bugreport for a reference: http://bugs.debian.org/647312

Reproducer: install a winXP guest on kvm with -no-acpi so
it chooses a Uniprocessor with MPS HAL.  Switch it to
Standard PC in device manager, reboot: in 0.15+ it does
not work anymore, while in 0.14 it continues to work fine.

Thank you!

/mjt

[PATCH] kvm tools: Split custom rootfs init into two stages

2011-12-05 Thread Sasha Levin

Currently the custom rootfs init is built along with the main KVM tools executable
and is copied into custom rootfs directories when they are created with
'kvm setup'. The problem is that if the init code changes, it has
to be manually copied to the custom rootfs directories.

Instead, this patch splits the init process into two parts. Stage 1 simply
handles mounts and passes control to stage 2 of the init.

Stage 2 lives in the code tree and does all the heavy lifting.

This allows us to make init changes in the code tree and have them automatically
picked up by custom rootfs guests without having to copy files over manually.

Signed-off-by: Sasha Levin levinsasha...@gmail.com
---
 tools/kvm/Makefile  |9 +++--
 tools/kvm/builtin-run.c |   27 +++
 tools/kvm/guest/init.c  |   14 +++---
 3 files changed, 37 insertions(+), 13 deletions(-)

diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile
index bb5f6b0..ece3306 100644
--- a/tools/kvm/Makefile
+++ b/tools/kvm/Makefile
@@ -21,6 +21,7 @@ TAGS  := ctags
 PROGRAM:= kvm
 
 GUEST_INIT := guest/init
+GUEST_INIT_S2 := guest/init_stage2
 
 OBJS   += builtin-balloon.o
 OBJS   += builtin-debug.o
@@ -179,7 +180,7 @@ WARNINGS += -Wwrite-strings
 
 CFLAGS += $(WARNINGS)
 
-all: $(PROGRAM) $(GUEST_INIT)
+all: $(PROGRAM) $(GUEST_INIT) $(GUEST_INIT_S2)
 
 KVMTOOLS-VERSION-FILE:
@$(SHELL_PATH) util/KVMTOOLS-VERSION-GEN $(OUTPUT)
@@ -193,6 +194,10 @@ $(GUEST_INIT): guest/init.c
$(E)   LINK $@
$(Q) $(CC) -static guest/init.c -o $@
 
+$(GUEST_INIT_S2): guest/init_stage2.c
+   $(E)   LINK $@
+   $(Q) $(CC) -static guest/init_stage2.c -o $@
+
 $(DEPS):
 
 %.d: %.c
@@ -269,7 +274,7 @@ clean:
$(Q) rm -f bios/bios-rom.h
$(Q) rm -f tests/boot/boot_test.iso
$(Q) rm -rf tests/boot/rootfs/
-   $(Q) rm -f $(DEPS) $(OBJS) $(PROGRAM) $(GUEST_INIT)
+   $(Q) rm -f $(DEPS) $(OBJS) $(PROGRAM) $(GUEST_INIT) $(GUEST_INIT_S2)
$(Q) rm -f cscope.*
$(Q) rm -f $(KVM_INCLUDE)/common-cmds.h
$(Q) rm -f KVMTOOLS-VERSION-FILE
diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c
index 43cf2c4..7c5ae47 100644
--- a/tools/kvm/builtin-run.c
+++ b/tools/kvm/builtin-run.c
@@ -702,6 +702,31 @@ void kvm_run_help(void)
usage_with_options(run_usage, options);
 }
 
+static int kvm_custom_stage2(void)
+{
+	char tmp[PATH_MAX], dst[PATH_MAX], *src;
+	const char *rootfs;
+	int r;
+
+	src = realpath("guest/init_stage2", NULL);
+	if (src == NULL)
+		return -ENOMEM;
+
+	if (image_filename[0] == NULL)
+		rootfs = "default";
+	else
+		rootfs = image_filename[0];
+
+	sprintf(tmp, "%s%s/virt/init_stage2", kvm__get_dir(), rootfs);
+	remove(tmp);
+
+	sprintf(dst, "/host/%s", src);
+	r = symlink(dst, tmp);
+	free(src);
+
+	return r;
+}
+
 int kvm_cmd_run(int argc, const char **argv, const char *prefix)
 {
static char real_cmdline[2048], default_name[20];
@@ -867,6 +892,8 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix)
 			strcat(real_cmdline, " init=/virt/init");
 			if (!no_dhcp)
 				strcat(real_cmdline, "  ip=dhcp");
+			if (kvm_custom_stage2())
+				die("Failed linking stage 2 of init.");
 		}
 	} else if (!strstr(real_cmdline, "root=")) {
 		strlcat(real_cmdline, " root=/dev/vda rw ", sizeof(real_cmdline));
diff --git a/tools/kvm/guest/init.c b/tools/kvm/guest/init.c
index 8975023..032a261 100644
--- a/tools/kvm/guest/init.c
+++ b/tools/kvm/guest/init.c
@@ -1,6 +1,6 @@
 /*
- * This is a simple init for shared rootfs guests. It brings up critical
- * mountpoints and then launches /bin/sh.
+ * This is a simple init for shared rootfs guests. This part should be limited
+ * to doing mounts and running stage 2 of the init process.
  */
 #include <sys/mount.h>
 #include <string.h>
@@ -30,15 +30,7 @@ int main(int argc, char *argv[])
 
do_mounts();
 
-/* get session leader */
-setsid();
-
-/* set controlling terminal */
-ioctl (0, TIOCSCTTY, 1);
-
-	puts("Starting '/bin/sh'...");
-
-	run_process("/bin/sh");
+	run_process("/virt/init_stage2");
 
 	printf("Init failed: %s\n", strerror(errno));
 
-- 
1.7.8


Re: [PATCH] kvm tools: Split custom rootfs init into two stages

2011-12-05 Thread Cyrill Gorcunov

On Mon, Dec 05, 2011 at 11:22:11AM +0200, Sasha Levin wrote:
  
 +static int kvm_custom_stage2(void)
 +{
 + char tmp[PATH_MAX], dst[PATH_MAX], *src;
 + const char *rootfs;
 + int r;
 +
 + src = realpath("guest/init_stage2", NULL);
 + if (src == NULL)
 + return -ENOMEM;
 +
 + if (image_filename[0] == NULL)
 + rootfs = "default";
 + else
 + rootfs = image_filename[0];
 +
 + sprintf(tmp, "%s%s/virt/init_stage2", kvm__get_dir(), rootfs);
 + remove(tmp);
 +
 + sprintf(dst, "/host/%s", src);
 + r = symlink(dst, tmp);
 + free(src);
 +
 + return r;
 +}
 +

I might be paranoid -- but could you please use snprintf here? :)
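
A bounded variant of the path construction might look like the sketch below.
It is only an illustration of the snprintf() idiom: build_path() is a
hypothetical helper, not part of the patch, and returning -1 on truncation
is an assumption about how the caller would want to handle an over-long
path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format "<dir><rootfs>/virt/init_stage2" into buf.
 * Returns 0 on success, -1 if the result did not fit (or on an
 * output error). buf is NUL-terminated even when truncated.
 */
static int build_path(char *buf, size_t size, const char *dir,
		      const char *rootfs)
{
	int n;

	assert(buf != NULL && size > 0);

	n = snprintf(buf, size, "%s%s/virt/init_stage2", dir, rootfs);
	if (n < 0 || (size_t)n >= size)
		return -1;
	return 0;
}
```

snprintf() reports the length it would have written, so comparing the
return value against the buffer size is what detects truncation; merely
switching from sprintf() to snprintf() without that check would silently
produce a shortened path.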

Re: [PATCH] kvm: make vcpu life cycle separated from kvm instance

2011-12-05 Thread Avi Kivity

On 12/05/2011 07:29 AM, Liu ping fan wrote:
 like this,
 #define kvm_for_each_vcpu(idx, cnt, vcpup, kvm) \
   for (idx = 0, cnt = 0, vcpup = kvm_get_vcpu(kvm, idx); \
        cnt < atomic_read(&kvm->online_vcpus) && \
        idx < KVM_MAX_VCPUS; \
        idx++, (vcpup == NULL) ?: cnt++, vcpup = kvm_get_vcpu(kvm, idx)) \
           if (vcpup == NULL) \
                continue; \
           else


 A little ugly, but have not thought a better way out :-)


#define kvm_for_each_vcpu(vcpu, it) \
	for (vcpu = kvm_fev_init(it); vcpu; vcpu = kvm_fev_next(it, vcpu))

Though that doesn't give a good place for rcu_read_unlock().
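
A rough userspace sketch of the iterator-style macro, assuming the init/next
helpers walk a sparse table and skip empty slots. The struct layout, the
fev_* bodies and MAX_VCPUS are all hypothetical stand-ins (nothing in the
thread defines kvm_fev_init()/kvm_fev_next()), and RCU locking is left out
entirely, which is exactly the part that has no obvious home here.

```c
#include <stddef.h>

#define MAX_VCPUS 8		/* stand-in for KVM_MAX_VCPUS */

struct vcpu {
	int vcpu_id;
};

/* Hypothetical iterator: a table with possibly empty slots plus a cursor. */
struct vcpu_iter {
	struct vcpu **slots;
	size_t pos;
};

/* Return the first non-NULL slot at or after it->pos, or NULL when done. */
static struct vcpu *fev_scan(struct vcpu_iter *it)
{
	for (; it->pos < MAX_VCPUS; it->pos++)
		if (it->slots[it->pos])
			return it->slots[it->pos];
	return NULL;
}

static struct vcpu *fev_init(struct vcpu_iter *it)
{
	it->pos = 0;
	return fev_scan(it);
}

static struct vcpu *fev_next(struct vcpu_iter *it, struct vcpu *cur)
{
	(void)cur;		/* the cursor lives in the iterator */
	it->pos++;
	return fev_scan(it);
}

#define for_each_vcpu(vcpu, it) \
	for (vcpu = fev_init(it); vcpu; vcpu = fev_next(it, vcpu))

/* Example user: add up the ids of all vcpus that are present. */
static int sum_vcpu_ids(struct vcpu **slots)
{
	struct vcpu_iter it = { .slots = slots, .pos = 0 };
	struct vcpu *v;
	int sum = 0;

	for_each_vcpu(v, &it)
		sum += v->vcpu_id;
	return sum;
}
```

Because the loop ends when the helper returns NULL, any unlock would have to
happen inside the helper on the last step, and a break out of the loop body
would skip it.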



-- 
error compiling committee.c: too many arguments to function


Re: [PATCH] kvm: make vcpu life cycle separated from kvm instance

2011-12-05 Thread Gleb Natapov

On Mon, Dec 05, 2011 at 11:30:51AM +0200, Avi Kivity wrote:
 On 12/05/2011 07:29 AM, Liu ping fan wrote:
  like this,
  #define kvm_for_each_vcpu(idx, cnt, vcpup, kvm) \
    for (idx = 0, cnt = 0, vcpup = kvm_get_vcpu(kvm, idx); \
         cnt < atomic_read(&kvm->online_vcpus) && \
         idx < KVM_MAX_VCPUS; \
         idx++, (vcpup == NULL) ?: cnt++, vcpup = kvm_get_vcpu(kvm, idx)) \
            if (vcpup == NULL) \
                 continue; \
            else
 
 
  A little ugly, but have not thought a better way out :-)
 
 
 #define kvm_for_each_vcpu(vcpu, it) for (vcpu = kvm_fev_init(it); vcpu;
 vcpu = kvm_fev_next(it, vcpu))
 
 Though that doesn't give a good place for rcu_read_unlock().
 
 
Why not use rculist to store vcpus and use list_for_each_entry_rcu()?

--
Gleb.

Re: [PATCH] virtio-ring: Use threshold for switching to indirect descriptors

2011-12-05 Thread Avi Kivity

On 12/05/2011 02:10 AM, Rusty Russell wrote:
 On Sun, 04 Dec 2011 17:16:59 +0200, Avi Kivity a...@redhat.com wrote:
  On 12/04/2011 05:11 PM, Michael S. Tsirkin wrote:
There's also the used ring, but that's a
mistake if you have out of order completion.  We should have used 
copying.
  
   Seems unrelated... unless you want used to be written into
   descriptor ring itself?
  
  The avail/used rings are in addition to the regular ring, no?  If you
  copy descriptors, then it goes away.

 There were two ideas which drove the current design:

 1) The Van-Jacobson style "no two writers to same cacheline makes rings
    fast" idea.  Empirically, this doesn't show any winnage.

Write/write is the same as write/read or read/write.  Both cases have to
send a probe and wait for the result.  What we really need is to
minimize cache line ping ponging, and the descriptor pool fails that
with ooo completion.  I doubt it's measurable though except with the
very fastest storage providers.

 2) Allowing a generic inter-guest copy mechanism, so we could have
genuinely untrusted driver domains.  Yet no one ever did this so it's
hardly a killer feature :(

It's still a goal, though not an important one.  But we have to
translate rings anyway, don't we, since buffers are in guest physical
addresses, and we're moving into an address space that doesn't map those.

I thought of having a vhost-copy driver that could do ring translation,
using a dma engine for the copy.

 So if we're going to revisit and drop those requirements, I'd say:

 1) Shared device/driver rings like Xen.  Xen uses device-specific ring
contents, I'd be tempted to stick to our pre-headers, and a 'u64
addr; u64 len_and_flags; u64 cookie;' generic style.  Then use
the same ring for responses.  That's a slight space-win, since
we're 24 bytes vs 26 bytes now.

Let's cheat and have inline contents.  Take three bits from
len_and_flags to specify additional descriptors as inline data.  Also,
stuff the cookie into len_and_flags as well.

 2) Stick with physically-contiguous rings, but use them of size (2^n)-1.
Makes the indexing harder, but that -1 lets us stash the indices in
the first entry and makes the ring a nice 2^n size.

Allocate at least a cache line for those.  The 2^n size is not really
material, a division is never necessary.
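
To make the layout under discussion concrete, here is a sketch of the
proposed 24-byte descriptor with the top three bits of len_and_flags used as
an inline-descriptor count. The field names come from the quoted proposal;
the bit positions and helper names are assumptions picked for the example,
and the sketch keeps the cookie in its own field rather than folding it into
len_and_flags.

```c
#include <stdint.h>

/* Generic descriptor from the quoted proposal: 3 x u64 = 24 bytes. */
struct ring_desc {
	uint64_t addr;
	uint64_t len_and_flags;
	uint64_t cookie;
};

/*
 * Assumed layout: bits 63..61 count the following descriptors that carry
 * inline data; the low 61 bits hold the length and any other flags.
 */
#define DESC_INLINE_SHIFT 61
#define DESC_INLINE_MASK  (UINT64_C(7) << DESC_INLINE_SHIFT)

static uint64_t desc_pack(uint64_t len, unsigned int inline_descs)
{
	return (len & ~DESC_INLINE_MASK) |
	       ((uint64_t)(inline_descs & 7) << DESC_INLINE_SHIFT);
}

static unsigned int desc_inline_count(uint64_t len_and_flags)
{
	return (unsigned int)(len_and_flags >> DESC_INLINE_SHIFT);
}

static uint64_t desc_len(uint64_t len_and_flags)
{
	return len_and_flags & ~DESC_INLINE_MASK;
}
```

With three bits, one descriptor can announce up to seven trailing
descriptors of inline data.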

16kB worth of descriptors is 1024 entries.  With 4kB buffers, that's 4MB
worth of data, or 4 ms at 10GbE line speed.  With 1500 byte buffers it's
just 1.5 ms.  In any case I think it's sufficient.
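
The back-of-the-envelope numbers can be reproduced with a few lines of C.
This sketch assumes 16-byte descriptors (which is what 16kB for 1024 entries
implies), one buffer per descriptor, and a raw 10 Gbit/s link with no framing
overhead; the function names are made up for the example. Under those
assumptions the results come out at roughly 3.4 ms and 1.2 ms, a little under
the rounded figures above.

```c
#include <stdint.h>

/* Number of descriptors that fit in a ring of ring_bytes. */
static uint64_t ring_entries(uint64_t ring_bytes, uint64_t desc_bytes)
{
	return ring_bytes / desc_bytes;
}

/*
 * Time in milliseconds to transmit `entries` buffers of buf_bytes each
 * over a link running at link_bps bits per second.
 */
static double drain_time_ms(uint64_t entries, uint64_t buf_bytes,
			    double link_bps)
{
	double bits = (double)(entries * buf_bytes) * 8.0;

	return bits / link_bps * 1000.0;
}
```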
  
   Right. So I think that without indirect, we waste about 3 entries
   per packet for virtio header and transport etc headers.
  
  That does suck.  Are there issues in increasing the ring size?  Or
  making it discontiguous?

 Because the qemu implementation is broken.  

I was talking about something else, but this is more important.  Every
time we make a simplifying assumption, it turns around and bites us, and
the code becomes twice as complicated as it would have been in the first
place, and the test matrix explodes.

 We can often put the virtio
 header at the head of the packet.  In practice, the qemu implementation
 insists the header be a single descriptor.

 (At least, it used to, perhaps it has now been fixed.  We need a
 VIRTIO_NET_F_I_NOW_CONFORM_TO_THE_DAMN_SPEC_SORRY_I_SUCK bit).

We'll run out of bits in no time.

 We currently use small rings: the guest can't negotiate so qemu has to
 offer a lowest-common-denominator value.  The new virtio-pci layout
 fixes this, and lets the guest set the ring size.

Ok good.  Note that figuring out the best ring size needs some info from
the host, but that can be had from other channels.

  Can you take a peek at how Xen manages its rings?  They have the same
  problems we do.

 Yes, I made some mistakes, but I did steal from them in the first
 place...

There was a bit of second system syndrome there.  And I don't understand
how the ring/pool issue didn't surface during review, it seems so
obvious now but completely eluded me then.

-- 
error compiling committee.c: too many arguments to function


Re: [PATCH] kvm: make vcpu life cycle separated from kvm instance

2011-12-05 Thread Avi Kivity

On 12/05/2011 11:42 AM, Gleb Natapov wrote:
 On Mon, Dec 05, 2011 at 11:30:51AM +0200, Avi Kivity wrote:
  On 12/05/2011 07:29 AM, Liu ping fan wrote:
   like this,
   #define kvm_for_each_vcpu(idx, cnt, vcpup, kvm) \
     for (idx = 0, cnt = 0, vcpup = kvm_get_vcpu(kvm, idx); \
          cnt < atomic_read(&kvm->online_vcpus) && \
          idx < KVM_MAX_VCPUS; \
          idx++, (vcpup == NULL) ?: cnt++, vcpup = kvm_get_vcpu(kvm, idx)) \
             if (vcpup == NULL) \
                  continue; \
             else
  
  
   A little ugly, but have not thought a better way out :-)
  
  
  #define kvm_for_each_vcpu(vcpu, it) for (vcpu = kvm_fev_init(it); vcpu;
  vcpu = kvm_fev_next(it, vcpu))
  
  Though that doesn't give a good place for rcu_read_unlock().
  
  
 Why not use rculist to store vcpus and use list_for_each_entry_rcu()?

We can, but that's a bigger change.

-- 
error compiling committee.c: too many arguments to function


Re: [RFC][PATCH 14/16] kvm: x86: Add user space part for in-kernel i8259

2011-12-05 Thread Avi Kivity

On 12/04/2011 11:38 PM, Jan Kiszka wrote:
  
  It should be also possible to migrate from non-KVM device to KVM
  version, different names would prevent that for ever.

 It is (theoretically) possible with these patches as the vmstate names
 are the same. KVM to TCG migration does not work right now, so I was
 only able to test in-kernel <-> user space irqchip model migrations.

btw, for the next-gen migration protocol, we'd probably be using QOM
paths, not vmstate names; the QOM paths would include the device name?

-- 
error compiling committee.c: too many arguments to function


Re: [PATCH] kvm: make vcpu life cycle separated from kvm instance

2011-12-05 Thread Gleb Natapov

On Mon, Dec 05, 2011 at 11:58:56AM +0200, Avi Kivity wrote:
 On 12/05/2011 11:42 AM, Gleb Natapov wrote:
  On Mon, Dec 05, 2011 at 11:30:51AM +0200, Avi Kivity wrote:
   On 12/05/2011 07:29 AM, Liu ping fan wrote:
like this,
#define kvm_for_each_vcpu(idx, cnt, vcpup, kvm) \
  for (idx = 0, cnt = 0, vcpup = kvm_get_vcpu(kvm, idx); \
       cnt < atomic_read(&kvm->online_vcpus) && \
       idx < KVM_MAX_VCPUS; \
       idx++, (vcpup == NULL) ?: cnt++, vcpup = kvm_get_vcpu(kvm, idx)) \
          if (vcpup == NULL) \
               continue; \
          else
   
   
A little ugly, but have not thought a better way out :-)
   
   
   #define kvm_for_each_vcpu(vcpu, it) for (vcpu = kvm_fev_init(it); vcpu;
   vcpu = kvm_fev_next(it, vcpu))
   
   Though that doesn't give a good place for rcu_read_unlock().
   
   
  Why not use rculist to store vcpus and use list_for_each_entry_rcu()?
 
 We can, but that's a bigger change.
 
Is it? I do not see a lot of accesses to vcpu array except those loops.

--
Gleb.

Re: [PATCH] kvm: make vcpu life cycle separated from kvm instance

2011-12-05 Thread Avi Kivity

On 12/05/2011 12:18 PM, Gleb Natapov wrote:
  
  We can, but that's a bigger change.
  
 Is it? I do not see a lot of accesses to vcpu array except those loops.


Well actually some of those loops have to go away and be replaced by a
hash lookup with apic id as key.

-- 
error compiling committee.c: too many arguments to function


Re: [net-next RFC PATCH 2/5] tuntap: simple flow director support

2011-12-05 Thread Stefan Hajnoczi

On Mon, Dec 5, 2011 at 8:58 AM, Jason Wang jasow...@redhat.com wrote:
 This patch adds a simple flow director to tun/tap device. It is just a
 page that contains the hash to queue mapping which could be changed by
 user-space. The backend (tap/macvtap) would query this table to get
 the desired queue of a packet when it sends packets to userspace.

 The page address is set through a new kind of ioctl - TUNSETFD - and
 is pinned until the device exits or another page is specified.

Please use "flow" or "fdir" instead of "fd" in the ioctl and code.
"fd" is reminiscent of a file descriptor.  The ixgbe driver uses "fdir".
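
For readers without the patch at hand, the mechanism described in the quoted
text amounts to a lookup table indexed by flow hash. A minimal sketch using
"fdir" naming follows; the page size, the 16-bit entry type and the modulo
indexing are assumptions for illustration, not the patch's actual layout.

```c
#include <stdint.h>

#define FDIR_PAGE_SIZE 4096
#define FDIR_ENTRIES   (FDIR_PAGE_SIZE / sizeof(uint16_t))

/* One page worth of hash -> queue entries (assumed format). */
struct fdir_table {
	uint16_t queue[FDIR_ENTRIES];
};

/* What user space would do: steer a flow hash to a queue. */
static void fdir_set(struct fdir_table *t, uint32_t hash, uint16_t queue)
{
	t->queue[hash % FDIR_ENTRIES] = queue;
}

/* What the backend would do per packet: pick the queue for a flow hash. */
static uint16_t fdir_lookup(const struct fdir_table *t, uint32_t hash)
{
	return t->queue[hash % FDIR_ENTRIES];
}
```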

Stefan

Re: [PATCH] kvm: make vcpu life cycle separated from kvm instance

2011-12-05 Thread Gleb Natapov

On Mon, Dec 05, 2011 at 12:22:53PM +0200, Avi Kivity wrote:
 On 12/05/2011 12:18 PM, Gleb Natapov wrote:
   
   We can, but that's a bigger change.
   
  Is it? I do not see a lot of accesses to vcpu array except those loops.
 
 
 Well actually some of those loops have to go away and be replaced by a
 hash lookup with apic id as key.
 
Yes, but apic ids are guest controllable, so there should be a separate hash
that holds the mapping from vcpu to guest-configured apic id. That shouldn't
prevent us from moving to rculist now.
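
A minimal userspace sketch of such a separate mapping, assuming a small
chained hash keyed by the guest-configured apic id. All names are
hypothetical and the RCU/locking needed in the kernel is omitted; when the
guest rewrites its apic id, the vcpu is removed and re-inserted under the
new key.

```c
#include <stddef.h>
#include <stdint.h>

#define APIC_HASH_BUCKETS 16

struct vcpu {
	int vcpu_id;
	uint32_t apic_id;		/* guest-configured, may change */
	struct vcpu *hash_next;		/* chain within one bucket */
};

struct apic_map {
	struct vcpu *bucket[APIC_HASH_BUCKETS];
};

static unsigned int apic_hash(uint32_t apic_id)
{
	return apic_id % APIC_HASH_BUCKETS;
}

static void apic_map_insert(struct apic_map *m, struct vcpu *v)
{
	unsigned int b = apic_hash(v->apic_id);

	v->hash_next = m->bucket[b];
	m->bucket[b] = v;
}

/* Must be called before v->apic_id is changed. */
static void apic_map_remove(struct apic_map *m, struct vcpu *v)
{
	struct vcpu **pp = &m->bucket[apic_hash(v->apic_id)];

	for (; *pp; pp = &(*pp)->hash_next) {
		if (*pp == v) {
			*pp = v->hash_next;
			v->hash_next = NULL;
			return;
		}
	}
}

static struct vcpu *apic_map_lookup(const struct apic_map *m,
				    uint32_t apic_id)
{
	struct vcpu *v;

	for (v = m->bucket[apic_hash(apic_id)]; v; v = v->hash_next)
		if (v->apic_id == apic_id)
			return v;
	return NULL;
}
```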

--
Gleb.

Re: [Qemu-devel] [RFC][PATCH 02/16] kvm: Move kvmclock into hw/kvm folder

2011-12-05 Thread Andreas Färber

Am 03.12.2011 23:33, schrieb Jan Kiszka:
 On 2011-12-03 20:00, Andreas Färber wrote:
 Am 03.12.2011 12:17, schrieb Jan Kiszka:
 diff --git a/hw/kvmclock.c b/hw/kvm/clock.c
 similarity index 96%
 rename from hw/kvmclock.c
 rename to hw/kvm/clock.c
 index 5388bc4..aa37c5d 100644
 --- a/hw/kvmclock.c
 +++ b/hw/kvm/clock.c
 @@ -11,11 +11,11 @@
   *
   */
  
 -#include "qemu-common.h"
 -#include "sysemu.h"
 -#include "sysbus.h"
 -#include "kvm.h"
 -#include "kvmclock.h"
 +#include <qemu-common.h>
 +#include <sysemu.h>
 +#include <kvm.h>
 +#include <hw/sysbus.h>
 +#include <hw/kvm/clock.h>
  
  #include <linux/kvm.h>
  #include <linux/kvm_para.h>
 
 Please don't start using system includes for everything. Rather
 extend QEMU_CFLAGS to contain the right user include path(s).
 
 No problem - and no need to tweak any CFLAGS

Right, I had recursion into kvm/ in mind - would've required -I ../..
to be added to CFLAGS.

 ("" only adds "." to the header search paths).

By default that is. -iquote can add further paths. (Unfortunately
didn't solve the Cocoa Block.h vs. block.h problem since Objective-C
frameworks use quotes, too.)

 Do we have a convention that every include in <> is considered
 system header? Should probably be documented then (and code should
 be converted gradually).

The convention I perceived was that everything QEMU was in quotes
whereas POSIX, Linux, zlib, glib, etc. were in angle brackets. Didn't
check for documentation.

Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

Re: [net-next RFC PATCH 5/5] virtio-net: flow director support

2011-12-05 Thread Stefan Hajnoczi

On Mon, Dec 5, 2011 at 8:59 AM, Jason Wang jasow...@redhat.com wrote:
 +static int virtnet_set_fd(struct net_device *dev, u32 pfn)
 +{
 +       struct virtnet_info *vi = netdev_priv(dev);
 +       struct virtio_device *vdev = vi->vdev;
 +
 +       if (virtio_has_feature(vdev, VIRTIO_NET_F_HOST_FD)) {
 +               vdev->config->set(vdev,
 +                                 offsetof(struct virtio_net_config_fd, addr),
 +                                 &pfn, sizeof(u32));

Please use the virtio model (i.e. virtqueues) instead of shared
memory.  Mapping a page breaks the virtio abstraction.

Stefan

Re: [RFC][PATCH 14/16] kvm: x86: Add user space part for in-kernel i8259

2011-12-05 Thread Jan Kiszka

On 2011-12-05 11:01, Avi Kivity wrote:
 On 12/04/2011 11:38 PM, Jan Kiszka wrote:

 It should be also possible to migrate from non-KVM device to KVM
 version, different names would prevent that for ever.

 It is (theoretically) possible with these patches as the vmstate names
 are the same. KVM to TCG migration does not work right now, so I was
 only able to test in-kernel <-> user space irqchip model migrations.
 
 btw, for the next-gen migration protocol, we'd probably be using QOM
 paths, not vmstate names; the QOM paths would include the device name?

That would be a very bad idea IMHO. Every refactoring of your device
tree, e.g. to model CPU hotplug and the ICC bus more accurately, would
risk to create a migration crack. At least we would need some stable
naming and/or alias concept then.

Jan




[kvm-autotest] tests.cgroup: Add 2 new tests of cpuset.cpus cgroup functionality

2011-12-05 Thread Lukas Doktor


Hi,

This patchset fixes some issues in cgroup_common.py library and adds 2 new 
tests to cgroup-kvm test.

Please find the details in each patch.

Sent to upstream as pull req. 103:
https://github.com/autotest/autotest/pull/103

Regards,
Lukáš


[PATCH 2/3] [kvm-autotest] tests.cgroup: Add TestCpusetCpus test

2011-12-05 Thread Lukas Doktor

Tests the cpuset.cpus cgroup feature. It stresses all VM's CPUs
and changes the CPU affinity. Verifies correct behaviour.

* Add TestCpusetCpus test
* import cleanup
* private function names cleanup
---
 client/tests/cgroup/cgroup_common.py |2 +
 client/tests/kvm/tests/cgroup.py |  211 +++---
 2 files changed, 194 insertions(+), 19 deletions(-)

diff --git a/client/tests/cgroup/cgroup_common.py 
b/client/tests/cgroup/cgroup_common.py
index 186bf09..fe1601b 100755
--- a/client/tests/cgroup/cgroup_common.py
+++ b/client/tests/cgroup/cgroup_common.py
@@ -152,6 +152,8 @@ class Cgroup(object):
 
 if pwd == None:
 pwd = self.root
+if isinstance(pwd, int):
+pwd = self.cgroups[pwd]
 try:
 # Remove tailing '\n' from each line
 ret = [_[:-1] for _ in open(pwd+prop, 'r').readlines()]
diff --git a/client/tests/kvm/tests/cgroup.py b/client/tests/kvm/tests/cgroup.py
index ee6ef2e..23ae622 100644
--- a/client/tests/kvm/tests/cgroup.py
+++ b/client/tests/kvm/tests/cgroup.py
@@ -7,8 +7,8 @@ import logging, os, re, sys, tempfile, time
 from random import random
 from autotest_lib.client.common_lib import error
 from autotest_lib.client.bin import utils
-from autotest_lib.client.tests.cgroup.cgroup_common import Cgroup, 
CgroupModules
-from autotest_lib.client.virt import virt_utils, virt_env_process
+from autotest_lib.client.tests.cgroup.cgroup_common import (Cgroup,
+CgroupModules, get_load_per_cpu)
 from autotest_lib.client.virt.aexpect import ExpectTimeoutError
 from autotest_lib.client.virt.aexpect import ExpectProcessTerminatedError
 
@@ -839,7 +839,7 @@ def run_cgroup(test, params, env):
  * Freezes the guest and thaws it again couple of times
  * verifies that guest is frozen and runs when expected
 
-def get_stat(pid):
+def _get_stat(pid):
 
 Gather statistics of pid+1st level subprocesses cpu usage
 @param pid: PID of the desired process
@@ -877,9 +877,9 @@ def run_cgroup(test, params, env):
 _ = cgroup.get_property('freezer.state', cgroup.cgroups[0])
 if 'FROZEN' not in _:
 raise error.TestFail(Couldn't freeze the VM: state %s % 
_)
-stat_ = get_stat(pid)
+stat_ = _get_stat(pid)
 time.sleep(tsttime)
-stat = get_stat(pid)
+stat = _get_stat(pid)
 if stat != stat_:
 raise error.TestFail('Process was running in FROZEN state; 
'
  'stat=%s, stat_=%s, diff=%s' %
@@ -887,9 +887,9 @@ def run_cgroup(test, params, env):
 logging.info(THAWING (%ss), tsttime)
 self.cgroup.set_property('freezer.state', 'THAWED',
  self.cgroup.cgroups[0])
-stat_ = get_stat(pid)
+stat_ = _get_stat(pid)
 time.sleep(tsttime)
-stat = get_stat(pid)
+stat = _get_stat(pid)
 if (stat - stat_)  (90*tsttime):
 raise error.TestFail('Process was not active in FROZEN'
  'state; stat=%s, stat_=%s, diff=%s' %
@@ -1186,7 +1186,7 @@ def run_cgroup(test, params, env):
 Let each of 3 scenerios (described in test specification) stabilize
 and then measure the CPU utilisation for time_test time.
 
-def get_stat(f_stats, _stats=None):
+def _get_stat(f_stats, _stats=None):
  Reads CPU times from f_stats[] files and sumarize them. 
 if _stats is None:
 _stats = []
@@ -1218,27 +1218,27 @@ def run_cgroup(test, params, env):
 for thread_count in range(0, host_cpus):
 sessions[thread_count].sendline(cmd)
 time.sleep(time_init)
-_stats = get_stat(f_stats)
+_stats = _get_stat(f_stats)
 time.sleep(time_test)
-stats.append(get_stat(f_stats, _stats))
+stats.append(_get_stat(f_stats, _stats))
 
 thread_count += 1
 sessions[thread_count].sendline(cmd)
 if host_cpus % no_speeds == 0 and no_speeds = host_cpus:
 time.sleep(time_init)
-_stats = get_stat(f_stats)
+_stats = _get_stat(f_stats)
 time.sleep(time_test)
-stats.append(get_stat(f_stats, _stats))
+stats.append(_get_stat(f_stats, _stats))
 
 for i in range(thread_count+1, no_threads):
 sessions[i].sendline(cmd)
 time.sleep(time_init)
-_stats = get_stat(f_stats)
+_stats = _get_stat(f_stats)
 for j in range(3):
-

[PATCH 3/3] [kvm-autotest] tests.cgroup: Add TestCpusetCpusSwitching

2011-12-05 Thread Lukas Doktor

Tests the cpuset.cpus cgroup feature. It stresses all VM's CPUs
while switching between cgroups with different setting.

Signed-off-by: Lukas Doktor ldok...@redhat.com
---
 client/tests/cgroup/cgroup_common.py |4 +
 client/tests/kvm/tests/cgroup.py |  108 +-
 2 files changed, 109 insertions(+), 3 deletions(-)

diff --git a/client/tests/cgroup/cgroup_common.py 
b/client/tests/cgroup/cgroup_common.py
index fe1601b..56856c0 100755
--- a/client/tests/cgroup/cgroup_common.py
+++ b/client/tests/cgroup/cgroup_common.py
@@ -105,6 +105,8 @@ class Cgroup(object):
 @param pwd: cgroup directory
 @return: 0 when is 'pwd' member
 
+if isinstance(pwd, int):
+pwd = self.cgroups[pwd]
 if open(pwd + '/tasks').readlines().count(%d\n % pid)  0:
 return 0
 else:
@@ -126,6 +128,8 @@ class Cgroup(object):
 @param pid: pid of the process
 @param pwd: cgroup directory
 
+if isinstance(pwd, int):
+pwd = self.cgroups[pwd]
 try:
 open(pwd+'/tasks', 'w').write(str(pid))
 except Exception, inst:
diff --git a/client/tests/kvm/tests/cgroup.py b/client/tests/kvm/tests/cgroup.py
index 23ae622..2e18ef7 100644
--- a/client/tests/kvm/tests/cgroup.py
+++ b/client/tests/kvm/tests/cgroup.py
@@ -51,13 +51,12 @@ def run_cgroup(test, params, env):
 @param cgroup: cgroup handler
 @param pwd: desired cgroup's pwd, cgroup index or None for root cgroup
 
-if isinstance(pwd, int):
-pwd = cgroup.cgroups[pwd]
 cgroup.set_cgroup(vm.get_shell_pid(), pwd)
 for pid in utils.get_children_pids(vm.get_shell_pid()):
 cgroup.set_cgroup(int(pid), pwd)
 
 
+
 def distance(actual, reference):
 
 Absolute value of relative distance of two numbers
@@ -1341,7 +1340,7 @@ def run_cgroup(test, params, env):
 except Exception, failure_detail:
 err += \nCan't remove Cgroup: %s % failure_detail
 
-self.sessions[0].sendline('rm -f /tmp/cgroup-cpu-lock')
+self.sessions[-1].sendline('rm -f /tmp/cgroup-cpu-lock')
 for i in range(len(self.sessions)):
 try:
 self.sessions[i].close()
@@ -1381,6 +1380,7 @@ def run_cgroup(test, params, env):
 self.sessions.append(self.vm.wait_for_login(timeout=30))
 self.sessions[i].cmd(touch /tmp/cgroup-cpu-lock)
 self.sessions[i].sendline(cmd)
+self.sessions.append(self.vm.wait_for_login(timeout=30))   # 
cleanup
 
 
 def run(self):
@@ -1485,8 +1485,109 @@ def run_cgroup(test, params, env):
 logging.error(err)
 raise error.TestFail(err)
 
+logging.info(Test passed successfully)
 return (All clear)
 
+
+class TestCpusetCpusSwitching:
+
+Tests the cpuset.cpus cgroup feature. It stresses all VM's CPUs
+while switching between cgroups with different setting.
+
+def __init__(self, vms, modules):
+
+Initialization
+@param vms: list of vms
+@param modules: initialized cgroup module class
+
+self.vm = vms[0]  # Virt machines
+self.modules = modules  # cgroup module handler
+self.cgroup = Cgroup('cpuset', '')   # cgroup handler
+self.sessions = []
+
+
+def cleanup(self):
+ Cleanup 
+err = 
+try:
+del(self.cgroup)
+except Exception, failure_detail:
+err += \nCan't remove Cgroup: %s % failure_detail
+
+self.sessions[-1].sendline('rm -f /tmp/cgroup-cpu-lock')
+for i in range(len(self.sessions)):
+try:
+self.sessions[i].close()
+except Exception, failure_detail:
+err += (\nCan't close the %dst ssh connection % i)
+
+if err:
+logging.error(Some cleanup operations failed: %s, err)
+raise error.TestError(Some cleanup operations failed: %s %
+  err)
+
+
+def init(self):
+
+Prepares cgroup, moves VM into it and execute stressers.
+
+self.cgroup.initialize(self.modules)
+vm_cpus = int(params.get('smp', 1))
+all_cpus = self.cgroup.get_property(cpuset.cpus)[0]
+if all_cpus == 0:
+raise error.TestFail(This test needs at least 2 CPUs on 
+ host, cpuset=%s % all_cpus)
+try:
+last_cpu = int(all_cpus.split('-')[1])
+except Exception:
+raise error.TestFail(Failed to get #CPU from root cgroup.)
+
+# Comments are for vm_cpus=2, no_cpus=4, _SC_CLK_TCK=100
+

[PATCH 1/3] [autotest] client.tests.cgroup: Replace LoadPerCpu() by get_load_per_cpu

2011-12-05 Thread Lukas Doktor

* Move LoadPerCpu into cgroup_common.py (cgroup-kvm will need it too)
* [FIX] Use etraceback
* Code cleanup
---
 client/tests/cgroup/cgroup.py|   79 ++
 client/tests/cgroup/cgroup_common.py |   22 +
 2 files changed, 35 insertions(+), 66 deletions(-)

diff --git a/client/tests/cgroup/cgroup.py b/client/tests/cgroup/cgroup.py
index 207a0d7..000e562 100755
--- a/client/tests/cgroup/cgroup.py
+++ b/client/tests/cgroup/cgroup.py
@@ -12,9 +12,7 @@ from tempfile import NamedTemporaryFile
 
 from autotest_lib.client.bin import test, utils
 from autotest_lib.client.common_lib import error
-from cgroup_common import Cgroup as CG
-from cgroup_common import CgroupModules
-from cgroup_common import _traceback
+from cgroup_common import Cgroup, CgroupModules, get_load_per_cpu
 
 class cgroup(test.test):
 
@@ -48,7 +46,7 @@ class cgroup(test.test):
 logging.info(--- 'test_%s' FAILED ---, subtest)
 except Exception:
 err += %s,  % subtest
-tb = _traceback(test_%s % subtest, sys.exc_info())
+tb = utils.etraceback(test_%s % subtest, sys.exc_info())
 logging.error(test_%s: FAILED%s, subtest, tb)
 logging.info(--- 'test_%s' FAILED ---, subtest)
 
@@ -75,7 +73,6 @@ class cgroup(test.test):
 def cleanup(self):
  Cleanup 
 logging.debug('cgroup_test cleanup')
-print Cleanup
 del (self.modules)
 
 
@@ -102,7 +99,7 @@ class cgroup(test.test):
 raise error.TestFail(Some parts of cleanup failed%s % 
err)
 
 # Preparation
-item = CG('memory', self._client)
+item = Cgroup('memory', self._client)
 item.initialize(self.modules)
 item.smoke_test()
 pwd = item.mk_cgroup()
@@ -116,8 +113,8 @@ class cgroup(test.test):
 mem = min(int(mem.split()[1])/1024, 1024)
 mem = max(mem, 100) # at least 100M
 try:
-memsw_limit_bytes = 
item.get_property(memory.memsw.limit_in_bytes)
-except error.TestFail:
+item.get_property(memory.memsw.limit_in_bytes)
+except error.TestError:
 # Doesn't support memsw limitation - disabling
 logging.info(System does not support 'memsw')
 utils.system(swapoff -a)
@@ -222,7 +219,8 @@ class cgroup(test.test):
 logging.debug(test_memory: Memfill mem + swap limit)
 ps = item.test(memfill %d %s % (mem, outf.name))
 item.set_cgroup(ps.pid, pwd)
-item.set_property_h(memory.memsw.limit_in_bytes, %dM%(mem/2), 
pwd)
+item.set_property_h(memory.memsw.limit_in_bytes, %dM%(mem/2),
+pwd)
 ps.stdin.write('\n')
 i = 0
 while ps.poll() == None:
@@ -266,56 +264,6 @@ class cgroup(test.test):
 Cpuset test
 1) Initiate CPU load on CPU0, than spread into CPU* - CPU0
 
-class LoadPerCpu:
-
-Handles the LoadPerCpu stats
-self.values [cpus, cpu0, cpu1, ...]
-
-def __init__(self):
-
-Init
-
-self.values = []
-self.stat = open('/proc/stat', 'r')
-line = self.stat.readline()
-while line:
-if line.startswith('cpu'):
-self.values.append(int(line.split()[1]))
-else:
-break
-line = self.stat.readline()
-
-def reload(self):
-
-Reload current values
-
-self.values = self.get()
-
-def get(self):
-
-Get the current values
-@return vals: array of current values [cpus, cpu0, cpu1..]
-
-self.stat.seek(0)
-self.stat.flush()
-vals = []
-for _ in range(len(self.values)):
-vals.append(int(self.stat.readline().split()[1]))
-return vals
-
-def tick(self):
-
-Reload values and returns the load between the last tick/reload
-@return vals: array of load between ticks/reloads
-  values [cpus, cpu0, cpu1..]
-
-vals = self.get()
-ret = []
-for i in range(len(self.values)):
-ret.append(vals[i] - self.values[i])
-self.values = vals
-return ret
-
 def cleanup(supress=False):
  cleanup 
 logging.debug(test_cpuset: Cleanup)
@@ -341,7 +289,7 @@ class cgroup(test.test):
 raise error.TestFail(Some parts of cleanup failed%s % 
err)
 
 # Preparation
-item = CG('cpuset',

Re: [RFC][PATCH 14/16] kvm: x86: Add user space part for in-kernel i8259

2011-12-05 Thread Avi Kivity

On 12/05/2011 01:37 PM, Jan Kiszka wrote:
 On 2011-12-05 11:01, Avi Kivity wrote:
  On 12/04/2011 11:38 PM, Jan Kiszka wrote:
 
  It should be also possible to migrate from non-KVM device to KVM
  version, different names would prevent that for ever.
 
  It is (theoretically) possible with these patches as the vmstate names
  are the same. KVM to TCG migration does not work right now, so I was
  only able to test in-kernel <-> user space irqchip model migrations.
  
  btw, for the next-gen migration protocol, we'd probably be using QOM
  paths, not vmstate names; the QOM paths would include the device name?

 That would be a very bad idea IMHO. Every refactoring of your device
 tree, e.g. to model CPU hotplug and the ICC bus more accurately, would
 risk to create a migration crack.

At some point, something has to be stable.  We can't have an infinite
number of layers giving names to things.  I propose we have just one layer.

  At least we would need some stable
 naming and/or alias concept then.

We should be able to transform a path to backward compatible names,
yes.  But if something has an unstable name, let's omit it in the first
place.

(the memory API added unstable names, hopefully the QOM can take over
the stable ones and we'll have a good way to denote the unstable ones).

-- 
error compiling committee.c: too many arguments to function


Re: [RFC][PATCH 14/16] kvm: x86: Add user space part for in-kernel i8259

2011-12-05 Thread Jan Kiszka

On 2011-12-05 13:36, Avi Kivity wrote:
 On 12/05/2011 01:37 PM, Jan Kiszka wrote:
 On 2011-12-05 11:01, Avi Kivity wrote:
 On 12/04/2011 11:38 PM, Jan Kiszka wrote:

 It should be also possible to migrate from non-KVM device to KVM
 version, different names would prevent that for ever.

 It is (theoretically) possible with these patches as the vmstate names
 are the same. KVM to TCG migration does not work right now, so I was
 only able to test in-kernel <-> user space irqchip model migrations.

 btw, for the next-gen migration protocol, we'd probably be using QOM
 paths, not vmstate names; the QOM paths would include the device name?

 That would be a very bad idea IMHO. Every refactoring of your device
 tree, e.g. to model CPU hotplug and the ICC bus more accurately, would
 risk to create a migration crack.
 
 At some point, something has to be stable.  We can't have an infinite
 number of layers giving names to things.  I propose we have just one layer.
 
  At least we would need some stable
 naming and/or alias concept then.
 
 We should be able to transform a path to backward compatible names,
 yes.  But if something has an unstable name, let's omit it in the first
 place.
 
 (the memory API added unstable names, hopefully the QOM can take over
 the stable ones and we'll have a good way to denote the unstable ones).
 

OK, maybe - or likely - we should make those device models have the same
names in QOM once instantiated. But I'm still convinced they should
remain separated models in contrast to a single model with a property.

The kvm ioapic, e.g., requires an additional property (gsi_base) that is
meaningless for user space devices. And its interrupts have to be
wired/configured differently at board model level. So, from the QEMU
POV, it is a very different device. Just the guest does not notice.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

Re: [PATCH V3] Guest stop notification

2011-12-05 Thread Eric B Munson

On Sat, 03 Dec 2011, Jan Kiszka wrote:

 On 2011-12-02 22:27, Eric B Munson wrote:
  On Fri, 02 Dec 2011, Jan Kiszka wrote:
  
  On 2011-12-02 20:19, Eric B Munson wrote:
  Often when a guest is stopped from the qemu console, it will report 
  spurious
  soft lockup warnings on resume.  There are kernel patches being discussed 
  that
  will give the host the ability to tell the guest that it is being stopped 
  and
  should ignore the soft lockup warning that generates.
 
  Signed-off-by: Eric B Munson emun...@mgebm.net
  Cc: Avi Kivity a...@redhat.com
  Cc: Marcelo Tosatti mtosa...@redhat.com
  Cc: Jan Kiszka jan.kis...@siemens.com
  Cc: ry...@linux.vnet.ibm.com
  Cc: aligu...@us.ibm.com
  Cc: kvm@vger.kernel.org
 
  ---
  Changes from V2:
   Move ioctl into hw/kvmclock.c so as other arches can use it as it is
  implemented
 
  Changes from V1:
   Remove unnecessary encapsulating function
 
   hw/kvmclock.c |   24 
   1 files changed, 24 insertions(+), 0 deletions(-)
 
  diff --git a/hw/kvmclock.c b/hw/kvmclock.c
  index 5388bc4..756839f 100644
  --- a/hw/kvmclock.c
  +++ b/hw/kvmclock.c
  @@ -16,6 +16,7 @@
   #include "sysbus.h"
   #include "kvm.h"
   #include "kvmclock.h"
  +#include "cpu-all.h"
   
   #include <linux/kvm.h>
   #include <linux/kvm_para.h>
  @@ -69,11 +70,34 @@ static void kvmclock_vm_state_change(void *opaque, 
  int running,
   }
   }
   
  +static void kvmclock_vm_state_change_vcpu(void *opaque, int running,
  +  RunState state)
  +{
  +int ret;
  +CPUState *penv = first_cpu;
  +
  +if (running) {
  + while (penv) {
 
  or: for (cpu = first_cpu; cpu != NULL; cpu = cpu->next_cpu) {
 
  
  Functionally equivalent and I see both in the code, is there a standard?
 
 Not really. I once tried to introduce an iterator macro, but it was
 refused. The above is just more compact.
 
 But this is only a minor nit.
 

Fair enough, since there will be a V4 I will switch to the for loop.
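
For reference, the compact for-loop form suggested above could look roughly like the sketch below. This is not the QEMU code: `struct cpu`, `mark_paused()` and `notify_all()` are hypothetical stand-ins, and only the `first_cpu` / `next_cpu` list shape is taken from the patch under review.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for CPUState; only the list link matters here. */
struct cpu {
    int paused;
    struct cpu *next_cpu;
};

/* Hypothetical per-CPU notification; returns 0 on success. */
static int mark_paused(struct cpu *cpu)
{
    cpu->paused = 1;
    return 0;
}

/*
 * Walk the singly linked CPU list with a for loop and stop at the
 * first error, mirroring the early return in the patch.
 */
static int notify_all(struct cpu *first_cpu, int (*notify)(struct cpu *))
{
    struct cpu *cpu;
    int ret;

    for (cpu = first_cpu; cpu != NULL; cpu = cpu->next_cpu) {
        ret = notify(cpu);
        if (ret) {
            return ret;
        }
    }
    return 0;
}
```

The loop variable, the termination test and the advance step all sit on one line, which is what makes this form harder to get wrong than the open-coded while loop.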

  
  +ret = kvm_vcpu_ioctl(penv, KVM_GUEST_PAUSED, 0);
  +if (ret) {
  +if (ret != ENOSYS) {
  +fprintf(stderr,
  +"kvmclock_vm_state_change_vcpu: %s\n",
  +strerror(-ret));
  +}
  +return;
  +}
  +penv = (CPUState *)penv->next_cpu;
 
  Unneeded cast.
 
  
  Also following an example seen elsewhere.
 
 Generally, we try to avoid those pointless casts.
 

Will remove for V4.

  
  +}
  +}
  +}
  +
 
  Again: please use checkpatch.pl.
 
  
  Sorry, tough to get used to hitting space bar that many times...
  
   static int kvmclock_init(SysBusDevice *dev)
   {
   KVMClockState *s = FROM_SYSBUS(KVMClockState, dev);
   
   qemu_add_vm_change_state_handler(kvmclock_vm_state_change, s);
  +qemu_add_vm_change_state_handler(kvmclock_vm_state_change_vcpu, 
  NULL);
   return 0;
   }
   
 
  Why not extend the existing handler?
  
  Because the new handler doesn't touch the KVMClockState object.  If this is
  preferred, I have no objection.
 
 The separate registration looks strange to me. And the fact that you
 don't need to object doesn't justify a callback of its own.
 

I think you misunderstood me; I meant I have no objection to doing it your way if
you have a strong opinion (as it seems you do).

  
 
  I still wonder if the IOCTL interface is actually kvmclock specific. But
  Marcello asked for this, and we could still change it when some arch
  comes around that provides it independent of kvmclock.
  
  The flag itself is stored in the pvclock_vcpu_time_info structure, and 
  anything
  else that touches that structure uses ioctls.
 
 That's the host-guest interface. But I'm talking about the kvm-qemu
 interface here which has no relation to how the "was paused" information
 is transferred to the guest.
 
 Jan
 





Re: [RFC][PATCH 14/16] kvm: x86: Add user space part for in-kernel i8259

2011-12-05 Thread Avi Kivity

On 12/05/2011 02:47 PM, Jan Kiszka wrote:
  
  (the memory API added unstable names, hopefully the QOM can take over
  the stable ones and we'll have a good way to denote the unstable ones).
  

 OK, maybe - or likely - we should make those device models have the same
 names in QOM once instantiated. But I'm still convinced they should
 remain separated models in contrast to a single model with a property.

What do you mean by separate models?  You share all the code you can,
and don't share the code you can't.  To me, single model == single name.

 The kvm ioapic, e.g., requires an additional property (gsi_base) that is
 meaningless for user space devices. And its interrupts have to be
 wired/configured differently at board model level. So, from the QEMU
 POV, it is a very different device. Just the guest does not notice.

It's like qcow2 and raw/native IO are wired differently, or virtio-net
and vhost-net.  But it's the same IDE device or virtio NIC.

-- 
error compiling committee.c: too many arguments to function


Re: winXP Standard PC HAL and qemu-kvm >= 0.15

2011-12-05 Thread Avi Kivity

On 12/05/2011 11:21 AM, Michael Tokarev wrote:
 As it turned out, a windowsXP machine does not work in
 qemu-kvm >= 0.15 (it loses network and USB entirely)
 if it is using Standard PC HAL.  In 0.14 it worked
 fine, but not in 0.15 (I haven't tried any in-between
 versions yet).

 There are several HAL types available in winXP: these
 are Uniprocessor PC with MPS (or Multiprocessor),
 also two ACPI types, and Standard PC.  All the other
 HAL types appear to work fine, but not Standard PC.

 I haven't debugged further yet, because it was
 not easy to find out what was causing the regression
 and how to reproduce it, and also because I don't think
 it is the right HAL for qemu-kvm guest anyway.

It's not, but the regression indicates we broke something.  It would be
good to know what that is.

 So, if anybody has some thoughts about this issue,
 and especially if you know a way to switch winXP HAL
 type to some ACPI variant without reinstalling, please
 speak up.. ;)

I remember doing it somewhere in device manager, perhaps in the
processor entry.  But it was years since I last did this.

 Debian bugreport for a reference: http://bugs.debian.org/647312

 Reproducer: install a winXP guest on kvm with -no-acpi so
 it chooses an Uniprocessor with MPS HAL.  Switch it to
 Standard PC in device manager, reboot -- in 0.15+ it does
 not work anymore, while in 0.14 it continues to work fine.

Most likely non-ACPI interrupt routing.

-- 
error compiling committee.c: too many arguments to function


Re: [RFC][PATCH 14/16] kvm: x86: Add user space part for in-kernel i8259

2011-12-05 Thread Jan Kiszka

On 2011-12-05 14:14, Avi Kivity wrote:
 On 12/05/2011 02:47 PM, Jan Kiszka wrote:

 (the memory API added unstable names, hopefully the QOM can take over
 the stable ones and we'll have a good way to denote the unstable ones).


 OK, maybe - or likely - we should make those device models have the same
 names in QOM once instantiated. But I'm still convinced they should
 remain separated models in contrast to a single model with a property.
 
 What do you mean by separate models?  You share all the code you can,
 and don't share the code you can't.  To me, single model == single name.

But different configuration.

 
 The kvm ioapic, e.g., requires an additional property (gsi_base) that is
 meaningless for user space devices. And its interrupts have to be
 wired/configured differently at board model level. So, from the QEMU
 POV, it is a very different device. Just the guest does not notice.
 
 It's like qcow2 and raw/native IO are wired differently, or virtio-net
 and vhost-net.  But it's the same IDE device or virtio NIC.

That would mean introducing a backend/frontend concept for irqchips.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

Re: [RFC][PATCH 14/16] kvm: x86: Add user space part for in-kernel i8259

2011-12-05 Thread Avi Kivity

On 12/05/2011 03:29 PM, Jan Kiszka wrote:
 On 2011-12-05 14:14, Avi Kivity wrote:
  On 12/05/2011 02:47 PM, Jan Kiszka wrote:
 
  (the memory API added unstable names, hopefully the QOM can take over
  the stable ones and we'll have a good way to denote the unstable ones).
 
 
  OK, maybe - or likely - we should make those device models have the same
  names in QOM once instantiated. But I'm still convinced they should
  remain separated models in contrast to a single model with a property.
  
  What do you mean by separate models?  You share all the code you can,
  and don't share the code you can't.  To me, single model == single name.

 But different configuration.

Right, just like IDE with different backends.

  
  The kvm ioapic, e.g., requires an additional property (gsi_base) that is
  meaningless for user space devices. And its interrupts have to be
  wired/configured differently at board model level. So, from the QEMU
  POV, it is a very different device. Just the guest does not notice.
  
  It's like qcow2 and raw/native IO are wired differently, or virtio-net
  and vhost-net.  But it's the same IDE device or virtio NIC.

 That would mean introducing a backend/frontend concept for irqchips.

We could do it, have one ioapic model with ioapic_ops->eoi_broadcast().
Most of the interfaces already dispatch dynamically (qdev gpio/irq) so
there wouldn't be much more there.

To me, how it's actually implemented is not important.  What is
important is that save/restore, the monitor, and the guest don't notice
any changes.
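
A rough sketch of what such a single ioapic model with a backend ops table might look like is given below. Only the name `ioapic_ops` and its `eoi_broadcast` hook come from the message above; the struct layout, the two backends and the counters are assumptions made purely for illustration and do not correspond to existing QEMU code.

```c
#include <assert.h>
#include <stddef.h>

struct ioapic;

/* Hypothetical backend interface; only eoi_broadcast is named in the thread. */
struct ioapic_ops {
    void (*eoi_broadcast)(struct ioapic *s, int vector);
};

/* One frontend model; the backend is selected through the ops pointer. */
struct ioapic {
    const struct ioapic_ops *ops;
    int user_eois;   /* EOIs handled by the user space backend */
    int kernel_eois; /* EOIs forwarded to the in-kernel backend */
};

/* Illustrative user space backend: just counts the EOI. */
static void user_eoi_broadcast(struct ioapic *s, int vector)
{
    (void)vector;
    s->user_eois++;
}

/* Illustrative in-kernel backend: a real one would talk to KVM here. */
static void kernel_eoi_broadcast(struct ioapic *s, int vector)
{
    (void)vector;
    s->kernel_eois++;
}

static const struct ioapic_ops user_ioapic_ops = { user_eoi_broadcast };
static const struct ioapic_ops kernel_ioapic_ops = { kernel_eoi_broadcast };

/* Frontend entry point: callers never need to know which backend is active. */
static void ioapic_eoi(struct ioapic *s, int vector)
{
    assert(s->ops != NULL && s->ops->eoi_broadcast != NULL);
    s->ops->eoi_broadcast(s, vector);
}
```

With this shape the device keeps one name and one save/restore format, while the backend choice stays an internal detail behind the ops pointer.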

-- 
error compiling committee.c: too many arguments to function


Re: [Qemu-devel] [PATCH V3] Guest stop notification

2011-12-05 Thread Jan Kiszka

On 2011-12-05 14:35, Marcelo Tosatti wrote:
 On Sat, Dec 03, 2011 at 12:45:51PM +0100, Jan Kiszka wrote:
 I was referring to the relation between the IOCTL and kvmclock, but
 IOCTL vs. kvm_run.

 Jan

 Ah, OK. Yes, we better characterize it as KVMCLOCK specific (a generic
 guest is paused command is not the scope of this patch).

 So appending KVMCLOCK_ to the ioctl definitions would make that more
 explicit.

 IMHO, that would move things in the wrong direction. The IOCTL in itself
 has _nothing_ to do with kvmclock. It's just that its x86 backend is
 implemented on top of that infrastructure. For me the IOCTL is pretty
 generic, can be backed by kvmclock, but need not be on all future archs.

 Jan
 
 I do not see the need to lift this infrastructure to arch independent
 status at the moment, without clear semantics on that arch independent
 level.
 
 So I am fine with the current GUEST_PAUSED naming (which can later be
 extended with GUEST_RESUMED etc, if necessary, for use by other archs
 for example), and implementation in hw/kvmclock.c.
 

Yes, let's keep it as suggested last (addition of kvmclock, unchanged
IOCTL interface).

Jan




Re: [RFC][PATCH 14/16] kvm: x86: Add user space part for in-kernel i8259

2011-12-05 Thread Jan Kiszka

On 2011-12-05 14:36, Avi Kivity wrote:
 On 12/05/2011 03:29 PM, Jan Kiszka wrote:
 On 2011-12-05 14:14, Avi Kivity wrote:
 On 12/05/2011 02:47 PM, Jan Kiszka wrote:

 (the memory API added unstable names, hopefully the QOM can take over
 the stable ones and we'll have a good way to denote the unstable ones).


 OK, maybe - or likely - we should make those device models have the same
 names in QOM once instantiated. But I'm still convinced they should
 remain separated models in contrast to a single model with a property.

 What do you mean by separate models?  You share all the code you can,
 and don't share the code you can't.  To me, single model == single name.

 But different configuration.
 
 Right, just like IDE with different backends.

Except that there is a comparably large infrastructure to manage those
backends.

 

 The kvm ioapic, e.g., requires an additional property (gsi_base) that is
 meaningless for user space devices. And its interrupts have to be
 wired/configured differently at board model level. So, from the QEMU
 POV, it is a very different device. Just the guest does not notice.

 It's like qcow2 and raw/native IO are wired differently, or virtio-net
 and vhost-net.  But it's the same IDE device or virtio NIC.

 That would mean introducing a backend/frontend concept for irqchips.
 
 We could do it, have one ioapic model with ioapic_ops->eoi_broadcast().
 Most of the interfaces already dispatch dynamically (qdev gpio/irq) so
 there wouldn't be much more there.

The problem is configuration. Just by setting ioapic.backend=xxx, we
cannot pass down parameters that are backend-specific. We could ignore
this issue and make all specific parameters visible via the frontend.
Would be slightly ugly.

 
 To me, how it's actually implemented is not important.  What is
 important is that save/restore, the monitor, and the guest don't notice
 any changes.

I widely agree, except that differentiation (or backend awareness) has
to be preserved in the monitor.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

KVM call agenda for 12/6 (Tuesday) @ 10am US/Eastern

2011-12-05 Thread Juan Quintela


Hi

Please send in any agenda items you are interested in covering.

Proposal (from Anthony):

 1. A short introduction to each of the guest agents, what guests they
 support, and what verbs they support.

 2. A short description of key requirements from each party (oVirt,
 libvirt, QEMU) for a guest agent

 3. An open discussion about possible ways to collaborate/converge.

Note that guest integration will take more than one week (this is also
Anthony's estimate).

For libvirt and ovirt folks, please contact me or Chris for details of
the call.


Thanks, Juan.


[PATCH 1/5] kvm tools: Split custom rootfs init into two stages

2011-12-05 Thread Sasha Levin

Currently the custom rootfs init is built along with the main KVM tools executable
and is copied into custom rootfs directories when they are created with
'kvm setup'. The problem is that if the init code changes, it has
to be manually copied to the custom rootfs directories again.

Instead, this patch splits the init process into two parts. Stage 1 simply
handles mounts and then passes control to stage 2 of the init.

Stage 2 stays in the code tree and does all the heavy lifting.

This allows us to make init changes in the code tree and have them automatically
picked up by custom rootfs guests without having to copy files over manually.

Signed-off-by: Sasha Levin levinsasha...@gmail.com
---
 tools/kvm/Makefile|9 +++--
 tools/kvm/builtin-run.c   |   27 +++
 tools/kvm/guest/init.c|   14 +++---
 tools/kvm/guest/init_stage2.c |   34 ++
 4 files changed, 71 insertions(+), 13 deletions(-)
 create mode 100644 tools/kvm/guest/init_stage2.c

diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile
index bb5f6b0..ece3306 100644
--- a/tools/kvm/Makefile
+++ b/tools/kvm/Makefile
@@ -21,6 +21,7 @@ TAGS  := ctags
 PROGRAM:= kvm
 
 GUEST_INIT := guest/init
+GUEST_INIT_S2 := guest/init_stage2
 
 OBJS   += builtin-balloon.o
 OBJS   += builtin-debug.o
@@ -179,7 +180,7 @@ WARNINGS += -Wwrite-strings
 
 CFLAGS += $(WARNINGS)
 
-all: $(PROGRAM) $(GUEST_INIT)
+all: $(PROGRAM) $(GUEST_INIT) $(GUEST_INIT_S2)
 
 KVMTOOLS-VERSION-FILE:
@$(SHELL_PATH) util/KVMTOOLS-VERSION-GEN $(OUTPUT)
@@ -193,6 +194,10 @@ $(GUEST_INIT): guest/init.c
$(E)   LINK $@
$(Q) $(CC) -static guest/init.c -o $@
 
+$(GUEST_INIT_S2): guest/init_stage2.c
+   $(E)   LINK $@
+   $(Q) $(CC) -static guest/init_stage2.c -o $@
+
 $(DEPS):
 
 %.d: %.c
@@ -269,7 +274,7 @@ clean:
$(Q) rm -f bios/bios-rom.h
$(Q) rm -f tests/boot/boot_test.iso
$(Q) rm -rf tests/boot/rootfs/
-   $(Q) rm -f $(DEPS) $(OBJS) $(PROGRAM) $(GUEST_INIT)
+   $(Q) rm -f $(DEPS) $(OBJS) $(PROGRAM) $(GUEST_INIT) $(GUEST_INIT_S2)
$(Q) rm -f cscope.*
$(Q) rm -f $(KVM_INCLUDE)/common-cmds.h
$(Q) rm -f KVMTOOLS-VERSION-FILE
diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c
index 43cf2c4..9635c82 100644
--- a/tools/kvm/builtin-run.c
+++ b/tools/kvm/builtin-run.c
@@ -702,6 +702,31 @@ void kvm_run_help(void)
usage_with_options(run_usage, options);
 }
 
+static int kvm_custom_stage2(void)
+{
+	char tmp[PATH_MAX], dst[PATH_MAX], *src;
+	const char *rootfs;
+	int r;
+
+	src = realpath("guest/init_stage2", NULL);
+	if (src == NULL)
+		return -ENOMEM;
+
+	if (image_filename[0] == NULL)
+		rootfs = "default";
+	else
+		rootfs = image_filename[0];
+
+	snprintf(tmp, PATH_MAX, "%s%s/virt/init_stage2", kvm__get_dir(), rootfs);
+	remove(tmp);
+
+	snprintf(dst, PATH_MAX, "/host/%s", src);
+	r = symlink(dst, tmp);
+	free(src);
+
+	return r;
+}
+
+
 int kvm_cmd_run(int argc, const char **argv, const char *prefix)
 {
static char real_cmdline[2048], default_name[20];
@@ -867,6 +892,8 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix)
 			strcat(real_cmdline, " init=/virt/init");
 			if (!no_dhcp)
 				strcat(real_cmdline, "  ip=dhcp");
+			if (kvm_custom_stage2())
+				die("Failed linking stage 2 of init.");
 		}
 	} else if (!strstr(real_cmdline, "root=")) {
 		strlcat(real_cmdline, " root=/dev/vda rw ", sizeof(real_cmdline));
diff --git a/tools/kvm/guest/init.c b/tools/kvm/guest/init.c
index 8975023..032a261 100644
--- a/tools/kvm/guest/init.c
+++ b/tools/kvm/guest/init.c
@@ -1,6 +1,6 @@
 /*
- * This is a simple init for shared rootfs guests. It brings up critical
- * mountpoints and then launches /bin/sh.
+ * This is a simple init for shared rootfs guests. This part should be limited
+ * to doing mounts and running stage 2 of the init process.
  */
 #include <sys/mount.h>
 #include <string.h>
@@ -30,15 +30,7 @@ int main(int argc, char *argv[])
 
do_mounts();
 
-/* get session leader */
-setsid();
-
-/* set controlling terminal */
-ioctl (0, TIOCSCTTY, 1);
-
-   puts("Starting '/bin/sh'...");
-
-   run_process("/bin/sh");
+   run_process("/virt/init_stage2");
 
    printf("Init failed: %s\n", strerror(errno));
 
diff --git a/tools/kvm/guest/init_stage2.c b/tools/kvm/guest/init_stage2.c
new file mode 100644
index 000..af615a0
--- /dev/null
+++ b/tools/kvm/guest/init_stage2.c
@@ -0,0 +1,34 @@
+/*
+ * This is a stage 2 of the init. This part should do all the heavy
+ * lifting such as setting up the console and calling /bin/sh.
+ */
+#include <sys/mount.h>
+#include <string.h>
+#include <unistd.h>

[PATCH 2/5] kvm tools: Remove double 'init=' kernel param

2011-12-05 Thread Sasha Levin

Signed-off-by: Sasha Levin levinsasha...@gmail.com
---
 tools/kvm/builtin-run.c |3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c
index 9635c82..de3001e 100644
--- a/tools/kvm/builtin-run.c
+++ b/tools/kvm/builtin-run.c
@@ -881,9 +881,6 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix)
 		if (virtio_9p__register(kvm, "/", "hostfs") < 0)
 			die("Unable to initialize virtio 9p");
 		using_rootfs = custom_rootfs = 1;
-
-		if (!strstr(real_cmdline, "init="))
-			strlcat(real_cmdline, " init=/bin/sh ", sizeof(real_cmdline));
 	}
 
if (using_rootfs) {
-- 
1.7.8


[PATCH 3/5] kvm tools: Allow easily sandboxing applications within a guest

2011-12-05 Thread Sasha Levin

This patch adds a '--sandbox' argument. When used in conjunction with a custom
rootfs, it allows running a script or an executable in the guest environment
using executables and other files from the host.

This is useful when testing code that might cause problems on the host, or
to automate kernel testing since it's now easy to link a kvm tools test
script with 'git bisect run'.

Suggested-by: Ingo Molnar mi...@elte.hu
Signed-off-by: Sasha Levin levinsasha...@gmail.com
---
 tools/kvm/builtin-run.c   |   31 +++
 tools/kvm/guest/init_stage2.c |   13 -
 2 files changed, 43 insertions(+), 1 deletions(-)

diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c
index de3001e..cd14159 100644
--- a/tools/kvm/builtin-run.c
+++ b/tools/kvm/builtin-run.c
@@ -82,6 +82,7 @@ static const char *guest_mac;
 static const char *host_mac;
 static const char *script;
 static const char *guest_name;
+static const char *sandbox;
 static struct virtio_net_params *net_params;
 static bool single_step;
 static bool readonly_image[MAX_DISK_IMAGES];
@@ -420,6 +421,8 @@ static const struct option options[] = {
 	OPT_CALLBACK('\0', "tty", NULL, "tty id",
 		     "Remap guest TTY into a pty on the host",
 		     tty_parser),
+	OPT_STRING('\0', "sandbox", &sandbox, "script",
+		   "Run this script when booting into custom rootfs"),
 
 	OPT_GROUP("Kernel options:"),
 	OPT_STRING('k', "kernel", &kernel_filename, "kernel",
@@ -727,6 +730,31 @@ static int kvm_custom_stage2(void)
return r;
 }
 
+static int kvm_run_set_sandbox(void)
+{
+	const char *guestfs_name = "default";
+	char path[PATH_MAX], script[PATH_MAX], *tmp;
+
+	if (image_filename[0])
+		guestfs_name = image_filename[0];
+
+	snprintf(path, PATH_MAX, "%s%s/virt/sandbox.sh", kvm__get_dir(), guestfs_name);
+
+	remove(path);
+
+	if (sandbox == NULL)
+		return 0;
+
+	tmp = realpath(sandbox, NULL);
+	if (tmp == NULL)
+		return -ENOMEM;
+
+	snprintf(script, PATH_MAX, "/host/%s", tmp);
+	free(tmp);
+
+	return symlink(script, path);
+}
+
+
 int kvm_cmd_run(int argc, const char **argv, const char *prefix)
 {
static char real_cmdline[2048], default_name[20];
@@ -886,7 +914,10 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix)
 	if (using_rootfs) {
 		strcat(real_cmdline, " root=/dev/root rw rootflags=rw,trans=virtio,version=9p2000.L rootfstype=9p");
 		if (custom_rootfs) {
+			kvm_run_set_sandbox();
+
 			strcat(real_cmdline, " init=/virt/init");
+
 			if (!no_dhcp)
 				strcat(real_cmdline, "  ip=dhcp");
 			if (kvm_custom_stage2())
diff --git a/tools/kvm/guest/init_stage2.c b/tools/kvm/guest/init_stage2.c
index af615a0..6489fee 100644
--- a/tools/kvm/guest/init_stage2.c
+++ b/tools/kvm/guest/init_stage2.c
@@ -16,6 +16,14 @@ static int run_process(char *filename)
return execve(filename, new_argv, new_env);
 }
 
+static int run_process_sandbox(char *filename)
+{
+	char *new_argv[] = { filename, "/virt/sandbox.sh", NULL };
+	char *new_env[] = { "TERM=linux", NULL };
+
+	return execve(filename, new_argv, new_env);
+}
+
 int main(int argc, char *argv[])
 {
/* get session leader */
@@ -26,7 +34,10 @@ int main(int argc, char *argv[])
 
 	puts("Starting '/bin/sh'...");
 
-	run_process("/bin/sh");
+	if (access("/virt/sandbox.sh", R_OK) == 0)
+		run_process_sandbox("/bin/sh");
+	else
+		run_process("/bin/sh");
 
 	printf("Init failed: %s\n", strerror(errno));
 
-- 
1.7.8
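
The stage 2 decision in this patch (run the sandbox script if it is readable, otherwise fall back to a plain shell) can be sketched in isolation as follows. `build_shell_argv()` is a hypothetical helper introduced only for this illustration; the patch itself uses two separate `run_process*()` functions and calls `execve()` directly.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Fill argv_out with the argument vector for /bin/sh. The script path is
 * appended only when it exists and is readable, so a guest without a
 * sandbox script simply gets an interactive shell.
 * Returns the number of non-NULL entries written.
 */
static int build_shell_argv(const char *script, const char *argv_out[3])
{
    int n = 0;

    argv_out[n++] = "/bin/sh";
    if (access(script, R_OK) == 0) {
        argv_out[n++] = script;
    }
    argv_out[n] = NULL;
    return n;
}
```

Building the vector first and calling `execve()` once would keep the two cases in a single code path, at the cost of diverging slightly from the structure of the patch.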


[PATCH 4/5] kvm tools: Ignore parameters after dashdash in 'kvm run'

2011-12-05 Thread Sasha Levin

This allows other commands to wrap 'kvm run' and use the parameters the user
provides after a dash-dash for their own purposes.

Signed-off-by: Sasha Levin levinsasha...@gmail.com
---
 tools/kvm/builtin-run.c |7 ++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c
index cd14159..5db6995 100644
--- a/tools/kvm/builtin-run.c
+++ b/tools/kvm/builtin-run.c
@@ -776,8 +776,13 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix)
 
 	while (argc != 0) {
 		argc = parse_options(argc, argv, options, run_usage,
-				PARSE_OPT_STOP_AT_NON_OPTION);
+				PARSE_OPT_STOP_AT_NON_OPTION |
+				PARSE_OPT_KEEP_DASHDASH);
 		if (argc != 0) {
+			/* Cusrom options, should have been handled elsewhere */
+			if (strcmp(argv[0], "--") == 0)
+				break;
+
 			if (kernel_filename) {
 				fprintf(stderr, "Cannot handle parameter: "
 						"%s\n", argv[0]);
-- 
1.7.8
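
As a standalone illustration of the dash-dash convention this patch relies on, a wrapper can locate the separator and treat everything after it as its own arguments. The helper below is a minimal sketch with a hypothetical name (`find_dashdash`); it is not part of the kvm tools option parser.

```c
#include <assert.h>
#include <string.h>

/*
 * Return the index of the first "--" in argv, or argc if there is none.
 * Arguments before the index belong to the wrapped command; arguments
 * after it are left for the wrapper to interpret.
 */
static int find_dashdash(int argc, const char **argv)
{
    int i;

    for (i = 0; i < argc; i++) {
        if (strcmp(argv[i], "--") == 0) {
            return i;
        }
    }
    return argc;
}
```

For a command line such as `-d test_guest -- do_something`, the returned index is 2, so the wrapper's command starts at index 3.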


[PATCH 5/5] kvm tools: Add 'kvm sandbox'

2011-12-05 Thread Sasha Levin

This patch adds 'kvm sandbox', a wrapper on top of 'kvm run' which
allows the user to easily specify a sandboxed command to run in a custom
rootfs guest.

Example usage:

kvm sandbox -d test_guest -k some_kernel -- do_something_in_guest

Suggested-by: Pekka Enberg penb...@cs.helsinki.fi
Signed-off-by: Sasha Levin levinsasha...@gmail.com
---
 tools/kvm/Documentation/kvm-sandbox.txt |   16 ++
 tools/kvm/Makefile  |1 +
 tools/kvm/builtin-run.c |   49 +-
 tools/kvm/builtin-sandbox.c |9 ++
 tools/kvm/command-list.txt  |1 +
 tools/kvm/include/kvm/builtin-run.h |2 +
 tools/kvm/include/kvm/builtin-sandbox.h |6 
 tools/kvm/kvm-cmd.c |2 +
 8 files changed, 84 insertions(+), 2 deletions(-)
 create mode 100644 tools/kvm/Documentation/kvm-sandbox.txt
 create mode 100644 tools/kvm/builtin-sandbox.c
 create mode 100644 tools/kvm/include/kvm/builtin-sandbox.h

diff --git a/tools/kvm/Documentation/kvm-sandbox.txt 
b/tools/kvm/Documentation/kvm-sandbox.txt
new file mode 100644
index 000..8f24fc7
--- /dev/null
+++ b/tools/kvm/Documentation/kvm-sandbox.txt
@@ -0,0 +1,16 @@
+kvm-sandbox(1)
+
+
+NAME
+
+kvm-sandbox - Run a command in a sandboxed guest
+
+SYNOPSIS
+
+[verse]
+'kvm sandbox ['kvm run' arguments] -- [sandboxed command]'
+
+DESCRIPTION
+---
+The sandboxed command will run in a guest as part of it's init
+command.
diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile
index ece3306..24af1d0 100644
--- a/tools/kvm/Makefile
+++ b/tools/kvm/Makefile
@@ -85,6 +85,7 @@ OBJS  += hw/vesa.o
 OBJS   += hw/i8042.o
 OBJS   += hw/pci-shmem.o
 OBJS   += kvm-ipc.o
+OBJS   += builtin-sandbox.o
 
 FLAGS_BFD := $(CFLAGS) -lbfd
 has_bfd := $(call try-cc,$(SOURCE_BFD),$(FLAGS_BFD))
diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c
index 5db6995..7a57b5c 100644
--- a/tools/kvm/builtin-run.c
+++ b/tools/kvm/builtin-run.c
@@ -53,6 +53,7 @@
 #define DEFAULT_GUEST_MAC	"02:15:15:15:15:15"
 #define DEFAULT_HOST_MAC	"02:01:01:01:01:01"
 #define DEFAULT_SCRIPT		"none"
+const char *DEFAULT_SANDBOX_FILENAME = "guest/sandbox.sh";
 
 #define MB_SHIFT   (20)
 #define KB_SHIFT   (10)
@@ -94,6 +95,7 @@ static bool custom_rootfs;
 static bool no_net;
 static bool no_dhcp;
 extern bool ioport_debug;
+static int  kvm_run_wrapper;
 extern int  active_console;
 extern int  debug_iodelay;
 
@@ -107,6 +109,15 @@ static const char * const run_usage[] = {
NULL
 };
 
+enum {
+   KVM_RUN_SANDBOX,
+};
+
+void kvm_run_set_wrapper_sandbox(void)
+{
+   kvm_run_wrapper = KVM_RUN_SANDBOX;
+}
+
 static int img_name_parser(const struct option *opt, const char *arg, int 
unset)
 {
char *sep;
@@ -755,6 +766,35 @@ static int kvm_run_set_sandbox(void)
return symlink(script, path);
 }
 
+static void kvm_run_write_sandbox_cmd(const char **argv, int argc)
+{
+	const char script_hdr[] = "#! /bin/bash\n\n";
+	int fd;
+
+	remove(sandbox);
+
+	fd = open(sandbox, O_RDWR | O_CREAT, 0777);
+	if (fd < 0)
+		die("Failed creating sandbox script");
+
+	if (write(fd, script_hdr, sizeof(script_hdr) - 1) <= 0)
+		die("Failed writing sandbox script");
+
+	while (argc) {
+		if (write(fd, argv[0], strlen(argv[0])) <= 0)
+			die("Failed writing sandbox script");
+		if (argc - 1)
+			if (write(fd, " ", 1) <= 0)
+				die("Failed writing sandbox script");
+		argv++;
+		argc--;
+	}
+	if (write(fd, "\n", 1) <= 0)
+		die("Failed writing sandbox script");
+
+	close(fd);
+}
+
 int kvm_cmd_run(int argc, const char **argv, const char *prefix)
 {
static char real_cmdline[2048], default_name[20];
@@ -780,8 +820,13 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix)
 				PARSE_OPT_KEEP_DASHDASH);
 		if (argc != 0) {
 			/* Cusrom options, should have been handled elsewhere */
-			if (strcmp(argv[0], "--") == 0)
-				break;
+			if (strcmp(argv[0], "--") == 0) {
+				if (kvm_run_wrapper == KVM_RUN_SANDBOX) {
+					sandbox = DEFAULT_SANDBOX_FILENAME;
+					kvm_run_write_sandbox_cmd(argv+1, argc-1);
+					break;
+				}
+			}
 
 			if (kernel_filename) {
 				fprintf(stderr, "Cannot handle parameter: "
diff --git a/tools/kvm/builtin-sandbox.c b/tools/kvm/builtin-sandbox.c
new file mode 100644
index 000..433f536
--- /dev/null
+++ b/tools/kvm/builtin-sandbox.c
@@ -0,0 +1,9 @@
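
To make the generated script format concrete, the sketch below assembles the same text that the patch writes to the sandbox file (a `#! /bin/bash` header, a blank line, then the arguments joined by single spaces and a trailing newline), but into a memory buffer instead of a file descriptor. `build_sandbox_script()` is a hypothetical helper written for this illustration and is not part of the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the sandbox script text in buf. Returns the number of bytes
 * written (excluding the terminating NUL), or -1 if buf is too small.
 */
static int build_sandbox_script(char *buf, size_t size,
                                int argc, const char **argv)
{
    size_t used;
    int i, n;

    /* Interpreter line followed by an empty line, as in the patch. */
    n = snprintf(buf, size, "#! /bin/bash\n\n");
    if (n < 0 || (size_t)n >= size) {
        return -1;
    }
    used = (size_t)n;

    /* Arguments separated by one space, with no trailing space. */
    for (i = 0; i < argc; i++) {
        n = snprintf(buf + used, size - used, "%s%s",
                     i ? " " : "", argv[i]);
        if (n < 0 || (size_t)n >= size - used) {
            return -1;
        }
        used += (size_t)n;
    }

    n = snprintf(buf + used, size - used, "\n");
    if (n < 0 || (size_t)n >= size - used) {
        return -1;
    }
    used += (size_t)n;

    return (int)used;
}
```

Note that, like the patch, this joins arguments verbatim, so an argument containing spaces or shell metacharacters would be reinterpreted by the shell inside the guest.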

Re: [PATCH 3/5 V4] Add ioctl for KVM_GUEST_STOPPED

2011-12-05 Thread Eric B Munson

On Sat, 03 Dec 2011, Sasha Levin wrote:

 On Tue, 2011-11-29 at 16:35 -0500, Eric B Munson wrote:
  
  Now that we have a flag that will tell the guest it was suspended,
  create an interface for that communication using a KVM ioctl.
  
  Signed-off-by: Eric B Munson emun...@mgebm.net 
 
 Can it be documented in api.txt as well?
 
 -- 
 
 Sasha.
 

Thanks for the review, will do for V5.

Eric



Re: [PATCH v2 1/3] pci: Rework config space blocking services

2011-12-05 Thread Jesse Barnes

On Fri,  4 Nov 2011 09:45:59 +0100
Jan Kiszka jan.kis...@siemens.com wrote:

 pci_block_user_cfg_access was designed for the use case that a single
 context, the IPR driver, temporarily delays user space accesses to the
 config space via sysfs. This assumption became invalid by the time
 pci_dev_reset was added as a locking instance. Today, if you run two loops
 in parallel that reset the same device via sysfs, you end up with a
 kernel BUG as pci_block_user_cfg_access detects the broken assumption.
 
 This reworks the pci_block_user_cfg_access to a sleeping service
 pci_cfg_access_lock and an atomic-compatible variant called
 pci_cfg_access_trylock. The former not only blocks user space access as
 before but also waits if access was already locked. The latter service
 just returns false in this case, allowing the caller to resolve the
 conflict instead of raising a BUG.
 
 Adaptions of the ipr driver were originally written by Brian King.

Applied this series to linux-next, thanks.
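
The trylock semantics described in the quoted changelog (report failure instead of raising a BUG when access is already locked) can be illustrated with a small user space sketch. This is not the kernel implementation: the names `cfg_access`, `cfg_access_trylock` and `cfg_access_unlock` are hypothetical, C11 `atomic_flag` stands in for the kernel's locking, and the sleeping `pci_cfg_access_lock` variant is not modelled because it needs a wait queue.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-device state: set while config access is blocked. */
struct cfg_access {
    atomic_flag locked;
};

/*
 * Try to take exclusive config access without sleeping.
 * Returns true on success and false if someone else already holds it,
 * leaving it to the caller to resolve the conflict.
 */
static bool cfg_access_trylock(struct cfg_access *a)
{
    return !atomic_flag_test_and_set(&a->locked);
}

/* Release config access so that the next trylock can succeed. */
static void cfg_access_unlock(struct cfg_access *a)
{
    atomic_flag_clear(&a->locked);
}
```

A second caller that loses the race sees `false` and can back off or retry, which is the behaviour the changelog contrasts with the old BUG.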

-- 
Jesse Barnes, Intel Open Source Technology Center



Re: [PATCHv2 RFC] virtio-pci: flexible configuration layout

2011-12-05 Thread Jesse Barnes

On Mon, 14 Nov 2011 20:18:55 +0200
Michael S. Tsirkin m...@redhat.com wrote:

 Add a flexible mechanism to specify virtio configuration layout, using
 pci vendor-specific capability.  A separate capability is used for each
 of common, device specific and data-path accesses.
 
 Warning: compiled only.
 This patch also needs to be split up, pci_iomap changes
 also need arch updates for non-x86.
 There might also be more spec changes.
 
 Posting here for early feedback, and to allow Sasha to
 proceed with his kvm tool work.
 
 Changes from v1:
 Updated to match v3 of the spec, see:
   Subject: [PATCHv3 RFC] virtio-spec: flexible configuration layout
   Message-ID: 2010122436.ga13...@redhat.com
   In-Reply-To: 2009195901.ga28...@redhat.com

Looks like this conflicts with your other iomap changes... I didn't
check your latest tree; do you just add another patch on top for the
virtio changes now?

Thanks,
-- 
Jesse Barnes, Intel Open Source Technology Center



Re: [Qemu-devel] [PATCH] ivshmem: fix guest unable to start with ioeventfd

2011-12-05 Thread Cam Macdonell

2011/12/2 Cam Macdonell c...@cs.ualberta.ca:
 2011/11/30 Cam Macdonell c...@cs.ualberta.ca:
 2011/11/30 Zang Hongyong zanghongy...@huawei.com:
 Can this bug fix patch be applied yet?

 Sorry, for not replying yet.  I'll test your patch within the next day.

 Have you confirmed the proper receipt of interrupts in the receiving guests?

 I can confirm the bug occurs with ioeventfd enabled and that the
 patch fixes it, but sometime after 15.1, I no longer see interrupts
 (MSI or regular) being delivered in the guest.

 I will bisect tomorrow.

With Michael's help we debugged msi-x interrupt delivery.  With that
fix in place, this patch fixes ioeventfd in ivshmem.


 Cam


 With this bug, guest os cannot successfully boot with ioeventfd.
 Thus the new PIO DoorBell patch cannot be posted.

 Well, you can certainly post the new patch, just clarify that it's
 dependent on this patch.

 Sincerely,
 Cam


 Thanks,
 Hongyong

 On Thursday, 2011/11/24 18:05, zanghongy...@huawei.com wrote:
 From: Hongyong Zang zanghongy...@huawei.com

 When a guest boots with ioeventfd, gdb reports the following error:
   Program received signal SIGSEGV, Segmentation fault.
   0x006009cc in setup_ioeventfds (s=0x171dc40)
   at /home/louzhengwei/git_source/qemu-kvm/hw/ivshmem.c:363
   363 for (j = 0; j < s->peers[i].nb_eventfds; j++) {
 The bug is due to accessing s->peers which is NULL.

 This patch replaces the old kvm_set_ioeventfd_mmio_long() calls with the
 memory region API, and calls memory_region_add_eventfd() from ivshmem_read()
 when qemu receives eventfd information from ivshmem_server.

 Signed-off-by: Hongyong Zang zanghongy...@huawei.com
 ---
  hw/ivshmem.c |   41 ++---
  1 files changed, 14 insertions(+), 27 deletions(-)

 diff --git a/hw/ivshmem.c b/hw/ivshmem.c
 index 242fbea..be26f03 100644
 --- a/hw/ivshmem.c
 +++ b/hw/ivshmem.c
 @@ -58,7 +58,6 @@ typedef struct IVShmemState {
      CharDriverState *server_chr;
      MemoryRegion ivshmem_mmio;

 -    pcibus_t mmio_addr;
      /* We might need to register the BAR before we actually have the memory.
       * So prepare a container MemoryRegion for the BAR immediately and
       * add a subregion when we have the memory.
 @@ -346,8 +345,14 @@ static void close_guest_eventfds(IVShmemState *s, int posn)
      guest_curr_max = s->peers[posn].nb_eventfds;

      for (i = 0; i < guest_curr_max; i++) {
 -        kvm_set_ioeventfd_mmio_long(s->peers[posn].eventfds[i],
 -                    s->mmio_addr + DOORBELL, (posn << 16) | i, 0);
 +        if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD)) {
 +            memory_region_del_eventfd(&s->ivshmem_mmio,
 +                                     DOORBELL,
 +                                     4,
 +                                     true,
 +                                     (posn << 16) | i,
 +                                     s->peers[posn].eventfds[i]);
 +        }
          close(s->peers[posn].eventfds[i]);
      }

 @@ -355,22 +360,6 @@ static void close_guest_eventfds(IVShmemState *s, int posn)
      s->peers[posn].nb_eventfds = 0;
  }

 -static void setup_ioeventfds(IVShmemState *s) {
 -
 -    int i, j;
 -
 -    for (i = 0; i <= s->max_peer; i++) {
 -        for (j = 0; j < s->peers[i].nb_eventfds; j++) {
 -            memory_region_add_eventfd(&s->ivshmem_mmio,
 -                                      DOORBELL,
 -                                      4,
 -                                      true,
 -                                      (i << 16) | j,
 -                                      s->peers[i].eventfds[j]);
 -        }
 -    }
 -}
 -
  /* this function increase the dynamic storage need to store data about other
   * guests */
  static void increase_dynamic_storage(IVShmemState *s, int new_min_size) {
 @@ -491,10 +480,12 @@ static void ivshmem_read(void *opaque, const uint8_t * buf, int flags)
      }

      if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD)) {
 -        if (kvm_set_ioeventfd_mmio_long(incoming_fd, s->mmio_addr + DOORBELL,
 -                        (incoming_posn << 16) | guest_max_eventfd, 1) < 0) {
 -            fprintf(stderr, "ivshmem: ioeventfd not available\n");
 -        }
 +        memory_region_add_eventfd(&s->ivshmem_mmio,
 +                                  DOORBELL,
 +                                  4,
 +                                  true,
 +                                  (incoming_posn << 16) | guest_max_eventfd,
 +                                  incoming_fd);
      }

      return;
 @@ -659,10 +650,6 @@ static int pci_ivshmem_init(PCIDevice *dev)
      memory_region_init_io(&s->ivshmem_mmio, &ivshmem_mmio_ops, s,
                            "ivshmem-mmio", IVSHMEM_REG_BAR_SIZE);

 -    if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD)) {
 -        setup_ioeventfds(s);
 -    }
 -
      /* region for registers*/
      pci_register_bar(&s->dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY,
                       &s->ivshmem_mmio);


Re: [net-next RFC PATCH 2/5] tuntap: simple flow director support

2011-12-05 Thread Ben Hutchings

On Mon, 2011-12-05 at 16:58 +0800, Jason Wang wrote:
 This patch adds a simple flow director to tun/tap device. It is just a
 page that contains the hash to queue mapping which could be changed by
 user-space. The backend (tap/macvtap) would query this table to get
 the desired queue of a packet when it sends packets to userspace.

This is just flow hashing (RSS), not flow steering.

 The page address is set through a new kind of ioctl - TUNSETFD and
 is pinned until device exit or until another new page is specified.
[...]

You should implement ethtool ETHTOOL_{G,S}RXFHINDIR instead.
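
To make the distinction concrete, here is a minimal sketch of flow hashing through an indirection table. It is not taken from the tun/tap patch; INDIR_SIZE, struct rss_indir and both helpers are made-up names for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define INDIR_SIZE 128  /* hypothetical table size; must be a power of two */

struct rss_indir {
    uint16_t queue[INDIR_SIZE];
};

/* Spread table entries round-robin over n_queues (n_queues must be > 0). */
static void rss_indir_init(struct rss_indir *t, uint16_t n_queues)
{
    for (size_t i = 0; i < INDIR_SIZE; i++)
        t->queue[i] = (uint16_t)(i % n_queues);
}

/* Flow hashing: every flow whose hash lands in a bucket shares its queue. */
static uint16_t rss_pick_queue(const struct rss_indir *t, uint32_t flow_hash)
{
    return t->queue[flow_hash & (INDIR_SIZE - 1)];
}
```

Rewriting one table entry moves every flow that hashes to that bucket, which is why this is hashing rather than steering of an individual flow.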

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [net-next RFC PATCH 3/5] macvtap: flow director support

2011-12-05 Thread Ben Hutchings

Similarly, macvtap should implement the ethtool {get,set}_rxfh_indir
operations.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


[PATCH V4] Guest stop notification

2011-12-05 Thread Eric B Munson

Often when a guest is stopped from the qemu console, it will report spurious
soft lockup warnings on resume.  There are kernel patches being discussed that
will give the host the ability to tell the guest that it is being stopped and
should ignore the resulting soft lockup warning.  This patch uses the qemu
Notifier system to tell the guest it is about to be stopped.

Signed-off-by: Eric B Munson emun...@mgebm.net
Cc: Avi Kivity a...@redhat.com
Cc: Marcelo Tosatti mtosa...@redhat.com
Cc: Jan Kiszka jan.kis...@siemens.com
Cc: ry...@linux.vnet.ibm.com
Cc: aligu...@us.ibm.com
Cc: kvm@vger.kernel.org

---
Changes from V3:
 Collapse new state change notification function into existing function.
 Correct whitespace issues
 Change ioctl name to KVMCLOCK_GUEST_PAUSED
 Use for loop to iterate vcpus

Changes from V2:
 Move ioctl into hw/kvmclock.c so that other arches can use it as it is
implemented

Changes from V1:
 Remove unnecessary encapsulating function

 hw/kvmclock.c |   15 +++
 1 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/hw/kvmclock.c b/hw/kvmclock.c
index 5388bc4..fa11dd7 100644
--- a/hw/kvmclock.c
+++ b/hw/kvmclock.c
@@ -16,6 +16,7 @@
 #include "sysbus.h"
 #include "kvm.h"
 #include "kvmclock.h"
+#include "cpu-all.h"
 
 #include <linux/kvm.h>
 #include <linux/kvm_para.h>
@@ -62,10 +63,24 @@ static int kvmclock_post_load(void *opaque, int version_id)
 static void kvmclock_vm_state_change(void *opaque, int running,
                                      RunState state)
 {
+    int ret;
+    CPUState *penv = first_cpu;
     KVMClockState *s = opaque;
 
     if (running) {
         s->clock_valid = false;
+
+        for (penv = first_cpu; penv != NULL; penv = penv->next_cpu) {
+            ret = kvm_vcpu_ioctl(penv, KVMCLOCK_GUEST_PAUSED, 0);
+            if (ret) {
+                if (ret != -EINVAL) {
+                    fprintf(stderr,
+                            "kvmclock_vm_state_change: %s\n",
+                            strerror(-ret));
+                }
+                return;
+            }
+        }
     }
 }
 
-- 
1.7.5.4


[PATCH 1/5 V5] Add flag to indicate that a vm was stopped by the host

2011-12-05 Thread Eric B Munson

This flag will be used to check if the vm was stopped by the host when a soft
lockup was detected.  The host will set the flag when it stops the guest.  On
resume, the guest will check this flag if a soft lockup is detected and skip
issuing the warning.

Signed-off-by: Eric B Munson emun...@mgebm.net
Cc: mi...@redhat.com
Cc: h...@zytor.com
Cc: a...@arndb.de
Cc: ry...@linux.vnet.ibm.com
Cc: aligu...@us.ibm.com
Cc: mtosa...@redhat.com
Cc: jeremy.fitzhardi...@citrix.com
Cc: levinsasha...@gmail.com
Cc: Jan Kiszka jan.kis...@siemens.com
Cc: kvm@vger.kernel.org
Cc: linux-a...@vger.kernel.org
Cc: x...@kernel.org
Cc: linux-ker...@vger.kernel.org
---
 arch/x86/include/asm/pvclock-abi.h |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/pvclock-abi.h 
b/arch/x86/include/asm/pvclock-abi.h
index 35f2d19..6167fd7 100644
--- a/arch/x86/include/asm/pvclock-abi.h
+++ b/arch/x86/include/asm/pvclock-abi.h
@@ -40,5 +40,6 @@ struct pvclock_wall_clock {
 } __attribute__((__packed__));
 
 #define PVCLOCK_TSC_STABLE_BIT (1 << 0)
+#define PVCLOCK_GUEST_STOPPED  (1 << 1)
 #endif /* __ASSEMBLY__ */
 #endif /* _ASM_X86_PVCLOCK_ABI_H */
-- 
1.7.5.4


[PATCH 2/5 V5] Add functions to check if the host has stopped the vm

2011-12-05 Thread Eric B Munson

When a host stops or suspends a VM it will set a flag to show this.  The
watchdog will use these functions to determine if a softlockup is real, or the
result of a suspended VM.
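
The set / check-and-clear handshake described here can be modelled in plain C. This is only a userspace sketch: struct time_info is a hypothetical stand-in for the pvclock per-vcpu structure, and GUEST_STOPPED_BIT mirrors the PVCLOCK_GUEST_STOPPED value from patch 1/5.

```c
#include <stdbool.h>
#include <stdint.h>

#define GUEST_STOPPED_BIT (1u << 1)  /* mirrors PVCLOCK_GUEST_STOPPED */

/* Hypothetical stand-in for the per-vcpu time info; only flags is modelled. */
struct time_info {
    uint8_t flags;
};

/* Host side: mark the vcpu as having been stopped. */
static void set_guest_paused(struct time_info *ti)
{
    ti->flags |= GUEST_STOPPED_BIT;
}

/*
 * Guest side: report whether the flag was set and clear it, so that one
 * pause excuses only one soft lockup check. Other flag bits are untouched.
 */
static bool check_and_clear_guest_paused(struct time_info *ti)
{
    if (!(ti->flags & GUEST_STOPPED_BIT))
        return false;
    ti->flags &= (uint8_t)~GUEST_STOPPED_BIT;
    return true;
}
```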

Signed-off-by: Eric B Munson emun...@mgebm.net
Cc: mi...@redhat.com
Cc: h...@zytor.com
Cc: a...@arndb.de
Cc: ry...@linux.vnet.ibm.com
Cc: aligu...@us.ibm.com
Cc: mtosa...@redhat.com
Cc: jeremy.fitzhardi...@citrix.com
Cc: levinsasha...@gmail.com
Cc: Jan Kiszka jan.kis...@siemens.com
Cc: kvm@vger.kernel.org
Cc: linux-a...@vger.kernel.org
Cc: x...@kernel.org
Cc: linux-ker...@vger.kernel.org
---
 arch/x86/include/asm/kvm_para.h |1 +
 arch/x86/kernel/kvmclock.c  |   21 +
 2 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 734c376..e9d63a6 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -95,6 +95,7 @@ struct kvm_vcpu_pv_apf_data {
 extern void kvmclock_init(void);
 extern int kvm_register_clock(char *txt);
 
+bool kvm_check_and_clear_guest_paused(int cpu);
 
 /* This instruction is vmcall.  On non-VT architectures, it will generate a
  * trap that we will then rewrite to the appropriate instruction.
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index 44842d7..f0c0599 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -22,6 +22,7 @@
 #include <asm/msr.h>
 #include <asm/apic.h>
 #include <linux/percpu.h>
+#include <linux/hardirq.h>
 
 #include <asm/x86_init.h>
 #include <asm/reboot.h>
@@ -114,6 +115,26 @@ static void kvm_get_preset_lpj(void)
preset_lpj = lpj;
 }
 
+bool kvm_check_and_clear_guest_paused(int cpu)
+{
+   bool ret = false;
+   struct pvclock_vcpu_time_info *src;
+
+   /*
+    * per_cpu() is safe here because this function is only called from
+    * timer functions where preemption is already disabled.
+    */
+   WARN_ON(!in_atomic());
+   src = &per_cpu(hv_clock, cpu);
+   if ((src->flags & PVCLOCK_GUEST_STOPPED) != 0) {
+      src->flags = src->flags & (~PVCLOCK_GUEST_STOPPED);
+      ret = true;
+   }
+
+   return ret;
+}
+EXPORT_SYMBOL_GPL(kvm_check_and_clear_guest_paused);
+
 static struct clocksource kvm_clock = {
    .name = "kvm-clock",
    .read = kvm_clock_get_cycles,
-- 
1.7.5.4


[PATCH 0/5 V5] Avoid soft lockup message when KVM is stopped by host

2011-12-05 Thread Eric B Munson

Changes from V4:
Rename KVM_GUEST_PAUSED to KVMCLOCK_GUEST_PAUSED
Add description of KVMCLOCK_GUEST_PAUSED ioctl to api.txt

Changes from V3:
Include CC's on patch 3
Drop clear flag ioctl and have the watchdog clear the flag when it is reset

Changes from V2:
New kvm functions are defined in kvm_para.h; the only change to pvclock is the
initial flag definition

Changes from V1:
(Thanks Marcelo)
Host code has all been moved to arch/x86/kvm/x86.c
KVM_PAUSE_GUEST was renamed to KVM_GUEST_PAUSED

When a guest kernel is stopped by the host hypervisor it can look like a soft
lockup to the guest kernel.  This false warning can mask later soft lockup
warnings which may be real.  This patch series adds a method for a host
hypervisor to communicate to a guest kernel that it is being stopped.  The
final patch in the series has the watchdog check this flag when it goes to
issue a soft lockup warning and skip the warning if the guest knows it was
stopped.

An earlier attempt solved this in Qemu, but the side effects of saving and
restoring the clock and tsc for each vcpu put the wall clock of the guest behind
by the duration of the pause.  That would force a guest to run ntp in order to
keep the wall clock accurate.

Cc: mi...@redhat.com
Cc: h...@zytor.com
Cc: a...@arndb.de
Cc: ry...@linux.vnet.ibm.com
Cc: aligu...@us.ibm.com
Cc: mtosa...@redhat.com
Cc: jeremy.fitzhardi...@citrix.com
Cc: levinsasha...@gmail.com
Cc: Jan Kiszka jan.kis...@siemens.com
Cc: kvm@vger.kernel.org
Cc: linux-a...@vger.kernel.org
Cc: x...@kernel.org
Cc: linux-ker...@vger.kernel.org

Eric B Munson (5):
  Add flag to indicate that a vm was stopped by the host
  Add functions to check if the host has stopped the vm
  Add ioctl for KVMCLOCK_GUEST_STOPPED
  Add generic stubs for kvm stop check functions
  Add check for suspended vm in softlockup detector

 Documentation/virtual/kvm/api.txt  |   12 
 arch/x86/include/asm/kvm_host.h|2 ++
 arch/x86/include/asm/kvm_para.h|1 +
 arch/x86/include/asm/pvclock-abi.h |1 +
 arch/x86/kernel/kvmclock.c |   21 +
 arch/x86/kvm/x86.c |   20 
 include/asm-generic/kvm_para.h |   14 ++
 include/linux/kvm.h|2 ++
 kernel/watchdog.c  |   12 
 9 files changed, 85 insertions(+), 0 deletions(-)
 create mode 100644 include/asm-generic/kvm_para.h

-- 
1.7.5.4


[PATCH 5/5 V5] Add check for suspended vm in softlockup detector

2011-12-05 Thread Eric B Munson

A suspended VM can cause spurious soft lockup warnings.  To avoid these, the
watchdog now checks if the kernel knows it was stopped by the host and skips
the warning if so.  When the watchdog is reset successfully, clear the guest
paused flag.
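
As a rough illustration of that decision, here is a simplified userspace model. It is a sketch only: struct wd_state, wd_tick() and the second-granularity timestamps are assumptions for this example, and the real watchdog also keeps a warn-once state that is left out.

```c
#include <stdbool.h>

enum wd_result { WD_OK, WD_SUPPRESSED, WD_WARN };

/* Hypothetical per-cpu watchdog state for this model. */
struct wd_state {
    unsigned long touch_ts;  /* last time the watchdog was touched (s) */
    unsigned long thresh;    /* soft lockup threshold (s) */
    bool guest_paused;       /* stand-in for the pvclock stopped flag */
};

/* Read the paused flag and clear it in one step. */
static bool take_guest_paused(struct wd_state *s)
{
    bool was = s->guest_paused;

    s->guest_paused = false;
    return was;
}

/* One timer tick: warn only if the stall is not explained by a host pause. */
static enum wd_result wd_tick(struct wd_state *s, unsigned long now)
{
    if (now - s->touch_ts <= s->thresh)
        return WD_OK;
    if (take_guest_paused(s))
        return WD_SUPPRESSED;
    return WD_WARN;
}
```

Because the flag is consumed by the check, a later stall with no intervening pause still produces a warning.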

Signed-off-by: Eric B Munson emun...@mgebm.net
Cc: mi...@redhat.com
Cc: h...@zytor.com
Cc: a...@arndb.de
Cc: ry...@linux.vnet.ibm.com
Cc: aligu...@us.ibm.com
Cc: mtosa...@redhat.com
Cc: jeremy.fitzhardi...@citrix.com
Cc: levinsasha...@gmail.com
Cc: Jan Kiszka jan.kis...@siemens.com
Cc: kvm@vger.kernel.org
Cc: linux-a...@vger.kernel.org
Cc: x...@kernel.org
Cc: linux-ker...@vger.kernel.org
---
Changes from V3:
 Clear the PAUSED flag when the watchdog is reset

 kernel/watchdog.c |   12 
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 1d7bca7..7c62919 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -25,6 +25,7 @@
 #include <linux/sysctl.h>
 
 #include <asm/irq_regs.h>
+#include <linux/kvm_para.h>
 #include <linux/perf_event.h>
 
 int watchdog_enabled = 1;
@@ -280,6 +281,9 @@ static enum hrtimer_restart watchdog_timer_fn(struct 
hrtimer *hrtimer)
__this_cpu_write(softlockup_touch_sync, false);
sched_clock_tick();
}
+
+   /* Clear the guest paused flag on watchdog reset */
+   kvm_check_and_clear_guest_paused(smp_processor_id());
__touch_watchdog();
return HRTIMER_RESTART;
}
@@ -292,6 +296,14 @@ static enum hrtimer_restart watchdog_timer_fn(struct 
hrtimer *hrtimer)
 */
duration = is_softlockup(touch_ts);
if (unlikely(duration)) {
+   /*
+* If a virtual machine is stopped by the host it can look to
+* the watchdog like a soft lockup, check to see if the host
+* stopped the vm before we issue the warning
+*/
+   if (kvm_check_and_clear_guest_paused(smp_processor_id()))
+   return HRTIMER_RESTART;
+
/* only warn once */
if (__this_cpu_read(soft_watchdog_warn) == true)
return HRTIMER_RESTART;
-- 
1.7.5.4


Re: winXP Standard PC HAL and qemu-kvm >= 0.15

2011-12-05 Thread Michael Tokarev

On 05.12.2011 17:28, Avi Kivity wrote:
[]
 I haven't debugged further yet, -- because it was
 not easy to find out what was causing the regression
 and how to reproduce it, and also because I don't think
 it is the right HAL for qemu-kvm guest anyway.
 
 It's not, but the regression indicates we broke something.  It would be
 good to know what that is.

So today I gave it a chance with git bisect, and here's what it found:

First bad commit ef390067a72fe09977bb4ac8211313e1503302ea
Merge: c7b3e90 0fd542f
Author: Avi Kivity a...@redhat.com
Date:   Sun May 15 04:48:05 2011 -0400

Merge commit '0fd542fb7d13ddf12f897bb27c5950f31638b1df' into upstream-merge

* commit '0fd542fb7d13ddf12f897bb27c5950f31638b1df':
  cpu: add set_memory flag to request dirty logging
  piix_pci: load path clean up
  piix_pci: optimize set irq path
  piix_pci: eliminate PIIX3State::pci_irq_levels
  pci: add accessor function to get irq levels
  cirrus_vga: remove unneeded reset

Conflicts:
exec.c

Signed-off-by: Avi Kivity a...@redhat.com

And just like with the 32/64bit lockup issue, this is a merge
commit, which is not exactly useful.

Any guesses? :)

The problem is that so far there is no known way to switch winXP to
the proper HAL type (short of reinstalling the guest), and there is
no known workaround on the kvm side, so users are stuck with older
versions.

 So, if anybody have some thoughts about this issue,
 and especially if you know a way to switch winXP HAL
 type to some ACPI variant without reinstalling, please
 speak up.. ;)
 
 I remember doing it somewhere in device manager, perhaps in the
 processor entry.  But it was years since I last did this.

As I already mentioned, changing HAL type works from anything to
Standard PC, but not back.  I'll try to investigate.

 Debian bugreport for a reference: http://bugs.debian.org/647312

 Reproducer: install a winXP guest on kvm with -no-acpi so
 it chooses an Uniprocessor with MPS HAL.  Switch it to
 Standard PC in device manager, reboot -- in 0.15+ it does
 not work anymore, while in 0.14 it continues to work fine.
 
 Most likely non-ACPI interrupt routing.

The commit it bisected to talks about piix -- may it be related?

Thanks,

/mjt

Re: [PATCHv2 RFC] virtio-pci: flexible configuration layout

2011-12-05 Thread Michael S. Tsirkin

On Mon, Dec 05, 2011 at 11:16:05AM -0800, Jesse Barnes wrote:
 On Mon, 14 Nov 2011 20:18:55 +0200
 Michael S. Tsirkin m...@redhat.com wrote:
 
  Add a flexible mechanism to specify virtio configuration layout, using
  pci vendor-specific capability.  A separate capability is used for each
  of common, device specific and data-path accesses.
  
  Warning: compiled only.
  This patch also needs to be split up, pci_iomap changes
  also need arch updates for non-x86.
  There might also be more spec changes.
  
  Posting here for early feedback, and to allow Sasha to
  proceed with his kvm tool work.
  
  Changes from v1:
  Updated to match v3 of the spec, see:
  Subject: [PATCHv3 RFC] virtio-spec: flexible configuration layout
  Message-ID: 2010122436.ga13...@redhat.com
  In-Reply-To: 2009195901.ga28...@redhat.com
 
 Looks like this conflicts with your other iomap changes... I didn't
 check your latest tree; do you just add another patch on top for the
 virtio changes now?
 
 Thanks,

Yes. Rusty asked for more changes so that isn't yet pushed.

 -- 
 Jesse Barnes, Intel Open Source Technology Center



[PATCH 4/5 V5] Add generic stubs for kvm stop check functions

2011-12-05 Thread Eric B Munson

Signed-off-by: Eric B Munson emun...@mgebm.net
Cc: mi...@redhat.com
Cc: h...@zytor.com
Cc: a...@arndb.de
Cc: ry...@linux.vnet.ibm.com
Cc: aligu...@us.ibm.com
Cc: mtosa...@redhat.com
Cc: jeremy.fitzhardi...@citrix.com
Cc: levinsasha...@gmail.com
Cc: Jan Kiszka jan.kis...@siemens.com
Cc: kvm@vger.kernel.org
Cc: linux-a...@vger.kernel.org
Cc: x...@kernel.org
Cc: linux-ker...@vger.kernel.org
---
 include/asm-generic/kvm_para.h |   14 ++
 1 files changed, 14 insertions(+), 0 deletions(-)
 create mode 100644 include/asm-generic/kvm_para.h

diff --git a/include/asm-generic/kvm_para.h b/include/asm-generic/kvm_para.h
new file mode 100644
index 000..177e1eb
--- /dev/null
+++ b/include/asm-generic/kvm_para.h
@@ -0,0 +1,14 @@
+#ifndef _ASM_GENERIC_KVM_PARA_H
+#define _ASM_GENERIC_KVM_PARA_H
+
+
+/*
+ * This function is used by architectures that support kvm to avoid issuing
+ * false soft lockup messages.
+ */
+static inline bool kvm_check_and_clear_guest_paused(int cpu)
+{
+   return false;
+}
+
+#endif
-- 
1.7.5.4


[PATCH 3/5 V5] Add ioctl for KVMCLOCK_GUEST_STOPPED

2011-12-05 Thread Eric B Munson

Now that we have a flag that will tell the guest it was suspended, create an
interface for that communication using a KVM ioctl.
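
For illustration, a userspace caller might issue the proposed ioctl on each vcpu file descriptor roughly as follows. This is a sketch: notify_guest_paused() is a hypothetical helper, and the request number is simply the one this series proposes, not a confirmed upstream ABI.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/ioctl.h>

/* KVMIO as in linux/kvm.h; the 0xaa request number is taken from this
 * patch and is an assumption about what would finally be merged. */
#ifndef KVMIO
#define KVMIO 0xAE
#endif
#ifndef KVMCLOCK_GUEST_PAUSED
#define KVMCLOCK_GUEST_PAUSED _IO(KVMIO, 0xaa)
#endif

/*
 * Issue the ioctl on every vcpu fd in turn.
 * Returns 0 on success or the negative errno of the first failure.
 */
static int notify_guest_paused(const int *vcpu_fds, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (ioctl(vcpu_fds[i], KVMCLOCK_GUEST_PAUSED, 0) < 0)
            return -errno;
    }
    return 0;
}
```

A caller can treat -EINVAL as "this guest has no pvclock" and skip it quietly, which is how the qemu side of the series handles that return value.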

Signed-off-by: Eric B Munson emun...@mgebm.net

Cc: mi...@redhat.com
Cc: h...@zytor.com
Cc: a...@arndb.de
Cc: ry...@linux.vnet.ibm.com
Cc: aligu...@us.ibm.com
Cc: mtosa...@redhat.com
Cc: jeremy.fitzhardi...@citrix.com
Cc: levinsasha...@gmail.com
Cc: Jan Kiszka jan.kis...@siemens.com
Cc: kvm@vger.kernel.org
Cc: linux-a...@vger.kernel.org
Cc: x...@kernel.org
Cc: linux-ker...@vger.kernel.org
---
Changes from V4:
 Rename KVM_GUEST_PAUSED to KVMCLOCK_GUEST_PAUSED
 Add new ioctl description to api.txt

 Documentation/virtual/kvm/api.txt |   12 
 arch/x86/include/asm/kvm_host.h   |2 ++
 arch/x86/kvm/x86.c|   20 
 include/linux/kvm.h   |2 ++
 4 files changed, 36 insertions(+), 0 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index 7945b0b..0f7dd99 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1450,6 +1450,18 @@ is supported; 2 if the processor requires all virtual 
machines to have
 an RMA, or 1 if the processor can use an RMA but doesn't require it,
 because it supports the Virtual RMA (VRMA) facility.
 
+4.64 KVMCLOCK_GUEST_PAUSED
+
+Capability: basic
+Architectures: Any that implement pvclocks (currently x86 only)
+Type: vcpu ioctl
+Parameters: None
+Returns: 0 on success, -1 on error
+
+This signals to the host kernel that the specified guest is being paused by
+userspace.  The host will set a flag in the pvclock structure that is checked
+from the soft lockup watchdog.
+
 5. The kvm_run structure
 
 Application code obtains a pointer to the kvm_run structure by
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b4973f4..beb94c6 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -672,6 +672,8 @@ int kvm_pv_mmu_op(struct kvm_vcpu *vcpu, unsigned long 
bytes,
  gpa_t addr, unsigned long *ret);
 u8 kvm_get_guest_memory_type(struct kvm_vcpu *vcpu, gfn_t gfn);
 
+int kvm_set_guest_paused(struct kvm_vcpu *vcpu);
+
 extern bool tdp_enabled;
 
 u64 vcpu_tsc_khz(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c38efd7..1dab5fd 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3295,6 +3295,10 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 
goto out;
}
+   case KVMCLOCK_GUEST_PAUSED: {
+   r = kvm_set_guest_paused(vcpu);
+   break;
+   }
default:
r = -EINVAL;
}
@@ -6117,6 +6121,22 @@ int kvm_task_switch(struct kvm_vcpu *vcpu, u16 
tss_selector, int reason,
 }
 EXPORT_SYMBOL_GPL(kvm_task_switch);
 
+/*
+ * kvm_set_guest_paused() indicates to the guest kernel that it has been
+ * stopped by the hypervisor.  This function will be called from the host only.
+ * EINVAL is returned when the host attempts to set the flag for a guest that
+ * does not support pv clocks.
+ */
+int kvm_set_guest_paused(struct kvm_vcpu *vcpu)
+{
+   struct pvclock_vcpu_time_info *src = &vcpu->arch.hv_clock;
+   if (!vcpu->arch.time_page)
+      return -EINVAL;
+   src->flags |= PVCLOCK_GUEST_STOPPED;
+   return 0;
+}
+EXPORT_SYMBOL_GPL(kvm_set_guest_paused);
+
 int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
  struct kvm_sregs *sregs)
 {
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index c3892fc..1d1ddef 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -762,6 +762,8 @@ struct kvm_clock_data {
 #define KVM_CREATE_SPAPR_TCE _IOW(KVMIO,  0xa8, struct kvm_create_spapr_tce)
 /* Available with KVM_CAP_RMA */
 #define KVM_ALLOCATE_RMA _IOR(KVMIO,  0xa9, struct kvm_allocate_rma)
+/* VM is being stopped by host */
+#define KVMCLOCK_GUEST_PAUSED _IO(KVMIO,   0xaa)
 
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)
 
-- 
1.7.5.4


Re: [net-next RFC PATCH 5/5] virtio-net: flow director support

2011-12-05 Thread Ben Hutchings

On Mon, 2011-12-05 at 16:59 +0800, Jason Wang wrote:
 In order to let the packets of a flow be passed to the desired
 guest cpu, we can co-operate with devices by programming the flow
 director, which is just a hash to queue table.
 
 This kind of co-operation is done through the accelerated RFS support;
 a device-specific flow steering method, virtnet_fd(), is used to modify
 the flow director based on the rfs mapping. The desired queue is
 calculated through reverse mapping of the irq affinity table. In order
 to parallelize the ingress path, the irq affinity of each rx queue is also
 provided by the driver.
 
 In addition to accelerated RFS, we can also use the guest scheduler to
 balance the load of TX and reduce the lock contention on the egress path,
 so smp_processor_id() is used for tx queue selection.
[...]
 +#ifdef CONFIG_RFS_ACCEL
 +
 +int virtnet_fd(struct net_device *net_dev, const struct sk_buff *skb,
 +u16 rxq_index, u32 flow_id)
 +{
 + struct virtnet_info *vi = netdev_priv(net_dev);
 + u16 *table = NULL;
 +
 + if (skb->protocol != htons(ETH_P_IP) || !skb->rxhash)
 + return -EPROTONOSUPPORT;

Why only IPv4?

 + table = kmap_atomic(vi->fd_page);
 + table[skb->rxhash & TAP_HASH_MASK] = rxq_index;
 + kunmap_atomic(table);
 +
 + return 0;
 +}
 +#endif

This is not a proper implementation of ndo_rx_flow_steer.  If you steer
a flow by changing the RSS table this can easily cause packet reordering
in other flows.  The filtering should be more precise, ideally matching
exactly a single flow by e.g. VID and IP 5-tuple.

I think you need to add a second hash table which records exactly which
flow is supposed to be steered.  Also, you must call
rps_may_expire_flow() to check whether an entry in this table may be
replaced; otherwise you can cause packet reordering in the flow that was
previously being steered.

Finally, this function must return the table index it assigned, so that
rps_may_expire_flow() works.
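
To sketch what I mean, here is a toy exact-match filter table in plain C. All names here are hypothetical; the may_expire callback stands in for the rps_may_expire_flow() check, and a real driver would also match on VID and would need locking.

```c
#include <stdbool.h>
#include <stdint.h>

#define FLOW_SLOTS 64  /* hypothetical table size; must be a power of two */

struct flow_key {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
    uint8_t  proto;
};

struct flow_slot {
    bool            used;
    struct flow_key key;
    uint16_t        rxq;
};

struct flow_table {
    struct flow_slot slot[FLOW_SLOTS];
};

static bool key_equal(const struct flow_key *a, const struct flow_key *b)
{
    return a->saddr == b->saddr && a->daddr == b->daddr &&
           a->sport == b->sport && a->dport == b->dport &&
           a->proto == b->proto;
}

/* Caller-supplied policy standing in for rps_may_expire_flow(). */
typedef bool (*may_expire_fn)(const struct flow_slot *old, int index);

/* Example policy: an installed filter is never replaced. */
static bool never_expire(const struct flow_slot *old, int index)
{
    (void)old;
    (void)index;
    return false;
}

/*
 * Steer exactly one flow to rxq. Returns the slot index, which doubles as
 * the filter id, or -1 if the slot holds another flow that may not expire.
 */
static int flow_steer(struct flow_table *t, const struct flow_key *k,
                      uint32_t hash, uint16_t rxq, may_expire_fn may_expire)
{
    int idx = (int)(hash & (FLOW_SLOTS - 1));
    struct flow_slot *s = &t->slot[idx];

    if (s->used && !key_equal(&s->key, k) && !may_expire(s, idx))
        return -1;

    s->used = true;
    s->key = *k;
    s->rxq = rxq;
    return idx;
}

/* Receive path: only an exact key match overrides the default queue. */
static uint16_t flow_lookup(const struct flow_table *t,
                            const struct flow_key *k,
                            uint32_t hash, uint16_t default_rxq)
{
    const struct flow_slot *s = &t->slot[hash & (FLOW_SLOTS - 1)];

    return (s->used && key_equal(&s->key, k)) ? s->rxq : default_rxq;
}
```

Other flows that happen to share the hash bucket keep their default queue, so steering one flow does not reorder packets in the others.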

 +static u16 virtnet_select_queue(struct net_device *dev, struct sk_buff *skb)
 +{
 + int txq = skb_rx_queue_recorded(skb) ? skb_get_rx_queue(skb) :
 +smp_processor_id();
 +
 + /* As we make use of the accelerate rfs which let the scheduler to
 +  * balance the load, it make sense to choose the tx queue also based on
 +  * the processor id?
 +  */
 + while (unlikely(txq >= dev->real_num_tx_queues))
 + txq -= dev->real_num_tx_queues;
 + return txq;
 +}
[...]

Don't do this, let XPS handle it.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


kvm deadlock

2011-12-05 Thread Nate Custer

Hello,

I am struggling with repeatable full hardware locks when running 8-12 KVM vms. 
At some point before the hard lock I get an inconsistent lock state warning. An 
example of this can be found here:

http://pastebin.com/8wKhgE2C

After that the server continues to run for a while and then starts its death 
spiral. When it reaches that point it fails to log anything further to the 
disk, but by attaching a console I have been able to get a stack trace 
documenting the final implosion:

http://pastebin.com/PbcN76bd

All of the cores end up hung and the server stops responding to all input, 
including SysRq commands. 

I have seen this behavior on two machines (dual E5606 running Fedora 16) both 
passed cpuburnin testing and memtest86 scans without error. 

I have reproduced the crash and stack traces from a Fedora debugging kernel - 
3.1.2-1 and with a vanilla 3.1.4 kernel.

Nate Custer
QA Analyst
cPanel Inc

Re: [Xen-devel] [PATCH RFC V3 1/4] debugfs: Add support to print u32 array in debugfs

2011-12-05 Thread Konrad Rzeszutek Wilk

On Wed, Nov 30, 2011 at 02:29:39PM +0530, Raghavendra K T wrote:
 Add debugfs support to print u32-arrays in debugfs. Move the code from Xen to 
 debugfs
 to make the code common for other users as well.
 
 Signed-off-by: Srivatsa Vaddagiri va...@linux.vnet.ibm.com
 Signed-off-by: Suzuki Poulose suz...@in.ibm.com
 Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com

Looks good to me.
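
For anyone not familiar with the code being moved: it sizes the output with a measuring pass, then allocates and formats. A minimal userspace sketch of that two-pass pattern follows; the function names are hypothetical and it uses snprintf/malloc instead of the kernel helpers.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Format n values as "v0 v1 ... vN\n" into buf (capacity cap).
 * Returns the number of bytes needed, including the trailing NUL.
 * Passing buf == NULL and cap == 0 only measures.
 */
static size_t format_u32_array(char *buf, size_t cap,
                               const uint32_t *a, size_t n)
{
    size_t used = 0;

    for (size_t i = 0; i < n; i++) {
        char sep = (i == n - 1) ? '\n' : ' ';
        int len = snprintf(buf ? buf + used : NULL, buf ? cap - used : 0,
                           "%u%c", (unsigned)a[i], sep);

        used += (size_t)len;
    }
    return used + 1;
}

/* Measure first, then allocate exactly and format; caller frees. */
static char *format_u32_array_alloc(const uint32_t *a, size_t n)
{
    size_t need = format_u32_array(NULL, 0, a, n);
    char *s = malloc(need);

    if (s) {
        s[0] = '\0';  /* keeps the result a valid string when n == 0 */
        format_u32_array(s, need, a, n);
    }
    return s;
}
```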
 ---
 diff --git a/arch/x86/xen/debugfs.c b/arch/x86/xen/debugfs.c
 index 7c0fedd..c8377fb 100644
 --- a/arch/x86/xen/debugfs.c
 +++ b/arch/x86/xen/debugfs.c
 @@ -19,107 +19,3 @@ struct dentry * __init xen_init_debugfs(void)
   return d_xen_debug;
  }
  
 -struct array_data
 -{
 - void *array;
 - unsigned elements;
 -};
 -
 -static int u32_array_open(struct inode *inode, struct file *file)
 -{
 - file->private_data = NULL;
 - return nonseekable_open(inode, file);
 -}
 -
 -static size_t format_array(char *buf, size_t bufsize, const char *fmt,
 -u32 *array, unsigned array_size)
 -{
 - size_t ret = 0;
 - unsigned i;
 -
 - for(i = 0; i < array_size; i++) {
 - size_t len;
 -
 - len = snprintf(buf, bufsize, fmt, array[i]);
 - len++;  /* ' ' or '\n' */
 - ret += len;
 -
 - if (buf) {
 - buf += len;
 - bufsize -= len;
 - buf[-1] = (i == array_size-1) ? '\n' : ' ';
 - }
 - }
 -
 - ret++;  /* \0 */
 - if (buf)
 - *buf = '\0';
 -
 - return ret;
 -}
 -
 -static char *format_array_alloc(const char *fmt, u32 *array, unsigned array_size)
 -{
 - size_t len = format_array(NULL, 0, fmt, array, array_size);
 - char *ret;
 -
 - ret = kmalloc(len, GFP_KERNEL);
 - if (ret == NULL)
 - return NULL;
 -
 - format_array(ret, len, fmt, array, array_size);
 - return ret;
 -}
 -
 -static ssize_t u32_array_read(struct file *file, char __user *buf, size_t len,
 -   loff_t *ppos)
 -{
 - struct inode *inode = file->f_path.dentry->d_inode;
 - struct array_data *data = inode->i_private;
 - size_t size;
 -
 - if (*ppos == 0) {
 - if (file->private_data) {
 - kfree(file->private_data);
 - file->private_data = NULL;
 - }
 -
 - file->private_data = format_array_alloc("%u", data->array, data->elements);
 - }
 -
 - size = 0;
 - if (file->private_data)
 - size = strlen(file->private_data);
 -
 - return simple_read_from_buffer(buf, len, ppos, file->private_data, size);
 -}
 -
 -static int xen_array_release(struct inode *inode, struct file *file)
 -{
 - kfree(file->private_data);
 -
 - return 0;
 -}
 -
 -static const struct file_operations u32_array_fops = {
 - .owner  = THIS_MODULE,
 - .open   = u32_array_open,
 - .release= xen_array_release,
 - .read   = u32_array_read,
 - .llseek = no_llseek,
 -};
 -
 -struct dentry *xen_debugfs_create_u32_array(const char *name, mode_t mode,
 - struct dentry *parent,
 - u32 *array, unsigned elements)
 -{
 - struct array_data *data = kmalloc(sizeof(*data), GFP_KERNEL);
 -
 - if (data == NULL)
 - return NULL;
 -
 - data->array = array;
 - data->elements = elements;
 -
 - return debugfs_create_file(name, mode, parent, data, &u32_array_fops);
 -}
 diff --git a/arch/x86/xen/debugfs.h b/arch/x86/xen/debugfs.h
 index e281320..12ebf33 100644
 --- a/arch/x86/xen/debugfs.h
 +++ b/arch/x86/xen/debugfs.h
 @@ -3,8 +3,4 @@
  
  struct dentry * __init xen_init_debugfs(void);
  
 -struct dentry *xen_debugfs_create_u32_array(const char *name, mode_t mode,
 - struct dentry *parent,
 - u32 *array, unsigned elements);
 -
  #endif /* _XEN_DEBUGFS_H */
 diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
 index fc506e6..14a8961 100644
 --- a/arch/x86/xen/spinlock.c
 +++ b/arch/x86/xen/spinlock.c
 @@ -286,7 +286,7 @@ static int __init xen_spinlock_debugfs(void)
   debugfs_create_u64("time_blocked", 0444, d_spin_debug,
  &spinlock_stats.time_blocked);
  
 - xen_debugfs_create_u32_array("histo_blocked", 0444, d_spin_debug,
 + debugfs_create_u32_array("histo_blocked", 0444, d_spin_debug,
   spinlock_stats.histo_spin_blocked, HISTO_BUCKETS + 1);
  
   return 0;
 diff --git a/fs/debugfs/file.c b/fs/debugfs/file.c
 index 90f7657..df44ccf 100644
 --- a/fs/debugfs/file.c
 +++ b/fs/debugfs/file.c
 @@ -18,6 +18,7 @@
  #include <linux/pagemap.h>
  #include <linux/namei.h>
  #include <linux/debugfs.h>
 +#include <linux/slab.h>
  
  static ssize_t default_read_file(struct file *file, char __user *buf,
size_t count, loff_t *ppos)
 @@ -525,3 +526,130 @@
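
The format_array()/format_array_alloc() pair quoted above relies on a two-pass sizing idiom: a first call with a NULL buffer only counts bytes, and a second call fills a buffer of exactly that size. A minimal userspace sketch of the same idiom follows. It assumes malloc() in place of kmalloc() and a fixed "%u"-style format; the names format_u32_array() and format_u32_array_alloc() are made up for illustration and are not part of the patch.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Pass 1 (buf == NULL, bufsize == 0): only count the bytes required.
 * Pass 2 (buf != NULL): write "v0 v1 ... vN\n" followed by a NUL.
 * Returns the total size in bytes, including the terminating NUL.
 */
static size_t format_u32_array(char *buf, size_t bufsize,
                               const uint32_t *array, unsigned int n)
{
    size_t total = 0;
    unsigned int i;

    for (i = 0; i < n; i++) {
        /* snprintf(NULL, 0, ...) is valid C99 and returns the needed length */
        size_t len = (size_t)snprintf(buf, bufsize, "%" PRIu32, array[i]);

        len++;                  /* room for the ' ' or '\n' separator */
        total += len;

        if (buf) {
            buf += len;
            bufsize -= len;
            /* overwrite the NUL that snprintf left behind with the separator */
            buf[-1] = (i == n - 1) ? '\n' : ' ';
        }
    }

    total++;                    /* terminating NUL */
    if (buf)
        *buf = '\0';

    return total;
}

/* Size the output first, then allocate exactly that much and format into it. */
static char *format_u32_array_alloc(const uint32_t *array, unsigned int n)
{
    size_t len = format_u32_array(NULL, 0, array, n);
    char *ret = malloc(len);

    if (ret == NULL)
        return NULL;

    format_u32_array(ret, len, array, n);
    return ret;
}
```

The caller owns the returned string and frees it, which mirrors how the release handler in the quoted code frees file->private_data.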

[no subject]

2011-12-05 Thread Cao,Bing Bu


subscribe kvm


Re: [PATCH RFC V3 4/4] kvm : pv-ticketlocks support for linux guests running on KVM hypervisor

2011-12-05 Thread Konrad Rzeszutek Wilk

On Wed, Nov 30, 2011 at 02:30:38PM +0530, Raghavendra K T wrote:
 This patch extends Linux guests running on KVM hypervisor to support
 pv-ticketlocks.
 During smp_boot_cpus, a paravirtualized KVM guest detects if the hypervisor has
 the required feature (KVM_FEATURE_KICK_VCPU) to support pv-ticketlocks. If so,
 support for pv-ticketlocks is registered via pv_lock_ops.
 
 Signed-off-by: Srivatsa Vaddagiri va...@linux.vnet.ibm.com
 Signed-off-by: Suzuki Poulose suz...@in.ibm.com
 Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
 ---
 diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
 index 8b1d65d..7e419ad 100644
 --- a/arch/x86/include/asm/kvm_para.h
 +++ b/arch/x86/include/asm/kvm_para.h
 @@ -195,10 +195,21 @@ void kvm_async_pf_task_wait(u32 token);
  void kvm_async_pf_task_wake(u32 token);
  u32 kvm_read_and_reset_pf_reason(void);
  extern void kvm_disable_steal_time(void);
 -#else
 -#define kvm_guest_init() do { } while (0)
 +
 +#ifdef CONFIG_PARAVIRT_SPINLOCKS
 +void __init kvm_spinlock_init(void);
 +#else /* CONFIG_PARAVIRT_SPINLOCKS */
 +static void kvm_spinlock_init(void)
 +{
 +}
 +#endif /* CONFIG_PARAVIRT_SPINLOCKS */
 +
 +#else /* CONFIG_KVM_GUEST */
 +#define kvm_guest_init() do {} while (0)
  #define kvm_async_pf_task_wait(T) do {} while(0)
  #define kvm_async_pf_task_wake(T) do {} while(0)
 +#define kvm_spinlock_init() do {} while (0)
 +
  static inline u32 kvm_read_and_reset_pf_reason(void)
  {
   return 0;
 diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
 index a9c2116..dffeea3 100644
 --- a/arch/x86/kernel/kvm.c
 +++ b/arch/x86/kernel/kvm.c
 @@ -33,6 +33,7 @@
  #include <linux/sched.h>
  #include <linux/slab.h>
  #include <linux/kprobes.h>
 +#include <linux/debugfs.h>
  #include <asm/timer.h>
  #include <asm/cpu.h>
  #include <asm/traps.h>
 @@ -545,6 +546,7 @@ static void __init kvm_smp_prepare_boot_cpu(void)
  #endif
   kvm_guest_cpu_init();
   native_smp_prepare_boot_cpu();
 + kvm_spinlock_init();
  }
  
  static void __cpuinit kvm_guest_cpu_online(void *dummy)
 @@ -627,3 +629,248 @@ static __init int activate_jump_labels(void)
   return 0;
  }
  arch_initcall(activate_jump_labels);
 +
 +#ifdef CONFIG_PARAVIRT_SPINLOCKS
 +
 +enum kvm_contention_stat {
 + TAKEN_SLOW,
 + TAKEN_SLOW_PICKUP,
 + RELEASED_SLOW,
 + RELEASED_SLOW_KICKED,
 + NR_CONTENTION_STATS
 +};
 +
 +#ifdef CONFIG_KVM_DEBUG_FS
 +
 +static struct kvm_spinlock_stats
 +{
 + u32 contention_stats[NR_CONTENTION_STATS];
 +
 +#define HISTO_BUCKETS 30
 + u32 histo_spin_blocked[HISTO_BUCKETS+1];
 +
 + u64 time_blocked;
 +} spinlock_stats;
 +
 +static u8 zero_stats;
 +
 +static inline void check_zero(void)
 +{
 + u8 ret;
 + u8 old = ACCESS_ONCE(zero_stats);
 + if (unlikely(old)) {
 + ret = cmpxchg(&zero_stats, old, 0);
 + /* This ensures only one fellow resets the stat */
 + if (ret == old)
 + memset(&spinlock_stats, 0, sizeof(spinlock_stats));
 + }
 +}
 +
 +static inline void add_stats(enum kvm_contention_stat var, int val)

You probably want 'int val' to be 'u32 val' as that is the type
in contention_stats.

[PATCH 00/28] kvm tools: Prepare kvmtool for another architecture

2011-12-05 Thread Matt Evans

Hi,


This patch series rearranges and tidies various parts of kvmtool to pave the way
for the addition of support for another architecture -- SPAPR PPC64.  A second
patch series will follow to present the PPC64 support.

kvmtool is extremely x86-specific, so a fair chunk of refactoring into common
code vs architecture-specific code is performed in this set.  It also has a
(refreshingly small) set of endian bugs that are fixed, plus assumptions about
the hardware presented to the guest.

I've started the series with the main meat-- moving/renaming things like bios,
CPU setup, guest address space layout, interrupts, ioports etc., into a new x86/
directory.  The Makefile determines an architecture and builds the appropriate
dir, devices, etc.

Follow-on patches change some of the mechanics, for example modifying the loop
around ioctl(KVM_RUN) so that whilst it stays generic, it calls into
arch-specific code to handle specific exit reasons, MMIO etc.  The builtin-run
initialisation path is rationalised so that PCI & IRQs are initialised before
devices, and all of this happens before arch-specific code is given the chance
to initialise any firmware and generate any device trees.

Most of this series is fairly trivial, in moving code, making definitions
arch-local or available via a header, endian sanitisation.  The PCI code changes
are probably most 'interesting', in that I have made the config space accesses
available to those not using the PC ioport access method, plus wrapped
initialisations of config space with cpu_to_leXX accesses.

If there's anything in this series that'll cause the world to end, or stain, do
let me know. :)


Cheers,


Matt



Matt Evans (28):
  kvm tools: Split x86 arch-specific bits into x86/
  kvm tools: Only build/init i8042 on x86
  kvm tools: Add Makefile parameter for kernel include path
  kvm tools: Re-arrange Makefile to heed CFLAGS before checking for
optional libs
  kvm tools: 64-bit tidy; use PRIx64 when printf'ing u64s and link
appropriately
  kvm tools: Add arch-specific KVM_RUN exit handling via
kvm_cpu__handle_exit()
  kvm tools: Move 'kvm__recommended_cpus' to arch-specific code
  kvm tools: Fix KVM_RUN exit code check
  kvm tools: Add kvm__arch_periodic_poll()
  kvm tools: term.h needs to include stdbool.h
  kvm tools: kvm.c needs to include sys/stat.h for mkdir
  kvm tools: Move arch-specific cmdline init into
kvm__arch_set_cmdline()
  kvm tools: Add CONSOLE_HV term type and allow it to be selected
  kvm tools: Fix term_getc(), term_getc_iov() endian bugs
  kvm tools: Allow initrd_check() to match a cpio
  kvm tools: Allow load_flat_binary() to load an initrd alongside
  kvm tools: Only call symbol__init() if we have BFD
  kvm tools: Initialise PCI before devices start getting registered
with PCI
  kvm tools: Perform CPU and firmware setup after devices are added
  kvm tools: Init IRQs after determining nrcpus
  kvm tools: Add --hugetlbfs option to specify memory path
  kvm tools: Move PCI_MAX_DEVICES to pci.h
  kvm tools: Endian-sanitise pci.h and PCI device setup
  kvm tools: Fix virtio-pci endian bug when reading
VIRTIO_PCI_QUEUE_NUM
  kvm tools: Correctly set virtio-pci bar_size and remove hardwired
address
  kvm tools: Add pci__config_{rd,wr}(), pci__find_dev() and fix PCI
config register addressing
  kvm tools: Arch-specific define for PCI MMIO allocation area
  kvm tools: Create arch-specific kvm_cpu__emulate_io()

 tools/kvm/Makefile  |  139 +---
 tools/kvm/builtin-run.c |   82 +++--
 tools/kvm/builtin-stat.c|4 +-
 tools/kvm/disk/core.c   |4 +-
 tools/kvm/hw/pci-shmem.c|   23 +-
 tools/kvm/hw/vesa.c |   15 +-
 tools/kvm/include/kvm/ioport.h  |   13 +-
 tools/kvm/include/kvm/kvm-cpu.h |   30 +--
 tools/kvm/include/kvm/kvm.h |   62 +---
 tools/kvm/include/kvm/pci.h |   30 ++-
 tools/kvm/include/kvm/term.h|2 +
 tools/kvm/ioport.c  |   54 ---
 tools/kvm/kvm-cpu.c |  407 +-
 tools/kvm/kvm.c |  374 +---
 tools/kvm/mmio.c|4 +-
 tools/kvm/pci.c |   76 +++--
 tools/kvm/term.c|5 +-
 tools/kvm/virtio/pci.c  |   51 ++--
 tools/kvm/{ => x86}/bios.c  |0
 tools/kvm/{ => x86}/bios/.gitignore |0
 tools/kvm/{ => x86}/bios/bios-rom.S |2 +-
 tools/kvm/{ => x86}/bios/e820.c |0
 tools/kvm/{ => x86}/bios/entry.S|0
 tools/kvm/{ => x86}/bios/gen-offsets.sh |0
 tools/kvm/{ => x86}/bios/int10.c|0
 tools/kvm/{ => x86}/bios/int15.c

[PATCH 02/28] kvm tools: Only build/init i8042 on x86

2011-12-05 Thread Matt Evans

Not every architecture has an i8042 kbd controller, so only use this when
building for x86.

Signed-off-by: Matt Evans m...@ozlabs.org
---
 tools/kvm/Makefile  |2 +-
 tools/kvm/builtin-run.c |2 ++
 2 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile
index 243886e..f58a1d8 100644
--- a/tools/kvm/Makefile
+++ b/tools/kvm/Makefile
@@ -77,7 +77,6 @@ OBJS  += util/strbuf.o
 OBJS   += virtio/9p.o
 OBJS   += virtio/9p-pdu.o
 OBJS   += hw/vesa.o
-OBJS   += hw/i8042.o
 OBJS   += hw/pci-shmem.o
 OBJS   += kvm-ipc.o
 
@@ -153,6 +152,7 @@ ifeq ($(ARCH),x86)
OBJS+= x86/kvm.o
OBJS+= x86/kvm-cpu.o
OBJS+= x86/mptable.o
+   OBJS+= hw/i8042.o
 # Exclude BIOS object files from header dependencies.
OTHEROBJS   += x86/bios.o
OTHEROBJS   += x86/bios/bios-rom.o
diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c
index 9148d83..e4aa87e 100644
--- a/tools/kvm/builtin-run.c
+++ b/tools/kvm/builtin-run.c
@@ -941,7 +941,9 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix)
 
kvm__init_ram(kvm);
 
+#ifdef CONFIG_X86
kbd__init(kvm);
+#endif
 
pci_shmem__init(kvm);
 

[PATCH 03/28] kvm tools: Add Makefile parameter for kernel include path

2011-12-05 Thread Matt Evans

This patch adds an 'I' parameter to override the default kernel include path of
'../../include'.

Signed-off-by: Matt Evans m...@ozlabs.org
---
 tools/kvm/Makefile |9 +++--
 1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile
index f58a1d8..f85a154 100644
--- a/tools/kvm/Makefile
+++ b/tools/kvm/Makefile
@@ -9,7 +9,12 @@ else
E = @\#
Q =
 endif
-export E Q
+ifneq ($(I), )
+   KINCL_PATH=$(I)
+else
+   KINCL_PATH=../..
+endif
+export E Q KINCL_PATH
 
 include config/utilities.mak
 include config/feature-tests.mak
@@ -176,7 +181,7 @@ DEFINES += -DKVMTOOLS_VERSION='"$(KVMTOOLS_VERSION)"'
 DEFINES+= -DBUILD_ARCH='"$(ARCH)"'
 
 KVM_INCLUDE := include
-CFLAGS += $(CPPFLAGS) $(DEFINES) -I$(KVM_INCLUDE) -I$(ARCH_INCLUDE) -I../../include -I../../arch/$(ARCH)/include/ -Os -g
+CFLAGS += $(CPPFLAGS) $(DEFINES) -I$(KVM_INCLUDE) -I$(ARCH_INCLUDE) -I$(KINCL_PATH)/include -I$(KINCL_PATH)/arch/$(ARCH)/include/ -Os -g
 
 ifneq ($(WERROR),0)
WARNINGS += -Werror

[PATCH 04/28] kvm tools: Re-arrange Makefile to heed CFLAGS before checking for optional libs

2011-12-05 Thread Matt Evans

The checks for optional libraries build code to perform the tests, so should
respect certain CFLAGS -- in particular, -m64 so we check for 64bit libraries if
they're required.

Signed-off-by: Matt Evans m...@ozlabs.org
---
 tools/kvm/Makefile |   86 ++-
 1 files changed, 44 insertions(+), 42 deletions(-)

diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile
index f85a154..009a6ba 100644
--- a/tools/kvm/Makefile
+++ b/tools/kvm/Makefile
@@ -85,48 +85,6 @@ OBJS += hw/vesa.o
 OBJS   += hw/pci-shmem.o
 OBJS   += kvm-ipc.o
 
-FLAGS_BFD := $(CFLAGS) -lbfd
-has_bfd := $(call try-cc,$(SOURCE_BFD),$(FLAGS_BFD))
-ifeq ($(has_bfd),y)
-   CFLAGS  += -DCONFIG_HAS_BFD
-   OBJS+= symbol.o
-   LIBS+= -lbfd
-endif
-
-FLAGS_VNCSERVER := $(CFLAGS) -lvncserver
-has_vncserver := $(call try-cc,$(SOURCE_VNCSERVER),$(FLAGS_VNCSERVER))
-ifeq ($(has_vncserver),y)
-   OBJS+= ui/vnc.o
-   CFLAGS  += -DCONFIG_HAS_VNCSERVER
-   LIBS+= -lvncserver
-endif
-
-FLAGS_SDL := $(CFLAGS) -lSDL
-has_SDL := $(call try-cc,$(SOURCE_SDL),$(FLAGS_SDL))
-ifeq ($(has_SDL),y)
-   OBJS+= ui/sdl.o
-   CFLAGS  += -DCONFIG_HAS_SDL
-   LIBS+= -lSDL
-endif
-
-FLAGS_ZLIB := $(CFLAGS) -lz
-has_ZLIB := $(call try-cc,$(SOURCE_ZLIB),$(FLAGS_ZLIB))
-ifeq ($(has_ZLIB),y)
-   CFLAGS  += -DCONFIG_HAS_ZLIB
-   LIBS+= -lz
-endif
-
-FLAGS_AIO := $(CFLAGS) -laio
-has_AIO := $(call try-cc,$(SOURCE_AIO),$(FLAGS_AIO))
-ifeq ($(has_AIO),y)
-   CFLAGS  += -DCONFIG_HAS_AIO
-   LIBS+= -laio
-endif
-
-LIBS   += -lrt
-LIBS   += -lpthread
-LIBS   += -lutil
-
 # Additional ARCH settings for x86
 ARCH ?= $(shell echo $(uname_M) | sed -e s/i.86/i386/ -e s/sun4u/sparc64/ \
   -e s/arm.*/arm/ -e s/sa110/arm/ \
@@ -172,6 +130,50 @@ else
UNSUPP_ERR =
 endif
 
+
+FLAGS_BFD := $(CFLAGS) -lbfd
+has_bfd := $(call try-cc,$(SOURCE_BFD),$(FLAGS_BFD))
+ifeq ($(has_bfd),y)
+   CFLAGS  += -DCONFIG_HAS_BFD
+   OBJS+= symbol.o
+   LIBS+= -lbfd
+endif
+
+FLAGS_VNCSERVER := $(CFLAGS) -lvncserver
+has_vncserver := $(call try-cc,$(SOURCE_VNCSERVER),$(FLAGS_VNCSERVER))
+ifeq ($(has_vncserver),y)
+   OBJS+= ui/vnc.o
+   CFLAGS  += -DCONFIG_HAS_VNCSERVER
+   LIBS+= -lvncserver
+endif
+
+FLAGS_SDL := $(CFLAGS) -lSDL
+has_SDL := $(call try-cc,$(SOURCE_SDL),$(FLAGS_SDL))
+ifeq ($(has_SDL),y)
+   OBJS+= ui/sdl.o
+   CFLAGS  += -DCONFIG_HAS_SDL
+   LIBS+= -lSDL
+endif
+
+FLAGS_ZLIB := $(CFLAGS) -lz
+has_ZLIB := $(call try-cc,$(SOURCE_ZLIB),$(FLAGS_ZLIB))
+ifeq ($(has_ZLIB),y)
+   CFLAGS  += -DCONFIG_HAS_ZLIB
+   LIBS+= -lz
+endif
+
+FLAGS_AIO := $(CFLAGS) -laio
+has_AIO := $(call try-cc,$(SOURCE_AIO),$(FLAGS_AIO))
+ifeq ($(has_AIO),y)
+   CFLAGS  += -DCONFIG_HAS_AIO
+   LIBS+= -laio
+endif
+
+LIBS   += -lrt
+LIBS   += -lpthread
+LIBS   += -lutil
+
+
 DEPS   := $(patsubst %.o,%.d,$(OBJS))
 OBJS   += $(OTHEROBJS)
 

[PATCH 05/28] kvm tools: 64-bit tidy; use PRIx64 when printf'ing u64s and link appropriately

2011-12-05 Thread Matt Evans

On LP64 systems our u64s are just longs; remove the %llx'es in favour of PRIx64
etc.

This patch also adds CFLAGS to the final link, so that any -m64 is obeyed when
linking, too.

Signed-off-by: Matt Evans m...@ozlabs.org
---
 tools/kvm/Makefile   |2 +-
 tools/kvm/builtin-run.c  |   14 --
 tools/kvm/builtin-stat.c |4 +++-
 tools/kvm/disk/core.c|4 +++-
 tools/kvm/mmio.c |4 +++-
 5 files changed, 18 insertions(+), 10 deletions(-)

diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile
index 009a6ba..57dc521 100644
--- a/tools/kvm/Makefile
+++ b/tools/kvm/Makefile
@@ -218,7 +218,7 @@ KVMTOOLS-VERSION-FILE:
 
 $(PROGRAM): $(DEPS) $(OBJS)
$(E)   LINK $@
-   $(Q) $(CC) $(OBJS) $(LIBS) -o $@
+   $(Q) $(CC) $(CFLAGS) $(OBJS) $(LIBS) -o $@
 
 $(GUEST_INIT): guest/init.c
$(E)   LINK $@
diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c
index e4aa87e..7cf208d 100644
--- a/tools/kvm/builtin-run.c
+++ b/tools/kvm/builtin-run.c
@@ -42,6 +42,8 @@
 #include <stdlib.h>
 #include <string.h>
 #include <unistd.h>
+#define __STDC_FORMAT_MACROS
+#include <inttypes.h>
 #include <ctype.h>
 #include <stdio.h>
 
@@ -383,8 +385,8 @@ static int shmem_parser(const struct option *opt, const char *arg, int unset)
strcpy(handle, default_handle);
}
if (verbose) {
-   pr_info("shmem: phys_addr = %llx", phys_addr);
-   pr_info("shmem: size  = %llx", size);
+   pr_info("shmem: phys_addr = %"PRIx64, phys_addr);
+   pr_info("shmem: size  = %"PRIx64, size);
pr_info("shmem: handle= %s", handle);
pr_info("shmem: create= %d", create);
}
@@ -545,7 +547,7 @@ panic_kvm:
current_kvm_cpu->kvm_run->exit_reason,
kvm_exit_reasons[current_kvm_cpu->kvm_run->exit_reason]);
if (current_kvm_cpu->kvm_run->exit_reason == KVM_EXIT_UNKNOWN)
-   fprintf(stderr, "KVM exit code: 0x%Lu\n",
+   fprintf(stderr, "KVM exit code: 0x%"PRIx64"\n",
current_kvm_cpu->kvm_run->hw.hardware_exit_reason);
 
kvm_cpu__set_debug_fd(STDOUT_FILENO);
@@ -760,10 +762,10 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix)
ram_size= get_ram_size(nrcpus);
 
if (ram_size < MIN_RAM_SIZE_MB)
-   die("Not enough memory specified: %lluMB (min %lluMB)", ram_size, MIN_RAM_SIZE_MB);
+   die("Not enough memory specified: %"PRIu64"MB (min %lluMB)", ram_size, MIN_RAM_SIZE_MB);
 
if (ram_size > host_ram_size())
-   pr_warning("Guest memory size %lluMB exceeds host physical RAM size %lluMB", ram_size, host_ram_size());
+   pr_warning("Guest memory size %"PRIu64"MB exceeds host physical RAM size %"PRIu64"MB", ram_size, host_ram_size());
 
ram_size <<= MB_SHIFT;
 
@@ -878,7 +880,7 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix)
virtio_blk__init_all(kvm);
}
 
-   printf("  # kvm run -k %s -m %Lu -c %d --name %s\n", kernel_filename, ram_size / 1024 / 1024, nrcpus, guest_name);
+   printf("  # kvm run -k %s -m %"PRId64" -c %d --name %s\n", kernel_filename, ram_size / 1024 / 1024, nrcpus, guest_name);
 
if (!kvm__load_kernel(kvm, kernel_filename, initrd_filename,
real_cmdline, vidmode))
diff --git a/tools/kvm/builtin-stat.c b/tools/kvm/builtin-stat.c
index e28eb5b..c1f2605 100644
--- a/tools/kvm/builtin-stat.c
+++ b/tools/kvm/builtin-stat.c
@@ -9,6 +9,8 @@
 #include <stdio.h>
 #include <string.h>
 #include <signal.h>
+#define __STDC_FORMAT_MACROS
+#include <inttypes.h>
 
 #include <linux/virtio_balloon.h>
 
@@ -97,7 +99,7 @@ static int do_memstat(const char *name, int sock)
printf("The total amount of memory available (in bytes):");
break;
}
-   printf("%llu\n", stats[i].val);
+   printf("%"PRId64"\n", stats[i].val);
}
printf("\n");
 
diff --git a/tools/kvm/disk/core.c b/tools/kvm/disk/core.c
index 4915efd..a135851 100644
--- a/tools/kvm/disk/core.c
+++ b/tools/kvm/disk/core.c
@@ -4,6 +4,8 @@
 
 #include <sys/eventfd.h>
 #include <sys/poll.h>
+#define __STDC_FORMAT_MACROS
+#include <inttypes.h>
 
 #define AIO_MAX 32
 
@@ -232,7 +234,7 @@ ssize_t disk_image__get_serial(struct disk_image *disk, void *buffer, ssize_t *l
if (fstat(disk->fd, &st) != 0)
return 0;
 
-   *len = snprintf(buffer, *len, "%llu%llu%llu", (u64)st.st_dev, (u64)st.st_rdev, (u64)st.st_ino);
+   *len = snprintf(buffer, *len, "%"PRId64"%"PRId64"%"PRId64, (u64)st.st_dev, (u64)st.st_rdev, (u64)st.st_ino);
return *len;
 }
 
diff --git a/tools/kvm/mmio.c b/tools/kvm/mmio.c
index de7320f..1158bff 100644
--- a/tools/kvm/mmio.c
+++ b/tools/kvm/mmio.c
@@ -9,6 +9,8 @@
 #include <linux/kvm.h>
 #include <linux/types.h>
 #include <linux/rbtree.h>
+#define __STDC_FORMAT_MACROS
+#include
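
The motivation for the PRIx64 change is that a 64-bit integer is `unsigned long` on LP64 systems but `unsigned long long` elsewhere, so a hard-coded "%llx" does not match the argument type everywhere. A minimal sketch of the portable form follows. It assumes the standard uint64_t type rather than kvmtool's own u64 typedef, and the helper name format_u64_hex() is hypothetical.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * PRIx64 expands to the right length modifier for uint64_t on the
 * current platform ("lx" on LP64, "llx" on most 32-bit targets), and
 * is glued into the format by string-literal concatenation.
 * Returns the number of characters written, as snprintf() does.
 */
static int format_u64_hex(char *buf, size_t size, uint64_t value)
{
    return snprintf(buf, size, "0x%" PRIx64, value);
}
```

The same pattern applies to PRIu64 and PRId64 for decimal output.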

[PATCH 06/28] kvm tools: Add arch-specific KVM_RUN exit handling via kvm_cpu__handle_exit()

2011-12-05 Thread Matt Evans

This patch creates a new function in x86/kvm-cpu.c, kvm_cpu__handle_exit(), in
which arch-specific exit reasons can be handled outside of the common runloop.

Signed-off-by: Matt Evans m...@ozlabs.org
---
 tools/kvm/include/kvm/kvm-cpu.h |2 ++
 tools/kvm/kvm-cpu.c |   10 --
 tools/kvm/x86/kvm-cpu.c |5 +
 3 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/tools/kvm/include/kvm/kvm-cpu.h b/tools/kvm/include/kvm/kvm-cpu.h
index 719e286..15618f1 100644
--- a/tools/kvm/include/kvm/kvm-cpu.h
+++ b/tools/kvm/include/kvm/kvm-cpu.h
@@ -2,6 +2,7 @@
 #define KVM__KVM_CPU_H
 
 #include "kvm/kvm-cpu-arch.h"
+#include <stdbool.h>
 
 struct kvm_cpu *kvm_cpu__init(struct kvm *kvm, unsigned long cpu_id);
 void kvm_cpu__delete(struct kvm_cpu *vcpu);
@@ -11,6 +12,7 @@ void kvm_cpu__enable_singlestep(struct kvm_cpu *vcpu);
 void kvm_cpu__run(struct kvm_cpu *vcpu);
 void kvm_cpu__reboot(void);
 int kvm_cpu__start(struct kvm_cpu *cpu);
+bool kvm_cpu__handle_exit(struct kvm_cpu *vcpu);
 
 int kvm_cpu__get_debug_fd(void);
 void kvm_cpu__set_debug_fd(int fd);
diff --git a/tools/kvm/kvm-cpu.c b/tools/kvm/kvm-cpu.c
index 5aba3bb..9bc0796 100644
--- a/tools/kvm/kvm-cpu.c
+++ b/tools/kvm/kvm-cpu.c
@@ -137,8 +137,14 @@ int kvm_cpu__start(struct kvm_cpu *cpu)
goto exit_kvm;
case KVM_EXIT_SHUTDOWN:
goto exit_kvm;
-   default:
-   goto panic_kvm;
+   default: {
+   bool ret;
+
+   ret = kvm_cpu__handle_exit(cpu);
+   if (!ret)
+   goto panic_kvm;
+   break;
+   }
}
kvm_cpu__handle_coalesced_mmio(cpu);
}
diff --git a/tools/kvm/x86/kvm-cpu.c b/tools/kvm/x86/kvm-cpu.c
index b26b208..a0d10cc 100644
--- a/tools/kvm/x86/kvm-cpu.c
+++ b/tools/kvm/x86/kvm-cpu.c
@@ -212,6 +212,11 @@ void kvm_cpu__reset_vcpu(struct kvm_cpu *vcpu)
kvm_cpu__setup_msrs(vcpu);
 }
 
+bool kvm_cpu__handle_exit(struct kvm_cpu *vcpu)
+{
+   return false;
+}
+
 static void print_dtable(const char *name, struct kvm_dtable *dtable)
 {
dprintf(debug_fd, " %s %016llx  %08hx\n",
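
The dispatch pattern in this patch, where the generic run loop handles the exits it knows and offers everything else to an arch hook before panicking, can be sketched in isolation. This is an illustration only: the exit_reason values, the loop_action type, and the reason the hook accepts are hypothetical stand-ins, not KVM constants or kvmtool code (the x86 hook in the patch simply returns false).

```c
#include <stdbool.h>

/* Hypothetical exit reasons standing in for the KVM_EXIT_* constants. */
enum exit_reason {
    EXIT_IO,
    EXIT_MMIO,
    EXIT_SHUTDOWN,
    EXIT_ARCH_SPECIFIC,
    EXIT_BOGUS
};

/* What the generic loop should do next. */
enum loop_action { LOOP_CONTINUE, LOOP_EXIT, LOOP_PANIC };

/* Arch hook: returns true if it recognised and handled the exit. */
static bool arch_handle_exit(enum exit_reason reason)
{
    return reason == EXIT_ARCH_SPECIFIC;
}

/* Generic dispatch: common exits first, then the arch hook, then panic. */
static enum loop_action dispatch_exit(enum exit_reason reason)
{
    switch (reason) {
    case EXIT_IO:
    case EXIT_MMIO:
        return LOOP_CONTINUE;
    case EXIT_SHUTDOWN:
        return LOOP_EXIT;
    default:
        return arch_handle_exit(reason) ? LOOP_CONTINUE : LOOP_PANIC;
    }
}
```

Keeping the fallback in the `default:` arm means a new architecture only has to supply its own hook; the common switch does not change.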

[PATCH 07/28] kvm tools: Move 'kvm__recommended_cpus' to arch-specific code

2011-12-05 Thread Matt Evans

Architectures can recommend/count/determine number of CPUs differently, so move
this out of generic code.

Signed-off-by: Matt Evans m...@ozlabs.org
---
 tools/kvm/kvm.c |   30 --
 tools/kvm/x86/kvm.c |   30 ++
 2 files changed, 30 insertions(+), 30 deletions(-)

diff --git a/tools/kvm/kvm.c b/tools/kvm/kvm.c
index 7ce1640..e526483 100644
--- a/tools/kvm/kvm.c
+++ b/tools/kvm/kvm.c
@@ -259,17 +259,6 @@ void kvm__register_mem(struct kvm *kvm, u64 guest_phys, u64 size, void *userspac
die_perror("KVM_SET_USER_MEMORY_REGION ioctl");
 }
 
-int kvm__recommended_cpus(struct kvm *kvm)
-{
-   int ret;
-
-   ret = ioctl(kvm->sys_fd, KVM_CHECK_EXTENSION, KVM_CAP_NR_VCPUS);
-   if (ret <= 0)
-   die_perror("KVM_CAP_NR_VCPUS");
-
-   return ret;
-}
-
 static void kvm__pid(int fd, u32 type, u32 len, u8 *msg)
 {
pid_t pid = getpid();
@@ -282,25 +271,6 @@ static void kvm__pid(int fd, u32 type, u32 len, u8 *msg)
pr_warning("Failed sending PID");
 }
 
-/*
- * The following hack should be removed once 'x86: Raise the hard
- * VCPU count limit' makes it's way into the mainline.
- */
-#ifndef KVM_CAP_MAX_VCPUS
-#define KVM_CAP_MAX_VCPUS 66
-#endif
-
-int kvm__max_cpus(struct kvm *kvm)
-{
-   int ret;
-
-   ret = ioctl(kvm->sys_fd, KVM_CHECK_EXTENSION, KVM_CAP_MAX_VCPUS);
-   if (ret <= 0)
-   ret = kvm__recommended_cpus(kvm);
-
-   return ret;
-}
-
 struct kvm *kvm__init(const char *kvm_dev, u64 ram_size, const char *name)
 {
struct kvm *kvm;
diff --git a/tools/kvm/x86/kvm.c b/tools/kvm/x86/kvm.c
index ac6c91e..75e4a52 100644
--- a/tools/kvm/x86/kvm.c
+++ b/tools/kvm/x86/kvm.c
@@ -76,6 +76,36 @@ bool kvm__arch_cpu_supports_vm(void)
return regs.ecx & (1 << feature);
 }
 
+int kvm__recommended_cpus(struct kvm *kvm)
+{
+   int ret;
+
+   ret = ioctl(kvm->sys_fd, KVM_CHECK_EXTENSION, KVM_CAP_NR_VCPUS);
+   if (ret <= 0)
+   die_perror("KVM_CAP_NR_VCPUS");
+
+   return ret;
+}
+
+/*
+ * The following hack should be removed once 'x86: Raise the hard
+ * VCPU count limit' makes it's way into the mainline.
+ */
+#ifndef KVM_CAP_MAX_VCPUS
+#define KVM_CAP_MAX_VCPUS 66
+#endif
+
+int kvm__max_cpus(struct kvm *kvm)
+{
+   int ret;
+
+   ret = ioctl(kvm->sys_fd, KVM_CHECK_EXTENSION, KVM_CAP_MAX_VCPUS);
+   if (ret <= 0)
+   ret = kvm__recommended_cpus(kvm);
+
+   return ret;
+}
+
 /*
  * Allocating RAM size bigger than 4GB requires us to leave a gap
  * in the RAM which is used for PCI MMIO, hotplug, and unconfigured
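
The fallback logic being moved here (ask for the hard VCPU limit, and fall back to the recommended count when the kernel does not report one) can be exercised without a real /dev/kvm. The sketch below assumes a function pointer in place of ioctl(KVM_CHECK_EXTENSION); the capability IDs, the stub kernels, and the value 64 are invented for illustration, and it returns -1 where the patch calls die_perror().

```c
/* Hypothetical capability IDs; the real ones live in <linux/kvm.h>. */
enum { CAP_NR_VCPUS = 1, CAP_MAX_VCPUS = 2 };

/* Stand-in for ioctl(sys_fd, KVM_CHECK_EXTENSION, cap). */
typedef int (*check_ext_fn)(int cap);

/* Recommended VCPU count; -1 if the capability is missing. */
static int recommended_cpus(check_ext_fn check)
{
    int ret = check(CAP_NR_VCPUS);

    return ret > 0 ? ret : -1;
}

/* Hard limit if the kernel reports one, else the recommended count. */
static int max_cpus(check_ext_fn check)
{
    int ret = check(CAP_MAX_VCPUS);

    if (ret <= 0)
        ret = recommended_cpus(check);

    return ret;
}

/* Stub kernel that does not know CAP_MAX_VCPUS. */
static int old_kernel(int cap)
{
    return cap == CAP_NR_VCPUS ? 64 : 0;
}

/* Stub kernel that reports both capabilities. */
static int new_kernel(int cap)
{
    return cap == CAP_NR_VCPUS ? 64 : 254;
}
```

With these stubs, max_cpus(old_kernel) falls back to the recommended count while max_cpus(new_kernel) returns the reported hard limit.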

[PATCH 08/28] kvm tools: Fix KVM_RUN exit code check

2011-12-05 Thread Matt Evans

kvm_cpu__run() currently die()s if KVM_RUN returns non-zero.  Some architectures
may return positive values in non-error cases, whereas real errors are always
negative return values.  Check for those instead.

Signed-off-by: Matt Evans m...@ozlabs.org
---
 tools/kvm/kvm-cpu.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/tools/kvm/kvm-cpu.c b/tools/kvm/kvm-cpu.c
index 9bc0796..884a89f 100644
--- a/tools/kvm/kvm-cpu.c
+++ b/tools/kvm/kvm-cpu.c
@@ -30,7 +30,7 @@ void kvm_cpu__run(struct kvm_cpu *vcpu)
int err;
 
err = ioctl(vcpu->vcpu_fd, KVM_RUN, 0);
-   if (err && (errno != EINTR && errno != EAGAIN))
+   if (err < 0 && (errno != EINTR && errno != EAGAIN))
die_perror("KVM_RUN failed");
 }
 
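
The corrected condition can be written as a small predicate, which makes the three cases explicit: positive returns are not errors, interrupted or retryable calls are not fatal, and only other negative returns are. The helper name kvm_run_failed() and the idea of passing errno in as a parameter are assumptions made for testability, not part of the patch.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * ret: return value of ioctl(vcpu_fd, KVM_RUN, 0)
 * err: the errno value observed right after that call
 *
 * Only a negative return with an errno other than EINTR/EAGAIN is fatal.
 */
static bool kvm_run_failed(int ret, int err)
{
    return ret < 0 && err != EINTR && err != EAGAIN;
}
```

The old check (`err && ...`) would have treated a positive, non-error return as a failure whenever errno happened to hold a stale value.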

[PATCH 09/28] kvm tools: Add kvm__arch_periodic_poll()

2011-12-05 Thread Matt Evans

Currently, the SIGALRM handler calls device poll functions (for serial, virtio
console) directly.  Which devices are present and which require polling is a
system-specific decision, so create a new function called from common code and
move the x86-specific poll calls into it.

Signed-off-by: Matt Evans m...@ozlabs.org
---
 tools/kvm/builtin-run.c |3 +--
 tools/kvm/include/kvm/kvm.h |1 +
 tools/kvm/x86/kvm.c |8 
 3 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c
index 7cf208d..9ef331e 100644
--- a/tools/kvm/builtin-run.c
+++ b/tools/kvm/builtin-run.c
@@ -522,8 +522,7 @@ static void handle_debug(int fd, u32 type, u32 len, u8 *msg)
 
 static void handle_sigalrm(int sig)
 {
-   serial8250__inject_interrupt(kvm);
-   virtio_console__inject_interrupt(kvm);
+   kvm__arch_periodic_poll(kvm);
 }
 
 static void handle_stop(int fd, u32 type, u32 len, u8 *msg)
diff --git a/tools/kvm/include/kvm/kvm.h b/tools/kvm/include/kvm/kvm.h
index ca1acc0..60842d5 100644
--- a/tools/kvm/include/kvm/kvm.h
+++ b/tools/kvm/include/kvm/kvm.h
@@ -56,6 +56,7 @@ void kvm__remove_socket(const char *name);
 void kvm__arch_init(struct kvm *kvm, const char *kvm_dev, u64 ram_size, const char *name);
 void kvm__arch_setup_firmware(struct kvm *kvm);
 bool kvm__arch_cpu_supports_vm(void);
+void kvm__arch_periodic_poll(struct kvm *kvm);
 
 int load_flat_binary(struct kvm *kvm, int fd);
 bool load_bzimage(struct kvm *kvm, int fd_kernel, int fd_initrd, const char *kernel_cmdline, u16 vidmode);
diff --git a/tools/kvm/x86/kvm.c b/tools/kvm/x86/kvm.c
index 75e4a52..45dcb77 100644
--- a/tools/kvm/x86/kvm.c
+++ b/tools/kvm/x86/kvm.c
@@ -4,6 +4,8 @@
 #include "kvm/interrupt.h"
 #include "kvm/mptable.h"
 #include "kvm/util.h"
+#include "kvm/8250-serial.h"
+#include "kvm/virtio-console.h"
 
 #include <asm/bootparam.h>
 #include <linux/kvm.h>
@@ -358,3 +360,9 @@ void kvm__arch_setup_firmware(struct kvm *kvm)
/* MP table */
mptable_setup(kvm, kvm->nrcpus);
 }
+
+void kvm__arch_periodic_poll(struct kvm *kvm)
+{
+   serial8250__inject_interrupt(kvm);
+   virtio_console__inject_interrupt(kvm);
+}

[PATCH 10/28] kvm tools: term.h needs to include stdbool.h

2011-12-05 Thread Matt Evans

Fix a missing include.

Signed-off-by: Matt Evans m...@ozlabs.org
---
 tools/kvm/include/kvm/term.h |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/tools/kvm/include/kvm/term.h b/tools/kvm/include/kvm/term.h
index 37ec731..938c26f 100644
--- a/tools/kvm/include/kvm/term.h
+++ b/tools/kvm/include/kvm/term.h
@@ -2,6 +2,7 @@
 #define KVM__TERM_H
 
 #include <sys/uio.h>
+#include <stdbool.h>
 
 #define CONSOLE_8250   1
 #define CONSOLE_VIRTIO 2

[PATCH 11/28] kvm tools: kvm.c needs to include sys/stat.h for mkdir

2011-12-05 Thread Matt Evans

Fix a missing include.

Signed-off-by: Matt Evans m...@ozlabs.org
---
 tools/kvm/kvm.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/tools/kvm/kvm.c b/tools/kvm/kvm.c
index e526483..33243f1 100644
--- a/tools/kvm/kvm.c
+++ b/tools/kvm/kvm.c
@@ -8,6 +8,7 @@
 #include <linux/kvm.h>
 
 #include <sys/un.h>
+#include <sys/stat.h>
 #include <sys/types.h>
 #include <sys/socket.h>
 #include <sys/ioctl.h>
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 12/28] kvm tools: Move arch-specific cmdline init into kvm__arch_set_cmdline()

2011-12-05 Thread Matt Evans

Different systems will want different base kernel commandlines, e.g. non-x86
systems probably don't need noapic, i8042.* etc., so set the commandline up in
arch-specific code.  Then, if the resulting commandline is empty, don't strcat a
space onto the front.

Signed-off-by: Matt Evans m...@ozlabs.org
---
 tools/kvm/builtin-run.c |   12 +---
 tools/kvm/include/kvm/kvm.h |1 +
 tools/kvm/x86/kvm.c |   11 +++
 3 files changed, 17 insertions(+), 7 deletions(-)

diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c
index 9ef331e..a67bd8c 100644
--- a/tools/kvm/builtin-run.c
+++ b/tools/kvm/builtin-run.c
@@ -835,13 +835,11 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix)
vidmode = 0;
 
memset(real_cmdline, 0, sizeof(real_cmdline));
-   strcpy(real_cmdline, "noapic noacpi pci=conf1 reboot=k panic=1 i8042.direct=1 "
-   "i8042.dumbkbd=1 i8042.nopnp=1");
-   if (vnc || sdl) {
-   strcat(real_cmdline, " video=vesafb console=tty0");
-   } else
-   strcat(real_cmdline, " console=ttyS0 earlyprintk=serial i8042.noaux=1");
-   strcat(real_cmdline, " ");
+   kvm__arch_set_cmdline(real_cmdline, vnc || sdl);
+
+   if (strlen(real_cmdline) > 0)
+   strcat(real_cmdline, " ");
+
if (kernel_cmdline)
strlcat(real_cmdline, kernel_cmdline, sizeof(real_cmdline));
 
diff --git a/tools/kvm/include/kvm/kvm.h b/tools/kvm/include/kvm/kvm.h
index 60842d5..fae2ba9 100644
--- a/tools/kvm/include/kvm/kvm.h
+++ b/tools/kvm/include/kvm/kvm.h
@@ -53,6 +53,7 @@ int kvm__get_sock_by_instance(const char *name);
 int kvm__enumerate_instances(int (*callback)(const char *name, int pid));
 void kvm__remove_socket(const char *name);
 
+void kvm__arch_set_cmdline(char *cmdline, bool video);
 void kvm__arch_init(struct kvm *kvm, const char *kvm_dev, u64 ram_size, const 
char *name);
 void kvm__arch_setup_firmware(struct kvm *kvm);
 bool kvm__arch_cpu_supports_vm(void);
diff --git a/tools/kvm/x86/kvm.c b/tools/kvm/x86/kvm.c
index 45dcb77..7071dc6 100644
--- a/tools/kvm/x86/kvm.c
+++ b/tools/kvm/x86/kvm.c
@@ -149,6 +149,17 @@ void kvm__init_ram(struct kvm *kvm)
}
 }
 
+/* Arch-specific commandline setup */
+void kvm__arch_set_cmdline(char *cmdline, bool video)
+{
+	strcpy(cmdline, "noapic noacpi pci=conf1 reboot=k panic=1 i8042.direct=1 "
+				"i8042.dumbkbd=1 i8042.nopnp=1");
+	if (video) {
+		strcat(cmdline, " video=vesafb console=tty0");
+	} else
+		strcat(cmdline, " console=ttyS0 earlyprintk=serial i8042.noaux=1");
+}
+
 /* Architecture-specific KVM init */
 void kvm__arch_init(struct kvm *kvm, const char *kvm_dev, u64 ram_size, const char *name)
 {

[PATCH 13/28] kvm tools: Add CONSOLE_HV term type and allow it to be selected

2011-12-05 Thread Matt Evans

This patch paves the way for adding a hypervisor console, useful on systems that
support one out of the box yet don't have either serial port or virtio console
support (e.g. kernels expecting POWER SPAPR).
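
A minimal sketch of the console-name selection follows, assuming hypothetical names (`example_parse_console()` and the `EX_CONSOLE_*` values, which mirror but are not the `CONSOLE_*` defines). It uses exact `strcmp()` matching, which is a deliberate substitution: the patch itself uses `strncmp()` with a length, so it accepts any string that merely starts with the name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical values mirroring CONSOLE_8250/CONSOLE_VIRTIO/CONSOLE_HV. */
enum example_console {
	EX_CONSOLE_NONE   = 0,
	EX_CONSOLE_8250   = 1,
	EX_CONSOLE_VIRTIO = 2,
	EX_CONSOLE_HV     = 3,
};

/* Map a --console argument to a console type; unknown names select none. */
static enum example_console example_parse_console(const char *name)
{
	if (name == NULL)
		return EX_CONSOLE_NONE;
	if (strcmp(name, "virtio") == 0)
		return EX_CONSOLE_VIRTIO;
	if (strcmp(name, "serial") == 0)
		return EX_CONSOLE_8250;
	if (strcmp(name, "hv") == 0)
		return EX_CONSOLE_HV;
	return EX_CONSOLE_NONE;
}
```

With exact matching, a typo such as `hvx` falls through to the "no console" case instead of silently selecting the hypervisor console.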

Signed-off-by: Matt Evans m...@ozlabs.org
---
 tools/kvm/builtin-run.c  |8 ++--
 tools/kvm/include/kvm/term.h |1 +
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c
index a67bd8c..1257c90 100644
--- a/tools/kvm/builtin-run.c
+++ b/tools/kvm/builtin-run.c
@@ -416,7 +416,7 @@ static const struct option options[] = {
 	OPT_BOOLEAN('\0', "rng", &virtio_rng, "Enable virtio Random Number Generator"),
 	OPT_CALLBACK('\0', "9p", NULL, "dir_to_share,tag_name",
 		     "Enable virtio 9p to share files between host and guest", virtio_9p_rootdir_parser),
-	OPT_STRING('\0', "console", &console, "serial or virtio",
+	OPT_STRING('\0', "console", &console, "serial, virtio or hv",
 			"Console to use"),
 	OPT_STRING('\0', "dev", &dev, "device_file", "KVM device file"),
 	OPT_CALLBACK('\0', "tty", NULL, "tty id",
@@ -776,8 +776,12 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix)
 
 	if (!strncmp(console, "virtio", 6))
 		active_console  = CONSOLE_VIRTIO;
-	else
+	else if (!strncmp(console, "serial", 6))
 		active_console  = CONSOLE_8250;
+	else if (!strncmp(console, "hv", 2))
+		active_console = CONSOLE_HV;
+	else
+		pr_warning("No console!");
 
if (!host_ip)
host_ip = DEFAULT_HOST_ADDR;
diff --git a/tools/kvm/include/kvm/term.h b/tools/kvm/include/kvm/term.h
index 938c26f..a6a9822 100644
--- a/tools/kvm/include/kvm/term.h
+++ b/tools/kvm/include/kvm/term.h
@@ -6,6 +6,7 @@
 
 #define CONSOLE_8250   1
 #define CONSOLE_VIRTIO 2
+#define CONSOLE_HV 3
 
 int term_putc_iov(int who, struct iovec *iov, int iovcnt, int term);
 int term_getc_iov(int who, struct iovec *iov, int iovcnt, int term);

[PATCH 14/28] kvm tools: Fix term_getc(), term_getc_iov() endian bugs

2011-12-05 Thread Matt Evans

term_getc()'s int c has one byte written into it (at its lowest address) by
read_in_full().  This is expected to be the least significant byte, but that
isn't the case on BE!  Use correct type, unsigned char.  A similar issue exists
in term_getc_iov(), which needs to write a char to the iov rather than an int.
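
To make the byte-placement problem concrete, here is a small sketch with hypothetical helper names. It assumes the byte is copied with `memcpy()` in place of `read_in_full()`, and it zeroes the `int` first, which the original code did not do.

```c
#include <string.h>

/* Problem pattern: one byte is stored at the lowest address of an int.
 * On little-endian that is the least significant byte; on big-endian
 * it is the most significant one, so the returned value is wrong.
 */
static int example_getc_via_int(const unsigned char *src)
{
	int c = 0;

	memcpy(&c, src, 1);
	return c;
}

/* Fixed pattern: read into an unsigned char, then widen to int. */
static int example_getc_via_uchar(const unsigned char *src)
{
	unsigned char c;

	memcpy(&c, src, 1);
	return c;
}
```

`example_getc_via_uchar()` returns the byte value on any host, while the result of `example_getc_via_int()` depends on host byte order.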

Signed-off-by: Matt Evans m...@ozlabs.org
---
 tools/kvm/term.c |5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/tools/kvm/term.c b/tools/kvm/term.c
index fb5d71c..440884e 100644
--- a/tools/kvm/term.c
+++ b/tools/kvm/term.c
@@ -30,11 +30,10 @@ int term_fds[4][2];
 
 int term_getc(int who, int term)
 {
-   int c;
+   unsigned char c;
 
if (who != active_console)
return -1;
-
 	if (read_in_full(term_fds[term][TERM_FD_IN], &c, 1) < 0)
 		return -1;
 
@@ -84,7 +83,7 @@ int term_getc_iov(int who, struct iovec *iov, int iovcnt, int term)
 	if (c < 0)
 		return 0;
 
-   *((int *)iov[TERM_FD_IN].iov_base)  = c;
+   *((char *)iov[TERM_FD_IN].iov_base) = (char)c;
 
return sizeof(char);
 }

[PATCH 15/28] kvm tools: Allow initrd_check() to match a cpio

2011-12-05 Thread Matt Evans

cpios are valid as initrds too, so allow them through the check.
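
The check can be sketched on an in-memory buffer as below. The function name and the `EX_*` macros are hypothetical; the sketch assumes the caller has already read the first bytes of the file, whereas the patch reads them from a file descriptor and then seeks back. The `"0707"` prefix is shared by the ASCII cpio magics `070701`, `070702` and `070707`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define EX_GZIP_ID1	0x1f		/* RFC 1952 */
#define EX_GZIP_ID2	0x8b
#define EX_CPIO_MAGIC	"0707"		/* common prefix of ASCII cpio magics */

/* Return true if the buffer starts like a gzip stream or an ASCII cpio. */
static bool example_looks_like_initrd(const unsigned char *buf, size_t len)
{
	if (len >= 2 && buf[0] == EX_GZIP_ID1 && buf[1] == EX_GZIP_ID2)
		return true;

	return len >= 4 && memcmp(buf, EX_CPIO_MAGIC, 4) == 0;
}
```

The explicit length checks guard against short files, a case the sketch handles by rejecting the input.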

Signed-off-by: Matt Evans m...@ozlabs.org
---
 tools/kvm/kvm.c |8 +---
 1 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/tools/kvm/kvm.c b/tools/kvm/kvm.c
index 33243f1..457de1a 100644
--- a/tools/kvm/kvm.c
+++ b/tools/kvm/kvm.c
@@ -317,10 +317,11 @@ struct kvm *kvm__init(const char *kvm_dev, u64 ram_size, const char *name)
 /* RFC 1952 */
 #define GZIP_ID1		0x1f
 #define GZIP_ID2		0x8b
-
+#define CPIO_MAGIC		"0707"
+/* initrd may be gzipped, or a plain cpio */
 static bool initrd_check(int fd)
 {
-	unsigned char id[2];
+	unsigned char id[4];
 
 	if (read_in_full(fd, id, ARRAY_SIZE(id)) < 0)
 		return false;
@@ -328,7 +329,8 @@ static bool initrd_check(int fd)
 	if (lseek(fd, 0, SEEK_SET) < 0)
 		die_perror("lseek");
 
-	return id[0] == GZIP_ID1 && id[1] == GZIP_ID2;
+	return (id[0] == GZIP_ID1 && id[1] == GZIP_ID2) ||
+		!memcmp(id, CPIO_MAGIC, 4);
 }
 
 bool kvm__load_kernel(struct kvm *kvm, const char *kernel_filename,

[PATCH 16/28] kvm tools: Allow load_flat_binary() to load an initrd alongside

2011-12-05 Thread Matt Evans

This patch passes the initrd fd and commandline to load_flat_binary(), which may
be used to load both the kernel & an initrd (stashing or inserting the
commandline as appropriate) in the same way that load_bzimage() does.  This is
especially useful when load_bzimage() is unused for a particular
architecture. :-)

Signed-off-by: Matt Evans m...@ozlabs.org
---
 tools/kvm/include/kvm/kvm.h |2 +-
 tools/kvm/kvm.c |   10 ++
 tools/kvm/x86/kvm.c |   12 +---
 3 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/tools/kvm/include/kvm/kvm.h b/tools/kvm/include/kvm/kvm.h
index fae2ba9..5fe6e75 100644
--- a/tools/kvm/include/kvm/kvm.h
+++ b/tools/kvm/include/kvm/kvm.h
@@ -59,7 +59,7 @@ void kvm__arch_setup_firmware(struct kvm *kvm);
 bool kvm__arch_cpu_supports_vm(void);
 void kvm__arch_periodic_poll(struct kvm *kvm);
 
-int load_flat_binary(struct kvm *kvm, int fd);
+int load_flat_binary(struct kvm *kvm, int fd_kernel, int fd_initrd, const char 
*kernel_cmdline);
 bool load_bzimage(struct kvm *kvm, int fd_kernel, int fd_initrd, const char 
*kernel_cmdline, u16 vidmode);
 
 /*
diff --git a/tools/kvm/kvm.c b/tools/kvm/kvm.c
index 457de1a..6f33e1a 100644
--- a/tools/kvm/kvm.c
+++ b/tools/kvm/kvm.c
@@ -354,23 +354,25 @@ bool kvm__load_kernel(struct kvm *kvm, const char 
*kernel_filename,
 
ret = load_bzimage(kvm, fd_kernel, fd_initrd, kernel_cmdline, vidmode);
 
-   if (initrd_filename)
-   close(fd_initrd);
-
if (ret)
goto found_kernel;
 
 	pr_warning("%s is not a bzImage. Trying to load it as a flat binary...", kernel_filename);
 
-   ret = load_flat_binary(kvm, fd_kernel);
+   ret = load_flat_binary(kvm, fd_kernel, fd_initrd, kernel_cmdline);
+
if (ret)
goto found_kernel;
 
+   if (initrd_filename)
+   close(fd_initrd);
close(fd_kernel);
 
 	die("%s is not a valid bzImage or flat binary", kernel_filename);
 
 found_kernel:
+   if (initrd_filename)
+   close(fd_initrd);
close(fd_kernel);
 
return ret;
diff --git a/tools/kvm/x86/kvm.c b/tools/kvm/x86/kvm.c
index 7071dc6..4ac21c0 100644
--- a/tools/kvm/x86/kvm.c
+++ b/tools/kvm/x86/kvm.c
@@ -227,17 +227,23 @@ void kvm__irq_trigger(struct kvm *kvm, int irq)
 #define BOOT_PROTOCOL_REQUIRED 0x206
 #define LOAD_HIGH  0x01
 
-int load_flat_binary(struct kvm *kvm, int fd)
+int load_flat_binary(struct kvm *kvm, int fd_kernel, int fd_initrd, const char *kernel_cmdline)
 {
 	void *p;
 	int nr;
 
-	if (lseek(fd, 0, SEEK_SET) < 0)
+	/* Some architectures may support loading an initrd alongside the flat kernel,
+	 * but we do not.
+	 */
+	if (fd_initrd != -1)
+		pr_warning("Loading initrd with flat binary not supported.");
+
+	if (lseek(fd_kernel, 0, SEEK_SET) < 0)
 		die_perror("lseek");
 
 	p = guest_real_to_host(kvm, BOOT_LOADER_SELECTOR, BOOT_LOADER_IP);
 
-	while ((nr = read(fd, p, 65536)) > 0)
+	while ((nr = read(fd_kernel, p, 65536)) > 0)
 		p += nr;
 
 	kvm->boot_selector	= BOOT_LOADER_SELECTOR;

[PATCH 17/28] kvm tools: Only call symbol__init() if we have BFD

2011-12-05 Thread Matt Evans

CONFIG_HAS_BFD is optional, symbol.c inclusion is optional -- so make its init
call dependent on CONFIG_HAS_BFD.

Signed-off-by: Matt Evans m...@ozlabs.org
---
 tools/kvm/builtin-run.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c
index 1257c90..aaa5132 100644
--- a/tools/kvm/builtin-run.c
+++ b/tools/kvm/builtin-run.c
@@ -798,8 +798,9 @@ int kvm_cmd_run(int argc, const char **argv, const char 
*prefix)
if (!script)
script = DEFAULT_SCRIPT;
 
+#ifdef CONFIG_HAS_BFD
symbol__init(vmlinux_filename);
-
+#endif
term_init();
 
if (!guest_name) {

[PATCH 18/28] kvm tools: Initialise PCI before devices start getting registered with PCI

2011-12-05 Thread Matt Evans

Re-arrange pci__init() in builtin-run such that it comes before devices are
initialised.

Signed-off-by: Matt Evans m...@ozlabs.org
---
 tools/kvm/builtin-run.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c
index aaa5132..32e19e7 100644
--- a/tools/kvm/builtin-run.c
+++ b/tools/kvm/builtin-run.c
@@ -829,6 +829,8 @@ int kvm_cmd_run(int argc, const char **argv, const char 
*prefix)
 
 	kvm->nrcpus = nrcpus;
 
+   pci__init();
+
/*
 * vidmode should be either specified
 * either set by default
@@ -896,8 +898,6 @@ int kvm_cmd_run(int argc, const char **argv, const char 
*prefix)
 
serial8250__init(kvm);
 
-   pci__init();
-
if (active_console == CONSOLE_VIRTIO)
virtio_console__init(kvm);
 

[PATCH 19/28] kvm tools: Perform CPU and firmware setup after devices are added

2011-12-05 Thread Matt Evans

Currently some devices (in this case kbd, fb, vesa) are initialised after
CPU/firmware setup.  On some platforms (e.g. PPC) kvm__arch_setup_firmware() may
be making a device tree.  Any devices added after this point will be missed!

Tiny refactor of builtin-run.c, moving timer start, firmware setup, cpu init
to occur last.

Signed-off-by: Matt Evans m...@ozlabs.org
---
 tools/kvm/builtin-run.c |   24 ++--
 1 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c
index 32e19e7..576dcfa 100644
--- a/tools/kvm/builtin-run.c
+++ b/tools/kvm/builtin-run.c
@@ -933,16 +933,6 @@ int kvm_cmd_run(int argc, const char **argv, const char 
*prefix)
virtio_net__init(net_params);
}
 
-   kvm__start_timer(kvm);
-
-   kvm__arch_setup_firmware(kvm);
-
-	for (i = 0; i < nrcpus; i++) {
-		kvm_cpus[i] = kvm_cpu__init(kvm, i);
-		if (!kvm_cpus[i])
-			die("unable to initialize KVM VCPU");
-	}
-
kvm__init_ram(kvm);
 
 #ifdef CONFIG_X86
@@ -966,6 +956,20 @@ int kvm_cmd_run(int argc, const char **argv, const char 
*prefix)
 
fb__start();
 
+   /* Device init all done; firmware init must
+* come after this (it may set up device trees etc.)
+*/
+
+   kvm__start_timer(kvm);
+
+   kvm__arch_setup_firmware(kvm);
+
+	for (i = 0; i < nrcpus; i++) {
+		kvm_cpus[i] = kvm_cpu__init(kvm, i);
+		if (!kvm_cpus[i])
+			die("unable to initialize KVM VCPU");
+	}
+
thread_pool__init(nr_online_cpus);
ioeventfd__start();
 

[PATCH 20/28] kvm tools: Init IRQs after determining nrcpus

2011-12-05 Thread Matt Evans

IRQ init may involve per-CPU setup/allocation of resources, so make sure
kvm->nrcpus is initialised before calling irq__init().

Signed-off-by: Matt Evans m...@ozlabs.org
---
 tools/kvm/builtin-run.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c
index 576dcfa..84aa931 100644
--- a/tools/kvm/builtin-run.c
+++ b/tools/kvm/builtin-run.c
@@ -810,8 +810,6 @@ int kvm_cmd_run(int argc, const char **argv, const char 
*prefix)
 
kvm = kvm__init(dev, ram_size, guest_name);
 
-   irq__init(kvm);
-
 	kvm->single_step = single_step;
 
ioeventfd__init();
@@ -829,6 +827,8 @@ int kvm_cmd_run(int argc, const char **argv, const char 
*prefix)
 
 	kvm->nrcpus = nrcpus;
 
+   irq__init(kvm);
+
pci__init();
 
/*

[PATCH 21/28] kvm tools: Add --hugetlbfs option to specify memory path

2011-12-05 Thread Matt Evans

Some architectures may want to use hugetlbfs to mmap() their guest memory, so
allow a path to be specified on the commandline and pass it to kvm__arch_init().

Signed-off-by: Matt Evans m...@ozlabs.org
---
 tools/kvm/builtin-run.c |4 +++-
 tools/kvm/include/kvm/kvm.h |4 ++--
 tools/kvm/kvm.c |4 ++--
 tools/kvm/x86/kvm.c |2 +-
 4 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c
index 84aa931..4c88169 100644
--- a/tools/kvm/builtin-run.c
+++ b/tools/kvm/builtin-run.c
@@ -84,6 +84,7 @@ static const char *guest_mac;
 static const char *host_mac;
 static const char *script;
 static const char *guest_name;
+static const char *hugetlbfs_path;
 static struct virtio_net_params *net_params;
 static bool single_step;
 static bool readonly_image[MAX_DISK_IMAGES];
@@ -422,6 +423,7 @@ static const struct option options[] = {
 	OPT_CALLBACK('\0', "tty", NULL, "tty id",
 		     "Remap guest TTY into a pty on the host",
 		     tty_parser),
+	OPT_STRING('\0', "hugetlbfs", &hugetlbfs_path, "path", "Hugetlbfs path"),
 
 	OPT_GROUP("Kernel options:"),
 	OPT_STRING('k', "kernel", &kernel_filename, "kernel",
@@ -808,7 +810,7 @@ int kvm_cmd_run(int argc, const char **argv, const char 
*prefix)
guest_name = default_name;
}
 
-   kvm = kvm__init(dev, ram_size, guest_name);
+   kvm = kvm__init(dev, hugetlbfs_path, ram_size, guest_name);
 
 	kvm->single_step = single_step;
 
diff --git a/tools/kvm/include/kvm/kvm.h b/tools/kvm/include/kvm/kvm.h
index 5fe6e75..7159952 100644
--- a/tools/kvm/include/kvm/kvm.h
+++ b/tools/kvm/include/kvm/kvm.h
@@ -30,7 +30,7 @@ struct kvm_ext {
 void kvm__set_dir(const char *fmt, ...);
 const char *kvm__get_dir(void);
 
-struct kvm *kvm__init(const char *kvm_dev, u64 ram_size, const char *name);
+struct kvm *kvm__init(const char *kvm_dev, const char *hugetlbfs_path, u64 
ram_size, const char *name);
 int kvm__recommended_cpus(struct kvm *kvm);
 int kvm__max_cpus(struct kvm *kvm);
 void kvm__init_ram(struct kvm *kvm);
@@ -54,7 +54,7 @@ int kvm__enumerate_instances(int (*callback)(const char 
*name, int pid));
 void kvm__remove_socket(const char *name);
 
 void kvm__arch_set_cmdline(char *cmdline, bool video);
-void kvm__arch_init(struct kvm *kvm, const char *kvm_dev, u64 ram_size, const 
char *name);
+void kvm__arch_init(struct kvm *kvm, const char *kvm_dev, const char 
*hugetlbfs_path, u64 ram_size, const char *name);
 void kvm__arch_setup_firmware(struct kvm *kvm);
 bool kvm__arch_cpu_supports_vm(void);
 void kvm__arch_periodic_poll(struct kvm *kvm);
diff --git a/tools/kvm/kvm.c b/tools/kvm/kvm.c
index 6f33e1a..503ceae 100644
--- a/tools/kvm/kvm.c
+++ b/tools/kvm/kvm.c
@@ -272,7 +272,7 @@ static void kvm__pid(int fd, u32 type, u32 len, u8 *msg)
 		pr_warning("Failed sending PID");
 }
 
-struct kvm *kvm__init(const char *kvm_dev, u64 ram_size, const char *name)
+struct kvm *kvm__init(const char *kvm_dev, const char *hugetlbfs_path, u64 
ram_size, const char *name)
 {
struct kvm *kvm;
int ret;
@@ -305,7 +305,7 @@ struct kvm *kvm__init(const char *kvm_dev, u64 ram_size, 
const char *name)
if (kvm__check_extensions(kvm))
 		die("A required KVM extention is not supported by OS");
 
-   kvm__arch_init(kvm, kvm_dev, ram_size, name);
+   kvm__arch_init(kvm, kvm_dev, hugetlbfs_path, ram_size, name);
 
 	kvm->name = name;
 
diff --git a/tools/kvm/x86/kvm.c b/tools/kvm/x86/kvm.c
index 4ac21c0..76f805f 100644
--- a/tools/kvm/x86/kvm.c
+++ b/tools/kvm/x86/kvm.c
@@ -161,7 +161,7 @@ void kvm__arch_set_cmdline(char *cmdline, bool video)
 }
 
 /* Architecture-specific KVM init */
-void kvm__arch_init(struct kvm *kvm, const char *kvm_dev, u64 ram_size, const 
char *name)
+void kvm__arch_init(struct kvm *kvm, const char *kvm_dev, const char 
*hugetlbfs_path, u64 ram_size, const char *name)
 {
struct kvm_pit_config pit_config = { .flags = 0, };
int ret;

[PATCH 22/28] kvm tools: Move PCI_MAX_DEVICES to pci.h

2011-12-05 Thread Matt Evans

Other pieces of kvmtool may be interested in PCI_MAX_DEVICES.

Signed-off-by: Matt Evans m...@ozlabs.org
---
 tools/kvm/include/kvm/pci.h |1 +
 tools/kvm/pci.c |1 -
 2 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/tools/kvm/include/kvm/pci.h b/tools/kvm/include/kvm/pci.h
index f71af0b..b578ad7 100644
--- a/tools/kvm/include/kvm/pci.h
+++ b/tools/kvm/include/kvm/pci.h
@@ -6,6 +6,7 @@
 #include linux/pci_regs.h
 #include linux/msi.h
 
+#define PCI_MAX_DEVICES256
 /*
  * PCI Configuration Mechanism #1 I/O ports. See Section 3.7.4.1.
  * (Configuration Mechanism #1) of the PCI Local Bus Specification 2.1 for
diff --git a/tools/kvm/pci.c b/tools/kvm/pci.c
index d1afc05..920e13e 100644
--- a/tools/kvm/pci.c
+++ b/tools/kvm/pci.c
@@ -5,7 +5,6 @@
 
 #include assert.h
 
-#define PCI_MAX_DEVICES256
 #define PCI_BAR_OFFSET(b)  (offsetof(struct pci_device_header, 
bar[b]))
 
 static struct pci_device_header*pci_devices[PCI_MAX_DEVICES];

[PATCH 23/28] kvm tools: Endian-sanitise pci.h and PCI device setup

2011-12-05 Thread Matt Evans

vesa, pci-shmem and virtio-pci devices need to set up config space with
little-endian conversions (as config space is LE).  The pci_config_address
bitfield also needs to be reversed when building on BE systems.
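
The effect of the little-endian conversions can be sketched at the byte level. The helpers below are hypothetical and are not the `cpu_to_le16()`/`cpu_to_le32()` macros used in the patch; they assume config space is modelled as a plain byte array and show the layout those macros are meant to guarantee on any host.

```c
#include <stdint.h>

/* Hypothetical helpers: store/load little-endian values on any host. */
static void example_put_le16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v & 0xff);
	p[1] = (uint8_t)(v >> 8);
}

static void example_put_le32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v & 0xff);
	p[1] = (uint8_t)((v >> 8) & 0xff);
	p[2] = (uint8_t)((v >> 16) & 0xff);
	p[3] = (uint8_t)(v >> 24);
}

static uint16_t example_get_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}
```

Storing `0x1234` with `example_put_le16()` always yields the bytes `0x34, 0x12`, which is what a guest reading config space expects regardless of the host's byte order.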

Signed-off-by: Matt Evans m...@ozlabs.org
---
 tools/kvm/hw/pci-shmem.c   |   23 +++--
 tools/kvm/hw/vesa.c|   15 +++--
 tools/kvm/include/kvm/ioport.h |   11 +
 tools/kvm/include/kvm/pci.h|   24 +-
 tools/kvm/pci.c|4 +-
 tools/kvm/virtio/pci.c |   41 +--
 6 files changed, 68 insertions(+), 50 deletions(-)

diff --git a/tools/kvm/hw/pci-shmem.c b/tools/kvm/hw/pci-shmem.c
index 780a377..fd954c5 100644
--- a/tools/kvm/hw/pci-shmem.c
+++ b/tools/kvm/hw/pci-shmem.c
@@ -8,21 +8,22 @@
 #include "kvm/ioeventfd.h"
 
 #include <linux/kvm.h>
+#include <linux/byteorder.h>
 #include <sys/ioctl.h>
 #include <fcntl.h>
 #include <sys/mman.h>
 
 static struct pci_device_header pci_shmem_pci_device = {
-   .vendor_id  = PCI_VENDOR_ID_REDHAT_QUMRANET,
-   .device_id  = 0x1110,
+   .vendor_id  = cpu_to_le16(PCI_VENDOR_ID_REDHAT_QUMRANET),
+   .device_id  = cpu_to_le16(0x1110),
.header_type= PCI_HEADER_TYPE_NORMAL,
-   .class  = 0xFF, /* misc pci device */
-   .status = PCI_STATUS_CAP_LIST,
+   .class[2]   = 0xFF, /* misc pci device */
+   .status = cpu_to_le16(PCI_STATUS_CAP_LIST),
 	.capabilities	= (void *)&pci_shmem_pci_device.msix - (void *)&pci_shmem_pci_device,
.msix.cap   = PCI_CAP_ID_MSIX,
-   .msix.ctrl  = 1,
-   .msix.table_offset = 1, /* Use BAR 1 */
-   .msix.pba_offset = 0x1001,  /* Use BAR 1 */
+   .msix.ctrl  = cpu_to_le16(1),
+   .msix.table_offset = cpu_to_le32(1),/* Use BAR 1 */
+   .msix.pba_offset = cpu_to_le32(0x1001), /* Use BAR 1 */
 };
 
 /* registers for the Inter-VM shared memory device */
@@ -123,7 +124,7 @@ int pci_shmem__get_local_irqfd(struct kvm *kvm)
 	if (fd < 0)
 		return fd;
 
-	if (pci_shmem_pci_device.msix.ctrl & PCI_MSIX_FLAGS_ENABLE) {
+	if (pci_shmem_pci_device.msix.ctrl & cpu_to_le16(PCI_MSIX_FLAGS_ENABLE)) {
gsi = irq__add_msix_route(kvm, msix_table[0].msg);
} else {
gsi = pci_shmem_pci_device.irq_line;
@@ -241,11 +242,11 @@ int pci_shmem__init(struct kvm *kvm)
 * 1 - MSI-X MMIO space
 * 2 - Shared memory block
 */
-	pci_shmem_pci_device.bar[0] = ivshmem_registers | PCI_BASE_ADDRESS_SPACE_IO;
+	pci_shmem_pci_device.bar[0] = cpu_to_le32(ivshmem_registers | PCI_BASE_ADDRESS_SPACE_IO);
 	pci_shmem_pci_device.bar_size[0] = shmem_region->size;
-	pci_shmem_pci_device.bar[1] = msix_block | PCI_BASE_ADDRESS_SPACE_MEMORY;
+	pci_shmem_pci_device.bar[1] = cpu_to_le32(msix_block | PCI_BASE_ADDRESS_SPACE_MEMORY);
 	pci_shmem_pci_device.bar_size[1] = 0x1010;
-	pci_shmem_pci_device.bar[2] = shmem_region->phys_addr | PCI_BASE_ADDRESS_SPACE_MEMORY;
+	pci_shmem_pci_device.bar[2] = cpu_to_le32(shmem_region->phys_addr | PCI_BASE_ADDRESS_SPACE_MEMORY);
 	pci_shmem_pci_device.bar_size[2] = shmem_region->size;
 
 	pci__register(&pci_shmem_pci_device, dev);
diff --git a/tools/kvm/hw/vesa.c b/tools/kvm/hw/vesa.c
index 22b1652..63f1082 100644
--- a/tools/kvm/hw/vesa.c
+++ b/tools/kvm/hw/vesa.c
@@ -8,6 +8,7 @@
 #include "kvm/irq.h"
 #include "kvm/kvm.h"
 #include "kvm/pci.h"
+#include <linux/byteorder.h>
 #include <sys/mman.h>
 
 #include <sys/types.h>
@@ -31,14 +32,14 @@ static struct ioport_operations vesa_io_ops = {
 };
 
 static struct pci_device_header vesa_pci_device = {
-   .vendor_id  = PCI_VENDOR_ID_REDHAT_QUMRANET,
-   .device_id  = PCI_DEVICE_ID_VESA,
+   .vendor_id  = cpu_to_le16(PCI_VENDOR_ID_REDHAT_QUMRANET),
+   .device_id  = cpu_to_le16(PCI_DEVICE_ID_VESA),
.header_type= PCI_HEADER_TYPE_NORMAL,
.revision_id= 0,
-   .class  = 0x03,
-   .subsys_vendor_id   = PCI_SUBSYSTEM_VENDOR_ID_REDHAT_QUMRANET,
-   .subsys_id  = PCI_SUBSYSTEM_ID_VESA,
-   .bar[1] = VESA_MEM_ADDR | PCI_BASE_ADDRESS_SPACE_MEMORY,
+   .class[2]   = 0x03,
+   .subsys_vendor_id   = 
cpu_to_le16(PCI_SUBSYSTEM_VENDOR_ID_REDHAT_QUMRANET),
+   .subsys_id  = cpu_to_le16(PCI_SUBSYSTEM_ID_VESA),
+   .bar[1] = cpu_to_le32(VESA_MEM_ADDR | 
PCI_BASE_ADDRESS_SPACE_MEMORY),
.bar_size[1]= VESA_MEM_SIZE,
 };
 
@@ -56,7 +57,7 @@ struct framebuffer *vesa__init(struct kvm *kvm)
vesa_pci_device.irq_pin = pin;
vesa_pci_device.irq_line= line;
vesa_base_addr

[PATCH 24/28] kvm tools: Fix virtio-pci endian bug when reading VIRTIO_PCI_QUEUE_NUM

2011-12-05 Thread Matt Evans

The field size is currently wrong: the value is written as a 32bit word instead
of a 16bit one.  This causes trouble on BE.

Signed-off-by: Matt Evans m...@ozlabs.org
---
 tools/kvm/virtio/pci.c |3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/tools/kvm/virtio/pci.c b/tools/kvm/virtio/pci.c
index 0ae93fb..6b27ff8 100644
--- a/tools/kvm/virtio/pci.c
+++ b/tools/kvm/virtio/pci.c
@@ -116,8 +116,7 @@ static bool virtio_pci__io_in(struct ioport *ioport, struct kvm *kvm, u16 port,
 		break;
 	case VIRTIO_PCI_QUEUE_NUM:
 		val = vtrans->virtio_ops->get_size_vq(kvm, vpci->dev, vpci->queue_selector);
-		ioport__write32(data, val);
-		break;
+		ioport__write16(data, val);
 		break;
 	case VIRTIO_PCI_STATUS:
 		ioport__write8(data, vpci->status);

[PATCH 25/28] kvm tools: Correctly set virtio-pci bar_size and remove hardwired address

2011-12-05 Thread Matt Evans

The BAR addresses are set up fine, but missed the bar_size[] array which is now
updated correspondingly.

Use PCI_IO_SIZE instead of '0x100'.

Signed-off-by: Matt Evans m...@ozlabs.org
---
 tools/kvm/virtio/pci.c |7 +--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/tools/kvm/virtio/pci.c b/tools/kvm/virtio/pci.c
index 6b27ff8..ffa3768 100644
--- a/tools/kvm/virtio/pci.c
+++ b/tools/kvm/virtio/pci.c
@@ -293,8 +293,8 @@ int virtio_pci__init(struct kvm *kvm, struct virtio_trans *vtrans, void *dev,
 	vpci->msix_pba_block = pci_get_io_space_block(PCI_IO_SIZE);
 
 	vpci->base_addr = ioport__register(IOPORT_EMPTY, &virtio_pci__io_ops, IOPORT_SIZE, vtrans);
-	kvm__register_mmio(kvm, vpci->msix_io_block, 0x100, callback_mmio_table, vpci);
-	kvm__register_mmio(kvm, vpci->msix_pba_block, 0x100, callback_mmio_pba, vpci);
+	kvm__register_mmio(kvm, vpci->msix_io_block, PCI_IO_SIZE, callback_mmio_table, vpci);
+	kvm__register_mmio(kvm, vpci->msix_pba_block, PCI_IO_SIZE, callback_mmio_pba, vpci);
 
 	vpci->pci_hdr = (struct pci_device_header) {
 		.vendor_id		= cpu_to_le16(PCI_VENDOR_ID_REDHAT_QUMRANET),
@@ -313,6 +313,9 @@ int virtio_pci__init(struct kvm *kvm, struct virtio_trans *vtrans, void *dev,
 							| PCI_BASE_ADDRESS_MEM_TYPE_64),
 		.status			= cpu_to_le16(PCI_STATUS_CAP_LIST),
 		.capabilities		= (void *)&vpci->pci_hdr.msix - (void *)&vpci->pci_hdr,
+		.bar_size[0]		= IOPORT_SIZE,
+		.bar_size[1]		= PCI_IO_SIZE,
+		.bar_size[3]		= PCI_IO_SIZE,
 	};
 
 	vpci->pci_hdr.msix.cap = PCI_CAP_ID_MSIX;

[PATCH 26/28] kvm tools: Add pci__config_{rd,wr}(), pci__find_dev() and fix PCI config register addressing

2011-12-05 Thread Matt Evans

This allows config space access in a more natural manner than clunky x86 IO
ports, and is useful for other architectures.

Furthermore, the actual registers were only accessed in 32bit chunks; other
systems (e.g. PPC) allow smaller accesses so that, for example, the 16-bit
config field can be read directly.  This patch allows this sort of addressing.
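
The addressing can be sketched by decoding a Configuration Mechanism #1 address word with shifts and masks. The struct and function names below are hypothetical, and the sketch deliberately avoids the bitfield union used by kvmtool so that it does not depend on host bitfield order. The low 8 bits are taken as a byte offset, matching the `addr.w & 0xff` computation in the patch.

```c
#include <stdint.h>

/* Hypothetical decoded form of a CONFIG_ADDRESS word. */
struct example_cfg_addr {
	unsigned int enable;	/* bit 31 */
	unsigned int bus;	/* bits 23..16 */
	unsigned int device;	/* bits 15..11 */
	unsigned int function;	/* bits 10..8 */
	unsigned int offset;	/* bits 7..0, byte offset into config space */
};

static struct example_cfg_addr example_decode_cfg_addr(uint32_t w)
{
	struct example_cfg_addr a;

	a.enable   = (w >> 31) & 0x1;
	a.bus      = (w >> 16) & 0xff;
	a.device   = (w >> 11) & 0x1f;
	a.function = (w >> 8) & 0x7;
	a.offset   = w & 0xff;
	return a;
}
```

Keeping the full byte offset, rather than the 4-byte-aligned register number, is what lets a caller address a 16-bit field such as the one at offset `0x3e` directly.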

Signed-off-by: Matt Evans m...@ozlabs.org
---
 tools/kvm/include/kvm/pci.h |5 +++
 tools/kvm/pci.c |   63 +++---
 2 files changed, 45 insertions(+), 23 deletions(-)

diff --git a/tools/kvm/include/kvm/pci.h b/tools/kvm/include/kvm/pci.h
index 88e92dc..be2b0bc 100644
--- a/tools/kvm/include/kvm/pci.h
+++ b/tools/kvm/include/kvm/pci.h
@@ -7,6 +7,8 @@
 #include <linux/msi.h>
 #include <endian.h>
 
+#include "kvm/kvm.h"
+
 #define PCI_MAX_DEVICES256
 /*
  * PCI Configuration Mechanism #1 I/O ports. See Section 3.7.4.1.
@@ -82,6 +84,9 @@ struct pci_device_header {
 
 void pci__init(void);
 void pci__register(struct pci_device_header *dev, u8 dev_num);
+struct pci_device_header *pci__find_dev(u8 dev_num);
 u32 pci_get_io_space_block(u32 size);
+void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void 
*data, int size);
+void pci__config_rd(struct kvm *kvm, union pci_config_address addr, void 
*data, int size);
 
 #endif /* KVM__PCI_H */
diff --git a/tools/kvm/pci.c b/tools/kvm/pci.c
index 5bbcbc7..8282e23 100644
--- a/tools/kvm/pci.c
+++ b/tools/kvm/pci.c
@@ -77,7 +77,6 @@ static bool pci_device_exists(u8 bus_number, u8 
device_number, u8 function_numbe
 static bool pci_config_data_out(struct ioport *ioport, struct kvm *kvm, u16 
port, void *data, int size)
 {
unsigned long start;
-   u8 dev_num;
 
/*
 * If someone accesses PCI configuration space offsets that are not
@@ -85,12 +84,41 @@ static bool pci_config_data_out(struct ioport *ioport, 
struct kvm *kvm, u16 port
 */
start = port - PCI_CONFIG_DATA;
 
-   dev_num = pci_config_address.device_number;
+   pci__config_wr(kvm, pci_config_address, data, size);
+
+   return true;
+}
+
+static bool pci_config_data_in(struct ioport *ioport, struct kvm *kvm, u16 
port, void *data, int size)
+{
+   unsigned long start;
+
+   /*
+* If someone accesses PCI configuration space offsets that are not
+* aligned to 4 bytes, it uses ioports to signify that.
+*/
+   start = port - PCI_CONFIG_DATA;
+
+   pci__config_rd(kvm, pci_config_address, data, size);
+
+   return true;
+}
+
+static struct ioport_operations pci_config_data_ops = {
+   .io_in  = pci_config_data_in,
+   .io_out = pci_config_data_out,
+};
+
+void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void 
*data, int size)
+{
+   u8 dev_num;
+
+   dev_num = addr.device_number;
 
if (pci_device_exists(0, dev_num, 0)) {
unsigned long offset;
 
-		offset = start + (pci_config_address.register_number << 2);
+		offset = addr.w & 0xff;
 		if (offset < sizeof(struct pci_device_header)) {
void *p = pci_devices[dev_num];
u8 bar = (offset - PCI_BAR_OFFSET(0)) / (sizeof(u32));
@@ -116,27 +144,18 @@ static bool pci_config_data_out(struct ioport *ioport, 
struct kvm *kvm, u16 port
}
}
}
-
-   return true;
 }
 
-static bool pci_config_data_in(struct ioport *ioport, struct kvm *kvm, u16 
port, void *data, int size)
+void pci__config_rd(struct kvm *kvm, union pci_config_address addr, void 
*data, int size)
 {
-   unsigned long start;
u8 dev_num;
 
-   /*
-* If someone accesses PCI configuration space offsets that are not
-* aligned to 4 bytes, it uses ioports to signify that.
-*/
-   start = port - PCI_CONFIG_DATA;
-
-   dev_num = pci_config_address.device_number;
+   dev_num = addr.device_number;
 
if (pci_device_exists(0, dev_num, 0)) {
unsigned long offset;
 
-		offset = start + (pci_config_address.register_number << 2);
+		offset = addr.w & 0xff;
 		if (offset < sizeof(struct pci_device_header)) {
void *p = pci_devices[dev_num];
 
@@ -145,22 +164,20 @@ static bool pci_config_data_in(struct ioport *ioport, 
struct kvm *kvm, u16 port,
memset(data, 0x00, size);
} else
memset(data, 0xff, size);
-
-   return true;
 }
 
-static struct ioport_operations pci_config_data_ops = {
-   .io_in  = pci_config_data_in,
-   .io_out = pci_config_data_out,
-};
-
 void pci__register(struct pci_device_header *dev, u8 dev_num)
 {
 	assert(dev_num < PCI_MAX_DEVICES);
-
pci_devices[dev_num]= dev;
 }
 
+struct pci_device_header *pci__find_dev(u8 dev_num)
+{
+

[PATCH 27/28] kvm tools: Arch-specific define for PCI MMIO allocation area

2011-12-05 Thread Matt Evans

pci_get_io_space_block() used to grab addresses from
KVM_32BIT_GAP_START + 0x100, which is x86-specific.  Create a new define,
KVM_PCI_MMIO_AREA, to specify a bus address these allocations can come from.
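
The allocation scheme can be pictured as a simple bump allocator that starts at an arch-defined base. This is a sketch under assumptions: the body of `pci_get_io_space_block()` is not shown in this patch, and the names and the base value `EX_PCI_MMIO_AREA` below are hypothetical placeholders rather than kvmtool's real definitions.

```c
#include <stdint.h>

/* Hypothetical arch-specific base; each arch header would define its own. */
#define EX_PCI_MMIO_AREA	0xd1000000u

/* Next free PCI bus address; 32 bits wide, like the BARs it is used for. */
static uint32_t example_next_block = EX_PCI_MMIO_AREA;

/* Hand out consecutive, never-freed blocks of PCI bus address space. */
static uint32_t example_get_io_space_block(uint32_t size)
{
	uint32_t block = example_next_block;

	example_next_block += size;
	return block;
}
```

Because only the starting constant is arch-specific, generic code like this can stay in a shared file while each architecture supplies its own base address.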

Signed-off-by: Matt Evans m...@ozlabs.org
---
 tools/kvm/pci.c  |8 ++--
 tools/kvm/x86/include/kvm/kvm-arch.h |5 +
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/tools/kvm/pci.c b/tools/kvm/pci.c
index 8282e23..045c1c5 100644
--- a/tools/kvm/pci.c
+++ b/tools/kvm/pci.c
@@ -11,8 +11,12 @@ static struct pci_device_header  
*pci_devices[PCI_MAX_DEVICES];
 
 static union pci_config_address pci_config_address;
 
-/* This is within our PCI gap - in an unused area */
-static u32 io_space_blocks = KVM_32BIT_GAP_START + 0x100;
+/* This is within our PCI gap - in an unused area.
+ * Note this is a PCI *bus address*, is used to assign BARs etc.!
+ * (That's why it can still be 32bit even with 64bit guests-- 64bit
+ * PCI isn't currently supported.)
+ */
+static u32 io_space_blocks = KVM_PCI_MMIO_AREA;
 
 u32 pci_get_io_space_block(u32 size)
 {
diff --git a/tools/kvm/x86/include/kvm/kvm-arch.h 
b/tools/kvm/x86/include/kvm/kvm-arch.h
index 02aa8b9..686b1b8 100644
--- a/tools/kvm/x86/include/kvm/kvm-arch.h
+++ b/tools/kvm/x86/include/kvm/kvm-arch.h
@@ -18,6 +18,11 @@
 
 #define KVM_MMIO_START KVM_32BIT_GAP_START
 
+/* This is the address that pci_get_io_space_block() starts allocating
+ * from.  Note that this is a PCI bus address (though same on x86).
+ */
+#define KVM_PCI_MMIO_AREA  (KVM_MMIO_START + 0x100)
+
 struct kvm {
int sys_fd; /* For system ioctls(), i.e. 
/dev/kvm */
int vm_fd;  /* For VM ioctls() */
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 28/28] kvm tools: Create arch-specific kvm_cpu__emulate_io()

2011-12-05 Thread Matt Evans

Different architectures will deal with MMIO exits differently.  For example,
KVM_EXIT_IO is x86-specific, and I/O cycles are often synthesised by steering
into windows in PCI bridges on other architectures.

This patch moves the IO/MMIO exit code from the main runloop into x86/kvm-cpu.c
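
As a rough sketch of the split described above, the generic run loop only recognises that an exit is I/O-related and leaves the decoding to an arch hook. All `example_` types, constants and counters below are stand-ins invented for the illustration; they are not the kernel's struct kvm_run or kvmtool's real emulation functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the kernel's exit reasons; the values are illustrative. */
enum example_exit_reason {
	EXAMPLE_EXIT_IO = 1,
	EXAMPLE_EXIT_MMIO,
	EXAMPLE_EXIT_OTHER,
};

struct example_run {
	enum example_exit_reason exit_reason;
	uint16_t io_port;
	uint64_t mmio_addr;
};

/* Counters standing in for real port-I/O and MMIO emulation. */
static int example_io_count;
static int example_mmio_count;

static bool example_emulate_io(uint16_t port)
{
	(void)port;
	example_io_count++;
	return true;
}

static bool example_emulate_mmio(uint64_t addr)
{
	(void)addr;
	example_mmio_count++;
	return true;
}

/* Arch hook: decode the exit and forward it to the matching emulator. */
static bool example_arch_emulate_io(const struct example_run *run)
{
	assert(run != NULL);
	switch (run->exit_reason) {
	case EXAMPLE_EXIT_IO:
		return example_emulate_io(run->io_port);
	case EXAMPLE_EXIT_MMIO:
		return example_emulate_mmio(run->mmio_addr);
	default:
		return false;
	}
}

/* Generic side: a false return means the caller should treat the exit
 * as fatal, as the run loop does with its panic path. */
static bool example_generic_handle_exit(const struct example_run *run)
{
	switch (run->exit_reason) {
	case EXAMPLE_EXIT_IO:
	case EXAMPLE_EXIT_MMIO:
		return example_arch_emulate_io(run);
	default:
		return false;
	}
}
```

An architecture without port I/O could supply a hook that handles only the MMIO case and synthesises I/O cycles from it, without touching the generic loop.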

Signed-off-by: Matt Evans m...@ozlabs.org
---
 tools/kvm/include/kvm/kvm-cpu.h |1 +
 tools/kvm/kvm-cpu.c |   37 +
 tools/kvm/x86/kvm-cpu.c |   37 +
 3 files changed, 43 insertions(+), 32 deletions(-)

diff --git a/tools/kvm/include/kvm/kvm-cpu.h b/tools/kvm/include/kvm/kvm-cpu.h
index 15618f1..6f38c0c 100644
--- a/tools/kvm/include/kvm/kvm-cpu.h
+++ b/tools/kvm/include/kvm/kvm-cpu.h
@@ -13,6 +13,7 @@ void kvm_cpu__run(struct kvm_cpu *vcpu);
 void kvm_cpu__reboot(void);
 int kvm_cpu__start(struct kvm_cpu *cpu);
 bool kvm_cpu__handle_exit(struct kvm_cpu *vcpu);
+bool kvm_cpu__emulate_io(struct kvm_cpu *cpu, struct kvm_run *kvm_run);
 
 int kvm_cpu__get_debug_fd(void);
 void kvm_cpu__set_debug_fd(int fd);
diff --git a/tools/kvm/kvm-cpu.c b/tools/kvm/kvm-cpu.c
index 884a89f..c9fbc81 100644
--- a/tools/kvm/kvm-cpu.c
+++ b/tools/kvm/kvm-cpu.c
@@ -103,49 +103,22 @@ int kvm_cpu__start(struct kvm_cpu *cpu)
kvm_cpu__show_registers(cpu);
kvm_cpu__show_code(cpu);
break;
-   case KVM_EXIT_IO: {
-   bool ret;
-
-   ret = kvm__emulate_io(cpu->kvm,
-   cpu->kvm_run->io.port,
-   (u8 *)cpu->kvm_run +
-   cpu->kvm_run->io.data_offset,
-   cpu->kvm_run->io.direction,
-   cpu->kvm_run->io.size,
-   cpu->kvm_run->io.count);
-
-   if (!ret)
+   case KVM_EXIT_IO:
+   case KVM_EXIT_MMIO:
+   if (!kvm_cpu__emulate_io(cpu, cpu->kvm_run))
goto panic_kvm;
break;
-   }
-   case KVM_EXIT_MMIO: {
-   bool ret;
-
-   ret = kvm__emulate_mmio(cpu->kvm,
-   cpu->kvm_run->mmio.phys_addr,
-   cpu->kvm_run->mmio.data,
-   cpu->kvm_run->mmio.len,
-   cpu->kvm_run->mmio.is_write);
-
-   if (!ret)
-   goto panic_kvm;
-   break;
-   }
case KVM_EXIT_INTR:
if (cpu-is_running)
break;
goto exit_kvm;
case KVM_EXIT_SHUTDOWN:
goto exit_kvm;
-   default: {
-   bool ret;
-
-   ret = kvm_cpu__handle_exit(cpu);
-   if (!ret)
+   default:
+   if (!kvm_cpu__handle_exit(cpu))
goto panic_kvm;
break;
}
-   }
kvm_cpu__handle_coalesced_mmio(cpu);
}
 
diff --git a/tools/kvm/x86/kvm-cpu.c b/tools/kvm/x86/kvm-cpu.c
index a0d10cc..665d742 100644
--- a/tools/kvm/x86/kvm-cpu.c
+++ b/tools/kvm/x86/kvm-cpu.c
@@ -217,6 +217,43 @@ bool kvm_cpu__handle_exit(struct kvm_cpu *vcpu)
return false;
 }
 
+bool kvm_cpu__emulate_io(struct kvm_cpu *cpu, struct kvm_run *kvm_run)
+{
+   bool ret;
+   switch (kvm_run->exit_reason) {
+   case KVM_EXIT_IO: {
+   ret = kvm__emulate_io(cpu->kvm,
+ cpu->kvm_run->io.port,
+ (u8 *)cpu->kvm_run +
+ cpu->kvm_run->io.data_offset,
+ cpu->kvm_run->io.direction,
+ cpu->kvm_run->io.size,
+ cpu->kvm_run->io.count);
+
+   if (!ret)
+   goto panic_kvm;
+   break;
+   }
+   case KVM_EXIT_MMIO: {
+   ret = kvm__emulate_mmio(cpu->kvm,
+   cpu->kvm_run->mmio.phys_addr,
+   cpu->kvm_run->mmio.data,
+   cpu->kvm_run->mmio.len,
+   cpu->kvm_run->mmio.is_write);
+
+   if (!ret)
+   goto panic_kvm;
+   break;
+   }
+   default:
+   pr_warning("Unknown exit reason %d in %s\n", kvm_run->exit_reason, __FUNCTION__);
+   return false;
+   }
+   return true;
+panic_kvm:
+   return false;
+}
+
 static void print_dtable(const

[PATCH 1/8] kvm tools: Add initial SPAPR PPC64 architecture support

2011-12-05 Thread Matt Evans

This patch adds a new arch directory, powerpc, basic file structure, register
setup and where necessary stubs out arch-specific functions (e.g. interrupts,
runloop exits) that later patches will provide.  The target is an
SPAPR-compliant PPC64 machine (i.e. pSeries); there is no support for PPC32 or
'bare metal' PPC64 guests as yet.  Subsequent patches implement the hcalls and
RTAS required to boot SPAPR pSeries kernels.

Memory is mapped from hugetlbfs (as that is currently required by upstream PPC64
HV-mode KVM).  The mapping of a VRMA region is yet to be implemented; this is
only necessary on processors that don't support VRMA, e.g. <= P6.  Work is
therefore needed to get this going on pre-P7 CPUs.

Processor state is set up as a guest kernel would expect (both primary and
secondaries), and SMP is fully supported.

Finally, support is added for simply loading flat binary kernels (plus initrd).
(bzImages are not used on PPC, and this series does not add zImage support or an
ELF loader.)  The intention is to later support loading firmware such as SLOF.

Signed-off-by: Matt Evans m...@ozlabs.org
---
 tools/kvm/Makefile   |   10 +
 tools/kvm/kvm.c  |3 +
 tools/kvm/powerpc/include/kvm/barrier.h  |6 +
 tools/kvm/powerpc/include/kvm/kvm-arch.h |   70 
 tools/kvm/powerpc/include/kvm/kvm-cpu-arch.h |   46 +
 tools/kvm/powerpc/ioport.c   |   18 ++
 tools/kvm/powerpc/irq.c  |   40 +
 tools/kvm/powerpc/kvm-cpu.c  |  232 ++
 tools/kvm/powerpc/kvm.c  |  231 +
 9 files changed, 656 insertions(+), 0 deletions(-)
 create mode 100644 tools/kvm/powerpc/include/kvm/barrier.h
 create mode 100644 tools/kvm/powerpc/include/kvm/kvm-arch.h
 create mode 100644 tools/kvm/powerpc/include/kvm/kvm-cpu-arch.h
 create mode 100644 tools/kvm/powerpc/ioport.c
 create mode 100644 tools/kvm/powerpc/irq.c
 create mode 100644 tools/kvm/powerpc/kvm-cpu.c
 create mode 100644 tools/kvm/powerpc/kvm.c

diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile
index 57dc521..58815a2 100644
--- a/tools/kvm/Makefile
+++ b/tools/kvm/Makefile
@@ -121,6 +121,16 @@ ifeq ($(ARCH),x86)
OTHEROBJS   += x86/bios/bios-rom.o
ARCH_INCLUDE := x86/include
 endif
+# POWER/ppc:  Actually only support ppc64 currently.
+ifeq ($(uname_M), ppc64)
+   DEFINES += -DCONFIG_PPC
+   OBJS+= powerpc/ioport.o
+   OBJS+= powerpc/irq.o
+   OBJS+= powerpc/kvm.o
+   OBJS+= powerpc/kvm-cpu.o
+   ARCH_INCLUDE := powerpc/include
+   CFLAGS += -m64
+endif
 
 ###
 
diff --git a/tools/kvm/kvm.c b/tools/kvm/kvm.c
index 503ceae..d716ede 100644
--- a/tools/kvm/kvm.c
+++ b/tools/kvm/kvm.c
@@ -49,6 +49,9 @@ const char *kvm_exit_reasons[] = {
DEFINE_KVM_EXIT_REASON(KVM_EXIT_DCR),
DEFINE_KVM_EXIT_REASON(KVM_EXIT_NMI),
DEFINE_KVM_EXIT_REASON(KVM_EXIT_INTERNAL_ERROR),
+#ifdef CONFIG_PPC64
+   DEFINE_KVM_EXIT_REASON(KVM_EXIT_PAPR_HCALL),
+#endif
 };
 
 extern struct kvm *kvm;
diff --git a/tools/kvm/powerpc/include/kvm/barrier.h 
b/tools/kvm/powerpc/include/kvm/barrier.h
new file mode 100644
index 000..bc7d179
--- /dev/null
+++ b/tools/kvm/powerpc/include/kvm/barrier.h
@@ -0,0 +1,6 @@
+#ifndef _KVM_BARRIER_H_
+#define _KVM_BARRIER_H_
+
+#include <asm/system.h>
+
+#endif /* _KVM_BARRIER_H_ */
diff --git a/tools/kvm/powerpc/include/kvm/kvm-arch.h 
b/tools/kvm/powerpc/include/kvm/kvm-arch.h
new file mode 100644
index 000..722d01c
--- /dev/null
+++ b/tools/kvm/powerpc/include/kvm/kvm-arch.h
@@ -0,0 +1,70 @@
+/*
+ * PPC64 architecture-specific definitions
+ *
+ * Copyright 2011 Matt Evans m...@ozlabs.org, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ */
+
+#ifndef KVM__KVM_ARCH_H
+#define KVM__KVM_ARCH_H
+
+#include <stdbool.h>
+#include <linux/types.h>
+#include <time.h>
+
+#define KVM_NR_CPUS(255)
+
+/* MMIO lives after RAM, but it'd be nice if it didn't constantly move.
+ * Choose a suitably high address, e.g. 63T...  This limits RAM size.
+ */
+#define PPC_MMIO_START 0x3F00UL
+#define PPC_MMIO_SIZE  0x0100UL
+
+#define KERNEL_LOAD_ADDR   0x
+#define KERNEL_START_ADDR  0x
+#define KERNEL_SECONDARY_START_ADDR 0x0060
+#define INITRD_LOAD_ADDR   0x0280
+
+#define FDT_MAX_SIZE   0x1
+#define RTAS_MAX_SIZE  0x1
+
+#define TIMEBASE_FREQ  51200ULL
+
+#define KVM_MMIO_START PPC_MMIO_START
+
+/* This is the address that pci_get_io_space_block() starts allocating
+ * from.  Note that this is a PCI bus address.
+ */
+#define

[PATCH 2/8] kvm tools: Generate SPAPR PPC64 guest device tree

2011-12-05 Thread Matt Evans

The generated DT is the bare minimum structure required for SPAPR (on which
subsequent patches for VIO, XICS, PCI etc. will build); root node, cpus, memory.

Some aspects are currently hardwired for simplicity, for example advertised
page sizes, HPT size, SLB size, VMX/DFP, etc.  Future support of a variety
of POWER CPUs should acquire this info from the host and encode appropriately.

This requires a 64-bit libfdt.
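
Device tree property values are stored big-endian, which is why the patch wraps the RAM size in cpu_to_be64() before handing it to libfdt. The following sketch shows the byte layout of a memory node's "reg" value when the root node sets #address-cells and #size-cells to 2, as this patch does. The `example_` helpers are hypothetical and written without libfdt so that the snippet stands alone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 64-bit value most-significant byte first, independent of the
 * host's own byte order. */
static void example_put_be64(uint8_t *dst, uint64_t v)
{
	for (int i = 0; i < 8; i++)
		dst[i] = (uint8_t)(v >> (56 - 8 * i));
}

/* Build a "reg" value of the form <base size>, where each field spans
 * two 32-bit cells.  Returns the number of bytes written. */
static size_t example_encode_mem_reg(uint8_t out[16], uint64_t base,
				     uint64_t size)
{
	assert(out != NULL);
	example_put_be64(out, base);
	example_put_be64(out + 8, size);
	return 16;
}
```

For a guest with 1 GiB of RAM starting at address 0, the last eight bytes of the buffer would read 00 00 00 00 40 00 00 00.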

Signed-off-by: Matt Evans m...@ozlabs.org
---
 tools/kvm/Makefile  |3 +-
 tools/kvm/powerpc/kvm.c |  141 +++
 2 files changed, 143 insertions(+), 1 deletions(-)

diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile
index 58815a2..dc18959 100644
--- a/tools/kvm/Makefile
+++ b/tools/kvm/Makefile
@@ -129,7 +129,8 @@ ifeq ($(uname_M), ppc64)
OBJS+= powerpc/kvm.o
OBJS+= powerpc/kvm-cpu.o
ARCH_INCLUDE := powerpc/include
-   CFLAGS += -m64
+   CFLAGS  += -m64
+   LIBS+= -lfdt
 endif
 
 ###
diff --git a/tools/kvm/powerpc/kvm.c b/tools/kvm/powerpc/kvm.c
index 036bfc0..d792bee 100644
--- a/tools/kvm/powerpc/kvm.c
+++ b/tools/kvm/powerpc/kvm.c
@@ -3,6 +3,9 @@
  *
  * Copyright 2011 Matt Evans m...@ozlabs.org, IBM Corporation.
  *
+ * Portions of FDT setup borrowed from QEMU, copyright 2010 David Gibson, IBM
+ * Corporation.
+ *
  * This program is free software; you can redistribute it and/or modify it
  * under the terms of the GNU General Public License version 2 as published
  * by the Free Software Foundation.
@@ -28,8 +31,11 @@
 #include asm/unistd.h
 #include errno.h
 
+#include linux/byteorder.h
 #include libfdt.h
 
+#define HPT_ORDER 24
+
 #define HUGETLBFS_PATH /var/lib/hugetlbfs/global/pagesize-16MB/
 
 static char kern_cmdline[2048];
@@ -212,9 +218,144 @@ bool load_bzimage(struct kvm *kvm, int fd_kernel,
return false;
 }
 
+#define SMT_THREADS 4
+
+#define _FDT(exp)  \
+   do {\
+   int ret = (exp);\
+   if (ret < 0) {  \
+   die("Error creating device tree: %s: %s\n", \
+   #exp, fdt_strerror(ret));   \
+   }   \
+   } while (0)
+
+static uint32_t mfpvr(void)
+{
+   uint32_t r;
+   asm volatile ("mfpvr %0" : "=r"(r));
+   return r;
+}
+
 static void setup_fdt(struct kvm *kvm)
 {
+   uint64_t mem_reg_property[] = { 0, cpu_to_be64(kvm->ram_size) };
+   int smp_cpus = kvm->nrcpus;
+   uint32_t interrupt_server_ranges_prop[] = {0, cpu_to_be32(smp_cpus)};
+   char hypertas_prop_kvm[] = "hcall-pft\0hcall-term\0hcall-dabr\0hcall-interrupt"
+   "\0hcall-tce\0hcall-vio\0hcall-splpar\0hcall-bulk";
+   int i, j;
+   char cpu_name[30];
+   u8 staging_fdt[FDT_MAX_SIZE];
+   uint32_t pvr = mfpvr();
+
+   /* Generate an appropriate DT at kvm-fdt_gra */
+   void *fdt_dest = guest_flat_to_host(kvm, kvm-fdt_gra);
+   void *fdt = staging_fdt;
+
+   _FDT(fdt_create(fdt, FDT_MAX_SIZE));
+   _FDT(fdt_finish_reservemap(fdt));
+
+   _FDT(fdt_begin_node(fdt, ));
+
+   _FDT(fdt_property_string(fdt, device_type, chrp));
+   _FDT(fdt_property_string(fdt, model, IBM pSeries (emulated by 
kvmtool)));
+   _FDT(fdt_property_cell(fdt, #address-cells, 0x2));
+   _FDT(fdt_property_cell(fdt, #size-cells, 0x2));
+
+   /* /chosen */
+   _FDT(fdt_begin_node(fdt, chosen));
+   /* cmdline */
+   _FDT(fdt_property_string(fdt, bootargs, kern_cmdline));
+   /* Initrd */
+   if (kvm-initrd_size != 0) {
+   uint32_t ird_st_prop = cpu_to_be32(kvm-initrd_gra);
+   uint32_t ird_end_prop = cpu_to_be32(kvm-initrd_gra +
+   kvm-initrd_size);
+   _FDT(fdt_property(fdt, linux,initrd-start,
+  ird_st_prop, sizeof(ird_st_prop)));
+   _FDT(fdt_property(fdt, linux,initrd-end,
+  ird_end_prop, sizeof(ird_end_prop)));
+   }
+
+   /* Memory: We don't alloc. a separate RMA yet.  If we ever need to
+* (CAP_PPC_RMA == 2) then have one memory node for 0-RMAsize, and
+* another RMAsize-endOfMem.
+*/
+   _FDT(fdt_begin_node(fdt, memory@0));
+   _FDT(fdt_property_string(fdt, device_type, memory));
+   _FDT(fdt_property(fdt, reg, mem_reg_property, 
sizeof(mem_reg_property)));
+   _FDT(fdt_end_node(fdt));
+
+   /* CPUs */
+   _FDT(fdt_begin_node(fdt, cpus));
+   _FDT(fdt_property_cell(fdt, #address-cells, 0x1));
+   _FDT(fdt_property_cell(fdt, #size-cells, 0x0));
+
+   for (i = 0; i  smp_cpus; i +=

[PATCH 0/8] kvm tools SPAPR PPC64 support

2011-12-05 Thread Matt Evans

Hi,

This set of patches builds upon the prep-work of the previous set and adds
support to kvmtool for PPC64 SPAPR-based guests, i.e. an environment akin to an
LPAR on IBM's pSeries machines.

This support is not yet fully-featured but, in a basic state, works well.
The guests have a functional but no-frills experience, with:

- SMP guests
- HV console (or RTAS console, for udbg)
- Net, block over virtio-pci
- No PAPR VIO/VSCSI/VNET yet
- No fancy features like migration yet

Though minimal, guests are quite stable.

There are obvious areas for future improvement:

- Non-VRMA RMAs aren't supported, meaning POWER7-only for the moment
- Other CPU-specific details are currently assumed (e.g. available page sizes);
  work is required to determine host capabilities and pass these up.
- Support SLOF
- Maybe support VIO
- Some hypercalls used by partition firmware/SLOF (not the kernel) are
  unimplemented
- Fancy PCI (e.g. passthrough)
- Currently KVM_NR_CPUS is arbitrarily fixed at 255, and could be higher.
  Guests with this many CPUs boot fine.

Some PPC KVM kernel-side features aren't implemented yet and have required
kvmtool workarounds; mmio coalescing isn't supported and lack of ioeventfds
requires virtio to gracefully fall back when it fails to register one.


Cheers,


Matt



Matt Evans (8):
  kvm tools: Add initial SPAPR PPC64 architecture support
  kvm tools: Generate SPAPR PPC64 guest device tree
  kvm tools: Add SPAPR PPC64 hcall & rtascall structure
  kvm tools: Add SPAPR PPC64 HV console
  kvm tools: Add PPC64 XICS interrupt controller support
  kvm tools: Add PPC64 PCI Host Bridge
  kvm tools: Add PPC64 kvm_cpu__emulate_io()
  kvm tools: Make virtio-pci's ioeventfd__add_event() fall back
gracefully if ioeventfds unavailable

 tools/kvm/Makefile   |   16 +
 tools/kvm/include/kvm/ioeventfd.h|3 +-
 tools/kvm/ioeventfd.c|   12 +-
 tools/kvm/kvm.c  |3 +
 tools/kvm/powerpc/include/kvm/barrier.h  |6 +
 tools/kvm/powerpc/include/kvm/kvm-arch.h |   74 
 tools/kvm/powerpc/include/kvm/kvm-cpu-arch.h |   48 +++
 tools/kvm/powerpc/ioport.c   |   18 +
 tools/kvm/powerpc/irq.c  |   62 +++
 tools/kvm/powerpc/kvm-cpu.c  |  281 ++
 tools/kvm/powerpc/kvm.c  |  466 +++
 tools/kvm/powerpc/spapr.h|  316 +++
 tools/kvm/powerpc/spapr_hcall.c  |  151 
 tools/kvm/powerpc/spapr_hvcons.c |  101 +
 tools/kvm/powerpc/spapr_hvcons.h |   19 +
 tools/kvm/powerpc/spapr_pci.c|  429 +
 tools/kvm/powerpc/spapr_pci.h|   38 ++
 tools/kvm/powerpc/spapr_rtas.c   |  226 +++
 tools/kvm/powerpc/xics.c |  529 ++
 tools/kvm/powerpc/xics.h |   23 ++
 tools/kvm/virtio/pci.c   |   11 +-
 21 files changed, 2827 insertions(+), 5 deletions(-)
 create mode 100644 tools/kvm/powerpc/include/kvm/barrier.h
 create mode 100644 tools/kvm/powerpc/include/kvm/kvm-arch.h
 create mode 100644 tools/kvm/powerpc/include/kvm/kvm-cpu-arch.h
 create mode 100644 tools/kvm/powerpc/ioport.c
 create mode 100644 tools/kvm/powerpc/irq.c
 create mode 100644 tools/kvm/powerpc/kvm-cpu.c
 create mode 100644 tools/kvm/powerpc/kvm.c
 create mode 100644 tools/kvm/powerpc/spapr.h
 create mode 100644 tools/kvm/powerpc/spapr_hcall.c
 create mode 100644 tools/kvm/powerpc/spapr_hvcons.c
 create mode 100644 tools/kvm/powerpc/spapr_hvcons.h
 create mode 100644 tools/kvm/powerpc/spapr_pci.c
 create mode 100644 tools/kvm/powerpc/spapr_pci.h
 create mode 100644 tools/kvm/powerpc/spapr_rtas.c
 create mode 100644 tools/kvm/powerpc/xics.c
 create mode 100644 tools/kvm/powerpc/xics.h


[PATCH 3/8] kvm tools: Add SPAPR PPC64 hcall & rtascall structure

2011-12-05 Thread Matt Evans

This patch adds the basic structure for HV calls, their registration and some of
the simpler calls.  A similar layout for RTAS calls is also added, again with
some of the simpler RTAS calls used by the guest.  The SPAPR RTAS stub is
generated inline.  Also, nodes for RTAS are added to the device tree.
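
One plausible shape for the registration structure mentioned above is a table indexed by opcode, with unknown opcodes answered by a "function not supported" status. This is a sketch under assumptions: the `example_` names, the table bound, the status values and the rule that opcodes are multiples of four are illustrative and are not taken from spapr_hcall.c, whose body is not shown here.

```c
#include <assert.h>
#include <stddef.h>

#define EXAMPLE_MAX_HCALL  0x100UL  /* hypothetical highest opcode */
#define EXAMPLE_H_SUCCESS  0L
#define EXAMPLE_H_FUNCTION (-2L)    /* illustrative "not supported" status */

typedef long (*example_hcall_fn)(unsigned long opcode, unsigned long *args);

/* One slot per opcode; this sketch assumes opcodes are multiples of 4. */
static example_hcall_fn example_hcall_table[EXAMPLE_MAX_HCALL / 4 + 1];

/* Install a handler; registering the same opcode twice is a bug. */
static void example_register_hcall(unsigned long opcode, example_hcall_fn fn)
{
	assert(opcode <= EXAMPLE_MAX_HCALL && (opcode & 3) == 0);
	assert(example_hcall_table[opcode / 4] == NULL);
	example_hcall_table[opcode / 4] = fn;
}

/* Dispatch a guest hypercall, rejecting anything unregistered. */
static long example_hypercall(unsigned long opcode, unsigned long *args)
{
	if (opcode <= EXAMPLE_MAX_HCALL && (opcode & 3) == 0 &&
	    example_hcall_table[opcode / 4] != NULL)
		return example_hcall_table[opcode / 4](opcode, args);
	return EXAMPLE_H_FUNCTION;
}

/* Trivial handler used only to exercise the table. */
static long example_h_add(unsigned long opcode, unsigned long *args)
{
	(void)opcode;
	args[0] = args[0] + args[1];
	return EXAMPLE_H_SUCCESS;
}
```

With this layout, later patches in a series can add their own handlers through the register function without touching the dispatcher.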

Signed-off-by: Matt Evans m...@ozlabs.org
---
 tools/kvm/Makefile  |2 +
 tools/kvm/powerpc/kvm-cpu.c |5 +
 tools/kvm/powerpc/kvm.c |   39 +-
 tools/kvm/powerpc/spapr.h   |  308 +++
 tools/kvm/powerpc/spapr_hcall.c |  151 +++
 tools/kvm/powerpc/spapr_rtas.c  |  226 
 6 files changed, 730 insertions(+), 1 deletions(-)
 create mode 100644 tools/kvm/powerpc/spapr.h
 create mode 100644 tools/kvm/powerpc/spapr_hcall.c
 create mode 100644 tools/kvm/powerpc/spapr_rtas.c

diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile
index dc18959..0f24104 100644
--- a/tools/kvm/Makefile
+++ b/tools/kvm/Makefile
@@ -128,6 +128,8 @@ ifeq ($(uname_M), ppc64)
OBJS+= powerpc/irq.o
OBJS+= powerpc/kvm.o
OBJS+= powerpc/kvm-cpu.o
+   OBJS+= powerpc/spapr_hcall.o
+   OBJS+= powerpc/spapr_rtas.o
ARCH_INCLUDE := powerpc/include
CFLAGS  += -m64
LIBS+= -lfdt
diff --git a/tools/kvm/powerpc/kvm-cpu.c b/tools/kvm/powerpc/kvm-cpu.c
index 79422ff..71c648e 100644
--- a/tools/kvm/powerpc/kvm-cpu.c
+++ b/tools/kvm/powerpc/kvm-cpu.c
@@ -14,6 +14,8 @@
 #include kvm/util.h
 #include kvm/kvm.h
 
+#include spapr.h
+
 #include sys/ioctl.h
 #include sys/mman.h
 #include signal.h
@@ -156,6 +158,9 @@ bool kvm_cpu__handle_exit(struct kvm_cpu *vcpu)
bool ret = true;
struct kvm_run *run = vcpu-kvm_run;
switch(run-exit_reason) {
+   case KVM_EXIT_PAPR_HCALL:
+   run-papr_hcall.ret = spapr_hypercall(vcpu, run-papr_hcall.nr, 
run-papr_hcall.args);
+   break;
default:
ret = false;
}
diff --git a/tools/kvm/powerpc/kvm.c b/tools/kvm/powerpc/kvm.c
index d792bee..2f0a921 100644
--- a/tools/kvm/powerpc/kvm.c
+++ b/tools/kvm/powerpc/kvm.c
@@ -14,6 +14,8 @@
 #include kvm/kvm.h
 #include kvm/util.h
 
+#include spapr.h
+
 #include linux/kvm.h
 
 #include sys/types.h
@@ -153,6 +155,10 @@ void kvm__arch_init(struct kvm *kvm, const char *kvm_dev, 
const char *hugetlbfs_
cap_ppc_rma = ioctl(kvm-sys_fd, KVM_CHECK_EXTENSION, KVM_CAP_PPC_RMA);
if (cap_ppc_rma == 2)
die(Need contiguous RMA allocation on this hardware, which is 
not yet supported.);
+
+   /* Do these before FDT setup, IRQ setup, etc. */
+   hypercall_init();
+   register_core_rtas();
 }
 
 void kvm__irq_line(struct kvm *kvm, int irq, int level)
@@ -262,6 +268,20 @@ static void setup_fdt(struct kvm *kvm)
_FDT(fdt_property_cell(fdt, #address-cells, 0x2));
_FDT(fdt_property_cell(fdt, #size-cells, 0x2));
 
+   /* RTAS */
+   _FDT(fdt_begin_node(fdt, rtas));
+   /* This is what the kernel uses to switch 'We're an LPAR'! */
+_FDT(fdt_property(fdt, ibm,hypertas-functions, hypertas_prop_kvm,
+   sizeof(hypertas_prop_kvm)));
+   _FDT(fdt_property_cell(fdt, linux,rtas-base, kvm-rtas_gra));
+   _FDT(fdt_property_cell(fdt, linux,rtas-entry, kvm-rtas_gra));
+   _FDT(fdt_property_cell(fdt, rtas-size, kvm-rtas_size));
+   /* Now add properties for all RTAS tokens: */
+   if (spapr_rtas_fdt_setup(kvm, fdt))
+   die(Couldn't create RTAS FDT properties\n);
+
+   _FDT(fdt_end_node(fdt));
+
/* /chosen */
_FDT(fdt_begin_node(fdt, chosen));
/* cmdline */
@@ -363,7 +383,24 @@ static void setup_fdt(struct kvm *kvm)
  */
 void kvm__arch_setup_firmware(struct kvm *kvm)
 {
-   /* Load RTAS */
+   /* Set up RTAS stub.  All it is is a single hypercall:
+  0:   7c 64 1b 78 mr  r4,r3
+  4:   3c 60 00 00 lis r3,0
+  8:   60 63 f0 00 ori r3,r3,61440
+  c:   44 00 00 22 sc  1
+ 10:   4e 80 00 20 blr
+   */
+   uint32_t *rtas = guest_flat_to_host(kvm, kvm->rtas_gra);
+
+   rtas[0] = 0x7c641b78;
+   rtas[1] = 0x3c600000;
+   rtas[2] = 0x6063f000;
+   rtas[3] = 0x44000022;
+   rtas[4] = 0x4e800020;
+   kvm->rtas_size = 20;
+
+   pr_info("Set up %ld bytes of RTAS at 0x%lx\n",
+   kvm->rtas_size, kvm->rtas_gra);
 
/* Load SLOF */
 
diff --git a/tools/kvm/powerpc/spapr.h b/tools/kvm/powerpc/spapr.h
new file mode 100644
index 000..4e5d7bd
--- /dev/null
+++ b/tools/kvm/powerpc/spapr.h
@@ -0,0 +1,308 @@
+/*
+ * SPAPR definitions and declarations
+ *
+ * Borrowed heavily from QEMU's spapr.h,
+ * Copyright (c) 2010 David Gibson, IBM Corporation.
+ *
+ * Modifications by Matt Evans m...@ozlabs.org, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or

[PATCH 4/8] kvm tools: Add SPAPR PPC64 HV console

2011-12-05 Thread Matt Evans

This adds the console code and adds VIO HV terminal nodes to the device
tree so that the guest kernel picks the console up.
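
The put-character hypercall in this patch receives up to 16 characters packed into two 64-bit arguments, with the first character in the most significant byte. A minimal sketch of that unpacking follows; `example_unpack_term_chars` is a hypothetical helper written for illustration, whereas the patch itself uses a union and cpu_to_be64().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy 'len' characters out of two 64-bit register values.  Character 0
 * is the top byte of 'hi', character 8 is the top byte of 'lo'.
 * Returns the number of bytes written, or 0 if len is out of range. */
static size_t example_unpack_term_chars(uint64_t hi, uint64_t lo,
					size_t len, uint8_t out[16])
{
	assert(out != NULL);
	if (len > 16)
		return 0;

	for (size_t i = 0; i < len; i++) {
		uint64_t reg = (i < 8) ? hi : lo;

		out[i] = (uint8_t)(reg >> (56 - 8 * (i % 8)));
	}
	return len;
}
```

Shifting explicitly, rather than copying the registers byte for byte, keeps the result the same on little-endian and big-endian hosts.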

Signed-off-by: Matt Evans m...@ozlabs.org
---
 tools/kvm/Makefile   |1 +
 tools/kvm/powerpc/kvm.c  |   31 
 tools/kvm/powerpc/spapr_hvcons.c |  101 ++
 tools/kvm/powerpc/spapr_hvcons.h |   19 +++
 4 files changed, 152 insertions(+), 0 deletions(-)
 create mode 100644 tools/kvm/powerpc/spapr_hvcons.c
 create mode 100644 tools/kvm/powerpc/spapr_hvcons.h

diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile
index 0f24104..76cce3a 100644
--- a/tools/kvm/Makefile
+++ b/tools/kvm/Makefile
@@ -130,6 +130,7 @@ ifeq ($(uname_M), ppc64)
OBJS+= powerpc/kvm-cpu.o
OBJS+= powerpc/spapr_hcall.o
OBJS+= powerpc/spapr_rtas.o
+   OBJS+= powerpc/spapr_hvcons.o
ARCH_INCLUDE := powerpc/include
CFLAGS  += -m64
LIBS+= -lfdt
diff --git a/tools/kvm/powerpc/kvm.c b/tools/kvm/powerpc/kvm.c
index 2f0a921..8614538 100644
--- a/tools/kvm/powerpc/kvm.c
+++ b/tools/kvm/powerpc/kvm.c
@@ -15,6 +15,7 @@
 #include kvm/util.h
 
 #include spapr.h
+#include spapr_hvcons.h
 
 #include linux/kvm.h
 
@@ -159,6 +160,8 @@ void kvm__arch_init(struct kvm *kvm, const char *kvm_dev, 
const char *hugetlbfs_
/* Do these before FDT setup, IRQ setup, etc. */
hypercall_init();
register_core_rtas();
+   /* Now that hypercalls are initialised, register a couple for the 
console: */
+   spapr_hvcons_init();
 }
 
 void kvm__irq_line(struct kvm *kvm, int irq, int level)
@@ -172,6 +175,11 @@ void kvm__irq_trigger(struct kvm *kvm, int irq)
kvm__irq_line(kvm, irq, 0);
 }
 
+void kvm__arch_periodic_poll(struct kvm *kvm)
+{
+   spapr_hvcons_poll(kvm);
+}
+
 int load_flat_binary(struct kvm *kvm, int fd_kernel, int fd_initrd, const char 
*kernel_cmdline)
 {
void *p;
@@ -297,6 +305,13 @@ static void setup_fdt(struct kvm *kvm)
   ird_end_prop, sizeof(ird_end_prop)));
}
 
+   /* stdout-path: This is assuming we're using the HV console.  Also, the
+* address is hardwired until we do a VIO bus.
+*/
+   _FDT(fdt_property_string(fdt, linux,stdout-path,
+/vdevice/vty@3000));
+   _FDT(fdt_end_node(fdt));
+
/* Memory: We don't alloc. a separate RMA yet.  If we ever need to
 * (CAP_PPC_RMA == 2) then have one memory node for 0-RMAsize, and
 * another RMAsize-endOfMem.
@@ -369,6 +384,22 @@ static void setup_fdt(struct kvm *kvm)
}
_FDT(fdt_end_node(fdt));
 
+   /* VIO: See comment in linux,stdout-path; we don't yet represent a VIO
+* bus/address allocation so addresses are hardwired here.
+*/
+   _FDT(fdt_begin_node(fdt, vdevice));
+   _FDT(fdt_property_cell(fdt, #address-cells, 0x1));
+   _FDT(fdt_property_cell(fdt, #size-cells, 0x0));
+   _FDT(fdt_property_string(fdt, device_type, vdevice));
+   _FDT(fdt_property_string(fdt, compatible, IBM,vdevice));
+   _FDT(fdt_begin_node(fdt, vty@3000));
+   _FDT(fdt_property_string(fdt, name, vty));
+   _FDT(fdt_property_string(fdt, device_type, serial));
+   _FDT(fdt_property_string(fdt, compatible, hvterm1));
+   _FDT(fdt_property_cell(fdt, reg, 0x3000));
+   _FDT(fdt_end_node(fdt));
+   _FDT(fdt_end_node(fdt));
+
/* Finalise: */
_FDT(fdt_end_node(fdt)); /* Root node */
_FDT(fdt_finish(fdt));
diff --git a/tools/kvm/powerpc/spapr_hvcons.c b/tools/kvm/powerpc/spapr_hvcons.c
new file mode 100644
index 000..97902ac
--- /dev/null
+++ b/tools/kvm/powerpc/spapr_hvcons.c
@@ -0,0 +1,101 @@
+/*
+ * SPAPR HV console
+ *
+ * Borrowed lightly from QEMU's spapr_vty.c, Copyright (c) 2010 David Gibson,
+ * IBM Corporation.
+ *
+ * Copyright (c) 2011 Matt Evans m...@ozlabs.org, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ */
+
+#include kvm/term.h
+#include kvm/kvm.h
+#include kvm/kvm-cpu.h
+#include kvm/util.h
+#include spapr.h
+#include spapr_hvcons.h
+
+#include stdio.h
+#include sys/uio.h
+#include errno.h
+
+#include linux/byteorder.h
+
+union hv_chario {
+   struct {
+   uint64_t char0_7;
+   uint64_t char8_15;
+   } a;
+   uint8_t buf[16];
+};
+
+static unsigned long h_put_term_char(struct kvm_cpu *vcpu, unsigned long opcode, unsigned long *args)
+{
+   /* To do: Read register from args[0], and check it. */
+   unsigned long len = args[1];
+   union hv_chario data;
+   struct iovec iov;
+
+   if (len > 16) {
+   return H_PARAMETER;
+   }
+   data.a.char0_7 = cpu_to_be64(args[2]);
+   data.a.char8_15 = cpu_to_be64(args[3]);
+
+   iov.iov_base =

[PATCH 5/8] kvm tools: Add PPC64 XICS interrupt controller support

2011-12-05 Thread Matt Evans

This patch adds XICS emulation code (heavily borrowed from QEMU), and wires
this into kvm_cpu__irq() to fire a CPU IRQ via KVM.  A device tree entry is
also added.  IPIs work, xics_alloc_irqnum() is added to allocate an external
IRQ (which will later be used by the PHB PCI code) and finally, kvm__irq_line()
can be called to raise an IRQ on XICS.
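
Two of the pieces described above are small enough to sketch in isolation: handing out external interrupt numbers, and turning a line level into a set or unset request for the vcpu. The `example_` names, the base number 16 and the enum are assumptions for illustration only. The real xics_alloc_irqnum() is not shown in this excerpt, and the real kvm_cpu__irq() passes KVM_INTERRUPT_SET_LEVEL or KVM_INTERRUPT_UNSET to the KVM_INTERRUPT ioctl.

```c
#include <assert.h>

#define EXAMPLE_IRQ_BASE 16    /* hypothetical first external source */
#define EXAMPLE_NR_IRQS  1024  /* mirrors XICS_IRQS in this patch */

static int example_next_irq = EXAMPLE_IRQ_BASE;

/* Return the next unused interrupt source number, or -1 once the
 * controller has no sources left. */
static int example_alloc_irqnum(void)
{
	assert(example_next_irq >= EXAMPLE_IRQ_BASE);
	if (example_next_irq >= EXAMPLE_IRQ_BASE + EXAMPLE_NR_IRQS)
		return -1;
	return example_next_irq++;
}

/* Stand-ins for the two requests a level change can produce. */
enum example_irq_cmd {
	EXAMPLE_IRQ_UNSET,
	EXAMPLE_IRQ_SET_LEVEL,
};

/* A non-zero level asserts the external interrupt, zero retracts it. */
static enum example_irq_cmd example_irq_cmd_for_level(int level)
{
	return level ? EXAMPLE_IRQ_SET_LEVEL : EXAMPLE_IRQ_UNSET;
}
```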

Signed-off-by: Matt Evans m...@ozlabs.org
---
 tools/kvm/Makefile   |1 +
 tools/kvm/powerpc/include/kvm/kvm-arch.h |1 +
 tools/kvm/powerpc/include/kvm/kvm-cpu-arch.h |2 +
 tools/kvm/powerpc/irq.c  |   11 +-
 tools/kvm/powerpc/kvm-cpu.c  |   10 +
 tools/kvm/powerpc/kvm.c  |   25 +-
 tools/kvm/powerpc/xics.c |  529 ++
 tools/kvm/powerpc/xics.h |   23 ++
 8 files changed, 596 insertions(+), 6 deletions(-)
 create mode 100644 tools/kvm/powerpc/xics.c
 create mode 100644 tools/kvm/powerpc/xics.h

diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile
index 76cce3a..6c8 100644
--- a/tools/kvm/Makefile
+++ b/tools/kvm/Makefile
@@ -131,6 +131,7 @@ ifeq ($(uname_M), ppc64)
OBJS+= powerpc/spapr_hcall.o
OBJS+= powerpc/spapr_rtas.o
OBJS+= powerpc/spapr_hvcons.o
+   OBJS+= powerpc/xics.o
ARCH_INCLUDE := powerpc/include
CFLAGS  += -m64
LIBS+= -lfdt
diff --git a/tools/kvm/powerpc/include/kvm/kvm-arch.h 
b/tools/kvm/powerpc/include/kvm/kvm-arch.h
index 722d01c..ae811e9 100644
--- a/tools/kvm/powerpc/include/kvm/kvm-arch.h
+++ b/tools/kvm/powerpc/include/kvm/kvm-arch.h
@@ -65,6 +65,7 @@ struct kvm {
unsigned long   initrd_gra;
unsigned long   initrd_size;
const char  *name;
+   struct icp_state*icp;
 };
 
 #endif /* KVM__KVM_ARCH_H */
diff --git a/tools/kvm/powerpc/include/kvm/kvm-cpu-arch.h 
b/tools/kvm/powerpc/include/kvm/kvm-cpu-arch.h
index dbabc57..551307e 100644
--- a/tools/kvm/powerpc/include/kvm/kvm-cpu-arch.h
+++ b/tools/kvm/powerpc/include/kvm/kvm-cpu-arch.h
@@ -17,6 +17,8 @@
 
 #include pthread.h
 
+#define POWER7_EXT_IRQ 0
+
 struct kvm;
 
 struct kvm_cpu {
diff --git a/tools/kvm/powerpc/irq.c b/tools/kvm/powerpc/irq.c
index 46aa64f..80c972a 100644
--- a/tools/kvm/powerpc/irq.c
+++ b/tools/kvm/powerpc/irq.c
@@ -21,6 +21,10 @@
 #include stddef.h
 #include stdlib.h
 
+#include xics.h
+
+#define XICS_IRQS   1024
+
 int irq__register_device(u32 dev, u8 *num, u8 *pin, u8 *line)
 {
fprintf(stderr, irq__register_device(%d, [%d], [%d], [%d]\n,
@@ -30,7 +34,12 @@ int irq__register_device(u32 dev, u8 *num, u8 *pin, u8 *line)
 
 void irq__init(struct kvm *kvm)
 {
-   fprintf(stderr, __func__);
+   /* kvm-nr_cpus is now valid; for /now/, pass
+* this to xics_system_init(), which assumes servers
+* are numbered 0..nrcpus.  This may not really be true,
+* but it is OK currently.
+*/
+   kvm-icp = xics_system_init(XICS_IRQS, kvm-nrcpus);
 }
 
 int irq__add_msix_route(struct kvm *kvm, struct msi_msg *msg)
diff --git a/tools/kvm/powerpc/kvm-cpu.c b/tools/kvm/powerpc/kvm-cpu.c
index 71c648e..63cd106 100644
--- a/tools/kvm/powerpc/kvm-cpu.c
+++ b/tools/kvm/powerpc/kvm-cpu.c
@@ -15,6 +15,7 @@
 #include kvm/kvm.h
 
 #include spapr.h
+#include xics.h
 
 #include sys/ioctl.h
 #include sys/mman.h
@@ -107,6 +108,9 @@ struct kvm_cpu *kvm_cpu__init(struct kvm *kvm, unsigned 
long cpu_id)
 */
vcpu-is_running = true;
 
+   /* Register with IRQ controller */
+   xics_cpu_register(vcpu);
+
return vcpu;
 }
 
@@ -151,6 +155,12 @@ void kvm_cpu__reset_vcpu(struct kvm_cpu *vcpu)
 /* kvm_cpu__irq - set KVM's IRQ flag on this vcpu */
 void kvm_cpu__irq(struct kvm_cpu *vcpu, int pin, int level)
 {
+   unsigned int virq = level ? KVM_INTERRUPT_SET_LEVEL : KVM_INTERRUPT_UNSET;
+
+   if (pin != POWER7_EXT_IRQ)
+   return;
+   if (ioctl(vcpu->vcpu_fd, KVM_INTERRUPT, &virq) < 0)
+   pr_warning("Could not KVM_INTERRUPT.");
 }
 
 bool kvm_cpu__handle_exit(struct kvm_cpu *vcpu)
diff --git a/tools/kvm/powerpc/kvm.c b/tools/kvm/powerpc/kvm.c
index 8614538..bfd7c3a 100644
--- a/tools/kvm/powerpc/kvm.c
+++ b/tools/kvm/powerpc/kvm.c
@@ -41,9 +41,13 @@
 
 #define HUGETLBFS_PATH /var/lib/hugetlbfs/global/pagesize-16MB/
 
+#define PHANDLE_XICP   0x
+
 static char kern_cmdline[2048];
 
 struct kvm_ext kvm_req_ext[] = {
+   { DEFINE_KVM_EXT(KVM_CAP_PPC_UNSET_IRQ) },
+   { DEFINE_KVM_EXT(KVM_CAP_PPC_IRQ_LEVEL) },
{ 0, 0 }
 };
 
@@ -164,11 +168,6 @@ void kvm__arch_init(struct kvm *kvm, const char *kvm_dev, 
const char *hugetlbfs_
spapr_hvcons_init();
 }
 
-void kvm__irq_line(struct kvm *kvm, int irq, int level)
-{
-   fprintf(stderr, irq_line(%d, %d)\n, irq, level);
-}
-
 void kvm__irq_trigger(struct kvm *kvm, int irq)
 {
kvm__irq_line(kvm, irq, 1);
@@

[PATCH 6/8] kvm tools: Add PPC64 PCI Host Bridge

2011-12-05 Thread Matt Evans

This provides the PCI bridge, definitions for the address layout of the windows
and wires in IRQs.  Once PCI devices are all registered, they are enumerated and
DT nodes generated for each.

Signed-off-by: Matt Evans m...@ozlabs.org
---
 tools/kvm/powerpc/include/kvm/kvm-arch.h |3 +
 tools/kvm/powerpc/irq.c  |   17 +-
 tools/kvm/powerpc/kvm.c  |   11 +
 tools/kvm/powerpc/spapr.h|8 +
 tools/kvm/powerpc/spapr_pci.c|  429 ++
 tools/kvm/powerpc/spapr_pci.h|   38 +++
 6 files changed, 504 insertions(+), 2 deletions(-)
 create mode 100644 tools/kvm/powerpc/spapr_pci.c
 create mode 100644 tools/kvm/powerpc/spapr_pci.h

diff --git a/tools/kvm/powerpc/include/kvm/kvm-arch.h 
b/tools/kvm/powerpc/include/kvm/kvm-arch.h
index ae811e9..ba374f5 100644
--- a/tools/kvm/powerpc/include/kvm/kvm-arch.h
+++ b/tools/kvm/powerpc/include/kvm/kvm-arch.h
@@ -40,6 +40,8 @@
  */
 #define KVM_PCI_MMIO_AREA  0x100
 
+struct spapr_phb;
+
 struct kvm {
int sys_fd; /* For system ioctls(), i.e. 
/dev/kvm */
int vm_fd;  /* For VM ioctls() */
@@ -66,6 +68,7 @@ struct kvm {
unsigned long   initrd_size;
const char  *name;
struct icp_state*icp;
+   struct spapr_phb*phb;
 };
 
 #endif /* KVM__KVM_ARCH_H */
diff --git a/tools/kvm/powerpc/irq.c b/tools/kvm/powerpc/irq.c
index 80c972a..134db8f 100644
--- a/tools/kvm/powerpc/irq.c
+++ b/tools/kvm/powerpc/irq.c
@@ -21,14 +21,27 @@
 #include <stddef.h>
 #include <stdlib.h>
 
+#include "kvm/pci.h"
+
 #include "xics.h"
+#include "spapr_pci.h"
 
 #define XICS_IRQS   1024
 
+static int pci_devs = 0;
+
 int irq__register_device(u32 dev, u8 *num, u8 *pin, u8 *line)
 {
-   fprintf(stderr, "irq__register_device(%d, [%d], [%d], [%d]\n",
-   dev, *num, *pin, *line);
+   if (pci_devs >= PCI_MAX_DEVICES)
+   die("Hit PCI device limit!\n");
+
+   *num = pci_devs++;
+
+   *pin = 1;
+   /* Have I said how nasty I find this?  Line should be dontcare... PHB
+* should determine which CPU/XICS IRQ to fire.
+*/
+   *line = xics_alloc_irqnum();
return 0;
 }
 
diff --git a/tools/kvm/powerpc/kvm.c b/tools/kvm/powerpc/kvm.c
index bfd7c3a..353c667 100644
--- a/tools/kvm/powerpc/kvm.c
+++ b/tools/kvm/powerpc/kvm.c
@@ -16,6 +16,7 @@
 
 #include "spapr.h"
 #include "spapr_hvcons.h"
+#include "spapr_pci.h"
 
 #include <linux/kvm.h>
 
@@ -166,6 +167,11 @@ void kvm__arch_init(struct kvm *kvm, const char *kvm_dev, const char *hugetlbfs_
register_core_rtas();
/* Now that hypercalls are initialised, register a couple for the console: */
spapr_hvcons_init();
+   spapr_create_phb(kvm, "pci", SPAPR_PCI_BUID,
+SPAPR_PCI_MEM_WIN_ADDR,
+SPAPR_PCI_MEM_WIN_SIZE,
+SPAPR_PCI_IO_WIN_ADDR,
+SPAPR_PCI_IO_WIN_SIZE);
 }
 
 void kvm__irq_trigger(struct kvm *kvm, int irq)
@@ -420,6 +426,11 @@ static void setup_fdt(struct kvm *kvm)
_FDT(fdt_finish(fdt));
 
_FDT(fdt_open_into(fdt, fdt_dest, FDT_MAX_SIZE));
+
+   /* PCI */
+   if (spapr_populate_pci_devices(kvm, PHANDLE_XICP, fdt_dest))
+   die("Fail populating PCI device nodes");
+
_FDT(fdt_add_mem_rsv(fdt_dest, kvm->rtas_gra, kvm->rtas_size));
_FDT(fdt_pack(fdt_dest));
 }
diff --git a/tools/kvm/powerpc/spapr.h b/tools/kvm/powerpc/spapr.h
index 4e5d7bd..902496d 100644
--- a/tools/kvm/powerpc/spapr.h
+++ b/tools/kvm/powerpc/spapr.h
@@ -305,4 +305,12 @@ target_ulong spapr_rtas_call(struct kvm_cpu *vcpu,
  uint32_t token, uint32_t nargs, target_ulong args,
  uint32_t nret, target_ulong rets);
 
+#define SPAPR_PCI_BUID  0x8002001ULL
+#define SPAPR_PCI_MEM_WIN_ADDR  (KVM_MMIO_START + 0xA000)
+#define SPAPR_PCI_MEM_WIN_SIZE  0x2000
+#define SPAPR_PCI_IO_WIN_ADDR   (KVM_MMIO_START + 0x8000)
+/* This, to me, is odd... 32MB of I/O?  Some PHBs are set up like this.
+ * Anything ever use > 64K? :P */
+#define SPAPR_PCI_IO_WIN_SIZE  0x200
+
 #endif /* !defined (__HW_SPAPR_H__) */
diff --git a/tools/kvm/powerpc/spapr_pci.c b/tools/kvm/powerpc/spapr_pci.c
new file mode 100644
index 000..233c42c
--- /dev/null
+++ b/tools/kvm/powerpc/spapr_pci.c
@@ -0,0 +1,429 @@
+/*
+ * SPAPR PHB emulation, RTAS interface to PCI config space, device tree nodes
+ * for enumerated devices.
+ *
+ * Borrowed heavily from QEMU's spapr_pci.c,
+ * Copyright (c) 2011 Alexey Kardashevskiy, IBM Corporation.
+ * Copyright (c) 2011 David Gibson, IBM Corporation.
+ *
+ * Modifications copyright 2011 Matt Evans m...@ozlabs.org, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public

[PATCH 7/8] kvm tools: Add PPC64 kvm_cpu__emulate_io()

2011-12-05 Thread Matt Evans

This is the final piece of the puzzle for PPC SPAPR PCI; this
function splits MMIO accesses into the two PHB windows & directs
things to MMIO/IO emulation as appropriate.

Signed-off-by: Matt Evans m...@ozlabs.org
---
 tools/kvm/Makefile  |1 +
 tools/kvm/powerpc/kvm-cpu.c |   34 ++
 2 files changed, 35 insertions(+), 0 deletions(-)

diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile
index 6c8..9b875dd 100644
--- a/tools/kvm/Makefile
+++ b/tools/kvm/Makefile
@@ -131,6 +131,7 @@ ifeq ($(uname_M), ppc64)
OBJS+= powerpc/spapr_hcall.o
OBJS+= powerpc/spapr_rtas.o
OBJS+= powerpc/spapr_hvcons.o
+   OBJS+= powerpc/spapr_pci.o
OBJS+= powerpc/xics.o
ARCH_INCLUDE := powerpc/include
CFLAGS  += -m64
diff --git a/tools/kvm/powerpc/kvm-cpu.c b/tools/kvm/powerpc/kvm-cpu.c
index 63cd106..0cf4dc8 100644
--- a/tools/kvm/powerpc/kvm-cpu.c
+++ b/tools/kvm/powerpc/kvm-cpu.c
@@ -24,6 +24,7 @@
 #include <string.h>
 #include <errno.h>
 #include <stdio.h>
+#include <assert.h>
 
 static int debug_fd;
 
@@ -177,6 +178,39 @@ bool kvm_cpu__handle_exit(struct kvm_cpu *vcpu)
return ret;
 }
 
+bool kvm_cpu__emulate_io(struct kvm_cpu *cpu, struct kvm_run *kvm_run)
+{
+   bool ret = false;
+   u64 phys_addr;
+
+   /* We'll never get KVM_EXIT_IO, it's x86-specific.  All IO is MM! :P
+* So, look at our windows here & split addresses into I/O or MMIO.
+*/
+   assert(kvm_run->exit_reason == KVM_EXIT_MMIO);
+
+   phys_addr = cpu->kvm_run->mmio.phys_addr;
+   if ((phys_addr >= SPAPR_PCI_IO_WIN_ADDR) &&
+   (phys_addr < SPAPR_PCI_IO_WIN_ADDR + SPAPR_PCI_IO_WIN_SIZE)) {
+   ret = kvm__emulate_io(cpu->kvm, phys_addr - SPAPR_PCI_IO_WIN_ADDR,
+ cpu->kvm_run->mmio.data,
+ cpu->kvm_run->mmio.is_write ?
+ KVM_EXIT_IO_OUT : KVM_EXIT_IO_IN,
+ cpu->kvm_run->mmio.len, 1);
+   } else if ((phys_addr >= SPAPR_PCI_MEM_WIN_ADDR) &&
+  (phys_addr < SPAPR_PCI_MEM_WIN_ADDR + SPAPR_PCI_MEM_WIN_SIZE)) {
+   ret = kvm__emulate_mmio(cpu->kvm,
+   cpu->kvm_run->mmio.phys_addr - SPAPR_PCI_MEM_WIN_ADDR,
+   cpu->kvm_run->mmio.data,
+   cpu->kvm_run->mmio.len,
+   cpu->kvm_run->mmio.is_write);
+   } else {
+   pr_warning("MMIO %s unknown address %lx (size %d)!\n",
+  cpu->kvm_run->mmio.is_write ? "write to" : "read from",
+  phys_addr, cpu->kvm_run->mmio.len);
+   }
+   return ret;
+}
+
 #define CONDSTR_BIT(m, b) (((m) & MSR_##b) ? #b" " : "")
 
 void kvm_cpu__show_registers(struct kvm_cpu *vcpu)
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
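
For readers following patch 7/8, the window split can be sketched on its own. This is a minimal illustration, not the patch's code: the window bases and sizes below are made-up values for the sketch, and `classify_mmio()`, `in_window()`, `struct win_hit` and `enum win_kind` are hypothetical names. It shows the same idea as `kvm_cpu__emulate_io()`: test a half-open range per window and hand the emulation layer an offset relative to the window base.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical window layout, chosen only for this sketch. */
#define IO_WIN_ADDR   0x80000000ULL
#define IO_WIN_SIZE   0x02000000ULL
#define MEM_WIN_ADDR  0xA0000000ULL
#define MEM_WIN_SIZE  0x20000000ULL

enum win_kind { WIN_NONE, WIN_IO, WIN_MEM };

struct win_hit {
	enum win_kind kind;
	uint64_t offset;	/* address relative to the window base */
};

/* Half-open range test: base <= addr < base + size. */
static bool in_window(uint64_t addr, uint64_t base, uint64_t size)
{
	return addr >= base && addr - base < size;
}

/* Decide which window (if any) a guest physical address falls into. */
static struct win_hit classify_mmio(uint64_t phys_addr)
{
	struct win_hit hit = { WIN_NONE, 0 };

	if (in_window(phys_addr, IO_WIN_ADDR, IO_WIN_SIZE)) {
		hit.kind = WIN_IO;
		hit.offset = phys_addr - IO_WIN_ADDR;
	} else if (in_window(phys_addr, MEM_WIN_ADDR, MEM_WIN_SIZE)) {
		hit.kind = WIN_MEM;
		hit.offset = phys_addr - MEM_WIN_ADDR;
	}
	return hit;
}
```

Writing the upper bound as `addr - base < size` rather than `addr < base + size` avoids overflow if a window ever ends at the top of the address space; for the windows in the patch both forms behave the same.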

[PATCH 8/8] kvm tools: Make virtio-pci's ioeventfd__add_event() fall back gracefully if ioeventfds unavailable

2011-12-05 Thread Matt Evans

PPC KVM doesn't yet support ioeventfds, so don't bomb out/die.  virtio-pci is
able to function if it instead uses normal IO port notification.

Signed-off-by: Matt Evans m...@ozlabs.org
---
 tools/kvm/include/kvm/ioeventfd.h |3 ++-
 tools/kvm/ioeventfd.c |   12 +---
 tools/kvm/virtio/pci.c|   11 ++-
 3 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/tools/kvm/include/kvm/ioeventfd.h b/tools/kvm/include/kvm/ioeventfd.h
index df01750..5e458be 100644
--- a/tools/kvm/include/kvm/ioeventfd.h
+++ b/tools/kvm/include/kvm/ioeventfd.h
@@ -4,6 +4,7 @@
 #include <linux/types.h>
 #include <linux/list.h>
 #include <sys/eventfd.h>
+#include <stdbool.h>
 
 struct kvm;
 
@@ -21,7 +22,7 @@ struct ioevent {
 
 void ioeventfd__init(void);
 void ioeventfd__start(void);
-void ioeventfd__add_event(struct ioevent *ioevent);
+bool ioeventfd__add_event(struct ioevent *ioevent);
 void ioeventfd__del_event(u64 addr, u64 datamatch);
 
 #endif
diff --git a/tools/kvm/ioeventfd.c b/tools/kvm/ioeventfd.c
index 3a240e4..37f9a63 100644
--- a/tools/kvm/ioeventfd.c
+++ b/tools/kvm/ioeventfd.c
@@ -26,7 +26,7 @@ void ioeventfd__init(void)
die("Failed creating epoll fd");
 }
 
-void ioeventfd__add_event(struct ioevent *ioevent)
+bool ioeventfd__add_event(struct ioevent *ioevent)
 {
struct kvm_ioeventfd kvm_ioevent;
struct epoll_event epoll_event;
@@ -48,8 +48,13 @@ void ioeventfd__add_event(struct ioevent *ioevent)
.flags  = KVM_IOEVENTFD_FLAG_PIO | KVM_IOEVENTFD_FLAG_DATAMATCH,
};
 
-   if (ioctl(ioevent->fn_kvm->vm_fd, KVM_IOEVENTFD, &kvm_ioevent) != 0)
-   die("Failed creating new ioeventfd");
+   if (ioctl(ioevent->fn_kvm->vm_fd, KVM_IOEVENTFD, &kvm_ioevent) != 0) {
+   /* Not all KVM implementations may support KVM_IOEVENTFD,
+* so be graceful.
+*/
+   free(new_ioevent);
+   return false;
+   }
 
epoll_event = (struct epoll_event) {
.events = EPOLLIN,
@@ -60,6 +65,7 @@ void ioeventfd__add_event(struct ioevent *ioevent)
die("Failed assigning new event to the epoll fd");
 
list_add_tail(&new_ioevent->list, &used_ioevents);
+   return true;
 }
 
 void ioeventfd__del_event(u64 addr, u64 datamatch)
diff --git a/tools/kvm/virtio/pci.c b/tools/kvm/virtio/pci.c
index ffa3768..06d3b79 100644
--- a/tools/kvm/virtio/pci.c
+++ b/tools/kvm/virtio/pci.c
@@ -50,7 +50,16 @@ static int virtio_pci__init_ioeventfd(struct kvm *kvm, struct virtio_trans *vtra
.fd = eventfd(0, 0),
};
 
-   ioeventfd__add_event(&ioevent);
+   if (!ioeventfd__add_event(&ioevent)) {
+#ifndef CONFIG_PPC
+   /* PPC64 doesn't have kvm ioevents yet, so we expect this to
+* fail -- don't need to be verbose about it!  For virtio-pci,
+* this is fine.  It catches the IO accesses anyway, so
+* still works (but slower).
+*/
+   pr_warning("Failed creating new ioeventfd");
+#endif
+   }
 
if (vtrans-virtio_ops-notify_vq_eventfd)
vtrans-virtio_ops-notify_vq_eventfd(kvm, vpci-dev, vq, 
ioevent.fd);
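
The pattern in patch 8/8 (report failure instead of calling die(), and let the caller fall back to trapping the I/O access) can be sketched in isolation. This is only an illustration under stated assumptions: `register_fn` is a hypothetical stand-in for the KVM_IOEVENTFD ioctl, and `add_event()`, `setup_notify()`, `reg_ok()` and `reg_unsupported()` are invented names, not kvmtool functions.

```c
#include <stdbool.h>
#include <stdio.h>

enum notify_path { NOTIFY_EVENTFD, NOTIFY_TRAPPED_IO };

/* Hypothetical stand-in for the KVM_IOEVENTFD ioctl:
 * returns 0 on success, non-zero if the host lacks support. */
typedef int (*register_fn)(int fd);

/* Report failure to the caller instead of aborting the whole tool. */
static bool add_event(register_fn reg, int fd)
{
	return reg(fd) == 0;
}

/* Pick the fast path when available, otherwise fall back quietly. */
static enum notify_path setup_notify(register_fn reg, int fd, bool quiet)
{
	if (add_event(reg, fd))
		return NOTIFY_EVENTFD;

	if (!quiet)
		fprintf(stderr, "ioeventfd unavailable, using trapped I/O\n");
	return NOTIFY_TRAPPED_IO;
}

/* Two fake backends so both outcomes can be exercised. */
static int reg_ok(int fd) { (void)fd; return 0; }
static int reg_unsupported(int fd) { (void)fd; return -1; }
```

The `quiet` flag plays the role of the `#ifndef CONFIG_PPC` guard in the patch: on a platform where failure is expected, the warning is suppressed and the slower path is used without complaint.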

Re: [PATCH] virtio-ring: Use threshold for switching to indirect descriptors

2011-12-05 Thread Rusty Russell

On Mon, 05 Dec 2011 11:52:54 +0200, Avi Kivity a...@redhat.com wrote:
 On 12/05/2011 02:10 AM, Rusty Russell wrote:
  On Sun, 04 Dec 2011 17:16:59 +0200, Avi Kivity a...@redhat.com wrote:
   On 12/04/2011 05:11 PM, Michael S. Tsirkin wrote:
 There's also the used ring, but that's a
 mistake if you have out of order completion.  We should have used 
 copying.
   
Seems unrelated... unless you want used to be written into
descriptor ring itself?
   
   The avail/used rings are in addition to the regular ring, no?  If you
   copy descriptors, then it goes away.
 
  There were two ideas which drove the current design:
 
  1) The Van-Jacobson style "no two writers to same cacheline makes rings
     fast" idea.  Empirically, this doesn't show any winnage.
 
 Write/write is the same as write/read or read/write.  Both cases have to
 send a probe and wait for the result.  What we really need is to
 minimize cache line ping ponging, and the descriptor pool fails that
 with ooo completion.  I doubt it's measurable though except with the
 very fastest storage providers.

The claim was that going exclusive-shared-exclusive was cheaper than
exclusive-invalid-exclusive.  When VJ said it, it seemed convincing :)

  2) Allowing a generic inter-guest copy mechanism, so we could have
     genuinely untrusted driver domains.  Yet no one ever did this so it's
 hardly a killer feature :(
 
 It's still a goal, though not an important one.  But we have to
 translate rings anyway, don't we, since buffers are in guest physical
 addresses, and we're moving into an address space that doesn't map those.

Yes, but the hypervisor/trusted party would simply have to do the copy;
the rings themselves would be shared: A would say "copy this to/from B's
ring entry N" and you know that A can't have changed B's entry.

 I thought of having a vhost-copy driver that could do ring translation,
 using a dma engine for the copy.

As long as we get the length of data written from the vhost-copy driver
(ie. not just the network header).  Otherwise a malicious other guest
can send short packets, and a local process can read uninitialized
memory.  And pre-zeroing the buffers for this corner case sucks.

  So if we're going to revisit and drop those requirements, I'd say:
 
  1) Shared device/driver rings like Xen.  Xen uses device-specific ring
 contents, I'd be tempted to stick to our pre-headers, and a 'u64
 addr; u64 len_and_flags; u64 cookie;' generic style.  Then use
 the same ring for responses.  That's a slight space-win, since
 we're 24 bytes vs 26 bytes now.
 
 Let's cheat and have inline contents.  Take three bits from
 len_and_flags to specify additional descriptors as inline data.

Nice, I like this optimization.
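
As a rough sketch of what such a descriptor might look like: the thread only proposes the three u64 fields and "three bits" for inline data, so the bit positions below are an assumption made purely for illustration, and `struct ring_desc`, `desc_pack()`, `desc_len()` and `desc_inline_slots()` are hypothetical names.

```c
#include <stdint.h>

/* The 24-byte generic descriptor floated in the thread. */
struct ring_desc {
	uint64_t addr;
	uint64_t len_and_flags;
	uint64_t cookie;
};

/* Assumed layout for this sketch only:
 *   bits  0..31  buffer length
 *   bits 61..63  number of following slots holding inline data (0..7)
 */
#define DESC_LEN_MASK      0xFFFFFFFFULL
#define DESC_INLINE_SHIFT  61
#define DESC_INLINE_MASK   (7ULL << DESC_INLINE_SHIFT)

/* Combine a length and an inline-slot count into one 64-bit field. */
static uint64_t desc_pack(uint32_t len, unsigned inline_slots)
{
	return (uint64_t)len |
	       ((uint64_t)(inline_slots & 7u) << DESC_INLINE_SHIFT);
}

static uint32_t desc_len(uint64_t v)
{
	return (uint32_t)(v & DESC_LEN_MASK);
}

static unsigned desc_inline_slots(uint64_t v)
{
	return (unsigned)((v & DESC_INLINE_MASK) >> DESC_INLINE_SHIFT);
}
```

With three bits, a descriptor can announce up to seven following slots of inline payload, i.e. up to 7 * 24 bytes in this layout, which would cover small headers without a separate buffer.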

 Also, stuff the cookie into len_and_flags as well.

Every driver really wants to put a pointer in there.  We have an array
to map desc. numbers to cookies inside the virtio core.

We really want 64 bits.

  2) Stick with physically-contiguous rings, but use them of size (2^n)-1.
 Makes the indexing harder, but that -1 lets us stash the indices in
 the first entry and makes the ring a nice 2^n size.
 
 Allocate at least a cache line for those.  The 2^n size is not really
 material, a division is never necessary.

We free-run our indices, so we *do* a division (by truncation).  If we
limit indices to ringsize, then we have to handle empty/full confusion.
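
To make the free-running index point concrete, here is a small sketch (not virtio code; `struct ring`, `RING_SIZE` and the helper names are invented for illustration). The 16-bit indices are never reset: the slot is found by reducing the index modulo the ring size, and because head and tail are free-running, head == tail unambiguously means empty while a difference of RING_SIZE means full.

```c
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 8u	/* power of two, so 65536 is a multiple of it */

struct ring {
	uint16_t head;	/* free-running producer index */
	uint16_t tail;	/* free-running consumer index */
	int slot[RING_SIZE];
};

/* Entries in flight; the cast handles 16-bit wraparound. */
static unsigned ring_used(const struct ring *r)
{
	return (uint16_t)(r->head - r->tail);
}

static bool ring_empty(const struct ring *r)
{
	return r->head == r->tail;
}

static bool ring_full(const struct ring *r)
{
	return ring_used(r) == RING_SIZE;
}

static bool ring_push(struct ring *r, int v)
{
	if (ring_full(r))
		return false;
	r->slot[r->head % RING_SIZE] = v;	/* the "division by truncation" */
	r->head++;
	return true;
}

static bool ring_pop(struct ring *r, int *v)
{
	if (ring_empty(r))
		return false;
	*v = r->slot[r->tail % RING_SIZE];
	r->tail++;
	return true;
}
```

If the indices were instead kept in the range 0..RING_SIZE-1, head == tail would be ambiguous between empty and full, which is the confusion mentioned above; the usual workarounds are to waste one slot or to keep a separate count.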

It's nice for simple OSes if things pack nicely into pages, but it's not
a killer feature IMHO.

 16kB worth of descriptors is 1024 entries.  With 4kB buffers, that's 
 4MB
 worth of data, or 4 ms at 10GbE line speed.  With 1500 byte buffers 
 it's
 just 1.5 ms.  In any case I think it's sufficient.
   
Right. So I think that without indirect, we waste about 3 entries
per packet for virtio header and transport etc headers.
   
   That does suck.  Are there issues in increasing the ring size?  Or
   making it discontiguous?
 
  Because the qemu implementation is broken.  
 
 I was talking about something else, but this is more important.  Every
 time we make a simplifying assumption, it turns around and bites us, and
 the code becomes twice as complicated as it would have been in the first
 place, and the test matrix explodes.

True, though we seem to be improving.  But this is why I don't want
optional features in the spec; I want us always to exercise all of it.

  We can often put the virtio
  header at the head of the packet.  In practice, the qemu implementation
  insists the header be a single descriptor.
 
  (At least, it used to, perhaps it has now been fixed.  We need a
  VIRTIO_NET_F_I_NOW_CONFORM_TO_THE_DAMN_SPEC_SORRY_I_SUCK bit).
 
 We'll run out of bits in no time.

We had one already: VIRTIO_F_BAD_FEATURE.  We haven't used it in a long
time (if ever), and I just removed it from the latest version of the
spec.

But we can cheat: we can add this as a requirement to The New Ring
Layout.  And document

Re: [PATCH 00/28] kvm tools: Prepare kvmtool for another architecture

2011-12-05 Thread Matt Evans

On 06/12/11 14:35, Matt Evans wrote:

 This patch series rearranges and tidies various parts of kvmtool to pave the
 way for the addition of support for another architecture -- SPAPR PPC64.  A
 second patch series will follow to present the PPC64 support.

I forgot to mention, of course, that these two sets apply on top of 
git://github.com/penberg/linux-kvm.git master as of d5e6b9fa.

Also, I've been testing PPC64 kvmtool using the book3s_hv KVM mode.


Matt