Re: [PATCH] kvm: make vcpu life cycle separated from kvm instance
On Mon, Dec 05, 2011 at 01:39:37PM +0800, Liu ping fan wrote:
On Sun, Dec 4, 2011 at 8:10 PM, Gleb Natapov g...@redhat.com wrote:
On Sun, Dec 04, 2011 at 07:53:37PM +0800, Liu ping fan wrote:
On Sat, Dec 3, 2011 at 2:26 AM, Jan Kiszka jan.kis...@siemens.com wrote:
On 2011-12-02 07:26, Liu Ping Fan wrote:
From: Liu Ping Fan pingf...@linux.vnet.ibm.com

Currently, vcpu can be destructed only when kvm instance destroyed. Change this to vcpu's destruction taken when its refcnt is zero, and then vcpu MUST and CAN be destroyed before kvm's destroy.

I'm lacking the big picture yet (would be good to have in the change log - at least I'm too lazy to read the code): What increments the refcnt, what decrements it again? IOW, how does user space controls the life-cycle of a vcpu after your changes?

In local APIC mode, delivering IPI to target APIC, target's refcnt is incremented, and decremented when finished. At other times, using RCU to

Why is this needed?

Suppose the following scene:

#define kvm_for_each_vcpu(idx, vcpup, kvm) \
	for (idx = 0; \
	     idx < atomic_read(&kvm->online_vcpus) && \
	     (vcpup = kvm_get_vcpu(kvm, idx)) != NULL; \
	     idx++)
	--> Here kvm_vcpu's destruction is called
	vcpup->vcpu_id ... //oops!

And this is exactly how your code looks. i.e. you do not increment reference count in most of the loops, you only increment it twice (in pic_unlock() and kvm_irq_delivery_to_apic()) because you are using vcpu outside of rcu_read_lock() protected section and I do not see why not just extend protected section to include kvm_vcpu_kick(). As far as I can see this function does not sleep.

What should protect vcpu from disappearing in your example above is RCU itself if you are using it right. But since I do not see any calls to rcu_assign_pointer()/rcu_dereference() I doubt you are using it right actually.

--
Gleb.
-- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[net-next RFC PATCH 0/5] Series short description
multiple queue virtio-net: flow steering through host/guest cooperation

Hello all:

This rough series adds guest/host cooperation for flow steering, based on Krish Kumar's multiple queue virtio-net driver patch 3/3 (http://lwn.net/Articles/467283/).

The idea is simple: the backend passes the rxhash to the guest, and the guest tells the backend the hash-to-queue mapping when necessary; the backend can then choose the queue based on the hash value of the packet. The table is just a page shared between userspace and the backend.

Patch 1 enables passing the rxhash through vnet_hdr to the guest.
Patches 2 and 3 implement a very simple flow director for tap and macvtap. The tap part is based on the multiqueue tap patches I posted (http://lwn.net/Articles/459270/).
Patch 4 implements a method for a virtio device to find the irq of a specific virtqueue, in order to do device-specific interrupt optimization.
Patch 5 is the guest driver part, which uses accelerated RFS to program the flow director, with some optimizations on irq affinity and tx queue selection.

This is just a prototype that demonstrates the idea; there are still things that need to be discussed:

- An alternative to the shared page is the ctrl vq; the reason a shared table is preferable is the delay of the ctrl vq itself.
- Optimization of irq affinity and tx queue selection

Comments are welcome, thanks!
---

Jason Wang (5):
      virtio_net: passing rxhash through vnet_hdr
      tuntap: simple flow director support
      macvtap: flow director support
      virtio: introduce a method to get the irq of a specific virtqueue
      virtio-net: flow director support

 drivers/lguest/lguest_device.c |    8 ++
 drivers/net/macvlan.c          |    4 +
 drivers/net/macvtap.c          |   42 -
 drivers/net/tun.c              |  105 --
 drivers/net/virtio_net.c       |  189 +++-
 drivers/s390/kvm/kvm_virtio.c  |    6 +
 drivers/vhost/net.c            |   10 +-
 drivers/vhost/vhost.h          |    5 +
 drivers/virtio/virtio_mmio.c   |    8 ++
 drivers/virtio/virtio_pci.c    |   12 +++
 include/linux/if_macvlan.h     |    1
 include/linux/if_tun.h         |   11 ++
 include/linux/virtio_config.h  |    4 +
 include/linux/virtio_net.h     |   16 +++
 14 files changed, 377 insertions(+), 44 deletions(-)

--
Signature
[net-next RFC PATCH 1/5] virtio_net: passing rxhash through vnet_hdr
This patch enables the ability to pass the rxhash value to guest through vnet_hdr. This is useful for guest when it wants to cooperate with virtual device to steer a flow to dedicated guest cpu. This feature is negotiated through VIRTIO_NET_F_GUEST_RXHASH. Signed-off-by: Jason Wang jasow...@redhat.com --- drivers/net/macvtap.c | 10 ++ drivers/net/tun.c | 44 +--- drivers/net/virtio_net.c | 26 ++ drivers/vhost/net.c| 10 +++--- drivers/vhost/vhost.h |5 +++-- include/linux/if_tun.h |1 + include/linux/virtio_net.h | 10 +- 7 files changed, 73 insertions(+), 33 deletions(-) diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c index 7c88d13..504c745 100644 --- a/drivers/net/macvtap.c +++ b/drivers/net/macvtap.c @@ -760,16 +760,17 @@ static ssize_t macvtap_put_user(struct macvtap_queue *q, int vnet_hdr_len = 0; if (q-flags IFF_VNET_HDR) { - struct virtio_net_hdr vnet_hdr; + struct virtio_net_hdr_rxhash vnet_hdr; vnet_hdr_len = q-vnet_hdr_sz; if ((len -= vnet_hdr_len) 0) return -EINVAL; - ret = macvtap_skb_to_vnet_hdr(skb, vnet_hdr); + ret = macvtap_skb_to_vnet_hdr(skb, vnet_hdr.hdr.hdr); if (ret) return ret; - if (memcpy_toiovecend(iv, (void *)vnet_hdr, 0, sizeof(vnet_hdr))) + vnet_hdr.rxhash = skb-rxhash; + if (memcpy_toiovecend(iv, (void *)vnet_hdr, 0, q-vnet_hdr_sz)) return -EFAULT; } @@ -890,7 +891,8 @@ static long macvtap_ioctl(struct file *file, unsigned int cmd, return ret; case TUNGETFEATURES: - if (put_user(IFF_TAP | IFF_NO_PI | IFF_VNET_HDR, up)) + if (put_user(IFF_TAP | IFF_NO_PI | IFF_VNET_HDR | IFF_RXHASH, +up)) return -EFAULT; return 0; diff --git a/drivers/net/tun.c b/drivers/net/tun.c index afb11d1..7d22b4b 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -869,49 +869,55 @@ static ssize_t tun_put_user(struct tun_file *tfile, } if (tfile-flags TUN_VNET_HDR) { - struct virtio_net_hdr gso = { 0 }; /* no info leak */ - if ((len -= tfile-vnet_hdr_sz) 0) + struct virtio_net_hdr_rxhash hdr; + struct virtio_net_hdr *gso = (struct virtio_net_hdr *)hdr; 
+ + if ((len -= tfile-vnet_hdr_sz) 0 || + tfile-vnet_hdr_sz sizeof(struct virtio_net_hdr_rxhash)) return -EINVAL; + memset(hdr, 0, sizeof(hdr)); if (skb_is_gso(skb)) { struct skb_shared_info *sinfo = skb_shinfo(skb); /* This is a hint as to how much should be linear. */ - gso.hdr_len = skb_headlen(skb); - gso.gso_size = sinfo-gso_size; + gso-hdr_len = skb_headlen(skb); + gso-gso_size = sinfo-gso_size; if (sinfo-gso_type SKB_GSO_TCPV4) - gso.gso_type = VIRTIO_NET_HDR_GSO_TCPV4; + gso-gso_type = VIRTIO_NET_HDR_GSO_TCPV4; else if (sinfo-gso_type SKB_GSO_TCPV6) - gso.gso_type = VIRTIO_NET_HDR_GSO_TCPV6; + gso-gso_type = VIRTIO_NET_HDR_GSO_TCPV6; else if (sinfo-gso_type SKB_GSO_UDP) - gso.gso_type = VIRTIO_NET_HDR_GSO_UDP; + gso-gso_type = VIRTIO_NET_HDR_GSO_UDP; else { pr_err(unexpected GSO type: 0x%x, gso_size %d, hdr_len %d\n, - sinfo-gso_type, gso.gso_size, - gso.hdr_len); + sinfo-gso_type, gso-gso_size, + gso-hdr_len); print_hex_dump(KERN_ERR, tun: , DUMP_PREFIX_NONE, 16, 1, skb-head, - min((int)gso.hdr_len, 64), true); + min((int)gso-hdr_len, 64), + true); WARN_ON_ONCE(1); return -EINVAL; } if (sinfo-gso_type SKB_GSO_TCP_ECN) - gso.gso_type |= VIRTIO_NET_HDR_GSO_ECN; + gso-gso_type |= VIRTIO_NET_HDR_GSO_ECN;
[net-next RFC PATCH 2/5] tuntap: simple flow director support
This patch adds a simple flow director to the tun/tap device. It is just a page that contains the hash-to-queue mapping, which can be changed by user-space. The backend (tap/macvtap) queries this table to get the desired queue of a packet when it sends packets to userspace.

The page address is set through a new kind of ioctl, TUNSETFD, and the page stays pinned until the device exits or another page is specified.

Signed-off-by: Jason Wang jasow...@redhat.com
---
 drivers/net/tun.c      |   63
 include/linux/if_tun.h |   10
 2 files changed, 62 insertions(+), 11 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 7d22b4b..2efaf81 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -64,6 +64,7 @@
 #include <linux/nsproxy.h>
 #include <linux/virtio_net.h>
 #include <linux/rcupdate.h>
+#include <linux/highmem.h>
 #include <net/net_namespace.h>
 #include <net/netns/generic.h>
 #include <net/rtnetlink.h>
@@ -109,6 +110,7 @@ struct tap_filter {
 };
 
 #define MAX_TAP_QUEUES (NR_CPUS < 16 ? NR_CPUS : 16)
+#define TAP_HASH_MASK 0xFF
 
 struct tun_file {
 	struct sock sk;
@@ -128,6 +130,7 @@ struct tun_sock;
 
 struct tun_struct {
 	struct tun_file		*tfiles[MAX_TAP_QUEUES];
+	struct page		*fd_page[1];
 	unsigned int		numqueues;
 	unsigned int		flags;
 	uid_t			owner;
@@ -156,7 +159,7 @@ static struct tun_file *tun_get_queue(struct net_device *dev,
 	struct tun_struct *tun = netdev_priv(dev);
 	struct tun_file *tfile = NULL;
 	int numqueues = tun->numqueues;
-	__u32 rxq;
+	__u32 rxq, rxhash;
 
 	BUG_ON(!rcu_read_lock_held());
 
@@ -168,6 +171,22 @@ static struct tun_file *tun_get_queue(struct net_device *dev,
 		goto out;
 	}
 
+	rxhash = skb_get_rxhash(skb);
+	if (rxhash) {
+		if (tun->fd_page[0]) {
+			u16 *table = kmap_atomic(tun->fd_page[0]);
+			rxq = table[rxhash & TAP_HASH_MASK];
+			kunmap_atomic(table);
+			if (rxq < numqueues) {
+				tfile = rcu_dereference(tun->tfiles[rxq]);
+				goto out;
+			}
+		}
+		rxq = ((u64)rxhash * numqueues) >> 32;
+		tfile = rcu_dereference(tun->tfiles[rxq]);
+		goto out;
+	}
+
 	if (likely(skb_rx_queue_recorded(skb))) {
 		rxq = skb_get_rx_queue(skb);
 
@@ -178,14 +197,6 @@ static struct tun_file *tun_get_queue(struct net_device *dev,
 		goto out;
 	}
 
-	/* Check if we can use flow to select a queue */
-	rxq = skb_get_rxhash(skb);
-	if (rxq) {
-		u32 idx = ((u64)rxq * numqueues) >> 32;
-		tfile = rcu_dereference(tun->tfiles[idx]);
-		goto out;
-	}
-
 	tfile = rcu_dereference(tun->tfiles[0]);
 out:
 	return tfile;
@@ -1020,6 +1031,14 @@ out:
 	return ret;
 }
 
+static void tun_destructor(struct net_device *dev)
+{
+	struct tun_struct *tun = netdev_priv(dev);
+	if (tun->fd_page[0])
+		put_page(tun->fd_page[0]);
+	free_netdev(dev);
+}
+
 static void tun_setup(struct net_device *dev)
 {
 	struct tun_struct *tun = netdev_priv(dev);
@@ -1028,7 +1047,7 @@ static void tun_setup(struct net_device *dev)
 	tun->group = -1;
 
 	dev->ethtool_ops = &tun_ethtool_ops;
-	dev->destructor = free_netdev;
+	dev->destructor = tun_destructor;
 }
 
 /* Trivial set of netlink ops to allow deleting tun or tap
@@ -1230,6 +1249,7 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
 		tun = netdev_priv(dev);
 		tun->dev = dev;
 		tun->flags = flags;
+		tun->fd_page[0] = NULL;
 
 		security_tun_dev_post_create(&tfile->sk);
 
@@ -1353,6 +1373,7 @@ static long __tun_chr_ioctl(struct file *file, unsigned int cmd,
 	struct net_device *dev = NULL;
 	void __user* argp = (void __user*)arg;
 	struct ifreq ifr;
+	struct tun_fd tfd;
 	int ret;
 
 	if (cmd == TUNSETIFF || cmd == TUNATTACHQUEUE || _IOC_TYPE(cmd) == 0x89)
@@ -1364,7 +1385,8 @@ static long __tun_chr_ioctl(struct file *file, unsigned int cmd,
 		 * This is needed because we never checked for invalid flags on
 		 * TUNSETIFF. */
 		return put_user(IFF_TUN | IFF_TAP | IFF_NO_PI | IFF_ONE_QUEUE |
-				IFF_VNET_HDR | IFF_MULTI_QUEUE | IFF_RXHASH,
+				IFF_VNET_HDR | IFF_MULTI_QUEUE | IFF_RXHASH |
+				IFF_FD,
 				(unsigned int __user*)argp);
 	}
 
@@ -1476,6 +1498,25 @@ static long __tun_chr_ioctl(struct file *file, unsigned int cmd,
 		ret = set_offload(tun, arg);
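To make the lookup order in the tun_get_queue() hunk easier to follow, here is a minimal userspace sketch of it. It is only a model under stated assumptions: fd_select_queue() is a hypothetical name, `table` stands in for the pinned page, and the intermediate skb_rx_queue_recorded() step is left out (a zero hash simply falls back to queue 0).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TAP_HASH_MASK 0xFF /* same mask as the patch: 256 table slots */

/*
 * Hypothetical userspace model of the flow director lookup.
 * table: hash-to-queue table (may be NULL when no page was set).
 * Returns the index of the queue the packet would be steered to.
 */
static uint32_t fd_select_queue(const uint16_t *table, uint32_t rxhash,
                                uint32_t numqueues)
{
    assert(numqueues > 0);

    if (rxhash == 0)
        return 0; /* no hash available: simplified fallback to queue 0 */

    if (table != NULL) {
        uint16_t rxq = table[rxhash & TAP_HASH_MASK];

        if (rxq < numqueues)
            return rxq; /* the programmed mapping is usable */
    }

    /* Scale the 32-bit hash into [0, numqueues) without a division. */
    return (uint32_t)(((uint64_t)rxhash * numqueues) >> 32);
}
```

An out-of-range table entry is treated as "not programmed", so the sketch falls through to the hash scaling exactly as the patched code does when rxq is not below numqueues.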
[net-next RFC PATCH 3/5] macvtap: flow director support
Signed-off-by: Jason Wang jasow...@redhat.com
---
 drivers/net/macvlan.c      |    4
 drivers/net/macvtap.c      |   36 ++--
 include/linux/if_macvlan.h |    1 +
 3 files changed, 39 insertions(+), 2 deletions(-)

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index 7413497..b0cb7ce 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -706,6 +706,7 @@ int macvlan_common_newlink(struct net *src_net, struct net_device *dev,
 	vlan->port = port;
 	vlan->receive = receive;
 	vlan->forward = forward;
+	vlan->fd_page[0] = NULL;
 
 	vlan->mode = MACVLAN_MODE_VEPA;
 	if (data && data[IFLA_MACVLAN_MODE])
@@ -749,6 +750,9 @@ void macvlan_dellink(struct net_device *dev, struct list_head *head)
 {
 	struct macvlan_dev *vlan = netdev_priv(dev);
 
+	if (vlan->fd_page[0])
+		put_page(vlan->fd_page[0]);
+
 	list_del(&vlan->list);
 	unregister_netdevice_queue(dev, head);
 }
diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 504c745..a34eb84 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -14,6 +14,7 @@
 #include <linux/wait.h>
 #include <linux/cdev.h>
 #include <linux/fs.h>
+#include <linux/highmem.h>
 
 #include <net/net_namespace.h>
 #include <net/rtnetlink.h>
@@ -62,6 +63,8 @@ static DEFINE_IDR(minor_idr);
 static struct class *macvtap_class;
 static struct cdev macvtap_cdev;
 
+#define TAP_HASH_MASK 0xFF
+
 static const struct proto_ops macvtap_socket_ops;
 
 /*
@@ -189,6 +192,11 @@ static struct macvtap_queue *macvtap_get_queue(struct net_device *dev,
 	/* Check if we can use flow to select a queue */
 	rxq = skb_get_rxhash(skb);
 	if (rxq) {
+		if (vlan->fd_page[0]) {
+			u16 *table = kmap_atomic(vlan->fd_page[0]);
+			rxq = table[rxq & TAP_HASH_MASK];
+			kunmap_atomic(table);
+		}
 		tap = rcu_dereference(vlan->taps[rxq % numvtaps]);
 		if (tap)
 			goto out;
@@ -851,6 +859,7 @@ static long macvtap_ioctl(struct file *file, unsigned int cmd,
 {
 	struct macvtap_queue *q = file->private_data;
 	struct macvlan_dev *vlan;
+	struct tun_fd tfd;
 	void __user *argp = (void __user *)arg;
 	struct ifreq __user *ifr = argp;
 	unsigned int __user *up = argp;
@@ -891,8 +900,8 @@ static long macvtap_ioctl(struct file *file, unsigned int cmd,
 		return ret;
 
 	case TUNGETFEATURES:
-		if (put_user(IFF_TAP | IFF_NO_PI | IFF_VNET_HDR | IFF_RXHASH,
-			     up))
+		if (put_user(IFF_TAP | IFF_NO_PI | IFF_VNET_HDR | IFF_RXHASH |
+			     IFF_FD, up))
 			return -EFAULT;
 		return 0;
 
@@ -918,6 +927,29 @@ static long macvtap_ioctl(struct file *file, unsigned int cmd,
 		q->vnet_hdr_sz = s;
 		return 0;
 
+	case TUNSETFD:
+		rcu_read_lock_bh();
+		vlan = rcu_dereference(q->vlan);
+		if (!vlan)
+			ret = -ENOLINK;
+		else {
+			if (copy_from_user(&tfd, argp, sizeof(tfd)))
+				ret = -EFAULT;
+			if (vlan->fd_page[0]) {
+				put_page(vlan->fd_page[0]);
+				vlan->fd_page[0] = NULL;
+			}
+
+			/* put_page() in macvlan_dellink() */
+			if (get_user_pages_fast(tfd.addr, 1, 0,
+						&vlan->fd_page[0]) != 1)
+				ret = -EFAULT;
+			else
+				ret = 0;
+		}
+		rcu_read_unlock_bh();
+		return ret;
+
 	case TUNSETOFFLOAD:
 		/* let the user check for future flags */
 		if (arg & ~(TUN_F_CSUM | TUN_F_TSO4 | TUN_F_TSO6 |
diff --git a/include/linux/if_macvlan.h b/include/linux/if_macvlan.h
index d103dca..69a87a1 100644
--- a/include/linux/if_macvlan.h
+++ b/include/linux/if_macvlan.h
@@ -65,6 +65,7 @@ struct macvlan_dev {
 	struct macvtap_queue	*taps[MAX_MACVTAP_QUEUES];
 	int			numvtaps;
 	int			minor;
+	struct page		*fd_page[1];
 };
 
 static inline void macvlan_count_rx(const struct macvlan_dev *vlan,
[net-next RFC PATCH 4/5] virtio: introduce a method to get the irq of a specific virtqueue
Device-specific irq configuration may be needed in order to do some optimization, so a new config operation is needed to get the irq of a virtqueue.

Signed-off-by: Jason Wang jasow...@redhat.com
---
 drivers/lguest/lguest_device.c |    8
 drivers/s390/kvm/kvm_virtio.c  |    6 ++
 drivers/virtio/virtio_mmio.c   |    8
 drivers/virtio/virtio_pci.c    |   12
 include/linux/virtio_config.h  |    4
 5 files changed, 38 insertions(+), 0 deletions(-)

diff --git a/drivers/lguest/lguest_device.c b/drivers/lguest/lguest_device.c
index 595d731..6483bff 100644
--- a/drivers/lguest/lguest_device.c
+++ b/drivers/lguest/lguest_device.c
@@ -386,6 +386,13 @@ static const char *lg_bus_name(struct virtio_device *vdev)
 	return "";
 }
 
+static int lg_get_vq_irq(struct virtio_device *vdev, struct virtqueue *vq)
+{
+	struct lguest_vq_info *lvq = vq->priv;
+
+	return lvq->config.irq;
+}
+
 /* The ops structure which hooks everything together. */
 static struct virtio_config_ops lguest_config_ops = {
 	.get_features = lg_get_features,
@@ -398,6 +405,7 @@ static struct virtio_config_ops lguest_config_ops = {
 	.find_vqs = lg_find_vqs,
 	.del_vqs = lg_del_vqs,
 	.bus_name = lg_bus_name,
+	.get_vq_irq = lg_get_vq_irq,
 };
 
 /*
diff --git a/drivers/s390/kvm/kvm_virtio.c b/drivers/s390/kvm/kvm_virtio.c
index 8af868b..a8d5ca1 100644
--- a/drivers/s390/kvm/kvm_virtio.c
+++ b/drivers/s390/kvm/kvm_virtio.c
@@ -268,6 +268,11 @@ static const char *kvm_bus_name(struct virtio_device *vdev)
 	return "";
 }
 
+static int kvm_get_vq_irq(struct virtio_device *vdev, struct virtqueue *vq)
+{
+	return 0x2603;
+}
+
 /*
  * The config ops structure as defined by virtio config
  */
@@ -282,6 +287,7 @@ static struct virtio_config_ops kvm_vq_configspace_ops = {
 	.find_vqs = kvm_find_vqs,
 	.del_vqs = kvm_del_vqs,
 	.bus_name = kvm_bus_name,
+	.get_vq_irq = kvm_get_vq_irq,
 };
 
 /*
diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
index 2f57380..309d471 100644
--- a/drivers/virtio/virtio_mmio.c
+++ b/drivers/virtio/virtio_mmio.c
@@ -368,6 +368,13 @@ static const char *vm_bus_name(struct virtio_device *vdev)
 	return vm_dev->pdev->name;
 }
 
+static int vm_get_vq_irq(struct virtio_device *vdev, struct virtqueue *vq)
+{
+	struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
+
+	return platform_get_irq(vm_dev->pdev, 0);
+}
+
 static struct virtio_config_ops virtio_mmio_config_ops = {
 	.get		= vm_get,
 	.set		= vm_set,
@@ -379,6 +386,7 @@ static struct virtio_config_ops virtio_mmio_config_ops = {
 	.get_features	= vm_get_features,
 	.finalize_features = vm_finalize_features,
 	.bus_name	= vm_bus_name,
+	.get_vq_irq	= vm_get_vq_irq,
 };
 
 
diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
index 229ea56..4f99164 100644
--- a/drivers/virtio/virtio_pci.c
+++ b/drivers/virtio/virtio_pci.c
@@ -583,6 +583,17 @@ static const char *vp_bus_name(struct virtio_device *vdev)
 	return pci_name(vp_dev->pci_dev);
 }
 
+static int vp_get_vq_irq(struct virtio_device *vdev, struct virtqueue *vq)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+	struct virtio_pci_vq_info *info = vq->priv;
+
+	if (vp_dev->intx_enabled)
+		return vp_dev->pci_dev->irq;
+	else
+		return vp_dev->msix_entries[info->msix_vector].vector;
+}
+
 static struct virtio_config_ops virtio_pci_config_ops = {
 	.get		= vp_get,
 	.set		= vp_set,
@@ -594,6 +605,7 @@ static struct virtio_config_ops virtio_pci_config_ops = {
 	.get_features	= vp_get_features,
 	.finalize_features = vp_finalize_features,
 	.bus_name	= vp_bus_name,
+	.get_vq_irq	= vp_get_vq_irq,
 };
 
 static void virtio_pci_release_dev(struct device *_d)
diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
index 63f98d0..7b783a6 100644
--- a/include/linux/virtio_config.h
+++ b/include/linux/virtio_config.h
@@ -104,6 +104,9 @@
  *	vdev: the virtio_device
  *	This returns a pointer to the bus name a la pci_name from which
  *	the caller can then copy.
+ * @get_vq_irq: get the irq numer of the specific virt queue.
+ *	vdev: the virtio_device
+ *	vq: the virtqueue
  */
 typedef void vq_callback_t(struct virtqueue *);
 struct virtio_config_ops {
@@ -122,6 +125,7 @@ struct virtio_config_ops {
 	u32 (*get_features)(struct virtio_device *vdev);
 	void (*finalize_features)(struct virtio_device *vdev);
 	const char *(*bus_name)(struct virtio_device *vdev);
+	int (*get_vq_irq)(struct virtio_device *vdev, struct virtqueue *vq);
 };
 
 /* If driver didn't advertise the feature, it will never appear. */
[net-next RFC PATCH 5/5] virtio-net: flow director support
In order to let the packets of a flow be passed to the desired guest cpu, we can cooperate with the device by programming the flow director, which is just a hash-to-queue table.

This kind of cooperation is done through the accelerated RFS support: a device-specific flow steering method, virtnet_fd(), is used to modify the flow director based on the rfs mapping. The desired queue is calculated through reverse mapping of the irq affinity table. In order to parallelize the ingress path, the irq affinity of the rx queues is also provided by the driver.

In addition to accelerated RFS, we can also use the guest scheduler to balance the TX load and reduce lock contention on the egress path, so smp_processor_id() is used for tx queue selection.

Signed-off-by: Jason Wang jasow...@redhat.com
---
 drivers/net/virtio_net.c   |  165 +++-
 include/linux/virtio_net.h |    6 ++
 2 files changed, 169 insertions(+), 2 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 0d871f8..89bb5e7 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -26,6 +26,10 @@
 #include <linux/scatterlist.h>
 #include <linux/if_vlan.h>
 #include <linux/slab.h>
+#include <linux/highmem.h>
+#include <linux/cpu_rmap.h>
+#include <linux/interrupt.h>
+#include <linux/cpumask.h>
 
 static int napi_weight = 128;
 module_param(napi_weight, int, 0444);
@@ -40,6 +44,7 @@ module_param(gso, bool, 0444);
 
 #define VIRTNET_SEND_COMMAND_SG_MAX    2
 #define VIRTNET_DRIVER_VERSION "1.0.0"
+#define TAP_HASH_MASK 0xFF
 
 struct virtnet_send_stats {
 	struct u64_stats_sync syncp;
@@ -89,6 +94,9 @@ struct receive_queue {
 
 	/* Active rx statistics */
 	struct virtnet_recv_stats __percpu *stats;
+
+	/* FIXME: per vector instead of per queue ?? */
+	cpumask_var_t affinity_mask;
 };
 
 struct virtnet_info {
@@ -110,6 +118,11 @@ struct virtnet_info {
 
 	/* Host will pass rxhash to us. */
 	bool has_rxhash;
+
+	/* A page of flow director */
+	struct page *fd_page;
+
+	cpumask_var_t affinity_mask;
 };
 
 struct skb_vnet_hdr {
@@ -386,6 +399,7 @@ static void receive_buf(struct receive_queue *rq, void *buf, unsigned int len)
 	if (vi->has_rxhash)
 		skb->rxhash = hdr->rhdr.rxhash;
 
+	skb_record_rx_queue(skb, rq->vq->queue_index / 2);
 	netif_receive_skb(skb);
 	return;
 
@@ -722,6 +736,19 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 	return NETDEV_TX_OK;
 }
 
+static int virtnet_set_fd(struct net_device *dev, u32 pfn)
+{
+	struct virtnet_info *vi = netdev_priv(dev);
+	struct virtio_device *vdev = vi->vdev;
+
+	if (virtio_has_feature(vdev, VIRTIO_NET_F_HOST_FD)) {
+		vdev->config->set(vdev,
+				  offsetof(struct virtio_net_config_fd, addr),
+				  &pfn, sizeof(u32));
+	}
+	return 0;
+}
+
 static int virtnet_set_mac_address(struct net_device *dev, void *p)
 {
 	struct virtnet_info *vi = netdev_priv(dev);
@@ -1017,6 +1044,39 @@ static int virtnet_change_mtu(struct net_device *dev, int new_mtu)
 	return 0;
 }
 
+#ifdef CONFIG_RFS_ACCEL
+
+int virtnet_fd(struct net_device *net_dev, const struct sk_buff *skb,
+	       u16 rxq_index, u32 flow_id)
+{
+	struct virtnet_info *vi = netdev_priv(net_dev);
+	u16 *table = NULL;
+
+	if (skb->protocol != htons(ETH_P_IP) || !skb->rxhash)
+		return -EPROTONOSUPPORT;
+
+	table = kmap_atomic(vi->fd_page);
+	table[skb->rxhash & TAP_HASH_MASK] = rxq_index;
+	kunmap_atomic(table);
+
+	return 0;
+}
+#endif
+
+static u16 virtnet_select_queue(struct net_device *dev, struct sk_buff *skb)
+{
+	int txq = skb_rx_queue_recorded(skb) ? skb_get_rx_queue(skb) :
+		  smp_processor_id();
+
+	/* As we make use of the accelerate rfs which let the scheduler to
+	 * balance the load, it make sense to choose the tx queue also based on
+	 * theprocessor id?
+	 */
+	while (unlikely(txq >= dev->real_num_tx_queues))
+		txq -= dev->real_num_tx_queues;
+	return txq;
+}
+
 static const struct net_device_ops virtnet_netdev = {
 	.ndo_open            = virtnet_open,
 	.ndo_stop            = virtnet_close,
@@ -1028,9 +1088,13 @@ static const struct net_device_ops virtnet_netdev = {
 	.ndo_get_stats64     = virtnet_stats,
 	.ndo_vlan_rx_add_vid = virtnet_vlan_rx_add_vid,
 	.ndo_vlan_rx_kill_vid = virtnet_vlan_rx_kill_vid,
+	.ndo_select_queue    = virtnet_select_queue,
 #ifdef CONFIG_NET_POLL_CONTROLLER
 	.ndo_poll_controller = virtnet_netpoll,
 #endif
+#ifdef CONFIG_RFS_ACCEL
+	.ndo_rx_flow_steer   = virtnet_fd,
+#endif
 };
 
 static void virtnet_update_status(struct virtnet_info *vi)
@@ -1272,12 +1336,76 @@ static int virtnet_setup_vqs(struct virtnet_info *vi)
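The tx queue choice in virtnet_select_queue() can be modelled outside the kernel roughly as below. This is a sketch, not the driver code: select_tx_queue() and its parameters are hypothetical names, and a negative rx_queue stands in for "no rx queue recorded on the skb".

```c
#include <assert.h>

/*
 * Hypothetical userspace model of the tx queue selection.
 * rx_queue: recorded rx queue of the packet, or < 0 if none was recorded.
 * cpu:      id of the cpu doing the transmit.
 */
static unsigned int select_tx_queue(int rx_queue, unsigned int cpu,
                                    unsigned int num_tx_queues)
{
    unsigned int txq;

    assert(num_tx_queues > 0);

    /* Prefer the recorded rx queue so a flow keeps matching rx/tx queues. */
    txq = (rx_queue >= 0) ? (unsigned int)rx_queue : cpu;

    /* Fold out-of-range values back by repeated subtraction, as the patch does. */
    while (txq >= num_tx_queues)
        txq -= num_tx_queues;

    return txq;
}
```

With four tx queues, a packet sent from cpu 5 without a recorded rx queue lands on queue 1, while a packet with recorded rx queue 2 stays on queue 2 regardless of the sending cpu.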
winXP Standard PC HAL and qemu-kvm >= 0.15
As it turned out, a Windows XP machine does not work in qemu-kvm >= 0.15 (it loses network and USB entirely) if it is using the Standard PC HAL. In 0.14 it worked fine, but not in 0.15 (I haven't tried any in-between versions yet).

There are several HAL types available in winXP: Uniprocessor PC with MPS (or Multiprocessor), two ACPI types, and Standard PC. All the other HAL types appear to work fine, but not Standard PC.

I haven't debugged further yet, because it was not easy to find out what was causing the regression and how to reproduce it, and also because I don't think it is the right HAL for a qemu-kvm guest anyway.

So, if anybody has some thoughts about this issue, and especially if you know a way to switch the winXP HAL type to some ACPI variant without reinstalling, please speak up.. ;)

Debian bugreport for reference: http://bugs.debian.org/647312

Reproducer: install a winXP guest on kvm with -no-acpi so it chooses the Uniprocessor with MPS HAL. Switch it to Standard PC in device manager and reboot. In 0.15+ it does not work anymore, while in 0.14 it continues to work fine.

Thank you!

/mjt
[PATCH] kvm tools: Split custom rootfs init into two stages
Currently the custom rootfs init is built along with the main KVM tools executable and is copied into custom rootfs directories when they are created with 'kvm setup'.

The problem is that if the init code changes, it has to be manually copied to the custom rootfs directories.

Instead, this patch splits the init process into two parts. The first part simply handles mounts and passes control to stage 2 of the init. Stage 2 sits in the code tree and does all the heavy lifting.

This allows us to make init changes in the code tree and have them automatically picked up by custom rootfs guests without having to copy files over manually.

Signed-off-by: Sasha Levin levinsasha...@gmail.com
---
 tools/kvm/Makefile      |    9 +++--
 tools/kvm/builtin-run.c |   27 +++
 tools/kvm/guest/init.c  |   14 +++---
 3 files changed, 37 insertions(+), 13 deletions(-)

diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile
index bb5f6b0..ece3306 100644
--- a/tools/kvm/Makefile
+++ b/tools/kvm/Makefile
@@ -21,6 +21,7 @@ TAGS	:= ctags
 PROGRAM	:= kvm
 
 GUEST_INIT := guest/init
+GUEST_INIT_S2 := guest/init_stage2
 
 OBJS	+= builtin-balloon.o
 OBJS	+= builtin-debug.o
@@ -179,7 +180,7 @@ WARNINGS += -Wwrite-strings
 
 CFLAGS	+= $(WARNINGS)
 
-all: $(PROGRAM) $(GUEST_INIT)
+all: $(PROGRAM) $(GUEST_INIT) $(GUEST_INIT_S2)
 
 KVMTOOLS-VERSION-FILE:
 	@$(SHELL_PATH) util/KVMTOOLS-VERSION-GEN $(OUTPUT)
@@ -193,6 +194,10 @@ $(GUEST_INIT): guest/init.c
 	$(E) "  LINK    " $@
 	$(Q) $(CC) -static guest/init.c -o $@
 
+$(GUEST_INIT_S2): guest/init_stage2.c
+	$(E) "  LINK    " $@
+	$(Q) $(CC) -static guest/init_stage2.c -o $@
+
 $(DEPS):
 
 %.d: %.c
@@ -269,7 +274,7 @@ clean:
 	$(Q) rm -f bios/bios-rom.h
 	$(Q) rm -f tests/boot/boot_test.iso
 	$(Q) rm -rf tests/boot/rootfs/
-	$(Q) rm -f $(DEPS) $(OBJS) $(PROGRAM) $(GUEST_INIT)
+	$(Q) rm -f $(DEPS) $(OBJS) $(PROGRAM) $(GUEST_INIT) $(GUEST_INIT_S2)
 	$(Q) rm -f cscope.*
 	$(Q) rm -f $(KVM_INCLUDE)/common-cmds.h
 	$(Q) rm -f KVMTOOLS-VERSION-FILE
diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c
index 43cf2c4..7c5ae47 100644
--- a/tools/kvm/builtin-run.c
+++ b/tools/kvm/builtin-run.c
@@ -702,6 +702,31 @@ void kvm_run_help(void)
 	usage_with_options(run_usage, options);
 }
 
+static int kvm_custom_stage2(void)
+{
+	char tmp[PATH_MAX], dst[PATH_MAX], *src;
+	const char *rootfs;
+	int r;
+
+	src = realpath("guest/init_stage2", NULL);
+	if (src == NULL)
+		return -ENOMEM;
+
+	if (image_filename[0] == NULL)
+		rootfs = "default";
+	else
+		rootfs = image_filename[0];
+
+	sprintf(tmp, "%s%s/virt/init_stage2", kvm__get_dir(), rootfs);
+	remove(tmp);
+
+	sprintf(dst, "/host/%s", src);
+	r = symlink(dst, tmp);
+	free(src);
+
+	return r;
+}
+
 int kvm_cmd_run(int argc, const char **argv, const char *prefix)
 {
 	static char real_cmdline[2048], default_name[20];
@@ -867,6 +892,8 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix)
 			strcat(real_cmdline, "init=/virt/init");
 			if (!no_dhcp)
 				strcat(real_cmdline, "ip=dhcp");
+			if (kvm_custom_stage2())
+				die("Failed linking stage 2 of init.");
 		}
 	} else if (!strstr(real_cmdline, "root=")) {
 		strlcat(real_cmdline, "root=/dev/vda rw ", sizeof(real_cmdline));
diff --git a/tools/kvm/guest/init.c b/tools/kvm/guest/init.c
index 8975023..032a261 100644
--- a/tools/kvm/guest/init.c
+++ b/tools/kvm/guest/init.c
@@ -1,6 +1,6 @@
 /*
- * This is a simple init for shared rootfs guests. It brings up critical
- * mountpoints and then launches /bin/sh.
+ * This is a simple init for shared rootfs guests. This part should be limited
+ * to doing mounts and running stage 2 of the init process.
  */
 #include <sys/mount.h>
 #include <string.h>
@@ -30,15 +30,7 @@ int main(int argc, char *argv[])
 
 	do_mounts();
 
-	/* get session leader */
-	setsid();
-
-	/* set controlling terminal */
-	ioctl (0, TIOCSCTTY, 1);
-
-	puts("Starting '/bin/sh'...");
-
-	run_process("/bin/sh");
+	run_process("/virt/init_stage2");
 
 	printf("Init failed: %s\n", strerror(errno));
 
--
1.7.8
Re: [PATCH] kvm tools: Split custom rootfs init into two stages
On Mon, Dec 05, 2011 at 11:22:11AM +0200, Sasha Levin wrote:
+static int kvm_custom_stage2(void)
+{
+	char tmp[PATH_MAX], dst[PATH_MAX], *src;
+	const char *rootfs;
+	int r;
+
+	src = realpath("guest/init_stage2", NULL);
+	if (src == NULL)
+		return -ENOMEM;
+
+	if (image_filename[0] == NULL)
+		rootfs = "default";
+	else
+		rootfs = image_filename[0];
+
+	sprintf(tmp, "%s%s/virt/init_stage2", kvm__get_dir(), rootfs);
+	remove(tmp);
+
+	sprintf(dst, "/host/%s", src);
+	r = symlink(dst, tmp);
+	free(src);
+
+	return r;
+}
+

I might be paranoid -- but could you please use snprintf here? :)
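For reference, a bounded version of the path construction could look roughly like the sketch below. build_stage2_path() is a hypothetical helper, not part of the patch; the format string is the one from the quoted code, and the directory and rootfs name are passed in rather than taken from kvm__get_dir() and image_filename[].

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: build "<dir><rootfs>/virt/init_stage2" into buf.
 * Returns 0 on success, -1 if the result was truncated or formatting failed.
 */
static int build_stage2_path(char *buf, size_t size,
                             const char *dir, const char *rootfs)
{
    int n;

    assert(buf != NULL && size > 0);

    n = snprintf(buf, size, "%s%s/virt/init_stage2", dir, rootfs);

    /* snprintf() returns the length it wanted to write; >= size means truncation. */
    if (n < 0 || (size_t)n >= size)
        return -1;
    return 0;
}
```

Checking the return value matters as much as switching functions: snprintf() alone prevents the overflow, but a silently truncated path would still make the later symlink() point at the wrong place.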
Re: [PATCH] kvm: make vcpu life cycle separated from kvm instance
On 12/05/2011 07:29 AM, Liu ping fan wrote:
like this,

#define kvm_for_each_vcpu(idx, cnt, vcpup, kvm) \
	for (idx = 0, cnt = 0, vcpup = kvm_get_vcpu(kvm, idx); \
	     cnt < atomic_read(&kvm->online_vcpus) && \
	     idx < KVM_MAX_VCPUS; \
	     idx++, (vcpup == NULL)?:cnt++, vcpup = kvm_get_vcpu(kvm, idx)) \
		if (vcpup == NULL) \
			continue; \
		else

A little ugly, but have not thought a better way out :-)

#define kvm_for_each_vcpu(vcpu, it) for (vcpu = kvm_fev_init(it); vcpu; vcpu = kvm_fev_next(it, vcpu))

Though that doesn't give a good place for rcu_read_unlock().

--
error compiling committee.c: too many arguments to function
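The iterator-style macro suggested above can be tried out in userspace with a small model like the one below. All names here (struct vm, struct vcpu_iter, fev_init(), fev_next(), for_each_vcpu()) are hypothetical stand-ins, the signatures differ from the ones in the mail, and no RCU or locking is modelled; the sketch only shows how an iterator hides the NULL holes from the loop body.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_VCPUS 8 /* hypothetical stand-in for KVM_MAX_VCPUS */

struct vcpu { int id; };

struct vm {
    struct vcpu *vcpus[MAX_VCPUS]; /* sparse: removed vcpus leave NULL holes */
};

struct vcpu_iter {
    struct vm *vm;
    size_t idx; /* next slot to examine */
};

/* Advance to the next non-NULL slot, or return NULL when the array is exhausted. */
static struct vcpu *fev_next(struct vcpu_iter *it)
{
    while (it->idx < MAX_VCPUS) {
        struct vcpu *v = it->vm->vcpus[it->idx++];

        if (v != NULL)
            return v;
    }
    return NULL;
}

/* Start an iteration and return the first present vcpu, if any. */
static struct vcpu *fev_init(struct vcpu_iter *it, struct vm *vm)
{
    assert(vm != NULL);
    it->vm = vm;
    it->idx = 0;
    return fev_next(it);
}

#define for_each_vcpu(v, it, vm) \
    for ((v) = fev_init((it), (vm)); (v) != NULL; (v) = fev_next((it)))

/* Example walker: sums the ids of all present vcpus. */
static int sum_vcpu_ids(struct vm *vm)
{
    struct vcpu_iter it;
    struct vcpu *v;
    int sum = 0;

    for_each_vcpu(v, &it, vm)
        sum += v->id;
    return sum;
}
```

The model also makes the objection in the mail visible: the macro has a natural place to take a lock (inside fev_init()), but nothing runs when the loop body exits early with break, so there is no obvious hook for the matching unlock.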
Re: [PATCH] kvm: make vcpu life cycle separated from kvm instance
On Mon, Dec 05, 2011 at 11:30:51AM +0200, Avi Kivity wrote:
On 12/05/2011 07:29 AM, Liu ping fan wrote:
like this,

#define kvm_for_each_vcpu(idx, cnt, vcpup, kvm) \
	for (idx = 0, cnt = 0, vcpup = kvm_get_vcpu(kvm, idx); \
	     cnt < atomic_read(&kvm->online_vcpus) && \
	     idx < KVM_MAX_VCPUS; \
	     idx++, (vcpup == NULL)?:cnt++, vcpup = kvm_get_vcpu(kvm, idx)) \
		if (vcpup == NULL) \
			continue; \
		else

A little ugly, but have not thought a better way out :-)

#define kvm_for_each_vcpu(vcpu, it) for (vcpu = kvm_fev_init(it); vcpu; vcpu = kvm_fev_next(it, vcpu))

Though that doesn't give a good place for rcu_read_unlock().

Why not use rculist to store vcpus and use list_for_each_entry_rcu()?

--
Gleb.
Re: [PATCH] virtio-ring: Use threshold for switching to indirect descriptors
On 12/05/2011 02:10 AM, Rusty Russell wrote:
On Sun, 04 Dec 2011 17:16:59 +0200, Avi Kivity a...@redhat.com wrote:
On 12/04/2011 05:11 PM, Michael S. Tsirkin wrote:

There's also the used ring, but that's a mistake if you have out of order completion. We should have used copying.

Seems unrelated... unless you want used to be written into descriptor ring itself?

The avail/used rings are in addition to the regular ring, no? If you copy descriptors, then it goes away.

There were two ideas which drove the current design:

1) The Van-Jacobson style "no two writers to same cacheline makes rings fast" idea. Empirically, this doesn't show any winnage.

Write/write is the same as write/read or read/write. Both cases have to send a probe and wait for the result. What we really need is to minimize cache line ping ponging, and the descriptor pool fails that with ooo completion. I doubt it's measurable though except with the very fastest storage providers.

2) Allowing a generic inter-guest copy mechanism, so we could have genuinely untrusted driver domains. Yet no one ever did this so it's hardly a killer feature :(

It's still a goal, though not an important one. But we have to translate rings anyway, don't we, since buffers are in guest physical addresses, and we're moving into an address space that doesn't map those. I thought of having a vhost-copy driver that could do ring translation, using a dma engine for the copy.

So if we're going to revisit and drop those requirements, I'd say:

1) Shared device/driver rings like Xen. Xen uses device-specific ring contents, I'd be tempted to stick to our pre-headers, and a 'u64 addr; u64 len_and_flags; u64 cookie;' generic style. Then use the same ring for responses. That's a slight space-win, since we're 24 bytes vs 26 bytes now.

Let's cheat and have inline contents. Take three bits from len_and_flags to specify additional descriptors as inline data. Also, stuff the cookie into len_and_flags as well.

2) Stick with physically-contiguous rings, but use them of size (2^n)-1. Makes the indexing harder, but that -1 lets us stash the indices in the first entry and makes the ring a nice 2^n size.

Allocate at least a cache line for those. The 2^n size is not really material, a division is never necessary.

16kB worth of descriptors is 1024 entries. With 4kB buffers, that's 4MB worth of data, or 4 ms at 10GbE line speed. With 1500 byte buffers it's just 1.5 ms. In any case I think it's sufficient.

Right. So I think that without indirect, we waste about 3 entries per packet for virtio header and transport etc headers.

That does suck. Are there issues in increasing the ring size? Or making it discontiguous?

Because the qemu implementation is broken.

I was talking about something else, but this is more important. Every time we make a simplifying assumption, it turns around and bites us, and the code becomes twice as complicated as it would have been in the first place, and the test matrix explodes.

We can often put the virtio header at the head of the packet. In practice, the qemu implementation insists the header be a single descriptor. (At least, it used to, perhaps it has now been fixed. We need a VIRTIO_NET_F_I_NOW_CONFORM_TO_THE_DAMN_SPEC_SORRY_I_SUCK bit).

We'll run out of bits in no time.

We currently use small rings: the guest can't negotiate so qemu has to offer a lowest-common-denominator value. The new virtio-pci layout fixes this, and lets the guest set the ring size.

Ok good. Note that figuring out the best ring size needs some info from the host, but that can be had from other channels.

Can you take a peek at how Xen manages its rings? They have the same problems we do.

Yes, I made some mistakes, but I did steal from them in the first place...

There was a bit of second system syndrome there. And I don't understand how the ring/pool issue didn't surface during review, it seems so obvious now but completely eluded me then.

--
error compiling committee.c: too many arguments to function
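The ring-size estimate quoted above (16kB of descriptors, 4kB or 1500-byte buffers, 10GbE) can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the function names are made up for the example, the 16-byte descriptor size is the value implied by "16kB is 1024 entries", and framing overhead is ignored. The exact results are about 3.4 ms and 1.2 ms, the same order as the rounded 4 ms and 1.5 ms in the mail.

```c
#include <assert.h>
#include <stdint.h>

/* Number of descriptors that fit in ring_bytes at desc_bytes each. */
uint64_t ring_entries(uint64_t ring_bytes, uint64_t desc_bytes)
{
    assert(desc_bytes > 0);
    return ring_bytes / desc_bytes;
}

/*
 * Microseconds needed to push `entries` buffers of buf_bytes each
 * through a link running at link_bps bits per second (rounded down).
 */
uint64_t drain_time_us(uint64_t entries, uint64_t buf_bytes,
                       uint64_t link_bps)
{
    uint64_t bits;

    assert(link_bps > 0);
    bits = entries * buf_bytes * 8;
    return bits * 1000000ULL / link_bps;
}
```

For example, ring_entries(16 * 1024, 16) gives the 1024 entries from the mail, and drain_time_us(1024, 4096, 10000000000ULL) gives the time a full ring of 4kB buffers lasts at line rate.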
Re: [PATCH] kvm: make vcpu life cycle separated from kvm instance
On 12/05/2011 11:42 AM, Gleb Natapov wrote:
On Mon, Dec 05, 2011 at 11:30:51AM +0200, Avi Kivity wrote:
On 12/05/2011 07:29 AM, Liu ping fan wrote:

like this,

#define kvm_for_each_vcpu(idx, cnt, vcpup, kvm) \
	for (idx = 0, cnt = 0, vcpup = kvm_get_vcpu(kvm, idx); \
	     cnt < atomic_read(&kvm->online_vcpus) && \
	     idx < KVM_MAX_VCPUS; \
	     idx++, (vcpup == NULL)?:cnt++, vcpup = kvm_get_vcpu(kvm, idx)) \
		if (vcpup == NULL) \
			continue; \
		else

A little ugly, but have not thought a better way out :-)

#define kvm_for_each_vcpu(vcpu, it) for (vcpu = kvm_fev_init(it); vcpu; vcpu = kvm_fev_next(it, vcpu))

Though that doesn't give a good place for rcu_read_unlock().

Why not use rculist to store vcpus and use list_for_each_entry_rcu()?

We can, but that's a bigger change.

--
error compiling committee.c: too many arguments to function
Re: [RFC][PATCH 14/16] kvm: x86: Add user space part for in-kernel i8259
On 12/04/2011 11:38 PM, Jan Kiszka wrote: It should be also possible to migrate from non-KVM device to KVM version, different names would prevent that for ever. It is (theoretically) possible with these patches as the vmstate names are the same. KVM to TCG migration does not work right now, so I was only able to test in-kernel - user space irqchip model migrations. btw, for the next-gen migration protocol, we'd probably be using QOM paths, not vmstate names; the QOM paths would include the device name? -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] kvm: make vcpu life cycle separated from kvm instance
On Mon, Dec 05, 2011 at 11:58:56AM +0200, Avi Kivity wrote:
On 12/05/2011 11:42 AM, Gleb Natapov wrote:
On Mon, Dec 05, 2011 at 11:30:51AM +0200, Avi Kivity wrote:
On 12/05/2011 07:29 AM, Liu ping fan wrote:

like this,

#define kvm_for_each_vcpu(idx, cnt, vcpup, kvm) \
	for (idx = 0, cnt = 0, vcpup = kvm_get_vcpu(kvm, idx); \
	     cnt < atomic_read(&kvm->online_vcpus) && \
	     idx < KVM_MAX_VCPUS; \
	     idx++, (vcpup == NULL)?:cnt++, vcpup = kvm_get_vcpu(kvm, idx)) \
		if (vcpup == NULL) \
			continue; \
		else

A little ugly, but have not thought a better way out :-)

#define kvm_for_each_vcpu(vcpu, it) for (vcpu = kvm_fev_init(it); vcpu; vcpu = kvm_fev_next(it, vcpu))

Though that doesn't give a good place for rcu_read_unlock().

Why not use rculist to store vcpus and use list_for_each_entry_rcu()?

We can, but that's a bigger change.

Is it? I do not see a lot of accesses to vcpu array except those loops.

--
Gleb.
Re: [PATCH] kvm: make vcpu life cycle separated from kvm instance
On 12/05/2011 12:18 PM, Gleb Natapov wrote:

We can, but that's a bigger change.

Is it? I do not see a lot of accesses to vcpu array except those loops.

Well actually some of those loops have to go away and be replaced by a hash lookup with apic id as key.

--
error compiling committee.c: too many arguments to function
Re: [net-next RFC PATCH 2/5] tuntap: simple flow director support
On Mon, Dec 5, 2011 at 8:58 AM, Jason Wang jasow...@redhat.com wrote:

This patch adds a simple flow director to the tun/tap device. It is just a page that contains the hash-to-queue mapping, which can be changed by user space. The backend (tap/macvtap) queries this table to get the desired queue of a packet when it sends packets to userspace. The page address is set through a new kind of ioctl - TUNSETFD - and is pinned until device exit or until another new page is specified.

Please use flow or fdir instead of fd in the ioctl and code. fd reminds of file descriptor. The ixgbe driver uses fdir.

Stefan
Re: [PATCH] kvm: make vcpu life cycle separated from kvm instance
On Mon, Dec 05, 2011 at 12:22:53PM +0200, Avi Kivity wrote:
On 12/05/2011 12:18 PM, Gleb Natapov wrote:

We can, but that's a bigger change.

Is it? I do not see a lot of accesses to vcpu array except those loops.

Well actually some of those loops have to go away and be replaced by a hash lookup with apic id as key.

Yes, but apic ids are guest controllable, so there should be a separate hash that will hold the vcpu to guest-configured apic id mapping. Shouldn't prevent us from moving to rculist now.

--
Gleb.
Re: [Qemu-devel] [RFC][PATCH 02/16] kvm: Move kvmclock into hw/kvm folder
On 03.12.2011 23:33, Jan Kiszka wrote:
On 2011-12-03 20:00, Andreas Färber wrote:
On 03.12.2011 12:17, Jan Kiszka wrote:

diff --git a/hw/kvmclock.c b/hw/kvm/clock.c
similarity index 96%
rename from hw/kvmclock.c
rename to hw/kvm/clock.c
index 5388bc4..aa37c5d 100644
--- a/hw/kvmclock.c
+++ b/hw/kvm/clock.c
@@ -11,11 +11,11 @@
  *
  */

-#include "qemu-common.h"
-#include "sysemu.h"
-#include "sysbus.h"
-#include "kvm.h"
-#include "kvmclock.h"
+#include <qemu-common.h>
+#include <sysemu.h>
+#include <kvm.h>
+#include <hw/sysbus.h>
+#include <hw/kvm/clock.h>

 #include <linux/kvm.h>
 #include <linux/kvm_para.h>

Please don't start using system includes for everything. Rather extend QEMU_CFLAGS to contain the right user include path(s).

No problem - and no need to tweak any CFLAGS

Right, I had recursion into kvm/ in mind - would've required -I ../.. to be added to CFLAGS.

("" only adds . to the header search paths).

By default that is. -iquote can add further paths. (Unfortunately didn't solve the Cocoa Block.h vs. block.h problem since Objective-C frameworks use quotes, too.)

Do we have a convention that every include in <> is considered system header? Should probably be documented then (and code should be converted gradually).

The convention I perceived was that everything QEMU was in quotes whereas POSIX, Linux, zlib, glib, etc. were in angle brackets. Didn't check for documentation.

Andreas

--
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
Re: [net-next RFC PATCH 5/5] virtio-net: flow director support
On Mon, Dec 5, 2011 at 8:59 AM, Jason Wang jasow...@redhat.com wrote:

+static int virtnet_set_fd(struct net_device *dev, u32 pfn)
+{
+	struct virtnet_info *vi = netdev_priv(dev);
+	struct virtio_device *vdev = vi->vdev;
+
+	if (virtio_has_feature(vdev, VIRTIO_NET_F_HOST_FD)) {
+		vdev->config->set(vdev,
+				  offsetof(struct virtio_net_config_fd, addr),
+				  &pfn, sizeof(u32));

Please use the virtio model (i.e. virtqueues) instead of shared memory. Mapping a page breaks the virtio abstraction.

Stefan
Re: [RFC][PATCH 14/16] kvm: x86: Add user space part for in-kernel i8259
On 2011-12-05 11:01, Avi Kivity wrote:
On 12/04/2011 11:38 PM, Jan Kiszka wrote:

It should be also possible to migrate from non-KVM device to KVM version, different names would prevent that for ever.

It is (theoretically) possible with these patches as the vmstate names are the same. KVM to TCG migration does not work right now, so I was only able to test in-kernel <-> user space irqchip model migrations.

btw, for the next-gen migration protocol, we'd probably be using QOM paths, not vmstate names; the QOM paths would include the device name?

That would be a very bad idea IMHO. Every refactoring of your device tree, e.g. to model CPU hotplug and the ICC bus more accurately, would risk creating a migration crack.

At least we would need some stable naming and/or alias concept then.

Jan
[kvm-autotest] tests.cgroup: Add 2 new tests of cpuset.cpus cgroup functionality
Hi,

This patchset fixes some issues in the cgroup_common.py library and adds 2 new tests to the cgroup-kvm test. Please find the details in each patch.

Sent to upstream as pull req. 103: https://github.com/autotest/autotest/pull/103

Regards,
Lukáš
[PATCH 2/3] [kvm-autotest] tests.cgroup: Add TestCpusetCpus test
Tests the cpuset.cpus cgroup feature. It stresses all VM's CPUs and changes the CPU affinity. Verifies correct behaviour. * Add TestCpusetCpus test * import cleanup * private function names cleanup --- client/tests/cgroup/cgroup_common.py |2 + client/tests/kvm/tests/cgroup.py | 211 +++--- 2 files changed, 194 insertions(+), 19 deletions(-) diff --git a/client/tests/cgroup/cgroup_common.py b/client/tests/cgroup/cgroup_common.py index 186bf09..fe1601b 100755 --- a/client/tests/cgroup/cgroup_common.py +++ b/client/tests/cgroup/cgroup_common.py @@ -152,6 +152,8 @@ class Cgroup(object): if pwd == None: pwd = self.root +if isinstance(pwd, int): +pwd = self.cgroups[pwd] try: # Remove tailing '\n' from each line ret = [_[:-1] for _ in open(pwd+prop, 'r').readlines()] diff --git a/client/tests/kvm/tests/cgroup.py b/client/tests/kvm/tests/cgroup.py index ee6ef2e..23ae622 100644 --- a/client/tests/kvm/tests/cgroup.py +++ b/client/tests/kvm/tests/cgroup.py @@ -7,8 +7,8 @@ import logging, os, re, sys, tempfile, time from random import random from autotest_lib.client.common_lib import error from autotest_lib.client.bin import utils -from autotest_lib.client.tests.cgroup.cgroup_common import Cgroup, CgroupModules -from autotest_lib.client.virt import virt_utils, virt_env_process +from autotest_lib.client.tests.cgroup.cgroup_common import (Cgroup, +CgroupModules, get_load_per_cpu) from autotest_lib.client.virt.aexpect import ExpectTimeoutError from autotest_lib.client.virt.aexpect import ExpectProcessTerminatedError @@ -839,7 +839,7 @@ def run_cgroup(test, params, env): * Freezes the guest and thaws it again couple of times * verifies that guest is frozen and runs when expected -def get_stat(pid): +def _get_stat(pid): Gather statistics of pid+1st level subprocesses cpu usage @param pid: PID of the desired process @@ -877,9 +877,9 @@ def run_cgroup(test, params, env): _ = cgroup.get_property('freezer.state', cgroup.cgroups[0]) if 'FROZEN' not in _: raise error.TestFail(Couldn't 
freeze the VM: state %s % _) -stat_ = get_stat(pid) +stat_ = _get_stat(pid) time.sleep(tsttime) -stat = get_stat(pid) +stat = _get_stat(pid) if stat != stat_: raise error.TestFail('Process was running in FROZEN state; ' 'stat=%s, stat_=%s, diff=%s' % @@ -887,9 +887,9 @@ def run_cgroup(test, params, env): logging.info(THAWING (%ss), tsttime) self.cgroup.set_property('freezer.state', 'THAWED', self.cgroup.cgroups[0]) -stat_ = get_stat(pid) +stat_ = _get_stat(pid) time.sleep(tsttime) -stat = get_stat(pid) +stat = _get_stat(pid) if (stat - stat_) (90*tsttime): raise error.TestFail('Process was not active in FROZEN' 'state; stat=%s, stat_=%s, diff=%s' % @@ -1186,7 +1186,7 @@ def run_cgroup(test, params, env): Let each of 3 scenerios (described in test specification) stabilize and then measure the CPU utilisation for time_test time. -def get_stat(f_stats, _stats=None): +def _get_stat(f_stats, _stats=None): Reads CPU times from f_stats[] files and sumarize them. if _stats is None: _stats = [] @@ -1218,27 +1218,27 @@ def run_cgroup(test, params, env): for thread_count in range(0, host_cpus): sessions[thread_count].sendline(cmd) time.sleep(time_init) -_stats = get_stat(f_stats) +_stats = _get_stat(f_stats) time.sleep(time_test) -stats.append(get_stat(f_stats, _stats)) +stats.append(_get_stat(f_stats, _stats)) thread_count += 1 sessions[thread_count].sendline(cmd) if host_cpus % no_speeds == 0 and no_speeds = host_cpus: time.sleep(time_init) -_stats = get_stat(f_stats) +_stats = _get_stat(f_stats) time.sleep(time_test) -stats.append(get_stat(f_stats, _stats)) +stats.append(_get_stat(f_stats, _stats)) for i in range(thread_count+1, no_threads): sessions[i].sendline(cmd) time.sleep(time_init) -_stats = get_stat(f_stats) +_stats = _get_stat(f_stats) for j in range(3): -
[PATCH 3/3] [kvm-autotest] tests.cgroup: Add TestCpusetCpusSwitching
Tests the cpuset.cpus cgroup feature. It stresses all VM's CPUs while switching between cgroups with different setting. Signed-off-by: Lukas Doktor ldok...@redhat.com --- client/tests/cgroup/cgroup_common.py |4 + client/tests/kvm/tests/cgroup.py | 108 +- 2 files changed, 109 insertions(+), 3 deletions(-) diff --git a/client/tests/cgroup/cgroup_common.py b/client/tests/cgroup/cgroup_common.py index fe1601b..56856c0 100755 --- a/client/tests/cgroup/cgroup_common.py +++ b/client/tests/cgroup/cgroup_common.py @@ -105,6 +105,8 @@ class Cgroup(object): @param pwd: cgroup directory @return: 0 when is 'pwd' member +if isinstance(pwd, int): +pwd = self.cgroups[pwd] if open(pwd + '/tasks').readlines().count(%d\n % pid) 0: return 0 else: @@ -126,6 +128,8 @@ class Cgroup(object): @param pid: pid of the process @param pwd: cgroup directory +if isinstance(pwd, int): +pwd = self.cgroups[pwd] try: open(pwd+'/tasks', 'w').write(str(pid)) except Exception, inst: diff --git a/client/tests/kvm/tests/cgroup.py b/client/tests/kvm/tests/cgroup.py index 23ae622..2e18ef7 100644 --- a/client/tests/kvm/tests/cgroup.py +++ b/client/tests/kvm/tests/cgroup.py @@ -51,13 +51,12 @@ def run_cgroup(test, params, env): @param cgroup: cgroup handler @param pwd: desired cgroup's pwd, cgroup index or None for root cgroup -if isinstance(pwd, int): -pwd = cgroup.cgroups[pwd] cgroup.set_cgroup(vm.get_shell_pid(), pwd) for pid in utils.get_children_pids(vm.get_shell_pid()): cgroup.set_cgroup(int(pid), pwd) + def distance(actual, reference): Absolute value of relative distance of two numbers @@ -1341,7 +1340,7 @@ def run_cgroup(test, params, env): except Exception, failure_detail: err += \nCan't remove Cgroup: %s % failure_detail -self.sessions[0].sendline('rm -f /tmp/cgroup-cpu-lock') +self.sessions[-1].sendline('rm -f /tmp/cgroup-cpu-lock') for i in range(len(self.sessions)): try: self.sessions[i].close() @@ -1381,6 +1380,7 @@ def run_cgroup(test, params, env): 
self.sessions.append(self.vm.wait_for_login(timeout=30)) self.sessions[i].cmd(touch /tmp/cgroup-cpu-lock) self.sessions[i].sendline(cmd) +self.sessions.append(self.vm.wait_for_login(timeout=30)) # cleanup def run(self): @@ -1485,8 +1485,109 @@ def run_cgroup(test, params, env): logging.error(err) raise error.TestFail(err) +logging.info(Test passed successfully) return (All clear) + +class TestCpusetCpusSwitching: + +Tests the cpuset.cpus cgroup feature. It stresses all VM's CPUs +while switching between cgroups with different setting. + +def __init__(self, vms, modules): + +Initialization +@param vms: list of vms +@param modules: initialized cgroup module class + +self.vm = vms[0] # Virt machines +self.modules = modules # cgroup module handler +self.cgroup = Cgroup('cpuset', '') # cgroup handler +self.sessions = [] + + +def cleanup(self): + Cleanup +err = +try: +del(self.cgroup) +except Exception, failure_detail: +err += \nCan't remove Cgroup: %s % failure_detail + +self.sessions[-1].sendline('rm -f /tmp/cgroup-cpu-lock') +for i in range(len(self.sessions)): +try: +self.sessions[i].close() +except Exception, failure_detail: +err += (\nCan't close the %dst ssh connection % i) + +if err: +logging.error(Some cleanup operations failed: %s, err) +raise error.TestError(Some cleanup operations failed: %s % + err) + + +def init(self): + +Prepares cgroup, moves VM into it and execute stressers. + +self.cgroup.initialize(self.modules) +vm_cpus = int(params.get('smp', 1)) +all_cpus = self.cgroup.get_property(cpuset.cpus)[0] +if all_cpus == 0: +raise error.TestFail(This test needs at least 2 CPUs on + host, cpuset=%s % all_cpus) +try: +last_cpu = int(all_cpus.split('-')[1]) +except Exception: +raise error.TestFail(Failed to get #CPU from root cgroup.) + +# Comments are for vm_cpus=2, no_cpus=4, _SC_CLK_TCK=100 +
[PATCH 1/3] [autotest] client.tests.cgroup: Replace LoadPerCpu() by get_load_per_cpu
* Move LoadPerCpu into cgroup_common.py (cgroup-kvm will need it too) * [FIX] Use etraceback * Code cleanup --- client/tests/cgroup/cgroup.py| 79 ++ client/tests/cgroup/cgroup_common.py | 22 + 2 files changed, 35 insertions(+), 66 deletions(-) diff --git a/client/tests/cgroup/cgroup.py b/client/tests/cgroup/cgroup.py index 207a0d7..000e562 100755 --- a/client/tests/cgroup/cgroup.py +++ b/client/tests/cgroup/cgroup.py @@ -12,9 +12,7 @@ from tempfile import NamedTemporaryFile from autotest_lib.client.bin import test, utils from autotest_lib.client.common_lib import error -from cgroup_common import Cgroup as CG -from cgroup_common import CgroupModules -from cgroup_common import _traceback +from cgroup_common import Cgroup, CgroupModules, get_load_per_cpu class cgroup(test.test): @@ -48,7 +46,7 @@ class cgroup(test.test): logging.info(--- 'test_%s' FAILED ---, subtest) except Exception: err += %s, % subtest -tb = _traceback(test_%s % subtest, sys.exc_info()) +tb = utils.etraceback(test_%s % subtest, sys.exc_info()) logging.error(test_%s: FAILED%s, subtest, tb) logging.info(--- 'test_%s' FAILED ---, subtest) @@ -75,7 +73,6 @@ class cgroup(test.test): def cleanup(self): Cleanup logging.debug('cgroup_test cleanup') -print Cleanup del (self.modules) @@ -102,7 +99,7 @@ class cgroup(test.test): raise error.TestFail(Some parts of cleanup failed%s % err) # Preparation -item = CG('memory', self._client) +item = Cgroup('memory', self._client) item.initialize(self.modules) item.smoke_test() pwd = item.mk_cgroup() @@ -116,8 +113,8 @@ class cgroup(test.test): mem = min(int(mem.split()[1])/1024, 1024) mem = max(mem, 100) # at least 100M try: -memsw_limit_bytes = item.get_property(memory.memsw.limit_in_bytes) -except error.TestFail: +item.get_property(memory.memsw.limit_in_bytes) +except error.TestError: # Doesn't support memsw limitation - disabling logging.info(System does not support 'memsw') utils.system(swapoff -a) @@ -222,7 +219,8 @@ class cgroup(test.test): 
logging.debug(test_memory: Memfill mem + swap limit) ps = item.test(memfill %d %s % (mem, outf.name)) item.set_cgroup(ps.pid, pwd) -item.set_property_h(memory.memsw.limit_in_bytes, %dM%(mem/2), pwd) +item.set_property_h(memory.memsw.limit_in_bytes, %dM%(mem/2), +pwd) ps.stdin.write('\n') i = 0 while ps.poll() == None: @@ -266,56 +264,6 @@ class cgroup(test.test): Cpuset test 1) Initiate CPU load on CPU0, than spread into CPU* - CPU0 -class LoadPerCpu: - -Handles the LoadPerCpu stats -self.values [cpus, cpu0, cpu1, ...] - -def __init__(self): - -Init - -self.values = [] -self.stat = open('/proc/stat', 'r') -line = self.stat.readline() -while line: -if line.startswith('cpu'): -self.values.append(int(line.split()[1])) -else: -break -line = self.stat.readline() - -def reload(self): - -Reload current values - -self.values = self.get() - -def get(self): - -Get the current values -@return vals: array of current values [cpus, cpu0, cpu1..] - -self.stat.seek(0) -self.stat.flush() -vals = [] -for _ in range(len(self.values)): -vals.append(int(self.stat.readline().split()[1])) -return vals - -def tick(self): - -Reload values and returns the load between the last tick/reload -@return vals: array of load between ticks/reloads - values [cpus, cpu0, cpu1..] - -vals = self.get() -ret = [] -for i in range(len(self.values)): -ret.append(vals[i] - self.values[i]) -self.values = vals -return ret - def cleanup(supress=False): cleanup logging.debug(test_cpuset: Cleanup) @@ -341,7 +289,7 @@ class cgroup(test.test): raise error.TestFail(Some parts of cleanup failed%s % err) # Preparation -item = CG('cpuset',
Re: [RFC][PATCH 14/16] kvm: x86: Add user space part for in-kernel i8259
On 12/05/2011 01:37 PM, Jan Kiszka wrote:
On 2011-12-05 11:01, Avi Kivity wrote:
On 12/04/2011 11:38 PM, Jan Kiszka wrote:

It should be also possible to migrate from non-KVM device to KVM version, different names would prevent that for ever.

It is (theoretically) possible with these patches as the vmstate names are the same. KVM to TCG migration does not work right now, so I was only able to test in-kernel <-> user space irqchip model migrations.

btw, for the next-gen migration protocol, we'd probably be using QOM paths, not vmstate names; the QOM paths would include the device name?

That would be a very bad idea IMHO. Every refactoring of your device tree, e.g. to model CPU hotplug and the ICC bus more accurately, would risk creating a migration crack.

At some point, something has to be stable. We can't have an infinite number of layers giving names to things. I propose we have just one layer.

At least we would need some stable naming and/or alias concept then.

We should be able to transform a path to backward compatible names, yes. But if something has an unstable name, let's omit it in the first place.

(the memory API added unstable names, hopefully the QOM can take over the stable ones and we'll have a good way to denote the unstable ones).

--
error compiling committee.c: too many arguments to function
Re: [RFC][PATCH 14/16] kvm: x86: Add user space part for in-kernel i8259
On 2011-12-05 13:36, Avi Kivity wrote:
On 12/05/2011 01:37 PM, Jan Kiszka wrote:
On 2011-12-05 11:01, Avi Kivity wrote:
On 12/04/2011 11:38 PM, Jan Kiszka wrote:

It should be also possible to migrate from non-KVM device to KVM version, different names would prevent that for ever.

It is (theoretically) possible with these patches as the vmstate names are the same. KVM to TCG migration does not work right now, so I was only able to test in-kernel <-> user space irqchip model migrations.

btw, for the next-gen migration protocol, we'd probably be using QOM paths, not vmstate names; the QOM paths would include the device name?

That would be a very bad idea IMHO. Every refactoring of your device tree, e.g. to model CPU hotplug and the ICC bus more accurately, would risk creating a migration crack.

At some point, something has to be stable. We can't have an infinite number of layers giving names to things. I propose we have just one layer.

At least we would need some stable naming and/or alias concept then.

We should be able to transform a path to backward compatible names, yes. But if something has an unstable name, let's omit it in the first place.

(the memory API added unstable names, hopefully the QOM can take over the stable ones and we'll have a good way to denote the unstable ones).

OK, maybe - or likely - we should make those device models have the same names in QOM once instantiated. But I'm still convinced they should remain separated models in contrast to a single model with a property.

The kvm ioapic, e.g., requires an additional property (gsi_base) that is meaningless for user space devices. And its interrupts have to be wired/configured differently at board model level. So, from the QEMU POV, it is a very different device. Just the guest does not notice.

Jan

--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
Re: [PATCH V3] Guest stop notification
On Sat, 03 Dec 2011, Jan Kiszka wrote:
On 2011-12-02 22:27, Eric B Munson wrote:
On Fri, 02 Dec 2011, Jan Kiszka wrote:
On 2011-12-02 20:19, Eric B Munson wrote:

Often when a guest is stopped from the qemu console, it will report spurious soft lockup warnings on resume. There are kernel patches being discussed that will give the host the ability to tell the guest that it is being stopped and should ignore the soft lockup warning that generates.

Signed-off-by: Eric B Munson emun...@mgebm.net
Cc: Avi Kivity a...@redhat.com
Cc: Marcelo Tosatti mtosa...@redhat.com
Cc: Jan Kiszka jan.kis...@siemens.com
Cc: ry...@linux.vnet.ibm.com
Cc: aligu...@us.ibm.com
Cc: kvm@vger.kernel.org
---
Changes from V2:
 Move ioctl into hw/kvmclock.c so as other arches can use it as it is implemented
Changes from V1:
 Remove unnecessary encapsulating function

 hw/kvmclock.c | 24 ++++++++++++++++++++++++
 1 files changed, 24 insertions(+), 0 deletions(-)

diff --git a/hw/kvmclock.c b/hw/kvmclock.c
index 5388bc4..756839f 100644
--- a/hw/kvmclock.c
+++ b/hw/kvmclock.c
@@ -16,6 +16,7 @@
 #include "sysbus.h"
 #include "kvm.h"
 #include "kvmclock.h"
+#include "cpu-all.h"

 #include <linux/kvm.h>
 #include <linux/kvm_para.h>
@@ -69,11 +70,34 @@ static void kvmclock_vm_state_change(void *opaque, int running,
     }
 }

+static void kvmclock_vm_state_change_vcpu(void *opaque, int running,
+                                          RunState state)
+{
+    int ret;
+    CPUState *penv = first_cpu;
+
+    if (running) {
+        while (penv) {

or:

    for (cpu = first_cpu; cpu != NULL; cpu = cpu->next_cpu) {

Functionally equivalent and I see both in the code, is there a standard?

Not really. I once tried to introduce an iterator macro, but it was refused. The above is just more compact. But this is only a minor nit.

Fair enough, since there will be a V4 I will switch to the for loop.

+            ret = kvm_vcpu_ioctl(penv, KVM_GUEST_PAUSED, 0);
+            if (ret) {
+                if (ret != ENOSYS) {
+                    fprintf(stderr,
+                            "kvmclock_vm_state_change_vcpu: %s\n",
+                            strerror(-ret));
+                }
+                return;
+            }
+            penv = (CPUState *)penv->next_cpu;

Unneeded cast.

Also following an example seen elsewhere.

Generally, we try to avoid those pointless casts.

Will remove for V4.

+        }
+    }
+}
+

Again: please use checkpatch.pl.

Sorry, tough to get used to hitting space bar that many times...

 static int kvmclock_init(SysBusDevice *dev)
 {
     KVMClockState *s = FROM_SYSBUS(KVMClockState, dev);

     qemu_add_vm_change_state_handler(kvmclock_vm_state_change, s);
+    qemu_add_vm_change_state_handler(kvmclock_vm_state_change_vcpu, NULL);
     return 0;
 }

Why not extend the existing handler?

Because the new handler doesn't touch the KVMClockState object. If this is preferred, I have no objection.

The separate registration looks strange to me. And the fact that you don't need to object doesn't justify a callback of its own.

I think you misunderstood me, I meant I have no objection to doing it your way if you have a strong opinion (as it seems you do).

I still wonder if the IOCTL interface is actually kvmclock specific. But Marcelo asked for this, and we could still change it when some arch comes around that provides it independent of kvmclock.

The flag itself is stored in the pvclock_vcpu_time_info structure, and anything else that touches that structure uses ioctls.

That's the host-guest interface. But I'm talking about the kvm-qemu interface here which has no relation to how the "was paused" information is transferred to the guest.

Jan
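A small self-contained sketch of the loop shape discussed in this review: a for loop over the singly linked cpu list, no cast on next_cpu, and one consistent sign convention for the error check. The struct, the callback type and the helper names are hypothetical stand-ins, not QEMU code. The sketch assumes the callback returns a negative errno value, which is what the strerror(-ret) call in the patch suggests, so "not supported" is tested as -ENOSYS here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-cpu state object. */
struct cpu {
    int paused_notified;
    struct cpu *next_cpu;
};

/* Hypothetical per-cpu operation: returns 0 or a negative errno value. */
typedef int (*notify_fn)(struct cpu *cpu);

/*
 * Apply notify() to every cpu on the list.
 * An unsupported operation (-ENOSYS) is treated as "nothing to do";
 * any other failure stops the walk and is returned to the caller.
 */
int notify_all(struct cpu *first_cpu, notify_fn notify)
{
    struct cpu *cpu;

    assert(notify != NULL);

    for (cpu = first_cpu; cpu != NULL; cpu = cpu->next_cpu) {
        int ret = notify(cpu);

        if (ret == -ENOSYS)
            return 0;
        if (ret)
            return ret;
    }
    return 0;
}

/* Sample callbacks used to exercise notify_all(). */
int mark_paused(struct cpu *cpu)
{
    cpu->paused_notified = 1;
    return 0;
}

int unsupported(struct cpu *cpu)
{
    (void)cpu;
    return -ENOSYS;
}
```

Because the loop advances in the for header, an early return or a future continue cannot skip the step to the next cpu, which is easy to get wrong in the while form.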
Re: [RFC][PATCH 14/16] kvm: x86: Add user space part for in-kernel i8259
On 12/05/2011 02:47 PM, Jan Kiszka wrote:
>> (the memory API added unstable names, hopefully the QOM can take over the stable ones and we'll have a good way to denote the unstable ones).
>
> OK, maybe - or likely - we should make those device models have the same names in QOM once instantiated. But I'm still convinced they should remain separated models in contrast to a single model with a property.

What do you mean by separate models? You share all the code you can, and don't share the code you can't. To me, single model == single name.

> The kvm ioapic, e.g., requires an additional property (gsi_base) that is meaningless for user space devices. And its interrupts have to be wired/configured differently at board model level. So, from the QEMU POV, it is a very different device. Just the guest does not notice.

It's like qcow2 and raw/native IO are wired differently, or virtio-net and vhost-net. But it's the same IDE device or virtio NIC.

-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: winXP Standard PC HAL and qemu-kvm >= 0.15
On 12/05/2011 11:21 AM, Michael Tokarev wrote:
> As it turned out, a Windows XP machine does not work in qemu-kvm >= 0.15 (it loses network and USB entirely) if it is using the Standard PC HAL. In 0.14 it worked fine, but not in 0.15 (I haven't tried any in-between versions yet).
>
> There are several HAL types available in winXP: Uniprocessor PC with MPS (or Multiprocessor), two ACPI types, and Standard PC. All the other HAL types appear to work fine, but not Standard PC.
>
> I haven't debugged further yet, because it was not easy to find out what was causing the regression and how to reproduce it, and also because I don't think it is the right HAL for a qemu-kvm guest anyway.

It's not, but the regression indicates we broke something. It would be good to know what that is.

> So, if anybody has some thoughts about this issue, and especially if you know a way to switch the winXP HAL type to some ACPI variant without reinstalling, please speak up.. ;)

I remember doing it somewhere in device manager, perhaps in the processor entry. But it has been years since I last did this.

> Debian bugreport for a reference: http://bugs.debian.org/647312
>
> Reproducer: install a winXP guest on kvm with -no-acpi so it chooses the Uniprocessor with MPS HAL. Switch it to Standard PC in device manager, reboot -- in 0.15+ it does not work anymore, while in 0.14 it continues to work fine.

Most likely non-ACPI interrupt routing.

-- 
error compiling committee.c: too many arguments to function
Re: [RFC][PATCH 14/16] kvm: x86: Add user space part for in-kernel i8259
On 2011-12-05 14:14, Avi Kivity wrote:
> On 12/05/2011 02:47 PM, Jan Kiszka wrote:
>>> (the memory API added unstable names, hopefully the QOM can take over the stable ones and we'll have a good way to denote the unstable ones).
>>
>> OK, maybe - or likely - we should make those device models have the same names in QOM once instantiated. But I'm still convinced they should remain separated models in contrast to a single model with a property.
>
> What do you mean by separate models? You share all the code you can, and don't share the code you can't. To me, single model == single name.

But different configuration.

>> The kvm ioapic, e.g., requires an additional property (gsi_base) that is meaningless for user space devices. And its interrupts have to be wired/configured differently at board model level. So, from the QEMU POV, it is a very different device. Just the guest does not notice.
>
> It's like qcow2 and raw/native IO are wired differently, or virtio-net and vhost-net. But it's the same IDE device or virtio NIC.

That would mean introducing a backend/frontend concept for irqchips.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
Re: [RFC][PATCH 14/16] kvm: x86: Add user space part for in-kernel i8259
On 12/05/2011 03:29 PM, Jan Kiszka wrote:
> On 2011-12-05 14:14, Avi Kivity wrote:
>> On 12/05/2011 02:47 PM, Jan Kiszka wrote:
>>>> (the memory API added unstable names, hopefully the QOM can take over the stable ones and we'll have a good way to denote the unstable ones).
>>>
>>> OK, maybe - or likely - we should make those device models have the same names in QOM once instantiated. But I'm still convinced they should remain separated models in contrast to a single model with a property.
>>
>> What do you mean by separate models? You share all the code you can, and don't share the code you can't. To me, single model == single name.
>
> But different configuration.

Right, just like IDE with different backends.

>>> The kvm ioapic, e.g., requires an additional property (gsi_base) that is meaningless for user space devices. And its interrupts have to be wired/configured differently at board model level. So, from the QEMU POV, it is a very different device. Just the guest does not notice.
>>
>> It's like qcow2 and raw/native IO are wired differently, or virtio-net and vhost-net. But it's the same IDE device or virtio NIC.
>
> That would mean introducing a backend/frontend concept for irqchips.

We could do it, have one ioapic model with ioapic_ops->eoi_broadcast(). Most of the interfaces already dispatch dynamically (qdev gpio/irq) so there wouldn't be much more there.

To me, how it's actually implemented is not important. What is important is that save/restore, the monitor, and the guest don't notice any changes.

-- 
error compiling committee.c: too many arguments to function
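As a rough illustration of the "one ioapic model with ioapic_ops->eoi_broadcast()" idea from the message above, the following is a minimal sketch of ops-table dispatch. It is not code from the thread or from QEMU: struct ioapic, ioapic_create(), ioapic_eoi(), and the two backends are hypothetical names, and the backends only record what they were asked to do.

```c
#include <stddef.h>

/* Hypothetical names throughout: this is not QEMU's actual API. */
struct ioapic;

struct ioapic_ops {
    /* Called when the guest signals end-of-interrupt for a vector. */
    void (*eoi_broadcast)(struct ioapic *s, int vector);
};

struct ioapic {
    const char *name;              /* single model name, whatever the backend */
    const struct ioapic_ops *ops;  /* backend chosen when the device is created */
    int last_eoi;                  /* user-space backend: last vector seen, -1 if none */
    int kernel_eois;               /* in-kernel backend: number of EOIs forwarded */
};

/* User-space backend: handle the EOI inside the device model. */
static void user_eoi_broadcast(struct ioapic *s, int vector)
{
    s->last_eoi = vector;
}

/* In-kernel backend: a real one would hand the EOI to the kernel. */
static void kernel_eoi_broadcast(struct ioapic *s, int vector)
{
    (void)vector;
    s->kernel_eois++;
}

static const struct ioapic_ops user_ops = { .eoi_broadcast = user_eoi_broadcast };
static const struct ioapic_ops kernel_ops = { .eoi_broadcast = kernel_eoi_broadcast };

static struct ioapic ioapic_create(int in_kernel)
{
    struct ioapic s = { "ioapic", in_kernel ? &kernel_ops : &user_ops, -1, 0 };
    return s;
}

/* Callers never need to know which backend is active. */
static void ioapic_eoi(struct ioapic *s, int vector)
{
    s->ops->eoi_broadcast(s, vector);
}
```

Both instances carry the same model name, which is the property the message cares about; backend-specific configuration is not covered by this sketch.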
Re: [Qemu-devel] [PATCH V3] Guest stop notification
On 2011-12-05 14:35, Marcelo Tosatti wrote: On Sat, Dec 03, 2011 at 12:45:51PM +0100, Jan Kiszka wrote:

I was referring to the relation between the IOCTL and kvmclock, but IOCTL vs. kvm_run. Jan

Ah, OK. Yes, we better characterize it as KVMCLOCK specific (a generic "guest is paused" command is not the scope of this patch). So appending KVMCLOCK_ to the ioctl definitions would make that more explicit.

IMHO, that would move things in the wrong direction. The IOCTL in itself has _nothing_ to do with kvmclock. It's just that its x86 backend is implemented on top of that infrastructure. For me the IOCTL is pretty generic, can be backed by kvmclock, but need not be on all future archs. Jan

I do not see the need to lift this infrastructure to arch independent status at the moment, without clear semantics on that arch independent level. So I am fine with the current GUEST_PAUSED naming (which can later be extended with GUEST_RESUMED etc, if necessary, for use by other archs for example), and implementation in hw/kvmclock.c.

Yes, let's keep it as suggested last (addition of kvmclock, unchanged IOCTL interface).

Jan
Re: [RFC][PATCH 14/16] kvm: x86: Add user space part for in-kernel i8259
On 2011-12-05 14:36, Avi Kivity wrote:
> On 12/05/2011 03:29 PM, Jan Kiszka wrote:
>> On 2011-12-05 14:14, Avi Kivity wrote:
>>> On 12/05/2011 02:47 PM, Jan Kiszka wrote:
>>>>> (the memory API added unstable names, hopefully the QOM can take over the stable ones and we'll have a good way to denote the unstable ones).
>>>>
>>>> OK, maybe - or likely - we should make those device models have the same names in QOM once instantiated. But I'm still convinced they should remain separated models in contrast to a single model with a property.
>>>
>>> What do you mean by separate models? You share all the code you can, and don't share the code you can't. To me, single model == single name.
>>
>> But different configuration.
>
> Right, just like IDE with different backends.

Except that there is a comparably large infrastructure to manage those backends.

>>>> The kvm ioapic, e.g., requires an additional property (gsi_base) that is meaningless for user space devices. And its interrupts have to be wired/configured differently at board model level. So, from the QEMU POV, it is a very different device. Just the guest does not notice.
>>>
>>> It's like qcow2 and raw/native IO are wired differently, or virtio-net and vhost-net. But it's the same IDE device or virtio NIC.
>>
>> That would mean introducing a backend/frontend concept for irqchips.
>
> We could do it, have one ioapic model with ioapic_ops->eoi_broadcast(). Most of the interfaces already dispatch dynamically (qdev gpio/irq) so there wouldn't be much more there.

The problem is configuration. Just by setting ioapic.backend=xxx, we cannot pass down parameters that are backend-specific. We could ignore this issue and make all specific parameters visible via the frontend. Would be slightly ugly.

> To me, how it's actually implemented is not important. What is important is that save/restore, the monitor, and the guest don't notice any changes.

I widely agree, except that differentiation (or backend awareness) has to be preserved in the monitor.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
KVM call agenda for 12/6 (Tuesday) @ 10am US/Eastern
Hi

Please send in any agenda items you are interested in covering.

Proposal (from Anthony):

1. A short introduction to each of the guest agents, what guests they support, and what verbs they support.
2. A short description of key requirements from each party (oVirt, libvirt, QEMU) for a guest agent.
3. An open discussion about possible ways to collaborate/converge.

Note that guest integration will take more than one week (also Anthony's estimate).

For libvirt and oVirt folks, please contact me or Chris for details of the call.

Thanks, Juan.
[PATCH 1/5] kvm tools: Split custom rootfs init into two stages
Currently, the custom rootfs init is built along with the main KVM tools executable and is copied into custom rootfs directories when they are created with 'kvm setup'.

The problem is that if the init code changes, it has to be copied manually to the custom rootfs directories.

Instead, this patch splits the init process into two parts. Stage 1 simply handles mounts and then passes control to stage 2 of the init. Stage 2 lives in the code tree and does all the heavy lifting.

This allows us to make init changes in the code tree and have them picked up automatically by custom rootfs guests, without having to copy files over manually.

Signed-off-by: Sasha Levin levinsasha...@gmail.com
---
 tools/kvm/Makefile            |    9 +++--
 tools/kvm/builtin-run.c       |   27 +++
 tools/kvm/guest/init.c        |   14 +++---
 tools/kvm/guest/init_stage2.c |   34 ++
 4 files changed, 71 insertions(+), 13 deletions(-)
 create mode 100644 tools/kvm/guest/init_stage2.c

diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile
index bb5f6b0..ece3306 100644
--- a/tools/kvm/Makefile
+++ b/tools/kvm/Makefile
@@ -21,6 +21,7 @@ TAGS := ctags
 PROGRAM	:= kvm
 
 GUEST_INIT := guest/init
+GUEST_INIT_S2 := guest/init_stage2
 
 OBJS	+= builtin-balloon.o
 OBJS	+= builtin-debug.o
@@ -179,7 +180,7 @@ WARNINGS += -Wwrite-strings
 
 CFLAGS	+= $(WARNINGS)
 
-all: $(PROGRAM) $(GUEST_INIT)
+all: $(PROGRAM) $(GUEST_INIT) $(GUEST_INIT_S2)
 
 KVMTOOLS-VERSION-FILE:
 	@$(SHELL_PATH) util/KVMTOOLS-VERSION-GEN $(OUTPUT)
@@ -193,6 +194,10 @@ $(GUEST_INIT): guest/init.c
 	$(E) "  LINK    " $@
 	$(Q) $(CC) -static guest/init.c -o $@
 
+$(GUEST_INIT_S2): guest/init_stage2.c
+	$(E) "  LINK    " $@
+	$(Q) $(CC) -static guest/init_stage2.c -o $@
+
 $(DEPS):
 
 %.d: %.c
@@ -269,7 +274,7 @@ clean:
 	$(Q) rm -f bios/bios-rom.h
 	$(Q) rm -f tests/boot/boot_test.iso
 	$(Q) rm -rf tests/boot/rootfs/
-	$(Q) rm -f $(DEPS) $(OBJS) $(PROGRAM) $(GUEST_INIT)
+	$(Q) rm -f $(DEPS) $(OBJS) $(PROGRAM) $(GUEST_INIT) $(GUEST_INIT_S2)
 	$(Q) rm -f cscope.*
 	$(Q) rm -f $(KVM_INCLUDE)/common-cmds.h
 	$(Q) rm -f KVMTOOLS-VERSION-FILE
diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c index 43cf2c4..9635c82 100644 --- a/tools/kvm/builtin-run.c +++ b/tools/kvm/builtin-run.c @@ -702,6 +702,31 @@ void kvm_run_help(void) usage_with_options(run_usage, options); } +static int kvm_custom_stage2(void) +{ + char tmp[PATH_MAX], dst[PATH_MAX], *src; + const char *rootfs; + int r; + + src = realpath(guest/init_stage2, NULL); + if (src == NULL) + return -ENOMEM; + + if (image_filename[0] == NULL) + rootfs = default; + else + rootfs = image_filename[0]; + + snprintf(tmp, PATH_MAX, %s%s/virt/init_stage2, kvm__get_dir(), rootfs); + remove(tmp); + + snprintf(dst, PATH_MAX, /host/%s, src); + r = symlink(dst, tmp); + free(src); + + return r; +} + int kvm_cmd_run(int argc, const char **argv, const char *prefix) { static char real_cmdline[2048], default_name[20]; @@ -867,6 +892,8 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix) strcat(real_cmdline, init=/virt/init); if (!no_dhcp) strcat(real_cmdline, ip=dhcp); + if (kvm_custom_stage2()) + die(Failed linking stage 2 of init.); } } else if (!strstr(real_cmdline, root=)) { strlcat(real_cmdline, root=/dev/vda rw , sizeof(real_cmdline)); diff --git a/tools/kvm/guest/init.c b/tools/kvm/guest/init.c index 8975023..032a261 100644 --- a/tools/kvm/guest/init.c +++ b/tools/kvm/guest/init.c @@ -1,6 +1,6 @@ /* - * This is a simple init for shared rootfs guests. It brings up critical - * mountpoints and then launches /bin/sh. + * This is a simple init for shared rootfs guests. This part should be limited + * to doing mounts and running stage 2 of the init process. 
*/ #include sys/mount.h #include string.h @@ -30,15 +30,7 @@ int main(int argc, char *argv[]) do_mounts(); -/* get session leader */ -setsid(); - -/* set controlling terminal */ -ioctl (0, TIOCSCTTY, 1); - - puts(Starting '/bin/sh'...); - - run_process(/bin/sh); + run_process(/virt/init_stage2); printf(Init failed: %s\n, strerror(errno)); diff --git a/tools/kvm/guest/init_stage2.c b/tools/kvm/guest/init_stage2.c new file mode 100644 index 000..af615a0 --- /dev/null +++ b/tools/kvm/guest/init_stage2.c @@ -0,0 +1,34 @@ +/* + * This is a stage 2 of the init. This part should do all the heavy + * lifting such as setting up the console and calling /bin/sh. + */ +#include sys/mount.h +#include string.h +#include unistd.h
[PATCH 2/5] kvm tools: Remove double 'init=' kernel param
Signed-off-by: Sasha Levin levinsasha...@gmail.com
---
 tools/kvm/builtin-run.c |    3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c
index 9635c82..de3001e 100644
--- a/tools/kvm/builtin-run.c
+++ b/tools/kvm/builtin-run.c
@@ -881,9 +881,6 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix)
 		if (virtio_9p__register(kvm, "/", "hostfs") < 0)
 			die("Unable to initialize virtio 9p");
 		using_rootfs = custom_rootfs = 1;
-
-		if (!strstr(real_cmdline, "init="))
-			strlcat(real_cmdline, " init=/bin/sh ", sizeof(real_cmdline));
 	}
 
 	if (using_rootfs) {
-- 
1.7.8
[PATCH 3/5] kvm tools: Allow easily sandboxing applications within a guest
This patch adds a '--sandbox' argument. When used in conjunction with a custom rootfs, it allows running a script or an executable in the guest environment by using executables and other files from the host.

This is useful when testing code that might cause problems on the host, or to automate kernel testing since it's now easy to link a kvm tools test script with 'git bisect run'.

Suggested-by: Ingo Molnar mi...@elte.hu
Signed-off-by: Sasha Levin levinsasha...@gmail.com
---
 tools/kvm/builtin-run.c       |   31 +++
 tools/kvm/guest/init_stage2.c |   13 -
 2 files changed, 43 insertions(+), 1 deletions(-)

diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c
index de3001e..cd14159 100644
--- a/tools/kvm/builtin-run.c
+++ b/tools/kvm/builtin-run.c
@@ -82,6 +82,7 @@ static const char *guest_mac;
 static const char *host_mac;
 static const char *script;
 static const char *guest_name;
+static const char *sandbox;
 static struct virtio_net_params *net_params;
 static bool single_step;
 static bool readonly_image[MAX_DISK_IMAGES];
@@ -420,6 +421,8 @@ static const struct option options[] = {
 	OPT_CALLBACK('\0', "tty", NULL, "tty id",
 		     "Remap guest TTY into a pty on the host",
 		     tty_parser),
+	OPT_STRING('\0', "sandbox", &sandbox, "script",
+			"Run this script when booting into custom rootfs"),
 
 	OPT_GROUP("Kernel options:"),
 	OPT_STRING('k', "kernel", &kernel_filename, "kernel",
@@ -727,6 +730,31 @@ static int kvm_custom_stage2(void)
 	return r;
 }
 
+static int kvm_run_set_sandbox(void)
+{
+	const char *guestfs_name = "default";
+	char path[PATH_MAX], script[PATH_MAX], *tmp;
+
+	if (image_filename[0])
+		guestfs_name = image_filename[0];
+
+	snprintf(path, PATH_MAX, "%s%s/virt/sandbox.sh", kvm__get_dir(), guestfs_name);
+
+	remove(path);
+
+	if (sandbox == NULL)
+		return 0;
+
+	tmp = realpath(sandbox, NULL);
+	if (tmp == NULL)
+		return -ENOMEM;
+
+	snprintf(script, PATH_MAX, "/host/%s", tmp);
+	free(tmp);
+
+	return symlink(script, path);
+}
+
 int kvm_cmd_run(int argc, const char **argv, const char *prefix)
 {
 	static char real_cmdline[2048], default_name[20];
@@ -886,7 +914,10 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix)
 	if (using_rootfs) {
 		strcat(real_cmdline, " root=/dev/root rw rootflags=rw,trans=virtio,version=9p2000.L rootfstype=9p");
 		if (custom_rootfs) {
+			kvm_run_set_sandbox();
+
 			strcat(real_cmdline, " init=/virt/init");
+
 			if (!no_dhcp)
 				strcat(real_cmdline, " ip=dhcp");
 			if (kvm_custom_stage2())
diff --git a/tools/kvm/guest/init_stage2.c b/tools/kvm/guest/init_stage2.c
index af615a0..6489fee 100644
--- a/tools/kvm/guest/init_stage2.c
+++ b/tools/kvm/guest/init_stage2.c
@@ -16,6 +16,14 @@ static int run_process(char *filename)
 	return execve(filename, new_argv, new_env);
 }
 
+static int run_process_sandbox(char *filename)
+{
+	char *new_argv[] = { filename, "/virt/sandbox.sh", NULL };
+	char *new_env[] = { "TERM=linux", NULL };
+
+	return execve(filename, new_argv, new_env);
+}
+
 int main(int argc, char *argv[])
 {
 	/* get session leader */
@@ -26,7 +34,10 @@ int main(int argc, char *argv[])
 
 	puts("Starting '/bin/sh'...");
 
-	run_process("/bin/sh");
+	if (access("/virt/sandbox.sh", R_OK) == 0)
+		run_process_sandbox("/bin/sh");
+	else
+		run_process("/bin/sh");
 
 	printf("Init failed: %s\n", strerror(errno));
 
-- 
1.7.8
[PATCH 4/5] kvm tools: Ignore parameters after dashdash in 'kvm run'
This allows other commands to wrap 'kvm run' and use the parameters the user provides after a dash-dash for their own use.

Signed-off-by: Sasha Levin levinsasha...@gmail.com
---
 tools/kvm/builtin-run.c |    7 ++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c
index cd14159..5db6995 100644
--- a/tools/kvm/builtin-run.c
+++ b/tools/kvm/builtin-run.c
@@ -776,8 +776,13 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix)
 
 	while (argc != 0) {
 		argc = parse_options(argc, argv, options, run_usage,
-				PARSE_OPT_STOP_AT_NON_OPTION);
+				PARSE_OPT_STOP_AT_NON_OPTION |
+				PARSE_OPT_KEEP_DASHDASH);
 		if (argc != 0) {
+			/* Cusrom options, should have been handled elsewhere */
+			if (strcmp(argv[0], "--") == 0)
+				break;
+
 			if (kernel_filename) {
 				fprintf(stderr, "Cannot handle parameter: "
 						"%s\n", argv[0]);
-- 
1.7.8
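To make the dash-dash convention concrete, here is a minimal standalone sketch of how a wrapper could split an argument vector at the first "--". It does not use the kvm tools parse_options() machinery; find_dashdash() is a hypothetical helper written only for illustration.

```c
#include <string.h>

/*
 * Return the index of the first "--" in argv, or argc if there is none.
 * Arguments before that index would go to the wrapped command ('kvm run'
 * in the patch); arguments after it are left for the wrapper.
 */
static int find_dashdash(int argc, const char **argv)
{
    int i;

    for (i = 0; i < argc; i++) {
        if (strcmp(argv[i], "--") == 0)
            return i;
    }
    return argc;
}
```

With the arguments `-d test_guest -- ls -l`, the helper returns index 2, so `ls -l` is what the wrapper would treat as its own.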
[PATCH 5/5] kvm tools: Add 'kvm sandbox'
This patch adds 'kvm sandbox' which is a wrapper on top of 'kvm run' which allows the user to easily specify sandboxed command to run in a custom rootfs guest. Example usage: kvm sandbox -d test_guest -k some_kernel -- do_something_in_guest Suggested-by: Pekka Enberg penb...@cs.helsinki.fi Signed-off-by: Sasha Levin levinsasha...@gmail.com --- tools/kvm/Documentation/kvm-sandbox.txt | 16 ++ tools/kvm/Makefile |1 + tools/kvm/builtin-run.c | 49 +- tools/kvm/builtin-sandbox.c |9 ++ tools/kvm/command-list.txt |1 + tools/kvm/include/kvm/builtin-run.h |2 + tools/kvm/include/kvm/builtin-sandbox.h |6 tools/kvm/kvm-cmd.c |2 + 8 files changed, 84 insertions(+), 2 deletions(-) create mode 100644 tools/kvm/Documentation/kvm-sandbox.txt create mode 100644 tools/kvm/builtin-sandbox.c create mode 100644 tools/kvm/include/kvm/builtin-sandbox.h diff --git a/tools/kvm/Documentation/kvm-sandbox.txt b/tools/kvm/Documentation/kvm-sandbox.txt new file mode 100644 index 000..8f24fc7 --- /dev/null +++ b/tools/kvm/Documentation/kvm-sandbox.txt @@ -0,0 +1,16 @@ +kvm-sandbox(1) + + +NAME + +kvm-sandbox - Run a command in a sandboxed guest + +SYNOPSIS + +[verse] +'kvm sandbox ['kvm run' arguments] -- [sandboxed command]' + +DESCRIPTION +--- +The sandboxed command will run in a guest as part of it's init +command. 
diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile index ece3306..24af1d0 100644 --- a/tools/kvm/Makefile +++ b/tools/kvm/Makefile @@ -85,6 +85,7 @@ OBJS += hw/vesa.o OBJS += hw/i8042.o OBJS += hw/pci-shmem.o OBJS += kvm-ipc.o +OBJS += builtin-sandbox.o FLAGS_BFD := $(CFLAGS) -lbfd has_bfd := $(call try-cc,$(SOURCE_BFD),$(FLAGS_BFD)) diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c index 5db6995..7a57b5c 100644 --- a/tools/kvm/builtin-run.c +++ b/tools/kvm/builtin-run.c @@ -53,6 +53,7 @@ #define DEFAULT_GUEST_MAC 02:15:15:15:15:15 #define DEFAULT_HOST_MAC 02:01:01:01:01:01 #define DEFAULT_SCRIPT none +const char *DEFAULT_SANDBOX_FILENAME = guest/sandbox.sh; #define MB_SHIFT (20) #define KB_SHIFT (10) @@ -94,6 +95,7 @@ static bool custom_rootfs; static bool no_net; static bool no_dhcp; extern bool ioport_debug; +static int kvm_run_wrapper; extern int active_console; extern int debug_iodelay; @@ -107,6 +109,15 @@ static const char * const run_usage[] = { NULL }; +enum { + KVM_RUN_SANDBOX, +}; + +void kvm_run_set_wrapper_sandbox(void) +{ + kvm_run_wrapper = KVM_RUN_SANDBOX; +} + static int img_name_parser(const struct option *opt, const char *arg, int unset) { char *sep; @@ -755,6 +766,35 @@ static int kvm_run_set_sandbox(void) return symlink(script, path); } +static void kvm_run_write_sandbox_cmd(const char **argv, int argc) +{ + const char script_hdr[] = #! 
/bin/bash\n\n; + int fd; + + remove(sandbox); + + fd = open(sandbox, O_RDWR | O_CREAT, 0777); + if (fd 0) + die(Failed creating sandbox script); + + if (write(fd, script_hdr, sizeof(script_hdr) - 1) = 0) + die(Failed writing sandbox script); + + while (argc) { + if (write(fd, argv[0], strlen(argv[0])) = 0) + die(Failed writing sandbox script); + if (argc - 1) + if (write(fd, , 1) = 0) + die(Failed writing sandbox script); + argv++; + argc--; + } + if (write(fd, \n, 1) = 0) + die(Failed writing sandbox script); + + close(fd); +} + int kvm_cmd_run(int argc, const char **argv, const char *prefix) { static char real_cmdline[2048], default_name[20]; @@ -780,8 +820,13 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix) PARSE_OPT_KEEP_DASHDASH); if (argc != 0) { /* Cusrom options, should have been handled elsewhere */ - if (strcmp(argv[0], --) == 0) - break; + if (strcmp(argv[0], --) == 0) { + if (kvm_run_wrapper == KVM_RUN_SANDBOX) { + sandbox = DEFAULT_SANDBOX_FILENAME; + kvm_run_write_sandbox_cmd(argv+1, argc-1); + break; + } + } if (kernel_filename) { fprintf(stderr, Cannot handle parameter: diff --git a/tools/kvm/builtin-sandbox.c b/tools/kvm/builtin-sandbox.c new file mode 100644 index 000..433f536 --- /dev/null +++ b/tools/kvm/builtin-sandbox.c @@ -0,0 +1,9 @@
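The script that kvm_run_write_sandbox_cmd() writes is a shebang line followed by the sandboxed command with its arguments joined by spaces. A minimal sketch of that composition, building the text in memory instead of writing to a file descriptor, might look like this; build_sandbox_script() is a hypothetical helper and not part of the patch.

```c
#include <stdio.h>
#include <string.h>

/*
 * Build the sandbox script text into buf: a shebang line, a blank line,
 * then the arguments joined by single spaces and a trailing newline.
 * Returns 0 on success, -1 if buf is too small.
 */
static int build_sandbox_script(char *buf, size_t size, int argc, const char **argv)
{
    size_t used;
    int i, n;

    n = snprintf(buf, size, "#! /bin/bash\n\n");
    if (n < 0 || (size_t)n >= size)
        return -1;
    used = (size_t)n;

    for (i = 0; i < argc; i++) {
        /* Separate arguments with one space, but not before the first. */
        n = snprintf(buf + used, size - used, "%s%s", i ? " " : "", argv[i]);
        if (n < 0 || (size_t)n >= size - used)
            return -1;
        used += (size_t)n;
    }

    n = snprintf(buf + used, size - used, "\n");
    if (n < 0 || (size_t)n >= size - used)
        return -1;
    return 0;
}
```

Note that, like the patch, this joins arguments verbatim; arguments containing spaces or shell metacharacters would need quoting, which neither the sketch nor the patch attempts.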
Re: [PATCH 3/5 V4] Add ioctl for KVM_GUEST_STOPPED
On Sat, 03 Dec 2011, Sasha Levin wrote:
> On Tue, 2011-11-29 at 16:35 -0500, Eric B Munson wrote:
> > Now that we have a flag that will tell the guest it was suspended, create an interface for that communication using a KVM ioctl.
> >
> > Signed-off-by: Eric B Munson emun...@mgebm.net
>
> Can it be documented in api.txt as well?
>
> --
> Sasha.

Thanks for the review, will do for V5.

Eric
Re: [PATCH v2 1/3] pci: Rework config space blocking services
On Fri, 4 Nov 2011 09:45:59 +0100 Jan Kiszka jan.kis...@siemens.com wrote:
> pci_block_user_cfg_access was designed for the use case that a single context, the IPR driver, temporarily delays user space accesses to the config space via sysfs. This assumption became invalid by the time pci_dev_reset was added as a locking instance. Today, if you run two loops in parallel that reset the same device via sysfs, you end up with a kernel BUG as pci_block_user_cfg_access detects the broken assumption.
>
> This reworks the pci_block_user_cfg_access to a sleeping service pci_cfg_access_lock and an atomic-compatible variant called pci_cfg_access_trylock. The former not only blocks user space access as before but also waits if access was already locked. The latter service just returns false in this case, allowing the caller to resolve the conflict instead of raising a BUG.
>
> Adaptations of the ipr driver were originally written by Brian King.

Applied this series to linux-next, thanks.

-- 
Jesse Barnes, Intel Open Source Technology Center
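The trylock semantics described in the quoted changelog (return false when access is already locked, so the caller can resolve the conflict) can be sketched in user space with a C11 atomic flag. This is only an analogy under stated assumptions: struct cfg_dev and both functions are hypothetical, this is not the kernel implementation, and the sleeping pci_cfg_access_lock variant is not modelled.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a device; not the kernel's struct pci_dev. */
struct cfg_dev {
    atomic_flag cfg_locked;
};

/* Try to take exclusive config access; returns false if already held. */
static bool cfg_access_trylock(struct cfg_dev *dev)
{
    /* test_and_set returns the previous value: true means someone holds it. */
    return !atomic_flag_test_and_set(&dev->cfg_locked);
}

/* Release config access so that a later trylock can succeed. */
static void cfg_access_unlock(struct cfg_dev *dev)
{
    atomic_flag_clear(&dev->cfg_locked);
}
```

A second trylock on a held device fails instead of tripping an assertion, which is the behavioural change the changelog describes for callers in atomic context.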
Re: [PATCHv2 RFC] virtio-pci: flexible configuration layout
On Mon, 14 Nov 2011 20:18:55 +0200 Michael S. Tsirkin m...@redhat.com wrote:
> Add a flexible mechanism to specify virtio configuration layout, using pci vendor-specific capability. A separate capability is used for each of common, device specific and data-path accesses.
>
> Warning: compiled only. This patch also needs to be split up, pci_iomap changes also need arch updates for non-x86. There might also be more spec changes.
>
> Posting here for early feedback, and to allow Sasha to proceed with his kvm tool work.
>
> Changes from v1:
> Updated to match v3 of the spec, see:
> Subject: [PATCHv3 RFC] virtio-spec: flexible configuration layout
> Message-ID: 2010122436.ga13...@redhat.com
> In-Reply-To: 2009195901.ga28...@redhat.com

Looks like this conflicts with your other iomap changes... I didn't check your latest tree; do you just add another patch on top for the virtio changes now?

Thanks,
-- 
Jesse Barnes, Intel Open Source Technology Center
Re: [Qemu-devel] [PATCH] ivshmem: fix guest unable to start with ioeventfd
2011/12/2 Cam Macdonell c...@cs.ualberta.ca: 2011/11/30 Cam Macdonell c...@cs.ualberta.ca: 2011/11/30 Zang Hongyong zanghongy...@huawei.com: Can this bug fix patch be applied yet? Sorry, for not replying yet. I'll test your patch within the next day. Have you confirmed the proper receipt of interrupts in the receiving guests? I can confirm the bug occurs with ioeventfd enabled and that the patches fixes it, but sometime after 15.1, I no longer see interrupts (MSI or regular) being delivered in the guest. I will bisect tomorrow. With Michael's help we debugged msi-x interrupt delivery. With that fix in place, this patch fixes ioeventfd in ivshmem. Cam With this bug, guest os cannot successfully boot with ioeventfd. Thus the new PIO DoorBell patch cannot be posted. Well, you can certainly post the new patch, just clarify that it's dependent on this patch. Sincerely, Cam Thanks, Hongyong 于 2011/11/24,星期四 18:05, zanghongy...@huawei.com 写道: From: Hongyong Zang zanghongy...@huawei.com When a guest boots with ioeventfd, an error (by gdb) occurs: Program received signal SIGSEGV, Segmentation fault. 0x006009cc in setup_ioeventfds (s=0x171dc40) at /home/louzhengwei/git_source/qemu-kvm/hw/ivshmem.c:363 363 for (j = 0; j s-peers[i].nb_eventfds; j++) { The bug is due to accessing s-peers which is NULL. This patch uses the memory region API to replace the old one kvm_set_ioeventfd_mmio_long(). And this patch makes memory_region_add_eventfd() called in ivshmem_read() when qemu receives eventfd information from ivshmem_server. Signed-off-by: Hongyong Zang zanghongy...@huawei.com --- hw/ivshmem.c | 41 ++--- 1 files changed, 14 insertions(+), 27 deletions(-) diff --git a/hw/ivshmem.c b/hw/ivshmem.c index 242fbea..be26f03 100644 --- a/hw/ivshmem.c +++ b/hw/ivshmem.c @@ -58,7 +58,6 @@ typedef struct IVShmemState { CharDriverState *server_chr; MemoryRegion ivshmem_mmio; -pcibus_t mmio_addr; /* We might need to register the BAR before we actually have the memory. 
* So prepare a container MemoryRegion for the BAR immediately and * add a subregion when we have the memory. @@ -346,8 +345,14 @@ static void close_guest_eventfds(IVShmemState *s, int posn) guest_curr_max = s-peers[posn].nb_eventfds; for (i = 0; i guest_curr_max; i++) { -kvm_set_ioeventfd_mmio_long(s-peers[posn].eventfds[i], -s-mmio_addr + DOORBELL, (posn 16) | i, 0); +if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD)) { +memory_region_del_eventfd(s-ivshmem_mmio, + DOORBELL, + 4, + true, + (posn 16) | i, + s-peers[posn].eventfds[i]); +} close(s-peers[posn].eventfds[i]); } @@ -355,22 +360,6 @@ static void close_guest_eventfds(IVShmemState *s, int posn) s-peers[posn].nb_eventfds = 0; } -static void setup_ioeventfds(IVShmemState *s) { - -int i, j; - -for (i = 0; i = s-max_peer; i++) { -for (j = 0; j s-peers[i].nb_eventfds; j++) { -memory_region_add_eventfd(s-ivshmem_mmio, - DOORBELL, - 4, - true, - (i 16) | j, - s-peers[i].eventfds[j]); -} -} -} - /* this function increase the dynamic storage need to store data about other * guests */ static void increase_dynamic_storage(IVShmemState *s, int new_min_size) { @@ -491,10 +480,12 @@ static void ivshmem_read(void *opaque, const uint8_t * buf, int flags) } if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD)) { -if (kvm_set_ioeventfd_mmio_long(incoming_fd, s-mmio_addr + DOORBELL, -(incoming_posn 16) | guest_max_eventfd, 1) 0) { -fprintf(stderr, ivshmem: ioeventfd not available\n); -} +memory_region_add_eventfd(s-ivshmem_mmio, + DOORBELL, + 4, + true, + (incoming_posn 16) | guest_max_eventfd, + incoming_fd); } return; @@ -659,10 +650,6 @@ static int pci_ivshmem_init(PCIDevice *dev) memory_region_init_io(s-ivshmem_mmio, ivshmem_mmio_ops, s, ivshmem-mmio, IVSHMEM_REG_BAR_SIZE); -if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD)) { -setup_ioeventfds(s); -} - /* region for registers*/ pci_register_bar(s-dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, s-ivshmem_mmio); -- To unsubscribe from
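The eventfd match value used throughout the patch above, (posn << 16) | i, packs the peer position into the upper 16 bits and the eventfd index into the lower 16 bits of the doorbell write. A small sketch of that encoding and its inverse, with hypothetical helper names that do not appear in hw/ivshmem.c:

```c
#include <stdint.h>

/* Pack a peer position and an eventfd index into one 32-bit doorbell value. */
static uint32_t doorbell_value(uint16_t posn, uint16_t vector)
{
    return ((uint32_t)posn << 16) | vector;
}

/* Upper 16 bits: which peer is being signalled. */
static uint16_t doorbell_peer(uint32_t value)
{
    return (uint16_t)(value >> 16);
}

/* Lower 16 bits: which of that peer's eventfds is being signalled. */
static uint16_t doorbell_vector(uint32_t value)
{
    return (uint16_t)(value & 0xffff);
}
```

For example, peer 3 with eventfd index 5 gives the value 0x00030005, which is what a guest would write to the DOORBELL register to trigger that eventfd.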
Re: [net-next RFC PATCH 2/5] tuntap: simple flow director support
On Mon, 2011-12-05 at 16:58 +0800, Jason Wang wrote:
> This patch adds a simple flow director to the tun/tap device. It is just a page that contains the hash to queue mapping, which can be changed by user-space. The backend (tap/macvtap) would query this table to get the desired queue of a packet when it sends packets to userspace.

This is just flow hashing (RSS), not flow steering.

> The page address is set through a new kind of ioctl - TUNSETFD - and is pinned until device exit or until another new page is specified.
[...]

You should implement ethtool ETHTOOL_{G,S}RXFHINDIR instead.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
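For readers unfamiliar with the distinction drawn above, flow hashing (RSS) selects a queue by looking up a flow hash in an indirection table, and user space tunes the spread by rewriting table entries. A minimal sketch under stated assumptions: the table size of 128, struct flow_table, and both functions are hypothetical and are taken neither from the patch nor from the ethtool interface.

```c
#include <stddef.h>
#include <stdint.h>

#define INDIR_SIZE 128  /* hypothetical table size */

struct flow_table {
    uint16_t queue[INDIR_SIZE];  /* hash bucket -> queue index */
};

/* Spread the buckets evenly over nqueues (nqueues must be non-zero). */
static void flow_table_init(struct flow_table *t, uint16_t nqueues)
{
    size_t i;

    for (i = 0; i < INDIR_SIZE; i++)
        t->queue[i] = (uint16_t)(i % nqueues);
}

/* Pick the queue for a packet from its precomputed flow hash. */
static uint16_t flow_table_lookup(const struct flow_table *t, uint32_t flow_hash)
{
    return t->queue[flow_hash % INDIR_SIZE];
}
```

All packets of one flow share a hash and therefore a queue; steering an individual flow to a chosen queue would need per-flow rules, which a table like this does not provide.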
Re: [net-next RFC PATCH 3/5] macvtap: flow director support
Similarly, macvtap should implement the ethtool {get,set}_rxfh_indir
operations.

Ben.

--
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
[PATCH V4] Guest stop notification
Often when a guest is stopped from the qemu console, it will report spurious
soft lockup warnings on resume. There are kernel patches being discussed that
will give the host the ability to tell the guest that it is being stopped and
should ignore the soft lockup warning that this generates. This patch uses the
qemu Notifier system to tell the guest it is about to be stopped.

Signed-off-by: Eric B Munson <emun...@mgebm.net>
Cc: Avi Kivity <a...@redhat.com>
Cc: Marcelo Tosatti <mtosa...@redhat.com>
Cc: Jan Kiszka <jan.kis...@siemens.com>
Cc: ry...@linux.vnet.ibm.com
Cc: aligu...@us.ibm.com
Cc: kvm@vger.kernel.org
---
Changes from V3:
 Collapse new state change notification function into existing function.
 Correct whitespace issues
 Change ioctl name to KVMCLOCK_GUEST_PAUSED
 Use for loop to iterate vcpus

Changes from V2:
 Move ioctl into hw/kvmclock.c so as other arches can use it as it is
 implemented

Changes from V1:
 Remove unnecessary encapsulating function

 hw/kvmclock.c | 15 +++
 1 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/hw/kvmclock.c b/hw/kvmclock.c
index 5388bc4..fa11dd7 100644
--- a/hw/kvmclock.c
+++ b/hw/kvmclock.c
@@ -16,6 +16,7 @@
 #include "sysbus.h"
 #include "kvm.h"
 #include "kvmclock.h"
+#include "cpu-all.h"

 #include <linux/kvm.h>
 #include <linux/kvm_para.h>
@@ -62,10 +63,24 @@ static int kvmclock_post_load(void *opaque, int version_id)

 static void kvmclock_vm_state_change(void *opaque, int running, RunState state)
 {
+    int ret;
+    CPUState *penv = first_cpu;
     KVMClockState *s = opaque;

     if (running) {
         s->clock_valid = false;
+
+        for (penv = first_cpu; penv != NULL; penv = penv->next_cpu) {
+            ret = kvm_vcpu_ioctl(penv, KVMCLOCK_GUEST_PAUSED, 0);
+            if (ret) {
+                if (ret != -EINVAL) {
+                    fprintf(stderr,
+                            "kvmclock_vm_state_change: %s\n",
+                            strerror(-ret));
+                }
+                return;
+            }
+        }
     }
 }

--
1.7.5.4
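The error handling in the loop above (stop at the first failure, stay silent on -EINVAL because that only means the kernel lacks the ioctl or the guest lacks pvclock) can be sketched outside qemu. This is a sketch under stated assumptions: vcpu_ioctl_fn, notify_guest_paused() and the two stubs are hypothetical stand-ins, not qemu or KVM APIs.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for kvm_vcpu_ioctl(penv, KVMCLOCK_GUEST_PAUSED, 0). */
typedef int (*vcpu_ioctl_fn)(int vcpu_index);

/*
 * Notify each vcpu in turn and stop at the first error.
 * -EINVAL is treated as "not supported" and is not reported.
 * Returns the number of vcpus that were notified successfully.
 */
static int notify_guest_paused(int nr_vcpus, vcpu_ioctl_fn do_ioctl)
{
    int done = 0;

    for (int i = 0; i < nr_vcpus; i++) {
        int ret = do_ioctl(i);

        if (ret) {
            if (ret != -EINVAL)
                fprintf(stderr, "notify_guest_paused: %s\n", strerror(-ret));
            return done;
        }
        done++;
    }
    return done;
}

/* Hypothetical stubs standing in for a kernel with and without the ioctl. */
static int ioctl_ok(int vcpu_index)
{
    (void)vcpu_index;
    return 0;
}

static int ioctl_unsupported(int vcpu_index)
{
    (void)vcpu_index;
    return -EINVAL;
}
```

Calling notify_guest_paused(4, ioctl_unsupported) returns 0 without printing anything, which mirrors how the patch keeps quiet on hosts that do not implement the ioctl.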
[PATCH 1/5 V5] Add flag to indicate that a vm was stopped by the host
This flag will be used to check if the vm was stopped by the host when a soft
lockup was detected. The host will set the flag when it stops the guest. On
resume, the guest will check this flag if a soft lockup is detected and skip
issuing the warning.

Signed-off-by: Eric B Munson <emun...@mgebm.net>
Cc: mi...@redhat.com
Cc: h...@zytor.com
Cc: a...@arndb.de
Cc: ry...@linux.vnet.ibm.com
Cc: aligu...@us.ibm.com
Cc: mtosa...@redhat.com
Cc: jeremy.fitzhardi...@citrix.com
Cc: levinsasha...@gmail.com
Cc: Jan Kiszka <jan.kis...@siemens.com>
Cc: kvm@vger.kernel.org
Cc: linux-a...@vger.kernel.org
Cc: x...@kernel.org
Cc: linux-ker...@vger.kernel.org
---
 arch/x86/include/asm/pvclock-abi.h |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/pvclock-abi.h b/arch/x86/include/asm/pvclock-abi.h
index 35f2d19..6167fd7 100644
--- a/arch/x86/include/asm/pvclock-abi.h
+++ b/arch/x86/include/asm/pvclock-abi.h
@@ -40,5 +40,6 @@ struct pvclock_wall_clock {
 } __attribute__((__packed__));

 #define PVCLOCK_TSC_STABLE_BIT	(1 << 0)
+#define PVCLOCK_GUEST_STOPPED	(1 << 1)
 #endif /* __ASSEMBLY__ */
 #endif /* _ASM_X86_PVCLOCK_ABI_H */
--
1.7.5.4
[PATCH 2/5 V5] Add functions to check if the host has stopped the vm
When a host stops or suspends a VM it will set a flag to show this. The
watchdog will use these functions to determine if a softlockup is real, or the
result of a suspended VM.

Signed-off-by: Eric B Munson <emun...@mgebm.net>
Cc: mi...@redhat.com
Cc: h...@zytor.com
Cc: a...@arndb.de
Cc: ry...@linux.vnet.ibm.com
Cc: aligu...@us.ibm.com
Cc: mtosa...@redhat.com
Cc: jeremy.fitzhardi...@citrix.com
Cc: levinsasha...@gmail.com
Cc: Jan Kiszka <jan.kis...@siemens.com>
Cc: kvm@vger.kernel.org
Cc: linux-a...@vger.kernel.org
Cc: x...@kernel.org
Cc: linux-ker...@vger.kernel.org
---
 arch/x86/include/asm/kvm_para.h |    1 +
 arch/x86/kernel/kvmclock.c      |   21 +
 2 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 734c376..e9d63a6 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -95,6 +95,7 @@ struct kvm_vcpu_pv_apf_data {

 extern void kvmclock_init(void);
 extern int kvm_register_clock(char *txt);
+bool kvm_check_and_clear_guest_paused(int cpu);

 /* This instruction is vmcall. On non-VT architectures, it will generate a
  * trap that we will then rewrite to the appropriate instruction.
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index 44842d7..f0c0599 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -22,6 +22,7 @@
 #include <asm/msr.h>
 #include <asm/apic.h>
 #include <linux/percpu.h>
+#include <linux/hardirq.h>

 #include <asm/x86_init.h>
 #include <asm/reboot.h>
@@ -114,6 +115,26 @@ static void kvm_get_preset_lpj(void)
 	preset_lpj = lpj;
 }

+bool kvm_check_and_clear_guest_paused(int cpu)
+{
+	bool ret = false;
+	struct pvclock_vcpu_time_info *src;
+
+	/*
+	 * per_cpu() is safe here because this function is only called from
+	 * timer functions where preemption is already disabled.
+	 */
+	WARN_ON(!in_atomic());
+	src = &per_cpu(hv_clock, cpu);
+	if ((src->flags & PVCLOCK_GUEST_STOPPED) != 0) {
+		src->flags = src->flags & (~PVCLOCK_GUEST_STOPPED);
+		ret = true;
+	}
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(kvm_check_and_clear_guest_paused);
+
 static struct clocksource kvm_clock = {
 	.name = "kvm-clock",
 	.read = kvm_clock_get_cycles,
--
1.7.5.4
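The check-and-clear behaviour of kvm_check_and_clear_guest_paused() can be tried out in user space. The sketch below assumes an 8-bit flags field and uses a made-up struct fake_time_info in place of struct pvclock_vcpu_time_info; it illustrates the bit handling only and has none of the per-cpu or preemption concerns of the kernel function.

```c
#include <stdbool.h>
#include <stdint.h>

#define PVCLOCK_TSC_STABLE_BIT (1 << 0)
#define PVCLOCK_GUEST_STOPPED  (1 << 1)

/* Hypothetical stand-in for the flags byte of struct pvclock_vcpu_time_info. */
struct fake_time_info {
    uint8_t flags;
};

/*
 * Report whether the host marked the guest as stopped, and clear only
 * that bit so other flags (e.g. TSC stable) are left untouched.
 */
static bool check_and_clear_guest_paused(struct fake_time_info *src)
{
    if ((src->flags & PVCLOCK_GUEST_STOPPED) == 0)
        return false;

    src->flags &= (uint8_t)~PVCLOCK_GUEST_STOPPED;
    return true;
}
```

A second call right after a successful one returns false, so a single pause suppresses at most one warning.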
[PATCH 0/5 V5] Avoid soft lockup message when KVM is stopped by host
Changes from V4:
 Rename KVM_GUEST_PAUSED to KVMCLOCK_GUEST_PAUSED
 Add description of KVMCLOCK_GUEST_PAUSED ioctl to api.txt

Changes from V3:
 Include CC's on patch 3
 Drop clear flag ioctl and have the watchdog clear the flag when it is reset

Changes from V2:
 New kvm functions are defined in kvm_para.h, the only change to pvclock is
 the initial flag definition

Changes from V1:
 (Thanks Marcelo)
 Host code has all been moved to arch/x86/kvm/x86.c
 KVM_PAUSE_GUEST was renamed to KVM_GUEST_PAUSED

When a guest kernel is stopped by the host hypervisor it can look like a soft
lockup to the guest kernel. This false warning can mask later soft lockup
warnings which may be real. This patch series adds a method for a host
hypervisor to communicate to a guest kernel that it is being stopped. The
final patch in the series has the watchdog check this flag when it goes to
issue a soft lockup warning and skip the warning if the guest knows it was
stopped.

It was attempted to solve this in Qemu, but the side effects of saving and
restoring the clock and tsc for each vcpu put the wall clock of the guest
behind by the amount of time of the pause. This forces a guest to have ntp
running in order to keep the wall clock accurate.

Cc: mi...@redhat.com
Cc: h...@zytor.com
Cc: a...@arndb.de
Cc: ry...@linux.vnet.ibm.com
Cc: aligu...@us.ibm.com
Cc: mtosa...@redhat.com
Cc: jeremy.fitzhardi...@citrix.com
Cc: levinsasha...@gmail.com
Cc: Jan Kiszka <jan.kis...@siemens.com>
Cc: kvm@vger.kernel.org
Cc: linux-a...@vger.kernel.org
Cc: x...@kernel.org
Cc: linux-ker...@vger.kernel.org

Eric B Munson (5):
  Add flag to indicate that a vm was stopped by the host
  Add functions to check if the host has stopped the vm
  Add ioctl for KVMCLOCK_GUEST_STOPPED
  Add generic stubs for kvm stop check functions
  Add check for suspended vm in softlockup detector

 Documentation/virtual/kvm/api.txt  |   12
 arch/x86/include/asm/kvm_host.h    |    2 ++
 arch/x86/include/asm/kvm_para.h    |    1 +
 arch/x86/include/asm/pvclock-abi.h |    1 +
 arch/x86/kernel/kvmclock.c         |   21 +
 arch/x86/kvm/x86.c                 |   20
 include/asm-generic/kvm_para.h     |   14 ++
 include/linux/kvm.h                |    2 ++
 kernel/watchdog.c                  |   12
 9 files changed, 85 insertions(+), 0 deletions(-)
 create mode 100644 include/asm-generic/kvm_para.h

--
1.7.5.4
[PATCH 5/5 V5] Add check for suspended vm in softlockup detector
A suspended VM can cause spurious soft lockup warnings. To avoid these, the
watchdog now checks if the kernel knows it was stopped by the host and skips
the warning if so. When the watchdog is reset successfully, clear the guest
paused flag.

Signed-off-by: Eric B Munson <emun...@mgebm.net>
Cc: mi...@redhat.com
Cc: h...@zytor.com
Cc: a...@arndb.de
Cc: ry...@linux.vnet.ibm.com
Cc: aligu...@us.ibm.com
Cc: mtosa...@redhat.com
Cc: jeremy.fitzhardi...@citrix.com
Cc: levinsasha...@gmail.com
Cc: Jan Kiszka <jan.kis...@siemens.com>
Cc: kvm@vger.kernel.org
Cc: linux-a...@vger.kernel.org
Cc: x...@kernel.org
Cc: linux-ker...@vger.kernel.org
---
Changes from V3:
 Clear the PAUSED flag when the watchdog is reset

 kernel/watchdog.c |   12
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 1d7bca7..7c62919 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -25,6 +25,7 @@
 #include <linux/sysctl.h>

 #include <asm/irq_regs.h>
+#include <linux/kvm_para.h>
 #include <linux/perf_event.h>

 int watchdog_enabled = 1;
@@ -280,6 +281,9 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 			__this_cpu_write(softlockup_touch_sync, false);
 			sched_clock_tick();
 		}
+
+		/* Clear the guest paused flag on watchdog reset */
+		kvm_check_and_clear_guest_paused(smp_processor_id());
 		__touch_watchdog();
 		return HRTIMER_RESTART;
 	}
@@ -292,6 +296,14 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 	 */
 	duration = is_softlockup(touch_ts);
 	if (unlikely(duration)) {
+		/*
+		 * If a virtual machine is stopped by the host it can look to
+		 * the watchdog like a soft lockup, check to see if the host
+		 * stopped the vm before we issue the warning
+		 */
+		if (kvm_check_and_clear_guest_paused(smp_processor_id()))
+			return HRTIMER_RESTART;
+
 		/* only warn once */
 		if (__this_cpu_read(soft_watchdog_warn) == true)
 			return HRTIMER_RESTART;
--
1.7.5.4
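The ordering of the checks in the soft lockup branch matters: the paused flag is consulted before the "only warn once" latch, so a pause never consumes the one allowed warning. A simplified user-space model is sketched below; struct wd_state, wd_tick() and the enum values are hypothetical, and the model leaves out the touch/reset path that the patch also changes.

```c
#include <stdbool.h>

enum wd_action { WD_QUIET, WD_SKIPPED_PAUSED, WD_WARN };

/* Hypothetical per-cpu watchdog state for this sketch. */
struct wd_state {
    bool guest_paused;   /* stands in for PVCLOCK_GUEST_STOPPED */
    bool warned;         /* "only warn once" latch */
};

/* Read a flag and clear it in one step, like the check-and-clear helper. */
static bool test_and_clear(bool *flag)
{
    bool old = *flag;

    *flag = false;
    return old;
}

/* One timer tick: decide whether a suspected lockup should be reported. */
static enum wd_action wd_tick(struct wd_state *s, bool lockup_suspected)
{
    if (!lockup_suspected) {
        s->warned = false;           /* healthy again, re-arm the warning */
        return WD_QUIET;
    }
    if (test_and_clear(&s->guest_paused))
        return WD_SKIPPED_PAUSED;    /* host stopped us, not a real lockup */
    if (s->warned)
        return WD_QUIET;             /* already reported this lockup */
    s->warned = true;
    return WD_WARN;
}
```

In this model a lockup that persists after the pause has been accounted for is still reported on the next tick, which matches the cover letter's aim of keeping real warnings visible.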
Re: winXP Standard PC HAL and qemu-kvm >= 0.15
On 05.12.2011 17:28, Avi Kivity wrote:
[]
>> I haven't debugged further yet, -- because it were not easy to find
>> out what was causing the regression and how to reproduce it, and
>> also because I don't think it is the right HAL for qemu-kvm guest
>> anyway.
>
> It's not, but the regression indicates we broke something. It would be
> good to know what that is.

So today I gave it a chance with git bisect, and here's what it found:

First bad commit ef390067a72fe09977bb4ac8211313e1503302ea
Merge: c7b3e90 0fd542f
Author: Avi Kivity <a...@redhat.com>
Date: Sun May 15 04:48:05 2011 -0400

    Merge commit '0fd542fb7d13ddf12f897bb27c5950f31638b1df' into upstream-merge

    * commit '0fd542fb7d13ddf12f897bb27c5950f31638b1df':
      cpu: add set_memory flag to request dirty logging
      piix_pci: load path clean up
      piix_pci: optimize set irq path
      piix_pci: eliminate PIIX3State::pci_irq_levels
      pci: add accessor function to get irq levels
      cirrus_vga: remove unneeded reset

    Conflicts:
      exec.c

    Signed-off-by: Avi Kivity <a...@redhat.com>

And just like with the 32/64bit lockup issue, this is a merge commit,
which is not exactly useful. Any guesses? :)

>> The problem is that so far, there's no known way to change to use
>> proper hal type in winXP (except of reinstalling the guest), and
>> there's no known workaround on the kvm side, so users are stuck
>> with older versions.
>>
>> So, if anybody have some thoughts about this issue, and especially
>> if you know a way to switch winXP HAL type to some ACPI variant
>> without reinstalling, please speak up.. ;)
>
> I remember doing it somewhere in device manager, perhaps in the
> processor entry. But it was years since I last did this.

As I already mentioned, changing HAL type works from anything to
Standard PC, but not back. I'll try to investigate.

>> Debian bugreport for a reference: http://bugs.debian.org/647312
>>
>> Reproducer: install a winXP guest on kvm with -no-acpi so it chooses
>> an Uniprocessor with MPS HAL. Switch it to Standard PC in device
>> manager, reboot -- in 0.15+ it does not work anymore, while in 0.14
>> it continues to work fine.
>
> Most likely non-ACPI interrupt routing.

The commit it bisected to talks about piix -- may it be related?

Thanks,

/mjt
Re: [PATCHv2 RFC] virtio-pci: flexible configuration layout
On Mon, Dec 05, 2011 at 11:16:05AM -0800, Jesse Barnes wrote:
> On Mon, 14 Nov 2011 20:18:55 +0200
> Michael S. Tsirkin <m...@redhat.com> wrote:
>
> > Add a flexible mechanism to specify virtio configuration layout, using
> > pci vendor-specific capability. A separate capability is used for each
> > of common, device specific and data-path accesses.
> >
> > Warning: compiled only.
> > This patch also needs to be split up, pci_iomap changes also need arch
> > updates for non-x86. There might also be more spec changes.
> >
> > Posting here for early feedback, and to allow Sasha to proceed with his
> > kvm tool work.
> >
> > Changes from v1:
> > Updated to match v3 of the spec, see:
> > Subject: [PATCHv3 RFC] virtio-spec: flexible configuration layout
> > Message-ID: 2010122436.ga13...@redhat.com
> > In-Reply-To: 2009195901.ga28...@redhat.com
>
> Looks like this conflicts with your other iomap changes... I didn't
> check your latest tree; do you just add another patch on top for the
> virtio changes now?
>
> Thanks,

Yes. Rusty asked for more changes so that isn't yet pushed.

> --
> Jesse Barnes, Intel Open Source Technology Center
[PATCH 4/5 V5] Add generic stubs for kvm stop check functions
Signed-off-by: Eric B Munson <emun...@mgebm.net>
Cc: mi...@redhat.com
Cc: h...@zytor.com
Cc: a...@arndb.de
Cc: ry...@linux.vnet.ibm.com
Cc: aligu...@us.ibm.com
Cc: mtosa...@redhat.com
Cc: jeremy.fitzhardi...@citrix.com
Cc: levinsasha...@gmail.com
Cc: Jan Kiszka <jan.kis...@siemens.com>
Cc: kvm@vger.kernel.org
Cc: linux-a...@vger.kernel.org
Cc: x...@kernel.org
Cc: linux-ker...@vger.kernel.org
---
 include/asm-generic/kvm_para.h |   14 ++
 1 files changed, 14 insertions(+), 0 deletions(-)
 create mode 100644 include/asm-generic/kvm_para.h

diff --git a/include/asm-generic/kvm_para.h b/include/asm-generic/kvm_para.h
new file mode 100644
index 000..177e1eb
--- /dev/null
+++ b/include/asm-generic/kvm_para.h
@@ -0,0 +1,14 @@
+#ifndef _ASM_GENERIC_KVM_PARA_H
+#define _ASM_GENERIC_KVM_PARA_H
+
+
+/*
+ * This function is used by architectures that support kvm to avoid issuing
+ * false soft lockup messages.
+ */
+static inline bool kvm_check_and_clear_guest_paused(int cpu)
+{
+	return false;
+}
+
+#endif
--
1.7.5.4
[PATCH 3/5 V5] Add ioctl for KVMCLOCK_GUEST_STOPPED
Now that we have a flag that will tell the guest it was suspended, create an
interface for that communication using a KVM ioctl.

Signed-off-by: Eric B Munson <emun...@mgebm.net>
Cc: mi...@redhat.com
Cc: h...@zytor.com
Cc: a...@arndb.de
Cc: ry...@linux.vnet.ibm.com
Cc: aligu...@us.ibm.com
Cc: mtosa...@redhat.com
Cc: jeremy.fitzhardi...@citrix.com
Cc: levinsasha...@gmail.com
Cc: Jan Kiszka <jan.kis...@siemens.com>
Cc: kvm@vger.kernel.org
Cc: linux-a...@vger.kernel.org
Cc: x...@kernel.org
Cc: linux-ker...@vger.kernel.org
---
Changes from V4:
 Rename KVM_GUEST_PAUSED to KVMCLOCK_GUEST_PAUSED
 Add new ioctl description to api.txt

 Documentation/virtual/kvm/api.txt |   12
 arch/x86/include/asm/kvm_host.h   |    2 ++
 arch/x86/kvm/x86.c                |   20
 include/linux/kvm.h               |    2 ++
 4 files changed, 36 insertions(+), 0 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 7945b0b..0f7dd99 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1450,6 +1450,18 @@ is supported; 2 if the processor requires all virtual machines to have
 an RMA, or 1 if the processor can use an RMA but doesn't require it,
 because it supports the Virtual RMA (VRMA) facility.

+4.64 KVMCLOCK_GUEST_PAUSED
+
+Capability: basic
+Architectures: Any that implement pvclocks (currently x86 only)
+Type: vcpu ioctl
+Parameters: None
+Returns: 0 on success, -1 on error
+
+This signals to the host kernel that the specified guest is being paused by
+userspace. The host will set a flag in the pvclock structure that is checked
+from the soft lockup watchdog.
+
 5. The kvm_run structure

 Application code obtains a pointer to the kvm_run structure by
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b4973f4..beb94c6 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -672,6 +672,8 @@ int kvm_pv_mmu_op(struct kvm_vcpu *vcpu, unsigned long bytes,
 		  gpa_t addr, unsigned long *ret);
 u8 kvm_get_guest_memory_type(struct kvm_vcpu *vcpu, gfn_t gfn);

+int kvm_set_guest_paused(struct kvm_vcpu *vcpu);
+
 extern bool tdp_enabled;

 u64 vcpu_tsc_khz(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c38efd7..1dab5fd 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3295,6 +3295,10 @@ long kvm_arch_vcpu_ioctl(struct file *filp,

 		goto out;
 	}
+	case KVMCLOCK_GUEST_PAUSED: {
+		r = kvm_set_guest_paused(vcpu);
+		break;
+	}
 	default:
 		r = -EINVAL;
 	}
@@ -6117,6 +6121,22 @@ int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int reason,
 }
 EXPORT_SYMBOL_GPL(kvm_task_switch);

+/*
+ * kvm_set_guest_paused() indicates to the guest kernel that it has been
+ * stopped by the hypervisor. This function will be called from the host only.
+ * EINVAL is returned when the host attempts to set the flag for a guest that
+ * does not support pv clocks.
+ */
+int kvm_set_guest_paused(struct kvm_vcpu *vcpu)
+{
+	struct pvclock_vcpu_time_info *src = &vcpu->arch.hv_clock;
+	if (!vcpu->arch.time_page)
+		return -EINVAL;
+	src->flags |= PVCLOCK_GUEST_STOPPED;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(kvm_set_guest_paused);
+
 int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
 				  struct kvm_sregs *sregs)
 {
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index c3892fc..1d1ddef 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -762,6 +762,8 @@ struct kvm_clock_data {
 #define KVM_CREATE_SPAPR_TCE	  _IOW(KVMIO,  0xa8, struct kvm_create_spapr_tce)
 /* Available with KVM_CAP_RMA */
 #define KVM_ALLOCATE_RMA	  _IOR(KVMIO,  0xa9, struct kvm_allocate_rma)
+/* VM is being stopped by host */
+#define KVMCLOCK_GUEST_PAUSED	  _IO(KVMIO,   0xaa)

 #define KVM_DEV_ASSIGN_ENABLE_IOMMU	(1 << 0)

--
1.7.5.4
Re: [net-next RFC PATCH 5/5] virtio-net: flow director support
On Mon, 2011-12-05 at 16:59 +0800, Jason Wang wrote:
> In order to let the packets of a flow to be passed to the desired
> guest cpu, we can co-operate with devices through programming the flow
> director which was just a hash to queue table.
>
> This kinds of co-operation is done through the accelerate RFS support,
> a device specific flow steering method virtnet_fd() is used to modify
> the flow director based on rfs mapping. The desired queue were
> calculated through reverse mapping of the irq affinity table. In order
> to parallelize the ingress path, irq affinity of rx queue were also
> provides by the driver.
>
> In addition to accelerate RFS, we can also use the guest scheduler to
> balance the load of TX and reduce the lock contention on egress path,
> so the processor_id() were used to tx queue selection.
[...]
> +#ifdef CONFIG_RFS_ACCEL
> +
> +int virtnet_fd(struct net_device *net_dev, const struct sk_buff *skb,
> +	       u16 rxq_index, u32 flow_id)
> +{
> +	struct virtnet_info *vi = netdev_priv(net_dev);
> +	u16 *table = NULL;
> +
> +	if (skb->protocol != htons(ETH_P_IP) || !skb->rxhash)
> +		return -EPROTONOSUPPORT;

Why only IPv4?

> +	table = kmap_atomic(vi->fd_page);
> +	table[skb->rxhash & TAP_HASH_MASK] = rxq_index;
> +	kunmap_atomic(table);
> +
> +	return 0;
> +}
> +#endif

This is not a proper implementation of ndo_rx_flow_steer. If you steer
a flow by changing the RSS table this can easily cause packet reordering
in other flows. The filtering should be more precise, ideally matching
exactly a single flow by e.g. VID and IP 5-tuple.

I think you need to add a second hash table which records exactly which
flow is supposed to be steered. Also, you must call
rps_may_expire_flow() to check whether an entry in this table may be
replaced; otherwise you can cause packet reordering in the flow that was
previously being steered.

Finally, this function must return the table index it assigned, so that
rps_may_expire_flow() works.

> +static u16 virtnet_select_queue(struct net_device *dev, struct sk_buff *skb)
> +{
> +	int txq = skb_rx_queue_recorded(skb) ? skb_get_rx_queue(skb) :
> +					       smp_processor_id();
> +
> +	/* As we make use of the accelerate rfs which let the scheduler to
> +	 * balance the load, it make sense to choose the tx queue also based on
> +	 * the processor id?
> +	 */
> +	while (unlikely(txq >= dev->real_num_tx_queues))
> +		txq -= dev->real_num_tx_queues;
> +	return txq;
> +}
[...]

Don't do this, let XPS handle it.

Ben.

--
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
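The "second hash table" suggested in the review can be pictured as an exact-match table: a slot is reused for a different flow only when an expiry check allows it, and the slot index is handed back to the caller. The sketch below is a user-space illustration under stated assumptions; struct flow_key, steer_flow(), lookup_queue() and the may_expire_fn callback are hypothetical, and the callback merely stands in for rps_may_expire_flow(), whose real signature differs.

```c
#include <stdbool.h>
#include <stdint.h>

#define STEER_SLOTS 64u   /* hypothetical table size */

/* Hypothetical exact-match key: the fields that identify one flow. */
struct flow_key {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
    uint8_t  proto;
};

struct steer_entry {
    bool            used;
    struct flow_key key;
    uint16_t        rxq;
};

struct steer_table {
    struct steer_entry slot[STEER_SLOTS];
};

/* Stand-in for rps_may_expire_flow(): may the flow in this slot be evicted? */
typedef bool (*may_expire_fn)(unsigned slot);

static bool key_equal(const struct flow_key *a, const struct flow_key *b)
{
    return a->saddr == b->saddr && a->daddr == b->daddr &&
           a->sport == b->sport && a->dport == b->dport &&
           a->proto == b->proto;
}

/*
 * Steer one flow to rxq. Returns the slot index assigned, or -1 if the
 * slot holds a different flow that may not be expired yet.
 */
static int steer_flow(struct steer_table *t, const struct flow_key *k,
                      uint32_t hash, uint16_t rxq, may_expire_fn may_expire)
{
    unsigned idx = hash % STEER_SLOTS;
    struct steer_entry *e = &t->slot[idx];

    if (e->used && !key_equal(&e->key, k) && !may_expire(idx))
        return -1;

    e->used = true;
    e->key = *k;
    e->rxq = rxq;
    return (int)idx;
}

/* Only packets that match the recorded flow exactly are steered. */
static int lookup_queue(const struct steer_table *t, const struct flow_key *k,
                        uint32_t hash)
{
    const struct steer_entry *e = &t->slot[hash % STEER_SLOTS];

    return (e->used && key_equal(&e->key, k)) ? (int)e->rxq : -1;
}

/* Hypothetical expiry policies used to exercise the sketch. */
static bool never_expire(unsigned slot)  { (void)slot; return false; }
static bool always_expire(unsigned slot) { (void)slot; return true; }
```

A lookup result of -1 means "not steered", in which case a real driver would fall back to its ordinary hash-based queue choice, so unrelated flows sharing the slot are not moved.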
kvm deadlock
Hello,

I am struggling with repeatable full hardware locks when running 8-12 KVM
vms. At some point before the hard lock I get an inconsistent lock state
warning. An example of this can be found here:

http://pastebin.com/8wKhgE2C

After that the server continues to run for a while and then starts its death
spiral. When it reaches that point it fails to log anything further to the
disk, but by attaching a console I have been able to get a stack trace
documenting the final implosion:

http://pastebin.com/PbcN76bd

All of the cores end up hung and the server stops responding to all input,
including SysRq commands.

I have seen this behavior on two machines (dual E5606 running Fedora 16); both
passed cpuburnin testing and memtest86 scans without error.

I have reproduced the crash and stack traces from a Fedora debugging kernel -
3.1.2-1 and with a vanilla 3.1.4 kernel.

Nate Custer
QA Analyst
cPanel Inc
Re: [Xen-devel] [PATCH RFC V3 1/4] debugfs: Add support to print u32 array in debugfs
On Wed, Nov 30, 2011 at 02:29:39PM +0530, Raghavendra K T wrote:
> Add debugfs support to print u32-arrays in debugfs. Move the code from Xen
> to debugfs to make the code common for other users as well.
>
> Signed-off-by: Srivatsa Vaddagiri <va...@linux.vnet.ibm.com>
> Signed-off-by: Suzuki Poulose <suz...@in.ibm.com>
> Signed-off-by: Raghavendra K T <raghavendra...@linux.vnet.ibm.com>

Looks good to me.

> ---
> diff --git a/arch/x86/xen/debugfs.c b/arch/x86/xen/debugfs.c
> index 7c0fedd..c8377fb 100644
> --- a/arch/x86/xen/debugfs.c
> +++ b/arch/x86/xen/debugfs.c
> @@ -19,107 +19,3 @@ struct dentry * __init xen_init_debugfs(void)
>  	return d_xen_debug;
>  }
>
> -struct array_data
> -{
> -	void *array;
> -	unsigned elements;
> -};
> -
> -static int u32_array_open(struct inode *inode, struct file *file)
> -{
> -	file->private_data = NULL;
> -	return nonseekable_open(inode, file);
> -}
> -
> -static size_t format_array(char *buf, size_t bufsize, const char *fmt,
> -			   u32 *array, unsigned array_size)
> -{
> -	size_t ret = 0;
> -	unsigned i;
> -
> -	for(i = 0; i < array_size; i++) {
> -		size_t len;
> -
> -		len = snprintf(buf, bufsize, fmt, array[i]);
> -		len++;	/* ' ' or '\n' */
> -		ret += len;
> -
> -		if (buf) {
> -			buf += len;
> -			bufsize -= len;
> -			buf[-1] = (i == array_size-1) ? '\n' : ' ';
> -		}
> -	}
> -
> -	ret++;		/* \0 */
> -	if (buf)
> -		*buf = '\0';
> -
> -	return ret;
> -}
> -
> -static char *format_array_alloc(const char *fmt, u32 *array, unsigned array_size)
> -{
> -	size_t len = format_array(NULL, 0, fmt, array, array_size);
> -	char *ret;
> -
> -	ret = kmalloc(len, GFP_KERNEL);
> -	if (ret == NULL)
> -		return NULL;
> -
> -	format_array(ret, len, fmt, array, array_size);
> -	return ret;
> -}
> -
> -static ssize_t u32_array_read(struct file *file, char __user *buf, size_t len,
> -			      loff_t *ppos)
> -{
> -	struct inode *inode = file->f_path.dentry->d_inode;
> -	struct array_data *data = inode->i_private;
> -	size_t size;
> -
> -	if (*ppos == 0) {
> -		if (file->private_data) {
> -			kfree(file->private_data);
> -			file->private_data = NULL;
> -		}
> -
> -		file->private_data = format_array_alloc("%u", data->array, data->elements);
> -	}
> -
> -	size = 0;
> -	if (file->private_data)
> -		size = strlen(file->private_data);
> -
> -	return simple_read_from_buffer(buf, len, ppos, file->private_data, size);
> -}
> -
> -static int xen_array_release(struct inode *inode, struct file *file)
> -{
> -	kfree(file->private_data);
> -
> -	return 0;
> -}
> -
> -static const struct file_operations u32_array_fops = {
> -	.owner	= THIS_MODULE,
> -	.open	= u32_array_open,
> -	.release= xen_array_release,
> -	.read	= u32_array_read,
> -	.llseek = no_llseek,
> -};
> -
> -struct dentry *xen_debugfs_create_u32_array(const char *name, mode_t mode,
> -					    struct dentry *parent,
> -					    u32 *array, unsigned elements)
> -{
> -	struct array_data *data = kmalloc(sizeof(*data), GFP_KERNEL);
> -
> -	if (data == NULL)
> -		return NULL;
> -
> -	data->array = array;
> -	data->elements = elements;
> -
> -	return debugfs_create_file(name, mode, parent, data, &u32_array_fops);
> -}
> diff --git a/arch/x86/xen/debugfs.h b/arch/x86/xen/debugfs.h
> index e281320..12ebf33 100644
> --- a/arch/x86/xen/debugfs.h
> +++ b/arch/x86/xen/debugfs.h
> @@ -3,8 +3,4 @@
>
>  struct dentry * __init xen_init_debugfs(void);
>
> -struct dentry *xen_debugfs_create_u32_array(const char *name, mode_t mode,
> -					    struct dentry *parent,
> -					    u32 *array, unsigned elements);
> -
>  #endif /* _XEN_DEBUGFS_H */
> diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
> index fc506e6..14a8961 100644
> --- a/arch/x86/xen/spinlock.c
> +++ b/arch/x86/xen/spinlock.c
> @@ -286,7 +286,7 @@ static int __init xen_spinlock_debugfs(void)
>  	debugfs_create_u64("time_blocked", 0444, d_spin_debug,
>  			   &spinlock_stats.time_blocked);
>
> -	xen_debugfs_create_u32_array("histo_blocked", 0444, d_spin_debug,
> +	debugfs_create_u32_array("histo_blocked", 0444, d_spin_debug,
>  				     spinlock_stats.histo_spin_blocked, HISTO_BUCKETS + 1);
>
>  	return 0;
> diff --git a/fs/debugfs/file.c b/fs/debugfs/file.c
> index 90f7657..df44ccf 100644
> --- a/fs/debugfs/file.c
> +++ b/fs/debugfs/file.c
> @@ -18,6 +18,7 @@
>  #include <linux/pagemap.h>
>  #include <linux/namei.h>
>  #include <linux/debugfs.h>
> +#include <linux/slab.h>
>
>  static ssize_t default_read_file(struct file *file, char __user *buf,
>  				 size_t count, loff_t *ppos)
> @@ -525,3 +526,130 @@
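The quoted format_array()/format_array_alloc() pair uses a measure-then-fill pattern: the first call runs with a NULL buffer to learn the size, the second writes into a buffer of exactly that size. A user-space sketch of the same idea is shown below, assuming standard C99 snprintf() behaviour; format_u32_array() and format_u32_array_alloc() are illustrative names rather than the kernel functions, and the separator is emitted through the format string instead of being patched in afterwards.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Format n values as "a b c\n". With buf == NULL and bufsize == 0 nothing
 * is written and only the required size is computed.
 * Returns the number of bytes needed, including the trailing NUL.
 */
static size_t format_u32_array(char *buf, size_t bufsize,
                               const uint32_t *array, size_t n)
{
    size_t total = 0;

    if (buf && bufsize)
        *buf = '\0';                 /* defined result even when n == 0 */

    for (size_t i = 0; i < n; i++) {
        char sep = (i == n - 1) ? '\n' : ' ';
        int len = snprintf(buf, bufsize, "%u%c", (unsigned)array[i], sep);

        if (len < 0)
            break;                   /* encoding error, stop early */
        total += (size_t)len;

        if (buf) {
            if ((size_t)len >= bufsize) {
                buf = NULL;          /* out of room: keep counting only */
                bufsize = 0;
            } else {
                buf += len;
                bufsize -= (size_t)len;
            }
        }
    }
    return total + 1;
}

/* Measure first, then allocate exactly that much and fill it. */
static char *format_u32_array_alloc(const uint32_t *array, size_t n)
{
    size_t len = format_u32_array(NULL, 0, array, n);
    char *ret = malloc(len);

    if (ret)
        format_u32_array(ret, len, array, n);
    return ret;
}
```

The caller owns the returned string and releases it with free(), much as the quoted release handler does with kfree().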
[no subject]
subscribe kvm
Re: [PATCH RFC V3 4/4] kvm : pv-ticketlocks support for linux guests running on KVM hypervisor
On Wed, Nov 30, 2011 at 02:30:38PM +0530, Raghavendra K T wrote:
> This patch extends Linux guests running on KVM hypervisor to support
> pv-ticketlocks. During smp_boot_cpus paravirtualized KVM guest detects if
> the hypervisor has required feature (KVM_FEATURE_KICK_VCPU) to support
> pv-ticketlocks. If so, support for pv-ticketlocks is registered via
> pv_lock_ops.
>
> Signed-off-by: Srivatsa Vaddagiri <va...@linux.vnet.ibm.com>
> Signed-off-by: Suzuki Poulose <suz...@in.ibm.com>
> Signed-off-by: Raghavendra K T <raghavendra...@linux.vnet.ibm.com>
> ---
> diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
> index 8b1d65d..7e419ad 100644
> --- a/arch/x86/include/asm/kvm_para.h
> +++ b/arch/x86/include/asm/kvm_para.h
> @@ -195,10 +195,21 @@ void kvm_async_pf_task_wait(u32 token);
>  void kvm_async_pf_task_wake(u32 token);
>  u32 kvm_read_and_reset_pf_reason(void);
>  extern void kvm_disable_steal_time(void);
> -#else
> -#define kvm_guest_init() do { } while (0)
> +
> +#ifdef CONFIG_PARAVIRT_SPINLOCKS
> +void __init kvm_spinlock_init(void);
> +#else /* CONFIG_PARAVIRT_SPINLOCKS */
> +static void kvm_spinlock_init(void)
> +{
> +}
> +#endif /* CONFIG_PARAVIRT_SPINLOCKS */
> +
> +#else /* CONFIG_KVM_GUEST */
> +#define kvm_guest_init() do {} while (0)
>  #define kvm_async_pf_task_wait(T) do {} while(0)
>  #define kvm_async_pf_task_wake(T) do {} while(0)
> +#define kvm_spinlock_init() do {} while (0)
> +
>  static inline u32 kvm_read_and_reset_pf_reason(void)
>  {
>  	return 0;
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index a9c2116..dffeea3 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -33,6 +33,7 @@
>  #include <linux/sched.h>
>  #include <linux/slab.h>
>  #include <linux/kprobes.h>
> +#include <linux/debugfs.h>
>  #include <asm/timer.h>
>  #include <asm/cpu.h>
>  #include <asm/traps.h>
> @@ -545,6 +546,7 @@ static void __init kvm_smp_prepare_boot_cpu(void)
>  #endif
>  	kvm_guest_cpu_init();
>  	native_smp_prepare_boot_cpu();
> +	kvm_spinlock_init();
>  }
>
>  static void __cpuinit kvm_guest_cpu_online(void *dummy)
> @@ -627,3 +629,248 @@ static __init int activate_jump_labels(void)
>  	return 0;
>  }
>  arch_initcall(activate_jump_labels);
> +
> +#ifdef CONFIG_PARAVIRT_SPINLOCKS
> +
> +enum kvm_contention_stat {
> +	TAKEN_SLOW,
> +	TAKEN_SLOW_PICKUP,
> +	RELEASED_SLOW,
> +	RELEASED_SLOW_KICKED,
> +	NR_CONTENTION_STATS
> +};
> +
> +#ifdef CONFIG_KVM_DEBUG_FS
> +
> +static struct kvm_spinlock_stats
> +{
> +	u32 contention_stats[NR_CONTENTION_STATS];
> +
> +#define HISTO_BUCKETS	30
> +	u32 histo_spin_blocked[HISTO_BUCKETS+1];
> +
> +	u64 time_blocked;
> +} spinlock_stats;
> +
> +static u8 zero_stats;
> +
> +static inline void check_zero(void)
> +{
> +	u8 ret;
> +	u8 old = ACCESS_ONCE(zero_stats);
> +	if (unlikely(old)) {
> +		ret = cmpxchg(&zero_stats, old, 0);
> +		/* This ensures only one fellow resets the stat */
> +		if (ret == old)
> +			memset(&spinlock_stats, 0, sizeof(spinlock_stats));
> +	}
> +}
> +
> +static inline void add_stats(enum kvm_contention_stat var, int val)

You probably want 'int val' to be 'u32 val' as that is the type
in contention_stats.
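The check_zero() idiom quoted above (a reset request flag that exactly one caller consumes via compare-and-exchange) can be approximated in user space with C11 atomics. This is a sketch under stated assumptions: the struct layout is cut down, add_stat() is a hypothetical helper, and whether the kernel's add_stats() calls check_zero() first is assumed here, since the quoted patch text stops at its signature.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define NR_STATS 4

/* Cut-down stand-in for struct kvm_spinlock_stats. */
static struct {
    uint32_t contention[NR_STATS];
    uint64_t time_blocked;
} stats;

/* Set to non-zero (e.g. from a debugfs write) to request a reset. */
static _Atomic uint8_t zero_stats;

static void check_zero(void)
{
    uint8_t old = atomic_load(&zero_stats);

    if (old) {
        /* Only the caller whose exchange succeeds performs the reset. */
        if (atomic_compare_exchange_strong(&zero_stats, &old, 0))
            memset(&stats, 0, sizeof(stats));
    }
}

/* The value is uint32_t to match the counter type, as the review suggests. */
static void add_stat(unsigned idx, uint32_t val)
{
    check_zero();
    if (idx < NR_STATS)
        stats.contention[idx] += val;
}
```

As in the original, the counters themselves are updated without atomics, so concurrent updates may be lost; that is usually acceptable for debug statistics.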
[PATCH 00/28] kvm tools: Prepare kvmtool for another architecture
Hi,

This patch series rearranges and tidies various parts of kvmtool to pave the
way for the addition of support for another architecture -- SPAPR PPC64. A
second patch series will follow to present the PPC64 support.

kvmtool is extremely x86-specific, so a fair chunk of refactoring into common
code vs architecture-specific code is performed in this set. It also has a
(refreshingly small) set of endian bugs that are fixed, plus assumptions about
the hardware presented to the guest.

I've started the series with the main meat -- moving/renaming things like
bios, CPU setup, guest address space layout, interrupts, ioports etc., into a
new x86/ directory. The Makefile determines an architecture and builds the
appropriate dir, devices, etc.

Follow-on patches change some of the mechanics, for example modifying the loop
around ioctl(KVM_RUN) so that whilst it stays generic, it calls into
arch-specific code to handle specific exit reasons, MMIO etc. The builtin-run
initialisation path is rationalised so that PCI IRQs are initialised before
devices, and all of this happens before arch-specific code is given the chance
to initialise any firmware and generate any device trees.

Most of this series is fairly trivial, in moving code, making definitions
arch-local or available via a header, endian sanitisation. The PCI code
changes are probably most 'interesting', in that I have made the config space
accesses available to those not using the PC ioport access method, plus
wrapped initialisations of config space with cpu_to_leXX accesses.

If there's anything in this series that'll cause the world to end, or stain,
do let me know. :)

Cheers,

Matt

Matt Evans (28):
  kvm tools: Split x86 arch-specific bits into x86/
  kvm tools: Only build/init i8042 on x86
  kvm tools: Add Makefile parameter for kernel include path
  kvm tools: Re-arrange Makefile to heed CFLAGS before checking for
    optional libs
  kvm tools: 64-bit tidy; use PRIx64 when printf'ing u64s and link
    appropriately
  kvm tools: Add arch-specific KVM_RUN exit handling via
    kvm_cpu__handle_exit()
  kvm tools: Move 'kvm__recommended_cpus' to arch-specific code
  kvm tools: Fix KVM_RUN exit code check
  kvm tools: Add kvm__arch_periodic_poll()
  kvm tools: term.h needs to include stdbool.h
  kvm tools: kvm.c needs to include sys/stat.h for mkdir
  kvm tools: Move arch-specific cmdline init into kvm__arch_set_cmdline()
  kvm tools: Add CONSOLE_HV term type and allow it to be selected
  kvm tools: Fix term_getc(), term_getc_iov() endian bugs
  kvm tools: Allow initrd_check() to match a cpio
  kvm tools: Allow load_flat_binary() to load an initrd alongside
  kvm tools: Only call symbol__init() if we have BFD
  kvm tools: Initialise PCI before devices start getting registered with
    PCI
  kvm tools: Perform CPU and firmware setup after devices are added
  kvm tools: Init IRQs after determining nrcpus
  kvm tools: Add --hugetlbfs option to specify memory path
  kvm tools: Move PCI_MAX_DEVICES to pci.h
  kvm tools: Endian-sanitise pci.h and PCI device setup
  kvm tools: Fix virtio-pci endian bug when reading VIRTIO_PCI_QUEUE_NUM
  kvm tools: Correctly set virtio-pci bar_size and remove hardwired
    address
  kvm tools: Add pci__config_{rd,wr}(), pci__find_dev() and fix PCI config
    register addressing
  kvm tools: Arch-specific define for PCI MMIO allocation area
  kvm tools: Create arch-specific kvm_cpu__emulate_io()

 tools/kvm/Makefile                       |  139 +---
 tools/kvm/builtin-run.c                  |   82 +++--
 tools/kvm/builtin-stat.c                 |    4 +-
 tools/kvm/disk/core.c                    |    4 +-
 tools/kvm/hw/pci-shmem.c                 |   23 +-
 tools/kvm/hw/vesa.c                      |   15 +-
 tools/kvm/include/kvm/ioport.h           |   13 +-
 tools/kvm/include/kvm/kvm-cpu.h          |   30 +--
 tools/kvm/include/kvm/kvm.h              |   62 +---
 tools/kvm/include/kvm/pci.h              |   30 ++-
 tools/kvm/include/kvm/term.h             |    2 +
 tools/kvm/ioport.c                       |   54 ---
 tools/kvm/kvm-cpu.c                      |  407 +-
 tools/kvm/kvm.c                          |  374 +---
 tools/kvm/mmio.c                         |    4 +-
 tools/kvm/pci.c                          |   76 +++--
 tools/kvm/term.c                         |    5 +-
 tools/kvm/virtio/pci.c                   |   51 ++--
 tools/kvm/{ => x86}/bios.c               |    0
 tools/kvm/{ => x86}/bios/.gitignore      |    0
 tools/kvm/{ => x86}/bios/bios-rom.S      |    2 +-
 tools/kvm/{ => x86}/bios/e820.c          |    0
 tools/kvm/{ => x86}/bios/entry.S         |    0
 tools/kvm/{ => x86}/bios/gen-offsets.sh  |    0
 tools/kvm/{ => x86}/bios/int10.c         |    0
 tools/kvm/{ => x86}/bios/int15.c
[PATCH 02/28] kvm tools: Only build/init i8042 on x86
Not every architecture has an i8042 kbd controller, so only use this when
building for x86.

Signed-off-by: Matt Evans <m...@ozlabs.org>
---
 tools/kvm/Makefile      |    2 +-
 tools/kvm/builtin-run.c |    2 ++
 2 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile
index 243886e..f58a1d8 100644
--- a/tools/kvm/Makefile
+++ b/tools/kvm/Makefile
@@ -77,7 +77,6 @@ OBJS += util/strbuf.o
 OBJS	+= virtio/9p.o
 OBJS	+= virtio/9p-pdu.o
 OBJS	+= hw/vesa.o
-OBJS	+= hw/i8042.o
 OBJS	+= hw/pci-shmem.o
 OBJS	+= kvm-ipc.o
 
@@ -153,6 +152,7 @@ ifeq ($(ARCH),x86)
 	OBJS	+= x86/kvm.o
 	OBJS	+= x86/kvm-cpu.o
 	OBJS	+= x86/mptable.o
+	OBJS	+= hw/i8042.o
 # Exclude BIOS object files from header dependencies.
 	OTHEROBJS	+= x86/bios.o
 	OTHEROBJS	+= x86/bios/bios-rom.o
diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c
index 9148d83..e4aa87e 100644
--- a/tools/kvm/builtin-run.c
+++ b/tools/kvm/builtin-run.c
@@ -941,7 +941,9 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix)
 
 	kvm__init_ram(kvm);
 
+#ifdef CONFIG_X86
 	kbd__init(kvm);
+#endif
 
 	pci_shmem__init(kvm);
 
[PATCH 03/28] kvm tools: Add Makefile parameter for kernel include path
This patch adds an 'I' parameter to override the default kernel include path
of '../../include'.

Signed-off-by: Matt Evans <m...@ozlabs.org>
---
 tools/kvm/Makefile |    9 +++++++--
 1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile
index f58a1d8..f85a154 100644
--- a/tools/kvm/Makefile
+++ b/tools/kvm/Makefile
@@ -9,7 +9,12 @@ else
 	E = @\#
 	Q =
 endif
-export E Q
+ifneq ($(I), )
+	KINCL_PATH=$(I)
+else
+	KINCL_PATH=../..
+endif
+export E Q KINCL_PATH
 
 include config/utilities.mak
 include config/feature-tests.mak
@@ -176,7 +181,7 @@ DEFINES += -DKVMTOOLS_VERSION='"$(KVMTOOLS_VERSION)"'
 DEFINES	+= -DBUILD_ARCH='"$(ARCH)"'
 
 KVM_INCLUDE := include
-CFLAGS	+= $(CPPFLAGS) $(DEFINES) -I$(KVM_INCLUDE) -I$(ARCH_INCLUDE) -I../../include -I../../arch/$(ARCH)/include/ -Os -g
+CFLAGS	+= $(CPPFLAGS) $(DEFINES) -I$(KVM_INCLUDE) -I$(ARCH_INCLUDE) -I$(KINCL_PATH)/include -I$(KINCL_PATH)/arch/$(ARCH)/include/ -Os -g
 
 ifneq ($(WERROR),0)
 	WARNINGS += -Werror
[PATCH 04/28] kvm tools: Re-arrange Makefile to heed CFLAGS before checking for optional libs
The checks for optional libraries build code to perform the tests, so should
respect certain CFLAGS -- in particular, -m64 so we check for 64bit libraries
if they're required.

Signed-off-by: Matt Evans <m...@ozlabs.org>
---
 tools/kvm/Makefile |   86 ++-
 1 files changed, 44 insertions(+), 42 deletions(-)

diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile
index f85a154..009a6ba 100644
--- a/tools/kvm/Makefile
+++ b/tools/kvm/Makefile
@@ -85,48 +85,6 @@ OBJS += hw/vesa.o
 OBJS	+= hw/pci-shmem.o
 OBJS	+= kvm-ipc.o
 
-FLAGS_BFD := $(CFLAGS) -lbfd
-has_bfd := $(call try-cc,$(SOURCE_BFD),$(FLAGS_BFD))
-ifeq ($(has_bfd),y)
-	CFLAGS	+= -DCONFIG_HAS_BFD
-	OBJS	+= symbol.o
-	LIBS	+= -lbfd
-endif
-
-FLAGS_VNCSERVER := $(CFLAGS) -lvncserver
-has_vncserver := $(call try-cc,$(SOURCE_VNCSERVER),$(FLAGS_VNCSERVER))
-ifeq ($(has_vncserver),y)
-	OBJS	+= ui/vnc.o
-	CFLAGS	+= -DCONFIG_HAS_VNCSERVER
-	LIBS	+= -lvncserver
-endif
-
-FLAGS_SDL := $(CFLAGS) -lSDL
-has_SDL := $(call try-cc,$(SOURCE_SDL),$(FLAGS_SDL))
-ifeq ($(has_SDL),y)
-	OBJS	+= ui/sdl.o
-	CFLAGS	+= -DCONFIG_HAS_SDL
-	LIBS	+= -lSDL
-endif
-
-FLAGS_ZLIB := $(CFLAGS) -lz
-has_ZLIB := $(call try-cc,$(SOURCE_ZLIB),$(FLAGS_ZLIB))
-ifeq ($(has_ZLIB),y)
-	CFLAGS	+= -DCONFIG_HAS_ZLIB
-	LIBS	+= -lz
-endif
-
-FLAGS_AIO := $(CFLAGS) -laio
-has_AIO := $(call try-cc,$(SOURCE_AIO),$(FLAGS_AIO))
-ifeq ($(has_AIO),y)
-	CFLAGS	+= -DCONFIG_HAS_AIO
-	LIBS	+= -laio
-endif
-
-LIBS	+= -lrt
-LIBS	+= -lpthread
-LIBS	+= -lutil
-
 # Additional ARCH settings for x86
 ARCH ?= $(shell echo $(uname_M) | sed -e s/i.86/i386/ -e s/sun4u/sparc64/ \
                   -e s/arm.*/arm/ -e s/sa110/arm/ \
@@ -172,6 +130,50 @@ else
 	UNSUPP_ERR =
 endif
 
+
+FLAGS_BFD := $(CFLAGS) -lbfd
+has_bfd := $(call try-cc,$(SOURCE_BFD),$(FLAGS_BFD))
+ifeq ($(has_bfd),y)
+	CFLAGS	+= -DCONFIG_HAS_BFD
+	OBJS	+= symbol.o
+	LIBS	+= -lbfd
+endif
+
+FLAGS_VNCSERVER := $(CFLAGS) -lvncserver
+has_vncserver := $(call try-cc,$(SOURCE_VNCSERVER),$(FLAGS_VNCSERVER))
+ifeq ($(has_vncserver),y)
+	OBJS	+= ui/vnc.o
+	CFLAGS	+= -DCONFIG_HAS_VNCSERVER
+	LIBS	+= -lvncserver
+endif
+
+FLAGS_SDL := $(CFLAGS) -lSDL
+has_SDL := $(call try-cc,$(SOURCE_SDL),$(FLAGS_SDL))
+ifeq ($(has_SDL),y)
+	OBJS	+= ui/sdl.o
+	CFLAGS	+= -DCONFIG_HAS_SDL
+	LIBS	+= -lSDL
+endif
+
+FLAGS_ZLIB := $(CFLAGS) -lz
+has_ZLIB := $(call try-cc,$(SOURCE_ZLIB),$(FLAGS_ZLIB))
+ifeq ($(has_ZLIB),y)
+	CFLAGS	+= -DCONFIG_HAS_ZLIB
+	LIBS	+= -lz
+endif
+
+FLAGS_AIO := $(CFLAGS) -laio
+has_AIO := $(call try-cc,$(SOURCE_AIO),$(FLAGS_AIO))
+ifeq ($(has_AIO),y)
+	CFLAGS	+= -DCONFIG_HAS_AIO
+	LIBS	+= -laio
+endif
+
+LIBS	+= -lrt
+LIBS	+= -lpthread
+LIBS	+= -lutil
+
+
 DEPS	:= $(patsubst %.o,%.d,$(OBJS))
 OBJS	+= $(OTHEROBJS)
 
[PATCH 05/28] kvm tools: 64-bit tidy; use PRIx64 when printf'ing u64s and link appropriately
On LP64 systems our u64s are just longs; remove the %llx'es in favour of PRIx64
etc. This patch also adds CFLAGS to the final link, so that any -m64 is obeyed
when linking, too.

Signed-off-by: Matt Evans <m...@ozlabs.org>
---
 tools/kvm/Makefile       |    2 +-
 tools/kvm/builtin-run.c  |   14 ++++++++------
 tools/kvm/builtin-stat.c |    4 +++-
 tools/kvm/disk/core.c    |    4 +++-
 tools/kvm/mmio.c         |    4 +++-
 5 files changed, 18 insertions(+), 10 deletions(-)

diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile
index 009a6ba..57dc521 100644
--- a/tools/kvm/Makefile
+++ b/tools/kvm/Makefile
@@ -218,7 +218,7 @@ KVMTOOLS-VERSION-FILE:
 
 $(PROGRAM): $(DEPS) $(OBJS)
 	$(E) "  LINK    " $@
-	$(Q) $(CC) $(OBJS) $(LIBS) -o $@
+	$(Q) $(CC) $(CFLAGS) $(OBJS) $(LIBS) -o $@
 
 $(GUEST_INIT): guest/init.c
 	$(E) "  LINK    " $@
diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c
index e4aa87e..7cf208d 100644
--- a/tools/kvm/builtin-run.c
+++ b/tools/kvm/builtin-run.c
@@ -42,6 +42,8 @@
 #include <stdlib.h>
 #include <string.h>
 #include <unistd.h>
+#define __STDC_FORMAT_MACROS
+#include <inttypes.h>
 #include <ctype.h>
 #include <stdio.h>
 
@@ -383,8 +385,8 @@ static int shmem_parser(const struct option *opt, const char *arg, int unset)
 		strcpy(handle, default_handle);
 	}
 	if (verbose) {
-		pr_info("shmem: phys_addr = %llx", phys_addr);
-		pr_info("shmem: size = %llx", size);
+		pr_info("shmem: phys_addr = %"PRIx64, phys_addr);
+		pr_info("shmem: size = %"PRIx64, size);
 		pr_info("shmem: handle= %s", handle);
 		pr_info("shmem: create= %d", create);
 	}
@@ -545,7 +547,7 @@ panic_kvm:
 		current_kvm_cpu->kvm_run->exit_reason,
 		kvm_exit_reasons[current_kvm_cpu->kvm_run->exit_reason]);
 	if (current_kvm_cpu->kvm_run->exit_reason == KVM_EXIT_UNKNOWN)
-		fprintf(stderr, "KVM exit code: 0x%Lu\n",
+		fprintf(stderr, "KVM exit code: 0x%"PRIx64"\n",
 			current_kvm_cpu->kvm_run->hw.hardware_exit_reason);
 
 	kvm_cpu__set_debug_fd(STDOUT_FILENO);
@@ -760,10 +762,10 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix)
 		ram_size	= get_ram_size(nrcpus);
 
 	if (ram_size < MIN_RAM_SIZE_MB)
-		die("Not enough memory specified: %lluMB (min %lluMB)", ram_size, MIN_RAM_SIZE_MB);
+		die("Not enough memory specified: %"PRIu64"MB (min %lluMB)", ram_size, MIN_RAM_SIZE_MB);
 
 	if (ram_size > host_ram_size())
-		pr_warning("Guest memory size %lluMB exceeds host physical RAM size %lluMB", ram_size, host_ram_size());
+		pr_warning("Guest memory size %"PRIu64"MB exceeds host physical RAM size %"PRIu64"MB", ram_size, host_ram_size());
 
 	ram_size <<= MB_SHIFT;
 
@@ -878,7 +880,7 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix)
 		virtio_blk__init_all(kvm);
 	}
 
-	printf(" # kvm run -k %s -m %Lu -c %d --name %s\n", kernel_filename, ram_size / 1024 / 1024, nrcpus, guest_name);
+	printf(" # kvm run -k %s -m %"PRId64" -c %d --name %s\n", kernel_filename, ram_size / 1024 / 1024, nrcpus, guest_name);
 
 	if (!kvm__load_kernel(kvm, kernel_filename, initrd_filename,
 				real_cmdline, vidmode))
diff --git a/tools/kvm/builtin-stat.c b/tools/kvm/builtin-stat.c
index e28eb5b..c1f2605 100644
--- a/tools/kvm/builtin-stat.c
+++ b/tools/kvm/builtin-stat.c
@@ -9,6 +9,8 @@
 #include <stdio.h>
 #include <string.h>
 #include <signal.h>
+#define __STDC_FORMAT_MACROS
+#include <inttypes.h>
 
 #include <linux/virtio_balloon.h>
 
@@ -97,7 +99,7 @@ static int do_memstat(const char *name, int sock)
 			printf("The total amount of memory available (in bytes):");
 			break;
 		}
-		printf("%llu\n", stats[i].val);
+		printf("%"PRId64"\n", stats[i].val);
 	}
 	printf("\n");
 
diff --git a/tools/kvm/disk/core.c b/tools/kvm/disk/core.c
index 4915efd..a135851 100644
--- a/tools/kvm/disk/core.c
+++ b/tools/kvm/disk/core.c
@@ -4,6 +4,8 @@
 
 #include <sys/eventfd.h>
 #include <sys/poll.h>
+#define __STDC_FORMAT_MACROS
+#include <inttypes.h>
 
 #define AIO_MAX 32
 
@@ -232,7 +234,7 @@ ssize_t disk_image__get_serial(struct disk_image *disk, void *buffer, ssize_t *l
 	if (fstat(disk->fd, &st) != 0)
 		return 0;
 
-	*len = snprintf(buffer, *len, "%llu%llu%llu", (u64)st.st_dev, (u64)st.st_rdev, (u64)st.st_ino);
+	*len = snprintf(buffer, *len, "%"PRId64"%"PRId64"%"PRId64, (u64)st.st_dev, (u64)st.st_rdev, (u64)st.st_ino);
 	return *len;
 }
 
diff --git a/tools/kvm/mmio.c b/tools/kvm/mmio.c
index de7320f..1158bff 100644
--- a/tools/kvm/mmio.c
+++ b/tools/kvm/mmio.c
@@ -9,6 +9,8 @@
 #include <linux/kvm.h>
 #include <linux/types.h>
 #include <linux/rbtree.h>
+#define __STDC_FORMAT_MACROS
+#include
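
To illustrate the format-macro change in this patch: on LP64 targets a 64-bit
integer type is usually plain 'unsigned long', so a hard-coded %llx no longer
matches the argument, whereas PRIx64 expands to whatever length modifier the
platform needs. The sketch below is not kvmtool code; format_phys_addr() is a
hypothetical helper invented for the example.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of kvmtool): formats a 64-bit guest
 * physical address portably. The format string is built by string
 * literal concatenation, "...%" PRIx64, so it is correct on both
 * ILP32 and LP64 targets.
 */
static int format_phys_addr(char *buf, size_t len, uint64_t phys_addr)
{
	return snprintf(buf, len, "shmem: phys_addr = %" PRIx64, phys_addr);
}
```

The __STDC_FORMAT_MACROS define used in the patch is only required by some
older C++ toolchains and C libraries; for plain C, including inttypes.h is
normally enough.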
[PATCH 06/28] kvm tools: Add arch-specific KVM_RUN exit handling via kvm_cpu__handle_exit()
This patch creates a new function in x86/kvm-cpu.c, kvm_cpu__handle_exit(), in
which arch-specific exit reasons can be handled outside of the common runloop.

Signed-off-by: Matt Evans <m...@ozlabs.org>
---
 tools/kvm/include/kvm/kvm-cpu.h |    2 ++
 tools/kvm/kvm-cpu.c             |   10 ++++++++--
 tools/kvm/x86/kvm-cpu.c         |    5 +++++
 3 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/tools/kvm/include/kvm/kvm-cpu.h b/tools/kvm/include/kvm/kvm-cpu.h
index 719e286..15618f1 100644
--- a/tools/kvm/include/kvm/kvm-cpu.h
+++ b/tools/kvm/include/kvm/kvm-cpu.h
@@ -2,6 +2,7 @@
 #define KVM__KVM_CPU_H
 
 #include "kvm/kvm-cpu-arch.h"
+#include <stdbool.h>
 
 struct kvm_cpu *kvm_cpu__init(struct kvm *kvm, unsigned long cpu_id);
 void kvm_cpu__delete(struct kvm_cpu *vcpu);
@@ -11,6 +12,7 @@ void kvm_cpu__enable_singlestep(struct kvm_cpu *vcpu);
 void kvm_cpu__run(struct kvm_cpu *vcpu);
 void kvm_cpu__reboot(void);
 int kvm_cpu__start(struct kvm_cpu *cpu);
+bool kvm_cpu__handle_exit(struct kvm_cpu *vcpu);
 
 int kvm_cpu__get_debug_fd(void);
 void kvm_cpu__set_debug_fd(int fd);
diff --git a/tools/kvm/kvm-cpu.c b/tools/kvm/kvm-cpu.c
index 5aba3bb..9bc0796 100644
--- a/tools/kvm/kvm-cpu.c
+++ b/tools/kvm/kvm-cpu.c
@@ -137,8 +137,14 @@ int kvm_cpu__start(struct kvm_cpu *cpu)
 			goto exit_kvm;
 		case KVM_EXIT_SHUTDOWN:
 			goto exit_kvm;
-		default:
-			goto panic_kvm;
+		default: {
+			bool ret;
+
+			ret = kvm_cpu__handle_exit(cpu);
+			if (!ret)
+				goto panic_kvm;
+			break;
+		}
 		}
 		kvm_cpu__handle_coalesced_mmio(cpu);
 	}
diff --git a/tools/kvm/x86/kvm-cpu.c b/tools/kvm/x86/kvm-cpu.c
index b26b208..a0d10cc 100644
--- a/tools/kvm/x86/kvm-cpu.c
+++ b/tools/kvm/x86/kvm-cpu.c
@@ -212,6 +212,11 @@ void kvm_cpu__reset_vcpu(struct kvm_cpu *vcpu)
 	kvm_cpu__setup_msrs(vcpu);
 }
 
+bool kvm_cpu__handle_exit(struct kvm_cpu *vcpu)
+{
+	return false;
+}
+
 static void print_dtable(const char *name, struct kvm_dtable *dtable)
 {
 	dprintf(debug_fd, "%s %016llx %08hx\n",
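
The control flow this patch introduces can be sketched in isolation: the
common loop handles the exit reasons it knows, and anything else is offered to
an arch hook that reports whether it handled the exit. Everything named below
(the enums, arch_handle_exit(), dispatch_exit()) is hypothetical and only
stands in for the real KVM exit reasons and kvm_cpu__handle_exit(). Unlike the
x86 stub in the patch, which always returns false, the example hook claims one
exit reason so that both outcomes are visible.

```c
#include <stdbool.h>

/* Hypothetical exit reasons; real ones come from <linux/kvm.h>. */
enum exit_reason { EXIT_IO, EXIT_SHUTDOWN, EXIT_ARCH_HCALL, EXIT_BOGUS };

/* What the common run loop should do next. */
enum loop_action { LOOP_CONTINUE, LOOP_EXIT, LOOP_PANIC };

/* Stand-in for the arch hook: returns true if it handled the exit. */
static bool arch_handle_exit(enum exit_reason reason)
{
	return reason == EXIT_ARCH_HCALL;
}

/* Common dispatch: known reasons first, arch hook as the fallback. */
static enum loop_action dispatch_exit(enum exit_reason reason)
{
	switch (reason) {
	case EXIT_IO:
		return LOOP_CONTINUE;
	case EXIT_SHUTDOWN:
		return LOOP_EXIT;
	default:
		/* Unhandled by both common and arch code: give up. */
		return arch_handle_exit(reason) ? LOOP_CONTINUE : LOOP_PANIC;
	}
}
```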
[PATCH 07/28] kvm tools: Move 'kvm__recommended_cpus' to arch-specific code
Architectures can recommend/count/determine number of CPUs differently, so move
this out of generic code.

Signed-off-by: Matt Evans <m...@ozlabs.org>
---
 tools/kvm/kvm.c     |   30 ------------------------------
 tools/kvm/x86/kvm.c |   30 ++++++++++++++++++++++++++++++
 2 files changed, 30 insertions(+), 30 deletions(-)

diff --git a/tools/kvm/kvm.c b/tools/kvm/kvm.c
index 7ce1640..e526483 100644
--- a/tools/kvm/kvm.c
+++ b/tools/kvm/kvm.c
@@ -259,17 +259,6 @@ void kvm__register_mem(struct kvm *kvm, u64 guest_phys, u64 size, void *userspac
 		die_perror("KVM_SET_USER_MEMORY_REGION ioctl");
 }
 
-int kvm__recommended_cpus(struct kvm *kvm)
-{
-	int ret;
-
-	ret = ioctl(kvm->sys_fd, KVM_CHECK_EXTENSION, KVM_CAP_NR_VCPUS);
-	if (ret <= 0)
-		die_perror("KVM_CAP_NR_VCPUS");
-
-	return ret;
-}
-
 static void kvm__pid(int fd, u32 type, u32 len, u8 *msg)
 {
 	pid_t pid = getpid();
@@ -282,25 +271,6 @@ static void kvm__pid(int fd, u32 type, u32 len, u8 *msg)
 		pr_warning("Failed sending PID");
 }
 
-/*
- * The following hack should be removed once 'x86: Raise the hard
- * VCPU count limit' makes it's way into the mainline.
- */
-#ifndef KVM_CAP_MAX_VCPUS
-#define KVM_CAP_MAX_VCPUS 66
-#endif
-
-int kvm__max_cpus(struct kvm *kvm)
-{
-	int ret;
-
-	ret = ioctl(kvm->sys_fd, KVM_CHECK_EXTENSION, KVM_CAP_MAX_VCPUS);
-	if (ret <= 0)
-		ret = kvm__recommended_cpus(kvm);
-
-	return ret;
-}
-
 struct kvm *kvm__init(const char *kvm_dev, u64 ram_size, const char *name)
 {
 	struct kvm *kvm;
diff --git a/tools/kvm/x86/kvm.c b/tools/kvm/x86/kvm.c
index ac6c91e..75e4a52 100644
--- a/tools/kvm/x86/kvm.c
+++ b/tools/kvm/x86/kvm.c
@@ -76,6 +76,36 @@ bool kvm__arch_cpu_supports_vm(void)
 	return regs.ecx & (1 << feature);
 }
 
+int kvm__recommended_cpus(struct kvm *kvm)
+{
+	int ret;
+
+	ret = ioctl(kvm->sys_fd, KVM_CHECK_EXTENSION, KVM_CAP_NR_VCPUS);
+	if (ret <= 0)
+		die_perror("KVM_CAP_NR_VCPUS");
+
+	return ret;
+}
+
+/*
+ * The following hack should be removed once 'x86: Raise the hard
+ * VCPU count limit' makes it's way into the mainline.
+ */
+#ifndef KVM_CAP_MAX_VCPUS
+#define KVM_CAP_MAX_VCPUS 66
+#endif
+
+int kvm__max_cpus(struct kvm *kvm)
+{
+	int ret;
+
+	ret = ioctl(kvm->sys_fd, KVM_CHECK_EXTENSION, KVM_CAP_MAX_VCPUS);
+	if (ret <= 0)
+		ret = kvm__recommended_cpus(kvm);
+
+	return ret;
+}
+
 /*
  * Allocating RAM size bigger than 4GB requires us to leave a gap
  * in the RAM which is used for PCI MMIO, hotplug, and unconfigured
[PATCH 08/28] kvm tools: Fix KVM_RUN exit code check
kvm_cpu__run() currently die()s if KVM_RUN returns non-zero. Some architectures
may return positive values in non-error cases, whereas real errors are always
negative return values. Check for those instead.

Signed-off-by: Matt Evans <m...@ozlabs.org>
---
 tools/kvm/kvm-cpu.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/tools/kvm/kvm-cpu.c b/tools/kvm/kvm-cpu.c
index 9bc0796..884a89f 100644
--- a/tools/kvm/kvm-cpu.c
+++ b/tools/kvm/kvm-cpu.c
@@ -30,7 +30,7 @@ void kvm_cpu__run(struct kvm_cpu *vcpu)
 	int err;
 
 	err = ioctl(vcpu->vcpu_fd, KVM_RUN, 0);
-	if (err && (errno != EINTR && errno != EAGAIN))
+	if (err < 0 && (errno != EINTR && errno != EAGAIN))
 		die_perror("KVM_RUN failed");
 }
 
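
The corrected condition can be written as a small predicate, which makes the
three cases (success, interrupted, real failure) easy to check on their own.
This is only a sketch: kvm_run_failed() is a hypothetical name, and the errno
value is passed in explicitly so the logic can be exercised without issuing a
real ioctl().

```c
#include <errno.h>
#include <stdbool.h>

/*
 * ret: value returned by ioctl(KVM_RUN); err: errno captured right after
 * the call. Only a negative return is an error, and EINTR/EAGAIN are
 * treated as "try again" rather than as failures.
 */
static bool kvm_run_failed(int ret, int err)
{
	return ret < 0 && err != EINTR && err != EAGAIN;
}
```

A positive return value, which the old 'if (err && ...)' test would have
treated as fatal, now falls through as success.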
[PATCH 09/28] kvm tools: Add kvm__arch_periodic_poll()
Currently, the SIGALRM handler calls device poll functions (for serial, virtio
console) directly. Which devices are present and which require polling is a
system-specific decision, so create a new function called from common code and
move the x86-specific poll calls into it.

Signed-off-by: Matt Evans <m...@ozlabs.org>
---
 tools/kvm/builtin-run.c     |    3 +--
 tools/kvm/include/kvm/kvm.h |    1 +
 tools/kvm/x86/kvm.c         |    8 ++++++++
 3 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c
index 7cf208d..9ef331e 100644
--- a/tools/kvm/builtin-run.c
+++ b/tools/kvm/builtin-run.c
@@ -522,8 +522,7 @@ static void handle_debug(int fd, u32 type, u32 len, u8 *msg)
 
 static void handle_sigalrm(int sig)
 {
-	serial8250__inject_interrupt(kvm);
-	virtio_console__inject_interrupt(kvm);
+	kvm__arch_periodic_poll(kvm);
 }
 
 static void handle_stop(int fd, u32 type, u32 len, u8 *msg)
diff --git a/tools/kvm/include/kvm/kvm.h b/tools/kvm/include/kvm/kvm.h
index ca1acc0..60842d5 100644
--- a/tools/kvm/include/kvm/kvm.h
+++ b/tools/kvm/include/kvm/kvm.h
@@ -56,6 +56,7 @@ void kvm__remove_socket(const char *name);
 void kvm__arch_init(struct kvm *kvm, const char *kvm_dev, u64 ram_size, const char *name);
 void kvm__arch_setup_firmware(struct kvm *kvm);
 bool kvm__arch_cpu_supports_vm(void);
+void kvm__arch_periodic_poll(struct kvm *kvm);
 
 int load_flat_binary(struct kvm *kvm, int fd);
 bool load_bzimage(struct kvm *kvm, int fd_kernel, int fd_initrd, const char *kernel_cmdline, u16 vidmode);
diff --git a/tools/kvm/x86/kvm.c b/tools/kvm/x86/kvm.c
index 75e4a52..45dcb77 100644
--- a/tools/kvm/x86/kvm.c
+++ b/tools/kvm/x86/kvm.c
@@ -4,6 +4,8 @@
 #include "kvm/interrupt.h"
 #include "kvm/mptable.h"
 #include "kvm/util.h"
+#include "kvm/8250-serial.h"
+#include "kvm/virtio-console.h"
 
 #include <asm/bootparam.h>
 #include <linux/kvm.h>
@@ -358,3 +360,9 @@ void kvm__arch_setup_firmware(struct kvm *kvm)
 	/* MP table */
 	mptable_setup(kvm, kvm->nrcpus);
 }
+
+void kvm__arch_periodic_poll(struct kvm *kvm)
+{
+	serial8250__inject_interrupt(kvm);
+	virtio_console__inject_interrupt(kvm);
+}
[PATCH 10/28] kvm tools: term.h needs to include stdbool.h
Fix a missing include.

Signed-off-by: Matt Evans <m...@ozlabs.org>
---
 tools/kvm/include/kvm/term.h |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/tools/kvm/include/kvm/term.h b/tools/kvm/include/kvm/term.h
index 37ec731..938c26f 100644
--- a/tools/kvm/include/kvm/term.h
+++ b/tools/kvm/include/kvm/term.h
@@ -2,6 +2,7 @@
 #define KVM__TERM_H
 
 #include <sys/uio.h>
+#include <stdbool.h>
 
 #define CONSOLE_8250	1
 #define CONSOLE_VIRTIO	2
[PATCH 11/28] kvm tools: kvm.c needs to include sys/stat.h for mkdir
Fix a missing include.

Signed-off-by: Matt Evans <m...@ozlabs.org>
---
 tools/kvm/kvm.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/tools/kvm/kvm.c b/tools/kvm/kvm.c
index e526483..33243f1 100644
--- a/tools/kvm/kvm.c
+++ b/tools/kvm/kvm.c
@@ -8,6 +8,7 @@
 #include <linux/kvm.h>
 
 #include <sys/un.h>
+#include <sys/stat.h>
 #include <sys/types.h>
 #include <sys/socket.h>
 #include <sys/ioctl.h>
[PATCH 12/28] kvm tools: Move arch-specific cmdline init into kvm__arch_set_cmdline()
Different systems will want different base kernel commandlines, e.g. non-x86
systems probably don't need noapic, i8042.* etc., so set the commandline up in
arch-specific code. Then, if the resulting commandline is empty, don't strcat a
space onto the front.

Signed-off-by: Matt Evans <m...@ozlabs.org>
---
 tools/kvm/builtin-run.c     |   12 +++++-------
 tools/kvm/include/kvm/kvm.h |    1 +
 tools/kvm/x86/kvm.c         |   11 +++++++++++
 3 files changed, 17 insertions(+), 7 deletions(-)

diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c
index 9ef331e..a67bd8c 100644
--- a/tools/kvm/builtin-run.c
+++ b/tools/kvm/builtin-run.c
@@ -835,13 +835,11 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix)
 		vidmode = 0;
 
 	memset(real_cmdline, 0, sizeof(real_cmdline));
-	strcpy(real_cmdline, "noapic noacpi pci=conf1 reboot=k panic=1 i8042.direct=1 "
-				"i8042.dumbkbd=1 i8042.nopnp=1");
-	if (vnc || sdl) {
-		strcat(real_cmdline, " video=vesafb console=tty0");
-	} else
-		strcat(real_cmdline, " console=ttyS0 earlyprintk=serial i8042.noaux=1");
-	strcat(real_cmdline, " ");
+	kvm__arch_set_cmdline(real_cmdline, vnc || sdl);
+
+	if (strlen(real_cmdline) > 0)
+		strcat(real_cmdline, " ");
+
 	if (kernel_cmdline)
 		strlcat(real_cmdline, kernel_cmdline, sizeof(real_cmdline));
 
diff --git a/tools/kvm/include/kvm/kvm.h b/tools/kvm/include/kvm/kvm.h
index 60842d5..fae2ba9 100644
--- a/tools/kvm/include/kvm/kvm.h
+++ b/tools/kvm/include/kvm/kvm.h
@@ -53,6 +53,7 @@ int kvm__get_sock_by_instance(const char *name);
 int kvm__enumerate_instances(int (*callback)(const char *name, int pid));
 void kvm__remove_socket(const char *name);
 
+void kvm__arch_set_cmdline(char *cmdline, bool video);
 void kvm__arch_init(struct kvm *kvm, const char *kvm_dev, u64 ram_size, const char *name);
 void kvm__arch_setup_firmware(struct kvm *kvm);
 bool kvm__arch_cpu_supports_vm(void);
diff --git a/tools/kvm/x86/kvm.c b/tools/kvm/x86/kvm.c
index 45dcb77..7071dc6 100644
--- a/tools/kvm/x86/kvm.c
+++ b/tools/kvm/x86/kvm.c
@@ -149,6 +149,17 @@ void kvm__init_ram(struct kvm *kvm)
 	}
 }
 
+/* Arch-specific commandline setup */
+void kvm__arch_set_cmdline(char *cmdline, bool video)
+{
+	strcpy(cmdline, "noapic noacpi pci=conf1 reboot=k panic=1 i8042.direct=1 "
+				"i8042.dumbkbd=1 i8042.nopnp=1");
+	if (video) {
+		strcat(cmdline, " video=vesafb console=tty0");
+	} else
+		strcat(cmdline, " console=ttyS0 earlyprintk=serial i8042.noaux=1");
+}
+
 /* Architecture-specific KVM init */
 void kvm__arch_init(struct kvm *kvm, const char *kvm_dev, u64 ram_size, const char *name)
 {
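
The "only add a separator when the arch part is non-empty" rule from this
patch can be sketched on its own. build_cmdline() below is a hypothetical
helper, not kvmtool code, and it uses a single bounded snprintf() in place of
the strcat()/strlcat() sequence purely to keep the example self-contained.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: joins an arch-provided base command line with the
 * user's command line. A separating space is emitted only when the base
 * is non-empty, mirroring the strlen() check in the patch. 'user' may be
 * NULL, in which case nothing is appended after the separator.
 */
static int build_cmdline(char *out, size_t len,
			 const char *arch_base, const char *user)
{
	const char *sep = arch_base[0] != '\0' ? " " : "";

	return snprintf(out, len, "%s%s%s", arch_base, sep, user ? user : "");
}
```

An architecture that needs no default options would pass an empty base, and
the user's command line then starts at the first character rather than after
a stray space.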
[PATCH 13/28] kvm tools: Add CONSOLE_HV term type and allow it to be selected
This patch paves the way for adding a hypervisor console, useful on systems
that support one out of the box yet don't have either serial port or virtio
console support (e.g. kernels expecting POWER SPAPR).

Signed-off-by: Matt Evans <m...@ozlabs.org>
---
 tools/kvm/builtin-run.c      |    8 ++++++--
 tools/kvm/include/kvm/term.h |    1 +
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c
index a67bd8c..1257c90 100644
--- a/tools/kvm/builtin-run.c
+++ b/tools/kvm/builtin-run.c
@@ -416,7 +416,7 @@ static const struct option options[] = {
 	OPT_BOOLEAN('\0', "rng", &virtio_rng, "Enable virtio Random Number Generator"),
 	OPT_CALLBACK('\0', "9p", NULL, "dir_to_share,tag_name",
 		     "Enable virtio 9p to share files between host and guest", virtio_9p_rootdir_parser),
-	OPT_STRING('\0', "console", &console, "serial or virtio",
+	OPT_STRING('\0', "console", &console, "serial, virtio or hv",
 			"Console to use"),
 	OPT_STRING('\0', "dev", &dev, "device_file", "KVM device file"),
 	OPT_CALLBACK('\0', "tty", NULL, "tty id",
@@ -776,8 +776,12 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix)
 
 	if (!strncmp(console, "virtio", 6))
 		active_console = CONSOLE_VIRTIO;
-	else
+	else if (!strncmp(console, "serial", 6))
 		active_console = CONSOLE_8250;
+	else if (!strncmp(console, "hv", 2))
+		active_console = CONSOLE_HV;
+	else
+		pr_warning("No console!");
 
 	if (!host_ip)
 		host_ip = DEFAULT_HOST_ADDR;
diff --git a/tools/kvm/include/kvm/term.h b/tools/kvm/include/kvm/term.h
index 938c26f..a6a9822 100644
--- a/tools/kvm/include/kvm/term.h
+++ b/tools/kvm/include/kvm/term.h
@@ -6,6 +6,7 @@
 
 #define CONSOLE_8250	1
 #define CONSOLE_VIRTIO	2
+#define CONSOLE_HV	3
 
 int term_putc_iov(int who, struct iovec *iov, int iovcnt, int term);
 int term_getc_iov(int who, struct iovec *iov, int iovcnt, int term);
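
The console selection can be sketched as a standalone function.
parse_console() is a hypothetical name, and the sketch deliberately differs
from the patch in one respect: it matches the whole string with strcmp(),
whereas the patch's strncmp() calls accept any string that merely starts with
"virtio", "serial" or "hv". The CONSOLE_* values are the ones from term.h in
the patch.

```c
#include <string.h>

#define CONSOLE_8250	1
#define CONSOLE_VIRTIO	2
#define CONSOLE_HV	3

/*
 * Hypothetical helper: maps a --console argument to a console type.
 * Returns 0 when the name is not recognised, so the caller can warn.
 * Exact matching is an assumption of this sketch, not the patch's
 * behaviour.
 */
static int parse_console(const char *name)
{
	if (!strcmp(name, "virtio"))
		return CONSOLE_VIRTIO;
	if (!strcmp(name, "serial"))
		return CONSOLE_8250;
	if (!strcmp(name, "hv"))
		return CONSOLE_HV;
	return 0;
}
```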
[PATCH 14/28] kvm tools: Fix term_getc(), term_getc_iov() endian bugs
term_getc()'s int c has one byte written into it (at its lowest address) by
read_in_full(). This is expected to be the least significant byte, but that
isn't the case on BE! Use correct type, unsigned char.

A similar issue exists in term_getc_iov(), which needs to write a char to the
iov rather than an int.

Signed-off-by: Matt Evans <m...@ozlabs.org>
---
 tools/kvm/term.c |    5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/tools/kvm/term.c b/tools/kvm/term.c
index fb5d71c..440884e 100644
--- a/tools/kvm/term.c
+++ b/tools/kvm/term.c
@@ -30,11 +30,10 @@ int term_fds[4][2];
 
 int term_getc(int who, int term)
 {
-	int c;
+	unsigned char c;
 
 	if (who != active_console)
 		return -1;
-
 	if (read_in_full(term_fds[term][TERM_FD_IN], &c, 1) < 0)
 		return -1;
 
@@ -84,7 +83,7 @@ int term_getc_iov(int who, struct iovec *iov, int iovcnt, int term)
 	if (c < 0)
 		return 0;
 
-	*((int *)iov[TERM_FD_IN].iov_base)	= c;
+	*((char *)iov[TERM_FD_IN].iov_base)	= (char)c;
 
 	return sizeof(char);
 }
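
The byte-order problem described here can be reproduced without any terminal
I/O. In the sketch below, memcpy() of a single byte stands in for
read_in_full(..., 1); both function names are hypothetical. The int is zeroed
first so the result is well defined, which the original code did not do.

```c
#include <string.h>

/*
 * Mimics the old code: one byte is stored at the lowest address of an
 * int. On a little-endian machine that is the least significant byte,
 * so the result equals 'byte'. On a big-endian machine it is the most
 * significant byte, so with a 32-bit int 'A' (0x41) would come back
 * as 0x41000000.
 */
static int getc_via_int(unsigned char byte)
{
	int c = 0;

	memcpy(&c, &byte, 1);
	return c;
}

/* Mimics the fix: a one-byte object has no byte order to get wrong. */
static int getc_via_uchar(unsigned char byte)
{
	unsigned char c;

	memcpy(&c, &byte, 1);
	return c;
}
```

Returning the unsigned char through an int also keeps bytes above 0x7f
positive, so they cannot be confused with the -1 error return.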
[PATCH 15/28] kvm tools: Allow initrd_check() to match a cpio
cpios are valid as initrds too, so allow them through the check.

Signed-off-by: Matt Evans <m...@ozlabs.org>
---
 tools/kvm/kvm.c |    8 +++++---
 1 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/tools/kvm/kvm.c b/tools/kvm/kvm.c
index 33243f1..457de1a 100644
--- a/tools/kvm/kvm.c
+++ b/tools/kvm/kvm.c
@@ -317,10 +317,11 @@ struct kvm *kvm__init(const char *kvm_dev, u64 ram_size, const char *name)
 /* RFC 1952 */
 #define GZIP_ID1		0x1f
 #define GZIP_ID2		0x8b
-
+#define CPIO_MAGIC		"0707"
+/* initrd may be gzipped, or a plain cpio */
 static bool initrd_check(int fd)
 {
-	unsigned char id[2];
+	unsigned char id[4];
 
 	if (read_in_full(fd, id, ARRAY_SIZE(id)) < 0)
 		return false;
@@ -328,7 +329,8 @@ static bool initrd_check(int fd)
 	if (lseek(fd, 0, SEEK_SET) < 0)
 		die_perror("lseek");
 
-	return id[0] == GZIP_ID1 && id[1] == GZIP_ID2;
+	return (id[0] == GZIP_ID1 && id[1] == GZIP_ID2) ||
+		!memcmp(id, CPIO_MAGIC, 4);
 }
 
 bool kvm__load_kernel(struct kvm *kvm, const char *kernel_filename,
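
The magic-number test can be tried out on plain buffers. The sketch assumes
CPIO_MAGIC is the four-character ASCII string "0707", which is the common
prefix of the ASCII cpio header magics ("070701", "070702" and "070707");
looks_like_initrd() is a hypothetical buffer-based variant of initrd_check()
and does no file I/O.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* RFC 1952 gzip identification bytes. */
#define GZIP_ID1	0x1f
#define GZIP_ID2	0x8b
/* Assumed to be a string literal: prefix of the ASCII cpio magics. */
#define CPIO_MAGIC	"0707"

/*
 * Hypothetical helper: returns true if the first bytes of an image look
 * like gzip data or an ASCII cpio archive. Needs at least 4 bytes.
 */
static bool looks_like_initrd(const unsigned char *id, size_t len)
{
	if (len < 4)
		return false;

	return (id[0] == GZIP_ID1 && id[1] == GZIP_ID2) ||
		!memcmp(id, CPIO_MAGIC, 4);
}
```

Note that the old binary cpio format stores its magic as a 16-bit integer
rather than as ASCII digits, so it would not be recognised by this prefix
test.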
[PATCH 16/28] kvm tools: Allow load_flat_binary() to load an initrd alongside
This patch passes the initrd fd and commandline to load_flat_binary(), which
may be used to load both the kernel and an initrd (stashing or inserting the
commandline as appropriate) in the same way that load_bzimage() does. This is
especially useful when load_bzimage() is unused for a particular
architecture. :-)

Signed-off-by: Matt Evans <m...@ozlabs.org>
---
 tools/kvm/include/kvm/kvm.h |    2 +-
 tools/kvm/kvm.c             |   10 ++++++----
 tools/kvm/x86/kvm.c         |   12 +++++++++---
 3 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/tools/kvm/include/kvm/kvm.h b/tools/kvm/include/kvm/kvm.h
index fae2ba9..5fe6e75 100644
--- a/tools/kvm/include/kvm/kvm.h
+++ b/tools/kvm/include/kvm/kvm.h
@@ -59,7 +59,7 @@ void kvm__arch_setup_firmware(struct kvm *kvm);
 bool kvm__arch_cpu_supports_vm(void);
 void kvm__arch_periodic_poll(struct kvm *kvm);
 
-int load_flat_binary(struct kvm *kvm, int fd);
+int load_flat_binary(struct kvm *kvm, int fd_kernel, int fd_initrd, const char *kernel_cmdline);
 bool load_bzimage(struct kvm *kvm, int fd_kernel, int fd_initrd, const char *kernel_cmdline, u16 vidmode);
 
 /*
diff --git a/tools/kvm/kvm.c b/tools/kvm/kvm.c
index 457de1a..6f33e1a 100644
--- a/tools/kvm/kvm.c
+++ b/tools/kvm/kvm.c
@@ -354,23 +354,25 @@ bool kvm__load_kernel(struct kvm *kvm, const char *kernel_filename,
 
 	ret = load_bzimage(kvm, fd_kernel, fd_initrd, kernel_cmdline, vidmode);
 
-	if (initrd_filename)
-		close(fd_initrd);
-
 	if (ret)
 		goto found_kernel;
 
 	pr_warning("%s is not a bzImage. Trying to load it as a flat binary...", kernel_filename);
 
-	ret = load_flat_binary(kvm, fd_kernel);
+	ret = load_flat_binary(kvm, fd_kernel, fd_initrd, kernel_cmdline);
+
 	if (ret)
 		goto found_kernel;
 
+	if (initrd_filename)
+		close(fd_initrd);
 	close(fd_kernel);
 
 	die("%s is not a valid bzImage or flat binary", kernel_filename);
 
 found_kernel:
+	if (initrd_filename)
+		close(fd_initrd);
 	close(fd_kernel);
 
 	return ret;
diff --git a/tools/kvm/x86/kvm.c b/tools/kvm/x86/kvm.c
index 7071dc6..4ac21c0 100644
--- a/tools/kvm/x86/kvm.c
+++ b/tools/kvm/x86/kvm.c
@@ -227,17 +227,23 @@ void kvm__irq_trigger(struct kvm *kvm, int irq)
 #define BOOT_PROTOCOL_REQUIRED	0x206
 #define LOAD_HIGH		0x01
 
-int load_flat_binary(struct kvm *kvm, int fd)
+int load_flat_binary(struct kvm *kvm, int fd_kernel, int fd_initrd, const char *kernel_cmdline)
 {
 	void *p;
 	int nr;
 
-	if (lseek(fd, 0, SEEK_SET) < 0)
+	/* Some architectures may support loading an initrd alongside the flat kernel,
+	 * but we do not.
+	 */
+	if (fd_initrd != -1)
+		pr_warning("Loading initrd with flat binary not supported.");
+
+	if (lseek(fd_kernel, 0, SEEK_SET) < 0)
 		die_perror("lseek");
 
 	p = guest_real_to_host(kvm, BOOT_LOADER_SELECTOR, BOOT_LOADER_IP);
 
-	while ((nr = read(fd, p, 65536)) > 0)
+	while ((nr = read(fd_kernel, p, 65536)) > 0)
 		p += nr;
 
 	kvm->boot_selector	= BOOT_LOADER_SELECTOR;
[PATCH 17/28] kvm tools: Only call symbol__init() if we have BFD
CONFIG_HAS_BFD is optional, symbol.c inclusion is optional -- so make its
init call dependent on CONFIG_HAS_BFD.

Signed-off-by: Matt Evans m...@ozlabs.org
---
 tools/kvm/builtin-run.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c
index 1257c90..aaa5132 100644
--- a/tools/kvm/builtin-run.c
+++ b/tools/kvm/builtin-run.c
@@ -798,8 +798,9 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix)
 	if (!script)
 		script = DEFAULT_SCRIPT;
 
+#ifdef CONFIG_HAS_BFD
 	symbol__init(vmlinux_filename);
-
+#endif
 	term_init();
 
 	if (!guest_name) {
[PATCH 18/28] kvm tools: Initialise PCI before devices start getting registered with PCI
Re-arrange pci__init() in builtin-run such that it comes before devices are
initialised.

Signed-off-by: Matt Evans m...@ozlabs.org
---
 tools/kvm/builtin-run.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c
index aaa5132..32e19e7 100644
--- a/tools/kvm/builtin-run.c
+++ b/tools/kvm/builtin-run.c
@@ -829,6 +829,8 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix)
 
 	kvm->nrcpus = nrcpus;
 
+	pci__init();
+
 	/*
 	 * vidmode should be either specified
 	 * either set by default
@@ -896,8 +898,6 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix)
 
 	serial8250__init(kvm);
 
-	pci__init();
-
 	if (active_console == CONSOLE_VIRTIO)
 		virtio_console__init(kvm);
 
[PATCH 19/28] kvm tools: Perform CPU and firmware setup after devices are added
Currently some devices (in this case kbd, fb, vesa) are initialised after
CPU/firmware setup. On some platforms (e.g. PPC) kvm__arch_setup_firmware()
may be making a device tree. Any devices added after this point will be
missed!

Tiny refactor of builtin-run.c, moving timer start, firmware setup, cpu init
to occur last.

Signed-off-by: Matt Evans m...@ozlabs.org
---
 tools/kvm/builtin-run.c |   24 ++++++++++++++----------
 1 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c
index 32e19e7..576dcfa 100644
--- a/tools/kvm/builtin-run.c
+++ b/tools/kvm/builtin-run.c
@@ -933,16 +933,6 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix)
 		virtio_net__init(net_params);
 	}
 
-	kvm__start_timer(kvm);
-
-	kvm__arch_setup_firmware(kvm);
-
-	for (i = 0; i < nrcpus; i++) {
-		kvm_cpus[i] = kvm_cpu__init(kvm, i);
-		if (!kvm_cpus[i])
-			die("unable to initialize KVM VCPU");
-	}
-
 	kvm__init_ram(kvm);
 
 #ifdef CONFIG_X86
@@ -966,6 +956,20 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix)
 
 	fb__start();
 
+	/* Device init all done; firmware init must
+	 * come after this (it may set up device trees etc.)
+	 */
+
+	kvm__start_timer(kvm);
+
+	kvm__arch_setup_firmware(kvm);
+
+	for (i = 0; i < nrcpus; i++) {
+		kvm_cpus[i] = kvm_cpu__init(kvm, i);
+		if (!kvm_cpus[i])
+			die("unable to initialize KVM VCPU");
+	}
+
 	thread_pool__init(nr_online_cpus);
 	ioeventfd__start();
 
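To illustrate why the ordering matters, here is a minimal sketch, not kvmtool
code. The registry and firmware_tree types and all function names are
hypothetical stand-ins; the only assumption carried over from the patch is
that firmware setup snapshots whatever devices are registered at the time it
runs.

```c
#include <stddef.h>

#define MAX_DEVICES 8

/* Hypothetical registry standing in for the list of initialised devices. */
struct registry {
	const char *names[MAX_DEVICES];
	size_t count;
};

/* Hypothetical snapshot standing in for a generated device tree. */
struct firmware_tree {
	size_t nodes;
};

/* Add a device to the registry; returns -1 when the registry is full. */
int register_device(struct registry *r, const char *name)
{
	if (r->count >= MAX_DEVICES)
		return -1;
	r->names[r->count++] = name;
	return 0;
}

/* Firmware setup only sees the devices registered at the time of the call. */
struct firmware_tree setup_firmware(const struct registry *r)
{
	struct firmware_tree t = { .nodes = r->count };
	return t;
}

/* Number of registered devices that are absent from the tree. */
size_t missed_devices(const struct registry *r, const struct firmware_tree *t)
{
	return r->count - t->nodes;
}
```

Building the tree after one device and then registering two more leaves two
devices missed; building it last leaves none.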
[PATCH 20/28] kvm tools: Init IRQs after determining nrcpus
IRQ init may involve per-CPU setup/allocation of resources, so make sure
kvm->nrcpus is initialised before calling irq__init().

Signed-off-by: Matt Evans m...@ozlabs.org
---
 tools/kvm/builtin-run.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c
index 576dcfa..84aa931 100644
--- a/tools/kvm/builtin-run.c
+++ b/tools/kvm/builtin-run.c
@@ -810,8 +810,6 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix)
 
 	kvm = kvm__init(dev, ram_size, guest_name);
 
-	irq__init(kvm);
-
 	kvm->single_step = single_step;
 
 	ioeventfd__init();
@@ -829,6 +827,8 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix)
 
 	kvm->nrcpus = nrcpus;
 
+	irq__init(kvm);
+
 	pci__init();
 
 	/*
[PATCH 21/28] kvm tools: Add --hugetlbfs option to specify memory path
Some architectures may want to use hugetlbfs to mmap() their guest memory, so
allow a path to be specified on the commandline and pass it to
kvm__arch_init().

Signed-off-by: Matt Evans m...@ozlabs.org
---
 tools/kvm/builtin-run.c     |    4 +++-
 tools/kvm/include/kvm/kvm.h |    4 ++--
 tools/kvm/kvm.c             |    4 ++--
 tools/kvm/x86/kvm.c         |    2 +-
 4 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c
index 84aa931..4c88169 100644
--- a/tools/kvm/builtin-run.c
+++ b/tools/kvm/builtin-run.c
@@ -84,6 +84,7 @@ static const char *guest_mac;
 static const char *host_mac;
 static const char *script;
 static const char *guest_name;
+static const char *hugetlbfs_path;
 static struct virtio_net_params *net_params;
 static bool single_step;
 static bool readonly_image[MAX_DISK_IMAGES];
@@ -422,6 +423,7 @@ static const struct option options[] = {
 	OPT_CALLBACK('\0', "tty", NULL, "tty id",
 		     "Remap guest TTY into a pty on the host",
 		     tty_parser),
+	OPT_STRING('\0', "hugetlbfs", &hugetlbfs_path, "path", "Hugetlbfs path"),
 
 	OPT_GROUP("Kernel options:"),
 	OPT_STRING('k', "kernel", &kernel_filename, "kernel",
@@ -808,7 +810,7 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix)
 		guest_name = default_name;
 	}
 
-	kvm = kvm__init(dev, ram_size, guest_name);
+	kvm = kvm__init(dev, hugetlbfs_path, ram_size, guest_name);
 
 	kvm->single_step = single_step;
 
diff --git a/tools/kvm/include/kvm/kvm.h b/tools/kvm/include/kvm/kvm.h
index 5fe6e75..7159952 100644
--- a/tools/kvm/include/kvm/kvm.h
+++ b/tools/kvm/include/kvm/kvm.h
@@ -30,7 +30,7 @@ struct kvm_ext {
 void kvm__set_dir(const char *fmt, ...);
 const char *kvm__get_dir(void);
 
-struct kvm *kvm__init(const char *kvm_dev, u64 ram_size, const char *name);
+struct kvm *kvm__init(const char *kvm_dev, const char *hugetlbfs_path, u64 ram_size, const char *name);
 int kvm__recommended_cpus(struct kvm *kvm);
 int kvm__max_cpus(struct kvm *kvm);
 void kvm__init_ram(struct kvm *kvm);
@@ -54,7 +54,7 @@ int kvm__enumerate_instances(int (*callback)(const char *name, int pid));
 void kvm__remove_socket(const char *name);
 
 void kvm__arch_set_cmdline(char *cmdline, bool video);
-void kvm__arch_init(struct kvm *kvm, const char *kvm_dev, u64 ram_size, const char *name);
+void kvm__arch_init(struct kvm *kvm, const char *kvm_dev, const char *hugetlbfs_path, u64 ram_size, const char *name);
 void kvm__arch_setup_firmware(struct kvm *kvm);
 bool kvm__arch_cpu_supports_vm(void);
 void kvm__arch_periodic_poll(struct kvm *kvm);
diff --git a/tools/kvm/kvm.c b/tools/kvm/kvm.c
index 6f33e1a..503ceae 100644
--- a/tools/kvm/kvm.c
+++ b/tools/kvm/kvm.c
@@ -272,7 +272,7 @@ static void kvm__pid(int fd, u32 type, u32 len, u8 *msg)
 		pr_warning("Failed sending PID");
 }
 
-struct kvm *kvm__init(const char *kvm_dev, u64 ram_size, const char *name)
+struct kvm *kvm__init(const char *kvm_dev, const char *hugetlbfs_path, u64 ram_size, const char *name)
 {
 	struct kvm *kvm;
 	int ret;
@@ -305,7 +305,7 @@ struct kvm *kvm__init(const char *kvm_dev, u64 ram_size, const char *name)
 	if (kvm__check_extensions(kvm))
 		die("A required KVM extention is not supported by OS");
 
-	kvm__arch_init(kvm, kvm_dev, ram_size, name);
+	kvm__arch_init(kvm, kvm_dev, hugetlbfs_path, ram_size, name);
 
 	kvm->name = name;
 
diff --git a/tools/kvm/x86/kvm.c b/tools/kvm/x86/kvm.c
index 4ac21c0..76f805f 100644
--- a/tools/kvm/x86/kvm.c
+++ b/tools/kvm/x86/kvm.c
@@ -161,7 +161,7 @@ void kvm__arch_set_cmdline(char *cmdline, bool video)
 }
 
 /* Architecture-specific KVM init */
-void kvm__arch_init(struct kvm *kvm, const char *kvm_dev, u64 ram_size, const char *name)
+void kvm__arch_init(struct kvm *kvm, const char *kvm_dev, const char *hugetlbfs_path, u64 ram_size, const char *name)
 {
 	struct kvm_pit_config pit_config = { .flags = 0, };
 	int ret;
[PATCH 22/28] kvm tools: Move PCI_MAX_DEVICES to pci.h
Other pieces of kvmtool may be interested in PCI_MAX_DEVICES.

Signed-off-by: Matt Evans m...@ozlabs.org
---
 tools/kvm/include/kvm/pci.h |    1 +
 tools/kvm/pci.c             |    1 -
 2 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/tools/kvm/include/kvm/pci.h b/tools/kvm/include/kvm/pci.h
index f71af0b..b578ad7 100644
--- a/tools/kvm/include/kvm/pci.h
+++ b/tools/kvm/include/kvm/pci.h
@@ -6,6 +6,7 @@
 #include <linux/pci_regs.h>
 #include <linux/msi.h>
 
+#define PCI_MAX_DEVICES		256
 /*
  * PCI Configuration Mechanism #1 I/O ports. See Section 3.7.4.1.
  * (Configuration Mechanism #1) of the PCI Local Bus Specification 2.1 for
diff --git a/tools/kvm/pci.c b/tools/kvm/pci.c
index d1afc05..920e13e 100644
--- a/tools/kvm/pci.c
+++ b/tools/kvm/pci.c
@@ -5,7 +5,6 @@
 
 #include <assert.h>
 
-#define PCI_MAX_DEVICES			256
 #define PCI_BAR_OFFSET(b)		(offsetof(struct pci_device_header, bar[b]))
 
 static struct pci_device_header		*pci_devices[PCI_MAX_DEVICES];
[PATCH 23/28] kvm tools: Endian-sanitise pci.h and PCI device setup
vesa, pci-shmem and virtio-pci devices need to set up config space with
little-endian conversions (as config space is LE). The pci_config_address
bitfield also needs to be reversed when building on BE systems.

Signed-off-by: Matt Evans m...@ozlabs.org
---
 tools/kvm/hw/pci-shmem.c       |   23 +++--
 tools/kvm/hw/vesa.c            |   15 +++--
 tools/kvm/include/kvm/ioport.h |   11 +
 tools/kvm/include/kvm/pci.h    |   24 +-
 tools/kvm/pci.c                |    4 +-
 tools/kvm/virtio/pci.c         |   41 +--
 6 files changed, 68 insertions(+), 50 deletions(-)

diff --git a/tools/kvm/hw/pci-shmem.c b/tools/kvm/hw/pci-shmem.c
index 780a377..fd954c5 100644
--- a/tools/kvm/hw/pci-shmem.c
+++ b/tools/kvm/hw/pci-shmem.c
@@ -8,21 +8,22 @@
 #include "kvm/ioeventfd.h"
 
 #include <linux/kvm.h>
+#include <linux/byteorder.h>
 #include <sys/ioctl.h>
 #include <fcntl.h>
 #include <sys/mman.h>
 
 static struct pci_device_header pci_shmem_pci_device = {
-	.vendor_id	= PCI_VENDOR_ID_REDHAT_QUMRANET,
-	.device_id	= 0x1110,
+	.vendor_id	= cpu_to_le16(PCI_VENDOR_ID_REDHAT_QUMRANET),
+	.device_id	= cpu_to_le16(0x1110),
 	.header_type	= PCI_HEADER_TYPE_NORMAL,
-	.class		= 0xFF, /* misc pci device */
-	.status		= PCI_STATUS_CAP_LIST,
+	.class[2]	= 0xFF, /* misc pci device */
+	.status		= cpu_to_le16(PCI_STATUS_CAP_LIST),
 	.capabilities	= (void *)&pci_shmem_pci_device.msix - (void *)&pci_shmem_pci_device,
 	.msix.cap	= PCI_CAP_ID_MSIX,
-	.msix.ctrl	= 1,
-	.msix.table_offset = 1,		/* Use BAR 1 */
-	.msix.pba_offset = 0x1001,	/* Use BAR 1 */
+	.msix.ctrl	= cpu_to_le16(1),
+	.msix.table_offset = cpu_to_le32(1),		/* Use BAR 1 */
+	.msix.pba_offset = cpu_to_le32(0x1001),		/* Use BAR 1 */
 };
 
 /* registers for the Inter-VM shared memory device */
@@ -123,7 +124,7 @@ int pci_shmem__get_local_irqfd(struct kvm *kvm)
 		if (fd < 0)
 			return fd;
 
-		if (pci_shmem_pci_device.msix.ctrl & PCI_MSIX_FLAGS_ENABLE) {
+		if (pci_shmem_pci_device.msix.ctrl & cpu_to_le16(PCI_MSIX_FLAGS_ENABLE)) {
 			gsi = irq__add_msix_route(kvm, &msix_table[0].msg);
 		} else {
 			gsi = pci_shmem_pci_device.irq_line;
@@ -241,11 +242,11 @@ int pci_shmem__init(struct kvm *kvm)
 	 * 1 - MSI-X MMIO space
 	 * 2 - Shared memory block
 	 */
-	pci_shmem_pci_device.bar[0] = ivshmem_registers | PCI_BASE_ADDRESS_SPACE_IO;
+	pci_shmem_pci_device.bar[0] = cpu_to_le32(ivshmem_registers | PCI_BASE_ADDRESS_SPACE_IO);
 	pci_shmem_pci_device.bar_size[0] = shmem_region->size;
-	pci_shmem_pci_device.bar[1] = msix_block | PCI_BASE_ADDRESS_SPACE_MEMORY;
+	pci_shmem_pci_device.bar[1] = cpu_to_le32(msix_block | PCI_BASE_ADDRESS_SPACE_MEMORY);
 	pci_shmem_pci_device.bar_size[1] = 0x1010;
-	pci_shmem_pci_device.bar[2] = shmem_region->phys_addr | PCI_BASE_ADDRESS_SPACE_MEMORY;
+	pci_shmem_pci_device.bar[2] = cpu_to_le32(shmem_region->phys_addr | PCI_BASE_ADDRESS_SPACE_MEMORY);
 	pci_shmem_pci_device.bar_size[2] = shmem_region->size;
 
 	pci__register(&pci_shmem_pci_device, dev);
diff --git a/tools/kvm/hw/vesa.c b/tools/kvm/hw/vesa.c
index 22b1652..63f1082 100644
--- a/tools/kvm/hw/vesa.c
+++ b/tools/kvm/hw/vesa.c
@@ -8,6 +8,7 @@
 #include "kvm/irq.h"
 #include "kvm/kvm.h"
 #include "kvm/pci.h"
+#include <linux/byteorder.h>
 #include <sys/mman.h>
 
 #include <sys/types.h>
@@ -31,14 +32,14 @@ static struct ioport_operations vesa_io_ops = {
 };
 
 static struct pci_device_header vesa_pci_device = {
-	.vendor_id		= PCI_VENDOR_ID_REDHAT_QUMRANET,
-	.device_id		= PCI_DEVICE_ID_VESA,
+	.vendor_id		= cpu_to_le16(PCI_VENDOR_ID_REDHAT_QUMRANET),
+	.device_id		= cpu_to_le16(PCI_DEVICE_ID_VESA),
 	.header_type		= PCI_HEADER_TYPE_NORMAL,
 	.revision_id		= 0,
-	.class			= 0x03,
-	.subsys_vendor_id	= PCI_SUBSYSTEM_VENDOR_ID_REDHAT_QUMRANET,
-	.subsys_id		= PCI_SUBSYSTEM_ID_VESA,
-	.bar[1]			= VESA_MEM_ADDR | PCI_BASE_ADDRESS_SPACE_MEMORY,
+	.class[2]		= 0x03,
+	.subsys_vendor_id	= cpu_to_le16(PCI_SUBSYSTEM_VENDOR_ID_REDHAT_QUMRANET),
+	.subsys_id		= cpu_to_le16(PCI_SUBSYSTEM_ID_VESA),
+	.bar[1]			= cpu_to_le32(VESA_MEM_ADDR | PCI_BASE_ADDRESS_SPACE_MEMORY),
 	.bar_size[1]		= VESA_MEM_SIZE,
 };
 
@@ -56,7 +57,7 @@ struct framebuffer *vesa__init(struct kvm *kvm)
 	vesa_pci_device.irq_pin		= pin;
 	vesa_pci_device.irq_line	= line;
 	vesa_base_addr
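For readers unfamiliar with what the cpu_to_le16()/cpu_to_le32() conversions
achieve, a minimal sketch follows. It does not use the kernel's byteorder
helpers; put_le16(), put_le32() and get_le16() are hypothetical names that
store values byte by byte, so the result is little-endian on any host. The
offsets used in the usage note (vendor ID at 0x00, device ID at 0x02) are the
standard PCI config header layout.

```c
#include <stdint.h>

/* Store a 16-bit value least-significant byte first, whatever the host byte order. */
void put_le16(uint8_t *dst, uint16_t v)
{
	dst[0] = (uint8_t)(v & 0xff);
	dst[1] = (uint8_t)(v >> 8);
}

/* Store a 32-bit value least-significant byte first. */
void put_le32(uint8_t *dst, uint32_t v)
{
	dst[0] = (uint8_t)(v & 0xff);
	dst[1] = (uint8_t)((v >> 8) & 0xff);
	dst[2] = (uint8_t)((v >> 16) & 0xff);
	dst[3] = (uint8_t)(v >> 24);
}

/* Read back a little-endian 16-bit value. */
uint16_t get_le16(const uint8_t *src)
{
	return (uint16_t)(src[0] | (src[1] << 8));
}
```

Writing vendor ID 0x1af4 at offset 0 with put_le16() yields the bytes f4 1a,
which is what a guest reading config space expects regardless of whether the
host is LE or BE.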
[PATCH 24/28] kvm tools: Fix virtio-pci endian bug when reading VIRTIO_PCI_QUEUE_NUM
The field size is currently wrong, read into a 32bit word instead of 16. This
causes trouble when BE.

Signed-off-by: Matt Evans m...@ozlabs.org
---
 tools/kvm/virtio/pci.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/tools/kvm/virtio/pci.c b/tools/kvm/virtio/pci.c
index 0ae93fb..6b27ff8 100644
--- a/tools/kvm/virtio/pci.c
+++ b/tools/kvm/virtio/pci.c
@@ -116,8 +116,7 @@ static bool virtio_pci__io_in(struct ioport *ioport, struct kvm *kvm, u16 port,
 		break;
 	case VIRTIO_PCI_QUEUE_NUM:
 		val = vtrans->virtio_ops->get_size_vq(kvm, vpci->dev, vpci->queue_selector);
-		ioport__write32(data, val);
-		break;
+		ioport__write16(data, val);
 		break;
 	case VIRTIO_PCI_STATUS:
 		ioport__write8(data, vpci->status);
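One way a width mismatch like this can bite on a big-endian host is sketched
below. This is an illustration under an assumption, namely that the 32-bit
store lands in host (big-endian) byte order while the guest reads only the
first two bytes; the helper names are hypothetical and the byte order is
simulated explicitly so the example behaves the same on any machine.

```c
#include <stdint.h>

/* Store a 32-bit value most-significant byte first, as a BE host would. */
void store_be32(uint8_t *buf, uint32_t v)
{
	buf[0] = (uint8_t)(v >> 24);
	buf[1] = (uint8_t)((v >> 16) & 0xff);
	buf[2] = (uint8_t)((v >> 8) & 0xff);
	buf[3] = (uint8_t)(v & 0xff);
}

/* Store a 16-bit value most-significant byte first. */
void store_be16(uint8_t *buf, uint16_t v)
{
	buf[0] = (uint8_t)(v >> 8);
	buf[1] = (uint8_t)(v & 0xff);
}

/* What a 16-bit read of the start of the buffer returns on a BE machine. */
uint16_t load_be16(const uint8_t *buf)
{
	return (uint16_t)((buf[0] << 8) | buf[1]);
}
```

With a queue size of 256, the 32-bit store puts the two zero high bytes
first, so the 16-bit read sees 0; the 16-bit store gives back 256.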
[PATCH 25/28] kvm tools: Correctly set virtio-pci bar_size and remove hardwired address
The BAR addresses are set up fine, but missed the bar_size[] array which is
now updated correspondingly. Use PCI_IO_SIZE instead of '0x100'.

Signed-off-by: Matt Evans m...@ozlabs.org
---
 tools/kvm/virtio/pci.c |    7 +++++--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/tools/kvm/virtio/pci.c b/tools/kvm/virtio/pci.c
index 6b27ff8..ffa3768 100644
--- a/tools/kvm/virtio/pci.c
+++ b/tools/kvm/virtio/pci.c
@@ -293,8 +293,8 @@ int virtio_pci__init(struct kvm *kvm, struct virtio_trans *vtrans, void *dev,
 	vpci->msix_pba_block = pci_get_io_space_block(PCI_IO_SIZE);
 
 	vpci->base_addr = ioport__register(IOPORT_EMPTY, &virtio_pci__io_ops, IOPORT_SIZE, vtrans);
-	kvm__register_mmio(kvm, vpci->msix_io_block, 0x100, callback_mmio_table, vpci);
-	kvm__register_mmio(kvm, vpci->msix_pba_block, 0x100, callback_mmio_pba, vpci);
+	kvm__register_mmio(kvm, vpci->msix_io_block, PCI_IO_SIZE, callback_mmio_table, vpci);
+	kvm__register_mmio(kvm, vpci->msix_pba_block, PCI_IO_SIZE, callback_mmio_pba, vpci);
 
 	vpci->pci_hdr = (struct pci_device_header) {
 		.vendor_id		= cpu_to_le16(PCI_VENDOR_ID_REDHAT_QUMRANET),
@@ -313,6 +313,9 @@ int virtio_pci__init(struct kvm *kvm, struct virtio_trans *vtrans, void *dev,
 							| PCI_BASE_ADDRESS_MEM_TYPE_64),
 		.status			= cpu_to_le16(PCI_STATUS_CAP_LIST),
 		.capabilities		= (void *)&vpci->pci_hdr.msix - (void *)&vpci->pci_hdr,
+		.bar_size[0]		= IOPORT_SIZE,
+		.bar_size[1]		= PCI_IO_SIZE,
+		.bar_size[3]		= PCI_IO_SIZE,
 	};
 
 	vpci->pci_hdr.msix.cap = PCI_CAP_ID_MSIX;
[PATCH 26/28] kvm tools: Add pci__config_{rd,wr}(), pci__find_dev() and fix PCI config register addressing
This allows config space access in a more natural manner than clunky x86 IO
ports, and is useful for other architectures.

Furthermore, the actual registers were only accessed in 32bit chunks; other
systems (e.g. PPC) allow smaller accesses so that, for example, the 16-bit
config field can be read directly. This patch allows this sort of addressing.

Signed-off-by: Matt Evans m...@ozlabs.org
---
 tools/kvm/include/kvm/pci.h |    5 +++
 tools/kvm/pci.c             |   63 +++---
 2 files changed, 45 insertions(+), 23 deletions(-)

diff --git a/tools/kvm/include/kvm/pci.h b/tools/kvm/include/kvm/pci.h
index 88e92dc..be2b0bc 100644
--- a/tools/kvm/include/kvm/pci.h
+++ b/tools/kvm/include/kvm/pci.h
@@ -7,6 +7,8 @@
 #include <linux/msi.h>
 #include <endian.h>
 
+#include "kvm/kvm.h"
+
 #define PCI_MAX_DEVICES		256
 /*
  * PCI Configuration Mechanism #1 I/O ports. See Section 3.7.4.1.
@@ -82,6 +84,9 @@ struct pci_device_header {
 
 void pci__init(void);
 void pci__register(struct pci_device_header *dev, u8 dev_num);
+struct pci_device_header *pci__find_dev(u8 dev_num);
 u32 pci_get_io_space_block(u32 size);
+void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data, int size);
+void pci__config_rd(struct kvm *kvm, union pci_config_address addr, void *data, int size);
 
 #endif /* KVM__PCI_H */
diff --git a/tools/kvm/pci.c b/tools/kvm/pci.c
index 5bbcbc7..8282e23 100644
--- a/tools/kvm/pci.c
+++ b/tools/kvm/pci.c
@@ -77,7 +77,6 @@ static bool pci_device_exists(u8 bus_number, u8 device_number, u8 function_numbe
 static bool pci_config_data_out(struct ioport *ioport, struct kvm *kvm, u16 port, void *data, int size)
 {
 	unsigned long start;
-	u8 dev_num;
 
 	/*
 	 * If someone accesses PCI configuration space offsets that are not
@@ -85,12 +84,41 @@ static bool pci_config_data_out(struct ioport *ioport, struct kvm *kvm, u16 port
 	 */
 	start = port - PCI_CONFIG_DATA;
 
-	dev_num = pci_config_address.device_number;
+	pci__config_wr(kvm, pci_config_address, data, size);
+
+	return true;
+}
+
+static bool pci_config_data_in(struct ioport *ioport, struct kvm *kvm, u16 port, void *data, int size)
+{
+	unsigned long start;
+
+	/*
+	 * If someone accesses PCI configuration space offsets that are not
+	 * aligned to 4 bytes, it uses ioports to signify that.
+	 */
+	start = port - PCI_CONFIG_DATA;
+
+	pci__config_rd(kvm, pci_config_address, data, size);
+
+	return true;
+}
+
+static struct ioport_operations pci_config_data_ops = {
+	.io_in		= pci_config_data_in,
+	.io_out		= pci_config_data_out,
+};
+
+void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data, int size)
+{
+	u8 dev_num;
+
+	dev_num = addr.device_number;
 
 	if (pci_device_exists(0, dev_num, 0)) {
 		unsigned long offset;
 
-		offset = start + (pci_config_address.register_number << 2);
+		offset = addr.w & 0xff;
 		if (offset < sizeof(struct pci_device_header)) {
 			void *p = pci_devices[dev_num];
 			u8 bar = (offset - PCI_BAR_OFFSET(0)) / (sizeof(u32));
@@ -116,27 +144,18 @@ static bool pci_config_data_out(struct ioport *ioport, struct kvm *kvm, u16 port
 			}
 		}
 	}
-
-	return true;
 }
 
-static bool pci_config_data_in(struct ioport *ioport, struct kvm *kvm, u16 port, void *data, int size)
+void pci__config_rd(struct kvm *kvm, union pci_config_address addr, void *data, int size)
 {
-	unsigned long start;
 	u8 dev_num;
 
-	/*
-	 * If someone accesses PCI configuration space offsets that are not
-	 * aligned to 4 bytes, it uses ioports to signify that.
-	 */
-	start = port - PCI_CONFIG_DATA;
-
-	dev_num = pci_config_address.device_number;
+	dev_num = addr.device_number;
 
 	if (pci_device_exists(0, dev_num, 0)) {
 		unsigned long offset;
 
-		offset = start + (pci_config_address.register_number << 2);
+		offset = addr.w & 0xff;
 		if (offset < sizeof(struct pci_device_header)) {
 			void *p = pci_devices[dev_num];
 
@@ -145,22 +164,20 @@ static bool pci_config_data_in(struct ioport *ioport, struct kvm *kvm, u16 port,
 			memset(data, 0x00, size);
 	} else
 		memset(data, 0xff, size);
-
-	return true;
 }
 
-static struct ioport_operations pci_config_data_ops = {
-	.io_in		= pci_config_data_in,
-	.io_out		= pci_config_data_out,
-};
-
 void pci__register(struct pci_device_header *dev, u8 dev_num)
 {
 	assert(dev_num < PCI_MAX_DEVICES);
-
 	pci_devices[dev_num]	= dev;
 }
 
+struct pci_device_header *pci__find_dev(u8 dev_num)
+{
+
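The byte-granular addressing described above can be sketched as follows. This
is a simplified illustration rather than the patch's code: it decodes a raw
32-bit Configuration Mechanism #1 address with shifts and masks instead of
the pci_config_address union, and cfg_device(), cfg_offset(), cfg_read() and
CFG_HEADER_SIZE are hypothetical names. Unlike the patch, it also checks that
the whole access fits inside the header.

```c
#include <stdint.h>
#include <string.h>

#define CFG_HEADER_SIZE 64	/* hypothetical size of the emulated header */

/* Device number: bits 15..11 of the Configuration Mechanism #1 address. */
unsigned cfg_device(uint32_t w)
{
	return (w >> 11) & 0x1f;
}

/* Byte offset into config space: the low 8 bits, as in "addr.w & 0xff". */
unsigned cfg_offset(uint32_t w)
{
	return w & 0xff;
}

/* Copy 'size' bytes (1, 2 or 4) from the header; reads past it return zeroes. */
void cfg_read(const uint8_t *hdr, uint32_t w, void *data, unsigned size)
{
	unsigned offset = cfg_offset(w);

	if (offset + size <= CFG_HEADER_SIZE)
		memcpy(data, hdr + offset, size);
	else
		memset(data, 0x00, size);
}
```

A 2-byte read at offset 0x06 (the status register) then returns just that
field, with no need to fetch the enclosing 32-bit word and shift.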
[PATCH 27/28] kvm tools: Arch-specific define for PCI MMIO allocation area
pci_get_io_space_block() used to grab addresses from
KVM_32BIT_GAP_START + 0x100, which is x86-specific. Create a new define,
KVM_PCI_MMIO_AREA, to specify a bus address these allocations can come from.

Signed-off-by: Matt Evans m...@ozlabs.org
---
 tools/kvm/pci.c                      |    8 ++++++--
 tools/kvm/x86/include/kvm/kvm-arch.h |    5 +++++
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/tools/kvm/pci.c b/tools/kvm/pci.c
index 8282e23..045c1c5 100644
--- a/tools/kvm/pci.c
+++ b/tools/kvm/pci.c
@@ -11,8 +11,12 @@
 static struct pci_device_header		*pci_devices[PCI_MAX_DEVICES];
 static union pci_config_address		pci_config_address;
 
-/* This is within our PCI gap - in an unused area */
-static u32 io_space_blocks		= KVM_32BIT_GAP_START + 0x100;
+/* This is within our PCI gap - in an unused area.
+ * Note this is a PCI *bus address*, is used to assign BARs etc.!
+ * (That's why it can still 32bit even with 64bit guests-- 64bit
+ * PCI isn't currently supported.)
+ */
+static u32 io_space_blocks		= KVM_PCI_MMIO_AREA;
 
 u32 pci_get_io_space_block(u32 size)
 {
diff --git a/tools/kvm/x86/include/kvm/kvm-arch.h b/tools/kvm/x86/include/kvm/kvm-arch.h
index 02aa8b9..686b1b8 100644
--- a/tools/kvm/x86/include/kvm/kvm-arch.h
+++ b/tools/kvm/x86/include/kvm/kvm-arch.h
@@ -18,6 +18,11 @@
 
 #define KVM_MMIO_START			KVM_32BIT_GAP_START
 
+/* This is the address that pci_get_io_space_block() starts allocating
+ * from.  Note that this is a PCI bus address (though same on x86).
+ */
+#define KVM_PCI_MMIO_AREA		(KVM_MMIO_START + 0x100)
+
 struct kvm {
 	int			sys_fd;		/* For system ioctls(), i.e. /dev/kvm */
 	int			vm_fd;		/* For VM ioctls() */
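The allocation scheme this define feeds is a simple bump allocator: hand out
the current address, then advance by the requested size. A minimal sketch,
assuming that behaviour; EXAMPLE_PCI_MMIO_AREA, struct io_space and
io_space_alloc() are hypothetical, and the base value is made up for the
example rather than taken from any architecture.

```c
#include <stdint.h>

/* Hypothetical base address; the real one is the per-arch KVM_PCI_MMIO_AREA. */
#define EXAMPLE_PCI_MMIO_AREA 0xd1000000u

/* Allocator state: the next free PCI bus address. */
struct io_space {
	uint32_t next;
};

/* Return the current block address and advance past it; nothing is ever freed. */
uint32_t io_space_alloc(struct io_space *s, uint32_t size)
{
	uint32_t block = s->next;

	s->next += size;
	return block;
}
```

Keeping the base in one per-architecture define means the allocator itself
needs no #ifdefs; only the starting address differs between x86 and PPC.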
[PATCH 28/28] kvm tools: Create arch-specific kvm_cpu__emulate_io()
Different architectures will deal with MMIO exits differently. For example,
KVM_EXIT_IO is x86-specific, and I/O cycles are often synthesised by steering
into windows in PCI bridges on other architectures.

This patch moves the IO/MMIO exit code from the main runloop into
x86/kvm-cpu.c

Signed-off-by: Matt Evans m...@ozlabs.org
---
 tools/kvm/include/kvm/kvm-cpu.h |    1 +
 tools/kvm/kvm-cpu.c             |   37 +
 tools/kvm/x86/kvm-cpu.c         |   37 +
 3 files changed, 43 insertions(+), 32 deletions(-)

diff --git a/tools/kvm/include/kvm/kvm-cpu.h b/tools/kvm/include/kvm/kvm-cpu.h
index 15618f1..6f38c0c 100644
--- a/tools/kvm/include/kvm/kvm-cpu.h
+++ b/tools/kvm/include/kvm/kvm-cpu.h
@@ -13,6 +13,7 @@ void kvm_cpu__run(struct kvm_cpu *vcpu);
 void kvm_cpu__reboot(void);
 int kvm_cpu__start(struct kvm_cpu *cpu);
 bool kvm_cpu__handle_exit(struct kvm_cpu *vcpu);
+bool kvm_cpu__emulate_io(struct kvm_cpu *cpu, struct kvm_run *kvm_run);
 
 int kvm_cpu__get_debug_fd(void);
 void kvm_cpu__set_debug_fd(int fd);
diff --git a/tools/kvm/kvm-cpu.c b/tools/kvm/kvm-cpu.c
index 884a89f..c9fbc81 100644
--- a/tools/kvm/kvm-cpu.c
+++ b/tools/kvm/kvm-cpu.c
@@ -103,49 +103,22 @@ int kvm_cpu__start(struct kvm_cpu *cpu)
 			kvm_cpu__show_registers(cpu);
 			kvm_cpu__show_code(cpu);
 			break;
-		case KVM_EXIT_IO: {
-			bool ret;
-
-			ret = kvm__emulate_io(cpu->kvm,
-					cpu->kvm_run->io.port,
-					(u8 *)cpu->kvm_run +
-					cpu->kvm_run->io.data_offset,
-					cpu->kvm_run->io.direction,
-					cpu->kvm_run->io.size,
-					cpu->kvm_run->io.count);
-
-			if (!ret)
+		case KVM_EXIT_IO:
+		case KVM_EXIT_MMIO:
+			if (!kvm_cpu__emulate_io(cpu, cpu->kvm_run))
 				goto panic_kvm;
 			break;
-		}
-		case KVM_EXIT_MMIO: {
-			bool ret;
-
-			ret = kvm__emulate_mmio(cpu->kvm,
-					cpu->kvm_run->mmio.phys_addr,
-					cpu->kvm_run->mmio.data,
-					cpu->kvm_run->mmio.len,
-					cpu->kvm_run->mmio.is_write);
-
-			if (!ret)
-				goto panic_kvm;
-			break;
-		}
 		case KVM_EXIT_INTR:
 			if (cpu->is_running)
 				break;
 			goto exit_kvm;
 		case KVM_EXIT_SHUTDOWN:
 			goto exit_kvm;
-		default: {
-			bool ret;
-
-			ret = kvm_cpu__handle_exit(cpu);
-			if (!ret)
+		default:
+			if (!kvm_cpu__handle_exit(cpu))
 				goto panic_kvm;
 			break;
 		}
-		}
 
 		kvm_cpu__handle_coalesced_mmio(cpu);
 	}
diff --git a/tools/kvm/x86/kvm-cpu.c b/tools/kvm/x86/kvm-cpu.c
index a0d10cc..665d742 100644
--- a/tools/kvm/x86/kvm-cpu.c
+++ b/tools/kvm/x86/kvm-cpu.c
@@ -217,6 +217,43 @@ bool kvm_cpu__handle_exit(struct kvm_cpu *vcpu)
 	return false;
 }
 
+bool kvm_cpu__emulate_io(struct kvm_cpu *cpu, struct kvm_run *kvm_run)
+{
+	bool ret;
+	switch (kvm_run->exit_reason) {
+	case KVM_EXIT_IO: {
+		ret = kvm__emulate_io(cpu->kvm,
+				cpu->kvm_run->io.port,
+				(u8 *)cpu->kvm_run +
+				cpu->kvm_run->io.data_offset,
+				cpu->kvm_run->io.direction,
+				cpu->kvm_run->io.size,
+				cpu->kvm_run->io.count);
+
+		if (!ret)
+			goto panic_kvm;
+		break;
+	}
+	case KVM_EXIT_MMIO: {
+		ret = kvm__emulate_mmio(cpu->kvm,
+				cpu->kvm_run->mmio.phys_addr,
+				cpu->kvm_run->mmio.data,
+				cpu->kvm_run->mmio.len,
+				cpu->kvm_run->mmio.is_write);
+
+		if (!ret)
+			goto panic_kvm;
+		break;
+	}
+	default:
+		pr_warning("Unknown exit reason %d in %s\n", kvm_run->exit_reason, __FUNCTION__);
+		return false;
+	}
+	return true;
+panic_kvm:
+	return false;
+}
+
 static void print_dtable(const
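The shape of the split can be sketched without any KVM headers. Everything
below is a hypothetical stand-in: the enum values are not the real
KVM_EXIT_* constants, struct run_state replaces struct kvm_run, and the
io_ok/mmio_ok flags replace the actual emulation calls. The point is only
that the generic loop forwards both exit kinds to one per-architecture hook.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the KVM exit reasons. */
enum exit_reason { EXIT_IO, EXIT_MMIO, EXIT_OTHER };

/* Hypothetical stand-in for struct kvm_run plus emulation results. */
struct run_state {
	enum exit_reason exit_reason;
	bool io_ok;	/* pretend result of port I/O emulation */
	bool mmio_ok;	/* pretend result of MMIO emulation */
};

/* Arch-specific hook: each architecture decides how to treat I/O exits. */
bool arch_emulate_io(const struct run_state *run)
{
	switch (run->exit_reason) {
	case EXIT_IO:
		return run->io_ok;
	case EXIT_MMIO:
		return run->mmio_ok;
	default:
		return false;	/* unknown reason: report failure */
	}
}

/* Generic run loop step: true means keep running, false means panic. */
bool handle_exit(const struct run_state *run)
{
	switch (run->exit_reason) {
	case EXIT_IO:
	case EXIT_MMIO:
		return arch_emulate_io(run);
	default:
		return true;
	}
}
```

An architecture without port I/O could implement the hook by translating
accesses within a bridge window into port emulation, leaving the generic loop
untouched.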
[PATCH 1/8] kvm tools: Add initial SPAPR PPC64 architecture support
This patch adds a new arch directory, powerpc, basic file structure, register
setup and where necessary stubs out arch-specific functions (e.g. interrupts,
runloop exits) that later patches will provide.

The target is an SPAPR-compliant PPC64 machine (i.e. pSeries); there is no
support for PPC32 or 'bare metal' PPC64 guests as yet. Subsequent patches
implement the hcalls and RTAS required to boot SPAPR pSeries kernels.

Memory is mapped from hugetlbfs (as that is currently required by upstream
PPC64 HV-mode KVM). The mapping of a VRMA region is yet to be implemented;
this is only necessary on processors that don't support VRMA, e.g. <= P6.
Work is therefore needed to get this going on pre-P7 CPUs.

Processor state is set up as a guest kernel would expect (both primary and
secondaries), and SMP is fully supported.

Finally, support is added for simply loading flat binary kernels (plus
initrd). (bzImages are not used on PPC, and this series does not add zImage
support or an ELF loader.) The intention is to later support loading firmware
such as SLOF.

Signed-off-by: Matt Evans m...@ozlabs.org
---
 tools/kvm/Makefile                           |   10 +
 tools/kvm/kvm.c                              |    3 +
 tools/kvm/powerpc/include/kvm/barrier.h      |    6 +
 tools/kvm/powerpc/include/kvm/kvm-arch.h     |   70
 tools/kvm/powerpc/include/kvm/kvm-cpu-arch.h |   46 +
 tools/kvm/powerpc/ioport.c                   |   18 ++
 tools/kvm/powerpc/irq.c                      |   40 +
 tools/kvm/powerpc/kvm-cpu.c                  |  232 ++
 tools/kvm/powerpc/kvm.c                      |  231 +
 9 files changed, 656 insertions(+), 0 deletions(-)
 create mode 100644 tools/kvm/powerpc/include/kvm/barrier.h
 create mode 100644 tools/kvm/powerpc/include/kvm/kvm-arch.h
 create mode 100644 tools/kvm/powerpc/include/kvm/kvm-cpu-arch.h
 create mode 100644 tools/kvm/powerpc/ioport.c
 create mode 100644 tools/kvm/powerpc/irq.c
 create mode 100644 tools/kvm/powerpc/kvm-cpu.c
 create mode 100644 tools/kvm/powerpc/kvm.c

diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile
index 57dc521..58815a2 100644
--- a/tools/kvm/Makefile
+++ b/tools/kvm/Makefile
@@ -121,6 +121,16 @@ ifeq ($(ARCH),x86)
 	OTHEROBJS	+= x86/bios/bios-rom.o
 	ARCH_INCLUDE	:= x86/include
 endif
+# POWER/ppc: Actually only support ppc64 currently.
+ifeq ($(uname_M), ppc64)
+	DEFINES += -DCONFIG_PPC
+	OBJS	+= powerpc/ioport.o
+	OBJS	+= powerpc/irq.o
+	OBJS	+= powerpc/kvm.o
+	OBJS	+= powerpc/kvm-cpu.o
+	ARCH_INCLUDE := powerpc/include
+	CFLAGS	+= -m64
+endif
 
 ###
 
diff --git a/tools/kvm/kvm.c b/tools/kvm/kvm.c
index 503ceae..d716ede 100644
--- a/tools/kvm/kvm.c
+++ b/tools/kvm/kvm.c
@@ -49,6 +49,9 @@ const char *kvm_exit_reasons[] = {
 	DEFINE_KVM_EXIT_REASON(KVM_EXIT_DCR),
 	DEFINE_KVM_EXIT_REASON(KVM_EXIT_NMI),
 	DEFINE_KVM_EXIT_REASON(KVM_EXIT_INTERNAL_ERROR),
+#ifdef CONFIG_PPC64
+	DEFINE_KVM_EXIT_REASON(KVM_EXIT_PAPR_HCALL),
+#endif
 };
 
 extern struct kvm *kvm;
diff --git a/tools/kvm/powerpc/include/kvm/barrier.h b/tools/kvm/powerpc/include/kvm/barrier.h
new file mode 100644
index 000..bc7d179
--- /dev/null
+++ b/tools/kvm/powerpc/include/kvm/barrier.h
@@ -0,0 +1,6 @@
+#ifndef _KVM_BARRIER_H_
+#define _KVM_BARRIER_H_
+
+#include <asm/system.h>
+
+#endif /* _KVM_BARRIER_H_ */
diff --git a/tools/kvm/powerpc/include/kvm/kvm-arch.h b/tools/kvm/powerpc/include/kvm/kvm-arch.h
new file mode 100644
index 000..722d01c
--- /dev/null
+++ b/tools/kvm/powerpc/include/kvm/kvm-arch.h
@@ -0,0 +1,70 @@
+/*
+ * PPC64 architecture-specific definitions
+ *
+ * Copyright 2011 Matt Evans m...@ozlabs.org, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ */
+
+#ifndef KVM__KVM_ARCH_H
+#define KVM__KVM_ARCH_H
+
+#include <stdbool.h>
+#include <linux/types.h>
+#include <time.h>
+
+#define KVM_NR_CPUS			(255)
+
+/* MMIO lives after RAM, but it'd be nice if it didn't constantly move.
+ * Choose a suitably high address, e.g. 63T... This limits RAM size.
+ */
+#define PPC_MMIO_START			0x3F00UL
+#define PPC_MMIO_SIZE			0x0100UL
+
+#define KERNEL_LOAD_ADDR		0x
+#define KERNEL_START_ADDR		0x
+#define KERNEL_SECONDARY_START_ADDR	0x0060
+#define INITRD_LOAD_ADDR		0x0280
+
+#define FDT_MAX_SIZE			0x1
+#define RTAS_MAX_SIZE			0x1
+
+#define TIMEBASE_FREQ			51200ULL
+
+#define KVM_MMIO_START			PPC_MMIO_START
+
+/* This is the address that pci_get_io_space_block() starts allocating
+ * from. Note that this is a PCI bus address.
+ */
+#define
[PATCH 2/8] kvm tools: Generate SPAPR PPC64 guest device tree
The generated DT is the bare minimum structure required for SPAPR (on which
subsequent patches for VIO, XICS, PCI etc. will build); root node, cpus,
memory.

Some aspects are currently hardwired for simplicity, for example advertised
page sizes, HPT size, SLB size, VMX/DFP, etc. Future support of a variety of
POWER CPUs should acquire this info from the host and encode appropriately.

This requires a 64-bit libfdt.

Signed-off-by: Matt Evans m...@ozlabs.org
---
 tools/kvm/Makefile      |    3 +-
 tools/kvm/powerpc/kvm.c |  141 +++
 2 files changed, 143 insertions(+), 1 deletions(-)

diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile
index 58815a2..dc18959 100644
--- a/tools/kvm/Makefile
+++ b/tools/kvm/Makefile
@@ -129,7 +129,8 @@ ifeq ($(uname_M), ppc64)
 	OBJS	+= powerpc/kvm.o
 	OBJS	+= powerpc/kvm-cpu.o
 	ARCH_INCLUDE := powerpc/include
-	CFLAGS	+= -m64
+	CFLAGS	+= -m64
+	LIBS	+= -lfdt
 endif
 
 ###
diff --git a/tools/kvm/powerpc/kvm.c b/tools/kvm/powerpc/kvm.c
index 036bfc0..d792bee 100644
--- a/tools/kvm/powerpc/kvm.c
+++ b/tools/kvm/powerpc/kvm.c
@@ -3,6 +3,9 @@
  *
  * Copyright 2011 Matt Evans m...@ozlabs.org, IBM Corporation.
  *
+ * Portions of FDT setup borrowed from QEMU, copyright 2010 David Gibson, IBM
+ * Corporation.
+ *
  * This program is free software; you can redistribute it and/or modify it
  * under the terms of the GNU General Public License version 2 as published
  * by the Free Software Foundation.
@@ -28,8 +31,11 @@
 #include <asm/unistd.h>
 #include <errno.h>
 
+#include <linux/byteorder.h>
 #include <libfdt.h>
 
+#define HPT_ORDER 24
+
 #define HUGETLBFS_PATH "/var/lib/hugetlbfs/global/pagesize-16MB/"
 
 static char kern_cmdline[2048];
@@ -212,9 +218,144 @@ bool load_bzimage(struct kvm *kvm, int fd_kernel,
 	return false;
 }
 
+#define SMT_THREADS 4
+
+#define _FDT(exp)							\
+	do {								\
+		int ret = (exp);					\
+		if (ret < 0) {						\
+			die("Error creating device tree: %s: %s\n",	\
+			    #exp, fdt_strerror(ret));			\
+		}							\
+	} while (0)
+
+static uint32_t mfpvr(void)
+{
+	uint32_t r;
+	asm volatile ("mfpvr %0" : "=r"(r));
+	return r;
+}
+
 static void setup_fdt(struct kvm *kvm)
 {
+	uint64_t	mem_reg_property[] = { 0, cpu_to_be64(kvm->ram_size) };
+	int		smp_cpus = kvm->nrcpus;
+	uint32_t	interrupt_server_ranges_prop[] = {0, cpu_to_be32(smp_cpus)};
+	char		hypertas_prop_kvm[] = "hcall-pft\0hcall-term\0hcall-dabr\0hcall-interrupt"
+		"\0hcall-tce\0hcall-vio\0hcall-splpar\0hcall-bulk";
+	int		i, j;
+	char		cpu_name[30];
+	u8		staging_fdt[FDT_MAX_SIZE];
+	uint32_t	pvr = mfpvr();
+
+	/* Generate an appropriate DT at kvm->fdt_gra */
+	void *fdt_dest = guest_flat_to_host(kvm, kvm->fdt_gra);
+	void *fdt = staging_fdt;
+
+	_FDT(fdt_create(fdt, FDT_MAX_SIZE));
+	_FDT(fdt_finish_reservemap(fdt));
+
+	_FDT(fdt_begin_node(fdt, ""));
+
+	_FDT(fdt_property_string(fdt, "device_type", "chrp"));
+	_FDT(fdt_property_string(fdt, "model", "IBM pSeries (emulated by kvmtool)"));
+	_FDT(fdt_property_cell(fdt, "#address-cells", 0x2));
+	_FDT(fdt_property_cell(fdt, "#size-cells", 0x2));
+
+	/* /chosen */
+	_FDT(fdt_begin_node(fdt, "chosen"));
+	/* cmdline */
+	_FDT(fdt_property_string(fdt, "bootargs", kern_cmdline));
+	/* Initrd */
+	if (kvm->initrd_size != 0) {
+		uint32_t ird_st_prop = cpu_to_be32(kvm->initrd_gra);
+		uint32_t ird_end_prop = cpu_to_be32(kvm->initrd_gra +
+						    kvm->initrd_size);
+		_FDT(fdt_property(fdt, "linux,initrd-start",
+				  &ird_st_prop, sizeof(ird_st_prop)));
+		_FDT(fdt_property(fdt, "linux,initrd-end",
+				  &ird_end_prop, sizeof(ird_end_prop)));
+	}
+
+	/* Memory: We don't alloc. a separate RMA yet. If we ever need to
+	 * (CAP_PPC_RMA == 2) then have one memory node for 0-RMAsize, and
+	 * another RMAsize-endOfMem.
+	 */
+	_FDT(fdt_begin_node(fdt, "memory@0"));
+	_FDT(fdt_property_string(fdt, "device_type", "memory"));
+	_FDT(fdt_property(fdt, "reg", mem_reg_property, sizeof(mem_reg_property)));
+	_FDT(fdt_end_node(fdt));
+
+	/* CPUs */
+	_FDT(fdt_begin_node(fdt, "cpus"));
+	_FDT(fdt_property_cell(fdt, "#address-cells", 0x1));
+	_FDT(fdt_property_cell(fdt, "#size-cells", 0x0));
+
+	for (i = 0; i < smp_cpus; i +=
[PATCH 0/8] kvm tools SPAPR PPC64 support
Hi,

This set of patches builds upon the prep-work of the previous set and adds
support to kvmtool for PPC64 SPAPR-based guests, i.e. an environment akin to an
LPAR on IBM's pSeries machines.

This support is not yet fully-featured but, in a basic state, works well.  The
guests have a functional but no-frills experience, with:

- SMP guests
- HV console (or RTAS console, for udbg)
- Net, block over virtio-pci
- No PAPR VIO/VSCSI/VNET yet
- No fancy features like migration yet

Though minimal, guests are quite stable.  There are obvious areas for future
improvement:

- Non-VRMA RMAs aren't supported, meaning POWER7-only for the moment
- Other CPU-specific details are currently assumed (e.g. available page sizes);
  work is required to determine host capabilities and pass these up.
- Support SLOF
- Maybe support VIO
- Some hypercalls used by partition firmware/SLOF (not the kernel) are
  unimplemented
- Fancy PCI (e.g. passthrough)
- Currently KVM_NR_CPUs is arbitrarily fixed at 255, and could be higher.
  Guests with this many CPUs boot fine.

Some PPC KVM kernel-side features aren't implemented yet and have required
kvmtool workarounds; mmio coalescing isn't supported and lack of ioeventfds
requires virtio to gracefully fall back when it fails to register one.

Cheers,


Matt


Matt Evans (8):
  kvm tools: Add initial SPAPR PPC64 architecture support
  kvm tools: Generate SPAPR PPC64 guest device tree
  kvm tools: Add SPAPR PPC64 hcall & rtascall structure
  kvm tools: Add SPAPR PPC64 HV console
  kvm tools: Add PPC64 XICS interrupt controller support
  kvm tools: Add PPC64 PCI Host Bridge
  kvm tools: Add PPC64 kvm_cpu__emulate_io()
  kvm tools: Make virtio-pci's ioeventfd__add_event() fall back gracefully
    if ioeventfds unavailable

 tools/kvm/Makefile                           |   16 +
 tools/kvm/include/kvm/ioeventfd.h            |    3 +-
 tools/kvm/ioeventfd.c                        |   12 +-
 tools/kvm/kvm.c                              |    3 +
 tools/kvm/powerpc/include/kvm/barrier.h      |    6 +
 tools/kvm/powerpc/include/kvm/kvm-arch.h     |   74
 tools/kvm/powerpc/include/kvm/kvm-cpu-arch.h |   48 +++
 tools/kvm/powerpc/ioport.c                   |   18 +
 tools/kvm/powerpc/irq.c                      |   62 +++
 tools/kvm/powerpc/kvm-cpu.c                  |  281 ++
 tools/kvm/powerpc/kvm.c                      |  466 +++
 tools/kvm/powerpc/spapr.h                    |  316 +++
 tools/kvm/powerpc/spapr_hcall.c              |  151
 tools/kvm/powerpc/spapr_hvcons.c             |  101 +
 tools/kvm/powerpc/spapr_hvcons.h             |   19 +
 tools/kvm/powerpc/spapr_pci.c                |  429 +
 tools/kvm/powerpc/spapr_pci.h                |   38 ++
 tools/kvm/powerpc/spapr_rtas.c               |  226 +++
 tools/kvm/powerpc/xics.c                     |  529 ++
 tools/kvm/powerpc/xics.h                     |   23 ++
 tools/kvm/virtio/pci.c                       |   11 +-
 21 files changed, 2827 insertions(+), 5 deletions(-)
 create mode 100644 tools/kvm/powerpc/include/kvm/barrier.h
 create mode 100644 tools/kvm/powerpc/include/kvm/kvm-arch.h
 create mode 100644 tools/kvm/powerpc/include/kvm/kvm-cpu-arch.h
 create mode 100644 tools/kvm/powerpc/ioport.c
 create mode 100644 tools/kvm/powerpc/irq.c
 create mode 100644 tools/kvm/powerpc/kvm-cpu.c
 create mode 100644 tools/kvm/powerpc/kvm.c
 create mode 100644 tools/kvm/powerpc/spapr.h
 create mode 100644 tools/kvm/powerpc/spapr_hcall.c
 create mode 100644 tools/kvm/powerpc/spapr_hvcons.c
 create mode 100644 tools/kvm/powerpc/spapr_hvcons.h
 create mode 100644 tools/kvm/powerpc/spapr_pci.c
 create mode 100644 tools/kvm/powerpc/spapr_pci.h
 create mode 100644 tools/kvm/powerpc/spapr_rtas.c
 create mode 100644 tools/kvm/powerpc/xics.c
 create mode 100644 tools/kvm/powerpc/xics.h
[PATCH 3/8] kvm tools: Add SPAPR PPC64 hcall & rtascall structure
This patch adds the basic structure for HV calls, their registration and some of
the simpler calls.  A similar layout for RTAS calls is also added, again with
some of the simpler RTAS calls used by the guest.  The SPAPR RTAS stub is
generated inline.  Also, nodes for RTAS are added to the device tree.

Signed-off-by: Matt Evans m...@ozlabs.org
---
 tools/kvm/Makefile              |    2 +
 tools/kvm/powerpc/kvm-cpu.c     |    5 +
 tools/kvm/powerpc/kvm.c         |   39 +-
 tools/kvm/powerpc/spapr.h       |  308 +++
 tools/kvm/powerpc/spapr_hcall.c |  151 +++
 tools/kvm/powerpc/spapr_rtas.c  |  226
 6 files changed, 730 insertions(+), 1 deletions(-)
 create mode 100644 tools/kvm/powerpc/spapr.h
 create mode 100644 tools/kvm/powerpc/spapr_hcall.c
 create mode 100644 tools/kvm/powerpc/spapr_rtas.c

diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile
index dc18959..0f24104 100644
--- a/tools/kvm/Makefile
+++ b/tools/kvm/Makefile
@@ -128,6 +128,8 @@ ifeq ($(uname_M), ppc64)
 	OBJS	+= powerpc/irq.o
 	OBJS	+= powerpc/kvm.o
 	OBJS	+= powerpc/kvm-cpu.o
+	OBJS	+= powerpc/spapr_hcall.o
+	OBJS	+= powerpc/spapr_rtas.o
 	ARCH_INCLUDE := powerpc/include
 	CFLAGS	+= -m64
 	LIBS	+= -lfdt
diff --git a/tools/kvm/powerpc/kvm-cpu.c b/tools/kvm/powerpc/kvm-cpu.c
index 79422ff..71c648e 100644
--- a/tools/kvm/powerpc/kvm-cpu.c
+++ b/tools/kvm/powerpc/kvm-cpu.c
@@ -14,6 +14,8 @@
 #include "kvm/util.h"
 #include "kvm/kvm.h"
 
+#include "spapr.h"
+
 #include <sys/ioctl.h>
 #include <sys/mman.h>
 #include <signal.h>
@@ -156,6 +158,9 @@ bool kvm_cpu__handle_exit(struct kvm_cpu *vcpu)
 	bool ret = true;
 	struct kvm_run *run = vcpu->kvm_run;
 	switch(run->exit_reason) {
+	case KVM_EXIT_PAPR_HCALL:
+		run->papr_hcall.ret = spapr_hypercall(vcpu, run->papr_hcall.nr, run->papr_hcall.args);
+		break;
 	default:
 		ret = false;
 	}
diff --git a/tools/kvm/powerpc/kvm.c b/tools/kvm/powerpc/kvm.c
index d792bee..2f0a921 100644
--- a/tools/kvm/powerpc/kvm.c
+++ b/tools/kvm/powerpc/kvm.c
@@ -14,6 +14,8 @@
 #include "kvm/kvm.h"
 #include "kvm/util.h"
 
+#include "spapr.h"
+
 #include <linux/kvm.h>
 
 #include <sys/types.h>
@@ -153,6 +155,10 @@ void kvm__arch_init(struct kvm *kvm, const char *kvm_dev, const char *hugetlbfs_
 	cap_ppc_rma = ioctl(kvm->sys_fd, KVM_CHECK_EXTENSION, KVM_CAP_PPC_RMA);
 	if (cap_ppc_rma == 2)
 		die("Need contiguous RMA allocation on this hardware, which is not yet supported.");
+
+	/* Do these before FDT setup, IRQ setup, etc. */
+	hypercall_init();
+	register_core_rtas();
 }
 
 void kvm__irq_line(struct kvm *kvm, int irq, int level)
@@ -262,6 +268,20 @@ static void setup_fdt(struct kvm *kvm)
 	_FDT(fdt_property_cell(fdt, "#address-cells", 0x2));
 	_FDT(fdt_property_cell(fdt, "#size-cells", 0x2));
 
+	/* RTAS */
+	_FDT(fdt_begin_node(fdt, "rtas"));
+	/* This is what the kernel uses to switch 'We're an LPAR'! */
+	_FDT(fdt_property(fdt, "ibm,hypertas-functions", hypertas_prop_kvm,
+			  sizeof(hypertas_prop_kvm)));
+	_FDT(fdt_property_cell(fdt, "linux,rtas-base", kvm->rtas_gra));
+	_FDT(fdt_property_cell(fdt, "linux,rtas-entry", kvm->rtas_gra));
+	_FDT(fdt_property_cell(fdt, "rtas-size", kvm->rtas_size));
+	/* Now add properties for all RTAS tokens: */
+	if (spapr_rtas_fdt_setup(kvm, fdt))
+		die("Couldn't create RTAS FDT properties\n");
+
+	_FDT(fdt_end_node(fdt));
+
 	/* /chosen */
 	_FDT(fdt_begin_node(fdt, "chosen"));
 	/* cmdline */
@@ -363,7 +383,24 @@ static void setup_fdt(struct kvm *kvm)
  */
 void kvm__arch_setup_firmware(struct kvm *kvm)
 {
-	/* Load RTAS */
+	/* Set up RTAS stub.  All it is is a single hypercall:
+	    0:   7c 64 1b 78     mr      r4,r3
+	    4:   3c 60 00 00     lis     r3,0
+	    8:   60 63 f0 00     ori     r3,r3,61440
+	    c:   44 00 00 22     sc      1
+	   10:   4e 80 00 20     blr
+	*/
+	uint32_t *rtas = guest_flat_to_host(kvm, kvm->rtas_gra);
+
+	rtas[0] = 0x7c641b78;
+	rtas[1] = 0x3c600000;
+	rtas[2] = 0x6063f000;
+	rtas[3] = 0x44000022;
+	rtas[4] = 0x4e800020;
+	kvm->rtas_size = 20;
+
+	pr_info("Set up %ld bytes of RTAS at 0x%lx\n",
+		kvm->rtas_size, kvm->rtas_gra);
 
 	/* Load SLOF */
 
diff --git a/tools/kvm/powerpc/spapr.h b/tools/kvm/powerpc/spapr.h
new file mode 100644
index 000..4e5d7bd
--- /dev/null
+++ b/tools/kvm/powerpc/spapr.h
@@ -0,0 +1,308 @@
+/*
+ * SPAPR definitions and declarations
+ *
+ * Borrowed heavily from QEMU's spapr.h,
+ * Copyright (c) 2010 David Gibson, IBM Corporation.
+ *
+ * Modifications by Matt Evans m...@ozlabs.org, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or
[PATCH 4/8] kvm tools: Add SPAPR PPC64 HV console
This adds the console code, plus VIO HV terminal nodes are added to the device tree so the guest kernel will pick it up. Signed-off-by: Matt Evans m...@ozlabs.org --- tools/kvm/Makefile |1 + tools/kvm/powerpc/kvm.c | 31 tools/kvm/powerpc/spapr_hvcons.c | 101 ++ tools/kvm/powerpc/spapr_hvcons.h | 19 +++ 4 files changed, 152 insertions(+), 0 deletions(-) create mode 100644 tools/kvm/powerpc/spapr_hvcons.c create mode 100644 tools/kvm/powerpc/spapr_hvcons.h diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile index 0f24104..76cce3a 100644 --- a/tools/kvm/Makefile +++ b/tools/kvm/Makefile @@ -130,6 +130,7 @@ ifeq ($(uname_M), ppc64) OBJS+= powerpc/kvm-cpu.o OBJS+= powerpc/spapr_hcall.o OBJS+= powerpc/spapr_rtas.o + OBJS+= powerpc/spapr_hvcons.o ARCH_INCLUDE := powerpc/include CFLAGS += -m64 LIBS+= -lfdt diff --git a/tools/kvm/powerpc/kvm.c b/tools/kvm/powerpc/kvm.c index 2f0a921..8614538 100644 --- a/tools/kvm/powerpc/kvm.c +++ b/tools/kvm/powerpc/kvm.c @@ -15,6 +15,7 @@ #include kvm/util.h #include spapr.h +#include spapr_hvcons.h #include linux/kvm.h @@ -159,6 +160,8 @@ void kvm__arch_init(struct kvm *kvm, const char *kvm_dev, const char *hugetlbfs_ /* Do these before FDT setup, IRQ setup, etc. */ hypercall_init(); register_core_rtas(); + /* Now that hypercalls are initialised, register a couple for the console: */ + spapr_hvcons_init(); } void kvm__irq_line(struct kvm *kvm, int irq, int level) @@ -172,6 +175,11 @@ void kvm__irq_trigger(struct kvm *kvm, int irq) kvm__irq_line(kvm, irq, 0); } +void kvm__arch_periodic_poll(struct kvm *kvm) +{ + spapr_hvcons_poll(kvm); +} + int load_flat_binary(struct kvm *kvm, int fd_kernel, int fd_initrd, const char *kernel_cmdline) { void *p; @@ -297,6 +305,13 @@ static void setup_fdt(struct kvm *kvm) ird_end_prop, sizeof(ird_end_prop))); } + /* stdout-path: This is assuming we're using the HV console. Also, the +* address is hardwired until we do a VIO bus. 
+*/ + _FDT(fdt_property_string(fdt, linux,stdout-path, +/vdevice/vty@3000)); + _FDT(fdt_end_node(fdt)); + /* Memory: We don't alloc. a separate RMA yet. If we ever need to * (CAP_PPC_RMA == 2) then have one memory node for 0-RMAsize, and * another RMAsize-endOfMem. @@ -369,6 +384,22 @@ static void setup_fdt(struct kvm *kvm) } _FDT(fdt_end_node(fdt)); + /* VIO: See comment in linux,stdout-path; we don't yet represent a VIO +* bus/address allocation so addresses are hardwired here. +*/ + _FDT(fdt_begin_node(fdt, vdevice)); + _FDT(fdt_property_cell(fdt, #address-cells, 0x1)); + _FDT(fdt_property_cell(fdt, #size-cells, 0x0)); + _FDT(fdt_property_string(fdt, device_type, vdevice)); + _FDT(fdt_property_string(fdt, compatible, IBM,vdevice)); + _FDT(fdt_begin_node(fdt, vty@3000)); + _FDT(fdt_property_string(fdt, name, vty)); + _FDT(fdt_property_string(fdt, device_type, serial)); + _FDT(fdt_property_string(fdt, compatible, hvterm1)); + _FDT(fdt_property_cell(fdt, reg, 0x3000)); + _FDT(fdt_end_node(fdt)); + _FDT(fdt_end_node(fdt)); + /* Finalise: */ _FDT(fdt_end_node(fdt)); /* Root node */ _FDT(fdt_finish(fdt)); diff --git a/tools/kvm/powerpc/spapr_hvcons.c b/tools/kvm/powerpc/spapr_hvcons.c new file mode 100644 index 000..97902ac --- /dev/null +++ b/tools/kvm/powerpc/spapr_hvcons.c @@ -0,0 +1,101 @@ +/* + * SPAPR HV console + * + * Borrowed lightly from QEMU's spapr_vty.c, Copyright (c) 2010 David Gibson, + * IBM Corporation. + * + * Copyright (c) 2011 Matt Evans m...@ozlabs.org, IBM Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 as published + * by the Free Software Foundation. 
+ */ + +#include kvm/term.h +#include kvm/kvm.h +#include kvm/kvm-cpu.h +#include kvm/util.h +#include spapr.h +#include spapr_hvcons.h + +#include stdio.h +#include sys/uio.h +#include errno.h + +#include linux/byteorder.h + +union hv_chario { + struct { + uint64_t char0_7; + uint64_t char8_15; + } a; + uint8_t buf[16]; +}; + +static unsigned long h_put_term_char(struct kvm_cpu *vcpu, unsigned long opcode, unsigned long *args) +{ + /* To do: Read register from args[0], and check it. */ + unsigned long len = args[1]; + union hv_chario data; + struct iovec iov; + + if (len 16) { + return H_PARAMETER; + } + data.a.char0_7 = cpu_to_be64(args[2]); + data.a.char8_15 = cpu_to_be64(args[3]); + + iov.iov_base =
[PATCH 5/8] kvm tools: Add PPC64 XICS interrupt controller support
This patch adds XICS emulation code (heavily borrowed from QEMU), and wires this into kvm_cpu__irq() to fire a CPU IRQ via KVM. A device tree entry is also added. IPIs work, xics_alloc_irqnum() is added to allocate an external IRQ (which will later be used by the PHB PCI code) and finally, kvm__irq_line() can be called to raise an IRQ on XICS. Signed-off-by: Matt Evans m...@ozlabs.org --- tools/kvm/Makefile |1 + tools/kvm/powerpc/include/kvm/kvm-arch.h |1 + tools/kvm/powerpc/include/kvm/kvm-cpu-arch.h |2 + tools/kvm/powerpc/irq.c | 11 +- tools/kvm/powerpc/kvm-cpu.c | 10 + tools/kvm/powerpc/kvm.c | 25 +- tools/kvm/powerpc/xics.c | 529 ++ tools/kvm/powerpc/xics.h | 23 ++ 8 files changed, 596 insertions(+), 6 deletions(-) create mode 100644 tools/kvm/powerpc/xics.c create mode 100644 tools/kvm/powerpc/xics.h diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile index 76cce3a..6c8 100644 --- a/tools/kvm/Makefile +++ b/tools/kvm/Makefile @@ -131,6 +131,7 @@ ifeq ($(uname_M), ppc64) OBJS+= powerpc/spapr_hcall.o OBJS+= powerpc/spapr_rtas.o OBJS+= powerpc/spapr_hvcons.o + OBJS+= powerpc/xics.o ARCH_INCLUDE := powerpc/include CFLAGS += -m64 LIBS+= -lfdt diff --git a/tools/kvm/powerpc/include/kvm/kvm-arch.h b/tools/kvm/powerpc/include/kvm/kvm-arch.h index 722d01c..ae811e9 100644 --- a/tools/kvm/powerpc/include/kvm/kvm-arch.h +++ b/tools/kvm/powerpc/include/kvm/kvm-arch.h @@ -65,6 +65,7 @@ struct kvm { unsigned long initrd_gra; unsigned long initrd_size; const char *name; + struct icp_state*icp; }; #endif /* KVM__KVM_ARCH_H */ diff --git a/tools/kvm/powerpc/include/kvm/kvm-cpu-arch.h b/tools/kvm/powerpc/include/kvm/kvm-cpu-arch.h index dbabc57..551307e 100644 --- a/tools/kvm/powerpc/include/kvm/kvm-cpu-arch.h +++ b/tools/kvm/powerpc/include/kvm/kvm-cpu-arch.h @@ -17,6 +17,8 @@ #include pthread.h +#define POWER7_EXT_IRQ 0 + struct kvm; struct kvm_cpu { diff --git a/tools/kvm/powerpc/irq.c b/tools/kvm/powerpc/irq.c index 46aa64f..80c972a 100644 --- a/tools/kvm/powerpc/irq.c +++ 
b/tools/kvm/powerpc/irq.c @@ -21,6 +21,10 @@ #include stddef.h #include stdlib.h +#include xics.h + +#define XICS_IRQS 1024 + int irq__register_device(u32 dev, u8 *num, u8 *pin, u8 *line) { fprintf(stderr, irq__register_device(%d, [%d], [%d], [%d]\n, @@ -30,7 +34,12 @@ int irq__register_device(u32 dev, u8 *num, u8 *pin, u8 *line) void irq__init(struct kvm *kvm) { - fprintf(stderr, __func__); + /* kvm-nr_cpus is now valid; for /now/, pass +* this to xics_system_init(), which assumes servers +* are numbered 0..nrcpus. This may not really be true, +* but it is OK currently. +*/ + kvm-icp = xics_system_init(XICS_IRQS, kvm-nrcpus); } int irq__add_msix_route(struct kvm *kvm, struct msi_msg *msg) diff --git a/tools/kvm/powerpc/kvm-cpu.c b/tools/kvm/powerpc/kvm-cpu.c index 71c648e..63cd106 100644 --- a/tools/kvm/powerpc/kvm-cpu.c +++ b/tools/kvm/powerpc/kvm-cpu.c @@ -15,6 +15,7 @@ #include kvm/kvm.h #include spapr.h +#include xics.h #include sys/ioctl.h #include sys/mman.h @@ -107,6 +108,9 @@ struct kvm_cpu *kvm_cpu__init(struct kvm *kvm, unsigned long cpu_id) */ vcpu-is_running = true; + /* Register with IRQ controller */ + xics_cpu_register(vcpu); + return vcpu; } @@ -151,6 +155,12 @@ void kvm_cpu__reset_vcpu(struct kvm_cpu *vcpu) /* kvm_cpu__irq - set KVM's IRQ flag on this vcpu */ void kvm_cpu__irq(struct kvm_cpu *vcpu, int pin, int level) { + unsigned int virq = level ? 
KVM_INTERRUPT_SET_LEVEL : KVM_INTERRUPT_UNSET; + + if (pin != POWER7_EXT_IRQ) + return; + if (ioctl(vcpu-vcpu_fd, KVM_INTERRUPT, virq) 0) + pr_warning(Could not KVM_INTERRUPT.); } bool kvm_cpu__handle_exit(struct kvm_cpu *vcpu) diff --git a/tools/kvm/powerpc/kvm.c b/tools/kvm/powerpc/kvm.c index 8614538..bfd7c3a 100644 --- a/tools/kvm/powerpc/kvm.c +++ b/tools/kvm/powerpc/kvm.c @@ -41,9 +41,13 @@ #define HUGETLBFS_PATH /var/lib/hugetlbfs/global/pagesize-16MB/ +#define PHANDLE_XICP 0x + static char kern_cmdline[2048]; struct kvm_ext kvm_req_ext[] = { + { DEFINE_KVM_EXT(KVM_CAP_PPC_UNSET_IRQ) }, + { DEFINE_KVM_EXT(KVM_CAP_PPC_IRQ_LEVEL) }, { 0, 0 } }; @@ -164,11 +168,6 @@ void kvm__arch_init(struct kvm *kvm, const char *kvm_dev, const char *hugetlbfs_ spapr_hvcons_init(); } -void kvm__irq_line(struct kvm *kvm, int irq, int level) -{ - fprintf(stderr, irq_line(%d, %d)\n, irq, level); -} - void kvm__irq_trigger(struct kvm *kvm, int irq) { kvm__irq_line(kvm, irq, 1); @@
[PATCH 6/8] kvm tools: Add PPC64 PCI Host Bridge
This provides the PCI bridge, definitions for the address layout of the windows and wires in IRQs. Once PCI devices are all registered, they are enumerated and DT nodes generated for each. Signed-off-by: Matt Evans m...@ozlabs.org --- tools/kvm/powerpc/include/kvm/kvm-arch.h |3 + tools/kvm/powerpc/irq.c | 17 +- tools/kvm/powerpc/kvm.c | 11 + tools/kvm/powerpc/spapr.h|8 + tools/kvm/powerpc/spapr_pci.c| 429 ++ tools/kvm/powerpc/spapr_pci.h| 38 +++ 6 files changed, 504 insertions(+), 2 deletions(-) create mode 100644 tools/kvm/powerpc/spapr_pci.c create mode 100644 tools/kvm/powerpc/spapr_pci.h diff --git a/tools/kvm/powerpc/include/kvm/kvm-arch.h b/tools/kvm/powerpc/include/kvm/kvm-arch.h index ae811e9..ba374f5 100644 --- a/tools/kvm/powerpc/include/kvm/kvm-arch.h +++ b/tools/kvm/powerpc/include/kvm/kvm-arch.h @@ -40,6 +40,8 @@ */ #define KVM_PCI_MMIO_AREA 0x100 +struct spapr_phb; + struct kvm { int sys_fd; /* For system ioctls(), i.e. /dev/kvm */ int vm_fd; /* For VM ioctls() */ @@ -66,6 +68,7 @@ struct kvm { unsigned long initrd_size; const char *name; struct icp_state*icp; + struct spapr_phb*phb; }; #endif /* KVM__KVM_ARCH_H */ diff --git a/tools/kvm/powerpc/irq.c b/tools/kvm/powerpc/irq.c index 80c972a..134db8f 100644 --- a/tools/kvm/powerpc/irq.c +++ b/tools/kvm/powerpc/irq.c @@ -21,14 +21,27 @@ #include stddef.h #include stdlib.h +#include kvm/pci.h + #include xics.h +#include spapr_pci.h #define XICS_IRQS 1024 +static int pci_devs = 0; + int irq__register_device(u32 dev, u8 *num, u8 *pin, u8 *line) { - fprintf(stderr, irq__register_device(%d, [%d], [%d], [%d]\n, - dev, *num, *pin, *line); + if (pci_devs = PCI_MAX_DEVICES) + die(Hit PCI device limit!\n); + + *num = pci_devs++; + + *pin = 1; + /* Have I said how nasty I find this? Line should be dontcare... PHB +* should determine which CPU/XICS IRQ to fire. 
+*/ + *line = xics_alloc_irqnum(); return 0; } diff --git a/tools/kvm/powerpc/kvm.c b/tools/kvm/powerpc/kvm.c index bfd7c3a..353c667 100644 --- a/tools/kvm/powerpc/kvm.c +++ b/tools/kvm/powerpc/kvm.c @@ -16,6 +16,7 @@ #include spapr.h #include spapr_hvcons.h +#include spapr_pci.h #include linux/kvm.h @@ -166,6 +167,11 @@ void kvm__arch_init(struct kvm *kvm, const char *kvm_dev, const char *hugetlbfs_ register_core_rtas(); /* Now that hypercalls are initialised, register a couple for the console: */ spapr_hvcons_init(); + spapr_create_phb(kvm, pci, SPAPR_PCI_BUID, +SPAPR_PCI_MEM_WIN_ADDR, +SPAPR_PCI_MEM_WIN_SIZE, +SPAPR_PCI_IO_WIN_ADDR, +SPAPR_PCI_IO_WIN_SIZE); } void kvm__irq_trigger(struct kvm *kvm, int irq) @@ -420,6 +426,11 @@ static void setup_fdt(struct kvm *kvm) _FDT(fdt_finish(fdt)); _FDT(fdt_open_into(fdt, fdt_dest, FDT_MAX_SIZE)); + + /* PCI */ + if (spapr_populate_pci_devices(kvm, PHANDLE_XICP, fdt_dest)) + die(Fail populating PCI device nodes); + _FDT(fdt_add_mem_rsv(fdt_dest, kvm-rtas_gra, kvm-rtas_size)); _FDT(fdt_pack(fdt_dest)); } diff --git a/tools/kvm/powerpc/spapr.h b/tools/kvm/powerpc/spapr.h index 4e5d7bd..902496d 100644 --- a/tools/kvm/powerpc/spapr.h +++ b/tools/kvm/powerpc/spapr.h @@ -305,4 +305,12 @@ target_ulong spapr_rtas_call(struct kvm_cpu *vcpu, uint32_t token, uint32_t nargs, target_ulong args, uint32_t nret, target_ulong rets); +#define SPAPR_PCI_BUID 0x8002001ULL +#define SPAPR_PCI_MEM_WIN_ADDR (KVM_MMIO_START + 0xA000) +#define SPAPR_PCI_MEM_WIN_SIZE 0x2000 +#define SPAPR_PCI_IO_WIN_ADDR (KVM_MMIO_START + 0x8000) +/* This, to me, is odd... 32MB of I/O? Some PHBs are set up like this. + * Anything ever use 64K? 
:P */ +#define SPAPR_PCI_IO_WIN_SIZE 0x200 + #endif /* !defined (__HW_SPAPR_H__) */ diff --git a/tools/kvm/powerpc/spapr_pci.c b/tools/kvm/powerpc/spapr_pci.c new file mode 100644 index 000..233c42c --- /dev/null +++ b/tools/kvm/powerpc/spapr_pci.c @@ -0,0 +1,429 @@ +/* + * SPAPR PHB emulation, RTAS interface to PCI config space, device tree nodes + * for enumerated devices. + * + * Borrowed heavily from QEMU's spapr_pci.c, + * Copyright (c) 2011 Alexey Kardashevskiy, IBM Corporation. + * Copyright (c) 2011 David Gibson, IBM Corporation. + * + * Modifications copyright 2011 Matt Evans m...@ozlabs.org, IBM Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public
[PATCH 7/8] kvm tools: Add PPC64 kvm_cpu__emulate_io()
This is the final piece of the puzzle for PPC SPAPR PCI; this function splits
MMIO accesses into the two PHB windows and directs things to MMIO/IO emulation
as appropriate.

Signed-off-by: Matt Evans m...@ozlabs.org
---
 tools/kvm/Makefile          |    1 +
 tools/kvm/powerpc/kvm-cpu.c |   34 ++
 2 files changed, 35 insertions(+), 0 deletions(-)

diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile
index 6c8..9b875dd 100644
--- a/tools/kvm/Makefile
+++ b/tools/kvm/Makefile
@@ -131,6 +131,7 @@ ifeq ($(uname_M), ppc64)
 	OBJS	+= powerpc/spapr_hcall.o
 	OBJS	+= powerpc/spapr_rtas.o
 	OBJS	+= powerpc/spapr_hvcons.o
+	OBJS	+= powerpc/spapr_pci.o
 	OBJS	+= powerpc/xics.o
 	ARCH_INCLUDE := powerpc/include
 	CFLAGS	+= -m64
diff --git a/tools/kvm/powerpc/kvm-cpu.c b/tools/kvm/powerpc/kvm-cpu.c
index 63cd106..0cf4dc8 100644
--- a/tools/kvm/powerpc/kvm-cpu.c
+++ b/tools/kvm/powerpc/kvm-cpu.c
@@ -24,6 +24,7 @@
 #include <string.h>
 #include <errno.h>
 #include <stdio.h>
+#include <assert.h>
 
 static int debug_fd;
 
@@ -177,6 +178,39 @@ bool kvm_cpu__handle_exit(struct kvm_cpu *vcpu)
 	return ret;
 }
 
+bool kvm_cpu__emulate_io(struct kvm_cpu *cpu, struct kvm_run *kvm_run)
+{
+	bool ret = false;
+	u64 phys_addr;
+
+	/* We'll never get KVM_EXIT_IO, it's x86-specific.  All IO is MM!  :P
+	 * So, look at our windows here and split addresses into I/O or MMIO.
+	 */
+	assert(kvm_run->exit_reason == KVM_EXIT_MMIO);
+
+	phys_addr = cpu->kvm_run->mmio.phys_addr;
+	if ((phys_addr >= SPAPR_PCI_IO_WIN_ADDR) &&
+	    (phys_addr < SPAPR_PCI_IO_WIN_ADDR + SPAPR_PCI_IO_WIN_SIZE)) {
+		ret = kvm__emulate_io(cpu->kvm, phys_addr - SPAPR_PCI_IO_WIN_ADDR,
+				      cpu->kvm_run->mmio.data,
+				      cpu->kvm_run->mmio.is_write ?
+				      KVM_EXIT_IO_OUT : KVM_EXIT_IO_IN,
+				      cpu->kvm_run->mmio.len, 1);
+	} else if ((phys_addr >= SPAPR_PCI_MEM_WIN_ADDR) &&
+		   (phys_addr < SPAPR_PCI_MEM_WIN_ADDR + SPAPR_PCI_MEM_WIN_SIZE)) {
+		ret = kvm__emulate_mmio(cpu->kvm,
+					cpu->kvm_run->mmio.phys_addr - SPAPR_PCI_MEM_WIN_ADDR,
+					cpu->kvm_run->mmio.data,
+					cpu->kvm_run->mmio.len,
+					cpu->kvm_run->mmio.is_write);
+	} else {
+		pr_warning("MMIO %s unknown address %lx (size %d)!\n",
+			   cpu->kvm_run->mmio.is_write ? "write to" : "read from",
+			   phys_addr, cpu->kvm_run->mmio.len);
+	}
+	return ret;
+}
+
 #define CONDSTR_BIT(m, b) (((m) & MSR_##b) ? #b : "")
 
 void kvm_cpu__show_registers(struct kvm_cpu *vcpu)
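
The window split performed by kvm_cpu__emulate_io() can be looked at in
isolation.  What follows is a minimal sketch and not the kvmtool code: the
window bases and sizes are made-up values (the real SPAPR_PCI_* constants live
in spapr.h), and every ex_* name is a hypothetical helper introduced only for
this illustration.  It classifies a guest physical address as belonging to the
I/O window, the memory window, or neither, and computes the offset into the
matching window.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed window layout for illustration only; not the values from spapr.h. */
#define EX_IO_WIN_ADDR   0x80000000ULL
#define EX_IO_WIN_SIZE   0x02000000ULL
#define EX_MEM_WIN_ADDR  0xA0000000ULL
#define EX_MEM_WIN_SIZE  0x20000000ULL

enum ex_target { EX_TARGET_NONE, EX_TARGET_IO, EX_TARGET_MMIO };

struct ex_route {
	enum ex_target target;
	uint64_t offset;	/* address relative to the start of the window */
};

/* True if addr lies in [base, base + size). */
static bool ex_in_window(uint64_t addr, uint64_t base, uint64_t size)
{
	assert(size != 0);
	/* Subtract rather than add, so base + size cannot overflow. */
	return addr >= base && addr - base < size;
}

/* Decide which emulation path an MMIO exit at phys_addr belongs to. */
static struct ex_route ex_classify(uint64_t phys_addr)
{
	struct ex_route r = { EX_TARGET_NONE, 0 };

	if (ex_in_window(phys_addr, EX_IO_WIN_ADDR, EX_IO_WIN_SIZE)) {
		r.target = EX_TARGET_IO;
		r.offset = phys_addr - EX_IO_WIN_ADDR;
	} else if (ex_in_window(phys_addr, EX_MEM_WIN_ADDR, EX_MEM_WIN_SIZE)) {
		r.target = EX_TARGET_MMIO;
		r.offset = phys_addr - EX_MEM_WIN_ADDR;
	}
	return r;
}
```

A caller would hand r.offset to the port I/O or MMIO emulation path and log a
warning for EX_TARGET_NONE, which is the shape of the three branches in the
patch.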
[PATCH 8/8] kvm tools: Make virtio-pci's ioeventfd__add_event() fall back gracefully if ioeventfds unavailable
PPC KVM doesn't yet support ioeventfds, so don't bomb out/die.  virtio-pci is
able to function if it instead uses normal IO port notification.

Signed-off-by: Matt Evans m...@ozlabs.org
---
 tools/kvm/include/kvm/ioeventfd.h |    3 ++-
 tools/kvm/ioeventfd.c             |   12 +---
 tools/kvm/virtio/pci.c            |   11 ++-
 3 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/tools/kvm/include/kvm/ioeventfd.h b/tools/kvm/include/kvm/ioeventfd.h
index df01750..5e458be 100644
--- a/tools/kvm/include/kvm/ioeventfd.h
+++ b/tools/kvm/include/kvm/ioeventfd.h
@@ -4,6 +4,7 @@
 #include <linux/types.h>
 #include <linux/list.h>
 #include <sys/eventfd.h>
+#include <stdbool.h>
 
 struct kvm;
 
@@ -21,7 +22,7 @@ struct ioevent {
 
 void ioeventfd__init(void);
 void ioeventfd__start(void);
-void ioeventfd__add_event(struct ioevent *ioevent);
+bool ioeventfd__add_event(struct ioevent *ioevent);
 void ioeventfd__del_event(u64 addr, u64 datamatch);
 
 #endif
diff --git a/tools/kvm/ioeventfd.c b/tools/kvm/ioeventfd.c
index 3a240e4..37f9a63 100644
--- a/tools/kvm/ioeventfd.c
+++ b/tools/kvm/ioeventfd.c
@@ -26,7 +26,7 @@ void ioeventfd__init(void)
 		die("Failed creating epoll fd");
 }
 
-void ioeventfd__add_event(struct ioevent *ioevent)
+bool ioeventfd__add_event(struct ioevent *ioevent)
 {
 	struct kvm_ioeventfd kvm_ioevent;
 	struct epoll_event epoll_event;
@@ -48,8 +48,13 @@ void ioeventfd__add_event(struct ioevent *ioevent)
 		.flags		= KVM_IOEVENTFD_FLAG_PIO | KVM_IOEVENTFD_FLAG_DATAMATCH,
 	};
 
-	if (ioctl(ioevent->fn_kvm->vm_fd, KVM_IOEVENTFD, &kvm_ioevent) != 0)
-		die("Failed creating new ioeventfd");
+	if (ioctl(ioevent->fn_kvm->vm_fd, KVM_IOEVENTFD, &kvm_ioevent) != 0) {
+		/* Not all KVM implementations may support KVM_IOEVENTFD,
+		 * so be graceful.
+		 */
+		free(new_ioevent);
+		return false;
+	}
 
 	epoll_event = (struct epoll_event) {
 		.events		= EPOLLIN,
@@ -60,6 +65,7 @@ void ioeventfd__add_event(struct ioevent *ioevent)
 		die("Failed assigning new event to the epoll fd");
 
 	list_add_tail(&new_ioevent->list, &used_ioevents);
+	return true;
 }
 
 void ioeventfd__del_event(u64 addr, u64 datamatch)
diff --git a/tools/kvm/virtio/pci.c b/tools/kvm/virtio/pci.c
index ffa3768..06d3b79 100644
--- a/tools/kvm/virtio/pci.c
+++ b/tools/kvm/virtio/pci.c
@@ -50,7 +50,16 @@ static int virtio_pci__init_ioeventfd(struct kvm *kvm, struct virtio_trans *vtra
 		.fd		= eventfd(0, 0),
 	};
 
-	ioeventfd__add_event(&ioevent);
+	if (!ioeventfd__add_event(&ioevent)) {
+#ifndef CONFIG_PPC
+		/* PPC64 doesn't have kvm ioevents yet, so we expect this to
+		 * fail -- don't need to be verbose about it!  For virtio-pci,
+		 * this is fine.  It catches the IO accesses anyway, so
+		 * still works (but slower).
+		 */
+		pr_warning("Failed creating new ioeventfd");
+#endif
+	}
 
 	if (vtrans->virtio_ops->notify_vq_eventfd)
 		vtrans->virtio_ops->notify_vq_eventfd(kvm, vpci->dev, vq, ioevent.fd);
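
The pattern this patch introduces, reporting a failed registration to the
caller rather than dying so the caller can stay on a slower path, can be
sketched without any KVM dependency.  This is an illustration under stated
assumptions, not the kvmtool implementation: the ex_* types and functions are
hypothetical, and the registration hook is a function pointer standing in for
the KVM_IOEVENTFD ioctl so the sketch can run anywhere.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the event being registered. */
struct ex_event {
	int fd;
};

/* Hypothetical registration hook: returns 0 on success, nonzero on failure
 * (standing in for an ioctl that the host KVM may not support).
 */
typedef int (*ex_register_fn)(const struct ex_event *ev);

enum ex_notify_mode {
	EX_NOTIFY_EVENTFD,	/* fast path: the kernel signals the eventfd */
	EX_NOTIFY_TRAPPED_IO,	/* fallback: each access exits to userspace */
};

/* Report failure to the caller instead of terminating the process. */
static bool ex_add_event(const struct ex_event *ev, ex_register_fn reg)
{
	assert(ev != NULL && reg != NULL);
	return reg(ev) == 0;
}

/* Caller side: choose the notification mode from what registration reported. */
static enum ex_notify_mode ex_init_notify(const struct ex_event *ev,
					  ex_register_fn reg)
{
	if (!ex_add_event(ev, reg))
		return EX_NOTIFY_TRAPPED_IO;
	return EX_NOTIFY_EVENTFD;
}

/* Two fake backends for trying the sketch out. */
static int ex_reg_supported(const struct ex_event *ev)
{
	(void)ev;
	return 0;
}

static int ex_reg_unsupported(const struct ex_event *ev)
{
	(void)ev;
	return -1;
}
```

Returning bool keeps the policy decision (warn, stay quiet, or abort) with the
caller, which is why the patch can silence the warning on PPC only.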
Re: [PATCH] virtio-ring: Use threshold for switching to indirect descriptors
On Mon, 05 Dec 2011 11:52:54 +0200, Avi Kivity a...@redhat.com wrote: On 12/05/2011 02:10 AM, Rusty Russell wrote: On Sun, 04 Dec 2011 17:16:59 +0200, Avi Kivity a...@redhat.com wrote: On 12/04/2011 05:11 PM, Michael S. Tsirkin wrote: There's also the used ring, but that's a mistake if you have out of order completion. We should have used copying. Seems unrelated... unless you want used to be written into descriptor ring itself? The avail/used rings are in addition to the regular ring, no? If you copy descriptors, then it goes away. There were two ideas which drove the current design: 1) The Van-Jacobson style no two writers to same cacheline makes rings fast idea. Empirically, this doesn't show any winnage. Write/write is the same as write/read or read/write. Both cases have to send a probe and wait for the result. What we really need is to minimize cache line ping ponging, and the descriptor pool fails that with ooo completion. I doubt it's measurable though except with the very fastest storage providers. The claim was that going exclusive-shared-exclusive was cheaper than exclusive-invalid-exclusive. When VJ said it, it seemed convincing :) 2) Allowing a generic inter-guest copy mechanism, so we could have genuinely untrusted driver domains. Yet noone ever did this so it's hardly a killer feature :( It's still a goal, though not an important one. But we have to translate rings anyway, don't, since buffers are in guest physical addresses, and we're moving into an address space that doesn't map those. Yes, but the hypervisor/trusted party would simply have to do the copy; the rings themselves would be shared A would say copy this to/from B's ring entry N and you know that A can't have changed B's entry. I thought of having a vhost-copy driver that could do ring translation, using a dma engine for the copy. As long as we get the length of data written from the vhost-copy driver (ie. not just the network header). 
Otherwise a malicious other guest can send short packets, and a local process can read uninitialized memory. And pre-zeroing the buffers for this corner case sucks. So if we're going to revisit and drop those requirements, I'd say: 1) Shared device/driver rings like Xen. Xen uses device-specific ring contents, I'd be tempted to stick to our pre-headers, and a 'u64 addr; u64 len_and_flags; u64 cookie;' generic style. Then use the same ring for responses. That's a slight space-win, since we're 24 bytes vs 26 bytes now. Let's cheat and have inline contents. Take three bits from len_and_flags to specify additional descriptors as inline data. Nice, I like this optimization. Also, stuff the cookie into len_and_flags as well. Every driver really wants to put a pointer in there. We have an array to map desc. numbers to cookies inside the virtio core. We really want 64 bits. 2) Stick with physically-contiguous rings, but use them of size (2^n)-1. Makes the indexing harder, but that -1 lets us stash the indices in the first entry and makes the ring a nice 2^n size. Allocate at lease a cache line for those. The 2^n size is not really material, a division is never necessary. We free-run our indices, so we *do* a division (by truncation). If we limit indices to ringsize, then we have to handle empty/full confusion. It's nice for simple OSes if things pack nicely into pages, but it's not a killer feature IMHO. 16kB worth of descriptors is 1024 entries. With 4kB buffers, that's 4MB worth of data, or 4 ms at 10GbE line speed. With 1500 byte buffers it's just 1.5 ms. In any case I think it's sufficient. Right. So I think that without indirect, we waste about 3 entries per packet for virtio header and transport etc headers. That does suck. Are there issues in increasing the ring size? Or making it discontiguous? Because the qemu implementation is broken. I was talking about something else, but this is more important. 
Every time we make a simplifying assumption, it turns around and bites us, and the code becomes twice as complicated as it would have been in the first place, and the test matrix explodes. True, though we seem to be improving. But this is why I don't want optional features in the spec; I want us always to exercise all of it. We can often put the virtio header at the head of the packet. In practice, the qemu implementation insists the header be a single descriptor. (At least, it used to, perhaps it has now been fixed. We need a VIRTIO_NET_F_I_NOW_CONFORM_TO_THE_DAMN_SPEC_SORRY_I_SUCK bit). We'll run out of bits in no time. We had one already: VIRTIO_F_BAD_FEATURE. We haven't used it in a long time (if ever), and I just removed it from the latest version of the spec. But we can cheat: we can add this as a requirement to The New Ring Layout. And document
Re: [PATCH 00/28] kvm tools: Prepare kvmtool for another architecture
On 06/12/11 14:35, Matt Evans wrote:
> This patch series rearranges and tidies various parts of kvmtool to pave the
> way for the addition of support for another architecture -- SPAPR PPC64.  A
> second patch series will follow to present the PPC64 support.

I forgot to mention, of course, that these two sets apply on top of
git://github.com/penberg/linux-kvm.git master as of d5e6b9fa.

Also, I have been testing PPC64 kvmtool using the book3s_hv KVM mode.


Matt