Re: [PATCH v2] Add Mergeable RX buffer feature to vhost_net
On Thu, Apr 01, 2010 at 11:22:37AM -0700, David Stevens wrote:
> kvm-ow...@vger.kernel.org wrote on 04/01/2010 03:54:15 AM:
> > On Wed, Mar 31, 2010 at 03:04:43PM -0700, David Stevens wrote:
> > > +	head.iov_base = (void *)vhost_get_vq_desc(&net->dev, vq,
> > > +		vq->iov, ARRAY_SIZE(vq->iov), &out, &in, NULL, NULL);
> >
> > I find this casting confusing. Is it really expensive to add an array of heads so that we do not need to cast?
>
> It needs the heads and the lengths, which looks a lot like an iovec. I was trying to resist adding a new struct XXX { unsigned head; unsigned len; } just for this, but I could make these parallel arrays, one with the head index and the other with the length.
>
> Michael, on this one: if I add vq->heads as an argument to vhost_get_heads (aka vhost_get_desc_n), I'd need the length too. Would you rather this 1) remain an iovec (and a single arg added) but with the cast still there, 2) two arrays (head and length) and two args added, or 3) a new struct type of {unsigned, int} to carry the heads+len instead of an iovec? My preference would be 1). I agree the casts are ugly, but it is essentially an iovec the way we use it; it's just that the base isn't a pointer but a descriptor index instead.

I prefer 2 or 3. If you prefer 1 strongly, I think we should add a detailed comment near the iovec, and a couple of inline wrappers to store/get data in the iovec.

> > > EAGAIN is not possible after the change, because we don't even enter the loop unless we have an skb on the read queue; the other cases bomb out, so I figured the comment for future work is now done. :-)
> >
> > Guest could be buggy, so we'll get EFAULT. If the skb is taken off the rx queue (as below), we might get EAGAIN.
>
> We break on any error. If we get EAGAIN because someone read on the socket, this code would break the loop, but EAGAIN is a more serious problem if it changed since we peeked (because it means someone else is reading the socket).
> But I don't understand -- are you suggesting that the error handling be different than that, or that the comment is still relevant? My intention here is to do the TODO from the comment so that it can be removed, by handling all error cases. I think because of the peek, EAGAIN isn't something to be ignored anymore, but the effect is the same whether we break out of the loop or not, since we retry the packet next time around. Essentially, we ignore every error, since we will redo it with the same packet the next time around. Maybe we should print something here, but since we'll be retrying the packet that's still on the socket, a permanent error would spew continuously. Maybe we should shut down entirely if we get any negative return value here (including EAGAIN, since that tells us someone messed with the socket when we don't want them to).
>
> If you want the comment still there, OK, but I do think EAGAIN isn't a special case per the comment anymore, and is handled as all other errors are: by exiting the loop and retrying next time.
>
> +-DLS

Yes, I just think some comment should stay, because, as you say, otherwise we simply retry continuously. Maybe we should trigger vq_err. It needs to be given some thought, which I have not given it yet. Thinking aloud: EAGAIN means someone is reading the socket together with us. I prefer that this condition be made a fatal error; we should make sure we are polling the socket so we see packets if more appear.

-- MST
--
To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] vhost: Make it more scalable by creating a vhost thread per device.
On Fri, Apr 02, 2010 at 10:31:20AM -0700, Sridhar Samudrala wrote:
> Make vhost scalable by creating a separate vhost thread per vhost device. This provides better scaling across multiple guests and with multiple interfaces in a guest.

Thanks for looking into this. An alternative approach is to simply replace create_singlethread_workqueue with create_workqueue, which would get us a thread per host CPU. It seems that in theory this should be the optimal approach wrt CPU locality; however, in practice a single thread seems to get better numbers. I have a TODO to investigate this. Could you try looking into this?

> I am seeing better aggregate throughput/latency when running netperf across multiple guests or multiple interfaces in a guest in parallel with this patch.

Any numbers? What happens to CPU utilization?

> Signed-off-by: Sridhar Samudrala <s...@us.ibm.com>
>
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index a6a88df..29aa80f 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -339,8 +339,10 @@ static int vhost_net_open(struct inode *inode, struct file *f)
>  		return r;
>  	}
>
> -	vhost_poll_init(n->poll + VHOST_NET_VQ_TX, handle_tx_net, POLLOUT);
> -	vhost_poll_init(n->poll + VHOST_NET_VQ_RX, handle_rx_net, POLLIN);
> +	vhost_poll_init(n->poll + VHOST_NET_VQ_TX, handle_tx_net, POLLOUT,
> +			&n->dev);
> +	vhost_poll_init(n->poll + VHOST_NET_VQ_RX, handle_rx_net, POLLIN,
> +			&n->dev);
>  	n->tx_poll_state = VHOST_NET_POLL_DISABLED;
>
>  	f->private_data = n;
> @@ -643,25 +645,14 @@ static struct miscdevice vhost_net_misc = {
>
>  int vhost_net_init(void)
>  {
> -	int r = vhost_init();
> -	if (r)
> -		goto err_init;
> -	r = misc_register(&vhost_net_misc);
> -	if (r)
> -		goto err_reg;
> -	return 0;
> -err_reg:
> -	vhost_cleanup();
> -err_init:
> -	return r;
> -
> +	return misc_register(&vhost_net_misc);
>  }
> +
>  module_init(vhost_net_init);
>
>  void vhost_net_exit(void)
>  {
>  	misc_deregister(&vhost_net_misc);
> -	vhost_cleanup();
>  }
>  module_exit(vhost_net_exit);
>
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 7bd7a1e..243f4d3 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -36,8 +36,6 @@ enum {
>  	VHOST_MEMORY_F_LOG = 0x1,
>  };
>
> -static struct workqueue_struct *vhost_workqueue;
> -
>  static void vhost_poll_func(struct file *file, wait_queue_head_t *wqh,
>  			    poll_table *pt)
>  {
> @@ -56,18 +54,19 @@ static int vhost_poll_wakeup(wait_queue_t *wait, unsigned mode, int sync,
>  	if (!((unsigned long)key & poll->mask))
>  		return 0;
>
> -	queue_work(vhost_workqueue, &poll->work);
> +	queue_work(poll->dev->wq, &poll->work);
>  	return 0;
>  }
>
>  /* Init poll structure */
>  void vhost_poll_init(struct vhost_poll *poll, work_func_t func,
> -		     unsigned long mask)
> +		     unsigned long mask, struct vhost_dev *dev)
>  {
>  	INIT_WORK(&poll->work, func);
>  	init_waitqueue_func_entry(&poll->wait, vhost_poll_wakeup);
>  	init_poll_funcptr(&poll->table, vhost_poll_func);
>  	poll->mask = mask;
> +	poll->dev = dev;
>  }
>
>  /* Start polling a file. We add ourselves to file's wait queue. The caller must
> @@ -96,7 +95,7 @@ void vhost_poll_flush(struct vhost_poll *poll)
>
>  void vhost_poll_queue(struct vhost_poll *poll)
>  {
> -	queue_work(vhost_workqueue, &poll->work);
> +	queue_work(poll->dev->wq, &poll->work);
>  }
>
>  static void vhost_vq_reset(struct vhost_dev *dev,
> @@ -128,6 +127,11 @@ long vhost_dev_init(struct vhost_dev *dev,
>  		    struct vhost_virtqueue *vqs, int nvqs)
>  {
>  	int i;
> +
> +	dev->wq = create_singlethread_workqueue("vhost");
> +	if (!dev->wq)
> +		return -ENOMEM;
> +
>  	dev->vqs = vqs;
>  	dev->nvqs = nvqs;
>  	mutex_init(&dev->mutex);
> @@ -143,7 +147,7 @@ long vhost_dev_init(struct vhost_dev *dev,
>  		if (dev->vqs[i].handle_kick)
>  			vhost_poll_init(&dev->vqs[i].poll,
>  					dev->vqs[i].handle_kick,
> -					POLLIN);
> +					POLLIN, dev);
>  	}
>  	return 0;
>  }
> @@ -216,6 +220,8 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
>  	if (dev->mm)
>  		mmput(dev->mm);
>  	dev->mm = NULL;
> +
> +	destroy_workqueue(dev->wq);
>  }
>
>  static int log_access_ok(void __user *log_base, u64 addr, unsigned long sz)
> @@ -1095,16 +1101,3 @@ void vhost_disable_notify(struct vhost_virtqueue *vq)
>  		vq_err(vq, "Failed to enable notification at %p: %d\n",
>  		       &vq->used->flags, r);
>  }
> -
> -int vhost_init(void)
> -{
> -	vhost_workqueue = create_singlethread_workqueue("vhost");
> -	if (!vhost_workqueue)
> -		return -ENOMEM;
> -	return 0;
> -}
> -
> -void vhost_cleanup(void)
> -{
> -	destroy_workqueue(vhost_workqueue);
> -}
> diff --git
Re: [PATCHv6 0/4] qemu-kvm: vhost net port
On Wed, Mar 24, 2010 at 02:38:57PM +0200, Avi Kivity wrote:
> On 03/17/2010 03:04 PM, Michael S. Tsirkin wrote:
> > This is a port of the vhost v6 patch set I posted previously to qemu-kvm, for those that want to get good performance out of it :) This patchset needs to be applied when the qemu.git one gets merged; this includes irqchip support.
> Ping me when this happens please.

Ping

--
error compiling committee.c: too many arguments to function
High CPU use of -usbdevice tablet (was Re: KVM usability)
Avi Kivity <a...@redhat.com> writes:
> On 03/02/2010 11:34 AM, Jernej Simončič wrote:
> > On Tuesday, March 2, 2010, 9:21:18, Chris Webb wrote:
> > > I remember about a year ago, someone asserting on the list that -usbdevice tablet was very CPU intensive even when not in use, and should be avoided if mouse support wasn't needed, e.g. on non-graphical VMs. Was that actually a significant hit, and is it still true today?
> > It would appear that this is still the case, at least on slower hosts - on an Atom Z530 (1.6GHz), the XP VM uses ~30% CPU when idle with -usbdevice tablet, but only ~4% without it. However, on a faster host (Core2 Quad 2.66GHz), there's practically no difference (a Vista x64 VM uses ~1% CPU when idle regardless of -usbdevice tablet).
> Looks like the tablet is set to a 100 Hz polling rate. We may be able to get away with 30 Hz or even less (ep_bInterval, in ms, in hw/usb-wacom.c).

Hi Avi. Sorry for the very late follow-up, but I decided to experiment with this.

The CPU impact of the USB tablet device shows up fairly clearly in a crude test on my (relatively low-spec) desktop. Running an idle Fedora 11 livecd on qemu-kvm 0.12.3, top shows around 0.1% of my CPU in use, but this increases to roughly 5% when specifying -usbdevice tablet, and more detailed examination with perf record/report suggests about a factor of thirty, too.

It's actually a more general symptom with USB, or at least HID devices, by the look of things: although -usb doesn't increase CPU use on its own, the same increase in load can also be triggered by -usbdevice keyboard or mouse. However, running with all three of -usbdevice mouse, keyboard and tablet doesn't increase load any more than just one of these.

Changing the USB tablet polling interval from 10ms to 100ms in both hw/usb-wacom.c and hw/usb-hid.c made no difference except an increase in the bInterval shown by lsusb -v in the guest, and the hint of jerky mouse movement I expected from setting this value so high. A similar change to the polling interval for the keyboard and mouse also made no difference to their performance impact.

Taking FRAME_TIMER_FREQ down to 100 in hw/usb-uhci.c does seem to reduce the CPU load quite a bit, but at the expense of making the USB tablet (and presumably all other USB devices) very laggy.

Could there be some bug here that causes the USB HID devices to wake qemu at the maximum rate possible (FRAME_TIMER_FREQ?) rather than at the configured polling interval?

Best wishes,

Chris.

PS Vmmouse works fine as an absolute pointing device in place of -usbdevice tablet without the performance impact, but this isn't supported out of the box with typical Linux live CDs (e.g. Fedora 11 and 12, or Knoppix), so unfortunately it's probably less suitable as a default configuration to expose to end-users.
Re: Networkconfiguration with KVM
On Sun, Apr 4, 2010 at 5:47 PM, Dan Johansson <k...@dmj.nu> wrote:
> Hi, I am new to this list and to KVM (and qemu), so please be gentle with me. Up until now I have been running my virtualization using VMWare-Server. Now I want to try KVM due to some issues with the VMWare-Server, and I am having some trouble with the networking part of KVM. This is a small example of what I want (best viewed in a fixed font):
>
> (ASCII diagram: Host with eth0 and eth2 on 192.168.1.0/24, eth1 on 192.168.2.0/24, eth3 on 192.168.3.0/24; VM1 and VM2 each with eth0, eth1 and eth2 attached as described below)
>
> Host-eth0 is only for the Host (no VM)
> Host-eth1 is shared between the Host and the VMs (VM?-eth1)
> Host-eth2 and Host-eth3 are only for the VMs (eth0 and eth2)
>
> The Host and the VMs all have fixed IPs (no dhcp or likewise). In this example the IPs could be:
> Host-eth0: 192.168.1.1
> Host-eth1: 192.168.2.1
> Host-eth2: -
> Host-eth3: -
> VM1-eth0: 192.168.1.11
> VM1-eth1: 192.168.2.11
> VM1-eth2: 192.168.3.11
> VM2-eth0: 192.168.1.22
> VM2-eth1: 192.168.2.22
> VM2-eth2: 192.168.3.22
>
> And, yes, Host-eth0 and Host-eth2 are in the same subnet, with eth0 dedicated to the Host and eth2 dedicated to the VMs. In VMWare this was quite easy to set up (three bridged networks).

It's easy with KVM too. You want 3 NICs per VM, so you need to pass the corresponding parameters (including the qemu-ifup script) for 3 NICs to each VM. In the host you need to create 2 bridges, say br-eth1 and br-eth2, and make them the interfaces on the host in place of the corresponding eth interfaces (brctl addbr br-eth1; ifconfig eth1 0.0.0.0 up; brctl addif br-eth1 eth1; assign eth1's IP and routes to br-eth1; same for eth2).
In the corresponding qemu-ifup script of each interface, use bridge=br-ethN (this basically translates to brctl addif br-ethN $1, where $1 is the tap device created). This should work perfectly fine with your existing NW setup. For a quick reference use: http://www.linux-kvm.org/page/Networking

> Does someone know how I can set this up with KVM/QEMU?
>
> Regards,
> --
> Dan Johansson, http://www.dmj.nu
> *** This message is printed on 100% recycled electrons! ***

--
Regards
Sudhir Kumar
Re: [Qemu-devel] High CPU use of -usbdevice tablet (was Re: KVM usability)
> > Looks like the tablet is set to a 100 Hz polling rate. We may be able to get away with 30 Hz or even less (ep_bInterval, in ms, in hw/usb-wacom.c).
>
> Changing the USB tablet polling interval from 10ms to 100ms in both hw/usb-wacom.c and hw/usb-hid.c made no difference except an increase in the bInterval shown by lsusb -v in the guest, and the hint of jerky mouse movement I expected from setting this value so high. A similar change to the polling interval for the keyboard and mouse also made no difference to their performance impact.

The USB HID devices implement the SET_IDLE command, so the polling interval will have no real effect on performance. My guess is that the overhead you're seeing is entirely from the USB host adapter having to wake up and check the transport descriptor lists. This will only result in the guest being woken if a device actually responds (as mentioned above, it should not).

> Taking FRAME_TIMER_FREQ down to 100 in hw/usb-uhci.c does seem to reduce the CPU load quite a bit, but at the expense of making the USB tablet (and presumably all other USB devices) very laggy.

The guest USB driver explicitly decides which devices to poll each frame. Slowing down the frame rate will effectively change the polling period by the same factor: e.g. the HID device requests a polling rate of 10ms; you slowed down the frame rate by 10x, so you're effectively only polling every 100ms.

If you want a quick and nasty hack then you can probably make the device wake up less often and process multiple frames every wakeup. However, this is probably going to do bad things (at best, extremely poor performance) when using actual USB devices.

Fixing this properly is hard, because the transport descriptor lists are stored in system RAM and polled by the host adapter. The first step is to read the whole table of descriptors and calculate when the next event is due. However, the guest will not explicitly notify the HBA when these tables are modified, so you also need some sort of MMU trap to trigger recalculation. This only gets you down to the base polling interval requested by the device; increasing that interval causes significant user-visible latency, so it is not an option. The guest is also likely to distribute polling events evenly, further reducing the effective sleep interval. To fix this you need additional APIs so that a device can report when an endpoint will become unblocked, rather than just waiting to be polled and NAKing the request.

Paul
[ kvm-Bugs-2933400 ] virtio-blk io errors / data corruption on raw drives > 1 TB
Bugs item #2933400, was opened at 2010-01-16 15:35
Message generated for change (Comment added) made by masc82
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2933400&group_id=180599

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: None
Status: Open
Resolution: None
Priority: 9
Private: No
Submitted By: MaSc82 (masc82)
Assigned to: Nobody/Anonymous (nobody)
Summary: virtio-blk io errors / data corruption on raw drives > 1 TB

Initial Comment:
When attaching raw drives > 1 TB, buffer I/O errors will most likely occur, and filesystems get corrupted. Processes (like mkfs.ext4) might freeze completely when filesystems are created on the guest. Here's a typical log excerpt when using mkfs.ext4 on a 1.5 TB drive on an Ubuntu 9.10 guest:

(kern.log)
Jan 15 20:40:44 q kernel: [ 677.076602] Buffer I/O error on device vde, logical block 366283764
Jan 15 20:40:44 q kernel: [ 677.076607] Buffer I/O error on device vde, logical block 366283765
Jan 15 20:40:44 q kernel: [ 677.076611] Buffer I/O error on device vde, logical block 366283766
Jan 15 20:40:44 q kernel: [ 677.076616] Buffer I/O error on device vde, logical block 366283767
Jan 15 20:40:44 q kernel: [ 677.076621] Buffer I/O error on device vde, logical block 366283768
Jan 15 20:40:44 q kernel: [ 677.076626] Buffer I/O error on device vde, logical block 366283769

(messages)
Jan 15 20:40:44 q kernel: [ 677.076534] lost page write due to I/O error on vde
Jan 15 20:40:44 q kernel: [ 677.076541] lost page write due to I/O error on vde
Jan 15 20:40:44 q kernel: [ 677.076546] lost page write due to I/O error on vde
Jan 15 20:40:44 q kernel: [ 677.076599] lost page write due to I/O error on vde
Jan 15 20:40:44 q kernel: [ 677.076604] lost page write due to I/O error on vde
Jan 15 20:40:44 q kernel: [ 677.076609] lost page write due to I/O error on vde
Jan 15 20:40:44 q kernel: [ 677.076613] lost page write due to I/O error on vde
Jan 15 20:40:44 q kernel: [ 677.076618] lost page write due to I/O error on vde
Jan 15 20:40:44 q kernel: [ 677.076623] lost page write due to I/O error on vde
Jan 15 20:40:44 q kernel: [ 677.076628] lost page write due to I/O error on vde
Jan 15 20:45:55 q Backgrounding to notify hosts...

(The following entries will repeat infinitely; mkfs.ext4 will not exit and cannot be killed)
Jan 15 20:49:27 q kernel: [ 1200.520096] mkfs.ext4 D 0 1839 1709 0x
Jan 15 20:49:27 q kernel: [ 1200.520101] 88004e157cb8 0082 88004e157c58 00015880
Jan 15 20:49:27 q kernel: [ 1200.520115] 88004ef6c7c0 00015880 00015880 00015880
Jan 15 20:49:27 q kernel: [ 1200.520118] 00015880 88004ef6c7c0 00015880 00015880
Jan 15 20:49:27 q kernel: [ 1200.520123] Call Trace:
Jan 15 20:49:27 q kernel: [ 1200.520157] [810da0f0] ? sync_page+0x0/0x50
Jan 15 20:49:27 q kernel: [ 1200.520178] [815255f8] io_schedule+0x28/0x40
Jan 15 20:49:27 q kernel: [ 1200.520182] [810da12d] sync_page+0x3d/0x50
Jan 15 20:49:27 q kernel: [ 1200.520185] [81525b17] __wait_on_bit+0x57/0x80
Jan 15 20:49:27 q kernel: [ 1200.520192] [810da29e] wait_on_page_bit+0x6e/0x80
Jan 15 20:49:27 q kernel: [ 1200.520205] [81078650] ? wake_bit_function+0x0/0x40
Jan 15 20:49:27 q kernel: [ 1200.520210] [810e44e0] ? pagevec_lookup_tag+0x20/0x30
Jan 15 20:49:27 q kernel: [ 1200.520213] [810da745] wait_on_page_writeback_range+0xf5/0x190
Jan 15 20:49:27 q kernel: [ 1200.520217] [810da807] filemap_fdatawait+0x27/0x30
Jan 15 20:49:27 q kernel: [ 1200.520220] [810dacb4] filemap_write_and_wait+0x44/0x50
Jan 15 20:49:27 q kernel: [ 1200.520235] [8114ba9f] __sync_blockdev+0x1f/0x40
Jan 15 20:49:27 q kernel: [ 1200.520239] [8114bace] sync_blockdev+0xe/0x10
Jan 15 20:49:27 q kernel: [ 1200.520241] [8114baea] block_fsync+0x1a/0x20
Jan 15 20:49:27 q kernel: [ 1200.520249] [81142f26] vfs_fsync+0x86/0xf0
Jan 15 20:49:27 q kernel: [ 1200.520252] [81142fc9] do_fsync+0x39/0x60
Jan 15 20:49:27 q kernel: [ 1200.520255] [8114301b] sys_fsync+0xb/0x10
Jan 15 20:49:27 q kernel: [ 1200.520271] [81011fc2] system_call_fastpath+0x16/0x1b

In my case I was switching to virtio at one point, but the behaviour didn't show until there was > 1 TB of data on the filesystem. Very dangerous. Tested using 2 different SATA controllers, 1.5 TB lvm/mdraid, a single 1.5 TB drive and 2 TB lvm/mdraid. The behaviour does not occur with if=scsi or if=ide.

#2914397 might be related:
https://sourceforge.net/tracker/?func=detail&aid=2914397&group_id=180599&atid=893831

This blog post might also relate:
http://www.neuhalfen.name/2009/08/05/OpenSolaris_KVM_and_large_IDE_drives/

CPU: Intel
Re: [PATCHv6 0/4] qemu-kvm: vhost net port
On 04/04/2010 02:46 PM, Michael S. Tsirkin wrote:
> On Wed, Mar 24, 2010 at 02:38:57PM +0200, Avi Kivity wrote:
> > On 03/17/2010 03:04 PM, Michael S. Tsirkin wrote:
> > > This is a port of the vhost v6 patch set I posted previously to qemu-kvm, for those that want to get good performance out of it :) This patchset needs to be applied when the qemu.git one gets merged; this includes irqchip support.
> > Ping me when this happens please.
> Ping

Bounce.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Re: [Qemu-devel] High CPU use of -usbdevice tablet (was Re: KVM usability)
On 04/04/2010 05:25 PM, Paul Brook wrote:
> > > Looks like the tablet is set to a 100 Hz polling rate. We may be able to get away with 30 Hz or even less (ep_bInterval, in ms, in hw/usb-wacom.c).
> > Changing the USB tablet polling interval from 10ms to 100ms in both hw/usb-wacom.c and hw/usb-hid.c made no difference except an increase in the bInterval shown by lsusb -v in the guest, and the hint of jerky mouse movement I expected from setting this value so high. A similar change to the polling interval for the keyboard and mouse also made no difference to their performance impact.
> The USB HID devices implement the SET_IDLE command, so the polling interval will have no real effect on performance.

On a Linux guest (F12), I see 125 USB interrupts per second with no mouse movement, so something is broken (on the guest or host).

> My guess is that the overhead you're seeing is entirely from the USB host adapter having to wake up and check the transport descriptor lists. This will only result in the guest being woken if a device actually responds (as mentioned above, it should not).

A quick profile on the host side doesn't show this. Instead, I see a lot of select() overhead. Surprising, as there are ~10 descriptors being polled, so ~1200 polls per second. Maybe epoll will help here.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Re: KVM Page Fault Question
(re-adding list)

On 04/02/2010 07:01 PM, Marek Olszewski wrote:
> Thanks for the fast response. I'm trying to find the code that, on a write to a guest page table entry, will iterate over all shadow page table entries that map that guest entry in order to update them. Can you point me to that code? I can't seem to find it myself :(

See kvm_mmu_pte_write().

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Re: Networkconfiguration with KVM
On Sunday 04 April 2010 15.00:26 sudhir kumar wrote:
> On Sun, Apr 4, 2010 at 5:47 PM, Dan Johansson <k...@dmj.nu> wrote:
> > [...]
> In the corresponding qemu-ifup script of each interface, use bridge=br-ethN (this basically translates to brctl addif br-ethN $1, where $1 is the tap device created). This should work perfectly fine with your existing NW setup. For a quick reference use: http://www.linux-kvm.org/page/Networking

Thanks for your help, but... I am still not able to get it to work the way I want. This is what I have done so far:

brctl addbr br-eth1
brctl addbr br-eth3
ip link set eth1 up
ip link set eth3 up
brctl addif br-eth1 eth1
brctl addif br-eth3 eth3
tunctl -b -t qtap1
tunctl -b -t qtap3
brctl addif br-eth1 qtap1
brctl addif br-eth3 qtap3
ifconfig qtap1 up 0.0.0.0 promisc
ifconfig qtap3 up 0.0.0.0 promisc

# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:0d:88:52:51:24
          inet addr:192.168.1.3  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:443638 errors:0 dropped:0 overruns:0 frame:0
          TX packets:758540 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:47041686 (44.8 MiB)  TX bytes:990115354 (944.2 MiB)
          Interrupt:19 Base address:0xec00

eth1      Link encap:Ethernet  HWaddr 00:0d:88:52:51:25
          inet addr:192.168.4.1  Bcast:192.168.4.255  Mask:255.255.255.0
          UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6 errors:0 dropped:0 overruns:0 carrier:6
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B)  TX bytes:360 (360.0 B)
          Interrupt:18 Base address:0xe880

eth3      Link encap:Ethernet  HWaddr 00:0d:88:52:51:27
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B)  TX bytes:240 (240.0 B)
          Interrupt:16 Base address:0xe480

qtap1     Link encap:Ethernet  HWaddr 26:c0:de:df:c5:e4
          UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
          RX packets:351 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:500
          RX bytes:14742 (14.3 KiB)  TX bytes:0 (0.0 B)

qtap3     Link encap:Ethernet  HWaddr 26:3e:ba:2d:97:bc
          UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
          RX packets:6 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0
Re: Networkconfiguration with KVM
Am 04.04.2010 20:02, schrieb Dan Johansson: On Sunday 04 April 2010 15.00:26 sudhir kumar wrote: On Sun, Apr 4, 2010 at 5:47 PM, Dan Johansson k...@dmj.nu wrote: Hi, I am new to this list and to KVM (and qemu) so please be gentle with me. Up until now I have been running my virtualizing using VMWare-Server. Now I want to try KVM due to some issues with the VMWare-Server and I am having some troubles with the networking part of KVM. This is a small example of what I want (best viewed in a fix-font): +---+ | Host | | +--+eth0 | 192.168.1.0/24 | | eth0|-- + | | | VM1 eth1|---(---+--- eth1 | 192.168.2.0/24 | | eth2|---(---(---+ | | +--+ | | | | | | | | | | +--+ +---(---(--- eth2 | 192.168.1.0/24 | | eth0|---+ | | | | | VM2 eth1|---+ +--- eth3 | 192.168.3.0/24 | | eth2|---+ | | +--+ | | | +---+ Host-eth0 is only for the Host (no VM) Host-eth1 is shared between the Host and the VM's (VM?-eth1) Host-eth2 and Host-eth3 are only for the VMs (eth0 and eth2) The Host and the VMs all have fixed IPs (no dhcp or likewise). In this example th IPs could be: Host-eth0: 192.168.1.1 Host-eth1: 192.168.2.1 Host-eth2: - Host-eth3: - VM1-eth0: 192.168.1.11 VM1-eth1: 192.168.2.11 VM1-eth2: 192.168.3.11 VM2-eth0: 192.168.1.22 VM2-eth1: 192.168.2.22 VM3-eth2: 192.168.3.22 And, yes, Host-eth0 and Host-eth2 are in the same subnet, with eth0 dedicated to the Host and eth2 dedicated to the VMs. In VMWare this was quite easy to setup (three bridged networks). Its easy with KVM too. You want 3 NICs per VM, so you need to pass the corresponding parameters(including qemu-ifup script) for 3 NICs to each VM. In the host you need to create 2 bridges: say br-eth1 and br-eth2. Make them as the interface on the host in place of the corresponding eth interfaces.(brct addbr br-eth1; ifcfg eth1 0.0.0.0 up; brctl addif br-eth eth1; assign eth1's ip and routes to breth1; same for eth2). 
In the corresponding qemu-ifup script of each interface use bridge=br-ethN (this basically translates to brctl addif br-ethN $1, where $1 is the tap device created). This should work perfectly fine with your existing NW setup. For a quick reference use: http://www.linux-kvm.org/page/Networking

Thanks for your help, but... I am still not able to get it to work the way I want. This is what I have done so far:

brctl addbr br-eth1
brctl addbr br-eth3
ip link set eth1 up
ip link set eth3 up
brctl addif br-eth1 eth1
brctl addif br-eth3 eth3
tunctl -b -t qtap1
tunctl -b -t qtap3
brctl addif br-eth1 qtap1
brctl addif br-eth3 qtap3
ifconfig qtap1 up 0.0.0.0 promisc
ifconfig qtap3 up 0.0.0.0 promisc

# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:0d:88:52:51:24
          inet addr:192.168.1.3  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:443638 errors:0 dropped:0 overruns:0 frame:0
          TX packets:758540 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:47041686 (44.8 MiB)  TX bytes:990115354 (944.2 MiB)
          Interrupt:19 Base address:0xec00

eth1      Link encap:Ethernet  HWaddr 00:0d:88:52:51:25
          inet addr:192.168.4.1  Bcast:192.168.4.255  Mask:255.255.255.0
          UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6 errors:0 dropped:0 overruns:0 carrier:6
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B)  TX bytes:360 (360.0 B)
          Interrupt:18 Base address:0xe880

eth3      Link encap:Ethernet  HWaddr 00:0d:88:52:51:27
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B)  TX bytes:240 (240.0 B)
          Interrupt:16 Base address:0xe480

qtap1     Link encap:Ethernet  HWaddr 26:c0:de:df:c5:e4
          UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
          RX packets:351 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:500
          RX bytes:14742 (14.3 KiB)  TX bytes:0 (0.0 B)

qtap3     Link encap:Ethernet  HWaddr 26:3e:ba:2d:97:bc
          UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
          RX packets:6 errors:0
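The qemu-ifup side of the setup above can be sketched as follows. The bridge and tap names (br-eth1, qtap1) come from the commands above; the emit_ifup helper is an assumption of this sketch, printing the two commands the real script would run so it can be dry-run without root:

```shell
#!/bin/sh
# Sketch of a per-bridge qemu-ifup script. qemu invokes the script with the
# tap device name as $1; all it has to do is bring the tap up and enslave it
# to the right bridge. emit_ifup prints the commands instead of running them.
emit_ifup() {
    bridge=$1
    tap=$2
    echo "ifconfig $tap 0.0.0.0 promisc up"   # tap needs no address, just link up
    echo "brctl addif $bridge $tap"           # enslave the tap to the bridge
}

emit_ifup br-eth1 "${1:-qtap1}"
```

In a real qemu-ifup you would drop the echoes and keep one such script per bridge (or pass the bridge name via an environment variable).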
Re: [Qemu-devel] High CPU use of -usbdevice tablet (was Re: KVM usability)
The USB HID devices implement the SET_IDLE command, so the polling interval will have no real effect on performance. On a Linux guest (F12), I see 125 USB interrupts per second with no mouse movement, so something is broken (on the guest or host). Turns out to be a bug in the UHCI emulation: it should only raise an interrupt if the transfer actually completes (i.e. the active bit is set to zero). Fixed by 5bd2c0d7. I was testing with an OHCI controller, which does not have this bug. Paul -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] High CPU use of -usbdevice tablet (was Re: KVM usability)
My guess is that the overhead you're seeing is entirely from the USB host adapter having to wake up and check the transport descriptor lists. This will only result in the guest being woken if a device actually responds (as mentioned above it should not). A quick profile on the host side doesn't show this. Instead, I see a lot of select() overhead. This actually confirms my hypothesis. After fixing the UHCI bug the guest is completely idle, but the host still needs to wake up at 1ms intervals to do UHCI emulation. I can believe that the most visible part of this is the select() syscall. Surprising as there are ~10 descriptors being polled, so ~1200 polls per second. Maybe epoll will help here. I'm not sure where you get 1200 from. select will be called once per host wakeup. i.e. if the USB controller is enabled then 1k times per second due to the frame tick. Are you sure there are actually 10 descriptors being polled? Remember that the nfds argument is the value of the largest fd in the set (+1), not the number of descriptors in the set. Paul
[PATCH] fix migration with big mem guests
Hi,

(Below is an explanation of the bug for anyone not familiar with it.)

In the beginning I tried to make this code run with qemu_bh(), but the result was a performance catastrophe. The reason is that the migration code just isn't built to run at such high granularity; for example, stuff like:

static ram_addr_t ram_save_remaining(void)
{
    ram_addr_t addr;
    ram_addr_t count = 0;

    for (addr = 0; addr < last_ram_offset; addr += TARGET_PAGE_SIZE) {
        if (cpu_physical_memory_get_dirty(addr, MIGRATION_DIRTY_FLAG))
            count++;
    }

    return count;
}

which gets called from ram_save_live(), was taking way too much time... (Just consider that I tried to read only a small amount of data each time, and ran it every time main_loop_wait() finished (from qemu_bh_poll()).)

Then I thought: OK, let's add a timer so the bh code runs only once in a while; but the migration code already has a timer set, so it makes the most sense to use it... If anyone has a better idea how to solve this issue, I will be very happy to hear it. Thanks.

From 2d9c25f1fee61f50cb130769c3779707a6ef90d9 Mon Sep 17 00:00:00 2001
From: Izik Eidus iei...@redhat.com
Date: Mon, 5 Apr 2010 02:05:09 +0300
Subject: [PATCH] qemu-kvm: fix migration with large mem

For guests with large mem that have pages whose bytes are all the same, we will spend a lot of time reading the memory from the guest (is_dup_page()). This happens because the ram_save_live() function has a limit on how much we can send to the destination, but not on how much we read from the guest; in cases with many is_dup_page() hits, we might read a huge amount of data without updating important stuff like the timers... The guest loses all its responsiveness and gets many soft lockups inside itself.

This patch adds a limit on the size we can read from the guest each iteration.

Thanks.
Signed-off-by: Izik Eidus iei...@redhat.com
---
 vl.c | 6 +-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/vl.c b/vl.c
index d959fdb..777988d 100644
--- a/vl.c
+++ b/vl.c
@@ -174,6 +174,8 @@ int main(int argc, char **argv)
 
 #define DEFAULT_RAM_SIZE 128
 
+#define MAX_SAVE_BLOCK_READ 10 * 1024 * 1024
+
 #define MAX_VIRTIO_CONSOLES 1
 
 static const char *data_dir;
@@ -2854,6 +2856,7 @@ static int ram_save_live(Monitor *mon, QEMUFile *f, int stage, void *opaque)
     uint64_t bytes_transferred_last;
     double bwidth = 0;
     uint64_t expected_time = 0;
+    int data_read = 0;
 
     if (stage < 0) {
         cpu_physical_memory_set_dirty_tracking(0);
@@ -2883,10 +2886,11 @@ static int ram_save_live(Monitor *mon, QEMUFile *f, int stage, void *opaque)
     bytes_transferred_last = bytes_transferred;
     bwidth = qemu_get_clock_ns(rt_clock);
 
-    while (!qemu_file_rate_limit(f)) {
+    while (!qemu_file_rate_limit(f) && data_read < MAX_SAVE_BLOCK_READ) {
         int ret;
 
         ret = ram_save_block(f);
+        data_read += ret * TARGET_PAGE_SIZE;
         bytes_transferred += ret * TARGET_PAGE_SIZE;
         if (ret == 0) /* no more blocks */
             break;
-- 
1.6.6.1