Re: [PATCH v2] Add Mergeable RX buffer feature to vhost_net

2010-04-04 Thread Michael S. Tsirkin
On Thu, Apr 01, 2010 at 11:22:37AM -0700, David Stevens wrote:
 kvm-ow...@vger.kernel.org wrote on 04/01/2010 03:54:15 AM:
 
  On Wed, Mar 31, 2010 at 03:04:43PM -0700, David Stevens wrote:
 
   
  +	head.iov_base = (void *)vhost_get_vq_desc(&net->dev, vq,
  +		vq->iov, ARRAY_SIZE(vq->iov), &out, &in, NULL, NULL);

I find this casting confusing.
Is it really expensive to add an array of heads so that
we do not need to cast?
   
   It needs the heads and the lengths, which looks a lot
   like an iovec. I was trying to resist adding a new
   struct XXX { unsigned head; unsigned len; } just for this,
   but I could make these parallel arrays, one with head index and
   the other with length.
 
 Michael, on this one, if I add vq->heads as an argument to
 vhost_get_heads (aka vhost_get_desc_n), I'd need the length too.
 Would you rather this 1) remain an iovec (and a single arg added) but
 cast still there, 2) 2 arrays (head and length) and 2 args added, or
 3) a new struct type of {unsigned, int} to carry the heads+len
 instead of an iovec?
 My preference would be 1). I agree the casts are ugly, but
 it is essentially an iovec the way we use it; it's just that the
 base isn't a pointer but a descriptor index instead.

I prefer 2 or 3. If you prefer 1 strongly, I think we should
add a detailed comment near the iovec, and
a couple of inline wrappers to store/get data in the iovec.
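
For illustration only (this is not code from the patch), option 3 plus the
inline wrappers could look roughly like this:

/* Hypothetical sketch: a small struct carrying a descriptor head index
 * plus the length used, with trivial inline helpers so callers never
 * open-code casts into an iovec. */
struct vhost_used_elem {
	unsigned int head;	/* index returned by vhost_get_vq_desc() */
	unsigned int len;	/* bytes written into that descriptor chain */
};

static inline void vhost_used_set(struct vhost_used_elem *e,
				  unsigned int head, unsigned int len)
{
	e->head = head;
	e->len = len;
}

static inline unsigned int vhost_used_head(const struct vhost_used_elem *e)
{
	return e->head;
}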

   
   EAGAIN is not possible after the change, because we don't
   even enter the loop unless we have an skb on the read queue; the
   other cases bomb out, so I figured the comment for future work is
   now done. :-)
  
  Guest could be buggy so we'll get EFAULT.
  If skb is taken off the rx queue (as below), we might get EAGAIN.
 
 We break on any error. If we get EAGAIN because someone read
 on the socket, this code would break the loop, but EAGAIN is a more
 serious problem if it changed since we peeked (because it means
 someone else is reading the socket).
 But I don't understand -- are you suggesting that the error
 handling be different than that, or that the comment is still
 relevant?
 My intention here is to do the TODO from the comment
 so that it can be removed, by handling all error cases. I think
 because of the peek, EAGAIN isn't something to be ignored anymore,
 but the effect is the same whether we break out of the loop or
 not, since we retry the packet next time around. Essentially, we
 ignore every error since we will redo it with the same packet the
 next time around. Maybe we should print something here, but since
 we'll be retrying the packet that's still on the socket, a permanent
 error would spew continuously. Maybe we should shut down entirely
 if we get any negative return value here (including EAGAIN, since
 that tells us someone messed with the socket when we don't want them
 to).
 If you want the comment still there, ok, but I do think EAGAIN
 isn't a special case per the comment anymore, and is handled as all
 other errors are: by exiting the loop and retrying next time.
 
 +-DLS

Yes, I just think some comment should stay, as you say, because
otherwise we simply retry continuously. Maybe we should trigger vq_err.
It needs to be given some thought, which I have not given it yet.

Thinking aloud: EAGAIN means someone is reading the socket
together with us. I would prefer to make that condition a fatal
error; we should then make sure we are still polling the socket
so we see packets if more appear.

-- 
MST


Re: [PATCH] vhost: Make it more scalable by creating a vhost thread per device.

2010-04-04 Thread Michael S. Tsirkin
On Fri, Apr 02, 2010 at 10:31:20AM -0700, Sridhar Samudrala wrote:
 Make vhost scalable by creating a separate vhost thread per vhost
 device. This provides better scaling across multiple guests and with
 multiple interfaces in a guest.

Thanks for looking into this. An alternative approach is
to simply replace create_singlethread_workqueue with
create_workqueue which would get us a thread per host CPU.

In theory this should be the optimal approach wrt CPU locality;
in practice, however, a single thread seems to get better numbers.
I have a TODO to investigate this.
Could you try looking into it?
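
For reference, a minimal sketch of the two variants being compared
(simplified, not the actual vhost code; on kernels of this era
create_workqueue() gives one worker thread per host CPU):

#include <linux/workqueue.h>

static struct workqueue_struct *vhost_workqueue;

static int vhost_wq_init(void)
{
	/* Current behaviour: a single worker thread shared by all devices. */
	vhost_workqueue = create_singlethread_workqueue("vhost");

	/* Alternative suggested above: per-CPU worker threads.
	 * vhost_workqueue = create_workqueue("vhost");
	 */
	return vhost_workqueue ? 0 : -ENOMEM;
}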

 
 I am seeing better aggregated throughput/latency when running netperf
 across multiple guests or multiple interfaces in a guest in parallel
 with this patch.

Any numbers? What happens to CPU utilization?

 Signed-off-by: Sridhar Samudrala s...@us.ibm.com
 
 diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
 index a6a88df..29aa80f 100644
 --- a/drivers/vhost/net.c
 +++ b/drivers/vhost/net.c
 @@ -339,8 +339,10 @@ static int vhost_net_open(struct inode *inode, struct 
 file *f)
   return r;
   }
  
  - vhost_poll_init(n->poll + VHOST_NET_VQ_TX, handle_tx_net, POLLOUT);
  - vhost_poll_init(n->poll + VHOST_NET_VQ_RX, handle_rx_net, POLLIN);
  + vhost_poll_init(n->poll + VHOST_NET_VQ_TX, handle_tx_net, POLLOUT,
  + n->dev);
  + vhost_poll_init(n->poll + VHOST_NET_VQ_RX, handle_rx_net, POLLIN,
  + n->dev);
   n-tx_poll_state = VHOST_NET_POLL_DISABLED;
  
   f->private_data = n;
 @@ -643,25 +645,14 @@ static struct miscdevice vhost_net_misc = {
  
  int vhost_net_init(void)
  {
 - int r = vhost_init();
 - if (r)
 - goto err_init;
 - r = misc_register(vhost_net_misc);
 - if (r)
 - goto err_reg;
 - return 0;
 -err_reg:
 - vhost_cleanup();
 -err_init:
 - return r;
 -
 + return misc_register(vhost_net_misc);
  }
 +
  module_init(vhost_net_init);
  
  void vhost_net_exit(void)
  {
   misc_deregister(vhost_net_misc);
 - vhost_cleanup();
  }
  module_exit(vhost_net_exit);
  
 diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
 index 7bd7a1e..243f4d3 100644
 --- a/drivers/vhost/vhost.c
 +++ b/drivers/vhost/vhost.c
 @@ -36,8 +36,6 @@ enum {
   VHOST_MEMORY_F_LOG = 0x1,
  };
  
 -static struct workqueue_struct *vhost_workqueue;
 -
  static void vhost_poll_func(struct file *file, wait_queue_head_t *wqh,
   poll_table *pt)
  {
 @@ -56,18 +54,19 @@ static int vhost_poll_wakeup(wait_queue_t *wait, unsigned 
 mode, int sync,
   if (!((unsigned long)key & poll->mask))
   return 0;
  
  - queue_work(vhost_workqueue, &poll->work);
  + queue_work(poll->dev->wq, &poll->work);
   return 0;
  }
  
  /* Init poll structure */
  void vhost_poll_init(struct vhost_poll *poll, work_func_t func,
 -  unsigned long mask)
 +  unsigned long mask, struct vhost_dev *dev)
  {
   INIT_WORK(&poll->work, func);
   init_waitqueue_func_entry(&poll->wait, vhost_poll_wakeup);
   init_poll_funcptr(&poll->table, vhost_poll_func);
   poll->mask = mask;
  + poll->dev = dev;
  }
  
  /* Start polling a file. We add ourselves to file's wait queue. The caller 
 must
 @@ -96,7 +95,7 @@ void vhost_poll_flush(struct vhost_poll *poll)
  
  void vhost_poll_queue(struct vhost_poll *poll)
  {
  - queue_work(vhost_workqueue, &poll->work);
  + queue_work(poll->dev->wq, &poll->work);
  }
  
  static void vhost_vq_reset(struct vhost_dev *dev,
 @@ -128,6 +127,11 @@ long vhost_dev_init(struct vhost_dev *dev,
   struct vhost_virtqueue *vqs, int nvqs)
  {
   int i;
 +
  + dev->wq = create_singlethread_workqueue("vhost");
  + if (!dev->wq)
  + return -ENOMEM;
 +
   dev->vqs = vqs;
   dev->nvqs = nvqs;
   mutex_init(&dev->mutex);
 @@ -143,7 +147,7 @@ long vhost_dev_init(struct vhost_dev *dev,
   if (dev->vqs[i].handle_kick)
   vhost_poll_init(&dev->vqs[i].poll,
   dev->vqs[i].handle_kick,
  - POLLIN);
  + POLLIN, dev);
   }
   return 0;
  }
 @@ -216,6 +220,8 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
   if (dev->mm)
   mmput(dev->mm);
   dev->mm = NULL;
  +
  + destroy_workqueue(dev->wq);
  }
  
  static int log_access_ok(void __user *log_base, u64 addr, unsigned long sz)
 @@ -1095,16 +1101,3 @@ void vhost_disable_notify(struct vhost_virtqueue *vq)
   vq_err(vq, "Failed to enable notification at %p: %d\n",
  &vq->used->flags, r);
  }
 -
 -int vhost_init(void)
 -{
 - vhost_workqueue = create_singlethread_workqueue("vhost");
 - if (!vhost_workqueue)
 - return -ENOMEM;
 - return 0;
 -}
 -
 -void vhost_cleanup(void)
 -{
 - destroy_workqueue(vhost_workqueue);
 -}
 diff --git 

Re: [PATCHv6 0/4] qemu-kvm: vhost net port

2010-04-04 Thread Michael S. Tsirkin
On Wed, Mar 24, 2010 at 02:38:57PM +0200, Avi Kivity wrote:
 On 03/17/2010 03:04 PM, Michael S. Tsirkin wrote:
 This is port of vhost v6 patch set I posted previously to qemu-kvm, for
 those that want to get good performance out of it :) This patchset needs
 to be applied when qemu.git one gets merged, this includes irqchip
 support.



 Ping me when this happens please.

Ping

 -- 
 error compiling committee.c: too many arguments to function


High CPU use of -usbdevice tablet (was Re: KVM usability)

2010-04-04 Thread Chris Webb
Avi Kivity a...@redhat.com writes:

 On 03/02/2010 11:34 AM, Jernej Simončič wrote:
 On Tuesday, March 2, 2010, 9:21:18, Chris Webb wrote:
 
 I remember about a year ago, someone asserting on the list that -usbdevice
 tablet was very CPU intensive even when not in use, and should be avoided if
 mouse support wasn't needed, e.g. on non-graphical VMs. Was that actually a
 significant hit, and is it still true today?
 It would appear that this is still the case, at least on slower hosts
 - on Atom Z530 (1,6GHz), the XP VM uses ~30% CPU when idle with
 -usbdevice tablet, but only ~4% without it. However, on a faster host
 (Core2 Quad 2,66GHz), there's practically no difference (Vista x64 VM
 uses ~1% CPU when idle regardless of -usbdevice tablet).
 
 Looks like the tablet is set to 100 Hz polling rate.  We may be able
 to get away with 30 Hz or even less (ep_bInterval, in ms, in
 hw/usb-wacom.c).

Hi Avi. Sorry for the very late follow-up, but I decided to experiment with
this. The cpu impact of the usb tablet device shows up fairly clearly on a
crude test on my (relatively low-spec) desktop. Running an idle Fedora 11
livecd on qemu-kvm 0.12.3, top shows around 0.1% of my cpu in use, but this
increases to roughly 5% when specifying -usbdevice tablet, and more detailed
examination with perf record/report suggests about a factor of thirty as well.

It's actually a more general symptom with USB or at least HID devices by the
look of things: although -usb doesn't increase CPU use on its own, the same
increase in load can also be triggered by -usbdevice keyboard or mouse.
However, running with all three of -usbdevice mouse, keyboard and tablet
doesn't increase load any more than just one of these.

Changing the USB tablet polling interval from 10ms to 100ms in both
hw/usb-wacom.c and hw/usb-hid.c made no difference except an increase in
bInterval shown in lsusb -v in the guest and a hint of the jerky mouse
movement I expected from setting this value so high. A similar change to the
polling interval for the keyboard and mouse also made no difference to their
performance impact.
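
(For readers following along: the interval being changed is the bInterval byte
of the device's interrupt endpoint descriptor. A generic interrupt-endpoint
descriptor, illustrative layout rather than QEMU's exact table, looks like this:)

/* Standard 7-byte USB endpoint descriptor for an interrupt IN endpoint. */
static const unsigned char hid_int_ep_desc[] = {
	0x07,		/* bLength */
	0x05,		/* bDescriptorType: ENDPOINT */
	0x81,		/* bEndpointAddress: IN, endpoint 1 */
	0x03,		/* bmAttributes: interrupt */
	0x08, 0x00,	/* wMaxPacketSize: 8 bytes */
	0x0a,		/* bInterval: 10 ms polling (100 ms would be 0x64) */
};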

Taking the FRAME_TIMER_FREQ down to 100 in hw/usb-uhci.c does seem to reduce
the CPU load quite a bit, but at the expense of making the USB tablet (and
presumably all other USB devices) very laggy.

Could there be some bug here that causes the usb hid devices to wake qemu at
the maximum rate possible (FRAME_TIMER_FREQ?) rather than the configured
polling interval?

Best wishes,

Chris.

PS Vmmouse works fine as an absolute pointing device in the place of
-usbdevice tablet without the performance impact, but this isn't supported
out of the box with typical linux live CDs (e.g. Fedora 11 and 12 or
Knoppix) so unfortunately it's probably less suitable as a default
configuration to expose to end-users.


Re: Networkconfiguration with KVM

2010-04-04 Thread sudhir kumar
On Sun, Apr 4, 2010 at 5:47 PM, Dan Johansson k...@dmj.nu wrote:
 Hi,

 I am new to this list and to KVM (and qemu) so please be gentle with me.
 Up until now I have been doing my virtualization using VMWare-Server. Now I
 want to try KVM due to some issues with the VMWare-Server, and I am having
 some troubles with the networking part of KVM.

 This is a small example of what I want (best viewed in a fixed font):

  +---+
  | Host                              |
  |  +--+                eth0 | 192.168.1.0/24
  |  |      eth0|-- +                 |
  |  | VM1  eth1|---(---+--- eth1 | 192.168.2.0/24
  |  |      eth2|---(---(---+         |
  |  +--+   |   |   |         |
  |                 |   |   |         |
  |  +--+   +---(---(--- eth2 | 192.168.1.0/24
  |  |      eth0|---+   |   |         |
  |  | VM2  eth1|---+   +--- eth3 | 192.168.3.0/24
  |  |      eth2|---+         |
  |  +--+                     |
  |                                   |
  +---+

 Host-eth0 is only for the Host (no VM)
 Host-eth1 is shared between the Host and the VM's (VM?-eth1)
 Host-eth2 and Host-eth3 are only for the VMs (eth0 and eth2)

 The Host and the VMs all have fixed IPs (no dhcp or likewise).
 In this example the IPs could be:
 Host-eth0:      192.168.1.1
 Host-eth1:      192.168.2.1
 Host-eth2:      -
 Host-eth3:      -
 VM1-eth0:               192.168.1.11
 VM1-eth1:               192.168.2.11
 VM1-eth2:               192.168.3.11
 VM2-eth0:               192.168.1.22
 VM2-eth1:               192.168.2.22
 VM3-eth2:               192.168.3.22

 And, yes, Host-eth0 and Host-eth2 are in the same subnet, with eth0 dedicated
 to the Host and eth2 dedicated to the VMs.

 In VMWare this was quite easy to setup (three bridged networks).

It's easy with KVM too. You want 3 NICs per VM, so you need to pass the
corresponding parameters (including the qemu-ifup script) for 3 NICs to
each VM.
On the host you need to create 2 bridges: say br-eth1 and br-eth2.
Make them the interfaces on the host in place of the corresponding
eth interfaces (brctl addbr br-eth1; ifconfig eth1 0.0.0.0 up; brctl addif
br-eth1 eth1; assign eth1's IP and routes to br-eth1; same for eth2).
In the corresponding qemu-ifup script of each interface use
bridge=br-ethN (this basically translates to brctl addif br-ethN $1,
where $1 is the tap device created).
This should work perfectly fine with your existing network setup.
For a quick reference see: http://www.linux-kvm.org/page/Networking

 Does someone know how I can set this up with KVM/QEMU?

 Regards,
 --
 Dan Johansson, http://www.dmj.nu
 ***
 This message is printed on 100% recycled electrons!
 ***




-- 
Regards
Sudhir Kumar


Re: [Qemu-devel] High CPU use of -usbdevice tablet (was Re: KVM usability)

2010-04-04 Thread Paul Brook
  Looks like the tablet is set to 100 Hz polling rate.  We may be able
  to get away with 30 Hz or even less (ep_bInterval, in ms, in
  hw/usb-wacom.c).

 Changing the USB tablet polling interval from 10ms to 100ms in both
 hw/usb-wacom.c and hw/usb-hid.c made no difference except an increase
  in bInterval shown in lsusb -v in the guest and a hint of the jerky mouse
  movement I expected from setting this value so high. A similar change to
  the polling interval for the keyboard and mouse also made no difference to
  their performance impact.

The USB HID devices implement the SET_IDLE command, so the polling interval 
will have no real effect on performance.

My guess is that the overhead you're seeing is entirely from the USB host 
adapter having to wake up and check the transport descriptor lists. This will 
only result in the guest being woken if a device actually responds (as 
mentioned above it should not).

 Taking the FRAME_TIMER_FREQ down to 100 in hw/usb-uhci.c does seem to reduce
 the CPU load quite a bit, but at the expense of making the USB tablet (and
 presumably all other USB devices) very laggy.

The guest USB driver explicitly decides which devices to poll each frame. 
Slowing down the frame rate will effectively change the polling period by the 
same factor: e.g. the HID device requests a polling rate of 10ms, you slowed 
down the frame rate by 10x, so you're effectively only polling every 100ms.

If you want a quick and nasty hack then you can probably make the device wake 
up less often, and process multiple frames every wakeup.  However this is 
probably going to do bad things (at best extremely poor performance) when 
using actual USB devices.

Fixing this properly is hard because the transport descriptor lists are stored 
in system RAM, and polled by the host adapter.  The first step is to read the 
whole table of descriptors, and calculate when the next event is due. However, 
the guest will not explicitly notify the HBA when these tables are modified, 
so you also need some sort of MMU trap to trigger recalculation.

This only gets you down to the base polling interval requested by the device.  
Increasing this interval causes significant user visible latency, so 
increasing it is not an option. The guest is also likely to distribute polling 
events evenly, further reducing the effective sleep interval.  To fix this you 
need additional APIs so that a device can report when an endpoint will become 
unblocked, rather than just waiting to be polled and NAKing the request.
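
To illustrate the kind of API meant here (names are purely hypothetical;
nothing like this exists in QEMU), the idea is that the device model notifies
the controller when an endpoint has data, instead of being polled and NAKing:

/* Hypothetical sketch only. */
typedef void (*usb_ep_ready_cb)(void *opaque, int ep);

struct usb_wakeup_source {
	usb_ep_ready_cb notify;	/* provided by the host-adapter model */
	void *opaque;		/* host-adapter state */
};

/* A device model (e.g. the tablet) would call this when it actually has
 * new data, letting the UHCI/OHCI model sleep until then instead of
 * polling idle endpoints every frame. */
static inline void usb_endpoint_ready(struct usb_wakeup_source *src, int ep)
{
	if (src && src->notify)
		src->notify(src->opaque, ep);
}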

Paul



[ kvm-Bugs-2933400 ] virtio-blk io errors / data corruption on raw drives > 1 TB

2010-04-04 Thread SourceForge.net
Bugs item #2933400, was opened at 2010-01-16 15:35
Message generated for change (Comment added) made by masc82
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2933400&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 9
Private: No
Submitted By: MaSc82 (masc82)
Assigned to: Nobody/Anonymous (nobody)
Summary: virtio-blk io errors / data corruption on raw drives > 1 TB

Initial Comment:
When attaching raw drives > 1 TB, buffer I/O errors will most likely occur and 
filesystems get corrupted. Processes (like mkfs.ext4) might freeze completely 
when filesystems are created on the guest.

Here's a typical log excerpt when using mkfs.ext4 on a 1.5 TB drive on a Ubuntu 
9.10 guest:
(kern.log)
Jan 15 20:40:44 q kernel: [  677.076602] Buffer I/O error on device vde, 
logical block 366283764
Jan 15 20:40:44 q kernel: [  677.076607] Buffer I/O error on device vde, 
logical block 366283765
Jan 15 20:40:44 q kernel: [  677.076611] Buffer I/O error on device vde, 
logical block 366283766
Jan 15 20:40:44 q kernel: [  677.076616] Buffer I/O error on device vde, 
logical block 366283767
Jan 15 20:40:44 q kernel: [  677.076621] Buffer I/O error on device vde, 
logical block 366283768
Jan 15 20:40:44 q kernel: [  677.076626] Buffer I/O error on device vde, 
logical block 366283769
(messages)
Jan 15 20:40:44 q kernel: [  677.076534] lost page write due to I/O error on vde
Jan 15 20:40:44 q kernel: [  677.076541] lost page write due to I/O error on vde
Jan 15 20:40:44 q kernel: [  677.076546] lost page write due to I/O error on vde
Jan 15 20:40:44 q kernel: [  677.076599] lost page write due to I/O error on vde
Jan 15 20:40:44 q kernel: [  677.076604] lost page write due to I/O error on vde
Jan 15 20:40:44 q kernel: [  677.076609] lost page write due to I/O error on vde
Jan 15 20:40:44 q kernel: [  677.076613] lost page write due to I/O error on vde
Jan 15 20:40:44 q kernel: [  677.076618] lost page write due to I/O error on vde
Jan 15 20:40:44 q kernel: [  677.076623] lost page write due to I/O error on vde
Jan 15 20:40:44 q kernel: [  677.076628] lost page write due to I/O error on vde
Jan 15 20:45:55 q Backgrounding to notify hosts...
(The following entries will repeat infinitely, mkfs.ext4 will not exit and 
cannot be killed)
Jan 15 20:49:27 q kernel: [ 1200.520096] mkfs.ext4 D  0 
 1839   1709 0x
Jan 15 20:49:27 q kernel: [ 1200.520101]  88004e157cb8 0082 
88004e157c58 00015880
Jan 15 20:49:27 q kernel: [ 1200.520115]  88004ef6c7c0 00015880 
00015880 00015880
Jan 15 20:49:27 q kernel: [ 1200.520118]  00015880 88004ef6c7c0 
00015880 00015880
Jan 15 20:49:27 q kernel: [ 1200.520123] Call Trace:
Jan 15 20:49:27 q kernel: [ 1200.520157]  [810da0f0] ? 
sync_page+0x0/0x50
Jan 15 20:49:27 q kernel: [ 1200.520178]  [815255f8] 
io_schedule+0x28/0x40
Jan 15 20:49:27 q kernel: [ 1200.520182]  [810da12d] 
sync_page+0x3d/0x50
Jan 15 20:49:27 q kernel: [ 1200.520185]  [81525b17] 
__wait_on_bit+0x57/0x80
Jan 15 20:49:27 q kernel: [ 1200.520192]  [810da29e] 
wait_on_page_bit+0x6e/0x80
Jan 15 20:49:27 q kernel: [ 1200.520205]  [81078650] ? 
wake_bit_function+0x0/0x40
Jan 15 20:49:27 q kernel: [ 1200.520210]  [810e44e0] ? 
pagevec_lookup_tag+0x20/0x30
Jan 15 20:49:27 q kernel: [ 1200.520213]  [810da745] 
wait_on_page_writeback_range+0xf5/0x190
Jan 15 20:49:27 q kernel: [ 1200.520217]  [810da807] 
filemap_fdatawait+0x27/0x30
Jan 15 20:49:27 q kernel: [ 1200.520220]  [810dacb4] 
filemap_write_and_wait+0x44/0x50
Jan 15 20:49:27 q kernel: [ 1200.520235]  [8114ba9f] 
__sync_blockdev+0x1f/0x40
Jan 15 20:49:27 q kernel: [ 1200.520239]  [8114bace] 
sync_blockdev+0xe/0x10
Jan 15 20:49:27 q kernel: [ 1200.520241]  [8114baea] 
block_fsync+0x1a/0x20
Jan 15 20:49:27 q kernel: [ 1200.520249]  [81142f26] 
vfs_fsync+0x86/0xf0
Jan 15 20:49:27 q kernel: [ 1200.520252]  [81142fc9] 
do_fsync+0x39/0x60
Jan 15 20:49:27 q kernel: [ 1200.520255]  [8114301b] 
sys_fsync+0xb/0x10
Jan 15 20:49:27 q kernel: [ 1200.520271]  [81011fc2] 
system_call_fastpath+0x16/0x1b

In my case I was switching to virtio at one point, but the behaviour didn't 
show until there was > 1 TB of data on the filesystem. Very dangerous.

Tested using 2 different SATA controllers, 1.5 TB lvm/mdraid, single 1.5 TB 
drive and 2 TB lvm/mdraid.
The behaviour does not occur with if=scsi or if=ide.

#2914397 might be related: 
https://sourceforge.net/tracker/?func=detail&aid=2914397&group_id=180599&atid=893831
This blog post might also relate: 
http://www.neuhalfen.name/2009/08/05/OpenSolaris_KVM_and_large_IDE_drives/

CPU: Intel 

Re: [PATCHv6 0/4] qemu-kvm: vhost net port

2010-04-04 Thread Avi Kivity

On 04/04/2010 02:46 PM, Michael S. Tsirkin wrote:
 On Wed, Mar 24, 2010 at 02:38:57PM +0200, Avi Kivity wrote:
  On 03/17/2010 03:04 PM, Michael S. Tsirkin wrote:
   This is port of vhost v6 patch set I posted previously to qemu-kvm, for
   those that want to get good performance out of it :) This patchset needs
   to be applied when qemu.git one gets merged, this includes irqchip
   support.

  Ping me when this happens please.

 Ping

Bounce.

--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [Qemu-devel] High CPU use of -usbdevice tablet (was Re: KVM usability)

2010-04-04 Thread Avi Kivity

On 04/04/2010 05:25 PM, Paul Brook wrote:
   Looks like the tablet is set to 100 Hz polling rate.  We may be able
   to get away with 30 Hz or even less (ep_bInterval, in ms, in
   hw/usb-wacom.c).

  Changing the USB tablet polling interval from 10ms to 100ms in both
  hw/usb-wacom.c and hw/usb-hid.c made no difference except an increase
  in bInterval shown in lsusb -v in the guest and a hint of the jerky mouse
  movement I expected from setting this value so high. A similar change to
  the polling interval for the keyboard and mouse also made no difference to
  their performance impact.

 The USB HID devices implement the SET_IDLE command, so the polling interval
 will have no real effect on performance.

On a Linux guest (F12), I see 125 USB interrupts per second with no
mouse movement, so something is broken (on the guest or host).

 My guess is that the overhead you're seeing is entirely from the USB host
 adapter having to wake up and check the transport descriptor lists. This will
 only result in the guest being woken if a device actually responds (as
 mentioned above it should not).

A quick profile on the host side doesn't show this.  Instead, I see a
lot of select() overhead.  Surprising, as there are ~10 descriptors being
polled, so ~1200 polls per second.  Maybe epoll will help here.



--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: KVM Page Fault Question

2010-04-04 Thread Avi Kivity

(re-adding list)


On 04/02/2010 07:01 PM, Marek Olszewski wrote:

Thanks for the fast response.

I'm trying to find the code that on a write to a guest page table 
entry, will iterate over all shadow page table entries that map that 
guest entry to update them.  Can you point me to that code?  I can't 
seem to find it myself :(


See kvm_mmu_pte_write().

--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: Networkconfiguration with KVM

2010-04-04 Thread Dan Johansson
On Sunday 04 April 2010 15.00:26 sudhir kumar wrote:
 On Sun, Apr 4, 2010 at 5:47 PM, Dan Johansson k...@dmj.nu wrote:
  Hi,
 
  I am new to this list and to KVM (and qemu) so please be gentle with me.
  Up until now I have been running my virtualizing  using VMWare-Server.
  Now I want to try KVM due to some issues with the  VMWare-Server and I am
  having some troubles with the networking part of KVM.
 
  This is a small example of what I want (best viewed in a fix-font):
 
   +---+
   | Host  |
   |  +--+eth0 | 192.168.1.0/24
   |  |  eth0|-- + |
   |  | VM1  eth1|---(---+--- eth1 | 192.168.2.0/24
   |  |  eth2|---(---(---+ |
   |  +--+   |   |   | |
   | |   |   | |
   |  +--+   +---(---(--- eth2 | 192.168.1.0/24
   |  |  eth0|---+   |   | |
   |  | VM2  eth1|---+   +--- eth3 | 192.168.3.0/24
   |  |  eth2|---+ |
   |  +--+ |
   |   |
   +---+
 
  Host-eth0 is only for the Host (no VM)
  Host-eth1 is shared between the Host and the VM's (VM?-eth1)
  Host-eth2 and Host-eth3 are only for the VMs (eth0 and eth2)
 
  The Host and the VMs all have fixed IPs (no dhcp or likewise).
  In this example th IPs could be:
  Host-eth0:  192.168.1.1
  Host-eth1:  192.168.2.1
  Host-eth2:  -
  Host-eth3:  -
  VM1-eth0:   192.168.1.11
  VM1-eth1:   192.168.2.11
  VM1-eth2:   192.168.3.11
  VM2-eth0:   192.168.1.22
  VM2-eth1:   192.168.2.22
  VM3-eth2:   192.168.3.22
 
  And, yes, Host-eth0 and Host-eth2 are in the same subnet, with eth0
  dedicated to the Host and eth2 dedicated to the VMs.
 
  In VMWare this was quite easy to setup (three bridged networks).
 
 Its easy with KVM too. You want 3 NICs per VM, so you need to pass the
 corresponding parameters(including qemu-ifup script) for 3 NICs to
 each VM.
 In the host you need to create 2 bridges: say br-eth1 and br-eth2.
 Make them as the interface on the host in place of the corresponding
 eth interfaces.(brct addbr br-eth1; ifcfg eth1 0.0.0.0 up; brctl addif
 br-eth eth1; assign eth1's ip and routes to breth1; same for eth2).
 In the corresponding qemu-ifup scripts of each interface use
 bridge=br-ethN (This basicaly translates to brctl addif br-ethN $1,
 where $ is the tap device created)
 This should work perfectly fine with your existing NW setup.
 For a quick reference use: http://www.linux-kvm.org/page/Networking

Thanks for your help, but... I am still not able to get it to work the way I 
want.
This is what I have done so far:
brctl addbr br-eth1
brctl addbr br-eth3

ip link set eth1 up
ip link set eth3 up

brctl addif br-eth1 eth1
brctl addif br-eth3 eth3

tunctl -b -t qtap1
tunctl -b -t qtap3

brctl addif br-eth1 qtap1
brctl addif br-eth3 qtap3

ifconfig qtap1 up 0.0.0.0 promisc
ifconfig qtap3 up 0.0.0.0 promisc

# ifconfig
eth0  Link encap:Ethernet  HWaddr 00:0d:88:52:51:24
  inet addr:192.168.1.3  Bcast:192.168.1.255  Mask:255.255.255.0
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:443638 errors:0 dropped:0 overruns:0 frame:0
  TX packets:758540 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:47041686 (44.8 MiB)  TX bytes:990115354 (944.2 MiB)
  Interrupt:19 Base address:0xec00

eth1  Link encap:Ethernet  HWaddr 00:0d:88:52:51:25
  inet addr:192.168.4.1  Bcast:192.168.4.255  Mask:255.255.255.0
  UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
  RX packets:0 errors:0 dropped:0 overruns:0 frame:0
  TX packets:6 errors:0 dropped:0 overruns:0 carrier:6
  collisions:0 txqueuelen:1000
  RX bytes:0 (0.0 B)  TX bytes:360 (360.0 B)
  Interrupt:18 Base address:0xe880

eth3  Link encap:Ethernet  HWaddr 00:0d:88:52:51:27
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:0 errors:0 dropped:0 overruns:0 frame:0
  TX packets:4 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:0 (0.0 B)  TX bytes:240 (240.0 B)
  Interrupt:16 Base address:0xe480

qtap1 Link encap:Ethernet  HWaddr 26:c0:de:df:c5:e4
  UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
  RX packets:351 errors:0 dropped:0 overruns:0 frame:0
  TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:500
  RX bytes:14742 (14.3 KiB)  TX bytes:0 (0.0 B)

qtap3 Link encap:Ethernet  HWaddr 26:3e:ba:2d:97:bc
  UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
  RX packets:6 errors:0 dropped:0 overruns:0 frame:0
  TX packets:0 

Re: Networkconfiguration with KVM

2010-04-04 Thread Held Bernhard
Am 04.04.2010 20:02, schrieb Dan Johansson:
 On Sunday 04 April 2010 15.00:26 sudhir kumar wrote:
 On Sun, Apr 4, 2010 at 5:47 PM, Dan Johansson k...@dmj.nu wrote:
 Hi,

 I am new to this list and to KVM (and qemu) so please be gentle with me.
 Up until now I have been running my virtualizing  using VMWare-Server.
 Now I want to try KVM due to some issues with the  VMWare-Server and I am
 having some troubles with the networking part of KVM.

 This is a small example of what I want (best viewed in a fix-font):

  +---+
  | Host  |
  |  +--+eth0 | 192.168.1.0/24
  |  |  eth0|-- + |
  |  | VM1  eth1|---(---+--- eth1 | 192.168.2.0/24
  |  |  eth2|---(---(---+ |
  |  +--+   |   |   | |
  | |   |   | |
  |  +--+   +---(---(--- eth2 | 192.168.1.0/24
  |  |  eth0|---+   |   | |
  |  | VM2  eth1|---+   +--- eth3 | 192.168.3.0/24
  |  |  eth2|---+ |
  |  +--+ |
  |   |
  +---+

 Host-eth0 is only for the Host (no VM)
 Host-eth1 is shared between the Host and the VM's (VM?-eth1)
 Host-eth2 and Host-eth3 are only for the VMs (eth0 and eth2)

 The Host and the VMs all have fixed IPs (no dhcp or likewise).
 In this example th IPs could be:
 Host-eth0:  192.168.1.1
 Host-eth1:  192.168.2.1
 Host-eth2:  -
 Host-eth3:  -
 VM1-eth0:   192.168.1.11
 VM1-eth1:   192.168.2.11
 VM1-eth2:   192.168.3.11
 VM2-eth0:   192.168.1.22
 VM2-eth1:   192.168.2.22
 VM3-eth2:   192.168.3.22

 And, yes, Host-eth0 and Host-eth2 are in the same subnet, with eth0
 dedicated to the Host and eth2 dedicated to the VMs.

 In VMWare this was quite easy to setup (three bridged networks).

 Its easy with KVM too. You want 3 NICs per VM, so you need to pass the
 corresponding parameters(including qemu-ifup script) for 3 NICs to
 each VM.
 In the host you need to create 2 bridges: say br-eth1 and br-eth2.
 Make them as the interface on the host in place of the corresponding
 eth interfaces.(brct addbr br-eth1; ifcfg eth1 0.0.0.0 up; brctl addif
 br-eth eth1; assign eth1's ip and routes to breth1; same for eth2).
 In the corresponding qemu-ifup scripts of each interface use
 bridge=br-ethN (This basicaly translates to brctl addif br-ethN $1,
 where $ is the tap device created)
 This should work perfectly fine with your existing NW setup.
 For a quick reference use: http://www.linux-kvm.org/page/Networking
 
 Thanks for your help, but... I am still not able to get it to work the way I 
 want.
 This is what I have don so far:
 brctl addbr br-eth1
 brctl addbr br-eth3
 
 ip link set eth1 up
 ip link set eth3 up
 
 brctl addif br-eth1 eth1
 brctl addif br-eth3 eth3
 
 tunctl -b -t qtap1
 tunctl -b -t qtap3
 
 brctl addif br-eth1 qtap1
 brctl addif br-eth3 qtap3
 
 ifconfig qtap1 up 0.0.0.0 promisc
 ifconfig qtap3 up 0.0.0.0 promisc
 
 # ifconfig
 eth0  Link encap:Ethernet  HWaddr 00:0d:88:52:51:24
   inet addr:192.168.1.3  Bcast:192.168.1.255  Mask:255.255.255.0
   UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
   RX packets:443638 errors:0 dropped:0 overruns:0 frame:0
   TX packets:758540 errors:0 dropped:0 overruns:0 carrier:0
   collisions:0 txqueuelen:1000
   RX bytes:47041686 (44.8 MiB)  TX bytes:990115354 (944.2 MiB)
   Interrupt:19 Base address:0xec00
 
 eth1  Link encap:Ethernet  HWaddr 00:0d:88:52:51:25
   inet addr:192.168.4.1  Bcast:192.168.4.255  Mask:255.255.255.0
   UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
   RX packets:0 errors:0 dropped:0 overruns:0 frame:0
   TX packets:6 errors:0 dropped:0 overruns:0 carrier:6
   collisions:0 txqueuelen:1000
   RX bytes:0 (0.0 B)  TX bytes:360 (360.0 B)
   Interrupt:18 Base address:0xe880
 
 eth3  Link encap:Ethernet  HWaddr 00:0d:88:52:51:27
   UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
   RX packets:0 errors:0 dropped:0 overruns:0 frame:0
   TX packets:4 errors:0 dropped:0 overruns:0 carrier:0
   collisions:0 txqueuelen:1000
   RX bytes:0 (0.0 B)  TX bytes:240 (240.0 B)
   Interrupt:16 Base address:0xe480
 
 qtap1 Link encap:Ethernet  HWaddr 26:c0:de:df:c5:e4
   UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
   RX packets:351 errors:0 dropped:0 overruns:0 frame:0
   TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
   collisions:0 txqueuelen:500
   RX bytes:14742 (14.3 KiB)  TX bytes:0 (0.0 B)
 
 qtap3 Link encap:Ethernet  HWaddr 26:3e:ba:2d:97:bc
   UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
   RX packets:6 errors:0 

Re: [Qemu-devel] High CPU use of -usbdevice tablet (was Re: KVM usability)

2010-04-04 Thread Paul Brook
  The USB HID devices implement the SET_IDLE command, so the polling
  interval will have no real effect on performance.
 
 On a Linux guest (F12), I see 125 USB interrupts per second with no
 mouse movement, so something is broken (on the guest or host).

Turns out to be a bug in the UHCI emulation. It should only raise an 
interrupt if the transfer actually completes (i.e. the active bit is set to 
zero). Fixed by 5bd2c0d7.
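
A hedged sketch of that logic (the bit positions follow the UHCI TD
control-word layout; the names are illustrative, not necessarily those in the
actual commit):

#define TD_CTRL_ACTIVE	(1u << 23)	/* transfer still pending */
#define TD_CTRL_IOC	(1u << 24)	/* interrupt on completion requested */

static int td_should_raise_interrupt(unsigned int td_ctrl)
{
	/* Only interrupt when IOC is requested AND the active bit has
	 * been cleared, i.e. the transfer actually completed. */
	return (td_ctrl & TD_CTRL_IOC) && !(td_ctrl & TD_CTRL_ACTIVE);
}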

I was testing with an OHCI controller, which does not have this bug.

Paul


Re: [Qemu-devel] High CPU use of -usbdevice tablet (was Re: KVM usability)

2010-04-04 Thread Paul Brook
  My guess is that the overhead you're seeing is entirely from the USB host
  adapter having to wake up and check the transport descriptor lists. This
  will only result in the guest being woken if a device actually responds
  (as mentioned above it should not).
 
 A quick profile on the host side doesn't show this.  Instead, I see a
 lot of select() overhead.

This actually confirms my hypothesis. After fixing the UHCI bug the guest is 
completely idle, but the host still needs to wake up at 1ms intervals to do 
UHCI emulation. I can believe that the most visible part of this is the 
select() syscall.

 Surprising as there are ~10 descriptors being
 polled, so ~1200 polls per second.  Maybe epoll will help here.

I'm not sure where you get 1200 from.  select() will be called once per host 
wakeup, i.e. if the USB controller is enabled then 1k times per second due to 
the frame tick.

Are you sure there are actually 10 descriptors being polled? Remember that the 
nfds argument is the value of the largest fd in the set (+1), not the number 
of descriptors in the set.
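
As a small reminder of that select() convention (plain POSIX C, unrelated to
the QEMU code itself):

#include <stddef.h>
#include <sys/select.h>

static int wait_on_two_fds(int fd_a, int fd_b)
{
	fd_set rfds;
	int nfds = (fd_a > fd_b ? fd_a : fd_b) + 1;	/* largest fd + 1 */

	FD_ZERO(&rfds);
	FD_SET(fd_a, &rfds);
	FD_SET(fd_b, &rfds);

	/* Only two descriptors are watched, however large nfds is. */
	return select(nfds, &rfds, NULL, NULL, NULL);
}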

Paul


[PATCH] fix migration with big mem guests

2010-04-04 Thread Izik Eidus
Hi,

(Below is an explanation of the bug for those who aren't familiar with it.)

In the beginning I tried to make this code run with
qemu_bh(), but the result was a performance catastrophe.

The reason is that the migration code just isn't built
to run at such high granularity; for example, stuff like:

static ram_addr_t ram_save_remaining(void)
{
ram_addr_t addr;
ram_addr_t count = 0;

for (addr = 0; addr < last_ram_offset; addr += TARGET_PAGE_SIZE) {
if (cpu_physical_memory_get_dirty(addr, MIGRATION_DIRTY_FLAG))
count++;
}

return count;
}

which gets called from ram_save_live(), was taking way too much time...
(Consider that I tried to read only a small amount of data each time, and to
 run it every time main_loop_wait() finished (from qemu_bh_poll()).)

Then I thought, ok - let's add a timer so that the bh code runs
only once in a while - however, the migration code already has a timer
that is set, so it seemed to make the most sense to use that one...

If anyone has a better idea how to solve this issue, I will be very
happy to hear it.

Thanks.

From 2d9c25f1fee61f50cb130769c3779707a6ef90d9 Mon Sep 17 00:00:00 2001
From: Izik Eidus iei...@redhat.com
Date: Mon, 5 Apr 2010 02:05:09 +0300
Subject: [PATCH] qemu-kvm: fix migration with large mem

In the case of guests with large memory that have many pages
whose bytes are all the same, we will
spend a lot of time reading the memory from the guest
(is_dup_page()).

This happens because the ram_save_live() function has a
limit on how much we can send to the destination but not on how
much we read from the guest, and when we have many is_dup_page()
hits, we might read a huge amount of data without updating important
stuff like the timers...

The guest loses all its responsiveness and has many soft lockups
inside itself.

This patch adds a limit on the size we can read from the guest each
iteration.

Thanks.

Signed-off-by: Izik Eidus iei...@redhat.com
---
 vl.c |6 +-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/vl.c b/vl.c
index d959fdb..777988d 100644
--- a/vl.c
+++ b/vl.c
@@ -174,6 +174,8 @@ int main(int argc, char **argv)
 
 #define DEFAULT_RAM_SIZE 128
 
+#define MAX_SAVE_BLOCK_READ 10 * 1024 * 1024
+
 #define MAX_VIRTIO_CONSOLES 1
 
 static const char *data_dir;
@@ -2854,6 +2856,7 @@ static int ram_save_live(Monitor *mon, QEMUFile *f, int 
stage, void *opaque)
 uint64_t bytes_transferred_last;
 double bwidth = 0;
 uint64_t expected_time = 0;
+int data_read = 0;
 
 if (stage < 0) {
 cpu_physical_memory_set_dirty_tracking(0);
@@ -2883,10 +2886,11 @@ static int ram_save_live(Monitor *mon, QEMUFile *f, int 
stage, void *opaque)
 bytes_transferred_last = bytes_transferred;
 bwidth = qemu_get_clock_ns(rt_clock);
 
-while (!qemu_file_rate_limit(f)) {
+while (!qemu_file_rate_limit(f) && data_read < MAX_SAVE_BLOCK_READ) {
 int ret;
 
 ret = ram_save_block(f);
+data_read += ret * TARGET_PAGE_SIZE;
 bytes_transferred += ret * TARGET_PAGE_SIZE;
 if (ret == 0) /* no more blocks */
 break;
-- 
1.6.6.1
