RE: Network I/O performance

2009-05-20 Thread Fischer, Anna
 Subject: Re: Network I/O performance
 
 Fischer, Anna wrote:
  Subject: Re: Network I/O performance
 
  Fischer, Anna wrote:
 
  I am running KVM with Fedora Core 8 on a 2.6.23 32-bit kernel. I use
  the tun/tap device model and the Linux bridge kernel module to connect
  my VM to the network. I have 2 10G Intel 82598 network devices (with
  the ixgbe driver) attached to my machine and I want to do packet
  routing in my VM (the VM has two virtual network interfaces
  configured). Analysing the network performance of the standard QEMU
  emulated NICs, I get less than 1G of throughput on those 10G links.
  Surprisingly though, I don't really see CPU utilization being maxed
  out. This is a dual core machine, and mpstat shows me that both CPUs
  are about 40% idle. My VM is more or less unresponsive due to the high
  network processing load while the host OS still seems to be in good
  shape. How can I best tune this setup to achieve best possible
  performance with KVM? I know there is virtIO and I know there is PCI
  pass-through, but those models are not an option for me right now.
 
  How many cpus are assigned to the guest?  If only one, then 40% idle
  equates to 100% of a core for the guest and 20% for housekeeping.
 
 
  No, the machine has a dual core CPU and I have configured the guest
 with 2 CPUs. So I would want to see KVM using up to 200% of CPU,
 ideally. There is nothing else running on that machine.
 
 
 Well, it really depends on the workload, whether it can utilize both
 vcpus.
 
 
 
  If this is the case, you could try pinning the vcpu thread (info cpus
  from the monitor) to one core.  You should then see 100%/20% cpu load
  distribution.
 
  wrt emulated NIC performance, I'm guessing you're not doing tcp?  If you
  were we might do something with TSO.
 
 
  No, I am measuring UDP throughput performance. I have now tried using
 a different NIC model, and the e1000 model seems to achieve slightly
 better performance (CPU goes up to 110% only though). I have also been
 running virtio now, and while its performance with 2.6.20 was very poor
 too, when changing the guest kernel to 2.6.30, I get a reasonable
 performance and higher CPU utilization (e.g. it goes up to 180-190%). I
 have to throttle the incoming bandwidth though, because as soon as I go
 over a certain threshold, CPU goes back down to 90% and throughput goes
 down too.
 
 
 Yes, there's a known issue with UDP, where we don't report congestion
 and the queues start dropping packets.  There's a patch for tun queued
 for the next merge window; you'll need a 2.6.31 host for that IIRC
 (Herbert?)
 
  I have not seen this with Xen/VMware where I mostly managed to max
 out CPU completely before throughput performance did not go up anymore.
 
  I have also realized that when using the tun/tap configuration with a
 bridge, packets are replicated on all tap devices when QEMU writes
 packets to the tun interface. I guess this is a limitation of tun/tap
 as it does not know to which tap device the packet has to go to. The
 tap device then eventually drops packets when the destination MAC is
 not its own, but it still receives the packet which causes more
 overhead in the system overall.
 
 
 Right, I guess you'd see this with a real switch as well?  Maybe have
 your guest send a packet out once in a while so the bridge can learn its
 MAC address (we do this after migration, for example).

No, this is not about the bridge - packets are replicated by tun/tap as far as
I can see. In fact I run two bridges, and attach my two tap interfaces to those
(one tap per bridge to connect it to the external network). And packets that
should actually only go to one bridge are replicated on the other one, too.
This is far from ideal, but I guess the issue is that the tun/tap
interface is a 1-N mapping, so there is not much you can do.
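
As a rough way of confirming this, one can sniff one of the taps from the host
and count frames that are not addressed to the guest behind it. A minimal
Python sketch, assuming a hypothetical tap name and guest MAC (needs root /
CAP_NET_RAW):

import socket

IFACE = "tap0"                               # hypothetical tap name
GUEST_MAC = bytes.fromhex("525400123456")    # hypothetical guest NIC MAC
BROADCAST = b"\xff" * 6
ETH_P_ALL = 0x0003                           # from linux/if_ether.h

# ETH_P_ALL must be passed in network byte order for AF_PACKET sockets
s = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.htons(ETH_P_ALL))
s.bind((IFACE, 0))

total = flooded = 0
try:
    while True:
        frame = s.recv(65535)
        total += 1
        dst = frame[:6]                      # destination MAC of the frame
        if dst != GUEST_MAC and dst != BROADCAST:
            flooded += 1
except KeyboardInterrupt:
    print("%d of %d frames seen on %s were not for the guest"
          % (flooded, total, IFACE))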


tun/tap and Vlans (was: Re: Network I/O performance)

2009-05-19 Thread Lukas Kolbe
Hi all,

On a sidenote:

  I have also realized that when using the tun/tap configuration with
  a bridge, packets are replicated on all tap devices when QEMU writes
  packets to the tun interface. I guess this is a limitation of
  tun/tap as it does not know to which tap device the packet has to go
  to. The tap device then eventually drops packets when the
  destination MAC is not its own, but it still receives the packet 
  which causes more overhead in the system overall.
 
 Right, I guess you'd see this with a real switch as well?  Maybe have 
 your guest send a packet out once in a while so the bridge can learn its 
 MAC address (we do this after migration, for example).

Does this mean that it is not possible to have each tun device in a
separate bridge that serves a separate Vlan? We have experienced a
strange problem that we couldn't yet explain. Given this setup:

Guest                                          Host

kvm1 ---- eth0 ---\
                   +--- bridge0 --- vlan1 ---\
      /-- eth0 ---/                           +--- eth0
kvm2 -+                                       |
      \-- eth1 -------- bridge1 --- vlan2 ---/

When sending packets through kvm2/eth0, they appear on both bridges and
both vlans; the same happens when sending packets through kvm2/eth1. When
the guest has only one interface, the packets only appear on one bridge
and one vlan, as they should.

Can this be worked around?
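
One way to see where frames actually end up is to watch the per-tap packet
counters in sysfs while traffic is sent through only one of the guest
interfaces. A small sketch, with hypothetical tap names standing in for the
two taps in the diagram above (on a tap, frames heading towards the guest are
counted as TX, since the host transmits them into the device):

import time

TAPS = ["tap0", "tap1"]        # hypothetical tap names, one per bridge

def tx_packets(iface):
    # frames the host pushed into the tap, i.e. towards the guest
    with open("/sys/class/net/%s/statistics/tx_packets" % iface) as f:
        return int(f.read())

before = {t: tx_packets(t) for t in TAPS}
time.sleep(10)                 # send traffic towards only one guest meanwhile
after = {t: tx_packets(t) for t in TAPS}

for t in TAPS:
    print("%s: %d frames towards the guest" % (t, after[t] - before[t]))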

-- 
Lukas




Re: Network I/O performance

2009-05-18 Thread Avi Kivity

Herbert Xu wrote:
Yes, there's a known issue with UDP, where we don't report congestion  
and the queues start dropping packets.  There's a patch for tun queued  
for the next merge window; you'll need a 2.6.31 host for that IIRC  
(Herbert?)



It should be in 2.6.30 in fact.  However, this is for outbound
traffic only since inbound traffic shouldn't have this problem
of the guest sending faster than the wire.
  


Is there a corresponding qemu change?  Or is this already handled by
the existing code?


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: Network I/O performance

2009-05-17 Thread Avi Kivity

Fischer, Anna wrote:

Subject: Re: Network I/O performance

Fischer, Anna wrote:


I am running KVM with Fedora Core 8 on a 2.6.23 32-bit kernel. I use
the tun/tap device model and the Linux bridge kernel module to connect
my VM to the network. I have 2 10G Intel 82598 network devices (with
the ixgbe driver) attached to my machine and I want to do packet
routing in my VM (the VM has two virtual network interfaces
configured). Analysing the network performance of the standard QEMU
emulated NICs, I get less than 1G of throughput on those 10G links.
Surprisingly though, I don't really see CPU utilization being maxed
out. This is a dual core machine, and mpstat shows me that both CPUs
are about 40% idle. My VM is more or less unresponsive due to the high
network processing load while the host OS still seems to be in good
shape. How can I best tune this setup to achieve best possible
performance with KVM? I know there is virtIO and I know there is PCI
pass-through, but those models are not an option for me right now.

How many cpus are assigned to the guest?  If only one, then 40% idle
equates to 100% of a core for the guest and 20% for housekeeping.



No, the machine has a dual core CPU and I have configured the guest with 2 
CPUs. So I would want to see KVM using up to 200% of CPU, ideally. There is 
nothing else running on that machine.
  


Well, it really depends on the workload, whether it can utilize both vcpus.

 
  

If this is the case, you could try pinning the vcpu thread (info cpus
from the monitor) to one core.  You should then see 100%/20% cpu load
distribution.

wrt emulated NIC performance, I'm guessing you're not doing tcp?  If you
were we might do something with TSO.



No, I am measuring UDP throughput performance. I have now tried using a different NIC model, and the e1000 model seems to achieve slightly better performance (CPU goes up to 110% only though). I have also been running virtio now, and while its performance with 2.6.20 was very poor too, when changing the guest kernel to 2.6.30, I get a reasonable performance and higher CPU utilization (e.g. it goes up to 180-190%). I have to throttle the incoming bandwidth though, because as soon as I go over a certain threshold, CPU goes back down to 90% and throughput goes down too. 
  


Yes, there's a known issue with UDP, where we don't report congestion 
and the queues start dropping packets.  There's a patch for tun queued 
for the next merge window; you'll need a 2.6.31 host for that IIRC 
(Herbert?)



I have not seen this with Xen/VMware where I mostly managed to max out CPU 
completely before throughput performance did not go up anymore.

I have also realized that when using the tun/tap configuration with a bridge, 
packets are replicated on all tap devices when QEMU writes packets to the tun 
interface. I guess this is a limitation of tun/tap as it does not know to which 
tap device the packet has to go to. The tap device then eventually drops 
packets when the destination MAC is not its own, but it still receives the 
packet which causes more overhead in the system overall.
  


Right, I guess you'd see this with a real switch as well?  Maybe have 
your guest send a packet out once in a while so the bridge can learn its 
MAC address (we do this after migration, for example).
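
A minimal sketch of that idea, meant to run inside the guest: broadcast a
small, otherwise meaningless frame every few seconds so the bridge keeps the
guest's MAC in its forwarding table. The interface name is an assumption, and
the socket needs root / CAP_NET_RAW:

import socket
import time

IFACE = "eth0"                   # hypothetical guest-side interface name
ETH_P_ALL = 0x0003               # from linux/if_ether.h

s = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.htons(ETH_P_ALL))
s.bind((IFACE, 0))
src_mac = s.getsockname()[4]     # hardware address of the bound interface

frame = (b"\xff" * 6             # broadcast destination
         + src_mac               # our source MAC - this is what the bridge learns
         + b"\x88\xb5"           # IEEE experimental EtherType, ignored by other hosts
         + b"\x00" * 46)         # pad to the 60-byte Ethernet minimum

while True:
    s.send(frame)
    time.sleep(30)               # well inside the default 300s bridge ageing time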


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: Network I/O performance

2009-05-13 Thread Avi Kivity

Fischer, Anna wrote:

I am running KVM with Fedora Core 8 on a 2.6.23 32-bit kernel. I use the 
tun/tap device model and the Linux bridge kernel module to connect my VM to the 
network. I have 2 10G Intel 82598 network devices (with the ixgbe driver) 
attached to my machine and I want to do packet routing in my VM (the VM has two 
virtual network interfaces configured). Analysing the network performance of 
the standard QEMU emulated NICs, I get less than 1G of throughput on those 10G 
links. Surprisingly though, I don't really see CPU utilization being maxed out. 
This is a dual core machine, and mpstat shows me that both CPUs are about 40% 
idle. My VM is more or less unresponsive due to the high network processing 
load while the host OS still seems to be in good shape. How can I best tune 
this setup to achieve best possible performance with KVM? I know there is 
virtIO and I know there is PCI pass-through, but those models are not an option 
for me right now.
  


How many cpus are assigned to the guest?  If only one, then 40% idle 
equates to 100% of a core for the guest and 20% for housekeeping.


If this is the case, you could try pinning the vcpu thread (info cpus 
from the monitor) to one core.  You should then see 100%/20% cpu load 
distribution.
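
For reference, a minimal sketch of the pinning step, assuming the vcpu thread
id has already been read from the monitor's "info cpus" output (the TID below
is made up); taskset(1) does the same job from the shell:

import os

VCPU_THREAD_ID = 4242      # hypothetical thread id reported by "info cpus"
TARGET_CORE = 0            # pin the vcpu here, leaving core 1 for everything else

# on Linux, sched_setaffinity accepts a thread id, not just a process id
os.sched_setaffinity(VCPU_THREAD_ID, {TARGET_CORE})
print("vcpu thread %d now restricted to cores %s"
      % (VCPU_THREAD_ID, sorted(os.sched_getaffinity(VCPU_THREAD_ID))))

With the vcpu tied to one core, the qemu I/O thread and host networking are
left to the other, which is what should produce the 100%/20% split described
above.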


wrt emulated NIC performance, I'm guessing you're not doing tcp?  If you 
were we might do something with TSO.


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.




RE: Network I/O performance

2009-05-13 Thread Fischer, Anna
 Subject: Re: Network I/O performance
 
 Fischer, Anna wrote:
  I am running KVM with Fedora Core 8 on a 2.6.23 32-bit kernel. I use
 the tun/tap device model and the Linux bridge kernel module to connect
 my VM to the network. I have 2 10G Intel 82598 network devices (with
 the ixgbe driver) attached to my machine and I want to do packet
 routing in my VM (the VM has two virtual network interfaces
 configured). Analysing the network performance of the standard QEMU
 emulated NICs, I get less than 1G of throughput on those 10G links.
 Surprisingly though, I don't really see CPU utilization being maxed
 out. This is a dual core machine, and mpstat shows me that both CPUs
 are about 40% idle. My VM is more or less unresponsive due to the high
 network processing load while the host OS still seems to be in good
 shape. How can I best tune this setup to achieve best possible
 performance with KVM? I know there is virtIO and I know there is PCI
 pass-through, but those models are not an option for me right now.
 
 
 How many cpus are assigned to the guest?  If only one, then 40% idle
 equates to 100% of a core for the guest and 20% for housekeeping.

No, the machine has a dual core CPU and I have configured the guest with 2 
CPUs. So I would want to see KVM using up to 200% of CPU, ideally. There is 
nothing else running on that machine.
 
 If this is the case, you could try pinning the vcpu thread (info cpus
 from the monitor) to one core.  You should then see 100%/20% cpu load
 distribution.
 
 wrt emulated NIC performance, I'm guessing you're not doing tcp?  If
 you
 were we might do something with TSO.

No, I am measuring UDP throughput performance. I have now tried using a 
different NIC model, and the e1000 model seems to achieve slightly better 
performance (CPU goes up to 110% only though). I have also been running virtio 
now, and while its performance with 2.6.20 was very poor too, when changing the 
guest kernel to 2.6.30, I get a reasonable performance and higher CPU 
utilization (e.g. it goes up to 180-190%). I have to throttle the incoming 
bandwidth though, because as soon as I go over a certain threshold, CPU goes 
back down to 90% and throughput goes down too. 
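
For context, the measurement described here boils down to something like the
following sketch (address, port and payload size are placeholders; a tool such
as netperf gives the same numbers with proper statistics):

import socket
import sys
import time

ADDR = ("192.168.1.2", 5001)       # hypothetical receiver address and port
PAYLOAD = b"x" * 1470              # keep each datagram inside a 1500-byte MTU
DURATION = 10                      # seconds

def send():
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sent, deadline = 0, time.time() + DURATION
    while time.time() < deadline:
        sent += s.sendto(PAYLOAD, ADDR)
    print("offered   %.1f Mbit/s" % (sent * 8 / DURATION / 1e6))

def receive():
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.bind(("", ADDR[1]))
    s.settimeout(1.0)
    received, deadline = 0, time.time() + DURATION
    while time.time() < deadline:
        try:
            received += len(s.recv(65535))
        except socket.timeout:
            pass
    print("delivered %.1f Mbit/s" % (received * 8 / DURATION / 1e6))

if __name__ == "__main__":
    receive() if "recv" in sys.argv else send()

If the delivered rate stays flat while the offered rate keeps climbing, the
drops are happening somewhere between the sender and the guest.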

I have not seen this with Xen/VMware where I mostly managed to max out CPU 
completely before throughput performance did not go up anymore.

I have also realized that when using the tun/tap configuration with a bridge, 
packets are replicated on all tap devices when QEMU writes packets to the tun 
interface. I guess this is a limitation of tun/tap as it does not know to which 
tap device the packet has to go to. The tap device then eventually drops 
packets when the destination MAC is not its own, but it still receives the 
packet which causes more overhead in the system overall.

I have not yet experimented much with pinning VCPU threads to cores. I will do 
that as well.
