RE: pci-stub error and MSI-X for KVM guest

2010-01-05 Thread Fischer, Anna
 Subject: Re: pci-stub error and MSI-X for KVM guest
 
 * Fischer, Anna (anna.fisc...@hp.com) wrote:
   Ouch.  Can you do debuginfo-install qemu-system-x86 to get the
 debug
   packages, then attach gdb to the QEMU process so that when you do
 lspci
   -v
   in the guest (assuming this is QEMU segfaulting) you'll get a
 backtrace?
 
  I don't know how I can tell virt-manager through the GUI to enable
 debug mode, e.g. call virt-manager with '-s'. From the command line I
 can attach gdb like this, but when running virt-manager from the GUI
 then I cannot connect to localhost:1234. However, the issues only arise
 when starting virt-manager from the GUI. I can't find the configuration
 option to somehow tell that I want it to be launched with '-s'?
 
 Just looking for a backtrace of the qemu-kvm process itself.  So after
 you launch it via virt-manager, gdb /usr/bin/qemu-kvm $(pidof qemu-kvm)
 should be sufficient.

So, when setting a breakpoint for the exit() call I'm getting a bit closer to 
figuring where it kills my guest.

Breakpoint 1, exit (status=1) at exit.c:99
99  {
Current language:  auto
The current source language is auto; currently c.
(gdb) bt
#0  exit (status=1) at exit.c:99
#1  0x00470c6e in assigned_dev_pci_read_config (d=0x259c6f0, address=64, len=4)
    at /usr/src/debug/qemu-kvm-0.11.0/hw/device-assignment.c:349
#2  0x0042419d in handle_io (vcpu=<value optimized out>)
    at /usr/src/debug/qemu-kvm-0.11.0/qemu-kvm.c:784
#3  kvm_run (vcpu=<value optimized out>)
    at /usr/src/debug/qemu-kvm-0.11.0/qemu-kvm.c:1017
#4  0x00424273 in kvm_cpu_exec (env=0x3f)
    at /usr/src/debug/qemu-kvm-0.11.0/qemu-kvm.c:1686
#5  0x00425856 in kvm_main_loop_cpu (env=0x255a150)
    at /usr/src/debug/qemu-kvm-0.11.0/qemu-kvm.c:1868
#6  ap_main_loop (env=0x255a150)
    at /usr/src/debug/qemu-kvm-0.11.0/qemu-kvm.c:1905
#7  0x0035aac06a3a in start_thread (arg=<value optimized out>)
    at pthread_create.c:297
#8  0x0035aa0ddf3d in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#9  0x0000000000000000 in ?? ()
(gdb) p assigned_dev_pci_read_config::address
$1 = 64
(gdb) p assigned_dev_pci_read_config::val
$2 = 0
(gdb) p assigned_dev_pci_read_config::len
$3 = 4
(gdb) p assigned_dev_pci_read_config::ret
$4 = <value optimized out>
(gdb) p assigned_dev_pci_read_config::fd
$5 = 13
(gdb) p assigned_dev_pci_read_config::pci_dev
$6 = (AssignedDevice *) 0x259c6f0
(gdb) p assigned_dev_pci_read_config::pci_dev->real_device
$7 = {bus = 0 '\000', dev = 0 '\000', func = 0 '\000', irq = 0,
  region_number = 7, regions = {{type = 512, valid = 1,
      base_addr = 4077142016, size = 16384, resource_fd = 15},
    {type = 0, valid = 0, base_addr = 0, size = 0, resource_fd = 0},
    {type = 0, valid = 0, base_addr = 0, size = 0, resource_fd = 0},
    {type = 512, valid = 1, base_addr = 4077273088, size = 16384,
      resource_fd = 16},
    {type = 0, valid = 0, base_addr = 0, size = 0, resource_fd = 0},
    {type = 0, valid = 0, base_addr = 0, size = 0, resource_fd = 0},
    {type = 0, valid = 0, base_addr = 0, size = 0, resource_fd = 0}},
  config_fd = 13}
(gdb) p assigned_dev_pci_read_config::d
$8 = (PCIDevice *) 0x259c6f0

So the function assigned_dev_pci_read_config fails to read the PCI 
configuration of the device, and the subsequent exit(1) call kills my guest. I 
don't know enough about the internals of KVM PCI device assignment, and I don't 
quite understand why this works when starting virt-manager from the command 
line but not when starting it from the GUI.

From the dmesg logs I would still guess that the problem is that pci-stub is 
not initialized properly, and perhaps this is also why the PCI read fails 
here? pci-stub logs 'enabling device', but I don't see any messages about 
enabling/assigning interrupts as I do when running from the command line.

Let me know if you need any further information.

Attached a list of virt packages I run under Fedora Core 12.

Thanks,
Anna





RE: pci-stub error and MSI-X for KVM guest

2010-01-04 Thread Fischer, Anna
 Subject: Re: pci-stub error and MSI-X for KVM guest
 
 * Fischer, Anna (anna.fisc...@hp.com) wrote:
   Subject: Re: pci-stub error and MSI-X for KVM guest
This works fine in principle and I can see the PCI device in the
guest under lspci. However, the 82576 VF driver requires the OS
to support MSI-X. My Fedora installation is configured with MSI-X,
e.g. CONFIG_PCI_MSI is 'y'. When I load the driver it tells me it
   cannot
initialize MSI-X for the device, and under /proc/interrupts I can
 see
that MSI-X does not seem to work. Is this a KVM/QEMU limitation?
 It
   works
for me when running the VF driver under a non-virtualized Linux
 system.
  
   No, this should work fine.  QEMU/KVM supports MSI-X to guest as well
 as
   VFs.
 
  Actually, I just got this to work. However, it only works if I call
  qemu-kvm from the command line, while it doesn't work when I start
  the guest via the virt-manager. So this seems to be an issue with
  Fedora's virt-manager rather than with KVM/QEMU. If I call qemu-kvm
  from the command line then I get the pci-stub messages saying 'irq xx
  for MSI/MSI-x' when the guest boots up and the VF device works just
 fine
  inside the guest. When I start the guest using virt-manager then I
 don't
  see any of these irq allocation messages from pci-stub. Any idea what
  the problem could be here?
 
 No, sounds odd.  Can you:
 
   # virsh dumpxml [domain]
 
 and show the output of the hostdev XML section?

<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x03' slot='0x10' function='0x3'/>
  </source>
</hostdev>

The device to assign is at 0000:03:10.3, dmesg shows:

pci-stub 0000:03:10.3: enabling device (0000 -> 0002)
assign device: host bdf = 3:10:3


 
Also, when I do an lspci on the KVM guest, that is fine, but when
 I
do an lspci -v then the guest crashes down. In the host OS under
 dmesg
I can see this:
   
pci-stub :03:10.0: restoring config space at offset 0x1 (was
   0x10, writing 0x14)
   
Is this a known issue? My qemu-kvm version is 2:0.11.0.
  
   No, I've not seen the crash before.  What do you mean the guest
 crashes
   down?
 
  So this also only happens when starting the guest using virt-manager.
 It
  works fine when starting qemu-kvm from the command line. This is weird
 as
  I call it with the same parameters as I can see virt-manager uses
 under
  'ps -ef | grep qemu'. The guest crashes down means that the QEMU
 process
  is terminated. I don't see anything in the logs. It just disappears.
 
 Ouch.  Can you do debuginfo-install qemu-system-x86 to get the debug
 packages, then attach gdb to the QEMU process so that when you do lspci
 -v
 in the guest (assuming this is QEMU segfaulting) you'll get a backtrace?

I don't know how I can tell virt-manager through the GUI to enable debug mode, 
e.g. call virt-manager with '-s'. From the command line I can attach gdb like 
this, but when running virt-manager from the GUI then I cannot connect to 
localhost:1234. However, the issues only arise when starting virt-manager from 
the GUI. I can't find the configuration option to somehow tell that I want it 
to be launched with '-s'?

 
   This looks like a Fedora specific version (rpm version).  Can you
 verify
   this is from Fedora packages vs. upstream source?  If it's Fedora,
   would be useful to open a bug there.
 
  Yes, I am using KVM/QEMU which ships with the Fedora Core 12
 distribution.
 
 OK, please file a bug there (and include the backtrace info).

I will file a bug once I get the full information. Currently my guess is 
actually that I might have package mismatches with libvirt, virt-manager or 
QEMU-related software. That is my only explanation for why it works from the 
command line but not from the GUI. Some path variables must be set differently, 
perhaps pointing to different libraries or packages; otherwise there is no way 
it can behave differently when calling virt-manager with exactly the same 
parameters...

Cheers,
Anna
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: pci-stub error and MSI-X for KVM guest

2010-01-04 Thread Fischer, Anna
 Subject: RE: pci-stub error and MSI-X for KVM guest
 
  Subject: Re: pci-stub error and MSI-X for KVM guest
 
  * Fischer, Anna (anna.fisc...@hp.com) wrote:
Subject: Re: pci-stub error and MSI-X for KVM guest
 This works fine in principle and I can see the PCI device in the
 guest under lspci. However, the 82576 VF driver requires the OS
 to support MSI-X. My Fedora installation is configured with MSI-
 X,
 e.g. CONFIG_PCI_MSI is 'y'. When I load the driver it tells me
 it
cannot
 initialize MSI-X for the device, and under /proc/interrupts I
 can
  see
 that MSI-X does not seem to work. Is this a KVM/QEMU limitation?
  It
works
 for me when running the VF driver under a non-virtualized Linux
  system.
   
No, this should work fine.  QEMU/KVM supports MSI-X to guest as
 well
  as
VFs.
  
   Actually, I just got this to work. However, it only works if I call
   qemu-kvm from the command line, while it doesn't work when I start
   the guest via the virt-manager. So this seems to be an issue with
   Fedora's virt-manager rather than with KVM/QEMU. If I call qemu-kvm
   from the command line then I get the pci-stub messages saying 'irq
 xx
   for MSI/MSI-x' when the guest boots up and the VF device works just
  fine
   inside the guest. When I start the guest using virt-manager then I
  don't
   see any of these irq allocation messages from pci-stub. Any idea
 what
   the problem could be here?
 
  No, sounds odd.  Can you:
 
# virsh dumpxml [domain]
 
  and show the output of the hostdev XML section?
 
 <hostdev mode='subsystem' type='pci' managed='yes'>
   <source>
     <address domain='0x0000' bus='0x03' slot='0x10' function='0x3'/>
   </source>
 </hostdev>
 
 The device to assign is at 0000:03:10.3, dmesg shows:
 
 pci-stub 0000:03:10.3: enabling device (0000 -> 0002)
 assign device: host bdf = 3:10:3

I forgot, here is the process that the virt-manager GUI creates, i.e. the one 
that does not work.

qemu  3072 1  4 11:26 ?00:00:33 /usr/bin/qemu-kvm -S -M pc-0.11 
-m 1024 -smp 1 -name FC10-2 -uuid b811b278-fae2-a3cc-d51d-8f5b078b2477 -monitor 
unix:/var/lib/libvirt/qemu/FC10-2.monitor,server,nowait -boot c -drive 
file=/var/lib/libvirt/images/FC10-2.img,if=virtio,index=0,boot=on -drive 
file=/home/af/Download/Fedora-12-x86_64-Live-KDE.iso,if=ide,media=cdrom,index=2 
-net none -serial pty -parallel none -usb -vnc 127.0.0.1:0 -k en-gb -vga cirrus 
-soundhw es1370 -pcidevice host=03:10.3

Note that this one does work from the command line, but not via the GUI.

For the debugging to work, I need the '-s' option to be added too...

Cheers,
Anna


pci-stub error and MSI-X for KVM guest

2009-12-21 Thread Fischer, Anna
I am running Fedora Core 12 with a 2.6.31 kernel. I use the Intel 82576 SR-IOV 
network card and want to assign its Virtual Functions (VFs) to separate KVM 
guests. My guests also run Fedora Core 12 with a 2.6.31 kernel. I use the 
latest igb driver in the host OS and load it with 2 VFs activated. Then I 
assign those to my KVM guests. I use virt-manager to do this which then takes 
care of configuring pci-stub.

This works fine in principle and I can see the PCI device in the guest under 
lspci. However, the 82576 VF driver requires the OS to support MSI-X. My Fedora 
installation is configured with MSI-X, e.g. CONFIG_PCI_MSI is 'y'. When I load 
the driver it tells me it cannot initialize MSI-X for the device, and under 
/proc/interrupts I can see that MSI-X does not seem to work. Is this a KVM/QEMU 
limitation? It works for me when running the VF driver under a non-virtualized 
Linux system.

Also, when I do an lspci on the KVM guest, that is fine, but when I do an lspci 
-v then the guest crashes down. In the host OS under dmesg I can see this:

pci-stub :03:10.0: restoring config space at offset 0x1 (was 0x10, 
writing 0x14)

Is this a known issue? My qemu-kvm version is 2:0.11.0.

Thanks,
Anna


RE: pci-stub error and MSI-X for KVM guest

2009-12-21 Thread Fischer, Anna
 Subject: Re: pci-stub error and MSI-X for KVM guest
 
 * Fischer, Anna (anna.fisc...@hp.com) wrote:
  I am running Fedora Core 12 with a 2.6.31 kernel. I use the Intel
  82576 SR-IOV network card and want to assign its Virtual Functions
 (VFs)
  to separate KVM guests. My guests also run Fedora Core 12 with a
 2.6.31
  kernel. I use the latest igb driver in the host OS and load it with 2
  VFs activated. Then I assign those to my KVM guests. I use virt-
 manager
  to do this which then takes care of configuring pci-stub.
 
 By 2.6.31 are you referring to the stock Fedora 12 kernel package?

Yes.


  This works fine in principle and I can see the PCI device in the
  guest under lspci. However, the 82576 VF driver requires the OS
  to support MSI-X. My Fedora installation is configured with MSI-X,
  e.g. CONFIG_PCI_MSI is 'y'. When I load the driver it tells me it
 cannot
  initialize MSI-X for the device, and under /proc/interrupts I can see
  that MSI-X does not seem to work. Is this a KVM/QEMU limitation? It
 works
  for me when running the VF driver under a non-virtualized Linux system.
 
 No, this should work fine.  QEMU/KVM supports MSI-X to guest as well as
 VFs.

Actually, I just got this to work. However, it only works if I call qemu-kvm 
from the command line, while it doesn't work when I start the guest via the 
virt-manager. So this seems to be an issue with Fedora's virt-manager rather 
than with KVM/QEMU. If I call qemu-kvm from the command line then I get the 
pci-stub messages saying 'irq xx for MSI/MSI-x' when the guest boots up and the 
VF device works just fine inside the guest. When I start the guest using 
virt-manager then I don't see any of these irq allocation messages from 
pci-stub. Any idea what the problem could be here?

 
  Also, when I do an lspci on the KVM guest, that is fine, but when I
  do an lspci -v then the guest crashes down. In the host OS under dmesg
  I can see this:
 
  pci-stub :03:10.0: restoring config space at offset 0x1 (was
 0x10, writing 0x14)
 
  Is this a known issue? My qemu-kvm version is 2:0.11.0.
 
 No, I've not seen the crash before.  What do you mean the guest crashes
 down?

So this also only happens when starting the guest using virt-manager. It works 
fine when starting qemu-kvm from the command line. This is weird as I call it 
with the same parameters as I can see virt-manager uses under 'ps -ef | grep 
qemu'. The guest crashes down means that the QEMU process is terminated. I 
don't see anything in the logs. It just disappears.

 
 This looks like a Fedora specific version (rpm version).  Can you verify
 this is from Fedora packages vs. upstream source?  If it's Fedora,
 would be useful to open a bug there.

Yes, I am using KVM/QEMU which ships with the Fedora Core 12 distribution.

Thanks for your help,
Anna


KVM for Linux 2.6.16?

2009-07-09 Thread Fischer, Anna
Hi, I am trying to compile the kvm-87 module for Linux 2.6.16. I thought that 
it has been back-ported to such an old kernel. However, I don't seem to be able 
to compile the module on my kernel. I get the following error:

  CCtsc2005.o
  CCscsi-disk.o
  CCcdrom.o
  CCscsi-generic.o
  CCusb.o
  CCusb-hub.o
  CCusb-linux.o
In file included from usb-linux.c:41:
/usr/include/linux/usbdevice_fs.h:49: error: expected ':', ',', ';', '}' or 
'__attribute__' before '*' token
/usr/include/linux/usbdevice_fs.h:56: error: expected ':', ',', ';', '}' or 
'__attribute__' before '*' token
/usr/include/linux/usbdevice_fs.h:66: error: expected ':', ',', ';', '}' or 
'__attribute__' before '*' token
/usr/include/linux/usbdevice_fs.h:100: error: expected ':', ',', ';', '}' or 
'__attribute__' before '*' token
/usr/include/linux/usbdevice_fs.h:116: error: expected ':', ',', ';', '}' or 
'__attribute__' before '*' token
usb-linux.c: In function 'async_complete':
usb-linux.c:271: error: 'struct usbdevfs_urb' has no member named 
'actual_length'
usb-linux.c: In function 'usb_host_handle_data':
usb-linux.c:464: error: 'struct usbdevfs_urb' has no member named 'buffer'
usb-linux.c:465: error: 'struct usbdevfs_urb' has no member named 
'buffer_length'
usb-linux.c:471: error: 'struct usbdevfs_urb' has no member named 
'number_of_packets'
usb-linux.c:472: error: 'struct usbdevfs_urb' has no member named 
'iso_frame_desc'
usb-linux.c:478: error: 'struct usbdevfs_urb' has no member named 'usercontext'
usb-linux.c: In function 'usb_host_handle_control':
usb-linux.c:598: error: 'struct usbdevfs_urb' has no member named 'buffer'


Is KVM not supposed to work on 2.6.16?

Cheers,
Anna


RE: KVM for Linux 2.6.16?

2009-07-09 Thread Fischer, Anna
 Subject: Re: KVM for Linux 2.6.16?
 
 On Thu, 2009-07-09 at 16:49 +, Fischer, Anna wrote:
  Hi, I am trying to compile the kvm-87 module for Linux 2.6.16. I
 thought that it has been back-ported to such an old kernel. However, I
 don't seem to be able to compile the module on my kernel. I get the
 following error:
 
CCtsc2005.o
CCscsi-disk.o
CCcdrom.o
CCscsi-generic.o
CCusb.o
CCusb-hub.o
CCusb-linux.o
  In file included from usb-linux.c:41:
  /usr/include/linux/usbdevice_fs.h:49: error: expected ':', ',', ';',
 '}' or '__attribute__' before '*' token
  /usr/include/linux/usbdevice_fs.h:56: error: expected ':', ',', ';',
 '}' or '__attribute__' before '*' token
  /usr/include/linux/usbdevice_fs.h:66: error: expected ':', ',', ';',
 '}' or '__attribute__' before '*' token
  /usr/include/linux/usbdevice_fs.h:100: error: expected ':', ',', ';',
 '}' or '__attribute__' before '*' token
  /usr/include/linux/usbdevice_fs.h:116: error: expected ':', ',', ';',
 '}' or '__attribute__' before '*' token
  usb-linux.c: In function 'async_complete':
  usb-linux.c:271: error: 'struct usbdevfs_urb' has no member named
 'actual_length'
  usb-linux.c: In function 'usb_host_handle_data':
  usb-linux.c:464: error: 'struct usbdevfs_urb' has no member named
 'buffer'
  usb-linux.c:465: error: 'struct usbdevfs_urb' has no member named
 'buffer_length'
  usb-linux.c:471: error: 'struct usbdevfs_urb' has no member named
 'number_of_packets'
  usb-linux.c:472: error: 'struct usbdevfs_urb' has no member named
 'iso_frame_desc'
  usb-linux.c:478: error: 'struct usbdevfs_urb' has no member named
 'usercontext'
  usb-linux.c: In function 'usb_host_handle_control':
  usb-linux.c:598: error: 'struct usbdevfs_urb' has no member named
 'buffer'
 
 
  Is KVM not supposed to work on 2.6.16?
 Hi Anna,
 
 I'm afraid that I have some bad news for you. Usually KVM versions are
 tailored to kernel versions contemporary with them. Version 87 is
 supposed to need 2.6.26 kernels and newer, IIRC. So for your 2.6.16 you
 should try some of the incipient KVM versions, and if you are lucky
 enough, they might work.

So if I run an ancient Linux kernel, then I can only run with an ancient KVM 
version? I thought the code was kept backwards compatible to some extent?

RE: Network throughput limits for local VM - VM communication

2009-06-17 Thread Fischer, Anna
 Subject: Re: Network throughput limits for local VM - VM
 communication
 
 Fischer, Anna wrote:
  Not sure I understand. As far as I can see the packets are replicated
 on the tun/tap interface before they actually enter the bridge. So this
 is not about the bridge learning MAC addresses and flooding frames to
 unknown destinations. So I think this is different.
 
 
 Okay.
 
 You said:
 
  However, without VLANs, the tun
  interface will pass packets to all tap interfaces. It has to, as it
  doesn't know to which one the packet has to go to.
 
 Well, it shouldn't.  The tun interface should pass the packets to just
 one tap interface.
 
 Can you post the qemu command line you're using?  There's a gotcha
 there
 that can result in what you're seeing.

Sorry for the late reply on this issue. The command line I am using looks 
roughly like this:

/usr/bin/qemu-system-x86_64 -m 1024 -smp 2 -name FC10-2 -uuid 
b811b278-fae2-a3cc-d51d-8f5b078b2477 -boot c -drive 
file=,if=ide,media=cdrom,index=2 -drive 
file=/var/lib/libvirt/images/FC10-2.img,if=virtio,index=0,boot=on -net 
nic,macaddr=54:52:00:11:ae:79,model=e1000 -net tap -net 
nic,macaddr=54:52:00:11:ae:78,model=e1000 -net tap -serial pty -parallel none 
-usb -vnc 127.0.0.1:2 -k en-gb -soundhw es1370

This is my routing VM that has two network interfaces and routes packets 
between two subnets. It has one interface plugged into bridge virbr0 and the 
other interface is plugged into virbr1:

brctl show
bridge name bridge id   STP enabled interfaces
virbr0  8000.8ac1d18c63ec   no  vnet0
vnet1
virbr1  8000.2ebfcbb9ed70   no  vnet2
vnet3

If I use the e1000 virtual NIC model, I see performance drop significantly 
compared to using virtio_net. However, with virtio_net I have the network 
stalling after a few seconds of high-throughput traffic (as I mentioned in my 
previous post). Just to reiterate my scenario: I run three guests on the same 
physical machine, one guest is my routing VM that is routing IP network traffic 
between the other two guests.

I am also wondering about the fact that I do not seem to get CPU utilization 
maxed out in this case while throughput does not go any higher. I do not 
understand what is stopping KVM from using more CPU for guest I/O processing? 
There is nothing else running on my machine. I have analyzed the amount of CPU 
that each KVM thread is using, and I can see that the thread that is running 
the VCPU of the routing VM which is processing interrupts of the e1000 virtual 
network card is using the highest amount of CPU. Is there any way that I can 
optimize my network set-up? Maybe some specific configuration of the e1000 
driver within the guest? Are there any known issues with this?

I also see very different CPU utilization and network throughput figures when 
pinning threads to CPU cores using taskset. At one point I managed to double 
the throughput, but I could not reproduce that setup for some reason. What are 
the major issues that I would need to pay attention to when pinning threads to 
cores in order to optimize my specific set-up so that I can achieve better 
network I/O performance?

Thanks for your help.

Anna


RE: Network throughput limits for local VM - VM communication

2009-06-17 Thread Fischer, Anna
 Subject: Re: Network throughput limits for local VM - VM
 communication
 
 On 06/17/2009 10:36 AM, Fischer, Anna wrote:
 
  /usr/bin/qemu-system-x86_64 -m 1024 -smp 2 -name FC10-2 -uuid
 b811b278-fae2-a3cc-d51d-8f5b078b2477 -boot c -drive
 file=,if=ide,media=cdrom,index=2 -drive
 file=/var/lib/libvirt/images/FC10-2.img,if=virtio,index=0,boot=on -net
 nic,macaddr=54:52:00:11:ae:79,model=e1000 -net tap -net
 nic,macaddr=54:52:00:11:ae:78,model=e1000 -net tap  -serial pty -
 parallel none -usb -vnc 127.0.0.1:2 -k en-gb -soundhw es1370
 
 
 
 Okay, like I suspected, qemu has a trap here and you walked into it.
 The -net option plugs the device you specify into a virtual hub.  The
 command line you provided plugs the two virtual NICs and the two tap
 devices into one virtual hub, so any packet received from any of the
 four clients will be propagated to the other three.
 
 To get this to work right, specify the vlan= parameter which says which
 virtual hub a component is plugged into.  Note this has nothing to do
 with 802.blah vlans.
 
 So your command line should look like
 
 qemu ... -net nic,...,vlan=0 -net tap,...,vlan=0 -net
 nic,...,vlan=1
 -net tap,...,vlan=1
 
 This will give you two virtual hubs, each bridging a virtual nic to a
 tap device.
 
  This is my routing VM that has two network interfaces and routes
 packets between two subnets. It has one interface plugged into bridge
 virbr0 and the other interface is plugged into virbr1:
 
  brctl show
  bridge name bridge id   STP enabled interfaces
  virbr0  8000.8ac1d18c63ec   no  vnet0
   vnet1
  virbr1  8000.2ebfcbb9ed70   no  vnet2
   vnet3
 
 
 Please redo the tests with qemu vlans but without 802.blah vlans, so we
 see what happens without packet duplication.

Avi, thanks for your quick reply. I do use the vlan= parameter now, and yes, I 
do not see packet duplication any more, so everything you said is right and I 
do understand now why I was seeing packets on both bridges before. So this has 
nothing to do with tun/tap then but just with the way QEMU virtual hubs work. 
I didn't know about any details on that before.

Even with vlan= enabled, I am still having the same issues with weird CPU 
utilization and low throughput that I have described below.
 
  If I use the e1000 virtual NIC model, I see performance drop
 significantly compared to using virtio_net. However, with virtio_net I
 have the network stalling after a few seconds of high-throughput
 traffic (as I mentioned in my previous post). Just to reiterate my
 scenario: I run three guests on the same physical machine, one guest is
 my routing VM that is routing IP network traffic between the other two
 guests.
 
  I am also wondering about the fact that I do not seem to get CPU
 utilization maxed out in this case while throughput does not go any
 higher. I do not understand what is stopping KVM from using more CPU
 for guest I/O processing? There is nothing else running on my machine.
 I have analyzed the amount of CPU that each KVM thread is using, and I
 can see that the thread that is running the VCPU of the routing VM
 which is processing interrupts of the e1000 virtual network card is
 using the highest amount of CPU. Is there any way that I can optimize
 my network set-up? Maybe some specific configuration of the e1000
 driver within the guest? Are there any known issues with this?
 
 
 There are known issues with lack of flow control while sending packets
 out of a guest.  If the guest runs tcp that tends to correct for it,
 but
 if you run a lower level protocol that doesn't have its own flow
 control, the guest may spend a lot of cpu generating packets that are
 eventually dropped.  We are working on fixing this.

For the tests I run now (with vlan= enabled) I am actually using both TCP and 
UDP, and I see the problem with virtio_net for both protocols. What I am 
wondering about though is that I do not seem to have any problems if I 
communicate directly between the two guests (if I plug then into the same 
bridge and put them onto the same networks), so why do I only see the problem 
of stalling network communication when there is a routing VM in the network 
path? Is this just because the system is even more overloaded in that case? Or 
could this be an issue related to a dual NIC configuration or the fact that I 
run multiple bridges on the same physical machine?

When you say "We are working on fixing this" - which code parts are you 
working on? Is this in the QEMU network I/O processing code or is this 
virtio_net related?

  I also see very different CPU utilization and network throughput
 figures when pinning threads to CPU cores using taskset. At one point I
 managed to double the throughput, but I could not reproduce that setup
 for some reason. What are the major issues that I would

RE: Network throughput limits for local VM - VM communication

2009-06-17 Thread Fischer, Anna
 Subject: Re: Network throughput limits for local VM - VM
 communication
 
 On 06/17/2009 11:12 AM, Fischer, Anna wrote:
 
  For the tests I run now (with vlan= enabled) I am actually using both
 TCP and UDP, and I see the problem with virtio_net for both protocols.
 What I am wondering about though is that I do not seem to have any
 problems if I communicate directly between the two guests (if I plug
 them into the same bridge and put them onto the same networks), so why
 do I only see the problem of stalling network communication when there
 is a routing VM in the network path? Is this just because the system is
 even more overloaded in that case? Or could this be an issue related to
 a dual NIC configuration or the fact that I run multiple bridges on the
 same physical machine?
 
 
  My guess is that somewhere there's a queue that's shorter than the
 virtio queue, or its usable size fluctuates (because it is shared with
 something else).  So TCP flow control doesn't work, and UDP doesn't
 have
 a chance.
 
  When you say "We are working on fixing this" - which code parts are
 you working on? Is this in the QEMU network I/O processing code or is
 this virtio_net related?
 
 
 tap. virtio, qemu, maybe more.  It's a difficult problem.
 
  Retry with the fixed configuration? You mean setting the vlan=
 parameter? I have already used the vlan= parameter for the latest
 tests, and so the CPU utilization issues I am talking about are
 happening with that configuration.
 
 
 Yeah.
 
 Can you compare total data sent and received as seen by the guests?
 That would confirm that packets being dropped causes the slowdown.

Yes, I will check on that and report back. 

It still does not answer my question on why I only see low CPU utilization 
numbers with the e1000 virtual device model. There is no network stalling or 
packet drops or any other obvious issues when running with that model, but I am 
still seeing low CPU utilization numbers. What is preventing KVM here to use 
more of the host CPU capacity when the host is not doing anything else but run 
virtual machines? Is there any way that I can get higher CPU utilization out of 
KVM?

Thanks,
Anna

RE: Network throughput limits for local VM - VM communication

2009-06-11 Thread Fischer, Anna
 Subject: Re: Network throughput limits for local VM - VM
 communication
 
 On Wednesday 10 June 2009, Fischer, Anna wrote:
   Have you tried eliminating VLAN to simplify the setup?
 
  No - but there is a relating bug in the tun/tap interface (well, it
 is not
  really a bug but simply the way tun/tap works) that will cause
 packets to
  be replicated on all the tap interfaces (across all bridges attached
 to
  those) if I do not configure VLANs. This will result in a system that
 is
  even more overloaded. I had discovered this a while back when running
  UDP stress tests under 10G.
 
 Not sure I understand. Do you mean you have all three guests connected
 to the same bridge? If you want the router guest to be the only
 connection,
 you should not connect the two bridges anywhere else, so I don't see
 how packets can go from one bridge to the other one, except through the
 router.

I am using two bridges, and yes, in theory, the router should be the only 
connection between the two guests. However, without VLANs, the tun interface 
will pass packets to all tap interfaces. It has to, as it doesn't know which 
one the packet has to go to. It does not look at packets; it simply copies 
buffers from userspace to the tap interface in the kernel. The tap interface 
then eventually drops the packet if the MAC address does not match its own. So 
packets will not actually go across both bridges, because the tap interface 
that should not receive the packet does drop it. However, it does receive the 
packet and processes it to some extent, which causes some overhead. As I was 
told by someone at KVM/Red Hat, this does not happen when using VLANs, as then 
there is a direct mapping between each tun and tap device and so no packet 
replication across multiple tap devices.
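For reference, the 1:1 tun-tap mapping is what qemu-kvm's -net vlan= parameter gives you (a sketch; image names, MACs and tap names are placeholders, and note that QEMU "vlans" are internal virtual hubs, not 802.1Q VLANs):

```shell
# Each QEMU-internal "vlan" number acts as a private hub, so pairing one
# virtual NIC with one tap on the same vlan gives a 1:1 mapping and avoids
# the replication seen when everything shares vlan 0 (the default).
qemu-kvm -hda router.img -m 512 \
    -net nic,vlan=0,macaddr=52:54:00:12:34:56 \
    -net tap,vlan=0,ifname=tap0,script=no \
    -net nic,vlan=1,macaddr=52:54:00:12:34:57 \
    -net tap,vlan=1,ifname=tap1,script=no

# tap0 is then enslaved to the first bridge and tap1 to the second:
brctl addif bridge0 tap0
brctl addif bridge1 tap1
```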
 

   Does it change when the guests communicate over a -net socket
 interface
   with your router instead of the -net tap + bridge in the host?
 
  I have not tried this - I need the bridge in the network data path
 for
  some testing, so using the -net socket interface would not solve my
 problem.
 
 I did not mean this to solve your problem but to hunt down the bug.
 If the problem only exists with the host bridge device, we should look
 there, but if it persists, we can probably rule out the tap, bridge and
 vlan
 code in the host as the problem source.

Yes, I understand you were trying to help, and using the -net socket interface 
would help to narrow down where the problem lies. I just have not yet managed 
to set this up, but I might do so if I find the time in the next few days. I 
was hoping that other people might have seen the same issues I see, but 
unfortunately I did not get many replies/suggestions on this issue from the 
list.
 

  However, I have just today managed to get around this bug by using
 the
  e1000 QEMU emulated NIC model and this seems to do the trick. Now the
  throughput is still very low, but that might simply be because my
 system
  is too weak. When using the e1000 model instead of rtl8139 or virtio,
  I do not have any network crashes any more.
 
 That could either indicate a bug in rtl8139 and virtio, or that the
 specific timing of the e1000 model hides this bug.
 
 What happens if only one side uses e1000 while the other still uses
 virtio? What about any of the other models?

Good question. I will try this out and post the results.

Cheers,
Anna
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: Network throughput limits for local VM - VM communication

2009-06-11 Thread Fischer, Anna
 Subject: Re: Network throughput limits for local VM - VM
 communication
 
 Fischer, Anna wrote:
  I am using two bridges, and yes, in theory, the router should be the
 only connection between the two guests. However, without VLANs, the tun
 interface will pass packets to all tap interfaces. It has to, as it
 doesn't know which one the packet has to go to. It does not look at
 packets; it simply copies buffers from userspace to the tap interface
 in the kernel. The tap interface then eventually drops the packet if
 the MAC address does not match its own. So packets will not actually go
 across both bridges, because the tap interface that should not receive
 the packet does drop it. However, it does receive the packet and
 processes it to some extent, which causes some overhead. As I was told
 by someone at KVM/Red Hat, this does not happen when using VLANs, as
 then there is a direct mapping between each tun and tap device and so
 no packet replication across multiple tap devices.
 
 
 This only happens if the receiving tap never sends out packets.  If the
 tap interface does send out packets, the bridge will associate their
 MAC
 address with that interface, and future packets will only be forwarded
 there.
 
 Is this your scenario?

Not sure I understand. As far as I can see, the packets are replicated on the 
tun/tap interface before they actually enter the bridge, so this is not about 
the bridge learning MAC addresses and flooding frames to unknown destinations. 
I think this is a different issue.
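One way to tell the two effects apart is to look at the bridge's forwarding database and the tap counters while traffic is flowing (a sketch; bridge and tap names are placeholders):

```shell
# If the guest MACs appear here with "is local?" set to no, the bridge has
# learned them and will forward, not flood, subsequent frames to one port.
brctl showmacs bridge0
brctl showmacs bridge1

# Tap-level replication, by contrast, shows up as RX counters climbing on
# taps that should never see the traffic:
ifconfig tap0
ifconfig tap1
```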

Anna



RE: Network throughput limits for local VM - VM communication

2009-06-10 Thread Fischer, Anna
 Subject: Re: Network throughput limits for local VM - VM
 communication
 
 On Tuesday 09 June 2009, Fischer, Anna wrote:
 
  I have tried using virtio and using the emulated QEMU virtual NICs.
  It does not make a difference. It seems as if there is an overflow
 somewhere
  when QEMU/virtio cannot cope with the network load any more, and then
 the
  virtual interfaces don't seem to transmit anything anymore. It seems
 to
  mostly work again when I shut down and start up the interfaces of the
 router
  inside of the guest. I use two bridges (and VLANs) that pass packets
 between
  sending/receiving guests and the routing guest. The set-up works fine
 for
  simple ping and other communication that is low-throughput type
 traffic.
 
 Have you tried eliminating VLAN to simplify the setup?

No - but there is a related bug in the tun/tap interface (well, it is not 
really a bug but simply the way tun/tap works) that causes packets to be 
replicated on all the tap interfaces (across all bridges attached to those) if 
I do not configure VLANs. This results in a system that is even more 
overloaded. I discovered this a while back when running UDP stress tests at 
10G.

 
 Does it change when the guests communicate over a -net socket interface
 with your router instead of the -net tap + bridge in the host?

I have not tried this - I need the bridge in the network data path for some 
testing, so using the -net socket interface would not solve my problem.

However, I have just today managed to get around this bug by using the e1000 
QEMU emulated NIC model, and this seems to do the trick. The throughput is 
still very low, but that might simply be because my system is too weak. When 
using the e1000 model instead of rtl8139 or virtio, I do not get any network 
crashes any more.

I have been looking through the Red Hat bug lists for hints today, and it 
seems that a lot of people see the network under KVM break down under heavy 
load. I think this needs further investigation. I can provide a more detailed 
system set-up etc.; it should be easy to reproduce this.

Thanks for your help,
Anna


Network throughput limits for local VM - VM communication

2009-06-09 Thread Fischer, Anna
I am testing network throughput between two guests residing on the same 
physical machine. I use a bridge to pass packets between those guests and the 
virtio NIC model. I am wondering why the throughput only goes up to about 
970Mbps. Should we not be able to achieve much higher throughput if the packets 
do not actually go out on the physical wire? What are the limitations on 
throughput performance under KVM/virtio? I can see that by default the 
interfaces (the tap devices) have the TX queue length set to 500, and I wonder 
if increasing this would make any difference. Also, are there other things I 
would need to consider to achieve higher throughput numbers for local guest - 
guest communication? The CPU is not maxed out at all, and shows as being idle 
for most of the time while the throughput does not increase any more.
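For what it's worth, the tap TX queue length can be raised on a live device like this (a sketch; the tap name is a placeholder, and whether it helps depends on where the bottleneck actually is):

```shell
# The default for tap devices is 500; raise it and re-run the benchmark.
ifconfig tap0 txqueuelen 1000
# or, equivalently, with iproute2:
ip link set dev tap0 txqueuelen 1000

# Verify the new queue length:
ip link show dev tap0 | grep qlen
```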

I run KVM under standard Fedora Core 10 with a Linux kernel 2.6.27.

Thanks,
Anna



RE: Network I/O performance

2009-05-20 Thread Fischer, Anna
 Subject: Re: Network I/O performance
 
 Fischer, Anna wrote:
  Subject: Re: Network I/O performance
 
  Fischer, Anna wrote:
 
  I am running KVM with Fedora Core 8 on a 2.6.23 32-bit kernel. I
 use
 
  the tun/tap device model and the Linux bridge kernel module to
 connect
  my VM to the network. I have 2 10G Intel 82598 network devices (with
  the ixgbe driver) attached to my machine and I want to do packet
  routing in my VM (the VM has two virtual network interfaces
  configured). Analysing the network performance of the standard QEMU
  emulated NICs, I get less than 1G of throughput on those 10G links.
  Surprisingly though, I don't really see CPU utilization being maxed
  out. This is a dual core machine, and mpstat shows me that both CPUs
  are about 40% idle. My VM is more or less unresponsive due to the
 high
  network processing load while the host OS still seems to be in good
  shape. How can I best tune this setup to achieve best possible
  performance with KVM? I know there is virtIO and I know there is PCI
  pass-through, but those models are not an option for me right now.
 
  How many cpus are assigned to the guest?  If only one, then 40% idle
  equates to 100% of a core for the guest and 20% for housekeeping.
 
 
  No, the machine has a dual core CPU and I have configured the guest
 with 2 CPUs. So I would want to see KVM using up to 200% of CPU,
 ideally. There is nothing else running on that machine.
 
 
 Well, it really depends on the workload, whether it can utilize both
 vcpus.
 
 
 
  If this is the case, you could try pinning the vcpu thread (info
 cpus
  from the monitor) to one core.  You should then see 100%/20% cpu
 load
  distribution.
 
  wrt emulated NIC performance, I'm guessing you're not doing tcp?  If
  you
  were we might do something with TSO.
 
 
  No, I am measuring UDP throughput performance. I have now tried using
 a different NIC model, and the e1000 model seems to achieve slightly
 better performance (CPU goes up to 110% only though). I have also been
 running virtio now, and while its performance with 2.6.20 was very poor
 too, when changing the guest kernel to 2.6.30, I get a reasonable
 performance and higher CPU utilization (e.g. it goes up to 180-190%). I
 have to throttle the incoming bandwidth though, because as soon as I go
 over a certain threshold, CPU goes back down to 90% and throughput goes
 down too.
 
 
 Yes, there's a known issue with UDP, where we don't report congestion
 and the queues start dropping packets.  There's a patch for tun queued
 for the next merge window; you'll need a 2.6.31 host for that IIRC
 (Herbert?)
 
  I have not seen this with Xen/VMware where I mostly managed to max
 out CPU completely before throughput performance did not go up anymore.
 
  I have also realized that when using the tun/tap configuration with a
 bridge, packets are replicated on all tap devices when QEMU writes
 packets to the tun interface. I guess this is a limitation of tun/tap
 as it does not know to which tap device the packet has to go to. The
 tap device then eventually drops packets when the destination MAC is
 not its own, but it still receives the packet which causes more
 overhead in the system overall.
 
 
 Right, I guess you'd see this with a real switch as well?  Maybe have
 your guest send a packet out once in a while so the bridge can learn
 its
 MAC address (we do this after migration, for example).

No, this is not about the bridge - packets are replicated by tun/tap as far as 
I can see. In fact I run two bridges and attach my two tap interfaces to those 
(one tap per bridge to connect it to the external network). And packets that 
should actually only go to one bridge are replicated on the other one, too. 
This is far from ideal, but I guess the issue is that the tun/tap interface is 
a 1:N mapping, so there is not much you can do.


RE: tun/tap and Vlans

2009-05-20 Thread Fischer, Anna
 Subject: Re: tun/tap and Vlans
 
 Lukas Kolbe wrote:
  Right, I guess you'd see this with a real switch as well?  Maybe
 have
  your guest send a packet out once in a while so the bridge can learn
 its
  MAC address (we do this after migration, for example).
 
 
  Does this mean that it is not possible for having each tun device in
 a
  seperate bridge that serves a seperate Vlan? We have experienced a
  strange problem that we couldn't yet explain. Given this setup:
 
   Guest                 Host
   kvm1 --- eth0 -+- bridge0 --- vlan1 \
                  |                     +-- eth0
   kvm2 -+- eth0 -/                    /
         \- eth1 --- bridge1 --- vlan2 +
 
  When sending packets through kvm2/eth0, they appear on both bridges
 and
  also vlans, also when sending packets through kvm2/eth1. When the
 guest
  has only one interface, the packets only appear on one bridge and one
  vlan as it's supposed to be.
 
  Can this be worked around?
 
 
 This is strange.  Can you post the command line you used to start kvm2?

This is exactly my scenario as well. 

When QEMU sends packets coming from a VM through the tun interface, those will 
be passed to both tap devices of that VM, simply because it doesn't know where 
to send the packet. It just copies the buffer to the tap interface. The tap 
interface then eventually discards the packet if the MAC address doesn't match 
its own.

What you would need is a 1:1 mapping, e.g. one tun interface per tap device. 



KVM VT-d2?

2009-05-14 Thread Fischer, Anna
Does KVM already take advantage of Intel VT-d2 features, e.g. interrupt 
remapping support? Has anyone verified how it improves interrupt delivery for 
PCI pass-through devices?

Thanks,
Anna




RE: KVM VT-d2?

2009-05-14 Thread Fischer, Anna
I thought that one use case of VT-d2 interrupt remapping was to safely and 
more efficiently deliver interrupts to the CPU that runs the particular VCPU 
of the guest that owns the I/O device issuing the interrupt. Shouldn't there 
at least be some performance (e.g. latency) improvement from doing the 
remapping and checking in hardware with a predefined table, rather than 
multiplexing this in software in the hypervisor layer?

 -Original Message-
 From: Kay, Allen M [mailto:allen.m@intel.com]
 Sent: 14 May 2009 15:02
 To: Fischer, Anna; kvm@vger.kernel.org
 Subject: RE: KVM  VT-d2?
 
 We have verified that VT-d2 features work with PCI passthrough on KVM.  To
 enable it, you need to turn on interrupt remapping in the kernel config.
 
 Interrupt remapping is a security/isolation feature where interrupt
 delivery is qualified against the device's bus/device/function in the
 interrupt remapping table entry when source ID checking is turned on.  It
 does not directly inject interrupts into the guest OS.
 
 -Original Message-
 From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On
 Behalf Of Fischer, Anna
 Sent: Thursday, May 14, 2009 2:53 PM
 To: kvm@vger.kernel.org
 Subject: KVM  VT-d2?
 
 Does KVM already take advantage of Intel VT-d2 features, e.g. interrupt
 remapping support? Has anyone verified how it improves interrupt
 delivery for PCI pass-through devices?
 
 Thanks,
 Anna
 
 


virtio_net with RSS?

2009-05-14 Thread Fischer, Anna
Are there any plans to enhance virtio_net with receive-side scaling 
capabilities, so that an SMP guest OS can balance its network processing load 
more equally across multiple CPUs?

Thanks,
Anna


PCI pass-through of multi-function device

2009-05-14 Thread Fischer, Anna
Does KVM allow passing through a full multi-function PCI device to a guest and 
making that device appear as a whole multi-function device, rather than as 
multiple single-function PCI devices (e.g. Xen only does the latter, where all 
PCI devices appear with function ID 0 in the guest)?

Thanks,
Anna


RE: Network I/O performance

2009-05-13 Thread Fischer, Anna
 Subject: Re: Network I/O performance
 
 Fischer, Anna wrote:
  I am running KVM with Fedora Core 8 on a 2.6.23 32-bit kernel. I use
 the tun/tap device model and the Linux bridge kernel module to connect
 my VM to the network. I have 2 10G Intel 82598 network devices (with
 the ixgbe driver) attached to my machine and I want to do packet
 routing in my VM (the VM has two virtual network interfaces
 configured). Analysing the network performance of the standard QEMU
 emulated NICs, I get less than 1G of throughput on those 10G links.
 Surprisingly though, I don't really see CPU utilization being maxed
 out. This is a dual core machine, and mpstat shows me that both CPUs
 are about 40% idle. My VM is more or less unresponsive due to the high
 network processing load while the host OS still seems to be in good
 shape. How can I best tune this setup to achieve best possible
 performance with KVM? I know there is virtIO and I know there is PCI
 pass-through, but those models are not an option for me right now.
 
 
 How many cpus are assigned to the guest?  If only one, then 40% idle
 equates to 100% of a core for the guest and 20% for housekeeping.

No, the machine has a dual core CPU and I have configured the guest with 2 
CPUs. So I would want to see KVM using up to 200% of CPU, ideally. There is 
nothing else running on that machine.
 
 If this is the case, you could try pinning the vcpu thread (info cpus
 from the monitor) to one core.  You should then see 100%/20% cpu load
 distribution.
 
 wrt emulated NIC performance, I'm guessing you're not doing tcp?  If
 you
 were we might do something with TSO.

No, I am measuring UDP throughput performance. I have now tried using a 
different NIC model, and the e1000 model seems to achieve slightly better 
performance (CPU goes up to 110% only though). I have also been running virtio 
now, and while its performance with 2.6.20 was very poor too, when changing the 
guest kernel to 2.6.30, I get a reasonable performance and higher CPU 
utilization (e.g. it goes up to 180-190%). I have to throttle the incoming 
bandwidth though, because as soon as I go over a certain threshold, CPU goes 
back down to 90% and throughput goes down too. 

I have not seen this with Xen/VMware, where I mostly managed to max out the 
CPU completely before throughput stopped increasing.

I have also realized that when using the tun/tap configuration with a bridge, 
packets are replicated on all tap devices when QEMU writes packets to the tun 
interface. I guess this is a limitation of tun/tap, as it does not know which 
tap device the packet has to go to. The tap device then eventually drops 
packets when the destination MAC is not its own, but it still receives the 
packet, which causes more overhead in the system overall.

I have not yet experimented much with pinning VCPU threads to cores. I will do 
that as well.
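The pinning can be done with taskset once the vcpu thread IDs are known (a sketch; the thread IDs shown are placeholders taken from 'info cpus' in the QEMU monitor, and the core numbers are arbitrary):

```shell
# In the QEMU monitor, 'info cpus' prints one thread_id per vcpu, e.g.:
#   * CPU #0: ... thread_id=4321
#     CPU #1: ... thread_id=4322

# Pin vcpu 0 to core 0 and vcpu 1 to core 1:
taskset -pc 0 4321
taskset -pc 1 4322

# Verify that the affinity mask took effect:
taskset -p 4321
taskset -p 4322
```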



RE: Problem doing pci passthrough of the network card without VT-d

2009-05-13 Thread Fischer, Anna
Are you expecting this to work using the 1:1 mapping for direct device 
assignment? I use a similar setup (e.g. dma=none and no VT-d) but a different 
NIC (Intel 82598 10G) and a different driver (ixgbe). I see the same messages, 
but also don't get the device to work in the guest (while it does work in the 
host OS). In fact I don't get any errors on the guest side, so it is hard to 
track down what is wrong. No I/O is happening. The guest cannot transmit/receive 
any packets to/from those NICs. The interface packet counters stay at 0.

I see an error in QEMU saying invalid memtype, and it also seems to have 
trouble assigning IRQs. assigned_dev_enabled_msix() fails with Invalid 
Argument, but on the guest side I can see that MSI-X is configured properly 
under /proc/interrupts.

I use the latest KVM 2.6.30 tree in both host OS and guest OS.
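As a cross-check from the guest side, this is roughly what I look at (a sketch; the device address 00:05.0 is a placeholder for the assigned device's slot in the guest):

```shell
# Inside the guest: MSI-X vectors, if active, show up in /proc/interrupts
# and their counts should climb once traffic flows.
grep -i msi /proc/interrupts

# Confirm that the MSI-X capability is present and enabled on the device:
lspci -vv -s 00:05.0 | grep -A2 MSI-X
```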

 -Original Message-
 From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On
 Behalf Of Passera, Pablo R
 Sent: 12 May 2009 11:22
 To: kvm@vger.kernel.org
 Subject: RE: Problem doing pci passthrough of the network card without
 VT-d

 One update on this. I disabled VT-d from the BIOS and now I am not
 getting the DMAR error messages in dmesg, but the board still does not
 work on the guest. Any help is welcomed.

 e1000e :00:19.0: PCI INT A disabled
 pci-stub :00:19.0: PCI INT A -> GSI 20 (level, low) -> IRQ 20
 pci-stub :00:19.0: irq 29 for MSI/MSI-X
 pci-stub :00:19.0: irq 29 for MSI/MSI-X
 pci-stub :00:19.0: irq 29 for MSI/MSI-X
 pci-stub :00:19.0: irq 29 for MSI/MSI-X
 pci-stub :00:19.0: irq 29 for MSI/MSI-X
 pci-stub :00:19.0: irq 29 for MSI/MSI-X
 pci-stub :00:19.0: irq 29 for MSI/MSI-X
 pci-stub :00:19.0: irq 29 for MSI/MSI-X
 pci-stub :00:19.0: irq 29 for MSI/MSI-X
 pci-stub :00:19.0: irq 29 for MSI/MSI-X

 Regards,
 Pablo

 -Original Message-
 From: Passera, Pablo R
 Sent: Tuesday, May 12, 2009 12:14 PM
 To: kvm@vger.kernel.org
 Subject: Problem doing pci passthrough of the network card without VT-
 d
 
 Hi List,
I am having problems to do pci passthrough to a network card
 without using VT-d. The card is present in the guest but with a
 different model (Intel Corporation 82801I Gigabit Ethernet Controller
 (rev 2)) and it does not work. The qemu line that I used is:
 
 ./devel/bin/qemu-system-x86_64 -hda ./dm.img -m 256 -pcidevice
 host=00:19.0,dma=none -net none
 
 Before running qemu I did
 
  echo 8086 294c > /sys/bus/pci/drivers/pci-stub/new_id
  echo 0000:00:19.0 > /sys/bus/pci/drivers/e1000e/unbind
  echo 0000:00:19.0 > /sys/bus/pci/drivers/pci-stub/bind
 
 This is the lspci -tv output
 
 -[:00]-+-00.0  Intel Corporation 82X38/X48 Express DRAM Controller
+-01.0-[:01]00.0  nVidia Corporation G80 [GeForce
 8800 GTX]
+-19.0  Intel Corporation 82566DC-2 Gigabit Network
 Connection
+-1a.0  Intel Corporation 82801I (ICH9 Family) USB UHCI
 Controller #4
+-1a.1  Intel Corporation 82801I (ICH9 Family) USB UHCI
 Controller #5
+-1a.2  Intel Corporation 82801I (ICH9 Family) USB UHCI
 Controller #6
+-1a.7  Intel Corporation 82801I (ICH9 Family) USB2 EHCI
 Controller #2
+-1b.0  Intel Corporation 82801I (ICH9 Family) HD Audio
 Controller
+-1c.0-[:02]--
+-1c.4-[:03]00.0  Marvell Technology Group Ltd.
 88SE6121 SATA II Controller
+-1d.0  Intel Corporation 82801I (ICH9 Family) USB UHCI
 Controller #1
+-1d.1  Intel Corporation 82801I (ICH9 Family) USB UHCI
 Controller #2
+-1d.2  Intel Corporation 82801I (ICH9 Family) USB UHCI
 Controller #3
+-1d.7  Intel Corporation 82801I (ICH9 Family) USB2 EHCI
 Controller #1
+-1e.0-[:04]03.0  Texas Instruments TSB43AB22/A
 IEEE-
 1394a-2000 Controller (PHY/Link)
+-1f.0  Intel Corporation 82801IR (ICH9R) LPC Interface
 Controller
+-1f.2  Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 4
 port
 SATA IDE Controller
+-1f.3  Intel Corporation 82801I (ICH9 Family) SMBus
 Controller
\-1f.5  Intel Corporation 82801I (ICH9 Family) 2 port SATA
 IDE Controller
 
 
 I am getting the following error in host dmesg
 
 e1000e :00:19.0: PCI INT A disabled
 pci-stub :00:19.0: PCI INT A -> GSI 20 (level, low) -> IRQ 20
 pci-stub :00:19.0: irq 29 for MSI/MSI-X
 pci-stub :00:19.0: irq 29 for MSI/MSI-X
 pci-stub :00:19.0: irq 29 for MSI/MSI-X
 pci-stub :00:19.0: irq 29 for MSI/MSI-X
 pci-stub :00:19.0: irq 29 for MSI/MSI-X
 DMAR:[DMA Read] Request device [00:19.0] fault addr baee000
 DMAR:[fault reason 02] Present bit in context entry is clear
 pci-stub :00:19.0: irq 29 for MSI/MSI-X
 pci-stub :00:19.0: irq 29 for MSI/MSI-X
 pci-stub :00:19.0: irq 29 for MSI/MSI-X
 pci-stub :00:19.0: irq 29 for MSI/MSI-X
 pci-stub :00:19.0: irq 29 for MSI/MSI-X
 DMAR:[DMA Read] Request device [00:19.0] fault 

Network I/O performance

2009-05-11 Thread Fischer, Anna
I am running KVM with Fedora Core 8 on a 2.6.23 32-bit kernel. I use the 
tun/tap device model and the Linux bridge kernel module to connect my VM to the 
network. I have 2 10G Intel 82598 network devices (with the ixgbe driver) 
attached to my machine and I want to do packet routing in my VM (the VM has two 
virtual network interfaces configured). Analysing the network performance of 
the standard QEMU emulated NICs, I get less than 1G of throughput on those 10G 
links. Surprisingly though, I don't really see CPU utilization being maxed out. 
This is a dual core machine, and mpstat shows me that both CPUs are about 40% 
idle. My VM is more or less unresponsive due to the high network processing 
load while the host OS still seems to be in good shape. How can I best tune 
this setup to achieve best possible performance with KVM? I know there is 
virtIO and I know there is PCI pass-through, but those models are not an option 
for me right now.


RE: [PATCH 0/13 v7] PCI: Linux kernel SR-IOV support

2008-12-17 Thread Fischer, Anna
 From: linux-pci-ow...@vger.kernel.org [mailto:linux-pci-
 ow...@vger.kernel.org] On Behalf Of Jesse Barnes
 Sent: 16 December 2008 23:24
 To: Yu Zhao
 Cc: linux-...@vger.kernel.org; Chiang, Alexander; Helgaas, Bjorn;
 grund...@parisc-linux.org; g...@kroah.com; mi...@elte.hu;
 matt...@wil.cx; randy.dun...@oracle.com; rdre...@cisco.com;
 ho...@verge.net.au; ying...@kernel.org; linux-ker...@vger.kernel.org;
 kvm@vger.kernel.org; virtualizat...@lists.linux-foundation.org
 Subject: Re: [PATCH 0/13 v7] PCI: Linux kernel SR-IOV support

 On Friday, November 21, 2008 10:36 am Yu Zhao wrote:
  Greetings,
 
  Following patches are intended to support SR-IOV capability in the
  Linux kernel. With these patches, people can turn a PCI device with
  the capability into multiple ones from software perspective, which
  will benefit KVM and achieve other purposes such as QoS, security,
  and etc.
 
  The Physical Function and Virtual Function drivers using the SR-IOV
  APIs will come soon!
 
  Major changes from v6 to v7:
  1, remove boot-time resource rebalancing support. (Greg KH)
  2, emit uevent upon the PF driver is loaded. (Greg KH)
  3, put SR-IOV callback function into the 'pci_driver'. (Matthew
 Wilcox)
  4, register SR-IOV service at the PF loading stage.
  5, remove unnecessary APIs (pci_iov_enable/disable).

 Thanks for your patience with this, Yu, I know it's been a long haul.
 :)

 I applied 1-9 to my linux-next branch; and at least patch #10 needs a
 respin,
 so can you re-do 10-13 as a new patch set?

 On re-reading the last thread, there was a lot of smoke, but very
 little fire
 afaict.  The main questions I saw were:

   1) do we need SR-IOV at all?  why not just make each subsystem export
  devices to guests?
 This is a bit of a red herring.  Nothing about SR-IOV prevents us
 from
 making subsystems more v12n friendly.  And since SR-IOV is a
 hardware
 feature supported by devices these days, we should make Linux
 support it.

   2) should the PF/VF drivers be the same or not?
 Again, the SR-IOV patchset and PCI spec don't dictate this.  We're
 free to
 do what we want here.

   3) should VF devices be represented by pci_dev structs?
 Yes.  (This is an easy one :)

   4) can VF devices be used on the host?
 Yet again, SR-IOV doesn't dictate this.  Developers can make PF/VF
 combo
 drivers or split them, and export the resulting devices however
 they want.
 Some subsystem work may be needed to make this efficient, but SR-
 IOV
 itself is agnostic about it.

 So overall I didn't see many objections to the actual code in the last
 post,
 and the issues above certainly don't merit a NAK IMO...

I have two minor comments on this topic.

1) Currently the PF driver is called before the kernel initializes VFs and
their resources, and the current API does not make it easy for the PF
driver to detect whether the allocation of the VFs and their resources
has succeeded. It would be quite useful if the PF driver got notified
when the VFs have been created successfully, as it might have to do
further device-specific work *after* IOV has been enabled.

2) Configuration of SR-IOV: the current API allows enabling/disabling
VFs from userspace via sysfs. At the moment I am not quite clear on what
exactly is supposed to control these capabilities. This could be
Linux tools or, on a virtualized system, hypervisor control tools.
One thing I am missing, though, is an in-kernel API for this, which I
think would be useful. After all, the PF driver controls the device,
and, for example, when a device error occurs (e.g. a hardware failure
which only the PF driver will be able to detect, not Linux), the
PF driver might have to de-allocate all resources, shut down VFs and
reset the device, or something like that. In that case the PF driver
needs a way to notify the Linux SR-IOV code about this and
initiate cleaning up of VFs and their resources. At the moment, this
would have to go through userspace, I believe, and I think that is not
an optimal solution. Yu, do you have an opinion on how this would be
realized?
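For comparison, the sysfs interface that eventually landed in mainline (years after this thread) exposes per-device VF controls like the following; a sketch with a placeholder device address, and not necessarily matching the v7 patchset's layout:

```shell
# Query the maximum number of VFs the device supports:
cat /sys/bus/pci/devices/0000:01:00.0/sriov_totalvfs

# Enable four VFs; writing 0 disables them again and triggers the
# PF driver's teardown path:
echo 4 > /sys/bus/pci/devices/0000:01:00.0/sriov_numvfs
```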

Anna


RE: [PATCH 0/13 v7] PCI: Linux kernel SR-IOV support

2008-12-17 Thread Fischer, Anna
 From: Zhao, Yu [mailto:yu.z...@intel.com]
 Sent: 18 December 2008 02:14
 To: Fischer, Anna
 Cc: Jesse Barnes; linux-...@vger.kernel.org; Chiang, Alexander;
 Helgaas, Bjorn; grund...@parisc-linux.org; g...@kroah.com;
 mi...@elte.hu; matt...@wil.cx; randy.dun...@oracle.com;
 rdre...@cisco.com; ho...@verge.net.au; ying...@kernel.org; linux-
 ker...@vger.kernel.org; kvm@vger.kernel.org;
 virtualizat...@lists.linux-foundation.org
 Subject: Re: [PATCH 0/13 v7] PCI: Linux kernel SR-IOV support

 Fischer, Anna wrote:
  I have two minor comments on this topic.
 
  1) Currently the PF driver is called before the kernel initializes
 VFs and
  their resources, and the current API does not allow the PF driver to
  detect that easily if the allocation of the VFs and their resources
  has succeeded or not. It would be quite useful if the PF driver gets
  notified when the VFs have been created successfully as it might have
  to do further device-specific work *after* IOV has been enabled.

 If the VF allocation fails in the PCI layer, then the SR-IOV core will
 invoke the callback again to notify the PF driver with a zero VF count.
 The PF driver does not have to be concerned about this even if the PCI
 layer code fails (and actually that is very rare).

Yes, this is good.


 And I'm not sure why the PF driver wants to do further work *after* the
 VF is allocated. Does this mean the PF driver has to set up some internal
 resources related to SR-IOV/VF? If yes, I suggest the PF driver do it
 before VF allocation. The design philosophy of SR-IOV/VF is that a VF is
 treated as a hot-plug device, which means it should be immediately usable
 by the VF driver (e.g. the VF driver is pre-loaded) after it appears in
 the PCI subsystem. If that is not the purpose, then the PF driver should
 handle it without depending on SR-IOV, right?

Yes, you are right. In fact I was assuming in this case that the PF driver
might have to allocate VF-specific resources before PF-VF communication
can be established, but this can be done before the VF PCI device
appears, so I was wrong about that. The current API is sufficient
to handle all of this, so I am withdrawing my concern here ;-)


 If you could elaborate on your SR-IOV PF/VF h/w specific requirement,
 it would help me answer this question :-)

  2) Configuration of SR-IOV: the current API allows VFs to be
  enabled/disabled from userspace via SYSFS. At the moment I am not
  quite clear on what exactly is supposed to control these
  capabilities. This could be Linux tools or, on a virtualized system,
  hypervisor control tools.

 This depends on the user application, which in turn depends on the
 usage environment (i.e. native, KVM or Xen).

  One thing I am missing though is an in-kernel API for this, which I
  think might be useful. After all, the PF driver controls the device,
  and, for example, when a device error occurs (e.g. a hardware
  failure which only the PF driver will be able to detect, not Linux),
  the PF driver might have to de-allocate all resources, shut down VFs
  and reset the device, or something like that. In that case the PF
  driver needs a way to notify the Linux SR-IOV code about this and
  initiate the cleanup of VFs and their resources. At the moment, this
  would have to go through userspace, I believe, and I think that is
  not an optimal solution. Yu, do you have an opinion on how this
  could be realized?

 Yes, the PF driver can use pci_iov_unregister to disable SR-IOV in
 case a fatal error occurs. This function also sends a notification to
 user level through a 'uevent' so the user application can become
 aware of the change.

If pci_iov_unregister is accessible to kernel drivers then this is in fact
all we need. Thanks for the clarification.


I think the patchset looks very good.

Acked-by: Anna Fischer anna.fisc...@hp.com


RE: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support

2008-11-08 Thread Fischer, Anna
 Subject: Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
 Importance: High

 On Fri, Nov 07, 2008 at 11:17:40PM +0800, Yu Zhao wrote:
  While we are arguing what the software model the SR-IOV should be,
 let me
  ask two simple questions first:
 
  1, What does the SR-IOV looks like?
  2, Why do we need to support it?

 I don't think we need to worry about those questions, as we can see
 what the SR-IOV interface looks like by looking at the PCI spec, and
 we know Linux needs to support it, as Linux needs to support
 everything :)

 (note, community members that cannot see the PCI specs at this point
 in time, please know that we are working on resolving these issues;
 hopefully we will have some good news within a month or so.)

  As you know, the Linux kernel is the base of various virtual machine
  monitors such as KVM, Xen, OpenVZ and VServer. We need SR-IOV
  support in the kernel mostly because it helps high-end users (IT
  departments, HPC, etc.) to share limited hardware resources among
  hundreds or even thousands of virtual machines and hence reduce
  cost. How can we make these virtual machine monitors take advantage
  of SR-IOV without spending too much effort while maintaining
  architectural correctness? I believe making the VF appear as close
  as possible to a normal PCI device (struct pci_dev) is the best way
  in the current situation, because this is not only what the hardware
  designers expect us to do but also the usage model that KVM, Xen and
  other VMMs have already supported.

 But would such an API really take advantage of the new IOV interfaces
 that are exposed by the new device type?

I agree with what Yu says. The idea is to have hardware capabilities to
virtualize a PCI device in a way that those virtual devices can represent
full PCI devices. The advantage is that those virtual devices can then
be used like any other standard PCI device, meaning we can use existing
OS tools, configuration mechanisms etc. to start working with them. Also, when
using a virtualization-based system, e.g. Xen or KVM, we do not need
to introduce new mechanisms to make use of SR-IOV, because we can handle
VFs as full PCI devices.

A virtual PCI device in hardware (a VF) can be as powerful or complex as
you like, or it can be very simple. But the big advantage of SR-IOV is
that hardware presents a complete PCI device to the OS - as opposed to
some resources, or queues, that need specific new configuration and
assignment mechanisms in order to use them with a guest OS (like, for
example, VMDq or similar technologies).

Anna


RE: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support

2008-11-06 Thread Fischer, Anna
 On Thu, Nov 06, 2008 at 08:41:53AM -0800, H L wrote:
  I have not modified any existing drivers, but instead I threw
 together
  a bare-bones module enabling me to make a call to pci_iov_register()
  and then poke at an SR-IOV adapter's /sys entries for which no driver
  was loaded.
 
  It appears from my perusal thus far that drivers using these new
  SR-IOV patches will require modification; i.e. the driver associated
  with the Physical Function (PF) will be required to make the
  pci_iov_register() call along with the requisite notify() function.
  Essentially this suggests to me a model where the PF driver performs
  any global actions or setup on behalf of VFs before enabling them,
  after which VF drivers could be associated.

 Where would the VF drivers have to be associated?  On the pci_dev
 level or on a higher one?

A VF appears to the Linux OS as a standard (full, additional) PCI device. The 
driver is associated in the same way as for a normal PCI device. Ideally, you 
would use SR-IOV devices on a virtualized system, for example, using Xen. A VF 
can then be assigned to a guest domain as a full PCI device.

 Will all drivers that want to bind to a VF device need to be
 rewritten?

Currently, any vendor providing an SR-IOV device needs to provide a PF driver 
and a VF driver that run on their hardware. A VF driver does not necessarily 
need to know much about SR-IOV; it can simply run on the presented PCI device. 
You might want to have a communication channel between the PF and VF drivers 
though, for various reasons, if such a channel is not provided in hardware.

  I have so far only seen Yu Zhao's 7-patch set.  I've not yet looked
  at his subsequently tendered 15-patch set so I don't know what has
  changed. The hardware/firmware implementation for any given SR-IOV
  compatible device will determine the extent of the differences
  required between a PF driver and a VF driver.

 Yeah, that's what I'm worried/curious about.  Without seeing the code
 for such a driver, how can we properly evaluate if this infrastructure
 is the correct one and the proper way to do all of this?

Yu's API allows a PF driver to register with the Linux PCI code and use it to 
activate VFs and allocate their resources. The PF driver needs to be modified 
to work with that API. While you can argue about what that API is supposed to 
look like, it is clear that such an API is required in some form. The PF driver 
needs to know when VFs are active, as it might want to allocate further 
(device-specific) resources to VFs or initiate further (device-specific) 
configurations. While probably a lot of SR-IOV specific code has to live in the 
PF driver, support is also required from the Linux PCI subsystem, which 
is to some extent provided by Yu's patches.

Anna


RE: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support

2008-11-06 Thread Fischer, Anna
 Subject: Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support

 On Thu, Nov 06, 2008 at 05:38:16PM +, Fischer, Anna wrote:
   On Thu, Nov 06, 2008 at 08:41:53AM -0800, H L wrote:
I have not modified any existing drivers, but instead I threw
   together
a bare-bones module enabling me to make a call to
 pci_iov_register()
and then poke at an SR-IOV adapter's /sys entries for which no
 driver
was loaded.
   
It appears from my perusal thus far that drivers using these new
SR-IOV patches will require modification; i.e. the driver
 associated
with the Physical Function (PF) will be required to make the
pci_iov_register() call along with the requisite notify()
 function.
Essentially this suggests to me a model for the PF driver to
 perform
any global actions or setup on behalf of VFs before enabling
 them
after which VF drivers could be associated.
  
   Where would the VF drivers have to be associated?  On the pci_dev
   level or on a higher one?
 
  A VF appears to the Linux OS as a standard (full, additional) PCI
  device. The driver is associated in the same way as for a normal PCI
  device. Ideally, you would use SR-IOV devices on a virtualized
 system,
  for example, using Xen. A VF can then be assigned to a guest domain
 as
  a full PCI device.

 It's that second part that I'm worried about.  How is that going to
 happen?  Do you have any patches that show this kind of assignment?

That depends on your setup. Using Xen, you could assign the VF to a guest 
domain like any other PCI device, e.g. using PCI pass-through. For VMware and 
KVM, there are standard ways to do that, too. I currently don't see why SR-IOV 
devices would need any specific, non-standard mechanism for device assignment.


   Will all drivers that want to bind to a VF device need to be
   rewritten?
 
  Currently, any vendor providing a SR-IOV device needs to provide a PF
  driver and a VF driver that runs on their hardware.

 Are there any such drivers available yet?

I don't know.


  A VF driver does not necessarily need to know much about SR-IOV but
  just run on the presented PCI device. You might want to have a
  communication channel between PF and VF driver though, for various
  reasons, if such a channel is not provided in hardware.

 Agreed, but what does that channel look like in Linux?

 I have some ideas of what I think it should look like, but if people
 already have code, I'd love to see that as well.

At this point I would guess that this code is vendor-specific, as are the 
drivers. The issue I see is that the drivers will most likely run in different 
environments; for example, in Xen the PF driver runs in a driver domain while a 
VF driver runs in a guest VM. So a communication channel would need to be 
either Xen-specific or vendor-specific. Also, a guest using the VF might run 
Windows while the PF might be controlled under Linux.

Anna