RE: pci-stub error and MSI-X for KVM guest
Subject: Re: pci-stub error and MSI-X for KVM guest

* Fischer, Anna (anna.fisc...@hp.com) wrote:

Ouch. Can you do debuginfo-install qemu-system-x86 to get the debug packages, then attach gdb to the QEMU process, so that when you do lspci -v in the guest (assuming this is QEMU segfaulting) you'll get a backtrace?

I don't know how to tell virt-manager through the GUI to enable debug mode, e.g. to have qemu-kvm called with '-s'. From the command line I can attach gdb like this, but when running virt-manager from the GUI I cannot connect to localhost:1234. However, the issues only arise when starting virt-manager from the GUI. I can't find the configuration option that would tell it to launch with '-s'.

Just looking for a backtrace of the qemu-kvm process itself. So after you launch it via virt-manager,

    gdb /usr/bin/qemu-kvm $(pidof qemu-kvm)

should be sufficient.

So, when setting a breakpoint on the exit() call, I'm getting a bit closer to figuring out where it kills my guest:

    Breakpoint 1, exit (status=1) at exit.c:99
    99      {
    Current language: auto
    The current source language is auto; currently c.
    (gdb) bt
    #0  exit (status=1) at exit.c:99
    #1  0x00470c6e in assigned_dev_pci_read_config (d=0x259c6f0, address=64, len=4)
        at /usr/src/debug/qemu-kvm-0.11.0/hw/device-assignment.c:349
    #2  0x0042419d in handle_io (vcpu=<value optimized out>)
        at /usr/src/debug/qemu-kvm-0.11.0/qemu-kvm.c:784
    #3  kvm_run (vcpu=<value optimized out>)
        at /usr/src/debug/qemu-kvm-0.11.0/qemu-kvm.c:1017
    #4  0x00424273 in kvm_cpu_exec (env=0x3f)
        at /usr/src/debug/qemu-kvm-0.11.0/qemu-kvm.c:1686
    #5  0x00425856 in kvm_main_loop_cpu (env=0x255a150)
        at /usr/src/debug/qemu-kvm-0.11.0/qemu-kvm.c:1868
    #6  ap_main_loop (env=0x255a150)
        at /usr/src/debug/qemu-kvm-0.11.0/qemu-kvm.c:1905
    #7  0x0035aac06a3a in start_thread (arg=<value optimized out>) at pthread_create.c:297
    #8  0x0035aa0ddf3d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
    #9  0x in ?? ()

    (gdb) p assigned_dev_pci_read_config::address
    $1 = 64
    (gdb) p assigned_dev_pci_read_config::val
    $2 = 0
    (gdb) p assigned_dev_pci_read_config::len
    $3 = 4
    (gdb) p assigned_dev_pci_read_config::ret
    $4 = <value optimized out>
    (gdb) p assigned_dev_pci_read_config::fd
    $5 = 13
    (gdb) p assigned_dev_pci_read_config::pci_dev
    $6 = (AssignedDevice *) 0x259c6f0
    (gdb) p assigned_dev_pci_read_config::pci_dev->real_device
    $7 = {bus = 0 '\000', dev = 0 '\000', func = 0 '\000', irq = 0, region_number = 7,
      regions = {{type = 512, valid = 1, base_addr = 4077142016, size = 16384, resource_fd = 15},
        {type = 0, valid = 0, base_addr = 0, size = 0, resource_fd = 0},
        {type = 0, valid = 0, base_addr = 0, size = 0, resource_fd = 0},
        {type = 512, valid = 1, base_addr = 4077273088, size = 16384, resource_fd = 16},
        {type = 0, valid = 0, base_addr = 0, size = 0, resource_fd = 0},
        {type = 0, valid = 0, base_addr = 0, size = 0, resource_fd = 0},
        {type = 0, valid = 0, base_addr = 0, size = 0, resource_fd = 0}}, config_fd = 13}
    (gdb) p assigned_dev_pci_read_config::d
    $8 = (PCIDevice *) 0x259c6f0

So the function assigned_dev_pci_read_config fails to read the PCI configuration of the device, and then the exit(1) call kills my guest. I don't know enough about the internals of KVM PCI device assignment, and furthermore I don't quite know why this works when starting virt-manager from the command line and not when starting it from the GUI. From the dmesg logs I would still guess that the problem is that pci-stub is not initialized properly, and perhaps this is also why the PCI read fails here? pci-stub tells me 'enabling device' in the logs, but I don't see any messages about enabling/assigning interrupts as I do when running from the command line. Let me know if you need any further information. Attached is a list of virt packages I run under Fedora Core 12.

Thanks,
Anna

[Attachment: packages.log]
RE: pci-stub error and MSI-X for KVM guest
Subject: Re: pci-stub error and MSI-X for KVM guest

* Fischer, Anna (anna.fisc...@hp.com) wrote:

This works fine in principle and I can see the PCI device in the guest under lspci. However, the 82576 VF driver requires the OS to support MSI-X. My Fedora installation is configured with MSI-X, e.g. CONFIG_PCI_MSI is 'y'. When I load the driver it tells me it cannot initialize MSI-X for the device, and under /proc/interrupts I can see that MSI-X does not seem to work. Is this a KVM/QEMU limitation? It works for me when running the VF driver under a non-virtualized Linux system.

No, this should work fine. QEMU/KVM supports MSI-X to the guest, as well as VFs.

Actually, I just got this to work. However, it only works if I call qemu-kvm from the command line; it doesn't work when I start the guest via virt-manager. So this seems to be an issue with Fedora's virt-manager rather than with KVM/QEMU. If I call qemu-kvm from the command line then I get the pci-stub messages saying 'irq xx for MSI/MSI-X' when the guest boots up, and the VF device works just fine inside the guest. When I start the guest using virt-manager I don't see any of these irq allocation messages from pci-stub. Any idea what the problem could be here?

No, sounds odd. Can you run

    # virsh dumpxml [domain]

and show the output of the <hostdev> XML section?

    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x03' slot='0x10' function='0x3'/>
      </source>
    </hostdev>

The device to assign is at 0000:03:10.3, dmesg shows:

    pci-stub 0000:03:10.3: enabling device (0000 -> 0002)
    assign device: host bdf = 3:10:3

Also, when I do an lspci on the KVM guest, that is fine, but when I do an lspci -v then the guest crashes down. In the host OS under dmesg I can see this:

    pci-stub 0000:03:10.0: restoring config space at offset 0x1 (was 0x10, writing 0x14)

Is this a known issue? My qemu-kvm version is 2:0.11.0.

No, I've not seen the crash before.
What do you mean the guest crashes down?

So this also only happens when starting the guest using virt-manager. It works fine when starting qemu-kvm from the command line. This is weird, as I call it with the same parameters as I can see virt-manager uses under 'ps -ef | grep qemu'. "The guest crashes down" means that the QEMU process is terminated. I don't see anything in the logs. It just disappears.

Ouch. Can you do debuginfo-install qemu-system-x86 to get the debug packages, then attach gdb to the QEMU process, so that when you do lspci -v in the guest (assuming this is QEMU segfaulting) you'll get a backtrace?

I don't know how to tell virt-manager through the GUI to enable debug mode, e.g. to have qemu-kvm called with '-s'. From the command line I can attach gdb like this, but when running virt-manager from the GUI I cannot connect to localhost:1234. However, the issues only arise when starting virt-manager from the GUI. I can't find the configuration option that would tell it to launch with '-s'.

This looks like a Fedora specific version (rpm version). Can you verify this is from Fedora packages vs. upstream source? If it's Fedora, it would be useful to open a bug there.

Yes, I am using the KVM/QEMU which ships with the Fedora Core 12 distribution.

OK, please file a bug there (and include the backtrace info).

I will file a bug once I get the full information. Currently my guess is actually that I might have package mismatches between libvirt, virt-manager or QEMU related software. That is my only explanation for why it works from the command line, but not from the GUI. Some path variables must be set differently, perhaps pointing to different libraries or packages, otherwise there is no way it can behave differently when calling virt-manager with exactly the same parameters...
Cheers, Anna -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: pci-stub error and MSI-X for KVM guest
Subject: RE: pci-stub error and MSI-X for KVM guest
Subject: Re: pci-stub error and MSI-X for KVM guest

* Fischer, Anna (anna.fisc...@hp.com) wrote:

This works fine in principle and I can see the PCI device in the guest under lspci. However, the 82576 VF driver requires the OS to support MSI-X. My Fedora installation is configured with MSI-X, e.g. CONFIG_PCI_MSI is 'y'. When I load the driver it tells me it cannot initialize MSI-X for the device, and under /proc/interrupts I can see that MSI-X does not seem to work. Is this a KVM/QEMU limitation? It works for me when running the VF driver under a non-virtualized Linux system.

No, this should work fine. QEMU/KVM supports MSI-X to the guest, as well as VFs.

Actually, I just got this to work. However, it only works if I call qemu-kvm from the command line; it doesn't work when I start the guest via virt-manager. So this seems to be an issue with Fedora's virt-manager rather than with KVM/QEMU. If I call qemu-kvm from the command line then I get the pci-stub messages saying 'irq xx for MSI/MSI-X' when the guest boots up, and the VF device works just fine inside the guest. When I start the guest using virt-manager I don't see any of these irq allocation messages from pci-stub. Any idea what the problem could be here?

No, sounds odd. Can you run

    # virsh dumpxml [domain]

and show the output of the <hostdev> XML section?

    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x03' slot='0x10' function='0x3'/>
      </source>
    </hostdev>

The device to assign is at 0000:03:10.3, dmesg shows:

    pci-stub 0000:03:10.3: enabling device (0000 -> 0002)
    assign device: host bdf = 3:10:3

I forgot, here is the process that the virt-manager GUI creates, i.e. this is the one that does not work.
    qemu  3072  1  4  11:26  ?  00:00:33  /usr/bin/qemu-kvm -S -M pc-0.11 -m 1024 -smp 1 -name FC10-2 -uuid b811b278-fae2-a3cc-d51d-8f5b078b2477 -monitor unix:/var/lib/libvirt/qemu/FC10-2.monitor,server,nowait -boot c -drive file=/var/lib/libvirt/images/FC10-2.img,if=virtio,index=0,boot=on -drive file=/home/af/Download/Fedora-12-x86_64-Live-KDE.iso,if=ide,media=cdrom,index=2 -net none -serial pty -parallel none -usb -vnc 127.0.0.1:0 -k en-gb -vga cirrus -soundhw es1370 -pcidevice host=03:10.3

Note that this one does work from the command line, but not via the GUI. For the debugging to work, I need the '-s' option to be added too...

Cheers,
Anna
pci-stub error and MSI-X for KVM guest
I am running Fedora Core 12 with a 2.6.31 kernel. I use the Intel 82576 SR-IOV network card and want to assign its Virtual Functions (VFs) to separate KVM guests. My guests also run Fedora Core 12 with a 2.6.31 kernel. I use the latest igb driver in the host OS and load it with 2 VFs activated. Then I assign those to my KVM guests. I use virt-manager to do this, which then takes care of configuring pci-stub.

This works fine in principle and I can see the PCI device in the guest under lspci. However, the 82576 VF driver requires the OS to support MSI-X. My Fedora installation is configured with MSI-X, e.g. CONFIG_PCI_MSI is 'y'. When I load the driver it tells me it cannot initialize MSI-X for the device, and under /proc/interrupts I can see that MSI-X does not seem to work. Is this a KVM/QEMU limitation? It works for me when running the VF driver under a non-virtualized Linux system.

Also, when I do an lspci on the KVM guest, that is fine, but when I do an lspci -v then the guest crashes down. In the host OS under dmesg I can see this:

    pci-stub 0000:03:10.0: restoring config space at offset 0x1 (was 0x10, writing 0x14)

Is this a known issue? My qemu-kvm version is 2:0.11.0.

Thanks,
Anna
RE: pci-stub error and MSI-X for KVM guest
Subject: Re: pci-stub error and MSI-X for KVM guest

* Fischer, Anna (anna.fisc...@hp.com) wrote:

I am running Fedora Core 12 with a 2.6.31 kernel. I use the Intel 82576 SR-IOV network card and want to assign its Virtual Functions (VFs) to separate KVM guests. My guests also run Fedora Core 12 with a 2.6.31 kernel. I use the latest igb driver in the host OS and load it with 2 VFs activated. Then I assign those to my KVM guests. I use virt-manager to do this, which then takes care of configuring pci-stub.

By 2.6.31 are you referring to the stock Fedora 12 kernel package?

Yes.

This works fine in principle and I can see the PCI device in the guest under lspci. However, the 82576 VF driver requires the OS to support MSI-X. My Fedora installation is configured with MSI-X, e.g. CONFIG_PCI_MSI is 'y'. When I load the driver it tells me it cannot initialize MSI-X for the device, and under /proc/interrupts I can see that MSI-X does not seem to work. Is this a KVM/QEMU limitation? It works for me when running the VF driver under a non-virtualized Linux system.

No, this should work fine. QEMU/KVM supports MSI-X to the guest, as well as VFs.

Actually, I just got this to work. However, it only works if I call qemu-kvm from the command line; it doesn't work when I start the guest via virt-manager. So this seems to be an issue with Fedora's virt-manager rather than with KVM/QEMU. If I call qemu-kvm from the command line then I get the pci-stub messages saying 'irq xx for MSI/MSI-X' when the guest boots up, and the VF device works just fine inside the guest. When I start the guest using virt-manager I don't see any of these irq allocation messages from pci-stub. Any idea what the problem could be here?

Also, when I do an lspci on the KVM guest, that is fine, but when I do an lspci -v then the guest crashes down. In the host OS under dmesg I can see this:

    pci-stub 0000:03:10.0: restoring config space at offset 0x1 (was 0x10, writing 0x14)

Is this a known issue?
My qemu-kvm version is 2:0.11.0.

No, I've not seen the crash before. What do you mean the guest crashes down?

So this also only happens when starting the guest using virt-manager. It works fine when starting qemu-kvm from the command line. This is weird, as I call it with the same parameters as I can see virt-manager uses under 'ps -ef | grep qemu'. "The guest crashes down" means that the QEMU process is terminated. I don't see anything in the logs. It just disappears.

This looks like a Fedora specific version (rpm version). Can you verify this is from Fedora packages vs. upstream source? If it's Fedora, it would be useful to open a bug there.

Yes, I am using the KVM/QEMU which ships with the Fedora Core 12 distribution.

Thanks for your help,
Anna
KVM for Linux 2.6.16?
Hi, I am trying to compile the kvm-87 module for Linux 2.6.16. I thought that it had been back-ported to such an old kernel. However, I don't seem to be able to compile the module on my kernel. I get the following error:

    CC    tsc2005.o
    CC    scsi-disk.o
    CC    cdrom.o
    CC    scsi-generic.o
    CC    usb.o
    CC    usb-hub.o
    CC    usb-linux.o
    In file included from usb-linux.c:41:
    /usr/include/linux/usbdevice_fs.h:49: error: expected ':', ',', ';', '}' or '__attribute__' before '*' token
    /usr/include/linux/usbdevice_fs.h:56: error: expected ':', ',', ';', '}' or '__attribute__' before '*' token
    /usr/include/linux/usbdevice_fs.h:66: error: expected ':', ',', ';', '}' or '__attribute__' before '*' token
    /usr/include/linux/usbdevice_fs.h:100: error: expected ':', ',', ';', '}' or '__attribute__' before '*' token
    /usr/include/linux/usbdevice_fs.h:116: error: expected ':', ',', ';', '}' or '__attribute__' before '*' token
    usb-linux.c: In function 'async_complete':
    usb-linux.c:271: error: 'struct usbdevfs_urb' has no member named 'actual_length'
    usb-linux.c: In function 'usb_host_handle_data':
    usb-linux.c:464: error: 'struct usbdevfs_urb' has no member named 'buffer'
    usb-linux.c:465: error: 'struct usbdevfs_urb' has no member named 'buffer_length'
    usb-linux.c:471: error: 'struct usbdevfs_urb' has no member named 'number_of_packets'
    usb-linux.c:472: error: 'struct usbdevfs_urb' has no member named 'iso_frame_desc'
    usb-linux.c:478: error: 'struct usbdevfs_urb' has no member named 'usercontext'
    usb-linux.c: In function 'usb_host_handle_control':
    usb-linux.c:598: error: 'struct usbdevfs_urb' has no member named 'buffer'

Is KVM not supposed to work on 2.6.16?

Cheers,
Anna
RE: KVM for Linux 2.6.16?
Subject: Re: KVM for Linux 2.6.16?

On Thu, 2009-07-09 at 16:49 +0000, Fischer, Anna wrote:

Hi, I am trying to compile the kvm-87 module for Linux 2.6.16. I thought that it had been back-ported to such an old kernel. However, I don't seem to be able to compile the module on my kernel. I get the following error:

    CC    tsc2005.o
    CC    scsi-disk.o
    CC    cdrom.o
    CC    scsi-generic.o
    CC    usb.o
    CC    usb-hub.o
    CC    usb-linux.o
    In file included from usb-linux.c:41:
    /usr/include/linux/usbdevice_fs.h:49: error: expected ':', ',', ';', '}' or '__attribute__' before '*' token
    /usr/include/linux/usbdevice_fs.h:56: error: expected ':', ',', ';', '}' or '__attribute__' before '*' token
    /usr/include/linux/usbdevice_fs.h:66: error: expected ':', ',', ';', '}' or '__attribute__' before '*' token
    /usr/include/linux/usbdevice_fs.h:100: error: expected ':', ',', ';', '}' or '__attribute__' before '*' token
    /usr/include/linux/usbdevice_fs.h:116: error: expected ':', ',', ';', '}' or '__attribute__' before '*' token
    usb-linux.c: In function 'async_complete':
    usb-linux.c:271: error: 'struct usbdevfs_urb' has no member named 'actual_length'
    usb-linux.c: In function 'usb_host_handle_data':
    usb-linux.c:464: error: 'struct usbdevfs_urb' has no member named 'buffer'
    usb-linux.c:465: error: 'struct usbdevfs_urb' has no member named 'buffer_length'
    usb-linux.c:471: error: 'struct usbdevfs_urb' has no member named 'number_of_packets'
    usb-linux.c:472: error: 'struct usbdevfs_urb' has no member named 'iso_frame_desc'
    usb-linux.c:478: error: 'struct usbdevfs_urb' has no member named 'usercontext'
    usb-linux.c: In function 'usb_host_handle_control':
    usb-linux.c:598: error: 'struct usbdevfs_urb' has no member named 'buffer'

Is KVM not supposed to work on 2.6.16?

Hi Anna, I'm afraid that I have some bad news for you. Usually KVM versions are tailored to kernel versions contemporary with them. Version 87 is supposed to need 2.6.26 kernels and newer, IIRC.
So for your 2.6.16 you should try some of the incipient KVM versions, and if you are lucky enough, they might work.

So if I run an ancient Linux kernel, then I can only run an equally ancient KVM version? I thought the code was kept backwards compatible to some extent?
RE: Network throughput limits for local VM - VM communication
Subject: Re: Network throughput limits for local VM - VM communication

Fischer, Anna wrote:

Not sure I understand. As far as I can see, the packets are replicated on the tun/tap interface before they actually enter the bridge. So this is not about the bridge learning MAC addresses and flooding frames to unknown destinations. So I think this is different.

Okay. You said: "However, without VLANs, the tun interface will pass packets to all tap interfaces. It has to, as it doesn't know which one the packet has to go to." Well, it shouldn't. The tun interface should pass the packets to just one tap interface. Can you post the qemu command line you're using? There's a gotcha there that can result in what you're seeing.

Sorry for the late reply on this issue. The command line I am using looks roughly like this:

    /usr/bin/qemu-system-x86_64 -m 1024 -smp 2 -name FC10-2 -uuid b811b278-fae2-a3cc-d51d-8f5b078b2477 -boot c -drive file=,if=ide,media=cdrom,index=2 -drive file=/var/lib/libvirt/images/FC10-2.img,if=virtio,index=0,boot=on -net nic,macaddr=54:52:00:11:ae:79,model=e1000 -net tap -net nic,macaddr=54:52:00:11:ae:78,model=e1000 -net tap -serial pty -parallel none -usb -vnc 127.0.0.1:2 -k en-gb -soundhw es1370

This is my routing VM that has two network interfaces and routes packets between two subnets. It has one interface plugged into bridge virbr0 and the other interface plugged into virbr1:

    brctl show
    bridge name     bridge id               STP enabled     interfaces
    virbr0          8000.8ac1d18c63ec       no              vnet0
                                                            vnet1
    virbr1          8000.2ebfcbb9ed70       no              vnet2
                                                            vnet3

If I use the e1000 virtual NIC model, I see performance drop significantly compared to using virtio_net. However, with virtio_net I have the network stalling after a few seconds of high-throughput traffic (as I mentioned in my previous post). Just to reiterate my scenario: I run three guests on the same physical machine; one guest is my routing VM that is routing IP network traffic between the other two guests.
I am also wondering about the fact that I do not seem to get CPU utilization maxed out in this case while throughput does not go any higher. I do not understand what is stopping KVM from using more CPU for guest I/O processing. There is nothing else running on my machine. I have analyzed the amount of CPU that each KVM thread is using, and I can see that the thread running the VCPU of the routing VM, which is processing interrupts of the e1000 virtual network card, is using the highest amount of CPU.

Is there any way that I can optimize my network set-up? Maybe some specific configuration of the e1000 driver within the guest? Are there any known issues with this?

I also see very different CPU utilization and network throughput figures when pinning threads to CPU cores using taskset. At one point I managed to double the throughput, but I could not reproduce that setup for some reason. What are the major issues that I would need to pay attention to when pinning threads to cores in order to optimize my specific set-up so that I can achieve better network I/O performance?

Thanks for your help.
Anna
RE: Network throughput limits for local VM - VM communication
Subject: Re: Network throughput limits for local VM - VM communication

On 06/17/2009 10:36 AM, Fischer, Anna wrote:

    /usr/bin/qemu-system-x86_64 -m 1024 -smp 2 -name FC10-2 -uuid b811b278-fae2-a3cc-d51d-8f5b078b2477 -boot c -drive file=,if=ide,media=cdrom,index=2 -drive file=/var/lib/libvirt/images/FC10-2.img,if=virtio,index=0,boot=on -net nic,macaddr=54:52:00:11:ae:79,model=e1000 -net tap -net nic,macaddr=54:52:00:11:ae:78,model=e1000 -net tap -serial pty -parallel none -usb -vnc 127.0.0.1:2 -k en-gb -soundhw es1370

Okay, like I suspected, qemu has a trap here and you walked into it. The -net option plugs the device you specify into a virtual hub. The command line you provided plugs the two virtual NICs and the two tap devices into one virtual hub, so any packet received from any of the four clients will be propagated to the other three. To get this to work right, specify the vlan= parameter, which says which virtual hub a component is plugged into. Note this has nothing to do with 802.blah vlans. So your command line should look like

    qemu ... -net nic,...,vlan=0 -net tap,...,vlan=0 -net nic,...,vlan=1 -net tap,...,vlan=1

This will give you two virtual hubs, each bridging a virtual NIC to a tap device.

This is my routing VM that has two network interfaces and routes packets between two subnets. It has one interface plugged into bridge virbr0 and the other interface plugged into virbr1:

    brctl show
    bridge name     bridge id               STP enabled     interfaces
    virbr0          8000.8ac1d18c63ec       no              vnet0
                                                            vnet1
    virbr1          8000.2ebfcbb9ed70       no              vnet2
                                                            vnet3

Please redo the tests with qemu vlans but without 802.blah vlans, so we see what happens without packet duplication.

Avi, thanks for your quick reply. I do use the vlan= parameter now, and yes, I do not see packet duplication any more, so everything you said is right and I do understand now why I was seeing packets on both bridges before. So this has nothing to do with tun/tap then, but just with the way QEMU virtual hubs work.
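Concretely, applied to the dual-NIC routing VM's command line quoted earlier, the vlan= form would look roughly like this (a sketch only; MAC addresses and image path copied from the original posting, other options elided, and not verified against what virt-manager generates):

```shell
/usr/bin/qemu-system-x86_64 -m 1024 -smp 2 -name FC10-2 \
  -drive file=/var/lib/libvirt/images/FC10-2.img,if=virtio,index=0,boot=on \
  -net nic,macaddr=54:52:00:11:ae:79,model=e1000,vlan=0 -net tap,vlan=0 \
  -net nic,macaddr=54:52:00:11:ae:78,model=e1000,vlan=1 -net tap,vlan=1 \
  -serial pty -parallel none -usb -vnc 127.0.0.1:2 -k en-gb
```

Each vlan=N pair forms its own virtual hub, so traffic for the vnet0/vnet1 side never reaches the virbr1 side inside qemu, and vice versa.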
I didn't know about any details on that before. Even with vlan= enabled, I am still having the same issues with weird CPU utilization and low throughput that I have described below. If I use the e1000 virtual NIC model, I see performance drop significantly compared to using virtio_net. However, with virtio_net I have the network stalling after a few seconds of high-throughput traffic (as I mentioned in my previous post). Just to reiterate my scenario: I run three guests on the same physical machine, one guest is my routing VM that is routing IP network traffic between the other two guests. I am also wondering about the fact that I do not seem to get CPU utilization maxed out in this case while throughput does not go any higher. I do not understand what is stopping KVM from using more CPU for guest I/O processing? There is nothing else running on my machine. I have analyzed the amount of CPU that each KVM thread is using, and I can see that the thread that is running the VCPU of the routing VM which is processing interrupts of the e1000 virtual network card is using the highest amount of CPU. Is there any way that I can optimize my network set-up? Maybe some specific configuration of the e1000 driver within the guest? Are there any known issues with this? There are known issues with lack of flow control while sending packets out of a guest. If the guest runs tcp that tends to correct for it, but if you run a lower level protocol that doesn't have its own flow control, the guest may spend a lot of cpu generating packets that are eventually dropped. We are working on fixing this. For the tests I run now (with vlan= enabled) I am actually using both TCP and UDP, and I see the problem with virtio_net for both protocols. 
What I am wondering about though is that I do not seem to have any problems if I communicate directly between the two guests (if I plug them into the same bridge and put them onto the same network), so why do I only see the problem of stalling network communication when there is a routing VM in the network path? Is this just because the system is even more overloaded in that case? Or could this be an issue related to a dual-NIC configuration, or the fact that I run multiple bridges on the same physical machine?

When you say "We are working on fixing this" - which code parts are you working on? Is this in the QEMU network I/O processing code, or is this virtio_net related?

I also see very different CPU utilization and network throughput figures when pinning threads to CPU cores using taskset. At one point I managed to double the throughput, but I could not reproduce that setup for some reason. What are the major issues that I would
RE: Network throughput limits for local VM - VM communication
Subject: Re: Network throughput limits for local VM - VM communication

On 06/17/2009 11:12 AM, Fischer, Anna wrote:

For the tests I run now (with vlan= enabled) I am actually using both TCP and UDP, and I see the problem with virtio_net for both protocols. What I am wondering about though is that I do not seem to have any problems if I communicate directly between the two guests (if I plug them into the same bridge and put them onto the same network), so why do I only see the problem of stalling network communication when there is a routing VM in the network path? Is this just because the system is even more overloaded in that case? Or could this be an issue related to a dual-NIC configuration, or the fact that I run multiple bridges on the same physical machine?

My guess is that somewhere there's a queue that's shorter than the virtio queue, or its usable size fluctuates (because it is shared with something else). So TCP flow control doesn't work, and UDP doesn't have a chance.

When you say "We are working on fixing this" - which code parts are you working on? Is this in the QEMU network I/O processing code, or is this virtio_net related?

tap, virtio, qemu, maybe more. It's a difficult problem.

Retry with the fixed configuration? You mean setting the vlan= parameter? I have already used the vlan= parameter for the latest tests, so the CPU utilization issues I am talking about are happening with that configuration.

Yeah. Can you compare total data sent and received as seen by the guests? That would confirm that packets being dropped cause the slowdown.

Yes, I will check on that and report back. It still does not answer my question of why I only see low CPU utilization numbers with the e1000 virtual device model. There is no network stalling, packet drops, or any other obvious issue when running with that model, but I am still seeing low CPU utilization numbers.
What is preventing KVM from using more of the host CPU capacity when the host is not doing anything else but running virtual machines? Is there any way that I can get higher CPU utilization out of KVM?

Thanks,
Anna
RE: Network throughput limits for local VM - VM communication
Subject: Re: Network throughput limits for local VM - VM communication

On Wednesday 10 June 2009, Fischer, Anna wrote:

Have you tried eliminating VLANs to simplify the setup?

No - but there is a related bug in the tun/tap interface (well, it is not really a bug but simply the way tun/tap works) that will cause packets to be replicated on all the tap interfaces (across all bridges attached to those) if I do not configure VLANs. This will result in a system that is even more overloaded. I had discovered this a while back when running UDP stress tests under 10G.

Not sure I understand. Do you mean you have all three guests connected to the same bridge? If you want the router guest to be the only connection, you should not connect the two bridges anywhere else, so I don't see how packets can go from one bridge to the other one, except through the router.

I am using two bridges, and yes, in theory, the router should be the only connection between the two guests. However, without VLANs, the tun interface will pass packets to all tap interfaces. It has to, as it doesn't know which one the packet has to go to. It does not look at packets; it simply copies buffers from userspace to the tap interface in the kernel. The tap interface then eventually drops the packet if the MAC address does not match its own. So packets will not actually go across both bridges, because the tap interface that should not receive the packet does drop it. However, it does receive the packet and processes it to some extent, which causes some overhead. As I was told by someone at KVM/Red Hat, this does not happen when using VLANs, as then there is a direct mapping between any tun-tap device and so no packet replication across multiple tap devices.

Does it change when the guests communicate over a -net socket interface with your router, instead of the -net tap + bridge in the host?
I have not tried this - I need the bridge in the network data path for some testing, so using the -net socket interface would not solve my problem.

I did not mean this to solve your problem, but to hunt down the bug. If the problem only exists with the host bridge device, we should look there, but if it persists, we can probably rule out the tap, bridge and vlan code in the host as the problem source.

Yes, I understand you were trying to help, and using the -net socket interface would help to narrow down where the problem could be. I just have not yet managed to set this up, but I might do so if I find the time in the next few days. I was hoping that other people might have seen the same issues I see, but unfortunately I did not get that many replies/suggestions on this issue from the list at all.

However, I have just today managed to get around this bug by using the e1000 QEMU emulated NIC model, and this seems to do the trick. Now the throughput is still very low, but that might simply be because my system is too weak. When using the e1000 model instead of rtl8139 or virtio, I do not have any network crashes any more.

That could either indicate a bug in rtl8139 and virtio, or that the specific timing of the e1000 model hides this bug. What happens if only one side uses e1000 while the other still uses virtio? What about any of the other models?

Good question. I will try this out and post the results.

Cheers,
Anna
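For reference, the -net socket test suggested above can be wired up without any host bridge at all: qemu's socket backend connects two virtual hubs over a TCP connection. A sketch (ports are arbitrary, all other guest options elided, and this assumes the -net socket syntax of qemu of that era):

```shell
# Router guest: one listening socket per virtual hub (qemu vlan)
qemu-kvm ... -net nic,vlan=0 -net socket,vlan=0,listen=127.0.0.1:8010 \
             -net nic,vlan=1 -net socket,vlan=1,listen=127.0.0.1:8011

# Endpoint guests: each connects to one of the router's sockets
qemu-kvm ... -net nic -net socket,connect=127.0.0.1:8010
qemu-kvm ... -net nic -net socket,connect=127.0.0.1:8011
```

If the stalls disappear in this setup, the host-side tap/bridge path becomes the prime suspect; if they persist, the problem is more likely in qemu or the virtual NIC models.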
RE: Network throughput limits for local VM - VM communication
Subject: Re: Network throughput limits for local VM - VM communication Fischer, Anna wrote: I am using two bridges, and yes, in theory, the router should be the only connection between the two guests. However, without VLANs, the tun interface will pass packets to all tap interfaces. It has to, as it doesn't know to which one the packet has to go. It does not look at packets, it simply copies buffers from userspace to the tap interface in the kernel. The tap interface then eventually drops the packet if the MAC address does not match its own. So packets will not actually go across both bridges, because the tap interface that should not receive the packet does drop it. However, it does receive the packet and processes it to some extent, which causes some overhead. As I was told by someone at KVM/RedHat, this does not happen when using VLANs, as then there will be a direct mapping between any tun-tap device and so no packet replication across multiple tap devices. This only happens if the receiving tap never sends out packets. If the tap interface does send out packets, the bridge will associate their MAC address with that interface, and future packets will only be forwarded there. Is this your scenario? Not sure I understand. As far as I can see the packets are replicated on the tun/tap interface before they actually enter the bridge. So this is not about the bridge learning MAC addresses and flooding frames to unknown destinations. So I think this is different. Anna
RE: Network throughput limits for local VM - VM communication
Subject: Re: Network throughput limits for local VM - VM communication On Tuesday 09 June 2009, Fischer, Anna wrote: I have tried using virtio and using the emulated QEMU virtual NICs. It does not make a difference. It seems as if there is an overflow somewhere when QEMU/virtio cannot cope with the network load any more, and then the virtual interfaces don't seem to transmit anything anymore. It seems to mostly work again when I shut down and start up the interfaces of the router inside of the guest. I use two bridges (and VLANs) that pass packets between sending/receiving guests and the routing guest. The set-up works fine for simple ping and other communication that is low-throughput type traffic. Have you tried eliminating VLAN to simplify the setup? No - but there is a related bug in the tun/tap interface (well, it is not really a bug but simply the way tun/tap works) that will cause packets to be replicated on all the tap interfaces (across all bridges attached to those) if I do not configure VLANs. This will result in a system that is even more overloaded. I had discovered this a while back when running UDP stress tests under 10G. Does it change when the guests communicate over a -net socket interface with your router instead of the -net tap + bridge in the host? I have not tried this - I need the bridge in the network data path for some testing, so using the -net socket interface would not solve my problem. However, I have just today managed to get around this bug by using the e1000 QEMU emulated NIC model and this seems to do the trick. Now the throughput is still very low, but that might simply be because my system is too weak. When using the e1000 model instead of rtl8139 or virtio, I do not have any network crashes any more. I have been looking through the RedHat bug lists to search for some hints today, and it seems as if there are a lot of people seeing the network under KVM break down under heavy load.
I think that this is something that needs some further investigation. I can provide a more detailed system set-up etc.; it should be easy to reproduce this. Thanks for your help, Anna
Network throughput limits for local VM - VM communication
I am testing network throughput between two guests residing on the same physical machine. I use a bridge to pass packets between those guests and the virtio NIC model. I am wondering why the throughput only goes up to about 970Mbps. Should we not be able to achieve much higher throughput if the packets do not actually go out on the physical wire? What are the limitations for throughput performance under KVM/virtio? I can see that by default the interfaces (the tap devices) have TX queue length set to 500, and I wonder if increasing this would make any difference? Also, are there other things I would need to consider to achieve higher throughput numbers for local guest - guest communication? The CPU is not maxed out at all, and shows as being idle for most of the time while the throughput does not increase any more. I run KVM under standard Fedora Core 10 with a Linux kernel 2.6.27. Thanks, Anna
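On the txqueuelen question above: the default of 500 can be raised on the host, roughly as sketched below. The device name tap0 is a placeholder, and whether a longer queue actually helps depends on where the bottleneck is (it mainly avoids drops under bursty load, not a hard throughput cap).

```shell
# Show the current queue length of the tap device (default is 500)
ifconfig tap0 | grep -o 'txqueuelen:[0-9]*'

# Raise it (needs root); 'ip link' works as well where iproute2 is available
ifconfig tap0 txqueuelen 1000
# ip link set dev tap0 txqueuelen 1000
```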
RE: Network I/O performance
Subject: Re: Network I/O performance Fischer, Anna wrote: Subject: Re: Network I/O performance Fischer, Anna wrote: I am running KVM with Fedora Core 8 on a 2.6.23 32-bit kernel. I use the tun/tap device model and the Linux bridge kernel module to connect my VM to the network. I have 2 10G Intel 82598 network devices (with the ixgbe driver) attached to my machine and I want to do packet routing in my VM (the VM has two virtual network interfaces configured). Analysing the network performance of the standard QEMU emulated NICs, I get less than 1G of throughput on those 10G links. Surprisingly though, I don't really see CPU utilization being maxed out. This is a dual core machine, and mpstat shows me that both CPUs are about 40% idle. My VM is more or less unresponsive due to the high network processing load while the host OS still seems to be in good shape. How can I best tune this setup to achieve best possible performance with KVM? I know there is virtIO and I know there is PCI pass-through, but those models are not an option for me right now. How many cpus are assigned to the guest? If only one, then 40% idle equates to 100% of a core for the guest and 20% for housekeeping. No, the machine has a dual core CPU and I have configured the guest with 2 CPUs. So I would want to see KVM using up to 200% of CPU, ideally. There is nothing else running on that machine. Well, it really depends on the workload, whether it can utilize both vcpus. If this is the case, you could try pinning the vcpu thread (info cpus from the monitor) to one core. You should then see 100%/20% cpu load distribution. wrt emulated NIC performance, I'm guessing you're not doing tcp? If you were we might do something with TSO. No, I am measuring UDP throughput performance. I have now tried using a different NIC model, and the e1000 model seems to achieve slightly better performance (CPU goes up to 110% only though).
I have also been running virtio now, and while its performance with 2.6.20 was very poor too, when changing the guest kernel to 2.6.30, I get a reasonable performance and higher CPU utilization (e.g. it goes up to 180-190%). I have to throttle the incoming bandwidth though, because as soon as I go over a certain threshold, CPU goes back down to 90% and throughput goes down too. Yes, there's a known issue with UDP, where we don't report congestion and the queues start dropping packets. There's a patch for tun queued for the next merge window; you'll need a 2.6.31 host for that IIRC (Herbert?) I have not seen this with Xen/VMware where I mostly managed to max out CPU completely before throughput performance did not go up anymore. I have also realized that when using the tun/tap configuration with a bridge, packets are replicated on all tap devices when QEMU writes packets to the tun interface. I guess this is a limitation of tun/tap as it does not know to which tap device the packet has to go. The tap device then eventually drops packets when the destination MAC is not its own, but it still receives the packet which causes more overhead in the system overall. Right, I guess you'd see this with a real switch as well? Maybe have your guest send a packet out once in a while so the bridge can learn its MAC address (we do this after migration, for example). No, this is not about the bridge - packets are replicated by tun/tap as far as I can see. In fact I run two bridges, and attach my two tap interfaces to those (one tap per bridge to connect it to the external network). And packets that should actually only go to one bridge, are replicated on the other one, too. This is far from ideal, but I guess the issue is that the tun/tap interface is a 1-N mapping, so there is not much you can do.
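The vcpu-pinning suggestion in the thread above can be done with taskset once the vcpu's host thread id is known. This is only a sketch: the monitor output format varies between versions, and the thread id 4321 below is a made-up example.

```shell
# In the QEMU monitor, 'info cpus' prints one line per vcpu including
# its host thread id, e.g.:
#   * CPU #0: pc=0x... thread_id=4321
# Pin that thread to host core 0 (4321 is a placeholder):
taskset -p -c 0 4321

# Verify the new affinity mask:
taskset -p 4321
```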
RE: tun/tap and Vlans
Subject: Re: tun/tap and Vlans Lukas Kolbe wrote: Right, I guess you'd see this with a real switch as well? Maybe have your guest send a packet out once in a while so the bridge can learn its MAC address (we do this after migration, for example). Does this mean that it is not possible to have each tun device in a separate bridge that serves a separate Vlan? We have experienced a strange problem that we couldn't yet explain. Given this setup:

Guest                                 Host
kvm1 --- eth0 -+- bridge0 --- vlan1 \
               |                     +-- eth0
kvm2 -+- eth0 -/                    /
      \- eth1 --- bridge1 --- vlan2

When sending packets through kvm2/eth0, they appear on both bridges and also vlans, also when sending packets through kvm2/eth1. When the guest has only one interface, the packets only appear on one bridge and one vlan as it's supposed to be. Can this be worked around? This is strange. Can you post the command line you used to start kvm2? This is exactly my scenario as well. When QEMU sends packets through the tun interface coming from a VM then those will be passed to both tap devices of that VM, simply because it doesn't know where to send the packet to. It just copies the buffer to the tap interface. The tap interface then eventually discards the packet if the MAC address doesn't match its own. What you would need is a 1:1 mapping, e.g. one tun interface per tap device.
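For reference, the one-bridge-per-VLAN layout being discussed can be sketched with the bridge-utils/vlan tools of that era. All interface names below are placeholders, and the tap devices are assumed to be created by QEMU's ifup script or tunctl.

```shell
# One bridge per VLAN, one tap per bridge, so each guest NIC gets a
# dedicated tap -> bridge -> vlan path (needs root).
vconfig add eth0 1              # creates eth0.1
vconfig add eth0 2              # creates eth0.2

brctl addbr bridge0
brctl addif bridge0 eth0.1
brctl addif bridge0 tap0        # guest NIC 1

brctl addbr bridge1
brctl addif bridge1 eth0.2
brctl addif bridge1 tap1        # guest NIC 2

ifconfig bridge0 up
ifconfig bridge1 up
```

Note that this arranges the bridging side; the replication described in the thread happens one level lower, between QEMU's tun file descriptor and the tap devices, which is why the mapping has to be 1:1 per guest NIC.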
KVM VT-d2?
Does KVM already take advantage of Intel VT-d2 features, e.g. interrupt remapping support? Has anyone verified how it improves interrupt delivery for PCI pass-through devices? Thanks, Anna
RE: KVM VT-d2?
I thought that one use case of VT-d2 interrupt remapping was to be able to safely and more efficiently deliver interrupts to the CPU that runs the particular VCPU of the guest that owns the I/O device that issues the interrupt. Shouldn't there at least be some performance (e.g. latency) improvement doing the remapping and checking in HW with a predefined table rather than multiplexing this in software in the hypervisor layer? -Original Message- From: Kay, Allen M [mailto:allen.m@intel.com] Sent: 14 May 2009 15:02 To: Fischer, Anna; kvm@vger.kernel.org Subject: RE: KVM VT-d2? We have verified that VT-d2 features work with PCI passthrough on KVM. To enable it, you need to turn on interrupt remapping in the kernel config. Interrupt remapping is a security/isolation feature where interrupt delivery is qualified with the device's bus/device/function in the interrupt remapping table entry when source ID checking is turned on. It does not directly inject interrupts into the guest OS. -Original Message- From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On Behalf Of Fischer, Anna Sent: Thursday, May 14, 2009 2:53 PM To: kvm@vger.kernel.org Subject: KVM VT-d2? Does KVM already take advantage of Intel VT-d2 features, e.g. interrupt remapping support? Has anyone verified how it improves interrupt delivery for PCI pass-through devices? Thanks, Anna
virtio_net with RSS?
Are there any plans to enhance virtio_net with receive-side scaling capabilities, so that an SMP guest OS can balance its network processing load more equally across multiple CPUs? Thanks, Anna
PCI pass-through of multi-function device
Does KVM allow passing through a full multi-function PCI device to a guest, making that device appear as a whole multi-function device rather than as multiple single-function PCI devices (e.g. Xen only does the latter, where all PCI devices appear with function ID 0 in the guest)? Thanks, Anna
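A quick way to check what the guest actually sees is to compare function numbers on both sides. This is a sketch; the slot 00:19 is a placeholder.

```shell
# On the host: list every function of one slot, e.g. 00:19.0, 00:19.1, ...
lspci -s 00:19

# In the guest: if the device was passed through as a whole multi-function
# device, the same .0/.1/... function numbers appear on one slot; with
# per-function assignment each function shows up on its own slot as
# function 0.
lspci
```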
RE: Network I/O performance
Subject: Re: Network I/O performance Fischer, Anna wrote: I am running KVM with Fedora Core 8 on a 2.6.23 32-bit kernel. I use the tun/tap device model and the Linux bridge kernel module to connect my VM to the network. I have 2 10G Intel 82598 network devices (with the ixgbe driver) attached to my machine and I want to do packet routing in my VM (the VM has two virtual network interfaces configured). Analysing the network performance of the standard QEMU emulated NICs, I get less than 1G of throughput on those 10G links. Surprisingly though, I don't really see CPU utilization being maxed out. This is a dual core machine, and mpstat shows me that both CPUs are about 40% idle. My VM is more or less unresponsive due to the high network processing load while the host OS still seems to be in good shape. How can I best tune this setup to achieve best possible performance with KVM? I know there is virtIO and I know there is PCI pass-through, but those models are not an option for me right now. How many cpus are assigned to the guest? If only one, then 40% idle equates to 100% of a core for the guest and 20% for housekeeping. No, the machine has a dual core CPU and I have configured the guest with 2 CPUs. So I would want to see KVM using up to 200% of CPU, ideally. There is nothing else running on that machine. If this is the case, you could try pinning the vcpu thread (info cpus from the monitor) to one core. You should then see 100%/20% cpu load distribution. wrt emulated NIC performance, I'm guessing you're not doing tcp? If you were we might do something with TSO. No, I am measuring UDP throughput performance. I have now tried using a different NIC model, and the e1000 model seems to achieve slightly better performance (CPU goes up to 110% only though). I have also been running virtio now, and while its performance with 2.6.20 was very poor too, when changing the guest kernel to 2.6.30, I get a reasonable performance and higher CPU utilization (e.g.
it goes up to 180-190%). I have to throttle the incoming bandwidth though, because as soon as I go over a certain threshold, CPU goes back down to 90% and throughput goes down too. I have not seen this with Xen/VMware where I mostly managed to max out CPU completely before throughput performance did not go up anymore. I have also realized that when using the tun/tap configuration with a bridge, packets are replicated on all tap devices when QEMU writes packets to the tun interface. I guess this is a limitation of tun/tap as it does not know to which tap device the packet has to go. The tap device then eventually drops packets when the destination MAC is not its own, but it still receives the packet which causes more overhead in the system overall. I have not yet experimented much with pinning VCPU threads to cores. I will do that as well.
RE: Problem doing pci passthrough of the network card without VT-d
Are you expecting this to work using the 1:1 mapping for direct device assignment? I use a similar setup (e.g. dma=none and no VT-d) but a different NIC (Intel 82598 10G) and a different driver (ixgbe). I see the same messages, but I also don't get the device to work in the guest (while it does work in the host OS). In fact I don't get any errors on the guest side, so it is hard to track down what is wrong. No I/O is happening. The guest cannot transmit/receive any packets to/from those NICs. The interface packet counters stay at 0. I see an error in QEMU saying invalid memtype, and it also seems to have trouble assigning IRQs. assigned_dev_enabled_msix() fails with Invalid Argument, but on the guest side I can see that MSI-X is configured properly under /proc/interrupts. I use the latest KVM 2.6.30 tree in both host OS and guest OS. -Original Message- From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On Behalf Of Passera, Pablo R Sent: 12 May 2009 11:22 To: kvm@vger.kernel.org Subject: RE: Problem doing pci passthrough of the network card without VT-d One update on this. I disabled VT-d from the BIOS and now I am not getting the DMAR error messages in dmesg, but the board still does not work in the guest. Any help is welcomed.
e1000e 0000:00:19.0: PCI INT A disabled pci-stub 0000:00:19.0: PCI INT A -> GSI 20 (level, low) -> IRQ 20 pci-stub 0000:00:19.0: irq 29 for MSI/MSI-X pci-stub 0000:00:19.0: irq 29 for MSI/MSI-X pci-stub 0000:00:19.0: irq 29 for MSI/MSI-X pci-stub 0000:00:19.0: irq 29 for MSI/MSI-X pci-stub 0000:00:19.0: irq 29 for MSI/MSI-X pci-stub 0000:00:19.0: irq 29 for MSI/MSI-X pci-stub 0000:00:19.0: irq 29 for MSI/MSI-X pci-stub 0000:00:19.0: irq 29 for MSI/MSI-X pci-stub 0000:00:19.0: irq 29 for MSI/MSI-X pci-stub 0000:00:19.0: irq 29 for MSI/MSI-X Regards, Pablo -Original Message- From: Passera, Pablo R Sent: Tuesday, May 12, 2009 12:14 PM To: kvm@vger.kernel.org Subject: Problem doing pci passthrough of the network card without VT-d Hi List, I am having problems doing pci passthrough to a network card without using VT-d. The card is present in the guest but with a different model (Intel Corporation 82801I Gigabit Ethernet Controller (rev 2)) and it does not work. The qemu line that I used is: ./devel/bin/qemu-system-x86_64 -hda ./dm.img -m 256 -pcidevice host=00:19.0,dma=none -net none Before running qemu I did echo 8086 294c > /sys/bus/pci/drivers/pci-stub/new_id echo 0000:00:19.0 > /sys/bus/pci/drivers/e1000e/unbind echo 0000:00:19.0 > /sys/bus/pci/drivers/pci-stub/bind This is the lspci -tv output -[0000:00]-+-00.0 Intel Corporation 82X38/X48 Express DRAM Controller +-01.0-[0000:01]----00.0 nVidia Corporation G80 [GeForce 8800 GTX] +-19.0 Intel Corporation 82566DC-2 Gigabit Network Connection +-1a.0 Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #4 +-1a.1 Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #5 +-1a.2 Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #6 +-1a.7 Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #2 +-1b.0 Intel Corporation 82801I (ICH9 Family) HD Audio Controller +-1c.0-[0000:02]-- +-1c.4-[0000:03]----00.0 Marvell Technology Group Ltd.
88SE6121 SATA II Controller +-1d.0 Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #1 +-1d.1 Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #2 +-1d.2 Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #3 +-1d.7 Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #1 +-1e.0-[0000:04]----03.0 Texas Instruments TSB43AB22/A IEEE-1394a-2000 Controller (PHY/Link) +-1f.0 Intel Corporation 82801IR (ICH9R) LPC Interface Controller +-1f.2 Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 4 port SATA IDE Controller +-1f.3 Intel Corporation 82801I (ICH9 Family) SMBus Controller \-1f.5 Intel Corporation 82801I (ICH9 Family) 2 port SATA IDE Controller I am getting the following error in host dmesg e1000e 0000:00:19.0: PCI INT A disabled pci-stub 0000:00:19.0: PCI INT A -> GSI 20 (level, low) -> IRQ 20 pci-stub 0000:00:19.0: irq 29 for MSI/MSI-X pci-stub 0000:00:19.0: irq 29 for MSI/MSI-X pci-stub 0000:00:19.0: irq 29 for MSI/MSI-X pci-stub 0000:00:19.0: irq 29 for MSI/MSI-X pci-stub 0000:00:19.0: irq 29 for MSI/MSI-X DMAR:[DMA Read] Request device [00:19.0] fault addr baee000 DMAR:[fault reason 02] Present bit in context entry is clear pci-stub 0000:00:19.0: irq 29 for MSI/MSI-X pci-stub 0000:00:19.0: irq 29 for MSI/MSI-X pci-stub 0000:00:19.0: irq 29 for MSI/MSI-X pci-stub 0000:00:19.0: irq 29 for MSI/MSI-X pci-stub 0000:00:19.0: irq 29 for MSI/MSI-X DMAR:[DMA Read] Request device [00:19.0] fault
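The guest-side MSI-X check mentioned earlier in this thread can be done roughly as follows. The PCI address 00:05.0 and the exact lspci output wording are assumptions; the address of the assigned device inside the guest will differ per setup.

```shell
# Inside the guest: MSI/MSI-X vectors show up as PCI-MSI entries
grep -i msi /proc/interrupts

# Confirm the MSI-X capability state on the assigned device
# (a working device typically shows "MSI-X: Enable+ ...")
lspci -vv -s 00:05.0 | grep -i msi-x
```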
Network I/O performance
I am running KVM with Fedora Core 8 on a 2.6.23 32-bit kernel. I use the tun/tap device model and the Linux bridge kernel module to connect my VM to the network. I have 2 10G Intel 82598 network devices (with the ixgbe driver) attached to my machine and I want to do packet routing in my VM (the VM has two virtual network interfaces configured). Analysing the network performance of the standard QEMU emulated NICs, I get less than 1G of throughput on those 10G links. Surprisingly though, I don't really see CPU utilization being maxed out. This is a dual core machine, and mpstat shows me that both CPUs are about 40% idle. My VM is more or less unresponsive due to the high network processing load while the host OS still seems to be in good shape. How can I best tune this setup to achieve best possible performance with KVM? I know there is virtIO and I know there is PCI pass-through, but those models are not an option for me right now.
RE: [PATCH 0/13 v7] PCI: Linux kernel SR-IOV support
From: linux-pci-ow...@vger.kernel.org [mailto:linux-pci- ow...@vger.kernel.org] On Behalf Of Jesse Barnes Sent: 16 December 2008 23:24 To: Yu Zhao Cc: linux-...@vger.kernel.org; Chiang, Alexander; Helgaas, Bjorn; grund...@parisc-linux.org; g...@kroah.com; mi...@elte.hu; matt...@wil.cx; randy.dun...@oracle.com; rdre...@cisco.com; ho...@verge.net.au; ying...@kernel.org; linux-ker...@vger.kernel.org; kvm@vger.kernel.org; virtualizat...@lists.linux-foundation.org Subject: Re: [PATCH 0/13 v7] PCI: Linux kernel SR-IOV support On Friday, November 21, 2008 10:36 am Yu Zhao wrote: Greetings, The following patches are intended to support the SR-IOV capability in the Linux kernel. With these patches, people can turn a PCI device with the capability into multiple ones from a software perspective, which will benefit KVM and achieve other purposes such as QoS, security, etc. The Physical Function and Virtual Function drivers using the SR-IOV APIs will come soon! Major changes from v6 to v7: 1, remove boot-time resource rebalancing support. (Greg KH) 2, emit uevent when the PF driver is loaded. (Greg KH) 3, put SR-IOV callback function into the 'pci_driver'. (Matthew Wilcox) 4, register SR-IOV service at the PF loading stage. 5, remove unnecessary APIs (pci_iov_enable/disable). Thanks for your patience with this, Yu, I know it's been a long haul. :) I applied 1-9 to my linux-next branch; and at least patch #10 needs a respin, so can you re-do 10-13 as a new patch set? On re-reading the last thread, there was a lot of smoke, but very little fire afaict. The main questions I saw were: 1) do we need SR-IOV at all? why not just make each subsystem export devices to guests? This is a bit of a red herring. Nothing about SR-IOV prevents us from making subsystems more v12n friendly. And since SR-IOV is a hardware feature supported by devices these days, we should make Linux support it. 2) should the PF/VF drivers be the same or not? Again, the SR-IOV patchset and PCI spec don't dictate this.
We're free to do what we want here. 3) should VF devices be represented by pci_dev structs? Yes. (This is an easy one :) 4) can VF devices be used on the host? Yet again, SR-IOV doesn't dictate this. Developers can make PF/VF combo drivers or split them, and export the resulting devices however they want. Some subsystem work may be needed to make this efficient, but SR-IOV itself is agnostic about it. So overall I didn't see many objections to the actual code in the last post, and the issues above certainly don't merit a NAK IMO... I have two minor comments on this topic. 1) Currently the PF driver is called before the kernel initializes VFs and their resources, and the current API does not allow the PF driver to detect that easily if the allocation of the VFs and their resources has succeeded or not. It would be quite useful if the PF driver gets notified when the VFs have been created successfully as it might have to do further device-specific work *after* IOV has been enabled. 2) Configuration of SR-IOV: the current API allows to enable/disable VFs from userspace via SYSFS. At the moment I am not quite clear what exactly is supposed to control these capabilities. This could be Linux tools or, on a virtualized system, hypervisor control tools. One thing I am missing though is an in-kernel API for this which I think might be useful. After all the PF driver controls the device, and, for example, when a device error occurs (e.g. a hardware failure which only the PF driver will be able to detect, not Linux), then the PF driver might have to de-allocate all resources, shut down VFs and reset the device, or something like that. In that case the PF driver needs to have a way to notify the Linux SR-IOV code about this and initiate cleaning up of VFs and their resources. At the moment, this would have to go through userspace, I believe, and I think that is not an optimal solution. Yu, do you have an opinion on how this would be realized?
Anna
RE: [PATCH 0/13 v7] PCI: Linux kernel SR-IOV support
From: Zhao, Yu [mailto:yu.z...@intel.com] Sent: 18 December 2008 02:14 To: Fischer, Anna Cc: Jesse Barnes; linux-...@vger.kernel.org; Chiang, Alexander; Helgaas, Bjorn; grund...@parisc-linux.org; g...@kroah.com; mi...@elte.hu; matt...@wil.cx; randy.dun...@oracle.com; rdre...@cisco.com; ho...@verge.net.au; ying...@kernel.org; linux- ker...@vger.kernel.org; kvm@vger.kernel.org; virtualizat...@lists.linux-foundation.org Subject: Re: [PATCH 0/13 v7] PCI: Linux kernel SR-IOV support Fischer, Anna wrote: I have two minor comments on this topic. 1) Currently the PF driver is called before the kernel initializes VFs and their resources, and the current API does not allow the PF driver to detect that easily if the allocation of the VFs and their resources has succeeded or not. It would be quite useful if the PF driver gets notified when the VFs have been created successfully as it might have to do further device-specific work *after* IOV has been enabled. If the VF allocation fails in the PCI layer, then the SR-IOV core will invoke the callback again to notify the PF driver with a zero VF count. The PF driver does not have to be concerned about this even if the PCI layer code fails (and actually that is very rare). Yes, this is good. And I'm not sure why the PF driver wants to do further work *after* the VF is allocated. Does this mean the PF driver has to set up some internal resources related to SR-IOV/VF? If yes, I suggest the PF driver do it before VF allocation. The design philosophy of SR-IOV/VF is that the VF is treated as a hot-plug device, which means it should be immediately usable by the VF driver (e.g. the VF driver is pre-loaded) after it appears in the PCI subsystem. If that is not the purpose, then the PF driver should handle it without depending on SR-IOV, right? Yes, you are right.
In fact I was assuming in this case that the PF driver might have to allocate VF-specific resources before a PF - VF communication can be established, but this can be done before the VF PCI device appears, so I was wrong about this. The current API is sufficient to handle all of this, so I am withdrawing my concern here ;-) If you could elaborate on your SR-IOV PF/VF h/w specific requirement, it would help me to answer this question :-) 2) Configuration of SR-IOV: the current API allows to enable/disable VFs from userspace via SYSFS. At the moment I am not quite clear what exactly is supposed to control these capabilities. This could be Linux tools or, on a virtualized system, hypervisor control tools. This depends on the user application, you know, which depends on the usage environment (i.e. native, KVM or Xen). One thing I am missing though is an in-kernel API for this which I think might be useful. After all the PF driver controls the device, and, for example, when a device error occurs (e.g. a hardware failure which only the PF driver will be able to detect, not Linux), then the PF driver might have to de-allocate all resources, shut down VFs and reset the device, or something like that. In that case the PF driver needs to have a way to notify the Linux SR-IOV code about this and initiate cleaning up of VFs and their resources. At the moment, this would have to go through userspace, I believe, and I think that is not an optimal solution. Yu, do you have an opinion on how this would be realized? Yes, the PF driver can use pci_iov_unregister to disable SR-IOV in case a fatal error occurs. This function also sends a notification to user level through 'uevent' so the user application can be aware of the change. If pci_iov_unregister is accessible to kernel drivers then this is in fact all we need. Thanks for the clarification. I think the patchset looks very good.
Acked-by: Anna Fischer anna.fisc...@hp.com
RE: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
Subject: Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support Importance: High On Fri, Nov 07, 2008 at 11:17:40PM +0800, Yu Zhao wrote: While we are arguing about what the software model for SR-IOV should be, let me ask two simple questions first: 1, What does SR-IOV look like? 2, Why do we need to support it? I don't think we need to worry about those questions, as we can see what the SR-IOV interface looks like by looking at the PCI spec, and we know Linux needs to support it, as Linux needs to support everything :) (note, community members that can not see the PCI specs at this point in time, please know that we are working on resolving these issues, hopefully we will have some good news within a month or so.) As you know the Linux kernel is the base of various virtual machine monitors such as KVM, Xen, OpenVZ and VServer. We need SR-IOV support in the kernel because mostly it helps high-end users (IT departments, HPC, etc.) to share limited hardware resources among hundreds or even thousands of virtual machines and hence reduce cost. How can we make these virtual machine monitors utilize the advantages of SR-IOV without spending too much effort while remaining architecturally correct? I believe making a VF appear as close as possible to a normal PCI device (struct pci_dev) is the best way in the current situation, because this is not only what the hardware designers expect us to do but also the usage model that KVM, Xen and other VMMs have already supported. But would such an api really take advantage of the new IOV interfaces that are exposed by the new device type? I agree with what Yu says. The idea is to have hardware capabilities to virtualize a PCI device in a way that those virtual devices can represent full PCI devices. The advantage of that is that those virtual devices can then be used like any other standard PCI device, meaning we can use existing OS tools, configuration mechanisms etc. to start working with them.
Also, when using a virtualization-based system, e.g. Xen or KVM, we do not need to introduce new mechanisms to make use of SR-IOV, because we can handle VFs as full PCI devices.

A virtual PCI device in hardware (a VF) can be as powerful or complex as you like, or it can be very simple. But the big advantage of SR-IOV is that the hardware presents a complete PCI device to the OS - as opposed to some resources, or queues, that need specific new configuration and assignment mechanisms in order to be used with a guest OS (like, for example, VMDq or similar technologies).

Anna
--
To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
On Thu, Nov 06, 2008 at 08:41:53AM -0800, H L wrote:

I have not modified any existing drivers, but instead I threw together a bare-bones module enabling me to make a call to pci_iov_register() and then poke at an SR-IOV adapter's /sys entries for which no driver was loaded. It appears from my perusal thus far that drivers using these new SR-IOV patches will require modification; i.e. the driver associated with the Physical Function (PF) will be required to make the pci_iov_register() call along with the requisite notify() function. Essentially this suggests to me a model in which the PF driver performs any global actions or setup on behalf of VFs before enabling them, after which VF drivers could be associated.

Where would the VF drivers have to be associated? On the pci_dev level or on a higher one?

A VF appears to the Linux OS as a standard (full, additional) PCI device. The driver is associated in the same way as for a normal PCI device. Ideally, you would use SR-IOV devices on a virtualized system, for example, using Xen. A VF can then be assigned to a guest domain as a full PCI device.

Will all drivers that want to bind to a VF device need to be rewritten?

Currently, any vendor providing an SR-IOV device needs to provide a PF driver and a VF driver that run on their hardware. A VF driver does not necessarily need to know much about SR-IOV but can just run on the presented PCI device. You might want to have a communication channel between the PF and VF drivers, though, for various reasons, if such a channel is not provided in hardware.

I have so far only seen Yu Zhao's 7-patch set. I've not yet looked at his subsequently tendered 15-patch set, so I don't know what has changed. The hardware/firmware implementation for any given SR-IOV-compatible device will determine the extent of the differences required between a PF driver and a VF driver.

Yeah, that's what I'm worried/curious about.
Without seeing the code for such a driver, how can we properly evaluate whether this infrastructure is the correct one and the proper way to do all of this?

Yu's API allows a PF driver to register with the Linux PCI code and use it to activate VFs and allocate their resources. The PF driver needs to be modified to work with that API. While you can argue about what that API is supposed to look like, it is clear that such an API is required in some form. The PF driver needs to know when VFs are active, as it might want to allocate further (device-specific) resources to VFs or initiate further (device-specific) configuration. While probably a lot of SR-IOV-specific code has to be in the PF driver, there is also support required from the Linux PCI subsystem, which is to some extent provided by Yu's patches.

Anna
RE: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
Subject: Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support

On Thu, Nov 06, 2008 at 05:38:16PM +, Fischer, Anna wrote:

On Thu, Nov 06, 2008 at 08:41:53AM -0800, H L wrote:

I have not modified any existing drivers, but instead I threw together a bare-bones module enabling me to make a call to pci_iov_register() and then poke at an SR-IOV adapter's /sys entries for which no driver was loaded. It appears from my perusal thus far that drivers using these new SR-IOV patches will require modification; i.e. the driver associated with the Physical Function (PF) will be required to make the pci_iov_register() call along with the requisite notify() function. Essentially this suggests to me a model in which the PF driver performs any global actions or setup on behalf of VFs before enabling them, after which VF drivers could be associated.

Where would the VF drivers have to be associated? On the pci_dev level or on a higher one?

A VF appears to the Linux OS as a standard (full, additional) PCI device. The driver is associated in the same way as for a normal PCI device. Ideally, you would use SR-IOV devices on a virtualized system, for example, using Xen. A VF can then be assigned to a guest domain as a full PCI device.

It's that second part that I'm worried about. How is that going to happen? Do you have any patches that show this kind of assignment?

That depends on your setup. Using Xen, you could assign the VF to a guest domain like any other PCI device, e.g. using PCI pass-through. For VMware and KVM, there are standard ways to do that, too. I currently don't see why SR-IOV devices would need any specific, non-standard mechanism for device assignment.

Will all drivers that want to bind to a VF device need to be rewritten?

Currently, any vendor providing an SR-IOV device needs to provide a PF driver and a VF driver that run on their hardware.

Are there any such drivers available yet?

I don't know.
A VF driver does not necessarily need to know much about SR-IOV but can just run on the presented PCI device. You might want to have a communication channel between the PF and VF drivers, though, for various reasons, if such a channel is not provided in hardware.

Agreed, but what does that channel look like in Linux? I have some ideas of what I think it should look like, but if people already have code, I'd love to see that as well.

At this point I would guess that this code is vendor-specific, as are the drivers. The issue I see is that most likely the drivers will run in different environments; for example, in Xen the PF driver runs in a driver domain while a VF driver runs in a guest VM. So a communication channel would need to be either Xen-specific or vendor-specific. Also, a guest using the VF might run Windows while the PF might be controlled under Linux.

Anna