Re: [Qemu-devel] Are there plans to achieve ram live Snapshot feature?
On Thu, Aug 15, 2013 at 10:26:36AM +0800, Wenchao Xia wrote:
On 2013-8-14 15:53, Stefan Hajnoczi wrote:
On Wed, Aug 14, 2013 at 3:54 AM, Wenchao Xia xiaw...@linux.vnet.ibm.com wrote:
On 2013-8-13 16:21, Stefan Hajnoczi wrote:
On Tue, Aug 13, 2013 at 4:53 AM, Wenchao Xia xiaw...@linux.vnet.ibm.com wrote:
On 2013-8-12 19:33, Stefan Hajnoczi wrote:
On Mon, Aug 12, 2013 at 12:26 PM, Alex Bligh a...@alex.org.uk wrote:
--On 12 August 2013 11:59:03 +0200 Stefan Hajnoczi stefa...@gmail.com wrote:

The idea that was discussed on qemu-de...@nongnu.org uses fork(2) to capture the state of guest RAM and then send it back to the parent process. The guest is only paused for a brief instant during fork(2) and can continue to run afterwards.

How would you capture the state of emulated hardware which might not be in the guest RAM?

Exactly the same way vmsave works today. It calls the device's save functions which serialize state to file. The difference between today's vmsave and the fork(2) approach is that QEMU does not need to wait for guest RAM to be written to file before resuming the guest.

I have a worry about what glib says: "On Unix, the GLib mainloop is incompatible with fork(). Any program using the mainloop must either exec() or exit() from the child without returning to the mainloop."

This is fine, the child just writes out the memory pages and exits. It never returns to the glib mainloop.

There is another way to do it: intercept the write in kvm.ko (or other kernel code). Since the key is to intercept the memory change, we can do it in userspace in TCG mode, so we can add the missing part in KVM mode. Another benefit of this approach is that the memory used can be controlled. For example, with ioctl(), set a fixed-size buffer in which the kernel code keeps the intercepted write data; this avoids frequent switches back to the userspace QEMU code. When the buffer is full, return to the userspace QEMU code and let it save the data to disk.

I haven't checked exactly how Intel guest mode handles page faults, so I can't estimate the performance cost of switching between guest mode and root mode, but it should not be worse than fork().

The fork(2) approach is portable, covers both KVM and TCG, and doesn't require kernel changes. A kvm.ko kernel change also won't be supported on existing KVM hosts. These are big drawbacks, and the kernel approach would need to be significantly better than plain old fork(2) to make it worthwhile.

I think the advantage is that memory usage is predictable, so a memory usage peak can be avoided by always saving the changed pages first. fork() does not know which pages are changed. I am not sure whether this would be a serious issue when the server's memory is heavily consumed, for example, a 24G host emulating two 11G guests to provide a powerful virtual server.

Memory usage is predictable, but guest uptime is unpredictable because it waits until memory is written out. This defeats the point of live savevm. The guest may be stalled arbitrarily.

I think it is adjustable. There is not much difference from fork(), except more precise control over the changed pages. The kernel intercepts the change and stores the changed page in another page, similar to fork(). When the userspace QEMU code executes, it saves some pages to disk. The buffer can be used like a lubricant: when buffer = MAX it equals fork() and the guest runs more lively; when buffer = 0 the guest runs less lively. I think it allows the user to find a good balance point with a parameter. It is harder to implement; I just want to show the idea.

You are right. You could set a bigger buffer size to increase guest uptime.

The fork child can minimize the chance of out-of-memory by using madvise(MADV_DONTNEED) after pages have been written out.

It seems there is no way to make sure the written-out pages are the changed pages, so there is a good chance the written one is unchanged and still in use by the other QEMU process.

The KVM dirty log tells you which pages were touched. The fork child process could give priority to the pages which have been touched by the guest. They must be written out and marked madvise(MADV_DONTNEED) as soon as possible. I haven't looked at the vmsave data format yet to see if memory pages can be saved in random order, but this might work. It reduces the likelihood of copy-on-write memory growth.

Stefan
--
To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
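A minimal sketch of the madvise(MADV_DONTNEED) step discussed in this thread, in Python for brevity (Linux only, Python >= 3.8). This is an illustration, not QEMU code: the page count, the 0xAA fill pattern, and the temp-file save target are assumptions. The point is simply that once a page has been written to the save file, the child can drop its copy so copy-on-write duplicates do not accumulate.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE
NPAGES = 4

# Stand-in for guest RAM: an anonymous *private* mapping, which is the
# case where MADV_DONTNEED drops the page and later reads return zeros.
ram = mmap.mmap(-1, PAGE * NPAGES, flags=mmap.MAP_PRIVATE)
for i in range(NPAGES):
    ram[i * PAGE] = 0xAA  # dirty every page

fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as save_file:
    for i in range(NPAGES):
        # Write the page out first...
        save_file.write(ram[i * PAGE:(i + 1) * PAGE])
        # ...then tell the kernel this copy is no longer needed, so
        # copy-on-write duplicates do not accumulate in the child.
        ram.madvise(mmap.MADV_DONTNEED, i * PAGE, PAGE)

print(ram[0])                                  # 0: the page was reclaimed
print(os.path.getsize(path) == PAGE * NPAGES)  # True: all pages saved
```

After MADV_DONTNEED an anonymous private page reads back zero-filled, which is an easy way to observe that the kernel actually reclaimed it.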
Re: [Qemu-devel] Are there plans to achieve ram live Snapshot feature?
On Wed, Aug 14, 2013 at 3:54 AM, Wenchao Xia xiaw...@linux.vnet.ibm.com wrote:
On 2013-8-13 16:21, Stefan Hajnoczi wrote:
On Tue, Aug 13, 2013 at 4:53 AM, Wenchao Xia xiaw...@linux.vnet.ibm.com wrote:
On 2013-8-12 19:33, Stefan Hajnoczi wrote:
On Mon, Aug 12, 2013 at 12:26 PM, Alex Bligh a...@alex.org.uk wrote:
--On 12 August 2013 11:59:03 +0200 Stefan Hajnoczi stefa...@gmail.com wrote:

The idea that was discussed on qemu-de...@nongnu.org uses fork(2) to capture the state of guest RAM and then send it back to the parent process. The guest is only paused for a brief instant during fork(2) and can continue to run afterwards.

How would you capture the state of emulated hardware which might not be in the guest RAM?

Exactly the same way vmsave works today. It calls the device's save functions which serialize state to file. The difference between today's vmsave and the fork(2) approach is that QEMU does not need to wait for guest RAM to be written to file before resuming the guest.

I have a worry about what glib says: "On Unix, the GLib mainloop is incompatible with fork(). Any program using the mainloop must either exec() or exit() from the child without returning to the mainloop."

This is fine, the child just writes out the memory pages and exits. It never returns to the glib mainloop.

There is another way to do it: intercept the write in kvm.ko (or other kernel code). Since the key is to intercept the memory change, we can do it in userspace in TCG mode, so we can add the missing part in KVM mode. Another benefit of this approach is that the memory used can be controlled. For example, with ioctl(), set a fixed-size buffer in which the kernel code keeps the intercepted write data; this avoids frequent switches back to the userspace QEMU code. When the buffer is full, return to the userspace QEMU code and let it save the data to disk.

I haven't checked exactly how Intel guest mode handles page faults, so I can't estimate the performance cost of switching between guest mode and root mode, but it should not be worse than fork().

The fork(2) approach is portable, covers both KVM and TCG, and doesn't require kernel changes. A kvm.ko kernel change also won't be supported on existing KVM hosts. These are big drawbacks, and the kernel approach would need to be significantly better than plain old fork(2) to make it worthwhile.

I think the advantage is that memory usage is predictable, so a memory usage peak can be avoided by always saving the changed pages first. fork() does not know which pages are changed. I am not sure whether this would be a serious issue when the server's memory is heavily consumed, for example, a 24G host emulating two 11G guests to provide a powerful virtual server.

Memory usage is predictable, but guest uptime is unpredictable because it waits until memory is written out. This defeats the point of live savevm. The guest may be stalled arbitrarily.

The fork child can minimize the chance of out-of-memory by using madvise(MADV_DONTNEED) after pages have been written out. The way fork handles memory overcommit on Linux is configurable, but I guess in a situation where memory runs out the Out-of-Memory Killer will kill a process (probably QEMU, since it is hogging so much memory).

The risk of OOM can be avoided by running the traditional vmsave, which stops the guest, instead of using live vmsave. The other option is to live migrate to a file, but the disadvantage there is that you cannot choose exactly when the state is saved; it happens sometime after live migration is initiated.

There are trade-offs with all the approaches; it depends on what is most important to you.

Stefan
Re: KVM Block Device Driver
On Wed, Aug 14, 2013 at 10:40:06AM +0800, Fam Zheng wrote:
On Tue, 08/13 16:13, Spensky, Chad - 0559 - MITLL wrote:

Hi All,

I'm working with some disk introspection on KVM, and we are trying to create a shadow image of the disk. We've hooked the functions in block.c, in particular bdrv_aio_writev. However, we are seeing writes go through, pausing the VM, then comparing our shadow image with the actual VM image, and they aren't 100% synced up. The first 1-2 sectors always appear to be correct, but after that there are sometimes discrepancies. I believe we have exhausted the most obvious bugs (malloc bugs, incorrect size calculations, etc.). Has anyone had any experience with this or have any insights? Our methodology is as follows:

1. Boot the VM.
2. Pause the VM.
3. Copy the disk to our shadow image.

How do you copy the disk, from guest or host?

4. Perform very few reads/writes.

Did you flush to disk?

5. Pause the VM.
6. Compare the shadow copy with the active VM disk. This is where we are seeing discrepancies.

Any help is much appreciated! We are running on Ubuntu 12.04 with a modified Debian build.

- Chad
--
Chad S. Spensky

I think the drive-backup command does just what you want: it creates an image and copies copy-on-write data from the guest disk to the target, without pausing the VM.

Or perhaps drive-mirror. Maybe Chad can explain what the use case is. There is probably an existing command that does this or that could be extended to do this safely.

Stefan
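For reference, a hedged sketch of how the drive-backup command mentioned above could be issued over QMP from Python. The device name, target path, and socket path below are assumptions for illustration; drive-backup takes at least device, target, and sync arguments.

```python
import json

def build_drive_backup(device, target, sync="full"):
    """Build the QMP command that starts a background backup job.

    sync="full" copies the whole device; "top" and "none" are the
    other modes accepted by drive-backup.
    """
    return {
        "execute": "drive-backup",
        "arguments": {"device": device, "target": target, "sync": sync},
    }

cmd = build_drive_backup("drive-virtio-disk0", "/tmp/shadow.img")
payload = json.dumps(cmd)
print(payload)

# Sending it (untested outline, assuming QEMU was started with
# -qmp unix:/tmp/qmp.sock,server,nowait):
#   import socket
#   sock = socket.socket(socket.AF_UNIX)
#   sock.connect("/tmp/qmp.sock")
#   sock.recv(4096)                                   # greeting banner
#   sock.sendall(b'{"execute": "qmp_capabilities"}')  # leave capabilities mode
#   sock.recv(4096)
#   sock.sendall(payload.encode())
```

The backup runs as a block job, so the VM keeps running while the copy proceeds, which matches the "no pausing" requirement in this thread.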
Re: KVM Block Device Driver
On Wed, Aug 14, 2013 at 07:29:53AM -0400, Spensky, Chad - 0559 - MITLL wrote:

We are trying to keep an active shadow copy while the system is running, without any need for pausing. More precisely, we want to log every individual access to the drive into a database so that the entire stream of accesses can be replayed at a later time.

CCing Wolfgang Richter, who was previously interested in block I/O tracing: https://lists.nongnu.org/archive/html/qemu-devel/2013-05/msg01725.html

Stefan
Re: Oracle RAC in libvirt+KVM environment
On Wed, Aug 14, 2013 at 04:40:44PM +0800, Timon Wang wrote:

I found an article about Hyper-V virtual Fiber Channel; I think this will make Failover Cluster work if KVM has the same feature. http://technet.microsoft.com/en-us/library/hh831413.aspx Hyper-V uses NPIV for virtual Fiber Channel. I have read some articles about KVM NPIV, but how can I configure it with libvirt? Can anybody show me an example?

A web search turns up this: https://docs.fedoraproject.org/en-US/Fedora/18/html/Virtualization_Administration_Guide/sect-Technical_Papers-Identifying_HBAs_in_a_Host_System-Confirming_That_IO_Traffic_is_Going_through_an_NPIV_HBA.html

You can use this if the host has a supported Fibre Channel HBA and your image is on a SAN LUN.

From my limited knowledge about this, NPIV itself won't make clustering possible. RAC or Failover Cluster probably still require specific SCSI commands in order to work (like persistent reservations), and that's what needs to be investigated in order to figure out a solution.

Stefan
Re: [Qemu-devel] Are there plans to achieve ram live Snapshot feature?
On Tue, Aug 13, 2013 at 4:53 AM, Wenchao Xia xiaw...@linux.vnet.ibm.com wrote:
On 2013-8-12 19:33, Stefan Hajnoczi wrote:
On Mon, Aug 12, 2013 at 12:26 PM, Alex Bligh a...@alex.org.uk wrote:
--On 12 August 2013 11:59:03 +0200 Stefan Hajnoczi stefa...@gmail.com wrote:

The idea that was discussed on qemu-de...@nongnu.org uses fork(2) to capture the state of guest RAM and then send it back to the parent process. The guest is only paused for a brief instant during fork(2) and can continue to run afterwards.

How would you capture the state of emulated hardware which might not be in the guest RAM?

Exactly the same way vmsave works today. It calls the device's save functions which serialize state to file. The difference between today's vmsave and the fork(2) approach is that QEMU does not need to wait for guest RAM to be written to file before resuming the guest.

I have a worry about what glib says: "On Unix, the GLib mainloop is incompatible with fork(). Any program using the mainloop must either exec() or exit() from the child without returning to the mainloop."

This is fine, the child just writes out the memory pages and exits. It never returns to the glib mainloop.

There is another way to do it: intercept the write in kvm.ko (or other kernel code). Since the key is to intercept the memory change, we can do it in userspace in TCG mode, so we can add the missing part in KVM mode. Another benefit of this approach is that the memory used can be controlled. For example, with ioctl(), set a fixed-size buffer in which the kernel code keeps the intercepted write data; this avoids frequent switches back to the userspace QEMU code. When the buffer is full, return to the userspace QEMU code and let it save the data to disk.

I haven't checked exactly how Intel guest mode handles page faults, so I can't estimate the performance cost of switching between guest mode and root mode, but it should not be worse than fork().

The fork(2) approach is portable, covers both KVM and TCG, and doesn't require kernel changes. A kvm.ko kernel change also won't be supported on existing KVM hosts. These are big drawbacks, and the kernel approach would need to be significantly better than plain old fork(2) to make it worthwhile.

Stefan
Re: Oracle RAC in libvirt+KVM environment
On Mon, Aug 12, 2013 at 06:17:51PM +0800, Timon Wang wrote:

Yes, the SCSI bus approach passes through a shared LUN to the VM, and I am using a shared LUN for the 'share' purpose. I found a post saying that VMware uses the lsilogic bus for the shared disk, but my qemu/kvm version can't support the lsilogic bus. I'm trying to update my qemu/kvm version for lsilogic bus support.

Use virtio-scsi. The emulated LSI SCSI controller has known bugs and is not actively developed - don't be surprised if you hit issues with it.

The question is still what commands RAC or Failover Clustering use. If you find that the software refuses to run, it could be because additional work is required to make it work on KVM.

Stefan
Re: Oracle RAC in libvirt+KVM environment
On Fri, Aug 02, 2013 at 01:58:24PM +0800, Timon Wang wrote:

We want to set up two Oracle instances and make RAC work on them. Both VMs are set up based on libvirt + KVM. We use an LVM LUN formatted as qcow2 and set the shareable property in the disk driver like this:

<disk type='block' device='disk'>
  <driver name='qemu' type='qcow2' cache='none'/>

qcow2 is not cluster-aware; it cannot be opened by multiple VMs at the same time. You must use raw.

Stefan
Re: Oracle RAC in libvirt+KVM environment
On Sat, Aug 10, 2013 at 11:14:39AM +0800, Timon Wang wrote:

I have tried changing the disk bus to SCSI and adding a SCSI controller whose model is virtio-scsi, but I still can't set up the RAC instance. I tried to use the Windows 2008 Failover Cluster feature to set up a Failover Cluster instead, but I can't find any cluster disk to share between the two nodes. So when the Failover Cluster is set up, I can't add any cluster disk to it. Have I missed something?

I'm not sure what SCSI-level requirements RAC or Failover Cluster have. If anyone knows which features are needed, it would be possible to confirm whether they are supported under KVM.

I expect this can only work if you are passing through a shared LUN. Can you describe your configuration?

Stefan
Re: [Qemu-devel] Are there plans to achieve ram live Snapshot feature?
On Fri, Aug 09, 2013 at 10:20:49AM +, Chijianchun wrote:

Now in KVM, when taking a RAM snapshot, the vCPUs need to be stopped; this is an unfriendly restriction for users. Are there plans to achieve a live RAM snapshot feature?

In my mind, snapshots cannot occupy too much additional memory, so when memory needs to be changed, the old memory page needs to be flushed to the file first. But flushing to file is much slower than memory, and while flushing, the vCPU or VM needs to be paused until the flush finishes, so pause...resume...pause...resume, slower and slower.

Is this idea feasible? Are there any other thoughts?

A few people have looked at live vmsave or guest RAM snapshots.

The idea that was discussed on qemu-de...@nongnu.org uses fork(2) to capture the state of guest RAM and then send it back to the parent process. The guest is only paused for a brief instant during fork(2) and can continue to run afterwards.

The child process is a simple loop that sends the contents of guest RAM back to the parent process over a pipe or writes the memory pages to the save file on disk. It performs no logic besides writing out guest RAM.

Stefan
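The fork(2) approach described above can be sketched in a few lines of Python (a toy stand-in, not QEMU code: the small mmap region playing the role of guest RAM and the temp-file save path are assumptions for illustration):

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE

# Stand-in for guest RAM: a private anonymous mapping, so the child's
# view is a copy-on-write snapshot taken at fork() time.
guest_ram = mmap.mmap(-1, PAGE * 4, flags=mmap.MAP_PRIVATE)
guest_ram[:7] = b"state-A"

fd, save_path = tempfile.mkstemp()
os.close(fd)

pid = os.fork()          # the "guest" pauses only for this instant
if pid == 0:
    # Child: a simple loop that writes guest RAM to the save file.
    # It performs no other logic and exits without returning to any
    # main loop (per the GLib rule quoted in this thread).
    with open(save_path, "wb") as f:
        for off in range(0, len(guest_ram), PAGE):
            f.write(guest_ram[off:off + PAGE])
    os._exit(0)

# Parent ("the guest") continues running immediately; its writes land
# on its own copy-on-write pages and do not disturb the snapshot.
guest_ram[:7] = b"state-B"
os.waitpid(pid, 0)

with open(save_path, "rb") as f:
    snapshot = f.read()
print(snapshot[:7])      # b'state-A': the pre-fork contents
```

The key property is visible at the end: the parent has already moved on to new state, yet the file holds the RAM contents exactly as they were at the moment of fork(2).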
Re: [Qemu-devel] Are there plans to achieve ram live Snapshot feature?
On Mon, Aug 12, 2013 at 12:26 PM, Alex Bligh a...@alex.org.uk wrote:
--On 12 August 2013 11:59:03 +0200 Stefan Hajnoczi stefa...@gmail.com wrote:

The idea that was discussed on qemu-de...@nongnu.org uses fork(2) to capture the state of guest RAM and then send it back to the parent process. The guest is only paused for a brief instant during fork(2) and can continue to run afterwards.

How would you capture the state of emulated hardware which might not be in the guest RAM?

Exactly the same way vmsave works today. It calls the device's save functions which serialize state to file. The difference between today's vmsave and the fork(2) approach is that QEMU does not need to wait for guest RAM to be written to file before resuming the guest.

Stefan
Re: FAQ on linux-kvm.org has broken link
On Mon, Aug 05, 2013 at 10:59:45PM +0200, folkert wrote:

Two approaches to get closer to the source of the problem:

1. Try the latest vanilla kernel on the host (Linux 3.10.5). This way you can rule out fixed bugs in vhost_net or tap.

2. Get the system into the bad state and then do some deeper debugging. Start with an outgoing ping, instrument the guest driver and host vhost_net functions to see what the drivers are doing, inspect the transmit vring, etc.

#1 is probably the best next step. If it fails and you still have time to work on a solution, we can start digging deeper with #2.

I can upgrade now to 3.10.3 as that is the current version in Debian.

Sounds good. That way you'll also have access to the latest perf for instrumenting vhost_net if it still fails.

Stefan
Re: FAQ on linux-kvm.org has broken link
On Fri, Aug 02, 2013 at 08:06:58PM +0200, folkert wrote:

A couple of questions: Please post the QEMU command-line from the host (ps aux | grep qemu).

I'll post them all:
- UMTS-clone: this one works fine since it was created a week ago
- belle: this one was fine but suddenly also showed the problem
- mauer: the problem one

112 4819 1 4 Jul30 ? 03:29:39 /usr/bin/kvm -S -M pc-1.1 -enable-kvm -m 1024 -smp 1,sockets=1,cores=1,threads=1 -name UMTS-clone -uuid e49502f1-0c74-2a60-99dc-7602da5ee640 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/UMTS-clone.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/dev/VGNEO/LV_V_UMTS-clone,if=none,id=drive-virtio-disk0,format=raw,cache=writeback -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive file=/home/folkert/ISOs/wheezy.iso,if=none,id=drive-ide0-1-0,readonly=on,format=raw -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,fd=20,id=hostnet0,vhost=on,vhostfd=21 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:09:3b:b6,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -vnc 127.0.0.1:0,password -vga cirrus -device usb-host,hostbus=6,hostaddr=5,id=hostdev0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6

112 10065 1 11 Jul30 ? 07:46:16 /usr/bin/kvm -S -M pc-1.1 -enable-kvm -m 8192 -smp 12,sockets=12,cores=1,threads=1 -name belle -uuid 16b704d7-5fbd-d67b-71e6-0d6b43f1bc0a -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/belle.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/dev/VGNEO/LV_V_BELLE,if=none,id=drive-virtio-disk0,format=raw -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive file=/dev/VGNEO/LV_V_BELLE_OS,if=none,id=drive-virtio-disk1,format=raw,cache=writeback -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk1,id=virtio-disk1 -drive file=/dev/VGJOURNAL/LV_J_BELLE,if=none,id=drive-ide0-0-0,format=raw,cache=writeback -device ide-hd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -drive if=none,id=drive-ide0-1-0,readonly=on,format=raw,cache=none -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,fd=26,id=hostnet0,vhost=on,vhostfd=27 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:75:4a:6f,bus=pci.0,addr=0x3 -netdev tap,fd=28,id=hostnet1,vhost=on,vhostfd=29 -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:0a:6e:de,bus=pci.0,addr=0x7 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0 -vnc 127.0.0.1:1,password -vga cirrus -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6

root 13116 12830 0 19:54 pts/8 00:00:00 grep qemu

112 23453 1 57 13:16 ? 03:46:51 /usr/bin/kvm -S -M pc-1.1 -enable-kvm -m 8192 -smp 8,maxcpus=12,sockets=12,cores=1,threads=1 -name mauer -uuid 3a8452e6-81af-b185-63b6-2b32be17ed87 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/mauer.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/dev/VGNEO/LV_V_MAUER,if=none,id=drive-virtio-disk0,format=raw,cache=writeback -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive file=/dev/VGJOURNAL/LV_J_MAUER,if=none,id=drive-virtio-disk1,format=raw,cache=writethrough -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0xa,drive=drive-virtio-disk1,id=virtio-disk1 -drive if=none,id=drive-ide0-1-0,readonly=on,format=raw -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,fd=26,id=hostnet0,vhost=on,vhostfd=27 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:86:d9:1f,bus=pci.0,addr=0x3 -netdev tap,fd=28,id=hostnet1,vhost=on,vhostfd=29 -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:a3:12:8a,bus=pci.0,addr=0x4 -netdev tap,fd=30,id=hostnet2,vhost=on,vhostfd=31 -device virtio-net-pci,netdev=hostnet2,id=net2,mac=52:54:00:0f:54:c2,bus=pci.0,addr=0x5 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0 -vnc 127.0.0.1:2,password -vga cirrus -device intel-hda,id=sound0,bus=pci.0,addr=0x7 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device
Re: [Bug 60620] guest loses frequently (multiple times per day!) connectivity to network device
On Fri, Aug 02, 2013 at 11:28:45AM +, bugzilla-dae...@bugzilla.kernel.org wrote:

https://bugzilla.kernel.org/show_bug.cgi?id=60620

--- Comment #9 from Folkert van Heusden folk...@vanheusden.com ---

Good news! If I
- bring down all interfaces in the guest (ifdown eth0...)
- rmmod virtio_net
- modprobe virtio_net
- bring up the interfaces again
it all works again! So hopefully this helps the bug hunt?

Hi Folkert,

Please post the QEMU command-line on the host (ps aux | grep qemu) and the output of lsmod | grep vhost_net.

Since reinitializing the guest driver fixes the issue, we now need to find out whether the guest or the host side got stuck.

I think I asked before, but please also post any relevant lines from dmesg on the host and from the guest. Examples would include networking error messages, kernel backtraces, or out-of-memory errors.

Thanks,
Stefan
Re: FAQ on linux-kvm.org has broken link
On Fri, Aug 2, 2013 at 1:37 PM, folkert folk...@vanheusden.com wrote:

If the result is #2, check firewalls on host and guest. Also try the following inside the guest: disable the network interface, rmmod virtio_net, modprobe virtio_net again, and bring the network up.

I pinged, I sniffed, I updated the bug report (it also happens with other guests now!). And the bring down interfaces / rmmod / modprobe / ifup sequence works! So I think something is wrong with virtio_net! What shall I do now?

Hi Folkert,

I wrote a reply earlier today but it was rejected because I do not have a kernel.org bugzilla account. If you don't mind, let's continue discussing on this mailing list - we don't know whether this is a kernel bug yet anyway.

A couple of questions:

Please post the QEMU command-line from the host (ps aux | grep qemu).

Please confirm that vhost_net is being used on the host (lsmod | grep vhost_net).

Please double-check both guest and host dmesg for any suspicious messages. It could be about networking, out-of-memory, or kernel backtraces.

Stefan
Re: FAQ on linux-kvm.org has broken link
On Tue, Jul 30, 2013 at 10:45:20PM +0200, folkert wrote:

If you keep losing network connectivity you may have a MAC or IP address conflict. The symptom is that network traffic is intermittent - for example, ping might work but a full TCP connection does not.

I submitted a bug at bugzilla a while ago which I updated today with new findings: https://bugzilla.kernel.org/show_bug.cgi?id=60620

This week the system ran a couple of times for 1-2 days, but tonight was a bit of a disaster: I had to reboot the system 18 times. Sometimes it was fine for half an hour, but most of the time, after a couple of minutes (sometimes even during boot), the networking on that one guest failed.

I can't add anything besides suggesting slightly more verbose troubleshooting steps:

1. Wait until the guest suffers from lost network connectivity.

2. Confirm the MAC/IP addresses and run tcpdump -ni $IFACE inside the guest. Ping the guest from the host and check whether tcpdump reports ICMP ping packets.

3. Now try pinging the host from the guest and run tcpdump -ni $IFACE on the host. To determine the host-side tap interface, run the following:

$ virsh domiflist mauer
Interface  Type     Source   Model    MAC
-----------------------------------------------------
vnet0      network  default  virtio   52:54:00:b9:c8:4d

Now you have verified tap connectivity with the guest. We now know one of:

1. Tap connectivity is fine (both transmit and receive are working).
2. Either transmit or receive is broken (ping doesn't work but tcpdump does show incoming packets on one side).
3. Tap connectivity is broken (ping fails and tcpdump shows no ICMP packets).

If the result is #1, then you can continue troubleshooting the next step: the bridge or NAT configuration on the host.

If the result is #2, check firewalls on host and guest. Also try the following inside the guest: disable the network interface, rmmod virtio_net, modprobe virtio_net again, and bring the network up.

If the result is #3, check firewalls on host and guest as well as dmesg output in host and guest.

Stefan
Re: FAQ on linux-kvm.org has broken link
On Tue, Jul 30, 2013 at 03:18:53AM +0200, folkert wrote:

The link at http://www.linux-kvm.org/page/FAQ#My_guest_network_is_stuck_what_should_I_do.3F pointing to http://qemu-buch.de/cgi-bin/moin.cgi/QemuNetwork is broken: it gives an "Internal server error" message. Please can someone point me to the correct location, as I'm struggling with a VM losing connectivity all the time.

Hi Folkert,

I have updated the wiki to point to http://qemu-project.org/Documentation/Networking. The original link seems to be down.

If you keep losing network connectivity you may have a MAC or IP address conflict. The symptom is that network traffic is intermittent - for example, ping might work but a full TCP connection does not. This happens when two guests are configured with identical MAC or IP addresses on the same bridge or subnet. They will fight over the MAC or IP address and you will not be able to reliably communicate with those guests.

The tool for solving networking issues is often tcpdump. Run tcpdump inside the guest to verify it is receiving traffic or to investigate a failed connection. Run tcpdump on the host - especially if you are using -netdev tap - to inspect the traffic being forwarded on behalf of the guest.

If you let libvirt set up networking for you, all should be fine. If you run qemu manually or customized the domain XML, then it's possible you have a misconfiguration. Feel free to post the details so someone can help you.

Stefan
Re: VU#976534 - How to submit security bugs?
On Mon, Jul 22, 2013 at 02:49:50PM -0400, CERT(R) Coordination Center wrote:

My name is Adam Rauf and I work for the CERT Coordination Center. We have a report that may affect KVM/QEMU. How can we securely send it over to you? Thanks so much!

Paolo, Gleb, Anthony: Is this already being discussed off-list?

Adam: Paolo Bonzini and Gleb Natapov are the KVM kernel maintainers and Anthony Liguori is the QEMU maintainer. You can verify this by checking linux.git ./MAINTAINERS and qemu.git ./MAINTAINERS. I suggest getting in touch with them.

Stefan
Re: disk corruption after virsh destroy
On Tue, Jul 02, 2013 at 10:40:11AM -0400, Brian J. Murrell wrote: I have a cluster of VMs set up with shared virtio-scsi disks. The purpose of sharing a disk is that if a VM goes down, another can pick up and mount the (ext4) filesystem on the shared disk and provide service. But just to be super clear, only one VM ever has a filesystem mounted at a time even though multiple VMs technically can access the device at the same time. A VM mounting a filesystem ensures absolutely that no other node has it mounted before mounting it. That said, what I am finding is that when a node dies and another node tries to mount the (ext4) filesystem, it is found dirty and needs an fsck. My understanding is that with ext{3,4}, this should not be the case, and indeed it is my experience on real hardware with coherent disk caching (i.e. no non-battery-backed caching disk controllers lying to the O/S about what has been written to physical disk) that this is the case. That is, a node failing does not leave an ext{3,4} filesystem dirty such that it needs an fsck. So, clearly, somewhere between the KVM VM and the physical disk, there is a cache that is resulting in the guest O/S believing data is being written to physical disk that is not actually being written there. To that end, I have ensured that these shared disks are set to cache=none, but this does not seem to have fixed the problem. I expect journal replay and possibly fsck when an ext4 file system was left in a mounted state and with I/O pending (e.g. due to power failure). A few questions:

1. Is the guest mounting the file system with barrier=0? barrier=1 is the default.

2. Do the physical disks have a volatile write cache enabled? If yes, the guest should use barrier=1. If the physical disks have a non-volatile write cache or the write cache is disabled, then barrier=0 is okay.

3. Have you tested without the cluster? Run a single VM and kill it while it is busy. Then start it up again and see if an fsck is needed.

4. Is it possible that your previous cluster setup used tune2fs(8) to disable fsck in some cases? That could explain why you didn't see fsck before but do now.

Stefan
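Questions 1 and 2 above can be checked quickly from a shell. A minimal sketch, assuming a hypothetical guest mount point and host device name:

```shell
# Classify an ext3/ext4 mount-options string by barrier setting.
# barrier=1 is the ext4 default, so the absence of barrier=0/nobarrier
# means barriers are enabled.
barrier_status() {
    case "$1" in
        *barrier=0*|*nobarrier*) echo "barriers disabled" ;;
        *)                       echo "barriers enabled (default)" ;;
    esac
}

# Question 1: inspect the guest's mount options (/mnt/shared is hypothetical)
barrier_status "$(awk '$2 == "/mnt/shared" { print $4 }' /proc/mounts)"

# Question 2: inspect the physical disk's volatile write cache on the host
# (requires root; /dev/sda is hypothetical)
# hdparm -W /dev/sda
```

If the guest shows barriers disabled while the host disk has a volatile write cache enabled, that combination matches the dirty-filesystem symptom described above.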
Re: i/o threads
On Wed, Jun 26, 2013 at 03:53:21PM +0200, folkert wrote: I noticed on my server running 3 VMs that there are 10-20 threads doing i/o. As the VMs are running on HDDs and not SSDs I think that is counterproductive: won't these threads make the HDDs seek back and forth constantly? The worker threads are doing preadv()/pwritev()/fdatasync(). It's up to the host kernel to schedule that I/O efficiently. Exposing more I/O to the host gives it a chance to merge or reorder I/O for optimal performance, so it's a good thing. On the other hand, if QEMU only did 1 or 2 I/O requests at a time then the host kernel could do nothing to improve the I/O pattern and the disks would indeed seek back and forth constantly. Stefan
Google Summer of Code 2013 has started
It is a pleasure to welcome the following GSoC 2013 students to the QEMU, KVM, and libvirt communities:

Libvirt Wireshark Dissector - Yuto KAWAMURA (kawamuray) http://qemu-project.org/Features/LibvirtWiresharkDissector
Libvirt Introduce API to query IP addresses for given domain - Nehal J. Wani (nehaljwani) http://www.google-melange.com/gsoc/project/google/gsoc2013/nehaljwani/51001
Libvirt More Intelligent virsh auto-completion - Tomas Meszaros http://www.google-melange.com/gsoc/project/google/gsoc2013/examon/13001
QEMU Integrated Copy-Paste - Ozan Çağlayan and Pallav Agrawal (pallav) http://qemu-project.org/Features/IntegratedCopyPaste
QEMU Continuation Passing C - Charlie Shepherd (cs648) http://qemu-project.org/Features/Continuation-Passing_C
QEMU Kconfig - Ákos Kovács http://qemu-project.org/Features/Kconfig
QEMU USB Media Transfer Protocol emulation - a|mond http://www.google-melange.com/gsoc/project/google/gsoc2013/almond/1001
KVM Nested Virtualization Testsuite - Arthur Chunqi Li (xelatex) http://www.google-melange.com/gsoc/project/google/gsoc2013/xelatex/19001

Coding started on Monday, 17th of June and ends Monday, 23rd of September. Feel free to follow these projects - feature pages are being created with git repo and blog links. Stefan
Re: Would a DOS on dovecot running under a VM cause host to crash?
On Fri, Jun 21, 2013 at 10:27:07AM +1200, Hugh Davenport wrote: The attack lasted around 4 minutes, in which there were 1,161 lines in the log for a single attacker IP, and no other similar logs previously. Would this be enough to kill not only the VM running dovecot, but the underlying host machine? Have you checked logs on the host? Specifically /var/log/messages for seg fault messages or Out-of-Memory Killer messages. It's also worth checking /var/log/libvirt/qemu/domain.log if you are using libvirt. That file contains the QEMU stderr output. Stefan
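The host-side checks above can be scripted. A minimal sketch; the log locations are typical defaults and may differ by distribution:

```shell
# Scan common host-side logs for signs of a crashed guest:
# segfaults and OOM killer activity in syslog, plus libvirt's
# per-domain QEMU stderr logs.
scan_host_logs() {
    for f in /var/log/messages /var/log/libvirt/qemu/*.log; do
        # -s silences errors for missing or unreadable files
        grep -s -i -e 'segfault' -e 'oom-killer' -e 'out of memory' "$f"
    done
    echo "log scan done"
}
scan_host_logs
```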
Re: cache write back barriers
On Thu, Jun 13, 2013 at 10:47:32AM +0200, folkert wrote: Hi, In virt-manager I saw that there's the option for cache writeback for storage devices. I'm wondering: does this also make kvm ignore write barriers invoked by the virtual machine? No, that would be unsafe. When the guest issues a flush then QEMU will ensure that data reaches the disk with -drive cache=writeback. Aha, so writeback behaves like consumer hard disks with the write cache enabled. In that case maybe an extra note could be added to virt-manager (excellent software by the way!) saying that if the guest VM supports barriers, write-back is safe. Agree? CCed the virt-manager mailing list so they can see your request. Stefan
Re: cache write back barriers
On Wed, Jun 12, 2013 at 10:03:10AM +0200, folkert wrote: In virt-manager I saw that there's the option for cache writeback for storage devices. I'm wondering: does this also make kvm ignore write barriers invoked by the virtual machine? No, that would be unsafe. When the guest issues a flush then QEMU will ensure that data reaches the disk with -drive cache=writeback. Stefan
Re: VirtIO and BSOD On Windows Server 2003
On Mon, Jun 03, 2013 at 09:56:41AM -0700, Aaron Clausen wrote: I recently built a new kvm server with Debian Wheezy which comes with KVM 1.1.2, and when I moved this guest over, I immediately started getting BSODs (0x007). I disabled the virtio block driver and then attempted to upgrade to the latest with no luck. Stop code 0x7b Inaccessible boot device? How did you create the guest on the new server? Perhaps the hardware configuration changed - I suggest trying to make it as close to the original guest as possible (including the same PCI slots). Stefan
Re: Redirections from virtual interfaces.
On Fri, May 31, 2013 at 11:10:24AM -0300, Targino Silveira wrote: I have a server with only one NIC. This NIC has a public IP and the server is located in a data center. I can't have more than one NIC, but I can have many IPs, so I would like to know if I can redirect packets from virtual interfaces to my VMs? Examples: eth0:1 xxx.xx.xxx.xxx redirect all traffic to 192.168.122.200 eth0:2 xxx.xx.xxx.xxy redirect all traffic to 192.168.122.150 eth0:3 xxx.xx.xxx.xxz redirect all traffic to 192.168.122.180 I'm using /etc/libvirt/hooks/qemu to write iptables rules. Yes, look at NAT. A lot of material covers NAT behind one public IP; in this case you actually need to map public addresses onto private addresses 1:1. A web search for linux nat should turn up howtos. Or check on libvirt.org if there is libvirt configuration that automatically sets this up for you. Stefan
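The 1:1 mapping described above boils down to a DNAT/SNAT rule pair per public IP. A minimal sketch for the /etc/libvirt/hooks/qemu script mentioned in the question; it only prints the iptables commands (run them as root), the public IPs are placeholders from the documentation range, and a real setup will also need matching FORWARD accept rules:

```shell
# Emit a 1:1 NAT rule pair: inbound traffic to the public IP is
# DNATed to the guest, outbound traffic from the guest is SNATed
# back to the same public IP.
emit_nat_rules() {
    pub=$1
    priv=$2
    echo "iptables -t nat -A PREROUTING  -d $pub  -j DNAT --to-destination $priv"
    echo "iptables -t nat -A POSTROUTING -s $priv -j SNAT --to-source $pub"
}

# One pair per guest (public addresses are placeholders)
emit_nat_rules 203.0.113.10 192.168.122.200
emit_nat_rules 203.0.113.11 192.168.122.150
emit_nat_rules 203.0.113.12 192.168.122.180
```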
Re: updated: kvm networking todo wiki
On Thu, May 30, 2013 at 7:23 AM, Rusty Russell ru...@rustcorp.com.au wrote: Anthony Liguori anth...@codemonkey.ws writes: Rusty Russell ru...@rustcorp.com.au writes: On Fri, May 24, 2013 at 08:47:58AM -0500, Anthony Liguori wrote: FWIW, I think what's more interesting is using vhost-net as a networking backend with virtio-net in QEMU being what's guest facing. In theory, this gives you the best of both worlds: QEMU acts as a first line of defense against a malicious guest while still getting the performance advantages of vhost-net (zero-copy). It would be an interesting idea if we didn't already have the vhost model where we don't need the userspace bounce. The model is very interesting for QEMU because then we can use vhost as a backend for other types of network adapters (like vmxnet3 or even e1000). It also helps for things like fault tolerance where we need to be able to control packet flow within QEMU. (CC's reduced, context added, Dmitry Fleytman added for vmxnet3 thoughts). Then I'm really confused as to what this would look like. A zero copy sendmsg? We should be able to implement that today. On the receive side, what can we do better than readv? If we need to return to userspace to tell the guest that we've got a new packet, we don't win on latency. We might reduce syscall overhead with a multi-dimensional readv to read multiple packets at once? Sounds like recvmmsg(2). Stefan
Re: [PATCH] kvm: exclude ioeventfd from counting kvm_io_range limit
On Sat, May 25, 2013 at 06:44:15AM +0800, Amos Kong wrote: We can easily reach the 1000 limit by starting a VM with a couple hundred I/O devices (multifunction=on). The hardcoded limit has already been adjusted 3 times (6 ~ 200 ~ 300 ~ 1000). In userspace, we already have the maximum file descriptor limit to bound the ioeventfd count. But kvm_io_bus devices are also used for the pit, pic, ioapic, and coalesced_mmio; they cannot be limited by the maximum file descriptor limit. Currently only ioeventfds take up too many kvm_io_bus devices, so just exclude them from counting against the kvm_io_range limit. Also fixed one indentation issue in kvm_host.h. Signed-off-by: Amos Kong ak...@redhat.com --- include/linux/kvm_host.h | 3 ++- virt/kvm/eventfd.c | 2 ++ virt/kvm/kvm_main.c | 3 ++- 3 files changed, 6 insertions(+), 2 deletions(-) Reviewed-by: Stefan Hajnoczi stefa...@redhat.com
Re: [PATCH] kvm: add detail error message when fail to add ioeventfd
On Wed, May 22, 2013 at 09:48:21PM +0800, Amos Kong wrote: On Wed, May 22, 2013 at 11:32:27AM +0200, Stefan Hajnoczi wrote: On Wed, May 22, 2013 at 12:57:35PM +0800, Amos Kong wrote: I tried to hotplug 28 * 8 multifunction devices to a guest with an old host kernel; ioeventfds in the host kernel were exhausted and qemu failed to allocate ioeventfds for blk/nic devices. It's better to add a detailed error here. Signed-off-by: Amos Kong ak...@redhat.com --- kvm-all.c | 4 ++++ 1 files changed, 4 insertions(+), 0 deletions(-) It would be nice to make the kvm bus scalable so that the hardcoded in-kernel I/O device limit can be lifted. I had increased the kernel's NR_IOBUS_DEVS to 1000 (a limit is needed for security) last March, and made resizing of the kvm_io_range array dynamic. The maximum should not be hardcoded. File descriptors, maximum memory, etc. are all controlled by rlimits. And since ioeventfds are file descriptors they are already limited by the maximum number of file descriptors. Why is there a need to impose a hardcoded limit? Stefan
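The rlimit point above is easy to see from a shell: ioeventfds are file descriptors, so the per-process open-file limit already bounds how many a process can create without any separate hardcoded cap.

```shell
# RLIMIT_NOFILE caps every file descriptor a process opens,
# including eventfds used as ioeventfds.
ulimit -n     # soft limit on open files for this shell
ulimit -Hn    # hard limit
```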
Re: [PATCH] kvm: add detail error message when fail to add ioeventfd
On Wed, May 22, 2013 at 12:57:35PM +0800, Amos Kong wrote: I tried to hotplug 28 * 8 multifunction devices to a guest with an old host kernel; ioeventfds in the host kernel were exhausted and qemu failed to allocate ioeventfds for blk/nic devices. It's better to add a detailed error here. Signed-off-by: Amos Kong ak...@redhat.com --- kvm-all.c | 4 ++++ 1 files changed, 4 insertions(+), 0 deletions(-) It would be nice to make the kvm bus scalable so that the hardcoded in-kernel I/O device limit can be lifted. Reviewed-by: Stefan Hajnoczi stefa...@redhat.com
Re: 2013 Linux Plumbers Virtualization Microconference proposal call for participation
On Thu, May 16, 2013 at 02:32:30PM -0600, Alex Williamson wrote: We'd like to hold another virtualization microconference as part of this year's Linux Plumbers Conference. To do so, we need to show that there's enough interest, materials, and people willing to attend. Convenience info: September 18-20, 2013 New Orleans, Louisiana
Re: how emulated disk IO translated to physical disk IO on host side
On Tue, May 07, 2013 at 09:58:52AM -0500, sheng qiu wrote: I am trying to figure out the code path which translates the emulated disk I/O issued by the VM to actual physical disk I/O on the host side. Can anyone give me a clear view of this? For an overview of the stack: http://events.linuxfoundation.org/slides/2011/linuxcon-japan/lcj2011_hajnoczi.pdf I read the kvm side code about the VMexit handling; handle_io() will be called for I/O exits. For an I/O job that cannot be handled inside the hypervisor, it will switch to the qemu-kvm process and handle_io() on the qemu side will be called. Finally it seems to invoke ioport_read()/ioport_write(), which invoke the actual registered read/write operators. Then I get lost: I do not know, for emulated disk I/O, which function is responsible for the remaining job, i.e. catching the cmd for accessing the virtual disk, translating it to reads/writes at offsets of the disk img file (assume we use a file for the virtual disk), and then issuing the system call to the host to issue the real I/O cmd to the physical disk. If you are running with a virtio-blk PCI adapter, then QEMU's virtio_queue_host_notifier_read() or virtio_ioport_write() is invoked (it depends whether ioeventfd is being used or regular I/O dispatch). Then QEMU's hw/block/virtio-blk.c will call bdrv_aio_readv()/bdrv_aio_writev()/bdrv_aio_flush(). This enters the QEMU block layer (see block.c and block/) where image file formats are handled. Eventually you get to block/raw-posix.c which issues either a preadv()/pwritev()/fdatasync() in a worker thread or a Linux AIO io_submit() (if -drive aio=native was used). Stefan
Re: [Qemu-devel] KVM call minutes for 2013-04-23
On Tue, Apr 23, 2013 at 10:06:41AM -0600, Eric Blake wrote: On 04/23/2013 08:45 AM, Juan Quintela wrote: we can change drive_mirror to use a new command to see if there are the new features. drive-mirror changed in 1.4 to add optional buf-size parameter; right now, libvirt is forced to limit itself to 1.3 interface (no buf-size or granularity) because there is no introspection and no query-* command that witnesses that the feature is present. Idea was that we need to add a new query-drive-mirror-capabilities (name subject to bikeshedding) command into 1.5 that would let libvirt know that buf-size/granularity is usable (done right, it would also prevent the situation of buf-size being a write-only interface where it is set when starting the mirror but can not be queried later to see what size is in use). Unclear whether anyone was signing up to tackle the addition of a query command counterpart for drive-mirror in time for 1.5. Seems like the trivial solution is a query-command-capabilities QMP command. query-command-capabilities drive-mirror = ['buf-size'] It should only be a few lines of code and can be used for other commands that add optional parameters in the future. In other words: typedef struct mon_cmd_t { ... const char **capabilities; /* drive-mirror uses [buf-size, NULL] */ }; if we have a stable c-api we can do test cases that work. Having such a testsuite would make a stable C API more important. Writing tests in Python has been productive, see qemu-iotests 041 and friends. The tests spawn QEMU guests and use QMP to interact: result = self.vm.qmp('query-block') self.assert_qmp(result, 'return[0]/inserted/file', target_img) Using this XPath-style syntax it's very easy to access the JSON. QEMU users tend not to use C, except libvirt. Even libvirt implements the QMP protocol dynamically and can handle optional arguments well. I don't think a static C API makes sense when we have an extensible JSON protocol. Let's use the extensibility to our advantage. 
Stefan
Re: Fwd: kvm
On Mon, Apr 22, 2013 at 10:59:25AM +0100, Gary Lloyd wrote: I was wondering if anyone could help me with an issue with KVM and iSCSI. If we restart a controller on our EqualLogic SAN or there are any network interruptions on the storage network, KVM guests throw a wobbler and their file systems go read-only (CentOS 5.9 guest with virtio driver). I have read a few forums that indicate you can set disk timeout values on the guests themselves but this is not possible using the virtio driver, which is what we are currently using. Is there any way we can instruct KVM to pause the VMs if there is a storage failure and resume them when the storage comes back online? We are currently running CentOS 6.4. There seem to be werror='stop' and rerror='stop' options to achieve this, but if I try to put these options in the libvirt xml file for a VM, libvirt appears to remove them. Please email libvirt-us...@redhat.com for questions about libvirt in the future. This is a question about libvirt domain XML. The documentation is here: http://libvirt.org/formatdomain.html#elementsDisks The attribute is called error_policy. The documentation says: The optional error_policy attribute controls how the hypervisor will behave on a disk read or write error; possible values are stop, report, ignore, and enospace (since 0.8.0; report since 0.9.7). The default setting of error_policy is report. There is also an optional rerror_policy that controls behavior for read errors only (since 0.9.7). If no rerror_policy is given, error_policy is used for both read and write errors. If rerror_policy is given, it overrides the error_policy for read errors. Also note that enospace is not a valid policy for read errors, so if error_policy is set to enospace and no rerror_policy is given, the read error policy will be left at its default, which is report.
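As a concrete illustration of the documentation quoted above, a disk element using these attributes might look like this (a sketch; the file path and target device are hypothetical, and error_policy/rerror_policy go on the driver element):

```xml
<disk type='file' device='disk'>
  <driver name='qemu' type='raw' cache='none'
          error_policy='stop' rerror_policy='stop'/>
  <source file='/var/lib/libvirt/images/guest-disk0'/>
  <target dev='vda' bus='virtio'/>
</disk>
```

With error_policy='stop', libvirt pauses the guest on a write error instead of letting the guest see the failure, which matches the pause-and-resume behavior asked about in the question.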
Stefan
Re: [User Question] Repeated severe performance problems on guest
On Wed, Apr 17, 2013 at 09:52:39PM +0200, Martin Wawro wrote: Hi Stefan, The host is interesting too if you suspect KVM is involved in the performance issue (rather than it being purely an application issue inside the guest). For example, pidstat (from the sysstat package) on the host can tell you the guest mode CPU utilization percentage. That's useful for double-checking that the guest is indeed using up a lot of CPU time (the guest data you posted suggests it is). I added it to the host logging to have more information next time something goes haywire. What does top or ps say about the 79% userspace CPU utilization? Perhaps this is unrelated to KVM and simply a buggy application going nuts. In this case, it was postgres (we have a couple of instances running on the guest). But it can also be another daemon process that usually behaves very well, so no real culprit to pinpoint it to. We have the same setup (including OS versions and binary versions) in other locations (on physical machines) running for years without any problems, so I doubt that this is an application issue. Another hint that it is not an application issue is the fact, that when we shutdown the processes that generate the load, the load average goes down for a couple of seconds and then again rises to sky-high values with another process consuming the load (until nothing is left running on the machine except for syslogd :-) ). I see. That's a good reason to carefully monitor the host for things that could interfere with guest performance. Stefan
Re: [User Question] Repeated severe performance problems on guest
On Thu, Apr 18, 2013 at 12:00 PM, Martin Wawro martin.wa...@gmail.com wrote: On 04/18/2013 09:25 AM, Stefan Hajnoczi wrote: I see. That's a good reason to carefully monitor the host for things that could interfere with guest performance. Stefan Seems that today is a bad day for our server. We had to give it the boot (again). Also the results of the pidstat output do not seem to yield much additional information on what could be the problem. In order to avoid spilling this mailing list, here is some data gathered on the host: http://pastebin.com/8q7UgXkJ ...and this is the data from the guest: http://pastebin.com/xLTYZjGp No answer but some more questions. Regarding the kvm_stat output, the exits are caused by 68,000 page faults/second (pf_fixed). Perhaps someone can explain what this means? The host has 8 cores, the guest has 7. Host pidstat shows qemu-kvm consuming 263.9% CPU: 11:25:27 4017 11.13 34.65 218.12 263.90 7 qemu-kvm Why is the guest not getting more than 3 CPUs since the host is otherwise idle? You may want to disable ksmd on the host since you only have 1 guest, but I doubt that will fix the main problem: 11:25:27 100 0.00 7.89 0.00 7.89 7 ksmd For details, see https://www.kernel.org/doc/Documentation/vm/ksm.txt. What is the python process on the host doing? Is it poking libvirt? 11:25:27 4558 4.66 3.55 0.00 8.21 7 python 11:25:27 3659 3.99 4.55 0.00 8.54 7 libvirtd Stefan
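Disabling ksmd as suggested above is a sysfs write on the host. A minimal sketch (requires root; the path is the standard KSM sysfs location):

```shell
# Stop the ksmd scanner. Writing 2 instead of 0 would additionally
# unmerge all currently shared pages.
KSM_RUN=/sys/kernel/mm/ksm/run
if [ -w "$KSM_RUN" ]; then
    echo 0 > "$KSM_RUN" && echo "ksmd stopped"
else
    echo "need root, or KSM not available on this kernel"
fi
```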
Re: [User Question] Repeated severe performance problems on guest
On Thu, Apr 18, 2013 at 03:27:45PM +0200, Martin Wawro wrote: On 04/18/2013 03:14 PM, Stefan Hajnoczi wrote: No answer but some more questions. Regarding the kvm_stat output, the exits are caused by 68,000 page faults/second (pf_fixed). Perhaps someone can explain what this means? The host has 8 cores, the guest has 7. Host pidstat shows qemu-kvm consuming 263.9% CPU: 11:25:27 4017 11.13 34.65 218.12 263.90 7 qemu-kvm Why is the guest not getting more than 3 CPUs since the host is otherwise idle? If one waits a little longer, top shows all 7 cores under utilization (700%). Unfortunately we have to be quick with the reboots during daytime, because the system is in production use and we have not decided yet to completely replace it. BTW does the host CPU support Intel Extended Page Tables or AMD Nested Page Tables? grep 'npt\|ept' /proc/cpuinfo (I think the kvm_stat is saying EPT/NPT are not in use) For details, see https://www.kernel.org/doc/Documentation/vm/ksm.txt. What is the python process on the host doing? Is it poking libvirt? 11:25:27 4558 4.66 3.55 0.00 8.21 7 python 11:25:27 3659 3.99 4.55 0.00 8.54 7 libvirtd That is virt-manager.py, doing exactly that. Okay, I was wondering if something is causing libvirt and maybe QEMU to act strangely. If it's just virt-manager then it's probably not the issue. Stefan
Re: [User Question] Repeated severe performance problems on guest
On Tue, Apr 16, 2013 at 09:49:20AM +0200, Martin Wawro wrote: On 04/16/2013 07:49 AM, Stefan Hajnoczi wrote: Besides the kvm_stat, general performance data from the host is useful when dealing with high load averages. Do you have vmstat or sar data for periods of time when the machine was slow? Stefan We do have a rather exhaustive log on the guest. As for the host, we did not find anything suspicious except for the kvm_stat output, so we did not log any more than that. The host is interesting too if you suspect KVM is involved in the performance issue (rather than it being purely an application issue inside the guest). For example, pidstat (from the sysstat package) on the host can tell you the guest mode CPU utilization percentage. That's useful for double-checking that the guest is indeed using up a lot of CPU time (the guest data you posted suggests it is). Here is the output of vmstat 5 5 on the guest:

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free  buff    cache  si  so   bi    bo   in   cs  us sy id wa
84  0  19596 104404    60 21932616   0   0  232 11092             7  2 90  1
80  0  19596  98100    60 21933920   0   0  106   119  854  912  79 21  0  0
89  0  19596  94216    60 21932764   0   0  106   223  864  886  79 21  0  0
87  0  19596  95848    60 21927612   0   0   82    47  856  906  79 21  0  0

Load average at that time: 75 (1:20 AM) What does top or ps say about the 79% userspace CPU utilization? Perhaps this is unrelated to KVM and simply a buggy application going nuts. Stefan
Re: [RFC] provide an API to userspace doing memory snapshot
On Tue, Apr 16, 2013 at 03:54:15PM +0800, Wenchao Xia wrote: On 2013-4-16 13:51, Stefan Hajnoczi wrote: On Mon, Apr 15, 2013 at 09:03:36PM +0800, Wenchao Xia wrote: I'd like to add/export a function which allows a userspace program to take a snapshot of a region of memory. Since it is not implemented yet I will describe it as C APIs; it is quite simple now and if it is worthy I'll improve the interface later: We talked about a simple approach using fork(2) on IRC yesterday. Is this email outdated? Stefan No, after the discussion on IRC, I agree that fork() is a simpler method to do it, which can come to qemu fast, since users want it. On further consideration, I still think a KVM memory snapshot would be a long-term solution for it: the source of the problem comes from the acceleration module, kvm.ko - when qemu does not use it, there are no troubles. This means the acceleration module is missing a function the caller requires. My instinct is: when an acceleration module replaces a pure software one, it should try to provide all parts or not stop software from filling the gap, and doing so brings benefits, so I hope to add it. My API description is old; the core is COW pages, maybe redesigned if reasonable. QEMU is a userspace process that has guest RAM mmapped. You want to snapshot that mmap region but there is no Linux system call to do that. Maybe a new mremap(2) flag is what you want. But I don't see the connection to kvm.ko which you mention. The feature you're wishing for has nothing to do with kvm.ko. Stefan
Re: Perf tuning help?
On Tue, Apr 16, 2013 at 04:30:17PM -0400, Mason Turner wrote: We have an in-house app, written in C, that is not performing as well as we'd hoped it would when moving to a VM. We've tried all the common tuning recommendations (virtio, tap interface, cpu pinning), without any change in performance. Even terminating all of the other VMs on the host doesn't make a difference. The VM doesn't appear to be CPU, memory or IO bound. We are trying to maximize UDP-based QPS against the in-house app. I've been running strace against the app and perf kvm against the VM to try to identify any bottlenecks. I would say there are a lot of kvm_exits, but I'm not sure how to quantify what is acceptable and what is not. I've read a few times that the virtio network stack results in a lot of vm_exits. Unfortunately, we can't use direct PCI access with our hardware. Can you explain the traffic characteristics more? * UDP packet size * Pattern: 1 query packet, 1 response packet or something more exotic * Bare metal QPS (the goal) * Guest QPS (what you're seeing) * Benchmark configuration: are packets going across a physical network? Is there a good resource on inefficient system calls? Things that result in higher than normal kvm_exits, or other performance killers? Thanks for the help. Our hypervisor is running on CentOS 6.3: 2.6.32-279.22.1.el6.x86_64 qemu-kvm 0.12.1.2 libvirt 0.9.10 Our app is running on CentOS 6.1: 2.6.32-131.0.15.el6.x86_64 Slightly outdated guest and host. It might be worth trying upstream kernels and QEMU (built from source).
domain type='kvm' namething1/name uuidabe76ce9-60a0-4727-a7ae-cf572e5c3f21/uuid memory unit='KiB'16384000/memory currentMemory unit='KiB'16384000/currentMemory vcpu placement='static'6/vcpu cputune vcpupin vcpu='0' cpuset='0'/ vcpupin vcpu='1' cpuset='2'/ vcpupin vcpu='2' cpuset='4'/ vcpupin vcpu='3' cpuset='6'/ vcpupin vcpu='4' cpuset='8'/ vcpupin vcpu='5' cpuset='10'/ /cputune numatune memory mode='interleave' nodeset='0,2,4,6,8,10'/ /numatune os type arch='x86_64' machine='rhel6.0.0'hvm/type boot dev='hd'/ /os features acpi/ apic/ pae/ /features clock offset='utc'/ on_poweroffdestroy/on_poweroff on_rebootrestart/on_reboot on_crashrestart/on_crash devices emulator/usr/libexec/qemu-kvm/emulator disk type='file' device='disk' driver name='qemu' type='raw' cache='none'/ source file='/var/lib/libvirt/images/thing1-disk0'/ target dev='vda' bus='virtio'/ address type='pci' domain='0x' bus='0x00' slot='0x05' function='0x0'/ /disk controller type='usb' index='0' address type='pci' domain='0x' bus='0x00' slot='0x01' function='0x2'/ /controller interface type='bridge' mac address='00:5e:e3:e1:8a:aa'/ source bridge='virbr0'/ model type='virtio'/ address type='pci' domain='0x' bus='0x00' slot='0x04' function='0x0'/ /interface Please double-check that vhost-net is being used: http://pic.dhe.ibm.com/infocenter/lnxinfo/v3r0m0/topic/liaat/liaatbpvhostnet.htm serial type='pty' target port='0'/ /serial console type='pty' target type='serial' port='0'/ /console input type='tablet' bus='usb'/ input type='mouse' bus='ps2'/ graphics type='vnc' port='-1' autoport='yes' keymap='en-us'/ video model type='cirrus' vram='9216' heads='1'/ address type='pci' domain='0x' bus='0x00' slot='0x02' function='0x0'/ /video memballoon model='virtio' address type='pci' domain='0x' bus='0x00' slot='0x06' function='0x0'/ /memballoon /devices /domain -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at 
http://vger.kernel.org/majordomo-info.html
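One way to quantify the kvm_exit traffic discussed above is perf's kvm subcommand, which groups exits by reason and shows their latency distribution. A sketch, assuming a host perf build with kvm stat support; QEMU_PID is a placeholder for the VM's process ID:

```shell
# Record kvm exit events for the guest's QEMU process for 10 seconds,
# then report exits broken down by reason (QEMU_PID is hypothetical).
perf kvm stat record -p "$QEMU_PID" sleep 10
perf kvm stat report --event vmexit
```

High counts of EXTERNAL_INTERRUPT and IO_INSTRUCTION/EPT exits relative to bare-metal packet rates are the usual suspects for virtio networking workloads.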
Re: [User Question] Repeated severe performance problems on guest
On Fri, Apr 12, 2013 at 05:04:27PM +0200, Martin Wawro wrote: Logging the kvm_stat on the host, we obtained the following output during Besides the kvm_stat, general performance data from the host is useful when dealing with high load averages. Do you have vmstat or sar data for periods of time when the machine was slow? Stefan
Re: [RFC] provide an API to userspace doing memory snapshot
On Mon, Apr 15, 2013 at 09:03:36PM +0800, Wenchao Xia wrote: I'd like to add/export a function which allows a userspace program to take a snapshot of a region of memory. Since it is not implemented yet I will describe it as C APIs; it is quite simple now and if it is worthwhile I'll improve the interface later: We talked about a simple approach using fork(2) on IRC yesterday. Is this email outdated? Stefan
Re: [Qemu-devel] reply: reply: qemu crashed when starting vm(kvm) with vnc connect
On Mon, Apr 08, 2013 at 12:27:06PM +, Zhanghaoyu (A) wrote: On Sun, Apr 07, 2013 at 04:58:07AM +, Zhanghaoyu (A) wrote: I start a kvm VM with a vnc connection (using the zrle protocol); sometimes the qemu program crashes during the starting period, receiving signal SIGABRT. Trying about 20 times, this crash can be reproduced. I guess the cause is memory corruption or a double free. Which version of QEMU are you running? Please try qemu.git/master. Stefan I used the QEMU downloaded from qemu.git (http://git.qemu.org/git/qemu.git). Great, thanks! Can you please post a backtrace? The easiest way is: $ ulimit -c unlimited $ qemu-system-x86_64 -enable-kvm -m 1024 ... ...crash... $ gdb -c qemu-system-x86_64.core (gdb) bt Depending on how your system is configured the core file might have a different filename, but there should be a file named *core* in the current working directory after the crash. The backtrace will make it possible to find out where the crash occurred. Thanks, Stefan The backtrace from the core file is shown below: Program received signal SIGABRT, Aborted.
0x7f32eda3dd95 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x7f32eda3dd95 in raise () from /lib64/libc.so.6
#1  0x7f32eda3f2ab in abort () from /lib64/libc.so.6
#2  0x7f32eda77ece in __libc_message () from /lib64/libc.so.6
#3  0x7f32eda7dc06 in malloc_printerr () from /lib64/libc.so.6
#4  0x7f32eda7ecda in _int_free () from /lib64/libc.so.6
#5  0x7f32efd3452c in free_and_trace (mem=0x7f329cd0) at vl.c:2880
#6  0x7f32efd251a1 in buffer_free (buffer=0x7f32f0c82890) at ui/vnc.c:505
#7  0x7f32efd20c56 in vnc_zrle_clear (vs=0x7f32f0c762d0) at ui/vnc-enc-zrle.c:364
#8  0x7f32efd26d07 in vnc_disconnect_finish (vs=0x7f32f0c762d0) at ui/vnc.c:1050
#9  0x7f32efd275c5 in vnc_client_read (opaque=0x7f32f0c762d0) at ui/vnc.c:1349
#10 0x7f32efcb397c in qemu_iohandler_poll (readfds=0x7f32f074d020, writefds=0x7f32f074d0a0, xfds=0x7f32f074d120, ret=1) at iohandler.c:124
#11 0x7f32efcb46e8 in main_loop_wait (nonblocking=0) at main-loop.c:417
#12 0x7f32efd31159 in main_loop () at vl.c:2133
#13 0x7f32efd38070 in main (argc=46, argv=0x7fff7f5df178, envp=0x7fff7f5df2f0) at vl.c:4481

CCing Corentin and Gerd who are more familiar with the VNC code than me. Stefan
Re: [Qemu-devel] KVM call agenda for 2013-04-09
Meeting notes on Abel's presentation: Aim: improve vhost scalability. Shared vhost thread: Problem: the Linux scheduler does not see the state of virtqueues and cannot make good scheduling decisions. Solution: a shared thread serves multiple VMs and therefore influences I/O scheduling, instead of one kernel thread per vhost device. Exitless communication: * Polling on the host to notice guest vring updates without a guest pio instruction * Use CPU affinity to bind vcpus to separate cores and let polling run on dedicated cores * Exitless Interrupt (ELI) or a future hardware APIC virtualization feature to inject virtual interrupts without vmexit and EOI. See the paper for performance results (impressive numbers): http://domino.research.ibm.com/library/cyberdig.nsf/papers/479E3578ED05BFAC85257B4200427735/$File/h-0319.pdf Abel will publish rebased code on GitHub but does not have time to upstream it. The next step: the QEMU/KVM community can digest the paper + patches and decide on ideas to upstream.
Re: Virtualbox svga card in KVM
On Fri, Apr 05, 2013 at 04:52:05PM -0700, Sriram Murthy wrote: For starters, virtual box has better SVGA WDDM drivers that allow for a much richer display when the VM display is local. What does "much richer display" mean? Stefan
Re: Reply: [Qemu-devel] qemu crashed when starting vm(kvm) with vnc connect
On Sun, Apr 07, 2013 at 04:58:07AM +, Zhanghaoyu (A) wrote: I start a kvm VM with a vnc connection (using the zrle protocol); sometimes the qemu program crashes during the starting period, receiving signal SIGABRT. Trying about 20 times, this crash can be reproduced. I guess the cause is memory corruption or a double free. Which version of QEMU are you running? Please try qemu.git/master. Stefan I used the QEMU downloaded from qemu.git (http://git.qemu.org/git/qemu.git). Great, thanks! Can you please post a backtrace? The easiest way is: $ ulimit -c unlimited $ qemu-system-x86_64 -enable-kvm -m 1024 ... ...crash... $ gdb -c qemu-system-x86_64.core (gdb) bt Depending on how your system is configured the core file might have a different filename, but there should be a file named *core* in the current working directory after the crash. The backtrace will make it possible to find out where the crash occurred. Thanks, Stefan
Re: [Qemu-devel] [PATCH uq/master v2 0/2] Add some tracepoints for clarification of the cause of troubles
On Fri, Mar 29, 2013 at 01:24:25PM +0900, Kazuya Saito wrote: This series adds tracepoints to help clarify the cause of troubles. Virtualization on Linux is composed of several components such as qemu, kvm, libvirt, and so on, so it is very important to quickly determine which of those components a problem lies in. Although qemu has useful information for this because it stands between kvm, libvirt and the guest, it doesn't output that information via a trace or log system. These patches add tracepoints which reduce the time needed for such clarification. We'd like to add these tracepoints as a first set because, based on our experience, we've found they will be useful for future investigations. Without those tracepoints, we had a really hard time investigating a problem since the problem's reproducibility was quite low and there was no clue in the qemu dump. Changes from v1: Add arg to kvm_ioctl, kvm_vm_ioctl, kvm_vcpu_ioctl tracepoints. Add cpu_index to kvm_vcpu_ioctl, kvm_run_exit tracepoints. Kazuya Saito (2): kvm-all: add kvm_ioctl, kvm_vm_ioctl, kvm_vcpu_ioctl tracepoints kvm-all: add kvm_run_exit tracepoint kvm-all.c | 5 + trace-events | 7 +++ 2 files changed, 12 insertions(+), 0 deletions(-) Thanks, applied to my tracing tree: https://github.com/stefanha/qemu/commits/tracing Stefan
We've been accepted to Google Summer of Code 2013
Good news! QEMU.org has been accepted to Google Summer of Code 2013. This means students can begin considering our list of QEMU, kvm kernel module, and libvirt project ideas: http://qemu-project.org/Google_Summer_of_Code_2013 Student applications open April 22 at 19:00 UTC. You can already view the application template here: http://www.google-melange.com/gsoc/org/google/gsoc2013/qemu If you are an interested student, please take a look at the project ideas and get in touch with the mentor for that project. They can help clarify the scope of the project and what skills are necessary. You are invited to join the #qemu-gsoc IRC channel on irc.oftc.net where questions about Google Summer of Code with QEMU.org are welcome. Stefan
Re: [Qemu-devel] qemu crashed when starting vm(kvm) with vnc connect
On Tue, Apr 02, 2013 at 09:02:02AM +, Zhanghaoyu (A) wrote: I start a kvm VM with a vnc connection (using the zrle protocol); sometimes the qemu program crashes during the starting period, receiving signal SIGABRT. Trying about 20 times, this crash can be reproduced. I guess the cause is memory corruption or a double free. Which version of QEMU are you running? Please try qemu.git/master. Stefan
Re: Virtualbox svga card in KVM
On Thu, Mar 21, 2013 at 10:53:21AM -0400, Alon Levy wrote: I am planning on bringing in the virtualbox svga card into kvm as a new svga card type (vbox probably?) so that we can load the VirtualBox SVGA card drivers in the guest. I'm curious if the vbox SVGA card has features that existing QEMU graphics cards do not provide? Stefan
Re: [Qemu-devel] KVM call agenda for 2013-03-26
On Mon, Mar 25, 2013 at 08:13:34PM -0500, Rob Landley wrote: On 03/25/2013 08:17:44 AM, Juan Quintela wrote: Hi Please send in any agenda topics you are interested in. Later, Juan. If Google summer of code is still open: http://qemu-project.org/Google_Summer_of_Code_2013 Project ideas can still be added to the wiki. They must have a mentor who is able to commit around 5 hours per week this summer. I'm not sure about the status of the todo list items you mentioned, hopefully others can help. Stefan
QEMU has applied for Google Summer of Code 2013
QEMU.org has applied for Google Summer of Code 2013 and also aims to be an umbrella organization for libvirt and the KVM kernel module. Accepted mentoring organizations will be announced on April 8 at 19:00 UTC at http://google-melange.com/. This year we have proposed 5 QEMU project ideas, 1 KVM kernel module project idea, and 4 libvirt project ideas: http://qemu-project.org/Google_Summer_of_Code_2013 Thanks to everyone who has volunteered to be a mentor! Also thanks to Anthony Liguori for being backup org admin. Fingers crossed, Stefan
Re: [PATCH] virtio-blk: Set default serial id
On Wed, Mar 20, 2013 at 01:56:08PM +0800, Asias He wrote: If the user does not specify a serial id, e.g. -device virtio-blk-pci,serial=serial_id or -drive serial=serial_id, no serial id will be assigned. Add a default serial id in this case to help identify the disk in the guest. Signed-off-by: Asias He as...@redhat.com --- hw/virtio-blk.c | 7 +++ 1 file changed, 7 insertions(+) Autogenerated IDs have been proposed (for other devices?) before and I think we should avoid them. The serial in this patch depends on the internal counter we use for savevm. It is not a well-defined value that guests can depend on remaining the same. It can change between QEMU invocations - due to internal changes in QEMU or because the management tool reordered -device options. Users will be confused and their guests may stop working if they depend on an ID like this. The solution is to do persistent naming either by really passing -device virtio-blk-pci,serial= or with udev inside the guest using the bus address (PCI devfn) like the new persistent network interface naming for Linux. Stefan
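The persistent-naming-by-bus-address idea suggested above is already visible in most guests: udev populates /dev/disk/by-path/ with symlinks keyed on the PCI address, which stays stable as long as the -device options keep the same slots. A sketch; the exact link names vary by distribution and udev version, so treat the example output as illustrative:

```shell
# Inside the guest: list the udev persistent names keyed on bus address.
# A virtio disk at PCI slot 0x05 typically shows up as something like
# "virtio-pci-0000:00:05.0 -> ../../vda" (name format is distro-dependent).
ls -l /dev/disk/by-path/
```

Mounting via /dev/disk/by-path/ (or by-uuid/ for filesystems) avoids depending on an autogenerated serial entirely.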
Re: [PATCH V3 WIP 2/3] vhost-scsi: new device supporting the tcm_vhost Linux kernel module
On Tue, Mar 19, 2013 at 08:34:44AM +0800, Asias He wrote:

+static void vhost_scsi_stop(VHostSCSI *vs, VirtIODevice *vdev)
+{
+    int ret = 0;
+
+    if (!vdev->binding->set_guest_notifiers) {
+        ret = vdev->binding->set_guest_notifiers(vdev->binding_opaque,
+                                                 vs->dev.nvqs, false);
+        if (ret < 0) {
+            error_report("vhost guest notifier cleanup failed: %d\n", ret);

Indentation. scripts/checkpatch.pl should catch this.

+        }
+    }
+    assert(ret >= 0);
+
+    vhost_scsi_clear_endpoint(vdev);
+    vhost_dev_stop(&vs->dev, vdev);
+    vhost_dev_disable_notifiers(&vs->dev, vdev);
+}
+
+static void vhost_scsi_set_config(VirtIODevice *vdev,
+                                  const uint8_t *config)
+{
+    VirtIOSCSIConfig *scsiconf = (VirtIOSCSIConfig *)config;
+    VHostSCSI *vs = (VHostSCSI *)vdev;
+
+    if ((uint32_t)ldl_raw(&scsiconf->sense_size) != vs->vs.sense_size ||
+        (uint32_t)ldl_raw(&scsiconf->cdb_size) != vs->vs.cdb_size) {
+        error_report("vhost-scsi does not support changing the sense data and CDB sizes");
+        exit(1);

Guest-triggerable exits can be used as a denial of service - especially under nested virtualization where killing the L1 hypervisor would kill all L2 guests! I would just log a warning here.

+    }
+}
+
+static void vhost_scsi_set_status(VirtIODevice *vdev, uint8_t val)
+{
+    VHostSCSI *vs = (VHostSCSI *)vdev;
+    bool start = (val & VIRTIO_CONFIG_S_DRIVER_OK);
+
+    if (vs->dev.started == start) {
+        return;
+    }
+
+    if (start) {
+        int ret;
+
+        ret = vhost_scsi_start(vs, vdev);
+        if (ret < 0) {
+            error_report("virtio-scsi: unable to start vhost: %s\n",
+                         strerror(-ret));
+
+            /* There is no userspace virtio-scsi fallback so exit */
+            exit(1);

It's questionable whether to kill the guest or simply disable this virtio-scsi-pci adapter. Fine for now but we may want to allow a policy here in the future.

diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index 39c1966..281a7e2 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -22,6 +22,7 @@
 #include "hw/virtio-net.h"
 #include "hw/virtio-serial.h"
 #include "hw/virtio-scsi.h"
+#include "hw/vhost-scsi.h"

Can this header be included unconditionally? It uses _IOW() which may not be available on all host platforms.
Re: [PATCH V3 WIP 3/3] disable vhost_verify_ring_mappings check
On Tue, Mar 19, 2013 at 08:34:45AM +0800, Asias He wrote: --- hw/vhost.c | 2 ++ 1 file changed, 2 insertions(+)

diff --git a/hw/vhost.c b/hw/vhost.c
index 4d6aee3..0c52ec4 100644
--- a/hw/vhost.c
+++ b/hw/vhost.c
@@ -421,10 +421,12 @@ static void vhost_set_memory(MemoryListener *listener,
         return;
     }

+#if 0
     if (dev->started) {
         r = vhost_verify_ring_mappings(dev, start_addr, size);
         assert(r >= 0);
     }
+#endif

Please add a comment to explain why. Stefan
Re: Can I bridge the loopback?
On Sat, Mar 16, 2013 at 12:06:30AM -0500, Steve wrote: Here's the issue. I want to communicate between virtual machines over a second virtual Ethernet port. But I would like to use the host loopback for that so as to not be limited to Ethernet port speeds, for large copies, etc. Right now, the machine is connected to a 10 Mbps switch on port 2 and I would like to get far faster transfer speeds when using the so-called private LAN. Bridging eth1 merely limits the speed to 10 Mbps. If it was bridged to the host loopback, I was hoping it could achieve far faster speeds and also not saturate the switch. So, could I make a br1 that is assigned to 127.0.0.1 and then each host can use that as eth1? Guest-guest communication is not affected by physical NIC link speed. A software bridge with the guest tap interfaces and the host's physical interface should allow guests to communicate faster than 10 Mbps. Have you measured the speed of guest-guest networking and found it is 10 Mbps? If you still experience poor performance, please post your QEMU command-line, ifconfig -a (on host), and brctl show (on host) output. Stefan
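For the private-LAN setup described above, a host-only bridge with no physical NIC attached keeps guest-to-guest traffic entirely in host memory, so it is never throttled by the switch. A sketch using the era-appropriate brctl tooling; the bridge and tap names are examples:

```shell
# Host-only bridge: guest<->guest frames never touch a physical NIC,
# so throughput is bounded by CPU/memory, not the 10 Mbps switch.
brctl addbr br1
ip link set br1 up
brctl addif br1 tap1   # guest A's second NIC (tap created by QEMU/libvirt)
brctl addif br1 tap2   # guest B's second NIC
```

No IP address on br1 is needed unless the host itself should join that LAN. The key point is simply to leave eth1 (the slow physical port) out of this bridge.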
Re: [PATCH V2 WIP 2/2] vhost-scsi: new device supporting the tcm_vhost Linux kernel module
On Tue, Mar 12, 2013 at 02:29:42PM +0800, Asias He wrote: diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c index 39c1966..4a97ca1 100644 --- a/hw/virtio-pci.c +++ b/hw/virtio-pci.c These changes break the build for non-Linux hosts. Please introduce a CONFIG_VHOST_SCSI and #ifdef appropriate sections in hw/virtio-pci.c. CONFIG_VIRTFS does the same thing. +static Property vhost_scsi_properties[] = { +    DEFINE_PROP_BIT("ioeventfd", VirtIOPCIProxy, flags, VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT, true), This flag makes QEMU's virtqueue handling use ioeventfd. Since vhost-scsi.c takes over the guest/host notifiers, we never do QEMU virtqueue processing and the ioeventfd flag has no real meaning. You can drop it.
Re: win2k guest vm won't boot under Fedora 18 KVM
On Sat, Mar 09, 2013 at 12:43:32PM -0700, Earl Marwil wrote: Hi, I'm looking for some guidance on how to get to the root cause of an issue that I am observing with a win2k guest that won't boot under Fedora 18 on one system but will boot on another. A few days ago I posted on the fedora forum: http://forums.fedoraforum.org/showthread.php?t=289401 I can repeat the details in this thread if requested. The issue is that, with a fresh build of Fedora 18, updated to the most recent kernel and packages on an external USB ssd, my win2k VM boots on my laptop (Core i7-3720QM processor) but does not boot on my desktop system (Core i7-870 processor). I'm not sure whether this is a kvm issue or a kernel issue. I'll be glad to dig deeper, just let me know what information is needed. Hi Earl, From your forum post: KVM internal error. Suberror: 1 emulation failure EAX=63700200 EBX=e6f5 ECX=000f EDX=0936 ... Code=74 1d b0 37 e6 70 eb 00 e4 71 eb 00 32 e4 c1 c0 04 c0 c8 04 d5 0a 3d 13 0 0 75 04 b8 7a 15 c3 b8 00 00 c3 55 8b ec 1e 06 56 57 8b 46 04 8e d8 8b 76 06 Here is my guess: The laptop has a CPU from 2012; the desktop has a CPU from 2009. Intel added unrestricted guest support to VMX. This feature allows the CPU to run real mode code in guest mode. CPUs that do not support unrestricted guest (your desktop?) use an emulator implemented in software inside the kvm.ko kernel module. The emulator may be unable to handle the real mode instruction in the particular kernel version you are running. The laptop doesn't hit this issue because it supports unrestricted guest, while the desktop falls back to the emulator inside kvm.ko where it hits the bug. You may find that changing kernel versions on the desktop will make it work. The best approach would be to compile a vanilla Linux kernel for the desktop machine to verify that this issue still happens. If so, please post the full KVM internal error output to this mailing list and hopefully someone can fix the emulator.
Problem with my theory: I haven't figured out how to check which Intel CPU models support unrestricted guest, so I'm not 100% sure this is the issue. Stefan
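One way to check the unrestricted-guest theory directly, rather than by CPU model: on kernels where kvm-intel exposes it, the feature is reported as a module parameter. This is a sketch; if the parameter file does not exist, the running kernel is too old to report it.

```shell
# Prints Y if VMX unrestricted guest is supported and enabled on this host,
# N otherwise (requires the kvm_intel module to be loaded).
cat /sys/module/kvm_intel/parameters/unrestricted_guest
```

Comparing the output on the laptop and the desktop would confirm or rule out the fallback-to-emulator hypothesis above.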
Re: KVM call agenda for 2013-03-12
On Mon, Mar 11, 2013 at 4:42 PM, Juan Quintela quint...@redhat.com wrote: Please send in any agenda topics you are interested in. Overview of mentoring for Google Summer of Code 2013: * Post project ideas here: http://wiki.qemu.org/Google_Summer_of_Code_2013 * Who can be a mentor? * What's in it for the mentor? * What does a mentor do? * How does a mentor select a student to work with? Open discussion - any questions about Google Summer of Code. Stefan
Re: [PATCH V2 0/6] tcm_vhost hotplug/hotunplug support and locking/flushing fix
On Fri, Mar 08, 2013 at 10:21:41AM +0800, Asias He wrote: Changes in v2: - Remove code duplication in tcm_vhost_{hotplug,hotunplug} - Fix racing of vs_events_nr - Add flush fix patch to this series Asias He (6): tcm_vhost: Add missed lock in vhost_scsi_clear_endpoint() tcm_vhost: Introduce tcm_vhost_check_feature() tcm_vhost: Introduce tcm_vhost_check_endpoint() tcm_vhost: Fix vs->vs_endpoint checking in vhost_scsi_handle_vq() tcm_vhost: Add hotplug/hotunplug support tcm_vhost: Flush vhost_work in vhost_scsi_flush() drivers/vhost/tcm_vhost.c | 243 -- drivers/vhost/tcm_vhost.h | 10 ++ 2 files changed, 247 insertions(+), 6 deletions(-) -- 1.8.1.4 Reviewed-by: Stefan Hajnoczi stefa...@redhat.com
Re: kvm + ceph performance issues
On Thu, Mar 07, 2013 at 12:57:55PM +0100, Wolfgang Hennerbichler wrote: I'm running a virtual machine with the following command: LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin QEMU_AUDIO_DRV=none /usr/bin/kvm -S -M pc-1.0 -enable-kvm -m 4096 -smp 2,sockets=2,cores=1,threads=1 -name korfu_ceph -uuid a9131b8f-d087-26f4-2ca9-018505f11838 -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/korfu_ceph.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime -no-shutdown -device lsi,id=scsi0,bus=pci.0,addr=0x4 -drive file=rbd:rd/korfu:rbd_cache=1:mon_host=rd-clusternode21\:6789\;rd-clusternode22\:6789,if=none,id=drive-ide0-0-0,format=raw -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -usb -vnc 127.0.0.1:0 -vga std -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 The kvm version is the one from Ubuntu LTS: 1.0+noroms-0ubuntu14.7 When I read or write big files the system basically becomes unusable (the mouse cursor in VNC jerks across the screen, I/O is mostly blocking). I know it is related to ceph in a way, but also to KVM, as it seems that there are a lot of IRQs happening (or how else do you explain the mouse cursor in VNC jerking and lagging behind time?). The Ceph mailing list doesn't really help. High CPU load doesn't hurt the machine, it's only hard disk I/O. Oh, and the main host running kvm doesn't really suffer either: some I/O waiting, but not really swapping or anything. Here's my libvirt config if it is of any help: http://pastie.org/6411055 Any hints would REALLY be appreciated... Please try using virtio-blk instead of IDE. If the guest still jerks, try using the Linux rbd block driver instead of QEMU's -drive rbd:. I haven't used Ceph much but there should be documentation on attaching a RADOS block device to your Linux host.
Tell QEMU to use the RADOS block device like a regular file (you are now using the kernel driver instead of QEMU code to talk to the Ceph cluster). Please let us know the outcome. If you find that virtio-blk does not make much difference but using the kernel rbd driver does, then this suggests there is a bug in QEMU's block/rbd.c. Stefan
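The suggestion above (kernel rbd driver plus virtio-blk instead of QEMU's -drive rbd: with IDE) could look roughly like this. The pool/image names follow the quoted command line; the exact device path udev creates for mapped images can differ by setup, so verify it before booting:

```shell
# Map the image with the in-kernel rbd driver (monitors from /etc/ceph/ceph.conf),
# then point QEMU at the resulting block device via virtio-blk.
rbd map rd/korfu
# Device path is setup-dependent: often /dev/rbd0 or /dev/rbd/<pool>/<image>.
kvm -drive file=/dev/rbd/rd/korfu,if=virtio,format=raw,cache=none ...
```

If the jerking disappears with this configuration, that isolates the problem to QEMU's userspace rbd block driver rather than to Ceph itself.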
Re: [PATCH 5/5] tcm_vhost: Add hotplug/hotunplug support
On Thu, Mar 07, 2013 at 08:26:20AM +0800, Asias He wrote: On Wed, Mar 06, 2013 at 10:21:09AM +0100, Stefan Hajnoczi wrote: On Wed, Mar 06, 2013 at 02:16:30PM +0800, Asias He wrote:

+static struct tcm_vhost_evt *tcm_vhost_allocate_evt(struct vhost_scsi *vs,
+	u32 event, u32 reason)
+{
+	struct tcm_vhost_evt *evt;
+
+	if (atomic_read(&vs->vs_events_nr) > VHOST_SCSI_MAX_EVENT)
+		return NULL;
+
+	evt = kzalloc(sizeof(*evt), GFP_KERNEL);
+
+	if (evt) {
+		atomic_inc(&vs->vs_events_nr);

This looks suspicious: checking vs_events_nr against VHOST_SCSI_MAX_EVENT first and then incrementing later isn't atomic! This does not matter. (1) and (2) are okay. In case (3), the other side can only decrease the number of events, so the limit will not be exceeded. (1) atomic_dec() atomic_read() atomic_inc() (2) atomic_read() atomic_inc() atomic_dec() (3) atomic_read() atomic_dec() atomic_inc() The cases you listed are fine but I'm actually concerned about tcm_vhost_allocate_evt() racing with itself. There are 3 callers and I'm not sure which lock prevents them from executing at the same time.

+static int tcm_vhost_hotunplug(struct tcm_vhost_tpg *tpg, struct se_lun *lun)
+{
+	struct vhost_scsi *vs = tpg->vhost_scsi;
+
+	mutex_lock(&tpg->tv_tpg_mutex);
+	vs = tpg->vhost_scsi;
+	mutex_unlock(&tpg->tv_tpg_mutex);
+	if (!vs)
+		return -EOPNOTSUPP;
+
+	if (!tcm_vhost_check_feature(vs, VIRTIO_SCSI_F_HOTPLUG))
+		return -EOPNOTSUPP;
+
+	return tcm_vhost_send_evt(vs, tpg, lun,
+			VIRTIO_SCSI_T_TRANSPORT_RESET,
+			VIRTIO_SCSI_EVT_RESET_REMOVED);
+}

tcm_vhost_hotplug() and tcm_vhost_hotunplug() are the same function except for VIRTIO_SCSI_EVT_RESET_RESCAN vs VIRTIO_SCSI_EVT_RESET_REMOVED. That can be passed in as an argument and the code duplication can be eliminated. I thought about this also. We can have a tcm_vhost_do_hotplug() helper.
tcm_vhost_do_hotplug(tpg, lun, plug)

tcm_vhost_hotplug() {
	tcm_vhost_do_hotplug(tpg, lun, true)
}

tcm_vhost_hotunplug() {
	tcm_vhost_do_hotplug(tpg, lun, false)
}

The reason I did not do that is that I do not like the true/false argument, but anyway this would remove the duplication. I will do it. true/false makes the calling code hard to read; I suggest passing in VIRTIO_SCSI_EVT_RESET_RESCAN or VIRTIO_SCSI_EVT_RESET_REMOVED as the argument. Stefan
Re: [PATCH 5/5] tcm_vhost: Add hotplug/hotunplug support
On Thu, Mar 07, 2013 at 05:47:26PM +0800, Asias He wrote: On Thu, Mar 07, 2013 at 09:58:04AM +0100, Stefan Hajnoczi wrote: On Thu, Mar 07, 2013 at 08:26:20AM +0800, Asias He wrote: On Wed, Mar 06, 2013 at 10:21:09AM +0100, Stefan Hajnoczi wrote: On Wed, Mar 06, 2013 at 02:16:30PM +0800, Asias He wrote:

+static struct tcm_vhost_evt *tcm_vhost_allocate_evt(struct vhost_scsi *vs,
+	u32 event, u32 reason)
+{
+	struct tcm_vhost_evt *evt;
+
+	if (atomic_read(&vs->vs_events_nr) > VHOST_SCSI_MAX_EVENT)
+		return NULL;
+
+	evt = kzalloc(sizeof(*evt), GFP_KERNEL);
+
+	if (evt) {
+		atomic_inc(&vs->vs_events_nr);

This looks suspicious: checking vs_events_nr against VHOST_SCSI_MAX_EVENT first and then incrementing later isn't atomic! This does not matter. (1) and (2) are okay. In case (3), the other side can only decrease the number of events, so the limit will not be exceeded. (1) atomic_dec() atomic_read() atomic_inc() (2) atomic_read() atomic_inc() atomic_dec() (3) atomic_read() atomic_dec() atomic_inc() The cases you listed are fine but I'm actually concerned about tcm_vhost_allocate_evt() racing with itself. There are 3 callers and I'm not sure which lock prevents them from executing at the same time. No lock to prevent it. But what is the race when executing tcm_vhost_allocate_evt() at the same time?

atomic_read() <= VHOST_SCSI_MAX_EVENT
                                        atomic_read() <= VHOST_SCSI_MAX_EVENT
atomic_inc()
                                        atomic_inc()

Now vs->vs_events_nr == VHOST_SCSI_MAX_EVENT + 1, which the if statement was supposed to prevent.
+static int tcm_vhost_hotunplug(struct tcm_vhost_tpg *tpg, struct se_lun *lun)
+{
+	struct vhost_scsi *vs = tpg->vhost_scsi;
+
+	mutex_lock(&tpg->tv_tpg_mutex);
+	vs = tpg->vhost_scsi;
+	mutex_unlock(&tpg->tv_tpg_mutex);
+	if (!vs)
+		return -EOPNOTSUPP;
+
+	if (!tcm_vhost_check_feature(vs, VIRTIO_SCSI_F_HOTPLUG))
+		return -EOPNOTSUPP;
+
+	return tcm_vhost_send_evt(vs, tpg, lun,
+			VIRTIO_SCSI_T_TRANSPORT_RESET,
+			VIRTIO_SCSI_EVT_RESET_REMOVED);
+}

tcm_vhost_hotplug() and tcm_vhost_hotunplug() are the same function except for VIRTIO_SCSI_EVT_RESET_RESCAN vs VIRTIO_SCSI_EVT_RESET_REMOVED. That can be passed in as an argument and the code duplication can be eliminated. I thought about this also. We can have a tcm_vhost_do_hotplug() helper.

tcm_vhost_do_hotplug(tpg, lun, plug)

tcm_vhost_hotplug() {
	tcm_vhost_do_hotplug(tpg, lun, true)
}

tcm_vhost_hotunplug() {
	tcm_vhost_do_hotplug(tpg, lun, false)
}

The reason I did not do that is that I do not like the true/false argument, but anyway this would remove the duplication. I will do it. true/false makes the calling code hard to read; I suggest passing in VIRTIO_SCSI_EVT_RESET_RESCAN or VIRTIO_SCSI_EVT_RESET_REMOVED as the argument. Yes. However, I think passing VIRTIO_SCSI_EVT_RESET_* is even worse. 1) Having VIRTIO_SCSI_EVT_RESET_RESCAN or VIRTIO_SCSI_EVT_RESET_REMOVED around VIRTIO_SCSI_T_TRANSPORT_RESET would be nicer. 2) tcm_vhost_do_hotplug(tpg, lun, VIRTIO_SCSI_EVT_RESET_*) does not make much sense. What the hell is VIRTIO_SCSI_EVT_RESET_* when you do hotplug or hotunplug? In contrast, if we have tcm_vhost_do_hotplug(tpg, lun, plug), plug means doing hotplug or hotunplug. The VIRTIO_SCSI_EVT_RESET_REMOVED constant is pretty clear (removed means unplug). The VIRTIO_SCSI_EVT_RESET_RESCAN is less clear, but this code is in drivers/vhost/tcm_vhost.c so you can expect the reader to know the device specification :).
Anyway, it's not the end of the world if you leave the duplicated code in, use a boolean parameter, or use the virtio event constant. Stefan
Re: [PATCH 2/5] tcm_vhost: Introduce tcm_vhost_check_feature()
On Wed, Mar 06, 2013 at 02:16:27PM +0800, Asias He wrote: This helper is useful to check if a feature is supported. Signed-off-by: Asias He as...@redhat.com --- drivers/vhost/tcm_vhost.c | 14 ++ 1 file changed, 14 insertions(+)

diff --git a/drivers/vhost/tcm_vhost.c b/drivers/vhost/tcm_vhost.c
index b3e50d7..fdbf986 100644
--- a/drivers/vhost/tcm_vhost.c
+++ b/drivers/vhost/tcm_vhost.c
@@ -91,6 +91,20 @@ static int iov_num_pages(struct iovec *iov)
 	       ((unsigned long)iov->iov_base & PAGE_MASK)) >> PAGE_SHIFT;
 }

+static bool tcm_vhost_check_feature(struct vhost_scsi *vs, u64 feature)
+{
+	u64 acked_features;
+	bool ret = false;
+
+	mutex_lock(&vs->dev.mutex);
+	acked_features = vs->dev.acked_features;
+	if (acked_features & (1ULL << feature))
+		ret = true;
+	mutex_unlock(&vs->dev.mutex);
+
+	return ret;
+}

This is like vhost_has_feature() except it acquires dev.mutex? In any case it isn't tcm_vhost-specific and could be in vhost.c. Stefan
Re: Tracing kvm: kvm_entry and kvm_exit
On Thu, Feb 28, 2013 at 5:49 AM, David Ahern dsah...@gmail.com wrote: On 2/27/13 9:39 AM, David Ahern wrote: I have been playing with the live mode a bit lately. I'll add a debug to note 2 consecutive entry events without an exit -- see if it sheds some light on it. If you feel game take this for a spin: https://github.com/dsahern/linux/commits/perf-kvm-live-3.8 This is very cool, thanks for sharing. Next time I'm profiling vmexit latencies I'll give it a try. Stefan
Re: [Qemu-devel] [PATCH v3 4/5] KVM: ioeventfd for virtio-ccw devices.
On Tue, Feb 26, 2013 at 12:55:36PM +0200, Michael S. Tsirkin wrote: On Mon, Feb 25, 2013 at 04:27:49PM +0100, Cornelia Huck wrote: diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c index f0ced1a..8de3cd7 100644 --- a/virt/kvm/eventfd.c +++ b/virt/kvm/eventfd.c @@ -679,11 +679,16 @@ static int kvm_assign_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args) { int pio = args->flags & KVM_IOEVENTFD_FLAG_PIO; - enum kvm_bus bus_idx = pio ? KVM_PIO_BUS : KVM_MMIO_BUS; + int ccw; + enum kvm_bus bus_idx; struct _ioeventfd *p; struct eventfd_ctx *eventfd; int ret; + ccw = args->flags & KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY; + bus_idx = pio ? KVM_PIO_BUS : + ccw ? KVM_VIRTIO_CCW_NOTIFY_BUS : + KVM_MMIO_BUS; May be better to rewrite using if/else. Saw this after sending my comment. I agree with Michael, an if statement allows you to drop the locals and capture the bus_idx conversion in a single place (it could even be a static function to save duplicating the code in both functions that use it). Stefan
Re: [PATCH v3 4/5] KVM: ioeventfd for virtio-ccw devices.
On Mon, Feb 25, 2013 at 04:27:49PM +0100, Cornelia Huck wrote: diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c index f0ced1a..8de3cd7 100644 --- a/virt/kvm/eventfd.c +++ b/virt/kvm/eventfd.c @@ -679,11 +679,16 @@ static int kvm_assign_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args) { int pio = args->flags & KVM_IOEVENTFD_FLAG_PIO; - enum kvm_bus bus_idx = pio ? KVM_PIO_BUS : KVM_MMIO_BUS; + int ccw; + enum kvm_bus bus_idx; struct _ioeventfd *p; struct eventfd_ctx *eventfd; int ret; + ccw = args->flags & KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY; + bus_idx = pio ? KVM_PIO_BUS : + ccw ? KVM_VIRTIO_CCW_NOTIFY_BUS : + KVM_MMIO_BUS; /* must be natural-word sized */ switch (args->len) { case 1: @@ -759,11 +764,16 @@ static int kvm_deassign_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args) { int pio = args->flags & KVM_IOEVENTFD_FLAG_PIO; - enum kvm_bus bus_idx = pio ? KVM_PIO_BUS : KVM_MMIO_BUS; + int ccw; + enum kvm_bus bus_idx; struct _ioeventfd *p, *tmp; struct eventfd_ctx *eventfd; int ret = -ENOENT; + ccw = args->flags & KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY; + bus_idx = pio ? KVM_PIO_BUS : + ccw ? KVM_VIRTIO_CCW_NOTIFY_BUS : + KVM_MMIO_BUS; This is getting pretty convoluted. Drop the pio and ccw local variables and replace ?: with an if statement:

if (args->flags & KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY)
	bus_idx = KVM_VIRTIO_CCW_NOTIFY_BUS;
else if (args->flags & KVM_IOEVENTFD_FLAG_PIO)
	bus_idx = KVM_PIO_BUS;
else
	bus_idx = KVM_MMIO_BUS;
Re: Is there any solution in KVM that like VAAI does in EXSI
On Tue, Feb 26, 2013 at 01:49:42PM +0800, Timon Wang wrote: Is there any solution in KVM that works like VAAI does in EXSI? I found a PPT posted in Sep. 2012 which said that storage offload will be considered in the future. I am wondering whether anybody knows about this, or can provide some information about it? Thin Provisioning support is being added to QEMU. Some configurations already work - virtio-scsi on a block device or raw file supports discard, for example. Linux recently got Zero Blocks support in the form of the BLKZEROOUT ioctl. It is not being exploited by QEMU or libvirt yet. Copy Offload: I'm not aware of active development. Perhaps libvirt or libstoragemgmt will support it. Stefan
Re: Tracing kvm: kvm_entry and kvm_exit
On Fri, Feb 22, 2013 at 11:34:27AM -0500, Mohamad Gebai wrote: I am tracing kvm using perf and I am analyzing the sequences of kvm_entry and kvm_exit tracepoints. I noticed that during the boot process of a VM, there are a lot more (2 to 3 times as many) kvm_entry events than there are kvm_exit. I tried looking around but didn't find anything that explains this. Is this missing instrumentation? Or what other path does kvm take that doesn't generate a kvm_exit event? Gleb Natapov noticed something similar when playing with the perf script I posted here: http://permalink.gmane.org/gmane.comp.emulators.kvm.devel/104181 Perhaps there is a code path that is missing trace_kvm_exit(). We didn't investigate why it happens but the unexplained kvm_entry events only appeared at the beginning of the trace, so the theory was that events are not activated atomically by perf(1). CCing perf mailing list. It would be interesting if someone knows the answer. Stefan
Re: qemu help documentation
On Thu, Feb 14, 2013 at 02:22:51PM +0100, Paolo Pedaletti wrote: I have trouble to get full list of the output of qemu help inside kvm when I switch to second console CTRL-ALT-2 I can't find the full list even inside source code (apt-get source qemu-kvm) and neither inside binary file (grep blockarg qemu-*) Is it possible to redirect the output of help on an external file? Or paging it? This because (the main problem is) I'm trying to get full kernel message at boot, but inside KVM window it's not possible to scroll up ( goal: http://pedalinux.blogspot.it/2013/02/physical-to-virtual-step-by-step.html ) or to dump outside terminal output. Try Ctrl+PageUp. If that doesn't work you can put the monitor on stdio like this: $ qemu-system-x86_64 -monitor stdio ... Then you can interact from your shell and scroll back up as usual. Stefan
Re: Win2003 disk corruption with kvm-1.0. and virtio
On Wed, Feb 13, 2013 at 10:53:14AM +0100, Sylvain Bauza wrote: As per documentation, Nova (Openstack Compute layer) is doing a 'qemu-img convert -s' against a running instance. http://docs.openstack.org/trunk/openstack-compute/admin/content/creating-images-from-running-instances.html That command will not corrupt the running instance because it opens the image read-only. It is possible that the new image is corrupted since qemu-img is reading from a qcow2 file that is changing underneath it. However, the chance is small as long as the snapshot isn't deleted while qemu-img convert is running. So this doesn't sound like the cause of the problems you are seeing. Stefan
Re: Win2003 disk corruption with kvm-1.0. and virtio
On Tue, Feb 12, 2013 at 03:30:37PM +0100, Sylvain Bauza wrote: We currently run Openstack Essex hosts with KVM-1.0 (Ubuntu 12.04) instances with qcow2,virtio,cache=none For Linux VMs, no trouble at all but we do observe filesystem corruption and inconsistency (missing DLLs, CHKDSK asked by EventViewer, failure at reboot) with some of our Windows 2003 SP2 64b images. At first boot, stress tests (CrystalDiskMark 3.0.2 and intensive CHKDSK) don't show up problems. It is only appearing 6 or 12h later. Are you running the latest virtio-win drivers? See http://www.linux-kvm.org/page/WindowsGuestDrivers/Download_Drivers. Have you tested with IDE instead of virtio on the Windows guests? Stefan
Re: Google Summer of Code 2013 ideas wiki open
On Thu, Feb 14, 2013 at 11:39 AM, harryxiyou harryxi...@gmail.com wrote: On Tue, Feb 12, 2013 at 5:21 AM, Stefan Hajnoczi stefa...@gmail.com wrote: On Thu, Feb 7, 2013 at 4:19 PM, Stefan Hajnoczi stefa...@gmail.com wrote: I believe Google will announce GSoC again this year (there is no guarantee though) and I have created the wiki page so we can begin organizing project ideas that students can choose from. Google Summer of Code 2013 has just been announced! http://google-opensource.blogspot.de/2013/02/flip-bits-not-burgers-google-summer-of.html Some project ideas have already been discussed on IRC or private emails. Please go ahead and put them on the project ideas wiki page: http://wiki.qemu.org/Google_Summer_of_Code_2013 I am a senior student and wanna do some jobs about storage in Libvirt in GSOC 2013. I wonder whether Libvirt and QEMU will join GSOC 2013 together. If true, i will focus on http://wiki.qemu.org/Google_Summer_of_Code_2013 and add myself introductions to QEMU links said by Stefan Hajnoczi. Could anyone give me some suggestions? Thanks in advance. Hi Harry, Thanks for your interest. You can begin thinking about ideas but please keep in mind that we are still in the very early stages of GSoC preparation. Google will publish the list of accepted organizations on April 8th. Then there is a period of over 3 weeks to discuss your project idea with the organization. In the meantime, the best thing to do is to get familiar with the code bases and see if you can find/fix a bug. Contributing patches is a great way to get noticed. There is always a chance that QEMU and/or libvirt may not be among the list of accepted organizations, so don't put all your eggs in one basket :). Stefan
Re: Win2003 disk corruption with kvm-1.0. and virtio
On Tue, Feb 12, 2013 at 03:30:37PM +0100, Sylvain Bauza wrote: We currently run Openstack Essex hosts with KVM-1.0 (Ubuntu 12.04) instances with qcow2,virtio,cache=none For Linux VMs, no trouble at all but we do observe filesystem corruption and inconsistency (missing DLLs, CHKDSK asked by EventViewer, failure at reboot) with some of our Windows 2003 SP2 64b images. At first boot, stress tests (CrystalDiskMark 3.0.2 and intensive CHKDSK) don't show up problems. It is only appearing 6 or 12h later. Do you have any idea on how to prevent it ? Is cache=writethrough an acceptable solution ? We don't want to leave qcow2 image format as it does allow to do live snapshots et al. How are you taking live snapshots? qemu-img should not be used on a disk image that is currently open by a running guest, it may lead to corruption. Stefan
Re: Google Summer of Code 2013 ideas wiki open
On Thu, Feb 7, 2013 at 4:19 PM, Stefan Hajnoczi stefa...@gmail.com wrote: I believe Google will announce GSoC again this year (there is no guarantee though) and I have created the wiki page so we can begin organizing project ideas that students can choose from. Google Summer of Code 2013 has just been announced! http://google-opensource.blogspot.de/2013/02/flip-bits-not-burgers-google-summer-of.html Some project ideas have already been discussed on IRC or private emails. Please go ahead and put them on the project ideas wiki page: http://wiki.qemu.org/Google_Summer_of_Code_2013 Stefan
Re: Google Summer of Code 2013 ideas wiki open
On Thu, Feb 7, 2013 at 4:19 PM, Stefan Hajnoczi stefa...@gmail.com wrote: CCed libvir-list to see if libvirt would like to do a joint application with QEMU. As mentioned, it's early days and GSoC 2013 has not been announced yet. I just want to start gathering ideas and seeing who is willing to mentor this year. Stefan
Re: Investigating abnormal stealtimes
On Tue, Feb 5, 2013 at 1:26 AM, Marcelo Tosatti mtosa...@redhat.com wrote: - 'Steal time' is the amount of time taken while vcpu is able to run but not runnable. Maybe 'vmexit latency' is a better name. You are right, 'vmexit latency' is a better name. - Perhaps it would be good to subtract the time the thread was involuntarily scheduled out due to 'timeslice' expiration. Otherwise, running a CPU intensive task returns false positives (that is, long delays due to rescheduling because the 'timeslice' was exhausted by guest CPU activity, not due to KVM or QEMU issues such as voluntarily scheduling in pthread_mutex_lock). Alternatively you can raise the priority of the vcpu threads (to get rid of the false positives). I think this depends on the use-case. If the aim is to find out why the guest has poor response times then timeslice expiration is interesting. If the aim is to optimize QEMU or kvm.ko then timeslice expiration is a nuisance :). Your idea to raise the vcpu thread priority sounds good to me. - Idea: Would be handy to extract trace events in the offending 'latency above threshold' vmexit/vmentry region. Say that you enable other trace events (unrelated to kvm) which can help identify the culprit. Instead of scanning the file manually searching for 100466.1062486786, save one vmexit/vmentry cycle, along with other trace events in that period, in a separate file. Good idea. Stefan
Re: How to limit upload bandwidth for a guest server?
On Sun, Feb 03, 2013 at 07:59:07PM -0600, Neil Aggarwal wrote: I have a CentOS server using KVM to host guest servers. I am trying to limit the bandwidth usable by a guest server. I tried to use tc, but that is only limiting the download bandwidth to a server. It does not seem to filter packets uploaded by the server. Is there a tool to limit upload traffic for a guest server? Consider using management tools like libvirt that handle tc and friends for you. The domain XML syntax is a <bandwidth> element inside <interface>, with <inbound> and <outbound> child elements. Back to the question, are you looking for ingress qdiscs? http://www.lartc.org/howto/lartc.adv-qdisc.ingress.html This is a standard tc question, not related to virtualization. You may get better help from Linux networking mailing lists or IRC channels. Stefan
Re: [Qemu-devel] QEMU buildbot maintenance state
On Wed, Jan 30, 2013 at 10:31:22AM +0100, Gerd Hoffmann wrote: Hi, Gerd: Are you willing to co-maintain the QEMU buildmaster with Daniel and Christian? It would be awesome if you could do this given your experience running and customizing buildbot. I'll try to set aside some time for that. Christian's idea to host the config at github is good, that certainly makes it easier to balance things across more people. Another thing which would be helpful: any chance we can set up a maintainer tree mirror @ git.qemu.org? A single repository where each maintainer tree shows up as a branch? This would make the buildbot setup *a lot* easier. We can go for an AnyBranchScheduler then, with the BuildFactory and BuildConfig shared, instead of needing one BuildFactory and BuildConfig per branch. It also makes the buildbot web interface less cluttered since we don't have an insane amount of BuildConfigs any more. And it saves some resources (bandwidth + disk space) for the buildslaves. For people who want to see what is coming or who want to test stuff that is cooking, it would be a nice service too if they had a one-stop shop where they can get everything. I sent a pull request that makes the BuildFactory definitions simpler using a single create_build_factory() function: https://github.com/b1-systems/buildbot/pull/1 Keep in mind that BuildFactories differ not just by repo/branch but also: * in-tree or out-of-tree * extra ./configure arguments * gmake instead of make I think this means it is not as simple as defining a single BuildFactory. Stefan
Re: [Qemu-devel] QEMU buildbot maintenance state
On Wed, Jan 30, 2013 at 10:31:22AM +0100, Gerd Hoffmann wrote: Hi, Gerd: Are you willing to co-maintain the QEMU buildmaster with Daniel and Christian? It would be awesome if you could do this given your experience running and customizing buildbot. I'll try to set aside some time for that. Excellent, thank you! Stefan
Re: QEMU buildbot maintenance state
On Tue, Jan 29, 2013 at 04:04:39PM +0100, Christian Berendt wrote: On 01/28/2013 03:29 PM, Daniel Gollub wrote: JFYI, the main buildbot configuration which controls everything (beside buildslave credentials) is accessible to everyone: http://people.b1-systems.de/~gollub/buildbot/ If you are familiar with buildbot feel free to incorporate your suggested changes directly on a copy and send me or Christian the diff so we just have to review and apply it. I moved the configuration to GitHub (https://github.com/b1-systems/buildbot). I'll add a cron job to the buildbot system to regularly pull and apply the latest configuration. Simply open a pull request to modify the configuration. Thanks Christian! I have updated the QEMU wiki page: http://wiki.qemu.org/ContinuousIntegration Stefan
Investigating abnormal stealtimes
Khoa and I have been discussing a workload that triggers softlockups and hung task warnings inside the guest. These warnings can pop up due to bugs in the guest Linux kernel but they can also be triggered by the hypervisor if vcpus are not being scheduled at reasonable times. I've wanted a tool that reports high stealtimes and includes the last vmexit reason. This allows us to figure out if specific I/O device emulation is taking too long or if other factors like host memory pressure are degrading guest performance. Here is a first sketch of such a tool. It's a perf-script(1) Python script which can be used to analyze perf.data files recorded with kvm:kvm_entry and kvm:kvm_exit events. Stealtimes exceeding a threshold will be flagged up: $ perf script -s /absolute/path/to/stealtime.py 100466.1062486786 9690: steal time 0.029318914 secs, exit_reason IO_INSTRUCTION, guest_rip 0x81278f02, isa 1, info1 0xcf80003, info2 0x0 The example above shows an I/O access to 0xcf8 (PCI Configuration Space Address port) that took about 28 milliseconds. The host pid was 9690; this can be used to investigate the QEMU vcpu thread. The guest rip can be used to investigate guest code that triggered this vmexit. Given this information, it becomes possible to debug QEMU to figure out why vmexit handling is taking too long. It might be due to global mutex contention if another thread holds the global mutex while blocking. This sort of investigation needs to be done manually today but it might be possible to add perf event handlers to watch for global mutex contention inside QEMU and automatically identify the culprit. Stalls inside the kvm kernel module can also be investigated since kvm:kvm_exit events are triggered when they happen too. I wanted to share in case it is useful for others. Suggestions for better approaches welcome! 
Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
#!/usr/bin/env python
# perf script event handlers, generated by perf script -g python
# Licensed under the terms of the GNU GPL License version 2
# Script to print steal times longer than a given threshold
#
# To collect trace data:
# $ perf record -a -e kvm:kvm_entry -e kvm:kvm_exit
#
# To print results from trace data:
#
# $ perf script -s /absolute/path/to/stealtime.py
# 100466.1062486786 9690: steal time 0.029318914 secs,
#                         exit_reason IO_INSTRUCTION,
#                         guest_rip 0x81278f02,
#                         isa 1, info1 0xcf80003, info2 0x0
#
# The example above shows an I/O access to 0xcf8 (PCI Configuration Space
# Address port) that took about 28 milliseconds.  The host pid was 9690; this
# can be used to investigate the QEMU vcpu thread.  The guest rip can be used
# to investigate guest code that triggered this vmexit.

# Print steal times longer than this threshold in milliseconds
THRESHOLD_MS = 100

import os
import sys

sys.path.append(os.environ['PERF_EXEC_PATH'] + \
    '/scripts/python/Perf-Trace-Util/lib/Perf/Trace')

from perf_trace_context import *
from Core import *

vcpu_threads = {}

def trace_begin():
    print 'argv:', str(sys.argv)

def trace_end():
    pass

def kvm__kvm_exit(event_name, context, common_cpu, common_secs, common_nsecs,
                  common_pid, common_comm, exit_reason, guest_rip, isa,
                  info1, info2):
    if common_pid in vcpu_threads:
        last = vcpu_threads[common_pid]
        assert last[0] == 'kvm__kvm_entry'
        while last[2] > common_nsecs:
            common_secs -= 1
            common_nsecs += 1 * 1000 * 1000 * 1000
        delta_secs = common_secs - last[1]
        delta_nsecs = common_nsecs - last[2]
    vcpu_threads[common_pid] = (event_name, common_secs, common_nsecs,
                                exit_reason, guest_rip, isa, info1, info2)

def kvm__kvm_entry(event_name, context, common_cpu, common_secs, common_nsecs,
                   common_pid, common_comm, vcpu_id):
    if common_pid in vcpu_threads:
        last = vcpu_threads[common_pid]
        assert last[0] == 'kvm__kvm_exit'
        while last[2] > common_nsecs:
            common_secs -= 1
            common_nsecs += 1 * 1000 * 1000 * 1000
        delta_secs = common_secs - last[1]
        delta_nsecs = common_nsecs - last[2]
        if delta_secs > 0 or delta_nsecs > THRESHOLD_MS * 1000 * 1000:
            print '%05u.%09u %u: steal time %05u.%09u secs, exit_reason %s, guest_rip %#x, isa %d, info1 %#x, info2 %#x' % (
                last[1], last[2], common_pid, delta_secs, delta_nsecs,
                symbol_str("kvm__kvm_exit", "exit_reason", last[3]),
                last[4], last[5], last[6], last[7])
    vcpu_threads[common_pid] = (event_name, common_secs, common_nsecs)

def trace_unhandled(event_name, context, event_fields_dict):
    print ' '.join
Re: [Qemu-devel] QEMU buildbot maintenance state (was: Re: KVM call agenda for 2013-01-29)
On Mon, Jan 28, 2013 at 03:29:16PM +0100, Daniel Gollub wrote: If Daniel does not have sufficient time to administer it, can we maybe have that set up on qemu.org instead, with more than one person that has access to it? JFYI, I just asked whether I am allowed to grant Stefan root access to our box. I would not mind giving him access - but need to check back with our IT first. Thanks for offering this. Unfortunately I can't accept because I'm at the limit of keeping up with my other QEMU responsibilities. I don't have enough time to do this job well. Gerd: Are you willing to co-maintain the QEMU buildmaster with Daniel and Christian? It would be awesome if you could do this given your experience running and customizing buildbot. Stefan
Re: KVM call agenda for 2013-01-29
On Mon, Jan 28, 2013 at 11:59:40AM +0100, Juan Quintela wrote: Please send in any agenda topics you are interested in. Replacing select(2) so that we will not hit the 1024 fd_set limit in the future. Stefan
Re: [QEMU PATCH v5 0/3] virtio-net: fix of ctrl commands
On Tue, Jan 22, 2013 at 11:44:43PM +0800, Amos Kong wrote: Currently the virtio-net code relies on the layout of descriptors; this patchset removes those assumptions and introduces a control command to set the mac address. The last patch is a trivial renaming. V2: check guest's iov_len V3: fix of migration compatibility; make mac field in config space read-only when new feature is acked V4: add fix of descriptor layout assumptions, trivial rename V5: fix endianness after iov_to_buf copy Amos Kong (2): virtio-net: introduce a new macaddr control virtio-net: rename ctrl rx commands Michael S. Tsirkin (1): virtio-net: remove layout assumptions for ctrl vq hw/pc_piix.c | 4 ++ hw/virtio-net.c | 142 +- hw/virtio-net.h | 26 +++ 3 files changed, 108 insertions(+), 64 deletions(-) Reviewed-by: Stefan Hajnoczi stefa...@redhat.com
Re: [QEMU PATCH v4 1/3] virtio-net: remove layout assumptions for ctrl vq
On Tue, Jan 22, 2013 at 10:38:14PM +0800, Amos Kong wrote: On Mon, Jan 21, 2013 at 05:03:30PM +0100, Stefan Hajnoczi wrote: On Sat, Jan 19, 2013 at 09:54:26AM +0800, ak...@redhat.com wrote: From: Michael S. Tsirkin m...@redhat.com Virtio-net code makes assumptions about the virtqueue descriptor layout (e.g. sg[0] is the header, sg[1] is the data buffer). This patch makes the code not rely on the layout of descriptors. Signed-off-by: Michael S. Tsirkin m...@redhat.com Signed-off-by: Amos Kong ak...@redhat.com --- hw/virtio-net.c | 128 1 file changed, 74 insertions(+), 54 deletions(-) diff --git a/hw/virtio-net.c b/hw/virtio-net.c index 3bb01b1..113e194 100644 --- a/hw/virtio-net.c +++ b/hw/virtio-net.c @@ -315,44 +315,44 @@ static void virtio_net_set_features(VirtIODevice *vdev, uint32_t features) } static int virtio_net_handle_rx_mode(VirtIONet *n, uint8_t cmd, - VirtQueueElement *elem) + struct iovec *iov, unsigned int iov_cnt) { uint8_t on; +size_t s; -if (elem->out_num != 2 || elem->out_sg[1].iov_len != sizeof(on)) { -error_report("virtio-net ctrl invalid rx mode command"); -exit(1); +s = iov_to_buf(iov, iov_cnt, 0, &on, sizeof(on)); +if (s != sizeof(on)) { +return VIRTIO_NET_ERR; } -on = ldub_p(elem->out_sg[1].iov_base); - -if (cmd == VIRTIO_NET_CTRL_RX_MODE_PROMISC) +if (cmd == VIRTIO_NET_CTRL_RX_MODE_PROMISC) { n->promisc = on; -else if (cmd == VIRTIO_NET_CTRL_RX_MODE_ALLMULTI) +} else if (cmd == VIRTIO_NET_CTRL_RX_MODE_ALLMULTI) { n->allmulti = on; -else if (cmd == VIRTIO_NET_CTRL_RX_MODE_ALLUNI) +} else if (cmd == VIRTIO_NET_CTRL_RX_MODE_ALLUNI) { n->alluni = on; -else if (cmd == VIRTIO_NET_CTRL_RX_MODE_NOMULTI) +} else if (cmd == VIRTIO_NET_CTRL_RX_MODE_NOMULTI) { n->nomulti = on; -else if (cmd == VIRTIO_NET_CTRL_RX_MODE_NOUNI) +} else if (cmd == VIRTIO_NET_CTRL_RX_MODE_NOUNI) { n->nouni = on; -else if (cmd == VIRTIO_NET_CTRL_RX_MODE_NOBCAST) +} else if (cmd == VIRTIO_NET_CTRL_RX_MODE_NOBCAST) { n->nobcast = on; -else +} else { return VIRTIO_NET_ERR; +} return VIRTIO_NET_OK; }
static int virtio_net_handle_mac(VirtIONet *n, uint8_t cmd, - VirtQueueElement *elem) + struct iovec *iov, unsigned int iov_cnt) { struct virtio_net_ctrl_mac mac_data; +size_t s; -if (cmd != VIRTIO_NET_CTRL_MAC_TABLE_SET || elem->out_num != 3 || -elem->out_sg[1].iov_len < sizeof(mac_data) || -elem->out_sg[2].iov_len < sizeof(mac_data)) +if (cmd != VIRTIO_NET_CTRL_MAC_TABLE_SET) { return VIRTIO_NET_ERR; +} n->mac_table.in_use = 0; n->mac_table.first_multi = 0; @@ -360,54 +360,71 @@ static int virtio_net_handle_mac(VirtIONet *n, uint8_t cmd, n->mac_table.multi_overflow = 0; memset(n->mac_table.macs, 0, MAC_TABLE_ENTRIES * ETH_ALEN); -mac_data.entries = ldl_p(elem->out_sg[1].iov_base); +s = iov_to_buf(iov, iov_cnt, 0, &mac_data.entries, + sizeof(mac_data.entries)); Hi Stefan, can we adjust the endianness after each iov_to_buf() copy? Yes. It's only necessary for uint16_t and larger types since a single byte cannot be swapped (so ldub_p() is not needed). Stefan
Re: [QEMU PATCH v4 1/3] virtio-net: remove layout assumptions for ctrl vq
On Sat, Jan 19, 2013 at 09:54:26AM +0800, ak...@redhat.com wrote:
From: Michael S. Tsirkin m...@redhat.com

Virtio-net code makes assumption about virtqueue descriptor layout (e.g. sg[0] is the header, sg[1] is the data buffer). This patch makes code not rely on the layout of descriptors.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
Signed-off-by: Amos Kong ak...@redhat.com
---
 hw/virtio-net.c | 128
 1 file changed, 74 insertions(+), 54 deletions(-)

diff --git a/hw/virtio-net.c b/hw/virtio-net.c
index 3bb01b1..113e194 100644
--- a/hw/virtio-net.c
+++ b/hw/virtio-net.c
@@ -315,44 +315,44 @@ static void virtio_net_set_features(VirtIODevice *vdev, uint32_t features)
 }
 
 static int virtio_net_handle_rx_mode(VirtIONet *n, uint8_t cmd,
-                                     VirtQueueElement *elem)
+                                     struct iovec *iov, unsigned int iov_cnt)
 {
     uint8_t on;
+    size_t s;
 
-    if (elem->out_num != 2 || elem->out_sg[1].iov_len != sizeof(on)) {
-        error_report("virtio-net ctrl invalid rx mode command");
-        exit(1);
+    s = iov_to_buf(iov, iov_cnt, 0, &on, sizeof(on));
+    if (s != sizeof(on)) {
+        return VIRTIO_NET_ERR;
     }
 
-    on = ldub_p(elem->out_sg[1].iov_base);
-
-    if (cmd == VIRTIO_NET_CTRL_RX_MODE_PROMISC)
+    if (cmd == VIRTIO_NET_CTRL_RX_MODE_PROMISC) {
         n->promisc = on;
-    else if (cmd == VIRTIO_NET_CTRL_RX_MODE_ALLMULTI)
+    } else if (cmd == VIRTIO_NET_CTRL_RX_MODE_ALLMULTI) {
         n->allmulti = on;
-    else if (cmd == VIRTIO_NET_CTRL_RX_MODE_ALLUNI)
+    } else if (cmd == VIRTIO_NET_CTRL_RX_MODE_ALLUNI) {
         n->alluni = on;
-    else if (cmd == VIRTIO_NET_CTRL_RX_MODE_NOMULTI)
+    } else if (cmd == VIRTIO_NET_CTRL_RX_MODE_NOMULTI) {
         n->nomulti = on;
-    else if (cmd == VIRTIO_NET_CTRL_RX_MODE_NOUNI)
+    } else if (cmd == VIRTIO_NET_CTRL_RX_MODE_NOUNI) {
         n->nouni = on;
-    else if (cmd == VIRTIO_NET_CTRL_RX_MODE_NOBCAST)
+    } else if (cmd == VIRTIO_NET_CTRL_RX_MODE_NOBCAST) {
         n->nobcast = on;
-    else
+    } else {
         return VIRTIO_NET_ERR;
+    }
 
     return VIRTIO_NET_OK;
 }
 
 static int virtio_net_handle_mac(VirtIONet *n, uint8_t cmd,
-                                 VirtQueueElement *elem)
+                                 struct iovec *iov, unsigned int iov_cnt)
 {
     struct virtio_net_ctrl_mac mac_data;
+    size_t s;
 
-    if (cmd != VIRTIO_NET_CTRL_MAC_TABLE_SET || elem->out_num != 3 ||
-        elem->out_sg[1].iov_len < sizeof(mac_data) ||
-        elem->out_sg[2].iov_len < sizeof(mac_data))
+    if (cmd != VIRTIO_NET_CTRL_MAC_TABLE_SET) {
         return VIRTIO_NET_ERR;
+    }
 
     n->mac_table.in_use = 0;
     n->mac_table.first_multi = 0;
@@ -360,54 +360,71 @@ static int virtio_net_handle_mac(VirtIONet *n, uint8_t cmd,
     n->mac_table.multi_overflow = 0;
     memset(n->mac_table.macs, 0, MAC_TABLE_ENTRIES * ETH_ALEN);
 
-    mac_data.entries = ldl_p(elem->out_sg[1].iov_base);
+    s = iov_to_buf(iov, iov_cnt, 0, &mac_data.entries,
+                   sizeof(mac_data.entries));
 
-    if (sizeof(mac_data.entries) +
-        (mac_data.entries * ETH_ALEN) > elem->out_sg[1].iov_len)
+    if (s != sizeof(mac_data.entries)) {
         return VIRTIO_NET_ERR;
+    }
 
+    iov_discard_front(&iov, &iov_cnt, s);
+
+    if (mac_data.entries * ETH_ALEN > iov_size(iov, iov_cnt)) {

The (possible) byteswap was lost. ldl_p() copies from target endianness to host endianness.

+        return VIRTIO_NET_ERR;
+    }
 
     if (mac_data.entries <= MAC_TABLE_ENTRIES) {
-        memcpy(n->mac_table.macs, elem->out_sg[1].iov_base + sizeof(mac_data),
-               mac_data.entries * ETH_ALEN);
+        s = iov_to_buf(iov, iov_cnt, 0, n->mac_table.macs,
+                       mac_data.entries * ETH_ALEN);
+        if (s != mac_data.entries * ETH_ALEN) {
+            return VIRTIO_NET_OK;

s/VIRTIO_NET_OK/VIRTIO_NET_ERR/

+        }
         n->mac_table.in_use += mac_data.entries;
     } else {
         n->mac_table.uni_overflow = 1;
     }
 
+    iov_discard_front(&iov, &iov_cnt, mac_data.entries * ETH_ALEN);
+
     n->mac_table.first_multi = n->mac_table.in_use;
 
-    mac_data.entries = ldl_p(elem->out_sg[2].iov_base);
+    s = iov_to_buf(iov, iov_cnt, 0, &mac_data.entries,
+                   sizeof(mac_data.entries));

Same deal with mac_data.entries byteswap.

-    if (sizeof(mac_data.entries) +
-        (mac_data.entries * ETH_ALEN) > elem->out_sg[2].iov_len)
+    if (s != sizeof(mac_data.entries)) {
         return VIRTIO_NET_ERR;
+    }
 
-    if (mac_data.entries) {
-        if (n->mac_table.in_use + mac_data.entries <= MAC_TABLE_ENTRIES) {
-            memcpy(n->mac_table.macs + (n->mac_table.in_use * ETH_ALEN),
-                   elem->out_sg[2].iov_base + sizeof(mac_data),
-
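The patch above replaces direct `elem->out_sg[]` accesses with QEMU's iov helpers. As a rough sketch of the semantics those helpers provide — this is a simplified re-implementation for illustration, not QEMU's actual `iov.c` code, which handles more edge cases:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/* Copy up to `bytes` bytes starting at `offset` within the vector into buf.
 * Returns the number of bytes actually copied (short if the vector is
 * smaller than offset + bytes) — which is why the handlers above compare
 * the return value against the expected size. */
static size_t iov_to_buf(const struct iovec *iov, unsigned int iov_cnt,
                         size_t offset, void *buf, size_t bytes)
{
    size_t done = 0;
    unsigned int i;

    for (i = 0; i < iov_cnt && done < bytes; i++) {
        if (offset >= iov[i].iov_len) {
            offset -= iov[i].iov_len;   /* skip this whole element */
            continue;
        }
        size_t n = iov[i].iov_len - offset;
        if (n > bytes - done) {
            n = bytes - done;
        }
        memcpy((char *)buf + done, (char *)iov[i].iov_base + offset, n);
        done += n;
        offset = 0;
    }
    return done;
}

/* Drop `bytes` bytes from the front of the vector in place, returning how
 * many bytes were discarded.  The first surviving element may be adjusted
 * to start mid-buffer. */
static size_t iov_discard_front(struct iovec **iov, unsigned int *iov_cnt,
                                size_t bytes)
{
    size_t done = 0;

    while (*iov_cnt > 0 && done < bytes) {
        size_t n = (*iov)[0].iov_len;
        if (n > bytes - done) {
            /* partially consume the first element */
            n = bytes - done;
            (*iov)[0].iov_base = (char *)(*iov)[0].iov_base + n;
            (*iov)[0].iov_len -= n;
            done += n;
            break;
        }
        done += n;
        (*iov)++;
        (*iov_cnt)--;
    }
    return done;
}
```

The point of the conversion is visible in these signatures: the handler no longer cares which sg element a field landed in, so a guest may split the control command across descriptors however it likes.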
Re: [QEMU PATCH v4 2/3] virtio-net: introduce a new macaddr control
On Sat, Jan 19, 2013 at 09:54:27AM +0800, ak...@redhat.com wrote:
@@ -350,6 +351,18 @@ static int virtio_net_handle_mac(VirtIONet *n, uint8_t cmd,
     struct virtio_net_ctrl_mac mac_data;
     size_t s;
 
+    if (cmd == VIRTIO_NET_CTRL_MAC_ADDR_SET) {
+        if (iov_size(iov, iov_cnt) != ETH_ALEN) {
+            return VIRTIO_NET_ERR;
+        }
+        s = iov_to_buf(iov, iov_cnt, 0, &n->mac, sizeof(n->mac));
+        if (s != sizeof(n->mac)) {
+            return VIRTIO_NET_ERR;
+        }

Since iov_size() was checked before iov_to_buf(), we never hit this error. And if we did, n->mac would be trashed (i.e. error handling is not complete). I think assert(s == sizeof(n->mac)) is more appropriate.

Also, please change ETH_ALEN to sizeof(n->mac) to make the relationship between the check and the copy clear.

Stefan
--
To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 2/3] net: split eth_mac_addr for better error handling
On Sun, Jan 20, 2013 at 10:43:08AM +0800, ak...@redhat.com wrote:
From: Stefan Hajnoczi stefa...@gmail.com

When we set the mac address, the software mac address in the system and the hardware mac address both need to be updated. The current eth_mac_addr() doesn't allow callers to implement error handling nicely.

This patch splits eth_mac_addr() into a prepare part and a real commit part, so we can prepare first, try to change the hardware address, and then do the real commit if the hardware address was set successfully.

Signed-off-by: Stefan Hajnoczi stefa...@gmail.com
Signed-off-by: Amos Kong ak...@redhat.com
---
 include/linux/etherdevice.h |  2
 net/ethernet/eth.c          | 43
 2 files changed, 38 insertions(+), 7 deletions(-)

Feel free to make yourself the author and put me just as Suggested-by:. I posted pseudo-code but didn't write the patch or test it, so it's fair to say the credit goes to you. :)

Stefan
Re: [PATCH v2] virtio-spec: set mac address by a new vq command
On Thu, Jan 17, 2013 at 06:25:47PM +0800, ak...@redhat.com wrote:
From: Amos Kong ak...@redhat.com

Virtio-net driver currently programs MAC address byte by byte, this means that we have an intermediate step where mac is wrong. This patch introduced a new control command to set MAC address in one time, and added a new feature flag VIRTIO_NET_F_MAC_ADDR for this feature.

Signed-off-by: Amos Kong ak...@redhat.com
---
v2: add more detail about new command (Stefan)
---
 virtio-spec.lyx | 58
 1 file changed, 57 insertions(+), 1 deletion(-)

diff --git a/virtio-spec.lyx b/virtio-spec.lyx
index 1ba9992..1ec0cd4 100644
--- a/virtio-spec.lyx
+++ b/virtio-spec.lyx
@@ -56,6 +56,7 @@
 \html_math_output 0
 \html_css_as_file 0
 \html_be_strict false
+\author -1930653948 Amos Kong
 \author -608949062 Rusty Russell,,,
 \author -385801441 Cornelia Huck cornelia.h...@de.ibm.com
 \author 1112500848 Rusty Russell ru...@rustcorp.com.au
@@ -4391,6 +4392,14 @@
 VIRTIO_NET_F_GUEST_ANNOUNCE(21) Guest can send gratuitous packets.
 \change_inserted 1986246365 1352742808
 VIRTIO_NET_F_MQ(22) Device supports multiqueue with automatic receive steering.
+\change_inserted -1930653948 1358319033
+
+\end_layout
+
+\begin_layout Description
+
+\change_inserted -1930653948 1358319080
+VIRTIO_NET_F_CTRL_MAC_ADDR(23) Set MAC address.
 \change_unchanged
 \end_layout
@@ -5284,7 +5293,11 @@
 The class VIRTIO_NET_CTRL_RX has two commands: VIRTIO_NET_CTRL_RX_PROMISC
 \end_layout
 
 \begin_layout Subsubsection*
-Setting MAC Address Filtering
+Setting MAC Address
+\change_deleted -1930653948 1358318470
+ Filtering
+\change_unchanged
+
 \end_layout
 
 \begin_layout Standard
@@ -5324,6 +5337,17 @@
 struct virtio_net_ctrl_mac {
 \begin_layout Plain Layout
 #define VIRTIO_NET_CTRL_MAC_TABLE_SET 0
+\change_inserted -1930653948 1358318313
+
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted -1930653948 1358318331
+ #define VIRTIO_NET_CTRL_MAC_ADDR_SET 1
+\change_unchanged
+
 \end_layout
 \end_inset
@@ -5349,6 +5373,38 @@
 T_CTRL_MAC_TABLE_SET.
 The command-specific-data is two variable length tables of 6-byte MAC addresses.
 The first table contains unicast addresses, and the second contains multicast addresses.
+\change_inserted -1930653948 1358318545
+
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted -1930653948 1358418243
+The config space
+\begin_inset Quotes eld
+\end_inset
+
+mac
+\begin_inset Quotes erd
+\end_inset
+
+ field and the command VIRTIO_NET_CTRL_MAC_ADDR_SET both set the default
+ MAC address which rx filtering accepts.
+ The command VIRTIO_NET_CTRL_MAC_ADDR_SET is atomic whereas the config space
+
+\begin_inset Quotes eld
+\end_inset
+
+mac
+\begin_inset Quotes erd
+\end_inset
+
+ field is not.
+ Therefore, VIRTIO_NET_CTRL_MAC_ADDR_SET is preferred, especially while
+ the NIC is up.
+ The command-specific-data is a 6-byte MAC address.
+\change_unchanged

The specification must also say that the mac field is read-only when the VIRTIO_NET_CTRL_MAC_ADDR_SET command is supported. (I think you added this behavior to your patch.)

Stefan
Re: [PATCH v3 2/2] virtio-net: introduce a new control to set macaddr
On Thu, Jan 17, 2013 at 06:40:12PM +0800, ak...@redhat.com wrote:
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 395ab4f..837c978 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -802,14 +802,32 @@ static int virtnet_set_mac_address(struct net_device *dev, void *p)
     struct virtnet_info *vi = netdev_priv(dev);
     struct virtio_device *vdev = vi->vdev;
     int ret;
+    struct scatterlist sg;
+    char save_addr[ETH_ALEN];
+    unsigned char save_aatype;
+
+    memcpy(save_addr, dev->dev_addr, ETH_ALEN);
+    save_aatype = dev->addr_assign_type;
 
     ret = eth_mac_addr(dev, p);
     if (ret)
         return ret;
 
-    if (virtio_has_feature(vdev, VIRTIO_NET_F_MAC))
+    if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_MAC_ADDR)) {
+        sg_init_one(&sg, dev->dev_addr, dev->addr_len);
+        if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_MAC,
+                                  VIRTIO_NET_CTRL_MAC_ADDR_SET,
+                                  &sg, 1, 0)) {
+            dev_warn(&vdev->dev,
+                     "Failed to set mac address by vq command.\n");
+            memcpy(dev->dev_addr, save_addr, ETH_ALEN);
+            dev->addr_assign_type = save_aatype;
+            return -EINVAL;
+        }

eth_mac_addr() doesn't allow callers to implement error handling nicely. Although you didn't duplicate its code directly, this patch still leaks internals of eth_mac_addr().

How about splitting eth_mac_addr() in a separate patch:

int eth_prepare_mac_addr_change(struct net_device *dev, void *p)
{
    struct sockaddr *addr = p;

    if (!(dev->priv_flags & IFF_LIVE_ADDR_CHANGE) && netif_running(dev))
        return -EBUSY;
    if (!is_valid_ether_addr(addr->sa_data))
        return -EADDRNOTAVAIL;
    return 0;
}

void eth_commit_mac_addr_change(struct net_device *dev, void *p)
{
    struct sockaddr *addr = p;

    memcpy(dev->dev_addr, addr->sa_data, ETH_ALEN);

    /* if device marked as NET_ADDR_RANDOM, reset it */
    dev->addr_assign_type &= ~NET_ADDR_RANDOM;
}

/* Default implementation of MAC address changing */
int eth_mac_addr(struct net_device *dev, void *p)
{
    int ret;

    ret = eth_prepare_mac_addr_change(dev, p);
    if (ret < 0)
        return ret;
    eth_commit_mac_addr_change(dev, p);
    return 0;
}

Now virtio_net.c does:

    ret = eth_prepare_mac_addr_change(dev, p);
    if (ret < 0)
        return ret;

    if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_MAC_ADDR)) {
        sg_init_one(&sg, dev->dev_addr, dev->addr_len);
        if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_MAC,
                                  VIRTIO_NET_CTRL_MAC_ADDR_SET,
                                  &sg, 1, 0)) {
            dev_warn(&vdev->dev,
                     "Failed to set mac address by vq command.\n");
            return -EINVAL;
        }
    }
    ...
    eth_commit_mac_addr_change(dev, p);
    return 0;

Stefan
Re: [Qemu-devel] [QEMU PATCH v2] virtio-net: introduce a new macaddr control
On Thu, Jan 17, 2013 at 01:45:11PM +0800, Amos Kong wrote:
On Thu, Jan 17, 2013 at 11:49:20AM +1030, Rusty Russell wrote:
ak...@redhat.com writes:
@@ -349,6 +351,14 @@ static int virtio_net_handle_mac(VirtIONet *n, uint8_t cmd,
 {
     struct virtio_net_ctrl_mac mac_data;
 
+    if (cmd == VIRTIO_NET_CTRL_MAC_ADDR_SET && elem->out_num == 2 &&
+        elem->out_sg[1].iov_len == ETH_ALEN) {
+        /* Set MAC address */
+        memcpy(n->mac, elem->out_sg[1].iov_base, elem->out_sg[1].iov_len);
+        qemu_format_nic_info_str(&n->nic->nc, n->mac);
+        return VIRTIO_NET_OK;
+    }

Does the rest of the net device still rely on the layout of descriptors?

No, only the info string of the net client relies on n->mac.

I think the question is whether the hw/virtio-net.c code makes assumptions about virtqueue descriptor layout (e.g. sg[0] is the header, sg[1] is the data buffer). The answer is yes, the control virtqueue function directly accesses iov[n].

Additional patches would be required to convert the existing hw/virtio-net.c code to make no assumptions about virtqueue descriptor layout. It's outside the scope of this series.

Stefan
Re: VirtIO id X is not a head!
On Wed, Jan 16, 2013 at 08:58:50PM +0100, Matthias Leinweber wrote:
I am trying to implement a virtual device/driver, but I ran into some trouble using the virtio API. My implementation looks as follows: a kthread exposes memory via add_buf, kicks, and sleeps. If a callback is issued, it is woken up and takes the filled buffer back via get_buf. (No other kthread, process, or whatever works on this vq in the kernel.) In QEMU, a qemu_thread waits for some shared memory, tries to pop elements from the vq, and copies some data into the guest-accessible memory. Not all elements are necessarily popped before fill, flush, and notify are called. If a pop returns 0, the thread goes to sleep until the handler routine for this vq wakes it up again.

From time to time (after several 100k gets, adds, and pops) I get: "id %u is not a head!". From virtio_ring.c:

    if (unlikely(i >= vq->vring.num)) {
        BAD_RING(vq, "id %u out of range\n", i);
        return NULL;

I have no idea what I am doing wrong. Is synchronization needed between add, pop, and get, or am I not allowed to use a qemu_thread when working on a vq?

Hard to tell exactly what is going on without seeing the code. QEMU has a global mutex and therefore does not need to do much explicit locking... except if you spawn your own thread. The hw/virtio.c code in QEMU is not thread-safe. You cannot use it from a thread without holding the QEMU global mutex.

It's fine to do I/O handling in worker threads, but you must use a BH, event notifier, or some other mechanism of kicking the QEMU iothread and process the virtqueue completion in a callback there.

Stefan
Re: [QEMU PATCH v3] virtio-net: introduce a new macaddr control
On Thu, Jan 17, 2013 at 06:30:46PM +0800, ak...@redhat.com wrote:
From: Amos Kong ak...@redhat.com

In the virtio-net guest driver, we currently write the MAC address to pci config space byte by byte; this means that there is an intermediate step where the mac is wrong. This patch introduces a new control command to set the MAC address; it's atomic.

VIRTIO_NET_F_CTRL_MAC_ADDR is a new feature bit for compatibility. The mac field will be set to read-only when VIRTIO_NET_F_CTRL_MAC_ADDR is acked.

Signed-off-by: Amos Kong ak...@redhat.com
---
V2: check guest's iov_len
V3: fix migration compatibility; make mac field in config space read-only when new feature is acked
---
 hw/pc_piix.c    |  4
 hw/virtio-net.c | 10
 hw/virtio-net.h | 12
 3 files changed, 23 insertions(+), 3 deletions(-)

Reviewed-by: Stefan Hajnoczi stefa...@redhat.com
Re: [QEMU PATCH v2] virtio-net: introduce a new macaddr control
On Wed, Jan 16, 2013 at 02:37:34PM +0800, Jason Wang wrote:
On Wednesday, January 16, 2013 02:16:47 PM ak...@redhat.com wrote:
From: Amos Kong ak...@redhat.com

In virtio-net guest driver, currently we write MAC address to pci config space byte by byte, this means that we have an intermediate step where mac is wrong. This patch introduced a new control command to set MAC address in one time.

VIRTIO_NET_F_CTRL_MAC_ADDR is a new feature bit for compatibility.

Signed-off-by: Amos Kong ak...@redhat.com
---
V2: check guest's iov_len before memcpy
---
 hw/virtio-net.c | 10
 hw/virtio-net.h |  9
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/hw/virtio-net.c b/hw/virtio-net.c
index dc7c6d6..d05f98f 100644
--- a/hw/virtio-net.c
+++ b/hw/virtio-net.c
@@ -247,6 +247,7 @@ static uint32_t virtio_net_get_features(VirtIODevice *vdev, uint32_t features)
     VirtIONet *n = to_virtio_net(vdev);
 
     features |= (1 << VIRTIO_NET_F_MAC);
+    features |= (1 << VIRTIO_NET_F_CTRL_MAC_ADDR);
 
     if (!peer_has_vnet_hdr(n)) {
         features &= ~(0x1 << VIRTIO_NET_F_CSUM);
@@ -282,6 +283,7 @@ static uint32_t virtio_net_bad_features(VirtIODevice *vdev)
     /* Linux kernel 2.6.25.  It understood MAC (as everyone must),
      * but also these: */
     features |= (1 << VIRTIO_NET_F_MAC);
+    features |= (1 << VIRTIO_NET_F_CTRL_MAC_ADDR);
     features |= (1 << VIRTIO_NET_F_CSUM);
     features |= (1 << VIRTIO_NET_F_HOST_TSO4);
     features |= (1 << VIRTIO_NET_F_HOST_TSO6);
@@ -349,6 +351,14 @@ static int virtio_net_handle_mac(VirtIONet *n, uint8_t cmd,
 {
     struct virtio_net_ctrl_mac mac_data;
 
+    if (cmd == VIRTIO_NET_CTRL_MAC_ADDR_SET && elem->out_num == 2 &&
+        elem->out_sg[1].iov_len == ETH_ALEN) {
+        /* Set MAC address */
+        memcpy(n->mac, elem->out_sg[1].iov_base, elem->out_sg[1].iov_len);
+        qemu_format_nic_info_str(&n->nic->nc, n->mac);
+        return VIRTIO_NET_OK;
+    }
+
     if (cmd != VIRTIO_NET_CTRL_MAC_TABLE_SET || elem->out_num != 3 ||
         elem->out_sg[1].iov_len < sizeof(mac_data) ||
         elem->out_sg[2].iov_len < sizeof(mac_data))
diff --git a/hw/virtio-net.h b/hw/virtio-net.h
index d46fb98..9394cc0 100644
--- a/hw/virtio-net.h
+++ b/hw/virtio-net.h
@@ -44,6 +44,8 @@
 #define VIRTIO_NET_F_CTRL_VLAN 19      /* Control channel VLAN filtering */
 #define VIRTIO_NET_F_CTRL_RX_EXTRA 20  /* Extra RX mode control support */
 
+#define VIRTIO_NET_F_CTRL_MAC_ADDR 23  /* Set MAC address */
+

I wonder whether we need a DEFINE_PROP_BIT to disable and compat this feature. Consider that we may migrate from a new version to an old version.

I agree, migration needs to be handled. The bit should never change while the device is initialized and running. We should also never start rejecting or ignoring the command if it was available before.

Stefan
Re: [PATCH] virtio-spec: set mac address by a new vq command
On Wed, Jan 16, 2013 at 03:33:24PM +0800, ak...@redhat.com wrote:
+\change_inserted -1930653948 1358320004
+The command VIRTIO_NET_CTRL_MAC_ADDR_SET is used to set
+\begin_inset Quotes eld
+\end_inset
+
+physical
+\begin_inset Quotes erd
+\end_inset
+
+ address of the network card.

The "physical address" of the network card? That term is not defined anywhere in the specification.

Perhaps it's best to explain that the config space mac field and VIRTIO_NET_CTRL_MAC_ADDR_SET both set the default MAC address which rx filtering accepts. (The MAC table is an additional set of MAC addresses which rx filtering accepts.)

It would also be worth explaining that VIRTIO_NET_CTRL_MAC_ADDR_SET is atomic whereas the config space mac field is not. Therefore, VIRTIO_NET_CTRL_MAC_ADDR_SET is preferred, especially while the NIC is up.

Stefan
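To make the atomicity point concrete, here is a small self-contained illustration (not driver or device code) of why byte-by-byte programming of the config-space mac field can expose an address that is neither the old nor the new MAC:

```c
#include <assert.h>
#include <string.h>

#define ETH_ALEN 6

/* Simulate programming the config-space "mac" field one byte at a time and
 * report whether any intermediate state is neither the old nor the new
 * address.  A single-command update (one buffer, one memcpy) never exposes
 * such a state. */
static int byte_by_byte_exposes_mixed_mac(const unsigned char *old_mac,
                                          const unsigned char *new_mac)
{
    unsigned char mac[ETH_ALEN];
    int i, mixed = 0;

    memcpy(mac, old_mac, ETH_ALEN);
    for (i = 0; i < ETH_ALEN; i++) {
        mac[i] = new_mac[i];    /* the device observes this partial state */
        if (memcmp(mac, old_mac, ETH_ALEN) != 0 &&
            memcmp(mac, new_mac, ETH_ALEN) != 0) {
            mixed = 1;          /* half-updated: rx filtering would drop
                                 * frames for both addresses here */
        }
    }
    return mixed;
}
```

This is the window that VIRTIO_NET_CTRL_MAC_ADDR_SET closes: the command carries all six bytes in one command-specific-data buffer, so rx filtering never sees a half-updated address.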
Re: [RFC PATCH] virtio-net: introduce a new macaddr control
On Thu, Jan 10, 2013 at 10:51:57PM +0800, ak...@redhat.com wrote:
@@ -349,6 +351,13 @@ static int virtio_net_handle_mac(VirtIONet *n, uint8_t cmd,
 {
     struct virtio_net_ctrl_mac mac_data;
 
+    if (cmd == VIRTIO_NET_CTRL_MAC_ADDR_SET && elem->out_num == 2) {
+        /* Set MAC address */
+        memcpy(n->mac, elem->out_sg[1].iov_base, elem->out_sg[1].iov_len);

We cannot trust the guest's iov_len, it could overflow n->mac.
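The safe pattern, sketched as standalone C for illustration (the names and return codes mirror the patch, but this is not the QEMU code): validate the guest-controlled length against the destination size before copying, and reject the command otherwise.

```c
#include <assert.h>
#include <string.h>
#include <sys/uio.h>

#define ETH_ALEN       6
#define VIRTIO_NET_OK  0
#define VIRTIO_NET_ERR 1

/* Copy a guest-supplied MAC only if the element length matches exactly.
 * memcpy'ing sg->iov_len bytes blindly would let the guest overflow mac[]
 * and write past it into adjacent device state. */
static int set_mac_checked(unsigned char mac[ETH_ALEN], const struct iovec *sg)
{
    if (sg->iov_len != ETH_ALEN) {
        return VIRTIO_NET_ERR;  /* reject a bad length from the guest */
    }
    memcpy(mac, sg->iov_base, ETH_ALEN);
    return VIRTIO_NET_OK;
}
```

The v2 patch in this thread adds exactly this kind of length check before the memcpy.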
Re: [PATCH 01/12] tap: multiqueue support
On Wed, Jan 09, 2013 at 11:25:24PM +0800, Jason Wang wrote:
On 01/09/2013 05:56 PM, Stefan Hajnoczi wrote:
On Fri, Dec 28, 2012 at 06:31:53PM +0800, Jason Wang wrote:

diff --git a/qapi-schema.json b/qapi-schema.json
index 5dfa052..583eb7c 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -2465,7 +2465,7 @@
 { 'type': 'NetdevTapOptions',
   'data': {
     '*ifname':     'str',
-    '*fd':         'str',
+    '*fd':         ['String'],

This change is not backwards-compatible. You need to add a '*fds': ['String'] field instead.

I don't quite understand this case; I think it still works when we just specify one fd.

You are right, the QemuOpts visitor shows no incompatibility. But there is also a QMP interface: netdev_add. I think changing the type to a string list breaks compatibility there.

Stefan
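One backwards-compatible shape for the schema, sketched here purely for illustration (the exact '*fds' spelling and type follow Stefan's suggestion above; they are an assumption, not necessarily what was merged): keep the existing '*fd' option untouched and add a separate list-valued option alongside it.

```
{ 'type': 'NetdevTapOptions',
  'data': {
    '*ifname': 'str',
    '*fd':     'str',
    '*fds':    ['String'] } }
```

With this shape, old QMP netdev_add calls that pass a single 'fd' string continue to validate unchanged, while management tools such as libvirt can pass multiple pre-created queue descriptors via 'fds'.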
Re: [Qemu-devel] [PATCH 00/12] Multiqueue virtio-net
On Wed, Jan 09, 2013 at 11:33:25PM +0800, Jason Wang wrote: On 01/09/2013 11:32 PM, Michael S. Tsirkin wrote: On Wed, Jan 09, 2013 at 03:29:24PM +0100, Stefan Hajnoczi wrote: On Fri, Dec 28, 2012 at 06:31:52PM +0800, Jason Wang wrote: Perf Numbers: Two Intel Xeon 5620 with direct connected intel 82599EB Host/Guest kernel: David net tree vhost enabled - lots of improvents of both latency and cpu utilization in request-reponse test - get regression of guest sending small packets which because TCP tends to batch less when the latency were improved 1q/2q/4q TCP_RR size #sessions trans.rate norm trans.rate norm trans.rate norm 1 1 9393.26 595.64 9408.18 597.34 9375.19 584.12 1 2072162.1 2214.24 129880.22 2456.13 196949.81 2298.13 1 50107513.38 2653.99 139721.93 2490.58 259713.82 2873.57 1 100 126734.63 2676.54 145553.5 2406.63 265252.68 2943 64 19453.42 632.33 9371.37 616.13 9338.19 615.97 64 20 70620.03 2093.68 125155.75 2409.15 191239.91 2253.32 64 50 1069662448.29 146518.67 2514.47 242134.07 2720.91 64 100 117046.35 2394.56 190153.09 2696.82 238881.29 2704.41 256 1 8733.29 736.36 8701.07 680.83 8608.92 530.1 256 20 69279.89 2274.45 115103.07 2299.76 144555.16 1963.53 256 50 97676.02 2296.09 150719.57 2522.92 254510.5 3028.44 256 100 150221.55 2949.56 197569.3 2790.92 300695.78 3494.83 TCP_CRR size #sessions trans.rate norm trans.rate norm trans.rate norm 1 1 2848.37 163.41 2230.39 130.89 2013.09 120.47 1 2023434.5 562.11 31057.43 531.07 49488.28 564.41 1 5028514.88 582.17 40494.23 605.92 60113.35 654.97 1 100 28827.22 584.73 48813.25 661.6 61783.62 676.56 64 12780.08 159.4 2201.07 127.96 2006.8 117.63 64 20 23318.51 564.47 30982.44 530.24 49734.95 566.13 64 50 28585.72 582.54 40576.7 610.08 60167.89 656.56 64 100 28747.37 584.17 49081.87 667.87 60612.94 662 256 1 2772.08 160.51 2231.84 131.05 2003.62 113.45 256 20 23086.35 559.8 30929.09 528.16 48454.9 555.22 256 50 28354.7 579.85 40578.31 60760261.71 657.87 256 100 28844.55 585.67 48541.86 659.08 61941.07 676.72 
TCP_STREAM guest receiving size #sessions throughput norm throughput norm throughput norm 1 1 16.27 1.33 16.11.12 16.13 0.99 1 2 33.04 2.08 32.96 2.19 32.75 1.98 1 4 66.62 6.83 68.35.56 66.14 2.65 64 1896.55 56.67 914.02 58.14 898.9 61.56 64 21830.46 91.02 1812.02 64.59 1835.57 66.26 64 43626.61 142.55 3636.25 100.64 3607.46 75.03 256 1 2619.49 131.23 2543.19 129.03 2618.69 132.39 256 2 5136.58 203.02 5163.31 141.11 5236.51 149.4 256 4 7063.99 242.83 9365.4 208.49 9421.03 159.94 512 1 3592.43 165.24 3603.12 167.19 3552.5 169.57 512 2 7042.62 246.59 7068.46 180.87 7258.52 186.3 512 4 6996.08 241.49 9298.34 206.12 9418.52 159.33 1024 1 4339.54 192.95 4370.2 191.92 4211.72 192.49 1024 2 7439.45 254.77 9403.99 215.24 9120.82 222.67 1024 4 7953.86 272.11 9403.87 208.23 9366.98 159.49 4096 1 7696.28 272.04 7611.41 270.38 7778.71 267.76 4096 2 7530.35 261.1 8905.43 246.27 8990.18 267.57 4096 4 7121.6 247.02 9411.75 206.71 9654.96 184.67 16384 1 7795.73 268.54 7780.94 267.2 7634.26 260.73 16384 2 7436.57 255.81 9381.86 220.85 9392220.36 16384 4 7199.07 247.81 9420.96 205.87 9373.69 159.57 TCP_MAERTS guest sending size #sessions throughput norm throughput norm throughput norm 1 1 15.94 0.62 15.55 0.61 15.13 0.59 1 2 36.11 0.83 32.46 0.69 32.28 0.69 1 4 71.59 1 68.91 0.94 61.52 0.77 64 1630.71 22.52 622.11 22.35 605.09 21.84 64 21442.36 30.57 1292.15 25.82 1282.67 25.55 64 43186.79 42.59 2844.96 36.03 2529.69 30.06 256 1 1760.96 58.07 1738.44 57.43 1695.99 56.19 256 2 4834.23 95.19 3524.85 64.21 3511.94 64.45 256 4 9324.63 145.74 8956.49 116.39 6720.17 73.86 512 1 2678.03 84.1 2630.68 82.93 2636.54 82.57 512 2 9368.17 195.61 9408.82 204.53 5316.3 92.99 512 4 9186.34 209.68 9358.72 183.82 9489.29 160.42 1024 1 3620.71 109.88 3625.54 109.83 3606.61 112.35 1024 2 9429258.32 7082.79 120.55 7403.53 134.78 1024 4 9430.66 290.44 9499.29 232.31 9414.6 190.92 4096 1 9339.28 296.48 9374.23 372.88 9348.76 298.49 4096 2 9410.53 378.69 9412.61 286.18 9409.75 278.31 4096 4 9487.35 374.1 
9556.91 288.81 9441.94 221.64 16384 1 9380.43 403.8 9379.78 399.13 9382.42 393.55 16384 2 9367.69 406.93 9415.04 312.68 9409.29 300.9 16384 4 9391.96 405.17 9695.12 310.54 9423.76 223.47

Trying to understand the performance results:

What is the host device configuration? tap + bridge?

Yes.

Did you use host CPU affinity for the vhost threads?

I use numactl to pin cpu threads and vhost threads in the same numa node.

Can multiqueue tap take advantage of multiqueue host NICs
Re: [PATCH 01/12] tap: multiqueue support
On Fri, Dec 28, 2012 at 06:31:53PM +0800, Jason Wang wrote:

Mainly suggestions to make the code easier to understand, but see the comment about the 1:1 queue/NetClientState model for a general issue with this approach.

Recently, linux support multiqueue tap which could let userspace call TUNSETIFF for a signle device many times to create multiple file descriptors as

s/signle/single/ (Noting these if you respin.)

independent queues. User could also enable/disabe a specific queue through

s/disabe/disable/

TUNSETQUEUE.

The patch adds the generic infrastructure to create multiqueue taps. To achieve this a new parameter "queues" were introduced to specify how many queues were expected to be created for tap. The "fd" parameter were also changed to support a list of file descriptors which could be used by management (such as libvirt) to pass pre-created file descriptors (queues) to qemu.

Each TAPState were still associated to a tap fd, which mean multiple TAPStates were created when user needs multiqueue taps.

Only linux part were implemented now, since it's the only OS that support multiqueue tap.

Signed-off-by: Jason Wang jasow...@redhat.com
---
 net/tap-aix.c     |  18 -
 net/tap-bsd.c     |  18 -
 net/tap-haiku.c   |  18 -
 net/tap-linux.c   |  70 +++-
 net/tap-linux.h   |   4 +
 net/tap-solaris.c |  18 -
 net/tap-win32.c   |  10 ++
 net/tap.c         | 248 +
 net/tap.h         |   8 ++-
 qapi-schema.json  |   5 +-
 10 files changed, 335 insertions(+), 82 deletions(-)

This patch should be split up:

1. linux-headers: import linux/if_tun.h multiqueue constants
2. tap: add Linux multiqueue support (tap_open(), tap_fd_attach(), tap_fd_detach())
3. tap: queue attach/detach (tap_attach(), tap_detach())
4. tap: split out net_init_one_tap() function (pure code motion, to make later diffs easy to review)
5. tap: add queues and multi-fd options (net_init_tap()/net_init_one_tap() changes)

Each commit description can explain how this works in more detail. I think I've figured it out now but it would have helped to separate things out from the start.

diff --git a/net/tap-aix.c b/net/tap-aix.c
index f27c177..f931ef3 100644
--- a/net/tap-aix.c
+++ b/net/tap-aix.c
@@ -25,7 +25,8 @@
 #include "net/tap.h"
 #include <stdio.h>
 
-int tap_open(char *ifname, int ifname_size, int *vnet_hdr, int vnet_hdr_required)
+int tap_open(char *ifname, int ifname_size, int *vnet_hdr,
+             int vnet_hdr_required, int mq_required)
 {
     fprintf(stderr, "no tap on AIX\n");
     return -1;
@@ -59,3 +60,18 @@ void tap_fd_set_offload(int fd, int csum, int tso4, int tso6, int ecn, int ufo)
 {
 }
+
+int tap_fd_attach(int fd)
+{
+    return -1;
+}
+
+int tap_fd_detach(int fd)
+{
+    return -1;
+}
+
+int tap_fd_ifname(int fd, char *ifname)
+{
+    return -1;
+}
diff --git a/net/tap-bsd.c b/net/tap-bsd.c
index a3b717d..07c287d 100644
--- a/net/tap-bsd.c
+++ b/net/tap-bsd.c
@@ -33,7 +33,8 @@
 #include <net/if_tap.h>
 #endif
 
-int tap_open(char *ifname, int ifname_size, int *vnet_hdr, int vnet_hdr_required)
+int tap_open(char *ifname, int ifname_size, int *vnet_hdr,
+             int vnet_hdr_required, int mq_required)
 {
     int fd;
 #ifdef TAPGIFNAME
@@ -145,3 +146,18 @@ void tap_fd_set_offload(int fd, int csum, int tso4, int tso6, int ecn, int ufo)
 {
 }
+
+int tap_fd_attach(int fd)
+{
+    return -1;
+}
+
+int tap_fd_detach(int fd)
+{
+    return -1;
+}
+
+int tap_fd_ifname(int fd, char *ifname)
+{
+    return -1;
+}
diff --git a/net/tap-haiku.c b/net/tap-haiku.c
index 34739d1..62ab423 100644
--- a/net/tap-haiku.c
+++ b/net/tap-haiku.c
@@ -25,7 +25,8 @@
 #include "net/tap.h"
 #include <stdio.h>
 
-int tap_open(char *ifname, int ifname_size, int *vnet_hdr, int vnet_hdr_required)
+int tap_open(char *ifname, int ifname_size, int *vnet_hdr,
+             int vnet_hdr_required, int mq_required)
 {
     fprintf(stderr, "no tap on Haiku\n");
     return -1;
@@ -59,3 +60,18 @@ void tap_fd_set_offload(int fd, int csum, int tso4, int tso6, int ecn, int ufo)
 {
 }
+
+int tap_fd_attach(int fd)
+{
+    return -1;
+}
+
+int tap_fd_detach(int fd)
+{
+    return -1;
+}
+
+int tap_fd_ifname(int fd, char *ifname)
+{
+    return -1;
+}
diff --git a/net/tap-linux.c b/net/tap-linux.c
index c6521be..0854ef5 100644
--- a/net/tap-linux.c
+++ b/net/tap-linux.c
@@ -35,7 +35,8 @@
 
 #define PATH_NET_TUN "/dev/net/tun"
 
-int tap_open(char *ifname, int ifname_size, int *vnet_hdr, int vnet_hdr_required)
+int tap_open(char *ifname, int ifname_size, int *vnet_hdr,
+             int vnet_hdr_required, int mq_required)
 {
     struct ifreq ifr;
     int fd, ret;
@@ -67,6 +68,20 @@ int tap_open(char *ifname, int ifname_size, int *vnet_hdr, int