Re: [Qemu-devel] [PATCH 2/2 V7] qemu,qmp: add inject-nmi qmp command
Avi Kivity a...@redhat.com writes:
On 04/08/2011 12:41 AM, Anthony Liguori wrote:

And it's a good thing to have, but exposing this as the only API to do something as simple as generating a guest crash dump is not the friendliest thing in the world to do to users.

nmi is a fine name for something that corresponds to a real-life nmi button (often labeled NMI).

Agree. generate-crash-dump is a wrong name for something that doesn't generate a crash dump (the guest may not be configured for it, or it may fail to work).

Or the OS uses the NMI button for something else.

I'd expect that to be host-side functionality.

-- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH v2 3/5] hpet 'driftfix': add fields to HPETTimer and VMStateDescription
typedef struct HPETState {

@@ -248,7 +253,7 @@ static int hpet_post_load(void *opaque, int version_id)
 static const VMStateDescription vmstate_hpet_timer = {
     .name = "hpet_timer",
-    .version_id = 1,
+    .version_id = 3,

Why jump from 1 to 3?

     .minimum_version_id = 1,
     .minimum_version_id_old = 1,
     .fields = (VMStateField []) {
@@ -258,6 +263,11 @@ static const VMStateDescription vmstate_hpet_timer = {
         VMSTATE_UINT64(fsb, HPETTimer),
         VMSTATE_UINT64(period, HPETTimer),
         VMSTATE_UINT8(wrap_flag, HPETTimer),
+        VMSTATE_UINT64_V(saved_period, HPETTimer, 3),
+        VMSTATE_UINT64_V(ticks_not_accounted, HPETTimer, 3),
+        VMSTATE_UINT32_V(irqs_to_inject, HPETTimer, 3),
+        VMSTATE_UINT32_V(irq_rate, HPETTimer, 3),
+        VMSTATE_UINT32_V(divisor, HPETTimer, 3),

Anthony,

I incremented the version ID of 'vmstate_hpet' from 2 to 3 to make sure that migrations from a QEMU process that is capable of 'driftfix' to a QEMU process that is _not_ capable of 'driftfix' will fail. I assigned version ID 3 to 'vmstate_hpet_timer' and to the new fields in there too, to indicate that adding those fields was the reason why the version ID of 'vmstate_hpet' was incremented to 3.

As far as the flow of execution in vmstate_load_state() is concerned, I think it does not matter whether the version ID of 'vmstate_hpet_timer' and of the new fields in there is 2 or 3 (as long as they are consistent). When the 'while (field->name)' loop in vmstate_load_state() gets to the following field in 'vmstate_hpet' ...

    VMSTATE_STRUCT_VARRAY_UINT8(timer, HPETState, num_timers, 0,
                                vmstate_hpet_timer, HPETTimer),

... it calls itself recursively ...

    if (field->flags & VMS_STRUCT) {
        ret = vmstate_load_state(f, field->vmsd, addr,
                                 field->vmsd->version_id);

'field->vmsd->version_id' is the version ID of 'vmstate_hpet_timer' [1]. Hence 'vmstate_hpet_timer.version_id' is being checked against itself ...

    if (version_id > vmsd->version_id) {
        return -EINVAL;
    }

...
... and the version IDs of the new fields are also being checked against 'vmstate_hpet_timer.version_id' ...

    if ((field->field_exists && field->field_exists(opaque, version_id)) ||
        (!field->field_exists && field->version_id <= version_id)) {

If you want me to change the version ID of 'vmstate_hpet_timer' and of the new fields in there from 3 to 2, I can do that.

Regards,

Uli

[1] Ref.: commit fa3aad24d94a6cf894db52d83f72a399324a17bb
Re: EuroSec'11 Presentation
On Sun, Apr 10, 2011 at 4:19 PM, Kuniyasu Suzaki k.suz...@aist.go.jp wrote:
From: Avi Kivity a...@redhat.com
Subject: Re: EuroSec'11 Presentation
Date: Sun, 10 Apr 2011 17:49:52 +0300
On 04/10/2011 05:23 PM, Kuniyasu Suzaki wrote:

Dear, I made a presentation about a memory disclosure attack on KSM (Kernel Samepage Merging) with KVM at EuroSec 2011. The title is "Memory Deduplication as a Threat to the Guest OS". http://www.iseclab.org/eurosec-2011/program.html The slide is downloadable. http://www.slideshare.net/suzaki/eurosec2011-slide-memory-deduplication The paper will be downloadable from the ACM Digital Library. Please tell me if you have comments. Thank you.

Very interesting presentation. It seems every time you share something, it becomes a target for attacks.

I'm happy to hear your comments. The referee's comments were severe. They said there was no brand-new point, but there were real attack experiences. My paper just evaluated the detection of apache2 and sshd on a Linux guest OS, and Firefox and IE6 on a Windows guest OS.

If I have a VM on the same physical host as someone else, I may be able to determine which programs and specific versions they are currently running. Is there some creative attack using this technique that I'm missing? I don't see many serious threats.

Stefan
Re: [Qemu-devel] [PATCH v2 3/5] hpet 'driftfix': add fields to HPETTimer and VMStateDescription
vmstate_hpet_timer = {
    VMSTATE_UINT64(fsb, HPETTimer),
    VMSTATE_UINT64(period, HPETTimer),
    VMSTATE_UINT8(wrap_flag, HPETTimer),
+   VMSTATE_UINT64_V(saved_period, HPETTimer, 3),
+   VMSTATE_UINT64_V(ticks_not_accounted, HPETTimer, 3),
+   VMSTATE_UINT32_V(irqs_to_inject, HPETTimer, 3),
+   VMSTATE_UINT32_V(irq_rate, HPETTimer, 3),
+   VMSTATE_UINT32_V(divisor, HPETTimer, 3),

We ought to be able to use a subsection keyed off of whether any ticks are currently accumulated, no?

Anthony,

I'm not sure if I understand your question correctly. Are you suggesting to migrate the driftfix-related state conditionally, i.e. only if there are any ticks accumulated in 'ticks_not_accounted' and 'irqs_to_inject'?

The size of the driftfix-related state is 28 bytes per timer, and we have 32 timers per HPETState, i.e. 896 additional bytes per HPETState. With a maximum number of 8 HPET blocks (HPETState), this amounts to 7168 bytes. Hence, unconditional migration of the driftfix-related state should not cause significant additional overhead.

Maybe I missed something. Could you please explain which benefit you see in using a subsection?

Regards,

Uli
Re: [Qemu-devel] [PATCH v2 3/5] hpet 'driftfix': add fields to HPETTimer and VMStateDescription
On 04/11/2011 12:06 PM, Ulrich Obergfell wrote:

vmstate_hpet_timer = {
    VMSTATE_UINT64(fsb, HPETTimer),
    VMSTATE_UINT64(period, HPETTimer),
    VMSTATE_UINT8(wrap_flag, HPETTimer),
+   VMSTATE_UINT64_V(saved_period, HPETTimer, 3),
+   VMSTATE_UINT64_V(ticks_not_accounted, HPETTimer, 3),
+   VMSTATE_UINT32_V(irqs_to_inject, HPETTimer, 3),
+   VMSTATE_UINT32_V(irq_rate, HPETTimer, 3),
+   VMSTATE_UINT32_V(divisor, HPETTimer, 3),

We ought to be able to use a subsection keyed off of whether any ticks are currently accumulated, no?

Anthony, I'm not sure if I understand your question correctly. Are you suggesting to migrate the driftfix-related state conditionally, i.e. only if there are any ticks accumulated in 'ticks_not_accounted' and 'irqs_to_inject'? The size of the driftfix-related state is 28 bytes per timer, and we have 32 timers per HPETState, i.e. 896 additional bytes per HPETState. With a maximum number of 8 HPET blocks (HPETState), this amounts to 7168 bytes. Hence, unconditional migration of the driftfix-related state should not cause significant additional overhead.

It's not about overhead.

Maybe I missed something. Could you please explain which benefit you see in using a subsection?

In the common case of there being no drift, you can migrate from a qemu that supports driftfix to a qemu that doesn't.

-- error compiling committee.c: too many arguments to function
Re: USB EHCI patch for 0.14.0?
David: I have applied the patch to 0.14.0, and there is a bug if I add an optiarc CRRWDVD CRX890A usb device on windows xp. I first comment out the following code in usb-linux.c:

    if (is_halted(s, p->devep)) {
        ret = ioctl(s->fd, USBDEVFS_CLEAR_HALT, &urb->endpoint);
#if 0
        if (ret < 0) {
            DPRINTF("husb: failed to clear halt. ep 0x%x errno %d\n",
                    urb->endpoint, errno);
            return USB_RET_NAK;
        }
#endif
        clear_halt(s, p->devep);
    }

then it can continue to run in linux, but it still stalls on windows xp and win7. I turned on debug; part of the output is as follows:

husb: async cancel. aurb 0x1616cd0 husb: async completed. aurb 0x1616cd0 status -2 alen 0 husb: reset device 6.8 husb: claiming interfaces. config 1 husb: i is 18, descr_len is 50, dl 9, dt 2 husb: config #1 need 1 husb: 1 interfaces claimed for configuration 1 husb: ctrl type 0x80 req 0x6 val 0x100 index 0 len 64 husb: submit ctrl. len 72 aurb 0x1616cd0 husb: async completed. aurb 0x1616cd0 status 0 alen 18 invoking packet_complete. plen = 8 husb: reset device 6.8 husb: claiming interfaces. config 1 husb: i is 18, descr_len is 50, dl 9, dt 2 husb: config #1 need 1 husb: 1 interfaces claimed for configuration 1 husb: ctrl type 0x0 req 0x5 val 0x2 index 0 len 0 husb: ctrl set addr 2 husb: ctrl type 0x80 req 0x6 val 0x100 index 0 len 18 husb: submit ctrl. len 26 aurb 0x1616cd0 husb: async completed. aurb 0x1616cd0 status 0 alen 18 invoking packet_complete. plen = 8 husb: ctrl type 0x0 req 0x9 val 0x1 index 0 len 0 husb: releasing interfaces husb: ctrl set config 1 ret 0 errno 11 husb: claiming interfaces. config 1 husb: i is 18, descr_len is 50, dl 9, dt 2 husb: config #1 need 1 husb: 1 interfaces claimed for configuration 1 husb: data submit. ep 0x2 len 31 aurb 0x1616cd0 husb: async completed. aurb 0x1616cd0 status 0 alen 31 invoking packet_complete. plen = 31 husb: data submit. ep 0x81 len 64 aurb 0x1616cd0 husb: async completed. aurb 0x1616cd0 status 0 alen 4 invoking packet_complete. plen = 4 husb: data submit.
ep 0x81 len 13 aurb 0x1616cd0 husb: async completed. aurb 0x1616cd0 status -32 alen 0 invoking packet_complete. plen = -3 husb: reset device 6.8 husb: claiming interfaces. config 1 husb: i is 18, descr_len is 50, dl 9, dt 2 husb: config #1 need 1 husb: 1 interfaces claimed for configuration 1 husb: ctrl type 0x80 req 0x6 val 0x100 index 0 len 64 husb: submit ctrl. len 72 aurb 0x1616cd0 husb: async completed. aurb 0x1616cd0 status 0 alen 18 invoking packet_complete. plen = 8 husb: reset device 6.8 husb: claiming interfaces. config 1 husb: i is 18, descr_len is 50, dl 9, dt 2 husb: config #1 need 1 husb: 1 interfaces claimed for configuration 1 husb: ctrl type 0x0 req 0x5 val 0x1 index 0 len 0 husb: ctrl set addr 1 husb: ctrl type 0x80 req 0x6 val 0x100 index 0 len 18 husb: submit ctrl. len 26 aurb 0x1616cd0 husb: async completed. aurb 0x1616cd0 status 0 alen 18 invoking packet_complete. plen = 8 husb: ctrl type 0x0 req 0x9 val 0x1 index 0 len 0 husb: releasing interfaces husb: ctrl set config 1 ret 0 errno 11 husb: claiming interfaces. config 1 husb: i is 18, descr_len is 50, dl 9, dt 2 husb: config #1 need 1 husb: 1 interfaces claimed for configuration 1 husb: data submit. ep 0x2 len 31 aurb 0x1616cd0 husb: async completed. aurb 0x1616cd0 status 0 alen 31 invoking packet_complete. plen = 31 husb: data submit. ep 0x81 len 64 aurb 0x1616cd0 [Thread 0x74f75710 (LWP 3317) exited] husb: async completed. aurb 0x1616cd0 status 0 alen 4 invoking packet_complete. plen = 4 husb: data submit. ep 0x81 len 13 aurb 0x1616cd0 husb: async cancel. aurb 0x1616cd0 husb: async completed. aurb 0x1616cd0 status -2 alen 0 husb: reset device 6.8 husb: claiming interfaces. config 1 husb: i is 18, descr_len is 50, dl 9, dt 2 husb: config #1 need 1 husb: 1 interfaces claimed for configuration 1 husb: ctrl type 0x80 req 0x6 val 0x100 index 0 len 64 husb: submit ctrl. len 72 aurb 0x1616cd0 husb: async completed. aurb 0x1616cd0 status 0 alen 18 invoking packet_complete. 
plen = 8 husb: reset device 6.8 husb: claiming interfaces. config 1 husb: i is 18, descr_len is 50, dl 9, dt 2 husb: config #1 need 1 husb: 1 interfaces claimed for configuration 1 husb: ctrl type 0x0 req 0x5 val 0x2 index 0 len 0 husb: ctrl set addr 2 husb: ctrl type 0x80 req 0x6 val 0x100 index 0 len 18 husb: submit ctrl. len 26 aurb 0x1616cd0 husb: async completed. aurb 0x1616cd0 status 0 alen 18 invoking packet_complete. plen = 8 husb: ctrl type 0x0 req 0x9 val 0x1 index 0 len 0 husb: releasing interfaces husb: ctrl set config 1 ret 0 errno 11 husb: claiming interfaces. config 1 husb: i is 18, descr_len is 50, dl 9, dt 2 husb: config #1 need 1 husb: 1 interfaces claimed for configuration 1 husb: data submit. ep 0x2 len 31 aurb 0x1616cd0 husb: async completed. aurb 0x1616cd0 status 0 alen 31 invoking packet_complete. plen = 31 husb: data submit. ep 0x81 len 40 aurb
RE: Administration panel for KVM
Hi Daniel,

Proxmox VE can be installed on existing Lenny installations (see http://pve.proxmox.com/wiki/Install_Proxmox_VE_on_Debian_Lenny), and the upcoming 2.x series on Squeeze. But we still provide a bare-metal installer, as this is the most user-friendly way to install (the auto partitioning makes sure that there is enough free space for the LVM snapshots used for backups; see vzdump for OpenVZ and KVM). This means you just have an additional repo in your sources.list and you still get Debian security updates (except some packages which are provided by our repo, like KVM or the kernel).

We do not use libvirt; we have a web GUI and also tools for the command line, e.g. qm for managing KVM guests. http://pve.proxmox.com/wiki/Qm_manual

Here is the link to the roadmap for 2.0 - a major change and a big step forward: http://pve.proxmox.com/wiki/Roadmap#Roadmap_for_2.x

Best Regards,

Martin

-----Original Message-----
From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On Behalf Of Daniel Bareiro
Sent: Sonntag, 10. April 2011 17:00
To: KVM General
Subject: Re: Administration panel for KVM

On Sunday, 10 April 2011 14:00:41 +0200, Matthias Hovestadt wrote:

Hi!

Hi, Matthias!

With a group of college buddies, we are evaluating the possibility of initiating a project to develop a management panel for KVM virtual machines. The idea is to do something similar to OpenXenManager, but for KVM.

At our university we developed a Perl-based management tool named kvm-top. This tool is command-line only, not offering any GUI at the moment. The initial idea of that tool was to make the start-up of VMs easier than doing it manually. The tool analyzes a VM-specific config file like

    GUEST_ID=219
    GUEST_NAME=attic
    .
    .

defining all parameters for starting up a VM.
For actually starting this VM, a single command now is sufficient:

    asok01 ~ # kvm-top -start attic

This will not only start up the VM attic, but also check if this VM is running on some other cluster node and connect to the iSCSI target if required.

Meanwhile, the tool has evolved, consisting not only of the kvm-top tool, but also of a server component named kvm-ctld running on each cluster node. The kvm-top tool connects to the kvm-ctld running on the local host, executing the desired command. At this, the command does not necessarily have to be executed on the same cluster node. For instance, it is easily possible to start/stop a VM running on a different cluster node.

However, the main feature of kvm-top is giving information about the current status of the running VMs:

    asok01 ~ # kvm-top
    VM           NODE    AS  5s  30s  USER  PID    #CPU  MEM   VNC    SPICE  #LAN
    attic        asok02      4   4    root  66141        2048  36003  -      2
    cbase        asok08      1   1    root  102221       1048  36142  -      1
    cbase-spice  asok08      0   0    root  42691        1024  36143  59241
    cloud-pj     asok02      14  18   root  240711       1024  36001  -      2
    .
    .

where 5s and 30s contain the average system load over the last 5s resp. 30s. There are several ways of filtering or sorting the output, e.g. sorting by cluster nodes:

    asok01 ~ # kvm-top -s node
    NODE      VM   AS  5s  30s  USER  PID    #CPU  MEM   VNC    SPICE  #LAN
    asok01(ENABLED): 0(0) VMs, CPU=0%, MEM=2%, AGE 00:00
    asok02(ENABLED): 7(8) VMs, CPU=13%, MEM=99%, AGE 00:05
    attic         4   4        root  66141        2048  36003  -      2
    cloud-pj      21  19       root  240711       1024  36001  -      2
    .
    .

The kvm-top tool even allows migration of VMs between the cluster nodes. The following command would migrate the VM attic from the currently used cluster node asok02 to cluster node asok07 (note: the command has been executed on a different cluster node asok01):

    asok01 ~ # kvm-top -migrate attic asok07

As I mentioned, the tool is command line only at the moment; however, it shouldn't be too difficult to create a web-based interface, since kvm-ctld allows communication not only with kvm-top.
Connecting to the port of kvm-ctld, it's pretty easy to get information about all currently running VMs or start/stop/migrate VMs. If there's interest in that tool, please let me know. I'll gladly publish it. Sounds interesting. If you publish it, I'd take a look. Researching on the Internet I found virt-manager [1], although I'm not sure if it can interact with KVM. In any case, virt-manager uses libvirt and my idea was not to use libvirt in the VMHost. I guess kvm-ctld will supply some of the functions of libvirt at the remote end. Thanks for your reply. Regards, Daniel [1] http://virt-manager.et.redhat.com/ -- Fingerprint: BFB3 08D6 B4D1 31B2 72B9 29CE 6696 BF1B 14E6 1D37 Powered
Re: Slow PXE boot in qemu.git (fast in qemu-kvm.git)
On Sat, 9 Apr 2011 13:34:43 +0300 Blue Swirl blauwir...@gmail.com wrote:
On Sat, Apr 9, 2011 at 2:25 AM, Luiz Capitulino lcapitul...@redhat.com wrote:

Hi there,

Summary:

- PXE boot in qemu.git (HEAD f124a41) is quite slow, more than 5 minutes. Got the problem with e1000, virtio and rtl8139. However, pcnet *works* (it's as fast as qemu-kvm.git)
- PXE boot in qemu-kvm.git (HEAD df85c051) is fast, less than a minute. Tried with e1000, virtio and rtl8139 (I don't remember if I tried with pcnet)

I tried with qemu.git v0.13.0 in order to check if this was a regression, but I got the same problem... Then I inspected qemu-kvm.git under the assumption that it could have a fix that wasn't committed to qemu.git. Found this:

- commit 0836b77f0f65d56d08bdeffbac25cd6d78267dc9, which is a merge, works
- commit cc015e9a5dde2f03f123357fa060acbdfcd570a4 does not work (it's slow)

I tried a bisect, but it breaks due to gcc4 vs. gcc3 changes. Then I inspected commits manually, and found out that commit 64d7e9a4 doesn't work, which makes me think that the fix could be in the conflict resolution of 0836b77f, which makes me remember that I'm late for dinner, so my conclusions at this point are not reliable :)

Ideas?

What is the test case?

It's an external PXE server; the command line is:

    qemu -boot n -enable-kvm -net nic,model=virtio -net tap,ifname=vnet0,script=

I tried PXE booting a 10M file with and without KVM, and the results are pretty much the same with pcnet and e1000.

    time qemu -monitor stdio -boot n -net nic,model=e1000 -net user,tftp=.,bootfile=10M -net dump,file=foo -enable-kvm
    time qemu -monitor stdio -boot n -net nic,model=pcnet -net user,tftp=.,bootfile=10M -net dump,file=foo -enable-kvm
    time qemu -monitor stdio -boot n -net nic,model=e1000 -net user,tftp=.,bootfile=10M -net dump,file=foo
    time qemu -monitor stdio -boot n -net nic,model=pcnet -net user,tftp=.,bootfile=10M -net dump,file=foo

All times are ~10s.

Yeah, you're using the internal tftp server.
Re: [Qemu-devel] [PATCH v2 3/5] hpet 'driftfix': add fields to HPETTimer and VMStateDescription
On 04/11/2011 04:08 AM, Avi Kivity wrote: On 04/11/2011 12:06 PM, Ulrich Obergfell wrote: vmstate_hpet_timer = { VMSTATE_UINT64(fsb, HPETTimer), VMSTATE_UINT64(period, HPETTimer), VMSTATE_UINT8(wrap_flag, HPETTimer), + VMSTATE_UINT64_V(saved_period, HPETTimer, 3), + VMSTATE_UINT64_V(ticks_not_accounted, HPETTimer, 3), + VMSTATE_UINT32_V(irqs_to_inject, HPETTimer, 3), + VMSTATE_UINT32_V(irq_rate, HPETTimer, 3), + VMSTATE_UINT32_V(divisor, HPETTimer, 3), We ought to be able to use a subsection keyed off of whether any ticks are currently accumulated, no? Anthony, I'm not sure if I understand your question correctly. Are you suggesting to migrate the driftfix-related state conditionally / only if there are any ticks accumulated in 'ticks_not_accounted' and 'irqs_to_inject' ? The size of the driftfix-related state is 28 bytes per timer and we have 32 timers per HPETState, i.e. 896 additional bytes per HPETState. With a maximum number of 8 HPET blocks (HPETState), this amounts to 7168 bytes. Hence, unconditional migration of the driftfix-related state should not cause significant additional overhead. It's not about overhead. Maybe I missed something. Could you please explain which benefit you see in using a subsection ? In the common case of there being no drift, you can migrate from a qemu that supports driftfix to a qemu that doesn't. Right, subsections are a trick. The idea is that when you introduce new state for a device model that is not always going to be set, when you do the migration, you detect whether the state is set or not and if it's not set, instead of sending empty versions of that state (i.e. missed_ticks=0) you just don't send the new state at all. This means that you can migrate to an older version of QEMU provided the migration would work correctly. 
Regards,

Anthony Liguori
Re: [Qemu-devel] [PATCH v2 3/5] hpet 'driftfix': add fields to HPETTimer and VMStateDescription
On 04/11/2011 03:24 AM, Ulrich Obergfell wrote:

typedef struct HPETState {

@@ -248,7 +253,7 @@ static int hpet_post_load(void *opaque, int version_id)
 static const VMStateDescription vmstate_hpet_timer = {
     .name = "hpet_timer",
-    .version_id = 1,
+    .version_id = 3,

Why jump from 1 to 3?

     .minimum_version_id = 1,
     .minimum_version_id_old = 1,
     .fields = (VMStateField []) {
@@ -258,6 +263,11 @@ static const VMStateDescription vmstate_hpet_timer = {
         VMSTATE_UINT64(fsb, HPETTimer),
         VMSTATE_UINT64(period, HPETTimer),
         VMSTATE_UINT8(wrap_flag, HPETTimer),
+        VMSTATE_UINT64_V(saved_period, HPETTimer, 3),
+        VMSTATE_UINT64_V(ticks_not_accounted, HPETTimer, 3),
+        VMSTATE_UINT32_V(irqs_to_inject, HPETTimer, 3),
+        VMSTATE_UINT32_V(irq_rate, HPETTimer, 3),
+        VMSTATE_UINT32_V(divisor, HPETTimer, 3),

Anthony,

I incremented the version ID of 'vmstate_hpet' from 2 to 3 to make sure that migrations from a QEMU process that is capable of 'driftfix' to a QEMU process that is _not_ capable of 'driftfix' will fail. I assigned version ID 3 to 'vmstate_hpet_timer' and to the new fields in there too, to indicate that adding those fields was the reason why the version ID of 'vmstate_hpet' was incremented to 3.

As far as the flow of execution in vmstate_load_state() is concerned, I think it does not matter whether the version ID of 'vmstate_hpet_timer' and of the new fields in there is 2 or 3 (as long as they are consistent). When the 'while (field->name)' loop in vmstate_load_state() gets to the following field in 'vmstate_hpet' ...

    VMSTATE_STRUCT_VARRAY_UINT8(timer, HPETState, num_timers, 0,
                                vmstate_hpet_timer, HPETTimer),

... it calls itself recursively ...

    if (field->flags & VMS_STRUCT) {
        ret = vmstate_load_state(f, field->vmsd, addr,
                                 field->vmsd->version_id);

'field->vmsd->version_id' is the version ID of 'vmstate_hpet_timer' [1]. Hence 'vmstate_hpet_timer.version_id' is being checked against itself ...

    if (version_id > vmsd->version_id) {
        return -EINVAL;
    }

...
... and the version IDs of the new fields are also being checked against 'vmstate_hpet_timer.version_id' ...

    if ((field->field_exists && field->field_exists(opaque, version_id)) ||
        (!field->field_exists && field->version_id <= version_id)) {

If you want me to change the version ID of 'vmstate_hpet_timer' and of the new fields in there from 3 to 2, I can do that.

It avoids surprises, so I think it's a reasonable thing to do. But yes, your analysis is correct.

Regards,

Anthony Liguori

Regards,

Uli

[1] Ref.: commit fa3aad24d94a6cf894db52d83f72a399324a17bb
Re: USB EHCI patch for 0.14.0?
On 04/11/11 03:40, ya su wrote:

David: I have applied the patch to 0.14.0, and there is a bug if I add an optiarc CRRWDVD CRX890A usb device on windows xp. I first comment out the following code in usb-linux.c:

    if (is_halted(s, p->devep)) {
        ret = ioctl(s->fd, USBDEVFS_CLEAR_HALT, &urb->endpoint);
#if 0
        if (ret < 0) {
            DPRINTF("husb: failed to clear halt. ep 0x%x errno %d\n",
                    urb->endpoint, errno);
            return USB_RET_NAK;
        }
#endif
        clear_halt(s, p->devep);
    }

then it can continue to run in linux, but it still stalls on windows xp and win7. I turned on debug; part of the output is as follows:

The EHCI code is very rough and needs someone to step up and finish it. It seems to work ok for USB storage devices (keys and drives), and seems to work fine with printers and scanners (at least it works with mine ;-)). I see stalls from time to time, but it recovers and continues on. Clearly some touchups are needed. On the other end it is known not to work with any audio and video devices (webcams, iphones). Something like the DVD I have no idea - never tried.

I lost momentum on the code last August and have not been able to get back to it for a variety of reasons. It really needs someone to pick it up and continue - or look at adding xhci code, which might be a better solution for virtualization.

David
Re: [Qemu-devel] [PATCH v2 3/5] hpet 'driftfix': add fields to HPETTimer and VMStateDescription
On Mon, 2011-04-11 at 08:10 -0500, Anthony Liguori wrote: On 04/11/2011 04:08 AM, Avi Kivity wrote: On 04/11/2011 12:06 PM, Ulrich Obergfell wrote: vmstate_hpet_timer = { VMSTATE_UINT64(fsb, HPETTimer), VMSTATE_UINT64(period, HPETTimer), VMSTATE_UINT8(wrap_flag, HPETTimer), + VMSTATE_UINT64_V(saved_period, HPETTimer, 3), + VMSTATE_UINT64_V(ticks_not_accounted, HPETTimer, 3), + VMSTATE_UINT32_V(irqs_to_inject, HPETTimer, 3), + VMSTATE_UINT32_V(irq_rate, HPETTimer, 3), + VMSTATE_UINT32_V(divisor, HPETTimer, 3), We ought to be able to use a subsection keyed off of whether any ticks are currently accumulated, no? Anthony, I'm not sure if I understand your question correctly. Are you suggesting to migrate the driftfix-related state conditionally / only if there are any ticks accumulated in 'ticks_not_accounted' and 'irqs_to_inject' ? The size of the driftfix-related state is 28 bytes per timer and we have 32 timers per HPETState, i.e. 896 additional bytes per HPETState. With a maximum number of 8 HPET blocks (HPETState), this amounts to 7168 bytes. Hence, unconditional migration of the driftfix-related state should not cause significant additional overhead. It's not about overhead. Maybe I missed something. Could you please explain which benefit you see in using a subsection ? In the common case of there being no drift, you can migrate from a qemu that supports driftfix to a qemu that doesn't. Right, subsections are a trick. The idea is that when you introduce new state for a device model that is not always going to be set, when you do the migration, you detect whether the state is set or not and if it's not set, instead of sending empty versions of that state (i.e. missed_ticks=0) you just don't send the new state at all. This means that you can migrate to an older version of QEMU provided the migration would work correctly. Using subsections and testing for hpet option being disabled vs enabled, is fine. 
But checking for the existence of drift, like you suggested (or at least how I understood you), is very tricky. It is expected to change many times during guest lifetime, and would make our migration predictability something Heisenberg would be proud of.
Re: [Qemu-devel] [PATCH v2 3/5] hpet 'driftfix': add fields to HPETTimer and VMStateDescription
On 04/11/2011 04:39 PM, Glauber Costa wrote: On Mon, 2011-04-11 at 08:10 -0500, Anthony Liguori wrote: On 04/11/2011 04:08 AM, Avi Kivity wrote: On 04/11/2011 12:06 PM, Ulrich Obergfell wrote: vmstate_hpet_timer = { VMSTATE_UINT64(fsb, HPETTimer), VMSTATE_UINT64(period, HPETTimer), VMSTATE_UINT8(wrap_flag, HPETTimer), + VMSTATE_UINT64_V(saved_period, HPETTimer, 3), + VMSTATE_UINT64_V(ticks_not_accounted, HPETTimer, 3), + VMSTATE_UINT32_V(irqs_to_inject, HPETTimer, 3), + VMSTATE_UINT32_V(irq_rate, HPETTimer, 3), + VMSTATE_UINT32_V(divisor, HPETTimer, 3), We ought to be able to use a subsection keyed off of whether any ticks are currently accumulated, no? Anthony, I'm not sure if I understand your question correctly. Are you suggesting to migrate the driftfix-related state conditionally / only if there are any ticks accumulated in 'ticks_not_accounted' and 'irqs_to_inject' ? The size of the driftfix-related state is 28 bytes per timer and we have 32 timers per HPETState, i.e. 896 additional bytes per HPETState. With a maximum number of 8 HPET blocks (HPETState), this amounts to 7168 bytes. Hence, unconditional migration of the driftfix-related state should not cause significant additional overhead. It's not about overhead. Maybe I missed something. Could you please explain which benefit you see in using a subsection ? In the common case of there being no drift, you can migrate from a qemu that supports driftfix to a qemu that doesn't. Right, subsections are a trick. The idea is that when you introduce new state for a device model that is not always going to be set, when you do the migration, you detect whether the state is set or not and if it's not set, instead of sending empty versions of that state (i.e. missed_ticks=0) you just don't send the new state at all. This means that you can migrate to an older version of QEMU provided the migration would work correctly. Using subsections and testing for hpet option being disabled vs enabled, is fine. 
But checking for the existence of drift, like you suggested (or at least how I understood you), is very tricky. It is expected to change many times during guest lifetime, and would make our migration predictability something Heisenberg would be proud of.

First, I'd expect no drift under normal circumstances, at least without overcommit. We may also allow a small amount of drift to pass migration (we lost time during the last phase anyway). Second, the problem only occurs on new-old migrations.

-- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] [PATCH v2 3/5] hpet 'driftfix': add fields to HPETTimer and VMStateDescription
On 04/11/2011 08:39 AM, Glauber Costa wrote: On Mon, 2011-04-11 at 08:10 -0500, Anthony Liguori wrote: On 04/11/2011 04:08 AM, Avi Kivity wrote: On 04/11/2011 12:06 PM, Ulrich Obergfell wrote: vmstate_hpet_timer = { VMSTATE_UINT64(fsb, HPETTimer), VMSTATE_UINT64(period, HPETTimer), VMSTATE_UINT8(wrap_flag, HPETTimer), + VMSTATE_UINT64_V(saved_period, HPETTimer, 3), + VMSTATE_UINT64_V(ticks_not_accounted, HPETTimer, 3), + VMSTATE_UINT32_V(irqs_to_inject, HPETTimer, 3), + VMSTATE_UINT32_V(irq_rate, HPETTimer, 3), + VMSTATE_UINT32_V(divisor, HPETTimer, 3), We ought to be able to use a subsection keyed off of whether any ticks are currently accumulated, no? Anthony, I'm not sure if I understand your question correctly. Are you suggesting to migrate the driftfix-related state conditionally / only if there are any ticks accumulated in 'ticks_not_accounted' and 'irqs_to_inject' ? The size of the driftfix-related state is 28 bytes per timer and we have 32 timers per HPETState, i.e. 896 additional bytes per HPETState. With a maximum number of 8 HPET blocks (HPETState), this amounts to 7168 bytes. Hence, unconditional migration of the driftfix-related state should not cause significant additional overhead. It's not about overhead. Maybe I missed something. Could you please explain which benefit you see in using a subsection ? In the common case of there being no drift, you can migrate from a qemu that supports driftfix to a qemu that doesn't. Right, subsections are a trick. The idea is that when you introduce new state for a device model that is not always going to be set, when you do the migration, you detect whether the state is set or not and if it's not set, instead of sending empty versions of that state (i.e. missed_ticks=0) you just don't send the new state at all. This means that you can migrate to an older version of QEMU provided the migration would work correctly. Using subsections and testing for hpet option being disabled vs enabled, is fine. 
But checking for the existence of drift, like you suggested (or at least how I understood you), is very tricky. It is expected to change many times during guest lifetime, and would make our migration predictability something Heisenberg would be proud of. Is this true? I would expect it to be very tied to workloads. For idle workloads, you should never have accumulated missed ticks whereas with heavy workloads, you always will have accumulated ticks. Is that not correct? Regards, Anthony Liguori -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH v2 3/5] hpet 'driftfix': add fields to HPETTimer and VMStateDescription
On Mon, 2011-04-11 at 08:47 -0500, Anthony Liguori wrote: On 04/11/2011 08:39 AM, Glauber Costa wrote: On Mon, 2011-04-11 at 08:10 -0500, Anthony Liguori wrote: On 04/11/2011 04:08 AM, Avi Kivity wrote: On 04/11/2011 12:06 PM, Ulrich Obergfell wrote: vmstate_hpet_timer = { VMSTATE_UINT64(fsb, HPETTimer), VMSTATE_UINT64(period, HPETTimer), VMSTATE_UINT8(wrap_flag, HPETTimer), + VMSTATE_UINT64_V(saved_period, HPETTimer, 3), + VMSTATE_UINT64_V(ticks_not_accounted, HPETTimer, 3), + VMSTATE_UINT32_V(irqs_to_inject, HPETTimer, 3), + VMSTATE_UINT32_V(irq_rate, HPETTimer, 3), + VMSTATE_UINT32_V(divisor, HPETTimer, 3), We ought to be able to use a subsection keyed off of whether any ticks are currently accumulated, no? Anthony, I'm not sure if I understand your question correctly. Are you suggesting to migrate the driftfix-related state conditionally / only if there are any ticks accumulated in 'ticks_not_accounted' and 'irqs_to_inject' ? The size of the driftfix-related state is 28 bytes per timer and we have 32 timers per HPETState, i.e. 896 additional bytes per HPETState. With a maximum number of 8 HPET blocks (HPETState), this amounts to 7168 bytes. Hence, unconditional migration of the driftfix-related state should not cause significant additional overhead. It's not about overhead. Maybe I missed something. Could you please explain which benefit you see in using a subsection ? In the common case of there being no drift, you can migrate from a qemu that supports driftfix to a qemu that doesn't. Right, subsections are a trick. The idea is that when you introduce new state for a device model that is not always going to be set, when you do the migration, you detect whether the state is set or not and if it's not set, instead of sending empty versions of that state (i.e. missed_ticks=0) you just don't send the new state at all. This means that you can migrate to an older version of QEMU provided the migration would work correctly. 
Using subsections and testing for the hpet option being disabled vs enabled is fine. But checking for the existence of drift, like you suggested (or at least how I understood you), is very tricky. It is expected to change many times during guest lifetime, and would make our migration predictability something Heisenberg would be proud of. Is this true? I would expect it to be very tied to workloads. For idle workloads, you should never have accumulated missed ticks whereas with heavy workloads, you always will have accumulated ticks. Is that not correct? Yes, it is, but we lose a lot of reliability by tying migration to the workload. Given that we still have to start qemu the same way on both sides, we end up with a situation in which at time t, migration is possible, and at time t+1 migration is not. I'd rather have subsections enabled at all times when the option to allow driftfix is enabled.
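The two proposals in this thread differ only in the predicate that decides whether the subsection is sent. Here is a small, self-contained model of that difference; the struct and function names (HPETTimerModel, driftfix_subsection_needed) are illustrative, not QEMU's actual VMState API, which keys subsections off a .needed callback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of the two proposed subsection predicates. In QEMU, a
 * subsection's .needed callback decides whether that state is put
 * on the wire at all; names here are illustrative, not QEMU's. */
typedef struct HPETTimerModel {
    bool driftfix_enabled;        /* from the hpet driftfix option */
    uint64_t ticks_not_accounted; /* drift accumulated at this instant */
    uint32_t irqs_to_inject;
} HPETTimerModel;

/* Glauber's suggestion: depends only on how QEMU was started, so
 * migratability is the same at time t and at time t+1. */
static bool driftfix_subsection_needed(const HPETTimerModel *t)
{
    return t->driftfix_enabled;
}

/* The workload-keyed reading: depends on transient drift state, so
 * whether the guest can migrate to an older QEMU changes over time. */
static bool drift_present_subsection_needed(const HPETTimerModel *t)
{
    return t->ticks_not_accounted != 0 || t->irqs_to_inject != 0;
}
```

With the first predicate an idle guest and a loaded guest behave identically; with the second, the same guest flips between migratable and non-migratable as drift accumulates and is repaid, which is the unpredictability Glauber objects to.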
Re: EuroSec'11 Presentation
On 04/11/2011 03:51 AM, Stefan Hajnoczi wrote: I'm happy to hear your comments. The referee's comment was severe. It said there was no brand-new point, but there are real attack experiences. My paper just evaluated the detection of apache2 and sshd on a Linux guest OS and Firefox and IE6 on a Windows guest OS. If I have a VM on the same physical host as someone else I may be able to determine which programs and specific versions they are currently running. Is there some creative attack using this technique that I'm missing? I don't see many serious threats. It's a derivation of a previously demonstrated attack where memory access timing is used to guess memory content. This has been demonstrated in the past to be a viable technique to reduce the keyspace of things like ssh keys, which makes an attack a bit easier. But it's a well-known issue with colocation and the attack can be executed just by looking at raw memory access time (to guess whether another process brought something into the cache). Regards, Anthony Liguori Stefan
Re: EuroSec'11 Presentation
Stefan, From: Stefan Hajnoczi stefa...@gmail.com Subject: Re: EuroSec'11 Presentation Date: Mon, 11 Apr 2011 09:51:42 +0100 On Sun, Apr 10, 2011 at 4:19 PM, Kuniyasu Suzaki k.suz...@aist.go.jp wrote: From: Avi Kivity a...@redhat.com Subject: Re: EuroSec'11 Presentation Date: Sun, 10 Apr 2011 17:49:52 +0300 On 04/10/2011 05:23 PM, Kuniyasu Suzaki wrote: Dear, I made a presentation about a memory disclosure attack on KSM (Kernel Samepage Merging) with KVM at EuroSec 2011. The title is Memory Deduplication as a Threat to the Guest OS. http://www.iseclab.org/eurosec-2011/program.html The slide is downloadable. http://www.slideshare.net/suzaki/eurosec2011-slide-memory-deduplication The paper will be downloadable from the ACM Digital Library. Please tell me if you have comments. Thank you. Very interesting presentation. It seems every time you share something, it becomes a target for attacks. I'm happy to hear your comments. The referee's comment was severe. It said there was no brand-new point, but there are real attack experiences. My paper just evaluated the detection of apache2 and sshd on a Linux guest OS and Firefox and IE6 on a Windows guest OS. If I have a VM on the same physical host as someone else I may be able to determine which programs and specific versions they are currently running. Is there some creative attack using this technique that I'm missing? I don't see many serious threats. The memory disclosure attack is assumed to be applied on cloud computing, which offers multi-tenancy. Even if an application has a vulnerability, an attacker can find and attack it. As I show in my slides, IE6 is an example. The situation resembles the Cross VM Side Channel Attack mentioned in the CCS'10 paper Hey, you, get off of my cloud. Kuniyasu Suzaki
Re: EuroSec'11 Presentation
Anthony, From: Anthony Liguori anth...@codemonkey.ws Subject: Re: EuroSec'11 Presentation Date: Mon, 11 Apr 2011 10:27:27 -0500 On 04/11/2011 03:51 AM, Stefan Hajnoczi wrote: I'm happy to hear your comments. The referee's comment was severe. It said there was no brand-new point, but there are real attack experiences. My paper just evaluated the detection of apache2 and sshd on a Linux guest OS and Firefox and IE6 on a Windows guest OS. If I have a VM on the same physical host as someone else I may be able to determine which programs and specific versions they are currently running. Is there some creative attack using this technique that I'm missing? I don't see many serious threats. It's a derivation of a previously demonstrated attack where memory access timing is used to guess memory content. This has been demonstrated in the past to be a viable technique to reduce the keyspace of things like ssh keys, which makes an attack a bit easier. But it's a well-known issue with colocation and the attack can be executed just by looking at raw memory access time (to guess whether another process brought something into the cache). Thank you for the comments. The memory disclosure attack can be prevented in several ways mentioned on my Countermeasure slide (page 22). If we limit KSM to READ-ONLY pages, we can detect and prevent the attack. I also think most memory deduplication is on READ-ONLY pages. -- Kuniyasu Suzaki
Re: EuroSec'11 Presentation
On 04/11/2011 06:46 PM, Kuniyasu Suzaki wrote: But it's a well-known issue with colocation and the attack can be executed just by looking at raw memory access time (to guess whether another process brought something into the cache). Thank you for the comments. The memory disclosure attack can be prevented in several ways mentioned on my Countermeasure slide (page 22). If we limit KSM to READ-ONLY pages, we can detect and prevent the attack. I also think most memory deduplication is on READ-ONLY pages. With EPT or NPT you cannot detect if a page is read only. Furthermore, at least Linux (without highmem) maps all of memory with a read/write mapping in addition to the per-process mapping, so no page is read-only. -- error compiling committee.c: too many arguments to function
Re: EuroSec'11 Presentation
From: Avi Kivity a...@redhat.com Subject: Re: EuroSec'11 Presentation Date: Mon, 11 Apr 2011 18:48:41 +0300 On 04/11/2011 06:46 PM, Kuniyasu Suzaki wrote: But it's a well-known issue with colocation and the attack can be executed just by looking at raw memory access time (to guess whether another process brought something into the cache). Thank you for the comments. The memory disclosure attack can be prevented in several ways mentioned on my Countermeasure slide (page 22). If we limit KSM to READ-ONLY pages, we can detect and prevent the attack. I also think most memory deduplication is on READ-ONLY pages. With EPT or NPT you cannot detect if a page is read only. Furthermore, at least Linux (without highmem) maps all of memory with a read/write mapping in addition to the per-process mapping, so no page is read-only. Unfortunately, yes. The Linux kernel maps all memory with read/write. I met this problem already. I have to find another OS that clearly separates read-only pages. I also know the CPU cannot distinguish read-only pages. However, if a VMM can trace CR3 and retrieve the page tables, we can distinguish whether a page is read-only or not. Yes, it is an academic interest. -- suzaki
Re: [PATCHv2] fix regression caused by e48672fa25e879f7ae21785c7efd187738139593
Hello Zachary, what is the current status, are you going to post this patch to Avi? I'd like to see one (or both) in stable eventually, I think it's a good candidate.. BR nik - Ing. Nikola CIPRICH LinuxBox.cz, s.r.o. 28. rijna 168, 709 01 Ostrava tel.: +420 596 603 142 fax: +420 596 621 273 mobil: +420 777 093 799 www.linuxbox.cz mobil servis: +420 737 238 656 email servis: ser...@linuxbox.cz -
Re: EuroSec'11 Presentation
On Mon, Apr 11, 2011 at 4:27 PM, Anthony Liguori anth...@codemonkey.ws wrote: On 04/11/2011 03:51 AM, Stefan Hajnoczi wrote: I'm happy to hear your comments. The referee's comment was severe. It said there was no brand-new point, but there are real attack experiences. My paper just evaluated the detection of apache2 and sshd on a Linux guest OS and Firefox and IE6 on a Windows guest OS. If I have a VM on the same physical host as someone else I may be able to determine which programs and specific versions they are currently running. Is there some creative attack using this technique that I'm missing? I don't see many serious threats. It's a derivation of a previously demonstrated attack where memory access timing is used to guess memory content. This has been demonstrated in the past to be a viable technique to reduce the keyspace of things like ssh keys, which makes an attack a bit easier. How can you reduce the key space by determining whether the guest has arbitrary 4 KB data in physical memory? Stefan
KVM call agenda for April 12
Please, send in any agenda items you are interested in covering. Later, Juan.
Re: USB EHCI patch for 0.14.0?
On 2011-04-11 15:23, David Ahern wrote: I lost momentum on the code last August and have not been able to get back to it for a variety of reasons. It really needs someone to pick it up and continue - or look at adding xhci code which might be a better solution for virtualization. xHCI is on the way [1], but the code was not yet published AFAIK. Jan [1] http://www.linuxtag.org/2011/de/program/freies-vortragsprogramm/popup/vortragsdetails.html?no_cache=1talkid=103 -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux
Re: [Qemu-devel] [PATCH 2/2 V7] qemu,qmp: add inject-nmi qmp command
On Mon, Apr 11, 2011 at 10:01 AM, Markus Armbruster arm...@redhat.com wrote: Avi Kivity a...@redhat.com writes: On 04/08/2011 12:41 AM, Anthony Liguori wrote: And it's a good thing to have, but exposing this as the only API to do something as simple as generating a guest crash dump is not the friendliest thing in the world to do to users. nmi is a fine name for something that corresponds to a real-life nmi button (often labeled NMI). Agree. We could also introduce an alias mechanism for user-friendly names, so nmi could be used in addition to the full path. Aliases could be useful for device paths as well.
Re: EuroSec'11 Presentation
On 04/11/2011 10:46 AM, Kuniyasu Suzaki wrote: But it's a well-known issue with colocation and the attack can be executed just by looking at raw memory access time (to guess whether another process brought something into the cache). Thank you for the comments. The memory disclosure attack can be prevented in several ways mentioned on my Countermeasure slide (page 22). Not to be discouraging, but this class of attacks (side channel information disclosures) is very well known and very well documented. Side channel attacks are extremely difficult to use from a practical perspective. First, you have to know that your target is colocated with you and that you are actually sharing a resource. Second, you have to be able to exploit the additional information you've gathered. This type of attack is just as applicable to any multi-user environment and is not at all unique to virtualization. If we limit KSM to READ-ONLY pages, we can detect and prevent the attack. I also think most memory deduplication is on READ-ONLY pages. There's really no point in worrying about these sorts of things. Either you're not going to colocate, you'll colocate and do the best you can with what the hardware provides (socket isolation, no KSM, etc.), or you're simply not going to worry about these types of things. Again, it is extremely difficult to use side channel information disclosures to actually exploit anything. If you are worried about this level of security, you shouldn't be using x86 hardware as more advanced hardware has more rigorous support for protecting against these sorts of things. Regards, Anthony Liguori -- Kuniyasu Suzaki
Re: EuroSec'11 Presentation
On 04/11/2011 11:25 AM, Stefan Hajnoczi wrote: On Mon, Apr 11, 2011 at 4:27 PM, Anthony Liguori anth...@codemonkey.ws wrote: It's a derivation of a previously demonstrated attack where memory access timing is used to guess memory content. This has been demonstrated in the past to be a viable technique to reduce the keyspace of things like ssh keys, which makes an attack a bit easier. How can you reduce the key space by determining whether the guest has arbitrary 4 KB data in physical memory? I'm not sure that you can. But the way the cache timing attack worked is that by doing a cache timing analysis in another process that's sharing the cache with a process doing key generation, you can make predictions about the paths taken by the key generation code which lets you narrow down the key space. Of course, even this is extremely hard to exploit because you need to happen to be sharing a cache with something that's doing ssh key generation, you have to know when it starts and when it ends, and you have to know exactly what version of ssh is running. And even then, it just reduces the time needed to brute force. It still takes a long time. I think knowing whether a 4 KB page is shared by some other guest in the system is so little information that I don't see what you could practically do with it that can't already be done via a cache timing attack. Regards, Anthony Liguori Stefan
Re: USB EHCI patch for 0.14.0?
On 04/11/11 10:46, Jan Kiszka wrote: On 2011-04-11 15:23, David Ahern wrote: I lost momentum on the code last August and have not been able to get back to it for a variety of reasons. It really needs someone to pick it up and continue - or look at adding xhci code which might be a better solution for virtualization. xHCI is on the way [1], but the code was not yet published AFAIK. Jan [1] http://www.linuxtag.org/2011/de/program/freies-vortragsprogramm/popup/vortragsdetails.html?no_cache=1talkid=103 interesting. And will it be released / submitted to qemu for inclusion? David
Host crash
Hello, I ran into a crash today while I tried to log into one of my servers. I noticed ssh didn't respond at all. All virtual machines were running ok though. I had one terminal open to the server and it was running ok except ssh didn't work. I'm not quite sure if this is kvm related but I was hoping you experts could figure out what went wrong. Sorry for the noise if it is something else. My setup (two similar machines): Asus P5K SE mobo Quad Core Q6600 8 GB RAM Several NICs for different networks NICs: 01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 01) 02:00.0 Ethernet controller: Atheros Communications L1 Gigabit Ethernet (rev b0) 04:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit Ethernet Controller (rev 20) 05:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 01) 06:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5751 Gigabit Ethernet PCI Express (rev 21) 07:01.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet Controller (rev 02) Realteks are bonded for drbd sync Marvell for my local net (using sk98lin driver) Intel for internet connection Broadcom for my SAN, not in use yet. two vlans on my local net Debian squeeze (all software stock squeeze except qemu-kvm) kernel 2.6.32-5-amd64 drbd used for shared storage between hosts qemu-kvm-0.14 home made script for starting and stopping virtual machines Just let me know if you need more info. Below is a cut from dmesg that I was able to save: [210618.760363] [ cut here ] [210618.760397] kernel BUG at /build/buildd-linux-2.6_2.6.32-31-amd64-vrfdM4/linux-2.6-2.6.32/debian/build/source_amd64_none/mm/slub.c:2969!
[210618.760455] invalid opcode: [#1] SMP [210618.760489] last sysfs file: /sys/devices/virtual/net/vlan240/statistics/tx_dropped [210618.760542] CPU 3 [210618.760568] Modules linked in: nfs fscache ocfs2 jbd2 quota_tree drbd lru_cache cn nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue fuse configfs bridge loop 8021q garp stp bonding kvm_intel kvm tun snd_hda_codec_realtek snd_hda_intel snd_hda_codec nouveau snd_hwdep ttm snd_pcm snd_timer drm_kms_helper snd drm soundcore i2c_i801 i2c_algo_bit serio_raw snd_page_alloc asus_atk0110 evdev i2c_core pcspkr button processor ext3 jbd mbcache dm_mod raid1 raid0 md_mod sd_mod crc_t10dif pata_marvell ata_generic tg3 ahci ata_piix libphy uhci_hcd e1000 atl1 sk98lin libata scsi_mod ehci_hcd thermal thermal_sys usbcore r8169 nls_base mii [last unloaded: scsi_wait_scan] [210618.761141] Pid: 2588, comm: smartd Tainted: G M 2.6.32-5-amd64 #1 P5K SE [210618.761188] RIP: 0010:[810e723f] [810e723f] kfree+0x55/0xcb [210618.761244] RSP: 0018:88022447baa8 EFLAGS: 00010246 [210618.761272] RAX: 02100068 RBX: 8801fdc35560 RCX: 015e [210618.761320] RDX: 880207d68380 RSI: ea0007945700 RDI: ea000700 [210618.761367] RBP: 8802 R08: R09: 81455200 [210618.761413] R10: 0002 R11: R12: 8110fe65 [210618.761460] R13: 880224d56a80 R14: 88022a49 R15: 880207d68380 [210618.761508] FS: 7f70fb4207e0() GS:880008d8() knlGS: [210618.761557] CS: 0010 DS: ES: CR0: 8005003b [210618.761586] CR2: 7fa94149d6f0 CR3: 00022448e000 CR4: 26e0 [210618.761633] DR0: DR1: DR2: [210618.761680] DR3: DR6: 0ff0 DR7: 0400 [210618.761728] Process smartd (pid: 2588, threadinfo 88022447a000, task 88022ce92350) [210618.761776] Stack: [210618.761799] 8801fdc35560 880207d68380 8110fe65 [210618.761839] 0 8801fdc35560 81110845 88020001 88022aca2350 [210618.761899] 0 880207d68380 8118210a [210618.764184] Call 
Trace: [210618.764184] [8110fe65] ? bio_free_map_data+0x15/0x1e [210618.764184] [81110845] ? bio_uncopy_user+0x47/0x59 [210618.764184] [8118210a] ? blk_rq_unmap_user+0x1e/0x45 [210618.764184] [811859e7] ? sg_io+0x37a/0x3d7 [210618.764184] [81185f43] ? scsi_cmd_ioctl+0x217/0x3f4 [210618.764184] [810f6145] ? path_to_nameidata+0x15/0x37 [210618.764184] [a00a8b0c] ? sd_ioctl+0x9d/0xcb [sd_mod] [210618.764184] [81183915] ? __blkdev_driver_ioctl+0x69/0x7e [210618.764184] [81184110] ? blkdev_ioctl+0x7e6/0x836 [210618.764184] [810bc307] ? release_pages+0x17b/0x18d [210618.764184] [810ff946] ? touch_atime+0x7c/0x127 [210618.764184]
Re: Slow PXE boot in qemu.git (fast in qemu-kvm.git)
On Fri, 08 Apr 2011 19:50:57 -0500 Anthony Liguori anth...@codemonkey.ws wrote: On 04/08/2011 06:25 PM, Luiz Capitulino wrote: Hi there, Summary: - PXE boot in qemu.git (HEAD f124a41) is quite slow, more than 5 minutes. Got the problem with e1000, virtio and rtl8139. However, pcnet *works* (it's as fast as qemu-kvm.git) - PXE boot in qemu-kvm.git (HEAD df85c051) is fast, less than a minute. Tried with e1000, virtio and rtl8139 (I don't remember if I tried with pcnet) I tried with qemu.git v0.13.0 in order to check if this was a regression, but I got the same problem... Then I inspected qemu-kvm.git under the assumption that it could have a fix that wasn't committed to qemu.git. Found this: - commit 0836b77f0f65d56d08bdeffbac25cd6d78267dc9 which is a merge, works - commit cc015e9a5dde2f03f123357fa060acbdfcd570a4 does not work (it's slow) I tried a bisect, but it breaks due to gcc4 vs. gcc3 changes. Then I inspected commits manually, and found out that commit 64d7e9a4 doesn't work, which makes me think that the fix could be in the conflict resolution of 0836b77f, which makes me remember that I'm late for dinner, so my conclusions at this point are not reliable :) Can you run kvm_stat to see what the exit rates are?
Here you go, both collected after the VM is fully booted:

qemu.git:

efer_reload 0 0
exits 15976719599
fpu_reload 203 0
halt_exits 54427
halt_wakeup 0 0
host_state_reload 29985170
hypercalls 0 0
insn_emulation 13449597341
insn_emulation_fail 0 0
invlpg 9687 0
io_exits 85979 0
irq_exits 162179 4
irq_injections 1158227
irq_window 2071227
largepages 0 0
mmio_exits 954541
mmu_cache_miss 5307 0
mmu_flooded 2493 0
mmu_pde_zapped 1188 0
mmu_pte_updated 5355 0
mmu_pte_write 181550 0
mmu_recycled 0 0
mmu_shadow_zapped 6437 0
mmu_unsync 15 0
nmi_injections 0 0
nmi_window 0 0
pf_fixed 73983 0
pf_guest 4027 0
remote_tlb_flush 1 0
request_irq 6 0
signal_exits 135731 2
tlb_flush 26760 0

qemu-kvm.git:

efer_reload 0 0
exits 869724433
fpu_reload 46 0
halt_exits 206 8
halt_wakeup 7 0
host_state_reload 105173 8
hypercalls 0 0
insn_emulation 698411821
insn_emulation_fail 0 0
invlpg 9682 0
io_exits 626201 0
irq_exits 22930 4
irq_injections 2815 8
irq_window 1029 0
largepages 0 0
mmio_exits 3657 0
mmu_cache_miss 5271 0
mmu_flooded 2466 0
mmu_pde_zapped 1146 0
mmu_pte_updated 5294 0
mmu_pte_write 191173 0
mmu_recycled 0 0
mmu_shadow_zapped 6405 0
mmu_unsync 17 0
nmi_injections 0 0
nmi_window 0 0
pf_fixed 73580 0
pf_guest 4169 0
remote_tlb_flush 1 0
request_irq 0 0
signal_exits 24873 0
tlb_flush 26628 0

Maybe we're missing a coalesced io in qemu.git? It's also possible that gpxe is hitting the apic or pit quite a lot. Regards, Anthony Liguori Ideas?
Re: [PATCH] kvm tools: Use mmap for working with disk image V2
How do you plan to handle I/O errors or ENOSPC conditions? Note that shared writeable mappings are by far the feature in the VM/FS code that is most error prone, including the impossibility of doing sensible error handling. The version that accidentally used MAP_PRIVATE actually makes a lot of sense for an equivalent of qemu's snapshot mode, where the image is readonly and changes are kept private as long as the amount of modified blocks is small enough to not kill the host VM, but using shared writeable mappings just seems dangerous.
Re: [PATCH] kvm tools: Use mmap for working with disk image V2
On Mon, Apr 11, 2011 at 9:41 PM, Christoph Hellwig h...@infradead.org wrote: How do you plan to handle I/O errors or ENOSPC conditions? Note that shared writeable mappings are by far the feature in the VM/FS code that is most error prone, including the impossibility of doing sensible error handling. Good point. I reverted the commit. Thanks! On Mon, Apr 11, 2011 at 9:41 PM, Christoph Hellwig h...@infradead.org wrote: The version that accidentally used MAP_PRIVATE actually makes a lot of sense for an equivalent of qemu's snapshot mode, where the image is readonly and changes are kept private as long as the amount of modified blocks is small enough to not kill the host VM, but using shared writeable mappings just seems dangerous. Yup. Sasha, mind submitting a MAP_PRIVATE version that's enabled with a '--snapshot' (or equivalent) command line option? Pekka
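The snapshot behavior Christoph describes falls out of MAP_PRIVATE directly: writes to the mapping get private copy-on-write pages and never reach the image file. A self-contained sketch of that property, using a throwaway temp file rather than kvm tools' actual image code:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 1 if a write through a MAP_PRIVATE mapping left the
 * backing file unchanged (the snapshot property), 0 otherwise. */
static int private_mapping_preserves_file(void)
{
    char path[] = "/tmp/kvmtool-snap-XXXXXX"; /* throwaway "image" */
    int fd = mkstemp(path);
    if (fd < 0)
        return 0;
    if (write(fd, "ORIG", 4) != 4)
        return 0;

    char *map = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE, fd, 0);
    if (map == MAP_FAILED)
        return 0;
    memcpy(map, "EDIT", 4);   /* the "guest" write: COW, stays private */

    char buf[4];
    int ok = pread(fd, buf, 4, 0) == 4 && memcmp(buf, "ORIG", 4) == 0;

    munmap(map, 4096);
    close(fd);
    unlink(path);
    return ok;
}
```

The flip side of Christoph's warning applies to the shared case: with MAP_SHARED, write-back errors and ENOSPC surface asynchronously (SIGBUS, or silently lost data), with no clean point at which to report them to the guest.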
Re: Slow PXE boot in qemu.git (fast in qemu-kvm.git)
On Mon, 2011-04-11 at 15:35 -0300, Luiz Capitulino wrote: On Fri, 08 Apr 2011 19:50:57 -0500 Anthony Liguori anth...@codemonkey.ws wrote: On 04/08/2011 06:25 PM, Luiz Capitulino wrote: Hi there, Summary: - PXE boot in qemu.git (HEAD f124a41) is quite slow, more than 5 minutes. Got the problem with e1000, virtio and rtl8139. However, pcnet *works* (it's as fast as qemu-kvm.git) - PXE boot in qemu-kvm.git (HEAD df85c051) is fast, less than a minute. Tried with e1000, virtio and rtl8139 (I don't remember if I tried with pcnet) I was having this problem too, but I think it's because I forgot to build qemu with --enable-io-thread, which is the default for qemu-kvm. Can you re-configure and build with that and see if it's fast? Thanks, Alex
Re: kvm tools: rhel6.0 guest hung during shutdown
On Sun, 2011-04-10 at 16:58 +0300, Gleb Natapov wrote: On Sun, Apr 10, 2011 at 09:49:31PM +0800, Amos Kong wrote: System halted. [note: guest hung ...] Isn't that the expected result without ACPI support? I would expect all guests to hang like that at the end. I see hangs with Debian Squeeze image too but not with the minimal QEMU image I usually test things with. I wonder, though, why userspace insists on using ACPI for shutdown as we boot with 'noapic'. Pekka
Re: kvm tools: rhel6.0 guest hung during shutdown
On Mon, Apr 11, 2011 at 10:01:30PM +0300, Pekka Enberg wrote: On Sun, 2011-04-10 at 16:58 +0300, Gleb Natapov wrote: On Sun, Apr 10, 2011 at 09:49:31PM +0800, Amos Kong wrote: System halted. [note: guest hung ...] Isn't that the expected result without ACPI support? I would expect all guests to hang like that at the end. I see hangs with Debian Squeeze image too but not with the minimal QEMU image I usually test things with. I wonder, though, why userspace insists on using ACPI for shutdown as we boot with 'noapic'. There is no way to power down a PC from software without ACPI (maybe APM has something, but I doubt kvm-tool implements it either). Do you remember the Windows 95 "it is now safe to turn off your computer" screen? -- Gleb.
Re: Slow PXE boot in qemu.git (fast in qemu-kvm.git)
On Mon, 11 Apr 2011 13:00:32 -0600 Alex Williamson alex.william...@redhat.com wrote: On Mon, 2011-04-11 at 15:35 -0300, Luiz Capitulino wrote: On Fri, 08 Apr 2011 19:50:57 -0500 Anthony Liguori anth...@codemonkey.ws wrote: On 04/08/2011 06:25 PM, Luiz Capitulino wrote: Hi there, Summary: - PXE boot in qemu.git (HEAD f124a41) is quite slow, more than 5 minutes. Got the problem with e1000, virtio and rtl8139. However, pcnet *works* (it's as fast as qemu-kvm.git) - PXE boot in qemu-kvm.git (HEAD df85c051) is fast, less than a minute. Tried with e1000, virtio and rtl8139 (I don't remember if I tried with pcnet) I was having this problem too, but I think it's because I forgot to build qemu with --enable-io-thread, which is the default for qemu-kvm. Can you re-configure and build with that and see if it's fast? Thanks, Yes, nice catch, it's faster with I/O thread enabled, and it even seems faster than qemu-kvm.git. So, does this have to be fixed w/o I/O thread?
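For reference, the rebuild Alex suggests amounts to the following (the flag name is as it existed in the 0.14-era qemu.git tree, where the I/O thread was still off by default):

```shell
# In the qemu.git checkout:
./configure --enable-io-thread
make
```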
Re: kvm tools: rhel6.0 guest hung during shutdown
On 04/11/2011 11:07 PM, Gleb Natapov wrote: On Mon, Apr 11, 2011 at 10:01:30PM +0300, Pekka Enberg wrote: On Sun, 2011-04-10 at 16:58 +0300, Gleb Natapov wrote: On Sun, Apr 10, 2011 at 09:49:31PM +0800, Amos Kong wrote: System halted. [note: guest hung ...] Isn't that expected result without ACPI support? I would expect all guests to hang like that at the end. I see hangs with Debian Squeeze image too but not with the minimal QEMU image I usually test things with. I wonder, though, why userspace insists on using ACPI for shutdown as we boot with 'noapic'. There is not way to power down PC from software without ACPI (may be APM has something, but I doubt kvm-tool implements it either). Do you remember Windows 95 it is now safe to turn off your computer screen? yup, iirc APM had some set power state entry point, but not sure, need to find docs ;) -- Cyrill
Re: kvm tools: rhel6.0 guest hung during shutdown
On Mon, Apr 11, 2011 at 11:28:22PM +0400, Cyrill Gorcunov wrote: On 04/11/2011 11:07 PM, Gleb Natapov wrote: On Mon, Apr 11, 2011 at 10:01:30PM +0300, Pekka Enberg wrote: On Sun, 2011-04-10 at 16:58 +0300, Gleb Natapov wrote: On Sun, Apr 10, 2011 at 09:49:31PM +0800, Amos Kong wrote: System halted. [note: guest hung ...] Isn't that expected result without ACPI support? I would expect all guests to hang like that at the end. I see hangs with Debian Squeeze image too but not with the minimal QEMU image I usually test things with. I wonder, though, why userspace insists on using ACPI for shutdown as we boot with 'noapic'. There is not way to power down PC from software without ACPI (may be APM has something, but I doubt kvm-tool implements it either). Do you remember Windows 95 it is now safe to turn off your computer screen? yup, iirc APM had some set power state entry point, but not sure, need to find docs ;) Just go for ACPI then. APM is dead. -- Gleb.
Re: USB EHCI patch for 0.14.0?
On 2011-04-11 19:53, David Ahern wrote: On 04/11/11 10:46, Jan Kiszka wrote: On 2011-04-11 15:23, David Ahern wrote: I lost momentum on the code last August and have not been able to get back to it for a variety of reasons. It really needs someone to pick it up and continue - or look at adding xhci code which might be a better solution for virtualization. xHCI is on the way [1], but the code was not yet published AFAIK. Jan [1] http://www.linuxtag.org/2011/de/program/freies-vortragsprogramm/popup/vortragsdetails.html?no_cache=1talkid=103 interesting. And will it be released / submitted to qemu for inclusion? I suppose so. But maybe Alex can tell more. Jan
Re: Slow PXE boot in qemu.git (fast in qemu-kvm.git)
On Mon, 2011-04-11 at 22:04 +0200, Jan Kiszka wrote: On 2011-04-11 21:15, Luiz Capitulino wrote: On Mon, 11 Apr 2011 13:00:32 -0600 Alex Williamson alex.william...@redhat.com wrote: On Mon, 2011-04-11 at 15:35 -0300, Luiz Capitulino wrote: On Fri, 08 Apr 2011 19:50:57 -0500 Anthony Liguori anth...@codemonkey.ws wrote: On 04/08/2011 06:25 PM, Luiz Capitulino wrote: Hi there, Summary: - PXE boot in qemu.git (HEAD f124a41) is quite slow, more than 5 minutes. Got the problem with e1000, virtio and rtl8139. However, pcnet *works* (it's as fast as qemu-kvm.git) - PXE boot in qemu-kvm.git (HEAD df85c051) is fast, less than a minute. Tried with e1000, virtio and rtl8139 (I don't remember if I tried with pcnet) I was having this problem too, but I think it's because I forgot to build qemu with --enable-io-thread, which is the default for qemu-kvm. Can you re-configure and build with that and see if it's fast? Thanks, Yes, nice catch, it's faster with I/O thread enabled, even seem faster than qemu-kvm.git. What's the performance under qemu-kvm with -no-kvm-irqchip? So, does this have to be fixed w/o I/O thread? If it's most probably an architectural deficit of non-io-thread mode, I would say let it rest in peace. But maybe it points to a generic issue that is just magnified by non-threaded mode. I've probably been told, but I forget. Why isn't io-thread enabled by default? Thanks, Alex
Re: Slow PXE boot in qemu.git (fast in qemu-kvm.git)
On 2011-04-11 22:14, Alex Williamson wrote: On Mon, 2011-04-11 at 22:04 +0200, Jan Kiszka wrote: On 2011-04-11 21:15, Luiz Capitulino wrote: On Mon, 11 Apr 2011 13:00:32 -0600 Alex Williamson alex.william...@redhat.com wrote: On Mon, 2011-04-11 at 15:35 -0300, Luiz Capitulino wrote: On Fri, 08 Apr 2011 19:50:57 -0500 Anthony Liguori anth...@codemonkey.ws wrote: On 04/08/2011 06:25 PM, Luiz Capitulino wrote: Hi there, Summary: - PXE boot in qemu.git (HEAD f124a41) is quite slow, more than 5 minutes. Got the problem with e1000, virtio and rtl8139. However, pcnet *works* (it's as fast as qemu-kvm.git) - PXE boot in qemu-kvm.git (HEAD df85c051) is fast, less than a minute. Tried with e1000, virtio and rtl8139 (I don't remember if I tried with pcnet) I was having this problem too, but I think it's because I forgot to build qemu with --enable-io-thread, which is the default for qemu-kvm. Can you re-configure and build with that and see if it's fast? Thanks, Yes, nice catch, it's faster with I/O thread enabled, even seem faster than qemu-kvm.git. What's the performance under qemu-kvm with -no-kvm-irqchip? So, does this have to be fixed w/o I/O thread? If it's most probably an architectural deficit of non-io-thread mode, I would say let it rest in peace. But maybe it points to a generic issues that is just magnified by non-threaded mode. I've probably been told, but forget. Why isn't io-thread enabled by default? Thanks, TCG performance still sucks in io-threaded mode. I've three patches in my queue that reduce the overhead a bit further - for me to a reasonable level (will post them in the next days). But, still, YMMV depending on the workload. At least Windows should no longer be a functional blocker thanks to Paolo's work. Jan
Re: Slow PXE boot in qemu.git (fast in qemu-kvm.git)
On 2011-04-11 22:18, Jan Kiszka wrote: On 2011-04-11 22:14, Alex Williamson wrote: On Mon, 2011-04-11 at 22:04 +0200, Jan Kiszka wrote: On 2011-04-11 21:15, Luiz Capitulino wrote: On Mon, 11 Apr 2011 13:00:32 -0600 Alex Williamson alex.william...@redhat.com wrote: On Mon, 2011-04-11 at 15:35 -0300, Luiz Capitulino wrote: On Fri, 08 Apr 2011 19:50:57 -0500 Anthony Liguori anth...@codemonkey.ws wrote: On 04/08/2011 06:25 PM, Luiz Capitulino wrote: Hi there, Summary: - PXE boot in qemu.git (HEAD f124a41) is quite slow, more than 5 minutes. Got the problem with e1000, virtio and rtl8139. However, pcnet *works* (it's as fast as qemu-kvm.git) - PXE boot in qemu-kvm.git (HEAD df85c051) is fast, less than a minute. Tried with e1000, virtio and rtl8139 (I don't remember if I tried with pcnet) I was having this problem too, but I think it's because I forgot to build qemu with --enable-io-thread, which is the default for qemu-kvm. Can you re-configure and build with that and see if it's fast? Thanks, Yes, nice catch, it's faster with I/O thread enabled, even seem faster than qemu-kvm.git. What's the performance under qemu-kvm with -no-kvm-irqchip? So, does this have to be fixed w/o I/O thread? If it's most probably an architectural deficit of non-io-thread mode, I would say let it rest in peace. But maybe it points to a generic issues that is just magnified by non-threaded mode. I've probably been told, but forget. Why isn't io-thread enabled by default? Thanks, TCG performance still sucks in io-threaded mode. I've three patches in my queue that reduces the overhead a bit further - for me to a reasonable level (will post them the next days). But, still, YMMV depending on the workload. In fact, they were already prepared. So I've just sent them out. Jan
Re: Slow PXE boot in qemu.git (fast in qemu-kvm.git)
On 2011-04-11 23:05, Luiz Capitulino wrote: On Mon, 11 Apr 2011 22:04:52 +0200 Jan Kiszka jan.kis...@web.de wrote: On 2011-04-11 21:15, Luiz Capitulino wrote: On Mon, 11 Apr 2011 13:00:32 -0600 Alex Williamson alex.william...@redhat.com wrote: On Mon, 2011-04-11 at 15:35 -0300, Luiz Capitulino wrote: On Fri, 08 Apr 2011 19:50:57 -0500 Anthony Liguori anth...@codemonkey.ws wrote: On 04/08/2011 06:25 PM, Luiz Capitulino wrote: Hi there, Summary: - PXE boot in qemu.git (HEAD f124a41) is quite slow, more than 5 minutes. Got the problem with e1000, virtio and rtl8139. However, pcnet *works* (it's as fast as qemu-kvm.git) - PXE boot in qemu-kvm.git (HEAD df85c051) is fast, less than a minute. Tried with e1000, virtio and rtl8139 (I don't remember if I tried with pcnet) I was having this problem too, but I think it's because I forgot to build qemu with --enable-io-thread, which is the default for qemu-kvm. Can you re-configure and build with that and see if it's fast? Thanks, Yes, nice catch, it's faster with I/O thread enabled, What's the performance under qemu-kvm with -no-kvm-irqchip? Still fast, I meant: is it even faster with unaccelerated userspace irqchip? I've seen such effects with emulated NICs before. but just realized that qemu-kvm's configure says that I/O thread is disabled: IO thread no And it's fast.. That only means (so far) that the upstream io-thread code is disabled. Qemu-kvm's own solution is enabled all the time, and you can't switch to upstream anyway as both are incompatible. That's going to change soon (hopefully) when we migrate qemu-kvm to the upstream version. Jan
Re: [PATCH v2 1/2] rbd: use the higher level librbd instead of just librados
On 04/08/2011 01:43 AM, Stefan Hajnoczi wrote: On Mon, Mar 28, 2011 at 04:15:57PM -0700, Josh Durgin wrote: librbd stacks on top of librados to provide access to rbd images. Using librbd simplifies the qemu code, and allows qemu to use new versions of the rbd format with few (if any) changes. Signed-off-by: Josh Durgin josh.dur...@dreamhost.com Signed-off-by: Yehuda Sadeh yeh...@hq.newdream.net --- block/rbd.c | 785 +++-- block/rbd_types.h | 71 - configure | 33 +-- 3 files changed, 221 insertions(+), 668 deletions(-) delete mode 100644 block/rbd_types.h Hi Josh, I have applied your patches onto qemu.git/master and am running ceph.git/master. Unfortunately qemu-iotests fails for me. Test 016 seems to hang in qemu-io -g -c write -P 66 128M 512 rbd:rbd/t.raw. I can reproduce this consistently. Here is the backtrace of the hung process (not consuming CPU, probably deadlocked): This hung because it wasn't checking the return value of rbd_aio_write. I've fixed this in the for-qemu branch of http://ceph.newdream.net/git/qemu-kvm.git. Also, the existing rbd implementation is not 'growable' - writing to a large offset will not expand the rbd image correctly. Should we implement bdrv_truncate to support this (librbd has a resize operation)? Is bdrv_truncate useful outside of qemu-img and qemu-io? Test 008 failed with an assertion but succeeded when run again. I think this is a race condition: This is likely a use-after-free, but I haven't been able to find the race condition yet (or reproduce it). Could you get a backtrace from the core file? Thanks, Josh
Re: Slow PXE boot in qemu.git (fast in qemu-kvm.git)
On 04/11/2011 03:04 PM, Jan Kiszka wrote: On 2011-04-11 21:15, Luiz Capitulino wrote: On Mon, 11 Apr 2011 13:00:32 -0600 Alex Williamson alex.william...@redhat.com wrote: On Mon, 2011-04-11 at 15:35 -0300, Luiz Capitulino wrote: On Fri, 08 Apr 2011 19:50:57 -0500 Anthony Liguori anth...@codemonkey.ws wrote: On 04/08/2011 06:25 PM, Luiz Capitulino wrote: Hi there, Summary: - PXE boot in qemu.git (HEAD f124a41) is quite slow, more than 5 minutes. Got the problem with e1000, virtio and rtl8139. However, pcnet *works* (it's as fast as qemu-kvm.git) - PXE boot in qemu-kvm.git (HEAD df85c051) is fast, less than a minute. Tried with e1000, virtio and rtl8139 (I don't remember if I tried with pcnet) I was having this problem too, but I think it's because I forgot to build qemu with --enable-io-thread, which is the default for qemu-kvm. Can you re-configure and build with that and see if it's fast? Thanks, Yes, nice catch, it's faster with I/O thread enabled, even seem faster than qemu-kvm.git. What's the performance under qemu-kvm with -no-kvm-irqchip? So, does this have to be fixed w/o I/O thread? If it's most probably an architectural deficit of non-io-thread mode, I would say let it rest in peace. But maybe it points to a generic issues that is just magnified by non-threaded mode. If gpxe is spinning waiting for I/O to complete, that's going to prevent select from running until the next signal (timer event). Regards, Anthony Liguori
Re: [ANNOUNCE] Native Linux KVM tool
On Sat, Apr 09, 2011 at 09:40:09AM +0200, Ingo Molnar wrote: * Andrea Arcangeli aarca...@redhat.com wrote: [...] I thought the whole point of a native kvm tool was to go all the paravirt way to provide max performance and maybe also depend on vhost as much as possible. BTW, I should elaborate on the all the paravirt way, going 100% paravirt isn't what I meant. I was thinking at the performance critical drivers mainly like storage and network. The kvm tool could be more hackable and evolve faster by exposing a single hardware view to the linux guest (using only paravirt whenever that improves performance, like network/storage). Whenever full emulation doesn't affect any fast path, it should be preferred rather than inventing new paravirt interfaces for no good. That for example applies first and foremost to the EPT support which is simpler and more optimal than any shadow paravirt pagetables. It'd be a dead end to do all in paravirt performance-wise. I definitely didn't mean any resemblance to lguest when I said full paravirt ;). Sorry for the confusion. To me it's more than that: today i can use it to minimally boot test various native bzImages just by typing: kvm run ./bzImage this will get me past most of the kernel init, up to the point where it would try to mount user-space. ( That's rather powerful to me personally, as i introduce most of my bugs to these stages of kernel bootup - and as a kernel developer i'm not alone there ;-) I would be sad if i were forced to compile in some sort of paravirt support, just to be able to boot-test random native kernel images. Really, if you check the code, serial console and timer support is not a big deal complexity-wise and it is rather useful: Agree with that. git pull git://github.com/penberg/linux-kvm master So i think up to a point hardware emulation is both fun to implement (it's fun to be on the receiving end of hw calls, for a change) and a no-brainer to have from a usability POV. How far it wants to go we'll see! 
:-) About using the kvm tool as a debugging tool I don't see the point though. It's very unlikely the kvm tool will ever be able to match qemu's power and capabilities for debugging; in fact qemu will allow you to do basic debugging of several device drivers too (e1000, IDE etc...). I don't really see the point of the kvm tool as a debugging tool considering how mature qemu is in terms of monitor memory-inspection commands and its gdbstub; if it's debugging you're after, adding more features to the qemu monitor looks like a better way to go. The only way I see this being useful is to lead it in a full performance direction, using paravirt whenever it saves CPU (like virtio-blk, vhost-net) and allowing it to scale to hundreds of CPUs doing I/O simultaneously, and getting there faster than qemu. Now smp scaling with qemu-kvm driver backends hasn't been a big issue according to Avi, so it's not like we're under pressure from it, but clearly someday it may become a bigger issue, and having fewer drivers to deal with (especially only having vhost-blk in userland, with vhost-net already being in the kernel) may provide an advantage in allowing a more performance-oriented implementation of the backends without breaking lots of existing and valuable fully-emulated drivers. In terms of pure kernel debugging I'm afraid this will be a dead end, and for the kernel testing you describe I think qemu-kvm will work best already. We already have a simpler kvm support in qemu (vs qemu-kvm) and we don't want a third that is even slower than qemu kvm support, so it has to be faster than qemu-kvm or nothing IMHO :).
Re: KVM call agenda for April 12
On 04/11/2011 11:35 AM, Juan Quintela wrote: Please, send in any agenda items you are interested in covering. I won't be able to attend. Regards, Anthony Liguori Later, Juan.