Re: [Qemu-devel] Sometimes qem-kvm hang at kvm_put_vcpu_events when virsh restore. Both version 0.12.5 / 0.13.0
1. I create a guest use libvirt; 2. I start the guest use: virsh start guest; 3. I save the guest to a file, use: virsh save guest guest.save 4. I retore the guest, use : virsh retore guest.save Sometimes, the guest restore successful. sometimes restore fail. I debug the code, the qemu-kvm hang at kvm_put_vcpu_events, the process state is D, uninterruptable sleep. Only can killed by kill -9. I print the events when kvm_put_vcpu_events / kvm_get_vcpu_events, it is always different, event when restore successful - Original Message - From: Mulyadi Santosa mulyadi.sant...@gmail.com To: changlimin changli...@h3c.com Cc: qemu-devel@nongnu.org Sent: Wednesday, December 15, 2010 3:46 PM Subject: Re: [Qemu-devel] Sometimes qem-kvm hang at kvm_put_vcpu_events when virsh restore. Both version 0.12.5 / 0.13.0 2010/12/15 changlimin changli...@h3c.com: BTW windows 2008 guest start failed at qem-kvm 0.13.0, but 0.12.5 start success. I am not qemu/kvm expert, but I think if you could provide either qemu instruction trace (using -d) or traces inside the guest OS itself (windows live debugger maybe?), it would be much easier for people here to decipher the problem. At least could you write down how do you invoke qemu-kvm back then? -- regards, Mulyadi Santosa Freelance Linux trainer and consultant blog: the-hydra.blogspot.com training: mulyaditraining.blogspot.com
[Qemu-devel] qemu-x86_64 segments on fedora14
Dear All, I am getting a the following segmentation fault when I run the command qemu-x86_64 /bin/ls: qemu: uncaught target signal 11 (Segmentation fault) - core dumped Segmentation fault (core dumped) I am using fedora14 operating system. Thanks for your help. Rania Mameesh University of Siena Italy
[Qemu-devel] Re: [PATCH 00/11] AHCI emulation support v9
Hi, One thing that might be worth considering is to have preliminary seabios support for booting. Gerd does have something in the queue there ;). Not in the queue, it's upstream. Just pull latest seabios. As it didn't got much testing yet it is off by default, so make sure to flip CONFIG_AHCI. Thats it ;) cheers, Gerd
Re: [Qemu-devel] Sometimes qem-kvm hang at kvm_put_vcpu_events when virsh restore. Both version 0.12.5 / 0.13.0
On Wed, Dec 15, 2010 at 15:08, changlimin changli...@h3c.com wrote: 1. I create a guest use libvirt; 2. I start the guest use: virsh start guest; 3. I save the guest to a file, use: virsh save guest guest.save 4. I retore the guest, use : virsh retore guest.save Sometimes, the guest restore successful. sometimes restore fail. I debug the code, the qemu-kvm hang at kvm_put_vcpu_events, the process state is D, uninterruptable sleep. Only can killed by kill -9. I print the events when kvm_put_vcpu_events / kvm_get_vcpu_events, it is always different, event when restore successful I could only say, i smell deadlock due to race condition here...but I have no further prove...how many (virtual) CPU did you simulate at that time? -- regards, Mulyadi Santosa Freelance Linux trainer and consultant blog: the-hydra.blogspot.com training: mulyaditraining.blogspot.com
Re: [Qemu-devel] qemu-x86_64 segments on fedora14
On Wed, Dec 15, 2010 at 15:26, Rania Mameesh mame...@dii.unisi.it wrote: Dear All, I am getting a the following segmentation fault when I run the command qemu-x86_64 /bin/ls: qemu: uncaught target signal 11 (Segmentation fault) - core dumped Segmentation fault (core dumped) that was the user mode Qemu emulation, right? Sounds like missing NPTL implementation to meAFAIK it's still incomplete in some parts -- regards, Mulyadi Santosa Freelance Linux trainer and consultant blog: the-hydra.blogspot.com training: mulyaditraining.blogspot.com
Re: [Qemu-devel] Sometimes qem-kvm hang at kvm_put_vcpu_events when virsh restore. Both version 0.12.5 / 0.13.0
I have tried 1 vcpu and 2 vcpus, both sometimes restore fail. - Original Message - From: Mulyadi Santosa mulyadi.sant...@gmail.com To: changlimin changli...@h3c.com Cc: qemu-devel@nongnu.org Sent: Wednesday, December 15, 2010 4:31 PM Subject: Re: [Qemu-devel] Sometimes qem-kvm hang at kvm_put_vcpu_events when virsh restore. Both version 0.12.5 / 0.13.0 On Wed, Dec 15, 2010 at 15:08, changlimin changli...@h3c.com wrote: 1. I create a guest use libvirt; 2. I start the guest use: virsh start guest; 3. I save the guest to a file, use: virsh save guest guest.save 4. I retore the guest, use : virsh retore guest.save Sometimes, the guest restore successful. sometimes restore fail. I debug the code, the qemu-kvm hang at kvm_put_vcpu_events, the process state is D, uninterruptable sleep. Only can killed by kill -9. I print the events when kvm_put_vcpu_events / kvm_get_vcpu_events, it is always different, event when restore successful I could only say, i smell deadlock due to race condition here...but I have no further prove...how many (virtual) CPU did you simulate at that time? -- regards, Mulyadi Santosa Freelance Linux trainer and consultant blog: the-hydra.blogspot.com training: mulyaditraining.blogspot.com
[Qemu-devel] Re: [PATCH 00/11] AHCI emulation support v9
Am 15.12.2010 um 09:28 schrieb Gerd Hoffmann kra...@redhat.com: Hi, One thing that might be worth considering is to have preliminary seabios support for booting. Gerd does have something in the queue there ;). Not in the queue, it's upstream. Just pull latest seabios. As it didn't got much testing yet it is off by default, so make sure to flip CONFIG_AHCI. Thats it ;) Even better! Could we get that in for 0.14 still? I assume that means we need a stable seabios release. Alex
[Qemu-devel] Re: [PATCH 00/11] AHCI emulation support v9
Hi, Even better! Could we get that in for 0.14 still? I assume that means we need a stable seabios release. Also for Gleb's bootorder bits we'll need a seabios update. I think for that not all bits are in upstream seabios yet though. A new release once this is settled is probably a good idea, so we don't have to grab a random snapshot. cheers, Gerd
[Qemu-devel] [PATCH v3] qemu, qmp: convert do_inject_nmi() to QObject, QError
Convert do_inject_nmi() to QObject, QError, we need to use it(via libvirt). changed from v1 Add document. Add error handling when the cpu index is invalid. changed from v2 use QERR_INVALID_PARAMETER_VALUE as Markus suggest. Signed-off-by: Lai Jiangshan la...@cn.fujitsu.com --- diff --git a/hmp-commands.hx b/hmp-commands.hx index 23024ba..f86d9fe 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -724,7 +724,8 @@ ETEXI .args_type = cpu_index:i, .params = cpu, .help = inject an NMI on the given CPU, -.mhandler.cmd = do_inject_nmi, +.user_print = monitor_user_noop, +.mhandler.cmd_new = do_inject_nmi, }, #endif STEXI diff --git a/monitor.c b/monitor.c index ec31eac..3e33a96 100644 --- a/monitor.c +++ b/monitor.c @@ -2119,7 +2119,7 @@ static void do_wav_capture(Monitor *mon, const QDict *qdict) #endif #if defined(TARGET_I386) -static void do_inject_nmi(Monitor *mon, const QDict *qdict) +static int do_inject_nmi(Monitor *mon, const QDict *qdict, QObject **ret_data) { CPUState *env; int cpu_index = qdict_get_int(qdict, cpu_index); @@ -2127,8 +2127,12 @@ static void do_inject_nmi(Monitor *mon, const QDict *qdict) for (env = first_cpu; env != NULL; env = env-next_cpu) if (env-cpu_index == cpu_index) { cpu_interrupt(env, CPU_INTERRUPT_NMI); -break; +return 0; } + +qerror_report(QERR_INVALID_PARAMETER_VALUE, cpu_index, + a CPU number); +return -1; } #endif diff --git a/qmp-commands.hx b/qmp-commands.hx index e5f157f..fcb6bf2 100644 --- a/qmp-commands.hx +++ b/qmp-commands.hx @@ -429,6 +429,33 @@ Example: EQMP +#if defined(TARGET_I386) +{ +.name = inject_nmi, +.args_type = cpu_index:i, +.params = cpu, +.help = inject an NMI on the given CPU, +.user_print = monitor_user_noop, +.mhandler.cmd_new = do_inject_nmi, +}, +#endif +SQMP +inject_nmi +-- + +Inject an NMI on the given CPU (x86 only). + +Arguments: + +- cpu_index: the index of the CPU to be injected NMI (json-int) + +Example: + +- { execute: inject_nmi, arguments: { cpu_index: 0 } } +- { return: {} } + +EQMP + { .name = migrate, .args_type = detach:-d,blk:-b,inc:-i,uri:s,
Re: [Qemu-devel] Re: [PATCH] PCI: Bus number from the bridge, not the device
On Tue, Dec 14, 2010 at 11:34:53AM -0700, Alex Williamson wrote: On Tue, 2010-12-14 at 14:26 +0200, Michael S. Tsirkin wrote: On Mon, Dec 13, 2010 at 10:04:24PM -0700, Alex Williamson wrote: I've only ever seen config[PCI_SECONDARY_BUS] be non-zero for an assigned device, so I'm pretty sure we're not going to hurt migration, but the code is clearly wrong and I'd like to make sure we don't trip on a migration failure for a minor device config space change. Which reminds me: maybe just mark nested bridges as non-migrateable for now? Care writing such a patch? Hmm, this is trickier than it sounds. Hmm, since 0 is put in the path instead of the bridge number, will the correct bridge be restored? We're really only broken wrt migration if a device under a bridge calls qemu_ram_alloc. I guess there's more broken-ness. What exactly breaks qemu_ram_alloc? Any device is free to do this, but typically it only happens via pci_add_option_rom() (not counting vga as typical). So maybe the better approach for now is to prevent the problem by disallowing option ROMs for devices below a bridge. We obviously risk devices coming along that allocate RAM on their own, but we could still allow the most common issue with almost no lost functionality (assuming no one wants to boot off that nested device). Thoughts? Thanks, Alex
[Qemu-devel] Re: [PATCH] rtl8139: IO memory is not part of vmstate
On Tue, Dec 14, 2010 at 08:41:55AM -0700, Alex Williamson wrote: On Tue, 2010-12-14 at 14:32 +0200, Michael S. Tsirkin wrote: On Mon, Dec 13, 2010 at 10:00:48PM -0700, Alex Williamson wrote: On Tue, 2010-12-14 at 06:43 +0200, Michael S. Tsirkin wrote: On Mon, Dec 13, 2010 at 12:15:08PM -0700, Alex Williamson wrote: On Mon, 2010-12-13 at 21:06 +0200, Michael S. Tsirkin wrote: On Mon, Dec 13, 2010 at 11:59:16AM -0700, Alex Williamson wrote: On Mon, 2010-12-13 at 20:54 +0200, Michael S. Tsirkin wrote: On Mon, Dec 13, 2010 at 11:00:44AM -0700, Alex Williamson wrote: On Mon, 2010-12-13 at 19:50 +0200, Michael S. Tsirkin wrote: On Mon, Dec 13, 2010 at 10:43:22AM -0700, Alex Williamson wrote: So, unfortunately, I stand by my original patch. What about the one that put -1 in saved index for a hotplugged device? There are still examples that don't work even without hotplug (example 2 and example 3 after the reboot). That hack limits the damage, but still leaves a latent bug for reboot and doesn't address the non-hotplug scenarios. So, I don't think it's worthwhile to pursue, and we shouldn't pretend we can use it to avoid bumping the version_id. Thanks, Alex I guess when we bump it we tell users: migration is completely borken to the old version, don't even try it. Is there a way for libvirt to discover such incompatibilities and avoid the migration? I don't know if libvirt has a way to query this in advance. If a migration is attempted, the target will report: savevm: unsupported version 5 for ':00:03.0/rtl8139' v4 And the source will continue running. We waste plenty of bits getting to that point, Yes, this happens after all of memory has been migrated. Better late than never :^\ One other question: can we do the same by creating a new (empty) section? As was discussed in the past this is easier for downstreams to cherry-pick. The only way I can think to do that would be to have a subsection that is always included, but saves no data. That would force a failure on new-old migration, but I don't think it really matches the intended purpose of subsections and feels like it's adding cruft for no gain. Maybe I'm missing something. Juan, is there any advantage to trapping this in a subsection? Thanks, Alex Maybe in this particular case the advantage is minimal. But it seems easier to stick to a rule of no more version bumps than argue about each case. Do we have such a rule? I think it's easier to have a hard rule than start arguing whether each change is a subsection or not for each patch. What is the downside? If we have a subsection who's needed function is return 1, I think that's a good indication that it's not appropriate for a subsection and the end result is equivalent to bumping the main driver vmstate version. So if you feel strongly about it, go ahead, I'm just concerned we'll now need to argue version or subsection for each migration-related change. It's convoluted to try to hide a one-way upgrade in a subsection. It may be convoluted but I think it works better with downstreams. When we'll bump the version again in the future, it should be possible for a downstream to pick up just the next change, and skip this one. But that's a general comment. In this case, I expect all downstream to be able to pick this bugfix with no real downsides. Thanks, Alex
[Qemu-devel] Re: [PATCH] rtl8139: IO memory is not part of vmstate
On Tue, Dec 14, 2010 at 04:59:55PM +0100, Paolo Bonzini wrote: On 12/14/2010 04:41 PM, Alex Williamson wrote: Maybe in this particular case the advantage is minimal. But it seems easier to stick to a rule of no more version bumps than argue about each case. Do we have such a rule? If we have a subsection who's needed function is return 1, I think that's a good indication that it's not appropriate for a subsection and the end result is equivalent to bumping the main driver vmstate version. It's convoluted to try to hide a one-way upgrade in a subsection. Thanks, Indeed, subsections are for data that is rarely needed so that there's some chance (sometimes ~100%) of migration working seemlessly. If a subsection arrives that qemu does not know about, won't migratin fail? In this case it's either no-bump-and-live-with-the-consequences, or changing the version id. Paolo This was discussed to death already. version ids have the problem that they don't play nicely with downstreams. -- MST
Re: [Qemu-devel] [PATCH v3] qemu, qmp: convert do_inject_nmi() to QObject, QError
Lai Jiangshan la...@cn.fujitsu.com writes: Convert do_inject_nmi() to QObject, QError, we need to use it(via libvirt). changed from v1 Add document. Add error handling when the cpu index is invalid. changed from v2 use QERR_INVALID_PARAMETER_VALUE as Markus suggest. Signed-off-by: Lai Jiangshan la...@cn.fujitsu.com A note on commit messages: The commit message should describe the current version of the patch. Don't repeat the subject line in the body. Patch history is very useful for review, but usually uninteresting once the patch is committed. Thus, it's best to put it after the --- separator. Subsystem tags in the subject line are helpful. But qemu doesn't provide any information there :) Regarding the patch: The conversion looks good. The new QMP command is called inject_nmi, while the existing human monitor command is called nmi. Luiz asked for this name change. I don't mind. But should we rename the human monitor command for consistency? Regardless, the differing command name is worth mentioning in the commit message.
[Qemu-devel] Re: [PATCH v5 0/4] virtio: Use ioeventfd for virtqueue notify
On Mon, Dec 13, 2010 at 6:52 PM, Michael S. Tsirkin m...@redhat.com wrote: On Mon, Dec 13, 2010 at 05:57:28PM +, Stefan Hajnoczi wrote: On Mon, Dec 13, 2010 at 4:28 PM, Stefan Hajnoczi stefa...@gmail.com wrote: On Mon, Dec 13, 2010 at 4:12 PM, Michael S. Tsirkin m...@redhat.com wrote: On Mon, Dec 13, 2010 at 03:27:06PM +, Stefan Hajnoczi wrote: On Mon, Dec 13, 2010 at 1:36 PM, Michael S. Tsirkin m...@redhat.com wrote: On Mon, Dec 13, 2010 at 03:35:38PM +0200, Michael S. Tsirkin wrote: On Mon, Dec 13, 2010 at 01:11:27PM +, Stefan Hajnoczi wrote: Fresh results: 192.168.0.1 - host (runs netperf) 192.168.0.2 - guest (runs netserver) host$ src/netperf -H 192.168.0.2 -- -m 200 ioeventfd=on TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2 (192.168.0.2) port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 200 10.00 1759.25 ioeventfd=off TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2 (192.168.0.2) port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 200 10.00 1757.15 The results vary approx +/- 3% between runs. Invocation: $ x86_64-softmmu/qemu-system-x86_64 -m 4096 -enable-kvm -netdev type=tap,id=net0,ifname=tap0,script=no,downscript=no -device virtio-net-pci,netdev=net0,ioeventfd=on|off -vnc :0 -drive if=virtio,cache=none,file=$HOME/rhel6-autobench-raw.img I am running qemu.git with v5 patches, based off 36888c6335422f07bbc50bf3443a39f24b90c7c6. Host: 1 Quad-Core AMD Opteron(tm) Processor 2350 @ 2 GHz 8 GB RAM RHEL 6 host Next I will try the patches on latest qemu-kvm.git Stefan One interesting thing is that I put virtio-net earlier on command line. Sorry I mean I put it after disk, you put it before. I can't find a measurable difference when swapping -drive and -netdev. One other concern I have is that we are apparently using ioeventfd for all VQs. E.g. for virtio-net we probably should not use it for the control VQ - it's a waste of resources. One option is a per-device (block, net, etc) bitmap that masks out virtqueues. Is that something you'd like to see? I'm tempted to mask out the RX vq too and see how that affects the qemu-kvm.git specific issue. As expected, the rx virtqueue is involved in the degradation. I enabled ioeventfd only for the TX virtqueue and got the same good results as userspace virtio-net. When I enable only the rx virtqueue, performs decreases as we've seen above. Stefan Interesting. In particular this implies something's wrong with the queue: we should not normally be getting notifications from rx queue at all. Is it running low on buffers? Does it help to increase the vq size? Any other explanation? I made a mistake, it is the *tx* vq that causes reduced performance on short packets with ioeventfd. I double-checked the results and the rx vq doesn't affect performance. Initially I thought the fix would be to adjust the tx mitigation mechanism since ioeventfd does its own mitigation of sorts. Multiple eventfd signals will be coalesced into one qemu-kvm event handler call if qemu-kvm didn't have a chance to handle the first event before the eventfd was signalled again. I added -device virtio-net-pci tx=immediate to flush the TX queue immediately instead of scheduling a BH or timer. Unfortunately this had little measurable effect and performance stayed the same. This suggests most of the latency is between the guest's pio write and qemu-kvm getting around to handling the event. You mentioned that vhost-net has the same performance issue on this benchmark. I guess a solution for vhost-net may help virtio-ioeventfd and vice versa. Are you happy with this patchset if I remove virtio-net-pci ioeventfd=on|off so only virtio-blk-pci has ioeventfd=on|off (with default on)? For block we've found it to be a win and the initial results looked good for net too. Stefan
[Qemu-devel] Re: [PATCH v5 0/4] virtio: Use ioeventfd for virtqueue notify
For the record, here are the commits to selectively mask virtqueues for ioeventfd and to add -device virtio-net-pci,tx=immediate: http://repo.or.cz/w/qemu-kvm/stefanha.git/shortlog/refs/heads/virtio-ioeventfd-2 I'm posting this in case you want to try it out too. Stefan
Re: [Qemu-devel] [PULL] spice: add qxl device, qmp events + monitor commands.
On 12/09/10 14:33, Gerd Hoffmann wrote: The following changes since commit 138b38b61bf92d4e9588acf934e532499c94e185: ppc: kvm: fix signedness warning (2010-12-08 21:30:19 +0100) are available in the git repository at: git://anongit.freedesktop.org/spice/qemu spice.v23.pull Ping? cheers, Gerd
Re: [Qemu-devel] Re: [PATCH 1/7] usb-linux: introduce a usb_linux_alt_setting function
On 12/09/10 14:16, Gerd Hoffmann wrote: On 11/26/10 19:13, Hans de Goede wrote: The next patch in this series introduces multiple ways to get the alt setting dependent upon usb_fs_type, it is cleaner to put this into its own function. Note that this patch also changes the assumed alt setting in case of an error getting the alt setting to be 0 (a sane default) rather then the interface numberwhich makes no sense. Patch series looks good. Acked-by: Gerd Hoffmann kra...@redhat.com Ping? Any chance this can go in for 0.14? cheers, Gerd
Re: [Qemu-devel] [PATCH 0/5] usb-ccid (v9)
On 12/12/10 19:37, Alon Levy wrote: This patchset adds three new devices, usb-ccid, ccid-card-passthru and ccid-card-emulated, providing a CCID bus, a simple passthru protocol implementing card requiring a client, and a standalone emulated card. Hmm, 'git am' refuses to apply these and complains about a corrupted patch: Applying: libcacard: initial commit after coding style fixes fatal: corrupt patch at line 5839 Is there a git tree somewhere with the patches to pull from? cheers, Gerd
[Qemu-devel] Re: [Spice-devel] RFC; usb redirection protocol
Hi, Thanks for taking the time to read all that! On 12/13/2010 12:21 PM, Gerd Hoffmann wrote: Basic packet structure / communication -- Each packet exchanged between the vm-host and the usb-host starts with a usb_redir_header, followed by an optional command specific header follow by optional additional data. The usb_redir_header each packet starts with looks as follows: struct usb_redir_header { uint32_t command; uint32_t length; } uint32_t id; ? A reply would then carry the id of the request ... That sounds like a good idea (all though for iso streams the packets will be only send in one direction). Given that everything is done over a potentially slow transport in practice the diferentiating between synchroneous and asynchroneous commands may seem odd. The difference is how the usb-host will handle them once received. For synchroneous commands the usb-host will hand the request over to the host os and then *wait* for a response. This means that the vm-host is guaranteed to get an immediate response. Where as for asynchroneous commands to usb-host hands the request over to the host os with the request to let the usb-host process know when the request is done. Hmm. Looks like you are planning for one tcp stream and one thread (on the usb-host side) for each usb device. That will not work very good for usb-over-vnc because there is a single tcp stream only. We could of course multiplex multiple logical usb connections over vnc, but even then blocking on the usb-host side looks bad as this could disrupt other usb devices forwarded over the same connection. The idea is for this protocol to be transport independent (*) so it could be one stream, or it could be multiplexed into another transport. Assuring that there are not too large latencies, and handling things like not blocking for too long when handling multiple devices from the same thread are left to the implementation. It could be that the implementation decides to not handle synchroneous commands synchroneous at all. The main difference I'm trying to make here is between commands which do things to end points which cannot be done while other packets are in flight (like setting configuration, or interface alt setting) and commands (normal packets) of which there can be multiple in flight. I'm open to using a different term then synchroneous and async here. *) Assuming a reliable, ordered transport usb_redir_report_descriptor --- usb_redir_header.command: usb_redir_report_desciptor usb_redir_header.length: sizeof usb device descriptors No command specific header. The command specific additional data contains the entire descriptors for the usb device. A packet of this type is send by the usb-host directly after the hello packet it contains the usb descriptor tables for the usb device. Device addressing isn't done at all in the protocol, i.e. there is a fixed device - connection relation ship? The idea is there is a fixed device - transport pipe relation ship, yes. For the simple tcp server usb-host I plan to implement as first usb-host, the device would be specified on the cmdline while starting the server. When one wants to use multiple devices, this means starting one usb-server per device. For things like vnc and spice I assume there will be some sort of control channel where usb device management is done. To be more precise I expect the vnc client / spice client to have some UI which allows selecting a usb device to plug into the virtual machine. And then the client will setup a transport for this (reserve a channel within its existing stream) and tell the server that it has an usb device at sub channel id #, at which point the server will add a new usb redir device to qemu, passing in a transport object which is connected to this channel. Please let me know what you think of this. Do you know whenever certain low-level usb ops can work with this? I expect most usb devices to work with this, I don't know about really weird ones. Specifically iphone firmware flashing was mentioned on the list. I think that should work, but that is an interesting case about which I don't know enough yet. Also I remember somewhere in the ehci (or xhci?) specs was mentioned with some devices it can be needed to talk to them *before* an bus address is assigned ... If some devices need this, and the setup done there cannot be done by the client machines os, we loose, as device enumeration and address assignment is done at the OS level, and at least under Linux we won't get a chance to talk to the device till after it has been assigned an address. Regards, Hans
[Qemu-devel] Re: [PATCH v5 0/4] virtio: Use ioeventfd for virtqueue notify
On Wed, Dec 15, 2010 at 11:42:12AM +, Stefan Hajnoczi wrote: On Mon, Dec 13, 2010 at 6:52 PM, Michael S. Tsirkin m...@redhat.com wrote: On Mon, Dec 13, 2010 at 05:57:28PM +, Stefan Hajnoczi wrote: On Mon, Dec 13, 2010 at 4:28 PM, Stefan Hajnoczi stefa...@gmail.com wrote: On Mon, Dec 13, 2010 at 4:12 PM, Michael S. Tsirkin m...@redhat.com wrote: On Mon, Dec 13, 2010 at 03:27:06PM +, Stefan Hajnoczi wrote: On Mon, Dec 13, 2010 at 1:36 PM, Michael S. Tsirkin m...@redhat.com wrote: On Mon, Dec 13, 2010 at 03:35:38PM +0200, Michael S. Tsirkin wrote: On Mon, Dec 13, 2010 at 01:11:27PM +, Stefan Hajnoczi wrote: Fresh results: 192.168.0.1 - host (runs netperf) 192.168.0.2 - guest (runs netserver) host$ src/netperf -H 192.168.0.2 -- -m 200 ioeventfd=on TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2 (192.168.0.2) port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 200 10.00 1759.25 ioeventfd=off TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2 (192.168.0.2) port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 200 10.00 1757.15 The results vary approx +/- 3% between runs. Invocation: $ x86_64-softmmu/qemu-system-x86_64 -m 4096 -enable-kvm -netdev type=tap,id=net0,ifname=tap0,script=no,downscript=no -device virtio-net-pci,netdev=net0,ioeventfd=on|off -vnc :0 -drive if=virtio,cache=none,file=$HOME/rhel6-autobench-raw.img I am running qemu.git with v5 patches, based off 36888c6335422f07bbc50bf3443a39f24b90c7c6. Host: 1 Quad-Core AMD Opteron(tm) Processor 2350 @ 2 GHz 8 GB RAM RHEL 6 host Next I will try the patches on latest qemu-kvm.git Stefan One interesting thing is that I put virtio-net earlier on command line. Sorry I mean I put it after disk, you put it before. I can't find a measurable difference when swapping -drive and -netdev. One other concern I have is that we are apparently using ioeventfd for all VQs. E.g. for virtio-net we probably should not use it for the control VQ - it's a waste of resources. One option is a per-device (block, net, etc) bitmap that masks out virtqueues. Is that something you'd like to see? I'm tempted to mask out the RX vq too and see how that affects the qemu-kvm.git specific issue. As expected, the rx virtqueue is involved in the degradation. I enabled ioeventfd only for the TX virtqueue and got the same good results as userspace virtio-net. When I enable only the rx virtqueue, performs decreases as we've seen above. Stefan Interesting. In particular this implies something's wrong with the queue: we should not normally be getting notifications from rx queue at all. Is it running low on buffers? Does it help to increase the vq size? Any other explanation? I made a mistake, it is the *tx* vq that causes reduced performance on short packets with ioeventfd. I double-checked the results and the rx vq doesn't affect performance. Initially I thought the fix would be to adjust the tx mitigation mechanism since ioeventfd does its own mitigation of sorts. Multiple eventfd signals will be coalesced into one qemu-kvm event handler call if qemu-kvm didn't have a chance to handle the first event before the eventfd was signalled again. I added -device virtio-net-pci tx=immediate to flush the TX queue immediately instead of scheduling a BH or timer. Unfortunately this had little measurable effect and performance stayed the same. This suggests most of the latency is between the guest's pio write and qemu-kvm getting around to handling the event. You mentioned that vhost-net has the same performance issue on this benchmark. I guess a solution for vhost-net may help virtio-ioeventfd and vice versa. Are you happy with this patchset if I remove virtio-net-pci ioeventfd=on|off so only virtio-blk-pci has ioeventfd=on|off (with default on)? For block we've found it to be a win and the initial results looked good for net too. Stefan I'm concerned that the tests were done on qemu.git. Could you check block with qemu-kvm too please?
[Qemu-devel] Re: [Spice-devel] RFC; usb redirection protocol
Do you know whenever certain low-level usb ops can work with this? I expect most usb devices to work with this, I don't know about really weird ones. Specifically iphone firmware flashing was mentioned on the list. I think that should work, but that is an interesting case about which I don't know enough yet. If that can help you, I may try to capture the usb activities while doing the next firmware upgrade. Unfortunately, it'll be under Windows so I don't know what tool I could use for that. If I can confirm that the upgrade works with Oracle Virtualbox, I could also try that, and capture it at guest /and/ host level. I heard that next update should happen within a few week, so I'll keep you in touch about it. Frederic.
Re: [Qemu-devel] [PATCH 1/5] usb-ccid: add CCID bus
On 12/12/10 19:37, Alon Levy wrote: A CCID device is a smart card reader. It is a USB device, defined at [1]. This patch introduces the usb-ccid device that is a ccid bus. Next patches will introduce two card types to use it, a passthru card and an emulated card. Looks good overall, just some minor nits / questions: +struct CCIDCardState { +DeviceState qdev; Add uint32_t slot here? Also adding a bus property for it would be good, even though it can't be anything but '0' right now, to prepare the interfaces for the day when we'll support more than a single card slot and thus can attach multiple cards to the ccid bus. +static VMStateDescription ccid_vmstate = { +.name = CCID_DEV_NAME, +.version_id = 1, +.minimum_version_id = 1, +.post_load = ccid_post_load, +.pre_save = ccid_pre_save, +.fields = (VMStateField []) { +VMSTATE_STRUCT(dev, USBCCIDState, 1, usb_device_vmstate, USBDevice), Can we please disable this for now? USB migration support doesn't exist, and when we add it chances are that it isn't compatible with what you have here. +static struct USBDeviceInfo ccid_info = { +.product_desc = QEMU USB CCID, +.qdev.name = CCID_DEV_NAME, +.qdev.size = sizeof(USBCCIDState), +.qdev.vmsd =ccid_vmstate, Best leave the vmstate structs in the code (with a comment) and just zap this line which winds it up. thanks, Gerd
Re: [Qemu-devel] [PATCH 2/5] ccid: add passthru card device
Hi, Commit message could be a bit more verbose, even if it is redundant with what the patch 1/5 message says. Also short usage information (or pointer to the file patch 5/5 adds) would be good. Patch itself looks fine to me. cheers, Gerd
Re: [Qemu-devel] [PATCH 3/5] libcacard: initial commit after coding style fixes
On 12/12/10 19:37, Alon Levy wrote: From: Robert Relyearrel...@redhat.com Signed-off-by: Alon Levyal...@redhat.com Commit message could be more verbose too. This is the smart card emulation library, right? Patch looks good on a quick glance, I'm not a crypto expert though. Acked-by: Gerd Hoffmann kra...@redhat.com cheers, Gerd
Re: [Qemu-devel] [PATCH 4/5] ccid: add ccid-card-emulated device (v2)
On 12/12/10 19:37, Alon Levy wrote: changes from v1: remove stale comments, use only c-style comments bugfix, forgot to set recv_len change reader name to 'Virtual Reader' Signed-off-by: Alon Levyal...@redhat.com Should be more verbose too. Please explain what the threads are used for and why they are needed (/me guesses that libcacard wants run async). thanks, Gerd
Re: [Qemu-devel] [PATCH 5/5] smartcard: add docs
Hi, docs/libcacard.txt | 483 Move this to the libcacard patch and add a pointer to it to the commit message? cheers, Gerd
[Qemu-devel] Re: [PATCH v5 0/4] virtio: Use ioeventfd for virtqueue notify
On Wed, Dec 15, 2010 at 12:14 PM, Michael S. Tsirkin m...@redhat.com wrote: On Wed, Dec 15, 2010 at 11:42:12AM +, Stefan Hajnoczi wrote: On Mon, Dec 13, 2010 at 6:52 PM, Michael S. Tsirkin m...@redhat.com wrote: On Mon, Dec 13, 2010 at 05:57:28PM +, Stefan Hajnoczi wrote: On Mon, Dec 13, 2010 at 4:28 PM, Stefan Hajnoczi stefa...@gmail.com wrote: On Mon, Dec 13, 2010 at 4:12 PM, Michael S. Tsirkin m...@redhat.com wrote: On Mon, Dec 13, 2010 at 03:27:06PM +, Stefan Hajnoczi wrote: On Mon, Dec 13, 2010 at 1:36 PM, Michael S. Tsirkin m...@redhat.com wrote: On Mon, Dec 13, 2010 at 03:35:38PM +0200, Michael S. Tsirkin wrote: On Mon, Dec 13, 2010 at 01:11:27PM +, Stefan Hajnoczi wrote: Fresh results: 192.168.0.1 - host (runs netperf) 192.168.0.2 - guest (runs netserver) host$ src/netperf -H 192.168.0.2 -- -m 200 ioeventfd=on TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2 (192.168.0.2) port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 200 10.00 1759.25 ioeventfd=off TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2 (192.168.0.2) port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 200 10.00 1757.15 The results vary approx +/- 3% between runs. Invocation: $ x86_64-softmmu/qemu-system-x86_64 -m 4096 -enable-kvm -netdev type=tap,id=net0,ifname=tap0,script=no,downscript=no -device virtio-net-pci,netdev=net0,ioeventfd=on|off -vnc :0 -drive if=virtio,cache=none,file=$HOME/rhel6-autobench-raw.img I am running qemu.git with v5 patches, based off 36888c6335422f07bbc50bf3443a39f24b90c7c6. Host: 1 Quad-Core AMD Opteron(tm) Processor 2350 @ 2 GHz 8 GB RAM RHEL 6 host Next I will try the patches on latest qemu-kvm.git Stefan One interesting thing is that I put virtio-net earlier on command line. Sorry I mean I put it after disk, you put it before. I can't find a measurable difference when swapping -drive and -netdev. One other concern I have is that we are apparently using ioeventfd for all VQs. E.g. for virtio-net we probably should not use it for the control VQ - it's a waste of resources. One option is a per-device (block, net, etc) bitmap that masks out virtqueues. Is that something you'd like to see? I'm tempted to mask out the RX vq too and see how that affects the qemu-kvm.git specific issue. As expected, the rx virtqueue is involved in the degradation. I enabled ioeventfd only for the TX virtqueue and got the same good results as userspace virtio-net. When I enable only the rx virtqueue, performs decreases as we've seen above. Stefan Interesting. In particular this implies something's wrong with the queue: we should not normally be getting notifications from rx queue at all. Is it running low on buffers? Does it help to increase the vq size? Any other explanation? I made a mistake, it is the *tx* vq that causes reduced performance on short packets with ioeventfd. I double-checked the results and the rx vq doesn't affect performance. Initially I thought the fix would be to adjust the tx mitigation mechanism since ioeventfd does its own mitigation of sorts. Multiple eventfd signals will be coalesced into one qemu-kvm event handler call if qemu-kvm didn't have a chance to handle the first event before the eventfd was signalled again. I added -device virtio-net-pci tx=immediate to flush the TX queue immediately instead of scheduling a BH or timer. Unfortunately this had little measurable effect and performance stayed the same. This suggests most of the latency is between the guest's pio write and qemu-kvm getting around to handling the event. You mentioned that vhost-net has the same performance issue on this benchmark. I guess a solution for vhost-net may help virtio-ioeventfd and vice versa. Are you happy with this patchset if I remove virtio-net-pci ioeventfd=on|off so only virtio-blk-pci has ioeventfd=on|off (with default on)? For block we've found it to be a win and the initial results looked good for net too. Stefan I'm concerned that the tests were done on qemu.git. Could you check block with qemu-kvm too please? The following results show qemu-kvm with virtio-ioeventfd v3 for both aio=native and aio=threads: http://wiki.qemu.org/Features/VirtioIoeventfd Stefan
Re: [Qemu-devel] [PATCH 0/5] usb-ccid (v9)
On Wed, Dec 15, 2010 at 01:01:56PM +0100, Gerd Hoffmann wrote: On 12/12/10 19:37, Alon Levy wrote: This patchset adds three new devices, usb-ccid, ccid-card-passthru and ccid-card-emulated, providing a CCID bus, a simple passthru protocol implementing card requiring a client, and a standalone emulated card. Hmm, 'git am' refuses to apply these and complains about a corrupted patch: Applying: libcacard: initial commit after coding style fixes fatal: corrupt patch at line 5839 Is there a git tree somewhere with the patches to pull from? git://anongit.freedesktop.org/~alon/qemu usb_ccid.v9 It contains some other patches (usb attach/detach and qtree extra parameter patches), so cherry-pick from it. I'll make a straight branch too after I address your comments. cheers, Gerd
Re: [Qemu-devel] Re: sparc OBP psr value
Forget this. My test was flawed because I still wasn't comparing apples to apples. I was comparing the pre-bootloader state to the post-bootloader state, and it seems that OBP, even on a real machine, shows all the registers as zero before it runs any program. However, I still think there's something wrong with the RT625 version somewhere that causes it to show as a 605e in the banner, but I don't have the real thing to compare against. What I do have for comparison is a 150MHz hyperSparc which shows up as an RT626. Bob
Re: [Qemu-devel] [PATCH 26/27] blockdev: Collect block device code in new blockdev.c
On Fri, Jun 4, 2010 at 6:33 PM, Kevin Wolf kw...@redhat.com wrote: From: Markus Armbruster arm...@redhat.com Anything that moves hundreds of lines out of vl.c can't be all bad. I know I'm late for this train, but why does this patch change the license of the former vl.c code from a BSD-like one to GPLv2? It seems to be contradicting the vl.c paragraph: * The above copyright notice and this permission notice shall be included in * all copies or substantial portions of the Software. -- Regards, Artyom Tarasenko solaris/sparc under qemu blog: http://tyom.blogspot.com/
Re: [Qemu-devel] Re: [Spice-devel] RFC; usb redirection protocol
On Wed, Dec 15, 2010 at 01:15:58PM +0100, Hans de Goede wrote: Hi, Thanks for taking the time to read all that! On 12/13/2010 12:21 PM, Gerd Hoffmann wrote: Basic packet structure / communication -- Each packet exchanged between the vm-host and the usb-host starts with a usb_redir_header, followed by an optional command specific header follow by optional additional data. The usb_redir_header each packet starts with looks as follows: struct usb_redir_header { uint32_t command; uint32_t length; } uint32_t id; ? A reply would then carry the id of the request ... That sounds like a good idea (all though for iso streams the packets will be only send in one direction). Given that everything is done over a potentially slow transport in practice the diferentiating between synchroneous and asynchroneous commands may seem odd. The difference is how the usb-host will handle them once received. For synchroneous commands the usb-host will hand the request over to the host os and then *wait* for a response. This means that the vm-host is guaranteed to get an immediate response. Where as for asynchroneous commands to usb-host hands the request over to the host os with the request to let the usb-host process know when the request is done. Hmm. Looks like you are planning for one tcp stream and one thread (on the usb-host side) for each usb device. That will not work very good for usb-over-vnc because there is a single tcp stream only. We could of course multiplex multiple logical usb connections over vnc, but even then blocking on the usb-host side looks bad as this could disrupt other usb devices forwarded over the same connection. The idea is for this protocol to be transport independent (*) so it could be one stream, or it could be multiplexed into another transport. Assuring that there are not too large latencies, and handling things like not blocking for too long when handling multiple devices from the same thread are left to the implementation. It could be that the implementation decides to not handle synchroneous commands synchroneous at all. The main difference I'm trying to make here is between commands which do things to end points which cannot be done while other packets are in flight (like setting configuration, or interface alt setting) and commands (normal packets) of which there can be multiple in flight. I'm open to using a different term then synchroneous and async here. *) Assuming a reliable, ordered transport usb_redir_report_descriptor --- usb_redir_header.command: usb_redir_report_desciptor usb_redir_header.length: sizeof usb device descriptors No command specific header. The command specific additional data contains the entire descriptors for the usb device. A packet of this type is send by the usb-host directly after the hello packet it contains the usb descriptor tables for the usb device. Device addressing isn't done at all in the protocol, i.e. there is a fixed device - connection relation ship? The idea is there is a fixed device - transport pipe relation ship, yes. For the simple tcp server usb-host I plan to implement as first usb-host, the device would be specified on the cmdline while starting the server. When one wants to use multiple devices, this means starting one usb-server per device. For things like vnc and spice I assume there will be some sort of control channel where usb device management is done. To be more precise I expect the vnc client / spice client to have some UI which allows selecting a usb device to plug into the virtual machine. And then the client will setup a transport for this (reserve a channel within its existing stream) and tell the server that it has an usb device at sub channel id #, at which point the server will add a new usb redir device to qemu, passing in a transport object which is connected to this channel. Please let me know what you think of this. Do you know whenever certain low-level usb ops can work with this? I expect most usb devices to work with this, I don't know about really weird ones. Specifically iphone firmware flashing was mentioned on the list. I think that should work, but that is an interesting case about which I don't know enough yet. Also I remember somewhere in the ehci (or xhci?) specs was mentioned with some devices it can be needed to talk to them *before* an bus address is assigned ... If some devices need this, and the setup done there cannot be done by the client machines os, we loose, as device enumeration and address assignment is done at the OS level, and at least under Linux we won't get a chance to talk to the device till after it has been assigned an address. I know of devices that will enumerate twice, first as one device, then after a certain setup exchange as another. But that seems to be covered by the
Re: [Qemu-devel] Re: [PATCH] PCI: Bus number from the bridge, not the device
On Wed, 2010-12-15 at 11:56 +0200, Michael S. Tsirkin wrote: On Tue, Dec 14, 2010 at 11:34:53AM -0700, Alex Williamson wrote: On Tue, 2010-12-14 at 14:26 +0200, Michael S. Tsirkin wrote: On Mon, Dec 13, 2010 at 10:04:24PM -0700, Alex Williamson wrote: I've only ever seen config[PCI_SECONDARY_BUS] be non-zero for an assigned device, so I'm pretty sure we're not going to hurt migration, but the code is clearly wrong and I'd like to make sure we don't trip on a migration failure for a minor device config space change. Which reminds me: maybe just mark nested bridges as non-migrateable for now? Care writing such a patch? Hmm, this is trickier than it sounds. Hmm, since 0 is put in the path instead of the bridge number, will the correct bridge be restored? We're really only broken wrt migration if a device under a bridge calls qemu_ram_alloc. I guess there's more broken-ness. What exactly breaks qemu_ram_alloc? You're right, it's more broken than that. Anything that calls get_dev_path is broken for migration of bridges since the path is determined before the guest updates bus numbers. That includes qemu_ram_alloc and vmstate. I was only looking at the qemu_ram_alloc side. So perhaps the right answer, for the moment, is to block migration if there's a p2p bridge. Alex Any device is free to do this, but typically it only happens via pci_add_option_rom() (not counting vga as typical). So maybe the better approach for now is to prevent the problem by disallowing option ROMs for devices below a bridge. We obviously risk devices coming along that allocate RAM on their own, but we could still allow the most common issue with almost no lost functionality (assuming no one wants to boot off that nested device). Thoughts? Thanks, Alex
Re: [Qemu-devel] [PATCH 26/27] blockdev: Collect block device code in new blockdev.c
Am 15.12.2010 16:04, schrieb Artyom Tarasenko: On Fri, Jun 4, 2010 at 6:33 PM, Kevin Wolf kw...@redhat.com wrote: From: Markus Armbruster arm...@redhat.com Anything that moves hundreds of lines out of vl.c can't be all bad. I know I'm late for this train, but why does this patch change the license of the former vl.c code from a BSD-like one to GPLv2? It seems to be contradicting the vl.c paragraph: * The above copyright notice and this permission notice shall be included in * all copies or substantial portions of the Software. Right, I think this needs to fixed. Markus, can you send a patch? Kevin
[Qemu-devel] Re: [PATCH] rtl8139: IO memory is not part of vmstate
On Wed, 2010-12-15 at 12:07 +0200, Michael S. Tsirkin wrote: On Tue, Dec 14, 2010 at 08:41:55AM -0700, Alex Williamson wrote: On Tue, 2010-12-14 at 14:32 +0200, Michael S. Tsirkin wrote: On Mon, Dec 13, 2010 at 10:00:48PM -0700, Alex Williamson wrote: On Tue, 2010-12-14 at 06:43 +0200, Michael S. Tsirkin wrote: On Mon, Dec 13, 2010 at 12:15:08PM -0700, Alex Williamson wrote: On Mon, 2010-12-13 at 21:06 +0200, Michael S. Tsirkin wrote: On Mon, Dec 13, 2010 at 11:59:16AM -0700, Alex Williamson wrote: On Mon, 2010-12-13 at 20:54 +0200, Michael S. Tsirkin wrote: On Mon, Dec 13, 2010 at 11:00:44AM -0700, Alex Williamson wrote: On Mon, 2010-12-13 at 19:50 +0200, Michael S. Tsirkin wrote: On Mon, Dec 13, 2010 at 10:43:22AM -0700, Alex Williamson wrote: So, unfortunately, I stand by my original patch. What about the one that put -1 in saved index for a hotplugged device? There are still examples that don't work even without hotplug (example 2 and example 3 after the reboot). That hack limits the damage, but still leaves a latent bug for reboot and doesn't address the non-hotplug scenarios. So, I don't think it's worthwhile to pursue, and we shouldn't pretend we can use it to avoid bumping the version_id. Thanks, Alex I guess when we bump it we tell users: migration is completely borken to the old version, don't even try it. Is there a way for libvirt to discover such incompatibilities and avoid the migration? I don't know if libvirt has a way to query this in advance. If a migration is attempted, the target will report: savevm: unsupported version 5 for ':00:03.0/rtl8139' v4 And the source will continue running. We waste plenty of bits getting to that point, Yes, this happens after all of memory has been migrated. Better late than never :^\ One other question: can we do the same by creating a new (empty) section? As was discussed in the past this is easier for downstreams to cherry-pick. The only way I can think to do that would be to have a subsection that is always included, but saves no data. That would force a failure on new-old migration, but I don't think it really matches the intended purpose of subsections and feels like it's adding cruft for no gain. Maybe I'm missing something. Juan, is there any advantage to trapping this in a subsection? Thanks, Alex Maybe in this particular case the advantage is minimal. But it seems easier to stick to a rule of no more version bumps than argue about each case. Do we have such a rule? I think it's easier to have a hard rule than start arguing whether each change is a subsection or not for each patch. We need to have these discussions to make sure we're doing everything we can do preserve the ABI. AIUI, subsections are only useful in helping us with that if there's a worthwhile percentage of the time that the subsection isn't needed in the migration stream. Even then it's unclear to me how high level tools are supposed to deal with this. Do they re-try the migration, hoping the subsection was only needed temporarily? In this case, the subsection would be needed all the time, so I don't think it's applicable and I don't see a dogmatic rule about using them to have any value. What is the downside? Some empty structures to do things in a completely convoluted way. What's the upside? If we have a subsection who's needed function is return 1, I think that's a good indication that it's not appropriate for a subsection and the end result is equivalent to bumping the main driver vmstate version. So if you feel strongly about it, go ahead, I'm just concerned we'll now need to argue version or subsection for each migration-related change. Subsections don't magically preserve the migration ABI, so we still need to have these discussions It's convoluted to try to hide a one-way upgrade in a subsection. It may be convoluted but I think it works better with downstreams. When we'll bump the version again in the future, it should be possible for a downstream to pick up just the next change, and skip this one. But that's a general comment. In this case, I expect all downstream to be able to pick this bugfix with no real downsides. If a subsection can maintain the migration ABI in a worthwhile percentage of use cases and high level tools are signed up to understand whether to retry a migration blocked by a subsection describing an inflight DMA vs avoiding a basic incompatibility, sure, let's use them. I don't think
[Qemu-devel] Re: [PULL 00/14] Block patches
Am 09.12.2010 12:09, schrieb Kevin Wolf: The following changes since commit 138b38b61bf92d4e9588acf934e532499c94e185: ppc: kvm: fix signedness warning (2010-12-08 21:30:19 +0100) are available in the git repository at: git://repo.or.cz/qemu/kevin.git for-anthony Christian Brunner (1): ceph/rbd block driver for qemu-kvm Jes Sorensen (8): Add missing tracing to qemu_mallocz() Use qemu_mallocz() instead of calloc() in img_convert() img_convert(): Only try to free bs[] entries if bs is valid. Consolidate printing of block driver options Fix formatting and missing braces in qemu-img.c Fail if detecting an unknown option Make error handling more consistent in img_create() and img_resize() qemu-img: Deprecate obsolete -6 and -e options Stefan Hajnoczi (5): block: Make bdrv_create_file() ':' handling consistent qemu-option: Don't reinvent append_option_parameters() qemu-option: Fix parse_option_parameters() documentation typo qemu-img: Free option parameter lists in img_create() qemu-img: Fail creation if backing format is invalid Makefile.objs |1 + block.c |2 +- block/rbd.c | 1059 + block/rbd_types.h | 71 block_int.h |1 - configure | 52 +++ qemu-img.c| 247 - qemu-malloc.c |5 +- qemu-option.c | 13 +- 9 files changed, 1344 insertions(+), 107 deletions(-) create mode 100644 block/rbd.c create mode 100644 block/rbd_types.h Ping? Kevin
Re: [Qemu-devel] [PATCH v2 1/1] qemu-img.c: Clean up handling of image size in img_create()
jes.soren...@redhat.com writes: From: Jes Sorensen jes.soren...@redhat.com This cleans up the handling of image size in img_create() by parsing the value early, and then only setting it once if a value has been added as the last argument to the command line. Signed-off-by: Jes Sorensen jes.soren...@redhat.com --- qemu-img.c | 14 -- 1 files changed, 8 insertions(+), 6 deletions(-) Patch conflicts with commit c2abccec. diff --git a/qemu-img.c b/qemu-img.c index d146d8c..9986004 100644 --- a/qemu-img.c +++ b/qemu-img.c @@ -282,6 +282,7 @@ static int add_old_style_options(const char *fmt, QEMUOptionParameter *list, static int img_create(int argc, char **argv) { int c, ret = 0; +uint64_t img_size = -1; const char *fmt = raw; const char *base_fmt = NULL; const char *filename; @@ -329,6 +330,11 @@ static int img_create(int argc, char **argv) } filename = argv[optind++]; +/* Get image size, if specified */ +if (optind argc) { +img_size = strtosz(argv[optind++], NULL); strtosz() can fail. More below. +} + if (options !strcmp(options, ?)) { ret = print_block_option_help(filename, fmt); goto out; @@ -356,7 +362,8 @@ static int img_create(int argc, char **argv) /* Create parameter list with default values */ param = parse_option_parameters(, create_options, param); -set_option_parameter_int(param, BLOCK_OPT_SIZE, -1); + +set_option_parameter_int(param, BLOCK_OPT_SIZE, img_size); /* Parse -o options */ if (options) { @@ -368,11 +375,6 @@ static int img_create(int argc, char **argv) } } -/* Add size to parameters */ -if (optind argc) { -set_option_parameter(param, BLOCK_OPT_SIZE, argv[optind++]); -} - /* Add old-style options to parameters */ ret = add_old_style_options(fmt, param, base_filename, base_fmt); if (ret 0) { This switches parsing of the size argument from parse_option_size() (via set_option_parameter()) to strtosz(). I'm fine with that, but: * Before: $ qemu-img create xxx xxx Parameter 'size' expects a size You may use k, M, G or T suffixes for kilobytes, megabytes, gigabytes and terabytes. qemu-img: Image creation needs a size parameter * After: $ qemu-img create xxx xxx qemu-img: Image creation needs a size parameter Intentional?
[Qemu-devel] Re: [PATCH 0/3] Re-factor img_create() and add live snapshots
Am 13.12.2010 08:32, schrieb jes.soren...@redhat.com: From: Jes Sorensen jes.soren...@redhat.com Hi, This set of patches re-factors img_create() and moves the core part of it into block.c so it can be accessed from qemu as well as qemu-img. The second patch adds basic live snapshots support to the code, however only snapshots to external QCOW2 images is supported for now. QED support should be trivial once the QED patches go into upstream. The last patch fixes a small gotcha which is present in the old code as well. Try to catch cases where a user tries to create an image with itself as the backing file. QEMU does 'interesting' things when you do this. Many thanks to Kevin for his help with block layer internals! Cheers, Jes Jes Sorensen (3): qemu-img.c: Re-factor img_create() Introduce do_snapshot_blkdev() and monitor command to handle it. Prevent creating an image with the same filename as backing file This doesn't seem to apply cleanly any more. Can you please rebase (ideally on top of my block branch)? Kevin
[Qemu-devel] Re: [PATCH 2/3] Introduce do_snapshot_blkdev() and monitor command to handle it.
Am 13.12.2010 08:32, schrieb jes.soren...@redhat.com: From: Jes Sorensen jes.soren...@redhat.com The monitor command is: snapshot_blkdev device [snapshot-file] [format] Default format is qcow2. For now snapshots without a snapshot-file, eg internal snapshots, are not supported. Signed-off-by: Jes Sorensen jes.soren...@redhat.com --- blockdev.c | 61 +++ blockdev.h |1 + hmp-commands.hx | 19 + 3 files changed, 81 insertions(+), 0 deletions(-) diff --git a/blockdev.c b/blockdev.c index f6ac439..1ea24d7 100644 --- a/blockdev.c +++ b/blockdev.c @@ -514,6 +514,67 @@ void do_commit(Monitor *mon, const QDict *qdict) } } +int do_snapshot_blkdev(Monitor *mon, const QDict *qdict, QObject **ret_data) +{ +const char *device = qdict_get_str(qdict, device); +const char *filename = qdict_get_try_str(qdict, snapshot_file); +const char *format = qdict_get_try_str(qdict, format); +const char format_qcow2[] = qcow2; +BlockDriverState *bs; +BlockDriver *drv, *proto_drv; +int ret = 0; +int flags; + +bs = bdrv_find(device); +if (!bs) { +qerror_report(QERR_DEVICE_NOT_FOUND, device); +ret = -1; +goto out; +} + +if (!format) { +format = format_qcow2; +} + +drv = bdrv_find_format(format); +if (!drv) { +qerror_report(QERR_INVALID_BLOCK_FORMAT, format); +ret = -1; +goto out; +} + +proto_drv = bdrv_find_protocol(filename); +if (!proto_drv) { +qerror_report(QERR_INVALID_BLOCK_FORMAT, format); +ret = -1; +goto out; +} + +ret = bdrv_img_create(filename, format, bs-filename, + bs-drv-format_name, NULL, -1, bs-open_flags); +if (ret) { +goto out; +} + +qemu_aio_flush(); +bdrv_flush(bs); + +flags = bs-open_flags; +bdrv_close(bs); +ret = bdrv_open(bs, filename, flags, drv); +/* + * If reopening the image file we just created fails, we really + * are in trouble :( + */ +assert(ret == 0); +out: +if (ret) { +ret = 1; I seem to remember that errors are always -1 for monitor commands. Kevin
Re: [Qemu-devel] [PATCH v2 1/1] qemu-img.c: Clean up handling of image size in img_create()
On 12/15/10 17:47, Markus Armbruster wrote: jes.soren...@redhat.com writes: From: Jes Sorensen jes.soren...@redhat.com This cleans up the handling of image size in img_create() by parsing the value early, and then only setting it once if a value has been added as the last argument to the command line. Signed-off-by: Jes Sorensen jes.soren...@redhat.com --- qemu-img.c | 14 -- 1 files changed, 8 insertions(+), 6 deletions(-) Patch conflicts with commit c2abccec. ARGH :( This switches parsing of the size argument from parse_option_size() (via set_option_parameter()) to strtosz(). I'm fine with that, but: * Before: $ qemu-img create xxx xxx Parameter 'size' expects a size You may use k, M, G or T suffixes for kilobytes, megabytes, gigabytes and terabytes. qemu-img: Image creation needs a size parameter * After: $ qemu-img create xxx xxx qemu-img: Image creation needs a size parameter Intentional? This was addressed in the later revision when I introduced strtosz_suffix() Cheers, Jes
[Qemu-devel] Re: [PATCH 2/3] Introduce do_snapshot_blkdev() and monitor command to handle it.
On 12/15/10 17:55, Kevin Wolf wrote: +int do_snapshot_blkdev(Monitor *mon, const QDict *qdict, QObject **ret_data) +{ +const char *device = qdict_get_str(qdict, device); +const char *filename = qdict_get_try_str(qdict, snapshot_file); +const char *format = qdict_get_try_str(qdict, format); +const char format_qcow2[] = qcow2; +BlockDriverState *bs; +BlockDriver *drv, *proto_drv; +int ret = 0; +int flags; + +bs = bdrv_find(device); +if (!bs) { +qerror_report(QERR_DEVICE_NOT_FOUND, device); +ret = -1; +goto out; +} + +if (!format) { +format = format_qcow2; +} + +drv = bdrv_find_format(format); +if (!drv) { +qerror_report(QERR_INVALID_BLOCK_FORMAT, format); +ret = -1; +goto out; +} + +proto_drv = bdrv_find_protocol(filename); +if (!proto_drv) { +qerror_report(QERR_INVALID_BLOCK_FORMAT, format); +ret = -1; +goto out; +} + +ret = bdrv_img_create(filename, format, bs-filename, + bs-drv-format_name, NULL, -1, bs-open_flags); +if (ret) { +goto out; +} + +qemu_aio_flush(); +bdrv_flush(bs); + +flags = bs-open_flags; +bdrv_close(bs); +ret = bdrv_open(bs, filename, flags, drv); +/* + * If reopening the image file we just created fails, we really + * are in trouble :( + */ +assert(ret == 0); +out: +if (ret) { +ret = 1; I seem to remember that errors are always -1 for monitor commands. I mapped it after something else, but admitted I cannot remember where - can someone clarify? Cheers, Jes
[Qemu-devel] Re: [PATCH 0/3] Re-factor img_create() and add live snapshots
On 12/15/10 17:52, Kevin Wolf wrote: Am 13.12.2010 08:32, schrieb jes.soren...@redhat.com: Jes Sorensen (3): qemu-img.c: Re-factor img_create() Introduce do_snapshot_blkdev() and monitor command to handle it. Prevent creating an image with the same filename as backing file This doesn't seem to apply cleanly any more. Can you please rebase (ideally on top of my block branch)? Kevin Darn! I'll try and rebase it tomorrow. Thanks, Jes
[Qemu-devel] Re: [PATCH] rtl8139: IO memory is not part of vmstate
On 12/15/2010 11:00 AM, Michael S. Tsirkin wrote: Indeed, subsections are for data that is rarely needed so that there's some chance (sometimes ~100%) of migration working seemlessly. If a subsection arrives that qemu does not know about, won't migratin fail? Yes, that's why rarely needed = some high chance of migration working (though no certainty). In this case it's either no-bump-and-live-with-the-consequences, or changing the version id. This was discussed to death already. version ids have the problem that they don't play nicely with downstreams. Downstream version bumps don't play nicely with upstream, so downstream does have a reason for always-necessary subsections. But upstream can bump the version id as much as they care. Paolo
[Qemu-devel] Re: [PATCH v3] qemu, qmp: convert do_inject_nmi() to QObject, QError
On Wed, 15 Dec 2010 17:49:27 +0800 Lai Jiangshan la...@cn.fujitsu.com wrote: Convert do_inject_nmi() to QObject, QError, we need to use it(via libvirt). changed from v1 Add document. Add error handling when the cpu index is invalid. changed from v2 use QERR_INVALID_PARAMETER_VALUE as Markus suggest. Signed-off-by: Lai Jiangshan la...@cn.fujitsu.com --- diff --git a/hmp-commands.hx b/hmp-commands.hx index 23024ba..f86d9fe 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -724,7 +724,8 @@ ETEXI .args_type = cpu_index:i, .params = cpu, .help = inject an NMI on the given CPU, -.mhandler.cmd = do_inject_nmi, +.user_print = monitor_user_noop, +.mhandler.cmd_new = do_inject_nmi, }, #endif STEXI diff --git a/monitor.c b/monitor.c index ec31eac..3e33a96 100644 --- a/monitor.c +++ b/monitor.c @@ -2119,7 +2119,7 @@ static void do_wav_capture(Monitor *mon, const QDict *qdict) #endif #if defined(TARGET_I386) -static void do_inject_nmi(Monitor *mon, const QDict *qdict) +static int do_inject_nmi(Monitor *mon, const QDict *qdict, QObject **ret_data) { CPUState *env; int cpu_index = qdict_get_int(qdict, cpu_index); @@ -2127,8 +2127,12 @@ static void do_inject_nmi(Monitor *mon, const QDict *qdict) for (env = first_cpu; env != NULL; env = env-next_cpu) if (env-cpu_index == cpu_index) { cpu_interrupt(env, CPU_INTERRUPT_NMI); -break; +return 0; } + +qerror_report(QERR_INVALID_PARAMETER_VALUE, cpu_index, + a CPU number); +return -1; } #endif diff --git a/qmp-commands.hx b/qmp-commands.hx index e5f157f..fcb6bf2 100644 --- a/qmp-commands.hx +++ b/qmp-commands.hx @@ -429,6 +429,33 @@ Example: EQMP +#if defined(TARGET_I386) +{ +.name = inject_nmi, +.args_type = cpu_index:i, +.params = cpu, +.help = inject an NMI on the given CPU, +.user_print = monitor_user_noop, +.mhandler.cmd_new = do_inject_nmi, +}, +#endif +SQMP +inject_nmi +-- + +Inject an NMI on the given CPU (x86 only). + +Arguments: + +- cpu_index: the index of the CPU to be injected NMI (json-int) Please, use cpu-index, that's what we're using for the human-monitor-command. Avi, Anthony, can you please ACK this new command? + +Example: + +- { execute: inject_nmi, arguments: { cpu_index: 0 } } +- { return: {} } + +EQMP + { .name = migrate, .args_type = detach:-d,blk:-b,inc:-i,uri:s,
Re: [Qemu-devel] [PATCH v3] qemu, qmp: convert do_inject_nmi() to QObject, QError
On Wed, 15 Dec 2010 11:49:23 +0100 Markus Armbruster arm...@redhat.com wrote: Lai Jiangshan la...@cn.fujitsu.com writes: Convert do_inject_nmi() to QObject, QError, we need to use it(via libvirt). changed from v1 Add document. Add error handling when the cpu index is invalid. changed from v2 use QERR_INVALID_PARAMETER_VALUE as Markus suggest. Signed-off-by: Lai Jiangshan la...@cn.fujitsu.com A note on commit messages: The commit message should describe the current version of the patch. Don't repeat the subject line in the body. Patch history is very useful for review, but usually uninteresting once the patch is committed. Thus, it's best to put it after the --- separator. Subsystem tags in the subject line are helpful. But qemu doesn't provide any information there :) Regarding the patch: The conversion looks good. The new QMP command is called inject_nmi, while the existing human monitor command is called nmi. Luiz asked for this name change. I don't mind. But should we rename the human monitor command for consistency? I don't think so, we don't need (and maybe don't even want) naming parity between QMP and HMP. Remember that one of our mistakes was to couple the two. Also, Avi asked for more descriptive names in QMP and I agree with him, I would even be in favor of calling it inject-non-maskable-interrupt. Regardless, the differing command name is worth mentioning in the commit message.
[Qemu-devel] Re: [PATCH v3] qemu, qmp: convert do_inject_nmi() to QObject, QError
On 12/15/2010 07:09 PM, Luiz Capitulino wrote: On Wed, 15 Dec 2010 17:49:27 +0800 Lai Jiangshanla...@cn.fujitsu.com wrote: Convert do_inject_nmi() to QObject, QError, we need to use it(via libvirt). changed from v1 Add document. Add error handling when the cpu index is invalid. changed from v2 use QERR_INVALID_PARAMETER_VALUE as Markus suggest. Signed-off-by: Lai Jiangshanla...@cn.fujitsu.com --- diff --git a/hmp-commands.hx b/hmp-commands.hx index 23024ba..f86d9fe 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -724,7 +724,8 @@ ETEXI .args_type = cpu_index:i, .params = cpu, .help = inject an NMI on the given CPU, -.mhandler.cmd = do_inject_nmi, +.user_print = monitor_user_noop, +.mhandler.cmd_new = do_inject_nmi, }, #endif STEXI diff --git a/monitor.c b/monitor.c index ec31eac..3e33a96 100644 --- a/monitor.c +++ b/monitor.c @@ -2119,7 +2119,7 @@ static void do_wav_capture(Monitor *mon, const QDict *qdict) #endif #if defined(TARGET_I386) -static void do_inject_nmi(Monitor *mon, const QDict *qdict) +static int do_inject_nmi(Monitor *mon, const QDict *qdict, QObject **ret_data) { CPUState *env; int cpu_index = qdict_get_int(qdict, cpu_index); @@ -2127,8 +2127,12 @@ static void do_inject_nmi(Monitor *mon, const QDict *qdict) for (env = first_cpu; env != NULL; env = env-next_cpu) if (env-cpu_index == cpu_index) { cpu_interrupt(env, CPU_INTERRUPT_NMI); -break; +return 0; } + +qerror_report(QERR_INVALID_PARAMETER_VALUE, cpu_index, + a CPU number); +return -1; } #endif diff --git a/qmp-commands.hx b/qmp-commands.hx index e5f157f..fcb6bf2 100644 --- a/qmp-commands.hx +++ b/qmp-commands.hx @@ -429,6 +429,33 @@ Example: EQMP +#if defined(TARGET_I386) +{ +.name = inject_nmi, +.args_type = cpu_index:i, +.params = cpu, +.help = inject an NMI on the given CPU, +.user_print = monitor_user_noop, +.mhandler.cmd_new = do_inject_nmi, +}, +#endif +SQMP +inject_nmi +-- + +Inject an NMI on the given CPU (x86 only). + +Arguments: + +- cpu_index: the index of the CPU to be injected NMI (json-int) Please, use cpu-index, that's what we're using for the human-monitor-command. Avi, Anthony, can you please ACK this new command? I'd like to see cpu-index made optional; if not present, nmi all cpus (that's what the nmi button on many machines does, or at least I think that's what it does). -- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] [PATCH v2 1/1] qemu-img.c: Clean up handling of image size in img_create()
Jes Sorensen jes.soren...@redhat.com writes: On 12/15/10 17:47, Markus Armbruster wrote: jes.soren...@redhat.com writes: From: Jes Sorensen jes.soren...@redhat.com This cleans up the handling of image size in img_create() by parsing the value early, and then only setting it once if a value has been added as the last argument to the command line. Signed-off-by: Jes Sorensen jes.soren...@redhat.com --- qemu-img.c | 14 -- 1 files changed, 8 insertions(+), 6 deletions(-) Patch conflicts with commit c2abccec. ARGH :( This switches parsing of the size argument from parse_option_size() (via set_option_parameter()) to strtosz(). I'm fine with that, but: * Before: $ qemu-img create xxx xxx Parameter 'size' expects a size You may use k, M, G or T suffixes for kilobytes, megabytes, gigabytes and terabytes. qemu-img: Image creation needs a size parameter * After: $ qemu-img create xxx xxx qemu-img: Image creation needs a size parameter Intentional? This was addressed in the later revision when I introduced strtosz_suffix() Looks like I'm getting lost in my post-vacation mail backlog %-}
[Qemu-devel] Re: [PATCH v3] qemu, qmp: convert do_inject_nmi() to QObject, QError
On Wed, 15 Dec 2010 19:18:32 +0200 Avi Kivity a...@redhat.com wrote: On 12/15/2010 07:09 PM, Luiz Capitulino wrote: On Wed, 15 Dec 2010 17:49:27 +0800 Lai Jiangshanla...@cn.fujitsu.com wrote: Convert do_inject_nmi() to QObject, QError, we need to use it(via libvirt). changed from v1 Add document. Add error handling when the cpu index is invalid. changed from v2 use QERR_INVALID_PARAMETER_VALUE as Markus suggest. Signed-off-by: Lai Jiangshanla...@cn.fujitsu.com --- diff --git a/hmp-commands.hx b/hmp-commands.hx index 23024ba..f86d9fe 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -724,7 +724,8 @@ ETEXI .args_type = cpu_index:i, .params = cpu, .help = inject an NMI on the given CPU, -.mhandler.cmd = do_inject_nmi, +.user_print = monitor_user_noop, +.mhandler.cmd_new = do_inject_nmi, }, #endif STEXI diff --git a/monitor.c b/monitor.c index ec31eac..3e33a96 100644 --- a/monitor.c +++ b/monitor.c @@ -2119,7 +2119,7 @@ static void do_wav_capture(Monitor *mon, const QDict *qdict) #endif #if defined(TARGET_I386) -static void do_inject_nmi(Monitor *mon, const QDict *qdict) +static int do_inject_nmi(Monitor *mon, const QDict *qdict, QObject **ret_data) { CPUState *env; int cpu_index = qdict_get_int(qdict, cpu_index); @@ -2127,8 +2127,12 @@ static void do_inject_nmi(Monitor *mon, const QDict *qdict) for (env = first_cpu; env != NULL; env = env-next_cpu) if (env-cpu_index == cpu_index) { cpu_interrupt(env, CPU_INTERRUPT_NMI); -break; +return 0; } + +qerror_report(QERR_INVALID_PARAMETER_VALUE, cpu_index, + a CPU number); +return -1; } #endif diff --git a/qmp-commands.hx b/qmp-commands.hx index e5f157f..fcb6bf2 100644 --- a/qmp-commands.hx +++ b/qmp-commands.hx @@ -429,6 +429,33 @@ Example: EQMP +#if defined(TARGET_I386) +{ +.name = inject_nmi, +.args_type = cpu_index:i, +.params = cpu, +.help = inject an NMI on the given CPU, +.user_print = monitor_user_noop, +.mhandler.cmd_new = do_inject_nmi, +}, +#endif +SQMP +inject_nmi +-- + +Inject an NMI on the given CPU (x86 only). + +Arguments: + +- cpu_index: the index of the CPU to be injected NMI (json-int) Please, use cpu-index, that's what we're using for the human-monitor-command. Avi, Anthony, can you please ACK this new command? I'd like to see cpu-index made optional; if not present, nmi all cpus (that's what the nmi button on many machines does, or at least I think that's what it does). Looks like a GUI feature to me, _might_ turn out to be an undesirable side effect to client writers. I guess I prefer a to-all-cpus argument.
[Qemu-devel] Re: [PATCH 2/3] Introduce do_snapshot_blkdev() and monitor command to handle it.
On Wed, 15 Dec 2010 17:57:25 +0100 Jes Sorensen jes.soren...@redhat.com wrote: On 12/15/10 17:55, Kevin Wolf wrote: +int do_snapshot_blkdev(Monitor *mon, const QDict *qdict, QObject **ret_data) +{ +const char *device = qdict_get_str(qdict, device); +const char *filename = qdict_get_try_str(qdict, snapshot_file); +const char *format = qdict_get_try_str(qdict, format); +const char format_qcow2[] = qcow2; +BlockDriverState *bs; +BlockDriver *drv, *proto_drv; +int ret = 0; +int flags; + +bs = bdrv_find(device); +if (!bs) { +qerror_report(QERR_DEVICE_NOT_FOUND, device); +ret = -1; +goto out; +} + +if (!format) { +format = format_qcow2; +} + +drv = bdrv_find_format(format); +if (!drv) { +qerror_report(QERR_INVALID_BLOCK_FORMAT, format); +ret = -1; +goto out; +} + +proto_drv = bdrv_find_protocol(filename); +if (!proto_drv) { +qerror_report(QERR_INVALID_BLOCK_FORMAT, format); +ret = -1; +goto out; +} + +ret = bdrv_img_create(filename, format, bs-filename, + bs-drv-format_name, NULL, -1, bs-open_flags); +if (ret) { +goto out; +} + +qemu_aio_flush(); +bdrv_flush(bs); + +flags = bs-open_flags; +bdrv_close(bs); +ret = bdrv_open(bs, filename, flags, drv); +/* + * If reopening the image file we just created fails, we really + * are in trouble :( + */ +assert(ret == 0); +out: +if (ret) { +ret = 1; I seem to remember that errors are always -1 for monitor commands. I mapped it after something else, but admitted I cannot remember where - can someone clarify? -1 Cheers, Jes
Re: [Qemu-devel] [PATCH v4 1/2] Minimal RAM API support
This adds a minimum chunk of Anthony's RAM API support so that we can identify actual VM RAM versus all the other things that make use of qemu_ram_alloc. Why do we care? How are you defining actual VM RAM? Surely the whole point of qemu_ram_alloc is to allocate a chunk of memory that can be mapped into the guest physical address space, so all uses of qemu_ram_alloc should be using this API. Paul
[Qemu-devel] Re: [PATCH v3] qemu, qmp: convert do_inject_nmi() to QObject, QError
Luiz Capitulino lcapitul...@redhat.com writes: On Wed, 15 Dec 2010 19:18:32 +0200 Avi Kivity a...@redhat.com wrote: On 12/15/2010 07:09 PM, Luiz Capitulino wrote: On Wed, 15 Dec 2010 17:49:27 +0800 Lai Jiangshanla...@cn.fujitsu.com wrote: Convert do_inject_nmi() to QObject, QError, we need to use it(via libvirt). changed from v1 Add document. Add error handling when the cpu index is invalid. changed from v2 use QERR_INVALID_PARAMETER_VALUE as Markus suggest. Signed-off-by: Lai Jiangshanla...@cn.fujitsu.com --- diff --git a/hmp-commands.hx b/hmp-commands.hx index 23024ba..f86d9fe 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -724,7 +724,8 @@ ETEXI .args_type = cpu_index:i, .params = cpu, .help = inject an NMI on the given CPU, -.mhandler.cmd = do_inject_nmi, +.user_print = monitor_user_noop, +.mhandler.cmd_new = do_inject_nmi, }, #endif STEXI diff --git a/monitor.c b/monitor.c index ec31eac..3e33a96 100644 --- a/monitor.c +++ b/monitor.c @@ -2119,7 +2119,7 @@ static void do_wav_capture(Monitor *mon, const QDict *qdict) #endif #if defined(TARGET_I386) -static void do_inject_nmi(Monitor *mon, const QDict *qdict) +static int do_inject_nmi(Monitor *mon, const QDict *qdict, QObject **ret_data) { CPUState *env; int cpu_index = qdict_get_int(qdict, cpu_index); @@ -2127,8 +2127,12 @@ static void do_inject_nmi(Monitor *mon, const QDict *qdict) for (env = first_cpu; env != NULL; env = env-next_cpu) if (env-cpu_index == cpu_index) { cpu_interrupt(env, CPU_INTERRUPT_NMI); -break; +return 0; } + +qerror_report(QERR_INVALID_PARAMETER_VALUE, cpu_index, + a CPU number); +return -1; } #endif diff --git a/qmp-commands.hx b/qmp-commands.hx index e5f157f..fcb6bf2 100644 --- a/qmp-commands.hx +++ b/qmp-commands.hx @@ -429,6 +429,33 @@ Example: EQMP +#if defined(TARGET_I386) +{ +.name = inject_nmi, +.args_type = cpu_index:i, +.params = cpu, +.help = inject an NMI on the given CPU, +.user_print = monitor_user_noop, +.mhandler.cmd_new = do_inject_nmi, +}, +#endif +SQMP +inject_nmi +-- + +Inject an NMI on the given CPU (x86 only). + +Arguments: + +- cpu_index: the index of the CPU to be injected NMI (json-int) Please, use cpu-index, that's what we're using for the human-monitor-command. Avi, Anthony, can you please ACK this new command? I'd like to see cpu-index made optional; if not present, nmi all cpus (that's what the nmi button on many machines does, or at least I think that's what it does). Looks like a GUI feature to me, Really? Can't see how you can build NMI to all CPUs from NMI this CPU. Or am I misunderstanding you? _might_ turn out to be an undesirable side effect to client writers. They seem to be coping fine with optional arguments elsewhere. I guess I prefer a to-all-cpus argument. How would that look like? cpu-index: all? I find optional json-int a simpler schema than either a json-int or the json-string all.
Re: [Qemu-devel] [PATCH 5/6] [RFC] Emulation of Leon3.
On 12/13/2010 07:18 PM, Blue Swirl wrote: On Mon, Dec 13, 2010 at 3:51 PM, Fabien Chouteauchout...@adacore.com wrote: On 12/11/2010 10:56 AM, Blue Swirl wrote: On Tue, Dec 7, 2010 at 11:40 AM, Fabien Chouteauchout...@adacore.com wrote: On 12/06/2010 06:53 PM, Blue Swirl wrote: On Mon, Dec 6, 2010 at 9:26 AM, Fabien Chouteauchout...@adacore.com wrote: Signed-off-by: Fabien Chouteauchout...@adacore.com --- Makefile.target |5 +- hw/leon3.c | 310 ++ target-sparc/cpu.h | 10 ++ target-sparc/helper.c|2 +- target-sparc/op_helper.c | 30 - 5 files changed, 353 insertions(+), 4 deletions(-) diff --git a/Makefile.target b/Makefile.target index 2800f47..f40e04f 100644 --- a/Makefile.target +++ b/Makefile.target @@ -290,7 +290,10 @@ obj-sparc-y += cirrus_vga.o else obj-sparc-y = sun4m.o lance.o tcx.o sun4m_iommu.o slavio_intctl.o obj-sparc-y += slavio_timer.o slavio_misc.o sparc32_dma.o -obj-sparc-y += cs4231.o eccmemctl.o sbi.o sun4c_intctl.o +obj-sparc-y += cs4231.o eccmemctl.o sbi.o sun4c_intctl.o leon3.o + +# GRLIB +obj-sparc-y += grlib_gptimer.o grlib_irqmp.o grlib_apbuart.o endif obj-arm-y = integratorcp.o versatilepb.o arm_pic.o arm_timer.o diff --git a/hw/leon3.c b/hw/leon3.c new file mode 100644 index 000..ba61081 --- /dev/null +++ b/hw/leon3.c @@ -0,0 +1,310 @@ +/* + * QEMU Leon3 System Emulator + * + * Copyright (c) 2010 AdaCore + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the Software), to deal + * in the Software without restriction, including without limitation the rights + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + * copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN + * THE SOFTWARE. + */ +#include hw.h +#include qemu-timer.h +#include qemu-char.h +#include sysemu.h +#include boards.h +#include loader.h +#include elf.h + +#include grlib.h + +/* #define DEBUG_LEON3 */ + +#ifdef DEBUG_LEON3 +#define DPRINTF(fmt, ...) \ +do { printf(Leon3: fmt , ## __VA_ARGS__); } while (0) +#else +#define DPRINTF(fmt, ...) +#endif + +/* Default system clock. */ +#define CPU_CLK (40 * 1000 * 1000) + +#define PROM_FILENAMEu-boot.bin + +#define MAX_PILS 16 + +typedef struct Leon3State +{ +uint32_t cache_control; +uint32_t inst_cache_conf; +uint32_t data_cache_conf; + +uint64_t entry; /* save kernel entry in case of reset */ +} Leon3State; + +Leon3State leon3_state; Again global state, please refactor. Perhaps most of the cache handling code belong to target-sparc/op_helper.c and this structure to CPUSPARCState. I will try to find a solution for that. Is it OK to add some Leon3 specific stuff in the CPUSPARCState? Yes, no problem. You can also drop the intermediate Leon3State structure if there is no benefit. + +/* Cache control: emulate the behavior of cache control registers but without + any effect on the emulated CPU */ + +#define CACHE_DISABLED 0x0 +#define CACHE_FROZEN 0x1 +#define CACHE_ENABLED 0x3 + +/* Cache Control register fields */ + +#define CACHE_CTRL_IF (14) /* Instruction Cache Freeze on Interrupt */ +#define CACHE_CTRL_DF (15) /* Data Cache Freeze on Interrupt */ +#define CACHE_CTRL_DP (1 14) /* Data cache flush pending */ +#define CACHE_CTRL_IP (1 15) /* Instruction cache flush pending */ +#define CACHE_CTRL_IB (1 16) /* Instruction burst fetch */ +#define CACHE_CTRL_FI (1 21) /* Flush Instruction cache (Write only) */ +#define CACHE_CTRL_FD (1 22) /* Flush Data cache (Write only) */ +#define CACHE_CTRL_DS (1 23) /* Data cache snoop enable */ + +void leon3_cache_control_int(void) +{ +uint32_t state = 0; + +if (leon3_state.cache_control CACHE_CTRL_IF) { +/* Instruction cache state */ +state = leon3_state.cache_control 0x3; Please add a new define CACHE_CTRL_xxx to replace 0x3. Done. +if (state == CACHE_ENABLED) { +state = CACHE_FROZEN; +DPRINTF(Instruction cache: freeze\n); +} + +leon3_state.cache_control= ~0x3; +
Re: [Qemu-devel] [PATCH v3] qemu, qmp: convert do_inject_nmi() to QObject, QError
Luiz Capitulino lcapitul...@redhat.com writes: On Wed, 15 Dec 2010 11:49:23 +0100 Markus Armbruster arm...@redhat.com wrote: Lai Jiangshan la...@cn.fujitsu.com writes: Convert do_inject_nmi() to QObject, QError, we need to use it(via libvirt). changed from v1 Add document. Add error handling when the cpu index is invalid. changed from v2 use QERR_INVALID_PARAMETER_VALUE as Markus suggest. Signed-off-by: Lai Jiangshan la...@cn.fujitsu.com A note on commit messages: The commit message should describe the current version of the patch. Don't repeat the subject line in the body. Patch history is very useful for review, but usually uninteresting once the patch is committed. Thus, it's best to put it after the --- separator. Subsystem tags in the subject line are helpful. But qemu doesn't provide any information there :) Regarding the patch: The conversion looks good. The new QMP command is called inject_nmi, while the existing human monitor command is called nmi. Luiz asked for this name change. I don't mind. But should we rename the human monitor command for consistency? I don't think so, we don't need (and maybe don't even want) naming parity between QMP and HMP. Remember that one of our mistakes was to couple the two. Human nmi and QMP inject_nmi are identical commands, aren't they? Giving the same things the same name isn't coupling :) The mistake that matters here was adopting existing human commands for QMP uncritically. Also, Avi asked for more descriptive names in QMP and I agree with him, I would even be in favor of calling it inject-non-maskable-interrupt. I like inject_nmi better than nmi. inject-non-maskable-interrupt is too long even for QMP. Regardless, the differing command name is worth mentioning in the commit message.
[Qemu-devel] Re: [PATCH] qemu-kvm: Switch to upstream -enable-kvm semantics
On 12/15/2010 09:50 AM, Markus Armbruster wrote: We currently enable KVM by default, and when it's not available, we print a message and fall back to TCG. Option -enable-kvm is ignored. Option -no-kvm suppresses KVM. Upstream works differently: KVM is off by default, -enable-kvm switches it on. -enable-kvm terminates the process unsuccessfully if KVM is not available. upstream qemu | default |-enable-kvm +---+--- KVM available | disabled | enabled KVM unavailable | disabled |fail qemu-kvm| default |-enable-kvm +---+--- KVM available | enabled* | enabled KVM unavailable | disabled | disabled* * differs from upstream Users of qemu and qemu-kvm need to be aware of these differences to enable / disable use of KVM reliably. This is bothersome. Consider -enable-kvm when KVM is unavailable: If the user expects qemu-kvm behavior (fall back), but qemu fails, he'll likely be surprised and unhappy. If the user expects upstream behavior (fail), but qemu-kvm falls back to TCG, the guest runs slow as molasses, and the user will likely be confused and unhappy (unless he spots and understands the disable KVM message). Switch to upstream semantics: KVM off by default, -enable-kvm switches it on, and when it can't, it's fatal. Having to enable KVM explicitly is annoying, but the proper place to address that is upstream. Signed-off-by: Markus Armbrusterarm...@redhat.com Backwards compatibility is going to kill us if we try to make this change. Current qemu-kvm behavior: default: -accel kvm,tcg -no-kvm: -accel tcg -enable-kvm: -accel kvm,tcg Current upstream behavior default: -accel tcg -enable-kvm: -accel kvm I think we should tie `-accel' to the machine type. For qemu-kvm, a different default machine type should be used than upstream qemu (it really should be a configure switch). For `pc', the default `-accel' behavior should remain 'tcg'. For `kvmpc', the default `-accel' behavior should be 'kvm,tcg'. -no-kvm should be deprecated. -enable-kvm should also be deprecated in favor of the `-accel' option. In the short term, it would be a good idea to modify qemu-kvm to switch the -enable-kvm semantics to match upstream (fail if KVM isn't available). Adding an alias for 'kvmpc' upstream and qemu-kvm and making qemu-kvm default to 'kvmpc' would be helpful for management tools too. Regards, Anthony Liguori --- vl.c | 10 +- 1 files changed, 1 insertions(+), 9 deletions(-) diff --git a/vl.c b/vl.c index e3c8919..87e88c2 100644 --- a/vl.c +++ b/vl.c @@ -247,7 +247,7 @@ static void *boot_set_opaque; static NotifierList exit_notifiers = NOTIFIER_LIST_INITIALIZER(exit_notifiers); -int kvm_allowed = 1; +int kvm_allowed = 0; uint32_t xen_domid; enum xen_mode xen_mode = XEN_EMULATE; @@ -2436,10 +2436,8 @@ int main(int argc, char **argv, char **envp) case QEMU_OPTION_smbios: do_smbios_option(optarg); break; -#ifdef OBSOLETE_KVM_IMPL case QEMU_OPTION_enable_kvm: kvm_allowed = 1; -#endif break; case QEMU_OPTION_no_kvm: kvm_allowed = 0; @@ -2789,18 +2787,12 @@ int main(int argc, char **argv, char **envp) if (kvm_allowed) { int ret = kvm_init(smp_cpus); if (ret 0) { -#if defined(OBSOLETE_KVM_IMPL) || defined(CONFIG_NO_CPU_EMULATION) if (!kvm_available()) { printf(KVM not supported for this target\n); } else { fprintf(stderr, failed to initialize KVM: %s\n, strerror(-ret)); } exit(1); -#endif -#ifdef CONFIG_KVM -fprintf(stderr, Could not initialize KVM, will disable KVM support\n); -kvm_allowed = 0; -#endif } }
Re: [Qemu-devel] [Qestion] What status of memory stats feature
On Wed, 15 Dec 2010 16:20:05 +0900 Ken'ichi Ohmichi oomi...@mxs.nes.nec.co.jp wrote: Hi, I tried to get the memory stats by using virDomainMemoryStats() of libvirt, but it could not do it because of the following patch: [PATCH 03/23] disable guest-provided stats on info balloon command 2010/10/01 from Luiz Capitulino http://www.mail-archive.com/qemu-devel@nongnu.org/msg43024.html According to the related thread, the above patch avoids hanging user monitor, and people hope the memory stats feature will be able in the future. So I'd like to know the status of this feature. Is there the patch for enabling the feature ? If the patch exists, I'd like to try it. It doesn't, afaik. What is the essential problem ? There are two essential problems here. The first one is that QMP lacks true asynchronous command support (it does have some code for it, but it's incomplete). The second problem is that we must not make an existing synchronous command asynchronous, because this breaks the ABI. Does some infinite loop happen ? No, the guest doesn't respond. Unfortunately, I cannot understand the cause of hanging user monitor. The balloon command needs guest cooperation (ie. it talks to the guest), if the guest doesn't respond the command will never return. We don't have a precise ETA for async support, but if it depends only on the new error framework work, then we could have it for 0.15. If it depends on the full monitor redesign work, then I guess we'll two releases.
Re: [Qemu-devel] [PATCH v3] qemu, qmp: convert do_inject_nmi() to QObject, QError
On Wed, 15 Dec 2010 18:39:07 +0100 Markus Armbruster arm...@redhat.com wrote: Luiz Capitulino lcapitul...@redhat.com writes: On Wed, 15 Dec 2010 11:49:23 +0100 Markus Armbruster arm...@redhat.com wrote: Lai Jiangshan la...@cn.fujitsu.com writes: Convert do_inject_nmi() to QObject, QError, we need to use it(via libvirt). changed from v1 Add document. Add error handling when the cpu index is invalid. changed from v2 use QERR_INVALID_PARAMETER_VALUE as Markus suggest. Signed-off-by: Lai Jiangshan la...@cn.fujitsu.com A note on commit messages: The commit message should describe the current version of the patch. Don't repeat the subject line in the body. Patch history is very useful for review, but usually uninteresting once the patch is committed. Thus, it's best to put it after the --- separator. Subsystem tags in the subject line are helpful. But qemu doesn't provide any information there :) Regarding the patch: The conversion looks good. The new QMP command is called inject_nmi, while the existing human monitor command is called nmi. Luiz asked for this name change. I don't mind. But should we rename the human monitor command for consistency? I don't think so, we don't need (and maybe don't even want) naming parity between QMP and HMP. Remember that one of our mistakes was to couple the two. Human nmi and QMP inject_nmi are identical commands, aren't they? At this point in time yes, but they might not be in the near future. Assuming they might be different is the safest thing to do. That's true for all existing commands. Giving the same things the same name isn't coupling :) Expecting them to be the same in the future is. The mistake that matters here was adopting existing human commands for QMP uncritically. That's the protocol visible mistake, yes. Also, Avi asked for more descriptive names in QMP and I agree with him, I would even be in favor of calling it inject-non-maskable-interrupt. I like inject_nmi better than nmi. inject-non-maskable-interrupt is too long even for QMP. It's not supposed to be typed that much, but I'm not that strong about that. nitpick: I think we should be consistent in the use of _ or -, eg. we should pick inject-nmi or inject_nmi? Regardless, the differing command name is worth mentioning in the commit message.
[Qemu-devel] Re: [PATCH] qemu-kvm: Switch to upstream -enable-kvm semantics
Anthony Liguori anth...@codemonkey.ws writes: On 12/15/2010 09:50 AM, Markus Armbruster wrote: We currently enable KVM by default, and when it's not available, we print a message and fall back to TCG. Option -enable-kvm is ignored. Option -no-kvm suppresses KVM. Upstream works differently: KVM is off by default, -enable-kvm switches it on. -enable-kvm terminates the process unsuccessfully if KVM is not available. upstream qemu | default |-enable-kvm +---+--- KVM available | disabled | enabled KVM unavailable | disabled |fail qemu-kvm| default |-enable-kvm +---+--- KVM available | enabled* | enabled KVM unavailable | disabled | disabled* * differs from upstream Users of qemu and qemu-kvm need to be aware of these differences to enable / disable use of KVM reliably. This is bothersome. Consider -enable-kvm when KVM is unavailable: If the user expects qemu-kvm behavior (fall back), but qemu fails, he'll likely be surprised and unhappy. If the user expects upstream behavior (fail), but qemu-kvm falls back to TCG, the guest runs slow as molasses, and the user will likely be confused and unhappy (unless he spots and understands the disable KVM message). Switch to upstream semantics: KVM off by default, -enable-kvm switches it on, and when it can't, it's fatal. Having to enable KVM explicitly is annoying, but the proper place to address that is upstream. Signed-off-by: Markus Armbrusterarm...@redhat.com Backwards compatibility is going to kill us if we try to make this change. Current qemu-kvm behavior: default: -accel kvm,tcg -no-kvm: -accel tcg -enable-kvm: -accel kvm,tcg Current upstream behavior default: -accel tcg -enable-kvm: -accel kvm I think we should tie `-accel' to the machine type. For qemu-kvm, a different default machine type should be used than upstream qemu (it really should be a configure switch). For `pc', the default `-accel' behavior should remain 'tcg'. For kvmpc', the default `-accel' behavior should be 'kvm,tcg'. -no-kvm should be deprecated. -enable-kvm should also be deprecated in favor of the `-accel' option. I'm fine with -accel and deprecating the old options. But until we have that: In the short term, it would be a good idea to modify qemu-kvm to switch the -enable-kvm semantics to match upstream (fail if KVM isn't available). That's what my patch does. Additionally, it changes the default to match upstream: KVM disabled. What do you want changed in my patch? Adding an alias for 'kvmpc' upstream and qemu-kvm and making qemu-kvm default to 'kvmpc' would be helpful for management tools too. That would address Having to enable KVM explicitly is annoying.
[Qemu-devel] Re: [PATCH v3] qemu, qmp: convert do_inject_nmi() to QObject, QError
On Wed, 15 Dec 2010 18:45:09 +0100 Markus Armbruster arm...@redhat.com wrote: Luiz Capitulino lcapitul...@redhat.com writes: On Wed, 15 Dec 2010 19:18:32 +0200 Avi Kivity a...@redhat.com wrote: On 12/15/2010 07:09 PM, Luiz Capitulino wrote: On Wed, 15 Dec 2010 17:49:27 +0800 Lai Jiangshanla...@cn.fujitsu.com wrote: Convert do_inject_nmi() to QObject, QError, we need to use it(via libvirt). changed from v1 Add document. Add error handling when the cpu index is invalid. changed from v2 use QERR_INVALID_PARAMETER_VALUE as Markus suggest. Signed-off-by: Lai Jiangshanla...@cn.fujitsu.com --- diff --git a/hmp-commands.hx b/hmp-commands.hx index 23024ba..f86d9fe 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -724,7 +724,8 @@ ETEXI .args_type = cpu_index:i, .params = cpu, .help = inject an NMI on the given CPU, -.mhandler.cmd = do_inject_nmi, +.user_print = monitor_user_noop, +.mhandler.cmd_new = do_inject_nmi, }, #endif STEXI diff --git a/monitor.c b/monitor.c index ec31eac..3e33a96 100644 --- a/monitor.c +++ b/monitor.c @@ -2119,7 +2119,7 @@ static void do_wav_capture(Monitor *mon, const QDict *qdict) #endif #if defined(TARGET_I386) -static void do_inject_nmi(Monitor *mon, const QDict *qdict) +static int do_inject_nmi(Monitor *mon, const QDict *qdict, QObject **ret_data) { CPUState *env; int cpu_index = qdict_get_int(qdict, cpu_index); @@ -2127,8 +2127,12 @@ static void do_inject_nmi(Monitor *mon, const QDict *qdict) for (env = first_cpu; env != NULL; env = env-next_cpu) if (env-cpu_index == cpu_index) { cpu_interrupt(env, CPU_INTERRUPT_NMI); -break; +return 0; } + +qerror_report(QERR_INVALID_PARAMETER_VALUE, cpu_index, + a CPU number); +return -1; } #endif diff --git a/qmp-commands.hx b/qmp-commands.hx index e5f157f..fcb6bf2 100644 --- a/qmp-commands.hx +++ b/qmp-commands.hx @@ -429,6 +429,33 @@ Example: EQMP +#if defined(TARGET_I386) +{ +.name = inject_nmi, +.args_type = cpu_index:i, +.params = cpu, +.help = inject an NMI on the given CPU, +.user_print = monitor_user_noop, +.mhandler.cmd_new = do_inject_nmi, +}, +#endif +SQMP +inject_nmi +-- + +Inject an NMI on the given CPU (x86 only). + +Arguments: + +- cpu_index: the index of the CPU to be injected NMI (json-int) Please, use cpu-index, that's what we're using for the human-monitor-command. Avi, Anthony, can you please ACK this new command? I'd like to see cpu-index made optional; if not present, nmi all cpus (that's what the nmi button on many machines does, or at least I think that's what it does). Looks like a GUI feature to me, Really? Can't see how you can build NMI to all CPUs from NMI this CPU. Or am I misunderstanding you? I guess so. Avi referred to 'nmi button on many machines', I assumed he meant a virtual machine GUI, am I wrong? _might_ turn out to be an undesirable side effect to client writers. They seem to be coping fine with optional arguments elsewhere. Which we might want to review. I guess I prefer a to-all-cpus argument. How would that look like? cpu-index: all? Like this: { execute: inject-nmi, arguments: { to-all-cpus: true } } But this looks like an optimization to me, because it's also easy to do: for cpu in query-cpus; do inject-nmi cpu Unless we want to do this in an atomic way, due to side effects I'm not aware about. I find optional json-int a simpler schema than either a json-int or the json-string all.
[Qemu-devel] [Bug 690776] Re: Overwrite argv to set process title, eliminating 16-character prctl() limit.
** Attachment added: qemu-overwrite-argv-to-set-process-title.patch https://bugs.launchpad.net/bugs/690776/+attachment/1767059/+files/qemu-overwrite-argv-to-set-process-title.patch -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/690776 Title: Overwrite argv to set process title, eliminating 16-character prctl() limit. Status in QEMU: New Bug description: I've modified qemu to overwrite its arguments to set the process title, since its current prctl() method has a 16-character limit. I posted the original patch to qemu-devel, made the changes others suggested, then re-posted to qemu-devel. I flailed around a bit with the patch submission process and think I finally got it right, but haven't been able to gain the notice of a committer to have this pushed. Maybe this will get more attention when reported in the BTS.
[Qemu-devel] [Bug 690776] [NEW] Overwrite argv to set process title, eliminating 16-character prctl() limit.
Public bug reported: I've modified qemu to overwrite its arguments to set the process title, since its current prctl() method has a 16-character limit. I posted the original patch to qemu-devel, made the changes others suggested, then re-posted to qemu-devel. I flailed around a bit with the patch submission process and think I finally got it right, but haven't been able to gain the notice of a committer to have this pushed. Maybe this will get more attention when reported in the BTS. ** Affects: qemu Importance: Undecided Status: New -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/690776 Title: Overwrite argv to set process title, eliminating 16-character prctl() limit. Status in QEMU: New Bug description: I've modified qemu to overwrite its arguments to set the process title, since its current prctl() method has a 16-character limit. I posted the original patch to qemu-devel, made the changes others suggested, then re-posted to qemu-devel. I flailed around a bit with the patch submission process and think I finally got it right, but haven't been able to gain the notice of a committer to have this pushed. Maybe this will get more attention when reported in the BTS.
[Qemu-devel] Re: [PATCH] rtl8139: IO memory is not part of vmstate
On Wed, Dec 15, 2010 at 06:12:25PM +0100, Paolo Bonzini wrote: On 12/15/2010 11:00 AM, Michael S. Tsirkin wrote: Indeed, subsections are for data that is rarely needed so that there's some chance (sometimes ~100%) of migration working seemlessly. If a subsection arrives that qemu does not know about, won't migratin fail? Yes, that's why rarely needed = some high chance of migration working (though no certainty). In this case it's either no-bump-and-live-with-the-consequences, or changing the version id. This was discussed to death already. version ids have the problem that they don't play nicely with downstreams. Downstream version bumps don't play nicely with upstream, so downstream does have a reason for always-necessary subsections. But upstream can bump the version id as much as they care. Paolo This assuming upstream developers do not care about downstreams. To give a chance for downstream to cherry-pick changes, upstream should use subsections instead of version ids too. -- MST
Re: [Qemu-devel] [PATCH v4 1/2] Minimal RAM API support
On Wed, 2010-12-15 at 17:23 +, Paul Brook wrote: This adds a minimum chunk of Anthony's RAM API support so that we can identify actual VM RAM versus all the other things that make use of qemu_ram_alloc. Why do we care? How are you defining actual VM RAM? Surely the whole point of qemu_ram_alloc is to allocate a chunk of memory that can be mapped into the guest physical address space, so all uses of qemu_ram_alloc should be using this API. http://wiki.qemu.org/Features/RamAPI
Re: [Qemu-devel] [PATCH v4 1/2] Minimal RAM API support
On 12/15/2010 11:23 AM, Paul Brook wrote: This adds a minimum chunk of Anthony's RAM API support so that we can identify actual VM RAM versus all the other things that make use of qemu_ram_alloc. Why do we care? How are you defining actual VM RAM? Surely the whole point of qemu_ram_alloc is to allocate a chunk of memory that can be mapped into the guest physical address space, so all uses of qemu_ram_alloc should be using this API. actual VM RAM == the DIMM devices. This address has exactly a 1-1 mapping between memory content and an address. It doesn't change during program execution. It may be mapped in the CPU in weird ways, it may be visibly different to devices, but that's a different interface. Why do we care about differentiating actual VM RAM from things that behave like RAM but are not actually RAM (like device ROM)? Because the semantics are different. ROM is non-volatile and RAM is volatile. If we don't make that distinction in our interfaces, we loose the ability to model the behavioral differences. For things like paravirtual devices, we can take short cuts (to optimize performance) by saying the device is directly connecting to RAM (and doesn't go through the normal translation hierarchy). Regards, Anthony Liguori Paul -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[Qemu-devel] [PATCH] Fix migrate set speed doc arg
We used to ignore any fractional part in 0.13, but due to recent changes (started with 9f9b17a4f0865286391e4d3a0a735230122a2289) migrate_set_speed will reject the fractional part. We don't expect existing clients to be relying on this, but we need to update the documentation to reflect the change. Signed-off-by: Luiz Capitulino lcapitul...@redhat.com --- qmp-commands.hx |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/qmp-commands.hx b/qmp-commands.hx index 3486223..164ecbc 100644 --- a/qmp-commands.hx +++ b/qmp-commands.hx @@ -510,7 +510,7 @@ Set maximum speed for migrations. Arguments: -- value: maximum speed, in bytes per second (json-number) +- value: maximum speed, in bytes per second (json-int) Example: -- 1.7.3.3.585.g74f6e
Re: [Qemu-devel] [PATCH] Fix migrate set speed doc arg
Luiz Capitulino lcapitul...@redhat.com writes: We used to ignore any fractional part in 0.13, but due to recent changes (started with 9f9b17a4f0865286391e4d3a0a735230122a2289) migrate_set_speed will reject the fractional part. We don't expect existing clients to be relying on this, but we need to update the documentation to reflect the change. Signed-off-by: Luiz Capitulino lcapitul...@redhat.com Documentation broken by commit ed3d4a80. ACK
[Qemu-devel] [Bug 685096] Re: USB Passthrough not working for Windows 7 guest
Same problem here, using: qemu-kvm 0.13 kernel 2.6.36.2 kvm-intel Guest: Windows 7 Enterprise x64 INFO USBHOST: Device 2.2, speed 480 Mb/s Class 00: USB device 054c:02a5, Storage Media INFO USB: Device 0.3, Speed 480 Mb/s, Product Storage Media Device appears in Windows 7 but in Error Code 10. -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/685096 Title: USB Passthrough not working for Windows 7 guest Status in QEMU: New Bug description: USB Passthrough from host to guest is not working for a 32-bit Windows 7 guest, while it works perfectly for a 32-bit Windows XP guest. The device appears in the device manager of Windows 7, but with Error code 10: device cannot start. I have tried this with numerous USB thumbdrives and a USB wireless NIC, all with the same result. The device name and functionality is recognized, so at least some USB negotiation is taking place. I am trying this with the latest git-pull of QEMU-KVM. The command line to launch qemu-kvm for win7 is: sudo /home/user/local_install/bin/qemu-system-x86_64 -cpu core2duo -m 1024 -smp 2 -vga std -hda ./disk_images/win7.qcow -vnc :1 -boot c -usb -usbdevice tablet -usbdevice host:0781:5150 The command line to launch qemu-kvm for winxp is: sudo /home/user/local_install/bin/qemu-system-x86_64 -cpu core2duo -m 1024 -smp 2 -usb -vga std -hda ./winxpsp3.qcow -vnc :0 -boot c -usbdevice tablet -usbdevice host:0781:5150 Any help is appreciated.
[Qemu-devel] Storing instructions executed in CPUState
Hello, I am attempting to fold a cache model into Qemu. For this I need to store the instructions executed between actual lds/sts to Qemu memory, for performance reasons. I figured that a buffer in the CPUState could accommodate this requirement, however tcg only gives the ability to ld/st from host memory at constant offsets. Is there a better solution to this problem than adding new ld/st instructions to tcg that take a computed offset? Thx, James
[Qemu-devel] [Bug 685096] Re: USB Passthrough not working for Windows 7 guest
Ugh... I have just realized that KVM only supports UHCI, so not USB 2.0 support -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/685096 Title: USB Passthrough not working for Windows 7 guest Status in QEMU: New Bug description: USB Passthrough from host to guest is not working for a 32-bit Windows 7 guest, while it works perfectly for a 32-bit Windows XP guest. The device appears in the device manager of Windows 7, but with Error code 10: device cannot start. I have tried this with numerous USB thumbdrives and a USB wireless NIC, all with the same result. The device name and functionality is recognized, so at least some USB negotiation is taking place. I am trying this with the latest git-pull of QEMU-KVM. The command line to launch qemu-kvm for win7 is: sudo /home/user/local_install/bin/qemu-system-x86_64 -cpu core2duo -m 1024 -smp 2 -vga std -hda ./disk_images/win7.qcow -vnc :1 -boot c -usb -usbdevice tablet -usbdevice host:0781:5150 The command line to launch qemu-kvm for winxp is: sudo /home/user/local_install/bin/qemu-system-x86_64 -cpu core2duo -m 1024 -smp 2 -usb -vga std -hda ./winxpsp3.qcow -vnc :0 -boot c -usbdevice tablet -usbdevice host:0781:5150 Any help is appreciated.
Re: [Qemu-devel] [Qestion] What status of memory stats feature
On Wed, 2010-12-15 at 15:39 -0200, Luiz Capitulino wrote: On Wed, 15 Dec 2010 16:20:05 +0900 Ken'ichi Ohmichi oomi...@mxs.nes.nec.co.jp wrote: Hi, I tried to get the memory stats by using virDomainMemoryStats() of libvirt, but it could not do it because of the following patch: [PATCH 03/23] disable guest-provided stats on info balloon command 2010/10/01 from Luiz Capitulino http://www.mail-archive.com/qemu-devel@nongnu.org/msg43024.html According to the related thread, the above patch avoids hanging user monitor, and people hope the memory stats feature will be able in the future. So I'd like to know the status of this feature. Is there the patch for enabling the feature ? If the patch exists, I'd like to try it. It doesn't, afaik. It depends on what you are looking to do. If you just want to play around and aren't concerned about lockups, You could undo the above patch to re-enable the interface (although I cannot guarantee that even this will work). What is the essential problem ? There are two essential problems here. The first one is that QMP lacks true asynchronous command support (it does have some code for it, but it's incomplete). I was never completely clear on the problems with the QMP async support. The second problem is that we must not make an existing synchronous command asynchronous, because this breaks the ABI. The memory stats interface has always advertised itself as an async command so this one shouldn't matter. Whether those semantics actually ever worked is another issue. Does some infinite loop happen ? No, the guest doesn't respond. Unfortunately, I cannot understand the cause of hanging user monitor. The balloon command needs guest cooperation (ie. it talks to the guest), if the guest doesn't respond the command will never return. Isn't that only a problem for the user monitor? For QMP you might just never get a response to the stats request. IIRC, the QMP monitor is never 'suspended' in the same way that the user monitor is so I wouldn't expect it to be susceptible to the same lockup issues. We don't have a precise ETA for async support, but if it depends only on the new error framework work, then we could have it for 0.15. If it depends on the full monitor redesign work, then I guess we'll two releases. I can't provide a definitive answer to this unfortunately since I haven't looked in awhile (nor am I privy to the details of the full monitor redesign proposal. -- Thanks, Adam
Re: [Qemu-devel] Re: [PATCH 00/15] Megasas HBA emulation and SCSI update v.3
On Mon, 2010-12-13 at 08:32 +0100, Hannes Reinecke wrote: On 12/10/2010 11:14 PM, Paolo Bonzini wrote: On 11/24/2010 05:50 PM, Christoph Hellwig wrote: Btw, it might make sense to split this series into two. Patches 1 to 11 are genuine improvements to the SCSI code, which I'd like to see merged ASAP. The rest is the actual megasas driver, which I still want to see, but haven't even gotten to review yet. Ping for patches 1 to 11? Paolo The first few already have been merged by Kevin Wolf; I'll see to prepare an updated patchset. Actually, I was about to ask as I'd like to base some new work of mine on top of these. I don't see any recent commit from Kevin in the qemu master branch (nor in any other branch on http://git.savannah.gnu.org/cgit/qemu.git/log/). Does Kevin maintain a separate staging tree ? Cheers, Ben.
Re: [Qemu-devel] Re: [PATCH 00/15] Megasas HBA emulation and SCSI update v.3
On Thu, 2010-12-16 at 12:45 +1100, Benjamin Herrenschmidt wrote: On Mon, 2010-12-13 at 08:32 +0100, Hannes Reinecke wrote: On 12/10/2010 11:14 PM, Paolo Bonzini wrote: On 11/24/2010 05:50 PM, Christoph Hellwig wrote: Btw, it might make sense to split this series into two. Patches 1 to 11 are genuine improvements to the SCSI code, which I'd like to see merged ASAP. The rest is the actual megasas driver, which I still want to see, but haven't even gotten to review yet. Ping for patches 1 to 11? Paolo The first few already have been merged by Kevin Wolf; I'll see to prepare an updated patchset. Actually, I was about to ask as I'd like to base some new work of mine on top of these. I don't see any recent commit from Kevin in the qemu master branch (nor in any other branch on http://git.savannah.gnu.org/cgit/qemu.git/log/). Does Kevin maintain a separate staging tree ? BTW. I could use patch 12 too (get_sense() callback :-) Would save me from manufacturing REQUEST_SENSE etc... I can carry it locally for the time being but it would be nice to have it upstream. Cheers, Ben.
Re: [Qemu-devel] Re: [PATCH] scsi-generic: bugfixes for 'SCSIRequest' conversion
The commit message of this commit says it's a workaround for a problem with lsi: - when a read is aborted due to a mark/EOF/EOD/EOM, the len reported to controller can be 0. LSI controller emulation doesn't know how to manage this. A workaround found is to call the completion routine with SCSI_REASON_DONE just after calling it with SCSI_REASON_DATA with len=0. Are you sure that it's not needed any more? Don't ask me. I didn't do the patch, and my knowledge of lsi HBA internals is scanty. Nic, can you comment here? Back to this topic... On my current WIP code, currently based on qemu upstream commit f711df67d611e4762966a249742a5f7499e19f99, I've just observed the following behaviour: Use -cdrom /dev/cdrom (which points to /dev/sr0) No disk in the drive, scsi-disk kicks in. test-unit-ready properly says there's no medium, so far so good. Now, my (buggy) guest code tries to do a READ_10. At this point, qemu hangs. What happens is that my scsi complete callback get called with reason 1 (DATA) and arg 0. I then perform no data copy, and call back sdev-info-read_data() which eventually calls me back again with reason 1 and arg 0, etc... in a loop. I haven't had a chance yet to look at what's happening in the guts of scsi-disk, but here, either I'm supposed to special case DATA with 0 bytes requested in my HBA driver, or there's a bug in scsi-disk. I can work around it by doing a cancel_io in that case and then completing the request is if reason had been 0 (DONE) but that's of course just a band-aid. I'll let you know what I find out when I get a chance to look at this again (hopefully soon). Cheers, Ben.
Re: [Qemu-devel] Re: [PATCH] PCI: Bus number from the bridge, not the device
On Wed, Dec 15, 2010 at 08:27:49AM -0700, Alex Williamson wrote: On Wed, 2010-12-15 at 11:56 +0200, Michael S. Tsirkin wrote: On Tue, Dec 14, 2010 at 11:34:53AM -0700, Alex Williamson wrote: On Tue, 2010-12-14 at 14:26 +0200, Michael S. Tsirkin wrote: On Mon, Dec 13, 2010 at 10:04:24PM -0700, Alex Williamson wrote: I've only ever seen config[PCI_SECONDARY_BUS] be non-zero for an assigned device, so I'm pretty sure we're not going to hurt migration, but the code is clearly wrong and I'd like to make sure we don't trip on a migration failure for a minor device config space change. Which reminds me: maybe just mark nested bridges as non-migrateable for now? Care writing such a patch? Hmm, this is trickier than it sounds. Hmm, since 0 is put in the path instead of the bridge number, will the correct bridge be restored? We're really only broken wrt migration if a device under a bridge calls qemu_ram_alloc. I guess there's more broken-ness. What exactly breaks qemu_ram_alloc? You're right, it's more broken than that. Anything that calls get_dev_path is broken for migration of bridges since the path is determined before the guest updates bus numbers. That includes qemu_ram_alloc and vmstate. I was only looking at the qemu_ram_alloc side. So perhaps the right answer, for the moment, is to block migration if there's a p2p bridge. Eww. That's bad. Anyway, I agree to disable it for the moment. Let's fix it after 0.14 release. -- yamahata
Re: [Qemu-devel] [PATCH] RFC: delay pci_update_mappings for 64-bit BARs
On Tue, Dec 14, 2010 at 10:42:43AM -0700, Cam Macdonell wrote: On Mon, Dec 13, 2010 at 8:00 PM, Isaku Yamahata yamah...@valinux.co.jp wrote: On Mon, Dec 13, 2010 at 03:43:44PM -0700, Cam Macdonell wrote: Do not call pci_update_mappings on the lower 32-bits of a 64-bit bar. ?Wait for the upper 32 or else Qemu will try to map on just the lower 32 which is probably going to corrupt memory. I was encountering crashes when mapping certain PCI region sizes. ?The problem turns out that pci_update_mappings is being called without all 64-bits in the BAR. ?For example when mapping to 0x18000, once the lower 32-bits were written the remapping happened (mapping to 0x800) which would overwrite something. I'm not certain if this is completely correct, I'm simply testing the lower 4-bits to only be MEM_TYPE_64 flag. ?Upper 32-bit address parts can be values like 0xff which is tricky to test against. You're assuming that guest OS always write lower 32bit and them upper 32bit. Is the assumption correct? I found Linux does, but I don't know about other OSes. And I couldn't find any sentence about how to update (64bit) BAR in the specs. (Please correct me if I missed it) I think you're right, we probably can't assume the order. Some work around would be necessary regardless of 32bit-or-64bit. because qemu doesn't emulate bus accurately at the moment. How about the followings? If BAR overlaps with RAM, don't map BAR. If BAR overlaps with other BARs, record the overlapping and when updating one of the BARs, update all the overlapping BARs. Which BAR wins depends on the order of updating, it doesn't matter because it's anomaly case. But the addresses in the BARs may not overlap. For example, Linux allocates memory from top down, so I recently had the mapping of a BAR to address 0xffc000 So BAR 0x18 sees 0xc004 Then BAR 0x1c sees 0xff So if I understand what you mean by overlapping BARs, 0xc000 and 0xffc000 will not be detected as overlapping and so we can't record it. But, we can allow harmless mappings of the incomplete lower-32 to proceed and then get remapped when the upper bits are written. (This is what happens currently, but fails when the lower-32 overwrite RAM). Case of writing upper-then-lower (non-Linux case): The addresses in the upper 32-bits are going to be limited to 16-bits (at most 48-bit addresses currently) and so those shouldn't update mappings because they will overlap with RAM. When the lower-bits are written, we have the full 64-bit address and can update mappings. Case of writing lower-then-upper: If the lower 32-bit BAR address doesn't conflict with RAM, map it. When the upper bits are written, update to the correct mapping. We would just have to ensure the first mapping is indeed harmless. Agreed. When only lower 32-bit is set, it may conflicts with other BAR. So when upper 32-bit is written, the conflicted BAR also needs update. It might be acceptable to update all pci devices instead of conflicted ones. Although it would be quite slow, setting BAR is not performance critical. Would that work? I hope so. Anyway it's just work around. Cam This way, 32bit BAR case is also covered. thanks, Cam --- ?hw/pci.c | ? ?5 - ?1 files changed, 4 insertions(+), 1 deletions(-) diff --git a/hw/pci.c b/hw/pci.c index 438c0d1..3b81792 100644 --- a/hw/pci.c +++ b/hw/pci.c @@ -1000,6 +1000,9 @@ void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val, int l) ?{ ? ? ?int i, was_irq_disabled = pci_irq_disabled(d); ? ? ?uint32_t config_size = pci_config_size(d); + ? ?int is_64 = 0; + + ? ?is_64 = ((val 0xf) == PCI_BASE_ADDRESS_MEM_TYPE_64); ? ? ?for (i = 0; i l addr + i config_size; val = 8, ++i) { ? ? ? ? ?uint8_t wmask = d-wmask[addr + i]; @@ -1008,7 +1011,7 @@ void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val, int l) ? ? ? ? ?d-config[addr + i] = (d-config[addr + i] ~wmask) | (val wmask); ? ? ? ? ?d-config[addr + i] = ~(val w1cmask); /* W1C: Write 1 to Clear */ ? ? ?} - ? ?if (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) || + ? ?if ((ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) (!is_64)) || ? ? ? ? ?ranges_overlap(addr, l, PCI_ROM_ADDRESS, 4) || ? ? ? ? ?ranges_overlap(addr, l, PCI_ROM_ADDRESS1, 4) || ? ? ? ? ?range_covers_byte(addr, l, PCI_COMMAND)) -- 1.7.0.4 -- yamahata -- yamahata
[Qemu-devel] Re: [PATCH 05/21] virtio: modify save/load handler to handle inuse varialble.
2010/12/3 Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp: 2010/12/2 Michael S. Tsirkin m...@redhat.com: On Wed, Dec 01, 2010 at 05:03:43PM +0900, Yoshiaki Tamura wrote: 2010/11/28 Michael S. Tsirkin m...@redhat.com: On Sun, Nov 28, 2010 at 08:27:58PM +0900, Yoshiaki Tamura wrote: 2010/11/28 Michael S. Tsirkin m...@redhat.com: On Thu, Nov 25, 2010 at 03:06:44PM +0900, Yoshiaki Tamura wrote: Modify inuse type to uint16_t, let save/load to handle, and revert last_avail_idx with inuse if there are outstanding emulation. Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp This changes migration format, so it will break compatibility with existing drivers. More generally, I think migrating internal state that is not guest visible is always a mistake as it ties migration format to an internal implementation (yes, I know we do this sometimes, but we should at least try not to add such cases). I think the right thing to do in this case is to flush outstanding work when vm is stopped. Then, we are guaranteed that inuse is 0. I sent patches that do this for virtio net and block. Could you give me the link of your patches? I'd like to test whether they work with Kemari upon failover. If they do, I'm happy to drop this patch. Yoshi Look for this: stable migration image on a stopped vm sent on: Wed, 24 Nov 2010 17:52:49 +0200 Thanks for the info. However, The patch series above didn't solve the issue. In case of Kemari, inuse is mostly 0 because it queues the output, and while last_avail_idx gets incremented immediately, not sending inuse makes the state inconsistent between Primary and Secondary. Hmm. Can we simply avoid incrementing last_avail_idx? I think we can calculate or prepare an internal last_avail_idx, and update the external when inuse is decremented. I'll try whether it work w/ w/o Kemari. Hi Michael, Could you please take a look at the following patch? commit 36ee7910059e6b236fe9467a609f5b4aed866912 Author: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp Date: Thu Dec 16 14:50:54 2010 +0900 virtio: update last_avail_idx when inuse is decreased. Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp diff --git a/hw/virtio.c b/hw/virtio.c index c8a0fc6..6688c02 100644 --- a/hw/virtio.c +++ b/hw/virtio.c @@ -237,6 +237,7 @@ void virtqueue_flush(VirtQueue *vq, unsigned int count) wmb(); trace_virtqueue_flush(vq, count); vring_used_idx_increment(vq, count); +vq-last_avail_idx += count; vq-inuse -= count; } @@ -385,7 +386,7 @@ int virtqueue_pop(VirtQueue *vq, VirtQueueElement *elem) unsigned int i, head, max; target_phys_addr_t desc_pa = vq-vring.desc; -if (!virtqueue_num_heads(vq, vq-last_avail_idx)) +if (!virtqueue_num_heads(vq, vq-last_avail_idx + vq-inuse)) return 0; /* When we start there are none of either input nor output. */ @@ -393,7 +394,7 @@ int virtqueue_pop(VirtQueue *vq, VirtQueueElement *elem) max = vq-vring.num; -i = head = virtqueue_get_head(vq, vq-last_avail_idx++); +i = head = virtqueue_get_head(vq, vq-last_avail_idx + vq-inuse); if (vring_desc_flags(desc_pa, i) VRING_DESC_F_INDIRECT) { if (vring_desc_len(desc_pa, i) % sizeof(VRingDesc)) { I'm wondering why last_avail_idx is OK to send but not inuse. last_avail_idx is at some level a mistake, it exposes part of our internal implementation, but it does *also* express a guest observable state. Here's the problem that it solves: just looking at the rings in virtio there is no way to detect that a specific request has already been completed. And the protocol forbids completing the same request twice. Our implementation always starts processing the requests in order, and since we flush outstanding requests before save, it works to just tell the remote 'process only requests after this place'. But there's no such requirement in the virtio protocol, so to be really generic we could add a bitmask of valid avail ring entries that did not complete yet. This would be the exact representation of the guest observable state. In practice we have rings of up to 512 entries. That's 64 byte per ring, not a lot at all. However, if we ever do change the protocol to send the bitmask, we would need some code to resubmit requests out of order, so it's not trivial. Another minor mistake with last_avail_idx is that it has some redundancy: the high bits in the index ( vq size) are not necessary as they can be got from avail idx. There's a consistency check in load but we really should try to use formats that are always consistent. The following patch does the same thing as original, yet keeps the format of the virtio. It shouldn't break live migration either because inuse should be 0. Yoshi Question is, can you flush to make inuse 0 in kemari too? And if not, how do you handle the fact that some requests are
[Qemu-devel] Re: [PATCH 11/21] ioport: insert event_tap_ioport() to ioport_write().
2010/11/28 Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp: 2010/11/28 Michael S. Tsirkin m...@redhat.com: On Thu, Nov 25, 2010 at 03:06:50PM +0900, Yoshiaki Tamura wrote: Record ioport event to replay it upon failover. Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp Interesting. This will have to be extended to support ioeventfd. Since each eventfd is really just a binary trigger it should be enough to read out the fd state. Haven't thought about eventfd yet. Will try doing it in the next spin. Hi Michael, I looked into eventfd and realized it's only used with vhost now. However, I believe vhost bypass the net layer in qemu, and there is no way for Kemari to detect the outputs. To me, it doesn't make sense to extend this patch to support eventfd... Thanks, Yoshi Yoshi --- ioport.c | 2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) diff --git a/ioport.c b/ioport.c index aa4188a..74aebf5 100644 --- a/ioport.c +++ b/ioport.c @@ -27,6 +27,7 @@ #include ioport.h #include trace.h +#include event-tap.h /***/ /* IO Port */ @@ -76,6 +77,7 @@ static void ioport_write(int index, uint32_t address, uint32_t data) default_ioport_writel }; IOPortWriteFunc *func = ioport_write_table[index][address]; + event_tap_ioport(index, address, data); if (!func) func = default_func[index]; func(ioport_opaque[address], address, data); -- 1.7.1.2 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html