Re: [Qemu-devel] LPC2011 Virtualization Micro Conf
Hi, For those who are interested, I have posted the notes from the 2011 Linux Plumbers Virtualization micro conference here: http://wiki.linuxplumbersconf.org/2011:virtualization Slides can be found by clicking on the presentation and going onto the Plumbers abstracts. Cheers, Jes
Re: [Qemu-devel] RFC: moving fsfreeze support from the userland guest agent to the guest kernel
On 07/27/11 18:40, Andrea Arcangeli wrote: Another thing to note is that snapshotting is not necessarily something that should be completely transparent to the guest. One of the planned future features for the guest agent (mentioned in the snapshot wiki, and a common use case that I've seen come up elsewhere as well in the context of database applications), is a way for userspace applications to register callbacks to be made in the event of a freeze (dumping application-managed caches to disk and things along that line). The Not sure if the scripts are really needed or if they would just open a brand new fsfreeze specific unix domain socket (created by the database) to tell the database to freeze. If the latter is the case, then it'd be better rather than changing the database to open unix domain socket so the script can connect to it when invoked (or maybe to just add some new function to the protocol of an existing open unix domain socket), to instead change the database to open a /dev/virtio-fsfreeze device, created by the virtio-fsfreeze.ko virtio driver through udev. The database would poll it, and it could read the request to freeze, and write into it that it finished freezing when done. Then when all openers of the device freezed, the virtio-fsfreeze.ko would go ahead freezing all the filesystems, and then tell qemu when it's finished freezing. Then qemu can finally block all the I/O and tell libvirt to go ahead with the snapshot. I think it could also be a combined operation, ie. having the freeze happen in the kernel, but doing the callouts using a userspace daemon. I like the userspace daemon for the callouts because it allows providing a more sophisticated API than if we provide just a socket like interface. In addition the callout is less critical wrt crashes than the fsfreeze operations. Cheers, Jes
Re: [Qemu-devel] RFC: moving fsfreeze support from the userland guest agent to the guest kernel
On 07/27/11 20:36, Christoph Hellwig wrote: Initiating the freeze from kernelspace doesn't make much sense. With virtio we could add in-band freeze request to the protocol, and although that would be a major change in that way virtio-blk works right now it's at least doable. But all other real storage targets only communicate with their initators over out of band procotols that are entirely handled in userspace, and given their high-level nature better are - that is if we know them at all given how vendors like to keep this secrete IP closed and just offer userspace management tools in binary form. building new infrastructure in the kernel just for virtio, while needing to duplicate the same thing in userspace for all real storage seems like a really bad idea. That is in addition to the userspace freeze notifier similar to what e.g. Windows has - if the freeze process is driven from userspace it's much easier to handle those properly compared to requiring kernel upcalls. The freeze operation would really just be a case of walking the list of mounted file systems and calling the FIFREEZE ioctl operation on them. I wouldn't anticipate doing anything else in a virtio-fsfreeze.ko module. Cheers, Jes
Re: [Qemu-devel] [PATCH] Introduce QEMU_NEW()
On 07/25/11 12:06, Stefan Hajnoczi wrote: +#define QEMU_NEW(type) ((type *)(qemu_malloc(sizeof(type +#define QEMU_NEWZ(type) ((type *)(qemu_mallocz(sizeof(type Does this mean we need to duplicate the type name for each allocation? struct foo *f; ... f = qemu_malloc(sizeof(*f)); Becomes: struct foo *f; ... f = QEMU_NEW(struct foo); If you ever change the name of the type you have to search-replace these instances. The idomatic C way works well, I don't see a reason to use QEMU_NEW(). You're right, and it will promote even more abuse of the ugly typedefs. This really makes the code less readable, especially for outsiders :( Jes
[Qemu-devel] [PATCH resend] Add missing trace call to oslib-posix.c:qemu_vmalloc()
From: Jes Sorensen jes.soren...@redhat.com Signed-off-by: Jes Sorensen jes.soren...@redhat.com --- oslib-posix.c |5 - 1 files changed, 4 insertions(+), 1 deletions(-) diff --git a/oslib-posix.c b/oslib-posix.c index 3a18e86..196099c 100644 --- a/oslib-posix.c +++ b/oslib-posix.c @@ -79,7 +79,10 @@ void *qemu_memalign(size_t alignment, size_t size) /* alloc shared memory pages */ void *qemu_vmalloc(size_t size) { -return qemu_memalign(getpagesize(), size); +void *ptr; +ptr = qemu_memalign(getpagesize(), size); +trace_qemu_vmalloc(size, ptr); +return ptr; } void qemu_vfree(void *ptr) -- 1.7.4.4
Re: [Qemu-devel] [PATCH] Introduce QEMU_NEW()
On 07/25/11 17:15, Anthony Liguori wrote: On 07/25/2011 10:10 AM, Jes Sorensen wrote: On 07/25/11 12:06, Stefan Hajnoczi wrote: +#define QEMU_NEW(type) ((type *)(qemu_malloc(sizeof(type +#define QEMU_NEWZ(type) ((type *)(qemu_mallocz(sizeof(type Does this mean we need to duplicate the type name for each allocation? struct foo *f; ... f = qemu_malloc(sizeof(*f)); Becomes: struct foo *f; ... f = QEMU_NEW(struct foo); If you ever change the name of the type you have to search-replace these instances. The idomatic C way works well, I don't see a reason to use QEMU_NEW(). You're right, and it will promote even more abuse of the ugly typedefs. This really makes the code less readable, especially for outsiders :( I don't think it really matters either way. If some people prefer to use g_new(struct foo, 1) vs. g_malloc(sizeof(*f)), I don't think it significantly impacts overall code readability. But having nice, documentation for key internal APIs does which is why using the glib interfaces makes sense IMHO. Using the commands consistently does have an impact, and at least with qemu_malloc() it is obvious what they are and how they behave. The proposed macros on the other hand requires everybody to go look up the macro to find out what it is trying to do. Jes
Re: [Qemu-devel] [PATCH] Introduce QEMU_NEW()
On 07/25/11 17:20, Avi Kivity wrote: On 07/25/2011 06:17 PM, Jes Sorensen wrote: Using the commands consistently does have an impact, and at least with qemu_malloc() it is obvious what they are and how they behave. The proposed macros on the other hand requires everybody to go look up the macro to find out what it is trying to do. That's true for every single function and macro that qemu defines. Of course, and every time it adds complexity for reading it. In this particular case it seems to simply make the code worse for no gain. Jes
Re: [Qemu-devel] [PATCH] Introduce QEMU_NEW()
On 07/25/11 17:24, Avi Kivity wrote: On 07/25/2011 06:21 PM, Jes Sorensen wrote: On 07/25/11 17:20, Avi Kivity wrote: On 07/25/2011 06:17 PM, Jes Sorensen wrote: Using the commands consistently does have an impact, and at least with qemu_malloc() it is obvious what they are and how they behave. The proposed macros on the other hand requires everybody to go look up the macro to find out what it is trying to do. That's true for every single function and macro that qemu defines. Of course, and every time it adds complexity for reading it. In this particular case it seems to simply make the code worse for no gain. I guess type safety doesn't matter to you. In my experience it's one of the lesser problems in the code. Jes
Re: [Qemu-devel] [PATCH] guest agent: qemu-ga daemon
On 07/23/11 18:10, Anthony Liguori wrote: qga/guest-agent-commands.c: In function ‘qmp_guest_fsfreeze_freeze’: qga/guest-agent-commands.c:443: error: ‘FIFREEZE’ undeclared (first use in this function) qga/guest-agent-commands.c:443: error: (Each undeclared identifier is reported only once qga/guest-agent-commands.c:443: error: for each function it appears in.) qga/guest-agent-commands.c: In function ‘qmp_guest_fsfreeze_thaw’: qga/guest-agent-commands.c:481: error: ‘FITHAW’ undeclared (first use in this function) The kernel probably doesn't implement FIFREEZE. You need to do a configure test and set CONFIG_FSFREEZE appropriately. I anticipated this and that's why I added CONFIG_FSFREEZE and didn't just do __linux__. That would be odd, FIFREEZE has been around since at least January 2009 according to git blame (fcccf502540e3d752d33b2d8e976034dee81f9f7). Is OpenSuSE 11 that old? That said, having a test for it being present is a good idea. Cheers, Jes
Re: [Qemu-devel] live snapshot wiki updated
On 07/19/11 18:46, Daniel P. Berrange wrote: On Tue, Jul 19, 2011 at 04:14:27PM +0100, Stefan Hajnoczi wrote: For fd-passing perhaps we have an opportunity to use a callback mechanism (QEMU request: filename - libvirt response: fd) and do all the image format parsing in QEMU. The reason why libvirt does the parsing of file headers to determine backing files is to maintain the trust boundary. Everything run from the exec() of QEMU onwards is considered untrusted code. So having QEMU parsing the file headers passing back open() requests to libvirt is breaking the trust boundary. Pardon, but I fail to see the issue here. If QEMU passes a filename back to libvirt, libvirt still gets to make the decision whether or not it is legitimate for QEMU to get that file descriptor or not. It doesn't change anything wrt who actually opens the file, hence the 'trust' is unchanged. NB, i'm not happy about libvirt having to have knowledge of file format headers, but we needed something more efficient reliable than invoking qemu-img info parsing the output. Ideally QEMU (or something else) would provide a library libblockformat.so with stable APIs for at least reading metadata about image formats. If it had APIs for image creation, etc too that would be a bonus, but we're more or less ok spawning qemu-img for those cases currently. Even having a library for libvirt to link against is suboptimal here. Two processes shouldn't be fighting over the internals of metadata, the ownership of the metadata belongs solely with QEMU. In addition you have the constant issue of dependencies there, hence if QEMU is updated and it provides a newer block format library, it may prevent libvirt from running forcing an update of libvirt as well. That is not acceptable for development. Cheers, Jes
Re: [Qemu-devel] live snapshot wiki updated
On 07/19/11 18:14, Anthony Liguori wrote: As nice as that sentiment is, it will never fly, because it would be a regression in current behavior. The whole reason that the virt_use_nfs SELinux bool exists is that some people are willing to make the partial security tradeoff. Besides, the use of sVirt via SELinux is more than just open() protection - while the current virt_use_nfs bool makes NFS less secure than otherwise possible, it still gives some nice guarantees to the rest of the qemu process such as passthrough accesses to local pci devices. Well leaving things at status quo is not making it worse, it just leaves an evil in place. NFS and SELinux is a fundamental problem with SELinux and NFS. We can piss and moan as much as we want about it but it's reality. SELinux fundamentally requires extended attributes. By the time NFS adds extended attribute support, we'll all be flying around in hover cars. As terrible as NFS is, people use it all of the time. It would be nice if libvirt had the ability to make better use of DAC to support isolation. The fact that MAC is the only way you can do isolation between guests is pretty unfortunate. If I could assign specific UIDs to a guest and use that to enforce isolation, it would go a long ways to solving this problem. Right, we're stuck with the two horros of NFS and selinux, so we need something that gets around the problem. In a sane world we would simply say 'no NFS, no selinux', but as you say that will never happen. My suggestion of a callback mechanism where libvirt registers the callback with QEMU for open() calls, allowing libvirt to perform the open and return the open file descriptor would get around this problem. Jes
Re: [Qemu-devel] live snapshot wiki updated
On 07/19/11 18:47, Daniel P. Berrange wrote: On Tue, Jul 19, 2011 at 04:30:19PM +0200, Jes Sorensen wrote: On 07/19/11 16:24, Eric Blake wrote: Besides, I feel that having a well-documented file format, so that independent applications can both parse the same file with the same semantics by obeying the file format specification, is a good design goal. We all know that documentation is rarely uptodate, new features may not get added and libvirt will never be able to keep up. The driver for a file format belongs in QEMU and nowhere else. This would be possible if QEMU to provide a libblockformat.so library which allowed apps to extract metadata from file formats using a stable API. There is no reason for libvirt or any external process to mess about with the internals of image files. You have the same problem if the image file is encrypted. Jes
Re: [Qemu-devel] live snapshot wiki updated
On 07/20/11 12:01, Kevin Wolf wrote: Right, we're stuck with the two horros of NFS and selinux, so we need something that gets around the problem. In a sane world we would simply say 'no NFS, no selinux', but as you say that will never happen. My suggestion of a callback mechanism where libvirt registers the callback with QEMU for open() calls, allowing libvirt to perform the open and return the open file descriptor would get around this problem. To me this sounds more like a problem than a solution. It basically means that during an open (which may even be initiated by a monitor command), you need monitor interaction. It basically means that open becomes asynchronous, and requires clients to deal with that, which sounds at least interesting... Also you have to add some magic to all places opening something. I think if libvirt wants qemu to use an fd instead of a file name, it shouldn't pass a file name but an fd in the first place. Which means that the two that we need are support for an fd: protocol (patches on the list, need review), and a way for libvirt to override the backing file of an image. The problem is that QEMU will find backing file file names inside the images which it will be unable to open. How do you suggest we get around that? Jes
Re: [Qemu-devel] live snapshot wiki updated
On 07/18/11 16:08, Stefan Hajnoczi wrote: On Fri, Jul 15, 2011 at 3:58 PM, Jes Sorensen jes.soren...@redhat.com wrote: I have been updating the live snapshot wiki for qemu to try and cover the commands we will want for async snapshot handling too. http://wiki.qemu.org/Features/Snapshots Regarding fd passing, do we even support SELinux today with backing files? Not sure I understand what you mean. The current code should be happy to take an existing file or a raw device for the snapshot. Jes
[Qemu-devel] [PATCH] Add missing documentation for qemu-img -p
From: Jes Sorensen jes.soren...@redhat.com Signed-off-by: Jes Sorensen jes.soren...@redhat.com --- qemu-img-cmds.hx |4 ++-- qemu-img.texi|6 -- 2 files changed, 6 insertions(+), 4 deletions(-) diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx index 2b70618..1299e83 100644 --- a/qemu-img-cmds.hx +++ b/qemu-img-cmds.hx @@ -30,7 +30,7 @@ ETEXI DEF(convert, img_convert, convert [-c] [-p] [-f fmt] [-t cache] [-O output_fmt] [-o options] [-s snapshot_name] filename [filename2 [...]] output_filename) STEXI -@item convert [-c] [-f @var{fmt}] [-O @var{output_fmt}] [-o @var{options}] [-s @var{snapshot_name}] @var{filename} [@var{filename2} [...]] @var{output_filename} +@item convert [-c] [-p] [-f @var{fmt}] [-O @var{output_fmt}] [-o @var{options}] [-s @var{snapshot_name}] @var{filename} [@var{filename2} [...]] @var{output_filename} ETEXI DEF(info, img_info, @@ -48,7 +48,7 @@ ETEXI DEF(rebase, img_rebase, rebase [-f fmt] [-t cache] [-p] [-u] -b backing_file [-F backing_fmt] filename) STEXI -@item rebase [-f @var{fmt}] [-u] -b @var{backing_file} [-F @var{backing_fmt}] @var{filename} +@item rebase [-f @var{fmt}] [-p] [-u] -b @var{backing_file} [-F @var{backing_fmt}] @var{filename} ETEXI DEF(resize, img_resize, diff --git a/qemu-img.texi b/qemu-img.texi index 526474c..495a1b6 100644 --- a/qemu-img.texi +++ b/qemu-img.texi @@ -38,6 +38,8 @@ by the used format or see the format descriptions below for details. indicates that target image must be compressed (qcow format only) @item -h with or without a command shows help and lists the supported formats +@item -p +display progress bar (convert and rebase commands only) @end table Parameters to snapshot subcommand: @@ -84,7 +86,7 @@ it doesn't need to be specified separately in this case. Commit the changes recorded in @var{filename} in its base image. -@item convert [-c] [-f @var{fmt}] [-O @var{output_fmt}] [-o @var{options}] [-s @var{snapshot_name}] @var{filename} [@var{filename2} [...]] @var{output_filename} +@item convert [-c] [-p] [-f @var{fmt}] [-O @var{output_fmt}] [-o @var{options}] [-s @var{snapshot_name}] @var{filename} [@var{filename2} [...]] @var{output_filename} Convert the disk image @var{filename} or a snapshot @var{snapshot_name} to disk image @var{output_filename} using format @var{output_fmt}. It can be optionally compressed (@code{-c} @@ -114,7 +116,7 @@ they are displayed too. List, apply, create or delete snapshots in image @var{filename}. -@item rebase [-f @var{fmt}] [-u] -b @var{backing_file} [-F @var{backing_fmt}] @var{filename} +@item rebase [-f @var{fmt}] [-p] [-u] -b @var{backing_file} [-F @var{backing_fmt}] @var{filename} Changes the backing file of an image. Only the formats @code{qcow2} and @code{qed} support changing the backing file. -- 1.7.4.4
Re: [Qemu-devel] live snapshot wiki updated
On 07/19/11 15:23, Stefan Hajnoczi wrote: On Tue, Jul 19, 2011 at 8:24 AM, Jes Sorensen jes.soren...@redhat.com wrote: On 07/18/11 16:08, Stefan Hajnoczi wrote: On Fri, Jul 15, 2011 at 3:58 PM, Jes Sorensen jes.soren...@redhat.com wrote: I have been updating the live snapshot wiki for qemu to try and cover the commands we will want for async snapshot handling too. http://wiki.qemu.org/Features/Snapshots Regarding fd passing, do we even support SELinux today with backing files? Not sure I understand what you mean. The current code should be happy to take an existing file or a raw device for the snapshot. Sorry, I was off on a tangent. I think today QEMU does not support opening image files with a backing file purely using file descriptors. We currently require the ability to open files. I see what you mean - I don't actually know how that would work, since the backing file specified in the front image will be a file name. Eric, what happens if libvirt in an selinux environment tells QEMU to launch using an image file that is backed by backing file(s)? Cheers, Jes
Re: [Qemu-devel] live snapshot wiki updated
On 07/19/11 15:58, Eric Blake wrote: On 07/19/2011 07:27 AM, Jes Sorensen wrote: Eric, what happens if libvirt in an selinux environment tells QEMU to launch using an image file that is backed by backing file(s)? Before starting qemu, libvirt first parses all the image files, to see if any of them have backing images. For every qcow2 or qed image with a backing file, libvirt sets the SELinux context of both the qcow2 image and its backing file so that qemu will be able to successfully open() them. But if any of those files reside on NFS, then it is not possible to label individual files, so it requires setting the SELinux bool virt_use_nfs, which thus gives qemu the power to open() arbitrary files on NFS, and you've lost security. Urgh, libvirt parsing image files is really unfortunate, it really doesn't give me warm fuzzy feelings :( libvirt really should not know about internals of image formats. It would be nice if libvirt had a way to pass fds for every disk and backing file up front; then, SELinux can work around the lack of NFS per-file labelling by blocking open() in qemu. In fact, this has already been proposed: A cleaner solution seems to have libvirt provide a call-back allowing QEMU to call out and have libvirt open a file descriptor instead. This way libvirt can validate it and open it for QEMU and pass it back. If we cannot do something like this, I would prefer to have backing files on NFS should simply not be supported when running in an selinux setup. Cheers, Jes
Re: [Qemu-devel] live snapshot wiki updated
On 07/19/11 16:24, Eric Blake wrote: [adding the libvir-list] On 07/19/2011 08:09 AM, Jes Sorensen wrote: Urgh, libvirt parsing image files is really unfortunate, it really doesn't give me warm fuzzy feelings :( libvirt really should not know about internals of image formats. But even if you add new features to qemu to avoid needing this in the future, it doesn't change the past - libvirt will always have to know how to parse image files understood by older qemu, and so as long as libvirt already knows how to do that parsing, we might as well take advantage of it. What has been done here in the past is plain wrong. Continuing to do it isn't the right thing to do here. Besides, I feel that having a well-documented file format, so that independent applications can both parse the same file with the same semantics by obeying the file format specification, is a good design goal. We all know that documentation is rarely uptodate, new features may not get added and libvirt will never be able to keep up. The driver for a file format belongs in QEMU and nowhere else. It would be nice if libvirt had a way to pass fds for every disk and backing file up front; then, SELinux can work around the lack of NFS per-file labelling by blocking open() in qemu. In fact, this has already been proposed: A cleaner solution seems to have libvirt provide a call-back allowing QEMU to call out and have libvirt open a file descriptor instead. This way libvirt can validate it and open it for QEMU and pass it back. Yes, that could probably be made to work with libvirt. I am a little frustrated this approach wasn't taken up front instead of the evil hack of having libvirt attempt to parse image files. If we cannot do something like this, I would prefer to have backing files on NFS should simply not be supported when running in an selinux setup. As nice as that sentiment is, it will never fly, because it would be a regression in current behavior. The whole reason that the virt_use_nfs SELinux bool exists is that some people are willing to make the partial security tradeoff. Besides, the use of sVirt via SELinux is more than just open() protection - while the current virt_use_nfs bool makes NFS less secure than otherwise possible, it still gives some nice guarantees to the rest of the qemu process such as passthrough accesses to local pci devices. Well leaving things at status quo is not making it worse, it just leaves an evil in place. Just because it is currently not as secure to mix NFS shared storage with backing files doesn't stop some people from wanting to do it [in fact, that's my current development setup - I use qcow2 images on NFS shared storage, keep SELinux enabled, and enable the virt_use_nfs bool]. This discussion is about adding enhancements that make SELinux even more powerful when using NFS shared storage, by adding fd passing (whether libvirt parses in advance, or whether qemu raises an event and requires feedback from libvirt), and not about crippling the existing capability to use the virt_use_nfs selinux bool. I do not believe we should try and add extra interfaces to support something which is inherently broken. This really boils down to whether we should support fd passing for snapshots in the first place. If it is to support the broken setup of libvirt parsing image files, then I am totally against it, if we work on a proper solution that involves this in some way, then we can discuss it. Cheers, Jes
[Qemu-devel] live snapshot wiki updated
Hi, I have been updating the live snapshot wiki for qemu to try and cover the commands we will want for async snapshot handling too. http://wiki.qemu.org/Features/Snapshots Cheers, Jes
Re: [Qemu-devel] [PATCH v3] QMP: add snapshot_blkdev command
On 07/11/11 22:35, Luiz Capitulino wrote: Sorry that is no go, you just broke the hmp implementation - you cannot change the hmp behavior like that. HMP uses positional arguments, so changing argument names makes no difference. And, apart from some exceptions, it's not an stable interface, anyway... I guess you're right about the naming not affecting the hmp interface. However hmp is far more usable to end users than qmp, so yes it does matter not to change the interface at random. Jes
Re: [Qemu-devel] Taking live snapshots of running VMs
On 07/09/11 00:24, Ahmed M. Azab wrote: Hi All, Is there a way to take a live memory snapshot of a running VM without freezing or stopping this VM? I explored the Qemu code and documentation and I found two ways to take a snapshot: What you are talking about is called a 'checkpoint', not a snapshot. There has been a lot of confusion around the naming, but it really helps if we try to use the right names. Checkpoints are for things you want to be able to restart, ie. with memory, snapshots are just data, ie. disks. Cheers, Jes
[Qemu-devel] [PATCH v3] QMP: add snapshot_blkdev command
From: Jes Sorensen jes.soren...@redhat.com Add QMP bits for snapshot_blkdev command. This is the same as snapshot_blkdev in the human monitor. The command is synchronous. In the future async commands and or a break down of the functionality into multiple commands might be added. Signed-off-by: Jes Sorensen jes.soren...@redhat.com --- qmp-commands.hx | 35 +++ 1 files changed, 35 insertions(+), 0 deletions(-) diff --git a/qmp-commands.hx b/qmp-commands.hx index 92c5c3a..2b9f6af 100644 --- a/qmp-commands.hx +++ b/qmp-commands.hx @@ -694,6 +694,41 @@ Example: EQMP { +.name = blockdev-snapshot-sync, +.args_type = device:B,snapshot-file:s?,format:s?, +.params = device [new-image-file] [format], +.user_print = monitor_user_noop, +.mhandler.cmd_new = do_snapshot_blkdev, +}, + +SQMP +blockdev-snapshot-sync +-- + +Synchronous snapshot of a block device. snapshot_file specifies the +target of the new image. If the file exists, or if it is a device, the +snapshot will be created in the existing file/device. If does not +exist, a new file will be created. format specifies the format of the +snapshot image, default is qcow2. + +Arguments: + +- device: device name to snapshot (json-string) +- snapshot_file: name of new image file (json-string) +- format: format of new image (json-string) + +Example: + +- { execute: blockdev-snapshot, arguments: { device: ide-hd0, +snapshot-file: +/some/place/my-image, + format: qcow2 + } } +- { return: {} } + +EQMP + +{ .name = balloon, .args_type = value:M, .params = target, -- 1.7.4.4
Re: [Qemu-devel] [PATCH 1/1] QMP: add snapshot_blkdev command
On 07/11/11 18:35, Luiz Capitulino wrote: On Fri, 8 Jul 2011 14:00:13 +0200 jes.soren...@redhat.com wrote: From: Jes Sorensen jes.soren...@redhat.com Add QMP bits for snapshot_blkdev command. This is the same as snapshot_blkdev in the human monitor. The command is synchronous. In the future async commands and or a break down of the functionality into multiple commands might be added. Signed-off-by: Jes Sorensen jes.soren...@redhat.com --- qmp-commands.hx | 38 ++ 1 files changed, 38 insertions(+), 0 deletions(-) diff --git a/qmp-commands.hx b/qmp-commands.hx index 92c5c3a..eb135c1 100644 --- a/qmp-commands.hx +++ b/qmp-commands.hx @@ -694,6 +694,44 @@ Example: EQMP { +.name = blockdev-snapshot, blockdev-snapshot-sync. argh, will fix +.args_type = device:B,snapshot_file:s?,format:s?, +.params = device [new-image-file] [format], +.help = initiates a live snapshot\n\t\t\t + of device. If a new image file is specified, the\n\t\t\t + new image file will become the new root image.\n\t\t\t + If format is specified, the snapshot file will\n\t\t\t + be created in that format. Otherwise the\n\t\t\t + snapshot will be internal! (currently unsupported), The 'otherwise' part is a bit confusing. You document something as if it were supported then you say it's not supported. I recommend to just not document it instead. Ok +.user_print = monitor_user_noop, +.mhandler.cmd_new = do_snapshot_blkdev, +}, + +SQMP +blockdev-snapshot-sync +-- + +Synchronous snapshot of block device, using snapshot file as target, +if provided. In QMP only this text is used, the text in '.help' is discarded. Please, move all command documentation here. If .help isn't used, then please remove it from the struct. It is really pointless to keep carrying both around. + +Arguments: + +- device: device name to snapshot (json-string) +- snapshot_file: name of new image file (json-string) +- format: format of new image (json-string) + +Example: + +- { execute: blockdev-snapshot, arguments: { device: ide-hd0, +snapshot_file: We are trying to standardize the use of hyphen in QMP. Sorry, but I haven't got a clue what you mean here. Jes
Re: [Qemu-devel] [PATCH v3] QMP: add snapshot_blkdev command
On 07/11/11 22:24, Luiz Capitulino wrote: On Mon, 11 Jul 2011 20:01:09 +0200 jes.soren...@redhat.com wrote: From: Jes Sorensen jes.soren...@redhat.com Add QMP bits for snapshot_blkdev command. This is the same as snapshot_blkdev in the human monitor. The command is synchronous. In the future async commands and or a break down of the functionality into multiple commands might be added. Signed-off-by: Jes Sorensen jes.soren...@redhat.com --- qmp-commands.hx | 35 +++ 1 files changed, 35 insertions(+), 0 deletions(-) diff --git a/qmp-commands.hx b/qmp-commands.hx index 92c5c3a..2b9f6af 100644 --- a/qmp-commands.hx +++ b/qmp-commands.hx @@ -694,6 +694,41 @@ Example: EQMP { +.name = blockdev-snapshot-sync, +.args_type = device:B,snapshot-file:s?,format:s?, You changed from an underline to a hyphen as I asked, but the implementation still expects an underline. I fixed that myself to avoid multiple submissions because of a small thing (also fixed the command name in the subject). The patch I merged follows for reference. Please, test your patches before submitting next time. Sorry that is no go, you just broke the hmp implementation - you cannot change the hmp behavior like that. Jes
[Qemu-devel] [PATCH 0/1] QMP command for snapshot_blkdev
From: Jes Sorensen jes.soren...@redhat.com Hi, I discussed the issue of a QMP command for live snapshot with Anthony, and we have agreed that it is fine to have a QMP command that matches the current human monitor command. This doesn't preclude that in the future someone may want to add support breaking live snapshots into multiple commands. Cheers, Jes Jes Sorensen (1): QMP: add snapshot_blkdev command qmp-commands.hx | 38 ++ 1 files changed, 38 insertions(+), 0 deletions(-) -- 1.7.4.4
[Qemu-devel] [PATCH 1/1] QMP: add snapshot_blkdev command
From: Jes Sorensen jes.soren...@redhat.com Add QMP bits for snapshot_blkdev command. This is the same as snapshot_blkdev in the human monitor. The command is synchronous. In the future async commands and or a break down of the functionality into multiple commands might be added. Signed-off-by: Jes Sorensen jes.soren...@redhat.com --- qmp-commands.hx | 38 ++ 1 files changed, 38 insertions(+), 0 deletions(-) diff --git a/qmp-commands.hx b/qmp-commands.hx index 92c5c3a..eb135c1 100644 --- a/qmp-commands.hx +++ b/qmp-commands.hx @@ -694,6 +694,44 @@ Example: EQMP { +.name = blockdev-snapshot, +.args_type = device:B,snapshot_file:s?,format:s?, +.params = device [new-image-file] [format], +.help = initiates a live snapshot\n\t\t\t + of device. If a new image file is specified, the\n\t\t\t + new image file will become the new root image.\n\t\t\t + If format is specified, the snapshot file will\n\t\t\t + be created in that format. Otherwise the\n\t\t\t + snapshot will be internal! (currently unsupported), +.user_print = monitor_user_noop, +.mhandler.cmd_new = do_snapshot_blkdev, +}, + +SQMP +blockdev-snapshot-sync +-- + +Synchronous snapshot of block device, using snapshot file as target, +if provided. + +Arguments: + +- device: device name to snapshot (json-string) +- snapshot_file: name of new image file (json-string) +- format: format of new image (json-string) + +Example: + +- { execute: blockdev-snapshot, arguments: { device: ide-hd0, +snapshot_file: +/some/place/my-image, + format: qcow2 + } } +- { return: {} } + +EQMP + +{ .name = balloon, .args_type = value:M, .params = target, -- 1.7.4.4
Re: [Qemu-devel] migration: new sections and backward compatibility.
On 07/07/11 09:30, Avi Kivity wrote: On 07/07/2011 10:14 AM, Gerd Hoffmann wrote: Can't we just implicitly fail migration whenever there's a device in the tree that doesn't have VMSTATE? There are cases where the device doesn't need to save state, so that alone doesn't cut it. It should then say so by having an empty VMSTATE descriptor. It seems reasonable to me to introduce a situation where devices have to explicitly marked as migration compatible and fail if there are devices in the system which are not. Even for the case like USB devices where migration might simply force a replug of the devices. Cheers, Jes
[Qemu-devel] [PATCH 1/1] usb_register_port(): do not set port-opaque and port-index twice
From: Jes Sorensen jes.soren...@redhat.com Signed-off-by: Jes Sorensen jes.soren...@redhat.com --- hw/usb-bus.c |2 -- 1 files changed, 0 insertions(+), 2 deletions(-) diff --git a/hw/usb-bus.c b/hw/usb-bus.c index 2abce12..6e082ab 100644 --- a/hw/usb-bus.c +++ b/hw/usb-bus.c @@ -145,8 +145,6 @@ void usb_register_port(USBBus *bus, USBPort *port, void *opaque, int index, { port-opaque = opaque; port-index = index; -port-opaque = opaque; -port-index = index; port-ops = ops; port-speedmask = speedmask; QTAILQ_INSERT_TAIL(bus-free, port, next); -- 1.7.4.4
Re: [Qemu-devel] [PATCH 08/10] cocoa: Revert dependency on VNC
On 06/14/11 03:22, Andreas Färber wrote: In 821601ea5b02a68ada479731a4d3d07a9876632a (Make VNC support optional) cocoa.o was moved from ui-obj-$(CONFIG_COCOA) to vnc-obj-$(CONFIG_COCOA), adding a dependency on $(CONFIG_VNC). That must've been unintentional. Cc: Jes Sorensen jes.soren...@redhat.com Cc: Anthony Liguori aligu...@us.ibm.com Signed-off-by: Andreas Färber andreas.faer...@web.de --- Makefile.objs |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) Looks like an error from my side, sorry about that. Patch looks good. Jes
Re: [Qemu-devel] [patch 1/7] add migration_active function
On 05/24/11 06:31, Marcelo Tosatti wrote: To query whether migration is active. Signed-off-by: Marcelo Tosatti mtosa...@redhat.com ACK Jes
Re: [Qemu-devel] [RFC] live snapshot, live merge, live block migration
On 05/20/11 14:19, Stefan Hajnoczi wrote: I'm interested in what the API for snapshots would look like. I presume you're talking external snapshots here? The API is really what should be defined by libvirt, so you get a unified API that can work both on QEMU level snapshots as well as enterprise storage, host file system snapshots etc. Specifically how does user software do the following: 1. Create a snapshot There's a QMP patch out already that is still not applied, but it is pretty simple, similar to the hmp command. Alternatively you can do it the evil way by pre-creating the snapshot image file and feeding that the snapshot command. In this case QEMU won't create the snapshot file. 2. Delete a snapshot This is still to be defined. 3. List snapshots Again this is tricky as it depends on the type of snapshot. For QEMU level ones they are files, so 'ls' is your friend :) 4. Access data from a snapshot You boot the snapshot file. 5. Restore a VM from a snapshot We're talking snapshots not checkpointing here, so you cannot restore a VM from a snapshot. 6. Get the dirty blocks list (for incremental backup) Good question We've discussed image format-level approaches but I think the scope of the API should cover several levels at which snapshots are implemented: 1. Image format - image file snapshot (Jes, Jagane) 2. Host file system - ext4 and btrfs snapshots 3. Storage system - LVM or SAN volume snapshots It will be hard to take advantage of more efficient host file system or storage system snapshots if they are not designed in now. Is anyone familiar enough with the libvirt storage APIs to draft an extension that adds snapshot support? I will take a stab at it if no one else want to try it. I believe the libvirt guys are already looking at this. Adding to the CC list. Cheers, Jes
Re: [Qemu-devel] [RFC] live snapshot, live merge, live block migration
On 05/20/11 14:49, Stefan Hajnoczi wrote: On Fri, May 20, 2011 at 1:39 PM, Jes Sorensen jes.soren...@redhat.com wrote: On 05/20/11 14:19, Stefan Hajnoczi wrote: I'm interested in what the API for snapshots would look like. I presume you're talking external snapshots here? The API is really what should be defined by libvirt, so you get a unified API that can work both on QEMU level snapshots as well as enterprise storage, host file system snapshots etc. Thanks for the pointers on external snapshots using image files. I'm really thinking about the libvirt API. Basically I'm not sure we'll implement the right things if we don't think through the API that the user sees first. Right, I agree. There's a lot of variables there, and they are not necessarily easy to map into a single namespace. I am not sure it should be done either.. Cheers, Jes
Re: [Qemu-devel] [RFC] live snapshot, live merge, live block migration
On 05/09/11 15:40, Dor Laor wrote: Summary: * We need Marcelo's new (to come) block copy implementation * should work in parallel to migration and hotplug * General copy on read is desirable * Live snapshot merge to be implemented using block copy * Need to utilize a remote block access protocol (iscsi/nbd/other) Which one is the best? * Keep qemu-img the single interface for dirty block mappings. * Live block migration pre copy == live copy + block access protocol + live migration * Live block migration post copy == live migration + block access protocol/copy on read. Comments? I think we should add Jagane Sundar's Livebackup to the watch list here. It looks very interesting as an alternative way to reach some of the same goals. Cheers, Jes
Re: [Qemu-devel] [RFC] live snapshot, live merge, live block migration
On 05/09/11 17:23, Anthony Liguori wrote: * Live snapshots and live snapshot merge Live snapshot is already incorporated (by Jes) in qemu (still need qemu-agent work to freeze the guest FS). Live snapshot is unfortunately not really live. It runs a lot of operations synchronously which will cause the guest to incur downtime. We really need to refactor it to truly be live. We keep having this discussion, but as pointed out in my last reply on this, you can pre-create your image if you so desire. The actual snapshot then becomes less in one command. Yes we can make it even nicer, but what we have now is far less bad than you make it out to be. Cheers, Jes
Re: [Qemu-devel] KVM call agenda for May 10th
On 05/09/11 13:50, Juan Quintela wrote: Please send in any agenda items you are interested in covering. From last week, we have already: - import kvm headers into qemu, drop #ifdef maze (Jan) Thanks, Juan. Since we haven't received any further agenda items. In addition Anthony is unavailable, plus all the guys in Tel Aviv is on holiday today - lets postpone until next week. CANCELED! Jes
Re: [Qemu-devel] [PATCH 1/2] Add documentation for qemu_progres_print()
On 05/06/11 17:10, Markus Armbruster wrote: Jes Sorensen jes.soren...@redhat.com writes: What you add is a delta, which is relative to the max. We can change the argument name of the function to be delta instead if that makes it easier to follow. Here's my try: /* * Report progress. * @percent is how much progress we made. * If @max is zero, @percent is how much of the job is done. * Else, @percent is a progress delta since the last call, as a fraction * of @max. I.e. delta is @percent * @max / 100. This is for * convenience, it lets you account for @max% of the total work in some * function, and still count @percent from 0 to 100. */ Thanks! I made it a little more explicit based on your input: /* * Report progress. * @delta is how much progress we made. * If @max is zero, @delta is an absolut value of the total job done. * Else, @delta is a progress delta since the last call, as a fraction * of @max. I.e. the delta is @delta * @max / 100. This allows * relative accounting of functions which may be a different fraction of * the full job, depending on the context they are called in. I.e. * a function might be considered 40% of the full job if used from * bdrv_img_create() but only 20% if called from img_convert(). */ Document min_skip with qemu_progress_init(): /* * Initialize progress reporting. * If @enabled is false, actual reporting is suppressed. The user can * still trigger a report by sending SIGUSR1. * Reports are also suppressed unless we've had at least @min_skip * percent progress since the last report. */ excellent! To be honest, the @max feature is a pain to document. I'd ditch it. For non-zero max, qemu_progress_report(x, max) becomes qemu_progress_report(x * max/100) I have to say I prefer having the max setting here - what would be an option would be to keep the max value as a state, and then have a qemu_progress_set_current_max() or something like that, so you wouldn't have to type it every time? Not much of an inconvenience, in my opinion. None at all when max is 100, which is the case for all non-zero max arguments so far. The reason for introducing this is that some functions are called in different ways, and to keep the same accounting code, it would be possible to add the 'max' argument to those functions when they do their counting. It is an attempt to be forward compatible for when it happens :) The only use of the absolute feature (zero max) so far is setting it to all done, like this: qemu_progress_print(100, 0); Can be done just as well with a delta = 100, e.g. qemu_progress_print(100); If a need for setting other absolute progress arises, I'd consider adding second function. Getting rid of the absolute setting would be fine with me. You're right that it is quite easy to set it that way. Cheers, Jes
[Qemu-devel] [PATCH replacement 1/1] Add documentation for qemu_progress_{init, print}()
From: Jes Sorensen jes.soren...@redhat.com Signed-off-by: Jes Sorensen jes.soren...@redhat.com --- qemu-common.h |2 +- qemu-progress.c | 24 +--- 2 files changed, 22 insertions(+), 4 deletions(-) diff --git a/qemu-common.h b/qemu-common.h index f9f705d..78b7a4a 100644 --- a/qemu-common.h +++ b/qemu-common.h @@ -346,7 +346,7 @@ void qemu_iovec_memset_skip(QEMUIOVector *qiov, int c, size_t count, void qemu_progress_init(int enabled, float min_skip); void qemu_progress_end(void); -void qemu_progress_print(float percent, int max); +void qemu_progress_print(float delta, int max); #define QEMU_FILE_TYPE_BIOS 0 #define QEMU_FILE_TYPE_KEYMAP 1 diff --git a/qemu-progress.c b/qemu-progress.c index a4894c0..8ebe8ef 100644 --- a/qemu-progress.c +++ b/qemu-progress.c @@ -96,6 +96,13 @@ static void progress_dummy_init(void) state.end = progress_dummy_end; } +/* + * Initialize progress reporting. + * If @enabled is false, actual reporting is suppressed. The user can + * still trigger a report by sending a SIGUSR1. + * Reports are also suppressed unless we've had at least @min_skip + * percent progress since the last report. + */ void qemu_progress_init(int enabled, float min_skip) { state.min_skip = min_skip; @@ -111,14 +118,25 @@ void qemu_progress_end(void) state.end(); } -void qemu_progress_print(float percent, int max) +/* + * Report progress. + * @delta is how much progress we made. + * If @max is zero, @delta is an absolut value of the total job done. + * Else, @delta is a progress delta since the last call, as a fraction + * of @max. I.e. the delta is @delta * @max / 100. This allows + * relative accounting of functions which may be a different fraction of + * the full job, depending on the context they are called in. I.e. + * a function might be considered 40% of the full job if used from + * bdrv_img_create() but only 20% if called from img_convert(). + */ +void qemu_progress_print(float delta, int max) { float current; if (max == 0) { -current = percent; +current = delta; } else { -current = state.current + percent / 100 * max; +current = state.current + delta / 100 * max; } if (current 100) { current = 100; -- 1.7.4.4
[Qemu-devel] [PATCH qemu-block 0/2] cleanup progress code
From: Jes Sorensen jes.soren...@redhat.com Hi, Two small patches cleaning up the progress printing code - adding documentation and removing some unneeded paranthesis. Also know as the 'happy markus' patch series This is relative to the block branch. Jes Jes Sorensen (2): Add documentation for qemu_progres_print() qemu-img.c: Remove superfluous parenthesis qemu-img.c |6 +++--- qemu-progress.c |8 2 files changed, 11 insertions(+), 3 deletions(-) -- 1.7.4.4
[Qemu-devel] [PATCH 1/2] Add documentation for qemu_progres_print()
From: Jes Sorensen jes.soren...@redhat.com Signed-off-by: Jes Sorensen jes.soren...@redhat.com --- qemu-progress.c |8 1 files changed, 8 insertions(+), 0 deletions(-) diff --git a/qemu-progress.c b/qemu-progress.c index a4894c0..70928d6 100644 --- a/qemu-progress.c +++ b/qemu-progress.c @@ -111,6 +111,14 @@ void qemu_progress_end(void) state.end(); } +/* + * Add delta to current state, and print the output if the current + * state has progressed more than min_skip since the last value was + * printed. 'max' specifies the relative percentage, ie. a function + * can count for 30% of the total work, and still count from 0-100, by + * setting max to 30. If max is set to zero, the percent argument + * becomes an absolute value for current state. + */ void qemu_progress_print(float percent, int max) { float current; -- 1.7.4.4
[Qemu-devel] [PATCH 2/2] qemu-img.c: Remove superfluous parenthesis
From: Jes Sorensen jes.soren...@redhat.com Signed-off-by: Jes Sorensen jes.soren...@redhat.com --- qemu-img.c |6 +++--- 1 files changed, 3 insertions(+), 3 deletions(-) diff --git a/qemu-img.c b/qemu-img.c index e825123..1da5484 100644 --- a/qemu-img.c +++ b/qemu-img.c @@ -785,7 +785,7 @@ static int img_convert(int argc, char **argv) nb_sectors = total_sectors; local_progress = (float)100 / -(nb_sectors / MIN(nb_sectors, (cluster_sectors))); +(nb_sectors / MIN(nb_sectors, cluster_sectors)); for(;;) { int64_t bs_num; @@ -856,7 +856,7 @@ static int img_convert(int argc, char **argv) sector_num = 0; // total number of sectors converted so far nb_sectors = total_sectors - sector_num; local_progress = (float)100 / -(nb_sectors / MIN(nb_sectors, (IO_BUF_SIZE / 512))); +(nb_sectors / MIN(nb_sectors, IO_BUF_SIZE / 512)); for(;;) { nb_sectors = total_sectors - sector_num; @@ -1331,7 +1331,7 @@ static int img_rebase(int argc, char **argv) bdrv_get_geometry(bs, num_sectors); local_progress = (float)100 / -(num_sectors / MIN(num_sectors, (IO_BUF_SIZE / 512))); +(num_sectors / MIN(num_sectors, IO_BUF_SIZE / 512)); for (sector = 0; sector num_sectors; sector += n) { /* How many sectors can we handle with the next read? */ -- 1.7.4.4
Re: [Qemu-devel] [PATCH 1/2] Add documentation for qemu_progres_print()
On 05/06/11 12:40, Brad Hards wrote: On Fri, 6 May 2011 07:39:10 PM jes.soren...@redhat.com wrote: +/* + * Add delta to current state, and print the output if the current + * state has progressed more than min_skip since the last value was + * printed. 'max' specifies the relative percentage, ie. a function + * can count for 30% of the total work, and still count from 0-100, by + * setting max to 30. If max is set to zero, the percent argument + * becomes an absolute value for current state. + */ void qemu_progress_print(float percent, int max) I hate to critique anyone adding docs, but this makes no sense at all to me without reading the code. Is percent the amount we are adding (i.e. the delta) or the result (i.e. absolute progress)? Or does it vary according to the value of max? What you add is a delta, which is relative to the max. We can change the argument name of the function to be delta instead if that makes it easier to follow. Cheers, Jes
Re: [Qemu-devel] [PATCH v2 3/3] CPUPhysMemoryClient: Batch contiguous addresses when playing catchup
On 05/05/11 16:21, Alex Williamson wrote: A bit worried that ram_addr_t size might thinkably overflow (it's just a long, could be a 4G ram). Break it out when it fills up? struct CPUPhysMemoryClient { void (*set_memory)(struct CPUPhysMemoryClient *client, target_phys_addr_t start_addr, ram_addr_t size, ram_addr_t phys_offset); ram_addr_t seems to be the standard for describing these types of things. It's an unsigned long, so 4G is only concern for 32b builds, which don't support that much memory anyway. Please apply. Thanks, A memory size can obviously not be bigger than the maximum physical address, so I find it really hard to see how this could overflow. It seems fair to use it for the size here. Acked-by: Jes Sorensen jes.soren...@redhat.com
Re: [Qemu-devel] [PATCH v2 3/3] CPUPhysMemoryClient: Batch contiguous addresses when playing catchup
On 05/05/11 17:18, Michael S. Tsirkin wrote: A memory size can obviously not be bigger than the maximum physical address, so I find it really hard to see how this could overflow. For example, a 4G size does not fit in 32 bits. That is the only corner case - you can handle that by -1 if you like. Jes
Re: [Qemu-devel] [PATCH v2 3/3] CPUPhysMemoryClient: Batch contiguous addresses when playing catchup
On 05/05/11 17:38, Michael S. Tsirkin wrote: On Thu, May 05, 2011 at 05:36:04PM +0200, Jes Sorensen wrote: On 05/05/11 17:18, Michael S. Tsirkin wrote: A memory size can obviously not be bigger than the maximum physical address, so I find it really hard to see how this could overflow. For example, a 4G size does not fit in 32 bits. That is the only corner case True. you can handle that by -1 if you like. But then all users need to be updated. Seems easier to break out of the loop easier. It's likely not a real problem, certainly not on a pc, don't know about other systems. I think it is quite fair to limit the amount of memory we support when running 32 bit qemu binaries. I would expect more things to break than just this if we tried to support 4GB of RAM on a 32 bit host. Cheers, Jes
Re: [Qemu-devel] [PATCH v2 1/1] Add QMP bits for blockdev-snapshot-sync.
On 04/29/11 15:45, Anthony Liguori wrote: On 04/29/2011 08:38 AM, Jes Sorensen wrote: It is exactly the same for the management tool: - Creation of the new image either succeeds or fails - Switchover either succeeds or fails Creating an image can be treated as an atomic operation. It either fully succeeds or you assume it failed and throw the image away since you can always try again. When you combine creating an image with another operation, you create a a single operation that cannot be treated as atomic which makes recovery from failure much more difficult. As explained in previous emails, you still have the option to do this with the current implementation, if you so desire. It's a bad idea, but for those who insist on doing the wrong thing, sure go ahead. Just create the image first, and then pass it in as the snapshot file rather than specifying a snapshot file that doesn't exist already. 3) Begin switch over to new image 4) Switch over image. 5) Receive notification when it's complete Sorry but this isn't an accurate description of the process. Once you have a new image ready, you need to halt writes, flush buffers, and then you can do the switch, which has to be atomic. You need to queue writes, you can still let the guest run while the remaining writes are sent to disk. But if this is a background task, then as a management tool, don't I want a notification when it's been completed? Don't I want to have a little spinning box in my GUI or something like that while this is happening? I agree that having an interface that can run it in the background asynchronously isn't a bad thing, and it would be a nice feature. However it really doesn't qualify as anything but an enhancement in my book. You already have the option to pre-create the snapshot image if you so desire. But combining 1-5 in a single interface creates a command that while convenient on the command line, is not usable for a robust management tool. As I explained you can already use an externally created image with the current interface, the only issue that may be worth doing async is the buffer flushing. However once you do that, guest writes are going to stall anyway, and eventually the guest applications will all stall. The interface shouldn't have the option to create an image... because if you have that, the interface is difficult to use. Why present an option to do something that we know is broken? We can't blame management tools for not doing a good job managing KVM if we're giving them poor interfaces to work with. Sorry this is utterly bogus. This is *not* broken, it is the right way to do things. It is clearly a matter of taste whether you prefer it one way or another, but it is not broken. There is nothing wrong with it being an option, especially since you don't have to do anything magic for creating an image in advance. I realize it doesn't match your image of how you would like the command to work, but that is not the same as it being broken. I really see nothing in your above arguments that backs up the argument that the current implementation is broken in any way. I am totally in favor or having an async implementation as well, but I really don't see it being as big an issue as you are making it. Regards, Jes
Re: [Qemu-devel] [RFC][PATCH v2 00/11] QEMU Guest Agent: QMP-based host/guest communication (virtagent)
On 04/21/11 15:55, Michael Roth wrote: Did you do anything with the fsfreeze patches, or were they dropped in the migration to qapi? They were pending some changes required on the agent side that weren't really addressed/doable until this patchset, namely: 1) server-side timeout mechanism to recover from RPCs that can hang indefinitely or take a really long time (fsfreeze, fopen, etc), currently it's 30 seconds, may need to bump it to 60 for fsfreeze, or potentially add an RPC to change the server-side timeout 2) a simple way to temporarily turn off logging so agent doesn't deadlock itself 3) a way to register a cleanup handler when a timeout occurs. 4) disabling RPCs where proper accounting/logging is required (guest-open-file, guest-shutdown, etc) #4 isn't implemented...I think this could be done fairly in-evasively with something like: Response important_rpc(): if (!ga_log(syslog, LEVEL_CRITICAL, important stuff happening)) return ERROR_LOGGING_CURRENTLY_DISABLED Either that, or maybe simply disable the full command while the freeze is in progress? I fear we're more likely to miss a case of checking for logging than we are to miss command disabling? It should still be very non evasive, maybe just a flag in the struct declaring the functions marking it as logging-required and if the no-logging flag is set, the command is made to wait, or return -EAGAIN bool ga_log(log_domain, level, msg): if (log_domain == syslog) if (!logging_enabled is_critical(log_level)) return False; syslog(msg, ...) else if (logging_enabled) normallog(msg, ...) return True With that I think we could actually drop the fsfreeze stuff in. Thoughts? IMHO it is better to disable the commands rather than just logging, but either way should allow it to drop in. Sorry for the late reply, been a bit swamped here. Cheers, Jes
Re: [Qemu-devel] KVM call agenda for May 03rd
On 05/03/11 15:04, Jan Kiszka wrote: On 2011-05-03 12:21, Juan Quintela wrote: Please send in any agenda items you are interested in covering. Provided there will be more topics: - import kvm headers into qemu, drop #ifdef maze Otherwise, we can also discuss this based on the patch I'm preparing ATM. Since this it the only topic for the call, I suggest we cancel and add it to next week's agenda. Cheers, Jes
Re: [Qemu-devel] [RFC][PATCH v2 00/11] QEMU Guest Agent: QMP-based host/guest communication (virtagent)
On 05/03/11 15:53, Michael Roth wrote: IMHO it is better to disable the commands rather than just logging, but either way should allow it to drop in. Kinda agree, but logging seems to be the real dependency. With the server-side timeouts now in place even doing stuff like fopen/fwrite is permitted (it would just timeout if it blocked too long). It's the logging stuff that we don't really have a way to recover from, because it's not run in a thread we can just nuke after a certain amount of time. Even when we're not frozen, we can't guarantee an fopen/fwrite/fread will succeed, so failures shouldn't be too much of a surprise since they need to be handled anyway. And determining whether or not a command should be marked as executable during a freeze is somewhat nebulous (fopen might work for read-only access, but hang for write access when O_CREATE is set, fwrite might succeed if it doesn't require a flush, etc), plus internal things like logging need to be taken into account. So, for now at least I think it's a reasonable way to do it. Please be very careful with any fwrite() calls - it's not just logging. Any writes to the frozen file systems will result in the caller being put to sleep until the file system is unfrozen. It won't timeout, and the agent will be stuck hanging in that call. It's fun playing with the fsfreeze stuff on your desktop system - doing it in an xterm has 'interesting' effects. :) This is why I prefer the 'disable function' rather 'disable logging' approach. Sorry for the late reply, been a bit swamped here. No problem I have your patches in my tree now. They still need a little bit of love and testing but I should be able to get them out on the list shortly. Sounds great! Cheers, Jes
Re: [Qemu-devel] [PATCH v2 1/1] Add QMP bits for blockdev-snapshot-sync.
On 04/28/11 17:10, Anthony Liguori wrote: On 04/28/2011 09:57 AM, Jes Sorensen wrote: On 04/28/11 16:46, Anthony Liguori wrote: Sorry this is inherently broken. The management tool should not be keeping state in this process. I agree an async interface would be nice, but the above process is plain wrong. The async snapshot process needs to be doing the exact same as in the current implementation, the main difference is that it would be running asynchronously and that QMP would be able to query the state of it. No, the command does too many things and as such, makes it impossible for a management tool to gracefully recover. It is exactly the same for the management tool: - Creation of the new image either succeeds or fails - Switchover either succeeds or fails If you have a crash in the process, you still can only know by checking the consistency of the new image after rebooting. There is no issue here, you have the exact same problem if you get a crash during d) in your example. It is the same with the existing command, the crash is only an issue if it happens right in the middle of the switch over. Until then, only the original image remains valid. But the new image is always valid once it's been created as pending writes are lost no matter what in a hard power failure. That suggests the following flow: 1) Create new image with a backing file 2) Completion ACK At this stage, the management tool should update it's internal state to point to the new image. This can still be done with the existing code - it's not the ideal setup, but it is possible due to evil things such as selinux labeling. 3) Begin switch over to new image 4) Switch over image. 5) Receive notification when it's complete Sorry but this isn't an accurate description of the process. Once you have a new image ready, you need to halt writes, flush buffers, and then you can do the switch, which has to be atomic. The only thing that can really be async in all of this is the buffer flushing. There isn't any long switch-over process. If (4) is happening asynchronously, things get kind of complicated. You can either wait for things to quiesce on their own or you can queue pending requests. It's unclear to me what the right strategy is or if there's a mixed strategy that's needed. I think it makes sense to treat 3/4 as a command with (5) being an event notification. The actual guest application freeze (what some strange people use the quiicannotspell word for) is done prior to the switchover, you cannot do it in parallel with it. But combining 1-5 in a single interface creates a command that while convenient on the command line, is not usable for a robust management tool. As I explained you can already use an externally created image with the current interface, the only issue that may be worth doing async is the buffer flushing. However once you do that, guest writes are going to stall anyway, and eventually the guest applications will all stall. The single command is both better from a consistency perspective, and it will do the right job. Things are not going to get any easier from the management tool's perspective than they are with the current interface. Cheers, Jes
Re: [Qemu-devel] [PATCH 1/2] Add dd-style SIGUSR1 progress reporting
On 04/27/11 18:14, Markus Armbruster wrote: +static void progress_simple_init(void) +{ +state.print = progress_simple_print; +state.end = progress_simple_end; +} + +#ifdef CONFIG_POSIX +static void sigusr_print(int signal) +{ +printf((%3.2f/100%%)\n, state.current); printf() is not async-signal-safe. I don't think you can safely call it in a signal handler. G, you're absolutely right! Back to the drawing board! If someone locates my lost marbles, would you mind returning them? I need them urgently! Cheers, Jes
[Qemu-devel] [PATCH] qemu-progress.c: printf isn't signal safe
From: Jes Sorensen jes.soren...@redhat.com Change the signal handling to indicate a signal is pending, rather then printing directly from the signal handler. In addition make the signal prints go to stderr, rather than stdout. Signed-off-by: Jes Sorensen jes.soren...@redhat.com --- qemu-progress.c |7 ++- 1 files changed, 6 insertions(+), 1 deletions(-) diff --git a/qemu-progress.c b/qemu-progress.c index e1feb89..27a2310 100644 --- a/qemu-progress.c +++ b/qemu-progress.c @@ -32,6 +32,7 @@ struct progress_state { float current; float last_print; float min_skip; +int print_pending; void (*print)(void); void (*end)(void); }; @@ -63,12 +64,16 @@ static void progress_simple_init(void) #ifdef CONFIG_POSIX static void sigusr_print(int signal) { -printf((%3.2f/100%%)\n, state.current); +state.print_pending = 1; } #endif static void progress_dummy_print(void) { +if (state.print_pending) { +fprintf(stderr, (%3.2f/100%%)\n, state.current); +state.print_pending = 0; +} } static void progress_dummy_end(void) -- 1.7.4.4
[Qemu-devel] [PATCH v2] qemu-progress.c: printf isn't signal safe
From: Jes Sorensen jes.soren...@redhat.com Change the signal handling to indicate a signal is pending, rather then printing directly from the signal handler. In addition make the signal prints go to stderr, rather than stdout. Signed-off-by: Jes Sorensen jes.soren...@redhat.com --- qemu-progress.c |7 ++- 1 files changed, 6 insertions(+), 1 deletions(-) diff --git a/qemu-progress.c b/qemu-progress.c index e1feb89..a4894c0 100644 --- a/qemu-progress.c +++ b/qemu-progress.c @@ -37,6 +37,7 @@ struct progress_state { }; static struct progress_state state; +static volatile sig_atomic_t print_pending; /* * Simple progress print function. @@ -63,12 +64,16 @@ static void progress_simple_init(void) #ifdef CONFIG_POSIX static void sigusr_print(int signal) { -printf((%3.2f/100%%)\n, state.current); +print_pending = 1; } #endif static void progress_dummy_print(void) { +if (print_pending) { +fprintf(stderr, (%3.2f/100%%)\n, state.current); +print_pending = 0; +} } static void progress_dummy_end(void) -- 1.7.4.4
Re: [Qemu-devel] [PATCH v2 1/1] Add QMP bits for blockdev-snapshot-sync.
On 04/27/11 17:05, Jes Sorensen wrote: On 04/27/11 17:05, Luiz Capitulino wrote: On Mon, 18 Apr 2011 16:27:01 +0200 jes.soren...@redhat.com wrote: From: Jes Sorensen jes.soren...@redhat.com This is quivalent to snapshot_blkdev in the human monitor, with _sync added to the command name to make it explicit that the command is synchronous and leave space for a future async version. I'm not sure appending _sync is such a good convention, most commands are sync today and they don't have it. I'd prefer to call it snapshot_blkdev and note in the documentation how it works. On the other hand, I'm not sure how Anthony is going to model async commands, so maybe he has a better suggestion. The _sync prefix is on purpose to leave space for a possible async implementation of the snapshot command in the future. This isn't related to it being a sync vs async qmp command though. If people are more comfortable with the QMP command being blockdev-snapshot and then using {-,_}async for the async comment in the monitor and QMP later, that is fine with me. I'd just like to move on this and get it upstream. Cheers, Jes
Re: [Qemu-devel] [PATCH v2 1/1] Add QMP bits for blockdev-snapshot-sync.
On 04/27/11 17:05, Luiz Capitulino wrote: +Synchronous snapshot of block device, using snapshot file as target +if provided. It's not optional in HMP: (qemu) snapshot_blkdev ide0-hd0 Parameter 'snapshot_file' is missing (qemu) The parameter is optional in HMP, however it will fail if there is no snapshot filename or there is no flag for internal snapshots (which is currently unsupported). Jes
Re: [Qemu-devel] [PATCH v2 1/1] Add QMP bits for blockdev-snapshot-sync.
On 04/27/11 17:05, Luiz Capitulino wrote: +If a new image file is specified, the new image file will become the +new root image. If format is specified, the snapshot file will be +created in that format. Otherwise the snapshot will be internal! +(currently unsupported). Sorry for the stupid question, but what's a new root image? Also, all these assumptions seem human features to me, as it can save some typing and I can poke around to see where the snapshots are stored. All arguments should be mandatory in QMP, IMO. Sorry, but there is absolutely no reason to make all arguments mandatory. Sure it can be done, but the only result is a separate handling function for it, so we got more almost identical, but still different code to maintain. Finally, what's the expect behavior when -snapshot is used? I'm getting this: (qemu) snapshot_blkdev ide0-hd0 snap-test Could not open '/tmp/vl.6w8YXA' (qemu) What type of file system is your /tmp? You need to provide full path to the snapshot file if you don't want it created next to where your qemu binary is being executed. At first, I don't see why we shouldn't generate the live snapshot, but anyway, any special behavior like this should be noted in the section called Notes in the command's documentation. I don't follow this at all, please elaborate. Jes
Re: [Qemu-devel] [PATCH v2 1/1] Add QMP bits for blockdev-snapshot-sync.
On 04/28/11 15:41, Kevin Wolf wrote: Finally, what's the expect behavior when -snapshot is used? I'm getting this: (qemu) snapshot_blkdev ide0-hd0 snap-test Could not open '/tmp/vl.6w8YXA' (qemu) What type of file system is your /tmp? You need to provide full path to the snapshot file if you don't want it created next to where your qemu binary is being executed. I think the problem is that this is a temporary file, i.e. unlinked directly after it has been opened. Trying to reopen a deleted file is a bad idea. True, but if /tmp is tmpfs, it isn't possible to open things O_DIRECT, which could also be the source of the problem here. Jes
[Qemu-devel] [PATCH v3] QMP: add snapshot_blkdev command
From: Jes Sorensen jes.soren...@redhat.com Add QMP bits for snapshot_blkdev command. This is the same as snapshot_blkdev in the human monitor. The command is synchronous. In the future async commands may be added with the name _async/-async. Signed-off-by: Jes Sorensen jes.soren...@redhat.com --- qmp-commands.hx | 38 ++ 1 files changed, 38 insertions(+), 0 deletions(-) diff --git a/qmp-commands.hx b/qmp-commands.hx index fbd98ee..24e9c3e 100644 --- a/qmp-commands.hx +++ b/qmp-commands.hx @@ -667,6 +667,44 @@ Example: EQMP { +.name = blockdev-snapshot, +.args_type = device:B,snapshot_file:s?,format:s?, +.params = device [new-image-file] [format], +.help = initiates a live snapshot\n\t\t\t + of device. If a new image file is specified, the\n\t\t\t + new image file will become the new root image.\n\t\t\t + If format is specified, the snapshot file will\n\t\t\t + be created in that format. Otherwise the\n\t\t\t + snapshot will be internal! (currently unsupported), +.user_print = monitor_user_noop, +.mhandler.cmd_new = do_snapshot_blkdev, +}, + +SQMP +blockdev-snapshot-sync +-- + +Synchronous snapshot of block device, using snapshot file as target, +if provided. + +Arguments: + +- device: device name to snapshot (json-string) +- snapshot_file: name of new image file (json-string) +- format: format of now image (json-string) + +Example: + +- { execute: blockdev-snapshot, arguments: { device: ide-hd0, +snapshot_file: +/some/place/my-image, + format: qcow2 + } } +- { return: {} } + +EQMP + +{ .name = balloon, .args_type = value:M, .params = target, -- 1.7.4.4
Re: [Qemu-devel] [PATCH v2 1/1] Add QMP bits for blockdev-snapshot-sync.
On 04/28/11 16:21, Luiz Capitulino wrote: On Thu, 28 Apr 2011 15:21:41 +0200 Jes Sorensen jes.soren...@redhat.com wrote: On 04/27/11 17:05, Luiz Capitulino wrote: All arguments should be mandatory in QMP, IMO. Sorry, but there is absolutely no reason to make all arguments mandatory. Sure it can be done, but the only result is a separate handling function for it, so we got more almost identical, but still different code to maintain. We shouldn't compromise our external interface quality because of implementation details. What I'm really asking here is whether this is a good command for our management tools. It has been discussed repeatedly for months, so yes I will argue it is. For example, I've just realized that the new root image is going to be automatically created after the first call to this command, and subsequent calls w/o the snapshot file name will re-use that file. Is that correct? No of course not. Every call to snapshot-blockdev will create a new snapshot. If you don't specify a filename, the new snapshot will be internal, except you will get an error as we don't currently support that. snapshots can be chained, so you can end up with a snapshot pointing to the previous snapshot, which points to the previous snapshot, which points to the original image . Also note the optional format usage, the command (randomly) picks qcow2 if the format is not given. What happens if I pass a raw image and don't specify the format? Will it work as it works for qcow2? The command doesn't pick randomly, it picks the default cow format for qemu. You cannot pass a raw image as it is not cow compatible. I'm not exactly asking for mandatory arguments. For the format argument for example, we could try to auto-detect the format (is it possible)? And then we could fail with a meaningful error message. The code is in there now, and there hasn't been requests for this in the past. The introduction of the qmp wrapper is not the place to discuss this. And, I also would like to hear from Anthony, as he's picking up QMP maintenance. Anthony already stated to me that he was fairly happy with it. However you are the QMP maintainer, so it needs to go in via the QMP tree. Finally, what's the expect behavior when -snapshot is used? I'm getting this: (qemu) snapshot_blkdev ide0-hd0 snap-test Could not open '/tmp/vl.6w8YXA' (qemu) What type of file system is your /tmp? ext4 You need to provide full path to the snapshot file if you don't want it created next to where your qemu binary is being executed. I'm not running in /tmp. Well something is funny with your /tmp then, as the above isn't normal behavior. At first, I don't see why we shouldn't generate the live snapshot, but anyway, any special behavior like this should be noted in the section called Notes in the command's documentation. I don't follow this at all, please elaborate. Any kind of limitation should be noted in the documentation. We cannot document a users choice of /tmp, when /tmp isn't part of what the command does. Jes
Re: [Qemu-devel] [PATCH v2 1/1] Add QMP bits for blockdev-snapshot-sync.
On 04/28/11 16:36, Anthony Liguori wrote: On 04/27/2011 10:05 AM, Luiz Capitulino wrote: On Mon, 18 Apr 2011 16:27:01 +0200 jes.soren...@redhat.com wrote: From: Jes Sorensenjes.soren...@redhat.com This is quivalent to snapshot_blkdev in the human monitor, with _sync added to the command name to make it explicit that the command is synchronous and leave space for a future async version. I'm not sure appending _sync is such a good convention, most commands are sync today and they don't have it. I'd prefer to call it snapshot_blkdev and note in the documentation how it works. It probably should be called snapshot_blkdev_broken because that's what it really is. The '_sync' is there to indicate that this command doesn't properly implement asynchronous logic and can break a guest. I'd actually prefer that we not expose this command through QMP at all and instead implement a proper snapshot command. Sorry but this is utterly bogus. The snapshot support as is works fine, and the command is in the monitor. We should expose it in QMP as well. If we eventually get a different implementation, then we can rename it or replace it then. Jes
Re: [Qemu-devel] [PATCH v2 1/1] Add QMP bits for blockdev-snapshot-sync.
On 04/28/11 16:42, Kevin Wolf wrote: What type of file system is your /tmp? You need to provide full path to the snapshot file if you don't want it created next to where your qemu binary is being executed. I think the problem is that this is a temporary file, i.e. unlinked directly after it has been opened. Trying to reopen a deleted file is a bad idea. True, but if /tmp is tmpfs, it isn't possible to open things O_DIRECT, which could also be the source of the problem here. Not really, -snapshot is very clearly the problem here. Note that what's failing here is not opening the new snapshot but the old temporary image created by -snapshot. Sorry you are losing me here - why would reopening the old image fail? And if it fails, why does it have such a bizarre name in Luiz's case? Jes
Re: [Qemu-devel] [PATCH v2 1/1] Add QMP bits for blockdev-snapshot-sync.
On 04/28/11 16:46, Anthony Liguori wrote: On 04/28/2011 09:38 AM, Jes Sorensen wrote: Sorry but this is utterly bogus. The snapshot support as is works fine, and the command is in the monitor. We should expose it in QMP as well. It went in for the monitor because it was considered an imperfect command so we held up the QMP side because we wanted a better interface. I am not sure who is included in the 'we' here. The current command does: 1) Create new image backing to current image 2) Flush outstanding I/O to old image 3) Close current image 4) Reopen newly created image 5) Go Operations (1) and (2) are very synchronous operations. (4) can be too. We really should have a bdrv_aio_snapshot() function that implements the logic for at least (2) in an asynchronous fashion. That sort of interface is going to affect how we expose things in QMP. As from a QMP perspective, we're going to do something like: a) start snapshot b) query snapshot progress c) receive notification of snapshot completion d) flip over image Sorry this is inherently broken. The management tool should not be keeping state in this process. I agree an async interface would be nice, but the above process is plain wrong. The async snapshot process needs to be doing the exact same as in the current implementation, the main difference is that it would be running asynchronously and that QMP would be able to query the state of it. You definitely do *not* want the management tool to launch the snapshot process, and then do the flip by the management tool later. It should all happen as part of the original command. Being able to query it while it is progressing is a different story. The one reason why this isn't all that big an issue in the first place, is that in the normal usage case, a user will run a guest agent which freezes the guest file systems during the snapshot process, so the fact that the existing command is synchronous pretty much disappears in the noise. And of course, this needs to be carefully thought through for race conditions. In the current command, what happens if you get a crash between (2) and (3)? There's no way for the management tools to know that we didn't finish flushing writes. How does the management tool know that (1) didn't fail mid way through resulting in a corrupted image? There is no issue here, you have the exact same problem if you get a crash during d) in your example. It is the same with the existing command, the crash is only an issue if it happens right in the middle of the switch over. Until then, only the original image remains valid. Cheers, Jes
[Qemu-devel] [PATCH 2/2] Remove obsolete 'enabled' variable from progress state
From: Jes Sorensen jes.soren...@redhat.com Signed-off-by: Jes Sorensen jes.soren...@redhat.com Signed-off-by: Kevin Wolf kw...@redhat.com --- qemu-progress.c |8 ++-- 1 files changed, 2 insertions(+), 6 deletions(-) diff --git a/qemu-progress.c b/qemu-progress.c index b4b751c..e1feb89 100644 --- a/qemu-progress.c +++ b/qemu-progress.c @@ -29,7 +29,6 @@ #include signal.h struct progress_state { -int enabled; float current; float last_print; float min_skip; @@ -46,10 +45,8 @@ static struct progress_state state; */ static void progress_simple_print(void) { -if (state.enabled) { -printf((%3.2f/100%%)\r, state.current); -fflush(stdout); -} +printf((%3.2f/100%%)\r, state.current); +fflush(stdout); } static void progress_simple_end(void) @@ -96,7 +93,6 @@ static void progress_dummy_init(void) void qemu_progress_init(int enabled, float min_skip) { -state.enabled = enabled; state.min_skip = min_skip; if (enabled) { progress_simple_init(); -- 1.7.4.4
[Qemu-devel] [PATCH 1/2] Add dd-style SIGUSR1 progress reporting
From: Jes Sorensen jes.soren...@redhat.com This introduces support for dd-style progress reporting on POSIX systems, if the user hasn't specified -p to report progress. If sent a SIGUSR1, qemu-img will report current progress for commands that support progress reporting. Signed-off-by: Jes Sorensen jes.soren...@redhat.com --- qemu-progress.c | 53 - 1 files changed, 48 insertions(+), 5 deletions(-) diff --git a/qemu-progress.c b/qemu-progress.c index 656e065..b4b751c 100644 --- a/qemu-progress.c +++ b/qemu-progress.c @@ -26,12 +26,15 @@ #include osdep.h #include sysemu.h #include stdio.h +#include signal.h struct progress_state { int enabled; float current; float last_print; float min_skip; +void (*print)(void); +void (*end)(void); }; static struct progress_state state; @@ -51,20 +54,60 @@ static void progress_simple_print(void) static void progress_simple_end(void) { -if (state.enabled) { -printf(\n); -} +printf(\n); +} + +static void progress_simple_init(void) +{ +state.print = progress_simple_print; +state.end = progress_simple_end; +} + +#ifdef CONFIG_POSIX +static void sigusr_print(int signal) +{ +printf((%3.2f/100%%)\n, state.current); +} +#endif + +static void progress_dummy_print(void) +{ +} + +static void progress_dummy_end(void) +{ +} + +static void progress_dummy_init(void) +{ +#ifdef CONFIG_POSIX +struct sigaction action; + +memset(action, 0, sizeof(action)); +sigfillset(action.sa_mask); +action.sa_handler = sigusr_print; +action.sa_flags = 0; +sigaction(SIGUSR1, action, NULL); +#endif + +state.print = progress_dummy_print; +state.end = progress_dummy_end; } void qemu_progress_init(int enabled, float min_skip) { state.enabled = enabled; state.min_skip = min_skip; +if (enabled) { +progress_simple_init(); +} else { +progress_dummy_init(); +} } void qemu_progress_end(void) { -progress_simple_end(); +state.end(); } void qemu_progress_print(float percent, int max) @@ -84,6 +127,6 @@ void qemu_progress_print(float percent, int max) if (current (state.last_print + state.min_skip) || (current == 100) || (current == 0)) { state.last_print = state.current; -progress_simple_print(); +state.print(); } } -- 1.7.4.4
[Qemu-devel] [PATCH v2 qemu-block 0/2] Add dd-style SIGUSR1 progress reporting
From: Jes Sorensen jes.soren...@redhat.com This introduces support for dd-style progress reporting, if the user hasn't specified -p to report progress. If sent a SIGUSR1, qemu-img will report current progress for commands that support progress reporting. v2 fixes the mingw32 build problems, there is no change to the code on POSIX systems. It should be a drop-in replacement for the previous patch. Jes Sorensen (2): Add dd-style SIGUSR1 progress reporting Remove obsolete 'enabled' variable from progress state qemu-progress.c | 61 +-- 1 files changed, 50 insertions(+), 11 deletions(-) -- 1.7.4.4
Re: [Qemu-devel] [PATCH v2 1/1] Add QMP bits for blockdev-snapshot-sync.
On 04/27/11 17:05, Luiz Capitulino wrote: On Mon, 18 Apr 2011 16:27:01 +0200 jes.soren...@redhat.com wrote: From: Jes Sorensen jes.soren...@redhat.com This is quivalent to snapshot_blkdev in the human monitor, with _sync added to the command name to make it explicit that the command is synchronous and leave space for a future async version. I'm not sure appending _sync is such a good convention, most commands are sync today and they don't have it. I'd prefer to call it snapshot_blkdev and note in the documentation how it works. On the other hand, I'm not sure how Anthony is going to model async commands, so maybe he has a better suggestion. The _sync prefix is on purpose to leave space for a possible async implementation of the snapshot command in the future. This isn't related to it being a sync vs async qmp command though. Jes
Re: [Qemu-devel] [RFC][PATCH v2 00/11] QEMU Guest Agent: QMP-based host/guest communication (virtagent)
On 04/21/11 22:58, Michael Roth wrote: On 04/21/2011 09:10 AM, Jes Sorensen wrote: On 04/18/11 17:02, Michael Roth wrote: One thing I cannot seem to figure out with this tree - the agent commands do not seem to show up in the monitor? What am I missing? Hmm, for some reason this email never hit my inbox. You mean with the human monitor? Currently, with these new patches, we're QMP only. And most of the command names/semantics have changed as well. The qapi-schema.json changes in patch 16 have the new prototypes, and the 0 patch has some usage examples. Hi Michael, Yeah it was the conclusion I came to on Thursday when I was working on porting the freeze patches over. After fighting the json %#$%#$%#$ I ended up with something I couldn't test in the end :( Any plans to add human monitor support in the near future? Cheers, Jes
Re: [Qemu-devel] KVM call agenda for April 26th
On 04/26/11 11:24, Juan Quintela wrote: Please, send in any agenda items you are interested in covering. From last week: Tools for resource accounting the virtual machines. Luis Antonio Galindo Castro (FunkyM0nk3y) funkymons...@gmail.com - Status of glib tree - next steps? Jes
Re: [Qemu-devel] [RFC][PATCH v2 15/17] guest agent: qemu-ga daemon
On 04/25/11 14:27, Ian Molton wrote: On Fri, 2011-04-22 at 13:51 +0200, Jes Sorensen wrote: Hiding things you miss when reading the code, it's a classic for people to do if(foo) bleh(); on the same line, and whoever reads the code will expect the action on the next line, especially if foo is a long complex statement. It's one of these 'just don't do it, it bites you in the end' things. Meh. I dont see it that way... Sure, if it was one line out of 20 written that way, it would be weird, but as is, its just part of a block of identical lines. It is a matter of consistency, we allow it in one place, we suddenly have it all over. The moment someone wants to add a slightly more complex case to such a switch statement it is all down the drain. It is way better to stay consistent across the board. I dont really see a parallel with the if() statement either since the condition in the switch() case isnt on the same line as such. I must admit that I only write one-liner if statements if the condition is short though. Writing one-liner if() statements is inherently broken, or you could call it the Perl syndrome. Write-once, read-never. Jes
Re: [Qemu-devel] KVM call agenda for April 26th
On 04/26/11 15:09, Anthony Liguori wrote: On 04/26/2011 06:47 AM, Jes Sorensen wrote: On 04/26/11 11:24, Juan Quintela wrote: Please, send in any agenda items you are interested in covering. From last week: Tools for resource accounting the virtual machines. Luis Antonio Galindo Castro (FunkyM0nk3y)funkymons...@gmail.com - Status of glib tree - next steps? I actually decided to just rewriting all of QEMU in Python and C++.. You'll have to compete with my Java port!
Re: [Qemu-devel] [RFC][PATCH v2 00/11] QEMU Guest Agent: QMP-based host/guest communication (virtagent)
On 04/26/11 16:27, Michael Roth wrote: On 04/26/2011 01:57 AM, Jes Sorensen wrote: Yeah it was the conclusion I came to on Thursday when I was working on porting the freeze patches over. After fighting the json %#$%#$%#$ I ended up with something I couldn't test in the end :( I actually worked on getting most of the fsfreeze ported over yesterday, were they any major changes from the last set of patches other than the porting work? If so, feel free to send the patches my way and I'll hack on those a bit, otherwise I plan to have the fsfreeze stuff included in the next re-spin later this week. I'll try and post them later today. Any plans to add human monitor support in the near future? The main target will be QMP for the initial patches. But ideally the HMP commands we add will have a pretty close correspondence with QMP. That is unfortunate, QMP is really user unfriendly :( Cheers, Jes
[Qemu-devel] [PATCH] Add QMP fsfreeze support
From: Jes Sorensen jes.soren...@redhat.com This patch adds the following QMP commands: qga-guest-fsfreeze: - Freezes all local file systems in the guest. Command will return the number of file systems that were frozen. qga-guest-fsthaw: - Thaws all local file systems in the guest. Command will return the number of file systems that were thawed. Signed-off-by: Jes Sorensen jes.soren...@redhat.com --- qapi-schema.json | 24 ++ qga/guest-agent-commands.c | 177 2 files changed, 201 insertions(+), 0 deletions(-) diff --git a/qapi-schema.json b/qapi-schema.json index e2f209d..2c86cec 100644 --- a/qapi-schema.json +++ b/qapi-schema.json @@ -1562,6 +1562,30 @@ { 'option': 'blockdev', 'data': 'BlockdevConfig', 'implicit': 'id' } ## +# @guest-fsfreeze: +# +# json gibberish for guest-fsfreeze command +# +# Returns: Number of file systems frozen +# +# Since: 0.15.0 +## +{ 'command': 'guest-fsfreeze', + 'returns': 'int' } + +## +# @guest-fsthaw: +# +# json gibberish for guest-fsthaw command +# +# Returns: Number of file systems thawed +# +# Since: 0.15.0 +## +{ 'command': 'guest-fsthaw', + 'returns': 'int' } + +## # @VncConfig: # # Configuration options for the built-in VNC server. diff --git a/qga/guest-agent-commands.c b/qga/guest-agent-commands.c index 843ef36..4bc2c57 100644 --- a/qga/guest-agent-commands.c +++ b/qga/guest-agent-commands.c @@ -11,6 +11,10 @@ */ #include glib.h +#include mntent.h +#include sys/types.h +#include sys/ioctl.h +#include linux/fs.h #include guest-agent.h static bool enable_syslog = true; @@ -262,6 +266,179 @@ struct GuestFileSeek *qga_guest_file_seek(int64_t filehandle, int64_t offset, return seek_data; } +/* + * Walk the mount table and build a list of local file systems + */ +struct direntry { +char *dirname; +char *devtype; +struct direntry *next; +}; + +struct va_freezestate { +struct direntry *mount_list; +int status; +}; + +/* + * This should be in a header file, but we have none :( + */ +enum va_fsfreeze_status { +FREEZE_ERROR = -1, +FREEZE_THAWED = 0, +FREEZE_INPROGRESS = 1, +FREEZE_FROZEN = 2, +FREEZE_THAWINPROGRESS = 3, +}; + +static struct va_freezestate freezestate; + +static int build_mount_list(void) +{ +struct mntent *mnt; +struct direntry *entry; +struct direntry *next; +char const *mtab = MOUNTED; +FILE *fp; + +fp = setmntent(mtab, r); +if (!fp) { +fprintf(stderr, unable to read mtab\n); +goto fail; +} + +while ((mnt = getmntent(fp))) { +/* + * An entry which device name doesn't start with a '/' is + * either a dummy file system or a network file system. + * Add special handling for smbfs and cifs as is done by + * coreutils as well. + */ +if ((mnt-mnt_fsname[0] != '/') || +(strcmp(mnt-mnt_type, smbfs) == 0) || +(strcmp(mnt-mnt_type, cifs) == 0)) { +continue; +} + +entry = qemu_malloc(sizeof(struct direntry)); +entry-dirname = qemu_strdup(mnt-mnt_dir); +entry-devtype = qemu_strdup(mnt-mnt_type); +entry-next = freezestate.mount_list; + +freezestate.mount_list = entry; +} + +endmntent(fp); + +return 0; + +fail: +while(freezestate.mount_list) { +next = freezestate.mount_list-next; +qemu_free(freezestate.mount_list-dirname); +qemu_free(freezestate.mount_list-devtype); +qemu_free(freezestate.mount_list); +freezestate.mount_list = next; +} + +return -1; +} + +/* + * qga_guest_fsfreeze(): Walk list of mounted file systems in the + * guest, and freeze the ones which are real local file systems. + * return values: Number of file systems frozen, -1 on error. + */ +int64_t qga_guest_fsfreeze(Error **err) +{ +struct direntry *entry; +int fd, i = 0; +int64_t ret; +SLOG(va_fsfreeze()); + +if (freezestate.status != FREEZE_THAWED) { +ret = 0; +goto out; +} + +ret = build_mount_list(); +if (ret 0) { +goto out; +} + +freezestate.status = FREEZE_INPROGRESS; + +entry = freezestate.mount_list; +while(entry) { +fd = qemu_open(entry-dirname, O_RDONLY); +if (fd == -1) { +ret = errno; +goto error; +} +ret = ioctl(fd, FIFREEZE); +close(fd); +if (ret 0 ret != EOPNOTSUPP) { +goto error; +} + +entry = entry-next; +i++; +} + +freezestate.status = FREEZE_FROZEN; +ret = i; +out: +return ret; +error: +if (i 0) { +freezestate.status = FREEZE_ERROR; +} +goto out; +} + +/* + * qga_guest_fsthaw(): Walk list of frozen file systems in the guest, and + * thaw them. + * return values: Number of file systems thawed on success, -1 on error. + */ +int64_t qga_guest_fsthaw(Error **err) +{ +struct
Re: [Qemu-devel] [RFC][PATCH v2 15/17] guest agent: qemu-ga daemon
On 04/22/11 11:23, Ian Molton wrote: On Thu, 2011-04-21 at 08:21 -0500, Michael Roth wrote: +switch (level G_LOG_LEVEL_MASK) { +case G_LOG_LEVEL_ERROR: return error; +case G_LOG_LEVEL_CRITICAL: return critical; +case G_LOG_LEVEL_WARNING: return warning; +case G_LOG_LEVEL_MESSAGE: return message; +case G_LOG_LEVEL_INFO: return info; +case G_LOG_LEVEL_DEBUG: return debug; +default:return user; +} Urgh! No two statements on the same line please! Always wondered what the logic for this one is. IMHO the above is FAR neater than splitting it to near double its height. What kind of coding error does splitting this out aim to prevent? missing break; / return; statements? Because I dont see how it achieves that... Hiding things you miss when reading the code, it's a classic for people to do if(foo) bleh(); on the same line, and whoever reads the code will expect the action on the next line, especially if foo is a long complex statement. It's one of these 'just don't do it, it bites you in the end' things. Jes
Re: [Qemu-devel] [RFC][PATCH v2 08/17] qapi: fix Error usage in qemu-sockets.c
On 04/18/11 17:02, Michael Roth wrote: Fix spurious errors due to not initializing Error pointer to NULL before checking for errors. Signed-off-by: Michael Roth mdr...@linux.vnet.ibm.com --- qemu-sockets.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/qemu-sockets.c b/qemu-sockets.c index dc8beeb..e709e5f 100644 --- a/qemu-sockets.c +++ b/qemu-sockets.c @@ -630,7 +630,7 @@ int unix_connect(const char *path) { QemuOpts *opts; int sock; -Error *err; +Error *err = NULL; opts = qemu_opts_create(dummy_opts, NULL, 0, err); if (err) { This one really should go into the tree asap, even if the rest of the virt agent patches are still pending. Reviewed-by: Jes Sorensen jes.soren...@redhat.com
Re: [Qemu-devel] [RFC][PATCH v2 09/17] qmp proxy: core code for proxying qmp requests to guest
On 04/18/11 17:02, Michael Roth wrote: diff --git a/qmp-core.c b/qmp-core.c index 9f3d182..dab50a1 100644 --- a/qmp-core.c +++ b/qmp-core.c @@ -937,7 +937,15 @@ void qmp_async_complete_command(QmpCommandState *cmd, QObject *retval, Error *er qemu_free(cmd); } +extern QmpProxy *qmp_proxy_default; Please put this in a header file. +static void qmp_proxy_process_control_event(QmpProxy *p, const QDict *evt) +{ +const char *cmd; +int host_sid, guest_sid; + +cmd = qdict_get_try_str(evt, _control_event); +if (!cmd) { +fprintf(stderr, received NULL control event\n); +} else if (strcmp(cmd, guest_ack) == 0) { +host_sid = qdict_get_try_int(evt, _control_arg_host_sid, 0); +if (!host_sid) { +fprintf(stderr, invalid guest_ack: wrong host sid\n); +return; +} +/* guest responded to host_init, return a host_ack */ +/* reset outstanding requests, then send an ack with the + * session id they passed us + */ +guest_sid = qdict_get_try_int(evt, _control_arg_guest_sid, 0); I am wondering if it would make sense to put these arguments in a header file as #define's to make sure you don't have to chase down a typo on one side at some point? Just an idea, dunno if it is worth it. Cheers, Jes
Re: [Qemu-devel] [RFC][PATCH v2 12/17] guest agent: worker thread class
On 04/18/11 17:02, Michael Roth wrote: diff --git a/qga/guest-agent-worker.c b/qga/guest-agent-worker.c new file mode 100644 index 000..e3295da --- /dev/null +++ b/qga/guest-agent-worker.c @@ -0,0 +1,173 @@ +/* + * QEMU Guest Agent worker thread interfaces + * + * Copyright IBM Corp. 2011 + * + * Authors: + * Michael Roth mdr...@linux.vnet.ibm.com + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. + */ +#include glib.h +#include stdlib.h +#include stdio.h +#include stdbool.h +#include pthread.h +#include errno.h +#include string.h +#include guest-agent.h +#include ../error.h Oh dear! do not do that please! Fix the Makefile to include the appropriate path. +struct GAWorker { +pthread_t thread; +ga_worker_func execute; +pthread_mutex_t input_mutex; +pthread_cond_t input_avail_cond; +void *input; +bool input_avail; +pthread_mutex_t output_mutex; +pthread_cond_t output_avail_cond; You really should use QemuMutex and friends here. +void *output; +Error *output_error; +bool output_avail; +}; + +static void *worker_run(void *worker_p) +{ +GAWorker *worker = worker_p; +Error *err; +void *input, *output; + +while (1) { +/* wait for input */ +pthread_mutex_lock(worker-input_mutex); qemu_mutex_lock() +while (!worker-input_avail) { +pthread_cond_wait(worker-input_avail_cond, worker-input_mutex); +} again +input = worker-input; +worker-input_avail = false; +pthread_mutex_unlock(worker-input_mutex); and again I'll stop. Basically there really should be no references to pthread_* Jes
Re: [Qemu-devel] [RFC][PATCH v2 15/17] guest agent: qemu-ga daemon
On 04/18/11 17:02, Michael Roth wrote: +static const char *ga_log_level_str(GLogLevelFlags level) +{ +switch (level G_LOG_LEVEL_MASK) { +case G_LOG_LEVEL_ERROR: return error; +case G_LOG_LEVEL_CRITICAL: return critical; +case G_LOG_LEVEL_WARNING: return warning; +case G_LOG_LEVEL_MESSAGE: return message; +case G_LOG_LEVEL_INFO: return info; +case G_LOG_LEVEL_DEBUG: return debug; +default:return user; +} Urgh! No two statements on the same line please! Jes
Re: [Qemu-devel] [RFC][PATCH v2 00/11] QEMU Guest Agent: QMP-based host/guest communication (virtagent)
On 04/18/11 17:02, Michael Roth wrote: These apply on top of Anthony's glib tree, commit 03d5927deb5e6baebaade1b4c8ff2428a85e125c currently, and can also be obtained from: git://repo.or.cz/qemu/mdroth.git qga_v2 Patches 1-8 are general json/QAPI-related fixes. Anthony, please consider pulling these into your glib tree. The json fix-ups may need further evaluation, but I'm confident they're at least an improvement. The QAPI ones are mostly trivial fix-ups. Changes since V1: - Added guest agent worker thread to execute RPCs in the guest. With this in place we have a reliable timeout mechanism for hung commands, currently set at 30 seconds. - Add framework for registering init/cleanup routines for stateful RPCs to clean up after themselves after a timeout. - Added the following RPCs: guest-file-{open,close,read,write,seek}, guest-shutdown, guest-info, and removed stubs for guest-view-file (now deprecated) - Added GUEST_AGENT_UP/GUEST_AGENT_DOWN QMP events - Switched to a TCP-style host-initiated 3-way handshake for channel negotiation, this simplifies client negotiation/interaction over the wire - Added configurable log level/log file/pid file options for guest agent - Various fixes for bugs/memory leaks and checkpatch.pl fixups ISSUES/TODOS: - Fix QMP proxy handling of error responses sent by guest agent, currently ignored - Add unit tests for guest agent wire protocol - Add unit tests for QMP interfaces - Add host-side timeout mechanism for async QMP commands - Return error for guest commands if guest up event has not yet been recieved - Make QMP param names more consistent between related commands - Clean up logging Hi, I had a look through this patchset and generally it looks pretty good. There were a few nits, and I ignored all the python gibberish to avoid getting a headache. Did you do anything with the fsfreeze patches, or were they dropped in the migration to qapi? Cheers, Jes
[Qemu-devel] [PATCH qemu-glib] Add missing files to distclean list
From: Jes Sorensen jes.soren...@redhat.com Signed-off-by: Jes Sorensen jes.soren...@redhat.com --- Makefile |1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/Makefile b/Makefile index a7c1503..bebe3bd 100644 --- a/Makefile +++ b/Makefile @@ -301,6 +301,7 @@ distclean: clean for d in $(TARGET_DIRS) libhw32 libhw64 libuser libdis libdis-user; do \ rm -rf $$d || exit 1 ; \ done + rm -f qmp.h qmp-marshal.c qmp-marshal-types.* KEYMAPS=da en-gb et fr fr-ch is lt modifiers no pt-br sv \ ar de en-us fi fr-be hr it lv nl pl ru th \ -- 1.7.4.4
Re: [Qemu-devel] [RFC][PATCH v2 12/17] guest agent: worker thread class
On 04/21/11 15:15, Michael Roth wrote: On 04/21/2011 03:44 AM, Jes Sorensen wrote: and again I'll stop. Basically there really should be no references to pthread_* This is on the guest side of things where I'm trying to use GLib wherever possible to keep things somewhat portable: logging/list utilities/io events/etc. And I *really* wanted to use GThreads here, but the problem is that GThread does not have any sane means to kill off a thread when a timeout occurs: there's no analogue to pthread_cancel(), and to use signals you need to break the abstraction to get the underlying pid. The new QemuThread stuff is using GThread underneath the covers so same limitation there. pthreads provides these things and is fairly portable however, so I opted to make it an explicit dependency on the guest side. So glib+pthreads are the current dependencies. That is really unfortunate - is there no way around it? It really would be ideal if we could build the guest relying on QemuThreads for portability. Either way, please fix the include issue. Jes
Re: [Qemu-devel] [RFC][PATCH v2 00/11] QEMU Guest Agent: QMP-based host/guest communication (virtagent)
On 04/18/11 17:02, Michael Roth wrote: These apply on top of Anthony's glib tree, commit 03d5927deb5e6baebaade1b4c8ff2428a85e125c currently, and can also be obtained from: git://repo.or.cz/qemu/mdroth.git qga_v2 Patches 1-8 are general json/QAPI-related fixes. Anthony, please consider pulling these into your glib tree. The json fix-ups may need further evaluation, but I'm confident they're at least an improvement. The QAPI ones are mostly trivial fix-ups. Changes since V1: - Added guest agent worker thread to execute RPCs in the guest. With this in place we have a reliable timeout mechanism for hung commands, currently set at 30 seconds. - Add framework for registering init/cleanup routines for stateful RPCs to clean up after themselves after a timeout. - Added the following RPCs: guest-file-{open,close,read,write,seek}, guest-shutdown, guest-info, and removed stubs for guest-view-file (now deprecated) - Added GUEST_AGENT_UP/GUEST_AGENT_DOWN QMP events - Switched to a TCP-style host-initiated 3-way handshake for channel negotiation, this simplifies client negotiation/interaction over the wire - Added configurable log level/log file/pid file options for guest agent - Various fixes for bugs/memory leaks and checkpatch.pl fixups ISSUES/TODOS: - Fix QMP proxy handling of error responses sent by guest agent, currently ignored - Add unit tests for guest agent wire protocol - Add unit tests for QMP interfaces - Add host-side timeout mechanism for async QMP commands - Return error for guest commands if guest up event has not yet been recieved - Make QMP param names more consistent between related commands - Clean up logging Michael, One thing I cannot seem to figure out with this tree - the agent commands do not seem to show up in the monitor? What am I missing? Cheers, Jes
Re: [Qemu-devel] [PATCH] atapi: Add 'medium ready' to 'medium not ready' transition on cd change
On 04/18/11 13:45, Amit Shah wrote: MMC-5 Table F.1 lists errors that can be thrown for the TEST_UNIT_READY command. Going from medium not ready to medium ready states is communicated by throwing an error. This adds the missing 'tray opened' event that we fail to report to guests. After doing this, older Linux guests properly revalidate a disc on the change command. HSM violation errors, which caused Linux guests to do a soft-reset of the link, also go away: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 sr 1:0:0:0: CDB: Test Unit Ready: 00 00 00 00 00 00 ata2.00: cmd a0/00:00:00:00:00/00:00:00:00:00/a0 tag 0 res 01/60:00:00:00:00/00:00:00:00:00/a0 Emask 0x3 (HSM violation) ata2.00: status: { ERR } ata2: soft resetting link ata2.00: configured for MWDMA2 ata2: EH complete Signed-off-by: Amit Shah amit.s...@redhat.com --- hw/ide/core.c | 20 1 files changed, 16 insertions(+), 4 deletions(-) Looks good to me. Acked-by: Jes Sorensen jes.soren...@redhat.com
[Qemu-devel] [PATCH v2 1/1] Add QMP bits for blockdev-snapshot-sync.
From: Jes Sorensen jes.soren...@redhat.com This is quivalent to snapshot_blkdev in the human monitor, with _sync added to the command name to make it explicit that the command is synchronous and leave space for a future async version. Signed-off-by: Jes Sorensen jes.soren...@redhat.com --- qmp-commands.hx | 26 ++ 1 files changed, 26 insertions(+), 0 deletions(-) diff --git a/qmp-commands.hx b/qmp-commands.hx index fbd98ee..b8f537c 100644 --- a/qmp-commands.hx +++ b/qmp-commands.hx @@ -667,6 +667,32 @@ Example: EQMP { +.name = blockdev-snapshot-sync, +.args_type = device:B,snapshot_file:s?,format:s?, +.params = device [new-image-file] [format], +.user_print = monitor_user_noop, +.mhandler.cmd_new = do_snapshot_blkdev, +}, + +SQMP +Synchronous snapshot of block device, using snapshot file as target +if provided. + +If a new image file is specified, the new image file will become the +new root image. If format is specified, the snapshot file will be +created in that format. Otherwise the snapshot will be internal! +(currently unsupported). + +Errors: +If creating the new snapshot image fails, QEMU will continue running +on the original image. If switching to the newly created image fails, +it will be attempted to fall back to the original image and return +QERR_OPEN_FILE_FAILED with the snapshot filename. If re-opening +the original image fails, QERR_OPEN_FILE_FAILED will be returned with +the original image filename. +EQMP + +{ .name = balloon, .args_type = value:M, .params = target, -- 1.7.4.4
[Qemu-devel] [PATCH v2 0/1] Add QMP bits for blockdev-snapshot-sync.
From: Jes Sorensen jes.soren...@redhat.com This is an old patch I am resurrecting, adding a QMP command for live snapshot support. I have tried to address the comments received in the previous emails around March 9th. Please let me know if you have further issues with this. Jes Sorensen (1): Add QMP bits for blockdev-snapshot-sync. qmp-commands.hx | 26 ++ 1 files changed, 26 insertions(+), 0 deletions(-) -- 1.7.4.4
[Qemu-devel] LPC2011 Virtualization Micro Conf
Hi, With the success of last year's Virtualization micro-conference track at Linux Plumbers 2010, I have accepted to organize a similar track for Linux Plumbers 2011 in Santa Rosa. Please see the official Linux Plumbers 2011 website for full details about the conference: http://www.linuxplumbersconf.org/2011/ The Linux Plumbers 2011 Virtualization track is focusing on general free software Linux Virtualization. It is not reserved for a specific hypervisor, but will focus on general virtualization issues and in particular collaboration amongst projects. This would include KVM, Xen, QEMU, containers etc. Deadline: - The deadline for submissions is April 30th. Please visit the following link to submit your proposal: http://www.linuxplumbersconf.org/2011/ocw/events/LPC2011MC/proposals Example topics: --- - Kernel and Hypervisor KVM/QEMU/Xen interaction - QEMU integration, sharing of code between the different projects - IO Performance and scalability - Live Migration - Managing and supporting enterprise storage - Support for new hardware features, and/or provide guest access to these features. - Guest agents - Virtualization management tools, libvirt, etc. - Desktop integration - Consumer Electronics device emulation - Custom platform configuration and coordination with the kernel Audience: - Virtualization hypervisor developers, developers of virtualization management tools and applications, embedded virtualization developers, vendors and others. Best regards, Jes
[Qemu-devel] [PATCH] Remove obsolete 'enabled' variable from progress state
From: Jes Sorensen jes.soren...@redhat.com Signed-off-by: Jes Sorensen jes.soren...@redhat.com --- qemu-progress.c |8 ++-- 1 files changed, 2 insertions(+), 6 deletions(-) diff --git a/qemu-progress.c b/qemu-progress.c index 6498161..f3ce974 100644 --- a/qemu-progress.c +++ b/qemu-progress.c @@ -29,7 +29,6 @@ #include signal.h struct progress_state { -int enabled; float current; float last_print; float min_skip; @@ -46,10 +45,8 @@ static struct progress_state state; */ static void progress_simple_print(void) { -if (state.enabled) { -printf((%3.2f/100%%)\r, state.current); -fflush(stdout); -} +printf((%3.2f/100%%)\r, state.current); +fflush(stdout); } static void progress_simple_end(void) @@ -92,7 +89,6 @@ static void progress_dummy_init(void) void qemu_progress_init(int enabled, float min_skip) { -state.enabled = enabled; state.min_skip = min_skip; if (enabled) { progress_simple_init(); -- 1.7.4.2
[Qemu-devel] [PATCH] Add dd-style SIGUSR1 progress reporting
From: Jes Sorensen jes.soren...@redhat.com This introduces support for dd-style progress reporting, if the user hasn't specified -p to report progress. If sent a SIGUSR1, qemu-img will report current progress for commands that support progress reporting. Signed-off-by: Jes Sorensen jes.soren...@redhat.com --- qemu-progress.c | 49 - 1 files changed, 44 insertions(+), 5 deletions(-) diff --git a/qemu-progress.c b/qemu-progress.c index 656e065..6498161 100644 --- a/qemu-progress.c +++ b/qemu-progress.c @@ -26,12 +26,15 @@ #include osdep.h #include sysemu.h #include stdio.h +#include signal.h struct progress_state { int enabled; float current; float last_print; float min_skip; +void (*print)(void); +void (*end)(void); }; static struct progress_state state; @@ -51,20 +54,56 @@ static void progress_simple_print(void) static void progress_simple_end(void) { -if (state.enabled) { -printf(\n); -} +printf(\n); +} + +static void progress_simple_init(void) +{ +state.print = progress_simple_print; +state.end = progress_simple_end; +} + +static void sigusr_print(int signal) +{ +printf((%3.2f/100%%)\n, state.current); +} + +static void progress_dummy_print(void) +{ +} + +static void progress_dummy_end(void) +{ +} + +static void progress_dummy_init(void) +{ +struct sigaction action; + +memset(action, 0, sizeof(action)); +sigfillset(action.sa_mask); +action.sa_handler = sigusr_print; +action.sa_flags = 0; +sigaction(SIGUSR1, action, NULL); + +state.print = progress_dummy_print; +state.end = progress_dummy_end; } void qemu_progress_init(int enabled, float min_skip) { state.enabled = enabled; state.min_skip = min_skip; +if (enabled) { +progress_simple_init(); +} else { +progress_dummy_init(); +} } void qemu_progress_end(void) { -progress_simple_end(); +state.end(); } void qemu_progress_print(float percent, int max) @@ -84,6 +123,6 @@ void qemu_progress_print(float percent, int max) if (current (state.last_print + state.min_skip) || (current == 100) || (current == 0)) { state.last_print = state.current; -progress_simple_print(); +state.print(); } } -- 1.7.4.2
Re: [Qemu-devel] [PATCH 04/11] Move generic or OS function declarations to qemu-common.h
On 04/08/11 22:45, Blue Swirl wrote: Move generic or OS related function declarations and macro TFR to qemu-common.h. While moving, also add #include winsock2.h to fix a recent mingw32 build breakage. Signed-off-by: Blue Swirl blauwir...@gmail.com --- qemu-common.h | 21 + sysemu.h | 21 - 2 files changed, 21 insertions(+), 21 deletions(-) diff --git a/qemu-common.h b/qemu-common.h index 82e27c1..8b48a09 100644 --- a/qemu-common.h +++ b/qemu-common.h @@ -12,6 +12,7 @@ #endif #define QEMU_BUILD_BUG_ON(x) typedef char __build_bug_on__##__LINE__[(x)?-1:1]; +#define TFR(expr) do { if ((expr) != -1) break; } while (errno == EINTR) typedef struct QEMUTimer QEMUTimer; typedef struct QEMUFile QEMUFile; @@ -39,6 +40,16 @@ typedef struct Monitor Monitor; #include sys/time.h #include assert.h +#ifdef _WIN32 +#include windows.h +#include winsock2.h +#include qemu-os-win32.h +#endif + +#ifdef CONFIG_POSIX +#include qemu-os-posix.h +#endif + Nice, the more we can get out of the dreadful sysemu.h the better. However, please put the includes of windows.h and winsock2.h into qemu-os-win32.h instead. Cheers, Jes
Re: [Qemu-devel] [PATCH v3 0/7] Let boards state maximum RAM limits in QEMUMachine struct
On 04/04/11 19:26, Peter Maydell wrote: On 4 April 2011 15:53, Jes Sorensen jes.soren...@redhat.com wrote: I understand that what you are proposing seems to work well enough for your problem at hand. What I am saying is that adding a mechanism like that, can cause problems for adding a more generic mechanism that handles more advanced boards in the future. I much prefer a generic solution than a simple hack. I don't think it's a hack. I think it's a reasonably clean solution to the set of cases we've actually encountered in practice, and I think trying to design something more generalised without actually having a use case to tie it to is just going to produce something complicated which doesn't turn out to actually be what a hypothetical advanced board will actually need. I think we're much better off with code that does what we need it to do now, and designing and implementing the complicated generic framework only when we actually need it. Sorry but it is not a clean solution at all, it's a simple hack based on a linear model, which happens to work in practice when dealing with some simple boards. I suggested a very simple interface that would work fine for embedded boards, and which would be simple to implement. The so called advanced boards you're referring to are out there in numbers already, you probably have a couple of them. They are called PCs. As I pointed out before, this is not a theoretical problem, most numa systems have this issue, including many x86 boxes. I can see the problem also existing with mips boards like the sb1250 ones I worked on many years ago. OK, so presumably you can provide a qemu command line and an image which demonstrates the problem... Look at the -numa argument and show me the code that actually validates the memory amount correctly in that case? Jes
Re: [Qemu-devel] [PATCH v3 0/7] Let boards state maximum RAM limits in QEMUMachine struct
On 04/04/11 18:54, Blue Swirl wrote: On Mon, Apr 4, 2011 at 5:53 PM, Jes Sorensen jes.soren...@redhat.com wrote: I understand that what you are proposing seems to work well enough for your problem at hand. What I am saying is that adding a mechanism like that, can cause problems for adding a more generic mechanism that handles more advanced boards in the future. I much prefer a generic solution than a simple hack. We could have -device RAM,base=xxx,size=yyy,id=DIMM1 -numa nodeid=zzz,memory=DIMM1 for fine tuned control. But asking users to list and bind the DIMMs needed just to have some amount of RAM is a bit too much. So we also need a simple case (-m) and a simple check for the max memory. I totally agree, but the suggestion I proposed earlier doesn't in any way prevent this. If we use a table of valid memory locations for a given board, then it is really easy for each board to provide a validation function which accepts the amount or rejects it. As I pointed out before, this is not a theoretical problem, most numa systems have this issue, including many x86 boxes. I can see the problem also existing with mips boards like the sb1250 ones I worked on many years ago. Having an a table of valid ram locations for a board, will also give you a framework to validate against if you want to be able to specify chunks of memory at different areas of a board. This could be useful for testing behavior that is like it would be if you have a system where installing different DIMMs would split the RAM up differently. Maybe we could remove some of memory logic in pc.c with this approach. I believe it would simplify things a great deal, and have the benefit that we can emulate things much more realistically. The only issue is that it takes a little more work up front, but it really isn't a big deal. Cheers, Jes
Re: [Qemu-devel] [PATCH v4 5/5] atapi: GESN: implement 'media' subcommand
On 04/12/11 16:09, Amit Shah wrote: diff --git a/hw/ide/core.c b/hw/ide/core.c index dafc049..209d8e6 100644 --- a/hw/ide/core.c +++ b/hw/ide/core.c @@ -1084,6 +1084,49 @@ static int ide_dvd_read_structure(IDEState *s, int format, } } +static unsigned int event_status_media(IDEState *s, + uint8_t *buf, + unsigned int max_len) +{ +enum media_event_code { +no_change = 0, /* Status unchanged */ +eject_requested, /* received a request from user to eject */ +new_media, /* new media inserted and ready for access */ +media_removal, /* only for media changers */ +media_changed, /* only for media changers */ +bg_format_completed, /* MRW or DVD+RW b/g format completed */ +bg_format_restarted, /* MRW or DVD+RW b/g format restarted */ +}; +enum media_status { +tray_open = 1, +media_present = 2, +}; Would be nice if you used upper-case with the enums since they are effectively just defines, to avoid confusing them with variables. +uint8_t event_code, media_status; + +media_status = 0; +if (s-bs-tray_open) { +media_status = tray_open; +} else if (bdrv_is_inserted(s-bs)) { +media_status = media_present; +} + +/* Event notification descriptor */ +event_code = no_change; +if (media_status != tray_open s-events.new_media) { +event_code = new_media; +s-events.new_media = false; +} + +buf[4] = event_code; +buf[5] = media_status; + +/* These fields are reserved, just clear them. */ +buf[6] = 0; +buf[7] = 0; + +return 8; /* We wrote to 4 extra bytes from the header */ Shouldn't you verify that you don't exceed max_len in this? @@ -1104,6 +1147,26 @@ static void handle_get_event_status_notification(IDEState *s, uint8_t supported_events; } __attribute((packed)) *gesn_event_header; +enum notification_class_request_type { +reserved1 = 1 0, +operational_change = 1 1, +power_management = 1 2, +external_request = 1 3, +media = 1 4, +multi_host = 1 5, +device_busy = 1 6, +reserved2 = 1 7, +}; +enum event_notification_class_field { +enc_no_events = 0, +enc_operational_change, +enc_power_management, +enc_external_request, +enc_media, +enc_multiple_hosts, +enc_device_busy, +enc_reserved, +}; More lower-case enums :( Otherwise the series looks good. Jes
Re: [Qemu-devel] [PATCH v4 5/5] atapi: GESN: implement 'media' subcommand
On 04/12/11 17:13, Kevin Wolf wrote: Am 12.04.2011 17:03, schrieb Jes Sorensen: Shouldn't you verify that you don't exceed max_len in this? Not necessary (the buffer is always 2048 bytes), but it looks like the max_len parameter is unused now, so it could be removed. Kevin That works fine too - I just didn't like having a length and then having it ignored completely. Cheers, Jes
Re: [Qemu-devel] [PATCH v5 0/5] atapi: Implement 'media' subcommand for GESN
On 04/12/11 18:06, Amit Shah wrote: The GET_EVENT_STATUS_NOTIFICATION ATAPI command is listed as a mandatory command in the spec but we don't really implement it any of its sub-commands. The commit message for the last commit explains why implementing just the media subcommand is helpful and how it goes a long way in getting guests to behave as expected. The difference from the RFC series sent earlier is: - Split into more patches - Add tray open/close notification (from Markus) There certainly is much more work to be done for the other commands and also for state change handling (tray open / close / new media) overall for the block layer, but this is a good first step in being spec-compliant and at the same time making guests work. v5 looks good. Acked-by: Jes Sorensen jes.soren...@redhat.com Jes
Re: [Qemu-devel] [PATCH v3 0/7] Let boards state maximum RAM limits in QEMUMachine struct
On 03/30/11 16:07, Peter Maydell wrote: On 30 March 2011 14:56, Anthony Liguori anth...@codemonkey.ws wrote: On 03/30/2011 08:22 AM, Peter Maydell wrote: Not really, typically they're just filled up in some particular order (main RAM in one place and expansion RAM elsewhere). Since the board init function is only passed a single ram_size parameter that's all you can do anyhow. FWIW, I don't think any static data is going to be perfect here. A lot of boards have strict requirements on ram_size based on plausible combinations of DIMMs. Arbitrary values up to ram_size is not good enough. ram_size ought to be viewed as a hint, not a mechanism to allow common code to completely validate the passed in ram size parameter. It's still up to the board to validate that the given ram size makes sense. Yes, I agree, so we shouldn't try to specify some complicated set of static data that still won't be good enough. I'm trying to make it easy for boards to avoid crashing horribly when the user passes a bad value; that's all. If you don't validate properly, is there really a point in introducing that value anyway? From what you write, it sounds like it can still fail for some limits of the memory valid if the config is wrong? It still seems to me it would be better to have the boards present a table of valid memory ranges so we can do a proper validation of the valud? Cheers, Jes
Re: [Qemu-devel] [PATCH v3 0/7] Let boards state maximum RAM limits in QEMUMachine struct
On 04/04/11 16:42, Peter Maydell wrote: On 4 April 2011 15:29, Jes Sorensen jes.soren...@redhat.com wrote: Yes, I agree, so we shouldn't try to specify some complicated set of static data that still won't be good enough. I'm trying to make it easy for boards to avoid crashing horribly when the user passes a bad value; that's all. If you don't validate properly, is there really a point in introducing that value anyway? From what you write, it sounds like it can still fail for some limits of the memory valid if the config is wrong? For the boards I care about (the ARM ones), the only validation requirement is that we don't allow the user to specify so much ram that we overlap physical RAM with I/O space. So ram_size is good enough. For the sun4m boards we can assume that the only validation they need is a ram_size check, because that's all they do at the moment and nobody's complaining that I know of. I understand that what you are proposing seems to work well enough for your problem at hand. What I am saying is that adding a mechanism like that, can cause problems for adding a more generic mechanism that handles more advanced boards in the future. I much prefer a generic solution than a simple hack. It still seems to me it would be better to have the boards present a table of valid memory ranges so we can do a proper validation of the valud? If you have a concrete example of multiple boards which we currently model and which require this level of flexibility to avoid odd misbehaviour trying to run a guest, then please point them out and I'll look at expanding the patch to cover their requirements. If this is just a theoretical issue, then I think we should only add the extra generic framework code if and when we turn out to need it. As I pointed out before, this is not a theoretical problem, most numa systems have this issue, including many x86 boxes. I can see the problem also existing with mips boards like the sb1250 ones I worked on many years ago. Having an a table of valid ram locations for a board, will also give you a framework to validate against if you want to be able to specify chunks of memory at different areas of a board. This could be useful for testing behavior that is like it would be if you have a system where installing different DIMMs would split the RAM up differently. Jes
[Qemu-devel] Re: [PATCH V3] floppy: save and restore DIR register
On 04/01/11 08:22, Jason Wang wrote: We need to keep DIR register unchanged across migration, but currently it depends on the media_changed flags from block layer. Since we do not save/restore it and the bdrv_open() called in dest node may set the media_changed flag when trying to open floppy image, guest driver may think the floppy have changed after migration. To fix this, a new filed media_changed in FDrive strcutre was introduced in order to save and restore the it from block layer through pre_save/post_load callbacks. Signed-off-by: Jason Wang jasow...@redhat.com Looked through this, and it looks perfectly reasonable to me. Reviewed-by: Jes Sorensen jes.soren...@redhat.com
[Qemu-devel] Re: [RFC][PATCH v1 10/12] guest agent: qemu-ga daemon
On 03/25/11 20:47, Michael Roth wrote: This is the actual guest daemon, it listens for requests over a virtio-serial/isa-serial/unix socket channel and routes them through to dispatch routines, and writes the results back to the channel in a manner similar to Qmp. This is currently horribly broken, only the unix-listen channel method is working at the moment (likely due to mis-use of gio channel interfaces), and the code is in overall rough shape. Signed-off-by: Michael Roth mdr...@linux.vnet.ibm.com --- qemu-ga.c | 522 + 1 files changed, 522 insertions(+), 0 deletions(-) create mode 100644 qemu-ga.c diff --git a/qemu-ga.c b/qemu-ga.c new file mode 100644 index 000..435a1fc --- /dev/null +++ b/qemu-ga.c @@ -0,0 +1,522 @@ +/* + * QEMU Guest Agent + * + * Copyright IBM Corp. 2011 + * + * Authors: + * Adam Litkeagli...@linux.vnet.ibm.com + * Michael Roth mdr...@linux.vnet.ibm.com + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. + */ +#include stdlib.h +#include stdio.h +#include stdbool.h +#include glib.h +#include gio/gio.h +#include getopt.h +#include termios.h +#include qemu_socket.h +#include json-streamer.h +#include json-parser.h +#include guest-agent.h + +#define QGA_VERSION 1.0 +#define QGA_GUEST_PATH_VIRTIO_DEFAULT /dev/virtio-ports/va +#define QGA_PIDFILE_DEFAULT /var/run/qemu-va.pid +#define QGA_BAUDRATE_DEFAULT B38400 /* for isa-serial channels */ + +bool verbose_enabled = false; + +typedef struct GAState { +bool active; +int session_id; +const char *proxy_path; +JSONMessageParser parser; +GMainLoop *main_loop; +guint conn_id; +GSocket *conn_sock; +GIOChannel *conn_channel; +guint listen_id; +GSocket *listen_sock; +GIOChannel *listen_channel; +const char *path; +const char *method; +} GAState; + +static void usage(const char *cmd) +{ +printf( +Usage: %s -c channel_opts\n +QEMU virtagent guest agent %s\n +\n + -c, --channel channel method: one of unix-connect, virtio-serial, or\n +isa-serial\n + -p, --pathchannel path\n + -v, --verbose display extra debugging information\n + -d, --daemonize become a daemon\n + -h, --helpdisplay this help and exit\n +\n +Report bugs to mdr...@linux.vnet.ibm.com\n +, cmd, QGA_VERSION); +} + +static void conn_channel_close(GAState *s); + +static void become_daemon(void) +{ +pid_t pid, sid; +int pidfd; +char *pidstr; + +pid = fork(); +if (pid 0) +exit(EXIT_FAILURE); There's a pile of missing braces in this file - please go through it and fix them before the next version. Cheers, Jes
Re: [Qemu-devel] Re: [PATCH v2] qemu-img: Initial progress printing support
On 03/31/11 13:49, Stefan Hajnoczi wrote: On Thu, Mar 31, 2011 at 12:38 PM, Kevin Wolf kw...@redhat.com wrote: Am 31.03.2011 13:15, schrieb Jes Sorensen: On 03/31/11 12:38, Kevin Wolf wrote: I have been a little reluctant to do this because it will break the ABI for tools running qemu-img from a GUI etc. That's the reason for the from a terminal part. If we check for isatty(), we should handle these cases just fine. Yes, I think checking for a tty is enough precaution and allows users to get the benefit of the progress bar. TBH I'd probably forget to add -p half the time :). Ok, this is fine with me - however how do you suggest we offer the option to disable it on the command line, an additional flag? Jes
[Qemu-devel] Re: [PATCH v2] qemu-img: Initial progress printing support
On 03/31/11 12:38, Kevin Wolf wrote: Am 30.03.2011 14:16, schrieb jes.soren...@redhat.com: From: Jes Sorensen jes.soren...@redhat.com This adds the basic infrastructure for supporting progress output on the command line, as well as progress support for qemu-img commands 'rebase' and 'convert'. Signed-off-by: Jes Sorensen jes.soren...@redhat.com Thanks, applied to the block branch. I think we should consider turning the progress output on by default if qemu-img is run from a terminal. Thanks! I have been a little reluctant to do this because it will break the ABI for tools running qemu-img from a GUI etc. Cheers, Jes
Re: [Qemu-devel] A question about QEMU on unix
On 03/29/11 12:55, Bin (Bin) Shi wrote: Can QEMU run on QNX ? My machine is Cpu - arm11 Os - qnx6.5 Does QEMU support my machine ? Hi, Do you mean if QEMU can emulate ARM11 and boot QNX that way, or do you want to run QEMU on a QNX box? I don't think QEMU has been ported to QNX, so it may require a bit of work. I have no idea how hard it would be though. As for emulating ARM11, I cannot say. Cheers, Jes
Re: [Qemu-devel] [PATCH v3 0/7] Let boards state maximum RAM limits in QEMUMachine struct
On 03/29/11 16:08, Peter Maydell wrote: This primary aim of this patchset is to add a new 'max_ram' field to the QEMUMachine structure so that a board model can specify the maximum RAM it will accept. We can then produce a friendly diagnostic message when the user tries to start qemu with a '-m' option asking for more RAM than that. (Currently most of the ARM devboard models respond with an obscure guest crash when the guest tries to access RAM and finds device registers instead.) If no maximum size is specified we default to the old behaviour of do not impose any limit. The bulk of the patchset is knock-on cleanup as a result, in particular allowing QEMUMachine structs to be const and sun4m cleanup. Hi Peter, Sorry for not getting to this patch earlier. I am a little concerned about this approach. It should work for simple embedded boards, but for larger systems, it really ought to be a mask rather than a max address. On NUMA systems you are bound to have holes, and at times you might not even start from address 0. In these cases you are likely to have device registers mapped in between the memory chunks. Ideally I think it would be better to have a mask and then introduce a is_valid_memory() kinda function to check it with. I fear that by introducing a simple max limit like this, we are going to hit problems later when we try to improve the NUMA support. Cheers, Jes