Re: [Qemu-devel] Re: [Spice-devel] RFC; usb redirection protocol
Hi, I know of devices that will enumerate twice, first as one device, then after a certain setup exchange as another. But that seems to be covered by the suggestion here, it will just be identicle to two completely different transports. Or identify devices by physical port instead of bus address or vendor/device id. Having that would be good for usb passthrough too I guess. cheers, Gerd
[Qemu-devel] Re: [Spice-devel] RFC; usb redirection protocol
Hi, It could be that the implementation decides to not handle synchroneous commands synchroneous at all. The main difference I'm trying to make here is between commands which do things to end points which cannot be done while other packets are in flight (like setting configuration, or interface alt setting) and commands (normal packets) of which there can be multiple in flight. I'm open to using a different term then synchroneous and async here. Name them control / data packets? cheers, Gerd
Re: [Qemu-devel] Re: [PATCH 00/15] Megasas HBA emulation and SCSI update v.3
On Thu, Dec 16, 2010 at 1:45 AM, Benjamin Herrenschmidt b...@kernel.crashing.org wrote: On Mon, 2010-12-13 at 08:32 +0100, Hannes Reinecke wrote: On 12/10/2010 11:14 PM, Paolo Bonzini wrote: On 11/24/2010 05:50 PM, Christoph Hellwig wrote: Btw, it might make sense to split this series into two. Patches 1 to 11 are genuine improvements to the SCSI code, which I'd like to see merged ASAP. The rest is the actual megasas driver, which I still want to see, but haven't even gotten to review yet. Ping for patches 1 to 11? Paolo The first few already have been merged by Kevin Wolf; I'll see to prepare an updated patchset. Actually, I was about to ask as I'd like to base some new work of mine on top of these. I don't see any recent commit from Kevin in the qemu master branch (nor in any other branch on http://git.savannah.gnu.org/cgit/qemu.git/log/). Does Kevin maintain a separate staging tree ? http://repo.or.cz/w/qemu/kevin.git/shortlog/refs/heads/block Stefan
Re: [Qemu-devel] Re: [PATCH] PCI: Bus number from the bridge, not the device
On Thu, Dec 16, 2010 at 04:08:16PM +0900, Isaku Yamahata wrote: On Wed, Dec 15, 2010 at 08:27:49AM -0700, Alex Williamson wrote: On Wed, 2010-12-15 at 11:56 +0200, Michael S. Tsirkin wrote: On Tue, Dec 14, 2010 at 11:34:53AM -0700, Alex Williamson wrote: On Tue, 2010-12-14 at 14:26 +0200, Michael S. Tsirkin wrote: On Mon, Dec 13, 2010 at 10:04:24PM -0700, Alex Williamson wrote: I've only ever seen config[PCI_SECONDARY_BUS] be non-zero for an assigned device, so I'm pretty sure we're not going to hurt migration, but the code is clearly wrong and I'd like to make sure we don't trip on a migration failure for a minor device config space change. Which reminds me: maybe just mark nested bridges as non-migrateable for now? Care writing such a patch? Hmm, this is trickier than it sounds. Hmm, since 0 is put in the path instead of the bridge number, will the correct bridge be restored? We're really only broken wrt migration if a device under a bridge calls qemu_ram_alloc. I guess there's more broken-ness. What exactly breaks qemu_ram_alloc? You're right, it's more broken than that. Anything that calls get_dev_path is broken for migration of bridges since the path is determined before the guest updates bus numbers. That includes qemu_ram_alloc and vmstate. I was only looking at the qemu_ram_alloc side. So perhaps the right answer, for the moment, is to block migration if there's a p2p bridge. Eww. That's bad. Anyway, I agree to disable it for the moment. Patch? Let's fix it after 0.14 release. -- yamahata
[Qemu-devel] Re: [PATCH v3] qemu, qmp: convert do_inject_nmi() to QObject, QError
On 12/15/2010 08:00 PM, Luiz Capitulino wrote: Looks like a GUI feature to me, Really? Can't see how you can build NMI to all CPUs from NMI this CPU. Or am I misunderstanding you? I guess so. Avi referred to 'nmi button on many machines', I assumed he meant a virtual machine GUI, am I wrong? I meant a real machine's GUI (it's a physical button you can press with your finger, if you have thin fingers). -- error compiling committee.c: too many arguments to function
[Qemu-devel] Re: [PATCH 11/21] ioport: insert event_tap_ioport() to ioport_write().
On Thu, Dec 16, 2010 at 04:37:41PM +0900, Yoshiaki Tamura wrote: 2010/11/28 Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp: 2010/11/28 Michael S. Tsirkin m...@redhat.com: On Thu, Nov 25, 2010 at 03:06:50PM +0900, Yoshiaki Tamura wrote: Record ioport event to replay it upon failover. Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp Interesting. This will have to be extended to support ioeventfd. Since each eventfd is really just a binary trigger it should be enough to read out the fd state. Haven't thought about eventfd yet. Will try doing it in the next spin. Hi Michael, I looked into eventfd and realized it's only used with vhost now. There are patches on list to use it for block/userspace net. However, I believe vhost bypass the net layer in qemu, and there is no way for Kemari to detect the outputs. To me, it doesn't make sense to extend this patch to support eventfd... Thanks, Yoshi Yoshi --- ioport.c | 2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) diff --git a/ioport.c b/ioport.c index aa4188a..74aebf5 100644 --- a/ioport.c +++ b/ioport.c @@ -27,6 +27,7 @@ #include ioport.h #include trace.h +#include event-tap.h /***/ /* IO Port */ @@ -76,6 +77,7 @@ static void ioport_write(int index, uint32_t address, uint32_t data) default_ioport_writel }; IOPortWriteFunc *func = ioport_write_table[index][address]; + event_tap_ioport(index, address, data); if (!func) func = default_func[index]; func(ioport_opaque[address], address, data); -- 1.7.1.2 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] Re: [PATCH v3] qemu, qmp: convert do_inject_nmi() to QObject, QError
Luiz Capitulino lcapitul...@redhat.com writes: On Wed, 15 Dec 2010 18:45:09 +0100 Markus Armbruster arm...@redhat.com wrote: Luiz Capitulino lcapitul...@redhat.com writes: On Wed, 15 Dec 2010 19:18:32 +0200 Avi Kivity a...@redhat.com wrote: [...] I'd like to see cpu-index made optional; if not present, nmi all cpus (that's what the nmi button on many machines does, or at least I think that's what it does). Looks like a GUI feature to me, Really? Can't see how you can build NMI to all CPUs from NMI this CPU. Or am I misunderstanding you? I guess so. Avi referred to 'nmi button on many machines', I assumed he meant a virtual machine GUI, am I wrong? _might_ turn out to be an undesirable side effect to client writers. They seem to be coping fine with optional arguments elsewhere. Which we might want to review. I guess I prefer a to-all-cpus argument. How would that look like? cpu-index: all? Like this: { execute: inject-nmi, arguments: { to-all-cpus: true } } But this looks like an optimization to me, because it's also easy to do: for cpu in query-cpus; do inject-nmi cpu Unless we want to do this in an atomic way, due to side effects I'm not aware about. I'd expect a physical NMI button to interrupt all CPUs simultaneously (modulo wire length). [...]
[Qemu-devel] Re: [PATCH 11/21] ioport: insert event_tap_ioport() to ioport_write().
2010/12/16 Michael S. Tsirkin m...@redhat.com: On Thu, Dec 16, 2010 at 04:37:41PM +0900, Yoshiaki Tamura wrote: 2010/11/28 Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp: 2010/11/28 Michael S. Tsirkin m...@redhat.com: On Thu, Nov 25, 2010 at 03:06:50PM +0900, Yoshiaki Tamura wrote: Record ioport event to replay it upon failover. Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp Interesting. This will have to be extended to support ioeventfd. Since each eventfd is really just a binary trigger it should be enough to read out the fd state. Haven't thought about eventfd yet. Will try doing it in the next spin. Hi Michael, I looked into eventfd and realized it's only used with vhost now. There are patches on list to use it for block/userspace net. Thanks. Now I understand. In that case, inserting an even-tap function to the following code should be appropriate? int event_notifier_test_and_clear(EventNotifier *e) { uint64_t value; int r = read(e-fd, value, sizeof(value)); return r == sizeof(value); } However, I believe vhost bypass the net layer in qemu, and there is no way for Kemari to detect the outputs. To me, it doesn't make sense to extend this patch to support eventfd... Thanks, Yoshi Yoshi --- ioport.c | 2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) diff --git a/ioport.c b/ioport.c index aa4188a..74aebf5 100644 --- a/ioport.c +++ b/ioport.c @@ -27,6 +27,7 @@ #include ioport.h #include trace.h +#include event-tap.h /***/ /* IO Port */ @@ -76,6 +77,7 @@ static void ioport_write(int index, uint32_t address, uint32_t data) default_ioport_writel }; IOPortWriteFunc *func = ioport_write_table[index][address]; + event_tap_ioport(index, address, data); if (!func) func = default_func[index]; func(ioport_opaque[address], address, data); -- 1.7.1.2 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[Qemu-devel] Re: [PATCH 05/21] virtio: modify save/load handler to handle inuse varialble.
On Thu, Dec 16, 2010 at 04:36:16PM +0900, Yoshiaki Tamura wrote: 2010/12/3 Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp: 2010/12/2 Michael S. Tsirkin m...@redhat.com: On Wed, Dec 01, 2010 at 05:03:43PM +0900, Yoshiaki Tamura wrote: 2010/11/28 Michael S. Tsirkin m...@redhat.com: On Sun, Nov 28, 2010 at 08:27:58PM +0900, Yoshiaki Tamura wrote: 2010/11/28 Michael S. Tsirkin m...@redhat.com: On Thu, Nov 25, 2010 at 03:06:44PM +0900, Yoshiaki Tamura wrote: Modify inuse type to uint16_t, let save/load to handle, and revert last_avail_idx with inuse if there are outstanding emulation. Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp This changes migration format, so it will break compatibility with existing drivers. More generally, I think migrating internal state that is not guest visible is always a mistake as it ties migration format to an internal implementation (yes, I know we do this sometimes, but we should at least try not to add such cases). I think the right thing to do in this case is to flush outstanding work when vm is stopped. Then, we are guaranteed that inuse is 0. I sent patches that do this for virtio net and block. Could you give me the link of your patches? I'd like to test whether they work with Kemari upon failover. If they do, I'm happy to drop this patch. Yoshi Look for this: stable migration image on a stopped vm sent on: Wed, 24 Nov 2010 17:52:49 +0200 Thanks for the info. However, The patch series above didn't solve the issue. In case of Kemari, inuse is mostly 0 because it queues the output, and while last_avail_idx gets incremented immediately, not sending inuse makes the state inconsistent between Primary and Secondary. Hmm. Can we simply avoid incrementing last_avail_idx? I think we can calculate or prepare an internal last_avail_idx, and update the external when inuse is decremented. I'll try whether it work w/ w/o Kemari. Hi Michael, Could you please take a look at the following patch? Which version is this against? commit 36ee7910059e6b236fe9467a609f5b4aed866912 Author: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp Date: Thu Dec 16 14:50:54 2010 +0900 virtio: update last_avail_idx when inuse is decreased. Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp It would be better to have a commit description explaining why a change is made, and why it is correct, not just repeating what can be seen from the diff anyway. diff --git a/hw/virtio.c b/hw/virtio.c index c8a0fc6..6688c02 100644 --- a/hw/virtio.c +++ b/hw/virtio.c @@ -237,6 +237,7 @@ void virtqueue_flush(VirtQueue *vq, unsigned int count) wmb(); trace_virtqueue_flush(vq, count); vring_used_idx_increment(vq, count); +vq-last_avail_idx += count; vq-inuse -= count; } @@ -385,7 +386,7 @@ int virtqueue_pop(VirtQueue *vq, VirtQueueElement *elem) unsigned int i, head, max; target_phys_addr_t desc_pa = vq-vring.desc; -if (!virtqueue_num_heads(vq, vq-last_avail_idx)) +if (!virtqueue_num_heads(vq, vq-last_avail_idx + vq-inuse)) return 0; /* When we start there are none of either input nor output. */ @@ -393,7 +394,7 @@ int virtqueue_pop(VirtQueue *vq, VirtQueueElement *elem) max = vq-vring.num; -i = head = virtqueue_get_head(vq, vq-last_avail_idx++); +i = head = virtqueue_get_head(vq, vq-last_avail_idx + vq-inuse); if (vring_desc_flags(desc_pa, i) VRING_DESC_F_INDIRECT) { if (vring_desc_len(desc_pa, i) % sizeof(VRingDesc)) { Hmm, will virtio_queue_empty be wrong now? What about virtqueue_avail_bytes? Previous patch version sure looked simpler, and this seems functionally equivalent, so my question still stands: here it is rephrased in a different way: assume that we have in avail ring 2 requests at start of ring: A and B in this order host pops A, then B, then completes B and flushes now with this patch last_avail_idx will be 1, and then remote will get it, it will execute B again. As a result B will complete twice, and apparently A will never complete. This is what I was saying below: assuming that there are outstanding requests when we migrate, there is no way a single index can be enough to figure out which requests need to be handled and which are in flight already. We must add some kind of bitmask to tell us which is which. I'm wondering why last_avail_idx is OK to send but not inuse. last_avail_idx is at some level a mistake, it exposes part of our internal implementation, but it does *also* express a guest observable state. Here's the problem that it solves: just looking at the rings in virtio there is no way to detect that a specific request has already been completed. And the protocol forbids completing the same request twice. Our
[Qemu-devel] Re: [PATCH 11/21] ioport: insert event_tap_ioport() to ioport_write().
On Thu, Dec 16, 2010 at 06:50:04PM +0900, Yoshiaki Tamura wrote: 2010/12/16 Michael S. Tsirkin m...@redhat.com: On Thu, Dec 16, 2010 at 04:37:41PM +0900, Yoshiaki Tamura wrote: 2010/11/28 Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp: 2010/11/28 Michael S. Tsirkin m...@redhat.com: On Thu, Nov 25, 2010 at 03:06:50PM +0900, Yoshiaki Tamura wrote: Record ioport event to replay it upon failover. Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp Interesting. This will have to be extended to support ioeventfd. Since each eventfd is really just a binary trigger it should be enough to read out the fd state. Haven't thought about eventfd yet. Will try doing it in the next spin. Hi Michael, I looked into eventfd and realized it's only used with vhost now. There are patches on list to use it for block/userspace net. Thanks. Now I understand. In that case, inserting an even-tap function to the following code should be appropriate? int event_notifier_test_and_clear(EventNotifier *e) { uint64_t value; int r = read(e-fd, value, sizeof(value)); return r == sizeof(value); } Possibly. However, I believe vhost bypass the net layer in qemu, and there is no way for Kemari to detect the outputs. Then maybe you should check for this combination and either disable vhost-net on the backend when kemari is active or fail. To me, it doesn't make sense to extend this patch to support eventfd... Thanks, Yoshi Yoshi --- ioport.c | 2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) diff --git a/ioport.c b/ioport.c index aa4188a..74aebf5 100644 --- a/ioport.c +++ b/ioport.c @@ -27,6 +27,7 @@ #include ioport.h #include trace.h +#include event-tap.h /***/ /* IO Port */ @@ -76,6 +77,7 @@ static void ioport_write(int index, uint32_t address, uint32_t data) default_ioport_writel }; IOPortWriteFunc *func = ioport_write_table[index][address]; + event_tap_ioport(index, address, data); if (!func) func = default_func[index]; func(ioport_opaque[address], address, data); -- 1.7.1.2 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH 1/2] Introduce strtosz_suffix()
Sorry for the late review. jes.soren...@redhat.com writes: From: Jes Sorensen jes.soren...@redhat.com This introduces strtosz_suffix() which allows the caller to specify a default suffix in case the non default of MB is wanted. strtosz() is kept as a wrapper for strtosz_suffix() which keeps it's current default of MB. Signed-off-by: Jes Sorensen jes.soren...@redhat.com --- cutils.c | 17 ++--- qemu-common.h |7 +++ 2 files changed, 21 insertions(+), 3 deletions(-) diff --git a/cutils.c b/cutils.c index 28089aa..7984bc1 100644 --- a/cutils.c +++ b/cutils.c @@ -291,10 +291,10 @@ int fcntl_setfl(int fd, int flag) * value must be terminated by whitespace, ',' or '\0'. Return -1 on * error. */ -ssize_t strtosz(const char *nptr, char **end) +ssize_t strtosz_suffix(const char *nptr, char **end, const char default_suffix) Last parameter's const is of marginal uility. Matter of taste. { ssize_t retval = -1; -char *endptr, c; +char *endptr, c, d; int mul_required = 0; double val, mul, integral, fraction; @@ -313,10 +313,16 @@ ssize_t strtosz(const char *nptr, char **end) * part of a multi token argument. */ c = *endptr; +d = c; if (isspace(c) || c == '\0' || c == ',') { c = 0; +if (default_suffix) { +d = default_suffix; +} else { +d = c; +} The else clause d = c is effectively d = 0 (due to prior c = 0), which is the same as d = default_suffix (due to else's condition). Thus, you can replace the conditional by a simple d = default_suffix; } -switch (c) { +switch (d) { case 'B': case 'b': mul = 1; @@ -371,3 +377,8 @@ fail: return retval; } + +ssize_t strtosz(const char *nptr, char **end) +{ +return strtosz_suffix(nptr, end, STRTOSZ_DEFSUFFIX_MB); +} diff --git a/qemu-common.h b/qemu-common.h index de82c2e..1ed32e5 100644 --- a/qemu-common.h +++ b/qemu-common.h @@ -149,7 +149,14 @@ time_t mktimegm(struct tm *tm); int qemu_fls(int i); int qemu_fdatasync(int fd); int fcntl_setfl(int fd, int flag); + +#define STRTOSZ_DEFSUFFIX_TB 'T' +#define STRTOSZ_DEFSUFFIX_GB 'G' +#define STRTOSZ_DEFSUFFIX_MB 'M' +#define STRTOSZ_DEFSUFFIX_KB 'K' +#define STRTOSZ_DEFSUFFIX_B 'B' I don't see what these defines buy us over the literals, but if it makes you or other reviewers happier... ssize_t strtosz(const char *nptr, char **end); +ssize_t strtosz_suffix(const char *nptr, char **end, const char default_suffix); /* path.c */ void init_paths(const char *prefix);
Re: [Qemu-devel] [PATCH 0/1] introduce spice-qemu-char chardev
Alon Levy al...@redhat.com writes: Adding a chardev backend for spice, for usage by spice vdagent over a with virtio-serial device. Please put this into PATCH 1/1, so it gets committed. Thanks.
[Qemu-devel] Re: [PATCH v3] qemu, qmp: convert do_inject_nmi() to QObject, QError
On 12/16/2010 12:48 PM, Luiz Capitulino wrote: Ok, I didn't know that, but I had another idea: the command could accept either a single cpu index or a list: { execute: inject-nmi, arguments: { cpus: 2 } } { execute: inject-nmi, arguments: { cpus: [1, 2, 3, 4] } } This has the feature of injecting the nmi in just some cpus, although I'm not sure this is going to be desired/useful. If we agree on this we'll have to wait because the monitor doesn't currently support hybrid arguments. I hope it never does. They're hard to support in old-school statically typed languages. -- error compiling committee.c: too many arguments to function
[Qemu-devel] Re: [PATCH v3] qemu, qmp: convert do_inject_nmi() to QObject, QError
On Thu, 16 Dec 2010 11:03:38 +0200 Avi Kivity a...@redhat.com wrote: On 12/15/2010 08:00 PM, Luiz Capitulino wrote: Looks like a GUI feature to me, Really? Can't see how you can build NMI to all CPUs from NMI this CPU. Or am I misunderstanding you? I guess so. Avi referred to 'nmi button on many machines', I assumed he meant a virtual machine GUI, am I wrong? I meant a real machine's GUI (it's a physical button you can press with your finger, if you have thin fingers). Ok, I didn't know that, but I had another idea: the command could accept either a single cpu index or a list: { execute: inject-nmi, arguments: { cpus: 2 } } { execute: inject-nmi, arguments: { cpus: [1, 2, 3, 4] } } This has the feature of injecting the nmi in just some cpus, although I'm not sure this is going to be desired/useful. If we agree on this we'll have to wait because the monitor doesn't currently support hybrid arguments.
[Qemu-devel] [PATCH 1/3] qemu-img.c: Re-factor img_create()
From: Jes Sorensen jes.soren...@redhat.com This patch re-factors img_create() moving the code doing the actual work into block.c where it can be shared with QEMU. This is needed to be able to create images from QEMU to be used for live snapshots. Signed-off-by: Jes Sorensen jes.soren...@redhat.com --- block.c| 144 block.h|4 ++ qemu-img.c | 108 + 3 files changed, 150 insertions(+), 106 deletions(-) diff --git a/block.c b/block.c index b4aaf41..765f9f3 100644 --- a/block.c +++ b/block.c @@ -2758,3 +2758,147 @@ int64_t bdrv_get_dirty_count(BlockDriverState *bs) { return bs-dirty_count; } + +int bdrv_img_create(const char *filename, const char *fmt, +const char *base_filename, const char *base_fmt, +char *options, uint64_t img_size, int flags) +{ +QEMUOptionParameter *param = NULL, *create_options = NULL; +QEMUOptionParameter *backing_fmt; +BlockDriverState *bs = NULL; +BlockDriver *drv, *proto_drv; +int ret = 0; + +/* Find driver and parse its options */ +drv = bdrv_find_format(fmt); +if (!drv) { +error_report(Unknown file format '%s', fmt); +ret = -1; +goto out; +} + +proto_drv = bdrv_find_protocol(filename); +if (!proto_drv) { +error_report(Unknown protocol '%s', filename); +ret = -1; +goto out; +} + +create_options = append_option_parameters(create_options, + drv-create_options); +create_options = append_option_parameters(create_options, + proto_drv-create_options); + +/* Create parameter list with default values */ +param = parse_option_parameters(, create_options, param); + +set_option_parameter_int(param, BLOCK_OPT_SIZE, img_size); + +/* Parse -o options */ +if (options) { +param = parse_option_parameters(options, create_options, param); +if (param == NULL) { +error_report(Invalid options for file format '%s'., fmt); +ret = -1; +goto out; +} +} + +if (base_filename) { +if (set_option_parameter(param, BLOCK_OPT_BACKING_FILE, + base_filename)) { +error_report(Backing file not supported for file format '%s', + fmt); +ret = -1; +goto out; +} +} + +backing_fmt = get_option_parameter(param, BLOCK_OPT_BACKING_FMT); +if (backing_fmt backing_fmt-value.s) { +if (!bdrv_find_format(backing_fmt-value.s)) { +error_report(Unknown backing file format '%s', + backing_fmt-value.s); +ret = -1; +goto out; +} +} + +if (base_fmt) { +if (set_option_parameter(param, BLOCK_OPT_BACKING_FMT, base_fmt)) { +error_report(Backing file format not supported for file + format '%s', fmt); +ret = -1; +goto out; +} +} + +// The size for the image must always be specified, with one exception: +// If we are using a backing file, we can obtain the size from there +if (get_option_parameter(param, BLOCK_OPT_SIZE)-value.n == -1) { +QEMUOptionParameter *backing_file = +get_option_parameter(param, BLOCK_OPT_BACKING_FILE); + +if (backing_file backing_file-value.s) { +uint64_t size; +const char *fmt = NULL; +char buf[32]; + +if (backing_fmt backing_fmt-value.s) { +fmt = backing_fmt-value.s; +} + +bs = bdrv_new(); +if (!bs) { +error_report(Not enough memory to allocate BlockDriverState); +ret = -1; +goto out; +} +ret = bdrv_open(bs, backing_file-value.s, flags, drv); +if (ret 0) { +error_report(Could not open '%s', filename); +ret = -1; +goto out; +} +bdrv_get_geometry(bs, size); +size *= 512; + +snprintf(buf, sizeof(buf), % PRId64, size); +set_option_parameter(param, BLOCK_OPT_SIZE, buf); +} else { +error_report(Image creation needs a size parameter); +ret = -1; +goto out; +} +} + +printf(Formatting '%s', fmt=%s , filename, fmt); +print_option_parameters(param); +puts(); + +ret = bdrv_create(drv, filename, param); +free_option_parameters(create_options); +free_option_parameters(param); + +if (ret 0) { +if (ret == -ENOTSUP) { +error_report(Formatting or formatting option not supported for + file format '%s', fmt); +} else if (ret == -EFBIG) { +
[Qemu-devel] [PATCH 3/3] Prevent creating an image with the same filename as backing file
From: Jes Sorensen jes.soren...@redhat.com Signed-off-by: Jes Sorensen jes.soren...@redhat.com --- block.c |7 +++ 1 files changed, 7 insertions(+), 0 deletions(-) diff --git a/block.c b/block.c index 765f9f3..027dc6a 100644 --- a/block.c +++ b/block.c @@ -2769,6 +2769,13 @@ int bdrv_img_create(const char *filename, const char *fmt, BlockDriver *drv, *proto_drv; int ret = 0; +if (!strcmp(filename, base_filename)) { +error_report(Error: Trying to create a snapshot with the same + filename as the backing file); +ret = -1; +goto out; +} + /* Find driver and parse its options */ drv = bdrv_find_format(fmt); if (!drv) { -- 1.7.3.3
[Qemu-devel] [PATCH v2 0/3] Re-factor img_create() and add live snapshots
From: Jes Sorensen jes.soren...@redhat.com Hi, This set of patches re-factors img_create() and moves the core part of it into block.c so it can be accessed from qemu as well as qemu-img. The second patch adds basic live snapshots support to the code, however only snapshots to external QCOW2 images is supported for now. QED support should be trivial once the QED patches go into upstream. The last patch fixes a small gotcha which is present in the old code as well. Try to catch cases where a user tries to create an image with itself as the backing file. QEMU does 'interesting' things when you do this. Many thanks to Kevin for his help with block layer internals! New in v2: - Fix error return value in monitor command - Clarify help message for command - Fix patch conflict against block tree. It's all Stefan's fault :) f8feb11f4d76f390dddc5cc5345abf99f7659a78 Cheers, Jes Jes Sorensen (3): qemu-img.c: Re-factor img_create() Introduce do_snapshot_blkdev() and monitor command to handle it. Prevent creating an image with the same filename as backing file block.c | 151 +++ block.h |4 ++ blockdev.c | 61 ++ blockdev.h |1 + hmp-commands.hx | 19 +++ qemu-img.c | 108 +-- 6 files changed, 238 insertions(+), 106 deletions(-) -- 1.7.3.3
[Qemu-devel] [PATCH 2/5] ccid: add passthru card device
The passthru ccid card is a device sitting on the usb-ccid bus and using a chardevice to communicate with a remote device using the VSCard protocol defined in libcacard/vscard_common.h Usage docs available in following patch in docs/ccid.txt Signed-off-by: Alon Levy al...@redhat.com --- Makefile.objs |2 +- hw/ccid-card-passthru.c | 277 + libcacard/vscard_common.h | 130 + 3 files changed, 408 insertions(+), 1 deletions(-) create mode 100644 hw/ccid-card-passthru.c create mode 100644 libcacard/vscard_common.h diff --git a/Makefile.objs b/Makefile.objs index 1454264..c4a445c 100644 --- a/Makefile.objs +++ b/Makefile.objs @@ -193,7 +193,7 @@ hw-obj-$(CONFIG_FDC) += fdc.o hw-obj-$(CONFIG_ACPI) += acpi.o acpi_piix4.o hw-obj-$(CONFIG_APM) += pm_smbus.o apm.o hw-obj-$(CONFIG_DMA) += dma.o -hw-obj-$(CONFIG_SMARTCARD) += usb-ccid.o +hw-obj-$(CONFIG_SMARTCARD) += usb-ccid.o ccid-card-passthru.o # PPC devices hw-obj-$(CONFIG_OPENPIC) += openpic.o diff --git a/hw/ccid-card-passthru.c b/hw/ccid-card-passthru.c new file mode 100644 index 000..e90ba0e --- /dev/null +++ b/hw/ccid-card-passthru.c @@ -0,0 +1,277 @@ +/* + * CCID Card Device emulation + * + * Copyright (c) 2010 Red Hat. + * Written by Alon Levy. + * + * This code is licenced under the LGPL. + */ + +#include qemu-char.h +#include monitor.h +#include hw/ccid.h +#include libcacard/vscard_common.h + +#define DPRINTF(card, lvl, fmt, ...) \ +do { if (lvl = card-debug) { printf(ccid-card: fmt , ## __VA_ARGS__); } } while (0) + +/* Passthru card */ + + +// TODO: do we still need this? +uint8_t DEFAULT_ATR[] = { +/* From some example somewhere + 0x3B, 0xB0, 0x18, 0x00, 0xD1, 0x81, 0x05, 0xB1, 0x40, 0x38, 0x1F, 0x03, 0x28 + */ + +/* From an Athena smart card */ + 0x3B, 0xD5, 0x18, 0xFF, 0x80, 0x91, 0xFE, 0x1F, 0xC3, 0x80, 0x73, 0xC8, 0x21, 0x13, 0x08 + +}; /* maximum size of ATR - from 7816-3 */ + + +#define PASSTHRU_DEV_NAME ccid-card-passthru +#define VSCARD_IN_SIZE 65536 +#define MAX_ATR_SIZE40 + +typedef struct PassthruState PassthruState; + +struct PassthruState { +CCIDCardState base; +CharDriverState *cs; +uint8_t vscard_in_data[VSCARD_IN_SIZE]; +uint32_t vscard_in_pos; +uint32_t vscard_in_hdr; +uint8_t atr[MAX_ATR_SIZE]; +uint8_t atr_length; +uint8_t debug; +}; + +/* VSCard protocol over chardev + * This code should not depend on the card type. + * */ + +static void ccid_card_vscard_send_msg( +PassthruState *s, VSCMsgType type, reader_id_t reader_id, +const uint8_t* payload, uint32_t length) +{ +VSCMsgHeader scr_msg_header; + +scr_msg_header.type = type; +scr_msg_header.reader_id = reader_id; +scr_msg_header.length = length; +qemu_chr_write(s-cs, (uint8_t*)scr_msg_header, sizeof(VSCMsgHeader)); +qemu_chr_write(s-cs, payload, length); +} + +static void ccid_card_vscard_send_apdu( +PassthruState *s, const uint8_t* apdu, uint32_t length) +{ +ccid_card_vscard_send_msg(s, VSC_APDU, VSCARD_MINIMAL_READER_ID, apdu, length); +} + +static void ccid_card_vscard_send_error( +PassthruState *s, reader_id_t reader_id, VSCErrorCode code) +{ +VSCMsgError msg = {.code=code}; + +ccid_card_vscard_send_msg(s, VSC_Error, reader_id, (uint8_t*)msg, sizeof(msg)); +} + +static void ccid_card_vscard_send_init(PassthruState *s) +{ +VSCMsgInit msg = {.version=VSCARD_VERSION}; + +ccid_card_vscard_send_msg(s, VSC_Init, VSCARD_UNDEFINED_READER_ID, + (uint8_t*)msg, sizeof(msg)); +} + +static int ccid_card_vscard_can_read(void *opaque) +{ +return 65535; +} + +static void ccid_card_vscard_handle_message(PassthruState *card, +VSCMsgHeader* scr_msg_header) +{ +uint8_t *data = (uint8_t*)scr_msg_header[1]; + +switch (scr_msg_header-type) { +case VSC_ATR: +DPRINTF(card, 1, VSC_ATR %d\n, scr_msg_header-length); +assert(scr_msg_header-length = MAX_ATR_SIZE); +memcpy(card-atr, data, scr_msg_header-length); +card-atr_length = scr_msg_header-length; +ccid_card_card_inserted(card-base); +break; +case VSC_APDU: +ccid_card_send_apdu_to_guest(card-base, data, scr_msg_header-length); +break; +case VSC_CardRemove: +DPRINTF(card, 1, VSC_CardRemove\n); +ccid_card_card_removed(card-base); +break; +case VSC_Init: +break; +case VSC_Error: +ccid_card_card_error(card-base, *(uint64_t*)data); +break; +case VSC_ReaderAdd: +if (ccid_card_ccid_attach(card-base) 0) { +ccid_card_vscard_send_error(card, VSCARD_UNDEFINED_READER_ID, + VSC_CANNOT_ADD_MORE_READERS); +} else { +ccid_card_vscard_send_msg(card, VSC_ReaderAddResponse, +
[Qemu-devel] [PATCH 0/5] usb-ccid (v10)
This patchset adds three new devices, usb-ccid, ccid-card-passthru and ccid-card-emulated, providing a CCID bus, a simple passthru protocol implementing card requiring a client, and a standalone emulated card. It also introduces a new directory libcaccard with CAC card emulation, CAC is a type of ISO 7816 smart card. v8-v10 changes: * usb-ccid: * add slot for future use (Gerd) * ifdef ENABLE_MIGRATION for migration support on account of usb migration not being ready in general. (Gerd) * verbosified commit messages. (Gerd) * put libcacard docs in libcacard commit. (Gerd) v8-v9 changes: * Blue Swirl comments: * white space fixes * enabled by default, disabled only if missing nss * forgotten fix from v8 (don't build libcacard.so) * added a note about device being little endian * library renamed from libcaccard to libcacard * squashed both of libcacard patches, they touched different files anyway. v7-v8 changes: * Blue Swirl comments: * usb-ccid: deannonymize some structs * usb-ccid: coding style change - answer_t and bulk_in_t fixed * usb-ccid: handle endianess conversion between guest and host * usb-ccid: s/ccid_bulk_in_copy_out/ccid_bulk_in_copy_to_guest/ * ccid-card-emulated: fix segfault if backend not specified * ccid-card-emulated: let last reader inserted win * libcaccard: remove double vscard_common.h v6-v7 changes: * external libcaccard became internal directory libcaccard * statically link object files into qemu * produce libcaccard.so for usage by external projects * applied coding style to new code (please check me) - did not use the qemu options parsing for libcaccard, since it seems to draw large amounts of qemu code (monitor for instance). v5-v6 changes: * really remove static debug (I apologize for claiming to have done so before) v4-v5 changes: * rebased to latest * remove static debug in card devices * fix --enable-smartcard to link * stall instead of assert when exceeding BULK_OUT_DATA_SIZE * make ccid_reserve_recv_buf for too large len discard message, not exit * make ccid_reserve_recv_buf return void* * fix typo * remove commented code in VMState v3-v4: * remove ccid field in CCIDBus * remove static debug in bus * add back docs v2-v3: * split into bus (usb-ccid.c, uses ccid.h) and card (ccid-card-passthru.c). * removed documentation (being revised). v1-v2: * all QSIMPLEQ turned into fixed sized rings * all allocated buffers turned into fixed size buffers * added migration support * added a message to tell client qemu has migrated to ip:port * for lack of monitor commands ip:port are 0:0, which causes the updated vscclient to connect to one port higher on the same host. will add monitor commands in a separate patch. tested with current setup. *** BLURB HERE *** Alon Levy (4): usb-ccid: add CCID bus ccid: add passthru card device ccid: add ccid-card-emulated device (v2) ccid: add docs Robert Relyea (1): libcacard: initial commit after coding style fixes Makefile|6 +- Makefile.objs |6 + Makefile.target |2 + configure | 25 + docs/ccid.txt | 125 hw/ccid-card-emulated.c | 501 hw/ccid-card-passthru.c | 277 + hw/ccid.h | 35 ++ hw/usb-ccid.c | 1362 +++ libcacard/Makefile | 13 + libcacard/cac.c | 411 + libcacard/cac.h | 20 + libcacard/card_7816.c | 780 + libcacard/card_7816.h | 60 ++ libcacard/card_7816t.h | 163 + libcacard/config.h | 81 +++ libcacard/event.c | 112 libcacard/eventt.h | 28 + libcacard/link_test.c | 20 + libcacard/mutex.h | 59 ++ libcacard/passthru.c| 612 +++ libcacard/passthru.h| 50 ++ libcacard/vcard.c | 350 +++ libcacard/vcard.h | 85 +++ libcacard/vcard_emul.h | 59 ++ libcacard/vcard_emul_nss.c | 1147 libcacard/vcard_emul_type.c | 60 ++ libcacard/vcard_emul_type.h | 29 + libcacard/vcardt.h | 66 +++ libcacard/vevent.h | 26 + libcacard/vreader.c | 515 libcacard/vreader.h | 53 ++ libcacard/vreadert.h| 23 + libcacard/vscard_common.h | 130 libcacard/vscclient.c | 710 ++ 35 files changed, 7999 insertions(+), 2 deletions(-) create mode 100644 docs/ccid.txt create mode 100644 hw/ccid-card-emulated.c create mode 100644 hw/ccid-card-passthru.c create mode 100644 hw/ccid.h create mode 100644 hw/usb-ccid.c create mode 100644 libcacard/Makefile create mode 100644 libcacard/cac.c create mode 100644 libcacard/cac.h create mode 100644 libcacard/card_7816.c create mode 100644
[Qemu-devel] [PATCH 2/3] Introduce do_snapshot_blkdev() and monitor command to handle it.
From: Jes Sorensen jes.soren...@redhat.com The monitor command is: snapshot_blkdev device [snapshot-file] [format] Default format is qcow2. For now snapshots without a snapshot-file, eg internal snapshots, are not supported. Signed-off-by: Jes Sorensen jes.soren...@redhat.com --- blockdev.c | 61 +++ blockdev.h |1 + hmp-commands.hx | 19 + 3 files changed, 81 insertions(+), 0 deletions(-) diff --git a/blockdev.c b/blockdev.c index 3b3b82d..9d6f72c 100644 --- a/blockdev.c +++ b/blockdev.c @@ -516,6 +516,67 @@ void do_commit(Monitor *mon, const QDict *qdict) } } +int do_snapshot_blkdev(Monitor *mon, const QDict *qdict, QObject **ret_data) +{ +const char *device = qdict_get_str(qdict, device); +const char *filename = qdict_get_try_str(qdict, snapshot_file); +const char *format = qdict_get_try_str(qdict, format); +const char format_qcow2[] = qcow2; +BlockDriverState *bs; +BlockDriver *drv, *proto_drv; +int ret = 0; +int flags; + +bs = bdrv_find(device); +if (!bs) { +qerror_report(QERR_DEVICE_NOT_FOUND, device); +ret = -1; +goto out; +} + +if (!format) { +format = format_qcow2; +} + +drv = bdrv_find_format(format); +if (!drv) { +qerror_report(QERR_INVALID_BLOCK_FORMAT, format); +ret = -1; +goto out; +} + +proto_drv = bdrv_find_protocol(filename); +if (!proto_drv) { +qerror_report(QERR_INVALID_BLOCK_FORMAT, format); +ret = -1; +goto out; +} + +ret = bdrv_img_create(filename, format, bs-filename, + bs-drv-format_name, NULL, -1, bs-open_flags); +if (ret) { +goto out; +} + +qemu_aio_flush(); +bdrv_flush(bs); + +flags = bs-open_flags; +bdrv_close(bs); +ret = bdrv_open(bs, filename, flags, drv); +/* + * If reopening the image file we just created fails, we really + * are in trouble :( + */ +assert(ret == 0); +out: +if (ret) { +ret = -1; +} + +return ret; +} + static int eject_device(Monitor *mon, BlockDriverState *bs, int force) { if (!force) { diff --git a/blockdev.h b/blockdev.h index 4cb8ca9..4536b5c 100644 --- a/blockdev.h +++ b/blockdev.h @@ -52,5 +52,6 @@ int do_block_set_passwd(Monitor *mon, const QDict *qdict, QObject **ret_data); int do_change_block(Monitor *mon, const char *device, const char *filename, const char *fmt); int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data); +int do_snapshot_blkdev(Monitor *mon, const QDict *qdict, QObject **ret_data); #endif diff --git a/hmp-commands.hx b/hmp-commands.hx index 23024ba..dd3db36 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -801,6 +801,25 @@ STEXI Set maximum tolerated downtime (in seconds) for migration. ETEXI +{ +.name = snapshot_blkdev, +.args_type = device:s,snapshot_file:s?,format:s?, +.params = device [new-image-file] [format], +.help = initiates a live snapshot\n\t\t\t + of device. If a new image file is specified, the\n\t\t\t + new image file will become the new root image.\n\t\t\t + If format is specified, the snapshot file will\n\t\t\t + be created in that format. Otherwise the\n\t\t\t + snapshot will be internal! (currently unsupported), +.mhandler.cmd_new = do_snapshot_blkdev, +}, + +STEXI +...@item snapshot_blkdev +...@findex snapshot_blkdev +Snapshot device, using snapshot file as target if provided +ETEXI + #if defined(TARGET_I386) { .name = drive_add, -- 1.7.3.3
[Qemu-devel] Re: [PATCH v3] qemu, qmp: convert do_inject_nmi() to QObject, QError
On Thu, 16 Dec 2010 12:51:14 +0200 Avi Kivity a...@redhat.com wrote: On 12/16/2010 12:48 PM, Luiz Capitulino wrote: Ok, I didn't know that, but I had another idea: the command could accept either a single cpu index or a list: { execute: inject-nmi, arguments: { cpus: 2 } } { execute: inject-nmi, arguments: { cpus: [1, 2, 3, 4] } } This has the feature of injecting the nmi in just some cpus, although I'm not sure this is going to be desired/useful. If we agree on this we'll have to wait because the monitor doesn't currently support hybrid arguments. I hope it never does. They're hard to support in old-school statically typed languages. We could accept only a list then.
Re: [Qemu-devel] [Qestion] What status of memory stats feature
On Wed, 15 Dec 2010 16:58:03 -0600 Adam Litke a...@us.ibm.com wrote: On Wed, 2010-12-15 at 15:39 -0200, Luiz Capitulino wrote: On Wed, 15 Dec 2010 16:20:05 +0900 Ken'ichi Ohmichi oomi...@mxs.nes.nec.co.jp wrote: Hi, I tried to get the memory stats by using virDomainMemoryStats() of libvirt, but it could not do it because of the following patch: [PATCH 03/23] disable guest-provided stats on info balloon command 2010/10/01 from Luiz Capitulino http://www.mail-archive.com/qemu-devel@nongnu.org/msg43024.html According to the related thread, the above patch avoids hanging user monitor, and people hope the memory stats feature will be able in the future. So I'd like to know the status of this feature. Is there the patch for enabling the feature ? If the patch exists, I'd like to try it. It doesn't, afaik. It depends on what you are looking to do. If you just want to play around and aren't concerned about lockups, You could undo the above patch to re-enable the interface (although I cannot guarantee that even this will work). What is the essential problem ? There are two essential problems here. The first one is that QMP lacks true asynchronous command support (it does have some code for it, but it's incomplete). I was never completely clear on the problems with the QMP async support. The second problem is that we must not make an existing synchronous command asynchronous, because this breaks the ABI. The memory stats interface has always advertised itself as an async command so this one shouldn't matter. Whether those semantics actually ever worked is another issue. The query-balloon command used to be synchronous (or quite fast, or would never lockup). We changed that. Does some infinite loop happen ? No, the guest doesn't respond. Unfortunately, I cannot understand the cause of hanging user monitor. The balloon command needs guest cooperation (ie. it talks to the guest), if the guest doesn't respond the command will never return. Isn't that only a problem for the user monitor? For QMP you might just never get a response to the stats request. IIRC, the QMP monitor is never 'suspended' in the same way that the user monitor is so I wouldn't expect it to be susceptible to the same lockup issues. We need to have a way to cancel the request. This is not the cause of this problem, though. We don't have a precise ETA for async support, but if it depends only on the new error framework work, then we could have it for 0.15. If it depends on the full monitor redesign work, then I guess we'll two releases. I can't provide a definitive answer to this unfortunately since I haven't looked in awhile (nor am I privy to the details of the full monitor redesign proposal. Basically, we're splitting HMP and QMP. The end result will be that HMP will use QMP as a C library. Not hard to do, but there's lots of code churn involved. Now, regarding async support, I already have a proposal, but I'll send it only next year because I'll be in vacations for two weeks.
[Qemu-devel] [PATCH 4/5] ccid: add ccid-card-emulated device (v2)
This devices uses libcacard (internal) to emulate a smartcard conforming to the CAC standard. It attaches to the usb-ccid bus. Usage instructions (example command lines) are in the following patch in docs/ccid.txt. It uses libcacard which uses nss, so it can work with both hw cards and certificates (files). changes from v1: remove stale comments, use only c-style comments bugfix, forgot to set recv_len change reader name to 'Virtual Reader' Signed-off-by: Alon Levy al...@redhat.com --- Makefile.objs |2 +- hw/ccid-card-emulated.c | 501 +++ 2 files changed, 502 insertions(+), 1 deletions(-) create mode 100644 hw/ccid-card-emulated.c diff --git a/Makefile.objs b/Makefile.objs index 3f44a91..54b22ca 100644 --- a/Makefile.objs +++ b/Makefile.objs @@ -193,7 +193,7 @@ hw-obj-$(CONFIG_FDC) += fdc.o hw-obj-$(CONFIG_ACPI) += acpi.o acpi_piix4.o hw-obj-$(CONFIG_APM) += pm_smbus.o apm.o hw-obj-$(CONFIG_DMA) += dma.o -hw-obj-$(CONFIG_SMARTCARD) += usb-ccid.o ccid-card-passthru.o +hw-obj-$(CONFIG_SMARTCARD) += usb-ccid.o ccid-card-passthru.o ccid-card-emulated.o # PPC devices hw-obj-$(CONFIG_OPENPIC) += openpic.o diff --git a/hw/ccid-card-emulated.c b/hw/ccid-card-emulated.c new file mode 100644 index 000..7ecdf75 --- /dev/null +++ b/hw/ccid-card-emulated.c @@ -0,0 +1,501 @@ +/* + * CCID Card Device. Emulated card. + * + * It can be used to provide access to the local hardware in a non exclusive + * way, or it can use certificates. It requires the usb-ccid bus. + * + * Usage 1: standard, mirror hardware reader+card: + * qemu .. -usb -device usb-ccid -device ccid-card-emulated + * + * Usage 2: use certificates, no hardware required + * one time: create the certificates: + * for i in 1 2 3; do certutil -d /etc/pki/nssdb -x -t CT,CT,CT -S -s CN=user$i -n user$i; done + * qemu .. -usb -device usb-ccid -device ccid-card-emulated,cert1=user1,cert2=user2,cert3=user3 + * + * If you use a non default db for the certificates you can specify it using the db parameter. + * + * + * Copyright (c) 2010 Red Hat. + * Written by Alon Levy. + * + * This code is licenced under the LGPL. + */ + +#include pthread.h +#include eventt.h +#include vevent.h +#include vreader.h +#include vcard_emul.h +#include qemu-char.h +#include monitor.h +#include hw/ccid.h + +#define DPRINTF(card, lvl, fmt, ...) \ +do { if (lvl = card-debug) { printf(ccid-card-emul: %s: fmt , __func__, ## __VA_ARGS__); } } while (0) + +#define EMULATED_DEV_NAME ccid-card-emulated + +#define BACKEND_NSS_EMULATED nss-emulated /* the default */ +#define BACKEND_CERTIFICATES certificates + +typedef struct EmulatedState EmulatedState; + +enum { +EMUL_READER_INSERT = 0, +EMUL_READER_REMOVE, +EMUL_CARD_INSERT, +EMUL_CARD_REMOVE, +EMUL_GUEST_APDU, +EMUL_RESPONSE_APDU, +EMUL_ERROR, +}; + +static const char* emul_event_to_string(uint32_t emul_event) +{ +switch (emul_event) { +case EMUL_READER_INSERT: return EMUL_READER_INSERT; +case EMUL_READER_REMOVE: return EMUL_READER_REMOVE; +case EMUL_CARD_INSERT: return EMUL_CARD_INSERT; +case EMUL_CARD_REMOVE: return EMUL_CARD_REMOVE; +case EMUL_GUEST_APDU: return EMUL_GUEST_APDU; +case EMUL_RESPONSE_APDU: return EMUL_RESPONSE_APDU; +case EMUL_ERROR: return EMUL_ERROR; +default: +break; +} +return UNKNOWN; +} + +typedef struct EmulEvent { +QSIMPLEQ_ENTRY(EmulEvent) entry; +union { +struct { +uint32_t type; +} gen; +struct { +uint32_t type; +uint64_t code; +} error; +struct { +uint32_t type; +uint32_t len; +uint8_t data[]; +} data; +} p; +} EmulEvent; + +#define MAX_ATR_SIZE 40 +struct EmulatedState { +CCIDCardState base; +uint8_t debug; +char*backend; +char*cert1; +char*cert2; +char*cert3; +char*db; +uint8_t atr[MAX_ATR_SIZE]; +uint8_t atr_length; +QSIMPLEQ_HEAD(event_list, EmulEvent) event_list; +pthread_mutex_t event_list_mutex; +VReader *reader; +QSIMPLEQ_HEAD(guest_apdu_list, EmulEvent) guest_apdu_list; +pthread_mutex_t vreader_mutex; /* and guest_apdu_list mutex */ +pthread_mutex_t handle_apdu_mutex; +pthread_cond_t handle_apdu_cond; +int pipe[2]; +int quit_apdu_thread; +pthread_mutex_t apdu_thread_quit_mutex; +pthread_cond_t apdu_thread_quit_cond; +}; + +static void emulated_apdu_from_guest(CCIDCardState *base, const uint8_t *apdu, uint32_t len) +{ +EmulatedState *card = DO_UPCAST(EmulatedState, base, base); +EmulEvent *event = (EmulEvent*)malloc(sizeof(EmulEvent) + len); + +assert(event); +event-p.data.type = EMUL_GUEST_APDU; +event-p.data.len = len; +memcpy(event-p.data.data, apdu, len); +pthread_mutex_lock(card-vreader_mutex); +
[Qemu-devel] [PATCH 1/5] usb-ccid: add CCID bus
A CCID device is a smart card reader. It is a USB device, defined at [1]. This patch introduces the usb-ccid device that is a ccid bus. Next patches will introduce two card types to use it, a passthru card and an emulated card. [1] http://www.usb.org/developers/devclass_docs/DWG_Smart-Card_CCID_Rev110. Signed-off-by: Alon Levy al...@redhat.com --- Makefile.objs |1 + configure |6 + hw/ccid.h | 35 ++ hw/usb-ccid.c | 1362 + 4 files changed, 1404 insertions(+), 0 deletions(-) create mode 100644 hw/ccid.h create mode 100644 hw/usb-ccid.c diff --git a/Makefile.objs b/Makefile.objs index cebb945..1454264 100644 --- a/Makefile.objs +++ b/Makefile.objs @@ -193,6 +193,7 @@ hw-obj-$(CONFIG_FDC) += fdc.o hw-obj-$(CONFIG_ACPI) += acpi.o acpi_piix4.o hw-obj-$(CONFIG_APM) += pm_smbus.o apm.o hw-obj-$(CONFIG_DMA) += dma.o +hw-obj-$(CONFIG_SMARTCARD) += usb-ccid.o # PPC devices hw-obj-$(CONFIG_OPENPIC) += openpic.o diff --git a/configure b/configure index 2917874..584cc8e 100755 --- a/configure +++ b/configure @@ -332,6 +332,7 @@ zero_malloc= trace_backend=nop trace_file=trace spice= +smartcard=yes # OS specific if check_define __linux__ ; then @@ -2354,6 +2355,7 @@ echo vhost-net support $vhost_net echo Trace backend $trace_backend echo Trace output file $trace_file-pid echo spice support $spice +echo smartcard support $smartcard if test $sdl_too_old = yes; then echo - Your SDL version is too old - please upgrade to have SDL support @@ -2617,6 +2619,10 @@ if test $spice = yes ; then echo CONFIG_SPICE=y $config_host_mak fi +if test $smartcard = yes ; then + echo CONFIG_SMARTCARD=y $config_host_mak +fi + # XXX: suppress that if [ $bsd = yes ] ; then echo CONFIG_BSD=y $config_host_mak diff --git a/hw/ccid.h b/hw/ccid.h new file mode 100644 index 000..af59070 --- /dev/null +++ b/hw/ccid.h @@ -0,0 +1,35 @@ +#ifndef __CCID_H__ +#define __CCID_H__ + +#include qdev.h + +typedef struct CCIDCardState CCIDCardState; +typedef struct CCIDCardInfo CCIDCardInfo; + +struct CCIDCardState { +DeviceState qdev; +uint32_tslot; // For future use with multiple slot reader. +}; + +struct CCIDCardInfo { +DeviceInfo qdev; +void (*print)(Monitor *mon, CCIDCardState *card, int indent); +const uint8_t *(*get_atr)(CCIDCardState *card, uint32_t *len); +void (*apdu_from_guest)(CCIDCardState *card, const uint8_t *apdu, uint32_t len); +int (*exitfn)(CCIDCardState *card); +int (*initfn)(CCIDCardState *card); +}; + +void ccid_card_send_apdu_to_guest(CCIDCardState *card, uint8_t* apdu, uint32_t len); +void ccid_card_card_removed(CCIDCardState *card); +void ccid_card_card_inserted(CCIDCardState *card); +void ccid_card_card_error(CCIDCardState *card, uint64_t error); +void ccid_card_qdev_register(CCIDCardInfo *card); + +/* support guest visible insertion/removal of ccid devices based on actual + * devices connected/removed. Called by card implementation (passthru, local) */ +int ccid_card_ccid_attach(CCIDCardState *card); +void ccid_card_ccid_detach(CCIDCardState *card); + +#endif // __CCID_H__ + diff --git a/hw/usb-ccid.c b/hw/usb-ccid.c new file mode 100644 index 000..f431ba4 --- /dev/null +++ b/hw/usb-ccid.c @@ -0,0 +1,1362 @@ +/* + * CCID Device emulation + * + * Based on usb-serial.c: + * Copyright (c) 2006 CodeSourcery. + * Copyright (c) 2008 Samuel Thibault samuel.thiba...@ens-lyon.org + * Written by Paul Brook, reused for FTDI by Samuel Thibault, + * Reused for CCID by Alon Levy. + * Contributed to by Robert Relyea + * Copyright (c) 2010 Red Hat. + * + * This code is licenced under the LGPL. + */ + +/* References: + * + * CCID Specification Revision 1.1 April 22nd 2005 + * Universal Serial Bus, Device Class: Smart Card + * Specification for Integrated Circuit(s) Cards Interface Devices + * + * Endianess note: from the spec (1.3) + * Fields that are larger than a byte are stored in little endian + * + * KNOWN BUGS + * 1. remove/insert can sometimes result in removed state instead of inserted. + * This is a result of the following: + * symptom: dmesg shows ERMOTEIO (-121), pcscd shows -99. This happens + * when we send a too short packet, seen in uhci-usb.c, resulting from + * a urb requesting SPD and us returning a smaller packet. + * Not sure which messages trigger this. + * + * Migration note: + * + * All the VMStateDescription's are left here for future use, but + * not enabled right now since there is no support for USB migration. + * + * To enable define ENABLE_MIGRATION + */ + +#include qemu-common.h +#include qemu-error.h +#include usb.h +#include monitor.h + +#include hw/ccid.h + +//#define DEBUG_CCID + +#define DPRINTF(s, lvl, fmt, ...) \ +do { if (lvl = s-debug) { printf(usb-ccid: fmt , ## __VA_ARGS__); } } while (0) + +#define CCID_DEV_NAME usb-ccid + +/* The two options for variable sized buffers: + * make them constant size, for large enough constant, + * or
[Qemu-devel] Re: [PATCH 1/3] qemu-img.c: Re-factor img_create()
Am 16.12.2010 12:04, schrieb jes.soren...@redhat.com: From: Jes Sorensen jes.soren...@redhat.com This patch re-factors img_create() moving the code doing the actual work into block.c where it can be shared with QEMU. This is needed to be able to create images from QEMU to be used for live snapshots. Signed-off-by: Jes Sorensen jes.soren...@redhat.com --- block.c| 144 block.h|4 ++ qemu-img.c | 108 + 3 files changed, 150 insertions(+), 106 deletions(-) diff --git a/block.c b/block.c index b4aaf41..765f9f3 100644 --- a/block.c +++ b/block.c @@ -2758,3 +2758,147 @@ int64_t bdrv_get_dirty_count(BlockDriverState *bs) { return bs-dirty_count; } + +int bdrv_img_create(const char *filename, const char *fmt, +const char *base_filename, const char *base_fmt, +char *options, uint64_t img_size, int flags) +{ +QEMUOptionParameter *param = NULL, *create_options = NULL; +QEMUOptionParameter *backing_fmt; +BlockDriverState *bs = NULL; +BlockDriver *drv, *proto_drv; +int ret = 0; + +/* Find driver and parse its options */ +drv = bdrv_find_format(fmt); +if (!drv) { +error_report(Unknown file format '%s', fmt); +ret = -1; +goto out; +} + +proto_drv = bdrv_find_protocol(filename); +if (!proto_drv) { +error_report(Unknown protocol '%s', filename); +ret = -1; +goto out; +} + +create_options = append_option_parameters(create_options, + drv-create_options); +create_options = append_option_parameters(create_options, + proto_drv-create_options); + +/* Create parameter list with default values */ +param = parse_option_parameters(, create_options, param); + +set_option_parameter_int(param, BLOCK_OPT_SIZE, img_size); + +/* Parse -o options */ +if (options) { +param = parse_option_parameters(options, create_options, param); +if (param == NULL) { +error_report(Invalid options for file format '%s'., fmt); +ret = -1; +goto out; +} +} + +if (base_filename) { +if (set_option_parameter(param, BLOCK_OPT_BACKING_FILE, + base_filename)) { +error_report(Backing file not supported for file format '%s', + fmt); +ret = -1; +goto out; +} +} + +backing_fmt = get_option_parameter(param, BLOCK_OPT_BACKING_FMT); +if (backing_fmt backing_fmt-value.s) { +if (!bdrv_find_format(backing_fmt-value.s)) { +error_report(Unknown backing file format '%s', + backing_fmt-value.s); +ret = -1; +goto out; +} +} + +if (base_fmt) { +if (set_option_parameter(param, BLOCK_OPT_BACKING_FMT, base_fmt)) { +error_report(Backing file format not supported for file + format '%s', fmt); +ret = -1; +goto out; +} +} The order is wrong here. If you use -F, the backing format won't be checked. + +// The size for the image must always be specified, with one exception: +// If we are using a backing file, we can obtain the size from there +if (get_option_parameter(param, BLOCK_OPT_SIZE)-value.n == -1) { +QEMUOptionParameter *backing_file = +get_option_parameter(param, BLOCK_OPT_BACKING_FILE); + +if (backing_file backing_file-value.s) { +uint64_t size; +const char *fmt = NULL; +char buf[32]; + +if (backing_fmt backing_fmt-value.s) { +fmt = backing_fmt-value.s; +} + +bs = bdrv_new(); +if (!bs) { +error_report(Not enough memory to allocate BlockDriverState); +ret = -1; +goto out; +} bdrv_new never returns NULL (it's an indirect qemu_malloc call). +ret = bdrv_open(bs, backing_file-value.s, flags, drv); +if (ret 0) { +error_report(Could not open '%s', filename); +ret = -1; +goto out; +} +bdrv_get_geometry(bs, size); +size *= 512; + +snprintf(buf, sizeof(buf), % PRId64, size); +set_option_parameter(param, BLOCK_OPT_SIZE, buf); +} else { +error_report(Image creation needs a size parameter); +ret = -1; +goto out; +} +} + +printf(Formatting '%s', fmt=%s , filename, fmt); +print_option_parameters(param); +puts(); + +ret =
Re: [Qemu-devel] [PATCH 3/3] Prevent creating an image with the same filename as backing file
On Thu, Dec 16, 2010 at 11:04 AM, jes.soren...@redhat.com wrote: From: Jes Sorensen jes.soren...@redhat.com Signed-off-by: Jes Sorensen jes.soren...@redhat.com --- block.c | 7 +++ 1 files changed, 7 insertions(+), 0 deletions(-) diff --git a/block.c b/block.c index 765f9f3..027dc6a 100644 --- a/block.c +++ b/block.c @@ -2769,6 +2769,13 @@ int bdrv_img_create(const char *filename, const char *fmt, BlockDriver *drv, *proto_drv; int ret = 0; + if (!strcmp(filename, base_filename)) { + error_report(Error: Trying to create a snapshot with the same + filename as the backing file); Can we avoid using the word snapshot here? Just image would make sense too. Users could hit this if they incorrectly create an image manually. They might not be thinking in terms of snapshot and that word is already overloaded in QEMU. Stefan
[Qemu-devel] [PATCH 5/5] ccid: add docs
Add documentation for the usb-ccid device and accompanying two card devices, ccid-card-emulated and ccid-card-passthru. --- docs/ccid.txt | 125 ++ docs/libcacard.txt | 483 2 files changed, 125 insertions(+), 483 deletions(-) create mode 100644 docs/ccid.txt delete mode 100644 docs/libcacard.txt diff --git a/docs/ccid.txt b/docs/ccid.txt new file mode 100644 index 000..791c28c --- /dev/null +++ b/docs/ccid.txt @@ -0,0 +1,125 @@ +Qemu CCID Device Documentation. + +Contents +1. USB CCID device +2. Building +3. Using ccid-card-emulated with hardware +4. Using ccid-card-emulated with certificates +5. Using ccid-card-passthru with client side hardware +6. Using ccid-card-passthru with client side certificates +7. Passthrough protocol scenario +8. libcaccard + +1. USB CCID device + +The USB CCID device is a USB device implementing the CCID specification, which +lets one connect smart card readers that implement the same spec. For more +information see the specification: + + Universal Serial Bus + Device Class: Smart Card + CCID + Specification for + Integrated Circuit(s) Cards Interface Devices + Revision 1.1 + April 22rd, 2005 + +Smartcard are used for authentication, single sign on, decryption in +public/private schemes and digital signatures. A smartcard reader on the client +cannot be used on a guest with simple usb passthrough since it will then not be +available on the client, possibly locking the computer when it is removed. On +the other hand this device can let you use the smartcard on both the client and +the guest machine. It is also possible to have a completely virtual smart card +reader and smart card (i.e. not backed by a physical device) using this device. + +2. Building + +The cryptographic functions and access to the physical card is done via NSS. + +Installing NSS: + +In redhat/fedora: +yum install nss-devel +In ubuntu/debian: +apt-get install libnss3-dev +(not tested on ubuntu) + +Configuring and building: +./configure --enable-smartcard make + +3. Using ccid-card-emulated with hardware + +Assuming you have a working smartcard on the host with the current +user, using NSS, qemu acts as another NSS client using ccid-card-emulated: + +qemu -usb -device usb-ccid -device ccid-card-emualated + +4. Using ccid-card-emulated with certificates + +You must create the certificates. This is a one time process. We use NSS +certificates: + +certutil -d /etc/pki/nssdb -x -t CT,CT,CT -S -s CN=cert1 -n cert1 + +Note: you must have exactly three certificates. + +Assuming the current user can access the certificates (use certutil -L to +verify), you can use the emulated card type with the certificates backend: + +qemu -usb -device usb-ccid -device ccid-card-emulated,backend=certificates,cert1=cert1,cert2=cert2,cert3=cert3 + +5. Using ccid-card-passthru with client side hardware + +on the host specify the ccid-card-passthru device with a suitable chardev: + +qemu -chardev socket,server,host=0.0.0.0,port=2001,id=ccid,nowait -usb -device usb-ccid -device ccid-card-passthru,chardev=ccid + +on the client run vscclient, built when you built the libcaccard library: +libcaccard/vscclient qemu-host 2001 + +6. Using ccid-card-passthru with client side certificates + +Run qemu as per #5, and run vscclient as follows: +(Note: vscclient command line interface is in a state of change) + +libcaccard/vscclient -e db=\/etc/pki/nssdb\ use_hw=no soft=(,Test,CAC,,cert1,cert2,cert3) qemu-host 2001 + +7. Passthrough protocol scenario + +This is a typical interchange of messages when using the passthru card device. +usb-ccid is a usb device. It defaults to an unattached usb device on startup. +usb-ccid expects a chardev and expects the protocol defined in +cac_card/vscard_common.h to be passed over that. + +A typical interchange is: + +client event | vscclient |passthru| usb-ccid | guest event +-- + | VSC_Init|| | + | VSC_ReaderAdd || attach| + | || | sees new usb device. +card inserted | || | + | VSC_ATR || | + | || | guest operation, APDU transfer via CCID + | | VSC_APDU | | + | VSC_APDU|| | +client-physical | || | +card APDU exchange| || | +
[Qemu-devel] [PATCH 2/2] ARM: Implement correct NaN propagation rules
Implement the correct NaN propagation rules for ARM targets by providing an appropriate pickNaN function. Signed-off-by: Peter Maydell peter.mayd...@linaro.org --- fpu/softfloat-specialize.h | 23 +++ 1 files changed, 23 insertions(+), 0 deletions(-) diff --git a/fpu/softfloat-specialize.h b/fpu/softfloat-specialize.h index 3015480..fd1b114 100644 --- a/fpu/softfloat-specialize.h +++ b/fpu/softfloat-specialize.h @@ -149,6 +149,28 @@ static float32 commonNaNToFloat32( commonNaNT a ) | tie-break rule. **/ +#if defined(TARGET_ARM) +static int pickNaN(flag aIsQNaN, flag aIsSNaN, flag bIsQNaN, flag bIsSNaN, +flag aIsLargerSignificand) +{ +/* ARM mandated NaN propagation rules: take the first of: + * 1. A if it is signaling + * 2. B if it is signaling + * 3. A (quiet) + * 4. B (quiet) + * A signaling NaN is always quietened before returning it. + */ +if (aIsSNaN) { +return 0; +} else if (bIsSNaN) { +return 1; +} else if (aIsQNaN) { +return 0; +} else { +return 1; +} +} +#else static int pickNaN(flag aIsQNaN, flag aIsSNaN, flag bIsQNaN, flag bIsSNaN, flag aIsLargerSignificand) { @@ -178,6 +200,7 @@ static int pickNaN(flag aIsQNaN, flag aIsSNaN, flag bIsQNaN, flag bIsSNaN, return 1; } } +#endif /* | Takes two single-precision floating-point values `a' and `b', one of which -- 1.6.3.3
[Qemu-devel] Re: [PATCH 2/3] Introduce do_snapshot_blkdev() and monitor command to handle it.
Am 16.12.2010 12:04, schrieb jes.soren...@redhat.com: From: Jes Sorensen jes.soren...@redhat.com The monitor command is: snapshot_blkdev device [snapshot-file] [format] Default format is qcow2. For now snapshots without a snapshot-file, eg internal snapshots, are not supported. Signed-off-by: Jes Sorensen jes.soren...@redhat.com --- blockdev.c | 61 +++ blockdev.h |1 + hmp-commands.hx | 19 + 3 files changed, 81 insertions(+), 0 deletions(-) diff --git a/blockdev.c b/blockdev.c index 3b3b82d..9d6f72c 100644 --- a/blockdev.c +++ b/blockdev.c @@ -516,6 +516,67 @@ void do_commit(Monitor *mon, const QDict *qdict) } } +int do_snapshot_blkdev(Monitor *mon, const QDict *qdict, QObject **ret_data) +{ +const char *device = qdict_get_str(qdict, device); +const char *filename = qdict_get_try_str(qdict, snapshot_file); +const char *format = qdict_get_try_str(qdict, format); +const char format_qcow2[] = qcow2; +BlockDriverState *bs; +BlockDriver *drv, *proto_drv; +int ret = 0; +int flags; + +bs = bdrv_find(device); +if (!bs) { +qerror_report(QERR_DEVICE_NOT_FOUND, device); +ret = -1; +goto out; +} + +if (!format) { +format = format_qcow2; Why introducing format_qcow2 instead of directly using the string literal here? +} + +drv = bdrv_find_format(format); +if (!drv) { +qerror_report(QERR_INVALID_BLOCK_FORMAT, format); +ret = -1; +goto out; +} + +proto_drv = bdrv_find_protocol(filename); +if (!proto_drv) { +qerror_report(QERR_INVALID_BLOCK_FORMAT, format); +ret = -1; +goto out; +} + +ret = bdrv_img_create(filename, format, bs-filename, + bs-drv-format_name, NULL, -1, bs-open_flags); +if (ret) { +goto out; +} + +qemu_aio_flush(); +bdrv_flush(bs); + +flags = bs-open_flags; +bdrv_close(bs); +ret = bdrv_open(bs, filename, flags, drv); +/* + * If reopening the image file we just created fails, we really + * are in trouble :( + */ +assert(ret == 0); bdrv_commit handles this case by setting bs-drv = NULL. After this the device will fail all requests with -ENOMEDIUM, but at least you don't lose your VM immediately. Kevin
[Qemu-devel] [PATCH 0/2] ARM: Implement correct NaN propagation rules
IEEE754 doesn't specify precisely what NaN should be returned as the result of an operation on two input NaNs. The existing softfloat code hardwires x87 propagation rules. This patchset abstracts out the code in the various propagateFloat*NaN() functions which picks a NaN to return, so that it can be easily replaced on a per-target basis. It also implements the correct version of this routine for ARM. I've tested this by the usual random instruction set method. The first patch makes no change to NaN handling (ie it is purely a refactoring of the code); the second brings ARM into line with the hardware implementation. Maintainers of other targets should consider implementing a suitable version of the pickNaN() function. I know the SPARC architecture manual documents NaN rules, and they're not the x87 ones, for instance. Peter Maydell (2): softfloat: abstract out target-specific NaN propagation rules ARM: Implement correct NaN propagation rules fpu/softfloat-specialize.h | 183 +-- 1 files changed, 123 insertions(+), 60 deletions(-)
[Qemu-devel] Re: [PATCH 2/3] Introduce do_snapshot_blkdev() and monitor command to handle it.
On 12/16/10 12:45, Kevin Wolf wrote: Am 16.12.2010 12:04, schrieb jes.soren...@redhat.com: From: Jes Sorensen jes.soren...@redhat.com The monitor command is: snapshot_blkdev device [snapshot-file] [format] Default format is qcow2. For now snapshots without a snapshot-file, eg internal snapshots, are not supported. Signed-off-by: Jes Sorensen jes.soren...@redhat.com --- blockdev.c | 61 +++ blockdev.h |1 + hmp-commands.hx | 19 + 3 files changed, 81 insertions(+), 0 deletions(-) diff --git a/blockdev.c b/blockdev.c index 3b3b82d..9d6f72c 100644 --- a/blockdev.c +++ b/blockdev.c @@ -516,6 +516,67 @@ void do_commit(Monitor *mon, const QDict *qdict) } } +int do_snapshot_blkdev(Monitor *mon, const QDict *qdict, QObject **ret_data) +{ +const char *device = qdict_get_str(qdict, device); +const char *filename = qdict_get_try_str(qdict, snapshot_file); +const char *format = qdict_get_try_str(qdict, format); +const char format_qcow2[] = qcow2; +BlockDriverState *bs; +BlockDriver *drv, *proto_drv; +int ret = 0; +int flags; + +bs = bdrv_find(device); +if (!bs) { +qerror_report(QERR_DEVICE_NOT_FOUND, device); +ret = -1; +goto out; +} + +if (!format) { +format = format_qcow2; Why introducing format_qcow2 instead of directly using the string literal here? It should generate the same code - I kinda liked my style better, but I'll change it. +flags = bs-open_flags; +bdrv_close(bs); +ret = bdrv_open(bs, filename, flags, drv); +/* + * If reopening the image file we just created fails, we really + * are in trouble :( + */ +assert(ret == 0); bdrv_commit handles this case by setting bs-drv = NULL. After this the device will fail all requests with -ENOMEDIUM, but at least you don't lose your VM immediately. Well if we hit this situation something catastrophic happened. I don't see how we can continue beyond this point really. The old image was dropped in the swap and the new one is not accessible ... we're dead :( Cheers, Jes
[Qemu-devel] Re: [PATCH 3/3] Prevent creating an image with the same filename as backing file
Am 16.12.2010 12:04, schrieb jes.soren...@redhat.com: From: Jes Sorensen jes.soren...@redhat.com Signed-off-by: Jes Sorensen jes.soren...@redhat.com --- block.c |7 +++ 1 files changed, 7 insertions(+), 0 deletions(-) diff --git a/block.c b/block.c index 765f9f3..027dc6a 100644 --- a/block.c +++ b/block.c @@ -2769,6 +2769,13 @@ int bdrv_img_create(const char *filename, const char *fmt, BlockDriver *drv, *proto_drv; int ret = 0; +if (!strcmp(filename, base_filename)) { +error_report(Error: Trying to create a snapshot with the same + filename as the backing file); +ret = -1; +goto out; +} + /* Find driver and parse its options */ drv = bdrv_find_format(fmt); if (!drv) { This doesn't catch things like qemu-img create -f qcow2 -obacking_file=foo.qcow2 foo.qcow2 though it will work if you use the legacy -b option. I think we should keep it consistent. Kevin
Re: [Qemu-devel] Re: [PATCH v3] qemu, qmp: convert do_inject_nmi() to QObject, QError
Luiz Capitulino lcapitul...@redhat.com writes: On Thu, 16 Dec 2010 11:03:38 +0200 Avi Kivity a...@redhat.com wrote: On 12/15/2010 08:00 PM, Luiz Capitulino wrote: Looks like a GUI feature to me, Really? Can't see how you can build NMI to all CPUs from NMI this CPU. Or am I misunderstanding you? I guess so. Avi referred to 'nmi button on many machines', I assumed he meant a virtual machine GUI, am I wrong? I meant a real machine's GUI (it's a physical button you can press with your finger, if you have thin fingers). Ok, I didn't know that, but I had another idea: the command could accept either a single cpu index or a list: { execute: inject-nmi, arguments: { cpus: 2 } } { execute: inject-nmi, arguments: { cpus: [1, 2, 3, 4] } } This has the feature of injecting the nmi in just some cpus, although I'm not sure this is going to be desired/useful. Use case for NMI-ing a subset of the CPUs? Heck, use case for anything else but NMI all? If we agree on this we'll have to wait because the monitor doesn't currently support hybrid arguments. Let's keep the schema simple.
[Qemu-devel] Re: [PATCH] ide: Register vm change state handler once only
Am 10.12.2010 15:47, schrieb Stefan Hajnoczi: We register the vm change state handler in a PCI BAR map() function. This function can be called multiple times throughout the lifetime of a PCI IDE device. This results in duplicate vm change state handlers being register, none of which are ever unregistered. Instead, register the vm change state handler in the device's init function once and for all. piix tested, cmd646 and via not tested. Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com This conflicts with the AHCI patches. Can you rebase on top of the block branch? Kevin
[Qemu-devel] Re: [PATCH 1/3] qemu-img.c: Re-factor img_create()
On 12/16/10 12:35, Kevin Wolf wrote: Am 16.12.2010 12:04, schrieb jes.soren...@redhat.com: + +backing_fmt = get_option_parameter(param, BLOCK_OPT_BACKING_FMT); +if (backing_fmt backing_fmt-value.s) { +if (!bdrv_find_format(backing_fmt-value.s)) { +error_report(Unknown backing file format '%s', + backing_fmt-value.s); +ret = -1; +goto out; +} +} + +if (base_fmt) { +if (set_option_parameter(param, BLOCK_OPT_BACKING_FMT, base_fmt)) { +error_report(Backing file format not supported for file + format '%s', fmt); +ret = -1; +goto out; +} +} The order is wrong here. If you use -F, the backing format won't be checked. Urgh yes, my bad! I had it the other way, but decided to change it - very silly. + +// The size for the image must always be specified, with one exception: +// If we are using a backing file, we can obtain the size from there +if (get_option_parameter(param, BLOCK_OPT_SIZE)-value.n == -1) { +QEMUOptionParameter *backing_file = +get_option_parameter(param, BLOCK_OPT_BACKING_FILE); + +if (backing_file backing_file-value.s) { +uint64_t size; +const char *fmt = NULL; +char buf[32]; + +if (backing_fmt backing_fmt-value.s) { +fmt = backing_fmt-value.s; +} + +bs = bdrv_new(); +if (!bs) { +error_report(Not enough memory to allocate BlockDriverState); +ret = -1; +goto out; +} bdrv_new never returns NULL (it's an indirect qemu_malloc call). I see - this was copied wholesale from qemu-img.c but I'll just remove the error check. +ret = bdrv_open(bs, backing_file-value.s, flags, drv); +if (ret 0) { +error_report(Could not open '%s', filename); +ret = -1; +goto out; +} +bdrv_get_geometry(bs, size); +size *= 512; + +snprintf(buf, sizeof(buf), % PRId64, size); +set_option_parameter(param, BLOCK_OPT_SIZE, buf); +} else { +error_report(Image creation needs a size parameter); +ret = -1; +goto out; +} +} + +printf(Formatting '%s', fmt=%s , filename, fmt); +print_option_parameters(param); +puts(); + +ret = bdrv_create(drv, filename, param); +free_option_parameters(create_options); +free_option_parameters(param); These need to be after out: to avoid leaking in error cases. You're basically reverting a87a6721d with this. Whoops - another one of those conflicting ones. It's all Stefan's fault :) + +if (ret 0) { +if (ret == -ENOTSUP) { +error_report(Formatting or formatting option not supported for + file format '%s', fmt); +} else if (ret == -EFBIG) { +error_report(The image size is too large for file format '%s', + fmt); +} else { +error_report(%s: error while creating %s: %s, filename, fmt, + strerror(-ret)); +} +} + +out: +if (bs) { +bdrv_delete(bs); +} +if (ret) { +return 1; +} Maybe we should better use the usual 0/-errno style. In qemu-img it was the exit code of the program, but now it's a block layer function. I like errno, I made it behave like this for consistency with the rest of QEMU. In most places we don't seem to like anything but -1/1 for error values. Let me know what you prefer, or alternatively we can change it in a follow-on patch? Cheers, Jes
Re: [Qemu-devel] [PATCH 1/5] usb-ccid: add CCID bus
On 12/16/10 12:06, Alon Levy wrote: A CCID device is a smart card reader. It is a USB device, defined at [1]. This patch introduces the usb-ccid device that is a ccid bus. Next patches will introduce two card types to use it, a passthru card and an emulated card. [1]http://www.usb.org/developers/devclass_docs/DWG_Smart-Card_CCID_Rev110. Acked-by: Gerd Hoffmann kra...@redhat.com cheers, Gerd
Re: [Qemu-devel] [PATCH 2/5] ccid: add passthru card device
On 12/16/10 12:06, Alon Levy wrote: The passthru ccid card is a device sitting on the usb-ccid bus and using a chardevice to communicate with a remote device using the VSCard protocol defined in libcacard/vscard_common.h Usage docs available in following patch in docs/ccid.txt Acked-by: Gerd Hoffmann kra...@redhat.com cheers, Gerd
[Qemu-devel] [PATCH 1/1] spice: add chardev
Adding a chardev backend for spice, for usage by spice vdagent in conjunction with a properly named virtio-serial device. --- Makefile.objs |2 +- qemu-char.c |7 ++ qemu-config.c |9 ++ qemu-options.hx | 18 - spice-qemu-char.c | 222 + spice-qemu-char.h |9 ++ 6 files changed, 265 insertions(+), 2 deletions(-) create mode 100644 spice-qemu-char.c create mode 100644 spice-qemu-char.h diff --git a/Makefile.objs b/Makefile.objs index cebb945..320b2a9 100644 --- a/Makefile.objs +++ b/Makefile.objs @@ -102,7 +102,7 @@ common-obj-$(CONFIG_BRLAPI) += baum.o common-obj-$(CONFIG_POSIX) += migration-exec.o migration-unix.o migration-fd.o common-obj-$(CONFIG_WIN32) += version.o -common-obj-$(CONFIG_SPICE) += ui/spice-core.o ui/spice-input.o ui/spice-display.o +common-obj-$(CONFIG_SPICE) += ui/spice-core.o ui/spice-input.o ui/spice-display.o spice-qemu-char.o audio-obj-y = audio.o noaudio.o wavaudio.o mixeng.o audio-obj-$(CONFIG_SDL) += sdlaudio.o diff --git a/qemu-char.c b/qemu-char.c index edc9ad6..82789ba 100644 --- a/qemu-char.c +++ b/qemu-char.c @@ -98,6 +98,10 @@ #include qemu_socket.h +#ifdef CONFIG_SPICE +#include spice-qemu-char.h +#endif + #define READ_BUF_LEN 4096 /***/ @@ -2495,6 +2499,9 @@ static const struct { || defined(__FreeBSD_kernel__) { .name = parport, .open = qemu_chr_open_pp }, #endif +#ifdef CONFIG_SPICE +{ .name = spicevmc, .open = qemu_chr_open_spice }, +#endif }; CharDriverState *qemu_chr_open_opts(QemuOpts *opts, diff --git a/qemu-config.c b/qemu-config.c index 965fa46..42cd977 100644 --- a/qemu-config.c +++ b/qemu-config.c @@ -147,6 +147,15 @@ static QemuOptsList qemu_chardev_opts = { .name = signal, .type = QEMU_OPT_BOOL, }, +#ifdef CONFIG_SPICE +{ +.name = name, +.type = QEMU_OPT_STRING, +},{ +.name = debug, +.type = QEMU_OPT_NUMBER, +}, +#endif { /* end of list */ } }, }; diff --git a/qemu-options.hx b/qemu-options.hx index 4d99a58..d8ad070 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -1357,6 +1357,9 @@ DEF(chardev, HAS_ARG, QEMU_OPTION_chardev, #if defined(__linux__) || defined(__FreeBSD__) || defined(__DragonFly__) -chardev parport,id=id,path=path[,mux=on|off]\n #endif +#if defined(CONFIG_SPICE) +-chardev spicevmc,id=id,debug=debug,name=name\n +#endif , QEMU_ARCH_ALL ) @@ -1381,7 +1384,10 @@ Backend is one of: @option{stdio}, @option{braille}, @option{tty}, -...@option{parport}. +...@option{parport} +#if defined(CONFIG_SPICE) +...@option{spicevmc}. +#endif The specific backend will determine the applicable options. All devices must have an id, which can be any string up to 127 characters long. @@ -1557,6 +1563,16 @@ Connect to a local parallel port. @option{path} specifies the path to the parallel port device. @option{path} is required. +#if defined(CONFIG_SPICE) +...@item -chardev spicevmc ,i...@var{id} ,deb...@var{debug}, na...@var{name} + +...@option{debug} debug level for spicevmc + +...@option{name} name of spice channel to connect to + +Connect to a spice virtual machine channel, such as vdiport. +#endif + @end table ETEXI diff --git a/spice-qemu-char.c b/spice-qemu-char.c new file mode 100644 index 000..a080796 --- /dev/null +++ b/spice-qemu-char.c @@ -0,0 +1,222 @@ +#include config-host.h +#include ui/qemu-spice.h +#include spice.h +#include spice-experimental.h + +#include osdep.h + +#include spice-qemu-char.h + +//#define SPICE_QEMU_CHAR_USE_IOCTL + +#define dprintf(_scd, _level, _fmt, ...)\ +do {\ +static unsigned __dprintf_counter = 0; \ +if (_scd-debug = _level) {\ +fprintf(stderr, scd: %3d: _fmt, ++__dprintf_counter, ## __VA_ARGS__);\ +} \ +} while (0) + +#define VMC_MAX_HOST_WRITE2048 + +typedef struct SpiceCharDriver { +CharDriverState* chr; +SpiceCharDeviceInstance sin; +char *subtype; +bool active; +uint8_t *buffer; +uint8_t *datapos; +ssize_t bufsize, datalen; +uint32_t debug; +} SpiceCharDriver; + + +static int vmc_write(SpiceCharDeviceInstance *sin, const uint8_t *buf, int len) +{ +SpiceCharDriver *scd = container_of(sin, SpiceCharDriver, sin); +ssize_t out = 0; +ssize_t last_out; +uint8_t* p = (uint8_t*)buf; + +while (len 0) { +last_out = MIN(len, VMC_MAX_HOST_WRITE); +qemu_chr_read(scd-chr, p, last_out); +if (last_out 0) { +out += last_out; +
[Qemu-devel] [PATCH 1/2] softfloat: abstract out target-specific NaN propagation rules
IEEE754 doesn't specify precisely what NaN should be returned as the result of an operation on two input NaNs. This is therefore target-specific. Abstract out the code in propagateFloat*NaN() which was implementing the x87 propagation rules, so that it can be easily replaced on a per-target basis. Signed-off-by: Peter Maydell peter.mayd...@linaro.org --- fpu/softfloat-specialize.h | 160 +++ 1 files changed, 100 insertions(+), 60 deletions(-) diff --git a/fpu/softfloat-specialize.h b/fpu/softfloat-specialize.h index 8e6aceb..3015480 100644 --- a/fpu/softfloat-specialize.h +++ b/fpu/softfloat-specialize.h @@ -134,6 +134,52 @@ static float32 commonNaNToFloat32( commonNaNT a ) } /* +| Select which NaN to propagate for a two-input operation. +| IEEE754 doesn't specify all the details of this, so the +| algorithm is target-specific. +| The routine is passed various bits of information about the +| two NaNs and should return 0 to select NaN a and 1 for NaN b. +| Note that signalling NaNs are always squashed to quiet NaNs +| by the caller, by flipping the SNaN bit before returning them. +| +| aIsLargerSignificand is only valid if both a and b are NaNs +| of some kind, and is true if a has the larger significand, +| or if both a and b have the same significand but a is +| positive but b is negative. It is only needed for the x87 +| tie-break rule. +**/ + +static int pickNaN(flag aIsQNaN, flag aIsSNaN, flag bIsQNaN, flag bIsSNaN, +flag aIsLargerSignificand) +{ +/* This implements x87 NaN propagation rules: + * SNaN + QNaN = return the QNaN + * two SNaNs = return the one with the larger significand, silenced + * two QNaNs = return the one with the larger significand + * SNaN and a non-NaN = return the SNaN, silenced + * QNaN and a non-NaN = return the QNaN + * + * If we get down to comparing significands and they are the same, + * return the NaN with the positive sign bit (if any). + */ +if (aIsSNaN) { +if (bIsSNaN) { +return aIsLargerSignificand ? 0 : 1; +} +return bIsQNaN ? 1 : 0; +} +else if (aIsQNaN) { +if (bIsSNaN || !bIsQNaN) +return 0; +else { +return aIsLargerSignificand ? 0 : 1; +} +} else { +return 1; +} +} + +/* | Takes two single-precision floating-point values `a' and `b', one of which | is a NaN, and returns the appropriate NaN result. If either `a' or `b' is a | signaling NaN, the invalid exception is raised. @@ -141,7 +187,7 @@ static float32 commonNaNToFloat32( commonNaNT a ) static float32 propagateFloat32NaN( float32 a, float32 b STATUS_PARAM) { -flag aIsNaN, aIsSignalingNaN, bIsNaN, bIsSignalingNaN; +flag aIsNaN, aIsSignalingNaN, bIsNaN, bIsSignalingNaN, aIsLargerSignificand; bits32 av, bv, res; if ( STATUS(default_nan_mode) ) @@ -161,26 +207,22 @@ static float32 propagateFloat32NaN( float32 a, float32 b STATUS_PARAM) bv |= 0x0040; #endif if ( aIsSignalingNaN | bIsSignalingNaN ) float_raise( float_flag_invalid STATUS_VAR); -if ( aIsSignalingNaN ) { -if ( bIsSignalingNaN ) goto returnLargerSignificand; -res = bIsNaN ? bv : av; -} -else if ( aIsNaN ) { -if ( bIsSignalingNaN || ! bIsNaN ) -res = av; -else { - returnLargerSignificand: -if ( (bits32) ( av1 ) (bits32) ( bv1 ) ) -res = bv; -else if ( (bits32) ( bv1 ) (bits32) ( av1 ) ) -res = av; -else -res = ( av bv ) ? av : bv; -} + +if ((bits32)(av1) (bits32)(bv1)) { +aIsLargerSignificand = 0; +} else if ((bits32)(bv1) (bits32)(av1)) { +aIsLargerSignificand = 1; +} else { +aIsLargerSignificand = (av bv) ? 1 : 0; } -else { + +if (pickNaN(aIsNaN, aIsSignalingNaN, bIsNaN, bIsSignalingNaN, +aIsLargerSignificand)) { res = bv; +} else { +res = av; } + return make_float32(res); } @@ -276,7 +318,7 @@ static float64 commonNaNToFloat64( commonNaNT a ) static float64 propagateFloat64NaN( float64 a, float64 b STATUS_PARAM) { -flag aIsNaN, aIsSignalingNaN, bIsNaN, bIsSignalingNaN; +flag aIsNaN, aIsSignalingNaN, bIsNaN, bIsSignalingNaN, aIsLargerSignificand; bits64 av, bv, res; if ( STATUS(default_nan_mode) ) @@ -296,26 +338,22 @@ static float64 propagateFloat64NaN( float64 a, float64 b STATUS_PARAM) bv |= LIT64( 0x0008 ); #endif if ( aIsSignalingNaN | bIsSignalingNaN ) float_raise( float_flag_invalid STATUS_VAR); -if ( aIsSignalingNaN ) { -if ( bIsSignalingNaN ) goto
[Qemu-devel] Re: [PATCH] rtl8139: IO memory is not part of vmstate
On 12/15/2010 08:04 PM, Michael S. Tsirkin wrote: This assuming upstream developers do not care about downstreams. To give a chance for downstream to cherry-pick changes, upstream should use subsections instead of version ids too. Then version ids should be deprecated altogether. Nothing against it, but I never heard about this plan. Paolo
Re: [Qemu-devel] Re: [PATCH v3] qemu, qmp: convert do_inject_nmi() to QObject, QError
On 12/16/2010 01:47 PM, Markus Armbruster wrote: This has the feature of injecting the nmi in just some cpus, although I'm not sure this is going to be desired/useful. Use case for NMI-ing a subset of the CPUs? Heck, use case for anything else but NMI all? Exactly. -- error compiling committee.c: too many arguments to function
[Qemu-devel] [PATCH 2/4] Introduce do_snapshot_blkdev() and monitor command to handle it.
From: Jes Sorensen jes.soren...@redhat.com The monitor command is: snapshot_blkdev device [snapshot-file] [format] Default format is qcow2. For now snapshots without a snapshot-file, eg internal snapshots, are not supported. Signed-off-by: Jes Sorensen jes.soren...@redhat.com --- blockdev.c | 62 +++ blockdev.h |1 + hmp-commands.hx | 19 3 files changed, 82 insertions(+), 0 deletions(-) diff --git a/blockdev.c b/blockdev.c index 3b3b82d..d7add36 100644 --- a/blockdev.c +++ b/blockdev.c @@ -516,6 +516,68 @@ void do_commit(Monitor *mon, const QDict *qdict) } } +int do_snapshot_blkdev(Monitor *mon, const QDict *qdict, QObject **ret_data) +{ +const char *device = qdict_get_str(qdict, device); +const char *filename = qdict_get_try_str(qdict, snapshot_file); +const char *format = qdict_get_try_str(qdict, format); +BlockDriverState *bs; +BlockDriver *drv, *proto_drv; +int ret = 0; +int flags; + +bs = bdrv_find(device); +if (!bs) { +qerror_report(QERR_DEVICE_NOT_FOUND, device); +ret = -1; +goto out; +} + +if (!format) { +format = qcow2; +} + +drv = bdrv_find_format(format); +if (!drv) { +qerror_report(QERR_INVALID_BLOCK_FORMAT, format); +ret = -1; +goto out; +} + +proto_drv = bdrv_find_protocol(filename); +if (!proto_drv) { +qerror_report(QERR_INVALID_BLOCK_FORMAT, format); +ret = -1; +goto out; +} + +ret = bdrv_img_create(filename, format, bs-filename, + bs-drv-format_name, NULL, -1, bs-open_flags); +if (ret) { +goto out; +} + +qemu_aio_flush(); +bdrv_flush(bs); + +flags = bs-open_flags; +bdrv_close(bs); +ret = bdrv_open(bs, filename, flags, drv); +/* + * If reopening the image file we just created fails, we really + * are in trouble :( + */ +if (ret != 0) { +abort(); +} +out: +if (ret) { +ret = -1; +} + +return ret; +} + static int eject_device(Monitor *mon, BlockDriverState *bs, int force) { if (!force) { diff --git a/blockdev.h b/blockdev.h index 4cb8ca9..4536b5c 100644 --- a/blockdev.h +++ b/blockdev.h @@ -52,5 +52,6 @@ int do_block_set_passwd(Monitor *mon, const QDict *qdict, QObject **ret_data); int do_change_block(Monitor *mon, const char *device, const char *filename, const char *fmt); int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data); +int do_snapshot_blkdev(Monitor *mon, const QDict *qdict, QObject **ret_data); #endif diff --git a/hmp-commands.hx b/hmp-commands.hx index 23024ba..dd3db36 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -801,6 +801,25 @@ STEXI Set maximum tolerated downtime (in seconds) for migration. ETEXI +{ +.name = snapshot_blkdev, +.args_type = device:s,snapshot_file:s?,format:s?, +.params = device [new-image-file] [format], +.help = initiates a live snapshot\n\t\t\t + of device. If a new image file is specified, the\n\t\t\t + new image file will become the new root image.\n\t\t\t + If format is specified, the snapshot file will\n\t\t\t + be created in that format. Otherwise the\n\t\t\t + snapshot will be internal! (currently unsupported), +.mhandler.cmd_new = do_snapshot_blkdev, +}, + +STEXI +...@item snapshot_blkdev +...@findex snapshot_blkdev +Snapshot device, using snapshot file as target if provided +ETEXI + #if defined(TARGET_I386) { .name = drive_add, -- 1.7.3.3
[Qemu-devel] [PATCH 1/4] qemu-img.c: Re-factor img_create()
From: Jes Sorensen jes.soren...@redhat.com This patch re-factors img_create() moving the code doing the actual work into block.c where it can be shared with QEMU. This is needed to be able to create images from QEMU to be used for live snapshots. Signed-off-by: Jes Sorensen jes.soren...@redhat.com --- block.c| 141 block.h|4 ++ qemu-img.c | 108 +- 3 files changed, 147 insertions(+), 106 deletions(-) diff --git a/block.c b/block.c index b4aaf41..a48b30c 100644 --- a/block.c +++ b/block.c @@ -2758,3 +2758,144 @@ int64_t bdrv_get_dirty_count(BlockDriverState *bs) { return bs-dirty_count; } + +int bdrv_img_create(const char *filename, const char *fmt, +const char *base_filename, const char *base_fmt, +char *options, uint64_t img_size, int flags) +{ +QEMUOptionParameter *param = NULL, *create_options = NULL; +QEMUOptionParameter *backing_fmt; +BlockDriverState *bs = NULL; +BlockDriver *drv, *proto_drv; +int ret = 0; + +/* Find driver and parse its options */ +drv = bdrv_find_format(fmt); +if (!drv) { +error_report(Unknown file format '%s', fmt); +ret = -1; +goto out; +} + +proto_drv = bdrv_find_protocol(filename); +if (!proto_drv) { +error_report(Unknown protocol '%s', filename); +ret = -1; +goto out; +} + +create_options = append_option_parameters(create_options, + drv-create_options); +create_options = append_option_parameters(create_options, + proto_drv-create_options); + +/* Create parameter list with default values */ +param = parse_option_parameters(, create_options, param); + +set_option_parameter_int(param, BLOCK_OPT_SIZE, img_size); + +/* Parse -o options */ +if (options) { +param = parse_option_parameters(options, create_options, param); +if (param == NULL) { +error_report(Invalid options for file format '%s'., fmt); +ret = -1; +goto out; +} +} + +if (base_filename) { +if (set_option_parameter(param, BLOCK_OPT_BACKING_FILE, + base_filename)) { +error_report(Backing file not supported for file format '%s', + fmt); +ret = -1; +goto out; +} +} + +if (base_fmt) { +if (set_option_parameter(param, BLOCK_OPT_BACKING_FMT, base_fmt)) { +error_report(Backing file format not supported for file + format '%s', fmt); +ret = -1; +goto out; +} +} + +backing_fmt = get_option_parameter(param, BLOCK_OPT_BACKING_FMT); +if (backing_fmt backing_fmt-value.s) { +if (!bdrv_find_format(backing_fmt-value.s)) { +error_report(Unknown backing file format '%s', + backing_fmt-value.s); +ret = -1; +goto out; +} +} + +// The size for the image must always be specified, with one exception: +// If we are using a backing file, we can obtain the size from there +if (get_option_parameter(param, BLOCK_OPT_SIZE)-value.n == -1) { +QEMUOptionParameter *backing_file = +get_option_parameter(param, BLOCK_OPT_BACKING_FILE); + +if (backing_file backing_file-value.s) { +uint64_t size; +const char *fmt = NULL; +char buf[32]; + +if (backing_fmt backing_fmt-value.s) { +fmt = backing_fmt-value.s; +} + +bs = bdrv_new(); + +ret = bdrv_open(bs, backing_file-value.s, flags, drv); +if (ret 0) { +error_report(Could not open '%s', filename); +ret = -1; +goto out; +} +bdrv_get_geometry(bs, size); +size *= 512; + +snprintf(buf, sizeof(buf), % PRId64, size); +set_option_parameter(param, BLOCK_OPT_SIZE, buf); +} else { +error_report(Image creation needs a size parameter); +ret = -1; +goto out; +} +} + +printf(Formatting '%s', fmt=%s , filename, fmt); +print_option_parameters(param); +puts(); + +ret = bdrv_create(drv, filename, param); + +if (ret 0) { +if (ret == -ENOTSUP) { +error_report(Formatting or formatting option not supported for + file format '%s', fmt); +} else if (ret == -EFBIG) { +error_report(The image size is too large for file format '%s', + fmt); +} else { +error_report(%s: error while creating %s: %s, filename, fmt, + strerror(-ret)); +} +} +
[Qemu-devel] [PATCH v3 0/4] Re-factor img_create() and add live snapshots
From: Jes Sorensen jes.soren...@redhat.com Hi, This set of patches re-factors img_create() and moves the core part of it into block.c so it can be accessed from qemu as well as qemu-img. The second patch adds basic live snapshots support to the code, however only snapshots to external QCOW2 images is supported for now. QED support should be trivial once the QED patches go into upstream. The last patch fixes a small gotcha which is present in the old code as well. Try to catch cases where a user tries to create an image with itself as the backing file. QEMU does 'interesting' things when you do this. Many thanks to Kevin for his help with block layer internals! New in v2: - Fix error return value in monitor command - Clarify help message for command - Fix patch conflict against block tree. It's all Stefan's fault :) f8feb11f4d76f390dddc5cc5345abf99f7659a78 New in v3: - Address issues pointed out by Stefan and Kevin - Additional patch to return proper -errno error values on error in bdrv_img_create() as suggested by Kevin. Jes Sorensen (4): qemu-img.c: Re-factor img_create() Introduce do_snapshot_blkdev() and monitor command to handle it. Prevent creating an image with the same filename as backing file bdrv_img_create() use proper errno return values block.c | 145 +++ block.h |4 ++ blockdev.c | 62 +++ blockdev.h |1 + hmp-commands.hx | 19 +++ qemu-img.c | 108 + 6 files changed, 233 insertions(+), 106 deletions(-) -- 1.7.3.3
Re: [Qemu-devel] [PATCH 3/5] libcacard: initial commit after coding style fixes
On 12/16/10 12:06, Alon Levy wrote: libcacard emulates a Common Access Card (CAC) which is a standard for smartcards. It is used by the emulated ccid card introduced in a following patch. Docs are available in docs/libcacard.txt Looks good to me, although I'm not a crypto expert. Most of the critical crypto stuff is in nss anyway though, so: Acked-by: Gerd Hoffmann kra...@redhat.com cheers, Gerd
[Qemu-devel] [PATCH 4/4] bdrv_img_create() use proper errno return values
From: Jes Sorensen jes.soren...@redhat.com Kevin suggested to have bdrv_img_create() return proper -errno values on error. Signed-off-by: Jes Sorensen jes.soren...@redhat.com --- block.c | 23 ++- 1 files changed, 10 insertions(+), 13 deletions(-) diff --git a/block.c b/block.c index 0c14eee..fe07d0b 100644 --- a/block.c +++ b/block.c @@ -2773,14 +2773,14 @@ int bdrv_img_create(const char *filename, const char *fmt, drv = bdrv_find_format(fmt); if (!drv) { error_report(Unknown file format '%s', fmt); -ret = -1; +ret = -EINVAL; goto out; } proto_drv = bdrv_find_protocol(filename); if (!proto_drv) { error_report(Unknown protocol '%s', filename); -ret = -1; +ret = -EINVAL; goto out; } @@ -2799,7 +2799,7 @@ int bdrv_img_create(const char *filename, const char *fmt, param = parse_option_parameters(options, create_options, param); if (param == NULL) { error_report(Invalid options for file format '%s'., fmt); -ret = -1; +ret = -EINVAL; goto out; } } @@ -2809,7 +2809,7 @@ int bdrv_img_create(const char *filename, const char *fmt, base_filename)) { error_report(Backing file not supported for file format '%s', fmt); -ret = -1; +ret = -EINVAL; goto out; } } @@ -2818,7 +2818,7 @@ int bdrv_img_create(const char *filename, const char *fmt, if (set_option_parameter(param, BLOCK_OPT_BACKING_FMT, base_fmt)) { error_report(Backing file format not supported for file format '%s', fmt); -ret = -1; +ret = -EINVAL; goto out; } } @@ -2828,7 +2828,7 @@ int bdrv_img_create(const char *filename, const char *fmt, if (!strcmp(filename, backing_file-value.s)) { error_report(Error: Trying to create an image with the same filename as the backing file); -ret = -1; +ret = -EINVAL; goto out; } } @@ -2838,7 +2838,7 @@ int bdrv_img_create(const char *filename, const char *fmt, if (!bdrv_find_format(backing_fmt-value.s)) { error_report(Unknown backing file format '%s', backing_fmt-value.s); -ret = -1; +ret = -EINVAL; goto out; } } @@ -2860,7 +2860,6 @@ int bdrv_img_create(const char *filename, const char *fmt, ret = bdrv_open(bs, backing_file-value.s, flags, drv); if (ret 0) { error_report(Could not open '%s', filename); -ret = -1; goto out; } bdrv_get_geometry(bs, size); @@ -2870,7 +2869,7 @@ int bdrv_img_create(const char *filename, const char *fmt, set_option_parameter(param, BLOCK_OPT_SIZE, buf); } else { error_report(Image creation needs a size parameter); -ret = -1; +ret = -EINVAL; goto out; } } @@ -2901,8 +2900,6 @@ out: if (bs) { bdrv_delete(bs); } -if (ret) { -return 1; -} -return 0; + +return ret; } -- 1.7.3.3
Re: [Qemu-devel] [PATCH 4/5] ccid: add ccid-card-emulated device (v2)
On 12/16/10 12:06, Alon Levy wrote: This devices uses libcacard (internal) to emulate a smartcard conforming to the CAC standard. It attaches to the usb-ccid bus. Usage instructions (example command lines) are in the following patch in docs/ccid.txt. It uses libcacard which uses nss, so it can work with both hw cards and certificates (files). Acked-by: Gerd Hoffmann kra...@redhat.com cheers, Gerd
[Qemu-devel] [PATCH 3/4] Prevent creating an image with the same filename as backing file
From: Jes Sorensen jes.soren...@redhat.com Signed-off-by: Jes Sorensen jes.soren...@redhat.com --- block.c | 15 +++ 1 files changed, 11 insertions(+), 4 deletions(-) diff --git a/block.c b/block.c index a48b30c..0c14eee 100644 --- a/block.c +++ b/block.c @@ -2764,7 +2764,7 @@ int bdrv_img_create(const char *filename, const char *fmt, char *options, uint64_t img_size, int flags) { QEMUOptionParameter *param = NULL, *create_options = NULL; -QEMUOptionParameter *backing_fmt; +QEMUOptionParameter *backing_fmt, *backing_file; BlockDriverState *bs = NULL; BlockDriver *drv, *proto_drv; int ret = 0; @@ -2823,6 +2823,16 @@ int bdrv_img_create(const char *filename, const char *fmt, } } +backing_file = get_option_parameter(param, BLOCK_OPT_BACKING_FILE); +if (backing_file backing_file-value.s) { +if (!strcmp(filename, backing_file-value.s)) { +error_report(Error: Trying to create an image with the + same filename as the backing file); +ret = -1; +goto out; +} +} + backing_fmt = get_option_parameter(param, BLOCK_OPT_BACKING_FMT); if (backing_fmt backing_fmt-value.s) { if (!bdrv_find_format(backing_fmt-value.s)) { @@ -2836,9 +2846,6 @@ int bdrv_img_create(const char *filename, const char *fmt, // The size for the image must always be specified, with one exception: // If we are using a backing file, we can obtain the size from there if (get_option_parameter(param, BLOCK_OPT_SIZE)-value.n == -1) { -QEMUOptionParameter *backing_file = -get_option_parameter(param, BLOCK_OPT_BACKING_FILE); - if (backing_file backing_file-value.s) { uint64_t size; const char *fmt = NULL; -- 1.7.3.3
Re: [Qemu-devel] [PATCH 5/5] ccid: add docs
On 12/16/10 12:06, Alon Levy wrote: Add documentation for the usb-ccid device and accompanying two card devices, ccid-card-emulated and ccid-card-passthru. --- docs/ccid.txt | 125 ++ docs/libcacard.txt | 483 Guess that isn't intentional, right? +A typical interchange is: + +client event | vscclient |passthru| usb-ccid | guest event +-- + | VSC_Init|| | + | VSC_ReaderAdd || attach| + | || | sees new usb device. +card inserted | || | + | VSC_ATR || | + | || | guest operation, APDU transfer via CCID + | | VSC_APDU | | + | VSC_APDU|| | +client-physical | || | +card APDU exchange| || | +[APDU-APDU repeats several times] +card removed | || | + | VSC_CardRemove || | +kill/quit | || | + vscclient | || | + | VSC_ReaderRemove||detach | + | || | usb device removed. This needs updating, doesn't it? I think you plugin and -out just the cards on the ccid bus instead of the whole usb device, right? cheers, Gerd
Re: [Qemu-devel] Re: [PATCH v3] qemu, qmp: convert do_inject_nmi() to QObject, QError
On Thu, 16 Dec 2010 14:50:08 +0200 Avi Kivity a...@redhat.com wrote: On 12/16/2010 01:47 PM, Markus Armbruster wrote: This has the feature of injecting the nmi in just some cpus, although I'm not sure this is going to be desired/useful. Use case for NMI-ing a subset of the CPUs? Heck, use case for anything else but NMI all? Exactly. Then I think the to-all-cpus argument is better.
Re: [Qemu-devel] Re: [PATCH v3] qemu, qmp: convert do_inject_nmi() to QObject, QError
On 12/16/2010 03:09 PM, Luiz Capitulino wrote: On Thu, 16 Dec 2010 14:50:08 +0200 Avi Kivitya...@redhat.com wrote: On 12/16/2010 01:47 PM, Markus Armbruster wrote: This has the feature of injecting the nmi in just some cpus, although I'm not sure this is going to be desired/useful. Use case for NMI-ing a subset of the CPUs? Heck, use case for anything else but NMI all? Exactly. Then I think the to-all-cpus argument is better. Why have an argument at all? Always nmi to all cpus. -- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] Re: [PATCH v3] qemu, qmp: convert do_inject_nmi() to QObject, QError
On Thu, 16 Dec 2010 15:11:50 +0200 Avi Kivity a...@redhat.com wrote: On 12/16/2010 03:09 PM, Luiz Capitulino wrote: On Thu, 16 Dec 2010 14:50:08 +0200 Avi Kivitya...@redhat.com wrote: On 12/16/2010 01:47 PM, Markus Armbruster wrote: This has the feature of injecting the nmi in just some cpus, although I'm not sure this is going to be desired/useful. Use case for NMI-ing a subset of the CPUs? Heck, use case for anything else but NMI all? Exactly. Then I think the to-all-cpus argument is better. Why have an argument at all? Always nmi to all cpus. That's possible too.
Re: [Qemu-devel] [PATCH 1/1] spice: add chardev
On 12/16/10 12:29, Alon Levy wrote: Adding a chardev backend for spice, for usage by spice vdagent in conjunction with a properly named virtio-serial device. Usage example would be nice here. +#ifdef CONFIG_SPICE +#include spice-qemu-char.h +#endif #ifdef can be dropped. +#ifdef CONFIG_SPICE +{ +.name = name, +.type = QEMU_OPT_STRING, +},{ +.name = debug, +.type = QEMU_OPT_NUMBER, +}, +#endif This too. @@ -1381,7 +1384,10 @@ Backend is one of: @option{stdio}, @option{braille}, @option{tty}, -...@option{parport}. +...@option{parport} +#if defined(CONFIG_SPICE) +...@option{spicevmc}. +#endif This too, documentation should be there unconditionally. +//#define SPICE_QEMU_CHAR_USE_IOCTL Why is this disabled? Does it depend on the chardev patches from Amit? diff --git a/spice-qemu-char.h b/spice-qemu-char.h new file mode 100644 index 000..32d5009 --- /dev/null +++ b/spice-qemu-char.h @@ -0,0 +1,9 @@ +#ifndef __SPICE_QEMU_CHAR_H__ +#define __SPICE_QEMU_CHAR_H__ + +#include qemu-char.h + +CharDriverState *qemu_chr_open_spice(QemuOpts *opts); + +#endif // __SPICE_QEMU_CHAR_H__ + Hmm, maybe add this to ui/qemu-spice.h instead, so we don't clutter the tree with lots of tiny includes? cheers, Gerd
[Qemu-devel] [PATCH] qemu.img.c: Use error_report() instead of own error() implementation
From: Jes Sorensen jes.soren...@redhat.com Signed-off-by: Jes Sorensen jes.soren...@redhat.com --- qemu-img.c | 128 +-- 1 files changed, 63 insertions(+), 65 deletions(-) diff --git a/qemu-img.c b/qemu-img.c index 603bdb3..d3921b6 100644 --- a/qemu-img.c +++ b/qemu-img.c @@ -26,6 +26,7 @@ #include osdep.h #include sysemu.h #include block_int.h +#include qerror.h #include stdio.h #ifdef _WIN32 @@ -40,16 +41,6 @@ typedef struct img_cmd_t { /* Default to cache=writeback as data integrity is not important for qemu-tcg. */ #define BDRV_O_FLAGS BDRV_O_CACHE_WB -static void GCC_FMT_ATTR(1, 2) error(const char *fmt, ...) -{ -va_list ap; -va_start(ap, fmt); -fprintf(stderr, qemu-img: ); -vfprintf(stderr, fmt, ap); -fprintf(stderr, \n); -va_end(ap); -} - static void format_print(void *opaque, const char *name) { printf( %s, name); @@ -196,13 +187,13 @@ static int print_block_option_help(const char *filename, const char *fmt) /* Find driver and parse its options */ drv = bdrv_find_format(fmt); if (!drv) { -error(Unknown file format '%s', fmt); +error_report(Unknown file format '%s', fmt); return 1; } proto_drv = bdrv_find_protocol(filename); if (!proto_drv) { -error(Unknown protocol '%s', filename); +error_report(Unknown protocol '%s', filename); return 1; } @@ -225,30 +216,30 @@ static BlockDriverState *bdrv_new_open(const char *filename, bs = bdrv_new(); if (!bs) { -error(Not enough memory); +error_report(Not enough memory); goto fail; } if (fmt) { drv = bdrv_find_format(fmt); if (!drv) { -error(Unknown file format '%s', fmt); +error_report(Unknown file format '%s', fmt); goto fail; } } else { drv = NULL; } if (bdrv_open(bs, filename, flags, drv) 0) { -error(Could not open '%s', filename); +error_report(Could not open '%s', filename); goto fail; } if (bdrv_is_encrypted(bs)) { printf(Disk image '%s' is encrypted.\n, filename); if (read_password(password, sizeof(password)) 0) { -error(No password given); +error_report(No password given); goto fail; } if (bdrv_set_key(bs, password) 0) { -error(invalid password); +error_report(invalid password); goto fail; } } @@ -266,13 +257,15 @@ static int add_old_style_options(const char *fmt, QEMUOptionParameter *list, { if (base_filename) { if (set_option_parameter(list, BLOCK_OPT_BACKING_FILE, base_filename)) { -error(Backing file not supported for file format '%s', fmt); +error_report(Backing file not supported for file format '%s', + fmt); return -1; } } if (base_fmt) { if (set_option_parameter(list, BLOCK_OPT_BACKING_FMT, base_fmt)) { -error(Backing file format not supported for file format '%s', fmt); +error_report(Backing file format not supported for file + format '%s', fmt); return -1; } } @@ -309,11 +302,11 @@ static int img_create(int argc, char **argv) fmt = optarg; break; case 'e': -error(qemu-img: option -e is deprecated, please use \'-o +error_report(qemu-img: option -e is deprecated, please use \'-o encryption\' instead!); return 1; case '6': -error(qemu-img: option -6 is deprecated, please use \'-o +error_report(qemu-img: option -6 is deprecated, please use \'-o compat6\' instead!); return 1; case 'o': @@ -333,9 +326,9 @@ static int img_create(int argc, char **argv) ssize_t sval; sval = strtosz_suffix(argv[optind++], NULL, STRTOSZ_DEFSUFFIX_B); if (sval 0) { -error(Invalid image size specified! You may use k, M, G or +error_report(Invalid image size specified! You may use k, M, G or T suffixes for ); -error(kilobytes, megabytes, gigabytes and terabytes.); +error_report(kilobytes, megabytes, gigabytes and terabytes.); ret = -1; goto out; } @@ -399,7 +392,7 @@ static int img_check(int argc, char **argv) ret = bdrv_check(bs, result); if (ret == -ENOTSUP) { -error(This image format does not support checks); +error_report(This image format does not support checks); bdrv_delete(bs); return 1; } @@ -481,16 +474,16 @@ static int img_commit(int argc, char **argv) printf(Image committed.\n); break; case -ENOENT: -error(No disk
[Qemu-devel] [PATCH] qemu-img: Call error_set_progname
Call error_set_progname during the qemu-img initialization, so that error messages printed with error_report() use the right prefix. Signed-off-by: Kevin Wolf kw...@redhat.com --- qemu-img.c |3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/qemu-img.c b/qemu-img.c index 603bdb3..0ff179f 100644 --- a/qemu-img.c +++ b/qemu-img.c @@ -23,6 +23,7 @@ */ #include qemu-common.h #include qemu-option.h +#include qemu-error.h #include osdep.h #include sysemu.h #include block_int.h @@ -1508,6 +1509,8 @@ int main(int argc, char **argv) const img_cmd_t *cmd; const char *cmdname; +error_set_progname(argv[0]); + bdrv_init(); if (argc 2) help(); -- 1.7.2.3
[Qemu-devel] Re: [PATCH v3 0/4] Re-factor img_create() and add live snapshots
Am 16.12.2010 13:52, schrieb jes.soren...@redhat.com: From: Jes Sorensen jes.soren...@redhat.com Hi, This set of patches re-factors img_create() and moves the core part of it into block.c so it can be accessed from qemu as well as qemu-img. The second patch adds basic live snapshots support to the code, however only snapshots to external QCOW2 images is supported for now. QED support should be trivial once the QED patches go into upstream. The last patch fixes a small gotcha which is present in the old code as well. Try to catch cases where a user tries to create an image with itself as the backing file. QEMU does 'interesting' things when you do this. Many thanks to Kevin for his help with block layer internals! New in v2: - Fix error return value in monitor command - Clarify help message for command - Fix patch conflict against block tree. It's all Stefan's fault :) f8feb11f4d76f390dddc5cc5345abf99f7659a78 New in v3: - Address issues pointed out by Stefan and Kevin - Additional patch to return proper -errno error values on error in bdrv_img_create() as suggested by Kevin. Jes Sorensen (4): qemu-img.c: Re-factor img_create() Introduce do_snapshot_blkdev() and monitor command to handle it. Prevent creating an image with the same filename as backing file bdrv_img_create() use proper errno return values block.c | 145 +++ block.h |4 ++ blockdev.c | 62 +++ blockdev.h |1 + hmp-commands.hx | 19 +++ qemu-img.c | 108 + 6 files changed, 233 insertions(+), 106 deletions(-) Thanks, applied all to the block branch. Kevin
[Qemu-devel] Re: [PATCH] qemu.img.c: Use error_report() instead of own error() implementation
Am 16.12.2010 14:31, schrieb jes.soren...@redhat.com: From: Jes Sorensen jes.soren...@redhat.com Signed-off-by: Jes Sorensen jes.soren...@redhat.com Thanks, applied to the block branch. Kevin
[Qemu-devel] Re: [PATCH 05/21] virtio: modify save/load handler to handle inuse varialble.
2010/12/16 Michael S. Tsirkin m...@redhat.com: On Thu, Dec 16, 2010 at 04:36:16PM +0900, Yoshiaki Tamura wrote: 2010/12/3 Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp: 2010/12/2 Michael S. Tsirkin m...@redhat.com: On Wed, Dec 01, 2010 at 05:03:43PM +0900, Yoshiaki Tamura wrote: 2010/11/28 Michael S. Tsirkin m...@redhat.com: On Sun, Nov 28, 2010 at 08:27:58PM +0900, Yoshiaki Tamura wrote: 2010/11/28 Michael S. Tsirkin m...@redhat.com: On Thu, Nov 25, 2010 at 03:06:44PM +0900, Yoshiaki Tamura wrote: Modify inuse type to uint16_t, let save/load to handle, and revert last_avail_idx with inuse if there are outstanding emulation. Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp This changes migration format, so it will break compatibility with existing drivers. More generally, I think migrating internal state that is not guest visible is always a mistake as it ties migration format to an internal implementation (yes, I know we do this sometimes, but we should at least try not to add such cases). I think the right thing to do in this case is to flush outstanding work when vm is stopped. Then, we are guaranteed that inuse is 0. I sent patches that do this for virtio net and block. Could you give me the link of your patches? I'd like to test whether they work with Kemari upon failover. If they do, I'm happy to drop this patch. Yoshi Look for this: stable migration image on a stopped vm sent on: Wed, 24 Nov 2010 17:52:49 +0200 Thanks for the info. However, The patch series above didn't solve the issue. In case of Kemari, inuse is mostly 0 because it queues the output, and while last_avail_idx gets incremented immediately, not sending inuse makes the state inconsistent between Primary and Secondary. Hmm. Can we simply avoid incrementing last_avail_idx? I think we can calculate or prepare an internal last_avail_idx, and update the external when inuse is decremented. I'll try whether it work w/ w/o Kemari. Hi Michael, Could you please take a look at the following patch? Which version is this against? Oops. It should be very old. 67f895bfe69f323b427b284430b6219c8a62e8d4 commit 36ee7910059e6b236fe9467a609f5b4aed866912 Author: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp Date: Thu Dec 16 14:50:54 2010 +0900 virtio: update last_avail_idx when inuse is decreased. Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp It would be better to have a commit description explaining why a change is made, and why it is correct, not just repeating what can be seen from the diff anyway. Sorry for being lazy here. diff --git a/hw/virtio.c b/hw/virtio.c index c8a0fc6..6688c02 100644 --- a/hw/virtio.c +++ b/hw/virtio.c @@ -237,6 +237,7 @@ void virtqueue_flush(VirtQueue *vq, unsigned int count) wmb(); trace_virtqueue_flush(vq, count); vring_used_idx_increment(vq, count); + vq-last_avail_idx += count; vq-inuse -= count; } @@ -385,7 +386,7 @@ int virtqueue_pop(VirtQueue *vq, VirtQueueElement *elem) unsigned int i, head, max; target_phys_addr_t desc_pa = vq-vring.desc; - if (!virtqueue_num_heads(vq, vq-last_avail_idx)) + if (!virtqueue_num_heads(vq, vq-last_avail_idx + vq-inuse)) return 0; /* When we start there are none of either input nor output. */ @@ -393,7 +394,7 @@ int virtqueue_pop(VirtQueue *vq, VirtQueueElement *elem) max = vq-vring.num; - i = head = virtqueue_get_head(vq, vq-last_avail_idx++); + i = head = virtqueue_get_head(vq, vq-last_avail_idx + vq-inuse); if (vring_desc_flags(desc_pa, i) VRING_DESC_F_INDIRECT) { if (vring_desc_len(desc_pa, i) % sizeof(VRingDesc)) { Hmm, will virtio_queue_empty be wrong now? What about virtqueue_avail_bytes? I think there are two problems. 1. When to update last_avail_idx. 2. The ordering issue you're mentioning below. The patch above is only trying to address 1 because last time you mentioned that modifying last_avail_idx upon save may break the guest, which I agree. If virtio_queue_empty and virtqueue_avail_bytes are only used internally, meaning invisible to the guest, I guess the approach above can be applied too. Previous patch version sure looked simpler, and this seems functionally equivalent, so my question still stands: here it is rephrased in a different way: assume that we have in avail ring 2 requests at start of ring: A and B in this order host pops A, then B, then completes B and flushes now with this patch last_avail_idx will be 1, and then remote will get it, it will execute B again. As a result B will complete twice, and apparently A will never complete. This is what I was saying below: assuming that there are outstanding requests when we migrate, there is no way a single index can be enough to figure out which
[Qemu-devel] Re: [PATCH] qemu-img: Call error_set_progname
On 12/16/10 15:13, Kevin Wolf wrote: Call error_set_progname during the qemu-img initialization, so that error messages printed with error_report() use the right prefix. Signed-off-by: Kevin Wolf kw...@redhat.com Acked-by: Jes Sorensen jes.soren...@redhat.com --- qemu-img.c |3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/qemu-img.c b/qemu-img.c index 603bdb3..0ff179f 100644 --- a/qemu-img.c +++ b/qemu-img.c @@ -23,6 +23,7 @@ */ #include qemu-common.h #include qemu-option.h +#include qemu-error.h #include osdep.h #include sysemu.h #include block_int.h @@ -1508,6 +1509,8 @@ int main(int argc, char **argv) const img_cmd_t *cmd; const char *cmdname; +error_set_progname(argv[0]); + bdrv_init(); if (argc 2) help();
[Qemu-devel] [PATCH] Remove NULL checks for bdrv_new return value
It's an indirect call to qemu_malloc, which never returns an error. Signed-off-by: Kevin Wolf kw...@redhat.com --- hw/xen_disk.c | 17 ++--- qemu-img.c|5 + qemu-io.c |2 -- qemu-nbd.c|2 -- 4 files changed, 7 insertions(+), 19 deletions(-) diff --git a/hw/xen_disk.c b/hw/xen_disk.c index 85a1c85..1954311 100644 --- a/hw/xen_disk.c +++ b/hw/xen_disk.c @@ -634,17 +634,12 @@ static int blk_init(struct XenDevice *xendev) if (!blkdev-dinfo) { /* setup via xenbus - create new block driver instance */ xen_be_printf(blkdev-xendev, 2, create new bdrv (xenbus setup)\n); - blkdev-bs = bdrv_new(blkdev-dev); - if (blkdev-bs) { - if (bdrv_open(blkdev-bs, blkdev-filename, qflags, - bdrv_find_whitelisted_format(blkdev-fileproto)) -!= 0) { - bdrv_delete(blkdev-bs); - blkdev-bs = NULL; - } - } - if (!blkdev-bs) - return -1; +blkdev-bs = bdrv_new(blkdev-dev); +if (bdrv_open(blkdev-bs, blkdev-filename, qflags, + bdrv_find_whitelisted_format(blkdev-fileproto)) != 0) { +bdrv_delete(blkdev-bs); +blkdev-bs = NULL; +} } else { /* setup via qemu cmdline - already setup for us */ xen_be_printf(blkdev-xendev, 2, get configured bdrv (cmdline setup)\n); diff --git a/qemu-img.c b/qemu-img.c index 0b871d8..afd9ed2 100644 --- a/qemu-img.c +++ b/qemu-img.c @@ -215,10 +215,7 @@ static BlockDriverState *bdrv_new_open(const char *filename, char password[256]; bs = bdrv_new(); -if (!bs) { -error_report(Not enough memory); -goto fail; -} + if (fmt) { drv = bdrv_find_format(fmt); if (!drv) { diff --git a/qemu-io.c b/qemu-io.c index ff353eb..0f6d1b6 100644 --- a/qemu-io.c +++ b/qemu-io.c @@ -1509,8 +1509,6 @@ static int openfile(char *name, int flags, int growable) } } else { bs = bdrv_new(hda); - if (!bs) - return 1; if (bdrv_open(bs, name, flags, NULL) 0) { fprintf(stderr, %s: can't open device %s\n, progname, name); diff --git a/qemu-nbd.c b/qemu-nbd.c index 99f1d22..e858033 100644 --- a/qemu-nbd.c +++ b/qemu-nbd.c @@ -336,8 +336,6 @@ int main(int argc, char **argv) bdrv_init(); bs = bdrv_new(hda); -if (bs == NULL) -return 1; if ((ret = bdrv_open(bs, argv[optind], flags, NULL)) 0) { errno = -ret; -- 1.7.2.3
[Qemu-devel] Re: [PATCH 05/21] virtio: modify save/load handler to handle inuse varialble.
On Thu, Dec 16, 2010 at 11:28:46PM +0900, Yoshiaki Tamura wrote: 2010/12/16 Michael S. Tsirkin m...@redhat.com: On Thu, Dec 16, 2010 at 04:36:16PM +0900, Yoshiaki Tamura wrote: 2010/12/3 Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp: 2010/12/2 Michael S. Tsirkin m...@redhat.com: On Wed, Dec 01, 2010 at 05:03:43PM +0900, Yoshiaki Tamura wrote: 2010/11/28 Michael S. Tsirkin m...@redhat.com: On Sun, Nov 28, 2010 at 08:27:58PM +0900, Yoshiaki Tamura wrote: 2010/11/28 Michael S. Tsirkin m...@redhat.com: On Thu, Nov 25, 2010 at 03:06:44PM +0900, Yoshiaki Tamura wrote: Modify inuse type to uint16_t, let save/load to handle, and revert last_avail_idx with inuse if there are outstanding emulation. Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp This changes migration format, so it will break compatibility with existing drivers. More generally, I think migrating internal state that is not guest visible is always a mistake as it ties migration format to an internal implementation (yes, I know we do this sometimes, but we should at least try not to add such cases). I think the right thing to do in this case is to flush outstanding work when vm is stopped. Then, we are guaranteed that inuse is 0. I sent patches that do this for virtio net and block. Could you give me the link of your patches? I'd like to test whether they work with Kemari upon failover. If they do, I'm happy to drop this patch. Yoshi Look for this: stable migration image on a stopped vm sent on: Wed, 24 Nov 2010 17:52:49 +0200 Thanks for the info. However, The patch series above didn't solve the issue. In case of Kemari, inuse is mostly 0 because it queues the output, and while last_avail_idx gets incremented immediately, not sending inuse makes the state inconsistent between Primary and Secondary. Hmm. Can we simply avoid incrementing last_avail_idx? I think we can calculate or prepare an internal last_avail_idx, and update the external when inuse is decremented. I'll try whether it work w/ w/o Kemari. Hi Michael, Could you please take a look at the following patch? Which version is this against? Oops. It should be very old. 67f895bfe69f323b427b284430b6219c8a62e8d4 commit 36ee7910059e6b236fe9467a609f5b4aed866912 Author: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp Date: Thu Dec 16 14:50:54 2010 +0900 virtio: update last_avail_idx when inuse is decreased. Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp It would be better to have a commit description explaining why a change is made, and why it is correct, not just repeating what can be seen from the diff anyway. Sorry for being lazy here. diff --git a/hw/virtio.c b/hw/virtio.c index c8a0fc6..6688c02 100644 --- a/hw/virtio.c +++ b/hw/virtio.c @@ -237,6 +237,7 @@ void virtqueue_flush(VirtQueue *vq, unsigned int count) wmb(); trace_virtqueue_flush(vq, count); vring_used_idx_increment(vq, count); + vq-last_avail_idx += count; vq-inuse -= count; } @@ -385,7 +386,7 @@ int virtqueue_pop(VirtQueue *vq, VirtQueueElement *elem) unsigned int i, head, max; target_phys_addr_t desc_pa = vq-vring.desc; - if (!virtqueue_num_heads(vq, vq-last_avail_idx)) + if (!virtqueue_num_heads(vq, vq-last_avail_idx + vq-inuse)) return 0; /* When we start there are none of either input nor output. */ @@ -393,7 +394,7 @@ int virtqueue_pop(VirtQueue *vq, VirtQueueElement *elem) max = vq-vring.num; - i = head = virtqueue_get_head(vq, vq-last_avail_idx++); + i = head = virtqueue_get_head(vq, vq-last_avail_idx + vq-inuse); if (vring_desc_flags(desc_pa, i) VRING_DESC_F_INDIRECT) { if (vring_desc_len(desc_pa, i) % sizeof(VRingDesc)) { Hmm, will virtio_queue_empty be wrong now? What about virtqueue_avail_bytes? I think there are two problems. 1. When to update last_avail_idx. 2. The ordering issue you're mentioning below. The patch above is only trying to address 1 because last time you mentioned that modifying last_avail_idx upon save may break the guest, which I agree. If virtio_queue_empty and virtqueue_avail_bytes are only used internally, meaning invisible to the guest, I guess the approach above can be applied too. So IMHO 2 is the real issue. This is what was problematic with the save patch, otherwise of course changes in save are better than changes all over the codebase. Previous patch version sure looked simpler, and this seems functionally equivalent, so my question still stands: here it is rephrased in a different way: assume that we have in avail ring 2 requests at start of ring: A and B in this order host pops A, then B, then completes B and flushes
[Qemu-devel] [PATCH v2] Remove NULL checks for bdrv_new return value
It's an indirect call to qemu_malloc, which never returns an error. Signed-off-by: Kevin Wolf kw...@redhat.com --- v2: - Fixed bdrv_open failure case in xen_disk hw/xen_disk.c | 17 ++--- qemu-img.c|5 + qemu-io.c |2 -- qemu-nbd.c|2 -- 4 files changed, 7 insertions(+), 19 deletions(-) diff --git a/hw/xen_disk.c b/hw/xen_disk.c index 85a1c85..ed9e5eb 100644 --- a/hw/xen_disk.c +++ b/hw/xen_disk.c @@ -634,17 +634,12 @@ static int blk_init(struct XenDevice *xendev) if (!blkdev-dinfo) { /* setup via xenbus - create new block driver instance */ xen_be_printf(blkdev-xendev, 2, create new bdrv (xenbus setup)\n); - blkdev-bs = bdrv_new(blkdev-dev); - if (blkdev-bs) { - if (bdrv_open(blkdev-bs, blkdev-filename, qflags, - bdrv_find_whitelisted_format(blkdev-fileproto)) -!= 0) { - bdrv_delete(blkdev-bs); - blkdev-bs = NULL; - } - } - if (!blkdev-bs) - return -1; +blkdev-bs = bdrv_new(blkdev-dev); +if (bdrv_open(blkdev-bs, blkdev-filename, qflags, + bdrv_find_whitelisted_format(blkdev-fileproto)) != 0) { +bdrv_delete(blkdev-bs); +return -1; +} } else { /* setup via qemu cmdline - already setup for us */ xen_be_printf(blkdev-xendev, 2, get configured bdrv (cmdline setup)\n); diff --git a/qemu-img.c b/qemu-img.c index 0b871d8..afd9ed2 100644 --- a/qemu-img.c +++ b/qemu-img.c @@ -215,10 +215,7 @@ static BlockDriverState *bdrv_new_open(const char *filename, char password[256]; bs = bdrv_new(); -if (!bs) { -error_report(Not enough memory); -goto fail; -} + if (fmt) { drv = bdrv_find_format(fmt); if (!drv) { diff --git a/qemu-io.c b/qemu-io.c index ff353eb..0f6d1b6 100644 --- a/qemu-io.c +++ b/qemu-io.c @@ -1509,8 +1509,6 @@ static int openfile(char *name, int flags, int growable) } } else { bs = bdrv_new(hda); - if (!bs) - return 1; if (bdrv_open(bs, name, flags, NULL) 0) { fprintf(stderr, %s: can't open device %s\n, progname, name); diff --git a/qemu-nbd.c b/qemu-nbd.c index 99f1d22..e858033 100644 --- a/qemu-nbd.c +++ b/qemu-nbd.c @@ -336,8 +336,6 @@ int main(int argc, char **argv) bdrv_init(); bs = bdrv_new(hda); -if (bs == NULL) -return 1; if ((ret = bdrv_open(bs, argv[optind], flags, NULL)) 0) { errno = -ret; -- 1.7.2.3
Re: [Qemu-devel] Re: [PATCH 00/15] Megasas HBA emulation and SCSI update v.3
Am 16.12.2010 02:45, schrieb Benjamin Herrenschmidt: On Mon, 2010-12-13 at 08:32 +0100, Hannes Reinecke wrote: On 12/10/2010 11:14 PM, Paolo Bonzini wrote: On 11/24/2010 05:50 PM, Christoph Hellwig wrote: Btw, it might make sense to split this series into two. Patches 1 to 11 are genuine improvements to the SCSI code, which I'd like to see merged ASAP. The rest is the actual megasas driver, which I still want to see, but haven't even gotten to review yet. Ping for patches 1 to 11? Paolo The first few already have been merged by Kevin Wolf; I'll see to prepare an updated patchset. Actually, I was about to ask as I'd like to base some new work of mine on top of these. I don't see any recent commit from Kevin in the qemu master branch (nor in any other branch on http://git.savannah.gnu.org/cgit/qemu.git/log/). Patches 1 to 5 are already in master, for the rest I'm waiting for the next update. If you need them earlier for basing your own work on them I can take this version of the patches into my block branch, even though some of the need an update before they can be merged into master. Does Kevin maintain a separate staging tree ? It's at git://repo.or.cz/qemu/kevin.git block Kevin
[Qemu-devel] classic emulator Vs QEMU-TCG
Hi all! I am a student, trying to understand QEMU, specifically TCG translation/execution. After spending much time on the code I still have big doubts. I think my doubts are due to the classic idea I have of an emulator. Actually as a student, I've never developed even a simple classic emulator myself, but in my idea it should follow this flow: 1) Fetch target instruction i.e. PC(0x532652) : 0x104265 (I am just inventing) 2) Decode Opcode 0x10 : ADD, R1: 0x42, R2: 0x65 3) Look up instruction function table: switch(opcode) case add : add(R1, R2) break; 4) Execution void add(int R1, int R2) { env-reg[R1] = env-reg[R1] + env[R2];} Now all of that would be compiled offline for the host machine and at runtime the host macine would just execute the binary host code for the instruction env-reg[R1] = env-reg[R1] + env[R2]; (its host binary translation) In QEMU/TCG, thanks to the help of Mr. Blue Swirl, I understood there is a runtime creation of host binary, starting from the loaded target binary.. My big doubt is, how can I execute that new binary? .. Shall TCG put it in some memory location, and then make the process branch to that address (and then back) ? I really can't see how that happens in the code :( in cpu-exec.c : cpu_exec_nocache i find: /* execute the generated code */ next_tb = tcg_qemu_tb_exec(tb-tc_ptr); and in cpu-exec.c : cpu_exec /* execute the generated code */ next_tb = tcg_qemu_tb_exec(tc_ptr); so I thought tcg_qemu_tb_exec function should do the work of executing the translated binary in the host. But then I found out it is just a define in tcg.h: #define tcg_qemu_tb_exec(tb_ptr) ((long REGPARM (*)(void *))code_gen_prologue)(tb_ptr) and again in exec.c uint8_t code_gen_prologue[1024] code_gen_section; Maybe I have some problems with that C syntax, but I really don't understand what happens there.. how the execution happens! I think for all of you working for so long on QEMU, with a long successful experience in this field should be very easy.. but atm I really can't figure it out alone.. I can't find good documents explaining it, and I can't understand myself from the code! Thank you very very much for any help! :) Stefano B.
Re: [Qemu-devel] classic emulator Vs QEMU-TCG
On 16 December 2010 15:20, Stefano Bonifazi stefboombas...@gmail.com wrote: so I thought tcg_qemu_tb_exec function should do the work of executing the translated binary in the host. But then I found out it is just a define in tcg.h: #define tcg_qemu_tb_exec(tb_ptr) ((long REGPARM (*)(void *))code_gen_prologue)(tb_ptr) and again in exec.c uint8_t code_gen_prologue[1024] code_gen_section; Maybe I have some problems with that C syntax, but I really don't understand what happens there.. how the execution happens! Some hints: * go and look up the C syntax for function pointers and casting things to function pointers * code_gen_prologue[] contains code which has been generated once on startup -- go and find the function which is doing this, which ought to tell you what the prologue code actually does... * try single stepping individual machine instructions in the debugger as you go through tcg_qemu_tb_exec() and matching this up with what is really happening here and with the bits of qemu which generated that code. -- PMM
Re: [Qemu-devel] ]PATCH 0/7] add TRIM/UNMAP support, v3
Am 10.12.2010 16:00, schrieb Christoph Hellwig: This patchset adds support for the ATA TRIM and SCSI WRITE SAME with unmap commands, which allow reclaiming free space from a backing image. The user facing implementation is pretty complete, but not really efficient because the underlying bdrv_discard implementation doesn't use the aio implementation yet. The reason for that is that the SCSI layer doesn't really allow any asynchronous commands except for READ/WRITE by design, and implementing the ATA TRIM command with it's multiple ranges is rather painful, and combined with the SCSI limitation I didn't bother yet. The only backend support so far is the XFS hole punching ioctl, but others can be added easily when they become available. A virtio implementation for a discard command would also be pretty easy, but until we actually support a better backend then a plain sparse file it's not worth using for production enviroments anyway, but more for playing with the thin provisioning infrastructure, or observing guest behaviour when TRIM / unmap is supported. If the support is enabled and the backend doesn't support hole punching the TRIM / WRITE SAME commands become no-ops so that migration from hosts supporting or not supporting it works. Version 3: - refactor IDE dma support code - proper brace obsfucation - fix compile without xfs headers - use bool instead of int for a one-byte flag Version 2: - replace tabs with spaces - return -ENOMEDIUM from bdrv_discard if there's no driver assigned - actually list the TP EVPD page as supported when querying for supported EVPD pages The SCSI changes seem to apply, but the rest conflicts with recent changes, most notably AHCI. Can you rebase on top of the block branch? I also found some minor things in the SCSI code, so I'll take the chance to comment on them. Kevin
Re: [Qemu-devel] [PATCH 2/7] scsi-disk: support WRITE SAME (16) with unmap bit
Am 10.12.2010 16:00, schrieb Christoph Hellwig: Support discards via the WRITE SAME command with the unmap bit set, and tell the initiator about the support for it via the block limit and the new thin provisioning EVPD pages. Also fix the comment which incorrectly describedthe block limits EVPD page. Signed-off-by: Christoph Hellwig h...@lst.de Index: qemu/hw/scsi-disk.c === --- qemu.orig/hw/scsi-disk.c 2010-12-07 09:35:44.203009851 +0100 +++ qemu/hw/scsi-disk.c 2010-12-07 09:42:07.984255141 +0100 @@ -424,7 +424,8 @@ static int scsi_disk_emulate_inquiry(SCS outbuf[buflen++] = 0x80; // unit serial number outbuf[buflen++] = 0x83; // device identification if (bdrv_get_type_hint(s-bs) != BDRV_TYPE_CDROM) { -outbuf[buflen++] = 0xb0; // block device characteristics +outbuf[buflen++] = 0xb0; // block limits +outbuf[buflen++] = 0xb2; // thin provisioning } outbuf[pages] = buflen - pages - 1; // number of pages break; @@ -466,8 +467,10 @@ static int scsi_disk_emulate_inquiry(SCS buflen += id_len; break; } -case 0xb0: /* block device characteristics */ +case 0xb0: /* block limits */ { +unsigned int unmap_sectors = +s-qdev.conf.discard_granularity / s-qdev.blocksize; unsigned int min_io_size = s-qdev.conf.min_io_size / s-qdev.blocksize; unsigned int opt_io_size = @@ -492,6 +495,21 @@ static int scsi_disk_emulate_inquiry(SCS outbuf[13] = (opt_io_size 16) 0xff; outbuf[14] = (opt_io_size 8) 0xff; outbuf[15] = opt_io_size 0xff; + +/* optimal unmap granularity */ +outbuf[28] = (unmap_sectors 24) 0xff; +outbuf[29] = (unmap_sectors 16) 0xff; +outbuf[30] = (unmap_sectors 8) 0xff; +outbuf[31] = unmap_sectors 0xff; +break; +} +case 0xb2: /* thin provisioning */ +{ +outbuf[3] = buflen = 8; +outbuf[4] = 0; +outbuf[5] = 0x40; /* write same with unmap supported */ +outbuf[6] = 0; +outbuf[7] = 0; break; } default: @@ -959,6 +977,11 @@ static int scsi_disk_emulate_command(SCS outbuf[11] = 0; outbuf[12] = 0; outbuf[13] = get_physical_block_exp(s-qdev.conf); + +/* set TPE bit if the format supports discard */ +if (s-qdev.conf.discard_granularity) +outbuf[14] = 0x80; Missing braces. + /* Protection, exponent and lowest lba field left blank. */ buflen = req-cmd.xfer; break; @@ -1123,6 +1146,34 @@ static int32_t scsi_send_command(SCSIDev goto illegal_lba; } break; +case WRITE_SAME_16: +len = r-req.cmd.xfer / d-blocksize; + +DPRINTF(WRITE SAME(16) (sector % PRId64 , count %d)\n, +r-req.cmd.lba, len); + +if (r-req.cmd.lba s-max_lba) { +goto illegal_lba; +} + +/* + * We only support WRITE SAME with the unmap bit set for now. + */ +if (!(buf[1] 0x8)) { +goto fail; +} + +rc = bdrv_discard(s-bs, r-req.cmd.lba * s-cluster_size, + len * s-cluster_size); +if (rc 0) { +/* XXX: better error code ?*/ +goto fail; +} + +scsi_req_set_status(r, GOOD, NO_SENSE); +scsi_req_complete(r-req); +scsi_remove_request(r); Isn't this the same as scsi_command_complete()? +return 0; And if you break; here instead of returning (like all other commands) and remove the three lines above completely, I think it would just do the right thing. Kevin
[Qemu-devel] [PATCH v2] ide: Register vm change state handler once only
We register the vm change state handler in a PCI BAR map() function. This function can be called multiple times throughout the lifetime of a PCI IDE device. This results in duplicate vm change state handlers being register, none of which are ever unregistered. Instead, register the vm change state handler in the device's init function once and for all. piix tested, cmd646 and via not tested. Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com --- v2: * Rebased on kevin/block hw/ide/cmd646.c | 18 ++ hw/ide/piix.c | 34 -- hw/ide/via.c| 34 -- 3 files changed, 58 insertions(+), 28 deletions(-) diff --git a/hw/ide/cmd646.c b/hw/ide/cmd646.c index e191ee6..89ba836 100644 --- a/hw/ide/cmd646.c +++ b/hw/ide/cmd646.c @@ -167,10 +167,6 @@ static void bmdma_map(PCIDevice *pci_dev, int region_num, for(i = 0;i 2; i++) { BMDMAState *bm = d-bmdma[i]; -bmdma_init(d-bus[i], bm); -bm-bus = d-bus+i; -qemu_add_vm_change_state_handler(d-bus[i].dma-ops-restart_cb, - bm-dma); if (i == 0) { register_ioport_write(addr, 4, 1, bmdma_writeb_0, d); @@ -228,6 +224,7 @@ static int pci_cmd646_ide_initfn(PCIDevice *dev) PCIIDEState *d = DO_UPCAST(PCIIDEState, dev, dev); uint8_t *pci_conf = d-dev.config; qemu_irq *irq; +int i; pci_config_set_vendor_id(pci_conf, PCI_VENDOR_ID_CMD); pci_config_set_device_id(pci_conf, PCI_DEVICE_ID_CMD_646); @@ -253,10 +250,15 @@ static int pci_cmd646_ide_initfn(PCIDevice *dev) pci_conf[PCI_INTERRUPT_PIN] = 0x01; // interrupt on pin 1 irq = qemu_allocate_irqs(cmd646_set_irq, d, 2); -ide_bus_new(d-bus[0], d-dev.qdev, 0); -ide_bus_new(d-bus[1], d-dev.qdev, 1); -ide_init2(d-bus[0], irq[0]); -ide_init2(d-bus[1], irq[1]); +for (i = 0; i 2; i++) { +ide_bus_new(d-bus[i], d-dev.qdev, i); +ide_init2(d-bus[i], irq[i]); + +bmdma_init(d-bus[i], d-bmdma[i]); +bm-bus = d-bus[i]; +qemu_add_vm_change_state_handler(d-bus[i].dma-ops-restart_cb, + d-bmdma[i]-dma); +} vmstate_register(dev-qdev, 0, vmstate_ide_pci, d); qemu_register_reset(cmd646_reset, d); diff --git a/hw/ide/piix.c b/hw/ide/piix.c index a6b5d92..1cad906 100644 --- a/hw/ide/piix.c +++ b/hw/ide/piix.c @@ -76,10 +76,6 @@ static void bmdma_map(PCIDevice *pci_dev, int region_num, for(i = 0;i 2; i++) { BMDMAState *bm = d-bmdma[i]; -bmdma_init(d-bus[i], bm); -bm-bus = d-bus+i; -qemu_add_vm_change_state_handler(d-bus[i].dma-ops-restart_cb, - bm-dma); register_ioport_write(addr, 1, 1, bmdma_cmd_writeb, bm); @@ -112,6 +108,29 @@ static void piix3_reset(void *opaque) pci_conf[0x20] = 0x01; /* BMIBA: 20-23h */ } +static void pci_piix_init_ports(PCIIDEState *d) { +int i; +struct { +int iobase; +int iobase2; +int isairq; +} port_info[] = { +{0x1f0, 0x3f6, 14}, +{0x170, 0x376, 15}, +}; + +for (i = 0; i 2; i++) { +ide_bus_new(d-bus[i], d-dev.qdev, i); +ide_init_ioport(d-bus[i], port_info[i].iobase, port_info[i].iobase2); +ide_init2(d-bus[i], isa_reserve_irq(port_info[i].isairq)); + +bmdma_init(d-bus[i], d-bmdma[i]); +d-bmdma[i].bus = d-bus[i]; +qemu_add_vm_change_state_handler(d-bus[i].dma-ops-restart_cb, + d-bmdma[i].dma); +} +} + static int pci_piix_ide_initfn(PCIIDEState *d) { uint8_t *pci_conf = d-dev.config; @@ -125,13 +144,8 @@ static int pci_piix_ide_initfn(PCIIDEState *d) vmstate_register(d-dev.qdev, 0, vmstate_ide_pci, d); -ide_bus_new(d-bus[0], d-dev.qdev, 0); -ide_bus_new(d-bus[1], d-dev.qdev, 1); -ide_init_ioport(d-bus[0], 0x1f0, 0x3f6); -ide_init_ioport(d-bus[1], 0x170, 0x376); +pci_piix_init_ports(d); -ide_init2(d-bus[0], isa_reserve_irq(14)); -ide_init2(d-bus[1], isa_reserve_irq(15)); return 0; } diff --git a/hw/ide/via.c b/hw/ide/via.c index 2603110..5b70bd2 100644 --- a/hw/ide/via.c +++ b/hw/ide/via.c @@ -78,10 +78,6 @@ static void bmdma_map(PCIDevice *pci_dev, int region_num, for(i = 0;i 2; i++) { BMDMAState *bm = d-bmdma[i]; -bmdma_init(d-bus[i], bm); -bm-bus = d-bus+i; -qemu_add_vm_change_state_handler(d-bus[i].dma-ops-restart_cb, - bm-dma); register_ioport_write(addr, 1, 1, bmdma_cmd_writeb, bm); @@ -135,6 +131,29 @@ static void via_reset(void *opaque) pci_set_long(pci_conf + 0xc0, 0x00020001); } +static void vt82c686b_init_ports(PCIIDEState *d) { +int i; +struct { +int iobase; +int iobase2; +int isairq; +} port_info[] = { +{0x1f0, 0x3f6,
Re: [Qemu-devel] classic emulator Vs QEMU-TCG
Hi Stefano I'll try to share what I know about TCG.. On Thu, Dec 16, 2010 at 22:20, Stefano Bonifazi stefboombas...@gmail.com wrote: Actually as a student, I've never developed even a simple classic emulator myself, you're not alone...trust me.. :) but in my idea it should follow this flow: 1) Fetch target instruction i.e. PC(0x532652) : 0x104265 (I am just inventing) 2) Decode Opcode 0x10 : ADD, R1: 0x42, R2: 0x65 3) Look up instruction function table: switch(opcode) case add : add(R1, R2) break; 4) Execution void add(int R1, int R2) { env-reg[R1] = env-reg[R1] + env[R2];} You're right. Basically, we're taught that emulation is like big giant swith..case with lots of condition. And that's exactly what Bochs does AFAIK... The pros of this approach is instruction could be simulated as precise as possible and we could have more precise control about timing...however the cons is... as we saw that big case branching...cache miss could likely happen (in host machine I mean) and pipeline stalls might happen more. By doing what Qemu does, be it using the old dyngen or new TCG, we try to maintain execution fluidity by interpreting instruction as less as possible and strings together the already translated blocks ... And don't forget that Qemu sometimes does things like lazy flags update, somewhat simple dead code elimination and so on. More like tiny compiler...right? Now all of that would be compiled offline for the host machine and at runtime the host macine would just execute the binary host code for the instruction env-reg[R1] = env-reg[R1] + env[R2]; (its host binary translation) My big doubt is, how can I execute that new binary? .. Shall TCG put it in some memory location, and then make the process branch to that address (and then back) ? I really can't see how that happens in the code :( in cpu-exec.c : cpu_exec_nocache i find: /* execute the generated code */ next_tb = tcg_qemu_tb_exec(tb-tc_ptr); and in cpu-exec.c : cpu_exec /* execute the generated code */ next_tb = tcg_qemu_tb_exec(tc_ptr); so I thought tcg_qemu_tb_exec function should do the work of executing the translated binary in the host. But then I found out it is just a define in tcg.h: #define tcg_qemu_tb_exec(tb_ptr) ((long REGPARM (*)(void *))code_gen_prologue)(tb_ptr) and again in exec.c uint8_t code_gen_prologue[1024] code_gen_section; Maybe I have some problems with that C syntax, but I really don't understand what happens there.. how the execution happens! With my limited C knowledge, I saw that as a instruction jump (to tb_ptr). The code_gen_prologue seems to me like a cast. casting each opcode in tb_ptr as uint8_t with maximum length=1024 I hope that's the right interpretation...I must admit Qemu is full of gcc and C tricks here and there... -- regards, Mulyadi Santosa Freelance Linux trainer and consultant blog: the-hydra.blogspot.com training: mulyaditraining.blogspot.com
[Qemu-devel] Re: [PATCH 05/21] virtio: modify save/load handler to handle inuse varialble.
2010/12/16 Michael S. Tsirkin m...@redhat.com: On Thu, Dec 16, 2010 at 11:28:46PM +0900, Yoshiaki Tamura wrote: 2010/12/16 Michael S. Tsirkin m...@redhat.com: On Thu, Dec 16, 2010 at 04:36:16PM +0900, Yoshiaki Tamura wrote: 2010/12/3 Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp: 2010/12/2 Michael S. Tsirkin m...@redhat.com: On Wed, Dec 01, 2010 at 05:03:43PM +0900, Yoshiaki Tamura wrote: 2010/11/28 Michael S. Tsirkin m...@redhat.com: On Sun, Nov 28, 2010 at 08:27:58PM +0900, Yoshiaki Tamura wrote: 2010/11/28 Michael S. Tsirkin m...@redhat.com: On Thu, Nov 25, 2010 at 03:06:44PM +0900, Yoshiaki Tamura wrote: Modify inuse type to uint16_t, let save/load to handle, and revert last_avail_idx with inuse if there are outstanding emulation. Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp This changes migration format, so it will break compatibility with existing drivers. More generally, I think migrating internal state that is not guest visible is always a mistake as it ties migration format to an internal implementation (yes, I know we do this sometimes, but we should at least try not to add such cases). I think the right thing to do in this case is to flush outstanding work when vm is stopped. Then, we are guaranteed that inuse is 0. I sent patches that do this for virtio net and block. Could you give me the link of your patches? I'd like to test whether they work with Kemari upon failover. If they do, I'm happy to drop this patch. Yoshi Look for this: stable migration image on a stopped vm sent on: Wed, 24 Nov 2010 17:52:49 +0200 Thanks for the info. However, The patch series above didn't solve the issue. In case of Kemari, inuse is mostly 0 because it queues the output, and while last_avail_idx gets incremented immediately, not sending inuse makes the state inconsistent between Primary and Secondary. Hmm. Can we simply avoid incrementing last_avail_idx? I think we can calculate or prepare an internal last_avail_idx, and update the external when inuse is decremented. I'll try whether it work w/ w/o Kemari. Hi Michael, Could you please take a look at the following patch? Which version is this against? Oops. It should be very old. 67f895bfe69f323b427b284430b6219c8a62e8d4 commit 36ee7910059e6b236fe9467a609f5b4aed866912 Author: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp Date: Thu Dec 16 14:50:54 2010 +0900 virtio: update last_avail_idx when inuse is decreased. Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp It would be better to have a commit description explaining why a change is made, and why it is correct, not just repeating what can be seen from the diff anyway. Sorry for being lazy here. diff --git a/hw/virtio.c b/hw/virtio.c index c8a0fc6..6688c02 100644 --- a/hw/virtio.c +++ b/hw/virtio.c @@ -237,6 +237,7 @@ void virtqueue_flush(VirtQueue *vq, unsigned int count) wmb(); trace_virtqueue_flush(vq, count); vring_used_idx_increment(vq, count); + vq-last_avail_idx += count; vq-inuse -= count; } @@ -385,7 +386,7 @@ int virtqueue_pop(VirtQueue *vq, VirtQueueElement *elem) unsigned int i, head, max; target_phys_addr_t desc_pa = vq-vring.desc; - if (!virtqueue_num_heads(vq, vq-last_avail_idx)) + if (!virtqueue_num_heads(vq, vq-last_avail_idx + vq-inuse)) return 0; /* When we start there are none of either input nor output. */ @@ -393,7 +394,7 @@ int virtqueue_pop(VirtQueue *vq, VirtQueueElement *elem) max = vq-vring.num; - i = head = virtqueue_get_head(vq, vq-last_avail_idx++); + i = head = virtqueue_get_head(vq, vq-last_avail_idx + vq-inuse); if (vring_desc_flags(desc_pa, i) VRING_DESC_F_INDIRECT) { if (vring_desc_len(desc_pa, i) % sizeof(VRingDesc)) { Hmm, will virtio_queue_empty be wrong now? What about virtqueue_avail_bytes? I think there are two problems. 1. When to update last_avail_idx. 2. The ordering issue you're mentioning below. The patch above is only trying to address 1 because last time you mentioned that modifying last_avail_idx upon save may break the guest, which I agree. If virtio_queue_empty and virtqueue_avail_bytes are only used internally, meaning invisible to the guest, I guess the approach above can be applied too. So IMHO 2 is the real issue. This is what was problematic with the save patch, otherwise of course changes in save are better than changes all over the codebase. All right. Then let's focus on 2 first. Previous patch version sure looked simpler, and this seems functionally equivalent, so my question still stands: here it is rephrased in a different way: assume that we have in avail ring 2 requests
[Qemu-devel] [PATCH 2/2] Add proper -errno error return values to qcow2_open()
From: Jes Sorensen jes.soren...@redhat.com In addition this adds missing braces to the function to be consistent with the coding style. Signed-off-by: Jes Sorensen jes.soren...@redhat.com --- block/qcow2.c | 61 1 files changed, 43 insertions(+), 18 deletions(-) diff --git a/block/qcow2.c b/block/qcow2.c index d7fd167..b4a9e5e 100644 --- a/block/qcow2.c +++ b/block/qcow2.c @@ -140,12 +140,14 @@ static int qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset, static int qcow2_open(BlockDriverState *bs, int flags) { BDRVQcowState *s = bs-opaque; -int len, i; +int len, i, ret = 0; QCowHeader header; uint64_t ext_end; -if (bdrv_pread(bs-file, 0, header, sizeof(header)) != sizeof(header)) +if (bdrv_pread(bs-file, 0, header, sizeof(header)) != sizeof(header)) { +ret = -EIO; goto fail; +} be32_to_cpus(header.magic); be32_to_cpus(header.version); be64_to_cpus(header.backing_file_offset); @@ -160,16 +162,23 @@ static int qcow2_open(BlockDriverState *bs, int flags) be64_to_cpus(header.snapshots_offset); be32_to_cpus(header.nb_snapshots); -if (header.magic != QCOW_MAGIC || header.version != QCOW_VERSION) +if (header.magic != QCOW_MAGIC || header.version != QCOW_VERSION) { +ret = -EINVAL; goto fail; +} if (header.cluster_bits MIN_CLUSTER_BITS || -header.cluster_bits MAX_CLUSTER_BITS) +header.cluster_bits MAX_CLUSTER_BITS) { +ret = -EINVAL; goto fail; -if (header.crypt_method QCOW_CRYPT_AES) +} +if (header.crypt_method QCOW_CRYPT_AES) { +ret = -EINVAL; goto fail; +} s-crypt_method_header = header.crypt_method; -if (s-crypt_method_header) +if (s-crypt_method_header) { bs-encrypted = 1; +} s-cluster_bits = header.cluster_bits; s-cluster_size = 1 s-cluster_bits; s-cluster_sectors = 1 (s-cluster_bits - 9); @@ -191,15 +200,20 @@ static int qcow2_open(BlockDriverState *bs, int flags) s-l1_vm_state_index = size_to_l1(s, header.size); /* the L1 table must contain at least enough entries to put header.size bytes */ -if (s-l1_size s-l1_vm_state_index) +if (s-l1_size s-l1_vm_state_index) { +ret = -EINVAL; goto fail; +} s-l1_table_offset = header.l1_table_offset; if (s-l1_size 0) { s-l1_table = qemu_mallocz( align_offset(s-l1_size * sizeof(uint64_t), 512)); -if (bdrv_pread(bs-file, s-l1_table_offset, s-l1_table, s-l1_size * sizeof(uint64_t)) != -s-l1_size * sizeof(uint64_t)) +if (bdrv_pread(bs-file, s-l1_table_offset, s-l1_table, + s-l1_size * sizeof(uint64_t)) != +s-l1_size * sizeof(uint64_t)) { +ret = -EIO; goto fail; +} for(i = 0;i s-l1_size; i++) { be64_to_cpus(s-l1_table[i]); } @@ -212,35 +226,46 @@ static int qcow2_open(BlockDriverState *bs, int flags) + 512); s-cluster_cache_offset = -1; -if (qcow2_refcount_init(bs) 0) +ret = qcow2_refcount_init(bs); +if (ret != 0) { goto fail; +} QLIST_INIT(s-cluster_allocs); /* read qcow2 extensions */ -if (header.backing_file_offset) +if (header.backing_file_offset) { ext_end = header.backing_file_offset; -else +} else { ext_end = s-cluster_size; -if (qcow2_read_extensions(bs, sizeof(header), ext_end)) +} +if (qcow2_read_extensions(bs, sizeof(header), ext_end)) { +ret = -EINVAL; goto fail; +} /* read the backing file name */ if (header.backing_file_offset != 0) { len = header.backing_file_size; -if (len 1023) +if (len 1023) { len = 1023; -if (bdrv_pread(bs-file, header.backing_file_offset, bs-backing_file, len) != len) +} +if (bdrv_pread(bs-file, header.backing_file_offset, + bs-backing_file, len) != len) { +ret = -EIO; goto fail; +} bs-backing_file[len] = '\0'; } -if (qcow2_read_snapshots(bs) 0) +if (qcow2_read_snapshots(bs) 0) { +ret = -EINVAL; goto fail; +} #ifdef DEBUG_ALLOC qcow2_check_refcounts(bs); #endif -return 0; +return ret; fail: qcow2_free_snapshots(bs); @@ -249,7 +274,7 @@ static int qcow2_open(BlockDriverState *bs, int flags) qemu_free(s-l2_cache); qemu_free(s-cluster_cache); qemu_free(s-cluster_data); -return -1; +return ret; } static int qcow2_set_key(BlockDriverState *bs, const char *key) -- 1.7.3.3
[Qemu-devel] [PATCH 0/2] qcow2 cleanups
From: Jes Sorensen jes.soren...@redhat.com Hi, These two patches tries to clean up the qcow2 code a little. First makes the function names consistent, ie. we shouldn't have qcow_ functions in the qcow2 code. Second tries to add proper errno return values to qcow2_open(). Jes Sorensen (2): block/qcow2.c: rename qcow_ functions to qcow2_ Add proper -errno error return values to qcow2_open() block/qcow2-cluster.c |6 +- block/qcow2-snapshot.c |6 +- block/qcow2.c | 269 +++- 3 files changed, 158 insertions(+), 123 deletions(-) -- 1.7.3.3
[Qemu-devel] [PATCH 1/2] block/qcow2.c: rename qcow_ functions to qcow2_
From: Jes Sorensen jes.soren...@redhat.com It doesn't really make sense for functions in qcow2.c to be named qcow_ so convert the names to match correctly. Signed-off-by: Jes Sorensen jes.soren...@redhat.com --- block/qcow2-cluster.c |6 +- block/qcow2-snapshot.c |6 +- block/qcow2.c | 210 +--- 3 files changed, 116 insertions(+), 106 deletions(-) diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c index b040208..6928c63 100644 --- a/block/qcow2-cluster.c +++ b/block/qcow2-cluster.c @@ -352,8 +352,8 @@ void qcow2_encrypt_sectors(BDRVQcowState *s, int64_t sector_num, } -static int qcow_read(BlockDriverState *bs, int64_t sector_num, - uint8_t *buf, int nb_sectors) +static int qcow2_read(BlockDriverState *bs, int64_t sector_num, + uint8_t *buf, int nb_sectors) { BDRVQcowState *s = bs-opaque; int ret, index_in_cluster, n, n1; @@ -419,7 +419,7 @@ static int copy_sectors(BlockDriverState *bs, uint64_t start_sect, if (n = 0) return 0; BLKDBG_EVENT(bs-file, BLKDBG_COW_READ); -ret = qcow_read(bs, start_sect + n_start, s-cluster_data, n); +ret = qcow2_read(bs, start_sect + n_start, s-cluster_data, n); if (ret 0) return ret; if (s-crypt_method) { diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c index aacf357..74823a5 100644 --- a/block/qcow2-snapshot.c +++ b/block/qcow2-snapshot.c @@ -116,7 +116,7 @@ int qcow2_read_snapshots(BlockDriverState *bs) } /* add at the end of the file a new list of snapshots */ -static int qcow_write_snapshots(BlockDriverState *bs) +static int qcow2_write_snapshots(BlockDriverState *bs) { BDRVQcowState *s = bs-opaque; QCowSnapshot *sn; @@ -300,7 +300,7 @@ int qcow2_snapshot_create(BlockDriverState *bs, QEMUSnapshotInfo *sn_info) s-snapshots = snapshots1; s-snapshots[s-nb_snapshots++] = *sn; -if (qcow_write_snapshots(bs) 0) +if (qcow2_write_snapshots(bs) 0) goto fail; #ifdef DEBUG_ALLOC qcow2_check_refcounts(bs); @@ -378,7 +378,7 @@ int qcow2_snapshot_delete(BlockDriverState *bs, const char *snapshot_id) qemu_free(sn-name); memmove(sn, sn + 1, (s-nb_snapshots - snapshot_index - 1) * sizeof(*sn)); s-nb_snapshots--; -ret = qcow_write_snapshots(bs); +ret = qcow2_write_snapshots(bs); if (ret 0) { /* XXX: restore snapshot if error ? */ return ret; diff --git a/block/qcow2.c b/block/qcow2.c index 537c479..d7fd167 100644 --- a/block/qcow2.c +++ b/block/qcow2.c @@ -50,10 +50,10 @@ typedef struct { uint32_t magic; uint32_t len; } QCowExtension; -#define QCOW_EXT_MAGIC_END 0 -#define QCOW_EXT_MAGIC_BACKING_FORMAT 0xE2792ACA +#define QCOW2_EXT_MAGIC_END 0 +#define QCOW2_EXT_MAGIC_BACKING_FORMAT 0xE2792ACA -static int qcow_probe(const uint8_t *buf, int buf_size, const char *filename) +static int qcow2_probe(const uint8_t *buf, int buf_size, const char *filename) { const QCowHeader *cow_header = (const void *)buf; @@ -73,14 +73,14 @@ static int qcow_probe(const uint8_t *buf, int buf_size, const char *filename) * unknown magic is skipped (future extension this version knows nothing about) * return 0 upon success, non-0 otherwise */ -static int qcow_read_extensions(BlockDriverState *bs, uint64_t start_offset, -uint64_t end_offset) +static int qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset, + uint64_t end_offset) { QCowExtension ext; uint64_t offset; #ifdef DEBUG_EXT -printf(qcow_read_extensions: start=%ld end=%ld\n, start_offset, end_offset); +printf(qcow2_read_extensions: start=%ld end=%ld\n, start_offset, end_offset); #endif offset = start_offset; while (offset end_offset) { @@ -88,13 +88,13 @@ static int qcow_read_extensions(BlockDriverState *bs, uint64_t start_offset, #ifdef DEBUG_EXT /* Sanity check */ if (offset s-cluster_size) -printf(qcow_handle_extension: suspicious offset %lu\n, offset); +printf(qcow_read_extension: suspicious offset %lu\n, offset); printf(attemting to read extended header in offset %lu\n, offset); #endif if (bdrv_pread(bs-file, offset, ext, sizeof(ext)) != sizeof(ext)) { -fprintf(stderr, qcow_handle_extension: ERROR: +fprintf(stderr, qcow_read_extension: ERROR: pread fail from offset % PRIu64 \n, offset); return 1; @@ -106,10 +106,10 @@ static int qcow_read_extensions(BlockDriverState *bs, uint64_t start_offset, printf(ext.magic = 0x%x\n, ext.magic); #endif switch (ext.magic) { -case QCOW_EXT_MAGIC_END: +case QCOW2_EXT_MAGIC_END: return 0; -case QCOW_EXT_MAGIC_BACKING_FORMAT: +case QCOW2_EXT_MAGIC_BACKING_FORMAT: if
[Qemu-devel] -snapshot
Hello, Is a bad idea if I run multiples qemu vms pointing to the same disk img using -snapshot parameter? What kind of problem may I have? Thanks,
[Qemu-devel] Re: [PATCH 11/21] ioport: insert event_tap_ioport() to ioport_write().
On Thu, Dec 16, 2010 at 9:50 AM, Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp wrote: 2010/12/16 Michael S. Tsirkin m...@redhat.com: On Thu, Dec 16, 2010 at 04:37:41PM +0900, Yoshiaki Tamura wrote: 2010/11/28 Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp: 2010/11/28 Michael S. Tsirkin m...@redhat.com: On Thu, Nov 25, 2010 at 03:06:50PM +0900, Yoshiaki Tamura wrote: Record ioport event to replay it upon failover. Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp Interesting. This will have to be extended to support ioeventfd. Since each eventfd is really just a binary trigger it should be enough to read out the fd state. Haven't thought about eventfd yet. Will try doing it in the next spin. Hi Michael, I looked into eventfd and realized it's only used with vhost now. There are patches on list to use it for block/userspace net. Thanks. Now I understand. In that case, inserting an even-tap function to the following code should be appropriate? int event_notifier_test_and_clear(EventNotifier *e) { uint64_t value; int r = read(e-fd, value, sizeof(value)); return r == sizeof(value); } However, I believe vhost bypass the net layer in qemu, and there is no way for Kemari to detect the outputs. To me, it doesn't make sense to extend this patch to support eventfd... Here is the userspace ioeventfd patch series: http://www.mail-archive.com/qemu-devel@nongnu.org/msg49208.html Instead of switching to QEMU userspace to handle the virtqueue kick pio write, we signal the eventfd inside the kernel and resume guest code execution. The I/O thread can then process the virtqueue kick in parallel to guest code execution. I think this can still be tied into Kemari. If you are switching to a pure net/block-layer event tap instead of pio/mmio, then I think it should just work. For vhost it would be more difficult to integrate with Kemari. Stefan
Re: [Qemu-devel] -snapshot
On Thu, Dec 16, 2010 at 4:21 PM, Amador Pahim ama...@pahim.org wrote: Is a bad idea if I run multiples qemu vms pointing to the same disk img using -snapshot parameter? What kind of problem may I have? It should work fine. -snapshot means that QEMU creates a temporary qcow2 file backed by the disk image you've given it. No changes will be made to your disk image - it is read-only. Stefan
Re: [Qemu-devel] -snapshot
Hello Stefan, Thank you for your answer. Just one more question: If, while my snapshot vms are running, the main disk is modified by a non snapshot vm? For example, installing some extra software.. this can freeze vms or something? Regards, Pahim On Thu, Dec 16, 2010 at 2:28 PM, Stefan Hajnoczi stefa...@gmail.com wrote: On Thu, Dec 16, 2010 at 4:21 PM, Amador Pahim ama...@pahim.org wrote: Is a bad idea if I run multiples qemu vms pointing to the same disk img using -snapshot parameter? What kind of problem may I have? It should work fine. -snapshot means that QEMU creates a temporary qcow2 file backed by the disk image you've given it. No changes will be made to your disk image - it is read-only. Stefan
Re: [Qemu-devel] ]PATCH 0/7] add TRIM/UNMAP support, v3
On Thu, Dec 16, 2010 at 04:43:20PM +0100, Kevin Wolf wrote: The SCSI changes seem to apply, but the rest conflicts with recent changes, most notably AHCI. Can you rebase on top of the block branch? I've tried to, but with the maze of pointer indirections I'm totally lost for now. I'll have to drop the IDE side for now until that area gets less messy or I get a lot more time.
[Qemu-devel] Re: [PATCH v5 0/4] virtio: Use ioeventfd for virtqueue notify
On Wed, Dec 15, 2010 at 12:59 PM, Stefan Hajnoczi stefa...@gmail.com wrote: On Wed, Dec 15, 2010 at 12:14 PM, Michael S. Tsirkin m...@redhat.com wrote: On Wed, Dec 15, 2010 at 11:42:12AM +, Stefan Hajnoczi wrote: Are you happy with this patchset if I remove virtio-net-pci ioeventfd=on|off so only virtio-blk-pci has ioeventfd=on|off (with default on)? For block we've found it to be a win and the initial results looked good for net too. Please let me know if I should disable ioeventfd for virtio-net. Stefan
Re: [Qemu-devel] [PATCH 2/7] scsi-disk: support WRITE SAME (16) with unmap bit
On Thu, Dec 16, 2010 at 04:48:15PM +0100, Kevin Wolf wrote: +scsi_req_set_status(r, GOOD, NO_SENSE); +scsi_req_complete(r-req); +scsi_remove_request(r); Isn't this the same as scsi_command_complete()? Yes. +return 0; And if you break; here instead of returning (like all other commands) and remove the three lines above completely, I think it would just do the right thing. Yes, that looks doable.
Re: [Qemu-devel] [PATCH 1/1] spice: add chardev
On Thu, Dec 16, 2010 at 02:21:16PM +0100, Gerd Hoffmann wrote: On 12/16/10 12:29, Alon Levy wrote: Adding a chardev backend for spice, for usage by spice vdagent in conjunction with a properly named virtio-serial device. Usage example would be nice here. +#ifdef CONFIG_SPICE +#include spice-qemu-char.h +#endif #ifdef can be dropped. ok. +#ifdef CONFIG_SPICE +{ +.name = name, +.type = QEMU_OPT_STRING, +},{ +.name = debug, +.type = QEMU_OPT_NUMBER, +}, +#endif This too. ok. @@ -1381,7 +1384,10 @@ Backend is one of: @option{stdio}, @option{braille}, @option{tty}, -...@option{parport}. +...@option{parport} +#if defined(CONFIG_SPICE) +...@option{spicevmc}. +#endif This too, documentation should be there unconditionally. ok. +//#define SPICE_QEMU_CHAR_USE_IOCTL Why is this disabled? Does it depend on the chardev patches from Amit? There was a long discussion that concluded we don't want IOCTL's at all, and that there should be some other mechanism for connection state communication between the two sides. Meanwhile I found out I don't need these (I don't remember exactly what I used instead, but basically just the regular results of write/read). diff --git a/spice-qemu-char.h b/spice-qemu-char.h new file mode 100644 index 000..32d5009 --- /dev/null +++ b/spice-qemu-char.h @@ -0,0 +1,9 @@ +#ifndef __SPICE_QEMU_CHAR_H__ +#define __SPICE_QEMU_CHAR_H__ + +#include qemu-char.h + +CharDriverState *qemu_chr_open_spice(QemuOpts *opts); + +#endif // __SPICE_QEMU_CHAR_H__ + Hmm, maybe add this to ui/qemu-spice.h instead, so we don't clutter the tree with lots of tiny includes? ok. cheers, Gerd
Re: [Qemu-devel] [PATCH 5/5] ccid: add docs
On Thu, Dec 16, 2010 at 02:01:34PM +0100, Gerd Hoffmann wrote: On 12/16/10 12:06, Alon Levy wrote: Add documentation for the usb-ccid device and accompanying two card devices, ccid-card-emulated and ccid-card-passthru. --- docs/ccid.txt | 125 ++ docs/libcacard.txt | 483 Guess that isn't intentional, right? doh! +A typical interchange is: + +client event | vscclient |passthru| usb-ccid | guest event +-- + | VSC_Init|| | + | VSC_ReaderAdd || attach | + | || | sees new usb device. +card inserted | || | + | VSC_ATR || | + | || | guest operation, APDU transfer via CCID + | | VSC_APDU | | + | VSC_APDU|| | +client-physical | || | +card APDU exchange| || | +[APDU-APDU repeats several times] +card removed | || | + | VSC_CardRemove || | +kill/quit | || | + vscclient | || | + | VSC_ReaderRemove||detach | + | || | usb device removed. This needs updating, doesn't it? I think you plugin and -out just the cards on the ccid bus instead of the whole usb device, right? actually I'm not sure.. I'll check and update. cheers, Gerd
Re: [Qemu-devel] [PATCH 1/1] spice: add chardev
Hi, +//#define SPICE_QEMU_CHAR_USE_IOCTL Why is this disabled? Does it depend on the chardev patches from Amit? There was a long discussion that concluded we don't want IOCTL's at all, and that there should be some other mechanism for connection state communication between the two sides. Meanwhile I found out I don't need these (I don't remember exactly what I used instead, but basically just the regular results of write/read). Ok, so when it is obsolete now it can be dropped altogether I guess? cheers, Gerd
Re: [Qemu-devel] -snapshot
On Thu, Dec 16, 2010 at 4:34 PM, Amador Pahim ama...@pahim.org wrote: Thank you for your answer. Just one more question: If, while my snapshot vms are running, the main disk is modified by a non snapshot vm? For example, installing some extra software.. this can freeze vms or something? Correct, it is not safe to modify the base image while there is another disk image backed off it. The reason for this is that the image only needs to store the changes that were made on top of the base image. For anything which hasn't been modified it will go back to the base image and read data from there. If you modify the base image, then the filesystem in the base image is not longer what your image file was created from and you have an inconsistent view of the disk. It leads to odd behavior and is unsafe. Stefan
Re: [Qemu-devel] [PATCH v2] Remove NULL checks for bdrv_new return value
On Thu, Dec 16, 2010 at 2:44 PM, Kevin Wolf kw...@redhat.com wrote: It's an indirect call to qemu_malloc, which never returns an error. Signed-off-by: Kevin Wolf kw...@redhat.com --- v2: - Fixed bdrv_open failure case in xen_disk hw/xen_disk.c | 17 ++--- qemu-img.c | 5 + qemu-io.c | 2 -- qemu-nbd.c | 2 -- 4 files changed, 7 insertions(+), 19 deletions(-) Reviewed-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
[Qemu-devel] Re: [PATCH 4/4] qcow2: Use block-queue
On Mon, Dec 13, 2010 at 4:29 PM, Kevin Wolf kw...@redhat.com wrote: diff --git a/block/qcow2.c b/block/qcow2.c index 537c479..e445913 100644 --- a/block/qcow2.c +++ b/block/qcow2.c @@ -136,6 +136,20 @@ static int qcow_read_extensions(BlockDriverState *bs, uint64_t start_offset, return 0; } +static bool qcow_blkqueue_error_cb(void *opaque, int ret) +{ + BlockDriverState *bs = opaque; + BlockErrorAction action = bdrv_get_on_error(bs, 0); + + if ((action == BLOCK_ERR_STOP_ENOSPC ret == -ENOSPC) + || action == BLOCK_ERR_STOP_ANY) + { + bdrv_mon_event(bs, BDRV_ACTION_STOP, 0); + return vm_stop(0); + } + + return false; +} Not sure this is qcow2-specific. Is it possible to put this in a generic sysemu-specific place and qemu-tool.c? That would allow the vm_stop() change to be dropped. Stefan
[Qemu-devel] [PATCH 0/2] Fix rtl8139 migration with hotplug
Ok, I think this might actually make everyone happy, but I've been known to be wrong about that many times before. Juan challenged me to find an rtl8139 migration scenario that fails when hotplug is not involved (and not switch device creation order since that's a usage bug). I couldn't come up with one. We had been arguing that a subsection didn't make sense for the change to rtl8139 vmstate because the needed function would be {return 1}. but what if we could detect if the VM had done any other hotplugs and only include the subsection in those cases. That's what this short series does. So, I hope Juan is happy because this preserves the migration ABI for the majority of the use cases, and I hope Michael is happy because it does so using a subsection. Thanks, Alex --- Alex Williamson (2): rtl8139: Use subsection to restrict migration after hotplug qdev: Track runtime machine modifications hw/qdev.c| 10 ++ hw/qdev.h|1 + hw/rtl8139.c | 28 +++- 3 files changed, 38 insertions(+), 1 deletions(-)
[Qemu-devel] [patch 1/3] add migration_active function
To query whether migration is active. Signed-off-by: Marcelo Tosatti mtosa...@redhat.com Index: qemu-kvm-block-copy/migration.c === --- qemu-kvm-block-copy.orig/migration.c +++ qemu-kvm-block-copy/migration.c @@ -448,3 +448,13 @@ int migrate_fd_close(void *opaque) qemu_set_fd_handler2(s-fd, NULL, NULL, NULL, NULL); return s-close(s); } + +bool migration_active(void) +{ +if (current_migration +current_migration-get_status(current_migration) == MIG_STATE_ACTIVE) { +return true; +} + +return false; +} Index: qemu-kvm-block-copy/migration.h === --- qemu-kvm-block-copy.orig/migration.h +++ qemu-kvm-block-copy/migration.h @@ -134,4 +134,6 @@ static inline FdMigrationState *migrate_ return container_of(mig_state, FdMigrationState, mig_state); } +bool migration_active(void); + #endif
Re: [Qemu-devel] [PATCH 00/14] [PULL] ARM fixes, v2
On 7 December 2010 15:50, Peter Maydell peter.mayd...@linaro.org wrote: The following changes since commit 2c90fe2b71df2534884bce96d90cbfcc93aeedb8: Kirill Batuzov (1): Speedup 'tb_find_slow' by using the same heuristic as during memory page lookup are available in the git repository at: git://git.linaro.org/qemu/qemu-arm.git for-anthony Ping? -- PMM
[Qemu-devel] [patch 0/3] add support for live block copy
Add support for live block copy. This is similar (and based on), block migration, except it is performed without VM migration.
Re: [Qemu-devel] -snapshot
Am 16.12.2010 18:45, schrieb Stefan Hajnoczi: On Thu, Dec 16, 2010 at 4:34 PM, Amador Pahim ama...@pahim.org wrote: Thank you for your answer. Just one more question: If, while my snapshot vms are running, the main disk is modified by a non snapshot vm? For example, installing some extra software.. this can freeze vms or something? Correct, it is not safe to modify the base image while there is another disk image backed off it. The reason for this is that the image only needs to store the changes that were made on top of the base image. For anything which hasn't been modified it will go back to the base image and read data from there. If you modify the base image, then the filesystem in the base image is not longer what your image file was created from and you have an inconsistent view of the disk. It leads to odd behavior and is unsafe. Stefan There are useful scenarios where using the same disk simultaneously from a snapshot vm and a real system works. If you have a hard disk with a dual boot configuration, it is sometimes useful to boot one configuration with the real system, then start qemu and boot the second configuration. Even booting the same configuration twice (once with the real machine, once with qemu snapshot) is sometimes useful and works to a limited degree. It is a simple way to try new bootloader configurations or other boot setups. Regards Stefan
[Qemu-devel] [PATCH] qdev: sysbus_get_default must not return a NULL pointer (fix regression)
Every system should have some sort of main system bus, so sysbus_get_default should always return a valid bus. Without this patch, at least mipssim and malta no longer start but raise a null pointer access exception (caused by commit ec990eb622ad46df5ddcb1e94c418c271894d416). Cc: Anthony Liguori anth...@codemonkey.ws Signed-off-by: Stefan Weil w...@mail.berlios.de --- hw/qdev.c |9 + 1 files changed, 5 insertions(+), 4 deletions(-) diff --git a/hw/qdev.c b/hw/qdev.c index 10e28df..6fc9b02 100644 --- a/hw/qdev.c +++ b/hw/qdev.c @@ -107,10 +107,7 @@ DeviceState *qdev_create(BusState *bus, const char *name) DeviceInfo *info; if (!bus) { -if (!main_system_bus) { -main_system_bus = qbus_create(system_bus_info, NULL, main-system-bus); -} -bus = main_system_bus; +bus = sysbus_get_default(); } info = qdev_find_info(bus-info, name); @@ -311,6 +308,10 @@ static int qdev_reset_one(DeviceState *dev, void *opaque) BusState *sysbus_get_default(void) { +if (!main_system_bus) { +main_system_bus = qbus_create(system_bus_info, NULL, + main-system-bus); +} return main_system_bus; } -- 1.7.2.3
[Qemu-devel] Re: [PATCH 0/3] add TRIM/UNMAP support, v4
This patchset adds support for the SCSI WRITE SAME with unmap command, which allows reclaiming free space from a backing image. The user facing implementation is pretty complete, but not really efficient because the underlying bdrv_discard implementation doesn't use the aio implementation yet. The reason for that is that the SCSI layer doesn't really allow any asynchronous commands except for READ/WRITE by design. The only support backend so far is the XFS hole punching ioctl, but others can be added easily when they become available. A virtio implementation for a discard command would also be pretty easy, but until we actually support a better backend then a plain sparse file it's not worth using for production enviroments anyway, but more for playing with the thin provisioning infrastructure, or observing guest behaviour when TRIM / unmap is supported. If the support is enabled and the backend doesn't support hole punching the WRITE SAME command becomes a no-op so that migration from hosts supporting or not supporting it works. Version 4: - incorporate Kevin's review comments for the scsi patch - update to the current block queue - drop ide TRIM support, as it doesn't easily port to the refactoring of the IDE code Version 3: - refactor IDE dma support code - proper brace obsfucation - fix compile without xfs headers - use bool instead of int for a one-byte flag Version 2: - replace tabs with spaces - return -ENOMEDIUM from bdrv_discard if there's no driver assigned - actually list the TP EVPD page as supported when querying for supported EVPD pages
[Qemu-devel] [PATCH 1/3] block: add discard support
Add a new bdrv_discard method to free blocks in a mapping image, and a new drive property to set the granularity for these discard. If no discard granularity support is set discard support is disabled. Signed-off-by: Christoph Hellwig h...@lst.de Index: qemu/block.c === --- qemu.orig/block.c 2010-12-16 16:51:43.0 +0100 +++ qemu/block.c2010-12-16 16:52:05.875253180 +0100 @@ -1515,6 +1515,17 @@ int bdrv_has_zero_init(BlockDriverState return 1; } +int bdrv_discard(BlockDriverState *bs, int64_t sector_num, int nb_sectors) +{ +if (!bs-drv) { +return -ENOMEDIUM; +} +if (!bs-drv-bdrv_discard) { +return 0; +} +return bs-drv-bdrv_discard(bs, sector_num, nb_sectors); +} + /* * Returns true iff the specified sector is present in the disk image. Drivers * not implementing the functionality are assumed to not support backing files, Index: qemu/block.h === --- qemu.orig/block.h 2010-12-16 16:51:43.0 +0100 +++ qemu/block.h2010-12-16 16:52:05.875253180 +0100 @@ -146,6 +146,7 @@ int bdrv_flush(BlockDriverState *bs); void bdrv_flush_all(void); void bdrv_close_all(void); +int bdrv_discard(BlockDriverState *bs, int64_t sector_num, int nb_sectors); int bdrv_has_zero_init(BlockDriverState *bs); int bdrv_is_allocated(BlockDriverState *bs, int64_t sector_num, int nb_sectors, int *pnum); Index: qemu/block_int.h === --- qemu.orig/block_int.h 2010-12-16 16:51:43.0 +0100 +++ qemu/block_int.h2010-12-16 16:52:38.542254787 +0100 @@ -72,6 +72,8 @@ struct BlockDriver { BlockDriverCompletionFunc *cb, void *opaque); BlockDriverAIOCB *(*bdrv_aio_flush)(BlockDriverState *bs, BlockDriverCompletionFunc *cb, void *opaque); +int (*bdrv_discard)(BlockDriverState *bs, int64_t sector_num, +int nb_sectors); int (*bdrv_aio_multiwrite)(BlockDriverState *bs, BlockRequest *reqs, int num_reqs); @@ -227,6 +229,7 @@ typedef struct BlockConf { uint16_t min_io_size; uint32_t opt_io_size; int32_t bootindex; +uint32_t discard_granularity; } BlockConf; static inline unsigned int get_physical_block_exp(BlockConf *conf) @@ -250,6 +253,8 @@ static inline unsigned int get_physical_ _conf.physical_block_size, 512), \ DEFINE_PROP_UINT16(min_io_size, _state, _conf.min_io_size, 0), \ DEFINE_PROP_UINT32(opt_io_size, _state, _conf.opt_io_size, 0),\ -DEFINE_PROP_INT32(bootindex, _state, _conf.bootindex, -1) \ +DEFINE_PROP_INT32(bootindex, _state, _conf.bootindex, -1),\ +DEFINE_PROP_UINT32(discard_granularity, _state, \ + _conf.discard_granularity, 0) #endif /* BLOCK_INT_H */ Index: qemu/block/raw.c === --- qemu.orig/block/raw.c 2010-12-15 10:05:38.0 +0100 +++ qemu/block/raw.c2010-12-16 16:52:05.882253111 +0100 @@ -65,6 +65,11 @@ static int raw_probe(const uint8_t *buf, return 1; /* everything can be opened as raw image */ } +static int raw_discard(BlockDriverState *bs, int64_t sector_num, int nb_sectors) +{ +return bdrv_discard(bs-file, sector_num, nb_sectors); +} + static int raw_is_inserted(BlockDriverState *bs) { return bdrv_is_inserted(bs-file); @@ -130,6 +135,7 @@ static BlockDriver bdrv_raw = { .bdrv_aio_readv = raw_aio_readv, .bdrv_aio_writev= raw_aio_writev, .bdrv_aio_flush = raw_aio_flush, +.bdrv_discard = raw_discard, .bdrv_is_inserted = raw_is_inserted, .bdrv_eject = raw_eject,
[Qemu-devel] [PATCH 2/3] scsi-disk: support WRITE SAME (16) with unmap bit
Support discards via the WRITE SAME command with the unmap bit set, and tell the initiator about the support for it via the block limit and the new thin provisioning EVPD pages. Also fix the comment which incorrectly describedthe block limits EVPD page. Signed-off-by: Christoph Hellwig h...@lst.de Index: qemu/hw/scsi-disk.c === --- qemu.orig/hw/scsi-disk.c2010-12-15 10:06:14.260003984 +0100 +++ qemu/hw/scsi-disk.c 2010-12-16 16:57:26.881003566 +0100 @@ -424,7 +424,8 @@ static int scsi_disk_emulate_inquiry(SCS outbuf[buflen++] = 0x80; // unit serial number outbuf[buflen++] = 0x83; // device identification if (bdrv_get_type_hint(s-bs) != BDRV_TYPE_CDROM) { -outbuf[buflen++] = 0xb0; // block device characteristics +outbuf[buflen++] = 0xb0; // block limits +outbuf[buflen++] = 0xb2; // thin provisioning } outbuf[pages] = buflen - pages - 1; // number of pages break; @@ -466,8 +467,10 @@ static int scsi_disk_emulate_inquiry(SCS buflen += id_len; break; } -case 0xb0: /* block device characteristics */ +case 0xb0: /* block limits */ { +unsigned int unmap_sectors = +s-qdev.conf.discard_granularity / s-qdev.blocksize; unsigned int min_io_size = s-qdev.conf.min_io_size / s-qdev.blocksize; unsigned int opt_io_size = @@ -492,6 +495,21 @@ static int scsi_disk_emulate_inquiry(SCS outbuf[13] = (opt_io_size 16) 0xff; outbuf[14] = (opt_io_size 8) 0xff; outbuf[15] = opt_io_size 0xff; + +/* optimal unmap granularity */ +outbuf[28] = (unmap_sectors 24) 0xff; +outbuf[29] = (unmap_sectors 16) 0xff; +outbuf[30] = (unmap_sectors 8) 0xff; +outbuf[31] = unmap_sectors 0xff; +break; +} +case 0xb2: /* thin provisioning */ +{ +outbuf[3] = buflen = 8; +outbuf[4] = 0; +outbuf[5] = 0x40; /* write same with unmap supported */ +outbuf[6] = 0; +outbuf[7] = 0; break; } default: @@ -959,6 +977,12 @@ static int scsi_disk_emulate_command(SCS outbuf[11] = 0; outbuf[12] = 0; outbuf[13] = get_physical_block_exp(s-qdev.conf); + +/* set TPE bit if the format supports discard */ +if (s-qdev.conf.discard_granularity) { +outbuf[14] = 0x80; +} + /* Protection, exponent and lowest lba field left blank. */ buflen = req-cmd.xfer; break; @@ -1123,6 +1147,31 @@ static int32_t scsi_send_command(SCSIDev goto illegal_lba; } break; +case WRITE_SAME_16: +len = r-req.cmd.xfer / d-blocksize; + +DPRINTF(WRITE SAME(16) (sector % PRId64 , count %d)\n, +r-req.cmd.lba, len); + +if (r-req.cmd.lba s-max_lba) { +goto illegal_lba; +} + +/* + * We only support WRITE SAME with the unmap bit set for now. + */ +if (!(buf[1] 0x8)) { +goto fail; +} + +rc = bdrv_discard(s-bs, r-req.cmd.lba * s-cluster_size, + len * s-cluster_size); +if (rc 0) { +/* XXX: better error code ?*/ +goto fail; +} + +break; default: DPRINTF(Unknown SCSI command (%2.2x)\n, buf[0]); fail: Index: qemu/hw/scsi-defs.h === --- qemu.orig/hw/scsi-defs.h2010-12-15 10:05:38.301003914 +0100 +++ qemu/hw/scsi-defs.h 2010-12-16 16:52:55.803004683 +0100 @@ -84,6 +84,7 @@ #define MODE_SENSE_10 0x5a #define PERSISTENT_RESERVE_IN 0x5e #define PERSISTENT_RESERVE_OUT 0x5f +#define WRITE_SAME_16 0x93 #define MAINTENANCE_IN0xa3 #define MAINTENANCE_OUT 0xa4 #define MOVE_MEDIUM 0xa5
[Qemu-devel] [PATCH 3/3] raw-posix: add discard support
Add support to discard blocks in a raw image residing on an XFS filesystem by calling the XFS_IOC_UNRESVSP64 ioctl to punch holes. Support for other hole punching mechanisms can be added when they become available. Signed-off-by: Christoph Hellwig h...@lst.de Index: qemu/block/raw-posix.c === --- qemu.orig/block/raw-posix.c 2010-12-15 10:05:37.0 +0100 +++ qemu/block/raw-posix.c 2010-12-16 17:40:47.617253460 +0100 @@ -69,6 +69,10 @@ #include sys/diskslice.h #endif +#ifdef CONFIG_XFS +#include xfs/xfs.h +#endif + //#define DEBUG_FLOPPY //#define DEBUG_BLOCK @@ -120,6 +124,9 @@ typedef struct BDRVRawState { #endif uint8_t *aligned_buf; unsigned aligned_buf_size; +#ifdef CONFIG_XFS +bool is_xfs : 1; +#endif } BDRVRawState; static int fd_open(BlockDriverState *bs); @@ -196,6 +203,12 @@ static int raw_open_common(BlockDriverSt #endif } +#ifdef CONFIG_XFS +if (platform_test_xfs_fd(s-fd)) { +s-is_xfs = 1; +} +#endif + return 0; out_free_buf: @@ -740,6 +753,36 @@ static int raw_flush(BlockDriverState *b return qemu_fdatasync(s-fd); } +#ifdef CONFIG_XFS +static int xfs_discard(BDRVRawState *s, int64_t sector_num, int nb_sectors) +{ +struct xfs_flock64 fl; + +memset(fl, 0, sizeof(fl)); +fl.l_whence = SEEK_SET; +fl.l_start = sector_num 9; +fl.l_len = (int64_t)nb_sectors 9; + +if (xfsctl(NULL, s-fd, XFS_IOC_UNRESVSP64, fl) 0) { +printf(cannot punch hole (%s)\n, strerror(errno)); +return -errno; +} + +return 0; +} +#endif + +static int raw_discard(BlockDriverState *bs, int64_t sector_num, int nb_sectors) +{ +#ifdef CONFIG_XFS +BDRVRawState *s = bs-opaque; + +if (s-is_xfs) +return xfs_discard(s, sector_num, nb_sectors); +#endif + +return 0; +} static QEMUOptionParameter raw_create_options[] = { { @@ -761,6 +804,7 @@ static BlockDriver bdrv_file = { .bdrv_close = raw_close, .bdrv_create = raw_create, .bdrv_flush = raw_flush, +.bdrv_discard = raw_discard, .bdrv_aio_readv = raw_aio_readv, .bdrv_aio_writev = raw_aio_writev, Index: qemu/configure === --- qemu.orig/configure 2010-12-16 16:51:43.0 +0100 +++ qemu/configure 2010-12-16 17:41:11.410254507 +0100 @@ -288,6 +288,7 @@ xen= linux_aio= attr= vhost_net= +xfs= gprof=no debug_tcg=no @@ -1399,6 +1400,27 @@ EOF fi ## +# xfsctl() probe, used for raw-posix +if test $xfs != no ; then + cat $TMPC EOF +#include xfs/xfs.h +int main(void) +{ +xfsctl(NULL, 0, 0, NULL); +return 0; +} +EOF + if compile_prog ; then +xfs=yes + else +if test $xfs = yes ; then + feature_not_found xfs +fi +xfs=no + fi +fi + +## # vde libraries probe if test $vde != no ; then vde_libs=-lvdeplug @@ -2403,6 +2425,7 @@ echo Trace backend $trace_backend echo Trace output file $trace_file-pid echo spice support $spice echo rbd support $rbd +echo xfsctl support$xfs if test $sdl_too_old = yes; then echo - Your SDL version is too old - please upgrade to have SDL support @@ -2548,6 +2571,9 @@ fi if test $uuid = yes ; then echo CONFIG_UUID=y $config_host_mak fi +if test $xfs = yes ; then + echo CONFIG_XFS=y $config_host_mak +fi qemu_version=`head $source_path/VERSION` echo VERSION=$qemu_version $config_host_mak echo PKGVERSION=$pkgversion $config_host_mak
[Qemu-devel] [PATCH 1/2] qdev: Track runtime machine modifications
Create a trivial interface to track whether the machine has been modified since boot. Adding or removing devices will trigger this to return true. An example usage scenario for such an interface is the rtl8139 driver which includes a cpu_register_io_memory() value in it's migration stream. For the majority of migrations, where no hotplug has occured in the machine, this works correctly. Once the machine is modified, we can use this interface to detect that and include a subsection for the device to prevent migrations to rtl8139 versions with this bug. Signed-off-by: Alex Williamson alex.william...@redhat.com --- hw/qdev.c | 10 ++ hw/qdev.h |1 + 2 files changed, 11 insertions(+), 0 deletions(-) diff --git a/hw/qdev.c b/hw/qdev.c index 35858cb..e6e7a57 100644 --- a/hw/qdev.c +++ b/hw/qdev.c @@ -32,6 +32,8 @@ #include blockdev.h static int qdev_hotplug = 0; +static bool qdev_hot_added = false; +static bool qdev_hot_removed = false; /* This is a nasty hack to allow passing a NULL bus to qdev_create. */ static BusState *main_system_bus; @@ -93,6 +95,7 @@ static DeviceState *qdev_create_from_info(BusState *bus, DeviceInfo *info) if (qdev_hotplug) { assert(bus-allow_hotplug); dev-hotplugged = 1; +qdev_hot_added = true; } dev-instance_id_alias = -1; dev-state = DEV_STATE_CREATED; @@ -305,6 +308,8 @@ int qdev_unplug(DeviceState *dev) } assert(dev-info-unplug != NULL); +qdev_hot_removed = true; + return dev-info-unplug(dev); } @@ -370,6 +375,11 @@ void qdev_machine_creation_done(void) qdev_hotplug = 1; } +bool qdev_machine_modified(void) +{ +return qdev_hot_added || qdev_hot_removed; +} + /* Get a character (serial) device interface. */ CharDriverState *qdev_init_chardev(DeviceState *dev) { diff --git a/hw/qdev.h b/hw/qdev.h index 579328a..66b8aee 100644 --- a/hw/qdev.h +++ b/hw/qdev.h @@ -123,6 +123,7 @@ int qdev_unplug(DeviceState *dev); void qdev_free(DeviceState *dev); int qdev_simple_unplug_cb(DeviceState *dev); void qdev_machine_creation_done(void); +bool qdev_machine_modified(void); qemu_irq qdev_get_gpio_in(DeviceState *dev, int n); void qdev_connect_gpio_out(DeviceState *dev, int n, qemu_irq pin);
[Qemu-devel] [patch 3/3] do not allow migration if block copy in progress
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com Index: qemu-kvm-block-copy/migration.c === --- qemu-kvm-block-copy.orig/migration.c +++ qemu-kvm-block-copy/migration.c @@ -19,6 +19,7 @@ #include block.h #include qemu_socket.h #include block-migration.h +#include block-copy.h #include qemu-objects.h //#define DEBUG_MIGRATION @@ -88,6 +89,11 @@ int do_migrate(Monitor *mon, const QDict return -1; } +if (block_copy_active()) { +monitor_printf(mon, block copy in progress\n); +return -1; +} + if (strstart(uri, tcp:, p)) { s = tcp_start_outgoing_migration(mon, p, max_throttle, detach, blk, inc);
[Qemu-devel] [patch 2/3] live block copy
Add support for live block copy. Signed-off-by: Marcelo Tosatti mtosa...@redhat.com Index: qemu-kvm/block-copy.c === --- /dev/null +++ qemu-kvm/block-copy.c @@ -0,0 +1,728 @@ +/* + * QEMU live block copy + * + * Copyright (C) 2010 Red Hat Inc. + * + * Authors: Marcelo Tosatti mtosa...@redhat.com + * + * This work is licensed under the terms of the GNU GPL, version 2. See + * the COPYING file in the top-level directory. + * + */ + +#include qemu-common.h +#include block_int.h +#include qemu-queue.h +#include qemu-timer.h +#include monitor.h +#include block-copy.h +#include migration.h +#include sysemu.h +#include qjson.h +#include assert.h + +#define BLOCK_SIZE (BDRV_SECTORS_PER_DIRTY_CHUNK BDRV_SECTOR_BITS) +#define MAX_IS_ALLOCATED_SEARCH 65536 + +/* + * Stages: + * + * STAGE_BULK: bulk reads/writes in progress + * STAGE_BULK_FINISHED: bulk reads finished, bulk writes in progress + * STAGE_DIRTY: bulk writes finished, dirty reads/writes in progress + * STAGE_SWITCH_FINISHED: switched to new image. + */ + +enum BdrvCopyStage { +STAGE_BULK, +STAGE_BULK_FINISHED, +STAGE_DIRTY, +STAGE_SWITCH_FINISHED, +}; + +typedef struct BdrvCopyState { +BlockDriverState *src; +BlockDriverState *dst; +bool shared_base; + +int64_t curr_sector; +int64_t completed_sectors; +int64_t nr_sectors; + +enum BdrvCopyStage stage; +int inflight_reads; +int error; +int failed; +int cancelled; +QLIST_HEAD(, BdrvCopyBlock) io_list; +unsigned long *aio_bitmap; +QEMUTimer *aio_timer; +QLIST_ENTRY(BdrvCopyState) list; + +int64_t blocks; +int64_t total_time; + +char src_device_name[32]; +char dst_filename[1024]; +int commit_fd; +} BdrvCopyState; + +typedef struct BdrvCopyBlock { +BdrvCopyState *state; +uint8_t *buf; +int64_t sector; +int64_t nr_sectors; +struct iovec iov; +QEMUIOVector qiov; +BlockDriverAIOCB *aiocb; +int64_t time; +QLIST_ENTRY(BdrvCopyBlock) list; +} BdrvCopyBlock; + +static QLIST_HEAD(, BdrvCopyState) block_copy_list = +QLIST_HEAD_INITIALIZER(block_copy_list); + +static void alloc_aio_bitmap(BdrvCopyState *s) +{ +BlockDriverState *bs = s-src; +int64_t bitmap_size; + +bitmap_size = (bdrv_getlength(bs) BDRV_SECTOR_BITS) + +BDRV_SECTORS_PER_DIRTY_CHUNK * 8 - 1; +bitmap_size /= BDRV_SECTORS_PER_DIRTY_CHUNK * 8; + +s-aio_bitmap = qemu_mallocz(bitmap_size); +} + +static bool aio_inflight(BdrvCopyState *s, int64_t sector) +{ +int64_t chunk = sector / (int64_t)BDRV_SECTORS_PER_DIRTY_CHUNK; + +if (s-aio_bitmap +(sector BDRV_SECTOR_BITS) bdrv_getlength(s-src)) { +return !!(s-aio_bitmap[chunk / (sizeof(unsigned long) * 8)] +(1UL (chunk % (sizeof(unsigned long) * 8; +} else { +return 0; +} +} + +static void set_aio_inflight(BdrvCopyState *s, int64_t sector_num, + int nb_sectors, int set) +{ +int64_t start, end; +unsigned long val, idx, bit; + +start = sector_num / BDRV_SECTORS_PER_DIRTY_CHUNK; +end = (sector_num + nb_sectors - 1) / BDRV_SECTORS_PER_DIRTY_CHUNK; + +for (; start = end; start++) { +idx = start / (sizeof(unsigned long) * 8); +bit = start % (sizeof(unsigned long) * 8); +val = s-aio_bitmap[idx]; +if (set) { +if (!(val (1UL bit))) { +val |= 1UL bit; +} +} else { +if (val (1UL bit)) { +val = ~(1UL bit); +} +} +s-aio_bitmap[idx] = val; +} +} + +static void blkcopy_set_stage(BdrvCopyState *s, enum BdrvCopyStage stage) +{ +s-stage = stage; + +switch (stage) { +case STAGE_BULK: +BLKDBG_EVENT(s-dst-file, BLKDBG_BLKCOPY_STAGE_BULK); +break; +case STAGE_BULK_FINISHED: +BLKDBG_EVENT(s-dst-file, BLKDBG_BLKCOPY_STAGE_BULK_FINISHED); +break; +case STAGE_DIRTY: +BLKDBG_EVENT(s-dst-file, BLKDBG_BLKCOPY_STAGE_DIRTY); +break; +case STAGE_SWITCH_FINISHED: +BLKDBG_EVENT(s-dst-file, BLKDBG_BLKCOPY_STAGE_SWITCH_FINISHED); +break; +default: +break; +} +} + +static void blk_copy_handle_cb_error(BdrvCopyState *s, int ret) +{ +s-error = ret; +qemu_mod_timer(s-aio_timer, qemu_get_clock(rt_clock)); +} + +static inline void add_avg_transfer_time(BdrvCopyState *s, int64_t time) +{ +s-blocks++; +s-total_time += time; +} + +static void blk_copy_write_cb(void *opaque, int ret) +{ +BdrvCopyBlock *blk = opaque; +BdrvCopyState *s = blk-state; + +if (ret 0) { +QLIST_REMOVE(blk, list); +qemu_free(blk-buf); +qemu_free(blk); +blk_copy_handle_cb_error(s, ret); +return; +} + +QLIST_REMOVE(blk, list); +add_avg_transfer_time(s, qemu_get_clock_ns(rt_clock) - blk-time); + +/* schedule switch to
Re: [Qemu-devel] -snapshot
Thanks. Some stuff are clear now. I started a VDI Open Source project (http://www.ucs.br/projeto/osdvt/) and your information will help a lot. Pahim On Thu, Dec 16, 2010 at 4:16 PM, Stefan Weil w...@mail.berlios.de wrote: Am 16.12.2010 18:45, schrieb Stefan Hajnoczi: On Thu, Dec 16, 2010 at 4:34 PM, Amador Pahim ama...@pahim.org wrote: Thank you for your answer. Just one more question: If, while my snapshot vms are running, the main disk is modified by a non snapshot vm? For example, installing some extra software.. this can freeze vms or something? Correct, it is not safe to modify the base image while there is another disk image backed off it. The reason for this is that the image only needs to store the changes that were made on top of the base image. For anything which hasn't been modified it will go back to the base image and read data from there. If you modify the base image, then the filesystem in the base image is not longer what your image file was created from and you have an inconsistent view of the disk. It leads to odd behavior and is unsafe. Stefan There are useful scenarios where using the same disk simultaneously from a snapshot vm and a real system works. If you have a hard disk with a dual boot configuration, it is sometimes useful to boot one configuration with the real system, then start qemu and boot the second configuration. Even booting the same configuration twice (once with the real machine, once with qemu snapshot) is sometimes useful and works to a limited degree. It is a simple way to try new bootloader configurations or other boot setups. Regards Stefan