[Qemu-devel] [PATCH v2] Add interface to traverse the qmp command list by QmpCommand
In the original code, qmp_get_command_list is used to construct a list of all commands' name. To get the information of all qga commands, it traverses the name list and search the command info with its name. So it can cause O(n^2) in the number of commands. This patch adds an interface to traverse the qmp command list by QmpCommand to replace qmp_get_command_list. It can decrease the complexity from O(n^2) to O(n). Signed-off-by: Mark Wu wu...@linux.vnet.ibm.com --- Changes: v2: 1. Keep the signature of qmp_command_is_enabled (per Eric and Michael) 2. Remove the unnecessary pointer castings (per Eric) include/qapi/qmp/dispatch.h | 5 ++-- qapi/qmp-registry.c | 27 +++--- qga/commands.c | 38 ++--- qga/main.c | 68 + 4 files changed, 48 insertions(+), 90 deletions(-) diff --git a/include/qapi/qmp/dispatch.h b/include/qapi/qmp/dispatch.h index 1ce11f5..b6eb49e 100644 --- a/include/qapi/qmp/dispatch.h +++ b/include/qapi/qmp/dispatch.h @@ -47,9 +47,10 @@ QmpCommand *qmp_find_command(const char *name); QObject *qmp_dispatch(QObject *request); void qmp_disable_command(const char *name); void qmp_enable_command(const char *name); -bool qmp_command_is_enabled(const char *name); -char **qmp_get_command_list(void); +bool qmp_command_is_enabled(const QmpCommand *cmd); QObject *qmp_build_error_object(Error *errp); +typedef void (*qmp_cmd_callback_fn)(QmpCommand *cmd, void *opaque); +void qmp_for_each_command(qmp_cmd_callback_fn fn, void *opaque); #endif diff --git a/qapi/qmp-registry.c b/qapi/qmp-registry.c index 28bbbe8..3fcf10e 100644 --- a/qapi/qmp-registry.c +++ b/qapi/qmp-registry.c @@ -66,35 +66,16 @@ void qmp_enable_command(const char *name) qmp_toggle_command(name, true); } -bool qmp_command_is_enabled(const char *name) +bool qmp_command_is_enabled(const QmpCommand *cmd) { -QmpCommand *cmd; - -QTAILQ_FOREACH(cmd, qmp_commands, node) { -if (strcmp(cmd-name, name) == 0) { -return cmd-enabled; -} -} - -return false; +return cmd-enabled; } -char **qmp_get_command_list(void) +void qmp_for_each_command(qmp_cmd_callback_fn fn, void *opaque) { QmpCommand *cmd; -int count = 1; -char **list_head, **list; - -QTAILQ_FOREACH(cmd, qmp_commands, node) { -count++; -} - -list_head = list = g_malloc0(count * sizeof(char *)); QTAILQ_FOREACH(cmd, qmp_commands, node) { -*list = g_strdup(cmd-name); -list++; +fn(cmd, opaque); } - -return list_head; } diff --git a/qga/commands.c b/qga/commands.c index 528b082..063b22b 100644 --- a/qga/commands.c +++ b/qga/commands.c @@ -45,35 +45,27 @@ void qmp_guest_ping(Error **err) slog(guest-ping called); } -struct GuestAgentInfo *qmp_guest_info(Error **err) +static void qmp_command_info(QmpCommand *cmd, void *opaque) { -GuestAgentInfo *info = g_malloc0(sizeof(GuestAgentInfo)); +GuestAgentInfo *info = opaque; GuestAgentCommandInfo *cmd_info; GuestAgentCommandInfoList *cmd_info_list; -char **cmd_list_head, **cmd_list; - -info-version = g_strdup(QEMU_VERSION); - -cmd_list_head = cmd_list = qmp_get_command_list(); -if (*cmd_list_head == NULL) { -goto out; -} -while (*cmd_list) { -cmd_info = g_malloc0(sizeof(GuestAgentCommandInfo)); -cmd_info-name = g_strdup(*cmd_list); -cmd_info-enabled = qmp_command_is_enabled(cmd_info-name); +cmd_info = g_malloc0(sizeof(GuestAgentCommandInfo)); +cmd_info-name = g_strdup(cmd-name); +cmd_info-enabled = qmp_command_is_enabled(cmd); -cmd_info_list = g_malloc0(sizeof(GuestAgentCommandInfoList)); -cmd_info_list-value = cmd_info; -cmd_info_list-next = info-supported_commands; -info-supported_commands = cmd_info_list; +cmd_info_list = g_malloc0(sizeof(GuestAgentCommandInfoList)); +cmd_info_list-value = cmd_info; +cmd_info_list-next = info-supported_commands; +info-supported_commands = cmd_info_list; +} -g_free(*cmd_list); -cmd_list++; -} +struct GuestAgentInfo *qmp_guest_info(Error **err) +{ +GuestAgentInfo *info = g_malloc0(sizeof(GuestAgentInfo)); -out: -g_free(cmd_list_head); +info-version = g_strdup(QEMU_VERSION); +qmp_for_each_command(qmp_command_info, info); return info; } diff --git a/qga/main.c b/qga/main.c index 6c746c8..ff2ee03 100644 --- a/qga/main.c +++ b/qga/main.c @@ -347,48 +347,34 @@ static gint ga_strcmp(gconstpointer str1, gconstpointer str2) } /* disable commands that aren't safe for fsfreeze */ -static void ga_disable_non_whitelisted(void) +static void ga_disable_non_whitelisted(QmpCommand *cmd, void *opaque) { -char **list_head, **list; bool whitelisted; int i; -list_head = list = qmp_get_command_list(); -while (*list != NULL) { -whitelisted = false; -
[Qemu-devel] [PATCH v4] Extend qemu-ga's 'guest-info' command to expose flag 'success-response'
Now we have several qemu-ga commands not returning response on success. It has been documented in qga/qapi-schema.json already. This patch exposes the 'success-response' flag by extending 'guest-info' command. With this change, the clients can handle the command response more flexibly. Signed-off-by: Mark Wu wu...@linux.vnet.ibm.com --- Changes: v4: Add signature of qmp_has_success_response per Michael. v3: 1. treat cmd-options as a bitmask instead of single option (per Eric) 2. rebase on the patch Add interface to traverse the qmp command list by QmpCommand to avoid the O(n2) problem (per Eric and Michael) v2: add the notation 'since 1.7' to the option 'success-response' (per Eric Blake's comments) include/qapi/qmp/dispatch.h | 1 + qapi/qmp-registry.c | 5 + qga/commands.c | 1 + qga/qapi-schema.json| 5 - 4 files changed, 11 insertions(+), 1 deletion(-) diff --git a/include/qapi/qmp/dispatch.h b/include/qapi/qmp/dispatch.h index b6eb49e..cebf6aa 100644 --- a/include/qapi/qmp/dispatch.h +++ b/include/qapi/qmp/dispatch.h @@ -48,6 +48,7 @@ QObject *qmp_dispatch(QObject *request); void qmp_disable_command(const char *name); void qmp_enable_command(const char *name); bool qmp_command_is_enabled(const QmpCommand *cmd); +bool qmp_has_success_response(const QmpCommand *cmd); QObject *qmp_build_error_object(Error *errp); typedef void (*qmp_cmd_callback_fn)(QmpCommand *cmd, void *opaque); void qmp_for_each_command(qmp_cmd_callback_fn fn, void *opaque); diff --git a/qapi/qmp-registry.c b/qapi/qmp-registry.c index 3fcf10e..c75c2e8 100644 --- a/qapi/qmp-registry.c +++ b/qapi/qmp-registry.c @@ -71,6 +71,11 @@ bool qmp_command_is_enabled(const QmpCommand *cmd) return cmd-enabled; } +bool qmp_has_success_response(const QmpCommand *cmd) +{ + return !(cmd-options QCO_NO_SUCCESS_RESP); +} + void qmp_for_each_command(qmp_cmd_callback_fn fn, void *opaque) { QmpCommand *cmd; diff --git a/qga/commands.c b/qga/commands.c index 063b22b..7f089ba 100644 --- a/qga/commands.c +++ b/qga/commands.c @@ -54,6 +54,7 @@ static void qmp_command_info(QmpCommand *cmd, void *opaque) cmd_info = g_malloc0(sizeof(GuestAgentCommandInfo)); cmd_info-name = g_strdup(cmd-name); cmd_info-enabled = qmp_command_is_enabled(cmd); +cmd_info-success_response = qmp_has_success_response(cmd); cmd_info_list = g_malloc0(sizeof(GuestAgentCommandInfoList)); cmd_info_list-value = cmd_info; diff --git a/qga/qapi-schema.json b/qga/qapi-schema.json index 7155b7a..245f968 100644 --- a/qga/qapi-schema.json +++ b/qga/qapi-schema.json @@ -141,10 +141,13 @@ # # @enabled: whether command is currently enabled by guest admin # +# @success-response: whether command returns a response on success +#(since 1.7) +# # Since 1.1.0 ## { 'type': 'GuestAgentCommandInfo', - 'data': { 'name': 'str', 'enabled': 'bool' } } + 'data': { 'name': 'str', 'enabled': 'bool', 'success-response': 'bool' } } ## # @GuestAgentInfo -- 1.8.3.1
Re: [Qemu-devel] [PATCHv3 00/20] block: logical block provisioning enhancements
On Mon, Oct 7, 2013 at 11:42 AM, Paolo Bonzini pbonz...@redhat.com wrote: Il 07/10/2013 10:42, Stefan Hajnoczi ha scritto: Could you make bdrv_co_write_zeroes() always use UNMAP, if possible, and avoid adding the new BDRV_REQ_MAY_UNMAP flag? While reading the first few patches in this series I wondered why there is a need to expose flags at all... Sometimes it is useful to distinguish between zeroing at the image format level from discarding at the device level, but I don't think we make use of that yet. I'd prefer to keep the interface simple for now and add flags later, if necessary. Or maybe I just missed something ;) The flag is needed to implement the right semantics for the SCSI WRITE SAME command, which are: - if the UNMAP bit is off, always write the sectors (that's bdrv_aio_write_zeroes without BDRV_REQ_MAY_UNMAP if the payload is zero, otherwise it's emulated with bdrv_aio_writev) - if the target can discard and write the specified payload, you can discard, else you must write the sectors with the correct payload (that's bdrv_aio_write_zeroes with BDRV_REQ_MAY_UNMAP). Contrast this with the UNMAP command, which does not make any guarantee on the content of the sectors after the command is completed (a few months ago we agreed that, even if you have discard_zeroes=true in the target, it is fine for UNMAP to do nothing). Okay, then let's keep the patches to expose the flag. Stefan
Re: [Qemu-devel] [PATCH v5 0/4] timers thread-safe stuff
On Mon, Oct 07, 2013 at 02:24:26PM +0200, Paolo Bonzini wrote: Stefan, will you pick this up next week or shall I? I have patches for thread-safe icount almost ready to post, and I am not sure through whom they are going to go. Please include it in your pull request. Kevin is merging block patches this week, the queue is fairly full so I imagine he has plenty of other things to review. Stefan
Re: [Qemu-devel] [patch 0/2] force -mem-path RAM allocation
We have -mem-path FILE provide backing storage for guest RAM -mem-prealloc preallocate guest memory (use with -mem-path) PATCH 2/2 adds -mem-path-forcefail if unable to allocate RAM as specified by -mem-path Looks like it's time to consolidate the options related to guest memory into a single, QemuOpts-style -memory NAME=VALUE,... What do you guys think?
Re: [Qemu-devel] [PATCHv3 00/20] block: logical block provisioning enhancements
On 08.10.2013 09:02, Stefan Hajnoczi wrote: On Mon, Oct 7, 2013 at 11:42 AM, Paolo Bonzini pbonz...@redhat.com wrote: Il 07/10/2013 10:42, Stefan Hajnoczi ha scritto: Could you make bdrv_co_write_zeroes() always use UNMAP, if possible, and avoid adding the new BDRV_REQ_MAY_UNMAP flag? While reading the first few patches in this series I wondered why there is a need to expose flags at all... Sometimes it is useful to distinguish between zeroing at the image format level from discarding at the device level, but I don't think we make use of that yet. I'd prefer to keep the interface simple for now and add flags later, if necessary. Or maybe I just missed something ;) The flag is needed to implement the right semantics for the SCSI WRITE SAME command, which are: - if the UNMAP bit is off, always write the sectors (that's bdrv_aio_write_zeroes without BDRV_REQ_MAY_UNMAP if the payload is zero, otherwise it's emulated with bdrv_aio_writev) - if the target can discard and write the specified payload, you can discard, else you must write the sectors with the correct payload (that's bdrv_aio_write_zeroes with BDRV_REQ_MAY_UNMAP). Contrast this with the UNMAP command, which does not make any guarantee on the content of the sectors after the command is completed (a few months ago we agreed that, even if you have discard_zeroes=true in the target, it is fine for UNMAP to do nothing). Okay, then let's keep the patches to expose the flag. Okay, then I can keep those. Can you give a short hint if my approach with brdv_make_empty is what you want? I would like to not change the parameters, so use BDRV_REQ_MAY_UNMAP unconditionally. int bdrv_make_empty(BlockDriverState *bs) { int64_t target_size = bdrv_getlength(bs) / BDRV_SECTOR_SIZE; int64_t ret, nb_sectors, sector_num = 0; int n; if (bs-drv-bdrv_make_empty) { return bs-drv-bdrv_make_empty(bs); } for (;;) { nb_sectors = target_size - sector_num; if (nb_sectors = 0) { return 0; } if (nb_sectors INT_MAX) { nb_sectors = INT_MAX; } ret = bdrv_get_block_status(bs, sector_num, nb_sectors, n); if (ret BDRV_BLOCK_ZERO) { sector_num += n; continue; } ret = bdrv_write_zeroes(bs, sector_num, n, BDRV_REQ_MAY_UNMAP); if (ret 0) { error_report(error writing zeroes at sector % PRId64 : %s, sector_num, strerror(-ret)); return ret; } sector_num += n; } }
Re: [Qemu-devel] [patch 0/2] force -mem-path RAM allocation
Il 08/10/2013 09:32, Markus Armbruster ha scritto: We have -mem-path FILE provide backing storage for guest RAM -mem-prealloc preallocate guest memory (use with -mem-path) PATCH 2/2 adds -mem-path-forcefail if unable to allocate RAM as specified by -mem-path Looks like it's time to consolidate the options related to guest memory into a single, QemuOpts-style -memory NAME=VALUE,... What do you guys think? Yes, we can use -numa memory (or -numa mem) that Wanlong Gao is adding. We can add path=, preallocate= and force= options there. Paolo
Re: [Qemu-devel] [patch 1/2] qemu: mempath: prefault pages manually
Il 08/10/2013 02:41, Marcelo Tosatti ha scritto: +/* unblock SIGBUS */ +pthread_sigmask(SIG_BLOCK, NULL, oldset); +sigemptyset(set); +sigaddset(set, SIGBUS); +pthread_sigmask(SIG_UNBLOCK, set, NULL); Please instead modify qemu-thread-posix.c to unblock all per-thread signals (SIGBUS, SIGSEGV, SIGILL, SIGFPE and SIGSYS). There is no need to keep those blocked. Paolo
Re: [Qemu-devel] [PATCH v4 2/7] qmp: add internal sync mode common to mirror_start
On Mon, 09/30 08:49, Eric Blake wrote: On 09/30/2013 06:02 AM, Fam Zheng wrote: This adds a new sync mode common which only copies data that is above the common ancestor of source and target. In general, this could be useful in cases like: base_bs --- common_ancestor --- foo --- bar ---source \ \--- target Where data in foo, bar and source will be copied to target, once such common backing_hd sharing is introduced. For now, we could use a special case: If target is the ancestor of source, like, base_bs --- target --- foo --- bar ---source The data in foo, bar and source will be copied to target, like drive-commit, and when they are synced, the source bs replaces target bs. This is specifically useful for block commit of active layer. This mode is not available (-ENOTSUP) from QMP interface, it is only used internally by block commit code. +++ b/qapi-schema.json @@ -1363,7 +1363,7 @@ # Since: 1.3 ## { 'enum': 'MirrorSyncMode', - 'data': ['top', 'full', 'none'] } + 'data': ['top', 'full', 'none', 'common'] } Is it worth documenting the mode, in order to include a '(since 1.7)' notation, as well as a mention that this mode is not supported via QMP but only exists so that the code generator will support the mode needed internally? Is there any way to refactor things so that you don't have to munge the QAPI just to provide this internal-only mode? As described in commit message, this mode could be useful once blockdev-add has device referencing (backing_hd sharing). For now, even with the same backing file, they don't share BDS, so it's not working as expected and should be disabled. So do you think it OK to document as not implemented for now, and wait for backing_hd sharing to enable it? Thanks, Fam
Re: [Qemu-devel] [PATCH v4 4/7] mirror: Add commit_job_type to perform commit with mirror code
On Tue, 10/01 11:13, Eric Blake wrote: On 09/30/2013 06:02 AM, Fam Zheng wrote: Commit active layer will be implemented in block/mirror.c, prepare a new job type to let it have a right type name for the user. Signed-off-by: Fam Zheng f...@redhat.com --- block/mirror.c| 12 +++- blockdev.c| 2 +- include/block/block_int.h | 2 ++ 3 files changed, 14 insertions(+), 2 deletions(-) diff --git a/block/mirror.c b/block/mirror.c index af6851f..20dcfb6 100644 --- a/block/mirror.c +++ b/block/mirror.c @@ -532,10 +532,19 @@ static const BlockJobType mirror_job_type = { .complete = mirror_complete, }; +static const BlockJobType commit_job_type = { +.instance_size = sizeof(MirrorBlockJob), +.job_type = commit, I still wonder if we should complete the conversion over to a QAPI enum type for all valid job types prior to hard-coding open strings through yet more of the code base. As long as we don't have introspection done yet, we can still make the switch, and in the long run, having an enum of valid job types seems like it will be better for maintenance, all with no change to what is sent over the wire in QMP. @@ -390,6 +391,7 @@ void mirror_start(BlockDriverState *bs, BlockDriverState *target, int64_t speed, int64_t granularity, int64_t buf_size, MirrorSyncMode mode, BlockdevOnError on_source_error, BlockdevOnError on_target_error, + bool commit_job, If we DO create a QAPI enum for job type, then this parameter would be the enum type, rather than a bool. Good point, I'll work on a QAPI enum and base this on it. Fam
Re: [Qemu-devel] [PATCH qom-next 0/2] qdev-monitor: Reference counting follow-ups
On Mon, 7 Oct 2013 18:43:59 +0200 Andreas Färber afaer...@suse.de wrote: Hello, I have queued bug fixes by Igor and Stefan for device_add on qom-next and am rearranging the following changes of mine on top. 1) Further naming cleanups, now rebased on the bugfixes for easier backporting. 2) Inlining of qdev_init(), so that we always have unparent+unref pairs. If there's no objections, planning to include this in a pull tonight or tomorrow. Regards, Andreas Cc: Igor Mammedov imamm...@redhat.com Cc: Stefan Hajnoczi stefa...@redhat.com Cc: Paolo Bonzini pbonz...@redhat.com Cc: Anthony Liguori anth...@codemonkey.ws Andreas Färber (2): qdev-monitor: Avoid qdev as variable name qdev-monitor: Inline qdev_init() for device_add qdev-monitor.c | 37 + 1 file changed, 21 insertions(+), 16 deletions(-) Reviewed-By: Igor Mammedov imamm...@redhat.com
[Qemu-devel] savevm/loadvm
Hi! I need the community help with savevm/loadvm. I run QEMU like this: ./qemu-system-ppc64 \ -drive file=virtimg/fc19_16GB.qcow2 \ -nodefaults \ -m 2048 \ -machine pseries \ -nographic \ -vga none \ -enable-kvm The disk image is an 16GB qcow2 image. Now I start the guest and do savevm 1 and loadvm 1 from the qemu console. Everything works. Then I exit qemu, make sure that the snapshot is there and run QEMU as above plus -loadvm 1. It fails with: qemu-system-ppc64: qcow2: Loading snapshots with different disk size is not implemented qemu-system-ppc64: Error -95 while activating snapshot '2' on 'scsi0-hd0' The check is added by commit 90b277593df873d3a2480f002e2eb5fe1f8e5277 qcow2: Save disk size in snapshot header. As I cannot realize the whole idea of the patch, I looked a bit deeper. This is the check: int qcow2_snapshot_goto(BlockDriverState *bs, const char *snapshot_id) { [...] if (sn-disk_size != bs-total_sectors * BDRV_SECTOR_SIZE) { error_report(qcow2: Loading snapshots with different disk size is not implemented); ret = -ENOTSUP; goto fail; } My understanding of the patch was that the disk_size should remain 16GB (0x4..) as it uses bs-total_sectors and never changes it. And bs-growable is 0 for qcow2 image because it is not really growable. At least the total_sectors value from the qcow2 file header does not change between QEMU starts. However qcow2_save_vmstate() sets bs-growable to 1 for a short time (commit 178e08a58f40dd5aef2ce774fe0850f5d0e56918 from 2009) and this triggers a branch in bdrv_co_do_writev() which changes bs-total_sectors. So when QEMU writes snapshots to the file, the disk_size field of a snapshot has bigger value (for example 0x4.007b.8180). And the check above fails. It does not fail if to do loadvm _in_the_same_run_ after savevm because QEMU operates with the updated bs-total_sectors. What the proper fix would be? Or it is not a bug at all and I should be using something else for -loadvm? Thanks. -- Alexey
[Qemu-devel] [PATCH] scsi: Allocate SCSITargetReq r-buf dynamically
r-buf is hardcoded to 2056 which is (256 + 1) * 8, allowing 256 luns at most. If more than 256 luns are specified by user, we have buffer overflow in scsi_target_emulate_report_luns. To fix, we allocate the buffer dynamically. Signed-off-by: Asias He as...@redhat.com --- hw/scsi/scsi-bus.c | 44 +--- include/hw/scsi/scsi.h | 2 ++ 2 files changed, 35 insertions(+), 11 deletions(-) diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c index 4d36841..d950e6f 100644 --- a/hw/scsi/scsi-bus.c +++ b/hw/scsi/scsi-bus.c @@ -11,6 +11,8 @@ static char *scsibus_get_dev_path(DeviceState *dev); static char *scsibus_get_fw_dev_path(DeviceState *dev); static int scsi_req_parse(SCSICommand *cmd, SCSIDevice *dev, uint8_t *buf); static void scsi_req_dequeue(SCSIRequest *req); +static uint8_t *scsi_target_alloc_buf(SCSIRequest *req, size_t len); +static void scsi_target_free_buf(SCSIRequest *req); static Property scsi_props[] = { DEFINE_PROP_UINT32(channel, SCSIDevice, channel, 0), @@ -317,7 +319,8 @@ typedef struct SCSITargetReq SCSITargetReq; struct SCSITargetReq { SCSIRequest req; int len; -uint8_t buf[2056]; +uint8_t *buf; +int buf_len; }; static void store_lun(uint8_t *outbuf, int lun) @@ -361,14 +364,12 @@ static bool scsi_target_emulate_report_luns(SCSITargetReq *r) if (!found_lun0) { n += 8; } -len = MIN(n + 8, r-req.cmd.xfer ~7); -if (len sizeof(r-buf)) { -/* TODO: 256 LUNs? */ -return false; -} +scsi_target_alloc_buf(r-req, n + 8); + +len = MIN(n + 8, r-req.cmd.xfer ~7); memset(r-buf, 0, len); -stl_be_p(r-buf, n); +stl_be_p(r-buf[0], n); i = found_lun0 ? 8 : 16; QTAILQ_FOREACH(kid, r-req.bus-qbus.children, sibling) { DeviceState *qdev = kid-child; @@ -387,6 +388,9 @@ static bool scsi_target_emulate_report_luns(SCSITargetReq *r) static bool scsi_target_emulate_inquiry(SCSITargetReq *r) { assert(r-req.dev-lun != r-req.lun); + +scsi_target_alloc_buf(r-req, SCSI_INQUIRY_LEN); + if (r-req.cmd.buf[1] 0x2) { /* Command support data - optional, not implemented */ return false; @@ -411,7 +415,7 @@ static bool scsi_target_emulate_inquiry(SCSITargetReq *r) return false; } /* done with EVPD */ -assert(r-len sizeof(r-buf)); +assert(r-len r-buf_len); r-len = MIN(r-req.cmd.xfer, r-len); return true; } @@ -455,8 +459,8 @@ static int32_t scsi_target_send_command(SCSIRequest *req, uint8_t *buf) } break; case REQUEST_SENSE: -r-len = scsi_device_get_sense(r-req.dev, r-buf, - MIN(req-cmd.xfer, sizeof r-buf), +scsi_target_alloc_buf(r-req, SCSI_SENSE_LEN); +r-len = scsi_device_get_sense(r-req.dev, r-buf, r-buf_len, (req-cmd.buf[1] 1) == 0); if (r-req.dev-sense_is_ua) { scsi_device_unit_attention_reported(req-dev); @@ -501,11 +505,29 @@ static uint8_t *scsi_target_get_buf(SCSIRequest *req) return r-buf; } +static uint8_t *scsi_target_alloc_buf(SCSIRequest *req, size_t len) +{ +SCSITargetReq *r = DO_UPCAST(SCSITargetReq, req, req); + +r-buf = g_malloc(len); +r-buf_len = len; + +return r-buf; +} + +static void scsi_target_free_buf(SCSIRequest *req) +{ +SCSITargetReq *r = DO_UPCAST(SCSITargetReq, req, req); + +g_free(r-buf); +} + static const struct SCSIReqOps reqops_target_command = { .size = sizeof(SCSITargetReq), .send_command = scsi_target_send_command, .read_data= scsi_target_read_data, .get_buf = scsi_target_get_buf, +.free_req = scsi_target_free_buf, }; @@ -1365,7 +1387,7 @@ int scsi_build_sense(uint8_t *in_buf, int in_len, buf[7] = 10; buf[12] = sense.asc; buf[13] = sense.ascq; -return MIN(len, 18); +return MIN(len, SCSI_SENSE_LEN); } else { /* Return descriptor format sense buffer */ buf[0] = 0x72; diff --git a/include/hw/scsi/scsi.h b/include/hw/scsi/scsi.h index 1b66510..76f6ac2 100644 --- a/include/hw/scsi/scsi.h +++ b/include/hw/scsi/scsi.h @@ -9,6 +9,8 @@ #define MAX_SCSI_DEVS 255 #define SCSI_CMD_BUF_SIZE 16 +#define SCSI_SENSE_LEN 18 +#define SCSI_INQUIRY_LEN36 typedef struct SCSIBus SCSIBus; typedef struct SCSIBusInfo SCSIBusInfo; -- 1.8.3.1
[Qemu-devel] [PATCH 2/8] timers: add timer_mod_anticipate and timer_mod_anticipate_ns
These let a user anticipate the deadline of a timer, atomically with other sites that call the function. This helps avoiding complicated lock hierarchies. It is useful whenever the timer does work based on the current value of the clock (rather than doing something periodically on every tick). Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- include/qemu/timer.h | 26 ++ qemu-timer.c | 29 + 2 files changed, 55 insertions(+) diff --git a/include/qemu/timer.h b/include/qemu/timer.h index b58903b..f215b0b 100644 --- a/include/qemu/timer.h +++ b/include/qemu/timer.h @@ -539,6 +539,19 @@ void timer_del(QEMUTimer *ts); void timer_mod_ns(QEMUTimer *ts, int64_t expire_time); /** + * timer_mod_anticipate_ns: + * @ts: the timer + * @expire_time: the expiry time in nanoseconds + * + * Modify a timer to expire at @expire_time or the current time, + * whichever comes earlier. + * + * This function is thread-safe but the timer and its timer list must not be + * freed while this function is running. + */ +void timer_mod_anticipate_ns(QEMUTimer *ts, int64_t expire_time); + +/** * timer_mod: * @ts: the timer * @expire_time: the expire time in the units associated with the timer @@ -552,6 +565,19 @@ void timer_mod_ns(QEMUTimer *ts, int64_t expire_time); void timer_mod(QEMUTimer *ts, int64_t expire_timer); /** + * timer_mod_anticipate: + * @ts: the timer + * @expire_time: the expiry time in nanoseconds + * + * Modify a timer to expire at @expire_time or the current time, whichever + * comes earlier, taking into account the scale associated with the timer. + * + * This function is thread-safe but the timer and its timer list must not be + * freed while this function is running. + */ +void timer_mod_anticipate(QEMUTimer *ts, int64_t expire_time); + +/** * timer_pending: * @ts: the timer * diff --git a/qemu-timer.c b/qemu-timer.c index 95fc6eb..202e9a2 100644 --- a/qemu-timer.c +++ b/qemu-timer.c @@ -393,11 +393,40 @@ void timer_mod_ns(QEMUTimer *ts, int64_t expire_time) } } +/* modify the current timer so that it will be fired when current_time + = expire_time or the current deadline, whichever comes earlier. + The corresponding callback will be called. */ +void timer_mod_anticipate_ns(QEMUTimer *ts, int64_t expire_time) +{ +QEMUTimerList *timer_list = ts-timer_list; +bool rearm; + +qemu_mutex_lock(timer_list-active_timers_lock); +if (ts-expire_time == -1 || ts-expire_time expire_time) { +if (ts-expire_time != -1) { +timer_del_locked(timer_list, ts); +} +rearm = timer_mod_ns_locked(timer_list, ts, expire_time); +} else { +rearm = false; +} +qemu_mutex_unlock(timer_list-active_timers_lock); + +if (rearm) { +timerlist_rearm(timer_list); +} +} + void timer_mod(QEMUTimer *ts, int64_t expire_time) { timer_mod_ns(ts, expire_time * ts-scale); } +void timer_mod_anticipate(QEMUTimer *ts, int64_t expire_time) +{ +timer_mod_anticipate_ns(ts, expire_time * ts-scale); +} + bool timer_pending(QEMUTimer *ts) { return ts-expire_time = 0; -- 1.8.3.1
[Qemu-devel] [PATCH 0/8] Make icount thread-safe
This series moves the icount state under the same seqlock as the normal vm_clock implementation. It is not yet 100% thread-safe, because the CPU list should be moved under RCU protection (due to the call to !all_cpu_threads_idle() in qemu_clock_warp). However it is a substantial step forward, the only uncovered case being CPU hotplug. Please review. Paolo Paolo Bonzini (8): timers: extract timer_mod_ns_locked and timerlist_rearm timers: add timer_mod_anticipate and timer_mod_anticipate_ns timers: use cpu_get_icount() directly timers: reorganize icount_warp_rt timers: prepare the code for future races in calling qemu_clock_warp timers: introduce cpu_get_clock_locked timers: document (future) locking rules for icount timers: make icount thread-safe cpus.c | 110 - include/qemu/timer.h | 26 + qemu-timer.c | 74 +++-- 4 files changed, 163 insertions(+), 47 deletions(-) create mode 100644 include/qemu/seqlock.h -- 1.8.3.1
[Qemu-devel] [PATCH 4/8] timers: reorganize icount_warp_rt
To prepare for future code changes, move the increment of qemu_icount_bias outside the if statement. Also, hoist outside the if the check for timers that expired due to the warping. The check is redundant when !runstate_is_running(), but doing it this way helps because the code that increments qemu_icount_bias will be a critical section. Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- cpus.c | 18 ++ 1 file changed, 10 insertions(+), 8 deletions(-) diff --git a/cpus.c b/cpus.c index f87ff6f..9f450ad 100644 --- a/cpus.c +++ b/cpus.c @@ -279,10 +279,10 @@ static void icount_warp_rt(void *opaque) if (runstate_is_running()) { int64_t clock = qemu_clock_get_ns(QEMU_CLOCK_REALTIME); -int64_t warp_delta = clock - vm_clock_warp_start; -if (use_icount == 1) { -qemu_icount_bias += warp_delta; -} else { +int64_t warp_delta; + +warp_delta = clock - vm_clock_warp_start; +if (use_icount == 2) { /* * In adaptive mode, do not let QEMU_CLOCK_VIRTUAL run too * far ahead of real time. @@ -290,13 +290,15 @@ static void icount_warp_rt(void *opaque) int64_t cur_time = cpu_get_clock(); int64_t cur_icount = cpu_get_icount(); int64_t delta = cur_time - cur_icount; -qemu_icount_bias += MIN(warp_delta, delta); -} -if (qemu_clock_expired(QEMU_CLOCK_VIRTUAL)) { -qemu_clock_notify(QEMU_CLOCK_VIRTUAL); +warp_delta = MIN(warp_delta, delta); } +qemu_icount_bias += warp_delta; } vm_clock_warp_start = -1; + +if (qemu_clock_expired(QEMU_CLOCK_VIRTUAL)) { +qemu_clock_notify(QEMU_CLOCK_VIRTUAL); +} } void qtest_clock_warp(int64_t dest) -- 1.8.3.1
[Qemu-devel] [PATCH 1/8] timers: extract timer_mod_ns_locked and timerlist_rearm
These will be reused in timer_mod_anticipate functions. Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- qemu-timer.c | 51 --- 1 file changed, 32 insertions(+), 19 deletions(-) diff --git a/qemu-timer.c b/qemu-timer.c index 6b62e88..95fc6eb 100644 --- a/qemu-timer.c +++ b/qemu-timer.c @@ -338,6 +338,34 @@ static void timer_del_locked(QEMUTimerList *timer_list, QEMUTimer *ts) } } +static bool timer_mod_ns_locked(QEMUTimerList *timer_list, +QEMUTimer *ts, int64_t expire_time) +{ +QEMUTimer **pt, *t; + +/* add the timer in the sorted list */ +pt = timer_list-active_timers; +for (;;) { +t = *pt; +if (!timer_expired_ns(t, expire_time)) { +break; +} +pt = t-next; +} +ts-expire_time = MAX(expire_time, 0); +ts-next = *pt; +*pt = ts; + +return pt == timer_list-active_timers; +} + +static void timerlist_rearm(QEMUTimerList *timer_list) +{ +/* Interrupt execution to force deadline recalculation. */ +qemu_clock_warp(timer_list-clock-type); +timerlist_notify(timer_list); +} + /* stop a timer, but do not dealloc it */ void timer_del(QEMUTimer *ts) { @@ -353,30 +381,15 @@ void timer_del(QEMUTimer *ts) void timer_mod_ns(QEMUTimer *ts, int64_t expire_time) { QEMUTimerList *timer_list = ts-timer_list; -QEMUTimer **pt, *t; +bool rearm; qemu_mutex_lock(timer_list-active_timers_lock); timer_del_locked(timer_list, ts); - -/* add the timer in the sorted list */ -pt = timer_list-active_timers; -for(;;) { -t = *pt; -if (!timer_expired_ns(t, expire_time)) { -break; -} -pt = t-next; -} -ts-expire_time = MAX(expire_time, 0); -ts-next = *pt; -*pt = ts; +rearm = timer_mod_ns_locked(timer_list, ts, expire_time); qemu_mutex_unlock(timer_list-active_timers_lock); -/* Rearm if necessary */ -if (pt == timer_list-active_timers) { -/* Interrupt execution to force deadline recalculation. */ -qemu_clock_warp(timer_list-clock-type); -timerlist_notify(timer_list); +if (rearm) { +timerlist_rearm(timer_list); } } -- 1.8.3.1
[Qemu-devel] [PATCH 3/8] timers: use cpu_get_icount() directly
This will help later when we will have to place these calls in a critical section, and thus call a version of cpu_get_icount() that does not take the lock. Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- cpus.c | 17 +++-- 1 file changed, 11 insertions(+), 6 deletions(-) diff --git a/cpus.c b/cpus.c index 870a832..f87ff6f 100644 --- a/cpus.c +++ b/cpus.c @@ -224,12 +224,15 @@ static void icount_adjust(void) int64_t cur_icount; int64_t delta; static int64_t last_delta; + /* If the VM is not running, then do nothing. */ if (!runstate_is_running()) { return; } + cur_time = cpu_get_clock(); -cur_icount = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL); +cur_icount = cpu_get_icount(); + delta = cur_icount - cur_time; /* FIXME: This is a very crude algorithm, somewhat prone to oscillation. */ if (delta 0 @@ -285,7 +288,7 @@ static void icount_warp_rt(void *opaque) * far ahead of real time. */ int64_t cur_time = cpu_get_clock(); -int64_t cur_icount = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL); +int64_t cur_icount = cpu_get_icount(); int64_t delta = cur_time - cur_icount; qemu_icount_bias += MIN(warp_delta, delta); } -- 1.8.3.1
[Qemu-devel] [PATCH 8/8] timers: make icount thread-safe
This lets threads other than the I/O thread use vm_clock even in -icount mode. Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- cpus.c | 42 ++ 1 file changed, 34 insertions(+), 8 deletions(-) diff --git a/cpus.c b/cpus.c index bc675a4..1e5cba4 100644 --- a/cpus.c +++ b/cpus.c @@ -133,7 +133,7 @@ typedef struct TimersState { static TimersState timers_state; /* Return the virtual CPU time, based on the instruction counter. */ -int64_t cpu_get_icount(void) +static int64_t cpu_get_icount_locked(void) { int64_t icount; CPUState *cpu = current_cpu; @@ -149,6 +149,19 @@ int64_t cpu_get_icount(void) return qemu_icount_bias + (icount icount_time_shift); } +int64_t cpu_get_icount(void) +{ +int64_t icount; +unsigned start; + +do { +start = seqlock_read_begin(timers_state.clock_seqlock); +icount = cpu_get_icount_locked(); +} while (seqlock_read_retry(timers_state.clock_seqlock, start)); + +return icount; +} + /* return the host CPU cycle counter and handle stop/restart */ /* cpu_ticks is safely if holding BQL */ int64_t cpu_get_ticks(void) @@ -246,8 +259,9 @@ static void icount_adjust(void) return; } -cur_time = cpu_get_clock(); -cur_icount = cpu_get_icount(); +seqlock_write_lock(timers_state.clock_seqlock); +cur_time = cpu_get_clock_locked(); +cur_icount = cpu_get_icount_locked(); delta = cur_icount - cur_time; /* FIXME: This is a very crude algorithm, somewhat prone to oscillation. */ @@ -265,6 +279,7 @@ static void icount_adjust(void) } last_delta = delta; qemu_icount_bias = cur_icount - (qemu_icount icount_time_shift); +seqlock_write_unlock(timers_state.clock_seqlock); } static void icount_adjust_rt(void *opaque) @@ -289,10 +304,14 @@ static int64_t qemu_icount_round(int64_t count) static void icount_warp_rt(void *opaque) { -if (vm_clock_warp_start == -1) { +/* The icount_warp_timer is rescheduled soon after vm_clock_warp_start + * changes from -1 to another value, so the race here is okay. + */ +if (atomic_read(vm_clock_warp_start) == -1) { return; } +seqlock_write_lock(timers_state.clock_seqlock); if (runstate_is_running()) { int64_t clock = qemu_clock_get_ns(QEMU_CLOCK_REALTIME); int64_t warp_delta; @@ -303,14 +322,15 @@ static void icount_warp_rt(void *opaque) * In adaptive mode, do not let QEMU_CLOCK_VIRTUAL run too * far ahead of real time. */ -int64_t cur_time = cpu_get_clock(); -int64_t cur_icount = cpu_get_icount(); +int64_t cur_time = cpu_get_clock_locked(); +int64_t cur_icount = cpu_get_icount_locked(); int64_t delta = cur_time - cur_icount; warp_delta = MIN(warp_delta, delta); } qemu_icount_bias += warp_delta; } vm_clock_warp_start = -1; +seqlock_write_unlock(timers_state.clock_seqlock); if (qemu_clock_expired(QEMU_CLOCK_VIRTUAL)) { qemu_clock_notify(QEMU_CLOCK_VIRTUAL); @@ -324,7 +344,10 @@ void qtest_clock_warp(int64_t dest) while (clock dest) { int64_t deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL); int64_t warp = MIN(dest - clock, deadline); +seqlock_write_lock(timers_state.clock_seqlock); qemu_icount_bias += warp; +seqlock_write_unlock(timers_state.clock_seqlock); + qemu_clock_run_timers(QEMU_CLOCK_VIRTUAL); clock = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL); } @@ -391,9 +415,11 @@ void qemu_clock_warp(QEMUClockType type) * you will not be sending network packets continuously instead of * every 100ms. */ +seqlock_write_lock(timers_state.clock_seqlock); if (vm_clock_warp_start == -1 || vm_clock_warp_start clock) { vm_clock_warp_start = clock; } +seqlock_write_unlock(timers_state.clock_seqlock); timer_mod_anticipate(icount_warp_timer, clock + deadline); } else if (deadline == 0) { qemu_clock_notify(QEMU_CLOCK_VIRTUAL); -- 1.8.3.1
[Qemu-devel] [PATCH 5/8] timers: prepare the code for future races in calling qemu_clock_warp
Computing the deadline of all vm_clocks is somewhat expensive and calls out to qemu-timer.c; two reasons not to do it in the seqlock's write-side critical section. This however opens the door for races in setting and reading vm_clock_warp_start. To plug them, we need to cover the case where a new deadline slips in between the call to qemu_clock_deadline_ns_all and the actual modification of the icount_warp_timer. Restrict changes to vm_clock_warp_start and the icount_warp_timer's expiration time, to only move them back (which would simply cause an early wakeup). If a vm_clock timer is cancelled while CPUs are idle, this might cause the icount_warp_timer to fire unnecessarily. This is not a problem, after it fires the timer becomes inactive and the next call to timer_mod_anticipate will be precise. In addition to this, we must deactivate the icount_warp_timer _before_ checking whether CPUs are idle. This way, if the last CPU becomes idle during the call to timer_del we will still set up the icount_warp_timer. Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- cpus.c | 19 +-- 1 file changed, 9 insertions(+), 10 deletions(-) diff --git a/cpus.c b/cpus.c index 9f450ad..08eaf23 100644 --- a/cpus.c +++ b/cpus.c @@ -319,6 +319,7 @@ void qtest_clock_warp(int64_t dest) void qemu_clock_warp(QEMUClockType type) { +int64_t clock; int64_t deadline; /* @@ -338,7 +339,7 @@ void qemu_clock_warp(QEMUClockType type) * the earliest QEMU_CLOCK_VIRTUAL timer. */ icount_warp_rt(NULL); -if (!all_cpu_threads_idle() || !qemu_clock_has_timers(QEMU_CLOCK_VIRTUAL)) { -timer_del(icount_warp_timer); +timer_del(icount_warp_timer); +if (!all_cpu_threads_idle()) { return; } @@ -348,17 +349,11 @@ void qemu_clock_warp(QEMUClockType type) return; } -vm_clock_warp_start = qemu_clock_get_ns(QEMU_CLOCK_REALTIME); /* We want to use the earliest deadline from ALL vm_clocks */ +clock = qemu_clock_get_ns(QEMU_CLOCK_REALTIME); deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL); - -/* Maintain prior (possibly buggy) behaviour where if no deadline - * was set (as there is no QEMU_CLOCK_VIRTUAL timer) or it is more than - * INT32_MAX nanoseconds ahead, we still use INT32_MAX - * nanoseconds. - */ -if ((deadline 0) || (deadline INT32_MAX)) { -deadline = INT32_MAX; +if (deadline 0) { +return; } if (deadline 0) { @@ -379,7 +375,10 @@ void qemu_clock_warp(QEMUClockType type) * you will not be sending network packets continuously instead of * every 100ms. */ -timer_mod(icount_warp_timer, vm_clock_warp_start + deadline); +if (vm_clock_warp_start == -1 || vm_clock_warp_start clock) { +vm_clock_warp_start = clock; +} +timer_mod_anticipate(icount_warp_timer, clock + deadline); } else if (deadline == 0) { qemu_clock_notify(QEMU_CLOCK_VIRTUAL); } -- 1.8.3.1
[Qemu-devel] [PATCH 6/8] timers: introduce cpu_get_clock_locked
This fixes a deadlock in cpu_disable_ticks. Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- Should be squashed in Ping Fan's patches. cpus.c | 21 +++-- 1 file changed, 15 insertions(+), 6 deletions(-) diff --git a/cpus.c b/cpus.c index 08eaf23..01acce2 100644 --- a/cpus.c +++ b/cpus.c @@ -166,6 +166,20 @@ int64_t cpu_get_ticks(void) } } +static int64_t cpu_get_clock_locked(void) +{ +int64_t ti; + +if (!timers_state.cpu_ticks_enabled) { +ti = timers_state.cpu_clock_offset; +} else { +ti = get_clock(); +ti += timers_state.cpu_clock_offset; +} + +return ti; +} + /* return the host CPU monotonic timer and handle stop/restart */ int64_t cpu_get_clock(void) { @@ -174,12 +188,7 @@ int64_t cpu_get_clock(void) do { start = seqlock_read_begin(timers_state.clock_seqlock); -if (!timers_state.cpu_ticks_enabled) { -ti = timers_state.cpu_clock_offset; -} else { -ti = get_clock(); -ti += timers_state.cpu_clock_offset; -} +ti = cpu_get_clock_locked(); } while (seqlock_read_retry(timers_state.clock_seqlock, start)); return ti; @@ -220,7 +233,7 @@ void cpu_disable_ticks(void) seqlock_write_lock(timers_state.clock_seqlock); if (timers_state.cpu_ticks_enabled) { timers_state.cpu_ticks_offset = cpu_get_ticks(); -timers_state.cpu_clock_offset = cpu_get_clock(); +timers_state.cpu_clock_offset = cpu_get_clock_locked(); timers_state.cpu_ticks_enabled = 0; } seqlock_write_unlock(timers_state.clock_seqlock); -- 1.8.3.1
[Qemu-devel] [PATCH 7/8] timers: document (future) locking rules for icount
Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- cpus.c | 15 +++ 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/cpus.c b/cpus.c index 01acce2..bc675a4 100644 --- a/cpus.c +++ b/cpus.c @@ -98,17 +98,22 @@ static bool all_cpu_threads_idle(void) /***/ /* guest cycle counter */ +/* Protected by TimersState seqlock */ + +/* Compensate for varying guest execution speed. */ +static int64_t qemu_icount_bias; +static int64_t vm_clock_warp_start; /* Conversion factor from emulated instructions to virtual clock ticks. */ static int icount_time_shift; /* Arbitrarily pick 1MIPS as the minimum allowable speed. */ #define MAX_ICOUNT_SHIFT 10 -/* Compensate for varying guest execution speed. */ -static int64_t qemu_icount_bias; + +/* Only written by TCG thread */ +static int64_t qemu_icount; + static QEMUTimer *icount_rt_timer; static QEMUTimer *icount_vm_timer; static QEMUTimer *icount_warp_timer; -static int64_t vm_clock_warp_start; -static int64_t qemu_icount; typedef struct TimersState { int64_t cpu_ticks_prev; @@ -232,6 +237,8 @@ static void icount_adjust(void) int64_t cur_time; int64_t cur_icount; int64_t delta; + +/* Protected by TimersState mutex. */ static int64_t last_delta; /* If the VM is not running, then do nothing. */ -- 1.8.3.1
Re: [Qemu-devel] [PATCHv3 00/20] block: logical block provisioning enhancements
On Tue, Oct 8, 2013 at 10:01 AM, Peter Lieven p...@kamp.de wrote: On 08.10.2013 09:02, Stefan Hajnoczi wrote: On Mon, Oct 7, 2013 at 11:42 AM, Paolo Bonzini pbonz...@redhat.com wrote: Il 07/10/2013 10:42, Stefan Hajnoczi ha scritto: Could you make bdrv_co_write_zeroes() always use UNMAP, if possible, and avoid adding the new BDRV_REQ_MAY_UNMAP flag? While reading the first few patches in this series I wondered why there is a need to expose flags at all... Sometimes it is useful to distinguish between zeroing at the image format level from discarding at the device level, but I don't think we make use of that yet. I'd prefer to keep the interface simple for now and add flags later, if necessary. Or maybe I just missed something ;) The flag is needed to implement the right semantics for the SCSI WRITE SAME command, which are: - if the UNMAP bit is off, always write the sectors (that's bdrv_aio_write_zeroes without BDRV_REQ_MAY_UNMAP if the payload is zero, otherwise it's emulated with bdrv_aio_writev) - if the target can discard and write the specified payload, you can discard, else you must write the sectors with the correct payload (that's bdrv_aio_write_zeroes with BDRV_REQ_MAY_UNMAP). Contrast this with the UNMAP command, which does not make any guarantee on the content of the sectors after the command is completed (a few months ago we agreed that, even if you have discard_zeroes=true in the target, it is fine for UNMAP to do nothing). Okay, then let's keep the patches to expose the flag. Okay, then I can keep those. Can you give a short hint if my approach with brdv_make_empty is what you want? I would like to not change the parameters, so use BDRV_REQ_MAY_UNMAP unconditionally. int bdrv_make_empty(BlockDriverState *bs) The semantics of bdrv_make_empty() today are: deallocate all data in the top layer of the image file. If there is a backing file, reads will fall back to the backing file. The semantics that you want are zeroing the entire disk image (efficiently, when possible). A flags argument is needed to support both of sets of semantics. If you don't like that, then I suggest creating a new function called bdrv_make_zero(). Stefan
Re: [Qemu-devel] savevm/loadvm
Il 08/10/2013 10:40, Alexey Kardashevskiy ha scritto: However qcow2_save_vmstate() sets bs-growable to 1 for a short time (commit 178e08a58f40dd5aef2ce774fe0850f5d0e56918 from 2009) and this triggers a branch in bdrv_co_do_writev() which changes bs-total_sectors. So when QEMU writes snapshots to the file, the disk_size field of a snapshot has bigger value (for example 0x4.007b.8180). I think you need to modify qcow2_save_vmstate to save and restore bs-total_sectors. Can you test that and if so post the patch? Paolo
Re: [Qemu-devel] [PATCH 1/8] timers: extract timer_mod_ns_locked and timerlist_rearm
On 8 Oct 2013, at 09:47, Paolo Bonzini wrote: These will be reused in timer_mod_anticipate functions. Signed-off-by: Paolo Bonzini pbonz...@redhat.com Signed-off-by: Alex Bligh a...@alex.org.uk --- qemu-timer.c | 51 --- 1 file changed, 32 insertions(+), 19 deletions(-) diff --git a/qemu-timer.c b/qemu-timer.c index 6b62e88..95fc6eb 100644 --- a/qemu-timer.c +++ b/qemu-timer.c @@ -338,6 +338,34 @@ static void timer_del_locked(QEMUTimerList *timer_list, QEMUTimer *ts) } } +static bool timer_mod_ns_locked(QEMUTimerList *timer_list, +QEMUTimer *ts, int64_t expire_time) +{ +QEMUTimer **pt, *t; + +/* add the timer in the sorted list */ +pt = timer_list-active_timers; +for (;;) { +t = *pt; +if (!timer_expired_ns(t, expire_time)) { +break; +} +pt = t-next; +} +ts-expire_time = MAX(expire_time, 0); +ts-next = *pt; +*pt = ts; + +return pt == timer_list-active_timers; +} + +static void timerlist_rearm(QEMUTimerList *timer_list) +{ +/* Interrupt execution to force deadline recalculation. */ +qemu_clock_warp(timer_list-clock-type); +timerlist_notify(timer_list); +} + /* stop a timer, but do not dealloc it */ void timer_del(QEMUTimer *ts) { @@ -353,30 +381,15 @@ void timer_del(QEMUTimer *ts) void timer_mod_ns(QEMUTimer *ts, int64_t expire_time) { QEMUTimerList *timer_list = ts-timer_list; -QEMUTimer **pt, *t; +bool rearm; qemu_mutex_lock(timer_list-active_timers_lock); timer_del_locked(timer_list, ts); - -/* add the timer in the sorted list */ -pt = timer_list-active_timers; -for(;;) { -t = *pt; -if (!timer_expired_ns(t, expire_time)) { -break; -} -pt = t-next; -} -ts-expire_time = MAX(expire_time, 0); -ts-next = *pt; -*pt = ts; +rearm = timer_mod_ns_locked(timer_list, ts, expire_time); qemu_mutex_unlock(timer_list-active_timers_lock); -/* Rearm if necessary */ -if (pt == timer_list-active_timers) { -/* Interrupt execution to force deadline recalculation. */ -qemu_clock_warp(timer_list-clock-type); -timerlist_notify(timer_list); +if (rearm) { +timerlist_rearm(timer_list); } } -- 1.8.3.1 -- Alex Bligh
Re: [Qemu-devel] KVM Guest keymap issue
Hi, the strange thing is that all other keys and combinations work except those ccaron, Ccaron, scaron and Scaron, zcaron and ZCaron don't. In our language there are many words containing those chars and I really need to have them working. When looking at the sl keymap file, those codes, even for all other chars that I type with showkey --ascii, are different than the showkey outputs, but they work (except those mentioned above). Now I am totally confused on how could those that work, work ... Thanks for any enlightenments in advance :) Matej 2013/9/26 Matej Mailing mail...@tam.si: I am still pretty lost here, also after reading your link which shed a light to many things. Every suggestion and idea is very welcome! Thanks, Matej 2013/9/24 Markus Armbruster arm...@redhat.com: Not specific to KVM, adding qemu-devel. Matej Mailing mail...@tam.si writes: Dear list, I have a problem with a Windows XP guest that I connect to via VNC and is using sl keymap (option -k sl). The guest is Windows XP and the problematic characters are s, c and z with caron... when I type them via VNC, they are not printed at all in virtual system... I have checked the file /usr/share/kvm/keymaps/sl and it seems that it contains different codes than I get when doing showkey --ascii on the host machine (running Ubuntu 12.04). I have tried to change the KVM's keymap file 'sl' with the codes I get from showkey, but they are still not printed in virtual system to which I am connected via VNC... I am totally lost with this issue, thanks for your time and ideas. Required reading for anyone struggling with virtual keyboards: https://www.berrange.com/posts/2010/07/04/more-than-you-or-i-ever-wanted-to-know-about-virtual-keyboard-handling/
Re: [Qemu-devel] [PATCHv3 00/20] block: logical block provisioning enhancements
On 08.10.2013 10:59, Stefan Hajnoczi wrote: On Tue, Oct 8, 2013 at 10:01 AM, Peter Lieven p...@kamp.de wrote: On 08.10.2013 09:02, Stefan Hajnoczi wrote: On Mon, Oct 7, 2013 at 11:42 AM, Paolo Bonzini pbonz...@redhat.com wrote: Il 07/10/2013 10:42, Stefan Hajnoczi ha scritto: Could you make bdrv_co_write_zeroes() always use UNMAP, if possible, and avoid adding the new BDRV_REQ_MAY_UNMAP flag? While reading the first few patches in this series I wondered why there is a need to expose flags at all... Sometimes it is useful to distinguish between zeroing at the image format level from discarding at the device level, but I don't think we make use of that yet. I'd prefer to keep the interface simple for now and add flags later, if necessary. Or maybe I just missed something ;) The flag is needed to implement the right semantics for the SCSI WRITE SAME command, which are: - if the UNMAP bit is off, always write the sectors (that's bdrv_aio_write_zeroes without BDRV_REQ_MAY_UNMAP if the payload is zero, otherwise it's emulated with bdrv_aio_writev) - if the target can discard and write the specified payload, you can discard, else you must write the sectors with the correct payload (that's bdrv_aio_write_zeroes with BDRV_REQ_MAY_UNMAP). Contrast this with the UNMAP command, which does not make any guarantee on the content of the sectors after the command is completed (a few months ago we agreed that, even if you have discard_zeroes=true in the target, it is fine for UNMAP to do nothing). Okay, then let's keep the patches to expose the flag. Okay, then I can keep those. Can you give a short hint if my approach with brdv_make_empty is what you want? I would like to not change the parameters, so use BDRV_REQ_MAY_UNMAP unconditionally. int bdrv_make_empty(BlockDriverState *bs) The semantics of bdrv_make_empty() today are: deallocate all data in the top layer of the image file. If there is a backing file, reads will fall back to the backing file. The semantics that you want are zeroing the entire disk image (efficiently, when possible). A flags argument is needed to support both of sets of semantics. If you don't like that, then I suggest creating a new function called bdrv_make_zero(). Ok, that is what I would like to do. In this case I only have to rename bdrv_zeroize to bdrv_make_zero. Ok ? Peter
Re: [Qemu-devel] [PATCH 2/8] timers: add timer_mod_anticipate and timer_mod_anticipate_ns
Paolo, On 8 Oct 2013, at 09:47, Paolo Bonzini wrote: --- a/qemu-timer.c +++ b/qemu-timer.c @@ -393,11 +393,40 @@ void timer_mod_ns(QEMUTimer *ts, int64_t expire_time) } } +/* modify the current timer so that it will be fired when current_time + = expire_time or the current deadline, whichever comes earlier. + The corresponding callback will be called. */ +void timer_mod_anticipate_ns(QEMUTimer *ts, int64_t expire_time) +{ +QEMUTimerList *timer_list = ts-timer_list; +bool rearm; + +qemu_mutex_lock(timer_list-active_timers_lock); +if (ts-expire_time == -1 || ts-expire_time expire_time) { So if we want to alter it ... +if (ts-expire_time != -1) { +timer_del_locked(timer_list, ts); +} What's this bit for? Surely you've calculated whether you are shortening the expiry time (above), so all you need do now is modify it. Why delete it? timer_mod_ns doesn't make this check? Otherwise looks OK. +rearm = timer_mod_ns_locked(timer_list, ts, expire_time); +} else { +rearm = false; +} +qemu_mutex_unlock(timer_list-active_timers_lock); + +if (rearm) { +timerlist_rearm(timer_list); +} +} + void timer_mod(QEMUTimer *ts, int64_t expire_time) { timer_mod_ns(ts, expire_time * ts-scale); } +void timer_mod_anticipate(QEMUTimer *ts, int64_t expire_time) +{ +timer_mod_anticipate_ns(ts, expire_time * ts-scale); +} + bool timer_pending(QEMUTimer *ts) { return ts-expire_time = 0; -- 1.8.3.1 -- Alex Bligh
Re: [Qemu-devel] savevm/loadvm
Am 08.10.2013 um 11:04 hat Paolo Bonzini geschrieben: Il 08/10/2013 10:40, Alexey Kardashevskiy ha scritto: However qcow2_save_vmstate() sets bs-growable to 1 for a short time (commit 178e08a58f40dd5aef2ce774fe0850f5d0e56918 from 2009) and this triggers a branch in bdrv_co_do_writev() which changes bs-total_sectors. So when QEMU writes snapshots to the file, the disk_size field of a snapshot has bigger value (for example 0x4.007b.8180). I think you need to modify qcow2_save_vmstate to save and restore bs-total_sectors. Can you test that and if so post the patch? It's a regression introduced by commit df2a6f29, right? What you suggest probably works as a quick hack to fix the bug. The VM state functions in qcow2 are getting uglier and uglier, though. Maybe they should avoid going through block.c and adding hacks to disable or revert side effects. Kevin
Re: [Qemu-devel] [PATCH 2/8] timers: add timer_mod_anticipate and timer_mod_anticipate_ns
Il 08/10/2013 11:15, Alex Bligh ha scritto: So if we want to alter it ... +if (ts-expire_time != -1) { +timer_del_locked(timer_list, ts); +} What's this bit for? Surely you've calculated whether you are shortening the expiry time (above), so all you need do now is modify it. Why delete it? timer_mod_ns doesn't make this check? timer_mod_ns_locked does not remove the timer from the list: qemu_mutex_lock(timer_list-active_timers_lock); timer_del_locked(timer_list, ts); rearm = timer_mod_ns_locked(timer_list, ts, expire_time); qemu_mutex_unlock(timer_list-active_timers_lock); This is doing the same. The check in the if is not strictly necessary, but it saves a useless visit of the list. It could be added to timer_mod_ns as well. Paolo
Re: [Qemu-devel] [PATCHv3 00/20] block: logical block provisioning enhancements
On Tue, Oct 8, 2013 at 11:12 AM, Peter Lieven p...@kamp.de wrote: On 08.10.2013 10:59, Stefan Hajnoczi wrote: On Tue, Oct 8, 2013 at 10:01 AM, Peter Lieven p...@kamp.de wrote: On 08.10.2013 09:02, Stefan Hajnoczi wrote: On Mon, Oct 7, 2013 at 11:42 AM, Paolo Bonzini pbonz...@redhat.com wrote: Il 07/10/2013 10:42, Stefan Hajnoczi ha scritto: Could you make bdrv_co_write_zeroes() always use UNMAP, if possible, and avoid adding the new BDRV_REQ_MAY_UNMAP flag? While reading the first few patches in this series I wondered why there is a need to expose flags at all... Sometimes it is useful to distinguish between zeroing at the image format level from discarding at the device level, but I don't think we make use of that yet. I'd prefer to keep the interface simple for now and add flags later, if necessary. Or maybe I just missed something ;) The flag is needed to implement the right semantics for the SCSI WRITE SAME command, which are: - if the UNMAP bit is off, always write the sectors (that's bdrv_aio_write_zeroes without BDRV_REQ_MAY_UNMAP if the payload is zero, otherwise it's emulated with bdrv_aio_writev) - if the target can discard and write the specified payload, you can discard, else you must write the sectors with the correct payload (that's bdrv_aio_write_zeroes with BDRV_REQ_MAY_UNMAP). Contrast this with the UNMAP command, which does not make any guarantee on the content of the sectors after the command is completed (a few months ago we agreed that, even if you have discard_zeroes=true in the target, it is fine for UNMAP to do nothing). Okay, then let's keep the patches to expose the flag. Okay, then I can keep those. Can you give a short hint if my approach with brdv_make_empty is what you want? I would like to not change the parameters, so use BDRV_REQ_MAY_UNMAP unconditionally. int bdrv_make_empty(BlockDriverState *bs) The semantics of bdrv_make_empty() today are: deallocate all data in the top layer of the image file. If there is a backing file, reads will fall back to the backing file. The semantics that you want are zeroing the entire disk image (efficiently, when possible). A flags argument is needed to support both of sets of semantics. If you don't like that, then I suggest creating a new function called bdrv_make_zero(). Ok, that is what I would like to do. In this case I only have to rename bdrv_zeroize to bdrv_make_zero. Ok ? Yes, please. Stefan
[Qemu-devel] [PATCH 0/3] qapi: introduce BlockJobType enum
Currently, block job type is hard coded string and could be repeated in different places in the code base. Introduce a enum type in QAPI to make it better for maintenance and introspection. The old BlockJobType struct is renamed to BlockJobDriver and its field job_type becomes a BlockJobType enum. Nothing is changed to the interface. Fam Zheng (3): blockjob: rename BlockJobType to BlockJobDriver qapi: Introduce enum BlockJobType qapi: make use of new BlockJobType block/backup.c | 6 +++--- block/commit.c | 6 +++--- block/mirror.c | 6 +++--- block/stream.c | 6 +++--- blockjob.c | 22 +++--- include/block/blockjob.h | 14 +++--- qapi-schema.json | 18 ++ 7 files changed, 48 insertions(+), 30 deletions(-) -- 1.8.3.1
[Qemu-devel] [PATCH 1/3] blockjob: rename BlockJobType to BlockJobDriver
We will use BlockJobType as the enum type name of block jobs in QAPI, rename current BlockJobType to BlockJobDriver, which will eventually become a set of operations, similar to block drivers. Signed-off-by: Fam Zheng f...@redhat.com --- block/backup.c | 4 ++-- block/commit.c | 4 ++-- block/mirror.c | 4 ++-- block/stream.c | 4 ++-- blockjob.c | 22 +++--- include/block/blockjob.h | 12 ++-- 6 files changed, 25 insertions(+), 25 deletions(-) diff --git a/block/backup.c b/block/backup.c index 04c4b5c..d374472 100644 --- a/block/backup.c +++ b/block/backup.c @@ -202,7 +202,7 @@ static void backup_iostatus_reset(BlockJob *job) bdrv_iostatus_reset(s-target); } -static const BlockJobType backup_job_type = { +static const BlockJobDriver backup_job_driver = { .instance_size = sizeof(BackupBlockJob), .job_type = backup, .set_speed = backup_set_speed, @@ -370,7 +370,7 @@ void backup_start(BlockDriverState *bs, BlockDriverState *target, return; } -BackupBlockJob *job = block_job_create(backup_job_type, bs, speed, +BackupBlockJob *job = block_job_create(backup_job_driver, bs, speed, cb, opaque, errp); if (!job) { return; diff --git a/block/commit.c b/block/commit.c index ac4b7cc..5146138 100644 --- a/block/commit.c +++ b/block/commit.c @@ -173,7 +173,7 @@ static void commit_set_speed(BlockJob *job, int64_t speed, Error **errp) ratelimit_set_speed(s-limit, speed / BDRV_SECTOR_SIZE, SLICE_TIME); } -static const BlockJobType commit_job_type = { +static const BlockJobDriver commit_job_driver = { .instance_size = sizeof(CommitBlockJob), .job_type = commit, .set_speed = commit_set_speed, @@ -238,7 +238,7 @@ void commit_start(BlockDriverState *bs, BlockDriverState *base, } -s = block_job_create(commit_job_type, bs, speed, cb, opaque, errp); +s = block_job_create(commit_job_driver, bs, speed, cb, opaque, errp); if (!s) { return; } diff --git a/block/mirror.c b/block/mirror.c index 6e7a274..991cc24 100644 --- a/block/mirror.c +++ b/block/mirror.c @@ -525,7 +525,7 @@ static void mirror_complete(BlockJob *job, Error **errp) block_job_resume(job); } -static const BlockJobType mirror_job_type = { +static const BlockJobDriver mirror_job_driver = { .instance_size = sizeof(MirrorBlockJob), .job_type = mirror, .set_speed = mirror_set_speed, @@ -563,7 +563,7 @@ void mirror_start(BlockDriverState *bs, BlockDriverState *target, return; } -s = block_job_create(mirror_job_type, bs, speed, cb, opaque, errp); +s = block_job_create(mirror_job_driver, bs, speed, cb, opaque, errp); if (!s) { return; } diff --git a/block/stream.c b/block/stream.c index 45837f4..7f412bd 100644 --- a/block/stream.c +++ b/block/stream.c @@ -203,7 +203,7 @@ static void stream_set_speed(BlockJob *job, int64_t speed, Error **errp) ratelimit_set_speed(s-limit, speed / BDRV_SECTOR_SIZE, SLICE_TIME); } -static const BlockJobType stream_job_type = { +static const BlockJobDriver stream_job_driver = { .instance_size = sizeof(StreamBlockJob), .job_type = stream, .set_speed = stream_set_speed, @@ -224,7 +224,7 @@ void stream_start(BlockDriverState *bs, BlockDriverState *base, return; } -s = block_job_create(stream_job_type, bs, speed, cb, opaque, errp); +s = block_job_create(stream_job_driver, bs, speed, cb, opaque, errp); if (!s) { return; } diff --git a/blockjob.c b/blockjob.c index e7d49b7..6814e69 100644 --- a/blockjob.c +++ b/blockjob.c @@ -35,7 +35,7 @@ #include qmp-commands.h #include qemu/timer.h -void *block_job_create(const BlockJobType *job_type, BlockDriverState *bs, +void *block_job_create(const BlockJobDriver *driver, BlockDriverState *bs, int64_t speed, BlockDriverCompletionFunc *cb, void *opaque, Error **errp) { @@ -48,8 +48,8 @@ void *block_job_create(const BlockJobType *job_type, BlockDriverState *bs, bdrv_ref(bs); bdrv_set_in_use(bs, 1); -job = g_malloc0(job_type-instance_size); -job-job_type = job_type; +job = g_malloc0(driver-instance_size); +job-driver= driver; job-bs= bs; job-cb= cb; job-opaque= opaque; @@ -87,11 +87,11 @@ void block_job_set_speed(BlockJob *job, int64_t speed, Error **errp) { Error *local_err = NULL; -if (!job-job_type-set_speed) { +if (!job-driver-set_speed) { error_set(errp, QERR_NOT_SUPPORTED); return; } -job-job_type-set_speed(job, speed, local_err); +job-driver-set_speed(job, speed, local_err); if (error_is_set(local_err)) { error_propagate(errp, local_err); return; @@ -102,12 +102,12 @@ void
[Qemu-devel] [PATCH 2/3] qapi: Introduce enum BlockJobType
This will replace the open coded block job type string for mirror, commit and backup. Signed-off-by: Fam Zheng f...@redhat.com --- qapi-schema.json | 18 ++ 1 file changed, 18 insertions(+) diff --git a/qapi-schema.json b/qapi-schema.json index 145eca8..381ffbf 100644 --- a/qapi-schema.json +++ b/qapi-schema.json @@ -1366,6 +1366,24 @@ 'data': ['top', 'full', 'none'] } ## +# @BlockJobType: +# +# Type of a block job. +# +# @commit: block commit job type, see block-commit +# +# @stream: block stream job type, see block-stream +# +# @mirror: drive mirror job type, see drive-mirror +# +# @backup: drive backup job type, see drive-backup +# +# Since: 1.7 +## +{ 'enum': 'BlockJobType', + 'data': ['commit', 'stream', 'mirror', 'backup'] } + +## # @BlockJobInfo: # # Information about a long-running block device operation. -- 1.8.3.1
[Qemu-devel] [PATCH 3/3] qapi: make use of new BlockJobType
Switch the string to enum type BlockJobType in BlockJobDriver. Signed-off-by: Fam Zheng f...@redhat.com --- block/backup.c | 2 +- block/commit.c | 2 +- block/mirror.c | 2 +- block/stream.c | 2 +- blockjob.c | 4 ++-- include/block/blockjob.h | 2 +- 6 files changed, 7 insertions(+), 7 deletions(-) diff --git a/block/backup.c b/block/backup.c index d374472..cad14c9 100644 --- a/block/backup.c +++ b/block/backup.c @@ -204,7 +204,7 @@ static void backup_iostatus_reset(BlockJob *job) static const BlockJobDriver backup_job_driver = { .instance_size = sizeof(BackupBlockJob), -.job_type = backup, +.job_type = BLOCK_JOB_TYPE_BACKUP, .set_speed = backup_set_speed, .iostatus_reset = backup_iostatus_reset, }; diff --git a/block/commit.c b/block/commit.c index 5146138..d4090cb 100644 --- a/block/commit.c +++ b/block/commit.c @@ -175,7 +175,7 @@ static void commit_set_speed(BlockJob *job, int64_t speed, Error **errp) static const BlockJobDriver commit_job_driver = { .instance_size = sizeof(CommitBlockJob), -.job_type = commit, +.job_type = BLOCK_JOB_TYPE_COMMIT, .set_speed = commit_set_speed, }; diff --git a/block/mirror.c b/block/mirror.c index 991cc24..7b95acf 100644 --- a/block/mirror.c +++ b/block/mirror.c @@ -527,7 +527,7 @@ static void mirror_complete(BlockJob *job, Error **errp) static const BlockJobDriver mirror_job_driver = { .instance_size = sizeof(MirrorBlockJob), -.job_type = mirror, +.job_type = BLOCK_JOB_TYPE_MIRROR, .set_speed = mirror_set_speed, .iostatus_reset= mirror_iostatus_reset, .complete = mirror_complete, diff --git a/block/stream.c b/block/stream.c index 7f412bd..694fd42 100644 --- a/block/stream.c +++ b/block/stream.c @@ -205,7 +205,7 @@ static void stream_set_speed(BlockJob *job, int64_t speed, Error **errp) static const BlockJobDriver stream_job_driver = { .instance_size = sizeof(StreamBlockJob), -.job_type = stream, +.job_type = BLOCK_JOB_TYPE_STREAM, .set_speed = stream_set_speed, }; diff --git a/blockjob.c b/blockjob.c index 6814e69..9e5fd5c 100644 --- a/blockjob.c +++ b/blockjob.c @@ -209,7 +209,7 @@ void block_job_sleep_ns(BlockJob *job, QEMUClockType type, int64_t ns) BlockJobInfo *block_job_query(BlockJob *job) { BlockJobInfo *info = g_new0(BlockJobInfo, 1); -info-type = g_strdup(job-driver-job_type); +info-type = g_strdup(BlockJobType_lookup[job-driver-job_type]); info-device= g_strdup(bdrv_get_device_name(job-bs)); info-len = job-len; info-busy = job-busy; @@ -236,7 +236,7 @@ QObject *qobject_from_block_job(BlockJob *job) 'len': % PRId64 , 'offset': % PRId64 , 'speed': % PRId64 }, - job-driver-job_type, + BlockJobType_lookup[job-driver-job_type], bdrv_get_device_name(job-bs), job-len, job-offset, diff --git a/include/block/blockjob.h b/include/block/blockjob.h index 99359b5..d76de62 100644 --- a/include/block/blockjob.h +++ b/include/block/blockjob.h @@ -37,7 +37,7 @@ typedef struct BlockJobDriver { size_t instance_size; /** String describing the operation, part of query-block-jobs QMP API */ -const char *job_type; +BlockJobType job_type; /** Optional callback for job types that support setting a speed limit */ void (*set_speed)(BlockJob *job, int64_t speed, Error **errp); -- 1.8.3.1
Re: [Qemu-devel] [PATCH v4 6/7] commit: remove unused check
On Mon, 09/30 14:17, Paolo Bonzini wrote: Il 30/09/2013 14:02, Fam Zheng ha scritto: We support top == active for commit now, remove the check which is dead code now. Signed-off-by: Fam Zheng f...@redhat.com --- block/commit.c | 7 --- 1 file changed, 7 deletions(-) diff --git a/block/commit.c b/block/commit.c index ac4b7cc..086f8c9 100644 --- a/block/commit.c +++ b/block/commit.c @@ -198,13 +198,6 @@ void commit_start(BlockDriverState *bs, BlockDriverState *base, return; } -/* Once we support top == active layer, remove this check */ -if (top == bs) { -error_setg(errp, - Top image as the active layer is currently unsupported); -return; -} - if (top == base) { error_setg(errp, Invalid files for merge: top and base are the same); return; Perhaps this could even become an assertion, or it could take care of calling commit_active_start? Good idea. I'll make it an assertion. Thanks, Fam
Re: [Qemu-devel] savevm/loadvm
Il 08/10/2013 11:23, Kevin Wolf ha scritto: I think you need to modify qcow2_save_vmstate to save and restore bs-total_sectors. Can you test that and if so post the patch? It's a regression introduced by commit df2a6f29, right? Yes, that's what introduced the if. What you suggest probably works as a quick hack to fix the bug. The VM state functions in qcow2 are getting uglier and uglier, though. Maybe they should avoid going through block.c and adding hacks to disable or revert side effects. Yes, that would work too. Paolo
Re: [Qemu-devel] BUG: RTC issue when Windows guest is idle
Hi: I have met the same bug that windows2008 guest stop receive the RTC ticks when it in idle status by fortuitous. When vnc connect, guest will resume to receive RTC ticks and the time run fast because of the coalesced timer HPET is diabled, and RTC is set catchup, as following: clock offset='utc' timer name='rtc' tickpolicy='catchup' track='guest'/ timer name='hpet' present='no'/ /clock My kvm module is version3.6. Should I upgrade it to latest version. Any suggestion is welcome ! Thanks! --xiangyouxie On Thu, Feb 21, 2013 at 06:16:10PM +, Matthew Anderson wrote: If this isn't the correct list just let me know, I've run into a bug whereby a Windows guest (tested on Server 2008R2 and 2012) no longer receives RTC ticks when it has been idle for a random amount of time. HPET is disabled and the guest is running Hyper-V relaxed timers (same situation without hv_relaxed). The guest clock stands still and the qemu process uses very little CPU (0.5%, normally it's 5% when the guest is idle) . Eventually the guest stops responding to network requests but if you open the guest console via VNC and move the mouse around it comes back to life and QEMU replays the lost RTC ticks and the guest recovers. I've also been able to make it recover by querying the clock over the network via the net time command, you can see the clock stand still for 30 seconds then it replays the ticks and catches up. I've tried to reproduce the issue but it seems fairly illusive, the only way I've been able to reproduce it is by letting the VM's idle and waiting. Sometimes it's hours and sometimes minutes. Can anyone suggest a way to narrow the issue down? Qemu command line is- /usr/bin/kvm -name SQL01 -S -M pc-0.14 -cpu qemu64,hv_relaxed -enable-kvm -m 2048 -smp 2,sockets=2,cores=1,threads=1 -uuid 5f54333b-c250-aa72-c979-39d156814b85 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/iHost-SQL01.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime -no-hpet -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/mnt/gluster1-norep/iHost/SQL01.qed,if=none,id=drive-virtio-disk0,format=qed,cache=writeback -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0 -drive file=/mnt/gluster1-norep/iHost/SQL01-Data.qed,if=none,id=drive-virtio-disk2,format=qed,cache=writeback -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk2,id=virtio-disk2 -drive if=none,id=drive-ide0-1-0,readonly=on,format=raw -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0,bootindex=1 -netdev tap,fd=29,id=hostnet0,vhost=on,vhostfd=39 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:2c:8d:23,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0 -vnc 127.0.0.1:22 -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 Environment is - Mainline 3.7.5 and 3.8.0 Qemu 1.2.2, 1.3.1 and 1.4.0
Re: [Qemu-devel] [PATCH v2 14/17] qemu-iotests: Check autodel behaviour for device_del
Am 01.10.2013 um 19:06 hat Eric Blake geschrieben: On 10/01/2013 07:20 AM, Kevin Wolf wrote: Block devices creates with -drive and drive_add should automatically disappear if the guest device is unplugged. blockdev-add ones shouldn't. Signed-off-by: Kevin Wolf kw...@redhat.com Reviewed-by: Max Reitz mre...@redhat.com --- tests/qemu-iotests/064 | 133 +++ tests/qemu-iotests/064.out | 80 +++ tests/qemu-iotests/common.filter | 8 +++ tests/qemu-iotests/group | 1 + 4 files changed, 222 insertions(+) create mode 100755 tests/qemu-iotests/064 create mode 100644 tests/qemu-iotests/064.out Reviewed-by: Eric Blake ebl...@redhat.com + +_supported_fmt qcow2 +_supported_proto file +_supported_os Linux Is this test really Linux-only? Probably most tests aren't, but all of them have _supported_os Linux. If anyone wanted to use this for something, they'd have to make a bigger change than just to this patch. Kevin
Re: [Qemu-devel] [RFC] qapi/error: Optional error propagation backtrace
On 2013-09-09 15:53, Max Reitz wrote: Add a configure switch which enables an error propagation backtrace. This results in the error_set function prepending every message by the source file name, function and line in which it was called, as well as error_propagate appending this information to the propagated message, resulting in a backtrace. Signed-off-by: Max Reitz mre...@redhat.com Ping – any comments on this? --- Since this obviously breaks existing tests (and cannot really be used for new tests since it includes a line number, which is a rather volatile output) and is generally not very useful to normal users, the switch is disabled by default. This functionality is intended for developers tracking down error paths. Since I do not know whether I am the only one considering it useful in the first place, this is just an RFC for now. Furthermore, I am not sure whether a configure switch is really the right way to implement this. --- configure| 12 +++ include/qapi/error.h | 40 +++- util/error.c | 57 +++- 3 files changed, 90 insertions(+), 19 deletions(-) diff --git a/configure b/configure index e989609..4e43f7b 100755 --- a/configure +++ b/configure @@ -243,6 +243,7 @@ gtk= gtkabi=2.0 tpm=no libssh2= +error_backtrace=no # parse CC options first for opt do @@ -949,6 +950,10 @@ for opt do ;; --enable-libssh2) libssh2=yes ;; + --enable-error-bt) error_backtrace=yes + ;; + --disable-error-bt) error_backtrace=no + ;; *) echo ERROR: unknown option $opt; show_help=yes ;; esac @@ -1168,6 +1173,8 @@ echo --gcov=GCOV use specified gcov [$gcov_tool] echo --enable-tpm enable TPM support echo --disable-libssh2disable ssh block device support echo --enable-libssh2 enable ssh block device support +echo --disable-error-bt disable backtraces on internal errors +echo --enable-error-btenable backtraces on internal errors echo echo NOTE: The object files are built at the place where configure is launched exit 1 @@ -3650,6 +3657,7 @@ echo gcov $gcov_tool echo gcov enabled $gcov echo TPM support $tpm echo libssh2 support $libssh2 +echo error backtraces $error_backtrace echo TPM passthrough $tpm_passthrough echo QOM debugging $qom_cast_debug @@ -4044,6 +4052,10 @@ if test $libssh2 = yes ; then echo CONFIG_LIBSSH2=y $config_host_mak fi +if test $error_backtrace = yes ; then + echo CONFIG_ERROR_BACKTRACE=y $config_host_mak +fi + if test $virtio_blk_data_plane = yes ; then echo 'CONFIG_VIRTIO_BLK_DATA_PLANE=$(CONFIG_VIRTIO)' $config_host_mak fi diff --git a/include/qapi/error.h b/include/qapi/error.h index ffd1cea..d3cfe35 100644 --- a/include/qapi/error.h +++ b/include/qapi/error.h @@ -25,16 +25,28 @@ typedef struct Error Error; /** * Set an indirect pointer to an error given a ErrorClass value and a * printf-style human message. This function is not meant to be used outside - * of QEMU. + * of QEMU. The message will be prepended by the file/function/line information + * if CONFIG_ERROR_BACKTRACE is defined. */ -void error_set(Error **err, ErrorClass err_class, const char *fmt, ...) GCC_FMT_ATTR(3, 4); +void error_set_bt(const char *file, const char *func, int line, + Error **err, ErrorClass err_class, const char *fmt, ...) +GCC_FMT_ATTR(6, 7); + +#define error_set(...) \ +error_set_bt(__FILE__, __func__, __LINE__, __VA_ARGS__) /** * Set an indirect pointer to an error given a ErrorClass value and a * printf-style human message, followed by a strerror() string if - * @os_error is not zero. + * @os_error is not zero. The message will be prepended by the + * file/function/line information if CONFIG_ERROR_BACKTRACE is defined. */ -void error_set_errno(Error **err, int os_error, ErrorClass err_class, const char *fmt, ...) GCC_FMT_ATTR(4, 5); +void error_set_errno_bt(const char *file, const char *func, int line, +Error **err, int os_error, ErrorClass err_class, +const char *fmt, ...) GCC_FMT_ATTR(7, 8); + +#define error_set_errno(...) \ +error_set_errno_bt(__FILE__, __func__, __LINE__, __VA_ARGS__) /** * Same as error_set(), but sets a generic error @@ -47,7 +59,11 @@ void error_set_errno(Error **err, int os_error, ErrorClass err_class, const char /** * Helper for open() errors */ -void error_setg_file_open(Error **errp, int os_errno, const char *filename); +void error_setg_file_open_bt(const char *file, const char *func, int line, + Error **errp, int os_errno, const char *filename); + +#define error_setg_file_open(...) \ +error_setg_file_open_bt(__FILE__, __func__, __LINE__, __VA_ARGS__) /** * Returns true if an indirect pointer to an error is
[Qemu-devel] [PATCH] osdep: initialize glib threads in all QEMU tools
glib versions prior to 2.31.0 require an explicit g_thread_init() call to enable multi-threading. Failure to initialize threading causes glib to take single-threaded code paths without synchronization. For example, the g_slice allocator will crash due to race conditions. Fix this for all QEMU tool programs (qemu-nbd, qemu-io, qemu-img) by moving the g_thread_init() call from vl.c:main() into a new osdep.c:thread_init() constructor function. thread_init() has __attribute__((constructor)) and is automatically invoked by the runtime during startup. We can now drop the simple trace backend's g_thread_init() call since thread_init() already called it. Note that we must keep coroutine-gthread.c's g_thread_init() call which is located in a constructor function. There is no guarantee for constructor function ordering so thread_init() may only be called later. Reported-by: Mario de Chenno mario.deche...@unina2.it Signed-off-by: Stefan Hajnoczi stefa...@redhat.com --- trace/simple.c | 9 - util/osdep.c | 18 ++ vl.c | 8 3 files changed, 18 insertions(+), 17 deletions(-) diff --git a/trace/simple.c b/trace/simple.c index 1e3f691..7833309 100644 --- a/trace/simple.c +++ b/trace/simple.c @@ -404,15 +404,6 @@ bool trace_backend_init(const char *events, const char *file) { GThread *thread; -if (!g_thread_supported()) { -#if !GLIB_CHECK_VERSION(2, 31, 0) -g_thread_init(NULL); -#else -fprintf(stderr, glib threading failed to initialize.\n); -exit(1); -#endif -} - #if !GLIB_CHECK_VERSION(2, 31, 0) trace_available_cond = g_cond_new(); trace_empty_cond = g_cond_new(); diff --git a/util/osdep.c b/util/osdep.c index 62072b4..e29ba6f 100644 --- a/util/osdep.c +++ b/util/osdep.c @@ -437,6 +437,24 @@ int socket_init(void) return 0; } +/* Ensure that glib is running in multi-threaded mode */ +static void __attribute__((constructor)) thread_init(void) +{ +if (!g_thread_supported()) { +#if !GLIB_CHECK_VERSION(2, 31, 0) +/* Old versions of glib require explicit initialization. Failure to do + * this results in the single-threaded code paths being taken inside + * glib. For example, the g_slice allocator will not be thread-safe + * and cause crashes. + */ +g_thread_init(NULL); +#else +fprintf(stderr, glib threading failed to initialize.\n); +exit(1); +#endif +} +} + #ifndef CONFIG_IOVEC /* helper function for iov_send_recv() */ static ssize_t diff --git a/vl.c b/vl.c index 983cdc6..e2dee8e 100644 --- a/vl.c +++ b/vl.c @@ -2857,14 +2857,6 @@ int main(int argc, char **argv, char **envp) error_set_progname(argv[0]); g_mem_set_vtable(mem_trace); -if (!g_thread_supported()) { -#if !GLIB_CHECK_VERSION(2, 31, 0) -g_thread_init(NULL); -#else -fprintf(stderr, glib threading failed to initialize.\n); -exit(1); -#endif -} module_call_init(MODULE_INIT_QOM); -- 1.8.3.1
[Qemu-devel] [Bug 1233225] Re: mips/mipsel linux user float division problem
This is a known issue. There was a fix proposal by Thomas Schwinge back in June http://patchwork.ozlabs.org/patch/250161/ but he has not updated the patch per suggestion ever since, though the patch as is was much closer to correct behaviour than what it is now in the source. If anyone is in hurry, he/she can pick up that change. -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1233225 Title: mips/mipsel linux user float division problem Status in QEMU: Confirmed Bug description: Hi, I tested the following with the qemu git HEAD as of 2013-09-30 on Debian stable and testing. My host runs amd64 but I also tried this out inside a i386 chroot with the same result. The problem occurs for mips and mipsel. Given the following program: #include stdio.h int main(int argc, char **argv) { int a = 1; double d = a/2.0; printf(%f\n, d); return 0; } Instead of printing 0.5, it will print 2.0 if executed in qemu user mode. $ mipsel-linux-gnu-gcc mipstest.c $ ~/qemu/mipsel-linux-user/qemu-mipsel ./a.out 2.0 Expecting this to be a problem with my cross compiler (gcc-4.4 from emdebian) I ran a fully emulated debian squeeze environment inside qemu. There, I compiled the same program natively with gcc and as expected got 0.5 as the output. I also copied the cross compiled binary inside the emulated environment and also got 0.5 when I ran it. So the same mips/mipsel binary produces different output depending on whether it is run in a fully emulated environment or qemu user mode. Can anybody else reproduce this problem? To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1233225/+subscriptions
Re: [Qemu-devel] [PATCH v2 2/2] KVM: s390: add floating irq controller
On Sat, Oct 05, 2013 at 01:53:33AM +0200, Alexander Graf wrote: On 06.09.2013, at 14:19, Jens Freimann wrote: [snip] -int kvm_s390_inject_vm(struct kvm *kvm, - struct kvm_s390_interrupt *s390int) +static void __inject_vm(struct kvm *kvm, struct kvm_s390_interrupt_info *inti) This really doesn't only inject, it enqueues an interrupt and also injects them then, right? hmm, maybe the term injecting is misleading. What is meant by injecting here is simply adding it to the list. The actual injection into the guest lowcore is done by the do_deliver_interrupt function. { struct kvm_s390_local_interrupt *li; struct kvm_s390_float_interrupt *fi; - struct kvm_s390_interrupt_info *inti, *iter; + struct kvm_s390_interrupt_info *iter; int sigcpu; + mutex_lock(kvm-lock); + fi = kvm-arch.float_int; You probably want to move this into your device structure eventually. Yes, good point. + spin_lock(fi-lock); + if (!is_ioint(inti-type)) { + list_add_tail(inti-list, fi-list); + } else { + u64 isc_bits = int_word_to_isc_bits(inti-io.io_int_word); + + /* Keep I/O interrupts sorted in isc order. */ + list_for_each_entry(iter, fi-list, list) { + if (!is_ioint(iter-type)) + continue; + if (int_word_to_isc_bits(iter-io.io_int_word) + = isc_bits) + continue; + break; + } + list_add_tail(inti-list, iter-list); + } + atomic_set(fi-active, 1); + sigcpu = find_first_bit(fi-idle_mask, KVM_MAX_VCPUS); + if (sigcpu == KVM_MAX_VCPUS) { + do { + sigcpu = fi-next_rr_cpu++; + if (sigcpu == KVM_MAX_VCPUS) + sigcpu = fi-next_rr_cpu = 0; + } while (fi-local_int[sigcpu] == NULL); + } + li = fi-local_int[sigcpu]; + spin_lock_bh(li-lock); + atomic_set_mask(CPUSTAT_EXT_INT, li-cpuflags); + if (waitqueue_active(li-wq)) + wake_up_interruptible(li-wq); Does this kick the other vcpu to notify it that an irq is available? The code is very spaghetti style. There's certainly a good opportunity to split things up and make it more readable here ;). I think splitting it into enqueue and target cpu deliver parts makes a lot of sense for starters. I'm working on a larger patchset that changes interrupt injection and delivery. The rationale is to make it more readable, use less locking (get rid of lists use bitmaps) and become more PoP-compliant. I'll try to make it more obvious. Btw, usually the way these interrupt controllers work in KVM is that the CPU local interrupt controller part fetches interrupts from the floating interrupt controller. So in here you would only enqueue and maybe kick a potential receiver of the interrupt. Before a vcpu goes back into guest state, it asks the floating interrupt controller whether there's anything pending for it. If so, it delivers it. I think that's kind of what we're doing here. We enqueue by adding it to our list of pending interrupts, look for an idle cpu, set the external interrupt flag and wake it up. That way even if the vcpu you chose here happens to get preempted, you still deliver the interrupt quickly through another vcpu. + spin_unlock_bh(li-lock); + spin_unlock(fi-lock); + mutex_unlock(kvm-lock); +} + +int kvm_s390_inject_vm(struct kvm *kvm, + struct kvm_s390_interrupt *s390int) +{ + struct kvm_s390_interrupt_info *inti; + inti = kzalloc(sizeof(*inti), GFP_KERNEL); if (!inti) return -ENOMEM; - switch (s390int-type) { + inti-type = s390int-type; + switch (inti-type) { case KVM_S390_INT_VIRTIO: VM_EVENT(kvm, 5, inject: virtio parm:%x,parm64:%llx, s390int-parm, s390int-parm64); - inti-type = s390int-type; inti-ext.ext_params = s390int-parm; inti-ext.ext_params2 = s390int-parm64; break; case KVM_S390_INT_SERVICE: VM_EVENT(kvm, 5, inject: sclp parm:%x, s390int-parm); - inti-type = s390int-type; inti-ext.ext_params = s390int-parm; break; - case KVM_S390_PROGRAM_INT: - case KVM_S390_SIGP_STOP: - case KVM_S390_INT_EXTERNAL_CALL: - case KVM_S390_INT_EMERGENCY: - kfree(inti); - return -EINVAL; case KVM_S390_MCHK: VM_EVENT(kvm, 5, inject: machine check parm64:%llx, s390int-parm64); - inti-type = s390int-type; inti-mchk.cr14 = s390int-parm; /* upper bits are not used */ inti-mchk.mcic = s390int-parm64; break; case
Re: [Qemu-devel] [PATCHv5] block/get_block_status: avoid redundant callouts on raw devices
On Mon, Oct 7, 2013 at 10:25 AM, Paolo Bonzini pbonz...@redhat.com wrote: Il 07/10/2013 07:59, Peter Lieven ha scritto: if a raw device like an iscsi target or host device is used the current implementation makes a second call out to get the block status of bs-file. Signed-off-by: Peter Lieven p...@kamp.de --- v5: add a generic get_lba_status function in the raw driver which adds the BDRV_BLOCK_RAW flag. bdrv_co_get_block_status will handle the callout to bs-file then. v4: use a flag to detect the raw driver instead of the strncmp hack block.c |4 block/raw_bsd.c |3 ++- include/block/block.h |4 3 files changed, 10 insertions(+), 1 deletion(-) diff --git a/block.c b/block.c index 93e113a..38a589e 100644 --- a/block.c +++ b/block.c @@ -3147,6 +3147,10 @@ static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs, return ret; } +if (ret BDRV_BLOCK_RAW) { +return bdrv_get_block_status(bs-file, sector_num, nb_sectors, pnum); Strictly speaking, this should probably do something like this: assert(ret BDRV_BLOCK_OFFSET_VALID); return bdrv_get_block_status(bs-file, ret BDRV_SECTOR_BITS, nb_sectors, pnum); Or alternatively the raw driver should return just BDRV_BLOCK_RAW. As a third option, the raw driver could also return not just BDRV_BLOCK_RAW and BDRV_BLOCK_OFFSET_VALID, but also BDRV_BLOCK_DATA (so that the answer makes some sense even without going down to bs-file). But I'll let the block maintainers decide what to do. The return value is a bitfield, so let's use those bits and take Option 3. Stefan
[Qemu-devel] centos5 32bit and debian6 32bit problem with virtio-serial (guest-agent)
Hi, I've enabled qemu-guest-agent for a VM. When the guest is centos6, I can properly find /dev/virtio-ports/org.qemu.guest_agent.0. But when the guest is either centos5 32bit or debian6 32bit (these are the OSes I've checked so far), I cannot find the device in /dev. The command 'lspci' displays 00:08.0 Communication controller: Red Hat, Inc Virtio console. I'm not sure if there is a problem with the driver or it's just mapped with other names in /dev. I would be glad if you assist. Thanks, Hamed Afshar
[Qemu-devel] [PATCHv4 03/17] block: introduce BDRV_REQ_MAY_UNMAP request flag
Reviewed-by: Eric Blake ebl...@redhat.com Signed-off-by: Peter Lieven p...@kamp.de --- block-migration.c |3 ++- block.c |4 block/backup.c|2 +- include/block/block.h |7 +++ 4 files changed, 14 insertions(+), 2 deletions(-) diff --git a/block-migration.c b/block-migration.c index 713a8e3..fc4ef93 100644 --- a/block-migration.c +++ b/block-migration.c @@ -780,7 +780,8 @@ static int block_load(QEMUFile *f, void *opaque, int version_id) } if (flags BLK_MIG_FLAG_ZERO_BLOCK) { -ret = bdrv_write_zeroes(bs, addr, nr_sectors, 0); +ret = bdrv_write_zeroes(bs, addr, nr_sectors, +BDRV_REQ_MAY_UNMAP); } else { buf = g_malloc(BLOCK_SIZE); qemu_get_buffer(f, buf, BLOCK_SIZE); diff --git a/block.c b/block.c index 8ed53a2..0675257 100644 --- a/block.c +++ b/block.c @@ -2801,6 +2801,10 @@ int coroutine_fn bdrv_co_write_zeroes(BlockDriverState *bs, { trace_bdrv_co_write_zeroes(bs, sector_num, nb_sectors); +if (!(bs-open_flags BDRV_O_UNMAP)) { +flags = ~BDRV_REQ_MAY_UNMAP; +} + return bdrv_co_do_writev(bs, sector_num, nb_sectors, NULL, BDRV_REQ_ZERO_WRITE | flags); } diff --git a/block/backup.c b/block/backup.c index 5f6a642..1ab080d 100644 --- a/block/backup.c +++ b/block/backup.c @@ -139,7 +139,7 @@ static int coroutine_fn backup_do_cow(BlockDriverState *bs, if (buffer_is_zero(iov.iov_base, iov.iov_len)) { ret = bdrv_co_write_zeroes(job-target, start * BACKUP_SECTORS_PER_CLUSTER, - n, 0); + n, BDRV_REQ_MAY_UNMAP); } else { ret = bdrv_co_writev(job-target, start * BACKUP_SECTORS_PER_CLUSTER, n, diff --git a/include/block/block.h b/include/block/block.h index 47c6475..b72c81a 100644 --- a/include/block/block.h +++ b/include/block/block.h @@ -65,6 +65,13 @@ typedef struct BlockDevOps { typedef enum { BDRV_REQ_COPY_ON_READ = 0x1, BDRV_REQ_ZERO_WRITE = 0x2, +/* The BDRV_REQ_MAY_UNMAP flag is used to indicate that the block driver + * is allowed to optimize a write zeroes request by unmapping (discarding) + * blocks if it is guaranteed that the result will read back as + * zeroes. The flag is only passed to the driver if the block device is + * opened with BDRV_O_UNMAP. + */ +BDRV_REQ_MAY_UNMAP= 0x4, } BdrvRequestFlags; #define BDRV_O_RDWR0x0002 -- 1.7.9.5
[Qemu-devel] [PATCHv4 05/17] block/raw: add bdrv_has_discard_zeroes and bdrv_has_discard_write_zeroes
Reviewed-by: Eric Blake ebl...@redhat.com Signed-off-by: Peter Lieven p...@kamp.de --- block/raw_bsd.c | 56 +-- 1 file changed, 34 insertions(+), 22 deletions(-) diff --git a/block/raw_bsd.c b/block/raw_bsd.c index d5ab295..8dc7bba 100644 --- a/block/raw_bsd.c +++ b/block/raw_bsd.c @@ -131,6 +131,16 @@ static int raw_has_zero_init(BlockDriverState *bs) return bdrv_has_zero_init(bs-file); } +static bool raw_has_discard_zeroes(BlockDriverState *bs) +{ +return bdrv_has_discard_zeroes(bs-file); +} + +static bool raw_has_discard_write_zeroes(BlockDriverState *bs) +{ +return bdrv_has_discard_write_zeroes(bs-file); +} + static int raw_create(const char *filename, QEMUOptionParameter *options, Error **errp) { @@ -165,28 +175,30 @@ static int raw_probe(const uint8_t *buf, int buf_size, const char *filename) } static BlockDriver bdrv_raw = { -.format_name = raw, -.bdrv_probe = raw_probe, -.bdrv_reopen_prepare = raw_reopen_prepare, -.bdrv_open= raw_open, -.bdrv_close = raw_close, -.bdrv_create = raw_create, -.bdrv_co_readv= raw_co_readv, -.bdrv_co_writev = raw_co_writev, -.bdrv_co_write_zeroes = raw_co_write_zeroes, -.bdrv_co_discard = raw_co_discard, -.bdrv_co_get_block_status = raw_co_get_block_status, -.bdrv_truncate= raw_truncate, -.bdrv_getlength = raw_getlength, -.bdrv_get_info= raw_get_info, -.bdrv_is_inserted = raw_is_inserted, -.bdrv_media_changed = raw_media_changed, -.bdrv_eject = raw_eject, -.bdrv_lock_medium = raw_lock_medium, -.bdrv_ioctl = raw_ioctl, -.bdrv_aio_ioctl = raw_aio_ioctl, -.create_options = raw_create_options[0], -.bdrv_has_zero_init = raw_has_zero_init +.format_name = raw, +.bdrv_probe= raw_probe, +.bdrv_reopen_prepare = raw_reopen_prepare, +.bdrv_open = raw_open, +.bdrv_close= raw_close, +.bdrv_create = raw_create, +.bdrv_co_readv = raw_co_readv, +.bdrv_co_writev= raw_co_writev, +.bdrv_co_write_zeroes = raw_co_write_zeroes, +.bdrv_co_discard = raw_co_discard, +.bdrv_co_get_block_status = raw_co_get_block_status, +.bdrv_truncate = raw_truncate, +.bdrv_getlength= raw_getlength, +.bdrv_get_info = raw_get_info, +.bdrv_is_inserted = raw_is_inserted, +.bdrv_media_changed= raw_media_changed, +.bdrv_eject= raw_eject, +.bdrv_lock_medium = raw_lock_medium, +.bdrv_ioctl= raw_ioctl, +.bdrv_aio_ioctl= raw_aio_ioctl, +.create_options= raw_create_options[0], +.bdrv_has_zero_init= raw_has_zero_init, +.bdrv_has_discard_zeroes = raw_has_discard_zeroes, +.bdrv_has_discard_write_zeroes = raw_has_discard_write_zeroes, }; static void bdrv_raw_init(void) -- 1.7.9.5
[Qemu-devel] [PATCHv4 01/17] block: make BdrvRequestFlags public
Reviewed-by: Eric Blake ebl...@redhat.com Signed-off-by: Peter Lieven p...@kamp.de --- block.c |5 - include/block/block.h |5 + 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/block.c b/block.c index 93e113a..08cba1e 100644 --- a/block.c +++ b/block.c @@ -51,11 +51,6 @@ #define NOT_DONE 0x7fff /* used while emulated sync operation in progress */ -typedef enum { -BDRV_REQ_COPY_ON_READ = 0x1, -BDRV_REQ_ZERO_WRITE = 0x2, -} BdrvRequestFlags; - static void bdrv_dev_change_media_cb(BlockDriverState *bs, bool load); static BlockDriverAIOCB *bdrv_aio_readv_em(BlockDriverState *bs, int64_t sector_num, QEMUIOVector *qiov, int nb_sectors, diff --git a/include/block/block.h b/include/block/block.h index f808550..9fb0c3d 100644 --- a/include/block/block.h +++ b/include/block/block.h @@ -62,6 +62,11 @@ typedef struct BlockDevOps { void (*resize_cb)(void *opaque); } BlockDevOps; +typedef enum { +BDRV_REQ_COPY_ON_READ = 0x1, +BDRV_REQ_ZERO_WRITE = 0x2, +} BdrvRequestFlags; + #define BDRV_O_RDWR0x0002 #define BDRV_O_SNAPSHOT0x0008 /* open the file read only and save writes in a snapshot */ #define BDRV_O_NOCACHE 0x0020 /* do not use the host page cache */ -- 1.7.9.5
[Qemu-devel] [PATCHv4 06/17] block: add BlockLimits structure to BlockDriverState
this patch adds BlockLimits which introduces discard and write_zeroes limits and alignment information to the BlockDriverState. Signed-off-by: Peter Lieven p...@kamp.de --- include/block/block_int.h | 17 + 1 file changed, 17 insertions(+) diff --git a/include/block/block_int.h b/include/block/block_int.h index ce5c62e..775b7ab 100644 --- a/include/block/block_int.h +++ b/include/block/block_int.h @@ -233,6 +233,20 @@ struct BlockDriver { QLIST_ENTRY(BlockDriver) list; }; +typedef struct BlockLimits { +/* maximum number of sectors that can be discarded at once */ +int max_discard; + +/* optimal alignment for discard requests in sectors */ +int64_t discard_alignment; + +/* maximum number of sectors that can zeroized at once */ +int max_write_zeroes; + +/* optimal alignment for write zeroes requests in sectors */ +int64_t write_zeroes_alignment; +} BlockLimits; + /* * Note: the function bdrv_append() copies and swaps contents of * BlockDriverStates, so if you add new fields to this struct, please @@ -286,6 +300,9 @@ struct BlockDriverState { uint64_t total_time_ns[BDRV_MAX_IOTYPE]; uint64_t wr_highest_sector; +/* I/O Limits */ +BlockLimits bl; + /* Whether the disk can expand beyond total_sectors */ int growable; -- 1.7.9.5
[Qemu-devel] [PATCHv4 14/17] block/get_block_status: fix BDRV_BLOCK_ZERO for unallocated blocks
this patch does 2 things: a) only do additional call outs if BDRV_BLOCK_ZERO is not already set. b) use the newly introduced bdrv_has_discard_zeroes() to return the zero state of an unallocated block. the used callout to bdrv_has_zero_init() is only valid right after bdrv_create. Reviewed-by: Eric Blake ebl...@redhat.com Signed-off-by: Peter Lieven p...@kamp.de --- block.c |4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/block.c b/block.c index fc931e3..1be4418 100644 --- a/block.c +++ b/block.c @@ -3247,8 +3247,8 @@ static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs, return ret; } -if (!(ret BDRV_BLOCK_DATA)) { -if (bdrv_has_zero_init(bs)) { +if (!(ret BDRV_BLOCK_DATA) !(ret BDRV_BLOCK_ZERO)) { +if (bdrv_has_discard_zeroes(bs)) { ret |= BDRV_BLOCK_ZERO; } else if (bs-backing_hd) { BlockDriverState *bs2 = bs-backing_hd; -- 1.7.9.5
[Qemu-devel] [PATCHv4 02/17] block: add flags to bdrv_*_write_zeroes
Reviewed-by: Eric Blake ebl...@redhat.com Signed-off-by: Peter Lieven p...@kamp.de --- block-migration.c |2 +- block.c | 20 +++- block/backup.c|3 ++- block/qcow2-cluster.c |2 +- block/qcow2.c |2 +- block/qed.c |3 ++- block/raw_bsd.c |5 +++-- block/vmdk.c |3 ++- include/block/block.h |4 ++-- include/block/block_int.h |2 +- qemu-io-cmds.c|2 +- 11 files changed, 27 insertions(+), 21 deletions(-) diff --git a/block-migration.c b/block-migration.c index daf9ec1..713a8e3 100644 --- a/block-migration.c +++ b/block-migration.c @@ -780,7 +780,7 @@ static int block_load(QEMUFile *f, void *opaque, int version_id) } if (flags BLK_MIG_FLAG_ZERO_BLOCK) { -ret = bdrv_write_zeroes(bs, addr, nr_sectors); +ret = bdrv_write_zeroes(bs, addr, nr_sectors, 0); } else { buf = g_malloc(BLOCK_SIZE); qemu_get_buffer(f, buf, BLOCK_SIZE); diff --git a/block.c b/block.c index 08cba1e..8ed53a2 100644 --- a/block.c +++ b/block.c @@ -79,7 +79,7 @@ static BlockDriverAIOCB *bdrv_co_aio_rw_vector(BlockDriverState *bs, bool is_write); static void coroutine_fn bdrv_co_do_rw(void *opaque); static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs, -int64_t sector_num, int nb_sectors); +int64_t sector_num, int nb_sectors, BdrvRequestFlags flags); static QTAILQ_HEAD(, BlockDriverState) bdrv_states = QTAILQ_HEAD_INITIALIZER(bdrv_states); @@ -2375,10 +2375,11 @@ int bdrv_writev(BlockDriverState *bs, int64_t sector_num, QEMUIOVector *qiov) return bdrv_rwv_co(bs, sector_num, qiov, true, 0); } -int bdrv_write_zeroes(BlockDriverState *bs, int64_t sector_num, int nb_sectors) +int bdrv_write_zeroes(BlockDriverState *bs, int64_t sector_num, + int nb_sectors, BdrvRequestFlags flags) { return bdrv_rw_co(bs, sector_num, NULL, nb_sectors, true, - BDRV_REQ_ZERO_WRITE); + BDRV_REQ_ZERO_WRITE | flags); } int bdrv_pread(BlockDriverState *bs, int64_t offset, @@ -2560,7 +2561,7 @@ static int coroutine_fn bdrv_co_do_copy_on_readv(BlockDriverState *bs, if (drv-bdrv_co_write_zeroes buffer_is_zero(bounce_buffer, iov.iov_len)) { ret = bdrv_co_do_write_zeroes(bs, cluster_sector_num, - cluster_nb_sectors); + cluster_nb_sectors, 0); } else { /* This does not change the data on the disk, it is not necessary * to flush even in cache=writethrough mode. @@ -2694,7 +2695,7 @@ int coroutine_fn bdrv_co_copy_on_readv(BlockDriverState *bs, } static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs, -int64_t sector_num, int nb_sectors) +int64_t sector_num, int nb_sectors, BdrvRequestFlags flags) { BlockDriver *drv = bs-drv; QEMUIOVector qiov; @@ -2706,7 +2707,7 @@ static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs, /* First try the efficient write zeroes operation */ if (drv-bdrv_co_write_zeroes) { -ret = drv-bdrv_co_write_zeroes(bs, sector_num, nb_sectors); +ret = drv-bdrv_co_write_zeroes(bs, sector_num, nb_sectors, flags); if (ret != -ENOTSUP) { return ret; } @@ -2761,7 +2762,7 @@ static int coroutine_fn bdrv_co_do_writev(BlockDriverState *bs, if (ret 0) { /* Do nothing, write notifier decided to fail this request */ } else if (flags BDRV_REQ_ZERO_WRITE) { -ret = bdrv_co_do_write_zeroes(bs, sector_num, nb_sectors); +ret = bdrv_co_do_write_zeroes(bs, sector_num, nb_sectors, flags); } else { ret = drv-bdrv_co_writev(bs, sector_num, nb_sectors, qiov); } @@ -2795,12 +2796,13 @@ int coroutine_fn bdrv_co_writev(BlockDriverState *bs, int64_t sector_num, } int coroutine_fn bdrv_co_write_zeroes(BlockDriverState *bs, - int64_t sector_num, int nb_sectors) + int64_t sector_num, int nb_sectors, + BdrvRequestFlags flags) { trace_bdrv_co_write_zeroes(bs, sector_num, nb_sectors); return bdrv_co_do_writev(bs, sector_num, nb_sectors, NULL, - BDRV_REQ_ZERO_WRITE); + BDRV_REQ_ZERO_WRITE | flags); } /** diff --git a/block/backup.c b/block/backup.c index 04c4b5c..5f6a642 100644 --- a/block/backup.c +++ b/block/backup.c @@ -138,7 +138,8 @@ static int coroutine_fn backup_do_cow(BlockDriverState *bs, if (buffer_is_zero(iov.iov_base, iov.iov_len)) { ret = bdrv_co_write_zeroes(job-target, - start * BACKUP_SECTORS_PER_CLUSTER, n); +
[Qemu-devel] [PATCHv4 07/17] block: honour BlockLimits in bdrv_co_do_write_zeroes
Reviewed-by: Eric Blake ebl...@redhat.com Signed-off-by: Peter Lieven p...@kamp.de --- block.c | 65 +++ 1 file changed, 49 insertions(+), 16 deletions(-) diff --git a/block.c b/block.c index 6a46bc2..7551751 100644 --- a/block.c +++ b/block.c @@ -2694,32 +2694,65 @@ int coroutine_fn bdrv_co_copy_on_readv(BlockDriverState *bs, BDRV_REQ_COPY_ON_READ); } +/* if no limit is specified in the BlockLimits use a default + * of 32768 512-byte sectors (16 MiB) per request. + */ +#define MAX_WRITE_ZEROES_DEFAULT 32768 + static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs, int64_t sector_num, int nb_sectors, BdrvRequestFlags flags) { BlockDriver *drv = bs-drv; QEMUIOVector qiov; -struct iovec iov; -int ret; +struct iovec iov = {0}; +int ret = 0; -/* TODO Emulate only part of misaligned requests instead of letting block - * drivers return -ENOTSUP and emulate everything */ +int max_write_zeroes = bs-bl.max_write_zeroes ? + bs-bl.max_write_zeroes : MAX_WRITE_ZEROES_DEFAULT; -/* First try the efficient write zeroes operation */ -if (drv-bdrv_co_write_zeroes) { -ret = drv-bdrv_co_write_zeroes(bs, sector_num, nb_sectors, flags); -if (ret != -ENOTSUP) { -return ret; +while (nb_sectors 0 !ret) { +int num = nb_sectors; + +/* align request */ +if (bs-bl.write_zeroes_alignment +num = bs-bl.write_zeroes_alignment +sector_num % bs-bl.write_zeroes_alignment) { +if (num bs-bl.write_zeroes_alignment) { +num = bs-bl.write_zeroes_alignment; +} +num -= sector_num % bs-bl.write_zeroes_alignment; } -} -/* Fall back to bounce buffer if write zeroes is unsupported */ -iov.iov_len = nb_sectors * BDRV_SECTOR_SIZE; -iov.iov_base = qemu_blockalign(bs, iov.iov_len); -memset(iov.iov_base, 0, iov.iov_len); -qemu_iovec_init_external(qiov, iov, 1); +/* limit request size */ +if (num max_write_zeroes) { +num = max_write_zeroes; +} + +ret = -ENOTSUP; +/* First try the efficient write zeroes operation */ +if (drv-bdrv_co_write_zeroes) { +ret = drv-bdrv_co_write_zeroes(bs, sector_num, num, flags); +} + +if (ret == -ENOTSUP) { +/* Fall back to bounce buffer if write zeroes is unsupported */ +iov.iov_len = num * BDRV_SECTOR_SIZE; +if (iov.iov_base == NULL) { +/* allocate bounce buffer only once and ensure that it + * is big enough for this and all future requests. + */ +size_t bufsize = num = nb_sectors ? num : max_write_zeroes; +iov.iov_base = qemu_blockalign(bs, bufsize * BDRV_SECTOR_SIZE); +memset(iov.iov_base, 0, bufsize * BDRV_SECTOR_SIZE); +} +qemu_iovec_init_external(qiov, iov, 1); -ret = drv-bdrv_co_writev(bs, sector_num, nb_sectors, qiov); +ret = drv-bdrv_co_writev(bs, sector_num, num, qiov); +} + +sector_num += num; +nb_sectors -= num; +} qemu_vfree(iov.iov_base); return ret; -- 1.7.9.5
[Qemu-devel] [PATCHv4 16/17] qemu-img: conditionally zero out target on convert
If the target has_zero_init = 0, but supports efficiently writing zeroes by unmapping we call bdrv_zeroize to avoid fully allocating the target. This currently is designed especially for iscsi. Reviewed-by: Eric Blake ebl...@redhat.com Signed-off-by: Peter Lieven p...@kamp.de --- qemu-img.c | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/qemu-img.c b/qemu-img.c index c6eff15..2d46de8 100644 --- a/qemu-img.c +++ b/qemu-img.c @@ -1353,7 +1353,7 @@ static int img_convert(int argc, char **argv) } } -flags = BDRV_O_RDWR; +flags = min_sparse ? (BDRV_O_RDWR | BDRV_O_UNMAP) : BDRV_O_RDWR; ret = bdrv_parse_cache_flags(cache, flags); if (ret 0) { error_report(Invalid cache option: %s, cache); @@ -1469,6 +1469,14 @@ static int img_convert(int argc, char **argv) } else { int has_zero_init = min_sparse ? bdrv_has_zero_init(out_bs) : 0; +if (!has_zero_init bdrv_has_discard_write_zeroes(out_bs)) { +ret = bdrv_make_zero(out_bs, BDRV_REQ_MAY_UNMAP); +if (ret 0) { +goto out; +} +has_zero_init = 1; +} + sector_num = 0; // total number of sectors converted so far nb_sectors = total_sectors - sector_num; if (nb_sectors != 0) { -- 1.7.9.5
[Qemu-devel] [PATCHv4 00/17] block: logical block provisioning enhancements
this patch adds the ability for targets to stay sparse during block migration (if the zero_blocks capability is set) and qemu-img convert even if the target does not have has_zero_init = 1. the series was especially developed for iSCSI, but it should also work with other drivers with little or no adjustments. these adjustments should be limited to providing block provisioning information through get_block_info and/or honouring BDRV_REQ_MAY_UNMAP on writing zeroes. v3-v4: - changed BlockLimits struct to typedef (Stefan, Eric) - renamed bdrv_zeroize to bdrv_make_zero (Stefan) - added comment about the -S flag of qemu-img convert in qemu-img.texi (Eric) - used struct assignment for bs-bl in raw_open (Stefan, Eric) - dropped 3 get_block_status fixes that are independent of this series and already partly merged. v2-v3: - fix merge conflict in block/qcow2_cluster.c - changed return type of bdrv_has_discard_zeroes and bdrv_has_discard_write_zeroes to bool. - moved alignment and limits info to a BlockLimits struct (Paolo). - added magic constanst for default maximum in bdrv_co_do_write_zeroes and bdrv_co_discard (Eric). - bdrv_co_do_write_zeroes: allocating the bounce buffer only once (Eric), fixed bounce iov_len in the fall back path. - bdrv_zeroize: added inline docu (Eric) and do not mask flags passed to bdrv_write_zeroes (Eric). - qemu-img: changed the default hint for -S (min_sparse) in the usage help to 4k. not changing the default as it is unclear why this default was set. size suffixes are already supported (Eric). v1-v2: - moved block max_discard and max_write_zeroes to BlockDriverState - added discard_alignment and write_zeroes_alignment to BlockDriverState - added bdrv_has_discard_zeroes() and bdrv_has_discard_write_zeroes() - added logic to bdrv_co_discard and bdrv_co_do_write_zeroes to honour limit and alignment info. - added support for -S 0 in qemu-img convert. Peter Lieven (17): block: make BdrvRequestFlags public block: add flags to bdrv_*_write_zeroes block: introduce BDRV_REQ_MAY_UNMAP request flag block: introduce bdrv_has_discard_zeroes and bdrv_has_discard_write_zeroes block/raw: add bdrv_has_discard_zeroes and bdrv_has_discard_write_zeroes block: add BlockLimits structure to BlockDriverState block: honour BlockLimits in bdrv_co_do_write_zeroes block: honour BlockLimits in bdrv_co_discard iscsi: simplify iscsi_co_discard iscsi: set limits in BlockDriverState iscsi: add bdrv_has_discard_zeroes and bdrv_has_discard_write_zeroes iscsi: add bdrv_co_write_zeroes block: introduce bdrv_make_zero block/get_block_status: fix BDRV_BLOCK_ZERO for unallocated blocks qemu-img: add support for fully allocated images qemu-img: conditionally zero out target on convert block/raw: copy BlockLimits on raw_open block-migration.c |3 +- block.c | 199 + block/backup.c|3 +- block/iscsi.c | 152 -- block/qcow2-cluster.c |2 +- block/qcow2.c |2 +- block/qed.c |3 +- block/raw_bsd.c | 62 -- block/vmdk.c |3 +- include/block/block.h | 19 - include/block/block_int.h | 32 +++- qemu-img.c| 18 +++- qemu-img.texi |5 ++ qemu-io-cmds.c|2 +- 14 files changed, 394 insertions(+), 111 deletions(-) -- 1.7.9.5
[Qemu-devel] [PATCHv4 04/17] block: introduce bdrv_has_discard_zeroes and bdrv_has_discard_write_zeroes
Reviewed-by: Eric Blake ebl...@redhat.com Signed-off-by: Peter Lieven p...@kamp.de --- block.c | 29 + include/block/block.h |2 ++ include/block/block_int.h | 13 + 3 files changed, 44 insertions(+) diff --git a/block.c b/block.c index 0675257..6a46bc2 100644 --- a/block.c +++ b/block.c @@ -3085,6 +3085,35 @@ int bdrv_has_zero_init(BlockDriverState *bs) return 0; } +bool bdrv_has_discard_zeroes(BlockDriverState *bs) +{ +assert(bs-drv); + +if (bs-backing_hd) { +return false; +} +if (bs-drv-bdrv_has_discard_zeroes) { +return bs-drv-bdrv_has_discard_zeroes(bs); +} + +return false; +} + +bool bdrv_has_discard_write_zeroes(BlockDriverState *bs) +{ +assert(bs-drv); + +if (bs-backing_hd || !(bs-open_flags BDRV_O_UNMAP)) { +return false; +} + +if (bs-drv-bdrv_has_discard_write_zeroes) { +return bs-drv-bdrv_has_discard_write_zeroes(bs); +} + +return false; +} + typedef struct BdrvCoGetBlockStatusData { BlockDriverState *bs; BlockDriverState *base; diff --git a/include/block/block.h b/include/block/block.h index b72c81a..2de226f 100644 --- a/include/block/block.h +++ b/include/block/block.h @@ -310,6 +310,8 @@ int bdrv_discard(BlockDriverState *bs, int64_t sector_num, int nb_sectors); int bdrv_co_discard(BlockDriverState *bs, int64_t sector_num, int nb_sectors); int bdrv_has_zero_init_1(BlockDriverState *bs); int bdrv_has_zero_init(BlockDriverState *bs); +bool bdrv_has_discard_zeroes(BlockDriverState *bs); +bool bdrv_has_discard_write_zeroes(BlockDriverState *bs); int64_t bdrv_get_block_status(BlockDriverState *bs, int64_t sector_num, int nb_sectors, int *pnum); int bdrv_is_allocated(BlockDriverState *bs, int64_t sector_num, int nb_sectors, diff --git a/include/block/block_int.h b/include/block/block_int.h index f95dba7..ce5c62e 100644 --- a/include/block/block_int.h +++ b/include/block/block_int.h @@ -217,6 +217,19 @@ struct BlockDriver { */ int (*bdrv_has_zero_init)(BlockDriverState *bs); +/* + * Returns true if discarded blocks read back as zeroes. + */ +bool (*bdrv_has_discard_zeroes)(BlockDriverState *bs); + +/* + * Returns true if the driver can optimize writing zeroes by discarding + * sectors. It is additionally required that the block device is + * opened with BDRV_O_UNMAP and the that write zeroes request carries + * the BDRV_REQ_MAY_UNMAP flag for this to work. + */ +bool (*bdrv_has_discard_write_zeroes)(BlockDriverState *bs); + QLIST_ENTRY(BlockDriver) list; }; -- 1.7.9.5
[Qemu-devel] [PATCHv4 11/17] iscsi: add bdrv_has_discard_zeroes and bdrv_has_discard_write_zeroes
Reviewed-by: Eric Blake ebl...@redhat.com Signed-off-by: Peter Lieven p...@kamp.de --- block/iscsi.c | 16 +++- 1 file changed, 15 insertions(+), 1 deletion(-) diff --git a/block/iscsi.c b/block/iscsi.c index 8ed2274..f0ac620 100644 --- a/block/iscsi.c +++ b/block/iscsi.c @@ -1449,6 +1449,18 @@ static int iscsi_has_zero_init(BlockDriverState *bs) return 0; } +static bool iscsi_has_discard_zeroes(BlockDriverState *bs) +{ +IscsiLun *iscsilun = bs-opaque; +return !!iscsilun-lbprz; +} + +static bool iscsi_has_discard_write_zeroes(BlockDriverState *bs) +{ +IscsiLun *iscsilun = bs-opaque; +return iscsilun-lbprz iscsilun-lbp.lbpws; +} + static int iscsi_create(const char *filename, QEMUOptionParameter *options, Error **errp) { @@ -1535,7 +1547,9 @@ static BlockDriver bdrv_iscsi = { .bdrv_aio_writev = iscsi_aio_writev, .bdrv_aio_flush = iscsi_aio_flush, -.bdrv_has_zero_init = iscsi_has_zero_init, +.bdrv_has_zero_init= iscsi_has_zero_init, +.bdrv_has_discard_zeroes = iscsi_has_discard_zeroes, +.bdrv_has_discard_write_zeroes = iscsi_has_discard_write_zeroes, #ifdef __linux__ .bdrv_ioctl = iscsi_ioctl, -- 1.7.9.5
[Qemu-devel] [PATCHv4 10/17] iscsi: set limits in BlockDriverState
Reviewed-by: Eric Blake ebl...@redhat.com Signed-off-by: Peter Lieven p...@kamp.de --- block/iscsi.c | 14 ++ 1 file changed, 14 insertions(+) diff --git a/block/iscsi.c b/block/iscsi.c index be70ced..8ed2274 100644 --- a/block/iscsi.c +++ b/block/iscsi.c @@ -1367,6 +1367,20 @@ static int iscsi_open(BlockDriverState *bs, QDict *options, int flags, sizeof(struct scsi_inquiry_block_limits)); scsi_free_scsi_task(task); task = NULL; + +if (iscsilun-bl.max_unmap 0x) { +bs-bl.max_discard = sector_lun2qemu(iscsilun-bl.max_unmap, + iscsilun); +} +bs-bl.discard_alignment = sector_lun2qemu(iscsilun-bl.opt_unmap_gran, + iscsilun); + +if (iscsilun-bl.max_ws_len 0x) { +bs-bl.max_write_zeroes = sector_lun2qemu(iscsilun-bl.max_ws_len, + iscsilun); +} +bs-bl.write_zeroes_alignment = sector_lun2qemu(iscsilun-bl.opt_unmap_gran, +iscsilun); } #if defined(LIBISCSI_FEATURE_NOP_COUNTER) -- 1.7.9.5
[Qemu-devel] [PATCHv4 08/17] block: honour BlockLimits in bdrv_co_discard
Reviewed-by: Eric Blake ebl...@redhat.com Signed-off-by: Peter Lieven p...@kamp.de --- block.c | 37 - 1 file changed, 36 insertions(+), 1 deletion(-) diff --git a/block.c b/block.c index 7551751..43d5f46 100644 --- a/block.c +++ b/block.c @@ -4209,6 +4209,11 @@ static void coroutine_fn bdrv_discard_co_entry(void *opaque) rwco-ret = bdrv_co_discard(rwco-bs, rwco-sector_num, rwco-nb_sectors); } +/* if no limit is specified in the BlockLimits use a default + * of 32768 512-byte sectors (16 MiB) per request. + */ +#define MAX_DISCARD_DEFAULT 32768 + int coroutine_fn bdrv_co_discard(BlockDriverState *bs, int64_t sector_num, int nb_sectors) { @@ -4230,7 +4235,37 @@ int coroutine_fn bdrv_co_discard(BlockDriverState *bs, int64_t sector_num, } if (bs-drv-bdrv_co_discard) { -return bs-drv-bdrv_co_discard(bs, sector_num, nb_sectors); +int max_discard = bs-bl.max_discard ? + bs-bl.max_discard : MAX_DISCARD_DEFAULT; + +while (nb_sectors 0) { +int ret; +int num = nb_sectors; + +/* align request */ +if (bs-bl.discard_alignment +num = bs-bl.discard_alignment +sector_num % bs-bl.discard_alignment) { +if (num bs-bl.discard_alignment) { +num = bs-bl.discard_alignment; +} +num -= sector_num % bs-bl.discard_alignment; +} + +/* limit request size */ +if (num max_discard) { +num = max_discard; +} + +ret = bs-drv-bdrv_co_discard(bs, sector_num, num); +if (ret) { +return ret; +} + +sector_num += num; +nb_sectors -= num; +} +return 0; } else if (bs-drv-bdrv_aio_discard) { BlockDriverAIOCB *acb; CoroutineIOCompletion co = { -- 1.7.9.5
[Qemu-devel] [PATCHv4 09/17] iscsi: simplify iscsi_co_discard
now that bdrv_co_discard can handle limits we do not need the request split logic here anymore. Reviewed-by: Eric Blake ebl...@redhat.com Signed-off-by: Peter Lieven p...@kamp.de --- block/iscsi.c | 67 + 1 file changed, 25 insertions(+), 42 deletions(-) diff --git a/block/iscsi.c b/block/iscsi.c index 6152ef1..be70ced 100644 --- a/block/iscsi.c +++ b/block/iscsi.c @@ -87,7 +87,6 @@ typedef struct IscsiAIOCB { #define NOP_INTERVAL 5000 #define MAX_NOP_FAILURES 3 #define ISCSI_CMD_RETRIES 5 -#define ISCSI_MAX_UNMAP 131072 static void iscsi_bh_cb(void *p) @@ -912,8 +911,6 @@ coroutine_fn iscsi_co_discard(BlockDriverState *bs, int64_t sector_num, IscsiLun *iscsilun = bs-opaque; struct IscsiTask iTask; struct unmap_list list; -uint32_t nb_blocks; -uint32_t max_unmap; if (!is_request_lun_aligned(sector_num, nb_sectors, iscsilun)) { return -EINVAL; @@ -925,52 +922,38 @@ coroutine_fn iscsi_co_discard(BlockDriverState *bs, int64_t sector_num, } list.lba = sector_qemu2lun(sector_num, iscsilun); -nb_blocks = sector_qemu2lun(nb_sectors, iscsilun); +list.num = sector_qemu2lun(nb_sectors, iscsilun); -max_unmap = iscsilun-bl.max_unmap; -if (max_unmap == 0x) { -max_unmap = ISCSI_MAX_UNMAP; -} - -while (nb_blocks 0) { -iscsi_co_init_iscsitask(iscsilun, iTask); -list.num = nb_blocks; -if (list.num max_unmap) { -list.num = max_unmap; -} +iscsi_co_init_iscsitask(iscsilun, iTask); retry: -if (iscsi_unmap_task(iscsilun-iscsi, iscsilun-lun, 0, 0, list, 1, - iscsi_co_generic_cb, iTask) == NULL) { -return -EIO; -} - -while (!iTask.complete) { -iscsi_set_events(iscsilun); -qemu_coroutine_yield(); -} +if (iscsi_unmap_task(iscsilun-iscsi, iscsilun-lun, 0, 0, list, 1, + iscsi_co_generic_cb, iTask) == NULL) { +return -EIO; +} -if (iTask.task != NULL) { -scsi_free_scsi_task(iTask.task); -iTask.task = NULL; -} +while (!iTask.complete) { +iscsi_set_events(iscsilun); +qemu_coroutine_yield(); +} -if (iTask.do_retry) { -goto retry; -} +if (iTask.task != NULL) { +scsi_free_scsi_task(iTask.task); +iTask.task = NULL; +} -if (iTask.status == SCSI_STATUS_CHECK_CONDITION) { -/* the target might fail with a check condition if it - is not happy with the alignment of the UNMAP request - we silently fail in this case */ -return 0; -} +if (iTask.do_retry) { +goto retry; +} -if (iTask.status != SCSI_STATUS_GOOD) { -return -EIO; -} +if (iTask.status == SCSI_STATUS_CHECK_CONDITION) { +/* the target might fail with a check condition if it + is not happy with the alignment of the UNMAP request + we silently fail in this case */ +return 0; +} -list.lba += list.num; -nb_blocks -= list.num; +if (iTask.status != SCSI_STATUS_GOOD) { +return -EIO; } return 0; -- 1.7.9.5
[Qemu-devel] [PATCHv4 12/17] iscsi: add bdrv_co_write_zeroes
Reviewed-by: Eric Blake ebl...@redhat.com Signed-off-by: Peter Lieven p...@kamp.de --- block/iscsi.c | 59 + 1 file changed, 59 insertions(+) diff --git a/block/iscsi.c b/block/iscsi.c index f0ac620..16510c2 100644 --- a/block/iscsi.c +++ b/block/iscsi.c @@ -56,6 +56,7 @@ typedef struct IscsiLun { uint8_t lbprz; struct scsi_inquiry_logical_block_provisioning lbp; struct scsi_inquiry_block_limits bl; +unsigned char *zeroblock; } IscsiLun; typedef struct IscsiTask { @@ -959,6 +960,62 @@ retry: return 0; } + +static int +coroutine_fn iscsi_co_write_zeroes(BlockDriverState *bs, int64_t sector_num, + int nb_sectors, BdrvRequestFlags flags) +{ +IscsiLun *iscsilun = bs-opaque; +struct IscsiTask iTask; +uint64_t lba; +uint32_t nb_blocks; + +if (!is_request_lun_aligned(sector_num, nb_sectors, iscsilun)) { +return -EINVAL; +} + +if (!iscsilun-lbp.lbpws) { +/* WRITE SAME is not supported by the target */ +return -ENOTSUP; +} + +lba = sector_qemu2lun(sector_num, iscsilun); +nb_blocks = sector_qemu2lun(nb_sectors, iscsilun); + +if (iscsilun-zeroblock == NULL) { +iscsilun-zeroblock = g_malloc0(iscsilun-block_size); +} + +iscsi_co_init_iscsitask(iscsilun, iTask); +retry: +if (iscsi_writesame16_task(iscsilun-iscsi, iscsilun-lun, lba, + iscsilun-zeroblock, iscsilun-block_size, + nb_blocks, 0, !!(flags BDRV_REQ_MAY_UNMAP), + 0, 0, iscsi_co_generic_cb, iTask) == NULL) { +return -EIO; +} + +while (!iTask.complete) { +iscsi_set_events(iscsilun); +qemu_coroutine_yield(); +} + +if (iTask.task != NULL) { +scsi_free_scsi_task(iTask.task); +iTask.task = NULL; +} + +if (iTask.do_retry) { +goto retry; +} + +if (iTask.status != SCSI_STATUS_GOOD) { +return -EIO; +} + +return 0; +} + static int parse_chap(struct iscsi_context *iscsi, const char *target) { QemuOptsList *list; @@ -1421,6 +1478,7 @@ static void iscsi_close(BlockDriverState *bs) } qemu_aio_set_fd_handler(iscsi_get_fd(iscsi), NULL, NULL, NULL); iscsi_destroy_context(iscsi); +g_free(iscsilun-zeroblock); memset(iscsilun, 0, sizeof(IscsiLun)); } @@ -1542,6 +1600,7 @@ static BlockDriver bdrv_iscsi = { .bdrv_co_get_block_status = iscsi_co_get_block_status, #endif .bdrv_co_discard = iscsi_co_discard, +.bdrv_co_write_zeroes = iscsi_co_write_zeroes, .bdrv_aio_readv = iscsi_aio_readv, .bdrv_aio_writev = iscsi_aio_writev, -- 1.7.9.5
[Qemu-devel] [PATCHv4 13/17] block: introduce bdrv_make_zero
this patch adds a call to completely zero out a block device. the operation is sped up by checking the block status and only writing zeroes to the device if they currently do not return zeroes. optionally the zero writing can be sped up by setting the flag BDRV_REQ_MAY_UNMAP to emulate the zero write by unmapping if the driver supports it. Signed-off-by: Peter Lieven p...@kamp.de --- block.c | 37 + include/block/block.h |1 + 2 files changed, 38 insertions(+) diff --git a/block.c b/block.c index 43d5f46..fc931e3 100644 --- a/block.c +++ b/block.c @@ -2382,6 +2382,43 @@ int bdrv_write_zeroes(BlockDriverState *bs, int64_t sector_num, BDRV_REQ_ZERO_WRITE | flags); } +/* + * Completely zero out a block device with the help of bdrv_write_zeroes. + * The operation is sped up by checking the block status and only writing + * zeroes to the device if they currently do not return zeroes. Optional + * flags are passed through to bdrv_write_zeroes (e.g. BDRV_REQ_MAY_UNMAP). + * + * Returns 0 on error, 0 on success. For error codes see bdrv_write(). + */ +int bdrv_make_zero(BlockDriverState *bs, BdrvRequestFlags flags) +{ +int64_t target_size = bdrv_getlength(bs) / BDRV_SECTOR_SIZE; +int64_t ret, nb_sectors, sector_num = 0; +int n; + +for (;;) { +nb_sectors = target_size - sector_num; +if (nb_sectors = 0) { +return 0; +} +if (nb_sectors INT_MAX) { +nb_sectors = INT_MAX; +} +ret = bdrv_get_block_status(bs, sector_num, nb_sectors, n); +if (ret BDRV_BLOCK_ZERO) { +sector_num += n; +continue; +} +ret = bdrv_write_zeroes(bs, sector_num, n, flags); +if (ret 0) { +error_report(error writing zeroes at sector % PRId64 : %s, + sector_num, strerror(-ret)); +return ret; +} +sector_num += n; +} +} + int bdrv_pread(BlockDriverState *bs, int64_t offset, void *buf, int count1) { diff --git a/include/block/block.h b/include/block/block.h index 2de226f..f466953 100644 --- a/include/block/block.h +++ b/include/block/block.h @@ -196,6 +196,7 @@ int bdrv_write(BlockDriverState *bs, int64_t sector_num, const uint8_t *buf, int nb_sectors); int bdrv_write_zeroes(BlockDriverState *bs, int64_t sector_num, int nb_sectors, BdrvRequestFlags flags); +int bdrv_make_zero(BlockDriverState *bs, BdrvRequestFlags flags); int bdrv_writev(BlockDriverState *bs, int64_t sector_num, QEMUIOVector *qiov); int bdrv_pread(BlockDriverState *bs, int64_t offset, void *buf, int count); -- 1.7.9.5
Re: [Qemu-devel] [PATCHv5] block/get_block_status: avoid redundant callouts on raw devices
On 07.10.2013 10:25, Paolo Bonzini wrote: Il 07/10/2013 07:59, Peter Lieven ha scritto: if a raw device like an iscsi target or host device is used the current implementation makes a second call out to get the block status of bs-file. Signed-off-by: Peter Lieven p...@kamp.de --- v5: add a generic get_lba_status function in the raw driver which adds the BDRV_BLOCK_RAW flag. bdrv_co_get_block_status will handle the callout to bs-file then. v4: use a flag to detect the raw driver instead of the strncmp hack block.c |4 block/raw_bsd.c |3 ++- include/block/block.h |4 3 files changed, 10 insertions(+), 1 deletion(-) diff --git a/block.c b/block.c index 93e113a..38a589e 100644 --- a/block.c +++ b/block.c @@ -3147,6 +3147,10 @@ static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs, return ret; } +if (ret BDRV_BLOCK_RAW) { +return bdrv_get_block_status(bs-file, sector_num, nb_sectors, pnum); Strictly speaking, this should probably do something like this: assert(ret BDRV_BLOCK_OFFSET_VALID); return bdrv_get_block_status(bs-file, ret BDRV_SECTOR_BITS, nb_sectors, pnum); shouldn't the last line be: return bdrv_get_block_status(bs-file, ret BDRV_SECTOR_BITS, *pnum, pnum); This would of course require *pnum = nb_sectors in raw_co_get_block_status ?
[Qemu-devel] [PATCH v3 01/17] qapi-types/visit.py: Pass whole expr dict for structs
Signed-off-by: Kevin Wolf kw...@redhat.com Reviewed-by: Eric Blake ebl...@redhat.com --- scripts/qapi-types.py | 11 --- scripts/qapi-visit.py | 8 ++-- 2 files changed, 14 insertions(+), 5 deletions(-) diff --git a/scripts/qapi-types.py b/scripts/qapi-types.py index 5222463..566fe5e 100644 --- a/scripts/qapi-types.py +++ b/scripts/qapi-types.py @@ -71,7 +71,7 @@ def generate_struct_fields(members): c_name=c_var(argname)) if structured: push_indent() -ret += generate_struct(, argname, argentry) +ret += generate_struct({ field: argname, data: argentry}) pop_indent() else: ret += mcgen(''' @@ -81,7 +81,12 @@ def generate_struct_fields(members): return ret -def generate_struct(structname, fieldname, members): +def generate_struct(expr): + +structname = expr.get('type', ) +fieldname = expr.get('field', ) +members = expr['data'] + ret = mcgen(''' struct %(name)s { @@ -417,7 +422,7 @@ if do_builtins: for expr in exprs: ret = \n if expr.has_key('type'): -ret += generate_struct(expr['type'], , expr['data']) + \n +ret += generate_struct(expr) + \n ret += generate_type_cleanup_decl(expr['type'] + List) fdef.write(generate_type_cleanup(expr['type'] + List) + \n) ret += generate_type_cleanup_decl(expr['type']) diff --git a/scripts/qapi-visit.py b/scripts/qapi-visit.py index 597cca4..1e44004 100644 --- a/scripts/qapi-visit.py +++ b/scripts/qapi-visit.py @@ -120,7 +120,11 @@ if (!err) { ''') return ret -def generate_visit_struct(name, members): +def generate_visit_struct(expr): + +name = expr['type'] +members = expr['data'] + ret = generate_visit_struct_fields(name, , , members) ret += mcgen(''' @@ -472,7 +476,7 @@ if do_builtins: for expr in exprs: if expr.has_key('type'): -ret = generate_visit_struct(expr['type'], expr['data']) +ret = generate_visit_struct(expr) ret += generate_visit_list(expr['type'], expr['data']) fdef.write(ret) -- 1.8.1.4
[Qemu-devel] [PATCH v3 04/17] blockdev: 'blockdev-add' QMP command
For examples see the changes to qmp-commands.hx. Signed-off-by: Kevin Wolf kw...@redhat.com --- blockdev.c | 57 ++ qapi-schema.json | 236 +++ qmp-commands.hx | 55 + 3 files changed, 348 insertions(+) diff --git a/blockdev.c b/blockdev.c index 29a5b70..157c076 100644 --- a/blockdev.c +++ b/blockdev.c @@ -38,6 +38,8 @@ #include qemu/option.h #include qemu/config-file.h #include qapi/qmp/types.h +#include qapi-visit.h +#include qapi/qmp-output-visitor.h #include sysemu/sysemu.h #include block/block_int.h #include qmp-commands.h @@ -2066,6 +2068,61 @@ void qmp_block_job_complete(const char *device, Error **errp) block_job_complete(job, errp); } +void qmp_blockdev_add(BlockdevOptions *options, Error **errp) +{ +QmpOutputVisitor *ov = qmp_output_visitor_new(); +QObject *obj; +QDict *qdict; +DriveInfo *dinfo; +Error *local_err = NULL; + +/* Require an ID in the top level */ +if (!options-has_id) { +error_setg(errp, Block device needs an ID); +goto fail; +} + +/* TODO Sort it out in raw-posix and drive_init: Reject aio=native with + * cache.direct=false instead of silently switching to aio=threads, except + * if called from drive_init. + * + * For now, simply forbidding the combination for all drivers will do. */ +if (options-has_aio options-aio == BLOCKDEV_AIO_OPTIONS_NATIVE) { +bool direct = options-cache-has_direct options-cache-direct; +if (!options-has_cache !direct) { +error_setg(errp, aio=native requires cache.direct=true); +goto fail; +} +} + +visit_type_BlockdevOptions(qmp_output_get_visitor(ov), + options, NULL, local_err); +if (error_is_set(local_err)) { +error_propagate(errp, local_err); +goto fail; +} + +obj = qmp_output_get_qobject(ov); +qdict = qobject_to_qdict(obj); + +qdict_flatten(qdict); + +QemuOpts *opts = qemu_opts_from_qdict(qemu_drive_opts, qdict, local_err); +if (error_is_set(local_err)) { +error_propagate(errp, local_err); +goto fail; +} + +dinfo = blockdev_init(opts, IF_NONE); +if (!dinfo) { +error_setg(errp, Could not open image); +goto fail; +} + +fail: +qmp_output_visitor_cleanup(ov); +} + static void do_qmp_query_block_jobs_one(void *opaque, BlockDriverState *bs) { BlockJobInfoList **prev = opaque; diff --git a/qapi-schema.json b/qapi-schema.json index 145eca8..71e2b2e 100644 --- a/qapi-schema.json +++ b/qapi-schema.json @@ -3902,3 +3902,239 @@ ## { 'command': 'query-rx-filter', 'data': { '*name': 'str' }, 'returns': ['RxFilterInfo'] } + + +## +# @BlockdevDiscardOptions +# +# Determines how to handle discard requests. +# +# @ignore: Ignore the request +# @unmap: Forward as an unmap request +# +# Since: 1.7 +## +{ 'enum': 'BlockdevDiscardOptions', + 'data': [ 'ignore', 'unmap' ] } + +## +# @BlockdevAioOptions +# +# Selects the AIO backend to handle I/O requests +# +# @threads: Use qemu's thread pool +# @native: Use native AIO backend (only Linux and Windows) +# +# Since: 1.7 +## +{ 'enum': 'BlockdevAioOptions', + 'data': [ 'threads', 'native' ] } + +## +# @BlockdevCacheOptions +# +# Includes cache-related options for block devices +# +# @writeback: #optional enables writeback mode for any caches (default: true) +# @direct: #optional enables use of O_DIRECT (bypass the host page cache; +# default: false) +# @no-flush:#optional ignore any flush requests for the device (default: +# false) +# +# Since: 1.7 +## +{ 'type': 'BlockdevCacheOptions', + 'data': { '*writeback': 'bool', +'*direct': 'bool', +'*no-flush': 'bool' } } + +## +# @BlockdevOptionsBase +# +# Options that are available for all block devices, independent of the block +# driver. +# +# @driver: block driver name +# @id: #optional id by which the new block device can be referred to. +# This is a required option on the top level of blockdev-add, and +# currently not allowed on any other level. +# @discard: #optional discard-related options (default: ignore) +# @cache: #optional cache-related options +# @aio: #optional AIO backend (default: threads) +# @rerror: #optional how to handle read errors on the device +# (default: report) +# @werror: #optional how to handle write errors on the device +# (default: enospc) +# @read-only: #optional whether the block device should be read-only +# (default: false) +# +# Since: 1.7 +## +{ 'type': 'BlockdevOptionsBase', + 'data': { 'driver': 'str', +'*id': 'str', +'*discard': 'BlockdevDiscardOptions', +'*cache': 'BlockdevCacheOptions', +'*aio': 'BlockdevAioOptions', +
[Qemu-devel] [PATCH v3 05/17] blockdev: Separate ID generation from DriveInfo creation
blockdev-add shouldn't automatically generate IDs, but will keep most of the DriveInfo creation code. Signed-off-by: Kevin Wolf kw...@redhat.com Reviewed-by: Max Reitz mre...@redhat.com Reviewed-by: Wenchao Xia xiaw...@linux.vnet.ibm.com Reviewed-by: Eric Blake ebl...@redhat.com --- blockdev.c| 32 +--- include/qemu/option.h | 1 + util/qemu-option.c| 6 ++ 3 files changed, 24 insertions(+), 15 deletions(-) diff --git a/blockdev.c b/blockdev.c index 157c076..e7e6bdd 100644 --- a/blockdev.c +++ b/blockdev.c @@ -604,23 +604,25 @@ static DriveInfo *blockdev_init(QemuOpts *all_opts, return NULL; } -/* init */ - -dinfo = g_malloc0(sizeof(*dinfo)); -if ((buf = qemu_opts_id(opts)) != NULL) { -dinfo-id = g_strdup(buf); -} else { -/* no id supplied - create one */ -dinfo-id = g_malloc0(32); -if (type == IF_IDE || type == IF_SCSI) +/* no id supplied - create one */ +if (qemu_opts_id(opts) == NULL) { +char *new_id; +if (type == IF_IDE || type == IF_SCSI) { mediastr = (media == MEDIA_CDROM) ? -cd : -hd; -if (max_devs) -snprintf(dinfo-id, 32, %s%i%s%i, - if_name[type], bus_id, mediastr, unit_id); -else -snprintf(dinfo-id, 32, %s%s%i, - if_name[type], mediastr, unit_id); +} +if (max_devs) { +new_id = g_strdup_printf(%s%i%s%i, if_name[type], bus_id, + mediastr, unit_id); +} else { +new_id = g_strdup_printf(%s%s%i, if_name[type], + mediastr, unit_id); +} +qemu_opts_set_id(opts, new_id); } + +/* init */ +dinfo = g_malloc0(sizeof(*dinfo)); +dinfo-id = g_strdup(qemu_opts_id(opts)); dinfo-bdrv = bdrv_new(dinfo-id); dinfo-bdrv-open_flags = snapshot ? BDRV_O_SNAPSHOT : 0; dinfo-bdrv-read_only = ro; diff --git a/include/qemu/option.h b/include/qemu/option.h index 63db4cc..5c0c6dd 100644 --- a/include/qemu/option.h +++ b/include/qemu/option.h @@ -142,6 +142,7 @@ void qemu_opts_loc_restore(QemuOpts *opts); int qemu_opts_set(QemuOptsList *list, const char *id, const char *name, const char *value); const char *qemu_opts_id(QemuOpts *opts); +void qemu_opts_set_id(QemuOpts *opts, char *id); void qemu_opts_del(QemuOpts *opts); void qemu_opts_validate(QemuOpts *opts, const QemuOptDesc *desc, Error **errp); int qemu_opts_do_parse(QemuOpts *opts, const char *params, const char *firstname); diff --git a/util/qemu-option.c b/util/qemu-option.c index e0844a9..efcb5dc 100644 --- a/util/qemu-option.c +++ b/util/qemu-option.c @@ -834,6 +834,12 @@ const char *qemu_opts_id(QemuOpts *opts) return opts-id; } +/* The id string will be g_free()d by qemu_opts_del */ +void qemu_opts_set_id(QemuOpts *opts, char *id) +{ +opts-id = id; +} + void qemu_opts_del(QemuOpts *opts) { QemuOpt *opt; -- 1.8.1.4
[Qemu-devel] [PATCH v3 02/17] qapi-types/visit.py: Inheritance for structs
This introduces a new 'base' key for struct definitions that refers to another struct type. On the JSON level, the fields of the base type are included directly into the same namespace as the fields of the defined type, like with unions. On the C level, a pointer to a struct of the base type is included. Signed-off-by: Kevin Wolf kw...@redhat.com Reviewed-by: Eric Blake ebl...@redhat.com --- docs/qapi-code-gen.txt | 17 + scripts/qapi-types.py | 4 scripts/qapi-visit.py | 18 -- 3 files changed, 37 insertions(+), 2 deletions(-) diff --git a/docs/qapi-code-gen.txt b/docs/qapi-code-gen.txt index 0ce045c..91f44d0 100644 --- a/docs/qapi-code-gen.txt +++ b/docs/qapi-code-gen.txt @@ -53,6 +53,23 @@ The use of '*' as a prefix to the name means the member is optional. Optional members should always be added to the end of the dictionary to preserve backwards compatibility. + +A complex type definition can specify another complex type as its base. +In this case, the fields of the base type are included as top-level fields +of the new complex type's dictionary in the QMP wire format. An example +definition is: + + { 'type': 'BlockdevOptionsGenericFormat', 'data': { 'file': 'str' } } + { 'type': 'BlockdevOptionsGenericCOWFormat', + 'base': 'BlockdevOptionsGenericFormat', + 'data': { '*backing': 'str' } } + +An example BlockdevOptionsGenericCOWFormat object on the wire could use +both fields like this: + + { file: /some/place/my-image, + backing: /some/place/my-backing-file } + === Enumeration types === An enumeration type is a dictionary containing a single key whose value is a diff --git a/scripts/qapi-types.py b/scripts/qapi-types.py index 566fe5e..4a1652b 100644 --- a/scripts/qapi-types.py +++ b/scripts/qapi-types.py @@ -86,6 +86,7 @@ def generate_struct(expr): structname = expr.get('type', ) fieldname = expr.get('field', ) members = expr['data'] +base = expr.get('base') ret = mcgen(''' struct %(name)s @@ -93,6 +94,9 @@ struct %(name)s ''', name=structname) +if base: +ret += generate_struct_fields({'base': base}) + ret += generate_struct_fields(members) if len(fieldname): diff --git a/scripts/qapi-visit.py b/scripts/qapi-visit.py index 1e44004..c39e628 100644 --- a/scripts/qapi-visit.py +++ b/scripts/qapi-visit.py @@ -17,7 +17,7 @@ import os import getopt import errno -def generate_visit_struct_fields(name, field_prefix, fn_prefix, members): +def generate_visit_struct_fields(name, field_prefix, fn_prefix, members, base = None): substructs = [] ret = '' full_name = name if not fn_prefix else %s_%s % (name, fn_prefix) @@ -42,6 +42,19 @@ static void visit_type_%(full_name)s_fields(Visitor *m, %(name)s ** obj, Error * name=name, full_name=full_name) push_indent() +if base: +ret += mcgen(''' +visit_start_implicit_struct(m, obj ? (void**) (*obj)-%(c_name)s : NULL, sizeof(%(type)s), err); +if (!err) { +visit_type_%(type)s_fields(m, obj ? (*obj)-%(c_prefix)s%(c_name)s : NULL, err); +error_propagate(errp, err); +err = NULL; +visit_end_implicit_struct(m, err); +} +''', + c_prefix=c_var(field_prefix), + type=type_name(base), c_name=c_var('base')) + for argname, argentry, optional, structured in parse_args(members): if optional: ret += mcgen(''' @@ -124,8 +137,9 @@ def generate_visit_struct(expr): name = expr['type'] members = expr['data'] +base = expr.get('base') -ret = generate_visit_struct_fields(name, , , members) +ret = generate_visit_struct_fields(name, , , members, base) ret += mcgen(''' -- 1.8.1.4
[Qemu-devel] [PATCH v3 12/17] blockdev: Move virtio-blk device creation to drive_init
Signed-off-by: Kevin Wolf kw...@redhat.com Reviewed-by: Max Reitz mre...@redhat.com Reviewed-by: Eric Blake ebl...@redhat.com --- blockdev.c | 54 +++--- 1 file changed, 27 insertions(+), 27 deletions(-) diff --git a/blockdev.c b/blockdev.c index 1eaefb0..1c05c7a 100644 --- a/blockdev.c +++ b/blockdev.c @@ -318,7 +318,6 @@ static DriveInfo *blockdev_init(QDict *bs_opts, int ro = 0; int bdrv_flags = 0; int on_read_error, on_write_error; -const char *devaddr; DriveInfo *dinfo; ThrottleConfig cfg; int snapshot = 0; @@ -472,20 +471,12 @@ static DriveInfo *blockdev_init(QDict *bs_opts, } } -if ((devaddr = qemu_opt_get(opts, addr)) != NULL) { -if (type != IF_VIRTIO) { -error_report(addr is not supported by this bus type); -return NULL; -} -} - /* init */ dinfo = g_malloc0(sizeof(*dinfo)); dinfo-id = g_strdup(qemu_opts_id(opts)); dinfo-bdrv = bdrv_new(dinfo-id); dinfo-bdrv-open_flags = snapshot ? BDRV_O_SNAPSHOT : 0; dinfo-bdrv-read_only = ro; -dinfo-devaddr = devaddr; dinfo-type = type; dinfo-refcount = 1; if (serial != NULL) { @@ -512,22 +503,8 @@ static DriveInfo *blockdev_init(QDict *bs_opts, case IF_FLOPPY: case IF_PFLASH: case IF_MTD: -break; case IF_VIRTIO: -{ -/* add virtio block device */ -QemuOpts *devopts; -devopts = qemu_opts_create_nofail(qemu_find_opts(device)); -if (arch_type == QEMU_ARCH_S390X) { -qemu_opt_set(devopts, driver, virtio-blk-s390); -} else { -qemu_opt_set(devopts, driver, virtio-blk-pci); -} -qemu_opt_set(devopts, drive, dinfo-id); -if (devaddr) -qemu_opt_set(devopts, addr, devaddr); break; -} default: abort(); } @@ -651,6 +628,10 @@ QemuOptsList qemu_legacy_drive_opts = { .name = boot, .type = QEMU_OPT_BOOL, .help = (deprecated, ignored), +},{ +.name = addr, +.type = QEMU_OPT_STRING, +.help = pci address (virtio only), }, { /* end of list */ } }, @@ -666,6 +647,7 @@ DriveInfo *drive_init(QemuOpts *all_opts, BlockInterfaceType block_default_type) BlockInterfaceType type; int cyls, heads, secs, translation; int max_devs, bus_id, unit_id, index; +const char *devaddr; Error *local_err = NULL; /* Change legacy command line options into QMP ones */ @@ -866,6 +848,27 @@ DriveInfo *drive_init(QemuOpts *all_opts, BlockInterfaceType block_default_type) g_free(new_id); } +/* Add virtio block device */ +devaddr = qemu_opt_get(legacy_opts, addr); +if (devaddr type != IF_VIRTIO) { +error_report(addr is not supported by this bus type); +goto fail; +} + +if (type == IF_VIRTIO) { +QemuOpts *devopts; +devopts = qemu_opts_create_nofail(qemu_find_opts(device)); +if (arch_type == QEMU_ARCH_S390X) { +qemu_opt_set(devopts, driver, virtio-blk-s390); +} else { +qemu_opt_set(devopts, driver, virtio-blk-pci); +} +qemu_opt_set(devopts, drive, qdict_get_str(bs_opts, id)); +if (devaddr) { +qemu_opt_set(devopts, addr, devaddr); +} +} + /* Actual block device init: Functionality shared with blockdev-add */ dinfo = blockdev_init(bs_opts, type, media); if (dinfo == NULL) { @@ -883,6 +886,7 @@ DriveInfo *drive_init(QemuOpts *all_opts, BlockInterfaceType block_default_type) dinfo-bus = bus_id; dinfo-unit = unit_id; +dinfo-devaddr = devaddr; fail: qemu_opts_del(legacy_opts); @@ -2259,10 +2263,6 @@ QemuOptsList qemu_common_drive_opts = { .type = QEMU_OPT_STRING, .help = write error action, },{ -.name = addr, -.type = QEMU_OPT_STRING, -.help = pci address (virtio only), -},{ .name = read-only, .type = QEMU_OPT_BOOL, .help = open drive file as read-only, -- 1.8.1.4
[Qemu-devel] [PATCH v3 07/17] blockdev: Move parsing of 'media' option to drive_init
This moves as much as possible of the processing of the 'media' option to drive_init so that it can only be accessed using legacy functions, but never with anything blockdev-add related. Signed-off-by: Kevin Wolf kw...@redhat.com Reviewed-by: Benoit Canet ben...@irqsave.net Reviewed-by: Eric Blake ebl...@redhat.com --- blockdev.c | 73 ++ 1 file changed, 50 insertions(+), 23 deletions(-) diff --git a/blockdev.c b/blockdev.c index cbe907b..0eaaffa 100644 --- a/blockdev.c +++ b/blockdev.c @@ -305,16 +305,18 @@ static bool check_throttle_config(ThrottleConfig *cfg, Error **errp) return true; } +typedef enum { MEDIA_DISK, MEDIA_CDROM } DriveMediaType; + /* Takes the ownership of bs_opts */ static DriveInfo *blockdev_init(QDict *bs_opts, -BlockInterfaceType block_default_type) +BlockInterfaceType block_default_type, +DriveMediaType media) { const char *buf; const char *file = NULL; const char *serial; const char *mediastr = ; BlockInterfaceType type; -enum { MEDIA_DISK, MEDIA_CDROM } media; int bus_id, unit_id; int cyls, heads, secs, translation; int max_devs; @@ -335,7 +337,6 @@ static DriveInfo *blockdev_init(QDict *bs_opts, BlockDriver *drv = NULL; translation = BIOS_ATA_TRANSLATION_AUTO; -media = MEDIA_DISK; /* Check common options by copying from bs_opts to opts, all other options * stay in bs_opts for processing by bdrv_open(). */ @@ -422,19 +423,11 @@ static DriveInfo *blockdev_init(QDict *bs_opts, } } -if ((buf = qemu_opt_get(opts, media)) != NULL) { -if (!strcmp(buf, disk)) { - media = MEDIA_DISK; - } else if (!strcmp(buf, cdrom)) { -if (cyls || secs || heads) { -error_report(CHS can't be set with media=%s, buf); - return NULL; -} - media = MEDIA_CDROM; - } else { - error_report('%s' invalid media, buf); - return NULL; - } +if (media == MEDIA_CDROM) { +if (cyls || secs || heads) { +error_report(CHS can't be set with media=cdrom); +return NULL; +} } if ((buf = qemu_opt_get(opts, discard)) != NULL) { @@ -755,11 +748,27 @@ static void qemu_opt_rename(QemuOpts *opts, const char *from, const char *to) } } +QemuOptsList qemu_legacy_drive_opts = { +.name = drive, +.head = QTAILQ_HEAD_INITIALIZER(qemu_legacy_drive_opts.head), +.desc = { +{ +.name = media, +.type = QEMU_OPT_STRING, +.help = media type (disk, cdrom), +}, +{ /* end of list */ } +}, +}; + DriveInfo *drive_init(QemuOpts *all_opts, BlockInterfaceType block_default_type) { const char *value; -DriveInfo *dinfo; +DriveInfo *dinfo = NULL; QDict *bs_opts; +QemuOpts *legacy_opts; +DriveMediaType media = MEDIA_DISK; +Error *local_err = NULL; /* Change legacy command line options into QMP ones */ qemu_opt_rename(all_opts, iops, throttling.iops-total); @@ -812,8 +821,29 @@ DriveInfo *drive_init(QemuOpts *all_opts, BlockInterfaceType block_default_type) bs_opts = qdict_new(); qemu_opts_to_qdict(all_opts, bs_opts); +legacy_opts = qemu_opts_create_nofail(qemu_legacy_drive_opts); +qemu_opts_absorb_qdict(legacy_opts, bs_opts, local_err); +if (error_is_set(local_err)) { +qerror_report_err(local_err); +error_free(local_err); +goto fail; +} + +/* Media type */ +value = qemu_opt_get(legacy_opts, media); +if (value) { +if (!strcmp(value, disk)) { +media = MEDIA_DISK; +} else if (!strcmp(value, cdrom)) { +media = MEDIA_CDROM; +} else { +error_report('%s' invalid media, value); +goto fail; +} +} + /* Actual block device init: Functionality shared with blockdev-add */ -dinfo = blockdev_init(bs_opts, block_default_type); +dinfo = blockdev_init(bs_opts, block_default_type, media); if (dinfo == NULL) { goto fail; } @@ -823,6 +853,7 @@ DriveInfo *drive_init(QemuOpts *all_opts, BlockInterfaceType block_default_type) dinfo-opts = all_opts; fail: +qemu_opts_del(legacy_opts); return dinfo; } @@ -2115,7 +2146,7 @@ void qmp_blockdev_add(BlockdevOptions *options, Error **errp) qdict_flatten(qdict); -dinfo = blockdev_init(qdict, IF_NONE); +dinfo = blockdev_init(qdict, IF_NONE, MEDIA_DISK); if (!dinfo) { error_setg(errp, Could not open image); goto fail; @@ -2184,10 +2215,6 @@ QemuOptsList qemu_common_drive_opts = { .type = QEMU_OPT_STRING, .help = chs translation (auto, lba. none), },{ -.name = media, -.type = QEMU_OPT_STRING,
[Qemu-devel] [PATCH v3 09/17] blockdev: Moving parsing of geometry options to drive_init
This moves all of the geometry options (cyls/heads/secs/trans) to drive_init so that they can only be accessed using legacy functions, but never with anything blockdev-add related. Signed-off-by: Kevin Wolf kw...@redhat.com Reviewed-by: Eric Blake ebl...@redhat.com --- blockdev.c | 136 +++-- 1 file changed, 69 insertions(+), 67 deletions(-) diff --git a/blockdev.c b/blockdev.c index 58c3ab9..b1c7590 100644 --- a/blockdev.c +++ b/blockdev.c @@ -317,7 +317,6 @@ static DriveInfo *blockdev_init(QDict *bs_opts, const char *serial; const char *mediastr = ; int bus_id, unit_id; -int cyls, heads, secs, translation; int max_devs; int index; int ro = 0; @@ -335,8 +334,6 @@ static DriveInfo *blockdev_init(QDict *bs_opts, bool has_driver_specific_opts; BlockDriver *drv = NULL; -translation = BIOS_ATA_TRANSLATION_AUTO; - /* Check common options by copying from bs_opts to opts, all other options * stay in bs_opts for processing by bdrv_open(). */ id = qdict_get_try_str(bs_opts, id); @@ -365,10 +362,6 @@ static DriveInfo *blockdev_init(QDict *bs_opts, unit_id = qemu_opt_get_number(opts, unit, -1); index = qemu_opt_get_number(opts, index, -1); -cyls = qemu_opt_get_number(opts, cyls, 0); -heads = qemu_opt_get_number(opts, heads, 0); -secs = qemu_opt_get_number(opts, secs, 0); - snapshot = qemu_opt_get_bool(opts, snapshot, 0); ro = qemu_opt_get_bool(opts, read-only, 0); copy_on_read = qemu_opt_get_bool(opts, copy-on-read, false); @@ -378,46 +371,6 @@ static DriveInfo *blockdev_init(QDict *bs_opts, max_devs = if_max_devs[type]; -if (cyls || heads || secs) { -if (cyls 1) { -error_report(invalid physical cyls number); - return NULL; - } -if (heads 1) { -error_report(invalid physical heads number); - return NULL; - } -if (secs 1) { -error_report(invalid physical secs number); - return NULL; - } -} - -if ((buf = qemu_opt_get(opts, trans)) != NULL) { -if (!cyls) { -error_report('%s' trans must be used with cyls, heads and secs, - buf); -return NULL; -} -if (!strcmp(buf, none)) -translation = BIOS_ATA_TRANSLATION_NONE; -else if (!strcmp(buf, lba)) -translation = BIOS_ATA_TRANSLATION_LBA; -else if (!strcmp(buf, auto)) -translation = BIOS_ATA_TRANSLATION_AUTO; - else { -error_report('%s' invalid translation type, buf); - return NULL; - } -} - -if (media == MEDIA_CDROM) { -if (cyls || secs || heads) { -error_report(CHS can't be set with media=cdrom); -return NULL; -} -} - if ((buf = qemu_opt_get(opts, discard)) != NULL) { if (bdrv_parse_discard_flags(buf, bdrv_flags) != 0) { error_report(invalid discard option); @@ -612,10 +565,6 @@ static DriveInfo *blockdev_init(QDict *bs_opts, dinfo-type = type; dinfo-bus = bus_id; dinfo-unit = unit_id; -dinfo-cyls = cyls; -dinfo-heads = heads; -dinfo-secs = secs; -dinfo-trans = translation; dinfo-refcount = 1; if (serial != NULL) { dinfo-serial = g_strdup(serial); @@ -748,6 +697,22 @@ QemuOptsList qemu_legacy_drive_opts = { .name = if, .type = QEMU_OPT_STRING, .help = interface (ide, scsi, sd, mtd, floppy, pflash, virtio), +},{ +.name = cyls, +.type = QEMU_OPT_NUMBER, +.help = number of cylinders (ide disk geometry), +},{ +.name = heads, +.type = QEMU_OPT_NUMBER, +.help = number of heads (ide disk geometry), +},{ +.name = secs, +.type = QEMU_OPT_NUMBER, +.help = number of sectors (ide disk geometry), +},{ +.name = trans, +.type = QEMU_OPT_STRING, +.help = chs translation (auto, lba, none), }, { /* end of list */ } }, @@ -761,6 +726,7 @@ DriveInfo *drive_init(QemuOpts *all_opts, BlockInterfaceType block_default_type) QemuOpts *legacy_opts; DriveMediaType media = MEDIA_DISK; BlockInterfaceType type; +int cyls, heads, secs, translation; Error *local_err = NULL; /* Change legacy command line options into QMP ones */ @@ -850,6 +816,53 @@ DriveInfo *drive_init(QemuOpts *all_opts, BlockInterfaceType block_default_type) type = block_default_type; } +/* Geometry */ +cyls = qemu_opt_get_number(legacy_opts, cyls, 0); +heads = qemu_opt_get_number(legacy_opts, heads, 0); +secs = qemu_opt_get_number(legacy_opts, secs, 0); + +if (cyls || heads || secs) { +if (cyls 1) { +error_report(invalid physical cyls
[Qemu-devel] [PATCH v3 15/17] blockdev: Remove 'media' parameter from blockdev_init()
The remaining users shouldn't be there with blockdev-add and are easy to move to drive_init(). Bonus bug fix: As a side effect, CD-ROM drives can now use block drivers on the read-only whitelist without explicitly specifying read-only=on, even if a format is explicitly specified. Signed-off-by: Kevin Wolf kw...@redhat.com Reviewed-by: Max Reitz mre...@redhat.com Reviewed-by: Eric Blake ebl...@redhat.com --- blockdev.c | 40 +++- 1 file changed, 15 insertions(+), 25 deletions(-) diff --git a/blockdev.c b/blockdev.c index 1aedef8..b39f2e7 100644 --- a/blockdev.c +++ b/blockdev.c @@ -309,8 +309,7 @@ typedef enum { MEDIA_DISK, MEDIA_CDROM } DriveMediaType; /* Takes the ownership of bs_opts */ static DriveInfo *blockdev_init(QDict *bs_opts, -BlockInterfaceType type, -DriveMediaType media) +BlockInterfaceType type) { const char *buf; const char *file = NULL; @@ -492,22 +491,6 @@ static DriveInfo *blockdev_init(QDict *bs_opts, bdrv_set_io_limits(dinfo-bdrv, cfg); } -switch(type) { -case IF_IDE: -case IF_SCSI: -case IF_XEN: -case IF_NONE: -dinfo-media_cd = media == MEDIA_CDROM; -break; -case IF_SD: -case IF_FLOPPY: -case IF_PFLASH: -case IF_MTD: -case IF_VIRTIO: -break; -default: -abort(); -} if (!file || !*file) { if (has_driver_specific_opts) { file = NULL; @@ -529,11 +512,6 @@ static DriveInfo *blockdev_init(QDict *bs_opts, bdrv_flags |= BDRV_O_INCOMING; } -if (media == MEDIA_CDROM) { -/* CDROM is fine for any interface, don't check. */ -ro = 1; -} - bdrv_flags |= ro ? 0 : BDRV_O_RDWR; if (ro copy_on_read) { @@ -717,6 +695,7 @@ DriveInfo *drive_init(QemuOpts *all_opts, BlockInterfaceType block_default_type) media = MEDIA_DISK; } else if (!strcmp(value, cdrom)) { media = MEDIA_CDROM; +qdict_put(bs_opts, read-only, qstring_from_str(on)); } else { error_report('%s' invalid media, value); goto fail; @@ -864,7 +843,7 @@ DriveInfo *drive_init(QemuOpts *all_opts, BlockInterfaceType block_default_type) } /* Actual block device init: Functionality shared with blockdev-add */ -dinfo = blockdev_init(bs_opts, type, media); +dinfo = blockdev_init(bs_opts, type); if (dinfo == NULL) { goto fail; } @@ -882,6 +861,17 @@ DriveInfo *drive_init(QemuOpts *all_opts, BlockInterfaceType block_default_type) dinfo-unit = unit_id; dinfo-devaddr = devaddr; +switch(type) { +case IF_IDE: +case IF_SCSI: +case IF_XEN: +case IF_NONE: +dinfo-media_cd = media == MEDIA_CDROM; +break; +default: +break; +} + fail: qemu_opts_del(legacy_opts); return dinfo; @@ -2176,7 +2166,7 @@ void qmp_blockdev_add(BlockdevOptions *options, Error **errp) qdict_flatten(qdict); -dinfo = blockdev_init(qdict, IF_NONE, MEDIA_DISK); +dinfo = blockdev_init(qdict, IF_NONE); if (!dinfo) { error_setg(errp, Could not open image); goto fail; -- 1.8.1.4
[Qemu-devel] [PATCH v3 03/17] blockdev: Introduce DriveInfo.enable_auto_del
BlockDriverStates shouldn't be affected by an unplugged guest device, except if created with the legacy -drive command line option or the drive_add HMP command. Make the automatic deletion as well as cancelling of jobs conditional on an enable_auto_del boolean that is only set in drive_init(). Signed-off-by: Kevin Wolf kw...@redhat.com Reviewed-by: Eric Blake ebl...@redhat.com Reviewed-by: Wenchao Xia xiaw...@linux.vnet.ibm.com --- blockdev.c| 17 - include/sysemu/blockdev.h | 1 + 2 files changed, 17 insertions(+), 1 deletion(-) diff --git a/blockdev.c b/blockdev.c index 8aa66a9..29a5b70 100644 --- a/blockdev.c +++ b/blockdev.c @@ -89,6 +89,10 @@ void blockdev_mark_auto_del(BlockDriverState *bs) { DriveInfo *dinfo = drive_get_by_blockdev(bs); +if (dinfo !dinfo-enable_auto_del) { +return; +} + if (bs-job) { block_job_cancel(bs-job); } @@ -750,6 +754,7 @@ static void qemu_opt_rename(QemuOpts *opts, const char *from, const char *to) DriveInfo *drive_init(QemuOpts *all_opts, BlockInterfaceType block_default_type) { const char *value; +DriveInfo *dinfo; /* Change legacy command line options into QMP ones */ qemu_opt_rename(all_opts, iops, throttling.iops-total); @@ -798,7 +803,17 @@ DriveInfo *drive_init(QemuOpts *all_opts, BlockInterfaceType block_default_type) qemu_opt_unset(all_opts, cache); } -return blockdev_init(all_opts, block_default_type); +/* Actual block device init: Functionality shared with blockdev-add */ +dinfo = blockdev_init(all_opts, block_default_type); +if (dinfo == NULL) { +goto fail; +} + +/* Set legacy DriveInfo fields */ +dinfo-enable_auto_del = true; + +fail: +return dinfo; } void do_commit(Monitor *mon, const QDict *qdict) diff --git a/include/sysemu/blockdev.h b/include/sysemu/blockdev.h index 804ec88..1082091 100644 --- a/include/sysemu/blockdev.h +++ b/include/sysemu/blockdev.h @@ -37,6 +37,7 @@ struct DriveInfo { int bus; int unit; int auto_del; /* see blockdev_mark_auto_del() */ +bool enable_auto_del; /* Only for legacy drive_init() */ int media_cd; int cyls, heads, secs, trans; QemuOpts *opts; -- 1.8.1.4
[Qemu-devel] [PATCH v3 10/17] blockdev: Move parsing of 'boot' option to drive_init
It's already ignored and only prints a deprecation message. No use in making it available in new interfaces. Signed-off-by: Kevin Wolf kw...@redhat.com Reviewed-by: Eric Blake ebl...@redhat.com --- blockdev.c | 21 +++-- 1 file changed, 11 insertions(+), 10 deletions(-) diff --git a/blockdev.c b/blockdev.c index b1c7590..85e412a 100644 --- a/blockdev.c +++ b/blockdev.c @@ -456,12 +456,6 @@ static DriveInfo *blockdev_init(QDict *bs_opts, return NULL; } -if (qemu_opt_get(opts, boot) != NULL) { -fprintf(stderr, qemu-kvm: boot=on|off is deprecated and will be -ignored. Future versions will reject this parameter. Please -update your scripts.\n); -} - on_write_error = BLOCKDEV_ON_ERROR_ENOSPC; if ((buf = qemu_opt_get(opts, werror)) != NULL) { if (type != IF_IDE type != IF_SCSI type != IF_VIRTIO type != IF_NONE) { @@ -713,6 +707,10 @@ QemuOptsList qemu_legacy_drive_opts = { .name = trans, .type = QEMU_OPT_STRING, .help = chs translation (auto, lba, none), +},{ +.name = boot, +.type = QEMU_OPT_BOOL, +.help = (deprecated, ignored), }, { /* end of list */ } }, @@ -788,6 +786,13 @@ DriveInfo *drive_init(QemuOpts *all_opts, BlockInterfaceType block_default_type) goto fail; } +/* Deprecated option boot=[on|off] */ +if (qemu_opt_get(legacy_opts, boot) != NULL) { +fprintf(stderr, qemu-kvm: boot=on|off is deprecated and will be +ignored. Future versions will reject this parameter. Please +update your scripts.\n); +} + /* Media type */ value = qemu_opt_get(legacy_opts, media); if (value) { @@ -2328,10 +2333,6 @@ QemuOptsList qemu_common_drive_opts = { .name = copy-on-read, .type = QEMU_OPT_BOOL, .help = copy read data from backing file into image file, -},{ -.name = boot, -.type = QEMU_OPT_BOOL, -.help = (deprecated, ignored), }, { /* end of list */ } }, -- 1.8.1.4
[Qemu-devel] [PATCH v3 11/17] blockdev: Move bus/unit/index processing to drive_init
This requires moving the automatic ID generation at the same time, so let's do that as well. Signed-off-by: Kevin Wolf kw...@redhat.com Reviewed-by: Max Reitz mre...@redhat.com Reviewed-by: Eric Blake ebl...@redhat.com --- blockdev.c | 157 - 1 file changed, 73 insertions(+), 84 deletions(-) diff --git a/blockdev.c b/blockdev.c index 85e412a..1eaefb0 100644 --- a/blockdev.c +++ b/blockdev.c @@ -315,10 +315,6 @@ static DriveInfo *blockdev_init(QDict *bs_opts, const char *buf; const char *file = NULL; const char *serial; -const char *mediastr = ; -int bus_id, unit_id; -int max_devs; -int index; int ro = 0; int bdrv_flags = 0; int on_read_error, on_write_error; @@ -358,10 +354,6 @@ static DriveInfo *blockdev_init(QDict *bs_opts, has_driver_specific_opts = !!qdict_size(bs_opts); /* extract parameters */ -bus_id = qemu_opt_get_number(opts, bus, 0); -unit_id = qemu_opt_get_number(opts, unit, -1); -index = qemu_opt_get_number(opts, index, -1); - snapshot = qemu_opt_get_bool(opts, snapshot, 0); ro = qemu_opt_get_bool(opts, read-only, 0); copy_on_read = qemu_opt_get_bool(opts, copy-on-read, false); @@ -369,8 +361,6 @@ static DriveInfo *blockdev_init(QDict *bs_opts, file = qemu_opt_get(opts, file); serial = qemu_opt_get(opts, serial); -max_devs = if_max_devs[type]; - if ((buf = qemu_opt_get(opts, discard)) != NULL) { if (bdrv_parse_discard_flags(buf, bdrv_flags) != 0) { error_report(invalid discard option); @@ -489,66 +479,6 @@ static DriveInfo *blockdev_init(QDict *bs_opts, } } -/* compute bus and unit according index */ - -if (index != -1) { -if (bus_id != 0 || unit_id != -1) { -error_report(index cannot be used with bus and unit); -return NULL; -} -bus_id = drive_index_to_bus_id(type, index); -unit_id = drive_index_to_unit_id(type, index); -} - -/* if user doesn't specify a unit_id, - * try to find the first free - */ - -if (unit_id == -1) { - unit_id = 0; - while (drive_get(type, bus_id, unit_id) != NULL) { - unit_id++; - if (max_devs unit_id = max_devs) { - unit_id -= max_devs; - bus_id++; - } - } -} - -/* check unit id */ - -if (max_devs unit_id = max_devs) { -error_report(unit %d too big (max is %d), - unit_id, max_devs - 1); -return NULL; -} - -/* - * catch multiple definitions - */ - -if (drive_get(type, bus_id, unit_id) != NULL) { -error_report(drive with bus=%d, unit=%d (index=%d) exists, - bus_id, unit_id, index); -return NULL; -} - -/* no id supplied - create one */ -if (qemu_opts_id(opts) == NULL) { -char *new_id; -if (type == IF_IDE || type == IF_SCSI) { -mediastr = (media == MEDIA_CDROM) ? -cd : -hd; -} -if (max_devs) { -new_id = g_strdup_printf(%s%i%s%i, if_name[type], bus_id, - mediastr, unit_id); -} else { -new_id = g_strdup_printf(%s%s%i, if_name[type], - mediastr, unit_id); -} -qemu_opts_set_id(opts, new_id); -} - /* init */ dinfo = g_malloc0(sizeof(*dinfo)); dinfo-id = g_strdup(qemu_opts_id(opts)); @@ -557,8 +487,6 @@ static DriveInfo *blockdev_init(QDict *bs_opts, dinfo-bdrv-read_only = ro; dinfo-devaddr = devaddr; dinfo-type = type; -dinfo-bus = bus_id; -dinfo-unit = unit_id; dinfo-refcount = 1; if (serial != NULL) { dinfo-serial = g_strdup(serial); @@ -684,6 +612,18 @@ QemuOptsList qemu_legacy_drive_opts = { .head = QTAILQ_HEAD_INITIALIZER(qemu_legacy_drive_opts.head), .desc = { { +.name = bus, +.type = QEMU_OPT_NUMBER, +.help = bus number, +},{ +.name = unit, +.type = QEMU_OPT_NUMBER, +.help = unit number (i.e. lun for scsi), +},{ +.name = index, +.type = QEMU_OPT_NUMBER, +.help = index number, +},{ .name = media, .type = QEMU_OPT_STRING, .help = media type (disk, cdrom), @@ -725,6 +665,7 @@ DriveInfo *drive_init(QemuOpts *all_opts, BlockInterfaceType block_default_type) DriveMediaType media = MEDIA_DISK; BlockInterfaceType type; int cyls, heads, secs, translation; +int max_devs, bus_id, unit_id, index; Error *local_err = NULL; /* Change legacy command line options into QMP ones */ @@ -868,6 +809,63 @@ DriveInfo *drive_init(QemuOpts *all_opts, BlockInterfaceType block_default_type) } } +/* Device address specified by bus/unit or index. + * If none
[Qemu-devel] [PATCH v3 06/17] blockdev: Pass QDict to blockdev_init()
Working on a QDict instead of a QemuOpts that accepts anything is more in line with bdrv_open(). A QDict is what qmp_blockdev_add() already has anyway, so this saves additional conversions. And last, but not least, it allows later patches to easily extract legacy options into a separate, typed QemuOpts for drive_init() (the untyped QemuOpts that drive_init already has doesn't allow access to numbers, only strings, and is therefore useless without conversion). Signed-off-by: Kevin Wolf kw...@redhat.com Reviewed-by: Benoit Canet ben...@irqsave.net Reviewed-by: Eric Blake ebl...@redhat.com --- blockdev.c | 34 +- 1 file changed, 17 insertions(+), 17 deletions(-) diff --git a/blockdev.c b/blockdev.c index e7e6bdd..cbe907b 100644 --- a/blockdev.c +++ b/blockdev.c @@ -217,7 +217,10 @@ static void bdrv_format_print(void *opaque, const char *name) static void drive_uninit(DriveInfo *dinfo) { -qemu_opts_del(dinfo-opts); +if (dinfo-opts) { +qemu_opts_del(dinfo-opts); +} + bdrv_unref(dinfo-bdrv); g_free(dinfo-id); QTAILQ_REMOVE(drives, dinfo, next); @@ -302,7 +305,8 @@ static bool check_throttle_config(ThrottleConfig *cfg, Error **errp) return true; } -static DriveInfo *blockdev_init(QemuOpts *all_opts, +/* Takes the ownership of bs_opts */ +static DriveInfo *blockdev_init(QDict *bs_opts, BlockInterfaceType block_default_type) { const char *buf; @@ -326,7 +330,6 @@ static DriveInfo *blockdev_init(QemuOpts *all_opts, int ret; Error *error = NULL; QemuOpts *opts; -QDict *bs_opts; const char *id; bool has_driver_specific_opts; BlockDriver *drv = NULL; @@ -334,9 +337,9 @@ static DriveInfo *blockdev_init(QemuOpts *all_opts, translation = BIOS_ATA_TRANSLATION_AUTO; media = MEDIA_DISK; -/* Check common options by copying from all_opts to opts, all other options - * are stored in bs_opts. */ -id = qemu_opts_id(all_opts); +/* Check common options by copying from bs_opts to opts, all other options + * stay in bs_opts for processing by bdrv_open(). */ +id = qdict_get_try_str(bs_opts, id); opts = qemu_opts_create(qemu_common_drive_opts, id, 1, error); if (error_is_set(error)) { qerror_report_err(error); @@ -344,8 +347,6 @@ static DriveInfo *blockdev_init(QemuOpts *all_opts, return NULL; } -bs_opts = qdict_new(); -qemu_opts_to_qdict(all_opts, bs_opts); qemu_opts_absorb_qdict(opts, bs_opts, error); if (error_is_set(error)) { qerror_report_err(error); @@ -634,7 +635,6 @@ static DriveInfo *blockdev_init(QemuOpts *all_opts, dinfo-heads = heads; dinfo-secs = secs; dinfo-trans = translation; -dinfo-opts = all_opts; dinfo-refcount = 1; if (serial != NULL) { dinfo-serial = g_strdup(serial); @@ -759,6 +759,7 @@ DriveInfo *drive_init(QemuOpts *all_opts, BlockInterfaceType block_default_type) { const char *value; DriveInfo *dinfo; +QDict *bs_opts; /* Change legacy command line options into QMP ones */ qemu_opt_rename(all_opts, iops, throttling.iops-total); @@ -807,14 +808,19 @@ DriveInfo *drive_init(QemuOpts *all_opts, BlockInterfaceType block_default_type) qemu_opt_unset(all_opts, cache); } +/* Get a QDict for processing the options */ +bs_opts = qdict_new(); +qemu_opts_to_qdict(all_opts, bs_opts); + /* Actual block device init: Functionality shared with blockdev-add */ -dinfo = blockdev_init(all_opts, block_default_type); +dinfo = blockdev_init(bs_opts, block_default_type); if (dinfo == NULL) { goto fail; } /* Set legacy DriveInfo fields */ dinfo-enable_auto_del = true; +dinfo-opts = all_opts; fail: return dinfo; @@ -2109,13 +2115,7 @@ void qmp_blockdev_add(BlockdevOptions *options, Error **errp) qdict_flatten(qdict); -QemuOpts *opts = qemu_opts_from_qdict(qemu_drive_opts, qdict, local_err); -if (error_is_set(local_err)) { -error_propagate(errp, local_err); -goto fail; -} - -dinfo = blockdev_init(opts, IF_NONE); +dinfo = blockdev_init(qdict, IF_NONE); if (!dinfo) { error_setg(errp, Could not open image); goto fail; -- 1.8.1.4
[Qemu-devel] [PATCH v3 17/17] blockdev: blockdev_init() error conversion
This gives us meaningful error messages for the blockdev-add QMP command. Signed-off-by: Kevin Wolf kw...@redhat.com Reviewed-by: Max Reitz mre...@redhat.com Reviewed-by: Eric Blake ebl...@redhat.com --- blockdev.c | 59 +-- 1 file changed, 33 insertions(+), 26 deletions(-) diff --git a/blockdev.c b/blockdev.c index 61dbf26..835c8c3 100644 --- a/blockdev.c +++ b/blockdev.c @@ -272,7 +272,7 @@ static void bdrv_put_ref_bh_schedule(BlockDriverState *bs) qemu_bh_schedule(s-bh); } -static int parse_block_error_action(const char *buf, bool is_read) +static int parse_block_error_action(const char *buf, bool is_read, Error **errp) { if (!strcmp(buf, ignore)) { return BLOCKDEV_ON_ERROR_IGNORE; @@ -283,8 +283,8 @@ static int parse_block_error_action(const char *buf, bool is_read) } else if (!strcmp(buf, report)) { return BLOCKDEV_ON_ERROR_REPORT; } else { -error_report('%s' invalid %s error action, - buf, is_read ? read : write); +error_setg(errp, '%s' invalid %s error action, + buf, is_read ? read : write); return -1; } } @@ -309,7 +309,8 @@ typedef enum { MEDIA_DISK, MEDIA_CDROM } DriveMediaType; /* Takes the ownership of bs_opts */ static DriveInfo *blockdev_init(QDict *bs_opts, -BlockInterfaceType type) +BlockInterfaceType type, +Error **errp) { const char *buf; const char *file = NULL; @@ -333,15 +334,13 @@ static DriveInfo *blockdev_init(QDict *bs_opts, id = qdict_get_try_str(bs_opts, id); opts = qemu_opts_create(qemu_common_drive_opts, id, 1, error); if (error_is_set(error)) { -qerror_report_err(error); -error_free(error); +error_propagate(errp, error); return NULL; } qemu_opts_absorb_qdict(opts, bs_opts, error); if (error_is_set(error)) { -qerror_report_err(error); -error_free(error); +error_propagate(errp, error); return NULL; } @@ -361,7 +360,7 @@ static DriveInfo *blockdev_init(QDict *bs_opts, if ((buf = qemu_opt_get(opts, discard)) != NULL) { if (bdrv_parse_discard_flags(buf, bdrv_flags) != 0) { -error_report(invalid discard option); +error_setg(errp, invalid discard option); return NULL; } } @@ -383,7 +382,7 @@ static DriveInfo *blockdev_init(QDict *bs_opts, } else if (!strcmp(buf, threads)) { /* this is the default */ } else { - error_report(invalid aio option); + error_setg(errp, invalid aio option); return NULL; } } @@ -400,9 +399,10 @@ static DriveInfo *blockdev_init(QDict *bs_opts, drv = bdrv_find_whitelisted_format(buf, ro); if (!drv) { if (!ro bdrv_find_whitelisted_format(buf, !ro)) { -error_report('%s' can be only used as read-only device., buf); +error_setg(errp, '%s' can be only used as read-only device., + buf); } else { -error_report('%s' invalid format, buf); +error_setg(errp, '%s' invalid format, buf); } return NULL; } @@ -439,20 +439,20 @@ static DriveInfo *blockdev_init(QDict *bs_opts, cfg.op_size = qemu_opt_get_number(opts, throttling.iops-size, 0); if (!check_throttle_config(cfg, error)) { -error_report(%s, error_get_pretty(error)); -error_free(error); +error_propagate(errp, error); return NULL; } on_write_error = BLOCKDEV_ON_ERROR_ENOSPC; if ((buf = qemu_opt_get(opts, werror)) != NULL) { if (type != IF_IDE type != IF_SCSI type != IF_VIRTIO type != IF_NONE) { -error_report(werror is not supported by this bus type); +error_setg(errp, werror is not supported by this bus type); return NULL; } -on_write_error = parse_block_error_action(buf, 0); -if (on_write_error 0) { +on_write_error = parse_block_error_action(buf, 0, error); +if (error_is_set(error)) { +error_propagate(errp, error); return NULL; } } @@ -464,8 +464,9 @@ static DriveInfo *blockdev_init(QDict *bs_opts, return NULL; } -on_read_error = parse_block_error_action(buf, 1); -if (on_read_error 0) { +on_read_error = parse_block_error_action(buf, 1, error); +if (error_is_set(error)) { +error_propagate(errp, error); return NULL; } } @@ -518,8 +519,9 @@ static DriveInfo *blockdev_init(QDict *bs_opts, ret = bdrv_open(dinfo-bdrv, file, bs_opts, bdrv_flags, drv, error); if (ret 0) { -error_report(could not open disk image %s: %s, -
[Qemu-devel] [PATCH v3 08/17] blockdev: Move parsing of 'if' option to drive_init
It's always IF_NONE for blockdev-add. Signed-off-by: Kevin Wolf kw...@redhat.com Reviewed-by: Benoit Canet ben...@irqsave.net Reviewed-by: Eric Blake ebl...@redhat.com --- blockdev.c | 40 ++-- 1 file changed, 22 insertions(+), 18 deletions(-) diff --git a/blockdev.c b/blockdev.c index 0eaaffa..58c3ab9 100644 --- a/blockdev.c +++ b/blockdev.c @@ -309,14 +309,13 @@ typedef enum { MEDIA_DISK, MEDIA_CDROM } DriveMediaType; /* Takes the ownership of bs_opts */ static DriveInfo *blockdev_init(QDict *bs_opts, -BlockInterfaceType block_default_type, +BlockInterfaceType type, DriveMediaType media) { const char *buf; const char *file = NULL; const char *serial; const char *mediastr = ; -BlockInterfaceType type; int bus_id, unit_id; int cyls, heads, secs, translation; int max_devs; @@ -377,17 +376,6 @@ static DriveInfo *blockdev_init(QDict *bs_opts, file = qemu_opt_get(opts, file); serial = qemu_opt_get(opts, serial); -if ((buf = qemu_opt_get(opts, if)) != NULL) { -for (type = 0; type IF_COUNT strcmp(buf, if_name[type]); type++) -; -if (type == IF_COUNT) { -error_report(unsupported bus type '%s', buf); -return NULL; - } -} else { -type = block_default_type; -} - max_devs = if_max_devs[type]; if (cyls || heads || secs) { @@ -756,6 +744,10 @@ QemuOptsList qemu_legacy_drive_opts = { .name = media, .type = QEMU_OPT_STRING, .help = media type (disk, cdrom), +},{ +.name = if, +.type = QEMU_OPT_STRING, +.help = interface (ide, scsi, sd, mtd, floppy, pflash, virtio), }, { /* end of list */ } }, @@ -768,6 +760,7 @@ DriveInfo *drive_init(QemuOpts *all_opts, BlockInterfaceType block_default_type) QDict *bs_opts; QemuOpts *legacy_opts; DriveMediaType media = MEDIA_DISK; +BlockInterfaceType type; Error *local_err = NULL; /* Change legacy command line options into QMP ones */ @@ -842,8 +835,23 @@ DriveInfo *drive_init(QemuOpts *all_opts, BlockInterfaceType block_default_type) } } +/* Controller type */ +value = qemu_opt_get(legacy_opts, if); +if (value) { +for (type = 0; + type IF_COUNT strcmp(value, if_name[type]); + type++) { +} +if (type == IF_COUNT) { +error_report(unsupported bus type '%s', value); +goto fail; +} +} else { +type = block_default_type; +} + /* Actual block device init: Functionality shared with blockdev-add */ -dinfo = blockdev_init(bs_opts, block_default_type, media); +dinfo = blockdev_init(bs_opts, type, media); if (dinfo == NULL) { goto fail; } @@ -2191,10 +2199,6 @@ QemuOptsList qemu_common_drive_opts = { .type = QEMU_OPT_NUMBER, .help = unit number (i.e. lun for scsi), },{ -.name = if, -.type = QEMU_OPT_STRING, -.help = interface (ide, scsi, sd, mtd, floppy, pflash, virtio), -},{ .name = index, .type = QEMU_OPT_NUMBER, .help = index number, -- 1.8.1.4
[Qemu-devel] [PATCHv4 15/17] qemu-img: add support for fully allocated images
Signed-off-by: Peter Lieven p...@kamp.de --- qemu-img.c|8 +--- qemu-img.texi |5 + 2 files changed, 10 insertions(+), 3 deletions(-) diff --git a/qemu-img.c b/qemu-img.c index 926f0a0..c6eff15 100644 --- a/qemu-img.c +++ b/qemu-img.c @@ -100,8 +100,10 @@ static void help(void) '-h' with or without a command shows this help and lists the supported formats\n '-p' show progress of command (only certain commands)\n '-q' use Quiet mode - do not print any output (except errors)\n - '-S' indicates the consecutive number of bytes that must contain only zeros\n - for qemu-img to create a sparse image during conversion\n + '-S' indicates the consecutive number of bytes (defaults to 4k) that must\n + contain only zeros for qemu-img to create a sparse image during\n + conversion. if the number of bytes is 0 sparse files are disabled and\n + images will always be fully allocated\n '--output' takes the format in which the output must be done (human or json)\n '-n' skips the target volume creation (useful if the volume is created\n prior to running qemu-img)\n @@ -1465,7 +1467,7 @@ static int img_convert(int argc, char **argv) /* signal EOF to align */ bdrv_write_compressed(out_bs, 0, NULL, 0); } else { -int has_zero_init = bdrv_has_zero_init(out_bs); +int has_zero_init = min_sparse ? bdrv_has_zero_init(out_bs) : 0; sector_num = 0; // total number of sectors converted so far nb_sectors = total_sectors - sector_num; diff --git a/qemu-img.texi b/qemu-img.texi index 768054e..51a1ee5 100644 --- a/qemu-img.texi +++ b/qemu-img.texi @@ -193,6 +193,11 @@ Image conversion is also useful to get smaller image when using a growable format such as @code{qcow} or @code{cow}: the empty sectors are detected and suppressed from the destination image. +@var{sparse_size} indicates the consecutive number of bytes (defaults to 4k) +that must contain only zeros for qemu-img to create a sparse image during +conversion. If the number of bytes is 0 sparse files are disabled and +images will always be fully allocated. + You can use the @var{backing_file} option to force the output image to be created as a copy on write image of the specified base image; the @var{backing_file} should have the same content as the input's base image, -- 1.7.9.5
[Qemu-devel] [PATCH v3 13/17] blockdev: Remove IF_* check for read-only blockdev_init
IF_NONE allows read-only, which makes forbidding it in this place for other types pretty much pointless. Instead, make sure that all devices for which the check would have errored out check in their init function that they don't get a read-only BlockDriverState. This catches even cases where IF_NONE and -device is used. Signed-off-by: Kevin Wolf kw...@redhat.com Reviewed-by: Eric Blake ebl...@redhat.com --- blockdev.c | 6 -- hw/block/m25p80.c | 5 + hw/block/xen_disk.c| 5 + hw/sd/milkymist-memcard.c | 4 hw/sd/omap_mmc.c | 6 ++ hw/sd/pl181.c | 4 hw/sd/pxa2xx_mmci.c| 3 +++ hw/sd/sd.c | 5 + hw/sd/sdhci.c | 3 +++ hw/sd/ssi-sd.c | 3 +++ tests/qemu-iotests/051.out | 5 - 11 files changed, 42 insertions(+), 7 deletions(-) diff --git a/blockdev.c b/blockdev.c index 1c05c7a..1aedef8 100644 --- a/blockdev.c +++ b/blockdev.c @@ -532,12 +532,6 @@ static DriveInfo *blockdev_init(QDict *bs_opts, if (media == MEDIA_CDROM) { /* CDROM is fine for any interface, don't check. */ ro = 1; -} else if (ro == 1) { -if (type != IF_SCSI type != IF_VIRTIO type != IF_FLOPPY -type != IF_NONE type != IF_PFLASH) { -error_report(read-only not supported by this bus type); -goto err; -} } bdrv_flags |= ro ? 0 : BDRV_O_RDWR; diff --git a/hw/block/m25p80.c b/hw/block/m25p80.c index 8c3b7f0..02a1544 100644 --- a/hw/block/m25p80.c +++ b/hw/block/m25p80.c @@ -624,6 +624,11 @@ static int m25p80_init(SSISlave *ss) if (dinfo dinfo-bdrv) { DB_PRINT_L(0, Binding to IF_MTD drive\n); s-bdrv = dinfo-bdrv; +if (bdrv_is_read_only(s-bdrv)) { +fprintf(stderr, Can't use a read-only drive); +return 1; +} + /* FIXME: Move to late init */ if (bdrv_read(s-bdrv, 0, s-storage, DIV_ROUND_UP(s-size, BDRV_SECTOR_SIZE))) { diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c index f35fc59..253a45f 100644 --- a/hw/block/xen_disk.c +++ b/hw/block/xen_disk.c @@ -829,6 +829,11 @@ static int blk_connect(struct XenDevice *xendev) /* setup via qemu cmdline - already setup for us */ xen_be_printf(blkdev-xendev, 2, get configured bdrv (cmdline setup)\n); blkdev-bs = blkdev-dinfo-bdrv; +if (bdrv_is_read_only(blkdev-bs) !readonly) { +xen_be_printf(blkdev-xendev, 0, Unexpected read-only drive); +blkdev-bs = NULL; +return -1; +} /* blkdev-bs is not create by us, we get a reference * so we can bdrv_unref() unconditionally */ bdrv_ref(blkdev-bs); diff --git a/hw/sd/milkymist-memcard.c b/hw/sd/milkymist-memcard.c index 42613b3..d1168c9 100644 --- a/hw/sd/milkymist-memcard.c +++ b/hw/sd/milkymist-memcard.c @@ -255,6 +255,10 @@ static int milkymist_memcard_init(SysBusDevice *dev) dinfo = drive_get_next(IF_SD); s-card = sd_init(dinfo ? dinfo-bdrv : NULL, false); +if (s-card == NULL) { +return -1; +} + s-enabled = dinfo ? bdrv_is_inserted(dinfo-bdrv) : 0; memory_region_init_io(s-regs_region, OBJECT(s), memcard_mmio_ops, s, diff --git a/hw/sd/omap_mmc.c b/hw/sd/omap_mmc.c index bf5d1fb..937a478 100644 --- a/hw/sd/omap_mmc.c +++ b/hw/sd/omap_mmc.c @@ -593,6 +593,9 @@ struct omap_mmc_s *omap_mmc_init(hwaddr base, /* Instantiate the storage */ s-card = sd_init(bd, false); +if (s-card == NULL) { +exit(1); +} return s; } @@ -618,6 +621,9 @@ struct omap_mmc_s *omap2_mmc_init(struct omap_target_agent_s *ta, /* Instantiate the storage */ s-card = sd_init(bd, false); +if (s-card == NULL) { +exit(1); +} s-cdet = qemu_allocate_irqs(omap_mmc_cover_cb, s, 1)[0]; sd_set_cb(s-card, NULL, s-cdet); diff --git a/hw/sd/pl181.c b/hw/sd/pl181.c index 03875bf..c35896d 100644 --- a/hw/sd/pl181.c +++ b/hw/sd/pl181.c @@ -491,6 +491,10 @@ static int pl181_init(SysBusDevice *sbd) qdev_init_gpio_out(dev, s-cardstatus, 2); dinfo = drive_get_next(IF_SD); s-card = sd_init(dinfo ? dinfo-bdrv : NULL, false); +if (s-card == NULL) { +return -1; +} + return 0; } diff --git a/hw/sd/pxa2xx_mmci.c b/hw/sd/pxa2xx_mmci.c index 90c955f..b9d8b1a 100644 --- a/hw/sd/pxa2xx_mmci.c +++ b/hw/sd/pxa2xx_mmci.c @@ -539,6 +539,9 @@ PXA2xxMMCIState *pxa2xx_mmci_init(MemoryRegion *sysmem, /* Instantiate the actual storage */ s-card = sd_init(bd, false); +if (s-card == NULL) { +exit(1); +} register_savevm(NULL, pxa2xx_mmci, 0, 0, pxa2xx_mmci_save, pxa2xx_mmci_load, s); diff --git a/hw/sd/sd.c b/hw/sd/sd.c index 346d86f..7380f06 100644 --- a/hw/sd/sd.c +++ b/hw/sd/sd.c @@ -494,6 +494,11 @@ SDState *sd_init(BlockDriverState *bs, bool is_spi) {
Re: [Qemu-devel] [PATCH] osdep: initialize glib threads in all QEMU tools
Am 08.10.2013 um 11:58 hat Stefan Hajnoczi geschrieben: glib versions prior to 2.31.0 require an explicit g_thread_init() call to enable multi-threading. Failure to initialize threading causes glib to take single-threaded code paths without synchronization. For example, the g_slice allocator will crash due to race conditions. Fix this for all QEMU tool programs (qemu-nbd, qemu-io, qemu-img) by moving the g_thread_init() call from vl.c:main() into a new osdep.c:thread_init() constructor function. thread_init() has __attribute__((constructor)) and is automatically invoked by the runtime during startup. We can now drop the simple trace backend's g_thread_init() call since thread_init() already called it. Note that we must keep coroutine-gthread.c's g_thread_init() call which is located in a constructor function. There is no guarantee for constructor function ordering so thread_init() may only be called later. The glib documentation says: Since version 2.24, calling g_thread_init() multiple times is allowed, but nothing happens except for the first call. I take that this means previously it wasn't allowed. qemu's configure checks for a minimum version of 2.12, so we seems to support glib versions that don't allow g_thread_init() to be called multiple times. Do we need to protect against this? Kevin
[Qemu-devel] [PATCH v3 14/17] qemu-iotests: Check autodel behaviour for device_del
Block devices creates with -drive and drive_add should automatically disappear if the guest device is unplugged. blockdev-add ones shouldn't. Signed-off-by: Kevin Wolf kw...@redhat.com Reviewed-by: Max Reitz mre...@redhat.com Reviewed-by: Eric Blake ebl...@redhat.com --- tests/qemu-iotests/064 | 133 +++ tests/qemu-iotests/064.out | 80 +++ tests/qemu-iotests/common.filter | 8 +++ tests/qemu-iotests/group | 1 + 4 files changed, 222 insertions(+) create mode 100755 tests/qemu-iotests/064 create mode 100644 tests/qemu-iotests/064.out diff --git a/tests/qemu-iotests/064 b/tests/qemu-iotests/064 new file mode 100755 index 000..79dc38b --- /dev/null +++ b/tests/qemu-iotests/064 @@ -0,0 +1,133 @@ +#!/bin/bash +# +# Test automatic deletion of BDSes created by -drive/drive_add +# +# Copyright (C) 2013 Red Hat, Inc. +# +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 2 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program. If not, see http://www.gnu.org/licenses/. +# + +# creator +owner=kw...@redhat.com + +seq=`basename $0` +echo QA output created by $seq + +here=`pwd` +tmp=/tmp/$$ +status=1 # failure is the default! + +# get standard environment, filters and checks +. ./common.rc +. ./common.filter + +_supported_fmt qcow2 +_supported_proto file +_supported_os Linux + +function do_run_qemu() +{ +echo Testing: $@ +$QEMU -nographic -qmp stdio -serial none $@ +echo +} + +function run_qemu() +{ +do_run_qemu $@ 21 | _filter_testdir | _filter_qmp +} + +size=128M + +_make_test_img $size + +echo +echo === -drive/-device and device_del === +echo + +run_qemu -drive file=$TEST_IMG,format=$IMGFMT,if=none,id=disk -device virtio-blk-pci,drive=disk,id=virtio0 EOF +{ execute: qmp_capabilities } +{ execute: query-block } +{ execute: device_del, arguments: { id: virtio0 } } +{ execute: system_reset } +{ execute: query-block } +{ execute: quit } +EOF + +echo +echo === -drive/device_add and device_del === +echo + +run_qemu -drive file=$TEST_IMG,format=$IMGFMT,if=none,id=disk EOF +{ execute: qmp_capabilities } +{ execute: query-block } +{ execute: device_add, + arguments: { driver: virtio-blk-pci, drive: disk, + id: virtio0 } } +{ execute: device_del, arguments: { id: virtio0 } } +{ execute: system_reset } +{ execute: query-block } +{ execute: quit } +EOF + +echo +echo === drive_add/device_add and device_del === +echo + +run_qemu EOF +{ execute: qmp_capabilities } +{ execute: human-monitor-command, + arguments: { command-line: drive_add 0 file=$TEST_IMG,format=$IMGFMT,if=none,id=disk } } +{ execute: query-block } +{ execute: device_add, + arguments: { driver: virtio-blk-pci, drive: disk, + id: virtio0 } } +{ execute: device_del, arguments: { id: virtio0 } } +{ execute: system_reset } +{ execute: query-block } +{ execute: quit } +EOF + +echo +echo === blockdev_add/device_add and device_del === +echo + +run_qemu EOF +{ execute: qmp_capabilities } +{ execute: blockdev-add, + arguments: { + options: { +driver: $IMGFMT, +id: disk, +file: { +driver: file, +filename: $TEST_IMG +} + } +} + } +{ execute: query-block } +{ execute: device_add, + arguments: { driver: virtio-blk-pci, drive: disk, + id: virtio0 } } +{ execute: device_del, arguments: { id: virtio0 } } +{ execute: system_reset } +{ execute: query-block } +{ execute: quit } +EOF + +# success, all done +echo *** done +rm -f $seq.full +status=0 diff --git a/tests/qemu-iotests/064.out b/tests/qemu-iotests/064.out new file mode 100644 index 000..8c7a334 --- /dev/null +++ b/tests/qemu-iotests/064.out @@ -0,0 +1,80 @@ +QA output created by 064 +Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728 + +=== -drive/-device and device_del === + +Testing: -drive file=TEST_DIR/t.qcow2,format=qcow2,if=none,id=disk -device virtio-blk-pci,drive=disk,id=virtio0 +QMP_VERSION +{return: {}} +{return: [{io-status: ok, device: disk, locked: false, removable: false, inserted: {iops_rd: 0, image: {virtual-size: 134217728, filename: TEST_DIR/t.qcow2, cluster-size: 65536, format: qcow2, actual-size: 139264, dirty-flag: false}, iops_wr: 0, ro: false, backing_file_depth: 0, drv: qcow2, iops: 0, bps_wr: 0, encrypted: false, bps: 0, bps_rd: 0, file: TEST_DIR/t.qcow2, encryption_key_missing: false}, type: unknown}, {io-status: ok, device: ide1-cd0, locked: false,
[Qemu-devel] [PATCH v3 16/17] blockdev: Don't disable COR automatically with blockdev-add
If a read-only device is configured with copy-on-read=on, the old code only prints a warning and automatically disables copy on read. Make it a real error for blockdev-add. Signed-off-by: Kevin Wolf kw...@redhat.com Reviewed-by: Max Reitz mre...@redhat.com Reviewed-by: Eric Blake ebl...@redhat.com --- block.c| 9 +++-- blockdev.c | 31 +++ 2 files changed, 34 insertions(+), 6 deletions(-) diff --git a/block.c b/block.c index 93e113a..dc63f02 100644 --- a/block.c +++ b/block.c @@ -774,8 +774,13 @@ static int bdrv_open_common(BlockDriverState *bs, BlockDriverState *file, } assert(bs-copy_on_read == 0); /* bdrv_new() and bdrv_close() make it so */ -if (!bs-read_only (flags BDRV_O_COPY_ON_READ)) { -bdrv_enable_copy_on_read(bs); +if (flags BDRV_O_COPY_ON_READ) { +if (!bs-read_only) { +bdrv_enable_copy_on_read(bs); +} else { +error_setg(errp, Can't use copy-on-read on read-only device); +return -EINVAL; +} } if (filename != NULL) { diff --git a/blockdev.c b/blockdev.c index b39f2e7..61dbf26 100644 --- a/blockdev.c +++ b/blockdev.c @@ -514,10 +514,6 @@ static DriveInfo *blockdev_init(QDict *bs_opts, bdrv_flags |= ro ? 0 : BDRV_O_RDWR; -if (ro copy_on_read) { -error_report(warning: disabling copy_on_read on read-only drive); -} - QINCREF(bs_opts); ret = bdrv_open(dinfo-bdrv, file, bs_opts, bdrv_flags, drv, error); @@ -605,6 +601,18 @@ QemuOptsList qemu_legacy_drive_opts = { .type = QEMU_OPT_STRING, .help = pci address (virtio only), }, + +/* Options that are passed on, but have special semantics with -drive */ +{ +.name = read-only, +.type = QEMU_OPT_BOOL, +.help = open drive file as read-only, +},{ +.name = copy-on-read, +.type = QEMU_OPT_BOOL, +.help = copy read data from backing file into image file, +}, + { /* end of list */ } }, }; @@ -620,6 +628,7 @@ DriveInfo *drive_init(QemuOpts *all_opts, BlockInterfaceType block_default_type) int cyls, heads, secs, translation; int max_devs, bus_id, unit_id, index; const char *devaddr; +bool read_only, copy_on_read; Error *local_err = NULL; /* Change legacy command line options into QMP ones */ @@ -702,6 +711,20 @@ DriveInfo *drive_init(QemuOpts *all_opts, BlockInterfaceType block_default_type) } } +/* copy-on-read is disabled with a warning for read-only devices */ +read_only = qemu_opt_get_bool(legacy_opts, read-only, false); +copy_on_read = qemu_opt_get_bool(legacy_opts, copy-on-read, false); + +if (read_only copy_on_read) { +error_report(warning: disabling copy-on-read on read-only drive); +copy_on_read = false; +} + +qdict_put(bs_opts, read-only, + qstring_from_str(read_only ? on : off)); +qdict_put(bs_opts, copy-on-read, + qstring_from_str(copy_on_read ? on :off)); + /* Controller type */ value = qemu_opt_get(legacy_opts, if); if (value) { -- 1.8.1.4
Re: [Qemu-devel] [PATCH] block/iscsi: introduce bdrv_co_{readv, writev, flush_to_disk}
Am 02.10.2013 um 17:46 hat Paolo Bonzini geschrieben: Il 02/10/2013 17:41, Peter Lieven ha scritto: this converts read, write and flush functions from aio to coroutines. I'm not sure it's already the time for this... Cancellation sucks in QEMU, and this is going to make things even worse. Not sure what you're referring to. If you mean iscsi_aio_cancel(), isn't it dead code anyway since we changed block.c to use coroutines for everything? bdrv_co_io_em() even throws the acb away, so even if you wanted, there's no way to cancel the request even today. And the lines deleted/inserted are much in favour of this patch. Kevin
Re: [Qemu-devel] [PATCH] block/iscsi: introduce bdrv_co_{readv, writev, flush_to_disk}
Il 08/10/2013 14:33, Kevin Wolf ha scritto: this converts read, write and flush functions from aio to coroutines. I'm not sure it's already the time for this... Cancellation sucks in QEMU, and this is going to make things even worse. Not sure what you're referring to. If you mean iscsi_aio_cancel(), isn't it dead code anyway since we changed block.c to use coroutines for everything? bdrv_co_io_em() even throws the acb away, so even if you wanted, there's no way to cancel the request even today. SCSI tries to use cancellation, and this results in VCPU threads starving all other threads. So I would like to introduce cancellation points for coroutines. Paolo
Re: [Qemu-devel] [PATCHv5] block/get_block_status: avoid redundant callouts on raw devices
Il 08/10/2013 14:05, Peter Lieven ha scritto: Strictly speaking, this should probably do something like this: assert(ret BDRV_BLOCK_OFFSET_VALID); return bdrv_get_block_status(bs-file, ret BDRV_SECTOR_BITS, nb_sectors, pnum); shouldn't the last line be: return bdrv_get_block_status(bs-file, ret BDRV_SECTOR_BITS, *pnum, pnum); This would of course require *pnum = nb_sectors in raw_co_get_block_status ? Yes for both questions. Paolo
Re: [Qemu-devel] [PATCH] qemu-iotests: Correct 026 output
Am 02.10.2013 um 16:45 hat Max Reitz geschrieben: Because l2_allocate now frees the unused L2 cluster on error, the according test cases in 026 don't result in one leaked cluster anymore. Signed-off-by: Max Reitz mre...@redhat.com --- This patch depends on and is a follow-up to qcow2: Free allocated L2 cluster on error which was part of the qcow2: Small error path fixes for l2_allocate series. This series is fully merged to Kevin's block branch, however, this one patch was excluded from his pull request last Friday, therefore, it is missing from master. --- tests/qemu-iotests/026.out | 32 tests/qemu-iotests/026.out.nocache | 32 2 files changed, 16 insertions(+), 48 deletions(-) Stefan, it seems you already picked this up? Kevin
Re: [Qemu-devel] [PATCH] block/iscsi: introduce bdrv_co_{readv, writev, flush_to_disk}
Am 08.10.2013 um 14:35 hat Paolo Bonzini geschrieben: Il 08/10/2013 14:33, Kevin Wolf ha scritto: this converts read, write and flush functions from aio to coroutines. I'm not sure it's already the time for this... Cancellation sucks in QEMU, and this is going to make things even worse. Not sure what you're referring to. If you mean iscsi_aio_cancel(), isn't it dead code anyway since we changed block.c to use coroutines for everything? bdrv_co_io_em() even throws the acb away, so even if you wanted, there's no way to cancel the request even today. SCSI tries to use cancellation, and this results in VCPU threads starving all other threads. So I would like to introduce cancellation points for coroutines. Sounds like a nice thing to have, but it's unrelated to this patch. Cancellation means waiting for request completion before and after the patch. Kevin
[Qemu-devel] [PATCHv6] block/get_block_status: avoid redundant callouts on raw devices
if a raw device like an iscsi target or host device is used the current implementation makes a second call out to get the block status of bs-file. Signed-off-by: Peter Lieven p...@kamp.de --- v6: made the result of raw_co_get_block_status valid by adding BDRV_BLOCK_DATA and setting *pnum to nb_sectors and calling bdrv_get_block_status on bs-file with the offset and *pnum returned in the first callout. v5: add a generic get_lba_status function in the raw driver which adds the BDRV_BLOCK_RAW flag. bdrv_co_get_block_status will handle the callout to bs-file then. v4: use a flag to detect the raw driver instead of the strncmp hack block.c |6 ++ block/raw_bsd.c |4 +++- include/block/block.h |4 3 files changed, 13 insertions(+), 1 deletion(-) diff --git a/block.c b/block.c index 93e113a..826eff0 100644 --- a/block.c +++ b/block.c @@ -3147,6 +3147,12 @@ static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs, return ret; } +if (ret BDRV_BLOCK_RAW) { +assert(ret BDRV_BLOCK_OFFSET_VALID); +return bdrv_get_block_status(bs-file, ret BDRV_SECTOR_BITS, + *pnum, pnum); +} + if (!(ret BDRV_BLOCK_DATA)) { if (bdrv_has_zero_init(bs)) { ret |= BDRV_BLOCK_ZERO; diff --git a/block/raw_bsd.c b/block/raw_bsd.c index d4ace60..d61906b 100644 --- a/block/raw_bsd.c +++ b/block/raw_bsd.c @@ -62,7 +62,9 @@ static int64_t coroutine_fn raw_co_get_block_status(BlockDriverState *bs, int64_t sector_num, int nb_sectors, int *pnum) { -return bdrv_get_block_status(bs-file, sector_num, nb_sectors, pnum); +*pnum = nb_sectors; +return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID | BDRV_BLOCK_DATA | + (sector_num BDRV_SECTOR_BITS); } static int coroutine_fn raw_co_write_zeroes(BlockDriverState *bs, diff --git a/include/block/block.h b/include/block/block.h index f808550..003699e 100644 --- a/include/block/block.h +++ b/include/block/block.h @@ -84,6 +84,9 @@ typedef struct BlockDevOps { /* BDRV_BLOCK_DATA: data is read from bs-file or another file * BDRV_BLOCK_ZERO: sectors read as zero * BDRV_BLOCK_OFFSET_VALID: sector stored in bs-file as raw data + * BDRV_BLOCK_RAW: used internally to indicate that the request + * was answered by the raw driver and that one + * should look in bs-file directly. * * If BDRV_BLOCK_OFFSET_VALID is set, bits 9-62 represent the offset in * bs-file where sector data can be read from as raw data. @@ -105,6 +108,7 @@ typedef struct BlockDevOps { #define BDRV_BLOCK_DATA 1 #define BDRV_BLOCK_ZERO 2 #define BDRV_BLOCK_OFFSET_VALID 4 +#define BDRV_BLOCK_RAW 8 #define BDRV_BLOCK_OFFSET_MASK BDRV_SECTOR_MASK typedef enum { -- 1.7.9.5
[Qemu-devel] [PATCHv4 17/17] block/raw: copy BlockLimits on raw_open
Signed-off-by: Peter Lieven p...@kamp.de --- block/raw_bsd.c |1 + 1 file changed, 1 insertion(+) diff --git a/block/raw_bsd.c b/block/raw_bsd.c index 8dc7bba..2c26b79 100644 --- a/block/raw_bsd.c +++ b/block/raw_bsd.c @@ -159,6 +159,7 @@ static int raw_open(BlockDriverState *bs, QDict *options, int flags, Error **errp) { bs-sg = bs-file-sg; +bs-bl = bs-file-bl; return 0; } -- 1.7.9.5
Re: [Qemu-devel] [PATCH 0/3] qapi: introduce BlockJobType enum
Am 08.10.2013 um 11:29 hat Fam Zheng geschrieben: Currently, block job type is hard coded string and could be repeated in different places in the code base. Introduce a enum type in QAPI to make it better for maintenance and introspection. The old BlockJobType struct is renamed to BlockJobDriver and its field job_type becomes a BlockJobType enum. Nothing is changed to the interface. Fam Zheng (3): blockjob: rename BlockJobType to BlockJobDriver qapi: Introduce enum BlockJobType qapi: make use of new BlockJobType Thanks, applied to the block branch. Kevin
Re: [Qemu-devel] [Xen-devel] Hvmloader: Add _STA for PCI hotplug slots
Il 29/09/2013 02:30, Gonglei (Arei) ha scritto: -Original Message- From: Konrad Rzeszutek Wilk [mailto:konrad.w...@oracle.com] Sent: Saturday, September 28, 2013 5:43 AM To: Gonglei (Arei); anthony.per...@citrix.com; Stefano Stabellini Cc: xen-de...@lists.xen.org; Hanweidong (Randy); Yanqiangjun; Luonengjun; qemu-devel@nongnu.org; Gaowei (UVP); Huangweidong (Hardware) Subject: Re: [Xen-devel] Hvmloader: Add _STA for PCI hotplug slots On Fri, Sep 27, 2013 at 06:29:20AM +, Gonglei (Arei) wrote: Hi, Hey, (CCing Stefano and Anthony). In Xen platform, after using upstream qemu, the all of pci devices will show hotplug in the windows guest. In this situation, the windows guest may occur blue screen when VM' user click the icon of VGA card for trying unplug VGA card. However, we don't hope VM's user can do such dangerous operation, and showing all pci devices inside the guest OS is unfriendly. In addition, I find the traditional qemu have not this problem, and KVM also. Is there any news about this patch please? On the KVM platform, the seabios will read the RMV bits of pci slot (according the 0xae08 I/O port register), then modify the SSDT table. The key steps as follows: In Seabios: #define PCI_RMV_BASE 0xae0c// 0xae08 I/O port register static void* build_ssdt(void) { ... // build Device object for each slot u32 rmvc_pcrm = inl(PCI_RMV_BASE); ... } In upstream Qemu, read 0xae0c I/O port register function: static uint64_t pci_read(void *opaque, hwaddr addr, unsigned int size) { ... case PCI_RMV_BASE - PCI_HOTPLUG_ADDR: val = s-pci0_hotplug_enable; break; } s-pci0_hotplug_enable is set by the follow function: static void piix4_update_hotplug(PIIX4PMState *s) { ... s-pci0_hotplug_enable = ~0; s-pci0_slot_device_present = 0; QTAILQ_FOREACH_SAFE(kid, bus-children, sibling, next) { DeviceState *qdev = kid-child; PCIDevice *pdev = PCI_DEVICE(qdev); PCIDeviceClass *pc = PCI_DEVICE_GET_CLASS(pdev); int slot = PCI_SLOT(pdev-devfn); //setting by PCIDeviceClass *k-no_hotplug if (pc-no_hotplug) { s-pci0_hotplug_enable = ~(1U slot); } s-pci0_slot_device_present |= (1U slot); } } But, on the XEN platform, ACPI DSDT tables is produced by the hvmloader, more details in this patch: http://xen.1045712.n5.nabble.com/xen-unstable-hvmloader-acpi-dsdt-Fix-PCI- hotplug-with-the-new-qemu-xen-td4947152.html # Node ID 1a912ce93b506a185b54fd97986214e6eff8a0bc # Parent 6bc03e22f921aadfa7e5cebe92100cb01377947d hvmloader/acpi/dsdt: Fix PCI hotplug with the new qemu-xen. oddly enough you did not CC the author of said patch? I am doing that for you. That's my mistake, thank you so much! The ACPI PIIX4 device in QEMU upstream as not the same behavior to handle PCI hotplug. This patch introduce the necessary change to the DSDT ACPI table to behave as expceted by the new QEMU. To switch to this new DSDT table version, there is a new option --dm-version to mk_dsdt. Change are inspired by SeaBIOS DSDT source code. There is few things missing with the new QEMU: - QEMU provide the plugged/unplugged status only per slot (and not per func like qemu-xen-traditionnal. - I did not include the _STA ACPI method that give the status of a device (present, functionning properly) because qemu-xen does not handle it. - I did not include the _RMV method that say if the device can be removed, because the IO port of QEMU that give this status always return true. In SeaBIOS table, they have a specific _RMV method for VGA, ISA that return false. But I'm not sure that we can do the same in Xen. now, I add the _STA method, return the different value according the 0xae08 I/O port register on read, a pci device allow hotplug return 0x1f, a pci device don't allow return 0x1e. Then the pci devices which don't allow hotplug will not show inside the guest OS. So you are saying that the 'the IO port of QEMU that give this status always return true. is no longer correct? Yes, now the RMV bits of pci slot is set by the PCIDeviceClass's no_hotplug attribute, such as: static void cirrus_vga_class_init(ObjectClass *klass, void *data) { DeviceClass *dc = DEVICE_CLASS(klass); PCIDeviceClass *k = PCI_DEVICE_CLASS(klass); k-no_hotplug = 1; ... ... } If k-no_hotplug equal 1, the corresponding slot of PIIX4PMState pci0_hotplug_enable bit will be cleared. Otherwise the slot's pci0_hotplug_enable bit will be set. Index: tools/firmware/hvmloader/acpi/mk_dsdt.c === --- tools/firmware/hvmloader/acpi/mk_dsdt.c (revision 1105) +++ tools/firmware/hvmloader/acpi/mk_dsdt.c (working copy) @@ -437,6 +437,10 @@ indent(); printf(B0EJ, 32,\n); pop_block(); +stmt(OperationRegion, SRMV, SystemIO,
Re: [Qemu-devel] [PATCH 3/3] qapi: make use of new BlockJobType
On 10/08/2013 03:29 AM, Fam Zheng wrote: Switch the string to enum type BlockJobType in BlockJobDriver. Signed-off-by: Fam Zheng f...@redhat.com --- +++ b/include/block/blockjob.h @@ -37,7 +37,7 @@ typedef struct BlockJobDriver { size_t instance_size; /** String describing the operation, part of query-block-jobs QMP API */ -const char *job_type; +BlockJobType job_type; Comment looks awkward now that it is not a string in memory; but it is still a string on the wire, so I can live with it as-is. Series: Reviewed-by: Eric Blake ebl...@redhat.com Thanks for doing this! -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] [PATCH] osdep: initialize glib threads in all QEMU tools
Il 08/10/2013 14:25, Kevin Wolf ha scritto: The glib documentation says: Since version 2.24, calling g_thread_init() multiple times is allowed, but nothing happens except for the first call. I take that this means previously it wasn't allowed. qemu's configure checks for a minimum version of 2.12, so we seems to support glib versions that don't allow g_thread_init() to be called multiple times. Do we need to protect against this? I think that's the point of the if (!g_thread_supported ()) tests. Paolo
Re: [Qemu-devel] [PATCH v4] Extend qemu-ga's 'guest-info' command to expose flag 'success-response'
On 10/08/2013 12:23 AM, Mark Wu wrote: Now we have several qemu-ga commands not returning response on success. It has been documented in qga/qapi-schema.json already. This patch exposes the 'success-response' flag by extending 'guest-info' command. With this change, the clients can handle the command response more flexibly. Signed-off-by: Mark Wu wu...@linux.vnet.ibm.com --- Changes: v4: Add signature of qmp_has_success_response per Michael. v3: 1. treat cmd-options as a bitmask instead of single option (per Eric) 2. rebase on the patch Add interface to traverse the qmp command list by QmpCommand to avoid the O(n2) problem (per Eric and Michael) v2: add the notation 'since 1.7' to the option 'success-response' (per Eric Blake's comments) include/qapi/qmp/dispatch.h | 1 + qapi/qmp-registry.c | 5 + qga/commands.c | 1 + qga/qapi-schema.json| 5 - 4 files changed, 11 insertions(+), 1 deletion(-) Reviewed-by: Eric Blake ebl...@redhat.com -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] [PATCH v2] Add interface to traverse the qmp command list by QmpCommand
On 10/08/2013 12:23 AM, Mark Wu wrote: In the original code, qmp_get_command_list is used to construct a list of all commands' name. To get the information of all qga commands, it traverses the name list and search the command info with its name. So it can cause O(n^2) in the number of commands. This patch adds an interface to traverse the qmp command list by QmpCommand to replace qmp_get_command_list. It can decrease the complexity from O(n^2) to O(n). Signed-off-by: Mark Wu wu...@linux.vnet.ibm.com --- Changes: v2: 1. Keep the signature of qmp_command_is_enabled (per Eric and Michael) 2. Remove the unnecessary pointer castings (per Eric) include/qapi/qmp/dispatch.h | 5 ++-- qapi/qmp-registry.c | 27 +++--- qga/commands.c | 38 ++--- qga/main.c | 68 + 4 files changed, 48 insertions(+), 90 deletions(-) Reviewed-by: Eric Blake ebl...@redhat.com +++ b/qga/main.c @@ -347,48 +347,34 @@ static gint ga_strcmp(gconstpointer str1, gconstpointer str2) } /* disable commands that aren't safe for fsfreeze */ -static void ga_disable_non_whitelisted(void) +static void ga_disable_non_whitelisted(QmpCommand *cmd, void *opaque) { -char **list_head, **list; bool whitelisted; int i; -list_head = list = qmp_get_command_list(); -while (*list != NULL) { -whitelisted = false; -i = 0; -while (ga_freeze_whitelist[i] != NULL) { -if (strcmp(*list, ga_freeze_whitelist[i]) == 0) { -whitelisted = true; -} -i++; -} -if (!whitelisted) { -g_debug(disabling command: %s, *list); -qmp_disable_command(*list); +whitelisted = false; +i = 0; +while (ga_freeze_whitelist[i] != NULL) { +if (strcmp(cmd-name, ga_freeze_whitelist[i]) == 0) { This (and several other instances) accesses cmd-name directly where we were formally using the string from the returned list. Should these spots be modified to go through an accessor method instead, similar to qmp_command_is_enabled, so that QmpCommand can be treated as an opaque type from this file? -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] [PATCH 0/8] Make icount thread-safe
Am 08.10.2013 10:47, schrieb Paolo Bonzini: This series moves the icount state under the same seqlock as the normal vm_clock implementation. It is not yet 100% thread-safe, because the CPU list should be moved under RCU protection (due to the call to !all_cpu_threads_idle() in qemu_clock_warp). However it is a substantial step forward, the only uncovered case being CPU hotplug. Please review. Paolo Paolo Bonzini (8): timers: extract timer_mod_ns_locked and timerlist_rearm timers: add timer_mod_anticipate and timer_mod_anticipate_ns timers: use cpu_get_icount() directly timers: reorganize icount_warp_rt timers: prepare the code for future races in calling qemu_clock_warp timers: introduce cpu_get_clock_locked timers: document (future) locking rules for icount timers: make icount thread-safe These patches touch cpus.c exclusively, so timers: is rather misleading. As you know I have pending patches (in need of rebase due to the performance issue you raised) moving the icount CPU fields around. Is there anything in particular I should be aware of? Looks to me as if this may be orthogonal? What about the previous patch disabling icount for -smp? Does this series supersede it or does it fix different concurrency issues? Thanks, Andreas cpus.c | 110 - include/qemu/timer.h | 26 + qemu-timer.c | 74 +++-- 4 files changed, 163 insertions(+), 47 deletions(-) create mode 100644 include/qemu/seqlock.h -- SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
Re: [Qemu-devel] [PATCH v5 0/3] pci: partially implement master abort protocol
On Mon, Sep 16, 2013 at 11:21:13AM +0300, Marcel Apfelbaum wrote: PCI spec requires that a transaction that has not been claimed by any PCI bus devices will be terminated by the initiator with master abort. For read transactions -1() is returned and writes are silently dropped. OK looks good to me, I put this on the pci branch. Implementation: - Allowed the MemoryRegion priority to be negative so a subregion will be visible on all the addresses not covered by other container subregions. - Added a memory region with negative priority that extends over all the pci address space. This region catches all the accesses to the unassigned pci addresses. - The MemoryRegion's ops emulates the master abort scenario. I am working on implementing the following on top of this series - Implement upstream master abort - Handling of RECEIVED MASTER ABORT BIT in Status register Changes from v4: - Addressed Peter Maydell comments - Changed memory patches commit comment - Addressed Michael S. Tsirkin comments - Changed PCI master_abort_mem ops endian-nes to DEVICE_LITTLE_ENDIAN Changes from v3: - Addressed Peter Maydell comments - Removed unnecessary changes to priority of MemoryListener - Ensured that priority is now signed in all related places - Added to memory docs explanation on signed priorities - Addresses Michael S. Tsirkin comments - Changed the name of the new Memory region to master_abort_mem - Made master abort priority INT_MIN instead of -1 - Removed handling of RECEIVED MASTER ABORT BIT; it will be taken care in a different series Changes from v2: - minor: changed nr of patches in the title - minor: modified series list Changes from v1: - pci-unassigned-mem MemoryRegion resides now in PCIBus and not on various Host Bridges - pci-unassgined-mem does not have a .valid.accept field and implements read write methods Marcel Apfelbaum (3): memory: Change MemoryRegion priorities from unsigned to signed docs/memory: Explictly state that MemoryRegion priority is signed hw/pci: partially handle pci master abort docs/memory.txt | 4 hw/core/sysbus.c | 4 ++-- hw/pci/pci.c | 27 +++ include/exec/memory.h| 4 ++-- include/hw/pci/pci_bus.h | 1 + include/hw/sysbus.h | 2 +- memory.c | 4 ++-- 7 files changed, 39 insertions(+), 7 deletions(-) -- 1.8.3.1
Re: [Qemu-devel] [PATCH 0/8] Make icount thread-safe
Il 08/10/2013 15:47, Andreas Färber ha scritto: Am 08.10.2013 10:47, schrieb Paolo Bonzini: This series moves the icount state under the same seqlock as the normal vm_clock implementation. It is not yet 100% thread-safe, because the CPU list should be moved under RCU protection (due to the call to !all_cpu_threads_idle() in qemu_clock_warp). However it is a substantial step forward, the only uncovered case being CPU hotplug. Please review. Paolo Paolo Bonzini (8): timers: extract timer_mod_ns_locked and timerlist_rearm timers: add timer_mod_anticipate and timer_mod_anticipate_ns timers: use cpu_get_icount() directly timers: reorganize icount_warp_rt timers: prepare the code for future races in calling qemu_clock_warp timers: introduce cpu_get_clock_locked timers: document (future) locking rules for icount timers: make icount thread-safe These patches touch cpus.c exclusively, so timers: is rather misleading. I can change that to icount. As you know I have pending patches (in need of rebase due to the performance issue you raised) moving the icount CPU fields around. Is there anything in particular I should be aware of? Looks to me as if this may be orthogonal? It's entirely orthogonal. It doesn't affect the cpu-exec part of icount, only the timers :) part. What about the previous patch disabling icount for -smp? Does this series supersede it or does it fix different concurrency issues? This is for making accesses to icount safe without holding the BQL. icount for -smp remains just as broken as before. Paolo
Re: [Qemu-devel] [PATCH v7 4/6] qcow2: Add support for ImageInfoSpecific
Am 02.10.2013 um 10:39 hat Max Reitz geschrieben: Add a new ImageInfoSpecificQCow2 type as a subtype of ImageInfoSpecific. This contains the compatibility level as a string and an optional lazy_refcounts boolean (optional means mandatory for compat = 1.1 and not available for compat == 0.10). Also, add qcow2_get_specific_info, which returns this information. Signed-off-by: Max Reitz mre...@redhat.com Reviewed-by: Eric Blake ebl...@redhat.com --- block/qcow2.c| 19 +++ qapi-schema.json | 16 2 files changed, 35 insertions(+) diff --git a/block/qcow2.c b/block/qcow2.c index 4a9888c..396650d 100644 --- a/block/qcow2.c +++ b/block/qcow2.c @@ -1810,6 +1810,24 @@ static int qcow2_get_info(BlockDriverState *bs, BlockDriverInfo *bdi) return 0; } +static ImageInfoSpecific *qcow2_get_specific_info(BlockDriverState *bs) +{ +BDRVQcowState *s = bs-opaque; +ImageInfoSpecific *spec_info = g_new0(ImageInfoSpecific, 1); + +spec_info-kind = IMAGE_INFO_SPECIFIC_KIND_QCOW2; +spec_info-qcow2 = g_new0(ImageInfoSpecificQCow2, 1); +if (s-qcow_version == 2) { +spec_info-qcow2-compat = g_strdup(0.10); +} else if (s-qcow_version == 3) { +spec_info-qcow2-compat = g_strdup(1.1); +spec_info-qcow2-lazy_refcounts = s-use_lazy_refcounts; If this function gets reused in a running qemu instance, e.g. from the monitor, I think the semantics we'd want it to have is to describe the image file, not the current options (which may have been modified on the command line or using other monitor commands). By checking s-compatible_features instead of s-use_lazy_refcounts you would always return this information. (Also, how could you leave out this opportunity to use compound literals?! ;-)) +spec_info-qcow2-has_lazy_refcounts = true; +} + +return spec_info; +} Kevin
Re: [Qemu-devel] [PATCH] osdep: initialize glib threads in all QEMU tools
Am 08.10.2013 um 15:08 hat Paolo Bonzini geschrieben: Il 08/10/2013 14:25, Kevin Wolf ha scritto: The glib documentation says: Since version 2.24, calling g_thread_init() multiple times is allowed, but nothing happens except for the first call. I take that this means previously it wasn't allowed. qemu's configure checks for a minimum version of 2.12, so we seems to support glib versions that don't allow g_thread_init() to be called multiple times. Do we need to protect against this? I think that's the point of the if (!g_thread_supported ()) tests. Ah yes, I think you're right. Not the best function name I've ever seen that glib uses there, but okay. Kevin
[Qemu-devel] Current qemu-master hangs when used with qxl + linux guest
Hi All, I'm having this weird problem with qemu master + spice/qxl using guests. As soon as the guest starts Xorg, I get the following message from qemu: main-loop: WARNING: I/O thread spun for 1000 iterations And from then on the guest hangs and qemu consumes 100% cpu. The qemu console still works, and I can quit qemu that way. Doing ctrl+c + a thread apply all bt in qemy shows one cpu thread waiting for the iothread-lock, and all other threads waiting in poll. This happens both with non kms guests (tried RHEL-6.5, older Fedoras) as well as with kms guests (tried a fully up2date F-19). Since I've not seen any similar reports, I assume it is something with my setup ... I've tried changing various things: -removing the spice agent channel -changing the number of virtual cpus (tried 1 and 2 virtual cpus) -upgrading spice-server to the latest git master But all to no avail. This is with qemu-master build from source on a fully up2date F-20 system, using the F-20 seabios files. If someone has any clever ideas I'll happily try debugging this further. Regards, Hans