Re: [Qemu-devel] QEMU and Real Time OS
On Fri, Jan 30, 2015 at 08:37:47AM +0100, Jan Kiszka wrote: On 2015-01-30 00:06, Paolo Bonzini wrote: On 29/01/2015 20:37, Marc MarĂ wrote: Is this an expected behaviour? I can't see why. I'd like to know if there is a certain reason why it doesn't work. Or if it should work and the problem is too much I/O overhead. Or any other hint to understand it. It is due to latencies in the host. You need at least to use preempt-rt kernels in the host as well. That alone won't help much. You also need to fine-tune the guest to avoid running into QEMU locks that continuously synchronizes the guest on things like VGA or disk I/O emulation. When using KVM, thus being able to run VCPUs widely independent of each other and the device models, you need to push cyclictest on an isolated second virtual CPU of the guest. Luiz and Marcelo can probably confirm this based on their ongoing experiments. Yes, we have achieved low latencies by using a dedicated pCPU for a guest vCPU. This also avoids iothread - guest vCPU -RT priority issues. With TCG, we would first of all have to make it true SMP and independent of the I/O device lock. That's what Frederic is working on [1]. Jan [1] http://permalink.gmane.org/gmane.comp.emulators.qemu/314406 -- Siemens AG, Corporate Technology, CT RTC ITP SES-DE Corporate Competence Center Embedded Linux
Re: [Qemu-devel] [PATCH 1/2] glusterfs: fix max_discard
On 02/02/15 23:40, Peter Lieven wrote: Am 02.02.2015 um 21:09 schrieb Denis V. Lunev: qemu_gluster_co_discard calculates size to discard as follows size_t size = nb_sectors * BDRV_SECTOR_SIZE; ret = glfs_discard_async(s-fd, offset, size, gluster_finish_aiocb, acb); glfs_discard_async is declared as follows: int glfs_discard_async (glfs_fd_t *fd, off_t length, size_t lent, glfs_io_cbk fn, void *data) __THROW This is problematic on i686 as sizeof(size_t) == 4. Set bl_max_discard to SIZE_MAX BDRV_SECTOR_BITS to avoid overflow on i386. Signed-off-by: Denis V. Lunev d...@openvz.org CC: Kevin Wolf kw...@redhat.com CC: Peter Lieven p...@kamp.de --- block/gluster.c | 9 + 1 file changed, 9 insertions(+) diff --git a/block/gluster.c b/block/gluster.c index 1eb3a8c..8a8c153 100644 --- a/block/gluster.c +++ b/block/gluster.c @@ -622,6 +622,11 @@ out: return ret; } +static void qemu_gluster_refresh_limits(BlockDriverState *bs, Error **errp) +{ +bs-bl.max_discard = MIN(SIZE_MAX BDRV_SECTOR_BITS, INT_MAX); +} + Looking at the gluster code bl.max_transfer_length should have the same limit, but thats a different patch. ha, the same applies to nbd code too. I'll do this stuff tomorrow and also I think that some audit in other drivers could reveal something interesting. Den
Re: [Qemu-devel] [PATCH v3 06/14] qemu-img: Use blk_new_open() in img_rebase()
On 2015-02-02 at 14:00, Kevin Wolf wrote: Am 26.01.2015 um 16:00 hat Max Reitz geschrieben: Signed-off-by: Max Reitz mre...@redhat.com --- qemu-img.c | 57 - 1 file changed, 24 insertions(+), 33 deletions(-) diff --git a/qemu-img.c b/qemu-img.c index be1953d..0b23c87 100644 --- a/qemu-img.c +++ b/qemu-img.c @@ -2430,7 +2430,6 @@ static int img_rebase(int argc, char **argv) { BlockBackend *blk = NULL, *blk_old_backing = NULL, *blk_new_backing = NULL; BlockDriverState *bs = NULL, *bs_old_backing = NULL, *bs_new_backing = NULL; -BlockDriver *old_backing_drv, *new_backing_drv; char *filename; const char *fmt, *cache, *src_cache, *out_basefmt, *out_baseimg; int c, flags, src_flags, ret; @@ -2524,54 +2523,46 @@ static int img_rebase(int argc, char **argv) } bs = blk_bs(blk); -/* Find the right drivers for the backing files */ -old_backing_drv = NULL; -new_backing_drv = NULL; - -if (!unsafe bs-backing_format[0] != '\0') { -old_backing_drv = bdrv_find_format(bs-backing_format); -if (old_backing_drv == NULL) { -error_report(Invalid format name: '%s', bs-backing_format); -ret = -1; -goto out; -} -} - -if (out_basefmt != NULL) { -new_backing_drv = bdrv_find_format(out_basefmt); -if (new_backing_drv == NULL) { -error_report(Invalid format name: '%s', out_basefmt); -ret = -1; -goto out; -} -} You're removing the validity check of the new backing file format: [master] $ ./qemu-img rebase -u -b /tmp/backing.qcow2 -F foobar /tmp/test.qcow2 qemu-img: Invalid format name: 'foobar' [master] $ [growable-v3] $ ./qemu-img rebase -u -b /tmp/backing.qcow2 -F foobar /tmp/test.qcow2 [growable-v3] $ Oops, right, I missed that this changes the unsafe path. Thanks for catching it! I'll fix it in v4. Max
Re: [Qemu-devel] [PATCH v3 13/14] block: Remove growable from BDS
Am 26.01.2015 um 16:00 hat Max Reitz geschrieben: Now that request clamping is done in the BlockBackend, the growable field can be removed from the BlockDriverState. All BDSs are now treated as being growable (that is, they are allowed to grow; they are not necessarily actually able to). Signed-off-by: Max Reitz mre...@redhat.com --- block.c | 24 +--- block/qcow2.c | 6 -- block/raw-posix.c | 2 +- block/raw-win32.c | 2 +- block/sheepdog.c | 2 +- include/block/block_int.h | 3 --- 6 files changed, 8 insertions(+), 31 deletions(-) diff --git a/block.c b/block.c index d45e4dd..356a857 100644 --- a/block.c +++ b/block.c @@ -970,7 +970,6 @@ static int bdrv_open_common(BlockDriverState *bs, BlockDriverState *file, bs-zero_beyond_eof = true; open_flags = bdrv_open_flags(bs, flags); bs-read_only = !(open_flags BDRV_O_RDWR); -bs-growable = !!(flags BDRV_O_PROTOCOL); if (use_bdrv_whitelist !bdrv_is_whitelisted(drv, bs-read_only)) { error_setg(errp, @@ -1885,7 +1884,6 @@ void bdrv_close(BlockDriverState *bs) bs-encrypted = 0; bs-valid_key = 0; bs-sg = 0; -bs-growable = 0; bs-zero_beyond_eof = false; QDECREF(bs-options); bs-options = NULL; @@ -2645,25 +2643,13 @@ exit: static int bdrv_check_byte_request(BlockDriverState *bs, int64_t offset, size_t size) { -int64_t len; - if (size INT_MAX) { return -EIO; } -if (!bdrv_is_inserted(bs)) +if (!bdrv_is_inserted(bs)) { return -ENOMEDIUM; - -if (bs-growable) -return 0; - -len = bdrv_getlength(bs); - -if (offset 0) -return -EIO; Wouldn't it be better to keep this one, even though bs-growable used to disable it? Kevin
Re: [Qemu-devel] [PATCH 2/2] nbd: fix max_discard
Am 02.02.2015 um 19:29 schrieb Denis V. Lunev: nbd_co_discard calls nbd_client_session_co_discard which uses uint32_t as the length in bytes of the data to discard due to the following definition: struct nbd_request { uint32_t magic; uint32_t type; uint64_t handle; uint64_t from; uint32_t len; -- the length of data to be discarded, in bytes } QEMU_PACKED; Thus we should limit bl_max_discard to UINT32_MAX BDRV_SECTOR_BITS to avoid overflow. Signed-off-by: Denis V. Lunev d...@openvz.org CC: Kevin Wolf kw...@redhat.com CC: Peter Lieven p...@kamp.de --- block/nbd.c | 8 1 file changed, 8 insertions(+) diff --git a/block/nbd.c b/block/nbd.c index 04cc845..99af713 100644 --- a/block/nbd.c +++ b/block/nbd.c @@ -301,6 +301,11 @@ static int nbd_co_flush(BlockDriverState *bs) return nbd_client_session_co_flush(s-client); } +static void nbd_refresh_limits(BlockDriverState *bs, Error **errp) +{ +bs-bl.max_discard = UINT32_MAX BDRV_SECTOR_BITS; +} + static int nbd_co_discard(BlockDriverState *bs, int64_t sector_num, int nb_sectors) { @@ -396,6 +401,7 @@ static BlockDriver bdrv_nbd = { .bdrv_close = nbd_close, .bdrv_co_flush_to_os= nbd_co_flush, .bdrv_co_discard= nbd_co_discard, +.bdrv_refresh_limits= nbd_refresh_limits, .bdrv_getlength = nbd_getlength, .bdrv_detach_aio_context= nbd_detach_aio_context, .bdrv_attach_aio_context= nbd_attach_aio_context, @@ -413,6 +419,7 @@ static BlockDriver bdrv_nbd_tcp = { .bdrv_close = nbd_close, .bdrv_co_flush_to_os= nbd_co_flush, .bdrv_co_discard= nbd_co_discard, +.bdrv_refresh_limits= nbd_refresh_limits, .bdrv_getlength = nbd_getlength, .bdrv_detach_aio_context= nbd_detach_aio_context, .bdrv_attach_aio_context= nbd_attach_aio_context, @@ -430,6 +437,7 @@ static BlockDriver bdrv_nbd_unix = { .bdrv_close = nbd_close, .bdrv_co_flush_to_os= nbd_co_flush, .bdrv_co_discard= nbd_co_discard, +.bdrv_refresh_limits= nbd_refresh_limits, .bdrv_getlength = nbd_getlength, .bdrv_detach_aio_context= nbd_detach_aio_context, .bdrv_attach_aio_context= nbd_attach_aio_context, Reviewed-by: Peter Lieven p...@kamp.de
[Qemu-devel] [PATCH 1/2] glusterfs: fix max_discard
qemu_gluster_co_discard calculates size to discard as follows size_t size = nb_sectors * BDRV_SECTOR_SIZE; ret = glfs_discard_async(s-fd, offset, size, gluster_finish_aiocb, acb); glfs_discard_async is declared as follows: int glfs_discard_async (glfs_fd_t *fd, off_t length, size_t lent, glfs_io_cbk fn, void *data) __THROW This is problematic on i686 as sizeof(size_t) == 4. Set bl_max_discard to SIZE_MAX BDRV_SECTOR_BITS to avoid overflow on i386. Signed-off-by: Denis V. Lunev d...@openvz.org CC: Kevin Wolf kw...@redhat.com CC: Peter Lieven p...@kamp.de --- block/gluster.c | 9 + 1 file changed, 9 insertions(+) diff --git a/block/gluster.c b/block/gluster.c index 1eb3a8c..8a8c153 100644 --- a/block/gluster.c +++ b/block/gluster.c @@ -622,6 +622,11 @@ out: return ret; } +static void qemu_gluster_refresh_limits(BlockDriverState *bs, Error **errp) +{ +bs-bl.max_discard = MIN(SIZE_MAX BDRV_SECTOR_BITS, INT_MAX); +} + #ifdef CONFIG_GLUSTERFS_DISCARD static coroutine_fn int qemu_gluster_co_discard(BlockDriverState *bs, int64_t sector_num, int nb_sectors) @@ -735,6 +740,7 @@ static BlockDriver bdrv_gluster = { #ifdef CONFIG_GLUSTERFS_DISCARD .bdrv_co_discard = qemu_gluster_co_discard, #endif +.bdrv_refresh_limits = qemu_gluster_refresh_limits, #ifdef CONFIG_GLUSTERFS_ZEROFILL .bdrv_co_write_zeroes = qemu_gluster_co_write_zeroes, #endif @@ -762,6 +768,7 @@ static BlockDriver bdrv_gluster_tcp = { #ifdef CONFIG_GLUSTERFS_DISCARD .bdrv_co_discard = qemu_gluster_co_discard, #endif +.bdrv_refresh_limits = qemu_gluster_refresh_limits, #ifdef CONFIG_GLUSTERFS_ZEROFILL .bdrv_co_write_zeroes = qemu_gluster_co_write_zeroes, #endif @@ -789,6 +796,7 @@ static BlockDriver bdrv_gluster_unix = { #ifdef CONFIG_GLUSTERFS_DISCARD .bdrv_co_discard = qemu_gluster_co_discard, #endif +.bdrv_refresh_limits = qemu_gluster_refresh_limits, #ifdef CONFIG_GLUSTERFS_ZEROFILL .bdrv_co_write_zeroes = qemu_gluster_co_write_zeroes, #endif @@ -816,6 +824,7 @@ static BlockDriver bdrv_gluster_rdma = { #ifdef CONFIG_GLUSTERFS_DISCARD .bdrv_co_discard = qemu_gluster_co_discard, #endif +.bdrv_refresh_limits = qemu_gluster_refresh_limits, #ifdef CONFIG_GLUSTERFS_ZEROFILL .bdrv_co_write_zeroes = qemu_gluster_co_write_zeroes, #endif -- 1.9.1
Re: [Qemu-devel] [RFC v2 6/8] vfio_pci: fix a wrong check in vfio_pci_reset
On Wed, 2015-01-28 at 16:37 +0800, Chen Fan wrote: when vfio device support FLR, then when device reset, we call VFIO_DEVICE_RESET ioctl to reset the device first, at kernel side, we also can see the order of reset: 3330 rc = pcie_flr(dev, probe); 3331 if (rc != -ENOTTY) 3332 goto done; 3334 rc = pci_af_flr(dev, probe); 3335 if (rc != -ENOTTY) 3336 goto done; 3337 3338 rc = pci_pm_reset(dev, probe); 3339 if (rc != -ENOTTY) 3340 goto done; so when vfio has FLR, reset it directly. Signed-off-by: Chen Fan chen.fan.f...@cn.fujitsu.com --- hw/vfio/pci.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index 8c81bb3..54eb6b4 100644 --- a/hw/vfio/pci.c +++ b/hw/vfio/pci.c @@ -3455,7 +3455,7 @@ static void vfio_pci_reset(DeviceState *dev) vfio_pci_pre_reset(vdev); if (vdev-vbasedev.reset_works -(vdev-has_flr || !vdev-has_pm_reset) +vdev-has_flr !ioctl(vdev-vbasedev.fd, VFIO_DEVICE_RESET)) { trace_vfio_pci_reset_flr(vdev-vbasedev.name); goto post_reset; Does this actually fix anything? QEMU shouldn't rely on a specific behavior of the kernel. This test is de-prioritizing a PM reset because they're often non-effective. If the device supports FLR, the second part of the OR is unreached, so what's the point of this change? Thanks, Alex
[Qemu-devel] [PATCH 0/3] nbd: Drop BDS backpointer
Right now, bdrv_swap() on NBD BDSs results in a segmentation fault pretty much all of the time. This series fixes this. Note that this is not a common case, as bdrv_swap() is generally only performed on root BDSs (there are exceptions, though) and NBD BDSs normally have a format BDS above them. However, due to misconfiguration (or maybe it is not even a misconfiguration, but just a strange configuration) these cases may indeed occur. For the iotest to succeed, this series relies on iotests: Specify format for qemu-nbd. I took the second patch in this series from my other series block: Rework bdrv_close_all() (which has 21 patches itself and depends on 64 other patches, so making this series rely on that one probably would not have been a very good idea). Max Reitz (3): nbd: Drop BDS backpointer iotests: Add wait functionality to _cleanup_qemu iotests: Add test for drive-mirror with NBD target block/nbd-client.c | 95 ++ block/nbd-client.h | 20 - block/nbd.c| 37 +++- tests/qemu-iotests/094 | 81 +++ tests/qemu-iotests/094.out | 11 + tests/qemu-iotests/common.qemu | 12 +- tests/qemu-iotests/group | 1 + 7 files changed, 177 insertions(+), 80 deletions(-) create mode 100755 tests/qemu-iotests/094 create mode 100644 tests/qemu-iotests/094.out -- 2.1.0
[Qemu-devel] [PATCH 1/3] nbd: Drop BDS backpointer
Before this patch, the opaque pointer in an NBD BDS points to a BDRVNBDState, which contains an NbdClientSession object, which in turn contains a pointer to the BDS. This pointer may become invalid due to bdrv_swap(), so drop it, and instead pass the BDS directly to the nbd-client.c functions which then retrieve the NbdClientSession object from there. Signed-off-by: Max Reitz mre...@redhat.com --- block/nbd-client.c | 95 -- block/nbd-client.h | 20 ++-- block/nbd.c| 37 - 3 files changed, 73 insertions(+), 79 deletions(-) diff --git a/block/nbd-client.c b/block/nbd-client.c index 28bfb62..4ede714 100644 --- a/block/nbd-client.c +++ b/block/nbd-client.c @@ -43,20 +43,23 @@ static void nbd_recv_coroutines_enter_all(NbdClientSession *s) } } -static void nbd_teardown_connection(NbdClientSession *client) +static void nbd_teardown_connection(BlockDriverState *bs) { +NbdClientSession *client = nbd_get_client_session(bs); + /* finish any pending coroutines */ shutdown(client-sock, 2); nbd_recv_coroutines_enter_all(client); -nbd_client_session_detach_aio_context(client); +nbd_client_session_detach_aio_context(bs); closesocket(client-sock); client-sock = -1; } static void nbd_reply_ready(void *opaque) { -NbdClientSession *s = opaque; +BlockDriverState *bs = opaque; +NbdClientSession *s = nbd_get_client_session(bs); uint64_t i; int ret; @@ -89,28 +92,29 @@ static void nbd_reply_ready(void *opaque) } fail: -nbd_teardown_connection(s); +nbd_teardown_connection(bs); } static void nbd_restart_write(void *opaque) { -NbdClientSession *s = opaque; +BlockDriverState *bs = opaque; -qemu_coroutine_enter(s-send_coroutine, NULL); +qemu_coroutine_enter(nbd_get_client_session(bs)-send_coroutine, NULL); } -static int nbd_co_send_request(NbdClientSession *s, -struct nbd_request *request, -QEMUIOVector *qiov, int offset) +static int nbd_co_send_request(BlockDriverState *bs, + struct nbd_request *request, + QEMUIOVector *qiov, int offset) { +NbdClientSession *s = nbd_get_client_session(bs); AioContext *aio_context; int rc, ret; qemu_co_mutex_lock(s-send_mutex); s-send_coroutine = qemu_coroutine_self(); -aio_context = bdrv_get_aio_context(s-bs); +aio_context = bdrv_get_aio_context(bs); aio_set_fd_handler(aio_context, s-sock, - nbd_reply_ready, nbd_restart_write, s); + nbd_reply_ready, nbd_restart_write, bs); if (qiov) { if (!s-is_unix) { socket_set_cork(s-sock, 1); @@ -129,7 +133,7 @@ static int nbd_co_send_request(NbdClientSession *s, } else { rc = nbd_send_request(s-sock, request); } -aio_set_fd_handler(aio_context, s-sock, nbd_reply_ready, NULL, s); +aio_set_fd_handler(aio_context, s-sock, nbd_reply_ready, NULL, bs); s-send_coroutine = NULL; qemu_co_mutex_unlock(s-send_mutex); return rc; @@ -195,10 +199,11 @@ static void nbd_coroutine_end(NbdClientSession *s, } } -static int nbd_co_readv_1(NbdClientSession *client, int64_t sector_num, +static int nbd_co_readv_1(BlockDriverState *bs, int64_t sector_num, int nb_sectors, QEMUIOVector *qiov, int offset) { +NbdClientSession *client = nbd_get_client_session(bs); struct nbd_request request = { .type = NBD_CMD_READ }; struct nbd_reply reply; ssize_t ret; @@ -207,7 +212,7 @@ static int nbd_co_readv_1(NbdClientSession *client, int64_t sector_num, request.len = nb_sectors * 512; nbd_coroutine_start(client, request); -ret = nbd_co_send_request(client, request, NULL, 0); +ret = nbd_co_send_request(bs, request, NULL, 0); if (ret 0) { reply.error = -ret; } else { @@ -218,15 +223,16 @@ static int nbd_co_readv_1(NbdClientSession *client, int64_t sector_num, } -static int nbd_co_writev_1(NbdClientSession *client, int64_t sector_num, +static int nbd_co_writev_1(BlockDriverState *bs, int64_t sector_num, int nb_sectors, QEMUIOVector *qiov, int offset) { +NbdClientSession *client = nbd_get_client_session(bs); struct nbd_request request = { .type = NBD_CMD_WRITE }; struct nbd_reply reply; ssize_t ret; -if (!bdrv_enable_write_cache(client-bs) +if (!bdrv_enable_write_cache(bs) (client-nbdflags NBD_FLAG_SEND_FUA)) { request.type |= NBD_CMD_FLAG_FUA; } @@ -235,7 +241,7 @@ static int nbd_co_writev_1(NbdClientSession *client, int64_t sector_num, request.len = nb_sectors * 512; nbd_coroutine_start(client, request); -ret = nbd_co_send_request(client, request, qiov, offset); +ret = nbd_co_send_request(bs, request, qiov, offset);
Re: [Qemu-devel] [PATCH v3 10/14] qemu-io: Remove growable option
On 2015-02-02 at 14:36, Kevin Wolf wrote: Am 27.01.2015 um 18:11 hat Max Reitz geschrieben: On 2015-01-27 at 12:10, Eric Blake wrote: On 01/27/2015 10:04 AM, Max Reitz wrote: On 2015-01-27 at 11:59, Eric Blake wrote: On 01/26/2015 08:00 AM, Max Reitz wrote: Remove growable option from the open command and from the qemu-io command line. qemu-io is about to be converted to BlockBackend which will make sure that no request exceeds the image size, so the only way to keep growable would be to use BlockBackend if it is not given and to directly access the BDS if it is. qemu-io is a debugging tool, therefore removing a rarely used option will have only a very small impact, if any. There was only one qemu-iotest which used the option; since it is not critical, this patch just removes it. Signed-off-by: Max Reitz mre...@redhat.com --- Reviewed-by: Eric Blake ebl...@redhat.com Do we want to ever reuse the test number that you are deleting? Good question, I think I have talked about that with Kevin before. It would not hurt too much if we were to accidentally reuse the test case number, most certainly not here in upstream. However, for all downstream versions of qemu, this might make adding the new test 16 difficult; but certainly not impossible (if someone is affected by this issue, he/she can just use 999 or something). So we may want to keep in mind not to reuse number 16, but if someone does, so be it. Is it worth a placeholder file that has a comment mentioning that the test number is intentionally reserved (and if someone attempts to run, always passes)? Seems good to me. It's a minor effort now and may avert some hassle later. How about just keeping a comment line in group? Oh, that would be too simple. Or maybe just simple enough for me; yes, that seems nicer indeed. Max
Re: [Qemu-devel] [RFC v2 2/8] vfio-pci: add aer capability support
On Wed, 2015-01-28 at 16:37 +0800, Chen Fan wrote: if we detect the aer capability in vfio device, then we should initialize the vfio device aer rigister bits. so guest OS can set this bits as needed. s/rigister/register/ Signed-off-by: Chen Fan chen.fan.f...@cn.fujitsu.com --- hw/vfio/pci.c | 72 +++ 1 file changed, 72 insertions(+) diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index 014a92c..2072261 100644 --- a/hw/vfio/pci.c +++ b/hw/vfio/pci.c @@ -2670,6 +2670,73 @@ static int vfio_add_capabilities(VFIOPCIDevice *vdev) return vfio_add_std_cap(vdev, pdev-config[PCI_CAPABILITY_LIST]); } +static void vfio_pci_aer_init(VFIOPCIDevice *vdev) +{ +PCIDevice *pdev = vdev-pdev; +PCIExpressDevice *exp = pdev-exp; +uint16_t offset = exp-aer_cap; + +if (!offset) { +return; +} + All of these need to be documented with comments. +memset(pdev-wmask + offset, 0, PCI_ERR_SIZEOF); +memset(pdev-w1cmask + offset, 0, PCI_ERR_SIZEOF); +memset(pdev-cmask + offset, 0xFF, PCI_ERR_SIZEOF); + +pci_long_test_and_set_mask(pdev-wmask + exp-exp_cap + PCI_EXP_DEVCTL, + PCI_EXP_DEVCTL_CERE | PCI_EXP_DEVCTL_NFERE | + PCI_EXP_DEVCTL_FERE | PCI_EXP_DEVCTL_URRE); +pci_long_test_and_set_mask(pdev-w1cmask + exp-exp_cap + PCI_EXP_DEVSTA, + PCI_EXP_DEVSTA_CED | PCI_EXP_DEVSTA_NFED | + PCI_EXP_DEVSTA_FED | PCI_EXP_DEVSTA_URD); + +pci_set_long(pdev-w1cmask + offset + PCI_ERR_UNCOR_STATUS, + PCI_ERR_UNC_SUPPORTED); +pci_set_long(pdev-wmask + offset + PCI_ERR_UNCOR_SEVER, + PCI_ERR_UNC_SUPPORTED); +pci_long_test_and_set_mask(pdev-w1cmask + offset + PCI_ERR_COR_STATUS, + PCI_ERR_COR_STATUS); +pci_set_long(pdev-wmask + offset + PCI_ERR_COR_MASK, + PCI_ERR_COR_SUPPORTED); + +pci_set_long(pdev-wmask + offset + PCI_ERR_CAP, + PCI_ERR_CAP_ECRC_GENE | PCI_ERR_CAP_ECRC_CHKE | + PCI_ERR_CAP_MHRE); +} + +static int vfio_add_ext_capabilities(VFIOPCIDevice *vdev) +{ +PCIDevice *pdev = vdev-pdev; +PCIExpressDevice *exp; +uint32_t header; +uint16_t next = PCI_CONFIG_SPACE_SIZE; + +if (pci_config_size(pdev) = PCI_CONFIG_SPACE_SIZE) { +return 0; +} + +header = pci_get_long(pdev-config + next); +while (header) { +switch (PCI_EXT_CAP_ID(header)) { +case PCI_EXT_CAP_ID_ERR: + exp = pdev-exp; + exp-aer_cap = next; Shouldn't we call pcie_aer_init() here? + + vfio_pci_aer_init(vdev); + break; +}; + +next = PCI_EXT_CAP_NEXT(header); +if (!next) { +return 0; +} +header = pci_get_long(pdev-config + next); I'd like to see this look more like vfio_add_std_cap(), registering every capability with the QEMU PCIe-core and setting up emulation to allow QEMU to skip capabilities that it doesn't want to expose. +} + +return 0; +} + static void vfio_pci_pre_reset(VFIOPCIDevice *vdev) { PCIDevice *pdev = vdev-pdev; @@ -3296,6 +3363,11 @@ static int vfio_initfn(PCIDevice *pdev) goto out_teardown; } +ret = vfio_add_ext_capabilities(vdev); +if (ret) { +goto out_teardown; +} + Why not extend vfio_add_capabilities()? It specifically calls vfio_add_std_cap() in order that there could be a vfio_add_ext_cap(). /* QEMU emulates all of MSI MSIX */ if (pdev-cap_present QEMU_PCI_CAP_MSIX) { memset(vdev-emulated_config_bits + pdev-msix_cap, 0xff,
Re: [Qemu-devel] [RFC v2 7/8] vfio_pci: change vfio device features bit macro to enum definition
On Wed, 2015-01-28 at 16:37 +0800, Chen Fan wrote: The why should go here Signed-off-by: Chen Fan chen.fan.f...@cn.fujitsu.com --- hw/vfio/pci.c | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index 54eb6b4..65247ee 100644 --- a/hw/vfio/pci.c +++ b/hw/vfio/pci.c @@ -134,6 +134,12 @@ typedef struct VFIOMSIXInfo { void *mmap; } VFIOMSIXInfo; +/* Bits in VFIOPCIDevice features field. */ +enum { +#define VFIO_FEATURE_ENABLE_VGA_BIT 0 +VFIO_FEATURE_ENABLE_VGA = (1 VFIO_FEATURE_ENABLE_VGA_BIT), +}; Rather strange to use a bitmap within an enum, isn't it? + typedef struct VFIOPCIDevice { PCIDevice pdev; VFIODevice vbasedev; @@ -154,8 +160,6 @@ typedef struct VFIOPCIDevice { PCIHostDeviceAddress host; EventNotifier err_notifier; uint32_t features; -#define VFIO_FEATURE_ENABLE_VGA_BIT 0 -#define VFIO_FEATURE_ENABLE_VGA (1 VFIO_FEATURE_ENABLE_VGA_BIT) int32_t bootindex; uint8_t pm_cap; bool has_vga;
Re: [Qemu-devel] [PATCH 04/19] libqos/ahci: Add command header helpers
On 02/02/2015 05:27 AM, Paolo Bonzini wrote: On 30/01/2015 19:41, John Snow wrote: +/* Set the #cx'th command of port #px. */ +void ahci_set_command_header(AHCIQState *ahci, uint8_t px, + uint8_t cx, AHCICommandHeader *cmd) +{ +uint64_t ba = ahci-port[px].clb; +ba += cx * sizeof(AHCICommandHeader); + +cmd-flags = cpu_to_le16(cmd-flags); +cmd-prdtl = cpu_to_le16(cmd-prdtl); +cmd-prdbc = cpu_to_le32(cmd-prdbc); +cmd-ctba = cpu_to_le64(cmd-ctba); Modifying cmd is uglyish, and especially it might hide bugs that only happen on big endian system. Please copy to a local AHCICommandHeader variable before adjusting for host endianness. Paolo +memwrite(ba, cmd, sizeof(AHCICommandHeader)); +} You're right. I had just never actually thought to reference the command after committing it before, so I hadn't considered it in that light. If I intend to share goodies with other tests it would be nice if they didn't do secretly evil things :)
Re: [Qemu-devel] [PATCH v3 09/14] qemu-io: Use blk_new_open() in openfile()
Am 26.01.2015 um 16:00 hat Max Reitz geschrieben: Signed-off-by: Max Reitz mre...@redhat.com --- qemu-io.c | 31 --- 1 file changed, 12 insertions(+), 19 deletions(-) diff --git a/qemu-io.c b/qemu-io.c index 91a445a..81f8f64 100644 --- a/qemu-io.c +++ b/qemu-io.c @@ -39,7 +39,6 @@ static ReadLineState *readline_state; static int close_f(BlockDriverState *bs, int argc, char **argv) { blk_unref(qemuio_blk); -qemuio_bs = NULL; qemuio_blk = NULL; return 0; } This doesn't seem to be correct at this point in the series. qemuio_bs is still passed to qemuio_command(), which checks whether non-global commands are allowed to run using init_check_command(). At the end of the series, please remove qemuio_bs altogether. This version of the series leaves it around without any reader. Kevin
[Qemu-devel] [PATCH 1/1] Tracify migration/rdma.c
From: Dr. David Alan Gilbert dgilb...@redhat.com Turn all the D/DD/DDDPRINTFs into trace events Turn most of the fprintf(stderr, into error_report Signed-off-by: Dr. David Alan Gilbert dgilb...@redhat.com --- migration/rdma.c | 333 --- trace-events | 62 +++ 2 files changed, 206 insertions(+), 189 deletions(-) diff --git a/migration/rdma.c b/migration/rdma.c index b32dbdf..fc351ea 100644 --- a/migration/rdma.c +++ b/migration/rdma.c @@ -26,34 +26,7 @@ #include arpa/inet.h #include string.h #include rdma/rdma_cma.h - -//#define DEBUG_RDMA -//#define DEBUG_RDMA_VERBOSE -//#define DEBUG_RDMA_REALLY_VERBOSE - -#ifdef DEBUG_RDMA -#define DPRINTF(fmt, ...) \ -do { printf(rdma: fmt, ## __VA_ARGS__); } while (0) -#else -#define DPRINTF(fmt, ...) \ -do { } while (0) -#endif - -#ifdef DEBUG_RDMA_VERBOSE -#define DDPRINTF(fmt, ...) \ -do { printf(rdma: fmt, ## __VA_ARGS__); } while (0) -#else -#define DDPRINTF(fmt, ...) \ -do { } while (0) -#endif - -#ifdef DEBUG_RDMA_REALLY_VERBOSE -#define DDDPRINTF(fmt, ...) \ -do { printf(rdma: fmt, ## __VA_ARGS__); } while (0) -#else -#define DDDPRINTF(fmt, ...) \ -do { } while (0) -#endif +#include trace.h /* * Print and error on both the Monitor and the Log file. @@ -104,8 +77,8 @@ static uint32_t known_capabilities = RDMA_CAPABILITY_PIN_ALL; do { \ if (rdma-error_state) { \ if (!rdma-error_reported) { \ -fprintf(stderr, RDMA is in an error state waiting migration \ - to abort!\n); \ +error_report(RDMA is in an error state waiting migration \ + to abort!); \ rdma-error_reported = 1; \ } \ return rdma-error_state; \ @@ -578,12 +551,13 @@ static int __qemu_rdma_add_block(RDMAContext *rdma, void *host_addr, g_hash_table_insert(rdma-blockmap, (void *) block_offset, block); -DDPRINTF(Added Block: %d, addr: % PRIu64 , offset: % PRIu64 -length: % PRIu64 end: % PRIu64 bits % PRIu64 chunks %d\n, -local-nb_blocks, (uint64_t) block-local_host_addr, block-offset, -block-length, (uint64_t) (block-local_host_addr + block-length), -BITS_TO_LONGS(block-nb_chunks) * -sizeof(unsigned long) * 8, block-nb_chunks); +trace___qemu_rdma_add_block(local-nb_blocks, + (uint64_t) block-local_host_addr, block-offset, + block-length, + (uint64_t) (block-local_host_addr + block-length), + BITS_TO_LONGS(block-nb_chunks) * + sizeof(unsigned long) * 8, + block-nb_chunks); local-nb_blocks++; @@ -614,7 +588,7 @@ static int qemu_rdma_init_ram_blocks(RDMAContext *rdma) rdma-blockmap = g_hash_table_new(g_direct_hash, g_direct_equal); memset(local, 0, sizeof *local); qemu_ram_foreach_block(qemu_rdma_init_one_block, rdma); -DPRINTF(Allocated %d local ram block structures\n, local-nb_blocks); +trace_qemu_rdma_init_ram_blocks(local-nb_blocks); rdma-block = (RDMARemoteBlock *) g_malloc0(sizeof(RDMARemoteBlock) * rdma-local_ram_blocks.nb_blocks); local-init = true; @@ -683,12 +657,12 @@ static int __qemu_rdma_delete_block(RDMAContext *rdma, ram_addr_t block_offset) local-block = NULL; } -DDPRINTF(Deleted Block: %d, addr: % PRIu64 , offset: % PRIu64 -length: % PRIu64 end: % PRIu64 bits % PRIu64 chunks %d\n, -local-nb_blocks, (uint64_t) block-local_host_addr, block-offset, -block-length, (uint64_t) (block-local_host_addr + block-length), -BITS_TO_LONGS(block-nb_chunks) * -sizeof(unsigned long) * 8, block-nb_chunks); +trace___qemu_rdma_delete_block(local-nb_blocks, + (uint64_t)block-local_host_addr, + block-offset, block-length, + (uint64_t)(block-local_host_addr + block-length), + BITS_TO_LONGS(block-nb_chunks) * + sizeof(unsigned long) * 8, block-nb_chunks); g_free(old); @@ -713,7 +687,7 @@ static void qemu_rdma_dump_id(const char *who, struct ibv_context *verbs) struct ibv_port_attr port; if (ibv_query_port(verbs, 1, port)) { -fprintf(stderr, FAILED TO QUERY PORT INFORMATION!\n); +error_report(Failed to query port information); return; } @@ -744,7 +718,7 @@ static void qemu_rdma_dump_gid(const char *who, struct rdma_cm_id *id) char dgid[33]; inet_ntop(AF_INET6, id-route.addr.addr.ibaddr.sgid, sgid, sizeof sgid); inet_ntop(AF_INET6, id-route.addr.addr.ibaddr.dgid, dgid, sizeof dgid); -DPRINTF(%s Source GID: %s, Dest GID: %s\n, who, sgid,
Re: [Qemu-devel] [PATCH 1/2] glusterfs: fix max_discard
On 02/02/15 22:58, Peter Lieven wrote: Am 02.02.2015 um 19:29 schrieb Denis V. Lunev: qemu_gluster_co_discard calculates size to discard as follows size_t size = nb_sectors * BDRV_SECTOR_SIZE; ret = glfs_discard_async(s-fd, offset, size, gluster_finish_aiocb, acb); glfs_discard_async is declared as follows: int glfs_discard_async (glfs_fd_t *fd, off_t length, size_t lent, glfs_io_cbk fn, void *data) __THROW This is problematic on i686 as sizeof(size_t) == 4. Set bl_max_discard to SIZE_MAX BDRV_SECTOR_BITS to avoid overflow on i386. Signed-off-by: Denis V. Lunev d...@openvz.org CC: Kevin Wolf kw...@redhat.com CC: Peter Lieven p...@kamp.de --- block/gluster.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/block/gluster.c b/block/gluster.c index 1eb3a8c..47bf92d 100644 --- a/block/gluster.c +++ b/block/gluster.c @@ -622,6 +622,13 @@ out: return ret; } +static void qemu_gluster_refresh_limits(BlockDriverState *bs, Error **errp) +{ +#if SIZE_MAX == UINT_MAX +bs-bl.max_discard = SIZE_MAX BDRV_SECTOR_BITS; +#endif I would write: bs-bl.max_discard = MIN(SIZE_MAX BDRV_SECTOR_BITS, INT_MAX); without the condition. Peter ok, respinned Den
[Qemu-devel] [PATCH v3 0/2] fix max_discard for NBD/gluster block drivers
Changes from v2: - dropped conditional in patch gluster code Changes from v1: - switched to UINT32_MAX for NBD - limited max_discard for gluster to SIZE_MAX on i686 only Signed-off-by: Denis V. Lunev d...@openvz.org CC: Kevin Wolf kw...@redhat.com CC: Peter Lieven p...@kamp.de
Re: [Qemu-devel] [PATCH v11 03/13] qmp: Add block-dirty-bitmap-add and block-dirty-bitmap-remove
On 02/02/2015 05:10 AM, Markus Armbruster wrote: Kevin Wolf kw...@redhat.com writes: Am 30.01.2015 um 18:04 hat John Snow geschrieben: On 01/30/2015 09:32 AM, Kevin Wolf wrote: Am 21.01.2015 um 10:34 hat Markus Armbruster geschrieben: I'm afraid I forgot much of the discussion we had before the break, and only now it's coming back, slowly. Quoting myself on naming parameters identifying nodes[*]: John Snow pointed out to me that we still haven't spelled out how this single parameter should be named. On obvious option is calling it node-name, or FOO-node-name when we have several. However, we'd then have two subtly different kinds of parameters called like that: the old ones accept *only* node names, the new ones also accept backend names, which automatically resolve to the backend's root node. Three ways to cope with that: * Find a better name. * Make the old ones accept backend names, too. Only a few, not that much work. However, there are exceptions: - blockdev-add's node-name *defines* the node name. - query-named-block-nodes's node-name *is* the node's name. * Stop worrying and embrace the inconsistency. The affected commands are headed for deprecation anyway. I think I'd go with node or FOO-node for parameters that reference nodes and accept both node names and backend names, and refrain from touching the existing node-name parameters. Wasn't the conclusion last time that we would try to find a better name for new commands and leave old commands alone because they are going to become deprecated? That is, a combination of your first and last option? That was my impression, too: Use a new name for new commands and then slowly phase out the old things. This makes the new name clear as to what it supports (BOTH backends and nodes through a common namespace) to external management utilities like libvirt. That's why I just rolled 'node-ref.' Yes, I agree with that in principle. I called it 'node' below because that's shorter and doesn't include type information in the name, but if everyone preferred 'node-ref', I wouldn't have a problem with it either. Let's go through existing uses of @node-name again: 1. Define a node name QMP commands blockdev-add (type BlockdevOptionsBase), drive-mirror 2. Report a node name QMP command query-named-block-nodes (type BlockDeviceInfo) Whatever name we end up using, 1. and 2. should probably use the same. Should they? If these commands accept directly *node* names and have no chance of referencing a backend, maybe they should use different parameter names. Maybe. But even if we use different names for things that can only be node names, never backend names, 1. and 2. should use the same name, because they're both things that can only be node names. That's what Kevin said. Note that these cases don't refer to a node, but they define/return the name of a node. No way that could ever be a backend name, unless we broke the code. 3. Node reference with backend names permitted for convenience New QMP command block-dirty-bitmap-add (type BlockDirtyBitmapAdd) and others 4. Node reference with backend names not permitted QMP commands drive-mirror @replaces, change-backing-file @image-node-name We may want to support the backend name resolves to root node convenience feature here, for consistency. Then this moves under 3. Note interface wart: change-backing-file additionally requires the backend owning the node. We need the backend to set op-blockers, we can't easily find it from the node, so we make the user point it out to us. These shouldn't be existing. As you say, we should move them to 3. Technically #3 here isn't a usage of node-name, because... I didn't use node-name for these commands. Unless I am reading this list wrong, but it's definitely not an existing use. I don't have any opinions about #4; presumably that's something we're aiming to phase out. Yes. Where phasing out simply means to extend it so that it accepts backend names. That should in theory be an easy change, except that @image-node-name has a name that isn't quite compatible with it... Our choice for 3. affects the phasing out of 4. Our choice for 3 is a naming convention for parameters referencing nodes that accept both node and backend names. If, after extending the code to accept backend names, the old names for 4. follow the naming convention for 3., we're done. Else, we still have an interface wart. We can live with it, or we can rename the parameter to follow the convention. 5. Pair of names node reference, specify exactly one QMP commands block_passwd, block_resize, blockdev-snapshot-sync We can ignore this one, because we intend to replace the commands and deprecate the old ones. Agreed, these shouldn't be existing either. If I understand you correctly, you're
Re: [Qemu-devel] [PATCH v3 04/14] block/xen: Use blk_new_open() in blk_connect()
On 2015-02-02 at 13:27, Kevin Wolf wrote: Am 26.01.2015 um 16:00 hat Max Reitz geschrieben: Signed-off-by: Max Reitz mre...@redhat.com --- hw/block/xen_disk.c | 28 1 file changed, 12 insertions(+), 16 deletions(-) diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c index 21842a0..1b0257c 100644 --- a/hw/block/xen_disk.c +++ b/hw/block/xen_disk.c @@ -40,6 +40,8 @@ #include xen_blkif.h #include sysemu/blockdev.h #include sysemu/block-backend.h +#include qapi/qmp/qdict.h +#include qapi/qmp/qstring.h /* - */ @@ -897,30 +899,24 @@ static int blk_connect(struct XenDevice *xendev) blkdev-dinfo = drive_get(IF_XEN, 0, index); if (!blkdev-dinfo) { Error *local_err = NULL; -BlockBackend *blk; -BlockDriver *drv; -BlockDriverState *bs; +QDict *options = NULL; -/* setup via xenbus - create new block driver instance */ -xen_be_printf(blkdev-xendev, 2, create new bdrv (xenbus setup)\n); -blk = blk_new_with_bs(blkdev-dev, NULL); -if (!blk) { -return -1; +if (strcmp(blkdev-fileproto, unset)) { xen_disk's usage of the string unset to mark configurations where no driver is specific is quite ugly. I think it's possible for users to pass this string as the driver name, and we would end up probing the driver instead of returning an error. Which was actually how any invalid driver name was handled before this patch: bdrv_find_whitelisted_format() would return NULL, and instead of erroring out, NULL would be passed to bdrv_open(), which probed the driver then. This patch improves the situation: Now any value that is not unset is passed as the driver option, so invalid drivers will produce an error now. Should be mentioned in the commit message. However, if the user passes unset, that still means probing. Ugly. Not sure why blkdev-fileproto == NULL wasn't used to specify that no driver was given. We could change that in a patch before this one if we wanted to clean it up. Or we could just feel reassured that xen_disk is horrible code, that this patch already fixes most of it and leave it alone. I was glad enough that I made it out alive after digging just a bit into the code. I'm fully in favor of leaving this as-is and adding a note to the commit message. Thank you for reviewing! Max +options = qdict_new(); +qdict_put_obj(options, driver, + QOBJECT(qstring_from_str(blkdev-fileproto))); } -blkdev-blk = blk; -bs = blk_bs(blk); -drv = bdrv_find_whitelisted_format(blkdev-fileproto, readonly); -if (bdrv_open(bs, blkdev-filename, NULL, NULL, qflags, - drv, local_err) != 0) { +/* setup via xenbus - create new block driver instance */ +xen_be_printf(blkdev-xendev, 2, create new bdrv (xenbus setup)\n); +blkdev-blk = blk_new_open(blkdev-dev, blkdev-filename, NULL, options, + qflags, local_err); +if (!blkdev-blk) { xen_be_printf(blkdev-xendev, 0, error: %s\n, error_get_pretty(local_err)); error_free(local_err); -blk_unref(blk); -blkdev-blk = NULL; return -1; } -assert(bs == blk_bs(blk)); } else { /* setup via qemu cmdline - already setup for us */ xen_be_printf(blkdev-xendev, 2, get configured bdrv (cmdline setup)\n); Kevin
Re: [Qemu-devel] [PATCH v3 05/14] qemu-img: Use blk_new_open() in img_open()
On 2015-02-02 at 13:35, Kevin Wolf wrote: Am 26.01.2015 um 16:00 hat Max Reitz geschrieben: Signed-off-by: Max Reitz mre...@redhat.com --- qemu-img.c | 20 ++-- 1 file changed, 6 insertions(+), 14 deletions(-) diff --git a/qemu-img.c b/qemu-img.c index 4e9a7f5..be1953d 100644 --- a/qemu-img.c +++ b/qemu-img.c @@ -291,32 +291,24 @@ static BlockBackend *img_open(const char *id, const char *filename, { BlockBackend *blk; BlockDriverState *bs; -BlockDriver *drv; char password[256]; Error *local_err = NULL; -int ret; - -blk = blk_new_with_bs(id, error_abort); -bs = blk_bs(blk); +QDict *options = NULL; if (fmt) { -drv = bdrv_find_format(fmt); -if (!drv) { -error_report(Unknown file format '%s', fmt); -goto fail; -} -} else { -drv = NULL; +options = qdict_new(); +qdict_put_obj(options, driver, QOBJECT(qstring_from_str(fmt))); I'm only noticing it here, though you did the same in the previous patches. This can be written shorter as: qdict_put(options, driver, qstring_from_str(fmt)); Indeed, will do. Max } -ret = bdrv_open(bs, filename, NULL, NULL, flags, drv, local_err); -if (ret 0) { +blk = blk_new_open(id, filename, NULL, options, flags, local_err); +if (!blk) { error_report(Could not open '%s': %s, filename, error_get_pretty(local_err)); error_free(local_err); goto fail; } +bs = blk_bs(blk); if (bdrv_is_encrypted(bs) require_io) { qprintf(quiet, Disk image '%s' is encrypted.\n, filename); if (read_password(password, sizeof(password)) 0) { Kevin
Re: [Qemu-devel] [PATCH v3 00/14] block: Remove growable, add blk_new_open()
Am 26.01.2015 um 16:00 hat Max Reitz geschrieben: This series removes the growable field from the BlockDriverState object. Its use was to clamp guest requests against the limits of the BDS; however, this can now be done more easily by moving those checks into the BlockBackend functions. In a future series, growable may be reintroduced (maybe with a different name); it will then signify whether a BDS is able to grow (in contrast to the current growable, which signifies whether it is allowed to). Maybe I will add it to the BlockDriver instead of the BDS, though. To be able to remove that field, qemu-io needs to be converted to BlockBackend, which is done by this series as well. While working on that I decided to convert blk_new_with_bs()+bdrv_open() to blk_new_open(). I was skeptical about that decision at first, but it seems good now that I was able to replace nearly every blk_new_with_bs() call by blk_new_open(). In a future series I may try to convert some remaining bdrv_open() calls to blk_new_open() as well. (And, in fact, in a future series I *will* replace the last remaining blk_new_with_bs() outside of blk_new_open() by blk_new_open().) Finally, the question needs to be asked: If, after this series, every BDS is allowed to grow, are there any users which do not use BB, but should still be disallowed from reading/writing beyond a BDS's limits? The only users I could see were the block jobs. Some of them should indeed be converted to BB; but none of them takes a user-supplied offset or size, all work on the full BDS (or only on parts which have been modified, etc.). Therefore, it is by design impossible for them to exceed the BDS's limits, which makes making all BDS's growable safe. Patches that I didn't comment on (all except 4, 5, 6, 9, 10, 13) are, assuming that you address Eric's comments, if any: Reviewed-by: Kevin Wolf kw...@redhat.com
Re: [Qemu-devel] [PATCH v3 09/14] qemu-io: Use blk_new_open() in openfile()
On 2015-02-02 at 14:34, Kevin Wolf wrote: Am 26.01.2015 um 16:00 hat Max Reitz geschrieben: Signed-off-by: Max Reitz mre...@redhat.com --- qemu-io.c | 31 --- 1 file changed, 12 insertions(+), 19 deletions(-) diff --git a/qemu-io.c b/qemu-io.c index 91a445a..81f8f64 100644 --- a/qemu-io.c +++ b/qemu-io.c @@ -39,7 +39,6 @@ static ReadLineState *readline_state; static int close_f(BlockDriverState *bs, int argc, char **argv) { blk_unref(qemuio_blk); -qemuio_bs = NULL; qemuio_blk = NULL; return 0; } This doesn't seem to be correct at this point in the series. qemuio_bs is still passed to qemuio_command(), which checks whether non-global commands are allowed to run using init_check_command(). $ ./qemu-io -c close -c 'read 0 64k' test.qcow2 read failed: No medium found Yep, will fix. Thanks for catching it! At the end of the series, please remove qemuio_bs altogether. This version of the series leaves it around without any reader. Okay, will do. Max
Re: [Qemu-devel] [PATCH 1/2] glusterfs: fix max_discard
Am 02.02.2015 um 19:29 schrieb Denis V. Lunev: qemu_gluster_co_discard calculates size to discard as follows size_t size = nb_sectors * BDRV_SECTOR_SIZE; ret = glfs_discard_async(s-fd, offset, size, gluster_finish_aiocb, acb); glfs_discard_async is declared as follows: int glfs_discard_async (glfs_fd_t *fd, off_t length, size_t lent, glfs_io_cbk fn, void *data) __THROW This is problematic on i686 as sizeof(size_t) == 4. Set bl_max_discard to SIZE_MAX BDRV_SECTOR_BITS to avoid overflow on i386. Signed-off-by: Denis V. Lunev d...@openvz.org CC: Kevin Wolf kw...@redhat.com CC: Peter Lieven p...@kamp.de --- block/gluster.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/block/gluster.c b/block/gluster.c index 1eb3a8c..47bf92d 100644 --- a/block/gluster.c +++ b/block/gluster.c @@ -622,6 +622,13 @@ out: return ret; } +static void qemu_gluster_refresh_limits(BlockDriverState *bs, Error **errp) +{ +#if SIZE_MAX == UINT_MAX +bs-bl.max_discard = SIZE_MAX BDRV_SECTOR_BITS; +#endif I would write: bs-bl.max_discard = MIN(SIZE_MAX BDRV_SECTOR_BITS, INT_MAX); without the condition. Peter +} + #ifdef CONFIG_GLUSTERFS_DISCARD static coroutine_fn int qemu_gluster_co_discard(BlockDriverState *bs, int64_t sector_num, int nb_sectors) @@ -735,6 +742,7 @@ static BlockDriver bdrv_gluster = { #ifdef CONFIG_GLUSTERFS_DISCARD .bdrv_co_discard = qemu_gluster_co_discard, #endif +.bdrv_refresh_limits = qemu_gluster_refresh_limits, #ifdef CONFIG_GLUSTERFS_ZEROFILL .bdrv_co_write_zeroes = qemu_gluster_co_write_zeroes, #endif @@ -762,6 +770,7 @@ static BlockDriver bdrv_gluster_tcp = { #ifdef CONFIG_GLUSTERFS_DISCARD .bdrv_co_discard = qemu_gluster_co_discard, #endif +.bdrv_refresh_limits = qemu_gluster_refresh_limits, #ifdef CONFIG_GLUSTERFS_ZEROFILL .bdrv_co_write_zeroes = qemu_gluster_co_write_zeroes, #endif @@ -789,6 +798,7 @@ static BlockDriver bdrv_gluster_unix = { #ifdef CONFIG_GLUSTERFS_DISCARD .bdrv_co_discard = qemu_gluster_co_discard, #endif +.bdrv_refresh_limits = qemu_gluster_refresh_limits, #ifdef CONFIG_GLUSTERFS_ZEROFILL .bdrv_co_write_zeroes = qemu_gluster_co_write_zeroes, #endif @@ -816,6 +826,7 @@ static BlockDriver bdrv_gluster_rdma = { #ifdef CONFIG_GLUSTERFS_DISCARD .bdrv_co_discard = qemu_gluster_co_discard, #endif +.bdrv_refresh_limits = qemu_gluster_refresh_limits, #ifdef CONFIG_GLUSTERFS_ZEROFILL .bdrv_co_write_zeroes = qemu_gluster_co_write_zeroes, #endif
[Qemu-devel] [PATCH 3/3] softfloat: expand out STATUS macro
Expand out and remove the STATUS macro. Signed-off-by: Peter Maydell peter.mayd...@linaro.org --- fpu/softfloat-specialize.h | 24 ++--- fpu/softfloat.c| 239 ++--- include/fpu/softfloat.h| 30 +++--- 3 files changed, 167 insertions(+), 126 deletions(-) diff --git a/fpu/softfloat-specialize.h b/fpu/softfloat-specialize.h index 21d3ec2..b5b4c67 100644 --- a/fpu/softfloat-specialize.h +++ b/fpu/softfloat-specialize.h @@ -172,7 +172,7 @@ const float128 float128_default_nan void float_raise(int8 flags , float_status *status) { -STATUS(float_exception_flags) |= flags; +status-float_exception_flags |= flags; } /* @@ -275,7 +275,7 @@ static float16 commonNaNToFloat16(commonNaNT a, float_status *status) { uint16_t mantissa = a.high54; -if (STATUS(default_nan_mode)) { +if (status-default_nan_mode) { return float16_default_nan; } @@ -380,7 +380,7 @@ static float32 commonNaNToFloat32(commonNaNT a, float_status *status) { uint32_t mantissa = a.high41; -if ( STATUS(default_nan_mode) ) { +if (status-default_nan_mode) { return float32_default_nan; } @@ -633,7 +633,7 @@ static float32 propagateFloat32NaN(float32 a, float32 b, float_status *status) float_raise(float_flag_invalid, status); } -if ( STATUS(default_nan_mode) ) +if (status-default_nan_mode) return float32_default_nan; if ((uint32_t)(av1) (uint32_t)(bv1)) { @@ -684,7 +684,7 @@ static float32 propagateFloat32MulAddNaN(float32 a, float32 b, bIsQuietNaN, bIsSignalingNaN, cIsQuietNaN, cIsSignalingNaN, infzero, status); -if (STATUS(default_nan_mode)) { +if (status-default_nan_mode) { /* Note that this check is after pickNaNMulAdd so that function * has an opportunity to set the Invalid flag. */ @@ -800,7 +800,7 @@ static float64 commonNaNToFloat64(commonNaNT a, float_status *status) { uint64_t mantissa = a.high12; -if ( STATUS(default_nan_mode) ) { +if (status-default_nan_mode) { return float64_default_nan; } @@ -836,7 +836,7 @@ static float64 propagateFloat64NaN(float64 a, float64 b, float_status *status) float_raise(float_flag_invalid, status); } -if ( STATUS(default_nan_mode) ) +if (status-default_nan_mode) return float64_default_nan; if ((uint64_t)(av1) (uint64_t)(bv1)) { @@ -887,7 +887,7 @@ static float64 propagateFloat64MulAddNaN(float64 a, float64 b, bIsQuietNaN, bIsSignalingNaN, cIsQuietNaN, cIsSignalingNaN, infzero, status); -if (STATUS(default_nan_mode)) { +if (status-default_nan_mode) { /* Note that this check is after pickNaNMulAdd so that function * has an opportunity to set the Invalid flag. */ @@ -1019,7 +1019,7 @@ static floatx80 commonNaNToFloatx80(commonNaNT a, float_status *status) { floatx80 z; -if ( STATUS(default_nan_mode) ) { +if (status-default_nan_mode) { z.low = floatx80_default_nan_low; z.high = floatx80_default_nan_high; return z; @@ -1057,7 +1057,7 @@ static floatx80 propagateFloatx80NaN(floatx80 a, floatx80 b, float_raise(float_flag_invalid, status); } -if ( STATUS(default_nan_mode) ) { +if (status-default_nan_mode) { a.low = floatx80_default_nan_low; a.high = floatx80_default_nan_high; return a; @@ -1176,7 +1176,7 @@ static float128 commonNaNToFloat128(commonNaNT a, float_status *status) { float128 z; -if ( STATUS(default_nan_mode) ) { +if (status-default_nan_mode) { z.low = float128_default_nan_low; z.high = float128_default_nan_high; return z; @@ -1208,7 +1208,7 @@ static float128 propagateFloat128NaN(float128 a, float128 b, float_raise(float_flag_invalid, status); } -if ( STATUS(default_nan_mode) ) { +if (status-default_nan_mode) { a.low = float128_default_nan_low; a.high = float128_default_nan_high; return a; diff --git a/fpu/softfloat.c b/fpu/softfloat.c index f56c618..f1170fe 100644 --- a/fpu/softfloat.c +++ b/fpu/softfloat.c @@ -151,7 +151,7 @@ static int32 roundAndPackInt32(flag zSign, uint64_t absZ, float_status *status) int8 roundIncrement, roundBits; int32_t z; -roundingMode = STATUS(float_rounding_mode); +roundingMode = status-float_rounding_mode; roundNearestEven = ( roundingMode == float_round_nearest_even ); switch (roundingMode) { case float_round_nearest_even: @@ -179,7 +179,9 @@ static int32 roundAndPackInt32(flag zSign, uint64_t absZ, float_status *status) float_raise(float_flag_invalid, status); return zSign ? (int32_t) 0x8000 : 0x7FFF; } -if
Re: [Qemu-devel] [PATCH 1/2] glusterfs: fix max_discard
Am 02.02.2015 um 21:09 schrieb Denis V. Lunev: qemu_gluster_co_discard calculates size to discard as follows size_t size = nb_sectors * BDRV_SECTOR_SIZE; ret = glfs_discard_async(s-fd, offset, size, gluster_finish_aiocb, acb); glfs_discard_async is declared as follows: int glfs_discard_async (glfs_fd_t *fd, off_t length, size_t lent, glfs_io_cbk fn, void *data) __THROW This is problematic on i686 as sizeof(size_t) == 4. Set bl_max_discard to SIZE_MAX BDRV_SECTOR_BITS to avoid overflow on i386. Signed-off-by: Denis V. Lunev d...@openvz.org CC: Kevin Wolf kw...@redhat.com CC: Peter Lieven p...@kamp.de --- block/gluster.c | 9 + 1 file changed, 9 insertions(+) diff --git a/block/gluster.c b/block/gluster.c index 1eb3a8c..8a8c153 100644 --- a/block/gluster.c +++ b/block/gluster.c @@ -622,6 +622,11 @@ out: return ret; } +static void qemu_gluster_refresh_limits(BlockDriverState *bs, Error **errp) +{ +bs-bl.max_discard = MIN(SIZE_MAX BDRV_SECTOR_BITS, INT_MAX); +} + Looking at the gluster code bl.max_transfer_length should have the same limit, but thats a different patch. #ifdef CONFIG_GLUSTERFS_DISCARD static coroutine_fn int qemu_gluster_co_discard(BlockDriverState *bs, int64_t sector_num, int nb_sectors) @@ -735,6 +740,7 @@ static BlockDriver bdrv_gluster = { #ifdef CONFIG_GLUSTERFS_DISCARD .bdrv_co_discard = qemu_gluster_co_discard, #endif +.bdrv_refresh_limits = qemu_gluster_refresh_limits, #ifdef CONFIG_GLUSTERFS_ZEROFILL .bdrv_co_write_zeroes = qemu_gluster_co_write_zeroes, #endif @@ -762,6 +768,7 @@ static BlockDriver bdrv_gluster_tcp = { #ifdef CONFIG_GLUSTERFS_DISCARD .bdrv_co_discard = qemu_gluster_co_discard, #endif +.bdrv_refresh_limits = qemu_gluster_refresh_limits, #ifdef CONFIG_GLUSTERFS_ZEROFILL .bdrv_co_write_zeroes = qemu_gluster_co_write_zeroes, #endif @@ -789,6 +796,7 @@ static BlockDriver bdrv_gluster_unix = { #ifdef CONFIG_GLUSTERFS_DISCARD .bdrv_co_discard = qemu_gluster_co_discard, #endif +.bdrv_refresh_limits = qemu_gluster_refresh_limits, #ifdef CONFIG_GLUSTERFS_ZEROFILL .bdrv_co_write_zeroes = qemu_gluster_co_write_zeroes, #endif @@ -816,6 +824,7 @@ static BlockDriver bdrv_gluster_rdma = { #ifdef CONFIG_GLUSTERFS_DISCARD .bdrv_co_discard = qemu_gluster_co_discard, #endif +.bdrv_refresh_limits = qemu_gluster_refresh_limits, #ifdef CONFIG_GLUSTERFS_ZEROFILL .bdrv_co_write_zeroes = qemu_gluster_co_write_zeroes, #endif Reviewed-by: Peter Lieven p...@kamp.de
Re: [Qemu-devel] [PATCH 13/19] libqos/ahci: add ahci command size setters
On 02/02/2015 05:35 AM, Paolo Bonzini wrote: On 30/01/2015 19:42, John Snow wrote: +void ahci_command_set_sizes(AHCICommand *cmd, uint64_t xbytes, +unsigned prd_size) +{ +/* Each PRD can describe up to 4MiB, and must not be odd. */ +g_assert_cmphex(prd_size, =, 4096 * 1024); +g_assert_cmphex(prd_size 0x01, ==, 0x00); +cmd-prd_size = prd_size; +cmd-xbytes = xbytes; +cmd-fis.count = cpu_to_le16(cmd-xbytes / AHCI_SECTOR_SIZE); Why do you need cpu_to_le16 here, instead of having it in the function that writes the command to guest memory? Paolo +cmd-header.prdtl = size_to_prdtl(cmd-xbytes, cmd-prd_size); +} + In this case, only the command header had a utility written for it to flip the bits for me. This is part of the FIS, instead, which has no explicit flip-on-write mechanism inside of commit. So, it's correct, but not terribly consistent. I can write a fis write helper to make this more internally consistent about when we handle it for the user and when we don't.
Re: [Qemu-devel] [PATCH 0/3] softfloat: Remove STATUS macros
On 02/02/2015 12:31 PM, Peter Maydell wrote: Peter Maydell (3): softfloat: Expand out the STATUS_PARAM macro softfloat: expand out STATUS_VAR softfloat: expand out STATUS macro fpu/softfloat-specialize.h | 124 ++-- fpu/softfloat.c| 1609 include/fpu/softfloat.h| 371 +- target-mips/msa_helper.c | 74 +- 4 files changed, 1188 insertions(+), 990 deletions(-) Reviewed-by: Richard Henderson r...@twiddle.net r~
Re: [Qemu-devel] [PATCH 1/3] softfloat: Expand out the STATUS_PARAM macro
On 02/02/2015 12:31 PM, Peter Maydell wrote: -void float_raise( int8 flags STATUS_PARAM ) +void float_raise(int8 flags , float_status *status) Extra space before comma. r~
Re: [Qemu-devel] [PATCH v2 1/2] configure: Default to enable module build
On 13/01/2015 09:53, Fam Zheng wrote: We have module build support around for a while, but also had it bitrot several times. It probably makes sense to enable it by default so that people can notice and use it. Counterpart to --enable-modules, which is turned as default, --disable-modules is added to suppress it. If both are omitted, the support is guesses as usual. Signed-off-by: Fam Zheng f...@redhat.com I squashed this to unbreak static compilation, but Peter reported that module linking fails on ARM host: LINK block/curl.so /usr/bin/ld: block/curl.o: relocation R_ARM_THM_MOVW_ABS_NC against `__stack_chk_guard' can not be used when making a shared object; recompile with -fPIC block/curl.o: could not read symbols: Bad value collect2: error: ld returned 1 exit status diff --git a/configure b/configure index a9ae57a..4fae00a 100755 --- a/configure +++ b/configure @@ -1536,9 +1536,6 @@ if compile_prog -Werror -fno-gcse ; then fi if test $static = yes ; then - if test $modules = yes ; then -error_exit static and modules are mutually incompatible - fi if test $pie = yes ; then error_exit static and pie are mutually incompatible else Stage this hunk [y,n,q,a,d,/,j,J,g,e,?]? y @@ -2763,6 +2760,14 @@ fi module_try_enable() { force=$1 + if test $static = yes; then +if $force; then + error_exit static and modules are mutually incompatible +else + modules=no + return +fi + fi shacmd_probe=sha1sum sha1 shasum for c in $shacmd_probe; do if has $c; then Is the above ok? Paolo
Re: [Qemu-devel] [PATCH 17/19] qtest/ahci: Add a macro bootup routine
On 02/02/2015 05:37 AM, Paolo Bonzini wrote: On 30/01/2015 19:42, John Snow wrote: +/** + * Boot and fully enable the HBA device. + * @see ahci_boot, ahci_pci_enable and ahci_hba_enable. + */ +static AHCIQState *ahci_macro_bootup(void) Ugly name... I would just leave out this patch. Paolo +{ +AHCIQState *ahci; +ahci = ahci_boot(); + +ahci_pci_enable(ahci); +ahci_hba_enable(ahci); + +return ahci; +} It comes in handy later for testing migration so I don't have to do a lot of boilerplate for each instance, though it is just a convenience subroutine with no logic of its own. I like to cut down on boilerplate as much as possible to expose the logic of the test as much as possible. Have a suggestion for a better name, or are you very adamant about culling it? --js
Re: [Qemu-devel] [PATCH 04/19] libqos/ahci: Add command header helpers
On 02/02/2015 05:25 AM, Paolo Bonzini wrote: On 30/01/2015 19:41, John Snow wrote: +/* Construct our Command Header (set_command_header handles endianness.) */ +memset(cmd, 0x00, sizeof(cmd)); +cmd.flags = 5; /* reg_h2d_fis is 5 double-words long */ +cmd.flags = 0x400; /* clear PxTFD.STS.BSY when done */ And here |= became = ? Paolo +cmd.prdtl = 1; /* One PRD table entry. */ A typo that testing didn't pick up on. (I wonder how?) Thank you.
[Qemu-devel] [PATCH 2/3] iotests: Add wait functionality to _cleanup_qemu
The qemu process does not always need to be killed, just waiting for it can be fine, too. This introduces a way to do so. Signed-off-by: Max Reitz mre...@redhat.com --- tests/qemu-iotests/common.qemu | 12 +++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/tests/qemu-iotests/common.qemu b/tests/qemu-iotests/common.qemu index 8e618b5..4e1996c 100644 --- a/tests/qemu-iotests/common.qemu +++ b/tests/qemu-iotests/common.qemu @@ -187,13 +187,23 @@ function _launch_qemu() # Silenty kills the QEMU process +# +# If $wait is set to anything other than the empty string, the process will not +# be killed but only waited for, and any output will be forwarded to stdout. If +# $wait is empty, the process will be killed and all output will be suppressed. function _cleanup_qemu() { # QEMU_PID[], QEMU_IN[], QEMU_OUT[] all use same indices for i in ${!QEMU_OUT[@]} do -kill -KILL ${QEMU_PID[$i]} 2/dev/null +if [ -z ${wait} ]; then +kill -KILL ${QEMU_PID[$i]} 2/dev/null +fi wait ${QEMU_PID[$i]} 2/dev/null # silent kill +if [ -n ${wait} ]; then +cat ${QEMU_OUT[$i]} | _filter_testdir | _filter_qemu \ + | _filter_qemu_io | _filter_qmp +fi rm -f ${QEMU_FIFO_IN}_${i} ${QEMU_FIFO_OUT}_${i} eval exec ${QEMU_IN[$i]}- # close file descriptors eval exec ${QEMU_OUT[$i]}- -- 2.1.0
[Qemu-devel] [PATCH 3/3] iotests: Add test for drive-mirror with NBD target
When the drive-mirror block job is completed, it will call bdrv_swap() on the source and the target BDS; this should obviously not result in a segmentation fault. Signed-off-by: Max Reitz mre...@redhat.com --- tests/qemu-iotests/094 | 81 ++ tests/qemu-iotests/094.out | 11 +++ tests/qemu-iotests/group | 1 + 3 files changed, 93 insertions(+) create mode 100755 tests/qemu-iotests/094 create mode 100644 tests/qemu-iotests/094.out diff --git a/tests/qemu-iotests/094 b/tests/qemu-iotests/094 new file mode 100755 index 000..27a2be2 --- /dev/null +++ b/tests/qemu-iotests/094 @@ -0,0 +1,81 @@ +#!/bin/bash +# +# Test case for drive-mirror to NBD (especially bdrv_swap() on NBD BDS) +# +# Copyright (C) 2015 Red Hat, Inc. +# +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 2 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program. If not, see http://www.gnu.org/licenses/. +# + +# creator +owner=mre...@redhat.com + +seq=$(basename $0) +echo QA output created by $seq + +here=$PWD +tmp=/tmp/$$ +status=1 # failure is the default! + +trap exit \$status 0 1 2 3 15 + +# get standard environment, filters and checks +. ./common.rc +. ./common.filter +. ./common.qemu + +_supported_fmt generic +_supported_proto nbd +_supported_os Linux +_unsupported_imgopts subformat=monolithicFlat subformat=twoGbMaxExtentFlat + +_make_test_img 64M +$QEMU_IMG create -f $IMGFMT $TEST_DIR/source.$IMGFMT 64M | _filter_img_create + +_launch_qemu -drive if=none,id=src,file=$TEST_DIR/source.$IMGFMT,format=raw \ + -nodefaults + +_send_qemu_cmd $QEMU_HANDLE \ +{'execute': 'qmp_capabilities'} \ +'return' + +# 'format': 'nbd' is not actually correct, but this is probably the only way +# to test bdrv_swap() on an NBD BDS +_send_qemu_cmd $QEMU_HANDLE \ +{'execute': 'drive-mirror', + 'arguments': {'device': 'src', +'target': '$TEST_IMG', +'format': 'nbd', +'sync':'full', +'mode':'existing'}} \ +'BLOCK_JOB_READY' + +_send_qemu_cmd $QEMU_HANDLE \ +{'execute': 'block-job-complete', + 'arguments': {'device': 'src'}} \ +'BLOCK_JOB_COMPLETE' + +_send_qemu_cmd $QEMU_HANDLE \ +{'execute': 'quit'} \ +'return' + +wait=1 _cleanup_qemu + +_cleanup_test_img +rm -f $TEST_DIR/source.$IMGFMT + +# success, all done +echo '*** done' +rm -f $seq.full +status=0 diff --git a/tests/qemu-iotests/094.out b/tests/qemu-iotests/094.out new file mode 100644 index 000..b66dc07 --- /dev/null +++ b/tests/qemu-iotests/094.out @@ -0,0 +1,11 @@ +QA output created by 094 +Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864 +Formatting 'TEST_DIR/source.IMGFMT', fmt=IMGFMT size=67108864 +{return: {}} +{return: {}} +{timestamp: {seconds: TIMESTAMP, microseconds: TIMESTAMP}, event: BLOCK_JOB_READY, data: {device: src, len: 67108864, offset: 67108864, speed: 0, type: mirror}} +{return: {}} +{timestamp: {seconds: TIMESTAMP, microseconds: TIMESTAMP}, event: BLOCK_JOB_COMPLETED, data: {device: src, len: 67108864, offset: 67108864, speed: 0, type: mirror}} +{return: {}} +{timestamp: {seconds: TIMESTAMP, microseconds: TIMESTAMP}, event: SHUTDOWN} +*** done diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group index 4b2b93b..6e2447a 100644 --- a/tests/qemu-iotests/group +++ b/tests/qemu-iotests/group @@ -99,6 +99,7 @@ 090 rw auto quick 091 rw auto 092 rw auto quick +094 rw auto quick 095 rw auto quick 097 rw auto backing 098 rw auto backing quick -- 2.1.0
Re: [Qemu-devel] [PATCH v3 10/14] qemu-io: Remove growable option
Am 27.01.2015 um 18:11 hat Max Reitz geschrieben: On 2015-01-27 at 12:10, Eric Blake wrote: On 01/27/2015 10:04 AM, Max Reitz wrote: On 2015-01-27 at 11:59, Eric Blake wrote: On 01/26/2015 08:00 AM, Max Reitz wrote: Remove growable option from the open command and from the qemu-io command line. qemu-io is about to be converted to BlockBackend which will make sure that no request exceeds the image size, so the only way to keep growable would be to use BlockBackend if it is not given and to directly access the BDS if it is. qemu-io is a debugging tool, therefore removing a rarely used option will have only a very small impact, if any. There was only one qemu-iotest which used the option; since it is not critical, this patch just removes it. Signed-off-by: Max Reitz mre...@redhat.com --- Reviewed-by: Eric Blake ebl...@redhat.com Do we want to ever reuse the test number that you are deleting? Good question, I think I have talked about that with Kevin before. It would not hurt too much if we were to accidentally reuse the test case number, most certainly not here in upstream. However, for all downstream versions of qemu, this might make adding the new test 16 difficult; but certainly not impossible (if someone is affected by this issue, he/she can just use 999 or something). So we may want to keep in mind not to reuse number 16, but if someone does, so be it. Is it worth a placeholder file that has a comment mentioning that the test number is intentionally reserved (and if someone attempts to run, always passes)? Seems good to me. It's a minor effort now and may avert some hassle later. How about just keeping a comment line in group? Kevin
[Qemu-devel] [PATCH 2/2] nbd: fix max_discard
nbd_co_discard calls nbd_client_session_co_discard which uses uint32_t as the length in bytes of the data to discard due to the following definition: struct nbd_request { uint32_t magic; uint32_t type; uint64_t handle; uint64_t from; uint32_t len; -- the length of data to be discarded, in bytes } QEMU_PACKED; Thus we should limit bl_max_discard to UINT32_MAX BDRV_SECTOR_BITS to avoid overflow. Signed-off-by: Denis V. Lunev d...@openvz.org Reviewed-by: Peter Lieven p...@kamp.de CC: Kevin Wolf kw...@redhat.com --- block/nbd.c | 8 1 file changed, 8 insertions(+) diff --git a/block/nbd.c b/block/nbd.c index 04cc845..99af713 100644 --- a/block/nbd.c +++ b/block/nbd.c @@ -301,6 +301,11 @@ static int nbd_co_flush(BlockDriverState *bs) return nbd_client_session_co_flush(s-client); } +static void nbd_refresh_limits(BlockDriverState *bs, Error **errp) +{ +bs-bl.max_discard = UINT32_MAX BDRV_SECTOR_BITS; +} + static int nbd_co_discard(BlockDriverState *bs, int64_t sector_num, int nb_sectors) { @@ -396,6 +401,7 @@ static BlockDriver bdrv_nbd = { .bdrv_close = nbd_close, .bdrv_co_flush_to_os= nbd_co_flush, .bdrv_co_discard= nbd_co_discard, +.bdrv_refresh_limits= nbd_refresh_limits, .bdrv_getlength = nbd_getlength, .bdrv_detach_aio_context= nbd_detach_aio_context, .bdrv_attach_aio_context= nbd_attach_aio_context, @@ -413,6 +419,7 @@ static BlockDriver bdrv_nbd_tcp = { .bdrv_close = nbd_close, .bdrv_co_flush_to_os= nbd_co_flush, .bdrv_co_discard= nbd_co_discard, +.bdrv_refresh_limits= nbd_refresh_limits, .bdrv_getlength = nbd_getlength, .bdrv_detach_aio_context= nbd_detach_aio_context, .bdrv_attach_aio_context= nbd_attach_aio_context, @@ -430,6 +437,7 @@ static BlockDriver bdrv_nbd_unix = { .bdrv_close = nbd_close, .bdrv_co_flush_to_os= nbd_co_flush, .bdrv_co_discard= nbd_co_discard, +.bdrv_refresh_limits= nbd_refresh_limits, .bdrv_getlength = nbd_getlength, .bdrv_detach_aio_context= nbd_detach_aio_context, .bdrv_attach_aio_context= nbd_attach_aio_context, -- 1.9.1
Re: [Qemu-devel] [RFC v2 8/8] vfio-pci: add VFIO_FEATURE_ENABLE_AER_CAP feature
On Wed, 2015-01-28 at 16:37 +0800, Chen Fan wrote: for old machine types, we should disable aercap feature. Signed-off-by: Chen Fan chen.fan.f...@cn.fujitsu.com --- hw/vfio/pci.c | 13 ++--- include/hw/compat.h | 4 2 files changed, 14 insertions(+), 3 deletions(-) diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index 65247ee..0d830e6 100644 --- a/hw/vfio/pci.c +++ b/hw/vfio/pci.c @@ -138,6 +138,8 @@ typedef struct VFIOMSIXInfo { enum { #define VFIO_FEATURE_ENABLE_VGA_BIT 0 VFIO_FEATURE_ENABLE_VGA = (1 VFIO_FEATURE_ENABLE_VGA_BIT), +#define VFIO_FEATURE_ENABLE_AER_CAP_BIT 1 +VFIO_FEATURE_ENABLE_AER_CAP = (1 VFIO_FEATURE_ENABLE_AER_CAP_BIT), }; typedef struct VFIOPCIDevice { @@ -2724,10 +2726,12 @@ static int vfio_add_ext_capabilities(VFIOPCIDevice *vdev) while (header) { switch (PCI_EXT_CAP_ID(header)) { case PCI_EXT_CAP_ID_ERR: - exp = pdev-exp; - exp-aer_cap = next; + if (vdev-features VFIO_FEATURE_ENABLE_AER_CAP) { + exp = pdev-exp; + exp-aer_cap = next; - vfio_pci_aer_init(vdev); + vfio_pci_aer_init(vdev); + } break; }; @@ -3498,6 +3502,9 @@ static Property vfio_pci_dev_properties[] = { DEFINE_PROP_BIT(x-vga, VFIOPCIDevice, features, VFIO_FEATURE_ENABLE_VGA_BIT, false), DEFINE_PROP_INT32(bootindex, VFIOPCIDevice, bootindex, -1), +DEFINE_PROP_BIT(aercap, VFIOPCIDevice, features, +VFIO_FEATURE_ENABLE_AER_CAP_BIT, true), + /* * TODO - support passed fds... is this necessary? * DEFINE_PROP_STRING(vfiofd, VFIOPCIDevice, vfiofd_name), diff --git a/include/hw/compat.h b/include/hw/compat.h index 313682a..72a2cdb 100644 --- a/include/hw/compat.h +++ b/include/hw/compat.h @@ -30,6 +30,10 @@ .driver = virtio-pci,\ .property = virtio-pci-bus-master-bug-migration,\ .value= on,\ +},{\ +.driver = vfio-pci,\ +.property = aercap,\ +.value= off,\ This will leave it enabled on both Q35 and 440FX afaict, so I'm not sure what this fixes. We don't care about migration compatibility with vfio-pci. } #endif /* HW_COMPAT_H */
Re: [Qemu-devel] [RFC v2 4/8] vfio-pci: pass the aer error to guest
On Wed, 2015-01-28 at 16:37 +0800, Chen Fan wrote: when the vfio device encounters an uncorrectable error in host, the vfio_pci driver will signal the eventfd registered by this vfio device, the results in the qemu eventfd handler getting invoked. this patch is to pass the error to guest and have the guest driver recover from the error. Signed-off-by: Chen Fan chen.fan.f...@cn.fujitsu.com --- hw/vfio/pci.c | 34 -- 1 file changed, 28 insertions(+), 6 deletions(-) diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index 2072261..8c81bb3 100644 --- a/hw/vfio/pci.c +++ b/hw/vfio/pci.c @@ -3141,18 +3141,40 @@ static void vfio_put_device(VFIOPCIDevice *vdev) static void vfio_err_notifier_handler(void *opaque) { VFIOPCIDevice *vdev = opaque; +PCIDevice *dev = vdev-pdev; +PCIEAERMsg msg = { +.severity = 0, +.source_id = (pci_bus_num(dev-bus) 8) | dev-devfn, +}; if (!event_notifier_test_and_clear(vdev-err_notifier)) { return; } +/* we should read the error details from the real hardware + * configuration spaces, here we only need to do is signaling + * to guest an uncorrectable error has occurred. + */ +if (dev-exp.aer_cap) { +uint8_t *aer_cap = dev-config + dev-exp.aer_cap; +uint32_t uncor_status; +bool isfatal; + +uncor_status = vfio_pci_read_config(dev, + dev-exp.aer_cap + PCI_ERR_UNCOR_STATUS, 4); + +isfatal = uncor_status pci_get_long(aer_cap + PCI_ERR_UNCOR_SEVER); + +msg.severity = isfatal ? PCI_ERR_ROOT_CMD_FATAL_EN : + PCI_ERR_ROOT_CMD_NONFATAL_EN; + +pcie_aer_msg(dev, msg); +return; +} What if the guest machine type is 440FX? We've just killed the existing vm_stop functionality for the majority of users. + /* - * TBD. Retrieve the error details and decide what action - * needs to be taken. One of the actions could be to pass - * the error to the guest and have the guest driver recover - * from the error. This requires that PCIe capabilities be - * exposed to the guest. For now, we just terminate the - * guest to contain the error. + * If the aer capability is not exposed to the guest. we just + * terminate the guest to contain the error. Just because it's exposed doesn't mean the guest chipset allows access to it, right? */ error_report(%s(%04x:%02x:%02x.%x) Unrecoverable error detected.
[Qemu-devel] [PATCH 0/3] softfloat: Remove STATUS macros
This patchset expands out all the uses of the softfloat STATUS_VAR, STATUS_PARAM and STATUS macros and removes their definitions. In my opinion these macros are basically just obfuscating the passing around and use of a float_status pointer in the softfloat code, and serve no useful purpose. I used to think that we should retain them on the offchance we wanted to pull in new softfloat upstream code, but this is wrong on three counts now: (1) there's never been any new softfloat upstream code to take (2) even if there were, we couldn't take it because of the license (3) these macros aren't even in the upstream version; they were added by Fabrice when he integrated softfloat into QEMU We've never used whatever flexibility might in theory be permitted by the indirection through this set of macros, so this patchset just deletes them all. The changes make an alarmingly large diffstat, but it's just mechanical expansion plus fixing of coding style issues on the affected lines. (Lots of style fixing because the softfloat upstream coding is (a) a long way away from QEMU's and (b) bonkers.) This obviously is likely to induce merge conflicts with any other in-flight softfloat changes, for which I apologise in advance. [Next on the cleanup list: get rid of those 'int8/int32/int64' types...] Peter Maydell (3): softfloat: Expand out the STATUS_PARAM macro softfloat: expand out STATUS_VAR softfloat: expand out STATUS macro fpu/softfloat-specialize.h | 124 ++-- fpu/softfloat.c| 1609 include/fpu/softfloat.h| 371 +- target-mips/msa_helper.c | 74 +- 4 files changed, 1188 insertions(+), 990 deletions(-) -- 1.9.1
Re: [Qemu-devel] [PATCH v2 01/11] cpu_ldst.h: Allow NB_MMU_MODES to be 7
On 01/29/2015 10:55 AM, Peter Maydell wrote: Support guest CPUs which need 7 MMU index values. Add a comment about what would be required to raise the limit further (trivial for 8, TCG backend rework for 9 or more). Signed-off-by: Peter Maydell peter.mayd...@linaro.org Reviewed-by: Greg Bellows greg.bell...@linaro.org --- include/exec/cpu_ldst.h | 28 +--- 1 file changed, 25 insertions(+), 3 deletions(-) Reviewed-by: Richard Henderson r...@twiddle.net r~
Re: [Qemu-devel] [v2][PATCH] libxl: add one machine property to support IGD GFX passthrough
On 2015/2/2 20:19, Wei Liu wrote: On Mon, Feb 02, 2015 at 09:17:23AM +0800, Tiejun Chen wrote: When we're working to support IGD GFX passthrough with qemu upstream, instead of -gfx_passthru we'd like to make that a machine option, -machine xxx,-igd-passthru=on. This need to bring a change on tool side. Signed-off-by: Tiejun Chen tiejun.c...@intel.com --- v2: * Based on some discussions with Wei we'd like to keep both old option, -gfx_passthru, and new machine property option, -machine xxx,-igd-passthru=on at the same time but deprecate the old one. Then finally we remove the old one at that point that to give downstream (in this case, Xen) time to cope with the change. My suggestion has one premise -- if upstream QEMU has already released that -gfx_passthru option. If there is no old one (in upstream QEMU) at all, then there is nothing to keep and deprecate. Understood. tools/libxl/libxl_dm.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c index c2b0487..8405f0b 100644 --- a/tools/libxl/libxl_dm.c +++ b/tools/libxl/libxl_dm.c @@ -701,6 +701,11 @@ static char ** libxl__build_device_model_args_new(libxl__gc *gc, Note this function is upstream QEMU specfic. Yeah. flexarray_append(dm_args, -net); flexarray_append(dm_args, none); } +/* + * Although we already introduce 'igd-passthru', but we'd like + * to remove this until we give downstream time to cope with + * the change. + */ if (libxl_defbool_val(b_info-u.hvm.gfx_passthru)) { flexarray_append(dm_args, -gfx_passthru); } The comment contradicts what I know (or what I think I know). In our last email exchange you said there was no -gfx_passthru in any version of upstream QEMU. So, shouldn't you just remove this `if' statement? Right. So what about this? libxl: add one machine property to support IGD GFX passthrough When we're working to support IGD GFX passthrough with qemu upstream, we'd like to introduce a machine option, -machine xxx,igd-passthru=on, to enable/disable that feature. And we also remove that old option, -gfx_passthru, just from the case of LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN since actually no any qemu stream version really need or use that. Signed-off-by: Tiejun Chen tiejun.c...@intel.com diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c index c2b0487..b888f19 100644 --- a/tools/libxl/libxl_dm.c +++ b/tools/libxl/libxl_dm.c @@ -701,9 +701,6 @@ static char ** libxl__build_device_model_args_new(libxl__gc *gc, flexarray_append(dm_args, -net); flexarray_append(dm_args, none); } -if (libxl_defbool_val(b_info-u.hvm.gfx_passthru)) { -flexarray_append(dm_args, -gfx_passthru); -} } else { if (!sdl !vnc) { flexarray_append(dm_args, -nographic); @@ -748,6 +745,11 @@ static char ** libxl__build_device_model_args_new(libxl__gc *gc, machinearg, max_ram_below_4g); } } + +if (libxl_defbool_val(b_info-u.hvm.gfx_passthru)) { +machinearg = libxl__sprintf(gc, %s,igd-passthru=on, machinearg); +} + flexarray_append(dm_args, machinearg); for (i = 0; b_info-extra_hvm b_info-extra_hvm[i] != NULL; i++) flexarray_append(dm_args, b_info-extra_hvm[i]); Thanks Tiejun Wei. @@ -748,6 +753,11 @@ static char ** libxl__build_device_model_args_new(libxl__gc *gc, machinearg, max_ram_below_4g); } } + +if (libxl_defbool_val(b_info-u.hvm.gfx_passthru)) { +machinearg = libxl__sprintf(gc, %s,igd-passthru=on, machinearg); +} + flexarray_append(dm_args, machinearg); for (i = 0; b_info-extra_hvm b_info-extra_hvm[i] != NULL; i++) flexarray_append(dm_args, b_info-extra_hvm[i]); -- 1.9.1
Re: [Qemu-devel] [PATCH v4 07/18] spapr_iommu: Implement free_table() helper
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 02/02/2015 05:37 PM, David Gibson wrote: On Thu, Jan 29, 2015 at 08:27:19PM +1100, Alexey Kardashevskiy wrote: Every sPAPRTCETable object holds an IOMMU memory region which holds a referenced to the sPAPRTCETable instance. So if we want to free an sPAPRTCETable instance, calling object_unref() will not be enough as embedded memory region will hold the reference and we need to break the loop. This adds a spapr_tce_free_table() helper which destroys the embedded memory region and then calls object_unref() on the sPAPRTCETable instance which will succeed now. The helper adds a new child under unique name derived from LIOBN. As we are here, fix spapr_tce_new_table() callers. At the moment spapr_tce_new_table() references sPAPRTCETable twice - once in object_new() and second time in object_property_add_child(). The callers of spapr_tce_new_table() should take care of correct referencing. Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru Reviewed-by: David Gibson da...@gibson.dropbear.id.au With the caveat that I don't know the QOM refcounting model well enough to really analyze if this is correct. It's not obviously wrong, and doesn't look like it will break anything in the actual PAPR related code. I am redoing this part to always keep sPAPRTCETable and not free them on every reboot so you can ignore this type of problems now. Thanks! - -- Alexey -BEGIN PGP SIGNATURE- Version: GnuPG v1 iQIcBAEBAgAGBQJU0CUvAAoJEIYTPdgrwSC5mpcP+wa3vulve++JX06UWbTsaarV nXQhWsKHAFQZmQYzRzyUwl00JEIp14NZ9dzBXEaAIutQmiFDeNA7a3QCHpnkt2Ei JJzzzK5ygjcX9H2RP+hzaoJkHxdFKMQ9TREKe86IsHYw34rREzfS/Li6PkTm+LSo 3jnnd1k0GGB9AX/hwvDHHcdQRpio2sI0lG7WH3R8FDbHAHo/YW3JvrPL3iTDpNDA HYR9Hy4JynzQIGJhzYFkmODEjUbLtn/c/xIYVZJDNdDdLETF1sK+iTpbinxtPe5n JsKGncsFmZGKotJ3ZAjPxogdYhqtvq5/VsUpc2jHlC4QKgEsVLqUm5cfu+hl5a00 w/4W9tB+lqbFFvTxNAOokYPnWaD+NJZ3DY03vsCKRk16/MrMjdfMpsj/m3ai6pAx Pc/A2JfZ30SjqbFBLQJZKBDKQ9JARp7F/S60s7iW2D4ELRChdprcb/cE/gVGoWFV DYKpPRtdzjNNtjMLNd6OuWVu3LItcfz42VQwxrwjyZOIIgpdpOM1hBBXqMjhOykW gFtlf/zEDAxQ9cl0RrjH9gC1UwnP7w+slvspa5MNghnFKByHBOwUW9bNpDc2xcp2 9I6j1yjxI81Si96uGFj8Ba2LzBGaTzVIbiSsnV2cvZKirNAN/Gnm9lcgwOOMKA09 QAx4oiji73rE6GzBZxFx =o1mo -END PGP SIGNATURE-
Re: [Qemu-devel] [PATCH 4/6] vmport_rpc: Add QMP access to vmport_rpc object.
On 01/30/15 17:32, Eric Blake wrote: On 01/30/2015 02:06 PM, Don Slutz wrote: This adds two new inject commands: inject-vmport-reboot inject-vmport-halt And three guest info commands: vmport-guestinfo-set vmport-guestinfo-get query-vmport-guestinfo More details in qmp-commands.hx Signed-off-by: Don Slutz dsl...@verizon.com --- +static int foreach_dynamic_VMPortRpc_device(FindVMPortRpcDeviceFunc *func, +void *arg) +{ +VMPortRpcFind find = { +.func = func, +.arg = arg, +}; + +/* Loop through all VMPortRpc devices that were spawened outside s/spawened/spawned/ Fixed. #ifdef VMPORT_RPC_DEBUG /* * Add helper function for traceing. This routine will convert Pre-existing as of this patch, but while you are here: s/traceing/tracing/ unless it occurred earlier in the series, in which case fix it there. Fixed. +static void convert_local_rc(Error **errp, int rc) +{ +switch (rc) { +case 0: +break; +case VMPORT_DEVICE_NOT_FOUND: +error_set(errp, QERR_DEVICE_NOT_FOUND, TYPE_VMPORT_RPC); +break; +case SEND_NOT_OPEN: +error_set(errp, ERROR_CLASS_GENERIC_ERROR, VMWare rpc not open); shorter as: error_setg(errp, VMWare rpc not open); and similar for all your other uses of ERROR_CLASS_GENERIC_ERROR Fixed. +++ b/qapi-schema.json @@ -1271,6 +1271,101 @@ { 'command': 'inject-nmi' } ## +# @inject-vmport-reboot: +# +# Injects a VMWare Tools reboot to the guest. +# +# Returns: If successful, nothing +# +# Since: 2.3 +# +## +{ 'command': 'inject-vmport-reboot' } + +## +# @inject-vmport-halt: +# +# Injects a VMWare Tools halt to the guest. +# +# Returns: If successful, nothing +# +# Since: 2.3 +# +## +{ 'command': 'inject-vmport-halt' } Why two commands? Why not just 'inject-vmport-action' with a parameter that says whether the action is 'reboot', 'halt', or something else? This is a corner case. I will convert it to inject-vmport-action with a parameter. + +## +# @vmport-guestinfo-set: +# +# Set a VMWare Tools guestinfo key to a value +# +# @key: the key to set +# +# @value: The data to set the key to +# +# @format: #optional value encoding (default 'utf8'). +# - base64: value must be base64 encoded text. Its binary +#decoding gets set. +#Bug: invalid base64 is currently not rejected. We should fix the bug, rather than document it. Well, this turns out to be complex. If there is a bug it is in GLIB. However the Glib reference manual refers to RFC 1421 and RFC 2045 and MIME encoding. Based on all that (which seems to match: http://en.wikipedia.org/wiki/Base64 ) MIME states that all characters outside the (base64) alphabet are to be ignored. Testing shows that g_base64_decode() does this. The confusion is that most non-MIME uses reject a base64 string that contain characters outside the alphabet. I was just following the other uses of base64 in this file. DataFormat refers to RFC 3548, which has the info: Implementations MUST reject the encoding if it contains characters outside the base alphabet when interpreting base encoded data, unless the specification referring to this document explicitly states otherwise. Such specifications may, as MIME does, instead state that characters outside the base encoding alphabet should simply be ignored when interpreting data (be liberal in what you accept). So with GLIB going the MIME way, I do not think this is a QEMU bug (you could consider this a GLIB bug, but the document I found says that GLIB goes the MIME way and so does not reject anything). Having now looked closer at this, I plan to drop this bug section since I am happy with what GLIB is doing. +#Whitespace *is* invalid. +# - utf8: value's UTF-8 encoding is written +# - value itself is always Unicode regardless of format, like +#any other string. This was confusing to read - there are three bullets, so it looks like you meant to document three valid DataFormat enum values; but there are only two. The comment about 'value' being supplied as valid JSON UTF8 and only later decoded according to 'format' might belong better on 'value', if at all. On the other hand, I see you are just blindly copy-and-pasting from 'ringbuf-write'. Yup. Will just drop the - value... +# +# Returns: Nothing on success +# +# Since: 2.3 +## +{ 'command': 'vmport-guestinfo-set', + 'data': {'key': 'str', 'value': 'str', + '*format': 'DataFormat'} } + +## +# @vmport-guestinfo-get: +# +# Get a VMWare Tools guestinfo value for a key +# +# @key: the key to get +# +# @format: #optional data encoding (default 'utf8'). +# - base64: the value is returned in base64 encoding. +# - utf8: the value is interpreted as UTF-8. +#Bug: can
Re: [Qemu-devel] [v2][PATCH] libxl: add one machine property to support IGD GFX passthrough
On 2015/2/2 19:08, Ian Campbell wrote: On Mon, 2015-02-02 at 09:17 +0800, Tiejun Chen wrote: When we're working to support IGD GFX passthrough with qemu upstream, instead of -gfx_passthru we'd like to make that a machine option, -machine xxx,-igd-passthru=on. This need to bring a change on tool side. From which Qemu version is this new option accepted? What will a verison of qemu prior to then do when presented with the new option? Sorry, maybe I'm not describing this correctly. Actually qemu upstream never own this option, '-gfx_passthru' at all. This just exists alone in qemu-xen-traditional. So here I'm trying to introduce a new stuff that doesn't clash anything in qemu upstream. So I guess I should rephrase this as follows: libxl: add one machine property to support IGD GFX passthrough When we're working to support IGD GFX passthrough with qemu upstream, we'd like to introduce a machine option, -machine xxx,igd-passthru=on, to enable/disable that feature. And we also remove that old option, -gfx_passthru, just from the case of LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN since actually no any qemu stream version really need or use that. (nb: you have an extra '-' in the description I think) Yeah, I will remove that. Thanks Tiejun Signed-off-by: Tiejun Chen tiejun.c...@intel.com --- v2: * Based on some discussions with Wei we'd like to keep both old option, -gfx_passthru, and new machine property option, -machine xxx,-igd-passthru=on at the same time but deprecate the old one. Then finally we remove the old one at that point that to give downstream (in this case, Xen) time to cope with the change. tools/libxl/libxl_dm.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c index c2b0487..8405f0b 100644 --- a/tools/libxl/libxl_dm.c +++ b/tools/libxl/libxl_dm.c @@ -701,6 +701,11 @@ static char ** libxl__build_device_model_args_new(libxl__gc *gc, flexarray_append(dm_args, -net); flexarray_append(dm_args, none); } +/* + * Although we already introduce 'igd-passthru', but we'd like + * to remove this until we give downstream time to cope with + * the change. + */ if (libxl_defbool_val(b_info-u.hvm.gfx_passthru)) { flexarray_append(dm_args, -gfx_passthru); } @@ -748,6 +753,11 @@ static char ** libxl__build_device_model_args_new(libxl__gc *gc, machinearg, max_ram_below_4g); } } + +if (libxl_defbool_val(b_info-u.hvm.gfx_passthru)) { +machinearg = libxl__sprintf(gc, %s,igd-passthru=on, machinearg); +} + flexarray_append(dm_args, machinearg); for (i = 0; b_info-extra_hvm b_info-extra_hvm[i] != NULL; i++) flexarray_append(dm_args, b_info-extra_hvm[i]);
Re: [Qemu-devel] [PATCH v2 1/2] configure: Default to enable module build
On Mon, 02/02 20:34, Paolo Bonzini wrote: On 13/01/2015 09:53, Fam Zheng wrote: We have module build support around for a while, but also had it bitrot several times. It probably makes sense to enable it by default so that people can notice and use it. Counterpart to --enable-modules, which is turned as default, --disable-modules is added to suppress it. If both are omitted, the support is guesses as usual. Signed-off-by: Fam Zheng f...@redhat.com I squashed this to unbreak static compilation, but Peter reported that module linking fails on ARM host: LINK block/curl.so /usr/bin/ld: block/curl.o: relocation R_ARM_THM_MOVW_ABS_NC against `__stack_chk_guard' can not be used when making a shared object; recompile with -fPIC block/curl.o: could not read symbols: Bad value collect2: error: ld returned 1 exit status I don't see how -fPIC is missed in ARM host :( Does the below patch fix this? diff --git a/configure b/configure index a9ae57a..4fae00a 100755 --- a/configure +++ b/configure @@ -1536,9 +1536,6 @@ if compile_prog -Werror -fno-gcse ; then fi if test $static = yes ; then - if test $modules = yes ; then -error_exit static and modules are mutually incompatible - fi if test $pie = yes ; then error_exit static and pie are mutually incompatible else Stage this hunk [y,n,q,a,d,/,j,J,g,e,?]? y @@ -2763,6 +2760,14 @@ fi module_try_enable() { force=$1 + if test $static = yes; then +if $force; then + error_exit static and modules are mutually incompatible +else + modules=no + return +fi + fi shacmd_probe=sha1sum sha1 shasum for c in $shacmd_probe; do if has $c; then Is the above ok? Yes, it looks correct. Fam
Re: [Qemu-devel] [PATCH 2/2] bootdevice: add check in restore_boot_order()
On 2015/2/2 17:37, Markus Armbruster wrote: Gonglei arei.gong...@huawei.com writes: On 2015/1/30 20:32, Markus Armbruster wrote: Gonglei arei.gong...@huawei.com writes: On 2015/1/30 20:01, Markus Armbruster wrote: Gonglei arei.gong...@huawei.com writes: On 2015/1/30 15:46, Markus Armbruster wrote: Gonglei arei.gong...@huawei.com writes: On 2015/1/30 0:03, Alexander Graf wrote: On 29.01.15 14:29, arei.gong...@huawei.com wrote: From: Gonglei arei.gong...@huawei.com If boot order is invaild or is set failed, exit qemu. Signed-off-by: Gonglei arei.gong...@huawei.com Do we really want to kill the machine only because the boot device string doesn't validate? Not all of the situation. If people want to change boot order by qmp/hmp command, it just report an error, please see do_boot_set(). But if the boot order is set in qemu command line, it will exit qemu if the boot device string is invalidate, as this patch's situation, which follow the original processing way (commit ef3adf68). I think Alex isn't concerned about the monitor command, but what happens when boot order once is reset to order on system reset. -boot errors should have been detected during command line processing (strongly preferred) or initial startup (acceptable). Detecting Yes, and it had done it just like that, please see main() of vl.c. So, actually it wouldn't fail in the check of restore_boot_order function's calling. The only possible fails will happen to call boot_set_handler(). Take x86 pc machine example, set_boot_dev() callback may return errors. I don't like unreachable error messages. If qemu_boot_set() can't fail in restore_boot_order(), then simply assert it doesn't fail, by passing error_abort. Sorry, I meant the validate_bootdevices() can't fail in restore_boot_order(), but boot_set_handler(boot_set_opaque, boot_order, errp) may fail, such as set_boot_dev(). For example: x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 4096 -boot menu=on,order=nbcdep,once=c -monitor stdio -vnc :0 QEMU 2.2.50 monitor - type 'help' for more information (qemu) system_reset (qemu) qemu-system-x86_64: Too many boot devices for PC The value of parameter order should be checked during command line processing (strongly preferred) or initial startup (acceptable) if at all possible. Is it possible? Either 'once' option or 'order' option can take effect for -boot at the same time, that is say initial startup processing can check only one. Besides, the check is just for corresponding machine type, so command line processing also can't do it. I challenge your idea that we can't check this before the guest starts running. qemu_boot_set() can fail for two reasons: There is a third reason that boot_set_handler is not null, but fails in really executing time. You can see my above example about function set_boot_dev(), the handler of pc machine. * validate_bootdevices() fails Should never happen, because we've called it in main() already, treating failure as fatal error. Yes. * boot_set_handler is null MachineClass method init() may set this. main() could *easily* test whether it did! If it didn't, and -boot once is given, error out. Similar checks exist already, e.g. drive_check_orphaned(), net_check_clients(). They only warn, but that's detail. I agree, just need to report the error message. Regards, -Gonglei
Re: [Qemu-devel] [v2][PATCH] libxl: add one machine property to support IGD GFX passthrough
On 2015/2/2 20:54, Ian Jackson wrote: Wei Liu writes (Re: [v2][PATCH] libxl: add one machine property to support IGD GFX passthrough): On Mon, Feb 02, 2015 at 09:17:23AM +0800, Tiejun Chen wrote: When we're working to support IGD GFX passthrough with qemu upstream, instead of -gfx_passthru we'd like to make that a machine option, -machine xxx,-igd-passthru=on. This need to bring a change on tool side. ... My suggestion has one premise -- if upstream QEMU has already released that -gfx_passthru option. If there is no old one (in upstream QEMU) at all, then there is nothing to keep and deprecate. I think the commit message of the xen.git commit should explain what options are supported by which versions of qemu (including qemu upstream's future plans). That would provide (a) something which summarises the communication etc. with qemu upstream and can be checked with them if necessary and (b) something against which the libxl changes can be easily judged. Sorry, looks I'm misleading this to everyone. Here I picked my reply from another email: Actually qemu upstream never own this option, '-gfx_passthru' at all. This just exists alone in qemu-xen-traditional. So here I'm trying to introduce a new stuff that doesn't clash anything in qemu upstream. So I guess I should rephrase this as follows: libxl: add one machine property to support IGD GFX passthrough When we're working to support IGD GFX passthrough with qemu upstream, we'd like to introduce a machine option, -machine xxx,igd-passthru=on, to enable/disable that feature. And we also remove that old option, -gfx_passthru, just from the case of LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN since actually no any qemu stream version really need or use that. So is it good? Thanks Tiejun
Re: [Qemu-devel] [RFC PATCH v8 08/21] cpu: replay instructions sequence
On 02/02/2015 13:28, Pavel Dovgaluk wrote: cpu-exception_index = EXCP_INTERRUPT; next_tb = 0; +qemu_notify_event(); Why is this needed? It is needed to wake up iothread in replay mode. Otherwise it waits for additional time and replay becomes too slow. What event (something from a timerlist?) is ready, that has not been notified to the iothread yet? qemu_notify_event() should never be needed in common case. It's probably missing somewhere else. Paolo
Re: [Qemu-devel] [RFC 09/10] cpu: remove exit_request global.
On 02/02/2015 14:12, Peter Maydell wrote: In particular, you can do it while you are under the big QEMU lock. ...but this is a signal handler, so we can't guarantee that the thread holds the big lock. ...and if we can iterate over CPU lists in signal handlers, the correct approach is probably to use the existing exit_request flag rather than adding another one. (Needs investigation to check and document semantics of that flag.) I agree on both counts. Paolo
Re: [Qemu-devel] [RFC PATCH v8 08/21] cpu: replay instructions sequence
On 02/02/2015 13:42, Pavel Dovgaluk wrote: From: Paolo Bonzini [mailto:pbonz...@redhat.com] On 02/02/2015 13:28, Pavel Dovgaluk wrote: cpu-exception_index = EXCP_INTERRUPT; next_tb = 0; +qemu_notify_event(); Why is this needed? It is needed to wake up iothread in replay mode. Otherwise it waits for additional time and replay becomes too slow. What event (something from a timerlist?) is ready, that has not been notified to the iothread yet? qemu_notify_event() should never be needed in common case. It's probably missing somewhere else. I think in this case there are no events at all - just reading timers values that were made while recording. We have to replay these reads by waking iothread. I think the right place for this is in replay_read_next_clock then. Paolo
Re: [Qemu-devel] [PATCH 7/7] block/raw-posix: set max_write_zeroes to INT_MAX for regular files
Am 30.01.2015 um 09:42 hat Denis V. Lunev geschrieben: fallocate() works fine and could handle properly with arbitrary size requests. There is no sense to reduce the amount of space to fallocate. The bigger is the size, the better is the performance as the amount of journal updates is reduced. The patch changes behavior for both generic filesystem and XFS codepaths, which are different in handle_aiocb_write_zeroes. The implementation of fallocate and xfsctl(XFS_IOC_ZERO_RANGE) for XFS are exactly the same thus the change is fine for both ways. Signed-off-by: Denis V. Lunev d...@openvz.org Reviewed-by: Max Reitz mre...@redhat.com CC: Kevin Wolf kw...@redhat.com CC: Stefan Hajnoczi stefa...@redhat.com CC: Peter Lieven p...@kamp.de CC: Fam Zheng f...@redhat.com --- block/raw-posix.c | 17 + 1 file changed, 17 insertions(+) diff --git a/block/raw-posix.c b/block/raw-posix.c index 7b42f37..933c778 100644 --- a/block/raw-posix.c +++ b/block/raw-posix.c @@ -293,6 +293,20 @@ static void raw_probe_alignment(BlockDriverState *bs, int fd, Error **errp) } } +static void raw_probe_max_write_zeroes(BlockDriverState *bs) +{ +BDRVRawState *s = bs-opaque; +struct stat st; + +if (fstat(s-fd, st) 0) { +return; /* no problem, keep default value */ +} +if (!S_ISREG(st.st_mode) || !s-discard_zeroes) { +return; +} +bs-bl.max_write_zeroes = INT_MAX; +} Peter, do you remember why INT_MAX isn't actually the default? I think the most reasonable behaviour would be that a limitation is only used if a block driver requests it, and otherwise unlimited is assumed. We can take this patch to raw-posix, it is certainly not wrong. But any format driver or filter will still, in most cases needlessly, apply MAX_WRITE_ZEROES_DEFAULT, i.e. a 16 MB maximum, so I think we should consider making a change to the default. Kevin
[Qemu-devel] [PATCH v4 0/5] this series adds the long missing multiread support to virtio-blk.
some remarks: - i introduced rd_merged and wr_merged block accounting stats to blockstats as a generic interface which can be set from any driver that will introduce multirequest merging in the future. - the old multiwrite interface is still there and might be removed. v3-v4: - added a patch to disable request merging in the virtio-blk device - changed some static function names [Max] - removed _identifiers [Max] - removed to +1 in the check for max iovectors merged [Max, Kevin] - changed type 0xff to type ~(VIRTIO_BLK_T_OUT | VIRTIO_BLK_T_BARRIER) [Max] - initialize qiov before bounds check [Max] - added an assertion to avoid overflowing mrb-reqs array [Max] - added comments about the usage of qiov (internal vs. external) [Max] v2-v3: - completely reworked Patch 4 as it turned out that depending on the kernel even sequential requests fly in out of order or we receive a mix from different software queues. v1-v2: - add overflow checking for nb_sectors [Kevin] - do not change the name of the macro of max mergable requests. [Fam] RFC v2-v1: - added Erics grammar corrections to Patch 4 - Patch 4 - reworked the IF tree to a SWITCH statement to make it easier to understand in virtio_blk_handle_request - stop merging if requests switch from read to write or vice versa. e..g, otherwise we could introduce big delays as a small read request could be followed by a lot of write requests and the read requests sits there until the queue is empty. RFC v1-RFC v2: - completed Patch 1 by the example in qmp-commands.hx [Eric] - used bool for merge in Patch 4 [Max] - fixed a few typos in the commit msg of Patch 4 [Max] - do not start merging and directly pass req[0]-qiov in case of num_reqs == 1 - avoid allocating memory for the multireq [Kevin] - do not import block_int.h and add appropiate iface to block-backend [Kevin] - removed debug output and added trace event for multireq [Kevin] - fixed alloc hint for the merge qiov [Kevin] - currently did not split virtio_submit_multireq into rw code since the redundant code would now be much bigger part than in the original patch. - added a merge_qiov to VirtioBlockRequest. Abusing the qiov was not possible because it is initialized externally with guest memory [Kevin] - added a pointer to VirtioBlockRequest to create a linked list of VirtioBlockBlockRequests. This list is used to complete all requests belonging to a multireq [Kevin] Peter Lieven (5): block: add accounting for merged requests hw/virtio-blk: add a constant for max number of merged requests block-backend: expose bs-bl.max_transfer_length virtio-blk: introduce multiread virtio-blk: add a knob to disable request merging block.c | 2 + block/accounting.c | 7 + block/block-backend.c | 5 + block/qapi.c| 2 + hmp.c | 6 +- hw/block/dataplane/virtio-blk.c | 8 +- hw/block/virtio-blk.c | 299 +++- include/block/accounting.h | 3 + include/hw/virtio/virtio-blk.h | 18 ++- include/sysemu/block-backend.h | 1 + qapi/block-core.json| 9 +- qmp-commands.hx | 22 ++- trace-events| 1 + 13 files changed, 273 insertions(+), 110 deletions(-) -- 1.9.1
Re: [Qemu-devel] [PATCH 7/7] block/raw-posix: set max_write_zeroes to INT_MAX for regular files
Am 02.02.2015 um 15:16 schrieb Kevin Wolf: Am 02.02.2015 um 15:12 hat Peter Lieven geschrieben: Am 02.02.2015 um 15:04 schrieb Kevin Wolf: Am 02.02.2015 um 14:55 hat Peter Lieven geschrieben: Am 02.02.2015 um 14:23 schrieb Kevin Wolf: Am 30.01.2015 um 09:42 hat Denis V. Lunev geschrieben: fallocate() works fine and could handle properly with arbitrary size requests. There is no sense to reduce the amount of space to fallocate. The bigger is the size, the better is the performance as the amount of journal updates is reduced. The patch changes behavior for both generic filesystem and XFS codepaths, which are different in handle_aiocb_write_zeroes. The implementation of fallocate and xfsctl(XFS_IOC_ZERO_RANGE) for XFS are exactly the same thus the change is fine for both ways. Signed-off-by: Denis V. Lunev d...@openvz.org Reviewed-by: Max Reitz mre...@redhat.com CC: Kevin Wolf kw...@redhat.com CC: Stefan Hajnoczi stefa...@redhat.com CC: Peter Lieven p...@kamp.de CC: Fam Zheng f...@redhat.com --- block/raw-posix.c | 17 + 1 file changed, 17 insertions(+) diff --git a/block/raw-posix.c b/block/raw-posix.c index 7b42f37..933c778 100644 --- a/block/raw-posix.c +++ b/block/raw-posix.c @@ -293,6 +293,20 @@ static void raw_probe_alignment(BlockDriverState *bs, int fd, Error **errp) } } +static void raw_probe_max_write_zeroes(BlockDriverState *bs) +{ +BDRVRawState *s = bs-opaque; +struct stat st; + +if (fstat(s-fd, st) 0) { +return; /* no problem, keep default value */ +} +if (!S_ISREG(st.st_mode) || !s-discard_zeroes) { +return; +} +bs-bl.max_write_zeroes = INT_MAX; +} Peter, do you remember why INT_MAX isn't actually the default? I think the most reasonable behaviour would be that a limitation is only used if a block driver requests it, and otherwise unlimited is assumed. The default (0) actually means unlimited or undefined. We introduced that limit of 16MB in bdrv_co_write_zeroes to create only reasonable sized requests because there is no guarantee that write zeroes is a fast operation. We should set INT_MAX only if we know that write zeroes of an arbitrary size is always fast. Well, splitting it up doesn't make it any faster. I think we can assume that drv-bdrv_co_write_zeroes() wants to know the full request size unless the driver has explicitly set bs-bl.max_write_zeroes. You mean sth like this: Yes, I think that's what I meant. I can't find the original discussion why we added this limit. It was actually the default before we introduced BlockLimits. And, it was also the default in the unsupported path of write zeroes which created big memory allocations. This might be the reason why we introduced a limit. Peter Kevin diff --git a/block.c b/block.c index 61412e9..8272ef9 100644 --- a/block.c +++ b/block.c @@ -3192,10 +3192,7 @@ int coroutine_fn bdrv_co_copy_on_readv(BlockDriverState *bs, BDRV_REQ_COPY_ON_READ); } -/* if no limit is specified in the BlockLimits use a default - * of 32768 512-byte sectors (16 MiB) per request. - */ -#define MAX_WRITE_ZEROES_DEFAULT 32768 +#define MAX_WRITE_ZEROES_BOUNCE_BUFFER 32768 static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs, int64_t sector_num, int nb_sectors, BdrvRequestFlags flags) @@ -3206,7 +3203,7 @@ static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs, int ret = 0; int max_write_zeroes = bs-bl.max_write_zeroes ? - bs-bl.max_write_zeroes : MAX_WRITE_ZEROES_DEFAULT; + bs-bl.max_write_zeroes : INT_MAX; while (nb_sectors 0 !ret) { int num = nb_sectors; @@ -3242,7 +3239,7 @@ static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs, if (ret == -ENOTSUP) { /* Fall back to bounce buffer if write zeroes is unsupported */ int max_xfer_len = MIN_NON_ZERO(bs-bl.max_transfer_length, - MAX_WRITE_ZEROES_DEFAULT); + MAX_WRITE_ZEROES_BOUNCE_BUFFER); num = MIN(num, max_xfer_len); iov.iov_len = num * BDRV_SECTOR_SIZE; if (iov.iov_base == NULL) { @@ -5099,11 +5096,6 @@ static void coroutine_fn bdrv_discard_co_entry(void *opaque) rwco-ret = bdrv_co_discard(rwco-bs, rwco-sector_num, rwco-nb_sectors); } -/* if no limit is specified in the BlockLimits use a default - * of 32768 512-byte sectors (16 MiB) per request. - */ -#define MAX_DISCARD_DEFAULT 32768 - int coroutine_fn bdrv_co_discard(BlockDriverState *bs, int64_t sector_num, int nb_sectors) { @@ -5128,7 +5120,7 @@ int coroutine_fn bdrv_co_discard(BlockDriverState *bs, int64_t sector_num, return 0; } -max_discard = bs-bl.max_discard ? bs-bl.max_discard : MAX_DISCARD_DEFAULT; +max_discard = bs-bl.max_discard ? bs-bl.max_discard : INT_MAX; while (nb_sectors 0) { int ret; int num =
Re: [Qemu-devel] [PATCH] vl.c: fix -usb option assertion failure in qemu_opt_get_bool_helper()
On 01/31/2015 11:23 AM, Jan Kiszka wrote: On 2015-01-05 12:37, Jan Kiszka wrote: On 2015-01-05 12:22, Stefan Hajnoczi wrote: Commit 49d2e648e8087d154d8bf8b91f27c8e05e79d5a6 (machine: remove qemu_machine_opts global list) removed option descriptions from the -machine QemuOptsList to avoid repeating MachineState's QOM properties. This change broke vl.c:usb_enabled() because qemu_opt_get_bool() cannot be used on QemuOptsList without option descriptions since QemuOpts doesn't know the type and therefore left an unparsed string value. This patch avoids calling qemu_opt_get_bool() to fix the assertion failure: $ qemu-system-x86_64 -usb qemu_opt_get_bool_helper: Assertion `opt-desc opt-desc-type == QEMU_OPT_BOOL' failed. Test the presence of -usb using qemu_opt_find() but use the MachineState-usb field instead of qemu_opt_get_bool(). Cc: Marcel Apfelbaum marce...@redhat.com Cc: Tiejun Chen tiejun.c...@intel.com Signed-off-by: Stefan Hajnoczi stefa...@redhat.com --- vl.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/vl.c b/vl.c index bea9656..6e8889c 100644 --- a/vl.c +++ b/vl.c @@ -999,8 +999,11 @@ static int parse_name(QemuOpts *opts, void *opaque) bool usb_enabled(bool default_usb) { -return qemu_opt_get_bool(qemu_get_machine_opts(), usb, - has_defaults default_usb); +if (qemu_opt_find(qemu_get_machine_opts(), usb)) { +return current_machine-usb; +} else { +return has_defaults default_usb; +} } #ifndef _WIN32 That still leaves the other boolean machine options broken. A generic fix would be good. Or revert the original commit until we have one. Just pulled current master, and at least the iommu machine option is still triggering an abort. What is the status of fixing these fallouts? Was anything else addressed by now, just this one forgotten? No, is not forgotten. I waited to see that the approach to fix this issue is accepted by the reviewers. Thanks for the reminder, patches will be submitted soon. Marcel Jan
[Qemu-devel] [v4 12/13] migration: Add command to set migration parameter
Add the qmp and hmp commands to tune the parameters used in live migration. Signed-off-by: Liang Li liang.z...@intel.com Signed-off-by: Yang Zhang yang.z.zh...@intel.com --- hmp-commands.hx | 15 ++ hmp.c | 35 ++ hmp.h | 3 ++ include/migration/migration.h | 4 +-- migration/migration.c | 69 +++ monitor.c | 18 +++ qapi-schema.json | 52 qmp-commands.hx | 25 8 files changed, 206 insertions(+), 15 deletions(-) diff --git a/hmp-commands.hx b/hmp-commands.hx index e37bc8b..535b5ba 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -985,6 +985,21 @@ Enable/Disable the usage of a capability @var{capability} for migration. ETEXI { +.name = migrate_set_parameter, +.args_type = parameter:s,value:i, +.params = parameter value, +.help = Set the parameter for migration, +.mhandler.cmd = hmp_migrate_set_parameter, +.command_completion = migrate_set_parameter_completion, +}, + +STEXI +@item migrate_set_parameter @var{parameter} @var{value} +@findex migrate_set_parameter +Set the parameter @var{parameter} for migration. +ETEXI + +{ .name = client_migrate_info, .args_type = protocol:s,hostname:s,port:i?,tls-port:i?,cert-subject:s?, .params = protocol hostname port tls-port cert-subject, diff --git a/hmp.c b/hmp.c index 481be80..faab4b0 100644 --- a/hmp.c +++ b/hmp.c @@ -1134,6 +1134,41 @@ void hmp_migrate_set_capability(Monitor *mon, const QDict *qdict) } } +void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict) +{ +const char *param = qdict_get_str(qdict, parameter); +int value = qdict_get_int(qdict, value); +Error *err = NULL; +MigrationParameterStatusList *params = g_malloc0(sizeof(*params)); +MigrationParameterInt *data; +int i; + +for (i = 0; i MIGRATION_PARAMETER_MAX; i++) { +if (strcmp(param, MigrationParameter_lookup[i]) == 0) { +params-value = g_malloc0(sizeof(*params-value)); +params-value-kind = i; +params-value-data = g_malloc0(sizeof(MigrationParameterInt)); +data = (MigrationParameterInt *)params-value-data; +data-value = value; +params-next = NULL; +qmp_migrate_set_parameters(params, err); +break; +} +} + +if (i == MIGRATION_PARAMETER_MAX) { +error_set(err, QERR_INVALID_PARAMETER, param); +} + +qapi_free_MigrationParameterStatusList(params); + +if (err) { +monitor_printf(mon, migrate_set_parameter: %s\n, + error_get_pretty(err)); +error_free(err); +} +} + void hmp_set_password(Monitor *mon, const QDict *qdict) { const char *protocol = qdict_get_str(qdict, protocol); diff --git a/hmp.h b/hmp.h index 4bb5dca..429efea 100644 --- a/hmp.h +++ b/hmp.h @@ -63,6 +63,7 @@ void hmp_migrate_cancel(Monitor *mon, const QDict *qdict); void hmp_migrate_set_downtime(Monitor *mon, const QDict *qdict); void hmp_migrate_set_speed(Monitor *mon, const QDict *qdict); void hmp_migrate_set_capability(Monitor *mon, const QDict *qdict); +void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict); void hmp_migrate_set_cache_size(Monitor *mon, const QDict *qdict); void hmp_set_password(Monitor *mon, const QDict *qdict); void hmp_expire_password(Monitor *mon, const QDict *qdict); @@ -111,6 +112,8 @@ void watchdog_action_completion(ReadLineState *rs, int nb_args, const char *str); void migrate_set_capability_completion(ReadLineState *rs, int nb_args, const char *str); +void migrate_set_parameter_completion(ReadLineState *rs, int nb_args, + const char *str); void host_net_add_completion(ReadLineState *rs, int nb_args, const char *str); void host_net_remove_completion(ReadLineState *rs, int nb_args, const char *str); diff --git a/include/migration/migration.h b/include/migration/migration.h index 0c4f21c..8e09b42 100644 --- a/include/migration/migration.h +++ b/include/migration/migration.h @@ -50,9 +50,7 @@ struct MigrationState QEMUBH *cleanup_bh; QEMUFile *file; QemuThread *compress_thread; -int compress_thread_count; -int decompress_thread_count; -int compress_level; +int parameters[MIGRATION_PARAMETER_MAX]; int state; MigrationParams params; diff --git a/migration/migration.c b/migration/migration.c index cbbd455..b5055dc 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -66,9 +66,12 @@ MigrationState *migrate_get_current(void) .bandwidth_limit = MAX_THROTTLE, .xbzrle_cache_size =
Re: [Qemu-devel] [PATCH v2 1/1] qemu-img: Add QEMU_PKGVERSION to QEMU_IMG_VERSION
Am 09.01.2015 um 16:17 hat Don Slutz geschrieben: This is the same way vl.c handles this. Signed-off-by: Don Slutz dsl...@verizon.com Thanks, applied to the block branch. Kevin
Re: [Qemu-devel] [PATCH 7/7] block/raw-posix: set max_write_zeroes to INT_MAX for regular files
Am 02.02.2015 um 14:23 schrieb Kevin Wolf: Am 30.01.2015 um 09:42 hat Denis V. Lunev geschrieben: fallocate() works fine and could handle properly with arbitrary size requests. There is no sense to reduce the amount of space to fallocate. The bigger is the size, the better is the performance as the amount of journal updates is reduced. The patch changes behavior for both generic filesystem and XFS codepaths, which are different in handle_aiocb_write_zeroes. The implementation of fallocate and xfsctl(XFS_IOC_ZERO_RANGE) for XFS are exactly the same thus the change is fine for both ways. Signed-off-by: Denis V. Lunev d...@openvz.org Reviewed-by: Max Reitz mre...@redhat.com CC: Kevin Wolf kw...@redhat.com CC: Stefan Hajnoczi stefa...@redhat.com CC: Peter Lieven p...@kamp.de CC: Fam Zheng f...@redhat.com --- block/raw-posix.c | 17 + 1 file changed, 17 insertions(+) diff --git a/block/raw-posix.c b/block/raw-posix.c index 7b42f37..933c778 100644 --- a/block/raw-posix.c +++ b/block/raw-posix.c @@ -293,6 +293,20 @@ static void raw_probe_alignment(BlockDriverState *bs, int fd, Error **errp) } } +static void raw_probe_max_write_zeroes(BlockDriverState *bs) +{ +BDRVRawState *s = bs-opaque; +struct stat st; + +if (fstat(s-fd, st) 0) { +return; /* no problem, keep default value */ +} +if (!S_ISREG(st.st_mode) || !s-discard_zeroes) { +return; +} +bs-bl.max_write_zeroes = INT_MAX; +} Peter, do you remember why INT_MAX isn't actually the default? I think the most reasonable behaviour would be that a limitation is only used if a block driver requests it, and otherwise unlimited is assumed. The default (0) actually means unlimited or undefined. We introduced that limit of 16MB in bdrv_co_write_zeroes to create only reasonable sized requests because there is no guarantee that write zeroes is a fast operation. We should set INT_MAX only if we know that write zeroes of an arbitrary size is always fast. Peter
[Qemu-devel] [PATCH v4 1/5] block: add accounting for merged requests
Signed-off-by: Peter Lieven p...@kamp.de Reviewed-by: Eric Blake ebl...@redhat.com Reviewed-by: Max Reitz mre...@redhat.com --- block.c| 2 ++ block/accounting.c | 7 +++ block/qapi.c | 2 ++ hmp.c | 6 +- include/block/accounting.h | 3 +++ qapi/block-core.json | 9 - qmp-commands.hx| 22 ++ 7 files changed, 45 insertions(+), 6 deletions(-) diff --git a/block.c b/block.c index d45e4dd..61412e9 100644 --- a/block.c +++ b/block.c @@ -4562,6 +4562,8 @@ static int multiwrite_merge(BlockDriverState *bs, BlockRequest *reqs, } } +block_acct_merge_done(bs-stats, BLOCK_ACCT_WRITE, num_reqs - outidx - 1); + return outidx + 1; } diff --git a/block/accounting.c b/block/accounting.c index 18102f0..01d594f 100644 --- a/block/accounting.c +++ b/block/accounting.c @@ -54,3 +54,10 @@ void block_acct_highest_sector(BlockAcctStats *stats, int64_t sector_num, stats-wr_highest_sector = sector_num + nb_sectors - 1; } } + +void block_acct_merge_done(BlockAcctStats *stats, enum BlockAcctType type, + int num_requests) +{ +assert(type BLOCK_MAX_IOTYPE); +stats-merged[type] += num_requests; +} diff --git a/block/qapi.c b/block/qapi.c index 75c388e..d1a8917 100644 --- a/block/qapi.c +++ b/block/qapi.c @@ -335,6 +335,8 @@ static BlockStats *bdrv_query_stats(const BlockDriverState *bs, s-stats-wr_bytes = bs-stats.nr_bytes[BLOCK_ACCT_WRITE]; s-stats-rd_operations = bs-stats.nr_ops[BLOCK_ACCT_READ]; s-stats-wr_operations = bs-stats.nr_ops[BLOCK_ACCT_WRITE]; +s-stats-rd_merged = bs-stats.merged[BLOCK_ACCT_READ]; +s-stats-wr_merged = bs-stats.merged[BLOCK_ACCT_WRITE]; s-stats-wr_highest_offset = bs-stats.wr_highest_sector * BDRV_SECTOR_SIZE; s-stats-flush_operations = bs-stats.nr_ops[BLOCK_ACCT_FLUSH]; diff --git a/hmp.c b/hmp.c index 481be80..5a10154 100644 --- a/hmp.c +++ b/hmp.c @@ -474,6 +474,8 @@ void hmp_info_blockstats(Monitor *mon, const QDict *qdict) wr_total_time_ns=% PRId64 rd_total_time_ns=% PRId64 flush_total_time_ns=% PRId64 +rd_merged=% PRId64 +wr_merged=% PRId64 \n, stats-value-stats-rd_bytes, stats-value-stats-wr_bytes, @@ -482,7 +484,9 @@ void hmp_info_blockstats(Monitor *mon, const QDict *qdict) stats-value-stats-flush_operations, stats-value-stats-wr_total_time_ns, stats-value-stats-rd_total_time_ns, - stats-value-stats-flush_total_time_ns); + stats-value-stats-flush_total_time_ns, + stats-value-stats-rd_merged, + stats-value-stats-wr_merged); } qapi_free_BlockStatsList(stats_list); diff --git a/include/block/accounting.h b/include/block/accounting.h index 50b42b3..4c406cf 100644 --- a/include/block/accounting.h +++ b/include/block/accounting.h @@ -39,6 +39,7 @@ typedef struct BlockAcctStats { uint64_t nr_bytes[BLOCK_MAX_IOTYPE]; uint64_t nr_ops[BLOCK_MAX_IOTYPE]; uint64_t total_time_ns[BLOCK_MAX_IOTYPE]; +uint64_t merged[BLOCK_MAX_IOTYPE]; uint64_t wr_highest_sector; } BlockAcctStats; @@ -53,5 +54,7 @@ void block_acct_start(BlockAcctStats *stats, BlockAcctCookie *cookie, void block_acct_done(BlockAcctStats *stats, BlockAcctCookie *cookie); void block_acct_highest_sector(BlockAcctStats *stats, int64_t sector_num, unsigned int nb_sectors); +void block_acct_merge_done(BlockAcctStats *stats, enum BlockAcctType type, + int num_requests); #endif diff --git a/qapi/block-core.json b/qapi/block-core.json index 80984d1..b7d9772 100644 --- a/qapi/block-core.json +++ b/qapi/block-core.json @@ -407,13 +407,20 @@ # growable sparse files (like qcow2) that are used on top # of a physical device. # +# @rd_merged: Number of read requests that have been merged into another +# request (Since 2.3). +# +# @wr_merged: Number of write requests that have been merged into another +# request (Since 2.3). +# # Since: 0.14.0 ## { 'type': 'BlockDeviceStats', 'data': {'rd_bytes': 'int', 'wr_bytes': 'int', 'rd_operations': 'int', 'wr_operations': 'int', 'flush_operations': 'int', 'flush_total_time_ns': 'int', 'wr_total_time_ns': 'int', - 'rd_total_time_ns': 'int', 'wr_highest_offset': 'int' } } + 'rd_total_time_ns': 'int', 'wr_highest_offset': 'int', + 'rd_merged': 'int', 'wr_merged': 'int' } } ## # @BlockStats: diff --git a/qmp-commands.hx b/qmp-commands.hx index c5f16dd..af3fd19 100644 --- a/qmp-commands.hx +++ b/qmp-commands.hx
Re: [Qemu-devel] [PATCH] qed: Really remove unused field QEDAIOCB.finished
Am 28.01.2015 um 02:51 hat Fam Zheng geschrieben: The commit 533ffb17a that removed qed_aiocb_info.cancel said to remove this but didn't do it. Signed-off-by: Fam Zheng f...@redhat.com Thanks, applied to the block branch. Kevin
[Qemu-devel] [PATCH v4 5/5] virtio-blk: add a knob to disable request merging
this adds a knob to disable request merging for debugging or benchmarks if dedired. Signed-off-by: Peter Lieven p...@kamp.de --- hw/block/virtio-blk.c | 5 - include/hw/virtio/virtio-blk.h | 1 + 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c index d0a01a8..8c51a29 100644 --- a/hw/block/virtio-blk.c +++ b/hw/block/virtio-blk.c @@ -531,7 +531,8 @@ void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb) /* merge would exceed maximum number of requests or IO direction * changes */ if (mrb-num_reqs 0 (mrb-num_reqs == VIRTIO_BLK_MAX_MERGE_REQS || - is_write != mrb-is_write)) { + is_write != mrb-is_write || + !req-dev-conf.request_merging)) { virtio_blk_submit_multireq(req-dev-blk, mrb); } @@ -950,6 +951,8 @@ static Property virtio_blk_properties[] = { #ifdef __linux__ DEFINE_PROP_BIT(scsi, VirtIOBlock, conf.scsi, 0, true), #endif +DEFINE_PROP_BIT(request-merging, VirtIOBlock, conf.request_merging, 0, +true), DEFINE_PROP_BIT(x-data-plane, VirtIOBlock, conf.data_plane, 0, false), DEFINE_PROP_END_OF_LIST(), }; diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h index 2027a64..fc7d311 100644 --- a/include/hw/virtio/virtio-blk.h +++ b/include/hw/virtio/virtio-blk.h @@ -113,6 +113,7 @@ struct VirtIOBlkConf uint32_t scsi; uint32_t config_wce; uint32_t data_plane; +uint32_t request_merging; }; struct VirtIOBlockDataPlane; -- 1.9.1
Re: [Qemu-devel] [PATCH 7/7] block/raw-posix: set max_write_zeroes to INT_MAX for regular files
Am 02.02.2015 um 14:55 hat Peter Lieven geschrieben: Am 02.02.2015 um 14:23 schrieb Kevin Wolf: Am 30.01.2015 um 09:42 hat Denis V. Lunev geschrieben: fallocate() works fine and could handle properly with arbitrary size requests. There is no sense to reduce the amount of space to fallocate. The bigger is the size, the better is the performance as the amount of journal updates is reduced. The patch changes behavior for both generic filesystem and XFS codepaths, which are different in handle_aiocb_write_zeroes. The implementation of fallocate and xfsctl(XFS_IOC_ZERO_RANGE) for XFS are exactly the same thus the change is fine for both ways. Signed-off-by: Denis V. Lunev d...@openvz.org Reviewed-by: Max Reitz mre...@redhat.com CC: Kevin Wolf kw...@redhat.com CC: Stefan Hajnoczi stefa...@redhat.com CC: Peter Lieven p...@kamp.de CC: Fam Zheng f...@redhat.com --- block/raw-posix.c | 17 + 1 file changed, 17 insertions(+) diff --git a/block/raw-posix.c b/block/raw-posix.c index 7b42f37..933c778 100644 --- a/block/raw-posix.c +++ b/block/raw-posix.c @@ -293,6 +293,20 @@ static void raw_probe_alignment(BlockDriverState *bs, int fd, Error **errp) } } +static void raw_probe_max_write_zeroes(BlockDriverState *bs) +{ +BDRVRawState *s = bs-opaque; +struct stat st; + +if (fstat(s-fd, st) 0) { +return; /* no problem, keep default value */ +} +if (!S_ISREG(st.st_mode) || !s-discard_zeroes) { +return; +} +bs-bl.max_write_zeroes = INT_MAX; +} Peter, do you remember why INT_MAX isn't actually the default? I think the most reasonable behaviour would be that a limitation is only used if a block driver requests it, and otherwise unlimited is assumed. The default (0) actually means unlimited or undefined. We introduced that limit of 16MB in bdrv_co_write_zeroes to create only reasonable sized requests because there is no guarantee that write zeroes is a fast operation. We should set INT_MAX only if we know that write zeroes of an arbitrary size is always fast. Well, splitting it up doesn't make it any faster. I think we can assume that drv-bdrv_co_write_zeroes() wants to know the full request size unless the driver has explicitly set bs-bl.max_write_zeroes. Only if we go on emulating the operation with a zero-filled buffer, I understand that we might need to split it up so that our bounce buffer doesn't become huge. Kevin
Re: [Qemu-devel] [RFC PATCH v8 08/21] cpu: replay instructions sequence
From: Paolo Bonzini [mailto:pbonz...@redhat.com] On 02/02/2015 13:28, Pavel Dovgaluk wrote: cpu-exception_index = EXCP_INTERRUPT; next_tb = 0; +qemu_notify_event(); Why is this needed? It is needed to wake up iothread in replay mode. Otherwise it waits for additional time and replay becomes too slow. What event (something from a timerlist?) is ready, that has not been notified to the iothread yet? qemu_notify_event() should never be needed in common case. It's probably missing somewhere else. I think in this case there are no events at all - just reading timers values that were made while recording. We have to replay these reads by waking iothread. Pavel Dovgalyuk
Re: [Qemu-devel] [PATCH v6 0/7] eliminate data write in bdrv_write_zeroes on Linux in raw-posix.c
Am 30.01.2015 um 09:42 hat Denis V. Lunev geschrieben: I have performed several tests with non-aligned fallocate calls and in all cases (with non-aligned fallocates) Linux performs fine, i.e. areas are zeroed correctly. Checks were made on Linux 3.16.0-28-generic #38-Ubuntu SMP This should seriously increase performance of bdrv_write_zeroes Thanks, applied all to the block branch. Note that I'm hoping to remove patch 7/7 from the queue again if we come to the conclusion that changing the global default instead makes more sense. Kevin
Re: [Qemu-devel] [PATCH v2 1/4] qemu-io: initialize progname with error_set_progname()
At Mon, 2 Feb 2015 12:36:00 +0100, Kevin Wolf wrote: Am 02.02.2015 um 10:51 hat Hitoshi Mitake geschrieben: At Thu, 22 Jan 2015 18:08:11 +0900, Hitoshi Mitake wrote: Calling error_get_progname() in the context of qemu-io can cause segmentation fault because qemu-io doesn't initialize its progname with error_set_progname(). This patch adds the initialization. Currently, the missing call of error_set_progname() doesn't cause any problems because qemu-io doesn't use error_get_progname(). This patch is a proactive action. Cc: Kevin Wolf kw...@redhat.com Cc: Stefan Hajnoczi stefa...@redhat.com Cc: Markus Armbruster arm...@redhat.com Signed-off-by: Hitoshi Mitake mitake.hito...@lab.ntt.co.jp --- qemu-io.c | 1 + 1 file changed, 1 insertion(+) Hi Kevin, Stefan, could you pick this patch? sheepdog driver has a pending patch [1] which depends on it. I believe at least qemu-io should call error_set_progname() because block drivers can use the qemu-error infrastructure. So are you planning to get the individual patches merged on their own instead of the whole series as one? I thought this patchset can be proactive fix. So I'm separating the patch for sheepdog driver from this one. [1] https://github.com/sheepdog/qemu/commit/a95c35e606a2a189e7dbaf645277c5c306b01e4b That patch looks wrong. Nobody guarantees that qemu-io is really called qemu-io. The user can name their binaries as they want and create symlinks with any name, and indeed names for qemu-img used by some distributions include 'qemu-img-kvm' and 'kvm-img'. You need to find a different way than checking binary file names. I couldn't consider about the case, thanks for pointing! I'll seek other method for distinguishing binary files. Thanks, Hitoshi Kevin
Re: [Qemu-devel] [PATCH RFC v6 18/20] virtio: support revision-specific features
On Sun, 1 Feb 2015 23:29:20 +0200 Michael S. Tsirkin m...@redhat.com wrote: On Fri, Jan 30, 2015 at 03:08:08PM +0100, Cornelia Huck wrote: On Wed, 7 Jan 2015 21:10:07 +0200 Michael S. Tsirkin m...@redhat.com wrote: On Wed, Jan 07, 2015 at 05:22:32PM +0100, Cornelia Huck wrote: On Sun, 28 Dec 2014 10:32:06 +0200 Michael S. Tsirkin m...@redhat.com wrote: On Thu, Dec 11, 2014 at 02:25:20PM +0100, Cornelia Huck wrote: Devices may support different sets of feature bits depending on which revision they're operating at. Let's give the transport a way to re-query the device about its features when the revision has been changed. Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com So now we have both get_features and get_features_rev, and it's never clear which revision does host_features refer to. IMHO that's just too messy. Let's add get_legacy_features and host_legacy_features instead? I wanted to avoid touching anything that does not support version 1. And this interface might still work for later revisions, no? We can add _modern_ then, or rename host_features to host_legacy_features everywhere as preparation. OK, I've ditched the don't modify old stuff goal and introduced -get_features_legacy(). For now, devices will add VERSION_1 in their -get_features() callback when they support it. (For many devices, this will be the only difference between the two callbacks.) Two sets of host_features don't make much sense to me. It's the most natural implementation given that some features are only set for legacy and some (will be) only for modern interfaces. We may be talking about different things: There's the host_features field which is referenced by the different device types, and there are the sets of features for legacy or modern devices. These are currently set dynamically in the -get_features() callbacks, although we could possibly convert them to pre-set values like in the kernel. Then I agree, two sets would be natural. But I'd rather stick with the current method, if only to avoid churn.
Re: [Qemu-devel] [PATCH 7/7] block/raw-posix: set max_write_zeroes to INT_MAX for regular files
Am 02.02.2015 um 15:04 schrieb Kevin Wolf: Am 02.02.2015 um 14:55 hat Peter Lieven geschrieben: Am 02.02.2015 um 14:23 schrieb Kevin Wolf: Am 30.01.2015 um 09:42 hat Denis V. Lunev geschrieben: fallocate() works fine and could handle properly with arbitrary size requests. There is no sense to reduce the amount of space to fallocate. The bigger is the size, the better is the performance as the amount of journal updates is reduced. The patch changes behavior for both generic filesystem and XFS codepaths, which are different in handle_aiocb_write_zeroes. The implementation of fallocate and xfsctl(XFS_IOC_ZERO_RANGE) for XFS are exactly the same thus the change is fine for both ways. Signed-off-by: Denis V. Lunev d...@openvz.org Reviewed-by: Max Reitz mre...@redhat.com CC: Kevin Wolf kw...@redhat.com CC: Stefan Hajnoczi stefa...@redhat.com CC: Peter Lieven p...@kamp.de CC: Fam Zheng f...@redhat.com --- block/raw-posix.c | 17 + 1 file changed, 17 insertions(+) diff --git a/block/raw-posix.c b/block/raw-posix.c index 7b42f37..933c778 100644 --- a/block/raw-posix.c +++ b/block/raw-posix.c @@ -293,6 +293,20 @@ static void raw_probe_alignment(BlockDriverState *bs, int fd, Error **errp) } } +static void raw_probe_max_write_zeroes(BlockDriverState *bs) +{ +BDRVRawState *s = bs-opaque; +struct stat st; + +if (fstat(s-fd, st) 0) { +return; /* no problem, keep default value */ +} +if (!S_ISREG(st.st_mode) || !s-discard_zeroes) { +return; +} +bs-bl.max_write_zeroes = INT_MAX; +} Peter, do you remember why INT_MAX isn't actually the default? I think the most reasonable behaviour would be that a limitation is only used if a block driver requests it, and otherwise unlimited is assumed. The default (0) actually means unlimited or undefined. We introduced that limit of 16MB in bdrv_co_write_zeroes to create only reasonable sized requests because there is no guarantee that write zeroes is a fast operation. We should set INT_MAX only if we know that write zeroes of an arbitrary size is always fast. Well, splitting it up doesn't make it any faster. I think we can assume that drv-bdrv_co_write_zeroes() wants to know the full request size unless the driver has explicitly set bs-bl.max_write_zeroes. You mean sth like this: diff --git a/block.c b/block.c index 61412e9..8272ef9 100644 --- a/block.c +++ b/block.c @@ -3192,10 +3192,7 @@ int coroutine_fn bdrv_co_copy_on_readv(BlockDriverState *bs, BDRV_REQ_COPY_ON_READ); } -/* if no limit is specified in the BlockLimits use a default - * of 32768 512-byte sectors (16 MiB) per request. - */ -#define MAX_WRITE_ZEROES_DEFAULT 32768 +#define MAX_WRITE_ZEROES_BOUNCE_BUFFER 32768 static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs, int64_t sector_num, int nb_sectors, BdrvRequestFlags flags) @@ -3206,7 +3203,7 @@ static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs, int ret = 0; int max_write_zeroes = bs-bl.max_write_zeroes ? - bs-bl.max_write_zeroes : MAX_WRITE_ZEROES_DEFAULT; + bs-bl.max_write_zeroes : INT_MAX; while (nb_sectors 0 !ret) { int num = nb_sectors; @@ -3242,7 +3239,7 @@ static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs, if (ret == -ENOTSUP) { /* Fall back to bounce buffer if write zeroes is unsupported */ int max_xfer_len = MIN_NON_ZERO(bs-bl.max_transfer_length, - MAX_WRITE_ZEROES_DEFAULT); + MAX_WRITE_ZEROES_BOUNCE_BUFFER); num = MIN(num, max_xfer_len); iov.iov_len = num * BDRV_SECTOR_SIZE; if (iov.iov_base == NULL) { @@ -5099,11 +5096,6 @@ static void coroutine_fn bdrv_discard_co_entry(void *opaque) rwco-ret = bdrv_co_discard(rwco-bs, rwco-sector_num, rwco-nb_sectors); } -/* if no limit is specified in the BlockLimits use a default - * of 32768 512-byte sectors (16 MiB) per request. - */ -#define MAX_DISCARD_DEFAULT 32768 - int coroutine_fn bdrv_co_discard(BlockDriverState *bs, int64_t sector_num, int nb_sectors) { @@ -5128,7 +5120,7 @@ int coroutine_fn bdrv_co_discard(BlockDriverState *bs, int64_t sector_num, return 0; } -max_discard = bs-bl.max_discard ? bs-bl.max_discard : MAX_DISCARD_DEFAULT; +max_discard = bs-bl.max_discard ? bs-bl.max_discard : INT_MAX; while (nb_sectors 0) { int ret; int num = nb_sectors; Peter
Re: [Qemu-devel] [PATCH V11 2/3] i386: Add a Virtual Machine Generation ID device
On Sun, 01 Feb 2015 14:56:26 +0200 Gal Hammer gham...@redhat.com wrote: On 22/01/2015 15:52, Igor Mammedov wrote: On Tue, 16 Dec 2014 17:50:43 +0200 Gal Hammer gham...@redhat.com wrote: Based on Microsoft's sepecifications (paper can be dowloaded from http://go.microsoft.com/fwlink/?LinkId=260709), add a device description to the SSDT ACPI table and its implementation. The GUID is set using a global vmgenid.uuid parameter. Signed-off-by: Gal Hammer gham...@redhat.com --- a/hw/i386/acpi-build.c +++ b/hw/i386/acpi-build.c @@ -257,6 +257,7 @@ static void acpi_get_pci_info(PcPciInfo *info) #define ACPI_BUILD_TABLE_FILE etc/acpi/tables #define ACPI_BUILD_RSDP_FILE etc/acpi/rsdp #define ACPI_BUILD_TPMLOG_FILE etc/tpm/log +#define ACPI_BUILD_VMGENID_FILE etc/vm-generation-id static void build_header(GArray *linker, GArray *table_data, @@ -1068,6 +1069,8 @@ build_ssdt(GArray *table_data, GArray *linker, { MachineState *machine = MACHINE(qdev_get_machine()); uint32_t nr_mem = machine-ram_slots; +uint32_t vm_gid_physical_address; +uint32_t vm_gid_offset = 0; unsigned acpi_cpus = guest_info-apic_id_limit; int ssdt_start = table_data-len; uint8_t *ssdt_ptr; @@ -1096,6 +1099,21 @@ build_ssdt(GArray *table_data, GArray *linker, ACPI_BUILD_SET_LE(ssdt_ptr, sizeof(ssdp_misc_aml), ssdt_isa_pest[0], 16, misc-pvpanic_port); +if (vm_generation_id_set()) { +vm_gid_physical_address = ssdt_start + ssdt_acpi_vm_gid_addr[0]; +bios_linker_loader_alloc(linker, ACPI_BUILD_VMGENID_FILE, 8, true); +bios_linker_loader_add_pointer(linker, ACPI_BUILD_VMGENID_FILE, + ACPI_BUILD_TABLE_FILE, + table_data, + vm_gid_offset, + sizeof(vm_gid_offset)); could some explain how this pointer magic works, I can try, but don't you think that a magic is gone once explained? ;-) From my weak understanding it seems broken. Lets see: [1] vm_gid_offset - must be pointer inside of dest_file blob (ACPI_BUILD_VMGENID_FILE) [2] vm_gid_offset - should hold offset of the place inside of src_file (ACPI_BUILD_TABLE_FILE) where to pointer inside of dest_file should point to The vm_gid_offset should point where in the ACPI_BUILD_VMGENID_FILE the VM's GUID is stored. At the moment, it should always be zero because the GUID is stored at the begging of the ACPI_BUILD_VMGENID_FILE. now: vm_gid_physical_address - holds [2] i.e. offset of VGIA constant in inside SSDT in ACPI_BUILD_TABLE_FILE. +ACPI_BUILD_SET_LE(ssdt_ptr, sizeof(ssdp_misc_aml), + ssdt_acpi_vm_gid_addr[0], 32, vm_gid_physical_address); Then we write this offset into VGIA in ACPI_BUILD_TABLE_FILE. Yes. This offset is later patched by the linker to the full physical address. After BIOS loads tables it's going to patch at [3] ACPI_BUILD_VMGENID_FILE + (vm_gid_offset - table_data-data) /* only god knows where it will be/ and on top of it write in it value: *(ACPI_BUILD_TABLE_FILE + *[3]) We know exactly where it is, no need to call for god's help :-). This approach in general of patching arbitrary place in AML blob to get PHY addr of buffer with UUID, is quite a hack, especially in light of that we are trying to hide all direct access to AML blobs with related pointer arithmetic and manual patching. Why not reserve some potion of RAM and pass to BIOS/guest a reservation so it won't be part of AddressRangeMemory or AddressRangeACPI as MS spec requires? Then you won't need jump all above hoops to just get buffer's PHY addr. I'll be glad to hear a new idea that I didn't already try in one of other previous patches. The problem is that the specification requires working with a physical address, so it must be allocated from inside the guest. Since the OS is not exist in this stage and I also don't want to write a special driver just to allocate this buffer I had to choose this approach. how about creating device which will map 4K MMIO region in PCI hole address space and passing it as a reservation via e820 table we have in QEMU. Then address could be directly built in ACPI tables as constant value at the time of ACPI tables creation. That way it would be possible to get address of buffer without firmware + guest OS doing anything and going through quite complex chain for getting buffer address (qemu-bios-OSPM-qemu). If you go current route, it would be needed to teach linker a new command to make reservation in E820 so that allocated buffer won't be part of of AddressRangeMemory as required by spec or anything else. Which would make already hard to understand/use correctly linker API even more complex.
Re: [Qemu-devel] [v2][PATCH] libxl: add one machine property to support IGD GFX passthrough
Wei Liu writes (Re: [v2][PATCH] libxl: add one machine property to support IGD GFX passthrough): On Mon, Feb 02, 2015 at 09:17:23AM +0800, Tiejun Chen wrote: When we're working to support IGD GFX passthrough with qemu upstream, instead of -gfx_passthru we'd like to make that a machine option, -machine xxx,-igd-passthru=on. This need to bring a change on tool side. ... My suggestion has one premise -- if upstream QEMU has already released that -gfx_passthru option. If there is no old one (in upstream QEMU) at all, then there is nothing to keep and deprecate. I think the commit message of the xen.git commit should explain what options are supported by which versions of qemu (including qemu upstream's future plans). That would provide (a) something which summarises the communication etc. with qemu upstream and can be checked with them if necessary and (b) something against which the libxl changes can be easily judged. Thanks, Ian.
Re: [Qemu-devel] [RFC 09/10] cpu: remove exit_request global.
On 2 February 2015 at 10:03, Paolo Bonzini pbonz...@redhat.com wrote: You can't do this -- this code is a signal handler so it could get run at any time including while the list of CPUs is being updated. (This is why we have the exit_request flag in the first place rather than just setting the exit_request flag in each CPU...) Actually you can do this if you are careful. In particular, you can do it while you are under the big QEMU lock. ...but this is a signal handler, so we can't guarantee that the thread holds the big lock. If you are not, basically you have to treat the CPU list as RCU-protected, and this is doable because the CPU object cannot be removed and added back into the CPU list. Unfortunately RCU doesn't support QTAILQ, at least not yet, so you'd have to convert the CPU list to QLIST. But the basic idea of this patch can be done. ...and if we can iterate over CPU lists in signal handlers, the correct approach is probably to use the existing exit_request flag rather than adding another one. (Needs investigation to check and document semantics of that flag.) -- PMM
Re: [Qemu-devel] [PATCH V11 2/3] i386: Add a Virtual Machine Generation ID device
On 02/02/2015 14:46, Igor Mammedov wrote: On Sun, 01 Feb 2015 14:56:26 +0200 Gal Hammer gham...@redhat.com wrote: On 22/01/2015 15:52, Igor Mammedov wrote: On Tue, 16 Dec 2014 17:50:43 +0200 Gal Hammer gham...@redhat.com wrote: Based on Microsoft's sepecifications (paper can be dowloaded from http://go.microsoft.com/fwlink/?LinkId=260709), add a device description to the SSDT ACPI table and its implementation. The GUID is set using a global vmgenid.uuid parameter. Signed-off-by: Gal Hammer gham...@redhat.com --- a/hw/i386/acpi-build.c +++ b/hw/i386/acpi-build.c @@ -257,6 +257,7 @@ static void acpi_get_pci_info(PcPciInfo *info) #define ACPI_BUILD_TABLE_FILE etc/acpi/tables #define ACPI_BUILD_RSDP_FILE etc/acpi/rsdp #define ACPI_BUILD_TPMLOG_FILE etc/tpm/log +#define ACPI_BUILD_VMGENID_FILE etc/vm-generation-id static void build_header(GArray *linker, GArray *table_data, @@ -1068,6 +1069,8 @@ build_ssdt(GArray *table_data, GArray *linker, { MachineState *machine = MACHINE(qdev_get_machine()); uint32_t nr_mem = machine-ram_slots; +uint32_t vm_gid_physical_address; +uint32_t vm_gid_offset = 0; unsigned acpi_cpus = guest_info-apic_id_limit; int ssdt_start = table_data-len; uint8_t *ssdt_ptr; @@ -1096,6 +1099,21 @@ build_ssdt(GArray *table_data, GArray *linker, ACPI_BUILD_SET_LE(ssdt_ptr, sizeof(ssdp_misc_aml), ssdt_isa_pest[0], 16, misc-pvpanic_port); +if (vm_generation_id_set()) { +vm_gid_physical_address = ssdt_start + ssdt_acpi_vm_gid_addr[0]; +bios_linker_loader_alloc(linker, ACPI_BUILD_VMGENID_FILE, 8, true); +bios_linker_loader_add_pointer(linker, ACPI_BUILD_VMGENID_FILE, + ACPI_BUILD_TABLE_FILE, + table_data, + vm_gid_offset, + sizeof(vm_gid_offset)); could some explain how this pointer magic works, I can try, but don't you think that a magic is gone once explained? ;-) From my weak understanding it seems broken. Lets see: [1] vm_gid_offset - must be pointer inside of dest_file blob (ACPI_BUILD_VMGENID_FILE) [2] vm_gid_offset - should hold offset of the place inside of src_file (ACPI_BUILD_TABLE_FILE) where to pointer inside of dest_file should point to The vm_gid_offset should point where in the ACPI_BUILD_VMGENID_FILE the VM's GUID is stored. At the moment, it should always be zero because the GUID is stored at the begging of the ACPI_BUILD_VMGENID_FILE. now: vm_gid_physical_address - holds [2] i.e. offset of VGIA constant in inside SSDT in ACPI_BUILD_TABLE_FILE. +ACPI_BUILD_SET_LE(ssdt_ptr, sizeof(ssdp_misc_aml), + ssdt_acpi_vm_gid_addr[0], 32, vm_gid_physical_address); Then we write this offset into VGIA in ACPI_BUILD_TABLE_FILE. Yes. This offset is later patched by the linker to the full physical address. After BIOS loads tables it's going to patch at [3] ACPI_BUILD_VMGENID_FILE + (vm_gid_offset - table_data-data) /* only god knows where it will be/ and on top of it write in it value: *(ACPI_BUILD_TABLE_FILE + *[3]) We know exactly where it is, no need to call for god's help :-). This approach in general of patching arbitrary place in AML blob to get PHY addr of buffer with UUID, is quite a hack, especially in light of that we are trying to hide all direct access to AML blobs with related pointer arithmetic and manual patching. Why not reserve some potion of RAM and pass to BIOS/guest a reservation so it won't be part of AddressRangeMemory or AddressRangeACPI as MS spec requires? Then you won't need jump all above hoops to just get buffer's PHY addr. I'll be glad to hear a new idea that I didn't already try in one of other previous patches. The problem is that the specification requires working with a physical address, so it must be allocated from inside the guest. Since the OS is not exist in this stage and I also don't want to write a special driver just to allocate this buffer I had to choose this approach. how about creating device which will map 4K MMIO region in PCI hole address space and passing it as a reservation via e820 table we have in QEMU. Then address could be directly built in ACPI tables as constant value at the time of ACPI tables creation. Isn't this will cause a VMEXIT when the guest is reading the GUID? If it is then this idea was already presented and Michael didn't approve it. That way it would be possible to get address of buffer without firmware + guest OS doing anything and going through quite complex chain for getting buffer address (qemu-bios-OSPM-qemu). If you go current route, it would be needed to teach linker a new command to make reservation in E820 so that allocated buffer won't be part of of AddressRangeMemory as required by spec or anything else. Which would make already hard to
Re: [Qemu-devel] [RFC PATCH v8 09/21] replay: interrupts and exceptions
From: Paolo Bonzini [mailto:pbonz...@redhat.com] On 22/01/2015 09:52, Pavel Dovgalyuk wrote: +if (replay_mode == REPLAY_MODE_RECORD) { +replay_save_instructions(); +replay_put_event(EVENT_EXCEPTION); +return true; Missing mutex lock/unlock. I think not. We just have to write number of already executed instructions. This number is not linked to exception event. They could be read in replay while processing some other event. @@ -1294,6 +1295,9 @@ bool x86_cpu_exec_interrupt(CPUState *cs, int interrupt_request) if (interrupt_request CPU_INTERRUPT_POLL) { cs-interrupt_request = ~CPU_INTERRUPT_POLL; apic_poll_irq(cpu-apic_state); +if (replay_mode != REPLAY_MODE_NONE) { +return true; +} } #endif Can you explain this? It probably needs a comment. Each function call should process one interrupt at once, because it corresponds to single event in the log. Pavel Dovgalyuk
[Qemu-devel] [PATCH v4 4/5] virtio-blk: introduce multiread
this patch finally introduces multiread support to virtio-blk. While multiwrite support was there for a long time, read support was missing. The complete merge logic is moved into virtio-blk.c which has been the only user of request merging ever since. This is required to be able to merge chunks of requests and immediately invoke callbacks for those requests. Secondly, this is required to switch to direct invocation of coroutines which is planned at a later stage. The following benchmarks show the performance of running fio with 4 worker threads on a local ram disk. The numbers show the average of 10 test runs after 1 run as warmup phase. |4k| 64k|4k MB/s | rd seq | rd rand | rd seq | rd rand | wr seq | wr rand --++-++-++ master| 1221 | 1187| 4178 | 4114| 1745 | 1213 multiread | 1829 | 1189| 4639 | 4110| 1894 | 1216 Signed-off-by: Peter Lieven p...@kamp.de --- hw/block/dataplane/virtio-blk.c | 8 +- hw/block/virtio-blk.c | 296 +++- include/hw/virtio/virtio-blk.h | 19 +-- trace-events| 1 + 4 files changed, 218 insertions(+), 106 deletions(-) diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c index 39c5d71..be957d1 100644 --- a/hw/block/dataplane/virtio-blk.c +++ b/hw/block/dataplane/virtio-blk.c @@ -96,9 +96,7 @@ static void handle_notify(EventNotifier *e) event_notifier_test_and_clear(s-host_notifier); blk_io_plug(s-conf-conf.blk); for (;;) { -MultiReqBuffer mrb = { -.num_writes = 0, -}; +MultiReqBuffer mrb = {}; int ret; /* Disable guest-host notifies to avoid unnecessary vmexits */ @@ -120,7 +118,9 @@ static void handle_notify(EventNotifier *e) virtio_blk_handle_request(req, mrb); } -virtio_submit_multiwrite(s-conf-conf.blk, mrb); +if (mrb.num_reqs) { +virtio_blk_submit_multireq(s-conf-conf.blk, mrb); +} if (likely(ret == -EAGAIN)) { /* vring emptied */ /* Re-enable guest-host notifies and stop processing the vring. diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c index e04adb8..d0a01a8 100644 --- a/hw/block/virtio-blk.c +++ b/hw/block/virtio-blk.c @@ -34,6 +34,7 @@ VirtIOBlockReq *virtio_blk_alloc_request(VirtIOBlock *s) req-dev = s; req-qiov.size = 0; req-next = NULL; +req-mr_next = NULL; return req; } @@ -84,20 +85,32 @@ static int virtio_blk_handle_rw_error(VirtIOBlockReq *req, int error, static void virtio_blk_rw_complete(void *opaque, int ret) { -VirtIOBlockReq *req = opaque; +VirtIOBlockReq *next = opaque; -trace_virtio_blk_rw_complete(req, ret); +while (next) { +VirtIOBlockReq *req = next; +next = req-mr_next; +trace_virtio_blk_rw_complete(req, ret); -if (ret) { -int p = virtio_ldl_p(VIRTIO_DEVICE(req-dev), req-out.type); -bool is_read = !(p VIRTIO_BLK_T_OUT); -if (virtio_blk_handle_rw_error(req, -ret, is_read)) -return; -} +if (req-qiov.nalloc != -1) { +/* If nalloc is != 1 req-qiov is a local copy of the original + * external iovec. It was allocated in submit_merged_requests + * to be able to merge requests. */ +qemu_iovec_destroy(req-qiov); +} -virtio_blk_req_complete(req, VIRTIO_BLK_S_OK); -block_acct_done(blk_get_stats(req-dev-blk), req-acct); -virtio_blk_free_request(req); +if (ret) { +int p = virtio_ldl_p(VIRTIO_DEVICE(req-dev), req-out.type); +bool is_read = !(p VIRTIO_BLK_T_OUT); +if (virtio_blk_handle_rw_error(req, -ret, is_read)) { +continue; +} +} + +virtio_blk_req_complete(req, VIRTIO_BLK_S_OK); +block_acct_done(blk_get_stats(req-dev-blk), req-acct); +virtio_blk_free_request(req); +} } static void virtio_blk_flush_complete(void *opaque, int ret) @@ -291,24 +304,127 @@ static void virtio_blk_handle_scsi(VirtIOBlockReq *req) } } -void virtio_submit_multiwrite(BlockBackend *blk, MultiReqBuffer *mrb) +static inline void submit_requests(BlockBackend *blk, MultiReqBuffer *mrb, + int start, int num_reqs, int niov) { -int i, ret; +QEMUIOVector *qiov = mrb-reqs[start]-qiov; +int64_t sector_num = mrb-reqs[start]-sector_num; +int nb_sectors = mrb-reqs[start]-qiov.size / BDRV_SECTOR_SIZE; +bool is_write = mrb-is_write; + +if (num_reqs 1) { +int i; +struct iovec *tmp_iov = qiov-iov; +int tmp_niov = qiov-niov; + +/* mrb-reqs[start]-qiov was initialized from external so we can't + * modifiy it here. We need to initialize it locally and then add the +
[Qemu-devel] [PATCH v4 2/5] hw/virtio-blk: add a constant for max number of merged requests
As it was not obvious (at least for me) where the 32 comes from; add a constant for it. Signed-off-by: Peter Lieven p...@kamp.de Reviewed-by: Eric Blake ebl...@redhat.com Reviewed-by: Max Reitz mre...@redhat.com --- hw/block/virtio-blk.c | 2 +- include/hw/virtio/virtio-blk.h | 4 +++- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c index 4032fca..e04adb8 100644 --- a/hw/block/virtio-blk.c +++ b/hw/block/virtio-blk.c @@ -360,7 +360,7 @@ static void virtio_blk_handle_write(VirtIOBlockReq *req, MultiReqBuffer *mrb) block_acct_start(blk_get_stats(req-dev-blk), req-acct, req-qiov.size, BLOCK_ACCT_WRITE); -if (mrb-num_writes == 32) { +if (mrb-num_writes == VIRTIO_BLK_MAX_MERGE_REQS) { virtio_submit_multiwrite(req-dev-blk, mrb); } diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h index 4652b70..6ef3fa5 100644 --- a/include/hw/virtio/virtio-blk.h +++ b/include/hw/virtio/virtio-blk.h @@ -134,8 +134,10 @@ typedef struct VirtIOBlock { struct VirtIOBlockDataPlane *dataplane; } VirtIOBlock; +#define VIRTIO_BLK_MAX_MERGE_REQS 32 + typedef struct MultiReqBuffer { -BlockRequestblkreq[32]; +BlockRequestblkreq[VIRTIO_BLK_MAX_MERGE_REQS]; unsigned intnum_writes; } MultiReqBuffer; -- 1.9.1
Re: [Qemu-devel] [PATCH] tun: orphan an skb on tx
On Mon, Feb 02, 2015 at 07:27:10AM +, David Woodhouse wrote: On Sun, 2015-02-01 at 21:07 -0800, David Miller wrote: We might as well have not have implemented the IPSEC stack at all, because as a result of the userland VPN stuff our IPSEC stack is largely unused except by a very narrow group of users. Well, I'd love to make better use of it if I can. I do suspect it makes most sense for userspace to continue to manage the probing of UDP connectivity, and the fallback to TCP mode — and I suspect it also makes sense to continue to use tun for passing packets up to the VPN client when it's using the TCP transport. So the question would be how we handle redirecting the packet flow to the optional UDP transport, when the VPN client determines that it's available. For the sake of the user setting up firewall and routing rules, I do think it's important that it continues to appear to userspace as the *same* device for the entire lifetime of the session, regardless of which transport the packets happen to be using at a given moment in time. It doesn't *have* to be tun, though. You don't seem to like my suggestion of somehow pushing down an XFRM state to the tun device to direct the packets out there instead of up to userspace. Do you have an alternative suggestion... or a specific concern that would help me come up with something you like better? Maybe you want to use a virtual tunnel interface (vti) what we have already. Everything that is routed through such an interface is guaranteed to be either encrypted if a matching xfrm state is present or dropped. Same on the rceive side, everything that is received by this interface is guaranteed to be IPsec processed. So you can do a routing based decision about the IPsec processing. While I'm sure it could handle the ESP in UDP encapsulation, I'm not that sure about your TCP fallback because this requires a valid xfrm state to allow packets to pass. Using the same interface for both is probably not possible.
Re: [Qemu-devel] [PATCH 7/7] block/raw-posix: set max_write_zeroes to INT_MAX for regular files
Am 02.02.2015 um 15:12 hat Peter Lieven geschrieben: Am 02.02.2015 um 15:04 schrieb Kevin Wolf: Am 02.02.2015 um 14:55 hat Peter Lieven geschrieben: Am 02.02.2015 um 14:23 schrieb Kevin Wolf: Am 30.01.2015 um 09:42 hat Denis V. Lunev geschrieben: fallocate() works fine and could handle properly with arbitrary size requests. There is no sense to reduce the amount of space to fallocate. The bigger is the size, the better is the performance as the amount of journal updates is reduced. The patch changes behavior for both generic filesystem and XFS codepaths, which are different in handle_aiocb_write_zeroes. The implementation of fallocate and xfsctl(XFS_IOC_ZERO_RANGE) for XFS are exactly the same thus the change is fine for both ways. Signed-off-by: Denis V. Lunev d...@openvz.org Reviewed-by: Max Reitz mre...@redhat.com CC: Kevin Wolf kw...@redhat.com CC: Stefan Hajnoczi stefa...@redhat.com CC: Peter Lieven p...@kamp.de CC: Fam Zheng f...@redhat.com --- block/raw-posix.c | 17 + 1 file changed, 17 insertions(+) diff --git a/block/raw-posix.c b/block/raw-posix.c index 7b42f37..933c778 100644 --- a/block/raw-posix.c +++ b/block/raw-posix.c @@ -293,6 +293,20 @@ static void raw_probe_alignment(BlockDriverState *bs, int fd, Error **errp) } } +static void raw_probe_max_write_zeroes(BlockDriverState *bs) +{ +BDRVRawState *s = bs-opaque; +struct stat st; + +if (fstat(s-fd, st) 0) { +return; /* no problem, keep default value */ +} +if (!S_ISREG(st.st_mode) || !s-discard_zeroes) { +return; +} +bs-bl.max_write_zeroes = INT_MAX; +} Peter, do you remember why INT_MAX isn't actually the default? I think the most reasonable behaviour would be that a limitation is only used if a block driver requests it, and otherwise unlimited is assumed. The default (0) actually means unlimited or undefined. We introduced that limit of 16MB in bdrv_co_write_zeroes to create only reasonable sized requests because there is no guarantee that write zeroes is a fast operation. We should set INT_MAX only if we know that write zeroes of an arbitrary size is always fast. Well, splitting it up doesn't make it any faster. I think we can assume that drv-bdrv_co_write_zeroes() wants to know the full request size unless the driver has explicitly set bs-bl.max_write_zeroes. You mean sth like this: Yes, I think that's what I meant. Kevin diff --git a/block.c b/block.c index 61412e9..8272ef9 100644 --- a/block.c +++ b/block.c @@ -3192,10 +3192,7 @@ int coroutine_fn bdrv_co_copy_on_readv(BlockDriverState *bs, BDRV_REQ_COPY_ON_READ); } -/* if no limit is specified in the BlockLimits use a default - * of 32768 512-byte sectors (16 MiB) per request. - */ -#define MAX_WRITE_ZEROES_DEFAULT 32768 +#define MAX_WRITE_ZEROES_BOUNCE_BUFFER 32768 static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs, int64_t sector_num, int nb_sectors, BdrvRequestFlags flags) @@ -3206,7 +3203,7 @@ static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs, int ret = 0; int max_write_zeroes = bs-bl.max_write_zeroes ? - bs-bl.max_write_zeroes : MAX_WRITE_ZEROES_DEFAULT; + bs-bl.max_write_zeroes : INT_MAX; while (nb_sectors 0 !ret) { int num = nb_sectors; @@ -3242,7 +3239,7 @@ static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs, if (ret == -ENOTSUP) { /* Fall back to bounce buffer if write zeroes is unsupported */ int max_xfer_len = MIN_NON_ZERO(bs-bl.max_transfer_length, - MAX_WRITE_ZEROES_DEFAULT); + MAX_WRITE_ZEROES_BOUNCE_BUFFER); num = MIN(num, max_xfer_len); iov.iov_len = num * BDRV_SECTOR_SIZE; if (iov.iov_base == NULL) { @@ -5099,11 +5096,6 @@ static void coroutine_fn bdrv_discard_co_entry(void *opaque) rwco-ret = bdrv_co_discard(rwco-bs, rwco-sector_num, rwco-nb_sectors); } -/* if no limit is specified in the BlockLimits use a default - * of 32768 512-byte sectors (16 MiB) per request. - */ -#define MAX_DISCARD_DEFAULT 32768 - int coroutine_fn bdrv_co_discard(BlockDriverState *bs, int64_t sector_num, int nb_sectors) { @@ -5128,7 +5120,7 @@ int coroutine_fn bdrv_co_discard(BlockDriverState *bs, int64_t sector_num, return 0; } -max_discard = bs-bl.max_discard ? bs-bl.max_discard : MAX_DISCARD_DEFAULT; +max_discard = bs-bl.max_discard ? bs-bl.max_discard : INT_MAX; while (nb_sectors 0) { int ret; int num = nb_sectors; Peter
[Qemu-devel] [PATCH v4 3/5] block-backend: expose bs-bl.max_transfer_length
Signed-off-by: Peter Lieven p...@kamp.de Reviewed-by: Max Reitz mre...@redhat.com --- block/block-backend.c | 5 + include/sysemu/block-backend.h | 1 + 2 files changed, 6 insertions(+) diff --git a/block/block-backend.c b/block/block-backend.c index d00c129..c28e240 100644 --- a/block/block-backend.c +++ b/block/block-backend.c @@ -580,6 +580,11 @@ int blk_get_flags(BlockBackend *blk) return bdrv_get_flags(blk-bs); } +int blk_get_max_transfer_length(BlockBackend *blk) +{ +return blk-bs-bl.max_transfer_length; +} + void blk_set_guest_block_size(BlockBackend *blk, int align) { bdrv_set_guest_block_size(blk-bs, align); diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h index 8871a02..aab12b9 100644 --- a/include/sysemu/block-backend.h +++ b/include/sysemu/block-backend.h @@ -127,6 +127,7 @@ int blk_is_inserted(BlockBackend *blk); void blk_lock_medium(BlockBackend *blk, bool locked); void blk_eject(BlockBackend *blk, bool eject_flag); int blk_get_flags(BlockBackend *blk); +int blk_get_max_transfer_length(BlockBackend *blk); void blk_set_guest_block_size(BlockBackend *blk, int align); void *blk_blockalign(BlockBackend *blk, size_t size); bool blk_op_is_blocked(BlockBackend *blk, BlockOpType op, Error **errp); -- 1.9.1
Re: [Qemu-devel] [PATCH V11 2/3] i386: Add a Virtual Machine Generation ID device
On Mon, 02 Feb 2015 15:13:39 +0200 Gal Hammer gham...@redhat.com wrote: On 02/02/2015 14:46, Igor Mammedov wrote: On Sun, 01 Feb 2015 14:56:26 +0200 Gal Hammer gham...@redhat.com wrote: On 22/01/2015 15:52, Igor Mammedov wrote: On Tue, 16 Dec 2014 17:50:43 +0200 Gal Hammer gham...@redhat.com wrote: Based on Microsoft's sepecifications (paper can be dowloaded from http://go.microsoft.com/fwlink/?LinkId=260709), add a device description to the SSDT ACPI table and its implementation. The GUID is set using a global vmgenid.uuid parameter. Signed-off-by: Gal Hammer gham...@redhat.com --- a/hw/i386/acpi-build.c +++ b/hw/i386/acpi-build.c @@ -257,6 +257,7 @@ static void acpi_get_pci_info(PcPciInfo *info) #define ACPI_BUILD_TABLE_FILE etc/acpi/tables #define ACPI_BUILD_RSDP_FILE etc/acpi/rsdp #define ACPI_BUILD_TPMLOG_FILE etc/tpm/log +#define ACPI_BUILD_VMGENID_FILE etc/vm-generation-id static void build_header(GArray *linker, GArray *table_data, @@ -1068,6 +1069,8 @@ build_ssdt(GArray *table_data, GArray *linker, { MachineState *machine = MACHINE(qdev_get_machine()); uint32_t nr_mem = machine-ram_slots; +uint32_t vm_gid_physical_address; +uint32_t vm_gid_offset = 0; unsigned acpi_cpus = guest_info-apic_id_limit; int ssdt_start = table_data-len; uint8_t *ssdt_ptr; @@ -1096,6 +1099,21 @@ build_ssdt(GArray *table_data, GArray *linker, ACPI_BUILD_SET_LE(ssdt_ptr, sizeof(ssdp_misc_aml), ssdt_isa_pest[0], 16, misc-pvpanic_port); +if (vm_generation_id_set()) { +vm_gid_physical_address = ssdt_start + ssdt_acpi_vm_gid_addr[0]; +bios_linker_loader_alloc(linker, ACPI_BUILD_VMGENID_FILE, 8, true); +bios_linker_loader_add_pointer(linker, ACPI_BUILD_VMGENID_FILE, + ACPI_BUILD_TABLE_FILE, + table_data, + vm_gid_offset, + sizeof(vm_gid_offset)); could some explain how this pointer magic works, I can try, but don't you think that a magic is gone once explained? ;-) From my weak understanding it seems broken. Lets see: [1] vm_gid_offset - must be pointer inside of dest_file blob (ACPI_BUILD_VMGENID_FILE) [2] vm_gid_offset - should hold offset of the place inside of src_file (ACPI_BUILD_TABLE_FILE) where to pointer inside of dest_file should point to The vm_gid_offset should point where in the ACPI_BUILD_VMGENID_FILE the VM's GUID is stored. At the moment, it should always be zero because the GUID is stored at the begging of the ACPI_BUILD_VMGENID_FILE. now: vm_gid_physical_address - holds [2] i.e. offset of VGIA constant in inside SSDT in ACPI_BUILD_TABLE_FILE. +ACPI_BUILD_SET_LE(ssdt_ptr, sizeof(ssdp_misc_aml), + ssdt_acpi_vm_gid_addr[0], 32, vm_gid_physical_address); Then we write this offset into VGIA in ACPI_BUILD_TABLE_FILE. Yes. This offset is later patched by the linker to the full physical address. After BIOS loads tables it's going to patch at [3] ACPI_BUILD_VMGENID_FILE + (vm_gid_offset - table_data-data) /* only god knows where it will be/ and on top of it write in it value: *(ACPI_BUILD_TABLE_FILE + *[3]) We know exactly where it is, no need to call for god's help :-). This approach in general of patching arbitrary place in AML blob to get PHY addr of buffer with UUID, is quite a hack, especially in light of that we are trying to hide all direct access to AML blobs with related pointer arithmetic and manual patching. Why not reserve some potion of RAM and pass to BIOS/guest a reservation so it won't be part of AddressRangeMemory or AddressRangeACPI as MS spec requires? Then you won't need jump all above hoops to just get buffer's PHY addr. I'll be glad to hear a new idea that I didn't already try in one of other previous patches. The problem is that the specification requires working with a physical address, so it must be allocated from inside the guest. Since the OS is not exist in this stage and I also don't want to write a special driver just to allocate this buffer I had to choose this approach. how about creating device which will map 4K MMIO region in PCI hole address space and passing it as a reservation via e820 table we have in QEMU. Then address could be directly built in ACPI tables as constant value at the time of ACPI tables creation. Isn't this will cause a VMEXIT when the guest is reading the GUID? If it is then this idea was already presented and Michael didn't approve it. It will, but is it performance critical? VM supposed to read it at start-up and on getting notification. So I think VMEXIT in this case is not sufficient to drop simple and
[Qemu-devel] qemu 2.2.1 ?
Hi, are there any plans to release a 2.2 maintainance release anytime soon? Thanks, Peter
Re: [Qemu-devel] [RFC PATCH v8 09/21] replay: interrupts and exceptions
On 02/02/2015 14:50, Pavel Dovgaluk wrote: I think not. We just have to write number of already executed instructions. This number is not linked to exception event. They could be read in replay while processing some other event. I was referring to replay_put_event(EVENT_EXCEPTION) only. In principle you could run EVENT_EXCEPTION concurrently with other event writes, for example: +replay_mutex_lock(); +replay_put_event(EVENT_CLOCK + kind); +replay_put_qword(clock); +replay_mutex_unlock(); and get EVENT_CLOCK + kind (1 byte) EVENT_EXCEPTION (1 byte) clock (8 bytes) in the replay stream. I know it's really unlikely, perhaps even impossible in the current QEMU architecture that is dominated by the big QEMU lock. But the right thing to do is to lock the mutex around all event writes. Even if the write itself is atomic, it could happen in the middle of another write if you do not lock the mutex. Paolo
Re: [Qemu-devel] [PATCH] Fix ABI incompatibility between Qemu-aarch64 and Linux Kernel in signal handling.
On 2 February 2015 at 08:23, Maxim Ostapenko m.ostape...@partner.samsung.com wrote: Sorry, missed ML. You missed qemu-devel as well :-) On 02/02/2015 12:20 PM, Maxim Ostapenko wrote: Hi, this patch fixes https://bugs.launchpad.net/qemu/+bug/1416988. There is a small ABI incompatibility between Qemu-aarch64 and Linux Kernel 3.19 caused by wrong size of struct target_siginfo in Qemu. This tiny patch fixes the issue. -Maxim From 1d662e325d004bce2d640cfea5b337c92c7feca2 Mon Sep 17 00:00:00 2001 From: Max Ostapenko m.ostape...@partner.samsung.com Date: Mon, 2 Feb 2015 12:03:20 +0400 Subject: [PATCH] linux-user: wrong TARGET_SI_PAD_SIZE value for some targets. Fix TARGET_SI_PAD_SIZE for S390X, SPARC, ALPHA and AARCH64 to correspond Linux Kernel 3.19 c59c961ca511dc7ee2f4f7e9c224d16f5c76ca6e revision. Thanks for this patch. I'm afraid that to apply it we need a Signed-off-by: line from you that certifies that you're OK to contribute it. (This is the same as Linux kernel patches; see the Patch emails must include a Signed-off-by: line section in http://wiki.qemu.org/Contribute/SubmitAPatch ). --- linux-user/syscall_defs.h | 5 + 1 file changed, 5 insertions(+) diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h index ebb3be1..b95a0b2 100644 --- a/linux-user/syscall_defs.h +++ b/linux-user/syscall_defs.h @@ -655,7 +655,12 @@ typedef struct { #endif #define TARGET_SI_MAX_SIZE128 +#if defined(TARGET_S390X) || defined(TARGET_SPARC) \ + || defined(TARGET_ALPHA) || defined(TARGET_AARCH64) I don't think this ifdef is correct. For instance for sparc the kernel defines things differently for 32 and 64 bit. In fact as far as I can tell for all architectures the kernel defines __ARCH_SI_PREAMBLE_SIZE to (4 * sizeof(int)) for 64 bit and (3 * sizeof(int)) for 32 bit. +#define TARGET_SI_PAD_SIZE((TARGET_SI_MAX_SIZE/sizeof(int)) - 4) +#else #define TARGET_SI_PAD_SIZE((TARGET_SI_MAX_SIZE/sizeof(int)) - 3) +#endif So I would suggest #if TARGET_ABI_BITS == 32 #define TARGET_SI_PREAMBLE_SIZE (3 * sizeof(int)) #else #define TARGET_SI_PREAMBLE_SIZE (4 * sizeof(int)) #endif #define TARGET_SI_PAD_SIZE ((TARGET_SI_MAX_SIZE - TARGET_SI_PREAMBLE_SIZE) / sizeof(int)) which should give us the correct answer for all archs and configurations (including I think the 32-bit-on-64-bit setups like sparc32plus). It also rephrases the calculation to match what the kernel uses. thanks -- PMM
Re: [Qemu-devel] [PATCH 09/19] qtest/ahci: Demagic ahci tests.
On 30/01/2015 19:42, John Snow wrote: memset(cmd, 0x00, sizeof(cmd)); -cmd.flags = 5; /* reg_h2d_fis is 5 double-words long */ -cmd.flags = 0x400; /* clear PxTFD.STS.BSY when done */ -cmd.prdtl = 1; /* One PRD table entry. */ +cmd.flags = 5;/* reg_h2d_fis is 5 double-words long */ +cmd.flags = CMDH_CLR_BSY; /* clear PxTFD.STS.BSY when done */ Same = vs. |= note as before. Paolo
Re: [Qemu-devel] [PATCH 10/19] libqos/ahci: Add ide cmd properties
On 30/01/2015 19:42, John Snow wrote: The Invalid Command Sentinel here that caps the property array is an invalid ATA command, namely 0x01. 0x00 is NOP and 0xFF is reserved for vendor usage, so I chose the first invalid one instead. You can use ARRAY_SIZE instead. Paolo
[Qemu-devel] [v4 06/13] arch_init: Add and free data struct for decompression
Define the data structure and variables used to do multiple thread decompression, and add the code to initialize and free them. Signed-off-by: Liang Li liang.z...@intel.com Signed-off-by: Yang Zhang yang.z.zh...@intel.com --- arch_init.c | 13 - 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/arch_init.c b/arch_init.c index 87c4947..500f299 100644 --- a/arch_init.c +++ b/arch_init.c @@ -345,7 +345,12 @@ struct CompressParam { typedef struct CompressParam CompressParam; struct DecompressParam { -/* To be done */ +bool busy; +QemuMutex mutex; +QemuCond cond; +void *des; +uint8 *compbuf; +int len; }; typedef struct DecompressParam DecompressParam; @@ -1188,6 +1193,9 @@ void migrate_decompress_threads_create(int count) compressed_data_buf = g_malloc0(compressBound(TARGET_PAGE_SIZE)); quit_thread = false; for (i = 0; i count; i++) { +qemu_mutex_init(decomp_param[i].mutex); +qemu_cond_init(decomp_param[i].cond); +decomp_param[i].compbuf = g_malloc0(compressBound(TARGET_PAGE_SIZE)); qemu_thread_create(decompress_threads + i, decompress, do_data_decompress, decomp_param + i, QEMU_THREAD_JOINABLE); @@ -1202,6 +1210,9 @@ void migrate_decompress_threads_join(void) thread_count = migrate_decompress_threads(); for (i = 0; i thread_count; i++) { qemu_thread_join(decompress_threads + i); +qemu_mutex_destroy(decomp_param[i].mutex); +qemu_cond_destroy(decomp_param[i].cond); +g_free(decomp_param[i].compbuf); } g_free(decompress_threads); g_free(decomp_param); -- 1.9.1
Re: [Qemu-devel] [PATCH 13/19] libqos/ahci: add ahci command size setters
On 30/01/2015 19:42, John Snow wrote: +void ahci_command_set_sizes(AHCICommand *cmd, uint64_t xbytes, +unsigned prd_size) +{ +/* Each PRD can describe up to 4MiB, and must not be odd. */ +g_assert_cmphex(prd_size, =, 4096 * 1024); +g_assert_cmphex(prd_size 0x01, ==, 0x00); +cmd-prd_size = prd_size; +cmd-xbytes = xbytes; +cmd-fis.count = cpu_to_le16(cmd-xbytes / AHCI_SECTOR_SIZE); Why do you need cpu_to_le16 here, instead of having it in the function that writes the command to guest memory? Paolo +cmd-header.prdtl = size_to_prdtl(cmd-xbytes, cmd-prd_size); +} +
Re: [Qemu-devel] [PATCH 00/19] qtest/ahci: add dma test
On 30/01/2015 19:41, John Snow wrote: Add a simple DMA r/w test to ahci-test. Oh, and for the first 18 patches, refactor everything into helpers so that each ahci_test isn't a thousand lines long. This patch depends upon the ahci test preliminary refactoring series upstream, which shuffled a lot of libqos and malloc facilities to support this series. This patchset is a necessary step in checking in AHCI/DMA migration tests that I will later use as proof as suitability of enabling the ICH9 and AHCI migration flags. Sent a few comments, everything that I didn't reply to has my Reviewed-by. Paolo John Snow (19): libqos/ahci: Add ahci_port_select helper libqos/ahci: Add ahci_port_clear helper qtest/ahci: rename 'Command' to 'CommandHeader' libqos/ahci: Add command header helpers libqos/ahci: Add ahci_port_check_error helper libqos/ahci: Add ahci_port_check_interrupts helper libqos/ahci: Add port_check_nonbusy helper libqos/ahci: Add cmd response sanity check helpers qtest/ahci: Demagic ahci tests. libqos/ahci: Add ide cmd properties libqos/ahci: add ahci command functions libqos/ahci: add ahci command verify libqos/ahci: add ahci command size setters libqos/ahci: Add ahci_guest_io libqos/ahci: add ahci_io libqos/ahci: Add ahci_clean_mem qtest/ahci: Add a macro bootup routine qtest/ahci: Assert sector size in identify test qtest/ahci: Adding simple dma read-write test tests/ahci-test.c | 246 +- tests/libqos/ahci.c | 554 ++ tests/libqos/ahci.h | 163 --- tests/libqos/malloc.c | 5 + tests/libqos/malloc.h | 1 + 5 files changed, 798 insertions(+), 171 deletions(-)
[Qemu-devel] [v4 11/13] migration: Add interface to control compression
The multiple compression threads can be turned on/off through qmp and hmp interface before doing live migration. Signed-off-by: Liang Li liang.z...@intel.com Signed-off-by: Yang Zhang yang.z.zh...@intel.com Reviewed-by: Dr.David Alan Gilbert dgilb...@redhat.com --- migration/migration.c | 7 +-- qapi-schema.json | 7 ++- 2 files changed, 11 insertions(+), 3 deletions(-) diff --git a/migration/migration.c b/migration/migration.c index a6f6e02..cbbd455 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -588,8 +588,11 @@ bool migrate_zero_blocks(void) bool migrate_use_compression(void) { -/* Disable compression before the patch series are applied */ -return false; +MigrationState *s; + +s = migrate_get_current(); + +return s-enabled_capabilities[MIGRATION_CAPABILITY_COMPRESS]; } int migrate_compress_level(void) diff --git a/qapi-schema.json b/qapi-schema.json index e16f8eb..0dfc4ce 100644 --- a/qapi-schema.json +++ b/qapi-schema.json @@ -491,13 +491,18 @@ # to enable the capability on the source VM. The feature is disabled by # default. (since 1.6) # +# @compress: Use multiple compression threads to accelerate live migration. +# This feature can help to reduce the migration traffic, by sending +# compressed pages. The feature is disabled by default. (since 2.3) +# # @auto-converge: If enabled, QEMU will automatically throttle down the guest # to speed up convergence of RAM migration. (since 1.6) # # Since: 1.2 ## { 'enum': 'MigrationCapability', - 'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks'] } + 'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks', + 'compress'] } ## # @MigrationCapabilityStatus -- 1.9.1
Re: [Qemu-devel] [PATCH v2 1/4] qemu-io: initialize progname with error_set_progname()
Am 02.02.2015 um 10:51 hat Hitoshi Mitake geschrieben: At Thu, 22 Jan 2015 18:08:11 +0900, Hitoshi Mitake wrote: Calling error_get_progname() in the context of qemu-io can cause segmentation fault because qemu-io doesn't initialize its progname with error_set_progname(). This patch adds the initialization. Currently, the missing call of error_set_progname() doesn't cause any problems because qemu-io doesn't use error_get_progname(). This patch is a proactive action. Cc: Kevin Wolf kw...@redhat.com Cc: Stefan Hajnoczi stefa...@redhat.com Cc: Markus Armbruster arm...@redhat.com Signed-off-by: Hitoshi Mitake mitake.hito...@lab.ntt.co.jp --- qemu-io.c | 1 + 1 file changed, 1 insertion(+) Hi Kevin, Stefan, could you pick this patch? sheepdog driver has a pending patch [1] which depends on it. I believe at least qemu-io should call error_set_progname() because block drivers can use the qemu-error infrastructure. So are you planning to get the individual patches merged on their own instead of the whole series as one? [1] https://github.com/sheepdog/qemu/commit/a95c35e606a2a189e7dbaf645277c5c306b01e4b That patch looks wrong. Nobody guarantees that qemu-io is really called qemu-io. The user can name their binaries as they want and create symlinks with any name, and indeed names for qemu-img used by some distributions include 'qemu-img-kvm' and 'kvm-img'. You need to find a different way than checking binary file names. Kevin
Re: [Qemu-devel] [RFC PATCH v8 08/21] cpu: replay instructions sequence
From: Paolo Bonzini [mailto:pbonz...@redhat.com] On 22/01/2015 09:52, Pavel Dovgalyuk wrote: This patch adds calls to replay functions into the icount setup block. In record mode number of executed instructions is written to the log. In replay mode number of istructions to execute is taken from the replay log. Signed-off-by: Pavel Dovgalyuk pavel.dovga...@ispras.ru --- cpu-exec.c |1 + cpus.c | 28 ++-- replay/replay.c | 24 replay/replay.h |4 4 files changed, 47 insertions(+), 10 deletions(-) diff --git a/cpu-exec.c b/cpu-exec.c index 49f01f5..99a0993 100644 --- a/cpu-exec.c +++ b/cpu-exec.c @@ -531,6 +531,7 @@ int cpu_exec(CPUArchState *env) } cpu-exception_index = EXCP_INTERRUPT; next_tb = 0; +qemu_notify_event(); Why is this needed? It is needed to wake up iothread in replay mode. Otherwise it waits for additional time and replay becomes too slow. cpu_loop_exit(cpu); } break; Pavel Dovgalyuk
Re: [Qemu-devel] [PATCH v2 4/4] linux-user: initialize progname with error_set_progname()
Hi Peter, Sorry for my late reply. At Thu, 22 Jan 2015 10:07:27 +, Peter Maydell wrote: On 22 January 2015 at 09:08, Hitoshi Mitake mitake.hito...@lab.ntt.co.jp wrote: Calling error_get_progname() in the context of qemu-x86_64 can cause segmentation fault because linux-user/main.c doesn't initialize its progname with error_set_progname(). This patch adds the initialization. Currently, the missing call of error_set_progname() doesn't cause any problems because qemu-x86_64 doesn't use error_get_progname(). This patch is a proactive action. I don't think this patch is right. The *-user binaries don't (and should not) use the qemu-error infrastructure (they don't have the monitor or any of the other stuff that uses). Code which tries to use error_get_progname() in *-user would be broken. I just thought linux-user would be a potential user of the qemu-error infrastructure because it is liked with util/qemu-error.o. Actually, the changes of the patch (let linux-user/main.c include qemu/error-report.h and call error_get_progname()) didn't cause any problems. But I'm not familiar with linux-user. If the command should never be a user of the error infrastructure, I'd like to drop this patch, of course. Thanks, Hitoshi
Re: [Qemu-devel] [RFC 09/10] cpu: remove exit_request global.
On 29/01/2015 16:52, Peter Maydell wrote: +CPU_FOREACH(cpu) { +cpu-exit_loop_request = 1; +} } You can't do this -- this code is a signal handler so it could get run at any time including while the list of CPUs is being updated. (This is why we have the exit_request flag in the first place rather than just setting the exit_request flag in each CPU...) Actually you can do this if you are careful. In particular, you can do it while you are under the big QEMU lock. If you are not, basically you have to treat the CPU list as RCU-protected, and this is doable because the CPU object cannot be removed and added back into the CPU list. Unfortunately RCU doesn't support QTAILQ, at least not yet, so you'd have to convert the CPU list to QLIST. But the basic idea of this patch can be done. Paolo
[Qemu-devel] [v4 03/13] migration: Add the framework of multi-thread decompression
Add the code to create and destroy the multiple threads those will be used to do data decompression. Left some functions empty just to keep clearness, and the code will be added later. Signed-off-by: Liang Li liang.z...@intel.com Signed-off-by: Yang Zhang yang.z.zh...@intel.com --- arch_init.c | 75 +++ include/migration/migration.h | 4 +++ migration/migration.c | 16 + 3 files changed, 95 insertions(+) diff --git a/arch_init.c b/arch_init.c index 1831f1a..ed34eb3 100644 --- a/arch_init.c +++ b/arch_init.c @@ -24,6 +24,7 @@ #include stdint.h #include stdarg.h #include stdlib.h +#include zlib.h #ifndef _WIN32 #include sys/types.h #include sys/mman.h @@ -126,6 +127,7 @@ static uint64_t bitmap_sync_count; #define RAM_SAVE_FLAG_CONTINUE 0x20 #define RAM_SAVE_FLAG_XBZRLE 0x40 /* 0x80 is reserved in migration.h start with 0x100 next */ +#define RAM_SAVE_FLAG_COMPRESS_PAGE0x100 static struct defconfig_file { const char *filename; @@ -337,8 +339,16 @@ struct CompressParam { }; typedef struct CompressParam CompressParam; +struct DecompressParam { +/* To be done */ +}; +typedef struct DecompressParam DecompressParam; + static CompressParam *comp_param; static bool quit_thread; +static DecompressParam *decomp_param; +static QemuThread *decompress_threads; +static uint8_t *compressed_data_buf; static void *do_data_compress(void *opaque) { @@ -1128,10 +1138,58 @@ void ram_handle_compressed(void *host, uint8_t ch, uint64_t size) } } +static void *do_data_decompress(void *opaque) +{ +while (!quit_thread) { +/* To be done */ +} + +return NULL; +} + +void migrate_decompress_threads_create(int count) +{ +int i; + +decompress_threads = g_new0(QemuThread, count); +decomp_param = g_new0(DecompressParam, count); +compressed_data_buf = g_malloc0(compressBound(TARGET_PAGE_SIZE)); +quit_thread = false; +for (i = 0; i count; i++) { +qemu_thread_create(decompress_threads + i, decompress, + do_data_decompress, decomp_param + i, + QEMU_THREAD_JOINABLE); +} +} + +void migrate_decompress_threads_join(void) +{ +int i, thread_count; + +quit_thread = true; +thread_count = migrate_decompress_threads(); +for (i = 0; i thread_count; i++) { +qemu_thread_join(decompress_threads + i); +} +g_free(decompress_threads); +g_free(decomp_param); +g_free(compressed_data_buf); +decompress_threads = NULL; +decomp_param = NULL; +compressed_data_buf = NULL; +} + +static void decompress_data_with_multi_threads(uint8_t *compbuf, + void *host, int len) +{ +/* To be done */ +} + static int ram_load(QEMUFile *f, void *opaque, int version_id) { int flags = 0, ret = 0; static uint64_t seq_iter; +int len = 0; seq_iter++; @@ -1208,6 +1266,23 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id) qemu_get_buffer(f, host, TARGET_PAGE_SIZE); break; +case RAM_SAVE_FLAG_COMPRESS_PAGE: +host = host_from_stream_offset(f, addr, flags); +if (!host) { +error_report(Invalid RAM offset RAM_ADDR_FMT, addr); +ret = -EINVAL; +break; +} + +len = qemu_get_be32(f); +if (len 0 || len compressBound(TARGET_PAGE_SIZE)) { +error_report(Invalid compressed data length: %d, len); +ret = -EINVAL; +break; +} +qemu_get_buffer(f, compressed_data_buf, len); +decompress_data_with_multi_threads(compressed_data_buf, host, len); +break; case RAM_SAVE_FLAG_XBZRLE: host = host_from_stream_offset(f, addr, flags); if (!host) { diff --git a/include/migration/migration.h b/include/migration/migration.h index daf6c81..0c4f21c 100644 --- a/include/migration/migration.h +++ b/include/migration/migration.h @@ -51,6 +51,7 @@ struct MigrationState QEMUFile *file; QemuThread *compress_thread; int compress_thread_count; +int decompress_thread_count; int compress_level; int state; @@ -112,6 +113,8 @@ MigrationState *migrate_get_current(void); void migrate_compress_threads_create(MigrationState *s); void migrate_compress_threads_join(MigrationState *s); +void migrate_decompress_threads_create(int count); +void migrate_decompress_threads_join(void); uint64_t ram_bytes_remaining(void); uint64_t ram_bytes_transferred(void); uint64_t ram_bytes_total(void); @@ -164,6 +167,7 @@ int64_t xbzrle_cache_resize(int64_t new_size); bool migrate_use_compression(void); int migrate_compress_level(void); int migrate_compress_threads(void); +int migrate_decompress_threads(void); void ram_control_before_iterate(QEMUFile *f, uint64_t flags); void
[Qemu-devel] [v4 13/13] migration: Add command to query migration parameter
Add the qmp and hmp commands to query the parameters used in live migration. Signed-off-by: Liang Li liang.z...@intel.com Signed-off-by: Yang Zhang yang.z.zh...@intel.com --- hmp-commands.hx | 2 ++ hmp.c | 21 + hmp.h | 1 + migration/migration.c | 27 +++ monitor.c | 7 +++ qapi-schema.json | 11 +++ qmp-commands.hx | 26 ++ 7 files changed, 95 insertions(+) diff --git a/hmp-commands.hx b/hmp-commands.hx index 535b5ba..ed0c06a 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -1779,6 +1779,8 @@ show user network stack connection states show migration status @item info migrate_capabilities show current migration capabilities +@item info migrate_parameters +show current migration parameters @item info migrate_cache_size show current migration XBZRLE cache size @item info balloon diff --git a/hmp.c b/hmp.c index faab4b0..33c95b3 100644 --- a/hmp.c +++ b/hmp.c @@ -246,6 +246,27 @@ void hmp_info_migrate_capabilities(Monitor *mon, const QDict *qdict) qapi_free_MigrationCapabilityStatusList(caps); } +void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict) +{ +MigrationParameterStatusList *params, *p; +MigrationParameterInt *data; + +params = qmp_query_migrate_parameters(NULL); + +if (params) { +monitor_printf(mon, parameters:); +for (p = params; p; p = p-next) { +data = (MigrationParameterInt *)p-value-data; +monitor_printf(mon, %s: % PRId64, + MigrationParameter_lookup[p-value-kind], + data-value); +} +monitor_printf(mon, \n); +} + +qapi_free_MigrationParameterStatusList(params); +} + void hmp_info_migrate_cache_size(Monitor *mon, const QDict *qdict) { monitor_printf(mon, xbzrel cache size: % PRId64 kbytes\n, diff --git a/hmp.h b/hmp.h index 429efea..b2b2d2c 100644 --- a/hmp.h +++ b/hmp.h @@ -28,6 +28,7 @@ void hmp_info_chardev(Monitor *mon, const QDict *qdict); void hmp_info_mice(Monitor *mon, const QDict *qdict); void hmp_info_migrate(Monitor *mon, const QDict *qdict); void hmp_info_migrate_capabilities(Monitor *mon, const QDict *qdict); +void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict); void hmp_info_migrate_cache_size(Monitor *mon, const QDict *qdict); void hmp_info_cpus(Monitor *mon, const QDict *qdict); void hmp_info_block(Monitor *mon, const QDict *qdict); diff --git a/migration/migration.c b/migration/migration.c index b5055dc..f925130 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -179,6 +179,33 @@ MigrationCapabilityStatusList *qmp_query_migrate_capabilities(Error **errp) return head; } +MigrationParameterStatusList *qmp_query_migrate_parameters(Error **errp) +{ +MigrationParameterStatusList *head = NULL; +MigrationParameterStatusList *params; +MigrationState *s = migrate_get_current(); +MigrationParameterInt *data; +int i; + +params = NULL; /* silence compiler warning */ +for (i = 0; i MIGRATION_PARAMETER_MAX; i++) { +if (head == NULL) { +head = g_malloc0(sizeof(*params)); +params = head; +} else { +params-next = g_malloc0(sizeof(*params)); +params = params-next; +} +params-value = g_malloc(sizeof(*params-value)); +params-value-kind = i; +params-value-data = g_malloc(sizeof(MigrationParameterInt)); +data = (MigrationParameterInt *)params-value-data; +data-value = s-parameters[i]; +} + +return head; +} + static void get_xbzrle_cache_stats(MigrationInfo *info) { if (migrate_use_xbzrle()) { diff --git a/monitor.c b/monitor.c index fa8ebde..58dfa28 100644 --- a/monitor.c +++ b/monitor.c @@ -2872,6 +2872,13 @@ static mon_cmd_t info_cmds[] = { .mhandler.cmd = hmp_info_migrate_capabilities, }, { +.name = migrate_parameters, +.args_type = , +.params = , +.help = show current migration parameters, +.mhandler.cmd = hmp_info_migrate_parameters, +}, +{ .name = migrate_cache_size, .args_type = , .params = , diff --git a/qapi-schema.json b/qapi-schema.json index 273f991..af34f7f 100644 --- a/qapi-schema.json +++ b/qapi-schema.json @@ -593,6 +593,17 @@ { 'command': 'migrate-set-parameters', 'data': { 'parameters': ['MigrationParameterStatus'] } } ## +# @query-migrate-parameters +# +# Returns information about the current migration parameters status +# +# Returns: @MigrationParametersStatus +# +# Since: 2.3 +## +{ 'command': 'query-migrate-parameters', + 'returns': ['MigrationParameterStatus'] } +## ## # @MouseInfo: # diff --git a/qmp-commands.hx b/qmp-commands.hx index 9d16386..bcfe823 100644 --- a/qmp-commands.hx +++ b/qmp-commands.hx @@ -3303,6 +3303,32 @@ EQMP
[Qemu-devel] [v4 04/13] qemu-file: Add compression functions to QEMUFile
qemu_put_compression_data() compress the data and put it to QEMUFile. qemu_put_qemu_file() put the data in the buffer of source QEMUFile to destination QEMUFile. Signed-off-by: Liang Li liang.z...@intel.com Signed-off-by: Yang Zhang yang.z.zh...@intel.com --- include/migration/qemu-file.h | 3 +++ migration/qemu-file.c | 39 +++ 2 files changed, 42 insertions(+) diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h index d843c00..2204fb9 100644 --- a/include/migration/qemu-file.h +++ b/include/migration/qemu-file.h @@ -159,6 +159,9 @@ void qemu_put_be32(QEMUFile *f, unsigned int v); void qemu_put_be64(QEMUFile *f, uint64_t v); int qemu_peek_buffer(QEMUFile *f, uint8_t *buf, int size, size_t offset); int qemu_get_buffer(QEMUFile *f, uint8_t *buf, int size); +size_t qemu_put_compression_data(QEMUFile *f, const uint8_t *p, size_t size, + int level); +int qemu_put_qemu_file(QEMUFile *f_des, QEMUFile *f_src); /* * Note that you can only peek continuous bytes from where the current pointer * is; you aren't guaranteed to be able to peak to +n bytes unless you've diff --git a/migration/qemu-file.c b/migration/qemu-file.c index edc2830..de2da2d 100644 --- a/migration/qemu-file.c +++ b/migration/qemu-file.c @@ -21,6 +21,7 @@ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN * THE SOFTWARE. */ +#include zlib.h #include qemu-common.h #include qemu/iov.h #include qemu/sockets.h @@ -529,3 +530,41 @@ uint64_t qemu_get_be64(QEMUFile *f) v |= qemu_get_be32(f); return v; } + +/* compress size bytes of data start at p with specific compression + * leve and store the compressed data to the buffer of f. + */ + +size_t qemu_put_compression_data(QEMUFile *f, const uint8_t *p, size_t size, + int level) +{ +size_t blen = IO_BUF_SIZE - f-buf_index - sizeof(int32_t); + +if (blen compressBound(size)) { +return 0; +} +if (compress2(f-buf + f-buf_index + sizeof(int32_t), blen, (Bytef *)p, + size, level) != Z_OK) { +error_report(Compress Failed!); +return 0; +} +qemu_put_be32(f, blen); +f-buf_index += blen; +return blen + sizeof(int32_t); +} + +/* Put the data in the buffer of f_src to the buffer of f_des, and + * then reset the buf_index of f_src to 0. + */ + +int qemu_put_qemu_file(QEMUFile *f_des, QEMUFile *f_src) +{ +int len = 0; + +if (f_src-buf_index 0) { +len = f_src-buf_index; +qemu_put_buffer(f_des, f_src-buf, f_src-buf_index); +f_src-buf_index = 0; +} +return len; +} -- 1.9.1
[Qemu-devel] [v4 01/13] docs: Add a doc about multiple thread compression
Give some details about the multiple thread (de)compression and how to use it in live migration. Signed-off-by: Liang Li liang.z...@intel.com Signed-off-by: Yang Zhang yang.z.zh...@intel.com Reviewed-by: Dr.David Alan Gilbert dgilb...@redhat.com --- docs/multi-thread-compression.txt | 149 ++ 1 file changed, 149 insertions(+) create mode 100644 docs/multi-thread-compression.txt diff --git a/docs/multi-thread-compression.txt b/docs/multi-thread-compression.txt new file mode 100644 index 000..0d4d212 --- /dev/null +++ b/docs/multi-thread-compression.txt @@ -0,0 +1,149 @@ +Use multiple thread (de)compression in live migration += +Copyright (C) 2015 Intel Corporation +Author: Liang Li liang.z...@intel.com + +This work is licensed under the terms of the GNU GPLv2 or later. See +the COPYING file in the top-level directory. + +Contents: += +* Introduction +* When to use +* Performance +* Usage +* TODO + +Introduction + +Instead of sending the guest memory directly, this solution will +compress the RAM page before sending; after receiving, the data will +be decompressed. Using compression in live migration can help +to reduce the data transferred about 60%, this is very useful when the +bandwidth is limited, and the total migration time can also be reduced +about 70% in a typical case. In addition to this, the VM downtime can be +reduced about 50%. The benefit depends on data's compressibility in VM. + +The process of compression will consume additional CPU cycles, and the +extra CPU cycles will increase the migration time. On the other hand, +the amount of data transferred will decrease; this factor can reduce +the total migration time. If the process of the compression is quick +enough, then the total migration time can be reduced, and multiple +thread compression can be used to accelerate the compression process. + +The decompression speed of Zlib is at least 4 times as quick as +compression, if the source and destination CPU have equal speed, +keeping the compression thread count 4 times the decompression +thread count can avoid CPU waste. + +Compression level can be used to control the compression speed and the +compression ratio. High compression ratio will take more time, level 0 +stands for no compression, level 1 stands for the best compression +speed, and level 9 stands for the best compression ratio. Users can +select a level number between 0 and 9. + + +When to use the multiple thread compression in live migration += +Compression of data will consume extra CPU cycles; so in a system with +high overhead of CPU, avoid using this feature. When the network +bandwidth is very limited and the CPU resource is adequate, use of +multiple thread compression will be very helpful. If both the CPU and +the network bandwidth are adequate, use of multiple thread compression +can still help to reduce the migration time. + +Performance +=== +Test environment: + +CPU: Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz +Socket Count: 2 +RAM: 128G +NIC: Intel I350 (10/100/1000Mbps) +Host OS: CentOS 7 64-bit +Guest OS: RHEL 6.5 64-bit +Parameter: qemu-system-x86_64 -enable-kvm -smp 4 -m 4096 + /share/ia32e_rhel6u5.qcow -monitor stdio + +There is no additional application is running on the guest when doing +the test. + + +Speed limit: 1000Gb/s +--- +| original | compress thread: 8 +| way | decompress thread: 2 +| | compression level: 1 +--- +total time(msec): | | 1833 +--- +downtime(msec): |100| 27 +--- +transferred ram(kB):| 363536 | 107819 +--- +throughput(mbps): | 893.73 | 482.22 +--- +total ram(kB): | 4211524 | 4211524 +--- + +There is an application running on the guest which write random numbers +to RAM block areas periodically. + +Speed limit: 1000Gb/s +--- +| original | compress thread: 8 +| way | decompress thread: 2 +| | compression level: 1 +--- +total time(msec): | 37369 | 15989 +--- +downtime(msec): |337| 173 +--- +transferred ram(kB):| 4274143 | 1699824
Re: [Qemu-devel] [PATCH 2/2] bootdevice: add check in restore_boot_order()
Gonglei arei.gong...@huawei.com writes: On 2015/1/30 20:32, Markus Armbruster wrote: Gonglei arei.gong...@huawei.com writes: On 2015/1/30 20:01, Markus Armbruster wrote: Gonglei arei.gong...@huawei.com writes: On 2015/1/30 15:46, Markus Armbruster wrote: Gonglei arei.gong...@huawei.com writes: On 2015/1/30 0:03, Alexander Graf wrote: On 29.01.15 14:29, arei.gong...@huawei.com wrote: From: Gonglei arei.gong...@huawei.com If boot order is invaild or is set failed, exit qemu. Signed-off-by: Gonglei arei.gong...@huawei.com Do we really want to kill the machine only because the boot device string doesn't validate? Not all of the situation. If people want to change boot order by qmp/hmp command, it just report an error, please see do_boot_set(). But if the boot order is set in qemu command line, it will exit qemu if the boot device string is invalidate, as this patch's situation, which follow the original processing way (commit ef3adf68). I think Alex isn't concerned about the monitor command, but what happens when boot order once is reset to order on system reset. -boot errors should have been detected during command line processing (strongly preferred) or initial startup (acceptable). Detecting Yes, and it had done it just like that, please see main() of vl.c. So, actually it wouldn't fail in the check of restore_boot_order function's calling. The only possible fails will happen to call boot_set_handler(). Take x86 pc machine example, set_boot_dev() callback may return errors. I don't like unreachable error messages. If qemu_boot_set() can't fail in restore_boot_order(), then simply assert it doesn't fail, by passing error_abort. Sorry, I meant the validate_bootdevices() can't fail in restore_boot_order(), but boot_set_handler(boot_set_opaque, boot_order, errp) may fail, such as set_boot_dev(). For example: x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 4096 -boot menu=on,order=nbcdep,once=c -monitor stdio -vnc :0 QEMU 2.2.50 monitor - type 'help' for more information (qemu) system_reset (qemu) qemu-system-x86_64: Too many boot devices for PC The value of parameter order should be checked during command line processing (strongly preferred) or initial startup (acceptable) if at all possible. Is it possible? Either 'once' option or 'order' option can take effect for -boot at the same time, that is say initial startup processing can check only one. Besides, the check is just for corresponding machine type, so command line processing also can't do it. I challenge your idea that we can't check this before the guest starts running. qemu_boot_set() can fail for two reasons: * validate_bootdevices() fails Should never happen, because we've called it in main() already, treating failure as fatal error. * boot_set_handler is null MachineClass method init() may set this. main() could *easily* test whether it did! If it didn't, and -boot once is given, error out. Similar checks exist already, e.g. drive_check_orphaned(), net_check_clients(). They only warn, but that's detail.
Re: [Qemu-devel] [PATCH 04/19] libqos/ahci: Add command header helpers
On 30/01/2015 19:41, John Snow wrote: +/* Construct our Command Header (set_command_header handles endianness.) */ +memset(cmd, 0x00, sizeof(cmd)); +cmd.flags = 5; /* reg_h2d_fis is 5 double-words long */ +cmd.flags = 0x400; /* clear PxTFD.STS.BSY when done */ And here |= became = ? Paolo +cmd.prdtl = 1; /* One PRD table entry. */
Re: [Qemu-devel] [PATCH 03/19] qtest/ahci: rename 'Command' to 'CommandHeader'
On 30/01/2015 19:41, John Snow wrote: @@ -703,8 +703,8 @@ static void ahci_test_identify(AHCIQState *ahci) /* Copy the existing Command #0 structure from the CLB into local memory, * and build a new command #0. */ memread(ahci-port[i].clb, cmd, sizeof(cmd)); -cmd.b1 = 5;/* reg_h2d_fis is 5 double-words long */ -cmd.b2 = 0x04; /* clear PxTFD.STS.BSY when done */ +cmd.flags = 5;/* reg_h2d_fis is 5 double-words long */ +cmd.flags |= 0x400; /* clear PxTFD.STS.BSY when done */ Missing cpu_to_le16, I think. Paolo
Re: [Qemu-devel] [PATCH 19/19] qtest/ahci: Adding simple dma read-write test
On 30/01/2015 19:42, John Snow wrote: +memwrite(ptr, tx, bufsize); + +/* Write this buffer to disk, then read it back to the DMA buffer. */ +ahci_guest_io(ahci, px, CMD_WRITE_DMA, ptr, bufsize); I would qmemset the buffer here. Paolo +ahci_guest_io(ahci, px, CMD_READ_DMA, ptr, bufsize); + +/*** Read back the Data ***/ +memread(ptr, rx, bufsize); +g_assert_cmphex(memcmp(tx, rx, bufsize), ==, 0);
Re: [Qemu-devel] [PATCH 17/19] qtest/ahci: Add a macro bootup routine
On 30/01/2015 19:42, John Snow wrote: +/** + * Boot and fully enable the HBA device. + * @see ahci_boot, ahci_pci_enable and ahci_hba_enable. + */ +static AHCIQState *ahci_macro_bootup(void) Ugly name... I would just leave out this patch. Paolo +{ +AHCIQState *ahci; +ahci = ahci_boot(); + +ahci_pci_enable(ahci); +ahci_hba_enable(ahci); + +return ahci; +}
[Qemu-devel] [v4 10/13] migration: Add the core code for decompression
Implement the core logic of multiple thread decompression, the decompression can work now. Signed-off-by: Liang Li liang.z...@intel.com Signed-off-by: Yang Zhang yang.z.zh...@intel.com --- arch_init.c | 49 +++-- 1 file changed, 47 insertions(+), 2 deletions(-) diff --git a/arch_init.c b/arch_init.c index 8ef0315..549fdbb 100644 --- a/arch_init.c +++ b/arch_init.c @@ -814,6 +814,13 @@ static inline void start_compression(CompressParam *param) qemu_mutex_unlock(param-mutex); } +static inline void start_decompression(DecompressParam *param) +{ +qemu_mutex_lock(param-mutex); +param-busy = true; +qemu_cond_signal(param-cond); +qemu_mutex_unlock(param-mutex); +} static uint64_t bytes_transferred; @@ -1347,8 +1354,27 @@ void ram_handle_compressed(void *host, uint8_t ch, uint64_t size) static void *do_data_decompress(void *opaque) { +DecompressParam *param = opaque; +size_t pagesize; + while (!quit_thread) { -/* To be done */ +qemu_mutex_lock(param-mutex); +while (!param-busy) { +qemu_cond_wait(param-cond, param-mutex); +if (quit_thread) { +break; +} +pagesize = TARGET_PAGE_SIZE; +/* uncompress() will return failed in some case, especially + * when the page is dirted when doing the compression, it's + * not a problem because the dirty page will be retransferred + * and uncompress() won't break the data in other pages. + */ +uncompress((Bytef *)param-des, pagesize, + (const Bytef *)param-compbuf, param-len); +param-busy = false; +} +qemu_mutex_unlock(param-mutex); } return NULL; @@ -1379,6 +1405,9 @@ void migrate_decompress_threads_join(void) quit_thread = true; thread_count = migrate_decompress_threads(); for (i = 0; i thread_count; i++) { +qemu_cond_signal(decomp_param[i].cond); +} +for (i = 0; i thread_count; i++) { qemu_thread_join(decompress_threads + i); qemu_mutex_destroy(decomp_param[i].mutex); qemu_cond_destroy(decomp_param[i].cond); @@ -1395,7 +1424,23 @@ void migrate_decompress_threads_join(void) static void decompress_data_with_multi_threads(uint8_t *compbuf, void *host, int len) { -/* To be done */ +int idx, thread_count; + +thread_count = migrate_decompress_threads(); +while (true) { +for (idx = 0; idx thread_count; idx++) { +if (!decomp_param[idx].busy) { +memcpy(decomp_param[idx].compbuf, compbuf, len); +decomp_param[idx].des = host; +decomp_param[idx].len = len; +start_decompression(decomp_param[idx]); +break; +} +} +if (idx thread_count) { +break; +} +} } static int ram_load(QEMUFile *f, void *opaque, int version_id) -- 1.9.1
[Qemu-devel] [PATCH v4 0/13] migration: Add a new feature to do live migration
This feature can help to reduce the data transferred about 60%, and the migration time can also be reduced about 70%. Summary of changed from v3-v4 -Update the test data in the document -Fix some typo errors -Use compressBound instead of MACRO define -Optimize the performance when compression co-work with XBZRLE -Added some comments -Rename some functions and variables -Incorrect coding style fix -QMP type change for compatibility -- 1.9.1