[Qemu-block] [PULL 2/5] tests: Fix test 049 fallout from improved HMP error messages

2015-10-02 Thread Kevin Wolf
From: Eric Blake 

Commit 50b7b000 improved HMP error messages, but forgot to update
qemu-iotests to match.

Reported-by: Kevin Wolf 
Signed-off-by: Eric Blake 
Reviewed-by: John Snow 
Reviewed-by: Alberto Garcia 
Signed-off-by: Kevin Wolf 
---
 tests/qemu-iotests/049.out | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tests/qemu-iotests/049.out b/tests/qemu-iotests/049.out
index 0425ae0..a2b6703 100644
--- a/tests/qemu-iotests/049.out
+++ b/tests/qemu-iotests/049.out
@@ -118,6 +118,7 @@ qemu-img: kilobytes, megabytes, gigabytes, terabytes, 
petabytes and exabytes.
 
 qemu-img create -f qcow2 -o size=foobar TEST_DIR/t.qcow2
 qemu-img: Parameter 'size' expects a size
+You may use k, M, G or T suffixes for kilobytes, megabytes, gigabytes and 
terabytes.
 qemu-img: TEST_DIR/t.qcow2: Invalid options for file format 'qcow2'
 
 == Check correct interpretation of suffixes for cluster size ==
-- 
1.8.3.1




[Qemu-block] [PULL 0/5] Block layer patches

2015-10-02 Thread Kevin Wolf
The following changes since commit ff770b07f34d28b79013a83989bd6c85f8f16b2f:

  Merge remote-tracking branch 'remotes/cody/tags/block-pull-request' into 
staging (2015-10-02 11:01:18 +0100)

are available in the git repository at:


  git://repo.or.cz/qemu/kevin.git tags/for-upstream

for you to fetch changes up to 73ba05d936e82fe01b2b2cf987bf3aecb4792af5:

  block/raw-posix: Open file descriptor O_RDWR to work around glibc 
posix_fallocate emulation issue. (2015-10-02 13:48:29 +0200)


Block layer patches


Alberto Garcia (1):
  block: disable I/O limits at the beginning of bdrv_close()

Eric Blake (1):
  tests: Fix test 049 fallout from improved HMP error messages

Kevin Wolf (1):
  raw-win32: Fix write request error handling

Max Reitz (1):
  iotests: Fix test 128 for password-less sudo

Richard W.M. Jones (1):
  block/raw-posix: Open file descriptor O_RDWR to work around glibc 
posix_fallocate emulation issue.

 block.c| 11 ++-
 block/raw-posix.c  |  2 +-
 block/raw-win32.c  |  4 ++--
 tests/qemu-iotests/049.out |  1 +
 tests/qemu-iotests/128 |  9 -
 5 files changed, 18 insertions(+), 9 deletions(-)



[Qemu-block] [PULL 3/5] iotests: Fix test 128 for password-less sudo

2015-10-02 Thread Kevin Wolf
From: Max Reitz 

As of 934659c460d46c948cf348822fda1d38556ed9a4, $QEMU_IO is generally no
longer a program name, and therefore "sudo -n $QEMU_IO" will no longer
work.

Fix this by copying the qemu-io invocation function from common.config,
making it use $sudo for invoking $QEMU_IO_PROG, and then use that
function instead of $QEMU_IO.

Reported-by: Fam Zheng 
Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
Signed-off-by: Kevin Wolf 
---
 tests/qemu-iotests/128 | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/128 b/tests/qemu-iotests/128
index e2a0f2f..3d8107d 100755
--- a/tests/qemu-iotests/128
+++ b/tests/qemu-iotests/128
@@ -31,6 +31,11 @@ status=1 # failure is the default!
 devname="eiodev$$"
 sudo=""
 
+_sudo_qemu_io_wrapper()
+{
+(exec $sudo "$QEMU_IO_PROG" $QEMU_IO_OPTIONS "$@")
+}
+
 _setup_eiodev()
 {
# This test should either be run as root or with passwordless sudo
@@ -76,7 +81,9 @@ TEST_IMG="/dev/mapper/$devname"
 echo
 echo "== reading from error device =="
 # Opening image should succeed but the read operation should fail
-$sudo $QEMU_IO --format "$IMGFMT" --nocache -c "read 0 65536" "$TEST_IMG" | 
_filter_qemu_io
+_sudo_qemu_io_wrapper --format "$IMGFMT" --nocache \
+  -c "read 0 65536" "$TEST_IMG" \
+| _filter_qemu_io
 
 # success, all done
 echo "*** done"
-- 
1.8.3.1




[Qemu-block] [PULL 4/5] block: disable I/O limits at the beginning of bdrv_close()

2015-10-02 Thread Kevin Wolf
From: Alberto Garcia 

Disabling I/O limits from a BDS also drains all pending throttled
requests, so it should be done at the beginning of bdrv_close() with
the rest of the bdrv_drain() calls before the BlockDriver is closed.

Signed-off-by: Alberto Garcia 
Signed-off-by: Kevin Wolf 
---
 block.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/block.c b/block.c
index 6268e37..1f90b47 100644
--- a/block.c
+++ b/block.c
@@ -1907,6 +1907,12 @@ void bdrv_close(BlockDriverState *bs)
 if (bs->job) {
 block_job_cancel_sync(bs->job);
 }
+
+/* Disable I/O limits and drain all pending throttled requests */
+if (bs->io_limits_enabled) {
+bdrv_io_limits_disable(bs);
+}
+
 bdrv_drain(bs); /* complete I/O */
 bdrv_flush(bs);
 bdrv_drain(bs); /* in case flush left pending I/O */
@@ -1958,11 +1964,6 @@ void bdrv_close(BlockDriverState *bs)
 blk_dev_change_media_cb(bs->blk, false);
 }
 
-/*throttling disk I/O limits*/
-if (bs->io_limits_enabled) {
-bdrv_io_limits_disable(bs);
-}
-
 QLIST_FOREACH_SAFE(ban, >aio_notifiers, list, ban_next) {
 g_free(ban);
 }
-- 
1.8.3.1




[Qemu-block] [PULL 5/5] block/raw-posix: Open file descriptor O_RDWR to work around glibc posix_fallocate emulation issue.

2015-10-02 Thread Kevin Wolf
From: "Richard W.M. Jones" 

  https://bugzilla.redhat.com/show_bug.cgi?id=1265196

The following command fails on an NFS mountpoint:

  $ qemu-img create -f qcow2 -o preallocation=falloc disk.img 262144
  Formatting 'disk.img', fmt=qcow2 size=262144 encryption=off 
cluster_size=65536 preallocation='falloc' lazy_refcounts=off
  qemu-img: disk.img: Could not preallocate data for the new file: Bad file 
descriptor

The reason turns out to be because NFS doesn't support the
posix_fallocate call.  glibc emulates it instead.  However glibc's
emulation involves using the pread(2) syscall.  The pread syscall
fails with EBADF if the file descriptor is opened without the read
open-flag (ie. open (..., O_WRONLY)).

I contacted glibc upstream about this, and their response is here:

  https://bugzilla.redhat.com/show_bug.cgi?id=1265196#c9

There are two possible fixes: Use Linux fallocate directly, or (this
fix) work around the problem in qemu by opening the file with O_RDWR
instead of O_WRONLY.

Signed-off-by: Richard W.M. Jones 
BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1265196
Reviewed-by: Jeff Cody 
Reviewed-by: Eric Blake 
Signed-off-by: Kevin Wolf 
---
 block/raw-posix.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 30df8ad..86f8562 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -1648,7 +1648,7 @@ static int raw_create(const char *filename, QemuOpts 
*opts, Error **errp)
 goto out;
 }
 
-fd = qemu_open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY,
+fd = qemu_open(filename, O_RDWR | O_CREAT | O_TRUNC | O_BINARY,
0644);
 if (fd < 0) {
 result = -errno;
-- 
1.8.3.1




Re: [Qemu-block] [PULL 0/1] Block job patches

2015-10-02 Thread Peter Maydell
On 1 October 2015 at 20:05, Jeff Cody  wrote:
> The following changes since commit fa500928ad9da6dd570918e3dfca13c029af07a8:
>
>   Merge remote-tracking branch 'remotes/juanquintela/tags/migration/20150930' 
> into staging (2015-10-01 10:49:38 +0100)
>
> are available in the git repository at:
>
>
>   g...@github.com:codyprime/qemu-kvm-jtc.git tags/block-pull-request
>
> for you to fetch changes up to 5279efebcf8f8fbf2ed2feed63cdb9d375c7cd07:
>
>   block: mirror - fix full sync mode when target does not support zero init 
> (2015-10-01 15:02:21 -0400)
>
> 
> Block job patches
> 
>
> Jeff Cody (1):
>   block: mirror - fix full sync mode when target does not support zero
> init
>
>  block/mirror.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>

Applied, thanks.

-- PMM



Re: [Qemu-block] [PATCH] gluster: allocate GlusterAIOCBs on the stack

2015-10-02 Thread Kevin Wolf
Am 01.10.2015 um 13:04 hat Paolo Bonzini geschrieben:
> This is simpler now that the driver has been converted to coroutines.
> 
> Signed-off-by: Paolo Bonzini 

Reviewed-by: Kevin Wolf 



[Qemu-block] [PATCH v2 10/22] block: Add average I/O queue depth to BlockDeviceTimedStats

2015-10-02 Thread Alberto Garcia
This patch adds two new fields to BlockDeviceTimedStats that track the
average number of pending read and write requests for a block device.

The values are calculated for the period of time defined for that
interval.

Signed-off-by: Alberto Garcia 
---
 block/accounting.c   | 12 
 block/qapi.c |  5 +
 include/block/accounting.h   |  2 ++
 include/qemu/timed-average.h |  1 +
 qapi/block-core.json |  9 -
 qmp-commands.hx  |  6 ++
 util/timed-average.c | 17 +
 7 files changed, 51 insertions(+), 1 deletion(-)

diff --git a/block/accounting.c b/block/accounting.c
index c74a473..d94ebed 100644
--- a/block/accounting.c
+++ b/block/accounting.c
@@ -138,3 +138,15 @@ int64_t block_acct_idle_time_ns(BlockAcctStats *stats)
 {
 return qemu_clock_get_ns(clock_type) - stats->last_access_time_ns;
 }
+
+double block_acct_queue_depth(BlockAcctTimedStats *stats,
+  enum BlockAcctType type)
+{
+uint64_t sum, elapsed;
+
+assert(type < BLOCK_MAX_IOTYPE);
+
+sum = timed_average_sum(>latency[type], );
+
+return (double) sum / elapsed;
+}
diff --git a/block/qapi.c b/block/qapi.c
index fdddb45..2fec5e1 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -404,6 +404,11 @@ static BlockStats *bdrv_query_stats(const BlockDriverState 
*bs,
 dev_stats->min_flush_latency_ns = timed_average_min(fl);
 dev_stats->max_flush_latency_ns = timed_average_max(fl);
 dev_stats->avg_flush_latency_ns = timed_average_avg(fl);
+
+dev_stats->avg_rd_queue_depth =
+block_acct_queue_depth(ts, BLOCK_ACCT_READ);
+dev_stats->avg_wr_queue_depth =
+block_acct_queue_depth(ts, BLOCK_ACCT_WRITE);
 }
 }
 
diff --git a/include/block/accounting.h b/include/block/accounting.h
index 09605bb..f41ddde 100644
--- a/include/block/accounting.h
+++ b/include/block/accounting.h
@@ -78,5 +78,7 @@ void block_acct_invalid(BlockAcctStats *stats, enum 
BlockAcctType type);
 void block_acct_merge_done(BlockAcctStats *stats, enum BlockAcctType type,
int num_requests);
 int64_t block_acct_idle_time_ns(BlockAcctStats *stats);
+double block_acct_queue_depth(BlockAcctTimedStats *stats,
+  enum BlockAcctType type);
 
 #endif
diff --git a/include/qemu/timed-average.h b/include/qemu/timed-average.h
index f1cdddc..364bf88 100644
--- a/include/qemu/timed-average.h
+++ b/include/qemu/timed-average.h
@@ -59,5 +59,6 @@ void timed_average_account(TimedAverage *ta, uint64_t value);
 uint64_t timed_average_min(TimedAverage *ta);
 uint64_t timed_average_avg(TimedAverage *ta);
 uint64_t timed_average_max(TimedAverage *ta);
+uint64_t timed_average_sum(TimedAverage *ta, uint64_t *elapsed);
 
 #endif
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 45f913b..c6db03b 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -450,6 +450,12 @@
 # @avg_flush_latency_ns: Average latency of flush operations in the
 #defined interval, in nanoseconds.
 #
+# @avg_rd_queue_depth: Average number of pending read operations
+#  in the defined interval.
+#
+# @avg_wr_queue_depth: Average number of pending write operations
+#  in the defined interval.
+#
 # Since: 2.5
 ##
 
@@ -458,7 +464,8 @@
 'max_rd_latency_ns': 'int', 'avg_rd_latency_ns': 'int',
 'min_wr_latency_ns': 'int', 'max_wr_latency_ns': 'int',
 'avg_wr_latency_ns': 'int', 'min_flush_latency_ns': 'int',
-'max_flush_latency_ns': 'int', 'avg_flush_latency_ns': 'int' } }
+'max_flush_latency_ns': 'int', 'avg_flush_latency_ns': 'int',
+'avg_rd_queue_depth': 'number', 'avg_wr_queue_depth': 'number' } }
 
 ##
 # @BlockDeviceStats:
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 33f2897..533edf5 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -2564,6 +2564,12 @@ Each json-object contain the following:
 - "avg_flush_latency_ns": average latency of flush operations
   in the defined interval, in
   nanoseconds (json-int)
+- "avg_rd_queue_depth": average number of pending read
+operations in the defined interval
+(json-number)
+- "avg_wr_queue_depth": average number of pending write
+operations in the defined interval
+(json-number).
 - "parent": Contains recursively the statistics of the underlying
 protocol (e.g. the host file for a qcow2 image). If there is
 no underlying protocol, this field is omitted
diff --git a/util/timed-average.c b/util/timed-average.c
index 1f4689e..4263b29 100644
--- a/util/timed-average.c
+++ b/util/timed-average.c
@@ -208,3 

[Qemu-block] [PATCH v2 21/22] scsi-disk: Account for failed operations

2015-10-02 Thread Alberto Garcia
Signed-off-by: Alberto Garcia 
---
 hw/scsi/scsi-disk.c | 46 +++---
 1 file changed, 31 insertions(+), 15 deletions(-)

diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index bada9a7..20a31a7 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -90,7 +90,7 @@ struct SCSIDiskState
 bool tray_locked;
 };
 
-static int scsi_handle_rw_error(SCSIDiskReq *r, int error);
+static int scsi_handle_rw_error(SCSIDiskReq *r, int error, bool acct_failed);
 
 static void scsi_free_request(SCSIRequest *req)
 {
@@ -169,18 +169,18 @@ static void scsi_aio_complete(void *opaque, int ret)
 
 assert(r->req.aiocb != NULL);
 r->req.aiocb = NULL;
-block_acct_done(blk_get_stats(s->qdev.conf.blk), >acct);
 if (r->req.io_canceled) {
 scsi_req_cancel_complete(>req);
 goto done;
 }
 
 if (ret < 0) {
-if (scsi_handle_rw_error(r, -ret)) {
+if (scsi_handle_rw_error(r, -ret, true)) {
 goto done;
 }
 }
 
+block_acct_done(blk_get_stats(s->qdev.conf.blk), >acct);
 scsi_req_complete(>req, GOOD);
 
 done:
@@ -247,7 +247,7 @@ static void scsi_dma_complete_noio(SCSIDiskReq *r, int ret)
 }
 
 if (ret < 0) {
-if (scsi_handle_rw_error(r, -ret)) {
+if (scsi_handle_rw_error(r, -ret, false)) {
 goto done;
 }
 }
@@ -273,7 +273,11 @@ static void scsi_dma_complete(void *opaque, int ret)
 assert(r->req.aiocb != NULL);
 r->req.aiocb = NULL;
 
-block_acct_done(blk_get_stats(s->qdev.conf.blk), >acct);
+if (ret < 0) {
+block_acct_failed(blk_get_stats(s->qdev.conf.blk), >acct);
+} else {
+block_acct_done(blk_get_stats(s->qdev.conf.blk), >acct);
+}
 scsi_dma_complete_noio(r, ret);
 }
 
@@ -285,18 +289,18 @@ static void scsi_read_complete(void * opaque, int ret)
 
 assert(r->req.aiocb != NULL);
 r->req.aiocb = NULL;
-block_acct_done(blk_get_stats(s->qdev.conf.blk), >acct);
 if (r->req.io_canceled) {
 scsi_req_cancel_complete(>req);
 goto done;
 }
 
 if (ret < 0) {
-if (scsi_handle_rw_error(r, -ret)) {
+if (scsi_handle_rw_error(r, -ret, true)) {
 goto done;
 }
 }
 
+block_acct_done(blk_get_stats(s->qdev.conf.blk), >acct);
 DPRINTF("Data ready tag=0x%x len=%zd\n", r->req.tag, r->qiov.size);
 
 n = r->qiov.size / 512;
@@ -322,7 +326,7 @@ static void scsi_do_read(SCSIDiskReq *r, int ret)
 }
 
 if (ret < 0) {
-if (scsi_handle_rw_error(r, -ret)) {
+if (scsi_handle_rw_error(r, -ret, false)) {
 goto done;
 }
 }
@@ -355,7 +359,11 @@ static void scsi_do_read_cb(void *opaque, int ret)
 assert (r->req.aiocb != NULL);
 r->req.aiocb = NULL;
 
-block_acct_done(blk_get_stats(s->qdev.conf.blk), >acct);
+if (ret < 0) {
+block_acct_failed(blk_get_stats(s->qdev.conf.blk), >acct);
+} else {
+block_acct_done(blk_get_stats(s->qdev.conf.blk), >acct);
+}
 scsi_do_read(opaque, ret);
 }
 
@@ -407,7 +415,7 @@ static void scsi_read_data(SCSIRequest *req)
  * scsi_handle_rw_error always manages its reference counts, independent
  * of the return value.
  */
-static int scsi_handle_rw_error(SCSIDiskReq *r, int error)
+static int scsi_handle_rw_error(SCSIDiskReq *r, int error, bool acct_failed)
 {
 bool is_read = (r->req.cmd.mode == SCSI_XFER_FROM_DEV);
 SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
@@ -415,6 +423,9 @@ static int scsi_handle_rw_error(SCSIDiskReq *r, int error)
is_read, error);
 
 if (action == BLOCK_ERROR_ACTION_REPORT) {
+if (acct_failed) {
+block_acct_failed(blk_get_stats(s->qdev.conf.blk), >acct);
+}
 switch (error) {
 case ENOMEDIUM:
 scsi_check_condition(r, SENSE_CODE(NO_MEDIUM));
@@ -452,7 +463,7 @@ static void scsi_write_complete_noio(SCSIDiskReq *r, int 
ret)
 }
 
 if (ret < 0) {
-if (scsi_handle_rw_error(r, -ret)) {
+if (scsi_handle_rw_error(r, -ret, false)) {
 goto done;
 }
 }
@@ -481,7 +492,11 @@ static void scsi_write_complete(void * opaque, int ret)
 assert (r->req.aiocb != NULL);
 r->req.aiocb = NULL;
 
-block_acct_done(blk_get_stats(s->qdev.conf.blk), >acct);
+if (ret < 0) {
+block_acct_failed(blk_get_stats(s->qdev.conf.blk), >acct);
+} else {
+block_acct_done(blk_get_stats(s->qdev.conf.blk), >acct);
+}
 scsi_write_complete_noio(r, ret);
 }
 
@@ -1592,7 +1607,7 @@ static void scsi_unmap_complete_noio(UnmapCBData *data, 
int ret)
 }
 
 if (ret < 0) {
-if (scsi_handle_rw_error(r, -ret)) {
+if (scsi_handle_rw_error(r, -ret, false)) {
 goto done;
 }
 }
@@ -1696,18 +1711,19 @@ static void scsi_write_same_complete(void *opaque, int 
ret)
 
 assert(r->req.aiocb 

[Qemu-block] [PATCH v2 20/22] macio: Account for failed operations

2015-10-02 Thread Alberto Garcia
Signed-off-by: Alberto Garcia 
---
 hw/ide/macio.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/hw/ide/macio.c b/hw/ide/macio.c
index 66ac2ba..176e331 100644
--- a/hw/ide/macio.c
+++ b/hw/ide/macio.c
@@ -286,7 +286,11 @@ static void pmac_ide_atapi_transfer_cb(void *opaque, int 
ret)
 return;
 
 done:
-block_acct_done(blk_get_stats(s->blk), >acct);
+if (ret < 0) {
+block_acct_failed(blk_get_stats(s->blk), >acct);
+} else {
+block_acct_done(blk_get_stats(s->blk), >acct);
+}
 io->dma_end(opaque);
 
 return;
@@ -348,7 +352,11 @@ static void pmac_ide_transfer_cb(void *opaque, int ret)
 
 done:
 if (s->dma_cmd == IDE_DMA_READ || s->dma_cmd == IDE_DMA_WRITE) {
-block_acct_done(blk_get_stats(s->blk), >acct);
+if (ret < 0) {
+block_acct_failed(blk_get_stats(s->blk), >acct);
+} else {
+block_acct_done(blk_get_stats(s->blk), >acct);
+}
 }
 io->dma_end(opaque);
 }
-- 
2.5.3




[Qemu-block] [PATCH v2 13/22] block: Use QEMU_CLOCK_VIRTUAL for the accounting code in qtest mode

2015-10-02 Thread Alberto Garcia
This patch switches to QEMU_CLOCK_VIRTUAL for the accounting code in
qtest mode, and makes the latency of the operation constant. This way we
can perform tests on the accounting code with reproducible results.

Signed-off-by: Alberto Garcia 
---
 block/accounting.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/block/accounting.c b/block/accounting.c
index d94ebed..8eb59fc 100644
--- a/block/accounting.c
+++ b/block/accounting.c
@@ -25,14 +25,20 @@
 #include "block/accounting.h"
 #include "block/block_int.h"
 #include "qemu/timer.h"
+#include "sysemu/qtest.h"
 
 static QEMUClockType clock_type = QEMU_CLOCK_REALTIME;
+static const int qtest_latency_ns = NANOSECONDS_PER_SECOND / 1000;
 
 void block_acct_init(BlockAcctStats *stats, bool account_invalid,
  bool account_failed)
 {
 stats->account_invalid = account_invalid;
 stats->account_failed = account_failed;
+
+if (qtest_enabled()) {
+clock_type = QEMU_CLOCK_VIRTUAL;
+}
 }
 
 void block_acct_cleanup(BlockAcctStats *stats)
@@ -84,6 +90,10 @@ void block_acct_done(BlockAcctStats *stats, BlockAcctCookie 
*cookie)
 int64_t time_ns = qemu_clock_get_ns(clock_type);
 int64_t latency_ns = time_ns - cookie->start_time_ns;
 
+if (qtest_enabled()) {
+latency_ns = qtest_latency_ns;
+}
+
 assert(cookie->type < BLOCK_MAX_IOTYPE);
 
 stats->nr_bytes[cookie->type] += cookie->bytes;
@@ -107,6 +117,10 @@ void block_acct_failed(BlockAcctStats *stats, 
BlockAcctCookie *cookie)
 int64_t time_ns = qemu_clock_get_ns(clock_type);
 int64_t latency_ns = time_ns - cookie->start_time_ns;
 
+if (qtest_enabled()) {
+latency_ns = qtest_latency_ns;
+}
+
 stats->total_time_ns[cookie->type] += latency_ns;
 stats->last_access_time_ns = time_ns;
 
-- 
2.5.3




[Qemu-block] [PATCH v2 05/22] block: Add idle_time_ns to BlockDeviceStats

2015-10-02 Thread Alberto Garcia
This patch adds the new field 'idle_time_ns' to the BlockDeviceStats
structure, indicating the time that has passed since the previous I/O
operation.

It also adds the block_acct_idle_time_ns() call, to ensure that all
references to the clock type used for accounting are in the same
place. This will later allow us to use a different clock for iotests.

Signed-off-by: Alberto Garcia 
---
 block/accounting.c | 12 ++--
 block/qapi.c   |  5 +
 hmp.c  |  4 +++-
 include/block/accounting.h |  2 ++
 qapi/block-core.json   |  6 +-
 qmp-commands.hx| 10 --
 6 files changed, 33 insertions(+), 6 deletions(-)

diff --git a/block/accounting.c b/block/accounting.c
index 6f4c0f1..d427fa8 100644
--- a/block/accounting.c
+++ b/block/accounting.c
@@ -40,12 +40,15 @@ void block_acct_start(BlockAcctStats *stats, 
BlockAcctCookie *cookie,
 
 void block_acct_done(BlockAcctStats *stats, BlockAcctCookie *cookie)
 {
+int64_t time_ns = qemu_clock_get_ns(clock_type);
+int64_t latency_ns = time_ns - cookie->start_time_ns;
+
 assert(cookie->type < BLOCK_MAX_IOTYPE);
 
 stats->nr_bytes[cookie->type] += cookie->bytes;
 stats->nr_ops[cookie->type]++;
-stats->total_time_ns[cookie->type] +=
-qemu_clock_get_ns(clock_type) - cookie->start_time_ns;
+stats->total_time_ns[cookie->type] += latency_ns;
+stats->last_access_time_ns = time_ns;
 }
 
 
@@ -55,3 +58,8 @@ void block_acct_merge_done(BlockAcctStats *stats, enum 
BlockAcctType type,
 assert(type < BLOCK_MAX_IOTYPE);
 stats->merged[type] += num_requests;
 }
+
+int64_t block_acct_idle_time_ns(BlockAcctStats *stats)
+{
+return qemu_clock_get_ns(clock_type) - stats->last_access_time_ns;
+}
diff --git a/block/qapi.c b/block/qapi.c
index e936ba7..66f2604 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -357,6 +357,11 @@ static BlockStats *bdrv_query_stats(const BlockDriverState 
*bs,
 s->stats->wr_total_time_ns = stats->total_time_ns[BLOCK_ACCT_WRITE];
 s->stats->rd_total_time_ns = stats->total_time_ns[BLOCK_ACCT_READ];
 s->stats->flush_total_time_ns = stats->total_time_ns[BLOCK_ACCT_FLUSH];
+
+s->stats->has_idle_time_ns = stats->last_access_time_ns > 0;
+if (s->stats->has_idle_time_ns) {
+s->stats->idle_time_ns = block_acct_idle_time_ns(stats);
+}
 }
 
 s->stats->wr_highest_offset = bs->wr_highest_offset;
diff --git a/hmp.c b/hmp.c
index 6e2bbc8..289c240 100644
--- a/hmp.c
+++ b/hmp.c
@@ -511,6 +511,7 @@ void hmp_info_blockstats(Monitor *mon, const QDict *qdict)
" flush_total_time_ns=%" PRId64
" rd_merged=%" PRId64
" wr_merged=%" PRId64
+   " idle_time_ns=%" PRId64
"\n",
stats->value->stats->rd_bytes,
stats->value->stats->wr_bytes,
@@ -521,7 +522,8 @@ void hmp_info_blockstats(Monitor *mon, const QDict *qdict)
stats->value->stats->rd_total_time_ns,
stats->value->stats->flush_total_time_ns,
stats->value->stats->rd_merged,
-   stats->value->stats->wr_merged);
+   stats->value->stats->wr_merged,
+   stats->value->stats->idle_time_ns);
 }
 
 qapi_free_BlockStatsList(stats_list);
diff --git a/include/block/accounting.h b/include/block/accounting.h
index 66637cd..4b2b999 100644
--- a/include/block/accounting.h
+++ b/include/block/accounting.h
@@ -40,6 +40,7 @@ typedef struct BlockAcctStats {
 uint64_t nr_ops[BLOCK_MAX_IOTYPE];
 uint64_t total_time_ns[BLOCK_MAX_IOTYPE];
 uint64_t merged[BLOCK_MAX_IOTYPE];
+int64_t last_access_time_ns;
 } BlockAcctStats;
 
 typedef struct BlockAcctCookie {
@@ -53,5 +54,6 @@ void block_acct_start(BlockAcctStats *stats, BlockAcctCookie 
*cookie,
 void block_acct_done(BlockAcctStats *stats, BlockAcctCookie *cookie);
 void block_acct_merge_done(BlockAcctStats *stats, enum BlockAcctType type,
int num_requests);
+int64_t block_acct_idle_time_ns(BlockAcctStats *stats);
 
 #endif
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 5f12af7..a529e2f 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -448,6 +448,10 @@
 # @wr_merged: Number of write requests that have been merged into another
 # request (Since 2.3).
 #
+# @idle_time_ns: #optional Time since the last I/O operation, in
+#miliseconds. If the field is absent it means that
+#there haven't been any operations yet (Since 2.5).
+#
 # Since: 0.14.0
 ##
 { 'struct': 'BlockDeviceStats',
@@ -455,7 +459,7 @@
'wr_operations': 'int', 'flush_operations': 'int',
'flush_total_time_ns': 'int', 'wr_total_time_ns': 'int',
'rd_total_time_ns': 'int', 'wr_highest_offset': 'int',
-   

[Qemu-block] [PATCH v2 18/22] atapi: Account for failed and invalid operations

2015-10-02 Thread Alberto Garcia
Signed-off-by: Alberto Garcia 
---
 hw/ide/atapi.c | 31 +++
 1 file changed, 19 insertions(+), 12 deletions(-)

diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
index 747f466..cf0b78e 100644
--- a/hw/ide/atapi.c
+++ b/hw/ide/atapi.c
@@ -108,27 +108,30 @@ static void cd_data_to_raw(uint8_t *buf, int lba)
 static int cd_read_sector(IDEState *s, int lba, uint8_t *buf, int sector_size)
 {
 int ret;
+block_acct_start(blk_get_stats(s->blk), >acct,
+ 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
 
 switch(sector_size) {
 case 2048:
-block_acct_start(blk_get_stats(s->blk), >acct,
- 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
 ret = blk_read(s->blk, (int64_t)lba << 2, buf, 4);
-block_acct_done(blk_get_stats(s->blk), >acct);
 break;
 case 2352:
-block_acct_start(blk_get_stats(s->blk), >acct,
- 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
 ret = blk_read(s->blk, (int64_t)lba << 2, buf + 16, 4);
-block_acct_done(blk_get_stats(s->blk), >acct);
-if (ret < 0)
-return ret;
-cd_data_to_raw(buf, lba);
+if (ret >= 0) {
+cd_data_to_raw(buf, lba);
+}
 break;
 default:
-ret = -EIO;
-break;
+block_acct_invalid(blk_get_stats(s->blk), BLOCK_ACCT_READ);
+return -EIO;
 }
+
+if (ret < 0) {
+block_acct_failed(blk_get_stats(s->blk), >acct);
+} else {
+block_acct_done(blk_get_stats(s->blk), >acct);
+}
+
 return ret;
 }
 
@@ -357,7 +360,11 @@ static void ide_atapi_cmd_read_dma_cb(void *opaque, int 
ret)
 return;
 
 eot:
-block_acct_done(blk_get_stats(s->blk), >acct);
+if (ret < 0) {
+block_acct_failed(blk_get_stats(s->blk), >acct);
+} else {
+block_acct_done(blk_get_stats(s->blk), >acct);
+}
 ide_set_inactive(s, false);
 }
 
-- 
2.5.3




[Qemu-block] [PATCH v2 17/22] xen_disk: Account for failed and invalid operations

2015-10-02 Thread Alberto Garcia
Signed-off-by: Alberto Garcia 
---
 hw/block/xen_disk.c | 23 ++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
index 4869518..02eda6e 100644
--- a/hw/block/xen_disk.c
+++ b/hw/block/xen_disk.c
@@ -537,7 +537,11 @@ static void qemu_aio_complete(void *opaque, int ret)
 break;
 }
 case BLKIF_OP_READ:
-block_acct_done(blk_get_stats(ioreq->blkdev->blk), >acct);
+if (ioreq->status == BLKIF_RSP_OKAY) {
+block_acct_done(blk_get_stats(ioreq->blkdev->blk), >acct);
+} else {
+block_acct_failed(blk_get_stats(ioreq->blkdev->blk), >acct);
+}
 break;
 case BLKIF_OP_DISCARD:
 default:
@@ -722,6 +726,23 @@ static void blk_handle_requests(struct XenBlkDev *blkdev)
 
 /* parse them */
 if (ioreq_parse(ioreq) != 0) {
+
+switch (ioreq->req.operation) {
+case BLKIF_OP_READ:
+block_acct_invalid(blk_get_stats(blkdev->blk),
+   BLOCK_ACCT_READ);
+break;
+case BLKIF_OP_WRITE:
+block_acct_invalid(blk_get_stats(blkdev->blk),
+   BLOCK_ACCT_WRITE);
+break;
+case BLKIF_OP_FLUSH_DISKCACHE:
+block_acct_invalid(blk_get_stats(blkdev->blk),
+   BLOCK_ACCT_FLUSH);
+default:
+break;
+};
+
 if (blk_send_response_one(ioreq)) {
 xen_be_send_notify(>xendev);
 }
-- 
2.5.3




[Qemu-block] [PATCH v2 22/22] block: Update copyright of the accounting code

2015-10-02 Thread Alberto Garcia
Signed-off-by: Alberto Garcia 
---
 block/accounting.c | 1 +
 include/block/accounting.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/block/accounting.c b/block/accounting.c
index 8eb59fc..ebe5ad6 100644
--- a/block/accounting.c
+++ b/block/accounting.c
@@ -2,6 +2,7 @@
  * QEMU System Emulator block accounting
  *
  * Copyright (c) 2011 Christoph Hellwig
+ * Copyright (c) 2015 Igalia, S.L.
  *
  * Permission is hereby granted, free of charge, to any person obtaining a copy
  * of this software and associated documentation files (the "Software"), to 
deal
diff --git a/include/block/accounting.h b/include/block/accounting.h
index f41ddde..0215a4a 100644
--- a/include/block/accounting.h
+++ b/include/block/accounting.h
@@ -2,6 +2,7 @@
  * QEMU System Emulator block accounting
  *
  * Copyright (c) 2011 Christoph Hellwig
+ * Copyright (c) 2015 Igalia, S.L.
  *
  * Permission is hereby granted, free of charge, to any person obtaining a copy
  * of this software and associated documentation files (the "Software"), to 
deal
-- 
2.5.3




Re: [Qemu-block] [Qemu-devel] [PATCH] block/raw-posix: Open file descriptor O_RDWR to work around glibc posix_fallocate emulation issue.

2015-10-02 Thread Peter Maydell
On 29 September 2015 at 16:54, Richard W.M. Jones  wrote:
>   https://bugzilla.redhat.com/show_bug.cgi?id=1265196
>
> The following command fails on an NFS mountpoint:
>
>   $ qemu-img create -f qcow2 -o preallocation=falloc disk.img 262144
>   Formatting 'disk.img', fmt=qcow2 size=262144 encryption=off 
> cluster_size=65536 preallocation='falloc' lazy_refcounts=off
>   qemu-img: disk.img: Could not preallocate data for the new file: Bad file 
> descriptor
>
> The reason turns out to be because NFS doesn't support the
> posix_fallocate call.  glibc emulates it instead.  However glibc's
> emulation involves using the pread(2) syscall.  The pread syscall
> fails with EBADF if the file descriptor is opened without the read
> open-flag (ie. open (..., O_WRONLY)).
>
> I contacted glibc upstream about this, and their response is here:
>
>   https://bugzilla.redhat.com/show_bug.cgi?id=1265196#c9

"You are not authorized to access bug #1265196."

Any chance of a working URL if we're going to immortalize it
in QEMU's git log?

thanks
-- PMM



[Qemu-block] [PATCH v2 16/22] virtio-blk: Account for failed and invalid operations

2015-10-02 Thread Alberto Garcia
Signed-off-by: Alberto Garcia 
---
 hw/block/virtio-blk.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index f9301ae..9c5cb67 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -76,7 +76,7 @@ static int virtio_blk_handle_rw_error(VirtIOBlockReq *req, 
int error,
 s->rq = req;
 } else if (action == BLOCK_ERROR_ACTION_REPORT) {
 virtio_blk_req_complete(req, VIRTIO_BLK_S_IOERR);
-block_acct_done(blk_get_stats(s->blk), >acct);
+block_acct_failed(blk_get_stats(s->blk), >acct);
 virtio_blk_free_request(req);
 }
 
@@ -536,6 +536,8 @@ void virtio_blk_handle_request(VirtIOBlockReq *req, 
MultiReqBuffer *mrb)
 if (!virtio_blk_sect_range_ok(req->dev, req->sector_num,
   req->qiov.size)) {
 virtio_blk_req_complete(req, VIRTIO_BLK_S_IOERR);
+block_acct_invalid(blk_get_stats(req->dev->blk),
+   is_write ? BLOCK_ACCT_WRITE : BLOCK_ACCT_READ);
 virtio_blk_free_request(req);
 return;
 }
-- 
2.5.3




[Qemu-block] [PATCH v2 19/22] ide: Account for failed and invalid operations

2015-10-02 Thread Alberto Garcia
Signed-off-by: Alberto Garcia 
---
 hw/ide/core.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/hw/ide/core.c b/hw/ide/core.c
index b559f1b..dd7e9c5 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -574,7 +574,6 @@ static void ide_sector_read_cb(void *opaque, int ret)
 if (ret == -ECANCELED) {
 return;
 }
-block_acct_done(blk_get_stats(s->blk), >acct);
 if (ret != 0) {
 if (ide_handle_rw_error(s, -ret, IDE_RETRY_PIO |
 IDE_RETRY_READ)) {
@@ -582,6 +581,8 @@ static void ide_sector_read_cb(void *opaque, int ret)
 }
 }
 
+block_acct_done(blk_get_stats(s->blk), >acct);
+
 n = s->nsector;
 if (n > s->req_nb_sectors) {
 n = s->req_nb_sectors;
@@ -621,6 +622,7 @@ static void ide_sector_read(IDEState *s)
 
 if (!ide_sect_range_ok(s, sector_num, n)) {
 ide_rw_error(s);
+block_acct_invalid(blk_get_stats(s->blk), BLOCK_ACCT_READ);
 return;
 }
 
@@ -672,6 +674,7 @@ static int ide_handle_rw_error(IDEState *s, int error, int 
op)
 assert(s->bus->retry_unit == s->unit);
 s->bus->error_status = op;
 } else if (action == BLOCK_ERROR_ACTION_REPORT) {
+block_acct_failed(blk_get_stats(s->blk), >acct);
 if (op & IDE_RETRY_DMA) {
 ide_dma_error(s);
 } else {
@@ -750,6 +753,7 @@ static void ide_dma_cb(void *opaque, int ret)
 if ((s->dma_cmd == IDE_DMA_READ || s->dma_cmd == IDE_DMA_WRITE) &&
 !ide_sect_range_ok(s, sector_num, n)) {
 ide_dma_error(s);
+block_acct_invalid(blk_get_stats(s->blk), s->acct.type);
 return;
 }
 
@@ -826,7 +830,6 @@ static void ide_sector_write_cb(void *opaque, int ret)
 if (ret == -ECANCELED) {
 return;
 }
-block_acct_done(blk_get_stats(s->blk), >acct);
 
 s->pio_aiocb = NULL;
 s->status &= ~BUSY_STAT;
@@ -837,6 +840,8 @@ static void ide_sector_write_cb(void *opaque, int ret)
 }
 }
 
+block_acct_done(blk_get_stats(s->blk), >acct);
+
 n = s->nsector;
 if (n > s->req_nb_sectors) {
 n = s->req_nb_sectors;
@@ -887,6 +892,7 @@ static void ide_sector_write(IDEState *s)
 
 if (!ide_sect_range_ok(s, sector_num, n)) {
 ide_rw_error(s);
+block_acct_invalid(blk_get_stats(s->blk), BLOCK_ACCT_WRITE);
 return;
 }
 
-- 
2.5.3




[Qemu-block] [PATCH v2 14/22] iotests: Add test for the block device statistics

2015-10-02 Thread Alberto Garcia
Signed-off-by: Alberto Garcia 
---
 tests/qemu-iotests/136 | 349 +
 tests/qemu-iotests/136.out |   5 +
 tests/qemu-iotests/group   |   1 +
 3 files changed, 355 insertions(+)
 create mode 100644 tests/qemu-iotests/136
 create mode 100644 tests/qemu-iotests/136.out

diff --git a/tests/qemu-iotests/136 b/tests/qemu-iotests/136
new file mode 100644
index 000..f574d83
--- /dev/null
+++ b/tests/qemu-iotests/136
@@ -0,0 +1,349 @@
+#!/usr/bin/env python
+#
+# Tests for block device statistics
+#
+# Copyright (C) 2015 Igalia, S.L.
+# Author: Alberto Garcia 
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+import iotests
+import os
+
+interval_length = 10
+nsec_per_sec = 10
+op_latency = nsec_per_sec / 1000 # See qtest_latency_ns in accounting.c
+bad_sector = 8192
+bad_offset = bad_sector * 512
+blkdebug_file = os.path.join(iotests.test_dir, 'blkdebug.conf')
+
+class BlockDeviceStatsTestCase(iotests.QMPTestCase):
+test_img = "null-aio://"
+total_rd_bytes = 0
+total_rd_ops = 0
+total_wr_bytes = 0
+total_wr_ops = 0
+total_wr_merged = 0
+total_flush_ops = 0
+failed_rd_ops = 0
+failed_wr_ops = 0
+invalid_rd_ops = 0
+invalid_wr_ops = 0
+wr_highest_offset = 0
+account_invalid = False
+account_failed = False
+
+def blockstats(self, device):
+result = self.vm.qmp("query-blockstats")
+for r in result['return']:
+if r['device'] == device:
+return r['stats']
+raise Exception("Device not found for blockstats: %s" % device)
+
+def create_blkdebug_file(self):
+file = open(blkdebug_file, 'w')
+file.write('''
+[inject-error]
+event = "read_aio"
+errno = "5"
+sector = "%d"
+
+[inject-error]
+event = "write_aio"
+errno = "5"
+sector = "%d"
+''' % (bad_sector, bad_sector))
+file.close()
+
+def setUp(self):
+drive_args = []
+drive_args.append("stats-intervals=%d" % interval_length)
+drive_args.append("stats-account-invalid=%s" %
+  (self.account_invalid and "on" or "off"))
+drive_args.append("stats-account-failed=%s" %
+  (self.account_failed and "on" or "off"))
+self.create_blkdebug_file()
+self.vm = iotests.VM().add_drive('blkdebug:%s:%s ' %
+ (blkdebug_file, self.test_img),
+ ','.join(drive_args))
+self.vm.launch()
+# Set an initial value for the clock
+self.vm.qtest("clock_step %d" % nsec_per_sec)
+
+def tearDown(self):
+self.vm.shutdown()
+os.remove(blkdebug_file)
+
+def accounted_ops(self, read = False, write = False, flush = False):
+ops = 0
+if write:
+ops += self.total_wr_ops
+if self.account_failed:
+ops += self.failed_wr_ops
+if self.account_invalid:
+ops += self.invalid_wr_ops
+if read:
+ops += self.total_rd_ops
+if self.account_failed:
+ops += self.failed_rd_ops
+if self.account_invalid:
+ops += self.invalid_rd_ops
+if flush:
+ops += self.total_flush_ops
+return ops
+
+def accounted_latency(self, read = False, write = False, flush = False):
+latency = 0
+if write:
+latency += self.total_wr_ops * op_latency
+if self.account_failed:
+latency += self.failed_wr_ops * op_latency
+if read:
+latency += self.total_rd_ops * op_latency
+if self.account_failed:
+latency += self.failed_rd_ops * op_latency
+if flush:
+latency += self.total_flush_ops * op_latency
+return latency
+
+def check_values(self):
+stats = self.blockstats('drive0')
+
+# Check that the totals match with what we have calculated
+self.assertEqual(self.total_rd_bytes, stats['rd_bytes'])
+self.assertEqual(self.total_wr_bytes, stats['wr_bytes'])
+self.assertEqual(self.total_rd_ops, stats['rd_operations'])
+self.assertEqual(self.total_wr_ops, stats['wr_operations'])
+self.assertEqual(self.total_flush_ops, stats['flush_operations'])
+

[Qemu-block] [PATCH v2 12/22] qemu-io: Account for failed, invalid and flush operations

2015-10-02 Thread Alberto Garcia
Signed-off-by: Alberto Garcia 
---
 qemu-io-cmds.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
index d6572a8..f8f02ab 100644
--- a/qemu-io-cmds.c
+++ b/qemu-io-cmds.c
@@ -1364,6 +1364,7 @@ static void aio_write_done(void *opaque, int ret)
 
 if (ret < 0) {
 printf("aio_write failed: %s\n", strerror(-ret));
+block_acct_failed(blk_get_stats(ctx->blk), >acct);
 goto out;
 }
 
@@ -1392,6 +1393,7 @@ static void aio_read_done(void *opaque, int ret)
 
 if (ret < 0) {
 printf("readv failed: %s\n", strerror(-ret));
+block_acct_failed(blk_get_stats(ctx->blk), >acct);
 goto out;
 }
 
@@ -1505,6 +1507,7 @@ static int aio_read_f(BlockBackend *blk, int argc, char 
**argv)
 if (ctx->offset & 0x1ff) {
 printf("offset %" PRId64 " is not sector aligned\n",
ctx->offset);
+block_acct_invalid(blk_get_stats(blk), BLOCK_ACCT_READ);
 g_free(ctx);
 return 0;
 }
@@ -1512,6 +1515,7 @@ static int aio_read_f(BlockBackend *blk, int argc, char 
**argv)
 nr_iov = argc - optind;
 ctx->buf = create_iovec(blk, >qiov, [optind], nr_iov, 0xab);
 if (ctx->buf == NULL) {
+block_acct_invalid(blk_get_stats(blk), BLOCK_ACCT_READ);
 g_free(ctx);
 return 0;
 }
@@ -1600,6 +1604,7 @@ static int aio_write_f(BlockBackend *blk, int argc, char 
**argv)
 if (ctx->offset & 0x1ff) {
 printf("offset %" PRId64 " is not sector aligned\n",
ctx->offset);
+block_acct_invalid(blk_get_stats(blk), BLOCK_ACCT_WRITE);
 g_free(ctx);
 return 0;
 }
@@ -1607,6 +1612,7 @@ static int aio_write_f(BlockBackend *blk, int argc, char 
**argv)
 nr_iov = argc - optind;
 ctx->buf = create_iovec(blk, >qiov, [optind], nr_iov, pattern);
 if (ctx->buf == NULL) {
+block_acct_invalid(blk_get_stats(blk), BLOCK_ACCT_WRITE);
 g_free(ctx);
 return 0;
 }
@@ -1621,7 +1627,10 @@ static int aio_write_f(BlockBackend *blk, int argc, char 
**argv)
 
 static int aio_flush_f(BlockBackend *blk, int argc, char **argv)
 {
+BlockAcctCookie cookie = { 0 };
+block_acct_start(blk_get_stats(blk), , 0, BLOCK_ACCT_FLUSH);
 blk_drain_all();
+block_acct_done(blk_get_stats(blk), );
 return 0;
 }
 
-- 
2.5.3




Re: [Qemu-block] [Qemu-devel] [PATCH] block/raw-posix: Open file descriptor O_RDWR to work around glibc posix_fallocate emulation issue.

2015-10-02 Thread Richard W.M. Jones
On Fri, Oct 02, 2015 at 03:36:04PM +0100, Peter Maydell wrote:
> On 29 September 2015 at 16:54, Richard W.M. Jones  wrote:
> >   https://bugzilla.redhat.com/show_bug.cgi?id=1265196
> >
> > The following command fails on an NFS mountpoint:
> >
> >   $ qemu-img create -f qcow2 -o preallocation=falloc disk.img 262144
> >   Formatting 'disk.img', fmt=qcow2 size=262144 encryption=off 
> > cluster_size=65536 preallocation='falloc' lazy_refcounts=off
> >   qemu-img: disk.img: Could not preallocate data for the new file: Bad file 
> > descriptor
> >
> > The reason turns out to be because NFS doesn't support the
> > posix_fallocate call.  glibc emulates it instead.  However glibc's
> > emulation involves using the pread(2) syscall.  The pread syscall
> > fails with EBADF if the file descriptor is opened without the read
> > open-flag (ie. open (..., O_WRONLY)).
> >
> > I contacted glibc upstream about this, and their response is here:
> >
> >   https://bugzilla.redhat.com/show_bug.cgi?id=1265196#c9
> 
> "You are not authorized to access bug #1265196."
> 
> Any chance of a working URL if we're going to immortalize it
> in QEMU's git log?

Sorry about that.  I have now made the bug public -- there was
absolutely no reason why it was private before.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
libguestfs lets you edit virtual machines.  Supports shell scripting,
bindings from many languages.  http://libguestfs.org



[Qemu-block] [PATCH v2 06/22] block: Add "supports_stats" field to BlockStats

2015-10-02 Thread Alberto Garcia
query-blockstats always returns a BlockDeviceStats structure for each
BDS, regardless of whether it implements accounting or not. Since the
field is mandatory there's no way to tell that from the values alone.

This field solves that problem by indicating which BDSs support I/O
accounting.

Signed-off-by: Alberto Garcia 
---
 block/qapi.c | 2 ++
 qapi/block-core.json | 3 +++
 qmp-commands.hx  | 6 ++
 3 files changed, 11 insertions(+)

diff --git a/block/qapi.c b/block/qapi.c
index 66f2604..518b658 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -343,6 +343,8 @@ static BlockStats *bdrv_query_stats(const BlockDriverState 
*bs,
 s->node_name = g_strdup(bdrv_get_node_name(bs));
 }
 
+s->supports_stats = bs->blk != NULL;
+
 s->stats = g_malloc0(sizeof(*s->stats));
 if (bs->blk) {
 BlockAcctStats *stats = blk_get_stats(bs->blk);
diff --git a/qapi/block-core.json b/qapi/block-core.json
index a529e2f..903884b 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -471,6 +471,8 @@
 #
 # @node-name: #optional The node name of the device. (Since 2.3)
 #
+# @supports_stats: Whether the device supports I/O stats (Since 2.5)
+#
 # @stats:  A @BlockDeviceStats for the device.
 #
 # @parent: #optional This describes the file block device if it has one.
@@ -482,6 +484,7 @@
 ##
 { 'struct': 'BlockStats',
   'data': {'*device': 'str', '*node-name': 'str',
+   'supports_stats': 'bool',
'stats': 'BlockDeviceStats',
'*parent': 'BlockStats',
'*backing': 'BlockStats'} }
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 13c1bdc..2f48bb6 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -2496,6 +2496,7 @@ value is a json-array of all devices.
 Each json-object contain the following:
 
 - "device": device name (json-string)
+- "supports_stats": whether the device supports I/O stats (json-bool)
 - "stats": A json-object with the statistics information, it contains:
 - "rd_bytes": bytes read (json-int)
 - "wr_bytes": bytes written (json-int)
@@ -2528,6 +2529,7 @@ Example:
  {
 "device":"ide0-hd0",
 "parent":{
+   "supports_stats":true,
"stats":{
   "wr_highest_offset":3686448128,
   "wr_bytes":9786368,
@@ -2543,6 +2545,7 @@ Example:
   "idle_time_ns":2953431879
}
 },
+"supports_stats":true,
 "stats":{
"wr_highest_offset":2821110784,
"wr_bytes":9786368,
@@ -2560,6 +2563,7 @@ Example:
  },
  {
 "device":"ide1-cd0",
+"supports_stats":false,
 "stats":{
"wr_highest_offset":0,
"wr_bytes":0,
@@ -2576,6 +2580,7 @@ Example:
  },
  {
 "device":"floppy0",
+"supports_stats":false,
 "stats":{
"wr_highest_offset":0,
"wr_bytes":0,
@@ -2592,6 +2597,7 @@ Example:
  },
  {
 "device":"sd0",
+"supports_stats":false,
 "stats":{
"wr_highest_offset":0,
"wr_bytes":0,
-- 
2.5.3




[Qemu-block] [PATCH v2 01/22] xen_disk: Account for flush operations

2015-10-02 Thread Alberto Garcia
Currently both BLKIF_OP_WRITE and BLKIF_OP_FLUSH_DISKCACHE are being
accounted as write operations.

Signed-off-by: Alberto Garcia 
---
 hw/block/xen_disk.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
index 1bbc111..4869518 100644
--- a/hw/block/xen_disk.c
+++ b/hw/block/xen_disk.c
@@ -576,7 +576,9 @@ static int ioreq_runio_qemu_aio(struct ioreq *ioreq)
 }
 
 block_acct_start(blk_get_stats(blkdev->blk), >acct,
- ioreq->v.size, BLOCK_ACCT_WRITE);
+ ioreq->v.size,
+ ioreq->req.operation == BLKIF_OP_WRITE ?
+ BLOCK_ACCT_WRITE : BLOCK_ACCT_FLUSH);
 ioreq->aio_inflight++;
 blk_aio_writev(blkdev->blk, ioreq->start / BLOCK_SIZE,
>v, ioreq->v.size / BLOCK_SIZE,
-- 
2.5.3




[Qemu-block] [PATCH v2 08/22] block: Allow configuring whether to account failed and invalid ops

2015-10-02 Thread Alberto Garcia
This patch adds two options, "stats-account-invalid" and
"stats-account-failed", that can be used to decide whether invalid and
failed I/O operations must be used when collecting statistics for
latency and last access time.

Signed-off-by: Alberto Garcia 
---
 block/accounting.c | 24 +++-
 block/qapi.c   |  3 +++
 blockdev.c | 16 
 include/block/accounting.h |  5 +
 qapi/block-core.json   | 17 -
 qmp-commands.hx| 25 -
 6 files changed, 79 insertions(+), 11 deletions(-)

diff --git a/block/accounting.c b/block/accounting.c
index eb86a47..9584450 100644
--- a/block/accounting.c
+++ b/block/accounting.c
@@ -28,6 +28,13 @@
 
 static QEMUClockType clock_type = QEMU_CLOCK_REALTIME;
 
+void block_acct_init(BlockAcctStats *stats, bool account_invalid,
+ bool account_failed)
+{
+stats->account_invalid = account_invalid;
+stats->account_failed = account_failed;
+}
+
 void block_acct_start(BlockAcctStats *stats, BlockAcctCookie *cookie,
   int64_t bytes, enum BlockAcctType type)
 {
@@ -53,13 +60,17 @@ void block_acct_done(BlockAcctStats *stats, BlockAcctCookie 
*cookie)
 
 void block_acct_failed(BlockAcctStats *stats, BlockAcctCookie *cookie)
 {
-int64_t time_ns = qemu_clock_get_ns(clock_type);
-
 assert(cookie->type < BLOCK_MAX_IOTYPE);
 
 stats->failed_ops[cookie->type]++;
-stats->total_time_ns[cookie->type] += time_ns - cookie->start_time_ns;
-stats->last_access_time_ns = time_ns;
+
+if (stats->account_failed) {
+int64_t time_ns = qemu_clock_get_ns(clock_type);
+int64_t latency_ns = time_ns - cookie->start_time_ns;
+
+stats->total_time_ns[cookie->type] += latency_ns;
+stats->last_access_time_ns = time_ns;
+}
 }
 
 void block_acct_invalid(BlockAcctStats *stats, enum BlockAcctType type)
@@ -67,7 +78,10 @@ void block_acct_invalid(BlockAcctStats *stats, enum 
BlockAcctType type)
 assert(type < BLOCK_MAX_IOTYPE);
 
 stats->invalid_ops[type]++;
-stats->last_access_time_ns = qemu_clock_get_ns(clock_type);
+
+if (stats->account_invalid) {
+stats->last_access_time_ns = qemu_clock_get_ns(clock_type);
+}
 }
 
 void block_acct_merge_done(BlockAcctStats *stats, enum BlockAcctType type,
diff --git a/block/qapi.c b/block/qapi.c
index 1b787ba..7dd9128 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -374,6 +374,9 @@ static BlockStats *bdrv_query_stats(const BlockDriverState 
*bs,
 if (s->stats->has_idle_time_ns) {
 s->stats->idle_time_ns = block_acct_idle_time_ns(stats);
 }
+
+s->stats->account_invalid = stats->account_invalid;
+s->stats->account_failed = stats->account_failed;
 }
 
 s->stats->wr_highest_offset = bs->wr_highest_offset;
diff --git a/blockdev.c b/blockdev.c
index 4731843..61a80fb 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -461,6 +461,7 @@ static BlockBackend *blockdev_init(const char *file, QDict 
*bs_opts,
 const char *buf;
 int bdrv_flags = 0;
 int on_read_error, on_write_error;
+bool account_invalid, account_failed;
 BlockBackend *blk;
 BlockDriverState *bs;
 ThrottleConfig cfg;
@@ -497,6 +498,9 @@ static BlockBackend *blockdev_init(const char *file, QDict 
*bs_opts,
 /* extract parameters */
 snapshot = qemu_opt_get_bool(opts, "snapshot", 0);
 
+account_invalid = qemu_opt_get_bool(opts, "stats-account-invalid", true);
+account_failed = qemu_opt_get_bool(opts, "stats-account-failed", true);
+
 extract_common_blockdev_options(opts, _flags, , _zeroes,
 _group, );
 if (error) {
@@ -597,6 +601,8 @@ static BlockBackend *blockdev_init(const char *file, QDict 
*bs_opts,
 if (bdrv_key_required(bs)) {
 autostart = 0;
 }
+
+block_acct_init(blk_get_stats(blk), account_invalid, account_failed);
 }
 
 blk_set_on_error(blk, on_read_error, on_write_error);
@@ -3528,6 +3534,16 @@ QemuOptsList qemu_common_drive_opts = {
 .name = "detect-zeroes",
 .type = QEMU_OPT_STRING,
 .help = "try to optimize zero writes (off, on, unmap)",
+},{
+.name = "stats-account-invalid",
+.type = QEMU_OPT_BOOL,
+.help = "whether to account for invalid I/O operations "
+"in the statistics",
+},{
+.name = "stats-account-failed",
+.type = QEMU_OPT_BOOL,
+.help = "whether to account for failed I/O operations "
+"in the statistics",
 },
 { /* end of list */ }
 },
diff --git a/include/block/accounting.h b/include/block/accounting.h
index b50e3cc..0d9b076 100644
--- a/include/block/accounting.h
+++ b/include/block/accounting.h
@@ -25,6 +25,7 @@
 #define BLOCK_ACCOUNTING_H
 
 #include 
+#include 
 
 #include 

Re: [Qemu-block] [Qemu-devel] [PATCH] qtest/ide-test: ppc64be correction for ATAPI tests

2015-10-02 Thread John Snow


On 10/02/2015 06:55 AM, Kevin Wolf wrote:
> Am 28.09.2015 um 19:38 hat John Snow geschrieben:
>> the 16bit ide data register is LE by definition.
>>
>> Signed-off-by: John Snow 
>> ---
>>  tests/ide-test.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/tests/ide-test.c b/tests/ide-test.c
>> index 5594738..b6e9e1a 100644
>> --- a/tests/ide-test.c
>> +++ b/tests/ide-test.c
>> @@ -633,7 +633,7 @@ static void send_scsi_cdb_read10(uint64_t lba, int 
>> nblocks)
>>  
>>  /* Send Packet */
>>  for (i = 0; i < sizeof(Read10CDB)/2; i++) {
>> -outw(IDE_BASE + reg_data, ((uint16_t *))[i]);
>> +outw(IDE_BASE + reg_data, cpu_to_le16(((uint16_t *))[i]));
>>  }
>>  }
>>  
>> @@ -733,7 +733,7 @@ static void cdrom_pio_impl(int nblocks)
>>  size_t offset = i * (limit / 2);
>>  size_t rem = (rxsize / 2) - offset;
>>  for (j = 0; j < MIN((limit / 2), rem); j++) {
>> -rx[offset + j] = inw(IDE_BASE + reg_data);
>> +rx[offset + j] = le16_to_cpu(inw(IDE_BASE + reg_data));
>>  }
>>  ide_wait_intr(IDE_PRIMARY_IRQ);
>>  }
> 
> Why doesn't the access in test_identify() need a fix?
> 
> Kevin
> 

The strings are stored as BE16 chunks but transmitted via an LE16
register, so if we want to decode the original string, we can apply
*either* an LE16 or a BE16 swap to obtain the byte-ordered string.

Yes, that took me a minute to figure out.

We *could* apply a le16_to_cpu filter to re-obtain the raw original
data, but then we'd just have to run it back through be16_to_cpu to get
the string out anyway.

In this test, we just cheat and run be16_to_cpu, which works.

--js



Re: [Qemu-block] [PATCH] qtest/ide-test: ppc64be correction for ATAPI tests

2015-10-02 Thread Kevin Wolf
Am 28.09.2015 um 19:38 hat John Snow geschrieben:
> the 16bit ide data register is LE by definition.
> 
> Signed-off-by: John Snow 

Reviewed-by: Kevin Wolf 



Re: [Qemu-block] [PATCH v2 11/16] block-backend: Add blk_set_bs()

2015-10-02 Thread Max Reitz
On 01.10.2015 15:13, Kevin Wolf wrote:
> It allows changing the BlockDriverState that a BlockBackend points to.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  block/block-backend.c | 17 +
>  include/block/block_int.h |  2 ++
>  2 files changed, 19 insertions(+)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v2 13/16] block: Implement bdrv_append() without bdrv_swap()

2015-10-02 Thread Max Reitz
On 01.10.2015 15:13, Kevin Wolf wrote:
> Remember all parent nodes and just change the pointers there instead of
> swapping the contents of the BlockDriverState.
> 
> Handling of snapshot=on must be moved further down in bdrv_open()
> because *pbs (which is the bs pointer in the BlockBackend) must already
> be set before bdrv_append() is called. Otherwise bdrv_append() changes
> the BB's pointer to the temporary snapshot, but bdrv_open() overwrites
> it with the read-only original image.
> 
> We also need to be careful to update callers as the interface changes
> (becomes less insane): Previously, the meaning of the two parameters was
> inverted when bdrv_append() returns. Now any BDS pointers keep pointing
> to the same node.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  block.c| 112 
> +
>  blockdev.c |   2 +-
>  2 files changed, 85 insertions(+), 29 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [Qemu-devel] [PATCH 1/3] block: prohibit migration during BlockJobs

2015-10-02 Thread John Snow


On 10/01/2015 02:03 PM, Paolo Bonzini wrote:
> 
> 
> On 01/10/2015 18:34, John Snow wrote:
>> Unless we can prove this to be safe for specific cases,
>> the default should be to prohibit migration during BlockJobs.
> 
> Block jobs do not affect the current block, only other block device,
> hence they *are* safe for migration.
> 

Can you elaborate for me here?

> What you want, I think, is the target not to be garbage when migration
> ends.  Based on this you can block specific cases, namely mirror which
> you already do allow (patch 2) and backup except for sync='none'.
> 
> Paolo
> 

It would be nice if the target wasn't garbage, yes :)

I allow mirror in specific circumstances -- you can't start a mirror,
but if an existing mirror has hit the sync phase, that's OK.

I can try to do a more exhaustive audit of what should and should not
work, but my thought was "guilty before proven innocent."

>> In conjunction with
>> "migration: disallow_migrate_add_blocker during migration",
>> this should be sufficient to disallow the blockjob from starting
>> in the event of an in-progress migration.
> 




Re: [Qemu-block] [Qemu-devel] [PATCH 3/3] block: prohibit migration during transactions

2015-10-02 Thread John Snow


On 10/01/2015 02:01 PM, Paolo Bonzini wrote:
> 
> 
> On 01/10/2015 18:34, John Snow wrote:
>> +
>> +error_setg(, "Block device(s) are in use by a Block 
>> Transaction");
> 
> s/Block Transaction/transaction command/
> 
> But how can migration start during a transaction?
> 

Well, it definitely can't now ! :)

This is actually more for the other case: The migration starts, and we
want to prohibit snapshots and transactions during that period. Together
with the patch this depends on, we accomplish that.

The workflow of snapshot/transaction-before-migration isn't currently
possible.

>> +ret = migrate_add_blocker(blocker, errp);
>> +if (ret < 0) {
>> +goto cleanup_mig;
>> +}
>>  
>>  QSIMPLEQ_HEAD(snap_bdrv_states, BlkTransactionState) snap_bdrv_states;
>>  QSIMPLEQ_INIT(_bdrv_states);
>> @@ -1814,6 +1823,9 @@ exit:
>>  }
>>  g_free(state);
>>  }
>> + cleanup_mig:
>> +migrate_del_blocker(blocker);
>> +error_free(blocker);
> 



Re: [Qemu-block] [PATCH v2 07/16] block: Convert bs->backing_hd to BdrvChild

2015-10-02 Thread Max Reitz
On 01.10.2015 15:13, Kevin Wolf wrote:
> This is the final step in converting all of the BlockDriverState
> pointers that block drivers use to BdrvChild.
> 
> After this patch, bs->children contains the full list of child nodes
> that are referenced by a given BDS, and these children are only
> referenced through BdrvChild, so that updating the pointer in there is
> enough for changing edges in the graph.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  block.c   | 116 
> +-
>  block/io.c|  24 +-
>  block/mirror.c|   6 +--
>  block/qapi.c  |   8 ++--
>  block/qcow.c  |   4 +-
>  block/qcow2-cluster.c |   4 +-
>  block/qcow2.c |   6 +--
>  block/qed.c   |  12 ++---
>  block/stream.c|   8 ++--
>  block/vmdk.c  |  21 +
>  block/vvfat.c |   6 +--
>  blockdev.c|   4 +-
>  include/block/block_int.h |  12 +++--
>  qemu-img.c|   4 +-
>  14 files changed, 125 insertions(+), 110 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v2 14/16] blockjob: Store device name at job creation

2015-10-02 Thread Max Reitz
On 01.10.2015 15:13, Kevin Wolf wrote:
> Some block jobs change the block device graph on completion. This means
> that the device that owns the job and originally was addressed with its
> device name may no longer be what the corresponding BlockBackend points
> to.
> 
> Previously, the effects of bdrv_swap() ensured that the job was (at
> least partially) transferred to the target image. Events that contain
> the device name could still use bdrv_get_device_name(job->bs) and get
> the same result.
> 
> After removing bdrv_swap(), this won't work any more. Instead, save the
> device name at job creation and use that copy for QMP events and
> anything else identifying the job.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  block/mirror.c   |  3 +--
>  blockjob.c   | 15 ---
>  include/block/blockjob.h |  8 
>  3 files changed, 17 insertions(+), 9 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v2 08/16] block: Manage backing file references in bdrv_set_backing_hd()

2015-10-02 Thread Max Reitz
On 01.10.2015 15:13, Kevin Wolf wrote:
> This simplifies the code somewhat, especially when dropping whole
> backing file subchains.
> 
> The exception is the mirroring code that does adventurous things with
> bdrv_swap() and in order to keep it working, I had to duplicate most of
> bdrv_set_backing_hd() locally. We'll get rid again of this ugliness
> shortly.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  block.c   | 35 ++-
>  block/mirror.c| 16 
>  block/stream.c| 30 +-
>  block/vvfat.c |  6 +-
>  include/block/block.h |  1 +
>  5 files changed, 33 insertions(+), 55 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v2 05/16] block: Convert bs->file to BdrvChild

2015-10-02 Thread Max Reitz
On 01.10.2015 15:13, Kevin Wolf wrote:
> This patch removes the temporary duplication between bs->file and
> bs->file_child by converting everything to BdrvChild.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  block.c   | 63 
> ++-
>  block/blkdebug.c  | 32 +---
>  block/blkverify.c | 33 ++---
>  block/bochs.c |  8 +++---
>  block/cloop.c | 10 
>  block/dmg.c   | 20 +++
>  block/io.c| 50 ++---
>  block/parallels.c | 38 ++--
>  block/qapi.c  |  2 +-
>  block/qcow.c  | 42 ---
>  block/qcow2-cache.c   | 11 +
>  block/qcow2-cluster.c | 38 
>  block/qcow2-refcount.c| 45 +
>  block/qcow2-snapshot.c| 30 +++---
>  block/qcow2.c | 62 --
>  block/qed-table.c |  4 +--
>  block/qed.c   | 32 
>  block/raw_bsd.c   | 40 +++---
>  block/snapshot.c  | 12 -
>  block/vdi.c   | 17 +++--
>  block/vhdx-log.c  | 25 ++-
>  block/vhdx.c  | 36 ++-
>  block/vmdk.c  | 27 ++--
>  block/vpc.c   | 34 +
>  include/block/block.h |  8 +-
>  include/block/block_int.h |  3 +--
>  26 files changed, 378 insertions(+), 344 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [Qemu-devel] [PATCH 1/5] ide/atapi: make PIO read requests async

2015-10-02 Thread John Snow


On 09/21/2015 08:25 AM, Peter Lieven wrote:
> PIO read requests on the ATAPI interface used to be sync blk requests.
> This has to siginificant drawbacks. First the main loop hangs util an
> I/O request is completed and secondly if the I/O request does not
> complete (e.g. due to an unresponsive storage) Qemu hangs completely.
> 
> Signed-off-by: Peter Lieven 
> ---
>  hw/ide/atapi.c | 69 
> --
>  1 file changed, 43 insertions(+), 26 deletions(-)
> 
> diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
> index 747f466..9257e1c 100644
> --- a/hw/ide/atapi.c
> +++ b/hw/ide/atapi.c
> @@ -105,31 +105,51 @@ static void cd_data_to_raw(uint8_t *buf, int lba)
>  memset(buf, 0, 288);
>  }
>  
> -static int cd_read_sector(IDEState *s, int lba, uint8_t *buf, int 
> sector_size)
> +static void cd_read_sector_cb(void *opaque, int ret)
>  {
> -int ret;
> +IDEState *s = opaque;
>  
> -switch(sector_size) {
> -case 2048:
> -block_acct_start(blk_get_stats(s->blk), >acct,
> - 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
> -ret = blk_read(s->blk, (int64_t)lba << 2, buf, 4);
> -block_acct_done(blk_get_stats(s->blk), >acct);
> -break;
> -case 2352:
> -block_acct_start(blk_get_stats(s->blk), >acct,
> - 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
> -ret = blk_read(s->blk, (int64_t)lba << 2, buf + 16, 4);
> -block_acct_done(blk_get_stats(s->blk), >acct);
> -if (ret < 0)
> -return ret;
> -cd_data_to_raw(buf, lba);
> -break;
> -default:
> -ret = -EIO;
> -break;
> +block_acct_done(blk_get_stats(s->blk), >acct);
> +
> +if (ret < 0) {
> +ide_atapi_io_error(s, ret);
> +return;
> +}
> +
> +if (s->cd_sector_size == 2352) {
> +cd_data_to_raw(s->io_buffer, s->lba);
>  }
> -return ret;
> +
> +s->lba++;
> +s->io_buffer_index = 0;
> +s->status &= ~BUSY_STAT;
> +
> +ide_atapi_cmd_reply_end(s);
> +}
> +
> +static int cd_read_sector(IDEState *s, int lba, void *buf, int sector_size)
> +{
> +if (sector_size != 2048 && sector_size != 2352) {
> +return -EINVAL;
> +}
> +
> +s->iov.iov_base = buf;
> +if (sector_size == 2352) {
> +buf += 4;
> +}
> +
> +s->iov.iov_len = 4 * BDRV_SECTOR_SIZE;
> +qemu_iovec_init_external(>qiov, >iov, 1);
> +
> +if (blk_aio_readv(s->blk, (int64_t)lba << 2, >qiov, 4,
> +  cd_read_sector_cb, s) == NULL) {
> +return -EIO;
> +}
> +
> +block_acct_start(blk_get_stats(s->blk), >acct,
> + 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
> +s->status |= BUSY_STAT;
> +return 0;
>  }
>  
>  void ide_atapi_cmd_ok(IDEState *s)
> @@ -190,10 +210,8 @@ void ide_atapi_cmd_reply_end(IDEState *s)
>  ret = cd_read_sector(s, s->lba, s->io_buffer, s->cd_sector_size);
>  if (ret < 0) {
>  ide_atapi_io_error(s, ret);
> -return;
>  }
> -s->lba++;
> -s->io_buffer_index = 0;
> +return;
>  }
>  if (s->elementary_transfer_size > 0) {
>  /* there are some data left to transmit in this elementary
> @@ -275,7 +293,6 @@ static void ide_atapi_cmd_read_pio(IDEState *s, int lba, 
> int nb_sectors,
>  s->io_buffer_index = sector_size;
>  s->cd_sector_size = sector_size;
>  
> -s->status = READY_STAT | SEEK_STAT;
>  ide_atapi_cmd_reply_end(s);
>  }
>  
> 

For me, this hangs the pio_large test in tests/ide-test, path:
/x86_64/ide/cdrom/pio_large

...I'll debug this over the weekend.

I think I checked this test in after you wrote this patch series, with a
mind of being able to test ATAPI for Alexander's ATAPI-SCSI bridge, but
it seems useful here :)

-- 
—js