Re: [Qemu-block] [Qemu-devel] [PATCH 0/9] nbd block status base:allocation

2018-02-23 Thread no-reply
Hi,

This series failed docker-mingw@fedora build test. Please find the testing
commands and their output below. If you have Docker installed, you can probably
reproduce it locally.

Type: series
Message-id: 1518702707-7077-1-git-send-email-vsement...@virtuozzo.com
Subject: [Qemu-devel] [PATCH 0/9] nbd block status base:allocation

=== TEST SCRIPT BEGIN ===
#!/bin/bash
set -e
git submodule update --init dtc
# Let docker tests dump environment info
export SHOW_ENV=1
export J=8
time make docker-test-mingw@fedora
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
7d95dcdd92 iotests: new test 206 for NBD BLOCK_STATUS
906b0164c4 iotests: add file_path helper
015ee723d2 iotests.py: tiny refactor: move system imports up
1377201bee nbd: BLOCK_STATUS for standard get_block_status function: client part
a750bdb375 nbd/client: fix error messages in nbd_handle_reply_err
6ec660434e block/nbd-client: save first fatal error in nbd_iter_error
1b609ef226 nbd: BLOCK_STATUS for standard get_block_status function: server part
ac6e460a1f nbd: change indenting in nbd.h
5e399e16f0 nbd/server: add nbd_opt_invalid helper

=== OUTPUT BEGIN ===
Submodule 'dtc' (git://git.qemu-project.org/dtc.git) registered for path 'dtc'
Cloning into '/var/tmp/patchew-tester-tmp-ffe51cdm/src/dtc'...
Submodule path 'dtc': checked out 'e54388015af1fb4bf04d0bca99caba1074d9cc42'
  BUILD   fedora
make[1]: Entering directory '/var/tmp/patchew-tester-tmp-ffe51cdm/src'
  GEN 
/var/tmp/patchew-tester-tmp-ffe51cdm/src/docker-src.2018-02-24-01.47.30.11028/qemu.tar
Cloning into 
'/var/tmp/patchew-tester-tmp-ffe51cdm/src/docker-src.2018-02-24-01.47.30.11028/qemu.tar.vroot'...
done.
Your branch is up-to-date with 'origin/test'.
Submodule 'dtc' (git://git.qemu-project.org/dtc.git) registered for path 'dtc'
Cloning into 
'/var/tmp/patchew-tester-tmp-ffe51cdm/src/docker-src.2018-02-24-01.47.30.11028/qemu.tar.vroot/dtc'...
Submodule path 'dtc': checked out 'e54388015af1fb4bf04d0bca99caba1074d9cc42'
Submodule 'ui/keycodemapdb' (git://git.qemu.org/keycodemapdb.git) registered 
for path 'ui/keycodemapdb'
Cloning into 
'/var/tmp/patchew-tester-tmp-ffe51cdm/src/docker-src.2018-02-24-01.47.30.11028/qemu.tar.vroot/ui/keycodemapdb'...
Submodule path 'ui/keycodemapdb': checked out 
'6b3d716e2b6472eb7189d3220552280ef3d832ce'
  COPYRUNNER
RUN test-mingw in qemu:fedora 
Packages installed:
PyYAML-3.12-5.fc27.x86_64
SDL-devel-1.2.15-29.fc27.x86_64
bc-1.07.1-3.fc27.x86_64
bison-3.0.4-8.fc27.x86_64
bzip2-1.0.6-24.fc27.x86_64
ccache-3.3.5-1.fc27.x86_64
clang-5.0.1-1.fc27.x86_64
findutils-4.6.0-14.fc27.x86_64
flex-2.6.1-5.fc27.x86_64
gcc-7.3.1-2.fc27.x86_64
gcc-c++-7.3.1-2.fc27.x86_64
gettext-0.19.8.1-12.fc27.x86_64
git-2.14.3-2.fc27.x86_64
glib2-devel-2.54.3-2.fc27.x86_64
hostname-3.18-4.fc27.x86_64
libaio-devel-0.3.110-9.fc27.x86_64
libasan-7.3.1-2.fc27.x86_64
libfdt-devel-1.4.6-1.fc27.x86_64
libubsan-7.3.1-2.fc27.x86_64
make-4.2.1-4.fc27.x86_64
mingw32-SDL-1.2.15-9.fc27.noarch
mingw32-bzip2-1.0.6-9.fc27.noarch
mingw32-curl-7.54.1-2.fc27.noarch
mingw32-glib2-2.54.1-1.fc27.noarch
mingw32-gmp-6.1.2-2.fc27.noarch
mingw32-gnutls-3.5.13-2.fc27.noarch
mingw32-gtk2-2.24.31-4.fc27.noarch
mingw32-gtk3-3.22.16-1.fc27.noarch
mingw32-libjpeg-turbo-1.5.1-3.fc27.noarch
mingw32-libpng-1.6.29-2.fc27.noarch
mingw32-libssh2-1.8.0-3.fc27.noarch
mingw32-libtasn1-4.13-1.fc27.noarch
mingw32-nettle-3.3-3.fc27.noarch
mingw32-pixman-0.34.0-3.fc27.noarch
mingw32-pkg-config-0.28-9.fc27.x86_64
mingw64-SDL-1.2.15-9.fc27.noarch
mingw64-bzip2-1.0.6-9.fc27.noarch
mingw64-curl-7.54.1-2.fc27.noarch
mingw64-glib2-2.54.1-1.fc27.noarch
mingw64-gmp-6.1.2-2.fc27.noarch
mingw64-gnutls-3.5.13-2.fc27.noarch
mingw64-gtk2-2.24.31-4.fc27.noarch
mingw64-gtk3-3.22.16-1.fc27.noarch
mingw64-libjpeg-turbo-1.5.1-3.fc27.noarch
mingw64-libpng-1.6.29-2.fc27.noarch
mingw64-libssh2-1.8.0-3.fc27.noarch
mingw64-libtasn1-4.13-1.fc27.noarch
mingw64-nettle-3.3-3.fc27.noarch
mingw64-pixman-0.34.0-3.fc27.noarch
mingw64-pkg-config-0.28-9.fc27.x86_64
nettle-devel-3.4-1.fc27.x86_64
perl-5.26.1-402.fc27.x86_64
pixman-devel-0.34.0-4.fc27.x86_64
python3-3.6.2-13.fc27.x86_64
sparse-0.5.1-2.fc27.x86_64
tar-1.29-7.fc27.x86_64
which-2.21-4.fc27.x86_64
zlib-devel-1.2.11-4.fc27.x86_64

Environment variables:
TARGET_LIST=
PACKAGES=ccache gettext git tar PyYAML sparse flex bison python3 bzip2 hostname 
glib2-devel pixman-devel zlib-devel SDL-devel libfdt-devel gcc gcc-c++ 
clang make perl which bc findutils libaio-devel nettle-devel libasan 
libubsan mingw32-pixman mingw32-glib2 mingw32-gmp mingw32-SDL 
mingw32-pkg-config mingw32-gtk2 mingw32-gtk3 mingw32-gnutls mingw32-nettle 
mingw32-libtasn1 mingw32-libjpeg-turbo mingw32-libpng mingw32-curl 
mingw32-libssh2 mingw32-bzip2 mingw64-pixman mingw64-glib2 mingw64-gmp 
mingw64-SDL mingw64-pkg-config mingw64-gtk2 mingw64-gtk3 mingw64-gnutls 
mingw64-nettle mingw64-libtasn1 mingw64-libjpeg-turbo 

Re: [Qemu-block] [Qemu-devel] [PATCH] scsi: Remove automatic creation of SCSI controllers with -drive if=scsi

2018-02-23 Thread no-reply
Hi,

This series failed build test on s390x host. Please find the details below.

N/A. Internal error while reading log file



---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-de...@freelists.org

Re: [Qemu-block] [Qemu-devel] [RFC v4 00/21] blockjobs: add explicit job management

2018-02-23 Thread no-reply
Hi,

This series failed build test on ppcbe host. Please find the details below.

Type: series
Message-id: 20180223235142.21501-1-js...@redhat.com
Subject: [Qemu-devel] [RFC v4 00/21] blockjobs: add explicit job management

=== TEST SCRIPT BEGIN ===
#!/bin/bash
# Testing script will be invoked under the git checkout with
# HEAD pointing to a commit that has the patches applied on top of "base"
# branch
set -e
echo "=== ENV ==="
env
echo "=== PACKAGES ==="
rpm -qa
echo "=== TEST BEGIN ==="
INSTALL=$PWD/install
BUILD=$PWD/build
mkdir -p $BUILD $INSTALL
SRC=$PWD
cd $BUILD
$SRC/configure --prefix=$INSTALL
make -j100
# XXX: we need reliable clean up
# make check -j100 V=1
make install
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 - [tag update]  patchew/20180223153636.29809-1-alex.ben...@linaro.org -> 
patchew/20180223153636.29809-1-alex.ben...@linaro.org
 * [new tag] patchew/20180223235142.21501-1-js...@redhat.com -> 
patchew/20180223235142.21501-1-js...@redhat.com
Submodule 'capstone' (git://git.qemu.org/capstone.git) registered for path 
'capstone'
Submodule 'dtc' (git://git.qemu-project.org/dtc.git) registered for path 'dtc'
Submodule 'roms/QemuMacDrivers' (git://git.qemu.org/QemuMacDrivers.git) 
registered for path 'roms/QemuMacDrivers'
Submodule 'roms/SLOF' (git://git.qemu-project.org/SLOF.git) registered for path 
'roms/SLOF'
Submodule 'roms/ipxe' (git://git.qemu-project.org/ipxe.git) registered for path 
'roms/ipxe'
Submodule 'roms/openbios' (git://git.qemu-project.org/openbios.git) registered 
for path 'roms/openbios'
Submodule 'roms/openhackware' (git://git.qemu-project.org/openhackware.git) 
registered for path 'roms/openhackware'
Submodule 'roms/qemu-palcode' (git://github.com/rth7680/qemu-palcode.git) 
registered for path 'roms/qemu-palcode'
Submodule 'roms/seabios' (git://git.qemu-project.org/seabios.git/) registered 
for path 'roms/seabios'
Submodule 'roms/seabios-hppa' (git://github.com/hdeller/seabios-hppa.git) 
registered for path 'roms/seabios-hppa'
Submodule 'roms/sgabios' (git://git.qemu-project.org/sgabios.git) registered 
for path 'roms/sgabios'
Submodule 'roms/skiboot' (git://git.qemu.org/skiboot.git) registered for path 
'roms/skiboot'
Submodule 'roms/u-boot' (git://git.qemu-project.org/u-boot.git) registered for 
path 'roms/u-boot'
Submodule 'roms/vgabios' (git://git.qemu-project.org/vgabios.git/) registered 
for path 'roms/vgabios'
Submodule 'ui/keycodemapdb' (git://git.qemu.org/keycodemapdb.git) registered 
for path 'ui/keycodemapdb'
Cloning into 'capstone'...
Submodule path 'capstone': checked out 
'22ead3e0bfdb87516656453336160e0a37b066bf'
Cloning into 'dtc'...
Submodule path 'dtc': checked out 'e54388015af1fb4bf04d0bca99caba1074d9cc42'
Cloning into 'roms/QemuMacDrivers'...
Submodule path 'roms/QemuMacDrivers': checked out 
'd4e7d7ac663fcb55f1b93575445fcbca372f17a7'
Cloning into 'roms/SLOF'...
Submodule path 'roms/SLOF': checked out 
'fa981320a1e0968d6fc1b8de319723ff8212b337'
Cloning into 'roms/ipxe'...
Submodule path 'roms/ipxe': checked out 
'0600d3ae94f93efd10fc6b3c7420a9557a3a1670'
Cloning into 'roms/openbios'...
Submodule path 'roms/openbios': checked out 
'54d959d97fb331708767b2fd4a878efd2bbc41bb'
Cloning into 'roms/openhackware'...
Submodule path 'roms/openhackware': checked out 
'c559da7c8eec5e45ef1f67978827af6f0b9546f5'
Cloning into 'roms/qemu-palcode'...
Submodule path 'roms/qemu-palcode': checked out 
'f3c7e44c70254975df2a00af39701eafbac4d471'
Cloning into 'roms/seabios'...
Submodule path 'roms/seabios': checked out 
'63451fca13c75870e1703eb3e20584d91179aebc'
Cloning into 'roms/seabios-hppa'...
Submodule path 'roms/seabios-hppa': checked out 
'649e6202b8d65d46c69f542b1380f840fbe8ab13'
Cloning into 'roms/sgabios'...
Submodule path 'roms/sgabios': checked out 
'cbaee52287e5f32373181cff50a00b6c4ac9015a'
Cloning into 'roms/skiboot'...
Submodule path 'roms/skiboot': checked out 
'e0ee24c27a172bcf482f6f2bc905e6211c134bcc'
Cloning into 'roms/u-boot'...
Submodule path 'roms/u-boot': checked out 
'd85ca029f257b53a96da6c2fb421e78a003a9943'
Cloning into 'roms/vgabios'...
Submodule path 'roms/vgabios': checked out 
'19ea12c230ded95928ecaef0db47a82231c2e485'
Cloning into 'ui/keycodemapdb'...
Submodule path 'ui/keycodemapdb': checked out 
'6b3d716e2b6472eb7189d3220552280ef3d832ce'
Switched to a new branch 'test'
230e578 blockjobs: add manual_mgmt option to transactions
f278a51 iotests: test manual job dismissal
8e473ab blockjobs: Expose manual property
7ad2d01 blockjobs: add block-job-finalize
3857c91 blockjobs: add PENDING status and event
18eb8a4 blockjobs: add waiting status
daf9613 blockjobs: add prepare callback
78be501 blockjobs: add block_job_txn_apply function
4b659ab blockjobs: add commit, abort, clean helpers
4023046 blockjobs: ensure abort is called for cancelled jobs
e9300b1 blockjobs: add block_job_dismiss
4fc045e blockjobs: add NULL state
e6aa454 blockjobs: add CONCLUDED state
78efa2f 

[Qemu-block] [RFC v4 04/21] blockjobs: add status enum

2018-02-23 Thread John Snow
We're about to add several new states, and booleans are becoming
unwieldy and difficult to reason about. It would help to have a
more explicit bookkeeping of the state of blockjobs. To this end,
add a new "status" field and add our existing states in a redundant
manner alongside the bools they are replacing:

UNDEFINED: Placeholder, default state. Not currently visible to QMP
           unless changes occur in the future to allow creating jobs
           without starting them via QMP.
CREATED:   replaces !!job->co && paused && !busy
RUNNING:   replaces effectively (!paused && busy)
PAUSED:    Nearly redundant with info->paused, which shows pause_count.
           This reports the actual status of the job, which almost always
           matches the paused request status. It differs in that it is
           strictly only true when the job has actually gone dormant.
READY:     replaces job->ready.
STANDBY:   Paused, but job->ready is true.

New state additions in coming commits will not be quite so redundant:

WAITING:   Waiting on transaction. This job has finished all the work
           it can until the transaction converges, fails, or is canceled.
PENDING:   Pending authorization from user. This job has finished all the
           work it can until the job or transaction is finalized via
           block_job_finalize. This implies the transaction has converged
           and left the WAITING phase.
ABORTING:  Job has encountered an error condition and is in the process
           of aborting.
CONCLUDED: Job has ceased all operations and has a return code available
           for query and may be dismissed via block_job_dismiss.
NULL:      Job has been dismissed and (should) be destroyed. Should never
           be visible to QMP.

Some of these states appear somewhat superfluous, but they help define the
expected flow of a job; so some of the states wind up being synchronous
empty transitions. Importantly, jobs can be in only one of these states
at any given time, which helps code and external users alike reason about
the current condition of a job unambiguously.
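
The pause bookkeeping described above can be modeled in a few lines. This is a minimal Python sketch, not QEMU's C code: the `Job` class and its `pause_point` method are hypothetical stand-ins, and only the state names come from the list above. It illustrates the idea that a job which goes dormant reports STANDBY if it was READY and PAUSED otherwise, then returns to the state it had before pausing.

```python
from enum import Enum


class Status(Enum):
    # State names taken from the commit message; the values are arbitrary.
    UNDEFINED = 0
    CREATED = 1
    RUNNING = 2
    PAUSED = 3
    READY = 4
    STANDBY = 5


class Job:
    """Hypothetical stand-in for a block job; not QEMU's BlockJob."""

    def __init__(self):
        self.status = Status.CREATED

    def start(self):
        self.status = Status.RUNNING

    def mark_ready(self):
        self.status = Status.READY

    def pause_point(self, while_paused=lambda job: None):
        # Remember the pre-pause state so it can be restored afterwards.
        saved = self.status
        self.status = Status.STANDBY if saved is Status.READY else Status.PAUSED
        while_paused(self)  # stands in for the job yielding while dormant
        self.status = saved
```

Because the job holds exactly one `Status` value at a time, a caller never has to combine several booleans to work out what the job is doing.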

Signed-off-by: John Snow 
---
 blockjob.c |  9 +
 include/block/blockjob.h   |  7 +--
 qapi/block-core.json   | 31 ++-
 tests/qemu-iotests/109.out | 24 
 4 files changed, 56 insertions(+), 15 deletions(-)

diff --git a/blockjob.c b/blockjob.c
index 47468331ec..1be9c20cff 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -320,6 +320,7 @@ void block_job_start(BlockJob *job)
 job->pause_count--;
 job->busy = true;
 job->paused = false;
+job->status = BLOCK_JOB_STATUS_RUNNING;
 bdrv_coroutine_enter(blk_bs(job->blk), job->co);
 }
 
@@ -598,6 +599,7 @@ BlockJobInfo *block_job_query(BlockJob *job, Error **errp)
 info->speed = job->speed;
 info->io_status = job->iostatus;
 info->ready = job->ready;
+info->status= job->status;
 return info;
 }
 
@@ -701,6 +703,7 @@ void *block_job_create(const char *job_id, const 
BlockJobDriver *driver,
 job->pause_count   = 1;
 job->refcnt= 1;
 job->manual= (flags & BLOCK_JOB_MANUAL);
+job->status= BLOCK_JOB_STATUS_CREATED;
 aio_timer_init(qemu_get_aio_context(), &job->sleep_timer,
QEMU_CLOCK_REALTIME, SCALE_NS,
block_job_sleep_timer_cb, job);
@@ -814,9 +817,14 @@ void coroutine_fn block_job_pause_point(BlockJob *job)
 }
 
 if (block_job_should_pause(job) && !block_job_is_cancelled(job)) {
+BlockJobStatus status = job->status;
+job->status = status == BLOCK_JOB_STATUS_READY ? \
+BLOCK_JOB_STATUS_STANDBY : \
+BLOCK_JOB_STATUS_PAUSED;
 job->paused = true;
 block_job_do_yield(job, -1);
 job->paused = false;
+job->status = status;
 }
 
 if (job->driver->resume) {
@@ -922,6 +930,7 @@ void block_job_iostatus_reset(BlockJob *job)
 
 void block_job_event_ready(BlockJob *job)
 {
+job->status = BLOCK_JOB_STATUS_READY;
 job->ready = true;
 
 if (block_job_is_internal(job)) {
diff --git a/include/block/blockjob.h b/include/block/blockjob.h
index 8ffabdcbc4..e254359d6b 100644
--- a/include/block/blockjob.h
+++ b/include/block/blockjob.h
@@ -143,10 +143,13 @@ typedef struct BlockJob {
 
 /**
  * Set to true when the management API has requested manual job
- * management semantics.
+ * management semantics. See @BlockJobStatus for details.
  */
 bool manual;
 
+/** Current state; See @BlockJobStatus for details. */
+BlockJobStatus status;
+
 /** Non-NULL if this job is part of a transaction */
 BlockJobTxn *txn;
 QLIST_ENTRY(BlockJob) txn_list;
@@ -157,7 +160,7 @@ typedef enum BlockJobCreateFlags {
 BLOCK_JOB_DEFAULT = 0x00,
 /* BlockJob is not QMP-created and should not send QMP events */
 BLOCK_JOB_INTERNAL = 0x01,

[Qemu-block] [RFC v4 11/21] blockjobs: add block_job_dismiss

2018-02-23 Thread John Snow
For jobs that have reached their CONCLUDED state, prior to having their
last reference put down (meaning jobs that have completed successfully,
unsuccessfully, or have been canceled), allow the user to dismiss the
job's lingering status report via block-job-dismiss.

This gives management APIs the chance to conclusively determine if a job
failed or succeeded, even if the event broadcast was missed.

Note that jobs do not yet linger in any such state; they are freed
immediately upon reaching this previously-unnamed state. Such a state is
added in the next commit.

Verbs:
Dismiss: operates on CONCLUDED jobs only.
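
The verb check can be pictured as a table lookup. The following is a rough Python model, not the C implementation: the two rows are copied from the BlockJobVerbTable hunk in this patch, while the order of the state names in `STATES` is an assumption inferred from those rows (the enum definition is not shown here), and `verb_allowed` is a hypothetical helper.

```python
# Assumed column order for the nine states at this point in the series.
STATES = ["undefined", "created", "running", "paused", "ready",
          "standby", "aborting", "concluded", "null"]

# One row per verb, one flag per state, in the order of STATES.
VERB_TABLE = {
    "complete": [0, 0, 0, 0, 1, 0, 0, 0, 0],
    "dismiss":  [0, 0, 0, 0, 0, 0, 0, 1, 0],
}


def verb_allowed(verb, state):
    """Return True if `verb` may be applied to a job in `state`."""
    return bool(VERB_TABLE[verb][STATES.index(state)])
```

Under that assumed ordering, `verb_allowed("dismiss", state)` is true for "concluded" and false for every other state.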
Signed-off-by: John Snow 
---
 block/trace-events   |  1 +
 blockdev.c   | 14 ++
 blockjob.c   | 34 --
 include/block/blockjob.h |  9 +
 qapi/block-core.json | 24 +++-
 5 files changed, 79 insertions(+), 3 deletions(-)

diff --git a/block/trace-events b/block/trace-events
index 3fe89f7ea6..266afd9e99 100644
--- a/block/trace-events
+++ b/block/trace-events
@@ -50,6 +50,7 @@ qmp_block_job_cancel(void *job) "job %p"
 qmp_block_job_pause(void *job) "job %p"
 qmp_block_job_resume(void *job) "job %p"
 qmp_block_job_complete(void *job) "job %p"
+qmp_block_job_dismiss(void *job) "job %p"
 qmp_block_stream(void *bs, void *job) "bs %p job %p"
 
 # block/file-win32.c
diff --git a/blockdev.c b/blockdev.c
index cba935a0a6..3180130782 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3852,6 +3852,20 @@ void qmp_block_job_complete(const char *device, Error 
**errp)
 aio_context_release(aio_context);
 }
 
+void qmp_block_job_dismiss(const char *id, Error **errp)
+{
+AioContext *aio_context;
+BlockJob *job = find_block_job(id, &aio_context, errp);
+
+if (!job) {
+return;
+}
+
+trace_qmp_block_job_dismiss(job);
+block_job_dismiss(&job, errp);
+aio_context_release(aio_context);
+}
+
 void qmp_change_backing_file(const char *device,
  const char *image_node_name,
  const char *backing_file,
diff --git a/blockjob.c b/blockjob.c
index 7b5c4063cf..4d29391673 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -63,6 +63,7 @@ bool 
BlockJobVerbTable[BLOCK_JOB_VERB__MAX][BLOCK_JOB_STATUS__MAX] = {
 [BLOCK_JOB_VERB_RESUME]   = {0, 1, 1, 1, 1, 1, 0, 0, 0},
 [BLOCK_JOB_VERB_SET_SPEED]= {0, 1, 1, 1, 1, 1, 0, 0, 0},
 [BLOCK_JOB_VERB_COMPLETE] = {0, 0, 0, 0, 1, 0, 0, 0, 0},
+[BLOCK_JOB_VERB_DISMISS]  = {0, 0, 0, 0, 0, 0, 0, 1, 0},
 };
 
 static void block_job_state_transition(BlockJob *job, BlockJobStatus s1)
@@ -424,7 +425,6 @@ static void block_job_completed_single(BlockJob *job)
 QLIST_REMOVE(job, txn_list);
 block_job_txn_unref(job->txn);
 block_job_event_concluded(job);
-block_job_state_transition(job, BLOCK_JOB_STATUS_NULL);
 block_job_unref(job);
 }
 
@@ -441,6 +441,13 @@ static void block_job_cancel_async(BlockJob *job)
 job->cancelled = true;
 }
 
+static void block_job_do_dismiss(BlockJob *job)
+{
+assert(job);
+block_job_state_transition(job, BLOCK_JOB_STATUS_NULL);
+block_job_unref(job);
+}
+
 static int block_job_finish_sync(BlockJob *job,
  void (*finish)(BlockJob *, Error **errp),
  Error **errp)
@@ -590,6 +597,19 @@ void block_job_complete(BlockJob *job, Error **errp)
 job->driver->complete(job, errp);
 }
 
+void block_job_dismiss(BlockJob **jobptr, Error **errp)
+{
+BlockJob *job = *jobptr;
+/* similarly to _complete, this is QMP-interface only. */
+assert(job->id);
+if (block_job_apply_verb(job, BLOCK_JOB_VERB_DISMISS, errp)) {
+return;
+}
+
+block_job_do_dismiss(job);
+*jobptr = NULL;
+}
+
 void block_job_user_pause(BlockJob *job, Error **errp)
 {
 if (block_job_apply_verb(job, BLOCK_JOB_VERB_PAUSE, errp)) {
@@ -626,7 +646,7 @@ void block_job_user_resume(BlockJob *job, Error **errp)
 void block_job_cancel(BlockJob *job)
 {
 if (job->status == BLOCK_JOB_STATUS_CONCLUDED) {
-return;
+block_job_do_dismiss(job);
 } else if (block_job_started(job)) {
 block_job_cancel_async(job);
 block_job_enter(job);
@@ -737,6 +757,10 @@ static void block_job_event_completed(BlockJob *job, const 
char *msg)
 static void block_job_event_concluded(BlockJob *job)
 {
 block_job_state_transition(job, BLOCK_JOB_STATUS_CONCLUDED);
+/* for pre-2.12 style jobs, automatically destroy */
+if (!job->manual) {
+block_job_do_dismiss(job);
+}
 }
 
 /*
@@ -841,6 +865,9 @@ void *block_job_create(const char *job_id, const 
BlockJobDriver *driver,
 block_job_txn_add_job(txn, job);
 }
 
+/* For the expanded job control STM, grab an extra
+ * reference for finalize() to put down */
+block_job_ref(job);
 return job;
 }
 
@@ -859,6 +886,9 @@ void 

[Qemu-block] [RFC v4 02/21] blockjobs: model single jobs as transactions

2018-02-23 Thread John Snow
Model all independent jobs as single-job transactions.

It's one less case we have to worry about when we add more states to the
transition machine. This way, we can just treat all job lifetimes exactly
the same. This helps tighten assertions of the STM graph and removes some
conditionals that would have been needed in the coming commits adding a
more explicit job lifetime management API.
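
The idea can be sketched as follows. This is a simplified Python model under stated assumptions, not the C code: `Txn`, `Job`, and `job_create` are hypothetical names, and the reference counting (each member job holds one reference on its transaction) is an assumption about how the transaction stays alive after the creator drops its own reference.

```python
class Txn:
    """Hypothetical transaction: a list of jobs plus a reference count."""

    def __init__(self):
        self.jobs = []
        self.refcnt = 1  # the creator's reference

    def add_job(self, job):
        self.jobs.append(job)
        job.txn = self
        self.refcnt += 1  # assumed: each member job holds a reference

    def unref(self):
        self.refcnt -= 1


class Job:
    def __init__(self, job_id):
        self.job_id = job_id
        self.txn = None


def job_create(job_id, txn=None):
    job = Job(job_id)
    if txn is None:
        # A standalone job gets a private, single-job transaction.
        txn = Txn()
        txn.add_job(job)
        txn.unref()  # drop the creator's reference; the job keeps it alive
    else:
        txn.add_job(job)
    return job
```

After `job_create`, every job has a non-empty `txn`, so completion code no longer needs an "is this job part of a transaction?" branch.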

Signed-off-by: John Snow 
---
 block/backup.c   |  3 +--
 block/commit.c   |  2 +-
 block/mirror.c   |  2 +-
 block/stream.c   |  2 +-
 blockjob.c   | 25 -
 include/block/blockjob_int.h |  3 ++-
 tests/test-bdrv-drain.c  |  4 ++--
 tests/test-blockjob-txn.c| 19 +++
 tests/test-blockjob.c|  2 +-
 9 files changed, 32 insertions(+), 30 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 4a16a37229..7e254dabff 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -621,7 +621,7 @@ BlockJob *backup_job_create(const char *job_id, 
BlockDriverState *bs,
 }
 
 /* job->common.len is fixed, so we can't allow resize */
-job = block_job_create(job_id, &backup_job_driver, bs,
+job = block_job_create(job_id, &backup_job_driver, txn, bs,
BLK_PERM_CONSISTENT_READ,
BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE |
BLK_PERM_WRITE_UNCHANGED | BLK_PERM_GRAPH_MOD,
@@ -677,7 +677,6 @@ BlockJob *backup_job_create(const char *job_id, 
BlockDriverState *bs,
 block_job_add_bdrv(&job->common, "target", target, 0, BLK_PERM_ALL,
 &error_abort);
 job->common.len = len;
-block_job_txn_add_job(txn, &job->common);
 
 return &job->common;
 
diff --git a/block/commit.c b/block/commit.c
index bb6c904704..9682158ee7 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -289,7 +289,7 @@ void commit_start(const char *job_id, BlockDriverState *bs,
 return;
 }
 
-s = block_job_create(job_id, &commit_job_driver, bs, 0, BLK_PERM_ALL,
+s = block_job_create(job_id, &commit_job_driver, NULL, bs, 0, BLK_PERM_ALL,
  speed, BLOCK_JOB_DEFAULT, NULL, NULL, errp);
 if (!s) {
 return;
diff --git a/block/mirror.c b/block/mirror.c
index c9badc1203..6bab7cfdd8 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1166,7 +1166,7 @@ static void mirror_start_job(const char *job_id, 
BlockDriverState *bs,
 }
 
 /* Make sure that the source is not resized while the job is running */
-s = block_job_create(job_id, driver, mirror_top_bs,
+s = block_job_create(job_id, driver, NULL, mirror_top_bs,
  BLK_PERM_CONSISTENT_READ,
  BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED |
  BLK_PERM_WRITE | BLK_PERM_GRAPH_MOD, speed,
diff --git a/block/stream.c b/block/stream.c
index 499cdacdb0..f3b53f49e2 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -244,7 +244,7 @@ void stream_start(const char *job_id, BlockDriverState *bs,
 /* Prevent concurrent jobs trying to modify the graph structure here, we
  * already have our own plans. Also don't allow resize as the image size is
  * queried only at the job start and then cached. */
-s = block_job_create(job_id, &stream_job_driver, bs,
+s = block_job_create(job_id, &stream_job_driver, NULL, bs,
  BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED |
  BLK_PERM_GRAPH_MOD,
  BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED |
diff --git a/blockjob.c b/blockjob.c
index 24833ef30f..7ba3683ee3 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -357,10 +357,8 @@ static void block_job_completed_single(BlockJob *job)
 }
 }
 
-if (job->txn) {
-QLIST_REMOVE(job, txn_list);
-block_job_txn_unref(job->txn);
-}
+QLIST_REMOVE(job, txn_list);
+block_job_txn_unref(job->txn);
 block_job_unref(job);
 }
 
@@ -647,7 +645,7 @@ static void block_job_event_completed(BlockJob *job, const 
char *msg)
  */
 
 void *block_job_create(const char *job_id, const BlockJobDriver *driver,
-   BlockDriverState *bs, uint64_t perm,
+   BlockJobTxn *txn, BlockDriverState *bs, uint64_t perm,
uint64_t shared_perm, int64_t speed, int flags,
BlockCompletionFunc *cb, void *opaque, Error **errp)
 {
@@ -729,6 +727,17 @@ void *block_job_create(const char *job_id, const 
BlockJobDriver *driver,
 return NULL;
 }
 }
+
+/* Single jobs are modeled as single-job transactions for sake of
+ * consolidating the job management logic */
+if (!txn) {
+txn = block_job_txn_new();
+block_job_txn_add_job(txn, job);
+block_job_txn_unref(txn);
+} else {
+block_job_txn_add_job(txn, job);
+}
+
 return job;
 }
 
@@ -752,13 +761,11 @@ void 

[Qemu-block] [RFC v4 14/21] blockjobs: add block_job_txn_apply function

2018-02-23 Thread John Snow
Simply apply a function transaction-wide.
A few more uses of this follow in forthcoming patches.
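
As an illustration of the pattern, here is a small Python sketch rather than the C helper itself. `Job`, its `lock` attribute, and `txn_apply` are hypothetical; the lock merely stands in for acquiring and releasing the job's AioContext around the callback. Iterating over a snapshot of the list plays the role of a "safe" list traversal, so the callback may remove the job it is given.

```python
import threading


class Job:
    def __init__(self, name):
        self.name = name
        self.lock = threading.Lock()  # stands in for the job's AioContext


def txn_apply(jobs, fn):
    """Call fn(job) for every job in the transaction's job list."""
    # Iterate over a snapshot so fn may remove the job from `jobs`.
    for job in list(jobs):
        with job.lock:
            fn(job)
```

A callback that finishes and unlinks each job can then be applied to the whole transaction with a single `txn_apply(jobs, fn)` call.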

Signed-off-by: John Snow 
---
 blockjob.c | 24 +++-
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/blockjob.c b/blockjob.c
index 431ce9c220..8f02c03880 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -467,6 +467,19 @@ static void block_job_cancel_async(BlockJob *job)
 job->cancelled = true;
 }
 
+static void block_job_txn_apply(BlockJobTxn *txn, void fn(BlockJob *))
+{
+AioContext *ctx;
+BlockJob *job, *next;
+
+QLIST_FOREACH_SAFE(job, &txn->jobs, txn_list, next) {
+ctx = blk_get_aio_context(job->blk);
+aio_context_acquire(ctx);
+fn(job);
+aio_context_release(ctx);
+}
+}
+
 static void block_job_do_dismiss(BlockJob *job)
 {
 assert(job);
@@ -552,9 +565,8 @@ static void block_job_completed_txn_abort(BlockJob *job)
 
 static void block_job_completed_txn_success(BlockJob *job)
 {
-AioContext *ctx;
 BlockJobTxn *txn = job->txn;
-BlockJob *other_job, *next;
+BlockJob *other_job;
 /*
  * Successful completion, see if there are other running jobs in this
  * txn.
@@ -565,13 +577,7 @@ static void block_job_completed_txn_success(BlockJob *job)
 }
 }
 /* We are the last completed job, commit the transaction. */
-QLIST_FOREACH_SAFE(other_job, &txn->jobs, txn_list, next) {
-ctx = blk_get_aio_context(other_job->blk);
-aio_context_acquire(ctx);
-assert(other_job->ret == 0);
-block_job_completed_single(other_job);
-aio_context_release(ctx);
-}
+block_job_txn_apply(txn, block_job_completed_single);
 }
 
 /* Assumes the block_job_mutex is held */
-- 
2.14.3




[Qemu-block] [RFC v4 20/21] iotests: test manual job dismissal

2018-02-23 Thread John Snow
Signed-off-by: John Snow 
---
 tests/qemu-iotests/056 | 195 +
 tests/qemu-iotests/056.out |   4 +-
 2 files changed, 197 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/056 b/tests/qemu-iotests/056
index 04f2c3c841..bc21ba9af8 100755
--- a/tests/qemu-iotests/056
+++ b/tests/qemu-iotests/056
@@ -29,6 +29,26 @@ backing_img = os.path.join(iotests.test_dir, 'backing.img')
 test_img = os.path.join(iotests.test_dir, 'test.img')
 target_img = os.path.join(iotests.test_dir, 'target.img')
 
+def img_create(img, fmt=iotests.imgfmt, size='64M', **kwargs):
+fullname = os.path.join(iotests.test_dir, '%s.%s' % (img, fmt))
+optargs = []
+for k,v in kwargs.iteritems():
+optargs = optargs + ['-o', '%s=%s' % (k,v)]
+args = ['create', '-f', fmt] + optargs + [fullname, size]
+iotests.qemu_img(*args)
+return fullname
+
+def try_remove(img):
+try:
+os.remove(img)
+except OSError:
+pass
+
+def io_write_patterns(img, patterns):
+for pattern in patterns:
+iotests.qemu_io('-c', 'write -P%s %s %s' % pattern, img)
+
+
 class TestSyncModesNoneAndTop(iotests.QMPTestCase):
 image_len = 64 * 1024 * 1024 # MB
 
@@ -108,5 +128,180 @@ class TestBeforeWriteNotifier(iotests.QMPTestCase):
 event = self.cancel_and_wait()
 self.assert_qmp(event, 'data/type', 'backup')
 
+class BackupTest(iotests.QMPTestCase):
+def setUp(self):
+self.vm = iotests.VM()
+self.test_img = img_create('test')
+self.dest_img = img_create('dest')
+self.vm.add_drive(self.test_img)
+self.vm.launch()
+
+def tearDown(self):
+self.vm.shutdown()
+try_remove(self.test_img)
+try_remove(self.dest_img)
+
+def hmp_io_writes(self, drive, patterns):
+for pattern in patterns:
+self.vm.hmp_qemu_io(drive, 'write -P%s %s %s' % pattern)
+self.vm.hmp_qemu_io(drive, 'flush')
+
+def qmp_job_pending_wait(self, device):
+event = self.vm.event_wait(name="BLOCK_JOB_PENDING",
+   match={'data': {'id': device}})
+self.assertNotEqual(event, None)
+res = self.vm.qmp("block-job-finalize", id=device)
+self.assert_qmp(res, 'return', {})
+
+def qmp_backup_and_wait(self, cmd='drive-backup', serror=None,
+aerror=None, **kwargs):
+if not self.qmp_backup(cmd, serror, **kwargs):
+return False
+if 'manual' in kwargs and kwargs['manual']:
+self.qmp_job_pending_wait(kwargs['device'])
+return self.qmp_backup_wait(kwargs['device'], aerror)
+
+def qmp_backup(self, cmd='drive-backup',
+   error=None, **kwargs):
+self.assertTrue('device' in kwargs)
+res = self.vm.qmp(cmd, **kwargs)
+if error:
+self.assert_qmp(res, 'error/desc', error)
+return False
+self.assert_qmp(res, 'return', {})
+return True
+
+def qmp_backup_wait(self, device, error=None):
+event = self.vm.event_wait(name="BLOCK_JOB_COMPLETED",
+   match={'data': {'device': device}})
+self.assertNotEqual(event, None)
+try:
+failure = self.dictpath(event, 'data/error')
+except AssertionError:
+# Backup succeeded.
+self.assert_qmp(event, 'data/offset', event['data']['len'])
+return True
+else:
+# Failure.
+self.assert_qmp(event, 'data/error', error)
+return False
+
+def test_dismiss_false(self):
+res = self.vm.qmp('query-block-jobs')
+self.assert_qmp(res, 'return', [])
+self.qmp_backup_and_wait(device='drive0', format=iotests.imgfmt,
+ sync='full', target=self.dest_img, manual=False)
+res = self.vm.qmp('query-block-jobs')
+self.assert_qmp(res, 'return', [])
+
+def test_dismiss_true(self):
+res = self.vm.qmp('query-block-jobs')
+self.assert_qmp(res, 'return', [])
+self.qmp_backup_and_wait(device='drive0', format=iotests.imgfmt,
+ sync='full', target=self.dest_img, manual=True)
+res = self.vm.qmp('query-block-jobs')
+self.assert_qmp(res, 'return[0]/status', 'concluded')
+res = self.vm.qmp('block-job-dismiss', id='drive0')
+self.assert_qmp(res, 'return', {})
+res = self.vm.qmp('query-block-jobs')
+self.assert_qmp(res, 'return', [])
+
+def test_dismiss_bad_id(self):
+res = self.vm.qmp('query-block-jobs')
+self.assert_qmp(res, 'return', [])
+res = self.vm.qmp('block-job-dismiss', id='foobar')
+self.assert_qmp(res, 'error/class', 'DeviceNotActive')
+
+def test_dismiss_collision(self):
+res = self.vm.qmp('query-block-jobs')
+self.assert_qmp(res, 'return', [])
+

[Qemu-block] [RFC v4 18/21] blockjobs: add block-job-finalize

2018-02-23 Thread John Snow
Instead of automatically transitioning from PENDING to CONCLUDED, gate
the .prepare() and .commit() phases behind an explicit acknowledgement
provided by the QMP monitor if manual completion mode has been requested.

This allows us to perform graph changes in prepare and/or commit so that
graph changes do not occur autonomously without knowledge of the
controlling management layer.

Transactions that have reached the "PENDING" state together can all be
moved to invoke their finalization methods by issuing block_job_finalize
to any one job in the transaction.

Jobs in a transaction with mixed job->manual settings will remain stuck
in the "WAITING" state until block_job_finalize is issued on the job(s)
that have reached the "PENDING" state.

The non-manual jobs are not allowed to progress because other jobs in the
transaction may still fail during their preparation phase during
finalization, so they must remain in the WAITING phase until success is
guaranteed. They will then automatically dismiss themselves, but jobs
that had the manual property set will remain at CONCLUDED as normal.
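
The mixed-mode behaviour described above can be modeled roughly as follows. This Python sketch is a deliberate simplification and not the C implementation: `Job` and `finalize` are hypothetical, the prepare and commit phases and all failure paths are omitted, and the starting states assume the transaction has already converged.

```python
class Job:
    def __init__(self, name, manual):
        self.name = name
        self.manual = manual
        # Assumed starting point: manual jobs await the user (PENDING),
        # the others wait on them (WAITING).
        self.status = "pending" if manual else "waiting"


def finalize(job, txn_jobs):
    """Finalize every job in the transaction that `job` belongs to."""
    if job.status != "pending":
        raise ValueError("finalize is only valid for a PENDING job")
    for other in txn_jobs:
        # Manual jobs linger at CONCLUDED until dismissed; the rest
        # dismiss themselves automatically.
        other.status = "concluded" if other.manual else "null"
```

Issuing `finalize` on any one PENDING job moves the whole transaction forward, which is why a management layer only needs to acknowledge one job per transaction.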

Signed-off-by: John Snow 
---
 block/trace-events   |  1 +
 blockdev.c   | 14 ++
 blockjob.c   | 69 +---
 include/block/blockjob.h | 17 
 qapi/block-core.json | 23 +++-
 5 files changed, 108 insertions(+), 16 deletions(-)

diff --git a/block/trace-events b/block/trace-events
index 5e531e0310..a81b66ff36 100644
--- a/block/trace-events
+++ b/block/trace-events
@@ -51,6 +51,7 @@ qmp_block_job_cancel(void *job) "job %p"
 qmp_block_job_pause(void *job) "job %p"
 qmp_block_job_resume(void *job) "job %p"
 qmp_block_job_complete(void *job) "job %p"
+qmp_block_job_finalize(void *job) "job %p"
 qmp_block_job_dismiss(void *job) "job %p"
 qmp_block_stream(void *bs, void *job) "bs %p job %p"
 
diff --git a/blockdev.c b/blockdev.c
index 3180130782..05fd421cdc 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3852,6 +3852,20 @@ void qmp_block_job_complete(const char *device, Error 
**errp)
 aio_context_release(aio_context);
 }
 
+void qmp_block_job_finalize(const char *id, Error **errp)
+{
+AioContext *aio_context;
+BlockJob *job = find_block_job(id, &aio_context, errp);
+
+if (!job) {
+return;
+}
+
+trace_qmp_block_job_finalize(job);
+block_job_finalize(job, errp);
+aio_context_release(aio_context);
+}
+
 void qmp_block_job_dismiss(const char *id, Error **errp)
 {
 AioContext *aio_context;
diff --git a/blockjob.c b/blockjob.c
index 23b4b99fd4..f9e8a64261 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -65,14 +65,15 @@ bool 
BlockJobVerbTable[BLOCK_JOB_VERB__MAX][BLOCK_JOB_STATUS__MAX] = {
 [BLOCK_JOB_VERB_RESUME]   = {0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0},
 [BLOCK_JOB_VERB_SET_SPEED]= {0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0},
 [BLOCK_JOB_VERB_COMPLETE] = {0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0},
+[BLOCK_JOB_VERB_FINALIZE] = {0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0},
 [BLOCK_JOB_VERB_DISMISS]  = {0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0},
 };
 
-static void block_job_state_transition(BlockJob *job, BlockJobStatus s1)
+static bool block_job_state_transition(BlockJob *job, BlockJobStatus s1)
 {
 BlockJobStatus s0 = job->status;
 if (s0 == s1) {
-return;
+return false;
 }
 assert(s1 >= 0 && s1 <= BLOCK_JOB_STATUS__MAX);
 trace_block_job_state_transition(job, job->ret, BlockJobSTT[s0][s1] ?
@@ -83,6 +84,7 @@ static void block_job_state_transition(BlockJob *job, 
BlockJobStatus s1)
   s1));
 assert(BlockJobSTT[s0][s1]);
 job->status = s1;
+return true;
 }
 
 static int block_job_apply_verb(BlockJob *job, BlockJobVerb bv, Error **errp)
@@ -432,7 +434,7 @@ static void block_job_clean(BlockJob *job)
 }
 }
 
-static int block_job_completed_single(BlockJob *job)
+static int block_job_finalize_single(BlockJob *job)
 {
 assert(job->completed);
 
@@ -581,18 +583,44 @@ static void block_job_completed_txn_abort(BlockJob *job)
 assert(other_job->cancelled);
 block_job_finish_sync(other_job, NULL, NULL);
 }
-block_job_completed_single(other_job);
+block_job_finalize_single(other_job);
 aio_context_release(ctx);
 }
 
 block_job_txn_unref(txn);
 }
 
+static int block_job_is_manual(BlockJob *job)
+{
+return job->manual;
+}
+
+static void block_job_do_finalize(BlockJob *job)
+{
+int rc;
+assert(job && job->txn);
+
+/* For jobs set !job->manual, transition to pending synchronously now */
+block_job_txn_apply(job->txn, block_job_event_pending, false);
+
+/* prepare the transaction to complete */
+rc = block_job_txn_apply(job->txn, block_job_prepare, true);
+if (rc) {
+block_job_completed_txn_abort(job);
+} else {
+

[Qemu-block] [RFC v4 05/21] blockjobs: add state transition table

2018-02-23 Thread John Snow
The state transition table has mostly been implied. We're about to make
it a bit more complex, so let's make the STM explicit instead.

Perform state transitions with a function that for now just asserts the
transition is appropriate.

Transitions:
Undefined -> Created: During job initialization.
Created   -> Running: Once the job is started.
                      Jobs cannot transition from "Created" to "Paused"
                      directly, but will instead synchronously transition
                      to running and then to paused immediately.
Running   -> Paused:  Normal workflow for pauses.
Running   -> Ready:   Normal workflow for jobs reaching their sync point.
                      (e.g. mirror)
Ready     -> Standby: Normal workflow for pausing ready jobs.
Paused    -> Running: Normal resume.
Standby   -> Ready:   Resume of a Standby job.


+---------+
|UNDEFINED|
+--+------+
   |
+--v----+
|CREATED|
+--+----+
   |
+--v----+     +------+
|RUNNING<----->PAUSED|
+--+----+     +------+
   |
+--v--+       +-------+
|READY<------->STANDBY|
+-----+       +-------+


Notably, there is no state presently defined as of this commit that
deals with a job after the "running" or "ready" states, so this table
will be adjusted alongside the commits that introduce those states.
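
To make the table-driven check concrete, here is a minimal standalone
sketch of the same idea. The SK_* names and the sketch_* helpers are
hypothetical stand-ins for the QAPI-generated BlockJobStatus values and
are not QEMU code; the table contents mirror the transitions listed above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the BlockJobStatus values. */
enum {
    SK_UNDEFINED, SK_CREATED, SK_RUNNING, SK_PAUSED, SK_READY, SK_STANDBY,
    SK__MAX
};

/* Rows are the current state, columns the requested next state. */
static const bool sketch_stt[SK__MAX][SK__MAX] = {
                      /* U, C, R, P, Y, S */
    [SK_UNDEFINED] = {0, 1, 0, 0, 0, 0},
    [SK_CREATED]   = {0, 0, 1, 0, 0, 0},
    [SK_RUNNING]   = {0, 0, 0, 1, 1, 0},
    [SK_PAUSED]    = {0, 0, 1, 0, 0, 0},
    [SK_READY]     = {0, 0, 0, 0, 0, 1},
    [SK_STANDBY]   = {0, 0, 0, 0, 1, 0},
};

/* True if s0 -> s1 is listed as legal; out-of-range values are illegal. */
static bool sketch_transition_allowed(int s0, int s1)
{
    /* Strict upper bound so both indexes stay inside the table. */
    if (s0 < 0 || s0 >= SK__MAX || s1 < 0 || s1 >= SK__MAX) {
        return false;
    }
    return sketch_stt[s0][s1];
}

/* Apply a transition, asserting that the table permits it. */
static void sketch_state_transition(int *status, int s1)
{
    if (*status == s1) {
        return;  /* staying in the same state is a no-op */
    }
    assert(sketch_transition_allowed(*status, s1));
    *status = s1;
}
```

Keeping the legality check in one table means that adding a state later
only requires a new row and column rather than edits at every call site.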

Signed-off-by: John Snow 
---
 block/trace-events |  3 +++
 blockjob.c | 42 --
 2 files changed, 39 insertions(+), 6 deletions(-)

diff --git a/block/trace-events b/block/trace-events
index 02dd80ff0c..b75a0c8409 100644
--- a/block/trace-events
+++ b/block/trace-events
@@ -4,6 +4,9 @@
 bdrv_open_common(void *bs, const char *filename, int flags, const char 
*format_name) "bs %p filename \"%s\" flags 0x%x format_name \"%s\""
 bdrv_lock_medium(void *bs, bool locked) "bs %p locked %d"
 
+# blockjob.c
+block_job_state_transition(void *job,  int ret, const char *legal, const char 
*s0, const char *s1) "job %p (ret: %d) attempting %s transition (%s-->%s)"
+
 # block/block-backend.c
 blk_co_preadv(void *blk, void *bs, int64_t offset, unsigned int bytes, int 
flags) "blk %p bs %p offset %"PRId64" bytes %u flags 0x%x"
 blk_co_pwritev(void *blk, void *bs, int64_t offset, unsigned int bytes, int 
flags) "blk %p bs %p offset %"PRId64" bytes %u flags 0x%x"
diff --git a/blockjob.c b/blockjob.c
index 1be9c20cff..d745b3bb69 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -28,6 +28,7 @@
 #include "block/block.h"
 #include "block/blockjob_int.h"
 #include "block/block_int.h"
+#include "block/trace.h"
 #include "sysemu/block-backend.h"
 #include "qapi/error.h"
 #include "qapi/qmp/qerror.h"
@@ -41,6 +42,34 @@
  * block_job_enter. */
 static QemuMutex block_job_mutex;
 
+/* BlockJob State Transition Table */
+bool BlockJobSTT[BLOCK_JOB_STATUS__MAX][BLOCK_JOB_STATUS__MAX] = {
+  /* U, C, R, P, Y, S */
+/* U: */ [BLOCK_JOB_STATUS_UNDEFINED] = {0, 1, 0, 0, 0, 0},
+/* C: */ [BLOCK_JOB_STATUS_CREATED]   = {0, 0, 1, 0, 0, 0},
+/* R: */ [BLOCK_JOB_STATUS_RUNNING]   = {0, 0, 0, 1, 1, 0},
+/* P: */ [BLOCK_JOB_STATUS_PAUSED]= {0, 0, 1, 0, 0, 0},
+/* Y: */ [BLOCK_JOB_STATUS_READY] = {0, 0, 0, 0, 0, 1},
+/* S: */ [BLOCK_JOB_STATUS_STANDBY]   = {0, 0, 0, 0, 1, 0},
+};
+
+static void block_job_state_transition(BlockJob *job, BlockJobStatus s1)
+{
+BlockJobStatus s0 = job->status;
+if (s0 == s1) {
+return;
+}
+assert(s1 >= 0 && s1 <= BLOCK_JOB_STATUS__MAX);
+trace_block_job_state_transition(job, job->ret, BlockJobSTT[s0][s1] ?
+ "allowed" : "disallowed",
+ qapi_enum_lookup(&BlockJobStatus_lookup,
+  s0),
+ qapi_enum_lookup(&BlockJobStatus_lookup,
+  s1));
+assert(BlockJobSTT[s0][s1]);
+job->status = s1;
+}
+
 static void block_job_lock(void)
 {
 qemu_mutex_lock(&block_job_mutex);
@@ -320,7 +349,7 @@ void block_job_start(BlockJob *job)
 job->pause_count--;
 job->busy = true;
 job->paused = false;
-job->status = BLOCK_JOB_STATUS_RUNNING;
+block_job_state_transition(job, BLOCK_JOB_STATUS_RUNNING);
 bdrv_coroutine_enter(blk_bs(job->blk), job->co);
 }
 
@@ -704,6 +733,7 @@ void *block_job_create(const char *job_id, const 
BlockJobDriver *driver,
 job->refcnt= 1;
 job->manual= (flags & BLOCK_JOB_MANUAL);
 job->status= BLOCK_JOB_STATUS_CREATED;
+block_job_state_transition(job, BLOCK_JOB_STATUS_CREATED);
 aio_timer_init(qemu_get_aio_context(), &job->sleep_timer,
QEMU_CLOCK_REALTIME, SCALE_NS,
block_job_sleep_timer_cb, job);
@@ -818,13 +848,13 @@ void coroutine_fn block_job_pause_point(BlockJob *job)
 
 if (block_job_should_pause(job) && !block_job_is_cancelled(job)) {
 BlockJobStatus status = job->status;
-job->status = 

[Qemu-block] [RFC v4 07/21] blockjobs: add block_job_verb permission table

2018-02-23 Thread John Snow
Which commands ("verbs") are appropriate for jobs in which state is
also somewhat burdensome to keep track of.

As of this commit, it looks rather useless, but begins to look more
interesting the more states we add to the STM table.

A recurring theme is that no verb will apply to an 'undefined' job.

Further, it's not presently possible to restrict the "pause" or "resume"
verbs any more than they are in this commit because of the asynchronous
nature of how jobs enter the PAUSED state; justifications for some
seemingly erroneous applications are given below.

=====
Verbs
=====

Cancel:    Any state except undefined.
Pause:     Any state except undefined;
           'created': Requests that the job pauses as it starts.
           'running': Normal usage. (PAUSED)
           'paused':  The job may be paused for internal reasons,
                      but the user may wish to force an indefinite
                      user-pause, so this is allowed.
           'ready':   Normal usage. (STANDBY)
           'standby': Same logic as above.
Resume:    Any state except undefined;
           'created': Will lift a user's pause-on-start request.
           'running': Will lift a pause request before it takes effect.
           'paused':  Normal usage.
           'ready':   Will lift a pause request before it takes effect.
           'standby': Normal usage.
Set-speed: Any state except undefined, though ready may not be meaningful.
Complete:  Only a 'ready' job may accept a complete request.
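
The permissions listed above can be sketched as a verb-by-state lookup.
The VB_*/VS_* names and sketch_apply_verb are hypothetical and not QEMU
code; the -1 return stands in for reporting an error through errp.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the job states and verbs. */
enum { VS_UNDEFINED, VS_CREATED, VS_RUNNING, VS_PAUSED, VS_READY,
       VS_STANDBY, VS__MAX };
enum { VB_CANCEL, VB_PAUSE, VB_RESUME, VB_SET_SPEED, VB_COMPLETE, VB__MAX };

/* Rows are verbs, columns are the state the job is currently in. */
static const bool sketch_verb_table[VB__MAX][VS__MAX] = {
                      /* U, C, R, P, Y, S */
    [VB_CANCEL]    = {0, 1, 1, 1, 1, 1},
    [VB_PAUSE]     = {0, 1, 1, 1, 1, 1},
    [VB_RESUME]    = {0, 1, 1, 1, 1, 1},
    [VB_SET_SPEED] = {0, 1, 1, 1, 1, 1},
    [VB_COMPLETE]  = {0, 0, 0, 0, 1, 0},
};

/* Returns 0 if the verb may be applied in this state, -1 otherwise. */
static int sketch_apply_verb(int status, int verb)
{
    /* The caller is assumed to pass valid enum values. */
    assert(status >= 0 && status < VS__MAX);
    assert(verb >= 0 && verb < VB__MAX);

    return sketch_verb_table[verb][status] ? 0 : -1;
}
```

With a check like this at the top of each verb handler, the handler body
no longer needs its own state tests.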


=======
Changes
=======

(1)

To facilitate "nice" error checking, all five major block-job verb
interfaces in blockjob.c now support an errp parameter:

- block_job_user_cancel is added as a new interface.
- block_job_user_pause gains an errp parameter.
- block_job_user_resume gains an errp parameter.
- block_job_set_speed already had an errp parameter.
- block_job_complete already had an errp parameter.

(2)

block-job-pause and block-job-resume will no longer no-op when trying
to pause an already paused job, or trying to resume a job that isn't
paused. These functions will now report that they did not perform the
action requested because it was not possible.

iotests have been adjusted to address this new behavior.

(3)

block-job-complete doesn't worry about checking !block_job_started,
because the permission table guards against this.

(4)

test-bdrv-drain's job implementation needs to announce that it is
'ready' now, in order to be completed.

Signed-off-by: John Snow 
---
 block/trace-events   |  1 +
 blockdev.c   | 10 +++
 blockjob.c   | 71 ++--
 include/block/blockjob.h | 13 +++--
 qapi/block-core.json | 20 ++
 tests/test-bdrv-drain.c  |  1 +
 6 files changed, 100 insertions(+), 16 deletions(-)

diff --git a/block/trace-events b/block/trace-events
index b75a0c8409..3fe89f7ea6 100644
--- a/block/trace-events
+++ b/block/trace-events
@@ -6,6 +6,7 @@ bdrv_lock_medium(void *bs, bool locked) "bs %p locked %d"
 
 # blockjob.c
 block_job_state_transition(void *job,  int ret, const char *legal, const char 
*s0, const char *s1) "job %p (ret: %d) attempting %s transition (%s-->%s)"
+block_job_apply_verb(void *job, const char *state, const char *verb, const 
char *legal) "job %p in state %s; applying verb %s (%s)"
 
 # block/block-backend.c
 blk_co_preadv(void *blk, void *bs, int64_t offset, unsigned int bytes, int 
flags) "blk %p bs %p offset %"PRId64" bytes %u flags 0x%x"
diff --git a/blockdev.c b/blockdev.c
index 3fb1ca803c..cba935a0a6 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3805,7 +3805,7 @@ void qmp_block_job_cancel(const char *device,
 }
 
 trace_qmp_block_job_cancel(job);
-block_job_cancel(job);
+block_job_user_cancel(job, errp);
 out:
 aio_context_release(aio_context);
 }
@@ -3815,12 +3815,12 @@ void qmp_block_job_pause(const char *device, Error 
**errp)
 AioContext *aio_context;
 BlockJob *job = find_block_job(device, &aio_context, errp);
 
-if (!job || block_job_user_paused(job)) {
+if (!job) {
 return;
 }
 
 trace_qmp_block_job_pause(job);
-block_job_user_pause(job);
+block_job_user_pause(job, errp);
 aio_context_release(aio_context);
 }
 
@@ -3829,12 +3829,12 @@ void qmp_block_job_resume(const char *device, Error 
**errp)
 AioContext *aio_context;
 BlockJob *job = find_block_job(device, &aio_context, errp);
 
-if (!job || !block_job_user_paused(job)) {
+if (!job) {
 return;
 }
 
 trace_qmp_block_job_resume(job);
-block_job_user_resume(job);
+block_job_user_resume(job, errp);
 aio_context_release(aio_context);
 }
 
diff --git a/blockjob.c b/blockjob.c
index d745b3bb69..4e424fef72 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -53,6 +53,15 @@ bool 
BlockJobSTT[BLOCK_JOB_STATUS__MAX][BLOCK_JOB_STATUS__MAX] = {
 /* S: */ [BLOCK_JOB_STATUS_STANDBY]   = {0, 0, 0, 0, 1, 0},
 };
 
+bool 

[Qemu-block] [RFC v4 21/21] blockjobs: add manual_mgmt option to transactions

2018-02-23 Thread John Snow
This allows us to easily force the option for all jobs belonging
to a transaction to ensure consistency with how all those jobs
will be handled.

This is purely a convenience.
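
The effect on an individual job can be sketched as follows: a job ends up
in manual mode if it asked for it itself or if its transaction forces it.
SketchTxn, SKETCH_JOB_MANUAL and sketch_job_is_manual are hypothetical
names used only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_JOB_MANUAL 0x01  /* hypothetical stand-in for BLOCK_JOB_MANUAL */

/* Hypothetical, cut-down transaction record. */
typedef struct {
    bool manual;  /* participating jobs must use manual completion */
} SketchTxn;

/* A job is manual if its own flag is set or its transaction demands it. */
static bool sketch_job_is_manual(int flags, const SketchTxn *txn)
{
    return (flags & SKETCH_JOB_MANUAL) || (txn && txn->manual);
}
```

A NULL txn models a job created outside of any explicit transaction, in
which case only the job's own flag matters.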

Signed-off-by: John Snow 
---
 blockdev.c|  7 ++-
 blockjob.c| 10 +++---
 include/block/blockjob.h  |  5 -
 qapi/transaction.json |  3 ++-
 tests/test-blockjob-txn.c |  6 +++---
 5 files changed, 22 insertions(+), 9 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 2eddb0e726..34181c41c2 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2225,6 +2225,11 @@ static TransactionProperties *get_transaction_properties(
 props->completion_mode = ACTION_COMPLETION_MODE_INDIVIDUAL;
 }
 
+if (!props->has_manual_mgmt) {
+props->has_manual_mgmt = true;
+props->manual_mgmt = false;
+}
+
 return props;
 }
 
@@ -2250,7 +2255,7 @@ void qmp_transaction(TransactionActionList *dev_list,
  */
 props = get_transaction_properties(props);
 if (props->completion_mode != ACTION_COMPLETION_MODE_INDIVIDUAL) {
-block_job_txn = block_job_txn_new();
+block_job_txn = block_job_txn_new(props->manual_mgmt);
 }
 
 /* drain all i/o before any operations */
diff --git a/blockjob.c b/blockjob.c
index f9e8a64261..eaaa2aea65 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -136,6 +136,9 @@ struct BlockJobTxn {
 
 /* Reference count */
 int refcnt;
+
+/* Participating jobs must use manual completion */
+bool manual;
 };
 
 static QLIST_HEAD(, BlockJob) block_jobs = QLIST_HEAD_INITIALIZER(block_jobs);
@@ -176,11 +179,12 @@ BlockJob *block_job_get(const char *id)
 return NULL;
 }
 
-BlockJobTxn *block_job_txn_new(void)
+BlockJobTxn *block_job_txn_new(bool manual_mgmt)
 {
 BlockJobTxn *txn = g_new0(BlockJobTxn, 1);
 QLIST_INIT(&txn->jobs);
 txn->refcnt = 1;
+txn->manual = manual_mgmt;
 return txn;
 }
 
@@ -944,7 +948,7 @@ void *block_job_create(const char *job_id, const 
BlockJobDriver *driver,
 job->paused= true;
 job->pause_count   = 1;
 job->refcnt= 1;
-job->manual= (flags & BLOCK_JOB_MANUAL);
+job->manual= (flags & BLOCK_JOB_MANUAL) || (txn && txn->manual);
 job->status= BLOCK_JOB_STATUS_CREATED;
 block_job_state_transition(job, BLOCK_JOB_STATUS_CREATED);
 aio_timer_init(qemu_get_aio_context(), &job->sleep_timer,
@@ -978,7 +982,7 @@ void *block_job_create(const char *job_id, const 
BlockJobDriver *driver,
 /* Single jobs are modeled as single-job transactions for sake of
  * consolidating the job management logic */
 if (!txn) {
-txn = block_job_txn_new();
+txn = block_job_txn_new(false);
 block_job_txn_add_job(txn, job);
 block_job_txn_unref(txn);
 } else {
diff --git a/include/block/blockjob.h b/include/block/blockjob.h
index e09064c342..f3d026f13d 100644
--- a/include/block/blockjob.h
+++ b/include/block/blockjob.h
@@ -371,8 +371,11 @@ void block_job_iostatus_reset(BlockJob *job);
  * All jobs in the transaction either complete successfully or fail/cancel as a
  * group.  Jobs wait for each other before completing.  Cancelling one job
  * cancels all jobs in the transaction.
+ *
+ * @manual_mgmt: whether or not jobs that belong to this transaction will be
+ *   forced to use 2.12+ job management semantics
  */
-BlockJobTxn *block_job_txn_new(void);
+BlockJobTxn *block_job_txn_new(bool manual_mgmt);
 
 /**
  * block_job_ref:
diff --git a/qapi/transaction.json b/qapi/transaction.json
index bd312792da..9611758cb6 100644
--- a/qapi/transaction.json
+++ b/qapi/transaction.json
@@ -79,7 +79,8 @@
 ##
 { 'struct': 'TransactionProperties',
   'data': {
-   '*completion-mode': 'ActionCompletionMode'
+   '*completion-mode': 'ActionCompletionMode',
+   '*manual-mgmt': 'bool'
   }
 }
 
diff --git a/tests/test-blockjob-txn.c b/tests/test-blockjob-txn.c
index 34f09ef8c1..2d84f9a41e 100644
--- a/tests/test-blockjob-txn.c
+++ b/tests/test-blockjob-txn.c
@@ -119,7 +119,7 @@ static void test_single_job(int expected)
 BlockJobTxn *txn;
 int result = -EINPROGRESS;
 
-txn = block_job_txn_new();
+txn = block_job_txn_new(false);
 job = test_block_job_start(1, true, expected, &result, txn);
 block_job_start(job);
 
@@ -158,7 +158,7 @@ static void test_pair_jobs(int expected1, int expected2)
 int result1 = -EINPROGRESS;
 int result2 = -EINPROGRESS;
 
-txn = block_job_txn_new();
+txn = block_job_txn_new(false);
 job1 = test_block_job_start(1, true, expected1, &result1, txn);
 job2 = test_block_job_start(2, true, expected2, &result2, txn);
 block_job_start(job1);
@@ -220,7 +220,7 @@ static void test_pair_jobs_fail_cancel_race(void)
 int result1 = -EINPROGRESS;
 int result2 = -EINPROGRESS;
 
-txn = block_job_txn_new();
+txn = block_job_txn_new(false);
 job1 = test_block_job_start(1, true, -ECANCELED, &result1, txn);
 job2 = 

[Qemu-block] [RFC v4 01/21] blockjobs: fix set-speed kick

2018-02-23 Thread John Snow
If speed is '0' it's not actually "less than" the previous speed.
Kick the job in this case too.
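
The corrected condition can be isolated in a small predicate. The name
sketch_speed_change_needs_kick is hypothetical, and the sketch assumes the
caller has already rejected negative speeds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true when a speed change should kick (re-enter) the job.
 * A new speed of 0 means "no limit", so it is never treated as a slowdown
 * even though it compares as less than or equal to any old speed.
 */
static bool sketch_speed_change_needs_kick(int64_t old_speed,
                                           int64_t new_speed)
{
    assert(old_speed >= 0 && new_speed >= 0);

    if (new_speed && new_speed <= old_speed) {
        return false;  /* genuinely slower or unchanged: nothing to do */
    }
    return true;
}
```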

Signed-off-by: John Snow 
---
 blockjob.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/blockjob.c b/blockjob.c
index 3f52f29f75..24833ef30f 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -499,7 +499,7 @@ void block_job_set_speed(BlockJob *job, int64_t speed, 
Error **errp)
 }
 
 job->speed = speed;
-if (speed <= old_speed) {
+if (speed && speed <= old_speed) {
 return;
 }
 
-- 
2.14.3




[Qemu-block] [RFC v4 09/21] blockjobs: add CONCLUDED state

2018-02-23 Thread John Snow
Add a new state "CONCLUDED" that identifies a job that has ceased all
operations. The wording was chosen to avoid any phrasing that might
imply success, error, or cancellation. The task has simply ceased all
operation and can never again perform any work.

("finished", "done", and "completed" might all imply success.)

Transitions:
Running  -> Concluded: normal completion
Ready    -> Concluded: normal completion
Aborting -> Concluded: error and cancellations

Verbs:
None as of this commit. (a future commit adds 'dismiss')

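             +---------+
             |UNDEFINED|
             +--+------+
                |
             +--v----+
             |CREATED|
             +--+----+
                |
             +--v----+     +------+
   +---------+RUNNING<----->PAUSED|
   |         +--+-+--+     +------+
   |            | |
   |            | +------------------+
   |            |                    |
   |         +--v--+       +-------+ |
   +---------+READY<------->STANDBY| |
   |         +--+--+       +-------+ |
   |            |                    |
+--v-----+   +--v------+             |
|ABORTING+--->CONCLUDED<-------------+
+--------+   +---------+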

Signed-off-by: John Snow 
---
 blockjob.c   | 43 ---
 qapi/block-core.json |  5 -
 2 files changed, 32 insertions(+), 16 deletions(-)

diff --git a/blockjob.c b/blockjob.c
index 4c3fcda46c..93b0a36306 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -44,23 +44,24 @@ static QemuMutex block_job_mutex;
 
 /* BlockJob State Transition Table */
 bool BlockJobSTT[BLOCK_JOB_STATUS__MAX][BLOCK_JOB_STATUS__MAX] = {
-  /* U, C, R, P, Y, S, X */
-/* U: */ [BLOCK_JOB_STATUS_UNDEFINED] = {0, 1, 0, 0, 0, 0, 0},
-/* C: */ [BLOCK_JOB_STATUS_CREATED]   = {0, 0, 1, 0, 0, 0, 0},
-/* R: */ [BLOCK_JOB_STATUS_RUNNING]   = {0, 0, 0, 1, 1, 0, 1},
-/* P: */ [BLOCK_JOB_STATUS_PAUSED]= {0, 0, 1, 0, 0, 0, 0},
-/* Y: */ [BLOCK_JOB_STATUS_READY] = {0, 0, 0, 0, 0, 1, 1},
-/* S: */ [BLOCK_JOB_STATUS_STANDBY]   = {0, 0, 0, 0, 1, 0, 0},
-/* X: */ [BLOCK_JOB_STATUS_ABORTING]  = {0, 0, 0, 0, 0, 0, 0},
+  /* U, C, R, P, Y, S, X, E */
+/* U: */ [BLOCK_JOB_STATUS_UNDEFINED] = {0, 1, 0, 0, 0, 0, 0, 0},
+/* C: */ [BLOCK_JOB_STATUS_CREATED]   = {0, 0, 1, 0, 0, 0, 0, 0},
+/* R: */ [BLOCK_JOB_STATUS_RUNNING]   = {0, 0, 0, 1, 1, 0, 1, 1},
+/* P: */ [BLOCK_JOB_STATUS_PAUSED]= {0, 0, 1, 0, 0, 0, 0, 0},
+/* Y: */ [BLOCK_JOB_STATUS_READY] = {0, 0, 0, 0, 0, 1, 1, 1},
+/* S: */ [BLOCK_JOB_STATUS_STANDBY]   = {0, 0, 0, 0, 1, 0, 0, 0},
+/* X: */ [BLOCK_JOB_STATUS_ABORTING]  = {0, 0, 0, 0, 0, 0, 0, 1},
+/* E: */ [BLOCK_JOB_STATUS_CONCLUDED] = {0, 0, 0, 0, 0, 0, 0, 0},
 };
 
 bool BlockJobVerbTable[BLOCK_JOB_VERB__MAX][BLOCK_JOB_STATUS__MAX] = {
-  /* U, C, R, P, Y, S, X */
-[BLOCK_JOB_VERB_CANCEL]   = {0, 1, 1, 1, 1, 1, 0},
-[BLOCK_JOB_VERB_PAUSE]= {0, 1, 1, 1, 1, 1, 0},
-[BLOCK_JOB_VERB_RESUME]   = {0, 1, 1, 1, 1, 1, 0},
-[BLOCK_JOB_VERB_SET_SPEED]= {0, 1, 1, 1, 1, 1, 0},
-[BLOCK_JOB_VERB_COMPLETE] = {0, 0, 0, 0, 1, 0, 0},
+  /* U, C, R, P, Y, S, X, E */
+[BLOCK_JOB_VERB_CANCEL]   = {0, 1, 1, 1, 1, 1, 0, 0},
+[BLOCK_JOB_VERB_PAUSE]= {0, 1, 1, 1, 1, 1, 0, 0},
+[BLOCK_JOB_VERB_RESUME]   = {0, 1, 1, 1, 1, 1, 0, 0},
+[BLOCK_JOB_VERB_SET_SPEED]= {0, 1, 1, 1, 1, 1, 0, 0},
+[BLOCK_JOB_VERB_COMPLETE] = {0, 0, 0, 0, 1, 0, 0, 0},
 };
 
 static void block_job_state_transition(BlockJob *job, BlockJobStatus s1)
@@ -114,6 +115,7 @@ static void __attribute__((__constructor__)) 
block_job_init(void)
 
 static void block_job_event_cancelled(BlockJob *job);
 static void block_job_event_completed(BlockJob *job, const char *msg);
+static void block_job_event_concluded(BlockJob *job);
 static void block_job_enter_cond(BlockJob *job, bool(*fn)(BlockJob *job));
 
 /* Transactional group of block jobs */
@@ -420,6 +422,7 @@ static void block_job_completed_single(BlockJob *job)
 
 QLIST_REMOVE(job, txn_list);
 block_job_txn_unref(job->txn);
+block_job_event_concluded(job);
 block_job_unref(job);
 }
 
@@ -620,7 +623,9 @@ void block_job_user_resume(BlockJob *job, Error **errp)
 
 void block_job_cancel(BlockJob *job)
 {
-if (block_job_started(job)) {
+if (job->status == BLOCK_JOB_STATUS_CONCLUDED) {
+return;
+} else if (block_job_started(job)) {
 block_job_cancel_async(job);
 block_job_enter(job);
 } else {
@@ -727,6 +732,14 @@ static void block_job_event_completed(BlockJob *job, const 
char *msg)
 &error_abort);
 }
 
+static void block_job_event_concluded(BlockJob *job)
+{
+if 

[Qemu-block] [RFC v4 00/21] blockjobs: add explicit job management

2018-02-23 Thread John Snow
This series seeks to address two distinct but closely related issues
concerning the job management API.

(1) For jobs that complete when a monitor is not attached and receiving
events or notifications, there's no way to discern the job's final
return code. Jobs must remain in the query list until dismissed
for reliable management.

(2) Jobs that change the block graph structure at an indeterminate point
after the job starts compete with the management layer that relies
on that graph structure to issue meaningful commands.

This structure should change only at the behest of the management
API, and not asynchronously at unknown points in time. Before a job
issues such changes, it must rely on explicit and synchronous
confirmation from the management API.

This series is a rough sketch that solves these problems by adding three
new distinct job states, and two new job command verbs.

These changes are implemented by formalizing a State Transition Machine
for the BlockJob subsystem.

Job States:

UNDEFINED   Default state. Internal state only.
CREATED     Job has been created
RUNNING     Job has been started and is running
PAUSED      Job is not ready and has been paused
READY       Job is ready and is running
STANDBY     Job is ready and is paused

WAITING     Job is waiting on peers in transaction
PENDING     Job is waiting on ACK from QMP
ABORTING    Job is aborting or has been cancelled
CONCLUDED   Job has finished and has a retcode available
NULL        Job is being dismantled. Internal state only.

Job Verbs:

CANCEL      Instructs a running job to terminate with error,
            (Except when that job is READY, which produces no error.)
PAUSE       Request a job to pause.
RESUME      Request a job to resume from a pause.
SET-SPEED   Change the speed limiting parameter of a job.
COMPLETE    Ask a READY job to finish and exit.

FINALIZE    Ask a PENDING job to perform its graph finalization.
DISMISS     Finish cleaning up an empty job.

And here's my stab at a diagram:

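             +---------+
             |UNDEFINED|
             +--+------+
                |
             +--v----+
             |CREATED+-----------------+
             +--+----+                 |
                |                      |
             +--+----+     +------+    |
   +---------+RUNNING<----->PAUSED|    |
   |         +--+-+--+     +------+    |
   |            | |                    |
   |            | +------------------+ |
   |            |                    | |
   |         +--v--+       +-------+ | |
   +---------+READY<------->STANDBY| | |
   |         +--+--+       +-------+ | |
   |            |                    | |
   |         +--v----+               | |
   +---------+WAITING+---------------+ |
   |         +--+----+                 |
   |            |                      |
   |         +--v----+                 |
   +---------+PENDING|                 |
   |         +--+----+                 |
   |            |                      |
+--v-----+   +--v------+               |
|ABORTING+--->CONCLUDED|               |
+--------+   +--+------+               |
                |                      |
             +--v-+                    |
             |NULL+--------------------+
             +----+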

V4:
 - All jobs are now transactions.
 - All jobs now transition through states in a uniform way.
 - Verb permissions are now enforced.

V3:
 - Added WAITING and PENDING events
 - Added block_job_finalize verb
 - Added .pending() callback for jobs
 - Tweaked how .commit/.abort work

V2:
 - Added tests!
 - Changed property name (Jeff, Paolo)

RFC / Known problems:
- I need a lot more tests, still...

- STANDBY is a dumb name, and maybe not even really needed or wanted.
  However, a Paused job will return to either READY or RUNNING depending on
  the state it was in when it was PAUSED. We can keep that in an internal
  variable, or we can make it explicit in the STM.

- is "manual" descriptive as a property name?
  Kevin conceives of the new workflow as
  "No automatic transitions, please." (i.e. automatic-transitions: False)
  Whereas I think of it more like:
  "Enable manual workflow mode, please." (manual-transitions: True)

  I like the idea of the new property defaulting to false and have coded
  in a way mindful of that.

- Mirror needs to be refactored to use the commit/abort/pending/clean callbacks
  to fulfill the promise made by "no graph changes without user authorization"
  that PENDING is supposed to offer



For convenience, this branch is available at:
https://github.com/jnsnow/qemu.git branch block-job-reap
https://github.com/jnsnow/qemu/tree/block-job-reap

This version is tagged 

[Qemu-block] [RFC v4 10/21] blockjobs: add NULL state

2018-02-23 Thread John Snow
Add a new state that specifically demarcates when we begin to permanently
demolish a job after it has performed all work. This makes the transition
explicit in the STM table and highlights conditions under which a job may
be demolished.


Transitions:
Created   -> Null: Early failure event before the job is started
Concluded -> Null: Standard transition.

Verbs:
None. This should not ever be visible to the monitor.

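             +---------+
             |UNDEFINED|
             +--+------+
                |
             +--v----+
             |CREATED+------------------+
             +--+----+                  |
                |                       |
             +--v----+     +------+     |
   +---------+RUNNING<----->PAUSED|     |
   |         +--+-+--+     +------+     |
   |            | |                     |
   |            | +------------------+  |
   |            |                    |  |
   |         +--v--+       +-------+ |  |
   +---------+READY<------->STANDBY| |  |
   |         +--+--+       +-------+ |  |
   |            |                    |  |
+--v-----+   +--v------+             |  |
|ABORTING+--->CONCLUDED<-------------+  |
+--------+   +--+------+                |
                |                       |
             +--v-+                     |
             |NULL<---------------------+
             +----+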

Signed-off-by: John Snow 
---
 blockjob.c   | 35 +--
 qapi/block-core.json |  5 -
 2 files changed, 21 insertions(+), 19 deletions(-)

diff --git a/blockjob.c b/blockjob.c
index 93b0a36306..7b5c4063cf 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -44,24 +44,25 @@ static QemuMutex block_job_mutex;
 
 /* BlockJob State Transition Table */
 bool BlockJobSTT[BLOCK_JOB_STATUS__MAX][BLOCK_JOB_STATUS__MAX] = {
-  /* U, C, R, P, Y, S, X, E */
-/* U: */ [BLOCK_JOB_STATUS_UNDEFINED] = {0, 1, 0, 0, 0, 0, 0, 0},
-/* C: */ [BLOCK_JOB_STATUS_CREATED]   = {0, 0, 1, 0, 0, 0, 0, 0},
-/* R: */ [BLOCK_JOB_STATUS_RUNNING]   = {0, 0, 0, 1, 1, 0, 1, 1},
-/* P: */ [BLOCK_JOB_STATUS_PAUSED]= {0, 0, 1, 0, 0, 0, 0, 0},
-/* Y: */ [BLOCK_JOB_STATUS_READY] = {0, 0, 0, 0, 0, 1, 1, 1},
-/* S: */ [BLOCK_JOB_STATUS_STANDBY]   = {0, 0, 0, 0, 1, 0, 0, 0},
-/* X: */ [BLOCK_JOB_STATUS_ABORTING]  = {0, 0, 0, 0, 0, 0, 0, 1},
-/* E: */ [BLOCK_JOB_STATUS_CONCLUDED] = {0, 0, 0, 0, 0, 0, 0, 0},
+  /* U, C, R, P, Y, S, X, E, N */
+/* U: */ [BLOCK_JOB_STATUS_UNDEFINED] = {0, 1, 0, 0, 0, 0, 0, 0, 0},
+/* C: */ [BLOCK_JOB_STATUS_CREATED]   = {0, 0, 1, 0, 0, 0, 0, 0, 1},
+/* R: */ [BLOCK_JOB_STATUS_RUNNING]   = {0, 0, 0, 1, 1, 0, 1, 1, 0},
+/* P: */ [BLOCK_JOB_STATUS_PAUSED]= {0, 0, 1, 0, 0, 0, 0, 0, 0},
+/* Y: */ [BLOCK_JOB_STATUS_READY] = {0, 0, 0, 0, 0, 1, 1, 1, 0},
+/* S: */ [BLOCK_JOB_STATUS_STANDBY]   = {0, 0, 0, 0, 1, 0, 0, 0, 0},
+/* X: */ [BLOCK_JOB_STATUS_ABORTING]  = {0, 0, 0, 0, 0, 0, 0, 1, 0},
+/* E: */ [BLOCK_JOB_STATUS_CONCLUDED] = {0, 0, 0, 0, 0, 0, 0, 0, 1},
+/* N: */ [BLOCK_JOB_STATUS_NULL]  = {0, 0, 0, 0, 0, 0, 0, 0, 0},
 };
 
 bool BlockJobVerbTable[BLOCK_JOB_VERB__MAX][BLOCK_JOB_STATUS__MAX] = {
-  /* U, C, R, P, Y, S, X, E */
-[BLOCK_JOB_VERB_CANCEL]   = {0, 1, 1, 1, 1, 1, 0, 0},
-[BLOCK_JOB_VERB_PAUSE]= {0, 1, 1, 1, 1, 1, 0, 0},
-[BLOCK_JOB_VERB_RESUME]   = {0, 1, 1, 1, 1, 1, 0, 0},
-[BLOCK_JOB_VERB_SET_SPEED]= {0, 1, 1, 1, 1, 1, 0, 0},
-[BLOCK_JOB_VERB_COMPLETE] = {0, 0, 0, 0, 1, 0, 0, 0},
+  /* U, C, R, P, Y, S, X, E, N */
+[BLOCK_JOB_VERB_CANCEL]   = {0, 1, 1, 1, 1, 1, 0, 0, 0},
+[BLOCK_JOB_VERB_PAUSE]= {0, 1, 1, 1, 1, 1, 0, 0, 0},
+[BLOCK_JOB_VERB_RESUME]   = {0, 1, 1, 1, 1, 1, 0, 0, 0},
+[BLOCK_JOB_VERB_SET_SPEED]= {0, 1, 1, 1, 1, 1, 0, 0, 0},
+[BLOCK_JOB_VERB_COMPLETE] = {0, 0, 0, 0, 1, 0, 0, 0, 0},
 };
 
 static void block_job_state_transition(BlockJob *job, BlockJobStatus s1)
@@ -423,6 +424,7 @@ static void block_job_completed_single(BlockJob *job)
 QLIST_REMOVE(job, txn_list);
 block_job_txn_unref(job->txn);
 block_job_event_concluded(job);
+block_job_state_transition(job, BLOCK_JOB_STATUS_NULL);
 block_job_unref(job);
 }
 
@@ -734,9 +736,6 @@ static void block_job_event_completed(BlockJob *job, const 
char *msg)
 
 static void block_job_event_concluded(BlockJob *job)
 {
-if (block_job_is_internal(job) || !job->manual) {
-return;
-}
 block_job_state_transition(job, BLOCK_JOB_STATUS_CONCLUDED);
 }
 
diff --git a/qapi/block-core.json b/qapi/block-core.json
index aeb9b1937b..578c0c91ca 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1003,11 +1003,14 @@
 # @concluded: The 

[Qemu-block] [RFC v4 06/21] iotests: add pause_wait

2018-02-23 Thread John Snow
Split out the pause command into the actual pause and the wait.
Not every usage presently needs to resubmit a pause request.

The intent with the next commit will be to explicitly disallow
redundant or meaningless pause/resume requests, so the tests
need to become more judicious to reflect that.

Signed-off-by: John Snow 
---
 tests/qemu-iotests/030|  6 ++
 tests/qemu-iotests/055| 17 ++---
 tests/qemu-iotests/iotests.py | 12 
 3 files changed, 16 insertions(+), 19 deletions(-)

diff --git a/tests/qemu-iotests/030 b/tests/qemu-iotests/030
index 457984b8e9..251883226c 100755
--- a/tests/qemu-iotests/030
+++ b/tests/qemu-iotests/030
@@ -86,11 +86,9 @@ class TestSingleDrive(iotests.QMPTestCase):
 result = self.vm.qmp('block-stream', device='drive0')
 self.assert_qmp(result, 'return', {})
 
-result = self.vm.qmp('block-job-pause', device='drive0')
-self.assert_qmp(result, 'return', {})
-
+self.pause_job('drive0', wait=False)
 self.vm.resume_drive('drive0')
-self.pause_job('drive0')
+self.pause_wait('drive0')
 
 result = self.vm.qmp('query-block-jobs')
 offset = self.dictpath(result, 'return[0]/offset')
diff --git a/tests/qemu-iotests/055 b/tests/qemu-iotests/055
index 8a5d9fd269..3437c11507 100755
--- a/tests/qemu-iotests/055
+++ b/tests/qemu-iotests/055
@@ -86,11 +86,9 @@ class TestSingleDrive(iotests.QMPTestCase):
  target=target, sync='full')
 self.assert_qmp(result, 'return', {})
 
-result = self.vm.qmp('block-job-pause', device='drive0')
-self.assert_qmp(result, 'return', {})
-
+self.pause_job('drive0', wait=False)
 self.vm.resume_drive('drive0')
-self.pause_job('drive0')
+self.pause_wait('drive0')
 
 result = self.vm.qmp('query-block-jobs')
 offset = self.dictpath(result, 'return[0]/offset')
@@ -303,13 +301,12 @@ class TestSingleTransaction(iotests.QMPTestCase):
 ])
 self.assert_qmp(result, 'return', {})
 
-result = self.vm.qmp('block-job-pause', device='drive0')
-self.assert_qmp(result, 'return', {})
+self.pause_job('drive0', wait=False)
 
 result = self.vm.qmp('block-job-set-speed', device='drive0', speed=0)
 self.assert_qmp(result, 'return', {})
 
-self.pause_job('drive0')
+self.pause_wait('drive0')
 
 result = self.vm.qmp('query-block-jobs')
 offset = self.dictpath(result, 'return[0]/offset')
@@ -534,11 +531,9 @@ class TestDriveCompression(iotests.QMPTestCase):
 result = self.vm.qmp(cmd, device='drive0', sync='full', compress=True, 
**args)
 self.assert_qmp(result, 'return', {})
 
-result = self.vm.qmp('block-job-pause', device='drive0')
-self.assert_qmp(result, 'return', {})
-
+self.pause_job('drive0', wait=False)
 self.vm.resume_drive('drive0')
-self.pause_job('drive0')
+self.pause_wait('drive0')
 
 result = self.vm.qmp('query-block-jobs')
 offset = self.dictpath(result, 'return[0]/offset')
diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 1bcc9ca57d..5303bbc8e2 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -473,10 +473,7 @@ class QMPTestCase(unittest.TestCase):
         event = self.wait_until_completed(drive=drive)
         self.assert_qmp(event, 'data/type', 'mirror')
 
-    def pause_job(self, job_id='job0'):
-        result = self.vm.qmp('block-job-pause', device=job_id)
-        self.assert_qmp(result, 'return', {})
-
+    def pause_wait(self, job_id='job0'):
         with Timeout(1, "Timeout waiting for job to pause"):
             while True:
                 result = self.vm.qmp('query-block-jobs')
@@ -484,6 +481,13 @@ class QMPTestCase(unittest.TestCase):
                     if job['device'] == job_id and job['paused'] == True and job['busy'] == False:
                         return job
 
+    def pause_job(self, job_id='job0', wait=True):
+        result = self.vm.qmp('block-job-pause', device=job_id)
+        self.assert_qmp(result, 'return', {})
+        if wait:
+            return self.pause_wait(job_id)
+        return result
+
 
 def notrun(reason):
     '''Skip this test suite'''
-- 
2.14.3
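
The pause_job/pause_wait split in the patch above can be sketched outside
the test harness. This is only an illustration: FakeVM is a hypothetical
stand-in for the iotests VM object, and the explicit deadline loop replaces
the Timeout context manager used in iotests.py.

```python
import time


class FakeVM:
    """Hypothetical stand-in for the iotests VM; not the real QMP client."""

    def __init__(self, polls_until_paused=2):
        self.polls_left = polls_until_paused
        self.pause_requested = False

    def qmp(self, cmd, **args):
        if cmd == 'block-job-pause':
            self.pause_requested = True
            return {'return': {}}
        if cmd == 'query-block-jobs':
            # The job only reports "paused" after a few polls.
            paused = self.pause_requested and self.polls_left <= 0
            if self.pause_requested:
                self.polls_left -= 1
            return {'return': [{'device': 'drive0',
                                'paused': paused, 'busy': not paused}]}
        raise ValueError(cmd)


def pause_wait(vm, job_id, timeout=1.0):
    """Poll until the job reports paused and not busy, or time out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        for job in vm.qmp('query-block-jobs')['return']:
            if job['device'] == job_id and job['paused'] and not job['busy']:
                return job
    raise TimeoutError('Timeout waiting for job to pause')


def pause_job(vm, job_id, wait=True):
    """Request a pause; optionally wait for it to take effect."""
    result = vm.qmp('block-job-pause', device=job_id)
    assert result == {'return': {}}
    if wait:
        return pause_wait(vm, job_id)
    return result
```

With wait=False the caller can issue other commands (such as resuming the
drive) between requesting the pause and waiting for it, which is the
pattern the 055 changes rely on.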




[Qemu-block] [RFC v4 13/21] blockjobs: add commit, abort, clean helpers

2018-02-23 Thread John Snow
The completed_single function is getting a little mucked up with
checking to see which callbacks exist, so let's factor them out.

Signed-off-by: John Snow 
---
 blockjob.c | 35 ++-
 1 file changed, 26 insertions(+), 9 deletions(-)

diff --git a/blockjob.c b/blockjob.c
index ef17dea004..431ce9c220 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -394,6 +394,29 @@ static void block_job_update_rc(BlockJob *job)
 }
 }
 
+static void block_job_commit(BlockJob *job)
+{
+assert(!job->ret);
+if (job->driver->commit) {
+job->driver->commit(job);
+}
+}
+
+static void block_job_abort(BlockJob *job)
+{
+assert(job->ret);
+if (job->driver->abort) {
+job->driver->abort(job);
+}
+}
+
+static void block_job_clean(BlockJob *job)
+{
+if (job->driver->clean) {
+job->driver->clean(job);
+}
+}
+
 static void block_job_completed_single(BlockJob *job)
 {
 assert(job->completed);
@@ -402,17 +425,11 @@ static void block_job_completed_single(BlockJob *job)
 block_job_update_rc(job);
 
 if (!job->ret) {
-if (job->driver->commit) {
-job->driver->commit(job);
-}
+block_job_commit(job);
 } else {
-if (job->driver->abort) {
-job->driver->abort(job);
-}
-}
-if (job->driver->clean) {
-job->driver->clean(job);
+block_job_abort(job);
 }
+block_job_clean(job);
 
 if (job->cb) {
 job->cb(job->opaque, job->ret);
-- 
2.14.3
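
As a rough model of the refactoring above (not QEMU code), the three
helpers can be sketched as follows. Driver, Job and run are hypothetical
names used only for this illustration; each callback is optional, as in
the C version.

```python
class Driver:
    """Hypothetical driver: any of the three callbacks may be None."""

    def __init__(self, commit=None, abort=None, clean=None):
        self.commit, self.abort, self.clean = commit, abort, clean


class Job:
    def __init__(self, driver, ret=0):
        self.driver, self.ret = driver, ret


def job_commit(job):
    assert job.ret == 0          # commit is only valid for successful jobs
    if job.driver.commit:
        job.driver.commit(job)


def job_abort(job):
    assert job.ret != 0          # abort is only valid for failed jobs
    if job.driver.abort:
        job.driver.abort(job)


def job_clean(job):
    if job.driver.clean:         # clean runs on both paths
        job.driver.clean(job)


def completed_single(job):
    if job.ret == 0:
        job_commit(job)
    else:
        job_abort(job)
    job_clean(job)


def run(ret):
    """Record which callbacks fire for a given return code."""
    calls = []
    driver = Driver(commit=lambda j: calls.append('commit'),
                    abort=lambda j: calls.append('abort'),
                    clean=lambda j: calls.append('clean'))
    completed_single(Job(driver, ret))
    return calls
```

Calling run(0) exercises the commit path and run(-5) the abort path; in
both cases clean is invoked last.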




[Qemu-block] [RFC v4 12/21] blockjobs: ensure abort is called for cancelled jobs

2018-02-23 Thread John Snow
Presently, even if a job is canceled post-completion as a result of
a failing peer in a transaction, it will still call .commit because
nothing has updated or changed its return code.

The reason why this does not cause problems currently is because
backup's implementation of .commit checks for cancellation itself.

I'd like to simplify this contract:

(1) Abort is called if the job/transaction fails
(2) Commit is called if the job/transaction succeeds

To this end: A job's return code, if 0, will be forcibly set as
-ECANCELED if that job has already concluded. Remove the now
redundant check in the backup job implementation.

We need to check for cancellation in both block_job_completed
AND block_job_completed_single, because jobs may be cancelled between
those two calls; for instance in transactions.

The check in block_job_completed could be removed, but there's no
point in starting to attempt to succeed a transaction that we know
in advance will fail.

This does NOT affect mirror jobs that are "canceled" during their
synchronous phase. The mirror job itself forcibly sets the canceled
property to false prior to ceding control, so such cases will invoke
the "commit" callback.

Signed-off-by: John Snow 
---
 block/backup.c |  2 +-
 block/trace-events |  1 +
 blockjob.c | 19 +++
 3 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 7e254dabff..453cd62c24 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -206,7 +206,7 @@ static void backup_cleanup_sync_bitmap(BackupBlockJob *job, 
int ret)
 BdrvDirtyBitmap *bm;
 BlockDriverState *bs = blk_bs(job->common.blk);
 
-if (ret < 0 || block_job_is_cancelled(&job->common)) {
+if (ret < 0) {
 /* Merge the successor back into the parent, delete nothing. */
 bm = bdrv_reclaim_dirty_bitmap(bs, job->sync_bitmap, NULL);
 assert(bm);
diff --git a/block/trace-events b/block/trace-events
index 266afd9e99..5e531e0310 100644
--- a/block/trace-events
+++ b/block/trace-events
@@ -5,6 +5,7 @@ bdrv_open_common(void *bs, const char *filename, int flags, 
const char *format_n
 bdrv_lock_medium(void *bs, bool locked) "bs %p locked %d"
 
 # blockjob.c
+block_job_completed(void *job, int ret, int jret) "job %p ret %d corrected ret %d"
 block_job_state_transition(void *job,  int ret, const char *legal, const char *s0, const char *s1) "job %p (ret: %d) attempting %s transition (%s-->%s)"
 block_job_apply_verb(void *job, const char *state, const char *verb, const char *legal) "job %p in state %s; applying verb %s (%s)"
 
diff --git a/blockjob.c b/blockjob.c
index 4d29391673..ef17dea004 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -384,13 +384,22 @@ void block_job_start(BlockJob *job)
 bdrv_coroutine_enter(blk_bs(job->blk), job->co);
 }
 
+static void block_job_update_rc(BlockJob *job)
+{
+if (!job->ret && block_job_is_cancelled(job)) {
+job->ret = -ECANCELED;
+}
+if (job->ret) {
+block_job_state_transition(job, BLOCK_JOB_STATUS_ABORTING);
+}
+}
+
 static void block_job_completed_single(BlockJob *job)
 {
 assert(job->completed);
 
-if (job->ret || block_job_is_cancelled(job)) {
-block_job_state_transition(job, BLOCK_JOB_STATUS_ABORTING);
-}
+/* Ensure abort is called for late-transactional failures */
+block_job_update_rc(job);
 
 if (!job->ret) {
 if (job->driver->commit) {
@@ -898,7 +907,9 @@ void block_job_completed(BlockJob *job, int ret)
 assert(blk_bs(job->blk)->job == job);
 job->completed = true;
 job->ret = ret;
-if (ret < 0 || block_job_is_cancelled(job)) {
+block_job_update_rc(job);
+trace_block_job_completed(job, ret, job->ret);
+if (job->ret) {
 block_job_completed_txn_abort(job);
 } else {
 block_job_completed_txn_success(job);
-- 
2.14.3
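
The contract described in this commit message (abort on failure, commit on
success, and a cancelled job never counts as success) can be sketched in a
few lines. The function names are hypothetical, and the state transition
to ABORTING done by the real block_job_update_rc is left out.

```python
import errno


def update_rc(ret, cancelled):
    """Return the corrected return code: a cancelled job never reports 0."""
    if ret == 0 and cancelled:
        return -errno.ECANCELED
    return ret


def outcome(ret, cancelled):
    """Which callback the simplified contract would select."""
    return 'abort' if update_rc(ret, cancelled) else 'commit'
```

An existing error code is preserved; only a zero return code is rewritten
when the job has been cancelled.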




[Qemu-block] [RFC v4 19/21] blockjobs: Expose manual property

2018-02-23 Thread John Snow
Expose the "manual" property via QAPI for the backup-related jobs.
As of this commit, this allows the management API to request the
"concluded" and "dismiss" semantics for backup jobs.

Signed-off-by: John Snow 
---
 blockdev.c   | 19 ---
 qapi/block-core.json | 32 ++--
 2 files changed, 42 insertions(+), 9 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 05fd421cdc..2eddb0e726 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3260,7 +3260,7 @@ static BlockJob *do_drive_backup(DriveBackup *backup, 
BlockJobTxn *txn,
 AioContext *aio_context;
 QDict *options = NULL;
 Error *local_err = NULL;
-int flags;
+int flags, job_flags = BLOCK_JOB_DEFAULT;
 int64_t size;
 bool set_backing_hd = false;
 
@@ -3279,6 +3279,9 @@ static BlockJob *do_drive_backup(DriveBackup *backup, 
BlockJobTxn *txn,
 if (!backup->has_job_id) {
 backup->job_id = NULL;
 }
+if (!backup->has_manual) {
+backup->manual = false;
+}
 if (!backup->has_compress) {
 backup->compress = false;
 }
@@ -3370,11 +3373,14 @@ static BlockJob *do_drive_backup(DriveBackup *backup, 
BlockJobTxn *txn,
 goto out;
 }
 }
+if (backup->manual) {
+job_flags |= BLOCK_JOB_MANUAL;
+}
 
 job = backup_job_create(backup->job_id, bs, target_bs, backup->speed,
 backup->sync, bmap, backup->compress,
 backup->on_source_error, backup->on_target_error,
-BLOCK_JOB_DEFAULT, NULL, NULL, txn, &local_err);
+job_flags, NULL, NULL, txn, &local_err);
 bdrv_unref(target_bs);
 if (local_err != NULL) {
 error_propagate(errp, local_err);
@@ -3409,6 +3415,7 @@ BlockJob *do_blockdev_backup(BlockdevBackup *backup, 
BlockJobTxn *txn,
 Error *local_err = NULL;
 AioContext *aio_context;
 BlockJob *job = NULL;
+int job_flags = BLOCK_JOB_DEFAULT;
 
 if (!backup->has_speed) {
 backup->speed = 0;
@@ -3422,6 +3429,9 @@ BlockJob *do_blockdev_backup(BlockdevBackup *backup, 
BlockJobTxn *txn,
 if (!backup->has_job_id) {
 backup->job_id = NULL;
 }
+if (!backup->has_manual) {
+backup->manual = false;
+}
 if (!backup->has_compress) {
 backup->compress = false;
 }
@@ -3450,10 +3460,13 @@ BlockJob *do_blockdev_backup(BlockdevBackup *backup, 
BlockJobTxn *txn,
 goto out;
 }
 }
+if (backup->manual) {
+job_flags |= BLOCK_JOB_MANUAL;
+}
 job = backup_job_create(backup->job_id, bs, target_bs, backup->speed,
 backup->sync, NULL, backup->compress,
 backup->on_source_error, backup->on_target_error,
-BLOCK_JOB_DEFAULT, NULL, NULL, txn, &local_err);
+job_flags, NULL, NULL, txn, &local_err);
 if (local_err != NULL) {
 error_propagate(errp, local_err);
 }
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 549c6c02d8..7b3af93682 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1177,6 +1177,16 @@
 # @job-id: identifier for the newly-created block job. If
 #  omitted, the device name will be used. (Since 2.7)
 #
+# @manual: True to use an expanded, more explicit job control flow.
+#  Jobs may transition from a running state to a pending state,
+#  where they must be instructed to complete manually via
+#  block-job-finalize.
+#  Jobs belonging to a transaction must either all or all not use this
+#  setting. Once a transaction reaches a pending state, issuing the
+#  finalize command to any one job in the transaction is sufficient
+#  to finalize the entire transaction.
+#  (Since 2.12)
+#
 # @device: the device name or node-name of a root node which should be copied.
 #
 # @target: the target of the new image. If the file exists, or if it
@@ -1217,9 +1227,10 @@
 # Since: 1.6
 ##
 { 'struct': 'DriveBackup',
-  'data': { '*job-id': 'str', 'device': 'str', 'target': 'str',
-'*format': 'str', 'sync': 'MirrorSyncMode', '*mode': 
'NewImageMode',
-'*speed': 'int', '*bitmap': 'str', '*compress': 'bool',
+  'data': { '*job-id': 'str', '*manual': 'bool', 'device': 'str',
+'target': 'str', '*format': 'str', 'sync': 'MirrorSyncMode',
+'*mode': 'NewImageMode', '*speed': 'int',
+'*bitmap': 'str', '*compress': 'bool',
 '*on-source-error': 'BlockdevOnError',
 '*on-target-error': 'BlockdevOnError' } }
 
@@ -1229,6 +1240,16 @@
 # @job-id: identifier for the newly-created block job. If
 #  omitted, the device name will be used. (Since 2.7)
 #
+# @manual: True to use an expanded, more explicit job control flow.
+#  Jobs may transition from a running state to a pending state,
+#  where they must be 

[Qemu-block] [RFC v4 15/21] blockjobs: add prepare callback

2018-02-23 Thread John Snow
Some jobs upon finalization may need to perform some work that can
still fail. If these jobs are part of a transaction, it's important
that these callbacks fail the entire transaction.

We allow for a new callback in addition to commit/abort/clean that
allows us the opportunity to have fairly late-breaking failures
in the transactional process.

The expected flow is:

- All jobs in a transaction converge to the WAITING state
  (added in a forthcoming commit)
- All jobs prepare to call either commit/abort
- If any job fails, is canceled, or fails preparation, all jobs
  call their .abort callback.
- All jobs enter the PENDING state, awaiting manual intervention
  (also added in a forthcoming commit)
- block-job-finalize is issued by the user/management layer
- All jobs call their commit callbacks.

Signed-off-by: John Snow 
---
 blockjob.c   | 34 +++---
 include/block/blockjob_int.h | 10 ++
 2 files changed, 41 insertions(+), 3 deletions(-)

diff --git a/blockjob.c b/blockjob.c
index 8f02c03880..1c010ec100 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -394,6 +394,18 @@ static void block_job_update_rc(BlockJob *job)
 }
 }
 
+static int block_job_prepare(BlockJob *job)
+{
+if (job->ret) {
+goto out;
+}
+if (job->driver->prepare) {
+job->ret = job->driver->prepare(job);
+}
+ out:
+return job->ret;
+}
+
 static void block_job_commit(BlockJob *job)
 {
 assert(!job->ret);
@@ -417,7 +429,7 @@ static void block_job_clean(BlockJob *job)
 }
 }
 
-static void block_job_completed_single(BlockJob *job)
+static int block_job_completed_single(BlockJob *job)
 {
 assert(job->completed);
 
@@ -452,6 +464,7 @@ static void block_job_completed_single(BlockJob *job)
 block_job_txn_unref(job->txn);
 block_job_event_concluded(job);
 block_job_unref(job);
+return 0;
 }
 
 static void block_job_cancel_async(BlockJob *job)
@@ -467,17 +480,22 @@ static void block_job_cancel_async(BlockJob *job)
 job->cancelled = true;
 }
 
-static void block_job_txn_apply(BlockJobTxn *txn, void fn(BlockJob *))
+static int block_job_txn_apply(BlockJobTxn *txn, int fn(BlockJob *))
 {
 AioContext *ctx;
 BlockJob *job, *next;
+int rc;
 
 QLIST_FOREACH_SAFE(job, &txn->jobs, txn_list, next) {
 ctx = blk_get_aio_context(job->blk);
 aio_context_acquire(ctx);
-fn(job);
+rc = fn(job);
 aio_context_release(ctx);
+if (rc) {
+break;
+}
 }
+return rc;
 }
 
 static void block_job_do_dismiss(BlockJob *job)
@@ -567,6 +585,8 @@ static void block_job_completed_txn_success(BlockJob *job)
 {
 BlockJobTxn *txn = job->txn;
 BlockJob *other_job;
+int rc = 0;
+
 /*
  * Successful completion, see if there are other running jobs in this
  * txn.
@@ -576,6 +596,14 @@ static void block_job_completed_txn_success(BlockJob *job)
 return;
 }
 }
+
+/* Jobs may require some prep-work to complete without failure */
+rc = block_job_txn_apply(txn, block_job_prepare);
+if (rc) {
+block_job_completed_txn_abort(job);
+return;
+}
+
 /* We are the last completed job, commit the transaction. */
 block_job_txn_apply(txn, block_job_completed_single);
 }
diff --git a/include/block/blockjob_int.h b/include/block/blockjob_int.h
index 259d49b32a..642adce68b 100644
--- a/include/block/blockjob_int.h
+++ b/include/block/blockjob_int.h
@@ -53,6 +53,16 @@ struct BlockJobDriver {
  */
 void (*complete)(BlockJob *job, Error **errp);
 
+/**
+ * If the callback is not NULL, prepare will be invoked when all the jobs
+ * belonging to the same transaction complete; or upon this job's completion
+ * if it is not in a transaction.
+ *
+ * This callback will not be invoked if the job has already failed.
+ * If it fails, abort and then clean will be called.
+ */
+int (*prepare)(BlockJob *job);
+
 /**
  * If the callback is not NULL, it will be invoked when all the jobs
  * belonging to the same transaction complete; or upon this job's
-- 
2.14.3
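
A minimal sketch of the prepare step and the early-exit apply loop, under
the assumption that a job can be modelled as a dict with a 'ret' code and
an optional 'prepare' callable. None of these names come from QEMU.

```python
def txn_apply(jobs, fn):
    """Apply fn to each job; stop at the first non-zero return code."""
    rc = 0  # initialised here so an empty list yields 0
    for job in jobs:
        rc = fn(job)
        if rc:
            break
    return rc


def prepare(job):
    """Run the optional prepare callback unless the job already failed."""
    if job['ret'] == 0 and job.get('prepare'):
        job['ret'] = job['prepare']()
    return job['ret']


def finish_txn(jobs):
    """Prepare every job; abort the whole transaction if any step fails."""
    if txn_apply(jobs, prepare):
        for job in jobs:
            job['result'] = 'abort'
        return 'aborted'
    for job in jobs:
        job['result'] = 'commit'
    return 'committed'
```

Note that the sketch initialises rc before the loop; in the C hunk above
rc is declared without an initial value, so the result for an empty
transaction list would depend on that.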




[Qemu-block] [RFC v4 03/21] blockjobs: add manual property

2018-02-23 Thread John Snow
This property will be used to opt-in to the new BlockJobs workflow
that allows a tighter, more explicit control over transitions from
one runstate to another.

While we're here, fix up the documentation for block_job_create
a little bit.

Signed-off-by: John Snow 
---
 blockjob.c   |  1 +
 include/block/blockjob.h | 10 ++
 include/block/blockjob_int.h |  4 +++-
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/blockjob.c b/blockjob.c
index 7ba3683ee3..47468331ec 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -700,6 +700,7 @@ void *block_job_create(const char *job_id, const 
BlockJobDriver *driver,
 job->paused= true;
 job->pause_count   = 1;
 job->refcnt= 1;
+job->manual= (flags & BLOCK_JOB_MANUAL);
 aio_timer_init(qemu_get_aio_context(), &job->sleep_timer,
QEMU_CLOCK_REALTIME, SCALE_NS,
block_job_sleep_timer_cb, job);
diff --git a/include/block/blockjob.h b/include/block/blockjob.h
index 00403d9482..8ffabdcbc4 100644
--- a/include/block/blockjob.h
+++ b/include/block/blockjob.h
@@ -141,14 +141,24 @@ typedef struct BlockJob {
  */
 QEMUTimer sleep_timer;
 
+/**
+ * Set to true when the management API has requested manual job
+ * management semantics.
+ */
+bool manual;
+
 /** Non-NULL if this job is part of a transaction */
 BlockJobTxn *txn;
 QLIST_ENTRY(BlockJob) txn_list;
 } BlockJob;
 
 typedef enum BlockJobCreateFlags {
+/* Default behavior */
 BLOCK_JOB_DEFAULT = 0x00,
+/* BlockJob is not QMP-created and should not send QMP events */
 BLOCK_JOB_INTERNAL = 0x01,
+/* BlockJob requests manual job management steps. */
+BLOCK_JOB_MANUAL = 0x02,
 } BlockJobCreateFlags;
 
 /**
diff --git a/include/block/blockjob_int.h b/include/block/blockjob_int.h
index becaae74c2..259d49b32a 100644
--- a/include/block/blockjob_int.h
+++ b/include/block/blockjob_int.h
@@ -114,11 +114,13 @@ struct BlockJobDriver {
  * block_job_create:
  * @job_id: The id of the newly-created job, or %NULL to have one
  * generated automatically.
- * @job_type: The class object for the newly-created job.
+ * @driver: The class object for the newly-created job.
  * @txn: The transaction this job belongs to, if any. %NULL otherwise.
  * @bs: The block
  * @perm, @shared_perm: Permissions to request for @bs
  * @speed: The maximum speed, in bytes per second, or 0 for unlimited.
+ * @flags: Creation flags for the Block Job.
+ * See @BlockJobCreateFlags
  * @cb: Completion function for the job.
  * @opaque: Opaque pointer value passed to @cb.
  * @errp: Error object.
-- 
2.14.3
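
To illustrate how the creation flags combine, here is a small sketch using
Python's IntFlag. The flag values mirror the enum in the hunk above; the
helper names job_flags_for and is_manual are hypothetical.

```python
from enum import IntFlag


class BlockJobCreateFlags(IntFlag):
    DEFAULT = 0x00
    INTERNAL = 0x01   # not QMP-created, sends no QMP events
    MANUAL = 0x02     # requests manual job management steps


def job_flags_for(manual_requested):
    """Build the flags the way the backup paths in 19/21 do."""
    flags = BlockJobCreateFlags.DEFAULT
    if manual_requested:
        flags |= BlockJobCreateFlags.MANUAL
    return flags


def is_manual(flags):
    """Equivalent of job->manual = (flags & BLOCK_JOB_MANUAL)."""
    return bool(flags & BlockJobCreateFlags.MANUAL)
```

Because the flags are independent bits, an internal job can also be marked
manual without either setting disturbing the other.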




[Qemu-block] [RFC v4 17/21] blockjobs: add PENDING status and event

2018-02-23 Thread John Snow
For jobs utilizing the new manual workflow, we intend to prohibit
them from modifying the block graph until the management layer provides
an explicit ACK via block-job-finalize to move the process forward.

To distinguish this runstate from "ready" or "waiting," we add a new
"pending" event.

For now, the transition from PENDING to CONCLUDED/ABORTING is automatic,
but a future commit will add the explicit block-job-finalize step.

Transitions:
Waiting -> Pending:   Normal transition.
Pending -> Concluded: Normal transition.
Pending -> Aborting:  Late transactional failures and cancellations.

Removed Transitions:
Waiting -> Concluded: Jobs must go to PENDING first.

Verbs:
Cancel: Can be applied to a pending job.

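                 +---------+
                 |UNDEFINED|
                 +--+------+
                    |
                 +--v----+
                 |CREATED+-----------------+
                 +--+----+                 |
                    |                      |
                 +--+----+     +------+    |
       +---------+RUNNING<----->PAUSED|    |
       |         +--+-+--+     +------+    |
       |            | |                    |
       |            | +------------------+ |
       |            |                    | |
       |         +--v--+       +-------+ | |
       +---------+READY<------->STANDBY| | |
       |         +--+--+       +-------+ | |
       |            |                    | |
       |         +--v----+               | |
       +---------+WAITING+---------------+ |
       |         +--+----+                 |
       |            |                      |
       |         +--v----+                 |
       +---------+PENDING|                 |
       |         +--+----+                 |
       |            |                      |
    +--v-----+   +--v------+               |
    |ABORTING+--->CONCLUDED|               |
    +--------+   +--+------+               |
                    |                      |
                 +--v-+                    |
                 |NULL+--------------------+
                 +----+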

Signed-off-by: John Snow 
---
 blockjob.c   | 66 +---
 qapi/block-core.json | 31 +++-
 2 files changed, 72 insertions(+), 25 deletions(-)

diff --git a/blockjob.c b/blockjob.c
index 4aed86fc6b..23b4b99fd4 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -44,27 +44,28 @@ static QemuMutex block_job_mutex;
 
 /* BlockJob State Transition Table */
 bool BlockJobSTT[BLOCK_JOB_STATUS__MAX][BLOCK_JOB_STATUS__MAX] = {
-  /* U, C, R, P, Y, S, W, X, E, N */
-/* U: */ [BLOCK_JOB_STATUS_UNDEFINED] = {0, 1, 0, 0, 0, 0, 0, 0, 0, 0},
-/* C: */ [BLOCK_JOB_STATUS_CREATED]   = {0, 0, 1, 0, 0, 0, 0, 0, 0, 1},
-/* R: */ [BLOCK_JOB_STATUS_RUNNING]   = {0, 0, 0, 1, 1, 0, 1, 1, 0, 0},
-/* P: */ [BLOCK_JOB_STATUS_PAUSED]= {0, 0, 1, 0, 0, 0, 0, 0, 0, 0},
-/* Y: */ [BLOCK_JOB_STATUS_READY] = {0, 0, 0, 0, 0, 1, 1, 1, 0, 0},
-/* S: */ [BLOCK_JOB_STATUS_STANDBY]   = {0, 0, 0, 0, 1, 0, 0, 0, 0, 0},
-/* W: */ [BLOCK_JOB_STATUS_WAITING]   = {0, 0, 0, 0, 0, 0, 0, 1, 1, 0},
-/* X: */ [BLOCK_JOB_STATUS_ABORTING]  = {0, 0, 0, 0, 0, 0, 0, 0, 1, 0},
-/* E: */ [BLOCK_JOB_STATUS_CONCLUDED] = {0, 0, 0, 0, 0, 0, 0, 0, 0, 1},
-/* N: */ [BLOCK_JOB_STATUS_NULL]  = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
+  /* U, C, R, P, Y, S, W, D, X, E, N */
+/* U: */ [BLOCK_JOB_STATUS_UNDEFINED] = {0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0},
+/* C: */ [BLOCK_JOB_STATUS_CREATED]   = {0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1},
+/* R: */ [BLOCK_JOB_STATUS_RUNNING]   = {0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0},
+/* P: */ [BLOCK_JOB_STATUS_PAUSED]= {0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0},
+/* Y: */ [BLOCK_JOB_STATUS_READY] = {0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0},
+/* S: */ [BLOCK_JOB_STATUS_STANDBY]   = {0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0},
+/* W: */ [BLOCK_JOB_STATUS_WAITING]   = {0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0},
+/* D: */ [BLOCK_JOB_STATUS_PENDING]   = {0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0},
+/* X: */ [BLOCK_JOB_STATUS_ABORTING]  = {0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0},
+/* E: */ [BLOCK_JOB_STATUS_CONCLUDED] = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1},
+/* N: */ [BLOCK_JOB_STATUS_NULL]  = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
 };
 
 bool BlockJobVerbTable[BLOCK_JOB_VERB__MAX][BLOCK_JOB_STATUS__MAX] = {
-  /* U, C, R, P, Y, S, W, X, E, N */
-[BLOCK_JOB_VERB_CANCEL]   = {0, 1, 1, 1, 1, 1, 1, 0, 0, 0},
-[BLOCK_JOB_VERB_PAUSE]= {0, 1, 1, 1, 1, 1, 0, 0, 0, 0},
-[BLOCK_JOB_VERB_RESUME]   = {0, 1, 1, 1, 1, 1, 0, 0, 0, 0},
-[BLOCK_JOB_VERB_SET_SPEED]= {0, 1, 1, 1, 1, 1, 0, 0, 0, 0},
-[BLOCK_JOB_VERB_COMPLETE] = {0, 0, 0, 0, 1, 0, 0, 0, 0, 0},
-[BLOCK_JOB_VERB_DISMISS]  = {0, 0, 0, 0, 0, 0, 0, 0, 1, 0},
+  /* U, C, R, P, Y, S, W, D, X, E, N */
+

[Qemu-block] [RFC v4 08/21] blockjobs: add ABORTING state

2018-02-23 Thread John Snow
Add a new state ABORTING.

This makes transitions from normative states to error states explicit
in the STM, and serves as a disambiguation for which states may complete
normally when normal end-states (CONCLUDED) are added in future commits.

Notably, Paused/Standby jobs do not transition directly to aborting,
as they must wake up first and cooperate in their cancellation.

Transitions:
Running -> Aborting: can be cancelled or encounter an error
Ready   -> Aborting: can be cancelled or encounter an error

Verbs:
None. The job must finish cleaning itself up and report its final status.

                 +---------+
                 |UNDEFINED|
                 +--+------+
                    |
                 +--v----+
                 |CREATED|
                 +--+----+
                    |
                 +--v----+     +------+
       +---------+RUNNING<----->PAUSED|
       |         +--+----+     +------+
       |            |
       |         +--v--+       +-------+
       +---------+READY<------->STANDBY|
       |         +-----+       +-------+
       |
    +--v-----+
    |ABORTING|
    +--------+

Signed-off-by: John Snow 
---
 blockjob.c   | 31 ++-
 qapi/block-core.json |  7 ++-
 2 files changed, 24 insertions(+), 14 deletions(-)

diff --git a/blockjob.c b/blockjob.c
index 4e424fef72..4c3fcda46c 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -44,22 +44,23 @@ static QemuMutex block_job_mutex;
 
 /* BlockJob State Transition Table */
 bool BlockJobSTT[BLOCK_JOB_STATUS__MAX][BLOCK_JOB_STATUS__MAX] = {
-  /* U, C, R, P, Y, S */
-/* U: */ [BLOCK_JOB_STATUS_UNDEFINED] = {0, 1, 0, 0, 0, 0},
-/* C: */ [BLOCK_JOB_STATUS_CREATED]   = {0, 0, 1, 0, 0, 0},
-/* R: */ [BLOCK_JOB_STATUS_RUNNING]   = {0, 0, 0, 1, 1, 0},
-/* P: */ [BLOCK_JOB_STATUS_PAUSED]= {0, 0, 1, 0, 0, 0},
-/* Y: */ [BLOCK_JOB_STATUS_READY] = {0, 0, 0, 0, 0, 1},
-/* S: */ [BLOCK_JOB_STATUS_STANDBY]   = {0, 0, 0, 0, 1, 0},
+  /* U, C, R, P, Y, S, X */
+/* U: */ [BLOCK_JOB_STATUS_UNDEFINED] = {0, 1, 0, 0, 0, 0, 0},
+/* C: */ [BLOCK_JOB_STATUS_CREATED]   = {0, 0, 1, 0, 0, 0, 0},
+/* R: */ [BLOCK_JOB_STATUS_RUNNING]   = {0, 0, 0, 1, 1, 0, 1},
+/* P: */ [BLOCK_JOB_STATUS_PAUSED]= {0, 0, 1, 0, 0, 0, 0},
+/* Y: */ [BLOCK_JOB_STATUS_READY] = {0, 0, 0, 0, 0, 1, 1},
+/* S: */ [BLOCK_JOB_STATUS_STANDBY]   = {0, 0, 0, 0, 1, 0, 0},
+/* X: */ [BLOCK_JOB_STATUS_ABORTING]  = {0, 0, 0, 0, 0, 0, 0},
 };
 
 bool BlockJobVerbTable[BLOCK_JOB_VERB__MAX][BLOCK_JOB_STATUS__MAX] = {
-  /* U, C, R, P, Y, S */
-[BLOCK_JOB_VERB_CANCEL]   = {0, 1, 1, 1, 1, 1},
-[BLOCK_JOB_VERB_PAUSE]= {0, 1, 1, 1, 1, 1},
-[BLOCK_JOB_VERB_RESUME]   = {0, 1, 1, 1, 1, 1},
-[BLOCK_JOB_VERB_SET_SPEED]= {0, 1, 1, 1, 1, 1},
-[BLOCK_JOB_VERB_COMPLETE] = {0, 0, 0, 0, 1, 0},
+  /* U, C, R, P, Y, S, X */
+[BLOCK_JOB_VERB_CANCEL]   = {0, 1, 1, 1, 1, 1, 0},
+[BLOCK_JOB_VERB_PAUSE]= {0, 1, 1, 1, 1, 1, 0},
+[BLOCK_JOB_VERB_RESUME]   = {0, 1, 1, 1, 1, 1, 0},
+[BLOCK_JOB_VERB_SET_SPEED]= {0, 1, 1, 1, 1, 1, 0},
+[BLOCK_JOB_VERB_COMPLETE] = {0, 0, 0, 0, 1, 0, 0},
 };
 
 static void block_job_state_transition(BlockJob *job, BlockJobStatus s1)
@@ -383,6 +384,10 @@ static void block_job_completed_single(BlockJob *job)
 {
 assert(job->completed);
 
+if (job->ret || block_job_is_cancelled(job)) {
+block_job_state_transition(job, BLOCK_JOB_STATUS_ABORTING);
+}
+
 if (!job->ret) {
 if (job->driver->commit) {
 job->driver->commit(job);
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 11659496c5..3f7d559fc0 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -996,10 +996,15 @@
 # @standby: The job is ready, but paused. This is nearly identical to @paused.
 #   The job may return to @ready or otherwise be canceled.
 #
+# @aborting: The job is in the process of being aborted, and will finish with
+#an error. The job will afterwards report that it is @concluded.
+#This status may not be visible to the management process.
+#
 # Since: 2.12
 ##
 { 'enum': 'BlockJobStatus',
-  'data': ['undefined', 'created', 'running', 'paused', 'ready', 'standby'] }
+  'data': ['undefined', 'created', 'running', 'paused', 'ready', 'standby',
+   'aborting' ] }
 
 ##
 # @BlockJobInfo:
-- 
2.14.3
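
The transition table in this patch can be read as a lookup from current
state (row) to requested state (column). The sketch below copies the
seven-state table from the diff; the lowercase state names and the helper
functions are assumptions made for illustration.

```python
STATES = ['undefined', 'created', 'running', 'paused',
          'ready', 'standby', 'aborting']

# Rows are the current state, columns the requested next state.
STT = {
    'undefined': [0, 1, 0, 0, 0, 0, 0],
    'created':   [0, 0, 1, 0, 0, 0, 0],
    'running':   [0, 0, 0, 1, 1, 0, 1],
    'paused':    [0, 0, 1, 0, 0, 0, 0],
    'ready':     [0, 0, 0, 0, 0, 1, 1],
    'standby':   [0, 0, 0, 0, 1, 0, 0],
    'aborting':  [0, 0, 0, 0, 0, 0, 0],
}


def can_transition(s0, s1):
    """True if the table permits moving from s0 to s1."""
    return bool(STT[s0][STATES.index(s1)])


def sources_of(target):
    """All states from which target can be entered directly."""
    return [s for s in STATES if can_transition(s, target)]
```

Querying sources_of('aborting') shows that only running and ready jobs may
abort directly, which matches the note that paused and standby jobs must
wake up first.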




[Qemu-block] [RFC v4 16/21] blockjobs: add waiting status

2018-02-23 Thread John Snow
For jobs that are stuck waiting on others in a transaction, it would
be nice to know that they are no longer "running" in that sense, but
instead are waiting on other jobs in the transaction.

Jobs that are "waiting" in this sense cannot be meaningfully altered
any longer as they have left their running loop. The only meaningful
user verb for jobs in this state is "cancel," which will cancel the
whole transaction, too.

Transitions:
Running -> Waiting:   Normal transition.
Ready   -> Waiting:   Normal transition.
Waiting -> Aborting:  Transactional cancellation.
Waiting -> Concluded: Normal transition.

Removed Transitions:
Running -> Concluded: Jobs must go to WAITING first.
Ready   -> Concluded: Jobs must go to WAITING first.

Verbs:
Cancel: Can be applied to WAITING jobs.

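                 +---------+
                 |UNDEFINED|
                 +--+------+
                    |
                 +--v----+
                 |CREATED+-----------------+
                 +--+----+                 |
                    |                      |
                 +--v----+     +------+    |
       +---------+RUNNING<----->PAUSED|    |
       |         +--+-+--+     +------+    |
       |            | |                    |
       |            | +------------------+ |
       |            |                    | |
       |         +--v--+       +-------+ | |
       +---------+READY<------->STANDBY| | |
       |         +--+--+       +-------+ | |
       |            |                    | |
       |         +--v----+               | |
       +---------+WAITING<---------------+ |
       |         +--+----+                 |
       |            |                      |
    +--+-----+   +--v------+               |
    |ABORTING+--->CONCLUDED|               |
    +--------+   +--+------+               |
                    |                      |
                 +--v-+                    |
                 |NULL<--------------------+
                 +----+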
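
The claim that "cancel" is the only verb left for a WAITING job can be
checked against the verb table. The rows below are copied from the
ten-state table as it stands once WAITING has been added (column order
U, C, R, P, Y, S, W, X, E, N); the lowercase verb and state names and the
allowed_verbs helper are hypothetical.

```python
STATES = ['undefined', 'created', 'running', 'paused', 'ready',
          'standby', 'waiting', 'aborting', 'concluded', 'null']

# One row per verb, one column per state, as in BlockJobVerbTable.
VERBS = {
    'cancel':    [0, 1, 1, 1, 1, 1, 1, 0, 0, 0],
    'pause':     [0, 1, 1, 1, 1, 1, 0, 0, 0, 0],
    'resume':    [0, 1, 1, 1, 1, 1, 0, 0, 0, 0],
    'set-speed': [0, 1, 1, 1, 1, 1, 0, 0, 0, 0],
    'complete':  [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
    'dismiss':   [0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
}


def allowed_verbs(state):
    """List the verbs the table permits in the given state."""
    col = STATES.index(state)
    return [verb for verb, row in VERBS.items() if row[col]]
```

For a waiting job the lookup yields only cancel, and for a concluded job
only dismiss.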

Signed-off-by: John Snow 
---
 blockjob.c   | 37 -
 qapi/block-core.json | 29 -
 2 files changed, 48 insertions(+), 18 deletions(-)

diff --git a/blockjob.c b/blockjob.c
index 1c010ec100..4aed86fc6b 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -44,26 +44,27 @@ static QemuMutex block_job_mutex;
 
 /* BlockJob State Transition Table */
 bool BlockJobSTT[BLOCK_JOB_STATUS__MAX][BLOCK_JOB_STATUS__MAX] = {
-  /* U, C, R, P, Y, S, X, E, N */
-/* U: */ [BLOCK_JOB_STATUS_UNDEFINED] = {0, 1, 0, 0, 0, 0, 0, 0, 0},
-/* C: */ [BLOCK_JOB_STATUS_CREATED]   = {0, 0, 1, 0, 0, 0, 0, 0, 1},
-/* R: */ [BLOCK_JOB_STATUS_RUNNING]   = {0, 0, 0, 1, 1, 0, 1, 1, 0},
-/* P: */ [BLOCK_JOB_STATUS_PAUSED]= {0, 0, 1, 0, 0, 0, 0, 0, 0},
-/* Y: */ [BLOCK_JOB_STATUS_READY] = {0, 0, 0, 0, 0, 1, 1, 1, 0},
-/* S: */ [BLOCK_JOB_STATUS_STANDBY]   = {0, 0, 0, 0, 1, 0, 0, 0, 0},
-/* X: */ [BLOCK_JOB_STATUS_ABORTING]  = {0, 0, 0, 0, 0, 0, 0, 1, 0},
-/* E: */ [BLOCK_JOB_STATUS_CONCLUDED] = {0, 0, 0, 0, 0, 0, 0, 0, 1},
-/* N: */ [BLOCK_JOB_STATUS_NULL]  = {0, 0, 0, 0, 0, 0, 0, 0, 0},
+  /* U, C, R, P, Y, S, W, X, E, N */
+/* U: */ [BLOCK_JOB_STATUS_UNDEFINED] = {0, 1, 0, 0, 0, 0, 0, 0, 0, 0},
+/* C: */ [BLOCK_JOB_STATUS_CREATED]   = {0, 0, 1, 0, 0, 0, 0, 0, 0, 1},
+/* R: */ [BLOCK_JOB_STATUS_RUNNING]   = {0, 0, 0, 1, 1, 0, 1, 1, 0, 0},
+/* P: */ [BLOCK_JOB_STATUS_PAUSED]= {0, 0, 1, 0, 0, 0, 0, 0, 0, 0},
+/* Y: */ [BLOCK_JOB_STATUS_READY] = {0, 0, 0, 0, 0, 1, 1, 1, 0, 0},
+/* S: */ [BLOCK_JOB_STATUS_STANDBY]   = {0, 0, 0, 0, 1, 0, 0, 0, 0, 0},
+/* W: */ [BLOCK_JOB_STATUS_WAITING]   = {0, 0, 0, 0, 0, 0, 0, 1, 1, 0},
+/* X: */ [BLOCK_JOB_STATUS_ABORTING]  = {0, 0, 0, 0, 0, 0, 0, 0, 1, 0},
+/* E: */ [BLOCK_JOB_STATUS_CONCLUDED] = {0, 0, 0, 0, 0, 0, 0, 0, 0, 1},
+/* N: */ [BLOCK_JOB_STATUS_NULL]  = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
 };
 
 bool BlockJobVerbTable[BLOCK_JOB_VERB__MAX][BLOCK_JOB_STATUS__MAX] = {
-  /* U, C, R, P, Y, S, X, E, N */
-[BLOCK_JOB_VERB_CANCEL]   = {0, 1, 1, 1, 1, 1, 0, 0, 0},
-[BLOCK_JOB_VERB_PAUSE]= {0, 1, 1, 1, 1, 1, 0, 0, 0},
-[BLOCK_JOB_VERB_RESUME]   = {0, 1, 1, 1, 1, 1, 0, 0, 0},
-[BLOCK_JOB_VERB_SET_SPEED]= {0, 1, 1, 1, 1, 1, 0, 0, 0},
-[BLOCK_JOB_VERB_COMPLETE] = {0, 0, 0, 0, 1, 0, 0, 0, 0},
-[BLOCK_JOB_VERB_DISMISS]  = {0, 0, 0, 0, 0, 0, 0, 1, 0},
+  /* U, C, R, P, Y, S, W, X, E, N */
+[BLOCK_JOB_VERB_CANCEL]   = {0, 1, 1, 1, 1, 1, 1, 0, 0, 0},
+[BLOCK_JOB_VERB_PAUSE]= {0, 1, 1, 1, 1, 1, 0, 0, 0, 0},
+[BLOCK_JOB_VERB_RESUME]   = {0, 1, 1, 1, 1, 1, 0, 0, 0, 0},
+[BLOCK_JOB_VERB_SET_SPEED]= {0, 1, 1, 1, 1, 1, 0, 0, 0, 0},
+[BLOCK_JOB_VERB_COMPLETE] = {0, 0, 0, 0, 1, 0, 0, 

Re: [Qemu-block] [PATCH v8 09/21] null: Switch to .bdrv_co_block_status()

2018-02-23 Thread Eric Blake

On 02/23/2018 11:05 AM, Kevin Wolf wrote:

Am 23.02.2018 um 17:43 hat Eric Blake geschrieben:

OFFSET_VALID | DATA might be excusable because I can see that it's
convenient that a protocol driver refers to itself as *file instead of
returning NULL there and then the offset is valid (though it would be
pointless to actually follow the file pointer), but OFFSET_VALID without
DATA probably isn't.


So OFFSET_VALID | DATA for a protocol BDS is not just convenient, but
necessary to avoid breaking qemu-img map output.  But you are also right
that OFFSET_VALID without data makes little sense at a protocol layer. So
with that in mind, I'm auditing all of the protocol layers to make sure
OFFSET_VALID ends up as something sane.


That's one way to look at it.

The other way is that qemu-img map shouldn't ask the protocol layer for
its offset because it already knows the offset (it is what it passes as
a parameter to bdrv_co_block_status).

Anyway, it's probably not worth changing the interface, we should just
make sure that the return values of the individual drivers are
consistent.


Yet another inconsistency, and it's making me scratch my head today.

By the way, in my byte-based stuff that is now pending on your tree, I 
tried hard to NOT change semantics or the set of flags returned by a 
given driver, and we agreed that's why you'd accept the series as-is and 
make me do this followup exercise.  But it's looking like my followups 
may end up touching a lot of the same drivers again, now that I'm 
looking at what the semantics SHOULD be (and whatever I do end up 
tweaking, I will at least make sure that iotests is still happy with it).


First, let's read what states the NBD spec is proposing:


It defines the following flags for the flags field:

NBD_STATE_HOLE (bit 0): if set, the block represents a hole (and future 
writes to that area may cause fragmentation or encounter an ENOSPC error); if 
clear, the block is allocated or the server could not otherwise determine its 
status. Note that the use of NBD_CMD_TRIM is related to this status, but that 
the server MAY report a hole even where NBD_CMD_TRIM has not been requested, 
and also that a server MAY report that the block is allocated even where 
NBD_CMD_TRIM has been requested.
NBD_STATE_ZERO (bit 1): if set, the block contents read as all zeroes; if 
clear, the block contents are not known. Note that the use of 
NBD_CMD_WRITE_ZEROES is related to this status, but that the server MAY report 
zeroes even where NBD_CMD_WRITE_ZEROES has not been requested, and also that a 
server MAY report unknown content even where NBD_CMD_WRITE_ZEROES has been 
requested.

It is not an error for a server to report that a region of the export has both 
NBD_STATE_HOLE set and NBD_STATE_ZERO clear. The contents of such an area are 
undefined, and a client reading such an area should make no assumption as to 
its contents or stability.


So here's how Vladimir proposed implementing it in his series (written 
before my byte-based block status stuff went in to your tree):

https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg04038.html

Server side (3/9):

+int ret = bdrv_block_status_above(bs, NULL, offset, tail_bytes, 
,

+  NULL, NULL);
+if (ret < 0) {
+return ret;
+}
+
+flags = (ret & BDRV_BLOCK_ALLOCATED ? 0 : NBD_STATE_HOLE) |
+(ret & BDRV_BLOCK_ZERO  ? NBD_STATE_ZERO : 0);

Client side (6/9):

+*pnum = extent.length >> BDRV_SECTOR_BITS;
+return (extent.flags & NBD_STATE_HOLE ? 0 : BDRV_BLOCK_DATA) |
+   (extent.flags & NBD_STATE_ZERO ? BDRV_BLOCK_ZERO : 0);

Does anything there strike you as odd?  In isolation, they seemed fine 
to me, but side-by-side, I'm scratching my head: the server queries the 
block layer, and turns BDRV_BLOCK_ALLOCATED into !NBD_STATE_HOLE; the 
client side then takes the NBD protocol and tries to turn it back into 
information to feed the block layer, where !NBD_STATE_HOLE now feeds 
BDRV_BLOCK_DATA.  Why the different choice of bits?
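
To make the asymmetry concrete, here is a minimal standalone sketch of the two mappings quoted above. The bit values are illustrative stand-ins (the real definitions live in QEMU's block.h and in the NBD spec), and the two function names are made up for this example; neither appears in the series.

```c
#include <stdint.h>

/* Illustrative bit values only; treat them as assumptions, not as the
 * authoritative definitions from block.h or the NBD spec. */
#define BDRV_BLOCK_DATA       0x01
#define BDRV_BLOCK_ZERO       0x02
#define BDRV_BLOCK_ALLOCATED  0x10
#define NBD_STATE_HOLE        (1u << 0)
#define NBD_STATE_ZERO        (1u << 1)

/* Server side (3/9): block-layer status -> NBD extent flags.
 * The hole bit is derived from BDRV_BLOCK_ALLOCATED. */
uint32_t nbd_flags_from_status(int ret)
{
    return ((ret & BDRV_BLOCK_ALLOCATED) ? 0 : NBD_STATE_HOLE) |
           ((ret & BDRV_BLOCK_ZERO) ? NBD_STATE_ZERO : 0);
}

/* Client side (6/9): NBD extent flags -> block-layer status.
 * The same hole bit now feeds BDRV_BLOCK_DATA instead. */
int status_from_nbd_flags(uint32_t flags)
{
    return ((flags & NBD_STATE_HOLE) ? 0 : BDRV_BLOCK_DATA) |
           ((flags & NBD_STATE_ZERO) ? BDRV_BLOCK_ZERO : 0);
}
```

Feeding BDRV_BLOCK_ALLOCATED through the server mapping and then the client mapping yields BDRV_BLOCK_DATA, which is exactly the change of bits being questioned here.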


Part of the story is that right now, we document that ONLY the block 
layer sets _ALLOCATED, in io.c, as a result of the driver layer 
returning HOLE || ZERO (there are cases where the block layer can return 
ZERO but not ALLOCATED, because the driver layer returned 0 but the 
block layer still knows that area reads as zero).  So Vladimir's patch 
matches the fact that the driver shouldn't set ALLOCATED.  Still, if we 
are tying ALLOCATED to whether there is a hole, then that seems like 
information we should be getting from the driver, not something 
synthesized after we've left the driver!


Then there's the question of file-posix.c: what should it return for a 
hole, ZERO|OFFSET_VALID or DATA|ZERO|OFFSET_VALID?  The wording in 
block.h implies that if DATA is not set, then the area reads as zero to 
the guest, but may have indeterminate value on the underlying file - but 
we KNOW 

Re: [Qemu-block] [Qemu-devel] [PATCH v2] block: Fix qemu crash when using scsi-block

2018-02-23 Thread Deepa Srinivasan

Stefan, Kevin - Ping, to take this patch. Thanks.


On 01/29/2018 07:51 AM, Stefan Hajnoczi wrote:

On Fri, Dec 15, 2017 at 04:59:13PM -0800, Deepa Srinivasan wrote:

Starting qemu with the following arguments causes qemu to segfault:
... -device lsi,id=lsi0 -drive file=iscsi:<...>,format=raw,if=none,node-name=
iscsi1 -device scsi-block,bus=lsi0.0,id=<...>,drive=iscsi1

This patch fixes blk_aio_ioctl() so it does not pass stack addresses to
blk_aio_ioctl_entry() which may be invoked after blk_aio_ioctl() returns. More
details about the bug follow.

blk_aio_ioctl() invokes blk_aio_prwv() with blk_aio_ioctl_entry as the
coroutine parameter. blk_aio_prwv() ultimately calls aio_co_enter().

When blk_aio_ioctl() is executed from within a coroutine context (e.g.
iscsi_bh_cb()), aio_co_enter() adds the coroutine (blk_aio_ioctl_entry) to
the current coroutine's wakeup queue. blk_aio_ioctl() then returns.

When blk_aio_ioctl_entry() executes later, it accesses an invalid pointer:

 BlkRwCo *rwco = &acb->rwco;

 rwco->ret = blk_co_ioctl(rwco->blk, rwco->offset,
  rwco->qiov->iov[0].iov_base);  <--- qiov is
  invalid here
...

In the case when blk_aio_ioctl() is called from a non-coroutine context,
blk_aio_ioctl_entry() executes immediately. But if bdrv_co_ioctl() calls
qemu_coroutine_yield(), blk_aio_ioctl() will return. When the coroutine
execution is complete, control returns to blk_aio_ioctl_entry() after the call
to blk_co_ioctl(). There is no invalid reference after this point, but the
function is still holding on to invalid pointers.

The fix is to change blk_aio_prwv() to accept a void pointer for the IO buffer
rather than a QEMUIOVector. blk_aio_prwv() passes this through in BlkRwCo and
the coroutine function casts it to QEMUIOVector or uses the void pointer
directly.
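
As a rough illustration of the fixed pattern (not QEMU code), the sketch below stores the caller's buffer pointer by value in a heap-allocated request, so a deferred entry function never dereferences anything that lived on the submitter's stack. Request, submit(), run_pending(), ioctl_entry() and start_ioctl() are all hypothetical names, and the one-slot queue only stands in for the coroutine wakeup queue.

```c
#include <stdlib.h>

/* Hypothetical miniature of BlkRwCo: the buffer pointer is stored by value. */
typedef struct Request {
    void *iobuf;   /* caller-owned buffer, must stay valid until completion */
    int ret;
} Request;

typedef void EntryFn(Request *req);

/* One-slot stand-in for the coroutine wakeup queue. */
static Request *pending_req;
static EntryFn *pending_entry;

/* Queue a request; the entry function runs only after submit() returns. */
int submit(void *iobuf, EntryFn *entry)
{
    Request *req = malloc(sizeof(*req));

    if (!req) {
        return -1;
    }
    req->iobuf = iobuf;
    req->ret = 0;
    pending_req = req;
    pending_entry = entry;
    return 0;
}

/* Run the deferred entry function and hand back its result. */
int run_pending(void)
{
    Request *req = pending_req;
    int ret;

    if (!req) {
        return -1;
    }
    pending_entry(req);
    ret = req->ret;
    free(req);
    pending_req = NULL;
    return ret;
}

/* Entry function for an ioctl-like request: uses the raw buffer directly. */
void ioctl_entry(Request *req)
{
    int *value = req->iobuf;

    req->ret = *value + 1;
}

int start_ioctl(int *buf)
{
    /* No on-stack wrapper around buf is created here, so nothing the
     * deferred entry function reads goes away when this function returns. */
    return submit(buf, ioctl_entry);
}
```

A caller would invoke start_ioctl() with a buffer it owns and later call run_pending(); the buffer itself must of course outlive the request, as in the real code.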

Signed-off-by: Deepa Srinivasan 
Signed-off-by: Konrad Rzeszutek Wilk 
Reviewed-by: Mark Kanda 
---
  block/block-backend.c | 51 +--

Kevin: Ping.  Will you take this through your tree?


  1 file changed, 25 insertions(+), 26 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index baef8e7..2d0d9b6 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1140,7 +1140,7 @@ int coroutine_fn blk_co_pwritev(BlockBackend *blk, int64_t offset,
  typedef struct BlkRwCo {
  BlockBackend *blk;
  int64_t offset;
-QEMUIOVector *qiov;
+void *iobuf;
  int ret;
  BdrvRequestFlags flags;
  } BlkRwCo;
@@ -1148,17 +1148,19 @@ typedef struct BlkRwCo {
  static void blk_read_entry(void *opaque)
  {
  BlkRwCo *rwco = opaque;
+QEMUIOVector *qiov = rwco->iobuf;
  
-rwco->ret = blk_co_preadv(rwco->blk, rwco->offset, rwco->qiov->size,

-  rwco->qiov, rwco->flags);
+rwco->ret = blk_co_preadv(rwco->blk, rwco->offset, qiov->size,
+  qiov, rwco->flags);
  }
  
  static void blk_write_entry(void *opaque)

  {
  BlkRwCo *rwco = opaque;
+QEMUIOVector *qiov = rwco->iobuf;
  
-rwco->ret = blk_co_pwritev(rwco->blk, rwco->offset, rwco->qiov->size,

-   rwco->qiov, rwco->flags);
+rwco->ret = blk_co_pwritev(rwco->blk, rwco->offset, qiov->size,
+   qiov, rwco->flags);
  }
  
  static int blk_prw(BlockBackend *blk, int64_t offset, uint8_t *buf,

@@ -1178,7 +1180,7 @@ static int blk_prw(BlockBackend *blk, int64_t offset, uint8_t *buf,
  rwco = (BlkRwCo) {
  .blk= blk,
  .offset = offset,
-.qiov   = &qiov,
+.iobuf  = &qiov,
  .flags  = flags,
  .ret= NOT_DONE,
  };
@@ -1275,7 +1277,7 @@ static void blk_aio_complete_bh(void *opaque)
  }
  
  static BlockAIOCB *blk_aio_prwv(BlockBackend *blk, int64_t offset, int bytes,

-QEMUIOVector *qiov, CoroutineEntry co_entry,
+void *iobuf, CoroutineEntry co_entry,
  BdrvRequestFlags flags,
  BlockCompletionFunc *cb, void *opaque)
  {
@@ -1287,7 +1289,7 @@ static BlockAIOCB *blk_aio_prwv(BlockBackend *blk, int64_t offset, int bytes,
  acb->rwco = (BlkRwCo) {
  .blk= blk,
  .offset = offset,
-.qiov   = qiov,
+.iobuf  = iobuf,
  .flags  = flags,
  .ret= NOT_DONE,
  };
@@ -1310,10 +1312,11 @@ static void blk_aio_read_entry(void *opaque)
  {
  BlkAioEmAIOCB *acb = opaque;
  BlkRwCo *rwco = &acb->rwco;
+QEMUIOVector *qiov = rwco->iobuf;
  
-assert(rwco->qiov->size == acb->bytes);

+assert(qiov->size == acb->bytes);
  rwco->ret = blk_co_preadv(rwco->blk, rwco->offset, acb->bytes,
-  rwco->qiov, rwco->flags);
+  qiov, 

[Qemu-block] [PATCH v3 36/36] qemu-iotests: Test ssh image creation over QMP

2018-02-23 Thread Kevin Wolf
Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
---
 tests/qemu-iotests/207 | 261 +
 tests/qemu-iotests/207.out |  75 +
 tests/qemu-iotests/group   |   1 +
 3 files changed, 337 insertions(+)
 create mode 100755 tests/qemu-iotests/207
 create mode 100644 tests/qemu-iotests/207.out

diff --git a/tests/qemu-iotests/207 b/tests/qemu-iotests/207
new file mode 100755
index 00..f5c77852d1
--- /dev/null
+++ b/tests/qemu-iotests/207
@@ -0,0 +1,261 @@
+#!/bin/bash
+#
+# Test ssh image creation
+#
+# Copyright (C) 2018 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+# creator
+owner=kw...@redhat.com
+
+seq=`basename $0`
+echo "QA output created by $seq"
+
+here=`pwd`
+status=1   # failure is the default!
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+_supported_fmt raw
+_supported_proto ssh
+_supported_os Linux
+
+function do_run_qemu()
+{
+echo Testing: "$@"
+$QEMU -nographic -qmp stdio -serial none "$@"
+echo
+}
+
+function run_qemu()
+{
+do_run_qemu "$@" 2>&1 | _filter_testdir | _filter_qmp \
+  | _filter_qemu | _filter_imgfmt \
+  | _filter_actual_image_size
+}
+
+echo
+echo "=== Successful image creation (defaults) ==="
+echo
+
+run_qemu 

[Qemu-block] [PATCH v3 26/36] nfs: Support .bdrv_co_create

2018-02-23 Thread Kevin Wolf
This adds the .bdrv_co_create driver callback to nfs, which enables
image creation over QMP.
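
The overall shape of the conversion can be sketched in isolation: the legacy entry point only parses its string arguments into a typed options struct and then delegates to the new callback. Everything below is a hypothetical stand-in (the types, the "nfs://" prefix check, the recorded size) and not the QAPI-generated code or libnfs.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the QAPI-generated types. */
typedef enum { DRIVER_NFS, DRIVER_SSH } Driver;

typedef struct {
    const char *location;
    int64_t size;
} CreateOptionsNfs;

typedef struct {
    Driver driver;              /* discriminator of the union below */
    union {
        CreateOptionsNfs nfs;
    } u;
} CreateOptions;

#define SKETCH_SECTOR_SIZE 512
#define SKETCH_ROUND_UP(n, d) ((((n) + (d) - 1) / (d)) * (d))

/* Records what the "driver" was asked to create, for demonstration only. */
int64_t last_created_size = -1;

/* New-style callback: consumes only the typed options. */
int nfs_co_create_sketch(const CreateOptions *options)
{
    const CreateOptionsNfs *opts = &options->u.nfs;

    assert(options->driver == DRIVER_NFS);
    if (opts->location == NULL || opts->size < 0) {
        return -EINVAL;
    }
    last_created_size = opts->size;
    return 0;
}

/* Legacy entry point: parse the old-style arguments, then delegate. */
int nfs_create_legacy_sketch(const char *url, int64_t size)
{
    CreateOptions options = { .driver = DRIVER_NFS };

    if (strncmp(url, "nfs://", 6) != 0) {
        return -EINVAL;
    }
    options.u.nfs.location = url;
    options.u.nfs.size = SKETCH_ROUND_UP(size, SKETCH_SECTOR_SIZE);
    return nfs_co_create_sketch(&options);
}
```

Keeping the sector rounding in the legacy wrapper means the typed callback creates exactly the size it is given.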

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
---
 qapi/block-core.json | 16 +++-
 block/nfs.c  | 74 +---
 2 files changed, 74 insertions(+), 16 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 6c0c16ebe3..085b791303 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -3393,6 +3393,20 @@
 '*preallocation':   'PreallocMode' } }
 
 ##
+# @BlockdevCreateOptionsNfs:
+#
+# Driver specific image creation options for NFS.
+#
+# @location Where to store the new image file
+# @size Size of the virtual disk in bytes
+#
+# Since: 2.12
+##
+{ 'struct': 'BlockdevCreateOptionsNfs',
+  'data': { 'location': 'BlockdevOptionsNfs',
+'size': 'size' } }
+
+##
 # @BlockdevQcow2Version:
 #
 # @v2:  The original QCOW2 format as introduced in qemu 0.10 (version 2)
@@ -3491,7 +3505,7 @@
   'iscsi':  'BlockdevCreateNotSupported',
   'luks':   'BlockdevCreateNotSupported',
   'nbd':'BlockdevCreateNotSupported',
-  'nfs':'BlockdevCreateNotSupported',
+  'nfs':'BlockdevCreateOptionsNfs',
   'null-aio':   'BlockdevCreateNotSupported',
   'null-co':'BlockdevCreateNotSupported',
   'nvme':   'BlockdevCreateNotSupported',
diff --git a/block/nfs.c b/block/nfs.c
index 9283bfbaae..c0c153cadb 100644
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -551,33 +551,45 @@ out:
 return ret;
 }
 
-static int64_t nfs_client_open_qdict(NFSClient *client, QDict *options,
- int flags, int open_flags, Error **errp)
+static BlockdevOptionsNfs *nfs_options_qdict_to_qapi(QDict *options,
+ Error **errp)
 {
 BlockdevOptionsNfs *opts = NULL;
 QObject *crumpled = NULL;
 Visitor *v;
 Error *local_err = NULL;
-int ret;
 
 crumpled = qdict_crumple(options, errp);
 if (crumpled == NULL) {
-return -EINVAL;
+return NULL;
 }
 
 v = qobject_input_visitor_new_keyval(crumpled);
 visit_type_BlockdevOptionsNfs(v, NULL, &opts, &local_err);
 visit_free(v);
+qobject_decref(crumpled);
 
 if (local_err) {
-error_propagate(errp, local_err);
+return NULL;
+}
+
+return opts;
+}
+
+static int64_t nfs_client_open_qdict(NFSClient *client, QDict *options,
+ int flags, int open_flags, Error **errp)
+{
+BlockdevOptionsNfs *opts;
+int ret;
+
+opts = nfs_options_qdict_to_qapi(options, errp);
+if (opts == NULL) {
 ret = -EINVAL;
 goto fail;
 }
 
 ret = nfs_client_open(client, opts, flags, open_flags, errp);
 fail:
-qobject_decref(crumpled);
 qapi_free_BlockdevOptionsNfs(opts);
 return ret;
 }
@@ -614,17 +626,42 @@ static QemuOptsList nfs_create_opts = {
 }
 };
 
-static int nfs_file_create(const char *url, QemuOpts *opts, Error **errp)
+static int nfs_file_co_create(BlockdevCreateOptions *options, Error **errp)
 {
-int64_t ret, total_size;
+BlockdevCreateOptionsNfs *opts = &options->u.nfs;
 NFSClient *client = g_new0(NFSClient, 1);
-QDict *options = NULL;
+int ret;
+
+assert(options->driver == BLOCKDEV_DRIVER_NFS);
 
 client->aio_context = qemu_get_aio_context();
 
+ret = nfs_client_open(client, opts->location, O_CREAT, 0, errp);
+if (ret < 0) {
+goto out;
+}
+ret = nfs_ftruncate(client->context, client->fh, opts->size);
+nfs_client_close(client);
+
+out:
+g_free(client);
+return ret;
+}
+
+static int nfs_file_create(const char *url, QemuOpts *opts, Error **errp)
+{
+BlockdevCreateOptions *create_options;
+BlockdevCreateOptionsNfs *nfs_opts;
+QDict *options;
+int ret;
+
+create_options = g_new0(BlockdevCreateOptions, 1);
+create_options->driver = BLOCKDEV_DRIVER_NFS;
+nfs_opts = &create_options->u.nfs;
+
 /* Read out options */
-total_size = ROUND_UP(qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0),
-  BDRV_SECTOR_SIZE);
+nfs_opts->size = ROUND_UP(qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0),
+  BDRV_SECTOR_SIZE);
 
 options = qdict_new();
 ret = nfs_parse_uri(url, options, errp);
@@ -632,15 +669,21 @@ static int nfs_file_create(const char *url, QemuOpts *opts, Error **errp)
 goto out;
 }
 
-ret = nfs_client_open_qdict(client, options, O_CREAT, 0, errp);
+nfs_opts->location = nfs_options_qdict_to_qapi(options, errp);
+if (nfs_opts->location == NULL) {
+ret = -EINVAL;
+goto out;
+}
+
+ret = nfs_file_co_create(create_options, errp);
 if (ret < 0) {
 goto out;
 }
-ret = nfs_ftruncate(client->context, client->fh, total_size);
-

[Qemu-block] [PATCH v3 28/36] sheepdog: Support .bdrv_co_create

2018-02-23 Thread Kevin Wolf
This adds the .bdrv_co_create driver callback to sheepdog, which enables
image creation over QMP.

Signed-off-by: Kevin Wolf 
---
 qapi/block-core.json |  24 -
 block/sheepdog.c | 242 +++
 2 files changed, 191 insertions(+), 75 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 2b249c9e3d..f7679fce53 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -3512,6 +3512,28 @@
 'erasure-coded': 'SheepdogRedundancyErasureCoded' } }
 
 ##
+# @BlockdevCreateOptionsSheepdog:
+#
+# Driver specific image creation options for Sheepdog.
+#
+# @location Where to store the new image file
+# @size Size of the virtual disk in bytes
+# @backing-file File name of a base image
+# @preallocationPreallocation mode (allowed values: off, full)
+# @redundancy   Redundancy of the image
+# @object-size  Object size of the image
+#
+# Since: 2.12
+##
+{ 'struct': 'BlockdevCreateOptionsSheepdog',
+  'data': { 'location': 'BlockdevOptionsSheepdog',
+'size': 'size',
+'*backing-file':'str',
+'*preallocation':   'PreallocMode',
+'*redundancy':  'SheepdogRedundancy',
+'*object-size': 'size' } }
+
+##
 # @BlockdevCreateNotSupported:
 #
 # This is used for all drivers that don't support creating images.
@@ -3562,7 +3584,7 @@
   'raw':'BlockdevCreateNotSupported',
   'rbd':'BlockdevCreateOptionsRbd',
   'replication':'BlockdevCreateNotSupported',
-  'sheepdog':   'BlockdevCreateNotSupported',
+  'sheepdog':   'BlockdevCreateOptionsSheepdog',
   'ssh':'BlockdevCreateNotSupported',
   'throttle':   'BlockdevCreateNotSupported',
   'vdi':'BlockdevCreateNotSupported',
diff --git a/block/sheepdog.c b/block/sheepdog.c
index 22df2ba9d0..83da6236ca 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -17,6 +17,7 @@
 #include "qapi/error.h"
 #include "qapi/qmp/qdict.h"
 #include "qapi/qobject-input-visitor.h"
+#include "qapi/qobject-output-visitor.h"
 #include "qemu/uri.h"
 #include "qemu/error-report.h"
 #include "qemu/option.h"
@@ -533,23 +534,6 @@ static void sd_aio_setup(SheepdogAIOCB *acb, BDRVSheepdogState *s,
 qemu_co_mutex_unlock(&s->queue_lock);
 }
 
-static SocketAddress *sd_socket_address(const char *path,
-const char *host, const char *port)
-{
-SocketAddress *addr = g_new0(SocketAddress, 1);
-
-if (path) {
-addr->type = SOCKET_ADDRESS_TYPE_UNIX;
-addr->u.q_unix.path = g_strdup(path);
-} else {
-addr->type = SOCKET_ADDRESS_TYPE_INET;
-addr->u.inet.host = g_strdup(host ?: SD_DEFAULT_ADDR);
-addr->u.inet.port = g_strdup(port ?: stringify(SD_DEFAULT_PORT));
-}
-
-return addr;
-}
-
 static SocketAddress *sd_server_config(QDict *options, Error **errp)
 {
 QDict *server = NULL;
@@ -1882,6 +1866,44 @@ out_with_err_set:
 return ret;
 }
 
+static int sd_create_prealloc(BlockdevOptionsSheepdog *location, int64_t size,
+  Error **errp)
+{
+BlockDriverState *bs;
+Visitor *v;
+QObject *obj = NULL;
+QDict *qdict;
+Error *local_err = NULL;
+int ret;
+
+v = qobject_output_visitor_new(&obj);
+visit_type_BlockdevOptionsSheepdog(v, NULL, &location, &local_err);
+visit_free(v);
+
+if (local_err) {
+error_propagate(errp, local_err);
+qobject_decref(obj);
+return -EINVAL;
+}
+
+qdict = qobject_to_qdict(obj);
+qdict_flatten(qdict);
+
+qdict_put_str(qdict, "driver", "sheepdog");
+
+bs = bdrv_open(NULL, NULL, qdict, BDRV_O_PROTOCOL | BDRV_O_RDWR, errp);
+if (bs == NULL) {
+ret = -EIO;
+goto fail;
+}
+
+ret = sd_prealloc(bs, 0, size, errp);
+fail:
+bdrv_unref(bs);
+QDECREF(qdict);
+return ret;
+}
+
 static int parse_redundancy(BDRVSheepdogState *s, SheepdogRedundancy *opt)
 {
 struct SheepdogInode *inode = &s->inode;
@@ -1934,9 +1956,9 @@ static int parse_redundancy(BDRVSheepdogState *s, SheepdogRedundancy *opt)
  * # create a erasure coded vdi with x data strips and y parity strips
  * -o redundancy=x:y (x must be one of {2,4,8,16} and 1 <= y < SD_EC_MAX_STRIP)
  */
-static int parse_redundancy_str(BDRVSheepdogState *s, const char *opt)
+static SheepdogRedundancy *parse_redundancy_str(const char *opt)
 {
-struct SheepdogRedundancy redundancy;
+SheepdogRedundancy *redundancy;
 const char *n1, *n2;
 long copy, parity;
 char p[10];
@@ -1947,26 +1969,27 @@ static int parse_redundancy_str(BDRVSheepdogState *s, const char *opt)
 n2 = strtok(NULL, ":");
 
 if (!n1) {
-return -EINVAL;
+return NULL;
 }
 
 ret = qemu_strtol(n1, NULL, 10, &copy);
 if (ret < 0) {
-return ret;
+return NULL;
 }
 
+redundancy = 

[Qemu-block] [PATCH v3 35/36] qemu-iotests: Test qcow2 over file image creation with QMP

2018-02-23 Thread Kevin Wolf
Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
---
 tests/qemu-iotests/206 | 436 +
 tests/qemu-iotests/206.out | 209 ++
 tests/qemu-iotests/group   |   1 +
 3 files changed, 646 insertions(+)
 create mode 100755 tests/qemu-iotests/206
 create mode 100644 tests/qemu-iotests/206.out

diff --git a/tests/qemu-iotests/206 b/tests/qemu-iotests/206
new file mode 100755
index 00..0a18b2b19a
--- /dev/null
+++ b/tests/qemu-iotests/206
@@ -0,0 +1,436 @@
+#!/bin/bash
+#
+# Test qcow2 and file image creation
+#
+# Copyright (C) 2018 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+# creator
+owner=kw...@redhat.com
+
+seq=`basename $0`
+echo "QA output created by $seq"
+
+here=`pwd`
+status=1   # failure is the default!
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+_supported_fmt qcow2
+_supported_proto file
+_supported_os Linux
+
+function do_run_qemu()
+{
+echo Testing: "$@"
+$QEMU -nographic -qmp stdio -serial none "$@"
+echo
+}
+
+function run_qemu()
+{
+do_run_qemu "$@" 2>&1 | _filter_testdir | _filter_qmp \
+  | _filter_qemu | _filter_imgfmt \
+  | _filter_actual_image_size
+}
+
+echo
+echo "=== Successful image creation (defaults) ==="
+echo
+
+size=$((128 * 1024 * 1024))
+
+run_qemu <

[Qemu-block] [PATCH v3 22/36] rbd: Support .bdrv_co_create

2018-02-23 Thread Kevin Wolf
This adds the .bdrv_co_create driver callback to rbd, which enables
image creation over QMP.
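
The object-size validation that the diff below keeps (a power of two, at least 4096 bytes, then the order taken as the number of trailing zero bits) can be tried out on its own. This is a sketch under assumptions: obj_order_from_size() is a made-up name, a plain loop replaces QEMU's ctz32(), and the explicit rejection of non-positive sizes is an addition for the standalone version.

```c
#include <errno.h>
#include <stdint.h>

/* Validate an object size and compute its order (log2 of the size).
 * Returns 0 and stores the order on success, -EINVAL otherwise. */
int obj_order_from_size(int64_t objsize, int *order)
{
    int n = 0;

    /* Assumption of this sketch: zero and negative sizes are rejected. */
    if (objsize <= 0) {
        return -EINVAL;
    }
    if ((objsize - 1) & objsize) {      /* not a power of 2? */
        return -EINVAL;
    }
    if (objsize < 4096) {               /* too small */
        return -EINVAL;
    }
    /* Count trailing zero bits; for a power of two this is log2. */
    while (!(objsize & 1)) {
        objsize >>= 1;
        n++;
    }
    *order = n;
    return 0;
}
```

For a 4 MiB object size this gives an order of 22, and 4096 gives 12.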

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
---
 qapi/block-core.json |  19 ++-
 block/rbd.c  | 146 ++-
 2 files changed, 116 insertions(+), 49 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 74021c51d7..6c0c16ebe3 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -3436,6 +3436,23 @@
 '*refcount-bits':   'int' } }
 
 ##
+# @BlockdevCreateOptionsRbd:
+#
+# Driver specific image creation options for rbd/Ceph.
+#
+# @location Where to store the new image file. This location cannot
+#   point to a snapshot.
+# @size Size of the virtual disk in bytes
+# @cluster-size RBD object size
+#
+# Since: 2.12
+##
+{ 'struct': 'BlockdevCreateOptionsRbd',
+  'data': { 'location': 'BlockdevOptionsRbd',
+'size': 'size',
+'*cluster-size' :   'size' } }
+
+##
 # @BlockdevCreateNotSupported:
 #
 # This is used for all drivers that don't support creating images.
@@ -3484,7 +3501,7 @@
   'qed':'BlockdevCreateNotSupported',
   'quorum': 'BlockdevCreateNotSupported',
   'raw':'BlockdevCreateNotSupported',
-  'rbd':'BlockdevCreateNotSupported',
+  'rbd':'BlockdevCreateOptionsRbd',
   'replication':'BlockdevCreateNotSupported',
   'sheepdog':   'BlockdevCreateNotSupported',
   'ssh':'BlockdevCreateNotSupported',
diff --git a/block/rbd.c b/block/rbd.c
index 9b247f020d..ee71dc8941 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -332,69 +332,55 @@ static QemuOptsList runtime_opts = {
 },
 };
 
-static int qemu_rbd_create(const char *filename, QemuOpts *opts, Error **errp)
+/* FIXME Deprecate and remove keypairs or make it available in QMP.
+ * password_secret should eventually be configurable in opts->location. Support
+ * for it in .bdrv_open will make it work here as well. */
+static int qemu_rbd_do_create(BlockdevCreateOptions *options,
+  const char *keypairs, const char *password_secret,
+  Error **errp)
 {
-Error *local_err = NULL;
-int64_t bytes = 0;
-int64_t objsize;
-int obj_order = 0;
-const char *pool, *image_name, *conf, *user, *keypairs;
-const char *secretid;
+BlockdevCreateOptionsRbd *opts = &options->u.rbd;
 rados_t cluster;
 rados_ioctx_t io_ctx;
-QDict *options = NULL;
-int ret = 0;
+int obj_order = 0;
+int ret;
 
-secretid = qemu_opt_get(opts, "password-secret");
+assert(options->driver == BLOCKDEV_DRIVER_RBD);
+if (opts->location->has_snapshot) {
+error_setg(errp, "Can't use snapshot name for image creation");
+return -EINVAL;
+}
 
-/* Read out options */
-bytes = ROUND_UP(qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0),
- BDRV_SECTOR_SIZE);
-objsize = qemu_opt_get_size_del(opts, BLOCK_OPT_CLUSTER_SIZE, 0);
-if (objsize) {
+/* TODO Remove the limitation */
+if (opts->location->has_server) {
+error_setg(errp, "Can't specify server for image creation");
+return -EINVAL;
+}
+
+if (opts->has_cluster_size) {
+int64_t objsize = opts->cluster_size;
 if ((objsize - 1) & objsize) {/* not a power of 2? */
 error_setg(errp, "obj size needs to be power of 2");
-ret = -EINVAL;
-goto exit;
+return -EINVAL;
 }
 if (objsize < 4096) {
 error_setg(errp, "obj size too small");
-ret = -EINVAL;
-goto exit;
+return -EINVAL;
 }
 obj_order = ctz32(objsize);
 }
 
-options = qdict_new();
-qemu_rbd_parse_filename(filename, options, &local_err);
-if (local_err) {
-ret = -EINVAL;
-error_propagate(errp, local_err);
-goto exit;
-}
-
-/*
- * Caution: while qdict_get_try_str() is fine, getting non-string
- * types would require more care.  When @options come from -blockdev
- * or blockdev_add, its members are typed according to the QAPI
- * schema, but when they come from -drive, they're all QString.
- */
-pool   = qdict_get_try_str(options, "pool");
-conf   = qdict_get_try_str(options, "conf");
-user   = qdict_get_try_str(options, "user");
-image_name = qdict_get_try_str(options, "image");
-keypairs   = qdict_get_try_str(options, "=keyvalue-pairs");
-
-ret = rados_create(&cluster, user);
+ret = rados_create(&cluster, opts->location->user);
 if (ret < 0) {
 error_setg_errno(errp, -ret, "error initializing");
-goto exit;
+return ret;
 }
 
 /* try default location when conf=NULL, but ignore failure */
-ret = rados_conf_read_file(cluster, conf);
-  

[Qemu-block] [PATCH v3 19/36] rbd: Factor out qemu_rbd_connect()

2018-02-23 Thread Kevin Wolf
The code to establish an RBD connection is duplicated between open and
create. In order to be able to share the code, factor out the code from
qemu_rbd_open() as a first step.

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
---
 block/rbd.c | 100 
 1 file changed, 60 insertions(+), 40 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index 27fa11b473..4bbcce4eca 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -544,32 +544,17 @@ out:
 return rados_str;
 }
 
-static int qemu_rbd_open(BlockDriverState *bs, QDict *options, int flags,
- Error **errp)
+static int qemu_rbd_connect(rados_t *cluster, rados_ioctx_t *io_ctx,
+char **s_snap, char **s_image_name,
+QDict *options, bool cache, Error **errp)
 {
-BDRVRBDState *s = bs->opaque;
-const char *pool, *snap, *conf, *user, *image_name, *keypairs;
-const char *secretid, *filename;
 QemuOpts *opts;
-Error *local_err = NULL;
 char *mon_host = NULL;
+const char *pool, *snap, *conf, *user, *image_name, *keypairs;
+const char *secretid;
+Error *local_err = NULL;
 int r;
 
-/* If we are given a filename, parse the filename, with precedence given to
- * filename encoded options */
-filename = qdict_get_try_str(options, "filename");
-if (filename) {
-warn_report("'filename' option specified. "
-"This is an unsupported option, and may be deprecated "
-"in the future");
-qemu_rbd_parse_filename(filename, options, &local_err);
-if (local_err) {
-r = -EINVAL;
-error_propagate(errp, local_err);
-goto exit;
-}
-}
-
 opts = qemu_opts_create(&runtime_opts, NULL, 0, &error_abort);
 qemu_opts_absorb_qdict(opts, options, &local_err);
 if (local_err) {
@@ -600,35 +585,35 @@ static int qemu_rbd_open(BlockDriverState *bs, QDict *options, int flags,
 goto failed_opts;
 }
 
-r = rados_create(&s->cluster, user);
+r = rados_create(cluster, user);
 if (r < 0) {
 error_setg_errno(errp, -r, "error initializing");
 goto failed_opts;
 }
 
-s->snap = g_strdup(snap);
-s->image_name = g_strdup(image_name);
+*s_snap = g_strdup(snap);
+*s_image_name = g_strdup(image_name);
 
 /* try default location when conf=NULL, but ignore failure */
-r = rados_conf_read_file(s->cluster, conf);
+r = rados_conf_read_file(*cluster, conf);
 if (conf && r < 0) {
 error_setg_errno(errp, -r, "error reading conf file %s", conf);
 goto failed_shutdown;
 }
 
-r = qemu_rbd_set_keypairs(s->cluster, keypairs, errp);
+r = qemu_rbd_set_keypairs(*cluster, keypairs, errp);
 if (r < 0) {
 goto failed_shutdown;
 }
 
 if (mon_host) {
-r = rados_conf_set(s->cluster, "mon_host", mon_host);
+r = rados_conf_set(*cluster, "mon_host", mon_host);
 if (r < 0) {
 goto failed_shutdown;
 }
 }
 
-if (qemu_rbd_set_auth(s->cluster, secretid, errp) < 0) {
+if (qemu_rbd_set_auth(*cluster, secretid, errp) < 0) {
 r = -EIO;
 goto failed_shutdown;
 }
@@ -640,24 +625,65 @@ static int qemu_rbd_open(BlockDriverState *bs, QDict *options, int flags,
  * librbd defaults to no caching. If write through caching cannot
  * be set up, fall back to no caching.
  */
-if (flags & BDRV_O_NOCACHE) {
-rados_conf_set(s->cluster, "rbd_cache", "false");
+if (cache) {
+rados_conf_set(*cluster, "rbd_cache", "true");
 } else {
-rados_conf_set(s->cluster, "rbd_cache", "true");
+rados_conf_set(*cluster, "rbd_cache", "false");
 }
 
-r = rados_connect(s->cluster);
+r = rados_connect(*cluster);
 if (r < 0) {
 error_setg_errno(errp, -r, "error connecting");
 goto failed_shutdown;
 }
 
-r = rados_ioctx_create(s->cluster, pool, &s->io_ctx);
+r = rados_ioctx_create(*cluster, pool, io_ctx);
 if (r < 0) {
 error_setg_errno(errp, -r, "error opening pool %s", pool);
 goto failed_shutdown;
 }
 
+qemu_opts_del(opts);
+return 0;
+
+failed_shutdown:
+rados_shutdown(*cluster);
+g_free(*s_snap);
+g_free(*s_image_name);
+failed_opts:
+qemu_opts_del(opts);
+g_free(mon_host);
+return r;
+}
+
+static int qemu_rbd_open(BlockDriverState *bs, QDict *options, int flags,
+ Error **errp)
+{
+BDRVRBDState *s = bs->opaque;
+Error *local_err = NULL;
+const char *filename;
+int r;
+
+/* If we are given a filename, parse the filename, with precedence given to
+ * filename encoded options */
+filename = qdict_get_try_str(options, "filename");
+if (filename) {
+warn_report("'filename' option specified. "
+"This is an unsupported option, and may 

[Qemu-block] [PATCH v3 33/36] file-posix: Fix no-op bdrv_truncate() with falloc preallocation

2018-02-23 Thread Kevin Wolf
If bdrv_truncate() is called, but the requested size is the same as
before, don't call posix_fallocate(), which returns -EINVAL for length
zero and would therefore make bdrv_truncate() fail.

The problem can be triggered by creating a zero-sized raw image with
'falloc' preallocation mode.
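
A minimal standalone sketch of the guard, assuming a POSIX system that provides posix_fallocate(); preallocate_growth() is a hypothetical helper name and not the function touched by this patch:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif

#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>

/* Preallocate the range between current_length and offset.
 * Returns 0 on success or a negative errno value on failure. */
int preallocate_growth(int fd, off_t current_length, off_t offset)
{
    if (offset == current_length) {
        /* Nothing to allocate: posix_fallocate() would reject a
         * zero length with EINVAL, so skip the call entirely. */
        return 0;
    }

    /* posix_fallocate() returns an error number instead of setting errno. */
    return -posix_fallocate(fd, current_length, offset - current_length);
}
```

With equal lengths the helper never touches the file descriptor, which is the behaviour a no-op truncate needs.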

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
Reviewed-by: Eric Blake 
---
 block/file-posix.c | 14 +-
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index ba14ed9459..6aed5bca0b 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1686,11 +1686,15 @@ static int raw_regular_truncate(int fd, int64_t offset, PreallocMode prealloc,
  * file systems that do not support fallocate(), trying to check if a
  * block is allocated before allocating it, so don't do that here.
  */
-result = -posix_fallocate(fd, current_length, offset - current_length);
-if (result != 0) {
-/* posix_fallocate() doesn't set errno. */
-error_setg_errno(errp, -result,
- "Could not preallocate new data");
+if (offset != current_length) {
+result = -posix_fallocate(fd, current_length, offset - current_length);
+if (result != 0) {
+/* posix_fallocate() doesn't set errno. */
+error_setg_errno(errp, -result,
+ "Could not preallocate new data");
+}
+} else {
+result = 0;
 }
 goto out;
 #endif
-- 
2.13.6




[Qemu-block] [PATCH v3 34/36] block: Fail bdrv_truncate() with negative size

2018-02-23 Thread Kevin Wolf
Most callers have their own checks, but something like this should also
be checked centrally. As it happens, x-blockdev-create can pass negative
image sizes to format drivers (because there is no QAPI type that would
reject negative numbers) and triggers the check added by this patch.

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
Reviewed-by: Eric Blake 
---
 block.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/block.c b/block.c
index 4a7e448226..5c874aefa1 100644
--- a/block.c
+++ b/block.c
@@ -3684,6 +3684,11 @@ int bdrv_truncate(BdrvChild *child, int64_t offset, PreallocMode prealloc,
 error_setg(errp, "No medium inserted");
 return -ENOMEDIUM;
 }
+if (offset < 0) {
+error_setg(errp, "Image size cannot be negative");
+return -EINVAL;
+}
+
 if (!drv->bdrv_truncate) {
 if (bs->file && drv->is_filter) {
 return bdrv_truncate(bs->file, offset, prealloc, errp);
-- 
2.13.6




[Qemu-block] [PATCH v3 32/36] ssh: Support .bdrv_co_create

2018-02-23 Thread Kevin Wolf
This adds the .bdrv_co_create driver callback to ssh, which enables
image creation over QMP.
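
The callback sizes the new image by seeking to size - 1 and writing a single zero byte. The same idea is sketched below with plain stdio instead of libssh2, assuming a POSIX-like system where seeking past the end of a binary stream and then writing extends the file; set_size_by_last_byte() and stream_size() are hypothetical helpers.

```c
#include <stdio.h>

/* Make an empty, freshly created file exactly `size` bytes long by
 * writing one zero byte at offset size - 1. Sizes <= 0 are a no-op. */
int set_size_by_last_byte(FILE *f, long size)
{
    if (size <= 0) {
        return 0;
    }
    if (fseek(f, size - 1, SEEK_SET) != 0) {
        return -1;
    }
    if (fputc('\0', f) == EOF) {
        return -1;
    }
    return fflush(f) == 0 ? 0 : -1;
}

/* Report the current length of the stream, or -1 on error. */
long stream_size(FILE *f)
{
    if (fseek(f, 0, SEEK_END) != 0) {
        return -1;
    }
    return ftell(f);
}
```

On a file that starts out empty, the skipped range reads back as zeroes.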

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
---
 qapi/block-core.json | 16 -
 block/ssh.c  | 92 +---
 2 files changed, 67 insertions(+), 41 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 431d4a4fb2..2f7fab46eb 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -3593,6 +3593,20 @@
 '*object-size': 'size' } }
 
 ##
+# @BlockdevCreateOptionsSsh:
+#
+# Driver specific image creation options for SSH.
+#
+# @location Where to store the new image file
+# @size Size of the virtual disk in bytes
+#
+# Since: 2.12
+##
+{ 'struct': 'BlockdevCreateOptionsSsh',
+  'data': { 'location': 'BlockdevOptionsSsh',
+'size': 'size' } }
+
+##
 # @BlockdevCreateNotSupported:
 #
 # This is used for all drivers that don't support creating images.
@@ -3644,7 +3658,7 @@
   'rbd':'BlockdevCreateOptionsRbd',
   'replication':'BlockdevCreateNotSupported',
   'sheepdog':   'BlockdevCreateOptionsSheepdog',
-  'ssh':'BlockdevCreateNotSupported',
+  'ssh':'BlockdevCreateOptionsSsh',
   'throttle':   'BlockdevCreateNotSupported',
   'vdi':'BlockdevCreateNotSupported',
   'vhdx':   'BlockdevCreateNotSupported',
diff --git a/block/ssh.c b/block/ssh.c
index 77bc20041f..bd3044e5f6 100644
--- a/block/ssh.c
+++ b/block/ssh.c
@@ -826,64 +826,75 @@ static QemuOptsList ssh_create_opts = {
 }
 };
 
-static int ssh_create(const char *filename, QemuOpts *opts, Error **errp)
+static int ssh_co_create(BlockdevCreateOptions *options, Error **errp)
 {
-int r, ret;
-int64_t total_size = 0;
-QDict *uri_options = NULL;
-BlockdevOptionsSsh *ssh_opts = NULL;
+BlockdevCreateOptionsSsh *opts = &options->u.ssh;
 BDRVSSHState s;
-ssize_t r2;
 char c[1] = { '\0' };
+int ret;
+
+assert(options->driver == BLOCKDEV_DRIVER_SSH);
 
 ssh_state_init(&s);
 
+ret = connect_to_ssh(&s, opts->location,
+ LIBSSH2_FXF_READ|LIBSSH2_FXF_WRITE|
+ LIBSSH2_FXF_CREAT|LIBSSH2_FXF_TRUNC,
+ 0644, errp);
+if (ret < 0) {
+goto fail;
+}
+
+if (opts->size > 0) {
+libssh2_sftp_seek64(s.sftp_handle, opts->size - 1);
+ret = libssh2_sftp_write(s.sftp_handle, c, 1);
+if (ret < 0) {
+sftp_error_setg(errp, &s, "truncate failed");
+ret = -EINVAL;
+goto fail;
+}
+s.attrs.filesize = opts->size;
+}
+
+ret = 0;
+fail:
+ssh_state_free(&s);
+return ret;
+}
+
+static int ssh_create(const char *filename, QemuOpts *opts, Error **errp)
+{
+BlockdevCreateOptions *create_options;
+BlockdevCreateOptionsSsh *ssh_opts;
+int ret;
+QDict *uri_options = NULL;
+
+create_options = g_new0(BlockdevCreateOptions, 1);
+create_options->driver = BLOCKDEV_DRIVER_SSH;
+ssh_opts = &create_options->u.ssh;
+
 /* Get desired file size. */
-total_size = ROUND_UP(qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0),
-  BDRV_SECTOR_SIZE);
-DPRINTF("total_size=%" PRIi64, total_size);
+ssh_opts->size = ROUND_UP(qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0),
+  BDRV_SECTOR_SIZE);
+DPRINTF("total_size=%" PRIi64, ssh_opts->size);
 
 uri_options = qdict_new();
-r = parse_uri(filename, uri_options, errp);
-if (r < 0) {
-ret = r;
+ret = parse_uri(filename, uri_options, errp);
+if (ret < 0) {
 goto out;
 }
 
-ssh_opts = ssh_parse_options(uri_options, errp);
-if (ssh_opts == NULL) {
+ssh_opts->location = ssh_parse_options(uri_options, errp);
+if (ssh_opts->location == NULL) {
 ret = -EINVAL;
 goto out;
 }
 
-r = connect_to_ssh(&s, ssh_opts,
-   LIBSSH2_FXF_READ|LIBSSH2_FXF_WRITE|
-   LIBSSH2_FXF_CREAT|LIBSSH2_FXF_TRUNC,
-   0644, errp);
-if (r < 0) {
-ret = r;
-goto out;
-}
-
-if (total_size > 0) {
-libssh2_sftp_seek64(s.sftp_handle, total_size-1);
-r2 = libssh2_sftp_write(s.sftp_handle, c, 1);
-if (r2 < 0) {
-sftp_error_setg(errp, &s, "truncate failed");
-ret = -EINVAL;
-goto out;
-}
-s.attrs.filesize = total_size;
-}
-
-ret = 0;
+ret = ssh_co_create(create_options, errp);
 
  out:
-ssh_state_free(&s);
-if (uri_options != NULL) {
-QDECREF(uri_options);
-}
-qapi_free_BlockdevOptionsSsh(ssh_opts);
+QDECREF(uri_options);
+qapi_free_BlockdevCreateOptions(create_options);
 return ret;
 }
 
@@ -1223,6 +1234,7 @@ static BlockDriver bdrv_ssh = {
 

[Qemu-block] [PATCH v3 31/36] ssh: Pass BlockdevOptionsSsh to connect_to_ssh()

2018-02-23 Thread Kevin Wolf
Move the parsing of the QDict options up to the callers, in preparation
for the .bdrv_co_create implementation that directly gets a QAPI type.
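
The ownership change described above can be sketched in standalone C. This is
only an illustration, not the driver code: Opts, opts_parse(), opts_free(),
do_connect() and open_image() are hypothetical stand-ins for the QAPI type, its
parser and the connect function. The point is that the callee merely borrows
the parsed options, while the caller parses them once and frees them on both
the success and the error path.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical typed options object, standing in for a QAPI type. */
typedef struct {
    char *path;
} Opts;

static int live_opts;   /* number of Opts currently allocated (leak check) */

/* Parse a raw option string; returns NULL if it is missing or empty. */
static Opts *opts_parse(const char *raw)
{
    Opts *o;
    size_t len;

    if (raw == NULL || raw[0] == '\0') {
        return NULL;
    }
    len = strlen(raw) + 1;
    o = malloc(sizeof(*o));
    if (o == NULL) {
        return NULL;
    }
    o->path = malloc(len);
    if (o->path == NULL) {
        free(o);
        return NULL;
    }
    memcpy(o->path, raw, len);
    live_opts++;
    return o;
}

static void opts_free(Opts *o)
{
    if (o != NULL) {
        free(o->path);
        free(o);
        live_opts--;
    }
}

/* The callee only borrows @opts; it neither parses nor frees them. */
static int do_connect(const Opts *opts)
{
    return strcmp(opts->path, "/unreachable") == 0 ? -EIO : 0;
}

/* The caller parses, connects, and frees on success and on failure. */
static int open_image(const char *raw)
{
    Opts *opts = opts_parse(raw);
    int ret;

    if (opts == NULL) {
        return -EINVAL;
    }
    ret = do_connect(opts);
    opts_free(opts);
    return ret;
}
```

With this split, a second caller that already holds a typed options object can
call do_connect() directly without going through the string parser.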

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
---
 block/ssh.c | 34 +-
 1 file changed, 21 insertions(+), 13 deletions(-)

diff --git a/block/ssh.c b/block/ssh.c
index dcf766c213..77bc20041f 100644
--- a/block/ssh.c
+++ b/block/ssh.c
@@ -655,19 +655,13 @@ fail:
 return result;
 }
 
-static int connect_to_ssh(BDRVSSHState *s, QDict *options,
+static int connect_to_ssh(BDRVSSHState *s, BlockdevOptionsSsh *opts,
   int ssh_flags, int creat_mode, Error **errp)
 {
-BlockdevOptionsSsh *opts;
 int r, ret;
 const char *user;
 long port = 0;
 
-opts = ssh_parse_options(options, errp);
-if (opts == NULL) {
-return -EINVAL;
-}
-
 if (opts->has_user) {
 user = opts->user;
 } else {
@@ -747,8 +741,6 @@ static int connect_to_ssh(BDRVSSHState *s, QDict *options,
 goto err;
 }
 
-qapi_free_BlockdevOptionsSsh(opts);
-
 r = libssh2_sftp_fstat(s->sftp_handle, &s->attrs);
 if (r < 0) {
 sftp_error_setg(errp, s, "failed to read file attributes");
@@ -774,8 +766,6 @@ static int connect_to_ssh(BDRVSSHState *s, QDict *options,
 }
 s->session = NULL;
 
-qapi_free_BlockdevOptionsSsh(opts);
-
 return ret;
 }
 
@@ -783,6 +773,7 @@ static int ssh_file_open(BlockDriverState *bs, QDict *options, int bdrv_flags,
  Error **errp)
 {
 BDRVSSHState *s = bs->opaque;
+BlockdevOptionsSsh *opts;
 int ret;
 int ssh_flags;
 
@@ -793,8 +784,13 @@ static int ssh_file_open(BlockDriverState *bs, QDict *options, int bdrv_flags,
 ssh_flags |= LIBSSH2_FXF_WRITE;
 }
 
+opts = ssh_parse_options(options, errp);
+if (opts == NULL) {
+return -EINVAL;
+}
+
 /* Start up SSH. */
-ret = connect_to_ssh(s, options, ssh_flags, 0, errp);
+ret = connect_to_ssh(s, opts, ssh_flags, 0, errp);
 if (ret < 0) {
 goto err;
 }
@@ -802,6 +798,8 @@ static int ssh_file_open(BlockDriverState *bs, QDict *options, int bdrv_flags,
 /* Go non-blocking. */
 libssh2_session_set_blocking(s->session, 0);
 
+qapi_free_BlockdevOptionsSsh(opts);
+
 return 0;
 
  err:
@@ -810,6 +808,8 @@ static int ssh_file_open(BlockDriverState *bs, QDict *options, int bdrv_flags,
 }
 s->sock = -1;
 
+qapi_free_BlockdevOptionsSsh(opts);
+
 return ret;
 }
 
@@ -831,6 +831,7 @@ static int ssh_create(const char *filename, QemuOpts *opts, Error **errp)
 int r, ret;
 int64_t total_size = 0;
 QDict *uri_options = NULL;
+BlockdevOptionsSsh *ssh_opts = NULL;
 BDRVSSHState s;
 ssize_t r2;
 char c[1] = { '\0' };
@@ -849,7 +850,13 @@ static int ssh_create(const char *filename, QemuOpts *opts, Error **errp)
 goto out;
 }
 
-r = connect_to_ssh(&s, uri_options,
+ssh_opts = ssh_parse_options(uri_options, errp);
+if (ssh_opts == NULL) {
+ret = -EINVAL;
+goto out;
+}
+
+r = connect_to_ssh(&s, ssh_opts,
LIBSSH2_FXF_READ|LIBSSH2_FXF_WRITE|
LIBSSH2_FXF_CREAT|LIBSSH2_FXF_TRUNC,
0644, errp);
@@ -876,6 +883,7 @@ static int ssh_create(const char *filename, QemuOpts *opts, Error **errp)
 if (uri_options != NULL) {
 QDECREF(uri_options);
 }
+qapi_free_BlockdevOptionsSsh(ssh_opts);
 return ret;
 }
 
-- 
2.13.6




[Qemu-block] [PATCH v3 29/36] ssh: Use QAPI BlockdevOptionsSsh object

2018-02-23 Thread Kevin Wolf
Create a BlockdevOptionsSsh object in connect_to_ssh() and take the
options from there. 'host_key_check' is still processed separately
because it's not in the schema yet.

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
---
 block/ssh.c | 136 +++-
 1 file changed, 61 insertions(+), 75 deletions(-)

diff --git a/block/ssh.c b/block/ssh.c
index b63addcf94..9a89b7f350 100644
--- a/block/ssh.c
+++ b/block/ssh.c
@@ -543,21 +543,6 @@ static QemuOptsList ssh_runtime_opts = {
 .type = QEMU_OPT_NUMBER,
 .help = "Port to connect to",
 },
-{
-.name = "path",
-.type = QEMU_OPT_STRING,
-.help = "Path of the image on the host",
-},
-{
-.name = "user",
-.type = QEMU_OPT_STRING,
-.help = "User as which to connect",
-},
-{
-.name = "host_key_check",
-.type = QEMU_OPT_STRING,
-.help = "Defines how and what to check the host key against",
-},
 { /* end of list */ }
 },
 };
@@ -582,23 +567,31 @@ static bool ssh_process_legacy_socket_options(QDict *output_opts,
 return true;
 }
 
-static InetSocketAddress *ssh_config(QDict *options, Error **errp)
+static BlockdevOptionsSsh *ssh_parse_options(QDict *options, Error **errp)
 {
-InetSocketAddress *inet = NULL;
-QDict *addr = NULL;
-QObject *crumpled_addr = NULL;
-Visitor *iv = NULL;
-Error *local_error = NULL;
-
-qdict_extract_subqdict(options, &addr, "server.");
-if (!qdict_size(addr)) {
-error_setg(errp, "SSH server address missing");
-goto out;
+BlockdevOptionsSsh *result = NULL;
+QemuOpts *opts = NULL;
+Error *local_err = NULL;
+QObject *crumpled;
+const QDictEntry *e;
+Visitor *v;
+
+/* Translate legacy options */
+opts = qemu_opts_create(&ssh_runtime_opts, NULL, 0, &error_abort);
+qemu_opts_absorb_qdict(opts, options, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+goto fail;
 }
 
-crumpled_addr = qdict_crumple(addr, errp);
-if (!crumpled_addr) {
-goto out;
+if (!ssh_process_legacy_socket_options(options, opts, errp)) {
+goto fail;
+}
+
+/* Create the QAPI object */
+crumpled = qdict_crumple(options, errp);
+if (crumpled == NULL) {
+goto fail;
 }
 
 /*
@@ -609,51 +602,50 @@ static InetSocketAddress *ssh_config(QDict *options, Error **errp)
  * but when they come from -drive, they're all QString.  The
  * visitor expects the former.
  */
-iv = qobject_input_visitor_new(crumpled_addr);
-visit_type_InetSocketAddress(iv, NULL, &inet, &local_error);
-if (local_error) {
-error_propagate(errp, local_error);
-goto out;
+v = qobject_input_visitor_new(crumpled);
+visit_type_BlockdevOptionsSsh(v, NULL, &result, &local_err);
+visit_free(v);
+qobject_decref(crumpled);
+
+if (local_err) {
+error_propagate(errp, local_err);
+goto fail;
 }
 
-out:
-QDECREF(addr);
-qobject_decref(crumpled_addr);
-visit_free(iv);
-return inet;
+/* Remove the processed options from the QDict (the visitor processes
+ * _all_ options in the QDict) */
+while ((e = qdict_first(options))) {
+qdict_del(options, e->key);
+}
+
+fail:
+qemu_opts_del(opts);
+return result;
 }
 
 static int connect_to_ssh(BDRVSSHState *s, QDict *options,
   int ssh_flags, int creat_mode, Error **errp)
 {
+BlockdevOptionsSsh *opts;
 int r, ret;
-QemuOpts *opts = NULL;
-Error *local_err = NULL;
-const char *user, *path, *host_key_check;
+const char *user, *host_key_check;
 long port = 0;
 
-opts = qemu_opts_create(&ssh_runtime_opts, NULL, 0, &error_abort);
-qemu_opts_absorb_qdict(opts, options, &local_err);
-if (local_err) {
-ret = -EINVAL;
-error_propagate(errp, local_err);
-goto err;
-}
-
-if (!ssh_process_legacy_socket_options(options, opts, errp)) {
-ret = -EINVAL;
-goto err;
+host_key_check = qdict_get_try_str(options, "host_key_check");
+if (!host_key_check) {
+host_key_check = "yes";
+} else {
+qdict_del(options, "host_key_check");
 }
 
-path = qemu_opt_get(opts, "path");
-if (!path) {
-ret = -EINVAL;
-error_setg(errp, "No path was specified");
-goto err;
+opts = ssh_parse_options(options, errp);
+if (opts == NULL) {
+return -EINVAL;
 }
 
-user = qemu_opt_get(opts, "user");
-if (!user) {
+if (opts->has_user) {
+user = opts->user;
+} else {
 user = g_get_user_name();
 if (!user) {
 error_setg_errno(errp, errno, "Can't get user name");
@@ -662,17 +654,9 @@ static int connect_to_ssh(BDRVSSHState *s, QDict *options,
 }
 }
 
-

[Qemu-block] [PATCH v3 21/36] rbd: Pass BlockdevOptionsRbd to qemu_rbd_connect()

2018-02-23 Thread Kevin Wolf
With the conversion to a QAPI options object, the function is now
prepared to be used in a .bdrv_co_create implementation.

Signed-off-by: Kevin Wolf 
---
 block/rbd.c | 109 +---
 1 file changed, 53 insertions(+), 56 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index 2e79c2d1fd..9b247f020d 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -24,6 +24,8 @@
 #include "qapi/qmp/qdict.h"
 #include "qapi/qmp/qjson.h"
 #include "qapi/qmp/qlist.h"
+#include "qapi/qobject-input-visitor.h"
+#include "qapi-visit.h"
 
 /*
  * When specifying the image filename use:
@@ -482,29 +484,27 @@ static void qemu_rbd_complete_aio(RADOSCB *rcb)
 qemu_aio_unref(acb);
 }
 
-static char *qemu_rbd_mon_host(QDict *options, Error **errp)
+static char *qemu_rbd_mon_host(BlockdevOptionsRbd *opts, Error **errp)
 {
-const char **vals = g_new(const char *, qdict_size(options) + 1);
-char keybuf[32];
+const char **vals;
 const char *host, *port;
 char *rados_str;
-int i;
-
-for (i = 0;; i++) {
-sprintf(keybuf, "server.%d.host", i);
-host = qdict_get_try_str(options, keybuf);
-qdict_del(options, keybuf);
-sprintf(keybuf, "server.%d.port", i);
-port = qdict_get_try_str(options, keybuf);
-qdict_del(options, keybuf);
-if (!host && !port) {
-break;
-}
-if (!host) {
-error_setg(errp, "Parameter server.%d.host is missing", i);
-rados_str = NULL;
-goto out;
-}
+InetSocketAddressBaseList *p;
+int i, cnt;
+
+if (!opts->has_server) {
+return NULL;
+}
+
+for (cnt = 0, p = opts->server; p; p = p->next) {
+cnt++;
+}
+
+vals = g_new(const char *, cnt + 1);
+
+for (i = 0, p = opts->server; p; p = p->next, i++) {
+host = p->value->host;
+port = p->value->port;
 
 if (strchr(host, ':')) {
 vals[i] = port ? g_strdup_printf("[%s]:%s", host, port)
@@ -517,63 +517,40 @@ static char *qemu_rbd_mon_host(QDict *options, Error **errp)
 vals[i] = NULL;
 
 rados_str = i ? g_strjoinv(";", (char **)vals) : NULL;
-out:
 g_strfreev((char **)vals);
 return rados_str;
 }
 
 static int qemu_rbd_connect(rados_t *cluster, rados_ioctx_t *io_ctx,
 char **s_snap, char **s_image_name,
-QDict *options, bool cache,
+BlockdevOptionsRbd *opts, bool cache,
 const char *keypairs, const char *secretid,
 Error **errp)
 {
-QemuOpts *opts;
 char *mon_host = NULL;
-const char *pool, *snap, *conf, *user, *image_name;
 Error *local_err = NULL;
 int r;
 
-opts = qemu_opts_create(&runtime_opts, NULL, 0, &error_abort);
-qemu_opts_absorb_qdict(opts, options, &local_err);
-if (local_err) {
-error_propagate(errp, local_err);
-r = -EINVAL;
-goto failed_opts;
-}
-
-mon_host = qemu_rbd_mon_host(options, &local_err);
+mon_host = qemu_rbd_mon_host(opts, &local_err);
 if (local_err) {
 error_propagate(errp, local_err);
 r = -EINVAL;
 goto failed_opts;
 }
 
-pool   = qemu_opt_get(opts, "pool");
-conf   = qemu_opt_get(opts, "conf");
-snap   = qemu_opt_get(opts, "snapshot");
-user   = qemu_opt_get(opts, "user");
-image_name = qemu_opt_get(opts, "image");
-
-if (!pool || !image_name) {
-error_setg(errp, "Parameters 'pool' and 'image' are required");
-r = -EINVAL;
-goto failed_opts;
-}
-
-r = rados_create(cluster, user);
+r = rados_create(cluster, opts->user);
 if (r < 0) {
 error_setg_errno(errp, -r, "error initializing");
 goto failed_opts;
 }
 
-*s_snap = g_strdup(snap);
-*s_image_name = g_strdup(image_name);
+*s_snap = g_strdup(opts->snapshot);
+*s_image_name = g_strdup(opts->image);
 
 /* try default location when conf=NULL, but ignore failure */
-r = rados_conf_read_file(*cluster, conf);
-if (conf && r < 0) {
-error_setg_errno(errp, -r, "error reading conf file %s", conf);
+r = rados_conf_read_file(*cluster, opts->conf);
+if (opts->has_conf && r < 0) {
+error_setg_errno(errp, -r, "error reading conf file %s", opts->conf);
 goto failed_shutdown;
 }
 
@@ -613,13 +590,12 @@ static int qemu_rbd_connect(rados_t *cluster, rados_ioctx_t *io_ctx,
 goto failed_shutdown;
 }
 
-r = rados_ioctx_create(*cluster, pool, io_ctx);
+r = rados_ioctx_create(*cluster, opts->pool, io_ctx);
 if (r < 0) {
-error_setg_errno(errp, -r, "error opening pool %s", pool);
+error_setg_errno(errp, -r, "error opening pool %s", opts->pool);
 goto failed_shutdown;
 }
 
-qemu_opts_del(opts);
 return 0;
 
 failed_shutdown:
@@ -627,7 +603,6 @@ failed_shutdown:
 

[Qemu-block] [PATCH v3 20/36] rbd: Remove non-schema options from runtime_opts

2018-02-23 Thread Kevin Wolf
Instead of the QemuOpts in qemu_rbd_connect(), we want to use QAPI
objects. As a preparation, fetch those options directly from the QDict
that .bdrv_open() supports in the rbd driver and that are not in the
schema.

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
---
 block/rbd.c | 55 ---
 1 file changed, 24 insertions(+), 31 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index 4bbcce4eca..2e79c2d1fd 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -326,28 +326,6 @@ static QemuOptsList runtime_opts = {
 /*
  * server.* extracted manually, see qemu_rbd_mon_host()
  */
-{
-.name = "password-secret",
-.type = QEMU_OPT_STRING,
-.help = "ID of secret providing the password",
-},
-
-/*
- * Keys for qemu_rbd_parse_filename(), not in the QAPI schema
- */
-{
-/*
- * HACK: name starts with '=' so that qemu_opts_parse()
- * can't set it
- */
-.name = "=keyvalue-pairs",
-.type = QEMU_OPT_STRING,
-.help = "Legacy rados key/value option parameters",
-},
-{
-.name = "filename",
-.type = QEMU_OPT_STRING,
-},
 { /* end of list */ }
 },
 };
@@ -546,12 +524,13 @@ out:
 
 static int qemu_rbd_connect(rados_t *cluster, rados_ioctx_t *io_ctx,
 char **s_snap, char **s_image_name,
-QDict *options, bool cache, Error **errp)
+QDict *options, bool cache,
+const char *keypairs, const char *secretid,
+Error **errp)
 {
 QemuOpts *opts;
 char *mon_host = NULL;
-const char *pool, *snap, *conf, *user, *image_name, *keypairs;
-const char *secretid;
+const char *pool, *snap, *conf, *user, *image_name;
 Error *local_err = NULL;
 int r;
 
@@ -570,14 +549,11 @@ static int qemu_rbd_connect(rados_t *cluster, rados_ioctx_t *io_ctx,
 goto failed_opts;
 }
 
-secretid = qemu_opt_get(opts, "password-secret");
-
 pool   = qemu_opt_get(opts, "pool");
 conf   = qemu_opt_get(opts, "conf");
 snap   = qemu_opt_get(opts, "snapshot");
 user   = qemu_opt_get(opts, "user");
 image_name = qemu_opt_get(opts, "image");
-keypairs   = qemu_opt_get(opts, "=keyvalue-pairs");
 
 if (!pool || !image_name) {
 error_setg(errp, "Parameters 'pool' and 'image' are required");
@@ -662,6 +638,7 @@ static int qemu_rbd_open(BlockDriverState *bs, QDict *options, int flags,
 BDRVRBDState *s = bs->opaque;
 Error *local_err = NULL;
 const char *filename;
+char *keypairs, *secretid;
 int r;
 
 /* If we are given a filename, parse the filename, with precedence given to
@@ -672,16 +649,28 @@ static int qemu_rbd_open(BlockDriverState *bs, QDict *options, int flags,
 "This is an unsupported option, and may be deprecated "
 "in the future");
 qemu_rbd_parse_filename(filename, options, &local_err);
+qdict_del(options, "filename");
 if (local_err) {
 error_propagate(errp, local_err);
 return -EINVAL;
 }
 }
 
+keypairs = g_strdup(qdict_get_try_str(options, "=keyvalue-pairs"));
+if (keypairs) {
+qdict_del(options, "=keyvalue-pairs");
+}
+
+secretid = g_strdup(qdict_get_try_str(options, "password-secret"));
+if (secretid) {
+qdict_del(options, "password-secret");
+}
+
 r = qemu_rbd_connect(&s->cluster, &s->io_ctx, &s->snap, &s->image_name,
- options, !(flags & BDRV_O_NOCACHE), errp);
+ options, !(flags & BDRV_O_NOCACHE), keypairs, secretid,
+ errp);
 if (r < 0) {
-return r;
+goto out;
 }
 
 /* rbd_open is always r/w */
@@ -708,13 +697,17 @@ static int qemu_rbd_open(BlockDriverState *bs, QDict *options, int flags,
 }
 }
 
-return 0;
+r = 0;
+goto out;
 
 failed_open:
 rados_ioctx_destroy(s->io_ctx);
 g_free(s->snap);
 g_free(s->image_name);
 rados_shutdown(s->cluster);
+out:
+g_free(keypairs);
+g_free(secretid);
 return r;
 }
 
-- 
2.13.6




[Qemu-block] [PATCH v3 18/36] rbd: Fix use after free in qemu_rbd_set_keypairs() error path

2018-02-23 Thread Kevin Wolf
If we want to include the invalid option name in the error message, we
can't free the string earlier than that.
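
A standalone sketch of the ordering problem, for illustration only. RefStr,
refstr_new(), refstr_unref(), conf_set() and apply_key() are hypothetical
stand-ins, not QEMU or librados functions. The key pointer is borrowed from the
reference-counted name, so the reference may only be dropped after the error
message has been formatted.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reference-counted string (stand-in, not QEMU's QString). */
typedef struct {
    int refcnt;
    char *str;
} RefStr;

static RefStr *refstr_new(const char *s)
{
    size_t len = strlen(s) + 1;
    RefStr *r = malloc(sizeof(*r));

    if (r == NULL) {
        abort();
    }
    r->str = malloc(len);
    if (r->str == NULL) {
        abort();
    }
    memcpy(r->str, s, len);
    r->refcnt = 1;
    return r;
}

static void refstr_unref(RefStr *r)
{
    if (r != NULL && --r->refcnt == 0) {
        free(r->str);
        free(r);
    }
}

/* Hypothetical setter: rejects any key that starts with '!'. */
static int conf_set(const char *key)
{
    return key[0] == '!' ? -1 : 0;
}

/* Consumes one reference to @name. */
static int apply_key(RefStr *name, char *errbuf, size_t errlen)
{
    const char *key = name->str;    /* borrowed; valid while @name lives */
    int ret = conf_set(key);

    if (ret < 0) {
        /* Format the message first, then drop the reference. */
        snprintf(errbuf, errlen, "invalid conf option %s", key);
        refstr_unref(name);
        return -1;
    }
    refstr_unref(name);
    return 0;
}
```

Dropping the reference before the snprintf() call would leave key dangling,
which is the pattern the patch removes.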

Cc: qemu-sta...@nongnu.org
Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
Reviewed-by: Eric Blake 
---
 block/rbd.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/block/rbd.c b/block/rbd.c
index 8474b0ba11..27fa11b473 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -268,13 +268,14 @@ static int qemu_rbd_set_keypairs(rados_t cluster, const char *keypairs_json,
 key = qstring_get_str(name);
 
 ret = rados_conf_set(cluster, key, qstring_get_str(value));
-QDECREF(name);
 QDECREF(value);
 if (ret < 0) {
 error_setg_errno(errp, -ret, "invalid conf option %s", key);
+QDECREF(name);
 ret = -EINVAL;
 break;
 }
+QDECREF(name);
 }
 
 QDECREF(keypairs);
-- 
2.13.6




[Qemu-block] [PATCH v3 30/36] ssh: QAPIfy host-key-check option

2018-02-23 Thread Kevin Wolf
This makes the host-key-check option available in blockdev-add.
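
For orientation, here is a rough sketch of how the legacy host_key_check
strings ("no", "yes", "md5:...", "sha1:...") line up with the new mode enum.
It is not the code from this patch: HkcMode, HkcHashType, HostKeyCheck and
parse_legacy_host_key_check() are hypothetical names, and treating a missing
option as known_hosts is taken from the documented default.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mirrors of the schema enums in this patch. */
typedef enum { HKC_MODE_NONE, HKC_MODE_HASH, HKC_MODE_KNOWN_HOSTS } HkcMode;
typedef enum { HKC_HASH_MD5, HKC_HASH_SHA1 } HkcHashType;

typedef struct {
    HkcMode mode;
    HkcHashType hash_type;  /* only meaningful when mode == HKC_MODE_HASH */
    const char *hash;       /* points into the input string, or NULL */
} HostKeyCheck;

/*
 * Translate a legacy host_key_check string into the typed form.
 * Returns 0 on success and -1 for an unknown setting.
 */
static int parse_legacy_host_key_check(const char *s, HostKeyCheck *out)
{
    out->hash_type = HKC_HASH_MD5;
    out->hash = NULL;

    if (s == NULL || strcmp(s, "yes") == 0) {
        out->mode = HKC_MODE_KNOWN_HOSTS;
    } else if (strcmp(s, "no") == 0) {
        out->mode = HKC_MODE_NONE;
    } else if (strncmp(s, "md5:", 4) == 0) {
        out->mode = HKC_MODE_HASH;
        out->hash_type = HKC_HASH_MD5;
        out->hash = s + 4;
    } else if (strncmp(s, "sha1:", 5) == 0) {
        out->mode = HKC_MODE_HASH;
        out->hash_type = HKC_HASH_SHA1;
        out->hash = s + 5;
    } else {
        return -1;
    }
    return 0;
}
```

Once the string is in this form, the checking code can switch on the mode
instead of comparing string prefixes.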

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
---
 qapi/block-core.json | 63 +++--
 block/ssh.c  | 88 +---
 2 files changed, 117 insertions(+), 34 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index f7679fce53..431d4a4fb2 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2553,6 +2553,63 @@
 '*encrypt': 'BlockdevQcow2Encryption' } }
 
 ##
+# @SshHostKeyCheckMode:
+#
+# @none Don't check the host key at all
+# @hash Compare the host key with a given hash
+# @known_hosts  Check the host key against the known_hosts file
+#
+# Since: 2.12
+##
+{ 'enum': 'SshHostKeyCheckMode',
+  'data': [ 'none', 'hash', 'known_hosts' ] }
+
+##
+# @SshHostKeyCheckHashType:
+#
+# @md5  The given hash is an md5 hash
+# @sha1 The given hash is an sha1 hash
+#
+# Since: 2.12
+##
+{ 'enum': 'SshHostKeyCheckHashType',
+  'data': [ 'md5', 'sha1' ] }
+
+##
+# @SshHostKeyHash:
+#
+# @type The hash algorithm used for the hash
+# @hash The expected hash value
+#
+# Since: 2.12
+##
+{ 'struct': 'SshHostKeyHash',
+  'data': { 'type': 'SshHostKeyCheckHashType',
+'hash': 'str' }}
+
+##
+# @SshHostKeyDummy:
+#
+# For those union branches that don't need additional fields.
+#
+# Since: 2.12
+##
+{ 'struct': 'SshHostKeyDummy',
+  'data': {} }
+
+##
+# @SshHostKeyCheck:
+#
+# Since: 2.12
+##
+{ 'union': 'SshHostKeyCheck',
+  'base': { 'mode': 'SshHostKeyCheckMode' },
+  'discriminator': 'mode',
+  'data': { 'none': 'SshHostKeyDummy',
+'hash': 'SshHostKeyHash',
+'known_hosts': 'SshHostKeyDummy' } }
+
+##
 # @BlockdevOptionsSsh:
 #
 # @server:  host address
@@ -2562,14 +2619,16 @@
 # @user:            user as which to connect, defaults to current
 #   local user name
 #
-# TODO: Expose the host_key_check option in QMP
+# @host-key-check:  Defines how and what to check the host key against
+#   (default: known_hosts)
 #
 # Since: 2.9
 ##
 { 'struct': 'BlockdevOptionsSsh',
   'data': { 'server': 'InetSocketAddress',
 'path': 'str',
-'*user': 'str' } }
+'*user': 'str',
+'*host-key-check': 'SshHostKeyCheck' } }
 
 
 ##
diff --git a/block/ssh.c b/block/ssh.c
index 9a89b7f350..dcf766c213 100644
--- a/block/ssh.c
+++ b/block/ssh.c
@@ -430,31 +430,35 @@ check_host_key_hash(BDRVSSHState *s, const char *hash,
 }
 
 static int check_host_key(BDRVSSHState *s, const char *host, int port,
-  const char *host_key_check, Error **errp)
+  SshHostKeyCheck *hkc, Error **errp)
 {
-/* host_key_check=no */
-if (strcmp(host_key_check, "no") == 0) {
-return 0;
-}
+SshHostKeyCheckMode mode;
 
-/* host_key_check=md5:xx:yy:zz:... */
-if (strncmp(host_key_check, "md5:", 4) == 0) {
-return check_host_key_hash(s, &host_key_check[4],
-   LIBSSH2_HOSTKEY_HASH_MD5, 16, errp);
-}
-
-/* host_key_check=sha1:xx:yy:zz:... */
-if (strncmp(host_key_check, "sha1:", 5) == 0) {
-return check_host_key_hash(s, &host_key_check[5],
-   LIBSSH2_HOSTKEY_HASH_SHA1, 20, errp);
+if (hkc) {
+mode = hkc->mode;
+} else {
+mode = SSH_HOST_KEY_CHECK_MODE_KNOWN_HOSTS;
 }
 
-/* host_key_check=yes */
-if (strcmp(host_key_check, "yes") == 0) {
+switch (mode) {
+case SSH_HOST_KEY_CHECK_MODE_NONE:
+return 0;
+case SSH_HOST_KEY_CHECK_MODE_HASH:
+if (hkc->u.hash.type == SSH_HOST_KEY_CHECK_HASH_TYPE_MD5) {
+return check_host_key_hash(s, hkc->u.hash.hash,
+   LIBSSH2_HOSTKEY_HASH_MD5, 16, errp);
+} else if (hkc->u.hash.type == SSH_HOST_KEY_CHECK_HASH_TYPE_SHA1) {
+return check_host_key_hash(s, hkc->u.hash.hash,
+   LIBSSH2_HOSTKEY_HASH_SHA1, 20, errp);
+}
+g_assert_not_reached();
+break;
+case SSH_HOST_KEY_CHECK_MODE_KNOWN_HOSTS:
 return check_host_key_knownhosts(s, host, port, errp);
+default:
+g_assert_not_reached();
 }
 
-error_setg(errp, "unknown host_key_check setting (%s)", host_key_check);
 return -EINVAL;
 }
 
@@ -543,16 +547,22 @@ static QemuOptsList ssh_runtime_opts = {
 .type = QEMU_OPT_NUMBER,
 .help = "Port to connect to",
 },
+{
+.name = "host_key_check",
+.type = QEMU_OPT_STRING,
+.help = "Defines how and what to check the host key against",
+},
 { /* end of list */ }
 },
 };
 
-static bool ssh_process_legacy_socket_options(QDict *output_opts,
-

[Qemu-block] [PATCH v3 17/36] gluster: Support .bdrv_co_create

2018-02-23 Thread Kevin Wolf
This adds the .bdrv_co_create driver callback to gluster, which enables
image creation over QMP.

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
Reviewed-by: Eric Blake 
---
 qapi/block-core.json |  18 ++-
 block/gluster.c  | 135 ++-
 2 files changed, 108 insertions(+), 45 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 0040795603..74021c51d7 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -3377,6 +3377,22 @@
 '*nocow':   'bool' } }
 
 ##
+# @BlockdevCreateOptionsGluster:
+#
+# Driver specific image creation options for gluster.
+#
+# @location Where to store the new image file
+# @size Size of the virtual disk in bytes
+# @preallocation    Preallocation mode for the new image (default: off)
+#
+# Since: 2.12
+##
+{ 'struct': 'BlockdevCreateOptionsGluster',
+  'data': { 'location': 'BlockdevOptionsGluster',
+'size': 'size',
+'*preallocation':   'PreallocMode' } }
+
+##
 # @BlockdevQcow2Version:
 #
 # @v2:  The original QCOW2 format as introduced in qemu 0.10 (version 2)
@@ -3450,7 +3466,7 @@
   'file':   'BlockdevCreateOptionsFile',
   'ftp':'BlockdevCreateNotSupported',
   'ftps':   'BlockdevCreateNotSupported',
-  'gluster':'BlockdevCreateNotSupported',
+  'gluster':'BlockdevCreateOptionsGluster',
   'host_cdrom': 'BlockdevCreateNotSupported',
   'host_device':'BlockdevCreateNotSupported',
   'http':   'BlockdevCreateNotSupported',
diff --git a/block/gluster.c b/block/gluster.c
index 1a07d221d1..6e2f0e3185 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -655,9 +655,11 @@ out:
 return -errno;
 }
 
-static struct glfs *qemu_gluster_init(BlockdevOptionsGluster *gconf,
-  const char *filename,
-  QDict *options, Error **errp)
+/* Converts options given in @filename and the @options QDict into the QAPI
+ * object @gconf. */
+static int qemu_gluster_parse(BlockdevOptionsGluster *gconf,
+  const char *filename,
+  QDict *options, Error **errp)
 {
 int ret;
 if (filename) {
@@ -668,8 +670,7 @@ static struct glfs *qemu_gluster_init(BlockdevOptionsGluster *gconf,
 "[host[:port]]volume/path[?socket=...]"
 "[,file.debug=N]"
 "[,file.logfile=/path/filename.log]\n");
-errno = -ret;
-return NULL;
+return ret;
 }
 } else {
 ret = qemu_gluster_parse_json(gconf, options, errp);
@@ -685,10 +686,23 @@ static struct glfs *qemu_gluster_init(BlockdevOptionsGluster *gconf,
  "file.server.1.transport=unix,"
  "file.server.1.socket=/var/run/glusterd.socket ..."
  "\n");
-errno = -ret;
-return NULL;
+return ret;
 }
+}
 
+return 0;
+}
+
+static struct glfs *qemu_gluster_init(BlockdevOptionsGluster *gconf,
+  const char *filename,
+  QDict *options, Error **errp)
+{
+int ret;
+
+ret = qemu_gluster_parse(gconf, filename, options, errp);
+if (ret < 0) {
+errno = -ret;
+return NULL;
 }
 
 return qemu_gluster_glfs_init(gconf, errp);
@@ -1021,19 +1035,71 @@ static int qemu_gluster_do_truncate(struct glfs_fd *fd, int64_t offset,
 return 0;
 }
 
-static int qemu_gluster_create(const char *filename,
-   QemuOpts *opts, Error **errp)
+static int qemu_gluster_co_create(BlockdevCreateOptions *options,
+  Error **errp)
 {
-BlockdevOptionsGluster *gconf;
+BlockdevCreateOptionsGluster *opts = &options->u.gluster;
 struct glfs *glfs;
 struct glfs_fd *fd = NULL;
 int ret = 0;
-PreallocMode prealloc;
-int64_t total_size = 0;
+
+assert(options->driver == BLOCKDEV_DRIVER_GLUSTER);
+
+glfs = qemu_gluster_glfs_init(opts->location, errp);
+if (!glfs) {
+ret = -errno;
+goto out;
+}
+
+fd = glfs_creat(glfs, opts->location->path,
+O_WRONLY | O_CREAT | O_TRUNC | O_BINARY, S_IRUSR | S_IWUSR);
+if (!fd) {
+ret = -errno;
+goto out;
+}
+
+ret = qemu_gluster_do_truncate(fd, opts->size, opts->preallocation, errp);
+
+out:
+if (fd) {
+if (glfs_close(fd) != 0 && ret == 0) {
+ret = -errno;
+}
+}
+glfs_clear_preopened(glfs);
+return ret;
+}
+
+static int qemu_gluster_create(const char *filename,
+   QemuOpts *opts, Error **errp)
+{
+

[Qemu-block] [PATCH v3 16/36] file-win32: Support .bdrv_co_create

2018-02-23 Thread Kevin Wolf
This adds the .bdrv_co_create driver callback to file-win32, which
enables image creation over QMP.
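
The general shape of this conversion can be sketched as follows, under the
assumption that the legacy entry point only translates its arguments and then
delegates to the typed one. CreateOptionsFile, typed_create() and
legacy_create() are hypothetical names, the 512-byte sector size is assumed,
and the actual file creation is replaced by recording the requested size.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512     /* assumed sector size for this sketch */
#define ROUND_UP(n, d) ((((n) + (d) - 1) / (d)) * (d))

/* Hypothetical typed create options, standing in for the QAPI struct. */
typedef struct {
    const char *filename;
    int64_t size;
    bool has_preallocation;
    bool has_nocow;
} CreateOptionsFile;

/* Records the size the typed path was asked to create (-1 if never called). */
static int64_t last_created_size = -1;

/* Typed entry point: rejects unsupported options, then "creates" the file. */
static int typed_create(const CreateOptionsFile *o)
{
    if (o->has_preallocation || o->has_nocow) {
        return -EINVAL;
    }
    last_created_size = o->size;    /* stands in for open + ftruncate */
    return 0;
}

/* Legacy entry point: translate the old arguments, then delegate. */
static int legacy_create(const char *filename, int64_t requested_size)
{
    CreateOptionsFile o;

    if (strncmp(filename, "file:", 5) == 0) {
        filename += 5;              /* strip the optional protocol prefix */
    }
    o = (CreateOptionsFile) {
        .filename           = filename,
        .size               = ROUND_UP(requested_size, SECTOR_SIZE),
        .has_preallocation  = false,
        .has_nocow          = false,
    };
    return typed_create(&o);
}
```

Keeping all validation in the typed function means both entry points reject
unsupported options in the same way.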

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
Reviewed-by: Eric Blake 
---
 block/file-win32.c | 45 +
 1 file changed, 37 insertions(+), 8 deletions(-)

diff --git a/block/file-win32.c b/block/file-win32.c
index f24c7bb92c..d572cde357 100644
--- a/block/file-win32.c
+++ b/block/file-win32.c
@@ -553,29 +553,58 @@ static int64_t raw_get_allocated_file_size(BlockDriverState *bs)
 return st.st_size;
 }
 
-static int raw_create(const char *filename, QemuOpts *opts, Error **errp)
+static int raw_co_create(BlockdevCreateOptions *options, Error **errp)
 {
+BlockdevCreateOptionsFile *file_opts;
 int fd;
-int64_t total_size = 0;
 
-strstart(filename, "file:", &filename);
+assert(options->driver == BLOCKDEV_DRIVER_FILE);
+file_opts = &options->u.file;
 
-/* Read out options */
-total_size = ROUND_UP(qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0),
-  BDRV_SECTOR_SIZE);
+if (file_opts->has_preallocation) {
+error_setg(errp, "Preallocation is not supported on Windows");
+return -EINVAL;
+}
+if (file_opts->has_nocow) {
+error_setg(errp, "nocow is not supported on Windows");
+return -EINVAL;
+}
 
-fd = qemu_open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY,
+fd = qemu_open(file_opts->filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY,
0644);
 if (fd < 0) {
 error_setg_errno(errp, errno, "Could not create file");
 return -EIO;
 }
 set_sparse(fd);
-ftruncate(fd, total_size);
+ftruncate(fd, file_opts->size);
 qemu_close(fd);
+
 return 0;
 }
 
+static int raw_create(const char *filename, QemuOpts *opts, Error **errp)
+{
+BlockdevCreateOptions options;
+int64_t total_size = 0;
+
+strstart(filename, "file:", &filename);
+
+/* Read out options */
+total_size = ROUND_UP(qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0),
+  BDRV_SECTOR_SIZE);
+
+options = (BlockdevCreateOptions) {
+.driver = BLOCKDEV_DRIVER_FILE,
+.u.file = {
+.filename   = (char *) filename,
+.size   = total_size,
+.has_preallocation  = false,
+.has_nocow  = false,
+},
+};
+return raw_co_create(&options, errp);
+}
 
 static QemuOptsList raw_create_opts = {
 .name = "raw-create-opts",
-- 
2.13.6




[Qemu-block] [PATCH v3 27/36] sheepdog: QAPIfy "redundancy" create option

2018-02-23 Thread Kevin Wolf
The "redundancy" option for Sheepdog image creation is currently a
string that can encode one or two integers depending on its format,
which at the same time implicitly selects a mode.

This patch turns it into a QAPI union and converts the string into such
a QAPI object before interpreting the values.
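
A standalone sketch of the two steps, for illustration only. Redundancy,
redundancy_from_str() and redundancy_to_policy() are hypothetical names, the
limits (1 to 31 copies, 1 to 15 parity strips) are taken from the schema
comments in this patch, and strchr() is used here in place of the driver's
strtok() splitting.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef enum { RED_FULL, RED_ERASURE_CODED } RedType;

/* Hypothetical tagged union mirroring the schema added by this patch. */
typedef struct {
    RedType type;
    union {
        struct { int64_t copies; } full;
        struct { int64_t data_strips, parity_strips; } erasure_coded;
    } u;
} Redundancy;

/* Strict decimal parse: rejects empty input and trailing garbage. */
static int parse_long(const char *s, long *out)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(s, &end, 10);
    if (errno != 0 || end == s || *end != '\0') {
        return -EINVAL;
    }
    *out = v;
    return 0;
}

/* Step 1: turn "x" or "x:y" into the typed object; no range checks yet. */
static int redundancy_from_str(const char *opt, Redundancy *out)
{
    char buf[16];
    char *colon;
    long a, b;
    size_t len = strlen(opt);

    if (len >= sizeof(buf)) {
        return -EINVAL;
    }
    memcpy(buf, opt, len + 1);
    colon = strchr(buf, ':');
    if (colon != NULL) {
        *colon = '\0';
    }
    if (parse_long(buf, &a) < 0) {
        return -EINVAL;
    }
    if (colon == NULL) {
        out->type = RED_FULL;
        out->u.full.copies = a;
        return 0;
    }
    if (parse_long(colon + 1, &b) < 0) {
        return -EINVAL;
    }
    out->type = RED_ERASURE_CODED;
    out->u.erasure_coded.data_strips = a;
    out->u.erasure_coded.parity_strips = b;
    return 0;
}

/* Step 2: validate the typed object and derive the inode fields. */
static int redundancy_to_policy(const Redundancy *r, uint8_t *copy_policy,
                                uint8_t *nr_copies)
{
    if (r->type == RED_FULL) {
        if (r->u.full.copies < 1 || r->u.full.copies > 31) {
            return -EINVAL;
        }
        *copy_policy = 0;
        *nr_copies = (uint8_t)r->u.full.copies;
        return 0;
    } else {
        int64_t data = r->u.erasure_coded.data_strips;
        int64_t parity = r->u.erasure_coded.parity_strips;

        if (data != 2 && data != 4 && data != 8 && data != 16) {
            return -EINVAL;
        }
        if (parity < 1 || parity > 15) {
            return -EINVAL;
        }
        /* Upper 4 bits: data strips / 2; lower 4 bits: parity strips. */
        *copy_policy = (uint8_t)(((data / 2) << 4) + parity);
        *nr_copies = (uint8_t)(data + parity);
        return 0;
    }
}
```

For example, "4:2" parses to an erasure-coded object and yields a copy policy
of (4 / 2 << 4) + 2 = 34 with 6 copies, while "3:1" parses successfully but is
rejected in the second step because 3 is not a valid data strip count.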

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
---
 qapi/block-core.json | 45 +
 block/sheepdog.c | 94 +---
 2 files changed, 112 insertions(+), 27 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 085b791303..2b249c9e3d 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -3467,6 +3467,51 @@
 '*cluster-size' :   'size' } }
 
 ##
+# @SheepdogRedundancyType:
+#
+# @full Create a fully replicated vdi with x copies
+# @erasure-coded    Create an erasure coded vdi with x data strips and
+#   y parity strips
+#
+# Since: 2.12
+##
+{ 'enum': 'SheepdogRedundancyType',
+  'data': [ 'full', 'erasure-coded' ] }
+
+##
+# @SheepdogRedundancyFull:
+#
+# @copies   Number of copies to use (between 1 and 31)
+#
+# Since: 2.12
+##
+{ 'struct': 'SheepdogRedundancyFull',
+  'data': { 'copies': 'int' }}
+
+##
+# @SheepdogRedundancyErasureCoded:
+#
+# @data-strips  Number of data strips to use (one of {2,4,8,16})
+# @parity-strips    Number of parity strips to use (between 1 and 15)
+#
+# Since: 2.12
+##
+{ 'struct': 'SheepdogRedundancyErasureCoded',
+  'data': { 'data-strips': 'int',
+'parity-strips': 'int' }}
+
+##
+# @SheepdogRedundancy:
+#
+# Since: 2.12
+##
+{ 'union': 'SheepdogRedundancy',
+  'base': { 'type': 'SheepdogRedundancyType' },
+  'discriminator': 'type',
+  'data': { 'full': 'SheepdogRedundancyFull',
+'erasure-coded': 'SheepdogRedundancyErasureCoded' } }
+
+##
 # @BlockdevCreateNotSupported:
 #
 # This is used for all drivers that don't support creating images.
diff --git a/block/sheepdog.c b/block/sheepdog.c
index 3c3becf94d..22df2ba9d0 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -1882,6 +1882,48 @@ out_with_err_set:
 return ret;
 }
 
+static int parse_redundancy(BDRVSheepdogState *s, SheepdogRedundancy *opt)
+{
+struct SheepdogInode *inode = &s->inode;
+
+switch (opt->type) {
+case SHEEPDOG_REDUNDANCY_TYPE_FULL:
+if (opt->u.full.copies > SD_MAX_COPIES || opt->u.full.copies < 1) {
+return -EINVAL;
+}
+inode->copy_policy = 0;
+inode->nr_copies = opt->u.full.copies;
+return 0;
+
+case SHEEPDOG_REDUNDANCY_TYPE_ERASURE_CODED:
+{
+int64_t copy = opt->u.erasure_coded.data_strips;
+int64_t parity = opt->u.erasure_coded.parity_strips;
+
+if (copy != 2 && copy != 4 && copy != 8 && copy != 16) {
+return -EINVAL;
+}
+
+if (parity >= SD_EC_MAX_STRIP || parity < 1) {
+return -EINVAL;
+}
+
+/*
+ * 4 bits for parity and 4 bits for data.
+ * We have to compress upper data bits because it can't represent 16
+ */
+inode->copy_policy = ((copy / 2) << 4) + parity;
+inode->nr_copies = copy + parity;
+return 0;
+}
+
+default:
+g_assert_not_reached();
+}
+
+return -EINVAL;
+}
+
 /*
  * Sheepdog support two kinds of redundancy, full replication and erasure
  * coding.
@@ -1892,12 +1934,13 @@ out_with_err_set:
  * # create a erasure coded vdi with x data strips and y parity strips
  * -o redundancy=x:y (x must be one of {2,4,8,16} and 1 <= y < SD_EC_MAX_STRIP)
  */
-static int parse_redundancy(BDRVSheepdogState *s, const char *opt)
+static int parse_redundancy_str(BDRVSheepdogState *s, const char *opt)
 {
-struct SheepdogInode *inode = &s->inode;
+struct SheepdogRedundancy redundancy;
 const char *n1, *n2;
 long copy, parity;
 char p[10];
+int ret;
 
 pstrcpy(p, sizeof(p), opt);
 n1 = strtok(p, ":");
@@ -1907,35 +1950,32 @@ static int parse_redundancy(BDRVSheepdogState *s, const char *opt)
 return -EINVAL;
 }
 
-copy = strtol(n1, NULL, 10);
-/* FIXME fix error checking by switching to qemu_strtol() */
-if (copy > SD_MAX_COPIES || copy < 1) {
-return -EINVAL;
-}
-if (!n2) {
-inode->copy_policy = 0;
-inode->nr_copies = copy;
-return 0;
+ret = qemu_strtol(n1, NULL, 10, &copy);
+if (ret < 0) {
+return ret;
 }
 
-if (copy != 2 && copy != 4 && copy != 8 && copy != 16) {
-return -EINVAL;
-}
+if (!n2) {
+redundancy = (SheepdogRedundancy) {
+.type   = SHEEPDOG_REDUNDANCY_TYPE_FULL,
+.u.full.copies  = copy,
+};
+} else {
+ret = qemu_strtol(n2, NULL, 10, &parity);
+if (ret < 0) {
+return ret;
+}
 
-parity = strtol(n2, NULL, 10);
-/* FIXME fix error checking by 

[Qemu-block] [PATCH v3 25/36] nfs: Use QAPI options in nfs_client_open()

2018-02-23 Thread Kevin Wolf
Using the QAPI visitor to turn all options into QAPI BlockdevOptionsNfs
simplifies the code a lot. It will also be useful for implementing the
QAPI based .bdrv_co_create callback.

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
---
 block/nfs.c | 176 ++--
 1 file changed, 53 insertions(+), 123 deletions(-)

diff --git a/block/nfs.c b/block/nfs.c
index 6576a73d6e..9283bfbaae 100644
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -367,49 +367,6 @@ static int coroutine_fn nfs_co_flush(BlockDriverState *bs)
 return task.ret;
 }
 
-static QemuOptsList runtime_opts = {
-.name = "nfs",
-.head = QTAILQ_HEAD_INITIALIZER(runtime_opts.head),
-.desc = {
-{
-.name = "path",
-.type = QEMU_OPT_STRING,
-.help = "Path of the image on the host",
-},
-{
-.name = "user",
-.type = QEMU_OPT_NUMBER,
-.help = "UID value to use when talking to the server",
-},
-{
-.name = "group",
-.type = QEMU_OPT_NUMBER,
-.help = "GID value to use when talking to the server",
-},
-{
-.name = "tcp-syn-count",
-.type = QEMU_OPT_NUMBER,
-.help = "Number of SYNs to send during the session establish",
-},
-{
-.name = "readahead-size",
-.type = QEMU_OPT_NUMBER,
-.help = "Set the readahead size in bytes",
-},
-{
-.name = "page-cache-size",
-.type = QEMU_OPT_NUMBER,
-.help = "Set the pagecache size in bytes",
-},
-{
-.name = "debug",
-.type = QEMU_OPT_NUMBER,
-.help = "Set the NFS debug level (max 2)",
-},
-{ /* end of list */ }
-},
-};
-
 static void nfs_detach_aio_context(BlockDriverState *bs)
 {
 NFSClient *client = bs->opaque;
@@ -452,71 +409,16 @@ static void nfs_file_close(BlockDriverState *bs)
 nfs_client_close(client);
 }
 
-static NFSServer *nfs_config(QDict *options, Error **errp)
-{
-NFSServer *server = NULL;
-QDict *addr = NULL;
-QObject *crumpled_addr = NULL;
-Visitor *iv = NULL;
-Error *local_error = NULL;
-
-qdict_extract_subqdict(options, &addr, "server.");
-if (!qdict_size(addr)) {
-error_setg(errp, "NFS server address missing");
-goto out;
-}
-
-crumpled_addr = qdict_crumple(addr, errp);
-if (!crumpled_addr) {
-goto out;
-}
-
-/*
- * Caution: this works only because all scalar members of
- * NFSServer are QString in @crumpled_addr.  The visitor expects
- * @crumpled_addr to be typed according to the QAPI schema.  It
- * is when @options come from -blockdev or blockdev_add.  But when
- * they come from -drive, they're all QString.
- */
-iv = qobject_input_visitor_new(crumpled_addr);
-visit_type_NFSServer(iv, NULL, &server, &local_error);
-if (local_error) {
-error_propagate(errp, local_error);
-goto out;
-}
-
-out:
-QDECREF(addr);
-qobject_decref(crumpled_addr);
-visit_free(iv);
-return server;
-}
-
-
-static int64_t nfs_client_open(NFSClient *client, QDict *options,
+static int64_t nfs_client_open(NFSClient *client, BlockdevOptionsNfs *opts,
int flags, int open_flags, Error **errp)
 {
 int64_t ret = -EINVAL;
-QemuOpts *opts = NULL;
-Error *local_err = NULL;
 struct stat st;
 char *file = NULL, *strp = NULL;
 
 qemu_mutex_init(&client->mutex);
-opts = qemu_opts_create(&runtime_opts, NULL, 0, &error_abort);
-qemu_opts_absorb_qdict(opts, options, &local_err);
-if (local_err) {
-error_propagate(errp, local_err);
-ret = -EINVAL;
-goto fail;
-}
 
-client->path = g_strdup(qemu_opt_get(opts, "path"));
-if (!client->path) {
-ret = -EINVAL;
-error_setg(errp, "No path was specified");
-goto fail;
-}
+client->path = g_strdup(opts->path);
 
 strp = strrchr(client->path, '/');
 if (strp == NULL) {
@@ -526,12 +428,10 @@ static int64_t nfs_client_open(NFSClient *client, QDict *options,
 file = g_strdup(strp);
 *strp = 0;
 
-/* Pop the config into our state object, Exit if invalid */
-client->server = nfs_config(options, errp);
-if (!client->server) {
-ret = -EINVAL;
-goto fail;
-}
+/* Steal the NFSServer object from opts; set the original pointer to NULL
+ * to avoid use after free and double free. */
+client->server = opts->server;
+opts->server = NULL;
 
 client->context = nfs_init_context();
 if (client->context == NULL) {
@@ -539,29 +439,29 @@ static int64_t nfs_client_open(NFSClient *client, QDict *options,
 goto fail;
 }
 
-if (qemu_opt_get(opts, "user")) {
-client->uid = qemu_opt_get_number(opts, "user", 0);
+if (opts->has_user) {
+

[Qemu-block] [PATCH v3 10/36] test-qemu-opts: Test qemu_opts_to_qdict_filtered()

2018-02-23 Thread Kevin Wolf
Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
Reviewed-by: Max Reitz 
---
 tests/test-qemu-opts.c | 125 +
 1 file changed, 125 insertions(+)

diff --git a/tests/test-qemu-opts.c b/tests/test-qemu-opts.c
index 6c3183390b..2c422abcd4 100644
--- a/tests/test-qemu-opts.c
+++ b/tests/test-qemu-opts.c
@@ -10,6 +10,7 @@
 #include "qemu/osdep.h"
 #include "qemu/cutils.h"
 #include "qemu/option.h"
+#include "qemu/option_int.h"
 #include "qapi/error.h"
 #include "qapi/qmp/qdict.h"
 #include "qapi/qmp/qstring.h"
@@ -868,6 +869,127 @@ static void test_opts_append(void)
 qemu_opts_free(merged);
 }
 
+static void test_opts_to_qdict_basic(void)
+{
+QemuOpts *opts;
+QDict *dict;
+
+opts = qemu_opts_parse(&opts_list_01, "str1=foo,str2=,str3=bar,number1=42",
+   false, &error_abort);
+g_assert(opts != NULL);
+
+dict = qemu_opts_to_qdict(opts, NULL);
+g_assert(dict != NULL);
+
+g_assert_cmpstr(qdict_get_str(dict, "str1"), ==, "foo");
+g_assert_cmpstr(qdict_get_str(dict, "str2"), ==, "");
+g_assert_cmpstr(qdict_get_str(dict, "str3"), ==, "bar");
+g_assert_cmpstr(qdict_get_str(dict, "number1"), ==, "42");
+g_assert_false(qdict_haskey(dict, "number2"));
+
+QDECREF(dict);
+qemu_opts_del(opts);
+}
+
+static void test_opts_to_qdict_filtered(void)
+{
+QemuOptsList *first, *merged;
+QemuOpts *opts;
+QDict *dict;
+
+first = qemu_opts_append(NULL, &opts_list_02);
+merged = qemu_opts_append(first, &opts_list_01);
+
+opts = qemu_opts_parse(merged,
+   "str1=foo,str2=,str3=bar,bool1=off,number1=42",
+   false, &error_abort);
+g_assert(opts != NULL);
+
+/* Convert to QDict without deleting from opts */
+dict = qemu_opts_to_qdict_filtered(opts, NULL, &opts_list_01, false);
+g_assert(dict != NULL);
+g_assert_cmpstr(qdict_get_str(dict, "str1"), ==, "foo");
+g_assert_cmpstr(qdict_get_str(dict, "str2"), ==, "");
+g_assert_cmpstr(qdict_get_str(dict, "str3"), ==, "bar");
+g_assert_cmpstr(qdict_get_str(dict, "number1"), ==, "42");
+g_assert_false(qdict_haskey(dict, "number2"));
+g_assert_false(qdict_haskey(dict, "bool1"));
+QDECREF(dict);
+
+dict = qemu_opts_to_qdict_filtered(opts, NULL, &opts_list_02, false);
+g_assert(dict != NULL);
+g_assert_cmpstr(qdict_get_str(dict, "str1"), ==, "foo");
+g_assert_cmpstr(qdict_get_str(dict, "str2"), ==, "");
+g_assert_cmpstr(qdict_get_str(dict, "bool1"), ==, "off");
+g_assert_false(qdict_haskey(dict, "str3"));
+g_assert_false(qdict_haskey(dict, "number1"));
+g_assert_false(qdict_haskey(dict, "number2"));
+QDECREF(dict);
+
+/* Now delete converted options from opts */
+dict = qemu_opts_to_qdict_filtered(opts, NULL, &opts_list_01, true);
+g_assert(dict != NULL);
+g_assert_cmpstr(qdict_get_str(dict, "str1"), ==, "foo");
+g_assert_cmpstr(qdict_get_str(dict, "str2"), ==, "");
+g_assert_cmpstr(qdict_get_str(dict, "str3"), ==, "bar");
+g_assert_cmpstr(qdict_get_str(dict, "number1"), ==, "42");
+g_assert_false(qdict_haskey(dict, "number2"));
+g_assert_false(qdict_haskey(dict, "bool1"));
+QDECREF(dict);
+
+dict = qemu_opts_to_qdict_filtered(opts, NULL, &opts_list_02, true);
+g_assert(dict != NULL);
+g_assert_cmpstr(qdict_get_str(dict, "bool1"), ==, "off");
+g_assert_false(qdict_haskey(dict, "str1"));
+g_assert_false(qdict_haskey(dict, "str2"));
+g_assert_false(qdict_haskey(dict, "str3"));
+g_assert_false(qdict_haskey(dict, "number1"));
+g_assert_false(qdict_haskey(dict, "number2"));
+QDECREF(dict);
+
+g_assert_true(QTAILQ_EMPTY(&opts->head));
+
+qemu_opts_del(opts);
+qemu_opts_free(merged);
+}
+
+static void test_opts_to_qdict_duplicates(void)
+{
+QemuOpts *opts;
+QemuOpt *opt;
+QDict *dict;
+
+opts = qemu_opts_parse(&opts_list_03, "foo=a,foo=b", false, &error_abort);
+g_assert(opts != NULL);
+
+/* Verify that opts has two options with the same name */
+opt = QTAILQ_FIRST(&opts->head);
+g_assert_cmpstr(opt->name, ==, "foo");
+g_assert_cmpstr(opt->str , ==, "a");
+
+opt = QTAILQ_NEXT(opt, next);
+g_assert_cmpstr(opt->name, ==, "foo");
+g_assert_cmpstr(opt->str , ==, "b");
+
+opt = QTAILQ_NEXT(opt, next);
+g_assert(opt == NULL);
+
+/* In the conversion to QDict, the last one wins */
+dict = qemu_opts_to_qdict(opts, NULL);
+g_assert(dict != NULL);
+g_assert_cmpstr(qdict_get_str(dict, "foo"), ==, "b");
+QDECREF(dict);
+
+/* The last one still wins if entries are deleted, and both are deleted */
+dict = qemu_opts_to_qdict_filtered(opts, NULL, NULL, true);
+g_assert(dict != NULL);
+g_assert_cmpstr(qdict_get_str(dict, "foo"), ==, "b");
+QDECREF(dict);
+
+g_assert_true(QTAILQ_EMPTY(&opts->head));
+
+qemu_opts_del(opts);
+}
 
 int main(int argc, char *argv[])
 {
@@ -889,6 

[Qemu-block] [PATCH v3 24/36] rbd: Use qemu_rbd_connect() in qemu_rbd_do_create()

2018-02-23 Thread Kevin Wolf
This is almost exactly the same code. The differences are that
qemu_rbd_connect() supports BlockdevOptionsRbd.server and that the cache
mode is set explicitly.

Supporting 'server' is a welcome new feature for image creation.
Caching is disabled by default, so leave it that way.

Signed-off-by: Kevin Wolf 
---
 block/rbd.c | 54 ++
 1 file changed, 10 insertions(+), 44 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index 300a304ec3..624b3c4eac 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -103,6 +103,11 @@ typedef struct BDRVRBDState {
 char *snap;
 } BDRVRBDState;
 
+static int qemu_rbd_connect(rados_t *cluster, rados_ioctx_t *io_ctx,
+BlockdevOptionsRbd *opts, bool cache,
+const char *keypairs, const char *secretid,
+Error **errp);
+
 static char *qemu_rbd_next_tok(char *src, char delim, char **p)
 {
 char *end;
@@ -351,12 +356,6 @@ static int qemu_rbd_do_create(BlockdevCreateOptions *options,
 return -EINVAL;
 }
 
-/* TODO Remove the limitation */
-if (opts->location->has_server) {
-error_setg(errp, "Can't specify server for image creation");
-return -EINVAL;
-}
-
 if (opts->has_cluster_size) {
 int64_t objsize = opts->cluster_size;
 if ((objsize - 1) & objsize) {/* not a power of 2? */
@@ -370,54 +369,21 @@ static int qemu_rbd_do_create(BlockdevCreateOptions *options,
 obj_order = ctz32(objsize);
 }
 
-ret = rados_create(&cluster, opts->location->user);
+ret = qemu_rbd_connect(&cluster, &io_ctx, opts->location, false, keypairs,
+   password_secret, errp);
 if (ret < 0) {
-error_setg_errno(errp, -ret, "error initializing");
 return ret;
 }
 
-/* try default location when conf=NULL, but ignore failure */
-ret = rados_conf_read_file(cluster, opts->location->conf);
-if (opts->location->conf && ret < 0) {
-error_setg_errno(errp, -ret, "error reading conf file %s",
- opts->location->conf);
-ret = -EIO;
-goto shutdown;
-}
-
-ret = qemu_rbd_set_keypairs(cluster, keypairs, errp);
-if (ret < 0) {
-ret = -EIO;
-goto shutdown;
-}
-
-if (qemu_rbd_set_auth(cluster, password_secret, errp) < 0) {
-ret = -EIO;
-goto shutdown;
-}
-
-ret = rados_connect(cluster);
-if (ret < 0) {
-error_setg_errno(errp, -ret, "error connecting");
-goto shutdown;
-}
-
-ret = rados_ioctx_create(cluster, opts->location->pool, &io_ctx);
-if (ret < 0) {
-error_setg_errno(errp, -ret, "error opening pool %s",
- opts->location->pool);
-goto shutdown;
-}
-
 ret = rbd_create(io_ctx, opts->location->image, opts->size, &obj_order);
 if (ret < 0) {
 error_setg_errno(errp, -ret, "error rbd create");
+goto out;
 }
 
-rados_ioctx_destroy(io_ctx);
-
 ret = 0;
-shutdown:
+out:
+rados_ioctx_destroy(io_ctx);
 rados_shutdown(cluster);
 return ret;
 }
-- 
2.13.6




[Qemu-block] [PATCH v3 23/36] rbd: Assign s->snap/image_name in qemu_rbd_open()

2018-02-23 Thread Kevin Wolf
Now that the options are already available in qemu_rbd_open() and not
only parsed in qemu_rbd_connect(), we can assign s->snap and
s->image_name there instead of passing the fields by reference to
qemu_rbd_connect().

Signed-off-by: Kevin Wolf 
---
 block/rbd.c | 14 +-
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index ee71dc8941..300a304ec3 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -571,7 +571,6 @@ static char *qemu_rbd_mon_host(BlockdevOptionsRbd *opts, Error **errp)
 }
 
 static int qemu_rbd_connect(rados_t *cluster, rados_ioctx_t *io_ctx,
-char **s_snap, char **s_image_name,
 BlockdevOptionsRbd *opts, bool cache,
 const char *keypairs, const char *secretid,
 Error **errp)
@@ -593,9 +592,6 @@ static int qemu_rbd_connect(rados_t *cluster, rados_ioctx_t *io_ctx,
 goto failed_opts;
 }
 
-*s_snap = g_strdup(opts->snapshot);
-*s_image_name = g_strdup(opts->image);
-
 /* try default location when conf=NULL, but ignore failure */
 r = rados_conf_read_file(*cluster, opts->conf);
 if (opts->has_conf && r < 0) {
@@ -649,8 +645,6 @@ static int qemu_rbd_connect(rados_t *cluster, rados_ioctx_t *io_ctx,
 
 failed_shutdown:
 rados_shutdown(*cluster);
-g_free(*s_snap);
-g_free(*s_image_name);
 failed_opts:
 g_free(mon_host);
 return r;
@@ -711,13 +705,15 @@ static int qemu_rbd_open(BlockDriverState *bs, QDict *options, int flags,
 goto out;
 }
 
-r = qemu_rbd_connect(&s->cluster, &s->io_ctx, &s->snap, &s->image_name,
- opts, !(flags & BDRV_O_NOCACHE), keypairs, secretid,
- errp);
+r = qemu_rbd_connect(&s->cluster, &s->io_ctx, opts,
+ !(flags & BDRV_O_NOCACHE), keypairs, secretid, errp);
 if (r < 0) {
 goto out;
 }
 
+s->snap = g_strdup(opts->snapshot);
+s->image_name = g_strdup(opts->image);
+
 /* rbd_open is always r/w */
 r = rbd_open(s->io_ctx, s->image_name, &s->image, s->snap);
 if (r < 0) {
-- 
2.13.6




[Qemu-block] [PATCH v3 15/36] file-posix: Support .bdrv_co_create

2018-02-23 Thread Kevin Wolf
This adds the .bdrv_co_create driver callback to file, which enables
image creation over QMP.

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
Reviewed-by: Eric Blake 
---
 qapi/block-core.json | 20 +-
 block/file-posix.c   | 77 +---
 2 files changed, 74 insertions(+), 23 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 359195a1a3..0040795603 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -3359,6 +3359,24 @@
 { 'command': 'blockdev-del', 'data': { 'node-name': 'str' } }
 
 ##
+# @BlockdevCreateOptionsFile:
+#
+# Driver specific image creation options for file.
+#
+# @filename         Filename for the new image file
+# @size             Size of the virtual disk in bytes
+# @preallocation    Preallocation mode for the new image (default: off)
+# @nocow            Turn off copy-on-write (valid only on btrfs; default: off)
+#
+# Since: 2.12
+##
+{ 'struct': 'BlockdevCreateOptionsFile',
+  'data': { 'filename': 'str',
+'size': 'size',
+'*preallocation':   'PreallocMode',
+'*nocow':   'bool' } }
+
+##
 # @BlockdevQcow2Version:
 #
 # @v2:  The original QCOW2 format as introduced in qemu 0.10 (version 2)
@@ -3429,7 +3447,7 @@
   'bochs':  'BlockdevCreateNotSupported',
   'cloop':  'BlockdevCreateNotSupported',
   'dmg':'BlockdevCreateNotSupported',
-  'file':   'BlockdevCreateNotSupported',
+  'file':   'BlockdevCreateOptionsFile',
   'ftp':'BlockdevCreateNotSupported',
   'ftps':   'BlockdevCreateNotSupported',
   'gluster':'BlockdevCreateNotSupported',
diff --git a/block/file-posix.c b/block/file-posix.c
index f1591c3849..ba14ed9459 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1982,33 +1982,25 @@ static int64_t raw_get_allocated_file_size(BlockDriverState *bs)
 return (int64_t)st.st_blocks * 512;
 }
 
-static int raw_create(const char *filename, QemuOpts *opts, Error **errp)
+static int raw_co_create(BlockdevCreateOptions *options, Error **errp)
 {
+BlockdevCreateOptionsFile *file_opts;
 int fd;
 int result = 0;
-int64_t total_size = 0;
-bool nocow = false;
-PreallocMode prealloc;
-char *buf = NULL;
-Error *local_err = NULL;
 
-strstart(filename, "file:", &filename);
+/* Validate options and set default values */
+assert(options->driver == BLOCKDEV_DRIVER_FILE);
+file_opts = &options->u.file;
 
-/* Read out options */
-total_size = ROUND_UP(qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0),
-  BDRV_SECTOR_SIZE);
-nocow = qemu_opt_get_bool(opts, BLOCK_OPT_NOCOW, false);
-buf = qemu_opt_get_del(opts, BLOCK_OPT_PREALLOC);
-prealloc = qapi_enum_parse(&PreallocMode_lookup, buf,
-   PREALLOC_MODE_OFF, &local_err);
-g_free(buf);
-if (local_err) {
-error_propagate(errp, local_err);
-result = -EINVAL;
-goto out;
+if (!file_opts->has_nocow) {
+file_opts->nocow = false;
+}
+if (!file_opts->has_preallocation) {
+file_opts->preallocation = PREALLOC_MODE_OFF;
 }
 
-fd = qemu_open(filename, O_RDWR | O_CREAT | O_TRUNC | O_BINARY,
+/* Create file */
+fd = qemu_open(file_opts->filename, O_RDWR | O_CREAT | O_TRUNC | O_BINARY,
0644);
 if (fd < 0) {
 result = -errno;
@@ -2016,7 +2008,7 @@ static int raw_create(const char *filename, QemuOpts *opts, Error **errp)
 goto out;
 }
 
-if (nocow) {
+if (file_opts->nocow) {
 #ifdef __linux__
 /* Set NOCOW flag to solve performance issue on fs like btrfs.
  * This is an optimisation. The FS_IOC_SETFLAGS ioctl return value
@@ -2031,7 +2023,8 @@ static int raw_create(const char *filename, QemuOpts *opts, Error **errp)
 #endif
 }
 
-result = raw_regular_truncate(fd, total_size, prealloc, errp);
+result = raw_regular_truncate(fd, file_opts->size, file_opts->preallocation,
+  errp);
 if (result < 0) {
 goto out_close;
 }
@@ -2045,6 +2038,45 @@ out:
 return result;
 }
 
+static int raw_create(const char *filename, QemuOpts *opts, Error **errp)
+{
+BlockdevCreateOptions options;
+int64_t total_size = 0;
+bool nocow = false;
+PreallocMode prealloc;
+char *buf = NULL;
+Error *local_err = NULL;
+
+/* Skip file: protocol prefix */
+strstart(filename, "file:", &filename);
+
+/* Read out options */
+total_size = ROUND_UP(qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0),
+  BDRV_SECTOR_SIZE);
+nocow = qemu_opt_get_bool(opts, BLOCK_OPT_NOCOW, false);
+buf = qemu_opt_get_del(opts, BLOCK_OPT_PREALLOC);
+prealloc = qapi_enum_parse(&PreallocMode_lookup, buf,
+   PREALLOC_MODE_OFF, &local_err);
+

[Qemu-block] [PATCH v3 14/36] block: x-blockdev-create QMP command

2018-02-23 Thread Kevin Wolf
This adds a synchronous x-blockdev-create QMP command that can create
qcow2 images on a given node name.

We don't want to block while creating an image, so this is not the final
interface in all aspects, but BlockdevCreateOptionsQcow2 and
.bdrv_co_create() are what they actually might look like in the end. In
any case, this should be good enough to test whether we interpret
BlockdevCreateOptions as we should.

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
Reviewed-by: Eric Blake 
---
 qapi/block-core.json  | 12 
 include/block/block_int.h |  2 ++
 block/create.c| 76 +++
 block/qcow2.c |  3 +-
 block/Makefile.objs   |  2 +-
 5 files changed, 93 insertions(+), 2 deletions(-)
 create mode 100644 block/create.c

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 74b864d64e..359195a1a3 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -3464,6 +3464,18 @@
   } }
 
 ##
+# @x-blockdev-create:
+#
+# Create an image format on a given node.
+# TODO Replace with something asynchronous (block job?)
+#
+# Since: 2.12
+##
+{ 'command': 'x-blockdev-create',
+  'data': 'BlockdevCreateOptions',
+  'boxed': true }
+
+##
 # @blockdev-open-tray:
 #
 # Opens a block device's tray. If there is a block driver state tree inserted as
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 5ae7738cf8..0b43fae782 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -128,6 +128,8 @@ struct BlockDriver {
 int (*bdrv_file_open)(BlockDriverState *bs, QDict *options, int flags,
   Error **errp);
 void (*bdrv_close)(BlockDriverState *bs);
+int coroutine_fn (*bdrv_co_create)(BlockdevCreateOptions *opts,
+   Error **errp);
 int (*bdrv_create)(const char *filename, QemuOpts *opts, Error **errp);
 int (*bdrv_make_empty)(BlockDriverState *bs);
 
diff --git a/block/create.c b/block/create.c
new file mode 100644
index 00..dfd31eca37
--- /dev/null
+++ b/block/create.c
@@ -0,0 +1,76 @@
+/*
+ * Block layer code related to image creation
+ *
+ * Copyright (c) 2018 Kevin Wolf 
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+#include "block/block_int.h"
+#include "qmp-commands.h"
+#include "qapi/error.h"
+
+typedef struct BlockdevCreateCo {
+BlockDriver *drv;
+BlockdevCreateOptions *opts;
+int ret;
+Error **errp;
+} BlockdevCreateCo;
+
+static void coroutine_fn bdrv_co_create_co_entry(void *opaque)
+{
+BlockdevCreateCo *cco = opaque;
+cco->ret = cco->drv->bdrv_co_create(cco->opts, cco->errp);
+}
+
+void qmp_x_blockdev_create(BlockdevCreateOptions *options, Error **errp)
+{
+const char *fmt = BlockdevDriver_str(options->driver);
+BlockDriver *drv = bdrv_find_format(fmt);
+Coroutine *co;
+BlockdevCreateCo cco;
+
+/* If the driver is in the schema, we know that it exists. But it may not
+ * be whitelisted. */
+assert(drv);
+if (bdrv_uses_whitelist() && !bdrv_is_whitelisted(drv, false)) {
+error_setg(errp, "Driver is not whitelisted");
+return;
+}
+
+/* Call callback if it exists */
+if (!drv->bdrv_co_create) {
+error_setg(errp, "Driver does not support blockdev-create");
+return;
+}
+
+cco = (BlockdevCreateCo) {
+.drv = drv,
+.opts = options,
+.ret = -EINPROGRESS,
+.errp = errp,
+};
+
+co = qemu_coroutine_create(bdrv_co_create_co_entry, &cco);
+qemu_coroutine_enter(co);
+while (cco.ret == -EINPROGRESS) {
+aio_poll(qemu_get_aio_context(), true);
+}
+}
diff --git a/block/qcow2.c b/block/qcow2.c
index 58737d0833..8acb36b0af 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -4463,7 +4463,8 @@ BlockDriver bdrv_qcow2 = {
 

[Qemu-block] [PATCH v3 13/36] block: Make bdrv_is_whitelisted() public

2018-02-23 Thread Kevin Wolf
We'll use a separate source file for image creation, and we need to
check there whether the requested driver is whitelisted.

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
Reviewed-by: Eric Blake 
---
 include/block/block.h | 1 +
 block.c   | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/block/block.h b/include/block/block.h
index 54fe8b7a0e..cfce88cbda 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -225,6 +225,7 @@ char *bdrv_perm_names(uint64_t perm);
 void bdrv_init(void);
 void bdrv_init_with_whitelist(void);
 bool bdrv_uses_whitelist(void);
+int bdrv_is_whitelisted(BlockDriver *drv, bool read_only);
 BlockDriver *bdrv_find_protocol(const char *filename,
 bool allow_protocol_prefix,
 Error **errp);
diff --git a/block.c b/block.c
index c0e343d278..4a7e448226 100644
--- a/block.c
+++ b/block.c
@@ -372,7 +372,7 @@ BlockDriver *bdrv_find_format(const char *format_name)
 return bdrv_do_find_format(format_name);
 }
 
-static int bdrv_is_whitelisted(BlockDriver *drv, bool read_only)
+int bdrv_is_whitelisted(BlockDriver *drv, bool read_only)
 {
 static const char *whitelist_rw[] = {
 CONFIG_BDRV_RW_WHITELIST
-- 
2.13.6




[Qemu-block] [PATCH v3 09/36] test-qemu-opts: Test qemu_opts_append()

2018-02-23 Thread Kevin Wolf
Basic test for merging two QemuOptsLists.
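
As a rough picture of what "merging" means here, the sketch below appends one NULL-terminated list of option names to another and skips names the destination already has. That skip is an assumption, inferred from the with_overlapping parameter of the test helper in this patch. has_name() and append_names() are hypothetical; the real qemu_opts_append() works on QemuOptDesc entries and allocates the merged list itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if @name is among the first @n entries of @list. */
static int has_name(const char *const *list, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(list[i], name) == 0) {
            return 1;
        }
    }
    return 0;
}

/*
 * Append the NULL-terminated @src to @dst, which already holds @n_dst
 * entries and must have room for the additions. Names already present
 * in @dst are skipped. Returns the new entry count.
 */
static size_t append_names(const char **dst, size_t n_dst,
                           const char *const *src)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; src[i] != NULL; i++) {
        if (!has_name(dst, n_dst, src[i])) {
            dst[n_dst++] = src[i];
        }
    }
    return n_dst;
}
```

With a destination holding str1, str2 and bool1 and a source of str1, str2, str3 and number1, only str3 and number1 are appended.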

Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
Reviewed-by: Max Reitz 
---
 tests/test-qemu-opts.c | 128 +
 1 file changed, 128 insertions(+)

diff --git a/tests/test-qemu-opts.c b/tests/test-qemu-opts.c
index 5d5a3daa7b..6c3183390b 100644
--- a/tests/test-qemu-opts.c
+++ b/tests/test-qemu-opts.c
@@ -23,6 +23,8 @@ static QemuOptsList opts_list_01 = {
 {
 .name = "str1",
 .type = QEMU_OPT_STRING,
+.help = "Help texts are preserved in qemu_opts_append",
+.def_value_str = "default",
 },{
 .name = "str2",
 .type = QEMU_OPT_STRING,
@@ -32,6 +34,7 @@ static QemuOptsList opts_list_01 = {
 },{
 .name = "number1",
 .type = QEMU_OPT_NUMBER,
+.help = "Having help texts only for some options is okay",
 },{
 .name = "number2",
 .type = QEMU_OPT_NUMBER,
@@ -743,6 +746,129 @@ static void test_opts_parse_size(void)
 qemu_opts_reset(&opts_list_02);
 }
 
+static void append_verify_list_01(QemuOptDesc *desc, bool with_overlapping)
+{
+int i = 0;
+
+if (with_overlapping) {
+g_assert_cmpstr(desc[i].name, ==, "str1");
+g_assert_cmpint(desc[i].type, ==, QEMU_OPT_STRING);
+g_assert_cmpstr(desc[i].help, ==,
+"Help texts are preserved in qemu_opts_append");
+g_assert_cmpstr(desc[i].def_value_str, ==, "default");
+i++;
+
+g_assert_cmpstr(desc[i].name, ==, "str2");
+g_assert_cmpint(desc[i].type, ==, QEMU_OPT_STRING);
+g_assert_cmpstr(desc[i].help, ==, NULL);
+g_assert_cmpstr(desc[i].def_value_str, ==, NULL);
+i++;
+}
+
+g_assert_cmpstr(desc[i].name, ==, "str3");
+g_assert_cmpint(desc[i].type, ==, QEMU_OPT_STRING);
+g_assert_cmpstr(desc[i].help, ==, NULL);
+g_assert_cmpstr(desc[i].def_value_str, ==, NULL);
+i++;
+
+g_assert_cmpstr(desc[i].name, ==, "number1");
+g_assert_cmpint(desc[i].type, ==, QEMU_OPT_NUMBER);
+g_assert_cmpstr(desc[i].help, ==,
+"Having help texts only for some options is okay");
+g_assert_cmpstr(desc[i].def_value_str, ==, NULL);
+i++;
+
+g_assert_cmpstr(desc[i].name, ==, "number2");
+g_assert_cmpint(desc[i].type, ==, QEMU_OPT_NUMBER);
+g_assert_cmpstr(desc[i].help, ==, NULL);
+g_assert_cmpstr(desc[i].def_value_str, ==, NULL);
+i++;
+
+g_assert_cmpstr(desc[i].name, ==, NULL);
+}
+
+static void append_verify_list_02(QemuOptDesc *desc)
+{
+int i = 0;
+
+g_assert_cmpstr(desc[i].name, ==, "str1");
+g_assert_cmpint(desc[i].type, ==, QEMU_OPT_STRING);
+g_assert_cmpstr(desc[i].help, ==, NULL);
+g_assert_cmpstr(desc[i].def_value_str, ==, NULL);
+i++;
+
+g_assert_cmpstr(desc[i].name, ==, "str2");
+g_assert_cmpint(desc[i].type, ==, QEMU_OPT_STRING);
+g_assert_cmpstr(desc[i].help, ==, NULL);
+g_assert_cmpstr(desc[i].def_value_str, ==, NULL);
+i++;
+
+g_assert_cmpstr(desc[i].name, ==, "bool1");
+g_assert_cmpint(desc[i].type, ==, QEMU_OPT_BOOL);
+g_assert_cmpstr(desc[i].help, ==, NULL);
+g_assert_cmpstr(desc[i].def_value_str, ==, NULL);
+i++;
+
+g_assert_cmpstr(desc[i].name, ==, "bool2");
+g_assert_cmpint(desc[i].type, ==, QEMU_OPT_BOOL);
+g_assert_cmpstr(desc[i].help, ==, NULL);
+g_assert_cmpstr(desc[i].def_value_str, ==, NULL);
+i++;
+
+g_assert_cmpstr(desc[i].name, ==, "size1");
+g_assert_cmpint(desc[i].type, ==, QEMU_OPT_SIZE);
+g_assert_cmpstr(desc[i].help, ==, NULL);
+g_assert_cmpstr(desc[i].def_value_str, ==, NULL);
+i++;
+
+g_assert_cmpstr(desc[i].name, ==, "size2");
+g_assert_cmpint(desc[i].type, ==, QEMU_OPT_SIZE);
+g_assert_cmpstr(desc[i].help, ==, NULL);
+g_assert_cmpstr(desc[i].def_value_str, ==, NULL);
+i++;
+
+g_assert_cmpstr(desc[i].name, ==, "size3");
+g_assert_cmpint(desc[i].type, ==, QEMU_OPT_SIZE);
+g_assert_cmpstr(desc[i].help, ==, NULL);
+g_assert_cmpstr(desc[i].def_value_str, ==, NULL);
+}
+
+static void test_opts_append_to_null(void)
+{
+QemuOptsList *merged;
+
+merged = qemu_opts_append(NULL, &opts_list_01);
+g_assert(merged != &opts_list_01);
+
+g_assert_cmpstr(merged->name, ==, NULL);
+g_assert_cmpstr(merged->implied_opt_name, ==, NULL);
+g_assert_false(merged->merge_lists);
+
+append_verify_list_01(merged->desc, true);
+
+qemu_opts_free(merged);
+}
+
+static void test_opts_append(void)
+{
+QemuOptsList *first, *merged;
+
+first = qemu_opts_append(NULL, &opts_list_02);
+merged = qemu_opts_append(first, &opts_list_01);
+g_assert(first != &opts_list_02);
+g_assert(merged != &opts_list_01);
+
+g_assert_cmpstr(merged->name, ==, NULL);
+g_assert_cmpstr(merged->implied_opt_name, ==, NULL);
+g_assert_false(merged->merge_lists);
+
+  

[Qemu-block] [PATCH v3 12/36] qcow2: Use visitor for options in qcow2_create()

2018-02-23 Thread Kevin Wolf
Instead of manually creating the BlockdevCreateOptions object, use a
visitor to parse the given options into the QAPI object.

This involves translation from the old command line syntax to the syntax
mandated by the QAPI schema. Option names are still checked against
qcow2_create_opts, so only the old option names are allowed on the
command line, even if they are translated in qcow2_create().

In contrast, new option values are optionally recognised besides the old
values: 'compat' accepts 'v2'/'v3' as an alias for '0.10'/'1.1', and
'encrypt.format' accepts 'qcow' as an alias for 'aes' now.
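
The value translation described above amounts to a small lookup that accepts both spellings. The sketch below is illustrative only. translate_compat() and translate_encrypt_format() are hypothetical helpers, not functions from block/qcow2.c. The returned strings follow the names used in this message, and passing 'luks' through unchanged is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Map a 'compat' value (old or new spelling) to the schema-style
 * version name; returns NULL for anything unrecognised. */
static const char *translate_compat(const char *val)
{
    assert(val != NULL);
    if (strcmp(val, "0.10") == 0 || strcmp(val, "v2") == 0) {
        return "v2";
    }
    if (strcmp(val, "1.1") == 0 || strcmp(val, "v3") == 0) {
        return "v3";
    }
    return NULL;
}

/* Map an 'encrypt.format' value; 'aes' is the old name for 'qcow'.
 * Treating 'luks' as already valid is an assumption of this sketch. */
static const char *translate_encrypt_format(const char *val)
{
    assert(val != NULL);
    if (strcmp(val, "aes") == 0 || strcmp(val, "qcow") == 0) {
        return "qcow";
    }
    if (strcmp(val, "luks") == 0) {
        return "luks";
    }
    return NULL;
}
```

Returning NULL for unknown input leaves error reporting to the caller, which can then name the offending option.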

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
Reviewed-by: Eric Blake 
---
 block/qcow2.c  | 217 -
 tests/qemu-iotests/049.out |   8 +-
 tests/qemu-iotests/112.out |   4 +-
 3 files changed, 83 insertions(+), 146 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 64bf2863cd..58737d0833 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -38,7 +38,7 @@
 #include "qemu/option_int.h"
 #include "qemu/cutils.h"
 #include "qemu/bswap.h"
-#include "qapi/opts-visitor.h"
+#include "qapi/qobject-input-visitor.h"
 #include "qapi-visit.h"
 #include "block/crypto.h"
 
@@ -2414,37 +2414,6 @@ static int qcow2_crypt_method_from_format(const char *encryptfmt)
 }
 }
 
-static QCryptoBlockCreateOptions *
-qcow2_parse_encryption(const char *encryptfmt, QemuOpts *opts, Error **errp)
-{
-QCryptoBlockCreateOptions *cryptoopts = NULL;
-QDict *options, *encryptopts;
-int fmt;
-
-options = qemu_opts_to_qdict(opts, NULL);
-qdict_extract_subqdict(options, &encryptopts, "encrypt.");
-QDECREF(options);
-
-fmt = qcow2_crypt_method_from_format(encryptfmt);
-
-switch (fmt) {
-case QCOW_CRYPT_LUKS:
-cryptoopts = block_crypto_create_opts_init(
-Q_CRYPTO_BLOCK_FORMAT_LUKS, encryptopts, errp);
-break;
-case QCOW_CRYPT_AES:
-cryptoopts = block_crypto_create_opts_init(
-Q_CRYPTO_BLOCK_FORMAT_QCOW, encryptopts, errp);
-break;
-default:
-error_setg(errp, "Unknown encryption format '%s'", encryptfmt);
-break;
-}
-
-QDECREF(encryptopts);
-return cryptoopts;
-}
-
 static int qcow2_set_up_encryption(BlockDriverState *bs,
QCryptoBlockCreateOptions *cryptoopts,
Error **errp)
@@ -2838,7 +2807,7 @@ static int qcow2_create2(BlockdevCreateOptions *create_options, Error **errp)
 }
 if (version < 3 && qcow2_opts->lazy_refcounts) {
 error_setg(errp, "Lazy refcounts only supported with compatibility "
-   "level 1.1 and above (use compat=1.1 or greater)");
+   "level 1.1 and above (use version=v3 or greater)");
 ret = -EINVAL;
 goto out;
 }
@@ -2856,7 +2825,7 @@ static int qcow2_create2(BlockdevCreateOptions *create_options, Error **errp)
 }
 if (version < 3 && qcow2_opts->refcount_bits != 16) {
 error_setg(errp, "Different refcount widths than 16 bits require "
-   "compatibility level 1.1 or above (use compat=1.1 or "
+   "compatibility level 1.1 or above (use version=v3 or "
"greater)");
 ret = -EINVAL;
 goto out;
@@ -3043,144 +3012,112 @@ out:
 
 static int qcow2_create(const char *filename, QemuOpts *opts, Error **errp)
 {
-BlockdevCreateOptions create_options;
-char *backing_file = NULL;
-char *backing_fmt = NULL;
-BlockdevDriver backing_drv;
-char *buf = NULL;
-uint64_t size = 0;
-int flags = 0;
-size_t cluster_size = DEFAULT_CLUSTER_SIZE;
-PreallocMode prealloc;
-int version;
-uint64_t refcount_bits;
-char *encryptfmt = NULL;
-QCryptoBlockCreateOptions *cryptoopts = NULL;
+BlockdevCreateOptions *create_options = NULL;
+QDict *qdict = NULL;
+QObject *qobj;
+Visitor *v;
 BlockDriverState *bs = NULL;
 Error *local_err = NULL;
+const char *val;
 int ret;
 
-/* Read out options */
-size = ROUND_UP(qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0),
-BDRV_SECTOR_SIZE);
-backing_file = qemu_opt_get_del(opts, BLOCK_OPT_BACKING_FILE);
-backing_fmt = qemu_opt_get_del(opts, BLOCK_OPT_BACKING_FMT);
-backing_drv = qapi_enum_parse(&BlockdevDriver_lookup, backing_fmt,
-  0, &local_err);
-if (local_err) {
-error_propagate(errp, local_err);
+/* Only the keyval visitor supports the dotted syntax needed for
+ * encryption, so go through a QDict before getting a QAPI type. Ignore
+ * options meant for the protocol layer so that the visitor doesn't
+ * complain. */
+qdict = qemu_opts_to_qdict_filtered(opts, NULL, bdrv_qcow2.create_opts,
+true);
+
+/* Handle encryption options */
+val = qdict_get_try_str(qdict, 

[Qemu-block] [PATCH v3 08/36] util: Add qemu_opts_to_qdict_filtered()

2018-02-23 Thread Kevin Wolf
This allows, given a QemuOpts for a QemuOptsList that was merged from
multiple QemuOptsList, to only consider those options that exist in one
specific list. Block drivers need this to separate format-layer create
options from protocol-level options.

Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
Reviewed-by: Max Reitz 
---
 include/qemu/option.h |  2 ++
 util/qemu-option.c    | 42 +++++++++++++++++++++++++++++++++++++-----
 2 files changed, 39 insertions(+), 5 deletions(-)
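
The filtering described in the commit message can be sketched outside of
QEMU as follows. This is only an illustration under stated assumptions:
Opt, name_allowed() and opts_filter() are hypothetical names, a plain array
stands in for both QemuOpts and the QDict, and the "last one wins" handling
of duplicate names is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one QemuOpt: a name/value pair. */
typedef struct Opt {
    const char *name;
    const char *value;
} Opt;

/* Returns true if @name appears in the NULL-terminated array @allowed.
 * A NULL @allowed means "no filter", so every name is accepted. */
static bool name_allowed(const char *const *allowed, const char *name)
{
    if (!allowed) {
        return true;
    }
    for (; *allowed; allowed++) {
        if (!strcmp(*allowed, name)) {
            return true;
        }
    }
    return false;
}

/*
 * Copy the entries of @opts (length *@n_opts) whose names pass the filter
 * into @out and return how many were copied. @out must have room for
 * *@n_opts entries. If @del is true, the copied entries are removed from
 * @opts and *@n_opts is updated; all other entries keep their order.
 */
static size_t opts_filter(Opt *opts, size_t *n_opts,
                          const char *const *allowed, bool del, Opt *out)
{
    size_t kept = 0, copied = 0;

    assert(opts != NULL && n_opts != NULL && out != NULL);
    for (size_t i = 0; i < *n_opts; i++) {
        if (name_allowed(allowed, opts[i].name)) {
            out[copied++] = opts[i];
            if (del) {
                continue;       /* drop it from the source list */
            }
        }
        opts[kept++] = opts[i];
    }
    *n_opts = kept;
    return copied;
}
```

With a format-layer list of { "size", "cluster_size" } and del = true, an
input of size, filename, cluster_size leaves only filename behind for the
protocol layer.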

diff --git a/include/qemu/option.h b/include/qemu/option.h
index b127fb6db6..306fdb5f7a 100644
--- a/include/qemu/option.h
+++ b/include/qemu/option.h
@@ -124,6 +124,8 @@ void qemu_opts_set_defaults(QemuOptsList *list, const char *params,
 int permit_abbrev);
 QemuOpts *qemu_opts_from_qdict(QemuOptsList *list, const QDict *qdict,
Error **errp);
+QDict *qemu_opts_to_qdict_filtered(QemuOpts *opts, QDict *qdict,
+   QemuOptsList *list, bool del);
 QDict *qemu_opts_to_qdict(QemuOpts *opts, QDict *qdict);
 void qemu_opts_absorb_qdict(QemuOpts *opts, QDict *qdict, Error **errp);
 
diff --git a/util/qemu-option.c b/util/qemu-option.c
index a401e936da..2b412eff5e 100644
--- a/util/qemu-option.c
+++ b/util/qemu-option.c
@@ -1007,14 +1007,23 @@ void qemu_opts_absorb_qdict(QemuOpts *opts, QDict *qdict, Error **errp)
 }
 
 /*
- * Convert from QemuOpts to QDict.
- * The QDict values are of type QString.
+ * Convert from QemuOpts to QDict. The QDict values are of type QString.
+ *
+ * If @list is given, only add those options to the QDict that are contained in
+ * the list. If @del is true, any options added to the QDict are removed from
+ * the QemuOpts, otherwise they remain there.
+ *
+ * If two options in @opts have the same name, they are processed in order
+ * so that the last one wins (consistent with the reverse iteration in
+ * qemu_opt_find()), but all of them are deleted if @del is true.
+ *
  * TODO We'll want to use types appropriate for opt->desc->type, but
  * this is enough for now.
  */
-QDict *qemu_opts_to_qdict(QemuOpts *opts, QDict *qdict)
+QDict *qemu_opts_to_qdict_filtered(QemuOpts *opts, QDict *qdict,
+                                   QemuOptsList *list, bool del)
 {
-    QemuOpt *opt;
+    QemuOpt *opt, *next;
 
     if (!qdict) {
         qdict = qdict_new();
@@ -1022,12 +1031,35 @@ QDict *qemu_opts_to_qdict(QemuOpts *opts, QDict *qdict)
     if (opts->id) {
         qdict_put_str(qdict, "id", opts->id);
     }
-    QTAILQ_FOREACH(opt, &opts->head, next) {
+    QTAILQ_FOREACH_SAFE(opt, &opts->head, next, next) {
+        if (list) {
+            QemuOptDesc *desc;
+            bool found = false;
+            for (desc = list->desc; desc->name; desc++) {
+                if (!strcmp(desc->name, opt->name)) {
+                    found = true;
+                    break;
+                }
+            }
+            if (!found) {
+                continue;
+            }
+        }
         qdict_put_str(qdict, opt->name, opt->str);
+        if (del) {
+            qemu_opt_del(opt);
+        }
     }
     return qdict;
 }
 
+/* Copy all options in a QemuOpts to the given QDict. See
+ * qemu_opts_to_qdict_filtered() for details. */
+QDict *qemu_opts_to_qdict(QemuOpts *opts, QDict *qdict)
+{
+    return qemu_opts_to_qdict_filtered(opts, qdict, NULL, false);
+}
+
 /* Validate parsed opts against descriptions where no
  * descriptions were provided in the QemuOptsList.
  */
-- 
2.13.6




[Qemu-block] [PATCH v3 11/36] qdict: Introduce qdict_rename_keys()

2018-02-23 Thread Kevin Wolf
A few block drivers will need to rename .bdrv_create options for their
QAPIfication, so let's have a helper function for that.

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
Reviewed-by: Eric Blake 
---
 include/qapi/qmp/qdict.h |   6 +++
 qobject/qdict.c  |  34 +
 tests/check-qdict.c  | 129 +++
 3 files changed, 169 insertions(+)
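
The ordering and conflict rules of the rename helper can be sketched on a
plain array instead of a QDict. This is an illustration only: Entry, Rename,
find_key() and rename_keys() are hypothetical names, and error reporting is
reduced to a bool. As in the helper, renames are applied one by one in array
order, and a rename whose target key already exists is treated as an error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for a QDict entry and a QDictRenames element. */
typedef struct Entry {
    const char *key;
    int value;
} Entry;

typedef struct Rename {
    const char *from;
    const char *to;
} Rename;

/* Linear lookup; returns NULL if @key is not present. */
static Entry *find_key(Entry *entries, size_t n, const char *key)
{
    assert(entries != NULL && key != NULL);
    for (size_t i = 0; i < n; i++) {
        if (!strcmp(entries[i].key, key)) {
            return &entries[i];
        }
    }
    return NULL;
}

/*
 * Apply @renames (terminated by an element with from == NULL) in order.
 * Returns false as soon as a rename would collide with an existing key;
 * renames that were already applied before that point stay applied.
 */
static bool rename_keys(Entry *entries, size_t n, const Rename *renames)
{
    for (; renames->from; renames++) {
        Entry *src = find_key(entries, n, renames->from);

        if (!src) {
            continue;           /* nothing to rename */
        }
        if (find_key(entries, n, renames->to)) {
            return false;       /* key and its alias used together */
        }
        src->key = renames->to;
    }
    return true;
}
```

Because each rename sees the result of the previous one, two keys can be
swapped through a temporary name, while renaming "abc" straight to an
existing "abcdef" fails.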

diff --git a/include/qapi/qmp/qdict.h b/include/qapi/qmp/qdict.h
index ff6f7842c3..7c6d844549 100644
--- a/include/qapi/qmp/qdict.h
+++ b/include/qapi/qmp/qdict.h
@@ -81,4 +81,10 @@ QObject *qdict_crumple(const QDict *src, Error **errp);
 
 void qdict_join(QDict *dest, QDict *src, bool overwrite);
 
+typedef struct QDictRenames {
+    const char *from;
+    const char *to;
+} QDictRenames;
+bool qdict_rename_keys(QDict *qdict, const QDictRenames *renames, Error **errp);
+
 #endif /* QDICT_H */
diff --git a/qobject/qdict.c b/qobject/qdict.c
index 23df84f9cd..229b8c840b 100644
--- a/qobject/qdict.c
+++ b/qobject/qdict.c
@@ -1072,3 +1072,37 @@ void qdict_join(QDict *dest, QDict *src, bool overwrite)
 entry = next;
 }
 }
+
+/**
+ * qdict_rename_keys(): Rename keys in qdict according to the replacements
+ * specified in the array renames. The array must be terminated by an entry
+ * with from = NULL.
+ *
+ * The renames are performed individually in the order of the array, so entries
+ * may be renamed multiple times and may or may not conflict depending on the
+ * order of the renames array.
+ *
+ * Returns true for success, false in error cases.
+ */
+bool qdict_rename_keys(QDict *qdict, const QDictRenames *renames, Error **errp)
+{
+    QObject *qobj;
+
+    while (renames->from) {
+        if (qdict_haskey(qdict, renames->from)) {
+            if (qdict_haskey(qdict, renames->to)) {
+                error_setg(errp, "'%s' and its alias '%s' can't be used at the "
+                           "same time", renames->to, renames->from);
+                return false;
+            }
+
+            qobj = qdict_get(qdict, renames->from);
+            qobject_incref(qobj);
+            qdict_put_obj(qdict, renames->to, qobj);
+            qdict_del(qdict, renames->from);
+        }
+
+        renames++;
+    }
+    return true;
+}
diff --git a/tests/check-qdict.c b/tests/check-qdict.c
index ec628f3453..a3faea8bfc 100644
--- a/tests/check-qdict.c
+++ b/tests/check-qdict.c
@@ -665,6 +665,133 @@ static void qdict_crumple_test_empty(void)
 QDECREF(dst);
 }
 
+static int qdict_count_entries(QDict *dict)
+{
+const QDictEntry *e;
+int count = 0;
+
+for (e = qdict_first(dict); e; e = qdict_next(dict, e)) {
+count++;
+}
+
+return count;
+}
+
+static void qdict_rename_keys_test(void)
+{
+QDict *dict = qdict_new();
+QDict *copy;
+QDictRenames *renames;
+Error *local_err = NULL;
+
+qdict_put_str(dict, "abc", "foo");
+qdict_put_str(dict, "abcdef", "bar");
+qdict_put_int(dict, "number", 42);
+qdict_put_bool(dict, "flag", true);
+qdict_put_null(dict, "nothing");
+
+/* Empty rename list */
+renames = (QDictRenames[]) {
+{ NULL, "this can be anything" }
+};
+copy = qdict_clone_shallow(dict);
+qdict_rename_keys(copy, renames, &error_abort);
+
+g_assert_cmpstr(qdict_get_str(copy, "abc"), ==, "foo");
+g_assert_cmpstr(qdict_get_str(copy, "abcdef"), ==, "bar");
+g_assert_cmpint(qdict_get_int(copy, "number"), ==, 42);
+g_assert_cmpint(qdict_get_bool(copy, "flag"), ==, true);
+g_assert(qobject_type(qdict_get(copy, "nothing")) == QTYPE_QNULL);
+g_assert_cmpint(qdict_count_entries(copy), ==, 5);
+
+QDECREF(copy);
+
+/* Simple rename of all entries */
+renames = (QDictRenames[]) {
+{ "abc","str1" },
+{ "abcdef", "str2" },
+{ "number", "int" },
+{ "flag",   "bool" },
+{ "nothing","null" },
+{ NULL , NULL }
+};
+copy = qdict_clone_shallow(dict);
+qdict_rename_keys(copy, renames, &error_abort);
+
+g_assert(!qdict_haskey(copy, "abc"));
+g_assert(!qdict_haskey(copy, "abcdef"));
+g_assert(!qdict_haskey(copy, "number"));
+g_assert(!qdict_haskey(copy, "flag"));
+g_assert(!qdict_haskey(copy, "nothing"));
+
+g_assert_cmpstr(qdict_get_str(copy, "str1"), ==, "foo");
+g_assert_cmpstr(qdict_get_str(copy, "str2"), ==, "bar");
+g_assert_cmpint(qdict_get_int(copy, "int"), ==, 42);
+g_assert_cmpint(qdict_get_bool(copy, "bool"), ==, true);
+g_assert(qobject_type(qdict_get(copy, "null")) == QTYPE_QNULL);
+g_assert_cmpint(qdict_count_entries(copy), ==, 5);
+
+QDECREF(copy);
+
+/* Renames are processed top to bottom */
+renames = (QDictRenames[]) {
+{ "abc","tmp" },
+{ "abcdef", "abc" },
+{ "number", "abcdef" },
+{ "flag",   "number" },
+{ "nothing","flag" },
+{ "tmp", 

[Qemu-block] [PATCH v3 04/36] qcow2: Pass BlockdevCreateOptions to qcow2_create2()

2018-02-23 Thread Kevin Wolf
All of the simple options are now passed to qcow2_create2() in a
BlockdevCreateOptions object. Still missing: node-name and the
encryption options.

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
---
 block/qcow2.c | 190 ++
 1 file changed, 152 insertions(+), 38 deletions(-)
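
The cluster size check that this patch factors out can be sketched as a
standalone predicate. This is a sketch under assumptions: the bounds of 9
and 21 bits (512 bytes to 2 MB) are assumed values for MIN_CLUSTER_BITS and
MAX_CLUSTER_BITS, cluster_size_is_valid() is a hypothetical name, and the
power-of-two test uses a bit trick rather than ctz32().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed bounds; the real values live in the qcow2 driver headers. */
#define MIN_CLUSTER_BITS 9
#define MAX_CLUSTER_BITS 21

/* True if @cluster_size is a power of two within the assumed bounds. */
static bool cluster_size_is_valid(size_t cluster_size)
{
    const size_t min = (size_t)1 << MIN_CLUSTER_BITS;
    const size_t max = (size_t)1 << MAX_CLUSTER_BITS;
    bool power_of_two = cluster_size != 0 &&
                        (cluster_size & (cluster_size - 1)) == 0;

    assert(min < max);
    return power_of_two && cluster_size >= min && cluster_size <= max;
}
```

Under these assumptions 65536 is accepted, while 256 (too small), 65537
(not a power of two) and 4194304 (too large) are rejected.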

diff --git a/block/qcow2.c b/block/qcow2.c
index dc6cdea113..22194180c6 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2665,19 +2665,26 @@ static int64_t qcow2_calc_prealloc_size(int64_t total_size,
 return meta_size + aligned_total_size;
 }
 
-static size_t qcow2_opt_get_cluster_size_del(QemuOpts *opts, Error **errp)
+static bool validate_cluster_size(size_t cluster_size, Error **errp)
 {
-size_t cluster_size;
-int cluster_bits;
-
-cluster_size = qemu_opt_get_size_del(opts, BLOCK_OPT_CLUSTER_SIZE,
- DEFAULT_CLUSTER_SIZE);
-cluster_bits = ctz32(cluster_size);
+int cluster_bits = ctz32(cluster_size);
 if (cluster_bits < MIN_CLUSTER_BITS || cluster_bits > MAX_CLUSTER_BITS ||
 (1 << cluster_bits) != cluster_size)
 {
 error_setg(errp, "Cluster size must be a power of two between %d and "
"%dk", 1 << MIN_CLUSTER_BITS, 1 << (MAX_CLUSTER_BITS - 10));
+return false;
+}
+return true;
+}
+
+static size_t qcow2_opt_get_cluster_size_del(QemuOpts *opts, Error **errp)
+{
+size_t cluster_size;
+
+cluster_size = qemu_opt_get_size_del(opts, BLOCK_OPT_CLUSTER_SIZE,
+ DEFAULT_CLUSTER_SIZE);
+if (!validate_cluster_size(cluster_size, errp)) {
 return 0;
 }
 return cluster_size;
@@ -2725,12 +2732,11 @@ static uint64_t qcow2_opt_get_refcount_bits_del(QemuOpts *opts, int version,
 return refcount_bits;
 }
 
-static int qcow2_create2(BlockDriverState *bs, int64_t total_size,
- const char *backing_file, const char *backing_format,
- int flags, size_t cluster_size, PreallocMode prealloc,
- QemuOpts *opts, int version, int refcount_order,
- const char *encryptfmt, Error **errp)
+static int qcow2_create2(BlockDriverState *bs,
+ BlockdevCreateOptions *create_options,
+ QemuOpts *opts, const char *encryptfmt, Error **errp)
 {
+BlockdevCreateOptionsQcow2 *qcow2_opts;
 QDict *options;
 
 /*
@@ -2747,10 +2753,92 @@ static int qcow2_create2(BlockDriverState *bs, int64_t total_size,
  */
 BlockBackend *blk;
 QCowHeader *header;
+size_t cluster_size;
+int version;
+int refcount_order;
 uint64_t* refcount_table;
 Error *local_err = NULL;
 int ret;
 
+/* Validate options and set default values */
+assert(create_options->driver == BLOCKDEV_DRIVER_QCOW2);
+qcow2_opts = &create_options->u.qcow2;
+
+if (!QEMU_IS_ALIGNED(qcow2_opts->size, BDRV_SECTOR_SIZE)) {
+error_setg(errp, "Image size must be a multiple of 512 bytes");
+ret = -EINVAL;
+goto out;
+}
+
+if (qcow2_opts->has_version) {
+switch (qcow2_opts->version) {
+case BLOCKDEV_QCOW2_VERSION_V2:
+version = 2;
+break;
+case BLOCKDEV_QCOW2_VERSION_V3:
+version = 3;
+break;
+default:
+g_assert_not_reached();
+}
+} else {
+version = 3;
+}
+
+if (qcow2_opts->has_cluster_size) {
+cluster_size = qcow2_opts->cluster_size;
+} else {
+cluster_size = DEFAULT_CLUSTER_SIZE;
+}
+
+if (!validate_cluster_size(cluster_size, errp)) {
+return -EINVAL;
+}
+
+if (!qcow2_opts->has_preallocation) {
+qcow2_opts->preallocation = PREALLOC_MODE_OFF;
+}
+if (qcow2_opts->has_backing_file &&
+qcow2_opts->preallocation != PREALLOC_MODE_OFF)
+{
+error_setg(errp, "Backing file and preallocation cannot be used at "
+   "the same time");
+return -EINVAL;
+}
+if (qcow2_opts->has_backing_fmt && !qcow2_opts->has_backing_file) {
+error_setg(errp, "Backing format cannot be used without backing file");
+return -EINVAL;
+}
+
+if (!qcow2_opts->has_lazy_refcounts) {
+qcow2_opts->lazy_refcounts = false;
+}
+if (version < 3 && qcow2_opts->lazy_refcounts) {
+error_setg(errp, "Lazy refcounts only supported with compatibility "
+   "level 1.1 and above (use compat=1.1 or greater)");
+return -EINVAL;
+}
+
+if (!qcow2_opts->has_refcount_bits) {
+qcow2_opts->refcount_bits = 16;
+}
+if (qcow2_opts->refcount_bits > 64 ||
+!is_power_of_2(qcow2_opts->refcount_bits))
+{
+error_setg(errp, "Refcount width must be a power of two and may not "
+   "exceed 64 bits");
+return -EINVAL;
+}
+

[Qemu-block] [PATCH v3 05/36] qcow2: Use BlockdevRef in qcow2_create2()

2018-02-23 Thread Kevin Wolf
Instead of passing a separate BlockDriverState* into qcow2_create2(),
make use of the BlockdevRef that is included in BlockdevCreateOptions.

Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
Reviewed-by: Max Reitz 
---
 include/block/block.h |  1 +
 block.c   | 47 +++
 block/qcow2.c | 38 --
 3 files changed, 72 insertions(+), 14 deletions(-)
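
A reference of this kind is either the name of an existing node or an inline
definition, and for inline definitions the patch fills in explicit defaults
for options the caller left out. The sketch below models just that decision.
It is not QEMU code: Ref, InlineOptions, set_default() and resolve_ref() are
hypothetical, and only two of the defaulted options are shown.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, much simplified stand-ins for BlockdevRef and its options. */
typedef enum RefKind { REF_BY_NAME, REF_INLINE } RefKind;

typedef struct InlineOptions {
    const char *driver;
    const char *cache_direct;   /* NULL means "not specified" */
    const char *read_only;      /* NULL means "not specified" */
} InlineOptions;

typedef struct Ref {
    RefKind kind;
    union {
        const char *reference;      /* REF_BY_NAME: existing node name */
        InlineOptions definition;   /* REF_INLINE: options to open with */
    } u;
} Ref;

/* Set *@field to @def only when the caller left it unspecified. */
static void set_default(const char **field, const char *def)
{
    if (*field == NULL) {
        *field = def;
    }
}

/*
 * Returns the node name to look up for a by-name reference, or NULL for an
 * inline definition, in which case unset options get explicit defaults.
 */
static const char *resolve_ref(Ref *ref)
{
    assert(ref != NULL);
    if (ref->kind == REF_BY_NAME) {
        return ref->u.reference;
    }
    set_default(&ref->u.definition.cache_direct, "off");
    set_default(&ref->u.definition.read_only, "off");
    return NULL;
}
```

An option the caller did set, such as read_only = "on", is left untouched;
only missing ones are filled in.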

diff --git a/include/block/block.h b/include/block/block.h
index 947e8876cd..54fe8b7a0e 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -245,6 +245,7 @@ BdrvChild *bdrv_open_child(const char *filename,
BlockDriverState* parent,
const BdrvChildRole *child_role,
bool allow_none, Error **errp);
+BlockDriverState *bdrv_open_blockdev_ref(BlockdevRef *ref, Error **errp);
 void bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd,
  Error **errp);
 int bdrv_open_backing_file(BlockDriverState *bs, QDict *parent_options,
diff --git a/block.c b/block.c
index 814e5a02da..c0e343d278 100644
--- a/block.c
+++ b/block.c
@@ -35,6 +35,8 @@
 #include "qapi/qmp/qerror.h"
 #include "qapi/qmp/qjson.h"
 #include "qapi/qmp/qstring.h"
+#include "qapi/qobject-output-visitor.h"
+#include "qapi-visit.h"
 #include "sysemu/block-backend.h"
 #include "sysemu/sysemu.h"
 #include "qemu/notify.h"
@@ -2408,6 +2410,51 @@ BdrvChild *bdrv_open_child(const char *filename,
 return c;
 }
 
+/* TODO Future callers may need to specify parent/child_role in order for
+ * option inheritance to work. Existing callers use it for the root node. */
+BlockDriverState *bdrv_open_blockdev_ref(BlockdevRef *ref, Error **errp)
+{
+    BlockDriverState *bs = NULL;
+    Error *local_err = NULL;
+    QObject *obj = NULL;
+    QDict *qdict = NULL;
+    const char *reference = NULL;
+    Visitor *v = NULL;
+
+    if (ref->type == QTYPE_QSTRING) {
+        reference = ref->u.reference;
+    } else {
+        BlockdevOptions *options = &ref->u.definition;
+        assert(ref->type == QTYPE_QDICT);
+
+        v = qobject_output_visitor_new(&obj);
+        visit_type_BlockdevOptions(v, NULL, &options, &local_err);
+        if (local_err) {
+            error_propagate(errp, local_err);
+            goto fail;
+        }
+        visit_complete(v, &obj);
+
+        qdict = qobject_to_qdict(obj);
+        qdict_flatten(qdict);
+
+        /* bdrv_open_inherit() defaults to the values in bdrv_flags (for
+         * compatibility with other callers) rather than what we want as the
+         * real defaults. Apply the defaults here instead. */
+        qdict_set_default_str(qdict, BDRV_OPT_CACHE_DIRECT, "off");
+        qdict_set_default_str(qdict, BDRV_OPT_CACHE_NO_FLUSH, "off");
+        qdict_set_default_str(qdict, BDRV_OPT_READ_ONLY, "off");
+    }
+
+    bs = bdrv_open_inherit(NULL, reference, qdict, 0, NULL, NULL, errp);
+    obj = NULL;
+
+fail:
+    qobject_decref(obj);
+    visit_free(v);
+    return bs;
+}
+
 static BlockDriverState *bdrv_append_temp_snapshot(BlockDriverState *bs,
int flags,
QDict *snapshot_options,
diff --git a/block/qcow2.c b/block/qcow2.c
index 22194180c6..b34924b0f0 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2732,8 +2732,7 @@ static uint64_t qcow2_opt_get_refcount_bits_del(QemuOpts *opts, int version,
 return refcount_bits;
 }
 
-static int qcow2_create2(BlockDriverState *bs,
- BlockdevCreateOptions *create_options,
+static int qcow2_create2(BlockdevCreateOptions *create_options,
  QemuOpts *opts, const char *encryptfmt, Error **errp)
 {
 BlockdevCreateOptionsQcow2 *qcow2_opts;
@@ -2751,7 +2750,8 @@ static int qcow2_create2(BlockDriverState *bs,
  * 2 GB for 64k clusters, and we don't want to have a 2 GB initial file
  * size for any qcow2 image.
  */
-BlockBackend *blk;
+BlockBackend *blk = NULL;
+BlockDriverState *bs = NULL;
 QCowHeader *header;
 size_t cluster_size;
 int version;
@@ -2760,10 +2760,15 @@ static int qcow2_create2(BlockDriverState *bs,
 Error *local_err = NULL;
 int ret;
 
-/* Validate options and set default values */
 assert(create_options->driver == BLOCKDEV_DRIVER_QCOW2);
 qcow2_opts = &create_options->u.qcow2;
 
+bs = bdrv_open_blockdev_ref(qcow2_opts->file, errp);
+if (bs == NULL) {
+return -EIO;
+}
+
+/* Validate options and set default values */
 if (!QEMU_IS_ALIGNED(qcow2_opts->size, BDRV_SECTOR_SIZE)) {
 error_setg(errp, "Image size must be a multiple of 512 bytes");
 ret = -EINVAL;
@@ -2792,7 +2797,8 @@ static int qcow2_create2(BlockDriverState *bs,
 }
 
 if (!validate_cluster_size(cluster_size, errp)) {
-return -EINVAL;
+

[Qemu-block] [PATCH v3 07/36] qcow2: Handle full/falloc preallocation in qcow2_create2()

2018-02-23 Thread Kevin Wolf
Once qcow2_create2() can be called directly on an already existing node,
we must provide the 'full' and 'falloc' preallocation modes outside of
creating the image on the protocol layer. Fortunately, we have
preallocated truncate now which can provide this functionality.

Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
Reviewed-by: Max Reitz 
---
 block/qcow2.c | 28 +++-
 1 file changed, 19 insertions(+), 9 deletions(-)
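
The sequence described in the commit message is: first truncate the protocol
layer to zero, then, only for the 'full' and 'falloc' modes, grow it again
with preallocation. The sketch below only plans those steps. Prealloc,
TruncateStep and plan_truncates() are hypothetical names, and the
preallocation size is passed in by the caller instead of being computed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the preallocation modes mentioned in the patch. */
typedef enum Prealloc {
    PREALLOC_OFF,
    PREALLOC_METADATA,
    PREALLOC_FALLOC,
    PREALLOC_FULL
} Prealloc;

/* One truncate call: the target size and the mode to use for it. */
typedef struct TruncateStep {
    int64_t size;
    Prealloc mode;
} TruncateStep;

/*
 * Fill @steps with the truncate calls to issue, in order, and return how
 * many there are (1 or 2). @prealloc_size is the size the file should have
 * after preallocation; it is assumed to be computed elsewhere.
 */
static size_t plan_truncates(Prealloc mode, int64_t prealloc_size,
                             TruncateStep steps[2])
{
    size_t n = 0;

    assert(steps != NULL && prealloc_size >= 0);

    /* Always clear the protocol layer first. */
    steps[n++] = (TruncateStep){ 0, PREALLOC_OFF };

    /* Only 'full' and 'falloc' preallocate the protocol layer here. */
    if (mode == PREALLOC_FULL || mode == PREALLOC_FALLOC) {
        steps[n++] = (TruncateStep){ prealloc_size, mode };
    }
    return n;
}
```

In this model the 'off' and 'metadata' modes produce only the initial
truncate to zero.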

diff --git a/block/qcow2.c b/block/qcow2.c
index 9a2028b3cf..64bf2863cd 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2872,6 +2872,25 @@ static int qcow2_create2(BlockdevCreateOptions *create_options, Error **errp)
     }
     blk_set_allow_write_beyond_eof(blk, true);
 
+    /* Clear the protocol layer and preallocate it if necessary */
+    ret = blk_truncate(blk, 0, PREALLOC_MODE_OFF, errp);
+    if (ret < 0) {
+        goto out;
+    }
+
+    if (qcow2_opts->preallocation == PREALLOC_MODE_FULL ||
+        qcow2_opts->preallocation == PREALLOC_MODE_FALLOC)
+    {
+        int64_t prealloc_size =
+            qcow2_calc_prealloc_size(qcow2_opts->size, cluster_size,
+                                     refcount_order);
+
+        ret = blk_truncate(blk, prealloc_size, qcow2_opts->preallocation, errp);
+        if (ret < 0) {
+            goto out;
+        }
+    }
+
     /* Write the header */
     QEMU_BUILD_BUG_ON((1 << MIN_CLUSTER_BITS) < sizeof(*header));
     header = g_malloc0(cluster_size);
@@ -3108,15 +3127,6 @@ static int qcow2_create(const char *filename, QemuOpts *opts, Error **errp)
 
 
     /* Create and open the file (protocol layer) */
-    if (prealloc == PREALLOC_MODE_FULL || prealloc == PREALLOC_MODE_FALLOC) {
-        int refcount_order = ctz32(refcount_bits);
-        int64_t prealloc_size =
-            qcow2_calc_prealloc_size(size, cluster_size, refcount_order);
-        qemu_opt_set_number(opts, BLOCK_OPT_SIZE, prealloc_size, &error_abort);
-        qemu_opt_set(opts, BLOCK_OPT_PREALLOC, PreallocMode_str(prealloc),
-                     &error_abort);
-    }
-
     ret = bdrv_create_file(filename, opts, errp);
     if (ret < 0) {
         goto finish;
-- 
2.13.6




[Qemu-block] [PATCH v3 00/36] x-blockdev-create for protocols and qcow2

2018-02-23 Thread Kevin Wolf
This series implements a minimal QMP command that allows creating an
image file on the protocol level or an image format on a given block
node.

Eventually, the interface is going to change to some kind of an async
command (possibly a (non-)block job), but that will require more work on
the job infrastructure first, so let's first QAPIfy image creation in
the block drivers. In this series, I'm going for a synchronous command
that is prefixed with x- for now.

This series converts qcow2 and all protocol drivers that allow an actual
image creation. This means that drivers which only check if the already
existing storage is good enough are not converted (e.g. host_device,
iscsi). The old behaviour was useful because 'qemu-img create' wants to
create both protocol and format layer, but with the separation in QMP,
you can just leave out the protocol layer creation when the device
already exists.

Please note that for some of the protocol drivers (gluster, rbd and
sheepdog) I don't have a test setup ready. For those, I only tested
with a fake server address to check that the options are parsed correctly
up to this point and an appropriate error is returned without crashing.

If you are a maintainer of one of these protocols and you are
interested in keeping image creation working for your protocol, you
probably want to test this series on a real setup and give me some
feedback. If you don't, I'll just merge the patches and hope that they
won't break anything.


v3:
- Patch 11 ('qdict: Introduce qdict_rename_keys()'):
  Additional assertion in each test case [Eric]
  Fixed typo in comment [Max]

- Patch 21 ('rbd: Pass BlockdevOptionsRbd to qemu_rbd_connect()'):
  Removed NULL check that is redundant with schema validation [Max]
  Fixed memory leaks on the error path [Max]

- Patch 23 ('rbd: Assing s->snap/image_name in qemu_rbd_open()'):
  Fixed typo in the subject line

- Patch 24 ('rbd: Use qemu_rbd_connect() in qemu_rbd_do_create()'):
  Don't ignore password_secret, but pass it to qemu_rbd_connect() [Max]

- Patch 27 ('sheepdog: QAPIfy "redundacy" create option'):
  Fixed typo in the subject line and commit message

- Patch 28 ('sheepdog: Support .bdrv_co_create'):
  Set the 'driver' option for bdrv_open() [Max]


git-backport-diff compared to v2:

Key:
[----] : patches are identical
[####] : number of functional differences between upstream/downstream patch
[down] : patch is downstream-only
The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively

001/36:[----] [--] 'block/qapi: Introduce BlockdevCreateOptions'
002/36:[----] [--] 'block/qapi: Add qcow2 create options to schema'
003/36:[----] [--] 'qcow2: Let qcow2_create() handle protocol layer'
004/36:[----] [--] 'qcow2: Pass BlockdevCreateOptions to qcow2_create2()'
005/36:[----] [--] 'qcow2: Use BlockdevRef in qcow2_create2()'
006/36:[----] [--] 'qcow2: Use QCryptoBlockCreateOptions in qcow2_create2()'
007/36:[----] [--] 'qcow2: Handle full/falloc preallocation in qcow2_create2()'
008/36:[----] [--] 'util: Add qemu_opts_to_qdict_filtered()'
009/36:[----] [--] 'test-qemu-opts: Test qemu_opts_append()'
010/36:[----] [--] 'test-qemu-opts: Test qemu_opts_to_qdict_filtered()'
011/36:[0018] [FC] 'qdict: Introduce qdict_rename_keys()'
012/36:[----] [--] 'qcow2: Use visitor for options in qcow2_create()'
013/36:[----] [--] 'block: Make bdrv_is_whitelisted() public'
014/36:[----] [--] 'block: x-blockdev-create QMP command'
015/36:[----] [--] 'file-posix: Support .bdrv_co_create'
016/36:[----] [--] 'file-win32: Support .bdrv_co_create'
017/36:[----] [--] 'gluster: Support .bdrv_co_create'
018/36:[----] [--] 'rbd: Fix use after free in qemu_rbd_set_keypairs() error path'
019/36:[----] [--] 'rbd: Factor out qemu_rbd_connect()'
020/36:[----] [--] 'rbd: Remove non-schema options from runtime_opts'
021/36:[0013] [FC] 'rbd: Pass BlockdevOptionsRbd to qemu_rbd_connect()'
022/36:[----] [--] 'rbd: Support .bdrv_co_create'
023/36:[down] 'rbd: Assign s->snap/image_name in qemu_rbd_open()'
024/36:[0002] [FC] 'rbd: Use qemu_rbd_connect() in qemu_rbd_do_create()'
025/36:[----] [--] 'nfs: Use QAPI options in nfs_client_open()'
026/36:[----] [--] 'nfs: Support .bdrv_co_create'
027/36:[down] 'sheepdog: QAPIfy "redundancy" create option'
028/36:[0002] [FC] 'sheepdog: Support .bdrv_co_create'
029/36:[----] [--] 'ssh: Use QAPI BlockdevOptionsSsh object'
030/36:[----] [--] 'ssh: QAPIfy host-key-check option'
031/36:[----] [--] 'ssh: Pass BlockdevOptionsSsh to connect_to_ssh()'
032/36:[----] [--] 'ssh: Support .bdrv_co_create'
033/36:[----] [--] 'file-posix: Fix no-op bdrv_truncate() with falloc preallocation'
034/36:[----] [--] 'block: Fail bdrv_truncate() with negative size'
035/36:[----] [--] 'qemu-iotests: Test qcow2 over file image creation with QMP'
036/36:[----] [--] 'qemu-iotests: Test ssh image creation over QMP'


Kevin Wolf (36):
  block/qapi: Introduce BlockdevCreateOptions
  block/qapi: Add qcow2 create options to schema
  qcow2: Let qcow2_create() 

[Qemu-block] [PATCH v3 01/36] block/qapi: Introduce BlockdevCreateOptions

2018-02-23 Thread Kevin Wolf
This creates a BlockdevCreateOptions union type that will contain all of
the options for image creation. We'll start out with an empty struct
type BlockdevCreateNotSupported for all drivers.

Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
Reviewed-by: Max Reitz 
---
 qapi/block-core.json | 62 
 1 file changed, 62 insertions(+)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 5c5921bfb7..d256cefc79 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -3359,6 +3359,68 @@
 { 'command': 'blockdev-del', 'data': { 'node-name': 'str' } }
 
 ##
+# @BlockdevCreateNotSupported:
+#
+# This is used for all drivers that don't support creating images.
+#
+# Since: 2.12
+##
+{ 'struct': 'BlockdevCreateNotSupported', 'data': {}}
+
+##
+# @BlockdevCreateOptions:
+#
+# Options for creating an image format on a given node.
+#
+# @driver   block driver to create the image format
+#
+# Since: 2.12
+##
+{ 'union': 'BlockdevCreateOptions',
+  'base': {
+      'driver': 'BlockdevDriver' },
+  'discriminator': 'driver',
+  'data': {
+      'blkdebug':       'BlockdevCreateNotSupported',
+      'blkverify':      'BlockdevCreateNotSupported',
+      'bochs':          'BlockdevCreateNotSupported',
+      'cloop':          'BlockdevCreateNotSupported',
+      'dmg':            'BlockdevCreateNotSupported',
+      'file':           'BlockdevCreateNotSupported',
+      'ftp':            'BlockdevCreateNotSupported',
+      'ftps':           'BlockdevCreateNotSupported',
+      'gluster':        'BlockdevCreateNotSupported',
+      'host_cdrom':     'BlockdevCreateNotSupported',
+      'host_device':    'BlockdevCreateNotSupported',
+      'http':           'BlockdevCreateNotSupported',
+      'https':          'BlockdevCreateNotSupported',
+      'iscsi':          'BlockdevCreateNotSupported',
+      'luks':           'BlockdevCreateNotSupported',
+      'nbd':            'BlockdevCreateNotSupported',
+      'nfs':            'BlockdevCreateNotSupported',
+      'null-aio':       'BlockdevCreateNotSupported',
+      'null-co':        'BlockdevCreateNotSupported',
+      'nvme':           'BlockdevCreateNotSupported',
+      'parallels':      'BlockdevCreateNotSupported',
+      'qcow2':          'BlockdevCreateNotSupported',
+      'qcow':           'BlockdevCreateNotSupported',
+      'qed':            'BlockdevCreateNotSupported',
+      'quorum':         'BlockdevCreateNotSupported',
+      'raw':            'BlockdevCreateNotSupported',
+      'rbd':            'BlockdevCreateNotSupported',
+      'replication':    'BlockdevCreateNotSupported',
+      'sheepdog':       'BlockdevCreateNotSupported',
+      'ssh':            'BlockdevCreateNotSupported',
+      'throttle':       'BlockdevCreateNotSupported',
+      'vdi':            'BlockdevCreateNotSupported',
+      'vhdx':           'BlockdevCreateNotSupported',
+      'vmdk':           'BlockdevCreateNotSupported',
+      'vpc':            'BlockdevCreateNotSupported',
+      'vvfat':          'BlockdevCreateNotSupported',
+      'vxhs':           'BlockdevCreateNotSupported'
+  } }
+
+##
 # @blockdev-open-tray:
 #
 # Opens a block device's tray. If there is a block driver state tree inserted as
-- 
2.13.6




[Qemu-block] [PATCH v3 06/36] qcow2: Use QCryptoBlockCreateOptions in qcow2_create2()

2018-02-23 Thread Kevin Wolf
Instead of passing the encryption format name and the QemuOpts down, use
the QCryptoBlockCreateOptions contained in BlockdevCreateOptions.

Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
Reviewed-by: Max Reitz 
---
 block/qcow2.c | 62 +++
 1 file changed, 45 insertions(+), 17 deletions(-)
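
After this change the qcow2 driver derives its header crypt method from the
format field of the crypto options rather than from a format name string.
The mapping can be sketched in isolation. The enum names below are
hypothetical stand-ins; the numeric header values (0 none, 1 AES, 2 LUKS)
are taken from my reading of the qcow2 format specification and should be
treated as an assumption here.

```c
#include <assert.h>

/* Hypothetical stand-in for the crypto block format enum. */
typedef enum CryptoFormat {
    CRYPTO_FORMAT_QCOW,     /* legacy qcow AES encryption */
    CRYPTO_FORMAT_LUKS,
    CRYPTO_FORMAT_OTHER     /* any format qcow2 cannot store */
} CryptoFormat;

/* Assumed values of the crypt_method field in the qcow2 header. */
enum {
    CRYPT_NONE = 0,
    CRYPT_AES  = 1,
    CRYPT_LUKS = 2
};

/* Returns the header crypt method for @format, or -1 if qcow2 has no
 * representation for it. */
static int crypt_method_from_format(CryptoFormat format)
{
    switch (format) {
    case CRYPTO_FORMAT_LUKS:
        return CRYPT_LUKS;
    case CRYPTO_FORMAT_QCOW:
        return CRYPT_AES;
    default:
        return -1;
    }
}
```

The error return corresponds to the "Crypto format not supported in qcow2"
case in the patch.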

diff --git a/block/qcow2.c b/block/qcow2.c
index b34924b0f0..9a2028b3cf 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2414,13 +2414,10 @@ static int qcow2_crypt_method_from_format(const char *encryptfmt)
 }
 }
 
-static int qcow2_set_up_encryption(BlockDriverState *bs, const char *encryptfmt,
-   QemuOpts *opts, Error **errp)
+static QCryptoBlockCreateOptions *
+qcow2_parse_encryption(const char *encryptfmt, QemuOpts *opts, Error **errp)
 {
-BDRVQcow2State *s = bs->opaque;
 QCryptoBlockCreateOptions *cryptoopts = NULL;
-QCryptoBlock *crypto = NULL;
-int ret = -EINVAL;
 QDict *options, *encryptopts;
 int fmt;
 
@@ -2443,10 +2440,31 @@ static int qcow2_set_up_encryption(BlockDriverState *bs, const char *encryptfmt,
 error_setg(errp, "Unknown encryption format '%s'", encryptfmt);
 break;
 }
-if (!cryptoopts) {
-ret = -EINVAL;
-goto out;
+
+QDECREF(encryptopts);
+return cryptoopts;
+}
+
+static int qcow2_set_up_encryption(BlockDriverState *bs,
+   QCryptoBlockCreateOptions *cryptoopts,
+   Error **errp)
+{
+BDRVQcow2State *s = bs->opaque;
+QCryptoBlock *crypto = NULL;
+int fmt, ret;
+
+switch (cryptoopts->format) {
+case Q_CRYPTO_BLOCK_FORMAT_LUKS:
+fmt = QCOW_CRYPT_LUKS;
+break;
+case Q_CRYPTO_BLOCK_FORMAT_QCOW:
+fmt = QCOW_CRYPT_AES;
+break;
+default:
+error_setg(errp, "Crypto format not supported in qcow2");
+return -EINVAL;
 }
+
 s->crypt_method_header = fmt;
 
 crypto = qcrypto_block_create(cryptoopts, "encrypt.",
@@ -2454,8 +2472,7 @@ static int qcow2_set_up_encryption(BlockDriverState *bs, const char *encryptfmt,
   qcow2_crypto_hdr_write_func,
   bs, errp);
 if (!crypto) {
-ret = -EINVAL;
-goto out;
+return -EINVAL;
 }
 
 ret = qcow2_update_header(bs);
@@ -2464,10 +2481,9 @@ static int qcow2_set_up_encryption(BlockDriverState *bs, const char *encryptfmt,
 goto out;
 }
 
+ret = 0;
  out:
-QDECREF(encryptopts);
 qcrypto_block_free(crypto);
-qapi_free_QCryptoBlockCreateOptions(cryptoopts);
 return ret;
 }
 
@@ -2732,8 +2748,7 @@ static uint64_t qcow2_opt_get_refcount_bits_del(QemuOpts *opts, int version,
 return refcount_bits;
 }
 
-static int qcow2_create2(BlockdevCreateOptions *create_options,
- QemuOpts *opts, const char *encryptfmt, Error **errp)
+static int qcow2_create2(BlockdevCreateOptions *create_options, Error **errp)
 {
 BlockdevCreateOptionsQcow2 *qcow2_opts;
 QDict *options;
@@ -2963,8 +2978,8 @@ static int qcow2_create2(BlockdevCreateOptions *create_options,
 }
 
 /* Want encryption? There you go. */
-if (encryptfmt) {
-ret = qcow2_set_up_encryption(blk_bs(blk), encryptfmt, opts, errp);
+if (qcow2_opts->has_encrypt) {
+ret = qcow2_set_up_encryption(blk_bs(blk), qcow2_opts->encrypt, errp);
 if (ret < 0) {
 goto out;
 }
@@ -3021,6 +3036,7 @@ static int qcow2_create(const char *filename, QemuOpts *opts, Error **errp)
 int version;
 uint64_t refcount_bits;
 char *encryptfmt = NULL;
+QCryptoBlockCreateOptions *cryptoopts = NULL;
 BlockDriverState *bs = NULL;
 Error *local_err = NULL;
 int ret;
@@ -3037,6 +3053,7 @@ static int qcow2_create(const char *filename, QemuOpts *opts, Error **errp)
 ret = -EINVAL;
 goto finish;
 }
+
 encryptfmt = qemu_opt_get_del(opts, BLOCK_OPT_ENCRYPT_FORMAT);
 if (encryptfmt) {
 if (qemu_opt_get(opts, BLOCK_OPT_ENCRYPT)) {
@@ -3048,6 +3065,14 @@ static int qcow2_create(const char *filename, QemuOpts *opts, Error **errp)
 } else if (qemu_opt_get_bool_del(opts, BLOCK_OPT_ENCRYPT, false)) {
 encryptfmt = g_strdup("aes");
 }
+if (encryptfmt) {
+cryptoopts = qcow2_parse_encryption(encryptfmt, opts, errp);
+if (cryptoopts == NULL) {
+ret = -EINVAL;
+goto finish;
+}
+}
+
 cluster_size = qcow2_opt_get_cluster_size_del(opts, &local_err);
 if (local_err) {
 error_propagate(errp, local_err);
@@ -3121,6 +3146,8 @@ static int qcow2_create(const char *filename, QemuOpts *opts, Error **errp)
 .backing_file   = backing_file,
 .has_backing_fmt= (backing_fmt != NULL),
 .backing_fmt= backing_drv,
+ 

[Qemu-block] [PATCH v3 03/36] qcow2: Let qcow2_create() handle protocol layer

2018-02-23 Thread Kevin Wolf
Currently, qcow2_create() only parses the QemuOpts and then calls
qcow2_create2() for the actual image creation, which includes both the
creation of the actual file on the file system and writing a valid empty
qcow2 image into that file.

The plan is that qcow2_create2() becomes the function that implements
the functionality for a future 'blockdev-create' QMP command, which only
creates the qcow2 layer on an already opened file node.

This is a first step towards that goal: Let's move out anything that
deals with the protocol layer from qcow2_create2() into qcow2_create().
This means that qcow2_create2() doesn't need a file name any more.

Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
Reviewed-by: Max Reitz 
---
 block/qcow2.c | 64 +++
 1 file changed, 38 insertions(+), 26 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 288b5299d8..dc6cdea113 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2725,7 +2725,7 @@ static uint64_t qcow2_opt_get_refcount_bits_del(QemuOpts *opts, int version,
 return refcount_bits;
 }
 
-static int qcow2_create2(const char *filename, int64_t total_size,
+static int qcow2_create2(BlockDriverState *bs, int64_t total_size,
  const char *backing_file, const char *backing_format,
  int flags, size_t cluster_size, PreallocMode prealloc,
  QemuOpts *opts, int version, int refcount_order,
@@ -2751,28 +2751,11 @@ static int qcow2_create2(const char *filename, int64_t total_size,
 Error *local_err = NULL;
 int ret;
 
-if (prealloc == PREALLOC_MODE_FULL || prealloc == PREALLOC_MODE_FALLOC) {
-int64_t prealloc_size =
-qcow2_calc_prealloc_size(total_size, cluster_size, refcount_order);
-qemu_opt_set_number(opts, BLOCK_OPT_SIZE, prealloc_size, &error_abort);
-qemu_opt_set(opts, BLOCK_OPT_PREALLOC, PreallocMode_str(prealloc),
- &error_abort);
-}
-
-ret = bdrv_create_file(filename, opts, &local_err);
+blk = blk_new(BLK_PERM_WRITE | BLK_PERM_RESIZE, BLK_PERM_ALL);
+ret = blk_insert_bs(blk, bs, errp);
 if (ret < 0) {
-error_propagate(errp, local_err);
-return ret;
-}
-
-blk = blk_new_open(filename, NULL, NULL,
-   BDRV_O_RDWR | BDRV_O_RESIZE | BDRV_O_PROTOCOL,
-   &local_err);
-if (blk == NULL) {
-error_propagate(errp, local_err);
-return -EIO;
+goto out;
 }
-
 blk_set_allow_write_beyond_eof(blk, true);
 
 /* Write the header */
@@ -2827,7 +2810,8 @@ static int qcow2_create2(const char *filename, int64_t total_size,
  */
 options = qdict_new();
 qdict_put_str(options, "driver", "qcow2");
-blk = blk_new_open(filename, NULL, options,
+qdict_put_str(options, "file", bs->node_name);
+blk = blk_new_open(NULL, NULL, options,
BDRV_O_RDWR | BDRV_O_RESIZE | BDRV_O_NO_FLUSH,
&local_err);
 if (blk == NULL) {
@@ -2899,7 +2883,8 @@ static int qcow2_create2(const char *filename, int64_t total_size,
  */
 options = qdict_new();
 qdict_put_str(options, "driver", "qcow2");
-blk = blk_new_open(filename, NULL, options,
+qdict_put_str(options, "file", bs->node_name);
+blk = blk_new_open(NULL, NULL, options,
BDRV_O_RDWR | BDRV_O_NO_BACKING | BDRV_O_NO_IO,
&local_err);
 if (blk == NULL) {
@@ -2929,6 +2914,7 @@ static int qcow2_create(const char *filename, QemuOpts *opts, Error **errp)
 uint64_t refcount_bits;
 int refcount_order;
 char *encryptfmt = NULL;
+BlockDriverState *bs = NULL;
 Error *local_err = NULL;
 int ret;
 
@@ -2997,12 +2983,38 @@ static int qcow2_create(const char *filename, QemuOpts *opts, Error **errp)
 
 refcount_order = ctz32(refcount_bits);
 
-ret = qcow2_create2(filename, size, backing_file, backing_fmt, flags,
+/* Create and open the file (protocol layer) */
+if (prealloc == PREALLOC_MODE_FULL || prealloc == PREALLOC_MODE_FALLOC) {
+int64_t prealloc_size =
+qcow2_calc_prealloc_size(size, cluster_size, refcount_order);
+qemu_opt_set_number(opts, BLOCK_OPT_SIZE, prealloc_size, &error_abort);
+qemu_opt_set(opts, BLOCK_OPT_PREALLOC, PreallocMode_str(prealloc),
+ &error_abort);
+}
+
+ret = bdrv_create_file(filename, opts, errp);
+if (ret < 0) {
+goto finish;
+}
+
+bs = bdrv_open(filename, NULL, NULL,
+   BDRV_O_RDWR | BDRV_O_RESIZE | BDRV_O_PROTOCOL, errp);
+if (bs == NULL) {
+ret = -EIO;
+goto finish;
+}
+
+/* Create the qcow2 image (format layer) */
+ret = qcow2_create2(bs, size, backing_file, backing_fmt, flags,
 cluster_size, prealloc, opts, version, refcount_order,
-encryptfmt, 

[Qemu-block] [PATCH v3 02/36] block/qapi: Add qcow2 create options to schema

2018-02-23 Thread Kevin Wolf
Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
Reviewed-by: Max Reitz 
---
 qapi/block-core.json | 45 -
 1 file changed, 44 insertions(+), 1 deletion(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index d256cefc79..74b864d64e 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -3359,6 +3359,49 @@
 { 'command': 'blockdev-del', 'data': { 'node-name': 'str' } }
 
 ##
+# @BlockdevQcow2Version:
+#
+# @v2:  The original QCOW2 format as introduced in qemu 0.10 (version 2)
+# @v3:  The extended QCOW2 format as introduced in qemu 1.1 (version 3)
+#
+# Since: 2.12
+##
+{ 'enum': 'BlockdevQcow2Version',
+  'data': [ 'v2', 'v3' ] }
+
+
+##
+# @BlockdevCreateOptionsQcow2:
+#
+# Driver specific image creation options for qcow2.
+#
+# @file             Node to create the image format on
+# @size             Size of the virtual disk in bytes
+# @version          Compatibility level (default: v3)
+# @backing-file     File name of the backing file if a backing file
+#                   should be used
+# @backing-fmt      Name of the block driver to use for the backing file
+# @encrypt          Encryption options if the image should be encrypted
+# @cluster-size     qcow2 cluster size in bytes (default: 65536)
+# @preallocation    Preallocation mode for the new image (default: off)
+# @lazy-refcounts   True if refcounts may be updated lazily (default: off)
+# @refcount-bits    Width of reference counts in bits (default: 16)
+#
+# Since: 2.12
+##
+{ 'struct': 'BlockdevCreateOptionsQcow2',
+  'data': { 'file':             'BlockdevRef',
+            'size':             'size',
+            '*version':         'BlockdevQcow2Version',
+            '*backing-file':    'str',
+            '*backing-fmt':     'BlockdevDriver',
+            '*encrypt':         'QCryptoBlockCreateOptions',
+            '*cluster-size':    'size',
+            '*preallocation':   'PreallocMode',
+            '*lazy-refcounts':  'bool',
+            '*refcount-bits':   'int' } }
+
+##
 # @BlockdevCreateNotSupported:
 #
 # This is used for all drivers that don't support creating images.
@@ -3402,7 +3445,7 @@
   'null-co':'BlockdevCreateNotSupported',
   'nvme':   'BlockdevCreateNotSupported',
   'parallels':  'BlockdevCreateNotSupported',
-  'qcow2':  'BlockdevCreateNotSupported',
+  'qcow2':  'BlockdevCreateOptionsQcow2',
   'qcow':   'BlockdevCreateNotSupported',
   'qed':'BlockdevCreateNotSupported',
   'quorum': 'BlockdevCreateNotSupported',
-- 
2.13.6




Re: [Qemu-block] [PATCH v2 21/36] rbd: Pass BlockdevOptionsRbd to qemu_rbd_connect()

2018-02-23 Thread Kevin Wolf
On 23.02.2018 at 17:43, Max Reitz wrote:
> On 2018-02-23 17:19, Kevin Wolf wrote:
> > On 23.02.2018 at 00:25, Max Reitz wrote:
> >> On 2018-02-21 14:53, Kevin Wolf wrote:
> >>> With the conversion to a QAPI options object, the function is now
> >>> prepared to be used in a .bdrv_co_create implementation.
> >>>
> >>> Signed-off-by: Kevin Wolf 
> > 
> >>> -*s_snap = g_strdup(snap);
> >>> -*s_image_name = g_strdup(image_name);
> >>> +*s_snap = g_strdup(opts->snapshot);
> >>> +*s_image_name = g_strdup(opts->image);
> >>>  
> >>>  /* try default location when conf=NULL, but ignore failure */
> >>> -r = rados_conf_read_file(*cluster, conf);
> >>> -if (conf && r < 0) {
> >>> -error_setg_errno(errp, -r, "error reading conf file %s", conf);
> >>> +r = rados_conf_read_file(*cluster, opts->conf);
> >>> +if (opts->has_conf && r < 0) {
> >>
> >> Reading opts->conf without knowing whether opts->has_conf is true is a
> >> bit weird.  Would you mind "s->has_conf ? opts->conf : NULL" for the
> >> rados_conf_read() call?
> >>
> >> On that thought, opts->snapshot and opts->user are optional, too.  Are
> >> they guaranteed to be NULL if they haven't been specified?  Should we
> >> guard those accesses with opts->has_* queries, too?
> > 
> > These days, both the QMP marshalling code (for the outermost struct when
> > called from x-blockdev-create) and the input visitor (for nested structs
> > and non-QMP callers) initialise the objects with {0} and g_malloc0().
> > 
> > I think Markus once told me that I shouldn't do pointless has_* checks
> > any more in QMP commands, so I intentionally did the same here.
> 
> I'm a bit cautious because of non-zero defaults (like sslverify in the
> ssh driver), but as long as you're aware...

I still hope that QAPI will allow specifying default values in the
schema sometime. But yes, for the time being, not checking has_*
obviously only works when the default is 0/false/NULL.
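
A minimal sketch of the point about defaults, assuming a hypothetical
options struct (DemoOpts and the demo_* helpers are illustrative names,
not QEMU's generated QAPI types). With a zero-filled object an absent
optional pointer is already NULL and can be read unguarded, but a member
whose default is non-zero still needs its has_* flag:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a generated options struct. */
typedef struct DemoOpts {
    bool has_conf;
    char *conf;          /* optional, default NULL */
    bool has_sslverify;
    bool sslverify;      /* optional, default true (non-zero) */
} DemoOpts;

/* Allocate the object zero-filled, as calloc() or g_malloc0() would. */
static DemoOpts *demo_opts_new(void)
{
    return calloc(1, sizeof(DemoOpts));
}

/* No has_conf check needed: an absent member is already NULL. */
static const char *demo_conf(const DemoOpts *opts)
{
    return opts->conf;
}

/* The has_* check is required here: zero is not the default. */
static bool demo_sslverify(const DemoOpts *opts)
{
    return opts->has_sslverify ? opts->sslverify : true;
}
```

Reading opts->sslverify directly on a fresh object would yield false,
which is the wrong default in this sketch.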

Kevin




Re: [Qemu-block] [PATCH v8 09/21] null: Switch to .bdrv_co_block_status()

2018-02-23 Thread Kevin Wolf
On 23.02.2018 at 17:43, Eric Blake wrote:
> > OFFSET_VALID | DATA might be excusable because I can see that it's
> > convenient that a protocol driver refers to itself as *file instead of
> > returning NULL there and then the offset is valid (though it would be
> > pointless to actually follow the file pointer), but OFFSET_VALID without
> > DATA probably isn't.
> 
> So OFFSET_VALID | DATA for a protocol BDS is not just convenient, but
> necessary to avoid breaking qemu-img map output.  But you are also right
> that OFFSET_VALID without data makes little sense at a protocol layer. So
> with that in mind, I'm auditing all of the protocol layers to make sure
> OFFSET_VALID ends up as something sane.

That's one way to look at it.

The other way is that qemu-img map shouldn't ask the protocol layer for
its offset because it already knows the offset (it is what it passes as
a parameter to bdrv_co_block_status).

Anyway, it's probably not worth changing the interface, we should just
make sure that the return values of the individual drivers are
consistent.

Kevin



Re: [Qemu-block] [PATCH v8 09/21] null: Switch to .bdrv_co_block_status()

2018-02-23 Thread Eric Blake

On 02/14/2018 06:05 AM, Kevin Wolf wrote:


+static int coroutine_fn null_co_block_status(BlockDriverState *bs,



  if (s->read_zeroes) {
-return BDRV_BLOCK_OFFSET_VALID | start | BDRV_BLOCK_ZERO;
-} else {
-return BDRV_BLOCK_OFFSET_VALID | start;
+ret |= BDRV_BLOCK_ZERO;
  }
+return ret;
  }


Preexisting, but I think this return value is wrong. OFFSET_VALID
without DATA is documented to have the following semantics:

  * DATA ZERO OFFSET_VALID
  *  f    t        t       sectors preallocated, read as zero, returned file not
  *                        necessarily zero at offset
  *  f    f        t       sectors preallocated but read from backing_hd,
  *                        returned file contains garbage at offset

I'm not sure what OFFSET_VALID is even supposed to mean for null.


I'm finally getting around to playing with this.



Or in fact, what it is supposed to mean for any protocol driver, because
normally it just means I can use this offset for accessing bs->file. But
protocol drivers don't have a bs->file, so it's interesting to see that
they still all set this flag.


More precisely, it means "I can use this offset for accessing the 
returned *file".  Format and filter drivers set *file = bs->file (ie. 
their protocol layer), but protocol drivers set *file = bs (ie. 
themselves).  As long as you read it as "the offset is valid in the 
returned *file", and are careful as to _which_ BDS gets returned in 
*file*, it can still make sense.
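
A rough sketch of that reading, using made-up types and flag values
(DemoNode, the DEMO_BLOCK_* constants and the 64 KiB header offset are
assumptions for illustration, not QEMU's definitions). A protocol node
reports itself in *file with an identity mapping, while a format node
reports its child:

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative flag values; not taken from QEMU's headers. */
enum {
    DEMO_BLOCK_DATA         = 1 << 0,
    DEMO_BLOCK_ZERO         = 1 << 1,
    DEMO_BLOCK_OFFSET_VALID = 1 << 2,
};

typedef struct DemoNode {
    const char *name;
    struct DemoNode *file;   /* child node; NULL for a protocol node */
} DemoNode;

/* Protocol node: no child, so *file is the node itself and the
 * mapped offset equals the requested offset. */
static int demo_protocol_status(DemoNode *bs, int64_t offset,
                                int64_t *map, DemoNode **file)
{
    *map = offset;
    *file = bs;
    return DEMO_BLOCK_DATA | DEMO_BLOCK_OFFSET_VALID;
}

/* Format node: *file is the child, and the mapped offset is where
 * the data lives inside that child (here: after a pretend header). */
static int demo_format_status(DemoNode *bs, int64_t offset,
                              int64_t *map, DemoNode **file)
{
    *map = offset + 65536;
    *file = bs->file;
    return DEMO_BLOCK_DATA | DEMO_BLOCK_OFFSET_VALID;
}
```

In both cases the caller interprets *map relative to whatever node came
back in *file, never relative to the node it queried.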


So next I tried playing with a patch, to see how much returning 
OFFSET_VALID with DATA matters; and it turns out it is easily observable 
anywhere that the underlying protocol bleeds through to the format layer 
(particularly the raw format driver):


$ echo abc > tmp
$ truncate --size=10M tmp

pre-patch:
$ ./qemu-img map --output=json tmp
[{ "start": 0, "length": 4096, "depth": 0, "zero": false, "data": true, 
"offset": 0},
{ "start": 4096, "length": 10481664, "depth": 0, "zero": true, "data": 
false, "offset": 4096}]


turn off OFFSET_VALID at the protocol layer:
diff --git i/block/file-posix.c w/block/file-posix.c
index f1591c38490..c05992c1121 100644
--- i/block/file-posix.c
+++ w/block/file-posix.c
@@ -2158,9 +2158,7 @@ static int coroutine_fn raw_co_block_status(BlockDriverState *bs,


 if (!want_zero) {
 *pnum = bytes;
-*map = offset;
-*file = bs;
-return BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
+return BDRV_BLOCK_DATA;
 }

 ret = find_allocation(bs, offset, &data, &hole);
@@ -2183,9 +2181,7 @@ static int coroutine_fn raw_co_block_status(BlockDriverState *bs,

 *pnum = MIN(bytes, data - offset);
 ret = BDRV_BLOCK_ZERO;
 }
-*map = offset;
-*file = bs;
-return ret | BDRV_BLOCK_OFFSET_VALID;
+return ret;
 }

 static coroutine_fn BlockAIOCB *raw_aio_pdiscard(BlockDriverState *bs,


post-patch:
$ ./qemu-img map --output=json tmp
[{ "start": 0, "length": 4096, "depth": 0, "zero": false, "data": true},
{ "start": 4096, "length": 10481664, "depth": 0, "zero": true, "data": 
false}]





OFFSET_VALID | DATA might be excusable because I can see that it's
convenient that a protocol driver refers to itself as *file instead of
returning NULL there and then the offset is valid (though it would be
pointless to actually follow the file pointer), but OFFSET_VALID without
DATA probably isn't.


So OFFSET_VALID | DATA for a protocol BDS is not just convenient, but 
necessary to avoid breaking qemu-img map output.  But you are also right 
that OFFSET_VALID without data makes little sense at a protocol layer. 
So with that in mind, I'm auditing all of the protocol layers to make 
sure OFFSET_VALID ends up as something sane.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-block] [PATCH v2 21/36] rbd: Pass BlockdevOptionsRbd to qemu_rbd_connect()

2018-02-23 Thread Max Reitz
On 2018-02-23 17:19, Kevin Wolf wrote:
> On 23.02.2018 at 00:25, Max Reitz wrote:
>> On 2018-02-21 14:53, Kevin Wolf wrote:
>>> With the conversion to a QAPI options object, the function is now
>>> prepared to be used in a .bdrv_co_create implementation.
>>>
>>> Signed-off-by: Kevin Wolf 
> 
>>> -*s_snap = g_strdup(snap);
>>> -*s_image_name = g_strdup(image_name);
>>> +*s_snap = g_strdup(opts->snapshot);
>>> +*s_image_name = g_strdup(opts->image);
>>>  
>>>  /* try default location when conf=NULL, but ignore failure */
>>> -r = rados_conf_read_file(*cluster, conf);
>>> -if (conf && r < 0) {
>>> -error_setg_errno(errp, -r, "error reading conf file %s", conf);
>>> +r = rados_conf_read_file(*cluster, opts->conf);
>>> +if (opts->has_conf && r < 0) {
>>
>> Reading opts->conf without knowing whether opts->has_conf is true is a
>> bit weird.  Would you mind "s->has_conf ? opts->conf : NULL" for the
>> rados_conf_read() call?
>>
>> On that thought, opts->snapshot and opts->user are optional, too.  Are
>> they guaranteed to be NULL if they haven't been specified?  Should we
>> guard those accesses with opts->has_* queries, too?
> 
> These days, both the QMP marshalling code (for the outermost struct when
> called from x-blockdev-create) and the input visitor (for nested structs
> and non-QMP callers) initialise the objects with {0} and g_malloc0().
> 
> I think Markus once told me that I shouldn't do pointless has_* checks
> any more in QMP commands, so I intentionally did the same here.

I'm a bit cautious because of non-zero defaults (like sslverify in the
ssh driver), but as long as you're aware...

Max





Re: [Qemu-block] [PATCH v2 21/36] rbd: Pass BlockdevOptionsRbd to qemu_rbd_connect()

2018-02-23 Thread Kevin Wolf
On 23.02.2018 at 00:25, Max Reitz wrote:
> On 2018-02-21 14:53, Kevin Wolf wrote:
> > With the conversion to a QAPI options object, the function is now
> > prepared to be used in a .bdrv_co_create implementation.
> > 
> > Signed-off-by: Kevin Wolf 

> > -*s_snap = g_strdup(snap);
> > -*s_image_name = g_strdup(image_name);
> > +*s_snap = g_strdup(opts->snapshot);
> > +*s_image_name = g_strdup(opts->image);
> >  
> >  /* try default location when conf=NULL, but ignore failure */
> > -r = rados_conf_read_file(*cluster, conf);
> > -if (conf && r < 0) {
> > -error_setg_errno(errp, -r, "error reading conf file %s", conf);
> > +r = rados_conf_read_file(*cluster, opts->conf);
> > +if (opts->has_conf && r < 0) {
> 
> Reading opts->conf without knowing whether opts->has_conf is true is a
> bit weird.  Would you mind "s->has_conf ? opts->conf : NULL" for the
> rados_conf_read() call?
> 
> On that thought, opts->snapshot and opts->user are optional, too.  Are
> they guaranteed to be NULL if they haven't been specified?  Should we
> guard those accesses with opts->has_* queries, too?

These days, both the QMP marshalling code (for the outermost struct when
called from x-blockdev-create) and the input visitor (for nested structs
and non-QMP callers) initialise the objects with {0} and g_malloc0().

I think Markus once told me that I shouldn't do pointless has_* checks
any more in QMP commands, so I intentionally did the same here.

Kevin




Re: [Qemu-block] [PATCH v2 18/36] rbd: Fix use after free in qemu_rbd_set_keypairs() error path

2018-02-23 Thread Kevin Wolf
On 23.02.2018 at 16:15, Eric Blake wrote:
> On 02/21/2018 07:53 AM, Kevin Wolf wrote:
> > If we want to include the invalid option name in the error message, we
> > can't free the string earlier than that.
> > 
> > Signed-off-by: Kevin Wolf 
> > ---
> >   block/rbd.c | 3 ++-
> >   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> D'oh.  Should this one be cc'd to qemu-stable?

Yes, good point. Adding the CC to this reply, and also adding a Cc: line
in the commit message.
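
For readers skimming the thread, the bug class from the commit message
can be sketched as follows. This is a generic illustration with a
hypothetical helper (demo_check_option and its option names are made
up), not the code in block/rbd.c: the copied name is released only
after the error message that mentions it has been formatted.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical validator: returns 0 for a known option name, or -1
 * after writing a message that names the offending option. */
static int demo_check_option(const char *input, char *err, size_t errlen)
{
    size_t len = strlen(input) + 1;
    char *name = malloc(len);
    int ret = 0;

    if (!name) {
        snprintf(err, errlen, "out of memory");
        return -1;
    }
    memcpy(name, input, len);

    if (strcmp(name, "conf") != 0 && strcmp(name, "user") != 0) {
        /* name is still needed here, so it must not be freed yet */
        snprintf(err, errlen, "Invalid option '%s'", name);
        ret = -1;
    }

    free(name);   /* single release point, after the last use */
    return ret;
}
```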

Kevin



[Qemu-block] [PATCH 5/5] ide: introduce ide_transfer_start_norecurse

2018-02-23 Thread Paolo Bonzini
For the case where the end_transfer_func is also the caller of
ide_transfer_start, the mutual recursion can lead to unlimited
stack usage.  Introduce a new version that can be used to change
tail recursion into a loop, and use it in ide_atapi_cmd_reply_end.

Signed-off-by: Paolo Bonzini 
---
 hw/ide/atapi.c| 35 +++
 hw/ide/core.c | 16 
 include/hw/ide/internal.h |  2 ++
 3 files changed, 33 insertions(+), 20 deletions(-)

diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
index be99a929cf..4df4a66bbe 100644
--- a/hw/ide/atapi.c
+++ b/hw/ide/atapi.c
@@ -248,12 +248,7 @@ void ide_atapi_cmd_reply_end(IDEState *s)
 trace_ide_atapi_cmd_reply_end(s, s->packet_transfer_size,
   s->elementary_transfer_size,
   s->io_buffer_index);
-if (s->packet_transfer_size <= 0) {
-/* end of transfer */
-ide_atapi_cmd_ok(s);
-ide_set_irq(s->bus);
-trace_ide_atapi_cmd_reply_end_eot(s, s->status);
-} else {
+while (s->packet_transfer_size > 0) {
 /* see if a new sector must be read */
 if (s->lba != -1 && s->io_buffer_index >= s->cd_sector_size) {
 if (!s->elementary_transfer_size) {
@@ -279,11 +274,6 @@ void ide_atapi_cmd_reply_end(IDEState *s)
 size = s->cd_sector_size - s->io_buffer_index;
 if (size > s->elementary_transfer_size)
 size = s->elementary_transfer_size;
-s->packet_transfer_size -= size;
-s->elementary_transfer_size -= size;
-s->io_buffer_index += size;
-ide_transfer_start(s, s->io_buffer + s->io_buffer_index - size,
-   size, ide_atapi_cmd_reply_end);
 } else {
 /* a new transfer is needed */
 s->nsector = (s->nsector & ~7) | ATAPI_INT_REASON_IO;
@@ -305,14 +295,27 @@ void ide_atapi_cmd_reply_end(IDEState *s)
 if (size > (s->cd_sector_size - s->io_buffer_index))
 size = (s->cd_sector_size - s->io_buffer_index);
 }
-s->packet_transfer_size -= size;
-s->elementary_transfer_size -= size;
-s->io_buffer_index += size;
 trace_ide_atapi_cmd_reply_end_new(s, s->status);
-ide_transfer_start(s, s->io_buffer + s->io_buffer_index - size,
-   size, ide_atapi_cmd_reply_end);
+}
+s->packet_transfer_size -= size;
+s->elementary_transfer_size -= size;
+s->io_buffer_index += size;
+
+/* Some adapters process PIO data right away.  In that case, we need
+ * to avoid mutual recursion between ide_transfer_start
+ * and ide_atapi_cmd_reply_end.
+ */
+if (!ide_transfer_start_norecurse(s,
+  s->io_buffer + s->io_buffer_index - 
size,
+  size, ide_atapi_cmd_reply_end)) {
+return;
 }
 }
+
+/* end of transfer */
+ide_atapi_cmd_ok(s);
+ide_set_irq(s->bus);
+trace_ide_atapi_cmd_reply_end_eot(s, s->status);
 }
 
 /* send a reply of 'size' bytes in s->io_buffer to an ATAPI command */
diff --git a/hw/ide/core.c b/hw/ide/core.c
index 447d9624df..ddefeb086d 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -529,8 +529,8 @@ static void ide_clear_retry(IDEState *s)
 }
 
 /* prepare data transfer and tell what to do after */
-void ide_transfer_start(IDEState *s, uint8_t *buf, int size,
-EndTransferFunc *end_transfer_func)
+bool ide_transfer_start_norecurse(IDEState *s, uint8_t *buf, int size,
+  EndTransferFunc *end_transfer_func)
 {
 s->data_ptr = buf;
 s->data_end = buf + size;
@@ -540,10 +540,18 @@ void ide_transfer_start(IDEState *s, uint8_t *buf, int size,
 }
 if (!s->bus->dma->ops->start_transfer) {
 s->end_transfer_func = end_transfer_func;
-return;
+return false;
 }
 s->bus->dma->ops->start_transfer(s->bus->dma);
-end_transfer_func(s);
+return true;
+}
+
+void ide_transfer_start(IDEState *s, uint8_t *buf, int size,
+EndTransferFunc *end_transfer_func)
+{
+if (ide_transfer_start_norecurse(s, buf, size, end_transfer_func)) {
+end_transfer_func(s);
+}
 }
 
 static void ide_cmd_done(IDEState *s)
diff --git a/include/hw/ide/internal.h b/include/hw/ide/internal.h
index efaabbd815..1bd93d0a30 100644
--- a/include/hw/ide/internal.h
+++ b/include/hw/ide/internal.h
@@ -624,6 +624,8 @@ void ide_exec_cmd(IDEBus *bus, uint32_t val);
 
 void ide_transfer_start(IDEState *s, uint8_t *buf, int size,
 EndTransferFunc *end_transfer_func);
+bool ide_transfer_start_norecurse(IDEState *s, uint8_t *buf, int size,
+  EndTransferFunc *end_transfer_func);
 void 

[Qemu-block] [PATCH 4/5] atapi: call ide_set_irq before ide_transfer_start

2018-02-23 Thread Paolo Bonzini
The ATAPI_INT_REASON_IO interrupt is raised when I/O starts, but in the
AHCI case ide_set_irq was actually called at the end of a mutual recursion.
Move it early, with the side effect that ide_transfer_start becomes a tail
call in ide_atapi_cmd_reply_end.

Signed-off-by: Paolo Bonzini 
---
 hw/ide/atapi.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
index c0509c8bf5..be99a929cf 100644
--- a/hw/ide/atapi.c
+++ b/hw/ide/atapi.c
@@ -287,6 +287,7 @@ void ide_atapi_cmd_reply_end(IDEState *s)
 } else {
 /* a new transfer is needed */
 s->nsector = (s->nsector & ~7) | ATAPI_INT_REASON_IO;
+ide_set_irq(s->bus);
 byte_count_limit = atapi_byte_count_limit(s);
 trace_ide_atapi_cmd_reply_end_bcl(s, byte_count_limit);
 size = s->packet_transfer_size;
@@ -307,10 +308,9 @@ void ide_atapi_cmd_reply_end(IDEState *s)
 s->packet_transfer_size -= size;
 s->elementary_transfer_size -= size;
 s->io_buffer_index += size;
+trace_ide_atapi_cmd_reply_end_new(s, s->status);
 ide_transfer_start(s, s->io_buffer + s->io_buffer_index - size,
size, ide_atapi_cmd_reply_end);
-ide_set_irq(s->bus);
-trace_ide_atapi_cmd_reply_end_new(s, s->status);
 }
 }
 }
-- 
2.14.3





[Qemu-block] [PATCH 2/5] ide: push end_transfer callback to ide_transfer_halt

2018-02-23 Thread Paolo Bonzini
The callback must be invoked once we get out of the DRQ phase; because
all end_transfer_funcs end up invoking ide_transfer_stop, call it there.
While at it, remove the "notify" argument from ide_transfer_halt; the
code can simply be moved to ide_transfer_stop.

Old PATA controllers have no end_transfer callback, so there is no change
there.  For AHCI the end_transfer_func already called ide_transfer_stop
so the effect is that the PIO FIS is written before the D2H FIS rather
than after, which is arguably an improvement.

However, ahci_end_transfer is now called _after_ the DRQ_STAT has been
cleared, so remove the status check in ahci_end_transfer; it was only
there to ensure the FIS was not written more than once, and this cannot
happen anymore.

Signed-off-by: Paolo Bonzini 
---
 hw/ide/ahci.c |  7 ++-
 hw/ide/core.c | 15 +++
 2 files changed, 9 insertions(+), 13 deletions(-)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index 937bad55fb..c3c6843b76 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -1326,12 +1326,9 @@ out:
 static void ahci_end_transfer(IDEDMA *dma)
 {
 AHCIDevice *ad = DO_UPCAST(AHCIDevice, dma, dma);
-IDEState *s = &ad->port.ifs[0];
 
-if (!(s->status & DRQ_STAT)) {
-/* done with PIO send/receive */
-ahci_write_fis_pio(ad, le32_to_cpu(ad->cur_cmd->status));
-}
+/* done with PIO send/receive */
+ahci_write_fis_pio(ad, le32_to_cpu(ad->cur_cmd->status));
 }
 
 static void ahci_start_dma(IDEDMA *dma, IDEState *s,
diff --git a/hw/ide/core.c b/hw/ide/core.c
index 92f4424dc3..c4710a6f55 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -544,7 +544,6 @@ void ide_transfer_start(IDEState *s, uint8_t *buf, int size,
 }
 s->bus->dma->ops->start_transfer(s->bus->dma);
 end_transfer_func(s);
-s->bus->dma->ops->end_transfer(s->bus->dma);
 }
 
 static void ide_cmd_done(IDEState *s)
@@ -555,26 +554,26 @@ static void ide_cmd_done(IDEState *s)
 }
 
 static void ide_transfer_halt(IDEState *s,
-  void(*end_transfer_func)(IDEState *),
-  bool notify)
+  void(*end_transfer_func)(IDEState *))
 {
 s->end_transfer_func = end_transfer_func;
 s->data_ptr = s->io_buffer;
 s->data_end = s->io_buffer;
 s->status &= ~DRQ_STAT;
-if (notify) {
-ide_cmd_done(s);
-}
 }
 
 void ide_transfer_stop(IDEState *s)
 {
-ide_transfer_halt(s, ide_transfer_stop, true);
+ide_transfer_halt(s, ide_transfer_stop);
+if (s->bus->dma->ops->end_transfer) {
+s->bus->dma->ops->end_transfer(s->bus->dma);
+}
+ide_cmd_done(s);
 }
 
 static void ide_transfer_cancel(IDEState *s)
 {
-ide_transfer_halt(s, ide_transfer_cancel, false);
+ide_transfer_halt(s, ide_transfer_cancel);
 }
 
 int64_t ide_get_sector(IDEState *s)
-- 
2.14.3





[Qemu-block] [PATCH 3/5] ide: do not set s->end_transfer_func to ide_transfer_cancel

2018-02-23 Thread Paolo Bonzini
There is code checking s->end_transfer_func and it was not taught
about ide_transfer_cancel.  We can just use ide_transfer_stop because
s->end_transfer_func is only ever called in the DRQ phase: after
ide_transfer_cancel, the value of s->end_transfer_func is only used
as a marker and never used to actually invoke the function.

Signed-off-by: Paolo Bonzini 
---
 hw/ide/core.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/hw/ide/core.c b/hw/ide/core.c
index c4710a6f55..447d9624df 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -553,10 +553,9 @@ static void ide_cmd_done(IDEState *s)
 }
 }
 
-static void ide_transfer_halt(IDEState *s,
-  void(*end_transfer_func)(IDEState *))
+static void ide_transfer_halt(IDEState *s)
 {
-s->end_transfer_func = end_transfer_func;
+s->end_transfer_func = ide_transfer_stop;
 s->data_ptr = s->io_buffer;
 s->data_end = s->io_buffer;
 s->status &= ~DRQ_STAT;
@@ -564,7 +563,7 @@ static void ide_transfer_halt(IDEState *s,
 
 void ide_transfer_stop(IDEState *s)
 {
-ide_transfer_halt(s, ide_transfer_stop);
+ide_transfer_halt(s);
 if (s->bus->dma->ops->end_transfer) {
 s->bus->dma->ops->end_transfer(s->bus->dma);
 }
@@ -573,7 +572,7 @@ void ide_transfer_stop(IDEState *s)
 
 static void ide_transfer_cancel(IDEState *s)
 {
-ide_transfer_halt(s, ide_transfer_cancel);
+ide_transfer_halt(s);
 }
 
 int64_t ide_get_sector(IDEState *s)
-- 
2.14.3





[Qemu-block] [RFC PATCH 0/5] atapi: change unlimited recursion to while loop

2018-02-23 Thread Paolo Bonzini
Real hardware doesn't have an unlimited stack, so the unlimited
recursion in the ATAPI code smells a bit.  In fact, the call to
ide_transfer_start easily becomes a tail call with a small change
to the code (patch 4).  The remaining four patches move code around
so as to turn the call back to ide_atapi_cmd_reply_end into
another tail call, and then convert the (double) tail recursion into
a while loop.

I'm not sure how this can be tested, apart from adding a READ CD
test to ahci-test (which I don't really have time for now, hence
the RFC tag).  The existing AHCI tests still pass, so patches 1-3
aren't complete crap.
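
The shape of the conversion can be sketched in isolation. The names
below (DemoState, demo_transfer_start_norecurse, demo_reply_end) are
hypothetical and the state is reduced to a few counters; the idea is
that the start function reports whether the transfer completed
synchronously, so the caller loops instead of being called back
recursively:

```c
#include <stdbool.h>

typedef struct DemoState {
    int remaining;    /* bytes left in the packet */
    int chunk;        /* maximum bytes per transfer */
    bool immediate;   /* adapter processes PIO data right away */
    int transfers;    /* number of transfers started */
    bool done;        /* end of transfer reached */
} DemoState;

/* Returns true if the data was consumed synchronously and the caller
 * should carry on; false if it has to wait to be invoked again. */
static bool demo_transfer_start_norecurse(DemoState *s, int size)
{
    (void)size;
    s->transfers++;
    return s->immediate;
}

static void demo_reply_end(DemoState *s)
{
    while (s->remaining > 0) {
        int size = s->remaining < s->chunk ? s->remaining : s->chunk;

        s->remaining -= size;
        if (!demo_transfer_start_norecurse(s, size)) {
            return;   /* resumed later by another demo_reply_end() call */
        }
    }
    s->done = true;   /* end of transfer */
}
```

With an immediate adapter the stack depth stays constant no matter how
many chunks the packet needs; with a deferred one, each later call
starts at most one transfer, as the recursive version would have.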

Paolo

Paolo Bonzini (5):
  ide: push call to end_transfer_func out of start_transfer callback
  ide: push end_transfer callback to ide_transfer_halt
  ide: make ide_transfer_stop idempotent
  atapi: call ide_set_irq before ide_transfer_start
  ide: introduce ide_transfer_start_norecurse

 hw/ide/ahci.c | 12 +++-
 hw/ide/atapi.c| 37 -
 hw/ide/core.c | 37 +++--
 include/hw/ide/internal.h |  3 +++
 4 files changed, 53 insertions(+), 36 deletions(-)

-- 
2.14.3




[Qemu-block] [PATCH 1/5] ide: push call to end_transfer_func out of start_transfer callback

2018-02-23 Thread Paolo Bonzini
Split the PIO transfer across two callbacks, thus pushing the (possibly
recursive) call to end_transfer_func up one level and out of the
AHCI-specific code.

Signed-off-by: Paolo Bonzini 
---
 hw/ide/ahci.c | 7 ++-
 hw/ide/core.c | 9 ++---
 include/hw/ide/internal.h | 1 +
 3 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index e22d7be05f..937bad55fb 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -1321,8 +1321,12 @@ out:
 
 /* Update number of transferred bytes, destroy sglist */
 dma_buf_commit(s, size);
+}
 
-s->end_transfer_func(s);
+static void ahci_end_transfer(IDEDMA *dma)
+{
+AHCIDevice *ad = DO_UPCAST(AHCIDevice, dma, dma);
+IDEState *s = &ad->port.ifs[0];
 
 if (!(s->status & DRQ_STAT)) {
 /* done with PIO send/receive */
@@ -1444,6 +1448,7 @@ static const IDEDMAOps ahci_dma_ops = {
 .restart = ahci_restart,
 .restart_dma = ahci_restart_dma,
 .start_transfer = ahci_start_transfer,
+.end_transfer = ahci_end_transfer,
 .prepare_buf = ahci_dma_prepare_buf,
 .commit_buf = ahci_commit_buf,
 .rw_buf = ahci_dma_rw_buf,
diff --git a/hw/ide/core.c b/hw/ide/core.c
index 257b429381..92f4424dc3 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -532,16 +532,19 @@ static void ide_clear_retry(IDEState *s)
 void ide_transfer_start(IDEState *s, uint8_t *buf, int size,
 EndTransferFunc *end_transfer_func)
 {
-s->end_transfer_func = end_transfer_func;
 s->data_ptr = buf;
 s->data_end = buf + size;
 ide_set_retry(s);
 if (!(s->status & ERR_STAT)) {
 s->status |= DRQ_STAT;
 }
-if (s->bus->dma->ops->start_transfer) {
-s->bus->dma->ops->start_transfer(s->bus->dma);
+if (!s->bus->dma->ops->start_transfer) {
+s->end_transfer_func = end_transfer_func;
+return;
 }
+s->bus->dma->ops->start_transfer(s->bus->dma);
+end_transfer_func(s);
+s->bus->dma->ops->end_transfer(s->bus->dma);
 }
 
 static void ide_cmd_done(IDEState *s)
diff --git a/include/hw/ide/internal.h b/include/hw/ide/internal.h
index 88212f59df..efaabbd815 100644
--- a/include/hw/ide/internal.h
+++ b/include/hw/ide/internal.h
@@ -445,6 +445,7 @@ struct IDEState {
 struct IDEDMAOps {
 DMAStartFunc *start_dma;
 DMAVoidFunc *start_transfer;
+DMAVoidFunc *end_transfer;
 DMAInt32Func *prepare_buf;
 DMAu32Func *commit_buf;
 DMAIntFunc *rw_buf;
-- 
2.14.3





Re: [Qemu-block] [PATCH v2 27/36] sheepdog: QAPIfy "redundacy" create option

2018-02-23 Thread Kevin Wolf
On 21.02.2018 at 14:53, Kevin Wolf wrote:
> The "redundacy" option for Sheepdog image creation is currently a string
> that can encode one or two integers depending on its format, which at
> the same time implicitly selects a mode.
> 
> This patch turns it into a QAPI union and converts the string into such
> a QAPI object before interpreting the values.
> 
> Signed-off-by: Kevin Wolf 

s/redundacy/redundancy/

Both in the subject line and the commit message. Autocompletion is
great, it helps to apply typos consistently.
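
As an illustration of what the quoted commit message describes, here is
a sketch of turning such a string into a tagged union. The type names,
the field names and the "N" versus "X:Y" syntax are assumptions made
for the example, not the QAPI schema from the patch, and no range
checking is done:

```c
#include <stdio.h>

typedef enum {
    DEMO_REDUNDANCY_FULL,
    DEMO_REDUNDANCY_ERASURE_CODED,
} DemoRedundancyType;

typedef struct DemoRedundancy {
    DemoRedundancyType type;   /* selects which union member is valid */
    union {
        struct { int copies; } full;
        struct { int data_strips; int parity_strips; } erasure_coded;
    } u;
} DemoRedundancy;

/* Parse "N" (one integer) or "X:Y" (two integers); returns 0 on
 * success and -1 on any other input, including trailing garbage. */
static int demo_parse_redundancy(const char *str, DemoRedundancy *out)
{
    int a, b;
    char trailing;

    if (sscanf(str, "%d:%d%c", &a, &b, &trailing) == 2) {
        out->type = DEMO_REDUNDANCY_ERASURE_CODED;
        out->u.erasure_coded.data_strips = a;
        out->u.erasure_coded.parity_strips = b;
        return 0;
    }
    if (sscanf(str, "%d%c", &a, &trailing) == 1) {
        out->type = DEMO_REDUNDANCY_FULL;
        out->u.full.copies = a;
        return 0;
    }
    return -1;
}
```

Once parsed, the mode is explicit in the type field instead of being
implied by the string's format.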

Kevin



Re: [Qemu-block] [PATCH v2 33/36] file-posix: Fix no-op bdrv_truncate() with falloc preallocation

2018-02-23 Thread Eric Blake

On 02/21/2018 07:54 AM, Kevin Wolf wrote:

If bdrv_truncate() is called, but the requested size is the same as
before, don't call posix_fallocate(), which returns -EINVAL for length
zero and would therefore make bdrv_truncate() fail.

The problem can be triggered by creating a zero-sized raw image with
'falloc' preallocation mode.

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
---
  block/file-posix.c | 14 +-
  1 file changed, 9 insertions(+), 5 deletions(-)



Reviewed-by: Eric Blake 
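
The guard described in the commit message can be sketched like this,
assuming a hypothetical helper (demo_prealloc_growth is not the function
touched by the patch). Note that posix_fallocate() returns the error
number directly rather than setting errno, so the sketch negates it to
follow the negative-errno convention:

```c
#include <fcntl.h>
#include <sys/types.h>

/* Preallocate only the region a file grows by. Returns 0 on success
 * or a negative error number. */
static int demo_prealloc_growth(int fd, off_t old_size, off_t new_size)
{
    if (new_size <= old_size) {
        /* Nothing to allocate; a zero-length posix_fallocate() call
         * would fail with EINVAL, so skip it. */
        return 0;
    }
    return -posix_fallocate(fd, old_size, new_size - old_size);
}
```

With this check a no-op resize succeeds without touching the file
descriptor at all.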

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-block] [PATCH v2 19/36] rbd: Factor out qemu_rbd_connect()

2018-02-23 Thread Kevin Wolf
On 23.02.2018 at 00:10, Max Reitz wrote:
> On 2018-02-21 14:53, Kevin Wolf wrote:
> > The code to establish an RBD connection is duplicated between open and
> > create. In order to be able to share the code, factor out the code from
> > qemu_rbd_open() as a first step.
> > 
> > Signed-off-by: Kevin Wolf 
> > ---
> >  block/rbd.c | 100 
> > 
> >  1 file changed, 60 insertions(+), 40 deletions(-)
> > 
> > diff --git a/block/rbd.c b/block/rbd.c
> > index 27fa11b473..4bbcce4eca 100644
> > --- a/block/rbd.c
> > +++ b/block/rbd.c
> > @@ -544,32 +544,17 @@ out:
> >  return rados_str;
> >  }
> >  
> > -static int qemu_rbd_open(BlockDriverState *bs, QDict *options, int flags,
> > - Error **errp)
> > +static int qemu_rbd_connect(rados_t *cluster, rados_ioctx_t *io_ctx,
> > +char **s_snap, char **s_image_name,
> > +QDict *options, bool cache, Error **errp)
> 
> Bikeshedding ahead:  Maybe this should be called qemu_rados_connect()?
> I don't know anything about this, but there seems to be a distinction
> between rados_* functions and rbd_* functions -- the former work on the
> pool, the latter on the single block device.
> 
> Since this function only connects to the pool and not to a single device
> within, I think it should be called qemu_rados_connect() instead of
> qemu_rbd_connect().
> 
> (Also because qemu_rbd_connect() seems so similar to qemu_rbd_open().)

I think librados is the lower level interface, and librbd builds a
higher level interface on top of it. But I don't know anything about the
details either.

However, for functions in the block driver, qemu_rbd_* is the only
prefix used, there is no qemu_rados_* function. So I assume the prefix
comes from the block driver name 'rbd' rather than which library it
accesses, and that it would be better to keep qemu_rbd_connect().

> Up to you:
> 
> Reviewed-by: Max Reitz 

Thanks.

Kevin




Re: [Qemu-block] [PATCH v2 18/36] rbd: Fix use after free in qemu_rbd_set_keypairs() error path

2018-02-23 Thread Eric Blake

On 02/21/2018 07:53 AM, Kevin Wolf wrote:

> If we want to include the invalid option name in the error message, we
> can't free the string earlier than that.
>
> Signed-off-by: Kevin Wolf 
> ---
>   block/rbd.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)


D'oh.  Should this one be cc'd to qemu-stable?

Reviewed-by: Eric Blake 

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-block] [PATCH v2 17/36] gluster: Support .bdrv_co_create

2018-02-23 Thread Eric Blake

On 02/21/2018 07:53 AM, Kevin Wolf wrote:

> This adds the .bdrv_co_create driver callback to gluster, which enables
> image creation over QMP.
>
> Signed-off-by: Kevin Wolf 
> ---
>   qapi/block-core.json |  18 ++-
>   block/gluster.c  | 135 ++-
>   2 files changed, 108 insertions(+), 45 deletions(-)



Reviewed-by: Eric Blake 

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-block] [Qemu-devel] [PATCH 0/9] nbd block status base:allocation

2018-02-23 Thread no-reply
Hi,

This series failed build test on ppcle host. Please find the details below.

Subject: [Qemu-devel] [PATCH 0/9] nbd block status base:allocation
Type: series
Message-id: 1518702707-7077-1-git-send-email-vsement...@virtuozzo.com

=== TEST SCRIPT BEGIN ===
#!/bin/bash
# Testing script will be invoked under the git checkout with
# HEAD pointing to a commit that has the patches applied on top of "base"
# branch
set -e
echo "=== ENV ==="
env
echo "=== PACKAGES ==="
rpm -qa
echo "=== TEST BEGIN ==="
INSTALL=$PWD/install
BUILD=$PWD/build
mkdir -p $BUILD $INSTALL
SRC=$PWD
cd $BUILD
$SRC/configure --prefix=$INSTALL
make -j100
# XXX: we need reliable clean up
# make check -j100 V=1
make install
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 - [tag update]  patchew/20180223125047.343-1-be...@igalia.com -> patchew/20180223125047.343-1-be...@igalia.com
Submodule 'capstone' (git://git.qemu.org/capstone.git) registered for path 'capstone'
Submodule 'dtc' (git://git.qemu-project.org/dtc.git) registered for path 'dtc'
Submodule 'roms/QemuMacDrivers' (git://git.qemu.org/QemuMacDrivers.git) registered for path 'roms/QemuMacDrivers'
Submodule 'roms/SLOF' (git://git.qemu-project.org/SLOF.git) registered for path 'roms/SLOF'
Submodule 'roms/ipxe' (git://git.qemu-project.org/ipxe.git) registered for path 'roms/ipxe'
Submodule 'roms/openbios' (git://git.qemu-project.org/openbios.git) registered for path 'roms/openbios'
Submodule 'roms/openhackware' (git://git.qemu-project.org/openhackware.git) registered for path 'roms/openhackware'
Submodule 'roms/qemu-palcode' (git://github.com/rth7680/qemu-palcode.git) registered for path 'roms/qemu-palcode'
Submodule 'roms/seabios' (git://git.qemu-project.org/seabios.git/) registered for path 'roms/seabios'
Submodule 'roms/seabios-hppa' (git://github.com/hdeller/seabios-hppa.git) registered for path 'roms/seabios-hppa'
Submodule 'roms/sgabios' (git://git.qemu-project.org/sgabios.git) registered for path 'roms/sgabios'
Submodule 'roms/skiboot' (git://git.qemu.org/skiboot.git) registered for path 'roms/skiboot'
Submodule 'roms/u-boot' (git://git.qemu-project.org/u-boot.git) registered for path 'roms/u-boot'
Submodule 'roms/vgabios' (git://git.qemu-project.org/vgabios.git/) registered for path 'roms/vgabios'
Submodule 'ui/keycodemapdb' (git://git.qemu.org/keycodemapdb.git) registered for path 'ui/keycodemapdb'
Cloning into 'capstone'...
Submodule path 'capstone': checked out '22ead3e0bfdb87516656453336160e0a37b066bf'
Cloning into 'dtc'...
Submodule path 'dtc': checked out 'e54388015af1fb4bf04d0bca99caba1074d9cc42'
Cloning into 'roms/QemuMacDrivers'...
Submodule path 'roms/QemuMacDrivers': checked out 'd4e7d7ac663fcb55f1b93575445fcbca372f17a7'
Cloning into 'roms/SLOF'...
Submodule path 'roms/SLOF': checked out 'fa981320a1e0968d6fc1b8de319723ff8212b337'
Cloning into 'roms/ipxe'...
Submodule path 'roms/ipxe': checked out '0600d3ae94f93efd10fc6b3c7420a9557a3a1670'
Cloning into 'roms/openbios'...
Submodule path 'roms/openbios': checked out '54d959d97fb331708767b2fd4a878efd2bbc41bb'
Cloning into 'roms/openhackware'...
Submodule path 'roms/openhackware': checked out 'c559da7c8eec5e45ef1f67978827af6f0b9546f5'
Cloning into 'roms/qemu-palcode'...
Submodule path 'roms/qemu-palcode': checked out 'f3c7e44c70254975df2a00af39701eafbac4d471'
Cloning into 'roms/seabios'...
Submodule path 'roms/seabios': checked out '63451fca13c75870e1703eb3e20584d91179aebc'
Cloning into 'roms/seabios-hppa'...
Submodule path 'roms/seabios-hppa': checked out '649e6202b8d65d46c69f542b1380f840fbe8ab13'
Cloning into 'roms/sgabios'...
Submodule path 'roms/sgabios': checked out 'cbaee52287e5f32373181cff50a00b6c4ac9015a'
Cloning into 'roms/skiboot'...
Submodule path 'roms/skiboot': checked out 'e0ee24c27a172bcf482f6f2bc905e6211c134bcc'
Cloning into 'roms/u-boot'...
Submodule path 'roms/u-boot': checked out 'd85ca029f257b53a96da6c2fb421e78a003a9943'
Cloning into 'roms/vgabios'...
Submodule path 'roms/vgabios': checked out '19ea12c230ded95928ecaef0db47a82231c2e485'
Cloning into 'ui/keycodemapdb'...
Submodule path 'ui/keycodemapdb': checked out '6b3d716e2b6472eb7189d3220552280ef3d832ce'
Switched to a new branch 'test'
7d95dcd iotests: new test 206 for NBD BLOCK_STATUS
906b016 iotests: add file_path helper
015ee72 iotests.py: tiny refactor: move system imports up
1377201 nbd: BLOCK_STATUS for standard get_block_status function: client part
a750bdb nbd/client: fix error messages in nbd_handle_reply_err
6ec6604 block/nbd-client: save first fatal error in nbd_iter_error
1b609ef nbd: BLOCK_STATUS for standard get_block_status function: server part
ac6e460 nbd: change indenting in nbd.h
5e399e1 nbd/server: add nbd_opt_invalid helper

=== OUTPUT BEGIN ===
=== ENV ===
XDG_SESSION_ID=204371
SHELL=/bin/sh
USER=patchew
PATCHEW=/home/patchew/patchew/patchew-cli -s http://patchew.org --nodebug
PATH=/usr/bin:/bin

Re: [Qemu-block] [PATCH v3] qcow2: Replace align_offset() with ROUND_UP()

2018-02-23 Thread Max Reitz
On 2018-02-15 14:10, Alberto Garcia wrote:
> The align_offset() function is equivalent to the ROUND_UP() macro so
> there's no need to use the former. The ROUND_UP() name is also a bit
> more explicit.
> 
> This patch uses ROUND_UP() instead of the slower QEMU_ALIGN_UP()
> because align_offset() already requires that the second parameter is a
> power of two.
> 
> Signed-off-by: Alberto Garcia 
> Reviewed-by: Eric Blake 
> Reviewed-by: Philippe Mathieu-Daudé 
> ---
> v3 is the same as v2, but rebased on top of the current master fixing
> a merge conflict.
> ---
>  block/qcow2-bitmap.c   |  4 ++--
>  block/qcow2-cluster.c  |  4 ++--
>  block/qcow2-refcount.c |  4 ++--
>  block/qcow2-snapshot.c | 10 +-
>  block/qcow2.c  | 14 +++---
>  block/qcow2.h  |  6 --
>  6 files changed, 18 insertions(+), 24 deletions(-)
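
To illustrate why the power-of-two requirement matters, here is a small Python sketch of the two rounding styles. The function names are mine, and the bodies are modelled on my understanding of what ROUND_UP() (mask based) and QEMU_ALIGN_UP() (division based) compute; they are not copies of the macros.

```python
def round_up_pow2(n, d):
    """Mask-based rounding; only valid when d is a power of two."""
    if d <= 0 or d & (d - 1):
        raise ValueError("d must be a power of two")
    return (n + d - 1) & -d


def align_up(n, m):
    """Division-based rounding; works for any positive m."""
    return (n + m - 1) // m * m
```

For power-of-two alignments both give the same result (for example, 1000 rounded up to a multiple of 512 is 1024), but only the division-based form is correct for an alignment such as 3.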

Thanks, applied to my block branch:

https://github.com/XanClic/qemu/commits/block

Max





Re: [Qemu-block] [PATCH 0/3] block/ssh: Add basic .bdrv_truncate()

2018-02-23 Thread Max Reitz
On 2018-02-14 21:49, Max Reitz wrote:
> For (x-)blockdev-create, all protocol drivers that support image
> creation also need to offer a .bdrv_truncate() implementation that
> matches in features.  A previous series of mine brought gluster's and
> sheepdog's implementation up to par regarding preallocated truncation;
> but I forgot about drivers that have a .bdrv_create() but no
> .bdrv_truncate() at all.
> 
> There is only one of these, and that is ssh.  Since libssh2 does not
> seem to know any way of truncating files, we can only support growing
> files -- but that is what we need for (x-)blockdev-create.
> 
> Note that there are also drivers which do not support growing files,
> namely iscsi and file-posix for host devices (maybe more?  I hope not).
> But these also pretty much ignore the specified size on .bdrv_create()
> and just use the size of the existing device.  They just check that the
> specified size does not exceed the actual size, so that pretty much
> matches what their .bdrv_truncate() supports, and we should be fine
> there.
> 
> 
> Max Reitz (3):
>   block/ssh: Pull ssh_grow_file() from ssh_create()
>   block/ssh: Make ssh_grow_file() blocking
>   block/ssh: Add basic .bdrv_truncate()
> 
>  block/ssh.c | 61 
> +
>  1 file changed, 53 insertions(+), 8 deletions(-)

Applied to my block branch.

Max





Re: [Qemu-block] [PATCH] qemu-img: Make resize error message more general

2018-02-23 Thread Max Reitz
On 2018-02-05 17:27, Max Reitz wrote:
> The issue:
> 
>   $ qemu-img resize -f qcow2 foo.qcow2
>   qemu-img: Expecting one image file name
>   Try 'qemu-img --help' for more information
> 
> So we gave an image file name, but we omitted the length.  qemu-img
> thinks the last argument is always the size and removes it immediately
> from argv (by decrementing argc), and tries to verify that it is a valid
> size only at a later point.
> 
> So we do not actually know whether that last argument we called "size"
> is indeed a size or whether the user instead forgot to specify that size
> but did give a file name.
> 
> Therefore, the error message should be more general.
> 
> Bug: https://bugzilla.redhat.com/show_bug.cgi?id=1523458
> Signed-off-by: Max Reitz 
> ---
>  qemu-img.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
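
The ambiguity described in the commit message can be shown with a short Python sketch. It is not qemu-img's code: parse_resize_args is a hypothetical helper and the error text is placeholder wording chosen for the example.

```python
def parse_resize_args(args):
    """Split positional arguments the way the commit message describes.

    `args` holds what is left after option parsing. The last element is
    taken as the size before anything checks that it looks like one.
    """
    generic_error = "Expecting image file name and size"  # placeholder text
    if not args:
        raise ValueError(generic_error)
    *rest, size = args  # the last argument is assumed to be the size
    if len(rest) != 1:
        # Either the file name or the size is missing (or there are extra
        # arguments). We cannot tell which, so the message must not single
        # out one of them.
        raise ValueError(generic_error)
    return rest[0], size
```

With only "foo.qcow2" on the command line, the file name is consumed as the size and no file name remains, which is why a message that mentions only the file name is misleading.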

Applied to my block branch.

Max





Re: [Qemu-block] [PATCH v2 1/2] iotest 033: add misaligned write-zeroes test via truncate

2018-02-23 Thread Max Reitz
On 2018-02-12 14:14, Anton Nefedov wrote:
> This new test case only makes sense for qcow2 while iotest 033 is generic;
> however it matches the test purpose perfectly and also 033 contains those
> do_test() tricks to pass the alignment, which won't look nice being
> duplicated in other tests or moved to the common code.
> 
> Signed-off-by: Anton Nefedov 
> ---
>  tests/qemu-iotests/033 | 28 
>  tests/qemu-iotests/033.out | 13 +
>  2 files changed, 41 insertions(+)
> 
> diff --git a/tests/qemu-iotests/033 b/tests/qemu-iotests/033
> index 2cdfd13..5fa3983 100755
> --- a/tests/qemu-iotests/033
> +++ b/tests/qemu-iotests/033
> @@ -64,6 +64,9 @@ do_test()
>   } | $QEMU_IO $IO_EXTRA_ARGS
>  }
>  
> +echo
> +echo "=== Test aligned and misaligned write zeroes operations ==="
> +
>  for write_zero_cmd in "write -z" "aio_write -z"; do
>  for align in 512 4k; do
>   echo
> @@ -102,7 +105,32 @@ for align in 512 4k; do
>  done
>  done
>  
> +
> +# Trigger truncate that would shrink qcow2 L1 table, which is done by
> +#   clearing one entry (8 bytes) with bdrv_co_pwrite_zeroes()
> +
> +echo
> +echo "=== Test misaligned write zeroes via truncate ==="
> +echo
> +
> +CLUSTER_SIZE=$((64 * 1024))
> +L2_COVERAGE=$(($CLUSTER_SIZE * $CLUSTER_SIZE / 8))
> +_make_test_img $(($L2_COVERAGE * 2))

There should be a _cleanup_test_img before this or this test will fail
with nbd.
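
For reference, the arithmetic behind L2_COVERAGE in the quoted hunk can be checked with a few lines of Python. This is a sketch under the usual qcow2 assumptions (8-byte table entries, one L2 table per cluster); the function names are made up for the example.

```python
L2_ENTRY_SIZE = 8  # bytes per L1/L2 table entry


def l2_coverage(cluster_size):
    """Guest bytes mapped by one L2 table that fills exactly one cluster."""
    entries_per_table = cluster_size // L2_ENTRY_SIZE
    return entries_per_table * cluster_size


def l1_entries_needed(image_size, cluster_size):
    """Number of L1 entries (one per L2 table) for an image of image_size bytes."""
    coverage = l2_coverage(cluster_size)
    return (image_size + coverage - 1) // coverage
```

With 64 KiB clusters one L2 table covers 512 MiB, so an image of twice that size needs two L1 entries. Shrinking it to a single L2_COVERAGE drops exactly one 8-byte entry, which is the misaligned write-zeroes case the test aims for.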

Max





Re: [Qemu-block] [PATCH v2] iotests: Test abnormally large size in compressed cluster descriptor

2018-02-23 Thread Eric Blake

On 02/23/2018 06:50 AM, Alberto Garcia wrote:

> L2 entries for compressed clusters have a field that indicates the
> number of sectors used to store the data in the image.
>
> That's however not the size of the compressed data itself, just the
> number of sectors where that data is located. The actual data size is
> usually not a multiple of the sector size, and therefore cannot be
> represented with this field.

> Another effect of increasing the size field is that it can make it
> include data from the following host cluster. In this case 'qemu-img
> check' will detect that the refcounts are not correct, and we'll need
> to rebuild them.

Indeed, tweaking sizes can affect refcount computations.

> Additionally, this patch also tests that decreasing the size corrupts
> the image since the original data can no longer be recovered. In this
> case QEMU returns an error when trying to read the compressed data,
> but 'qemu-img check' doesn't see anything wrong if the refcounts are
> consistent.
>
> One possible task for the future is to make 'qemu-img check' verify
> the sizes of the compressed clusters, by trying to decompress the data
> and checking that the size stored in the L2 entry is correct.

Indeed, but that means...

> +
> +# Reduce size of compressed data to 4 sectors: this corrupts the image.
> +poke_file "$TEST_IMG" $((0x80)) "\x40\x06"
> +$QEMU_IO -c "read  -P 0x11 0 4M" "$TEST_IMG" 2>&1 | _filter_qemu_io | _filter_testdir
> +
> +# 'qemu-img check' however doesn't see anything wrong because it
> +# doesn't try to decompress the data and the refcounts are consistent.
> +_check_test_img

...this spot should have a TODO comment that mentions the test needs
updating if qemu-img check is taught to be pickier.

> +
> +# Increase size of compressed data to the maximum (8192 sectors).
> +# This makes QEMU read more data (8192 sectors instead of 5), but the
> +# decompression algorithm stops once we have enough to restore the
> +# uncompressed cluster, so the rest of the data is ignored.
> +poke_file "$TEST_IMG" $((0x80)) "\x7f\xfe"
> +
> +# Here the image is too small so we're asking QEMU to read beyond the
> +# end of the image.
> +$QEMU_IO -c "read  -P 0x11  0 4M" "$TEST_IMG" 2>&1 | _filter_qemu_io | _filter_testdir
> +# But if we grow the image we won't be reading beyond its end anymore.
> +$QEMU_IO -c "write -P 0x22 4M 4M" "$TEST_IMG" 2>&1 | _filter_qemu_io | _filter_testdir
> +$QEMU_IO -c "read  -P 0x11  0 4M" "$TEST_IMG" 2>&1 | _filter_qemu_io | _filter_testdir
> +
> +# The refcount data is however wrong because due to the increased size
> +# of the compressed data it now reaches the following host cluster.
> +# This can be repaired by qemu-img check.
> +_check_test_img -r all
> +$QEMU_IO -c "read  -P 0x11  0 4M" "$TEST_IMG" 2>&1 | _filter_qemu_io | _filter_testdir
> +$QEMU_IO -c "read  -P 0x22 4M 4M" "$TEST_IMG" 2>&1 | _filter_qemu_io | _filter_testdir


Thanks - this indeed tests more scenarios than v1.

With the TODO comment added,
Reviewed-by: Eric Blake 

Hmm - I also wonder - does our refcount code properly account for a 
compressed cluster that would affect the refcount of THREE clusters? 
Remember, qemu will never emit a compressed cluster that touches more 
than two clusters, but when you enlarge the size, if the offset part of the 
link was already in the tail of one cluster, then you can bleed over 
into not just one, but two additional host clusters.  Your test didn't 
cover that, because it uses a compressed cluster that maps to the start 
of the host cluster.
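
The three-cluster case can be made concrete with a small calculation. This Python sketch uses a simplified model (the read starts at the 512-byte sector containing the host offset and spans the given number of sectors); the helper name and the example offsets are assumptions for illustration, not values taken from the test.

```python
SECTOR_SIZE = 512


def host_clusters_touched(host_offset, nb_sectors, cluster_size):
    """Count the host clusters overlapped by a compressed cluster's sectors."""
    start = host_offset - host_offset % SECTOR_SIZE  # sector holding the offset
    end = start + nb_sectors * SECTOR_SIZE           # exclusive end of the span
    first = start // cluster_size
    last = (end - 1) // cluster_size
    return last - first + 1
```

With 2 MB clusters, the maximum of 8192 sectors (4 MB) starting on a cluster boundary overlaps two host clusters, while the same length starting 1.5 MB into a cluster overlaps three.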


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



[Qemu-block] Intermittent failure of iotest 203

2018-02-23 Thread Max Reitz
Hi,

iotest 203 relatively often fails for me, at least when run in parallel.
 When I run the following concurrently on four shells:

$ while TEST_DIR=/tmp/t1 ./check -T -qcow2 203; do :; done
$ while TEST_DIR=/tmp/t2 ./check -T -qcow2 203; do :; done
$ while TEST_DIR=/tmp/t3 ./check -T -qcow2 203; do :; done
$ while TEST_DIR=/tmp/t4 ./check -T -qcow2 203; do :; done

Very quickly (like under ten iterations), at least one of those starts
to hang and then fails because of a timeout in vm.get_qmp_event().

Before digging deeper into the ppoll() dungeon* myself, I decided to
report this so I wouldn't have to. :-)

*Backtrace:

(gdb) bt
#0  0x7f354137b4d6 in ppoll () at /lib64/libc.so.6
#1  0x55b659144299 in ppoll (__ss=0x0, __timeout=0x7ffe4eaca230,
__nfds=<optimized out>, __fds=<optimized out>) at
/usr/include/bits/poll2.h:77
#2  0x55b659144299 in qemu_poll_ns (fds=<optimized out>,
nfds=<optimized out>, timeout=timeout@entry=39512999619000) at
util/qemu-timer.c:334
#3  0x55b6591450a3 in os_host_main_loop_wait (timeout=<optimized out>)
at util/main-loop.c:255
#4  0x55b6591450a3 in main_loop_wait (nonblocking=<optimized out>)
at util/main-loop.c:515
#5  0x55b658d4a253 in main_loop () at vl.c:1933
#6  0x55b658d4a253 in main (argc=<optimized out>, argv=<optimized out>,
envp=<optimized out>) at vl.c:4757

Max





[Qemu-block] [PATCH v2] iotests: Test abnormally large size in compressed cluster descriptor

2018-02-23 Thread Alberto Garcia
L2 entries for compressed clusters have a field that indicates the
number of sectors used to store the data in the image.

That's however not the size of the compressed data itself, just the
number of sectors where that data is located. The actual data size is
usually not a multiple of the sector size, and therefore cannot be
represented with this field.

The way it works is that QEMU reads all the specified sectors and
starts decompressing the data until there's enough to recover the
original uncompressed cluster. If there are any bytes left that
haven't been decompressed they are simply ignored.

One consequence of this is that even if the size field is larger than
it needs to be QEMU can handle it just fine: it will read more data
from disk but it will ignore the extra bytes.

This test creates an image with a compressed cluster that uses 5
sectors (2.5 KB), increases the size field to the maximum (8192
sectors, or 4 MB) and verifies that the data can be read without
problems.

This test is important because while the decompressed data takes
exactly one cluster, the maximum value allowed in the compressed size
field is twice the cluster size. So although QEMU won't produce images
with such large values we need to make sure that it can handle them.
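
As an illustration of how the size field is encoded, the following Python sketch decodes the sector count from an L2 entry. It assumes the layout as I read it in the qcow2 specification (the "additional sectors" field is cluster_bits - 8 bits wide and sits directly below bit 62, and the stored value is one less than the number of sectors); the function names are made up for the example.

```python
def compressed_sector_count(l2_entry, cluster_bits):
    """Sectors spanned by a compressed cluster according to its 64-bit L2 entry."""
    size_bits = cluster_bits - 8   # width of the "additional sectors" field
    shift = 62 - size_bits         # the field ends just below bit 62
    additional = (l2_entry >> shift) & ((1 << size_bits) - 1)
    return additional + 1


def entry_from_top_bytes(hi, lo):
    """Build a 64-bit big-endian entry from its two most significant bytes."""
    return int.from_bytes(bytes([hi, lo]) + bytes(6), "big")
```

For 2 MB clusters (cluster_bits = 21) the field is 13 bits wide, so its maximum is 8192 sectors, or 4 MB, which is twice the cluster size. Under these assumptions, top bytes 0x40 0x08 decode to 5 sectors, 0x40 0x06 to 4 sectors and 0x7f 0xfe to 8192 sectors, matching the values poked by the test.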

Another effect of increasing the size field is that it can make it
include data from the following host cluster. In this case 'qemu-img
check' will detect that the refcounts are not correct, and we'll need
to rebuild them.

Additionally, this patch also tests that decreasing the size corrupts
the image since the original data can no longer be recovered. In this
case QEMU returns an error when trying to read the compressed data,
but 'qemu-img check' doesn't see anything wrong if the refcounts are
consistent.

One possible task for the future is to make 'qemu-img check' verify
the sizes of the compressed clusters, by trying to decompress the data
and checking that the size stored in the L2 entry is correct.

Signed-off-by: Alberto Garcia 
---

v2: We now have two scenarios where we make QEMU read data from the
next host cluster and from beyond the end of the image. This
version also runs qemu-img check on the corrupted image.

If the size field is too small, reading fails but qemu-img check
succeeds.

If the size field is too large, reading succeeds but qemu-img
check fails (this can be repaired, though).

---
 tests/qemu-iotests/122 | 38 ++
 tests/qemu-iotests/122.out | 28 
 2 files changed, 66 insertions(+)

diff --git a/tests/qemu-iotests/122 b/tests/qemu-iotests/122
index 45b359c2ba..fd5f43acc3 100755
--- a/tests/qemu-iotests/122
+++ b/tests/qemu-iotests/122
@@ -130,6 +130,44 @@ $QEMU_IO -c "read -P 01024k 1022k" "$TEST_IMG" 2>&1 | _filter_qemu_io | _fil
 
 
 echo
+echo "=== Corrupted size field in compressed cluster descriptor ==="
+echo
+# Create an empty image, fill half of it with data and compress it.
+# The L2 entry of the first compressed cluster is located at 0x80.
+# The original value is 0x400800a0 (5 sectors for compressed data).
+TEST_IMG="$TEST_IMG".1 _make_test_img 8M
+$QEMU_IO -c "write -P 0x11 0 4M" "$TEST_IMG".1 2>&1 | _filter_qemu_io | _filter_testdir
+$QEMU_IMG convert -c -O qcow2 -o cluster_size=2M "$TEST_IMG".1 "$TEST_IMG"
+
+# Reduce size of compressed data to 4 sectors: this corrupts the image.
+poke_file "$TEST_IMG" $((0x80)) "\x40\x06"
+$QEMU_IO -c "read  -P 0x11 0 4M" "$TEST_IMG" 2>&1 | _filter_qemu_io | _filter_testdir
+
+# 'qemu-img check' however doesn't see anything wrong because it
+# doesn't try to decompress the data and the refcounts are consistent.
+_check_test_img
+
+# Increase size of compressed data to the maximum (8192 sectors).
+# This makes QEMU read more data (8192 sectors instead of 5), but the
+# decompression algorithm stops once we have enough to restore the
+# uncompressed cluster, so the rest of the data is ignored.
+poke_file "$TEST_IMG" $((0x80)) "\x7f\xfe"
+
+# Here the image is too small so we're asking QEMU to read beyond the
+# end of the image.
+$QEMU_IO -c "read  -P 0x11  0 4M" "$TEST_IMG" 2>&1 | _filter_qemu_io | _filter_testdir
+# But if we grow the image we won't be reading beyond its end anymore.
+$QEMU_IO -c "write -P 0x22 4M 4M" "$TEST_IMG" 2>&1 | _filter_qemu_io | _filter_testdir
+$QEMU_IO -c "read  -P 0x11  0 4M" "$TEST_IMG" 2>&1 | _filter_qemu_io | _filter_testdir
+
+# The refcount data is however wrong because due to the increased size
+# of the compressed data it now reaches the following host cluster.
+# This can be repaired by qemu-img check.
+_check_test_img -r all
+$QEMU_IO -c "read  -P 0x11  0 4M" "$TEST_IMG" 2>&1 | _filter_qemu_io | _filter_testdir
+$QEMU_IO -c "read  -P 0x22 4M 4M" "$TEST_IMG" 2>&1 | _filter_qemu_io | _filter_testdir
+
+echo
 echo "=== Full allocation with -S 0 ==="
 echo
 
diff --git a/tests/qemu-iotests/122.out 

Re: [Qemu-block] Limiting coroutine stack usage

2018-02-23 Thread Paolo Bonzini
On 22/02/2018 18:06, John Snow wrote:
> 
> 
> On 02/22/2018 05:57 AM, Kevin Wolf wrote:
>> On 20.02.2018 at 22:54, Paolo Bonzini wrote:
>>> On 20/02/2018 18:04, Peter Lieven wrote:
>>>> Hi,
>>>>
>>>> I remember we discussed a long time ago to limit the stack usage of all
>>>> functions that are executed in a coroutine
>>>> context to a very low value to be able to safely limit the coroutine
>>>> stack size as well.
>>>
>>> IIRC the only issue was that hw/ide/atapi.c has mutual recursion between
>>> ide_atapi_cmd_reply_end -> ide_transfer_start -> ahci_start_transfer ->
>>> ide_atapi_cmd_reply_end.
>>>
>>> But perhaps it's not an issue, somebody needs to audit the code.
>>
>> I think John intended to get rid of the recursion sometime, but I doubt
>> he has had the time so far.
>>
> 
> It hasn't been a priority for me.
> 
> Paolo tried to fix ATAPI by adding a BH callback, but that added the
> possibility of a migration halfway through a data transfer IIRC.
> 
> If anyone wants to tackle it, I'll dig up Paolo's patches.

A better possibility is to make it into tail recursion first and then a
while loop.  Maybe introducing some kind of ide_transfer_start_norecurse
that returns "true" if you have a start_transfer callback (so you need
to do another iteration immediately) and "false" if you don't.  I'll
take a look...
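
To make the idea concrete, here is a rough Python sketch of replacing the mutual recursion with a loop. It is only a toy model under my own assumptions: FakeIDEState, transfer_start_norecurse and cmd_reply_end are hypothetical names, and the real IDE/AHCI code has far more state than this.

```python
class FakeIDEState:
    """Toy stand-in for the device state; all names here are hypothetical."""

    def __init__(self, chunks, has_start_transfer):
        self.pending = list(chunks)   # data still to be handed to the guest
        self.transferred = []
        self.has_start_transfer = has_start_transfer
        self.end_transfer_func = None


def transfer_start_norecurse(s, end_transfer_func):
    """Start one transfer; return True if it already finished.

    True:  a start_transfer callback exists and moved the data right away,
           so the caller should run the next iteration immediately.
    False: no callback; completion is deferred until end_transfer_func
           is invoked later (for example once the guest has read the data).
    """
    if not s.has_start_transfer:
        s.end_transfer_func = end_transfer_func
        return False
    s.transferred.append(s.pending.pop(0))
    return True


def cmd_reply_end(s):
    """Loop over the remaining chunks instead of recursing per chunk."""
    while s.pending:
        if not transfer_start_norecurse(s, cmd_reply_end):
            return
```

Because the loop lives in one stack frame, the stack depth no longer grows with the number of chunks, which is the property needed before coroutine stacks can be made small.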

Paolo




Re: [Qemu-block] [PATCH] iotests: Test abnormally large size in compressed cluster descriptor

2018-02-23 Thread Alberto Garcia
On Thu 22 Feb 2018 08:00:08 PM CET, Eric Blake wrote:
>> One consequence of this is that even if the size field is larger than
>> it needs to be QEMU can handle it just fine: it will read more data
>> from disk but it will ignore the extra bytes.
>
> (is that true even for the corner case when the size field points
> beyond the end of the image?  But not important to the meat of the
> patch)

As a matter of fact this is exactly what happens in this test
case... I'm thinking of expanding it so both cases are tested.

Berto



Re: [Qemu-block] [PATCH] vl: introduce vm_shutdown()

2018-02-23 Thread Fam Zheng
On Tue, 02/20 13:10, Stefan Hajnoczi wrote:
> 1. virtio_scsi_handle_cmd_vq() racing with iothread_stop_all() hits the
>virtio_scsi_ctx_check() assertion failure because the BDS AioContext
>has been modified by iothread_stop_all().

Does this patch fix the issue completely? IIUC virtio_scsi_handle_cmd can
already be entered at the time of main thread calling virtio_scsi_clear_aio(),
so this race condition still exists:

  main thread                           iothread
-----------------------------------------------------------------------------
  vm_shutdown
    ...
      virtio_bus_stop_ioeventfd
        virtio_scsi_dataplane_stop
                                        aio_poll()
                                          ...
                                            virtio_scsi_data_plane_handle_cmd()
          aio_context_acquire(s->ctx)
                                              virtio_scsi_acquire(s).enter
          virtio_scsi_clear_aio()
          aio_context_release(s->ctx)
                                              virtio_scsi_acquire(s).return
                                              virtio_scsi_handle_cmd_vq()
                                                ...
                                                  virtqueue_pop()

Is it possible that the above virtqueue_pop() still returns one element that was
queued before vm_shutdown() was called?

If so, I think we additionally need an "s->ioeventfd_stopped" flag that is
set in virtio_scsi_stop_ioeventfd() and checked in
virtio_scsi_data_plane_handle_cmd().
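
A rough sketch of what I mean, written in Python for brevity. This is a toy model only: the class and method names are hypothetical, and a threading.Lock stands in for the AioContext lock.

```python
import threading


class ScsiDataplaneSketch:
    """Toy model of the proposed check; every name here is hypothetical."""

    def __init__(self):
        self.ctx_lock = threading.Lock()  # stands in for the AioContext lock
        self.ioeventfd_stopped = False
        self.virtqueue = []
        self.handled = []

    def stop_ioeventfd(self):
        """Main thread: mark the dataplane as stopped under the lock."""
        with self.ctx_lock:
            self.ioeventfd_stopped = True

    def data_plane_handle_cmd(self):
        """IOThread: refuse to pop requests once the dataplane was stopped."""
        with self.ctx_lock:
            if self.ioeventfd_stopped:
                return False  # leave queued elements untouched
            while self.virtqueue:
                self.handled.append(self.virtqueue.pop(0))
            return True
```

The flag is read under the same lock that the handler already takes, so a handler that was entered before the stop but acquires the lock afterwards sees the flag and backs off instead of popping from the virtqueue.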

Fam