Re: [Qemu-block] [RFC] transactions: add transaction-wide property

2015-10-20 Thread Eric Blake
On 09/24/2015 03:40 PM, John Snow wrote:
> This replaces the per-action property as in Fam's series.
> Instead, we have a transaction-wide property that is shared
> with each action.
> 
> At the action level, if a property supplied transaction-wide
> is disagreeable, we return error and the transaction is aborted.
> 
> RFC:
> 
> Where this makes sense: Any transactional actions that aren't
> prepared to accept this new paradigm of transactional behavior
> can error_setg and return.
> 
> Where this may not make sense: consider the transactions which
> do not use BlockJobs to perform their actions, i.e. they are
> synchronous during the transactional phase. Because they either
> fail or succeed so early, we might say that of course they can
> support this property ...
> 
> ...however, consider the case where we create a new bitmap and
> perform a full backup, using allow_partial=false. If the backup
> fails, we might well expect the bitmap to be deleted because a
> member of the transaction ultimately/eventually failed. However,
> the bitmap creation was not undone because it does not have a
> pending/delayed abort/commit action -- those are only for jobs
> in this implementation.

The classic example is the 'Abort' transaction, which always fails.  Or
put another way, if you run a transaction that includes both the
creation of a new bitmap, and the abort action, what does the abort do
to the bitmap.

> 
> How do we fix this?
> 
> (1) We just say "No, you can't use the new block job transaction
> completion mechanic in conjunction with these commands,"
> 
> (2) We make Bitmap creation/resetting small, synchronous blockjobs
> that can join the BlockJobTxn

I'm not sure I have an opinion on this one yet.

> 
> Signed-off-by: John Snow 
> ---
>  blockdev.c | 87 
> --
>  blockjob.c |  2 +-
>  qapi-schema.json   | 15 +++--
>  qapi/block-core.json   | 26 ---
>  qmp-commands.hx|  2 +-
>  tests/qemu-iotests/124 | 12 +++
>  6 files changed, 83 insertions(+), 61 deletions(-)
> 
> @@ -1322,6 +1323,12 @@ static void internal_snapshot_prepare(BlkActionState 
> *common,
>  name = internal->name;
>  
>  /* 2. check for validation */
> +if (!common->txn_props->allow_partial) {
> +error_setg(errp,
> +   "internal_snapshot does not support allow_partial = 
> false");

Should the error message say 'allow-partial' to match the QMP?

>  
> +/**
> + * Allocate a TransactionProperties structure if necessary, and fill
> + * that structure with desired defaults if they are unset.
> + */
> +static TransactionProperties 
> *get_transaction_properties(TransactionProperties *props)
> +{
> +if (!props) {
> +props = g_new0(TransactionProperties, 1);
> +}
> +
> +if (!props->has_allow_partial) {
> +props->allow_partial = true;
> +}
> +
> +return props;
> +}

If *props is NULL on entry, then allow_partial is true on exit but
has_allow_partial is still false. I guess that means we're relying on
the rest of the code ignoring has_allow_partial because they used this
function to set defaults.

Yeah, having default support built into qapi would make this nicer. I
don't know if we'll have enough time for qapi defaults to make it in 2.5
(the queue is already quite big for merely getting boxed qmp callback
functions in place).  But that's all internal, and won't matter if it
doesn't get added until 2.6, without affecting what behavior the
external user sees.

> +
>  /*
>   * 'Atomic' group operations.  The operations are performed as a set, and if
>   * any fail then we roll back all operations in the group.
>   */
> -void qmp_transaction(TransactionActionList *dev_list, Error **errp)
> +void qmp_transaction(TransactionActionList *dev_list,
> + bool hasTransactionProperties,

Name this has_props, to be more reminiscent of other qapi naming.

> + struct TransactionProperties *props,
> + Error **errp)
>  {

> -block_job_txn = block_job_txn_new();
> +/* Does this transaction get *canceled* as a group on failure? */
> +props = get_transaction_properties(props);
> +if (props->allow_partial == false) {
> +block_job_txn = block_job_txn_new();
> +}
>  

>  }
> +if (!hasTransactionProperties) {
> +g_free(props);

qapi_free_TransactionProperties(props) is probably better.


> +++ b/qapi-schema.json
> @@ -1505,14 +1505,20 @@
>  { 'union': 'TransactionAction',
>'data': {
> 'blockdev-snapshot-sync': 'BlockdevSnapshot',
> -   'drive-backup': 'DriveBackupTxn',
> -   'blockdev-backup': 'BlockdevBackupTxn',
> +   'drive-backup': 'DriveBackup',
> +   'blockdev-backup': 'BlockdevBackup',

I like that we no longer need the special subclasses just for transactions.

> 'abort': 'Abort',
> 'blockdev-snapshot-internal-sync': 'BlockdevSnapshotInter

Re: [Qemu-block] [RFC] transactions: add transaction-wide property

2015-10-20 Thread Fam Zheng
On Thu, 09/24 17:40, John Snow wrote:
> This replaces the per-action property as in Fam's series.
> Instead, we have a transaction-wide property that is shared
> with each action.
> 
> At the action level, if a property supplied transaction-wide
> is disagreeable, we return error and the transaction is aborted.
> 
> RFC:
> 
> Where this makes sense: Any transactional actions that aren't
> prepared to accept this new paradigm of transactional behavior
> can error_setg and return.
> 
> Where this may not make sense: consider the transactions which
> do not use BlockJobs to perform their actions, i.e. they are
> synchronous during the transactional phase. Because they either
> fail or succeed so early, we might say that of course they can
> support this property ...
> 
> ...however, consider the case where we create a new bitmap and
> perform a full backup, using allow_partial=false. If the backup
> fails, we might well expect the bitmap to be deleted because a
> member of the transaction ultimately/eventually failed. However,
> the bitmap creation was not undone because it does not have a
> pending/delayed abort/commit action -- those are only for jobs
> in this implementation.
> 
> How do we fix this?
> 
> (1) We just say "No, you can't use the new block job transaction
> completion mechanic in conjunction with these commands,"
> 
> (2) We make Bitmap creation/resetting small, synchronous blockjobs
> that can join the BlockJobTxn

We could categorize commands as synchronous ones and long running ones, and
make it explicit that only long running jobs are affected by "allow_partial".

> 
> Signed-off-by: John Snow 
> ---
>  blockdev.c | 87 
> --
>  blockjob.c |  2 +-
>  qapi-schema.json   | 15 +++--
>  qapi/block-core.json   | 26 ---
>  qmp-commands.hx|  2 +-
>  tests/qemu-iotests/124 | 12 +++
>  6 files changed, 83 insertions(+), 61 deletions(-)
> 
> diff --git a/blockdev.c b/blockdev.c
> index 45a9fe7..02b1a83 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -1061,7 +1061,7 @@ static void blockdev_do_action(int kind, void *data, 
> Error **errp)
>  action.data = data;
>  list.value = &action;
>  list.next = NULL;
> -qmp_transaction(&list, errp);
> +qmp_transaction(&list, false, NULL, errp);
>  }
>  
>  void qmp_blockdev_snapshot_sync(bool has_device, const char *device,
> @@ -1286,6 +1286,7 @@ struct BlkActionState {
>  TransactionAction *action;
>  const BlkActionOps *ops;
>  BlockJobTxn *block_job_txn;
> +TransactionProperties *txn_props;
>  QSIMPLEQ_ENTRY(BlkActionState) entry;
>  };
>  
> @@ -1322,6 +1323,12 @@ static void internal_snapshot_prepare(BlkActionState 
> *common,
>  name = internal->name;
>  
>  /* 2. check for validation */
> +if (!common->txn_props->allow_partial) {
> +error_setg(errp,
> +   "internal_snapshot does not support allow_partial = 
> false");
> +return;
> +}
> +
>  blk = blk_by_name(device);
>  if (!blk) {
>  error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND,
> @@ -1473,6 +1480,12 @@ static void external_snapshot_prepare(BlkActionState 
> *common,
>  }
>  
>  /* start processing */
> +if (!common->txn_props->allow_partial) {
> +error_setg(errp,
> +   "external_snapshot does not support allow_partial = 
> false");
> +return;
> +}
> +
>  state->old_bs = bdrv_lookup_bs(has_device ? device : NULL,
> has_node_name ? node_name : NULL,
> &local_err);
> @@ -1603,14 +1616,11 @@ static void drive_backup_prepare(BlkActionState 
> *common, Error **errp)
>  DriveBackupState *state = DO_UPCAST(DriveBackupState, common, common);
>  BlockDriverState *bs;
>  BlockBackend *blk;
> -DriveBackupTxn *backup_txn;
>  DriveBackup *backup;
> -BlockJobTxn *txn = NULL;
>  Error *local_err = NULL;
>  
>  assert(common->action->kind == TRANSACTION_ACTION_KIND_DRIVE_BACKUP);
> -backup_txn = common->action->drive_backup;
> -backup = backup_txn->base;
> +backup = common->action->drive_backup->base;
>  
>  blk = blk_by_name(backup->device);
>  if (!blk) {
> @@ -1624,11 +1634,6 @@ static void drive_backup_prepare(BlkActionState 
> *common, Error **errp)
>  state->aio_context = bdrv_get_aio_context(bs);
>  aio_context_acquire(state->aio_context);
>  
> -if (backup_txn->has_transactional_cancel &&
> -backup_txn->transactional_cancel) {
> -txn = common->block_job_txn;
> -}
> -
>  do_drive_backup(backup->device, backup->target,
>  backup->has_format, backup->format,
>  backup->sync,
> @@ -1637,7 +1642,7 @@ static void drive_backup_prepare(BlkActionState 
> *common, Error **errp)
>  backup->has_bitmap, backup->bitmap,
>  b

[Qemu-block] [RFC] transactions: add transaction-wide property

2015-09-24 Thread John Snow
This replaces the per-action property as in Fam's series.
Instead, we have a transaction-wide property that is shared
with each action.

At the action level, if a property supplied transaction-wide
is disagreeable, we return error and the transaction is aborted.

RFC:

Where this makes sense: Any transactional actions that aren't
prepared to accept this new paradigm of transactional behavior
can error_setg and return.

Where this may not make sense: consider the transactions which
do not use BlockJobs to perform their actions, i.e. they are
synchronous during the transactional phase. Because they either
fail or succeed so early, we might say that of course they can
support this property ...

...however, consider the case where we create a new bitmap and
perform a full backup, using allow_partial=false. If the backup
fails, we might well expect the bitmap to be deleted because a
member of the transaction ultimately/eventually failed. However,
the bitmap creation was not undone because it does not have a
pending/delayed abort/commit action -- those are only for jobs
in this implementation.

How do we fix this?

(1) We just say "No, you can't use the new block job transaction
completion mechanic in conjunction with these commands,"

(2) We make Bitmap creation/resetting small, synchronous blockjobs
that can join the BlockJobTxn

Signed-off-by: John Snow 
---
 blockdev.c | 87 --
 blockjob.c |  2 +-
 qapi-schema.json   | 15 +++--
 qapi/block-core.json   | 26 ---
 qmp-commands.hx|  2 +-
 tests/qemu-iotests/124 | 12 +++
 6 files changed, 83 insertions(+), 61 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 45a9fe7..02b1a83 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1061,7 +1061,7 @@ static void blockdev_do_action(int kind, void *data, 
Error **errp)
 action.data = data;
 list.value = &action;
 list.next = NULL;
-qmp_transaction(&list, errp);
+qmp_transaction(&list, false, NULL, errp);
 }
 
 void qmp_blockdev_snapshot_sync(bool has_device, const char *device,
@@ -1286,6 +1286,7 @@ struct BlkActionState {
 TransactionAction *action;
 const BlkActionOps *ops;
 BlockJobTxn *block_job_txn;
+TransactionProperties *txn_props;
 QSIMPLEQ_ENTRY(BlkActionState) entry;
 };
 
@@ -1322,6 +1323,12 @@ static void internal_snapshot_prepare(BlkActionState 
*common,
 name = internal->name;
 
 /* 2. check for validation */
+if (!common->txn_props->allow_partial) {
+error_setg(errp,
+   "internal_snapshot does not support allow_partial = false");
+return;
+}
+
 blk = blk_by_name(device);
 if (!blk) {
 error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND,
@@ -1473,6 +1480,12 @@ static void external_snapshot_prepare(BlkActionState 
*common,
 }
 
 /* start processing */
+if (!common->txn_props->allow_partial) {
+error_setg(errp,
+   "external_snapshot does not support allow_partial = false");
+return;
+}
+
 state->old_bs = bdrv_lookup_bs(has_device ? device : NULL,
has_node_name ? node_name : NULL,
&local_err);
@@ -1603,14 +1616,11 @@ static void drive_backup_prepare(BlkActionState 
*common, Error **errp)
 DriveBackupState *state = DO_UPCAST(DriveBackupState, common, common);
 BlockDriverState *bs;
 BlockBackend *blk;
-DriveBackupTxn *backup_txn;
 DriveBackup *backup;
-BlockJobTxn *txn = NULL;
 Error *local_err = NULL;
 
 assert(common->action->kind == TRANSACTION_ACTION_KIND_DRIVE_BACKUP);
-backup_txn = common->action->drive_backup;
-backup = backup_txn->base;
+backup = common->action->drive_backup->base;
 
 blk = blk_by_name(backup->device);
 if (!blk) {
@@ -1624,11 +1634,6 @@ static void drive_backup_prepare(BlkActionState *common, 
Error **errp)
 state->aio_context = bdrv_get_aio_context(bs);
 aio_context_acquire(state->aio_context);
 
-if (backup_txn->has_transactional_cancel &&
-backup_txn->transactional_cancel) {
-txn = common->block_job_txn;
-}
-
 do_drive_backup(backup->device, backup->target,
 backup->has_format, backup->format,
 backup->sync,
@@ -1637,7 +1642,7 @@ static void drive_backup_prepare(BlkActionState *common, 
Error **errp)
 backup->has_bitmap, backup->bitmap,
 backup->has_on_source_error, backup->on_source_error,
 backup->has_on_target_error, backup->on_target_error,
-txn, &local_err);
+common->block_job_txn, &local_err);
 if (local_err) {
 error_propagate(errp, local_err);
 return;
@@ -1686,16 +1691,13 @@ static void do_blockdev_backup(const char *device, 
const char *target,
 static void blockdev_backup_prepare(BlkActionState *common