Re: [Qemu-block] [Qemu-devel] [RFC] transactions: add transaction-wide property

2015-10-22 Thread Stefan Hajnoczi
Thanks for summarizing the discussion!

If you are taking over Fam's series, please squash in your patches to
make review easier.

Maybe the names can be improved:

"allow-partial" is not self-explanatory.

"sync-cancel" is misleading since successful completion is affected too,
not just failure/cancel (jobs wait for each other before reporting
successful completion).

How about "transactional-completion" or "group-completion": "none"/"all"?

Stefan



Re: [Qemu-block] [Qemu-devel] [RFC] transactions: add transaction-wide property

2015-10-20 Thread John Snow


On 10/19/2015 03:27 AM, Markus Armbruster wrote:
> John Snow  writes:
> 
>> On 10/16/2015 08:23 AM, Stefan Hajnoczi wrote:
>>> On Mon, Oct 12, 2015 at 12:50:20PM -0400, John Snow wrote:
 Ping -- any consensus on how we should implement the "do-or-die"
 argument for transactions that start block jobs? :)

 This patch may look a little hokey in how it boxes arguments, but I can
 re-do it on top of Eric Blake's very official way of boxing arguments,
 when the QAPI dust settles.
>>>
>>> I don't understand what you are trying to do after staring at the email
>>> for 5 minutes.  Maybe the other reviewers hit the same problem and
>>> haven't responded.
>>>
>>> What is the problem you're trying to solve?
>>>
>>> Stefan
>>>
>>
>> Sorry...
>>
>> What I am trying to do is to add the transactional blocker property to
>> the *transaction* command and not as an argument to each individual action.
>>
>> There was some discussion on this so I wanted to just send an RFC to
>> show what I had in mind.
> 
> Was it the discussion on @transactional-cancel?  I'm on record
> supporting it per transaction rather than per action:
> Message-ID: <87mvwd8k9q@blackfin.pond.sub.org>
> http://lists.gnu.org/archive/html/qemu-devel/2015-09/msg05948.html
> 

Yes, this is the patch trying to illustrate that. I wrote it as an RFC
that sits on top of Fam's v7, to highlight the changes between his and
my approaches.

>> This series applies on top of Fam's latest series and moves the
>> arguments from each action to a transaction-wide property.




Re: [Qemu-block] [Qemu-devel] [RFC] transactions: add transaction-wide property

2015-10-20 Thread John Snow
So here's the current status of this blob:

- Markus supports the idea of a transaction-wide property, but hasn't
reviewed this particular RFC.
- Eric seemed supportive of a transaction-wide property, but hasn't
chimed in to this thread yet.
- Stefan was not sure what this patch was trying to accomplish.
- Fam appears to think that the concept of transactional properties is
worth pursuing and offered some light review on this RFC.

So let's recap a little bit, in a Markus-style breakdown to help bring
everyone back up to speed, Because I would desperately like some input
from the Qapibarons.

== Recap ==

We're trying to add transactional commands to make the incremental
backup QMP primitives (namely dirty bitmap clear and drive-backup with
sync=incremental) useful in an atomic fashion. Currently, you can't
clear a bitmap and start a backup simultaneously, so it's hard to
guarantee the safety of these things.

The primary problem with mixing jobs and transactions is that if a job
later fails, what do we do with the other actions that succeeded?

We need to be able to cancel jobs all together for failures, but we
don't want to break backwards compatibility with workflows that didn't
expect one failed backup to cancel a different backup.

So we need to add a parameter _somewhere_. First attempts put it as a
new parameter into the QMP command, second attempts put it in an
Action-only extension to the QMP command structure.

I propose we stick it on as an argument to the Transaction command
directly. No worrying about if one action has the transactional-cancel
argument and another doesn't, no managing the difference between QMP and
Action versions of command argument structures.

== What this patch is trying to do ==

This RFC patch applies to Fam's V7, and removes his per-action
transactional-cancel arguments and installs instead a transaction-wide
property, named "allow-partial," which I intended to be short for "Allow
partial completion."

The way it does this is meant to emulate a forthcoming feature by Eric
Blake. Instead of adding an "allow-partial" argument directly to
Transactions, it does this:

- Add an optional TransactionProperties structure to Transaction
- Add an optional allow-partial bool to this structure

This way, the transactional arguments are permanently "boxed" and we can
add/remove them at our leisure without disturbing the C prototypes any
further. This may not be necessary after Eric and Markus' qapi/qmp
refactoring decathlon.

The "default" creates allow-partial as "true," the current behavior. Any
Actions that do not support the allow-partial boolean set to false will
report an error immediately during the prepare phase of the transaction.

This way, if this property is set at the transactional level but an
action doesn't "support" or understand that property, it can fail the
transaction to prevent undefined behavior. This restriction can be
potentially lifted in the future on a per-action basis without harm to
backwards compatibility.

This approach is simple and intuitive: you are instructing all actions
as part of a transaction to live or die together. If they are incapable
of complying, they fail up-front and outright. It reduces the C
management burden and it is taxonomically obvious.

>From this approach, we leave ourselves expansion options, as well:
We can always add per-action overrides to transaction-wide properties in
the future if we so desire, but don't have to do that now.

== What happens next? ==

I believe Fam has completed work on his contribution to this set and is
handing the reins back over to me. I would like to push for the
transactional property.

I will prepare a new iteration that assumes transaction-wide properties
are the "way to go" with the understanding that if there is significant
ruckus and unhappiness we can always roll back to Fam's last version if
we all get tired of reading these long emails that I type.

== Potential Problems ==

(Because you know I can't help myself.)

As discussed on-list with Fam as a response to his review of this RFC,
there might be room to change this property from a simple boolean to a
tri-state:

sync-cancel: {none, jobs, all}

where none/false implies no change from the current behavior.
all/true implies that all actions are to live and die together.
"jobs" might imply that every action that creates a job is to
participate in the group cancellation, but any action that does not can
ignore it.

This would require some introspection to the client to be meaningful
("Which actions are jobs?") but it is probably the most powerful.

Thinking about it, I think what I'll do is this:

sync-cancel: {none, all}

And if we want "jobs" at a later date, it's just an enum addition away.

== Any questions? ==

:D

--js

On 09/24/2015 05:40 PM, John Snow wrote:
> This replaces the per-action property as in Fam's series.
> Instead, we have a transaction-wide property that is shared
> with each action.
> 
> At the action level, if a 

Re: [Qemu-block] [Qemu-devel] [RFC] transactions: add transaction-wide property

2015-10-20 Thread John Snow
A little bit of cross-talk with my "state of the union" reply and this
review from Eric.

Sorry, everyone!

On 10/20/2015 04:12 PM, Eric Blake wrote:
> On 09/24/2015 03:40 PM, John Snow wrote:
>> This replaces the per-action property as in Fam's series.
>> Instead, we have a transaction-wide property that is shared
>> with each action.
>>
>> At the action level, if a property supplied transaction-wide
>> is disagreeable, we return error and the transaction is aborted.
>>
>> RFC:
>>
>> Where this makes sense: Any transactional actions that aren't
>> prepared to accept this new paradigm of transactional behavior
>> can error_setg and return.
>>
>> Where this may not make sense: consider the transactions which
>> do not use BlockJobs to perform their actions, i.e. they are
>> synchronous during the transactional phase. Because they either
>> fail or succeed so early, we might say that of course they can
>> support this property ...
>>
>> ...however, consider the case where we create a new bitmap and
>> perform a full backup, using allow_partial=false. If the backup
>> fails, we might well expect the bitmap to be deleted because a
>> member of the transaction ultimately/eventually failed. However,
>> the bitmap creation was not undone because it does not have a
>> pending/delayed abort/commit action -- those are only for jobs
>> in this implementation.
> 
> The classic example is the 'Abort' transaction, which always fails.  Or
> put another way, if you run a transaction that includes both the
> creation of a new bitmap, and the abort action, what does the abort do
> to the bitmap.
> 
>>
>> How do we fix this?
>>
>> (1) We just say "No, you can't use the new block job transaction
>> completion mechanic in conjunction with these commands,"
>>
>> (2) We make Bitmap creation/resetting small, synchronous blockjobs
>> that can join the BlockJobTxn
> 
> I'm not sure I have an opinion on this one yet.
> 

As stated in my other mail, I'm leaning towards (1) with a design that
allows us to switch to (2) per-action as we feel like it.

>>
>> Signed-off-by: John Snow 
>> ---
>>  blockdev.c | 87 
>> --
>>  blockjob.c |  2 +-
>>  qapi-schema.json   | 15 +++--
>>  qapi/block-core.json   | 26 ---
>>  qmp-commands.hx|  2 +-
>>  tests/qemu-iotests/124 | 12 +++
>>  6 files changed, 83 insertions(+), 61 deletions(-)
>>
>> @@ -1322,6 +1323,12 @@ static void internal_snapshot_prepare(BlkActionState 
>> *common,
>>  name = internal->name;
>>  
>>  /* 2. check for validation */
>> +if (!common->txn_props->allow_partial) {
>> +error_setg(errp,
>> +   "internal_snapshot does not support allow_partial = 
>> false");
> 
> Should the error message say 'allow-partial' to match the QMP?
> 

It sure should.

>>  
>> +/**
>> + * Allocate a TransactionProperties structure if necessary, and fill
>> + * that structure with desired defaults if they are unset.
>> + */
>> +static TransactionProperties 
>> *get_transaction_properties(TransactionProperties *props)
>> +{
>> +if (!props) {
>> +props = g_new0(TransactionProperties, 1);
>> +}
>> +
>> +if (!props->has_allow_partial) {
>> +props->allow_partial = true;
>> +}
>> +
>> +return props;
>> +}
> 
> If *props is NULL on entry, then allow_partial is true on exit but
> has_allow_partial is still false. I guess that means we're relying on
> the rest of the code ignoring has_allow_partial because they used this
> function to set defaults.
> 
> Yeah, having default support built into qapi would make this nicer. I
> don't know if we'll have enough time for qapi defaults to make it in 2.5
> (the queue is already quite big for merely getting boxed qmp callback
> functions in place).  But that's all internal, and won't matter if it
> doesn't get added until 2.6, without affecting what behavior the
> external user sees.
> 

Yes, this is goofy. I will change it to a "false by default" phrasing
for the property to avoid this.

>> +
>>  /*
>>   * 'Atomic' group operations.  The operations are performed as a set, and if
>>   * any fail then we roll back all operations in the group.
>>   */
>> -void qmp_transaction(TransactionActionList *dev_list, Error **errp)
>> +void qmp_transaction(TransactionActionList *dev_list,
>> + bool hasTransactionProperties,
> 
> Name this has_props, to be more reminiscent of other qapi naming.
> 

Yes.

>> + struct TransactionProperties *props,
>> + Error **errp)
>>  {
> 
>> -block_job_txn = block_job_txn_new();
>> +/* Does this transaction get *canceled* as a group on failure? */
>> +props = get_transaction_properties(props);
>> +if (props->allow_partial == false) {
>> +block_job_txn = block_job_txn_new();
>> +}
>>  
> 
>>  }
>> +if (!hasTransactionProperties) {
>> +g_free(props);

Re: [Qemu-block] [Qemu-devel] [RFC] transactions: add transaction-wide property

2015-10-19 Thread Markus Armbruster
John Snow  writes:

> On 10/16/2015 08:23 AM, Stefan Hajnoczi wrote:
>> On Mon, Oct 12, 2015 at 12:50:20PM -0400, John Snow wrote:
>>> Ping -- any consensus on how we should implement the "do-or-die"
>>> argument for transactions that start block jobs? :)
>>>
>>> This patch may look a little hokey in how it boxes arguments, but I can
>>> re-do it on top of Eric Blake's very official way of boxing arguments,
>>> when the QAPI dust settles.
>> 
>> I don't understand what you are trying to do after staring at the email
>> for 5 minutes.  Maybe the other reviewers hit the same problem and
>> haven't responded.
>> 
>> What is the problem you're trying to solve?
>> 
>> Stefan
>> 
>
> Sorry...
>
> What I am trying to do is to add the transactional blocker property to
> the *transaction* command and not as an argument to each individual action.
>
> There was some discussion on this so I wanted to just send an RFC to
> show what I had in mind.

Was it the discussion on @transactional-cancel?  I'm on record
supporting it per transaction rather than per action:
Message-ID: <87mvwd8k9q@blackfin.pond.sub.org>
http://lists.gnu.org/archive/html/qemu-devel/2015-09/msg05948.html

> This series applies on top of Fam's latest series and moves the
> arguments from each action to a transaction-wide property.



Re: [Qemu-block] [Qemu-devel] [RFC] transactions: add transaction-wide property

2015-10-19 Thread Fam Zheng
On Mon, 10/19 09:27, Markus Armbruster wrote:
> John Snow  writes:
> 
> > On 10/16/2015 08:23 AM, Stefan Hajnoczi wrote:
> >> On Mon, Oct 12, 2015 at 12:50:20PM -0400, John Snow wrote:
> >>> Ping -- any consensus on how we should implement the "do-or-die"
> >>> argument for transactions that start block jobs? :)
> >>>
> >>> This patch may look a little hokey in how it boxes arguments, but I can
> >>> re-do it on top of Eric Blake's very official way of boxing arguments,
> >>> when the QAPI dust settles.
> >> 
> >> I don't understand what you are trying to do after staring at the email
> >> for 5 minutes.  Maybe the other reviewers hit the same problem and
> >> haven't responded.
> >> 
> >> What is the problem you're trying to solve?
> >> 
> >> Stefan
> >> 
> >
> > Sorry...
> >
> > What I am trying to do is to add the transactional blocker property to
> > the *transaction* command and not as an argument to each individual action.
> >
> > There was some discussion on this so I wanted to just send an RFC to
> > show what I had in mind.
> 
> Was it the discussion on @transactional-cancel?  I'm on record
> supporting it per transaction rather than per action:
> Message-ID: <87mvwd8k9q@blackfin.pond.sub.org>
> http://lists.gnu.org/archive/html/qemu-devel/2015-09/msg05948.html

I prefer we start with a per-transaction flag as in this patch. Any
fine-grained arguments could be added in the future if it turns out to be
useful.

I'll take a look at the implementation later.

Fam

> 
> > This series applies on top of Fam's latest series and moves the
> > arguments from each action to a transaction-wide property.



Re: [Qemu-block] [Qemu-devel] [RFC] transactions: add transaction-wide property

2015-10-16 Thread Stefan Hajnoczi
On Mon, Oct 12, 2015 at 12:50:20PM -0400, John Snow wrote:
> Ping -- any consensus on how we should implement the "do-or-die"
> argument for transactions that start block jobs? :)
> 
> This patch may look a little hokey in how it boxes arguments, but I can
> re-do it on top of Eric Blake's very official way of boxing arguments,
> when the QAPI dust settles.

I don't understand what you are trying to do after staring at the email
for 5 minutes.  Maybe the other reviewers hit the same problem and
haven't responded.

What is the problem you're trying to solve?

Stefan



Re: [Qemu-block] [Qemu-devel] [RFC] transactions: add transaction-wide property

2015-10-16 Thread John Snow


On 10/16/2015 08:23 AM, Stefan Hajnoczi wrote:
> On Mon, Oct 12, 2015 at 12:50:20PM -0400, John Snow wrote:
>> Ping -- any consensus on how we should implement the "do-or-die"
>> argument for transactions that start block jobs? :)
>>
>> This patch may look a little hokey in how it boxes arguments, but I can
>> re-do it on top of Eric Blake's very official way of boxing arguments,
>> when the QAPI dust settles.
> 
> I don't understand what you are trying to do after staring at the email
> for 5 minutes.  Maybe the other reviewers hit the same problem and
> haven't responded.
> 
> What is the problem you're trying to solve?
> 
> Stefan
> 

Sorry...

What I am trying to do is to add the transactional blocker property to
the *transaction* command and not as an argument to each individual action.

There was some discussion on this so I wanted to just send an RFC to
show what I had in mind.

This series applies on top of Fam's latest series and moves the
arguments from each action to a transaction-wide property.



Re: [Qemu-block] [Qemu-devel] [RFC] transactions: add transaction-wide property

2015-10-12 Thread John Snow
Ping -- any consensus on how we should implement the "do-or-die"
argument for transactions that start block jobs? :)

This patch may look a little hokey in how it boxes arguments, but I can
re-do it on top of Eric Blake's very official way of boxing arguments,
when the QAPI dust settles.

--js

On 09/24/2015 05:40 PM, John Snow wrote:
> This replaces the per-action property as in Fam's series.
> Instead, we have a transaction-wide property that is shared
> with each action.
> 
> At the action level, if a property supplied transaction-wide
> is disagreeable, we return error and the transaction is aborted.
> 
> RFC:
> 
> Where this makes sense: Any transactional actions that aren't
> prepared to accept this new paradigm of transactional behavior
> can error_setg and return.
> 
> Where this may not make sense: consider the transactions which
> do not use BlockJobs to perform their actions, i.e. they are
> synchronous during the transactional phase. Because they either
> fail or succeed so early, we might say that of course they can
> support this property ...
> 
> ...however, consider the case where we create a new bitmap and
> perform a full backup, using allow_partial=false. If the backup
> fails, we might well expect the bitmap to be deleted because a
> member of the transaction ultimately/eventually failed. However,
> the bitmap creation was not undone because it does not have a
> pending/delayed abort/commit action -- those are only for jobs
> in this implementation.
> 
> How do we fix this?
> 
> (1) We just say "No, you can't use the new block job transaction
> completion mechanic in conjunction with these commands,"
> 
> (2) We make Bitmap creation/resetting small, synchronous blockjobs
> that can join the BlockJobTxn
> 
> Signed-off-by: John Snow 
> ---
>  blockdev.c | 87 
> --
>  blockjob.c |  2 +-
>  qapi-schema.json   | 15 +++--
>  qapi/block-core.json   | 26 ---
>  qmp-commands.hx|  2 +-
>  tests/qemu-iotests/124 | 12 +++
>  6 files changed, 83 insertions(+), 61 deletions(-)
> 
> diff --git a/blockdev.c b/blockdev.c
> index 45a9fe7..02b1a83 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -1061,7 +1061,7 @@ static void blockdev_do_action(int kind, void *data, 
> Error **errp)
>  action.data = data;
>  list.value = 
>  list.next = NULL;
> -qmp_transaction(, errp);
> +qmp_transaction(, false, NULL, errp);
>  }
>  
>  void qmp_blockdev_snapshot_sync(bool has_device, const char *device,
> @@ -1286,6 +1286,7 @@ struct BlkActionState {
>  TransactionAction *action;
>  const BlkActionOps *ops;
>  BlockJobTxn *block_job_txn;
> +TransactionProperties *txn_props;
>  QSIMPLEQ_ENTRY(BlkActionState) entry;
>  };
>  
> @@ -1322,6 +1323,12 @@ static void internal_snapshot_prepare(BlkActionState 
> *common,
>  name = internal->name;
>  
>  /* 2. check for validation */
> +if (!common->txn_props->allow_partial) {
> +error_setg(errp,
> +   "internal_snapshot does not support allow_partial = 
> false");
> +return;
> +}
> +
>  blk = blk_by_name(device);
>  if (!blk) {
>  error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND,
> @@ -1473,6 +1480,12 @@ static void external_snapshot_prepare(BlkActionState 
> *common,
>  }
>  
>  /* start processing */
> +if (!common->txn_props->allow_partial) {
> +error_setg(errp,
> +   "external_snapshot does not support allow_partial = 
> false");
> +return;
> +}
> +
>  state->old_bs = bdrv_lookup_bs(has_device ? device : NULL,
> has_node_name ? node_name : NULL,
> _err);
> @@ -1603,14 +1616,11 @@ static void drive_backup_prepare(BlkActionState 
> *common, Error **errp)
>  DriveBackupState *state = DO_UPCAST(DriveBackupState, common, common);
>  BlockDriverState *bs;
>  BlockBackend *blk;
> -DriveBackupTxn *backup_txn;
>  DriveBackup *backup;
> -BlockJobTxn *txn = NULL;
>  Error *local_err = NULL;
>  
>  assert(common->action->kind == TRANSACTION_ACTION_KIND_DRIVE_BACKUP);
> -backup_txn = common->action->drive_backup;
> -backup = backup_txn->base;
> +backup = common->action->drive_backup->base;
>  
>  blk = blk_by_name(backup->device);
>  if (!blk) {
> @@ -1624,11 +1634,6 @@ static void drive_backup_prepare(BlkActionState 
> *common, Error **errp)
>  state->aio_context = bdrv_get_aio_context(bs);
>  aio_context_acquire(state->aio_context);
>  
> -if (backup_txn->has_transactional_cancel &&
> -backup_txn->transactional_cancel) {
> -txn = common->block_job_txn;
> -}
> -
>  do_drive_backup(backup->device, backup->target,
>  backup->has_format, backup->format,
>  backup->sync,
> @@ -1637,7 +1642,7 @@ static void