Re: [Qemu-block] [Qemu-devel] [PATCH v4 0/3] qemu-io: clean up cvtnum usage

2015-11-06 Thread John Snow


On 11/06/2015 04:46 AM, Kevin Wolf wrote:
> Am 06.11.2015 um 00:53 hat John Snow geschrieben:
>> cvtnum returns an int64_t, not an int, so correct the lvalue types
>> wherever it is used. While we're at it, make the error messages more
>> meaningful and hopefully less confusing.
>>
>> v4:
>>  - Now missing ALL sweaters
>>
>> v3:
>>  - pulled a lot of loose yarn, now missing my sweater
>>(Updated patch 1 even further, reported-by Kevin)
>>
>> v2:
>>  - Squashed NSIG error-checking from patch 3 into patch 1
>>  - Reported-by credits for Max and Reviewed-by from Eric added
> 
> Thanks, applied to the block branch. (Should we mention in the changelog
> that qemu 2.5 contains some of your sweaters?)
> 
> Kevin
> 

If you want to destroy my sweater, NACK this patch as I walk away (as I
walk away)

--js



Re: [Qemu-block] [PATCH v6 03/15] block: Release dirty bitmaps in bdrv_close()

2015-11-06 Thread John Snow


On 11/04/2015 01:57 PM, Max Reitz wrote:
> bdrv_delete() is not very happy about deleting BlockDriverStates with
> dirty bitmaps still attached to them. In the past, we got around that
> very easily by relying on bdrv_close_all() bypassing bdrv_delete(), and
> bdrv_close() simply ignoring that condition. We should fix that by
> releasing all dirty bitmaps in bdrv_close() and drop the assertion in
> bdrv_delete().
> 
> Signed-off-by: Max Reitz 
> ---
>  block.c | 20 +++-
>  1 file changed, 19 insertions(+), 1 deletion(-)
> 
> diff --git a/block.c b/block.c
> index 3493501..23448ed 100644
> --- a/block.c
> +++ b/block.c
> @@ -87,6 +87,8 @@ static int bdrv_open_inherit(BlockDriverState **pbs, const 
> char *filename,
>   const BdrvChildRole *child_role, Error **errp);
>  
>  static void bdrv_dirty_bitmap_truncate(BlockDriverState *bs);
> +static void bdrv_release_all_dirty_bitmaps(BlockDriverState *bs);
> +
>  /* If non-zero, use only whitelisted block drivers */
>  static int use_bdrv_whitelist;
>  
> @@ -1916,6 +1918,8 @@ void bdrv_close(BlockDriverState *bs)
>  bdrv_drain(bs); /* in case flush left pending I/O */
>  notifier_list_notify(&bs->close_notifiers, bs);
>  
> +bdrv_release_all_dirty_bitmaps(bs);
> +
>  if (bs->blk) {
>  blk_dev_change_media_cb(bs->blk, false);
>  }
> @@ -2123,7 +2127,6 @@ static void bdrv_delete(BlockDriverState *bs)
>  assert(!bs->job);
>  assert(bdrv_op_blocker_is_empty(bs));
>  assert(!bs->refcnt);
> -assert(QLIST_EMPTY(&bs->dirty_bitmaps));
>  
>  bdrv_close(bs);
>  
> @@ -3318,6 +3321,21 @@ void bdrv_release_dirty_bitmap(BlockDriverState *bs, 
> BdrvDirtyBitmap *bitmap)
>  }
>  }
>  
> +/**
> + * Release all dirty bitmaps attached to a BDS, independently of whether they
> + * are frozen or not (for use in bdrv_close()).
> + */

This comment caught my attention ...

> +static void bdrv_release_all_dirty_bitmaps(BlockDriverState *bs)
> +{
> +BdrvDirtyBitmap *bm, *next;
> +QLIST_FOREACH_SAFE(bm, &bs->dirty_bitmaps, list, next) {
> +QLIST_REMOVE(bm, list);
> +hbitmap_free(bm->bitmap);
> +g_free(bm->name);
> +g_free(bm);
> +}
> +}
> +
>  void bdrv_disable_dirty_bitmap(BdrvDirtyBitmap *bitmap)
>  {
>  assert(!bdrv_dirty_bitmap_frozen(bitmap));
> 

Currently, the only way a bitmap could/should be frozen is if it is in
use by a job. If a job is running, you probably shouldn't delete stuff
it is using out from under it.

I am assuming by now that it's actually likely that you've canceled any
jobs attached to this node before you go through the motions of deleting
it, and it should be safe to just call bdrv_release_dirty_bitmap ...

We don't want the two foreach loops though, so maybe just factor out a
helper that bdrv_release_all_dirty_bitmaps and bdrv_release_dirty_bitmap
can share. You can leave the frozen assertion
in just bdrv_release_dirty_bitmap before it invokes the helper, so that
bdrv_delete always succeeds in case something gets out-of-sync in the
internals.

(Hmm, to prevent leaks, if you delete a bitmap that is frozen, you
should probably call the helper on its child object too... or just make
the helper recursive.)



Re: [Qemu-block] [Qemu-devel] [PATCH v10 12/14] block: add transactional properties

2015-11-06 Thread John Snow


On 11/06/2015 03:32 AM, Kevin Wolf wrote:
> Am 05.11.2015 um 19:52 hat John Snow geschrieben:
>>
>>
>> On 11/05/2015 05:47 AM, Stefan Hajnoczi wrote:
>>> On Tue, Nov 03, 2015 at 12:27:19PM -0500, John Snow wrote:


 On 11/03/2015 10:17 AM, Stefan Hajnoczi wrote:
> On Fri, Oct 23, 2015 at 07:56:50PM -0400, John Snow wrote:
>> @@ -1732,6 +1757,10 @@ static void 
>> block_dirty_bitmap_add_prepare(BlkActionState *common,
>>  BlockDirtyBitmapState *state = DO_UPCAST(BlockDirtyBitmapState,
>>   common, common);
>>  
>> +if (action_check_cancel_mode(common, errp) < 0) {
>> +return;
>> +}
>> +
>>  action = common->action->block_dirty_bitmap_add;
>>  /* AIO context taken and released within qmp_block_dirty_bitmap_add 
>> */
>>  qmp_block_dirty_bitmap_add(action->node, action->name,
>> @@ -1767,6 +1796,10 @@ static void 
>> block_dirty_bitmap_clear_prepare(BlkActionState *common,
>>   common, common);
>>  BlockDirtyBitmap *action;
>>  
>> +if (action_check_cancel_mode(common, errp) < 0) {
>> +return;
>> +}
>> +
>>  action = common->action->block_dirty_bitmap_clear;
>>  state->bitmap = block_dirty_bitmap_lookup(action->node,
>>action->name,
>
> Why do the bitmap add/clear actions not support err-cancel=all?
>
> I understand why other block jobs don't support it, but it's not clear
> why these non-block job actions cannot.
>

 Because they don't have a callback to invoke if the rest of the job fails.

 I could create a BlockJob for them complete with a callback to invoke,
 but basically it's just because there's no interface to unwind them, or
 an interface to join them with the transaction.

 They're small, synchronous non-job actions. Which makes them weird.
>>>
>>> Funny, we've been looking at the same picture while seeing different
>>> things:
>>> https://en.wikipedia.org/wiki/Rabbit%E2%80%93duck_illusion
>>>
>>> I think I understand your idea: the transaction should include both
>>> immediate actions as well as block jobs.
>>>
>>> My mental model was different: immediate actions commit/abort along with
>>> the 'transaction' command.  Block jobs are separate and complete/cancel
>>> together in a group.
>>>
>>> In practice I think the two end up being similar because we won't be
>>> able to implement immediate action commit/abort together with
>>> long-running block jobs because the immediate actions rely on
>>> quiescing/pausing the guest for atomic commit/abort.
>>>
>>> So with your mental model the QMP client has to submit 2 'transaction'
>>> commands: 1 for the immediate actions, 1 for the block jobs.
>>>
>>> In my mental model the QMP client submits 1 command but the immediate
>>> actions and block jobs are two separate transaction scopes.  This means
>>> if the block jobs fail, the client needs to be aware of the immediate
>>> actions that have committed.  Because of this, it becomes just as much
>>> client effort as submitting two separate 'transaction' commands in your
>>> model.
>>>
>>> Can anyone see a practical difference?  I think I'm happy with John's
>>> model.
>>>
>>> Stefan
>>>
>>
>> We discussed this off-list, but for the sake of the archive:
>>
>> == How it is now ==
>>
>> Currently, transactions have two implicit phases: the first is the
>> synchronous phase. If this phase completes successfully, we consider the
>> transaction a success. The second phase is the asynchronous phase where
>> jobs launched by the synchronous phase run to completion.
>>
>> all synchronous commands must complete for the transaction to "succeed."
>> There are currently (pre-patch) no guarantees about asynchronous command
>> completion. As long as all synchronous actions complete, asynchronous
>> actions are free to succeed or fail individually.
> 
> You're overcomplicating this. All actions are currently synchronous and
> what you consider asynchronous transaction actions aren't actually part
> of the transaction at all. The action is "start block job X", not "run
> block job X".
> 

"OK," except the entire goal of the series was to allow the "aren't
actually part of the transaction at all" parts to live and die together
based on the transaction they were launched from. This really implies
they belong to a second asynchronous phase of the transaction.

Otherwise, why would totally unrelated jobs fail because a different one
did?

I realize this is an /extension/ of the existing semantics vs a "fix,"
and part of the confusion is how I and everyone else was looking at it
differently. How could this happen, though? Let's look at our
transaction documentation:

"Operations that are currently supported:" [...] "- drive-backup" [...]
"If there is any failure
perf

Re: [Qemu-block] [PATCH] mirror: Improve zero-write and discard with fragmented image

2015-11-06 Thread Max Reitz
On 06.11.2015 11:22, Fam Zheng wrote:
> The "pnum < nb_sectors" condition in deciding whether to actually copy
> data is unnecessarily strict, and the qiov initialization is
> unnecessary as well, for both the bdrv_aio_write_zeroes and
> bdrv_aio_discard branches.
> 
> Reorganize mirror_iteration flow so that we:
> 
> 1) Find the contiguous zero/discarded sectors with
> bdrv_get_block_status_above() before deciding what to do. We query
> s->buf_size sized blocks at a time.
> 
> 2) If the sectors in question are zeroed/discarded and aligned to
> target cluster, issue zero write or discard accordingly. It's done
> in mirror_do_zero_or_discard, where we don't add buffer to qiov.
> 
> 3) Otherwise, do the same loop as before in mirror_do_read.
> 
> Signed-off-by: Fam Zheng 
> ---
>  block/mirror.c | 161 
> +
>  1 file changed, 127 insertions(+), 34 deletions(-)

Looks good overall, some comments below.

Max

> diff --git a/block/mirror.c b/block/mirror.c
> index b1252a1..ac796b4 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -157,28 +157,21 @@ static void mirror_read_complete(void *opaque, int ret)
>  mirror_write_complete, op);
>  }
>  
> -static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
> +static uint64_t mirror_do_read(MirrorBlockJob *s)
>  {
>  BlockDriverState *source = s->common.bs;
> -int nb_sectors, sectors_per_chunk, nb_chunks;
> -int64_t end, sector_num, next_chunk, next_sector, hbitmap_next_sector;
> +int sectors_per_chunk, nb_sectors, nb_chunks;
> +int64_t end, next_chunk, next_sector, hbitmap_next_sector, sector_num;
>  uint64_t delay_ns = 0;
>  MirrorOp *op;
> -int pnum;
> -int64_t ret;
>  
> -s->sector_num = hbitmap_iter_next(&s->hbi);
> -if (s->sector_num < 0) {
> -bdrv_dirty_iter_init(s->dirty_bitmap, &s->hbi);
> -s->sector_num = hbitmap_iter_next(&s->hbi);
> -trace_mirror_restart_iter(s, bdrv_get_dirty_count(s->dirty_bitmap));
> -assert(s->sector_num >= 0);
> -}
> +sector_num = s->sector_num;
> +sectors_per_chunk = s->granularity >> BDRV_SECTOR_BITS;
> +end = s->bdev_length / BDRV_SECTOR_SIZE;
>  
> +next_sector = sector_num;
> +next_chunk = sector_num / sectors_per_chunk;

@next_sector and @next_chunk set here...

>  hbitmap_next_sector = s->sector_num;
> -sector_num = s->sector_num;
> -sectors_per_chunk = s->granularity >> BDRV_SECTOR_BITS;
> -end = s->bdev_length / BDRV_SECTOR_SIZE;
>  
>  /* Extend the QEMUIOVector to include all adjacent blocks that will
>   * be copied in this operation.
> @@ -198,14 +191,6 @@ static uint64_t coroutine_fn 
> mirror_iteration(MirrorBlockJob *s)
>  next_sector = sector_num;
>  next_chunk = sector_num / sectors_per_chunk;

...and here already. So the above seems superfluous, considering that
they are not read in between.

(If you keep hbitmap_next_sector = s->sector_num; above the sector_num =
... block, that may reduce conflicts further)

>  
> -/* Wait for I/O to this cluster (from a previous iteration) to be done.  
> */
> -while (test_bit(next_chunk, s->in_flight_bitmap)) {
> -trace_mirror_yield_in_flight(s, sector_num, s->in_flight);
> -s->waiting_for_io = true;
> -qemu_coroutine_yield();
> -s->waiting_for_io = false;
> -}
> -
>  do {
>  int added_sectors, added_chunks;
>  
> @@ -301,24 +286,132 @@ static uint64_t coroutine_fn 
> mirror_iteration(MirrorBlockJob *s)
>  s->sectors_in_flight += nb_sectors;
>  trace_mirror_one_iteration(s, sector_num, nb_sectors);
>  
> -ret = bdrv_get_block_status_above(source, NULL, sector_num,
> -  nb_sectors, &pnum);
> -if (ret < 0 || pnum < nb_sectors ||
> -(ret & BDRV_BLOCK_DATA && !(ret & BDRV_BLOCK_ZERO))) {
> -bdrv_aio_readv(source, sector_num, &op->qiov, nb_sectors,
> -   mirror_read_complete, op);
> -} else if (ret & BDRV_BLOCK_ZERO) {
> +bdrv_aio_readv(source, sector_num, &op->qiov, nb_sectors,
> +   mirror_read_complete, op);
> +return delay_ns;
> +}
> +
> +static uint64_t mirror_do_zero_or_discard(MirrorBlockJob *s,
> +  int64_t sector_num,
> +  int nb_sectors,
> +  bool is_discard)
> +{
> +int sectors_per_chunk, nb_chunks;
> +int64_t next_chunk, next_sector, hbitmap_next_sector;
> +uint64_t delay_ns = 0;
> +MirrorOp *op;
> +
> +sectors_per_chunk = s->granularity >> BDRV_SECTOR_BITS;
> +assert(nb_sectors >= sectors_per_chunk);
> +next_chunk = sector_num / sectors_per_chunk;
> +nb_chunks = DIV_ROUND_UP(nb_sectors, sectors_per_chunk);
> +bitmap_set(s->in_flight_bitmap, next_chunk, nb_chunks);
> +delay_ns = ratelimit_calculate_de
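
The reorganized flow that the commit message describes in steps 1)-3) can be sketched as a small decision function. Names follow the patch, but the types and the status callback here are simplified stand-ins, not QEMU's API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BDRV_BLOCK_DATA 1
#define BDRV_BLOCK_ZERO 2

/* Stand-in for bdrv_get_block_status_above(): returns status flags (or a
 * negative error) and sets *pnum to the contiguous run length in sectors. */
typedef int (*block_status_fn)(int64_t sector_num, int nb_sectors, int *pnum);

typedef enum { MIRROR_READ, MIRROR_ZERO, MIRROR_DISCARD } MirrorMethod;

/* Steps 1)-3): query block status first; zero/discard cluster-aligned runs
 * without building a qiov; otherwise fall back to the ordinary copy loop. */
static MirrorMethod mirror_choose_method(block_status_fn status,
                                         int64_t sector_num,
                                         int nb_sectors,
                                         int sectors_per_chunk,
                                         int *pnum)
{
    int ret = status(sector_num, nb_sectors, pnum);

    if (ret < 0) {
        return MIRROR_READ;         /* status unknown: just copy the data */
    }

    bool aligned = (sector_num % sectors_per_chunk == 0) &&
                   (*pnum % sectors_per_chunk == 0);

    if (!aligned || ((ret & BDRV_BLOCK_DATA) && !(ret & BDRV_BLOCK_ZERO))) {
        return MIRROR_READ;         /* allocated data or unaligned run */
    }
    if (ret & BDRV_BLOCK_ZERO) {
        return MIRROR_ZERO;         /* zeroed on source: write zeroes */
    }
    return MIRROR_DISCARD;          /* unallocated: discard on the target */
}

/* Test stub: every sector reads back as zeroes. */
static int status_all_zero(int64_t sector_num, int nb_sectors, int *pnum)
{
    (void)sector_num;
    *pnum = nb_sectors;
    return BDRV_BLOCK_ZERO;
}
```

The alignment check is why the patch only takes the zero/discard path for runs aligned to the target cluster size.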

Re: [Qemu-block] [Qemu-devel] [PATCH v10 12/14] block: add transactional properties

2015-11-06 Thread Stefan Hajnoczi
On Fri, Nov 06, 2015 at 09:32:19AM +0100, Kevin Wolf wrote:
> Am 05.11.2015 um 19:52 hat John Snow geschrieben:
> > 
> > 
> > On 11/05/2015 05:47 AM, Stefan Hajnoczi wrote:
> > > On Tue, Nov 03, 2015 at 12:27:19PM -0500, John Snow wrote:
> > >>
> > >>
> > >> On 11/03/2015 10:17 AM, Stefan Hajnoczi wrote:
> > >>> On Fri, Oct 23, 2015 at 07:56:50PM -0400, John Snow wrote:
> >  @@ -1732,6 +1757,10 @@ static void 
> >  block_dirty_bitmap_add_prepare(BlkActionState *common,
> >   BlockDirtyBitmapState *state = DO_UPCAST(BlockDirtyBitmapState,
> >    common, common);
> >   
> >  +if (action_check_cancel_mode(common, errp) < 0) {
> >  +return;
> >  +}
> >  +
> >   action = common->action->block_dirty_bitmap_add;
> >   /* AIO context taken and released within 
> >  qmp_block_dirty_bitmap_add */
> >   qmp_block_dirty_bitmap_add(action->node, action->name,
> >  @@ -1767,6 +1796,10 @@ static void 
> >  block_dirty_bitmap_clear_prepare(BlkActionState *common,
> >    common, common);
> >   BlockDirtyBitmap *action;
> >   
> >  +if (action_check_cancel_mode(common, errp) < 0) {
> >  +return;
> >  +}
> >  +
> >   action = common->action->block_dirty_bitmap_clear;
> >   state->bitmap = block_dirty_bitmap_lookup(action->node,
> > action->name,
> > >>>
> > >>> Why do the bitmap add/clear actions not support err-cancel=all?
> > >>>
> > >>> I understand why other block jobs don't support it, but it's not clear
> > >>> why these non-block job actions cannot.
> > >>>
> > >>
> > >> Because they don't have a callback to invoke if the rest of the job 
> > >> fails.
> > >>
> > >> I could create a BlockJob for them complete with a callback to invoke,
> > >> but basically it's just because there's no interface to unwind them, or
> > >> an interface to join them with the transaction.
> > >>
> > >> They're small, synchronous non-job actions. Which makes them weird.
> > > 
> > > Funny, we've been looking at the same picture while seeing different
> > > things:
> > > https://en.wikipedia.org/wiki/Rabbit%E2%80%93duck_illusion
> > > 
> > > I think I understand your idea: the transaction should include both
> > > immediate actions as well as block jobs.
> > > 
> > > My mental model was different: immediate actions commit/abort along with
> > > the 'transaction' command.  Block jobs are separate and complete/cancel
> > > together in a group.
> > > 
> > > In practice I think the two end up being similar because we won't be
> > > able to implement immediate action commit/abort together with
> > > long-running block jobs because the immediate actions rely on
> > > quiescing/pausing the guest for atomic commit/abort.
> > > 
> > > So with your mental model the QMP client has to submit 2 'transaction'
> > > commands: 1 for the immediate actions, 1 for the block jobs.
> > > 
> > > In my mental model the QMP client submits 1 command but the immediate
> > > actions and block jobs are two separate transaction scopes.  This means
> > > if the block jobs fail, the client needs to be aware of the immediate
> > > actions that have committed.  Because of this, it becomes just as much
> > > client effort as submitting two separate 'transaction' commands in your
> > > model.
> > > 
> > > Can anyone see a practical difference?  I think I'm happy with John's
> > > model.
> > > 
> > > Stefan
> > > 
> > 
> > We discussed this off-list, but for the sake of the archive:
> > 
> > == How it is now ==
> > 
> > Currently, transactions have two implicit phases: the first is the
> > synchronous phase. If this phase completes successfully, we consider the
> > transaction a success. The second phase is the asynchronous phase where
> > jobs launched by the synchronous phase run to completion.
> > 
> > all synchronous commands must complete for the transaction to "succeed."
> > There are currently (pre-patch) no guarantees about asynchronous command
> > completion. As long as all synchronous actions complete, asynchronous
> > actions are free to succeed or fail individually.
> 
> You're overcomplicating this. All actions are currently synchronous and
> what you consider asynchronous transaction actions aren't actually part
> of the transaction at all. The action is "start block job X", not "run
> block job X".

Yes, this is how I've thought of it too.

> > == My Model ==
> > 
> > The current behavior is my "err-cancel = none" scenario: we offer no
> > guarantee about the success or failure of the transaction as a whole
> > after the synchronous portion has completed.
> > 
> > What I was proposing is "err-cancel = all," which to me means that _ALL_
> > commands in this transaction are to succeed (synchronous or not) before
> > _any_ actions are irrevocably committed. This means that f
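
For reference, the gate that the patch adds via action_check_cancel_mode() can be sketched like this; the structures are simplified stand-ins for QEMU's BlkActionState and Error, and the supports_group_cancel field is an assumption for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for QEMU's Error object. */
typedef struct Error {
    const char *msg;
} Error;

typedef enum {
    ERR_CANCEL_NONE,  /* current semantics: actions commit individually     */
    ERR_CANCEL_ALL,   /* proposed: nothing commits unless everything passes */
} ActionErrCancelMode;

typedef struct BlkActionState {
    ActionErrCancelMode err_cancel;
    int supports_group_cancel;  /* 0 for actions with no unwind callback */
} BlkActionState;

static Error no_group_cancel_err = {
    "action does not support err-cancel=all"
};

/* Sketch of the gate: immediate actions such as block-dirty-bitmap-add
 * have no way to be unwound after the synchronous phase, so they must
 * refuse err-cancel=all up front. */
static int action_check_cancel_mode(BlkActionState *common, Error **errp)
{
    if (common->err_cancel == ERR_CANCEL_ALL &&
        !common->supports_group_cancel) {
        *errp = &no_group_cancel_err;
        return -1;
    }
    return 0;
}
```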

Re: [Qemu-block] [PATCH v7 12/35] util: let qemu_fd_getlength support block device

2015-11-06 Thread Eduardo Habkost
On Mon, Nov 02, 2015 at 05:13:14PM +0800, Xiao Guangrong wrote:
> lseek cannot work for all block devices, as the man page says:
> | Some devices are incapable of seeking and POSIX does not specify
> | which devices must support lseek().
> 
> This patch tries to add the support on Linux by using BLKGETSIZE64
> ioctl
> 
> Signed-off-by: Xiao Guangrong 

In which cases is this patch necessary? Do you know of any examples of
Linux block devices that don't work with lseek(SEEK_END)?

> ---
>  util/osdep.c | 20 
>  1 file changed, 20 insertions(+)
> 
> diff --git a/util/osdep.c b/util/osdep.c
> index 5a61e19..b20c793 100644
> --- a/util/osdep.c
> +++ b/util/osdep.c
> @@ -45,6 +45,11 @@
>  extern int madvise(caddr_t, size_t, int);
>  #endif
>  
> +#ifdef CONFIG_LINUX
> +#include <sys/ioctl.h>
> +#include <linux/fs.h>
> +#endif
> +
>  #include "qemu-common.h"
>  #include "qemu/sockets.h"
>  #include "qemu/error-report.h"
> @@ -433,6 +438,21 @@ int64_t qemu_fd_getlength(int fd)
>  {
>  int64_t size;
>  
> +#ifdef CONFIG_LINUX
> +struct stat stat_buf;
> +if (fstat(fd, &stat_buf) < 0) {
> +return -errno;
> +}
> +
> +if ((S_ISBLK(stat_buf.st_mode)) && !ioctl(fd, BLKGETSIZE64, &size)) {
> +/* The size of block device is larger than max int64_t? */
> +if (size < 0) {
> +return -EOVERFLOW;
> +}
> +return size;
> +}
> +#endif
> +
>  size = lseek(fd, 0, SEEK_END);
>  if (size < 0) {
>  return -errno;
> -- 
> 1.8.3.1
> 

-- 
Eduardo
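
The fallback logic in the quoted patch can be exercised outside QEMU with a standalone sketch like the one below (same shape: BLKGETSIZE64 for block devices on Linux, lseek(SEEK_END) otherwise). This is an illustration under those assumptions, not the QEMU code itself:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <unistd.h>

#ifdef __linux__
#include <linux/fs.h>   /* BLKGETSIZE64 */
#endif

/* Same shape as the patched qemu_fd_getlength(): for block devices on
 * Linux, ask the kernel via BLKGETSIZE64 (lseek may not be supported on
 * every device); for everything else, fall back to lseek(SEEK_END). */
static int64_t fd_getlength(int fd)
{
#ifdef __linux__
    struct stat st;
    uint64_t blk_size;

    if (fstat(fd, &st) < 0) {
        return -errno;
    }
    if (S_ISBLK(st.st_mode) && ioctl(fd, BLKGETSIZE64, &blk_size) == 0) {
        if (blk_size > (uint64_t)INT64_MAX) {
            return -EOVERFLOW;  /* size does not fit in int64_t */
        }
        return (int64_t)blk_size;
    }
#endif

    off_t size = lseek(fd, 0, SEEK_END);
    if (size < 0) {
        return -errno;
    }
    return size;
}
```

On a regular file both paths agree; the ioctl only matters for the seek-incapable block devices Eduardo is asking about.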



Re: [Qemu-block] [PATCH v7 11/35] util: introduce qemu_file_getlength()

2015-11-06 Thread Eduardo Habkost
As this patch affects raw_getlength(), CCing the raw block driver
maintainer and the qemu-block mailing list.

On Mon, Nov 02, 2015 at 05:13:13PM +0800, Xiao Guangrong wrote:
> It is used to get the size of the specified file; qemu_fd_getlength() is
> also introduced to unify the code with raw_getlength() in block/raw-posix.c
> 
> Signed-off-by: Xiao Guangrong 
> ---
>  block/raw-posix.c|  7 +--
>  include/qemu/osdep.h |  2 ++
>  util/osdep.c | 31 +++

I know I was the one who suggested osdep.c, but maybe oslib-posix.c is a
more appropriate place for the new function?


>  3 files changed, 34 insertions(+), 6 deletions(-)
> 
> diff --git a/block/raw-posix.c b/block/raw-posix.c
> index 918c756..734e6dd 100644
> --- a/block/raw-posix.c
> +++ b/block/raw-posix.c
> @@ -1592,18 +1592,13 @@ static int64_t raw_getlength(BlockDriverState *bs)
>  {
>  BDRVRawState *s = bs->opaque;
>  int ret;
> -int64_t size;
>  
>  ret = fd_open(bs);
>  if (ret < 0) {
>  return ret;
>  }
>  
> -size = lseek(s->fd, 0, SEEK_END);
> -if (size < 0) {
> -return -errno;
> -}
> -return size;
> +return qemu_fd_getlength(s->fd);
>  }
>  #endif
>  
> diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
> index dbc17dc..ca4c3fa 100644
> --- a/include/qemu/osdep.h
> +++ b/include/qemu/osdep.h
> @@ -303,4 +303,6 @@ int qemu_read_password(char *buf, int buf_size);
>  pid_t qemu_fork(Error **errp);
>  
>  size_t qemu_file_get_page_size(const char *mem_path, Error **errp);
> +int64_t qemu_fd_getlength(int fd);
> +size_t qemu_file_getlength(const char *file, Error **errp);
>  #endif
> diff --git a/util/osdep.c b/util/osdep.c
> index 0092bb6..5a61e19 100644
> --- a/util/osdep.c
> +++ b/util/osdep.c
> @@ -428,3 +428,34 @@ writev(int fd, const struct iovec *iov, int iov_cnt)
>  return readv_writev(fd, iov, iov_cnt, true);
>  }
>  #endif
> +
> +int64_t qemu_fd_getlength(int fd)
> +{
> +int64_t size;
> +
> +size = lseek(fd, 0, SEEK_END);
> +if (size < 0) {
> +return -errno;
> +}
> +return size;
> +}
> +
> +size_t qemu_file_getlength(const char *file, Error **errp)
> +{
> +int64_t size;
> +int fd = qemu_open(file, O_RDONLY);
> +
> +if (fd < 0) {
> +error_setg_file_open(errp, errno, file);
> +return 0;
> +}
> +
> +size = qemu_fd_getlength(fd);
> +if (size < 0) {
> +error_setg_errno(errp, -size, "can't get size of file %s", file);
> +size = 0;
> +}
> +
> +qemu_close(fd);
> +return size;
> +}
> -- 
> 1.8.3.1
> 

-- 
Eduardo



Re: [Qemu-block] [PATCH v7 12/35] util: let qemu_fd_getlength support block device

2015-11-06 Thread Eduardo Habkost
As this patch affects raw_getlength(), CCing the raw block driver
maintainer and the qemu-block mailing list.

On Mon, Nov 02, 2015 at 05:13:14PM +0800, Xiao Guangrong wrote:
> lseek cannot work for all block devices, as the man page says:
> | Some devices are incapable of seeking and POSIX does not specify
> | which devices must support lseek().
> 
> This patch tries to add the support on Linux by using BLKGETSIZE64
> ioctl
> 
> Signed-off-by: Xiao Guangrong 
> ---
>  util/osdep.c | 20 
>  1 file changed, 20 insertions(+)
> 
> diff --git a/util/osdep.c b/util/osdep.c
> index 5a61e19..b20c793 100644
> --- a/util/osdep.c
> +++ b/util/osdep.c
> @@ -45,6 +45,11 @@
>  extern int madvise(caddr_t, size_t, int);
>  #endif
>  
> +#ifdef CONFIG_LINUX
> +#include <sys/ioctl.h>
> +#include <linux/fs.h>
> +#endif
> +
>  #include "qemu-common.h"
>  #include "qemu/sockets.h"
>  #include "qemu/error-report.h"
> @@ -433,6 +438,21 @@ int64_t qemu_fd_getlength(int fd)
>  {
>  int64_t size;
>  
> +#ifdef CONFIG_LINUX
> +struct stat stat_buf;
> +if (fstat(fd, &stat_buf) < 0) {
> +return -errno;
> +}
> +
> +if ((S_ISBLK(stat_buf.st_mode)) && !ioctl(fd, BLKGETSIZE64, &size)) {
> +/* The size of block device is larger than max int64_t? */
> +if (size < 0) {
> +return -EOVERFLOW;
> +}
> +return size;
> +}
> +#endif
> +
>  size = lseek(fd, 0, SEEK_END);
>  if (size < 0) {
>  return -errno;
> -- 
> 1.8.3.1
> 

-- 
Eduardo



[Qemu-block] [PATCH v9 11/15] qmp: Introduce blockdev-change-medium

2015-11-06 Thread Max Reitz
Introduce a new QMP command 'blockdev-change-medium' which is intended
to replace the 'change' command for block devices. The existing function
qmp_change_blockdev() is accordingly renamed to
qmp_blockdev_change_medium().

Signed-off-by: Max Reitz 
---
 blockdev.c|  7 ---
 include/sysemu/blockdev.h |  2 --
 qapi-schema.json  |  6 --
 qapi/block-core.json  | 23 +++
 qmp-commands.hx   | 31 +++
 qmp.c |  2 +-
 ui/cocoa.m| 10 ++
 7 files changed, 69 insertions(+), 12 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index df93173..dec6169 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2133,8 +2133,9 @@ void qmp_blockdev_insert_medium(const char *device, const 
char *node_name,
 qmp_blockdev_insert_anon_medium(device, bs, errp);
 }
 
-void qmp_change_blockdev(const char *device, const char *filename,
- const char *format, Error **errp)
+void qmp_blockdev_change_medium(const char *device, const char *filename,
+bool has_format, const char *format,
+Error **errp)
 {
 BlockBackend *blk;
 BlockDriverState *medium_bs = NULL;
@@ -2155,7 +2156,7 @@ void qmp_change_blockdev(const char *device, const char 
*filename,
 
 bdrv_flags = blk_get_open_flags_from_root_state(blk);
 
-if (format) {
+if (has_format) {
 options = qdict_new();
 qdict_put(options, "driver", qstring_from_str(format));
 }
diff --git a/include/sysemu/blockdev.h b/include/sysemu/blockdev.h
index a00be94..b06a060 100644
--- a/include/sysemu/blockdev.h
+++ b/include/sysemu/blockdev.h
@@ -63,8 +63,6 @@ DriveInfo *drive_new(QemuOpts *arg, BlockInterfaceType 
block_default_type);
 
 /* device-hotplug */
 
-void qmp_change_blockdev(const char *device, const char *filename,
- const char *format, Error **errp);
 void hmp_commit(Monitor *mon, const QDict *qdict);
 void hmp_drive_del(Monitor *mon, const QDict *qdict);
 #endif
diff --git a/qapi-schema.json b/qapi-schema.json
index 702b7b5..058b8ec 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1842,8 +1842,10 @@
 #  device's password.  The behavior of reads and writes to the block
 #  device between when these calls are executed is undefined.
 #
-# Notes:  It is strongly recommended that this interface is not used especially
-# for changing block devices.
+# Notes:  This interface is deprecated, and it is strongly recommended that you
+# avoid using it.  For changing block devices, use
+# blockdev-change-medium; for changing VNC parameters, use
+# change-vnc-password.
 #
 # Since: 0.14.0
 ##
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 5c4fc72..e9fa649 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1959,6 +1959,29 @@
 
 
 ##
+# @blockdev-change-medium:
+#
+# Changes the medium inserted into a block device by ejecting the current 
medium
+# and loading a new image file which is inserted as the new medium (this 
command
+# combines blockdev-open-tray, blockdev-remove-medium, blockdev-insert-medium
+# and blockdev-close-tray).
+#
+# @device:  block device name
+#
+# @filename:filename of the new image to be loaded
+#
+# @format:  #optional, format to open the new image with (defaults to
+#   the probed format)
+#
+# Since: 2.5
+##
+{ 'command': 'blockdev-change-medium',
+  'data': { 'device': 'str',
+'filename': 'str',
+'*format': 'str' } }
+
+
+##
 # @BlockErrorAction
 #
 # An enumeration of action that has been taken when a DISK I/O occurs
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 1dfa809..d6e4d7e 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -4163,6 +4163,37 @@ Example:
 EQMP
 
 {
+.name   = "blockdev-change-medium",
+.args_type  = "device:B,filename:F,format:s?",
+.mhandler.cmd_new = qmp_marshal_blockdev_change_medium,
+},
+
+SQMP
+blockdev-change-medium
+--
+
+Changes the medium inserted into a block device by ejecting the current medium
+and loading a new image file which is inserted as the new medium.
+
+Arguments:
+
+- "device": device name (json-string)
+- "filename": filename of the new image (json-string)
+- "format": format of the new image (json-string, optional)
+
+Examples:
+
+1. Change a removable medium
+
+-> { "execute": "blockdev-change-medium",
+ "arguments": { "device": "ide1-cd0",
+"filename": "/srv/images/Fedora-12-x86_64-DVD.iso",
+"format": "raw" } }
+<- { "return": {} }
+
+EQMP
+
+{
 .name   = "query-memdev",
 .args_type  = "",
 .mhandler.cmd_new = qmp_marshal_query_memdev,
diff --git a/qmp.c b/qmp.c
index ff54e5a..4e44f98 100644
--- a/qmp.c
+++ b/qmp.c
@@ -414,7 +414,7 @@ void q

[Qemu-block] [PATCH v9 00/15] blockdev: BlockBackend and media

2015-11-06 Thread Max Reitz
Patch 11 of the last series broke the OS X build due to ui/cocoa.m
directly referencing qmp_change_blockdev() (which was an internal
function not directly mapped to any QMP command before; now it becomes
an "external" function and is renamed qmp_blockdev_change_medium()).
This v9 consists only of that patch, because the rest is completely
unchanged from v8.

Unfortunately, neither am I rich enough nor do I have the immediate
intention of buying Apple hardware any time soon anyway, so I have no
way of verifying that this changed version builds on OS X. I would
appreciate help with that.

Oh, and if anybody ever asks me again why it might be a bad idea to put
management code into the GUI layer of one specific operating system...


git-backport-diff against v8:

Key:
[----] : patches are identical
[####] : number of functional differences between upstream/downstream patch
[down] : patch is downstream-only
The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively

001/15:[----] [--] 'block: Add blk_remove_bs()'
002/15:[----] [--] 'block: Make bdrv_states public'
003/15:[----] [--] 'block: Add functions for inheriting a BBRS'
004/15:[----] [--] 'blockdev: Add blockdev-open-tray'
005/15:[----] [--] 'blockdev: Add blockdev-close-tray'
006/15:[----] [--] 'blockdev: Add blockdev-remove-medium'
007/15:[----] [--] 'blockdev: Add blockdev-insert-medium'
008/15:[----] [--] 'blockdev: Implement eject with basic operations'
009/15:[----] [--] 'blockdev: Implement change with basic operations'
010/15:[----] [--] 'block: Inquire tray state before tray-moved events'
011/15:[0010] [FC] 'qmp: Introduce blockdev-change-medium'
012/15:[----] [--] 'hmp: Use blockdev-change-medium for change command'
013/15:[----] [--] 'blockdev: read-only-mode for blockdev-change-medium'
014/15:[----] [--] 'hmp: Add read-only-mode option to change command'
015/15:[----] [--] 'iotests: Add test for change-related QMP commands'


Max Reitz (15):
  block: Add blk_remove_bs()
  block: Make bdrv_states public
  block: Add functions for inheriting a BBRS
  blockdev: Add blockdev-open-tray
  blockdev: Add blockdev-close-tray
  blockdev: Add blockdev-remove-medium
  blockdev: Add blockdev-insert-medium
  blockdev: Implement eject with basic operations
  blockdev: Implement change with basic operations
  block: Inquire tray state before tray-moved events
  qmp: Introduce blockdev-change-medium
  hmp: Use blockdev-change-medium for change command
  blockdev: read-only-mode for blockdev-change-medium
  hmp: Add read-only-mode option to change command
  iotests: Add test for change-related QMP commands

 block.c|   3 +-
 block/block-backend.c  |  56 +++-
 blockdev.c | 286 
 hmp-commands.hx|  20 +-
 hmp.c  |  47 ++-
 include/block/block_int.h  |   2 +
 include/sysemu/block-backend.h |   3 +
 include/sysemu/blockdev.h  |   2 -
 qapi-schema.json   |   6 +-
 qapi/block-core.json   | 126 
 qmp-commands.hx| 218 +
 qmp.c  |   3 +-
 tests/qemu-iotests/118 | 720 +
 tests/qemu-iotests/118.out |   5 +
 tests/qemu-iotests/group   |   1 +
 ui/cocoa.m |  10 +-
 16 files changed, 1402 insertions(+), 106 deletions(-)
 create mode 100755 tests/qemu-iotests/118
 create mode 100644 tests/qemu-iotests/118.out

-- 
2.6.2




Re: [Qemu-block] [Qemu-devel] [PULL 00/37] Block layer patches

2015-11-06 Thread Max Reitz
On 06.11.2015 09:17, Kevin Wolf wrote:
> Am 05.11.2015 um 20:01 hat Peter Maydell geschrieben:
>> On 5 November 2015 at 18:17, Kevin Wolf  wrote:
>>> The following changes since commit 8835b9df3bddf332c883c861d6a1defc12c4ebe9:
>>>
>>>   Merge remote-tracking branch 
>>> 'remotes/mdroth/tags/qga-pull-2015-11-04-tag' into staging (2015-11-05 
>>> 10:52:35 +)
>>>
>>> are available in the git repository at:
>>>
>>>
>>>   git://repo.or.cz/qemu/kevin.git tags/for-upstream
>>>
>>> for you to fetch changes up to 22bdadd23b64af65ac2dd816848dbe2b1240a77a:
>>>
>>>   Merge remote-tracking branch 
>>> 'mreitz/tags/pull-block-for-kevin-2015-11-05' into queue-block (2015-11-05 
>>> 18:01:37 +0100)
>>>
>>> ----------------------------------------------------------------
>>>
>>> Block layer patches
>>
>> Fails to build on OSX, I'm afraid:
>> Undefined symbols for architecture x86_64:
>>   "_qmp_change_blockdev", referenced from:
>>   -[QemuCocoaAppController changeDeviceMedia:] in cocoa.o
> 
> Max, I think this is yours.

Indeed, will fix. Sorry.

Max




signature.asc
Description: OpenPGP digital signature


[Qemu-block] BlockDeviceInfo's backing_file and ImageInfo's backing-filename

2015-11-06 Thread Alberto Garcia
Hi all,

I noticed that query-named-block-nodes (and HMP 'info block') returns
odd results when the effective backing image of a BlockDriverState is
different from the string that is stored in the file.

There are two fields exposed on the API that show this information:
BlockDeviceInfo's backing_file and ImageInfo's backing-filename. From
the description I would expect that the former refers to the actual
backing file in use (bs->backing) and the latter to the string stored
in the file.

That's however not what happens. Let's see a few examples:

qemu-img create -f qcow2 base.img 4G
qemu-img create -f qcow2 -o backing_file=base.img hd0.img
qemu-img create -f qcow2 -o backing_file=hd0.img hd1.img

1)
  { "execute": "blockdev-add",
"arguments": {
  "options": {
"driver": "qcow2",
"file": {
  "driver": "file",
  "filename": "hd1.img"
},
"id": "drive0"
  }
}
  }

  The output here is normal:

BlockDeviceInfo.backing_file = hd0.img
BlockDeviceInfo.image.backing-filename: hd0.img

2)
  { "execute": "blockdev-add",
"arguments": {
  "options": {
"driver": "qcow2",
"file": {
  "driver": "file",
  "filename": "hd1.img"
},
"id": "drive0",
"backing": {
  "driver": "qcow2",
  "file": {
"driver": "file",
"filename": "base.img"
  }
}
  }
}
  }

  Now there's no reference to hd0.img anywhere:

BlockDeviceInfo.backing_file = base.img
BlockDeviceInfo.image.backing-filename: base.img

3)
  { "execute": "blockdev-add",
"arguments": {
  "options": {
"driver": "qcow2",
"file": {
  "driver": "file",
  "filename": "hd1.img"
},
"id": "drive0",
"backing": ""
  }
}
  }

  Now both fields point to hd0.img, although it hasn't been opened:

BlockDeviceInfo.backing_file = hd0.img
BlockDeviceInfo.image.backing-filename: hd0.img

I would like to fix this but first I need to know the exact semantics
of both fields, and the exact meaning of bs->backing_file.

On top of that the actual backing file string is also present in the
format-specific structs, e.g. BDRVQcow2State.image_backing_file, but
that's not directly accessible from the API (it's not in
ImageInfoSpecific).

Berto



Re: [Qemu-block] [PATCH] virtio-blk: trivial code optimization

2015-11-06 Thread Paolo Bonzini


On 06/11/2015 11:35, Stefan Hajnoczi wrote:
>> >  if (niov + req->qiov.niov > IOV_MAX) {
>> >  merge = false;
>> > +goto unmerge;
>> >  }
>> >  
>> >  /* merge would exceed maximum transfer length of backend 
>> > device */
>> >  if (req->qiov.size / BDRV_SECTOR_SIZE + nb_sectors > 
>> > max_xfer_len) {
>> >  merge = false;
>> > +goto unmerge;
>> >  }
>> >  
>> >  /* requests are not sequential */
>> >  if (sector_num + nb_sectors != req->sector_num) {
>> >  merge = false;
>> >  }
>> > -
>> > +unmerge:
> C has a way of expressing this without gotos.  Please use else if:
> 
>   if (a) {
>   ...
>   } else if (b) {
>   ...
>   } else if (c) {
>   ...
>   }

Another way is

if (niov + req->qiov.niov > IOV_MAX ||
req->qiov.size / BDRV_SECTOR_SIZE + nb_sectors > max_xfer_len ||
sector_num + nb_sectors != req->sector_num) {
submit_requests(...)
...
}

While at it, we could reorder the conditions so that the most common
("requests are not sequential") comes first.

I'm not sure about handling of overflow.  It's probably better to
write conditions as "new > max - old" (e.g. "niov > IOV_MAX -
req->qiov.niov") rather than "old + new > max".  The former is always
safe, because we know that old <= max and there can be no integer
overflow.

Thanks,

Paolo



Re: [Qemu-block] [PATCH V4] block/nfs: add support for setting debug level

2015-11-06 Thread Peter Lieven
Am 06.11.2015 um 12:23 schrieb Stefan Hajnoczi:
> On Thu, Nov 05, 2015 at 11:25:34PM +0100, Peter Lieven wrote:
>> Recent libnfs versions support logging debug messages. Add
>> support for it in qemu through a URL parameter.
>>
>> Example:
>>  qemu -cdrom nfs://127.0.0.1/iso/my.iso?debug=2
>> ---
>> v3->v4: revert to the initial version, but limit max debug level
>> v2->v3: use a per-drive option instead of a global one. [Stefan]
>> v1->v2: reworked patch to accept the debug level as a cmdline
>> parameter instead of a URI parameter [Stefan]
>>
>>  block/nfs.c |   10 ++
>>  1 file changed, 10 insertions(+)
>>
>> diff --git a/block/nfs.c b/block/nfs.c
>> index fd79f89..fbea25a 100644
>> --- a/block/nfs.c
>> +++ b/block/nfs.c
>> @@ -36,6 +36,7 @@
>>  #include 
>>  
>>  #define QEMU_NFS_MAX_READAHEAD_SIZE 1048576
>> +#define QEMU_NFS_MAX_DEBUG_LEVEL 2
>>  
>>  typedef struct NFSClient {
>>  struct nfs_context *context;
>> @@ -334,6 +335,15 @@ static int64_t nfs_client_open(NFSClient *client, const char *filename,
>>  }
>>  nfs_set_readahead(client->context, val);
>>  #endif
>> +#ifdef LIBNFS_FEATURE_DEBUG
>> +} else if (!strcmp(qp->p[i].name, "debug")) {
>> +if (val > QEMU_NFS_MAX_DEBUG_LEVEL) {
>> +error_report("NFS Warning: Limiting NFS debug level"
>> + " to %d", QEMU_NFS_MAX_DEBUG_LEVEL);
>> +val = QEMU_NFS_MAX_DEBUG_LEVEL;
>> +}
>> +nfs_set_debug(client->context, val);
>> +#endif
> Please include the rationale for limiting the debug level as a comment
> in the code.  Otherwise people may read the code and not understand why
> we do this.

Okay. Then we should add this to the readahead limitation as well.

>
> The error message should indicate how to set high debug levels (if
> possible).  A simple "You can't do that" error message is frustrating
> :).

But that's the case at the moment ;-)

Peter




Re: [Qemu-block] [PATCH V4] block/nfs: add support for setting debug level

2015-11-06 Thread Stefan Hajnoczi
On Thu, Nov 05, 2015 at 11:25:34PM +0100, Peter Lieven wrote:
> Recent libnfs versions support logging debug messages. Add
> support for it in qemu through a URL parameter.
> 
> Example:
>  qemu -cdrom nfs://127.0.0.1/iso/my.iso?debug=2
> ---
> v3->v4: revert to the initial version, but limit max debug level
> v2->v3: use a per-drive option instead of a global one. [Stefan]
> v1->v2: reworked patch to accept the debug level as a cmdline
> parameter instead of a URI parameter [Stefan]
> 
>  block/nfs.c |   10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/block/nfs.c b/block/nfs.c
> index fd79f89..fbea25a 100644
> --- a/block/nfs.c
> +++ b/block/nfs.c
> @@ -36,6 +36,7 @@
>  #include 
>  
>  #define QEMU_NFS_MAX_READAHEAD_SIZE 1048576
> +#define QEMU_NFS_MAX_DEBUG_LEVEL 2
>  
>  typedef struct NFSClient {
>  struct nfs_context *context;
> @@ -334,6 +335,15 @@ static int64_t nfs_client_open(NFSClient *client, const char *filename,
>  }
>  nfs_set_readahead(client->context, val);
>  #endif
> +#ifdef LIBNFS_FEATURE_DEBUG
> +} else if (!strcmp(qp->p[i].name, "debug")) {
> +if (val > QEMU_NFS_MAX_DEBUG_LEVEL) {
> +error_report("NFS Warning: Limiting NFS debug level"
> + " to %d", QEMU_NFS_MAX_DEBUG_LEVEL);
> +val = QEMU_NFS_MAX_DEBUG_LEVEL;
> +}
> +nfs_set_debug(client->context, val);
> +#endif

Please include the rationale for limiting the debug level as a comment
in the code.  Otherwise people may read the code and not understand why
we do this.

The error message should indicate how to set high debug levels (if
possible).  A simple "You can't do that" error message is frustrating
:).


signature.asc
Description: PGP signature


Re: [Qemu-block] [PATCH] virtio-blk: trivial code optimization

2015-11-06 Thread Stefan Hajnoczi
On Fri, Nov 06, 2015 at 09:04:57AM +0800, arei.gong...@huawei.com wrote:
> diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
> index 093e475..752586d 100644
> --- a/hw/block/virtio-blk.c
> +++ b/hw/block/virtio-blk.c
> @@ -409,18 +409,20 @@ void virtio_blk_submit_multireq(BlockBackend *blk, MultiReqBuffer *mrb)
>  /* merge would exceed maximum number of IOVs */
>  if (niov + req->qiov.niov > IOV_MAX) {
>  merge = false;
> +goto unmerge;
>  }
>  
>  /* merge would exceed maximum transfer length of backend device 
> */
>  if (req->qiov.size / BDRV_SECTOR_SIZE + nb_sectors > 
> max_xfer_len) {
>  merge = false;
> +goto unmerge;
>  }
>  
>  /* requests are not sequential */
>  if (sector_num + nb_sectors != req->sector_num) {
>  merge = false;
>  }
> -
> +unmerge:

C has a way of expressing this without gotos.  Please use else if:

  if (a) {
  ...
  } else if (b) {
  ...
  } else if (c) {
  ...
  }


signature.asc
Description: PGP signature


[Qemu-block] [PATCH] mirror: Improve zero-write and discard with fragmented image

2015-11-06 Thread Fam Zheng
The "pnum < nb_sectors" condition in deciding whether to actually copy
data is unnecessarily strict, and the qiov initialization is
unnecessary too, for both the bdrv_aio_write_zeroes and bdrv_aio_discard
branches.

Reorganize mirror_iteration flow so that we:

1) Find the contiguous zero/discarded sectors with
bdrv_get_block_status_above() before deciding what to do. We query
s->buf_size sized blocks at a time.

2) If the sectors in question are zeroed/discarded and aligned to the
target cluster, issue a zero write or discard accordingly. This is done
in mirror_do_zero_or_discard, where we don't add a buffer to the qiov.

3) Otherwise, do the same loop as before in mirror_do_read.

Signed-off-by: Fam Zheng 
---
 block/mirror.c | 161 +
 1 file changed, 127 insertions(+), 34 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index b1252a1..ac796b4 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -157,28 +157,21 @@ static void mirror_read_complete(void *opaque, int ret)
 mirror_write_complete, op);
 }
 
-static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
+static uint64_t mirror_do_read(MirrorBlockJob *s)
 {
 BlockDriverState *source = s->common.bs;
-int nb_sectors, sectors_per_chunk, nb_chunks;
-int64_t end, sector_num, next_chunk, next_sector, hbitmap_next_sector;
+int sectors_per_chunk, nb_sectors, nb_chunks;
+int64_t end, next_chunk, next_sector, hbitmap_next_sector, sector_num;
 uint64_t delay_ns = 0;
 MirrorOp *op;
-int pnum;
-int64_t ret;
 
-s->sector_num = hbitmap_iter_next(&s->hbi);
-if (s->sector_num < 0) {
-bdrv_dirty_iter_init(s->dirty_bitmap, &s->hbi);
-s->sector_num = hbitmap_iter_next(&s->hbi);
-trace_mirror_restart_iter(s, bdrv_get_dirty_count(s->dirty_bitmap));
-assert(s->sector_num >= 0);
-}
+sector_num = s->sector_num;
+sectors_per_chunk = s->granularity >> BDRV_SECTOR_BITS;
+end = s->bdev_length / BDRV_SECTOR_SIZE;
 
+next_sector = sector_num;
+next_chunk = sector_num / sectors_per_chunk;
 hbitmap_next_sector = s->sector_num;
-sector_num = s->sector_num;
-sectors_per_chunk = s->granularity >> BDRV_SECTOR_BITS;
-end = s->bdev_length / BDRV_SECTOR_SIZE;
 
 /* Extend the QEMUIOVector to include all adjacent blocks that will
  * be copied in this operation.
@@ -198,14 +191,6 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
 next_sector = sector_num;
 next_chunk = sector_num / sectors_per_chunk;
 
-/* Wait for I/O to this cluster (from a previous iteration) to be done.  */
-while (test_bit(next_chunk, s->in_flight_bitmap)) {
-trace_mirror_yield_in_flight(s, sector_num, s->in_flight);
-s->waiting_for_io = true;
-qemu_coroutine_yield();
-s->waiting_for_io = false;
-}
-
 do {
 int added_sectors, added_chunks;
 
@@ -301,24 +286,132 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
 s->sectors_in_flight += nb_sectors;
 trace_mirror_one_iteration(s, sector_num, nb_sectors);
 
-ret = bdrv_get_block_status_above(source, NULL, sector_num,
-  nb_sectors, &pnum);
-if (ret < 0 || pnum < nb_sectors ||
-(ret & BDRV_BLOCK_DATA && !(ret & BDRV_BLOCK_ZERO))) {
-bdrv_aio_readv(source, sector_num, &op->qiov, nb_sectors,
-   mirror_read_complete, op);
-} else if (ret & BDRV_BLOCK_ZERO) {
+bdrv_aio_readv(source, sector_num, &op->qiov, nb_sectors,
+   mirror_read_complete, op);
+return delay_ns;
+}
+
+static uint64_t mirror_do_zero_or_discard(MirrorBlockJob *s,
+  int64_t sector_num,
+  int nb_sectors,
+  bool is_discard)
+{
+int sectors_per_chunk, nb_chunks;
+int64_t next_chunk, next_sector, hbitmap_next_sector;
+uint64_t delay_ns = 0;
+MirrorOp *op;
+
+sectors_per_chunk = s->granularity >> BDRV_SECTOR_BITS;
+assert(nb_sectors >= sectors_per_chunk);
+next_chunk = sector_num / sectors_per_chunk;
+nb_chunks = DIV_ROUND_UP(nb_sectors, sectors_per_chunk);
+bitmap_set(s->in_flight_bitmap, next_chunk, nb_chunks);
+delay_ns = ratelimit_calculate_delay(&s->limit, nb_sectors);
+
+/* Allocate a MirrorOp that is used as an AIO callback. The qiov is zeroed
+ * out so the freeing in iteration is nop. */
+op = g_new0(MirrorOp, 1);
+op->s = s;
+op->sector_num = sector_num;
+op->nb_sectors = nb_sectors;
+
+/* Advance the HBitmapIter in parallel, so that we do not examine
+ * the same sector twice.
+ */
+hbitmap_next_sector = sector_num;
+next_sector = sector_num + nb_sectors;
+while (next_sector > hbitmap_next_sector) {
+hbitmap_next_sector = hbitmap_iter_next(&s->hbi)

Re: [Qemu-block] [PATCH v4 0/3] qemu-io: clean up cvtnum usage

2015-11-06 Thread Kevin Wolf
Am 06.11.2015 um 00:53 hat John Snow geschrieben:
> cvtnum returns an int64_t, not an int, so correct the lvalue types
> wherever it is used. While we're at it, make the error messages more
> meaningful and hopefully less confusing.
> 
> v4:
>  - Now missing ALL sweaters
> 
> v3:
>  - pulled a lot of loose yarn, now missing my sweater
>(Updated patch 1 even further, reported-by Kevin)
> 
> v2:
>  - Squashed NSIG error-checking from patch 3 into patch 1
>  - Reported-by credits for Max and Reviewed-by from Eric added

Thanks, applied to the block branch. (Should we mention in the changelog
that qemu 2.5 contains some of your sweaters?)

Kevin



[Qemu-block] [PATCH V3 6/6] ide: enable buffered requests for PIO read requests

2015-11-06 Thread Peter Lieven
Signed-off-by: Peter Lieven 
---
 hw/ide/core.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/ide/core.c b/hw/ide/core.c
index 53f9c2c..d1feae2 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -676,8 +676,8 @@ static void ide_sector_read(IDEState *s)
 
 block_acct_start(blk_get_stats(s->blk), &s->acct,
  n * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
-s->pio_aiocb = blk_aio_readv(s->blk, sector_num, &s->qiov, n,
- ide_sector_read_cb, s);
+s->pio_aiocb = ide_buffered_readv(s, sector_num, &s->qiov, n,
+  ide_sector_read_cb, s);
 }
 
 void dma_buf_commit(IDEState *s, uint32_t tx_bytes)
-- 
1.9.1




[Qemu-block] [PATCH V3 5/6] ide: enable buffered requests for ATAPI devices

2015-11-06 Thread Peter Lieven
Signed-off-by: Peter Lieven 
---
 hw/ide/atapi.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
index 29fd131..2f6d018 100644
--- a/hw/ide/atapi.c
+++ b/hw/ide/atapi.c
@@ -190,8 +190,8 @@ static int cd_read_sector(IDEState *s, void *buf)
 block_acct_start(blk_get_stats(s->blk), &s->acct,
  4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
 
-blk_aio_readv(s->blk, (int64_t)s->lba << 2, &s->qiov, 4,
-  cd_read_sector_cb, s);
+ide_buffered_readv(s, (int64_t)s->lba << 2, &s->qiov, 4,
+   cd_read_sector_cb, s);
 
 s->status |= BUSY_STAT;
 return 0;
@@ -424,9 +424,9 @@ static void ide_atapi_cmd_read_dma_cb(void *opaque, int ret)
 s->bus->dma->iov.iov_len = n * 4 * 512;
 qemu_iovec_init_external(&s->bus->dma->qiov, &s->bus->dma->iov, 1);
 
-s->bus->dma->aiocb = blk_aio_readv(s->blk, (int64_t)s->lba << 2,
-   &s->bus->dma->qiov, n * 4,
-   ide_atapi_cmd_read_dma_cb, s);
+s->bus->dma->aiocb = ide_buffered_readv(s, (int64_t)s->lba << 2,
+&s->bus->dma->qiov, n * 4,
+ide_atapi_cmd_read_dma_cb, s);
 return;
 
 eot:
-- 
1.9.1




[Qemu-block] [PATCH V3 0/6] ide: avoid main-loop hang on CDROM/NFS failure

2015-11-06 Thread Peter Lieven
This series aims at avoiding a hanging main loop if a vserver has a
CDROM image mounted from an NFS share and that NFS share goes down.
A typical situation is that users mount a CDROM ISO to install something
and then forget to eject that CDROM afterwards.
As a consequence this mounted CD is able to bring down the
whole vserver if the backend NFS share is unreachable. This is bad,
especially if the CDROM itself is not needed anymore at this point.

v2->v3: - addressed Stefan's comments on Patch 1
- added patches 2,4,5,6
- avoided the term cancel in Patch 3. Added an iovec,
  added a BH [Stefan]
v1->v2: - fix offset for 2352 byte sector size [Kevin]
- use a sync request if we continue an elementary transfer.
  As John pointed out we enter a race condition between next
  IDE command and async transfer otherwise. This is sill not
  optimal, but it fixes the NFS down problems for all cases where
  the NFS server goes down while there is no PIO CD activity.
  Of course, it could still happen during a PIO transfer, but I
  expect this to be the unlikelier case.
  I spent some effort trying to read more sectors at once and
  avoiding continuation of elementary transfers, but with
  whatever I came up it was destroying migration between different
  Qemu versions. I have a quite hackish patch that works and
  should survive migration, but I am not happy with it. So I
  would like to start with this version as it is a big improvement
  already.
- Dropped Patch 5 because it is upstream meanwhile.

Peter Lieven (6):
  ide/atapi: make PIO read requests async
  block: add blk_abort_aio_request
  ide: add support for IDEBufferedRequest
  ide: orphan all buffered requests on DMA cancel
  ide: enable buffered requests for ATAPI devices
  ide: enable buffered requests for PIO read requests

 block/block-backend.c  |  17 +++
 hw/ide/atapi.c | 103 +++--
 hw/ide/core.c  |  50 +++-
 hw/ide/internal.h  |  14 ++
 hw/ide/pci.c   |  19 
 include/sysemu/block-backend.h |   3 ++
 6 files changed, 181 insertions(+), 25 deletions(-)

-- 
1.9.1




[Qemu-block] [PATCH V3 1/6] ide/atapi: make PIO read requests async

2015-11-06 Thread Peter Lieven
PIO read requests on the ATAPI interface used to be sync blk requests.
This has two significant drawbacks. First, the main loop hangs until an
I/O request is completed, and second, if the I/O request does not
complete (e.g. due to unresponsive storage), QEMU hangs completely.

Note: Due to possible race conditions requests during an ongoing
elementary transfer are still sync.

Signed-off-by: Peter Lieven 
---
 hw/ide/atapi.c | 97 ++
 1 file changed, 85 insertions(+), 12 deletions(-)

diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
index 747f466..29fd131 100644
--- a/hw/ide/atapi.c
+++ b/hw/ide/atapi.c
@@ -105,33 +105,98 @@ static void cd_data_to_raw(uint8_t *buf, int lba)
 memset(buf, 0, 288);
 }
 
-static int cd_read_sector(IDEState *s, int lba, uint8_t *buf, int sector_size)
+static int
+cd_read_sector_sync(IDEState *s, uint8_t *buf)
 {
 int ret;
 
-switch(sector_size) {
+#ifdef DEBUG_IDE_ATAPI
+printf("cd_read_sector_sync: lba=%d\n", s->lba);
+#endif
+
+switch (s->cd_sector_size) {
 case 2048:
 block_acct_start(blk_get_stats(s->blk), &s->acct,
  4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
-ret = blk_read(s->blk, (int64_t)lba << 2, buf, 4);
+ret = blk_read(s->blk, (int64_t)s->lba << 2, buf, 4);
 block_acct_done(blk_get_stats(s->blk), &s->acct);
 break;
 case 2352:
 block_acct_start(blk_get_stats(s->blk), &s->acct,
  4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
-ret = blk_read(s->blk, (int64_t)lba << 2, buf + 16, 4);
+ret = blk_read(s->blk, (int64_t)s->lba << 2, buf + 16, 4);
 block_acct_done(blk_get_stats(s->blk), &s->acct);
 if (ret < 0)
 return ret;
-cd_data_to_raw(buf, lba);
+cd_data_to_raw(buf, s->lba);
 break;
 default:
 ret = -EIO;
 break;
 }
+
+if (!ret) {
+s->lba++;
+s->io_buffer_index = 0;
+}
+
 return ret;
 }
 
+static void cd_read_sector_cb(void *opaque, int ret)
+{
+IDEState *s = opaque;
+
+block_acct_done(blk_get_stats(s->blk), &s->acct);
+
+#ifdef DEBUG_IDE_ATAPI
+printf("cd_read_sector_cb: lba=%d ret=%d\n", s->lba, ret);
+#endif
+
+if (ret < 0) {
+ide_atapi_io_error(s, ret);
+return;
+}
+
+if (s->cd_sector_size == 2352) {
+cd_data_to_raw(s->io_buffer, s->lba);
+}
+
+s->lba++;
+s->io_buffer_index = 0;
+s->status &= ~BUSY_STAT;
+
+ide_atapi_cmd_reply_end(s);
+}
+
+static int cd_read_sector(IDEState *s, void *buf)
+{
+if (s->cd_sector_size != 2048 && s->cd_sector_size != 2352) {
+return -EINVAL;
+}
+
+s->iov.iov_base = buf;
+if (s->cd_sector_size == 2352) {
+buf += 16;
+}
+
+s->iov.iov_len = 4 * BDRV_SECTOR_SIZE;
+qemu_iovec_init_external(&s->qiov, &s->iov, 1);
+
+#ifdef DEBUG_IDE_ATAPI
+printf("cd_read_sector: lba=%d\n", s->lba);
+#endif
+
+block_acct_start(blk_get_stats(s->blk), &s->acct,
+ 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
+
+blk_aio_readv(s->blk, (int64_t)s->lba << 2, &s->qiov, 4,
+  cd_read_sector_cb, s);
+
+s->status |= BUSY_STAT;
+return 0;
+}
+
 void ide_atapi_cmd_ok(IDEState *s)
 {
 s->error = 0;
@@ -182,18 +247,27 @@ void ide_atapi_cmd_reply_end(IDEState *s)
 ide_atapi_cmd_ok(s);
 ide_set_irq(s->bus);
 #ifdef DEBUG_IDE_ATAPI
-printf("status=0x%x\n", s->status);
+printf("end of transfer, status=0x%x\n", s->status);
 #endif
 } else {
 /* see if a new sector must be read */
 if (s->lba != -1 && s->io_buffer_index >= s->cd_sector_size) {
-ret = cd_read_sector(s, s->lba, s->io_buffer, s->cd_sector_size);
-if (ret < 0) {
-ide_atapi_io_error(s, ret);
+if (!s->elementary_transfer_size) {
+ret = cd_read_sector(s, s->io_buffer);
+if (ret < 0) {
+ide_atapi_io_error(s, ret);
+}
 return;
+} else {
+/* rebuffering within an elementary transfer is
+ * only possible with a sync request because we
+ * end up with a race condition otherwise */
+ret = cd_read_sector_sync(s, s->io_buffer);
+if (ret < 0) {
+ide_atapi_io_error(s, ret);
+return;
+}
 }
-s->lba++;
-s->io_buffer_index = 0;
 }
 if (s->elementary_transfer_size > 0) {
 /* there are some data left to transmit in this elementary
@@ -275,7 +349,6 @@ static void ide_atapi_cmd_read_pio(IDEState *s, int lba, 
int nb_sectors,
 s->io_buffer_index = sector_size;
 s->cd_sector_size = sector_size;
 
-s->status = READY_STAT | SEEK_STAT;
 ide_atapi_cmd_reply_end(s);
 }
 
-- 
1.9.1




[Qemu-block] [PATCH V3 3/6] ide: add support for IDEBufferedRequest

2015-11-06 Thread Peter Lieven
This patch adds a new aio readv compatible function which copies
all data through a bounce buffer. These buffered requests can be
flagged as orphaned, which means that their original callback has
already been invoked and the request has just not been completed
by the backend storage. The bounce buffer guarantees that guest
memory corruption is avoided when such an orphaned request is
completed by the backend at a later stage.

This trick only works for read requests, as a write request completed
at a later stage might corrupt data because there is no way to control
if and what data has already been written to the storage.

Signed-off-by: Peter Lieven 
---
 hw/ide/core.c | 46 ++
 hw/ide/internal.h | 14 ++
 2 files changed, 60 insertions(+)

diff --git a/hw/ide/core.c b/hw/ide/core.c
index 364ba21..53f9c2c 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -561,6 +561,52 @@ static bool ide_sect_range_ok(IDEState *s,
 return true;
 }
 
+static void ide_buffered_readv_cb(void *opaque, int ret)
+{
+IDEBufferedRequest *req = opaque;
+if (!req->orphaned) {
+if (!ret) {
+qemu_iovec_from_buf(req->original_qiov, 0, req->iov.iov_base,
+req->original_qiov->size);
+}
+req->original_cb(req->original_opaque, ret);
+}
+QLIST_REMOVE(req, list);
+g_free(req);
+}
+
+#define MAX_BUFFERED_REQS 16
+
+BlockAIOCB *ide_buffered_readv(IDEState *s, int64_t sector_num,
+   QEMUIOVector *iov, int nb_sectors,
+   BlockCompletionFunc *cb, void *opaque)
+{
+BlockAIOCB *aioreq;
+IDEBufferedRequest *req;
+int c = 0;
+
+QLIST_FOREACH(req, &s->buffered_requests, list) {
+c++;
+}
+if (c > MAX_BUFFERED_REQS) {
+return blk_abort_aio_request(s->blk, cb, opaque, -EIO);
+}
+
+req = g_new0(IDEBufferedRequest, 1);
+req->original_qiov = iov;
+req->original_cb = cb;
+req->original_opaque = opaque;
+req->iov.iov_base = qemu_blockalign(blk_bs(s->blk), iov->size);
+req->iov.iov_len = iov->size;
+qemu_iovec_init_external(&req->qiov, &req->iov, 1);
+
+aioreq = blk_aio_readv(s->blk, sector_num, &req->qiov, nb_sectors,
+   ide_buffered_readv_cb, req);
+
+QLIST_INSERT_HEAD(&s->buffered_requests, req, list);
+return aioreq;
+}
+
 static void ide_sector_read(IDEState *s);
 
 static void ide_sector_read_cb(void *opaque, int ret)
diff --git a/hw/ide/internal.h b/hw/ide/internal.h
index 05e93ff..3d1116f 100644
--- a/hw/ide/internal.h
+++ b/hw/ide/internal.h
@@ -343,6 +343,16 @@ enum ide_dma_cmd {
 #define ide_cmd_is_read(s) \
((s)->dma_cmd == IDE_DMA_READ)
 
+typedef struct IDEBufferedRequest {
+QLIST_ENTRY(IDEBufferedRequest) list;
+struct iovec iov;
+QEMUIOVector qiov;
+QEMUIOVector *original_qiov;
+BlockCompletionFunc *original_cb;
+void *original_opaque;
+bool orphaned;
+} IDEBufferedRequest;
+
 /* NOTE: IDEState represents in fact one drive */
 struct IDEState {
 IDEBus *bus;
@@ -396,6 +406,7 @@ struct IDEState {
 BlockAIOCB *pio_aiocb;
 struct iovec iov;
 QEMUIOVector qiov;
+QLIST_HEAD(, IDEBufferedRequest) buffered_requests;
 /* ATA DMA state */
 int32_t io_buffer_offset;
 int32_t io_buffer_size;
@@ -572,6 +583,9 @@ void ide_set_inactive(IDEState *s, bool more);
 BlockAIOCB *ide_issue_trim(BlockBackend *blk,
 int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
 BlockCompletionFunc *cb, void *opaque);
+BlockAIOCB *ide_buffered_readv(IDEState *s, int64_t sector_num,
+   QEMUIOVector *iov, int nb_sectors,
+   BlockCompletionFunc *cb, void *opaque);
 
 /* hw/ide/atapi.c */
 void ide_atapi_cmd(IDEState *s);
-- 
1.9.1




[Qemu-block] [PATCH V3 2/6] block: add blk_abort_aio_request

2015-11-06 Thread Peter Lieven
Signed-off-by: Peter Lieven 
---
 block/block-backend.c  | 17 +
 include/sysemu/block-backend.h |  3 +++
 2 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 19fdaae..b13dc4e 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -627,8 +627,9 @@ static void error_callback_bh(void *opaque)
 qemu_aio_unref(acb);
 }
 
-static BlockAIOCB *abort_aio_request(BlockBackend *blk, BlockCompletionFunc *cb,
- void *opaque, int ret)
+BlockAIOCB *blk_abort_aio_request(BlockBackend *blk,
+   BlockCompletionFunc *cb,
+   void *opaque, int ret)
 {
 struct BlockBackendAIOCB *acb;
 QEMUBH *bh;
@@ -650,7 +651,7 @@ BlockAIOCB *blk_aio_write_zeroes(BlockBackend *blk, int64_t sector_num,
 {
 int ret = blk_check_request(blk, sector_num, nb_sectors);
 if (ret < 0) {
-return abort_aio_request(blk, cb, opaque, ret);
+return blk_abort_aio_request(blk, cb, opaque, ret);
 }
 
 return bdrv_aio_write_zeroes(blk->bs, sector_num, nb_sectors, flags,
@@ -710,7 +711,7 @@ BlockAIOCB *blk_aio_readv(BlockBackend *blk, int64_t sector_num,
 {
 int ret = blk_check_request(blk, sector_num, nb_sectors);
 if (ret < 0) {
-return abort_aio_request(blk, cb, opaque, ret);
+return blk_abort_aio_request(blk, cb, opaque, ret);
 }
 
 return bdrv_aio_readv(blk->bs, sector_num, iov, nb_sectors, cb, opaque);
@@ -722,7 +723,7 @@ BlockAIOCB *blk_aio_writev(BlockBackend *blk, int64_t sector_num,
 {
 int ret = blk_check_request(blk, sector_num, nb_sectors);
 if (ret < 0) {
-return abort_aio_request(blk, cb, opaque, ret);
+return blk_abort_aio_request(blk, cb, opaque, ret);
 }
 
 return bdrv_aio_writev(blk->bs, sector_num, iov, nb_sectors, cb, opaque);
@@ -732,7 +733,7 @@ BlockAIOCB *blk_aio_flush(BlockBackend *blk,
   BlockCompletionFunc *cb, void *opaque)
 {
 if (!blk_is_available(blk)) {
-return abort_aio_request(blk, cb, opaque, -ENOMEDIUM);
+return blk_abort_aio_request(blk, cb, opaque, -ENOMEDIUM);
 }
 
 return bdrv_aio_flush(blk->bs, cb, opaque);
@@ -744,7 +745,7 @@ BlockAIOCB *blk_aio_discard(BlockBackend *blk,
 {
 int ret = blk_check_request(blk, sector_num, nb_sectors);
 if (ret < 0) {
-return abort_aio_request(blk, cb, opaque, ret);
+return blk_abort_aio_request(blk, cb, opaque, ret);
 }
 
 return bdrv_aio_discard(blk->bs, sector_num, nb_sectors, cb, opaque);
@@ -787,7 +788,7 @@ BlockAIOCB *blk_aio_ioctl(BlockBackend *blk, unsigned long int req, void *buf,
   BlockCompletionFunc *cb, void *opaque)
 {
 if (!blk_is_available(blk)) {
-return abort_aio_request(blk, cb, opaque, -ENOMEDIUM);
+return blk_abort_aio_request(blk, cb, opaque, -ENOMEDIUM);
 }
 
 return bdrv_aio_ioctl(blk->bs, req, buf, cb, opaque);
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 9306a52..b5267a8 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -180,5 +180,8 @@ int blk_save_vmstate(BlockBackend *blk, const uint8_t *buf,
 int blk_load_vmstate(BlockBackend *blk, uint8_t *buf, int64_t pos, int size);
 int blk_probe_blocksizes(BlockBackend *blk, BlockSizes *bsz);
 int blk_probe_geometry(BlockBackend *blk, HDGeometry *geo);
+BlockAIOCB *blk_abort_aio_request(BlockBackend *blk,
+   BlockCompletionFunc *cb,
+   void *opaque, int ret);
 
 #endif
-- 
1.9.1
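The point of routing early failures through blk_abort_aio_request() instead of calling the completion callback directly is that the caller always receives its callback asynchronously, from the event loop, never re-entrantly from inside the submission function. The sketch below is a hypothetical stand-in for that pattern -- a plain pending list replaces QEMU's QEMUBH machinery and BlockAIOCB allocation; none of these names are the real API.

```c
/* Deferred error completion: queue the error instead of invoking the
 * callback synchronously, then deliver it from an event-loop stand-in. */
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

typedef void CompletionFunc(void *opaque, int ret);

typedef struct PendingError {
    CompletionFunc *cb;
    void *opaque;
    int ret;
    struct PendingError *next;
} PendingError;

static PendingError *pending;   /* stand-in for the bottom-half queue */

/* Reject a request: schedule the error completion for later. */
static void abort_request(CompletionFunc *cb, void *opaque, int ret)
{
    PendingError *e = malloc(sizeof(*e));
    e->cb = cb;
    e->opaque = opaque;
    e->ret = ret;
    e->next = pending;
    pending = e;
}

/* Event-loop stand-in: deliver all queued error completions. */
static void run_bottom_halves(void)
{
    while (pending) {
        PendingError *e = pending;
        pending = e->next;
        e->cb(e->opaque, e->ret);
        free(e);
    }
}

static int demo_last_ret;
static void demo_cb(void *opaque, int ret)
{
    (void)opaque;
    demo_last_ret = ret;
}

/* Returns the error code the callback eventually observed. */
int demo_abort(void)
{
    demo_last_ret = 0;
    abort_request(demo_cb, NULL, -ENOMEDIUM);
    assert(demo_last_ret == 0);   /* not delivered synchronously */
    run_bottom_halves();
    return demo_last_ret;         /* delivered from the "event loop" */
}
```

The same shape is why every `blk_check_request()` failure in the patch funnels through one helper: the asynchronous-delivery guarantee lives in a single place.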




[Qemu-block] [PATCH V3 4/6] ide: orphan all buffered requests on DMA cancel

2015-11-06 Thread Peter Lieven
If the guest cancels a DMA request we can prematurely invoke the
callbacks of all buffered requests and flag those requests as
orphaned. Ideally this avoids the need to drain all requests.
For CDROM devices this works in 100% of cases.

Signed-off-by: Peter Lieven 
---
 hw/ide/pci.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/hw/ide/pci.c b/hw/ide/pci.c
index d31ff88..a9e164e 100644
--- a/hw/ide/pci.c
+++ b/hw/ide/pci.c
@@ -240,6 +240,22 @@ void bmdma_cmd_writeb(BMDMAState *bm, uint32_t val)
 /* Ignore writes to SSBM if it keeps the old value */
 if ((val & BM_CMD_START) != (bm->cmd & BM_CMD_START)) {
 if (!(val & BM_CMD_START)) {
+/* First invoke the callbacks of all buffered requests
+ * and flag those requests as orphaned. Ideally there
+ * are no unbuffered requests (scatter-gather DMA or
+ * write requests) pending and we can avoid draining. */
+IDEBufferedRequest *req;
+IDEState *s = idebus_active_if(bm->bus);
+QLIST_FOREACH(req, &s->buffered_requests, list) {
+if (!req->orphaned) {
+#ifdef DEBUG_IDE
+printf("%s: invoking cb %p of buffered request %p with"
+   " -ECANCELED\n", __func__, req->original_cb, req);
+#endif
+req->original_cb(req->original_opaque, -ECANCELED);
+}
+req->orphaned = true;
+}
 /*
  * We can't cancel Scatter Gather DMA in the middle of the
  * operation or a partial (not full) DMA transfer would reach
@@ -253,6 +269,9 @@ void bmdma_cmd_writeb(BMDMAState *bm, uint32_t val)
  * aio operation with preadv/pwritev.
  */
 if (bm->bus->dma->aiocb) {
+#ifdef DEBUG_IDE
+printf("%s: draining all remaining requests\n", __func__);
+#endif
 blk_drain_all();
 assert(bm->bus->dma->aiocb == NULL);
 }
-- 
1.9.1
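A simplified model of the cancel path above (hypothetical names, not the real QEMU API): each buffered read keeps the guest's original completion callback, so on DMA cancel every pending request can be completed toward the guest with -ECANCELED and flagged as orphaned. When the underlying I/O finishes later, the orphaned flag makes the bounce-buffer completion drop the data instead of forwarding it, so the guest sees exactly one completion per request.

```c
/* Orphan-on-cancel bookkeeping for buffered read requests. */
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef void CompletionFunc(void *opaque, int ret);

typedef struct BufferedRequest {
    CompletionFunc *original_cb;
    void *original_opaque;
    int orphaned;
    struct BufferedRequest *next;   /* stand-in for QLIST linkage */
} BufferedRequest;

/* DMA-cancel path: complete everything toward the guest, then orphan. */
static void orphan_all(BufferedRequest *head)
{
    for (BufferedRequest *req = head; req; req = req->next) {
        if (!req->orphaned) {
            req->original_cb(req->original_opaque, -ECANCELED);
        }
        req->orphaned = 1;
    }
}

/* Backend-completion path: forward the result only if still wanted. */
static void buffered_read_done(BufferedRequest *req, int ret)
{
    if (!req->orphaned) {
        req->original_cb(req->original_opaque, ret);
    }
}

static int demo_ret;
static int demo_calls;
static void demo_cb(void *opaque, int ret)
{
    (void)opaque;
    demo_ret = ret;
    demo_calls++;
}

/* Cancel one pending request, then let the backend complete it late;
 * the guest must see exactly one completion, with -ECANCELED. */
int demo_cancel(void)
{
    BufferedRequest req = { demo_cb, NULL, 0, NULL };
    demo_ret = 0;
    demo_calls = 0;
    orphan_all(&req);
    buffered_read_done(&req, 0);   /* late backend completion, dropped */
    assert(demo_calls == 1);
    return demo_ret;
}
```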




Re: [Qemu-block] [Qemu-devel] [PATCH v10 12/14] block: add transactional properties

2015-11-06 Thread Kevin Wolf
On 05.11.2015 at 19:52, John Snow wrote:
> 
> 
> On 11/05/2015 05:47 AM, Stefan Hajnoczi wrote:
> > On Tue, Nov 03, 2015 at 12:27:19PM -0500, John Snow wrote:
> >>
> >>
> >> On 11/03/2015 10:17 AM, Stefan Hajnoczi wrote:
> >>> On Fri, Oct 23, 2015 at 07:56:50PM -0400, John Snow wrote:
>  @@ -1732,6 +1757,10 @@ static void 
>  block_dirty_bitmap_add_prepare(BlkActionState *common,
>   BlockDirtyBitmapState *state = DO_UPCAST(BlockDirtyBitmapState,
>    common, common);
>   
>  +if (action_check_cancel_mode(common, errp) < 0) {
>  +return;
>  +}
>  +
>   action = common->action->block_dirty_bitmap_add;
>   /* AIO context taken and released within qmp_block_dirty_bitmap_add 
>  */
>   qmp_block_dirty_bitmap_add(action->node, action->name,
>  @@ -1767,6 +1796,10 @@ static void 
>  block_dirty_bitmap_clear_prepare(BlkActionState *common,
>    common, common);
>   BlockDirtyBitmap *action;
>   
>  +if (action_check_cancel_mode(common, errp) < 0) {
>  +return;
>  +}
>  +
>   action = common->action->block_dirty_bitmap_clear;
>   state->bitmap = block_dirty_bitmap_lookup(action->node,
> action->name,
> >>>
> >>> Why do the bitmap add/clear actions not support err-cancel=all?
> >>>
> >>> I understand why other block jobs don't support it, but it's not clear
> >>> why these non-block job actions cannot.
> >>>
> >>
> >> Because they don't have a callback to invoke if the rest of the job fails.
> >>
> >> I could create a BlockJob for them complete with a callback to invoke,
> >> but basically it's just because there's no interface to unwind them, or
> >> an interface to join them with the transaction.
> >>
> >> They're small, synchronous non-job actions. Which makes them weird.
> > 
> > Funny, we've been looking at the same picture while seeing different
> > things:
> > https://en.wikipedia.org/wiki/Rabbit%E2%80%93duck_illusion
> > 
> > I think I understand your idea: the transaction should include both
> > immediate actions as well as block jobs.
> > 
> > My mental model was different: immediate actions commit/abort along with
> > the 'transaction' command.  Block jobs are separate and complete/cancel
> > together in a group.
> > 
> > In practice I think the two end up being similar because we won't be
> > able to implement immediate action commit/abort together with
> > long-running block jobs because the immediate actions rely on
> > quiescing/pausing the guest for atomic commit/abort.
> > 
> > So with your mental model the QMP client has to submit 2 'transaction'
> > commands: 1 for the immediate actions, 1 for the block jobs.
> > 
> > In my mental model the QMP client submits 1 command but the immediate
> > actions and block jobs are two separate transaction scopes.  This means
> > if the block jobs fail, the client needs to be aware of the immediate
> > actions that have committed.  Because of this, it becomes just as much
> > client effort as submitting two separate 'transaction' commands in your
> > model.
> > 
> > Can anyone see a practical difference?  I think I'm happy with John's
> > model.
> > 
> > Stefan
> > 
> 
> We discussed this off-list, but for the sake of the archive:
> 
> == How it is now ==
> 
> Currently, transactions have two implicit phases: the first is the
> synchronous phase. If this phase completes successfully, we consider the
> transaction a success. The second phase is the asynchronous phase where
> jobs launched by the synchronous phase run to completion.
> 
> All synchronous commands must complete for the transaction to "succeed."
> There are currently (pre-patch) no guarantees about asynchronous command
> completion. As long as all synchronous actions complete, asynchronous
> actions are free to succeed or fail individually.

You're overcomplicating this. All actions are currently synchronous and
what you consider asynchronous transaction actions aren't actually part
of the transaction at all. The action is "start block job X", not "run
block job X".

> == My Model ==
> 
> The current behavior is my "err-cancel = none" scenario: we offer no
> guarantee about the success or failure of the transaction as a whole
> after the synchronous portion has completed.
> 
> What I was proposing is "err-cancel = all," which to me means that _ALL_
> commands in this transaction are to succeed (synchronous or not) before
> _any_ actions are irrevocably committed. This means that for a
> hypothetical mixed synchronous-asynchronous transaction, that even after
> the transaction succeeded (it passed the synchronous phase), if an
> asynchronous action later fails, all actions both synchronous and non
> are rolled-back -- a kind of retroactive failure of the transaction.
> This is clearly not possible in 
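The synchronous prepare/commit/abort flow described in "How it is now" can be sketched as follows. This is a minimal model under assumed names (a hypothetical Action vtable and transaction_run driver; the real BlkActionState machinery in blockdev.c is more involved): prepare every action, roll back in reverse order on the first failure, otherwise commit them all. Any jobs the actions launch afterwards are outside this scope -- the "err-cancel = none" behaviour.

```c
/* Two-phase transaction: all-or-nothing over the synchronous phase only. */
#include <assert.h>

typedef struct Action {
    int (*prepare)(struct Action *a);   /* synchronous phase, may fail */
    void (*commit)(struct Action *a);
    void (*abort)(struct Action *a);
} Action;

static int transaction_run(Action **acts, int n)
{
    int i;
    for (i = 0; i < n; i++) {
        if (acts[i]->prepare(acts[i]) < 0) {
            /* Roll back the already-prepared actions in reverse order. */
            while (--i >= 0) {
                acts[i]->abort(acts[i]);
            }
            return -1;
        }
    }
    for (i = 0; i < n; i++) {
        acts[i]->commit(acts[i]);
    }
    return 0;
}

static int demo_committed, demo_aborted;
static int prep_ok(Action *a)    { (void)a; return 0; }
static int prep_fail(Action *a)  { (void)a; return -1; }
static void on_commit(Action *a) { (void)a; demo_committed++; }
static void on_abort(Action *a)  { (void)a; demo_aborted++; }

/* One good action followed by one failing action: nothing commits,
 * and only the already-prepared action is rolled back. */
int demo_rollback(void)
{
    Action good = { prep_ok, on_commit, on_abort };
    Action bad  = { prep_fail, on_commit, on_abort };
    Action *acts[] = { &good, &bad };
    demo_committed = demo_aborted = 0;
    int ret = transaction_run(acts, 2);
    assert(ret == -1);
    assert(demo_committed == 0);
    return demo_aborted;
}
```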

Re: [Qemu-block] [Qemu-devel] [PULL 00/37] Block layer patches

2015-11-06 Thread Kevin Wolf
On 05.11.2015 at 20:01, Peter Maydell wrote:
> On 5 November 2015 at 18:17, Kevin Wolf  wrote:
> > The following changes since commit 8835b9df3bddf332c883c861d6a1defc12c4ebe9:
> >
> >   Merge remote-tracking branch 
> > 'remotes/mdroth/tags/qga-pull-2015-11-04-tag' into staging (2015-11-05 
> > 10:52:35 +)
> >
> > are available in the git repository at:
> >
> >
> >   git://repo.or.cz/qemu/kevin.git tags/for-upstream
> >
> > for you to fetch changes up to 22bdadd23b64af65ac2dd816848dbe2b1240a77a:
> >
> >   Merge remote-tracking branch 
> > 'mreitz/tags/pull-block-for-kevin-2015-11-05' into queue-block (2015-11-05 
> > 18:01:37 +0100)
> >
> > 
> >
> > Block layer patches
> 
> Fails to build on OSX, I'm afraid:
> Undefined symbols for architecture x86_64:
>   "_qmp_change_blockdev", referenced from:
>   -[QemuCocoaAppController changeDeviceMedia:] in cocoa.o

Max, I think this is yours.

Kevin