Re: [PATCH v5 3/4] block: add block timer and block throttling algorithm

2011-08-11 Thread Zhi Yong Wu
On Wed, Aug 10, 2011 at 7:00 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
 On Wed, Aug 10, 2011 at 7:57 AM, Zhi Yong Wu zwu.ker...@gmail.com wrote:
 On Tue, Aug 9, 2011 at 11:19 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
 On Tue, Aug 9, 2011 at 5:17 AM, Zhi Yong Wu wu...@linux.vnet.ibm.com 
 wrote:
 Note:
      1.) When bps/iops limits are specified to a small value such as 511 
 bytes/s, this VM will hang up. We are considering how to handle this 
 scenario.

 If an I/O request is larger than the limit itself then I think we
 should let it through with a warning that the limit is too low.  This
 If it only prints a warning, that does not seem like a nice approach; the
 throttling function will have no effect in this scenario.

 Setting the limit below the max request size is a misconfiguration.
 It's a problem that the user needs to be aware of and fix, so a
 warning is appropriate.  Unfortunately the max request size is not
 easy to determine from the block layer and may vary between guest
 OSes, that's why we need to do something at runtime.

 We cannot leave this case unhandled because it results in the guest
 timing out I/O without any information about the problem - that makes
 it hard to troubleshoot for the user.  Any other ideas on how to
 handle this case?
Can we constrain the I/O limits specified by the user? When the limit
is smaller than some value such as 1 MB/s, an error is reported to the
user reminding him/her that the limit is too small.


 @@ -90,6 +100,68 @@ int is_windows_drive(const char *filename)
  }
  #endif

 +/* throttling disk I/O limits */
 +void bdrv_io_limits_disable(BlockDriverState *bs)
 +{
 +    bs->io_limits_enabled = false;
 +    bs->req_from_queue    = false;
 +
 +    if (bs->block_queue) {
 +        qemu_block_queue_flush(bs->block_queue);
 +        qemu_del_block_queue(bs->block_queue);

 When you fix the acb lifecycle in block-queue.c this will no longer be
 safe.  Requests that are dispatched still have acbs that belong to
 When the request is dispatched, if it is successfully serviced, our
 acb will have been removed.

 Yes, that is how the patches work right now.  But the acb lifecycle
 that I explained in response to the other patch changes this.
OK.

 this queue.  It is simplest to keep the queue for the lifetime of the
 BlockDriverState - it's just a list so it doesn't take up much memory.
 I would prefer to remove this queue, even though it is simple and
 harmless to let it exist.

 If the existing acbs do not touch their BlockQueue after this point in
 the code, then it will be safe to delete the queue since it will no
 longer be referenced.  If you can guarantee this then it's fine to
 keep this code.
I prefer to remove the queue.



 +    }
 +
 +    if (bs->block_timer) {
 +        qemu_del_timer(bs->block_timer);
 +        qemu_free_timer(bs->block_timer);

 To prevent double frees:

 bs->block_timer = NULL;
 Can't qemu_free_timer() ensure that it is NULL? Then this would not be necessary. :)

 No, qemu_free_timer() is a function.  qemu_free_timer() cannot change
 the pointer variable bs->block_timer because in C functions take
 arguments by value.

 After qemu_free_timer() bs->block_timer will still hold the old
 address of the (freed) timer.  Then when bdrv_close() is called it
 does qemu_free_timer() again because bs->block_timer is not NULL.  It
 is illegal to free() a pointer twice and it can cause a crash.
Done.



 +    }
 +
 +    bs->slice_start[0]   = 0;
 +    bs->slice_start[1]   = 0;
 +
 +    bs->slice_end[0]     = 0;
 +    bs->slice_end[1]     = 0;
 +}
 +
 +static void bdrv_block_timer(void *opaque)
 +{
 +    BlockDriverState *bs = opaque;
 +    BlockQueue *queue = bs->block_queue;
 +
 +    qemu_block_queue_flush(queue);
 +}
 +
 +void bdrv_io_limits_enable(BlockDriverState *bs)
 +{
 +    bs->req_from_queue = false;
 +
 +    bs->block_queue    = qemu_new_block_queue();
 +    bs->block_timer    = qemu_new_timer_ns(vm_clock, bdrv_block_timer, bs);
 +
 +    bs->slice_start[BLOCK_IO_LIMIT_READ]  = qemu_get_clock_ns(vm_clock);
 +    bs->slice_start[BLOCK_IO_LIMIT_WRITE] = qemu_get_clock_ns(vm_clock);
 +
 +    bs->slice_end[BLOCK_IO_LIMIT_READ]    =
 +                      qemu_get_clock_ns(vm_clock) + BLOCK_IO_SLICE_TIME;
 +    bs->slice_end[BLOCK_IO_LIMIT_WRITE]   =
 +                      qemu_get_clock_ns(vm_clock) + BLOCK_IO_SLICE_TIME;

 The slice times could just be initialized to 0 here.  When
 bdrv_exceed_io_limits() is called it will start a new slice.
 The result should be the same as with the current method even if they are set to zero.

 Yes, the result is the same except we only create new slices in one place.
Done.

 +            }
 +
 +            return true;
 +        }
 +    }
 +
 +    elapsed_time  = real_time - bs->slice_start[is_write];
 +    elapsed_time  /= (BLOCK_IO_SLICE_TIME * 10.0);

 Why * 10.0?
 elapsed_time is currently in ns, and it needs to be translated into a
 floating-point value in minutes.

 I think you mean seconds instead of minutes.  Then the conversion has
 nothing to do with BLOCK_IO_SLICE_TIME, it's just nanoseconds to seconds.

Re: [PATCH v5 3/4] block: add block timer and block throttling algorithm

2011-08-11 Thread Stefan Hajnoczi
On Fri, Aug 12, 2011 at 6:00 AM, Zhi Yong Wu zwu.ker...@gmail.com wrote:
 On Wed, Aug 10, 2011 at 7:00 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
 On Wed, Aug 10, 2011 at 7:57 AM, Zhi Yong Wu zwu.ker...@gmail.com wrote:
 On Tue, Aug 9, 2011 at 11:19 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
 On Tue, Aug 9, 2011 at 5:17 AM, Zhi Yong Wu wu...@linux.vnet.ibm.com 
 wrote:
 Note:
      1.) When bps/iops limits are specified to a small value such as 511 
 bytes/s, this VM will hang up. We are considering how to handle this 
 scenario.

 If an I/O request is larger than the limit itself then I think we
 should let it through with a warning that the limit is too low.  This
 If it only prints a warning, that does not seem like a nice approach; the
 throttling function will have no effect in this scenario.

 Setting the limit below the max request size is a misconfiguration.
 It's a problem that the user needs to be aware of and fix, so a
 warning is appropriate.  Unfortunately the max request size is not
 easy to determine from the block layer and may vary between guest
 OSes, that's why we need to do something at runtime.

 We cannot leave this case unhandled because it results in the guest
 timing out I/O without any information about the problem - that makes
 it hard to troubleshoot for the user.  Any other ideas on how to
 handle this case?
 Can we constrain the I/O limits specified by the user? When the limit
 is smaller than some value such as 1 MB/s, an error is reported to the
 user reminding him/her that the limit is too small.

It would be an arbitrary limit because the maximum request size
depends on the storage interface or on the host block device.  Guest
OSes may limit the max request size further.  There is no single
constant that we can choose ahead of time.  Any constant value risks
not allowing the user to set a small valid limit or being less than
the max request size and therefore causing I/O to stall again.

Stefan


Re: [PATCH v5 3/4] block: add block timer and block throttling algorithm

2011-08-11 Thread Zhi Yong Wu
On Fri, Aug 12, 2011 at 1:06 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
 On Fri, Aug 12, 2011 at 6:00 AM, Zhi Yong Wu zwu.ker...@gmail.com wrote:
 On Wed, Aug 10, 2011 at 7:00 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
 On Wed, Aug 10, 2011 at 7:57 AM, Zhi Yong Wu zwu.ker...@gmail.com wrote:
 On Tue, Aug 9, 2011 at 11:19 PM, Stefan Hajnoczi stefa...@gmail.com 
 wrote:
 On Tue, Aug 9, 2011 at 5:17 AM, Zhi Yong Wu wu...@linux.vnet.ibm.com 
 wrote:
 Note:
      1.) When bps/iops limits are specified to a small value such as 511 
 bytes/s, this VM will hang up. We are considering how to handle this 
 scenario.

 If an I/O request is larger than the limit itself then I think we
 should let it through with a warning that the limit is too low.  This
 If it only prints a warning, that does not seem like a nice approach; the
 throttling function will have no effect in this scenario.

 Setting the limit below the max request size is a misconfiguration.
 It's a problem that the user needs to be aware of and fix, so a
 warning is appropriate.  Unfortunately the max request size is not
 easy to determine from the block layer and may vary between guest
 OSes, that's why we need to do something at runtime.

 We cannot leave this case unhandled because it results in the guest
 timing out I/O without any information about the problem - that makes
 it hard to troubleshoot for the user.  Any other ideas on how to
 handle this case?
 Can we constrain the I/O limits specified by the user? When the limit
 is smaller than some value such as 1 MB/s, an error is reported to the
 user reminding him/her that the limit is too small.

 It would be an arbitrary limit because the maximum request size
 depends on the storage interface or on the host block device.  Guest
 OSes may limit the max request size further.  There is no single
 constant that we can choose ahead of time.  Any constant value risks
 not allowing the user to set a small valid limit or being less than
 the max request size and therefore causing I/O to stall again.
Let us think about it first. When either of us has a better idea, we can
discuss this again. :)

 Stefan




-- 
Regards,

Zhi Yong Wu


Re: [PATCH v5 3/4] block: add block timer and block throttling algorithm

2011-08-10 Thread Zhi Yong Wu
On Tue, Aug 9, 2011 at 11:19 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
 On Tue, Aug 9, 2011 at 5:17 AM, Zhi Yong Wu wu...@linux.vnet.ibm.com wrote:
 Note:
      1.) When bps/iops limits are specified to a small value such as 511 
 bytes/s, this VM will hang up. We are considering how to handle this scenario.

 If an I/O request is larger than the limit itself then I think we
 should let it through with a warning that the limit is too low.  This
If it only prints a warning, that does not seem like a nice approach; the
throttling function will have no effect in this scenario.
 reflects the fact that we don't split I/O requests into smaller
 requests and the guest may give us 128 KB (or larger?) requests.  In
 practice the lowest feasible limit is probably 256 KB or 512 KB.

 diff --git a/block.c b/block.c
 index 24a25d5..8fd6643 100644
 --- a/block.c
 +++ b/block.c
 @@ -29,6 +29,9 @@
  #include "module.h"
  #include "qemu-objects.h"

 +#include "qemu-timer.h"
 +#include "block/blk-queue.h"
 +
  #ifdef CONFIG_BSD
  #include <sys/types.h>
  #include <sys/stat.h>
 @@ -58,6 +61,13 @@ static int bdrv_read_em(BlockDriverState *bs, int64_t 
 sector_num,
  static int bdrv_write_em(BlockDriverState *bs, int64_t sector_num,
                          const uint8_t *buf, int nb_sectors);

 +static bool bdrv_exceed_bps_limits(BlockDriverState *bs, int nb_sectors,
 +        bool is_write, double elapsed_time, uint64_t *wait);
 +static bool bdrv_exceed_iops_limits(BlockDriverState *bs, bool is_write,
 +        double elapsed_time, uint64_t *wait);
 +static bool bdrv_exceed_io_limits(BlockDriverState *bs, int nb_sectors,
 +        bool is_write, uint64_t *wait);
 +
  static QTAILQ_HEAD(, BlockDriverState) bdrv_states =
     QTAILQ_HEAD_INITIALIZER(bdrv_states);

 @@ -90,6 +100,68 @@ int is_windows_drive(const char *filename)
  }
  #endif

 +/* throttling disk I/O limits */
 +void bdrv_io_limits_disable(BlockDriverState *bs)
 +{
 +    bs->io_limits_enabled = false;
 +    bs->req_from_queue    = false;
 +
 +    if (bs->block_queue) {
 +        qemu_block_queue_flush(bs->block_queue);
 +        qemu_del_block_queue(bs->block_queue);

 When you fix the acb lifecycle in block-queue.c this will no longer be
 safe.  Requests that are dispatched still have acbs that belong to
When the request is dispatched, if it is successfully serviced, our
acb will have been removed.
 this queue.  It is simplest to keep the queue for the lifetime of the
 BlockDriverState - it's just a list so it doesn't take up much memory.
I would prefer to remove this queue, even though it is simple and
harmless to let it exist.


 +    }
 +
 +    if (bs->block_timer) {
 +        qemu_del_timer(bs->block_timer);
 +        qemu_free_timer(bs->block_timer);

 To prevent double frees:

 bs->block_timer = NULL;
Can't qemu_free_timer() ensure that it is NULL? Then this would not be necessary. :)


 +    }
 +
 +    bs->slice_start[0]   = 0;
 +    bs->slice_start[1]   = 0;
 +
 +    bs->slice_end[0]     = 0;
 +    bs->slice_end[1]     = 0;
 +}
 +
 +static void bdrv_block_timer(void *opaque)
 +{
 +    BlockDriverState *bs = opaque;
 +    BlockQueue *queue = bs->block_queue;
 +
 +    qemu_block_queue_flush(queue);
 +}
 +
 +void bdrv_io_limits_enable(BlockDriverState *bs)
 +{
 +    bs->req_from_queue = false;
 +
 +    bs->block_queue    = qemu_new_block_queue();
 +    bs->block_timer    = qemu_new_timer_ns(vm_clock, bdrv_block_timer, bs);
 +
 +    bs->slice_start[BLOCK_IO_LIMIT_READ]  = qemu_get_clock_ns(vm_clock);
 +    bs->slice_start[BLOCK_IO_LIMIT_WRITE] = qemu_get_clock_ns(vm_clock);
 +
 +    bs->slice_end[BLOCK_IO_LIMIT_READ]    =
 +                      qemu_get_clock_ns(vm_clock) + BLOCK_IO_SLICE_TIME;
 +    bs->slice_end[BLOCK_IO_LIMIT_WRITE]   =
 +                      qemu_get_clock_ns(vm_clock) + BLOCK_IO_SLICE_TIME;

 The slice times could just be initialized to 0 here.  When
 bdrv_exceed_io_limits() is called it will start a new slice.
The result should be the same as with the current method even if they are set to zero.

 +}
 +
 +bool bdrv_io_limits_enabled(BlockDriverState *bs)
 +{
 +    BlockIOLimit *io_limits = &bs->io_limits;
 +    if ((io_limits->bps[BLOCK_IO_LIMIT_READ] == 0)
 +          && (io_limits->bps[BLOCK_IO_LIMIT_WRITE] == 0)
 +          && (io_limits->bps[BLOCK_IO_LIMIT_TOTAL] == 0)
 +          && (io_limits->iops[BLOCK_IO_LIMIT_READ] == 0)
 +          && (io_limits->iops[BLOCK_IO_LIMIT_WRITE] == 0)
 +          && (io_limits->iops[BLOCK_IO_LIMIT_TOTAL] == 0)) {
 +        return false;
 +    }
 +
 +    return true;
 +}
 +
  /* check if the path starts with protocol: */
  static int path_has_protocol(const char *path)
  {
 @@ -642,6 +714,11 @@ int bdrv_open(BlockDriverState *bs, const char 
 *filename, int flags,
             bs->change_cb(bs->change_opaque, CHANGE_MEDIA);
     }

 +    /* throttling disk I/O limits */
 +    if (bs->io_limits_enabled) {
 +        bdrv_io_limits_enable(bs);
 +    }
 +
     return 0;

  unlink_and_fail:
 @@ -680,6 +757,16 @@ void bdrv_close(BlockDriverState *bs)
         if (bs->change_cb)
             

Re: [PATCH v5 3/4] block: add block timer and block throttling algorithm

2011-08-10 Thread Stefan Hajnoczi
On Wed, Aug 10, 2011 at 7:57 AM, Zhi Yong Wu zwu.ker...@gmail.com wrote:
 On Tue, Aug 9, 2011 at 11:19 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
 On Tue, Aug 9, 2011 at 5:17 AM, Zhi Yong Wu wu...@linux.vnet.ibm.com wrote:
 Note:
      1.) When bps/iops limits are specified to a small value such as 511 
 bytes/s, this VM will hang up. We are considering how to handle this 
 scenario.

 If an I/O request is larger than the limit itself then I think we
 should let it through with a warning that the limit is too low.  This
 If it only prints a warning, that does not seem like a nice approach; the
 throttling function will have no effect in this scenario.

Setting the limit below the max request size is a misconfiguration.
It's a problem that the user needs to be aware of and fix, so a
warning is appropriate.  Unfortunately the max request size is not
easy to determine from the block layer and may vary between guest
OSes, that's why we need to do something at runtime.

We cannot leave this case unhandled because it results in the guest
timing out I/O without any information about the problem - that makes
it hard to troubleshoot for the user.  Any other ideas on how to
handle this case?
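
A rough sketch of the warn-and-pass-through idea (editorial sketch, not part
of the posted patch; the helper name, the use of BDRV_SECTOR_SIZE and the
plain fprintf() warning are illustrative assumptions, placed in block.c
context):

/* Let a request through unthrottled if it alone is bigger than the whole
 * bps budget, since queueing it could never make it fit into one slice. */
static bool bdrv_request_exceeds_bps_limit(BlockDriverState *bs,
                                           int nb_sectors, bool is_write)
{
    uint64_t bytes = (uint64_t)nb_sectors * BDRV_SECTOR_SIZE;
    uint64_t bps   = bs->io_limits.bps[is_write];

    if (bps != 0 && bytes > bps) {
        fprintf(stderr, "block: bps limit %" PRIu64 " is smaller than a "
                "single %" PRIu64 " byte request; letting it through\n",
                bps, bytes);
        return true;    /* caller dispatches the request without queueing */
    }
    return false;
}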

 @@ -90,6 +100,68 @@ int is_windows_drive(const char *filename)
  }
  #endif

 +/* throttling disk I/O limits */
 +void bdrv_io_limits_disable(BlockDriverState *bs)
 +{
 +    bs->io_limits_enabled = false;
 +    bs->req_from_queue    = false;
 +
 +    if (bs->block_queue) {
 +        qemu_block_queue_flush(bs->block_queue);
 +        qemu_del_block_queue(bs->block_queue);

 When you fix the acb lifecycle in block-queue.c this will no longer be
 safe.  Requests that are dispatched still have acbs that belong to
 When the request is dispatched, if it is successfully serviced, our
 acb will have been removed.

Yes, that is how the patches work right now.  But the acb lifecycle
that I explained in response to the other patch changes this.

 this queue.  It is simplest to keep the queue for the lifetime of the
 BlockDriverState - it's just a list so it doesn't take up much memory.
 I would prefer to remove this queue, even though it is simple and
 harmless to let it exist.

If the existing acbs do not touch their BlockQueue after this point in
the code, then it will be safe to delete the queue since it will no
longer be referenced.  If you can guarantee this then it's fine to
keep this code.



 +    }
 +
 +    if (bs->block_timer) {
 +        qemu_del_timer(bs->block_timer);
 +        qemu_free_timer(bs->block_timer);

 To prevent double frees:

 bs->block_timer = NULL;
 Can't qemu_free_timer() ensure that it is NULL? Then this would not be necessary. :)

No, qemu_free_timer() is a function.  qemu_free_timer() cannot change
the pointer variable bs->block_timer because in C functions take
arguments by value.

After qemu_free_timer() bs->block_timer will still hold the old
address of the (freed) timer.  Then when bdrv_close() is called it
does qemu_free_timer() again because bs->block_timer is not NULL.  It
is illegal to free() a pointer twice and it can cause a crash.
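
A minimal standalone illustration of the problem (editorial example, not QEMU
code), showing why the pointer has to be set to NULL after freeing:

#include <stdlib.h>

struct state { void *timer; };

static void cleanup(struct state *s)
{
    if (s->timer) {
        free(s->timer);
        s->timer = NULL;   /* without this, a second cleanup() call would
                              free() the same address again */
    }
}

int main(void)
{
    struct state s = { .timer = malloc(16) };
    cleanup(&s);
    cleanup(&s);   /* safe only because cleanup() NULLed the pointer */
    return 0;
}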



 +    }
 +
 +    bs->slice_start[0]   = 0;
 +    bs->slice_start[1]   = 0;
 +
 +    bs->slice_end[0]     = 0;
 +    bs->slice_end[1]     = 0;
 +}
 +
 +static void bdrv_block_timer(void *opaque)
 +{
 +    BlockDriverState *bs = opaque;
 +    BlockQueue *queue = bs->block_queue;
 +
 +    qemu_block_queue_flush(queue);
 +}
 +
 +void bdrv_io_limits_enable(BlockDriverState *bs)
 +{
 +    bs->req_from_queue = false;
 +
 +    bs->block_queue    = qemu_new_block_queue();
 +    bs->block_timer    = qemu_new_timer_ns(vm_clock, bdrv_block_timer, bs);
 +
 +    bs->slice_start[BLOCK_IO_LIMIT_READ]  = qemu_get_clock_ns(vm_clock);
 +    bs->slice_start[BLOCK_IO_LIMIT_WRITE] = qemu_get_clock_ns(vm_clock);
 +
 +    bs->slice_end[BLOCK_IO_LIMIT_READ]    =
 +                      qemu_get_clock_ns(vm_clock) + BLOCK_IO_SLICE_TIME;
 +    bs->slice_end[BLOCK_IO_LIMIT_WRITE]   =
 +                      qemu_get_clock_ns(vm_clock) + BLOCK_IO_SLICE_TIME;

 The slice times could just be initialized to 0 here.  When
 bdrv_exceed_io_limits() is called it will start a new slice.
 The result should be the same as with the current method even if they are set to zero.

Yes, the result is the same except we only create new slices in one place.

 +            }
 +
 +            return true;
 +        }
 +    }
 +
 +    elapsed_time  = real_time - bs->slice_start[is_write];
 +    elapsed_time  /= (BLOCK_IO_SLICE_TIME * 10.0);

 Why * 10.0?
 elapsed_time is currently in ns, and it needs to be translated into a
 floating-point value in minutes.

I think you mean seconds instead of minutes.  Then the conversion has
nothing to do with BLOCK_IO_SLICE_TIME, it's just nanoseconds to
seconds.  It would be clearer to drop BLOCK_IO_SLICE_TIME from the
expression and divide by a new NANOSECONDS_PER_SECOND constant
(10^9).
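
Assuming BLOCK_IO_SLICE_TIME is 0.1 s expressed in nanoseconds, multiplying it
by 10 gives the same divisor, but the intent is clearer written as follows
(sketch of the suggested form, using the variables already present in the
patch):

#define NANOSECONDS_PER_SECOND 1000000000.0

elapsed_time  = real_time - bs->slice_start[is_write];
elapsed_time /= NANOSECONDS_PER_SECOND;   /* nanoseconds -> seconds */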


 +
 +    bps_ret  = bdrv_exceed_bps_limits(bs, nb_sectors,
 +                                      is_write, elapsed_time, &bps_wait);
 +    

Re: [PATCH v5 3/4] block: add block timer and block throttling algorithm

2011-08-09 Thread Ram Pai
On Tue, Aug 09, 2011 at 12:17:51PM +0800, Zhi Yong Wu wrote:
 Note:
   1.) When bps/iops limits are specified to a small value such as 511 
 bytes/s, this VM will hang up. We are considering how to handle this scenario.
   2.) When a dd command is issued in the guest, if its option bs is set to a 
 large value such as bs=1024K, the resulting speed will be slightly higher than 
 the limits.
 
 For these problems, if you have a nice thought, please let us know. :)
 
 Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
 ---
  block.c     |  347 ++++++++++++++++++++++++++++++++++++++++++++++++++++++--
  block.h     |    6 +-
  block_int.h |   30 +
  3 files changed, 372 insertions(+), 11 deletions(-)
 
 diff --git a/block.c b/block.c
 index 24a25d5..8fd6643 100644
 --- a/block.c
 +++ b/block.c
 @@ -29,6 +29,9 @@
  #include "module.h"
  #include "qemu-objects.h"
 
 +#include "qemu-timer.h"
 +#include "block/blk-queue.h"
 +
  #ifdef CONFIG_BSD
  #include <sys/types.h>
  #include <sys/stat.h>
 @@ -58,6 +61,13 @@ static int bdrv_read_em(BlockDriverState *bs, int64_t 
 sector_num,
  static int bdrv_write_em(BlockDriverState *bs, int64_t sector_num,
   const uint8_t *buf, int nb_sectors);
 
 +static bool bdrv_exceed_bps_limits(BlockDriverState *bs, int nb_sectors,
 +bool is_write, double elapsed_time, uint64_t *wait);
 +static bool bdrv_exceed_iops_limits(BlockDriverState *bs, bool is_write,
 +double elapsed_time, uint64_t *wait);
 +static bool bdrv_exceed_io_limits(BlockDriverState *bs, int nb_sectors,
 +bool is_write, uint64_t *wait);
 +
  static QTAILQ_HEAD(, BlockDriverState) bdrv_states =
  QTAILQ_HEAD_INITIALIZER(bdrv_states);
 
 @@ -90,6 +100,68 @@ int is_windows_drive(const char *filename)
  }
  #endif
 
 +/* throttling disk I/O limits */
 +void bdrv_io_limits_disable(BlockDriverState *bs)
 +{
 +    bs->io_limits_enabled = false;
 +    bs->req_from_queue    = false;
 +
 +    if (bs->block_queue) {
 +        qemu_block_queue_flush(bs->block_queue);
 +        qemu_del_block_queue(bs->block_queue);
 +    }
 +
 +    if (bs->block_timer) {
 +        qemu_del_timer(bs->block_timer);
 +        qemu_free_timer(bs->block_timer);
 +    }
 +
 +    bs->slice_start[0]   = 0;
 +    bs->slice_start[1]   = 0;
 +
 +    bs->slice_end[0]     = 0;
 +    bs->slice_end[1]     = 0;
 +}
 +
 +static void bdrv_block_timer(void *opaque)
 +{
 +    BlockDriverState *bs = opaque;
 +    BlockQueue *queue = bs->block_queue;
 +
 +    qemu_block_queue_flush(queue);
 +}
 +
 +void bdrv_io_limits_enable(BlockDriverState *bs)
 +{
 +    bs->req_from_queue = false;
 +
 +    bs->block_queue    = qemu_new_block_queue();
 +    bs->block_timer    = qemu_new_timer_ns(vm_clock, bdrv_block_timer, bs);
 +
 +    bs->slice_start[BLOCK_IO_LIMIT_READ]  = qemu_get_clock_ns(vm_clock);
 +    bs->slice_start[BLOCK_IO_LIMIT_WRITE] = qemu_get_clock_ns(vm_clock);
 
A minor comment: better to keep the slice_start of both the READ and WRITE
sides the same:

bs->slice_start[BLOCK_IO_LIMIT_WRITE] =
    bs->slice_start[BLOCK_IO_LIMIT_READ];

saves a call to qemu_get_clock_ns().

 +
 +    bs->slice_end[BLOCK_IO_LIMIT_READ]    =
 +                      qemu_get_clock_ns(vm_clock) + BLOCK_IO_SLICE_TIME;

bs->slice_end[BLOCK_IO_LIMIT_READ] = bs->slice_start[BLOCK_IO_LIMIT_READ] +
    BLOCK_IO_SLICE_TIME;

saves one more call to qemu_get_clock_ns().

 +    bs->slice_end[BLOCK_IO_LIMIT_WRITE]   =
 +                      qemu_get_clock_ns(vm_clock) + BLOCK_IO_SLICE_TIME;


bs->slice_end[BLOCK_IO_LIMIT_WRITE] = bs->slice_start[BLOCK_IO_LIMIT_WRITE] +
    BLOCK_IO_SLICE_TIME;

yet another call saving.
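
Putting the three suggestions together, the function could read as follows
(editorial sketch based on the hunks above; the local variable now is added
only for illustration):

void bdrv_io_limits_enable(BlockDriverState *bs)
{
    int64_t now = qemu_get_clock_ns(vm_clock);    /* single clock read */

    bs->req_from_queue = false;

    bs->block_queue    = qemu_new_block_queue();
    bs->block_timer    = qemu_new_timer_ns(vm_clock, bdrv_block_timer, bs);

    bs->slice_start[BLOCK_IO_LIMIT_READ]  = now;
    bs->slice_start[BLOCK_IO_LIMIT_WRITE] = now;

    bs->slice_end[BLOCK_IO_LIMIT_READ]    = now + BLOCK_IO_SLICE_TIME;
    bs->slice_end[BLOCK_IO_LIMIT_WRITE]   = now + BLOCK_IO_SLICE_TIME;
}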


 +}
 +
 +bool bdrv_io_limits_enabled(BlockDriverState *bs)
 +{
 +    BlockIOLimit *io_limits = &bs->io_limits;
 +    if ((io_limits->bps[BLOCK_IO_LIMIT_READ] == 0)
 +          && (io_limits->bps[BLOCK_IO_LIMIT_WRITE] == 0)
 +          && (io_limits->bps[BLOCK_IO_LIMIT_TOTAL] == 0)
 +          && (io_limits->iops[BLOCK_IO_LIMIT_READ] == 0)
 +          && (io_limits->iops[BLOCK_IO_LIMIT_WRITE] == 0)
 +          && (io_limits->iops[BLOCK_IO_LIMIT_TOTAL] == 0)) {
 +        return false;
 +    }
 +
 +    return true;
 +}

can be optimized to:

return (io_limits->bps[BLOCK_IO_LIMIT_READ]
        || io_limits->bps[BLOCK_IO_LIMIT_WRITE]
        || io_limits->bps[BLOCK_IO_LIMIT_TOTAL]
        || io_limits->iops[BLOCK_IO_LIMIT_READ]
        || io_limits->iops[BLOCK_IO_LIMIT_WRITE]
        || io_limits->iops[BLOCK_IO_LIMIT_TOTAL]);

 +
  /* check if the path starts with protocol: */
  static int path_has_protocol(const char *path)
  {
 @@ -642,6 +714,11 @@ int bdrv_open(BlockDriverState *bs, const char 
 *filename, int flags,
   bs->change_cb(bs->change_opaque, CHANGE_MEDIA);
  }
 
 +    /* throttling disk I/O limits */
 +    if (bs->io_limits_enabled) {
 +        bdrv_io_limits_enable(bs);
 +    }
 +
  return 0;
 
  unlink_and_fail:
 @@ -680,6 +757,16 @@ void bdrv_close(BlockDriverState 

Re: [PATCH v5 3/4] block: add block timer and block throttling algorithm

2011-08-09 Thread Stefan Hajnoczi
On Tue, Aug 9, 2011 at 5:17 AM, Zhi Yong Wu wu...@linux.vnet.ibm.com wrote:
 Note:
      1.) When bps/iops limits are specified to a small value such as 511 
 bytes/s, this VM will hang up. We are considering how to handle this scenario.

If an I/O request is larger than the limit itself then I think we
should let it through with a warning that the limit is too low.  This
reflects the fact that we don't split I/O requests into smaller
requests and the guest may give us 128 KB (or larger?) requests.  In
practice the lowest feasible limit is probably 256 KB or 512 KB.

 diff --git a/block.c b/block.c
 index 24a25d5..8fd6643 100644
 --- a/block.c
 +++ b/block.c
 @@ -29,6 +29,9 @@
  #include "module.h"
  #include "qemu-objects.h"

 +#include "qemu-timer.h"
 +#include "block/blk-queue.h"
 +
  #ifdef CONFIG_BSD
  #include <sys/types.h>
  #include <sys/stat.h>
 @@ -58,6 +61,13 @@ static int bdrv_read_em(BlockDriverState *bs, int64_t 
 sector_num,
  static int bdrv_write_em(BlockDriverState *bs, int64_t sector_num,
                          const uint8_t *buf, int nb_sectors);

 +static bool bdrv_exceed_bps_limits(BlockDriverState *bs, int nb_sectors,
 +        bool is_write, double elapsed_time, uint64_t *wait);
 +static bool bdrv_exceed_iops_limits(BlockDriverState *bs, bool is_write,
 +        double elapsed_time, uint64_t *wait);
 +static bool bdrv_exceed_io_limits(BlockDriverState *bs, int nb_sectors,
 +        bool is_write, uint64_t *wait);
 +
  static QTAILQ_HEAD(, BlockDriverState) bdrv_states =
     QTAILQ_HEAD_INITIALIZER(bdrv_states);

 @@ -90,6 +100,68 @@ int is_windows_drive(const char *filename)
  }
  #endif

 +/* throttling disk I/O limits */
 +void bdrv_io_limits_disable(BlockDriverState *bs)
 +{
 +    bs->io_limits_enabled = false;
 +    bs->req_from_queue    = false;
 +
 +    if (bs->block_queue) {
 +        qemu_block_queue_flush(bs->block_queue);
 +        qemu_del_block_queue(bs->block_queue);

When you fix the acb lifecycle in block-queue.c this will no longer be
safe.  Requests that are dispatched still have acbs that belong to
this queue.  It is simplest to keep the queue for the lifetime of the
BlockDriverState - it's just a list so it doesn't take up much memory.
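
In code, the variant described above would simply drop the deletion from
bdrv_io_limits_disable() (editorial sketch; it assumes the queue is instead
freed when the BlockDriverState itself is torn down):

    if (bs->block_queue) {
        /* drain pending requests, but keep the (small) queue allocated for
         * the lifetime of the BlockDriverState so that acbs of already
         * dispatched requests can still reference it safely */
        qemu_block_queue_flush(bs->block_queue);
    }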

 +    }
 +
 +    if (bs->block_timer) {
 +        qemu_del_timer(bs->block_timer);
 +        qemu_free_timer(bs->block_timer);

To prevent double frees:

bs->block_timer = NULL;

 +    }
 +
 +    bs->slice_start[0]   = 0;
 +    bs->slice_start[1]   = 0;
 +
 +    bs->slice_end[0]     = 0;
 +    bs->slice_end[1]     = 0;
 +}
 +
 +static void bdrv_block_timer(void *opaque)
 +{
 +    BlockDriverState *bs = opaque;
 +    BlockQueue *queue = bs->block_queue;
 +
 +    qemu_block_queue_flush(queue);
 +}
 +
 +void bdrv_io_limits_enable(BlockDriverState *bs)
 +{
 +    bs->req_from_queue = false;
 +
 +    bs->block_queue    = qemu_new_block_queue();
 +    bs->block_timer    = qemu_new_timer_ns(vm_clock, bdrv_block_timer, bs);
 +
 +    bs->slice_start[BLOCK_IO_LIMIT_READ]  = qemu_get_clock_ns(vm_clock);
 +    bs->slice_start[BLOCK_IO_LIMIT_WRITE] = qemu_get_clock_ns(vm_clock);
 +
 +    bs->slice_end[BLOCK_IO_LIMIT_READ]    =
 +                      qemu_get_clock_ns(vm_clock) + BLOCK_IO_SLICE_TIME;
 +    bs->slice_end[BLOCK_IO_LIMIT_WRITE]   =
 +                      qemu_get_clock_ns(vm_clock) + BLOCK_IO_SLICE_TIME;

The slice times could just be initialized to 0 here.  When
bdrv_exceed_io_limits() is called it will start a new slice.
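
For illustration, the slice renewal referred to here could look like this at
the top of bdrv_exceed_io_limits() (editorial sketch; the exact bookkeeping
in the patch may differ):

    int64_t now = qemu_get_clock_ns(vm_clock);

    if (now > bs->slice_end[is_write]) {
        /* the previous slice is over (trivially so when slice_end is still
         * 0), so start a new one from the current time */
        bs->slice_start[is_write] = now;
        bs->slice_end[is_write]   = now + BLOCK_IO_SLICE_TIME;
    }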

 +}
 +
 +bool bdrv_io_limits_enabled(BlockDriverState *bs)
 +{
 +    BlockIOLimit *io_limits = &bs->io_limits;
 +    if ((io_limits->bps[BLOCK_IO_LIMIT_READ] == 0)
 +          && (io_limits->bps[BLOCK_IO_LIMIT_WRITE] == 0)
 +          && (io_limits->bps[BLOCK_IO_LIMIT_TOTAL] == 0)
 +          && (io_limits->iops[BLOCK_IO_LIMIT_READ] == 0)
 +          && (io_limits->iops[BLOCK_IO_LIMIT_WRITE] == 0)
 +          && (io_limits->iops[BLOCK_IO_LIMIT_TOTAL] == 0)) {
 +        return false;
 +    }
 +
 +    return true;
 +}
 +
  /* check if the path starts with protocol: */
  static int path_has_protocol(const char *path)
  {
 @@ -642,6 +714,11 @@ int bdrv_open(BlockDriverState *bs, const char 
 *filename, int flags,
             bs->change_cb(bs->change_opaque, CHANGE_MEDIA);
     }

 +    /* throttling disk I/O limits */
 +    if (bs->io_limits_enabled) {
 +        bdrv_io_limits_enable(bs);
 +    }
 +
     return 0;

  unlink_and_fail:
 @@ -680,6 +757,16 @@ void bdrv_close(BlockDriverState *bs)
         if (bs->change_cb)
             bs->change_cb(bs->change_opaque, CHANGE_MEDIA);
     }
 +
 +    /* throttling disk I/O limits */
 +    if (bs->block_queue) {
 +        qemu_del_block_queue(bs->block_queue);
 +    }
 +
 +    if (bs->block_timer) {
 +        qemu_del_timer(bs->block_timer);
 +        qemu_free_timer(bs->block_timer);
 +    }
  }

  void bdrv_close_all(void)
 @@ -1312,6 +1399,15 @@ void bdrv_get_geometry_hint(BlockDriverState *bs,
     *psecs = bs->secs;
  }

 +/* throttling disk io limits */
 +void bdrv_set_io_limits(BlockDriverState *bs,
 +