On 28.06.2016 17:28, Sascha Silbe wrote:
> ratelimit_calculate_delay() previously reset the accounting every time
> slice, no matter how much data had been processed before. This had (at
> least) two consequences:
>
> 1. The minimum speed is rather large, e.g. 5 MiB/s for commit and stream.
>
>    Not sure if there are real-world use cases where this would be a
>    problem. Mirroring and backup over a slow link (e.g. DSL) would
>    come to mind, though.
>
> 2. Tests for block job operations (e.g. cancel) were rather racy
>
>    All block jobs currently use a time slice of 100ms. That's a
>    reasonable value to get smooth output during regular
>    operation. However this also meant that the state of block jobs
>    changed every 100ms, no matter how low the configured limit was. On
>    busy hosts, qemu often transferred additional chunks until the test
>    case had a chance to cancel the job.
>
> Fix the block job rate limit code to delay for more than one time
> slice to address the above issues. To make it easier to handle
> oversized chunks we switch the semantics from returning a delay
> _before_ the current request to a delay _after_ the current
> request. If necessary, this delay consists of multiple time slice
> units.
>
> Since the mirror job sends multiple chunks in one go even if the rate
> limit was exceeded in between, we need to keep track of the start of
> the current time slice so we can correctly re-compute the delay for
> the updated amount of data.
>
> The minimum bandwidth now is 1 data unit per time slice. The block
> jobs are currently passing the amount of data transferred in sectors
> and using 100ms time slices, so this translates to 5120
> bytes/second. With chunk sizes usually being O(512KiB), tests have
> plenty of time (O(100s)) to operate on block jobs. The chance of a
> race condition now is fairly remote, except possibly on insanely
> loaded systems.
>
> Signed-off-by: Sascha Silbe <[email protected]>
> ---
>  block/commit.c           | 13 +++++--------
>  block/mirror.c           |  4 +++-
>  block/stream.c           | 12 ++++--------
>  include/qemu/ratelimit.h | 43 ++++++++++++++++++++++++++++++++++---------
>  4 files changed, 46 insertions(+), 26 deletions(-)
>
[...]
> diff --git a/block/mirror.c b/block/mirror.c
> index a04ed9c..d9d70f0 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -416,7 +416,9 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
>          assert(io_sectors);
>          sector_num += io_sectors;
>          nb_chunks -= DIV_ROUND_UP(io_sectors, sectors_per_chunk);
> -        delay_ns += ratelimit_calculate_delay(&s->limit, io_sectors);
> +        if (s->common.speed) {
> +            delay_ns = ratelimit_calculate_delay(&s->limit, io_sectors);
> +        }
Hmm... Was it a bug that ratelimit_calculate_delay() was called
unconditionally before?
>      }
>      return delay_ns;
>  }
[...]
> diff --git a/include/qemu/ratelimit.h b/include/qemu/ratelimit.h
> index d413a4a..12db769 100644
> --- a/include/qemu/ratelimit.h
> +++ b/include/qemu/ratelimit.h
> @@ -15,34 +15,59 @@
>  #define QEMU_RATELIMIT_H 1
>
>  typedef struct {
> -    int64_t next_slice_time;
> +    int64_t slice_start_time;
> +    int64_t slice_end_time;
>      uint64_t slice_quota;
>      uint64_t slice_ns;
>      uint64_t dispatched;
>  } RateLimit;
>
> +/** Calculate and return delay for next request in ns
> + *
> + * Record that we sent @p n data units. If we may send more data units
> + * in the current time slice, return 0 (i.e. no delay). Otherwise
> + * return the amount of time (in ns) until the start of the next time
> + * slice that will permit sending the next chunk of data.
> + *
> + * Recording sent data units even after exceeding the quota is
> + * permitted; the time slice will be extended accordingly.
> + */
>  static inline int64_t ratelimit_calculate_delay(RateLimit *limit, uint64_t n)
>  {
>      int64_t now = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
> +    uint64_t delay_slices;
>
> -    if (limit->next_slice_time < now) {
> -        limit->next_slice_time = now + limit->slice_ns;
> +    assert(limit->slice_quota && limit->slice_ns);
> +
> +    if (limit->slice_end_time < now) {
> +        /* Previous, possibly extended, time slice finished; reset the
> +         * accounting. */
> +        limit->slice_start_time = now;
> +        limit->slice_end_time = now + limit->slice_ns;
>          limit->dispatched = 0;
>      }
> -    if (limit->dispatched == 0 || limit->dispatched + n <= limit->slice_quota) {
> -        limit->dispatched += n;
> +
> +    limit->dispatched += n;
> +    if (limit->dispatched < limit->slice_quota) {
Nitpick: This should probably stay <=.
> +        /* We may send further data within the current time slice, no
> +         * need to delay the next request. */
>          return 0;
> -    } else {
> -        limit->dispatched = n;
> -        return limit->next_slice_time - now;
>      }
> +
> +    /* Quota exceeded. Calculate the next time slice we may start
> +     * sending data again. */
> +    delay_slices = (limit->dispatched + limit->slice_quota - 1) /
> +        limit->slice_quota;
> +    limit->slice_end_time = limit->slice_start_time +
> +        delay_slices * limit->slice_ns;
I think it would make sense to make this a floating point calculation.
If you don't agree, though:
Reviewed-by: Max Reitz <[email protected]>
> +    return limit->slice_end_time - now;
>  }
>
>  static inline void ratelimit_set_speed(RateLimit *limit, uint64_t speed,
>                                         uint64_t slice_ns)
>  {
>      limit->slice_ns = slice_ns;
> -    limit->slice_quota = ((double)speed * slice_ns)/1000000000ULL;
> +    limit->slice_quota = MAX(((double)speed * slice_ns) / 1000000000ULL, 1);
>  }
>
>  #endif
>