Re: [RFC PATCH] blk-throttle: add burst allowance.

2017-12-18 Thread Khazhismel Kumykov
On Mon, Dec 18, 2017 at 1:01 PM, Vivek Goyal wrote:
> On Mon, Dec 18, 2017 at 12:39:50PM -0800, Khazhismel Kumykov wrote:
>> On Mon, Dec 18, 2017 at 10:29 AM, Vivek Goyal wrote:
>> > On Mon, Dec 18, 2017 at 10:16:02AM -0800, Khazhismel Kumykov wrote:
>> >> On Mon, Nov 20, 2017 at 8:36 PM, Khazhismel Kumykov wrote:
>> >> > On Fri, Nov 17, 2017 at 11:26 AM, Shaohua Li wrote:
>> >> >> On Thu, Nov 16, 2017 at 08:25:58PM -0800, Khazhismel Kumykov wrote:
>> >> >>> On Thu, Nov 16, 2017 at 8:50 AM, Shaohua Li wrote:
>> >> >>> > On Tue, Nov 14, 2017 at 03:10:22PM -0800, Khazhismel Kumykov wrote:
>> >> >>> >> Allows configuration of additional bytes or ios before a throttle
>> >> >>> >> is triggered.
>> >> >>> >>
>> >> >>> >> This allows implementation of a bucket style rate-limit/throttle on a
>> >> >>> >> block device. Previously, bursting to a device was limited to the
>> >> >>> >> allowance granted in a single throtl_slice (similar to a bucket with
>> >> >>> >> limit N and refill rate N/slice).
>> >> >>> >>
>> >> >>> >> Additional parameters bytes/io_burst_conf are defined for each tg,
>> >> >>> >> which define a number of bytes/ios that must be depleted before
>> >> >>> >> throttling happens. A tg that does not deplete this allowance
>> >> >>> >> functions as though it has no configured limits. tgs earn additional
>> >> >>> >> allowance at the rate defined by bps/iops for the tg. Once a tg has
>> >> >>> >> *_disp > *_burst_conf, throttling kicks in. If a tg is idle for a
>> >> >>> >> while, it will again have some burst allowance before it gets
>> >> >>> >> throttled again.
>> >> >>> >>
>> >> >>> >> slice_end for a tg is extended until io_disp/byte_disp would fall to
>> >> >>> >> 0, when all "used" burst allowance would be earned back. trim_slice
>> >> >>> >> still progresses slice_start as before and decrements *_disp as
>> >> >>> >> before, and tgs continue to get bytes/ios in throtl_slice intervals.
>> >> >>> >
>> >> >>> > Can you describe why we need this? It would be great if you can
>> >> >>> > describe the usage model and an example. Does this work for
>> >> >>> > io.low/io.max or both?
>> >> >>> >
>> >> >>> > Thanks,
>> >> >>> > Shaohua
>> >> >>> >
>> >> >>>
>> >> >>> The use case that brought this up was configuring limits for a remote
>> >> >>> shared device. Bursting beyond io.max is desired, but only for so much
>> >> >>> before the limit kicks in; afterwards, with sustained usage, throughput
>> >> >>> is capped. (This proactively avoids remote-side limits.) In that case
>> >> >>> one would configure io.max + io.burst in a root container, and
>> >> >>> configure low/other limits on descendants sharing the resource on the
>> >> >>> same node.
>> >> >>>
>> >> >>> With this patch, so long as a tg has not dispatched more than the
>> >> >>> burst, no limit is applied at all by that tg, including the limit
>> >> >>> imposed by io.low in tg_iops_limit, etc.
>> >> >>
>> >> >> I'd appreciate it if you could give more details about the 'why'.
>> >> >> 'configuring limits for a remote shared device' doesn't justify the
>> >> >> change.
>> >> >
>> >> > This is to configure a bursty workload (and associated device) with a
>> >> > known/allowed expected burst size, but to not allow full utilization
>> >> > of the device for extended periods of time, for QoS. During idle or
>> >> > low-use periods the burst allowance accrues, and then tasks can burst
>> >> > well beyond the configured throttle up to the limit, after which they
>> >> > are throttled. A constant throttle speed isn't sufficient for this, as
>> >> > you can only burst one slice's worth, but a limit of sorts is
>> >> > desirable for preventing over-utilization of the shared device. This
>> >> > type of limit is also slightly different from what I understand io.low
>> >> > does in local cases, in that the tg is only high priority/unthrottled
>> >> > if it is bursty, and is limited with constant usage.
>> >> >
>> >> > Khazhy
>> >>
>> >> Hi Shaohua,
>> >>
>> >> Does this clarify the reason for this patch? Is this (or something
>> >> similar) a good fit for inclusion in blk-throttle?
>> >>
>> >
>> > So does this burst have to be per cgroup? I mean if throtl_slice was
>> > configurable, that would allow controlling the size of the burst (just
>> > that it would be for all cgroups). If that works, that might be a
>> > simpler solution.
>> >
>> > Vivek
>>
>> The purpose of this configuration vs. increasing throtl_slice is the
>> behavior when the burst runs out. io/bytes allowance is given in
>> intervals of throtl_slice, so with a long throtl_slice, devices that
>> exceed the limit will see extended periods with no IO, rather than IO
>> at the throttled speed. With this configuration, once the burst has run
>> out, since the burst allowance is on top of the throttle, the device
>> can continue to be used more smoothly at the configured throttled speed.
>
> I thought that 

Re: [RFC PATCH] blk-throttle: add burst allowance.

2017-12-18 Thread Vivek Goyal
On Mon, Dec 18, 2017 at 12:39:50PM -0800, Khazhismel Kumykov wrote:
> On Mon, Dec 18, 2017 at 10:29 AM, Vivek Goyal wrote:
> > On Mon, Dec 18, 2017 at 10:16:02AM -0800, Khazhismel Kumykov wrote:
> >> On Mon, Nov 20, 2017 at 8:36 PM, Khazhismel Kumykov wrote:
> >> > On Fri, Nov 17, 2017 at 11:26 AM, Shaohua Li wrote:
> >> >> On Thu, Nov 16, 2017 at 08:25:58PM -0800, Khazhismel Kumykov wrote:
> >> >>> On Thu, Nov 16, 2017 at 8:50 AM, Shaohua Li wrote:
> >> >>> > On Tue, Nov 14, 2017 at 03:10:22PM -0800, Khazhismel Kumykov wrote:
> >> >>> >> Allows configuration of additional bytes or ios before a throttle
> >> >>> >> is triggered.
> >> >>> >>
> >> >>> >> This allows implementation of a bucket style rate-limit/throttle on a
> >> >>> >> block device. Previously, bursting to a device was limited to the
> >> >>> >> allowance granted in a single throtl_slice (similar to a bucket with
> >> >>> >> limit N and refill rate N/slice).
> >> >>> >>
> >> >>> >> Additional parameters bytes/io_burst_conf are defined for each tg,
> >> >>> >> which define a number of bytes/ios that must be depleted before
> >> >>> >> throttling happens. A tg that does not deplete this allowance
> >> >>> >> functions as though it has no configured limits. tgs earn additional
> >> >>> >> allowance at the rate defined by bps/iops for the tg. Once a tg has
> >> >>> >> *_disp > *_burst_conf, throttling kicks in. If a tg is idle for a
> >> >>> >> while, it will again have some burst allowance before it gets
> >> >>> >> throttled again.
> >> >>> >>
> >> >>> >> slice_end for a tg is extended until io_disp/byte_disp would fall to
> >> >>> >> 0, when all "used" burst allowance would be earned back. trim_slice
> >> >>> >> still progresses slice_start as before and decrements *_disp as
> >> >>> >> before, and tgs continue to get bytes/ios in throtl_slice intervals.
> >> >>> >
> >> >>> > Can you describe why we need this? It would be great if you can
> >> >>> > describe the usage model and an example. Does this work for
> >> >>> > io.low/io.max or both?
> >> >>> >
> >> >>> > Thanks,
> >> >>> > Shaohua
> >> >>> >
> >> >>>
> >> >>> The use case that brought this up was configuring limits for a remote
> >> >>> shared device. Bursting beyond io.max is desired, but only for so much
> >> >>> before the limit kicks in; afterwards, with sustained usage, throughput
> >> >>> is capped. (This proactively avoids remote-side limits.) In that case
> >> >>> one would configure io.max + io.burst in a root container, and
> >> >>> configure low/other limits on descendants sharing the resource on the
> >> >>> same node.
> >> >>>
> >> >>> With this patch, so long as a tg has not dispatched more than the
> >> >>> burst, no limit is applied at all by that tg, including the limit
> >> >>> imposed by io.low in tg_iops_limit, etc.
> >> >>
> >> >> I'd appreciate it if you could give more details about the 'why'.
> >> >> 'configuring limits for a remote shared device' doesn't justify the
> >> >> change.
> >> >
> >> > This is to configure a bursty workload (and associated device) with a
> >> > known/allowed expected burst size, but to not allow full utilization
> >> > of the device for extended periods of time, for QoS. During idle or
> >> > low-use periods the burst allowance accrues, and then tasks can burst
> >> > well beyond the configured throttle up to the limit, after which they
> >> > are throttled. A constant throttle speed isn't sufficient for this, as
> >> > you can only burst one slice's worth, but a limit of sorts is
> >> > desirable for preventing over-utilization of the shared device. This
> >> > type of limit is also slightly different from what I understand io.low
> >> > does in local cases, in that the tg is only high priority/unthrottled
> >> > if it is bursty, and is limited with constant usage.
> >> >
> >> > Khazhy
> >>
> >> Hi Shaohua,
> >>
> >> Does this clarify the reason for this patch? Is this (or something
> >> similar) a good fit for inclusion in blk-throttle?
> >>
> >
> > So does this burst have to be per cgroup? I mean if throtl_slice was
> > configurable, that would allow controlling the size of the burst (just
> > that it would be for all cgroups). If that works, that might be a
> > simpler solution.
> >
> > Vivek
> 
> The purpose of this configuration vs. increasing throtl_slice is the
> behavior when the burst runs out. io/bytes allowance is given in
> intervals of throtl_slice, so with a long throtl_slice, devices that
> exceed the limit will see extended periods with no IO, rather than IO
> at the throttled speed. With this configuration, once the burst has run
> out, since the burst allowance is on top of the throttle, the device
> can continue to be used more smoothly at the configured throttled speed.

I thought the whole idea of a burst is that there is some bursty IO which
will quickly finish. If the workload expects a steady-state IO rate, then
why allow a large burst to begin with?

So yes, increasing throtl 

Re: [RFC PATCH] blk-throttle: add burst allowance.

2017-12-18 Thread Khazhismel Kumykov
On Mon, Dec 18, 2017 at 10:29 AM, Vivek Goyal wrote:
> On Mon, Dec 18, 2017 at 10:16:02AM -0800, Khazhismel Kumykov wrote:
>> On Mon, Nov 20, 2017 at 8:36 PM, Khazhismel Kumykov wrote:
>> > On Fri, Nov 17, 2017 at 11:26 AM, Shaohua Li wrote:
>> >> On Thu, Nov 16, 2017 at 08:25:58PM -0800, Khazhismel Kumykov wrote:
>> >>> On Thu, Nov 16, 2017 at 8:50 AM, Shaohua Li wrote:
>> >>> > On Tue, Nov 14, 2017 at 03:10:22PM -0800, Khazhismel Kumykov wrote:
>> >>> >> Allows configuration of additional bytes or ios before a throttle
>> >>> >> is triggered.
>> >>> >>
>> >>> >> This allows implementation of a bucket style rate-limit/throttle on a
>> >>> >> block device. Previously, bursting to a device was limited to the
>> >>> >> allowance granted in a single throtl_slice (similar to a bucket with
>> >>> >> limit N and refill rate N/slice).
>> >>> >>
>> >>> >> Additional parameters bytes/io_burst_conf are defined for each tg,
>> >>> >> which define a number of bytes/ios that must be depleted before
>> >>> >> throttling happens. A tg that does not deplete this allowance
>> >>> >> functions as though it has no configured limits. tgs earn additional
>> >>> >> allowance at the rate defined by bps/iops for the tg. Once a tg has
>> >>> >> *_disp > *_burst_conf, throttling kicks in. If a tg is idle for a
>> >>> >> while, it will again have some burst allowance before it gets
>> >>> >> throttled again.
>> >>> >>
>> >>> >> slice_end for a tg is extended until io_disp/byte_disp would fall to
>> >>> >> 0, when all "used" burst allowance would be earned back. trim_slice
>> >>> >> still progresses slice_start as before and decrements *_disp as
>> >>> >> before, and tgs continue to get bytes/ios in throtl_slice intervals.
>> >>> >
>> >>> > Can you describe why we need this? It would be great if you can
>> >>> > describe the usage model and an example. Does this work for
>> >>> > io.low/io.max or both?
>> >>> >
>> >>> > Thanks,
>> >>> > Shaohua
>> >>> >
>> >>>
>> >>> The use case that brought this up was configuring limits for a remote
>> >>> shared device. Bursting beyond io.max is desired, but only for so much
>> >>> before the limit kicks in; afterwards, with sustained usage, throughput
>> >>> is capped. (This proactively avoids remote-side limits.) In that case
>> >>> one would configure io.max + io.burst in a root container, and
>> >>> configure low/other limits on descendants sharing the resource on the
>> >>> same node.
>> >>>
>> >>> With this patch, so long as a tg has not dispatched more than the
>> >>> burst, no limit is applied at all by that tg, including the limit
>> >>> imposed by io.low in tg_iops_limit, etc.
>> >>
>> >> I'd appreciate it if you could give more details about the 'why'.
>> >> 'configuring limits for a remote shared device' doesn't justify the
>> >> change.
>> >
>> > This is to configure a bursty workload (and associated device) with a
>> > known/allowed expected burst size, but to not allow full utilization
>> > of the device for extended periods of time, for QoS. During idle or
>> > low-use periods the burst allowance accrues, and then tasks can burst
>> > well beyond the configured throttle up to the limit, after which they
>> > are throttled. A constant throttle speed isn't sufficient for this, as
>> > you can only burst one slice's worth, but a limit of sorts is
>> > desirable for preventing over-utilization of the shared device. This
>> > type of limit is also slightly different from what I understand io.low
>> > does in local cases, in that the tg is only high priority/unthrottled
>> > if it is bursty, and is limited with constant usage.
>> >
>> > Khazhy
>>
>> Hi Shaohua,
>>
>> Does this clarify the reason for this patch? Is this (or something
>> similar) a good fit for inclusion in blk-throttle?
>>
>
> So does this burst have to be per cgroup? I mean if throtl_slice was
> configurable, that would allow controlling the size of the burst (just
> that it would be for all cgroups). If that works, that might be a
> simpler solution.
>
> Vivek

The purpose of this configuration vs. increasing throtl_slice is the
behavior when the burst runs out. io/bytes allowance is given in
intervals of throtl_slice, so with a long throtl_slice, devices that
exceed the limit will see extended periods with no IO, rather than IO
at the throttled speed. With this configuration, once the burst has run
out, since the burst allowance is on top of the throttle, the device
can continue to be used more smoothly at the configured throttled speed.
For this we do want a throttle group with both the "steady state" rate
+ the burst amount, and we get cgroup support with that.

I notice that with cgroup v2 io, it no longer seems possible to configure
a device-wide throttle group, e.g. on the root cgroup (and putting
restrictions on the root cgroup isn't an option). For something like
this, it does make sense to want to configure just for the device vs.
per cgroup; perhaps there is somewhere better it would fit than as a
cgroup option? Perhaps have configuration on the device node for a throttle 

Re: [RFC PATCH] blk-throttle: add burst allowance.

2017-12-18 Thread Vivek Goyal
On Mon, Dec 18, 2017 at 10:16:02AM -0800, Khazhismel Kumykov wrote:
> On Mon, Nov 20, 2017 at 8:36 PM, Khazhismel Kumykov wrote:
> > On Fri, Nov 17, 2017 at 11:26 AM, Shaohua Li wrote:
> >> On Thu, Nov 16, 2017 at 08:25:58PM -0800, Khazhismel Kumykov wrote:
> >>> On Thu, Nov 16, 2017 at 8:50 AM, Shaohua Li wrote:
> >>> > On Tue, Nov 14, 2017 at 03:10:22PM -0800, Khazhismel Kumykov wrote:
> >>> >> Allows configuration of additional bytes or ios before a throttle
> >>> >> is triggered.
> >>> >>
> >>> >> This allows implementation of a bucket style rate-limit/throttle on a
> >>> >> block device. Previously, bursting to a device was limited to the
> >>> >> allowance granted in a single throtl_slice (similar to a bucket with
> >>> >> limit N and refill rate N/slice).
> >>> >>
> >>> >> Additional parameters bytes/io_burst_conf are defined for each tg,
> >>> >> which define a number of bytes/ios that must be depleted before
> >>> >> throttling happens. A tg that does not deplete this allowance
> >>> >> functions as though it has no configured limits. tgs earn additional
> >>> >> allowance at the rate defined by bps/iops for the tg. Once a tg has
> >>> >> *_disp > *_burst_conf, throttling kicks in. If a tg is idle for a
> >>> >> while, it will again have some burst allowance before it gets
> >>> >> throttled again.
> >>> >>
> >>> >> slice_end for a tg is extended until io_disp/byte_disp would fall to
> >>> >> 0, when all "used" burst allowance would be earned back. trim_slice
> >>> >> still progresses slice_start as before and decrements *_disp as
> >>> >> before, and tgs continue to get bytes/ios in throtl_slice intervals.
> >>> >
> > >>> > Can you describe why we need this? It would be great if you can describe the
> >>> > usage model and an example. Does this work for io.low/io.max or both?
> >>> >
> >>> > Thanks,
> >>> > Shaohua
> >>> >
> >>>
> >>> Use case that brought this up was configuring limits for a remote
> >>> shared device. Bursting beyond io.max is desired but only for so much
> >>> before the limit kicks in, afterwards with sustained usage throughput
> >>> is capped. (This proactively avoids remote-side limits). In that case
> >>> one would configure in a root container io.max + io.burst, and
> >>> configure low/other limits on descendants sharing the resource on the
> >>> same node.
> >>>
> >>> With this patch, so long as tg has not dispatched more than the burst,
> >>> no limit is applied at all by that tg, including limit imposed by
> >>> io.low in tg_iops_limit, etc.
> >>
> >> I'd appreciate if you can give more details about the 'why'. 'configuring
> >> limits for a remote shared device' doesn't justify the change.
> >
> > This is to configure a bursty workload (and associated device) with
> > known/allowed expected burst size, but to not allow full utilization
> > of the device for extended periods of time for QoS. During idle or low
> > use periods the burst allowance accrues, and then tasks can burst well
> > beyond the configured throttle up to the limit, afterwards is
> > throttled. A constant throttle speed isn't sufficient for this as you
> > can only burst 1 slice worth, but a limit of sorts is desirable for
> > preventing over utilization of the shared device. This type of limit
> > is also slightly different than what i understand io.low does in local
> > cases in that tg is only high priority/unthrottled if it is bursty,
> > and is limited with constant usage
> >
> > Khazhy
> 
> Hi Shaohua,
> 
> Does this clarify the reason for this patch? Is this (or something
> similar) a good fit for inclusion in blk-throttle?
> 

So does this burst have to be per cgroup? I mean, if throtl_slice were
configurable, that would allow controlling the size of the burst (just
that it would apply to all cgroups). If that works, it might be a
simpler solution.

Vivek


Re: [RFC PATCH] blk-throttle: add burst allowance.

2017-12-18 Thread Khazhismel Kumykov
On Mon, Nov 20, 2017 at 8:36 PM, Khazhismel Kumykov  wrote:
> On Fri, Nov 17, 2017 at 11:26 AM, Shaohua Li  wrote:
>> On Thu, Nov 16, 2017 at 08:25:58PM -0800, Khazhismel Kumykov wrote:
>>> On Thu, Nov 16, 2017 at 8:50 AM, Shaohua Li  wrote:
>>> > On Tue, Nov 14, 2017 at 03:10:22PM -0800, Khazhismel Kumykov wrote:
>>> >> Allows configuration additional bytes or ios before a throttle is
>>> >> triggered.
>>> >>
>>> >> This allows implementation of a bucket style rate-limit/throttle on a
>>> >> block device. Previously, bursting to a device was limited to allowance
>>> >> granted in a single throtl_slice (similar to a bucket with limit N and
>>> >> refill rate N/slice).
>>> >>
>>> >> Additional parameters bytes/io_burst_conf defined for tg, which define a
>>> >> number of bytes/ios that must be depleted before throttling happens. A
>>> >> tg that does not deplete this allowance functions as though it has no
>>> >> configured limits. tgs earn additional allowance at rate defined by
>>> >> bps/iops for the tg. Once a tg has *_disp > *_burst_conf, throttling
>>> >> kicks in. If a tg is idle for a while, it will again have some burst
>>> >> allowance before it gets throttled again.
>>> >>
>>> >> slice_end for a tg is extended until io_disp/byte_disp would fall to 0,
>>> >> when all "used" burst allowance would be earned back. trim_slice still
>>> >> does progress slice_start as before and decrements *_disp as before, and
>>> >> tgs continue to get bytes/ios in throtl_slice intervals.
>>> >
>>> > Can you describe why we need this? It would be great if you can describe the
>>> > usage model and an example. Does this work for io.low/io.max or both?
>>> >
>>> > Thanks,
>>> > Shaohua
>>> >
>>>
>>> Use case that brought this up was configuring limits for a remote
>>> shared device. Bursting beyond io.max is desired but only for so much
>>> before the limit kicks in, afterwards with sustained usage throughput
>>> is capped. (This proactively avoids remote-side limits). In that case
>>> one would configure in a root container io.max + io.burst, and
>>> configure low/other limits on descendants sharing the resource on the
>>> same node.
>>>
>>> With this patch, so long as tg has not dispatched more than the burst,
>>> no limit is applied at all by that tg, including limit imposed by
>>> io.low in tg_iops_limit, etc.
>>
>> I'd appreciate if you can give more details about the 'why'. 'configuring
>> limits for a remote shared device' doesn't justify the change.
>
> This is to configure a bursty workload (and associated device) with
> known/allowed expected burst size, but to not allow full utilization
> of the device for extended periods of time for QoS. During idle or low
> use periods the burst allowance accrues, and then tasks can burst well
> beyond the configured throttle up to the limit, afterwards is
> throttled. A constant throttle speed isn't sufficient for this as you
> can only burst 1 slice worth, but a limit of sorts is desirable for
> preventing over utilization of the shared device. This type of limit
> is also slightly different than what i understand io.low does in local
> cases in that tg is only high priority/unthrottled if it is bursty,
> and is limited with constant usage
>
> Khazhy

Hi Shaohua,

Does this clarify the reason for this patch? Is this (or something
similar) a good fit for inclusion in blk-throttle?

Thanks,
Khazhy


Re: [RFC PATCH] blk-throttle: add burst allowance.

2017-12-07 Thread Vivek Goyal
On Thu, Nov 16, 2017 at 08:50:33AM -0800, Shaohua Li wrote:

[..]
> Can you describe why we need this? It would be great if you can describe the
> usage model and an example. Does this work for io.low/io.max or both?

Hi Shaohua,

Is there any documentation for "io.low" somewhere now? Should we update
cgroup-v2.txt? Just reading through the BLK_DEV_THROTTLING_LOW description
does not give me enough of an idea of what it is.

Thanks
Vivek


Re: [RFC PATCH] blk-throttle: add burst allowance.

2017-12-06 Thread Khazhismel Kumykov
On Mon, Nov 20, 2017 at 8:36 PM, Khazhismel Kumykov  wrote:
> On Fri, Nov 17, 2017 at 11:26 AM, Shaohua Li  wrote:
>> On Thu, Nov 16, 2017 at 08:25:58PM -0800, Khazhismel Kumykov wrote:
>>> On Thu, Nov 16, 2017 at 8:50 AM, Shaohua Li  wrote:
>>> > On Tue, Nov 14, 2017 at 03:10:22PM -0800, Khazhismel Kumykov wrote:
>>> >> Allows configuration additional bytes or ios before a throttle is
>>> >> triggered.
>>> >>
>>> >> This allows implementation of a bucket style rate-limit/throttle on a
>>> >> block device. Previously, bursting to a device was limited to allowance
>>> >> granted in a single throtl_slice (similar to a bucket with limit N and
>>> >> refill rate N/slice).
>>> >>
>>> >> Additional parameters bytes/io_burst_conf defined for tg, which define a
>>> >> number of bytes/ios that must be depleted before throttling happens. A
>>> >> tg that does not deplete this allowance functions as though it has no
>>> >> configured limits. tgs earn additional allowance at rate defined by
>>> >> bps/iops for the tg. Once a tg has *_disp > *_burst_conf, throttling
>>> >> kicks in. If a tg is idle for a while, it will again have some burst
>>> >> allowance before it gets throttled again.
>>> >>
>>> >> slice_end for a tg is extended until io_disp/byte_disp would fall to 0,
>>> >> when all "used" burst allowance would be earned back. trim_slice still
>>> >> does progress slice_start as before and decrements *_disp as before, and
>>> >> tgs continue to get bytes/ios in throtl_slice intervals.
>>> >
>>> > Can you describe why we need this? It would be great if you can describe the
>>> > usage model and an example. Does this work for io.low/io.max or both?
>>> >
>>> > Thanks,
>>> > Shaohua
>>> >
>>>
>>> Use case that brought this up was configuring limits for a remote
>>> shared device. Bursting beyond io.max is desired but only for so much
>>> before the limit kicks in, afterwards with sustained usage throughput
>>> is capped. (This proactively avoids remote-side limits). In that case
>>> one would configure in a root container io.max + io.burst, and
>>> configure low/other limits on descendants sharing the resource on the
>>> same node.
>>>
>>> With this patch, so long as tg has not dispatched more than the burst,
>>> no limit is applied at all by that tg, including limit imposed by
>>> io.low in tg_iops_limit, etc.
>>
>> I'd appreciate if you can give more details about the 'why'. 'configuring
>> limits for a remote shared device' doesn't justify the change.
>
> This is to configure a bursty workload (and associated device) with
> known/allowed expected burst size, but to not allow full utilization
> of the device for extended periods of time for QoS. During idle or low
> use periods the burst allowance accrues, and then tasks can burst well
> beyond the configured throttle up to the limit, afterwards is
> throttled. A constant throttle speed isn't sufficient for this as you
> can only burst 1 slice worth, but a limit of sorts is desirable for
> preventing over utilization of the shared device. This type of limit
> is also slightly different than what i understand io.low does in local
> cases in that tg is only high priority/unthrottled if it is bursty,
> and is limited with constant usage
>
> Khazhy
Ping, this time without HTML (oops).


Re: [RFC PATCH] blk-throttle: add burst allowance.

2017-11-20 Thread Khazhismel Kumykov
On Fri, Nov 17, 2017 at 11:26 AM, Shaohua Li  wrote:
> On Thu, Nov 16, 2017 at 08:25:58PM -0800, Khazhismel Kumykov wrote:
>> On Thu, Nov 16, 2017 at 8:50 AM, Shaohua Li  wrote:
>> > On Tue, Nov 14, 2017 at 03:10:22PM -0800, Khazhismel Kumykov wrote:
>> >> Allows configuration additional bytes or ios before a throttle is
>> >> triggered.
>> >>
>> >> This allows implementation of a bucket style rate-limit/throttle on a
>> >> block device. Previously, bursting to a device was limited to allowance
>> >> granted in a single throtl_slice (similar to a bucket with limit N and
>> >> refill rate N/slice).
>> >>
>> >> Additional parameters bytes/io_burst_conf defined for tg, which define a
>> >> number of bytes/ios that must be depleted before throttling happens. A
>> >> tg that does not deplete this allowance functions as though it has no
>> >> configured limits. tgs earn additional allowance at rate defined by
>> >> bps/iops for the tg. Once a tg has *_disp > *_burst_conf, throttling
>> >> kicks in. If a tg is idle for a while, it will again have some burst
>> >> allowance before it gets throttled again.
>> >>
>> >> slice_end for a tg is extended until io_disp/byte_disp would fall to 0,
>> >> when all "used" burst allowance would be earned back. trim_slice still
>> >> does progress slice_start as before and decrements *_disp as before, and
>> >> tgs continue to get bytes/ios in throtl_slice intervals.
>> >
>> > Can you describe why we need this? It would be great if you can describe the
>> > usage model and an example. Does this work for io.low/io.max or both?
>> >
>> > Thanks,
>> > Shaohua
>> >
>>
>> Use case that brought this up was configuring limits for a remote
>> shared device. Bursting beyond io.max is desired but only for so much
>> before the limit kicks in, afterwards with sustained usage throughput
>> is capped. (This proactively avoids remote-side limits). In that case
>> one would configure in a root container io.max + io.burst, and
>> configure low/other limits on descendants sharing the resource on the
>> same node.
>>
>> With this patch, so long as tg has not dispatched more than the burst,
>> no limit is applied at all by that tg, including limit imposed by
>> io.low in tg_iops_limit, etc.
>
> I'd appreciate if you can give more details about the 'why'. 'configuring
> limits for a remote shared device' doesn't justify the change.

This is to configure a bursty workload (and associated device) with a
known/allowed expected burst size, while not allowing full utilization
of the device for extended periods of time, for QoS. During idle or
low-use periods the burst allowance accrues; tasks can then burst well
beyond the configured throttle, up to the limit, after which they are
throttled. A constant throttle speed isn't sufficient for this, as you
can only burst one slice's worth, but a limit of sorts is desirable to
prevent over-utilization of the shared device. This type of limit is
also slightly different from what I understand io.low does in local
cases, in that the tg is only high priority/unthrottled while it is
bursty, and is limited under constant usage.

Khazhy


Re: [RFC PATCH] blk-throttle: add burst allowance.

2017-11-17 Thread Shaohua Li
On Thu, Nov 16, 2017 at 08:25:58PM -0800, Khazhismel Kumykov wrote:
> On Thu, Nov 16, 2017 at 8:50 AM, Shaohua Li  wrote:
> > On Tue, Nov 14, 2017 at 03:10:22PM -0800, Khazhismel Kumykov wrote:
> >> Allows configuration additional bytes or ios before a throttle is
> >> triggered.
> >>
> >> This allows implementation of a bucket style rate-limit/throttle on a
> >> block device. Previously, bursting to a device was limited to allowance
> >> granted in a single throtl_slice (similar to a bucket with limit N and
> >> refill rate N/slice).
> >>
> >> Additional parameters bytes/io_burst_conf defined for tg, which define a
> >> number of bytes/ios that must be depleted before throttling happens. A
> >> tg that does not deplete this allowance functions as though it has no
> >> configured limits. tgs earn additional allowance at rate defined by
> >> bps/iops for the tg. Once a tg has *_disp > *_burst_conf, throttling
> >> kicks in. If a tg is idle for a while, it will again have some burst
> >> allowance before it gets throttled again.
> >>
> >> slice_end for a tg is extended until io_disp/byte_disp would fall to 0,
> >> when all "used" burst allowance would be earned back. trim_slice still
> >> does progress slice_start as before and decrements *_disp as before, and
> >> tgs continue to get bytes/ios in throtl_slice intervals.
> >
> > Can you describe why we need this? It would be great if you can describe the
> > usage model and an example. Does this work for io.low/io.max or both?
> >
> > Thanks,
> > Shaohua
> >
> 
> Use case that brought this up was configuring limits for a remote
> shared device. Bursting beyond io.max is desired but only for so much
> before the limit kicks in, afterwards with sustained usage throughput
> is capped. (This proactively avoids remote-side limits). In that case
> one would configure in a root container io.max + io.burst, and
> configure low/other limits on descendants sharing the resource on the
> same node.
> 
> With this patch, so long as tg has not dispatched more than the burst,
> no limit is applied at all by that tg, including limit imposed by
> io.low in tg_iops_limit, etc.

I'd appreciate if you can give more details about the 'why'. 'configuring
limits for a remote shared device' doesn't justify the change.


Re: [RFC PATCH] blk-throttle: add burst allowance.

2017-11-17 Thread Shaohua Li
On Thu, Nov 16, 2017 at 08:25:58PM -0800, Khazhismel Kumykov wrote:
> On Thu, Nov 16, 2017 at 8:50 AM, Shaohua Li  wrote:
> > On Tue, Nov 14, 2017 at 03:10:22PM -0800, Khazhismel Kumykov wrote:
> >> Allows configuration additional bytes or ios before a throttle is
> >> triggered.
> >>
> >> This allows implementation of a bucket style rate-limit/throttle on a
> >> block device. Previously, bursting to a device was limited to allowance
> >> granted in a single throtl_slice (similar to a bucket with limit N and
> >> refill rate N/slice).
> >>
> >> Additional parameters bytes/io_burst_conf defined for tg, which define a
> >> number of bytes/ios that must be depleted before throttling happens. A
> >> tg that does not deplete this allowance functions as though it has no
> >> configured limits. tgs earn additional allowance at rate defined by
> >> bps/iops for the tg. Once a tg has *_disp > *_burst_conf, throttling
> >> kicks in. If a tg is idle for a while, it will again have some burst
> >> allowance before it gets throttled again.
> >>
> >> slice_end for a tg is extended until io_disp/byte_disp would fall to 0,
> >> when all "used" burst allowance would be earned back. trim_slice still
> >> does progress slice_start as before and decrements *_disp as before, and
> >> tgs continue to get bytes/ios in throtl_slice intervals.
> >
> > Can you describe why we need this? It would be great if you can describe the
> > usage model and an example. Does this work for io.low/io.max or both?
> >
> > Thanks,
> > Shaohua
> >
> 
> The use case that brought this up was configuring limits for a remote
> shared device. Bursting beyond io.max is desired, but only up to a
> point; after that, with sustained usage, throughput is capped. (This
> proactively avoids remote-side limits.) In that case one would
> configure io.max + io.burst in a root container, and configure
> low/other limits on descendants sharing the resource on the same node.
>
> With this patch, as long as a tg has not dispatched more than the
> burst, no limit is applied at all by that tg, including the limit
> imposed by io.low in tg_iops_limit, etc.

I'd appreciate it if you could give more details about the 'why'.
'configuring limits for a remote shared device' doesn't justify the change.


Re: [RFC PATCH] blk-throttle: add burst allowance.

2017-11-16 Thread Khazhismel Kumykov
On Thu, Nov 16, 2017 at 8:50 AM, Shaohua Li  wrote:
> On Tue, Nov 14, 2017 at 03:10:22PM -0800, Khazhismel Kumykov wrote:
>> Allows configuring additional bytes or ios before a throttle is
>> triggered.
>>
>> This allows implementing a bucket-style rate-limit/throttle on a
>> block device. Previously, bursting to a device was limited to the
>> allowance granted in a single throtl_slice (similar to a bucket with
>> limit N and refill rate N/slice).
>>
>> Additional parameters, bytes/io_burst_conf, are defined for each tg;
>> they set the number of bytes/ios that must be depleted before
>> throttling happens. A tg that does not deplete this allowance behaves
>> as though it has no configured limits. tgs earn additional allowance
>> at the rate defined by bps/iops for the tg. Once a tg has
>> *_disp > *_burst_conf, throttling kicks in. If a tg is idle for a
>> while, it will again have some burst allowance before it gets
>> throttled again.
>>
>> slice_end for a tg is extended until io_disp/bytes_disp would fall to
>> 0, i.e. until all "used" burst allowance has been earned back.
>> trim_slice still advances slice_start and decrements *_disp as
>> before, and tgs continue to get bytes/ios in throtl_slice intervals.
>
> Can you describe why we need this? It would be great if you could describe the
> usage model and an example. Does this work for io.low/io.max or both?
>
> Thanks,
> Shaohua
>

The use case that brought this up was configuring limits for a remote
shared device. Bursting beyond io.max is desired, but only up to a
point; after that, with sustained usage, throughput is capped. (This
proactively avoids remote-side limits.) In that case one would
configure io.max + io.burst in a root container, and configure
low/other limits on descendants sharing the resource on the same node.

With this patch, as long as a tg has not dispatched more than the
burst, no limit is applied at all by that tg, including the limit
imposed by io.low in tg_iops_limit, etc.

Khazhy

>> Signed-off-by: Khazhismel Kumykov 
>> ---
>>  block/Kconfig        |  11 +++
>>  block/blk-throttle.c | 192 +++
>>  2 files changed, 189 insertions(+), 14 deletions(-)
>>
>> diff --git a/block/Kconfig b/block/Kconfig
>> index 28ec55752b68..fbd05b419f93 100644
>> --- a/block/Kconfig
>> +++ b/block/Kconfig
>> @@ -128,6 +128,17 @@ config BLK_DEV_THROTTLING_LOW
>>
>>   Note, this is an experimental interface and could be changed someday.
>>
>> +config BLK_DEV_THROTTLING_BURST
>> +bool "Block throttling .burst allowance interface"
>> +depends on BLK_DEV_THROTTLING
>> +default n
>> +---help---
>> +Add .burst allowance for block throttling. Burst allowance allows for
>> +additional unthrottled usage, while still limiting speed for sustained
>> +usage.
>> +
>> +If in doubt, say N.
>> +
>>  config BLK_CMDLINE_PARSER
>>   bool "Block device command line partition parser"
>>   default n
>> diff --git a/block/blk-throttle.c b/block/blk-throttle.c
>> index 96ad32623427..27c084312772 100644
>> --- a/block/blk-throttle.c
>> +++ b/block/blk-throttle.c
>> @@ -157,6 +157,11 @@ struct throtl_grp {
>>   /* Number of bio's dispatched in current slice */
>>   unsigned int io_disp[2];
>>
>> +#ifdef CONFIG_BLK_DEV_THROTTLING_BURST
>> + uint64_t bytes_burst_conf[2];
>> + unsigned int io_burst_conf[2];
>> +#endif
>> +
>>   unsigned long last_low_overflow_time[2];
>>
>>   uint64_t last_bytes_disp[2];
>> @@ -507,6 +512,12 @@ static struct blkg_policy_data *throtl_pd_alloc(gfp_t gfp, int node)
>>   tg->bps_conf[WRITE][LIMIT_MAX] = U64_MAX;
>>   tg->iops_conf[READ][LIMIT_MAX] = UINT_MAX;
>>   tg->iops_conf[WRITE][LIMIT_MAX] = UINT_MAX;
>> +#ifdef CONFIG_BLK_DEV_THROTTLING_BURST
>> + tg->bytes_burst_conf[READ] = 0;
>> + tg->bytes_burst_conf[WRITE] = 0;
>> + tg->io_burst_conf[READ] = 0;
>> + tg->io_burst_conf[WRITE] = 0;
>> +#endif
>>   /* LIMIT_LOW will have default value 0 */
>>
>>   tg->latency_target = DFL_LATENCY_TARGET;
>> @@ -800,6 +811,26 @@ static inline void throtl_start_new_slice(struct throtl_grp *tg, bool rw)
>>  tg->slice_end[rw], jiffies);
>>  }
>>
>> +/*
>> + * When current slice should end.
>> + *
>> + * With CONFIG_BLK_DEV_THROTTLING_BURST, we will wait longer than min_wait
>> + * for slice to recover used burst allowance. (*_disp -> 0). Setting slice_end
>> + * before this would result in tg receiving additional burst allowance.
>> + */
>> +static inline unsigned long throtl_slice_wait(struct throtl_grp *tg, bool rw,
>> + unsigned long min_wait)
>> +{
>> + unsigned long bytes_wait = 0, io_wait = 0;
>> +#ifdef CONFIG_BLK_DEV_THROTTLING_BURST
>> + if (tg->bytes_burst_conf[rw])
>> + bytes_wait = (tg->bytes_disp[rw] * HZ) / tg_bps_limit(tg, rw);
>> + if 

Re: [RFC PATCH] blk-throttle: add burst allowance.

2017-11-16 Thread Shaohua Li
On Tue, Nov 14, 2017 at 03:10:22PM -0800, Khazhismel Kumykov wrote:
> Allows configuring additional bytes or ios before a throttle is
> triggered.
>
> This allows implementing a bucket-style rate-limit/throttle on a
> block device. Previously, bursting to a device was limited to the
> allowance granted in a single throtl_slice (similar to a bucket with
> limit N and refill rate N/slice).
>
> Additional parameters, bytes/io_burst_conf, are defined for each tg;
> they set the number of bytes/ios that must be depleted before
> throttling happens. A tg that does not deplete this allowance behaves
> as though it has no configured limits. tgs earn additional allowance
> at the rate defined by bps/iops for the tg. Once a tg has
> *_disp > *_burst_conf, throttling kicks in. If a tg is idle for a
> while, it will again have some burst allowance before it gets
> throttled again.
>
> slice_end for a tg is extended until io_disp/bytes_disp would fall to
> 0, i.e. until all "used" burst allowance has been earned back.
> trim_slice still advances slice_start and decrements *_disp as
> before, and tgs continue to get bytes/ios in throtl_slice intervals.

Can you describe why we need this? It would be great if you could describe the
usage model and an example. Does this work for io.low/io.max or both?

Thanks,
Shaohua
 
> Signed-off-by: Khazhismel Kumykov 
> ---
>  block/Kconfig        |  11 +++
>  block/blk-throttle.c | 192 +++
>  2 files changed, 189 insertions(+), 14 deletions(-)
> 
> diff --git a/block/Kconfig b/block/Kconfig
> index 28ec55752b68..fbd05b419f93 100644
> --- a/block/Kconfig
> +++ b/block/Kconfig
> @@ -128,6 +128,17 @@ config BLK_DEV_THROTTLING_LOW
>  
>   Note, this is an experimental interface and could be changed someday.
>  
> +config BLK_DEV_THROTTLING_BURST
> +bool "Block throttling .burst allowance interface"
> +depends on BLK_DEV_THROTTLING
> +default n
> +---help---
> +Add .burst allowance for block throttling. Burst allowance allows for
> +additional unthrottled usage, while still limiting speed for sustained
> +usage.
> +
> +If in doubt, say N.
> +
>  config BLK_CMDLINE_PARSER
>   bool "Block device command line partition parser"
>   default n
> diff --git a/block/blk-throttle.c b/block/blk-throttle.c
> index 96ad32623427..27c084312772 100644
> --- a/block/blk-throttle.c
> +++ b/block/blk-throttle.c
> @@ -157,6 +157,11 @@ struct throtl_grp {
>   /* Number of bio's dispatched in current slice */
>   unsigned int io_disp[2];
>  
> +#ifdef CONFIG_BLK_DEV_THROTTLING_BURST
> + uint64_t bytes_burst_conf[2];
> + unsigned int io_burst_conf[2];
> +#endif
> +
>   unsigned long last_low_overflow_time[2];
>  
>   uint64_t last_bytes_disp[2];
> @@ -507,6 +512,12 @@ static struct blkg_policy_data *throtl_pd_alloc(gfp_t gfp, int node)
>   tg->bps_conf[WRITE][LIMIT_MAX] = U64_MAX;
>   tg->iops_conf[READ][LIMIT_MAX] = UINT_MAX;
>   tg->iops_conf[WRITE][LIMIT_MAX] = UINT_MAX;
> +#ifdef CONFIG_BLK_DEV_THROTTLING_BURST
> + tg->bytes_burst_conf[READ] = 0;
> + tg->bytes_burst_conf[WRITE] = 0;
> + tg->io_burst_conf[READ] = 0;
> + tg->io_burst_conf[WRITE] = 0;
> +#endif
>   /* LIMIT_LOW will have default value 0 */
>  
>   tg->latency_target = DFL_LATENCY_TARGET;
> @@ -800,6 +811,26 @@ static inline void throtl_start_new_slice(struct throtl_grp *tg, bool rw)
>  tg->slice_end[rw], jiffies);
>  }
>  
> +/*
> + * When current slice should end.
> + *
> + * With CONFIG_BLK_DEV_THROTTLING_BURST, we will wait longer than min_wait
> + * for slice to recover used burst allowance. (*_disp -> 0). Setting slice_end
> + * before this would result in tg receiving additional burst allowance.
> + */
> +static inline unsigned long throtl_slice_wait(struct throtl_grp *tg, bool rw,
> + unsigned long min_wait)
> +{
> + unsigned long bytes_wait = 0, io_wait = 0;
> +#ifdef CONFIG_BLK_DEV_THROTTLING_BURST
> + if (tg->bytes_burst_conf[rw])
> + bytes_wait = (tg->bytes_disp[rw] * HZ) / tg_bps_limit(tg, rw);
> + if (tg->io_burst_conf[rw])
> + io_wait = (tg->io_disp[rw] * HZ) / tg_iops_limit(tg, rw);
> +#endif
> + return max(min_wait, max(bytes_wait, io_wait));
> +}
> +
>  static inline void throtl_set_slice_end(struct throtl_grp *tg, bool rw,
>   unsigned long jiffy_end)
>  {
> @@ -849,7 +880,8 @@ static inline void throtl_trim_slice(struct throtl_grp *tg, bool rw)
>* is bad because it does not allow new slice to start.
>*/
>  
> - throtl_set_slice_end(tg, rw, jiffies + tg->td->throtl_slice);
> + throtl_set_slice_end(tg, rw,
> + jiffies + throtl_slice_wait(tg, rw, tg->td->throtl_slice));
>  
>   time_elapsed = jiffies - tg->slice_start[rw];
>  
> @@ -889,7 +921,7 @@ static bool 

Re: [RFC PATCH] blk-throttle: add burst allowance.

2017-11-14 Thread Khazhismel Kumykov
(Botched the to line, sorry)


[RFC PATCH] blk-throttle: add burst allowance.

2017-11-14 Thread Khazhismel Kumykov
Allows configuring additional bytes or ios before a throttle is
triggered.

This allows implementing a bucket-style rate-limit/throttle on a
block device. Previously, bursting to a device was limited to the
allowance granted in a single throtl_slice (similar to a bucket with
limit N and refill rate N/slice).

Additional parameters, bytes/io_burst_conf, are defined for each tg;
they set the number of bytes/ios that must be depleted before
throttling happens. A tg that does not deplete this allowance behaves
as though it has no configured limits. tgs earn additional allowance
at the rate defined by bps/iops for the tg. Once a tg has
*_disp > *_burst_conf, throttling kicks in. If a tg is idle for a
while, it will again have some burst allowance before it gets
throttled again.

slice_end for a tg is extended until io_disp/bytes_disp would fall to
0, i.e. until all "used" burst allowance has been earned back.
trim_slice still advances slice_start and decrements *_disp as
before, and tgs continue to get bytes/ios in throtl_slice intervals.

Signed-off-by: Khazhismel Kumykov 
---
 block/Kconfig        |  11 +++
 block/blk-throttle.c | 192 +++
 2 files changed, 189 insertions(+), 14 deletions(-)

diff --git a/block/Kconfig b/block/Kconfig
index 28ec55752b68..fbd05b419f93 100644
--- a/block/Kconfig
+++ b/block/Kconfig
@@ -128,6 +128,17 @@ config BLK_DEV_THROTTLING_LOW
 
Note, this is an experimental interface and could be changed someday.
 
+config BLK_DEV_THROTTLING_BURST
+bool "Block throttling .burst allowance interface"
+depends on BLK_DEV_THROTTLING
+default n
+---help---
+Add .burst allowance for block throttling. Burst allowance allows for
+additional unthrottled usage, while still limiting speed for sustained
+usage.
+
+If in doubt, say N.
+
 config BLK_CMDLINE_PARSER
bool "Block device command line partition parser"
default n
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 96ad32623427..27c084312772 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -157,6 +157,11 @@ struct throtl_grp {
/* Number of bio's dispatched in current slice */
unsigned int io_disp[2];
 
+#ifdef CONFIG_BLK_DEV_THROTTLING_BURST
+   uint64_t bytes_burst_conf[2];
+   unsigned int io_burst_conf[2];
+#endif
+
unsigned long last_low_overflow_time[2];
 
uint64_t last_bytes_disp[2];
@@ -507,6 +512,12 @@ static struct blkg_policy_data *throtl_pd_alloc(gfp_t gfp, int node)
tg->bps_conf[WRITE][LIMIT_MAX] = U64_MAX;
tg->iops_conf[READ][LIMIT_MAX] = UINT_MAX;
tg->iops_conf[WRITE][LIMIT_MAX] = UINT_MAX;
+#ifdef CONFIG_BLK_DEV_THROTTLING_BURST
+   tg->bytes_burst_conf[READ] = 0;
+   tg->bytes_burst_conf[WRITE] = 0;
+   tg->io_burst_conf[READ] = 0;
+   tg->io_burst_conf[WRITE] = 0;
+#endif
/* LIMIT_LOW will have default value 0 */
 
tg->latency_target = DFL_LATENCY_TARGET;
@@ -800,6 +811,26 @@ static inline void throtl_start_new_slice(struct throtl_grp *tg, bool rw)
   tg->slice_end[rw], jiffies);
 }
 
+/*
+ * When current slice should end.
+ *
+ * With CONFIG_BLK_DEV_THROTTLING_BURST, we will wait longer than min_wait
+ * for slice to recover used burst allowance. (*_disp -> 0). Setting slice_end
+ * before this would result in tg receiving additional burst allowance.
+ */
+static inline unsigned long throtl_slice_wait(struct throtl_grp *tg, bool rw,
+   unsigned long min_wait)
+{
+   unsigned long bytes_wait = 0, io_wait = 0;
+#ifdef CONFIG_BLK_DEV_THROTTLING_BURST
+   if (tg->bytes_burst_conf[rw])
+   bytes_wait = (tg->bytes_disp[rw] * HZ) / tg_bps_limit(tg, rw);
+   if (tg->io_burst_conf[rw])
+   io_wait = (tg->io_disp[rw] * HZ) / tg_iops_limit(tg, rw);
+#endif
+   return max(min_wait, max(bytes_wait, io_wait));
+}
+
 static inline void throtl_set_slice_end(struct throtl_grp *tg, bool rw,
unsigned long jiffy_end)
 {
@@ -849,7 +880,8 @@ static inline void throtl_trim_slice(struct throtl_grp *tg, bool rw)
 * is bad because it does not allow new slice to start.
 */
 
-   throtl_set_slice_end(tg, rw, jiffies + tg->td->throtl_slice);
+   throtl_set_slice_end(tg, rw,
+   jiffies + throtl_slice_wait(tg, rw, tg->td->throtl_slice));
 
time_elapsed = jiffies - tg->slice_start[rw];
 
@@ -889,7 +921,7 @@ static bool tg_with_in_iops_limit(struct throtl_grp *tg, struct bio *bio,
  unsigned long *wait)
 {
bool rw = bio_data_dir(bio);
-   unsigned int io_allowed;
+   unsigned int io_allowed, io_disp;
unsigned long jiffy_elapsed, jiffy_wait, jiffy_elapsed_rnd;
u64 tmp;
 
@@ -908,6 +940,17 @@ static bool tg_with_in_iops_limit(struct throtl_grp *tg, struct bio *bio,
 

[RFC PATCH] blk-throttle: add burst allowance

2017-10-26 Thread Khazhismel Kumykov
Allows configuring additional bytes or ios before a throttle is
triggered. Slice end is extended to cover the time needed to recover
the expended allowance.

Usage would be e.g. per device to allow users to take up to X bytes/ios
at full speed, but be limited to Y bps/iops with sustained usage.
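As a rough worked example of what X and Y mean in practice, ignore slice granularity and assume the device sustains D > Y while unthrottled. The used allowance then grows at D - Y during a burst and drains at Y while idle. This is the same disp/limit ratio that throtl_slice_wait() uses in the diff below. These are back-of-the-envelope estimates, not figures from the patch.

```latex
t_{\text{burst}} \approx \frac{X}{D - Y}, \qquad
t_{\text{recover}} \approx \frac{X}{Y}
```

Take X = 1 GiB, D = 500 MiB/s and Y = 100 MiB/s. This gives t_burst ≈ 1024/400 = 2.56 s of full-speed I/O. It also gives t_recover ≈ 1024/100 = 10.24 s of idle time before the full burst is available again.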

Signed-off-by: Khazhismel Kumykov 
---
 block/Kconfig        |  11 +++
 block/blk-throttle.c | 185 ---
 2 files changed, 186 insertions(+), 10 deletions(-)

diff --git a/block/Kconfig b/block/Kconfig
index 3ab42bbb06d5..16545caa7fc9 100644
--- a/block/Kconfig
+++ b/block/Kconfig
@@ -127,6 +127,17 @@ config BLK_DEV_THROTTLING_LOW
 
Note, this is an experimental interface and could be changed someday.
 
+config BLK_DEV_THROTTLING_BURST
+bool "Block throttling .burst allowance interface"
+depends on BLK_DEV_THROTTLING
+default n
+---help---
+Add .burst allowance for block throttling. Burst allowance allows for
+additional unthrottled usage, while still limiting speed for sustained
+usage.
+
+If in doubt, say N.
+
 config BLK_CMDLINE_PARSER
bool "Block device command line partition parser"
default n
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index d80c3f0144c5..e09ec11e9c5f 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -156,6 +156,11 @@ struct throtl_grp {
/* Number of bio's dispatched in current slice */
unsigned int io_disp[2];
 
+#ifdef CONFIG_BLK_DEV_THROTTLING_BURST
+   uint64_t bytes_burst_conf[2];
+   unsigned int io_burst_conf[2];
+#endif
+
unsigned long last_low_overflow_time[2];
 
uint64_t last_bytes_disp[2];
@@ -506,6 +511,12 @@ static struct blkg_policy_data *throtl_pd_alloc(gfp_t gfp, int node)
tg->bps_conf[WRITE][LIMIT_MAX] = U64_MAX;
tg->iops_conf[READ][LIMIT_MAX] = UINT_MAX;
tg->iops_conf[WRITE][LIMIT_MAX] = UINT_MAX;
+#ifdef CONFIG_BLK_DEV_THROTTLING_BURST
+   tg->bytes_burst_conf[READ] = 0;
+   tg->bytes_burst_conf[WRITE] = 0;
+   tg->io_burst_conf[READ] = 0;
+   tg->io_burst_conf[WRITE] = 0;
+#endif
/* LIMIT_LOW will have default value 0 */
 
tg->latency_target = DFL_LATENCY_TARGET;
@@ -799,6 +810,26 @@ static inline void throtl_start_new_slice(struct throtl_grp *tg, bool rw)
   tg->slice_end[rw], jiffies);
 }
 
+/*
+ * When current slice should end.
+ *
+ * With CONFIG_BLK_DEV_THROTTLING_BURST, we will wait longer than min_wait
+ * for slice to recover used burst allowance. (*_disp -> 0). Setting slice_end
+ * before this would result in tg receiving additional burst allowance.
+ */
+static inline unsigned long throtl_slice_wait(struct throtl_grp *tg, bool rw,
+   unsigned long min_wait)
+{
+   unsigned long bytes_wait = 0, io_wait = 0;
+#ifdef CONFIG_BLK_DEV_THROTTLING_BURST
+   if (tg->bytes_burst_conf[rw])
+   bytes_wait = (tg->bytes_disp[rw] * HZ) / tg_bps_limit(tg, rw);
+   if (tg->io_burst_conf[rw])
+   io_wait = (tg->io_disp[rw] * HZ) / tg_iops_limit(tg, rw);
+#endif
+   return jiffies + max(min_wait, max(bytes_wait, io_wait));
+}
+
 static inline void throtl_set_slice_end(struct throtl_grp *tg, bool rw,
unsigned long jiffy_end)
 {
@@ -848,7 +879,8 @@ static inline void throtl_trim_slice(struct throtl_grp *tg, bool rw)
 * is bad because it does not allow new slice to start.
 */
 
-   throtl_set_slice_end(tg, rw, jiffies + tg->td->throtl_slice);
+   throtl_set_slice_end(tg, rw,
+           throtl_slice_wait(tg, rw, tg->td->throtl_slice));
 
time_elapsed = jiffies - tg->slice_start[rw];
 
@@ -888,7 +920,7 @@ static bool tg_with_in_iops_limit(struct throtl_grp *tg, struct bio *bio,
  unsigned long *wait)
 {
bool rw = bio_data_dir(bio);
-   unsigned int io_allowed;
+   unsigned int io_allowed, io_disp;
unsigned long jiffy_elapsed, jiffy_wait, jiffy_elapsed_rnd;
u64 tmp;
 
@@ -907,6 +939,17 @@ static bool tg_with_in_iops_limit(struct throtl_grp *tg, struct bio *bio,
 * have been trimmed.
 */
 
+   io_disp = tg->io_disp[rw];
+
+#ifdef CONFIG_BLK_DEV_THROTTLING_BURST
+   if (tg->io_disp[rw] < tg->io_burst_conf[rw]) {
+   if (wait)
+   *wait = 0;
+   return true;
+   }
+   io_disp -= tg->io_burst_conf[rw];
+#endif
+
tmp = (u64)tg_iops_limit(tg, rw) * jiffy_elapsed_rnd;
do_div(tmp, HZ);
 
@@ -915,14 +958,14 @@ static bool tg_with_in_iops_limit(struct throtl_grp *tg, struct bio *bio,
else
io_allowed = tmp;
 
-   if (tg->io_disp[rw] + 1 <= io_allowed) {
+   if (io_disp + 1 <= io_allowed) {
if (wait)
*wait = 0;
return true;
}
 
  

[RFC PATCH] blk-throttle: add burst allowance

2017-10-26 Thread Khazhismel Kumykov
Allows configuration of additional bytes or IOs before a throttle is
triggered. Slice end is extended to cover the time needed to recover
the expended allowance.

For example, this could be configured per device to let users consume up
to X bytes/IOs at full speed, while limiting them to Y bps/iops under
sustained usage.

Signed-off-by: Khazhismel Kumykov 
---
 block/Kconfig|  11 +++
 block/blk-throttle.c | 185 ---
 2 files changed, 186 insertions(+), 10 deletions(-)

diff --git a/block/Kconfig b/block/Kconfig
index 3ab42bbb06d5..16545caa7fc9 100644
--- a/block/Kconfig
+++ b/block/Kconfig
@@ -127,6 +127,17 @@ config BLK_DEV_THROTTLING_LOW
 
Note, this is an experimental interface and could be changed someday.
 
+config BLK_DEV_THROTTLING_BURST
+bool "Block throttling .burst allowance interface"
+depends on BLK_DEV_THROTTLING
+default n
+---help---
+Add .burst allowance for block throttling. Burst allowance allows for
+additional unthrottled usage, while still limiting speed for sustained
+usage.
+
+If in doubt, say N.
+
 config BLK_CMDLINE_PARSER
bool "Block device command line partition parser"
default n
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index d80c3f0144c5..e09ec11e9c5f 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -156,6 +156,11 @@ struct throtl_grp {
/* Number of bio's dispatched in current slice */
unsigned int io_disp[2];
 
+#ifdef CONFIG_BLK_DEV_THROTTLING_BURST
+   uint64_t bytes_burst_conf[2];
+   unsigned int io_burst_conf[2];
+#endif
+
unsigned long last_low_overflow_time[2];
 
uint64_t last_bytes_disp[2];
@@ -506,6 +511,12 @@ static struct blkg_policy_data *throtl_pd_alloc(gfp_t gfp, int node)
tg->bps_conf[WRITE][LIMIT_MAX] = U64_MAX;
tg->iops_conf[READ][LIMIT_MAX] = UINT_MAX;
tg->iops_conf[WRITE][LIMIT_MAX] = UINT_MAX;
+#ifdef CONFIG_BLK_DEV_THROTTLING_BURST
+   tg->bytes_burst_conf[READ] = 0;
+   tg->bytes_burst_conf[WRITE] = 0;
+   tg->io_burst_conf[READ] = 0;
+   tg->io_burst_conf[WRITE] = 0;
+#endif
/* LIMIT_LOW will have default value 0 */
 
tg->latency_target = DFL_LATENCY_TARGET;
@@ -799,6 +810,26 @@ static inline void throtl_start_new_slice(struct throtl_grp *tg, bool rw)
   tg->slice_end[rw], jiffies);
 }
 
+/*
+ * When current slice should end.
+ *
+ * With CONFIG_BLK_DEV_THROTTLING_BURST, we will wait longer than min_wait
+ * for slice to recover used burst allowance. (*_disp -> 0). Setting slice_end
+ * before this would result in tg receiving additional burst allowance.
+ */
+static inline unsigned long throtl_slice_wait(struct throtl_grp *tg, bool rw,
+   unsigned long min_wait)
+{
+   unsigned long bytes_wait = 0, io_wait = 0;
+#ifdef CONFIG_BLK_DEV_THROTTLING_BURST
+   if (tg->bytes_burst_conf[rw])
+   bytes_wait = (tg->bytes_disp[rw] * HZ) / tg_bps_limit(tg, rw);
+   if (tg->io_burst_conf[rw])
+   io_wait = (tg->io_disp[rw] * HZ) / tg_iops_limit(tg, rw);
+#endif
+   return jiffies + max(min_wait, max(bytes_wait, io_wait));
+}
+
 static inline void throtl_set_slice_end(struct throtl_grp *tg, bool rw,
unsigned long jiffy_end)
 {
@@ -848,7 +879,8 @@ static inline void throtl_trim_slice(struct throtl_grp *tg, bool rw)
 * is bad because it does not allow new slice to start.
 */
 
-   throtl_set_slice_end(tg, rw, jiffies + tg->td->throtl_slice);
+   throtl_set_slice_end(tg, rw,
+           throtl_slice_wait(tg, rw, tg->td->throtl_slice));
 
time_elapsed = jiffies - tg->slice_start[rw];
 
@@ -888,7 +920,7 @@ static bool tg_with_in_iops_limit(struct throtl_grp *tg, struct bio *bio,
  unsigned long *wait)
 {
bool rw = bio_data_dir(bio);
-   unsigned int io_allowed;
+   unsigned int io_allowed, io_disp;
unsigned long jiffy_elapsed, jiffy_wait, jiffy_elapsed_rnd;
u64 tmp;
 
@@ -907,6 +939,17 @@ static bool tg_with_in_iops_limit(struct throtl_grp *tg, struct bio *bio,
 * have been trimmed.
 */
 
+   io_disp = tg->io_disp[rw];
+
+#ifdef CONFIG_BLK_DEV_THROTTLING_BURST
+   if (tg->io_disp[rw] < tg->io_burst_conf[rw]) {
+   if (wait)
+   *wait = 0;
+   return true;
+   }
+   io_disp -= tg->io_burst_conf[rw];
+#endif
+
tmp = (u64)tg_iops_limit(tg, rw) * jiffy_elapsed_rnd;
do_div(tmp, HZ);
 
@@ -915,14 +958,14 @@ static bool tg_with_in_iops_limit(struct throtl_grp *tg, struct bio *bio,
else
io_allowed = tmp;
 
-   if (tg->io_disp[rw] + 1 <= io_allowed) {
+   if (io_disp + 1 <= io_allowed) {
if (wait)
*wait = 0;
return true;
}
 
/* Calc approx