[PATCH 0/4] blk-throttle: fix a few issues

2017-05-17 Thread Shaohua Li
Hi, This patchset fixes a few issues with the io.low limit. The patch titles explain the issues pretty well. Thanks, Shaohua Shaohua Li (4): blk-throttle: add hierarchy support for latency target and idle time; blk-throttle: output some debug info in trace; blk-throttle: respect 0 bps/iops

[PATCH 2/4] blk-throttle: output some debug info in trace

2017-05-17 Thread Shaohua Li
This info is important for understanding what's happening and helps with debugging. Signed-off-by: Shaohua Li --- block/blk-throttle.c | 15 ++- 1 file changed, 14 insertions(+), 1 deletion(-) diff --git a/block/blk-throttle.c b/block/blk-throttle.c index 16174f8..1f8d62f 100644 --- a/

[PATCH 1/4] blk-throttle: add hierarchy support for latency target and idle time

2017-05-17 Thread Shaohua Li
grade. Parent nodes don't need to track their IO latency/idle time. Signed-off-by: Shaohua Li --- block/blk-throttle.c | 50 -- 1 file changed, 36 insertions(+), 14 deletions(-) diff --git a/block/blk-throttle.c b/block/blk-throttle.c index b78d

[PATCH 4/4] blk-throttle: force user to configure all settings for io.low

2017-05-17 Thread Shaohua Li
limit doesn't take effect. With this strategy, the default latency/idle settings aren't important, so just set them to very conservative and safe values. Signed-off-by: Shaohua Li --- block/blk-throttle.c | 80 1 file changed, 37 in

[PATCH 3/4] blk-throttle: respect 0 bps/iops settings for io.low

2017-05-17 Thread Shaohua Li
if it uses the default setting. To avoid a complete stall, we give such a cgroup tiny IO resources. Signed-off-by: Shaohua Li --- block/blk-throttle.c | 41 + 1 file changed, 29 insertions(+), 12 deletions(-) diff --git a/block/blk-throttle.c b/block/blk-thrott
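
A minimal C sketch of the "tiny IO resources" idea, assuming the simplest possible policy: an io.low value of 0 is mapped to a small floor instead of a hard zero. The names low_limit_or_floor, FLOOR_BPS and FLOOR_IOPS, and their values, are hypothetical and not taken from blk-throttle.

```c
#include <stdint.h>

/* Hypothetical floors, chosen only for illustration. */
#define FLOOR_BPS  (320ULL * 1024) /* 320 KiB/s */
#define FLOOR_IOPS 10ULL

/* Honour an io.low value of 0 by giving the cgroup almost nothing, but not
 * literally nothing: a hard 0 would stall its IO completely. */
static uint64_t low_limit_or_floor(uint64_t configured, uint64_t floor_val)
{
	return configured == 0 ? floor_val : configured;
}
```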

[PATCH] blktrace: fix integer parse

2017-05-19 Thread Shaohua Li
sscanf is a very poor way to parse integers. For example, if I input "discard" for act_mask, it gets 0xd and completely messes up. Use the correct API to parse integers. This patch also makes the attributes accept integers in any base. Signed-off-by: Shaohua Li --- kernel/trace/blktrace.c |
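
A userspace illustration of the act_mask problem, as a sketch: sscanf() with "%llx" stops at the first non-hex character, so "discard" is silently read as 0xd. A strict parse has to reject anything left over. Inside the kernel the kstrto*() helpers (for example kstrtoull()) serve this role; the sketch uses strtoull() with base 0 as a userspace stand-in, and the helper names are hypothetical.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Lenient parse: "%llx" consumes the leading 'd' of "discard" as hex and
 * ignores the rest, so the call "succeeds" with 0xd. */
static int parse_with_sscanf(const char *s, unsigned long long *out)
{
	return sscanf(s, "%llx", out) == 1 ? 0 : -EINVAL;
}

/* Strict parse: base 0 accepts decimal, 0x-prefixed hex and 0-prefixed
 * octal, and the whole string (apart from one trailing newline, which
 * sysfs writes usually carry) must be consumed. */
static int parse_strict(const char *s, unsigned long long *out)
{
	char *end;

	errno = 0;
	*out = strtoull(s, &end, 0);
	if (end == s || errno == ERANGE)
		return -EINVAL; /* no digits at all, or overflow */
	if (*end == '\n')
		end++;
	return *end == '\0' ? 0 : -EINVAL;
}
```

Base 0 is what lets the same attribute accept decimal, hex and octal input.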

[PATCH 06/11] cgroup: export fhandle info for a cgroup

2017-06-02 Thread Shaohua Li
From: Shaohua Li Add an API to export cgroup fhandle info. We don't export a full 'struct file_handle' because it contains unneeded info. Specifically, a cgroup is always a directory, so we don't need a 'FILEID_INO32_GEN_PARENT' type fhandle; we only need to export the inod

[PATCH 07/11] blktrace: export cgroup info in trace

2017-06-02 Thread Shaohua Li
From: Shaohua Li Currently blktrace isn't cgroup aware. blktrace prints the task name of the current context, but that task isn't always in the cgroup the BIO comes from, so we can't use the task name to find the IO cgroup. For example, writeback BIOs always com

[PATCH 04/11] kernfs: don't set dentry->d_fsdata

2017-06-02 Thread Shaohua Li
From: Shaohua Li When working on adding exportfs operations in kernfs, I found it's hard to initialize dentry->d_fsdata in the exportfs operations. It looks like there is no way to do it without a race condition. Looking at the kernfs code closely, there is no point in setting dentry->d_fsdata. inode

[PATCH 05/11] kernfs: add exportfs operations

2017-06-02 Thread Shaohua Li
From: Shaohua Li Now we have the facilities to implement exportfs operations. The idea is that cgroup can export the fhandle info to userspace, and userspace then uses the fhandle to find the cgroup name. Another example is that userspace can get the fhandle for a cgroup and BPF uses the fhandle to filter info for

[PATCH 11/11] block: use standard blktrace API to output cgroup info for debug notes

2017-06-02 Thread Shaohua Li
From: Shaohua Li Currently cfq/bfq/blk-throttle each output cgroup info in traces in their own way. Now that we have a standard blktrace API for this, convert them to use it. Note that this changes the behavior a little: cgroup info isn't output by default; we only do this with 'blk_cgro

[PATCH 10/11] blktrace: add an option to allow displying cgroup path

2017-06-02 Thread Shaohua Li
From: Shaohua Li By default we output the cgroup id in blktrace. This adds an option to display the cgroup path. Since getting the cgroup path is a relatively heavy operation, we don't enable it by default. With the option enabled, blktrace will output something like this: dd-1353 [007] d..2 293.0

[PATCH 08/11] block: always attach cgroup info into bio

2017-06-02 Thread Shaohua Li
From: Shaohua Li blkcg_bio_issue_check() already gets the blkcg for a BIO. bio_associate_blkcg() uses a percpu refcounter, so it's a very cheap operation. There is no reason not to attach the cgroup info to the bio in blkcg_bio_issue_check(). This also makes blktrace output correct c

[PATCH 02/11] kernfs: use idr instead of ida to manage inode number

2017-06-02 Thread Shaohua Li
From: Shaohua Li kernfs uses an ida to manage inode numbers. The problem is that we can't get a kernfs_node from an inode number with an ida. Switch to an idr; the next patch will add an API to get a kernfs_node from an inode number. Signed-off-by: Shaohua Li --- fs/kernfs/dir.c
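
To show why an ida is not enough, here is a hypothetical fixed-size id-to-pointer map in C. An ida only records which numbers are taken; an idr (whose kernel API includes idr_alloc(), idr_find() and idr_remove()) also stores the object behind each number, which is what a lookup from inode number back to kernfs_node needs. The id_map names and MAP_SIZE are illustrative only.

```c
#include <stddef.h>

#define MAP_SIZE 64 /* hypothetical fixed capacity for the sketch */

/* Minimal id -> pointer map in the spirit of an idr: every allocated id
 * remembers the (non-NULL) object it was handed out for. */
struct id_map {
	void *slot[MAP_SIZE];
};

/* Return the lowest free id >= 1 (0 is reserved here), or -1 if full. */
static int id_map_alloc(struct id_map *m, void *obj)
{
	for (int id = 1; id < MAP_SIZE; id++) {
		if (!m->slot[id]) {
			m->slot[id] = obj;
			return id;
		}
	}
	return -1;
}

/* The reverse lookup that an ida cannot provide. */
static void *id_map_find(const struct id_map *m, int id)
{
	return (id > 0 && id < MAP_SIZE) ? m->slot[id] : NULL;
}

static void id_map_remove(struct id_map *m, int id)
{
	if (id > 0 && id < MAP_SIZE)
		m->slot[id] = NULL;
}
```

In this sketch a freed number is handed out again right away, since allocation always takes the lowest free id.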

[PATCH 09/11] block: call __bio_free in bio_endio

2017-06-02 Thread Shaohua Li
From: Shaohua Li bio_free isn't a good place to free cgroup/integrity info. There are a lot of cases where a bio is allocated in a special way (for example, on the stack) and never goes through bio_put, hence never reaches bio_free, so we leak memory. This patch moves the free to bio endio, which should be c
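
The idea can be sketched as releasing attached data at completion and clearing the pointer, so a later free path becomes a no-op and an object that never sees a final put does not leak. struct req and its helpers are hypothetical stand-ins, not the block layer's bio API.

```c
#include <stdlib.h>

/* Hypothetical request object; it may live on the stack, so a final
 * "put" (and any cleanup hooked onto it) may never run. */
struct req {
	void *cgroup_info; /* heap data attached at submit time */
};

/* Release attached data. Clearing the pointer makes a second call, e.g.
 * from a final put if there is one, a harmless no-op. */
static void req_release_extras(struct req *r)
{
	free(r->cgroup_info);
	r->cgroup_info = NULL;
}

/* Completion runs for every request, however it was allocated, so it is
 * a place where the cleanup is guaranteed to happen. */
static void req_endio(struct req *r)
{
	req_release_extras(r);
	/* ... notify the submitter here ... */
}
```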

[PATCH 03/11] kernfs: add an API to get kernfs node from inode number

2017-06-02 Thread Shaohua Li
From: Shaohua Li Add an API to get a kernfs node from an inode number. We will need this to implement exportfs operations. To make the API lock-free, kernfs nodes are freed in RCU context, and we depend on the kernfs_node count/ino number to filter out stale kernfs nodes. Signed-off-by: Shaohua Li --- fs
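
A single-threaded sketch of the stale-node filter described above, assuming a node is live while its reference count is nonzero and that it records the inode number it was registered under. In the kernel such a lookup would run under rcu_read_lock() and take the reference with an atomic increment-if-not-zero; both are omitted here, and struct node, node_table and node_get_by_ino are hypothetical.

```c
#include <stddef.h>

#define NR_SLOTS 64 /* hypothetical table size */

/* Hypothetical stand-in for a kernfs node: 'count' is 0 once the node is
 * being released, 'ino' is the number it was registered under. */
struct node {
	int count;
	unsigned int ino;
};

struct node_table {
	struct node *slot[NR_SLOTS];
};

/* Look a node up by inode number and take a reference only if it is
 * still live and still owns that number. */
static struct node *node_get_by_ino(struct node_table *t, unsigned int ino)
{
	struct node *n;

	if (ino >= NR_SLOTS)
		return NULL;
	n = t->slot[ino];
	if (!n || n->count == 0 || n->ino != ino)
		return NULL; /* missing, dying, or stale */
	n->count++;
	return n;
}
```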

[PATCH 01/11] kernfs: implement i_generation

2017-06-02 Thread Shaohua Li
From: Shaohua Li Set i_generation for kernfs inodes. This is required to implement exportfs operations. Note that the generation is 32-bit, so it's possible for the generation to wrap around and for us to find stale files. The possibility is low, since an fhandle matches both the inode number and the generation. In most fs
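
The matching rule can be sketched as follows: a handle resolves only if both the inode number and the generation match, so a handle taken before an inode number was recycled is rejected. The struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical minimal file handle: the two fields a lookup must match. */
struct handle {
	uint32_t ino;
	uint32_t generation;
};

/* Hypothetical current inode state: a number can be reused, but each
 * reuse is given a different generation. */
struct inode_state {
	uint32_t ino;
	uint32_t generation;
};

/* Stale handles (same ino, older generation) do not match. */
static bool handle_matches(const struct handle *h, const struct inode_state *i)
{
	return h->ino == i->ino && h->generation == i->generation;
}
```

Because the generation is a uint32_t, 2^32 reuses of the same number bring an old generation back; that is the wrap-around case the patch calls unlikely.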

[PATCH 00/11]blktrace: output cgroup info

2017-06-02 Thread Shaohua Li
From: Shaohua Li Hi, Currently blktrace isn't cgroup aware. blktrace prints the task name of the current context, but that task isn't always in the cgroup the BIO comes from, so we can't use the task name to find the IO cgroup. For example, writeback BIOs always com

Re: [PATCH 03/11] kernfs: add an API to get kernfs node from inode number

2017-06-02 Thread Shaohua Li
On Fri, Jun 02, 2017 at 03:03:45PM -0700, Eduardo Valentin wrote: > On Fri, Jun 02, 2017 at 02:53:56PM -0700, Shaohua Li wrote: > > From: Shaohua Li > > > > Add an API to get kernfs node from inode number. We will need this to > > implement exportfs operations. >

[PATCH] blk-throttle: set default latency baseline for harddisk

2017-06-06 Thread Shaohua Li
From: Shaohua Li Hard disk IO latency varies a lot depending on spindle movement. The latency can range from several microseconds to several milliseconds, so it's pretty hard to get the baseline latency used by io.low. We will use a different strategy here. The idea is only using IO with sp

Re: [PATCH] blk-throttle: set default latency baseline for harddisk

2017-06-06 Thread Shaohua Li
On Tue, Jun 06, 2017 at 03:12:12PM -0600, Jens Axboe wrote: > On 06/06/2017 01:40 PM, Shaohua Li wrote: > > From: Shaohua Li > > > > hard disk IO latency varies a lot depending on spindle move. The latency > > range could be from several microseconds to several millise

Re: [PATCH] blk-throttle: fix NULL pointer dereference in throtl_schedule_pending_timer

2017-06-06 Thread Shaohua Li
hrottle: make throtl_slice tunable") > Signed-off-by: Joseph Qi Thanks! Reviewed-by: Shaohua Li > --- > block/blk-throttle.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/block/blk-throttle.c b/block/blk-throttle.c > index fc1

[PATCH V2 08/12] blktrace: export cgroup info in trace

2017-06-14 Thread Shaohua Li
From: Shaohua Li Currently blktrace isn't cgroup aware. blktrace prints the task name of the current context, but that task isn't always in the cgroup the BIO comes from, so we can't use the task name to find the IO cgroup. For example, writeback BIOs always com

[PATCH V2 06/12] kernfs: add exportfs operations

2017-06-14 Thread Shaohua Li
From: Shaohua Li Now we have the facilities to implement exportfs operations. The idea is that cgroup can export the fhandle info to userspace, and userspace then uses the fhandle to find the cgroup name. Another example is that userspace can get the fhandle for a cgroup and BPF uses the fhandle to filter info for

[PATCH V2 10/12] block: call __bio_free in bio_endio

2017-06-14 Thread Shaohua Li
From: Shaohua Li bio_free isn't a good place to free cgroup/integrity info. There are a lot of cases where a bio is allocated in a special way (for example, on the stack) and never goes through bio_put, hence never reaches bio_free, so we leak memory. This patch moves the free to bio endio, which should be c

[PATCH V2 11/12] blktrace: add an option to allow displying cgroup path

2017-06-14 Thread Shaohua Li
From: Shaohua Li By default we output the cgroup id in blktrace. This adds an option to display the cgroup path. Since getting the cgroup path is a relatively heavy operation, we don't enable it by default. With the option enabled, blktrace will output something like this: dd-1353 [007] d..2 293.0

[PATCH V2 01/12] kernfs: implement i_generation

2017-06-14 Thread Shaohua Li
From: Shaohua Li Set i_generation for kernfs inodes. This is required to implement exportfs operations. Note that the generation is 32-bit, so it's possible for the generation to wrap around and for us to find stale files. The possibility is low, since an fhandle matches both the inode number and the generation. In most fs

[PATCH V2 05/12] kernfs: introduce kernfs_node_id

2017-06-14 Thread Shaohua Li
From: Shaohua Li The inode number and generation together identify a kernfs node. We are going to export this identification via exportfs operations, so put ino and generation into a separate structure. This is convenient for later patches that use the identification. Please note, I extend inode number
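
A sketch of treating the (ino, generation) pair as one 64-bit identifier, assuming ino occupies the low 32 bits and generation the high 32 bits. This layout and the node_id_* helpers are hypothetical; explicit shifts keep the result independent of host byte order.

```c
#include <stdint.h>

/* Pack: generation in the high half, ino in the low half. */
static uint64_t node_id_pack(uint32_t ino, uint32_t generation)
{
	return ((uint64_t)generation << 32) | ino;
}

static uint32_t node_id_ino(uint64_t id)
{
	return (uint32_t)id; /* low 32 bits */
}

static uint32_t node_id_generation(uint64_t id)
{
	return (uint32_t)(id >> 32); /* high 32 bits */
}
```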

[PATCH V2 12/12] block: use standard blktrace API to output cgroup info for debug notes

2017-06-14 Thread Shaohua Li
From: Shaohua Li Currently cfq/bfq/blk-throttle each output cgroup info in traces in their own way. Now that we have a standard blktrace API for this, convert them to use it. Note that this changes the behavior a little: cgroup info isn't output by default; we only do this with 'blk_cgro

[PATCH V2 02/12] kernfs: use idr instead of ida to manage inode number

2017-06-14 Thread Shaohua Li
From: Shaohua Li kernfs uses an ida to manage inode numbers. The problem is that we can't get a kernfs_node from an inode number with an ida. Switch to an idr; the next patch will add an API to get a kernfs_node from an inode number. Signed-off-by: Shaohua Li --- fs/kernfs/dir.c

[PATCH V2 09/12] block: always attach cgroup info into bio

2017-06-14 Thread Shaohua Li
From: Shaohua Li blkcg_bio_issue_check() already gets the blkcg for a BIO. bio_associate_blkcg() uses a percpu refcounter, so it's a very cheap operation. There is no reason not to attach the cgroup info to the bio in blkcg_bio_issue_check(). This also makes blktrace output correct c

[PATCH V2 03/12] kernfs: add an API to get kernfs node from inode number

2017-06-14 Thread Shaohua Li
From: Shaohua Li Add an API to get a kernfs node from an inode number. We will need this to implement exportfs operations. To make the API lock-free, kernfs nodes are freed in RCU context, and we depend on the kernfs_node count/ino number to filter out stale kernfs nodes. Signed-off-by: Shaohua Li --- fs

[PATCH V2 07/12] cgroup: export fhandle info for a cgroup

2017-06-14 Thread Shaohua Li
From: Shaohua Li Add an API to export cgroup fhandle info. We don't export a full 'struct file_handle' because it contains unneeded info. Specifically, a cgroup is always a directory, so we don't need a 'FILEID_KERNFS_WITH_PARENT' type fhandle; we only need to export the inod

[PATCH V2 00/12]blktrace: output cgroup info

2017-06-14 Thread Shaohua Li
From: Shaohua Li Hi, Currently blktrace isn't cgroup aware. blktrace prints the task name of the current context, but that task isn't always in the cgroup the BIO comes from, so we can't use the task name to find the IO cgroup. For example, writeback BIOs always com

[PATCH V2 04/12] kernfs: don't set dentry->d_fsdata

2017-06-14 Thread Shaohua Li
From: Shaohua Li When working on adding exportfs operations in kernfs, I found it's hard to initialize dentry->d_fsdata in the exportfs operations. It looks like there is no way to do it without a race condition. Looking at the kernfs code closely, there is no point in setting dentry->d_fsdata. inode

[PATCH V3 03/12] kernfs: add an API to get kernfs node from inode number

2017-06-15 Thread Shaohua Li
From: Shaohua Li Add an API to get a kernfs node from an inode number. We will need this to implement exportfs operations. To make the API lock-free, kernfs nodes are freed in RCU context, and we depend on the kernfs_node count/ino number to filter out stale kernfs nodes. Signed-off-by: Shaohua Li --- fs

[PATCH V3 12/12] block: use standard blktrace API to output cgroup info for debug notes

2017-06-15 Thread Shaohua Li
From: Shaohua Li Currently cfq/bfq/blk-throttle each output cgroup info in traces in their own way. Now that we have a standard blktrace API for this, convert them to use it. Note that this changes the behavior a little: cgroup info isn't output by default; we only do this with 'blk_cgro

[PATCH V3 07/12] cgroup: export fhandle info for a cgroup

2017-06-15 Thread Shaohua Li
From: Shaohua Li Add an API to export cgroup fhandle info. We don't export a full 'struct file_handle' because it contains unneeded info. Specifically, a cgroup is always a directory, so we don't need a 'FILEID_INO32_GEN_PARENT' type fhandle; we only need to export the inod

[PATCH V3 09/12] block: always attach cgroup info into bio

2017-06-15 Thread Shaohua Li
From: Shaohua Li blkcg_bio_issue_check() already gets the blkcg for a BIO. bio_associate_blkcg() uses a percpu refcounter, so it's a very cheap operation. There is no reason not to attach the cgroup info to the bio in blkcg_bio_issue_check(). This also makes blktrace output correct c

[PATCH V3 04/12] kernfs: don't set dentry->d_fsdata

2017-06-15 Thread Shaohua Li
From: Shaohua Li When working on adding exportfs operations in kernfs, I found it's hard to initialize dentry->d_fsdata in the exportfs operations. It looks like there is no way to do it without a race condition. Looking at the kernfs code closely, there is no point in setting dentry->d_fsdata. inode

[PATCH V3 05/12] kernfs: introduce kernfs_node_id

2017-06-15 Thread Shaohua Li
From: Shaohua Li The inode number and generation together identify a kernfs node. We are going to export this identification via exportfs operations, so put ino and generation into a separate structure. This is convenient for later patches that use the identification. Signed-off-by: Shaohua Li --- fs/k

[PATCH V3 11/12] blktrace: add an option to allow displying cgroup path

2017-06-15 Thread Shaohua Li
From: Shaohua Li By default we output the cgroup id in blktrace. This adds an option to display the cgroup path. Since getting the cgroup path is a relatively heavy operation, we don't enable it by default. With the option enabled, blktrace will output something like this: dd-1353 [007] d..2 293.0

[PATCH V3 02/12] kernfs: implement i_generation

2017-06-15 Thread Shaohua Li
From: Shaohua Li Set i_generation for kernfs inodes. This is required to implement exportfs operations. The generation is 32-bit, so it's possible for the generation to wrap around and for us to find stale files. To reduce the possibility, we don't reuse an inode number immediately. When the inode number
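
Cyclic allocation is one way to avoid reusing an inode number immediately, and the kernel provides idr_alloc_cyclic() for it. The C sketch below models the idea with a deliberately tiny, hypothetical id space: numbers are handed out from a moving cursor, and the generation is bumped whenever allocation wraps back below the cursor. MAX_INO, struct ino_alloc and its functions are illustrative, not kernfs code.

```c
#include <stdint.h>

#define MAX_INO 8 /* hypothetical tiny id space (valid numbers 1..7) */

struct ino_alloc {
	unsigned char used[MAX_INO];
	int cursor;          /* where the next search starts; may be MAX_INO */
	uint32_t generation; /* bumped each time allocation wraps around */
};

static void ino_alloc_init(struct ino_alloc *a)
{
	for (int i = 0; i < MAX_INO; i++)
		a->used[i] = 0;
	a->cursor = 1; /* 0 is reserved */
	a->generation = 1;
}

/* Return a free number in [1, MAX_INO) and its generation, or -1 if full.
 * The search starts at the cursor and wraps to 1, so recently freed
 * numbers are not handed out again until the cursor comes back around. */
static int ino_alloc_get(struct ino_alloc *a, uint32_t *gen)
{
	int start = a->cursor;

	for (int n = 0; n < MAX_INO - 1; n++) {
		/* visits start, start + 1, ..., MAX_INO - 1, 1, 2, ... */
		int ino = 1 + (start - 1 + n) % (MAX_INO - 1);

		if (a->used[ino])
			continue;
		a->used[ino] = 1;
		if (ino < start) /* wrapped past the end of the range */
			a->generation++;
		a->cursor = ino + 1;
		*gen = a->generation;
		return ino;
	}
	return -1;
}

static void ino_alloc_put(struct ino_alloc *a, int ino)
{
	if (ino > 0 && ino < MAX_INO)
		a->used[ino] = 0;
}
```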

[PATCH V3 10/12] block: call __bio_free in bio_endio

2017-06-15 Thread Shaohua Li
From: Shaohua Li bio_free isn't a good place to free cgroup/integrity info. There are a lot of cases where a bio is allocated in a special way (for example, on the stack) and never goes through bio_put, hence never reaches bio_free, so we leak memory. This patch moves the free to bio endio, which should be c

[PATCH V3 06/12] kernfs: add exportfs operations

2017-06-15 Thread Shaohua Li
From: Shaohua Li Now we have the facilities to implement exportfs operations. The idea is that cgroup can export the fhandle info to userspace, and userspace then uses the fhandle to find the cgroup name. Another example is that userspace can get the fhandle for a cgroup and BPF uses the fhandle to filter info for

[PATCH V3 08/12] blktrace: export cgroup info in trace

2017-06-15 Thread Shaohua Li
From: Shaohua Li Currently blktrace isn't cgroup aware. blktrace prints the task name of the current context, but that task isn't always in the cgroup the BIO comes from, so we can't use the task name to find the IO cgroup. For example, writeback BIOs always com

[PATCH V3 00/12] blktrace: output cgroup info

2017-06-15 Thread Shaohua Li
From: Shaohua Li Hi, Currently blktrace isn't cgroup aware. blktrace prints the task name of the current context, but that task isn't always in the cgroup the BIO comes from, so we can't use the task name to find the IO cgroup. For example, writeback BIOs always com

[PATCH V3 01/12] kernfs: use idr instead of ida to manage inode number

2017-06-15 Thread Shaohua Li
From: Shaohua Li kernfs uses an ida to manage inode numbers. The problem is that we can't get a kernfs_node from an inode number with an ida. Switch to an idr; the next patch will add an API to get a kernfs_node from an inode number. Signed-off-by: Shaohua Li --- fs/kernfs/dir.c

[PATCH V4 02/12] kernfs: implement i_generation

2017-06-28 Thread Shaohua Li
From: Shaohua Li Set i_generation for kernfs inodes. This is required to implement exportfs operations. The generation is 32-bit, so it's possible for the generation to wrap around and for us to find stale files. To reduce the possibility, we don't reuse an inode number immediately. When the inode number

[PATCH V4 00/12] blktrace: output cgroup info

2017-06-28 Thread Shaohua Li
From: Shaohua Li Hi, Currently blktrace isn't cgroup aware. blktrace prints the task name of the current context, but that task isn't always in the cgroup the BIO comes from, so we can't use the task name to find the IO cgroup. For example, writeback BIOs always com

[PATCH V4 09/12] block: always attach cgroup info into bio

2017-06-28 Thread Shaohua Li
From: Shaohua Li blkcg_bio_issue_check() already gets the blkcg for a BIO. bio_associate_blkcg() uses a percpu refcounter, so it's a very cheap operation. There is no reason not to attach the cgroup info to the bio in blkcg_bio_issue_check(). This also makes blktrace output correct cgroup in

[PATCH V4 10/12] block: call __bio_free in bio_endio

2017-06-28 Thread Shaohua Li
From: Shaohua Li bio_free isn't a good place to free cgroup/integrity info. There are a lot of cases where a bio is allocated in a special way (for example, on the stack) and never goes through bio_put, hence never reaches bio_free, so we leak memory. This patch moves the free to bio endio, which should be c

[PATCH V4 12/12] block: use standard blktrace API to output cgroup info for debug notes

2017-06-28 Thread Shaohua Li
From: Shaohua Li Currently cfq/bfq/blk-throttle each output cgroup info in traces in their own way. Now that we have a standard blktrace API for this, convert them to use it. Note that this changes the behavior a little: cgroup info isn't output by default; we only do this with 'blk_cgro

[PATCH V4 08/12] blktrace: export cgroup info in trace

2017-06-28 Thread Shaohua Li
From: Shaohua Li Currently blktrace isn't cgroup aware. blktrace prints the task name of the current context, but that task isn't always in the cgroup the BIO comes from, so we can't use the task name to find the IO cgroup. For example, writeback BIOs always com

[PATCH V4 03/12] kernfs: add an API to get kernfs node from inode number

2017-06-28 Thread Shaohua Li
From: Shaohua Li Add an API to get a kernfs node from an inode number. We will need this to implement exportfs operations. This API will also be used by blktrace later, so it should be as fast as possible. To make the API lock-free, kernfs nodes are freed in RCU context. And we depend on kernfs_node

[PATCH V4 04/12] kernfs: don't set dentry->d_fsdata

2017-06-28 Thread Shaohua Li
From: Shaohua Li When working on adding exportfs operations in kernfs, I found it's hard to initialize dentry->d_fsdata in the exportfs operations. It looks like there is no way to do it without a race condition. Looking at the kernfs code closely, there is no point in setting dentry->d_fsdata. inode

[PATCH V4 11/12] blktrace: add an option to allow displying cgroup path

2017-06-28 Thread Shaohua Li
From: Shaohua Li By default we output the cgroup id in blktrace. This adds an option to display the cgroup path. Since getting the cgroup path is a relatively heavy operation, we don't enable it by default. With the option enabled, blktrace will output something like this: dd-1353 [007] d..2 293.0

[PATCH V4 07/12] cgroup: export fhandle info for a cgroup

2017-06-28 Thread Shaohua Li
From: Shaohua Li Add an API to export cgroup fhandle info. We don't export a full 'struct file_handle' because it contains unneeded info. Specifically, a cgroup is always a directory, so we don't need a 'FILEID_INO32_GEN_PARENT' type fhandle; we only need to export the inod

[PATCH V4 05/12] kernfs: introduce kernfs_node_id

2017-06-28 Thread Shaohua Li
From: Shaohua Li The inode number and generation together identify a kernfs node. We are going to export this identification via exportfs operations, so put ino and generation into a separate structure. This is convenient for later patches that use the identification. Signed-off-by: Shaohua Li --- fs/k

[PATCH V4 06/12] kernfs: add exportfs operations

2017-06-28 Thread Shaohua Li
From: Shaohua Li Now we have the facilities to implement exportfs operations. The idea is that cgroup can export the fhandle info to userspace, and userspace then uses the fhandle to find the cgroup name. Another example is that userspace can get the fhandle for a cgroup and BPF uses the fhandle to filter info for

[PATCH V4 01/12] kernfs: use idr instead of ida to manage inode number

2017-06-28 Thread Shaohua Li
From: Shaohua Li kernfs uses an ida to manage inode numbers. The problem is that we can't get a kernfs_node from an inode number with an ida. Switch to an idr; the next patch will add an API to get a kernfs_node from an inode number. Acked-by: Tejun Heo Signed-off-by: Shaohua Li --- fs/kernfs/dir.c

Re: [PATCH V4 00/12] blktrace: output cgroup info

2017-06-28 Thread Shaohua Li
On Wed, Jun 28, 2017 at 10:43:48AM -0600, Jens Axboe wrote: > On 06/28/2017 10:29 AM, Shaohua Li wrote: > > From: Shaohua Li > > > > Hi, > > > > Currently blktrace isn't cgroup aware. blktrace prints out task name of > > current > > context, b

Re: [PATCH V4 10/12] block: call __bio_free in bio_endio

2017-06-28 Thread Shaohua Li
On Wed, Jun 28, 2017 at 11:29:08PM +0200, Christoph Hellwig wrote: > On Wed, Jun 28, 2017 at 09:30:00AM -0700, Shaohua Li wrote: > > From: Shaohua Li > > > > bio_free isn't a good place to free cgroup/integrity info. There are a > > lot of cases bio is allocated

Re: kernel 4.8-rc5 kernel BUG at block/blk-core.c:2032!

2016-09-08 Thread Shaohua Li
On Thu, Sep 08, 2016 at 10:16:59AM -0600, Jens Axboe wrote: > On 09/08/2016 02:23 AM, Stefan Priebe - Profihost AG wrote: > > Hi, > > > > while trying Kernel 4.8-rc5 my raid5 breaks every few minutes. > > > > Trace: > > [ cut here ] > > kernel BUG at block/blk-core.c:2032!

Re: kernel 4.8-rc5 kernel BUG at block/blk-core.c:2032!

2016-09-09 Thread Shaohua Li
On Fri, Sep 09, 2016 at 08:03:42PM +0200, Stefan Priebe - Profihost AG wrote: > Am 08.09.2016 um 19:33 schrieb Shaohua Li: > > On Thu, Sep 08, 2016 at 10:16:59AM -0600, Jens Axboe wrote: > >> On 09/08/2016 02:23 AM, Stefan Priebe - Profihost AG wrote: > >>> Hi, >

[PATCH V2 03/11] block-throttle: configure bps/iops limit for cgroup in high limit

2016-09-15 Thread Shaohua Li
for their high limit. Signed-off-by: Shaohua Li --- block/blk-throttle.c | 21 +++-- 1 file changed, 19 insertions(+), 2 deletions(-) diff --git a/block/blk-throttle.c b/block/blk-throttle.c index 02323fb..6bae1b4 100644 --- a/block/blk-throttle.c +++ b/block/blk-throttle.c

[PATCH V2 07/11] blk-throttle: make throtl_slice tunable

2016-09-15 Thread Shaohua Li
. Signed-off-by: Shaohua Li --- block/blk-sysfs.c| 11 block/blk-throttle.c | 72 block/blk.h | 3 +++ 3 files changed, 64 insertions(+), 22 deletions(-) diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c index f87a7e7

[PATCH V2 06/11] blk-throttle: make sure expire time isn't too big

2016-09-15 Thread Shaohua Li
roup sleep time not too big wouldn't change cgroup bps/iops, but could make it wake up more frequently, which isn't a big issue because throtl_slice * 8 is already quite big. Signed-off-by: Shaohua Li --- block/blk-throttle.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/block/b
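
A sketch of the clamp this patch describes, with times in hypothetical ticks and the factor 8 taken from the text above. Wrap-around of the tick counter (which kernel code handles with helpers such as time_after()) is ignored here, and clamp_expires is a hypothetical name.

```c
#include <stdint.h>

/* Cap a computed dispatch deadline at now + 8 * throtl_slice. Capping only
 * causes an earlier wakeup and a recheck; the enforced rate is unchanged. */
static uint64_t clamp_expires(uint64_t now, uint64_t expires, uint64_t throtl_slice)
{
	uint64_t max_expires = now + 8 * throtl_slice;

	return expires > max_expires ? max_expires : expires;
}
```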

[PATCH V2 05/11] block-throttle: add downgrade logic

2016-09-15 Thread Shaohua Li
When the queue state machine is in the LIMIT_MAX state but a cgroup has been below its high limit for some time, the queue should be downgraded to a lower state, since that cgroup's high limit isn't being met. Signed-off-by: Shaohua Li --- block/blk-throttle.c | 187
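
The downgrade rule can be sketched as a two-state machine. The hold time, the per-queue fields and the function name are assumptions for illustration; the series' real bookkeeping is not shown in this excerpt.

```c
#include <stdbool.h>
#include <stdint.h>

enum limit_state { LIMIT_HIGH, LIMIT_MAX };

/* Hypothetical per-queue bookkeeping for this sketch. */
struct queue_state {
	enum limit_state state;
	bool below;           /* some cgroup is currently under its high limit */
	uint64_t below_since; /* when that started; valid only if 'below' */
};

/* Call periodically. If the queue is in LIMIT_MAX and some cgroup has
 * stayed below its high limit for at least 'hold' ticks, downgrade. */
static void maybe_downgrade(struct queue_state *q, bool cgroup_below_high,
			    uint64_t now, uint64_t hold)
{
	if (q->state != LIMIT_MAX || !cgroup_below_high) {
		q->below = false; /* condition broken: restart the timer */
		return;
	}
	if (!q->below) {
		q->below = true;
		q->below_since = now;
	}
	if (now - q->below_since >= hold) {
		q->state = LIMIT_HIGH;
		q->below = false;
	}
}
```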

[PATCH V2 02/11] block-throttle: add .high interface

2016-09-15 Thread Shaohua Li
Add a high limit for cgroups and the corresponding cgroup interface. Signed-off-by: Shaohua Li --- block/blk-throttle.c | 139 +++ 1 file changed, 107 insertions(+), 32 deletions(-) diff --git a/block/blk-throttle.c b/block/blk-throttle.c index 71ecee7

[PATCH V2 08/11] blk-throttle: detect completed idle cgroup

2016-09-15 Thread Shaohua Li
idle cgroup is hard. This patch handles a simple case: a cgroup that doesn't dispatch any IO. We ignore such a cgroup's limit, so other cgroups can use the bandwidth. Signed-off-by: Shaohua Li --- block/blk-throttle.c | 17 - 1 file changed, 16 insertions(+), 1 deletion(-

[PATCH V2 09/11] block-throttle: make bandwidth change smooth

2016-09-15 Thread Shaohua Li
bandwidth, but that's the price we pay for sharing. Note this doesn't completely prevent a cgroup from running under its high limit. The best way to guarantee a cgroup doesn't run under its limit is to set a max limit. For example, if we set cg1's max limit to 40, cg2 will never run under its high

[PATCH V2 01/11] block-throttle: prepare support multiple limits

2016-09-15 Thread Shaohua Li
We are going to support high/max limits; each cgroup will have two limits after that. This patch prepares for the multiple-limits change. Signed-off-by: Shaohua Li --- block/blk-throttle.c | 109 --- 1 file changed, 68 insertions(+), 41 deletions

[PATCH V2 00/11] block-throttle: add .high limit

2016-09-15 Thread Shaohua Li
detection - Other bug fixes and improvements V1: http://marc.info/?l=linux-block&m=146292596425689&w=2 -- Shaohua Li (11): block-throttle: prepare support multiple limits block-throttle: add .high interface block-throttle: configure bps/iop

[PATCH V2 10/11] block-throttle: add a simple idle detection

2016-09-15 Thread Shaohua Li
its think time is above a threshold (by default 50us). 50us is chosen arbitrarily so far, but it seems OK in testing and should allow the CPU to do a lot of work before dispatching IO. There is also a knob to let the user configure the threshold. Signed-off-by: Shaohua Li --- block/bio.c | 2

[PATCH V2 04/11] block-throttle: add upgrade logic for LIMIT_HIGH state

2016-09-15 Thread Shaohua Li
iops cross the high limit, we can upgrade the queue state. The other case is a child with a higher high limit than its parent. The child's high limit is meaningless: as long as the parent's bps/iops cross the high limit, we can upgrade the queue state. Signed-off-by: Shaohua Li --- b
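
The second case amounts to taking the smallest high limit on the path to the root as the one that actually binds. A hypothetical sketch in C, modeling only bps; struct cg and effective_high_bps are illustrative names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cgroup node with only the fields this sketch needs. */
struct cg {
	const struct cg *parent; /* NULL for the root */
	uint64_t high_bps;       /* configured high limit */
};

/* A child can never get more than its ancestors allow, so the effective
 * limit is the minimum along the path to the root. */
static uint64_t effective_high_bps(const struct cg *c)
{
	uint64_t lim = c->high_bps;

	for (const struct cg *p = c->parent; p; p = p->parent)
		if (p->high_bps < lim)
			lim = p->high_bps;
	return lim;
}
```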

[PATCH V2 11/11] blk-throttle: ignore idle cgroup limit

2016-09-15 Thread Shaohua Li
The last patch introduced a way to detect idle cgroups. We use it to make upgrade/downgrade decisions. Signed-off-by: Shaohua Li --- block/blk-throttle.c | 30 ++ 1 file changed, 18 insertions(+), 12 deletions(-) diff --git a/block/blk-throttle.c b/block/blk-throttle.c

[PATCH v3 01/11] block-throttle: prepare support multiple limits

2016-10-03 Thread Shaohua Li
We are going to support high/max limits; each cgroup will have two limits after that. This patch prepares for the multiple-limits change. Signed-off-by: Shaohua Li --- block/blk-throttle.c | 109 --- 1 file changed, 68 insertions(+), 41 deletions

[PATCH v3 04/11] block-throttle: add upgrade logic for LIMIT_HIGH state

2016-10-03 Thread Shaohua Li
iops cross the high limit, we can upgrade the queue state. The other case is a child with a higher high limit than its parent. The child's high limit is meaningless: as long as the parent's bps/iops cross the high limit, we can upgrade the queue state. Signed-off-by: Shaohua Li --- b

[PATCH v3 02/11] block-throttle: add .high interface

2016-10-03 Thread Shaohua Li
Add a high limit for cgroups and the corresponding cgroup interface. Signed-off-by: Shaohua Li --- block/blk-throttle.c | 139 +++ 1 file changed, 107 insertions(+), 32 deletions(-) diff --git a/block/blk-throttle.c b/block/blk-throttle.c index 964b713

[PATCH v3 11/11] blk-throttle: ignore idle cgroup limit

2016-10-03 Thread Shaohua Li
The last patch introduced a way to detect idle cgroups. We use it to make upgrade/downgrade decisions. Signed-off-by: Shaohua Li --- block/blk-throttle.c | 31 +++ 1 file changed, 19 insertions(+), 12 deletions(-) diff --git a/block/blk-throttle.c b/block/blk-throttle.c

[PATCH v3 05/11] block-throttle: add downgrade logic

2016-10-03 Thread Shaohua Li
When the queue state machine is in the LIMIT_MAX state but a cgroup has been below its high limit for some time, the queue should be downgraded to a lower state, since that cgroup's high limit isn't being met. Signed-off-by: Shaohua Li --- block/blk-throttle.c | 187

[PATCH V3 00/11] block-throttle: add .high limit

2016-10-03 Thread Shaohua Li
'trial' logic, which creates too much fluctuation - Add a new idle cgroup detection - Other bug fixes and improvements http://marc.info/?l=linux-block&m=147395674732335&w=2 V1: http://marc.info/?l=linux-block&m=146292596425689&w=2 Shaohua Li (11): block-throttle: prepare s

[PATCH v3 10/11] block-throttle: add a simple idle detection

2016-10-03 Thread Shaohua Li
its think time is above a threshold (by default 50us for SSD and 1ms for HD). The idea is that a think time above the threshold starts to harm performance. HD is much slower, so a longer think time is OK. There is also a knob to let the user configure the threshold. Signed-off-by: Shaohua Li --- block/bi
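
A sketch of the idle test described above, using the default thresholds from the text (50us for SSD, 1ms for HD). It assumes think time is the gap between one IO completing and the cgroup issuing the next, and that a user-set value of 0 means "use the default"; both assumptions and all names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Default thresholds from the patch description, in microseconds. */
#define IDLE_THRESHOLD_SSD_US 50
#define IDLE_THRESHOLD_HD_US  1000

/* Pick the default for the device type; a nonzero user setting (the
 * tunable knob) overrides it. */
static uint64_t idle_threshold_us(bool rotational, uint64_t user_us)
{
	if (user_us)
		return user_us;
	return rotational ? IDLE_THRESHOLD_HD_US : IDLE_THRESHOLD_SSD_US;
}

/* A cgroup counts as idle when its average think time exceeds the
 * threshold for its device. */
static bool cgroup_is_idle(uint64_t avg_think_us, bool rotational, uint64_t user_us)
{
	return avg_think_us > idle_threshold_us(rotational, user_us);
}
```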

[PATCH v3 09/11] block-throttle: make bandwidth change smooth

2016-10-03 Thread Shaohua Li
bandwidth, but that's the price we pay for sharing. Note this doesn't completely prevent a cgroup from running under its high limit. The best way to guarantee a cgroup doesn't run under its limit is to set a max limit. For example, if we set cg1's max limit to 40, cg2 will never run under its high

[PATCH v3 06/11] blk-throttle: make sure expire time isn't too big

2016-10-03 Thread Shaohua Li
roup sleep time not too big wouldn't change cgroup bps/iops, but could make it wake up more frequently, which isn't a big issue because throtl_slice * 8 is already quite big. Signed-off-by: Shaohua Li --- block/blk-throttle.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/block/b

[PATCH v3 07/11] blk-throttle: make throtl_slice tunable

2016-10-03 Thread Shaohua Li
. Signed-off-by: Shaohua Li --- block/blk-sysfs.c| 11 block/blk-throttle.c | 72 block/blk.h | 3 +++ 3 files changed, 64 insertions(+), 22 deletions(-) diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c index f87a7e7

[PATCH v3 08/11] blk-throttle: detect completed idle cgroup

2016-10-03 Thread Shaohua Li
idle cgroup is hard. This patch handles a simple case: a cgroup that doesn't dispatch any IO. We ignore such a cgroup's limit, so other cgroups can use the bandwidth. Signed-off-by: Shaohua Li --- block/blk-throttle.c | 17 - 1 file changed, 16 insertions(+), 1 deletion(-

[PATCH v3 03/11] block-throttle: configure bps/iops limit for cgroup in high limit

2016-10-03 Thread Shaohua Li
for their high limit. Signed-off-by: Shaohua Li --- block/blk-throttle.c | 21 +++-- 1 file changed, 19 insertions(+), 2 deletions(-) diff --git a/block/blk-throttle.c b/block/blk-throttle.c index 59d4b4c..e2b3704 100644 --- a/block/blk-throttle.c +++ b/block/blk-throttle.c

Re: [PATCH V3 00/11] block-throttle: add .high limit

2016-10-04 Thread Shaohua Li
Hi, On Tue, Oct 04, 2016 at 09:28:05AM -0400, Vivek Goyal wrote: > On Mon, Oct 03, 2016 at 02:20:19PM -0700, Shaohua Li wrote: > > Hi, > > > > The background is we don't have an ioscheduler for blk-mq yet, so we can't > > prioritize processes/cgroups. >

Re: [PATCH V3 00/11] block-throttle: add .high limit

2016-10-04 Thread Shaohua Li
On Tue, Oct 04, 2016 at 07:01:39PM +0200, Paolo Valente wrote: > > > On 04 Oct 2016, at 18:27, Tejun Heo wrote: > > > > Hello, > > > > On Tue, Oct 04, 2016 at 06:22:28PM +0200, Paolo Valente wrote: > >> Could you please elaborate more on this point? BFQ uses sectors > >>

Re: [PATCH V3 00/11] block-throttle: add .high limit

2016-10-04 Thread Shaohua Li
On Tue, Oct 04, 2016 at 07:43:48PM +0200, Paolo Valente wrote: > > > On 04 Oct 2016, at 19:28, Shaohua Li wrote: > > > > On Tue, Oct 04, 2016 at 07:01:39PM +0200, Paolo Valente wrote: > >> > >>> On 04 Oct 2016, at

Re: [PATCH V3 00/11] block-throttle: add .high limit

2016-10-05 Thread Shaohua Li
On Wed, Oct 05, 2016 at 10:49:46AM -0400, Tejun Heo wrote: > Hello, Paolo. > > On Wed, Oct 05, 2016 at 02:37:00PM +0200, Paolo Valente wrote: > > In this respect, for your generic, unpredictable scenario to make > > sense, there must exist at least one real system that meets the > > requirements o

Re: [PATCH V3 00/11] block-throttle: add .high limit

2016-10-05 Thread Shaohua Li
On Wed, Oct 05, 2016 at 11:30:53AM -0700, Shaohua Li wrote: > On Wed, Oct 05, 2016 at 10:49:46AM -0400, Tejun Heo wrote: > > Hello, Paolo. > > > > On Wed, Oct 05, 2016 at 02:37:00PM +0200, Paolo Valente wrote: > > > In this respect, for your generic, unpredictabl

Re: [PATCH V3 00/11] block-throttle: add .high limit

2016-10-05 Thread Shaohua Li
On Wed, Oct 05, 2016 at 09:57:22PM +0200, Paolo Valente wrote: > > > On 05 Oct 2016, at 21:08, Shaohua Li wrote: > > > > On Wed, Oct 05, 2016 at 11:30:53AM -0700, Shaohua Li wrote: > >> On Wed, Oct 05, 2016 at 10:49:46AM -0400, Tejun

Re: [PATCH V3 00/11] block-throttle: add .high limit

2016-10-05 Thread Shaohua Li
On Wed, Oct 05, 2016 at 09:47:19PM +0200, Paolo Valente wrote: > > > On 05 Oct 2016, at 20:30, Shaohua Li wrote: > > > > On Wed, Oct 05, 2016 at 10:49:46AM -0400, Tejun Heo wrote: > >> Hello, Paolo. > >> > >> On Wed, Oct 0

Re: kernel BUG at block/bio.c:1785 while trying to issue a discard to LVM on RAID1 md

2016-10-05 Thread Shaohua Li
On Wed, Oct 05, 2016 at 10:31:11PM +0100, Sitsofe Wheeler wrote: > On 3 October 2016 at 17:47, Sitsofe Wheeler wrote: > > > > While trying to do a discard (via blkdiscard --length 1048576 > > /dev/) to an LVM device atop a two disk md RAID1 the > > following oops was generated: > > > > [ 103.3062

Re: [PATCH V3 00/11] block-throttle: add .high limit

2016-10-06 Thread Shaohua Li
On Thu, Oct 06, 2016 at 09:58:44AM +0200, Paolo Valente wrote: > > > On 05 Oct 2016, at 22:46, Shaohua Li wrote: > > > > On Wed, Oct 05, 2016 at 09:47:19PM +0200, Paolo Valente wrote: > >> > >>> On 05 Oct 2016, at

Re: [PATCH 3/3 v2] md: unblock array if bad blocks have been acknowledged

2016-10-19 Thread Shaohua Li
On Tue, Oct 18, 2016 at 04:10:24PM +0200, Tomasz Majchrzak wrote: > Once external metadata handler acknowledges all bad blocks (by writing > to rdev 'bad_blocks' sysfs file), it requests to unblock the array. > Check if all bad blocks are actually acknowledged as there might be a > race if new bad

[PATCH] badblocks: badblocks_set/clear update unacked_exist

2016-10-20 Thread Shaohua Li
When badblocks_set acknowledges a range or badblocks_clear clears a range, it's possible all badblocks are acknowledged. We should update unacked_exist if this occurs. Signed-off-by: Shaohua Li --- block/badblocks.c | 23 +++ 1 file changed, 23 insertions(+) diff --git a/
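The update described above amounts to rescanning the table after a set or clear. A minimal C sketch follows, assuming a simplified table. `struct bb_sketch` is a hypothetical stand-in for `struct badblocks`, and `bb_update_acked` is a hypothetical helper name. The layout, with the top bit of each 64-bit entry marking an acknowledged range, is modelled on the kernel's badblocks table but redefined locally, so it is an assumption here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: each entry is a u64 whose top bit records whether the
 * range has been acknowledged (modelled on the kernel's layout). */
#define SKETCH_BB_ACK_MASK 0x8000000000000000ULL
#define SKETCH_BB_ACK(x)   (((x) & SKETCH_BB_ACK_MASK) != 0)

/* Hypothetical simplified stand-in for struct badblocks. */
struct bb_sketch {
	uint64_t *page;    /* table of bad-block entries */
	int count;         /* number of valid entries */
	int unacked_exist; /* nonzero if any entry is unacknowledged */
};

/* Call after set/clear has modified the table: if unacked_exist is set
 * but every remaining entry is now acknowledged, clear the flag. */
static void bb_update_acked(struct bb_sketch *bb)
{
	int i;

	assert(bb != NULL);
	if (!bb->unacked_exist)
		return; /* nothing to update */
	for (i = 0; i < bb->count; i++) {
		if (!SKETCH_BB_ACK(bb->page[i]))
			return; /* at least one range is still unacknowledged */
	}
	bb->unacked_exist = 0;
}
```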
