[PATCH 2/2] nvme: allocate nvme_queue in correct node

2017-01-31 Thread Shaohua Li
nvme_queue is a per-cpu queue (mostly). Allocate it on the node where blk-mq will use it. Signed-off-by: Shaohua Li <s...@fb.com> --- drivers/nvme/host/pci.c | 19 +++ 1 file changed, 15 insertions(+), 4 deletions(-) diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host

[PATCH 1/2] blk-mq: allocate blk_mq_tags and requests in correct node

2017-01-31 Thread Shaohua Li
blk_mq_tags/requests of a specific hardware queue are mostly used on specific CPUs, which might not be in the same NUMA node as the disk. For example, if an NVMe card is in node 0, half of the hardware queues will be used by node 0 and the other half by node 1. Signed-off-by: Shaohua Li <s...@fb.com> --- block/blk-mq.

Re: [PATCH 00/17] md: cleanup on direct access to bvec table

2017-02-16 Thread Shaohua Li
On Fri, Feb 17, 2017 at 09:25:27AM +0800, Ming Lei wrote: > Hi Shaohua, > > On Fri, Feb 17, 2017 at 6:16 AM, Shaohua Li <s...@kernel.org> wrote: > > On Thu, Feb 16, 2017 at 07:45:30PM +0800, Ming Lei wrote: > >> In MD's resync I/O path, there are lots of direct a

Re: [PATCH v2 5/5] md: fast clone bio in bio_clone_mddev()

2017-02-15 Thread Shaohua Li
On Tue, Feb 14, 2017 at 08:01:09AM -0800, Christoph Hellwig wrote: > On Tue, Feb 14, 2017 at 11:29:03PM +0800, Ming Lei wrote: > > Firstly bio_clone_mddev() is used in raid normal I/O and isn't > > in resync I/O path. > > > > Secondly all the direct access to bvec table in raid happens on > >

Re: [PATCH v2 0/5] md: use bio_clone_fast()

2017-02-15 Thread Shaohua Li
On Tue, Feb 14, 2017 at 11:28:58PM +0800, Ming Lei wrote: > Hi, > > This patches replaces bio_clone() with bio_fast_clone() in > bio_clone_mddev() because: > > 1) bio_clone_mddev() is used in raid normal I/O and isn't in > resync I/O path, and all the direct access to bvec table in > raid

Re: [PATCH v2 5/5] md: fast clone bio in bio_clone_mddev()

2017-02-15 Thread Shaohua Li
On Wed, Feb 15, 2017 at 11:20:25AM -0800, Shaohua Li wrote: > On Tue, Feb 14, 2017 at 08:01:09AM -0800, Christoph Hellwig wrote: > > On Tue, Feb 14, 2017 at 11:29:03PM +0800, Ming Lei wrote: > > > Firstly bio_clone_mddev() is used in raid normal I/O and isn't > &

Re: [PATCH v2 05/13] md: raid1: simplify r1buf_pool_free()

2017-02-28 Thread Shaohua Li
On Tue, Feb 28, 2017 at 11:41:35PM +0800, Ming Lei wrote: > This patch gets each page's reference of each bio for resync, > then r1buf_pool_free() gets simplified a lot. > > The same policy has been taken in raid10's buf pool allocation/free > too. We are going to delete the code, this simplify

Re: [PATCH v2 06/13] md: raid1: don't use bio's vec table to manage resync pages

2017-02-28 Thread Shaohua Li
eset any more. > > This patch can be thought as a cleanup too > > Suggested-by: Shaohua Li <s...@kernel.org> > Signed-off-by: Ming Lei <tom.leim...@gmail.com> > --- > drivers/md/raid1.c | 83 > ++ > 1 file

Re: [PATCH v2 11/13] md: raid10: don't use bio's vec table to manage resync pages

2017-02-28 Thread Shaohua Li
hly new now in these functions > and not necessary to reset any more. > > This patch can be thought as cleanup too. > > Suggested-by: Shaohua Li <s...@kernel.org> > Signed-off-by: Ming Lei <tom.leim...@gmail.com> > --- > drivers/md/raid10.c | 125 > +++

Re: [PATCH v2 08/13] md: raid1: use bio helper in process_checks()

2017-02-28 Thread Shaohua Li
On Tue, Feb 28, 2017 at 11:41:38PM +0800, Ming Lei wrote: > Avoid to direct access to bvec table. > > Signed-off-by: Ming Lei > --- > drivers/md/raid1.c | 12 > 1 file changed, 8 insertions(+), 4 deletions(-) > > diff --git a/drivers/md/raid1.c

Re: [PATCH v2 04/13] md: prepare for managing resync I/O pages in clean way

2017-02-28 Thread Shaohua Li
On Tue, Feb 28, 2017 at 11:41:34PM +0800, Ming Lei wrote: > Now resync I/O use bio's bec table to manage pages, > this way is very hacky, and may not work any more > once multipage bvec is introduced. > > So introduce helpers and new data structure for > managing resync I/O pages more cleanly. >

Re: kernel 4.8-rc5 kernel BUG at block/blk-core.c:2032!

2016-09-08 Thread Shaohua Li
On Thu, Sep 08, 2016 at 10:16:59AM -0600, Jens Axboe wrote: > On 09/08/2016 02:23 AM, Stefan Priebe - Profihost AG wrote: > > Hi, > > > > while trying Kernel 4.8-rc5 my raid5 breaks every few minutes. > > > > Trace: > > [ cut here ] > > kernel BUG at

Re: kernel 4.8-rc5 kernel BUG at block/blk-core.c:2032!

2016-09-09 Thread Shaohua Li
On Fri, Sep 09, 2016 at 08:03:42PM +0200, Stefan Priebe - Profihost AG wrote: > Am 08.09.2016 um 19:33 schrieb Shaohua Li: > > On Thu, Sep 08, 2016 at 10:16:59AM -0600, Jens Axboe wrote: > >> On 09/08/2016 02:23 AM, Stefan Priebe - Profihost AG wrote: > >>> Hi, >

[PATCH V2 07/11] blk-throttle: make throtl_slice tunable

2016-09-15 Thread Shaohua Li
. Signed-off-by: Shaohua Li <s...@fb.com> --- block/blk-sysfs.c| 11 block/blk-throttle.c | 72 block/blk.h | 3 +++ 3 files changed, 64 insertions(+), 22 deletions(-) diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c

[PATCH V2 06/11] blk-throttle: make sure expire time isn't too big

2016-09-15 Thread Shaohua Li
not too big wouldn't change cgroup bps/iops, but could make it wake up more frequently, which isn't a big issue because throtl_slice * 8 is already quite big. Signed-off-by: Shaohua Li <s...@fb.com> --- block/blk-throttle.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/block/blk-thro

[PATCH V2 02/11] block-throttle: add .high interface

2016-09-15 Thread Shaohua Li
Add high limit for cgroup and corresponding cgroup interface. Signed-off-by: Shaohua Li <s...@fb.com> --- block/blk-throttle.c | 139 +++ 1 file changed, 107 insertions(+), 32 deletions(-) diff --git a/block/blk-throttle.c b/block/blk-thro

[PATCH V2 08/11] blk-throttle: detect completed idle cgroup

2016-09-15 Thread Shaohua Li
is hard. This patch handles a simple case: a cgroup that doesn't dispatch any IO. We ignore such a cgroup's limit, so other cgroups can use the bandwidth. Signed-off-by: Shaohua Li <s...@fb.com> --- block/blk-throttle.c | 17 - 1 file changed, 16 insertions(+), 1 deletion(-) diff

[PATCH V2 09/11] block-throttle: make bandwidth change smooth

2016-09-15 Thread Shaohua Li
s something we pay for sharing. Note this doesn't completely prevent a cgroup from running under its high limit. The best way to guarantee a cgroup doesn't run under its limit is to set a max limit. For example, if we set cg1's max limit to 40, cg2 will never run under its high limit. Signed-off-by: Shaohua L

[PATCH V2 00/11] block-throttle: add .high limit

2016-09-15 Thread Shaohua Li
9=2 -- Shaohua Li (11): block-throttle: prepare support multiple limits block-throttle: add .high interface block-throttle: configure bps/iops limit for cgroup in high limit block-throttle: add upgrade logic for LIMIT_HIGH state block-throttle:

[PATCH V2 10/11] block-throttle: add a simple idle detection

2016-09-15 Thread Shaohua Li
(by default 50us). 50us is chosen arbitrarily for now, but it seems OK in testing and should allow the CPU to do a lot of work before dispatching IO. There is also a knob to let the user configure the threshold. Signed-off-by: Shaohua Li <s...@fb.com> --- block/bio.c | 2 ++ block/blk-s

[PATCH V2 04/11] block-throttle: add upgrade logic for LIMIT_HIGH state

2016-09-15 Thread Shaohua Li
limit, we can upgrade the queue state. The other case is a child with a higher high limit than its parent; the child's high limit is then meaningless. As long as the parent's bps/iops cross its high limit, we can upgrade the queue state. Signed-off-by: Shaohua Li <s...@fb.com> --- block/blk-throttle.c

[PATCH V2 11/11] blk-throttle: ignore idle cgroup limit

2016-09-15 Thread Shaohua Li
The last patch introduces a way to detect idle cgroups. We use it to make upgrade/downgrade decisions. Signed-off-by: Shaohua Li <s...@fb.com> --- block/blk-throttle.c | 30 ++ 1 file changed, 18 insertions(+), 12 deletions(-) diff --git a/block/blk-throttle.c b/blo

Re: [PATCH V3 00/11] block-throttle: add .high limit

2016-10-05 Thread Shaohua Li
On Wed, Oct 05, 2016 at 10:49:46AM -0400, Tejun Heo wrote: > Hello, Paolo. > > On Wed, Oct 05, 2016 at 02:37:00PM +0200, Paolo Valente wrote: > > In this respect, for your generic, unpredictable scenario to make > > sense, there must exist at least one real system that meets the > > requirements

Re: [PATCH V3 00/11] block-throttle: add .high limit

2016-10-05 Thread Shaohua Li
On Wed, Oct 05, 2016 at 11:30:53AM -0700, Shaohua Li wrote: > On Wed, Oct 05, 2016 at 10:49:46AM -0400, Tejun Heo wrote: > > Hello, Paolo. > > > > On Wed, Oct 05, 2016 at 02:37:00PM +0200, Paolo Valente wrote: > > > In this respect, for your generic, unpredictabl

Re: [PATCH V3 00/11] block-throttle: add .high limit

2016-10-05 Thread Shaohua Li
On Wed, Oct 05, 2016 at 09:57:22PM +0200, Paolo Valente wrote: > > > Il giorno 05 ott 2016, alle ore 21:08, Shaohua Li <s...@fb.com> ha scritto: > > > > On Wed, Oct 05, 2016 at 11:30:53AM -0700, Shaohua Li wrote: > >> On Wed, Oct 05, 2016 at 10:49:46AM -0400,

Re: [PATCH V3 00/11] block-throttle: add .high limit

2016-10-05 Thread Shaohua Li
On Wed, Oct 05, 2016 at 09:47:19PM +0200, Paolo Valente wrote: > > > Il giorno 05 ott 2016, alle ore 20:30, Shaohua Li <s...@fb.com> ha scritto: > > > > On Wed, Oct 05, 2016 at 10:49:46AM -0400, Tejun Heo wrote: > >> Hello, Paolo. > >> > &g

[PATCH v3 07/11] blk-throttle: make throtl_slice tunable

2016-10-03 Thread Shaohua Li
. Signed-off-by: Shaohua Li <s...@fb.com> --- block/blk-sysfs.c| 11 block/blk-throttle.c | 72 block/blk.h | 3 +++ 3 files changed, 64 insertions(+), 22 deletions(-) diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c

[PATCH v3 08/11] blk-throttle: detect completed idle cgroup

2016-10-03 Thread Shaohua Li
is hard. This patch handles a simple case: a cgroup that doesn't dispatch any IO. We ignore such a cgroup's limit, so other cgroups can use the bandwidth. Signed-off-by: Shaohua Li <s...@fb.com> --- block/blk-throttle.c | 17 - 1 file changed, 16 insertions(+), 1 deletion(-) diff

[PATCH v3 03/11] block-throttle: configure bps/iops limit for cgroup in high limit

2016-10-03 Thread Shaohua Li
for their high limit. Signed-off-by: Shaohua Li <s...@fb.com> --- block/blk-throttle.c | 21 +++-- 1 file changed, 19 insertions(+), 2 deletions(-) diff --git a/block/blk-throttle.c b/block/blk-throttle.c index 59d4b4c..e2b3704 100644 --- a/block/blk-throttle.c +++ b/blo

[PATCH v3 01/11] block-throttle: prepare support multiple limits

2016-10-03 Thread Shaohua Li
We are going to support high/max limits; each cgroup will have 2 limits after that. This patch prepares for the multiple limits change. Signed-off-by: Shaohua Li <s...@fb.com> --- block/blk-throttle.c | 109 --- 1 file changed, 68 insertions(

[PATCH v3 02/11] block-throttle: add .high interface

2016-10-03 Thread Shaohua Li
Add high limit for cgroup and corresponding cgroup interface. Signed-off-by: Shaohua Li <s...@fb.com> --- block/blk-throttle.c | 139 +++ 1 file changed, 107 insertions(+), 32 deletions(-) diff --git a/block/blk-throttle.c b/block/blk-thro

[PATCH v3 06/11] blk-throttle: make sure expire time isn't too big

2016-10-03 Thread Shaohua Li
not too big wouldn't change cgroup bps/iops, but could make it wake up more frequently, which isn't a big issue because throtl_slice * 8 is already quite big. Signed-off-by: Shaohua Li <s...@fb.com> --- block/blk-throttle.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/block/blk-thro

[PATCH v3 10/11] block-throttle: add a simple idle detection

2016-10-03 Thread Shaohua Li
(by default 50us for SSD and 1ms for HD). The idea is that think time above the threshold will start to harm performance. HD is much slower, so a longer think time is OK. There is also a knob to let the user configure the threshold. Signed-off-by: Shaohua Li <s...@fb.com> --- block/bio.c
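A think-time based idle check like the one described above could look roughly like this. It is a sketch under assumptions: the struct and function names are hypothetical, times are in nanoseconds, and the running average with weight 1/8 is a smoothing choice made here, not taken from the patch. Think time is measured as the gap between one of the cgroup's IOs completing and its next IO being submitted.

```c
#include <stdbool.h>
#include <stdint.h>

/* Default thresholds quoted in the commit message, in nanoseconds. */
#define IDLE_THRESHOLD_SSD_NS (50ULL * 1000)   /* 50us */
#define IDLE_THRESHOLD_HD_NS  (1000ULL * 1000) /* 1ms  */

/* Hypothetical per-cgroup state for the sketch. */
struct tg_sketch {
	uint64_t last_finish_ns;    /* completion time of the previous IO */
	uint64_t avg_ttime_ns;      /* smoothed think time */
	uint64_t idle_threshold_ns; /* the user-tunable knob */
};

/* Record that one of the cgroup's IOs has completed. */
static void tg_complete_io(struct tg_sketch *tg, uint64_t now_ns)
{
	tg->last_finish_ns = now_ns;
}

/* On submission, think time is the gap since the last completion.
 * Fold it into a running average: new = (7 * old + sample) / 8. */
static void tg_submit_io(struct tg_sketch *tg, uint64_t now_ns)
{
	uint64_t ttime = now_ns > tg->last_finish_ns ?
			 now_ns - tg->last_finish_ns : 0;

	tg->avg_ttime_ns = (tg->avg_ttime_ns * 7 + ttime) >> 3;
}

/* A cgroup whose average think time exceeds the threshold is treated
 * as idle, so its limit can be ignored. */
static bool tg_is_idle(const struct tg_sketch *tg)
{
	return tg->avg_ttime_ns > tg->idle_threshold_ns;
}
```

With the SSD default, a 10us gap keeps the cgroup active, while a single 1ms gap pushes the average well past 50us and marks it idle.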

Re: [PATCH 3/3 v2] md: unblock array if bad blocks have been acknowledged

2016-10-19 Thread Shaohua Li
On Tue, Oct 18, 2016 at 04:10:24PM +0200, Tomasz Majchrzak wrote: > Once external metadata handler acknowledges all bad blocks (by writing > to rdev 'bad_blocks' sysfs file), it requests to unblock the array. > Check if all bad blocks are actually acknowledged as there might be a > race if new bad

[PATCH] badblocks: badblocks_set/clear update unacked_exist

2016-10-20 Thread Shaohua Li
When badblocks_set acknowledges a range or badblocks_clear clears a range, it's possible that all bad blocks are acknowledged. We should update unacked_exist if this occurs. Signed-off-by: Shaohua Li <s...@fb.com> --- block/badblocks.c | 23 +++ 1 file changed, 23 insertions(+)
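The rescan that keeps `unacked_exist` accurate can be sketched with a simplified table. This is not the kernel's `struct badblocks` layout (which packs each range into a 64-bit word); `struct bb_table`, `struct bb_entry` and `bb_update_unacked_exist()` are hypothetical names used only to show the idea: after a set or clear, the flag is recomputed from the entries instead of being left stale.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical simplified bad-block range. */
struct bb_entry {
	unsigned long long start; /* first bad sector */
	int len;                  /* number of sectors */
	bool acked;               /* acknowledged by metadata handler */
};

/* Hypothetical simplified bad-block table. */
struct bb_table {
	struct bb_entry e[16];
	size_t count;
	bool unacked_exist;
};

/* Call after a range is acknowledged or cleared: the flag is true
 * only while at least one remaining range is still unacknowledged. */
static void bb_update_unacked_exist(struct bb_table *bb)
{
	bb->unacked_exist = false;
	for (size_t i = 0; i < bb->count; i++) {
		if (!bb->e[i].acked) {
			bb->unacked_exist = true;
			break;
		}
	}
}
```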

Re: "creative" bio usage in the RAID code

2016-11-14 Thread Shaohua Li
On Sat, Nov 12, 2016 at 09:42:38AM -0800, Christoph Hellwig wrote: > On Fri, Nov 11, 2016 at 11:02:23AM -0800, Shaohua Li wrote: > > > It's mostly about the RAID1 and RAID10 code which does a lot of funny > > > things with the bi_iov_vec and bi_vcnt fields, which we'd pref

Re: [PATCH V4 00/15] blk-throttle: add .high limit

2016-11-14 Thread Shaohua Li
On Mon, Nov 14, 2016 at 04:41:33PM -0800, Bart Van Assche wrote: > On 11/14/2016 04:05 PM, Shaohua Li wrote: > > On Mon, Nov 14, 2016 at 02:46:22PM -0800, Bart Van Assche wrote: > > > On 11/14/2016 02:22 PM, Shaohua Li wrote: > > > > The background is we don't have

Re: [PATCH V4 00/15] blk-throttle: add .high limit

2016-11-14 Thread Shaohua Li
On Mon, Nov 14, 2016 at 02:46:22PM -0800, Bart Van Assche wrote: > On 11/14/2016 02:22 PM, Shaohua Li wrote: > > The background is we don't have an ioscheduler for blk-mq yet, so we can't > > prioritize processes/cgroups. This patch set tries to add basic arbitration > > bet

Re: [PATCH V4 00/15] blk-throttle: add .high limit

2016-11-14 Thread Shaohua Li
On Mon, Nov 14, 2016 at 05:18:28PM -0800, Bart Van Assche wrote: > On 11/14/2016 04:49 PM, Shaohua Li wrote: > > On Mon, Nov 14, 2016 at 04:41:33PM -0800, Bart Van Assche wrote: > > > Thank you for pointing me to the discussion thread about v3 of this patch > > >

Re: "creative" bio usage in the RAID code

2016-11-11 Thread Shaohua Li
On Thu, Nov 10, 2016 at 11:46:36AM -0800, Christoph Hellwig wrote: > Hi Shaohua, > > one of the major issues with Ming Lei's multipage biovec works > is that we can't easily enabled the MD RAID code for it. I had > a quick chat on that with Chris and Jens and they suggested talking > to you

Re: [PATCH V4 09/15] blk-throttle: make bandwidth change smooth

2016-11-23 Thread Shaohua Li
On Wed, Nov 23, 2016 at 04:23:35PM -0500, Tejun Heo wrote: > Hello, > > On Mon, Nov 14, 2016 at 02:22:16PM -0800, Shaohua Li wrote: > > cg1/cg2 bps: 10/80 -> 15/105 -> 20/100 -> 25/95 -> 30/90 -> 35/85 -> 40/80 > > -> 45/75 -> 10/80 > > I wonde

Re: [PATCH V4 07/15] blk-throttle: make throtl_slice tunable

2016-11-22 Thread Shaohua Li
On Tue, Nov 22, 2016 at 04:27:15PM -0500, Tejun Heo wrote: > Hello, > > On Mon, Nov 14, 2016 at 02:22:14PM -0800, Shaohua Li wrote: > > throtl_slice is important for blk-throttling. A lot of stuffes depend on > > it, for example, throughput measurement. It has 100ms def

Re: [PATCH V4 10/15] blk-throttle: add a simple idle detection

2016-11-28 Thread Shaohua Li
On Mon, Nov 28, 2016 at 05:21:48PM -0500, Tejun Heo wrote: > Hello, Shaohua. > > On Wed, Nov 23, 2016 at 05:15:18PM -0800, Shaohua Li wrote: > > > Hmm... I'm not sure thinktime is the best measure here. Think time is > > > used by cfq mainly to tell the likely futur

Re: [PATCH V4 11/15] blk-throttle: add interface to configure think time threshold

2016-11-28 Thread Shaohua Li
On Mon, Nov 28, 2016 at 05:08:18PM -0500, Tejun Heo wrote: > Hello, Shaohua. > > On Wed, Nov 23, 2016 at 05:06:30PM -0800, Shaohua Li wrote: > > > Shouldn't this be a per-cgroup setting along with latency target? > > > These two are the parameters which d

Re: [PATCH V4 03/15] blk-throttle: configure bps/iops limit for cgroup in high limit

2016-11-22 Thread Shaohua Li
On Tue, Nov 22, 2016 at 03:16:43PM -0500, Tejun Heo wrote: > On Mon, Nov 14, 2016 at 02:22:10PM -0800, Shaohua Li wrote: > > each queue will have a state machine. Initially queue is in LIMIT_HIGH > > state, which means all cgroups will be throttled according to their high >

Re: [PATCH V4 05/15] blk-throttle: add downgrade logic

2016-11-22 Thread Shaohua Li
On Tue, Nov 22, 2016 at 04:42:00PM -0500, Tejun Heo wrote: > Hello, > > On Tue, Nov 22, 2016 at 04:21:21PM -0500, Tejun Heo wrote: > > 1. A cgroup and its high and max limits don't have much to do with > >other cgroups and their limits. I don't get how the choice between > >high and max

Re: [PATCH V4 00/15] blk-throttle: add .high limit

2016-11-15 Thread Shaohua Li
On Tue, Nov 15, 2016 at 11:53:39AM -0800, Bart Van Assche wrote: > On 11/14/2016 05:28 PM, Shaohua Li wrote: > > On Mon, Nov 14, 2016 at 05:18:28PM -0800, Bart Van Assche wrote: > > > Unless someone can convince me of the opposite I think that coming up with > > > an a

[PATCH V4 06/15] blk-throttle: make sure expire time isn't too big

2016-11-14 Thread Shaohua Li
not too big wouldn't change cgroup bps/iops, but could make it wake up more frequently, which isn't a big issue because throtl_slice * 8 is already quite big. Signed-off-by: Shaohua Li <s...@fb.com> --- block/blk-throttle.c | 4 1 file changed, 4 insertions(+) diff --git a/blo

[PATCH V4 02/15] blk-throttle: add .high interface

2016-11-14 Thread Shaohua Li
Add high limit for cgroup and corresponding cgroup interface. Signed-off-by: Shaohua Li <s...@fb.com> --- block/blk-throttle.c | 132 --- 1 file changed, 103 insertions(+), 29 deletions(-) diff --git a/block/blk-throttle.c b/block/blk-thro

[PATCH V4 03/15] blk-throttle: configure bps/iops limit for cgroup in high limit

2016-11-14 Thread Shaohua Li
for their high limit. Signed-off-by: Shaohua Li <s...@fb.com> --- block/blk-throttle.c | 21 +++-- 1 file changed, 19 insertions(+), 2 deletions(-) diff --git a/block/blk-throttle.c b/block/blk-throttle.c index a564215..ec53671 100644 --- a/block/blk-throttle.c +++ b/blo

[PATCH V4 13/15] blk-throttle: add a mechanism to estimate IO latency

2016-11-14 Thread Shaohua Li
-by: Shaohua Li <s...@fb.com> --- block/blk-throttle.c | 191 +- include/linux/blk_types.h | 2 + 2 files changed, 190 insertions(+), 3 deletions(-) diff --git a/block/blk-throttle.c b/block/blk-throttle.c index 01b494d..a05d351 100644 --- a

[PATCH V4 15/15] blk-throttle: add latency target support

2016-11-14 Thread Shaohua Li
for SSDs, as we can't calculate the latency target for hard disks. This is also only for cgroup leaf nodes so far. Signed-off-by: Shaohua Li <s...@fb.com> --- block/blk-throttle.c | 58 --- include/linux/blk_types.h | 1 + 2 files changed, 56 inse

[PATCH V4 07/15] blk-throttle: make throtl_slice tunable

2016-11-14 Thread Shaohua Li
. Signed-off-by: Shaohua Li <s...@fb.com> --- block/blk-sysfs.c| 11 block/blk-throttle.c | 77 +--- block/blk.h | 3 ++ 3 files changed, 69 insertions(+), 22 deletions(-) diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c

[PATCH V4 00/15] blk-throttle: add .high limit

2016-11-14 Thread Shaohua Li
ents http://marc.info/?l=linux-block&m=147395674732335&w=2 V1: http://marc.info/?l=linux-block&m=146292596425689&w=2 Shaohua Li (15): blk-throttle: prepare support multiple limits blk-throttle: add .high interface blk-throttle: configure bps/iops limit for cgroup in high limit blk-throttle: ad

[PATCH V4 11/15] blk-throttle: add interface to configure think time threshold

2016-11-14 Thread Shaohua Li
Add an interface to configure the threshold. Signed-off-by: Shaohua Li <s...@fb.com> --- block/blk-sysfs.c| 7 +++ block/blk-throttle.c | 25 + block/blk.h | 4 3 files changed, 36 insertions(+) diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c

[PATCH V4 01/15] blk-throttle: prepare support multiple limits

2016-11-14 Thread Shaohua Li
We are going to support high/max limits; each cgroup will have 2 limits after that. This patch prepares for the multiple limits change. Signed-off-by: Shaohua Li <s...@fb.com> --- block/blk-throttle.c | 110 --- 1 file changed, 69 insertions(

[PATCH V4 10/15] blk-throttle: add a simple idle detection

2016-11-14 Thread Shaohua Li
(by default 50us for SSD and 1ms for HD). The idea is that think time above the threshold will start to harm performance. HD is much slower, so a longer think time is OK. Signed-off-by: Shaohua Li <s...@fb.com> --- block/bio.c | 2 ++ block/blk-throttle.c

[PATCH V4 09/15] blk-throttle: make bandwidth change smooth

2016-11-14 Thread Shaohua Li
s something we pay for sharing. Note this doesn't completely prevent a cgroup from running under its high limit. The best way to guarantee a cgroup doesn't run under its limit is to set a max limit. For example, if we set cg1's max limit to 40, cg2 will never run under its high limit. Signed-off-by: Shaohua L

[PATCH V4 05/15] blk-throttle: add downgrade logic

2016-11-14 Thread Shaohua Li
When the queue state machine is in the LIMIT_MAX state but a cgroup has been below its high limit for some time, the queue should be downgraded to a lower state, as that cgroup's high limit isn't being met. Signed-off-by: Shaohua Li <s...@fb.com> --- block/blk-throttle.c

[PATCH V4 12/15] blk-throttle: ignore idle cgroup limit

2016-11-14 Thread Shaohua Li
The last patch introduces a way to detect idle cgroups. We use it to make upgrade/downgrade decisions. The new algorithm can also detect completely idle cgroups, so we can delete the corresponding code. Signed-off-by: Shaohua Li <s...@fb.com> --- block/blk-throttle.

[PATCH V4 14/15] blk-throttle: add interface for per-cgroup target latency

2016-11-14 Thread Shaohua Li
Add an interface for per-cgroup target latency. This latency is for 4k requests. Signed-off-by: Shaohua Li <s...@fb.com> --- block/blk-throttle.c | 67 1 file changed, 67 insertions(+) diff --git a/block/blk-throttle.c b/block/blk-thro

[PATCH V4 04/15] blk-throttle: add upgrade logic for LIMIT_HIGH state

2016-11-14 Thread Shaohua Li
limit, we can upgrade the queue state. The other case is a child with a higher high limit than its parent; the child's high limit is then meaningless. As long as the parent's bps/iops cross its high limit, we can upgrade the queue state. Signed-off-by: Shaohua Li <s...@fb.com> --- block/blk-throttle.c

[PATCH 1/2] block: immediately dispatch big size request

2016-10-28 Thread Shaohua Li
. Signed-off-by: Shaohua Li <s...@fb.com> --- block/blk-core.c | 4 +++- include/linux/blkdev.h | 1 + 2 files changed, 4 insertions(+), 1 deletion(-) diff --git a/block/blk-core.c b/block/blk-core.c index 14d7c07..0a396e9 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -1763,7 +

[PATCH 2/2] blk-mq: immediately dispatch big size request

2016-10-28 Thread Shaohua Li
This is the corresponding part for blk-mq. Disks with multiple hardware queues don't need this, as we hold at most 1 request. Signed-off-by: Shaohua Li <s...@fb.com> --- block/blk-mq.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/block/blk-mq.c b/block/bl

[PATCH V2 1/2] block: immediately dispatch big size request

2016-11-03 Thread Shaohua Li
: check the last request instead of the first request, so as long as there is one big request, we flush the plug. Signed-off-by: Shaohua Li <s...@fb.com> --- block/blk-core.c | 4 +++- include/linux/blkdev.h | 1 + 2 files changed, 4 insertions(+), 1 deletion(-) diff --git a/blo
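A toy model of the plug-flush rule (flush when enough requests are batched, or when the most recently added request is big) is below. The thresholds are assumptions for illustration, chosen to resemble `BLK_MAX_REQUEST_COUNT` (16) and `BLK_PLUG_FLUSH_SIZE` (128 KiB) in kernels of that era; verify them against your tree. `struct plug_sketch`, `plug_add()` and `plug_flush()` are hypothetical.

```c
#include <stddef.h>

/* Assumed thresholds for this sketch. */
#define PLUG_MAX_REQUESTS 16
#define PLUG_FLUSH_BYTES  (128u * 1024)

/* Hypothetical per-task plug: requests are batched here before dispatch. */
struct plug_sketch {
	unsigned int bytes[PLUG_MAX_REQUESTS];
	size_t count;
	size_t flushes;
};

static void plug_flush(struct plug_sketch *p)
{
	/* A real plug would dispatch the batched requests here. */
	p->count = 0;
	p->flushes++;
}

static void plug_add(struct plug_sketch *p, unsigned int rq_bytes)
{
	p->bytes[p->count++] = rq_bytes;

	/* Look at the last (just-added) request, not the first, so any
	 * single big request triggers an immediate dispatch. */
	if (p->count >= PLUG_MAX_REQUESTS ||
	    p->bytes[p->count - 1] >= PLUG_FLUSH_BYTES)
		plug_flush(p);
}
```

Three 4 KiB requests stay batched; adding a 256 KiB request flushes the whole batch at once.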

Re: [PATCH V2 2/2] blk-mq: immediately dispatch big size request

2016-11-03 Thread Shaohua Li
On Thu, Nov 03, 2016 at 05:09:54PM -0700, Christoph Hellwig wrote: > On Thu, Nov 03, 2016 at 05:03:54PM -0700, Shaohua Li wrote: > > This is corresponding part for blk-mq. Disk with multiple hardware > > queues doesn't need this as we only hold 1 request at most. > > A

Re: [PATCH V4 15/15] blk-throttle: add latency target support

2016-11-29 Thread Shaohua Li
On Tue, Nov 29, 2016 at 05:54:46PM -0500, Tejun Heo wrote: > Hello, > > On Tue, Nov 29, 2016 at 10:14:03AM -0800, Shaohua Li wrote: > > What the patches do doesn't conflict what you are talking about. We need a > > way > > to detect if cgroups are idle or active.

Re: raid0 vs. mkfs

2016-12-07 Thread Shaohua Li
On Wed, Dec 07, 2016 at 07:50:33PM +0800, Coly Li wrote: > On 2016/11/30 6:45 AM, Avi Kivity wrote: > > On 11/29/2016 11:14 PM, NeilBrown wrote: > [snip] > > >>> So I disagree that all the work should be pushed to the merging layer. > >>> It has less information to work with, so the fewer

Re: raid0 vs. mkfs

2016-12-08 Thread Shaohua Li
On Fri, Dec 09, 2016 at 12:44:57AM +0800, Coly Li wrote: > On 2016/12/8 12:59 AM, Shaohua Li wrote: > > On Wed, Dec 07, 2016 at 07:50:33PM +0800, Coly Li wrote: > [snip] > > Thanks for doing this, Coly! For raid0, this totally makes sense. The raid0 > > zones make things a l

[PATCH] block_dev: don't update file access position for sync direct IO

2016-12-13 Thread Shaohua Li
For sync direct IO, generic_file_direct_write/generic_file_read_iter will update the file access position. Don't duplicate the update in .direct_IO. This caused my RAID array to fail to assemble. Cc: Christoph Hellwig <h...@lst.de> Cc: Jens Axboe <ax...@fb.com> Signed-off-by: Shaohua Li

[PATCH V5 06/17] blk-throttle: add downgrade logic

2016-12-15 Thread Shaohua Li
When the queue state machine is in the LIMIT_MAX state but a cgroup has been below its low limit for some time, the queue should be downgraded to a lower state, as that cgroup's low limit isn't being met. Signed-off-by: Shaohua Li <s...@fb.com> --- block/blk-throttle.c

[PATCH V5 02/17] blk-throttle: prepare support multiple limits

2016-12-15 Thread Shaohua Li
We are going to support low/max limits; each cgroup will have 2 limits after that. This patch prepares for the multiple limits change. Signed-off-by: Shaohua Li <s...@fb.com> --- block/blk-throttle.c | 114 +-- 1 file changed, 73 insertions(

[PATCH V5 09/17] blk-throttle: detect completed idle cgroup

2016-12-15 Thread Shaohua Li
is hard. This patch handles a simple case: a cgroup that doesn't dispatch any IO. We ignore such a cgroup's limit, so other cgroups can use the bandwidth. Signed-off-by: Shaohua Li <s...@fb.com> --- block/blk-throttle.c | 19 ++- 1 file changed, 18 insertions(+), 1 deletion(-) diff

[PATCH V5 08/17] blk-throttle: make throtl_slice tunable

2016-12-15 Thread Shaohua Li
'throttle_sample_time' reflects its character better. Signed-off-by: Shaohua Li <s...@fb.com> --- Documentation/block/queue-sysfs.txt | 6 +++ block/blk-sysfs.c | 10 + block/blk-throttle.c| 77 ++--- block/blk.h

[PATCH V5 11/17] blk-throttle: add a simple idle detection

2016-12-15 Thread Shaohua Li
50us for SSD and 1ms for HD). The idea is that think time above the threshold will start to harm performance. HD is much slower, so a longer think time is OK. Signed-off-by: Shaohua Li <s...@fb.com> --- block/bio.c | 2 ++ block/blk-throttle.c

[PATCH V5 01/17] blk-throttle: use U64_MAX/UINT_MAX to replace -1

2016-12-15 Thread Shaohua Li
Clean up the code to avoid using -1. Signed-off-by: Shaohua Li <s...@fb.com> --- block/blk-throttle.c | 32 1 file changed, 16 insertions(+), 16 deletions(-) diff --git a/block/blk-throttle.c b/block/blk-throttle.c index a6bb4fe..e45bf50 100644 --- a/blo

[PATCH V5 03/17] blk-throttle: add .low interface

2016-12-15 Thread Shaohua Li
Add low limit for cgroup and corresponding cgroup interface. Signed-off-by: Shaohua Li <s...@fb.com> --- block/blk-throttle.c | 134 --- 1 file changed, 106 insertions(+), 28 deletions(-) diff --git a/block/blk-throttle.c b/block/blk-thro

[PATCH V5 13/17] blk-throttle: ignore idle cgroup limit

2016-12-15 Thread Shaohua Li
The last patch introduces a way to detect idle cgroups. We use it to make upgrade/downgrade decisions. The new algorithm can also detect completely idle cgroups, so we can delete the corresponding code. Signed-off-by: Shaohua Li <s...@fb.com> --- block/blk-throttle.

[PATCH V5 15/17] block: track request size in blk_issue_stat

2016-12-15 Thread Shaohua Li
Currently there is no way to know the request size when the request is finished. The next patch will need this info, so add it to blk_issue_stat. With this, we will have 49 bits to track time, which is still a very long time. Signed-off-by: Shaohua Li <s...@fb.com> --- block/blk-core.c
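To check the "very long time" claim: assuming the timestamp is kept in nanoseconds, 49 bits cover 2^49 ns = 562,949,953,421,312 ns, about 562,950 s or roughly 6.5 days. The packing below is a hypothetical layout for illustration (12 size bits, in sectors, above 49 time bits, leaving the top 3 bits spare); only the 49 time bits come from the commit message, the other field widths and units are assumptions.

```c
#include <stdint.h>

/* Hypothetical layout: [3 spare bits][12 size bits][49 time bits]. */
#define STAT_TIME_BITS 49
#define STAT_SIZE_BITS 12
#define STAT_TIME_MASK ((1ULL << STAT_TIME_BITS) - 1)
#define STAT_SIZE_MASK ((1ULL << STAT_SIZE_BITS) - 1)

/* Pack an issue timestamp and a request size (in sectors) into one u64.
 * Sizes too large for the field saturate at the field maximum. */
static uint64_t stat_pack(uint64_t time_ns, uint64_t size_sectors)
{
	if (size_sectors > STAT_SIZE_MASK)
		size_sectors = STAT_SIZE_MASK;
	return (size_sectors << STAT_TIME_BITS) | (time_ns & STAT_TIME_MASK);
}

static uint64_t stat_time(uint64_t stat)
{
	return stat & STAT_TIME_MASK;
}

static uint64_t stat_size(uint64_t stat)
{
	return (stat >> STAT_TIME_BITS) & STAT_SIZE_MASK;
}
```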

[PATCH V5 07/17] blk-throttle: make sure expire time isn't too big

2016-12-15 Thread Shaohua Li
not too big wouldn't change cgroup bps/iops, but could make it wake up more frequently, which isn't a big issue because throtl_slice * 8 is already quite big. Signed-off-by: Shaohua Li <s...@fb.com> --- block/blk-throttle.c | 4 1 file changed, 4 insertions(+) diff --git a/blo

[PATCH V5 16/17] blk-throttle: add a mechanism to estimate IO latency

2016-12-15 Thread Shaohua Li
this feature is SSD-only; we could probably use a fixed threshold like 4ms for hard disks, though. Signed-off-by: Shaohua Li <s...@fb.com> --- block/blk-stat.c | 4 ++ block/blk-throttle.c | 159 +- block/blk.h | 2 + include

[PATCH V5 14/17] blk-throttle: add interface for per-cgroup target latency

2016-12-15 Thread Shaohua Li
the interface in this way: echo "8:16 rbps=2097152 wbps=max latency=100 idle=200" > io.low (latency is in microseconds). Signed-off-by: Shaohua Li <s...@fb.com> --- block/blk-throttle.c | 30 ++ 1 file changed, 26 insertions(+), 4 deletions(-) dif
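Parsing a line in the `io.low` format shown above might be sketched like this. Only the keys that appear in the example (`rbps`, `wbps`, `latency`, `idle`) are handled; `struct io_low_conf`, `parse_value()` and `parse_io_low()` are hypothetical, unspecified fields are left at 0 (in line with the later review comment that the low limit should default to zero), and the literal `max` is mapped to `UINT64_MAX`.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed form of one io.low line. */
struct io_low_conf {
	unsigned int major, minor;    /* device number, e.g. 8:16 */
	uint64_t rbps, wbps;          /* bytes/s; UINT64_MAX means "max" */
	uint64_t latency_us, idle_us; /* microseconds */
};

/* Parse a decimal value or the literal "max". */
static bool parse_value(const char *v, uint64_t *out)
{
	char *end;

	if (strcmp(v, "max") == 0) {
		*out = UINT64_MAX;
		return true;
	}
	if (*v < '0' || *v > '9')
		return false;             /* reject signs, spaces, empty */
	*out = strtoull(v, &end, 10);
	return *end == '\0';
}

static bool parse_io_low(const char *line, struct io_low_conf *c)
{
	int n = 0;

	*c = (struct io_low_conf){ 0 };   /* unspecified fields default to 0 */
	if (sscanf(line, "%u:%u%n", &c->major, &c->minor, &n) != 2)
		return false;

	for (const char *p = line + n; *p != '\0';) {
		char tok[64];
		char *eq;
		uint64_t val;
		size_t len;

		while (*p == ' ')
			p++;
		if (*p == '\0')
			break;
		len = strcspn(p, " ");    /* length of one "key=value" token */
		if (len >= sizeof(tok))
			return false;
		memcpy(tok, p, len);
		tok[len] = '\0';
		p += len;

		eq = strchr(tok, '=');
		if (!eq)
			return false;
		*eq = '\0';               /* tok is now the key, eq + 1 the value */
		if (!parse_value(eq + 1, &val))
			return false;

		if (strcmp(tok, "rbps") == 0)
			c->rbps = val;
		else if (strcmp(tok, "wbps") == 0)
			c->wbps = val;
		else if (strcmp(tok, "latency") == 0)
			c->latency_us = val;
		else if (strcmp(tok, "idle") == 0)
			c->idle_us = val;
		else
			return false;     /* unknown key */
	}
	return true;
}
```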

[PATCH V5 04/17] blk-throttle: configure bps/iops limit for cgroup in low limit

2016-12-15 Thread Shaohua Li
configured by the user. For the low limit, a cgroup will use the minimum of the low limit and the max limit configured by the user. The last patch already did the conversion. Signed-off-by: Shaohua Li <s...@fb.com> --- block/blk-throttle.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/block/blk-thrott

[PATCH V5 12/17] blk-throttle: add interface to configure idle time threshold

2016-12-15 Thread Shaohua Li
Add an interface to configure the threshold. The io.low interface will look like: echo "8:16 rbps=2097152 wbps=max idle=2000" > io.low (idle is in microseconds). Signed-off-by: Shaohua Li <s...@fb.com> --- block/blk-throttle.c | 27 --- 1 file changed,

[PATCH V5 17/17] blk-throttle: add latency target support

2016-12-15 Thread Shaohua Li
idle and other cgroups can dispatch more IO. Currently this latency target check is only for SSDs, as we can't calculate the latency target for hard disks. This is also only for cgroup leaf nodes so far. Signed-off-by: Shaohua Li <s...@fb.com> --- block/blk-throttle.

Re: [PATCH V5 00/17] blk-throttle: add .low limit

2017-01-09 Thread Shaohua Li
On Mon, Jan 09, 2017 at 04:46:35PM -0500, Tejun Heo wrote: > Hello, > > Sorry about the long delay. Generally looks good to me. Overall, > there are only a few things that I think should be addressed. Thanks for your time! > * Low limit should default to zero. I forgot to change it after

Re: [PATCH V4 15/15] blk-throttle: add latency target support

2016-11-29 Thread Shaohua Li
On Tue, Nov 29, 2016 at 12:31:08PM -0500, Tejun Heo wrote: > Hello, > > On Mon, Nov 14, 2016 at 02:22:22PM -0800, Shaohua Li wrote: > > One hard problem adding .high limit is to detect idle cgroup. If one > > cgroup doesn't dispatch enough IO against its high limit, we must

Re: [PATCH V4 13/15] blk-throttle: add a mechanism to estimate IO latency

2016-11-29 Thread Shaohua Li
On Tue, Nov 29, 2016 at 12:24:35PM -0500, Tejun Heo wrote: > Hello, Shaohua. > > On Mon, Nov 14, 2016 at 02:22:20PM -0800, Shaohua Li wrote: > > To do this, we sample some data, eg, average latency for request size > > 4k, 8k, 16k, 32k, 64k. We then use an equation f(

[PATCH V6 09/18] blk-throttle: choose a small throtl_slice for SSD

2017-01-14 Thread Shaohua Li
The throtl_slice is 100ms by default. This is a long time for an SSD, during which a lot of IO can run. To give cgroups smoother throughput, we choose a smaller value (20ms) for SSDs. Signed-off-by: Shaohua Li <s...@fb.com> --- block/blk-sysfs.c| 2 ++ block/blk-throttle.c | 18 +++---

[PATCH V6 14/18] blk-throttle: ignore idle cgroup limit

2017-01-14 Thread Shaohua Li
The last patch introduces a way to detect idle cgroups. We use it to make upgrade/downgrade decisions. The new algorithm can also detect completely idle cgroups, so we can delete the corresponding code. Signed-off-by: Shaohua Li <s...@fb.com> --- block/blk-throttle.

[PATCH V6 17/18] blk-throttle: add a mechanism to estimate IO latency

2017-01-14 Thread Shaohua Li
this feature is SSD-only; we could probably use a fixed threshold like 4ms for hard disks, though. Signed-off-by: Shaohua Li <s...@fb.com> --- block/blk-stat.c | 4 ++ block/blk-throttle.c | 162 -- block/blk.h | 2 + include

[PATCH V6 00/18] blk-throttle: add .low limit

2017-01-14 Thread Shaohua Li
ttp://marc.info/?l=linux-block&m=147395674732335&w=2 V1: http://marc.info/?l=linux-block&m=146292596425689&w=2 Shaohua Li (18): blk-throttle: use U64_MAX/UINT_MAX to replace -1 blk-throttle: prepare support multiple limits blk-throttle: add .low interface blk-throttle: configure bps/iops limit fo

[PATCH V6 13/18] blk-throttle: add interface to configure idle time threshold

2017-01-14 Thread Shaohua Li
Add an interface to configure the threshold. The io.low interface will look like: echo "8:16 rbps=2097152 wbps=max idle=2000" > io.low (idle is in microseconds). Signed-off-by: Shaohua Li <s...@fb.com> --- block/blk-throttle.c | 41 -

[PATCH V6 10/18] blk-throttle: detect completed idle cgroup

2017-01-14 Thread Shaohua Li
. Signed-off-by: Shaohua Li <s...@fb.com> --- block/blk-throttle.c | 19 ++- 1 file changed, 18 insertions(+), 1 deletion(-) diff --git a/block/blk-throttle.c b/block/blk-throttle.c index 2d05c91..b3ce176 100644 --- a/block/blk-throttle.c +++ b/block/blk-throttle.c @@ -149,6

[PATCH V6 05/18] blk-throttle: add upgrade logic for LIMIT_LOW state

2017-01-14 Thread Shaohua Li
as the parent's bps/iops (which is the sum of its children's bps/iops) cross its low limit, we can upgrade the queue state. Signed-off-by: Shaohua Li <s...@fb.com> --- block/blk-throttle.c | 100 --- 1 file changed, 96 insertions(+), 4 deletions(-) diff --git a/blo
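The hierarchy rule above (a child's larger low limit stops mattering once its parent, whose rate is the sum of its children's, crosses its own low limit) can be sketched as a walk up the cgroup tree. This is a simplified model: `struct tg_node`, `tg_low_satisfied()` and `queue_can_upgrade()` are hypothetical, only bps is modelled (no iops, no idle detection), the root cgroup is left out, and "crossed the low limit" is approximated by comparing a measured rate against the limit.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical throttle-group node. For a parent, bps is the sum of
 * its children's rates. The root cgroup is not modelled. */
struct tg_node {
	struct tg_node *parent;
	uint64_t bps;      /* measured rate */
	uint64_t low_bps;  /* 0 means no low limit configured */
};

/* True if this cgroup, or any ancestor, has crossed its low limit.
 * A cgroup with no low limit of its own never blocks an upgrade. */
static bool tg_low_satisfied(const struct tg_node *tg)
{
	if (tg->low_bps == 0)
		return true;
	for (; tg; tg = tg->parent)
		if (tg->low_bps && tg->bps >= tg->low_bps)
			return true;
	return false;
}

/* The queue may move from LIMIT_LOW to LIMIT_MAX only when every
 * leaf cgroup is satisfied. */
static bool queue_can_upgrade(struct tg_node *const leaves[], size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (!tg_low_satisfied(leaves[i]))
			return false;
	return true;
}
```

With a parent low limit of 100 and two children each limited to 150, children running at 60 each keep the parent at 120, so the queue can upgrade even though neither child reaches its own (meaningless) limit.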

[PATCH V6 04/18] blk-throttle: configure bps/iops limit for cgroup in low limit

2017-01-14 Thread Shaohua Li
-by: Shaohua Li <s...@fb.com> --- block/blk-throttle.c | 20 ++-- 1 file changed, 18 insertions(+), 2 deletions(-) diff --git a/block/blk-throttle.c b/block/blk-throttle.c index d3ad43c..3bc6deb 100644 --- a/block/blk-throttle.c +++ b/block/blk-throttle.c @@ -212,12 +212,28 @@

[PATCH V6 03/18] blk-throttle: add .low interface

2017-01-14 Thread Shaohua Li
configuration. The old bps/iops fields in throtl_grp will be the actual limits we use for throttling. Signed-off-by: Shaohua Li <s...@fb.com> --- block/blk-throttle.c | 142 +-- 1 file changed, 114 insertions(+), 28 deletions(-) diff --git a/block/blk-thro

[PATCH V6 11/18] blk-throttle: make bandwidth change smooth

2017-01-14 Thread Shaohua Li
then fully downgrade the queue to the LIMIT_LOW state. Note this doesn't completely prevent a cgroup from running under its low limit. The best way to guarantee a cgroup doesn't run under its limit is to set a max limit. For example, if we set cg1's max limit to 40, cg2 will never run under its low limit. Signed-o

Re: [PATCH v3 08/14] block: introduce bio_copy_data_partial

2017-03-23 Thread Shaohua Li
Jens, can you look at this patch? If it's OK, I'd like to route it through the md tree. Thanks, Shaohua On Fri, Mar 17, 2017 at 12:12:29AM +0800, Ming Lei wrote: > Turns out we can use bio_copy_data in raid1's write behind, > and we can make alloc_behind_pages() more clean/efficient, > but we need

Re: [PATCH v3 02/14] md: move two macros into md.h

2017-03-24 Thread Shaohua Li
On Fri, Mar 24, 2017 at 04:57:37PM +1100, Neil Brown wrote: > On Fri, Mar 17 2017, Ming Lei wrote: > > > Both raid1 and raid10 share common resync > > block size and page count, so move them into md.h. > > I don't think this is necessary. > These are just "magic" numbers. They don't have any
