Re: [PATCH V4 13/15] blk-throttle: add a mechanism to estimate IO latency

2016-11-14 Thread kbuild test robot
Hi Shaohua, [auto build test WARNING on linus/master] [also build test WARNING on v4.9-rc5] [cannot apply to block/for-next next-20161114] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Shaohua

Re: [PATCH] blk-mq drivers: untangle 0 and BLK_MQ_RQ_QUEUE_OK

2016-11-14 Thread Jens Axboe
On 11/14/2016 04:49 PM, Omar Sandoval wrote: From: Omar Sandoval Let's not depend on any of the BLK_MQ_RQ_QUEUE_* constants having specific values. No functional change. Signed-off-by: Omar Sandoval --- Hi, Jens, Some more trivial cleanup, feel free to apply

Re: "creative" bio usage in the RAID code

2016-11-14 Thread Ming Lei
On Tue, Nov 15, 2016 at 8:13 AM, Shaohua Li wrote: > On Sat, Nov 12, 2016 at 09:42:38AM -0800, Christoph Hellwig wrote: >> On Fri, Nov 11, 2016 at 11:02:23AM -0800, Shaohua Li wrote: >> > > It's mostly about the RAID1 and RAID10 code which does a lot of funny >> > > things with

Re: [PATCH V4 00/15] blk-throttle: add .high limit

2016-11-14 Thread Shaohua Li
On Mon, Nov 14, 2016 at 05:18:28PM -0800, Bart Van Assche wrote: > On 11/14/2016 04:49 PM, Shaohua Li wrote: > > On Mon, Nov 14, 2016 at 04:41:33PM -0800, Bart Van Assche wrote: > > > Thank you for pointing me to the discussion thread about v3 of this patch > > > series. Did I see correctly that

Re: [PATCH V4 00/15] blk-throttle: add .high limit

2016-11-14 Thread Bart Van Assche
On 11/14/2016 04:49 PM, Shaohua Li wrote: On Mon, Nov 14, 2016 at 04:41:33PM -0800, Bart Van Assche wrote: Thank you for pointing me to the discussion thread about v3 of this patch series. Did I see correctly that one of the conclusions was that for users this mechanism is hard to configure?

Re: [PATCH 08/12] dm: dm.c: replace 'bio->bi_vcnt == 1' with !bio_multiple_segments

2016-11-14 Thread Ming Lei
On Fri, Nov 11, 2016 at 8:05 PM, Ming Lei wrote: > Avoid accessing .bi_vcnt directly, because the bio can be > split by the block layer, and .bi_vcnt should never have > been used here. > > Signed-off-by: Ming Lei > --- > drivers/md/dm-rq.c | 7

Re: [PATCH 07/12] dm: use bvec iterator helpers to implement .get_page and .next_page

2016-11-14 Thread Ming Lei
On Fri, Nov 11, 2016 at 8:05 PM, Ming Lei wrote: > Firstly, we have mature bvec/bio iterator helpers for iterating over each > page in one bio, so there is no need to reinvent the wheel. > > Secondly, the coming multipage bvecs require this patch. > > Also add comments about the

Re: [PATCH V4 00/15] blk-throttle: add .high limit

2016-11-14 Thread Shaohua Li
On Mon, Nov 14, 2016 at 04:41:33PM -0800, Bart Van Assche wrote: > On 11/14/2016 04:05 PM, Shaohua Li wrote: > > On Mon, Nov 14, 2016 at 02:46:22PM -0800, Bart Van Assche wrote: > > > On 11/14/2016 02:22 PM, Shaohua Li wrote: > > > > The background is we don't have an ioscheduler for blk-mq yet,

Re: [PATCH V4 00/15] blk-throttle: add .high limit

2016-11-14 Thread Bart Van Assche
On 11/14/2016 04:05 PM, Shaohua Li wrote: On Mon, Nov 14, 2016 at 02:46:22PM -0800, Bart Van Assche wrote: On 11/14/2016 02:22 PM, Shaohua Li wrote: The background is we don't have an ioscheduler for blk-mq yet, so we can't prioritize processes/cgroups. This patch set tries to add basic

Re: "creative" bio usage in the RAID code

2016-11-14 Thread Shaohua Li
On Sat, Nov 12, 2016 at 09:42:38AM -0800, Christoph Hellwig wrote: > On Fri, Nov 11, 2016 at 11:02:23AM -0800, Shaohua Li wrote: > > > It's mostly about the RAID1 and RAID10 code which does a lot of funny > > > things with the bi_iov_vec and bi_vcnt fields, which we'd prefer that > > > drivers

Re: [PATCH V4 00/15] blk-throttle: add .high limit

2016-11-14 Thread Shaohua Li
On Mon, Nov 14, 2016 at 02:46:22PM -0800, Bart Van Assche wrote: > On 11/14/2016 02:22 PM, Shaohua Li wrote: > > The background is we don't have an ioscheduler for blk-mq yet, so we can't > > prioritize processes/cgroups. This patch set tries to add basic arbitration > > between cgroups with

Re: [PATCH] loop: return proper error from loop_queue_rq()

2016-11-14 Thread Jens Axboe
On 11/14/2016 03:56 PM, Omar Sandoval wrote: From: Omar Sandoval ->queue_rq() should return one of the BLK_MQ_RQ_QUEUE_* constants, not an errno. Thanks Omar, applied. -- Jens Axboe -- To unsubscribe from this list: send the line "unsubscribe linux-block" in the body of a

[PATCH] loop: return proper error from loop_queue_rq()

2016-11-14 Thread Omar Sandoval
From: Omar Sandoval ->queue_rq() should return one of the BLK_MQ_RQ_QUEUE_* constants, not an errno. f4aa4c7bbac6 ("block: loop: convert to per-device workqueue") Signed-off-by: Omar Sandoval --- drivers/block/loop.c | 2 +- 1 file changed, 1 insertion(+), 1

Re: [PATCH V4 00/15] blk-throttle: add .high limit

2016-11-14 Thread Bart Van Assche
On 11/14/2016 02:22 PM, Shaohua Li wrote: The background is we don't have an ioscheduler for blk-mq yet, so we can't prioritize processes/cgroups. This patch set tries to add basic arbitration between cgroups with blk-throttle. It adds a new limit io.high for blk-throttle. It's only for cgroup2.

Re: blk-mq: preparation of the ground for BFQ inclusion

2016-11-14 Thread Jens Axboe
On 11/14/2016 03:13 PM, Paolo Valente wrote: Hi Jens, have you had time to look into the first extensions/changes required? Any time plan? I was busy last week on the writeback and on the polling. I should have something end this week. -- Jens Axboe

[PATCH V4 01/15] blk-throttle: prepare support multiple limits

2016-11-14 Thread Shaohua Li
We are going to support high/max limit, each cgroup will have 2 limits after that. This patch prepares for the multiple limits change. Signed-off-by: Shaohua Li --- block/blk-throttle.c | 110 --- 1 file changed, 69 insertions(+), 41

[PATCH V4 09/15] blk-throttle: make bandwidth change smooth

2016-11-14 Thread Shaohua Li
When cgroups all reach high limit, cgroups can dispatch more IO. This could make some cgroups dispatch more IO but others not, and even some cgroups could dispatch less IO than their high limit. For example, cg1 high limit 10MB/s, cg2 limit 80MB/s, assume disk maximum bandwidth is 120M/s for the

[PATCH V4 05/15] blk-throttle: add downgrade logic

2016-11-14 Thread Shaohua Li
When the queue state machine is in the LIMIT_MAX state but a cgroup stays below its high limit for some time, the queue should be downgraded to a lower state, since that cgroup's high limit isn't met. Signed-off-by: Shaohua Li --- block/blk-throttle.c | 188

[PATCH V4 10/15] blk-throttle: add a simple idle detection

2016-11-14 Thread Shaohua Li
A cgroup gets assigned a high limit, but the cgroup could never dispatch enough IO to cross the high limit. In such a case, the queue state machine will remain in LIMIT_HIGH state and all other cgroups will be throttled according to the high limit. This is unfair for other cgroups. We should treat the

[PATCH V4 12/15] blk-throttle: ignore idle cgroup limit

2016-11-14 Thread Shaohua Li
Last patch introduces a way to detect idle cgroup. We use it to make upgrade/downgrade decision. And the new algorithm can detect completely idle cgroup too, so we can delete the corresponding code. Signed-off-by: Shaohua Li --- block/blk-throttle.c | 39

[PATCH V4 14/15] blk-throttle: add interface for per-cgroup target latency

2016-11-14 Thread Shaohua Li
Add interface for per-cgroup target latency. This latency is for 4k request. Signed-off-by: Shaohua Li --- block/blk-throttle.c | 67 1 file changed, 67 insertions(+) diff --git a/block/blk-throttle.c b/block/blk-throttle.c

[PATCH V4 04/15] blk-throttle: add upgrade logic for LIMIT_HIGH state

2016-11-14 Thread Shaohua Li
When the queue is in LIMIT_HIGH state and all cgroups with a high limit cross the bps/iops limitation, we will upgrade the queue's state to LIMIT_MAX. For a cgroup hierarchy, there are two cases. Children have a lower high limit than the parent; the parent's high limit is meaningless. If children's bps/iops cross high

[PATCH V4 11/15] blk-throttle: add interface to configure think time threshold

2016-11-14 Thread Shaohua Li
Add interface to configure the threshold. Signed-off-by: Shaohua Li --- block/blk-sysfs.c | 7 +++ block/blk-throttle.c | 25 + block/blk.h | 4 3 files changed, 36 insertions(+) diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c index

[PATCH V4 06/15] blk-throttle: make sure expire time isn't too big

2016-11-14 Thread Shaohua Li
A cgroup could be throttled to a limit, but when all cgroups cross the high limit, the queue enters a higher state, so the cgroup should be throttled to a higher limit. It's possible the cgroup is sleeping because of throttle and other cgroups don't dispatch IO any more. In this case, nobody can trigger

[PATCH V4 02/15] blk-throttle: add .high interface

2016-11-14 Thread Shaohua Li
Add high limit for cgroup and corresponding cgroup interface. Signed-off-by: Shaohua Li --- block/blk-throttle.c | 132 --- 1 file changed, 103 insertions(+), 29 deletions(-) diff --git a/block/blk-throttle.c b/block/blk-throttle.c

[PATCH V4 03/15] blk-throttle: configure bps/iops limit for cgroup in high limit

2016-11-14 Thread Shaohua Li
Each queue will have a state machine. Initially the queue is in LIMIT_HIGH state, which means all cgroups will be throttled according to their high limit. After all cgroups with a high limit cross the limit, the queue state gets upgraded to LIMIT_MAX state. cgroups without high limit will use max limit

[PATCH V4 13/15] blk-throttle: add a mechanism to estimate IO latency

2016-11-14 Thread Shaohua Li
We try to set a latency target for each cgroup. The problem is that latency highly depends on request size, and users can't configure a target for every request size. The idea is that users configure a latency target for 4k IO, and we estimate the target latency for IO of other request sizes. To do this, we sample

[PATCH V4 00/15] blk-throttle: add .high limit

2016-11-14 Thread Shaohua Li
Hi, The background is we don't have an ioscheduler for blk-mq yet, so we can't prioritize processes/cgroups. This patch set tries to add basic arbitration between cgroups with blk-throttle. It adds a new limit io.high for blk-throttle. It's only for cgroup2. io.max is a hard limit throttling.

[PATCH V4 07/15] blk-throttle: make throtl_slice tunable

2016-11-14 Thread Shaohua Li
throtl_slice is important for blk-throttling. A lot of stuff depends on it, for example, throughput measurement. It has a 100ms default value, which is not appropriate for all disks. For example, for SSD we might use a smaller value to make the throughput smoother. This patch makes it tunable.

[PATCH V4 15/15] blk-throttle: add latency target support

2016-11-14 Thread Shaohua Li
One hard problem adding .high limit is to detect idle cgroup. If one cgroup doesn't dispatch enough IO against its high limit, we must have a mechanism to determine if other cgroups dispatch more IO. We added the think time detection mechanism before, but it doesn't work for all workloads. Here we

blk-mq: preparation of the ground for BFQ inclusion

2016-11-14 Thread Paolo Valente
Hi Jens, have you had time to look into the first extensions/changes required? Any time plan? Thanks, Paolo

Re: [PATCH] sd_zbc: Force use of READ16/WRITE16

2016-11-14 Thread Jens Axboe
On 11/10/2016 10:53 PM, Damien Le Moal wrote: Normally, sd_read_capacity sets sdp->use_16_for_rw to 1 based on the disk capacity so that READ16/WRITE16 are used for large drives. However, for a zoned disk with RC_BASIS set to 0, the capacity reported through READ_CAPACITY may be very small,

Re: [PATCH 3/3] blk-mq: make the polling code adaptive

2016-11-14 Thread Jens Axboe
On 11/14/2016 12:43 PM, Omar Sandoval wrote: ,9 +2539,10 @@ static bool blk_mq_poll_hybrid_sleep(struct request_queue *q, * This will be replaced with the stats tracking code, using * 'avg_completion_time / 2' as the pre-sleep target. */ - kt = ktime_set(0,

Re: [PATCH 3/3] blk-mq: make the polling code adaptive

2016-11-14 Thread Omar Sandoval
On Fri, Nov 11, 2016 at 10:11:27PM -0700, Jens Axboe wrote: > The previous commit introduced the hybrid sleep/poll mode. Take > that one step further, and use the completion latencies to > automatically sleep for half the mean completion time. This is > a good approximation. > > This changes the

Re: [PATCH 1/3] block: fast-path for small and simple direct I/O requests

2016-11-14 Thread Omar Sandoval
On Fri, Nov 11, 2016 at 10:11:25PM -0700, Jens Axboe wrote: > From: Christoph Hellwig > > This patch adds a small and simple fast path for small direct I/O > requests on block devices that don't use AIO. Between the neat > bio_iov_iter_get_pages helper that avoids allocating a

[PATCH 2/2] blk-mq: Avoid memory reclaim when remapping queues

2016-11-14 Thread Gabriel Krisman Bertazi
While stressing memory and IO and changing SMT settings at the same time, we were able to consistently trigger deadlocks in the mm system, which froze the entire machine. I think that under memory stress conditions, the large allocations performed by blk_mq_init_rq_map may trigger a reclaim, which

[PATCH 1/2] blk-mq: Fix failed allocation path when mapping queues

2016-11-14 Thread Gabriel Krisman Bertazi
In blk_mq_map_swqueue, there is a memory optimization that frees the tags of a queue that has gone unmapped. Later, if that hctx is remapped after another topology change, the tags need to be reallocated. If this allocation fails, a simple WARN_ON triggers, but the block layer ends up with an

Re: [PATCHSET] Add support for simplified async direct-io

2016-11-14 Thread Jens Axboe
On 11/14/2016 11:11 AM, Christoph Hellwig wrote: On Mon, Nov 14, 2016 at 11:08:46AM -0700, Jens Axboe wrote: It'd be cleaner to loop one level out, and avoid all that 'dio' stuff instead. And then still retain the separate parts of the sync and async. There's nothing to share there imho, and it

Re: [PATCHSET] Add support for simplified async direct-io

2016-11-14 Thread Jens Axboe
On 11/14/2016 11:05 AM, Christoph Hellwig wrote: On Mon, Nov 14, 2016 at 11:02:47AM -0700, Jens Axboe wrote: We need it unless we want unbounded allocations for the biovec. With a 1GB I/O we're at a page size allocation, and with 64MB I/Os that aren't unheard of we'd be up to 64 pages or an

Re: [PATCHSET] Add support for simplified async direct-io

2016-11-14 Thread Christoph Hellwig
On Mon, Nov 14, 2016 at 11:02:47AM -0700, Jens Axboe wrote: > > We need it unless we want unbounded allocations for the biovec. With a > > 1GB I/O we're at a page size allocation, and with 64MB I/Os that aren't > > unheard of we'd be up to 64 pages or an order 6 allocation which will > > take

Re: [PATCHSET] Add support for simplified async direct-io

2016-11-14 Thread Jens Axboe
On 11/14/2016 11:00 AM, Christoph Hellwig wrote: On Mon, Nov 14, 2016 at 10:47:46AM -0700, Jens Axboe wrote: This seems less clean in basically all ways, not sure I agree with you. We already have 4 vecs inlined in a generic bio, and we might as well use the fs bioset instead of creating our

Re: [PATCHSET] Add support for simplified async direct-io

2016-11-14 Thread Christoph Hellwig
On Mon, Nov 14, 2016 at 10:47:46AM -0700, Jens Axboe wrote: > This seems less clean in basically all ways, not sure I agree with you. > We already have 4 vecs inlined in a generic bio, and we might as well > use the fs bioset instead of creating our own. You also add a smallish > dio to track

Re: [PATCHSET] Add support for simplified async direct-io

2016-11-14 Thread Christoph Hellwig
On Mon, Nov 14, 2016 at 10:28:37AM -0700, Jens Axboe wrote: > This is on top of for-4.10/dio, which has Christoph's simplified sync > O_DIRECT support and the IO poll bits. > > The restriction on 4 inline vecs is removed on the sync support, we > just allocate the bio_vec array if we have to. I

[PATCH 2/2] block: add support for async simple direct-io for bdevs

2016-11-14 Thread Jens Axboe
Signed-off-by: Jens Axboe --- fs/block_dev.c | 76 ++ 1 file changed, 66 insertions(+), 10 deletions(-) diff --git a/fs/block_dev.c b/fs/block_dev.c index 2010997fd326..62ca4ce21222 100644 --- a/fs/block_dev.c +++

[PATCH 1/2] block: support any sized IO for simplified bdev direct-io

2016-11-14 Thread Jens Axboe
Just alloc the bio_vec array if we exceed the inline limit. Signed-off-by: Jens Axboe --- fs/block_dev.c | 17 ++--- 1 file changed, 14 insertions(+), 3 deletions(-) diff --git a/fs/block_dev.c b/fs/block_dev.c index 7c3ec6049073..2010997fd326 100644 ---

[PATCHSET] Add support for simplified async direct-io

2016-11-14 Thread Jens Axboe
This is on top of for-4.10/dio, which has Christoph's simplified sync O_DIRECT support and the IO poll bits. The restriction on 4 inline vecs is removed on the sync support, we just allocate the bio_vec array if we have to. I realize this negates parts of the win of the patch for sync, but it's

Re: [PATCH] bsg: Add sparse annotations to bsg_request_fn()

2016-11-14 Thread Bart Van Assche
On 09/25/2016 07:54 PM, Bart Van Assche wrote: Avoid that sparse complains about unbalanced lock actions. Signed-off-by: Bart Van Assche --- block/bsg-lib.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/block/bsg-lib.c b/block/bsg-lib.c index

Re: [PATCH] sd_zbc: Force use of READ16/WRITE16

2016-11-14 Thread Christoph Hellwig
Looks fine, Reviewed-by: Christoph Hellwig

Re: [PATCH 2/2] blk-wbt: remove stat ops

2016-11-14 Thread Jan Kara
On Fri 11-11-16 08:21:57, Jens Axboe wrote: > Again a leftover from when the throttling code was generic. Now that we > just have the block user, get rid of the stat ops and indirections. > > Signed-off-by: Jens Axboe Looks good to me. You can add: Reviewed-by: Jan Kara

Re: "creative" bio usage in the RAID code

2016-11-14 Thread Christoph Hellwig
On Mon, Nov 14, 2016 at 09:53:46AM +1100, NeilBrown wrote: > > While we're at it - I find the way MD_RECOVERY_REQUESTED is used highly > > confusing, and I'm not 100% sure it's correct. After all we check it > > in r1buf_pool_alloc, which is a mempool alloc callback, so we rely > > on these

Re: "creative" bio usage in the RAID code

2016-11-14 Thread Christoph Hellwig
On Mon, Nov 14, 2016 at 10:03:20AM +1100, NeilBrown wrote: > I would suggest adding a "bi_dev_private" field to the bio which is for > use by the lowest-level driver (much as bi_private is for use by the > top-level initiator). > That could be in a union with any or all of: > unsigned int