Hi Shaohua,
[auto build test WARNING on linus/master]
[also build test WARNING on v4.9-rc5]
[cannot apply to block/for-next next-20161114]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system]
url:
https://github.com/0day-ci/linux/commits/Shaohua
On 11/14/2016 04:49 PM, Omar Sandoval wrote:
From: Omar Sandoval
Let's not depend on any of the BLK_MQ_RQ_QUEUE_* constants having
specific values. No functional change.
Signed-off-by: Omar Sandoval
---
Hi, Jens,
Some more trivial cleanup, feel free to apply
On Tue, Nov 15, 2016 at 8:13 AM, Shaohua Li wrote:
> On Sat, Nov 12, 2016 at 09:42:38AM -0800, Christoph Hellwig wrote:
>> On Fri, Nov 11, 2016 at 11:02:23AM -0800, Shaohua Li wrote:
>> > > It's mostly about the RAID1 and RAID10 code which does a lot of funny
>> > > things with
On Mon, Nov 14, 2016 at 05:18:28PM -0800, Bart Van Assche wrote:
> On 11/14/2016 04:49 PM, Shaohua Li wrote:
> > On Mon, Nov 14, 2016 at 04:41:33PM -0800, Bart Van Assche wrote:
> > > Thank you for pointing me to the discussion thread about v3 of this patch
> > > series. Did I see correctly that
On 11/14/2016 04:49 PM, Shaohua Li wrote:
On Mon, Nov 14, 2016 at 04:41:33PM -0800, Bart Van Assche wrote:
Thank you for pointing me to the discussion thread about v3 of this patch
series. Did I see correctly that one of the conclusions was that for users
this mechanism is hard to configure?
On Fri, Nov 11, 2016 at 8:05 PM, Ming Lei wrote:
> Avoid accessing .bi_vcnt directly, because the bio can be
> split by the block layer, and .bi_vcnt should never have
> been used here.
>
> Signed-off-by: Ming Lei
> ---
> drivers/md/dm-rq.c | 7
On Fri, Nov 11, 2016 at 8:05 PM, Ming Lei wrote:
> Firstly, we have mature bvec/bio iterator helpers to iterate over each
> page in one bio, so there is no need to reinvent the wheel to do that.
>
> Secondly the coming multipage bvecs requires this patch.
>
> Also add comments about the
On Mon, Nov 14, 2016 at 04:41:33PM -0800, Bart Van Assche wrote:
> On 11/14/2016 04:05 PM, Shaohua Li wrote:
> > On Mon, Nov 14, 2016 at 02:46:22PM -0800, Bart Van Assche wrote:
> > > On 11/14/2016 02:22 PM, Shaohua Li wrote:
> > > > The background is we don't have an ioscheduler for blk-mq yet,
On 11/14/2016 04:05 PM, Shaohua Li wrote:
On Mon, Nov 14, 2016 at 02:46:22PM -0800, Bart Van Assche wrote:
On 11/14/2016 02:22 PM, Shaohua Li wrote:
The background is we don't have an ioscheduler for blk-mq yet, so we can't
prioritize processes/cgroups. This patch set tries to add basic
On Sat, Nov 12, 2016 at 09:42:38AM -0800, Christoph Hellwig wrote:
> On Fri, Nov 11, 2016 at 11:02:23AM -0800, Shaohua Li wrote:
> > > It's mostly about the RAID1 and RAID10 code which does a lot of funny
>> > > things with the bi_io_vec and bi_vcnt fields, which we'd prefer that
> > > drivers
On Mon, Nov 14, 2016 at 02:46:22PM -0800, Bart Van Assche wrote:
> On 11/14/2016 02:22 PM, Shaohua Li wrote:
> > The background is we don't have an ioscheduler for blk-mq yet, so we can't
> > prioritize processes/cgroups. This patch set tries to add basic arbitration
> > between cgroups with
Hi Shaohua,
[auto build test WARNING on linus/master]
[also build test WARNING on v4.9-rc5]
[cannot apply to block/for-next next-20161114]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system]
url:
https://github.com/0day-ci/linux/commits/Shaohua
On 11/14/2016 03:56 PM, Omar Sandoval wrote:
From: Omar Sandoval
->queue_rq() should return one of the BLK_MQ_RQ_QUEUE_* constants, not
an errno.
Thanks Omar, applied.
--
Jens Axboe
From: Omar Sandoval
->queue_rq() should return one of the BLK_MQ_RQ_QUEUE_* constants, not
an errno.
f4aa4c7bbac6 ("block: loop: convert to per-device workqueue")
Signed-off-by: Omar Sandoval
---
drivers/block/loop.c | 2 +-
1 file changed, 1 insertion(+), 1
On 11/14/2016 02:22 PM, Shaohua Li wrote:
The background is we don't have an ioscheduler for blk-mq yet, so we can't
prioritize processes/cgroups. This patch set tries to add basic arbitration
between cgroups with blk-throttle. It adds a new limit io.high for
blk-throttle. It's only for cgroup2.
On 11/14/2016 03:13 PM, Paolo Valente wrote:
Hi Jens,
have you had time to look into the first extensions/changes required?
Any time plan?
I was busy last week with the writeback and the polling work. I should have
something by the end of this week.
--
Jens Axboe
We are going to support high/max limits; each cgroup will have two limits
after that. This patch prepares for the multiple-limits change.
Signed-off-by: Shaohua Li
---
block/blk-throttle.c | 110 ---
1 file changed, 69 insertions(+), 41
When cgroups all reach the high limit, cgroups can dispatch more IO. This
could make some cgroups dispatch more IO while others do not, and some
cgroups could even dispatch less IO than their high limit. For example, cg1
has a high limit of 10MB/s, cg2 a limit of 80MB/s, and assume the disk's
maximum bandwidth is 120MB/s for the
When the queue state machine is in the LIMIT_MAX state but a cgroup stays
below its high limit for some time, the queue should be downgraded to a
lower state, since that cgroup's high limit isn't being met.
Signed-off-by: Shaohua Li
---
block/blk-throttle.c | 188
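The downgrade rule described above can be sketched as a small userspace C model. This is purely illustrative: the struct/function names, the per-cgroup bookkeeping, and the 100ms window are assumptions, not the actual blk-throttle code.

```c
#include <stdint.h>

enum q_limit { LIMIT_HIGH, LIMIT_MAX };

/* Hypothetical per-cgroup stats; not the kernel's data structures. */
struct cg_stat {
	uint64_t bps;            /* observed throughput */
	uint64_t high_bps;       /* configured high limit */
	uint64_t below_since_ns; /* when it first fell below the limit, 0 if it hasn't */
};

#define DOWNGRADE_WINDOW_NS (100ULL * 1000 * 1000) /* assumed 100ms window */

/* Downgrade LIMIT_MAX -> LIMIT_HIGH when some cgroup has stayed below its
 * high limit for a full window: its guarantee is no longer being met. */
static enum q_limit throtl_check_downgrade(const struct cg_stat *cgs, int n,
					   uint64_t now_ns)
{
	for (int i = 0; i < n; i++) {
		if (cgs[i].bps < cgs[i].high_bps &&
		    cgs[i].below_since_ns &&
		    now_ns - cgs[i].below_since_ns >= DOWNGRADE_WINDOW_NS)
			return LIMIT_HIGH;
	}
	return LIMIT_MAX;
}
```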
A cgroup gets assigned a high limit, but the cgroup might never dispatch
enough IO to cross the high limit. In such a case, the queue state machine
will remain in the LIMIT_HIGH state and all other cgroups will be throttled
according to the high limit. This is unfair to the other cgroups. We should
treat the
The last patch introduces a way to detect idle cgroups. We use it to make
the upgrade/downgrade decision. The new algorithm can detect completely
idle cgroups too, so we can delete the corresponding code.
Signed-off-by: Shaohua Li
---
block/blk-throttle.c | 39
Add an interface for per-cgroup target latency. This latency target is for
4k requests.
Signed-off-by: Shaohua Li
---
block/blk-throttle.c | 67
1 file changed, 67 insertions(+)
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
When the queue is in the LIMIT_HIGH state and all cgroups with a high limit
cross the bps/iops limitation, we will upgrade the queue's state to
LIMIT_MAX
For a cgroup hierarchy, there are two cases. Children have a lower high
limit than the parent; the parent's high limit is then meaningless. If the
children's bps/iops cross the high
Add interface to configure the threshold
Signed-off-by: Shaohua Li
---
block/blk-sysfs.c| 7 +++
block/blk-throttle.c | 25 +
block/blk.h | 4
3 files changed, 36 insertions(+)
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index
A cgroup could be throttled to a limit, but when all cgroups cross the high
limit, the queue enters a higher state and so the group should be throttled
to a higher limit. It's possible the cgroup is sleeping because of
throttling and the other cgroups don't dispatch IO any more. In this case,
nobody can trigger
Add high limit for cgroup and corresponding cgroup interface.
Signed-off-by: Shaohua Li
---
block/blk-throttle.c | 132 ---
1 file changed, 103 insertions(+), 29 deletions(-)
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
Each queue will have a state machine. Initially the queue is in the
LIMIT_HIGH state, which means all cgroups will be throttled according to
their high limit. After all cgroups with a high limit cross the limit, the
queue state gets upgraded to the LIMIT_MAX state.
Cgroups without a high limit will use the max limit.
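The upgrade side of that state machine can be modeled in a few lines of C. Again, a sketch only: `struct tg` and `throtl_check_upgrade` are hypothetical names, not the blk-throttle implementation.

```c
#include <stdbool.h>
#include <stdint.h>

enum q_limit { LIMIT_HIGH, LIMIT_MAX };

/* Hypothetical per-cgroup throttle group; not the kernel's struct. */
struct tg {
	bool has_high;     /* does this cgroup have a high limit configured? */
	uint64_t bps;      /* observed bandwidth */
	uint64_t high_bps; /* configured high limit */
};

/* Upgrade LIMIT_HIGH -> LIMIT_MAX once every cgroup that has a high limit
 * is dispatching at or above that limit: all guarantees are met, so the
 * queue can let everyone run up to the max limit instead. */
static enum q_limit throtl_check_upgrade(const struct tg *tgs, int n)
{
	for (int i = 0; i < n; i++)
		if (tgs[i].has_high && tgs[i].bps < tgs[i].high_bps)
			return LIMIT_HIGH; /* someone is still below: stay */
	return LIMIT_MAX;
}
```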
We try to set a latency target for each cgroup. The problem is that latency
highly depends on request size, so users can't configure the target for
every request size. The idea is that users configure a latency target for
4k IO, and we estimate the target latency for IO of other request sizes.
To do this, we sample
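The scaling idea above can be sketched as follows. The patch series derives the size/latency relationship from sampled IO; here we assume, purely for illustration, a linear model where latency grows proportionally with request size beyond 4k. The function name is hypothetical.

```c
#include <stdint.h>

/* Scale a user-configured 4k latency target to another request size.
 * Assumption (illustrative only): latency is proportional to size above
 * 4k, so an 8k request gets twice the 4k target, and anything at or
 * below 4k just uses the 4k target. */
static uint64_t estimate_target_ns(uint64_t target_4k_ns, uint32_t bytes)
{
	if (bytes <= 4096)
		return target_4k_ns;
	return target_4k_ns * bytes / 4096;
}
```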
Hi,
The background is we don't have an ioscheduler for blk-mq yet, so we can't
prioritize processes/cgroups. This patch set tries to add basic arbitration
between cgroups with blk-throttle. It adds a new limit io.high for
blk-throttle. It's only for cgroup2.
io.max is a hard limit throttling.
throtl_slice is important for blk-throttling. A lot of things depend on it,
for example throughput measurement. It has a 100ms default value, which is
not appropriate for all disks. For example, for an SSD we might use a
smaller value to make the throughput smoother. This patch makes it tunable.
One hard problem in adding the .high limit is detecting idle cgroups. If
one cgroup doesn't dispatch enough IO against its high limit, we must have
a mechanism to determine whether other cgroups can dispatch more IO. We
added the think time detection mechanism before, but it doesn't work for
all workloads. Here we
Hi Jens,
have you had time to look into the first extensions/changes required?
Any time plan?
Thanks,
Paolo
On 11/10/2016 10:53 PM, Damien Le Moal wrote:
Normally, sd_read_capacity sets sdp->use_16_for_rw to 1 based on the
disk capacity so that READ16/WRITE16 are used for large drives.
However, for a zoned disk with RC_BASIS set to 0, the capacity reported
through READ_CAPACITY may be very small,
On 11/14/2016 12:43 PM, Omar Sandoval wrote:
,9 +2539,10 @@ static bool blk_mq_poll_hybrid_sleep(struct request_queue *q,
* This will be replaced with the stats tracking code, using
* 'avg_completion_time / 2' as the pre-sleep target.
*/
- kt = ktime_set(0,
On Fri, Nov 11, 2016 at 10:11:27PM -0700, Jens Axboe wrote:
> The previous commit introduced the hybrid sleep/poll mode. Take
> that one step further, and use the completion latencies to
> automatically sleep for half the mean completion time. This is
> a good approximation.
>
> This changes the
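The "sleep for half the mean completion time" heuristic quoted above can be sketched like this. The running-mean bookkeeping is a simplified stand-in for the block layer's stats tracking; the names are illustrative, not the kernel's.

```c
#include <stdint.h>

/* Minimal running tally of completion latencies (stand-in for the
 * block layer's stats tracking). */
struct lat_stat {
	uint64_t nr;       /* number of completions observed */
	uint64_t total_ns; /* sum of their latencies */
};

static void lat_stat_add(struct lat_stat *s, uint64_t completion_ns)
{
	s->nr++;
	s->total_ns += completion_ns;
}

/* Before polling, sleep for half the mean completion latency: the CPU
 * idles through most of the expected wait, then polls the remainder.
 * With no samples yet, fall back to pure polling (no sleep). */
static uint64_t hybrid_sleep_ns(const struct lat_stat *s)
{
	if (!s->nr)
		return 0;
	return s->total_ns / s->nr / 2;
}
```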
On Fri, Nov 11, 2016 at 10:11:25PM -0700, Jens Axboe wrote:
> From: Christoph Hellwig
>
> This patch adds a small and simple fast path for small direct I/O
> requests on block devices that don't use AIO. Between the neat
> bio_iov_iter_get_pages helper that avoids allocating a
While stressing memory and IO at the same time and changing SMT settings,
we were able to consistently trigger deadlocks in the mm system, which
froze the entire machine.
I think that under memory stress conditions, the large allocations
performed by blk_mq_init_rq_map may trigger a reclaim, which
In blk_mq_map_swqueue, there is a memory optimization that frees the
tags of a queue that has gone unmapped. Later, if that hctx is remapped
after another topology change, the tags need to be reallocated.
If this allocation fails, a simple WARN_ON triggers, but the block layer
ends up with an
On 11/14/2016 11:11 AM, Christoph Hellwig wrote:
On Mon, Nov 14, 2016 at 11:08:46AM -0700, Jens Axboe wrote:
It'd be cleaner to loop one level out, and avoid all that 'dio' stuff
instead. And then still retain the separate parts of the sync and async.
There's nothing to share there imho, and it
On 11/14/2016 11:05 AM, Christoph Hellwig wrote:
On Mon, Nov 14, 2016 at 11:02:47AM -0700, Jens Axboe wrote:
We need it unless we want unbounded allocations for the biovec. With a
1GB I/O we're at a page size allocation, and with 64MB I/Os that aren't
unheard of we'd be up to 64 pages or an
On Mon, Nov 14, 2016 at 11:02:47AM -0700, Jens Axboe wrote:
> > We need it unless we want unbounded allocations for the biovec. With a
> > 1GB I/O we're at a page size allocation, and with 64MB I/Os that aren't
> > unheard of we'd be up to 64 pages or an order 6 allocation which will
> > take
On 11/14/2016 11:00 AM, Christoph Hellwig wrote:
On Mon, Nov 14, 2016 at 10:47:46AM -0700, Jens Axboe wrote:
This seems less clean in basically all ways, not sure I agree with you.
We already have 4 vecs inlined in a generic bio, and we might as well
use the fs bioset instead of creating our
On Mon, Nov 14, 2016 at 10:47:46AM -0700, Jens Axboe wrote:
> This seems less clean in basically all ways, not sure I agree with you.
> We already have 4 vecs inlined in a generic bio, and we might as well
> use the fs bioset instead of creating our own. You also add a smallish
> dio to track
On Mon, Nov 14, 2016 at 10:28:37AM -0700, Jens Axboe wrote:
> This is on top of for-4.10/dio, which has Christoph's simplified sync
> O_DIRECT support and the IO poll bits.
>
> The restriction on 4 inline vecs is removed on the sync support, we
> just allocate the bio_vec array if we have to. I
Signed-off-by: Jens Axboe
---
fs/block_dev.c | 76 ++
1 file changed, 66 insertions(+), 10 deletions(-)
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 2010997fd326..62ca4ce21222 100644
--- a/fs/block_dev.c
+++
Just alloc the bio_vec array if we exceed the inline limit.
Signed-off-by: Jens Axboe
---
fs/block_dev.c | 17 ++---
1 file changed, 14 insertions(+), 3 deletions(-)
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 7c3ec6049073..2010997fd326 100644
---
This is on top of for-4.10/dio, which has Christoph's simplified sync
O_DIRECT support and the IO poll bits.
The restriction on 4 inline vecs is removed on the sync support, we
just allocate the bio_vec array if we have to. I realize this negates
parts of the win of the patch for sync, but it's
On 09/25/2016 07:54 PM, Bart Van Assche wrote:
Avoid that sparse complains about unbalanced lock actions.
Signed-off-by: Bart Van Assche
---
block/bsg-lib.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/block/bsg-lib.c b/block/bsg-lib.c
index
Looks fine,
Reviewed-by: Christoph Hellwig
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri 11-11-16 08:21:57, Jens Axboe wrote:
> Again a leftover from when the throttling code was generic. Now that we
> just have the block user, get rid of the stat ops and indirections.
>
> Signed-off-by: Jens Axboe
Looks good to me. You can add:
Reviewed-by: Jan Kara
On Mon, Nov 14, 2016 at 09:53:46AM +1100, NeilBrown wrote:
> > While we're at it - I find the way MD_RECOVERY_REQUESTED is used highly
> > confusing, and I'm not 100% sure it's correct. After all we check it
> > in r1buf_pool_alloc, which is a mempool alloc callback, so we rely
> > on these
On Mon, Nov 14, 2016 at 10:03:20AM +1100, NeilBrown wrote:
> I would suggest adding a "bi_dev_private" field to the bio which is for
> use by the lowest-level driver (much as bi_private is for use by the
> top-level initiator).
> That could be in a union with any or all of:
> unsigned int