From: Chunguang Xu
BFQ_DEFAULT_GRP_IOPRIO seems to be unused, maybe we can remove it.
Signed-off-by: Chunguang Xu
---
block/bfq-iosched.h | 1 -
1 file changed, 1 deletion(-)
diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h
index da636a8..91c8654 100644
--- a/block/bfq-iosched.h
+++
From: Chunguang Xu
Since weight, ioprio and ioprio_class are updated in bfq_init_entity()
and st->wsum is updated in __bfq_activate_entity(), it seems that
__bfq_entity_update_weight_prio() has nothing to do when the entity is
first activated. By resetting entity->prio_change in bfq_init_entity(),
From: Chunguang Xu
Since we will initialize sched_data.service_tree[] in
bfq_init_root_group(), bfq_create_group_hierarchy() can
ignore this part of the initialization, which can avoid
repeated initialization.
Signed-off-by: Chunguang Xu
---
block/bfq-cgroup.c | 4
1 file changed, 4
From: Chunguang Xu
The value range of ioprio is [0, 7], but the result of
bfq_weight_to_ioprio() may exceed this range, so simple
optimization is required.
Signed-off-by: Chunguang Xu
---
block/bfq-wf2q.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git
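A minimal sketch of the kind of range clamp the description above asks for. This is not the code from the patch: the function name and both constants are assumptions chosen for illustration, and only the [0, 7] target range comes from the message.

```c
#include <assert.h>

/* Assumed values for this sketch only; not taken from the patch. */
enum {
	IOPRIO_LEVELS = 8,	/* valid ioprio values are 0..7 */
	WEIGHT_COEFF = 10	/* hypothetical weight-to-ioprio scale */
};

/* Map a weight to an ioprio and clamp the result to [0, IOPRIO_LEVELS - 1]. */
static int weight_to_ioprio_clamped(int weight)
{
	int ioprio = IOPRIO_LEVELS - weight / WEIGHT_COEFF;

	if (ioprio < 0)
		ioprio = 0;
	if (ioprio > IOPRIO_LEVELS - 1)
		ioprio = IOPRIO_LEVELS - 1;

	/* Post-condition: the result is always a valid ioprio level. */
	assert(ioprio >= 0 && ioprio < IOPRIO_LEVELS);
	return ioprio;
}
```

Without the upper bound, a small weight would map to 8 here, which is outside the valid range.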
From: Chunguang Xu
CLASS_RT will preempt other classes, which may starve. At
present, CLASS_IDLE has alleviated the starvation problem
through the minimum bandwidth mechanism. Similarly, we
should do the same for CLASS_BE.
Signed-off-by: Chunguang Xu
---
block/bfq-iosched.c | 6 --
From: Chunguang Xu
Introduce bfq_entity_to_bfqg() to make it easier to obtain the
bfq_group corresponding to the entity.
Signed-off-by: Chunguang Xu
---
block/bfq-cgroup.c | 6 ++
block/bfq-iosched.h | 1 +
block/bfq-wf2q.c| 16
3 files changed, 15 insertions(+), 8
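A rough user-space sketch of what such a helper could look like, assuming the group embeds its entity and can be recovered with the usual container_of pattern. The struct layouts, the is_group flag and the function name below are hypothetical and are not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the scheduler structures. */
struct entity {
	int is_group;		/* assumed flag: non-zero if owned by a group */
};

struct group {
	int id;
	struct entity entity;	/* entity embedded in its group */
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Return the group that embeds this entity, or NULL for a non-group entity. */
static struct group *entity_to_group(struct entity *e)
{
	assert(e != NULL);
	return e->is_group ? container_of(e, struct group, entity) : NULL;
}
```

Centralizing the lookup in one helper keeps callers from repeating the container_of expression and the group check.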
From: Chunguang Xu
The IO depth of queues belonging to CLASS_IDLE is limited to 1,
so that it can avoid introducing a larger tail latency under
a device with a larger IO depth. Although limiting the IO
depth may reduce the performance of idle_class, it is
generally not a big problem, because
From: Chunguang Xu
Some misc updates, put together mainly to facilitate patch management.
Chunguang Xu (8):
bfq: introduce bfq_entity_to_bfqg helper method
bfq: convert the type of bfq_group.bfqd to bfq_data*
bfq: limit the IO depth of CLASS_IDLE to 1
bfq: keep the minimum bandwidth for
From: Chunguang Xu
Setting bfq_group.bfqd to void* type does not seem to make much sense.
This will cause unnecessary type conversion. Perhaps it would be better
to change it to bfq_data* type.
Signed-off-by: Chunguang Xu
---
block/bfq-cgroup.c | 2 +-
block/bfq-iosched.h | 2 +-
From: Chunguang Xu
The existing data structure is not very convenient for expansion,
and part of the code can be saved. Here, try to optimize, which
can make the code more concise and easy to expand.
delayacct_is_task_waiting_on_io() is currently only referenced by
cgroup v1, but I have
From: Chunguang Xu
Many distributions do not install the getdelay tool by
default. Similar to task_io_accounting, add a proc
file to make access easier.
v2: Fix some errors prompted by the kernel test robot.
Signed-off-by: Chunguang Xu
Reported-by: kernel test robot
---
fs/proc/base.c
Balbir Singh wrote on 2021/4/19 15:01:
> On Tue, Apr 13, 2021 at 09:37:26AM +0800, brookxu wrote:
>> From: Chunguang Xu
>>
>> The existing data structure is not very convenient for
>> expansion, and part of the code can be saved. Here, try
>> to optimize, whi
From: Chunguang Xu
Many distributions do not install the getdelay tool by
default. Similar to task_io_accounting, add a proc
file to make access easier.
Signed-off-by: Chunguang Xu
---
fs/proc/base.c | 7 +++
kernel/delayacct.c | 41 +
2
From: Chunguang Xu
The existing data structure is not very convenient for
expansion, and part of the code can be saved. Here, try
to optimize, which can make the code more concise and
easy to expand.
Signed-off-by: Chunguang Xu
---
include/linux/delayacct.h | 139
Tejun Heo wrote on 2021/4/5 0:09:
> Hello,
Hi, tj, thanks for your reply:)
> On Thu, Mar 25, 2021 at 02:57:44PM +0800, brookxu wrote:
>> INTERFACE:
>>
>> The bfq.ioprio interface now is available for cgroup v1 and cgroup
>> v2. Users can configure
From: Chunguang Xu
If delayacct is disabled, then delayacct_is_task_waiting_on_io()
always returns false, which causes the statistical value to be
wrong. Perhaps tsk->in_iowait is better.
Signed-off-by: Chunguang Xu
---
kernel/cgroup/cgroup-v1.c | 2 +-
1 file changed, 1 insertion(+), 1
From: Chunguang Xu
The existing data structure is not very convenient for
expansion, and part of the code can be saved. Here, try
to optimize, which can make the code more concise and
easy to expand.
Signed-off-by: Chunguang Xu
---
include/linux/delayacct.h | 139
From: Chunguang Xu
Many distributions do not install the getdelay tool by
default. Similar to task_io_accounting, add a proc
file to make access easier.
Signed-off-by: Chunguang Xu
---
fs/proc/base.c | 7 +++
kernel/delayacct.c | 41 +
2
From: Chunguang Xu
Since we will initialize sched_data.service_tree[] in
bfq_init_root_group(), bfq_create_group_hierarchy() can
ignore this part of the initialization, which can avoid
repeated initialization.
Signed-off-by: Chunguang Xu
---
block/bfq-cgroup.c | 4
1 file changed, 4
From: Chunguang Xu
The value range of ioprio is [0, 7], but the result of
bfq_weight_to_ioprio() may exceed this range, so simple
optimization is required.
Signed-off-by: Chunguang Xu
---
block/bfq-wf2q.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git
From: Chunguang Xu
In order to better guarantee the QoS for each group, we do not
allow queues of different groups to be merged.
Signed-off-by: Chunguang Xu
---
block/bfq-iosched.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index
From: Chunguang Xu
Under better_fairness, if a higher priority queue is waiting
for service, disable queue idling so that a schedule can be
invoked in time. Queues other than CLASS_IDLE are still
allowed to idle, so that we can better control buffered IO too.
Signed-off-by: Chunguang Xu
---
From: Chunguang Xu
In order to ensure better QoS for tasks of different groups
and different classes under better_fairness, we only allow
queues of the same class in the same group to be
injected.
Signed-off-by: Chunguang Xu
---
block/bfq-iosched.c | 28 +++-
1
From: Chunguang Xu
Traverse all schedule domains upward. If there are higher
priority tasks waiting for service, mark in_service_queue
prio_expire and then expire it, so that RT tasks can be
scheduled in time.
Signed-off-by: Chunguang Xu
---
block/bfq-iosched.c | 7 +++
From: Chunguang Xu
When in_service_queue needs to be preempted by a task with
a higher priority, we will mark it with the prio_expire flag,
and then expire it on the IO dispatch path. This patch only
adds the prio_expire flag.
Signed-off-by: Chunguang Xu
---
block/bfq-iosched.c | 2 ++
block/bfq-iosched.h | 2
From: Chunguang Xu
CLASS_RT will preempt other classes, which may starve. At
present, CLASS_IDLE has alleviated the starvation problem
through the minimum bandwidth mechanism. Similarly, we
should do the same for CLASS_BE.
Signed-off-by: Chunguang Xu
---
block/bfq-iosched.c | 6 --
From: Chunguang Xu
In the container scenario, in addition to throughput, we
also pay attention to QoS. In order to better support this
scenario, we introduce the better_fairness mode here. In
this mode, we expect to better control the QoS of each group
according to its priority. Only add
From: Chunguang Xu
The IO depth of queues belonging to CLASS_IDLE is limited to 1,
so that it can avoid introducing a larger tail latency under
a device with a larger IO depth. Although limiting the IO
depth may reduce the performance of idle_class, it is
generally not a big problem, because
From: Chunguang Xu
Currently the ioprio class of all groups is CLASS_BE, which is not very
friendly to the container scenario. Therefore, we introduce the bfq.ioprio
interface to allow users to configure the ioprio class and ioprio of
the group, which can meet more priority requirements.
The bfq.ioprio
From: Chunguang Xu
Setting bfq_group.bfqd to void* type does not seem to make much sense.
This will cause unnecessary type conversion. Perhaps it would be better
to change it to bfq_data* type.
Signed-off-by: Chunguang Xu
---
block/bfq-cgroup.c | 2 +-
block/bfq-iosched.h | 2 +-
From: Chunguang Xu
Since the tasks inside the container itself have different
ioprio, in order to be compatible with the actual production
environment, when scheduling within a group, we use the task
ioprio class, but outside the group, we use the group ioprio
class. For example, when counting
From: Chunguang Xu
Introduce bfq_entity_to_bfqg() to make it easier to obtain the
bfq_group corresponding to the entity.
Signed-off-by: Chunguang Xu
---
block/bfq-cgroup.c | 6 ++
block/bfq-iosched.h | 1 +
block/bfq-wf2q.c| 16
3 files changed, 15 insertions(+), 8
From: Chunguang Xu
Any suggestions or discussions are welcome, thank you very much.
BACKGROUND:
In the container scenario, in addition to throughput, we also pay
attention to the QoS of each group. Based on hierarchical scheduling,
EMQ, IO Injection, bfq.weight and other mechanisms, we can achieve
Paolo Valente wrote on 2021/3/21 19:04:
>
>
>> Il giorno 12 mar 2021, alle ore 12:08, brookxu ha
>> scritto:
>>
>> From: Chunguang Xu
>>
>
> Hi Chunguang,
>
>> Tasks in the production environment can be roughly divided into
>> thre
From: Chunguang Xu
Since we will initialize sched_data.service_tree[] in
bfq_init_root_group(), bfq_create_group_hierarchy() can
ignore this part of the initialization, which can avoid
repeated initialization.
Signed-off-by: Chunguang Xu
---
block/bfq-cgroup.c | 4
1 file changed, 4
From: Chunguang Xu
In EMQ, perhaps we should not merge the CLASS_RT queue
with other class queues. Otherwise, the delay of
CLASS_RT IO will increase.
Signed-off-by: Chunguang Xu
---
block/bfq-iosched.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/block/bfq-iosched.c
From: Chunguang Xu
The value range of ioprio is [0, 7], but the result of
bfq_weight_to_ioprio() may exceed this range, so simple
optimization is required.
Signed-off-by: Chunguang Xu
---
block/bfq-wf2q.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git
From: Chunguang Xu
Expire a bfqq that does not belong to CLASS_RT when CLASS_RT is waiting
for service, so that we can further guarantee the latency for CLASS_RT.
Signed-off-by: Chunguang Xu
---
block/bfq-iosched.c | 15 ++-
block/bfq-iosched.h | 8
block/bfq-wf2q.c| 12
3
From: Chunguang Xu
If CLASS_RT is waiting for service, queues belonging
to other classes are not allowed to idle, so that a
schedule can be invoked in time.
Signed-off-by: Chunguang Xu
---
block/bfq-iosched.c | 5 +
1 file changed, 5 insertions(+)
diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
From: Chunguang Xu
rt_class will preempt other classes, which may cause other
classes to starve to death. At present, idle_class has
alleviated the starvation problem through the minimum
bandwidth mechanism. Similarly, we should do the same for
be_class.
Signed-off-by: Chunguang Xu
---
From: Chunguang Xu
CLASS_RT is more sensitive to latency, and IO injection
will increase the CLASS_RT latency. For this reason,
consider prohibiting the injection of async queue for
CLASS_RT, and only the waker queue and other active
queues belonging to CLASS_RT are allowed to inject. In
this
From: Chunguang Xu
Tasks in the production environment can be roughly divided into
three categories: emergency tasks, ordinary tasks and offline
tasks. Emergency tasks need to be scheduled in real time, such
as system agents. Offline tasks do not need to guarantee QoS,
but can improve system
From: Chunguang Xu
The IO depth of queues belonging to CLASS_IDLE is limited to 1,
so that it can avoid introducing a larger tail latency under
a device with a larger IO depth. Although limiting the IO
depth may reduce the performance of idle_class, it is
generally not a big problem, because
From: Chunguang Xu
Setting bfq_group.bfqd to void* type does not seem to make much sense.
This will cause unnecessary type conversion. Perhaps it would be better
to change it to bfq_data* type.
Signed-off-by: Chunguang Xu
---
block/bfq-cgroup.c | 2 +-
block/bfq-iosched.h | 2 +-
From: Chunguang Xu
Introduce bfq_entity_to_bfqg() to make it easier to obtain the
bfq_group corresponding to the entity.
Signed-off-by: Chunguang Xu
---
block/bfq-cgroup.c | 6 ++
block/bfq-iosched.h | 1 +
block/bfq-wf2q.c| 16
3 files changed, 15 insertions(+), 8
From: Chunguang Xu
Tasks in the production environment can be roughly divided into
three categories: emergency tasks, ordinary tasks and offline
tasks. Emergency tasks need to be scheduled in real time, such
as system agents. Offline tasks do not need to guarantee QoS,
but can improve system
From: Chunguang Xu
Setting bfq_group.bfqd to void* type does not seem to make much sense.
This will cause unnecessary type conversion. Perhaps it would be better
to change it to bfq_data* type.
Signed-off-by: Chunguang Xu
---
block/bfq-cgroup.c | 2 +-
From: Chunguang Xu
The value range of ioprio is [0, 7], but the result of
bfq_weight_to_ioprio() may exceed this range, so simple
optimization is required.
Signed-off-by: Chunguang Xu
---
block/bfq-wf2q.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff
From: Chunguang Xu
Since we will initialize sched_data.service_tree[] in
bfq_init_root_group(), bfq_create_group_hierarchy() can
ignore this part of the initialization, which can avoid
repeated initialization.
Signed-off-by: Chunguang Xu
---
block/bfq-cgroup.c | 4
1
From: Chunguang Xu
Expire the bfqq if a higher priority class is waiting to be served, so
that we can further guarantee the delay of the higher priority class.
Signed-off-by: Chunguang Xu
---
block/bfq-iosched.c | 14 ++
1 file changed, 14 insertions(+)
diff --git
From: Chunguang Xu
Tasks in the production environment can be roughly divided into
three categories: emergency tasks, ordinary tasks and offline
tasks. Emergency tasks need to be scheduled in real time, and the
amount of data for such tasks is usually not very large, such as
From: Chunguang Xu
The IO depth of idle_class is limited to 1, so that it can
avoid introducing a larger tail latency under a device with
a larger IO depth. Although limiting the IO depth may reduce
the performance of idle_class, it is generally not a big
problem, because
From: Chunguang Xu
rt_class will preempt other classes, which may cause other
classes to starve to death. At present, idle_class has
alleviated the starvation problem through the minimum
bandwidth mechanism. Similarly, we should do the same for
be_class.
Signed-off-by:
From: Chunguang Xu
Tasks in the production environment can be roughly divided into
three categories: emergency tasks, ordinary tasks and offline
tasks. Emergency tasks need to be scheduled in real time, and the
amount of data for such tasks is usually not very large, such as
system agents.
From: Chunguang Xu
Introduce bfq_entity_to_bfqg() to make it easier to obtain the
bfq_group corresponding to the entity.
Signed-off-by: Chunguang Xu
---
block/bfq-cgroup.c | 6 ++
block/bfq-iosched.h | 1 +
block/bfq-wf2q.c| 16
3 files
Theodore Ts'o wrote on 2021/1/28 0:21:
> On Tue, Jan 26, 2021 at 08:50:02AM +0800, brookxu wrote:
>>
>> trace point, eBPF and other hook technologies are better for production
>> environments. But for pure debugging work, adding hook points feels a bit
>> heavy. Howev
harshad shirwadkar wrote on 2021/1/26 1:15:
> Hey hi! I don't see my previous comments being handled here or am I
> missing something? It'd be really handy to have the device name
> printed in jbd2 logs.
Maybe I am missing something... the original switch has been retained, the newly added
switch and the
Theodore Ts'o wrote on 2021/1/26 5:50:
> On Sat, Jan 23, 2021 at 08:00:42PM +0800, Chunguang Xu wrote:
>> On a multi-disk machine, because jbd2 debugging switch is global, this
>> confuses the logs of multiple disks. It is not easy to distinguish the
>> logs of each disk and the amount of
Thanks for your reply.
Jan Kara wrote on 2021/1/25 20:41:
> On Fri 22-01-21 14:43:18, Chunguang Xu wrote:
>> On a multi-disk machine, because jbd2 debugging switch is global, this
>> confuses the logs of multiple disks. It is not easy to distinguish the
>> logs of each disk and the amount of
Jan Kara wrote on 2021/1/25 22:54:
> On Sat 23-01-21 20:00:44, Chunguang Xu wrote:
>> From: Chunguang Xu
>>
>> Compared to directly using numbers to indicate levels, using abstract
>> error, warn, notice, info, debug to indicate levels may be more
>> convenient for code reading and writing.
Hmm... your idea may be better, thanks for your time.
harshad shirwadkar wrote on 2021/1/23 3:00:
> I wonder if we should retain the existing module param as well apart
> from the new device specific logging switch? If that switch is
> enabled, we'll get jbd2 logs for all the devices. Given that
Wed, Aug 5, 2020 at 3:16 AM brookxu <brookxu...@gmail.com> wrote:
>
> Add more... , As we expected, the running time of the test process is
> reduced significantly.
>
> Running time on unrepaired kernel:
> [root@TENCENT64 ~]# time taskset 0x01 ./spars
~]# time taskset 0x01 ./sparse /data1/sparce.dat
real 0m0.471s
user 0m0.004s
sys 0m0.395s
Thanks.
Andreas Dilger wrote on 2020/8/5 12:53:
> On Aug 4, 2020, at 7:02 PM, brookxu wrote:
>> In the scenario of writing sparse files, the Per-inode prealloc list may
>> be very
Thanks, this is a good suggestion, and I will merge this patch with the
previous patch.
Thanks
Andreas Dilger wrote on 2020/8/5 12:54:
> On Aug 4, 2020, at 7:02 PM, brookxu wrote:
>> Add the needed value to ext4_mb_discard_preallocations trace, so
>> we can more easily observ
:
> On Aug 4, 2020, at 7:02 PM, brookxu wrote:
>> In the scenario of writing sparse files, the Per-inode prealloc list may
>> be very long, resulting in high overhead for ext4_mb_use_preallocated().
>> To circumvent this problem, we limit the maximum length of per-inode
>> pre
In the scenario of writing sparse files, the Per-inode prealloc list may
be very long, resulting in high overhead for ext4_mb_use_preallocated().
To circumvent this problem, we limit the maximum length of per-inode
prealloc list to 512 and allow users to modify it.
Signed-off-by: Chunguang Xu
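The idea of capping the list length can be sketched in user space as below. This is only an illustration under assumptions: the list type, the function names and the policy of dropping the oldest entries at the tail are hypothetical and are not taken from the ext4 patch; only the limit of 512 comes from the message.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical singly linked list standing in for a per-inode list. */
struct pa_node {
	struct pa_node *next;
};

struct pa_list {
	struct pa_node *head;
	unsigned int len;
};

/* Free every entry past the first max_len; return how many were dropped. */
static unsigned int pa_list_trim(struct pa_list *list, unsigned int max_len)
{
	struct pa_node **link;
	struct pa_node *rest, *next;
	unsigned int kept = 0, dropped = 0;

	assert(list != NULL);
	link = &list->head;
	while (*link && kept < max_len) {
		link = &(*link)->next;
		kept++;
	}
	rest = *link;
	*link = NULL;		/* cut the list after the kept entries */
	while (rest) {
		next = rest->next;
		free(rest);
		rest = next;
		dropped++;
	}
	list->len = kept;
	return dropped;
}

/* Add a new entry at the head, trimming so the list never exceeds max_len. */
static int pa_list_add(struct pa_list *list, unsigned int max_len)
{
	struct pa_node *node = malloc(sizeof(*node));

	if (!node)
		return -1;
	node->next = list->head;
	list->head = node;
	list->len++;
	if (list->len > max_len)
		pa_list_trim(list, max_len);
	return 0;
}
```

With a bound in place, any walk over the list visits at most max_len entries, which is the overhead the message is concerned about.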
Add the needed value to ext4_mb_discard_preallocations trace, so
we can more easily observe the requested number of trim.
Signed-off-by: Chunguang Xu
---
include/trace/events/ext4.h | 14 --
1 file changed, 8 insertions(+), 6 deletions(-)
diff --git a/include/trace/events/ext4.h
Reorganize the if statement of ext4_mb_release_context() to make it
easier to read.
Signed-off-by: Chunguang Xu
---
fs/ext4/mballoc.c | 27 +--
1 file changed, 13 insertions(+), 14 deletions(-)
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index c0a331e..4f21f34
Delete the unneeded ac_b_extent backup inside ext4_mb_use_best_found();
this operation is already done in ext4_mb_new_group_pa() and
ext4_mb_new_inode_pa().
Signed-off-by: Chunguang Xu
---
fs/ext4/mballoc.c | 8 ++--
1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/fs/ext4/mballoc.c
Delete the unneeded BUG_ON in ext4_mb_load_buddy_gfp(); the preceding
code has already checked whether page is NULL.
Signed-off-by: Chunguang Xu
---
fs/ext4/mballoc.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 28a139f..9b1c3ad 100644
---
Reorganize the if statement of ext4_mb_release_context() to make it
easier to read.
Signed-off-by: Chunguang Xu
---
fs/ext4/mballoc.c | 27 +--
1 file changed, 13 insertions(+), 14 deletions(-)
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index c0a331e..4f21f34
Add the needed value to ext4_mb_discard_preallocations trace, so
we can more easily observe the requested number of trim.
Signed-off-by: Chunguang Xu
---
include/trace/events/ext4.h | 14 --
1 file changed, 8 insertions(+), 6 deletions(-)
diff --git a/include/trace/events/ext4.h
In the scenario of writing sparse files, the Per-inode prealloc list may
be very long, resulting in high overhead for ext4_mb_use_preallocated().
To circumvent this problem, we limit the maximum length of per-inode
prealloc list to 512 and allow users to modify it.
Signed-off-by: Chunguang Xu
Fix spelling typos in ext4_mb_initialize_context.
Signed-off-by: Chunguang Xu
---
fs/ext4/mballoc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index c0a331e..6dc2c6c 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -4399,7
74 matches