Re: [RFC PATCH 2/2] sched:Pick the apt busy sched group during load balancing
Hi everyone,

The figures SCHED_GRP1:3200 and SCHED_GRP2:1156 shown below in the
changelog are the probable figures as calculated with the
per-entity load-tracking metric for the runqueue load.

> If a sched group has passed the test for sufficient load in
> update_sg_lb_stats, to qualify for load balancing, then PJT's
> metrics have to be used to qualify the right sched group as the
> busiest group.
>
> The scenario which led to this patch is shown below:
> Consider Task1 and Task2 to be long-running tasks
> and Tasks 3,4,5,6 to be short-running tasks
>
>                         Task3
>                         Task4
>         Task1           Task5
>         Task2           Task6
>         ------          ------
>         SCHED_GRP1      SCHED_GRP2
>
> Normal load calculator would qualify SCHED_GRP2 as
> the candidate for sd->busiest due to the following loads
> that it calculates.
>
>         SCHED_GRP1:2048
>         SCHED_GRP2:4096
>
> Load calculator using PJT's metric would probably qualify
> SCHED_GRP1 as the candidate for sd->busiest due to the
> following loads that it calculates
>
>         SCHED_GRP1:3200
>         SCHED_GRP2:1156

Regards
Preeti

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
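To make the two sets of figures concrete, here is a minimal sketch of how the choice of metric flips the busiest group. It assumes every task is a nice-0 task with weight 1024, which is consistent with the 2048 and 4096 figures but is not stated in the changelog. The struct and function names are made up for illustration, and the per-entity figures are simply the "probable" values quoted above, not something this code derives.

```c
#include <stddef.h>

/* Assumption: all tasks are nice-0, each contributing a weight of 1024. */
#define NICE_0_WEIGHT 1024UL

/* Hypothetical, simplified view of a sched group for this example only. */
struct group_example {
	const char *name;
	unsigned int nr_running;	/* tasks queued on the group */
	unsigned long tracked_load;	/* per-entity figure quoted in the changelog */
};

/* "Normal" load: sum of task weights, regardless of how long tasks run. */
static unsigned long instantaneous_load(const struct group_example *g)
{
	return g->nr_running * NICE_0_WEIGHT;
}

/* Return the index of the group with the highest load under one metric. */
static size_t busiest_by(const struct group_example *groups, size_t n,
			 int use_tracked)
{
	size_t best = 0;

	for (size_t i = 1; i < n; i++) {
		unsigned long cur = use_tracked ? groups[i].tracked_load
						: instantaneous_load(&groups[i]);
		unsigned long max = use_tracked ? groups[best].tracked_load
						: instantaneous_load(&groups[best]);
		if (cur > max)
			best = i;
	}
	return best;
}

/* The scenario from the changelog: two long tasks vs. four short tasks. */
static const struct group_example changelog_groups[] = {
	{ "SCHED_GRP1", 2, 3200 },
	{ "SCHED_GRP2", 4, 1156 },
};
```

Calling busiest_by(changelog_groups, 2, 0) selects SCHED_GRP2 (index 1), while busiest_by(changelog_groups, 2, 1) selects SCHED_GRP1 (index 0), which is the disagreement the changelog describes.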
[RFC PATCH 2/2] sched:Pick the apt busy sched group during load balancing
If a sched group has passed the test for sufficient load in
update_sg_lb_stats, to qualify for load balancing, then PJT's
metrics have to be used to qualify the right sched group as the
busiest group.

The scenario which led to this patch is shown below:
Consider Task1 and Task2 to be long-running tasks
and Tasks 3,4,5,6 to be short-running tasks

                        Task3
                        Task4
        Task1           Task5
        Task2           Task6
        ------          ------
        SCHED_GRP1      SCHED_GRP2

Normal load calculator would qualify SCHED_GRP2 as
the candidate for sd->busiest due to the following loads
that it calculates.

        SCHED_GRP1:2048
        SCHED_GRP2:4096

Load calculator using PJT's metric would probably qualify
SCHED_GRP1 as the candidate for sd->busiest due to the
following loads that it calculates

        SCHED_GRP1:3200
        SCHED_GRP2:1156

This patch aims to strike a balance between the loads of the
group and the number of tasks running on the group to decide the
busiest group in the sched_domain.

This means we will need to use PJT's metrics but with an
additional constraint.

Signed-off-by: Preeti U Murthy <pre...@linux.vnet.ibm.com>
---
 kernel/sched/fair.c | 22 +++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index dd0fb28..d45b7b4 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -165,7 +165,8 @@ void sched_init_granularity(void)
 #else
 # define WMULT_CONST	(1UL << 32)
 #endif
-
+#define NR_THRESHOLD 2
+#define LOAD_THRESHOLD 1
 #define WMULT_SHIFT	32
 
 /*
@@ -4169,6 +4170,7 @@ struct sd_lb_stats {
 	/* Statistics of the busiest group */
 	unsigned int  busiest_idle_cpus;
 	unsigned long max_load;
+	u64 max_sg_load; /* Equivalent of max_load but calculated using pjt's metric */
 	unsigned long busiest_load_per_task;
 	unsigned long busiest_nr_running;
 	unsigned long busiest_group_capacity;
@@ -4628,8 +4630,21 @@ static bool update_sd_pick_busiest(struct lb_env *env,
 				   struct sched_group *sg,
 				   struct sg_lb_stats *sgs)
 {
-	if (sgs->avg_load <= sds->max_load)
-		return false;
+	/* Use PJT's metrics to qualify a sched_group as busy
+	 * But a low load sched group may be queueing up many tasks
+	 *
+	 * So before dismissing a sched group with lesser load, ensure
+	 * that the number of processes on it is checked if it is
+	 * not much less loaded than the max load so far
+	 */
+	if (sgs->avg_cfs_runnable_load <= sds->max_sg_load) {
+		if (sgs->avg_cfs_runnable_load > LOAD_THRESHOLD * sds->max_sg_load) {
+			if (sgs->sum_nr_running <= (NR_THRESHOLD + sds->busiest_nr_running))
+				return false;
+		} else {
+			return false;
+		}
+	}
 
 	if (sgs->sum_nr_running > sgs->group_capacity)
 		return true;
@@ -4708,6 +4723,7 @@ static inline void update_sd_lb_stats(struct lb_env *env,
 			sds->this_idle_cpus = sgs.idle_cpus;
 		} else if (update_sd_pick_busiest(env, sds, sg, &sgs)) {
 			sds->max_load = sgs.avg_load;
+			sds->max_sg_load = sgs.avg_cfs_runnable_load;
 			sds->busiest = sg;
 			sds->busiest_nr_running = sgs.sum_nr_running;
 			sds->busiest_idle_cpus = sgs.idle_cpus;
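For readers who want to trace the new check outside the kernel tree, the following is a standalone sketch of the first test added to update_sd_pick_busiest. The two structs are hypothetical cut-down stand-ins that carry only the fields this check reads, and the function name survives_load_check is invented for this example. The thresholds and comparisons are copied from the patch as posted.

```c
#include <stdbool.h>
#include <stdint.h>

/* Thresholds exactly as defined by the patch. */
#define NR_THRESHOLD 2
#define LOAD_THRESHOLD 1

/* Hypothetical stand-in for the fields of struct sg_lb_stats used here. */
struct sg_example {
	uint64_t avg_cfs_runnable_load;	/* group load by PJT's metric */
	unsigned long sum_nr_running;	/* tasks queued on the group */
};

/* Hypothetical stand-in for the fields of struct sd_lb_stats used here. */
struct sd_example {
	uint64_t max_sg_load;		/* highest group load seen so far */
	unsigned long busiest_nr_running;	/* task count of that group */
};

/*
 * Mirrors the first test in the patched update_sd_pick_busiest():
 * returns false when the group is dismissed as a busiest candidate,
 * true when it goes on to the remaining checks.
 */
static bool survives_load_check(const struct sd_example *sds,
				const struct sg_example *sgs)
{
	if (sgs->avg_cfs_runnable_load <= sds->max_sg_load) {
		if (sgs->avg_cfs_runnable_load > LOAD_THRESHOLD * sds->max_sg_load) {
			/* Close in load: only keep it if it queues many more tasks. */
			if (sgs->sum_nr_running <= NR_THRESHOLD + sds->busiest_nr_running)
				return false;
		} else {
			/* Much less loaded than the current maximum: dismiss. */
			return false;
		}
	}
	return true;
}
```

With the changelog's figures and SCHED_GRP1 already recorded as the maximum (load 3200, 2 tasks), SCHED_GRP2 (load 1156, 4 tasks) is dismissed. One thing this sketch makes visible: with LOAD_THRESHOLD at 1, the inner comparison cannot be true once the outer one is, so the task-count test is never reached. A fractional threshold would presumably be needed for the additional constraint to take effect.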