[PATCH] x86: Unify definition of jiffies
jiffies_64 is always defined in kernel/time/timer.c. Thus, we can unify the
definition of jiffies and make it less confusing. This only affects 64-bit
platforms.

Signed-off-by: Zhihui Zhang <zzhs...@gmail.com>
---
 arch/x86/kernel/time.c        | 4 ----
 arch/x86/kernel/vmlinux.lds.S | 4 ++--
 2 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/time.c b/arch/x86/kernel/time.c
index 749d189..2fedb33 100644
--- a/arch/x86/kernel/time.c
+++ b/arch/x86/kernel/time.c
@@ -23,10 +23,6 @@
 #include
 #include
 
-#ifdef CONFIG_X86_64
-__visible volatile unsigned long jiffies __cacheline_aligned = INITIAL_JIFFIES;
-#endif
-
 unsigned long profile_pc(struct pt_regs *regs)
 {
 	unsigned long pc = instruction_pointer(regs);
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 1e413a93..940c190 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -36,13 +36,13 @@ OUTPUT_FORMAT(CONFIG_OUTPUT_FORMAT, CONFIG_OUTPUT_FORMAT, CONFIG_OUTPUT_FORMAT)
 #ifdef CONFIG_X86_32
 OUTPUT_ARCH(i386)
 ENTRY(phys_startup_32)
-jiffies = jiffies_64;
 #else
 OUTPUT_ARCH(i386:x86-64)
 ENTRY(phys_startup_64)
-jiffies_64 = jiffies;
 #endif
+jiffies = jiffies_64;
+
 #if defined(CONFIG_X86_64)
 /*
  * On 64-bit, align RODATA to 2MB so we retain large page mappings for
--
2.7.4
Re: [PATCH] timers: Reconcile the code and the comment for the 250HZ case
Ah, I see your point. Thanks for the detailed explanation.

-Zhihui

On Mon, Jan 23, 2017 at 6:10 AM, Thomas Gleixner <t...@linutronix.de> wrote:
> On Sat, 21 Jan 2017, Zhihui Zhang wrote:
>
>> Sure, I believe that comments should always match the code. In this
>
> That's fine.
>
>> case, using either LVL_SIZE - 1 or LVL_SIZE is fine based on my
>> understanding about 20 days ago. But I could be wrong and miss some
>> subtle details. Anyway, my point is about readability.
>
> Well, readability is one thing, but correctness is more important, right?
>
> Let's assume we have 4 buckets per level and base->clk is 0. So level 0
> has the following expiry times:
>
>	Bucket 0: base->clk + 0
>	Bucket 1: base->clk + 1
>	Bucket 2: base->clk + 2
>	Bucket 3: base->clk + 3
>
> So we can accommodate 4 timers here, but there is a nifty detail. We
> guarantee that expiries are never short, so a timer armed for base->clk
> will expire at base->clk + 1.
>
> The reason for this is that we have no distinction between absolute and
> relative timeouts. But for relative timeouts we have to guarantee that the
> timeout does not expire before the number of jiffies has elapsed.
>
> Now a timer armed with 1 jiffy relative to now (jiffies) cannot be queued
> to bucket 0 because jiffies can be incremented immediately after queueing
> the timer, which would expire it early. So it's queued to bucket 1, and
> that's why we need to have LVL_SIZE - 1 and not LVL_SIZE. See also
> calc_index().
>
> Your change completely breaks the wheel. Let's assume the above and a
> timer expiring at base->clk + 3.
>
> With your change the timer would fall into level 0. So now calc_index()
> does:
>
>	expires = (expires + LVL_GRAN(lvl)) >> LVL_SHIFT(lvl);
>	return LVL_OFFS(lvl) + (expires & LVL_MASK);
>
> Let's substitute that for the expires = base->clk + 3 case:
>
>	expires = (base->clk + 3 + 1) >> 0;
>
>	---> expires = 4
>
>	return 0 + (4 & 0x03);
>
>	---> index = 0
>
> So the timer gets queued into bucket 0 and expires 4 jiffies too early.
>
> So using either LVL_SIZE - 1 or LVL_SIZE is _NOT_ fine.
>
> Thanks,
>
>	tglx
Re: [PATCH] timers: Reconcile the code and the comment for the 250HZ case
Sure, I believe that comments should always match the code. In this
case, using either LVL_SIZE - 1 or LVL_SIZE is fine based on my
understanding about 20 days ago. But I could be wrong and miss some
subtle details. Anyway, my point is about readability.

thanks,

On Fri, Jan 20, 2017 at 5:41 PM, John Stultz <john.stu...@linaro.org> wrote:
> On Mon, Jan 2, 2017 at 1:14 PM, Zhihui Zhang <zzhs...@gmail.com> wrote:
>> Adjust the time start of each level to match the comments. Note that
>> LVL_START(n) is never used for the n = 0 case. Also, each level (except
>> level 0) has more than enough room to accommodate all its timers.
>
> So instead of just covering what your patch does, can you explain in
> some detail why this patch is useful? What net effect does it bring?
> What sort of bugs would it solve?
>
> thanks
> -john
[PATCH] timers: Reconcile the code and the comment for the 250HZ case
Adjust the time start of each level to match the comments. Note that
LVL_START(n) is never used for the n = 0 case. Also, each level (except
level 0) has more than enough room to accommodate all its timers.

Signed-off-by: Zhihui Zhang <zzhs...@gmail.com>
---
 kernel/time/timer.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index ec33a69..268d5ae 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -132,7 +132,7 @@ EXPORT_SYMBOL(jiffies_64);
  * 5    320    131072 ms (~2m)     1048576 ms -    8388607 ms (~17m - ~2h)
  * 6    384   1048576 ms (~17m)    8388608 ms -   67108863 ms (~2h - ~18h)
  * 7    448   8388608 ms (~2h)    67108864 ms -  536870911 ms (~18h - ~6d)
- * 8    512  67108864 ms (~18h)  536870912 ms - 4294967288 ms (~6d - ~49d)
+ * 8    512  67108864 ms (~18h)  536870912 ms - 4294967295 ms (~6d - ~49d)
  *
  * HZ  100
  * Level Offset  Granularity            Range
@@ -157,7 +157,7 @@ EXPORT_SYMBOL(jiffies_64);
  * The time start value for each level to select the bucket at enqueue
  * time.
  */
-#define LVL_START(n)	((LVL_SIZE - 1) << (((n) - 1) * LVL_CLK_SHIFT))
+#define LVL_START(n)	(LVL_SIZE << (((n) - 1) * LVL_CLK_SHIFT))
 
 /* Size of each clock level */
 #define LVL_BITS	6
--
2.7.4
[PATCH] sched/fair: remove the swap() logic in load_too_imbalanced()
The swap() logic was introduced before we scaled the load by the actual CPU
capacity. Now it looks odd that we swap the load but not the CPU capacity.
In fact, we probably need to check both directions to ensure that the load
of neither side increases too much compared to the other side.

Signed-off-by: Zhihui Zhang <zzhs...@gmail.com>
---
 kernel/sched/fair.c | 36 +++++++++++++++++++-----------------
 1 file changed, 19 insertions(+), 17 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6e2e348..5e8e17b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1203,7 +1203,7 @@ static void task_numa_assign(struct task_numa_env *env,
 static bool load_too_imbalanced(long src_load, long dst_load,
 				struct task_numa_env *env)
 {
-	long imb, old_imb;
+	long imb1, imb2, orig_imb1, orig_imb2;
 	long orig_src_load, orig_dst_load;
 	long src_capacity, dst_capacity;
 
@@ -1217,31 +1217,33 @@ static bool load_too_imbalanced(long src_load, long dst_load,
 	src_capacity = env->src_stats.compute_capacity;
 	dst_capacity = env->dst_stats.compute_capacity;
 
-	/* We care about the slope of the imbalance, not the direction. */
-	if (dst_load < src_load)
-		swap(dst_load, src_load);
-
-	/* Is the difference below the threshold? */
-	imb = dst_load * src_capacity * 100 -
-	      src_load * dst_capacity * env->imbalance_pct;
-	if (imb <= 0)
+	/* Does the difference in either direction exceed the threshold? */
+	imb1 = dst_load * src_capacity * 100 -
+	       src_load * dst_capacity * env->imbalance_pct;
+	imb2 = src_load * dst_capacity * 100 -
+	       dst_load * src_capacity * env->imbalance_pct;
+	if (imb1 <= 0 && imb2 <= 0)
 		return false;
 
 	/*
-	 * The imbalance is above the allowed threshold.
-	 * Compare it with the old imbalance.
+	 * At least one imbalance is above the allowed threshold.
+	 * Compare it with the original imbalance.
 	 */
 	orig_src_load = env->src_stats.load;
 	orig_dst_load = env->dst_stats.load;
 
-	if (orig_dst_load < orig_src_load)
-		swap(orig_dst_load, orig_src_load);
-
-	old_imb = orig_dst_load * src_capacity * 100 -
-		  orig_src_load * dst_capacity * env->imbalance_pct;
+	orig_imb1 = orig_dst_load * src_capacity * 100 -
+		    orig_src_load * dst_capacity * env->imbalance_pct;
+	orig_imb2 = orig_src_load * dst_capacity * 100 -
+		    orig_dst_load * src_capacity * env->imbalance_pct;
 
 	/* Would this change make things worse? */
-	return (imb > old_imb);
+	if (imb1 > 0 && imb1 > orig_imb1)
+		return true;
+	if (imb2 > 0 && imb2 > orig_imb2)
+		return true;
+
+	return false;
 }
 
 /*
--
1.9.1
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
[PATCH] Rename RECLAIM_SWAP to RECLAIM_UNMAP.
The name SWAP implies that we are dealing with anonymous pages only. In
fact, the original patch that introduced the min_unmapped_ratio logic was
to fix an issue related to file pages. Rename it to RECLAIM_UNMAP to match
what it does. Historically, commit a6dc60f8975a renamed .may_swap to
.may_unmap, leaving RECLAIM_SWAP behind. Commit 2e2e42598908 reintroduced
.may_swap for the memory controller.

Signed-off-by: Zhihui Zhang <zzhs...@gmail.com>
---
 mm/vmscan.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 5e8eadd..15328de 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3596,7 +3596,7 @@ int zone_reclaim_mode __read_mostly;
 #define RECLAIM_OFF 0
 #define RECLAIM_ZONE (1<<0)	/* Run shrink_inactive_list on the zone */
 #define RECLAIM_WRITE (1<<1)	/* Writeout pages during reclaim */
-#define RECLAIM_SWAP (1<<2)	/* Swap pages out during reclaim */
+#define RECLAIM_UNMAP (1<<2)	/* Unmap pages during reclaim */
 
 /*
  * Priority for ZONE_RECLAIM. This determines the fraction of pages
@@ -3638,12 +3638,12 @@ static long zone_pagecache_reclaimable(struct zone *zone)
 	long delta = 0;
 
 	/*
-	 * If RECLAIM_SWAP is set, then all file pages are considered
+	 * If RECLAIM_UNMAP is set, then all file pages are considered
 	 * potentially reclaimable. Otherwise, we have to worry about
 	 * pages like swapcache and zone_unmapped_file_pages() provides
 	 * a better estimate
 	 */
-	if (zone_reclaim_mode & RECLAIM_SWAP)
+	if (zone_reclaim_mode & RECLAIM_UNMAP)
 		nr_pagecache_reclaimable = zone_page_state(zone, NR_FILE_PAGES);
 	else
 		nr_pagecache_reclaimable = zone_unmapped_file_pages(zone);
@@ -3674,15 +3674,15 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
 		.order = order,
 		.priority = ZONE_RECLAIM_PRIORITY,
 		.may_writepage = !!(zone_reclaim_mode & RECLAIM_WRITE),
-		.may_unmap = !!(zone_reclaim_mode & RECLAIM_SWAP),
+		.may_unmap = !!(zone_reclaim_mode & RECLAIM_UNMAP),
 		.may_swap = 1,
 	};
 
 	cond_resched();
 	/*
-	 * We need to be able to allocate from the reserves for RECLAIM_SWAP
+	 * We need to be able to allocate from the reserves for RECLAIM_UNMAP
 	 * and we also need to be able to write out pages for RECLAIM_WRITE
-	 * and RECLAIM_SWAP.
+	 * and RECLAIM_UNMAP.
 	 */
 	p->flags |= PF_MEMALLOC | PF_SWAPWRITE;
 	lockdep_set_current_reclaim_state(gfp_mask);
--
2.1.4
Re: [PATCH] Do not use arbitrary large movablecore to calculate kernelcore
If you specify movablecore > totalpages, required_kernelcore will end up
with a big number because corepages is an unsigned integer. If so, the
following nested loop is a waste of time. But I see your point.

-Zhihui

On Wed, Apr 1, 2015 at 7:00 PM, Mel Gorman <mgor...@suse.de> wrote:
> On Sat, Mar 28, 2015 at 11:36:02PM -0400, Zhihui Zhang wrote:
>> If kernelcore is not set, then we are working with a very large kernelcore
>> for nothing - no movable zone will be created. If kernelcore is set,
>> then it is not respected at all.
>>
>> Signed-off-by: Zhihui Zhang <zzhs...@gmail.com>
>
> I'm confused. What bug is this patch fixing? What is the user-visible
> impact of the patch?
[PATCH] Do not use arbitrary large movablecore to calculate kernelcore
If kernelcore is not set, then we are working with a very large kernelcore
for nothing - no movable zone will be created. If kernelcore is set, then
it is not respected at all.

Signed-off-by: Zhihui Zhang <zzhs...@gmail.com>
---
 mm/page_alloc.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 40e2942..32bf5da 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5199,7 +5199,11 @@ static void __init find_zone_movable_pfns_for_nodes(void)
 		 */
 		required_movablecore =
 			roundup(required_movablecore, MAX_ORDER_NR_PAGES);
-		corepages = totalpages - required_movablecore;
+
+		if (totalpages > required_movablecore)
+			corepages = totalpages - required_movablecore;
+		else
+			corepages = 0;
 
 		required_kernelcore = max(required_kernelcore, corepages);
 	}
--
1.9.1
[PATCH] [sched] Account the elapse of each period accurately
Currently, the decayed values of previous periods can spill into the lower
10 bits of runnable_avg_period. This makes the next period shorter than
1024 us. If we want to decay exactly every 1024 us - and I see no reason
not to (less math overhead and a consistent decay period among all tasks) -
we can use a separate field to track how much of the current period has
elapsed instead of overloading runnable_avg_period. This patch achieves
this.

Signed-off-by: Zhihui Zhang <zzhs...@gmail.com>
---
 include/linux/sched.h | 2 +-
 kernel/sched/fair.c   | 5 ++++-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 8db31ef..fa6b23b 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1116,7 +1116,7 @@ struct sched_avg {
 	 * above by 1024/(1-y). Thus we only need a u32 to store them for all
 	 * choices of y < 1-2^(-32)*1024.
 	 */
-	u32 runnable_avg_sum, runnable_avg_period;
+	u32 accrue, runnable_avg_sum, runnable_avg_period;
 	u64 last_runnable_update;
 	s64 decay_count;
 	unsigned long load_avg_contrib;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index df2cdf7..c87ecf5 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -676,6 +676,7 @@ void init_task_runnable_average(struct task_struct *p)
 {
 	u32 slice;
 
+	p->se.avg.accrue = 0;
 	p->se.avg.decay_count = 0;
 	slice = sched_slice(task_cfs_rq(p), &p->se) >> 10;
 	p->se.avg.runnable_avg_sum = slice;
@@ -2526,11 +2527,12 @@ static __always_inline int __update_entity_runnable_avg(u64 now,
 	sa->last_runnable_update = now;
 
 	/* delta_w is the amount already accumulated against our next period */
-	delta_w = sa->runnable_avg_period % 1024;
+	delta_w = sa->accrue;
 	if (delta + delta_w >= 1024) {
 		/* period roll-over */
 		decayed = 1;
+		sa->accrue = 0;
 
 		/*
 		 * Now that we know we're crossing a period boundary, figure
 		 * out how much from delta we need to complete the current
@@ -2558,6 +2560,7 @@ static __always_inline int __update_entity_runnable_avg(u64 now,
 		sa->runnable_avg_sum += runnable_contrib;
 		sa->runnable_avg_period += runnable_contrib;
 	}
+	sa->accrue += delta;
 
 	/* Remainder of delta accrued against u_0` */
 	if (runnable)
--
1.9.1
Re: [PATCH] [mempolicy] remove unnecessary is_valid_nodemask()
The filtering occurs in mpol_set_nodemask(); it reads like this:

	if (pol->flags & MPOL_F_RELATIVE_NODES)
		mpol_relative_nodemask(&nsc->mask2, nodes, &nsc->mask1);
	else
		nodes_and(nsc->mask2, *nodes, nsc->mask1);

so mask2 is based on mask1. mask2 is only used when nodes is not NULL
later, so we don't care about the case of (pol->mode == MPOL_PREFERRED &&
nodes_empty(*nodes)).

-Zhihui

On Mon, Nov 17, 2014 at 6:08 PM, Andrew Morton <a...@linux-foundation.org> wrote:
> On Sat, 15 Nov 2014 21:49:57 -0500 Zhihui Zhang <zzhs...@gmail.com> wrote:
>
>> When nodes is true, nsc->mask2 has already been filtered by nsc->mask1,
>> which has already factored in node_states[N_MEMORY].
>
> Please be more specific. Where does that filtering occur?
[PATCH] [mempolicy] remove unnecessary is_valid_nodemask()
When nodes is true, nsc->mask2 has already been filtered by nsc->mask1, which has already factored in node_states[N_MEMORY].

Signed-off-by: Zhihui Zhang <zzhs...@gmail.com>
---
 mm/mempolicy.c | 10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index e58725a..f22c559 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -162,12 +162,6 @@ static const struct mempolicy_operations {
 			enum mpol_rebind_step step);
 } mpol_ops[MPOL_MAX];
 
-/* Check that the nodemask contains at least one populated zone */
-static int is_valid_nodemask(const nodemask_t *nodemask)
-{
-	return nodes_intersects(*nodemask, node_states[N_MEMORY]);
-}
-
 static inline int mpol_store_user_nodemask(const struct mempolicy *pol)
 {
 	return pol->flags & MPOL_MODE_FLAGS;
@@ -202,7 +196,7 @@ static int mpol_new_preferred(struct mempolicy *pol, const nodemask_t *nodes)
 
 static int mpol_new_bind(struct mempolicy *pol, const nodemask_t *nodes)
 {
-	if (!is_valid_nodemask(nodes))
+	if (nodes_empty(*nodes))
 		return -EINVAL;
 	pol->v.nodes = *nodes;
 	return 0;
@@ -234,7 +228,7 @@ static int mpol_set_nodemask(struct mempolicy *pol,
 		nodes = NULL;	/* explicit local allocation */
 	else {
 		if (pol->flags & MPOL_F_RELATIVE_NODES)
-			mpol_relative_nodemask(&nsc->mask2, nodes,&nsc->mask1);
+			mpol_relative_nodemask(&nsc->mask2, nodes, &nsc->mask1);
 		else
 			nodes_and(nsc->mask2, *nodes, nsc->mask1);
--
1.9.1
Re: [PATCH] [percpu] Make the unit size of the first chunk the same as other chunks
I see your point. Thanks.

-Zhihui

On Wed, Oct 29, 2014 at 12:18 AM, Tejun Heo wrote:
> Please restore lkml cc when replying.
>
> On Tue, Oct 28, 2014 at 08:12:30PM -0400, Zhihui Zhang wrote:
>> My patch just increases the dynamic area in the first chunk slightly
>> to cover the round-up surplus. On my 64-bit laptop, it is 12288 bytes.
>
> As I wrote before, it's 12288 bytes on your laptop but it can be much
> larger on other setups.
>
>> It will most likely be used, and in fact, a second chunk will most
>> likely be needed as well. So in theory you are right, but in
>> practice, it probably won't matter.
>
> If the initial dynamic reserve is too small, which could be the case
> given the overall increase in percpu memory, increase
> PERCPU_DYNAMIC_EARLY_SIZE.
>
> --
> tejun
Re: [PATCH] [percpu] Make the unit size of the first chunk the same as other chunks
In pcpu_embed_first_chunk(), we allocate the full unit size for each CPU in the first chunk:

1981		/* allocate space for the whole group */
1982		ptr = alloc_fn(cpu, gi->nr_units * ai->unit_size, atom_size);
1983		if (!ptr) {
1984			rc = -ENOMEM;
1985			goto out_free_areas;
1986		}

Later we free the unused part:

2009			/* copy and return the unused part */
2010			memcpy(ptr, __per_cpu_load, ai->static_size);
2011			free_fn(ptr + size_sum, ai->unit_size - size_sum);

I am trying to make each CPU have a full unit size in the first chunk, the same as in all other chunks. Does this make sense?

-Zhihui

On Mon, Oct 27, 2014 at 10:08 AM, Tejun Heo wrote:
> On Sat, Oct 25, 2014 at 11:05:58AM -0400, Zhihui Zhang wrote:
>> Since we have already allocated the full unit size for the first chunk,
>> we might as well use it so that the unit sizes are the same for all
>> chunks. The page first chunk allocator already has this effect because
>> it allocates one page at a time.
>
> I'm not following. Where do we allocate the full unit size for the
> first chunk?
>
> Thanks.
>
> --
> tejun
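The arithmetic under discussion can be sketched as follows, with hypothetical sizes chosen so the freed tail matches the 12288 bytes quoted elsewhere in the thread (a toy model, not kernel code):

```c
/* Hypothetical first-chunk sizes in bytes (not from a real machine). */
struct chunk_info {
	unsigned long static_size, reserved_size, dyn_size, unit_size;
};

/* How much of each CPU's full unit_size allocation is handed back by
 * free_fn(ptr + size_sum, unit_size - size_sum) in the code above. */
unsigned long freed_per_cpu(const struct chunk_info *ai)
{
	unsigned long size_sum = ai->static_size + ai->reserved_size +
				 ai->dyn_size;

	return ai->unit_size - size_sum;	/* the round-up surplus */
}
```

With static = 20480, reserved = 8192 and dynamic = 28672 against a 69632-byte unit, each CPU hands back 12288 bytes of an allocation that was already made; the proposal is to keep that tail in the chunk instead.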
[PATCH] [percpu] Make the unit size of the first chunk the same as other chunks
Since we have already allocated the full unit size for the first chunk, we might as well use it so that the unit sizes are the same for all chunks. The page first chunk allocator already has this effect because it allocates one page at a time.

Signed-off-by: Zhihui Zhang <zzhs...@gmail.com>
---
 mm/percpu.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/percpu.c b/mm/percpu.c
index 014bab6..7242360 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -1960,6 +1960,7 @@ int __init pcpu_embed_first_chunk(size_t reserved_size, size_t dyn_size,
 		return PTR_ERR(ai);
 
 	size_sum = ai->static_size + ai->reserved_size + ai->dyn_size;
+	ai->dyn_size += ai->unit_size - size_sum;
 
 	areas_size = PFN_ALIGN(ai->nr_groups * sizeof(void *));
 	areas = memblock_virt_alloc_nopanic(areas_size, 0);
@@ -2006,9 +2007,8 @@ int __init pcpu_embed_first_chunk(size_t reserved_size, size_t dyn_size,
 			free_fn(ptr, ai->unit_size);
 			continue;
 		}
-		/* copy and return the unused part */
+		/* copy static data */
 		memcpy(ptr, __per_cpu_load, ai->static_size);
-		free_fn(ptr + size_sum, ai->unit_size - size_sum);
 	}
 }
 
@@ -2034,7 +2034,7 @@ int __init pcpu_embed_first_chunk(size_t reserved_size, size_t dyn_size,
 	}
 
 	pr_info("PERCPU: Embedded %zu pages/cpu @%p s%zu r%zu d%zu u%zu\n",
-		PFN_DOWN(size_sum), base, ai->static_size, ai->reserved_size,
+		PFN_DOWN(ai->unit_size), base, ai->static_size, ai->reserved_size,
 		ai->dyn_size, ai->unit_size);
 
 	rc = pcpu_setup_first_chunk(ai, base);
--
1.8.1.2
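The effect of the one-line change ai->dyn_size += ai->unit_size - size_sum can be sketched like this (sizes hypothetical, in bytes): the round-up surplus that used to be freed back is folded into the dynamic area, so static + reserved + dynamic grows to exactly one unit.

```c
/* Dynamic-area size after the patch folds the round-up surplus
 * (unit_size - size_sum) into ai->dyn_size.  Toy model, not kernel code. */
unsigned long patched_dyn_size(unsigned long static_size,
			       unsigned long reserved_size,
			       unsigned long dyn_size,
			       unsigned long unit_size)
{
	unsigned long size_sum = static_size + reserved_size + dyn_size;

	return dyn_size + (unit_size - size_sum);
}
```

With the same hypothetical 20480/8192/28672 split in a 69632-byte unit, the dynamic area grows from 28672 to 40960 bytes and the three sizes now sum to the full unit, which is why the free_fn() call and the PFN_DOWN(size_sum) in the pr_info() both go away.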
[tip:sched/core] sched: Clean up some typos and grammatical errors in code/comments
Commit-ID:  9c58c79a8a76c510cd3a5012c536d4fe3c81ec3b
Gitweb:     http://git.kernel.org/tip/9c58c79a8a76c510cd3a5012c536d4fe3c81ec3b
Author:     Zhihui Zhang <zzhs...@gmail.com>
AuthorDate: Sat, 20 Sep 2014 21:24:36 -0400
Committer:  Ingo Molnar <mi...@kernel.org>
CommitDate: Sun, 21 Sep 2014 09:00:02 +0200

sched: Clean up some typos and grammatical errors in code/comments

Signed-off-by: Zhihui Zhang <zzhs...@gmail.com>
Cc: pet...@infradead.org
Link: http://lkml.kernel.org/r/1411262676-19928-1-git-send-email-zzhs...@gmail.com
Signed-off-by: Ingo Molnar <mi...@kernel.org>
---
 kernel/sched/core.c  | 4 ++--
 kernel/sched/fair.c  | 6 +++---
 kernel/sched/sched.h | 2 +-
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 61ee2b3..a284190 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -8069,7 +8069,7 @@ static int tg_cfs_schedulable_down(struct task_group *tg, void *data)
 		struct cfs_bandwidth *parent_b = &tg->parent->cfs_bandwidth;
 
 		quota = normalize_cfs_quota(tg, d);
-		parent_quota = parent_b->hierarchal_quota;
+		parent_quota = parent_b->hierarchical_quota;
 
 		/*
 		 * ensure max(child_quota) <= parent_quota, inherit when no
@@ -8080,7 +8080,7 @@ static int tg_cfs_schedulable_down(struct task_group *tg, void *data)
 		else if (parent_quota != RUNTIME_INF && quota > parent_quota)
 			return -EINVAL;
 	}
-	cfs_b->hierarchal_quota = quota;
+	cfs_b->hierarchical_quota = quota;
 
 	return 0;
 }
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 74fa2c2..2a1e6ac 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2224,8 +2224,8 @@ static __always_inline u64 decay_load(u64 val, u64 n)
 
 /*
  * As y^PERIOD = 1/2, we can combine
- *    y^n = 1/2^(n/PERIOD) * k^(n%PERIOD)
- * With a look-up table which covers k^n (n<PERIOD)
+ *    y^n = 1/2^(n/PERIOD) * y^(n%PERIOD)
+ * With a look-up table which covers y^n (n<PERIOD)
 *
 * To achieve constant time decay_load.
 */
@@ -6410,7 +6410,7 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
 		goto force_balance;
 
 	/*
-	 * If the local group is more busy than the selected busiest group
+	 * If the local group is busier than the selected busiest group
 	 * don't try and pull any tasks.
 	 */
 	if (local->avg_load >= busiest->avg_load)
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index aa0f73b..1bc6aad 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -188,7 +188,7 @@ struct cfs_bandwidth {
 	raw_spinlock_t lock;
 	ktime_t period;
 	u64 quota, runtime;
-	s64 hierarchal_quota;
+	s64 hierarchical_quota;
 	u64 runtime_expires;
 	int idle, timer_active;
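The comment being corrected describes the constant-time trick in decay_load(): y^n = (1/2)^(n/PERIOD) * y^(n%PERIOD). A floating-point sketch of that decomposition (the kernel uses 32-bit fixed-point arithmetic and a precomputed look-up table instead of the loops below):

```c
#define PERIOD 32			/* y^PERIOD = 1/2 */
static const double Y = 0.97857206;	/* y, chosen so that y^32 ~= 1/2 */

static double ypow(unsigned int k)	/* y^k by repeated multiplication */
{
	double r = 1.0;

	while (k--)
		r *= Y;
	return r;
}

/* Naive O(n) decay: multiply by y once per elapsed period. */
double decay_direct(double val, unsigned int n)
{
	return val * ypow(n);
}

/* Form from the corrected comment: halve once per full PERIOD (a
 * single shift in the kernel), then one factor y^(n%PERIOD) that a
 * PERIOD-entry table can cover. */
double decay_fast(double val, unsigned int n)
{
	unsigned int i;

	for (i = 0; i < n / PERIOD; i++)
		val /= 2.0;
	return val * ypow(n % PERIOD);
}
```

Both forms agree because y^n = (y^PERIOD)^(n/PERIOD) * y^(n%PERIOD) and y^PERIOD is defined to be 1/2; the typo fix is that the residual factor is y^(n%PERIOD), not k^(n%PERIOD).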
[PATCH] [sched] Clean up some typos and grammatical errors in code/comments
Well, the subject line says it all.

Signed-off-by: Zhihui Zhang <zzhs...@gmail.com>
---
 kernel/sched/core.c  | 4 ++--
 kernel/sched/fair.c  | 6 +++---
 kernel/sched/sched.h | 2 +-
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ec1a286..eb5505f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -8005,7 +8005,7 @@ static int tg_cfs_schedulable_down(struct task_group *tg, void *data)
 		struct cfs_bandwidth *parent_b = &tg->parent->cfs_bandwidth;
 
 		quota = normalize_cfs_quota(tg, d);
-		parent_quota = parent_b->hierarchal_quota;
+		parent_quota = parent_b->hierarchical_quota;
 
 		/*
 		 * ensure max(child_quota) <= parent_quota, inherit when no
@@ -8016,7 +8016,7 @@ static int tg_cfs_schedulable_down(struct task_group *tg, void *data)
 		else if (parent_quota != RUNTIME_INF && quota > parent_quota)
 			return -EINVAL;
 	}
-	cfs_b->hierarchal_quota = quota;
+	cfs_b->hierarchical_quota = quota;
 
 	return 0;
 }
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index bfa3c86..6d83845 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2211,8 +2211,8 @@ static __always_inline u64 decay_load(u64 val, u64 n)
 
 /*
  * As y^PERIOD = 1/2, we can combine
- *    y^n = 1/2^(n/PERIOD) * k^(n%PERIOD)
- * With a look-up table which covers k^n (n<PERIOD)
+ *    y^n = 1/2^(n/PERIOD) * y^(n%PERIOD)
+ * With a look-up table which covers y^n (n<PERIOD)
 *
 * To achieve constant time decay_load.
 */
@@ -6346,7 +6346,7 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
 		goto force_balance;
 
 	/*
-	 * If the local group is more busy than the selected busiest group
+	 * If the local group is busier than the selected busiest group
 	 * don't try and pull any tasks.
 	 */
 	if (local->avg_load >= busiest->avg_load)
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 579712f..80b124d 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -184,7 +184,7 @@ struct cfs_bandwidth {
 	raw_spinlock_t lock;
 	ktime_t period;
 	u64 quota, runtime;
-	s64 hierarchal_quota;
+	s64 hierarchical_quota;
 	u64 runtime_expires;
 	int idle, timer_active;
--
1.8.1.2
[tip:sched/core] sched: Rename a misleading variable in build_overlap_sched_groups()
Commit-ID:  aaecac4ad46b35ad308245384d019633fb9bc21b
Gitweb:     http://git.kernel.org/tip/aaecac4ad46b35ad308245384d019633fb9bc21b
Author:     Zhihui Zhang <zzhs...@gmail.com>
AuthorDate: Fri, 1 Aug 2014 21:18:03 -0400
Committer:  Ingo Molnar <mi...@kernel.org>
CommitDate: Tue, 12 Aug 2014 12:48:21 +0200

sched: Rename a misleading variable in build_overlap_sched_groups()

The child variable in build_overlap_sched_groups() actually refers to the
peer or sibling domain of the given CPU. Rename it to sibling to be
consistent with the naming in build_group_mask().

Signed-off-by: Zhihui Zhang <zzhs...@gmail.com>
Signed-off-by: Peter Zijlstra <pet...@infradead.org>
Cc: Linus Torvalds <torva...@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/1406942283-18249-1-git-send-email-zzhs...@gmail.com
Signed-off-by: Ingo Molnar <mi...@kernel.org>
---
 kernel/sched/core.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 1211575..7d1ec6e 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5739,7 +5739,7 @@ build_overlap_sched_groups(struct sched_domain *sd, int cpu)
 	const struct cpumask *span = sched_domain_span(sd);
 	struct cpumask *covered = sched_domains_tmpmask;
 	struct sd_data *sdd = sd->private;
-	struct sched_domain *child;
+	struct sched_domain *sibling;
 	int i;
 
 	cpumask_clear(covered);
@@ -5750,10 +5750,10 @@ build_overlap_sched_groups(struct sched_domain *sd, int cpu)
 		if (cpumask_test_cpu(i, covered))
 			continue;
 
-		child = *per_cpu_ptr(sdd->sd, i);
+		sibling = *per_cpu_ptr(sdd->sd, i);
 
 		/* See the comment near build_group_mask(). */
-		if (!cpumask_test_cpu(i, sched_domain_span(child)))
+		if (!cpumask_test_cpu(i, sched_domain_span(sibling)))
 			continue;
 
 		sg = kzalloc_node(sizeof(struct sched_group) + cpumask_size(),
@@ -5763,10 +5763,9 @@ build_overlap_sched_groups(struct sched_domain *sd, int cpu)
 			goto fail;
 
 		sg_span = sched_group_cpus(sg);
-		if (child->child) {
-			child = child->child;
-			cpumask_copy(sg_span, sched_domain_span(child));
-		} else
+		if (sibling->child)
+			cpumask_copy(sg_span, sched_domain_span(sibling->child));
+		else
 			cpumask_set_cpu(i, sg_span);
 
 		cpumask_or(covered, covered, sg_span);
[PATCH] [sched] Rename a misleading variable in build_overlap_sched_groups()
The child variable in build_overlap_sched_groups() actually refers to the peer or sibling domain of the given CPU. Rename it to sibling to be consistent with the naming in build_group_mask().

Signed-off-by: Zhihui Zhang <zzhs...@gmail.com>
---
 kernel/sched/core.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index bc1638b..8ba66006 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5720,7 +5720,7 @@ build_overlap_sched_groups(struct sched_domain *sd, int cpu)
 	const struct cpumask *span = sched_domain_span(sd);
 	struct cpumask *covered = sched_domains_tmpmask;
 	struct sd_data *sdd = sd->private;
-	struct sched_domain *child;
+	struct sched_domain *sibling;
 	int i;
 
 	cpumask_clear(covered);
@@ -5731,10 +5731,10 @@ build_overlap_sched_groups(struct sched_domain *sd, int cpu)
 		if (cpumask_test_cpu(i, covered))
 			continue;
 
-		child = *per_cpu_ptr(sdd->sd, i);
+		sibling = *per_cpu_ptr(sdd->sd, i);
 
 		/* See the comment near build_group_mask(). */
-		if (!cpumask_test_cpu(i, sched_domain_span(child)))
+		if (!cpumask_test_cpu(i, sched_domain_span(sibling)))
 			continue;
 
 		sg = kzalloc_node(sizeof(struct sched_group) + cpumask_size(),
@@ -5744,10 +5744,9 @@ build_overlap_sched_groups(struct sched_domain *sd, int cpu)
 			goto fail;
 
 		sg_span = sched_group_cpus(sg);
-		if (child->child) {
-			child = child->child;
-			cpumask_copy(sg_span, sched_domain_span(child));
-		} else
+		if (sibling->child)
+			cpumask_copy(sg_span, sched_domain_span(sibling->child));
+		else
 			cpumask_set_cpu(i, sg_span);
 
 		cpumask_or(covered, covered, sg_span);
--
1.8.1.2
Re: [PATCH] [sched] Don't account time after deadline twice
We calculate the difference between two readings of a clock to see how much time has elapsed. Part of the time in rq_clock(rq) - dl_se->deadline can indeed be accounted for by reading a different clock (i.e., rq_clock_task()) if the task was running during the period. And that is how dl_se->runtime is obtained. After all, both clocks are running independently, right? Furthermore, the caller of dl_runtime_exceeded() will still use rq_clock() and dl_se->deadline to determine whether we throttle or replenish. Anyway, I have failed to see any stealing of time. Could you please give a concrete example (perhaps with numbers)?

thanks,

-Zhihui

On Thu, Jul 3, 2014 at 5:50 AM, Juri Lelli wrote:
> On Wed, 2 Jul 2014 19:44:04 -0400 Zhihui Zhang wrote:
>
>> My point is that rq_clock(rq) - dl_se->deadline is already part of
>> dl_se->runtime, which is decremented before calling dl_runtime_exceeded().
>
> But, we decrement dl_se->runtime looking at rq_clock_task(rq), that is
> in general <= rq_clock(rq), that we use to handle deadlines. So, if we
> do like you suggest, in some cases we could end up stealing some
> bandwidth from the system. Indeed, we prefer some pessimism here.
>
> Thanks,
>
> - Juri
>
>> So the following line is not needed in the case of both overrun and missing
>> deadline:
>>
>> dl_se->runtime -= rq_clock(rq) - dl_se->deadline;
>>
>> Or did I miss anything?
>>
>> thanks,
>>
>> On Tue, Jul 1, 2014 at 9:59 AM, Juri Lelli wrote:
>>
>> > On Tue, 1 Jul 2014 15:08:16 +0200
>> > Peter Zijlstra wrote:
>> >
>> > > On Sun, Jun 29, 2014 at 09:26:10PM -0400, Zhihui Zhang wrote:
>> > > > Unless we want to double-penalize an overrun task, the time after the deadline
>> > > > and before the current time is already accounted in the negative dl_se->runtime
>> > > > value. So we can leave it as is in the case of dmiss && rorun.
>> > >
>> > > Juri?
>> > >
>> > > > Signed-off-by: Zhihui Zhang
>> > > > ---
>> > > >  kernel/sched/deadline.c | 6 ++----
>> > > >  1 file changed, 2 insertions(+), 4 deletions(-)
>> > > >
>> > > > diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
>> > > > index fc4f98b1..67df0d6 100644
>> > > > --- a/kernel/sched/deadline.c
>> > > > +++ b/kernel/sched/deadline.c
>> > > > @@ -579,10 +579,8 @@ int dl_runtime_exceeded(struct rq *rq, struct sched_dl_entity *dl_se)
>> > > >  	 * the next instance. Thus, if we do not account that, we are
>> > > >  	 * stealing bandwidth from the system at each deadline miss!
>> > > >  	 */
>> > > > -	if (dmiss) {
>> >
>> > If we didn't return 0 before, we are going to throttle (or replenish)
>> > the entity, and you want runtime to be <=0. So, this is needed.
>> >
>> > > > -		dl_se->runtime = rorun ? dl_se->runtime : 0;
>> > > > -		dl_se->runtime -= rq_clock(rq) - dl_se->deadline;
>> > > > -	}
>> >
>> > A little pessimism in some cases, due to the fact that we use both
>> > rq_clock and rq_clock_task (for the budget).
>> >
>> > Thanks,
>> >
>> > - Juri
>> >
>> > > > +	if (dmiss && !rorun)
>> > > > +		dl_se->runtime = dl_se->deadline - rq_clock(rq);
>> > > >
>> > > >  	return 1;
>> > > >  }
>> > > > --
>> > > > 1.8.1.2
[PATCH] [sched] Don't account time after deadline twice
Unless we want to double-penalize an overrun task, the time after the deadline
and before the current time is already accounted in the negative
dl_se->runtime value. So we can leave it as is in the case of dmiss && rorun.

Signed-off-by: Zhihui Zhang <zzhs...@gmail.com>
---
 kernel/sched/deadline.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index fc4f98b1..67df0d6 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -579,10 +579,8 @@ int dl_runtime_exceeded(struct rq *rq, struct sched_dl_entity *dl_se)
 	 * the next instance. Thus, if we do not account that, we are
 	 * stealing bandwidth from the system at each deadline miss!
 	 */
-	if (dmiss) {
-		dl_se->runtime = rorun ? dl_se->runtime : 0;
-		dl_se->runtime -= rq_clock(rq) - dl_se->deadline;
-	}
+	if (dmiss && !rorun)
+		dl_se->runtime = dl_se->deadline - rq_clock(rq);
 
 	return 1;
 }
-- 
1.8.1.2
[PATCH] Use LOAD_PHYSICAL_ADDR in vmlinux.lds.S
This unifies the way to specify start VMA on both 32 and 64-bit platforms. I
would like to remove __PHYSICAL_START as well, but that appears to be harder.

Signed-off-by: Zhihui Zhang <zzhs...@gmail.com>
---
 arch/x86/kernel/vmlinux.lds.S | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index da6b35a..e81bf49 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -81,11 +81,10 @@ PHDRS {
 
 SECTIONS
 {
-#ifdef CONFIG_X86_32
 	. = LOAD_OFFSET + LOAD_PHYSICAL_ADDR;
+#ifdef CONFIG_X86_32
 	phys_startup_32 = startup_32 - LOAD_OFFSET;
 #else
-	. = __START_KERNEL;
 	phys_startup_64 = startup_64 - LOAD_OFFSET;
 #endif
-- 
1.8.1.2
[PATCH] F2FS: Fix the logic of IS_DNODE()
Signed-off-by: Zhihui Zhang <zzhs...@gmail.com>
---
 fs/f2fs/node.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/f2fs/node.h b/fs/f2fs/node.h
index afdb130..2be47b2 100644
--- a/fs/f2fs/node.h
+++ b/fs/f2fs/node.h
@@ -239,7 +239,7 @@ static inline bool IS_DNODE(struct page *node_page)
 		return false;
 	if (ofs >= 6 + 2 * NIDS_PER_BLOCK) {
 		ofs -= 6 + 2 * NIDS_PER_BLOCK;
-		if ((long int)ofs % (NIDS_PER_BLOCK + 1))
+		if (!((long int)ofs % (NIDS_PER_BLOCK + 1)))
 			return false;
 	}
 	return true;
-- 
1.7.9.5