Re: More latency regressions with 2.6.11-rc4-RT-V0.7.39-02
On Fri, 2005-02-25 at 05:58 +, Hugh Dickins wrote:
> On Thu, 24 Feb 2005, Lee Revell wrote:
> > On Thu, 2005-02-24 at 08:26 +, Hugh Dickins wrote:
> > >
> > > If we'd got to it earlier, yes.  But 2.6.11 looks to be just a day or
> > > two away, and we've no idea why zap_pte_range or clear_page_range
> > > would have reverted.  Nor have we heard from Ingo yet.
> >
> > It's also not clear that the patch completely fixes the copy_pte_range
> > latency.  This trace is from the Athlon XP.
>
> Then we need Ingo to investigate and explain all these reversions.
> I'm not _blaming_ Ingo for them, but I'm not familiar with his patches
> nor with deciphering latency traces - he's the magician around here.

Yup.  Oh well.  I'll try to compile a comprehensive list of these so we
can fix them for 2.6.12.

Lee

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel"
in the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: More latency regressions with 2.6.11-rc4-RT-V0.7.39-02
On Thu, 24 Feb 2005, Lee Revell wrote:
> On Thu, 2005-02-24 at 08:26 +, Hugh Dickins wrote:
> >
> > If we'd got to it earlier, yes.  But 2.6.11 looks to be just a day or
> > two away, and we've no idea why zap_pte_range or clear_page_range
> > would have reverted.  Nor have we heard from Ingo yet.
>
> It's also not clear that the patch completely fixes the copy_pte_range
> latency.  This trace is from the Athlon XP.

Then we need Ingo to investigate and explain all these reversions.
I'm not _blaming_ Ingo for them, but I'm not familiar with his patches
nor with deciphering latency traces - he's the magician around here.

Hugh
Re: More latency regressions with 2.6.11-rc4-RT-V0.7.39-02
On Thu, 2005-02-24 at 08:26 +, Hugh Dickins wrote:
> On Thu, 24 Feb 2005, Lee Revell wrote:
> > On Thu, 2005-02-24 at 04:56 +, Hugh Dickins wrote:
> > >
> > > In other mail, you do expect people still to be using Ingo's patches,
> > > so probably this patch should stick there (and in -mm) for now.
> >
> > Well all of these were fixed in the past so it may not be unreasonable
> > to fix them for 2.6.11.
>
> If we'd got to it earlier, yes.  But 2.6.11 looks to be just a day or
> two away, and we've no idea why zap_pte_range or clear_page_range
> would have reverted.  Nor have we heard from Ingo yet.

It's also not clear that the patch completely fixes the copy_pte_range
latency.  This trace is from the Athlon XP.

Lee

[latency trace: preemption latency trace v1.1.4 on 2.6.11-rc4-RT-V0.7.39-02;
 latency: 284 µs, #25/25 entries, CPU#0, task ksoftirqd/0-2.  dpkg wakes
 ksoftirqd/0, then roughly 276 µs elapse inside copy_pte_range before
 preempt_schedule / __cond_resched_raw_spinlock / __schedule finally
 switch away to it via trace_stop_sched_switched.]
Re: More latency regressions with 2.6.11-rc4-RT-V0.7.39-02
On Thu, 24 Feb 2005, Lee Revell wrote:
> On Thu, 2005-02-24 at 04:56 +, Hugh Dickins wrote:
> >
> > In other mail, you do expect people still to be using Ingo's patches,
> > so probably this patch should stick there (and in -mm) for now.
>
> Well all of these were fixed in the past so it may not be unreasonable
> to fix them for 2.6.11.

If we'd got to it earlier, yes.  But 2.6.11 looks to be just a day or
two away, and we've no idea why zap_pte_range or clear_page_range
would have reverted.  Nor have we heard from Ingo yet.

Hugh
Re: More latency regressions with 2.6.11-rc4-RT-V0.7.39-02
On Thu, 2005-02-24 at 04:56 +, Hugh Dickins wrote:
> On Wed, 23 Feb 2005, Lee Revell wrote:
> > On Wed, 2005-02-23 at 20:53 +, Hugh Dickins wrote:
> > > On Wed, 23 Feb 2005, Hugh Dickins wrote:
> > > > Please replace by new patch below, which I'm now running through
> > > > lmbench.
> > >
> > > That second patch seems fine, and I see no lmbench regression from it.
> >
> > Should go into 2.6.11, right?
>
> That's up to Andrew (and Linus).
>
> I was thinking that way when I rushed you the patch.  But given that
> you have remaining unresolved latency issues nearby (zap_pte_range,
> clear_page_range), and given the warning shot that I screwed up my
> first attempt, I'd be inclined to say hold off.
>
> It's a pity: for a while we were thinking 2.6.11 would be a big step
> forward for mainline latency; but it now looks to me like these tests
> have come too late in the cycle to be dealt with safely.
>
> In other mail, you do expect people still to be using Ingo's patches,
> so probably this patch should stick there (and in -mm) for now.

Well all of these were fixed in the past so it may not be unreasonable
to fix them for 2.6.11.

Lee
Re: More latency regressions with 2.6.11-rc4-RT-V0.7.39-02
On Wed, 23 Feb 2005, Lee Revell wrote:
> On Wed, 2005-02-23 at 20:53 +, Hugh Dickins wrote:
> > On Wed, 23 Feb 2005, Hugh Dickins wrote:
> > > Please replace by new patch below, which I'm now running through lmbench.
> >
> > That second patch seems fine, and I see no lmbench regression from it.
>
> Should go into 2.6.11, right?

That's up to Andrew (and Linus).

I was thinking that way when I rushed you the patch.  But given that
you have remaining unresolved latency issues nearby (zap_pte_range,
clear_page_range), and given the warning shot that I screwed up my
first attempt, I'd be inclined to say hold off.

It's a pity: for a while we were thinking 2.6.11 would be a big step
forward for mainline latency; but it now looks to me like these tests
have come too late in the cycle to be dealt with safely.

In other mail, you do expect people still to be using Ingo's patches,
so probably this patch should stick there (and in -mm) for now.

Hugh
Re: More latency regressions with 2.6.11-rc4-RT-V0.7.39-02
On Thu, 2005-02-24 at 13:41 +1100, Nick Piggin wrote:
> Lee Revell wrote:
> >
> > Agreed, it would be much better to optimize this away than just add a
> > scheduling point.  It seems like we could do this lazily.
> >
>
> Oh? What do you mean by lazy? IMO it is sort of implemented lazily now.
> That is, we are too lazy to refcount page table pages in fastpaths, so
> that pushes a lot of work to unmap time. Not necessarily a bad trade-off,
> mind you. Just something I'm looking into.

I guess I was thinking we could be even more lazy, and somehow defer it
until after unmap time (in lieu of memory pressure that is).  Actually
that's kind of what a lock break would do.

Lee
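[The "defer it until after unmap time" idea above can be sketched as a
batch-and-drain scheme in plain userspace C: queue the to-be-freed pages
while the locks are held (cheap), and pay the actual freeing cost later,
outside the critical section.  This is an illustrative sketch only - the
names free_batch, defer_free, drain_batch and unmap_range are invented,
not kernel APIs; the kernel's real batching at the time was the
mmu_gather/tlb_remove_page machinery, and no locking is modelled here.]

	#include <assert.h>
	#include <stddef.h>

	#define BATCH_MAX 64

	/* Invented for illustration: a fixed-size batch of pages queued
	 * for deferred freeing. */
	struct free_batch {
		void  *pages[BATCH_MAX];
		size_t nr;
	};

	static size_t pages_freed;

	static void free_one(void *page) { (void)page; pages_freed++; }

	/* Drain outside the lock: this is where the freeing cost lands,
	 * spread out instead of accumulating latency under the lock. */
	static void drain_batch(struct free_batch *b)
	{
		for (size_t i = 0; i < b->nr; i++)
			free_one(b->pages[i]);
		b->nr = 0;
	}

	/* Called from the unmap path with locks held: O(1), just queue.
	 * Nonzero return means the batch is full and the caller should
	 * drop its locks and drain. */
	static int defer_free(struct free_batch *b, void *page)
	{
		b->pages[b->nr++] = page;
		return b->nr == BATCH_MAX;
	}

	/* Toy unmap loop: queue npages pages, draining whenever the
	 * batch fills and once more after the loop completes. */
	static size_t unmap_range(size_t npages)
	{
		struct free_batch b = { .nr = 0 };
		pages_freed = 0;
		for (size_t i = 0; i < npages; i++) {
			if (defer_free(&b, NULL))
				drain_batch(&b);  /* real code: drop locks first */
		}
		drain_batch(&b);  /* final drain "after unmap time" */
		return pages_freed;
	}

[Every page still gets freed exactly once; only the *when* moves out of
the locked region - which is also roughly what a lock break achieves.]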
Re: More latency regressions with 2.6.11-rc4-RT-V0.7.39-02
Lee Revell wrote:
> On Thu, 2005-02-24 at 12:29 +1100, Nick Piggin wrote:
> > Lee Revell wrote:
> > >
> > > IIRC last time I really tested this a few months ago, the worst case
> > > latency on that machine was about 150us.  Currently its 422us from the
> > > same clear_page_range code path.
> >
> > Well it should be pretty trivial to add a break in there.
> > I don't think it can get into 2.6.11 at this point though,
> > so we'll revisit this for 2.6.12 if the clear_page_range
> > optimisations don't get anywhere.
>
> Agreed, it would be much better to optimize this away than just add a
> scheduling point.  It seems like we could do this lazily.

Oh? What do you mean by lazy? IMO it is sort of implemented lazily now.
That is, we are too lazy to refcount page table pages in fastpaths, so
that pushes a lot of work to unmap time. Not necessarily a bad trade-off,
mind you. Just something I'm looking into.
Re: More latency regressions with 2.6.11-rc4-RT-V0.7.39-02
On Thu, 2005-02-24 at 12:29 +1100, Nick Piggin wrote:
> Lee Revell wrote:
> >
> > IIRC last time I really tested this a few months ago, the worst case
> > latency on that machine was about 150us.  Currently its 422us from the
> > same clear_page_range code path.
>
> Well it should be pretty trivial to add a break in there.
> I don't think it can get into 2.6.11 at this point though,
> so we'll revisit this for 2.6.12 if the clear_page_range
> optimisations don't get anywhere.

Agreed, it would be much better to optimize this away than just add a
scheduling point.  It seems like we could do this lazily.

IMHO it's not critical that these latency fixes be merged until the VP
feature gets merged, until then people will be using Ingo's patches
anyway.

Lee
Re: More latency regressions with 2.6.11-rc4-RT-V0.7.39-02
Lee Revell wrote:
> On Thu, 2005-02-24 at 10:27 +1100, Nick Piggin wrote:
> >
> > If you are using i386 with 2-level page tables (no highmem), then
> > the behaviour should be more or less identical. Odd.
>
> IIRC last time I really tested this a few months ago, the worst case
> latency on that machine was about 150us.  Currently its 422us from the
> same clear_page_range code path.
>
> On my Athlon XP the clear_page_range latency is not showing up at all,
> and the worst delay so far is only 35us, most of which is the timer
> interrupt.  IOW that machine is showing the best achievable latency
> (with PREEMPT_DESKTOP).  The machine seeing 422 us latencies in
> clear_page_range is a 600Mhz C3, which is known to be a FSB limited
> architecture.

Well it should be pretty trivial to add a break in there.
I don't think it can get into 2.6.11 at this point though,
so we'll revisit this for 2.6.12 if the clear_page_range
optimisations don't get anywhere.

Nick
Re: More latency regressions with 2.6.11-rc4-RT-V0.7.39-02
On Thu, 2005-02-24 at 10:27 +1100, Nick Piggin wrote:
> Hugh Dickins wrote:
> > On Wed, 23 Feb 2005, Lee Revell wrote:
> > >
> > > Thanks, your patch fixes the copy_pte_range latency.
> > >
> > > clear_page_range is also problematic.
> >
> > Yes, I saw that from your other traces too.  I know there are plans
> > to improve clear_page_range during 2.6.12, but I didn't realize that
> > it had become very much worse than its antecedent clear_page_tables,
> > and I don't see missing latency fixes for that.  Nick's the expert.
>
> I wouldn't have thought it should have become worse, latency
> wise. What is actually happening is that the lower level freeing
> functions are being called more often. But this should result in
> the work being spread out more, if anything. Rather than in the
> old system things would tend to be batched up into bigger chunks
> (typically at exit() time).
>
> If you are using i386 with 2-level page tables (no highmem), then
> the behaviour should be more or less identical. Odd.

IIRC last time I really tested this a few months ago, the worst case
latency on that machine was about 150us.  Currently its 422us from the
same clear_page_range code path.

On my Athlon XP the clear_page_range latency is not showing up at all,
and the worst delay so far is only 35us, most of which is the timer
interrupt.  IOW that machine is showing the best achievable latency
(with PREEMPT_DESKTOP).  The machine seeing 422 us latencies in
clear_page_range is a 600Mhz C3, which is known to be a FSB limited
architecture.

Lee
Re: More latency regressions with 2.6.11-rc4-RT-V0.7.39-02
Hugh Dickins wrote:
> On Wed, 23 Feb 2005, Lee Revell wrote:
> >
> > Thanks, your patch fixes the copy_pte_range latency.
> >
> > clear_page_range is also problematic.
>
> Yes, I saw that from your other traces too.  I know there are plans
> to improve clear_page_range during 2.6.12, but I didn't realize that
> it had become very much worse than its antecedent clear_page_tables,
> and I don't see missing latency fixes for that.  Nick's the expert.

I wouldn't have thought it should have become worse, latency
wise. What is actually happening is that the lower level freeing
functions are being called more often. But this should result in
the work being spread out more, if anything. Rather than in the
old system things would tend to be batched up into bigger chunks
(typically at exit() time).

If you are using i386 with 2-level page tables (no highmem), then
the behaviour should be more or less identical. Odd.

Nick
Re: More latency regressions with 2.6.11-rc4-RT-V0.7.39-02
On Wed, 2005-02-23 at 21:03 +, Hugh Dickins wrote:
> On Wed, 23 Feb 2005, Lee Revell wrote:
> > >
> > > Thanks, your patch fixes the copy_pte_range latency.
> >
> > clear_page_range is also problematic.
>
> Yes, I saw that from your other traces too.

Heh, sorry, that one was a dupe... I should know to give the files
better names.

Lee
Re: More latency regressions with 2.6.11-rc4-RT-V0.7.39-02
On Wed, 2005-02-23 at 20:53 +, Hugh Dickins wrote:
> On Wed, 23 Feb 2005, Hugh Dickins wrote:
> > Please replace by new patch below, which I'm now running through lmbench.
>
> That second patch seems fine, and I see no lmbench regression from it.

Should go into 2.6.11, right?

Lee
Re: More latency regressions with 2.6.11-rc4-RT-V0.7.39-02
On Wed, 23 Feb 2005, Lee Revell wrote:
> > >
> > > Thanks, your patch fixes the copy_pte_range latency.
>
> clear_page_range is also problematic.

Yes, I saw that from your other traces too.  I know there are plans
to improve clear_page_range during 2.6.12, but I didn't realize that
it had become very much worse than its antecedent clear_page_tables,
and I don't see missing latency fixes for that.  Nick's the expert.

Hugh
Re: More latency regressions with 2.6.11-rc4-RT-V0.7.39-02
On Wed, 23 Feb 2005, Hugh Dickins wrote:
> Please replace by new patch below, which I'm now running through lmbench.

That second patch seems fine, and I see no lmbench regression from it.

Hugh
Re: More latency regressions with 2.6.11-rc4-RT-V0.7.39-02
On Wed, 2005-02-23 at 20:06 +, Hugh Dickins wrote:
> >
> > Thanks, your patch fixes the copy_pte_range latency.
>
> Great, if the previous patch fixed that latency then this new one
> will too, no need to report on that; but please get rid of the old
> patch before it leaks too many of your pages.

clear_page_range is also problematic.

Lee

[latency trace: preemption latency trace v1.1.4 on 2.6.11-rc4-RT-V0.7.39-02;
 latency: 313 µs, #291/291 entries, CPU#0, task ksoftirqd/0-2.  cc1plus
 wakes ksoftirqd/0, then loops in clear_page_range, each iteration calling
 __mod_page_state / free_page_and_swap_cache / __page_cache_release /
 free_hot_cold_page at roughly 5 µs per page freed; trace truncated.]
Re: More latency regressions with 2.6.11-rc4-RT-V0.7.39-02
On Wed, 2005-02-23 at 20:06 +, Hugh Dickins wrote:
> On Wed, 23 Feb 2005, Lee Revell wrote:
> > On Wed, 2005-02-23 at 19:16 +, Hugh Dickins wrote:
> > >
> > > I'm just about to test this patch below: please give it a try: thanks...
>
> I'm very sorry, there's two things wrong with that version: _must_
> increment addr before breaking out, and better to check after pte_none
> too (we can question whether it might be checking too often, but this
> replicates what Ingo was doing).  Please replace by new patch below,
> which I'm now running through lmbench.

OK, I will report any interesting results with the new patch.

Lee
Re: More latency regressions with 2.6.11-rc4-RT-V0.7.39-02
On Wed, 23 Feb 2005, Lee Revell wrote:
> On Wed, 2005-02-23 at 19:16 +, Hugh Dickins wrote:
> >
> > I'm just about to test this patch below: please give it a try: thanks...

I'm very sorry, there's two things wrong with that version: _must_
increment addr before breaking out, and better to check after pte_none
too (we can question whether it might be checking too often, but this
replicates what Ingo was doing).  Please replace by new patch below,
which I'm now running through lmbench.

> Aha, that explains why all the latency regressions involve the VM
> subsystem.
>
> Thanks, your patch fixes the copy_pte_range latency.

Great, if the previous patch fixed that latency then this new one
will too, no need to report on that; but please get rid of the old
patch before it leaks too many of your pages.

> Now zap_pte_range,
> which Ingo also fixed a few months ago, is the worst offender.  Can this
> fix be easily ported too?

That surprises me: all the zap_pte_range latency fixes I know of are
in 2.6.11-rc, perhaps Ingo knows of something missing there?

Hugh

Ingo's patch to reduce scheduling latencies, by checking for lockbreak
in copy_page_range, was in the -VP and -mm patchsets some months ago;
but got preempted by the 4level rework, and not reinstated since.
Restore it now in copy_pte_range - which mercifully makes it easier.

Signed-off-by: Hugh Dickins <[EMAIL PROTECTED]>

--- 2.6.11-rc4-bk/mm/memory.c	2005-02-21 11:32:19.0 +
+++ linux/mm/memory.c	2005-02-23 19:46:40.0 +
@@ -328,21 +328,33 @@ static int copy_pte_range(struct mm_stru
 	pte_t *s, *d;
 	unsigned long vm_flags = vma->vm_flags;
 
+again:
 	d = dst_pte = pte_alloc_map(dst_mm, dst_pmd, addr);
 	if (!dst_pte)
 		return -ENOMEM;
 	spin_lock(&src_mm->page_table_lock);
 	s = src_pte = pte_offset_map_nested(src_pmd, addr);
-	for (; addr < end; addr += PAGE_SIZE, s++, d++) {
-		if (pte_none(*s))
-			continue;
-		copy_one_pte(dst_mm, src_mm, d, s, vm_flags, addr);
+	for (; addr < end; s++, d++) {
+		if (!pte_none(*s))
+			copy_one_pte(dst_mm, src_mm, d, s, vm_flags, addr);
+		addr += PAGE_SIZE;
+		/*
+		 * We are holding two locks at this point - either of them
+		 * could generate latencies in another task on another CPU.
+		 */
+		if (need_resched() ||
+		    need_lockbreak(&src_mm->page_table_lock) ||
+		    need_lockbreak(&dst_mm->page_table_lock))
+			break;
 	}
 	pte_unmap_nested(src_pte);
 	pte_unmap(dst_pte);
 	spin_unlock(&src_mm->page_table_lock);
+	cond_resched_lock(&dst_mm->page_table_lock);
+	if (addr < end)
+		goto again;
 	return 0;
 }
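[The control flow of the corrected patch - advance addr, break out of the
locked region on a pending reschedule, drop the locks, then "goto again"
to resume where we left off - can be exercised in a small self-contained
C sketch.  All names here (need_resched_stub, copy_range, breaks_taken,
resched_pending) are invented stand-ins, not kernel APIs; the point is
only the control flow, in particular that incrementing addr *before* the
break means no page is processed twice on resume.]

	#include <assert.h>
	#include <stdbool.h>

	#define PAGE_SIZE 4096UL

	/* Stand-ins for the kernel's primitives: in the real patch,
	 * need_resched()/need_lockbreak() consult scheduler and spinlock
	 * contention state.  Here a simple flag drives the same shape. */
	static bool resched_pending;
	static int  breaks_taken;

	static bool need_resched_stub(void) { return resched_pending; }

	/* Copy pages [addr, end) in PAGE_SIZE steps, bailing out of the
	 * "locked" region whenever a reschedule is pending, then resuming
	 * at addr - mirroring the again:/goto again shape of the patch. */
	static unsigned long copy_range(unsigned long addr, unsigned long end)
	{
		unsigned long copied = 0;
	again:
		/* "take the page_table_locks" */
		while (addr < end) {
			copied += PAGE_SIZE;	/* copy_one_pte() stand-in */
			addr += PAGE_SIZE;	/* advance BEFORE the test */
			if (need_resched_stub())
				break;
		}
		/* "drop the locks", cond_resched() */
		if (need_resched_stub()) {
			resched_pending = false;  /* pretend we rescheduled */
			breaks_taken++;
		}
		if (addr < end)
			goto again;
		return copied;
	}

[Had addr not been advanced before the break - the bug in the first
patch - the goto again path would restart at the same address and redo
the last page every time, which is exactly what Hugh's "_must_ increment
addr before breaking out" correction addresses.]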
Re: More latency regressions with 2.6.11-rc4-RT-V0.7.39-02
On Wed, 2005-02-23 at 19:16 +, Hugh Dickins wrote:
> On Wed, 23 Feb 2005, Lee Revell wrote:
> >
> > Did something change recently in the VM that made copy_pte_range and
> > clear_page_range a lot more expensive?  I noticed a reference in the
> > "Page Table Iterators" thread to excessive overhead introduced by
> > aggressive page freeing.  That sure looks like what is going on in
> > trace2.  trace1 and trace3 look like big fork latencies associated with
> > copy_pte_range.
>
> I'm just about to test this patch below: please give it a try: thanks...
>
> Ingo's patch to reduce scheduling latencies, by checking for lockbreak
> in copy_page_range, was in the -VP and -mm patchsets some months ago;
> but got preempted by the 4level rework, and not reinstated since.
> Restore it now in copy_pte_range - which mercifully makes it easier.

Aha, that explains why all the latency regressions involve the VM
subsystem.

Thanks, your patch fixes the copy_pte_range latency.  Now zap_pte_range,
which Ingo also fixed a few months ago, is the worst offender.  Can this
fix be easily ported too?
Lee

[latency trace: preemption latency trace v1.1.4 on 2.6.11-rc4-RT-V0.7.39-02;
 latency: 197 µs, #74/74 entries, CPU#0, task ksoftirqd/0-2.  dmesg wakes
 ksoftirqd/0, then loops in zap_pte_range, each iteration calling
 page_remove_rmap / free_page_and_swap_cache / __page_cache_release /
 free_hot_cold_page / __mod_page_state; trace truncated.]
Re: More latency regressions with 2.6.11-rc4-RT-V0.7.39-02
On Wed, 23 Feb 2005, Lee Revell wrote:
>
> Did something change recently in the VM that made copy_pte_range and
> clear_page_range a lot more expensive?  I noticed a reference in the
> "Page Table Iterators" thread to excessive overhead introduced by
> aggressive page freeing.  That sure looks like what is going on in
> trace2.  trace1 and trace3 look like big fork latencies associated with
> copy_pte_range.

I'm just about to test this patch below: please give it a try: thanks...

Ingo's patch to reduce scheduling latencies, by checking for lockbreak
in copy_page_range, was in the -VP and -mm patchsets some months ago;
but got preempted by the 4level rework, and not reinstated since.
Restore it now in copy_pte_range - which mercifully makes it easier.

Signed-off-by: Hugh Dickins <[EMAIL PROTECTED]>

--- 2.6.11-rc4-bk9/mm/memory.c	2005-02-21 11:32:19.0 +
+++ linux/mm/memory.c	2005-02-23 18:35:28.0 +
@@ -328,6 +328,7 @@ static int copy_pte_range(struct mm_stru
 	pte_t *s, *d;
 	unsigned long vm_flags = vma->vm_flags;
 
+again:
 	d = dst_pte = pte_alloc_map(dst_mm, dst_pmd, addr);
 	if (!dst_pte)
 		return -ENOMEM;
@@ -338,11 +339,22 @@ static int copy_pte_range(struct mm_stru
 		if (pte_none(*s))
 			continue;
 		copy_one_pte(dst_mm, src_mm, d, s, vm_flags, addr);
+		/*
+		 * We are holding two locks at this point - either of them
+		 * could generate latencies in another task on another CPU.
+		 */
+		if (need_resched() ||
+		    need_lockbreak(&src_mm->page_table_lock) ||
+		    need_lockbreak(&dst_mm->page_table_lock))
+			break;
 	}
 	pte_unmap_nested(src_pte);
 	pte_unmap(dst_pte);
 	spin_unlock(&src_mm->page_table_lock);
+	cond_resched_lock(&dst_mm->page_table_lock);
+	if (addr < end)
+		goto again;
 	return 0;
 }
More latency regressions with 2.6.11-rc4-RT-V0.7.39-02
Ingo,

Did something change recently in the VM that made copy_pte_range and
clear_page_range a lot more expensive?  I noticed a reference in the
"Page Table Iterators" thread to excessive overhead introduced by
aggressive page freeing.  That sure looks like what is going on in
trace2.  trace1 and trace3 look like big fork latencies associated with
copy_pte_range.

This is all with PREEMPT_DESKTOP.

Lee

preemption latency trace v1.1.4 on 2.6.11-rc4-RT-V0.7.39-02
latency: 323 µs, #22/22, CPU#0 | (M:preempt VP:0, KP:1, SP:1 HP:1 #P:1)
    -----------------
    | task: ksoftirqd/0-2 (uid:0 nice:-10 policy:0 rt_prio:0)
    -----------------

          _------=> CPU#
         / _-----=> irqs-off
        | / _----=> need-resched
        || / _---=> hardirq/softirq
        ||| / _--=> preempt-depth
        |||| /
        |||||     delay
   cmd  pid ||||| time  |  caller
     \   /  |||||   \   |  /

(T1/#0) valgrind.bin 15934 0 9 0002 [0032064022903959] 0.000ms (+3534259.157ms): <676c6176> (<646e6972>)
(T1/#2) valgrind.bin 15934 0 9 0002 0002 [0032064022904377] 0.000ms (+0.000ms): __trace_start_sched_wakeup+0x96/0xc0 (try_to_wake_up+0x81/0x150)
(T1/#3) valgrind.bin 15934 0 9 0003 [0032064022904934] 0.001ms (+0.000ms): wake_up_process+0x1c/0x30 (do_softirq+0x4b/0x60)
(T6/#4) valgrind-15934 0dn. 22µs!< (1)
(T1/#5) valgrind.bin 15934 0 2 0001 0005 [0032064023088406] 0.307ms (+0.000ms): preempt_schedule+0xa/0x70 (copy_pte_range+0xb7/0x1c0)
(T1/#6) valgrind.bin 15934 0 2 0001 0006 [0032064023088847] 0.308ms (+0.000ms): __cond_resched_raw_spinlock+0x8/0x50 (copy_pte_range+0xa7/0x1c0)
(T1/#7) valgrind.bin 15934 0 2 0007 [0032064023089269] 0.308ms (+0.001ms): __cond_resched+0x9/0x70 (__cond_resched_raw_spinlock+0x3d/0x50)
(T1/#8) valgrind.bin 15934 0 3 0008 [0032064023089903] 0.309ms (+0.000ms): __schedule+0xe/0x630 (__cond_resched+0x45/0x70)
(T1/#9) valgrind.bin 15934 0 3 0009 [0032064023090288] 0.310ms (+0.000ms): profile_hit+0x9/0x50 (__schedule+0x3a/0x630)
(T1/#10) valgrind.bin 15934 0 3 0001 000a [0032064023090866] 0.311ms (+0.001ms): sched_clock+0xe/0xe0 (__schedule+0x62/0x630)
(T1/#11) valgrind.bin 15934 0 3 0002 000b [0032064023091857] 0.313ms (+0.000ms): dequeue_task+0xa/0x50 (__schedule+0x1ab/0x630)
(T1/#12) valgrind.bin 15934 0 3 0002 000c [0032064023092280] 0.313ms (+0.000ms): recalc_task_prio+0xc/0x1a0 (__schedule+0x1c5/0x630)
(T1/#13) valgrind.bin 15934 0 3 0002 000d [0032064023092690] 0.314ms (+0.000ms): effective_prio+0x8/0x50 (recalc_task_prio+0xa6/0x1a0)
(T1/#14) valgrind.bin 15934 0 3 0002 000e [0032064023093035] 0.315ms (+0.001ms): enqueue_task+0xa/0x80 (__schedule+0x1cc/0x630)
(T4/#15) [ => valgrind.bin ] 0.316ms (+0.001ms)
(T1/#16) <...> 2 0 1 0002 0010 [0032064023094748] 0.317ms (+0.001ms): __switch_to+0xb/0x1a0 (__schedule+0x2bd/0x630)
(T3/#17) <...>-2 0d..2 319µs : __schedule+0x2ea/0x630 (7d 69):
(T1/#18) <...> 2 0 1 0002 0012 [0032064023095766] 0.319ms (+0.000ms): finish_task_switch+0xc/0x90 (__schedule+0x2f6/0x630)
(T1/#19) <...> 2 0 1 0001 0013 [0032064023096192] 0.320ms (+0.000ms): trace_stop_sched_switched+0xa/0x150 (finish_task_switch+0x43/0x90)
(T3/#20) <...>-2 0d..1 320µs : trace_stop_sched_switched+0x42/0x150 <<...>-2> (69 0):
(T1/#21) <...> 2 0 1 0001 0015 [0032064023097511] 0.322ms (+0.000ms): trace_stop_sched_switched+0xfe/0x150 (finish_task_switch+0x43/0x90)

vim:ft=help

preemption latency trace v1.1.4 on 2.6.11-rc4-RT-V0.7.39-02
latency: 361 µs, #289/289, CPU#0 | (M:preempt VP:0, KP:1, SP:1 HP:1 #P:1)
    -----------------
    | task: ksoftirqd/0-2 (uid:0 nice:-10 policy:0 rt_prio:0)
    -----------------

          _------=> CPU#
         / _-----=> irqs-off
        | / _----=> need-resched
        || / _---=> hardirq/softirq
        ||| / _--=> preempt-depth
        |||| /
        |||||     delay
   cmd  pid ||||| time  |  caller
     \   /  |||||   \   |  /

(T1/#0) evolution 16047 0 9 0002 [0032772517080312] 0.000ms (+3483891.566ms): <6c6f7665> (<6f697475>)
(T1/#2) evolution 16047 0 9 0002 0002 [0032772517080803] 0.000ms (+0.000ms): __trace_start_sched_wakeup+0x96/0xc0 (try_to_wake_up+0x81/0x150)
(T1/#3) evolution 16047 0 9 0003 [0032772517081392] 0.001ms