RE: [v5 PATCH 1/2] mm: swap: check if swap backing device is congested or not
> > +	if (si->flags & (SWP_BLKDEV | SWP_FS)) {
>
> I re-read your discussion with Tim and I must say the reasoning behind
> this test remains foggy.

I was worried that the dereference

	inode = si->swap_file->f_mapping->host;

is not always safe for corner cases. So the test makes sure that the
dereference is valid.

> What goes wrong if we just remove it?

If the dereference to get inode is always safe, we can remove it.

Thanks.

Tim
RE: [Update][PATCH v5 7/9] mm/swap: Add cache for swap slots allocation
> > The cache->slots_ret is protected by cache->free_lock and
> > cache->slots is protected by cache->free_lock.

Typo. cache->slots is protected by cache->alloc_lock.

Tim
RE: [Update][PATCH v5 7/9] mm/swap: Add cache for swap slots allocation
> > +	/*
> > +	 * Preemption need to be turned on here, because we may sleep
> > +	 * in refill_swap_slots_cache(). But it is safe, because
> > +	 * accesses to the per-CPU data structure are protected by a
> > +	 * mutex.
> > +	 */
>
> the comment doesn't really explain why it is safe. There are other users
> which are not using the lock. E.g. just look at free_swap_slot above.
> How can
>	cache->slots_ret[cache->n_ret++] = entry;
> be safe wrt.
>	pentry = &cache->slots[cache->cur++];
>	entry = *pentry;
> Both of them might touch the same slot, no? Btw. I would rather prefer
> this would be a follow up fix with the trace and the detailed explanation.

The cache->slots_ret is protected by cache->free_lock and cache->slots is
protected by cache->free_lock. They are two separate structures, one for
caching the slots returned and one for caching the slots allocated. So they
do not touch the same slots. We'll update the comments so it is clearer.

Sure. We can issue a follow up fix on top of the current patchset.

Thanks.

Tim
RE: [PATCH -v3 00/10] THP swap: Delay splitting THP during swapping out
> So this is impossible without THP swapin. While 2M swapout makes a lot
> of sense, I doubt 2M swapin is really useful. What kind of application
> is 'optimized' to do sequential memory access?

We waste a lot of CPU cycles to re-compact 4K pages back into a large page
under THP. Swapping it back in as a single large page avoids this
fragmentation and overhead.

Thanks.

Tim
RE: [PATCH -v3 00/10] THP swap: Delay splitting THP during swapping out
> > - Avoid CPU time for splitting, collapsing THP across swap out/in.
>
> Yes, if you want, please give us how bad it is.

It could be pretty bad. In an experiment with THP turned on where we enter
swap, 50% of the CPU time is spent in the page compaction path. So if we
could deal with units of large pages for swap, the overhead of splitting
and compacting ordinary pages back into large pages could be avoided.

    51.89%  51.89%  :1688  [kernel.kallsyms]  [k] pageblock_pfn_to_page
            |
            --- pageblock_pfn_to_page
               |
               |--64.57%-- compaction_alloc
               |          migrate_pages
               |          compact_zone
               |          compact_zone_order
               |          try_to_compact_pages
               |          __alloc_pages_direct_compact
               |          __alloc_pages_nodemask
               |          alloc_pages_vma
               |          do_huge_pmd_anonymous_page
               |          handle_mm_fault
               |          __do_page_fault
               |          do_page_fault
               |          page_fault
               |          0x401d9a
               |
               |--34.62%-- compact_zone
               |          compact_zone_order
               |          try_to_compact_pages
               |          __alloc_pages_direct_compact
               |          __alloc_pages_nodemask
               |          alloc_pages_vma
               |          do_huge_pmd_anonymous_page
               |          handle_mm_fault
               |          __do_page_fault
               |          do_page_fault
               |          page_fault
               |          0x401d9a
                --0.81%-- [...]

Tim
RE: performance delta after VFS i_mutex=>i_rwsem conversion
> > Ok, these enhancements are now in the locking tree and are queued up
> > for v4.8:
> >
> >   git pull git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git locking/core
> >
> > Dave, you might want to check your numbers with these changes: is
> > rwsem performance still significantly worse than mutex performance?
>
> It's substantially closer than it was, but there's probably a little
> work still to do. The rwsem still looks to be sleeping a lot more than
> the mutex. Here's where we started:
>
>   https://www.sr71.net/~dave/intel/rwsem-vs-mutex.png
>
> The rwsem peaked lower and earlier than the mutex code. Now, if we
> compare the old (4.7-rc1) rwsem code to the newly-patched rwsem code
> (from tip/locking):
>
>   https://www.sr71.net/~dave/intel/bb.html?1=4.7.0-rc1&2=4.7.0-rc1-00127-gd4c3be7
>
> We can see the peak is a bit higher and, more importantly, it's more of
> a plateau than a sharp peak. We can also compare the new rwsem code to
> the 4.5 code that had the mutex in place:
>
>   https://www.sr71.net/~dave/intel/bb.html?1=4.5.0-rc6&2=4.7.0-rc1-00127-gd4c3be7
>
> rwsems are still a _bit_ below the mutex code at the peak, and they
> also seem to be substantially lower during the tail from 20 cpus on up.
> The rwsems are sleeping less than they were before the tip/locking
> updates, but they are still idling the CPUs 90% of the time while the
> mutexes end up idle 15-20% of the time when all the cpus are contending
> on the lock.

In Al Viro's conversion, he introduced inode_lock_shared, which uses a read
lock in lookup_slow. The rwsem bails out of optimistic spinning when
readers acquire the lock, which is why we see far fewer optimistic spinning
attempts for the unlink test case, whereas the mutex earlier would keep
spinning. A simple test would be to see if we get similar performance when
changing inode_lock_shared to the writer version.

That said, hopefully we should have a lot more read locking than write
locking (i.e. more path lookup than changes to the path), so the switch to
rwsem is still a win. I guess the lesson here is that when there is an
equal mix of writers and readers, rwsem can be a bit worse in performance
than mutex, as we don't spin as hard.

Tim
RE: Regression with SLUB on Netperf and Volanomark
Christoph Lameter wrote:
> Try to boot with
>
>   slub_max_order=4 slub_min_objects=8
>
> If that does not help increase slub_min_objects to 16.

We are still seeing a 5% regression on TCP streaming and a 10% regression
for Volanomark, after increasing slub_min_objects to 16, setting
slub_max_order=4 and using the 2.6.21-rc7-mm2 kernel. The performance with
slub_min_objects=8 and 16 is similar.

> > We found that for Netperf's TCP streaming tests in a loop back mode,
> > the TCP streaming performance is about 7% worse when SLUB is enabled
> > on the 2.6.21-rc7-mm1 kernel (x86_64). This test has a lot of sk_buff
> > allocation/deallocation.
>
> 2.6.21-rc7-mm2 contains some performance fixes that may or may not be
> useful to you.

We've switched to 2.6.21-rc7-mm2 in our tests now.

> > For Volanomark, the performance is 7% worse for Woodcrest and 12%
> > worse for Clovertown.
>
> SLUB's "queueing" is restricted to the number of objects that fit in a
> page order slab. SLAB can queue more objects since it has true queues.
> Increasing the page size that SLUB uses may fix the problem but then
> we run into higher page order issues.
>
> Check slabinfo output for the network slabs and see what order is
> used. The number of objects per slab is important for performance.

The order used is 0 for the buffer_head, which is the most used object. I
think they are 104 bytes per object.

Tim
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
RE: [PATCH] lock stat for -rt 2.6.20-rc2-rt2.2.lock_stat.patch
Bill Huey (hui) wrote:
> This should have the fix.
>
>   http://mmlinux.sf.net/public/patch-2.6.20-rc2-rt2.3.lock_stat.patch
>
> If you can rerun it and post the results, it'll hopefully show the
> behavior of that lock acquisition better.

Here's the run with the fix to produce correct statistics.

Tim

@contention events = 848858
@failure_events = 10
@lookup_failed_scope = 175
@lookup_failed_static = 47
@static_found = 17
[2, 0, 0 -- 1, 0]        {journal_init_common, fs/jbd/journal.c, 667}
[2, 0, 0 -- 31, 0]       {blk_init_queue_node, block/ll_rw_blk.c, 1910}
[2, 0, 0 -- 31, 0]       {create_workqueue_thread, kernel/workqueue.c, 474}
[3, 3, 2 -- 16384, 0]    {tcp_init, net/ipv4/tcp.c, 2426}
[4, 4, 1 -- 1, 0]        {lock_kernel, -, 0}
[19, 0, 0 -- 1, 0]       {kmem_cache_alloc, -, 0}
[25, 0, 0 -- 1, 0]       {kfree, -, 0}
[49, 0, 0 -- 2, 0]       {kmem_cache_free, -, 0}
[69, 38, 176 -- 1, 0]    {lock_timer_base, -, 0}
[211, 117, 517 -- 3, 0]  {init_timers_cpu, kernel/timer.c, 1842}
[1540, 778, 365 -- 7326, 0]         {sock_lock_init, net/core/sock.c, 817}
[112584, 150, 6 -- 256, 0]          {init, kernel/futex.c, 2781}
[597012, 183895, 136277 -- 9546, 0] {mm_init, kernel/fork.c, 369}
RE: [PATCH] lock stat for -rt 2.6.20-rc2-rt2.2.lock_stat.patch
Bill Huey (hui) wrote:
> Thanks, the numbers look a bit weird in that the first column should
> have a bigger number of events than the second column since it is a
> special case subset. Looking at the lock_stat_note() code should show
> that to be the case. Did you make a change to the output ?

No, I did not change the output. I did reset the contention counts by
doing echo "0" > /proc/lock_stat/contention. I noticed that the first
column gets reset but not the second column. So the reset code probably
needs to be checked.

Tim
RE: [PATCH] lock stat for -rt 2.6.20-rc2-rt2.2.lock_stat.patch
Bill Huey (hui) wrote:
> Can you sort the output ("sort -n" what ever..) and post it without
> the zeroed entries ?
>
> I'm curious about how that statistical spike compares to the rest of
> the system activity. I'm sure that'll get the attention of Peter as
> well and maybe he'll do something about it ? :)

Here's the lockstat trace. You can cross reference it with my earlier
post:

http://marc.theaimsgroup.com/?l=linux-kernel&m=116743637422465&w=2

The contention happened on mm->mmap_sem, shared by the java threads during
futex_wake's invocation of _rt_down_read.

Tim

@contention events = 247149
@failure_events = 146
@lookup_failed_scope = 175
@lookup_failed_static = 43
@static_found = 16
[1, 113, 77 -- 32768, 0]        {tcp_init, net/ipv4/tcp.c, 2426}
[2, 759, 182 -- 1, 0]           {lock_kernel, -, 0}
[13, 0, 7 -- 4, 0]              {kmem_cache_free, -, 0}
[25, 3564, 9278 -- 1, 0]        {lock_timer_base, -, 0}
[56, 9528, 24552 -- 3, 0]       {init_timers_cpu, kernel/timer.c, 1842}
[471, 52845, 17682 -- 10448, 0] {sock_lock_init, net/core/sock.c, 817}
[32251, 9024, 242 -- 256, 0]    {init, kernel/futex.c, 2781}
[173724, 11899638, 9886960 -- 11194, 0] {mm_init, kernel/fork.c, 369}
RE: [PATCH] lock stat for -rt 2.6.20-rc2-rt2.2.lock_stat.patch
Bill Huey (hui) wrote:
> Good to know that. What did the output reveal ?
>
> What's your intended use again summarized ? futex contention ? I'll
> read the first posting again.

Earlier I used latency_trace and figured out that there was read
contention on mm->mmap_sem during calls to _rt_down_read by java threads
when I was running volanomark. That caused the slowdown of the rt kernel
compared to the non-rt kernel. The output from lock_stat confirms that
mm->mmap_sem was indeed the most heavily contended lock.

Tim
RE: [PATCH] lock stat for -rt 2.6.20-rc2-rt2.2.lock_stat.patch
Bill Huey (hui) wrote:
> Patch here:
>
>   http://mmlinux.sourceforge.net/public/patch-2.6.20-rc2-rt2.2.lock_stat.patch
>
> bill

This version is much better and ran stably. If I'm reading the output
correctly, the locks are listed by their initialization point (function,
file and line # where a lock is initialized). That's good information to
identify the lock. However, it would be more useful if there were
information about where the locking was initiated from and who was trying
to obtain the lock.

Tim
RE: [PATCH] lock stat for -rt 2.6.20-rc2-rt2 [was Re: 2.6.19-rt14 slowdown compared to 2.6.19]
Bill Huey (hui) wrote:
> On Tue, Dec 26, 2006 at 04:51:21PM -0800, Chen, Tim C wrote:
> > Ingo Molnar wrote:
> > > If you'd like to profile this yourself then the lowest-cost way of
> > > profiling lock contention on -rt is to use the yum kernel and run
> > > the attached trace-it-lock-prof.c code on the box while your
> > > workload is in 'steady state' (and is showing those extended idle
> > > times):
> > >
> > >   ./trace-it-lock-prof > trace.txt
> >
> > Thanks for the pointer. Will let you know of any relevant traces.
>
> Tim,
>
>   http://mmlinux.sourceforge.net/public/patch-2.6.20-rc2-rt2.lock_stat.patch
>
> You can also apply this patch to get more precise statistics down to
> the lock. For example:

Bill,

I'm having some problems getting this patch to run stably. I'm
encountering errors like the one in the trace that follows:

Thanks.

Tim

Unable to handle kernel NULL pointer dereference at 0008
RIP: [] lock_stat_note_contention+0x12d/0x1c3
PGD 0
Oops: [1] PREEMPT SMP
CPU 1
Modules linked in: autofs4 sunrpc dm_mirror dm_mod video sbs i2c_ec dock
  button battery ac uhci_hcd ehci_hcd i2d
Pid: 0, comm: swapper Not tainted 2.6.20-rc2-rt2 #4
RIP: 0010:[] [] lock_stat_note_contention+0x12d/0x1c3
RSP: 0018:81013fdb3d28 EFLAGS: 00010097
RAX: 81013fd68018 RBX: 81013fd68000 RCX: RDX: 8026762e
RSI: RDI: 8026762e RBP: 81013fdb3df8 R08: 8100092bab60
R09: 8100092aafc8 R10: 0001 R11: R12: 81013fd68030
R13: R14: 0046 R15: 002728d5ecd0
FS: () GS:81013fd078c0() knlGS:
CS: 0010 DS: 0018 ES: 0018 CR0: 8005003b
CR2: 0008 CR3: 00201000 CR4: 06e0
Process swapper (pid: 0, threadinfo 81013fdb2000, task 81013fdb14e0)
Stack: 00020001 0100 000e 000e 8100092bc440
Call Trace:
 [] rt_mutex_slowtrylock+0x84/0x9b
 [] rt_mutex_trylock+0x2e/0x30
 [] rt_spin_trylock+0x9/0xb
 [] get_next_timer_interrupt+0x34/0x226
 [] hrtimer_stop_sched_tick+0xb6/0x138
 [] cpu_idle+0x1b/0xdd
 [] start_secondary+0x2ed/0x2f9
---
| preempt count: 0003 ]
| 3-level deep critical section nesting:
.. [] cpu_idle+0xd7/0xdd
   .[] .. ( <= start_secondary+0x2ed/0x2f9)
.. [] __spin_lock_irqsave+0x18/0x42
   .[] .. ( <= rt_mutex_slowtrylock+0x19/0x9b)
.. [] __spin_trylock+0x14/0x4c
   .[] .. ( <= oops_begin+0x23/0x6f)
skipping trace printing on CPU#1 != -1
Code: 49 8b 45 08 8b 78 18 75 0d 49 8b 04 24 f0 ff 80 94 00 00 00
RIP [] lock_stat_note_contention+0x12d/0x1c3
RSP
CR2: 0008
<3>BUG: sleeping function called from invalid context swapper(0) at
kernel/rtmutex.c:1312
in_atomic():1 [0002], irqs_disabled():1
Call Trace:
 [] dump_trace+0xbe/0x3cd
 [] show_trace+0x3a/0x58
 [] dump_stack+0x15/0x17
 [] __might_sleep+0x103/0x10a
 [] rt_mutex_lock_with_ip+0x1e/0xac
 [] __rt_down_read+0x49/0x4d
 [] rt_down_read+0xb/0xd
 [] blocking_notifier_call_chain+0x19/0x3f
 [] profile_task_exit+0x15/0x17
 [] do_exit+0x25/0x8de
 [] do_page_fault+0x7d4/0x856
 [] error_exit+0x0/0x84
 [] lock_stat_note_contention+0x12d/0x1c3
 [] rt_mutex_slowtrylock+0x84/0x9b
 [] rt_mutex_trylock+0x2e/0x30
 [] rt_spin_trylock+0x9/0xb
 [] get_next_timer_interrupt+0x34/0x226
 [] hrtimer_stop_sched_tick+0xb6/0x138
 [] cpu_idle+0x1b/0xdd
 [] start_secondary+0x2ed/0x2f9
RE: 2.6.19-rt14 slowdown compared to 2.6.19
Ingo Molnar wrote:
> (could you send me the whole trace if you still have it? It would be
> interesting to see a broader snippet from the life of individual java
> threads.)
>
>	Ingo

Sure, I'll send it to you separately due to the size of the complete
trace.

Tim
RE: 2.6.19-rt14 slowdown compared to 2.6.19
Ingo Molnar wrote: > > If you'd like to profile this yourself then the lowest-cost way of > profiling lock contention on -rt is to use the yum kernel and run the > attached trace-it-lock-prof.c code on the box while your workload is > in 'steady state' (and is showing those extended idle times): > > ./trace-it-lock-prof > trace.txt > > this captures up to 1 second worth of system activity, on the current > CPU. Then you can construct the histogram via: > > grep -A 1 ' __schedule()<-' trace.txt | cut -d: -f2- | sort | > uniq -c | sort -n > prof.txt > I did lock profiling on Volanomark as suggested and obtained the profile that is listed below. 246 __sched_text_start()<-schedule()<-rt_spin_lock_slowlock()<-__lock_text_s tart() 264 rt_mutex_slowunlock()<-rt_mutex_unlock()<-rt_up_read()<-(-1)() 334 __sched_text_start()<-schedule()<-posix_cpu_timers_thread()<-kthread() 437 __sched_text_start()<-schedule()<-do_futex()<-sys_futex() 467 (-1)()<-(0)()<-(0)()<-(0)() 495 __sched_text_start()<-preempt_schedule()<-__spin_unlock_irqrestore()<-rt _mutex_adjust_prio() 497 __netif_rx_schedule()<-netif_rx()<-loopback_xmit()<-(-1)() 499 __sched_text_start()<-schedule()<-schedule_timeout()<-sk_wait_data() 500 tcp_recvmsg()<-sock_common_recvmsg()<-sock_recvmsg()<-(-1)() 503 __rt_down_read()<-rt_down_read()<-do_futex()<-(-1)() 1160 __sched_text_start()<-schedule()<-ksoftirqd()<-kthread() 1433 __rt_down_read()<-rt_down_read()<-futex_wake()<-(-1)() 1497 child_rip()<-(-1)()<-(0)()<-(0)() 1936 __sched_text_start()<-schedule()<-rt_mutex_slowlock()<-rt_mutex_lock() Looks like the idle time I saw was due to lock contention during call to futex_wake, which requires acquisition of current->mm->mmap_sem. Many of the java threads share mm and result in concurrent access to common mm. Looks like under rt case there is no special treatment to read locking so the read lock accesses are contended under __rt_down_read. 
For the non-rt case, __down_read makes the distinction for read lock access, so the read lockings do not contend. Things are made worse here because this delayed waking up the processes blocked on the futex. See also a snippet of the latency_trace below.

 <idle>-0  2D..2 5821us!: thread_return <softirq--31> (150 20)
 <idle>-0  2DN.1 6278us : __sched_text_start()<-cpu_idle()<-start_secondary()<-(-1)()
 <idle>-0  2DN.1 6278us : (0)()<-(0)()<-(0)()<-(0)()
 java-6648 2D..2 6280us+: thread_return <idle>-0 (20 -4)
 java-6648 2D..1 6296us : try_to_wake_up()<-wake_up_process()<-wakeup_next_waiter()<-rt_mutex_slowunlock()
 java-6648 2D..1 6296us : rt_mutex_unlock()<-rt_up_read()<-do_futex()<-(-1)()
 java-6648 2D..2 6297us : effective_prio <...>-6673 (-4 -4)
 java-6648 2D..2 6297us : __activate_task <...>-6673 (-4 1)
 java-6648 2     6297us < (-11)
 java-6648 2     6298us+> sys_futex (00afaf50 0001 0001)
 java-6648 2...1 6315us : __sched_text_start()<-schedule()<-rt_mutex_slowlock()<-rt_mutex_lock()
 java-6648 2...1 6315us : __rt_down_read()<-rt_down_read()<-futex_wake()<-(-1)()
 java-6648 2D..2 6316us+: deactivate_task <java-6648> (-4 1)
 <idle>-0  2D..2 6318us+: thread_return <java-6648> (-4 20)
 <idle>-0  2DN.1 6327us : __sched_text_start()<-cpu_idle()<-start_secondary()<-(-1)()
 <idle>-0  2DN.1 6328us+: (0)()<-(0)()<-(0)()<-(0)()
 java-6629 2D..2 6330us+: thread_return <idle>-0 (20 -4)
 java-6629 2D..1 6347us : try_to_wake_up()<-wake_up_process()<-wakeup_next_waiter()<-rt_mutex_slowunlock()
 java-6629 2D..1 6347us : rt_mutex_unlock()<-rt_up_read()<-futex_wake()<-(-1)()
 java-6629 2D..2 6348us : effective_prio <java-6235> (-4 -4)
 java-6629 2D..2 6349us : __activate_task <java-6235> (-4 1)
 java-6629 2     6350us+< (0)
 java-6629 2     6352us+> sys_futex (00afc1dc 0001 0001)
 java-6629 2...1 6368us : __sched_text_start()<-schedule()<-rt_mutex_slowlock()<-rt_mutex_lock()
 java-6629 2...1 6368us : __rt_down_read()<-rt_down_read()<-futex_wake()<-(-1)()
 java-6629 2D..2 6369us+: deactivate_task <java-6629> (-4 1)
 <idle>-0  2D..2 6404us!: thread_return <java-6629> (-4 20)
 <idle>-0  2DN.1 6584us : __sched_text_start()<-cpu_idle()<-start_secondary()<-(-1)()

Thanks.
Tim
RE: 2.6.19-rt14 slowdown compared to 2.6.19
Ingo Molnar wrote:
>
> cool - thanks for the feedback! Running the 64-bit kernel, right?
>

Yes, the 64-bit kernel was used.

> while some slowdown is to be expected, did in each case idle time
> increase significantly?

Volanomark and Re-Aim7 ran close to 0% idle time on the 2.6.19 kernel. Idle time increased significantly for Volanomark (to 60% idle) and Re-Aim7 (to 20% idle) with the rt kernel. For netperf, the system was 60% idle for both the 2.6.19 and rt kernels, and the change in idle time was not significant.

> If yes then this is the effect of lock contention. Lock contention
> effects are 'magnified' by PREEMPT_RT. For example if you run a 128
> threads workload that all use the same lock then the -rt kernel can
> act as if it were a 128-way box (!). This way by running -rt you'll
> see scalability problems a lot sooner than on real hardware. In other
> words: PREEMPT_RT in essence simulates the scalability behavior of up
> to an infinite amount of CPUs. (with the exception of cachemiss
> emulation ;) [the effect is not this precise, but that's the rough
> trend]

Turning off PREEMPT_RT for the 2.6.20-rc2-rt0 kernel restored most of the performance of Volanomark and Re-Aim7; idle time is close to 0%. So the benchmarks with a large number of threads are affected more by PREEMPT_RT. For netperf TCP streaming, the performance went from 40% down to 20% down relative to the 2.6.20-rc2 kernel. There is only a server and a client process for netperf, so the underlying reason for the change in performance is probably different.

> If you'd like to profile this yourself then the lowest-cost way of
> profiling lock contention on -rt is to use the yum kernel and run the
> attached trace-it-lock-prof.c code on the box while your workload is
> in 'steady state' (and is showing those extended idle times):
>
>   ./trace-it-lock-prof > trace.txt

Thanks for the pointer. Will let you know of any relevant traces.

Thanks.
Tim
2.6.19-rt14 slowdown compared to 2.6.19
Ingo,

We did some benchmarking on 2.6.19-rt14, compared it with the 2.6.19 kernel, and noticed several slowdowns. The test machine is a 2-socket Woodcrest machine with your default configuration.

Netperf TCP streaming was slower by 40% (1 server and 1 client, each bound to separate CPU cores on different sockets; network loopback mode was used).

Volanomark was slower by 80% (server and clients communicate in network loopback mode; idle time goes from 1% to 60%).

Re-Aim7 was slower by 40% (idle time goes from 0% to 20%).

Wonder if you have any suggestions on what could cause the slowdown. We've tried disabling CONFIG_NO_HZ and it didn't help much.

Thanks.

Tim