RE: [v5 PATCH 1/2] mm: swap: check if swap backing device is congested or not

2019-01-10 Thread Chen, Tim C
> >
> > +   if (si->flags & (SWP_BLKDEV | SWP_FS)) {
> 
> I re-read your discussion with Tim and I must say the reasoning behind this
> test remain foggy.

I was worried that the dereference

inode = si->swap_file->f_mapping->host;

is not always safe in corner cases.

So the test makes sure that the dereference is valid.
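For reference, a minimal sketch of the guarded check being discussed (the
congestion helper and the skip behaviour shown are assumptions based on the
quoted patch context, not necessarily the final code):

	/* Only dereference swap_file when the swap area is backed by a
	 * block device or a file, so the pointer chain below is valid.
	 */
	if (si->flags & (SWP_BLKDEV | SWP_FS)) {
		struct inode *inode = si->swap_file->f_mapping->host;

		if (inode_read_congested(inode->i_mapping))
			goto skip;	/* backing device congested, skip readahead */
	}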

> 
> What goes wrong if we just remove it?

If the dereference to get inode is always safe, we can remove it.


Thanks.

Tim


RE: [Update][PATCH v5 7/9] mm/swap: Add cache for swap slots allocation

2017-01-17 Thread Chen, Tim C
> > >
> >
> > The cache->slots_ret  is protected by cache->free_lock and
> > cache->slots is protected by cache->free_lock.

Typo.  cache->slots is protected by cache->alloc_lock.

Tim


RE: [Update][PATCH v5 7/9] mm/swap: Add cache for swap slots allocation

2017-01-17 Thread Chen, Tim C
> > +   /*
> > +* Preemption need to be turned on here, because we may sleep
> > +* in refill_swap_slots_cache().  But it is safe, because
> > +* accesses to the per-CPU data structure are protected by a
> > +* mutex.
> > +*/
> 
> the comment doesn't really explain why it is safe. THere are other users
> which are not using the lock. E.g. just look at free_swap_slot above.
> How can
>   cache->slots_ret[cache->n_ret++] = entry; be safe wrt.
>   pentry = >slots[cache->cur++];
>   entry = *pentry;
> 
> Both of them might touch the same slot, no? Btw. I would rather prefer this
> would be a follow up fix with the trace and the detailed explanation.
> 

The cache->slots_ret is protected by cache->free_lock and cache->slots is
protected by cache->free_lock.  They are two separate structures, one for
caching the slots returned and one for caching the slots allocated.  So
they do not touch the same slots.  We'll update the comments so it is clearer.
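For reference, the two per-CPU structures being discussed look roughly
like this (a sketch reconstructed from the discussion, not necessarily
the exact layout in the patch):

	/* per-CPU cache of swap slots (sketch) */
	struct swap_slots_cache {
		struct mutex	alloc_lock;	/* protects slots, nr, cur */
		swp_entry_t	*slots;		/* slots cached for allocation */
		int		nr;
		int		cur;
		spinlock_t	free_lock;	/* protects slots_ret, n_ret */
		swp_entry_t	*slots_ret;	/* slots cached for freeing */
		int		n_ret;
	};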

Sure. We can issue a follow up fix on top of the current patchset.

Thanks.

Tim


RE: [PATCH -v3 00/10] THP swap: Delay splitting THP during swapping out

2016-09-22 Thread Chen, Tim C
>
>So this is impossible without THP swapin. While 2M swapout makes a lot of
>sense, I doubt 2M swapin is really useful. What kind of application is 
>'optimized'
>to do sequential memory access?

We waste a lot of CPU cycles re-compacting 4K pages back into a large page
under THP.  Swapping it back in as a single large page can avoid the
fragmentation and this overhead.

Thanks.

Tim


RE: [PATCH -v3 00/10] THP swap: Delay splitting THP during swapping out

2016-09-13 Thread Chen, Tim C
>>
>> - Avoid CPU time for splitting, collapsing THP across swap out/in.
>
>Yes, if you want, please give us how bad it is.
>

It could be pretty bad.  In an experiment with THP turned on where we
entered swap, 50% of the CPU time was spent in the page compaction path.
So if we could deal with swap in units of large pages, the overhead of
splitting and re-compacting ordinary pages into large pages could be avoided.

   51.89%   51.89%  :1688  [kernel.kallsyms]   [k] pageblock_pfn_to_page
  |
  --- pageblock_pfn_to_page
 |  
 |--64.57%-- compaction_alloc
 |  migrate_pages
 |  compact_zone
 |  compact_zone_order
 |  try_to_compact_pages
 |  __alloc_pages_direct_compact
 |  __alloc_pages_nodemask
 |  alloc_pages_vma
 |  do_huge_pmd_anonymous_page
 |  handle_mm_fault
 |  __do_page_fault
 |  do_page_fault
 |  page_fault
 |  0x401d9a
 |  
 |--34.62%-- compact_zone
 |  compact_zone_order
 |  try_to_compact_pages
 |  __alloc_pages_direct_compact
 |  __alloc_pages_nodemask
 |  alloc_pages_vma
 |  do_huge_pmd_anonymous_page
 |  handle_mm_fault
 |  __do_page_fault
 |  do_page_fault
 |  page_fault
 |  0x401d9a
  --0.81%-- [...]

Tim


RE: performance delta after VFS i_mutex=>i_rwsem conversion

2016-06-09 Thread Chen, Tim C
>> Ok, these enhancements are now in the locking tree and are queued up for
>v4.8:
>>
>>git pull git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git
>> locking/core
>>
>> Dave, you might want to check your numbers with these changes: is
>> rwsem performance still significantly worse than mutex performance?
>
>It's substantially closer than it was, but there's probably a little work 
>still to do.
>The rwsem still looks to be sleeping a lot more than the mutex.  Here's where
>we started:
>
>   https://www.sr71.net/~dave/intel/rwsem-vs-mutex.png
>
>The rwsem peaked lower and earlier than the mutex code.  Now, if we compare
>the old (4.7-rc1) rwsem code to the newly-patched rwsem code (from
>tip/locking):
>
>> https://www.sr71.net/~dave/intel/bb.html?1=4.7.0-rc1&2=4.7.0-rc1-00127
>> -gd4c3be7
>
>We can see the peak is a bit higher and more importantly, it's more of a 
>plateau
>than a sharp peak.  We can also compare the new rwsem code to the 4.5 code
>that had the mutex in place:
>
>> https://www.sr71.net/~dave/intel/bb.html?1=4.5.0-rc6&2=4.7.0-rc1-00127
>> -gd4c3be7
>
>rwsems are still a _bit_ below the mutex code at the peak, and they also seem
>to be substantially lower during the tail from 20 cpus on up.  The rwsems are
>sleeping less than they were before the tip/locking updates, but they are still
>idling the CPUs 90% of the time while the mutexes end up idle 15-20% of the
>time when all the cpus are contending on the lock.

In Al Viro's conversion, he introduced inode_lock_shared, which takes the
read lock in lookup_slow.  The rwsem bails out of optimistic spinning once
readers acquire the lock, so we see far fewer optimistic spinning attempts
for the unlink test case.  With the earlier mutex, we would keep spinning.

A simple test would be to see if we get similar performance when changing
inode_lock_shared to the writer version.
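To make that test concrete, the wrappers from the conversion look like
this (a sketch of the include/linux/fs.h helpers; the experiment itself
is only a temporary hack, not a proposed change):

	/* include/linux/fs.h wrappers from the i_mutex => i_rwsem conversion */
	static inline void inode_lock(struct inode *inode)
	{
		down_write(&inode->i_rwsem);
	}

	static inline void inode_lock_shared(struct inode *inode)
	{
		down_read(&inode->i_rwsem);
	}

	/*
	 * The test: temporarily change the down_read() above to down_write()
	 * so lookup_slow() takes the rwsem exclusively like the old i_mutex
	 * path did, then rerun the unlink workload and compare.
	 */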

That said, we should hopefully have a lot more read locking than write
locking (i.e. more path lookups than changes to the path), so the switch
to rwsem is still a win.  I guess the lesson here is that when there is an
equal mix of writers and readers, the rwsem can perform a bit worse than
the mutex because we don't spin as hard.

Tim



RE: Regression with SLUB on Netperf and Volanomark

2007-05-03 Thread Chen, Tim C
Christoph Lameter wrote:
> Try to boot with
> 
> slub_max_order=4 slub_min_objects=8
> 
> If that does not help increase slub_min_objects to 16.
> 

We are still seeing a 5% regression on TCP streaming and a 10% regression
for Volanomark after increasing slub_min_objects to 16 and setting
slub_max_order=4 on the 2.6.21-rc7-mm2 kernel.  The performance with
slub_min_objects=8 and 16 is similar.

>> We found that for Netperf's TCP streaming tests in a loop back mode,
>> the TCP streaming performance is about 7% worse when SLUB is enabled
>> on 
>> 2.6.21-rc7-mm1 kernel (x86_64).  This test have a lot of sk_buff
>> allocation/deallocation.
> 
> 2.6.21-rc7-mm2 contains some performance fixes that may or may not be
> useful to you.

We've switched to 2.6.21-rc7-mm2 in our tests now.

>> 
>> For Volanomark, the performance is 7% worse for Woodcrest and 12%
>> worse for Clovertown.
> 
> SLUBs "queueing" is restricted to the number of objects that fit in
> page order slab. SLAB can queue more objects since it has true queues.
> Increasing the page size that SLUB uses may fix the problem but then
> we run into higher page order issues.
> 
> Check slabinfo output for the network slabs and see what order is
> used. The number of objects per slab is important for performance.

The order used is 0 for the buffer_head, which is the most used object.

I think they are 104 bytes per object.

Tim


RE: [PATCH] lock stat for -rt 2.6.20-rc2-rt2.2.lock_stat.patch

2007-01-03 Thread Chen, Tim C
Bill Huey (hui) wrote:
> This should have the fix.
> 
>
http://mmlinux.sf.net/public/patch-2.6.20-rc2-rt2.3.lock_stat.patch
> 
> If you can rerun it and post the results, it'll hopefully show the
> behavior of that lock acquisition better.
> 

Here's the run with the fix to produce correct statistics.

Tim

@contention events = 848858
@failure_events = 10
@lookup_failed_scope = 175
@lookup_failed_static = 47
@static_found = 17
[2, 0, 0 -- 1, 0]   {journal_init_common, fs/jbd/journal.c, 667}
[2, 0, 0 -- 31, 0]  {blk_init_queue_node, block/ll_rw_blk.c, 1910}
[2, 0, 0 -- 31, 0]  {create_workqueue_thread, kernel/workqueue.c, 474}
[3, 3, 2 -- 16384, 0]   {tcp_init, net/ipv4/tcp.c, 2426}
[4, 4, 1 -- 1, 0]   {lock_kernel, -, 0}
[19, 0, 0 -- 1, 0]  {kmem_cache_alloc, -, 0}
[25, 0, 0 -- 1, 0]  {kfree, -, 0}
[49, 0, 0 -- 2, 0]  {kmem_cache_free, -, 0}
[69, 38, 176 -- 1, 0]   {lock_timer_base, -, 0}
[211, 117, 517 -- 3, 0] {init_timers_cpu, kernel/timer.c, 1842}
[1540, 778, 365 -- 7326, 0] {sock_lock_init, net/core/sock.c, 817}
[112584, 150, 6 -- 256, 0]  {init, kernel/futex.c, 2781}
[597012, 183895, 136277 -- 9546, 0] {mm_init, kernel/fork.c, 369}


RE: [PATCH] lock stat for -rt 2.6.20-rc2-rt2.2.lock_stat.patch

2007-01-03 Thread Chen, Tim C
Bill Huey (hui) wrote:
> 
> Thanks, the numbers look a bit weird in that the first column should
> have a bigger number of events than that second column since it is a
> special case subset. Looking at the lock_stat_note() code should show
> that to be the case. Did you make a change to the output ?

No, I did not change the output.  I did reset the contention counts by
doing echo "0" > /proc/lock_stat/contention.

I noticed that the first column gets reset but not the second column, so
the reset code probably needs to be checked.
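Illustrative only (this is not the actual lock_stat code): a complete
reset would need to clear both columns for every entry, along the lines
of:

	/* hypothetical sketch -- names are made up for illustration */
	struct lock_stat_entry {
		unsigned long	contended;	/* first column in the dump */
		unsigned long	contended_sub;	/* second column (subset) */
	};

	static void lock_stat_reset_all(struct lock_stat_entry *table, int n)
	{
		int i;

		for (i = 0; i < n; i++) {
			table[i].contended = 0;
			table[i].contended_sub = 0;	/* must be cleared too */
		}
	}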

Tim


RE: [PATCH] lock stat for -rt 2.6.20-rc2-rt2.2.lock_stat.patch

2007-01-03 Thread Chen, Tim C
Bill Huey (hui) wrote:
> Can you sort the output ("sort -n" what ever..) and post it without
> the zeroed entries ?
> 
> I'm curious about how that statistical spike compares to the rest of
> the system activity. I'm sure that'll get the attention of Peter as
> well and maybe he'll do something about it ? :)
> 

Here's the lockstat trace.  You can cross reference it with my
earlier post.  
http://marc.theaimsgroup.com/?l=linux-kernel&m=116743637422465&w=2

The contention happened on mm->mmap_sem shared
by the java threads during futex_wake's invocation of _rt_down_read.

Tim

@contention events = 247149
@failure_events = 146
@lookup_failed_scope = 175
@lookup_failed_static = 43
@static_found = 16
[1, 113, 77 -- 32768, 0]{tcp_init, net/ipv4/tcp.c, 2426}
[2, 759, 182 -- 1, 0]   {lock_kernel, -, 0}
[13, 0, 7 -- 4, 0]  {kmem_cache_free, -, 0}
[25, 3564, 9278 -- 1, 0]{lock_timer_base, -, 0}
[56, 9528, 24552 -- 3, 0]   {init_timers_cpu, kernel/timer.c, 1842}
[471, 52845, 17682 -- 10448, 0] {sock_lock_init, net/core/sock.c, 817}
[32251, 9024, 242 -- 256, 0]{init, kernel/futex.c, 2781}
[173724, 11899638, 9886960 -- 11194, 0] {mm_init, kernel/fork.c, 369}


RE: [PATCH] lock stat for -rt 2.6.20-rc2-rt2.2.lock_stat.patch

2007-01-03 Thread Chen, Tim C
Bill Huey (hui) wrote:
> 
> Good to know that. What did the output reveal ?
> 
> What's your intended use again summarized ? futex contention ? I'll
> read the first posting again.
> 

Earlier I used latency_trace and figured that there was read contention
on mm->mmap_sem during calls to _rt_down_read by the java threads when I
was running volanomark.  That caused the slowdown of the rt kernel
compared to the non-rt kernel.  The output from lock_stat confirms that
mm->mmap_sem was indeed the most heavily contended lock.
Tim


RE: [PATCH] lock stat for -rt 2.6.20-rc2-rt2.2.lock_stat.patch

2007-01-03 Thread Chen, Tim C
Bill Huey (hui) wrote:
> 
> Patch here:
> 
> http://mmlinux.sourceforge.net/public/patch-2.6.20-rc2-rt2.2.lock_stat.patch
> 
> bill

This version is much better and ran stably.

If I'm reading the output correctly, the locks are listed by their
initialization point (the function, file and line # where a lock is
initialized).  That's good information for identifying the lock.

However, it would be more useful if there were also information about
where the lock acquisition was initiated and who was trying to obtain
the lock.
Tim


RE: [PATCH] lock stat for -rt 2.6.20-rc2-rt2 [was Re: 2.6.19-rt14 slowdown compared to 2.6.19]

2007-01-02 Thread Chen, Tim C
Bill Huey (hui) wrote:
> On Tue, Dec 26, 2006 at 04:51:21PM -0800, Chen, Tim C wrote:
>> Ingo Molnar wrote:
>>> If you'd like to profile this yourself then the lowest-cost way of
>>> profiling lock contention on -rt is to use the yum kernel and run
>>> the attached trace-it-lock-prof.c code on the box while your
>>> workload is in 'steady state' (and is showing those extended idle
>>> times): 
>>> 
>>>   ./trace-it-lock-prof > trace.txt
>> 
>> Thanks for the pointer.  Will let you know of any relevant traces.
> 
> Tim,
>
> http://mmlinux.sourceforge.net/public/patch-2.6.20-rc2-rt2.lock_stat.patch
> 
> You can also apply this patch to get more precise statistics down to
> the lock. For example:
> 
Bill,

I'm having some problems getting this patch to run stably.  I'm
encountering errors like the one in the trace that follows:

Thanks.
Tim



Unable to handle kernel NULL pointer dereference at 0008
RIP:
 [] lock_stat_note_contention+0x12d/0x1c3
PGD 0
Oops:  [1] PREEMPT SMP
CPU 1
Modules linked in: autofs4 sunrpc dm_mirror dm_mod video sbs i2c_ec dock
button battery ac uhci_hcd ehci_hcd i2dPid: 0, comm: swapper Not tainted
2.6.20-rc2-rt2 #4

RIP: 0010:[]  []
lock_stat_note_contention+0x12d/0x1c3
RSP: 0018:81013fdb3d28  EFLAGS: 00010097
RAX: 81013fd68018 RBX: 81013fd68000 RCX: 
RDX: 8026762e RSI:  RDI: 8026762e
RBP: 81013fdb3df8 R08: 8100092bab60 R09: 8100092aafc8
R10: 0001 R11:  R12: 81013fd68030
R13:  R14: 0046 R15: 002728d5ecd0
FS:  () GS:81013fd078c0()
knlGS:
CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
CR2: 0008 CR3: 00201000 CR4: 06e0
Process swapper (pid: 0, threadinfo 81013fdb2000, task
81013fdb14e0)
Stack:   00020001 

   0100 000e
 000e   8100092bc440
Call Trace:
 [] rt_mutex_slowtrylock+0x84/0x9b
 [] rt_mutex_trylock+0x2e/0x30
 [] rt_spin_trylock+0x9/0xb
 [] get_next_timer_interrupt+0x34/0x226
 [] hrtimer_stop_sched_tick+0xb6/0x138
 [] cpu_idle+0x1b/0xdd
 [] start_secondary+0x2ed/0x2f9

---
| preempt count: 0003 ]
| 3-level deep critical section nesting:

.. []  cpu_idle+0xd7/0xdd
.[] ..   ( <= start_secondary+0x2ed/0x2f9)
.. []  __spin_lock_irqsave+0x18/0x42
.[] ..   ( <= rt_mutex_slowtrylock+0x19/0x9b)
.. []  __spin_trylock+0x14/0x4c
.[] ..   ( <= oops_begin+0x23/0x6f)

skipping trace printing on CPU#1 != -1

Code: 49 8b 45 08 8b 78 18 75 0d 49 8b 04 24 f0 ff 80 94 00 00 00
RIP  [] lock_stat_note_contention+0x12d/0x1c3
 RSP 
CR2: 0008
 <3>BUG: sleeping function called from invalid context swapper(0) at
kernel/rtmutex.c:1312
in_atomic():1 [0002], irqs_disabled():1

Call Trace:
 [] dump_trace+0xbe/0x3cd
 [] show_trace+0x3a/0x58
 [] dump_stack+0x15/0x17
 [] __might_sleep+0x103/0x10a
 [] rt_mutex_lock_with_ip+0x1e/0xac
 [] __rt_down_read+0x49/0x4d
 [] rt_down_read+0xb/0xd
 [] blocking_notifier_call_chain+0x19/0x3f
 [] profile_task_exit+0x15/0x17
 [] do_exit+0x25/0x8de
 [] do_page_fault+0x7d4/0x856
 [] error_exit+0x0/0x84
 [] lock_stat_note_contention+0x12d/0x1c3
 [] rt_mutex_slowtrylock+0x84/0x9b
 [] rt_mutex_trylock+0x2e/0x30
 [] rt_spin_trylock+0x9/0xb
 [] get_next_timer_interrupt+0x34/0x226
 [] hrtimer_stop_sched_tick+0xb6/0x138
 [] cpu_idle+0x1b/0xdd
 [] start_secondary+0x2ed/0x2f9



RE: 2.6.19-rt14 slowdown compared to 2.6.19

2007-01-02 Thread Chen, Tim C
Ingo Molnar wrote:
> 
> (could you send me the whole trace if you still have it? It would be
> interesting to see a broader snippet from the life of individual java
> threads.)
> 
>   Ingo

Sure, I'll send it to you separately due to the size of the complete
trace.

Tim


RE: 2.6.19-rt14 slowdown compared to 2.6.19

2006-12-29 Thread Chen, Tim C
Ingo Molnar wrote:
> 
> If you'd like to profile this yourself then the lowest-cost way of
> profiling lock contention on -rt is to use the yum kernel and run the
> attached trace-it-lock-prof.c code on the box while your workload is
> in 'steady state' (and is showing those extended idle times):
> 
>   ./trace-it-lock-prof > trace.txt
> 
> this captures up to 1 second worth of system activity, on the current
> CPU. Then you can construct the histogram via:
> 
>   grep -A 1 ' __schedule()<-' trace.txt | cut -d: -f2- | sort |
>   uniq -c | sort -n > prof.txt
> 

I did lock profiling on Volanomark as suggested and obtained the 
profile that is listed below. 

    246  __sched_text_start()<-schedule()<-rt_spin_lock_slowlock()<-__lock_text_start()
    264  rt_mutex_slowunlock()<-rt_mutex_unlock()<-rt_up_read()<-(-1)()
    334  __sched_text_start()<-schedule()<-posix_cpu_timers_thread()<-kthread()
    437  __sched_text_start()<-schedule()<-do_futex()<-sys_futex()
    467  (-1)()<-(0)()<-(0)()<-(0)()
    495  __sched_text_start()<-preempt_schedule()<-__spin_unlock_irqrestore()<-rt_mutex_adjust_prio()
    497  __netif_rx_schedule()<-netif_rx()<-loopback_xmit()<-(-1)()
    499  __sched_text_start()<-schedule()<-schedule_timeout()<-sk_wait_data()
    500  tcp_recvmsg()<-sock_common_recvmsg()<-sock_recvmsg()<-(-1)()
    503  __rt_down_read()<-rt_down_read()<-do_futex()<-(-1)()
   1160  __sched_text_start()<-schedule()<-ksoftirqd()<-kthread()
   1433  __rt_down_read()<-rt_down_read()<-futex_wake()<-(-1)()
   1497  child_rip()<-(-1)()<-(0)()<-(0)()
   1936  __sched_text_start()<-schedule()<-rt_mutex_slowlock()<-rt_mutex_lock()

Looks like the idle time I saw was due to lock contention during calls
to futex_wake, which requires acquiring current->mm->mmap_sem.  Many of
the java threads share the same mm, which results in concurrent access
to a common mm.  Under the rt case there is no special treatment of read
locking, so the read-lock accesses contend under __rt_down_read.  In the
non-rt case, __down_read makes the distinction for read-lock access and
the read lockings do not contend.
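For reference, the path being described looks roughly like this in the
2.6.20-era futex code (a simplified sketch, not the exact source):

	/* kernel/futex.c, simplified */
	static int futex_wake(u32 __user *uaddr, int nr_wake)
	{
		union futex_key key;
		int ret;

		/* every java thread sharing this mm takes the same rwsem
		 * for read here; on -rt this ends up in __rt_down_read
		 */
		down_read(&current->mm->mmap_sem);
		ret = get_futex_key(uaddr, &key);
		if (unlikely(ret != 0))
			goto out;

		/* ... hash the key and wake up to nr_wake waiters ... */
	out:
		up_read(&current->mm->mmap_sem);
		return ret;
	}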

Things are made worse here, as this delays waking up the processes
blocked on the futex.  See also a snippet of the latency_trace below.

  -0 2D..2 5821us!: thread_return  (150 20)
  -0 2DN.1 6278us :
__sched_text_start()<-cpu_idle()<-start_secondary()<-(-1)()
  -0 2DN.1 6278us : (0)()<-(0)()<-(0)()<-(0)()
java-6648  2D..2 6280us+: thread_return <-0> (20 -4)
java-6648  2D..1 6296us :
try_to_wake_up()<-wake_up_process()<-wakeup_next_waiter()<-rt_mutex_slow
unlock()
java-6648  2D..1 6296us :
rt_mutex_unlock()<-rt_up_read()<-do_futex()<-(-1)()
java-6648  2D..2 6297us : effective_prio <<...>-6673> (-4 -4)
java-6648  2D..2 6297us : __activate_task <<...>-6673> (-4 1)
java-6648  2 6297us < (-11)
java-6648  2 6298us+> sys_futex (00afaf50
0001 0001)
java-6648  2...1 6315us :
__sched_text_start()<-schedule()<-rt_mutex_slowlock()<-rt_mutex_lock()
java-6648  2...1 6315us :
__rt_down_read()<-rt_down_read()<-futex_wake()<-(-1)()
java-6648  2D..2 6316us+: deactivate_task  (-4 1)
  -0 2D..2 6318us+: thread_return  (-4 20)
  -0 2DN.1 6327us :
__sched_text_start()<-cpu_idle()<-start_secondary()<-(-1)()
  -0 2DN.1 6328us+: (0)()<-(0)()<-(0)()<-(0)()
java-6629  2D..2 6330us+: thread_return <-0> (20 -4)
java-6629  2D..1 6347us :
try_to_wake_up()<-wake_up_process()<-wakeup_next_waiter()<-rt_mutex_slow
unlock()
java-6629  2D..1 6347us :
rt_mutex_unlock()<-rt_up_read()<-futex_wake()<-(-1)()
java-6629  2D..2 6348us : effective_prio  (-4 -4)
java-6629  2D..2 6349us : __activate_task  (-4 1)
java-6629  2 6350us+< (0)
java-6629  2 6352us+> sys_futex (00afc1dc
0001 0001)
java-6629  2...1 6368us :
__sched_text_start()<-schedule()<-rt_mutex_slowlock()<-rt_mutex_lock()
java-6629  2...1 6368us :
__rt_down_read()<-rt_down_read()<-futex_wake()<-(-1)()
java-6629  2D..2 6369us+: deactivate_task  (-4 1)
  -0 2D..2 6404us!: thread_return  (-4 20)
  -0 2DN.1 6584us :
__sched_text_start()<-cpu_idle()<-start_secondary()<-(-1)()

Thanks.

Tim


RE: 2.6.19-rt14 slowdown compared to 2.6.19

2006-12-26 Thread Chen, Tim C
Ingo Molnar wrote:
> 
> cool - thanks for the feedback! Running the 64-bit kernel, right?
> 

Yes, 64-bit kernel was used.

> 
> while some slowdown is to be expected, did in each case idle time
> increase significantly? 

Volanomark and Re-Aim7 ran close to 0% idle time on the 2.6.19 kernel.
Idle time increased significantly for Volanomark (to 60% idle) and
Re-Aim7 (to 20% idle) with the rt kernel.  For netperf, the system was
60% idle on both the 2.6.19 and rt kernels, and the change in idle time
was not significant.

> If yes then this is the effect of lock
> contention. Lock contention effects are 'magnified' by PREEMPT_RT. For
> example if you run 128 threads workload that all use the same lock
> then 
> the -rt kernel can act as if it were a 128-way box (!). This way by
> running -rt you'll see scalability problems alot sooner than on real
> hardware. In other words: PREEMPT_RT in essence simulates the
> scalability behavior of up to an infinite amount of CPUs. (with the
> exception of cachemiss emulation ;) [the effect is not this precise,
> but 
> that's the rough trend]

Turning off PREEMPT_RT for the 2.6.20-rc2-rt0 kernel restored most of
the performance of Volanomark and Re-Aim7.  Idle time is close to 0%.
So the benchmarks with a large number of threads are affected more by
PREEMPT_RT.

For netperf TCP streaming, the regression improved from 40% down to 20%
relative to the 2.6.20-rc2 kernel.  There is only one server and one
client process for netperf, so the underlying reason for the change in
performance is probably different.

> 
> If you'd like to profile this yourself then the lowest-cost way of
> profiling lock contention on -rt is to use the yum kernel and run the
> attached trace-it-lock-prof.c code on the box while your workload is
> in 'steady state' (and is showing those extended idle times):
> 
>   ./trace-it-lock-prof > trace.txt
> 

Thanks for the pointer.  Will let you know of any relevant traces.

Thanks.

Tim


2.6.19-rt14 slowdown compared to 2.6.19

2006-12-22 Thread Chen, Tim C
Ingo,
 
We did some benchmarking on 2.6.19-rt14, compared it with the 2.6.19
kernel and noticed several slowdowns.  The test machine is a 2-socket
Woodcrest machine with your default configuration.
 
Netperf TCP streaming was slower by 40% (1 server and 1 client, each
bound to separate CPU cores on different sockets; network loopback mode
was used).

Volanomark was slower by 80% (server and clients communicate in network
loopback mode; idle time goes from 1% to 60%).

Re-Aim7 was slower by 40% (idle time goes from 0% to 20%)

Wonder if you have any suggestions on what could cause the slowdown.  
We've tried disabling CONFIG_NO_HZ and it didn't help much.

Thanks.

Tim

