RE: [v5 PATCH 1/2] mm: swap: check if swap backing device is congested or not

2019-01-10 Thread Chen, Tim C
> > +	if (si->flags & (SWP_BLKDEV | SWP_FS)) {
>
> I re-read your discussion with Tim and I must say the reasoning behind
> this test remains foggy.

I was worried that the dereference inode = si->swap_file->f_mapping->host;
is not always safe for corner cases. So the test makes sure that
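
For reference, a minimal sketch of the guard being discussed; the congestion
helper shown below is only an assumption used to illustrate where the
dereference sits, not text quoted from the patch:

	/*
	 * Sketch only: follow the si->swap_file->f_mapping->host chain
	 * only when the swap area is known to be backed by a block device
	 * or a regular file, so the dereference is safe.
	 */
	if (si->flags & (SWP_BLKDEV | SWP_FS)) {
		struct inode *inode = si->swap_file->f_mapping->host;

		/* inode_read_congested() is assumed here purely as an
		 * example of a congestion test on the backing device. */
		if (inode_read_congested(inode->i_mapping))
			goto skip;
	}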

RE: [Update][PATCH v5 7/9] mm/swap: Add cache for swap slots allocation

2017-01-17 Thread Chen, Tim C
> > The cache->slots_ret is protected by cache->free_lock and
> > cache->slots is protected by cache->free_lock.

Typo. cache->slots is protected by cache->alloc_lock.

Tim
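
A sketch of the per-CPU cache structure with the corrected lock split recorded
in comments; the exact field layout here is illustrative rather than copied
from the patch:

	struct swap_slots_cache {
		bool		lock_initialized;
		struct mutex	alloc_lock;	/* protects slots, nr, cur */
		swp_entry_t	*slots;		/* entries handed out by get_swap_page() */
		int		nr;
		int		cur;
		spinlock_t	free_lock;	/* protects slots_ret, n_ret */
		swp_entry_t	*slots_ret;	/* entries waiting to be freed in a batch */
		int		n_ret;
	};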

RE: [Update][PATCH v5 7/9] mm/swap: Add cache for swap slots allocation

2017-01-17 Thread Chen, Tim C
> > +	/*
> > +	 * Preemption need to be turned on here, because we may sleep
> > +	 * in refill_swap_slots_cache(). But it is safe, because
> > +	 * accesses to the per-CPU data structure are protected by a
> > +	 * mutex.
> > +	 */
>
> the comment doesn't really explain why it is
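
The allocation path that comment refers to, sketched from the description in
the thread; the names follow the patch, but the exact control flow here is an
assumption:

	/*
	 * The per-CPU cache is reached without disabling preemption because
	 * refill_swap_slots_cache() may sleep.  Correctness does not depend
	 * on staying on the same CPU: alloc_lock serializes every user of
	 * whichever per-CPU cache this task happened to pick up.
	 */
	swp_entry_t get_swap_page_sketch(void)
	{
		swp_entry_t entry = (swp_entry_t){ 0 };
		struct swap_slots_cache *cache = raw_cpu_ptr(&swp_slots);

		mutex_lock(&cache->alloc_lock);
		if (cache->nr || refill_swap_slots_cache(cache)) {
			entry = cache->slots[cache->cur];
			cache->slots[cache->cur++].val = 0;
			cache->nr--;
		}
		mutex_unlock(&cache->alloc_lock);

		return entry;
	}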

RE: [PATCH -v3 00/10] THP swap: Delay splitting THP during swapping out

2016-09-22 Thread Chen, Tim C
> So this is impossible without THP swapin. While 2M swapout makes a lot
> of sense, I doubt 2M swapin is really useful. What kind of application
> is 'optimized' to do sequential memory access?

We waste a lot of CPU cycles to re-compact 4K pages back to a large page
under THP. Swapping it back

RE: [PATCH -v3 00/10] THP swap: Delay splitting THP during swapping out

2016-09-13 Thread Chen, Tim C
> > - Avoid CPU time for splitting, collapsing THP across swap out/in.
>
> Yes, if you want, please give us how bad it is.

It could be pretty bad. In an experiment with THP turned on where we
entered swap, 50% of the CPU time was spent in the page compaction path.
So if we could deal with units of

RE: performance delta after VFS i_mutex=>i_rwsem conversion

2016-06-09 Thread Chen, Tim C
> Ok, these enhancements are now in the locking tree and are queued up
> for v4.8:
>
>   git pull git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git locking/core
>
> Dave, you might want to check your numbers with these changes: is
> rwsem performance still significantly worse than

RE: Regression with SLUB on Netperf and Volanomark

2007-05-03 Thread Chen, Tim C
Christoph Lameter wrote:
> Try to boot with
>
>   slub_max_order=4 slub_min_objects=8
>
> If that does not help increase slub_min_objects to 16.

We are still seeing a 5% regression on TCP streaming with
slub_min_objects set at 16 and a 10% regression for Volanomark, after
increasing slub_min_objects

RE: [PATCH] lock stat for -rt 2.6.20-rc2-rt2.2.lock_stat.patch

2007-01-03 Thread Chen, Tim C
Bill Huey (hui) wrote:
> This should have the fix.
>
> http://mmlinux.sf.net/public/patch-2.6.20-rc2-rt2.3.lock_stat.patch
>
> If you can rerun it and post the results, it'll hopefully show the
> behavior of that lock acquisition better.

Here's the run with the fix to produce correct statistics.

RE: [PATCH] lock stat for -rt 2.6.20-rc2-rt2.2.lock_stat.patch

2007-01-03 Thread Chen, Tim C
Bill Huey (hui) wrote:
> Thanks, the numbers look a bit weird in that the first column should
> have a bigger number of events than the second column since it is a
> special-case subset. Looking at the lock_stat_note() code should show
> that to be the case. Did you make a change to the output

RE: [PATCH] lock stat for -rt 2.6.20-rc2-rt2.2.lock_stat.patch

2007-01-03 Thread Chen, Tim C
Bill Huey (hui) wrote:
> Can you sort the output ("sort -n" or whatever) and post it without
> the zeroed entries?
>
> I'm curious about how that statistical spike compares to the rest of
> the system activity. I'm sure that'll get the attention of Peter as
> well and maybe he'll do something about it?

RE: [PATCH] lock stat for -rt 2.6.20-rc2-rt2.2.lock_stat.patch

2007-01-03 Thread Chen, Tim C
Bill Huey (hui) wrote:
> Good to know that. What did the output reveal?
>
> What's your intended use again, summarized? Futex contention? I'll
> read the first posting again.

Earlier I used latency_trace and figured that there was read contention
on mm->mmap_sem during the call to _rt_down_read by
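
The pattern behind that contention, sketched for context (illustrative only,
not taken from the trace): on -rt a rwsem read acquisition goes through
_rt_down_read(), so concurrent readers of mm->mmap_sem serialize on the
underlying rtmutex instead of running in parallel.

	/*
	 * Illustrative reader path.  With a plain rwsem many threads can
	 * hold this for read at once; on -rt each down_read() ends up in
	 * _rt_down_read() and contends on the rtmutex.
	 */
	static bool addr_is_mapped_sketch(struct mm_struct *mm,
					  unsigned long addr)
	{
		struct vm_area_struct *vma;
		bool mapped;

		down_read(&mm->mmap_sem);	/* _rt_down_read() on -rt */
		vma = find_vma(mm, addr);
		mapped = vma && addr >= vma->vm_start;
		up_read(&mm->mmap_sem);

		return mapped;
	}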

RE: [PATCH] lock stat for -rt 2.6.20-rc2-rt2.2.lock_stat.patch

2007-01-03 Thread Chen, Tim C
Bill Huey (hui) wrote:
> Patch here:
>
> http://mmlinux.sourceforge.net/public/patch-2.6.20-rc2-rt2.2.lock_stat.patch
>
> bill

This version is much better and ran stably. If I'm reading the output
correctly, the locks are listed by their initialization point (function,
file and line # that

RE: [PATCH] lock stat for -rt 2.6.20-rc2-rt2 [was Re: 2.6.19-rt14 slowdown compared to 2.6.19]

2007-01-02 Thread Chen, Tim C
Bill Huey (hui) wrote:
> On Tue, Dec 26, 2006 at 04:51:21PM -0800, Chen, Tim C wrote:
>> Ingo Molnar wrote:
>>> If you'd like to profile this yourself then the lowest-cost way of
>>> profiling lock contention on -rt is to use the yum kernel and run
>>> the attached trace-it-lock-prof.c code on the box while

RE: 2.6.19-rt14 slowdown compared to 2.6.19

2007-01-02 Thread Chen, Tim C
Ingo Molnar wrote:
> (could you send me the whole trace if you still have it? It would be
> interesting to see a broader snippet from the life of individual java
> threads.)
>
>	Ingo

Sure, I'll send it to you separately due to the size of the complete trace.

Tim

RE: 2.6.19-rt14 slowdown compared to 2.6.19

2006-12-29 Thread Chen, Tim C
Ingo Molnar wrote:
> If you'd like to profile this yourself then the lowest-cost way of
> profiling lock contention on -rt is to use the yum kernel and run the
> attached trace-it-lock-prof.c code on the box while your workload is
> in 'steady state' (and is showing those extended idle times):

RE: 2.6.19-rt14 slowdown compared to 2.6.19

2006-12-26 Thread Chen, Tim C
Ingo Molnar wrote:
> cool - thanks for the feedback! Running the 64-bit kernel, right?

Yes, a 64-bit kernel was used.

> while some slowdown is to be expected, did in each case idle time
> increase significantly?

Volanomark and Re-Aim7 ran close to 0% idle time for the 2.6.19 kernel.
Idle time

2.6.19-rt14 slowdown compared to 2.6.19

2006-12-22 Thread Chen, Tim C
Ingo,

We did some benchmarking on 2.6.19-rt14, compared it with the 2.6.19
kernel, and noticed several slowdowns. The test machine is a 2-socket
Woodcrest machine with your default configuration. Netperf TCP streaming
was slower by 40% (1 server and 1 client each bound to separate cpu cores on
