> >
> > + if (si->flags & (SWP_BLKDEV | SWP_FS)) {
>
> I re-read your discussion with Tim and I must say the reasoning behind this
> test remains foggy.
I was worried that the dereference
inode = si->swap_file->f_mapping->host;
is not always safe for corner cases.
So the test makes sure that
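A userspace sketch of the guard being discussed may help. This is not the kernel code itself: the struct layouts are simplified stand-ins that mirror the kernel field names (`flags`, `swap_file`, `f_mapping`, `host`), the flag values are illustrative, and `swap_inode_or_null()` is a hypothetical helper that only follows the pointer chain when the flags say the backing store is a block device or filesystem.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace models of the kernel structures under discussion;
 * field and flag names mirror the kernel, values are illustrative. */
#define SWP_BLKDEV (1 << 6)
#define SWP_FS     (1 << 8)

struct address_space { void *host; };
struct file { struct address_space *f_mapping; };
struct swap_info { unsigned long flags; struct file *swap_file; };

/* Only dereference si->swap_file->f_mapping->host when the flags say
 * the backing store is a block device or filesystem; otherwise the
 * chain may not be fully set up, which is the corner case above. */
static void *swap_inode_or_null(struct swap_info *si)
{
    if (!(si->flags & (SWP_BLKDEV | SWP_FS)))
        return NULL;
    if (!si->swap_file || !si->swap_file->f_mapping)
        return NULL;
    return si->swap_file->f_mapping->host;
}
```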
> > >
> >
> > The cache->slots_ret is protected by cache->free_lock and
> > cache->slots is protected by cache->free_lock.
Typo. cache->slots is protected by cache->alloc_lock.
Tim
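The corrected locking rule (slots under alloc_lock, slots_ret under free_lock) can be sketched in userspace with pthread mutexes standing in for kernel mutexes. The struct name echoes the kernel's swap slots cache, but the sizes, types, and the two helper functions are illustrative, not the kernel implementation.

```c
#include <assert.h>
#include <pthread.h>

#define CACHE_SIZE 64

/* Two independent sides, each with its own lock, as Tim's correction
 * states: slots (allocation side) under alloc_lock, slots_ret (free
 * side) under free_lock. */
struct swap_slots_cache {
    pthread_mutex_t alloc_lock;        /* protects slots, nr */
    long slots[CACHE_SIZE];
    int nr;
    pthread_mutex_t free_lock;         /* protects slots_ret, n_ret */
    long slots_ret[CACHE_SIZE];
    int n_ret;
};

/* Take a cached slot; touches only alloc_lock-protected fields. */
static long cache_alloc(struct swap_slots_cache *c)
{
    long s = -1;
    pthread_mutex_lock(&c->alloc_lock);
    if (c->nr > 0)
        s = c->slots[--c->nr];
    pthread_mutex_unlock(&c->alloc_lock);
    return s;
}

/* Return a slot for batched freeing; touches only free_lock-protected
 * fields, so it never contends with cache_alloc(). */
static int cache_free(struct swap_slots_cache *c, long slot)
{
    int ok = 0;
    pthread_mutex_lock(&c->free_lock);
    if (c->n_ret < CACHE_SIZE) {
        c->slots_ret[c->n_ret++] = slot;
        ok = 1;
    }
    pthread_mutex_unlock(&c->free_lock);
    return ok;
}
```

Keeping the two sides under separate locks means an allocation on one CPU never serializes against a batched free, which is the point of splitting the cache in two.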
> > + /*
> > + * Preemption needs to be turned on here, because we may sleep
> > + * in refill_swap_slots_cache(). But it is safe, because
> > + * accesses to the per-CPU data structure are protected by a
> > + * mutex.
> > + */
>
> the comment doesn't really explain why it is
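The pattern the quoted comment tries to describe can be made concrete with a userspace sketch: preemption must be re-enabled before calling anything that may sleep, and the per-CPU cache stays consistent because a mutex, not preemption-disabling, serializes access. The `preempt_*` macros here are userspace stand-ins, and `refill_swap_slots_cache()` is a stub, not the kernel function.

```c
#include <assert.h>
#include <pthread.h>

/* Userspace stand-ins for the kernel's preemption control. */
static int preempt_count;
#define preempt_disable() (preempt_count++)
#define preempt_enable()  (preempt_count--)

static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;
static int cache_nr;

/* May sleep, so it must never run with preemption disabled. */
static void refill_swap_slots_cache(void)
{
    assert(preempt_count == 0);
    cache_nr = 64;
}

static int get_swap_slot(void)
{
    preempt_disable();
    /* ... fast path touching per-CPU data would go here ... */
    preempt_enable();   /* turn preemption back on: we may sleep below */

    pthread_mutex_lock(&cache_lock);   /* the mutex keeps it safe */
    if (cache_nr == 0)
        refill_swap_slots_cache();
    int slot = --cache_nr;
    pthread_mutex_unlock(&cache_lock);
    return slot;
}
```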
>
>So this is impossible without THP swapin. While 2M swapout makes a lot of
>sense, I doubt 2M swapin is really useful. What kind of application is
>'optimized' to do sequential memory access?
We waste a lot of cpu cycles to re-compact 4K pages back to a large page
under THP. Swapping it back
>>
>> - Avoid CPU time for splitting, collapsing THP across swap out/in.
>
>Yes, if you want, please give us how bad it is.
>
It could be pretty bad. In an experiment with THP turned on where we
entered swap, 50% of the CPU time was spent in the page compaction path.
So if we could deal with units of
>> Ok, these enhancements are now in the locking tree and are queued up for
>> v4.8:
>>
>>git pull git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git
>> locking/core
>>
>> Dave, you might want to check your numbers with these changes: is
>> rwsem performance still significantly worse than
Christoph Lameter wrote:
> Try to boot with
>
> slub_max_order=4 slub_min_objects=8
>
> If that does not help increase slub_min_objects to 16.
>
We are still seeing a 5% regression on TCP streaming with
slub_min_objects set at 16 and a 10% regression for Volanomark, after
increasing slub_min_objects
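The tunings Christoph suggests are kernel command-line parameters, so they are set at boot. As an illustrative config fragment (the GRUB file path is one common place to put them; 16 is the higher value tried in the follow-up):

```
# /etc/default/grub (illustrative; regenerate the grub config after editing)
GRUB_CMDLINE_LINUX="slub_max_order=4 slub_min_objects=16"
```

These knobs raise the SLUB page-allocation order and the minimum number of objects per slab, trading memory for fewer partial-slab operations.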
Bill Huey (hui) wrote:
> This should have the fix.
>
>
http://mmlinux.sf.net/public/patch-2.6.20-rc2-rt2.3.lock_stat.patch
>
> If you can rerun it and post the results, it'll hopefully show the
> behavior of that lock acquisition better.
>
Here's the run with the fix to produce correct statistics.
Bill Huey (hui) wrote:
>
> Thanks, the numbers look a bit weird in that the first column should
> have a bigger number of events than the second column since it is a
> special case subset. Looking at the lock_stat_note() code should show
> that to be the case. Did you make a change to the output ?
Bill Huey (hui) wrote:
> Can you sort the output ("sort -n" what ever..) and post it without
> the zeroed entries ?
>
> I'm curious about how that statistical spike compares to the rest of
> the system activity. I'm sure that'll get the attention of Peter as
> well and maybe he'll do something about it ?
Bill Huey (hui) wrote:
>
> Good to know that. What did the output reveal ?
>
> What's your intended use again summarized ? futex contention ? I'll
> read the first posting again.
>
Earlier I used latency_trace and figured that there was read contention
on mm->mmap_sem during call to _rt_down_read by
Bill Huey (hui) wrote:
>
> Patch here:
>
>
http://mmlinux.sourceforge.net/public/patch-2.6.20-rc2-rt2.2.lock_stat.patch
>
> bill
This version is much better and ran stably.
If I'm reading the output correctly, the locks are listed by
their initialization point (function, file and line #
Bill Huey (hui) wrote:
> On Tue, Dec 26, 2006 at 04:51:21PM -0800, Chen, Tim C wrote:
>> Ingo Molnar wrote:
>>> If you'd like to profile this yourself then the lowest-cost way of
>>> profiling lock contention on -rt is to use the yum kernel and run
>>> the attached trace-it-lock-prof.c code on the box while
Ingo Molnar wrote:
>
> (could you send me the whole trace if you still have it? It would be
> interesting to see a broader snippet from the life of individual java
> threads.)
>
> Ingo
Sure, I'll send it to you separately due to the size of the complete
trace.
Tim
Ingo Molnar wrote:
>
> If you'd like to profile this yourself then the lowest-cost way of
> profiling lock contention on -rt is to use the yum kernel and run the
> attached trace-it-lock-prof.c code on the box while your workload is
> in 'steady state' (and is showing those extended idle times):
Ingo Molnar wrote:
>
> cool - thanks for the feedback! Running the 64-bit kernel, right?
>
Yes, 64-bit kernel was used.
>
> while some slowdown is to be expected, did in each case idle time
> increase significantly?
Volanomark and Re-Aim7 ran close to 0% idle time for 2.6.19 kernel.
Idle
Ingo,
We did some benchmarking on 2.6.19-rt14, compared it with 2.6.19
kernel and noticed several slowdowns. The test machine is a 2 socket
woodcrest machine with your default configuration.
Netperf TCP Streaming was slower by 40% ( 1 server and 1 client
each bound to separate cpu cores on