On Tue, 8 May 2007, Tim Chen wrote:
> I tried the slub-patches and the avoid atomic overhead patch against
> 2.6.21-mm1. It brings the TCP_STREAM performance for SLUB to the SLAB
> level. The patches not mentioned in the "series" file did not apply
> cleanly to 2.6.21-mm1 and I skipped most of those.
On Mon, 2007-05-07 at 18:49 -0700, Christoph Lameter wrote:
> On Mon, 7 May 2007, Tim Chen wrote:
>
> > However, the output from TCP_STREAM is quite stable.
> > I am still seeing a 4% difference between the SLAB and SLUB kernel.
> > Looking at the L2 cache miss rate with emon, I saw 6% more cache misses
> > on the client side with SLUB. The server side has the same
On Fri, 2007-05-04 at 18:02 -0700, Christoph Lameter wrote:
> On Fri, 4 May 2007, Tim Chen wrote:
>
> > On Thu, 2007-05-03 at 18:45 -0700, Christoph Lameter wrote:
> > > H.. One potential issue is the complicated way the slab is
> > > handled. Could you try this patch and see what impact it has?
Got something. If I remove the atomics from both alloc and free then I
get a performance jump. But maybe also a runtime variation.
Avoid the use of atomics in slab_alloc
About 5-7% performance gain. Or am I also seeing runtime variations?
What we do is add the last free field in the page
If you want to test some more: Here is a patch that removes the atomic ops
from the allocation path. But I only see minor improvements on my amd64
box here.
Avoid the use of atomics in slab_alloc
This only increases netperf performance by 1%. Wonder why?
What we do is add the last free
On Fri, 4 May 2007, Tim Chen wrote:
> On Thu, 2007-05-03 at 18:45 -0700, Christoph Lameter wrote:
> > H.. One potential issue is the complicated way the slab is
> > handled. Could you try this patch and see what impact it has?
> >
> The patch boosts the throughput of the TCP_STREAM test by 5%, for both
> slab and slub. But slab is still 5%
On Fri, 2007-05-04 at 16:59 -0700, Christoph Lameter wrote:
> >
> > to run the tests. The results are about the same as the non-NUMA case,
> > with slab about 5% better than slub.
>
> H... both tests were run in the same context? NUMA has additional
> overhead in other areas.
Both slab and slub
On Fri, 4 May 2007, Tim Chen wrote:
> On Fri, 2007-05-04 at 11:27 -0700, Christoph Lameter wrote:
>
> >
> > Not sure where to go here. Increasing the per cpu slab size may hold off
> > the issue up to a certain cpu cache size. For that we would need to
> > identify which slabs create the performance issue.
> >
> > One easy way to check that this is
On Fri, 2007-05-04 at 11:10 -0700, Christoph Lameter wrote:
> On Fri, 4 May 2007, Tim Chen wrote:
>
> > A side note is that for my tests, I bound the netserver and client to
> > separate cpu cores on different sockets, to make sure that the server
> > and client do not share the same cache.
If I optimize now for the case that we do not share the cpu cache between
different cpus then performance may drop for the case in which we share
the cache (hyperthreading).
If we do not share the cache then processors essentially need to have
their own lists of partial caches in which they
On Fri, 4 May 2007, Tim Chen wrote:
> A side note is that for my tests, I bound the netserver and client to
> separate cpu cores on different sockets, to make sure that the server
> and client do not share the same cache.
Ahhh... You have some scripts that you run. Care to share?
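Tim's scripts never made it into the archive. A minimal stand-in for what he describes — pinning netserver and the netperf client to cores on different sockets — could look like the following; the core numbers and the loopback address are assumptions that depend on the actual machine topology and setup:

```shell
#!/bin/sh
# Pin the server to core 0 (socket 0). Core 2 is assumed to sit on
# socket 1; check the topology before reusing these numbers.
taskset -c 0 netserver

# 60-second TCP_STREAM run against the local server, client pinned
# to the other socket so server and client share no cpu cache.
taskset -c 2 netperf -H 127.0.0.1 -t TCP_STREAM -l 60
```

Pinning both ends is what makes the L2 miss-rate comparison between SLAB and SLUB meaningful; without it the scheduler may co-locate the processes and hide the allocator's cache footprint.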
On Thu, 2007-05-03 at 19:42 -0700, Christoph Lameter wrote:
> H... I do not see a regression (up to date slub with all outstanding
> patches applied). This is without any options enabled (but antifrag
> patches are present so slub_max_order=4 slub_min_objects=16). Could you
> post a .config? Missing patches against 2.6.21-rc7-mm2 can be found at
H.. One potential issue is the complicated way the slab is
handled. Could you try this patch and see what impact it has?
If it has any then remove the cacheline alignment and see how that
influences things.
Remove constructor from buffer_head
Buffer head management uses a constructor
On Thu, 3 May 2007, Chen, Tim C wrote:
> We are still seeing a 5% regression on TCP streaming and a 10%
> regression for Volanomark after increasing slub_min_objects to 16 and
> setting slub_max_order=4, using the 2.6.21-rc7-mm2 kernel. The
> performance between
Christoph Lameter wrote:
> Try to boot with
>
> slub_max_order=4 slub_min_objects=8
>
> If that does not help increase slub_min_objects to 16.
>
We are still seeing a 5% regression on TCP streaming and a 10% regression
for Volanomark after increasing slub_min_objects to 16 and setting
slub_max_order=4, using the 2.6.21-rc7-mm2 kernel.
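Both settings are kernel command-line parameters rather than sysctls, so trying them means editing the boot loader entry. A GRUB entry of that era might carry them roughly like this (kernel image and root device are placeholders):

```
kernel /boot/vmlinuz-2.6.21-rc7-mm2 root=/dev/sda1 slub_max_order=4 slub_min_objects=16
```

Raising slub_max_order lets SLUB use larger pages per slab, and slub_min_objects forces more objects per slab; together they trade memory for fewer partial-slab transitions, which is the effect being probed for here.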
On Wed, 2 May 2007, Tim Chen wrote:
> We tested SLUB on a 2 socket Clovertown (Core 2 cpu with 4 cores/socket)
> and a 2 socket Woodcrest (Core 2 cpu with 2 cores/socket).
Try to boot with
slub_max_order=4 slub_min_objects=8
If that does not help increase slub_min_objects to 16.
> We found