RE: Regression with SLUB on Netperf and Volanomark

2007-05-08 Thread Christoph Lameter
On Tue, 8 May 2007, Tim Chen wrote:
> I tried the slub-patches and the avoid atomic overhead patch against
> 2.6.21-mm1. It brings the TCP_STREAM performance for SLUB to the SLAB
> level. The patches not mentioned in the "series" file did not apply
> cleanly to 2.6.21-mm1 and I skipped most of

RE: Regression with SLUB on Netperf and Volanomark

2007-05-08 Thread Tim Chen
On Mon, 2007-05-07 at 18:49 -0700, Christoph Lameter wrote:
> On Mon, 7 May 2007, Tim Chen wrote:
>
> > However, the output from TCP_STREAM is quite stable.
> > I am still seeing a 4% difference between the SLAB and SLUB kernel.
> > Looking at the L2 cache miss rate with emon, I saw 6% more

RE: Regression with SLUB on Netperf and Volanomark

2007-05-07 Thread Christoph Lameter
On Mon, 7 May 2007, Tim Chen wrote:
> However, the output from TCP_STREAM is quite stable.
> I am still seeing a 4% difference between the SLAB and SLUB kernel.
> Looking at the L2 cache miss rate with emon, I saw 6% more cache misses on
> the client side with SLUB. The server side has the same

RE: Regression with SLUB on Netperf and Volanomark

2007-05-07 Thread Tim Chen
On Fri, 2007-05-04 at 18:02 -0700, Christoph Lameter wrote:
> On Fri, 4 May 2007, Tim Chen wrote:
>
> > On Thu, 2007-05-03 at 18:45 -0700, Christoph Lameter wrote:
> > > Hmmm.. One potential issue is the complicated way the slab is
> > > handled. Could you try this patch and see what impact

RE: Regression with SLUB on Netperf and Volanomark

2007-05-04 Thread Christoph Lameter
Got something. If I remove the atomics from both alloc and free then I get a performance jump. But maybe also a runtime variation.

Avoid the use of atomics in slab_alloc

About 5-7% performance gain. Or am I also seeing runtime variations? What we do is add the last free field in the page

RE: Regression with SLUB on Netperf and Volanomark

2007-05-04 Thread Christoph Lameter
If you want to test some more: Here is a patch that removes the atomic ops from the allocation path. But I only see minor improvements on my amd64 box here.

Avoid the use of atomics in slab_alloc

This only increases netperf performance by 1%. Wonder why? What we do is add the last free

RE: Regression with SLUB on Netperf and Volanomark

2007-05-04 Thread Christoph Lameter
On Fri, 4 May 2007, Tim Chen wrote:
> On Thu, 2007-05-03 at 18:45 -0700, Christoph Lameter wrote:
> > Hmmm.. One potential issue is the complicated way the slab is
> > handled. Could you try this patch and see what impact it has?
> >
> The patch boosts the throughput of the TCP_STREAM test by 5%,

RE: Regression with SLUB on Netperf and Volanomark

2007-05-04 Thread Tim Chen
On Fri, 2007-05-04 at 16:59 -0700, Christoph Lameter wrote:
> >
> > to run the tests. The results are about the same as the non-NUMA case,
> > with slab about 5% better than slub.
>
> Hmmm... both tests were run in the same context? NUMA has additional
> overhead in other areas.

Both slab

RE: Regression with SLUB on Netperf and Volanomark

2007-05-04 Thread Tim Chen
On Thu, 2007-05-03 at 18:45 -0700, Christoph Lameter wrote:
> Hmmm.. One potential issue is the complicated way the slab is
> handled. Could you try this patch and see what impact it has?
>
The patch boosts the throughput of the TCP_STREAM test by 5%, for both slab and slub. But slab is still 5%

RE: Regression with SLUB on Netperf and Volanomark

2007-05-04 Thread Christoph Lameter
On Fri, 4 May 2007, Tim Chen wrote:
> On Fri, 2007-05-04 at 11:27 -0700, Christoph Lameter wrote:
>
> >
> > Not sure where to go here. Increasing the per cpu slab size may hold off
> > the issue up to a certain cpu cache size. For that we would need to
> > identify which slabs create the

RE: Regression with SLUB on Netperf and Volanomark

2007-05-04 Thread Tim Chen
On Fri, 2007-05-04 at 11:27 -0700, Christoph Lameter wrote:
>
> Not sure where to go here. Increasing the per cpu slab size may hold off
> the issue up to a certain cpu cache size. For that we would need to
> identify which slabs create the performance issue.
>
> One easy way to check that

RE: Regression with SLUB on Netperf and Volanomark

2007-05-04 Thread Tim Chen
On Fri, 2007-05-04 at 11:10 -0700, Christoph Lameter wrote:
> On Fri, 4 May 2007, Tim Chen wrote:
>
> > A side note is that for my tests, I bound the netserver and client to
> > separate cpu cores on different sockets in my tests, to make sure that
> > the server and client do not share the same

RE: Regression with SLUB on Netperf and Volanomark

2007-05-04 Thread Christoph Lameter
If I optimize now for the case that we do not share the cpu cache between different cpus then performance may drop for the case in which we share the cache (hyperthreading). If we do not share the cache then processors essentially need to have their own lists of partial caches in which they

RE: Regression with SLUB on Netperf and Volanomark

2007-05-04 Thread Christoph Lameter
On Fri, 4 May 2007, Tim Chen wrote:
> A side note is that for my tests, I bound the netserver and client to
> separate cpu cores on different sockets in my tests, to make sure that
> the server and client do not share the same cache.

Ahhh... You have some scripts that you run. Care to share?

RE: Regression with SLUB on Netperf and Volanomark

2007-05-04 Thread Tim Chen
On Thu, 2007-05-03 at 19:42 -0700, Christoph Lameter wrote:
> Hmmm... I do not see a regression (up to date slub with all outstanding
> patches applied). This is without any options enabled (but antifrag
> patches are present so slub_max_order=4 slub_min_objects=16). Could you
> post a .config?

RE: Regression with SLUB on Netperf and Volanomark

2007-05-03 Thread Christoph Lameter
Hmmm... I do not see a regression (up to date slub with all outstanding patches applied). This is without any options enabled (but antifrag patches are present so slub_max_order=4 slub_min_objects=16). Could you post a .config? Missing patches against 2.6.21-rc7-mm2 can be found at

RE: Regression with SLUB on Netperf and Volanomark

2007-05-03 Thread Christoph Lameter
Hmmm.. One potential issue is the complicated way the slab is handled. Could you try this patch and see what impact it has? If it has any then remove the cacheline alignment and see how that influences things.

Remove constructor from buffer_head

Buffer head management uses a constructor

RE: Regression with SLUB on Netperf and Volanomark

2007-05-03 Thread Christoph Lameter
On Thu, 3 May 2007, Chen, Tim C wrote:
> We are still seeing a 5% regression on TCP streaming with
> slub_min_objects set at 16 and a 10% regression for Volanomark, after
> increasing slub_min_objects to 16 and setting slub_max_order=4 and using
> the 2.6.21-rc7-mm2 kernel. The performance

RE: Regression with SLUB on Netperf and Volanomark

2007-05-03 Thread Chen, Tim C
Christoph Lameter wrote:
> Try to boot with
>
> slub_max_order=4 slub_min_objects=8
>
> If that does not help increase slub_min_objects to 16.
>
We are still seeing a 5% regression on TCP streaming with slub_min_objects set at 16 and a 10% regression for Volanomark, after increasing

Re: Regression with SLUB on Netperf and Volanomark

2007-05-02 Thread Christoph Lameter
On Wed, 2 May 2007, Tim Chen wrote:
> We tested SLUB on a 2 socket Clovertown (Core 2 cpu with 2 cores/socket)
> and a 2 socket Woodcrest (Core2 cpu with 4 cores/socket).

Try to boot with

slub_max_order=4 slub_min_objects=8

If that does not help increase slub_min_objects to 16.

> We found
