Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-11-12 Thread David Miller
From: Eric Dumazet <[EMAIL PROTECTED]> Date: Mon, 12 Nov 2007 21:18:17 +0100 > Christoph Lameter wrote: > > On Mon, 12 Nov 2007, Eric Dumazet wrote: > >> For example, I do think using a per cpu memory storage on net_device > >> refcnt & > >> last_rx could give us some speedups. > > > > Note

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-11-12 Thread David Miller
From: Eric Dumazet <[EMAIL PROTECTED]> Date: Mon, 12 Nov 2007 21:14:47 +0100 > I don't think this is a problem. Cpus numbers and ram size are related, even > if > Moore didn't predict it; > > Nobody wants to ship/build a 4096 cpus machine with 256 MB of ram inside. > Or call it a GPU and don't

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-11-12 Thread David Miller
From: Herbert Xu <[EMAIL PROTECTED]> Date: Mon, 12 Nov 2007 18:52:35 +0800 > David Miller <[EMAIL PROTECTED]> wrote: > > > > Each IP compression tunnel instance does an alloc_percpu(). > > Actually all IPComp tunnels share one set of objects which are > allocated per-cpu. So only the first

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-11-12 Thread Eric Dumazet
Christoph Lameter wrote: On Mon, 12 Nov 2007, Eric Dumazet wrote: For example, I do think using a per cpu memory storage on net_device refcnt & last_rx could give us some speedups. Note that there was a new patchset posted (titled cpu alloc v1) that provides on demand extension of the cpu

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-11-12 Thread Eric Dumazet
Luck, Tony wrote: Ahh so the need to be able to expand per cpu memory storage on demand is not as critical as we thought. Yes, but still desirable for future optimizations. For example, I do think using a per cpu memory storage on net_device refcnt & last_rx could give us some speedups.

RE: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-11-12 Thread Luck, Tony
> > Ahh so the need to be able to expand per cpu memory storage on demand > > is not as critical as we thought. > > > > Yes, but still desirable for future optimizations. > > For example, I do think using a per cpu memory storage on net_device refcnt & > last_rx could give us some speedups. We

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-11-12 Thread Christoph Lameter
On Mon, 12 Nov 2007, Eric Dumazet wrote: > Christoph Lameter wrote: > > On Mon, 12 Nov 2007, Herbert Xu wrote: > > > > > David Miller <[EMAIL PROTECTED]> wrote: > > > > Each IP compression tunnel instance does an alloc_percpu(). > > > Actually all IPComp tunnels share one set of objects which

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-11-12 Thread Eric Dumazet
Christoph Lameter wrote: On Mon, 12 Nov 2007, Herbert Xu wrote: David Miller <[EMAIL PROTECTED]> wrote: Each IP compression tunnel instance does an alloc_percpu(). Actually all IPComp tunnels share one set of objects which are allocated per-cpu. So only the first tunnel would do that.

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-11-12 Thread Christoph Lameter
On Mon, 12 Nov 2007, Herbert Xu wrote: > David Miller <[EMAIL PROTECTED]> wrote: > > > > Each IP compression tunnel instance does an alloc_percpu(). > > Actually all IPComp tunnels share one set of objects which are > allocated per-cpu. So only the first tunnel would do that. Ahh so the need

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-11-12 Thread Herbert Xu
David Miller <[EMAIL PROTECTED]> wrote: > > Each IP compression tunnel instance does an alloc_percpu(). Actually all IPComp tunnels share one set of objects which are allocated per-cpu. So only the first tunnel would do that. In fact that was precisely the reason why per-cpu is used in IPComp

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-11-12 Thread Herbert Xu
David Miller [EMAIL PROTECTED] wrote: Each IP compression tunnel instance does an alloc_percpu(). Actually all IPComp tunnels share one set of objects which are allocated per-cpu. So only the first tunnel would do that. In fact that was precisely the reason why per-cpu is used in IPComp as

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-11-12 Thread Christoph Lameter
On Mon, 12 Nov 2007, Herbert Xu wrote: David Miller [EMAIL PROTECTED] wrote: Each IP compression tunnel instance does an alloc_percpu(). Actually all IPComp tunnels share one set of objects which are allocated per-cpu. So only the first tunnel would do that. Ahh so the need to be able

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-11-12 Thread Eric Dumazet
Christoph Lameter wrote: On Mon, 12 Nov 2007, Herbert Xu wrote: David Miller [EMAIL PROTECTED] wrote: Each IP compression tunnel instance does an alloc_percpu(). Actually all IPComp tunnels share one set of objects which are allocated per-cpu. So only the first tunnel would do that.

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-11-12 Thread Christoph Lameter
On Mon, 12 Nov 2007, Eric Dumazet wrote: Christoph Lameter wrote: On Mon, 12 Nov 2007, Herbert Xu wrote: David Miller [EMAIL PROTECTED] wrote: Each IP compression tunnel instance does an alloc_percpu(). Actually all IPComp tunnels share one set of objects which are allocated

RE: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-11-12 Thread Luck, Tony
Ahh so the need to be able to expand per cpu memory storage on demand is not as critical as we thought. Yes, but still desirable for future optimizations. For example, I do think using a per cpu memory storage on net_device refcnt last_rx could give us some speedups. We do want to

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-11-12 Thread Eric Dumazet
Luck, Tony wrote: Ahh so the need to be able to expand per cpu memory storage on demand is not as critical as we thought. Yes, but still desirable for future optimizations. For example, I do think using a per cpu memory storage on net_device refcnt last_rx could give us some speedups.

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-11-12 Thread Eric Dumazet
Christoph Lameter wrote: On Mon, 12 Nov 2007, Eric Dumazet wrote: For example, I do think using a per cpu memory storage on net_device refcnt last_rx could give us some speedups. Note that there was a new patchset posted (titled cpu alloc v1) that provides on demand extension of the cpu

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-11-12 Thread David Miller
From: Herbert Xu [EMAIL PROTECTED] Date: Mon, 12 Nov 2007 18:52:35 +0800 David Miller [EMAIL PROTECTED] wrote: Each IP compression tunnel instance does an alloc_percpu(). Actually all IPComp tunnels share one set of objects which are allocated per-cpu. So only the first tunnel would do

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-11-12 Thread David Miller
From: Eric Dumazet [EMAIL PROTECTED] Date: Mon, 12 Nov 2007 21:14:47 +0100 I don't think this is a problem. Cpus numbers and ram size are related, even if Moore didn't predict it; Nobody wants to ship/build a 4096 cpus machine with 256 MB of ram inside. Or call it a GPU and don't expect

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-11-12 Thread David Miller
From: Eric Dumazet [EMAIL PROTECTED] Date: Mon, 12 Nov 2007 21:18:17 +0100 Christoph Lameter wrote: On Mon, 12 Nov 2007, Eric Dumazet wrote: For example, I do think using a per cpu memory storage on net_device refcnt last_rx could give us some speedups. Note that there was a

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-11-02 Thread Christoph Lameter
On Fri, 2 Nov 2007, Peter Zijlstra wrote: > On Fri, 2007-11-02 at 07:35 -0700, Christoph Lameter wrote: > > > Well I wonder if I should introduce it not as a replacement but as an > > alternative to allocpercpu? We can then gradually switch over. The > > existing API does not allow the

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-11-02 Thread Peter Zijlstra
On Fri, 2007-11-02 at 07:35 -0700, Christoph Lameter wrote: > Well I wonder if I should introduce it not as a replacement but as an > alternative to allocpercpu? We can then gradually switch over. The > existing API does not allow the specification of gfp_masks or alignments. I've thought

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-11-02 Thread Christoph Lameter
On Fri, 2 Nov 2007, Peter Zijlstra wrote: > On Thu, 2007-11-01 at 15:58 -0700, David Miller wrote: > > > Since you're the one who wants to change the semantics and guarantees > > of this interface, perhaps it might help if you did some greps around > > the tree to see how alloc_percpu() is

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-11-02 Thread Peter Zijlstra
On Thu, 2007-11-01 at 15:58 -0700, David Miller wrote: > Since you're the one who wants to change the semantics and guarantees > of this interface, perhaps it might help if you did some greps around > the tree to see how alloc_percpu() is actually used. That's what > I did when I started running

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-11-02 Thread Christoph Lameter
On Fri, 2 Nov 2007, Peter Zijlstra wrote: On Fri, 2007-11-02 at 07:35 -0700, Christoph Lameter wrote: Well I wonder if I should introduce it not as a replacement but as an alternative to allocpercpu? We can then gradually switch over. The existing API does not allow the specification

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-11-02 Thread Christoph Lameter
On Fri, 2 Nov 2007, Peter Zijlstra wrote: On Thu, 2007-11-01 at 15:58 -0700, David Miller wrote: Since you're the one who wants to change the semantics and guarantees of this interface, perhaps it might help if you did some greps around the tree to see how alloc_percpu() is actually

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-11-02 Thread Peter Zijlstra
On Thu, 2007-11-01 at 15:58 -0700, David Miller wrote: Since you're the one who wants to change the semantics and guarantees of this interface, perhaps it might help if you did some greps around the tree to see how alloc_percpu() is actually used. That's what I did when I started running

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-11-02 Thread Peter Zijlstra
On Fri, 2007-11-02 at 07:35 -0700, Christoph Lameter wrote: Well I wonder if I should introduce it not as a replacement but as an alternative to allocpercpu? We can then gradually switch over. The existing API does not allow the specification of gfp_masks or alignments. I've thought about

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-11-01 Thread David Miller
From: Christoph Lameter <[EMAIL PROTECTED]> Date: Thu, 1 Nov 2007 18:06:17 -0700 (PDT) > A reasonable implementation for 64 bit is likely going to depend on > reserving some virtual memory space for the per cpu mappings so that they > can be dynamically grown up to what the reserved virtual

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-11-01 Thread Christoph Lameter
Hmmm... On x86_64 we could take 8 terabyte virtual space (bit order 43) With the worst case scenario of 16k of cpus (bit order 16) we are looking at 43-16 = 27 ~ 128MB per cpu. Each percpu can at max be mapped by 64 pmd entries. 4k support is actually max for projected hw. So we'd get to 512M.
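
The address-space arithmetic in the message above can be checked with a few lines of C. This is only a sketch of the division being described, not code from the patch set; the function name `percpu_window_bytes` and its parameters are hypothetical. It assumes the reserved region and the CPU count are both powers of two, as the "bit order" wording suggests.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes of virtual address space available to each CPU when a region of
 * 2^space_order bytes is split evenly among 2^cpu_order CPUs.
 * (Hypothetical helper; both names are illustrative only.)
 */
static uint64_t percpu_window_bytes(unsigned space_order, unsigned cpu_order)
{
    assert(space_order < 64 && cpu_order <= space_order);
    return UINT64_C(1) << (space_order - cpu_order);
}
```

With the figures from the post, `percpu_window_bytes(43, 16)` is 2^27 bytes, i.e. 128 MiB per CPU out of an 8 TiB (2^43 byte) region. Note that order 16 corresponds to 65536 CPUs; an order of 14 (16384 CPUs) would give 2^29 bytes, i.e. 512 MiB.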

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-11-01 Thread Christoph Lameter
On Thu, 1 Nov 2007, David Miller wrote: > You cannot put limits of the amount of alloc_percpu() memory available > to clients, please let's proceed with that basic understanding in > mind. We're wasting a ton of time discussing this fundamental issue. There is no point in making absolute

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-11-01 Thread Christoph Lameter
On Fri, 2 Nov 2007, Eric Dumazet wrote: > > Na. Some reasonable upper limit needs to be set. If we set that to say > > 32Megabytes and do the virtual mapping then we can just populate the first > > 2M and only allocate the remainder if we need it. Then we need to rely on > > Mel's defrag stuff
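
The "reserve a fixed virtual range, populate only what is needed" idea quoted above has a rough userspace analogue, sketched here with `mmap()` and `mprotect()`. This is an illustration under stated assumptions, not the kernel mechanism under discussion: the kernel would manage its own page tables rather than call these functions, and `struct percpu_window`, `window_init`, and `window_grow` are hypothetical names. The 32 MB and 2 MB figures are taken from the message.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#define RESERVE_BYTES (32UL << 20) /* upper limit: 32 MB of address space */
#define INITIAL_BYTES (2UL << 20)  /* made usable up front: first 2 MB */

struct percpu_window {
    unsigned char *base; /* start of the reserved range */
    size_t reserved;     /* bytes of address space reserved */
    size_t populated;    /* bytes currently usable, always page-aligned */
};

/* Make the window usable up to new_size bytes; fails beyond the reservation. */
static int window_grow(struct percpu_window *w, size_t new_size)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);

    if (new_size > w->reserved)
        return -1;
    new_size = (new_size + page - 1) & ~(page - 1); /* round up to a page */
    if (new_size <= w->populated)
        return 0;
    if (mprotect(w->base + w->populated, new_size - w->populated,
                 PROT_READ | PROT_WRITE) != 0)
        return -1;
    w->populated = new_size;
    return 0;
}

/* Reserve the whole range without access rights, then open the first part. */
static int window_init(struct percpu_window *w)
{
    void *p = mmap(NULL, RESERVE_BYTES, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);

    if (p == MAP_FAILED)
        return -1;
    w->base = p;
    w->reserved = RESERVE_BYTES;
    w->populated = 0;
    return window_grow(w, INITIAL_BYTES);
}
```

Addresses inside the window stay stable as it grows, which is the property the thread cares about; the cost is that the upper limit has to be chosen when the range is reserved.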

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-11-01 Thread Eric Dumazet
Christoph Lameter wrote: On Thu, 1 Nov 2007, David Miller wrote: From: Christoph Lameter <[EMAIL PROTECTED]> Date: Thu, 1 Nov 2007 15:15:39 -0700 (PDT) After boot is complete we allow the reduction of the size of the per cpu areas. Let's say we only need 128k per cpu. Then the remaining

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-11-01 Thread David Miller
From: Christoph Lameter <[EMAIL PROTECTED]> Date: Thu, 1 Nov 2007 15:48:00 -0700 (PDT) > On Thu, 1 Nov 2007, David Miller wrote: > > > From: Christoph Lameter <[EMAIL PROTECTED]> > > Date: Thu, 1 Nov 2007 15:15:39 -0700 (PDT) > > > > > After boot is complete we allow the reduction of the size

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-11-01 Thread Christoph Lameter
On Thu, 1 Nov 2007, David Miller wrote: > From: Christoph Lameter <[EMAIL PROTECTED]> > Date: Thu, 1 Nov 2007 15:15:39 -0700 (PDT) > > > After boot is complete we allow the reduction of the size of the per cpu > > areas. Let's say we only need 128k per cpu. Then the remaining pages will > > be

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-11-01 Thread David Miller
From: Christoph Lameter <[EMAIL PROTECTED]> Date: Thu, 1 Nov 2007 15:15:39 -0700 (PDT) > After boot is complete we allow the reduction of the size of the per cpu > areas. Let's say we only need 128k per cpu. Then the remaining pages will > be returned to the page allocator. You don't know how

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-11-01 Thread Christoph Lameter
On Thu, 1 Nov 2007, David Miller wrote: > From: Christoph Lameter <[EMAIL PROTECTED]> > Date: Thu, 1 Nov 2007 15:11:41 -0700 (PDT) > > > On Thu, 1 Nov 2007, David Miller wrote: > > > > > The remaining issue with accessing per-cpu areas at multiple virtual > > > addresses is D-cache aliasing. >

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-11-01 Thread Christoph Lameter
On Thu, 1 Nov 2007, David Miller wrote: > From: Christoph Lameter <[EMAIL PROTECTED]> > Date: Thu, 1 Nov 2007 06:03:44 -0700 (PDT) > > > In order to make it truly dynamic we would have to virtually map the > > area. vmap? But that reduces performance. > > But it would still be faster than the

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-11-01 Thread David Miller
From: Christoph Lameter <[EMAIL PROTECTED]> Date: Thu, 1 Nov 2007 15:11:41 -0700 (PDT) > On Thu, 1 Nov 2007, David Miller wrote: > > > The remaining issue with accessing per-cpu areas at multiple virtual > > addresses is D-cache aliasing. > > But that is not an issue for physically mapped

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-11-01 Thread Christoph Lameter
On Thu, 1 Nov 2007, David Miller wrote: > The remaining issue with accessing per-cpu areas at multiple virtual > addresses is D-cache aliasing. But that is not an issue for physically mapped caches.

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-11-01 Thread David Miller
From: Christoph Lameter <[EMAIL PROTECTED]> Date: Thu, 1 Nov 2007 06:03:44 -0700 (PDT) > In order to make it truly dynamic we would have to virtually map the > area. vmap? But that reduces performance. But it would still be faster than the double-indirection we do now, right?

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-11-01 Thread David Miller
From: Christoph Lameter <[EMAIL PROTECTED]> Date: Thu, 1 Nov 2007 05:57:12 -0700 (PDT) > That is basically what IA64 is doing but it not usable because you would > have addresses that mean different things on different cpus. List head > for example require back pointers. If you put a listhead

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-11-01 Thread David Miller
From: Christoph Lameter <[EMAIL PROTECTED]> Date: Thu, 1 Nov 2007 06:01:14 -0700 (PDT) > On Thu, 1 Nov 2007, David Miller wrote: > > > IA64 seems to use it universally for every __get_cpu_var() > > access, so maybe it works out somehow :-))) > > IA64 does not do that. It adds the local cpu

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-11-01 Thread Christoph Lameter
On Thu, 1 Nov 2007, David Miller wrote: > > This hunk helped the sparc64 looping OOPS I was getting, but cpus hang > > in some other fashion soon afterwards. > > And if I bump PER_CPU_ALLOC_SIZE up to 128K it seems to mostly work. Good > You'll definitely need to make this work dynamically

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-11-01 Thread Christoph Lameter
On Thu, 1 Nov 2007, David Miller wrote: > IA64 seems to use it universally for every __get_cpu_var() > access, so maybe it works out somehow :-))) IA64 does not do that. It adds the local cpu offset #define __get_cpu_var(var) (*RELOC_HIDE(&per_cpu__##var,

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-11-01 Thread Christoph Lameter
On Thu, 1 Nov 2007, Eric Dumazet wrote: > I think this question already came in the past and Linus already answered it, > but I again ask it. What about VM games with modern cpus (64 bits arches) > > Say we reserve on x86_64 a really huge (2^32 bytes) area, and change VM layout > so that each

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-11-01 Thread David Miller
From: David Miller <[EMAIL PROTECTED]> Date: Thu, 01 Nov 2007 00:01:18 -0700 (PDT) > From: Christoph Lameter <[EMAIL PROTECTED]> > Date: Wed, 31 Oct 2007 21:16:59 -0700 (PDT) > > > Index: linux-2.6/mm/allocpercpu.c > > === > > ---

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-11-01 Thread David Miller
From: Eric Dumazet <[EMAIL PROTECTED]> Date: Thu, 01 Nov 2007 08:17:58 +0100 > Say we reserve on x86_64 a really huge (2^32 bytes) area, and change > VM layout so that each cpu maps its own per_cpu area on this area, > so that the local per_cpu data sits in the same virtual address on > each cpu.

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-11-01 Thread Eric Dumazet
Christoph Lameter wrote: This patch increases the speed of the SLUB fastpath by improving the per cpu allocator and makes it usable for SLUB. Currently allocpercpu manages arrays of pointers to per cpu objects. This means that it has to allocate the arrays and then populate them as needed
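
The two access schemes contrasted in this thread (an array of pointers to per cpu objects versus a fixed per cpu area addressed by offset) can be sketched in plain userspace C. This is a minimal illustration, not the patch itself: `NR_CPUS` is set to an arbitrary small value, the `ptr_percpu_*` and `off_percpu_*` names are hypothetical, and the 32768-byte area size simply mirrors the 32k limit mentioned elsewhere in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NR_CPUS 4          /* hypothetical CPU count for this sketch */
#define PER_CPU_AREA 32768 /* bytes per CPU; mirrors the 32k limit in the thread */

/* Scheme A: an array of pointers, one separately allocated object per CPU. */
struct ptr_percpu {
    void *ptrs[NR_CPUS];
};

static void ptr_percpu_free(struct ptr_percpu *p)
{
    if (!p)
        return;
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        free(p->ptrs[cpu]); /* free(NULL) is a no-op */
    free(p);
}

static struct ptr_percpu *ptr_percpu_alloc(size_t size)
{
    struct ptr_percpu *p = calloc(1, sizeof(*p));

    if (!p)
        return NULL;
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        p->ptrs[cpu] = calloc(1, size);
        if (!p->ptrs[cpu]) {
            ptr_percpu_free(p);
            return NULL;
        }
    }
    return p;
}

/* The caller must first load the pointer slot, then dereference it. */
static void *ptr_percpu_get(struct ptr_percpu *p, int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    return p->ptrs[cpu];
}

/* Scheme B: one fixed-size area per CPU; an object is just an offset. */
static _Alignas(64) unsigned char cpu_area[NR_CPUS][PER_CPU_AREA];
static size_t cpu_area_used; /* bump-allocator cursor shared by all CPUs */

/* Returns the object's offset, or -1 if the area is full.
 * align must be a power of two. */
static long off_percpu_alloc(size_t size, size_t align)
{
    size_t start = (cpu_area_used + align - 1) & ~(align - 1);

    if (start > PER_CPU_AREA || size > PER_CPU_AREA - start)
        return -1;
    cpu_area_used = start + size;
    return (long)start;
}

/* Access is base-plus-offset arithmetic; no pointer array is consulted. */
static void *off_percpu_get(long off, int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    assert(off >= 0 && off < PER_CPU_AREA);
    return &cpu_area[cpu][off];
}
```

Scheme B avoids the extra pointer load on every access, but the sketch also shows the trade-off debated in the thread: once the fixed area is exhausted, `off_percpu_alloc()` can only fail, whereas scheme A can keep allocating as long as the general allocator has memory.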

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-11-01 Thread David Miller
From: Christoph Lameter <[EMAIL PROTECTED]> Date: Wed, 31 Oct 2007 21:16:59 -0700 (PDT) > Index: linux-2.6/mm/allocpercpu.c > === > --- linux-2.6.orig/mm/allocpercpu.c 2007-10-31 20:53:16.565486654 -0700 > +++

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-11-01 Thread David Miller
From: Christoph Lameter [EMAIL PROTECTED] Date: Wed, 31 Oct 2007 21:16:59 -0700 (PDT) Index: linux-2.6/mm/allocpercpu.c === --- linux-2.6.orig/mm/allocpercpu.c 2007-10-31 20:53:16.565486654 -0700 +++ linux-2.6/mm/allocpercpu.c

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-11-01 Thread David Miller
From: Eric Dumazet [EMAIL PROTECTED] Date: Thu, 01 Nov 2007 08:17:58 +0100 Say we reserve on x86_64 a really huge (2^32 bytes) area, and change VM layout so that each cpu maps its own per_cpu area on this area, so that the local per_cpu data sits in the same virtual address on each cpu. This

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-11-01 Thread David Miller
From: David Miller [EMAIL PROTECTED] Date: Thu, 01 Nov 2007 00:01:18 -0700 (PDT) From: Christoph Lameter [EMAIL PROTECTED] Date: Wed, 31 Oct 2007 21:16:59 -0700 (PDT) Index: linux-2.6/mm/allocpercpu.c === ---

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-11-01 Thread Christoph Lameter
On Thu, 1 Nov 2007, Eric Dumazet wrote: I think this question already came in the past and Linus already answered it, but I again ask it. What about VM games with modern cpus (64 bits arches) Say we reserve on x86_64 a really huge (2^32 bytes) area, and change VM layout so that each cpu

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-11-01 Thread Christoph Lameter
On Thu, 1 Nov 2007, David Miller wrote: IA64 seems to use it universally for every __get_cpu_var() access, so maybe it works out somehow :-))) IA64 does not do that. It adds the local cpu offset #define __get_cpu_var(var) (*RELOC_HIDE(per_cpu__##var,

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-11-01 Thread Christoph Lameter
On Thu, 1 Nov 2007, David Miller wrote: This hunk helped the sparc64 looping OOPS I was getting, but cpus hang in some other fashion soon afterwards. And if I bump PER_CPU_ALLOC_SIZE up to 128K it seems to mostly work. Good You'll definitely need to make this work dynamically

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-11-01 Thread David Miller
From: Christoph Lameter [EMAIL PROTECTED] Date: Thu, 1 Nov 2007 06:01:14 -0700 (PDT) On Thu, 1 Nov 2007, David Miller wrote: IA64 seems to use it universally for every __get_cpu_var() access, so maybe it works out somehow :-))) IA64 does not do that. It adds the local cpu offset

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-11-01 Thread David Miller
From: Christoph Lameter [EMAIL PROTECTED] Date: Thu, 1 Nov 2007 05:57:12 -0700 (PDT) That is basically what IA64 is doing but it not usable because you would have addresses that mean different things on different cpus. List head for example require back pointers. If you put a listhead into

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-11-01 Thread David Miller
From: Christoph Lameter [EMAIL PROTECTED] Date: Thu, 1 Nov 2007 06:03:44 -0700 (PDT) In order to make it truly dynamic we would have to virtually map the area. vmap? But that reduces performance. But it would still be faster than the double-indirection we do now, right?

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-11-01 Thread Christoph Lameter
On Thu, 1 Nov 2007, David Miller wrote: The remaining issue with accessing per-cpu areas at multiple virtual addresses is D-cache aliasing. But that is not an issue for physically mapped caches.

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-11-01 Thread David Miller
From: Christoph Lameter [EMAIL PROTECTED] Date: Thu, 1 Nov 2007 15:11:41 -0700 (PDT) On Thu, 1 Nov 2007, David Miller wrote: The remaining issue with accessing per-cpu areas at multiple virtual addresses is D-cache aliasing. But that is not an issue for physically mapped caches. Right

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-11-01 Thread Christoph Lameter
On Thu, 1 Nov 2007, David Miller wrote: From: Christoph Lameter [EMAIL PROTECTED] Date: Thu, 1 Nov 2007 06:03:44 -0700 (PDT) In order to make it truly dynamic we would have to virtually map the area. vmap? But that reduces performance. But it would still be faster than the

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-11-01 Thread Christoph Lameter
On Thu, 1 Nov 2007, David Miller wrote: From: Christoph Lameter [EMAIL PROTECTED] Date: Thu, 1 Nov 2007 15:11:41 -0700 (PDT) On Thu, 1 Nov 2007, David Miller wrote: The remaining issue with accessing per-cpu areas at multiple virtual addresses is D-cache aliasing. But that is

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-11-01 Thread David Miller
From: Christoph Lameter [EMAIL PROTECTED] Date: Thu, 1 Nov 2007 15:15:39 -0700 (PDT) After boot is complete we allow the reduction of the size of the per cpu areas. Let's say we only need 128k per cpu. Then the remaining pages will be returned to the page allocator. You don't know how much

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-11-01 Thread Christoph Lameter
On Thu, 1 Nov 2007, David Miller wrote: From: Christoph Lameter [EMAIL PROTECTED] Date: Thu, 1 Nov 2007 15:15:39 -0700 (PDT) After boot is complete we allow the reduction of the size of the per cpu areas. Let's say we only need 128k per cpu. Then the remaining pages will be returned to

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-11-01 Thread David Miller
From: Christoph Lameter [EMAIL PROTECTED] Date: Thu, 1 Nov 2007 15:48:00 -0700 (PDT) On Thu, 1 Nov 2007, David Miller wrote: From: Christoph Lameter [EMAIL PROTECTED] Date: Thu, 1 Nov 2007 15:15:39 -0700 (PDT) After boot is complete we allow the reduction of the size of the per cpu

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-11-01 Thread Eric Dumazet
Christoph Lameter wrote: On Thu, 1 Nov 2007, David Miller wrote: From: Christoph Lameter [EMAIL PROTECTED] Date: Thu, 1 Nov 2007 15:15:39 -0700 (PDT) After boot is complete we allow the reduction of the size of the per cpu areas. Let's say we only need 128k per cpu. Then the remaining

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-11-01 Thread Christoph Lameter
On Thu, 1 Nov 2007, David Miller wrote: You cannot put limits of the amount of alloc_percpu() memory available to clients, please let's proceed with that basic understanding in mind. We're wasting a ton of time discussing this fundamental issue. There is no point in making absolute demands

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-11-01 Thread Christoph Lameter
On Fri, 2 Nov 2007, Eric Dumazet wrote: Na. Some reasonable upper limit needs to be set. If we set that to say 32Megabytes and do the virtual mapping then we can just populate the first 2M and only allocate the remainder if we need it. Then we need to rely on Mel's defrag stuff though

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-11-01 Thread David Miller
From: Christoph Lameter [EMAIL PROTECTED] Date: Thu, 1 Nov 2007 18:06:17 -0700 (PDT) A reasonable implementation for 64 bit is likely going to depend on reserving some virtual memory space for the per cpu mappings so that they can be dynamically grown up to what the reserved virtual space

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-10-31 Thread David Miller
From: Christoph Lameter <[EMAIL PROTECTED]> Date: Wed, 31 Oct 2007 21:16:59 -0700 (PDT) > /* > * Maximum allowed per cpu data per cpu > */ > +#ifdef CONFIG_NUMA > +#define PER_CPU_ALLOC_SIZE (32768 + MAX_NUMNODES * 512) > +#else > #define PER_CPU_ALLOC_SIZE 32768 > +#endif > + Christoph,

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-10-31 Thread David Miller
From: Christoph Lameter <[EMAIL PROTECTED]> Date: Wed, 31 Oct 2007 18:21:02 -0700 (PDT) > On Wed, 31 Oct 2007, David Miller wrote: > > > From: Christoph Lameter <[EMAIL PROTECTED]> > > Date: Wed, 31 Oct 2007 18:12:11 -0700 (PDT) > > > > > On Wed, 31 Oct 2007, David Miller wrote: > > > > > > >

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-10-31 Thread Christoph Lameter
H... Got this to run on an ia64 big iron. One problem is the sizing of the pool. Somehow this needs to be dynamic. Apply this fix on top of the others. --- include/asm-ia64/page.h |2 +- include/asm-ia64/percpu.h |9 ++--- mm/allocpercpu.c | 12 ++-- 3

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-10-31 Thread Christoph Lameter
On Wed, 31 Oct 2007, David Miller wrote: > From: Christoph Lameter <[EMAIL PROTECTED]> > Date: Wed, 31 Oct 2007 18:12:11 -0700 (PDT) > > > On Wed, 31 Oct 2007, David Miller wrote: > > > > > All I can do now is bisect and then try to figure out what about the > > > guilty change might cause the

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-10-31 Thread David Miller
From: Christoph Lameter <[EMAIL PROTECTED]> Date: Wed, 31 Oct 2007 18:12:11 -0700 (PDT) > On Wed, 31 Oct 2007, David Miller wrote: > > > All I can do now is bisect and then try to figure out what about the > > guilty change might cause the problem. > > Reverting the 7th patch should avoid using

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-10-31 Thread Christoph Lameter
On Wed, 31 Oct 2007, David Miller wrote: > It crashes when SSHD starts, the serial console GETTY hasn't > started up yet so I can't even log in to run those commands > Christoph. Hmmm... Bad. > All I can do now is bisect and then try to figure out what about the > guilty change might cause the

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-10-31 Thread David Miller
From: Christoph Lameter <[EMAIL PROTECTED]> Date: Wed, 31 Oct 2007 18:01:34 -0700 (PDT) > On Wed, 31 Oct 2007, David Miller wrote: > > > Without DEBUG_VM I get a loop of crashes shortly after SSHD > > is started, I'll try to track it down. > > Check how much per cpu memory is in use by > > cat

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-10-31 Thread Christoph Lameter
On Wed, 31 Oct 2007, David Miller wrote: > Without DEBUG_VM I get a loop of crashes shortly after SSHD > is started, I'll try to track it down. Check how much per cpu memory is in use by cat /proc/vmstat currently we have a 32k limit there.

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-10-31 Thread David Miller
From: Christoph Lameter <[EMAIL PROTECTED]> Date: Wed, 31 Oct 2007 17:53:23 -0700 (PDT) > > This patch fixes build failures with DEBUG_VM disabled. > > Well there is more there. Last minute mods sigh. With DEBUG_VM you likely > need this patch. Without DEBUG_VM I get a loop of crashes shortly

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-10-31 Thread Christoph Lameter
> This patch fixes build failures with DEBUG_VM disabled. Well there is more there. Last minute mods sigh. With DEBUG_VM you likely need this patch. --- include/linux/percpu.h |4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) Index: linux-2.6/include/linux/percpu.h

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-10-31 Thread David Miller
From: Christoph Lameter <[EMAIL PROTECTED]> Date: Wed, 31 Oct 2007 17:31:12 -0700 (PDT) > Others may have the same issue. > > git pull git://git.kernel.org/pub/scm/linux/kernel/git/christoph/slab.git > allocpercpu > > should get you the whole thing. This patch fixes build failures with

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-10-31 Thread Christoph Lameter
On Wed, 31 Oct 2007, David Miller wrote: > > git pull git://git.kernel.org/pub/scm/linux/kernel/git/christoph/slab.git > > performance > > > > and then you should be able to apply these patches. > > Thanks a lot Chrisoph. Others may have the same issue. git pull

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-10-31 Thread David Miller
From: Christoph Lameter <[EMAIL PROTECTED]> Date: Wed, 31 Oct 2007 17:26:16 -0700 (PDT) > Do > > git pull git://git.kernel.org/pub/scm/linux/kernel/git/christoph/slab.git > performance > > and then you should be able to apply these patches. Thanks a lot Chrisoph.

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-10-31 Thread Christoph Lameter
On Wed, 31 Oct 2007, David Miller wrote: > > Are these patches against -mm or mainline? > > I get a lot of rejects starting with patch 6 against > mainline and I really wanted to test them out on sparc64. Hmmm... They are against the current slab performance head (which is in mm but it has

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-10-31 Thread David Miller
Are these patches against -mm or mainline? I get a lot of rejects starting with patch 6 against mainline and I really wanted to test them out on sparc64. Thanks.

[patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-10-31 Thread Christoph Lameter
This patch increases the speed of the SLUB fastpath by improving the per cpu allocator and makes it usable for SLUB. Currently allocpercpu manages arrays of pointers to per cpu objects. This means that it has to allocate the arrays and then populate them as needed with objects. Although these
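
For readers skimming the archive, here is a rough userspace sketch of the two layouts this thread is about. Layout A is the "array of pointers to per cpu objects" scheme described in the snippet above. Layout B is only an assumed alternative (a fixed area per CPU, with an object identified by its offset); the snippet is cut off before it describes the patch's actual design, so do not read B as the patch's implementation. All names here (ptr_array, cpu_area, area_alloc, NR_CPUS, AREA_SIZE, ALIGN_TO) are made up for illustration and are not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NR_CPUS   4      /* hypothetical CPU count for this sketch */
#define AREA_SIZE 1024   /* hypothetical size of each per-CPU area */
#define ALIGN_TO  16     /* allocation granularity inside an area */

/* Layout A: an array of pointers, one separately allocated object per CPU. */
struct ptr_array {
    void *obj[NR_CPUS];
};

static struct ptr_array *ptr_array_alloc(size_t size)
{
    struct ptr_array *pa = calloc(1, sizeof(*pa));
    int cpu;

    if (!pa)
        return NULL;
    /* The array must be populated with one object per CPU. */
    for (cpu = 0; cpu < NR_CPUS; cpu++) {
        pa->obj[cpu] = calloc(1, size);
        if (!pa->obj[cpu]) {
            while (cpu-- > 0)
                free(pa->obj[cpu]);
            free(pa);
            return NULL;
        }
    }
    return pa;
}

static void ptr_array_free(struct ptr_array *pa)
{
    int cpu;

    if (!pa)
        return;
    for (cpu = 0; cpu < NR_CPUS; cpu++)
        free(pa->obj[cpu]);
    free(pa);
}

/* Access needs two dependent loads: the pointer, then the object. */
static void *ptr_array_get(struct ptr_array *pa, int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    return pa->obj[cpu];
}

/* Layout B (assumed): one fixed area per CPU; an object is named by offset. */
static _Alignas(ALIGN_TO) unsigned char cpu_area[NR_CPUS][AREA_SIZE];
static size_t area_used;   /* bytes handed out so far, same in every area */

/* Returns the offset of the new object, or -1 when the area is full. */
static long area_alloc(size_t size)
{
    size_t rounded;
    long off;

    if (size > AREA_SIZE)
        return -1;
    rounded = (size + ALIGN_TO - 1) & ~(size_t)(ALIGN_TO - 1);
    if (rounded > AREA_SIZE - area_used)
        return -1;
    off = (long)area_used;
    area_used += rounded;
    return off;
}

/* Access is plain address arithmetic: area base plus offset. */
static void *area_get(long off, int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    assert(off >= 0 && (size_t)off < AREA_SIZE);
    return &cpu_area[cpu][off];
}
```

The trade-off the sketch is meant to show: Layout A can grow without bound but pays an extra pointer load and a separate allocation per CPU, while Layout B makes access cheap at the cost of a fixed per-CPU budget that can run out.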

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-10-31 Thread David Miller
Are these patches against -mm or mainline? I get a lot of rejects starting with patch 6 against mainline and I really wanted to test them out on sparc64. Thanks.

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-10-31 Thread Christoph Lameter
On Wed, 31 Oct 2007, David Miller wrote: Are these patches against -mm or mainline? I get a lot of rejects starting with patch 6 against mainline and I really wanted to test them out on sparc64. Hmmm... They are against the current slab performance head (which is in mm but it has not

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-10-31 Thread David Miller
From: Christoph Lameter [EMAIL PROTECTED] Date: Wed, 31 Oct 2007 17:26:16 -0700 (PDT) Do git pull git://git.kernel.org/pub/scm/linux/kernel/git/christoph/slab.git performance and then you should be able to apply these patches. Thanks a lot Chrisoph.

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-10-31 Thread Christoph Lameter
On Wed, 31 Oct 2007, David Miller wrote: git pull git://git.kernel.org/pub/scm/linux/kernel/git/christoph/slab.git performance and then you should be able to apply these patches. Thanks a lot Chrisoph. Others may have the same issue. git pull

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-10-31 Thread David Miller
From: Christoph Lameter [EMAIL PROTECTED] Date: Wed, 31 Oct 2007 17:31:12 -0700 (PDT) Others may have the same issue. git pull git://git.kernel.org/pub/scm/linux/kernel/git/christoph/slab.git allocpercpu should get you the whole thing. This patch fixes build failures with DEBUG_VM

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-10-31 Thread Christoph Lameter
This patch fixes build failures with DEBUG_VM disabled. Well there is more there. Last minute mods sigh. With DEBUG_VM you likely need this patch. --- include/linux/percpu.h |4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) Index: linux-2.6/include/linux/percpu.h

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-10-31 Thread David Miller
From: Christoph Lameter [EMAIL PROTECTED] Date: Wed, 31 Oct 2007 17:53:23 -0700 (PDT) This patch fixes build failures with DEBUG_VM disabled. Well there is more there. Last minute mods sigh. With DEBUG_VM you likely need this patch. Without DEBUG_VM I get a loop of crashes shortly after

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-10-31 Thread David Miller
From: Christoph Lameter [EMAIL PROTECTED] Date: Wed, 31 Oct 2007 18:01:34 -0700 (PDT) On Wed, 31 Oct 2007, David Miller wrote: Without DEBUG_VM I get a loop of crashes shortly after SSHD is started, I'll try to track it down. Check how much per cpu memory is in use by cat

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-10-31 Thread David Miller
From: Christoph Lameter [EMAIL PROTECTED] Date: Wed, 31 Oct 2007 18:12:11 -0700 (PDT) On Wed, 31 Oct 2007, David Miller wrote: All I can do now is bisect and then try to figure out what about the guilty change might cause the problem. Reverting the 7th patch should avoid using the sparc

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-10-31 Thread Christoph Lameter
On Wed, 31 Oct 2007, David Miller wrote: It crashes when SSHD starts, the serial console GETTY hasn't started up yet so I can't even log in to run those commands Christoph. Hmmm... Bad. All I can do now is bisect and then try to figure out what about the guilty change might cause the

Re: [patch 0/7] [RFC] SLUB: Improve allocpercpu to reduce per cpu access overhead

2007-10-31 Thread Christoph Lameter
On Wed, 31 Oct 2007, David Miller wrote: Without DEBUG_VM I get a loop of crashes shortly after SSHD is started, I'll try to track it down. Check how much per cpu memory is in use by cat /proc/vmstat currently we have a 32k limit there.
