Re: [RFC v2] percpu: Add a separate function to merge free areas

2014-12-04 Thread Konstantin Khlebnikov
On Fri, Dec 5, 2014 at 12:20 AM, Christoph Lameter wrote: > On Thu, 4 Dec 2014, Tejun Heo wrote: > >> Docker usage is pretty wide-spread now, making what used to be >> siberia-cold paths hot enough to cause actual scalability issues. >> Besides, we're now using percpu_ref for things like aio and

Re: [RFC v2] percpu: Add a separate function to merge free areas

2014-12-04 Thread Christoph Lameter
On Thu, 4 Dec 2014, Tejun Heo wrote: > Docker usage is pretty wide-spread now, making what used to be > siberia-cold paths hot enough to cause actual scalability issues. > Besides, we're now using percpu_ref for things like aio and cgroup > control structures which can be created and destroyed

Re: [RFC v2] percpu: Add a separate function to merge free areas

2014-12-04 Thread Tejun Heo
On Thu, Dec 04, 2014 at 03:15:27PM -0600, Christoph Lameter wrote: > On Thu, 4 Dec 2014, Al Viro wrote: > > > ... except that somebody has not known that and took refcounts on e.g. > > vfsmounts into percpu. With massive amounts of hilarity once docker folks > > started to test the workloads

Re: [RFC v2] percpu: Add a separate function to merge free areas

2014-12-04 Thread Christoph Lameter
On Thu, 4 Dec 2014, Al Viro wrote: > ... except that somebody has not known that and took refcounts on e.g. > vfsmounts into percpu. With massive amounts of hilarity once docker folks > started to test the workloads that created/destroyed those in large amounts. Well, vfsmounts being a

Re: [RFC v2] percpu: Add a separate function to merge free areas

2014-12-04 Thread Al Viro
On Thu, Dec 04, 2014 at 02:28:10PM -0600, Christoph Lameter wrote: > On Thu, 4 Dec 2014, Leonard Crestez wrote: > > > Yes, we are actually experiencing issues with this. We create lots of > > virtual > > net_devices and routes, which means lots of percpu counters/pointers. In > > particular > >

Re: [RFC v2] percpu: Add a separate function to merge free areas

2014-12-04 Thread Tejun Heo
Hello, Christoph. On Thu, Dec 04, 2014 at 02:28:10PM -0600, Christoph Lameter wrote: > Well this is not a common use case and that is not what the per cpu > allocator was designed for. There is bound to be significant fragmentation > with the current design. The design was for rare allocations

Re: [RFC v2] percpu: Add a separate function to merge free areas

2014-12-04 Thread Leonard Crestez
On 12/04/2014 07:57 PM, Tejun Heo wrote: > Hello, > > On Wed, Dec 03, 2014 at 12:33:59AM +0200, Leonard Crestez wrote: >> It seems that free_percpu performance is very bad when working with small >> objects. The easiest way to reproduce this is to allocate and then free a >> large >> number of

Re: [RFC v2] percpu: Add a separate function to merge free areas

2014-12-04 Thread Tejun Heo
On Thu, Dec 04, 2014 at 10:10:18PM +0200, Leonard Crestez wrote: > Yes, we are actually experiencing issues with this. We create lots of virtual > net_devices and routes, which means lots of percpu counters/pointers. In > particular > we are getting worse performance than in older kernels because

Re: [RFC v2] percpu: Add a separate function to merge free areas

2014-12-04 Thread Christoph Lameter
On Thu, 4 Dec 2014, Leonard Crestez wrote: > Yes, we are actually experiencing issues with this. We create lots of virtual > net_devices and routes, which means lots of percpu counters/pointers. In > particular > we are getting worse performance than in older kernels because the net_device >

Re: [RFC v2] percpu: Add a separate function to merge free areas

2014-12-04 Thread Tejun Heo
Hello, On Wed, Dec 03, 2014 at 12:33:59AM +0200, Leonard Crestez wrote: > It seems that free_percpu performance is very bad when working with small > objects. The easiest way to reproduce this is to allocate and then free a > large > number of percpu int counters in order. Small objects

Re: [RFC v2] percpu: Add a separate function to merge free areas

2014-12-04 Thread Tejun Heo
Hello, On Wed, Dec 03, 2014 at 12:33:59AM +0200, Leonard Crestez wrote: It seems that free_percpu performance is very bad when working with small objects. The easiest way to reproduce this is to allocate and then free a large number of percpu int counters in order. Small objects (reference

Re: [RFC v2] percpu: Add a separate function to merge free areas

2014-12-04 Thread Christoph Lameter
On Thu, 4 Dec 2014, Leonard Crestez wrote: Yes, we are actually experiencing issues with this. We create lots of virtual net_devices and routes, which means lots of percpu counters/pointers. In particular we are getting worse performance than in older kernels because the net_device refcnt

Re: [RFC v2] percpu: Add a separate function to merge free areas

2014-12-04 Thread Tejun Heo
On Thu, Dec 04, 2014 at 10:10:18PM +0200, Leonard Crestez wrote: Yes, we are actually experiencing issues with this. We create lots of virtual net_devices and routes, which means lots of percpu counters/pointers. In particular we are getting worse performance than in older kernels because the

Re: [RFC v2] percpu: Add a separate function to merge free areas

2014-12-04 Thread Leonard Crestez
On 12/04/2014 07:57 PM, Tejun Heo wrote: Hello, On Wed, Dec 03, 2014 at 12:33:59AM +0200, Leonard Crestez wrote: It seems that free_percpu performance is very bad when working with small objects. The easiest way to reproduce this is to allocate and then free a large number of percpu int

Re: [RFC v2] percpu: Add a separate function to merge free areas

2014-12-04 Thread Tejun Heo
Hello, Christoph. On Thu, Dec 04, 2014 at 02:28:10PM -0600, Christoph Lameter wrote: Well this is not a common use case and that is not what the per cpu allocator was designed for. There is bound to be significant fragmentation with the current design. The design was for rare allocations when

Re: [RFC v2] percpu: Add a separate function to merge free areas

2014-12-04 Thread Al Viro
On Thu, Dec 04, 2014 at 02:28:10PM -0600, Christoph Lameter wrote: On Thu, 4 Dec 2014, Leonard Crestez wrote: Yes, we are actually experiencing issues with this. We create lots of virtual net_devices and routes, which means lots of percpu counters/pointers. In particular we are

Re: [RFC v2] percpu: Add a separate function to merge free areas

2014-12-04 Thread Christoph Lameter
On Thu, 4 Dec 2014, Al Viro wrote: ... except that somebody has not known that and took refcounts on e.g. vfsmounts into percpu. With massive amounts of hilarity once docker folks started to test the workloads that created/destroyed those in large amounts. Well, vfsmounts being a performance

Re: [RFC v2] percpu: Add a separate function to merge free areas

2014-12-04 Thread Tejun Heo
On Thu, Dec 04, 2014 at 03:15:27PM -0600, Christoph Lameter wrote: On Thu, 4 Dec 2014, Al Viro wrote: ... except that somebody has not known that and took refcounts on e.g. vfsmounts into percpu. With massive amounts of hilarity once docker folks started to test the workloads that

Re: [RFC v2] percpu: Add a separate function to merge free areas

2014-12-04 Thread Christoph Lameter
On Thu, 4 Dec 2014, Tejun Heo wrote: Docker usage is pretty wide-spread now, making what used to be siberia-cold paths hot enough to cause actual scalability issues. Besides, we're now using percpu_ref for things like aio and cgroup control structures which can be created and destroyed quite

Re: [RFC v2] percpu: Add a separate function to merge free areas

2014-12-04 Thread Konstantin Khlebnikov
On Fri, Dec 5, 2014 at 12:20 AM, Christoph Lameter c...@linux.com wrote: On Thu, 4 Dec 2014, Tejun Heo wrote: Docker usage is pretty wide-spread now, making what used to be siberia-cold paths hot enough to cause actual scalability issues. Besides, we're now using percpu_ref for things like

[RFC v2] percpu: Add a separate function to merge free areas

2014-12-02 Thread Leonard Crestez
Hello, It seems that free_percpu performance is very bad when working with small objects. The easiest way to reproduce this is to allocate and then free a large number of percpu int counters in order. Small objects (reference counters and pointers) are common users of alloc_percpu and I think
