Re: [RFC PATCH] alispinlock: acceleration from lock integration on multi-core platform

2016-04-11 Thread Ling Ma
Is it acceptable for the performance improvement, or are there more comments on this patch? Thanks, Ling 2016-04-05 11:44 GMT+08:00 Ling Ma : > Hi Longman, > >> with some modest increase in performance. That can be hard to justify. Maybe >> you should find other use cases that involve less changes, but still

Re: [RFC PATCH] alispinlock: acceleration from lock integration on multi-core platform

2016-04-04 Thread Ling Ma
Hi Longman, > with some modest increase in performance. That can be hard to justify. Maybe > you should find other use cases that involve less changes, but still have > noticeable performance improvement. That will make it easier to be accepted. The attachment is for another use case with the new

Re: [RFC PATCH] alispinlock: acceleration from lock integration on multi-core platform

2016-02-03 Thread Ling Ma
> I have 2 major comments here. First of all, you should break up your patch > into smaller ones. Large patch like the one in the tar ball is hard to > review. Ok, we will do it. > Secondly, you are modifying over 1000 lines of code in mm/slab.c > with some modest increase in performance. That

Re: [RFC PATCH] alispinlock: acceleration from lock integration on multi-core platform

2016-02-03 Thread Waiman Long
On 02/02/2016 11:40 PM, Ling Ma wrote: Longman, The attachment includes user space code (thread.c) and a kernel patch (ali_work_queue.patch) based on 4.3.0-rc4; we replaced all of the original spinlock (list_lock) uses in slab.h/c with the new mechanism. The thread.c in user space caused lots of hot kernel

Re: [RFC PATCH] alispinlock: acceleration from lock integration on multi-core platform

2016-02-02 Thread Ling Ma
The attachment (thread.c) shows that the new mechanism improves the output of the user space code (thread.c) by 1.14x (1174810406/1026910602; kernel spinlock consumption is reduced from 25% to 15%), as below:

ORG       NEW
38186815  43644156
38340186  43121265
38383155  44087753
38567102  43532586

Re: [RFC PATCH] alispinlock: acceleration from lock integration on multi-core platform

2016-02-02 Thread Ling Ma
Longman, The attachment includes user space code (thread.c) and a kernel patch (ali_work_queue.patch) based on 4.3.0-rc4; we replaced all of the original spinlock (list_lock) uses in slab.h/c with the new mechanism. The thread.c in user space caused lots of hot kernel spinlock contention from __kmalloc and kfree, perf
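
thread.c itself is only attached to the message and is not reproduced in this listing. Purely as a hypothetical illustration of the kind of load being described (many user-space threads driving the kernel through the __kmalloc()/kfree() hot paths until the slab list_lock dominates a perf profile), a stress program of roughly the following shape would do; the choice of socket()/close() as the allocation trigger and the thread and iteration counts are assumptions, not details of the actual thread.c.

/*
 * Hypothetical stand-in for the thread.c workload described above: each
 * thread repeatedly creates and destroys a kernel object (here a socket),
 * which allocates and frees slab memory inside the kernel and therefore
 * contends on the slab list_lock.  Build with: gcc -O2 -pthread
 */
#include <pthread.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

#define NTHREADS 32
#define ITERS    1000000L

static void *worker(void *arg)
{
    long done = 0;

    (void)arg;
    for (long i = 0; i < ITERS; i++) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);   /* kmalloc path */

        if (fd >= 0) {
            close(fd);                              /* kfree path */
            done++;
        }
    }
    return (void *)done;
}

int main(void)
{
    pthread_t tid[NTHREADS];
    long total = 0;

    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&tid[i], NULL, worker, NULL);
    for (int i = 0; i < NTHREADS; i++) {
        void *ret;

        pthread_join(tid[i], &ret);
        total += (long)ret;
    }
    printf("completed %ld create/destroy cycles\n", total);
    return 0;
}

With enough cores the per-node list_lock in mm/slab.c becomes the hottest spinlock under this kind of load, which is the contention the ali_work_queue.patch replaces with the queued-work mechanism.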

Re: [RFC PATCH] alispinlock: acceleration from lock integration on multi-core platform

2016-01-06 Thread One Thousand Gnomes
On Wed, 6 Jan 2016 09:21:06 +0100 Peter Zijlstra wrote: > On Wed, Jan 06, 2016 at 09:16:43AM +0100, Peter Zijlstra wrote: > > On Tue, Jan 05, 2016 at 09:42:27PM +0000, One Thousand Gnomes wrote: > > > > It suffers the typical problems all those constructs do; namely it > > > > wrecks

Re: [RFC PATCH] alispinlock: acceleration from lock integration on multi-core platform

2016-01-06 Thread Peter Zijlstra
On Wed, Jan 06, 2016 at 09:16:43AM +0100, Peter Zijlstra wrote: > On Tue, Jan 05, 2016 at 09:42:27PM +0000, One Thousand Gnomes wrote: > > > It suffers the typical problems all those constructs do; namely it > > > wrecks accountability. > > > > That's "government thinking" ;-) - for most real

Re: [RFC PATCH] alispinlock: acceleration from lock integration on multi-core platform

2016-01-06 Thread Peter Zijlstra
On Tue, Jan 05, 2016 at 09:42:27PM +0000, One Thousand Gnomes wrote: > > It suffers the typical problems all those constructs do; namely it > > wrecks accountability. > > That's "government thinking" ;-) - for most real users throughput is > more important than accountability. With the right API

Re: [RFC PATCH] alispinlock: acceleration from lock integration on multi-core platform

2016-01-05 Thread One Thousand Gnomes
> It suffers the typical problems all those constructs do; namely it > wrecks accountability. That's "government thinking" ;-) - for most real users throughput is more important than accountability. With the right API it ought to also be compile time switchable. > But here that is compounded by

Re: [RFC PATCH] alispinlock: acceleration from lock integration on multi-core platform

2016-01-05 Thread Peter Zijlstra
On Thu, Dec 31, 2015 at 04:09:34PM +0800, ling.ma.prog...@gmail.com wrote: > +void alispinlock(struct ali_spinlock *lock, struct ali_spinlock_info *ali) > +{ > + struct ali_spinlock_info *next, *old; > + > + ali->next = NULL; > + ali->locked = 1; > + old = xchg(&lock->lock_p, ali); > +
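
The quoted hunk stops mid-function. For readers following the discussion, below is a compilable user-space sketch of the enqueue-and-combine pattern the patch is built around: each contender publishes its request with an atomic exchange on the lock's list head, the thread that finds the list empty becomes the server and executes the queued requests on behalf of everyone else, and the other threads simply spin on their own locked flag. Everything beyond what is visible in the quoted snippet (the fn/arg fields, the server loop, the GCC __atomic builtins standing in for xchg()/cmpxchg()) is an assumption for illustration, not the patch's actual code.

/*
 * Sketch of a combining ("lock integration") lock in user space, assuming
 * request nodes carry a callback.  Only the enqueue mirrors the quoted
 * alispinlock() hunk; the rest is illustrative.
 */
#include <stddef.h>

struct ali_info {
    struct ali_info *next;
    int locked;                  /* 1 while waiting for the server */
    void (*fn)(void *);          /* critical-section work (assumed field) */
    void *arg;
};

struct ali_lock {
    struct ali_info *head;       /* plays the role of lock->lock_p */
};

static void ali_execute(struct ali_lock *lock, struct ali_info *me)
{
    struct ali_info *prev, *node, *next;

    me->next = NULL;
    me->locked = 1;
    /* Publish the request; xchg() in the kernel patch. */
    prev = __atomic_exchange_n(&lock->head, me, __ATOMIC_ACQ_REL);
    if (prev) {
        /* Queue was not empty: link in behind the previous tail and
         * wait for the current server to run fn on our behalf. */
        __atomic_store_n(&prev->next, me, __ATOMIC_RELEASE);
        while (__atomic_load_n(&me->locked, __ATOMIC_ACQUIRE))
            ;                    /* cpu_relax() in kernel code */
        return;
    }
    /* Queue was empty: we are the server.  Run our own work, then drain
     * whatever other threads queue behind us. */
    node = me;
    for (;;) {
        node->fn(node->arg);
        next = __atomic_load_n(&node->next, __ATOMIC_ACQUIRE);
        if (!next) {
            struct ali_info *expected = node;

            /* If the head still points at us, the queue is empty and
             * we can retire as server. */
            if (__atomic_compare_exchange_n(&lock->head, &expected, NULL, 0,
                                            __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE)) {
                if (node != me)
                    __atomic_store_n(&node->locked, 0, __ATOMIC_RELEASE);
                return;
            }
            /* A new request won the xchg; wait for it to link in. */
            while (!(next = __atomic_load_n(&node->next, __ATOMIC_ACQUIRE)))
                ;
        }
        if (node != me)
            __atomic_store_n(&node->locked, 0, __ATOMIC_RELEASE);
        node = next;
    }
}

A caller fills in an on-stack struct ali_info with the work to run under the lock and calls ali_execute(); whichever thread owns the queue at that moment executes it, so the protected data stays hot in one core's cache. The sketch also makes the objection quoted above concrete: the work runs in whichever thread happens to be serving, so, presumably, the "wrecks accountability" complaint is that CPU time gets charged to the server rather than to the thread that asked for the work.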

Re: [RFC PATCH] alispinlock: acceleration from lock integration on multi-core platform

2016-01-05 Thread Waiman Long
On 12/31/2015 03:09 AM, ling.ma.prog...@gmail.com wrote: From: Ma Ling Hi ALL, Wire latency (RC delay) dominates modern computer performance; conventional serialized work causes serious cache-line ping-pong, and the process spends a lot of time and power to complete, especially on multi-core

[RFC PATCH] alispinlock: acceleration from lock integration on multi-core platform

2015-12-31 Thread ling . ma . program
From: Ma Ling Hi ALL, Wire latency (RC delay) dominates modern computer performance; conventional serialized work causes serious cache-line ping-pong, and the process spends a lot of time and power to complete, especially on multi-core platforms. However, if the serialized work is sent to one core
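
To make the idea of sending the serialized work to one core concrete, here is a hypothetical caller-side fragment in the style of the user-space sketch shown after Peter Zijlstra's reply above (it reuses that sketch's ali_lock/ali_info/ali_execute names, which are themselves assumptions): the critical-section work is packaged as a small request and executed by whichever thread currently owns the queue, instead of every thread taking a conventional spinlock and dragging the shared data's cache lines across the machine.

/*
 * Hypothetical usage of the combining-lock sketch above: pushing onto a
 * shared list.  Compare with the conventional pattern
 *     spin_lock(&lock); n->next = head; head = n; spin_unlock(&lock);
 * where every contending CPU pulls the lock and the list head into its
 * own cache.
 */
struct node {
    struct node *next;
    long val;
};

static struct ali_lock list_lock;   /* from the sketch above */
static struct node *list_head;

struct push_req {
    struct ali_info info;           /* request header for ali_execute() */
    struct node *n;
};

/* Runs serialized, always on the current queue owner's CPU, so list_head
 * and the surrounding nodes stay hot in one cache. */
static void do_push(void *arg)
{
    struct push_req *r = arg;

    r->n->next = list_head;
    list_head = r->n;
}

static void push(struct node *n)
{
    struct push_req r = { .n = n };

    r.info.fn = do_push;
    r.info.arg = &r;
    ali_execute(&list_lock, &r.info);
}

In the slab conversion discussed earlier in the thread, the requests would correspond to the short list_lock-protected sections in mm/slab.c, which is where the reported 1.14x throughput gain and the drop in spinlock time from 25% to 15% come from.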
