Is the performance improvement acceptable, or are there more comments on this patch?
Thanks
Ling
2016-04-05 11:44 GMT+08:00 Ling Ma :
> Hi Longman,
>
>> with some modest increase in performance. That can be hard to justify. Maybe
>> you should find other use cases that involve
Hi Longman,
> with some modest increase in performance. That can be hard to justify. Maybe
> you should find other use cases that involve fewer changes, but still have
> noticeable performance improvement. That will make it easier to get accepted.
The attachment is for another use case with the new
> I have 2 major comments here. First of all, you should break up your patch
> into smaller ones. A large patch like the one in the tar ball is hard to
> review.
Ok, we will do it.
> Secondly, you are modifying over 1000 lines of code in mm/slab.c
> with some modest increase in performance. That
On 02/02/2016 11:40 PM, Ling Ma wrote:
Longman,
The attachment includes user-space code (thread.c) and a kernel
patch (ali_work_queue.patch) based on 4.3.0-rc4;
we replaced all the original spinlocks (list_lock) in slab.h/c with the
new mechanism.
The thread.c in user space caused lots of hot kernel
The attachment (thread.c) shows that the new mechanism improves the output
of the user-space code (thread.c) by 1.14x (1174810406/1026910602);
kernel spinlock consumption is reduced from 25% to 15%, as below:
     ORG         NEW
38186815    43644156
38340186    43121265
38383155    44087753
38567102    43532586
Longman,
The attachment includes user-space code (thread.c) and a kernel
patch (ali_work_queue.patch) based on 4.3.0-rc4;
we replaced all the original spinlocks (list_lock) in slab.h/c with the
new mechanism.
The thread.c in user space caused lots of hot kernel spinlock contention from
__kmalloc and kfree;
perf
On Wed, 6 Jan 2016 09:21:06 +0100
Peter Zijlstra wrote:
> On Wed, Jan 06, 2016 at 09:16:43AM +0100, Peter Zijlstra wrote:
> > On Tue, Jan 05, 2016 at 09:42:27PM +, One Thousand Gnomes wrote:
> > > > It suffers the typical problems all those constructs do; namely it
> > > > wrecks
On Wed, Jan 06, 2016 at 09:16:43AM +0100, Peter Zijlstra wrote:
> On Tue, Jan 05, 2016 at 09:42:27PM +, One Thousand Gnomes wrote:
> > > It suffers the typical problems all those constructs do; namely it
> > > wrecks accountability.
> >
> > That's "government thinking" ;-) - for most real
On Tue, Jan 05, 2016 at 09:42:27PM +, One Thousand Gnomes wrote:
> > It suffers the typical problems all those constructs do; namely it
> > wrecks accountability.
>
> That's "government thinking" ;-) - for most real users throughput is
> more important than accountability. With the right API
> It suffers the typical problems all those constructs do; namely it
> wrecks accountability.
That's "government thinking" ;-) - for most real users throughput is
more important than accountability. With the right API it ought to also
be compile time switchable.
> But here that is compounded by
On Thu, Dec 31, 2015 at 04:09:34PM +0800, ling.ma.prog...@gmail.com wrote:
> +void alispinlock(struct ali_spinlock *lock, struct ali_spinlock_info *ali)
> +{
> +	struct ali_spinlock_info *next, *old;
> +
> +	ali->next = NULL;
> +	ali->locked = 1;
> +	old = xchg(&lock->lock_p, ali);
> +
>
On 12/31/2015 03:09 AM, ling.ma.prog...@gmail.com wrote:
From: Ma Ling
Hi ALL,
Wire latency (RC delay) dominates modern computer performance;
conventional serialized work causes serious cache-line ping-pong,
and the process spends a lot of time and power to complete,
especially on multi-core
From: Ma Ling
Hi ALL,
Wire latency (RC delay) dominates modern computer performance;
conventional serialized work causes serious cache-line ping-pong,
and the process spends a lot of time and power to complete,
especially on multi-core platforms.
However, if the serialized work is sent to one core