Re: Some interesting observations when trying to optimize vmstat handling

2007-11-09 Thread Andi Kleen
On Friday 09 November 2007 01:19, Jeremy Fitzhardinge wrote:
> Andi Kleen wrote:
> > The only problem is that there might be some code that relies on
> > restore_flags() restoring other flags than IF, but at least for
> > interrupts and local_irq_save/restore it should be fine to change.
>
> I don't think

Re: Some interesting observations when trying to optimize vmstat handling

2007-11-08 Thread David Miller
From: Christoph Lameter <[EMAIL PROTECTED]>
Date: Thu, 8 Nov 2007 11:58:58 -0800 (PST)

> The problem with cmpxchg_local here is that the differential has to
> be read before we execute the cmpxchg_local. So the cacheline is
> acquired first in read mode and then made exclusive on executing the
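The read-then-exchange pattern described in the quoted text can be sketched in user space as follows. This is only an illustration under stated assumptions: it uses the C11 `atomic_compare_exchange_weak_explicit` call, which is atomic across CPUs, as a stand-in for the kernel's cmpxchg_local, which is only atomic with respect to the local CPU. The names `sim_counter`, `sim_mod_state` and `SIM_STAT_THRESHOLD` are hypothetical and do not come from vmstat.c.

```c
#include <stdatomic.h>

/* Hypothetical fold threshold, chosen only for this sketch. */
#define SIM_STAT_THRESHOLD 32

/* Stand-in for one per-CPU differential plus the global counter. */
struct sim_counter {
    _Atomic int diff;     /* small per-CPU differential */
    _Atomic long global;  /* global counter the differential folds into */
};

/*
 * Add delta to the differential without disabling interrupts.
 * The differential is read first (a plain load), and only then does
 * the compare-and-exchange try to install the new value. This is the
 * "read, then upgrade to exclusive" sequence the quoted text refers to.
 */
void sim_mod_state(struct sim_counter *c, int delta)
{
    int old = atomic_load_explicit(&c->diff, memory_order_relaxed);
    int new_diff;
    long overflow;

    do {
        new_diff = old + delta;
        overflow = 0;
        if (new_diff > SIM_STAT_THRESHOLD || new_diff < -SIM_STAT_THRESHOLD) {
            /* Fold the whole differential into the global counter. */
            overflow = new_diff;
            new_diff = 0;
        }
        /* On failure, old is reloaded with the current value and we retry. */
    } while (!atomic_compare_exchange_weak_explicit(&c->diff, &old, new_diff,
                                                    memory_order_relaxed,
                                                    memory_order_relaxed));

    if (overflow)
        atomic_fetch_add_explicit(&c->global, overflow, memory_order_relaxed);
}
```

The initial load is the step that brings the cacheline in for reading before the exchange needs it exclusively. A plain locked add would avoid the separate read, but could not apply the threshold check in the same step.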

Re: Some interesting observations when trying to optimize vmstat handling

2007-11-08 Thread Jeremy Fitzhardinge
Andi Kleen wrote:
> The only problem is that there might be some code that relies on
> restore_flags() restoring other flags than IF, but at least for interrupts
> and local_irq_save/restore it should be fine to change.

I don't think so. We don't bother to save/restore the other flags in Xen

Re: Some interesting observations when trying to optimize vmstat handling

2007-11-08 Thread Christoph Lameter
On Fri, 9 Nov 2007, Andi Kleen wrote:
> > There is an interrupt enable overhead of 48 cycles that would be good to
> > be able to eliminate (Kernel code usually moves counter increments into
> > a neighboring interrupt disable section so that __ function can be used).
>
> Replace the push flags ; popf

Re: Some interesting observations when trying to optimize vmstat handling

2007-11-08 Thread Andi Kleen
> There is an interrupt enable overhead of 48 cycles that would be good to
> be able to eliminate (Kernel code usually moves counter increments into
> a neighboring interrupt disable section so that __ function can be used).

Replace the push flags ; popf with test $IFMASK,flags ; jz 1f; sti ; 1:
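The suggested sequence (test the saved IF bit, skip if clear, otherwise sti) can be modelled in plain C to show how it differs from a full popf. This is a user-space simulation, not kernel code: `sim_cpu`, `sim_irq_save`, `sim_irq_restore_popf` and `sim_irq_restore_cond` are hypothetical names, and the flags register is an ordinary integer. The only hardware fact assumed is that IF is bit 9 (0x200) of the x86 flags register.

```c
#include <stdint.h>

#define SIM_EFLAGS_IF 0x200u  /* interrupt enable flag, bit 9 on x86 */

/* Simulated CPU: a flags word plus a count of executed "sti"s. */
struct sim_cpu {
    uint32_t eflags;
    unsigned sti_count;
};

/* Model of local_irq_save(): return the old flags, then clear IF. */
uint32_t sim_irq_save(struct sim_cpu *cpu)
{
    uint32_t flags = cpu->eflags;
    cpu->eflags &= ~SIM_EFLAGS_IF;
    return flags;
}

/* Model of "push flags ; popf": every flag bit is overwritten. */
void sim_irq_restore_popf(struct sim_cpu *cpu, uint32_t flags)
{
    cpu->eflags = flags;
}

/*
 * Model of "test $IFMASK,flags ; jz 1f ; sti ; 1:".
 * Only IF is touched, and only when the saved flags had it set.
 */
void sim_irq_restore_cond(struct sim_cpu *cpu, uint32_t flags)
{
    if (flags & SIM_EFLAGS_IF) {
        cpu->eflags |= SIM_EFLAGS_IF;
        cpu->sti_count++;
    }
}
```

With nested save/restore pairs, the inner restore does nothing because the saved flags already had IF clear, and only the outermost restore executes the simulated sti. Any other flag bit changed inside the section survives the conditional restore but would be overwritten by the popf variant, which is the behavioural difference raised in the replies.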

Some interesting observations when trying to optimize vmstat handling

2007-11-08 Thread Christoph Lameter
I looked into getting rid of the interrupt enable/disable when updating vm statistics in vmstat.c. The SLUB removal of the interrupt enable/disable doubled the performance of the fast path so maybe we can do the same to vm statistics. Measurements were done on an 8p SMP system (dual quad core
