Re: [PATCH] x86: Voluntary leave_mm before entering ACPI C3
On Thursday 20 December 2007 11:16, H. Peter Anvin wrote:
> Arjan van de Ven wrote:
> > On Wed, 19 Dec 2007 11:48:14 -0800
> > "H. Peter Anvin" <[EMAIL PROTECTED]> wrote:
> >
> >> I think C3 guarantees that the cache contents stay intact, and thus
> >> it might make sense in some technology to preserve the TLB as well
> >> (being a kind of cache.)
> >
> > that sounds nice. It's fiction though ;-)
> >
> > The thing to realize is that linux only sees "ACPI C3";
> > the BIOS maps that C3 to..
> > well any of the C states the processor in the system has.
> > What you're saying is afaik correct for the *hardware* C3, not for the "C3"
> > that Linux sees..
>
> Well, it can only map ACPI C3 to a state which is no more "dead" than
> what would normally be permitted by C3.  IIRC, C3 is allowed to require
> that DMA be turned off (unlike C2), but is not allowed to lose the CPU
> state.

Re: mapping HW to ACPI C-states.

Right, it is fair game for the BIOS to map a "shallower"
hardware C-state to a "deeper" ACPI C-state.

Re: CPU state

All C-states preserve the CPU SW programming state.
(eg. while it may be saved and restored in HW,
it appears to SW to be always intact).

Re: C3 guarantees that the cache contents stay intact

This is both true and false, depending on how you use the word "intact".

If "intact" == "stays valid in cache", then no, this is not guaranteed.
The HW reserves the right to flush some or all of the L1 and L2 caches
whenever it wants to -- this includes both HW and ACPI C2 and C3 states.

If "intact" == "cache consistent", then yes, this guarantee is true.
The way the guarantee is implemented varies by generation.
Older systems would lock the bus in C3 to assure the processor
was woken up for DMA to snoop.  Newer hardware simply wakes
the cache to snoop without waking the cores, or if it flushes
the caches then it doesn't have to snoop at all -- which
also counts as "cache consistent":-)

cheers,
-Len
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: [PATCH] x86: Voluntary leave_mm before entering ACPI C3
Arjan van de Ven wrote:
> On Thu, 20 Dec 2007 08:16:54 -0800
> "H. Peter Anvin" <[EMAIL PROTECTED]> wrote:
>
>> Arjan van de Ven wrote:
>>> On Wed, 19 Dec 2007 11:48:14 -0800
>>> "H. Peter Anvin" <[EMAIL PROTECTED]> wrote:
>>>
>>>> I think C3 guarantees that the cache contents stay intact, and thus
>>>> it might make sense in some technology to preserve the TLB as well
>>>> (being a kind of cache.)
>>>
>>> that sounds nice. It's fiction though ;-)
>>>
>>> The thing to realize is that linux only sees "ACPI C3"; the BIOS
>>> maps that C3 to.. well any of the C states the processor in the
>>> system has. What you're saying is afaik correct for the *hardware*
>>> C3, not for the "C3" that Linux sees..
>>
>> Well, it can only map ACPI C3 to a state which is no more "dead" than
>> what would normally be permitted by C3.  IIRC, C3 is allowed to
>> require that DMA be turned off (unlike C2), but is not allowed to
>> lose the CPU state.
>
> state isn't lost if the tlb or the caches are flushed...
> (properly, eg all pending writebacks are written back first etc)

Oh, right.  My bad.  Of course C3 doesn't guarantee cache retention,
only cache coherency.

	-hpa
Re: [PATCH] x86: Voluntary leave_mm before entering ACPI C3
On Thu, 20 Dec 2007 08:16:54 -0800
"H. Peter Anvin" <[EMAIL PROTECTED]> wrote:

> Arjan van de Ven wrote:
> > On Wed, 19 Dec 2007 11:48:14 -0800
> > "H. Peter Anvin" <[EMAIL PROTECTED]> wrote:
> >
> >> I think C3 guarantees that the cache contents stay intact, and thus
> >> it might make sense in some technology to preserve the TLB as well
> >> (being a kind of cache.)
> >
> > that sounds nice. It's fiction though ;-)
> >
> > The thing to realize is that linux only sees "ACPI C3"; the BIOS
> > maps that C3 to.. well any of the C states the processor in the
> > system has. What you're saying is afaik correct for the *hardware*
> > C3, not for the "C3" that Linux sees..
> >
>
> Well, it can only map ACPI C3 to a state which is no more "dead" than
> what would normally be permitted by C3.  IIRC, C3 is allowed to
> require that DMA be turned off (unlike C2), but is not allowed to
> lose the CPU state.

state isn't lost if the tlb or the caches are flushed...
(properly, eg all pending writebacks are written back first etc)

--
If you want to reach me at my work email, use [EMAIL PROTECTED]
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
Re: [PATCH] x86: Voluntary leave_mm before entering ACPI C3
Arjan van de Ven wrote:
> On Wed, 19 Dec 2007 11:48:14 -0800
> "H. Peter Anvin" <[EMAIL PROTECTED]> wrote:
>
>> I think C3 guarantees that the cache contents stay intact, and thus
>> it might make sense in some technology to preserve the TLB as well
>> (being a kind of cache.)
>
> that sounds nice. It's fiction though ;-)
>
> The thing to realize is that linux only sees "ACPI C3"; the BIOS
> maps that C3 to.. well any of the C states the processor in the
> system has. What you're saying is afaik correct for the *hardware*
> C3, not for the "C3" that Linux sees..

Well, it can only map ACPI C3 to a state which is no more "dead" than
what would normally be permitted by C3.  IIRC, C3 is allowed to require
that DMA be turned off (unlike C2), but is not allowed to lose the CPU
state.

	-hpa
Re: [PATCH] x86: Voluntary leave_mm before entering ACPI C3
* Venki Pallipadi <[EMAIL PROTECTED]> wrote:

> > Btw., since the TLB flush state machine is really subtle and
> > fragile, could you try to run the following mmap stresstest i wrote
> > some time ago:
> >
> >    http://redhat.com/~mingo/threaded-mmap-stresstest/
> >
> > for a couple of hours. It runs nr_cpus threads which then do a
> > "random crazy mix" of mappings/unmappings/remappings of a 800 MB
> > memory window. The more sockets/cores, the crazier the TLB races get
> > ;-)
> >
>
> Ingo,
>
> I ran this stress test on two systems (8 cores and 2 cores) for over 4
> hours without any issues. There was more than 20% C3 time during the
> run. So, this C3 tlbflush path must have been stressed well during the
> run.

ok, great.

Regarding power consumption: i suspect a real difference will only show
up on multi-socket systems that can do deeper C modes, or on multicore
systems that will benefit from longer idle time on another core. (i
suspect most multicore CPUs today will only truly save significant
amounts of power if all cores are idle.) In any case, not doing these
extra IPIs is definitely a plus.

	Ingo
Re: [PATCH] x86: Voluntary leave_mm before entering ACPI C3
On Wed, 19 Dec 2007 11:48:14 -0800
"H. Peter Anvin" <[EMAIL PROTECTED]> wrote:

>
> I think C3 guarantees that the cache contents stay intact, and thus
> it might make sense in some technology to preserve the TLB as well
> (being a kind of cache.)

that sounds nice. It's fiction though ;-)

The thing to realize is that linux only sees "ACPI C3"; the BIOS maps
that C3 to.. well any of the C states the processor in the system has.
What you're saying is afaik correct for the *hardware* C3, not for the
"C3" that Linux sees..
Re: [PATCH] x86: Voluntary leave_mm before entering ACPI C3
On Wed, Dec 19, 2007 at 08:32:55PM +0100, Ingo Molnar wrote:
>
> * Venki Pallipadi <[EMAIL PROTECTED]> wrote:
>
> > Avoid TLB flush IPIs during C3 states by voluntary leave_mm() before
> > entering C3.
> >
> > The performance impact of TLB flush on C3 should not be significant
> > with respect to C3 wakeup latency. Also, CPUs tend to flush TLB in
> > hardware while in C3 anyways.
> >
> > On a 8 logical CPU system, running make -j2, the number of tlbflush
> > IPIs goes down from 40 per second to ~ 0. Total number of interrupts
> > during the run of this workload was ~1200 per second, which makes it
> > ~3% savings in wakeups.
> >
> > There was no measurable performance or power impact however.
>
> thanks, applied to x86.git. Nice and elegant patch!
>
> Btw., since the TLB flush state machine is really subtle and fragile,
> could you try to run the following mmap stresstest i wrote some time
> ago:
>
>    http://redhat.com/~mingo/threaded-mmap-stresstest/
>
> for a couple of hours. It runs nr_cpus threads which then do a "random
> crazy mix" of mappings/unmappings/remappings of a 800 MB memory window.
> The more sockets/cores, the crazier the TLB races get ;-)
>

Ingo,

I ran this stress test on two systems (8 cores and 2 cores) for over 4 hours
without any issues. There was more than 20% C3 time during the run. So, this
C3 tlbflush path must have been stressed well during the run.

And sorry about the patch not working on UP config. That was a silly oversight
on my part.

Thanks,
Venki
Re: [PATCH] x86: Voluntary leave_mm before entering ACPI C3
FYI, your patch needed the fix below for !SMP. It works fine otherwise.

	Ingo

---
 include/asm-x86/mmu.h |    6 ++++++
 1 file changed, 6 insertions(+)

Index: linux/include/asm-x86/mmu.h
===================================================================
--- linux.orig/include/asm-x86/mmu.h
+++ linux/include/asm-x86/mmu.h
@@ -20,6 +20,12 @@ typedef struct {
 	void *vdso;
 } mm_context_t;
 
+#ifdef CONFIG_SMP
 void leave_mm(int cpu);
+#else
+static inline void leave_mm(int cpu)
+{
+}
+#endif
 
 #endif /* _ASM_X86_MMU_H */
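The fix above uses a common header pattern: declare the real function when the feature is configured in, and provide an empty static inline stub otherwise, so callers need no #ifdef of their own. A minimal user-space sketch of that pattern follows. The names DEMO_SMP, demo_leave_mm and demo_enter_c3 are made up for illustration and are not kernel symbols.

```c
#include <assert.h>

/*
 * DEMO_SMP is a hypothetical stand-in for CONFIG_SMP. Leave it undefined
 * to get the uniprocessor (UP) variant shown here.
 */
#ifdef DEMO_SMP
void demo_leave_mm(int cpu);        /* defined elsewhere, in SMP-only code */
#else
static inline void demo_leave_mm(int cpu)
{
    (void)cpu;                      /* nothing to do on a UP build */
}
#endif

/*
 * The caller looks the same in both configurations. On UP the stub
 * compiles away to nothing.
 */
static inline int demo_enter_c3(int cpu)
{
    assert(cpu >= 0);
    demo_leave_mm(cpu);
    return cpu;
}
```

Without the stub, a UP build would still reference the function from the idle path but would have no definition to link against, which is presumably the failure the fix addresses.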
Re: [PATCH] x86: Voluntary leave_mm before entering ACPI C3
On Wed, Dec 19, 2007 at 11:48:14AM -0800, H. Peter Anvin wrote:
> Ingo Molnar wrote:
> >
> >i dont think it's required for C3 to even turn off any portion of the
> >CPU - if an interrupt arrives after the C3 sequence is initiated but
> >just before dirty cachelines have been flushed then the CPU can just
> >return without touching anything (such as the TLB) - right? So i dont
> >think there's any implicit guarantee of TLB flushing (nor should there
> >be), but in practice, a good C3 sequence would (statistically) turn off
> >large portions of the CPU and hence the TLB as well.
> >
>
> I think C3 guarantees that the cache contents stay intact, and thus it
> might make sense in some technology to preserve the TLB as well (being a
> kind of cache.)
>
> Otherwise, what you say here of course is absolutely correct.
>

C3 does not guarantee all cache contents. In fact, at least on Intel, L1 will
almost always be flushed. Newer, more power-efficient CPUs do dynamic cache
sizing [1].

C3 just guarantees that the caches are coherent. That is, if they are intact,
then DMA will keep the cache consistent.

Thanks,
Venki

[1] - http://download.intel.com/products/processor/core2duo/mobile_prod_brief.pdf
Re: [PATCH] x86: Voluntary leave_mm before entering ACPI C3
On Wed, Dec 19, 2007 at 08:40:32PM +0100, Ingo Molnar wrote:
>
> * H. Peter Anvin <[EMAIL PROTECTED]> wrote:
>
> > Ingo Molnar wrote:
> >> * Venki Pallipadi <[EMAIL PROTECTED]> wrote:
> >>
> >>> Avoid TLB flush IPIs during C3 states by voluntary leave_mm() before
> >>> entering C3.
> >>>
> >>> The performance impact of TLB flush on C3 should not be significant with
> >>> respect to C3 wakeup latency. Also, CPUs tend to flush TLB in hardware
> >>> while in C3 anyways.
> >>>
> >
> > Are there any CPUs around which *don't* flush the TLB across C3?  (I
> > guess it's not guaranteed by the spec, though, and as TLBs grow larger
> > there might be incentive to keep them online.)
>
> i dont think it's required for C3 to even turn off any portion of the
> CPU - if an interrupt arrives after the C3 sequence is initiated but
> just before dirty cachelines have been flushed then the CPU can just
> return without touching anything (such as the TLB) - right? So i dont
> think there's any implicit guarantee of TLB flushing (nor should there
> be), but in practice, a good C3 sequence would (statistically) turn off
> large portions of the CPU and hence the TLB as well.
>

Yes. There are cases where hardware/BIOS can do C-state changes behind the
OS, e.g. staying in C1 for a while and then going to C2/C3 after a while.
In such cases, there will be times when TLBs are not really flushed in
hardware. But ideally, if C3 results in deep idle, TLBs would be turned off.
And in cases where we wake up earlier than expected, C-state policy should
identify that and choose a lower C-state next time around.

I also tried one variation of this, wherein I only do the flush if more than
one CPU is sharing the mm. But that did not help the test case I was using
(which is probably the worst case). What I would see is:

	Process runs on CPU x and mm is not shared
	Goes idle (C3) waiting on something
	Wakes up on CPU y which will now start sharing mm and would send
	flush IPI anyway

Thanks,
Venki
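The scenario Venki lists can be traced with a toy bitmask model. This is only a sketch under stated assumptions: toy_cpumask_t stands in for mm->cpu_vm_mask, and every toy_* name is hypothetical, not a kernel interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for mm->cpu_vm_mask: bit n set means CPU n is using this mm. */
typedef uint64_t toy_cpumask_t;

static inline toy_cpumask_t toy_cpu_bit(int cpu)
{
    assert(cpu >= 0 && cpu < 64);
    return UINT64_C(1) << cpu;
}

/* Number of CPUs currently set in the mask. */
static inline int toy_cpus_weight(toy_cpumask_t mask)
{
    int n = 0;

    for (; mask; mask &= mask - 1)  /* clear the lowest set bit each pass */
        n++;
    return n;
}

/* The variation: leave the mm before C3 only if another CPU shares it. */
static inline bool toy_variant_leaves_mm(toy_cpumask_t mask)
{
    return toy_cpus_weight(mask) > 1;
}

/* CPUs that a TLB flush issued from `self` would have to interrupt. */
static inline toy_cpumask_t toy_flush_targets(toy_cpumask_t mask, int self)
{
    return mask & ~toy_cpu_bit(self);
}
```

With the process alone on CPU x, the mask has weight 1, so the variation skips leave_mm() and CPU x stays in the mask while it sits in C3. Once the process wakes on CPU y, a flush issued from y still targets x, so the IPI is sent anyway. That matches the sequence described above.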
Re: [PATCH] x86: Voluntary leave_mm before entering ACPI C3
Ingo Molnar wrote:
>
> i dont think it's required for C3 to even turn off any portion of the
> CPU - if an interrupt arrives after the C3 sequence is initiated but
> just before dirty cachelines have been flushed then the CPU can just
> return without touching anything (such as the TLB) - right? So i dont
> think there's any implicit guarantee of TLB flushing (nor should there
> be), but in practice, a good C3 sequence would (statistically) turn off
> large portions of the CPU and hence the TLB as well.
>

I think C3 guarantees that the cache contents stay intact, and thus it
might make sense in some technology to preserve the TLB as well (being a
kind of cache.)

Otherwise, what you say here of course is absolutely correct.

	-hpa
Re: [PATCH] x86: Voluntary leave_mm before entering ACPI C3
* H. Peter Anvin <[EMAIL PROTECTED]> wrote:

> Ingo Molnar wrote:
>> * Venki Pallipadi <[EMAIL PROTECTED]> wrote:
>>
>>> Avoid TLB flush IPIs during C3 states by voluntary leave_mm() before
>>> entering C3.
>>>
>>> The performance impact of TLB flush on C3 should not be significant with
>>> respect to C3 wakeup latency. Also, CPUs tend to flush TLB in hardware
>>> while in C3 anyways.
>>>
>
> Are there any CPUs around which *don't* flush the TLB across C3?  (I
> guess it's not guaranteed by the spec, though, and as TLBs grow larger
> there might be incentive to keep them online.)

i dont think it's required for C3 to even turn off any portion of the
CPU - if an interrupt arrives after the C3 sequence is initiated but
just before dirty cachelines have been flushed then the CPU can just
return without touching anything (such as the TLB) - right? So i dont
think there's any implicit guarantee of TLB flushing (nor should there
be), but in practice, a good C3 sequence would (statistically) turn off
large portions of the CPU and hence the TLB as well.

	Ingo
Re: [PATCH] x86: Voluntary leave_mm before entering ACPI C3
Ingo Molnar wrote:
> * Venki Pallipadi <[EMAIL PROTECTED]> wrote:
>
>> Avoid TLB flush IPIs during C3 states by voluntary leave_mm() before
>> entering C3.
>>
>> The performance impact of TLB flush on C3 should not be significant with
>> respect to C3 wakeup latency. Also, CPUs tend to flush TLB in hardware
>> while in C3 anyways.
>>

Are there any CPUs around which *don't* flush the TLB across C3?  (I
guess it's not guaranteed by the spec, though, and as TLBs grow larger
there might be incentive to keep them online.)

	-hpa
Re: [PATCH] x86: Voluntary leave_mm before entering ACPI C3
* Venki Pallipadi <[EMAIL PROTECTED]> wrote:

> Avoid TLB flush IPIs during C3 states by voluntary leave_mm() before
> entering C3.
>
> The performance impact of TLB flush on C3 should not be significant
> with respect to C3 wakeup latency. Also, CPUs tend to flush TLB in
> hardware while in C3 anyways.
>
> On a 8 logical CPU system, running make -j2, the number of tlbflush
> IPIs goes down from 40 per second to ~ 0. Total number of interrupts
> during the run of this workload was ~1200 per second, which makes it
> ~3% savings in wakeups.
>
> There was no measurable performance or power impact however.

thanks, applied to x86.git. Nice and elegant patch!

Btw., since the TLB flush state machine is really subtle and fragile,
could you try to run the following mmap stresstest i wrote some time
ago:

   http://redhat.com/~mingo/threaded-mmap-stresstest/

for a couple of hours. It runs nr_cpus threads which then do a "random
crazy mix" of mappings/unmappings/remappings of a 800 MB memory window.
The more sockets/cores, the crazier the TLB races get ;-)

	Ingo
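For readers who cannot fetch the stress test, here is a rough sketch of what one worker in such a test might do. It is not Ingo's program, and the function name, the PRNG and the window size are assumptions made for illustration. The idea is that each of nr_cpus threads in one process would run a loop like this, so that unmapping in one thread forces TLB shootdowns on the CPUs running the others.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Small xorshift PRNG so every worker can have its own stream. */
static uint32_t next_rand(uint32_t *state)
{
    uint32_t x = *state;

    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/*
 * Reserve a window of `pages` pages, then repeatedly pick a random page
 * in it and either map it writable (and touch it) or drop it again.
 * Returns the number of operations completed, or -1 on any mmap failure.
 */
long stress_window(size_t pages, long iterations, uint32_t seed)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t len;
    char *base;
    long done = 0;

    assert(pages > 0 && seed != 0 && page > 0);
    len = pages * (size_t)page;

    /* Reserve address space only; nothing is accessible yet. */
    base = mmap(NULL, len, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return -1;

    for (long i = 0; i < iterations; i++) {
        char *addr = base + (next_rand(&seed) % pages) * (size_t)page;
        int writable = next_rand(&seed) & 1;
        int prot = writable ? PROT_READ | PROT_WRITE : PROT_NONE;

        /* MAP_FIXED replaces whatever was mapped at addr before. */
        char *p = mmap(addr, (size_t)page, prot,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
        if (p == MAP_FAILED) {
            munmap(base, len);
            return -1;
        }
        if (writable)
            p[0] = 1;               /* fault the page in */
        done++;
    }

    munmap(base, len);
    return done;
}
```

The sketch replaces pages with PROT_NONE mappings instead of calling munmap() on them, so the window stays reserved and no other allocation can land inside it. The real test also mixes in remappings, which this sketch leaves out.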
[PATCH] x86: Voluntary leave_mm before entering ACPI C3
Avoid TLB flush IPIs during C3 states by voluntary leave_mm()
before entering C3.

The performance impact of TLB flush on C3 should not be significant with
respect to C3 wakeup latency. Also, CPUs tend to flush TLB in hardware while in
C3 anyways.

On a 8 logical CPU system, running make -j2, the number of tlbflush IPIs goes
down from 40 per second to ~ 0. Total number of interrupts during the run
of this workload was ~1200 per second, which makes it ~3% savings in wakeups.

There was no measurable performance or power impact however.

Signed-off-by: Venkatesh Pallipadi <[EMAIL PROTECTED]>

Index: linux-2.6.24-rc/arch/x86/kernel/smp_64.c
===================================================================
--- linux-2.6.24-rc.orig/arch/x86/kernel/smp_64.c
+++ linux-2.6.24-rc/arch/x86/kernel/smp_64.c
@@ -70,7 +70,7 @@ static DEFINE_PER_CPU(union smp_flush_st
  * We cannot call mmdrop() because we are in interrupt context,
  * instead update mm->cpu_vm_mask.
  */
-static inline void leave_mm(int cpu)
+void leave_mm(int cpu)
 {
 	if (read_pda(mmu_state) == TLBSTATE_OK)
 		BUG();
Index: linux-2.6.24-rc/include/asm-x86/acpi_32.h
===================================================================
--- linux-2.6.24-rc.orig/include/asm-x86/acpi_32.h
+++ linux-2.6.24-rc/include/asm-x86/acpi_32.h
@@ -31,6 +31,7 @@
 #include <acpi/pdc_intel.h>
 
 #include <asm/system.h>		/* defines cmpxchg */
+#include <asm/mmu.h>
 
 #define COMPILER_DEPENDENT_INT64   long long
 #define COMPILER_DEPENDENT_UINT64  unsigned long long
@@ -138,6 +139,8 @@ static inline void disable_acpi(void) {
 
 #define ARCH_HAS_POWER_INIT	1
 
+#define acpi_unlazy_tlb(x)	leave_mm(x)
+
 #endif /*__KERNEL__*/
 
 #endif /*_ASM_ACPI_H*/
Index: linux-2.6.24-rc/include/asm-x86/acpi_64.h
===================================================================
--- linux-2.6.24-rc.orig/include/asm-x86/acpi_64.h
+++ linux-2.6.24-rc/include/asm-x86/acpi_64.h
@@ -30,6 +30,7 @@
 
 #include <acpi/pdc_intel.h>
 #include <asm/numa.h>
+#include <asm/mmu.h>
 
 #define COMPILER_DEPENDENT_INT64   long long
 #define COMPILER_DEPENDENT_UINT64  unsigned long long
@@ -148,6 +149,8 @@ static inline void acpi_fake_nodes(const
 }
 #endif
 
+#define acpi_unlazy_tlb(x)	leave_mm(x)
+
 #endif /*__KERNEL__*/
 
 #endif /*_ASM_ACPI_H*/
Index: linux-2.6.24-rc/drivers/acpi/processor_idle.c
===================================================================
--- linux-2.6.24-rc.orig/drivers/acpi/processor_idle.c
+++ linux-2.6.24-rc/drivers/acpi/processor_idle.c
@@ -530,6 +530,7 @@ static void acpi_processor_idle(void)
 		break;
 
 	case ACPI_STATE_C3:
+		acpi_unlazy_tlb(smp_processor_id());
 		/*
 		 * disable bus master
 		 * bm_check implies we need ARB_DIS
@@ -1485,6 +1486,7 @@ static int acpi_idle_enter_bm(struct cpu
 		return 0;
 	}
 
+	acpi_unlazy_tlb(smp_processor_id());
 	/*
 	 * Must be done before busmaster disable as we might need to
 	 * access HPET !
Index: linux-2.6.24-rc/include/asm-ia64/acpi.h
===================================================================
--- linux-2.6.24-rc.orig/include/asm-ia64/acpi.h
+++ linux-2.6.24-rc/include/asm-ia64/acpi.h
@@ -126,6 +126,8 @@ extern int __devinitdata pxm_to_nid_map[
 extern int __initdata nid_to_pxm_map[MAX_NUMNODES];
 #endif
 
+#define acpi_unlazy_tlb(x)
+
 #endif /*__KERNEL__*/
 
 #endif /*_ASM_ACPI_H*/
Index: linux-2.6.24-rc/include/asm-x86/mmu.h
===================================================================
--- linux-2.6.24-rc.orig/include/asm-x86/mmu.h
+++ linux-2.6.24-rc/include/asm-x86/mmu.h
@@ -20,4 +20,6 @@ typedef struct {
 	void *vdso;
 } mm_context_t;
 
+void leave_mm(int cpu);
+
 #endif /* _ASM_X86_MMU_H */
Index: linux-2.6.24-rc/arch/x86/kernel/smp_32.c
===================================================================
--- linux-2.6.24-rc.orig/arch/x86/kernel/smp_32.c
+++ linux-2.6.24-rc/arch/x86/kernel/smp_32.c
@@ -256,7 +256,7 @@ static DEFINE_SPINLOCK(tlbstate_lock);
  * We need to reload %cr3 since the page tables may be going
  * away from under us..
  */
-void leave_mm(unsigned long cpu)
+void leave_mm(int cpu)
 {
 	if (per_cpu(cpu_tlbstate, cpu).state == TLBSTATE_OK)
 		BUG();
Index: linux-2.6.24-rc/include/asm-x86/mmu_context_32.h
===================================================================
--- linux-2.6.24-rc.orig/include/asm-x86/mmu_context_32.h
+++ linux-2.6.24-rc/include/asm-x86/mmu_context_32.h
@@ -32,8 +32,6 @@ static inline void enter_lazy_tlb(struct
 #endif
 }
 
-void leave_mm(unsigned long cpu);
-
 static inline void switch_mm(struct mm_struct *prev,
 			     struct mm_struct *next,
 			     struct task_struct *tsk)
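Why calling leave_mm() before C3 removes the IPIs can be shown with a small user-space model. This is a sketch, not the kernel code. The sketch_* types and functions are hypothetical, the per-mm mask plays the role of mm->cpu_vm_mask, and the %cr3 reload that the real function also performs is only noted in a comment.

```c
#include <assert.h>
#include <stdint.h>

enum sketch_tlbstate { SKETCH_TLBSTATE_OK, SKETCH_TLBSTATE_LAZY };

/* Per-mm state: bit n set means CPU n must be told about TLB flushes. */
struct sketch_mm {
    uint64_t cpu_vm_mask;
};

/* Per-CPU state: which mm is loaded, and whether it is only kept lazily. */
struct sketch_cpu {
    enum sketch_tlbstate state;
    struct sketch_mm *active_mm;
};

/* Model of leave_mm(): drop this CPU out of the mm's flush mask. */
void sketch_leave_mm(struct sketch_cpu *cpus, int cpu)
{
    /* Mirrors the BUG() check: only a lazy (idle) CPU may leave the mm. */
    assert(cpus[cpu].state != SKETCH_TLBSTATE_OK);
    cpus[cpu].active_mm->cpu_vm_mask &= ~(UINT64_C(1) << cpu);
    /* The real code also reloads %cr3 so stale user mappings go away. */
}

/* Number of IPIs a flush from `sender` needs: one per other CPU in the mask. */
int sketch_flush_ipis(const struct sketch_mm *mm, int sender)
{
    uint64_t targets = mm->cpu_vm_mask & ~(UINT64_C(1) << sender);
    int n = 0;

    for (; targets; targets &= targets - 1)
        n++;
    return n;
}
```

In this model, an idle CPU that still sits in the mask costs one IPI, and one wakeup from C3, for every flush issued elsewhere. Once it has called sketch_leave_mm(), the count for that CPU drops to zero, which is the effect the patch description reports for tlbflush IPIs.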
[PATCH] x86: Voluntary leave_mm before entering ACPI C3
Avoid TLB flush IPIs during C3 states by voluntary leave_mm()
before entering C3.

The performance impact of TLB flush on C3 should not be significant with
respect to C3 wakeup latency. Also, CPUs tend to flush TLB in hardware while in
C3 anyway.

On an 8 logical CPU system, running make -j2, the number of tlbflush IPIs goes
down from 40 per second to ~0. Total number of interrupts during the run
of this workload was ~1200 per second, which makes it ~3% savings in wakeups.

There was no measurable performance or power impact however.

Signed-off-by: Venkatesh Pallipadi [EMAIL PROTECTED]

Index: linux-2.6.24-rc/arch/x86/kernel/smp_64.c
===
--- linux-2.6.24-rc.orig/arch/x86/kernel/smp_64.c
+++ linux-2.6.24-rc/arch/x86/kernel/smp_64.c
@@ -70,7 +70,7 @@ static DEFINE_PER_CPU(union smp_flush_st
  * We cannot call mmdrop() because we are in interrupt context,
  * instead update mm->cpu_vm_mask.
  */
-static inline void leave_mm(int cpu)
+void leave_mm(int cpu)
 {
 	if (read_pda(mmu_state) == TLBSTATE_OK)
 		BUG();
Index: linux-2.6.24-rc/include/asm-x86/acpi_32.h
===
--- linux-2.6.24-rc.orig/include/asm-x86/acpi_32.h
+++ linux-2.6.24-rc/include/asm-x86/acpi_32.h
@@ -31,6 +31,7 @@
 #include <acpi/pdc_intel.h>
 
 #include <asm/system.h>		/* defines cmpxchg */
+#include <asm/mmu.h>
 
 #define COMPILER_DEPENDENT_INT64   long long
 #define COMPILER_DEPENDENT_UINT64  unsigned long long
@@ -138,6 +139,8 @@ static inline void disable_acpi(void) {
 
 #define ARCH_HAS_POWER_INIT	1
 
+#define acpi_unlazy_tlb(x)	leave_mm(x)
+
 #endif /*__KERNEL__*/
 
 #endif /*_ASM_ACPI_H*/
Index: linux-2.6.24-rc/include/asm-x86/acpi_64.h
===
--- linux-2.6.24-rc.orig/include/asm-x86/acpi_64.h
+++ linux-2.6.24-rc/include/asm-x86/acpi_64.h
@@ -30,6 +30,7 @@
 
 #include <acpi/pdc_intel.h>
 #include <asm/numa.h>
+#include <asm/mmu.h>
 
 #define COMPILER_DEPENDENT_INT64   long long
 #define COMPILER_DEPENDENT_UINT64  unsigned long long
@@ -148,6 +149,8 @@ static inline void acpi_fake_nodes(const
 }
 #endif
 
+#define acpi_unlazy_tlb(x)	leave_mm(x)
+
 #endif /*__KERNEL__*/
 
 #endif /*_ASM_ACPI_H*/
Index: linux-2.6.24-rc/drivers/acpi/processor_idle.c
===
--- linux-2.6.24-rc.orig/drivers/acpi/processor_idle.c
+++ linux-2.6.24-rc/drivers/acpi/processor_idle.c
@@ -530,6 +530,7 @@ static void acpi_processor_idle(void)
 		break;
 
 	case ACPI_STATE_C3:
+		acpi_unlazy_tlb(smp_processor_id());
 		/*
 		 * disable bus master
 		 * bm_check implies we need ARB_DIS
@@ -1485,6 +1486,7 @@ static int acpi_idle_enter_bm(struct cpu
 		return 0;
 	}
 
+	acpi_unlazy_tlb(smp_processor_id());
 	/*
 	 * Must be done before busmaster disable as we might need to
 	 * access HPET !
Index: linux-2.6.24-rc/include/asm-ia64/acpi.h
===
--- linux-2.6.24-rc.orig/include/asm-ia64/acpi.h
+++ linux-2.6.24-rc/include/asm-ia64/acpi.h
@@ -126,6 +126,8 @@ extern int __devinitdata pxm_to_nid_map[
 extern int __initdata nid_to_pxm_map[MAX_NUMNODES];
 #endif
 
+#define acpi_unlazy_tlb(x)
+
 #endif /*__KERNEL__*/
 
 #endif /*_ASM_ACPI_H*/
Index: linux-2.6.24-rc/include/asm-x86/mmu.h
===
--- linux-2.6.24-rc.orig/include/asm-x86/mmu.h
+++ linux-2.6.24-rc/include/asm-x86/mmu.h
@@ -20,4 +20,6 @@ typedef struct {
 	void *vdso;
 } mm_context_t;
 
+void leave_mm(int cpu);
+
 #endif /* _ASM_X86_MMU_H */
Index: linux-2.6.24-rc/arch/x86/kernel/smp_32.c
===
--- linux-2.6.24-rc.orig/arch/x86/kernel/smp_32.c
+++ linux-2.6.24-rc/arch/x86/kernel/smp_32.c
@@ -256,7 +256,7 @@ static DEFINE_SPINLOCK(tlbstate_lock);
  * We need to reload %cr3 since the page tables may be going
  * away from under us..
  */
-void leave_mm(unsigned long cpu)
+void leave_mm(int cpu)
 {
 	if (per_cpu(cpu_tlbstate, cpu).state == TLBSTATE_OK)
 		BUG();
Index: linux-2.6.24-rc/include/asm-x86/mmu_context_32.h
===
--- linux-2.6.24-rc.orig/include/asm-x86/mmu_context_32.h
+++ linux-2.6.24-rc/include/asm-x86/mmu_context_32.h
@@ -32,8 +32,6 @@ static inline void enter_lazy_tlb(struct
 #endif
 }
 
-void leave_mm(unsigned long cpu);
-
 static inline void switch_mm(struct mm_struct *prev,
 			     struct mm_struct *next,
 			     struct task_struct *tsk)
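To make the mechanism in the changelog concrete, here is a minimal user-space sketch. It is not kernel code: `struct sim_mm`, `sim_leave_mm()` and `sim_flush_tlb_others()` are hypothetical names invented for illustration, and the model assumes at most 32 CPUs. It treats `cpu_vm_mask` as a plain bitmask and counts one IPI per other CPU whose bit is still set when a flush is requested. The %cr3 reload that the smp_32.c comment mentions is deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: one bit per CPU that may still cache this mm's translations. */
struct sim_mm {
	uint32_t cpu_vm_mask;
};

/* Model of the voluntary step: the CPU drops out of the mask before it idles in C3. */
static void sim_leave_mm(struct sim_mm *mm, int cpu)
{
	mm->cpu_vm_mask &= ~(UINT32_C(1) << cpu);
}

/* Count the IPIs a flush issued by 'self' would send: one per other CPU in the mask. */
static int sim_flush_tlb_others(const struct sim_mm *mm, int self)
{
	uint32_t targets = mm->cpu_vm_mask & ~(UINT32_C(1) << self);
	int ipis = 0;

	while (targets) {
		targets &= targets - 1;	/* clear the lowest set bit */
		ipis++;
	}
	return ipis;
}
```

In this model, a CPU that idles while still in the mask costs one wakeup per flush, whereas a CPU that called `sim_leave_mm()` first is never targeted. That is the saving the changelog measures as tlbflush IPIs going from 40 per second to roughly 0.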
Re: [PATCH] x86: Voluntary leave_mm before entering ACPI C3
* Venki Pallipadi [EMAIL PROTECTED] wrote:

> Avoid TLB flush IPIs during C3 states by voluntary leave_mm() before
> entering C3.
>
> The performance impact of TLB flush on C3 should not be significant
> with respect to C3 wakeup latency. Also, CPUs tend to flush TLB in
> hardware while in C3 anyway.
>
> On an 8 logical CPU system, running make -j2, the number of tlbflush
> IPIs goes down from 40 per second to ~0. Total number of interrupts
> during the run of this workload was ~1200 per second, which makes it
> ~3% savings in wakeups.
>
> There was no measurable performance or power impact however.

thanks, applied to x86.git. Nice and elegant patch!

Btw., since the TLB flush state machine is really subtle and fragile,
could you try to run the following mmap stresstest i wrote some time
ago:

   http://redhat.com/~mingo/threaded-mmap-stresstest/

for a couple of hours. It runs nr_cpus threads which then do a "random
crazy mix" of mappings/unmappings/remappings of an 800 MB memory window.
The more sockets/cores, the crazier the TLB races get ;-)

	Ingo
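For readers who want a feel for what such a stress test does, below is a rough sketch in the same spirit. It is not the program behind the URL above, which is not reproduced here. The names `stress_worker()` and `run_mmap_stress()` and all parameters are hypothetical. Two simplifications are assumptions of this sketch: each thread works on its own slice of the window so it never touches memory another thread has just unmapped, and "unmapping" is done by replacing a chunk with a PROT_NONE mapping rather than by munmap(). All threads still share one mm, so each remap can force TLB flushes on the other CPUs.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

struct stress_arg {
	unsigned char *slice;	/* this thread's part of the shared window */
	size_t pages;		/* slice length in pages */
	size_t page_size;
	int iterations;
	unsigned int seed;
	int errors;
};

static void *stress_worker(void *p)
{
	struct stress_arg *a = p;
	const int flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED;

	for (int i = 0; i < a->iterations; i++) {
		size_t first = (size_t)rand_r(&a->seed) % a->pages;
		size_t count = 1 + (size_t)rand_r(&a->seed) % (a->pages - first);
		unsigned char *addr = a->slice + first * a->page_size;
		size_t len = count * a->page_size;

		/* Remap a random chunk read-write, replacing whatever was there. */
		if (mmap(addr, len, PROT_READ | PROT_WRITE, flags, -1, 0) == MAP_FAILED) {
			a->errors++;
			continue;
		}
		/* Touch every page so its translation is actually loaded. */
		for (size_t off = 0; off < len; off += a->page_size)
			addr[off] = (unsigned char)i;
		/* Drop the pages again by mapping the chunk inaccessible. */
		if (mmap(addr, len, PROT_NONE, flags, -1, 0) == MAP_FAILED)
			a->errors++;
	}
	return NULL;
}

/* Returns the number of failed calls, or -1 if setup failed. */
static int run_mmap_stress(int nthreads, size_t pages_per_thread, int iterations)
{
	long ps = sysconf(_SC_PAGESIZE);
	if (ps <= 0 || nthreads <= 0 || pages_per_thread == 0)
		return -1;

	size_t page_size = (size_t)ps;
	size_t slice_len = pages_per_thread * page_size;
	size_t total = (size_t)nthreads * slice_len;
	unsigned char *window = mmap(NULL, total, PROT_NONE,
				     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (window == MAP_FAILED)
		return -1;

	pthread_t *tids = calloc((size_t)nthreads, sizeof(*tids));
	struct stress_arg *args = calloc((size_t)nthreads, sizeof(*args));
	int started = 0, errors = 0;

	if (!tids || !args) {
		errors = -1;
		goto out;
	}
	for (int t = 0; t < nthreads; t++) {
		args[t] = (struct stress_arg){
			.slice = window + (size_t)t * slice_len,
			.pages = pages_per_thread,
			.page_size = page_size,
			.iterations = iterations,
			.seed = (unsigned int)t + 1,
		};
		if (pthread_create(&tids[t], NULL, stress_worker, &args[t]) != 0) {
			errors++;
			break;
		}
		started++;
	}
	for (int t = 0; t < started; t++) {
		pthread_join(tids[t], NULL);
		errors += args[t].errors;
	}
out:
	free(tids);
	free(args);
	munmap(window, total);
	return errors;
}
```

A call such as `run_mmap_stress(4, 256, 100000)` would run four workers over 256 pages each; a return value of 0 only means no mmap() call failed, not that the TLB paths are bug-free.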
Re: [PATCH] x86: Voluntary leave_mm before entering ACPI C3
Ingo Molnar wrote:
> * Venki Pallipadi [EMAIL PROTECTED] wrote:
>
> > Avoid TLB flush IPIs during C3 states by voluntary leave_mm() before
> > entering C3.
> >
> > The performance impact of TLB flush on C3 should not be significant
> > with respect to C3 wakeup latency. Also, CPUs tend to flush TLB in
> > hardware while in C3 anyway.

Are there any CPUs around which *don't* flush the TLB across C3? (I
guess it's not guaranteed by the spec, though, and as TLBs grow larger
there might be incentive to keep them online.)

	-hpa
Re: [PATCH] x86: Voluntary leave_mm before entering ACPI C3
* H. Peter Anvin [EMAIL PROTECTED] wrote:

> Ingo Molnar wrote:
> > * Venki Pallipadi [EMAIL PROTECTED] wrote:
> >
> > > Avoid TLB flush IPIs during C3 states by voluntary leave_mm()
> > > before entering C3.
> > >
> > > The performance impact of TLB flush on C3 should not be significant
> > > with respect to C3 wakeup latency. Also, CPUs tend to flush TLB in
> > > hardware while in C3 anyway.
>
> Are there any CPUs around which *don't* flush the TLB across C3? (I
> guess it's not guaranteed by the spec, though, and as TLBs grow larger
> there might be incentive to keep them online.)

i dont think it's required for C3 to even turn off any portion of the
CPU - if an interrupt arrives after the C3 sequence is initiated but
just before dirty cachelines have been flushed then the CPU can just
return without touching anything (such as the TLB) - right? So i dont
think there's any implicit guarantee of TLB flushing (nor should there
be), but in practice, a good C3 sequence would (statistically) turn off
large portions of the CPU and hence the TLB as well.

	Ingo
Re: [PATCH] x86: Voluntary leave_mm before entering ACPI C3
Ingo Molnar wrote:
> i dont think it's required for C3 to even turn off any portion of the
> CPU - if an interrupt arrives after the C3 sequence is initiated but
> just before dirty cachelines have been flushed then the CPU can just
> return without touching anything (such as the TLB) - right? So i dont
> think there's any implicit guarantee of TLB flushing (nor should there
> be), but in practice, a good C3 sequence would (statistically) turn off
> large portions of the CPU and hence the TLB as well.

I think C3 guarantees that the cache contents stay intact, and thus it
might make sense in some technology to preserve the TLB as well (being a
kind of cache.)

Otherwise, what you say here of course is absolutely correct.

	-hpa
Re: [PATCH] x86: Voluntary leave_mm before entering ACPI C3
On Wed, Dec 19, 2007 at 08:40:32PM +0100, Ingo Molnar wrote:
> * H. Peter Anvin [EMAIL PROTECTED] wrote:
>
> > Ingo Molnar wrote:
> > > * Venki Pallipadi [EMAIL PROTECTED] wrote:
> > >
> > > > Avoid TLB flush IPIs during C3 states by voluntary leave_mm()
> > > > before entering C3.
> > > >
> > > > The performance impact of TLB flush on C3 should not be
> > > > significant with respect to C3 wakeup latency. Also, CPUs tend
> > > > to flush TLB in hardware while in C3 anyway.
> >
> > Are there any CPUs around which *don't* flush the TLB across C3? (I
> > guess it's not guaranteed by the spec, though, and as TLBs grow
> > larger there might be incentive to keep them online.)
>
> i dont think it's required for C3 to even turn off any portion of the
> CPU - if an interrupt arrives after the C3 sequence is initiated but
> just before dirty cachelines have been flushed then the CPU can just
> return without touching anything (such as the TLB) - right? So i dont
> think there's any implicit guarantee of TLB flushing (nor should there
> be), but in practice, a good C3 sequence would (statistically) turn
> off large portions of the CPU and hence the TLB as well.

Yes. There are cases where the hardware/BIOS can change C-states behind
the OS's back, for example staying in C1 for a while and then going to
C2/C3 later. In such cases, there will be times when the TLBs are not
really flushed in hardware. But ideally, if C3 results in deep idle, the
TLBs would be turned off. And in cases where we wake up earlier than
expected, the C-state policy should identify that and choose a lower
(shallower) C-state next time around.

I also tried one variation of this, wherein I only do the flush if more
than one CPU is sharing the mm. But that did not help the test case I
was using (which is probably the worst case). What I would see is:
- Process runs on CPU x and mm is not shared
- Goes idle (C3) waiting on something
- Wakes up on CPU y, which will now start sharing mm and would send
  flush IPI anyway

Thanks,
Venki
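The three-step scenario above can be walked through with a small user-space model. This is only a sketch under stated assumptions: the `var_*` names are hypothetical, the mm's CPU mask is modelled as a 32-bit bitmask, and a flush is assumed to send one IPI to every other CPU still in the mask.

```c
#include <assert.h>
#include <stdint.h>

/* Number of CPUs currently in the (modelled) cpu_vm_mask. */
static int var_mm_users(uint32_t mask)
{
	int n = 0;

	while (mask) {
		mask &= mask - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/* The variation: leave the mm before C3 only if another CPU shares it. */
static uint32_t var_enter_c3_shared_only(uint32_t mask, int cpu)
{
	if (var_mm_users(mask) > 1)
		mask &= ~(UINT32_C(1) << cpu);
	return mask;
}

/* The posted patch: always leave the mm before C3. */
static uint32_t var_enter_c3_always(uint32_t mask, int cpu)
{
	return mask & ~(UINT32_C(1) << cpu);
}

/* The process wakes up on CPU y; count IPIs for its next flush from y. */
static int var_ipis_after_migration(uint32_t mask, int y)
{
	mask |= UINT32_C(1) << y;
	return var_mm_users(mask & ~(UINT32_C(1) << y));
}
```

Starting with only CPU 0 in the mask, the shared-only variant leaves the mask untouched when CPU 0 enters C3, so a flush after the process moves to CPU 1 still targets the idle CPU 0. The unconditional variant has already cleared CPU 0, and no IPI is needed.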
Re: [PATCH] x86: Voluntary leave_mm before entering ACPI C3
On Wed, Dec 19, 2007 at 11:48:14AM -0800, H. Peter Anvin wrote:
> Ingo Molnar wrote:
> > i dont think it's required for C3 to even turn off any portion of
> > the CPU - if an interrupt arrives after the C3 sequence is initiated
> > but just before dirty cachelines have been flushed then the CPU can
> > just return without touching anything (such as the TLB) - right? So
> > i dont think there's any implicit guarantee of TLB flushing (nor
> > should there be), but in practice, a good C3 sequence would
> > (statistically) turn off large portions of the CPU and hence the TLB
> > as well.
>
> I think C3 guarantees that the cache contents stay intact, and thus it
> might make sense in some technology to preserve the TLB as well (being
> a kind of cache.)
>
> Otherwise, what you say here of course is absolutely correct.

C3 does not guarantee all cache contents. In fact, at least on Intel,
L1 will almost always be flushed. Newer, more power-efficient CPUs do
dynamic cache sizing [1].

C3 just guarantees that the caches are coherent. That is, if they are
intact, then DMA will keep the cache consistent.

Thanks,
Venki

[1] - http://download.intel.com/products/processor/core2duo/mobile_prod_brief.pdf
Re: [PATCH] x86: Voluntary leave_mm before entering ACPI C3
FYI, your patch needed the fix below for !SMP. It works fine otherwise.

	Ingo

---
 include/asm-x86/mmu.h |    6 ++
 1 file changed, 6 insertions(+)

Index: linux/include/asm-x86/mmu.h
===
--- linux.orig/include/asm-x86/mmu.h
+++ linux/include/asm-x86/mmu.h
@@ -20,6 +20,12 @@ typedef struct {
 	void *vdso;
 } mm_context_t;
 
+#ifdef CONFIG_SMP
 void leave_mm(int cpu);
+#else
+static inline void leave_mm(int cpu)
+{
+}
+#endif
 
 #endif /* _ASM_X86_MMU_H */
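The pattern in this fix can be tried outside the kernel with a small sketch. Everything here is a stand-in: `DEMO_SMP` plays the role of `CONFIG_SMP`, and `demo_leave_mm()`, `demo_acpi_unlazy_tlb()` and `demo_enter_c3()` are hypothetical names. The assumption being illustrated is that the idle path calls the hook unconditionally, so a build without the SMP implementation needs an empty inline stub for the call to compile and cost nothing.

```c
#include <assert.h>

/* Counts how often the "real" implementation ran; stays 0 with the stub. */
static int demo_leave_mm_calls;

#ifdef DEMO_SMP
/* Stand-in for the SMP implementation that does real work. */
static void demo_leave_mm(int cpu)
{
	(void)cpu;
	demo_leave_mm_calls++;
}
#else
/* Uniprocessor stub: the call site still compiles, but nothing happens. */
static inline void demo_leave_mm(int cpu)
{
	(void)cpu;
}
#endif

/* Same shape as the acpi_unlazy_tlb() wrapper macro in the patch. */
#define demo_acpi_unlazy_tlb(x)	demo_leave_mm(x)

/* Call site that does not need to know which variant was built. */
static int demo_enter_c3(int cpu)
{
	demo_acpi_unlazy_tlb(cpu);
	return demo_leave_mm_calls;
}
```

Built without `-DDEMO_SMP`, `demo_enter_c3(0)` returns 0 because the stub does no work. Built with it, each call increments the counter.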
Re: [PATCH] x86: Voluntary leave_mm before entering ACPI C3
On Wed, Dec 19, 2007 at 08:32:55PM +0100, Ingo Molnar wrote:
> * Venki Pallipadi [EMAIL PROTECTED] wrote:
>
> > Avoid TLB flush IPIs during C3 states by voluntary leave_mm() before
> > entering C3.
> >
> > The performance impact of TLB flush on C3 should not be significant
> > with respect to C3 wakeup latency. Also, CPUs tend to flush TLB in
> > hardware while in C3 anyway.
> >
> > On an 8 logical CPU system, running make -j2, the number of tlbflush
> > IPIs goes down from 40 per second to ~0. Total number of interrupts
> > during the run of this workload was ~1200 per second, which makes it
> > ~3% savings in wakeups.
> >
> > There was no measurable performance or power impact however.
>
> thanks, applied to x86.git. Nice and elegant patch!
>
> Btw., since the TLB flush state machine is really subtle and fragile,
> could you try to run the following mmap stresstest i wrote some time
> ago:
>
>    http://redhat.com/~mingo/threaded-mmap-stresstest/
>
> for a couple of hours. It runs nr_cpus threads which then do a "random
> crazy mix" of mappings/unmappings/remappings of an 800 MB memory
> window. The more sockets/cores, the crazier the TLB races get ;-)

Ingo,

I ran this stress test on two systems (8 cores and 2 cores) for over 4
hours without any issues. There was more than 20% C3 time during the
run, so this C3 tlbflush path must have been stressed well.

And sorry about the patch not working on UP config. That was a silly
oversight on my part.

Thanks,
Venki
Re: [PATCH] x86: Voluntary leave_mm before entering ACPI C3
On Wed, 19 Dec 2007 11:48:14 -0800
H. Peter Anvin [EMAIL PROTECTED] wrote:

> I think C3 guarantees that the cache contents stay intact, and thus
> it might make sense in some technology to preserve the TLB as well
> (being a kind of cache.)

that sounds nice. It's fiction though ;-)

The thing to realize is that linux only sees "ACPI C3";
the BIOS maps that C3 to..
well any of the C states the processor in the system has.
What you're saying is afaik correct for the *hardware* C3, not for the "C3"
that Linux sees..

--
If you want to reach me at my work email, use [EMAIL PROTECTED]
For development, discussion and tips for power savings,
visit http://www.lesswatts.org