Re: Intel TurboBoost in practice
On Mon, Jul 26, 2010 at 9:11 AM, Alexander Motin m...@freebsd.org wrote:

Robert Watson wrote:

On Sun, 25 Jul 2010, Alexander Motin wrote:

The numbers you are showing don't show much difference. Have you tried buildworld?

If you mean relative difference -- as I have said, it's mostly because of my CPU. Its maximal boost is 266MHz (8.3%), but 133MHz of that is enabled most of the time if the CPU is not overheated. It probably isn't, as it runs on a clear table under an air conditioner. So the maximal effect I can expect is 4.2%. In such a situation 2.8% is probably not so bad to illustrate that the feature works and there is room for further improvement. If I had a Core i5-750S I would expect a 33% boost.

Can I recommend the use of ministat(1) and sample sizes of at least 8 runs per configuration?

Thanks for pushing me to do it right. :) Here are 3*15 runs with a fresh kernel with debugging disabled. Results are quite close to the original: -2.73% and -2.19% of time.

x C1
+ C2
* C3
[ASCII distribution plot]
    N     Min     Max  Median        Avg        Stddev
x  15   12.68   12.84   12.69  12.698667   0.039254966
+  15   12.35   12.36   12.35  12.351333  0.0035186578
Difference at 95.0% confidence
    -0.347333 +/- 0.0208409
    -2.7352% +/- 0.164119%
    (Student's t, pooled s = 0.0278687)
*  15   12.41   12.44   12.42      12.42  0.0075592895
Difference at 95.0% confidence
    -0.278667 +/- 0.0211391
    -2.19446% +/- 0.166467%
    (Student's t, pooled s = 0.0282674)

I also checked one more aspect -- TurboBoost works only when the CPU runs at the highest EIST frequency (P0 state). I've reduced dev.cpu.0.freq from 3201 to 3067 and repeated the test:

x C1
+ C2
* C3
[ASCII distribution plot]
    N     Min     Max  Median        Avg        Stddev
x  15   13.72   13.73   13.72      13.72  0.0048795004
+  15   13.79   13.82    13.8      13.80  0.0072374686
Difference at 95.0% confidence
    0.08 +/- 0.00461567
    0.582949% +/- 0.0336337%
    (Student's t, pooled s = 0.00617213)
*  15   13.89    13.9   13.89     13.894  0.0050709255
Difference at 95.0% confidence
    0.170667 +/- 0.00372127
    1.24362% +/- 0.0271164%
    (Student's t, pooled s = 0.00497613)

In that case using C2 or C3 predictably caused a small performance reduction, as after falling asleep the CPU needs time to wake up. Even if the tested CPU0 never sleeps during the test, its TLB shootdown IPIs to other cores could still probably suffer from waiting for the other cores to wake up.

In the deeper sleep states, are the TLB contents actually maintained while the processor sleeps? (I notice that in some configurations, we actually flush dirty data from the cache before sleeping.)

Alan

___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to
Re: Intel TurboBoost in practice
Alan Cox wrote:

On Mon, Jul 26, 2010 at 9:11 AM, Alexander Motin m...@freebsd.org wrote:

In that case using C2 or C3 predictably caused a small performance reduction, as after falling asleep the CPU needs time to wake up. Even if the tested CPU0 never sleeps during the test, its TLB shootdown IPIs to other cores could still probably suffer from waiting for the other cores to wake up.

In the deeper sleep states, are the TLB contents actually maintained while the processor sleeps? (I notice that in some configurations, we actually flush dirty data from the cache before sleeping.)

As I understand it, we flush caches only as a last resort, if the platform does not support special techniques such as disabling arbitration or making the CPU wake up on bus mastering. But the same ACPI C-states could map to different CPU C-states. Some of these CPU states (like C6) could imply cache invalidation, though I am not sure it can be seen from outside. The ACPI 3.0 specification says nothing about TLBs, so I am not sure we can count on their invalidation unless we do it ourselves, as is done for caches when the CPU can't keep them coherent while sleeping.

-- Alexander Motin
Re: Intel TurboBoost in practice
On Sun, 25 Jul 2010, Alexander Motin wrote:

The numbers you are showing don't show much difference. Have you tried buildworld?

If you mean relative difference -- as I have said, it's mostly because of my CPU. Its maximal boost is 266MHz (8.3%), but 133MHz of that is enabled most of the time if the CPU is not overheated. It probably isn't, as it runs on a clear table under an air conditioner. So the maximal effect I can expect is 4.2%. In such a situation 2.8% is probably not so bad to illustrate that the feature works and there is room for further improvement. If I had a Core i5-750S I would expect a 33% boost.

Can I recommend the use of ministat(1) and sample sizes of at least 8 runs per configuration?

Robert

If you mean absolute difference, here are the results of four buildworld runs:

hw.acpi.cpu.cx_lowest=C1: 4654.23 sec
hw.acpi.cpu.cx_lowest=C2: 4556.37 sec
hw.acpi.cpu.cx_lowest=C2: 4570.85 sec
hw.acpi.cpu.cx_lowest=C1: 4679.83 sec

The benefit is about 2.1%. Each time the results were erased and the sources pre-cached into RAM. Storage was an SSD, so disk should not be an issue.

-- Alexander Motin
Re: Intel TurboBoost in practice
Robert Watson wrote:

On Sun, 25 Jul 2010, Alexander Motin wrote:

The numbers you are showing don't show much difference. Have you tried buildworld?

If you mean relative difference -- as I have said, it's mostly because of my CPU. Its maximal boost is 266MHz (8.3%), but 133MHz of that is enabled most of the time if the CPU is not overheated. It probably isn't, as it runs on a clear table under an air conditioner. So the maximal effect I can expect is 4.2%. In such a situation 2.8% is probably not so bad to illustrate that the feature works and there is room for further improvement. If I had a Core i5-750S I would expect a 33% boost.

Can I recommend the use of ministat(1) and sample sizes of at least 8 runs per configuration?

Thanks for pushing me to do it right. :) Here are 3*15 runs with a fresh kernel with debugging disabled. Results are quite close to the original: -2.73% and -2.19% of time.

x C1
+ C2
* C3
[ASCII distribution plot]
    N     Min     Max  Median        Avg        Stddev
x  15   12.68   12.84   12.69  12.698667   0.039254966
+  15   12.35   12.36   12.35  12.351333  0.0035186578
Difference at 95.0% confidence
    -0.347333 +/- 0.0208409
    -2.7352% +/- 0.164119%
    (Student's t, pooled s = 0.0278687)
*  15   12.41   12.44   12.42      12.42  0.0075592895
Difference at 95.0% confidence
    -0.278667 +/- 0.0211391
    -2.19446% +/- 0.166467%
    (Student's t, pooled s = 0.0282674)

I also checked one more aspect -- TurboBoost works only when the CPU runs at the highest EIST frequency (P0 state). I've reduced dev.cpu.0.freq from 3201 to 3067 and repeated the test:

x C1
+ C2
* C3
[ASCII distribution plot]
    N     Min     Max  Median        Avg        Stddev
x  15   13.72   13.73   13.72      13.72  0.0048795004
+  15   13.79   13.82    13.8      13.80  0.0072374686
Difference at 95.0% confidence
    0.08 +/- 0.00461567
    0.582949% +/- 0.0336337%
    (Student's t, pooled s = 0.00617213)
*  15   13.89    13.9   13.89     13.894  0.0050709255
Difference at 95.0% confidence
    0.170667 +/- 0.00372127
    1.24362% +/- 0.0271164%
    (Student's t, pooled s = 0.00497613)

In that case using C2 or C3 predictably caused a small performance reduction, as after falling asleep the CPU needs time to wake up. Even if the tested CPU0 never sleeps during the test, its TLB shootdown IPIs to other cores could still probably suffer from waiting for the other cores to wake up. Obviously, in the first test these 0.58% and 1.24% were subtracted from TurboBoost's maximal benefit of 4.3% on this CPU.

-- Alexander Motin
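The "Difference at 95.0% confidence" figures in the first ministat run can be rechecked from the summary rows alone. The following is a minimal sketch, assuming the standard pooled-variance Student's t interval; the function name and the hard-coded critical value 2.048 (two-sided 95%, 28 degrees of freedom) are illustrative choices, not taken from ministat itself.

```python
import math

def pooled_difference(n1, mean1, sd1, n2, mean2, sd2, t_crit):
    """Pooled-variance comparison of two samples given only summary statistics.

    Returns (difference of means, CI half-width, pooled s, difference in %).
    """
    df = n1 + n2 - 2
    pooled_s = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    diff = mean2 - mean1
    half_width = t_crit * pooled_s * math.sqrt(1.0 / n1 + 1.0 / n2)
    return diff, half_width, pooled_s, diff / mean1 * 100.0

# Assumption: two-sided 95% critical value of Student's t for 15 + 15 - 2 = 28 df.
T_CRIT_28 = 2.048

# (N, Avg, Stddev) rows copied from the first ministat table (C1 vs C2).
c1 = (15, 12.698667, 0.039254966)
c2 = (15, 12.351333, 0.0035186578)

diff, half_width, pooled_s, pct = pooled_difference(*c1, *c2, T_CRIT_28)
print(f"difference {diff:.6f} +/- {half_width:.7f}")
print(f"relative   {pct:.4f}%")
print(f"pooled s   {pooled_s:.7f}")
```

Feeding in the C3 row instead of the C2 row should give the second difference block in the same way.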
Intel TurboBoost in practice
Hi.

I've made a few small observations of Intel TurboBoost technology under FreeBSD. This technology allows Intel Core i5/i7 CPUs to raise the frequency of some cores if other cores are idle and power/thermal conditions permit. A CPU core is counted as idle if it has been put into the C3 or a deeper power state (which may correspond to ACPI C2/C3 states). So to reach maximal effectiveness, some tuning may be needed.

Here is my test case: FreeBSD 9-CURRENT on a Core i5 650 CPU, 3.2GHz + 1/2 TurboBoost steps (+133/+266MHz), with the boxed cooler in open air. I was measuring the build time of net/mpd5 from sources, using only one CPU core (cpuset -l 0 time make).

Untuned system (hz=1000): 14.15 sec
Enabled ACPI C2 (hz=1000+C2): 13.85 sec
Enabled ACPI C3 (hz=1000+C3): 13.91 sec
Reduced HZ (hz=100): 14.16 sec
Enabled ACPI C2 (hz=100+C2): 13.85 sec
Enabled ACPI C3 (hz=100+C3): 13.86 sec
Timers tuned* (hz=100): 14.10 sec
Enabled ACPI C2 (hz=100+C2): 13.71 sec
Enabled ACPI C3 (hz=100+C3): 13.73 sec

All numbers were tested a few times and are repeatable to within +/-0.01 sec.

*) Timers were tuned to reduce interrupt rates and correspondingly increase idle cores' sleep time. These lines were added to loader.conf:

sysctl kern.eventtimer.timer1=i8254
sysctl kern.eventtimer.timer2=NONE
kern.eventtimer.singlemul=1
kern.hz=100

PS: In this case the benefit is small, but it is the least that can be achieved, depending on CPU model. Some models allow the frequency to be raised by up to 6 steps (+798MHz).

PPS: I expect an even better effect from further reducing interrupt rates on idle CPUs.

-- Alexander Motin
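To put the timing table in relative terms, here is a small sketch that computes how much each C2/C3 run shortens the build compared with the baseline row of the same group. The dictionary layout and the "base" label are illustrative; only the timings come from the message.

```python
# Build times in seconds, copied from the table in the message above.
# "base" is the first row of each group (no deeper C-state enabled).
timings = {
    "hz=1000": {"base": 14.15, "C2": 13.85, "C3": 13.91},
    "hz=100": {"base": 14.16, "C2": 13.85, "C3": 13.86},
    "hz=100, timers tuned": {"base": 14.10, "C2": 13.71, "C3": 13.73},
}

def gain_percent(baseline, tuned):
    """Percentage reduction in build time relative to the baseline run."""
    return (baseline - tuned) / baseline * 100.0

# Gain of each C-state run over the baseline of its own group.
gains = {
    (config, state): gain_percent(runs["base"], runs[state])
    for config, runs in timings.items()
    for state in ("C2", "C3")
}

for (config, state), gain in gains.items():
    print(f"{config:22s} {state}: {gain:.2f}% less build time than base")
```

The largest gain is the tuned-timers C2 run, at a little under 2.8%.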
Re: Intel TurboBoost in practice
2010/7/24 Alexander Motin m...@freebsd.org

Hi. I've made a few small observations of Intel TurboBoost technology under FreeBSD. This technology allows Intel Core i5/i7 CPUs to raise the frequency of some cores if other cores are idle and power/thermal conditions permit. A CPU core is counted as idle if it has been put into the C3 or a deeper power state (which may correspond to ACPI C2/C3 states). So to reach maximal effectiveness, some tuning may be needed.

[snip]

PPS: I expect an even better effect from further reducing interrupt rates on idle CPUs.

I'm currently testing a patch that eliminates another 31% of the global TLB shootdowns for a buildworld on an amd64 machine. So, you can expect improvement in this area.

Alan
Re: Intel TurboBoost in practice
On 24 Jul 2010, at 14:53, Alexander Motin wrote:

Hi. I've made a few small observations of Intel TurboBoost technology under FreeBSD. This technology allows Intel Core i5/i7 CPUs to raise the frequency of some cores if other cores are idle and power/thermal conditions permit. A CPU core is counted as idle if it has been put into the C3 or a deeper power state (which may correspond to ACPI C2/C3 states). So to reach maximal effectiveness, some tuning may be needed.

Here is my test case: FreeBSD 9-CURRENT on a Core i5 650 CPU, 3.2GHz + 1/2 TurboBoost steps (+133/+266MHz), with the boxed cooler in open air. I was measuring the build time of net/mpd5 from sources, using only one CPU core (cpuset -l 0 time make).

Untuned system (hz=1000): 14.15 sec
Enabled ACPI C2 (hz=1000+C2): 13.85 sec
Enabled ACPI C3 (hz=1000+C3): 13.91 sec
Reduced HZ (hz=100): 14.16 sec
Enabled ACPI C2 (hz=100+C2): 13.85 sec
Enabled ACPI C3 (hz=100+C3): 13.86 sec
Timers tuned* (hz=100): 14.10 sec
Enabled ACPI C2 (hz=100+C2): 13.71 sec
Enabled ACPI C3 (hz=100+C3): 13.73 sec

All numbers were tested a few times and are repeatable to within +/-0.01 sec.

*) Timers were tuned to reduce interrupt rates and correspondingly increase idle cores' sleep time. These lines were added to loader.conf:

sysctl kern.eventtimer.timer1=i8254
sysctl kern.eventtimer.timer2=NONE
kern.eventtimer.singlemul=1
kern.hz=100

PS: In this case the benefit is small, but it is the least that can be achieved, depending on CPU model. Some models allow the frequency to be raised by up to 6 steps (+798MHz).

The numbers you are showing don't show much difference. Have you tried buildworld?

PPS: I expect an even better effect from further reducing interrupt rates on idle CPUs.

-- Alexander Motin

Regards,
-- Rui Paulo
Re: Intel TurboBoost in practice
On Sat, Jul 24, 2010 at 9:18 AM, Rui Paulo rpa...@lavabit.com wrote:

On 24 Jul 2010, at 14:53, Alexander Motin wrote:

Hi. I've made a few small observations of Intel TurboBoost technology under FreeBSD. This technology allows Intel Core i5/i7 CPUs to raise the frequency of some cores if other cores are idle and power/thermal conditions permit. A CPU core is counted as idle if it has been put into the C3 or a deeper power state (which may correspond to ACPI C2/C3 states). So to reach maximal effectiveness, some tuning may be needed.

Here is my test case: FreeBSD 9-CURRENT on a Core i5 650 CPU, 3.2GHz + 1/2 TurboBoost steps (+133/+266MHz), with the boxed cooler in open air. I was measuring the build time of net/mpd5 from sources, using only one CPU core (cpuset -l 0 time make).

Untuned system (hz=1000): 14.15 sec
Enabled ACPI C2 (hz=1000+C2): 13.85 sec
Enabled ACPI C3 (hz=1000+C3): 13.91 sec
Reduced HZ (hz=100): 14.16 sec
Enabled ACPI C2 (hz=100+C2): 13.85 sec
Enabled ACPI C3 (hz=100+C3): 13.86 sec
Timers tuned* (hz=100): 14.10 sec
Enabled ACPI C2 (hz=100+C2): 13.71 sec
Enabled ACPI C3 (hz=100+C3): 13.73 sec

All numbers were tested a few times and are repeatable to within +/-0.01 sec.

*) Timers were tuned to reduce interrupt rates and correspondingly increase idle cores' sleep time. These lines were added to loader.conf:

sysctl kern.eventtimer.timer1=i8254
sysctl kern.eventtimer.timer2=NONE
kern.eventtimer.singlemul=1
kern.hz=100

PS: In this case the benefit is small, but it is the least that can be achieved, depending on CPU model. Some models allow the frequency to be raised by up to 6 steps (+798MHz).

The numbers you are showing don't show much difference. Have you tried buildworld?

Agreed. The numbers are small enough that there could be a large degree of variation based on environmental factors alone; other things go into that as well, such as disk I/O, that probably shouldn't be factored into a CPU performance test.

Thanks,
-Garrett
Re: Intel TurboBoost in practice
Hi mav.

On Sat, 24 Jul 2010 16:53:10 +0300 Alexander Motin m...@freebsd.org wrote:

PS: In this case the benefit is small, but it is the least that can be achieved, depending on CPU model. Some models allow the frequency to be raised by up to 6 steps (+798MHz).

I tested on a Core i7 640UM (Arrandale 1.2GHz - 2.26GHz) with openssl speed (w/o aesni(4)) and /usr/src/tools/tools/crypto/cryptotest.c (w/ aesni(4)).

http://people.freebsd.org/~nork/aesni/aes128cbc-noaesni.pdf [1]
http://people.freebsd.org/~nork/aesni/aes128cbc-aesni.pdf [2]

[1] $ /usr/bin/cpuset -l$i /usr/bin/openssl speed -elapsed -mr -multi $n aes128-cbc
    $i = 0 1 2 3 0,1 0,2 0,3 1,2 1,3 2,3 0,1,2 0,1,3 0,2,3 1,2,3 0,1,2,3
    $n = number of cores, $((`echo $i | wc -c`/2))

[2] $ /usr/bin/cpuset -l$i ./cryptotest -t $n -z 5 8192
    $i = 0 1 2 3 0,1 0,2 0,3 1,2 1,3 2,3 0,1,2 0,1,3 0,2,3 1,2,3 0,1,2,3
    $n = number of cores, $((`echo $i | wc -c`/2))

In my environment, according to aes128cbc-noaesni.pdf, there is at least a 30% performance gain from Turbo Boost (I think). And according to aes128cbc-aesni.pdf, at least a 100% performance gain from Turbo Boost (I think). And I understand the reduction of single-thread performance by Hyper-Threading :-).

-- Norikatsu Shigemura n...@freebsd.org
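The list of $i values and the derived $n in the message above can be generated mechanically. This is only a sketch of that enumeration: it prints the command lines without running them, the function names are illustrative, and the character-count trick for $n holds only for single-digit CPU ids.

```python
from itertools import combinations

CPUS = [0, 1, 2, 3]  # logical CPU ids used in the message above

def cpu_lists(cpus):
    """Yield every non-empty subset of cpus as a 'cpuset -l' argument, smallest subsets first."""
    for size in range(1, len(cpus) + 1):
        for combo in combinations(cpus, size):
            yield ",".join(str(c) for c in combo)

def worker_count(cpu_list):
    """Mirror $((`echo $i | wc -c`/2)): characters plus the trailing newline, halved.

    Only valid while every CPU id is a single digit.
    """
    return (len(cpu_list) + 1) // 2

for cpu_list in cpu_lists(CPUS):
    n = worker_count(cpu_list)
    # Print the command line only; nothing is executed here.
    print(f"/usr/bin/cpuset -l{cpu_list} /usr/bin/openssl speed -elapsed -mr -multi {n} aes128-cbc")
```

With ten or more CPUs it would be safer to count the comma-separated fields instead of characters.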
Re: Intel TurboBoost in practice
Norikatsu Shigemura wrote:

On Sat, 24 Jul 2010 16:53:10 +0300 Alexander Motin m...@freebsd.org wrote:

PS: In this case the benefit is small, but it is the least that can be achieved, depending on CPU model. Some models allow the frequency to be raised by up to 6 steps (+798MHz).

I tested on a Core i7 640UM (Arrandale 1.2GHz - 2.26GHz) with openssl speed (w/o aesni(4)) and /usr/src/tools/tools/crypto/cryptotest.c (w/ aesni(4)).

http://people.freebsd.org/~nork/aesni/aes128cbc-noaesni.pdf [1]
http://people.freebsd.org/~nork/aesni/aes128cbc-aesni.pdf [2]

In my environment, according to aes128cbc-noaesni.pdf, there is at least a 30% performance gain from Turbo Boost (I think).

The numbers are interesting, though they do not prove much, because many other factors may influence the result. It would be more informative to do the tests with C1 and with C2/C3 states used.

And according to aes128cbc-aesni.pdf, at least a 100% performance gain from Turbo Boost (I think).

This IMHO is even more questionable. A single core, even boosted, shouldn't be faster than 2, 3 or 4. I would say there is some scalability problem. Maybe context switches, locking, or something else.

-- Alexander Motin
Re: Intel TurboBoost in practice
Rui Paulo wrote:

On 24 Jul 2010, at 14:53, Alexander Motin wrote:

Here is my test case: FreeBSD 9-CURRENT on a Core i5 650 CPU, 3.2GHz + 1/2 TurboBoost steps (+133/+266MHz), with the boxed cooler in open air. I was measuring the build time of net/mpd5 from sources, using only one CPU core (cpuset -l 0 time make).

Untuned system (hz=1000): 14.15 sec
Enabled ACPI C2 (hz=1000+C2): 13.85 sec
Enabled ACPI C3 (hz=1000+C3): 13.91 sec
Reduced HZ (hz=100): 14.16 sec
Enabled ACPI C2 (hz=100+C2): 13.85 sec
Enabled ACPI C3 (hz=100+C3): 13.86 sec
Timers tuned* (hz=100): 14.10 sec
Enabled ACPI C2 (hz=100+C2): 13.71 sec
Enabled ACPI C3 (hz=100+C3): 13.73 sec

All numbers were tested a few times and are repeatable to within +/-0.01 sec.

PS: In this case the benefit is small, but it is the least that can be achieved, depending on CPU model. Some models allow the frequency to be raised by up to 6 steps (+798MHz).

The numbers you are showing don't show much difference. Have you tried buildworld?

If you mean relative difference -- as I have said, it's mostly because of my CPU. Its maximal boost is 266MHz (8.3%), but 133MHz of that is enabled most of the time if the CPU is not overheated. It probably isn't, as it runs on a clear table under an air conditioner. So the maximal effect I can expect is 4.2%. In such a situation 2.8% is probably not so bad to illustrate that the feature works and there is room for further improvement. If I had a Core i5-750S I would expect a 33% boost.

If you mean absolute difference, here are the results of four buildworld runs:

hw.acpi.cpu.cx_lowest=C1: 4654.23 sec
hw.acpi.cpu.cx_lowest=C2: 4556.37 sec
hw.acpi.cpu.cx_lowest=C2: 4570.85 sec
hw.acpi.cpu.cx_lowest=C1: 4679.83 sec

The benefit is about 2.1%. Each time the results were erased and the sources pre-cached into RAM. Storage was an SSD, so disk should not be an issue.

-- Alexander Motin
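The percentages in this reply follow from simple ratios. The sketch below redoes the arithmetic under two assumptions: the 3.2GHz base clock is taken as 3200MHz, and the two runs per C-state setting are averaged before comparing (the message does not say how the "about 2.1%" was derived).

```python
BASE_MHZ = 3200  # Core i5 650 nominal frequency from the thread (3.2GHz)
STEP_MHZ = 133   # one TurboBoost step as quoted in the thread

def boost_percent(steps, base_mhz=BASE_MHZ, step_mhz=STEP_MHZ):
    """Frequency increase from a number of TurboBoost steps, in percent of base."""
    return steps * step_mhz / base_mhz * 100.0

one_step = boost_percent(1)   # the "4.2%" ceiling when only one step is extra
two_steps = boost_percent(2)  # the "8.3%" maximal boost

def mean(values):
    return sum(values) / len(values)

# buildworld wall-clock times in seconds, grouped by hw.acpi.cpu.cx_lowest.
c1_runs = [4654.23, 4679.83]
c2_runs = [4556.37, 4570.85]

# Assumption: compare the averages of the two runs per setting.
buildworld_gain = (mean(c1_runs) - mean(c2_runs)) / mean(c1_runs) * 100.0

print(f"one step:  {one_step:.1f}%")
print(f"two steps: {two_steps:.1f}%")
print(f"buildworld gain with C2: {buildworld_gain:.1f}%")
```

Averaged this way the buildworld gain comes out slightly above 2%, in line with the figure quoted in the message.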